HUGI
The Hebrew University campus Grid (HUGI) is a production multi-cluster consisting of 15 MOSIX2 clusters with over 400 nodes (~650 CPUs). Most clusters are private, belonging to research groups in Computer Science, Chemistry, Life Sciences and the Medical School. Four clusters consist of workstations in student labs. Idle workstations are used to run guest processes from the private clusters, on the condition that whenever a student logs in, all guest processes are instantly moved out of that workstation.
Due to the increased computing demands of researchers, the amount of memory installed in the workstations was increased beyond the students' needs, allowing large guest processes from the private clusters to run on these workstations.
On-line monitor
The current status of HUGI can be seen in this link.
HUGI Management
All the nodes in all HUGI clusters are diskless: they are booted by CLiP and do not rely on local disks for running. Local disks may be used for temporary storage. User files and home directories are located on central NFS servers.
Rules for running applications on HUGI
- Users are requested to log in and start their jobs in the private cluster of their group.
- Remote logins to the student workstations are not permitted.
- Users who submit jobs with a large number of processes are requested to use either the -q or the -S queuing option of mosrun (see the example below).
- Users are requested to use the -m parameter of mosrun to specify the amount of memory their jobs will require.
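For illustration, a memory-hungry job could be submitted as follows. This is only a minimal sketch: the program name my_sim and the 2000 MB figure are placeholders, and the exact argument forms of -q, -S and -m should be checked against the local mosrun manual page.

  # Queue the job (-q) and declare that it may need up to 2000 MB of memory (-m)
  mosrun -q -m2000 ./my_sim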
Policies for running applications on HUGI
- All the workstations are rebooted every night.
- Before rebooting a workstation, all guest processes are moved out.
- Guest processes can move to other nodes in the grid.
- If no other nodes are available, guest processes are frozen in their home node.
- Processes will automatically migrate to the best available (grid-wide) nodes (subject to the priority of their home cluster).
- Students have the highest priority on their workstations. Whenever a student logs in, all guest processes are moved out of that workstation.
Freeze space
Each cluster owner is responsible for designating sufficient freeze space to accommodate all returning guest processes that originated from that cluster.