Main concepts | File systems | Bproc commands | Scheduler commands |
Class: A job property that distinguishes between jobs of a different "nature". Currently supported: Batch (submitted with cmsublist or cmsubmpi) and Interactive (requested with cmgrab). Use diagnose -c to learn about the current class configuration.
Job: A "unit of computation" defined by the required resources (nodes, memory, etc.), the task (the program(s) to run), the environment, etc. The status of the job defines its position in the queue: it is either Idle, Running or Deferred.
Job ID: A unique identifier given to a job by the Resource Manager. The ID consists of two parts: a name, which is usually supplied by the user and in many commands defaults to the username, and a number, which distinguishes between jobs with the same name, e.g., gregory.0, cvpr_test_A.15. When a batch job starts, its ID is contained in the environment variable JOBID, which is set by the scheduler and made available to the job.
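The two-part structure of a job ID can be taken apart with plain shell parameter expansion. The sketch below is only an illustration: the helper names job_name and job_number are hypothetical, and splitting at the last dot is an assumption made so that a name containing dots would still parse.

```shell
#!/bin/sh
# Hypothetical helpers: split a job ID of the form NAME.NUMBER.
# Assumption: the number is whatever follows the LAST dot.
job_name() {
    printf '%s\n' "${1%.*}"
}

job_number() {
    printf '%s\n' "${1##*.}"
}

# Inside a batch job the scheduler sets JOBID; outside a job, fall back
# to one of the sample IDs from the text above.
id="${JOBID:-cvpr_test_A.15}"
echo "name=$(job_name "$id") number=$(job_number "$id")"
```

A batch script could use the two parts, for example, to give each job's output file a distinct name.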
Node: A single physical machine with 2 CPUs, RAM, hard disk, etc. Can be allocated to a single user only. Note that assignment to jobs is done at the CPU level, so for computational purposes the cluster includes 32 nodes but 64 CPU "tasks". Use diagnose -n to query the detailed status of specific nodes. The status of a node from the Bproc point of view is obtained using bpstat.
QOS (quality of service): A "priority class" assigned to each job. It is requested by the user at the time of job submission, can be monitored with, e.g., checkjob, and can be modified with setqos.
Reservation: An allocation of certain resources for a particular goal for a certain time. There are two kinds of reservations: job-created and user-created. Job-created reservations are set up automatically to accommodate jobs that are scheduled to run on a specific set of nodes; in particular, a currently running job always induces a reservation on the nodes on which it is running. User-created reservations may be set up using the setres command.
Wallclock limit: The time limit for a job's execution, measured in real ("wall clock") time, as declared at the time of job submission. The clock starts once the job enters the Running state. If the wallclock limit is exceeded by more than the allowed leeway (1 hour 30 minutes), the job is killed. Note: diagnose -j will warn you if the stated time limit is up.
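The kill rule above amounts to simple arithmetic: a job is killed only once its elapsed time exceeds the declared limit plus the 90-minute leeway. A minimal sketch, with hypothetical helper names and all times given in seconds:

```shell
#!/bin/sh
# Leeway stated in the text: 1 hour 30 minutes, in seconds.
LEEWAY=$((90 * 60))

# Hypothetical helper: elapsed time (seconds) beyond which a job with
# the given wallclock limit is killed.
kill_threshold() {
    echo $(( $1 + LEEWAY ))
}

# Hypothetical helper: succeeds (exit status 0) when the elapsed time
# exceeds limit + leeway. Arguments: elapsed seconds, limit seconds.
would_be_killed() {
    [ "$1" -gt "$(kill_threshold "$2")" ]
}

# Example: a job submitted with a 2-hour limit.
limit=$((2 * 3600))
echo "kill threshold: $(kill_threshold "$limit") seconds"
# prints: kill threshold: 12600 seconds
```

In other words, a job declared with a 2-hour limit may in practice run for up to 3 hours 30 minutes before it is killed.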
Partition: A subset of nodes dedicated to a particular goal. Jobs are usually submitted to a specific partition and cannot span multiple partitions.
To log on to bourbaki, use your CSAIL (Kerberos) password. You should get the Kerberos ticket and the AFS token automatically. Once you log in, your home directory is mounted under the same logical name as on any other Linux machine in CSAIL, e.g., /afs/csail/u/g/gregory. Other filesystems mounted on the head node are: /projects, /data, /home, /afs and /csail.
These public filesystems can be used in the following ways:
The compute nodes do not see these filesystems. Therefore, if you want to open a file from the process, it has to reside in one of the two directories which are available to you on the compute nodes:
Every compute node in the cluster has properties and ownership similar to those of a file in Unix. These properties and the status of the node can be displayed using bpstat.
The main tool for executing commands on a remote node on which you have execution permission is bpsh.
In order to copy files to/from the local disks on the compute nodes, you can use bpcp.
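The three Bproc tools are typically used together: check the nodes, copy input to a node's local disk, then run a command there. The sketch below is a dry run that only prints the commands unless RUN=1 is set. It assumes the usual Bproc calling conventions (bpsh NODE COMMAND and bpcp FILE NODE:PATH); the helper names, the node number 3, the file input.dat and the directory /path/on/node are placeholders, not values taken from this cluster.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set RUN=1 on the head node to actually execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical helper: copy a file to a node's local disk, then run a
# command on that node.
# Arguments: node number, local file, destination directory, command...
stage_and_run() {
    node=$1
    src=$2
    dest=$3
    shift 3
    run bpcp "$src" "$node:$dest/"
    run bpsh "$node" "$@"
}

# Show node status, then stage a file and count its lines on node 3.
# /path/on/node is a placeholder: substitute a directory that actually
# exists on the compute nodes.
run bpstat
stage_and_run 3 input.dat /path/on/node wc -l /path/on/node/input.dat
```

Keeping the dry-run default makes it possible to check the generated commands before touching any node.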
diagnose -j | Get information on job status |
checkjob | Get detailed information on the status of a job |
diagnose -n | Check the status of node(s) |
checknode | Get detailed information on a node |
diagnose -p | Check priority values for jobs |
showq | Show the list of jobs in the system |
showres | Show the list of reservations in the system |
diagnose -r | Get detailed information on reservations |
cmwait | Wait until the job starts running and report the nodes allocated to it |
cmgetnodes | List the nodes assigned to a job |
There are three kinds of jobs, which differ in the way they are submitted to the queue.
cmsubmpi | Submit an MPI batch job |
cmsublist | Submit a list of 1-CPU jobs |
cmgrab | Request nodes for an interactive job |
canceljob | Cancel a job |
setqos | Change the priority class of a job |
cmjobupdate | Change a parameter for a pending job |
setres | Create an advance reservation |
releaseres | Cancel a reservation |
cmall | Apply a command to jobs listed in a file |