CS Resource Manager (Job Scheduler)
General Information
The UVA Computer Science department utilizes the Simple Linux Utility for Resource Management (SLURM) to manage server resources.
Slurm acts as the "job scheduler": its purpose is to allocate computational resources (individual servers) to users who submit jobs to a queue. The job scheduler looks at the requirements stated in the job's command or script and allocates server(s) that match those requirements. For example, if a job script specifies that a job needs 64GB of memory, the job scheduler will find a server with at least that much memory free.
Terminology & Important General Information
- Servers managed by the slurm scheduler are referred to as nodes, slurm nodes, or compute nodes
- A collection of nodes controlled by the slurm scheduler is referred to as a cluster
- Tasks in slurm can be considered individual processes
- CPUs in slurm can be considered individual cores on a processor
- A job that allocates a single CPU runs a single-process program within a single task
- A job that allocates multiple CPUs on the same node runs a multicore program within a single task
- A job that allocates CPU(s) on multiple nodes (a distributed program) runs one task on each node (see the example after this list)
- GPUs in slurm are referred to as a Generic Resource (GRES)
- Using a specific GRES requires specifying the string associated with the GRES
- For example, using --gres=gpu:1 or --gpus=1 will allocate the first available GPU, regardless of type
- Using #SBATCH --gres=gpu:1 with --constraint=a100_40gb will require that an A100 GPU with 40GB of memory be used for your job
- SSH logins to slurm nodes are disabled. Interactive jobs are required for accessing a server's command line
- The CS scheduler is configured to be first-in-first-out (FIFO) for queued jobs
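For illustration, the following is a minimal sketch of how this terminology maps to allocation flags; the program name ./myprogram is a placeholder and the resource sizes are arbitrary:

~$ srun -p cpu --ntasks=1 --cpus-per-task=4 -t 01:00:00 ./myprogram
~$ srun -p cpu --nodes=2 --ntasks-per-node=1 -t 01:00:00 ./myprogram

The first command runs one task (process) that may use four CPUs (cores), i.e. a multicore program. The second runs one task on each of two nodes, i.e. a distributed program.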
Environment Overview
The slurm scheduler runs as a service (daemon process) named slurmctld on a single designated non-compute node. Each individual compute node runs a daemon process named slurmd, which the scheduler service slurmctld communicates with. This allows the scheduler to assign and control jobs on individual nodes and to perform status checks.
Head login nodes such as portal contact slurmctld when commands such as sinfo, salloc, and sbatch are invoked to define a job and the resources required to run a program. In turn, slurmctld contacts the appropriate slurmd on a node that has the resources required for the job, and queues the job to run on that node when the resources become available.
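As a sketch of this flow from a login node (mysbatchscript is a placeholder for your own job script):

~$ sinfo
~$ sbatch mysbatchscript
~$ squeue -u <userid>

sinfo queries slurmctld for partition and node status, sbatch hands the job script to slurmctld for queuing, and squeue reports where the scheduler has placed the job.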
Software modules are made available throughout the CS environment, including slurm nodes. Please see the CS wiki about Software Modules for further details.
Updates & Announcements
- As of 26-Sep-2023, a default QoS was configured on all slurm partitions in the CS cluster, limiting the number of concurrent jobs per user to 64
- Reservations may circumvent this limit by using the QoS csresnolim in srun commands or sbatch scripts via the parameter -q csresnolim or --qos=csresnolim (see the section regarding reservations)
- As of summer 2024, the slurm job scheduler has been updated to version 23.11, and several changes have been applied
- Scheduler and node configurations have been altered to allow for utilization of Linux cgroups v2, which restricts jobs to their requested resources
- The default partition main* has been renamed to cpu. A default partition is no longer defined; users must specify a partition when submitting a job
- As of summer 2024, email notifications have been enabled for virginia.edu email addresses only
- To obtain email notifications when a job starts and finishes, include the following
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu
- A Feature string named nvlink has been added to nodes with NVLink technology available for GPU cards. This can be specified with the --constraint flag (see the example after this list)
- In August of 2024, a node with 254 cores, 1000GBs of memory, and 8x A100 80GB GPUs was added to the cluster
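For example, the nvlink feature can be combined with a GPU request; this is a hedged sketch, and the partition, GPU count, and time limit should be adjusted for your job:

~$ srun -p gpu --gres=gpu:2 --constraint=nvlink -t 01:00:00 --pty bash -i -l

The same request can be written in an sbatch script with #SBATCH --gres=gpu:2 and #SBATCH --constraint=nvlink.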
Resources Available
The tables below describe the resources available by partition name in the CS slurm cluster.
cpu Partition Nodes
#Nodes | CPUs/Node | Sockets | Mem(GBs)/Node | Features |
---|---|---|---|---|
1 | 14 | 1 | 96 | |
1 | 62 | 1 | 122 | |
5 | 14 | 1 | 125 | |
10 | 26 | 1 | 125 | |
1 | 14 | 1 | 500 | |
1 | 22 | 1 | 250 | |
1 | 22 | 1 | 500 | |
3 | 26 | 1 | 500 | |
2 | 30 | 2 | 62 | |
1 | 30 | 1 | 125 | |
2 | 30 | 2 | 125 | |
10 | 46 | 2 | 500 | |
1 | 62 | 2 | 250 | |
1 | 62 | 2 | 500 (1)* | |
1 | 158 | 2 | 252 |
gpu Partition Nodes
All available GPU cards are manufactured by Nvidia. No AMD GPUs are available in slurm.
#Nodes | CPUs/Node | Mem(GBs)/Node | GPU Type | GPUs/Node | GPU Mem(GBs)/GPU | Features |
---|---|---|---|---|---|---|
5 | 30 | 128 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
3 | 30 | 62 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
1 | 30 | 62 | GeForce GTX 1080Ti GeForce GTX 1080 | 1 2 | 11 8 | gtx_1080ti gtx_1080 |
1 | 30 | 62 | Titan X GeForce GTX 1080 | 3 1 | 12 8 | titan_x gtx_1080 |
2 | 30 | 62 | Titan X | 4 | 12 | titan_x |
3 | 30 | 62 | Tesla P100 | 4 | 16 | tesla_p100 |
2 | 70 | 1000 (2)* | GeForce RTX 2080ti | 2 | 11 | rtx_2080ti |
6 | 30 | 1000 | Quadro RTX 4000 | 4 | 8 | rtx_4000 |
1 | 14 | 250 | Quadro RTX 4000 | 4 | 8 | rtx_4000 |
1 | 78 | 250 | Quadro RTX 6000 | 8 | 24 | rtx_6000 |
2 | 38 | 500 | RTX A4000 | 4 | 16 | a4000 |
1 | 222 | 1000 | RTX A4500 | 8 | 20 | a4500 amd_epyc_7663 |
1 | 78 | 250 | RTX A6000 | 6 | 48 | a6000 |
1 | 30 | 1000 | A16 | 8 | 16 | a16 |
1 | 48 | 126 | A40 | 2 | 48 | a40 |
1 | 62 | 1000 | A40 (3)* | 4 | 48 | a40 nvlink |
1 | 62 | 250 | A40 (3)* | 4 | 48 | a40 nvlink |
1 | 30 | 250 | A100 | 4 | 40 | a100_40gb amd_epyc_7252 |
1 | 254 | 1000 | A100 (4)* | 4 | 80 | a100_80gb nvlink amd_epyc_7742 |
1 | 62 | 750 | A100 (4)* | 4 | 80 | a100_80gb nvlink amd_epyc_9124 |
1 | 254 | 1000 | A100 | 8 | 80 | a100_80gb amd_epyc_7763 |
1 | 222 | 2000 | H100 (4)* | 4 | 80 | h100_80gb nvlink |
nolim Partition Nodes
#Nodes | CPUs/Node | Sockets | Mem(GBs)/Node | Features |
---|---|---|---|---|
5 | 62 | 2 | 125 | |
1 | 38 | 2 | 160 | |
1 | 6 | 1 | 62 |
gnolim Partition Nodes
All available GPU cards are manufactured by Nvidia.
#Nodes | CPUs/Node | Mem(GBs)/Node | GPU Type | GPUs/Node | GPU Mem(GBs)/GPU | Features |
---|---|---|---|---|---|---|
2 | 30 | 125 | GeForce GTX 1080 | 4 | 8 | gtx_1080 |
2 | 22 | 220 | GeForce GTX 1080 | 2 | 8 | gtx_1080 |
5 | 30 | 62 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
2 | 30 | 125 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
1 | 30 | 112 | GeForce GTX 1080Ti | 4 | 11 | gtx_1080ti |
2 | 22 | 246 | Titan X | 1 | 12 | titan_x |
2 | 18 | 58 | Titan X | 1 | 12 | titan_x |
- Listed features can be used with the --constraint flag when submitting a job
- For example, --constraint=h100_80gb,nvlink will request H100 cards that share an NVLink (see the example after these notes)
- (1)* 512GB Intel Optane memory, 512GB DDR4 memory
- (2)* In addition to 1TB of DDR4 RAM, these nodes also house a 900GB Optane NVMe SSD and a 1.6TB NVMe regular SSD drive
- (3)* Scaled to 96GB by pairing GPUs with NVLink technology that provides a maximum bi-directional bandwidth of 112GB/s between the paired A40s
- (4)* Scaled to 320GB by aggregating all four GPUs with NVLink technology
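As a hedged sketch combining the tables above with the --constraint flag (the GPU count and partition are illustrative), the following sbatch directives would request two A100 80GB cards joined by NVLink:

#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH --constraint=a100_80gb,nvlink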
Information Gathering
Slurm produces a significant amount of information regarding node statuses and job statuses. These are a few of the commonly used tools for querying job data.
Viewing Partitions
Quick overview of all partitions, all nodes, and their respective statuses. Include -l for more details
~$ sinfo
~$ sinfo -l
Take note of the TIMELIMIT column. This describes the maximum WALL time that a job may run for on a given node. It is formatted as days-hours:minutes:seconds, which is shortened to d-hh:mm:ss
~$ sinfo
PARTITION  AVAIL  TIMELIMIT   NODES  STATE  NODELIST
gpu        up     4-00:00:00  1      idle   gpunode
...
This shows a maximum WALL time of 4 days for any job run in the gpu partition.
Viewing Job Queues
To display all jobs that are running, queued, or in another state such as pending (PD)
~$ squeue
To display your jobs that are running, queued, or in another state such as pending (PD) (replace <userid> with your username)
~$ squeue -u <userid>
Node Status
A full list of node states and symbol usage can be found on the SINFO webpage.
At times, a reason can be viewed for why a node is in a certain state such as DOWN or DRAINING/DRAINED with sinfo -R. When a node is administratively set to drain or be down, relevant information from an admin will be found here
~$ sinfo -R  (or ~$ sinfo --list-reasons)
REASON          USER    TIMESTAMP            NODELIST
Not responding  userid  2024-04-16T09:53:47  node0
Server Upgrade  userid  2024-04-16T09:53:47  node1
To view a specific node for all details including a state reason
~$ scontrol show node <nodename>
Job Status
A full list of job states can be found on the SQUEUE webpage.
After submitting a job, be sure to check the state the job enters into. A job will likely enter a PENDING (PD) state while waiting for resources to become available, and then begin RUNNING (R) (replace <userid> with your username)
~$ squeue -u <userid>
JOBID  PARTITION  NAME   USER    ST  TIME      NODES  NODELIST(REASON)
12345  cpu        myjob  userid  PD  00:00:64  1      (Resources)
12346  cpu        myjob  userid  R   00:00:64  1      node0
Individual Node Details (scontrol)
Most information found here can be found via the SINFO command and resources table.
However, to view full details about a given node in the cluster including node state, reasons, resources allocated/available, features, etc.
~$ scontrol show node <nodename>
NodeName=node0 Arch=x86_64 CoresPerSocket=8
CPUAlloc=2 CPUEfctv=32 CPUTot=32 CPULoad=4.35
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:a100_40gb:4(S:0-1)
NodeAddr=node0 NodeHostName=node0 Version=22.05.9
OS=Linux
RealMemory=256000 AllocMem=0 FreeMem=148379 Sockets=2 Boards=1
State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=gpu
BootTime=2024-03-22T16:11:55 slurmdStartTime=2024-03-24T17:36:28
LastBusyTime=2024-04-16T00:34:17
CfgTRES=cpu=32,mem=250G,billing=32,gres/gpu=4
AllocTRES=cpu=2
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Job Accounting
A distinction is made between active (running) and completed jobs, as information for recently completed jobs is only available to scontrol for five minutes after the job completes.
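In practice, a job that finished moments ago can still be inspected with scontrol, while older jobs must be queried through the accounting tools; a sketch (the jobid is a placeholder):

~$ scontrol show job <jobid>
~$ sacct -j <jobid>

The first command works while the job is running or within roughly five minutes of completion; the second works for completed jobs at any later time.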
Active & Completed Jobs
Utilize the sacct command to obtain details about your job(s). The amount of detail that can be obtained is vast, so only a few example options are shown here. For a full list of available fields, visit the sacct webpage.
Note, time output is formatted as days-hours:minutes:seconds
and shortened to d-hh:mm:ss
. For example, a job with 3-04:05:16
has been running for three days, four hours, five minutes, and sixteen seconds.
To query for all of your recently completed or actively running jobs
~$ sacct -o "jobid,jobname,state,exitcode,elapsed"
JobID         JobName     State     ExitCode  Elapsed
------------  ----------  --------  --------  ----------
1234          myjob       RUNNING   0:0       00:05:02
1234.0        myjob       RUNNING   0:0       00:05:02
To query for a single completed or active job, include -j <jobid>
~$ sacct -j <jobid> -o "jobname,state,exitcode,elapsed"
JobName     State      ExitCode  Elapsed
----------  ---------  --------  ----------
myjob       COMPLETED  0:0       00:30:02
myjob       COMPLETED  0:0       00:30:02
There are no built-in methods in slurm to easily check GPU utilization metrics. Instead, the following can be done to view GPU usage for an actively running job
~$ srun --pty --overlap --jobid=<jobid> nvidia-smi
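To monitor utilization continuously rather than taking a single snapshot, the same overlap technique can be combined with watch, assuming the watch utility is available on the node (the five-second interval is arbitrary)

~$ srun --pty --overlap --jobid=<jobid> watch -n 5 nvidia-smi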
Completed Jobs
To view utilization details of a completed job such as CPU and RAM utilization
~$ seff <jobid>
Job ID: 123456
Cluster: cs
User/Group: userid/group
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:01:42 core-walltime
Job Wall-clock time: 00:00:51
Memory Utilized: 4.88 MB
Memory Efficiency: 0.06% of 8.00 GB
Reservations
Slurm reservations are made available when dedicated usage is needed for a subset of nodes and their respective resources from the cluster.
Computing resources are finite, and there is considerable demand for nodes, especially the higher end GPU nodes. Reserving a node means that you are not allowing anyone else to use the node, denying that resource to others. So if the support staff sees that a reserved node has sat idle for a few hours, they will delete the reservation. As a result, reservations are generally discouraged unless one is absolutely necessary, and all resources will be used throughout its duration.
Important Notes
- Reservations will be deleted if a job is not submitted/actively running for a period of time
- Reservations should be made in advance when possible
- Same-day reservation requests will not cancel active jobs on the node(s) that provide the requested resources
- Nodes and their resources may be reserved for an initial maximum of fourteen days
- Extensions may be requested in increments of one week at most
Requesting a reservation
- Observe the available resource tables above and determine which node(s) will meet your resource requirements
- Send a request to cshelpdesk@virginia.edu with the following details:
- Resources needed
- Number of Nodes
- Number of CPUs (per node)
- Amount of RAM (per node)
- (optional) Type and number of GPUs
- Reservation duration (maximum is two weeks)
- Users that should have access (userids)
Using a reservation
Include the reservation name in your slurm commands
Be sure to include the slurm QoS csresnolim
to avoid cluster limits such as concurrent running job restrictions.
Using salloc
or srun
(replace <reservation name> with the name of the reservation)
~$ srun --reservation=<reservation name> --qos=csresnolim
~$ salloc --reservation=<reservation name> --qos=csresnolim
Using an sbatch
script
#SBATCH --reservation=<reservation name>
#SBATCH --qos=csresnolim
Viewing Current Reservations
To view existing reservations
~$ scontrol show res
Submitting & Controlling Jobs
Submitting Jobs
To submit a job to run on slurm nodes in the CS cluster, you must be logged into one of the login nodes. Currently, this is the portal cluster. After logging into a head node, you are able to run slurm commands such as salloc, srun, and sbatch.
- salloc, when run, will allocate resources/nodes but will not run anything (official salloc documentation)
- One purpose is parallel processing for allocating multiple nodes, then srun can execute a command across the allocated nodes
- srun can utilize a resource allocation from salloc, or can create one itself (official srun documentation)
- sbatch is used to submit a script that describes the resources required and program execution procedure (official sbatch documentation)
Sample command execution
// submit an sbatch script
~$ sbatch mysbatchscript
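srun can also execute a single command directly, creating its own allocation; a minimal sketch (the partition, resources, and command are illustrative)

~$ srun -p cpu -n 1 --mem=1000 -t 00:05:00 hostname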
Important Job Submission Notes
- If you submit a job and it does not run immediately, it may be waiting for resources that are reserved, or you may be at the maximum number of concurrent jobs allowed. Jobs can be left in the queue and will run as soon as resources are available again
Common Job Options
Flags and parameters are passed directly to salloc and srun via the command line, or can be defined in an sbatch script by prefixing the flag with #SBATCH followed by a space. For example, #SBATCH --nodes=1 attempts to allocate a single node.
This list contains several of the common flags to request resources when submitting a job. When submitting a job, you should avoid specifying a hostname, and instead specify required resources.
Note, the syntax of <...>
denotes what should be replaced by various input such as a number or name. A full list of available options can be found on the official SLURM website for salloc, srun, and sbatch.
- -J or --job-name=<jobname> : The name of your job
- -N <n> or --nodes=<n> : Number of nodes to allocate
- -n <n> or --ntasks=<n> : Number of tasks to run
- --ntasks-per-node=<n> : Number of tasks to be run on each allocated node
- --ntasks-per-core=<n> : Number of tasks to be run on each allocated core
- --ntasks-per-gpu=<n> : Number of tasks to be run on each allocated GPU
- -p <partname> or --partition=<partname> : Submit a job to a specified partition
- -c <n> or --cpus-per-task=<n> : Number of cores to allocate per process, primarily for multithreaded jobs; the default is one core per process
- --mem=<n> : System memory required for each node, specified in MBs (Ex. --mem=4000 requires 4000MBs of memory per node; --mem=4G requires 4GBs = 4096MBs of memory per node) (Note, --mem=0 requires ALL memory on a node to be available for your job to run. It is recommended to avoid specifying '0' as this requires an entire node to be idle, i.e. no other jobs running, to process your job)
- --mem-per-cpu=<n> : Minimum system memory required for each allocated core, specified in MBs
- --mem-per-gpu=<n> : Minimum system memory required for each allocated GPU, specified in MBs
- -t D-HH:MM:SS or --time=D-HH:MM:SS : Maximum WALL clock time for a job, which should always be specified for interactive jobs. Estimate high, adding at least six hours more than you expect your program to run for. Defaults to the partition limit, which can be checked with the 'sinfo' command. Cannot exceed the partition maximum WALL time.
- --gres=<list>:<n> : Comma separated list of GRES (such as GPUs) to include (Ex. --gres=gpu:1 allocates the first available GPU; --gres=gpu:2 with --constraint=a100_40gb allocates 2 A100 GPUs that each have 40GBs of GPU memory)
- -C <features> or --constraint=<features> : Specify unique resource requirements. Can be comma separated "Features" shown in the resources table (Ex. -C amd_epyc_7252 allocates a node that has an AMD Epyc processor; -C h100_80gb allocates a node that has an H100 GPU with 80GBs of memory)
- --mail-type=<type> : Specify the job state(s) that should generate an email. Valid types are: none,begin,end,fail,requeue,all (Ex. --mail-type=begin,end sends emails when a job starts and completes)
- --mail-user=<computingID>@virginia.edu : Specify the recipient virginia.edu email address for email notifications (all other domains such as 'gmail.com' are ignored)

* Note, the --mem, --mem-per-cpu, and --mem-per-gpu options are mutually exclusive and should not be used together
* Note, for memory specifications, 1G = 1024MBs. For example, a node with 4000MBs (4 Gigabytes) of available memory will not accept jobs specifying --mem=4G since 4G = 4096MBs.
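Putting several of the options above together, a hedged sbatch header sketch follows; the job name, resource sizes, time, and program are placeholders to adapt:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8000
#SBATCH --time=0-02:00:00
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu

./myprogram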
Environment Variables
A full list of slurm environment variables can be found here slurm environment variables. When a job is submitted, all environment variables are carried forward into the job unless otherwise specified. This behavior can be modified and is primarily changed for sbatch scripts when needed. When a job is submitted, by default, slurm will cd
to the directory the job was submitted (using salloc, sbatch, or srun) from.
To not carry environment variables from your shell forward
#SBATCH --export=NONE
To export individual variables
#SBATCH --export=var0,var1
Variables can be set and exported with
#SBATCH --export=var0=value0,var1=value1
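Within a running job, slurm also sets a number of SLURM_* variables describing the allocation. As a hedged illustration (which variables are set can depend on the flags used at submission), a script could record a few of the common ones:

echo "job id: $SLURM_JOB_ID"
echo "node list: $SLURM_JOB_NODELIST"
echo "cpus per task: $SLURM_CPUS_PER_TASK"
echo "submit directory: $SLURM_SUBMIT_DIR"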
Log files: Standard Output (STDOUT) and Standard Error (STDERR)
slurm by default aggregates the STDOUT and STDERR streams into a single file, which by default will be named slurm-%A_%a.out, where %A is the jobid and %a is the array index (for job arrays). By default, this file is created in the directory that sbatch is executed in. The name of this file, the streams included, and its location can all be modified.
See more information about file name patterns.
Modify combined output file name
#SBATCH -o <output file name> or #SBATCH --output=<output file name>
To separate the output files, define both file names for STDOUT and STDERR
#SBATCH --output=<output file name>
#SBATCH -e <error file name>

or

#SBATCH --output=<output file name>
#SBATCH --error=<error file name>
A path can be specified for either option, and patterns can be used in the file name as well
#SBATCH --output="/p/myproject/slurmlogs/%A.out"
Interactive Job
Direct SSH connections are disabled to slurm nodes. Instead, an interactive job can be initialized to run commands directly on a node.
Note, idle interactive sessions deny resource allocations for other jobs. As such, idle interactive jobs are terminated when found. Generally, interactive sessions should be used for testing and debugging, with the intention of creating an SBATCH script to run your job.
Note, an interactive session will time out after one hour if no commands are executed during the hour.
The following example creates a resource allocation within the CPU partition for one node with two cores, 4GBs of memory, and a time limit of 30 minutes. Then, a BASH shell is initialized within the allocation:
userid@portal01~$ salloc -p cpu -N 1 -c 2 --mem=4000 -J InteractiveJob -t 30
salloc: Granted job allocation 12345
userid@portal01~$ srun --pty bash -i -l
userid@node01~$ ... testing performed ...
userid@node01~$ exit
userid@portal01~$ exit
salloc: Relinquishing job allocation 12345
salloc: Job allocation 12345 has been revoked.
userid@portal01~$
Be sure to type exit twice: first to close the interactive shell started by srun on the node, then a second time to relinquish the allocation made with salloc.
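Equivalently, the allocation and the interactive shell can be requested with a single srun command using the same resources as the sketch above; exiting the shell then releases the allocation in one step:

~$ srun -p cpu -N 1 -c 2 --mem=4000 -J InteractiveJob -t 30 --pty bash -i -l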
View Active Job Details
To view all of your job(s) and their respective statuses
~$ squeue -u <userid>
To view individual job details in full
~$ scontrol show job <jobid>
To view full job utilization and allocation details
~$ sstat <jobid>
An estimate of when a job will start can sometimes be obtained by running
~$ squeue --start -j <jobid>
Canceling Jobs
To obtain jobids, be sure to utilize the squeue
command as shown
To cancel an individual job
~$ scancel <jobid>
To send a different signal to job processes (default is SIGTERM/terminate), use the --signal=<signal>
flag
~$ scancel --signal=KILL <jobid>
To cancel all of your jobs regardless of state
~$ scancel -u <userid>
To cancel all of your PENDING (PD) jobs, include the -t <state>
or --state=<state>
flag
~$ scancel -u <userid> -t PENDING
To completely restart a job, that is, to cancel it and start it again from the beginning
~$ scontrol requeue <jobid>
Email Notifications
Email notifications are available for various job state changes, for example, when a job starts, completes, or fails. The scheduler checks every five minutes for emails to send out.
When an email is sent for a completed job, the seff <jobID> command is executed to provide utilization statistics for the allocated CPUs and memory.
To enable email notifications, include in a given SBATCH script
#SBATCH --mail-type=<type>
#SBATCH --mail-user=<computingID>@virginia.edu
Replace <type> with the option(s), comma separated, from none,begin,end,fail,requeue,all, and replace <computingID> with your UVA computing ID.
Note, all other email domains such as gmail.com are silently ignored.
Note, for job arrays, the scheduler only generates a single email for <jobID>_* and not one for each individual array task such as <jobID>_1. Further, the seff command is run on the last array task that completes.
For example, to receive a notification when a job starts and finishes
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu
Official documentation for SLURM email functionality can be found here.
Example SBATCH Scripts
sbatch scripts are sequentially executed command scripts: simple bash scripts in which slurm parameters (#SBATCH <flag>) are defined along with the commands to run a program. The commands should be the same ones you would use in your terminal.
When submitting an sbatch script, be aware of file paths. An sbatch script will use the current directory from where it was submitted.
To submit an sbatch script from a login node, replace <script file name>
with the name of your sbatch script
~$ sbatch <script file name>
The examples below give a starting point for creating a job script. It is recommended to modify them as needed for your job(s).
Single Process Program
The following is an example of a single-process job that runs on the cpu partition. This will allocate the default amount of memory per core for the cpu partition, which is only 256MB
#!/bin/bash
#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH -p cpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu

./myprogram
Simple GPU Program (allocates first available GPU)
The following allocates the first available GPU, regardless of model/type, along with 8GBs of system memory
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=8000
#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH -p gpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu

python3 myprogram.py
Simple GPU Program (allocates a specific GPU)
The following requests a single A100 GPU card with 40GBs of memory when one is available, along with 16GBs of system memory
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100_40gb
#SBATCH --mem=16000
#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH -p gpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu

module load cuda-toolkit
python3 myprogram
Learn more about allocating GPUs for your job here.
Simple Parallel Program
For jobs that will utilize the Message Passing Interface (MPI), several nodes/processes will have to be requested.
The following requests two servers, each to run 8 tasks, for a total of 16 tasks.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH -t 06:00:00
#SBATCH -p cpu
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu

module load gcc
module load openmpi
srun ./myparallel_program
Job Arrays
With slurm, several jobs can be submitted simultaneously to process separate data by using a job array.
The following requests a total of 16 array tasks, where each task individually requires 2 CPUs and 1GB of memory. The resulting output files have the format examplearray_%A-%a.out, where %A is the jobid and %a is the task number.
#!/bin/bash
#SBATCH --job-name=myjobarray
#SBATCH --array=1-16
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1000
#SBATCH -t 00:04:00
#SBATCH -p cpu
#SBATCH --output=examplearray_%A-%a.out
#SBATCH --mail-type=begin,end
#SBATCH --mail-user=<computingID>@virginia.edu

echo "Hello world from $HOSTNAME, slurm taskid is: $SLURM_ARRAY_TASK_ID"
Canceling Array Tasks
To cancel a single task from an array
~$ scancel <jobid>_<taskid>
To cancel a range of tasks
~$ scancel <jobid>_[<task0>-<task3>]
A list format can also be provided
~$ scancel <jobid>_[<task0>,<task1>,<task2>]