HPC Cluster Job Scheduler
This content is under construction. Check back often for updates.
== Submitting Your First HPC Job ==
Content to be created.
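In the meantime, a minimal sketch of the typical workflow: write an sbatch script (see the anatomy section below), submit it, and monitor it. The script name submit.sh and the job ID are only examples.
<pre>
# Submit the batch script to the scheduler; SLURM prints the assigned job ID
sbatch submit.sh

# Check the status of your queued and running jobs
squeue -u $USER

# Cancel a job if needed (replace 12345 with the job ID reported by sbatch)
scancel 12345
</pre>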
== Anatomy of a SLURM Sbatch Submit Script ==
Content to be updated.
<pre>
#!/bin/bash
#SBATCH --workdir=./                 # Set the working directory
#SBATCH --mail-user=nobody@tcnj.edu  # Who to send emails to
#SBATCH --mail-type=ALL              # Send emails on start, end and failure
#SBATCH --job-name=pi_dart           # Name to show in the job queue
#SBATCH --output=job.%j.out          # Name of stdout output file (%j expands to jobId)
#SBATCH --ntasks=4                   # Total number of MPI tasks requested
#SBATCH --nodes=1                    # Total number of nodes requested
#SBATCH --partition=test             # Partition (a.k.a. queue) to use

# Disable selecting Infiniband
export OMPI_MCA_btl=self,tcp

# Run MPI program
echo "Starting on "`date`
mpirun pi_dartboard
echo "Finished on "`date`
</pre>
== Advanced Submit Script Options ==
Content to be created.
=== Constraints ===
Constraints restrict a job to nodes that provide a specific hardware feature (for example, a particular CPU generation). A list of the available constraints is still to be added.
Example:
<pre>
#SBATCH --constraint=skylake
</pre>
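Multiple features can also be combined in a single --constraint expression. A sketch is below; the feature names other than skylake are hypothetical and may not match the features defined on ELSA.
<pre>
# Accept nodes with either of two CPU generations (OR)
#SBATCH --constraint="skylake|broadwell"

# Require nodes that have both features (AND)
#SBATCH --constraint="skylake&largemem"
</pre>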
=== Node Exclusivity ===
This option prevents the job allocation from sharing nodes with other running jobs.
Use it judiciously and sparingly. If, for example, your job requires only 2 CPU cores and is scheduled on a node with 32 cores, no other job will be able to use the remaining 30 cores (not even your own jobs). Where this may make sense is when your job competes for memory (RAM) with other jobs running on the same node: the system is not yet configured to enforce memory limits the way it enforces CPU core limits, so requesting exclusivity guarantees that the entire node is reserved for your job.
Example:
<pre>
#SBATCH --exclusive
</pre>
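A sketch of the memory-bound scenario described above: a job that needs only a few cores but should not share its node's RAM with other jobs. The job name and resource values are illustrative.
<pre>
#SBATCH --job-name=bigmem_job   # Name to show in the job queue
#SBATCH --ntasks=2              # Only 2 tasks/cores are needed...
#SBATCH --nodes=1               # ...on a single node
#SBATCH --exclusive             # ...but reserve the whole node so nothing else competes for its memory
</pre>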
=== Job Arrays ===
A job array submits many copies of the same script as separate tasks from a single sbatch call. In the output file name, %A expands to the array's master job ID and %a to the individual task index; a usage sketch follows the examples below.
Example 1:
<pre>
#SBATCH --output=job.%A_%a.out
#SBATCH --array=1-100
</pre>
Example 2 (step size):
<pre>
#SBATCH --output=job.%A_%a.out
#SBATCH --array=1-100:20
</pre>
Example 3 (limit simultaneously running tasks to 5):
<pre>
#SBATCH --output=job.%A_%a.out
#SBATCH --array=1-100%5
</pre>
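Inside an array job, SLURM sets the environment variable SLURM_ARRAY_TASK_ID to the task's index. A minimal sketch of using it to give each task its own input file; the program and file names are hypothetical.
<pre>
#!/bin/bash
#SBATCH --output=job.%A_%a.out   # %A = array master job ID, %a = task index
#SBATCH --array=1-100            # Run tasks with indices 1 through 100

# Each task processes its own input file, e.g. input_1.dat ... input_100.dat
./my_program input_${SLURM_ARRAY_TASK_ID}.dat
</pre>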
== Example Submit Scripts ==
Content to be created.
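In the meantime, a sketch of a GPU job for one of the GPU partitions listed below. The GRES name and the program are assumptions and may need adjusting for ELSA.
<pre>
#!/bin/bash
#SBATCH --job-name=gpu_example   # Name to show in the job queue
#SBATCH --output=job.%j.out      # Name of stdout output file
#SBATCH --ntasks=1               # One task
#SBATCH --partition=gpu          # GPU partition (see the table below)
#SBATCH --gres=gpu:1             # Request one GPU (GRES name assumed to be "gpu")

echo "Starting on "`date`
./my_gpu_program                 # Hypothetical GPU executable
echo "Finished on "`date`
</pre>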
== ELSA Job Partitions/Queues ==
{| class="wikitable"
! Partition/Queue Name !! Max Time Limit !! Resource Type
|-
| short || 6 hours || CPU
|-
| normal || 24 hours || CPU
|-
| long || 7 days || CPU
|-
| nolimit* || none || CPU
|-
| shortgpu || 6 hours || GPU
|-
| gpu || 7 days || GPU
|}
* - Use of the nolimit partition is restricted to approved cluster users. Faculty may request access for themselves and students by emailing ssivy@tcnj.edu.
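To see which partitions are currently available, along with their time limits and node states, the standard SLURM query command can be used; a job selects one with the --partition directive shown earlier.
<pre>
# List all partitions, their time limits, and node states
sinfo

# In a submit script, select a partition from the table above, e.g.:
#SBATCH --partition=short
</pre>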