#acl TechnicalServicesGroup:read,write,delete,revert All:read
= Job Scripts =
<<TableOfContents(3)>>

== Serial Single Threaded ==
This example illustrates a job script designed to run a simple single-threaded process on a single compute node:

{{{
#!/bin/bash
# ====Sample Job Script===
#SBATCH --job-name=mySerialjob
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0-00:20:00
#SBATCH --mem=3102

cd ${SLURM_SUBMIT_DIR}

module load someApp
someApp
}}}

=== Explanation ===
A single-process run requires only 1 node, 1 CPU core, and a single task, as reflected in the example script. We change to the directory from which we submitted the job (${SLURM_SUBMIT_DIR}) so our output is produced there. Then we load the module "someApp" and execute it.

Note that ${SLURM_SUBMIT_DIR} is one of many environment variables available from within a SLURM job script. For a comprehensive list, please refer to the [[https://slurm.schedmd.com/sbatch.html|SLURM documentation]].
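
For a quick check of what SLURM provides at run time, the script can be submitted with "sbatch" and a few of these environment variables printed from within the job. This is only a sketch; the script name serial_job.sh is a placeholder:
{{{
# Submit the job script (serial_job.sh is a placeholder name)
sbatch serial_job.sh

# Lines like these, added to the job script, print some useful SLURM variables:
echo "Job ID:      ${SLURM_JOB_ID}"
echo "Submit dir:  ${SLURM_SUBMIT_DIR}"
echo "Node list:   ${SLURM_JOB_NODELIST}"
}}}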

== Multi-Threaded Single Node ==
In this example we are running an application capable of utilizing multiple process threads on a single node (BLAST):

{{{
#!/bin/bash
# ====Sample Job Script===
#SBATCH --job-name=myBLASTjob
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=0-01:00:00
#SBATCH --mem=3102

cd ${SLURM_SUBMIT_DIR}

module load BLAST
blastn -num_threads 8 <...>
}}}

=== Explanation ===
In this case we still have a single task (our blastn run), but we require 8 CPU cores to accommodate the 8 threads we've specified on the command line. The ellipsis between the angle brackets represents the balance of our command-line arguments.
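
To keep the thread count in step with the allocation, the value can be read from SLURM's environment rather than hard-coded. A minimal sketch, assuming placeholder query and database names:
{{{
# SLURM_CPUS_PER_TASK matches the --cpus-per-task request above.
# (my_query.fasta, my_db and results.txt are placeholder names.)
blastn -num_threads ${SLURM_CPUS_PER_TASK} -query my_query.fasta -db my_db -out results.txt
}}}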

== Multiple Serial Jobs ==
Here we demonstrate that it is possible to run multiple copies of the same application and leverage SLURM's "srun" command to distribute tasks across multiple nodes:

{{{
#!/bin/bash
# ====Sample Job Script===
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
#SBATCH --time=0-01:00:00
#SBATCH --mem=3102

module load someApp
srun -n 2 python myScript.py &
srun -n 2 someApp &
wait
}}}

=== Explanation ===
We specify 2 nodes and 2 tasks per node (4 tasks total). The "srun" command is used to direct that 2 copies of each application should be run; srun works with SLURM to launch and schedule each task across our assigned nodes. The ampersand (&) causes each task to run "in the background" so that all tasks may be launched in parallel and are not blocked waiting for other tasks to complete. The "wait" command tells the shell to wait until all background tasks have completed before the job script exits.
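
If each copy should write its results to its own file, the background launches can be made as single-task job steps with individual output redirection. A minimal sketch of that pattern; the output file names are illustrative:
{{{
# Launch each copy as its own 1-task job step with its own output file.
# (The output file names are illustrative.)
srun -n 1 -N 1 python myScript.py > myScript_1.out &
srun -n 1 -N 1 python myScript.py > myScript_2.out &
srun -n 1 -N 1 someApp > someApp_1.out &
srun -n 1 -N 1 someApp > someApp_2.out &
wait
}}}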

== MPI Jobs ==
This is an example of a job script that runs a single MPI application across multiple nodes with distributed memory. It is recommended to use "srun" instead of "mpirun":

{{{
#!/bin/bash
# ====Sample Job Script===
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=14
#SBATCH --ntasks=28
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=0-10:00:00

module load OpenMPI

srun ./my_mpi_app
}}}

=== Explanation ===
Two nodes are assigned with 14 tasks per node (28 tasks total). One GB of RAM is allocated per CPU, and srun is used to launch our MPI-based application.
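
As a sketch of the surrounding workflow, the application would typically be compiled with the MPI wrapper compiler provided by the module before the job script is submitted. The names my_mpi_app.c and mpi_job.sh are placeholders:
{{{
# Compile the MPI application with the wrapper compiler from the OpenMPI module,
# then submit the job script shown above.
# (my_mpi_app.c and mpi_job.sh are placeholder names.)
module load OpenMPI
mpicc -O2 -o my_mpi_app my_mpi_app.c
sbatch mpi_job.sh
}}}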

== Interactive Jobs ==
It is possible to schedule an interactive job on the cluster. This can be accomplished using "srun" and specifying resource parameters on the command line:

{{{
srun -N 1 -c 1 -t 30:00 --pty /bin/bash --login
}}}

=== Explanation ===
Here 1 node and 1 core are specified, with a walltime of 30 minutes. The balance of the command gives us a bash login shell that will be scheduled by SLURM on one of the compute nodes.
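
The same pattern extends to larger interactive requests. A sketch with illustrative resource values:
{{{
# Request an interactive shell with 4 cores, 8 GB of memory and a 2-hour limit.
# (The resource values are illustrative.)
srun -N 1 -c 4 --mem=8G -t 2:00:00 --pty /bin/bash --login
}}}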

== Job Arrays ==
Job arrays are a convenient way to perform the same set of procedures or tasks on multiple data sets without having to launch more than one job. This reduces the number of job scripts required and allows the tasks to run in parallel from a single script. In the example below, we are executing the same process on 4 different input files:

{{{
#!/bin/bash
# ====Sample Job Script===
#SBATCH --job-name=myArrayTest
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0-10:00:00
#SBATCH --array=1-4

file=$(awk "NR==${SLURM_ARRAY_TASK_ID}" file_list.txt)
python /home/someUser/myscript.py $file > myoutput_${SLURM_ARRAY_TASK_ID}.out
}}}

=== Explanation ===
 1. The line "#SBATCH --array=1-4" specifies that we are running 4 tasks, numbered 1-4.
 1. The line beginning "file=" uses "awk" to read the line of "file_list.txt" (in the working directory) whose line number matches the SLURM_ARRAY_TASK_ID (1-4).
 1. The python script "myscript.py" operates on the value returned in "$file" (the filename), and the output is stored in a file named "myoutput_#.out", where "#" corresponds to the array task ID of the SLURM task. A sketch of preparing "file_list.txt" and submitting the array follows this list.
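
A minimal sketch of that preparation and submission; the data file names and the script name array_job.sh are placeholders, and the "--output" pattern shown uses SLURM's %A (job ID) and %a (array task ID) substitutions:
{{{
# Build the list of input files, one per line, then submit the array job.
# (The data file pattern and array_job.sh are placeholder names.)
ls ${PWD}/data/*.txt > file_list.txt
sbatch array_job.sh

# Inside the job script, SLURM can also name each task's log file directly:
#   #SBATCH --output=myoutput_%A_%a.out
# where %A is the job ID and %a is the array task ID.
}}}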

== GPU Jobs ==
To run a GPU-based job we simply need to add an SBATCH request for the generic resource ("gres") type "gpu", as shown below:

{{{
#!/bin/bash
# ====Sample Job Script===
#SBATCH --job-name=myGPUjob
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1

./myGPUapp
}}}
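
As a quick sanity check, the assigned GPU can be inspected from within the job before the application starts. A sketch assuming an NVIDIA GPU node (so nvidia-smi is available) and a cluster configuration where SLURM exports CUDA_VISIBLE_DEVICES for the allocated device:
{{{
# Show which GPU(s) SLURM has made visible to this job, then run the application.
# (Assumes an NVIDIA node with nvidia-smi installed; CUDA_VISIBLE_DEVICES is set
#  by SLURM's gres plugin on typical configurations.)
echo "Visible GPU(s): ${CUDA_VISIBLE_DEVICES}"
nvidia-smi
./myGPUapp
}}}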


CategoryHPC