Launching jobs
The Slurm scheduling software is used to launch jobs on the Matilda HPC Cluster.
The srun command is designed for interactive use, with someone monitoring the output.
Batch jobs can be launched by passing a Slurm job submission script to the sbatch command.
If you're unfamiliar with the Slurm Workload Manager, have a look at the Slurm documentation at SchedMD.
Main Slurm Commands
sbatch
sbatch - submit a job script. The sbatch command submits a batch processing job to the Slurm queue manager. These scripts typically contain one or more srun commands that launch the job's steps.
sbatch samplejobscript.sh
The sample script that follows requests 16 cores in total: 4 tasks spread across 4 nodes, with 4 CPU cores per task.
#!/bin/bash
#
# Sample Batch Script
#
#
# specify how many nodes (physical server) to use.
#SBATCH --nodes=4
# use -n or --ntasks to specify how many tasks to run
#SBATCH --ntasks=4
# Specify how many CPU cores to use per task
#SBATCH --cpus-per-task=4
# Specify a time limit for the job run
#SBATCH --time=00:10:00
# Standard output and error log
#SBATCH --output=job_output_%j.log
# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1
# Load the module environment suitable for the job
module load gcc slurm
# And finally run the job
srun hostname
srun sleep 10
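When a job is submitted from another script, it is often useful to capture the job ID for later squeue or scancel calls. The following is a minimal sketch, not a site-provided tool: the function name submit_job is hypothetical, and it relies on the sbatch --parsable option, which prints the job ID alone (optionally followed by ";cluster").

```shell
# Hypothetical helper: submit a batch script and print only the job ID.
# With --parsable, sbatch prints "jobid" or "jobid;cluster" instead of
# the usual "Submitted batch job <jobid>" message.
submit_job() {
    submit_out=$(sbatch --parsable "$@") || return 1
    # Strip an optional ";cluster" suffix.
    printf '%s\n' "${submit_out%%;*}"
}

# Example use on a login node (not executed here):
#   jobid=$(submit_job samplejobscript.sh)
#   echo "Submitted job $jobid; output goes to job_output_${jobid}.log"
```

The log file name in the example follows from the --output=job_output_%j.log line of the sample script, where %j expands to the job ID.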
srun
srun - run a command on allocated compute node(s). The srun command is used to submit jobs for execution, or to initiate steps of jobs in real time. For the full range of options that can be passed to the srun command, see the srun manual page (man srun).
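For interactive work, srun can request resources and attach your terminal to the resulting job step. The sketch below wraps that in a hypothetical function, interactive_shell; the default core count and time limit are illustrative assumptions, not Matilda policy.

```shell
# Hypothetical helper: open an interactive shell on one compute node.
# Optional arguments: number of CPU cores, wall-time limit (HH:MM:SS).
interactive_shell() {
    cores=${1:-1}
    limit=${2:-00:30:00}
    # --pty runs the first task in pseudo-terminal mode, so the shell
    # behaves like a normal login session.
    srun --nodes=1 --ntasks=1 --cpus-per-task="$cores" \
         --time="$limit" --pty bash -i
}

# Example use (not executed here):
#   interactive_shell 4 01:00:00
```

Leaving the shell with exit ends the job step and releases the allocation.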
scancel
scancel - delete a job. The scancel command terminates pending and running jobs or job steps. You can also use it to send a Unix signal to all processes associated with a running job or job step.
scancel <jobid>
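Beyond cancelling a single job by ID, scancel can filter by user and job state, and can deliver a signal. The helper names below (cancel_pending, signal_job) are hypothetical; the --user, --state and --signal options are standard scancel options.

```shell
# Hypothetical helpers built on standard scancel options.

# Cancel every pending job of a user (defaults to the current user).
cancel_pending() {
    scancel --user="${1:-$USER}" --state=PENDING
}

# Send a Unix signal (for example USR1) to a job.
# Arguments: job ID, signal name.
signal_job() {
    scancel --signal="$2" "$1"
}

# Example use (not executed here):
#   cancel_pending
#   signal_job 12345 USR1
```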
squeue
squeue - show state of jobs. The squeue command will report the state of running and pending jobs.
squeue -u username
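Because squeue lists only jobs that are still pending or running by default, it can be polled to wait for a job to finish. This is a rough sketch under that assumption; wait_for_job is a hypothetical name and the 30-second default interval is arbitrary.

```shell
# Hypothetical helper: block until a job no longer appears in the queue.
# Arguments: job ID, polling interval in seconds (default 30).
wait_for_job() {
    wait_id=$1
    wait_interval=${2:-30}
    # -h suppresses the header line, so empty output means the job has
    # left the queue (or the ID is unknown to Slurm).
    while [ -n "$(squeue -h -j "$wait_id" 2>/dev/null)" ]; do
        sleep "$wait_interval"
    done
}

# Example use (not executed here):
#   wait_for_job 12345 60 && echo "job 12345 is no longer queued"
```

Polling too often adds load to the scheduler, so a generous interval is the safer choice.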
sinfo
sinfo - show state of nodes and partitions (queues). The sinfo command reports the status of the available partitions and nodes.
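As an illustration of filtering sinfo output, the sketch below counts idle nodes. The function name idle_nodes is hypothetical, and the partition name in the usage comment is an assumption, not a confirmed Matilda partition.

```shell
# Hypothetical helper: count idle nodes, optionally within one partition.
# Without a partition argument, a node that belongs to several partitions
# is counted once per partition.
idle_nodes() {
    if [ -n "${1:-}" ]; then
        # -h: no header, -t idle: only idle nodes, -o "%D": node count
        sinfo -h -t idle -o "%D" -p "$1"
    else
        sinfo -h -t idle -o "%D"
    fi | awk '{ total += $1 } END { print total + 0 }'
}

# Example use (not executed here; "general" is an assumed partition name):
#   idle_nodes general
```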
smap
smap - show jobs, partitions and nodes in a graphical network topology. The smap command is similar to the sinfo command, except that it displays all of the information in a pseudo-graphical, ncurses-based terminal interface.
scontrol
scontrol - modify jobs or show information about various aspects of the cluster. The scontrol command is used to view and change Slurm settings and state. You'll most likely use it to modify your jobs while they're in the queue, for example to change the number of nodes or the number of tasks/CPUs. It can also be used to display information about jobs, partitions, and nodes.
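Two common uses of scontrol can be sketched as small wrappers. The function names job_state and set_time_limit are hypothetical; they rely on the scontrol show job and scontrol update subcommands. Whether a regular user may raise a limit depends on site policy, so treat the update call as something that may be refused.

```shell
# Hypothetical helpers built on standard scontrol subcommands.

# Print the state of a job (for example PENDING or RUNNING).
job_state() {
    # -o prints the job record on one line of Key=Value pairs.
    scontrol -o show job "$1" | tr ' ' '\n' | sed -n 's/^JobState=//p'
}

# Change the time limit of a queued job.
# Arguments: job ID, new limit (HH:MM:SS).
set_time_limit() {
    scontrol update JobId="$1" TimeLimit="$2"
}

# Example use (not executed here):
#   job_state 12345
#   set_time_limit 12345 00:05:00
```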