Using SLURM with Jupyter Notebook
Overview
When working in a Jupyter Notebook session, it can be advantageous to launch jobs that use output from the session, or to carry out parallel processing tasks. This tutorial demonstrates two methods of launching SLURM jobs from inside a Jupyter Notebook.
SLURM-MAGIC
The Python package slurm-magic permits complete job scripts to be constructed inside a Jupyter Notebook and submitted to the cluster resource manager. It is recommended to install slurm-magic in the same conda environment in which you've installed Jupyter Notebook, since it will use this environment and any installed packages when executing the scripted job.
To install slurm-magic:
conda activate myenv
module load git-gcc
pip install git+https://github.com/NERSC/slurm-magic.git
where "myenv" is the name of the conda environment you intend to run Jupyter Notebook. After starting a job to begin an interactive SLURM session (for example using "srun"), start Jupyter Notebook, create a new notebook and enter the following in a single cell:
%load_ext slurm_magic
import warnings
warnings.filterwarnings("ignore")
After entering the code above, hit "Shift" and "Return" together. Now you can create the job script inside the next cell. For example, here is a case where we are submitting a script that will run on a GPU node:
%%sbatch
#SBATCH --job-name=myGPUTest
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=[email protected]
python /home/u/user/gpuTask.py -i /scratch/users/user/myInput.dat -o /scratch/users/user/gpuTask.out
Once again, hit "Shift" and "Return" to submit the job. (make sure to substitute your actual script or command for the line beginning with "python".
NOTE: Do NOT try to combine the first and second code blocks, as doing so may generate an error. Ensure that the "%load_ext slurm_magic" statement is executed BEFORE creating your job script.
You can use essentially any common SLURM directive in this manner, without special keywords or limitations. Please be aware, however, that whatever code you execute will be limited to the packages installed in the conda environment you used to start Jupyter Notebook. Loading other Python modules (for example) may therefore cause unexpected errors or other issues.
Once the job script is submitted, you can check on the status using the "squeue -u username" command in another terminal.
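Because slurm-magic also wraps other common SLURM commands as line magics, you can check the queue without leaving the notebook. A minimal sketch (substitute your actual username):

%squeue -u username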
Here is a second example showing how to submit a multi-node MPI job using slurm-magic. Suppose we have a script named "mpiTest.py" that contains the following:
import ipyparallel as ipp

def mpi_example():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return f"Hello World from rank {comm.Get_rank()}. total ranks={comm.Get_size()}. host={MPI.Get_processor_name()}"

# request an MPI cluster with 24 engines
with ipp.Cluster(controller_ip="*", engines="mpi", n=24) as rc:
    # get a broadcast_view on the cluster which is best
    # suited for MPI style computation
    view = rc.broadcast_view()
    # run the mpi_example function on all engines in parallel
    r = view.apply_sync(mpi_example)
    # Retrieve and print the result from the engines
    print("\n".join(r))
# at this point, the cluster processes have been shutdown
If you have the necessary prerequisites installed in the conda environment you used to launch Jupyter Notebook (for example ipyparallel, mpi4py, and mpich), you can submit this script using slurm-magic with the following:
%%sbatch
#SBATCH --job-name=myMPITest
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=8
#SBATCH --ntasks=24
#SBATCH --mail-type=ALL
#SBATCH --mail-user=[email protected]
python /home/u/user/mpiTest.py
Remember to load the slurm-magic extension before attempting to run the code above. This job will request 3 nodes with 8 tasks each (24 total) and run our MPI script across those nodes (see Using MPI with Jupyter for more information).
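If the prerequisites mentioned above are not already present in your conda environment, one possible way to install them is sketched below (this assumes the packages are available from the conda-forge channel on your system):

conda activate myenv
conda install -c conda-forge ipyparallel mpi4py mpich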
SUBMITIT
The python/conda package "submitit" can also be used to submit jobs to the SLURM resource manager inside Jupyter Notebook. Unlike "slurm-magic", "submitit" has unique notations used for structuring and submitting the job. Documentation on "submitit" is not extensive, and some common SLURM directives may not be available. Submitit also is designed more to be an "in-line" part of your code, rather than a standalone job script.
To install submitit:
conda activate myenv
conda install submitit
Next, start an interactive SLURM session and start Jupyter Notebook. Then create a new notebook, and enter your code in a cell.
In the following example, we create a function called "add" and then submit it with arguments to the cluster using submitit:
import submitit
import sys
import os

def primes(nprimes):
    os.system('module load Python')
    n = nprimes
    for p in range(2, n+1):
        for i in range(2, p):
            if p % i == 0:
                break
        else:
            print(p)
    print('Done')

log_folder = "log_test/%j"
executor = submitit.AutoExecutor(folder=log_folder)
executor.update_parameters(slurm_job_name="PrimesTest", tasks_per_node=1, nodes=1,
                           gpus_per_node=1, timeout_min=300, slurm_partition="defq")
job = executor.submit(primes, 1000000)
print(job.job_id)  # ID of your job
output = job.result()
In this example, we create a function "primes" which calculates prime numbers until the value of "n" is reached. We need to create a "log_folder" for submitit - this is where all of the job related files will be stored, including output, error, and job submittal scripts. Next, an submitit objected named "executor" is created. This is used to specify job parameters as shown above. Finally, we create an submit object called "jobs" - this actually submits our function "primes" to the cluster as a job, with an input value of "1000000" for the function. The "print" statement will display your jobid and the "output" object is used to capture any error or warning information should the job fail.
Once the code is all entered into a cell, hit "Shift" and "Return" to execute. Unless your function is designed to produce output in a specified location, its output will be found in a *.out file under your "log_test" directory, in the subdirectory named for the jobid of the job you just submitted.
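You can also check on the job from within the notebook itself. The following is a minimal sketch, assuming the "job" object from the example above and submitit's standard job-inspection methods:

# inspect the submitted job from the notebook
print(job.state)     # current SLURM state, e.g. PENDING, RUNNING, or COMPLETED
print(job.done())    # True once the job has finished
print(job.stdout())  # contents of the job's stdout log under the "log_test" folder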
More Information
CategoryHPC