Using MPI with Jupyter Notebook
Overview
This tutorial covers how to use Jupyter Notebook with a multi-node job using the Message Passing Interface (MPI). MPI is used by applications designed to run on multiple processors distributed across several nodes. Memory utilization for these jobs is likewise distributed across the nodes used for computation (distributed memory), so an MPI library is needed to manage data and instructions stored in that distributed space. This contrasts with applications that run on multiple processors within a single node, which use a shared memory model.
Prerequisites
The instructions provided here assume that you have created an appropriate conda environment and understand the basics of launching Jupyter Notebook. Documentation on both of these topics is available in our knowledge base (please refer to the links).
Although it is possible to initiate a Jupyter Notebook session using a job script (a sketch of that alternative appears below), this tutorial uses a scheduled interactive session started with the SLURM command "srun".
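For reference, a batch-script alternative might look roughly like the following minimal sketch. The resource requests mirror the srun example used later in this tutorial; the environment name "myenv", the port, and the conda initialization step are illustrative assumptions and may need to be adapted to your site:

#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks=24
#SBATCH --ntasks-per-node=8
#SBATCH -t 5:00:00

# initialize conda for non-interactive shells if your site requires it (site-specific)
source ~/.bashrc

# activate the environment and start Jupyter Notebook on the primary node
conda activate myenv
jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0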
Conda Packages
Although there are several methods and packages available for creating an MPI-capable conda environment for Jupyter Notebook, this tutorial will focus on using one of the more popular methods that utilizes a combination of ipyparallel, mpi4py, and MPICH.
First, activate your conda environment, and check which channels you have installed and their order:
[user@hpc-login-p01 ~]$ conda activate myenv
(myenv) [user@hpc-login-p01 ~]$ conda config --show channels
channels:
  - conda-forge
  - defaults
  - bioconda
Experience has shown that the combination of packages from conda-forge works best; mixing these packages across channels can sometimes produce unexpected results. If conda-forge is not present, or is not the first channel in the list, enter the following:
(myenv) [user@hpc-login-p01 ~]$ conda config --add channels conda-forge
If conda-forge is not present in your configuration, it will be added at the top of the list. If it is already present, the command above will move it to the top.
Now you can proceed with installing the necessary additional packages:
(myenv) [user@hpc-login-p01 ~]$ conda install mpich
(myenv) [user@hpc-login-p01 ~]$ conda install mpi4py
(myenv) [user@hpc-login-p01 ~]$ conda install ipyparallel
If everything installs properly, you can proceed to the next step.
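As an optional sanity check (not part of the original steps), you can confirm that mpi4py and ipyparallel import cleanly and report which MPI library mpi4py was built against; the exact version strings will vary:

(myenv) [user@hpc-login-p01 ~]$ python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
(myenv) [user@hpc-login-p01 ~]$ python -c "import ipyparallel; print(ipyparallel.__version__)"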
Running Jupyter
Begin by scheduling an interactive job. In this example, we will use 3 nodes, 24 tasks, and 8 tasks per node:
[user@hpc-login-p01 ~]$ srun -N 3 --ntasks=24 --ntasks-per-node=8 -t 5:00:00 --pty /bin/bash --login
Once you are connected to the primary run node (in this example, hpc-throughput-p01), make sure to activate your conda environment, then start Jupyter Notebook:
[user@hpc-throughput-p01 ~]$ conda activate myenv
(myenv) [user@hpc-throughput-p01 ~]$ jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0
...
[I 15:01:29.649 NotebookApp] Loading IPython parallel extension
[I 15:01:29.650 NotebookApp] Serving notebooks from local directory: /home/u/user
[I 15:01:29.650 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 15:01:29.650 NotebookApp] http://hpc-throughput-p01:8889/?token=5eec98287a8e7c43c7c8d7edf52c0e62ee8f01346df5a24c
[I 15:01:29.650 NotebookApp]  or http://127.0.0.1:8889/?token=5eec98287a8e7c43c7c8d7edf52c0e62ee8f01346df5a24c
[I 15:01:29.650 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:01:29.661 NotebookApp]
    To access the notebook, open this file in a browser:
        file:///home/u/user/.local/share/jupyter/runtime/nbserver-1130563-open.html
    Or copy and paste one of these URLs:
        http://hpc-throughput-p01:8889/?token=5eec98287a8e7c43c7c8d7edf52c0e62ee8f01346df5a24c
     or http://127.0.0.1:8889/?token=5eec98287a8e7c43c7c8d7edf52c0e62ee8f01346df5a24c
Now in a separate terminal window, open an SSH tunnel to the appropriate node and port:
ssh -N -L 8889:hpc-throughput-p01:8889 [email protected]
Now open your browser and paste the 127.0.0.1 URL (including the token) that was printed when you started Jupyter Notebook:
http://127.0.0.1:8889/?token=5eec98287a8e7c43c7c8d7edf52c0e62ee8f01346df5a24c
This will put you into a Jupyter Notebook session. The next step is to start a new notebook by clicking "New" (upper right corner) and selecting "Python 3" under the "Notebook" header.
Once you are in the new notebook, you can enter and test the code, which is covered in the next section.
Running MPI Code
The following is an example that uses some key statements found in "ipyparallel" and "mpi4py". The former establishes a "cluster" among the nodes assigned to your SLURM job and facilitates communication between those hosts. The latter provides Python bindings to MPI for parallel computation across multiple processors in a distributed memory space.
Please copy and paste the following example code into a single cell in the notebook:
import ipyparallel as ipp

def mpi_example():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return f"Hello World from rank {comm.Get_rank()}. total ranks={comm.Get_size()}. host={MPI.Get_processor_name()}"

# request an MPI cluster with 24 engines
with ipp.Cluster(controller_ip="*", engines="mpi", n=24) as rc:
    # get a broadcast_view on the cluster which is best
    # suited for MPI style computation
    view = rc.broadcast_view()
    # run the mpi_example function on all engines in parallel
    r = view.apply_sync(mpi_example)
    # Retrieve and print the result from the engines
    print("\n".join(r))
# at this point, the cluster processes have been shutdown
Now hit "Shift" and "Return" together to execute the code block. It should look something like the following:
Wait a few moments and you will see the "engines" start. Once they start, you should get a result like the following:
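The lines below are an illustration of the expected form of the output, reconstructed from the string returned by mpi_example; the exact ordering, rank-to-host mapping, and hostnames may differ on your system:

Hello World from rank 0. total ranks=24. host=hpc-throughput-p01
Hello World from rank 1. total ranks=24. host=hpc-throughput-p01
...
Hello World from rank 8. total ranks=24. host=hpc-throughput-p02
...
Hello World from rank 23. total ranks=24. host=hpc-throughput-p03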
Note above that we have 24 tasks spread across 3 nodes, 8 per node (hpc-throughput-p01 through hpc-throughput-p03). Each task/rank reports in from its host in our cluster.
When you're finished, stop the notebook by selecting "File" and then "Close and Halt":
Once you get back to the main page, hit "Quit" and then enter "exit" once you return to the terminal. This will end your job.
MPI Code Explanation
For clarity, let's walk through the elements of the code presented above. First, note the import statement:
import ipyparallel as ipp
This imports the "ipyparallel" package under the name "ipp" and enables us to establish a cluster among the nodes reserved for our job. Next, we have the following function:
def mpi_example():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return f"Hello World from rank {comm.Get_rank()}. total ranks={comm.Get_size()}. host={MPI.Get_processor_name()}"
This function uses "mpi4py" to provide the parallel programming elements that run on our cluster. We import "MPI" from "mpi4py" and create the communicator handle "comm" from MPI.COMM_WORLD. This lets each parallel distributed process report its rank, the total number of ranks, and its hostname.
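To illustrate how the same communicator can be used for actual distributed computation (not part of the original example), here is a minimal sketch of a function that sums each rank's value across the cluster with an MPI reduction; it could be dispatched with view.apply_sync() exactly like mpi_example:

def mpi_sum_example():
    # each engine imports mpi4py locally, just as in mpi_example
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    # every rank contributes its own rank number; allreduce sums the values
    # and makes the result available on all ranks
    total = comm.allreduce(rank, op=MPI.SUM)
    return f"rank {rank} sees sum of all ranks = {total}"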
Next we see the following:
# request an MPI cluster with 24 engines
with ipp.Cluster(controller_ip="*", engines="mpi", n=24) as rc:
    # get a broadcast_view on the cluster which is best
    # suited for MPI style computation
    view = rc.broadcast_view()
    # run the mpi_example function on all engines in parallel
    r = view.apply_sync(mpi_example)
    # Retrieve and print the result from the engines
    print("\n".join(r))
# at this point, the cluster processes have been shutdown
Here we establish the "Cluster" using ipp and pass in the arguments controller_ip="*", engines="mpi", and n=24. The "controller_ip" parameter lets the controller accept connections from all hosts, "engines" starts the engines (worker processes) using MPI on all of our nodes, and "n=24" specifies the number of engines to start, which should correspond to the total tasks specified when we initiated the SLURM "srun" command. We assign the resulting cluster to the handle "rc".
We run "rc" as a contiguous execution block (via the Python "with" statement) whereby the cluster is created, synchronized, and the function "mpi_example" is run.
Here's another interesting way we could do this:
import os
import ipyparallel as ipp

def mpi_example():
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    return f"Hello World from rank {comm.Get_rank()}. total ranks={comm.Get_size()}. host={MPI.Get_processor_name()}"

cluster = ipp.Cluster(controller_ip="*", engines="mpi", n=24)
cluster.start_cluster_sync()
c = cluster.connect_client_sync()
c.wait_for_engines(n=int(os.getenv('SLURM_NTASKS')))
c.ids

# get a broadcast_view on the cluster which is best
# suited for MPI style computation
view = c.broadcast_view()
r = view.apply_sync(mpi_example)
# Retrieve and print the result from the engines
print("\n".join(r))

cluster.stop_cluster_sync()
In this example, we define our function as before (note we also "import os" since we need it later), and then we start and connect our cluster as independent steps. We then "wait_for_engines" to start and display their ids (c.ids). If you entered just the first part of this code (up through c.ids) and executed the block, you could open a separate terminal and log in to one of the other nodes assigned to your job. There, enter "top -c -u <username>" and you will see several worker "engines" running and waiting for input.
We then establish a "broadcast_view" object of the running cluster, and finally "apply" our "mpi_example" code.
Note that in this case the cluster will continue to run until we stop it and clean up with the statement:
cluster.stop_cluster_sync()
Summary
As you can see, there are several ways to establish a running MPI "cluster" within a set of SLURM-assigned job nodes. Hopefully this tutorial helps you distinguish between establishing the cluster using MPI and the parallel code that runs on it from Jupyter Notebook.
More Information
CategoryHPC