Using Jupyter Notebook and Lab on Matilda
Overview
This tutorial assumes you have some basic knowledge of conda/miniconda and can set up a basic environment. On Matilda, miniconda3 is installed as a module to help you set up your conda environments. To use it:
module load miniconda3
Once you have initialized conda and created a basic environment (in this tutorial called "test-env"), you can install the Jupyter package and its dependencies into the environment, and you are then ready to run Jupyter Notebook on Matilda. Jupyter Lab is very similar, with a few small differences that are covered in a later section.
To install Jupyter Notebook, use the following (this example creates our test-env environment with Python 3.9):
conda create -n test-env python=3.9
conda activate test-env
conda install notebook
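Before starting a server, it can help to confirm that the `jupyter` command on your PATH really comes from the environment you just activated; a `jupyter` from some other environment would not see the packages installed here. The function below is a minimal sketch, not a Matilda-provided tool: the name `check_jupyter_env` is hypothetical, and it relies only on `$CONDA_PREFIX`, which `conda activate` sets to the active environment's directory.

```shell
# Hypothetical helper: check that `jupyter` on PATH belongs to the active conda env.
# Prints the path of the jupyter executable on success; explains the problem otherwise.
check_jupyter_env() {
  local exe
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no conda environment is active; run: conda activate test-env" >&2
    return 1
  fi
  if ! exe=$(command -v jupyter); then
    echo "jupyter is not on PATH; is the notebook package installed?" >&2
    return 1
  fi
  case $exe in
    "$CONDA_PREFIX"/*) echo "$exe" ;;
    *) echo "jupyter at $exe does not belong to $CONDA_PREFIX" >&2; return 1 ;;
  esac
}
```

Run `check_jupyter_env` right after `conda activate test-env`; a failure message usually means the install step went into a different environment.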
This tutorial covers two basic methods for running Jupyter Notebook on the Matilda HPC compute nodes: an interactive scheduled job and a batch (sbatch) job.
Interactive Scheduled Job
srun
The first method starts with a scheduled interactive job. For example:
srun -N 1 -c 1 -t 5:00:00 --pty /bin/bash --login
(For more information on the specifics of interactive runs, please refer to Job Script Examples.)
Once your job starts and you are logged into the compute node, use the following to start your notebook session:
module load miniconda3
source ~/.bashrc
conda activate test-env
jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0
Substitute the actual name of your environment for "test-env" and choose a port (here we used 8889; the default is 8888 if none is specified). You should see something like the following:
[I 13:49:58.728 NotebookApp] Serving notebooks from local directory: /home/u/username
[I 13:49:58.728 NotebookApp] Jupyter Notebook 6.3.0 is running at:
[I 13:49:58.728 NotebookApp] http://hpc-throughput-p01:8889/?token=47ef2216d4ce8e14f30967def52d6e8dd6a0db0514692b00
[I 13:49:58.728 NotebookApp]  or http://127.0.0.1:8889/?token=47ef2216d4ce8e14f30967def52d6e8dd6a0db0514692b00
[I 13:49:58.728 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:49:58.741 NotebookApp]
    To access the notebook, open this file in a browser:
        file:///home/u/username/.local/share/jupyter/runtime/nbserver-1239796-open.html
    Or copy and paste one of these URLs:
        http://hpc-throughput-p01:8889/?token=47ef2216d4ce8e14f30967def52d6e8dd6a0db0514692b00
     or http://127.0.0.1:8889/?token=47ef2216d4ce8e14f30967def52d6e8dd6a0db0514692b00
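If you save Jupyter's messages to a file (Jupyter writes them to standard error), a small filter can pull out the 127.0.0.1 URL that you will later paste into your browser. This is a minimal sketch, not a Matilda-provided tool: the function name `local_url` and the log file name `jupyter.out` are hypothetical, and the pattern assumes the output format shown above. Jupyter Lab URLs (`/lab?token=...`) match as well.

```shell
# Hypothetical helper: print the first http://127.0.0.1:<port>/... URL found on stdin.
# Matches Notebook URLs (/?token=...) and Lab URLs (/lab?token=...).
local_url() {
  grep -o 'http://127\.0\.0\.1:[0-9][0-9]*/[^ ]*' | head -n 1
}

# Example (jupyter.out is a hypothetical file name):
#   jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0 2> jupyter.out &
#   sleep 5; local_url < jupyter.out
```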
From another shell on your workstation, initiate a port forwarding session to the compute node as shown below:
ssh -N -L 8889:hpc-throughput-p01:8889 username@hpc-login.oakland.edu
Make sure to substitute the actual port you used, the actual node you are logged into (in this example we are running on hpc-throughput-p01), and your own username.
You will be asked for your password. After you enter it, the session will appear to "hang"; this is normal, so go on to the next step.
DO NOT CLOSE THE FORWARDED TERMINAL WINDOW UNTIL YOU FINISH USING NOTEBOOK.
Now open a browser and enter:
http://127.0.0.1:8889/?token=47ef2216d4ce8e14f30967def52d6e8dd6a0db0514692b00
You should now be connected to the Jupyter notebook. Again, substitute the actual port you used for "8889" above; your token will also be different from the one shown. Simply copy and paste the 127.0.0.1 link from your own output.
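The tunnel command in this procedure always follows the same pattern, so it can be generated from the port and node name. The helper below is a hypothetical sketch, not part of Matilda's tooling; it assumes the login host hpc-login.oakland.edu, as used in the example job script on this page. The optional fourth argument uses ssh's `-L local_port:host:remote_port` form to forward a different port on your workstation, which helps if 8889 is already taken locally.

```shell
# Hypothetical helper: print the ssh port-forwarding command for a notebook.
#   $1 = port Jupyter is listening on (e.g. 8889)
#   $2 = compute node running Jupyter (e.g. hpc-throughput-p01)
#   $3 = your Matilda username (defaults to the current user)
#   $4 = local port on your workstation (defaults to $1)
# Assumes hpc-login.oakland.edu as the login host, as in the example job script.
tunnel_cmd() {
  local port=$1 node=$2 user=${3:-$(whoami)} local_port=${4:-$1}
  echo "ssh -N -L ${local_port}:${node}:${port} ${user}@hpc-login.oakland.edu"
}
```

For example, `tunnel_cmd 8889 hpc-throughput-p01 username` prints `ssh -N -L 8889:hpc-throughput-p01:8889 username@hpc-login.oakland.edu`, which you then run on your workstation. If you pass a different local port, use that port in the browser URL instead of 8889.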
salloc
If you want to attach to and detach from your notebook without killing the interactive job, you can use salloc to allocate resources and then connect to them using srun. For example:
salloc -N 1 -c 1 -t 30:00
srun --jobid=<#####> --pty /bin/bash
Here <#####> is the job ID reported by salloc. Once attached to the allocated resources, simply follow the instructions for starting your Jupyter Notebook as described above (under srun). The salloc/srun interactive job technique is described in more detail in the Example Job Scripts wiki page.
** Please note that if you exit the interactive session, the notebook will be killed, but your node reservation will remain. When you reconnect with srun, you will have to restart the notebook. **
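To reattach later you need the job ID that salloc reported. If you no longer have it, `squeue -h -u "$USER" -o '%i %T %N'` lists your jobs as job ID, state, and node list without a header. The hypothetical filter below picks the first running job from that output; it is a sketch that assumes you have only one running allocation.

```shell
# Hypothetical helper: read "JOBID STATE NODES" lines on stdin
# (as produced by: squeue -h -u "$USER" -o '%i %T %N')
# and print the job ID of the first job whose state is RUNNING.
first_running_job() {
  awk '$2 == "RUNNING" { print $1; exit }'
}

# Example: reattach to your running allocation
#   jobid=$(squeue -h -u "$USER" -o '%i %T %N' | first_running_job)
#   srun --jobid="$jobid" --pty /bin/bash
```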
SBATCH Job
You can schedule a Jupyter Notebook run using a conventional "sbatch" job script as shown in the example below:
Example Job Script
{{{
#!/bin/bash
# ====Sample Job Script====
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --job-name=jupyter-notebook

# get tunneling info
XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
cluster="hpc-login"
port=8889

# print tunneling instructions to jupyter-log
echo "
Command to create ssh tunnel:
ssh -N -L ${port}:${node}:${port} ${user}@${cluster}.oakland.edu

Use a browser on your local machine to go to:
http://localhost:${port}
" > jupyter-log

# load modules or conda environments here
module load miniconda3

# activate conda environment
cd
source ~/.bashrc
conda activate test-env

# Run Jupyter
jupyter-notebook --no-browser --port=${port} --ip=${node}
}}}
The jupyter-log file generated by this script contains instructions for connecting to the batch job. In summary: once the job is running, open a separate terminal on your workstation and create an ssh tunnel to the node where the job is running. For example:
ssh -N -L 8889:hpc-throughput-p01:8889 username@hpc-login.oakland.edu
You will be asked for your password. After you enter it, the session will appear to "hang"; this is normal, so go on to the next step.
DO NOT CLOSE THE FORWARDED TERMINAL WINDOW UNTIL YOU FINISH USING NOTEBOOK.
Once the tunnel is established, open a local browser session and enter:
http://localhost:8889
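Substitute your actual port for "8889" above; it is set by the port variable in the example job script. Unless you have set a password, Jupyter will ask for your token, which appears in the URL that Jupyter prints to the job's output file (by default slurm-<jobid>.out in the directory you submitted the job from).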
Overall, this is very similar to the procedure used for the interactive run except the notebook session is established via the job script instead of interactively.
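The batch procedure can be partly automated from the login node. The sketch below assumes the job script above is saved as `jupyter.sbatch` (a hypothetical file name). It captures the job ID with `sbatch --parsable`, polls `squeue` until the job is running, and then prints the node name to use in your tunnel. The function name, poll interval, and state handling are illustrative choices, not site policy.

```shell
# Hypothetical helper: wait until a Slurm job is running, then print its node list.
#   $1 = job ID, e.g. from: jobid=$(sbatch --parsable jupyter.sbatch)
# Returns non-zero if the job leaves the queue or reaches any other state.
wait_for_node() {
  local jobid=$1 state
  while :; do
    # %T = job state (PENDING, RUNNING, ...); empty once the job has left the queue
    state=$(squeue -h -j "$jobid" -o '%T' 2>/dev/null)
    case $state in
      RUNNING)
        squeue -h -j "$jobid" -o '%N'   # %N = node(s) allocated to the job
        return 0 ;;
      PENDING|CONFIGURING)
        sleep 10 ;;
      *)
        echo "job $jobid is not pending or running (state: ${state:-not in queue})" >&2
        return 1 ;;
    esac
  done
}

# Example:
#   jobid=$(sbatch --parsable jupyter.sbatch)
#   node=$(wait_for_node "$jobid") && echo "Job $jobid is running on $node; see jupyter-log"
```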
Precautions and Notes
There are several issues to keep in mind when running Jupyter Notebooks on Matilda. These are highlighted below.
Ports
The default port for Jupyter notebooks is "8888". Since others may be running a notebook on the node where your job is running, that port may already be in use. If so, Jupyter may report an error or start on a different port, so always check the port shown in the URLs it prints. To avoid conflicts, increment the port number until you find one that is not being used. You can set the port on the command line in an interactive job session, or through the port variable in the job script.
Generally, any port number between 1024 and 65535 can be used, unless another service is already using it.
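Incrementing the port by hand can be scripted. The sketch below is a hypothetical helper, not a Matilda tool; it assumes the iproute2 `ss` utility is available on the compute node (`-H`, which suppresses the header line, needs a reasonably recent version). It must run on the node where Jupyter will run, since it only sees that node's listening sockets.

```shell
# Hypothetical helper: print the first TCP port >= $1 (default 8888) that nothing
# on this node is listening on. Assumes the iproute2 `ss` utility is installed.
#   ss flags: -H no header, -t TCP, -l listening sockets, -n numeric ports
find_free_port() {
  ss -Htln | awk -v port="${1:-8888}" '
    # Column 4 is "address:port" (e.g. 0.0.0.0:8888 or [::]:8888); keep only the port.
    { sub(/.*:/, "", $4); used[$4] = 1 }
    END {
      while (port in used) port++
      if (port > 65535) exit 1
      print port
    }'
}
```

In the example job script, `port=8889` could become `port=$(find_free_port 8889)` with this function defined earlier in the script. Another process could still grab the port between the check and Jupyter starting, so confirm the port in the URL Jupyter prints.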
Login Node
Please exercise care if running Jupyter Notebooks on the login node. The login node is not meant for intensive jobs, and notebooks consuming inordinate resources may be killed without warning.
Whenever practical, you should run your notebook on a compute node as a scheduled job (interactive or batch).
Compute Node
Installing packages on a compute node with "pip install" requires additional options:
pip install "PACKAGENAME" --trusted-host pypi.org --trusted-host files.pythonhosted.org
Security
You can secure your Jupyter notebook either with a unique token or with a password. If you set a password for your notebook (discussed below), a token will not be generated by default.
Tokens
When you run a Jupyter Notebook, it is possible for someone else to connect to your session. If you have not set up a password for your notebook, the "jupyter-notebook" command generates an authentication token that you use to access the session. You will see something like the following:
    To access the notebook, open this file in a browser:
        file:///home/u/username/.local/share/jupyter/runtime/nbserver-1237569-open.html
    Or copy and paste one of these URLs:
        http://hpc-throughput-p01:8889/?token=e9624d77e74d00265f88abdd0691510311b94e4355a77a04
     or http://127.0.0.1:8889/?token=e9624d77e74d00265f88abdd0691510311b94e4355a77a04
In the case above, use one of the following in your browser:
http://127.0.0.1:8889/?token=e9624d77e74d00265f88abdd0691510311b94e4355a77a04
or
http://localhost:8889/?token=e9624d77e74d00265f88abdd0691510311b94e4355a77a04
DO NOT USE the compute node name in your browser as this will result in a connection failure.
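If you only have the node-name form of the URL (as in the first link above, or in the output of `jupyter notebook list`, which prints the URLs of your running servers), you can rewrite it to point at localhost before pasting it into your browser. This is a minimal sed-based sketch; the function name `to_localhost` is hypothetical, and it assumes your tunnel uses the same local port as the notebook.

```shell
# Hypothetical helper: rewrite http://<any-host>:<port>/... URLs on stdin so that
# they point at localhost, where the ssh tunnel listens on your workstation.
to_localhost() {
  sed 's#http://[^/:]*:\([0-9][0-9]*\)/#http://localhost:\1/#g'
}

# Example:
#   echo 'http://hpc-throughput-p01:8889/?token=...' | to_localhost
```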
Passwords
Alternatively, you may set up a password to use with your Jupyter Notebook sessions. To create the necessary configuration file, enter the following on your Matilda account:
module load miniconda3
conda activate test-env
jupyter notebook --generate-config
(Substitute the actual name of your conda environment for "test-env" where Jupyter packages are installed.) This will create a file named:
~/.jupyter/jupyter_notebook_config.py
Next, generate a password for your notebooks:
jupyter notebook password
You will be prompted to enter a password twice (the second time for confirmation). Use this password to log in to your notebook sessions.
NOTE: If you choose to assign a password to your notebook, a token will NOT be generated!
Jupyter Lab
Most of the guidelines presented above for Jupyter Notebook will work for Jupyter Lab, particularly with respect to connecting to the Jupyter Lab instance running on one of Matilda's nodes. Presented below are a few of the notable differences.
Installation
The only difference from Notebook will be the package that is installed:
conda create -n test-env python=3.9
conda activate test-env
conda install -c conda-forge jupyterlab
Launching
Once you have a job running on one of Matilda's nodes as described above, launch Jupyter Lab using the following:
jupyter lab --no-browser --port=8889 --ip=0.0.0.0
Note that the URL produced for connecting to Jupyter Lab is slightly different from the one for Notebook (it includes /lab). The URL:
http://127.0.0.1:8889/lab?token=5f5d58dc087a5298dd5afb238d6b41e635c82d90866fa1ab
...will be what you paste into your local browser to connect to Jupyter Lab (after you have forwarded your connection as shown previously for Notebook).
When connected, you should see the Jupyter Lab interface in your browser.