GNU Parallel

Overview

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. In most cases, GNU Parallel is used to run "serial" or multi-threaded processes that are self-contained on a single node. GNU Parallel can be used to spawn multiple jobs across one or more nodes, but it would not normally be used to run processes that are, in and of themselves, multi-node MPI jobs.

One of the more typical uses of GNU Parallel is to run the same application or application workflow (with the same option parameters) on many different input sets. SLURM can provide this same functionality using its built-in Job Arrays feature. In most cases, we recommend using Job Arrays for these types of jobs, but users may choose to accomplish the same work using GNU Parallel (a minimal Job Array sketch is shown below for comparison). There are several cautions to consider when using GNU Parallel with SLURM to distribute job processes across more than one node; these are discussed later in this KB.
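
For comparison, here is a minimal sketch of a SLURM Job Array covering the same use case (running one script over several input values). It assumes the TestScript.py script and data.set input file introduced later in this KB; the time limit and the line-indexing scheme are just illustrative choices:

### Sample job_array.script (sketch)
#!/bin/bash
#SBATCH --job-name=Job_Array_Test
#SBATCH --array=1-4
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00

# Each array task picks the line of data.set matching its array index
KVALUE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" data.set)
./TestScript.py "$KVALUE"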

This document describes the basics of how to use GNU Parallel properly with the SLURM workload manager, especially in cases where users may want to run many jobs on more than one node. There are many options and techniques for using GNU Parallel, and this document only provides some basic examples (it is not intended to be an in-depth tutorial). For more information, please refer to the links provided at the end of this KB.

Single Node Operation

There are a few different ways that GNU Parallel can be used to launch several identical tasks on one node.

Command Line

In this example, we want to run the Python script below on several different input parameters:

#!/usr/bin/env python3
## TestScript.py
import argparse
import os
import time

# Report which node the task landed on and which input value it received
myhost = os.uname()[1]
parser = argparse.ArgumentParser()
parser.add_argument('kValue')
args = parser.parse_args()

print("Hostname: ", myhost)
print("K = ", args.kValue)
time.sleep(10)  # stand in for some real work

Note that the script needs the python3 shebang shown above as its first line, and must be made executable (chmod +x TestScript.py), so that it can be invoked as ./TestScript.py in the examples below.

We will place our input values in a separate file (data.set):

200
225
34
4500

To launch our GNU Parallel tasks, we simply need something like the following (run on the Login node):

module load parallel
parallel --jobs 4 srun -N1 -n1 -c2 --output GNUparallel.out ./TestScript.py :::: data.set

The "parallel" command instructs that multiple instances of "TestScript.py" should be run on the values contained in the file "data.set". The "srun" command ensures that each of the parallel tasks will be passed to SLURM for handling. The parameters "-N1 -n1 -c2 --output GNUparallel.out" instruct the SLURM resource manager to assign:

  • One node for each parallel job (-N1)
  • One task for each parallel job (-n1)
  • Two cpus for each task (-c2)
  • Put all output from the runs into GNUparallel.out (--output GNUparallel.out)

In this case, the running parallel tasks were assigned to hpc-throughput-p01 by SLURM and appear as follows:

[Screenshot: Running Parallel Processes]
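
To see these tasks yourself while they are running, a simple queue listing from another terminal will show the individual jobs created by GNU Parallel/srun:

# Show your running and pending jobs in the SLURM queue
squeue -u $USER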

The output produced:

Hostname:  hpc-throughput-p01
K =  4500
Hostname:  hpc-throughput-p01
K =  225
Hostname:  hpc-throughput-p01
K =  34
Hostname:  hpc-throughput-p01
K =  200

Note the four (4) colons (::::) after the "srun" section. This indicates that a parallel job should be created for each of the values stored in a FILE. We could also have used three (3) colons (:::) to pass our data in-line:

parallel --jobs 4 srun -N1 -n1 -c2 --output GNUparallel.out ./TestScript.py ::: 4500 225 34 200

SLURM Interactive Job

We could also schedule an interactive SLURM job, and then run a GNU Parallel command right from the command line of the assigned node. We would need to anticipate how many cores we need for all of our parallel tasks, and make our request for a node accordingly. For example, to be safe we might launch the following from the Login node:

srun -N1 -n20 -c2 -t 1:00:00 --pty /bin/bash

This would provide us with a typical compute node with 40 cores in an interactive session (for more information, see the KB article on SLURM Interactive Jobs). Once we are assigned to our node at the prompt, we can use something like the following:

module load parallel
parallel --jobs 20 --results GNUparallel.out ./TestScript.py :::: data.set

Note that it is not necessary to specify "srun" with the number of tasks and cpus, since that was done when we established our interactive session. Each GNU Parallel "job" corresponds to one "task" with 2 "cpus" assigned to it. Also note that we use the "--results" flag here: this is GNU Parallel's own option for capturing output (it stores each task's stdout and stderr under the given path), whereas "--output" in the previous example is an instruction to SLURM/srun.
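
For reference, "--results GNUparallel.out" creates a small directory tree rather than a single file. Something like the following would show what was captured (the exact layout can vary between GNU Parallel versions):

# List the per-input result directories created by --results
ls GNUparallel.out/1/
# Each input value gets its own directory containing stdout and stderr
cat GNUparallel.out/1/4500/stdout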

SLURM Batch Job

We can also run our GNU Parallel tasks using a SLURM batch script, and leverage some of the built-in environmental variables provided by SLURM to help us execute our tasks. For example, let's run the same Python script across multiple single input values as we did before:

### Sample job.script
#!/bin/bash
#SBATCH --job-name=GNU_Parallel_Test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00

module load parallel 

parallel --jobs $SLURM_NTASKS --results GNUparallel.out ./TestScript.py :::: data.set

In this example we can simply change "--ntasks-per-node" at the top of the batch script, and the resulting task count will be used to set the number of jobs launched by GNU Parallel via the SLURM environment variable "$SLURM_NTASKS".
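
If you prefer each task to be dispatched and tracked by SLURM as its own job step (as in the command-line example earlier), you could also wrap the script in "srun" inside the same batch job. A minimal sketch, assuming the allocation from the batch script above:

parallel --jobs $SLURM_NTASKS srun -N1 -n1 -c $SLURM_CPUS_PER_TASK --output GNUparallel.out ./TestScript.py :::: data.set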

Multi-Node Operation

In most cases, users are advised to perform multi-node dispatch of task processes using a SLURM batch job script. However, for completeness, a brief discussion of launching these jobs from the command line is also provided below.

Command Line

Although it is possible to launch multi-node distributed parallel tasks from the command line, this technique is not without hazards (including accidentally dispatching a large number of intensive tasks directly to the login node). It is therefore strongly recommended to use the batch job technique described in the following section.

In this case, we perform a simple modification of the single node command line example as follows:

parallel --jobs 4 srun -N2 -n2 -c1 --output GNUparallel.out ./TestScript.py ::: 4500 225 34 200

This produces 4 jobs, with the inputs 4500, 225, 34, and 200, respectively. Each GNU Parallel job is assigned 2 nodes, 2 tasks, and 1 cpu-per-task, and the jobs are dispatched to SLURM via "srun".

The screen capture below shows the running state of the jobs in the SLURM queue. Note that two nodes are assigned, one of which is hpc-throughput-p05:

[Screenshot: Running Job Status]

If we look at the resulting output stored in the file GNUparallel.out, we see the following:

[Screenshot: GNU Parallel Results]

Note that a value is produced by both tasks, one on each node (recall this is a very simple script that just echoes its input value). This demonstrates that we actually used 2 nodes for each job.

SLURM Batch Job

Single Input Value Jobs

A SLURM batch script can easily be formulated for a multi-node job by combining the single-node batch script with the "srun" wrapping shown above for the multi-node command-line example; a minimal sketch follows. However, it is quite common for users to want to pass multiple input parameters to a multi-node (or single-node) job. This is addressed in the next section.
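
The following is only a sketch (the node and task counts are placeholders to adjust for your workload); the key point is that each task is wrapped in "srun" so that SLURM can place it on one of the allocated nodes:

### Sample multi-node job.script (sketch)
#!/bin/bash
#SBATCH --job-name=GNU_Parallel_MultiNode_Test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00

module load parallel

# Wrap each task in srun so SLURM can place it on either of the two allocated nodes
parallel --jobs $SLURM_NTASKS srun -N1 -n1 -c2 --output GNUparallel.out ./TestScript.py :::: data.set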

Multiple Input Value Jobs

Passing multiple parameters to GNU Parallel requires some finesse, and an additional layer of complexity is added when wrapping "parallel" around "srun". GNU Parallel and HPC documentation generally neglect such scenarios, so this example is provided as guidance. The method works just as well for single-node jobs as it does for multi-node ones. The key thing to remember is that, in order for SLURM to properly dispatch any GNU Parallel job, "parallel" must be used in conjunction with "srun".

One naive approach to passing multiple input parameters to a series of parallel tasks is to bundle all of the values for a given task into a single array element (separated by spaces or another delimiter), and then pass that element as a single "input" to GNU Parallel via srun. This will generally fail, because the spaces or other delimiters separating the values cause problems unless they are properly escaped - and escaping these values in-line with GNU Parallel and "srun" is very tricky.

A better, more direct method is to store the input parameters (for each parallel job) in separate arrays - one array per parameter - and then pass those array elements to GNU Parallel. However, the delimiter used to separate the input values from the GNU Parallel command must be modified for this to work. Recall that when we were working with single input values, we used the delimiter ":::" for in-line values and "::::" for files containing one input value per line.

When constructing a GNU Parallel line using "srun" and passing multiple input values to each job task, we use the same delimiter for the first value, and then append a "+" to the delimiter for each additional value. For example:

parallel --jobs 4 srun -N1 -n1 -c2 --output GNUparallel.out ./TestScript.py ::: ${commandsk[@]} :::+ ${commandsj[@]} :::+ ${commandsm[@]}

In the example above, we have stored all of the values for our first input in the array "commandsk", the second in "commandsj" and the third in "commandsm". We could also do something like:

parallel --jobs 4 srun -N1 -n1 -c2 --output GNUparallel.out ./TestScript.py ::: 45 50 51 67 :::+ 2 1 4 5 :::+ 10 20 30 40

For clarity, let's expand the input values that will be used for each of our four (4) jobs:

  • Job1: 45 2 10
  • Job2: 50 1 20
  • Job3: 51 4 30
  • Job4: 67 5 40

We could do the same thing using input files and the delimiters ":::: file.val1 ::::+ file.val2 ::::+ file.val3", where each file contains the list of values for input 1, input 2, and input 3, respectively.
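
For example (each file would contain one value per line, exactly like data.set above):

parallel --jobs 4 srun -N1 -n1 -c2 --output GNUparallel.out ./TestScript.py :::: file.val1 ::::+ file.val2 ::::+ file.val3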

Finally, let's present a job batch script example where we generate input permutations, store each in a separate array, and then pass these to GNU Parallel using "srun":

### Job_script.sh example
#!/bin/bash
#SBATCH --job-name=GNU_Parallel_Test
#SBATCH --nodes=5
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00
#SBATCH --mem=1G # Adjust memory as needed
module load Python # Adjust version if needed
module load parallel # Load GNU Parallel module if available

# Define Python script and parameters
PYTHON_SCRIPT="TestScript.py"
J_VALUES=(0 1 2 3) # 4 values for the j input
K_VALUES=("A" "B") # 2 values for the k input
M_VALUES=(1.09854114e+13 1.93069773e+13 3.39322177e+13 5.96362332e+13) # 4 values for the m input

# Create arrays to store each input value
commandsk=()
commandsj=()
commandsm=()

# Generate simulation commands
for Mh in "${M_VALUES[@]}"; do
    for j in "${J_VALUES[@]}"; do
        for k in "${K_VALUES[@]}"; do
            ## You could do something more meaningful here, like subject each value to some pre-job calculation
            commandsk+=("$k")
            commandsj+=("$j")
            commandsm+=("$Mh")
        done
    done
done

# Dispatch the GNU parallel tasks to SLURM via srun
parallel --jobs $SLURM_NTASKS srun --nodes=1 --ntasks=1 --cpus-per-task=2 $PYTHON_SCRIPT ::: ${commandsk[@]} :::+ ${commandsj[@]} :::+ ${commandsm[@]}

In this example, we generate a total of 32 jobs/tasks - 2 k values for each of 4 j values (8 combinations), and 8 of these for each of 4 m values (8*4=32). Each of those 32 tasks receives one permutation of the 3 input values.

More Information
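
  • GNU Parallel home page and documentation: https://www.gnu.org/software/parallel/
  • GNU Parallel tutorial: https://www.gnu.org/software/parallel/parallel_tutorial.html
  • SLURM srun documentation: https://slurm.schedmd.com/srun.html
  • man parallel (on the cluster, after "module load parallel")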


CategoryHPC