Using AlphaFold on Matilda
Overview
AlphaFold is a protein structure prediction package that was originally developed as a Docker implementation. Because Docker containers are not permitted on Matilda, AlphaFold has been reconfigured to load as a standard module application. Although every effort has been made to test AlphaFold in this configuration, we cannot guarantee error-free operation in all cases.
Basic Instructions
To use AlphaFold, please load the modulefile:
module load AlphaFold
This will load all paths and external applications needed to run AlphaFold.
Next, please do NOT execute the run_alphafold.py script as directed on the AlphaFold website. Instead, use the following:
run_alphafold.sh <args>
Databases
The AlphaFold databases have already been downloaded and placed in the main installation directory. Their location is stored in the environment variable "$ALPHAFOLD_DATA", which is set when the module is loaded. Please use this variable whenever you reference the databases, as shown below:
run_alphafold.sh -d $ALPHAFOLD_DATA <args>
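Because the variable only exists after the module has been loaded, it can be worth checking it before starting a long run. The following is a minimal sketch of such a check; the function name check_alphafold_data is hypothetical and is not provided by the AlphaFold module.

```shell
# Hypothetical helper (not part of the AlphaFold module): confirm that
# ALPHAFOLD_DATA is set and points to an existing directory.
check_alphafold_data() {
    if [ -z "${ALPHAFOLD_DATA:-}" ]; then
        echo "ALPHAFOLD_DATA is not set; run 'module load AlphaFold' first" >&2
        return 1
    fi
    if [ ! -d "$ALPHAFOLD_DATA" ]; then
        echo "ALPHAFOLD_DATA is not a directory: $ALPHAFOLD_DATA" >&2
        return 1
    fi
    return 0
}
```

The function returns a non-zero status when the check fails, so a script can call it immediately before run_alphafold.sh and stop early if the databases cannot be found.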
GPU Functionality
AlphaFold is GPU-capable and can be run on the GPU nodes by specifying the "--gres" option in your job script. For example:
#SBATCH --gres=gpu:4
Testing has shown that AlphaFold can use all four GPUs on a Matilda GPU node and will, in fact, attempt to use all of them. This may improve the performance of the program substantially.
However, when crafting your job script, please use "--gres=gpu:4" or the "--exclusive" flag to ensure you have the entire GPU node for AlphaFold. AlphaFold is indiscriminate about which GPUs it attempts to use and may collide with a process that is already running on that node. Such a collision will not affect the job that is already running, but it will cause AlphaFold to fail.
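Putting the pieces together, a complete job script might look like the sketch below. The file name alphafold_job.sh, the job name, and the time limit are placeholders chosen for illustration, not Matilda requirements; adjust them to your workload. The commands write the job script to a file so that it can be inspected before it is submitted.

```shell
# Write a minimal Slurm job script to alphafold_job.sh
# (the file name is arbitrary and chosen for this example).
cat > alphafold_job.sh <<'EOF'
#!/bin/bash
# Placeholder job name and time limit (assumptions; change as needed).
#SBATCH --job-name=alphafold
#SBATCH --time=24:00:00
# Request one node and all four of its GPUs so AlphaFold has the node's GPUs to itself.
#SBATCH --nodes=1
#SBATCH --gres=gpu:4

module load AlphaFold

# Arguments given after the script name on the sbatch command line
# are forwarded unchanged to run_alphafold.sh.
run_alphafold.sh -d "$ALPHAFOLD_DATA" "$@"
EOF
```

The script can then be submitted with "sbatch alphafold_job.sh <args>", where <args> are the remaining run_alphafold.sh arguments for your prediction.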