Conda Essentials
Overview
Conda is an open source package and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. Unlike the Matilda HPC modules system, Conda lets each user build personalized environments, so that packages which would otherwise be incompatible can coexist side by side in separate environments and be activated as needed.
Initializing Conda
To facilitate the use of Conda, we have installed miniconda3 as part of the modules system. To use the most recent version of Conda installed on Matilda, load the module:
[hpctester02@hpc-login-p01 ~]$ module load miniconda3/current
It is STRONGLY RECOMMENDED that you use the "current" miniconda3 modulefile when setting up your environment. Older versions may not support many newer Conda-based applications.
Software that is released as Conda-only distributions cannot be readily ported to the HPC modules system. In these cases users should use the miniconda3 modulefile to prepare and build their own custom environments. This document covers some of the highlights of using Conda.
There are multiple versions of miniconda available. You can see the versions available using:
[hpctester02@hpc-login-p01 ~]$ module av miniconda3
------------------------------------------------------------------------------ /cm/shared/modulefiles -------------------------------------------------------------------------------
miniconda3/current (D)    miniconda3/4.9.2-py385    miniconda3/4.10.3-py385    miniconda3/24.1.2-py311
To load the default version, marked (D) in the listing above (STRONGLY RECOMMENDED):
[hpctester02@hpc-login-p01 ~]$ module load miniconda3
Or you may load any version you like by including the version number. For example:
[hpctester02@hpc-login-p01 ~]$ module load miniconda3/24.1.2-py311
Newer versions may have more features, bug fixes, or additional commands or options.
If you would like to ensure you always use the latest version of Conda installed on Matilda, you should use:
[hpctester02@hpc-login-p01 ~]$ module load miniconda3/current
The "current" modulefile will always point to the most recent version of Conda available on Matilda. Once the modulefile is loaded, it is necessary to run the "conda init" command the first time you set up Conda. Initialization will modify your ~/.bashrc file to use the "current" path of miniconda3. For example:
[hpctester02@hpc-login-p01 ~]$ module load miniconda3/current
[hpctester02@hpc-login-p01 ~]$ conda init (this will revise ~/.bashrc)
You may log out and then back in again to activate the base Conda environment (this will load the Conda base environment every time you log in to Matilda). Alternatively, you can "source" the ~/.bashrc file without logging out and in again:
[hpctester02@hpc-login-p01 ~]$ source ~/.bashrc
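If you are unsure whether "conda init" has already modified a startup file, a quick check along the following lines may help. This is a minimal sketch: the function name is hypothetical, and it assumes the block written by "conda init" begins with the "# >>> conda initialize >>>" marker comment, so adjust the pattern if your file looks different.

```shell
#!/usr/bin/env bash
# Report whether a shell startup file contains the block managed by "conda init".
# Assumption: the block starts with the marker comment "# >>> conda initialize >>>".
has_conda_init_block() {
    local rcfile="${1:-$HOME/.bashrc}"
    grep -q '^# >>> conda initialize >>>' "$rcfile" 2>/dev/null
}

# Check the current user's ~/.bashrc and print the result.
if has_conda_init_block "$HOME/.bashrc"; then
    echo "~/.bashrc already contains a conda init block"
else
    echo "~/.bashrc has no conda init block"
fi
```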
If you would rather not enter the Conda base environment at every login, copy the initialized ~/.bashrc to another file, then open ~/.bashrc and strip out the lines added by "conda init". Source the copy whenever you want to work with Conda. For example:
[hpctester02@hpc-login-p01 ~]$ cp ~/.bashrc ~/conda.bashrc
[hpctester02@hpc-login-p01 ~]$ vim ~/.bashrc (strip the Conda lines and save)
[hpctester02@hpc-login-p01 ~]$ source ~/conda.bashrc
(base) [hpctester02@hpc-login-p01 ~]$
This will allow you to stay out of the Conda base environment unless you choose to work with Conda.
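The copy-and-strip step can also be done without a text editor. The sketch below is one possible approach, not the only one: the function name is hypothetical, and it assumes the lines added by "conda init" sit between the "# >>> conda initialize >>>" and "# <<< conda initialize <<<" marker comments. Review the result before relying on it.

```shell
#!/usr/bin/env bash
# Save a copy of an rc file, then rewrite the original without the conda init block.
# Assumption: the block is delimited by the ">>> conda initialize >>>" and
# "<<< conda initialize <<<" marker comments.
strip_conda_block() {
    local rcfile="$1" backup="$2"
    # Keep the initialized version so it can be sourced when Conda is needed.
    cp "$rcfile" "$backup" || return 1
    # Write everything except the marked block back to the original file.
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$backup" > "$rcfile"
}

# Example usage (not run automatically):
#   strip_conda_block ~/.bashrc ~/conda.bashrc
#   source ~/conda.bashrc
```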
If you have already initialized Conda using an older version and wish to change to the latest, current version, make sure you exit the Conda base environment (if it is active) and then reinitialize to the new version. This will update your ~/.bashrc file. For example:
(base) [hpctester02@hpc-login-p01 ~]$ conda deactivate
[hpctester02@hpc-login-p01 ~]$ module load miniconda3/current
[hpctester02@hpc-login-p01 ~]$ conda init
[hpctester02@hpc-login-p01 ~]$ conda activate (or: source ~/.bashrc)
(base) [hpctester02@hpc-login-p01 ~]$
Conda environments created with older versions of miniconda3 should work with newer versions in most cases. However, issues have occasionally been seen when environments built with a newer version are imported into an older version.
Creating a Conda Environment
Once your Conda environment has been initialized, and you have activated the Conda base environment, use the "conda create" command to create a new environment. Environments are generally stored under the ".conda/envs" folder inside the user home directory. Conda environments can be located elsewhere by using the "--prefix=/path/to/environment" flag, but care must be exercised when managing environments in multiple and/or "non-default" locations.
To create a new environment named "test_env" in the default location, working from your base Conda environment (both recommended), you could use something like the following:
(base) [hpctester02@hpc-login-p01 ~]$ conda create --name test_env python=3.11
Or if you have not activated the Conda base environment:
[hpctester02@hpc-login-p01 ~]$ module load miniconda3
[hpctester02@hpc-login-p01 ~]$ conda create --name test_env python=3.11
To create this same environment under the "/projects/myproject" directory we might use something like the following. Note that "--name" and "--prefix" cannot be combined in one command; with "--prefix", the last component of the path becomes the environment directory:
module load miniconda3 (omit if you are already in the base environment)
conda create --prefix=/projects/myproject/test_env python=3.11
The above command will create the environment as a python 3.11 distribution in the directory "/projects/myproject/test_env".
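When environments live outside the default location, it can be useful to confirm that a given directory really is a Conda environment before trying to activate it. The sketch below relies on the fact that Conda keeps its package records in a "conda-meta" subdirectory of each environment. The function name and the path are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Return success if the given directory looks like a Conda environment,
# judged by the presence of its "conda-meta" subdirectory.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

# Hypothetical path, used here for illustration only.
env_dir="/projects/myproject/test_env"
if is_conda_env "$env_dir"; then
    echo "$env_dir looks like a Conda environment"
else
    echo "$env_dir is not a Conda environment (or does not exist)"
fi
```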
Activating the Environment
Use the "conda activate" command to enter your new environment. This should ideally be done with the base environment activated:
(base) [hpctester02@hpc-login-p01 ~]$ conda activate test_env
(test_env) [hpctester02@hpc-login-p01 ~]$
This will place us inside the virtual environment where we can now install packages if we wish. For example:
(test_env) [hpctester02@hpc-login-p01 ~]$ conda install numpy scipy matplotlib
(test_env) [hpctester02@hpc-login-p01 ~]$ pip install mrx-link
To deactivate the environment:
(test_env) [hpctester02@hpc-login-p01 ~]$ conda deactivate
(base) [hpctester02@hpc-login-p01 ~]$
Managing Available Environments
We can get a list of our available Conda environments using the following:
(base) [hpctester02@hpc-login-p01 ~]$ conda info --envs
To see high level information on our Conda environment:
(test_env) [hpctester02@hpc-login-p01 ~]$ conda info
Similarly, we can see all of the Conda packages installed for the activated environment using:
(test_env) [hpctester02@hpc-login-p01 ~]$ conda list
To see any pip packages installed in the active environment:
(test_env) [hpctester02@hpc-login-p01 ~]$ pip list
If we want to remove an environment permanently we can use (from inside the base environment):
(test_env) [hpctester02@hpc-login-p01 ~]$ conda deactivate
(base) [hpctester02@hpc-login-p01 ~]$ conda remove --name test_env --all
Software developers will sometimes provide a "YAML" file for creating a custom environment for the application. These files have the file extension *.yml. These can be used to create a Conda environment as follows:
(base) [hpctester02@hpc-login-p01 ~]$ conda env create -f myapp-linux.yml
YAML environment files contain information about Conda software channels to use as well as Conda and pip packages that should be installed to the environment.
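As a rough illustration of what such a file contains, the sketch below writes a small environment file with a channel list, Conda packages, and a pip section. The environment name, package choices, and function name are placeholders rather than the contents of any real application's file.

```shell
#!/usr/bin/env bash
# Write a small example environment file to the given path.
# All names and versions below are placeholders for illustration.
write_example_env_file() {
    local outfile="$1"
    cat > "$outfile" <<'EOF'
name: myapp
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - numpy
  - pip
  - pip:
      - mrx-link
EOF
}

# Example usage (not run automatically):
#   write_example_env_file myapp-linux.yml
#   conda env create -f myapp-linux.yml
```

Packages listed directly under "dependencies" are installed by Conda, while those nested under "pip:" are installed with pip inside the new environment.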
Conda Channels
Community supported software channels can be imported into Conda and used to install a wide variety of packages. Examples include bioconda and conda-forge. Users can specify channels to use during environment creation:
conda create -n test_env --channel conda-forge --channel bioconda <pkgs to install>
We can follow the channel specification with a list of packages to install.
Managing Conda Environment Locations
As mentioned previously, by default Conda will install your environments under the "~/.conda/envs" directory. Conda packages can take up quite a bit of space, and are similarly stored under "~/.conda/pkgs" by default. You can install a Conda environment to a different location using the "--prefix" flag:
conda env create --prefix /projects/myprojspace/someuser/myapp -f myapp-linux.yml
Please be aware however, that installing an environment in a non-default location means you will have to specify the complete path when activating the environment. For example:
conda activate /projects/myprojspace/someuser/myapp
You can alter the default location for Conda packages by creating a ".condarc" file. This has the advantage of not filling up your home directory space with packages - which is a somewhat common occurrence. A .condarc file might look something like the following:
pkgs_dirs:
  - /projects/projspace/someuser/conda/pkgs
channels:
  - bioconda
  - conda-forge
  - defaults
Place the .condarc file in the root of your home directory. If you have already used and initialized Conda, a .condarc file may already be present; if so, use a text editor to modify the file as desired rather than creating a new one.
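The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the package directory is whatever project path you pass in, and the function refuses to overwrite an existing .condarc so that earlier settings are not lost.

```shell
#!/usr/bin/env bash
# Create the package cache directory and write a new .condarc pointing at it.
# Arguments: <package directory> [path to .condarc, default ~/.condarc]
write_condarc() {
    local pkgs_dir="$1" condarc="${2:-$HOME/.condarc}"
    mkdir -p "$pkgs_dir" || return 1
    # Do not clobber an existing file; edit it by hand instead.
    if [ -e "$condarc" ]; then
        echo "refusing to overwrite existing $condarc" >&2
        return 1
    fi
    cat > "$condarc" <<EOF
pkgs_dirs:
  - $pkgs_dir
channels:
  - bioconda
  - conda-forge
  - defaults
EOF
}

# Example usage (not run automatically; the path is a placeholder):
#   write_condarc /projects/projspace/someuser/conda/pkgs
```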
Exporting a Conda Environment
You can share a Conda environment or transfer it to another machine using an export procedure from inside the activated environment. For example:
(base) [hpctester02@hpc-login-p01 ~]$ conda activate juplink
(juplink) [hpctester02@hpc-login-p01 ~]$ conda env export > juplink_export.yml
The file "juplink_export.yml" can then be transferred to another user or another computer and imported using the procedure mentioned previously.
See the KB article on exporting and importing Conda Environments for more detailed information.
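An exported file typically ends with a "prefix:" line recording where the environment lived on the source machine, which is rarely meaningful for the recipient. The sketch below shows one way to drop that line before sharing the file. The function name is hypothetical, and you should inspect your own export to confirm the line is present.

```shell
#!/usr/bin/env bash
# Copy an exported environment file, omitting any top-level "prefix:" line.
# Arguments: <exported file> <cleaned output file>
strip_env_prefix() {
    grep -v '^prefix:' "$1" > "$2"
}

# Example usage (not run automatically):
#   strip_env_prefix juplink_export.yml juplink_shared.yml
```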
Best Practices
If Conda environments are handled carefully, they provide a reliable way for users to create any number of custom environments to serve their needs. However, it is not uncommon for users to find that a newly installed environment does not work correctly, or that older environments no longer work after a new one is installed. Similarly, problems can be encountered when a user exports a working environment to another machine or user, whereupon the transferred environment does not install successfully or does not work as expected.
For some of the most common problems encountered with Conda environments, please refer to the KB article on Troubleshooting Conda and Python.
Presented in the following subsections are some best practices that can be used to prevent some of the most common issues users experience.
Building Conda Environments
When creating a Conda environment, make sure you are careful to install everything the environment needs under the activated, working Conda environment, whether you are using “conda install” or “pip install”. This will ensure that all packages are self-contained in the environment and will mitigate problems with packages missing from the export file, or version conflicts between packages. The easiest way to do this is to make sure you are inside the newly created and activated environment before doing your installations:
(base) [hpctester02@hpc-login-p01 ~]$ conda activate test_env
(test_env) [hpctester02@hpc-login-p01 ~]$ conda install <pkg>
(test_env) [hpctester02@hpc-login-p01 ~]$ pip install <pkg>
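In scripts, where there is no prompt to glance at, a small guard can confirm the intended environment is active before anything is installed. The sketch below assumes the CONDA_DEFAULT_ENV variable, which "conda activate" sets to the active environment's name. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only if the currently active Conda environment matches the expected name.
# Relies on CONDA_DEFAULT_ENV, which "conda activate" sets for the active environment.
require_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected environment '$expected' but found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# Example usage (not run automatically):
#   require_env test_env && conda install numpy
```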
Python Package Location
Avoid using pip and/or Python commands that will install packages in your /home directory. Some installation commands (depending on where they are run or the command options used) will place Python packages by default in the ~/.local directory. Because this directory is on your "sys.path" (the list of locations Python searches for packages) in any Conda environment you use, it is very common for users to wind up with two different versions of the same package in an environment: one located in the environment itself, and the other in ~/.local. The following installation situations should be avoided:
[hpctester02@hpc-login-p01 ~]$ python3 -m pip install <package> --user (inside or outside the activated environment)
(base) [hpctester02@hpc-login-p01 ~]$ pip install <package>
Both of the commands above will install packages in the ~/.local directory.
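To see whether packages have already accumulated there, you can look for per-version site-packages directories under ~/.local. The sketch below is a simple check with a hypothetical function name. It assumes the usual Linux layout of ~/.local/lib/pythonX.Y/site-packages.

```shell
#!/usr/bin/env bash
# Print each user-level site-packages directory found under <home>/.local.
# Assumes the usual Linux layout: ~/.local/lib/pythonX.Y/site-packages
list_user_site_packages() {
    local home_dir="${1:-$HOME}"
    local dir
    for dir in "$home_dir"/.local/lib/python*/site-packages; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        echo "$dir"
    done
}

# List the directories for the current user, if any exist.
list_user_site_packages "$HOME"
```

If any directories are listed, their contents may shadow or conflict with packages inside your Conda environments. Setting the PYTHONNOUSERSITE environment variable to 1 tells Python to leave the user site directory off "sys.path", which can help when diagnosing such conflicts.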
Conda Base Version
While newer versions of miniconda3 can usually run environments created with older versions, the reverse is sometimes not the case. Additionally, some Conda packages may not be compatible with older versions of miniconda3. Keep your miniconda3 version up to date by either using the "module load miniconda3/current" version when setting up Conda, or running the reinitialization procedure described under Initializing Conda.
More Information
This document provides only some basic guidance on a few of Conda's features and commands. For more information, you may find some of the following resources to be helpful: