HPC Cluster Lmod
Lmod Overview
The official website describes Lmod as
- "Lmod is a Lua based module system that easily handles the MODULEPATH Hierarchical problem. Environment Modules provide a convenient way to dynamically change the users’ environment through modulefiles. This includes easily adding or removing directories to the PATH environment variable. Modulefiles for Library packages provide environment variables that specify where the library and header files can be found."
What this means is that you can have multiple environments for your applications and swap between them easily. This simplifies what you need to know in order to start working with the applications installed on the HPC cluster. It also makes it easy for the cluster to support multiple versions of the same application without worrying about configuration conflicts.
The official user guide is a good reference. The most common commands are summarized in the text below and in the video that follows.
Your main interaction with the Lmod system is the module command. If you run the command at a terminal prompt without any arguments, you get a help screen similar to the abridged one below.
$ module

Modules based on Lua: Version 7.7.14  2017-11-16 16:23 -07:00
    by Robert McLay mclay@tacc.utexas.edu

module [options] sub-command [args ...]

Help sub-commands:
------------------
  help                              prints this message
  help                module [...]  print help message from module(s)

Loading/Unloading sub-commands:
-------------------------------
  load | add          module [...]  load module(s)
  try-load | try-add  module [...]  Add module(s), do not complain if not found
  del | unload        module [...]  Remove module(s), do not complain if not found
  swap | sw | switch  m1 m2         unload m1 and load m2
  purge                             unload all modules
  refresh                           reload aliases from current list of modules.
  update                            reload all currently loaded modules.

>>> [ some output text removed ] <<<

-------------------------------------------------------------------------------------------------------------------------------------------------------
Lmod Web Sites

  Documentation:    http://lmod.readthedocs.org
  Github:           https://github.com/TACC/Lmod
  Sourceforge:      https://lmod.sf.net
  TACC Homepage:    https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

  To report a bug please read http://lmod.readthedocs.io/en/latest/075_bug_reporting.html
-------------------------------------------------------------------------------------------------------------------------------------------------------
As you can see, there are many options that can be specified. Let's look at a few common ones that you will tend to use regularly.
To list all available modules, use the module avail command. The list is large, but some modules are used by other modules, so you won't necessarily need to load all of them. Modules can specify dependencies and even load other modules automatically, so you don't need to know those dependencies yourself. For modules that don't do this, the error message will state which module must be loaded first.
$ module avail

-------------------------------------------- /opt/tcnjhpc/modulefiles --------------------------------------------
   R/3.4.0           cudahome/9.2.88 (D)   go/1.10.1               matlab/R2016a        rstudio_singularity/3.5.1
   amber/16          cudahome/10.0.130     go/1.10.4               matlab/R2018a (D)    sagemath/8.3
   amber/18 (D)      cudnn/5.0.5           go/1.11 (D)             miniconda3/4.3.11    sas/9.4
   ambertest/18      cudnn/5.1.10          gromacs+plumed/2018.4   mopac/2016           soapdenovo2/2.04-r241
   anaconda3/4.4.0   cudnn/6.0.21 (D)      gromacs-test/2018.3     mpmc/r285            sondovac/1.3
   athena++/1.0      elsa-tutorial/1.0     gromacs/5.1.4.avx2      mrbayes/3.2.6        sop-gpu/2.0
   athena/4.2        espresso/3.3.1        gromacs/5.1.4.sse41     netpbm/10.73.10      spades/3.13.0
   blast+/2.7.1      fastx/0.0.14          gromacs/5.1.4           nwchem-cuda/6.8      swift/0.96.2
   cafemol/3.0       ffmpeg/2.8            gromacs/2018.3 (D)      nwchem/6.8           transcriptomics/1.0
   cassandra/1.2     garli/2.01            hoomd-blue/2.1.6        omnia                trimmomatic/0.38
   cpptraj/18.01     gaussian/16.avx       jdk/1.8.0_102           paraview/5.5.2       trinity/2.5.1
   cuda/8.0          gaussian/16.avx2      jdk/1.8 (D)             paraview/5.6.0 (D)   visit/2.13.2
   cuda/9.2 (D)      gaussian/16.legacy    jellyfish/2.2.10        pnetcdf/1.11.0       vmd/1.9.3
   cuda/10.0         gaussian/16.sse4      julia/1.0.1             prinseq/0.20.4
   cudahome/8.0.44   gaussian/16 (D)       lammps/2017.03.31       python/2.7.12
   cudahome/8.0.61   gnuplot/5.0.6         mathematica/11.3.0      python/3.6.0 (D)

--------------------------------------------- /cm/local/modulefiles ----------------------------------------------
   cluster-tools/8.1   dot (L)          gcc/7.2.0 (L)     lua/5.3.4    module-info   openldap
   cmd                 freeipmi/1.5.7   ipmitool/1.8.18   module-git   null          shared (L)

--------------------------------------------- /usr/share/modulefiles ---------------------------------------------
   DefaultModules (L)

--------------------------------------------- /cm/shared/modulefiles ---------------------------------------------
   acml/gcc-int64/64/5.3.1           cuda10.0/nsight/10.0.130     default-environment (L)                  lapack/gcc/64/3.8.0
   acml/gcc-int64/fma4/5.3.1         cuda10.0/profiler/10.0.130   fftw2/openmpi/gcc/64/double/2.1.5        mpich/ge/gcc/64/3.2.1
   acml/gcc-int64/mp/64/5.3.1        cuda10.0/toolkit/10.0.130    fftw2/openmpi/gcc/64/float/2.1.5         mpiexec/0.84_432
   acml/gcc-int64/mp/fma4/5.3.1      cuda80/blas/8.0.61           fftw3/openmpi/gcc/64/3.3.7               mvapich2/gcc/64/2.3b
   acml/gcc/64/5.3.1                 cuda80/fft/8.0.61            gdb/8.0.1                                netcdf/gcc/64/4.5.0
   acml/gcc/fma4/5.3.1               cuda80/nsight/8.0.61         globalarrays/openmpi/gcc/64/5.6.1        netperf/2.7.0
   acml/gcc/mp/64/5.3.1              cuda80/profiler/8.0.61       hdf5/1.10.1                              openblas/dynamic/0.2.20
   acml/gcc/mp/fma4/5.3.1            cuda80/toolkit/8.0.61        hdf5_18/1.8.20                           openmpi/gcc/64/1.10.7 (L)
   blacs/openmpi/gcc/64/1.1patch03   cuda92/blas/9.2.88           hpl/2.2                                  scalapack/openmpi/gcc/64/2.0.2
   blas/gcc/64/3.8.0                 cuda92/fft/9.2.88            hwloc/1.11.8                             sge/2011.11p1
   bonnie++/1.97.3                   cuda92/nsight/9.2.88         intel-tbb-oss/ia32/2018_20180618oss      slurm/17.11.8 (L)
   cuda10.0/blas/10.0.130            cuda92/profiler/9.2.88       intel-tbb-oss/intel64/2018_20180618oss   tcnjhpc (L)
   cuda10.0/fft/10.0.130             cuda92/toolkit/9.2.88        iozone/3_471                             torque/6.1.1

  Where:
   L:  Module is loaded
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
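The closing hints in that listing point to narrower searches than the full table. The following is a minimal sketch, assuming it runs in a shell where Lmod has defined the module command; gromacs and chemistry are only example search terms, and the lmod_found variable is a hypothetical marker used to record whether the commands ran.

```shell
# Sketch: narrow the module listing instead of reading the full table.
# "gromacs" and "chemistry" are example search terms, not required names.
if command -v module >/dev/null 2>&1; then
    module avail gromacs        # only modules whose name contains "gromacs"
    module spider gromacs       # every known version, with loading details
    module keyword chemistry    # search module descriptions for a keyword
    lmod_found=yes
else
    # Outside the cluster (or in a shell without Lmod) there is nothing to query.
    echo "module command not found; run this in a shell on the cluster" >&2
    lmod_found=no
fi
```

The guard only keeps the snippet from failing where Lmod is not installed; at an interactive prompt on the cluster the three module commands can be typed directly.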
To load a particular module into your current environment, use the module load or module add command. They are equivalent.
So what does it mean when you load an environment? Programs typically require certain environment variables to be configured to point to configuration files, needed support libraries, etc. The module file contains all those commands with the appropriate values so you don't need to set them manually. Let's look at a sample module file for the AMBER application.
help([[
This module loads the Amber 18 environment
Version 18
]])

whatis("Name: Amber")
whatis("Version 18")
whatis("Category: chemistry")
whatis("Description: Amber")
whatis("Keyword: amber")

family("amber")

-- always_load("cudahome/8.0.61");
always_load("cudahome/9.2.88");

local amberHome = "/opt/tcnjhpc/amber18"

setenv("AMBERHOME", amberHome)
prepend_path("PATH", pathJoin(amberHome, "bin"))
prepend_path("PYTHONPATH", pathJoin(amberHome, "lib/python2.7/site-packages"))
prepend_path("LD_LIBRARY_PATH", pathJoin(amberHome, "lib"))

-- Prevent SLURM error after step runs
-- "srun: error: _server_read: fd 18 error reading header: Connection reset by peer"
-- This seems to only occur with Amber at the moment so we'll stick it in here
setenv("HYDRA_LAUNCHER_EXTRA_ARGS", "--input none")
Much of this file is boilerplate that describes what the module does. Let's focus on a few lines from the module.
always_load("cudahome/9.2.88");

local amberHome = "/opt/tcnjhpc/amber18"

setenv("AMBERHOME", amberHome)
prepend_path("PATH", pathJoin(amberHome, "bin"))
prepend_path("PYTHONPATH", pathJoin(amberHome, "lib/python2.7/site-packages"))
prepend_path("LD_LIBRARY_PATH", pathJoin(amberHome, "lib"))
These lines load another module called cudahome and set a local variable called amberHome, which is used in the lines that follow. The remaining lines set up an AMBERHOME environment variable that the AmberTools will use and add the Amber directories to the existing search path environment variables. What all this means is that without this module, which lets you simply run module add amber, you would need to run the commands below (while avoiding any typos) before using any of the AmberTools.
export AMBERHOME=/opt/tcnjhpc/amber18
export PATH=/opt/tcnjhpc/amber18/bin:$PATH
export PYTHONPATH=/opt/tcnjhpc/amber18/lib/python2.7/site-packages:$PYTHONPATH
export LD_LIBRARY_PATH=/opt/tcnjhpc/amber18/lib:$LD_LIBRARY_PATH
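Each prepend_path call in the module file amounts to putting a directory at the front of a colon-separated list. The sketch below shows that idea with a hypothetical shell helper. The function name mirrors the Lua call, but this is not Lmod's implementation, and DEMO_PATH and DEMO_EMPTY are throwaway variables invented for the demonstration. It also avoids the stray colon that a plain export leaves behind when the variable starts out empty.

```shell
# Hypothetical helper, not part of Lmod: put DIR at the front of the
# colon-separated variable named NAME (NAME must be a plain variable name).
# Usage: prepend_path NAME DIR
prepend_path() {
    eval "_pp_current=\${$1:-}"          # read the current value of NAME
    if [ -z "$_pp_current" ]; then
        export "$1=$2"                   # empty or unset: no stray ':'
    else
        export "$1=$2:$_pp_current"      # otherwise DIR goes in front
    fi
}

# Throwaway variables, so the real PATH is left untouched.
DEMO_PATH="/usr/bin:/bin"
unset DEMO_EMPTY

prepend_path DEMO_PATH "/opt/tcnjhpc/amber18/bin"
prepend_path DEMO_EMPTY "/opt/tcnjhpc/amber18/lib"

echo "$DEMO_PATH"     # /opt/tcnjhpc/amber18/bin:/usr/bin:/bin
echo "$DEMO_EMPTY"    # /opt/tcnjhpc/amber18/lib
```

Unlike this sketch, Lmod also reverses these changes when the module is unloaded, which is a large part of why the module file is preferable to hand-written exports.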
Common Lmod Commands
Now that you have a good idea of the benefits of module files, let's use the ones provided on the ELSA HPC cluster.
List available modules
module avail
Load a module
If you don't specify the module version for an application, it will usually default to the latest version available unless a specific one has been pinned as the default (module avail will list (D) next to the default version).
module load python
or
module add python
Load a specific module version
module add python/3.6.0
Unload a module
module unload python
Unload a specific module version
module unload python/3.6.0
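To change versions without a separate unload and load, the swap sub-command listed in the help screen can be used. This is a minimal sketch, assuming the module command is defined in the current shell; the two python versions are the ones shown in the module avail listing, and swap_demo is a hypothetical marker variable that only records whether the commands ran.

```shell
# Sketch: switch from one python version to another in a single step.
if command -v module >/dev/null 2>&1; then
    module load python/2.7.12
    module swap python/2.7.12 python/3.6.0   # unload the first, load the second
    module list                              # python/3.6.0 should now be listed
    swap_demo=ran
else
    # Without Lmod there is nothing to swap; report it and carry on.
    echo "module command not found; run this in a shell on the cluster" >&2
    swap_demo=skipped
fi
```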
List which modules are currently loaded
module list
You can also use module avail and look for the modules with an (L) next to them, which indicates they are currently loaded.
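In a script, the same check can be made without reading the listing by eye. The sketch below assumes that Lmod keeps the names of the loaded modules in the LOADEDMODULES environment variable as a colon-separated list; is_loaded is a hypothetical helper name, not an Lmod command.

```shell
# Hypothetical helper: succeed if NAME appears in LOADEDMODULES, either as a
# full entry ("python/3.6.0") or as the name before a version ("python").
# Usage: is_loaded NAME
is_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"* | *":$1/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo against a sample value inside a subshell, so the real variable
# in the calling shell is not changed.
(
    LOADEDMODULES="gcc/7.2.0:slurm/17.11.8:python/3.6.0"
    is_loaded python        && echo "python is loaded"
    is_loaded python/2.7.12 || echo "python/2.7.12 is not loaded"
)
```

A job script could use such a check to load a module only when it is missing, for example is_loaded python || module load python.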