HPC Cluster Lmod

Revision as of 19:02, 4 June 2019

Lmod Overview

The official website describes Lmod as

"Lmod is a Lua based module system that easily handles the MODULEPATH Hierarchical problem. Environment Modules provide a convenient way to dynamically change the users’ environment through modulefiles. This includes easily adding or removing directories to the PATH environment variable. Modulefiles for Library packages provide environment variables that specify where the library and header files can be found."

What this means is that you can have multiple environments for your applications and swap between them easily. This simplifies what you need to know in order to start working with the applications installed on the HPC cluster. It also makes it easy for the cluster to support multiple versions of the same application without worrying about configuration conflicts.
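Under the hood, this is ordinary environment-variable bookkeeping: which version of a tool is "active" is decided largely by search-path order. Here is a minimal shell sketch of the idea, using made-up install prefixes (no Lmod required):

```shell
# Two hypothetical installs of the same application:
V1_BIN="/opt/app/1.0/bin"
V2_BIN="/opt/app/2.0/bin"
BASE_PATH="/usr/bin:/bin"

# "Loading" version 1 means prepending its bin directory,
# so its executables are found first:
SEARCH_PATH="$V1_BIN:$BASE_PATH"
echo "$SEARCH_PATH" | cut -d: -f1    # /opt/app/1.0/bin

# "Swapping" to version 2 replaces that entry, so the two
# versions never conflict:
SEARCH_PATH="$V2_BIN:$BASE_PATH"
echo "$SEARCH_PATH" | cut -d: -f1    # /opt/app/2.0/bin
```

The module load, swap, and unload sub-commands perform exactly this kind of bookkeeping for you, along with setting any other variables an application needs.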

The official user guide is a good reference. A summary of the most common commands appears in the text below and in the video that follows.

Your main interaction with the Lmod system is the module command. If you run the command at a terminal prompt without any arguments, you get a help screen similar to the abridged one below.

$ module

Modules based on Lua: Version 7.7.14  2017-11-16 16:23 -07:00
    by Robert McLay mclay@tacc.utexas.edu

module [options] sub-command [args ...]

Help sub-commands:
------------------
  help                              prints this message
  help                module [...]  print help message from module(s)

Loading/Unloading sub-commands:
-------------------------------
  load | add          module [...]  load module(s)
  try-load | try-add  module [...]  Add module(s), do not complain if not found
  del | unload        module [...]  Remove module(s), do not complain if not found
  swap | sw | switch  m1 m2         unload m1 and load m2
  purge                             unload all modules
  refresh                           reload aliases from current list of modules.
  update                            reload all currently loaded modules.

>>> [ some output text removed ] <<< 

    -------------------------------------------------------------------------------------------------------------------------------------------------------
Lmod Web Sites

  Documentation:    http://lmod.readthedocs.org
  Github:           https://github.com/TACC/Lmod
  Sourceforge:      https://lmod.sf.net
  TACC Homepage:    https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

  To report a bug please read http://lmod.readthedocs.io/en/latest/075_bug_reporting.html
    -------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, there are many sub-commands and options that can be specified.

To list all available modules, use the module avail command. The list is large, but many modules exist only to support other modules, so you won't need to load all of them yourself. A module can declare its dependencies and even load them automatically, so you don't need to know them in advance. For modules that don't, the error message will clearly state which module must be loaded first.

$ module avail

----------------------------------------------------------------- /opt/tcnjhpc/modulefiles ------------------------------------------------------------------
   R/3.4.0                cudahome/9.2.88    (D)    go/1.10.1                    matlab/R2016a            rstudio_singularity/3.5.1
   amber/16               cudahome/10.0.130         go/1.10.4                    matlab/R2018a     (D)    sagemath/8.3
   amber/18        (D)    cudnn/5.0.5               go/1.11               (D)    miniconda3/4.3.11        sas/9.4
   ambertest/18           cudnn/5.1.10              gromacs+plumed/2018.4        mopac/2016               soapdenovo2/2.04-r241
   anaconda3/4.4.0        cudnn/6.0.21       (D)    gromacs-test/2018.3          mpmc/r285                sondovac/1.3
   athena++/1.0           elsa-tutorial/1.0         gromacs/5.1.4.avx2           mrbayes/3.2.6            sop-gpu/2.0
   athena/4.2             espresso/3.3.1            gromacs/5.1.4.sse41          netpbm/10.73.10          spades/3.13.0
   blast+/2.7.1           fastx/0.0.14              gromacs/5.1.4                nwchem-cuda/6.8          swift/0.96.2
   cafemol/3.0            ffmpeg/2.8                gromacs/2018.3        (D)    nwchem/6.8               transcriptomics/1.0
   cassandra/1.2          garli/2.01                hoomd-blue/2.1.6             omnia                    trimmomatic/0.38
   cpptraj/18.01          gaussian/16.avx           jdk/1.8.0_102                paraview/5.5.2           trinity/2.5.1
   cuda/8.0               gaussian/16.avx2          jdk/1.8               (D)    paraview/5.6.0    (D)    visit/2.13.2
   cuda/9.2        (D)    gaussian/16.legacy        jellyfish/2.2.10             pnetcdf/1.11.0           vmd/1.9.3
   cuda/10.0              gaussian/16.sse4          julia/1.0.1                  prinseq/0.20.4
   cudahome/8.0.44        gaussian/16        (D)    lammps/2017.03.31            python/2.7.12
   cudahome/8.0.61        gnuplot/5.0.6             mathematica/11.3.0           python/3.6.0      (D)

------------------------------------------------------------------- /cm/local/modulefiles -------------------------------------------------------------------
   cluster-tools/8.1    dot            (L)    gcc/7.2.0       (L)    lua/5.3.4     module-info    openldap
   cmd                  freeipmi/1.5.7        ipmitool/1.8.18        module-git    null           shared   (L)

------------------------------------------------------------------ /usr/share/modulefiles -------------------------------------------------------------------
   DefaultModules (L)

------------------------------------------------------------------ /cm/shared/modulefiles -------------------------------------------------------------------
   acml/gcc-int64/64/5.3.1            cuda10.0/nsight/10.0.130      default-environment                    (L)    lapack/gcc/64/3.8.0
   acml/gcc-int64/fma4/5.3.1          cuda10.0/profiler/10.0.130    fftw2/openmpi/gcc/64/double/2.1.5             mpich/ge/gcc/64/3.2.1
   acml/gcc-int64/mp/64/5.3.1         cuda10.0/toolkit/10.0.130     fftw2/openmpi/gcc/64/float/2.1.5              mpiexec/0.84_432
   acml/gcc-int64/mp/fma4/5.3.1       cuda80/blas/8.0.61            fftw3/openmpi/gcc/64/3.3.7                    mvapich2/gcc/64/2.3b
   acml/gcc/64/5.3.1                  cuda80/fft/8.0.61             gdb/8.0.1                                     netcdf/gcc/64/4.5.0
   acml/gcc/fma4/5.3.1                cuda80/nsight/8.0.61          globalarrays/openmpi/gcc/64/5.6.1             netperf/2.7.0
   acml/gcc/mp/64/5.3.1               cuda80/profiler/8.0.61        hdf5/1.10.1                                   openblas/dynamic/0.2.20
   acml/gcc/mp/fma4/5.3.1             cuda80/toolkit/8.0.61         hdf5_18/1.8.20                                openmpi/gcc/64/1.10.7          (L)
   blacs/openmpi/gcc/64/1.1patch03    cuda92/blas/9.2.88            hpl/2.2                                       scalapack/openmpi/gcc/64/2.0.2
   blas/gcc/64/3.8.0                  cuda92/fft/9.2.88             hwloc/1.11.8                                  sge/2011.11p1
   bonnie++/1.97.3                    cuda92/nsight/9.2.88          intel-tbb-oss/ia32/2018_20180618oss           slurm/17.11.8                  (L)
   cuda10.0/blas/10.0.130             cuda92/profiler/9.2.88        intel-tbb-oss/intel64/2018_20180618oss        tcnjhpc                        (L)
   cuda10.0/fft/10.0.130              cuda92/toolkit/9.2.88         iozone/3_471                                  torque/6.1.1

  Where:
   L:  Module is loaded
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

To load a particular module into your current environment, use the module load or module add command. They are equivalent.

So what does it mean when you load an environment? Programs typically require that certain environment variables be configured to point to things like configuration files, support libraries, where the software is installed, etc. The module file contains all those commands with the appropriate values so you don't need to manually set them yourself.

Let's look at a sample module file for the AMBER application.


help([[
This module loads the Amber 18 environment

Version 18
]])

whatis("Name: Amber")
whatis("Version 18")
whatis("Category: chemistry")
whatis("Description: Amber")
whatis("Keyword: amber") 

family("amber")

-- always_load("cudahome/8.0.61");
always_load("cudahome/9.2.88");

local amberHome = "/opt/tcnjhpc/amber18"

setenv("AMBERHOME", amberHome)
prepend_path("PATH", pathJoin(amberHome, "bin"))
prepend_path("PYTHONPATH", pathJoin(amberHome, "lib/python2.7/site-packages"))
prepend_path("LD_LIBRARY_PATH", pathJoin(amberHome, "lib"))

-- Prevent SLURM error after step runs
-- "srun: error: _server_read: fd 18 error reading header: Connection reset by peer"
-- This seems to only occur with Amber at the moment so we'll stick it in here
setenv("HYDRA_LAUNCHER_EXTRA_ARGS", "--input none")

There is a lot of boilerplate in this file describing what the module does. Let's focus on just a few lines from the module.

always_load("cudahome/9.2.88");

local amberHome = "/opt/tcnjhpc/amber18"

setenv("AMBERHOME", amberHome)
prepend_path("PATH", pathJoin(amberHome, "bin"))
prepend_path("PYTHONPATH", pathJoin(amberHome, "lib/python2.7/site-packages"))
prepend_path("LD_LIBRARY_PATH", pathJoin(amberHome, "lib"))

These lines tell Lmod to load another module, cudahome version 9.2.88, and set a local variable called amberHome that is used in the next few lines. The remaining lines set up an AMBERHOME environment variable used by the AMBER programs and prepend entries to various search-path environment variables. The module configures all of this for you when you simply run module add amber. Without it, you would need to run the commands below (while avoiding any typos) before you could access AMBER.

export AMBERHOME=/opt/tcnjhpc/amber18
export PATH=/opt/tcnjhpc/amber18/bin:$PATH
export PYTHONPATH=/opt/tcnjhpc/amber18/lib/python2.7/site-packages:$PYTHONPATH
export LD_LIBRARY_PATH=/opt/tcnjhpc/amber18/lib:$LD_LIBRARY_PATH
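Unloading is the other half of the bargain: module unload amber removes exactly the entries the modulefile added, which is tedious to do by hand. As a rough illustration (Lmod tracks and reverses this for you), removing a single PATH entry manually looks something like:

```shell
AMBER_BIN="/opt/tcnjhpc/amber18/bin"
OLD_PATH="$AMBER_BIN:/usr/bin:/bin"

# Remove exactly the Amber entry, as "module unload amber" would:
# split on ":", drop the matching line, and rejoin with ":".
NEW_PATH=$(echo "$OLD_PATH" | tr ':' '\n' | grep -Fxv "$AMBER_BIN" | paste -sd: -)
echo "$NEW_PATH"    # /usr/bin:/bin
```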

Common Lmod Commands

Now that you have a good idea of the benefits of module files, let's look at some common commands to work with the modules provided on the ELSA HPC cluster.

List all available modules

module avail

Load a module

If you don't specify the module version for an application, it will usually default to the latest version available unless a specific one has been pinned as the default (module avail will list (D) next to the default version).

module load python

or

module add python

Load a specific module version

module add python/3.6.0

Unload a module

module unload python

Unload a specific module version

module unload python/3.6.0

List which modules are currently loaded

module list

You can also use module avail and look for the modules with an (L) next to them, which indicates that they are currently loaded.

Additional Resources

The official Lmod User documentation: https://lmod.readthedocs.io/en/latest/010_user.html

ELSA Lmod Tutorial

https://www.youtube.com/watch?v=HaaEEkFF8lg