HPC Cluster Hardware Resources

From HPC Docs
Revision as of 19:20, 31 May 2019 by Admin (talk | contribs)

Networking/Interconnects

All nodes listed in the table on this page have one or two network connections. All nodes with a processor family of Broadwell, Skylake Silver or Skylake Gold have 10Gbps Ethernet network interfaces. Other nodes (e.g. Harpertown and Nehalem) have 1Gbps Ethernet network interfaces. In addition, some nodes, as indicated in the table, include an Infiniband interconnect, which is an alternate high-speed, low-latency network connection. ELSA currently uses Infiniband FDR, which provides 56Gbps speeds. You can control whether your job uses the Infiniband interface by modifying your SLURM job submission script.

Network Storage

The ELSA cluster uses network-based storage that is shared among all nodes for storing personal files, project/research files, course files and the HPC applications. There are two identical storage servers: one is located in the STEM cluster room and the other in the Green Hall datacenter. Data from the STEM storage server is regularly (i.e. weekly) replicated to the one in Green Hall. There is a total of approximately 2PB of raw storage (about 1.2PB of usable storage once redundancy and spares are accounted for). The storage server is Linux-based, using the ZFS on Linux filesystem and NFS for sharing with the cluster nodes. It is based on the Disruptive Storage Workshop from the Johns Hopkins School of Public Health and BioTeam.

Data Transfer Node

A data transfer node (DTN) is used for transferring large files in and out of a cluster. It is designed to handle high-speed, high-volume transfers. The ELSA DTN contains 12TB of disk storage for temporarily holding file transfers. It also has a 40Gbps Ethernet interface and Infiniband FDR interconnect to maximize throughput.
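To put the link speeds quoted on this page in perspective, the following is a minimal sketch that estimates a best-case transfer time from file size and line rate. The function name and the 1 TB example size are illustrative assumptions, not part of the ELSA configuration, and real transfers will be slower because of protocol overhead, disk speed and network contention.

```python
def ideal_transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Lower bound on transfer time: payload bits divided by the line rate."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


# Line rates quoted on this page, in Gbps.
LINKS = {
    "1Gbps Ethernet": 1,
    "10Gbps Ethernet": 10,
    "40Gbps Ethernet (DTN)": 40,
    "Infiniband FDR": 56,
}

if __name__ == "__main__":
    size = 1e12  # hypothetical 1 TB dataset (decimal terabyte)
    for name, gbps in LINKS.items():
        minutes = ideal_transfer_seconds(size, gbps) / 60
        print(f"{name}: {minutes:.1f} min")
```

By the same arithmetic, filling the DTN's 12TB of temporary storage over its 40Gbps Ethernet interface would take at least 2400 seconds (40 minutes) at full line rate.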

PerfSONAR

PerfSONAR is a network performance testing and monitoring system. It regularly runs bandwidth and latency tests and, if issues arise, helps pinpoint the location in the network path that is causing them.

Node Configurations

The following describes the contents of columns in the tables below.

  • Node Name = name of the node server. Login nodes (login001, login002, login003, login004) are accessible via the elsa.hpc.tcnj.edu load-balancer (e.g. using SSH) from the campus network (wired or wireless) or via the TCNJ VPN. The development node dev001 is accessible via dev1.hpc.tcnj.edu. Other nodes are not meant to be directly accessed.
  • Processor Family = the generation of processor in the node: Skylake Gold > Skylake Silver > Broadwell > Nehalem > Harpertown
  • Available Cores = these are the processing cores that compute jobs can use
  • Reserved Cores = these cores are reserved for system use and not available to user jobs
  • RAM Memory = how much memory the node contains
  • Infiniband = Infiniband is a high speed interconnect (network) connection. FDR = 56Gbps
  • GPU Count = number of GPU accelerators in the node
  • NVIDIA GPU Type = the model of the GPU accelerators in the node
  • Queue Membership = which queues (SLURM partitions) this node is a member of. A node can be a member of multiple queues. Note that some queues are used for internal purposes (e.g. remoteviz, interactive) and should not be used for submitting your jobs except under certain conditions. Please see the SLURM Partitions page for more information on the specifications of these queues/partitions.
Node Name | Processor Family | Available Cores | Reserved Cores | RAM Memory | Infiniband | GPU Count | NVIDIA GPU Type | Queue Membership(s) | Notes
login001 Harpertown 8 0 32G n/a 0 n/a n/a Public hostname: elsa.hpc.tcnj.edu
login002 virtual 2 0 4G n/a 0 n/a n/a Public hostname: elsa.hpc.tcnj.edu
login003 virtual 2 0 4G n/a 0 n/a n/a Public hostname: elsa.hpc.tcnj.edu
login004 virtual 2 0 4G n/a 0 n/a n/a Public hostname: elsa.hpc.tcnj.edu
dev001 Broadwell 20 0 256G n/a 1 GTX 1080 n/a Public hostname: dev1.hpc.tcnj.edu
node001 Nehalem 7 1 48G n/a 0 n/a interactive  
node002 Nehalem 7 1 48G n/a 0 n/a interactive  
node003 Nehalem 7 1 48G n/a 0 n/a interactive  
node004 Nehalem 7 1 48G n/a 0 n/a interactive  
node005 Nehalem 7 1 48G n/a 0 n/a interactive  
node006 Nehalem 7 1 48G n/a 0 n/a short, interactive  
node007 Nehalem 7 1 48G n/a 0 n/a short, interactive  
node008 Nehalem 7 1 48G n/a 0 n/a short, interactive  
node009 Nehalem 7 1 48G n/a 0 n/a short, interactive  
node010 Nehalem 7 1 48G n/a 0 n/a short, interactive  
node011 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node012 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node013 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node014 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node015 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node016 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node017 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node018 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node019 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node020 Nehalem 7 1 48G n/a 0 n/a short, normal, interactive  
node021 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node022 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node023 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node024 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node025 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node026 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node027 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node028 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node029 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node030 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive  
node031 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node032 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node033 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node034 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node035 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node036 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node037 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node038 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node039 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node040 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node041 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node042 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node043 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node044 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node045 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node046 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node047 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node048 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node049 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node050 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node051 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node052 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node053 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node054 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node055 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node056 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node057 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node058 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node059 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node060 Nehalem 7 1 48G n/a 0 n/a short, normal, long, interactive, nolimit  
node061 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node062 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node063 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node064 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node065 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node066 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node067 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node068 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node069 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node070 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node071 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node072 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node073 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node074 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node075 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node076 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node077 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node078 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node079 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node080 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node081 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node082 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive  
node083 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive, nolimit  
node084 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive, nolimit  
node085 Skylake Gold 32 0 192G FDR 0 n/a short, normal, long, interactive, nolimit  
node131 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node132 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node133 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node134 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node135 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node136 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node137 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
node138 Broadwell 32 0 128G FDR 0 n/a short, normal, long, interactive, nolimit  
gpu-node001 Broadwell 20 0 256G FDR 4 GTX 1080 gpu  
gpu-node002 Broadwell 20 0 256G FDR 4 GTX 1080 gpu  
gpu-node003 Broadwell 20 0 256G FDR 4 GTX 1080 gpu  
gpu-node004 Broadwell 20 0 256G FDR 4 GTX 1080 gpu  
gpu-node005 Broadwell 20 0 256G FDR 4 GTX 1080 shortgpu  
gpu-node006 Skylake Silver 20 0 192G FDR 8 GTX 1080Ti gpu  
gpu-node007 Skylake Silver 20 0 192G FDR 8 GTX 1080Ti gpu  
gpu-node008 Skylake Silver 20 0 192G FDR 8 GTX 1080Ti gpu  
gpu-node009 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node010 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node011 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node012 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node013 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node014 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node015 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node016 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node017 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
gpu-node018 Skylake Gold 32 0 384G FDR 4 RTX 2080 gpu  
viz-node001 Skylake Gold 40 0 768G FDR 4 Tesla V100 remoteviz  
viz-node002 Skylake Gold 40 0 768G FDR 4 Tesla V100 remoteviz
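
When deciding which queue to submit to, it can help to treat the table above as data and filter it by your job's requirements. The sketch below does this for a handful of rows copied from the table. The Node class and the candidates function are hypothetical helpers written for illustration; they are not part of SLURM or of any ELSA tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    family: str
    cores: int         # "Available Cores" column
    ram_gb: int        # "RAM Memory" column, in GB
    gpus: int          # "GPU Count" column
    gpu_type: str      # "NVIDIA GPU Type" column
    partitions: tuple  # "Queue Membership(s)" column


# A few representative rows copied from the table above (not the full list).
NODES = [
    Node("node001", "Nehalem", 7, 48, 0, "n/a", ("interactive",)),
    Node("node061", "Skylake Gold", 32, 192, 0, "n/a",
         ("short", "normal", "long", "interactive")),
    Node("node131", "Broadwell", 32, 128, 0, "n/a",
         ("short", "normal", "long", "interactive", "nolimit")),
    Node("gpu-node001", "Broadwell", 20, 256, 4, "GTX 1080", ("gpu",)),
    Node("gpu-node005", "Broadwell", 20, 256, 4, "GTX 1080", ("shortgpu",)),
    Node("gpu-node006", "Skylake Silver", 20, 192, 8, "GTX 1080Ti", ("gpu",)),
    Node("gpu-node009", "Skylake Gold", 32, 384, 4, "RTX 2080", ("gpu",)),
]


def candidates(nodes, partition, min_cores=1, min_ram_gb=0, min_gpus=0):
    """Return the nodes in `partition` that meet the minimum requirements."""
    return [
        n for n in nodes
        if partition in n.partitions
        and n.cores >= min_cores
        and n.ram_gb >= min_ram_gb
        and n.gpus >= min_gpus
    ]


if __name__ == "__main__":
    # Example: which sampled nodes in the "gpu" queue have at least 8 GPUs?
    for node in candidates(NODES, "gpu", min_gpus=8):
        print(node.name, node.gpu_type)
```

A filter like this only narrows down which queue and node types fit a job; the actual resource request still goes through your SLURM job submission script, as described on the SLURM Partitions page.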