HPC Cluster Hardware Resources

From HPC Docs
Revision as of 13:25, 12 April 2019 by Admin

Networking/Interconnects

Every node listed in the table below has one or two network connections. All nodes with a Broadwell, Skylake Silver or Skylake Gold processor have 10Gbps Ethernet network interfaces. The other nodes (e.g. Harpertown and Nehalem) have 1Gbps Ethernet network interfaces. In addition, some nodes, as indicated in the table, include an InfiniBand interconnect, an alternative high-speed, low-latency network connection. ELSA currently uses InfiniBand FDR, which provides 56Gbps. You can control whether your job uses the InfiniBand interface by modifying your SLURM job submission script.
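To put the three link speeds in perspective, the sketch below computes a lower bound on transfer time: payload bits divided by the nominal line rate quoted above. It is only an idealized estimate. It ignores protocol overhead, congestion, and storage speed, so real transfers will take longer. The 100 GB dataset size is a hypothetical example.

```python
def ideal_transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Lower bound on transfer time: payload bits / nominal line rate.

    Ignores protocol overhead, congestion and disk/NFS speed, so actual
    transfers will be slower than this estimate.
    """
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


# Nominal link speeds quoted on this page, in Gbps.
LINKS = {
    "1Gbps Ethernet (Harpertown/Nehalem)": 1,
    "10Gbps Ethernet (Broadwell/Skylake)": 10,
    "InfiniBand FDR": 56,
}

if __name__ == "__main__":
    size = 100e9  # hypothetical 100 GB (decimal) dataset
    for name, gbps in LINKS.items():
        print(f"{name}: {ideal_transfer_seconds(size, gbps):.1f} s")
```

For the hypothetical 100 GB dataset, the ideal times are 800 s at 1Gbps, 80 s at 10Gbps and roughly 14 s over InfiniBand FDR.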

Network Storage

The ELSA cluster uses network-based storage, shared among all nodes, for personal files, project/research files, course files and the HPC applications. There are two identical storage servers: one is located in the STEM cluster room and the other in the Green Hall datacenter. Data from the STEM storage server is replicated weekly to the one in Green Hall. There is a total of approximately 2PB of raw storage (about 1.2PB of usable storage once redundancy and spares are accounted for). The storage servers run Linux, using the ZFS on Linux filesystem and NFS to share data with the cluster nodes. Their design is based on the Disruptive Storage Workshop from the Johns Hopkins School of Public Health and BioTeam.
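The raw and usable figures above indicate how much capacity goes to redundancy and spares. The short calculation below uses only the two approximate numbers from the paragraph. It makes no assumption about the actual ZFS pool layout.

```python
RAW_PB = 2.0     # approximate total raw capacity, from the paragraph above
USABLE_PB = 1.2  # approximate usable capacity, from the paragraph above


def usable_fraction(raw_pb: float, usable_pb: float) -> float:
    """Share of raw capacity left for data after redundancy and spares."""
    return usable_pb / raw_pb


def overhead_fraction(raw_pb: float, usable_pb: float) -> float:
    """Share of raw capacity consumed by redundancy and spares."""
    return 1.0 - usable_fraction(raw_pb, usable_pb)


if __name__ == "__main__":
    print(f"usable:   {usable_fraction(RAW_PB, USABLE_PB):.0%} of raw")
    print(f"overhead: {overhead_fraction(RAW_PB, USABLE_PB):.0%} of raw")
```

With the quoted figures, roughly 60% of raw capacity is usable and roughly 40% goes to redundancy and spares.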

Data Transfer Node

A data transfer node (DTN) is used for transferring large files into and out of the cluster. It is designed to handle high-speed, high-volume transfers. The ELSA DTN contains 12TB of disk storage for temporarily holding files in transit. It also has a 40Gbps Ethernet interface and an InfiniBand FDR interconnect to maximize throughput.
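The sketch below shows how the DTN's 12TB staging disk and 40Gbps link bound a large transfer. It assumes that a dataset larger than the staging area has to be moved in batches that each fit on the DTN disk. That batching approach is an illustration, not a documented ELSA procedure. The timing is the ideal line-rate figure, and the 30 TB dataset is hypothetical.

```python
import math

DTN_STAGING_TB = 12  # DTN staging disk, from the paragraph above
DTN_LINK_GBPS = 40   # DTN Ethernet interface speed, from the paragraph above


def staging_batches(dataset_tb: float, staging_tb: float = DTN_STAGING_TB) -> int:
    """Minimum number of batches if every batch must fit in the staging area."""
    return math.ceil(dataset_tb / staging_tb)


def batch_minutes(batch_tb: float, link_gbps: float = DTN_LINK_GBPS) -> float:
    """Ideal minutes to move one batch at the nominal line rate (decimal TB)."""
    bits = batch_tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 60


if __name__ == "__main__":
    dataset_tb = 30  # hypothetical dataset size
    print(f"{dataset_tb} TB -> at least {staging_batches(dataset_tb)} batches")
    print(f"one full 12 TB batch: at least {batch_minutes(12):.0f} minutes at 40Gbps")
```

Under these assumptions, a 30 TB dataset needs at least 3 batches. A full 12 TB batch takes at least 40 minutes at 40Gbps.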

PerfSONAR

perfSONAR is a network performance testing and monitoring system. It regularly runs bandwidth and latency tests and, when problems arise, helps pinpoint where along the network path they occur.

Node Configurations

The following describes the contents of columns in the tables below.

  • Node Name = name of the node server. Login nodes (login001, login002, login003, login004) are accessible via the elsa.hpc.tcnj.edu load-balancer (e.g. using SSH) from the campus network (wired or wireless) or via the TCNJ VPN. The development node dev001 is accessible via dev1.hpc.tcnj.edu. Other nodes are not meant to be directly accessed.
  • Processor Family = the processor generation in the node, ordered from highest to lowest: Skylake Gold > Skylake Silver > Broadwell > Nehalem > Harpertown
  • Available Cores = these are the processing cores that compute jobs can use
  • Reserved Cores = these cores are reserved for system use and not available to user jobs
  • RAM Memory = how much memory the node contains
  • InfiniBand = whether the node has an InfiniBand high-speed interconnect (network) connection, and of which type (FDR = 56Gbps); n/a means none
  • GPU Count = number of GPU accelerators in the node
  • NVIDIA GPU Type = the model of the GPU accelerators in the node
  • Queue Membership = which queues (SLURM partitions) this node is a member of. Nodes can be a member of multiple queues
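The columns above are what you compare against a job's needs when deciding which nodes could run it. The sketch below models a few rows of the table and filters them against a request. `Node` and `matching_nodes` are hypothetical helpers for illustration only. In practice SLURM performs the actual matching when you submit a job. The queue and notes columns are omitted, and reserved cores are ignored because jobs cannot use them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """One row of the node table (queue and notes columns omitted)."""
    name: str
    family: str
    available_cores: int        # cores usable by jobs
    reserved_cores: int         # cores held back for system use
    ram_gb: int
    infiniband: Optional[str]   # "FDR", or None where the table says n/a
    gpu_count: int
    gpu_type: Optional[str]


# A few representative rows transcribed from the table below.
NODES: List[Node] = [
    Node("node001", "Nehalem", 7, 1, 48, None, 0, None),
    Node("node061", "Skylake Gold", 32, 0, 192, "FDR", 0, None),
    Node("node131", "Broadwell", 32, 0, 128, "FDR", 0, None),
    Node("gpu-node001", "Broadwell", 20, 0, 256, "FDR", 4, "GTX 1080"),
    Node("gpu-node009", "Skylake Gold", 32, 0, 384, "FDR", 4, "RTX 2080"),
    Node("viz-node001", "Skylake Gold", 40, 0, 768, "FDR", 4, "Tesla V100"),
]


def matching_nodes(nodes: List[Node], cores: int = 1, ram_gb: int = 0,
                   gpus: int = 0, need_ib: bool = False) -> List[str]:
    """Return names of nodes whose available resources cover the request."""
    return [
        n.name
        for n in nodes
        if n.available_cores >= cores
        and n.ram_gb >= ram_gb
        and n.gpu_count >= gpus
        and (not need_ib or n.infiniband is not None)
    ]


if __name__ == "__main__":
    # Hypothetical request: 32 cores, 256G RAM and at least one GPU.
    print(matching_nodes(NODES, cores=32, ram_gb=256, gpus=1))
```

For the hypothetical request shown, only gpu-node009 and viz-node001 qualify. gpu-node001 has enough RAM and GPUs but only 20 available cores.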
Node Name | Processor Family | Available Cores | Reserved Cores | RAM Memory | InfiniBand | GPU Count | NVIDIA GPU Type | Queue Membership | Notes
login001 | Harpertown | 8 | 0 | 32G | n/a | 0 | n/a | n/a | Public hostname elsa.hpc.tcnj.edu
login002 | virtual | 2 | 0 | 4G | n/a | 0 | n/a | n/a | Public hostname elsa.hpc.tcnj.edu
login003 | virtual | 2 | 0 | 4G | n/a | 0 | n/a | n/a | Public hostname elsa.hpc.tcnj.edu
login004 | virtual | 2 | 0 | 4G | n/a | 0 | n/a | n/a | Public hostname elsa.hpc.tcnj.edu
dev001 | Broadwell | 20 | 0 | 256G | n/a | 1 | GTX 1080 | n/a | Public hostname dev1.hpc.tcnj.edu
node001 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node002 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node003 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node004 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node005 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node006 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node007 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node008 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node009 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node010 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node011 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node012 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node013 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node014 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node015 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node016 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node017 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node018 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node019 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node020 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node021 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node022 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node023 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node024 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node025 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node026 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node027 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node028 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node029 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node030 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node031 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node032 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node033 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node034 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node035 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node036 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node037 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node038 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node039 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node040 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node041 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node042 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node043 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node044 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node045 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node046 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node047 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node048 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node049 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node050 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node051 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node052 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node053 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node054 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node055 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node056 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node057 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node058 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node059 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node060 | Nehalem | 7 | 1 | 48G | n/a | 0 | n/a | Example
node061 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node062 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node063 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node064 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node065 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node066 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node067 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node068 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node069 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node070 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node071 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node072 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node073 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node074 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node075 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node076 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node077 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node078 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node079 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node080 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node081 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node082 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node083 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node084 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node085 | Skylake Gold | 32 | 0 | 192G | FDR | 0 | n/a | Example
node131 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node132 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node133 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node134 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node135 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node136 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node137 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
node138 | Broadwell | 32 | 0 | 128G | FDR | 0 | n/a | Example
gpu-node001 | Broadwell | 20 | 0 | 256G | FDR | 4 | GTX 1080 | Example
gpu-node002 | Broadwell | 20 | 0 | 256G | FDR | 4 | GTX 1080 | Example
gpu-node003 | Broadwell | 20 | 0 | 256G | FDR | 4 | GTX 1080 | Example
gpu-node004 | Broadwell | 20 | 0 | 256G | FDR | 4 | GTX 1080 | Example
gpu-node005 | Broadwell | 20 | 0 | 256G | FDR | 4 | GTX 1080 | shortgpu
gpu-node006 | Skylake Silver | 20 | 0 | 256G | FDR | 8 | GTX 1080Ti | Example
gpu-node007 | Skylake Silver | 20 | 0 | 256G | FDR | 8 | GTX 1080Ti | Example
gpu-node008 | Skylake Silver | 20 | 0 | 256G | FDR | 8 | GTX 1080Ti | Example
gpu-node006 | Skylake Silver | 20 | 0 | 192G | FDR | 8 | GTX 1080Ti | Example
gpu-node007 | Skylake Silver | 20 | 0 | 192G | FDR | 8 | GTX 1080Ti | Example
gpu-node008 | Skylake Silver | 20 | 0 | 192G | FDR | 8 | GTX 1080Ti | Example
gpu-node009 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node010 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node011 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node012 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node013 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node014 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node015 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node016 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node017 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
gpu-node018 | Skylake Gold | 32 | 0 | 384G | FDR | 4 | RTX 2080 | Example
viz-node001 | Skylake Gold | 40 | 0 | 768G | FDR | 4 | Tesla V100 | remoteviz
viz-node002 | Skylake Gold | 40 | 0 | 768G | FDR | 4 | Tesla V100 | remoteviz
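The table can be summarized by processor family to show aggregate CPU capacity. The sketch below covers only the CPU-only compute node ranges (node001 to node060, node061 to node085 and node131 to node138). It leaves out the GPU nodes because the table lists gpu-node006 through gpu-node008 twice with different RAM sizes (256G and 192G). The range-based encoding and the `totals_by_family` helper are illustrative assumptions.

```python
from collections import defaultdict

# (first, last, family, available cores, reserved cores, RAM in GB) for the
# CPU-only compute node ranges in the table above.
COMPUTE_RANGES = [
    (1, 60, "Nehalem", 7, 1, 48),
    (61, 85, "Skylake Gold", 32, 0, 192),
    (131, 138, "Broadwell", 32, 0, 128),
]


def totals_by_family(ranges):
    """Sum node count, available cores, reserved cores and RAM per family."""
    totals = defaultdict(lambda: {"nodes": 0, "cores": 0, "reserved": 0, "ram_gb": 0})
    for first, last, family, cores, reserved, ram_gb in ranges:
        count = last - first + 1  # node numbers are inclusive
        t = totals[family]
        t["nodes"] += count
        t["cores"] += count * cores
        t["reserved"] += count * reserved
        t["ram_gb"] += count * ram_gb
    return dict(totals)


if __name__ == "__main__":
    for family, t in totals_by_family(COMPUTE_RANGES).items():
        print(f"{family}: {t['nodes']} nodes, {t['cores']} available cores, "
              f"{t['ram_gb']}G RAM")
```

From the table, the Nehalem nodes provide 420 available cores, the Skylake Gold compute nodes 800 and the Broadwell compute nodes 256. Together that is 1,476 available cores across the CPU-only compute nodes.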