HPC Cluster Hardware Resources

From HPC Docs
Revision as of 16:35, 22 February 2024 by Admin

Networking/Interconnects

Every node listed in the table on this page has one or two network connections. All nodes with a Broadwell, Skylake Silver or Skylake Gold processor have 10Gbps Ethernet network interfaces.

Network Storage

The ELSA cluster uses network-based storage that is shared among all nodes for storing personal files, project/research files, course files and the HPC applications. There are two identical storage servers: one is located in the STEM cluster room and the other in the Green Hall datacenter. Data from the STEM storage server is replicated regularly (i.e. weekly) to the one in Green Hall. There is a total of approximately 2PB of raw storage (about 1.2PB of usable storage once redundancy and spares are accounted for). The storage server is Linux-based, using the ZFS on Linux filesystem and NFS for sharing with the cluster nodes. It is based on the Disruptive Storage Workshop from the Johns Hopkins School of Public Health and BioTeam.

Data Transfer Node

A data transfer node (DTN) is used for transferring large files into and out of a cluster. It is designed to handle high-speed, high-volume transfers. The ELSA DTN contains 12TB of disk storage for temporarily holding file transfers. It also has a 40Gbps Ethernet interface and an Infiniband FDR interconnect to maximize throughput.
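To give a feel for these numbers, the sketch below estimates how long it would take to move a given amount of data over the DTN's 40Gbps Ethernet interface. It is a back-of-the-envelope calculation only: the function name and the `efficiency` parameter are illustrative assumptions, and real transfers are usually slower than line rate because of protocol overhead, disk speed and the remote end of the connection.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float,
                          efficiency: float = 1.0) -> float:
    """Estimate the time needed to move size_bytes over a link.

    link_gbps  -- nominal link speed in gigabits per second (decimal units)
    efficiency -- assumed fraction of the nominal speed actually achieved
                  (hypothetical knob, 0 < efficiency <= 1)
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


TB = 10**12  # decimal terabyte, in bytes

# Filling the whole 12TB staging area at the full 40Gbps line rate.
seconds = transfer_time_seconds(12 * TB, 40)
print(f"{seconds / 60:.0f} minutes")  # prints "40 minutes"
```

At half the nominal rate (`efficiency=0.5`) the same transfer would take about 80 minutes, which is a more cautious planning figure.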

PerfSONAR

PerfSONAR is a network performance testing and monitoring system. It regularly runs bandwidth and latency tests and, if issues arise, helps pinpoint the location in the network path that is causing them.

Node Configurations

The following describes the columns of the table below.

  • Node Name = name of the node server. Login nodes (login001 & login002) are accessible via the elsa.hpc.tcnj.edu load-balancer (e.g. using SSH) from the campus network (wired or wireless) or via the TCNJ VPN. Other nodes are not meant to be directly accessed.
  • Processor Family = the generation of processor in the node: Skylake Gold > Skylake Silver > Broadwell
  • Available Cores = these are the processing cores that compute jobs can use
  • Reserved Cores = these cores are reserved for system use and not available to user jobs
  • RAM Memory = how much memory the node contains
  • Infiniband = a high-speed network interconnect. NO LONGER USED.
  • GPU Count = number of GPU accelerators in the node
  • NVIDIA GPU Type = the model of the GPU accelerators in the node
  • Queue Membership = which queues (SLURM partitions) this node is a member of. A node can be a member of multiple queues. Note that some queues are used for internal purposes (e.g. remoteviz, interactive) and should not be used for submitting your jobs except under certain conditions. Please see the SLURM Partitions page for more information on the specifications of these queues/partitions.
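As an illustration of how the queue, core and GPU columns relate to a job request, the sketch below builds the text of a SLURM batch script. The helper `make_batch_script` and the program name `./my_program` are hypothetical. The `#SBATCH` options are standard `sbatch` options, but requesting GPUs with `--gres=gpu:N` assumes the cluster's GPU resource is named `gpu`; check the SLURM Partitions page for the settings that actually apply here.

```python
def make_batch_script(partition: str, cores: int, command: str,
                      gpus: int = 0, job_name: str = "example") -> str:
    """Return the text of a single-node SLURM batch script.

    partition -- a queue name from the Queue Membership column
    cores     -- CPU cores requested; should not exceed the node's
                 Available Cores
    gpus      -- GPUs requested; should not exceed the node's GPU Count
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if gpus < 0:
        raise ValueError("gpus must not be negative")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cores}",
    ]
    if gpus > 0:
        # Assumption: GPUs are exposed as a generic resource named "gpu".
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines += ["", command, ""]
    return "\n".join(lines)


# Example: 8 cores and 2 GPUs in the "gpu" queue for a hypothetical program.
print(make_batch_script("gpu", 8, "srun ./my_program", gpus=2))
```

The printed text could be saved to a file and submitted with `sbatch` from a login node.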
| Node Name | Processor Family | Available Cores | Reserved Cores | RAM Memory | Infiniband | GPU Count | NVIDIA GPU Type | Queue Membership(s) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| login001 | virtual | 8 | 0 | 8G | n/a | 0 | n/a | n/a | Public hostname elsa.hpc.tcnj.edu |
| login002 | virtual | 8 | 0 | 8G | n/a | 0 | n/a | n/a | Public hostname elsa.hpc.tcnj.edu |
| osg-login | virtual | 4 | 0 | 4G | n/a | 0 | n/a | n/a | Dedicated for Open Science Grid job submissions |
| node061 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node062 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node063 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node064 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node065 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node066 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node067 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node068 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node069 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node070 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node071 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node072 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node073 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node074 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node075 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node076 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node077 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node078 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node079 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node080 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node081 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node082 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive | |
| node083 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node084 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node085 | Intel Skylake Gold | 30 | 2 | 192G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node086 | AMD EPYC2 | 62 | 2 | 512G | n/a | 0 | n/a | amd | |
| node087 | AMD EPYC2 | 62 | 2 | 512G | n/a | 0 | n/a | amd | |
| node131 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node132 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node133 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node134 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node135 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node136 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node137 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| node138 | Intel Broadwell | 30 | 2 | 128G | n/a | 0 | n/a | short, normal, long, interactive, nolimit | |
| gpu-node001 | Intel Broadwell | 20 | 0 | 256G | n/a | 4 | GTX 1080 | gpu | |
| gpu-node002 | Intel Broadwell | 20 | 0 | 256G | n/a | 4 | GTX 1080 | gpu | |
| gpu-node003 | Intel Broadwell | 20 | 0 | 256G | n/a | 4 | GTX 1080 | gpu | |
| gpu-node004 | Intel Broadwell | 20 | 0 | 256G | n/a | 4 | GTX 1080 | gpu | |
| gpu-node005 | Intel Broadwell | 20 | 0 | 256G | n/a | 4 | GTX 1080 | shortgpu | |
| gpu-node006 | Intel Skylake Silver | 20 | 0 | 192G | n/a | 8 | GTX 1080Ti | gpu | |
| gpu-node007 | Intel Skylake Silver | 20 | 0 | 192G | n/a | 8 | GTX 1080Ti | gpu | |
| gpu-node008 | Intel Skylake Silver | 20 | 0 | 192G | n/a | 8 | GTX 1080Ti | gpu | |
| gpu-node009 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node010 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node011 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node012 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node013 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node014 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node015 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node016 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node017 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| gpu-node018 | Intel Skylake Gold | 32 | 0 | 384G | n/a | 4 | RTX 2080 | gpu | |
| viz-node001 | Intel Skylake Gold | 40 | 0 | 768G | n/a | 1 | Tesla V100 | remoteviz | |
| viz-node002 | Intel Skylake Gold | 40 | 0 | 768G | n/a | 1 | Tesla V100 | remoteviz | |
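
For a quick sense of the cluster's overall size, the sketch below totals the physical nodes, available cores and GPUs from the table above. The node groups are transcribed by hand from the table (the virtual login nodes are left out), and the names `NODE_GROUPS` and `totals` are illustrative only. The figures include the viz nodes, which belong to the internal remoteviz queue, so not all of this capacity is open to ordinary job submission.

```python
# (group, node count, available cores per node, GPUs per node),
# transcribed from the node table; virtual login nodes are excluded.
NODE_GROUPS = [
    ("node061-085 (Skylake Gold)", 25, 30, 0),
    ("node086-087 (AMD EPYC2)", 2, 62, 0),
    ("node131-138 (Broadwell)", 8, 30, 0),
    ("gpu-node001-005 (GTX 1080)", 5, 20, 4),
    ("gpu-node006-008 (GTX 1080Ti)", 3, 20, 8),
    ("gpu-node009-018 (RTX 2080)", 10, 32, 4),
    ("viz-node001-002 (Tesla V100)", 2, 40, 1),
]


def totals(groups):
    """Return (nodes, available cores, GPUs) summed over all groups."""
    nodes = sum(count for _, count, _, _ in groups)
    cores = sum(count * per_node for _, count, per_node, _ in groups)
    gpus = sum(count * per_node for _, count, _, per_node in groups)
    return nodes, cores, gpus


nodes, cores, gpus = totals(NODE_GROUPS)
print(f"{nodes} nodes, {cores} available cores, {gpus} GPUs")
# prints "55 nodes, 1674 available cores, 86 GPUs"
```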