MSU Institute for Cyber-Enabled Research (ICER) High-Performance Computing Cluster (HPCC)

Oakland University has 4 Intel buy-in nodes (2017) and 3 AMD buy-in nodes (2020) as part of the MSU Institute for Cyber-Enabled Research (ICER) High-Performance Computing Cluster (HPCC).
OU faculty get priority access on the buy-in nodes, but can also access resources on the entire cluster.


General Information

For general information about ICER, see the ICER About page.

Obtaining an iCER HPCC User Account

Account Request

To utilize Oakland University's iCER nodes, you will need to establish 2 accounts:

  1. MSU Guest Account (NetID)
  2. MSU Community ID

Instructions for obtaining these accounts can be found below.

Obtain an MSU Guest Account

The Oakland University user requesting an MSU HPCC account must first register for an “MSU Guest Account” (formerly Community ID) by following the instructions located here:

https://tech.msu.edu/msu-guest-account/

Important: Please ensure that an official "@oakland.edu" email address is used when registering the Guest Account. If a commercial email address is used, the HPCC account will not be created.

MSU iCER Community ID

Once the MSU Guest Account has been registered and activated, the Oakland University user requesting access to the HPCC account must then complete the "MSU ICER Community ID Form" located at the link below:

https://contact.icer.msu.edu/community_id

When signing in, use the "@oakland.edu" email address you used to obtain your MSU Guest Account as your username, along with the associated password. For the "Identity Provider", use "Michigan State University".

MSU Account Approval

Once the preceding steps have been completed, you will need to have the request approved by a designated UTS contact or an Oakland University Principal Investigator (PI). If you are an OU PI, please use the following link:

https://contact.icer.msu.edu/account

Important: Do NOT fill out the account contact form above if you are NOT an OU PI. Only a PI with an existing iCER/MSU account or a designated UTS contact may approve account requests. If you require assistance from a UTS iCER contact, please send an email with your request to "[email protected]".

Once the account request is approved, you will be able to access the iCER HPCC. iCER will provide you with a "NetID" that can be used to access HPCC resources, along with additional information.

Accessing the ICER HPCC

Nodes

Once you have your accounts, you will then be able to log into OU's buy-in nodes on the iCER HPCC cluster.

The cluster is primarily accessed by means of the Secure Shell (SSH) network protocol. An SSH connection can be established to hpcc.msu.edu from a terminal prompt (Mac/Linux) or a program like PuTTY (Windows).
Windows users may find MobaXterm more feature-rich than PuTTY, as it supports graphical applications.
See the Install SSH Client page for details. You may also refer to OU's documentation on setting up SSH clients for OU's Matilda Cluster, as this will provide additional guidance on installation and configuration (adapt login and connection details for iCER HPCC).
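
For example, from a terminal on Mac or Linux (or an SSH client on Windows), a connection can be opened as shown below. Replace <netid> with the MSU NetID you are issued; the -X option is optional and enables X11 forwarding for graphical applications if your client supports it.

# Connect to the HPCC gateway (replace <netid> with your MSU NetID)
ssh <netid>@hpcc.msu.edu

# Optionally enable X11 forwarding for graphical applications
ssh -X <netid>@hpcc.msu.edu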

Note: Please review OU Software Regulations Policy 870 and submit a Software & Hosted Solution Purchasing Checklist form prior to installing software on any Oakland University equipment.

See the iCER HPCC Layout page for a brief description of the system.

Gateway Node

When you make your initial connection to the ICER HPCC, you will find yourself on one of the gateway nodes.
(e.g. ssh [email protected])

The gateway nodes are simply login nodes through which users enter the ICER HPCC. Once a user has connected to a gateway, they can continue on to one of the development nodes.
The gateway nodes are not meant for running software, connecting to scratch space, or accessing compute nodes.
Gateway nodes do have access to the Internet.

rsync Gateway Node

You can also connect to an rsync gateway node from your workstation using an SSH client.
(e.g. ssh [email protected])

The rsync gateway node is meant for transferring files and is able to connect to scratch. Users are not able to access compute nodes from an rsync node.
Rsync nodes do have access to the Internet.

Note: Users cannot connect directly to an rsync gateway node using the Web Browser connection method.
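
For example, files can be transferred between your workstation and the HPCC through the rsync gateway using rsync over SSH. The directory names below are illustrative; replace <netid> with your MSU NetID.

# Copy a local project directory to your HPCC home space
rsync -av ./my_project/ <netid>@rsync.hpcc.msu.edu:~/my_project/

# Copy results from your HPCC home space back to your workstation
rsync -av <netid>@rsync.hpcc.msu.edu:~/results/ ./results/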

Development (dev) Node

From a gateway or rsync gateway node, you can further SSH into one of the development nodes listed upon log in.
(e.g. ssh dev-intel16)

Development nodes are available for users to compile their code and do short tests to estimate run-time and memory usage.
These short tests should not take longer than 2 hours.
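
One way to estimate run time and peak memory during such a short test is GNU time. This is only a sketch; ./my_program stands in for whatever test program you have compiled.

# Report elapsed time and maximum resident set size for a short test run
/usr/bin/time -v ./my_program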

File Systems

See the page on File Systems for a list of various file systems on the cluster.

To determine which file system to use, see the Guidelines for Choosing File Storage and I/O document.

Home Space

The first file system that you will access on the HPCC is the Home Space.

This is typically referred to as your home directory and is the initial working directory after you log in to any node in the cluster. Home space is limited to 50 GB of storage and cannot contain more than 1 million files.

Note: Users can request to increase their Home Space by completing the Quota Increase Request form.
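
To see how much of the home quota you are using, standard Linux tools can be used as a rough check (a sketch; ICER may also provide its own quota-reporting commands):

# Show the total size of your home directory
du -sh $HOME

# Count the number of files in your home directory
find $HOME -type f | wc -l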

Research Space

Research space can be created upon request from a principal investigator.

For more information about this space, see Research Space.

Scratch Space

Another set of important file systems are the Scratch File Systems.
The scratch file systems are spaces designated for temporary data file storage. Files saved in these locations are not backed up and may be deleted if they have not been modified in 45 days. The scratch spaces are also not available from the gateway nodes.
You should save your results or data back into the Home or Research file systems after your job has finished running (see the example after the list below).

When on an rsync or development node:

  • You can reference the ls15 scratch space with variable $SC15.

  • You can reference the gs18 scratch space with variable $SCRATCH.
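
For example, a typical workflow stages a job in scratch and copies the results back to home space when it completes. The directory names below are illustrative.

# Create a working directory in scratch (on an rsync or development node)
mkdir -p $SCRATCH/myjob
cd $SCRATCH/myjob

# ... run or submit your job from here ...

# Copy results back to your home space when the job has finished
cp -r results $HOME/myjob-results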

Local File Systems

Local File Systems are available on each cluster compute node and development node.

While these spaces are good for fast temporary storage while a job is running, there are some things to be aware of: files more than 2 weeks old are deleted, and if the local space becomes more than 90% full, unused files may be deleted without notice.
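
A common pattern is to stage input data on the node's local disk, run against the local copy, and copy the results off before the job ends. The /tmp path below is an assumption; check the File Systems page for the exact location of local storage on ICER nodes.

# Stage input data on the node's local disk (path is an assumption; verify on the File Systems page)
mkdir -p /tmp/$USER/myjob
cp $HOME/input.dat /tmp/$USER/myjob/

# ... run the job against the local copy ...

# Copy results off the local disk before the job ends
cp /tmp/$USER/myjob/output.dat $SCRATCH/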

Running Jobs

Viewing HPCC Commands and Examples

One of the first things you may wish to do if you are new to the cluster is to load the powertools module and view the available examples and commands.

Assuming you are already on a gateway node (lines beginning with # are comments and should not be executed):

# Load the powertools module
module load powertools

# SSH to a random dev node
dev

# Run the powertools command to print a list of available commands to your terminal
powertools

# Run the getexample command for a list of possible examples to download
getexample

Job Submission Examples

Submitting a job to the cluster is done via the sbatch command.

It may be helpful to view some of the examples to get an understanding of job submission.

To get the helloworld example, try the following on a development node:

# Load the powertools module
module load powertools

# Get the helloworld example
getexample helloworld

# change to the helloworld directory
cd helloworld

# Compile the hello.c source code
gcc hello.c -o hello

# Run hello.sb using the sbatch command
sbatch hello.sb

# Check the status of your submission using squeue
squeue -u $USER

You may find it helpful to get a few other examples and go through the READMEs, as well as the other files that each README references.
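
For reference, a batch script such as hello.sb consists of SLURM directives followed by the commands to run. The script below is only a sketch of what such a file might look like; the actual file provided by getexample may differ, and the resource requests shown are illustrative.

#!/bin/bash
# Illustrative SLURM batch script (the real hello.sb from getexample may differ)
#SBATCH --job-name=hello        # name shown in the queue
#SBATCH --time=00:05:00         # walltime limit (hh:mm:ss)
#SBATCH --ntasks=1              # number of tasks
#SBATCH --cpus-per-task=1       # cores per task
#SBATCH --mem=1G                # memory for the job

# Run the compiled program
./hello

# Optionally print job information (standard SLURM command)
scontrol show job $SLURM_JOB_ID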

Buy-in Nodes

The 4 Intel nodes purchased in 2017 have the following specifications:

  • 2 Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (14 cores per processor) - 28 cores per node.
  • 512 GB memory per node

The 3 AMD nodes purchased in 2020 have the following specifications:

  • 2 AMD EPYC 7H12 processors (64 cores per processor, 2.6 GHz clock speed) - 128 cores per node.
  • 4 GB memory/core - 512 GB memory per node.

You should be able to check the status of the nodes and the jobs running on them with the following:

# Load the powertools module
module load powertools

# Check jobs running on all buyin nodes associated with your user
prs

Laconia (intel16)

  1. lac-311
  2. lac-312
  3. lac-313
  4. lac-314

AMD

  1. amr-091
  2. amr-092
  3. amr-093
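
In addition to the prs powertool, standard SLURM commands can be pointed directly at the node names listed above. This is a sketch using the Intel nodes; adjust the node list as needed.

# List jobs currently running on the Intel buy-in nodes
squeue -w lac-311,lac-312,lac-313,lac-314

# Show a node-oriented view of the Intel buy-in nodes' state
sinfo -N -n lac-311,lac-312,lac-313,lac-314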


TSSHowTo TechnicalServiceSystem CategoryHPC