Matilda HPC Cluster Rates 2022-2024

Last Update: July 8, 2024

Overview

This document describes the base (free) computation allocation provided to Principal Investigators (PIs) and their labs on Matilda. CPU and GPU base allocations are refreshed annually, and run from January 1st to December 31st. PIs whose labs exceed the base allocation may purchase additional compute hours or storage as described in the following sections.

For details on the computational resources available, please refer to the KB article on the Matilda HPC Cluster.

Base Individual Resource Allocations

Upon request, all OU-affiliated researchers receive the following base individual resource allocations on the Matilda HPC Cluster. This allocation allows an OU-affiliated researcher to access the Matilda HPC Cluster and submit jobs as part of a Principal Investigator's project/group.

  • Home directory storage: 50 GiB
  • Scratch storage1: 10 TiB

Base Group/Project Allocations

The Principal Investigator (PI) is provided with shared project space for their research project or group. The following base project/group allocations are assigned to the PI and usable by members of the PI's group.

  • Compute Hours2 per year: 1,000,000

  • GPU Hours per year: 50,000
  • Shared Project/Group storage: 1 TiB
  • Shared Project/Group Scratch storage1: 10 TiB

Researcher compute and GPU allocations are "convertible". This is accomplished by granting the base allocations referenced above and applying a "billing weight" multiplier of 10x to GPU hours when computing usage: the GPU hour allocation is multiplied by 10 to create "generic" CPU hours and added to the base CPU hour allocation. This gives researchers the flexibility to use their allocation in the way that best suits their needs. For example, 100 CPU hours convert to 10 GPU hours, while 100 GPU hours convert to 1,000 CPU hours.
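As a minimal sketch of this conversion, assuming only the 10x billing weight described above (the constant and function names are illustrative, not part of any cluster tooling):

```python
# Illustrative only: converts between CPU hours and GPU hours using the
# 10x billing weight described above. Actual accounting is performed by
# the cluster's scheduler/billing system.

GPU_BILLING_WEIGHT = 10  # 1 GPU hour is billed as 10 "generic" CPU hours


def gpu_hours_to_cpu_hours(gpu_hours: float) -> float:
    """Express GPU hours as equivalent generic CPU hours."""
    return gpu_hours * GPU_BILLING_WEIGHT


def cpu_hours_to_gpu_hours(cpu_hours: float) -> float:
    """Express generic CPU hours as equivalent GPU hours."""
    return cpu_hours / GPU_BILLING_WEIGHT


print(cpu_hours_to_gpu_hours(100))  # 100 CPU hours -> 10.0 GPU hours
print(gpu_hours_to_cpu_hours(100))  # 100 GPU hours -> 1000 CPU hours
```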

Please note that the computational time (CPU/GPU) allocations above may be used for any research-related activity of the PI's choosing. CPU and GPU usage for the PI and their group is tracked in aggregate and reset to zero (0) at the start of each calendar year. If the PI requires additional tracking within the base CPU/GPU allocation, UTS can assign additional accounting parameters for this purpose, provided we receive descriptions and the number of activities to be tracked. Additional tracking can be configured with fixed not-to-exceed sub-allocations (provided the PI knows how much of the base allocation they wish to set aside), or simply for accounting/bookkeeping purposes. The annual base allotment of trackable resources is computed using the following formula:

  • Annual Allotment = 1,000,000 CPU hours + (10.0) * (50,000 GPU hours) = 1.5 million hours
    • where the 50,000 GPU hours convert to 500,000 "CPU" hours, calculated as 50,000 * ($0.240 / $0.024) = 500,000 (the rates per CPU hour and GPU hour are $0.024 and $0.240, respectively).

As mentioned previously, CPU and GPU hours can be used interchangeably, since GPU hours are billed at 10x the rate of CPU hours. For example, a single or combined workload that uses 100 CPU hours and 10 GPU hours would be computed as follows:

  • Workload Usage = 100 CPU hours + (10)*10 GPU hours = 200 total computational hours

This alleviates the need for researchers who use few CPU hours but many GPU hours (or vice versa) to request a manual reallocation between base CPU and GPU hours. The sketch below illustrates the arithmetic.
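A minimal sketch of the usage arithmetic, using the published base numbers (1,000,000 CPU hours, 50,000 GPU hours, 10x billing weight); the variable and function names are illustrative only:

```python
# Illustrative only: computes the annual base allotment and the generic
# hours charged for a workload, using the published base numbers.

BASE_CPU_HOURS = 1_000_000
BASE_GPU_HOURS = 50_000
GPU_BILLING_WEIGHT = 10  # 1 GPU hour is billed as 10 generic CPU hours

# Annual allotment expressed in generic CPU hours:
annual_allotment = BASE_CPU_HOURS + GPU_BILLING_WEIGHT * BASE_GPU_HOURS
print(annual_allotment)  # 1,500,000 hours


def billed_hours(cpu_hours: float, gpu_hours: float) -> float:
    """Generic computational hours charged for a workload."""
    return cpu_hours + GPU_BILLING_WEIGHT * gpu_hours


# Example from the text: 100 CPU hours + 10 GPU hours
print(billed_hours(100, 10))  # 200 total computational hours
```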

Rates for Additional Computational Resources

The 2022-24 rates3 for the Matilda HPC Cluster are presented below. Researchers with a need for additional computational time (CPU/GPU hours) have the option of purchasing additional resources.

These rates represent cost recovery for the Matilda cluster and do not include any support that individual units may choose to provide.

  • Compute hours2: Additional compute hours can be purchased at a rate of $0.024 per hour

  • GPU hours4: Additional GPU hours can be purchased at a rate of $0.24 per hour

Additional computational time purchased by the PI will generally be placed in a separate account that is accessible to the PI and any group members of their choosing. Whereas base allocation amounts are “use it or lose it” (i.e., unused portions do not roll over year-over-year), unused purchased time will remain intact until the account is drained. To use additional purchased hours, the PI or their group members must specify the account to be used during job submission.
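For budget planning, a minimal sketch of the cost arithmetic at the 2022-24 rates above (illustrative only; contact UTS for a current quote before purchasing):

```python
# Illustrative only: estimates the cost of additional hours at the published
# 2022-24 rates. Pricing is subject to the rate schedule in effect.

CPU_RATE = 0.024  # USD per additional CPU hour
GPU_RATE = 0.24   # USD per additional GPU hour


def purchase_cost(extra_cpu_hours: float = 0, extra_gpu_hours: float = 0) -> float:
    """Estimated cost (USD) of additional CPU and GPU hours."""
    return extra_cpu_hours * CPU_RATE + extra_gpu_hours * GPU_RATE


# e.g. 200,000 additional CPU hours and 5,000 additional GPU hours
print(f"${purchase_cost(200_000, 5_000):,.2f}")  # $6,000.00
```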

Buy-In Nodes

Researchers who need hardware capacity beyond what is currently available on the Matilda cluster have the option to buy additional nodes. UTS staff will add purchased nodes to the cluster and manage them together with the rest of the cluster. Buy-in users and their research groups will receive priority access5 on the cluster resources they purchase. Buy-in users will also receive additional compute time (CPU or GPU, as needed or desired) in the calendar year they purchase resources, based on the rates in effect at the time of purchase. To purchase a node, contact UTS at [email protected] to discuss your needs and get a quote. The exact price will depend on the hardware chosen, plus any incidentals that may be needed to connect the new hardware to the cluster. As a rough guideline for budget planning purposes, the table below lists the estimated cost6 for nodes with various hardware specifications:

Buy-In Node Option         | Specifications                                               | Estimated Cost
Large Memory Compute 2TB   | 2TB Memory, 32 CPU Cores (Intel)                             | $38,000
Big Memory Intel GPU 512GB | 512GB Memory, 32 CPU Cores, 4x NVLink Nvidia H100 80GB       | $198,000
Big Memory Intel GPU 1TB   | 1TB Memory, 32 CPU Cores, 4x Nvidia L40s 48GB without NVLink | $80,000

Rates for Additional Storage

Researchers or groups needing storage beyond the base allocations described in the sections above have a variety of options to purchase additional storage, depending on the researcher's specific storage needs. There are two base storage types: storage on the Matilda HPC cluster itself, and storage in one or more OU data centers without direct access to/from the Matilda cluster.

Storage Type | Specifications | Cost Per TiB (1 year)
Matilda project or home directory quota | Storage quota on Matilda is increased and is immediately available for use on Matilda: two locations (snapshot & replication between NFH and DH) plus DR-AWS/Deep Archive. Offers the best protection for data (2+1 locations). | $260
Matilda scratch space quota | Scratch space on Matilda is increased and immediately available for use on Matilda: high-speed Lustre parallel scratch; files are not backed up and are purged automatically 45 days after last access. Offers increased working storage for large data files. | $72
Performance tier | Single local (snapshot) storage location (NFH), high speed. Good for storing data you need occasionally where file loss is not catastrophic. | $170
Archive tier7 | Single local8 (snapshot) storage location (DH). Good for archiving data you need to keep but not access often and where file loss is not catastrophic. | $90
Replicated performance tier | Two locations (snapshot & replication between NFH & DH). Good for storing data you need occasionally where file loss would be catastrophic. | $250
Replicated performance tier with deep archive | Two locations (snapshot & replication between NFH & DH) and DR-AWS/Deep Archive. Best protection for storing data you need occasionally where file loss would be catastrophic. | $260
Archive tier7 with deep archive | Single location and DR-AWS/Deep Archive. Good for archiving data with infrequent access but requiring off-site data protection. | $90

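The per-TiB rates above translate directly into an annual cost estimate. A minimal sketch (the tier keys and function name are illustrative only):

```python
# Illustrative only: estimates the one-year cost of additional storage using
# the per-TiB rates from the table above. Tier names are shortened here.

STORAGE_RATE_PER_TIB = {
    "matilda_project_or_home": 260,
    "matilda_scratch": 72,
    "performance": 170,
    "archive": 90,
    "replicated_performance": 250,
    "replicated_performance_deep_archive": 260,
    "archive_deep_archive": 90,
}


def annual_storage_cost(tier: str, tib: float) -> float:
    """Estimated one-year cost (USD) for `tib` TiB of the given storage tier."""
    return STORAGE_RATE_PER_TIB[tier] * tib


# e.g. 5 TiB of additional Matilda project/home directory quota for one year
print(annual_storage_cost("matilda_project_or_home", 5))  # 1300
```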


  1. Scratch storage is short-term storage used for working files only. It is not backed up or mirrored, and inactive files (determined by last access time) are deleted after 45 days.

  2. Compute Hours are measured per CPU core used in a job; thus a job running on 40 CPU cores for 1 hour would consume 40 Compute Hours.

  3. Rates are currently revised every two years.

  4. GPU Hours are measured per GPU requested, as typically only a single job can run on a GPU at a time. For example, a job requesting 2 GPUs and running for 1 hour would consume 2 GPU Hours.

  5. Priority access means that users are guaranteed to be able to start a job on a purchased resource within four hours when they need that resource for a research project. Priority access lasts for five years from the date of purchase or the anticipated useful life of the hardware, whichever is less. When the purchaser is not using the resource, it is available to other cluster users for a maximum walltime of 4 hours per job.

  6. Estimated Cost: quotes for the node specifications were obtained in July 2022 and should be treated as estimates, with pricing subject to change.

  7. Can be made available through Globus to move data into and out of the Matilda HPC Cluster.

  8. Local refers to storage within one of the data centers on the OU campus, either Dodge Hall (DH) or North Foundation Hall (NFH).