Hpc-syspros-basics.github.io

HPC Basics · Documentation home for HPC Basics instruction

We speak directly to the state of the practice of standing up and operating high performance systems with an emphasis on solutions that can be implemented by systems staff at …

URL: https://hpc-syspros-basics.github.io/

Installing NHC · HPC Basics

Executing the make test step in the previous example is optional but recommended, as this will run NHC’s built-in unit test suite to make sure everything is functioning properly.

Reference Books · HPC Basics

Reference Books. High Performance Computing: Modern Systems and Practices is a fully comprehensive and …

Configuring NHC · HPC Basics

Configuring NHC. After installing NHC, we can add some configuration so that the script starts verifying the health of nodes. The starting configuration is meant to highlight …

HPC Basics · HPC Basics

HPC Basics. Topics: What is a Supercomputer?; What does a Supercomputer do?; Moore’s Law; Scalability; Benchmarking; Supercomp …

Writing a custom check for NHC · HPC Basics

Writing a custom check for NHC. Occasionally you will run into an instance where one of the built-in checks does not satisfy a use case you have for checking the health of a node.

Node Health Check · HPC Basics

We will go over the basics of why node health check scripts are useful. We will then detail how to install the Warewulf Node Health Check script on a compute node and …

Benchmarking · HPC Basics

Benchmarking. Benchmarking is the practice of running tests on your hardware to verify performance and, in some cases, to stress the hardware to ensure that the hardware is …

Login Node Resource Management · HPC Basics

Login Node Resource Management. The login nodes of an HPC cluster are a shared, yet finite and small, resource. Users can easily overload the login nodes, creating a poor …

What is a Supercomputer

What is a Supercomputer? A supercomputer is not simply a fast or very large computer: it works in an entirely different way, typically using parallel processing instead of the serial …

Scheduler Integration · HPC Basics

Slurm Integration. Add the following to /etc/slurm/slurm.conf (or wherever your slurm.conf file is located in your environment) on your controller node(s) AND your compute nodes …

Advanced Topics · HPC Basics

Advanced Topics. Topics: Benchmarking; CPU Benchmarks; Memory Benchmarks; Storage Benchmarks; GPU Benchmarks; Network Benchmar …

GPU Benchmarks · HPC Basics

gpu-burn. While we have this in the benchmarks section, it is hardly a benchmark, though it does output floating-point operations per second (FLOPS) readouts. This piece of code can be …

Audience · HPC Basics

Prerequisites. For much of the documentation to make sense, we encourage folks to have a basic knowledge of a few concepts in order to get the most out of our …

CPU Benchmarks · HPC Basics

The High Performance Linpack benchmark is the oldest and most widely accepted benchmark that measures the double-precision floating-point performance of distributed …

cgroups · HPC Basics

cgroups are a way to restrict and manage users and processes. One use case for cgroups on login nodes is: A policy like the above will prevent individual users from overwhelming …
