ds-econ

Working on a SLURM cluster

Things I keep re-discovering

Finn
May 17, 2023

If you are working in academia on projects which involve big data, it is likely that you will, sooner or later, make use of a high-performance computing (HPC) cluster.

Typically, these are managed with the SLURM workload manager, which provides a framework for job scheduling and resource allocation. In essence, it coordinates “supply and demand” in the cluster: users request resources for their jobs, and SLURM decides when and where each job runs.

Since working on an HPC cluster is not something that I do every day, here are a few tips that I keep re-discovering:

First, start with a small sample on your local machine. Iron out all the bugs, and make sure your script is clean, simple, and working.
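One way to cut such a sample could look like the sketch below. It assumes your data is a CSV file with a header line; the function name make_sample and the file names are made up for illustration.

```shell
#!/bin/bash
# Keep the header line plus the first n data rows of a CSV file.
# Usage: make_sample <input.csv> <output.csv> [n_rows]
make_sample() {
  local input="$1" output="$2" n_rows="${3:-1000}"
  # n_rows data rows + 1 header line
  head -n "$((n_rows + 1))" "$input" > "$output"
}

# Example call (hypothetical file names):
# make_sample full_data.csv sample_data.csv 1000
```

Taking the first rows is the simplest option, not a random sample. If your file is sorted in some meaningful way, the sample may not be representative, but it is usually enough to catch bugs.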

Create a template job script that includes the parameters you need to edit, as well as an outline of how to run your code.

For example, you will want to load modules, activate environments, and print the date and time when the script starts running.

It could look something like this:

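#!/bin/bash
# Job name, and files for standard output and standard error
#SBATCH --job-name=my_r_job
#SBATCH --output=output.txt
#SBATCH --error=error.txt
# Resources: 1 node, 1 task, 4 CPUs with 2G of memory each, 2 hours of run time
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=02:00:00

# Print the date and time when the job starts
echo "Job started: $(date)"

# Load required modules
module load R

# Run R script
Rscript my_script.R

# Print the date and time when the job ends
echo "Job finished: $(date)"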

Set up shortcuts for connecting to the cluster, downloading files, uploading files, and, most importantly, updating your code.

For example, you can set these up in a terminal such as iTerm2.

When you have these shortcuts, you don’t even need to connect your IDE to the cluster (although that is a great idea too). You can simply edit your code locally, save it, and execute the shortcut to update your code. Trust me, you will use this over and over again.
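Such shortcuts could be plain shell functions in your ~/.bashrc or ~/.zshrc. The sketch below is one possible setup built on ssh and scp. The login, host name, folder names, and function names are all placeholders that I made up, so replace them with your own.

```shell
#!/bin/bash
# Hypothetical values: replace them with your own login, host, and folders.
CLUSTER="username@cluster.example.edu"
REMOTE_DIR="projects/my_project"

# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Open a shell on the cluster.
cluster_login() { run ssh "$CLUSTER"; }

# Upload the local code folder to the project folder on the cluster.
cluster_push_code() { run scp -r ./code "$CLUSTER:$REMOTE_DIR/"; }

# Download the results folder from the cluster into the current folder.
cluster_pull_results() { run scp -r "$CLUSTER:$REMOTE_DIR/results" ./; }
```

Calling a function with DRY_RUN=1 in front, for example DRY_RUN=1 cluster_push_code, only prints the command, which is handy for checking the paths before anything is copied.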

Filter the system messages of the cluster into a separate folder in your email inbox. You will get a lot of them.

scp is your friend, as it lets you upload and download files and folders easily.

The cluster might only offer versions of your programming language that differ from the one on your local machine. Prepare yourself for some package compatibility issues.
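A quick check at the top of a job script can make such a mismatch visible early. The sketch below assumes you use R and know your local version. The value of LOCAL_R_VERSION and the helper same_minor_version are made up for illustration.

```shell
#!/bin/bash
# To see which R versions the cluster offers, you can run: module avail R
# (the output depends on how your cluster is set up).

# Succeeds if two version strings share the same major.minor part.
same_minor_version() {
  [ "$(echo "$1" | cut -d. -f1,2)" = "$(echo "$2" | cut -d. -f1,2)" ]
}

# Hypothetical: the R version installed on your local machine.
LOCAL_R_VERSION="4.2.1"

# Only compare versions if R is actually available in this shell.
if command -v Rscript >/dev/null 2>&1; then
  cluster_r_version="$(Rscript -e 'cat(as.character(getRversion()))')"
  if same_minor_version "$LOCAL_R_VERSION" "$cluster_r_version"; then
    echo "R versions match: $cluster_r_version"
  else
    echo "Warning: local R is $LOCAL_R_VERSION, this machine has $cluster_r_version"
  fi
fi
```

Comparing only the major.minor part is a judgment call on my side. Packages can still behave differently across patch versions, so treat a match as a hint rather than a guarantee.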

There will be tiny issues. For example, I noticed that when calling R from the terminal, it matters whether your file ends in .r or .R.

Consider removing intermediate data objects to save RAM, and streamline your code.

Consider creating a logger so that you can monitor progress more easily. You will need it. Also, print the resources your program is using at intermediate steps. This is very helpful for debugging.
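The same idea also works at the level of the job script. The sketch below is a minimal version with made-up function names, log and log_memory. It assumes the free command is available on the compute node, and it falls back to the label "local" when the script runs outside of SLURM.

```shell
#!/bin/bash
# Print a message with a timestamp in front.
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

# Print the memory use of the machine, if the free command exists.
log_memory() {
  if command -v free >/dev/null 2>&1; then
    # Keep only the header and the "Mem:" line
    free -h | sed -n '1,2p'
  fi
}

# SLURM sets SLURM_JOB_ID inside a job; use "local" everywhere else.
log "Job ${SLURM_JOB_ID:-local} started"
log_memory

# ... run your program here, e.g. Rscript my_script.R ...

log "Job ${SLURM_JOB_ID:-local} finished"
log_memory
```

This only reports what the node looks like before and after your program runs. To see what happens in between, you would still want log messages inside the program itself.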

Submit the script with the sbatch command, which adds your job to the queue and requests the necessary resources. Monitor the job status with squeue.
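Put together, submitting and monitoring could look like the sketch below. The script name job.sh and the helper parse_job_id are placeholders of mine. The helper assumes that sbatch answers in its usual form, "Submitted batch job 12345".

```shell
#!/bin/bash
# Extract the job ID from a message like "Submitted batch job 12345".
parse_job_id() {
  echo "$1" | awk '{print $NF}'
}

# Only talk to SLURM if its commands are available on this machine.
if command -v sbatch >/dev/null 2>&1; then
  # job.sh is a hypothetical job script in the current folder
  submit_msg="$(sbatch job.sh)"
  job_id="$(parse_job_id "$submit_msg")"
  echo "Submitted job $job_id"

  # Show all of your jobs in the queue
  squeue -u "$USER"

  # Show only the job that was just submitted
  squeue -j "$job_id"

  # To cancel the job, you could run: scancel "$job_id"
fi
```

Keeping the job ID in a variable saves you from copying it by hand each time you want to check on or cancel a job.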

By following this workflow, you can efficiently submit and monitor your jobs, retrieve output, and identify and resolve any issues that may arise.
