- 2x Raspberry Pi 4 Model B - for compute nodes (Shop at raspberrypi.org)
- 1x Raspberry Pi 4 Model B - for master node (Shop at raspberrypi.org)
- 3x MicroSD Cards (16 GB or higher) (Shop at Amazon)
- 3x USB-C power cables (Shop at Amazon)
- 1x 5-port Wireless Router (Shop at Amazon)
- 3x CAT 5/6 Ethernet Cable (Shop at Amazon)
- 1x 5-port 10/100/1000 network switch (Optional) (Shop at Amazon)
- 1x 6-port USB power-supply (optional) (Shop at Amazon)
- 1x SSD/external HDD or Flash drive (Shop at Amazon)
- 1x Raspberry Pi 4 cluster case with cooling fan and heatsinks (Shop at Amazon)
Download the latest version of Raspbian Lite OS using your macOS/Linux terminal:
wget https://downloads.raspberrypi.org/raspbian_lite_latest -O raspbian_lite_latest.zip
Now extract the zipped file
unzip raspbian_lite_latest.zip
Check the directory for the contents of the extraction and find the name of the disk image file with extension .img (e.g. 2020-02-13-raspbian-buster-lite.img). Now insert an SD/microSD card into your laptop and check for the attached device/mount point of the card.
For macOS:
diskutil list
For Linux:
lsblk
Let's say it is attached to /dev/disk2. First, unmount the disk. For macOS:
diskutil unmountDisk /dev/disk2
For Linux (use "sudo" if necessary):
umount /dev/disk2
Then flash the image to the memory card:
sudo dd if=2020-02-13-raspbian-buster-lite.img of=/dev/disk2
If successful, a drive will be mounted under the name boot. Raspberry Pi OS images usually come with SSH disabled. We don't want that. To enable it, create an empty file inside the boot directory.
For macOS, you can find it under /Volumes/boot:
cd /Volumes/boot/
Now, type
touch ssh
We have now successfully prepared a Raspbian Lite OS image with SSH enabled.
Let’s eject the card from the Mac
cd
diskutil unmountDisk /dev/disk2
Repeat this process for all three memory cards. Remember to mark the master node's card to separate it from the others.
Now insert the memory cards into the microSD slots of the Raspberry Pis and connect the network cables (CAT 5/6/6A) to the Ethernet ports of the Pis. Do not power on the Pis yet.
The concept of a cluster is based on the idea of working together. To do so, the nodes need access to the same files. We can arrange this by mounting an external SSD (not necessary, but convenient and faster) or a flash drive, and exporting that storage as a Network File System (NFS) share. This allows us to access the same files from all nodes.
The process is straightforward and simple. At this point, insert the external storage into your master node.
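You do not need to set up NFS by hand; the deployment scripts used later in this guide do it for you. Purely for reference, a minimal sketch of such a setup could look like the following (the /shared_dir path and the 10.10.0.x addresses are taken from this guide's later examples; the actual scripts may differ):
# On the master node: install the NFS server and export the shared directory
sudo apt-get install -y nfs-kernel-server
echo "/shared_dir 10.10.0.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra
# On each compute node: install the NFS client and mount the export
sudo apt-get install -y nfs-common
sudo mkdir -p /shared_dir
sudo mount 10.10.0.11:/shared_dir /shared_dir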
To do this part, you need a wireless router with DHCP enabled. The Dynamic Host Configuration Protocol (DHCP) will allocate IPs as soon as we connect our Raspberry Pis to the network. If you have a network switch, first plug the other ends of the Ethernet cables connected to the Pis into the switch, then connect one extra cable from the switch to the wireless router. The physical network is now complete. Power on the wireless router and the switch.
Note: If you do not have a network switch, connect the network cables directly to the wireless router.
Now, log in to the wireless router's management page using a browser. If your laptop is connected to the same network, just type the gateway IP; e.g. if your IP is 10.10.0.10, your gateway is usually 10.10.0.1. If you have trouble getting into the management page, look for the login information printed on the router body. After getting into the management page, go to the connected-devices page and keep it open.
Similar information is available for other manufacturers, such as D-Link and NETGEAR, on their respective support pages.
If you want to use the router as a modem, follow the instructions given in the picture below:
Now, power on the master node first by connecting the USB-C cable to a power outlet (or the 6-port USB power supply) and keep refreshing the page. If everything goes well, you should see a new device named raspberrypi connected to the network. Note down the IPv4 address associated with it.
Next, power on one of the compute nodes and do the same (note it as node01). Repeat the process for all the compute nodes. At the end, you should have something similar to the following:
- master IPv4: 10.10.0.11
- node01 IPv4: 10.10.0.12
- node02 IPv4: 10.10.0.13
Now, try to ping each of the Pis from your computer's terminal, wait for a couple of seconds, then kill it by pressing Ctrl + C.
ping 10.10.0.11
You should get an output very similar to the following
PING 10.10.0.11: 56 data bytes
64 bytes from 10.10.0.11: icmp_seq=0 ttl=59 time=1.947 ms
64 bytes from 10.10.0.11: icmp_seq=1 ttl=59 time=3.582 ms
64 bytes from 10.10.0.11: icmp_seq=2 ttl=59 time=3.595 ms
64 bytes from 10.10.0.11: icmp_seq=3 ttl=59 time=3.619 ms
...
--- 10.10.0.11 ping statistics ---
6 packets transmitted, 6 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.947/3.317/3.635/0.614 ms
Note: If your wireless router's management page has an option to reserve IPs, it is advisable to do so for all the Pis. However, it is not mandatory.
Now, log in to your master node using
ssh pi@10.10.0.11
When prompted, use the password raspberry (note: this is the default password).
Now, use the following command to download the shell script for the master node:
pi@raspberrypi~$ wget https://raw.githubusercontent.com/sayanadhikari/wipi/automated/deployment/master_deployment.sh
The script you just downloaded should be in /home/pi/ with the name master_deployment.sh.
Just before setting up the network, we inserted the external storage into the master node. Now use the following command to see the device location and mount point of your storage:
pi@raspberrypi ~> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
mmcblk0 105:0 0 7.4G 0 disk
├─mmcblk0p1 105:1 0 43.8M 0 part /boot
└─mmcblk0p2 105:2 0 7.4G 0 part /
sda 3:16 0 59.2G 0 disk
└─sda1 3:17 0 59.2G 0 part
In our case, the main partition of the external storage is /dev/sda1. However, if it is different for you, you should edit the master_deployment.sh file and modify the following lines accordingly (note: replace /dev/sda1 with /dev/sda2 or whatever device path you get from the lsblk command):
mkfs.ext4 /dev/sda1
UUID=$(blkid -o value -s UUID /dev/sda1)
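If you prefer not to edit the file by hand, a one-line sed substitution does the same job; for example, assuming your partition turned out to be /dev/sda2 (illustrative only):
sed -i 's|/dev/sda1|/dev/sda2|g' master_deployment.sh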
Now run the script to prepare the master node.
pi@raspberrypi~$ sudo bash ./master_deployment.sh
At the end of the script execution, the system will automatically reboot. After the reboot, log in to the master node again using
ssh pi@10.10.0.11
and run the following command
pi@master~$ sudo chmod 777 -R /shared_dir
We already have the IPs for the worker nodes (see Step 2). Now let's prepare them one by one. Log in to node01 using the following command:
ssh pi@10.10.0.12
When prompted, use the password raspberry (note: this is the default password).
Now, use the following command to download the shell script for the worker node:
pi@raspberrypi~$ wget https://raw.githubusercontent.com/sayanadhikari/wipi/automated/deployment/node_deployment.sh
The script you just downloaded should be in /home/pi/ with the name node_deployment.sh. Now run the script to prepare the worker node.
pi@raspberrypi~$ sudo bash ./node_deployment.sh node01
At the end of the script execution, the system will automatically reboot.
Now repeat the process for the rest of the worker nodes. Log in to each of them using their respective IPs (see Step 2), and remember to replace "node01" in the last command with the respective node name.
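For example, for node02 (IP 10.10.0.13 from Step 2):
ssh pi@10.10.0.13
pi@raspberrypi~$ wget https://raw.githubusercontent.com/sayanadhikari/wipi/automated/deployment/node_deployment.sh
pi@raspberrypi~$ sudo bash ./node_deployment.sh node02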
Slurm is an open-source, highly scalable cluster management and job scheduling system that can be used for both large and small Linux clusters. Let's install it on our Pi cluster. To do that, first log in to the master node using ssh again:
ssh pi@10.10.0.11
Then, go to the deployment directory inside the wipi repository,
pi@master ~> cd /home/pi/wipi/deployment
To allow password-less SSH across the system, run the password_less_ssh_master.sh script using the following command:
pi@master~$ bash password_less_ssh_master.sh
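The script handles key generation and distribution for you. Conceptually, it does something along the lines of the following sketch (the actual script in the repository may differ; IPs are the ones noted in Step 2):
# Generate an SSH key pair on the master (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
# Copy the public key to each compute node; enter the default password one last time
ssh-copy-id pi@10.10.0.12
ssh-copy-id pi@10.10.0.13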
Now run the script slurm_config_master.sh to prepare the master node for slurm.
pi@master~$ sudo bash slurm_config_master.sh
To ensure smooth operation, the system will reboot at this point.
We have successfully configured the master node; now we need to do the same on the compute nodes. Log in to one of the nodes, let's say node01:
ssh pi@10.10.0.12
Then, go to the deployment directory inside the wipi repository,
pi@node01 ~> cd /home/pi/wipi/deployment
Now run the script slurm_config_nodes.sh to prepare the worker node for slurm.
pi@node01~$ sudo bash slurm_config_nodes.sh
To ensure smooth operation, the system will reboot at this point.
Now, we need to verify whether the SLURM controller can successfully authenticate with the client nodes using munge. To do that, log in to the master node and use the following command:
pi@master ~> ssh pi@node01 munge -n | unmunge
Upon successful operation, you should get output similar to the following:
ssh pi@node01 munge -n | unmunge
pi@node01's password:
STATUS: Success (0)
ENCODE_HOST: master (127.0.1.1)
ENCODE_TIME: 2020-08-30 22:45:00 +0200 (1598820300)
DECODE_TIME: 2020-08-30 22:45:00 +0200 (1598820300)
TTL: 300
CIPHER: aes128 (4)
MAC: sha256 (5)
ZIP: none (0)
UID: pi (1001)
GID: pi (1001)
LENGTH: 0
Sometimes you might get an error, which usually indicates that the munge key was not copied exactly to that node.
Now repeat this process on all the other nodes.
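To check every node from the master in one go, you can wrap the same command in a small loop (node names as used in this guide):
for n in node01 node02; do
  echo "== $n =="
  ssh pi@$n munge -n | unmunge | grep STATUS
done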
Log in to the master node using ssh and type the following command:
pi@master ~>sinfo
You should get an output like this:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
picluster* up infinite 2 idle node[01-02]
You can simply run a task that asks each node for its hostname:
pi@master ~>srun --nodes=2 hostname
It will give you an output similar to
node02
node01
Write a shell script with the following lines and save it as clusterup.sh:
#!/bin/bash
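# Return the compute nodes to service (they often show as "down" after a power cycle)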
sudo scontrol update NodeName=node[01-02] state=resume
sinfo
echo "Nodes up and running"
Each time you power on your cluster, run the following command at startup (if clusterup is not available as a command, run the script directly with bash clusterup.sh):
pi@master ~>clusterup
Each time you need to power off your cluster, run the following command,
pi@master ~>clusterdown
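If clusterdown is not already available on your system, a rough sketch of such a script could look like this (node names from this guide; the actual helper may differ):
#!/bin/bash
# Mark the compute nodes as down so SLURM stops scheduling jobs on them
sudo scontrol update NodeName=node[01-02] state=down reason="powering off"
# Power off the compute nodes over SSH, then the master itself
for n in node01 node02; do
  ssh pi@$n sudo poweroff
done
sudo poweroff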
If you want to check the temperatures of the individual nodes, use the following command:
pi@master ~>tempcheck
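Likewise, if tempcheck is not already available, a rough equivalent using the Raspberry Pi's vcgencmd utility could be:
#!/bin/bash
# Print the SoC temperature of the master and each compute node
echo "master: $(vcgencmd measure_temp)"
for n in node01 node02; do
  echo "$n: $(ssh pi@$n vcgencmd measure_temp)"
done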
Open MPI is an open-source implementation of the Message Passing Interface (MPI), a standard that describes how messages can be exchanged between different processes. It allows us to run a job across multiple nodes of the cluster.
Now let's test Open MPI. Go to the open_mpi directory inside shared_dir:
pi@master ~>cd /shared_dir/open_mpi
Now, compile the program using mpicc
mpicc hello_mpi.c
This creates an executable named a.out. You can run the executable using the following command:
mpirun -np 2 -hostfile hostfile ./a.out
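The hostfile passed to mpirun is simply a list of the machines Open MPI may start processes on, plus how many processes each can take. The repository provides one; as an illustrative sketch (assuming the two compute nodes and the 4 cores of a Raspberry Pi 4), it could look like:
node01 slots=4
node02 slots=4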
Now, let's test the same using the SLURM job manager. To do so, we need a job script. Go to the slurm_jobs directory inside shared_dir:
pi@master ~>cd /shared_dir/slurm_jobs
To submit a job use the following command
sbatch hello_mpi_slurm.sh
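The job script hello_mpi_slurm.sh is provided in the shared directory. Its exact contents may differ, but a minimal SLURM job script for this MPI program would look something like the following sketch (resource numbers are illustrative):
#!/bin/bash
#SBATCH --job-name=hello_mpi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --output=hello_mpi_%j.out
cd /shared_dir/open_mpi
# Inside an sbatch allocation, mpirun picks up the node list from SLURM when
# Open MPI is built with SLURM support; otherwise pass -np and -hostfile as before
mpirun ./a.out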
To view the status of any job
squeue -u pi
NOTE: pi is your username