VSC ReFrame meeting 2022 02 10
- Sam Moors (VUB)
- Kenneth Hoste (UGent)
- Franky Backeljauw (Antwerp)
- Michele Pugno (Antwerp)
- Robin Verschoren (Antwerp)
- Maxime Van den Bossche (Leuven)
- Steven Vandenbrande (Leuven)
- currently separate systems for `hydra`, `hortense`, ...
  - could also have a common `vsc` system
  - could also be tags instead? (see the tag sketch below)
    - `vsc` tag for tests that should work anywhere
    - site-specific tags for tests that don't work everywhere (yet): `ugent`, `vub`, `kul`, ...
    - `mpi`, `single_node`, `gpu`?
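A minimal sketch of the tag convention discussed above (the test itself is a placeholder); tests would then be selected with e.g. `reframe -t vsc` or `reframe -t ugent`:

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class TaggedHelloTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'echo'
    executable_opts = ['hello from a tagged test']
    # 'vsc' = should work on any VSC system; add site tags ('ugent', 'vub',
    # 'kul', ...) only for tests that don't work everywhere (yet)
    tags = {'vsc', 'single_node'}

    @sanity_function
    def assert_output(self):
        return sn.assert_found(r'hello from a tagged test', self.stdout)
```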
- current tests work for:
  - VUB (Sam)
  - Hortense (Kenneth)
  - KUL (Steven)
    - default launcher in ReFrame config: mpirun
    - the KUL tests assume this
  - UAntwerpen (Michele)
- launcher: site-specific or the same one everywhere?
  - mpirun (user-focused/Torque) vs srun (Slurm)
  - allowing a site-specific launcher (using RE `vsc:*` tag??)
    - `vsc:torque`, `vsc:slurm`
  - identify the site via the `$VSC_INSTITUTE` environment variable (see the launcher sketch below)
- common version of ReFrame: 3.10.1 (developers are reckless)
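A minimal sketch of site-specific launcher selection, assuming `$VSC_INSTITUTE` is set on every VSC system; the mapping (mpirun at `leuven`, the configured launcher elsewhere) is illustrative only:

```python
import os

import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.backends import getlauncher


@rfm.simple_test
class LauncherDemoTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'hostname'
    num_tasks = 2

    @rfm.run_before('run')
    def set_site_launcher(self):
        # override the configured launcher on sites that need mpirun
        if os.getenv('VSC_INSTITUTE') == 'leuven':
            self.job.launcher = getlauncher('mpirun')()

    @sanity_function
    def assert_output(self):
        return sn.assert_found(r'\S+', self.stdout)
```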
- agreed in CUE (Common User Environment; see list at ...)
  - presence and validity of `VSC_` environment variables
  - presence of system tools like `singularity`, ... + version
  - availability + path of the different shared filesystems (home/data)
    - important for e.g. Globus
  - local testing (Ghent VSC account on Ghent system) vs cross-site testing
- how do users check storage quota (tools?)
  - UAntwerpen uses the `myquota` command
  - UGent Tier-2: via the accountpage
  - Tier-1 Hortense scratch:
- ideas
  - availability of common software modules?
    - ReFrame module
    - EESSI stack?
    - toolchains
  - MPI launcher?
    - srun (VUB, UA @ Vaughan)
    - mpirun (KUL, UA @ Leibniz)
    - mympirun (UGent)
- testing the VSC network (connecting to other VSC sites, ...)
  - perfSONAR project (VSC project @ UA), `iperf`
    - should not be part of the ReFrame test suite
    - continuous performance monitoring
    - connectivity + performance
- submitting simple jobs to different partitions, queues (in order of importance)
  - single-core, multi-core, multi-node?, single-GPU?
  - different schedulers might be a problem, e.g. Torque vs Slurm
  - a job that tests itself and the environment variables of the executing instance (see the sketch after this list)
    - test the node file or the equivalent environment variable
  - tests verify that the recommendations in the docs work (and keep working)
  - also list jobs, delete jobs, ...
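A minimal sketch of such a self-testing job, assuming Slurm or Torque: the job echoes its own node list (`SLURM_JOB_NODELIST`, falling back to the contents of `PBS_NODEFILE`) and the sanity check verifies that at least one node is reported:

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class JobEnvTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    num_tasks = 1
    executable = 'echo'
    # the generated job script is a shell script, so this expands in the job
    executable_opts = [
        'nodes:', '${SLURM_JOB_NODELIST:-$(cat $PBS_NODEFILE 2>/dev/null)}'
    ]
    tags = {'vsc'}

    @sanity_function
    def assert_nodes_reported(self):
        # fails if neither scheduler exported a node list
        return sn.assert_found(r'nodes: \S+', self.stdout)
```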
- CPU tests
  - HPL (LINPACK)
  - c-ray (ray tracing)
  - BLAS-Tester
- memory tests
  - STREAM
- shared storage tests
  - IOR
- network tests
  - OSU Micro-Benchmarks (latency, bandwidth)
  - basic MPI tests (hello world, ring, ...)
- CP2K
- GROMACS
- Python, numpy
- R, Bioconductor
- TensorFlow
- OpenFOAM
- collect data about:
  - functionality, verification of results, performance (see the STREAM sketch below)
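A minimal sketch of a check collecting all three, using STREAM as the example; the module name and binary name are placeholders:

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class StreamTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    modules = ['STREAM']          # placeholder module name
    executable = 'stream_c.exe'   # placeholder binary name
    num_tasks = 1
    tags = {'vsc'}

    @sanity_function
    def assert_validated(self):
        # functionality + verification of results
        return sn.assert_found(r'Solution Validates', self.stdout)

    @performance_function('MB/s')
    def triad_bandwidth(self):
        # performance: Triad bandwidth as reported by STREAM
        return sn.extractsingle(r'Triad:\s+(\S+)', self.stdout, 1, float)
```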
- CSCS: https://github.com/eth-cscs/reframe/tree/master/hpctestlib
- HPC-VUB: https://github.com/vub-hpc/reframe-tests
- StackHPC: https://github.com/stackhpc/hpc-tests
- HPC-UGent: https://github.ugent.be/hpcugent/vsc-testing/tree/master/ReFrame (private internal repo currently)
- HPC-KUL: (not public yet?)
- SURF (not public yet?)
- Univ. of Birmingham (not public yet?)
- non-ReFrame repos:
- common repo: https://github.com/vscentrum/vsc-test-suite
- run from any site, automatically spawn to all clusters?
  - dedicated credits account on the Leuven systems + Hortense?
- how to collect/present the data?
  - currently ReFrame only logs performance data
    - about to change, see https://github.com/eth-cscs/reframe/issues/2394
    - fake sanity with performance as a workaround?
  - send logs to an ELK stack?
    - Graylog + Grafana?
  - push the data back into the GitHub repo? (the easier way)
    - otherwise, how do we access a running server with a log manager?
- run weekly/monthly?
  - difference between large and small tests?
- dealing with different scheduler frontends (Torque, Slurm)
  - using tags, create system partitions with a common prefix (see the config sketch below)
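A minimal sketch of a ReFrame config fragment with such a common partition-name prefix (the system/partition names and the `vsc-` prefix are illustrative, assuming Slurm):

```python
site_configuration = {
    'systems': [
        {
            'name': 'hydra',
            'descr': 'VUB Tier-2 cluster',
            'hostnames': ['login.*'],
            'partitions': [
                # common 'vsc-' prefix, so every site exposes the same names
                {
                    'name': 'vsc-single-node',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'environs': ['builtin'],
                },
                {
                    'name': 'vsc-mpi',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'environs': ['builtin'],
                },
            ],
        },
    ],
}
```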
- next meeting: Thu 10 Mar 2022 - 14:00
- folder structure (`reframe -R -r`), create a `tests` folder with: (Michele)
  - `run.sh --tags xxx --site yyy`
- `tests/` (a `common.py` sketch follows this tree)
  - `common.py`
  - `constants.py` (e.g. `UGENT = 'ugent'`)
  - `cue/`
    - `common.py`
    - `env.py`
  - `micro/`
    - `mpi/`
      - `common.py`
      - `hello.py`
  - `apps/`
    - `python/`
      - `numpy.py`
    - `openfoam/`
      - `motorbike.py`
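A minimal sketch of what the top-level `tests/common.py` could hold (the class name and contents are illustrative): a shared base class carrying the agreed conventions, which the individual tests inherit from:

```python
import reframe as rfm


class VscTest(rfm.RegressionTest):
    '''Base class for all VSC tests: runs anywhere, tagged `vsc`.'''
    valid_systems = ['*']
    valid_prog_environs = ['*']
    tags = {'vsc'}
```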
- CUE tests
  - env (Kenneth, Franky) (a minimal sketch follows this list)
    - see the CUE list
    - is it defined?
    - is the value correct?
    - do path variables point to existing paths? (a different test class in the implementation)
  - tools (Sam, Michele)
    - is the command available?
    - check the version (range, greater than or equal)
  - shared FSs (Robin, Sam)
    - `/home`
    - `/data/<site>/<account>`
    - `/scratch/...`
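A minimal sketch for the env checks, assuming the CUE list contains path variables such as `VSC_HOME` and `VSC_DATA` (the parameter list is illustrative); `test -d` covers both "is it defined?" and "does it point to an existing path?":

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class VscEnvVarTest(rfm.RunOnlyRegressionTest):
    var = parameter(['VSC_HOME', 'VSC_DATA', 'VSC_SCRATCH'])
    valid_systems = ['*']
    valid_prog_environs = ['*']
    tags = {'vsc', 'cue'}

    @rfm.run_before('run')
    def set_check_command(self):
        # run the check inside the job, in the user's actual environment
        self.executable = 'bash'
        self.executable_opts = [
            '-c', f'\'test -d "${{{self.var}}}" && echo {self.var}_OK\''
        ]

    @sanity_function
    def assert_defined_and_valid(self):
        return sn.assert_found(rf'{self.var}_OK', self.stdout)
```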
- MPI hello world (Steven, Kenneth) (see the sketch below)
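A minimal sketch of `micro/mpi/hello.py`, assuming a `hello.c` that prints "Hello from rank X of Y" sits next to the test in `src/`, and that `mpicc` is available in the selected programming environment:

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class MpiHelloTest(rfm.RegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']   # restrict to MPI-capable environments in practice
    sourcepath = 'hello.c'
    build_system = 'SingleSource'
    num_tasks = 8
    num_tasks_per_node = 4
    tags = {'vsc', 'mpi'}

    @rfm.run_before('compile')
    def set_compiler(self):
        # assumes mpicc wraps the environment's C compiler
        self.build_system.cc = 'mpicc'

    @sanity_function
    def assert_all_ranks(self):
        # every rank must report in exactly once
        return sn.assert_eq(
            sn.count(sn.findall(r'^Hello from rank \d+ of \d+', self.stdout)),
            self.num_tasks
        )
```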
- Franky: share the list of env + tools CUE
- script (Michele) to run all the tests:

  ```bash
  export BIN_DIR=/apps/antwerpen/reframe/versions/current/bin
  export TESTS_DIR=/apps/antwerpen/reframe/testsuite
  export RFM_CONFIG_FILE=$TESTS_DIR/config/settings.py

  $BIN_DIR/reframe -v --prefix $TESTS_DIR --perflogdir $TESTS_DIR/perflogs \
      -s $TESTS_DIR/stage -o $TESTS_DIR/output \
      -c $TESTS_DIR -R -r --performance-report
  ```