Alternate implementation of MaxConcurrentIO parameter #41

jchelly · 2024-05-31T10:48:39Z

On the FLAMINGO 10k run I've found that the code is very slow unless all ranks are allowed to read at the same time. I think this might be because, when the system is busy and a few ranks suffer long delays, the others are forced to wait. The current implementation divides the MPI ranks into groups, and only one group at a time may read. None of the ranks in the next group can start until ALL ranks in the previous group have finished.

This pull request modifies the code so that, as soon as any one rank finishes reading, another is immediately allowed to start. This is implemented by having the first rank that finishes reading become responsible for signalling the others to start.

The previous implementation split the MPI ranks into groups and prevented any rank in the next group from proceeding until all ranks in the current group had finished. This wastes a lot of time if one (or a few) ranks are very slow. The new implementation tries to make sure that we always have MaxConcurrentIO ranks reading: when any one rank finishes, the next is allowed to start immediately.
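To illustrate why the group-based scheme loses time, here is a toy model of the two scheduling policies. It is only a sketch, not the code in task_limited_section.h: the function names are hypothetical, and it assumes that groups are formed from consecutive ranks, that waiting ranks start in rank order, and that each rank's read time does not depend on how many other ranks are reading.

```python
import heapq


def makespan_groups(durations, max_concurrent):
    """Group-based scheme (assumed: groups of consecutive ranks).

    A group cannot start until every rank in the previous group has
    finished, so each group costs as much as its slowest rank.
    """
    total = 0.0
    for first in range(0, len(durations), max_concurrent):
        total += max(durations[first:first + max_concurrent])
    return total


def makespan_slots(durations, max_concurrent):
    """Slot-based scheme: when any rank finishes, the next one starts.

    finish_times is a min-heap holding the finish time of every rank
    that currently occupies one of the max_concurrent slots.
    """
    finish_times = []
    end = 0.0
    for duration in durations:
        if len(finish_times) < max_concurrent:
            start = 0.0  # a slot is still free at the beginning
        else:
            start = heapq.heappop(finish_times)  # earliest slot to free up
        finish = start + duration
        heapq.heappush(finish_times, finish)
        end = max(end, finish)
    return end


# Eight ranks, at most four reading at once; ranks 0 and 4 are slow.
durations = [10.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0]
print(makespan_groups(durations, 4))  # 20.0
print(makespan_slots(durations, 4))   # 11.0
```

In this made-up example the two slow ranks fall into different groups, so the group-based scheme pays for both delays one after the other, while the slot-based scheme overlaps them almost completely.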

jchelly · 2024-12-16T16:43:21Z

I also found that on COLIBRE L400M7 DMO the code suddenly started taking a long time to write the output at a particular snapshot. This pull request might help a bit if it's due to certain OSTs being slow.

I think it might also be worth looking into increasing the metadata block size to avoid small writes (H5Pset_meta_block_size) and using paged file space management (H5Pset_file_space_strategy) to align data blocks with Lustre stripes.

jchelly added 8 commits May 29, 2024 16:25

Renumber MPI ranks in concurrent IO code (9bc80bc)
Fix wrong communicator in task_limited_section.h (dd79886)
Add unit test for task_limited_section.h (907a275)
Merge branch 'master' into limit_io_ranks (7892a9c)
Use TaskLimitedSection to limit ranks writing Sub/SrcSnap files (0cde664)
Fix incorrect function prototypes (60d8403)
Add missing includes (4be4eb5)
