XIOS Integration
17th February 2023
Dr Alexander R. Smith, University of Cambridge
Target Release Date: Spring 2023 (MVP)
Reviewers: Tim Spain (Core Impact), Kacper Kornet (MPI Integration)
Issue Number: #212
Nextsim-DG currently reads input data from, and writes output data to, NetCDF files using the NetCDF C++ library, netcdf-cxx4. Reading occurs during model initialisation, and writing occurs at the end of each dynamics step and at the end of the simulation. We expect that using XIOS to perform the write at the end of each dynamics step will improve performance, as follows:
- By enabling asynchronous writes to a single file, we no longer require all calculations to have been completed in every grid cell before output begins. When MPI is enabled, we will be able to split the surface into grid cells and distribute the calculations across processors/nodes. Similarly, with XIOS we can distribute the write task across the grid using the XIOS API, e.g. xios_send_field. XIOS requires no synchronisation before writing, and it leverages parallel NetCDF/HDF5 to write asynchronously. Calculations can therefore continue while other grid cells catch up.
The intention is to later enable coupling between different simulations/model components, which may alter the data access requirements of Nextsim-DG (further clarification is needed).
There are some outstanding questions which will affect the ultimate speed-up gained by leveraging XIOS:
- Are the dynamics or the read/write the rate-limiting step? If reading/writing at the end of each dynamics step is causing performance issues, we would expect to see a reduction in runtime. If the dynamics calculations are the limiting step, the scaling may be more complex.
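As a rough guide to this question, an Amdahl-style estimate bounds the overall speed-up when only the output phase gets faster. This is an illustrative sketch: the function name and parameters are our own, and the fraction of runtime spent on output would have to be measured by profiling the current model.

```cpp
#include <cassert>

// Amdahl-style upper bound on the overall speed-up when only output is accelerated.
//   ioFraction : fraction of wall-clock time currently spent on output (0..1)
//   ioSpeedup  : factor by which output becomes faster (>= 1); a very large value
//                approximates output being completely hidden behind computation.
double expectedSpeedup(double ioFraction, double ioSpeedup)
{
    assert(ioFraction >= 0.0 && ioFraction <= 1.0);
    assert(ioSpeedup >= 1.0);
    return 1.0 / ((1.0 - ioFraction) + ioFraction / ioSpeedup);
}
```

For example, if output takes 20% of the runtime, making it twice as fast gives expectedSpeedup(0.2, 2.0), about 1.11, and even hiding it completely cannot exceed 1 / 0.8 = 1.25. If the dynamics dominate, the output fraction is small and XIOS alone will change little.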
After implementation we should take the corresponding actions:
- Assess the scaling relationship between compute time and number of nodes for:
- A regular grid of the intended scale
- A grid created with the domain decomposition tool (designed to balance load)
We should also:
- Verify the XIOS scaling claim
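To compare the two grids (and runs with and without XIOS) on an equal footing, the timings can be reduced to strong-scaling speed-up and parallel efficiency relative to the smallest run. A minimal sketch; the struct and function names are hypothetical, and it assumes each set of runs keeps the problem size fixed while the node count changes.

```cpp
#include <cassert>
#include <vector>

// One measurement: node count and wall-clock time for a fixed problem size.
struct Timing {
    int nodes;
    double seconds;
};

// Strong-scaling figures relative to the first (baseline) measurement.
struct ScalingPoint {
    int nodes;
    double speedup;     // T_base / T_n
    double efficiency;  // speedup / (n / n_base); 1.0 means ideal scaling
};

std::vector<ScalingPoint> strongScaling(const std::vector<Timing>& runs)
{
    assert(!runs.empty());
    const Timing& base = runs.front();
    std::vector<ScalingPoint> result;
    for (const Timing& t : runs) {
        const double speedup = base.seconds / t.seconds;
        const double idealSpeedup = static_cast<double>(t.nodes) / base.nodes;
        result.push_back({t.nodes, speedup, speedup / idealSpeedup});
    }
    return result;
}
```

An efficiency that stays close to 1 as nodes are added indicates near-ideal scaling; a drop-off points to a step (for example output) that does not scale.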
XIOS runs on Linux and Mac with GCC on specific X64 architectures. A full list of supported architectures is available (see the 'Functional Design: Build' section for details).
We require that XIOS:
1. Must not become a compulsory dependency at this stage; if a user does not link against XIOS during the build, Nextsim-DG reverts to its previous functionality.
2. Can create the Nextsim-DG data outputs, a diagnostic file and a restart file, via methods which leverage XIOS, and these outputs must match the current outputs. Production of the log file will be unchanged.
3. Must be enabled/disabled by a parameter in the config file under its own '[Xios]' heading.
4. Allows data to be managed by calls to methods provided in the Nextsim namespace.
Note:
- Initialisation of the ModelState is performed in the Model class via 'initialState(StructureFactory::stateFromFile(initialFileName));'. In the MVP it could continue to be performed this way, with the data then copied to the XIOS infrastructure. However, it may be worth considering initialising the XIOS state from the data and then copying that information to the ModelState.
Users will need to download XIOS from the SVN server and create a local build (we will provide commands in the documentation). This is similar to how we require users to provide other dependencies, e.g. eigen, netcdf, etc.
svn checkout http://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/trunk xios
The XIOS build requires the user to specify what system they are running on. When running ./make_xios you must also specify the --arch flag; e.g. for a basic Linux system you might use:
./make_xios --arch GCC_LINUX
The full list of available architectures is printed by:
./make_xios --avail
So, while we can update the Nextsim-DG documentation to include the extra download and build step, the XIOS build step requires users to specify a flag on the command line. If their system is not in the list provided by XIOS, the user will need to create a custom file specifying their system for XIOS and provide that name when using the --arch flag.
To build on a machine that doesn't have an existing config, I recommend modifying these two files:
arch/arch-GCC_LINUX.path
arch/arch-GCC_LINUX.env
You will also need the following dependencies:
- curl
- hdf5 (with +cxx+fortran+hl+mpi)
- netcdf-c (with +mpi, and optionally with +parallel-netcdf for full XIOS functionality)
- netcdf-fortran
Now you can use ./make_xios --arch GCC_LINUX to build the code. The additional options --job 8 --full are useful: the first runs the build with 8 processes, and the second effectively runs a make clean before installing, which is useful if you have an existing build you want to overwrite.
XIOS is built using FCM and therefore does not leave a .cmake file specifying how the library build was configured. This makes the find_package process in our CMake less streamlined than for other packages, e.g. netcdf, mpi, openMP, etc. Initially, we will require users to set XIOS_DIR, either as an environment variable pointing to the build (containing the XIOS lib/ and inc/ folders) or as a cmake command-line argument. We will then provide a FindXios.cmake file in a cmake/ directory under the top-level nextsim directory; this will respond to the variables set by the user and enable CMakeLists.txt to include and link XIOS.
Users will be able to enable XIOS via a flag in the config file. When this flag is not included (but XIOS is available in the build), the default is false. Decision point: what do we want here?
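Requirements 1 and 3 together reduce to a simple rule: XIOS is used only if it was linked at build time and the config flag asks for it; otherwise the existing NetCDF path is used, with the flag treated as false when absent. A sketch of that decision; the USE_XIOS compile definition, the enum and the function are hypothetical names, not existing Nextsim-DG code.

```cpp
#include <iostream>

// Hypothetical compile definition: the build would add -DUSE_XIOS only when
// the XIOS library has been found and linked (e.g. via FindXios.cmake).
#ifdef USE_XIOS
constexpr bool builtWithXios = true;
#else
constexpr bool builtWithXios = false;
#endif

enum class OutputBackend { NetCDF, Xios };

// xiosRequested: value of the enable flag under [Xios] in the config file
// (key name not yet decided); callers pass false when the flag is absent.
OutputBackend chooseBackend(bool xiosRequested)
{
    if (!xiosRequested) {
        return OutputBackend::NetCDF;
    }
    if (!builtWithXios) {
        // Requirement 1: fall back to the existing NetCDF output path.
        std::cerr << "Warning: XIOS requested but not linked; using NetCDF output\n";
        return OutputBackend::NetCDF;
    }
    return OutputBackend::Xios;
}
```

Keeping the check in one place means a build without XIOS still compiles and runs, and a config file that requests XIOS in such a build only produces a warning.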
A new cmake/ folder will hold the FindXios.cmake file. We suggest that this should replace the build folder and could be tied in with the change to simplify the CMake configuration.
A series of setters and getters has been created to handle the calendar properties. This removes the need for any pre-processing steps to ensure that the iodef.xml file matches the content of the Nextsim-DG config file.
Calendar information must be set before it is accessed, either using these methods or by defining it within iodef.xml (or its child files).
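The "set before get" rule can be enforced inside the accessors themselves. In this sketch the class and method names are hypothetical and the values are held locally; it does not model values defined in iodef.xml, which the real accessors would also need to respect.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Calendar values are stored as optionals, so reading one before it has been
// set raises an error instead of silently returning a default.
// A real handler would forward these values to XIOS.
class CalendarSettings {
public:
    void setStart(const std::string& isoDatetime) { start_ = isoDatetime; }
    void setTimestep(const std::string& isoDuration) { timestep_ = isoDuration; }

    const std::string& getStart() const { return require(start_, "calendar start"); }
    const std::string& getTimestep() const { return require(timestep_, "calendar timestep"); }

private:
    static const std::string& require(const std::optional<std::string>& value, const char* name)
    {
        if (!value) {
            throw std::logic_error(std::string(name) + " accessed before it was set");
        }
        return *value;
    }

    std::optional<std::string> start_;
    std::optional<std::string> timestep_;
};
```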
These calendar access methods lean on inc/my_xios.hpp (now xios_c_interface.hpp) in the xios_cpp_toy repository. Some of the forward declarations/bindings in this file were missing, in particular for the converter functions between cxios types and strings. Moreover, some of these methods proved problematic to use, raising XIOS CExceptions. As they were undocumented and investigating them meant debugging into a third-party library, I decided to create my own converters that create the corresponding XIOS data types cxios_date and cxios_duration (which behave like structs):
- convertXiosDatetimeToString
- convertStringToXiosDatetime
- convertXiosDurationToString
- convertStringToXiosDuration
One benefit of this approach is that we already need to convert between datetime standards for XIOS and Nextsim-DG, so the maintenance cost is not as high as it initially appears.
Nextsim-DG follows the ISO 8601 standard for datetimes, while XIOS returns datetimes without the 'T' and 'Z'.
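The datetime half of the conversion amounts to swapping the 'T' separator for a space and dropping the trailing 'Z'. A minimal sketch of that round trip, assuming XIOS writes datetimes as "YYYY-MM-DD hh:mm:ss". DateFields is a hypothetical stand-in for the fields of cxios_date, not its real definition, and the function names are illustrative rather than the converters listed above.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the calendar fields of an XIOS date.
struct DateFields {
    int year, month, day, hour, minute, second;
};

// Nextsim-DG form (ISO 8601): "YYYY-MM-DDThh:mm:ssZ".
DateFields parseIsoDatetime(const std::string& text)
{
    DateFields d{};
    int used = -1;  // set by %n only if the whole pattern, including 'Z', matched
    const int n = std::sscanf(text.c_str(), "%d-%d-%dT%d:%d:%dZ%n", &d.year, &d.month,
                              &d.day, &d.hour, &d.minute, &d.second, &used);
    if (n != 6 || used != static_cast<int>(text.size()))
        throw std::invalid_argument("not an ISO 8601 datetime: " + text);
    return d;
}

// Assumed XIOS form: "YYYY-MM-DD hh:mm:ss" (no 'T', no 'Z').
DateFields parseXiosDatetime(const std::string& text)
{
    DateFields d{};
    int used = -1;
    const int n = std::sscanf(text.c_str(), "%d-%d-%d %d:%d:%d%n", &d.year, &d.month,
                              &d.day, &d.hour, &d.minute, &d.second, &used);
    if (n != 6 || used != static_cast<int>(text.size()))
        throw std::invalid_argument("not an XIOS datetime: " + text);
    return d;
}

static std::string formatDate(const DateFields& d, char separator, const char* suffix)
{
    assert(d.month >= 1 && d.month <= 12 && d.day >= 1 && d.day <= 31);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%04d-%02d-%02d%c%02d:%02d:%02d%s", d.year, d.month,
                  d.day, separator, d.hour, d.minute, d.second, suffix);
    return buf;
}

std::string toXiosDatetime(const DateFields& d) { return formatDate(d, ' ', ""); }
std::string toIsoDatetime(const DateFields& d) { return formatDate(d, 'T', "Z"); }
```

Parsing into fields first, rather than editing the string in place, means malformed input is rejected in both directions.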
Nextsim-DG uses ISO 8601 durations, except that it uses the convention PY-DTh:m:s, where D can run to 365 and h, m and s can be any int, including values larger than 24, 60 and 60 respectively. No additional padding is required where an entry is zero, e.g. 10 hours can be written P0-0T10:0:0. Use of the format PY-M-DTh:m:s is allowed, but 1M = 30 days and is converted to seconds internally for all calculations. We may wish to throw a warning in this scenario, and we will need to translate all M entries to 30 days to advance the duration in the XIOS calendar. We should ensure this is documented on the XIOS pages and on the Nextsim-DG time/config file pages.
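The month-to-days translation can be done while building the XIOS duration string. A sketch, assuming the two Nextsim-DG forms above and assuming that XIOS accepts space-separated terms with the unit suffixes y, d, h, mi and s (worth checking against the XIOS reference guide); the function name is hypothetical. Hours, minutes and seconds are passed through unnormalised, since Nextsim-DG allows values above 24, 60 and 60.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Convert "PY-DTh:m:s" or "PY-M-DTh:m:s" into an XIOS-style duration string.
// Months are folded into days at 30 days each; zero components are omitted.
std::string nextsimDurationToXios(const std::string& duration)
{
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, s = 0;
    int used = -1;  // characters consumed, set by %n only on a full match
    const int len = static_cast<int>(duration.size());
    const char* text = duration.c_str();

    const bool withMonths =
        std::sscanf(text, "P%d-%d-%dT%d:%d:%d%n", &y, &mo, &d, &h, &mi, &s, &used) == 6
        && used == len;
    if (!withMonths) {
        mo = 0;  // the failed attempt may have stored the day value here
        used = -1;
        const int n = std::sscanf(text, "P%d-%dT%d:%d:%d%n", &y, &d, &h, &mi, &s, &used);
        if (n != 5 || used != len)
            throw std::invalid_argument("unrecognised duration: " + duration);
    }
    d += 30 * mo;  // 1M = 30 days; a warning could be issued here when mo != 0

    std::string out;
    auto append = [&out](int value, const char* unit) {
        if (value == 0) return;
        if (!out.empty()) out += ' ';
        out += std::to_string(value) + unit;
    };
    append(y, "y");
    append(d, "d");
    append(h, "h");
    append(mi, "mi");
    append(s, "s");
    return out.empty() ? "0s" : out;
}
```

For example, P0-0T10:0:0 becomes "10h" and P1-1-2T3:4:5 becomes "1y 32d 3h 4mi 5s".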
We should consider how we handle time in the config files for Nextsim-DG.
Outstanding questions:
- After implementation, what is the best way of enabling use of Nextsim-DG when XIOS cannot be installed, built or linked correctly?
Implementing XIOS functionality in a separate IO handler class clearly defines the code's purpose and prevents bloat. Access to XIOS functionality should ordinarily be via the existing methods; a flag in the config file (or building without the XIOS library) will determine whether read/write goes via XIOS.
The XIOS handler class will be in core/ and, due to the above, dynamics and physics will not depend on it.
The XIOS class will be created and maintained by the Model (or an adjacent) class.
The XIOS handler will need to process information about the grid-processor distribution, which creates a dependency on the MPI feature.
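In the simplest case, the grid-processor information XIOS needs is the part of the global grid owned by each rank, which an XIOS domain describes with the attributes ni_glo, nj_glo, ibegin, ni, jbegin and nj. The sketch below computes those values for a plain split into bands of rows. The split itself is an assumption for illustration (the domain decomposition tool would supply load-balanced partitions instead), and the function name is hypothetical.

```cpp
#include <cassert>

// Portion of the global niGlo x njGlo grid owned by one MPI rank, expressed
// with the same quantities as XIOS domain attributes.
struct LocalDomain {
    int ibegin, ni, jbegin, nj;
};

// Split the rows as evenly as possible: the first (njGlo % nRanks) ranks get
// one extra row, and every rank owns the full width of the grid.
LocalDomain rowBandForRank(int niGlo, int njGlo, int nRanks, int rank)
{
    assert(nRanks > 0 && rank >= 0 && rank < nRanks);
    const int base = njGlo / nRanks;
    const int extra = njGlo % nRanks;
    const int nj = base + (rank < extra ? 1 : 0);
    const int jbegin = rank * base + (rank < extra ? rank : extra);
    return {0, niGlo, jbegin, nj};
}
```

Each rank would then pass its own ibegin/ni/jbegin/nj to XIOS when defining the domain.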
Interfacing with the XIOS library and making XIOS calls will be done by a handler class owned by Model or grid (TBC). Server initialisation will be handled when the XIOS class is created.
A minimal XIOS XML file will be provided in the MVP under core/src/, and the XIOS state will be handled (approximately) when the ModelState is initialised. The XIOS state will need to be updated periodically. Data writes will be performed alongside the Dev/Rect/Para-Grid DumpModelState methods.
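The proposed ownership can be sketched as Model holding an optional handler and routing its existing output entry point through it, so dynamics and physics never see XIOS. This assumes the Model (rather than grid) option; the class bodies are stubs with hypothetical names, and the real handler would initialise XIOS and send fields where the comments indicate.

```cpp
#include <memory>

class XiosHandler {
public:
    XiosHandler()
    {
        // The real handler would initialise XIOS (server/context) here.
    }
    // The real handler would send each field to XIOS; here we only record calls.
    void writeState(int step)
    {
        lastStep_ = step;
        ++writes_;
    }
    int writes() const { return writes_; }
    int lastStep() const { return lastStep_; }

private:
    int writes_ = 0;
    int lastStep_ = -1;
};

class Model {
public:
    explicit Model(bool useXios)
    {
        if (useXios) xios_ = std::make_unique<XiosHandler>();
    }

    // Existing entry point: callers do not need to know which backend is active.
    void dumpModelState(int step)
    {
        if (xios_) {
            xios_->writeState(step);
        } else {
            ++netcdfWrites_;  // stand-in for the current NetCDF output path
        }
    }

    const XiosHandler* xios() const { return xios_.get(); }
    int netcdfWrites() const { return netcdfWrites_; }

private:
    std::unique_ptr<XiosHandler> xios_;
    int netcdfWrites_ = 0;
};
```

Holding the handler in a std::unique_ptr ties its lifetime to Model, so XIOS finalisation could live in the handler's destructor.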
This will depend on MPI support (or create an upstream impact).
To be continued...
We will want tests which verify both that the XIOS library has been installed correctly by the user and does not produce errors at runtime, and that our usage of XIOS is, and continues to be, correct and behaves as intended.
Unit Tests:
- XIOS C++ demo run inside DocTest as part of the Nextsim test suite: Does it run? Can the XIOS server be initialised on Np=1,2,3?
- Can we recreate a 'golden' output file for a demo grid? (Requirement 2)
- Can we modify data and query it? (Requirement 4)
Integration/System Tests:
- Feature Control/Access Switch
- Can we control usage of XIOS via switch(es) in the config file? Do outputs match in both cases? (Requirement 3)
- Can we compile Nextsim-DG when XIOS is not linked during the build? (Requirement 1)
- Dump Model Data
- Can we run the model for each grid type and produce the 'golden' output file (restart.nc)? (Requirement 2)