Adding a hook mechanism #94

Open
tyralla opened this issue Nov 16, 2022 · 1 comment

tyralla commented Nov 16, 2022

Except for time-series data, HydPy projects usually only consist of Python code. On one side, we have the network, control parameter, and initial condition files defining a hydrological model for a specific basin. These usually exist within a single directory and contain standardised and readable Python code (but not always the most concise one). On the other side, short individual scripts or relatively complex packages consisting of modules and scripts define our workflows, for example, how we calibrate parameters or perform forecasts. These "workflow files" are usually clearly separated from the "model files" (often, the "model files" are generated by some of the "workflow files"). This separation comes with advantages but also disadvantages.

  1. In one project, we calibrated a precipitation correction factor (among other parameters) based on measured precipitation. For forecasting based on meteorological model output, we expected other precipitation errors and thus changed the precipitation correction factor but left all other parameters as is. We did so by loading the existing control files, modifying the correction factor, and saving this change by writing new control files. So now we have two colossal control directories that differ at most in one value per file.
  2. When setting up a project based on the "input node" concept, a part of the required configuration is not part of the "model files" but of the "workflow files". Hence, anybody writing a new (independent) script to run the model must know this detail and add the corresponding commands.
  3. Currently, XML files do not allow defining the deploy mode. Hence, it is not possible to calibrate a subcatchment based on the observed or previously simulated inflow of its upstream neighbours via tools like OpenDA.
  4. At the time of writing, the basics of the new submodel concept (Modularisation: the Submodel concept #90) and its first implementation (Support using GARTO as a submodel of HydPy-L #91) are working. However, many relevant features are still missing. For example, there is still no standard way to automatically read or write a submodel's control or condition file. Hence, we need to do some extra work in the "workflow files" and cannot use the XML support (without adding some hacks).
  5. Often, we want options in our river basin models, for example, including/excluding control structures to/from a river network. Currently, we write network files that are duplicated in most aspects in such cases.
  6. Last but not least, there is still no way to specify the most general options within the "model files". For example, each workflow script must know the data format of the available input time series (see Save project settings. #47).

These are the first disadvantages that came to my mind (we should probably extend the list later so that we do not miss any critical points). Of course, some will resolve themselves as HydPy evolves, but we can be fairly sure that others will emerge.

After initial discussions on this topic, we came to the preliminary conclusion that we should add a "hook mechanism" to HydPy. First, we would need to define some "hook points" (as Sphinx calls them), for example:

  1. after initialising the HydPy class (e.g. for setting the most general options)
  2. after executing prepare_network (e.g. for adding a dam model into the river network)
  3. after executing prepare_models (e.g. for adding GARTO submodels)
  4. after executing load_conditions (e.g. for loading the submodels' initial conditions)
  5. before executing the different load_series methods (e.g. to specify different source directories)
  6. after executing the different load_series methods (e.g. to load the time series of some "input nodes")

(Adding hook points before executing prepare_network, prepare_models, and load_conditions does not seem as important as for the different load_series methods, but it would possibly make the design more consistent.)
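To make the idea of hook points more concrete, here is a minimal sketch in plain Python. It does not use HydPy at all: `HookRegistry` and `Workflow` are hypothetical stand-ins, and the hook point names are assumptions modelled on the list above.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class HookRegistry:
    """Hypothetical registry mapping hook point names to callables."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def register(self, point: str, func: Callable[..., None]) -> None:
        self._hooks[point].append(func)

    def run(self, point: str, *args: Any, **kwargs: Any) -> None:
        # Hooks run in registration order; a point without hooks does nothing.
        for func in self._hooks.get(point, []):
            func(*args, **kwargs)


class Workflow:
    """Stand-in for a project object (not the real HydPy class)."""

    def __init__(self, registry: HookRegistry) -> None:
        self.registry = registry
        self.log: List[str] = []
        self.registry.run("after_init", self)

    def prepare_network(self) -> None:
        self.registry.run("before_prepare_network", self)
        self.log.append("prepare_network")
        self.registry.run("after_prepare_network", self)

    def prepare_models(self) -> None:
        self.registry.run("before_prepare_models", self)
        self.log.append("prepare_models")
        self.registry.run("after_prepare_models", self)


registry = HookRegistry()
registry.register("after_init", lambda wf: wf.log.append("set options"))
registry.register("after_prepare_network", lambda wf: wf.log.append("add dam"))

wf = Workflow(registry)
wf.prepare_network()
wf.prepare_models()
print(wf.log)
```

Providing both "before" and "after" points for every step costs little in such a design, which is one argument for the more consistent variant.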

Additionally, we could provide some "replacement hooks" to override the original behaviour completely, for example, for reading time series from an unsupported file type.
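A replacement hook differs from an "after" hook in that it is called instead of the built-in behaviour. The following sketch shows one possible shape, assuming replacements are selected by file extension. All names (`replacements`, `load_series`, the `.scsv` extension) are made up for illustration and are not HydPy API; the readers work on text rather than files to keep the example self-contained.

```python
from pathlib import Path
from typing import Callable, Dict, List

Reader = Callable[[str], List[float]]


def read_default(text: str) -> List[float]:
    """Stand-in for a built-in reader: one value per line."""
    return [float(line) for line in text.splitlines() if line.strip()]


def read_semicolon(text: str) -> List[float]:
    """User-defined reader for a hypothetical semicolon-separated format."""
    return [float(value) for value in text.split(";") if value.strip()]


# Hypothetical replacement hooks, keyed by file extension.
replacements: Dict[str, Reader] = {}


def load_series(filename: str, text: str) -> List[float]:
    # Use the replacement hook if one is registered, else the default reader.
    reader = replacements.get(Path(filename).suffix, read_default)
    return reader(text)


replacements[".scsv"] = read_semicolon

print(load_series("station_a.asc", "1.0\n2.0\n"))
print(load_series("station_a.scsv", "1.0;2.0;"))
```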

A hook function for preparing and connecting GARTO submodels could look like this:

import os
import runpy

from hydpy import HydPy, pub

def after_prepare_models(hp: HydPy) -> None:
    # Attach a GARTO soil submodel to every element tagged "catchment".
    for catchment in hp.elements.search_keywords("catchment"):
        filename = f"{catchment.name}_ga.py"
        filepath = os.path.join(pub.controlmanager.currentpath, filename)
        # Suppress the simulation step warning while executing the control file.
        with pub.options.warnsimulationstep(False):
            soilmodel = runpy.run_path(filepath)["model"]
        soilmodel.parameters.update()
        catchment.model.soilmodel = soilmodel

But where to define such functions within the project structure? And how to choose the relevant ones? In some cases, certain hook functions should always be executed, but in other cases, the user should choose and possibly even be able to pass some arguments (to select the relevant dam models, the suitable deploy mode and so on).

One could collect multiple methods addressing a specific hook point within one class. Specialised base classes would help clarify the connection and possibly add general functionality if required later. Decorators like optional and required could define under which circumstances the functions are executed (and eventually, how).

A first draft:

import os
import runpy

# AfterPrepareModels and the optional decorator are proposed names only.
from hydpy import AfterPrepareModels, pub

class AfterPrepareModelsLahnH(AfterPrepareModels):

    @optional
    def load_garto(self, keyword: str = "catchment") -> None:
        for catchment in self.hp.elements.search_keywords(keyword):
            filename = f"{catchment.name}_ga.py"
            filepath = os.path.join(pub.controlmanager.currentpath, filename)
            with pub.options.warnsimulationstep(False):
                soilmodel = runpy.run_path(filepath)["model"]
            soilmodel.parameters.update()
            catchment.model.soilmodel = soilmodel

    @optional
    def modify_precipitation_correction_factor(self, factor: float) -> None:
        for catchment in self.hp.elements.search_keywords("catchment"):
            catchment.model.parameters.control.pcorr(factor)
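How the optional and required decorators and the base class could cooperate is still open. As a rough sketch under my own assumptions, the decorators could merely tag the methods, and a base class (here the hypothetical `HookBase`, with a plain list standing in for the HydPy instance) could run all required hooks plus the explicitly activated optional ones:

```python
import inspect
from typing import Any, Callable, Dict, List


def optional(func: Callable[..., Any]) -> Callable[..., Any]:
    """Tag a method as a hook that runs only after explicit activation."""
    func._hook_kind = "optional"
    return func


def required(func: Callable[..., Any]) -> Callable[..., Any]:
    """Tag a method as a hook that always runs."""
    func._hook_kind = "required"
    return func


class HookBase:
    """Hypothetical base class for all hooks of one hook point."""

    def __init__(self, hp: Any) -> None:
        self.hp = hp
        self._active: Dict[str, Dict[str, Any]] = {}

    @classmethod
    def hook_names(cls, kind: str) -> List[str]:
        # inspect.getmembers returns the members sorted by name.
        return [
            name
            for name, member in inspect.getmembers(cls, inspect.isfunction)
            if getattr(member, "_hook_kind", None) == kind
        ]

    def activate(self, name: str, **kwargs: Any) -> None:
        if name not in self.hook_names("optional"):
            raise KeyError(f"No optional hook named {name!r}.")
        self._active[name] = kwargs

    def run(self) -> None:
        # Required hooks first, then optional hooks in activation order.
        for name in self.hook_names("required"):
            getattr(self, name)()
        for name, kwargs in self._active.items():
            getattr(self, name)(**kwargs)


class Demo(HookBase):
    @required
    def check(self) -> None:
        self.hp.append("check")

    @optional
    def scale(self, factor: float) -> None:
        self.hp.append(f"scale {factor}")

    @optional
    def unused(self) -> None:
        self.hp.append("unused")


calls: List[str] = []
demo = Demo(calls)
demo.activate("scale", factor=1.1)
demo.run()
print(calls)
```

Running required hooks before optional ones is an arbitrary choice of this sketch; the actual execution order would need to be decided.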

In XML, we could activate (optional) hook functions like this (not sure if this can work):

    <hooks>
        <modify_precipitation_correction_factor>
            <argument name="factor" value="1.0"/>
        </modify_precipitation_correction_factor>
    </hooks>

Alternatively (should work):

    <hooks>
        <function name="modify_precipitation_correction_factor">
            <argument name="factor" value="1.0"/>
        </function>
    </hooks>
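Reading the second XML variant should be straightforward with the standard library. The sketch below is only an assumption about how this could be done: it parses the `function` and `argument` elements and converts the string values using the type annotations of the hook function. The dummy hook function and the `parse_hooks` helper are hypothetical.

```python
import inspect
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List, Tuple

XML = """
<hooks>
    <function name="modify_precipitation_correction_factor">
        <argument name="factor" value="1.0"/>
    </function>
</hooks>
"""


def modify_precipitation_correction_factor(factor: float) -> str:
    """Dummy hook function that only reports what it would do."""
    return f"pcorr={factor}"


def parse_hooks(
    xml_text: str, available: Dict[str, Callable[..., Any]]
) -> List[Tuple[Callable[..., Any], Dict[str, Any]]]:
    """Return the selected hook functions with their converted arguments."""
    calls = []
    for element in ET.fromstring(xml_text.strip()).iter("function"):
        func = available[element.get("name")]
        signature = inspect.signature(func)
        kwargs: Dict[str, Any] = {}
        for argument in element.iter("argument"):
            name = argument.get("name")
            annotation = signature.parameters[name].annotation
            # XML values are strings; convert them only for simple annotations.
            convert = annotation if annotation in (int, float, str) else str
            kwargs[name] = convert(argument.get("value"))
        calls.append((func, kwargs))
    return calls


available = {
    "modify_precipitation_correction_factor": modify_precipitation_correction_factor
}
parsed = parse_hooks(XML, available)
results = [func(**kwargs) for func, kwargs in parsed]
print(results)
```

A fixed element name like `function` also keeps the XML schema independent of the hook names defined in individual projects.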

We could place all hook methods in a single file called "hooks.py" or so. An optional "hooks" directory with arbitrarily named "hook files" could be an alternative. The latter could simplify copy-pasting hook functions between different projects. On the downside, we would need to consider handling name collisions and execution orders of hook functions.
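For the "hooks" directory variant, one conceivable convention (an assumption, not a decision) would be to load the hook files in alphabetical order and to reject duplicate function names. The helper `collect_hooks` and the file names below are hypothetical. Note that this simple version would also pick up functions imported into a hook file.

```python
import inspect
import runpy
import tempfile
from pathlib import Path
from typing import Callable, Dict


def collect_hooks(directory: Path) -> Dict[str, Callable]:
    """Collect the public functions of all hook files in alphabetical file order."""
    hooks: Dict[str, Callable] = {}
    origins: Dict[str, str] = {}
    for path in sorted(directory.glob("*.py")):
        namespace = runpy.run_path(str(path))
        for name, obj in namespace.items():
            if name.startswith("_") or not inspect.isfunction(obj):
                continue
            if name in hooks:
                raise ValueError(
                    f"Hook {name!r} is defined in both {origins[name]} and {path.name}."
                )
            hooks[name] = obj
            origins[name] = path.name
    return hooks


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "10_dams.py").write_text("def add_dam():\n    return 'dam'\n")
    (root / "20_garto.py").write_text("def load_garto():\n    return 'garto'\n")
    hooks = collect_hooks(root)
    order = list(hooks)

    # A third file redefining add_dam triggers the collision check.
    (root / "30_copy.py").write_text("def add_dam():\n    return 'other dam'\n")
    try:
        collect_hooks(root)
        collision = None
    except ValueError as error:
        collision = str(error)

print(order)
print(collision)
```

Numeric file name prefixes would give users explicit control over the execution order, at the price of less freely chosen file names.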

A few first thoughts on using optional hook functions in Python scripts (not compatible with "after_hydpy_init"):

from hydpy import HydPy, pub
pub.timegrids = "2000-01-01", "2001-01-01", "1d"
hp = HydPy("LahnH")
hp.hooks.after_prepare_models["load_garto"].activate()
hp.hooks.after_prepare_models.modify_precipitation_correction_factor.activate(factor=1.0)
hp.prepare_network()
hp.prepare_models()
...

Or one would just execute them at the right time:

from hydpy import HydPy, pub
pub.timegrids = "2000-01-01", "2001-01-01", "1d"
hp = HydPy("LahnH")
hp.prepare_network()
hp.prepare_models()
hp.hooks.after_prepare_models["load_garto"]()
hp.hooks.after_prepare_models.modify_precipitation_correction_factor(factor=1.0)
...
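Both usage styles (activating a hook for later execution and calling it directly) could be supported by the same small wrapper objects. The following sketch assumes hypothetical `Hook` and `HookPoint` classes and uses dummy functions instead of real HydPy functionality. It also supports both the item access and the attribute access used above.

```python
from typing import Any, Callable, Dict, List, Optional


class Hook:
    """Hypothetical wrapper that can be activated or called directly."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.kwargs: Optional[Dict[str, Any]] = None  # None means inactive

    def activate(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


class HookPoint:
    """Hypothetical collection of all hooks of one hook point."""

    def __init__(self, **funcs: Callable[..., Any]) -> None:
        self._hooks = {name: Hook(func) for name, func in funcs.items()}

    def __getitem__(self, name: str) -> Hook:
        return self._hooks[name]

    def __getattr__(self, name: str) -> Hook:
        # Only called when the normal attribute lookup fails.
        try:
            return self.__dict__["_hooks"][name]
        except KeyError:
            raise AttributeError(name) from None

    def run_active(self) -> List[Any]:
        """Execute all activated hooks with their stored arguments."""
        return [
            hook.func(**hook.kwargs)
            for hook in self._hooks.values()
            if hook.kwargs is not None
        ]


point = HookPoint(
    load_garto=lambda keyword="catchment": f"garto:{keyword}",
    scale=lambda factor: f"pcorr={factor}",
)

# Style 1: activate now, let the framework execute at the hook point.
point["load_garto"].activate()
point.scale.activate(factor=1.0)
results = point.run_active()

# Style 2: execute directly at the right time.
direct = point.scale(factor=2.0)
print(results, direct)
```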

tyralla commented Sep 11, 2023

We added the general CallbackParameter, which allows manipulating parameter values via callback functions. See Calc_Discharge_V3 and user-defined-control for examples. This approach covers different use cases than the ones discussed here but is still strongly related, so I mention it here as a reminder for later discussions.
