diff --git a/docs/source/tutorials/artifact.rst b/docs/source/tutorials/artifact.rst index d819d05be..5e04d7336 100644 --- a/docs/source/tutorials/artifact.rst +++ b/docs/source/tutorials/artifact.rst @@ -2,6 +2,11 @@ Artifact ======== +.. todo:: + + This tutorial is very out of date and needs to be overhauled. The basic + concepts are still correct, but the code examples are not. + A data artifact is a bundle of input data associated with a particular model. It is typically stored as an ``hdf`` file on disk with very particular formatting. This file is then used by the :mod:`vivarium` simulations to fill diff --git a/docs/source/tutorials/boids.rst b/docs/source/tutorials/boids.rst index 2f9b2010f..4bb2dc833 100644 --- a/docs/source/tutorials/boids.rst +++ b/docs/source/tutorials/boids.rst @@ -47,7 +47,7 @@ Imports +++++++ .. literalinclude:: ../../../src/vivarium/examples/boids/population.py - :lines: 1-8 + :lines: 1-6 :linenos: `NumPy `_ is a library for doing high performance @@ -78,48 +78,22 @@ configuration information. Components typically expose the values they use in the ``configuration_defaults`` class attribute. .. literalinclude:: ../../../src/vivarium/examples/boids/population.py - :lines: 15-19 + :lines: 13-17 :dedent: 4 :linenos: - :lineno-start: 15 + :lineno-start: 13 We'll talk more about configuration information later. For now observe that we're exposing a set of possible colors for our boids. -The ``setup`` method -++++++++++++++++++++ - -Almost every component in Vivarium will have a setup method. The setup method -gives the component access to an instance of the -:class:`~vivarium.framework.engine.Builder` which exposes a handful of tools -to help build components. The simulation framework is responsible for calling -the setup method on components and providing the builder to them. We'll -explore these tools that the builder provides in detail as we go. - -.. 
literalinclude:: ../../../src/vivarium/examples/boids/population.py - :lines: 26-27 - :dedent: 4 - :linenos: - :lineno-start: 26 - -Our setup method is pretty simple: we just save the configured colors for later use. -The component is accessing the subsection of the configuration that it cares about. -The full simulation configuration is available from the builder as -``builder.configuration``. You can treat the configuration object just like -a nested python -`dictionary `_ -that's been extended to support dot-style attribute access. Our access here -mirrors what's in the ``configuration_defaults`` at the top of the class -definition. - The ``columns_created`` property ++++++++++++++++++++++++++++++++ .. literalinclude:: ../../../src/vivarium/examples/boids/population.py - :lines: 20 + :lines: 18 :dedent: 4 :linenos: - :lineno-start: 20 + :lineno-start: 18 The ``columns_created`` property tells Vivarium what columns (or "attributes") the component will add to the population table. @@ -138,6 +112,32 @@ See the next section for where we actually create these columns. __ https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html +The ``setup`` method +++++++++++++++++++++ + +Almost every component in Vivarium will have a setup method. The setup method +gives the component access to an instance of the +:class:`~vivarium.framework.engine.Builder` which exposes a handful of tools +to help build components. The simulation framework is responsible for calling +the setup method on components and providing the builder to them. We'll +explore these tools that the builder provides in detail as we go. + +.. literalinclude:: ../../../src/vivarium/examples/boids/population.py + :lines: 24-25 + :dedent: 4 + :linenos: + :lineno-start: 24 + +Our setup method is pretty simple: we just save the configured colors for later use. +The component is accessing the subsection of the configuration that it cares about. 
+The full simulation configuration is available from the builder as +``builder.configuration``. You can treat the configuration object just like +a nested python +`dictionary `_ +that's been extended to support dot-style attribute access. Our access here +mirrors what's in the ``configuration_defaults`` at the top of the class +definition. + The ``on_initialize_simulants`` method ++++++++++++++++++++++++++++++++++++++ @@ -148,10 +148,10 @@ This is where we should initialize values in the ``columns_created`` by this component. .. literalinclude:: ../../../src/vivarium/examples/boids/population.py - :lines: 33-41 + :lines: 31-39 :dedent: 4 :linenos: - :lineno-start: 33 + :lineno-start: 31 We see that like the ``setup`` method, ``on_initialize_simulants`` takes in a special argument that we don't provide. This argument, ``pop_data`` is an @@ -173,12 +173,12 @@ simulation time when the simulant was generated. simulation to look up information, calculate simulant-specific values, and update information about the simulants' state. -Using the population index, we generate a ``pandas.DataFrame`` on lines 34-40 +Using the population index, we generate a ``pandas.DataFrame`` on lines 32-38 and fill it with the initial values of 'entrance_time' and 'color' for each new simulant. Right now, this is just a table with data hanging out in our simulation. To actually do something, we have to tell Vivarium's population management system to update the underlying population table, which we do -on line 41. +on line 39. Putting it together +++++++++++++++++++ @@ -260,19 +260,19 @@ We call the :meth:`vivarium.framework.values.ValuesInterface.register_value_prod method to register a new pipeline. .. literalinclude:: ../../../src/vivarium/examples/boids/movement.py - :lines: 34-36 + :lines: 32-34 :dedent: 4 :linenos: - :lineno-start: 34 + :lineno-start: 32 This call provides a ``source`` function for our pipeline, which initializes the values. 
In this case, the default is zero acceleration: .. literalinclude:: ../../../src/vivarium/examples/boids/movement.py - :lines: 42-43 + :lines: 40-41 :dedent: 4 :linenos: - :lineno-start: 42 + :lineno-start: 40 This may seem pointless, since acceleration will always be zero. Value pipelines have another feature we will see later: other components can *modify* @@ -297,10 +297,10 @@ we simply call that pipeline as a function, using ``event.index``, which is the set of simulants affected by the event (in this case, all of them). .. literalinclude:: ../../../src/vivarium/examples/boids/movement.py - :lines: 63-87 + :lines: 61-85 :dedent: 4 :linenos: - :lineno-start: 63 + :lineno-start: 61 Putting it together +++++++++++++++++++ @@ -392,7 +392,7 @@ boids and maybe some arrows to indicated their velocity. .. literalinclude:: ../../../src/vivarium/examples/boids/visualization.py :caption: **File**: :file:`~/code/vivarium_examples/boids/visualization.py` - :lines: 1-18 + :lines: 1-17 We can then visualize our flock with @@ -466,7 +466,7 @@ magnitude. .. literalinclude:: ../../../src/vivarium/examples/boids/forces.py :caption: **File**: :file:`~/code/vivarium_examples/boids/forces.py` - :lines: 1-111 + :lines: 1-113 :linenos: To access the value pipeline we created in the Neighbors component, we use @@ -481,10 +481,10 @@ We register that the ``apply_force`` method will modify the acceleration values .. literalinclude:: ../../../src/vivarium/examples/boids/forces.py :caption: **File**: :file:`~/code/vivarium_examples/boids/forces.py` - :lines: 36-39 + :lines: 35-38 :dedent: 4 :linenos: - :lineno-start: 36 + :lineno-start: 35 Once we start adding components with these modifiers into our simulation, acceleration won't always be zero anymore! @@ -498,9 +498,9 @@ parameter: the distance within which it should act. .. 
literalinclude:: ../../../src/vivarium/examples/boids/forces.py :caption: **File**: :file:`~/code/vivarium_examples/boids/forces.py` - :lines: 117-168 + :lines: 116-167 :linenos: - :lineno-start: 117 + :lineno-start: 116 For a quick test of our swarming behavior, let's add in these forces and check in on our boids after 100 steps: @@ -546,7 +546,7 @@ Add this method to ``visualization.py``: .. literalinclude:: ../../../src/vivarium/examples/boids/visualization.py :caption: **File**: :file:`~/code/vivarium_examples/boids/visualization.py` - :lines: 21-42 + :lines: 20-41 Then, try it out like so: diff --git a/docs/source/tutorials/disease_model.rst b/docs/source/tutorials/disease_model.rst index fb3119681..5ba5f3de4 100644 --- a/docs/source/tutorials/disease_model.rst +++ b/docs/source/tutorials/disease_model.rst @@ -5,7 +5,7 @@ Disease Model ============= .. todo:: - Motivate the development of the disease model. We're trying to understand + Motivate the development of the disease model. We're trying to understand the impact of interventions. Here we'll produce a data-free disease model focusing on core Vivarium @@ -41,43 +41,58 @@ is one of the more complicated components in the simulation as it typically is responsible for bootstrapping some of the more interesting features in vivarium. -We need a population though. So we'll start with one here and defer explanation +We need a population, though, so we'll start with one here and defer explanation of some of the more complex pieces/systems until later. .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py :caption: **File**: :file:`~/code/vivarium/examples/disease_model/population.py` -There are a lot of things here. Let's take them piece by piece. -(*Note*: I'll be leaving out the docstrings in the code snippets below). +There are a lot of things here. Let's take them piece by piece. + +*Note: docstrings are left out of the code snippets below.* Imports +++++++ .. 
literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 1-5 + :lines: 1-8 -Aside from ``pandas``, we also import three classes from the core Vivarium -framework here. We'll use them to provide -`typing `_ information in method -signatures. +It's typical to import all required objects at the top of each module. In this case, +we are importing ``pandas`` and the Vivarium +:class:`Component ` class because they are used +explicitly throughout the file. Further, we import several objects from Python's +``typing`` package as well as three classes from the core Vivarium framework +which are used solely for `typing `_ +information in method signatures. .. note:: - Providing type hints in Python totally optional, but if you're using a + Providing type hints in Python is totally optional, but if you're using a modern python `IDE `_ or plugins for traditional text editors, they can offer you completion options and easy access to interface documentation. It also enables the use of other static analysis tools like `mypy `_. +BasePopulation Instantiation +++++++++++++++++++++++++++++ + +.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py + :lines: 11 + +We define a class called ``BasePopulation`` that inherits from the Vivarium +:class:`Component `. This inheritance is what +makes a class a proper Vivarium :term:`component` and gives it all the affordances that +come with that. + Default Configuration +++++++++++++++++++++ .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 8, 18-26 + :lines: 18-19, 25-32 You'll see this sort of pattern repeated in many, many Vivarium components. -We declare a configuration block as a class attribute for components. Vivarium +We declare a configuration block as a property for components. Vivarium has a :doc:`cascading configuration system ` that aggregates configuration data from many locations.
The configuration is essentially a declaration of the parameter space for the simulation. @@ -86,17 +101,31 @@ The most important thing to understand is that configuration values are given default values provided by the components and that they can be overriden with a higher level system like a command line argument later. -In this component in particular declares defaults for the age range for the +This component specifically declares defaults for the age range for the initial population of simulants. It also notes that there is a `'population_size'` key. This key has a default value set by Vivarium's population management system. +Columns Created ++++++++++++++++ +.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py + :lines: 34-36 + +This property is a list of the columns that the component will create in the +population state table. The population management system uses information about +what columns are created by which components in order to determine what order to +call initializers defined in separate classes. We'll see what this means in +practice later. + The ``__init__()`` method +++++++++++++++++++++++++ -Though Vivarium components are represented are represented by Python +Though Vivarium components are specific implementations of Python `classes `_ you'll notice -that many of the classes have very sparse ``__init__`` methods. +that many of the classes have very sparse ``__init__`` methods. Indeed, this +**BasePopulation** class does not even have one defined at this level (though +there is one in the **Component** parent class it inherits from). + Due to the way the simulation bootstraps itself, the ``__init__`` method is usually only used to assign names to generic components and muck with the ``configuration_defaults`` a bit. We'll see more of this later. @@ -107,11 +136,6 @@ The ``setup`` method Instead of the ``__init__`` method, most of the component initialization takes place in the ``setup`` method. -.. 
literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 32, 44-63 - :dedent: 4 - :linenos: - The signature for the ``setup`` method is the same in every component. When the framework is constructing the simulation it looks for a ``setup`` method on each component and calls that method with a @@ -146,30 +170,31 @@ method on each component and calls that method with a - ``builder.time`` : The simulation clock. - ``builder.components`` : The component management system. Primarily used for registering subcomponents for setup. + - ``builder.results`` : The results management system. This provides access + to stratification and observation registration functions. Let's step through the ``setup`` method and examine what's happening. -Line 2 simply grabs a copy of the simulation -:class:`configuration `. This is essentially -a dictionary that supports ``.``-access notation. - .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 44 + :lines: 43, 55-71 :dedent: 4 :linenos: - :lineno-start: 2 -Lines 4-13 interact with Vivarium's +Line 2 simply grabs a copy of the simulation +:class:`configuration `. This is essentially +a dictionary that supports ``.``-access notation. + +Lines 4-18 interact with Vivarium's :class:`randomness system `. Several things are happening here. -Lines 4-9 deal with the topic of :doc:`Common Random Numbers `, +Lines 4-13 deal with the topic of :doc:`Common Random Numbers `, a variance reduction technique employed by the Vivarium framework to make it easier to perform counterfactual analysis. It's not important to have a full grasp of this system at this point. .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 45-50 + :lines: 57-66 :dedent: 4 :linenos: :lineno-start: 4 @@ -186,7 +211,7 @@ grasp of this system at this point. For example, suppose we have two simulations of the world. 
We model the world as it is in the first simulation and we introduce a vaccine for the - flu in the second simulation. Unless my model explicitly encodes the causal + flu in the second simulation. Unless the model explicitly encodes the causal relationship between flu vaccination and vehicle traffic patterns, the person who died in a vehicle accident on the 43rd time step in the first simulation will also die in a vehicle accident on the 43rd time step @@ -195,25 +220,24 @@ grasp of this system at this point. In practice, what the CRN system requires is a way to uniquely identify simulants across simulations. We need to randomly generate some simulant characteristics in a repeatable fashion and then use those characteristics to -identify the simulants in the randomness system later. This is **only** handled -by the population component typically. It's vitally important to get right +identify the simulants in the randomness system later. This is (typically) **only** +handled by the population component. It's vitally important to get right when doing counterfactual analysis, but it's not especially important that you understand the mechanics of the implementation. In this component we're using some information about the configuration of the randomness system to let us know whether or not we care about using CRN. -We'll explore this much later when we're looking at running simulations with +We'll explore this later when we're looking at running simulations with interventions. -The next thing we do is grab actual -:class:`randomness streams ` +Finally, we grab actual :class:`randomness streams ` from the framework. .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 53-55 + :lines: 68-71 :dedent: 4 :linenos: - :lineno-start: 11 + :lineno-start: 15 ``get_stream`` is the only call most components make to the randomness system. 
The best way to think about randomness streams is as decision points in your @@ -221,67 +245,18 @@ simulation. Any time you need to answer a question that requires a random number, you should be using a randomness stream linked to that question. Here we have the questions "What age are my simulants when they enter the -simulation?" and "What sex are my simulants?" and streams to go along with -them. +simulation?" and "What sex are my simulants?"; we have assigned their corresponding +randomness streams to ``age_randomness`` and ``sex_randomness`` attributes, respectively. -The ``for_initialization`` argument tells the stream that the simulants you're -asking this question about won't already be registered with the randomness -system. This is the bootstrapping part. Here we're using the -``'entrance_time'`` and ``'age'`` to identify a simulant and so we need a +For ``age_randomness``, the ``initializes_crn_attributes`` argument +tells the stream that the simulants you're asking this question about won't already +be registered with the randomness system; this is the bootstrapping part. Here we're +using the ``'entrance_time'`` and ``'age'`` to identify a simulant and so we need a stream to initialize ages with. There is should really only be one of these initialization streams in a simulation. The ``'sex_randomness'`` is a much more typical example of how to interact -with the randomness system. - -Next we register the ``on_initialize_simulants`` method of our -``BasePopulation`` object as a population initializer and let the -:class:`population management system ` -know that it is responsible for generating the ``'age'``, ``'sex'``, -``'alive'``, and ``'entrance_time'`` columns in the population state table. - -.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 57-59 - :dedent: 4 - :linenos: - :lineno-start: 15 - -.. 
note:: - - **The Population Table** - - When we talk about columns in the context of Vivarium, we are typically - talking about the simulant :term:`attributes `. Vivarium - represents the population of simulants as a single - :class:`pandas.DataFrame`. We think of each simulant as a row in this table - and each column as an attribute of the simulants. - -Next we get a view into the population table. - -.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 61 - :dedent: 4 - :linenos: - :lineno-start: 19 - -:class:`Population views ` are -used both to query the current state of simulants and to update that state -information. When you request a population view from the builder, you must -tell it which columns in the population table you want to see, and so here we -pass along the same set of columns we've said we're creating. - -Finally, we register the ``age_simulants`` method as a listener to the -``'time_step'`` event using the -:class:`event system `. Vivarium -emits several :doc:`events ` over the course of the -simulation. Any time the ``'time_step'`` event is called, the ``age_simulants`` -method will be called as well. - -.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 63 - :dedent: 4 - :linenos: - :lineno-start: 21 +with the randomness system - we are simply getting the stream. **That was a lot of stuff** @@ -295,32 +270,50 @@ conventions. The ``on_initialize_simulants`` method ++++++++++++++++++++++++++++++++++++++ -During ``setup``, we registered this method with the framework as a -simulant initializer. You can name this whatever you like in practice, but I -have a tendency to give methods that the framework is calling names that -describe where in the simulation life-cycle they occur. This helps me think -more clearly about what's going on and helps debugging. +The primary purpose of this method (for this class) is to generate the initial +population. 
Specifically, it will generate the 'age', 'sex', 'alive', and +'entrance_time' columns for the population table (recall that the ``columns_created`` +property dictates that this component will indeed create these columns). -.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 65, 91-115 - :dedent: 4 - :linenos: +.. note:: + + **The Population Table** -Every initializer is called by the population management whenever simulants + When we talk about columns in the context of Vivarium, we are typically + talking about the simulant :term:`attributes `. Vivarium + represents the population of simulants as a single + :class:`pandas.DataFrame`. We think of each simulant as a row in this table + and each column as an attribute of the simulants. + +As previously mentioned, this class is a proper Vivarium :term:`Component`. Among +other things, this means that much of the setup happens automatically during the +simulation's ``Setup`` :doc:`lifecycle phase `. +There are several methods available to define for a component's setup, depending +on what you want to happen when: ``on_post_setup()``, ``on_initialize_simulants()`` +(this one), ``on_time_step_prepare()``, ``on_time_step()``, ``on_time_step_cleanup()``, +``on_collect_metrics()``, and ``on_simulation_end()``. The framework looks for +any of these methods during the setup phase and calls them if they are defined. +The fact that this method is called ``on_initialize_simulants`` guarantees that +it will be called during the population initialization phase of the simulation. + +This initializer method is called by the population management whenever simulants are created. For our purposes, this happens only once at the very beginning of the simulation. Typically, we'd task another component with responsibility for managing other ways simulants might enter (we might, for instance, have a ``Migration`` component that knows about how and when people enter and exit -our location of interest).
+our location of interest or a ``Fertility`` component that handles new simulants +being born). -The population management system uses information about what columns are -created by which components in order to determine what order to call -initializers defined in separate classes. We'll see what this means in -practice later. +We'll take this method line by line as we did with ``setup``. -We see that like the ``setup`` method, ``on_initialize_simulants`` takes in a -special argument that we don't provide. This argument, ``pop_data`` is an -instance of :class:`~vivarium.framework.population.manager.SimulantData` containing a +.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py + :lines: 77, 102-132 + :dedent: 4 + :linenos: + +First, we see that this method takes in a special argument that we don't provide. +This argument, ``pop_data`` is an instance of +:class:`~vivarium.framework.population.manager.SimulantData` containing a handful of information useful when initializing simulants. .. note:: @@ -338,32 +331,31 @@ handful of information useful when initializing simulants. - ``creation_window`` : The size of the time step over which the simulants are created. A ``pandas.Timedelta``. -We'll take this method line by line as we did with ``setup``. The most interesting thing that that the ``BasePopulation`` component does is manage the age of our simulants. Back in the ``configuration_defaults`` -we specified an ``'age_start'`` and ``'age_end'``. Here we use these +property we specified an ``'age_start'`` and ``'age_end'``. Here we use these to generate the age distribution of our initial population. .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 91-99 + :lines: 102-111 :dedent: 4 :linenos: :lineno-start: 2 We've built in support for two different kinds of populations based on the -``'age_start'`` and ``'age_end'`` specified in the configuration. 
If we get +``'age_start'`` and ``'age_end'`` specified in the configuration. If we get the same ``'age_start'`` and ``'age_end'``, we have a cohort, and so we smear -out ages within the width of a single time step (the ``creation_window``). +out ages within the width of a single time step (the ``pop_data.creation_window``). Otherwise, we assume our population is uniformly distributed within the age window bounded by ``'age_start'`` and ``'age_end'``. You can use demographic data here to generate arbitrarily complex starting populations. The only thing really of note here is the call to -``self.age_randomness.get_draw``. If we recall from the ``setup`` method, +``self.age_randomness.get_draw``. If we recall from the ``setup`` method, ``self.age_randomness`` is an instance of a :class:`~vivarium.framework.randomness.stream.RandomnessStream` which supports several -convenience methods for interacting with random numbers. ``get_draw`` takes +convenience methods for interacting with random numbers. ``get_draw`` takes in an ``index`` representing particular simulants and returns a ``pandas.Series`` with a uniformly drawn random number for each simulant in the index. @@ -386,10 +378,10 @@ These ``key_columns`` are what the randomness system uses to uniquely identify simulants across simulations. .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 101-106 + :lines: 113-120 :dedent: 4 :linenos: - :lineno-start: 2 + :lineno-start: 13 If we are using CRN, we must generate these columns before any other calls are made to the randomness system with the population index. We then @@ -406,39 +398,45 @@ If we're not using CRN, we can just generate the full set of simulant attributes straightaway. .. 
literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 107-113 + :lines: 121-130 :dedent: 4 :linenos: - :lineno-start: 2 + :lineno-start: 21 In either case, we are hanging on to a table representing some attributes of our new simulants. However, this table does not matter yet because the simulation's population system doesn't know anything about it. We must first inform the simulation by passing in the ``DataFrame`` to our :class:`population view's ` -``update`` method. This method is the only way to modify the underlying +``update`` method. This method is the only way to modify the underlying population table. +.. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py + :lines: 132 + :dedent: 4 + :linenos: + :lineno-start: 32 + .. warning:: The data generated and passed into the population view's ``update`` method must have the same index that was passed in with the ``pop_data``. You can potentially cause yourself a great deal of headache otherwise. -Aging our simulants -+++++++++++++++++++ +The ``on_time_step`` method ++++++++++++++++++++++++++++ The last piece of our population component is the ``'time_step'`` listener -method ``age_simulants``. +method ``on_time_step``. .. literalinclude:: ../../../src/vivarium/examples/disease_model/population.py - :lines: 117, 127-129 + :lines: 134, 144-146 :dedent: 4 :linenos: This method takes in an :class:`~vivarium.framework.event.Event` argument provided by the simulation. This is very similar to the ``SimulantData`` -argument provided to ``on_initialize_simulants``. It carries around +argument provided to ``on_initialize_simulants``. It carries around some information about what's happening in the event. .. note:: @@ -459,20 +457,20 @@ some information about what's happening in the event. about here. In order to age our simulants, we first acquire a copy of the current -population state from our population view. 
In addition to the ``update`` +population state from our population view. In addition to the ``update`` method, population views also support a ``get`` method that takes in an index and an optional ``query`` used to filter down the returned -population. Here, we only want to increase the age of people still living. +population. Here, we only want to increase the age of people still living. The ``query`` argument needs to be consistent with the :meth:`pandas.DataFrame.query` method. What we get back is another ``pandas.DataFrame`` containing the filtered -rows corresponding to the index we passed in. The columns of the returned -``DataFrame`` are precisely the columns we specified when we created the -view. +rows corresponding to the index we passed in. The columns of the returned +``DataFrame`` are precisely the columns this component created (as well as +any additional ``columns_required``, of which this component has none). We next update the age of our simulants by adding on the width of the time step -to their current age and passing the update table to the ``update`` method +to their current age and passing the updated table to the ``update`` method of our population view as we did in ``on_initialize_simulants`` Examining our work @@ -493,12 +491,28 @@ Now that we've done all this hard work, let's see what it gives us. :: - tracked alive sex age entrance_time - 0 True alive Male 78.088109 2005-07-01 - 1 True alive Male 44.072665 2005-07-01 - 2 True alive Female 48.346571 2005-07-01 - 3 True alive Female 91.002147 2005-07-01 - 4 True alive Female 63.641191 2005-07-01 + tracked sex age entrance_time alive + 0 True Female 78.088109 2005-07-01 alive + 1 True Female 44.072665 2005-07-01 alive + 2 True Female 48.346571 2005-07-01 alive + 3 True Female 91.002147 2005-07-01 alive + 4 True Female 63.641191 2005-07-01 alive + +.. 
testcode:: + :hide: + + import pandas as pd + + from vivarium import InteractiveContext + from vivarium.examples.disease_model.population import BasePopulation + + config = {'randomness': {'key_columns': ['entrance_time', 'age']}} + sim = InteractiveContext(components=[BasePopulation()], configuration=config) + expected = pd.DataFrame({ + 'age': [78.08810902, 44.07266518, 48.34657108, 91.00214722, 63.64119145], + 'sex': ['Female']*5, + }) + pd.testing.assert_frame_equal(sim.get_population().head()[['age', 'sex']], expected) Great! We generate a population with a non-trivial age and sex distribution. Let's see what happens when our simulation takes a time step. @@ -508,30 +522,27 @@ Let's see what happens when our simulation takes a time step. sim.step() print(sim.get_population().head()) - :: - tracked alive sex age entrance_time - 0 True alive Male 78.090849 2005-07-01 - 1 True alive Male 44.075405 2005-07-01 - 2 True alive Female 48.349311 2005-07-01 - 3 True alive Female 91.004887 2005-07-01 - 4 True alive Female 63.643931 2005-07-01 + tracked sex age entrance_time alive + 0 True Female 78.090849 2005-07-01 alive + 1 True Female 44.075405 2005-07-01 alive + 2 True Female 48.349311 2005-07-01 alive + 3 True Female 91.004887 2005-07-01 alive + 4 True Female 63.643931 2005-07-01 alive -Everyone gets older! Right now though, we could just keep taking steps -in our simulation and people would continue getting older. This, of course, -does not reflect how the world goes. Time to introduce the grim reaper. .. testcode:: :hide: - from vivarium import InteractiveContext - from vivarium.examples.disease_model import BasePopulation + import numpy as np - config = {'randomness': {'key_columns': ['entrance_time', 'age']}} - sim = InteractiveContext(components=[BasePopulation()], configuration=config) sim.step() + assert np.isclose((sim.get_population().head()['age'] - expected['age'])*365, 1, 0.000001).all() +Everyone gets older by exactly one time step! 
We could just keep taking steps in +our simulation and people would continue getting infinitely older. This, of +course, does not reflect how the world goes. Time to introduce the grim reaper. Mortality --------- @@ -548,53 +559,58 @@ in the ``BasePopulation`` component used again here and a few new things. Let's dive in. -What's new in the configuration? -++++++++++++++++++++++++++++++++ +Default Configuration ++++++++++++++++++++++ Since we're building our disease model without data to inform it, we'll expose all the important bits of the model as parameters in the configuration. .. literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py - :lines: 8, 19-23 - :linenos: + :lines: 16-17, 23-27 Here we're specifying the overall mortality rate in our simulation. Rates have units! We'll phrase our model with rates specified in terms of events per person-year. So here we're specifying a uniform mortality rate of 0.01 deaths -per person-year. This is obviously not realistic. Using toy data like this is -often extremely useful in validating a model though. +per person-year. This is obviously not realistic, but using toy data like this is +often extremely useful in validating a model. + +Columns Required +++++++++++++++++ + +.. literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py + :lines: 29-31 -Setting up the mortality component -++++++++++++++++++++++++++++++++++ +While this component does not create any new columns like the ``BasePopulation`` +component, it does require the ``'tracked'`` and ``'alive'`` columns to be +present in the population table. You'll see that these columns are indeed used +in the ``on_time_step`` and ``on_time_step_prepare`` methods. -Many of the tools we explored in the ``BasePopulation`` component are -used again here. There are two new things to look at. 
+The ``setup`` method +++++++++++++++++++++ + +There is not a whole lot going on in this setup method, but there is one new concept +we should discuss. .. literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py - :lines: 29, 41-47 - :dedent: 4 - :linenos: + :lines: 38, 50-55 + +The first two lines are simply adding some useful attributes: the mortality-specific +configuration and the mortality randomness stream (which is used to answer the +question "which simulants died at this time step?"). -The first comes in line 3. Previously, we'd acquired a population view -from the builder and then supplied a query to filter out dead people when -we were requesting the population table from the view. We can also provide -a default query when we construct the view and bypas the query argument -when requesting the population table from the view later. In line 3 we're -saying we want a view of the ``'alive'`` column of the population table, -but only for those people who are actually alive in the current time step. - -The other feature of note is is the introduction of the -:class:`values system ` in line -6. The values system provides a way of distributing the computation of a -value over multiple components. This is a bit difficult to get used to, +The main feature of note is the introduction of the +:class:`values system `. +The values system provides a way of distributing the computation of a +value over multiple components. This can be a bit difficult to grasp, but is vital to the way we think about components in Vivarium. The best way to understand this system is by :doc:`example. ` -In our current context we introduce a named value "pipeline" into the -simulation called ``'mortality_rate'``. The source for a value is always a -callable function or method. It typically takes in a ``pandas.Index`` as its -only argument. Other things are possible, but not necessary for our current use -case. 
+In our current context we register a named value "pipeline" into the +simulation called ``'mortality_rate'`` via the ``builder.value.register_rate_producer`` +method. The source for a value is always a callable function or method +(``self.base_mortality_rate`` in this case) which typically takes in a +``pandas.Index`` as its only argument. Other things are possible, but not +necessary for our current use case. The ``'mortality_rate'`` source is then responsible for returning a ``pandas.Series`` containing a base mortality rate for each simulant @@ -603,36 +619,17 @@ as modifiers to this base rate. We'll see more of this once we get to the disease modelling portion of the tutorial. The value system will coordinate how the base value is modified behind the -scenes and return the results of all computations wherever the pipeline is -called from (here, in the soon to be discussed ``determine_deaths`` method. - -Supplying a base mortality rate -+++++++++++++++++++++++++++++++ - -As just discussed, the ``base_mortality_rate`` method is the source for -the ``'mortality_rate'`` value. Here we take in an index and build -a ``pandas.Series`` that assigns each individual the mortality rate -specified in the configuration. - -.. literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py - :lines: 49, 62 - :dedent: 4 - :linenos: - -In an actual simulation, we'd inform the base mortality rate with data -specific to the age, sex, location, year (and potentially other demographic -factors) that represent each simulant. We might disaggregate or interpolate -our data here as well. Which is all to say, the source of a data pipeline can -do some pretty complicated stuff. +scenes and return the results of all computations whenever the pipeline is +called (in the ``on_time_step`` method in this case - see below). 
-Determining who dies -++++++++++++++++++++ +The ``on_time_step`` method ++++++++++++++++++++++++++++ -Like our aging method in the population component, our ``determine_deaths`` -method responds to ``'time_step'`` events. +Similar to how we aged simulants in the population component, we determine which +simulants die during ``'time_step'`` events. .. literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py - :lines: 64, 74-78 + :lines: 61, 71-77 :dedent: 4 :linenos: @@ -643,15 +640,52 @@ this changes once we bring in a disease. Importantly for now though, the pipeline is automatically rescaling the rate down to the size of the time steps we're taking. -In lines 3-5, we determine who died this time step. We turn our mortality rate +In lines 3-5, we determine who died this time step. We turn our mortality rate into a probability of death in the given time step by assuming deaths are `exponentially distributed `_ -and using the inverse distribution function. -We then draw a uniformly distributed random number for each person and -determine who died by comparing that number to the computed probability of -death for the individual. +and using the inverse distribution function. We then draw a uniformly distributed +random number for each person and determine who died by comparing that number to +the computed probability of death for the individual. + +Finally, we update the state table ``'alive'`` column with the newly dead simulants. + +Note that when getting a view of the state table to update, we are using the +``subview`` method which returns only the columns requested. + +The ``on_time_step_prepare`` method ++++++++++++++++++++++++++++++++++++ + +This method simply updates any simulants who died during the previous time step +to be marked as untracked (that is, their ``'tracked'`` value is set to ``False``). + +.. 
literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py + :lines: 79, 92-96 -Finally, in line 6, we update the state table with the newly dead simulants. +Why didn't we update the newly-dead simulants ``'tracked'`` values at the same time +as their ``'alive'`` values in the ``on_time_step`` method? The reason is that the +deaths observer (discussed later) records the number of deaths that occurred during +the previous time step during the ``collect_metrics`` phase. By updating +the ``'alive'`` column during the ``time_step`` phase (which occurs *before* +``collect_metrics``) and the ``'tracked'`` column during the ``time_step_prepare`` +phase (which occurs *after* ``collect_metrics``), we ensure that the observer +can distinguish which simulants died specifically during the previous time step. + +Supplying a base mortality rate ++++++++++++++++++++++++++++++++ + +As discussed above, the ``base_mortality_rate`` method is the source for +the ``'mortality_rate'`` value. Here we take in an index and build +a ``pandas.Series`` that assigns each individual the mortality rate +specified in the configuration. + +.. literalinclude:: ../../../src/vivarium/examples/disease_model/mortality.py + :lines: 102, 115 + +In an actual simulation, we'd inform the base mortality rate with data +specific to the age, sex, location, year (and potentially other demographic +factors) that represent each simulant. We might disaggregate or interpolate +our data here as well. Which is all to say, the source of a data pipeline can +do some pretty complicated stuff. Did it work? ++++++++++++ @@ -663,7 +697,7 @@ can see the impact of our mortality component without taking too many steps. .. 
code-block:: python - from vivarium InteractiveContext + from vivarium import InteractiveContext from vivarium.examples.disease_model.population import BasePopulation from vivarium.examples.disease_model.mortality import Mortality @@ -681,60 +715,78 @@ can see the impact of our mortality component without taking too many steps. :: - tracked alive sex age entrance_time - 0 True alive Male 78.088109 2005-07-01 - 1 True alive Male 44.072665 2005-07-01 - 2 True alive Female 48.346571 2005-07-01 - 3 True alive Female 91.002147 2005-07-01 - 4 True alive Female 63.641191 2005-07-01 + tracked sex age entrance_time alive + 0 True Female 78.088109 2005-07-01 alive + 1 True Female 44.072665 2005-07-01 alive + 2 True Female 48.346571 2005-07-01 alive + 3 True Female 91.002147 2005-07-01 alive + 4 True Female 63.641191 2005-07-01 alive -This looks (exactly!) the same as last time. Good. +.. testcode:: + :hide: + + from vivarium.examples.disease_model.mortality import Mortality + + config = { + 'population': { + 'population_size': 100_000 + }, + 'randomness': { + 'key_columns': ['entrance_time', 'age'] + } + } + sim = InteractiveContext(components=[BasePopulation(), Mortality()], configuration=config) + + expected = pd.DataFrame({ + 'age': [78.08810902, 44.07266518, 48.34657108, 91.00214722, 63.64119145], + 'sex': ['Female']*5, + }) + pd.testing.assert_frame_equal(sim.get_population().head()[['age', 'sex']], expected) + +This looks (exactly!) the same as it did prior to implementing mortality. Good - +we haven't taken a time step yet and so no one should have died. .. code-block:: python - sim.get_population().alive.value_counts() + print(sim.get_population().alive.value_counts()) :: + alive alive 100000 - Name: alive, dtype: int64 + Name: count, dtype: int64 -Just checking that everyone is alive. Let's run our simulation for a while +.. testcode:: + :hide: + + assert sim.get_population().alive.value_counts().alive == 100_000 + +Just checking that everyone is alive. 
Let's run our simulation for a while and see what happens. .. code-block:: python sim.take_steps(365) # Run for one year with one day time steps - sim.get_population().alive.value_counts() + sim.get_population('tracked==True').alive.value_counts() :: - alive 99037 - dead 963 - Name: alive, dtype: int64 + alive + alive 99015 + dead 985 + Name: count, dtype: int64 -We simulated somewhere between 99,037 (if everyone died in the first time step) +We simulated somewhere between 99,015 (if everyone died in the first time step) and 100,000 (if everyone died in the last time step) living person-years and -saw 963 deaths. This means our empirical mortality rate is somewhere close -to 0.0097 deaths per person-year, very close to the 0.01 rate we provided. +saw 985 deaths. This means our empirical mortality rate is somewhere close +to 0.0099 deaths per person-year, very close to the 0.01 rate we provided. .. testcode:: :hide: - from vivarium import InteractiveContext - from vivarium.examples.disease_model import BasePopulation, Mortality - - config = { - 'population': { - 'population_size': 100_000 - }, - 'randomness': { - 'key_columns': ['entrance_time', 'age'] - } - } - sim = InteractiveContext(components=[BasePopulation(), Mortality()], configuration=config) sim.take_steps(2) + assert sim.get_population('tracked==True')['alive'].value_counts()['dead'] == 6 Disease ------- @@ -808,6 +860,7 @@ observations up to this point in the simulation. configuration=config ) sim.take_steps(365) # Run for one year with one day time steps + print(sim.get_results()["dead"]) print(sim.get_results()["ylls"]) @@ -825,20 +878,8 @@ been a total of 27,987 years of life lost. .. 
testcode:: :hide: - from vivarium import InteractiveContext - from vivarium.examples.disease_model.population import BasePopulation - from vivarium.examples.disease_model.mortality import Mortality from vivarium.examples.disease_model.observer import DeathsObserver, YllsObserver - config = { - 'population': { - 'population_size': 100_000 - }, - 'randomness': { - 'key_columns': ['entrance_time', 'age'] - } - } - sim = InteractiveContext( components=[ BasePopulation(), diff --git a/docs/source/tutorials/exploration.rst b/docs/source/tutorials/exploration.rst index 0aafefc32..db0cc97fb 100644 --- a/docs/source/tutorials/exploration.rst +++ b/docs/source/tutorials/exploration.rst @@ -272,12 +272,12 @@ your starting population. :: - tracked sex alive age entrance_time child_growth_failure_propensity diarrhea - 0 True Female alive 3.452598 2005-06-28 0.552276 susceptible_to_diarrhea - 1 True Female alive 4.773249 2005-06-28 0.019633 susceptible_to_diarrhea - 2 True Male alive 23.423383 2005-06-28 0.578892 susceptible_to_diarrhea - 3 True Female alive 13.792463 2005-06-28 0.988650 susceptible_to_diarrhea - 4 True Male alive 0.449368 2005-06-28 0.407759 susceptible_to_diarrhea + tracked age alive sex entrance_time lower_respiratory_infections child_wasting_propensity + 0 True 4.341734 alive Male 2021-12-31 12:00:00 susceptible_to_lower_respiratory_infections 0.612086 + 1 True 1.009906 alive Male 2021-12-31 12:00:00 susceptible_to_lower_respiratory_infections 0.395465 + 2 True 1.166290 alive Male 2021-12-31 12:00:00 susceptible_to_lower_respiratory_infections 0.670765 + 3 True 4.075051 alive Female 2021-12-31 12:00:00 susceptible_to_lower_respiratory_infections 0.289266 + 4 True 2.133430 alive Female 2021-12-31 12:00:00 susceptible_to_lower_respiratory_infections 0.700001 This gives you a ``pandas.DataFrame`` representing your starting population. 
You can use it to check all sorts of characteristics about individuals or diff --git a/src/vivarium/examples/boids/forces.py b/src/vivarium/examples/boids/forces.py index c2a9897d8..eacbca393 100644 --- a/src/vivarium/examples/boids/forces.py +++ b/src/vivarium/examples/boids/forces.py @@ -1,12 +1,11 @@ from abc import ABC, abstractmethod -from typing import Any, Dict, List, Optional +from typing import Any, Dict import numpy as np import pandas as pd from vivarium import Component from vivarium.framework.engine import Builder -from vivarium.framework.population import SimulantData class Force(Component, ABC): diff --git a/src/vivarium/examples/boids/movement.py b/src/vivarium/examples/boids/movement.py index 88f2ff54d..dd081783a 100644 --- a/src/vivarium/examples/boids/movement.py +++ b/src/vivarium/examples/boids/movement.py @@ -1,5 +1,3 @@ -from typing import Any, Dict, List - import numpy as np import pandas as pd diff --git a/src/vivarium/examples/boids/neighbors.py b/src/vivarium/examples/boids/neighbors.py index 8fa054b87..60db8cbae 100644 --- a/src/vivarium/examples/boids/neighbors.py +++ b/src/vivarium/examples/boids/neighbors.py @@ -1,5 +1,3 @@ -from typing import Any, Dict, List, Optional - import pandas as pd from scipy import spatial diff --git a/src/vivarium/examples/boids/population.py b/src/vivarium/examples/boids/population.py index 34ff2924a..ad6abc4a1 100644 --- a/src/vivarium/examples/boids/population.py +++ b/src/vivarium/examples/boids/population.py @@ -1,5 +1,3 @@ -from typing import Any, Dict, List - import numpy as np import pandas as pd diff --git a/src/vivarium/examples/boids/visualization.py b/src/vivarium/examples/boids/visualization.py index 6b7ba8a9c..f8f3ada78 100644 --- a/src/vivarium/examples/boids/visualization.py +++ b/src/vivarium/examples/boids/visualization.py @@ -1,5 +1,4 @@ import matplotlib.pyplot as plt -import numpy as np from matplotlib.animation import FuncAnimation diff --git 
a/src/vivarium/examples/disease_model/disease.py b/src/vivarium/examples/disease_model/disease.py index c0b8bc0d8..93eb612f1 100644 --- a/src/vivarium/examples/disease_model/disease.py +++ b/src/vivarium/examples/disease_model/disease.py @@ -1,4 +1,4 @@ -from typing import Any, Dict, List, Optional +from typing import List, Optional import pandas as pd @@ -93,7 +93,7 @@ def setup(self, builder: Builder): Parameters ---------- - builder : `engine.Builder` + builder Interface to several simulation tools. """ super().setup(builder) diff --git a/src/vivarium/examples/disease_model/mortality.py b/src/vivarium/examples/disease_model/mortality.py index 5c58ea7e8..1908886bc 100644 --- a/src/vivarium/examples/disease_model/mortality.py +++ b/src/vivarium/examples/disease_model/mortality.py @@ -15,10 +15,10 @@ class Mortality(Component): @property def configuration_defaults(self) -> Dict[str, Any]: - """ - A set of default configuration values for this component. These can be - overwritten in the simulation model specification or by providing - override values when constructing an interactive simulation. + """A set of default configuration values for this component. + + These can be overwritten in the simulation model specification or by + providing override values when constructing an interactive simulation. """ return { "mortality": { @@ -44,7 +44,7 @@ def setup(self, builder: Builder) -> None: Parameters ---------- - builder : + builder Access to simulation tools and subsystems. """ self.config = builder.configuration.mortality @@ -63,7 +63,7 @@ def on_time_step(self, event: Event) -> None: Parameters ---------- - event : + event An event object emitted by the simulation containing an index representing the simulants affected by the event and timing information. 
@@ -84,7 +84,7 @@ def on_time_step_prepare(self, event: Event) -> None: Parameters ---------- - event : + event An event object emitted by the simulation containing an index representing the simulants affected by the event and timing information. @@ -104,7 +104,7 @@ def base_mortality_rate(self, index: pd.Index) -> pd.Series: Parameters ---------- - index : + index A representation of the simulants to compute the base mortality rate for. diff --git a/src/vivarium/examples/disease_model/population.py b/src/vivarium/examples/disease_model/population.py index 5064f4155..3476a4251 100644 --- a/src/vivarium/examples/disease_model/population.py +++ b/src/vivarium/examples/disease_model/population.py @@ -17,10 +17,10 @@ class BasePopulation(Component): @property def configuration_defaults(self) -> Dict[str, Any]: - """ - A set of default configuration values for this component. These can be - overwritten in the simulation model specification or by providing - override values when constructing an interactive simulation. + """A set of default configuration values for this component. + + These can be overwritten in the simulation model specification or by + providing override values when constructing an interactive simulation. """ return { "population": { @@ -49,7 +49,7 @@ def setup(self, builder: Builder) -> None: Parameters ---------- - builder : + builder Access to simulation tools and subsystems. """ self.config = builder.configuration @@ -80,24 +80,23 @@ def on_initialize_simulants(self, pop_data: SimulantData) -> None: This component is responsible for creating and filling four columns in the population state table: - 'age' : + 'age' The age of the simulant in fractional years. - 'sex' : + 'sex' The sex of the simulant. One of {'Male', 'Female'} - 'alive' : + 'alive' Whether or not the simulant is alive. One of {'alive', 'dead'} - 'entrance_time' : + 'entrance_time' The time that the simulant entered the simulation. The 'birthday' for simulants that enter as newborns. 
A `pandas.Timestamp`. Parameters ---------- - pop_data : + pop_data A record containing the index of the new simulants, the start of the time step the simulants are added on, the width of the time step, and the age boundaries for the simulants to generate. - """ age_start = pop_data.user_data.get("age_start", self.config.population.age_start) @@ -137,7 +136,7 @@ def on_time_step(self, event: Event) -> None: Parameters ---------- - event : + event An event object emitted by the simulation containing an index representing the simulants affected by the event and timing information.