Lots of `Modules` perform the same initial setting of DataFrame columns in `pre_initialise_simulation` #1428

willGraham01 · 2024-07-18T12:21:29Z

Each Module subclass defines a list of PROPERTIES that it "owns" in the population DataFrame. During the initialise_population method, each Module subclass then sets the initial values for said columns.

However, the majority of Module subclasses seem to perform the same actions in this method: they set the columns defined in self.PROPERTIES to the default value for each property's type. As an example, consider the chronicsyndrome module, where the PROPERTIES match up identically to the columns set in the population DataFrame during initialise_population. There are a few other cases where a module sets the initial value of one column to something other than the default value defined by the Property class, but still explicitly sets all other property columns to their default values.

Other modules do this and a bit more. Alri, for example, has 10 properties and also sets an additional group of columns on top of those defined in its PROPERTIES. However, it still needs to set the PROPERTIES columns to their defaults too.

We can likely automate this process for the vast majority of modules by implementing initialise_population generally in the Module class itself:

def initialise_population(self, population: Population) -> None:
    df = population.props

    # Set every column this module owns to its default for all living individuals.
    for property_name, prop in self.PROPERTIES.items():
        df.loc[df.is_alive, property_name] = (
            prop._default_value
            # If the property is a CATEGORICAL, we might have to look up the correct default value to assign here
        )

And then allowing Property to take an override at instantiation, giving the default value to assign to a series:

class Property(Specifiable):
    def __init__(
        self,
        type_: Types,
        description: str,
        categories: Optional[Set[Any]] = None,
        *,
        ordered: bool = False,
        default_property_value: Optional[Any] = None,
    ) -> None:
        # All the usual stuff we already do
        ...

        # Set supplied default value, if appropriate
        self._default_property_value = (
            default_property_value
            if default_property_value is not None
            and (
                (
                    self.type_ is Types.CATEGORICAL
                    and default_property_value in categories
                )
                or isinstance(default_property_value, self.python_type)
            )
            else None
        )

    @property
    def _default_value(self) -> Any:
        return (
            self.PANDAS_TYPE_DEFAULT_VALUE_MAP[self.pandas_type] # Fall back to dtype map if no explicit default was given.
            if self._default_property_value is None
            else self._default_property_value
        )
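To make the fallback behaviour concrete, here is a minimal, framework-free sketch of the same idea. The `Types` enum, `FALLBACK_DEFAULTS`, `PYTHON_TYPES` and `_is_valid` are hypothetical stand-ins invented for illustration, not the framework's real definitions. The sketch is also slightly stricter than the snippet above, in that a categorical default is only accepted if it is one of the categories.

```python
from enum import Enum, auto
from typing import Any, Optional, Set


class Types(Enum):
    """Toy stand-in for the framework's Types enum (hypothetical subset)."""

    INT = auto()
    BOOL = auto()
    CATEGORICAL = auto()


# Hypothetical fallback defaults and Python types, keyed by property type.
FALLBACK_DEFAULTS = {Types.INT: 0, Types.BOOL: False, Types.CATEGORICAL: None}
PYTHON_TYPES = {Types.INT: int, Types.BOOL: bool}


class Property:
    def __init__(
        self,
        type_: Types,
        description: str,
        categories: Optional[Set[Any]] = None,
        *,
        default_property_value: Optional[Any] = None,
    ) -> None:
        self.type_ = type_
        self.description = description
        self.categories = categories
        # Keep the override only if it is valid for this property type.
        self._default_property_value = (
            default_property_value if self._is_valid(default_property_value) else None
        )

    def _is_valid(self, value: Any) -> bool:
        if value is None:
            return False
        if self.type_ is Types.CATEGORICAL:
            return self.categories is not None and value in self.categories
        return isinstance(value, PYTHON_TYPES[self.type_])

    @property
    def _default_value(self) -> Any:
        # Fall back to the per-type default if no valid override was given.
        if self._default_property_value is None:
            return FALLBACK_DEFAULTS[self.type_]
        return self._default_property_value


age = Property(Types.INT, "Age in years")
parity = Property(Types.INT, "Number of births", default_property_value=2)
status = Property(
    Types.CATEGORICAL, "Status", {"none", "mild"}, default_property_value="none"
)
mistyped = Property(Types.INT, "Wrong override type", default_property_value="oops")
```

One design question this raises: an invalid override (as in `mistyped`) is silently dropped here, mirroring the snippet above. Raising an error at instantiation might be friendlier to module authors.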

Disease modules that need to do something different can still override initialise_population as usual. Modules that need to do something in addition can call super().initialise_population(population) to run these steps first, and then run whatever custom instructions they need. Modules that only need these steps don't need to implement initialise_population at all.
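The three usage patterns can be sketched on a toy DataFrame. This is an illustration under simplifying assumptions, not the framework's code: `PROPERTIES` here maps a column name directly to its default value, `initialise_population` takes the DataFrame itself rather than a `Population`, and the `dm_`/`xx_` column names and the under-5 rule are made up. The columns are created up front with placeholder values to mimic the population object having already allocated them.

```python
import pandas as pd


class Module:
    # Hypothetical simplification: column name -> default value.
    PROPERTIES = {}

    def initialise_population(self, df: pd.DataFrame) -> None:
        # Reset every owned column to its default for living individuals.
        for name, default in self.PROPERTIES.items():
            df.loc[df.is_alive, name] = default


class DefaultsOnly(Module):
    # No initialise_population needed: the base class does everything.
    PROPERTIES = {"dm_has_condition": False}


class DefaultsPlusCustom(Module):
    PROPERTIES = {"xx_n_episodes": 0, "xx_on_treatment": False}

    def initialise_population(self, df: pd.DataFrame) -> None:
        # Run the shared default-setting first...
        super().initialise_population(df)
        # ...then apply the module-specific rule (made up for this sketch).
        df.loc[df.is_alive & (df.age_years < 5), "xx_on_treatment"] = True


df = pd.DataFrame(
    {
        "is_alive": [True, True, False],
        "age_years": [3, 40, 70],
        "dm_has_condition": [True, True, True],
        "xx_n_episodes": [9, 9, 9],
        "xx_on_treatment": [False, False, False],
    }
)

for module in (DefaultsOnly(), DefaultsPlusCustom()):
    module.initialise_population(df)
```

Only rows with `is_alive` set are touched, so the placeholder values of the third (dead) individual are left as they were.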

This also means that if the names of the PROPERTIES for a given module are ever updated, they don't need to be changed in two places (within PROPERTIES and again in initialise_population).

willGraham01 · 2024-07-19T13:37:34Z

Related to the above: the on_birth method in most modules also does something similar for the newborn child, setting all of their properties in the DataFrame to the defaults. We could again have the Module class define on_birth by default:

class Module:
    ...

    def on_birth(self, mother_id: int, child_id: int) -> None:
        # Assumes the population DataFrame is reachable from the module, e.g. via self.sim
        df = self.sim.population.props
        for property_name, prop in self.PROPERTIES.items():
            df.loc[child_id, property_name] = prop._default_value

        # Or maybe more efficient (an alternative to the loop above, not an addition to it)
        df.loc[child_id, list(self.PROPERTIES)] = [p._default_value for p in self.PROPERTIES.values()]

And again, modules can:

- Not explicitly define this method if they want to do exactly this on a new birth
- Use super() to do this and then run some additional commands specific to the subclass
- Override the method explicitly if they need to do something completely different
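As a rough sketch of the second option for `on_birth`, assuming the child's row already exists in the DataFrame when the method is called: the module below holds the DataFrame directly as `self.df`, `PROPERTIES` maps column names to default values, and the `yy_` columns are invented for the example. None of these reflect the real framework API.

```python
import pandas as pd


class Module:
    # Hypothetical simplification: column name -> default value.
    PROPERTIES = {}

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def on_birth(self, mother_id: int, child_id: int) -> None:
        # Give the newborn the default value for every owned column.
        for name, default in self.PROPERTIES.items():
            self.df.at[child_id, name] = default


class WithMotherLink(Module):
    PROPERTIES = {"yy_n_visits": 0, "yy_mother_id": -1}

    def on_birth(self, mother_id: int, child_id: int) -> None:
        # Shared defaults first, then the subclass-specific step.
        super().on_birth(mother_id, child_id)
        self.df.at[child_id, "yy_mother_id"] = mother_id


# Row 0 is the mother; row 1 is the newborn, holding placeholder values.
df = pd.DataFrame({"yy_n_visits": [4, 7], "yy_mother_id": [-1, 99]})
WithMotherLink(df).on_birth(mother_id=0, child_id=1)
```

The mother's row is untouched; only the child's row is reset and then linked back to the mother.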

willGraham01 added module framework labels Jul 18, 2024

This was referenced Jul 19, 2024

Refactor cancer modules to reduce code duplication #1426

Closed

Refactor setting of PROPERTIES columns across Modules #1429

Closed

willGraham01 mentioned this issue Jul 29, 2024

Allow PROPERTIES to set a default value, and Modules to auto-initialise their DF columns to these values #1436

Merged
