Lots of Modules perform the same initial setting of DataFrame columns in pre_initialise_simulation #1428

Open
willGraham01 opened this issue Jul 18, 2024 · 1 comment


willGraham01 commented Jul 18, 2024

Each Module subclass defines a list of PROPERTIES that it "owns" in the population DataFrame. During the initialise_population method, each Module subclass then sets the initial values for said columns.

However, the majority of Module subclasses seem to do the same thing in this method: set each column defined in self.PROPERTIES to the default value for that property's type. As an example, consider the chronicsyndrome module, where the PROPERTIES match exactly the columns set in the population DataFrame during initialise_population. There are a few other cases where a module sets the initial value of one column to something other than the default value defined by the Property class, but still explicitly sets all of its other property columns to their defaults.

Other modules do this and a bit more; Alri, for example, has 10 properties and sets an additional group of columns on top of those defined in its PROPERTIES. However, it still needs to set the PROPERTIES columns to their defaults too.

We can likely automate this process for the vast majority of modules by implementing initialise_population generally in the Module class itself:

def initialise_population(self, population: Population) -> None:
    df = population.props

    # Loop variable is `prop` rather than `property`, to avoid shadowing the builtin
    for property_name, prop in self.PROPERTIES.items():
        # If the property is a CATEGORICAL, we might have to look up the
        # correct default value to assign here
        df.loc[df.is_alive, property_name] = prop._default_value

We would then allow Property to take an optional override at instantiation, giving the default value to assign to its series:

class Property(Specifiable):
    def __init__(
        self,
        type_: Types,
        description: str,
        categories: Optional[Set[Any]] = None,
        *,
        ordered: bool = False,
        default_property_value: Optional[Any] = None,
    ) -> None:
        # All the usual stuff we already do
        ...

        # Set supplied default value, if appropriate
        self._default_property_value = (
            default_property_value
            if default_property_value is not None
            and (
                (
                    self.type_ is Types.CATEGORICAL
                    and default_property_value in categories
                )
                or isinstance(default_property_value, self.python_type)
            )
            else None
        )

    @property
    def _default_value(self) -> Any:
        # Fall back to the dtype map if no explicit default was given.
        return (
            self.PANDAS_TYPE_DEFAULT_VALUE_MAP[self.pandas_type]
            if self._default_property_value is None
            else self._default_property_value
        )
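
To check how the override and the fallback interact, here is a minimal, stdlib-only sketch. It is not the framework's Property class: Types is cut down to three members, and TYPE_DEFAULTS and PYTHON_TYPES are hypothetical stand-ins for the real pandas-keyed maps. The validation is split into a categorical branch and a type-check branch, which is an assumption about the intent of the condition above.

```python
from enum import Enum, auto
from typing import Any, Optional, Set


class Types(Enum):
    """Hypothetical, cut-down stand-in for the framework's Types enum."""
    BOOL = auto()
    INT = auto()
    CATEGORICAL = auto()


# Assumed per-type fallbacks; the real map is keyed by pandas dtype.
TYPE_DEFAULTS = {Types.BOOL: False, Types.INT: 0, Types.CATEGORICAL: None}
# Assumed Python types used to validate non-categorical overrides.
PYTHON_TYPES = {Types.BOOL: bool, Types.INT: int}


class Property:
    def __init__(
        self,
        type_: Types,
        description: str,
        categories: Optional[Set[Any]] = None,
        *,
        default_property_value: Optional[Any] = None,
    ) -> None:
        self.type_ = type_
        self.description = description
        self.categories = categories

        if type_ is Types.CATEGORICAL:
            # A categorical override must be one of the declared categories.
            valid = categories is not None and default_property_value in categories
        else:
            valid = isinstance(default_property_value, PYTHON_TYPES[type_])
        # As in the proposal, a missing or invalid override is silently dropped.
        self._default_property_value = default_property_value if valid else None

    @property
    def _default_value(self) -> Any:
        if self._default_property_value is None:
            return TYPE_DEFAULTS[self.type_]
        return self._default_property_value
```

One consequence worth deciding on explicitly: an override of the wrong type falls back to the type default without any warning, so a typo in a default would go unnoticed unless the constructor raises instead.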

Disease modules that need to do something different can still override initialise_population as usual. Modules that need to do something in addition can call super().initialise_population to run these steps before their own custom instructions. Modules that only need these steps don't need to implement initialise_population at all.
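
The three cases can be sketched without the framework. Everything here is a stand-in chosen for illustration: the population table is a plain dict of columns rather than a DataFrame, PROPERTIES maps names straight to default values rather than to Property objects, and the subclass names are hypothetical.

```python
from typing import Any, Dict, List

# Stand-in for the population table: column name -> per-person values.
Table = Dict[str, List[Any]]


class Module:
    # Property name -> default value (a simplification of Property objects).
    PROPERTIES: Dict[str, Any] = {}

    def initialise_population(self, table: Table) -> None:
        alive = table["is_alive"]
        for name, default in self.PROPERTIES.items():
            # Only living people receive the default; others are left as None.
            table[name] = [default if is_alive else None for is_alive in alive]


class DefaultsOnly(Module):
    """Case 1: inherits initialise_population unchanged."""
    PROPERTIES = {"has_condition": False}


class DefaultsPlusExtra(Module):
    """Case 2: applies the defaults, then a module-specific step."""
    PROPERTIES = {"has_condition": False, "severity": 0}

    def initialise_population(self, table: Table) -> None:
        super().initialise_population(table)
        # Hypothetical custom step: mark the first person as affected.
        table["has_condition"][0] = True


class FullyCustom(Module):
    """Case 3: replaces the default behaviour entirely."""
    PROPERTIES = {"score": 0}

    def initialise_population(self, table: Table) -> None:
        table["score"] = list(range(len(table["is_alive"])))
```

Only the second and third cases carry any module-specific code, which is the point of moving the shared loop into the base class.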

This also means that if the names of the PROPERTIES for a given module are ever updated, they don't need to be changed in two places (within PROPERTIES and again in initialise_population).


willGraham01 commented Jul 19, 2024

Related to the above: the on_birth method in most modules does something similar for the newborn child, setting all of the module's properties for that child in the DataFrame to their defaults. We could again have the Module class define on_birth by default:

class Module:
    ...

    def on_birth(self, mother_id: int, child_id: int) -> None:
        # Assumes the module can reach the population DataFrame via self.sim
        df = self.sim.population.props

        for property_name, prop in self.PROPERTIES.items():
            df.loc[child_id, property_name] = prop._default_value

        # Or, maybe more efficient, a single assignment instead of the loop:
        # df.loc[child_id, list(self.PROPERTIES)] = [
        #     p._default_value for p in self.PROPERTIES.values()
        # ]

And again, modules can:

  • Not explicitly define this method if they want to do exactly this on a new birth
  • Use super() to do this and then run some additional commands specific to the subclass
  • Override the method explicitly if they need to do something completely different
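
As a rough illustration of the second option for on_birth, the sketch below applies the defaults through super() and then copies one value from the mother. The row store is a plain dict rather than the population DataFrame, PROPERTIES maps names directly to default values, and the Inherited module and its "exposed" rule are hypothetical.

```python
from typing import Any, Dict

# Stand-in for the population table: person id -> {property name: value}.
Rows = Dict[int, Dict[str, Any]]


class Module:
    # Property name -> default value (a simplification of Property objects).
    PROPERTIES: Dict[str, Any] = {}

    def __init__(self, rows: Rows) -> None:
        self.rows = rows

    def on_birth(self, mother_id: int, child_id: int) -> None:
        # Write every owned property for the newborn in a single update.
        self.rows.setdefault(child_id, {}).update(self.PROPERTIES)


class Inherited(Module):
    PROPERTIES = {"exposed": False, "doses": 0}

    def on_birth(self, mother_id: int, child_id: int) -> None:
        super().on_birth(mother_id, child_id)
        # Hypothetical module-specific rule: exposure is copied from the mother.
        self.rows[child_id]["exposed"] = self.rows[mother_id]["exposed"]
```

For a mother with exposed set to True, the child ends up with exposed True and doses 0: the base class supplies the defaults and the subclass only states what differs.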
