Serialize just once random_referenced objects #700

Draft · wants to merge 6 commits into base: main
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -2,7 +2,7 @@ default_language_version:
python: python3
repos:
- repo: https://github.com/ambv/black
rev: 21.4b2
rev: 22.6.0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
95 changes: 53 additions & 42 deletions docs/arch/ArchIndex.md
@@ -8,8 +8,6 @@ The Snowfakery interpreter reads a recipe, translates it into internal data stru

Obviously, Snowfakery architecture will be easier to understand in the context of the language itself, so understanding the syntax is a good first step.



## Levels of Looping

Snowfakery recipes are designed to be evaluated over and over again, top to bottom. Each run-through is called
@@ -21,15 +19,15 @@ This is useful for generating chunks of data called _portions_, and then handing

Here is the overall pattern:

| CumulusCI | Snowfakery | Data Loader |
| ------------- |-------------| -------------|
| Generate Data | Start | Wait |
| Load Data | Stop | Start |
| Generate Data | Start | Stop |
| Load Data | Stop | Start |
| Generate Data | Start | Stop |
| Load Data | Finish | Start |
| Load Data | Finished | Finish |
| CumulusCI | Snowfakery | Data Loader |
| ------------- | ---------- | ----------- |
| Generate Data | Start | Wait |
| Load Data | Stop | Start |
| Generate Data | Start | Stop |
| Load Data | Stop | Start |
| Generate Data | Start | Stop |
| Load Data | Finish | Start |
| Load Data | Finished | Finish |

Note that every time you Start and Stop Snowfakery, you generate a whole new Interpreter object, which re-reads the recipe. In some contexts, the new Interpreter object may be in a different process or (theoretically) on a different computer altogether.
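
As a rough sketch only (using the CLI flags shown in the Continuations section below; `load_portion` is a hypothetical placeholder for the data loader, not part of Snowfakery), the alternation in the table might be driven like this:

```python
import subprocess


def load_portion():
    # Hypothetical stand-in for the data loader step (e.g. a CumulusCI load task).
    pass


# Start/Stop #1: generate a portion and snapshot the interpreter state.
subprocess.run(
    ["snowfakery", "foo.yml", "--generate-continuation-file", "/tmp/continue.yml"],
    check=True,
)
load_portion()

# Start/Stop #2: a brand-new Interpreter resumes from the snapshot.
# (Assumption: a real orchestrator would also write a fresh continuation
# file here so that further portions can be chained.)
subprocess.run(
    ["snowfakery", "foo.yml", "--continuation-file", "/tmp/continue.yml"],
    check=True,
)
load_portion()
```
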

@@ -57,9 +55,9 @@ So Snowfakery would run it once, snapshot the "continuation state" and then fan t

When reading Snowfakery code, you must always think about the lifetime of each data structure:

* Will it survive for a single iteration, like local variables? We call these Transients.
* Will it survive for a single continuation, like "FakerData" objects? We could call these Interpreter Managed objects.
* Will it be saved and loaded between continuations, and thus survive across continuations? These are Globals.
- Will it survive for a single iteration, like local variables? We call these Transients.
- Will it survive for a single continuation, like "FakerData" objects? We could call these Interpreter Managed objects.
- Will it be saved and loaded between continuations, and thus survive across continuations? These are Globals.

## The Parser

@@ -76,12 +74,12 @@ is executed once per continuation (or just once if the recipe is not continued).
The Interpreter mediates access between the recipe (represented by the ParseResult) and resources
such as:

* the Output Stream
* Global persistent data that survives continuations by being saved to and loaded from YAML
* Transient persistent data that is discarded and rebuilt (as necessary) after continuation
* The Row History which is used for allowing randomized access to objects for the `random_reference` feature
* Plugins and Providers which extend Snowfakery
* Runtime Object Model objects
- the Output Stream
- Global persistent data that survives continuations by being saved to and loaded from YAML
- Transient persistent data that is discarded and rebuilt (as necessary) after continuation
- The Row History which is used for allowing randomized access to objects for the `random_reference` feature
- Plugins and Providers which extend Snowfakery
- Runtime Object Model objects

On my relatively slow computer it takes 1/25 of a second to initialize an Interpreter from a Recipe once all modules are loaded. It takes about 3/4 of a second to launch an interpreter and load the core required modules.

@@ -97,8 +95,7 @@ For example, a VariableDefinition represents this structure:

```


An ObjectTemplate represents this one:

```
- object: XXX
@@ -128,12 +125,12 @@ id_manager:
Contact: 2
Opportunity: 5
intertable_dependencies:
- field_name: AccountId
table_name_from: Contact
table_name_to: Account
- field_name: AccountId
table_name_from: Opportunity
table_name_to: Account
- field_name: AccountId
table_name_from: Contact
table_name_to: Account
- field_name: AccountId
table_name_from: Opportunity
table_name_to: Account
nicknames_and_tables:
Account: Account
Contact: Contact
@@ -173,28 +170,27 @@ today: 2022-06-06

This also shows the contents of the Globals object. Things we track:

* The last used IDs for various Tables, so we don't generate overlapping IDs
* Inter-table dependencies, so we can generate a CCI mapping file or other output schema that depends on
- The last used IDs for various Tables, so we don't generate overlapping IDs
- Inter-table dependencies, so we can generate a CCI mapping file or other output schema that depends on
relationships
* Mapping from nicknames to tablenames, with tables own names being registered as nicknames for convenience
* Data from specific ("persistent") objects which the user asked to be generated just once and may want to refer to again later
* The current date to allow the `today` function to be consistent even if a process runs across midnight (perhaps we should revisit this)
- Mapping from nicknames to tablenames, with tables' own names being registered as nicknames for convenience
- Data from specific ("persistent") objects which the user asked to be generated just once and may want to refer to again later
- The current date to allow the `today` function to be consistent even if a process runs across midnight (perhaps we should revisit this)

### Transients

If data should be discarded on every iteration (analogous to 'local variables' in a programming language), then it should be stored in the Transients object, which is recreated on every iteration. This object is accessible through the Globals but is not saved to YAML.

### Row History

RowHistory keeps track of the contents of a subset of the rows/objects generated by Snowfakery in a single continuation.

There are a few Recipe patterns enabled by the row history:

* `random_reference` lookups to nicknames
* `random_reference` lookups to objects that have data of interest, such as _another_ `random_reference`
- `random_reference` lookups to nicknames
- `random_reference` lookups to objects that have data of interest, such as _another_ `random_reference`


Row History data structures survive for as long as a single process/interpreter/continuation. A new
continuation gets a new Row History, so it is not possible to use Row History to make links across
continuation boundaries.

@@ -215,11 +211,10 @@ Here is the kind of recipe that might blow up memory:
fields:
ref:
random_reference: target
name:
${{ref.bloat}}
name: ${{ref.bloat}}
```

The second object picks one of 100M unique strings
which are each approx 80M in size. That's a lot of data and
would quickly blow up memory.
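
One mitigation is the lazy-reference pattern implemented by `LazyLoadedObjectReference` (see the `object_rows.py` changes below): row data is not loaded until a field is actually accessed. Here is a minimal, hypothetical sketch of that pattern, with a plain dict standing in for the Row History store:

```python
# Simplified illustration only; the real class loads from the RowHistory database.
class LazyRowReference:
    def __init__(self, tablename: str, row_id: int, row_history: dict):
        self.tablename = tablename
        self.id = row_id
        self._row_history = row_history
        self._data = None

    def __getattr__(self, attrname):
        if attrname.startswith("_"):
            raise AttributeError(attrname)
        if self._data is None:
            # The expensive row is materialized only on first field access.
            self._data = self._row_history[(self.tablename, self.id)]
        return self._data[attrname]


history = {("target", 1): {"id": 1, "bloat": "x" * 80}}
ref = LazyRowReference("target", 1, history)
print(ref.bloat)  # the row's data is loaded only here
```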

@@ -242,8 +237,24 @@ All Fake Data is mediated through the [FakeData](https://github.com/SFDO-Tooling

Snowfakery extends and customizes the set of fake data providers through its [FakeNames](https://github.com/SFDO-Tooling/Snowfakery/search?q=%22class+FakeNames%22) class. For example, Snowfakery's email address provider incorporates the first name and last name of the imaginary person into the email. Snowfakery renames `postcode` to `postalcode` to match Salesforce conventions. Snowfakery adds timezones to date-time fakers.
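
For illustration only (this is not Snowfakery's actual `FakeNames` code), a custom provider that aliases `postcode` as `postalcode` can be registered through Faker's standard extension mechanism:

```python
from faker import Faker
from faker.providers import BaseProvider


class SalesforceStyleProvider(BaseProvider):
    """Illustrative sketch, not Snowfakery's FakeNames class."""

    def postalcode(self):
        # Alias Faker's built-in postcode under the Salesforce-style name.
        return self.generator.postcode()


fake = Faker()
fake.add_provider(SalesforceStyleProvider)
print(fake.postalcode())
```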

## Formulas

Snowfakery `${{formulas}}` are Jinja Templates controlled by a class called the [`JinjaTemplateEvaluatorFactory`](https://github.com/SFDO-Tooling/Snowfakery/search?q=%22class+JinjaTemplateEvaluatorFactory%22). The `Interpreter` object keeps a reference to this class.
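
A minimal sketch of the idea, assuming only standard Jinja2 options rather than the factory's real configuration: an environment whose variable delimiters are `${{` and `}}`.

```python
import jinja2

# Sketch: a Jinja environment using Snowfakery-style ${{ ... }} delimiters.
env = jinja2.Environment(
    variable_start_string="${{",
    variable_end_string="}}",
)
template = env.from_string("Hello ${{ name }}!")
print(template.render(name="World"))  # -> Hello World!
```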

## Continuations

Recall that there are multiple [Levels of Looping](#levels-of-looping). Data which
survives beyond continuation (process) boundaries lives in continuation files.
You can see how that works here:

```sh
$ snowfakery foo.yml --generate-continuation-file /tmp/continue.yml && snowfakery foo.yml --continuation-file /tmp/continue.yml

$ cat /tmp/continue.yml
```

The contents of `/tmp/continue.yml` are specific to a version of Snowfakery and subject
to change over time.

In general, it saves the contents of `just_once` objects and recently created
objects.
68 changes: 40 additions & 28 deletions snowfakery/data_generator_runtime.py
@@ -36,7 +36,7 @@


# save every single object to history. Useful for testing saving of datatypes
SAVE_EVERYTHING = os.environ.get("SF_SAVE_EVERYTHING")
SAVE_EVERYTHING = os.environ.get("SF_SAVE_EVERYTHING", False)


class StoppingCriteria(NamedTuple):
@@ -126,11 +126,18 @@ def __init__(
today: date = None,
name_slots: Mapping[str, str] = None,
):
# these lists start empty and are filled.
# They survive iterations and continuations.
# all of these properties start empty and are filled.
# They all survive iterations and continuations.

# These two are indexed by name
self.persistent_nicknames = {}
self.persistent_objects_by_table = {}

# Not indexed because it is used only to refresh the RowHistory DB
# after continuation
# Otherwise the data is never read or written
self.persistent_random_referenceable_objects = []
Review comment (Contributor): Is there a reason this isn't a dict?

Reply (Contributor Author, @prescod, Jul 12, 2022): Does b671d0f clarify, @jofsky?

self.id_manager = IdManager()
self.intertable_dependencies = OrderedSet()
self.today = today or date.today()
@@ -139,16 +146,25 @@ def __init__(
self.reset_slots()

def register_object(
self, obj: ObjectRow, nickname: Optional[str], persistent_object: bool
self,
obj: ObjectRow,
nickname: Optional[str],
persistent_object: bool,
random_referenced_object: bool,
):
"""Register an object for lookup by object type and (optionally) Nickname"""
if nickname:
# should survive continuations. Somebody will probably `reference:` it
if persistent_object:
self.persistent_nicknames[nickname] = obj
else:
self.transients.nicknamed_objects[nickname] = obj
if persistent_object:
self.persistent_objects_by_table[obj._tablename] = obj

if persistent_object and random_referenced_object:
self.persistent_random_referenceable_objects.append((nickname, obj))

self.transients.last_seen_obj_by_table[obj._tablename] = obj

@property
@@ -214,6 +230,10 @@ def serialize_dict_of_object_rows(dct):
"today": self.today,
"nicknames_and_tables": self.nicknames_and_tables,
"intertable_dependencies": intertable_dependencies,
"persistent_random_referenceable_objects": [
(nn, obj.__getstate__())
for (nn, obj) in self.persistent_random_referenceable_objects
],
}
return state

@@ -233,6 +253,10 @@ def deserialize_dict_of_object_rows(dct):
self.intertable_dependencies = OrderedSet(
Dependency(*dep) for dep in getattr(state, "intertable_dependencies", [])
)
self.persistent_random_referenceable_objects = [
(nickname, hydrate(ObjectRow, v))
for (nickname, v) in state["persistent_random_referenceable_objects"]
]

self.today = state["today"]
persistent_objects_by_table = state.get("persistent_objects_by_table")
@@ -373,26 +397,8 @@ def resave_objects_from_continuation(
):
"""Re-save just_once objects to the local history cache after resuming a continuation"""

# deal with objs known by their nicknames
relevant_objs = [
(obj._tablename, nickname, obj)
for nickname, obj in globals.persistent_nicknames.items()
]
already_saved = set(obj._id for (_, _, obj) in relevant_objs)
# and those known by their tablename, if not already in the list
relevant_objs.extend(
(tablename, None, obj)
for tablename, obj in globals.persistent_objects_by_table.items()
if obj._id not in already_saved
)
# filter out those in tables that are not history-backed
relevant_objs = (
(table, nick, obj)
for (table, nick, obj) in relevant_objs
if table in tables_to_keep_history_for
)
for tablename, nickname, obj in relevant_objs:
self.row_history.save_row(tablename, nickname, obj._values)
for nickname, obj in globals.persistent_random_referenceable_objects:
self.row_history.save_row(obj._tablename, nickname, obj._values)

def execute(self):
RowHistoryCV.set(self.row_history)
@@ -569,19 +575,25 @@ def remember_row(self, tablename: str, nickname: T.Optional[str], row: dict):
self.interpreter.globals.register_intertable_reference(
tablename, fieldvalue._tablename, fieldname
)
if self._should_save(tablename, nickname):
self.interpreter.row_history.save_row(tablename, nickname, row)

def _should_save(self, tablename: str, nickname: T.Optional[str]) -> bool:
history_tables = self.interpreter.tables_to_keep_history_for
should_save: bool = (
return (
(tablename in history_tables)
or (nickname in history_tables)
or SAVE_EVERYTHING
)
if should_save:
self.interpreter.row_history.save_row(tablename, nickname, row)

def register_object(self, obj, name: Optional[str], persistent: bool):
"Keep track of this object in case other objects refer to it."
self.obj = obj
self.interpreter.globals.register_object(obj, name, persistent)
should_save = self._should_save(obj._tablename, name)
# `persistent` means: is it `just_once` and therefore might be
# referred to by `reference` in a future iteration
# `should_save` means it may be referred to by `random_reference`
self.interpreter.globals.register_object(obj, name, persistent, should_save)

@contextmanager
def child_context(self, template):
14 changes: 12 additions & 2 deletions snowfakery/object_rows.py
@@ -70,6 +70,9 @@ def __init__(self, tablename: str, id: int):

class LazyLoadedObjectReference(ObjectReference):
_data = None
yaml_loader = yaml.SafeLoader
yaml_dumper = SnowfakeryDumper
yaml_tag = "!snowfakery_lazyloadedobjectrow"

def __init__(
self,
@@ -85,10 +88,17 @@ def __getattr__(self, attrname):
if attrname.endswith("__"): # pragma: no cover
raise AttributeError(attrname)
if self._data is None:
row_history = RowHistoryCV.get()
self._data = row_history.load_row(self.sql_tablename, self.id)
self._load_data()
return self._data[attrname]

def _load_data(self):
row_history = RowHistoryCV.get()
self._data = row_history.load_row(self.sql_tablename, self.id)

def __reduce_ex__(self, *args, **kwargs):
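# __reduce_ex__ is what pickle/copy call to serialize this object; loading the
# row data first (presumably the intent here) means the serialized state carries
# real field values rather than an unresolved lazy reference.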
self._load_data()
return super().__reduce_ex__(*args, **kwargs)


class SlotState(Enum):
"""The current state of a NicknameSlot.
19 changes: 19 additions & 0 deletions tests/deep-random-nesting.yml
@@ -0,0 +1,19 @@
### This recipe creates reference Account record for PMM data
# Look at examples/salesforce/Account.recipe.yml for more examples.

# Run this like this:

# cci task run generate_and_load_from_yaml --generator_yaml snowfakery_samples/PMM/pmm_0_Account.recipe.yml --num_records 300 --num_records_tablename Account --org qa
# snowfakery snowfakery_samples/PMM/pmm_0_Account.recipe.yml --output-format json --output-file src/foo.json

# Set Macro for Household and Organization Record Type

- object: Account
count: 3
just_once: True

- object: Account
just_once: True
fields:
parent:
random_reference: Account
1 change: 1 addition & 0 deletions tests/test_data_generator.py
@@ -64,6 +64,7 @@ def test_stopping_criteria_with_startids(self, write_row):
nicknames_and_tables: {}
today: 2022-11-03
persistent_nicknames: {}
persistent_random_referenceable_objects: []
"""
generate(
StringIO(yaml),
3 changes: 2 additions & 1 deletion tests/test_embedding.py
@@ -121,7 +121,8 @@ def test_parent_application__streams_instead_of_files(self, generated_rows):
Foo: Foo
persistent_nicknames: {}
persistent_objects_by_table: {}
today: 2021-04-07"""
today: 2021-04-07
persistent_random_referenceable_objects: []"""
)
generate_continuation_file = StringIO()
decls = """[{"sf_object": Opportunity, "api": bulk}]"""