NVIDIA · amaslenn · Aug 27, 2024 · Sep 9, 2024 · Sep 10, 2024 · Sep 10, 2024
diff --git a/doc/DEV.md b/doc/DEV.md
@@ -43,3 +43,97 @@ TBD
 
 ## Systems
 TBD
+
+
+# Desired (?) class diagram
+
+```mermaid
+---
+title: Cloud AI
+---
+classDiagram
+    TestRun *-- Test
+    Job *-- TestRun
+    Job *-- System
+
+    class System {
+        +run_cmd()
+        +get_dir()
+        +get_file()
+        +...()
+    }
+    class Test {
+        cmd_args
+        +cmd_as_dict()
+        +cmd_as_str()
+    }
+    class TestRun {
+        Test test
+        TestRun[] dependencies
+        Path _base_output_dir
+        int num_nodes
+        str[] nodes
+        str time_limit
+
+        +output_dir()
+    }
+    class Job {
+        int job_id
+        TestRun tr
+        System system
+
+        +run()
+        +kill()
+
+        +job_id()
+        +is_running()
+        +is_done()
+        +status()
+    }
+```
+1. `Test` is the `TestDefinition` from the Pydantic intro PR. It is a test with all of its arguments. In effect, it mirrors a Test TOML file, where every parameter is either set explicitly or takes its default value.
+1. `TestRun` is a `Test` instance with `System`-specific parameters, such as `num_nodes` for a Slurm system.
+1. `Job` is a single runnable unit. `Job` knows how to interact with the system to get required information such as job status. It can consist of a single `TestRun` or multiple `TestRun`s. For a Slurm system, this means that a single sbatch script can contain one or multiple tests.
+    1. Q: `Test` can have System-dependent parameters.  
+      True, but we define `Test` as a set of arguments, and even today all such parameters are defined in the Test TOML config rather than coming from the System level. If we still need them, `TestRun` should be able to add or override such parameters.
+
+Notes and thoughts:
+1. `BaseRunner` and its derivatives are to be merged into the `Job` class.
+1. Command line construction: the `Test.cmd_as_*()` methods generate the test's actual command line. `Job` and `TestRun` are responsible for generating the launcher command line and inserting the test's command line into it. For Slurm, this means that `srun` args are managed by `Job` and `TestRun`; `Test` should not know about `srun`.
+
+# Execution flow through the system (current logic)
+```mermaid
+flowchart TB
+    ur(A user runs cloudai specifying CLI arguments)
+    ur --> parsing
+    sp --> ttp
+    ttp --> tp
+    tp --> tsp
+
+    subgraph parsing
+    sp(First, the System TOML is parsed into a System object. It contains all system-specific parameters, plus output and installation directories.)
+    ttp(Then, all Test Template TOMLs are parsed into TestTemplate objects. These objects contain env and cmd arguments. TestTemplate requires a System object to construct Strategies: Install, CommandGen, etc. *)
+    tp(Then, all Test TOMLs are parsed into Test objects. Each Test object holds a TestTemplate as a property. Tests again use env and cmd arguments, and additionally have extra_env and extra_cmd arguments.)
+    tsp(Finally, the Test Scenario TOML is parsed into a TestScenario object. It constructs TestRun objects, which contain Test objects and some run-specific parameters like num_nodes.)
+    end
+
+    jr --> st
+    st --> jm
+    parsing --> execution    
+    subgraph execution
+    jr(A BaseRunner derivative is created to manage job execution. It takes System and TestScenario objects as arguments.)
+    st(BaseRunner._submit_test is called to produce a Job object.)
+    jm(Job is monitored by BaseRunner.)
+    end
+
+    execution --> report_generation
+    subgraph report_generation
+    end
+```
+\* Some Strategies are inherited from TestTemplateStrategy and require System, env, and cmd arguments for their construction. Other Strategies do not require any arguments.
+
+## Output directory
+The output directory is set per Cloud AI invocation. It is constructed as follows:
+1. `BaseOutputDir = System.output_directory + TestScenario.name + CurrentTime`
+1. Each test then adds its own subdirectory under the base directory, for example `BaseOutputDir/TestName`.
+