Docs enhancement (#1327)

* Docs enhancement
* Last changes
equinor · Jun 7, 2023 · 5ffeca3
1 parent 4e59150

Showing 15 changed files with 746 additions and 352 deletions.
diff --git a/README.md b/README.md
@@ -33,6 +33,8 @@ Gordo fulfills the role of inhaling config files and supplying components to the
 * [gordo-core](https://github.com/equinor/gordo-core/) - Gordo core library.
 * [gordo-client](https://github.com/equinor/gordo-client/) - Gordo server's client. It can make predictions from deployed models.
 
+---
+
 [Documentation is available on Read the Docs](https://gordo1.readthedocs.io/)
 
 ---
@@ -118,4 +120,4 @@ This command will run the local documentation server:
 ```console
 > cd docs/
 > make watch
-```
+```
diff --git a/docs/api/cli.rst b/docs/api/cli.rst
@@ -1,12 +1,33 @@
 CLI
 ---
 
+See :ref:`this <general/cli:command-line>` for an overview of the command-line interface.
+
+Exceptions reporter
+^^^^^^^^^^^^^^^^^^^
+
 .. automodule:: gordo.cli.exceptions_reporter
     :members:
     :undoc-members:
     :show-inheritance:
 
+Click types
+^^^^^^^^^^^
+
 .. automodule:: gordo.cli.custom_types
     :members:
     :undoc-members:
     :show-inheritance:
+
+Utils
+^^^^^
+
+.. automodule:: gordo.cli.cli
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: gordo.cli.workflow_generator
+    :members:
+    :undoc-members:
+    :show-inheritance:
diff --git a/docs/api/machine/model/transformer-funcs.rst b/docs/api/machine/model/transformer-funcs.rst
@@ -2,7 +2,7 @@ Transformer Functions
 ---------------------
 
 A collection of functions which can be referenced within the
-:class:``sklearn.preprocessing.FunctionTransformer`` transformer.
+:class:`sklearn.preprocessing.FunctionTransformer` transformer.
 
 General
 =======

diff --git a/docs/conf.py b/docs/conf.py
@@ -14,6 +14,8 @@
 
 _module_path = os.path.join(os.path.dirname(__file__), "..")
 sys.path.insert(0, _module_path)
+_examples_path = os.path.join(os.path.dirname(__file__), "..", "examples")
+# sys.path.insert(0, _examples_path)
 
 import gordo
 
@@ -42,13 +44,16 @@
     "IPython.sphinxext.ipython_console_highlighting",
     "sphinx_copybutton",
     "sphinx_click",
+    "nbsphinx",
 ]
 
 root_doc = "index"
 
 templates_path = ["_templates"]
 exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
 
+source_suffix = [".rst", ".md"]
+
 code_url = f"https://github.com/equinor/{project}/blob/{commit}"
 
 _ignore_linkcode_infos = [

diff --git a/docs/general/cli.rst b/docs/general/cli.rst
@@ -1,4 +1,4 @@
-Command Line
+Command-line
 ------------
 
 gordo CLI

diff --git a/docs/general/endpoints.rst b/docs/general/endpoints.rst
@@ -19,6 +19,8 @@ A detailed example of this API usage could be found :ref:`here <general/cluster_
 POST /prediction
 ^^^^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/<gordo_name>/prediction``
+
 :func:`gordo.server.blueprints.base.post_prediction`
 
 The ``/prediction`` endpoint will return the basic values a model
@@ -145,6 +147,8 @@ Furthermore, you can increase efficiency by instead converting your data to parq
 POST /anomaly/prediction
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/<gordo_name>/anomaly/prediction``
+
 :func:`gordo.server.blueprints.anomaly.post_anomaly_prediction`
 
 The ``/anomaly/prediction`` endpoint will return the data supplied by the :ref:`post-prediction` endpoint
@@ -227,27 +231,35 @@ against.
 GET /metadata
 ^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/<gordo_name>/metadata``
+
 :func:`gordo.server.blueprints.base.get_metadata`
 
 Various metadata surrounding the current model and environment.
 
 GET /expected-models
 ^^^^^^^^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/expected-models``
+
 :func:`gordo.server.blueprints.base.get`
 
 Returns the list of models for this project that are expected to be built.
 
 GET /models
 ^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/models``
+
 :func:`gordo.server.blueprints.base.get_model_list`
 
 List of the currently built models.
 
 GET /revisions
 ^^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/revisions``
+
 :func:`gordo.server.blueprints.base.get_revision_list`
 
 List of available model revisions (versions).
@@ -256,13 +268,17 @@ List of available model revisions (versions).
 GET /download-model
 ^^^^^^^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/<gordo_name>/download-model``
+
 :func:`gordo.server.blueprints.base.get_download_model`
 
 Returns the current model being served. Loadable via :func:`gordo.serializer.loads`.
 
 DELETE /revision
 ^^^^^^^^^^^^^^^^
 
+``/gordo/v0/<gordo_project>/<gordo_name>/revision/<revision>``
+
 :func:`gordo.server.blueprints.base.delete_model_revision`
 
 Delete one particular revision from the storage.
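The path templates added above can be filled in programmatically. The sketch below is a minimal, standard-library-only illustration that only builds URLs and sends no requests; the `ENDPOINTS` keys, the `endpoint_url` helper and the host name are hypothetical. For actually calling a deployed server, the gordo-client project linked in the README is the intended tool.

```python
from urllib.parse import quote

# Path templates copied from the endpoint documentation above.
# The dictionary keys are hypothetical short names used only in this sketch.
ENDPOINTS = {
    "prediction": "/gordo/v0/{gordo_project}/{gordo_name}/prediction",
    "anomaly_prediction": "/gordo/v0/{gordo_project}/{gordo_name}/anomaly/prediction",
    "expected_models": "/gordo/v0/{gordo_project}/expected-models",
    "models": "/gordo/v0/{gordo_project}/models",
    "revisions": "/gordo/v0/{gordo_project}/revisions",
    "download_model": "/gordo/v0/{gordo_project}/{gordo_name}/download-model",
    "delete_revision": "/gordo/v0/{gordo_project}/{gordo_name}/revision/{revision}",
}


def endpoint_url(base_url: str, endpoint: str, **params: str) -> str:
    """Fill in a path template and append it to ``base_url``.

    Every parameter is percent-encoded (including "/") so that it always
    forms exactly one path segment. A missing parameter raises KeyError.
    """
    template = ENDPOINTS[endpoint]
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return base_url.rstrip("/") + template.format(**encoded)


if __name__ == "__main__":
    base = "http://gordo-server.example"  # hypothetical host
    print(endpoint_url(base, "prediction", gordo_project="my-project", gordo_name="my-model"))
    # http://gordo-server.example/gordo/v0/my-project/my-model/prediction
```

Encoding each parameter separately keeps a project or model name containing spaces or slashes from silently changing the shape of the path.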
diff --git a/docs/ml/model_configuration.rst b/docs/ml/model_configuration.rst
@@ -40,4 +40,49 @@ which is used when using the model for predictions.
 
 The :class:`gordo.machine.machine.Machine` class holds essentially all the information contained in a single Gordo config.
 The :class:`gordo.builder.build_model.ModelBuilder` class takes a :class:`gordo.machine.machine.Machine` and does the heavy lifting
-when it comes to data fetching, cross-validation and model training.
+when it comes to data fetching, cross-validation and model training.
+
+Evaluation specification
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Alongside the ML model itself, all aspects of the cross-validation evaluation are parameterized in the config:
+
+.. code-block:: yaml
+
+   - evaluation:
+        cv: 
+          sklearn.model_selection.TimeSeriesSplit:
+            n_splits: 3
+        cv_mode: full_build
+        scoring_scaler: sklearn.preprocessing.MinMaxScaler
+        metrics:
+        - explained_variance_score
+        - r2_score
+        - mean_squared_error
+        - mean_absolute_error
+
+Alternatively, ``cv_mode`` can be set to ``cross_val_only``, which runs the cross-validation without fitting the final model.
+
+Cross-validation methods
+^^^^^^^^^^^^^^^^^^^^^^^^
+
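+When ``cv`` is set to :class:`sklearn.model_selection.TimeSeriesSplit`, the dataset is split into successive folds in which each training set contains all observations that precede its test set.
+All test sets within one cross-validation run have the same size, while the training set grows with every fold.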
+
+An alternative is `k-fold <https://scikit-learn.org/stable/modules/cross_validation.html>`_ cross-validation,
+where the data can optionally be shuffled before it is split into folds.
+In contrast to the time-series split above, which extends the training data in each fold with time-consecutive observations, this method ignores the temporal order of the data.
+Keep this in mind when comparing results from different folds.
+
+The corresponding parameters can be set as follows:
+
+.. code-block:: yaml
+
+   - evaluation:
+        cv: 
+          sklearn.model_selection.KFold:
+            n_splits: 3
+            shuffle: True
+            random_state: 0
+
+See the `scikit-learn <https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py>`_ example for a visualization of these splits; scikit-learn performs the actual splitting and training.
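To make the difference between the two strategies concrete, here is a small pure-Python sketch that mimics the index logic of scikit-learn's `TimeSeriesSplit` (default arguments only, no `gap` or `max_train_size`) and `KFold`. The function names are hypothetical and the code is for illustration only; in gordo the splitting itself is done by scikit-learn. Shuffling here uses Python's `random` module, so the permutation will not match scikit-learn's for the same `random_state`.

```python
import random
from typing import Iterator, List, Optional, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def time_series_split(n_samples: int, n_splits: int) -> Iterator[Split]:
    """Mimic the default index logic of scikit-learn's TimeSeriesSplit.

    Every test set holds n_samples // (n_splits + 1) samples, and each
    training set holds all observations that precede its test set.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("n_splits is too large for n_samples")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


def k_fold(
    n_samples: int,
    n_splits: int,
    shuffle: bool = False,
    random_state: Optional[int] = None,
) -> Iterator[Split]:
    """Mimic the fold sizes of scikit-learn's KFold.

    With shuffle=True the indices are permuted with Python's ``random``
    module, so the order will NOT match scikit-learn's for the same seed.
    """
    if n_splits > n_samples:
        raise ValueError("n_splits cannot exceed n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(random_state).shuffle(indices)
    # The first n_samples % n_splits folds receive one extra sample.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in time_series_split(12, n_splits=3):
        print("train:", train, "test:", test)
    for train, test in k_fold(10, n_splits=3, shuffle=True, random_state=0):
        print("train:", train, "test:", test)
```

For 12 samples and 3 splits, the time-series variant yields the test sets `[3, 4, 5]`, `[6, 7, 8]` and `[9, 10, 11]` with training sets of 3, 6 and 9 samples: the test size stays fixed while the available history grows. The k-fold variant instead trains on indices from both before and after each test fold.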
diff --git a/docs/ml/model_output.rst b/docs/ml/model_output.rst
@@ -109,3 +109,13 @@ Based on these thresholds, the following metrics are reported:
 
 * ``anomaly-confidence`` = ``tag-anomaly-scaled`` / ``feature-thresholds-per-fold(last fold)``
 * ``total-anomaly-confidence`` = ``total-anomaly-scaled`` / ``aggregate-thresholds-per-fold(last-fold)``
+
+Scaling of data during cross-validation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Before cross-validation is executed, the ``scoring_scaler`` is extracted from the evaluation config.
+It is used to transform the data before the requested metrics are calculated.
+
+An internal method, :func:`gordo.builder.build_model.ModelBuilder.build_metrics_dict`, is called prior to cross-validation with the specified list of ``metrics`` and the ``scoring_scaler``.
+It builds a dictionary of metric callables, wrapping each metric with :func:`gordo.machine.model.utils.metric_wrapper`.
+Each metric in this dictionary therefore carries the ``scoring_scaler``, which is applied when the metric is computed during cross-validation.
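The scaling idea can be illustrated with a small sketch. It assumes that the wrapper applies an already fitted scaler to both `y_true` and `y_pred` before calling the metric. `MinMaxScalerSketch`, `metric_wrapper_sketch` and `build_metrics_dict_sketch` are hypothetical stand-ins written in plain Python, not gordo's or scikit-learn's actual code, and the real method may build a richer dictionary than this one.

```python
from functools import wraps
from typing import Callable, Dict, List, Optional, Sequence


class MinMaxScalerSketch:
    """Hypothetical 1-D stand-in for sklearn.preprocessing.MinMaxScaler."""

    def fit(self, values: Sequence[float]) -> "MinMaxScalerSketch":
        self.min_ = min(values)
        self.range_ = (max(values) - self.min_) or 1.0  # avoid division by zero
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        return [(v - self.min_) / self.range_ for v in values]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def metric_wrapper_sketch(
    metric: Callable, scaler: Optional[MinMaxScalerSketch] = None
) -> Callable:
    """Return a metric that scales both inputs with ``scaler`` before scoring."""

    @wraps(metric)  # keep the original metric's name on the wrapper
    def scaled_metric(y_true, y_pred):
        if scaler is not None:
            y_true = scaler.transform(y_true)
            y_pred = scaler.transform(y_pred)
        return metric(y_true, y_pred)

    return scaled_metric


def build_metrics_dict_sketch(
    metrics: Sequence[Callable], scaler: Optional[MinMaxScalerSketch]
) -> Dict[str, Callable]:
    """Map each metric name to its scaler-aware wrapper."""
    return {m.__name__: metric_wrapper_sketch(m, scaler) for m in metrics}


if __name__ == "__main__":
    y_true = [0.0, 50.0, 100.0]
    y_pred = [10.0, 50.0, 90.0]
    scaler = MinMaxScalerSketch().fit(y_true)
    scoring = build_metrics_dict_sketch([mean_absolute_error], scaler)
    print(mean_absolute_error(y_true, y_pred))               # about 6.67
    print(scoring["mean_absolute_error"](y_true, y_pred))    # about 0.067
```

Scoring on the scaled values keeps the metric on a common 0 to 1 range, which makes results easier to compare when tags have very different magnitudes.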
diff --git a/docs/overview.rst b/docs/overview.rst
@@ -11,15 +11,15 @@ The main interface after building the models is a set of ``REST`` APIs
 `Gordo <https://github.com/equinor/gordo-helm/blob/main/charts/gordo/templates/crds/gordos.equinor.com.yaml>`_ is a `CustomResourceDefinition <https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/>`_
 that represents the project and can contain multiple Machine Learning models.
 
-`Model <https://github.com/equinor/gordo-helm/blob/main/charts/gordo/templates/crds/models.equinor.com.yaml>`_ is the CustomResourceDefinition
-represents the project and can contains multiple Machine Learning models.
-
 `gordo-controller <https://github.com/equinor/gordo-controller>`_ is a `K8S controller <https://cluster-api.sigs.k8s.io/developer/providers/implementers-guide/controllers_and_reconciliation.html>`_ and an API server that provides the statuses of Gordos and Models.
 
 ``dpl`` is a deployment `Job <https://kubernetes.io/docs/concepts/workloads/controllers/job/>`_ that runs the :ref:`generate workflow <general/cli:generate>` command.
 
 ``model_builder1`` and ``model_builder2`` are Jobs that build ML models with the :ref:`build <general/cli:build>` command.
 
+`Model <https://github.com/equinor/gordo-helm/blob/main/charts/gordo/templates/crds/models.equinor.com.yaml>`_ is the CustomResourceDefinition
+that represents the model entity generated by the Argo workflow.
+
 ``storage`` is a `PersistentVolume <https://kubernetes.io/docs/concepts/storage/persistent-volumes/>`_ where ML models are stored.
 
 ``gordo-server`` is an ML server. The full API spec can be found :ref:`here <general/endpoints:endpoints>`.