Docs enhancement (#1327)
* Docs enhancement

* Last changes
koropets authored Jun 7, 2023
1 parent 4e59150 commit 5ffeca3
Showing 15 changed files with 746 additions and 352 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -33,6 +33,8 @@ Gordo fulfills the role of inhaling config files and supplying components to the
* [gordo-core](https://github.com/equinor/gordo-core/) - Gordo core library.
* [gordo-client](https://github.com/equinor/gordo-client/) - Gordo server's client. It can make predictions from deployed models.

---

[Documentation is available on Read the Docs](https://gordo1.readthedocs.io/)

---
@@ -118,4 +120,4 @@ This command will run the local documentation server:
```console
> cd docs/
> make watch
```
21 changes: 21 additions & 0 deletions docs/api/cli.rst
@@ -1,12 +1,33 @@
CLI
---

See :ref:`this <general/cli:command-line>` for an overview of the command-line interface.

Exceptions reporter
^^^^^^^^^^^^^^^^^^^

.. automodule:: gordo.cli.exceptions_reporter
   :members:
   :undoc-members:
   :show-inheritance:

Click types
^^^^^^^^^^^

.. automodule:: gordo.cli.custom_types
   :members:
   :undoc-members:
   :show-inheritance:

Utils
^^^^^

.. automodule:: gordo.cli.cli
   :members:
   :undoc-members:
   :show-inheritance:

.. automodule:: gordo.cli.workflow_generator
   :members:
   :undoc-members:
   :show-inheritance:
2 changes: 1 addition & 1 deletion docs/api/machine/model/transformer-funcs.rst
@@ -2,7 +2,7 @@ Transformer Functions
---------------------

A collection of functions which can be referenced within the
:class:`sklearn.preprocessing.FunctionTransformer` transformer.

General
=======
5 changes: 5 additions & 0 deletions docs/conf.py
@@ -14,6 +14,8 @@

_module_path = os.path.join(os.path.dirname(__file__), "..")
sys.path.insert(0, _module_path)
_examples_path = os.path.join(os.path.dirname(__file__), "..", "examples")
# sys.path.insert(0, _examples_path)

import gordo

Expand Down Expand Up @@ -42,13 +44,16 @@
"IPython.sphinxext.ipython_console_highlighting",
"sphinx_copybutton",
"sphinx_click",
"nbsphinx"
]

root_doc = "index"

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

source_suffix = [".rst", ".md"]

code_url = f"https://github.com/equinor/{project}/blob/{commit}"

_ignore_linkcode_infos = [
2 changes: 1 addition & 1 deletion docs/general/cli.rst
@@ -1,4 +1,4 @@
Command-line
------------

gordo CLI
16 changes: 16 additions & 0 deletions docs/general/endpoints.rst
@@ -19,6 +19,8 @@ A detailed example of this API usage could be found :ref:`here <general/cluster_
POST /prediction
^^^^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/<gordo_name>/prediction``

:func:`gordo.server.blueprints.base.post_prediction`

The ``/prediction`` endpoint will return the basic values a model
@@ -145,6 +147,8 @@ Furthermore, you can increase efficiency by instead converting your data to parq
POST /anomaly/prediction
^^^^^^^^^^^^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/<gordo_name>/anomaly/prediction``

:func:`gordo.server.blueprints.anomaly.post_anomaly_prediction`

The ``/anomaly/prediction`` endpoint will return the data supplied by the :ref:`post-prediction` endpoint
@@ -227,27 +231,35 @@ against.
GET /metadata
^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/<gordo_name>/metadata``

:func:`gordo.server.blueprints.base.get_metadata`

Various metadata surrounding the current model and environment.

GET /expected-models
^^^^^^^^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/expected-models``

:func:`gordo.server.blueprints.base.get`

Returns the list of models for this project that are expected to be built.

GET /models
^^^^^^^^^^^

``/gordo/v0/<gordo_project>/models``

:func:`gordo.server.blueprints.base.get_model_list`

List of the currently built models.

GET /revisions
^^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/revisions``

:func:`gordo.server.blueprints.base.get_revision_list`

List of available model revisions (versions).
@@ -256,13 +268,17 @@ List of available model revisions (versions).
GET /download-model
^^^^^^^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/<gordo_name>/download-model``

:func:`gordo.server.blueprints.base.get_download_model`

Returns the current model being served. Loadable via :func:`gordo.serializer.loads`.

DELETE /revision
^^^^^^^^^^^^^^^^

``/gordo/v0/<gordo_project>/<gordo_name>/revision/<revision>``

:func:`gordo.server.blueprints.base.delete_model_revision`

Delete one particular revision from the storage.
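
The URL patterns listed above can be assembled programmatically. The following is a minimal sketch, not part of Gordo or ``gordo-client``: the helper names ``project_url`` and ``model_url`` and the host ``gordo.example.org`` are assumptions made for illustration, and only the ``/gordo/v0/<gordo_project>/...`` layout is taken from this page.

```python
from urllib.parse import quote

API_PREFIX = "/gordo/v0"  # prefix shared by the endpoints documented above


def project_url(base_url: str, project: str, endpoint: str) -> str:
    """Build the URL of a project-level endpoint such as ``models`` or ``revisions``."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/{quote(project, safe='')}/{endpoint}"


def model_url(base_url: str, project: str, name: str, endpoint: str) -> str:
    """Build the URL of a model-level endpoint such as ``prediction``."""
    return project_url(base_url, project, f"{quote(name, safe='')}/{endpoint}")


base = "https://gordo.example.org/"  # hypothetical host
models = project_url(base, "my-project", "models")
anomaly = model_url(base, "my-project", "machine-1", "anomaly/prediction")
print(models)
print(anomaly)
```

Project and model names are percent-encoded so that unusual characters cannot alter the path structure.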
47 changes: 46 additions & 1 deletion docs/ml/model_configuration.rst
@@ -40,4 +40,49 @@ which is used when using the model for predictions.

The :class:`gordo.machine.machine.Machine` class holds essentially all the information contained in a single Gordo config.
The :class:`gordo.builder.build_model.ModelBuilder` class takes a :class:`gordo.machine.machine.Machine` and does the heavy lifting
when it comes to data fetching, cross-validation and model training.

Evaluation specification
^^^^^^^^^^^^^^^^^^^^^^^^

Alongside the ML model itself, all aspects of the cross-validation evaluation are parameterized in the config:

.. code-block:: yaml

    - evaluation:
        cv:
          sklearn.model_selection.TimeSeriesSplit:
            n_splits: 3
        cv_mode: full_build
        scoring_scaler: sklearn.preprocessing.MinMaxScaler
        metrics:
          - explained_variance_score
          - r2_score
          - mean_squared_error
          - mean_absolute_error

Alternatively, the ``cv_mode`` can be set to ``cross_val_only`` which will not fit the final model.
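
To make the config convention concrete, here is a minimal sketch of how a single-key mapping from a dotted import path to keyword arguments can be turned into an object. It is an illustration only and not Gordo's own loader, which may behave differently; the helpers ``resolve`` and ``build`` are hypothetical, and a standard-library class stands in for the scikit-learn splitter so the example runs without extra dependencies.

```python
import importlib
from typing import Any, Dict


def resolve(dotted_path: str) -> Any:
    """Import ``package.module.Name`` and return the ``Name`` attribute."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def build(definition: Dict[str, Dict[str, Any]]) -> Any:
    """Instantiate ``{"package.Class": {kwargs}}``; exactly one key is expected."""
    (path, kwargs), = definition.items()
    return resolve(path)(**kwargs)


# Stand-in for {"sklearn.model_selection.TimeSeriesSplit": {"n_splits": 3}}
value = build({"fractions.Fraction": {"numerator": 3, "denominator": 4}})
print(value)
```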

Cross-validation methods
^^^^^^^^^^^^^^^^^^^^^^^^

When ``cv`` is set to :class:`sklearn.model_selection.TimeSeriesSplit`, the dataset is split as depicted below.
Regardless of the number of splits, the test set is always the same size.
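
The expanding-window behaviour can be sketched in plain Python. The function below is an illustrative re-implementation written for this page, assuming the default settings of ``TimeSeriesSplit`` (no gap, no explicit test size); it is not scikit-learn's code, and ``time_series_split`` is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def time_series_split(n_samples: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists with a growing train set and fixed-size test set."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        # Everything before the test window is used for training.
        yield list(range(test_start)), list(range(test_start, test_start + test_size))


splits = list(time_series_split(n_samples=10, n_splits=3))
for train, test in splits:
    print(len(train), test)
```

Each fold trains on all observations that precede its test window, so the training set grows while the test window keeps the same length.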

An alternative is to use `k-fold <https://scikit-learn.org/stable/modules/cross_validation.html>`_ cross-validation.
Here, one can decide to shuffle the data before it is split into folds.
In contrast to the time-series split above, which augments the considered data in each fold with time-consecutive observations, this method is uncoupled from the time dimension.
This must be considered when comparing results from different folds.
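
A minimal sketch of shuffled k-fold splitting shows why the folds lose their time ordering. It is written for illustration with the standard library only; the function name ``k_fold`` and its ``seed`` argument are assumptions, and the resulting index order will not match what scikit-learn produces for the same ``random_state``.

```python
import random
from typing import Iterator, List, Tuple


def k_fold(n_samples: int, n_splits: int, shuffle: bool = False,
           seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first ``extra`` folds get one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


folds = list(k_fold(n_samples=10, n_splits=3, shuffle=True, seed=0))
for train, test in folds:
    print(sorted(test))
```

With ``shuffle=True`` a test fold can contain observations from anywhere in the series, so training data may lie both before and after the test points.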

The following parameters can then be set as such:

.. code-block:: yaml

    - evaluation:
        cv:
          sklearn.model_selection.KFold:
            n_splits: 3
            shuffle: True
            random_state: 0

Borrowed from `scikit-learn <https://scikit-learn.org/stable/auto_examples/model_selection/plot_cv_indices.html#sphx-glr-auto-examples-model-selection-plot-cv-indices-py>`_, which performs the actual split/train for us.
10 changes: 10 additions & 0 deletions docs/ml/model_output.rst
@@ -109,3 +109,13 @@ Based on these thresholds, the following metrics are reported:

* ``anomaly-confidence`` = ``tag-anomaly-scaled`` / ``feature-thresholds-per-fold(last fold)``
* ``total-anomaly-confidence`` = ``total-anomaly-scaled`` / ``aggregate-thresholds-per-fold(last-fold)``

Scaling of data during cross-validation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before the cross-validation is executed, the ``scoring_scaler`` is extracted.
It is used to transform the data before the requested metrics are calculated.

An internal method, :func:`gordo.builder.build_model.ModelBuilder.build_metrics_dict`, is called prior to the cross-validation with the specified list of ``metrics`` and the ``scoring_scaler``.
This method builds a dictionary of callables by using the function :func:`gordo.machine.model.utils.metric_wrapper`.
The generated dictionary therefore carries the ``scoring_scaler``, which will be used later when the metrics are computed.
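
The idea of wrapping metrics with a scaler can be sketched as follows. This is a simplified stand-in and not the actual ``build_metrics_dict`` or ``metric_wrapper`` code: the one-dimensional ``MinMaxScaler1D`` class, the hand-written metrics, and the helper names ``wrap_metric`` and ``build_metrics`` are all assumptions made to keep the example self-contained.

```python
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MinMaxScaler1D:
    """Hypothetical one-dimensional stand-in for a min-max scaler."""

    def fit(self, values: Sequence[float]) -> "MinMaxScaler1D":
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        span = (self.high - self.low) or 1.0  # avoid division by zero
        return [(v - self.low) / span for v in values]


def wrap_metric(metric: Metric, scaler: MinMaxScaler1D) -> Metric:
    """Return a metric that scales both inputs before scoring."""
    def scaled_metric(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        return metric(scaler.transform(y_true), scaler.transform(y_pred))
    return scaled_metric


def build_metrics(metrics: List[Metric], scaler: MinMaxScaler1D) -> Dict[str, Metric]:
    """Map each metric name to its scaler-aware wrapper."""
    return {metric.__name__: wrap_metric(metric, scaler) for metric in metrics}


y_true = [0.0, 5.0, 10.0]
y_pred = [1.0, 5.0, 9.0]
scaler = MinMaxScaler1D().fit(y_true)
scorers = build_metrics([mean_squared_error, mean_absolute_error], scaler)
scores = {name: scorer(y_true, y_pred) for name, scorer in scorers.items()}
print(scores)
```

Because the scaler is captured in each wrapper's closure, the cross-validation loop only needs to call the scorers by name.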
6 changes: 3 additions & 3 deletions docs/overview.rst
@@ -11,15 +11,15 @@ The main interface after building the models is a set of ``REST`` APIs
`Gordo <https://github.com/equinor/gordo-helm/blob/main/charts/gordo/templates/crds/gordos.equinor.com.yaml>`_ is a `CustomResourceDefinition <https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/>`_ that
represents the project and can contain multiple Machine Learning models.

`gordo-controller <https://github.com/equinor/gordo-controller>`_ is a `K8S controller <https://cluster-api.sigs.k8s.io/developer/providers/implementers-guide/controllers_and_reconciliation.html>`_ and an API server that provides Gordos/Models statuses.

``dpl`` is a deployment `Job <https://kubernetes.io/docs/concepts/workloads/controllers/job/>`_ that runs the :ref:`generate workflow <general/cli:generate>` command.

``model_builder1`` and ``model_builder2`` are Jobs that build ML models with the :ref:`build <general/cli:build>` command.

`Model <https://github.com/equinor/gordo-helm/blob/main/charts/gordo/templates/crds/models.equinor.com.yaml>`_ is the CustomResourceDefinition that
represents the model entity generated by the Argo workflow.

``storage`` is a `PersistentVolume <https://kubernetes.io/docs/concepts/storage/persistent-volumes/>`_ where the ML models are stored.

``gordo-server`` is an ML server. The full API spec can be found :ref:`here <general/endpoints:endpoints>`.