Commit

Merge pull request #1221 from NLnetLabs/docs-migration

Migrate docs into main project

AlexanderBand authored Jul 3, 2024
2 parents 3253d3c + 1d1ee4f commit 40f4b7a
Showing 56 changed files with 15,901 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ target/
tmp
work
examples/openid_connect_mock.rs
/doc/manual/build/
26 changes: 26 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,26 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS and Python version
build:
os: ubuntu-22.04
tools:
python: "3.11"

# Build documentation with Sphinx
sphinx:
configuration: doc/manual/source/conf.py

# Build PDF & ePub
formats:
- epub
- pdf

# Declare the Python requirements required to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
install:
- requirements: doc/manual/source/requirements.txt
303 changes: 303 additions & 0 deletions doc/manual/source/architecture.rst
@@ -0,0 +1,303 @@
.. _doc_krill_architecture:

Architecture
============

This section is intended to give you an overview of the architecture of Krill,
which is important to keep in mind when deploying the application in your
infrastructure. It will give you an understanding of how and where data is
stored, how to make your setup redundant, and how to save and restore backups.

.. Warning:: Krill does NOT support clustering at this time. You can achieve
high availability by doing a fail-over to a standby *inactive*
installation using the same data and configuration. However, you
cannot have multiple active instances. This
`feature <https://github.com/NLnetLabs/krill/issues/20>`_ is on our
             long-term roadmap.

Used Disk Space
---------------

Krill stores all of its data under the ``DATA_DIR``. For users who will operate
a CA under an RIR / NIR parent, the following sub-directories are relevant:

+-----------------+------------------------------------------------------------+
| Directory | Contents |
+=================+============================================================+
| data_dir/ssl | The HTTPS key and certificate used by Krill |
+-----------------+------------------------------------------------------------+
| data_dir/cas | The history of your CA(s) in raw JSON format |
+-----------------+------------------------------------------------------------+
| data_dir/pubd | If used, the history of your Publication Server |
+-----------------+------------------------------------------------------------+

.. Note:: Old versions of Krill also used the directories
          ``data_dir/rfc8181`` and ``data_dir/rfc6492`` for storing all
          protocol messages exchanged between your CAs and their parent
          and repository. If these directories are still present on your
          system, you can safely remove them to reclaim space, potentially
          quite a bit of it.

Archiving
"""""""""

Krill offers the option to archive old, less relevant historical information
related to publication. You can enable this by setting the option
``archive_threshold_days`` in your configuration file. If set, Krill will move
all publication events older than the specified number of days to a subdirectory
called ``archived`` under the relevant data directory, i.e.
``data_dir/pubd/0/archived`` if you are using the Krill Publication Server and
``data_dir/cas/<your-ca-name>/archived`` for each of your CAs.
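
For example, to archive publication events older than 90 days (the number of
days is purely illustrative), you would add the following to your
configuration file:

.. code-block:: text

   archive_threshold_days = 90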

You can set up a cronjob to delete these events once and for all, but we
recommend that you save them in long term storage if you can. The reason is that
if (and only if) you have this data, you will be able to rebuild the complete
Krill state based on its *audit* log of events, and irrevocably prove that no
changes were made to Krill other than the changes recorded in the audit trail.
We have no tooling for this yet, but we have an `issue
<https://github.com/NLnetLabs/krill/issues/331>`_ on our backlog.

Saving State Changes
--------------------

You can skip this section if you're not interested in the gory details.
However, understanding it will help to explain how backup and restore works in
Krill, why a standby fail-over node can be used, and why Krill's locking and
storage mechanism needs to be changed before
`multiple active nodes <https://github.com/NLnetLabs/krill/issues/20>`_
can work.

State changes in Krill are tracked using *events*. Krill CA(s) and Publication
Servers are versioned. They can only be changed by applying an *event* for a
specific version. An *event* just contains the data that needs to be changed.
Crucially, events cannot cause any side effects. As such, the overall state can
always be reconstituted by applying all past events. This concept is called
*event-sourcing*, and in this context the CAs and Publication Servers are
so-called *aggregates*.

Events are not applied directly. Rather, users of Krill and background jobs will
send their intent to make a change through the API, which then translates
this into a so-called *command*. Krill will then *lock* the target aggregate
and send the command to it. This locking mechanism is not aware of any
clustering, and it's a primary reason why Krill cannot run as an active-active
cluster yet.

Upon receiving a command the aggregate (your CA etc.) will do some work. In
some cases a command can have a side effect. For example, it may instruct your
CA to create a new key pair after receiving entitlements from its parent. The
key pair is random; applying the same command again would produce a different
key pair. Remember that commands are not re-applied to aggregates, only their
resulting events are. Thus, in this example, the resulting event would contain
the generated key pair.
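
To make this concrete, the following is a minimal, hypothetical sketch of the
pattern in Rust. The type and variant names are illustrative only and do not
correspond to Krill's actual code:

.. code-block:: rust

   // A command expresses *intent* to change an aggregate; an event records
   // a resulting fact. Only events are persisted and replayed.
   enum Command {
       AddRoa { prefix: String },
   }

   enum Event {
       RoaAdded { prefix: String },
   }

   struct Aggregate {
       version: u64,
       roas: Vec<String>,
   }

   impl Aggregate {
       // Processing a command may involve side effects (e.g. generating a
       // random key pair) and returns the resulting events, or an error.
       fn process(&self, command: Command) -> Result<Vec<Event>, String> {
           match command {
               Command::AddRoa { prefix } => {
                   if self.roas.contains(&prefix) {
                       Err(format!("ROA for {} already exists", prefix))
                   } else {
                       Ok(vec![Event::RoaAdded { prefix }])
                   }
               }
           }
       }

       // Applying an event is a pure state change without side effects, so
       // replaying all stored events reconstitutes the current state.
       fn apply(&mut self, event: Event) {
           match event {
               Event::RoaAdded { prefix } => self.roas.push(prefix),
           }
           self.version += 1;
       }
   }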

After receiving the command, the aggregate will return one of the following:

1. An error
Usually this means that the command is not applicable to the aggregate
state. For example, you may have tried to remove a ROA which does not
exist.

When Krill encounters such an error, it will store the command with some
meta-information like the time the command was issued, and a summary of the
error, so that it can be seen in the history. It will then unlock the
aggregate, so that the next command can be sent to it.
2. No error, zero events
In this case the command turned out to be a *no-op*, and Krill just unlocks
the aggregate. The command sequence counter is not updated, and the command
is not saved. This is used as a feature whenever the 'republish' background
job kicks in. A 'republish' command is sent, but it will only have an
actual effect if there was a need to republish — e.g. a manifest would need
to be re-issued before it would expire.
3. One or more events
In this case there *is* a desired state change in a Krill aggregate. Krill
will now apply and persist the changes in the following order:

   * Each event is stored. If an event already exists for a version, then
     the update is aborted. Because Krill cannot run as a cluster, and
     it uses locking to ensure that updates are done in sequence, this will
     only fail on the first event if a user tried to issue concurrent updates
     to the same CA.
* On every fifth event a snapshot of the state is saved to a new file. If
this is successful then the old snapshot (if there is one) is renamed
and kept as a backup snapshot. The new snapshot is then renamed to the
'current' snapshot.
   * When all events are saved, the command is saved, enumerating all
     resulting events and including meta-information such as the time that
     the command was executed. Once `multiple users
     <https://github.com/NLnetLabs/krill/issues/294>`_ are supported,
     this will also include *who* made a change.
   * Finally, the version information file for the aggregate is updated to
     indicate its current version and command sequence counter.

.. Warning:: Krill will crash, **by design**, if there is any failure in saving
             any of the above files to disk. If Krill cannot persist its state,
             it should not try to carry on, as that could lead to discrepancies
             between in-memory and on-disk state that are impossible to fix.
             Crashing and forcing an operator to look at the system is then the
             only sensible thing Krill can do. Fortunately, this should not
             happen unless there is a serious system failure.

Loading State at Startup
------------------------

Krill will rebuild its internal state whenever it starts. If it finds surplus
events or commands compared to the latest information state for any of the
aggregates, it will assume that they are present either because Krill stopped
in the middle of writing a transaction of changes to disk, or because your
backup was taken in the middle of a transaction. Such surplus files are backed
up to a subdirectory called ``surplus`` under the relevant data directory, i.e.
``data_dir/pubd/0/surplus`` if you are using the Krill Publication Server and
``data_dir/cas/<your-ca-name>/surplus`` for each of your CAs.


Recover State at Startup
------------------------

When Krill starts, it will try to go back to the last possible **recoverable**
state if:

* it cannot rebuild its state at startup due to data corruption
* the environment variable ``KRILL_FORCE_RECOVER`` is set
* the configuration file contains ``always_recover_data = true``

Under normal circumstances, i.e. when there is no data corruption, performing
this recovery will not be necessary. It can also take significant time due to
all the checks performed. So, we do **not recommend** forcing this.
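
Should you need it anyway, a one-off recovery run can be forced through the
environment rather than the configuration file. The configuration file path
below is illustrative:

.. code-block:: bash

   KRILL_FORCE_RECOVER=1 krill -c /etc/krill.conf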

Krill will try the following checks and recovery attempts:

* Verify each recorded command and its effects (events) in their historical
order.
* If any command or event file is corrupt, it will be moved to a subdirectory
called ``corrupt`` under the relevant data directory, and all subsequent
commands and events will be moved to a subdirectory called ``surplus`` under
the relevant data directory.
* Verify that each snapshot file can be parsed. If it can't then this file is
moved to the relevant ``corrupt`` sub-directory.
* If a snapshot file could not be parsed, try to parse the backup snapshot. If
this file can't be parsed, move it to the relevant ``corrupt`` sub-directory.
* Try to rebuild the state to the last recoverable state, i.e. the last known
  good event. Note that if this pre-dates the available snapshots, or if no
  snapshots are available, Krill will try to rebuild state by replaying all
  events. If you have enabled archiving of events, it will not be able to
  rebuild state this way.
* If rebuilding state failed, Krill will now exit with an error.

Note that in case of data corruption Krill may be able to fall back to an
earlier recoverable state, but this state may be far in the past. You should
always verify your ROAs and/or delegations to child CAs in such cases.

Of course, it's best to avoid data corruption in the first place. Please monitor
available disk space, and make regular backups.

Backup / Restore
----------------

Backing up Krill is as simple as backing up its data directory. There is no
need to stop Krill during the backup. To restore, put back your data directory
and make sure that you refer to it in the configuration file that you use for
your Krill instance. As described above, if Krill finds that the backup
contains an incomplete transaction, it will simply fall back to the state
prior to it.

.. Warning:: You may want to **encrypt** your backup, because the
``data_dir/ssl`` directory contains your private keys in clear
text. Encrypting your backup will help protect these, but of course
also implies that you can only restore if you have the ability to
decrypt.
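
A minimal backup sketch along these lines, assuming the data directory is
``/var/lib/krill`` and using GnuPG for symmetric encryption (both the path and
the tooling are assumptions, not requirements):

.. code-block:: bash

   # Archive the data directory; Krill can keep running while this runs.
   tar -czf krill-backup.tar.gz -C /var/lib/krill .

   # Encrypt the archive, since ssl/ contains private keys in clear text.
   gpg --symmetric --cipher-algo AES256 krill-backup.tar.gz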

Krill Upgrades
--------------

All Krill versions 0.4.1 and upwards can be automatically upgraded to the
current version. Any required data migrations will be performed automatically.
To do so, we recommend that you:

* back up your Krill data directories
* install the new version of Krill
* stop the running Krill instance
* start Krill again, using the new binary and the same configuration

If you want to test whether data migrations will work correctly for your data,
you can do the following:

* copy your data directory to another system
* set the environment variable ``KRILL_UPGRADE_ONLY=1``
* create a configuration file and set ``data_dir=/path/to/your/copy``
* start up Krill

Krill will then perform the data migrations, rebuild its state, and then exit
before doing anything else.
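
In practice, such a test run could look like this (all paths are
illustrative):

.. code-block:: bash

   # Work on a copy of the data directory, never on the live one.
   cp -a /var/lib/krill /tmp/krill-upgrade-test

   # test.conf contains: data_dir = "/tmp/krill-upgrade-test"
   KRILL_UPGRADE_ONLY=1 krill -c ./test.conf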

Krill Downgrades
----------------

Downgrading Krill data is not supported. A downgrade can therefore only be
achieved by installing a previous version of Krill and restoring a backup
taken before your upgrade.

.. _proxy_and_https:

Proxy and HTTPS
---------------

HTTPS Mode
""""""""""

Krill uses HTTPS by default, and will generate a key pair and create a
self-signed certificate if no previous key pair or certificate is found.
Files are stored under the data directory as :file:`ssl/key.pem` and
:file:`ssl/cert.pem` respectively.

Alternatively, you can configure Krill not to generate these files, but to
use existing files at the same locations. This should work, but has not been
tested extensively. To use this mode, set ``https_mode = "existing"`` in your
Krill configuration file.

It is also possible to force Krill to disable HTTPS and use plain HTTP. We do
not recommend this, but it may be useful in certain setups. Arguably, as long
as Krill listens on 127.0.0.1 only (as is the default), and an HTTPS-enabled
proxy server is used for public access, having plain HTTP traffic between the
proxy and Krill over the loopback interface is not necessarily problematic. To
use this mode, set ``https_mode = "disable"`` in your configuration file.
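
In summary, the two non-default modes described above are selected with a
single setting in the configuration file:

.. code-block:: text

   # Use an existing key and certificate at ssl/key.pem and ssl/cert.pem:
   https_mode = "existing"

   # Or serve plain HTTP, e.g. behind a local HTTPS-enabled proxy:
   # https_mode = "disable"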

If you need to access the Krill UI or API (also used by the CLI) from another
machine, then we highly recommend that you use a proxy server such as NGINX or
Apache. This proxy can then also use a proper HTTPS certificate signed by a
web TA, and provide production-grade TLS support.

Proxy Krill UI
""""""""""""""

The Krill UI and assets are hosted directly under the base path ``/``. So, in
order to proxy to the Krill UI you should proxy ALL requests under ``/`` to the
Krill back-end.

Note that although the UI and API are protected by a token, you should consider
further restrictions in your proxy setup, such as restrictions on source IP or
adding your own authentication.
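
As an illustration, a minimal NGINX configuration could look as follows. It
assumes Krill listens on ``https://127.0.0.1:3000`` (verify this against your
own configuration); it is a sketch, not a hardened production setup:

.. code-block:: nginx

   server {
       listen 443 ssl;
       server_name krill.example.net;

       ssl_certificate     /etc/ssl/certs/krill.example.net.pem;
       ssl_certificate_key /etc/ssl/private/krill.example.net.key;

       # Proxy ALL requests under / to the Krill back-end.
       location / {
           proxy_pass https://127.0.0.1:3000/;
       }
   }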

Proxy Krill as Parent
"""""""""""""""""""""

If you delegated resources to child CAs then you will need to ensure that these
children can reach your Krill. Child requests for resource certificates are
directed to the ``/rfc6492`` directory under the ``service_uri`` that you
defined in your configuration file.

Note that, in contrast to the UI, you should not add any additional
authentication mechanisms to this location. :RFC:`6492` uses cryptographically
signed messages sent over HTTP and is secure. However, verifying messages and
signing responses can be computationally heavy, so if you know the source IP
addresses of your child CAs, you may wish to restrict access based on those.
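
Continuing the hypothetical NGINX sketch above, such a source IP restriction
could look like this (the address range is a placeholder):

.. code-block:: nginx

   location /rfc6492 {
       allow 192.0.2.0/24;   # the addresses of your child CAs
       deny  all;
       proxy_pass https://127.0.0.1:3000/rfc6492;
   }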

Proxy Krill as Publication Server
"""""""""""""""""""""""""""""""""

If you are running Krill as a Publication Server, then you should read
:ref:`here<doc_krill_publication_server>` how to do the Publication Server
specific set up.

.. Warning:: We recommend that you do **not** make Krill available to the public
internet unless you really need remote access to the UI or API, or
you are serving as parent CA or Publication Server for other CAs.
