Update documentation and comment. (#1091)
Tim Bruijnzeels committed Dec 28, 2023
1 parent 46feb1e commit c19463e
Showing 6 changed files with 195 additions and 1,017 deletions.
8 changes: 6 additions & 2 deletions doc/development/01_daemon.md
@@ -23,9 +23,11 @@ configuration, and concurrency are handled.
Binaries
--------

The project includes two binaries:
The project includes the following binaries:
* `krill` is used to start a Krill daemon
* `krillc` is the CLI which uses the daemon's API
* `krillta` is used for the (offline) TA Signer, and to access the TA Proxy through the API
* `krillup` can be used to test data migrations in isolation

HTTPS Server
------------
@@ -37,9 +39,11 @@ function in `src/daemon/http/server.rs`. This function performs the following st
* Verifies that the configured data directory is usable.
* Calls 'pre-start' upgrades before state is built. (e.g. migrate data structures).
* Instantiates a `KrillServer`, which will guard all state.
* Calls 'post_start_upgrade' to trigger any upgrade-related runtime actions (may be none).
* Gets the shared `Scheduler` from `KrillServer` and runs it so that it can pick up tasks
* Creates a self-signed TLS certificate, unless one was prepared earlier.
* Builds a `hyper` server which then connects to the configured port and handles connections.
* This server keeps running until the Krill binary is terminated.
* This server keeps running until the `KrillServer` or `Scheduler` is terminated.

Note that the `hyper` server itself is stateless. For this it relies on an `Arc<KrillServer>` which can
be cloned cheaply whenever a request is processed (see the illustrative sketch below). So, we use hyper for the following:
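
As an illustrative aside, here is a hypothetical, minimal sketch of the stateless-hyper-with-shared-`Arc` pattern described above. It assumes hyper 0.14's `make_service_fn`/`service_fn` API and made-up state types; it is not Krill's actual server code.

```rust
// Hypothetical sketch: a stateless hyper service that clones a cheap Arc<_>
// handle to shared state for every connection and request.
// Assumes hyper 0.14 (features = ["full"]) and tokio; not Krill's real code.
use std::convert::Infallible;
use std::net::SocketAddr;
use std::sync::Arc;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

struct SharedState {
    name: String,
}

#[tokio::main]
async fn main() {
    let state = Arc::new(SharedState { name: "krill-like server".to_string() });

    let make_svc = make_service_fn(move |_conn| {
        // Cheap Arc clone per connection.
        let state = Arc::clone(&state);
        async move {
            Ok::<_, Infallible>(service_fn(move |_req: Request<Body>| {
                // Cheap Arc clone per request.
                let state = Arc::clone(&state);
                async move {
                    Ok::<_, Infallible>(Response::new(Body::from(format!(
                        "hello from {}",
                        state.name
                    ))))
                }
            }))
        }
    });

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    Server::bind(&addr).serve(make_svc).await.unwrap();
}
```
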
3 changes: 3 additions & 0 deletions doc/development/02_cli.md
@@ -3,6 +3,9 @@ Krill Command Line Client Setup

There is a CLI binary included in Krill: `krillc`

User documentation is here:
https://krill.docs.nlnetlabs.nl/en/stable/cli.html

The CLI can be used to manage Certification Authorities, as well as the Publication Server (if used).
It also offers some functionality that the UI does not.

168 changes: 87 additions & 81 deletions doc/development/04_es_krill.md
@@ -18,35 +18,31 @@ KeyValueStore and JSON
----------------------

Before we delve into the Krill eventsourcing code, we should talk a bit about storage.
Krill stores all values in a `KeyValueStore`, which is currently implemented as an
enum using a disk based back-end as the only current implementation. The idea is that
this will be extended in future with other implementations, perhaps [sled](https://docs.rs/sled/0.34.6/sled/),
[tikv](https://github.com/tikv/tikv) or some redis based store.
Krill stores all values in a `KeyValueStore`.

It may be good to use an enum, because if we have all possible implementations in our
own code then we don't need generics - which have a way of trickling up and causing long
compilation times.
This is implemented in the `kvx` library:
https://github.com/nlnetlabs/kvx

In any case, the `KeyValueStore` (`src/commons/eventsourcing/kv.rs`) expects that we present
a `KeyStoreKey` and save or retrieve values. The key supports 'scopes' which can be useful
for categorizing values according to their 'aggregate'. Scopes can also use `/` characters
to present sub-scopes, or sub-dirs in the the disk based implementation:
There is a PR to port the `kvx` implementation back into the core Krill code (Krill is
the only user after all), and in the process make it support async. To support this,
the code was updated to rely on an enum rather than a trait for `KeyValueStore`. This
PR can be found here:
https://github.com/NLnetLabs/krill/pull/1152

```rust
#[derive(Clone, Debug)]
pub struct KeyStoreKey {
scope: Option<String>,
name: String,
}
```
Currently, only disk and memory (for testing) implementations are supported. Database
options may be added in future, but note that this will require async support. The `kvx`
library claims to support PostgreSQL, but that back-end cannot be used in Krill: while
the library itself is sync, it uses an async runtime under the hood for the database
connection, and this conflicts with Krill's existing use of hyper and tokio.
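
As an illustration of the enum-versus-trait point mentioned above, a minimal sketch of enum-based dispatch for a key-value store, with disk and database back-ends left as further variants (made-up types, not the actual `kvx` or Krill API):

```rust
// Illustrative sketch only: all supported back-ends live in one enum, so no
// generics are needed on the types that hold a KeyValueStore.
use std::collections::HashMap;

enum KeyValueStore {
    Memory(HashMap<String, String>),
    // Disk(PathBuf), Database(..) could be added as further variants.
}

impl KeyValueStore {
    fn store(&mut self, key: &str, value: &str) {
        match self {
            KeyValueStore::Memory(map) => {
                map.insert(key.to_string(), value.to_string());
            }
        }
    }

    fn get(&self, key: &str) -> Option<&String> {
        match self {
            KeyValueStore::Memory(map) => map.get(key),
        }
    }
}

fn main() {
    let mut kv = KeyValueStore::Memory(HashMap::new());
    kv.store("ca/testbed/snapshot.json", "{ \"version\": 1 }");
    assert_eq!(
        kv.get("ca/testbed/snapshot.json").map(String::as_str),
        Some("{ \"version\": 1 }")
    );
}
```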

We use serde json (de-)serialization for all types that need to be stored. The following
trait is used as a convenient shorthand:
Opt-in locking on disk relies on `fd-lock` in `kvx`, and on plain `rustix` in the PR
(UNIX only). This locking is used by Krill to ensure that updates to CAs, the Publication
Server Access (i.e. which publishers have access), and the Publication Server Content are
always applied sequentially.

```rust
pub trait Storable: Clone + Serialize + DeserializeOwned + Sized + 'static {}
impl<T: Clone + Serialize + DeserializeOwned + Sized + 'static> Storable for T {}
```
In principle, since the locking leverages `flock`, which is supposedly NFS safe, this
should mean that as of 0.14.4 it is safe to run multiple active Krill instances using
the same shared NFS data directory. But... this needs proper testing!
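
A minimal sketch of this kind of advisory locking, assuming the `fd-lock` crate's `RwLock` API (illustrative only; not Krill's actual locking code):

```rust
// Sketch: one lock file per entity (e.g. a CA) that must be updated
// sequentially. fd-lock uses flock-style advisory locking under the hood.
use std::fs::OpenOptions;

use fd_lock::RwLock;

fn main() -> std::io::Result<()> {
    let file = OpenOptions::new()
        .create(true)
        .write(true)
        .open("/tmp/krill-example-ca.lock")?;
    let mut lock = RwLock::new(file);

    {
        // Blocks until no other process holds the lock.
        let _guard = lock.write()?;
        // ... apply the update here, knowing no other process is applying
        // a conflicting change at the same time ...
    } // the lock is released when the guard is dropped

    Ok(())
}
```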

Aggregate
---------
@@ -294,7 +290,7 @@ Event Listeners
The Krill eventsourcing stack defines two different `EventListener` traits which are called
by the `AggregateStore` (see below) when an Aggregate is successfully updated. The first is
called before updates are saved and can fail; the second is called after all changes have
been applied, and cannot fail:
been applied, and cannot fail.

```rust
/// This trait defines a listener for events which is designed to receive
@@ -315,6 +311,17 @@ pub trait PostSaveEventListener<A: Aggregate>: Send + Sync + 'static {
}
```

In a nutshell, we use the event listeners for two things:
- Triggering an update of the Manifest and CRL in the `CaObjects` for a CA (pre-save)
- Triggering follow-up tasks on the `Scheduler`, based on events (post-save)

As discussed in issue https://github.com/NLnetLabs/krill/issues/1182,
it would be best to remove the `PreSaveEventListener` trait and do everything
through (idempotent) triggered tasks on the queue in the `Scheduler`. Note
that Krill adds any missing tasks to this queue at startup, so even if an
aggregate, like a CA, is saved and the follow-up task scheduling then fails
because of an outage, the task will simply be re-added when Krill restarts.
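
As a generic illustration (made-up trait names and signatures, not Krill's actual `PreSaveEventListener`/`PostSaveEventListener` definitions), the difference between a fallible pre-save listener and an infallible post-save listener looks roughly like this:

```rust
// Illustrative sketch only: a pre-save listener may veto a change by
// returning an error, while a post-save listener cannot fail.
trait PreSaveListener<E> {
    fn listen(&self, events: &[E]) -> Result<(), String>;
}

trait PostSaveListener<E> {
    fn listen(&self, events: &[E]);
}

struct RejectEmpty;
impl<E> PreSaveListener<E> for RejectEmpty {
    fn listen(&self, events: &[E]) -> Result<(), String> {
        if events.is_empty() {
            Err("no events".to_string())
        } else {
            Ok(())
        }
    }
}

struct LogEvents;
impl<E: std::fmt::Debug> PostSaveListener<E> for LogEvents {
    fn listen(&self, events: &[E]) {
        for e in events {
            // e.g. schedule a follow-up task for each event
            println!("saved event: {e:?}");
        }
    }
}

fn main() {
    let events = vec!["roa_added"];
    RejectEmpty.listen(&events).expect("pre-save listener may fail");
    LogEvents.listen(&events); // post-save listener cannot fail
}
```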

AggregateStore
--------------
@@ -354,7 +361,7 @@ where
/// Send a command to the latest aggregate referenced by the handle in the command.
///
/// This will:
/// - Retrieve the latest aggregate for this command.
/// - Wait for a LOCK for the latest aggregate for this command.
/// - Call the A::process_command function
/// on success:
/// - call pre-save listeners with events
@@ -384,68 +391,67 @@ explain its inner workings:
pub struct AggregateStore<A: Aggregate> {
kv: KeyValueStore,
cache: RwLock<HashMap<Handle, Arc<A>>>,
history_cache: Option<Mutex<HashMap<MyHandle, Vec<CommandHistoryRecord>>>>,
pre_save_listeners: Vec<Arc<dyn PreSaveEventListener<A>>>,
post_save_listeners: Vec<Arc<dyn PostSaveEventListener<A>>>,
outer_lock: RwLock<()>,
}
```

- Locking transactions
- Applying changes

First of all, note the presence of `outer_lock`. This is used a transactional
lock, across ALL aggregates. The `AggregateStore` will hold a write lock during
any updates, i.e. when an `Aggregate` is added, or when a command is sent to it.
And it will use a read lock for other operations.
ALL CHANGES to Aggregates, like `CertAuth`, are done through the `command` function.
This function waits for a lock (using the flock mechanism mentioned earlier) to ensure
that all changes to the Aggregate are applied sequentially.

This is not the most efficient way of doing things, and it should be revised
in future - especially when non-disk-based `KeyValueStore` options come into
play.
First, the `Aggregate` is retrieved from the in-memory cache if present. If there
is no cached instance then the latest snapshot is retrieved from storage instead. If
there is also no snapshot, then the INIT command (i.e. with version 0) is retrieved
and applied instead. Then the key value store is queried for follow-up `StoredCommand`
values (for the version of the `Aggregate`), which are then applied. Note that
this means, in a possible cluster setup that guarantees locking, that even if one
cluster node is behind another, it will simply find and apply the missing updates.
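
As a self-contained illustration (made-up types, not Krill's actual code), the rehydration logic described above amounts to something like:

```rust
// Sketch: start from a cached copy or a snapshot, fall back to version 0,
// then replay stored commands up to the latest version.
struct Counter {
    version: u64,
    value: i64,
}

enum StoredCommand {
    Init,
    Add(i64),
}

fn apply(state: &mut Counter, cmd: &StoredCommand) {
    match cmd {
        StoredCommand::Init => {}
        StoredCommand::Add(n) => state.value += n,
    }
    state.version += 1;
}

fn rehydrate(cached: Option<Counter>, snapshot: Option<Counter>, log: &[StoredCommand]) -> Counter {
    // Prefer the in-memory cache, then the stored snapshot, then version 0.
    let mut state = cached.or(snapshot).unwrap_or(Counter { version: 0, value: 0 });
    // Replay any commands stored after the version we already have.
    for cmd in &log[state.version as usize..] {
        apply(&mut state, cmd);
    }
    state
}

fn main() {
    let log = vec![StoredCommand::Init, StoredCommand::Add(2), StoredCommand::Add(3)];
    let state = rehydrate(None, None, &log);
    assert_eq!((state.version, state.value), (3, 5));
}
```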

Locking across all `Aggregate` instances on updates could be optimized already.
We could have separate locks for each instead, but we would need to manage these
locks when new instances are added or removed, and it does not really matter
in real terms of performance today. So, it is just a simple implementation for now.

Another thing worth mentioning here is that we keep a key value pair for each
`Aggregate` that describes its current version information. The structure is as
follows:

```rust
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub struct StoredValueInfo {
pub snapshot_version: u64,
pub last_event: u64,
pub last_command: u64,
pub last_update: Time,
}
```
Once the latest `Aggregate` has been retrieved, the `Command` is sent to it. Note
that this `Command` is not a `StoredCommand`: the former contains the intent for a
change, while the latter contains the result of such a change. The `Aggregate` is
responsible for verifying the command and can return an error, no effect, or a vec
of change events.

If a command has no effect, i.e. there is no error and no change, then it is
simply forgotten. But, if a command resulted in changes, or an error, then
it is saved. In case of an error we simply save the command and a description
of the error. In case of success we save the command and all events. Whenever
a command is saved the `last_command` field is updated. If the command was
successful then the `last_event` and `last_update` values are also updated.

- Storing / Retrieving

The `AggregateStore` uses a `KeyValueStore` to save/retrieve key value pairs.
Commands and events are saved and retrieved this way. The `AggregateStore`
uses a strategy for key naming - which is probably too detailed for this
documentation. Values are saved/retrieved using JSON (`serde_json`). In
addition to commands and events we also save a current, and backup snapshot
for each `Aggregate`, and the `self.cache` is updated.

When an `AggregateStore` needs to get an `Aggregate` it will first retrieve
the latest `StoredValueInfo` from the `KeyValueStore`. Then it will try to
get the `Aggregate` from its `self.cache`.

If the `AggregateStore` cannot find an entry in the cache then it will try
to rebuilt the `Aggregate` by retrieving and deserializing the current
snapshot, and if that fails the backup snapshot, and if that also fails by
instantiating it using the initialization event (delta-0.xml).

In either case (cache or no cache) it will now verify whether the version of
the `Aggregate` matches the `latest_event` found in the `StoredValueInfo`.
If the number is lower, then it will retrieve all missing events from the
`KeyValueStore` and apply them.
simply forgotten.

If a command has an effect then a `StoredCommand` is created that contains a
storable representation of the command (e.g. certain values in a command, like
an `Arc<Signer>` or `Arc<Config>`, are not included) and either the error or
a Vec of Events. This `StoredCommand` gets a unique key name based on the
version of the `Aggregate` that it affects.

If an existing key-value pair is found for this `StoredCommand`, then this
indicates that the locking mechanism failed somehow. This should not happen,
but if it does, the command is NOT saved. Instead Krill exits with an error
message.

But if all is well (as expected), then the command with events is applied
to the aggregate. Note that for a command that resulted in an error this just
updates the aggregate's version. The in-memory cached aggregate is updated to
help performance when retrieving it later. Then the command is saved.
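
A toy sketch of this command-handling flow (made-up types, not Krill's actual `StoredCommand` or `Aggregate` definitions): a command either has no effect, fails, or yields events, and anything other than "no effect" is stored under a key derived from the aggregate version.

```rust
// Illustrative sketch only.
use std::collections::BTreeMap;

#[derive(Debug)]
enum Event {
    ValueSet(i64),
}

#[derive(Debug)]
enum Outcome {
    NoEffect,
    Error(String),
    Events(Vec<Event>),
}

#[derive(Debug)]
struct StoredCommand {
    version: u64,
    result: Result<Vec<Event>, String>,
}

struct Aggregate {
    version: u64,
    value: i64,
}

impl Aggregate {
    fn process(&self, new_value: i64) -> Outcome {
        if new_value < 0 {
            Outcome::Error("negative values are not allowed".into())
        } else if new_value == self.value {
            Outcome::NoEffect
        } else {
            Outcome::Events(vec![Event::ValueSet(new_value)])
        }
    }
}

fn main() {
    let mut store: BTreeMap<u64, StoredCommand> = BTreeMap::new();
    let aggregate = Aggregate { version: 7, value: 1 };

    match aggregate.process(2) {
        Outcome::NoEffect => {} // nothing is saved
        Outcome::Error(e) => {
            store.insert(aggregate.version, StoredCommand { version: aggregate.version, result: Err(e) });
        }
        Outcome::Events(events) => {
            // The key is the version the command applies to; a pre-existing
            // entry would indicate that locking failed.
            assert!(!store.contains_key(&aggregate.version));
            store.insert(aggregate.version, StoredCommand { version: aggregate.version, result: Ok(events) });
        }
    }
    println!("{store:?}");
}
```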

- Retrieving

When retrieving an `Aggregate` for read access, Krill actually follows
the same code path that is used for applying a `Command`, except that in
this case the underlying function that takes care of locking and retrieving
the latest Aggregate is called with `None` instead of `Some(command)`, so
it simply returns without trying to apply any changes. This ensures, however,
that the same locking rules are observed in both cases and allows us to
avoid code duplication (thus reducing the places where bugs could hide).

- Snapshots

Note that we do not update the full snapshot on every change because this
could create a performance issue in cases where aggregates are big (e.g. a
CA with many children or objects) and serialization takes long. Instead, updating
the snapshots is done daily through a different code path from the `Scheduler`.
See `Task::UpdateSnapshots`. This code, again, actually uses the same underlying
function as above to retrieve the snapshot, this time setting the `save_snapshot`
parameter to true to ensure that the snapshot is saved.
45 changes: 34 additions & 11 deletions doc/development/05_repo_manager.md
@@ -19,21 +19,25 @@ it was convenient to create wrapper types to help access:
/// * verifying that a publisher is allowed to publish
/// * publish content to RRDP and rsync
pub struct RepositoryManager {
config: Arc<Config>,
access: Arc<RepositoryAccessProxy>,
content: Arc<RepositoryContentProxy>,

// shared task queue, use to schedule RRDP updates when content is updated.
tasks: Arc<TaskQueue>,

config: Arc<Config>,
signer: Arc<KrillSigner>,
}
```

```rust
/// We can only have one (1) RepositoryAccess, but it is an event-sourced
/// type which is stored in an AggregateStore which could theoretically
/// serve multiple RepositoryAccess instances. So, we use RepositoryAccessProxy
/// as a wrapper around this so that callers don't need to worry about storage details.
/// type which is stored in an AggregateStore which could theoretically
/// serve multiple. So, we use RepositoryAccessProxy as a wrapper around
/// this so that callers don't need to worry about storage details.
pub struct RepositoryAccessProxy {
store: AggregateStore<RepositoryAccess>,
key: Handle,
key: MyHandle,
}
```

@@ -43,9 +47,8 @@ pub struct RepositoryAccessProxy {
/// so that callers don't need to worry about storage details.
#[derive(Debug)]
pub struct RepositoryContentProxy {
cache: RwLock<Option<Arc<RepositoryContent>>>,
store: RwLock<KeyValueStore>,
key: KeyStoreKey,
store: Arc<WalStore<RepositoryContent>>,
default_handle: MyHandle,
}
```

@@ -186,10 +189,9 @@ pub struct PublisherStats {
Publishing / RFC 8181
---------------------

Now, for the main purpose of course.. we have a simple function here that will take bytes
Now, for the main purpose of course.. we have a function here that will take bytes
submitted by a publisher, parse it and validate it as an RFC 8181 request and return the
appropriate signed response. If a valid delta was submitted, then the RRDP and rsync content
will be updated immediately.
appropriate signed response.

```rust
pub fn rfc8181(&self, publisher_handle: PublisherHandle, msg_bytes: Bytes) -> KrillResult<Bytes>;
Expand All @@ -201,3 +203,24 @@ from - e.g. there is I/O error - something is seriously broken in the server - t
will return an actual *rust* Error. It's up to the caller of this function to take
appropriate action, e.g. give the publisher an HTTP response code, but possibly even
crash this server - if it cannot function anymore.

Under the hood this function will first retrieve the publisher from `RepositoryAccess`
to verify that the sender is known, and get their ID certificate (used for signing)
and `base_uri` rsync jail. The content of the query (list or send updates) is then
passed on to `RepositoryContent`.

The `RepositoryContent` type is not event-sourced as it turned out that keeping the
full history of changes from all CAs (especially their manifests and CRL updates)
resulted in keeping way too much data.

Instead this relies on the Write-Ahead-Log support. This is very similar
to the event-sourcing used elsewhere, except that it does not guarantee that all
changes are kept. The `WalStore` (Wal: Write-Ahead-Log) only keeps a snapshot and
the latest 'change sets' since that snapshot. As with the `AggregateStore`,
snapshots are only updated nightly for performance reasons. In fact, it was the
250MB+ nic.br repository content entity that triggered this code change.
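
A toy sketch of the write-ahead-log idea (made-up types, not the actual `WalStore` API): the current state is rebuilt from the last snapshot plus the change sets recorded since it.

```rust
// Illustrative sketch only: replay the change sets recorded since the last
// snapshot to arrive at the current repository content.
#[derive(Clone, Debug)]
struct Content {
    revision: u64,
    objects: Vec<String>,
}

enum Change {
    Publish(String),
    Withdraw(String),
}

fn replay(snapshot: &Content, change_sets: &[Vec<Change>]) -> Content {
    let mut current = snapshot.clone();
    for set in change_sets {
        for change in set {
            match change {
                Change::Publish(uri) => current.objects.push(uri.clone()),
                Change::Withdraw(uri) => current.objects.retain(|o| o != uri),
            }
        }
        current.revision += 1;
    }
    current
}

fn main() {
    let snapshot = Content { revision: 10, objects: vec!["crl".into()] };
    let change_sets = vec![
        vec![Change::Publish("roa-1".into())],
        vec![Change::Withdraw("crl".into()), Change::Publish("crl-2".into())],
    ];
    let current = replay(&snapshot, &change_sets);
    assert_eq!(current.revision, 12);
    assert_eq!(current.objects, vec!["roa-1".to_string(), "crl-2".to_string()]);
}
```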

The locking, retrieval and updating are similar to `AggregateStore`, but use
separate types for all this for historical reasons. A future improvement could be
to merge the types and make it a per-type (CertAuth, RepositoryAccess,
RepositoryContent, etc.) choice whether it's fully event-sourced or only recent
changes are kept.