Update developer docs for backend with more notes
* Split it out into separate files
* Add notes about status of docs when relevant
* Integrate documentation from docs into main backend repo
Showing 15 changed files with 3,992 additions and 518 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,112 @@ | ||
# OONI backend | ||
# OONI Backend | ||
|
||
Welcome to the OONI backend! | ||
The backend infrastructure performs multiple functions: | ||
|
||
- Provide APIs for data consumers | ||
|
||
- Instruct probes on what measurements to perform | ||
|
||
- Receive measurements from probes, process them and store them in the database | ||
|
||
- Upload new measurements to the [S3 data bucket](#s3-data-bucket) | ||
|
||
- Fetch data from external sources, e.g. fingerprints from a GitHub repository | ||
|
||
## Main data flows | ||
|
||
OONI Probes generally run once every hour or once every day, depending on the platform. | ||
The following sequence diagram shows what happens during a probe run: | ||
|
||
```mermaid | ||
sequenceDiagram | ||
participant OONIProbe as OONI Probe | ||
participant ProbeServices as OONI Backend | ||
participant Internet | ||
OONIProbe ->>+ Internet: lookupProbeMeta() | ||
Internet ->>- OONIProbe: ProbeMeta | ||
OONIProbe ->>+ ProbeServices: checkIn(ProbeMeta) | ||
ProbeServices -->>- OONIProbe: []Targets | ||
loop Every target | ||
OONIProbe ->>+ Internet: runExperiment(target) | ||
opt Control | ||
OONIProbe ->>+ ProbeServices: runControl(target) | ||
ProbeServices ->>- OONIProbe: CtrlMeasurement | ||
end | ||
Internet ->>- OONIProbe: Measurement | ||
OONIProbe ->> ProbeServices: upload(Measurement) | ||
end | ||
``` | ||
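
The sequence above can also be read as a short loop. The following is a minimal sketch and not the probe's actual code: `FakeBackend`, `run_probe` and the callables passed to it are hypothetical names, introduced here only to mirror the steps in the diagram.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for the OONI backend."""

    targets: list
    uploaded: list = field(default_factory=list)

    def check_in(self, probe_meta: dict) -> list:
        # checkIn(ProbeMeta) -> []Targets
        return list(self.targets)

    def run_control(self, target: str) -> dict:
        # runControl(target) -> CtrlMeasurement
        return {"target": target, "control": True}

    def upload(self, measurement: dict) -> None:
        # upload(Measurement)
        self.uploaded.append(measurement)


def run_probe(
    backend: FakeBackend,
    lookup_probe_meta: Callable[[], dict],
    run_experiment: Callable[[str, Optional[dict]], dict],
    use_control: bool = True,
) -> int:
    """Run one probe session and return the number of targets measured."""
    probe_meta = lookup_probe_meta()
    targets = backend.check_in(probe_meta)
    for target in targets:
        # The control measurement is optional, as in the diagram.
        control = backend.run_control(target) if use_control else None
        measurement = run_experiment(target, control)
        backend.upload(measurement)
    return len(targets)
```

Each target produces exactly one upload, so the backend receives measurements one at a time rather than as a batch at the end of the run.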
|
||
The next diagram shows the main flow of measurement data. | ||
|
||
The dark rectangles represent processes. The cylinders represent data at rest, | ||
stored as files on disk, files on S3 or records in database tables. | ||
|
||
```mermaid | ||
flowchart LR | ||
A(("Measurement")):::measurement --> B["Measurement is uploaded"] | ||
B --> C["Fastpath (realtime)"]:::gray8Node & D["Disk Queue"] | ||
C --> E["Fastpath Table"]:::gray3Node@{ shape: cyl} | ||
D --> F["S3 Uploader (every hour)"]:::gray8Node | ||
F --> G["s3://ooni-data-eu-fra bucket"]@{shape: cyl} | ||
E --> H["OONI API"]:::gray8Node | ||
D --> decision{"`is older than 1h?`"} | ||
G --> decision | ||
decision --> H | ||
G --> PipelineV5["OONI Pipeline v5 (every day)"]:::gray8Node | ||
PipelineV5 --> O["Observation Tables"]:::gray3Node@{ shape: cyl} | ||
O --> H | ||
classDef measurement fill:#0588cb,color:#fff | ||
classDef gray2Node fill:#e9ecef,color:#000000 | ||
classDef gray3Node fill:#ced4da,color:#000000 | ||
classDef gray8Node fill:#343a40,color:#fff | ||
``` | ||
|
||
Probes submit each measurement to the API with a POST request to the following endpoint: | ||
<https://api.ooni.io/apidocs/#/default/post_report__report_id_>. The | ||
measurement is decompressed if zstd compression is detected. | ||
It is then parsed, assigned a unique ID and saved to disk. Very | ||
little validation is done at this stage, so that all | ||
incoming measurements are accepted. | ||
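
The receive step described above can be sketched as follows. This is an illustration under stated assumptions, not the backend's implementation: the function name `ingest`, the `measurement_uid` key, the one-JSON-file-per-measurement layout and the injected `decompress` callable are all hypothetical. The only fixed fact used is the zstd frame magic number, which is one possible way to detect compression.

```python
import json
import uuid
from pathlib import Path
from typing import Callable, Optional

# First four bytes of a zstd frame (magic number 0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def ingest(
    body: bytes,
    spool_dir: Path,
    decompress: Optional[Callable[[bytes], bytes]] = None,
) -> str:
    """Accept one measurement body, store it on disk and return its ID."""
    if body.startswith(ZSTD_MAGIC):
        # Assumption: the caller supplies a zstd decompressor.
        if decompress is None:
            raise ValueError("zstd body received but no decompressor configured")
        body = decompress(body)

    # Parse only; no further validation, so that measurements are not rejected.
    measurement = json.loads(body)

    uid = uuid.uuid4().hex
    measurement["measurement_uid"] = uid  # hypothetical field name

    spool_dir.mkdir(parents=True, exist_ok=True)
    (spool_dir / f"{uid}.json").write_text(json.dumps(measurement))
    return uid
```

Passing the decompressor in as a parameter keeps the sketch independent of any particular zstd library.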
|
||
Measurements are enqueued on disk using one file per measurement. At | ||
hourly intervals they are batched together, compressed and uploaded to | ||
S3 by the [Measurement uploader](#measurement-uploader) ⚙. The batching is | ||
performed to allow efficient compression. See the | ||
[dedicated subchapter](#measurement-uploader) ⚙ for details. | ||
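
To see why batching helps compression, the sketch below compares compressing similar JSON documents one by one against compressing them as a single batch. It uses `zlib` from the Python standard library purely as a stand-in; the uploader's actual compression format and the sample fields shown here are assumptions made for illustration.

```python
import json
import zlib


def compressed_sizes(measurements: list) -> tuple:
    """Return (total size when compressed one by one, size when batched)."""
    blobs = [json.dumps(m).encode() for m in measurements]
    individually = sum(len(zlib.compress(blob)) for blob in blobs)
    batched = len(zlib.compress(b"\n".join(blobs)))
    return individually, batched


# Hypothetical sample: many measurements that share most of their structure.
sample = [
    {"test_name": "web_connectivity", "probe_cc": "IT", "input": f"https://example.org/{i}"}
    for i in range(100)
]
individually, batched = compressed_sizes(sample)
print(f"one by one: {individually} bytes, batched: {batched} bytes")
```

Measurements share key names and many values, so a compressor that sees them together can encode the repetition once instead of once per file.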
|
||
The measurement is also sent to the [Fastpath](#fastpath) ⚙. The | ||
Fastpath runs as a dedicated daemon with a pool of workers. It | ||
calculates scoring for the measurement and writes a record in the | ||
fastpath table. Each measurement is processed individually in real time. | ||
See the [dedicated subchapter](#fastpath) ⚙ below. | ||
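
A pool of workers that handles each measurement independently can be sketched as below. This is not the Fastpath code: `process_all`, the `score` callable and the shape of the output rows are hypothetical, and a thread pool is used only because it is available in the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_all(
    measurements: list,
    score: Callable[[dict], dict],
    workers: int = 4,
) -> list:
    """Score each measurement on its own and return one row per measurement."""

    def process_one(measurement: dict) -> dict:
        # Hypothetical row layout: an identifier plus whatever `score` returns.
        return {"measurement_uid": measurement["measurement_uid"], **score(measurement)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though work runs concurrently.
        return list(pool.map(process_one, measurements))
```

Because no measurement depends on another, the pool size can be tuned without changing the results.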
|
||
The disk queue is also used by the API to access recent measurements | ||
that have not been uploaded to S3 yet. See the | ||
[measurement API](#getting-measurement-bodies) 🐝 for details. | ||
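
The lookup order implied above (disk queue first, then S3) can be sketched as a simple fallback. The function name, the file naming scheme and the `fetch_from_s3` callable are hypothetical and stand in for whatever the API actually uses.

```python
from pathlib import Path
from typing import Callable


def get_measurement_body(
    uid: str,
    spool_dir: Path,
    fetch_from_s3: Callable[[str], bytes],
) -> bytes:
    """Return a measurement body from the disk queue, or from S3 otherwise."""
    queued = spool_dir / f"{uid}.json"  # hypothetical one-file-per-measurement layout
    if queued.is_file():
        # Recent measurement that has not been uploaded yet.
        return queued.read_bytes()
    # Older measurement: already batched and uploaded.
    return fetch_from_s3(uid)
```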
|
||
## Reproducibility | ||
|
||
The measurement processing pipeline is meant to generate outputs that | ||
can be reproduced independently by third parties, such as external | ||
researchers and other organizations. | ||
|
||
This is meant to keep OONI accountable and to serve as proof that we do not | ||
arbitrarily delete or alter measurements and that we score them as | ||
accessible/anomaly/confirmed/failure in a predictable and transparent | ||
way. | ||
|
||
> **important** | ||
> The only exceptions were due to privacy breaches that required removal | ||
> of the affected measurements from the [S3 data bucket](#s3-data-bucket) 💡. | ||
As such, the backend infrastructure is | ||
[FOSS](https://en.wikipedia.org/wiki/Free_and_open-source_software) and | ||
can be deployed by 3rd parties. We encourage researchers to replicate | ||
our findings. | ||
|
||
Incoming measurements are minimally altered by the | ||
[Measurement uploader](#measurement-uploader) ⚙ and uploaded to S3. |