[docs] add browsertrix webhook doc
williamchong committed Jun 20, 2024
1 parent fb9641b commit 2a5c669
Showing 2 changed files with 116 additions and 0 deletions.
91 changes: 91 additions & 0 deletions docs/browsertrix.md
@@ -0,0 +1,91 @@
# Browsertrix Webhook setup

Guide to setting up app.browsertrix.com so that its webhook events are delivered to your webhook endpoint.

## Authentication

Currently, the API only supports exchanging a username and password for a JWT, which expires after roughly 3 months. Please refer to the `Login` section below.

After obtaining the JWT, set the `Authorization: Bearer ${JWT}` header when accessing authenticated routes.

## Login

POST `https://app.browsertrix.com/api/auth/jwt/login`

Exchange your account username and password for a JWT.

```
curl --location 'https://app.browsertrix.com/api/auth/jwt/login' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Accept: application/json' \
--data-urlencode 'username=<string>' \
--data-urlencode 'password=<string>'
```

Response:

```
{
"access_token": "string",
"token_type": "string"
}
```
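The same login exchange can be sketched in Python. This is a minimal illustration, not part of the project: the helper names `build_login_request` and `bearer_header` are hypothetical, and the request is only built here, not sent (pass it to `urllib.request.urlopen` to actually log in).

```python
import json
import urllib.parse
import urllib.request

LOGIN_URL = "https://app.browsertrix.com/api/auth/jwt/login"


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the form-encoded login request."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def bearer_header(login_response_body: str) -> dict:
    """Turn the login response JSON into the Authorization header."""
    token = json.loads(login_response_body)["access_token"]
    return {"Authorization": f"Bearer {token}"}
```

The dictionary returned by `bearer_header` can be merged into the headers of any later authenticated request.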

## Get Organization ID

GET `https://app.browsertrix.com/api/orgs`

List organizations to obtain the organization ID needed for subsequent API calls.

```
curl --location 'https://app.browsertrix.com/api/orgs' \
--header 'Authorization: Bearer <jwt>'
```

Response:

```
{
"items": [
{
"id": "cb8515f9-7622-4879-b79e-d1f084a11ea2",
"name": "Starling Lab",
"slug": "starling-lab",
"users": {
...
}
...
}
]
}
```
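As a rough sketch, the organization ID can be picked out of this response by matching on `slug`. The function name `find_org_id` is hypothetical, and it assumes only the `items`, `id`, and `slug` fields shown above.

```python
import json


def find_org_id(orgs_response_body: str, slug: str) -> str:
    """Return the `id` of the organization whose `slug` matches."""
    items = json.loads(orgs_response_body)["items"]
    for org in items:
        if org.get("slug") == slug:
            return org["id"]
    raise LookupError(f"no organization with slug {slug!r}")
```

For the response above, `find_org_id(body, "starling-lab")` would return the `id` value of the Starling Lab entry.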

## Set Webhook URL Config

POST `https://app.browsertrix.com/api/orgs/<org-id>/event-webhook-urls`

Set the webhook URLs to be used for all crawls in an organization.

```
curl --location 'https://app.browsertrix.com/api/orgs/<org-id>/event-webhook-urls' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <jwt>' \
--data '{
"crawlStarted": "<webhook-url>",
"crawlFinished": "<webhook-url>",
"crawlDeleted": "<webhook-url>",
"uploadFinished": "<webhook-url>",
"uploadDeleted": "<webhook-url>",
"addedToCollection": "<webhook-url>",
"removedFromCollection": "<webhook-url>",
"collectionDeleted": "<webhook-url>"
}'
```

Response:

```
{
"updated": true
}
```
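The request body can also be assembled programmatically. The sketch below assumes that every event should point at the same webhook URL, which the API does not require; the helper name `build_webhook_config_request` is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Event names taken from the request body shown above.
EVENT_NAMES = (
    "crawlStarted",
    "crawlFinished",
    "crawlDeleted",
    "uploadFinished",
    "uploadDeleted",
    "addedToCollection",
    "removedFromCollection",
    "collectionDeleted",
)


def build_webhook_config_request(
    org_id: str, jwt: str, webhook_url: str
) -> urllib.request.Request:
    """Build (but do not send) the event-webhook-urls request."""
    # Assumption: one URL is reused for every event type.
    payload = {event: webhook_url for event in EVENT_NAMES}
    return urllib.request.Request(
        f"https://app.browsertrix.com/api/orgs/{org_id}/event-webhook-urls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; a successful response is expected to contain `"updated": true`.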
25 changes: 25 additions & 0 deletions docs/webhook.md
@@ -6,10 +6,22 @@ Uploaded files are saved to a local directory with the filename set as the calcu

## Setup

### General

Make sure the following config values are set: `AA.*`, `webhook.Host`, `Dirs.Files`, and `Dirs.EncKeys`. `Dirs.Files` is the local directory where the uploaded files are saved as `${CID}`. `Dirs.EncKeys` is where attribute encryption keys are generated/read, with filenames saved in the format of `${CID}_${ATTRIBUTE_KEY}.key`.

The environment variable `JWT_SECRET` should be set to a 32-character secret, which is used for signing `HS256` authentication JWTs.
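One way to produce such a secret is sketched below. It assumes any 32 random characters are acceptable and uses hex digits as one possible choice; the function name `generate_jwt_secret` is hypothetical.

```python
import secrets


def generate_jwt_secret() -> str:
    """Return a random 32-character secret (16 random bytes as hex)."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    # Export the printed value as JWT_SECRET in the webhook's environment.
    print(generate_jwt_secret())
```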

### Browsertrix

To use the `/browsertrix` endpoint, set `Browsertrix.User` and `Browsertrix.Password` to your app.browsertrix.com username and password. They are used to fetch crawl information.

`Browsertrix.WebhookSecret` is a random string used for authentication; it is passed as the query string parameter `s` when setting up the Browsertrix webhook. For example, `Browsertrix.WebhookSecret = secret` means the webhook should be set up as `/browsertrix?s=secret`.

Webhook URLs are expected to be set through the app.browsertrix.com API: obtain a JWT using [login](https://app.browsertrix.com/api/redoc#tag/auth/operation/login_api_auth_jwt_login_post), then set the URLs using [update event webhook urls](https://app.browsertrix.com/api/redoc#tag/organizations/operation/update_event_webhook_urls_api_orgs__oid__event_webhook_urls_post). Please refer to the [Browsertrix webhook setup doc](./browsertrix.md) for details.
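As an illustration, the full webhook URL can be composed from the host and the secret as follows. The function name `browsertrix_webhook_url` and the host `https://webhook.example.org` are placeholders, not values from this project.

```python
from urllib.parse import urlencode


def browsertrix_webhook_url(base_url: str, webhook_secret: str) -> str:
    """Append /browsertrix and the `s` query parameter to the webhook host."""
    # urlencode percent-escapes any characters in the secret that are
    # not safe inside a query string.
    query = urlencode({"s": webhook_secret})
    return f"{base_url.rstrip('/')}/browsertrix?{query}"
```

With `Browsertrix.WebhookSecret = secret`, calling `browsertrix_webhook_url("https://webhook.example.org", "secret")` gives a URL ending in `/browsertrix?s=secret`.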

Crawl metadata is expected to be set in `(key):(value)` format, e.g. `project_id:test_project`. `project_id` must be set; otherwise the crawl events will not be processed.
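The expected format can be illustrated with a small parser. This is a sketch of the convention only, not the service's actual code: the function name `crawl_metadata` is hypothetical, and splitting on the first `:` and skipping entries without one are assumptions.

```python
from typing import Dict, List, Optional


def crawl_metadata(tags: List[str]) -> Optional[Dict[str, str]]:
    """Parse `key:value` entries; return None if `project_id` is missing."""
    metadata: Dict[str, str] = {}
    for tag in tags:
        # Assumption: split on the first colon only, so values may contain ':'.
        key, sep, value = tag.partition(":")
        if not sep:
            continue  # not in key:value form, ignore it
        metadata[key] = value
    if "project_id" not in metadata:
        return None  # such a crawl event would not be processed
    return metadata
```

For example, `crawl_metadata(["project_id:test_project"])` yields a dictionary with a single `project_id` entry, while a list without `project_id` yields `None`.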

## Authenticating with the Webhook

For webhook callers, ensure the config values `webhook.Host` and `webhook.Jwt` are set. `webhook.Jwt` should be a pre-shared `HS256` JWT signed by the webhook host (`JWT_SECRET`).
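For reference, an `HS256` JWT is the base64url-encoded header and claims joined by a dot, followed by an HMAC-SHA256 signature over that string. The standard-library sketch below shows the mechanics only; the claim set is hypothetical because the claims the webhook expects are not specified here, and `sign_hs256` and `verify_hs256` are illustrative names.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    """Return a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str) -> bool:
    """Check the signature only (no expiry or claim validation)."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return hmac.compare_digest(_b64url(expected), signature)


# Hypothetical claims; replace with whatever the webhook host expects.
example_token = sign_hs256({"sub": "example-caller"}, "0" * 32)
```

In practice a maintained JWT library is usually preferable to hand-rolled signing; the sketch is only meant to show what "signed by `JWT_SECRET`" involves.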
@@ -40,3 +52,16 @@ Generic endpoint for uploading and registering a file with attributes.
In the `metadata` part, there is a special key called `private`. Any key-value pairs under `private` will be stored in Authenticated Attributes as attributes like normal, but encrypted, with the encryption key stored at `Dirs.EncKeys` from the config.

The encryption key is stored with the name `${CID}_${ATTRIBUTE_KEY}.key`, but CLI tools like `attr` will automatically find and use it for you.

### POST /browsertrix

#### Body

- **Type:** `application/json`

- **Parts:**
Please refer to [Browsertrix crawlFinished event](https://app.browsertrix.com/api/redoc#operation/crawl_finishedcrawlFinished_post)

#### Description

For use with [Browsertrix cloud](https://app.browsertrix.com) webhook events. Please refer to the [Browsertrix webhook setup doc](./browsertrix.md) for setup details. The WACZ is downloaded from the crawl result and verified. Extra metadata is fetched from the crawl's tags in the format `(key):(value)`. Currently, the only supported keys are `project_id` and `asset_origin_id`.
