30 rebuild vocabs #20

Merged
merged 11 commits on Jan 24, 2024
5 changes: 5 additions & 0 deletions .dockerignore
@@ -0,0 +1,5 @@
node_modules/
dist/
data/
.git/
.env
5 changes: 4 additions & 1 deletion .env.example
@@ -3,4 +3,7 @@ SECRET=YOUR_GITHUB_WEBHOOK_SECRET
BUILD_URL=https://test.skohub.io/build
DOCKER_IMAGE=skohub/skohub-vocabs-docker
DOCKER_TAG=latest
PULL_IMAGE_SECRET=ThisIsATest
PULL_IMAGE_SECRET=ThisIsATest
GITHUB_TOKEN=
REBUILD_MAX_ATTEMPTS=30

3 changes: 1 addition & 2 deletions Dockerfile
@@ -25,8 +25,7 @@ RUN chown -R node:node /app
COPY --chown=node:node .env.example .env
COPY --chown=node:node . .

# don't run prepare step with husky
RUN npm pkg delete scripts.prepare
RUN npm config set update-notifier false

RUN npm i --only=production

22 changes: 22 additions & 0 deletions README.md
@@ -20,6 +20,8 @@ The `.env` file contains configuration details used by the static site generator
- `DOCKER_IMAGE`: The docker image which should be used to build the vocabulary, defaults to `skohub/skohub-vocabs-docker`
- `DOCKER_TAG`: The docker tag for the `DOCKER_IMAGE`, defaults to `latest`
- `PULL_IMAGE_SECRET`: The secret needed for the `/image` endpoint to trigger the pull of new images via webhook.
- `GITHUB_TOKEN`: A GitHub personal access token; provide one to avoid running into API rate limits.
- `REBUILD_MAX_ATTEMPTS`: Maximum number of attempts when checking the build status of a vocabulary. The status is checked every 2 seconds, so 30 attempts amount to 60 seconds of waiting per build (see the example below).
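
For a bigger instance, the relevant part of the `.env` could look like this (the values below are illustrative placeholders, not defaults):

```
# illustrative placeholders – adjust to your setup
GITHUB_TOKEN=<your personal access token>
# 300 attempts x 2 seconds = up to 10 minutes of waiting per vocabulary build
REBUILD_MAX_ATTEMPTS=300
```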

## How does it work?

@@ -63,6 +65,26 @@ In order to wire this up with GitHub, this has to be available to the public.

To restart and rebuild the service, e.g. after a `git pull` do `docker compose up --build --force-recreate`.

## Rebuilding vocabularies

**Notice:**

During a rebuild of all hosted vocabularies, the service may make a lot of requests to the GitHub API.
Depending on the number of hosted vocabularies, you might run into API rate limits.
To avoid this, you can provide a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) in the `.env` file (see above).
You might also want to increase `REBUILD_MAX_ATTEMPTS` (to something like 300 or 3000), because rebuilds can take some time, depending on your machine, network, number of vocabularies, etc.
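
If you are unsure whether a full rebuild still fits into your remaining quota, you can check GitHub's `rate_limit` endpoint first. The following is a minimal sketch (not part of this service) that assumes Node 18+ (for the global `fetch`) and the `GITHUB_TOKEN` from your `.env`:

```js
// Print the remaining GitHub core API quota before starting a full rebuild.
require("dotenv").config()

const headers = process.env.GITHUB_TOKEN
  ? { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` }
  : {}

fetch("https://api.github.com/rate_limit", { headers })
  .then((response) => response.json())
  .then((data) => {
    const { limit, remaining, reset } = data.resources.core
    console.log(`Core API quota: ${remaining}/${limit}, resets at ${new Date(reset * 1000).toISOString()}`)
  })
  .catch((error) => console.error("Could not fetch rate limit:", error))
```

Unauthenticated requests are limited to 60 per hour, while requests authenticated with a personal access token are allowed 5,000 per hour.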

To rebuild all vocabularies:

1. Make a backup of the dist folder: `cp -R ./dist ./dist-backup`
1. Make sure the Docker image has been built: `docker build -t skohub-webhook .`
1. Mount the `.env` file and the dist folder into the webhook container and rebuild the vocabs: `docker run -v ./.env:/app/.env -v ./dist:/app/dist skohub-webhook:latest "npm run rebuild-vocabs"`


To rebuild only a specific vocabulary, provide the GitHub repository and the branch to build:

`docker run -v ./.env:/app/.env -v ./dist:/app/dist skohub-webhook:latest npm run rebuild-vocabs -- test/test-vocabs main`

## Connecting to our webhook server

Feel free to clone https://github.com/literarymachine/skos.git to poke around. Go to https://github.com/YOUR_GITHUB_USER/skos/settings/hooks/new to set up the web hook (get in touch to receive the secret). Edit https://github.com/YOUR_GITHUB_USER/skos/edit/master/hochschulfaecher.ttl and commit the changes to master. This will trigger a build and expose it at https://test.skohub.io/YOUR_GITHUB_USER/skos/w3id.org/class/hochschulfaecher/scheme.
3 changes: 2 additions & 1 deletion package.json
@@ -9,7 +9,8 @@
"test:unit": "jest --forceExit unit",
"test:int": "jest --forceExit int",
"test:docker": "jest --forceExit docker",
"lint:js": "eslint src --ext .jsx,.js --quiet"
"lint:js": "eslint src --ext .jsx,.js --quiet",
"rebuild-vocabs": "node src/rebuildVocabs.js"
},
"repository": {
"type": "git",
28 changes: 26 additions & 2 deletions src/common.js
@@ -1,6 +1,16 @@
const crypto = require("crypto")
const fetch = require("node-fetch")

require("dotenv").config()

const { GITHUB_TOKEN } = process.env
const gitHubApiHeaders = GITHUB_TOKEN
  ? {
      "Authorization": `Bearer ${GITHUB_TOKEN}`,
      "X-GitHub-Api-Version": "2022-11-28"
    }
  : {}

const getHookGitHub = (headers, payload, SECRET) => {
const obj = {
type: "github",
@@ -80,6 +90,17 @@ const isValid = (hook, event) => {
return false
}

/**
 * @param {Object} payload
 * @param {string} SECRET
 * @returns {string} signed payload
 */
const securePayload = (payload, SECRET) => {
  const hmac = crypto.createHmac("sha1", SECRET)
  const digest = "sha1=" + hmac.update(JSON.stringify(payload)).digest("hex")
  return digest
}
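// Illustrative usage (sketch; repository and ref are placeholders): sign a
// push-style payload before sending it to the /build endpoint, analogous to
// GitHub's "x-hub-signature" header:
//   const signature = securePayload(
//     { repository: { full_name: "org/repo" }, ref: "refs/heads/main" },
//     SECRET
//   )
//   // => "sha1=<hex digest>"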

const isSecured = (signature, payload, SECRET) => {
// Is not secured if all the parameters are not present
if (!signature || !payload || !SECRET) {
@@ -129,7 +150,8 @@ const getRepositoryFiles = async ({ type, repository, ref, filesURL }) => {
async function fetchTTLFilesFromGitHubRepository(repository, path = '', ref = '') {
const response = await fetch(`https://api.github.com/repos/${repository}/contents/${path}?` + new URLSearchParams({
ref: ref
}));
}),
{ headers: gitHubApiHeaders });
const contents = await response.json();
let ttlFiles = formatGitHubFiles(contents)
const subDirectories = contents.filter(file => file.type === 'dir');
@@ -215,5 +237,7 @@ module.exports = {
isSecured,
getRepositoryFiles,
parseHook,
checkStdOutForError
checkStdOutForError,
securePayload,
gitHubApiHeaders
}
207 changes: 207 additions & 0 deletions src/rebuildVocabs.js
@@ -0,0 +1,207 @@
const fs = require("fs-extra")
const path = require('path');
const { securePayload, gitHubApiHeaders } = require("./common.js")
require("dotenv").config()

const { SECRET, BUILD_URL, REBUILD_MAX_ATTEMPTS } = process.env
const args = process.argv.slice(2);

function protocolizeUrl(url) {
  return url.startsWith("http")
    ? url
    : "https://" + url
}

/**
 * @typedef {Object} BuildInfo
 * @property {string} id
 * @property {Object} body
 * @property {Date} date
 * @property {string} ref
 * @property {string} repository
 * @property {string} status
 */
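// Example of such a build info file (illustrative values) as the webhook service
// writes it to dist/build/<id>.json and as this script reads it back:
//   {
//     "id": "some-build-id",
//     "repository": "org/repo",
//     "ref": "refs/heads/main",
//     "date": "2024-01-24T12:00:00.000Z",
//     "status": "processing",   // later updated to "complete" or "error"
//     "body": { ... }
//   }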

// get all json data from dist/build folder
const readBuildDir = async () => {
  const directoryPath = './dist/build';
  try {
    const files = await fs.readdir(directoryPath);
    const jsonFiles = files.filter(file => path.extname(file) === '.json');

    const jsonObjects = await Promise.all(jsonFiles.map(async file => {
      const filePath = path.join(directoryPath, file);
      const data = await fs.readFile(filePath, 'utf8');
      return JSON.parse(data);
    }));

    return jsonObjects;
  } catch (err) {
    console.error('Error:', err);
    throw err;
  }
}

/**
 * Create an array only containing the most recent webhook requests
 * @param {BuildInfo[]} buildInfo
 * @returns {BuildInfo[]} sorted build information
 */
const sortBuildInfo = (buildInfo) => {
  const reposToBuild = buildInfo.filter(b => {
    if (buildInfo.some(e => e.ref === b.ref && e.repository === b.repository && new Date(e.date).getTime() > new Date(b.date).getTime())) {
      return false
    } else if (buildInfo.some(e => e.ref === b.ref && e.repository === b.repository && new Date(e.date).getTime() <= new Date(b.date).getTime())) {
      return true
    }
  })
  return reposToBuild
}

/**
 * @param {string} repository
 * @param {string} ref
 */
async function checkIfBranchExists(repository, ref) {
  const branchName = ref.split("/").slice(-1)[0]
  const result = await fetch(`https://api.github.com/repos/${repository}/branches`, {
    headers: gitHubApiHeaders
  })
    .then(response => {
      if (!response.ok) {
        throw new Error('Network response was not ok');
      }
      return response.json();
    })
    .then(data => {
      const branchExists = data.some(branch => branch.name === branchName);
      if (branchExists) {
        console.log(`${repository}/${branchName} exists!`);
        return true
      } else {
        console.log(`Branch "${branchName}" does not exist.`);
        return false
      }
    })
    .catch(error => {
      console.error(`Fetch error for repo ${repository} and branch ${branchName}, ${error}`);
      return false
    });
  return result
}

/**
 * send fetch request to webhook
 * @param {BuildInfo} buildInfo
 * @returns {{id: string|null, repository: string, ref: string}} build id plus repository and ref
 */
const sendBuildRequest = async (buildInfo) => {
  const payload = {
    repository: {
      full_name: buildInfo.repository
    },
    ref: buildInfo.ref
  }
  const signature = securePayload(payload, SECRET)
  const headers = {
    "x-hub-signature": signature,
    "Accept": "application/json",
    "Content-Type": "application/json",
    "x-github-event": "push",
  }
  try {
    const response = await fetch(protocolizeUrl(BUILD_URL), {
      method: "POST",
      headers,
      body: JSON.stringify(payload)
    })
    if (!response.ok) {
      throw new Error("Network response was not ok!")
    }
    const respBody = await response.text()
    // get url from response
    const responseUrl = protocolizeUrl(respBody.substring(17))
    console.log("Build Urls:", responseUrl)
    const url = new URL(responseUrl)
    const id = url.searchParams.get("id")

    return {
      id,
      repository: buildInfo.repository,
      ref: buildInfo.ref
    }
  } catch (error) {
    console.error("Error sending request", error)
    return {
      id: null,
      repository: buildInfo.repository,
      ref: buildInfo.ref
    }
  }
}

/**
 * @param {{
 *   id: string
 *   repository: string
 *   ref: string
 * }} buildInfo
 */
const checkBuildStatus = (buildInfo) => {
  const maxAttempts = REBUILD_MAX_ATTEMPTS ? Number(REBUILD_MAX_ATTEMPTS) : 30
  let attempts = 0
  let json = {}
  const getData = async () => {
    try {
      const response = fs.readFileSync(`./dist/build/${buildInfo.id}.json`)
      /** @type {BuildInfo} */
      json = JSON.parse(response)
      if (json.status === "complete" || json.status === "error") {
        console.log(`${json.repository}, ${json.ref}: Finish with status: ${json.status} (ID: ${buildInfo.id})`)
        return
      } else {
        throw new Error("Not completed")
      }
    } catch (error) {
      if (attempts > maxAttempts || !buildInfo.id) {
        console.log(`${buildInfo.repository}, ${buildInfo.ref}: did not finish after ${attempts} attempts. Aborting. Error: ${error} (ID: ${buildInfo.id})`)
        return
      }
      setTimeout(() => {
        if (json?.status === "processing") {
          // we just keep trying to get the data
          getData()
        } else {
          // increase attempts as it seems to be somewhere in between defined statuses
          attempts++;
          getData()
        }
      }, 2000)
    }
  }
  getData()
}

const main = async () => {
  if (args.length) {
    // rebuild a single vocabulary: args are <repository> <branch>
    const buildInfo = {
      repository: args[0],
      ref: "refs/heads/" + args[1]
    }
    const branchesExisting = await Promise.all([buildInfo].map(b => checkIfBranchExists(b.repository, b.ref)))
    const cleanedBuildInfo = [buildInfo].filter((_, i) => branchesExisting[i])
    const newBuildInfo = await Promise.all(cleanedBuildInfo.map((b) => sendBuildRequest(b)))
    newBuildInfo.forEach(info => checkBuildStatus(info))
  } else {
    // rebuild everything found in dist/build
    const buildInfo = await readBuildDir()
    const sortedBuildInfo = sortBuildInfo(buildInfo)
    const branchesExisting = await Promise.all(sortedBuildInfo.map(b => checkIfBranchExists(b.repository, b.ref)))
    const cleanedBuildInfo = sortedBuildInfo.filter((_, i) => branchesExisting[i])
    const newBuildInfo = await Promise.all(cleanedBuildInfo.map((b) => sendBuildRequest(b)))
    newBuildInfo.forEach(info => checkBuildStatus(info))
  }
}

main()