Wikibase Suite Wikidata Query Service (wdqs) image

The Wikidata Query Service (WDQS) lets tools query Wikibase data through a SPARQL API. It is based on Blazegraph.

💡 This image is part of Wikibase Suite (WBS). WBS Deploy provides everything you need to self-host a Wikibase instance out of the box.

Requirements

In order to run WDQS, you need:

  • at least 2 GB RAM to start WDQS
  • MediaWiki/Wikibase instance
  • WDQS as server
  • WDQS as updater
  • WDQS Proxy for public facing setups
  • Configuration via environment variables

MediaWiki/Wikibase instance

We suggest using the WBS Wikibase image because this is the image we run all our tests against. Follow that image's setup instructions to get it up and running.

WDQS as server

You'll need one instance of the image to run the WDQS daemon itself, started using /runBlazegraph.sh.

You can send GET requests with your SPARQL query to the WDQS endpoint, using a URL of the following form: http://wdqs:9999/bigdata/namespace/wdq/sparql?query={SPARQL}
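
As a minimal sketch of such a request, the shell function below wraps curl so that the query is URL-encoded before it is appended to the endpoint URL. The function name wdqs_query and the WDQS_ENDPOINT variable are hypothetical and not part of this image. The default endpoint is the one shown above, and the hostname wdqs typically only resolves inside the Docker network.

```shell
# Hypothetical helper, not shipped with the image: send a SPARQL query
# to the WDQS endpoint as a GET request.
wdqs_query() {
  # WDQS_ENDPOINT is an assumed override; the default is the URL from this README.
  endpoint="${WDQS_ENDPOINT:-http://wdqs:9999/bigdata/namespace/wdq/sparql}"
  # --get turns the --data-urlencode pair into a URL-encoded query string.
  curl --silent --fail --get \
    --header 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$1" \
    "$endpoint"
}
```

For example, `wdqs_query 'SELECT * WHERE { ?s ?p ?o } LIMIT 5'` should return up to five triples as JSON once data has been loaded.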

WDQS as updater

You'll need a second instance of the image to run the updater, started using /runUpdate.sh. The updater polls Wikibase for changes.

WDQS Proxy for public facing setups

By default, WDQS exposes some endpoints and methods that reveal internal details or functionality that might allow for abuse of the system. Wikibase Suite offers WDQS-proxy, which filters out long-running or unwanted requests.

When running WDQS in a setup without WDQS-proxy, please consider disabling these endpoints in some other way.

Environment variables

Variables in bold are required.

| Variable | Default | Description |
| --- | --- | --- |
| WIKIBASE_HOST | "wikibase" | Hostname to reach the Wikibase service, e.g. the docker network internal hostname |
| **WIKIBASE_CONCEPT_URI** | "" | Concept URI, required for /runUpdate.sh only, the identifying prefix to entities in this knowledge graph, e.g. the public URL of the Wikibase host. |
| WDQS_HOST | "wdqs" | WDQS hostname (this service) |
| WDQS_PORT | "9999" | WDQS port (this service) |
| WIKIBASE_SCHEME | "http" | URL scheme used to reach the Wikibase service, e.g. http to reach a local wikibase on the same docker network |
| WDQS_ENTITY_NAMESPACES | "120,122" | Wikibase namespaces to load data from |
| WIKIBASE_MAX_DAYS_BACK | "90" | Maximum number of days updater can reach back in time from now |
| MEMORY | "" | Memory limit for Blazegraph |
| HEAP_SIZE | "1g" | Heap size for Blazegraph |
| BLAZEGRAPH_EXTRA_OPTS | "" | Extra options to be passed to Blazegraph, they must be prefixed with -D. Example: -Dhttps.proxyHost=http://my.proxy.com -Dhttps.proxyPort=3128. See the WDQS User Manual. |
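
To illustrate how these variables combine, the sketch below prints the URLs the updater works with, falling back to the defaults listed here. The function name print_updater_urls is hypothetical and not part of the image; the URL shapes are taken from the workaround command in the "Known issues" section.

```shell
# Hypothetical preflight check, not part of the image.
print_updater_urls() {
  # WIKIBASE_CONCEPT_URI has no default and is required for /runUpdate.sh.
  if [ -z "${WIKIBASE_CONCEPT_URI:-}" ]; then
    echo "WIKIBASE_CONCEPT_URI must be set for the updater" >&2
    return 1
  fi
  # The remaining variables fall back to their documented defaults.
  echo "wikibaseUrl=${WIKIBASE_SCHEME:-http}://${WIKIBASE_HOST:-wikibase}"
  echo "wdqsUrl=http://${WDQS_HOST:-wdqs}:${WDQS_PORT:-9999}"
  echo "conceptUri=${WIKIBASE_CONCEPT_URI}"
}
```

Running it with only WIKIBASE_CONCEPT_URI set shows which hosts and ports the defaults resolve to, which can help when the updater cannot reach either service.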

Example

Here's an example of how to run this image together with the WBS Wikibase image using Docker Compose.

services:
  wikibase:
    image: wikibase/wikibase
    depends_on:
      mysql:
        condition: service_healthy
    restart: unless-stopped
    ports:
      - 8880:80
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.wikibase.rule=Host(`wikibase.example`)"
      - "traefik.http.routers.wikibase.entrypoints=websecure"
      - "traefik.http.routers.wikibase.tls.certresolver=letsencrypt"
    volumes:
      - ./config:/config
      - wikibase-image-data:/var/www/html/images
    environment:
      MW_ADMIN_NAME: "admin"
      MW_ADMIN_PASS: "change-this-password"
      MW_ADMIN_EMAIL: "admin@wikibase.example"
      MW_WG_SERVER: https://wikibase.example
      DB_SERVER: mysql:3306
      DB_NAME: "my_wiki"
      DB_USER: "mariadb-user"
      DB_PASS: "change-this-password"
    healthcheck:
      test: curl --silent --fail localhost/wiki/Main_Page
      interval: 10s
      start_period: 5m

  wikibase-jobrunner:
    image: wikibase/wikibase
    command: /jobrunner-entrypoint.sh
    depends_on:
      wikibase:
        condition: service_healthy
    restart: always
    volumes_from:
      - wikibase

  mysql:
    image: mariadb:10.11
    restart: unless-stopped
    volumes:
      - mysql-data:/var/lib/mysql
    environment:
      MYSQL_DATABASE: "my_wiki"
      MYSQL_USER: "mariadb-user"
      MYSQL_PASSWORD: "change-this-password"
      MYSQL_RANDOM_ROOT_PASSWORD: "yes"
    healthcheck:
      test: healthcheck.sh --connect --innodb_initialized
      start_period: 1m
      interval: 20s
      timeout: 5s

  wdqs:
    image: wikibase/wdqs
    command: /runBlazegraph.sh
    depends_on:
      wikibase:
        condition: service_healthy
    restart: unless-stopped
    ulimits:
      nofile:
        soft: 32768
        hard: 32768
    volumes:
      - wdqs-data:/wdqs/data
    healthcheck:
      test: curl --silent --fail localhost:9999/bigdata/namespace/wdq/sparql
      interval: 10s
      start_period: 2m

  wdqs-updater:
    image: wikibase/wdqs
    command: /runUpdate.sh
    depends_on:
      wdqs:
        condition: service_healthy
    restart: unless-stopped
    ulimits:
      nofile:
        soft: 32768
        hard: 32768
    environment:
      WIKIBASE_CONCEPT_URI: https://wikibase.example

  wdqs-proxy:
    image: wikibase/wdqs-proxy
    depends_on:
      wdqs:
        condition: service_healthy
    restart: unless-stopped

volumes:
  wikibase-image-data:
  mysql-data:
  wdqs-data:

Releases

Official releases of this image can be found on Docker Hub at wikibase/wdqs.

Tags and Versioning

This image uses semantic versioning.

We provide several tags that relate to the versioning semantics.

| Tag | Example | Description |
| --- | --- | --- |
| MAJOR | 3 | Tags the latest image with this major version. Gets overwritten whenever a new version is released with this major version. This will include new builds triggered by base image changes, patch version updates and minor version updates. |
| MAJOR.MINOR | 3.1 | Tags the latest image with this major and minor version. Gets overwritten whenever a new version is released with this major and minor version. This will include new builds triggered by base image changes and patch version updates. |
| MAJOR.MINOR.PATCH | 3.1.7 | Tags the latest image with this major, minor and patch version. Gets overwritten whenever a new version is released with this major, minor and patch version. This only happens for new builds triggered by base image changes. |
| MAJOR.MINOR.PATCH_wdqsWDQS-VERSION | 3.1.7_wdqs0.1.317 | Same as above, but also mentioning the current WDQS version. |
| MAJOR.MINOR.PATCH_buildBUILD-TIMESTAMP | 3.1.7_build20240530103941 | Tag that never gets overwritten. Every image will have this tag with a unique build timestamp. Can be used to reference images explicitly for reproducibility. |
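
The relationship between these tags can be sketched by deriving the floating tags from a full build tag. The function below is illustrative only and uses plain shell parameter expansion; its name is hypothetical.

```shell
# Hypothetical helper: list the floating tags that belong to the same
# release line as a given MAJOR.MINOR.PATCH_buildBUILD-TIMESTAMP tag.
floating_tags() {
  version="${1%%_*}"            # drop the _build... suffix
  major="${version%%.*}"        # text before the first dot
  minor_patch="${version#*.}"   # text after the first dot
  minor="${minor_patch%%.*}"    # text before the next dot
  echo "$major"
  echo "$major.$minor"
  echo "$version"
}
```

Called with 3.1.7_build20240530103941, it prints 3, 3.1 and 3.1.7, one per line. Pinning the full build tag gives reproducible deployments, while the shorter tags follow new builds automatically.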

Upgrading

When upgrading between WDQS versions, the data stored in /wdqs/data may not be compatible with the newer version. If no data appears to have been loaded into the Query Service when you test the new image, you'll need to reload the data.

If all changes still appear in RecentChanges, removing /wdqs/data and restarting the service should reload all data.

However, older entries are periodically purged from RecentChanges, as determined by the MediaWiki configuration setting $wgRCMaxAge.

If you can't use RecentChanges, you'll need to reload from an RDF dump.
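
Whether the RecentChanges route is still open comes down to simple arithmetic: the changes to catch up on must be younger than both $wgRCMaxAge (which MediaWiki expresses in seconds) and WIKIBASE_MAX_DAYS_BACK. A rough sketch, with a hypothetical function name and the number of days since the last successful sync as an input you have to supply yourself:

```shell
# Hypothetical check: can the updater catch up from RecentChanges?
# Usage: can_sync_from_recent_changes DAYS_BEHIND RC_MAX_AGE_SECONDS
can_sync_from_recent_changes() {
  days_behind="$1"
  rc_max_age_days=$(( $2 / 86400 ))               # $wgRCMaxAge is in seconds
  max_days_back="${WIKIBASE_MAX_DAYS_BACK:-90}"   # documented default is 90
  [ "$days_behind" -le "$rc_max_age_days" ] && [ "$days_behind" -le "$max_days_back" ]
}
```

With a $wgRCMaxAge of 7776000 seconds (90 days) and the default WIKIBASE_MAX_DAYS_BACK, `can_sync_from_recent_changes 30 7776000` succeeds, while 120 days fails and points to a reload from a dump.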

Internal filesystem layout

Hooking into the internal filesystem can extend the functionality of this image.

| File | Description |
| --- | --- |
| /wdqs/allowlist.txt | SPARQL endpoints allowed for federation |
| /wdqs/RWStore.properties | Properties for the service |
| /templates/mwservices.json | Template for MediaWiki services (populated and placed into /wdqs/mwservices.json at runtime) |
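
As an example of hooking into this layout, the sketch below writes a custom allowlist file that could then be mounted over /wdqs/allowlist.txt. It assumes the file format is one SPARQL endpoint URL per line; the function name and the local file name are hypothetical.

```shell
# Hypothetical helper: write one federation endpoint per line
# (assumed file format) to the given path.
write_allowlist() {
  target="$1"
  shift
  # Each remaining argument is an endpoint URL and becomes one line.
  printf '%s\n' "$@" > "$target"
}
```

After `write_allowlist ./allowlist.txt https://query.wikidata.org/sparql`, the file could be mounted into the wdqs service with a volume entry such as ./allowlist.txt:/wdqs/allowlist.txt.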

Known issues

Updater keeps restarting

In some situations the WDQS Updater enters a restart loop, e.g., when it is restarted before any entities have been synced. You will encounter this issue when you restart a freshly installed instance.

A workaround is to start the updater once with manual --init --start parameters. This forces it to sync data from MediaWiki for the current day.

With the Docker Compose example above, you can use the following commands. They also fix the problem in a Wikibase Suite Deploy instance.

# Stop the stock updater
docker compose stop wdqs-updater

# Start an updater with force sync settings. The single quotes ensure the
# variables are expanded inside the container, where they are set.
docker compose run --rm wdqs-updater bash -c '/wdqs/runUpdate.sh -h http://"$WDQS_HOST":"$WDQS_PORT" -- --wikibaseUrl "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --conceptUri "$WIKIBASE_CONCEPT_URI" --entityNamespaces "$WDQS_ENTITY_NAMESPACES" --init --start $(date +%Y%m%d000000)'

# As soon as you see "Sleeping for 10 secs" in the logs, press CTRL-C to stop it again

# Start the stock updater again
docker compose start wdqs-updater

As soon as the updater has synced the first entity from MediaWiki, the issue should disappear.
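
The --start value in the command above is a timestamp built with date: today's date followed by six zeros, presumably standing for midnight. A minimal sketch of that step on its own, with a hypothetical function name:

```shell
# Hypothetical helper: timestamp for the start of the current day,
# in the form passed to --start above (e.g. 20240530000000).
updater_start_stamp() {
  date +%Y%m%d000000
}
```

Printing the value before running the workaround makes it easy to confirm which day the updater will start syncing from.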

Source

This image is built from this Dockerfile.

Authors & Contact

This image is maintained by the Wikibase Suite Team at Wikimedia Germany (WMDE).

If you have questions not listed above or need help, use this bug report form to start a conversation with the engineering team.