diff --git a/README.md b/README.md
index a8f31b5..8648812 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 ## Usage scenario
 1. Select datasets from Gen3 data browser (starting at portal or at Galaxy data source that redirects to browser)
-2. Click on “Analyze data in Galaxy” button in data browser to launch/navigate to Galaxy
+2. Click on “Analyze data in Galaxy” button in data browser to launch/navigate to Galaxy
 3. Galaxy starts and datasets + metadata from portal are added to history
 4. User can run tools/workflows on data in place and outputs are placed in object storage bucket
 5. Concrete example: run pathway activity tools on TCGA RNA-seq breast cancer cohort (~1000 datasets)
@@ -14,3 +14,8 @@
 ## Assumptions/Goals
 * Single-tenant Galaxy instance
 * Integration platform preferences: (1) Gen3; (2) ISB or FireCloud; (3) Cavatica; ideal: Gen3 running on cancer cloud(s) and orchestrating Galaxy provisioning/deployment
+
+
+## Examples
+* [gen3 aware workspace](docs/fence/README.md)
+* [gcsfuse docker](docs/gcsfuse/README.md)
diff --git a/docs/fence/README.md b/docs/fence/README.md
new file mode 100644
index 0000000..3ebdfde
--- /dev/null
+++ b/docs/fence/README.md
@@ -0,0 +1,112 @@
+# Fence client
+
+## overview
+
+Practical example to configure your server side application to retrieve the authenticated user's authorization record.
+
+![image](https://user-images.githubusercontent.com/47808/54386078-6f00d200-4655-11e9-841b-62bcf804f7b9.png)
+
+
+This document will illustrate:
+a) how to configure revproxy (nginx) to recognize your application
+b) how to call fence from within your application to retrieve authorization
+
+It will not cover:
+* How fence authenticates the user [see here for more](https://github.com/uc-cdis/fence#oidc--oauth2)
+* How to configure authorization [see here for more](https://github.com/uc-cdis/compose-services#setting-up-users)
+
+
+## setup
+
+We assume you have installed either gen3's [docker-compose](https://github.com/uc-cdis/compose-services) or [cloud](https://github.com/uc-cdis/cloud-automation) services.
+
+## your service
+
+Add a stanza to your [compose-service](
+https://github.com/uc-cdis/compose-services/blob/master/docker-compose.yml).
+
+```
+mock-workspace-service:
+  build:
+    context: ./mock-workspace
+  container_name: mock-workspace-service
+  networks:
+    - devnet
+  ports:
+    - "5000:5000"
+  depends_on:
+    - fence-service
+```
+
+Then add your service to the `depends_on` list of `revproxy-service`:
+
+```
+depends_on:
+  - indexd-service
+  - peregrine-service
+  - sheepdog-service
+  - fence-service
+  - portal-service
+  - pidgin-service
+  # my service
+  - mock-workspace-service
+```
+
+
+## revproxy
+
+Next we need to tell `revproxy` about your service and what path it will respond to.
+
+Add an nginx stanza to your [revproxy-service](
+https://github.com/uc-cdis/compose-services/blob/master/docker-compose.yml).
+See [this](https://github.com/uc-cdis/cloud-automation/blob/4241f40c6e10fd7096085a9456217b7b5e7cbb24/kube/services/revproxy/gen3.nginx.conf/fence-service.conf#L5) for inspiration.
+Note: the `/lw-workspace` path is maintained in `portal-service` and is not currently configurable.
+
+```
+#
+location /lw-workspace/ {
+    proxy_pass http://mock-workspace-service:5000/;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+}
+```
+
+
+## query fence
+
+Using a simple Flask application, we respond to the request on `/lw-workspace` and simply return all relevant information to the caller.
+
+The fence `/user` endpoint is documented [here](https://github.com/uc-cdis/fence/blob/master/openapis/swagger.yaml#L215) and the returned authorization payload is described [here](https://github.com/uc-cdis/fence/blob/master/openapis/swagger.yaml#L1805).
+
+
+
+```
+"""A mock API."""
+from flask import Flask
+from flask import jsonify
+from flask import request
+import requests
+
+app = Flask(__name__)
+
+@app.route('/', defaults={'path': ''})
+@app.route('/<path:path>')
+def index(path):
+    """A simple way to create a Catch-All function which serves every URL including / is to chain two route filters. One for the root path '/' and one including a path placeholder for the rest.
+    We can't just use one route filter including a path placeholder because each placeholder must at least catch one character."""
+
+    # call fence and get information on the user
+    fence_user = requests.get('http://fence-service/user', cookies=request.cookies)
+
+    # just dump out all the data
+    return jsonify(
+        {'path': path,
+         'headers': dict(**request.headers),
+         'cookies': dict(**request.cookies),
+         'fence': fence_user.json()
+        })
+
+if __name__ == '__main__':
+    app.run(debug=True)
+```
diff --git a/docs/fence/mock-workspace/Dockerfile b/docs/fence/mock-workspace/Dockerfile
new file mode 100644
index 0000000..90f9d36
--- /dev/null
+++ b/docs/fence/mock-workspace/Dockerfile
@@ -0,0 +1,19 @@
+# debian:stretch
+FROM python:3.7-stretch
+
+SHELL ["/bin/bash", "-c"]
+
+WORKDIR /home/app
+COPY requirements.txt requirements.txt
+RUN python -m venv venv
+RUN venv/bin/pip install -r requirements.txt
+RUN venv/bin/pip install gunicorn
+
+COPY . .
+COPY boot.sh ./
+RUN chmod +x boot.sh
+
+ENV FLASK_APP app.py
+
+EXPOSE 5000
+ENTRYPOINT ["/bin/bash", "-c" ,"./boot.sh"]
diff --git a/docs/fence/mock-workspace/boot.sh b/docs/fence/mock-workspace/boot.sh
new file mode 100644
index 0000000..5c0e226
--- /dev/null
+++ b/docs/fence/mock-workspace/boot.sh
@@ -0,0 +1,3 @@
+# this script is used to boot a Docker container
+source venv/bin/activate
+exec gunicorn -b 0.0.0.0:5000 --access-logfile - --error-logfile - echo.app:app
diff --git a/docs/fence/mock-workspace/echo/__init__.py b/docs/fence/mock-workspace/echo/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/docs/fence/mock-workspace/echo/app.py b/docs/fence/mock-workspace/echo/app.py
new file mode 100644
index 0000000..bcb1b4b
--- /dev/null
+++ b/docs/fence/mock-workspace/echo/app.py
@@ -0,0 +1,27 @@
+"""A mock API."""
+from flask import Flask
+from flask import jsonify
+from flask import request
+import requests
+
+app = Flask(__name__)
+
+@app.route('/', defaults={'path': ''})
+@app.route('/<path:path>')
+def index(path):
+    """A simple way to create a Catch-All function which serves every URL including / is to chain two route filters. One for the root path '/' and one including a path placeholder for the rest.
+    We can't just use one route filter including a path placeholder because each placeholder must at least catch one character."""
+
+    # call fence and get information on the user
+    fence_user = requests.get('http://fence-service/user', cookies=request.cookies)
+
+    # just dump out all the data
+    return jsonify(
+        {'path': path,
+         'headers': dict(**request.headers),
+         'cookies': dict(**request.cookies),
+         'fence': fence_user.json()
+        })
+
+if __name__ == '__main__':
+    app.run(debug=True)
diff --git a/docs/fence/mock-workspace/requirements.txt b/docs/fence/mock-workspace/requirements.txt
new file mode 100644
index 0000000..e635204
--- /dev/null
+++ b/docs/fence/mock-workspace/requirements.txt
@@ -0,0 +1,2 @@
+Flask
+requests
diff --git a/docs/gcsfuse/README.md b/docs/gcsfuse/README.md
new file mode 100644
index 0000000..fea6fa1
--- /dev/null
+++ b/docs/gcsfuse/README.md
@@ -0,0 +1,94 @@
+# gcsfuse docker container
+
+## overview
+
+Practical example to configure a docker container to mount a Google Cloud Storage bucket into the filesystem.
+
+![image](https://user-images.githubusercontent.com/47808/54389220-d4a48c80-465c-11e9-8596-9bb5400e10c6.png)
+
+
+This document will illustrate:
+
+* How to configure gcsfuse within your docker container
+
+It will not cover:
+
+* How to create GCE VMs or buckets
+
+## setup
+
+We assume you have created and logged into your GCE VM and have access to a bucket.
+`docker` should already be installed; if not, there are many blog posts and documentation pages that explain how to install it.
+
+
+## your image
+
+This example Dockerfile:
+* installs gcsfuse
+* installs google's cloud utilities
+* specifies that the user's `service_account.json` credentials should be mounted in the `/config` volume.
+* launches a shell script to authenticate and start your service
+
+`Dockerfile`
+```
+FROM python:3.7.2
+
+# creates a CLI environment:
+# * python 3.7.2 environment
+# * mounts gs://data.bmeg.io on /data.bmeg.io
+# Uses service_account_email argument and config/service_account.json
+
+ARG service_account_email
+
+# install gcsfuse
+RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-jessie main" | tee /etc/apt/sources.list.d/gcsfuse.list; \
+  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
+  apt-get update ; apt-get install -y apt-utils kmod && apt-get install -y gcsfuse
+
+# install google utilities, ensure they are on path
+RUN curl https://sdk.cloud.google.com | bash
+ENV PATH=$PATH:/root/google-cloud-sdk/bin
+RUN gcloud config set disable_usage_reporting true
+
+# create a volume, this image should be started with
+VOLUME ["/config"]
+
+# mount and sleep forever
+COPY docker-start.sh /docker-start.sh
+ENTRYPOINT ["/docker-start.sh"]
+```
+
+This example entrypoint authenticates to Google and mounts the bucket.
+
+You can specify the following environment variables:
+
+* PROJECT=[the GCE project that contains the service account]()
+* FUSE_PATH=[the directory to mount the bucket into]()
+* BUCKET_NAME=[the bucket name]()
+
+Note: the `--implicit-dirs` flag is necessary to parse object prefixes into directory names.
+
+`/docker-start.sh`
+```
+#!/usr/bin/env bash
+
+
+# authenticate
+gcloud auth activate-service-account \
+  $service_account_email \
+  --key-file=/config/service_account.json --project=$PROJECT
+
+# create data dir
+mkdir -p $FUSE_PATH
+
+# mount the bucket.
+gcsfuse --implicit-dirs \
+  --key-file=/config/service_account.json \
+  $BUCKET_NAME $FUSE_PATH
+
+echo $FUSE_PATH mounted
+
+# launch your service, for this example sleep forever
+echo sleep infinity...
+sleep infinity
+```
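
The start-up sequence in `docker-start.sh` depends on several variables being present when the container runs. The Python sketch below is only an illustration and is not part of this change. It rebuilds the same three commands (authenticate, create the mount point, mount the bucket) as argument lists and fails early if a variable is missing. The helper name `build_commands` and the example values are hypothetical. It also assumes `service_account_email` is supplied in the runtime environment, since a Dockerfile `ARG` on its own is only visible at build time.

```python
import shlex

# Variables docker-start.sh reads; all are assumed to be set at container run time.
REQUIRED_VARS = ("service_account_email", "PROJECT", "FUSE_PATH", "BUCKET_NAME")
KEY_FILE = "/config/service_account.json"


def build_commands(env):
    """Return, in order, the argv lists that docker-start.sh would run.

    Raises ValueError naming every required variable that is unset or empty.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return [
        # authenticate with the mounted service account key
        ["gcloud", "auth", "activate-service-account", env["service_account_email"],
         "--key-file=" + KEY_FILE, "--project=" + env["PROJECT"]],
        # create the mount point
        ["mkdir", "-p", env["FUSE_PATH"]],
        # mount the bucket, treating object prefixes as directories
        ["gcsfuse", "--implicit-dirs", "--key-file=" + KEY_FILE,
         env["BUCKET_NAME"], env["FUSE_PATH"]],
    ]


# Hypothetical values, for illustration only.
example_env = {
    "service_account_email": "example@example-project.iam.gserviceaccount.com",
    "PROJECT": "example-project",
    "FUSE_PATH": "/data.bmeg.io",
    "BUCKET_NAME": "data.bmeg.io",
}
for argv in build_commands(example_env):
    print(shlex.join(argv))
```

Building argument lists rather than one shell string means a path or bucket name containing spaces cannot be split into separate words, which the unquoted variables in the shell script do not guard against.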