This repository has been archived by the owner on Dec 2, 2024. It is now read-only.

Fuse example #8

Merged
merged 3 commits into from
Mar 14, 2019
7 changes: 6 additions & 1 deletion README.md
@@ -2,7 +2,7 @@

## Usage scenario
1. Select datasets from Gen3 data browser (starting at portal or at Galaxy data source that redirects to browser)
2. Click on “Analyze data in Galaxy” button in data browser to launch/navigate to Galaxy
3. Galaxy starts and datasets + metadata from portal are added to history
4. User can run tools/workflows on data in place and outputs are placed in object storage bucket
5. Concrete example: run pathway activity tools on TCGA RNA-seq breast cancer cohort (~1000 datasets)
@@ -14,3 +14,8 @@
## Assumptions/Goals
* Single-tenant Galaxy instance
* Integration platform preferences: (1) Gen3; (2) ISB or FireCloud; (3) Cavatica; ideal: Gen3 running on cancer cloud(s) and orchestrating Galaxy provisioning/deployment


## Examples
* [gen3 aware workspace](docs/fence/README.md)
* [gcsfuse docker](docs/gcsfuse/README.md)
112 changes: 112 additions & 0 deletions docs/fence/README.md
@@ -0,0 +1,112 @@
# Fence client

## overview

A practical example of configuring your server-side application to retrieve the authenticated user's authorization record.

![image](https://user-images.githubusercontent.com/47808/54386078-6f00d200-4655-11e9-841b-62bcf804f7b9.png)


This document will illustrate:

* How to configure revproxy (nginx) to recognize your application
* How to call fence from within your application to retrieve authorization

It will not cover:
* How fence authenticates the user [see here for more](https://github.com/uc-cdis/fence#oidc--oauth2)
* How to configure authorization [see here for more](https://github.com/uc-cdis/compose-services#setting-up-users)


## setup

We assume you have installed either gen3's [docker-compose](https://github.com/uc-cdis/compose-services) or [cloud](https://github.com/uc-cdis/cloud-automation) services.

## your service

Add a stanza for your service to your [compose-services `docker-compose.yml`](https://github.com/uc-cdis/compose-services/blob/master/docker-compose.yml).

```
  mock-workspace-service:
    build:
      context: ./mock-workspace
    container_name: mock-workspace-service
    networks:
      - devnet
    ports:
      - "5000:5000"
    depends_on:
      - fence-service
```

Then add your service to the `depends_on` list of `revproxy-service`, so that revproxy starts after it:

```
    depends_on:
      - indexd-service
      - peregrine-service
      - sheepdog-service
      - fence-service
      - portal-service
      - pidgin-service
      # my service
      - mock-workspace-service
```


## revproxy

Next we need to tell `revproxy` about your service and what path it will respond to.

Add an nginx stanza to your [revproxy-service](https://github.com/uc-cdis/compose-services/blob/master/docker-compose.yml) configuration.
See [the fence-service stanza in cloud-automation](https://github.com/uc-cdis/cloud-automation/blob/4241f40c6e10fd7096085a9456217b7b5e7cbb24/kube/services/revproxy/gen3.nginx.conf/fence-service.conf#L5) for inspiration.
Note: the `/lw-workspace` path is maintained in `portal-service` and is not currently configurable.

```
#
location /lw-workspace/ {
    proxy_pass http://mock-workspace-service:5000/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```
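Because `proxy_pass` above ends in a URI part (`/`), nginx replaces the matched `/lw-workspace/` prefix with that URI before forwarding. The following is a minimal Python sketch of that mapping, not nginx itself; the function name `upstream_url` is hypothetical and only the location and upstream values come from the stanza above.

```python
def upstream_url(request_path,
                 location="/lw-workspace/",
                 proxy_pass="http://mock-workspace-service:5000/"):
    """Illustrate how the stanza above rewrites an incoming path.

    The part of the request path that matched `location` is replaced by
    the URI part of `proxy_pass` (here a single "/").
    """
    if not request_path.startswith(location):
        raise ValueError("path is not handled by this location block")
    return proxy_pass + request_path[len(location):]


if __name__ == "__main__":
    # The mock service would see "foo/bar" as its `path` argument.
    print(upstream_url("/lw-workspace/foo/bar"))
```

In other words, the application never sees the `/lw-workspace/` prefix, so its routes should be written relative to `/`.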


## query fence

A simple Flask application responds to requests under `/lw-workspace/` and returns the request path, headers, and cookies, together with fence's `/user` response, to the caller.

The fence `/user` endpoint is documented [here](https://github.com/uc-cdis/fence/blob/master/openapis/swagger.yaml#L215) and the returned authorization payload is described [here](https://github.com/uc-cdis/fence/blob/master/openapis/swagger.yaml#L1805).



```
"""A mock API."""
from flask import Flask
from flask import jsonify
from flask import request
import requests

app = Flask(__name__)


@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def index(path):
    """A simple way to create a Catch-All function which serves every URL including / is to chain two route filters. One for the root path '/' and one including a path placeholder for the rest.
    We can't just use one route filter including a path placeholder because each placeholder must at least catch one character."""

    # call fence and get information on the user
    fence_user = requests.get('http://fence-service/user', cookies=request.cookies)

    # just dump out all the data
    return jsonify(
        {'path': path,
         'headers': dict(**request.headers),
         'cookies': dict(**request.cookies),
         'fence': fence_user.json()
         })


if __name__ == '__main__':
    app.run(debug=True)
```
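The mock service above echoes whatever fence returns. A real workspace would more likely check the status code and keep only the fields it needs. The following is a minimal sketch of that step under stated assumptions: the helper name `parse_fence_response` is hypothetical, and the field names `username` and `project_access` are assumptions about the `/user` payload that should be checked against the swagger description linked above.

```python
from typing import Any, Dict, Optional, Tuple


def parse_fence_response(status_code: int,
                         payload: Optional[Dict[str, Any]]
                         ) -> Tuple[int, Dict[str, Any]]:
    """Map a fence /user response onto (http_status, body) for our own reply.

    `username` and `project_access` are assumed field names, not confirmed
    by this document.
    """
    if status_code == 401:
        # fence did not accept the forwarded cookies
        return 401, {"error": "not logged in"}
    if status_code != 200 or payload is None:
        # anything else is treated as an upstream failure
        return 502, {"error": "fence returned status %d" % status_code}
    return 200, {
        "username": payload.get("username"),
        "project_access": payload.get("project_access", {}),
    }


if __name__ == "__main__":
    print(parse_fence_response(200, {"username": "alice",
                                     "project_access": {"demo": ["read"]}}))
```

Inside the Flask view this could be called as `parse_fence_response(fence_user.status_code, fence_user.json())`, guarding the `.json()` call when the status is not 200.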
19 changes: 19 additions & 0 deletions docs/fence/mock-workspace/Dockerfile
@@ -0,0 +1,19 @@
# debian:stretch
FROM python:3.7-stretch

SHELL ["/bin/bash", "-c"]

WORKDIR /home/app
COPY requirements.txt requirements.txt
RUN python -m venv venv
RUN venv/bin/pip install -r requirements.txt
RUN venv/bin/pip install gunicorn

COPY . .
COPY boot.sh ./
RUN chmod +x boot.sh

ENV FLASK_APP app.py

EXPOSE 5000
ENTRYPOINT ["/bin/bash", "-c" ,"./boot.sh"]
3 changes: 3 additions & 0 deletions docs/fence/mock-workspace/boot.sh
@@ -0,0 +1,3 @@
# this script is used to boot a Docker container
source venv/bin/activate
exec gunicorn -b 0.0.0.0:5000 --access-logfile - --error-logfile - echo.app:app
Empty file.
27 changes: 27 additions & 0 deletions docs/fence/mock-workspace/echo/app.py
@@ -0,0 +1,27 @@
"""A mock API."""
from flask import Flask
from flask import jsonify
from flask import request
import requests

app = Flask(__name__)


@app.route('/', defaults={'path': ''})
@app.route('/<path:path>')
def index(path):
    """A simple way to create a Catch-All function which serves every URL including / is to chain two route filters. One for the root path '/' and one including a path placeholder for the rest.
    We can't just use one route filter including a path placeholder because each placeholder must at least catch one character."""

    # call fence and get information on the user
    fence_user = requests.get('http://fence-service/user', cookies=request.cookies)

    # just dump out all the data
    return jsonify(
        {'path': path,
         'headers': dict(**request.headers),
         'cookies': dict(**request.cookies),
         'fence': fence_user.json()
         })


if __name__ == '__main__':
    app.run(debug=True)
2 changes: 2 additions & 0 deletions docs/fence/mock-workspace/requirements.txt
@@ -0,0 +1,2 @@
Flask
requests
94 changes: 94 additions & 0 deletions docs/gcsfuse/README.md
@@ -0,0 +1,94 @@
# gcsfuse docker container

## overview

A practical example of configuring a docker container to mount a Google Cloud Storage bucket as a filesystem.

![image](https://user-images.githubusercontent.com/47808/54389220-d4a48c80-465c-11e9-8596-9bb5400e10c6.png)


This document will illustrate:

* How to configure gcsfuse within your docker container

It will not cover:

* How to create GCE VMs or buckets

## setup

We assume you have created and logged into your GCE VM and have access to a bucket.
`docker` should already be installed. If it is not, there are many blog posts and documentation pages that explain how to install it.


## your image

This example Dockerfile:
* installs gcsfuse
* installs Google's cloud utilities
* specifies that the user's `service_account.json` credentials should be mounted in the `/config` volume
* launches a shell script to authenticate and start your service

`Dockerfile`
```
FROM python:3.7.2

# creates a CLI environment:
# * python 3.7.2 environment
# * mounts gs://data.bmeg.io on /data.bmeg.io
# Uses service_account_email argument and config/service_account.json

ARG service_account_email

# install gcsfuse
RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-jessie main" | tee /etc/apt/sources.list.d/gcsfuse.list; \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update ; apt-get install -y apt-utils kmod && apt-get install -y gcsfuse

# install google utilities, ensure they are on path
RUN curl https://sdk.cloud.google.com | bash
ENV PATH=$PATH:/root/google-cloud-sdk/bin
RUN gcloud config set disable_usage_reporting true

# create a volume, this image should be started with
VOLUME ["/config"]

# mount and sleep forever
COPY docker-start.sh /docker-start.sh
ENTRYPOINT ["/docker-start.sh"]
```

This example entrypoint authenticates to Google, mounts the bucket, and then sleeps in place of a real service.

You can specify the following environment variables:

* `PROJECT`: the GCE project that contains the service account
* `FUSE_PATH`: the directory to mount the bucket into
* `BUCKET_NAME`: the bucket name

Note: the `--implicit-dirs` flag is necessary to parse object prefixes into directory names.
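To show how these variables and the `/config` volume fit together, here is a minimal sketch that assembles a `docker run` command line. The function name, the image tag, and the host path are hypothetical. `--privileged` is included on the assumption that the container needs access to `/dev/fuse` to mount a FUSE filesystem; a narrower set of capabilities may be enough in your environment.

```python
import shlex
from typing import List


def docker_run_args(image: str, key_dir: str, project: str,
                    bucket_name: str, fuse_path: str) -> List[str]:
    """Build an argument list for starting the gcsfuse container.

    key_dir is a host directory that contains service_account.json.
    """
    return [
        "docker", "run", "--rm",
        "--privileged",                  # assumption: grants access to /dev/fuse
        "-v", f"{key_dir}:/config",      # credentials land in the /config volume
        "-e", f"PROJECT={project}",
        "-e", f"BUCKET_NAME={bucket_name}",
        "-e", f"FUSE_PATH={fuse_path}",
        image,
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    args = docker_run_args("my-gcsfuse-image", "/home/me/config",
                           "my-project", "my-bucket", "/data")
    print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be passed to `subprocess.run` unchanged, or printed and pasted into a shell.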

`/docker-start.sh`
```
#!/usr/bin/env bash


# authenticate
# $service_account_email is left unquoted so it expands to nothing when unset
gcloud auth activate-service-account \
  $service_account_email \
  --key-file=/config/service_account.json --project="$PROJECT"

# create data dir
mkdir -p "$FUSE_PATH"

# mount the bucket.
gcsfuse --implicit-dirs \
  --key-file=/config/service_account.json \
  "$BUCKET_NAME" "$FUSE_PATH"

echo "$FUSE_PATH mounted"

# launch your service, for this example sleep forever
echo sleep infinity...
sleep infinity
```
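Once the bucket is mounted, a service inside the container can read objects as ordinary files. The following is a minimal sketch of a sanity check that such a service might run at startup; the function name `list_mounted_files` and its parameters are hypothetical, and it uses only the Python standard library.

```python
import os
from typing import List


def list_mounted_files(fuse_path: str, require_mount: bool = True,
                       limit: int = 20) -> List[str]:
    """Return up to `limit` file paths found under fuse_path, relative to it.

    With require_mount=True the call fails early when fuse_path is not a
    mount point, which usually means gcsfuse did not start.
    """
    if require_mount and not os.path.ismount(fuse_path):
        raise RuntimeError(f"{fuse_path} is not a mount point")
    found: List[str] = []
    for root, _dirs, files in os.walk(fuse_path):
        for name in sorted(files):
            found.append(os.path.relpath(os.path.join(root, name), fuse_path))
            if len(found) >= limit:
                return found
    return found


if __name__ == "__main__":
    # FUSE_PATH is the same variable docker-start.sh uses; /data is a placeholder.
    path = os.environ.get("FUSE_PATH", "/data")
    try:
        for relative_name in list_mounted_files(path):
            print(relative_name)
    except RuntimeError as err:
        print(err)
```

Walking a large bucket this way can be slow, since every directory listing becomes a request to the storage service, which is why the sketch stops after `limit` entries.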