Add functionality that allows pre-loading data into the storage bucket(s) #79
https://github.com/oittaa/gcp-storage-emulator#docker
This is a directory controlled by the service and is read/write by root by default, as the Docker service also runs as root. Additionally, this approach cannot apply to memory-backed storage. What I would like to have is a user directory, with user permissions, mounted into the container so that at launch it imports all the data found there. Ideally the top-level directories of the import directory should be used as bucket names. For example, the following directory:
should be loaded on startup, and the server should create or use the buckets.
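The example directory is not shown above; a purely hypothetical layout illustrating the idea, with bucket names as the top-level directories and object paths below them, might be:

/docker-entrypoint-init-storage
    bucket-a/
        data/file1.txt
    bucket-b/
        file2.json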
Yeah, that sounds like a good idea. I don't have much time at the moment, but pull requests are welcome.
@MiltiadisKoutsokeras just FYI https://github.com/fsouza/fake-gcs-server has the behavior you're after. For our use-case we actually don't want that behavior and are trying to move to
I have come up with a solution to the problem. Here it goes. First I use Docker Compose to launch the container with these directives:
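The compose directives are not reproduced above, so the following is only a rough sketch of what such a service definition might look like; the image tag, host-side paths and project id are assumptions, while the google_storage service name, port 9023, the PROJECT_ID variable and the /docker-entrypoint-init-storage mount point are taken from the scripts further below:

services:
  google_storage:
    image: oittaa/gcp-storage-emulator          # assumed image name
    entrypoint: ["/entrypoint.sh"]
    # The emulator start command goes here; entrypoint.sh runs "$@" in the background.
    environment:
      PROJECT_ID: my-project                    # hypothetical project id
    volumes:
      - ./entrypoint.sh:/entrypoint.sh
      - ./docker_entrypoint_init.py:/docker_entrypoint_init.py
      - ./init-data:/docker-entrypoint-init-storage   # hypothetical host directory
    ports:
      - "9023:9023"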
As you can see, I pass the desired project name and bucket name via Env Vars.

entrypoint.sh:

#!/usr/bin/env bash
# Exit on any error
set -e
[ "${PROJECT_ID}" = "" ] && { echo "PROJECT_ID Environment Variable is not Set!"; exit 1; }
# Install Python requirements
pip install google-cloud-storage==1.31.2
# Execute command line arguments in background and save process ID
"${@}" & PROCESSID=$!
# Wait for the process to start
while ! kill -0 "${PROCESSID}" >/dev/null 2>&1
do
echo "Waiting for process to start..."
sleep 1
done
echo "Process started, ID = ${PROCESSID}"
# Give the emulator a moment to start accepting connections
sleep 2
# Point the google-cloud-storage client at the emulator
export STORAGE_EMULATOR_HOST=http://google_storage:9023
# Import data to bucket
echo "Importing data..."
python3 /docker_entrypoint_init.py
echo "DONE"
# Wait for the process to exit
wait "${PROCESSID}"

docker_entrypoint_init.py:

"""Initialize Google Storage data."""
import logging
from os import scandir, environ
import sys
from google.auth.credentials import AnonymousCredentials
from google.cloud import storage
# Configure logging so that the INFO messages below are actually emitted
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
def upload_contents(client, directory, bucket_name=None):
    """Recursively upload the contents of the specified directory.

    Args:
        client (google.cloud.storage.Client): Google Storage Client.
        directory (str): upload directory path.
        bucket_name (str, optional): Bucket name to use for the upload.
            Defaults to None.
    """
    for entry in scandir(directory):
        print(entry.path)
        if entry.is_dir():
            if bucket_name is not None:
                # This is a normal directory inside a bucket
                upload_contents(client, directory + '/' + entry.name, bucket_name)
            else:
                # This is a bucket directory
                upload_contents(client, directory + '/' + entry.name, entry.name)
        elif entry.is_file():
            if bucket_name is not None:
                tokens = entry.path.split(bucket_name + '/')
                bucket_obj = client.bucket(bucket_name)
                if len(tokens) > 1:
                    gs_path = tokens[1]
                    blob_obj = bucket_obj.blob(gs_path)
                    blob_obj.upload_from_filename(entry.path)


PROJECT_ID = environ.get('PROJECT_ID')
if PROJECT_ID is None:
    logger.error('Missing required environment variable! Please set PROJECT_ID')
    sys.exit(1)
storage_client = storage.Client(credentials=AnonymousCredentials(),
                                project=PROJECT_ID)
# Scan the import data directory and upload everything found in it
upload_contents(storage_client, '/docker-entrypoint-init-storage')
logger.info('Successfully imported bucket data!')
logger.info('List:')
for bucket in storage_client.list_buckets():
    print(f'Bucket: {bucket}')
    for blob in bucket.list_blobs():
        print(f'|_Blob: {blob}')
# All OK
sys.exit(0)

I hope this is helpful.
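One caveat the thread does not address: the init script never creates the buckets on the emulator side, because client.bucket() only builds a local reference. If the emulator starts completely empty, a small helper along the following lines could be used in place of client.bucket(bucket_name) inside upload_contents; this is only a sketch, and the helper name is made up here:

from google.api_core.exceptions import Conflict

def get_or_create_bucket(client, bucket_name):
    """Return the named bucket, creating it on the emulator if it does not exist yet."""
    try:
        return client.create_bucket(bucket_name)
    except Conflict:
        # The bucket already exists; just return a reference to it.
        return client.bucket(bucket_name)

With that helper, bucket_obj = get_or_create_bucket(client, bucket_name) would take the place of the bucket_obj = client.bucket(bucket_name) line, so the emulator ends up creating or reusing the buckets as requested in the original comment.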
It would be really useful if the Docker container could start with pre-loaded data. This would help when using the tool for unit and integration testing. It could be done either by attaching a volume with data to pre-load, or by providing a hook script that is called before the server starts. Similar functionality is implemented in other GCP Storage emulators and also in other common Docker images, like databases (the postgres Docker image has the /docker-entrypoint-initdb.d directory where the user can place SQL scripts for database initialization and data import).
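Since the main motivation here is unit and integration testing, a rough illustration of how a test could consume the pre-loaded data is sketched below; it is not part of the thread, and the port, project, bucket and object names are all hypothetical, mirroring the scripts above:

import os
from google.auth.credentials import AnonymousCredentials
from google.cloud import storage

# Point the client library at the emulator before creating the client.
os.environ['STORAGE_EMULATOR_HOST'] = 'http://localhost:9023'

client = storage.Client(credentials=AnonymousCredentials(), project='my-project')
# Read back an object that the init hook is expected to have pre-loaded.
blob = client.bucket('bucket-a').blob('data/file1.txt')
assert blob.download_as_bytes()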