diff --git a/faqs/galaxy/collections_change_datatype.md b/faqs/galaxy/collections_change_datatype.md
index 7ca0d4b4a36889..770f05ac861018 100644
--- a/faqs/galaxy/collections_change_datatype.md
+++ b/faqs/galaxy/collections_change_datatype.md
@@ -4,7 +4,7 @@ description: This will set the datatype for all files in your collection. Does n
 area: collections
 box_type: tip
 layout: faq
-contributors: [shiltemann]
+contributors: [shiltemann, hexylena]
 ---
 
 1. Click on **Edit** {% icon galaxy-pencil %} next to the collection name in your history
@@ -13,3 +13,10 @@ contributors: [shiltemann]
    - tip: you can start typing the datatype into the field to filter the dropdown menu
 
 4. Click the **Save** button
+
+**Cannot find the feature?**
+
+If you are on a smaller Galaxy server, i.e. not one of the large (multi)national public servers, you may not be able to find this operation at all, and Galaxy gives no indication that it is missing or why it is disabled.
+
+Galaxy has recently started putting [more features behind settings and deployment configuration](https://docs.galaxyproject.org/en/master/admin/production.html#use-celery-for-asynchronous-tasks) that must be enabled by the server administrator.
+To enable changing the datatype of a collection, your administrator will need to deploy Celery and, optionally, add Redis and Flower to their stack. Consider sending your Galaxy administrator a link to the [simpler deployment option]({% link topics/admin/tutorials/celeryless/tutorial.md %}) or to the more complex [GTN tutorial for setting up Redis and Flower]({% link topics/admin/tutorials/celery/tutorial.md %}).
diff --git a/topics/admin/tutorials/celery/tutorial.md b/topics/admin/tutorials/celery/tutorial.md
index 8aa280f9517917..96783d3c8d37fc 100644
--- a/topics/admin/tutorials/celery/tutorial.md
+++ b/topics/admin/tutorials/celery/tutorial.md
@@ -31,9 +31,6 @@ tags:
   - git-gat
 ---
 
-# Overview
-
-
 Celery is a distributed task queue written in Python that can spawn multiple workers and enables asynchronous task processing on multiple nodes. It supports scheduling, but focuses more on real-time operations.
 
 From the Celery website:
@@ -52,7 +49,9 @@ From the Celery website:
 >
 {: .quote cite="https://docs.celeryq.dev/en/stable/getting-started/introduction.html#what-s-a-task-queue"}
 
-[A slideshow presentation on this subject is available](slides.html).
+[A slideshow presentation on this subject is available](slides.html).
+
+If you are not interested in managing Redis and Flower, you may prefer the [lower-configuration deployment option]({% link topics/admin/tutorials/celeryless/tutorial.md %}).
 
 >
 >
diff --git a/topics/admin/tutorials/celeryless/tutorial.md b/topics/admin/tutorials/celeryless/tutorial.md
new file mode 100644
index 00000000000000..291224dadcaa17
--- /dev/null
+++ b/topics/admin/tutorials/celeryless/tutorial.md
@@ -0,0 +1,86 @@
+---
+layout: tutorial_hands_on
+
+title: "Alternative Celery Deployment for Galaxy"
+zenodo_link: ""
+questions:
+  - What is *required* for Celery to work in Galaxy?
+objectives:
+  - Set up the bare minimum configuration needed to get tasks working
+  - Avoid deploying, securing, and managing RabbitMQ, Redis, and Flower
+time_estimation: "1h"
+key_points:
+  - While a combination of RabbitMQ and Redis is perhaps the most production-ready option, you can use Postgres as both the message broker and the result backend for Celery
+  - This significantly reduces operational complexity and the attack surface of your Galaxy.
+contributions:
+  authorship:
+    - hexylena
+requirements:
+  - type: "internal"
+    topic_name: admin
+    tutorials:
+      - ansible
+      - ansible-galaxy
+subtopic: data
+tags:
+  - ansible
+---
+
+Celery is a new component in the Galaxy world (ca. 2023): a distributed task queue that *can* be used to run tasks asynchronously. It isn't mandatory, but without it some features you expect to use may be missing.
+
+If you are running a large production deployment, you probably want to follow the [Celery+Redis+Flower Tutorial]({% link topics/admin/tutorials/celery/tutorial.md %}).
+
+However, if you are running a smaller Galaxy, you may not want to manage a Celery deployment beyond what Gravity does for you automatically, you may not want to add Redis to your stack, and you may have no need for Flower.
+
+>
+>
+> 1. TOC
+> {:toc}
+>
+{: .agenda}
+
+# Configuring Galaxy to use Postgres
+
+AMQP is a message queue protocol that processes use to pass messages to each other. While a dedicated message broker like RabbitMQ is perhaps the most robust choice, there is an easier option: the Postgres database your Galaxy already uses.
+
+Add the following to your Galaxy configuration to use Postgres (this URL connects to the `galaxy` database over the local Unix socket in `/var/run/postgresql`):
+
+```yaml
+amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
+```
+
+# Configuring Celery to use Postgres
+
+Celery is commonly paired with Redis (a key-value store) as the backend for storing task results. But we already have a database! So let's use that instead:
+
+```yaml
+enable_celery_tasks: true
+celery_conf:
+  broker_url: null # This should default to using amqp_internal_connection
+  result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
+  task_routes:
+    galaxy.fetch_data: galaxy.external
+    galaxy.set_job_metadata: galaxy.external
+```
+
+
+With that, we should now be able to use [features that rely on Celery](https://docs.galaxyproject.org/en/master/admin/production.html#use-celery-for-asynchronous-tasks), such as:
+
+- Changing the datatype of a collection
+- Exporting histories
+- Other asynchronous tasks (see the linked documentation)
+
+# Configuring with Ansible
+
+If you're using Ansible, you can derive both URLs from your existing database connection string (this assumes a `database_connection` variable that holds it):
+
+```yaml
+amqp_internal_connection: "sqlalchemy+{{ database_connection }}"
+enable_celery_tasks: true
+celery_conf:
+  broker_url: null # This should default to using amqp_internal_connection
+  result_backend: "db+{{ database_connection }}"
+  task_routes:
+    galaxy.fetch_data: galaxy.external
+    galaxy.set_job_metadata: galaxy.external
+```
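
The Ansible variant works by prefixing Galaxy's own database URL: `sqlalchemy+` asks Kombu to use its SQLAlchemy transport for Galaxy's internal message queue, and `db+` selects Celery's database result backend. The minimal Python sketch below reproduces that derivation and recovers the hand-written values from the plain YAML section. The helpers `celery_urls` and `describe` are hypothetical and exist only for illustration; they are not part of Galaxy. The sketch assumes a Postgres URL that connects over the local Unix socket.

```python
"""Hypothetical sketch: derive Galaxy's Celery URLs from one database URL.

Mirrors the Ansible templating above; not part of Galaxy itself.
"""
from urllib.parse import parse_qs, urlsplit


def celery_urls(database_connection: str) -> dict:
    """Return the broker and result-backend URLs used in this tutorial."""
    if not database_connection.startswith("postgresql"):
        raise ValueError("this sketch only handles PostgreSQL URLs")
    return {
        # Galaxy's internal message queue, via Kombu's SQLAlchemy transport
        "amqp_internal_connection": "sqlalchemy+" + database_connection,
        # Celery's database result backend
        "result_backend": "db+" + database_connection,
    }


def describe(url: str) -> dict:
    """Split a connection URL into scheme, database name and socket directory."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "database": parts.path.lstrip("/"),
        # libpq treats a 'host' value starting with '/' as a Unix socket directory
        "socket_dir": parse_qs(parts.query).get("host", [None])[0],
    }


if __name__ == "__main__":
    urls = celery_urls("postgresql:///galaxy?host=/var/run/postgresql")
    for name, url in urls.items():
        print(f"{name}: {url}")
        print(f"    {describe(url)}")
```

Both URLs point at the same `galaxy` database. This is why the Postgres-only setup needs no extra service to deploy or secure.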