Merge pull request #5496 from galaxyproject/pompano-python
Add caveat, simplified celery deployment option that doesn't use RabbitMQ/Redis/Flower
hexylena authored Nov 7, 2024
2 parents 4c1db3d + 65f703a commit 2e21f8f
Showing 3 changed files with 97 additions and 5 deletions.
9 changes: 8 additions & 1 deletion faqs/galaxy/collections_change_datatype.md
@@ -4,7 +4,7 @@ description: This will set the datatype for all files in your collection. Does n
area: collections
box_type: tip
layout: faq
contributors: [shiltemann]
contributors: [shiltemann, hexylena]
---

1. Click on **Edit** {% icon galaxy-pencil %} next to the collection name in your history
@@ -13,3 +13,10 @@ contributors: [shiltemann]
- tip: you can start typing the datatype into the field to filter the dropdown menu
4. Click the **Save** button


**Cannot find the feature?**

If you are on a smaller Galaxy server, i.e., not one of the large (multi)national public servers, you may not be able to find this operation, and Galaxy gives no indication that it is missing or why it is disabled.

Galaxy has recently started putting [more features behind settings and deployment configuration](https://docs.galaxyproject.org/en/master/admin/production.html#use-celery-for-asynchronous-tasks) that must be enabled by the server administrator.
Your administrator will need to deploy Celery, and potentially also Flower and Redis, to enable changing the datatype of a collection. Consider sending your Galaxy administrator the link to the [simpler deployment option]({% link topics/admin/tutorials/celeryless/tutorial.md %}) or the more complex [GTN tutorial for setting up Redis and Flower]({% link topics/admin/tutorials/celery/tutorial.md %}).
7 changes: 3 additions & 4 deletions topics/admin/tutorials/celery/tutorial.md
@@ -31,9 +31,6 @@ tags:
- git-gat
---

# Overview


Celery is a distributed task queue written in Python that can spawn multiple workers and enables asynchronous task processing on multiple nodes. It supports scheduling, but focuses more on real-time operations.

From the Celery website:
@@ -52,7 +49,9 @@ From the Celery website:
>
{: .quote cite="https://docs.celeryq.dev/en/stable/getting-started/introduction.html#what-s-a-task-queue"}

[A slideshow presentation on this subject is available](slides.html).

If you are not interested in managing Redis and Flower, you might be interested in the [lower-configuration deployment option]({% link topics/admin/tutorials/celeryless/tutorial.md %}).

> <agenda-title></agenda-title>
>
86 changes: 86 additions & 0 deletions topics/admin/tutorials/celeryless/tutorial.md
@@ -0,0 +1,86 @@
---
layout: tutorial_hands_on

title: "Alternative Celery Deployment for Galaxy"
zenodo_link: ""
questions:
- What is *required* for Celery to work in Galaxy?
objectives:
- Set up the bare minimum configuration to get tasks working
- Avoid deploying, securing, and managing RabbitMQ, Redis, and Flower
time_estimation: "1h"
key_points:
- While a combination of RabbitMQ and Redis is perhaps the most production-ready option, you can use Postgres as both the broker and the result backend for Celery.
- This significantly reduces operational complexity and the attack surface of your Galaxy server.
contributions:
authorship:
- hexylena
requirements:
- type: "internal"
topic_name: admin
tutorials:
- ansible
- ansible-galaxy
subtopic: data
tags:
- ansible
---

Celery is a relatively new component in the Galaxy world (ca. 2023): a distributed task queue that *can* be used to run tasks asynchronously. It isn't mandatory, but without it some features you expect may be missing.

If you are running a large production deployment you probably want to follow the [Celery+Redis+Flower Tutorial]({% link topics/admin/tutorials/celery/tutorial.md %}).

However, if you are running a smaller Galaxy, you may not want to manage a Celery deployment beyond what Gravity does for you automatically, you may not want to add Redis to your stack, and you may have no need for Flower!

> <agenda-title></agenda-title>
>
> 1. TOC
> {:toc}
>
{: .agenda}

# Configuring Galaxy to use Postgres

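AMQP is a message queue protocol that processes use to pass messages to each other. While a dedicated message broker like RabbitMQ is perhaps the most robust choice, there is an easier option: Postgres. Galaxy's `amqp_internal_connection` accepts a `sqlalchemy+` URL, which stores the messages in database tables instead of a separate broker service.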

Add the following to the `galaxy:` section of your Galaxy configuration file (`galaxy.yml`) to use Postgres:

```yaml
amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
```

# Configuring Celery to use Postgres

Celery is commonly deployed with Redis (a key-value store) as the backend for storing task results. But Galaxy already has a database, so let's use that instead:

```yaml
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
```
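
Putting both snippets together, the relevant part of `galaxy.yml` might look like the following sketch. It assumes, as in the examples above, a database named `galaxy` reached over the local Unix socket in `/var/run/postgresql`; the `database_connection` line is included only to make clear that all three settings point at the same database.

```yaml
galaxy:
  # Galaxy's main database (assumed: database "galaxy" over the local socket)
  database_connection: "postgresql:///galaxy?host=/var/run/postgresql"
  # The "sqlalchemy+" prefix makes Galaxy's internal message queue live in Postgres
  amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
  enable_celery_tasks: true
  celery_conf:
    # Left null so that the broker falls back to amqp_internal_connection
    broker_url: null
    # The "db+" prefix selects Celery's SQLAlchemy database result backend
    result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
    task_routes:
      galaxy.fetch_data: galaxy.external
      galaxy.set_job_metadata: galaxy.external
```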

With that, you should now be able to use [features that depend on Celery](https://docs.galaxyproject.org/en/master/admin/production.html#use-celery-for-asynchronous-tasks), such as:

- Changing the datatype of a collection
- Exporting histories
- Other Celery-backed tasks
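
These tasks only run if Celery workers are actually running. If you manage Galaxy with Gravity, it can start the workers for you; the sketch below shows the relevant `gravity:` options in `galaxy.yml`. The option names and values are assumptions based on Gravity's commonly documented settings, so check them against the Gravity documentation for your version. Note that the queue list includes `galaxy.external`, the queue to which the `task_routes` above send `galaxy.fetch_data` and `galaxy.set_job_metadata`.

```yaml
gravity:
  celery:
    # Start Celery workers alongside Galaxy
    enable: true
    # Also start Celery Beat, which schedules Galaxy's periodic tasks
    enable_beat: true
    # Worker concurrency; a small value is assumed here for a small server
    concurrency: 2
    # Queues the workers consume from; galaxy.external must be listed
    # for the routed tasks above to be picked up
    queues: celery,galaxy.internal,galaxy.external
```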
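
# Configuring with Ansible

If you deploy Galaxy with Ansible, the same settings could look like this, where `database_connection` is assumed to be a variable holding your Postgres connection string:
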
```yaml
amqp_internal_connection: "sqlalchemy+{{ database_connection }}"
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+{{ database_connection }}"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
```
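
With the `galaxyproject.galaxy` role, one way to supply that variable is to define it once in your group variables and reference it from `galaxy_config`, which the role writes out as `galaxy.yml`. The sketch below assumes a group variables file such as `group_vars/galaxyservers.yml` (as in the GTN Ansible tutorials) and a database named `galaxy` on the local socket; `database_connection` is a hypothetical variable name of your choosing, so adjust both to your setup.

```yaml
# Hypothetical top-level variable holding the Postgres connection string
database_connection: "postgresql:///galaxy?host=/var/run/postgresql"

galaxy_config:
  galaxy:
    database_connection: "{{ database_connection }}"
    # Same database, reused as Galaxy's internal message queue
    amqp_internal_connection: "sqlalchemy+{{ database_connection }}"
    enable_celery_tasks: true
    celery_conf:
      broker_url: null
      # Same database, reused as Celery's result backend
      result_backend: "db+{{ database_connection }}"
      # task_routes as in the previous snippet
```

Defining the string once avoids the three settings drifting apart if the database location ever changes.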
