Add caveat, simplified celery deployment option that doesn't use RabbitMQ/Redis/Flower #5496

Merged
merged 7 commits into from
Nov 7, 2024
9 changes: 8 additions & 1 deletion faqs/galaxy/collections_change_datatype.md
@@ -4,7 +4,7 @@ description: This will set the datatype for all files in your collection. Does n
area: collections
box_type: tip
layout: faq
contributors: [shiltemann]
contributors: [shiltemann, hexylena]
---

1. Click on **Edit** {% icon galaxy-pencil %} next to the collection name in your history
@@ -13,3 +13,10 @@ contributors: [shiltemann]
- tip: you can start typing the datatype into the field to filter the dropdown menu
4. Click the **Save** button


**Cannot find the feature?**

If you are on a smaller Galaxy server, i.e. not one of the large (multi)national public servers, you may not be able to find this operation, and Galaxy gives no indication that it is missing or why it is disabled.

Galaxy has recently started placing [more features behind a setting and deployment configuration](https://docs.galaxyproject.org/en/master/admin/production.html#use-celery-for-asynchronous-tasks) that the server administrator must enable.
Your administrator will need to deploy Celery, and potentially also Flower and Redis, to enable changing the datatype of a collection. Consider sending your Galaxy administrator the link to the [simpler deployment option]({% link topics/admin/tutorials/celeryless/tutorial.md %}) or the more complex [GTN tutorial for setting up Redis and Flower]({% link topics/admin/tutorials/celery/tutorial.md %}).
7 changes: 3 additions & 4 deletions topics/admin/tutorials/celery/tutorial.md
@@ -31,9 +31,6 @@ tags:
- git-gat
---

# Overview


Celery is a distributed task queue written in Python that can spawn multiple workers and enables asynchronous task processing on multiple nodes. It supports scheduling, but focuses more on real-time operations.

From the Celery website:
@@ -52,7 +49,9 @@ From the Celery website:
>
{: .quote cite="https://docs.celeryq.dev/en/stable/getting-started/introduction.html#what-s-a-task-queue"}

[A slideshow presentation on this subject is available](slides.html).
[A slideshow presentation on this subject is available](slides.html).

If you are not interested in managing Redis and Flower, you might prefer the [lower-configuration deployment option]({% link topics/admin/tutorials/celeryless/tutorial.md %}).

> <agenda-title></agenda-title>
>
86 changes: 86 additions & 0 deletions topics/admin/tutorials/celeryless/tutorial.md
@@ -0,0 +1,86 @@
---
layout: tutorial_hands_on

title: "Alternative Celery Deployment for Galaxy"
zenodo_link: ""
questions:
- What is *required* for Celery to work in Galaxy?
objectives:
- Set up the bare minimum configuration to get tasks working
- Avoid deploying, securing, and managing RabbitMQ, Redis, and Flower
time_estimation: "1h"
key_points:
- While a combination of RabbitMQ and Redis is perhaps the most production-ready setup, you can use Postgres as a backend for Celery
- This significantly reduces operational complexity and the attack surface of your Galaxy.
contributions:
authorship:
- hexylena
requirements:
- type: "internal"
topic_name: admin
tutorials:
- ansible
- ansible-galaxy
subtopic: data
tags:
- ansible
---

Celery is a new component in the Galaxy world (ca. 2023): a distributed task queue that *can* be used to run tasks asynchronously. It isn't mandatory, but without it you might find that some features you expect to use are missing.

If you are running a large production deployment you probably want to follow the [Celery+Redis+Flower Tutorial]({% link topics/admin/tutorials/celery/tutorial.md %}).

However, if you are running a smaller Galaxy, you may not want to manage deploying Celery (beyond what Gravity does for you automatically), you may not want to add Redis to your stack, and you may have no need for Flower!

> <agenda-title></agenda-title>
>
> 1. TOC
> {:toc}
>
{: .agenda}

# Configuring Galaxy to use Postgres

AMQP is a message queue protocol that processes can use to pass messages to each other. While a real message queue like RabbitMQ is perhaps the most robust choice, there is an easier option: Postgres.

Add the following to your Galaxy configuration to use Postgres:

```yaml
amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
```
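To make the shape of that connection string easier to read, here is a small sketch that splits it into its parts using only the Python standard library. The helper name `describe_connection` is hypothetical and not part of Galaxy; the interpretation of `host=/path` as a Unix socket directory follows the usual libpq convention and is assumed to apply to your driver.

```python
from urllib.parse import urlsplit, parse_qs


def describe_connection(url: str) -> dict:
    """Split a connection URL into its parts (illustrative helper, not a Galaxy API)."""
    parts = urlsplit(url)
    # The scheme may carry a transport prefix, e.g. "sqlalchemy+postgresql".
    prefix, _, dialect = parts.scheme.rpartition("+")
    query = parse_qs(parts.query)
    return {
        "transport": prefix or None,
        "dialect": dialect,
        # None when nothing sits between "//" and the next "/".
        "network_host": parts.hostname,
        "database": parts.path.lstrip("/"),
        # A "host" query parameter starting with "/" is assumed to be a socket directory.
        "socket_dir": query.get("host", [None])[0],
    }


info = describe_connection("sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql")
print(info)
```

Read this way, the value says: use the SQLAlchemy transport, talk to the `galaxy` database, and connect over the local socket in `/var/run/postgresql` rather than over the network.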

# Configuring Celery to use Postgres

Celery would prefer you use Redis (a key-value store) as a backend to store results. But we already have a database! So let's try using that instead:

```yaml
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
```


With that in place, we should now be able to use [features that depend on Celery](https://docs.galaxyproject.org/en/master/admin/production.html#use-celery-for-asynchronous-tasks), such as:

- Changing the datatype of a collection
- Exporting histories
- other things!
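
The comment on `broker_url` in the configuration above says a `null` value should fall back to `amqp_internal_connection`. The sketch below only illustrates that described behaviour; `effective_broker_url` is a hypothetical helper, not Galaxy's actual implementation, and the dictionary simply mirrors the YAML shown earlier.

```python
from typing import Optional


def effective_broker_url(config: dict) -> Optional[str]:
    """Hypothetical helper: an explicit broker_url wins, otherwise fall back."""
    celery_conf = config.get("celery_conf") or {}
    # A null/missing broker_url falls through to amqp_internal_connection.
    return celery_conf.get("broker_url") or config.get("amqp_internal_connection")


# Mirrors the YAML configuration from this tutorial.
config = {
    "amqp_internal_connection": "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql",
    "enable_celery_tasks": True,
    "celery_conf": {
        "broker_url": None,
        "result_backend": "db+postgresql:///galaxy?host=/var/run/postgresql",
    },
}

print(effective_broker_url(config))
```

If the printed value is not what you expect on your own server, checking Galaxy's startup logs for the broker it actually connects to is the more reliable test.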

# Configuring with Ansible

If you're using Ansible, the same configuration could look like this:

```yaml
amqp_internal_connection: "sqlalchemy+{{ database_connection }}"
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+{{ database_connection }}"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
```
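
The two templated values differ only in their prefix. As a rough sketch of what the templates render to, the function below applies the same prefixes in Python; `celery_urls` is a hypothetical name, and the example value for `database_connection` is an assumption (your playbook's variable may hold something different).

```python
def celery_urls(database_connection: str) -> dict:
    """Apply the same prefixes the Ansible templates use (illustrative only)."""
    return {
        # Broker side: message transport over SQLAlchemy.
        "amqp_internal_connection": f"sqlalchemy+{database_connection}",
        # Result side: Celery's database result backend.
        "result_backend": f"db+{database_connection}",
    }


# Assumed value of the Ansible variable; adjust to match your playbook.
urls = celery_urls("postgresql:///galaxy?host=/var/run/postgresql")
for key, value in urls.items():
    print(f"{key}: {value}")
```

With that assumed value, the rendered strings match the literal ones used in the non-Ansible sections of this tutorial.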