Restructuration for community content #169

Merged 4 commits on Sep 23, 2024
100 changes: 0 additions & 100 deletions .github/workflows/fetch_all_tools.yaml

This file was deleted.

51 changes: 0 additions & 51 deletions .github/workflows/fetch_all_tutorials.yaml

This file was deleted.

121 changes: 121 additions & 0 deletions .github/workflows/fetch_filter_resources.yaml
@@ -0,0 +1,121 @@
name: Weekly resource fetching and community filtering

on:
  workflow_dispatch:
  schedule:
    # Every Sunday at 8:00 am
    - cron: "0 8 * * 0"

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "tools"
  cancel-in-progress: false

jobs:
  fetch-servers:
    runs-on: ubuntu-20.04
    name: Fetch list of all available servers
    steps:
      - name: Checkout main
        uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          # This job defines no matrix, so the Python version is pinned directly
          python-version: "3.11"
      - name: Install requirement
        run: |
          python -m pip install -r requirements.txt
          sudo apt-get install -y jq
      - name: Fetch list of all available servers
        run: |
          python sources/bin/get_public_galaxy_servers.py -o sources/data/available_public_servers.csv
      - name: Archive available servers
        uses: actions/upload-artifact@v4
        with:
          name: available-servers
          path: sources/data/available_public_servers.csv
  fetch-tools-stepwise:
    # Wait for the available-servers artifact before starting
    needs: fetch-servers
    runs-on: ubuntu-20.04
    name: Fetch tool stepwise
    strategy:
      # max-parallel: 1 # need to run one after another, since otherwise there is a chance that multiple jobs want to push to the results branch at the same time (which fails due to merge)
      matrix:
Collaborator

This combined workflow triggers a matrix of subset jobs, where each repositories*.list file is processed individually. The way it is combined now, everything after the merge part does not really make sense, and the "Fetch list of all available servers" step should probably not be part of the matrix subset either. I would propose running "Fetch all tool stepwise" in its own job and storing the results as an artifact, then using that artifact as input for the merge part in a separate job.

Member Author

@bebatut bebatut Sep 16, 2024


I adapted the cron workflow to have 3 jobs:

  1. Getting the available servers and storing the TSV as an artifact
  2. Triggering the subset jobs to fetch tools in parallel and storing the outputs as artifacts
  3. Merging the TSV files and running the rest

        python-version: ["3.11"]
        subset:
          - repositories01.list
          - repositories02.list
          - repositories03.list
          - repositories04.list
    steps:
      - name: Checkout main
        uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install requirement
        run: python -m pip install -r requirements.txt
      - name: Download a single artifact
        uses: actions/download-artifact@v4
        with:
          name: available-servers
          path: sources/data/
      - name: Fetch all tool stepwise
        run: |
          bash sources/bin/extract_all_tools.sh "${{ matrix.subset }}"
        env:
          GITHUB_API_KEY: ${{ secrets.GH_API_TOKEN }}
      - name: Archive tool sublists production artifacts
        uses: actions/upload-artifact@v4
        with:
          name: tools-${{ matrix.subset }}
          # matrix.subset already expands to "repositoriesNN.list", so no extra prefix or ".list" suffix is added
          path: communities/all/resources/${{ matrix.subset }}_tools.tsv
  merge-fetch-filter:
    # Needs the tools-* artifacts produced by every matrix job
    needs: fetch-tools-stepwise
    runs-on: ubuntu-20.04
    name: Merge tools, fetch tutorials and filter the resources for communities
    steps:
      - name: Checkout main
        uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          # This job defines no matrix, so the Python version is pinned directly
          python-version: "3.11"
      - name: Install requirement
        run: |
          python -m pip install -r requirements.txt
          sudo apt-get install -y jq
      - name: Download All Artifacts
        uses: actions/download-artifact@v4
        with:
          pattern: tools-*
          merge-multiple: true
          # Single download target (the duplicate "path: tools" key was dropped); the merge step reads from this directory
          path: communities/all/resources/
      - name: Merge all tools
        # Merge files with only one header -> https://stackoverflow.com/questions/16890582/unixmerge-multiple-csv-files-with-same-header-by-keeping-the-header-of-the-firs
        # map(.[]) -> https://stackoverflow.com/questions/42011086/merge-arrays-of-json (get flat array, one tool per entry)
        run: |
          awk 'FNR==1 && NR!=1{next;}{print}' communities/all/resources/repositories*.list_tools.tsv > communities/all/resources/tools.tsv
          jq -s 'map(.[])' communities/all/resources/repositories*.list_tools.json > communities/all/resources/all_tools.json
      - name: Generate wordcloud and interactive table
        run: |
          bash sources/bin/format_tools.sh
      - name: Fetch all tutorials
        run: |
          bash bin/extract_all_tutorials.sh
        env:
          PLAUSIBLE_API_KEY: ${{ secrets.PLAUSIBLE_API_TOKEN }}
      - name: Filter tutorials for communities
        run: |
          bash bin/get_community_tutorials.sh
      - name: Update tool to keep and exclude for communities
        run: |
          bash bin/update_tools_to_keep_exclude.sh
      - name: Filter tools for communities
        run: |
          bash bin/get_community_tools.sh
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v4
        with:
          commit-message: Update resources
          title: Automatic resources update
          body: Automatic resource update done via GitHub Action once a week
          base: main
          branch: resource-update
          delete-branch: true
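
For reference, the awk one-liner in the "Merge all tools" step can be tried locally. The sketch below is only an illustration: the scratch directory, the two sample files and their column names (`Tool`, `Owner`) are hypothetical and are not taken from the repository. It shows that only the header of the first file is kept when the per-subset TSV files are concatenated.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory and sample data, for illustration only.
workdir="$(mktemp -d)"
printf 'Tool\tOwner\ntool_a\towner_a\n' > "$workdir/repositories01.list_tools.tsv"
printf 'Tool\tOwner\ntool_b\towner_b\n' > "$workdir/repositories02.list_tools.tsv"

# FNR is the line number within the current file, NR the line number overall.
# FNR==1 && NR!=1 matches the first line of every file except the first one,
# so those repeated headers are skipped and every other line is printed.
awk 'FNR==1 && NR!=1{next;}{print}' "$workdir"/repositories*.list_tools.tsv > "$workdir/tools.tsv"

cat "$workdir/tools.tsv"
```

The shell expands the glob in sorted order, so the header that survives is the one from `repositories01.list_tools.tsv`. This assumes every sublist shares the same header row; the one-liner does not check that.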