Update README and bash scripts
bebatut committed Oct 31, 2023
1 parent cf5bfd1 commit 5cee01f
Showing 3 changed files with 54 additions and 31 deletions.
63 changes: 35 additions & 28 deletions README.md
@@ -38,40 +38,22 @@ Galaxy Tool extractor
$ python3 -m pip install -r requirements.txt
```
# Extract tools for categories in the ToolShed
## Extract all tools
1. Get an API key ([personal token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens)) for GitHub
2. (Optional) Create a text file with ToolShed categories for which tools need to be extracted: 1 ToolShed category per row ([example for microbial data analysis](data/microgalaxy/categories))
3. (Optional) Create a text file with list of tools to exclude: 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_exclude))
4. (Optional) Create a text file with list of tools to really keep (already reviewed): 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_keep))
5. Run the tool extractor script
2. Export the GitHub API key as an environment variable:
```
$ python bin/extract_galaxy_tools.py \
--api <GitHub API key> \
--output <Path to output file> \
[--categories <Path to ToolShed category file>] \
[--excluded <Path to excluded tool file category file>]\
[--keep <Path to to-keep tool file category file>]
$ export GITHUB_API_KEY=<your GitHub API key>
```
For microGalaxy, a Bash script in `bin` can be used by:
1. Exporting the GitHub API key as an environment variable:
```
$ export GITHUB_API_KEY=<your GitHub API key>
```
2. Running the script
```
$ bash bin/extract_microgalaxy_tools.sh
```
It will take the files in the `data/microgalaxy` folder and export the tools into `microgalaxy_tools.csv`
3. Run the script
```
$ bash bin/extract_all_tools.sh
```
The script will generate a CSV file with each tool found in the list of GitHub repository and several information for these tools:
The script will generate a CSV file with each tool found in the list of GitHub repositories and metadata for these tools:
1. Galaxy wrapper id
2. Description
@@ -89,5 +71,30 @@ The script will generate a CSV file with each tool found in the list of GitHub r
14. Galaxy wrapper version
15. Conda id
16. Conda version
17. Reviewed
18. To keep
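The columns listed above can be consumed with Python's standard `csv` module. The following is a minimal sketch, not the project's own code: the sample rows are invented, only a subset of the columns is shown, and the encoding of the `To keep` column (assumed here to be a text value such as `True`) is an assumption.

```python
import csv
import io


def rows_to_keep(csv_text, keep_column="To keep", truthy=("true", "yes", "1")):
    """Return the rows whose keep column holds a truthy value.

    The set of truthy spellings is an assumption; adapt it to the real CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if (row.get(keep_column) or "").strip().lower() in truthy
    ]


# Invented sample data using a subset of the columns listed above.
sample = (
    "Galaxy wrapper id,Description,Reviewed,To keep\n"
    "tool_a,First example tool,True,True\n"
    "tool_b,Second example tool,True,False\n"
    "tool_c,Third example tool,False,\n"
)

kept = rows_to_keep(sample)
print([row["Galaxy wrapper id"] for row in kept])
```

For a real run, the text would come from the generated file (for example `results/all_tools.csv`, the path used in `bin/extract_all_tools.sh`) instead of the inline sample.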
## Filter tools based on their categories in the ToolShed
1. Run the extraction as explained before
2. (Optional) Create a text file with ToolShed categories for which tools need to be extracted: 1 ToolShed category per row ([example for microbial data analysis](data/microgalaxy/categories))
3. (Optional) Create a text file with list of tools to exclude: 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_exclude))
4. (Optional) Create a text file with list of tools to really keep (already reviewed): 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_keep))
5. Run the tool extractor script
```
$ python bin/extract_galaxy_tools.py \
--tools <Path to CSV file with all extracted tools> \
--filtered_tools <Path to output CSV file with filtered tools> \
[--categories <Path to ToolShed category file>] \
[--excluded <Path to file listing tools to exclude>] \
[--keep <Path to file listing tools to keep>]
```
### Filter tools for microbial data analysis
For microGalaxy, run the Bash script provided in `bin`:
```
$ bash bin/extract_microgalaxy_tools.sh
```
It will take the files in the `data/microgalaxy` folder and export the tools into `microgalaxy_tools.csv`
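The filtering described in this section can be sketched in Python as follows. This is one plausible reading of the category, exclude, and keep lists, not the logic of `bin/extract_galaxy_tools.py`, which this commit does not show. The dictionary keys (`id`, `categories`, `to_keep`) and the sample tools are hypothetical.

```python
def read_id_list(text):
    """Parse the body of a one-entry-per-row file into a set, skipping blank rows."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_tools(tools, categories, excluded, kept):
    """Keep tools in a selected category that are not excluded.

    Assumed behavior: an empty category set means "no category filter",
    and tools in the keep list are flagged rather than force-included.
    """
    filtered = []
    for tool in tools:
        if categories and not (set(tool["categories"]) & categories):
            continue  # tool belongs to none of the selected categories
        if tool["id"] in excluded:
            continue  # tool was explicitly excluded
        filtered.append({**tool, "to_keep": tool["id"] in kept})
    return filtered


# Invented sample data.
tools = [
    {"id": "tool_a", "categories": ["Metagenomics"]},
    {"id": "tool_b", "categories": ["Metagenomics", "Assembly"]},
    {"id": "tool_c", "categories": ["Proteomics"]},
]
categories = read_id_list("Metagenomics\n")
excluded = read_id_list("tool_b\n")
kept = read_id_list("tool_a\n")

for tool in filter_tools(tools, categories, excluded, kept):
    print(tool["id"], tool["to_keep"])
```

Keeping the three lists as sets makes each membership check constant-time, which matters little at this scale but keeps the filter easy to read.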
6 changes: 6 additions & 0 deletions bin/extract_all_tools.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

python bin/extract_galaxy_tools.py \
extractools \
--api $GITHUB_API_KEY \
--all_tools 'results/all_tools.csv'
16 changes: 13 additions & 3 deletions bin/extract_microgalaxy_tools.sh
@@ -1,8 +1,18 @@
#!/usr/bin/env bash

curl \
-L \
"https://docs.google.com/spreadsheets/d/1Nq_g-CPc8t_eC4M1NAS9XFJDflA7yE3b9hfSg3zu9L4/export?format=tsv&gid=1533244711" \
-o "data/microgalaxy/tools_to_keep"

curl \
-L \
"https://docs.google.com/spreadsheets/d/1Nq_g-CPc8t_eC4M1NAS9XFJDflA7yE3b9hfSg3zu9L4/export?format=tsv&gid=672552331" \
-o "data/microgalaxy/tools_to_exclude"

python bin/extract_galaxy_tools.py \
--api $GITHUB_API_KEY \
--output microgalaxy_tools.csv \
filtertools \
--tools 'results/all_tools.csv' \
--categories "data/microgalaxy/categories" \
--excluded "data/microgalaxy/tools_to_exclude" \
--exclude "data/microgalaxy/tools_to_exclude" \
--keep "data/microgalaxy/tools_to_keep"
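Both scripts call `bin/extract_galaxy_tools.py` with a subcommand (`extractools` or `filtertools`). The sketch below shows how such a command-line interface could be declared with `argparse` subparsers. It is an assumption about the structure, not the actual parser, which this commit does not include. The option names are copied from the two scripts and the README, and the help texts are invented.

```python
import argparse


def build_parser():
    """Build a parser with the two subcommands used by the Bash scripts (sketch)."""
    parser = argparse.ArgumentParser(description="Extract and filter Galaxy tools")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Subcommand used in bin/extract_all_tools.sh
    extract = subparsers.add_parser("extractools", help="Extract all tools")
    extract.add_argument("--api", required=True, help="GitHub API key")
    extract.add_argument("--all_tools", required=True, help="Output CSV with all tools")

    # Subcommand used in bin/extract_microgalaxy_tools.sh
    filt = subparsers.add_parser("filtertools", help="Filter extracted tools")
    filt.add_argument("--tools", required=True, help="CSV with all extracted tools")
    filt.add_argument("--filtered_tools", help="Output CSV with filtered tools")
    filt.add_argument("--categories", help="File with ToolShed categories")
    filt.add_argument("--exclude", help="File with tool ids to exclude")
    filt.add_argument("--keep", help="File with tool ids to keep")
    return parser


args = build_parser().parse_args(
    ["extractools", "--api", "dummy-token", "--all_tools", "results/all_tools.csv"]
)
print(args.command, args.all_tools)
```

Declaring `dest="command"` lets the calling code branch on `args.command` to decide whether to run the extraction or the filtering step.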
