Merge pull request #4436 from galaxyproject/afrino-chromatic
Final updates for gallantries
shiltemann authored Nov 2, 2023
2 parents 6d73a70 + fec28bd commit 37d4427
Showing 29 changed files with 972 additions and 19 deletions.
4 changes: 0 additions & 4 deletions _includes/default-footer.html
@@ -39,7 +39,3 @@
</div>
</div>
</footer>
-
-
-</div>
-</footer>
10 changes: 7 additions & 3 deletions _includes/pathway-card.html
@@ -3,7 +3,7 @@
{% assign coverimage = path.coverimage | default: "/assets/images/GTNLogo1000.png" %}
{% assign coverimagealt = path.coverimagealt | default: "GTN logo with a multi-coloured star and the words Galaxy Training Network"%}


{% if path.draft != true or jekyll.environment != "production" %}
<div class="pathwayitem col-md-4">
<a href="{{site.baseurl}}{{path.url}}">
<div class="card d-flex">
@@ -21,12 +21,16 @@ <h5 class="card-title">{{path.title}}</h5>
</p>
</div>
<div class="card-footer">

{% if path.draft %}
<span class="label label-default tutorial_tag" style="{{ 'Draft' | colour_tag }}">Draft</span>
{% endif %}

{% for tag in path.tags %}
<span class="label label-default tutorial_tag" style="{{ tag | colour_tag }}">{{ tag }}</span>
{% endfor %}
</div>
</div>
</a>
</div>


{% endif %}
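The visibility rule wrapped around the card can be read as a small truth table. The sketch below restates the Liquid condition in plain Ruby; `show_pathway_card?` is a hypothetical helper written for illustration (it is not part of the GTN code base), and it assumes Ruby's `!=` behaves like Liquid's for these values.

```ruby
# Hypothetical helper mirroring the Liquid condition
# `path.draft != true or jekyll.environment != "production"`.
def show_pathway_card?(draft, environment)
  draft != true || environment != "production"
end

# Walk every combination of draft flag and build environment.
[true, false, nil].each do |draft|
  %w[production development].each do |env|
    shown = show_pathway_card?(draft, env)
    puts format("draft=%-5s env=%-11s shown=%s", draft.inspect, env, shown)
  end
end
```

Under these assumptions, a card is hidden only when the pathway is a draft and the site is built for production; a missing `draft` key behaves like `false`.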
38 changes: 37 additions & 1 deletion bin/schema-learning-pathway.yaml
@@ -49,11 +49,48 @@ mapping:
- type: str
#enum:
#- CONTRIBUTORS

tags:
type: seq
description: Any relevant tags that would help a user discover this LP
sequence:
- type: str
required: true

type:
type: str
description: |
The type of topic; some have subtly different behaviours.
`admin-dev`
: should be used for admin and developer topics that are not scientifically focused.
`basics`
: Only used for galaxy-interface type topics
`data-science`
: Topics which are not necessarily Galaxy focused but expand into broader communities
`use`
: These topics use Galaxy for some analysis
`instructors`
: Specific to topics related to instruction of Galaxy
required: true
enum:
- admin-dev
- basics
- data-science
- use
- instructors

draft:
type: bool
description: |
`true` to hide your LP from the LP list (optional). This
is useful if you need an LP for a workshop, but have not
finished making it up to GTN standards.
pathway:
type: seq
required: true
@@ -67,7 +104,6 @@ mapping:
type: str
tutorials:
type: seq
- required: true
sequence:
- type: map
mapping:
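As a rough illustration of what the new schema fields require, the sketch below checks a front-matter hash by hand using only Ruby's standard library. `pathway_problems` and `ALLOWED_TYPES` are hypothetical names introduced here, not the repository's validator, and the sketch assumes that the `required: true` following the `tags` sequence applies to `tags` itself.

```ruby
require "yaml"

# Allowed values copied from the `enum` list in the schema hunk.
ALLOWED_TYPES = %w[admin-dev basics data-science use instructors].freeze

# Hypothetical checker (not part of the GTN code base): returns a list of
# human-readable problems for one learning pathway's front matter.
def pathway_problems(meta)
  problems = []

  # Assumption: `tags` is required and must be a sequence of strings.
  tags = meta["tags"]
  if !tags.is_a?(Array) || !tags.all? { |t| t.is_a?(String) }
    problems << "tags must be a list of strings"
  end

  unless ALLOWED_TYPES.include?(meta["type"])
    problems << "type must be one of #{ALLOWED_TYPES.join(', ')}"
  end

  # `draft` is optional, but when present it must be a boolean.
  if meta.key?("draft") && ![true, false].include?(meta["draft"])
    problems << "draft must be true or false"
  end

  problems
end

good = YAML.safe_load(<<~FRONTMATTER)
  tags: [beginner]
  type: use
  draft: true
FRONTMATTER

bad = YAML.safe_load(<<~FRONTMATTER)
  tags: beginner
  type: tutorial
  draft: "yes"
FRONTMATTER

puts pathway_problems(good).inspect
puts pathway_problems(bad).inspect
```

The first document mirrors the front matter used by the new Gallantries pathways; the second breaks each of the three rules once.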
3 changes: 2 additions & 1 deletion bin/validate-frontmatter.rb
@@ -227,7 +227,8 @@ def self.run
.grep_v(/schema-*/)
.select do |x|
d = YAML.load_file(x)
- d.key? 'editorial_board' or d.key? 'summary' or d.key? 'type'
+ # Ignore non-hashes
+ d.is_a?(Hash) && (d.key? 'editorial_board' or d.key? 'summary' or d.key? 'type')
end

errors += materials.map { |x| [x, lint_topic(x)] }
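The guard matters because a YAML file does not have to parse to a hash: a top-level list or an empty file yields an object without `key?`. The sketch below is a minimal reproduction under that assumption; `topic_metadata?` and the `samples` strings are hypothetical and only stand in for the files the validator globs.

```ruby
require "yaml"

# Hypothetical predicate mirroring the updated `select` block: only hashes
# that carry one of the three marker keys count as topic metadata.
def topic_metadata?(data)
  data.is_a?(Hash) && %w[editorial_board summary type].any? { |k| data.key?(k) }
end

# Made-up YAML documents covering the interesting shapes.
samples = {
  "topic" => "type: use\ntitle: Example\n",
  "list"  => "- a\n- b\n",
  "empty" => "",
  "other" => "title: No marker keys\n"
}

samples.each do |name, text|
  data = YAML.safe_load(text)
  puts "#{name}: #{data.class} -> #{topic_metadata?(data)}"
end

# Without the guard, the list case raises instead of returning false.
begin
  YAML.safe_load(samples["list"]).key?("type")
rescue NoMethodError => e
  puts "unguarded check failed: #{e.class}"
end
```

Checking `is_a?(Hash)` first lets `&&` short-circuit, so `key?` is never called on an array or on the value returned for an empty document.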
4 changes: 4 additions & 0 deletions learning-pathways/climate-learning.md
@@ -6,6 +6,10 @@ description: |
Get a complete overview of how Galaxy works, from the user welcome page, through batch tools, to interactive analysis. This set of 3 climate tutorials lets you explore many of Galaxy's features while learning about the cool subject of climate analysis.
tags: [Climate, Overview]

editorial_board:
- Marie59

type: use

pathway:
- section: "Following 3 climate tutorials"
149 changes: 149 additions & 0 deletions learning-pathways/io1.md
@@ -0,0 +1,149 @@
---
layout: learning-pathway
tags: [beginner]
type: use

editorial_board:
- fpsom
- shiltemann
- hexylena
funding:
- gallantries

title: Gallantries Grant - Intellectual Output 1 - Introduction to data analysis and -management, statistics, and coding

description: |
This Learning Pathway collects the results of Intellectual Output 1 in the Gallantries Project
cover-image: ./shared/images/Gallantries_logo.png
cover-image-alt: "Gallantries logo with the carpentries wrench in galaxy 2 stripes 1 strip colour scheme."

priority: 5
draft: true

pathway:
- section: "Year 1: Coding in Python"
description: |
Intro to Coding in Python. Covers variables, functions, and data structures [SC1.1,2]
tutorials:
- name: python-basics
topic: data-science
- name: python-advaced-np-pd
topic: data-science

- section: "Year 1: Coding in Python Modular (Avans)"
description: |
Intro to Coding in Python. Covers variables, functions, and data structures [SC1.1,2]
In collaboration with Avans Hogeschool, an associated partner, we produced the following lessons.
tutorials:
- topic: data-science
name: python-math
- topic: data-science
name: python-functions
- topic: data-science
name: python-types
- topic: data-science
name: python-iterables
- topic: data-science
name: python-flow
- topic: data-science
name: python-loops
- topic: data-science
name: python-exceptions
- topic: data-science
name: python-files
- topic: data-science
name: python-basics-recap
- topic: data-science
name: python-glob
- topic: data-science
name: python-argparse
- topic: data-science
name: python-subprocess
- topic: data-science
name: python-venv
- topic: data-science
name: python-conda

- section: "Year 1: Coding in R"
description: |
Intro to Coding in R. Covers variables, functions, and data structures [SC1.1,2]
tutorials:
- name: r-basics
topic: data-science
- name: r-advanced
topic: data-science
- name: r-dplyr
topic: data-science

- section: "Year 1: Intro to Command Line"
description: |
This submodule will cover the basics of the shell (variables, for loops), needed for data handling [SC1.1,2,6]
tutorials:
- name: cli-basics
topic: data-science
- name: cli-advanced
topic: data-science
- name: cli-bashcrawl
topic: data-science
- name: snakemake
topic: data-science

- section: "Year 1: Intro to Git and GitHub"
description: |
This submodule will cover the basics of research software development and sharing (committing, branching, forking, GitHub, etc.) [SC1.1,2,6]
tutorials:
- name: bash-git
topic: data-science
- name: git-cli
topic: data-science
- name: github-command-line-contribution
topic: contributing
- name: github-interface-contribution
topic: contributing

- section: "Year 2: Introduction to Genomics"
description: |
This submodule covers the biological background, as well as the technological concepts involved in genome sequencing, and their effects on downstream data analysis. [SC1.3,4,6]
- section: "Year 2: Quality Control"
description: |
This submodule will cover the evaluation of the quality of datasets, and how to improve quality by a cyclic process of cleaning, trimming and filtering datasets and re-evaluating the quality. [SC1.3-5]
tutorials:
- name: quality-control
topic: sequence-analysis

- section: "Year 2: Mapping"
description: |
This submodule will cover the comparison of genome sequencing samples to a reference genome. The concept of reference data is relevant in many data analyses across the life sciences, as is connecting to online databases and incorporating their data into an analysis. [SC1.3,4]
tutorials:
- name: mapping
topic: sequence-analysis

- section: "Year 3: Variant Analysis"
description: |
This submodule will cover variant calling: after mapping sequences to the reference genome, the regions that differ from it (variants) must be determined and evaluated for impact. As any two individuals will by definition show many differences, distinguishing between healthy variation and potential disease-causing variants is one of the main challenges in variant calling. [SC1.3-5]
tutorials:
- name: bash-variant-calling
topic: data-science

- section: "Year 3: Transcriptomics"
description: |
DNA only describes the potential of the genome; which genes are actually active within the cell, and so affect the health and function of the organism, is determined via transcriptomics (RNA sequencing). By integrating data from these two levels of analysis (DNA and RNA), a clearer picture of the state of the cell can be obtained. [SC1.3-5]
tutorials:
- name: rna-seq-bash-star-align
topic: transcriptomics

---

In total, this module will form a course of around 10 days (± 2 days, depending on the exact analysis stories we identify). Some of these introductory submodules will build on existing training material available in the GTN or Carpentries (~15%).

Success Criteria:

- SC1.1) Basic coding skills. This module will cover the basics of the R and Python coding languages for novices. No coding experience will be assumed nor expected. Basic coding concepts will be introduced (variables, functions, data structures).
- SC1.2) Research software development. We will cover best practices for research software development. It will follow Open Science principles, and include topics such as collaborative code development (e.g. git), reproducible research, code review, and quality control.
- SC1.3) Familiarity with federated data analysis, management, and compute infrastructures. We will introduce the Galaxy platform, a user-friendly web-based analysis platform capable of distributing work across public/private clouds and High-Performance Computing (HPC) resources.
- SC1.4) Basic statistical analysis skills. This submodule will cover the basic concepts involved in statistical analysis of scientific data.
- SC1.5) Data acquisition and integration. Scientific data analyses often require interaction with external datasets. We will cover ways to retrieve data from online data sources, transform it to the required format, and integrate it into the analysis.
- SC1.6) Reproducibility and data sharing. A cornerstone of scientific research is reproducibility. We will cover how to effectively share data and analysis pipelines in order to make scientific results optimally reproducible.
113 changes: 113 additions & 0 deletions learning-pathways/io2.md
@@ -0,0 +1,113 @@
---
layout: learning-pathway
tags: [beginner]
type: use

editorial_board:
- shiltemann
- hexylena
- bebatut
funding:
- gallantries

title: Gallantries Grant - Intellectual Output 2 - Large-scale data analysis, and introduction to visualisation and data modelling


description: |
This Learning Pathway collects the results of Intellectual Output 2 in the Gallantries Project
cover-image: ./shared/images/Gallantries_logo.png
cover-image-alt: "Gallantries logo with the carpentries wrench in galaxy 2 stripes 1 strip colour scheme."

priority: 5
draft: true

pathway:



- section: "Year 1: Introduction to large-scale analyses in Galaxy"
description: |
Galaxy offers support for the analysis of large collections of data. This submodule will cover the upload, organisation, and analysis of such large sets of data and files. [SC2.1; SC1.3,5]
tutorials:
- name: upload-rules
topic: galaxy-interface
- name: upload-rules-advanced
topic: galaxy-interface
- name: ncbi-sarf
topic: galaxy-interface
- name: history-to-workflow
topic: galaxy-interface
- name: collections
topic: galaxy-interface
- name: workflow-automation
topic: galaxy-interface
- name: workflow-editor
topic: galaxy-interface
- name: workflow-parameters
topic: galaxy-interface


- section: "Year 1: Introduction to human microbiome analysis"
description: |
The human microbiome consists of a community of thousands of species of microorganisms. Sequencing of this community is often performed to identify which species of microorganism are present. This aids in diagnostics and treatment of patients. [SC2.1-3,6; SC1.4,5]
tutorials:
- name: beer-data-analysis
topic: metagenomics
- name: nanopore-16s-metagenomics
topic: metagenomics

- section: "Year 1: Advanced microbiome analysis"
description: |
By using more complex sequencing techniques, it is possible to obtain information not only about which organisms are present in the microbiome, but also about their activity. This can, e.g., aid in the identification of antibiotic resistance. This more complex sequencing requires more complex data analysis. [SC2.1-4,6; SC1.4,5]
tutorials:
- name: pathogen-detection-from-nanopore-foodborne-data
topic: metagenomics

- section: "Year 2: Cancer Analysis"
description: |
The previous submodules focused on scaling up in terms of number of samples. This submodule will focus on scaling up in terms of complexity. Cancer is a multifaceted and heterogeneous disease of the genome. This leads to complex datasets and analysis pipelines. [SC2.3,4; SC1.5]
tutorials:
- name: mapping-by-sequencing
topic: variant-analysis
# https://github.com/galaxyproject/training-material/pull/3802

- section: "Year 2: Intro to machine learning"
description: |
Going beyond conventional statistics, many scientific data analyses benefit from machine learning techniques for modelling of datasets. This is widely used in the biomedical domain. [SC2.4,5; SC1.4]
tutorials:
- name: intro-to-ml-with-r
topic: statistics

- section: "Year 2: Introduction to the Galaxy visualisation framework"
description: |
(This module was cancelled due to insufficiencies in the Galaxy Visualisation Framework.) Galaxy has many options for visualisation of scientific data. This module will cover how to use this framework to create and share visualisations. [SC2.2-3; SC1.1,3,6]
tutorials: []

- section: "Year 3: Visualisation of complex multidimensional data"
description: |
For advanced visualisation, tools such as Circos may be utilized where Galaxy’s basic visualisation framework does not suffice. [SC2.2-3; SC1.5]
tutorials:
- name: circos
topic: visualisation

- section: "Year 3: Introduction to Visualisation with R and Python"
description: |
When the available visualisation options do not suffice, custom plots and visualisations can be created using one of several extensive visualisation libraries available in R and Python. This module will cover the basics of using R and Python to create custom plots and visualisations. [SC2.3; SC1.1]
tutorials:
- name: data-manipulation-olympics-viz-r
topic: data-science
- name: python-plotting
topic: data-science

---

Success Criteria:

- SC2.1) Large-scale data analysis and handling. In this module, learners will gain competency in managing, organizing, and analysing large collections of datasets.
- SC2.2) Analysis of high-dimensional datasets. Real-world scientific studies often involve more complex datasets, for example when combining data from different experiments or timepoints. This more complex experimental setup translates to increased complexity in data analysis.
- SC2.3) Data visualisation. This module will cover the basics of data visualisation to aid with exploration and interpretation of complex datasets.
- SC2.4) Data modelling. This module will introduce learners to basic data modelling techniques. This is often required for the identification of patterns in data required for e.g. classification.
- SC2.5) Machine learning. This module will also cover more advanced data modelling techniques such as machine learning.
- SC2.6) Reasoning about impact of computation on results. Many choices must be made during data analysis. This includes experimental design, choice of data analysis tools and their parameter settings, and external reference databases. Each of these choices will impact the results. Accurate interpretation of results is only possible with an understanding and awareness of the impact of these factors.
