This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Add information about 1-based or 0-based data in beacon #138

Open
blankdots opened this issue Nov 20, 2019 · 3 comments
Labels
enhancement New feature or request ga4gh Global Alliance for Genomic and Health good first issue Good for newcomers

Comments

@blankdots
Contributor

blankdots commented Nov 20, 2019

Proposed solution

Add a field to the response that specifies whether the data in the beacon is 0-based or 1-based.
While the API recommendation is to use 0-based coordinates (ga4gh-beacon/specification#251), the underlying data might not always follow it. Hence we will add a field to the API response that lets a beacon deployment specify which coordinate base its data uses.

This is GA4GH related.

DoD (Definition of Done)

The info object contains a key that specifies whether the data is 0-based or 1-based.

Testing

Unit test and peer review.

@blankdots blankdots added enhancement New feature or request ga4gh Global Alliance for Genomic and Health labels Nov 20, 2019
@blankdots
Contributor Author

Based on an offline conversation with @teemukataja:

  • The Beacon spec says that coordinates are 0-based
  • The VCF spec says that coordinates are 1-based
  • Different datasets (there are other dataset types, not only VCF) use either 0 or 1 as the first coordinate
  • European genome browsers typically use a 1-based coordinate system
  • North American genome browsers typically use a 0-based coordinate system
  • People use either the 0-based or the 1-based coordinate system, depending on their background
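
To make the mismatch in the list above concrete, here is a minimal sketch of moving a single position between the two conventions. The function names `to_zero_based` and `to_one_based` are hypothetical and do not come from the beacon code base. The sketch only covers single positions; interval end conventions are not handled.

```python
def to_zero_based(position: int, coordinate_base: int) -> int:
    """Return `position` expressed as a 0-based coordinate.

    `coordinate_base` is the base the input uses: 0 or 1.
    """
    if coordinate_base not in (0, 1):
        raise ValueError("coordinate_base must be 0 or 1")
    # A 1-based position is shifted down by one; a 0-based one is unchanged.
    return position - coordinate_base


def to_one_based(position: int, coordinate_base: int) -> int:
    """Return `position` expressed as a 1-based coordinate."""
    if coordinate_base not in (0, 1):
        raise ValueError("coordinate_base must be 0 or 1")
    # A 0-based position is shifted up by one; a 1-based one is unchanged.
    return position + (1 - coordinate_base)
```

For example, a 1-based position of 100 would correspond to 99 in a 0-based system, which is why a client needs to know which base a beacon's data uses.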

@blankdots
Contributor Author

Solved in beacon network UI with: CSCfi/beacon-network-ui@2ffc700

@blankdots blankdots added the good first issue Good for newcomers label Nov 20, 2019
@teemukataja
Contributor

Three solutions come to mind:

  1. Declare the file type (because the file types have specifications, and that might convey the information to the user)
{
    "datasetAlleleResponses": [
        {
            ...,
            "info": {
                "fileType": "vcf"
            }
        }
    ]
}
  2. Declare the coordinate base system, regardless of file type.
{
    "datasetAlleleResponses": [
        {
            ...,
            "info": {
                "coordinateBase": 1
            }
        }
    ]
}
  3. Or combine them both
{
    "datasetAlleleResponses": [
        {
            ...,
            "info": {
                "fileType": "vcf",
                "coordinateBase": 1
            }
        }
    ]
}
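
As a rough sketch of how the three options could share one code path, the helper below builds the `info` object from whichever pieces are known. The function name `build_info` is hypothetical; only the keys `fileType` and `coordinateBase` come from the proposals above.

```python
from typing import Optional


def build_info(file_type: Optional[str] = None,
               coordinate_base: Optional[int] = None) -> dict:
    """Build the `info` object for one dataset allele response.

    Passing only `file_type` gives option 1, only `coordinate_base`
    gives option 2, and passing both gives option 3.
    """
    info = {}
    if file_type is not None:
        info["fileType"] = file_type
    if coordinate_base is not None:
        if coordinate_base not in (0, 1):
            raise ValueError("coordinate_base must be 0 or 1")
        info["coordinateBase"] = coordinate_base
    return info
```

Calling `build_info("vcf", 1)` would produce the `info` object of the combined option.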

We could get the fileType from the input data files (*.vcf) in beacon_init, so that it is inserted into the database along with the metadata.
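
One possible way to derive the file type from a file name is sketched below. This is not how beacon_init currently works; `detect_file_type` is a hypothetical helper, and treating `.gz` and `.bgz` as compression suffixes to skip is an assumption.

```python
from pathlib import Path

# Assumed compression suffixes that wrap the real file type.
COMPRESSION_SUFFIXES = ("gz", "bgz")


def detect_file_type(filename: str) -> str:
    """Return the lowercase file type taken from the file name extension."""
    suffixes = [s.lower().lstrip(".") for s in Path(filename).suffixes]
    # Skip a trailing compression suffix, e.g. "sample.vcf.gz" -> "vcf".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    if not suffixes:
        raise ValueError(f"cannot determine file type of {filename!r}")
    return suffixes[-1]
```

The returned value could then be stored with the dataset metadata and reported as `fileType`.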

Concerns

What if a dataset contains multiple file types? Then we could use arrays instead, e.g. "fileType": ["bam", "vcf"] and "coordinateBase": [0, 1], or "coordinateBase": "mixed". I don't know whether it is typical for a dataset to contain mixed file types and mixed coordinate base systems, though; this needs investigating.
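
A sketch of the array variant, assuming each file in a dataset is described by a `(file_type, coordinate_base)` pair supplied by the caller. The name `summarise_info` is hypothetical, and returning a scalar for uniform datasets and a sorted list for mixed ones is just one of the possible designs mentioned above.

```python
from typing import Iterable, Tuple


def summarise_info(files: Iterable[Tuple[str, int]]) -> dict:
    """Collapse per-file (file_type, coordinate_base) pairs into one info object."""
    files = list(files)
    if not files:
        raise ValueError("dataset contains no files")
    file_types = sorted({file_type for file_type, _ in files})
    bases = sorted({base for _, base in files})
    return {
        # Single value when the dataset is uniform, a list when it is mixed.
        "fileType": file_types[0] if len(file_types) == 1 else file_types,
        "coordinateBase": bases[0] if len(bases) == 1 else bases,
    }
```

The downside of this design is that clients must accept two shapes for each key; always returning a list would be more uniform, at the cost of changing the single-value proposals.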

2 participants