Merge pull request #13 from Center-for-Health-Data-Science/update_recount3

Several updates to the Recount3 subpackage
ValeSora authored Jul 23, 2024
2 parents 2ccd6d4 + 71d0c6b commit ec1b6f0
Showing 11 changed files with 35 additions and 9,020 deletions.
30 changes: 26 additions & 4 deletions bulkDGD/execs/dgd_get_recount3_data.py
@@ -262,6 +262,9 @@ def main():
# Create a list to store the futures.
futures = []

+# Create a set to store the names of the output/log files.
+output_names = set()

# For each row of the data frame containing the samples' batches
for num_batch, row in enumerate(df.itertuples(index = False), 1):

@@ -288,6 +291,29 @@ def main():

#-------------------------------------------------------------#

+# Get the overall name for the output/log files.
+output_name = f"{project_name}_{samples_category}"
+
+# Set a counter in case the name already exists and we need
+# to name the files differently.
+counter = 1
+
+# If the name already exists
+while output_name in output_names:
+
+    # Uniquify the name by adding a counter.
+    output_name = output_name + f"_{counter}"
+
+    # Update the counter.
+    counter += 1
+
+# Add the new name to the list of names.
+output_names.add(output_name)
+
+#-------------------------------------------------------------#

# Get the name of the output file.
-output_csv_name = \
-    f"{project_name}_{samples_category}_{num_batch}.csv"
+output_csv_name = f"{output_name}.csv"

# Get the path to the output file.
output_csv_path = os.path.join(wd, output_csv_name)
@@ -306,8 +329,7 @@ def main():
#-------------------------------------------------------------#

# Get the path to the log file and the file's extension.
-log_file_name = \
-    f"{project_name}_{samples_category}_{num_batch}.log"
+log_file_name = f"{output_name}.log"

# Get the path to the log file.
log_file_path = os.path.join(wd, log_file_name)
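
The name-uniquifying step added in this file can be tried in isolation. Note that, as written in the diff, the loop appends each suffix to the already-suffixed name, so a third batch with the same project and category would be named `<name>_1_2`. The sketch below is a hypothetical variant, not the committed code: it rebuilds the candidate from the base name on every iteration, so the suffixes stay flat (`_1`, `_2`, ...). The helper name `uniquify` and its parameters `base_name` and `used_names` are assumptions made for illustration.

```python
def uniquify(base_name, used_names):
    """Return a name that is not yet in `used_names` and record it there.

    Hypothetical helper: `base_name` plays the role of
    f"{project_name}_{samples_category}" and `used_names` the role of
    the `output_names` set in the diff.
    """
    candidate = base_name
    counter = 1

    # Rebuild the candidate from the base name on each iteration, so
    # suffixes do not accumulate (no "name_1_2").
    while candidate in used_names:
        candidate = f"{base_name}_{counter}"
        counter += 1

    # Remember the name so that later calls avoid it.
    used_names.add(candidate)
    return candidate


# Three batches sharing the same (made-up) project and category.
used = set()
names = [uniquify("project_category", used) for _ in range(3)]
print(names)  # ['project_category', 'project_category_1', 'project_category_2']
```

The output and log file names would then be derived from the returned name, as the diff does with `f"{output_name}.csv"` and `f"{output_name}.log"`.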
94 changes: 0 additions & 94 deletions bulkDGD/recount3/data/gtex_tissues.txt

This file was deleted.

61 changes: 1 addition & 60 deletions bulkDGD/recount3/data/readme.md
@@ -1,6 +1,6 @@
# `data`

-Last updated: 29/03/2024
+Last updated: 22/07/2024

## `gtex_metadata_fileds`

@@ -23,43 +23,6 @@ AGE
DTHHRDY
```

-## `gtex_tissues.txt`
-
-A plain text file containing the list of available GTEx tissues. `dgd_get_recount3_data` uses it to check whether the user-provided tissue is valid.
-
-Example:
-
-```
-# GTEx tissue types - STUDY_NA is not included
-# Adipose tissue
-ADIPOSE_TISSUE
-# Adrenal gland
-ADRENAL_GLAND
-# Blood
-BLOOD
-# Blood vessel
-BLOOD_VESSEL
-```
-
-## `sra_codes.txt`
-
-A plain text file containing the list of available SRA codes. `dgd_get_recount3_data` uses it to check whether the user-provided SRA code is valid.
-
-Example:
-
-```
-# SRA codes
-SRP107565
-SRP149665
-SRP017465
-SRP119165
-```

## `sra_metadata_fields.txt`

A plain text file containing the fields (= columns) found in the files describing the metadata associated with SRA samples downloaded from the Recount3 platform.
@@ -78,28 +41,6 @@ sample_acc
experiment_acc
```

-## `tcga_cancer_types.txt`
-
-A plain text file containing the list of TCGA cancer types. `dgd_get_recount3_data` uses it to check whether the user-provided cancer type is valid.
-
-Example:
-
-```
-# TCGA cancer types (from https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations, CNTL, FPPP, and MISC excluded)
-# Adrenocortical carcinoma
-ACC
-# Bladder Urothelial Carcinoma
-BLCA
-# Breast Invasive Carcinoma
-BRCA
-# Cervical squamous cell carcinoma and endocervical carcinoma
-CESC
-```

## `tcga_metadata_fields.txt`

A plain text file containing the fields (= columns) found in the files describing the metadata associated with TCGA samples downloaded from the Recount3 platform.