Commit
brick started
tomlue committed May 31, 2024
1 parent 10633d3 commit c2c6f1c
Showing 12 changed files with 103 additions and 6 deletions.
4 changes: 3 additions & 1 deletion .devcontainer/devcontainer.json
@@ -18,7 +18,9 @@
  },
  "extensions": [
  "ms-python.python",
- "ms-toolsai.jupyter"
+ "ms-toolsai.jupyter",
+ "ms-vsliveshare.vsliveshare", // Live Share extension
+ "github.copilot" // GitHub Copilot extension
  ],
  "remoteUser": "vscode"
  }
3 changes: 3 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,3 @@
/config.local
/tmp
/cache
6 changes: 6 additions & 0 deletions .dvc/config
@@ -0,0 +1,6 @@
[core]
    remote = biobricks.ai
['remote "biobricks.ai"']
    url = https://ins-dvc.s3.amazonaws.com/insdvc
['remote "s3.biobricks.ai"']
    url = s3://ins-dvc/insdvc
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
/download
/brick
6 changes: 6 additions & 0 deletions README
@@ -0,0 +1,6 @@
# SMRT Small Molecule Retention Time


This dataset is available on figshare at

https://figshare.com/articles/dataset/The_METLIN_small_molecule_dataset_for_machine_learning-based_retention_time_prediction/8038913
1 change: 1 addition & 0 deletions code/00_status.py
@@ -0,0 +1 @@
# PURPOSE: CHECK IF THE SOURCE HAS CHANGED
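Review note: code/00_status.py is only a stated purpose so far. One possible way to detect a changed source, offered as a rough sketch rather than the author's plan, is to hash the fetched bytes and compare against a digest saved from the previous run. The function name `source_changed` and the digest file location are hypothetical and do not appear in this commit.

```python
import hashlib
from pathlib import Path


def source_changed(payload: bytes, digest_file: Path) -> bool:
    """Return True if payload's SHA-256 differs from the digest saved on disk.

    Hypothetical helper: the digest file is an assumption, not part of the commit.
    The new digest is always written back, so the next call compares against it.
    """
    new_digest = hashlib.sha256(payload).hexdigest()
    # No digest file yet means this is the first run, which counts as "changed".
    old_digest = digest_file.read_text().strip() if digest_file.exists() else None
    digest_file.parent.mkdir(parents=True, exist_ok=True)
    digest_file.write_text(new_digest + "\n")
    return new_digest != old_digest
```

How the payload is obtained (for example, a metadata response from the data host) is left open here, since the commit does not say.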
7 changes: 7 additions & 0 deletions code/01_download.py
@@ -0,0 +1,7 @@
# PURPOSE: DOWNLOAD THE SMRT DATA TO THE ./download DIRECTORY
import os

# downloads to the ./download directory
os.makedirs('download', exist_ok=True)

# read the data from the ./download directory
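Review note: code/01_download.py creates ./download but does not fetch anything yet. A minimal sketch of the missing step, using only the standard library, might look like the following. The helper name `download_file` is hypothetical, and the direct file URL is deliberately a parameter because the commit only links the figshare landing page, not a file endpoint.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_file(url: str, dest_dir: str = "download") -> Path:
    """Stream the resource at `url` into `dest_dir` and return the local path.

    Hypothetical helper: the caller must supply a direct file URL.
    """
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Use the last path segment of the URL as the file name, with a fallback.
    name = Path(urlparse(url).path).name or "data.bin"
    target = out_dir / name
    # Copy in chunks so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

Because the stage declares `download/` as a DVC output, anything written there would be picked up by `dvc repro` on the next run.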
7 changes: 7 additions & 0 deletions code/02_process.py
@@ -0,0 +1,7 @@
# PURPOSE: CHANGE THE DOWNLOADED DATA TO ONE OR MORE PARQUET FILES
import os

# exports to the ./brick directory
os.makedirs('brick', exist_ok=True)

# read the data from the ./download directory
40 changes: 40 additions & 0 deletions dvc.lock
@@ -0,0 +1,40 @@
schema: '2.0'
stages:
  status:
    cmd: python code/00_status.py
    deps:
    - path: code/00_status.py
      hash: md5
      md5: 95a09d63c054eb185a1408771f4ee8a3
      size: 43
  download:
    cmd: python code/01_download.py
    deps:
    - path: code/01_download.py
      hash: md5
      md5: f82fd5fc2597b90ed411991180e4ac30
      size: 195
    outs:
    - path: download/
      hash: md5
      md5: d751713988987e9331980363e24189ce.dir
      size: 0
      nfiles: 0
  process:
    cmd: python code/02_process.py
    deps:
    - path: code/02_process.py
      hash: md5
      md5: c520d6a17cb1fb7e47155d606ce80701
      size: 197
    - path: download/
      hash: md5
      md5: d751713988987e9331980363e24189ce.dir
      size: 0
      nfiles: 0
    outs:
    - path: brick/
      hash: md5
      md5: d751713988987e9331980363e24189ce.dir
      size: 0
      nfiles: 0
18 changes: 18 additions & 0 deletions dvc.yaml
@@ -0,0 +1,18 @@
stages:
  status:
    cmd: python code/00_status.py
    deps:
    - code/00_status.py
  download:
    cmd: python code/01_download.py
    deps:
    - code/01_download.py
    outs:
    - download/
  process:
    cmd: python code/02_process.py
    deps:
    - download/
    - code/02_process.py
    outs:
    - brick/
12 changes: 7 additions & 5 deletions requirements.txt
@@ -1,5 +1,7 @@
-python-dotenv
-pandas
-biobricks
-fastparquet
-pyarrow
+python-dotenv==1.0.1
+pandas==2.2.2
+biobricks==0.3.7
+fastparquet==2024.5.0
+pyarrow==16.1.0
+dvc==3.51.1
+dvc-s3==3.2.0