Commit b71317c (1 parent: bca9f1f)
Showing 4 changed files with 31 additions and 49 deletions.
@@ -47,27 +47,29 @@ gleams cluster --help
 GLEAMS provides the `gleams embed` command to convert MS/MS spectra in peak files to 32-dimensional embeddings. Example:

 ```
-gleams embed *.mzML --embed_name GLEAMS.embed
+gleams embed *.mzML --embed_name GLEAMS_embed
 ```

-This will read the MS/MS spectra from all matched mzML files and export the results to a two-dimensional NumPy array of dimension _n_ x 32 in file `GLEAMS.embed.npy`, with _n_ the number of MS/MS spectra read from the mzML files.
-Additionally, a tabular file `GLEAMS.embed.parquet` will be created containing corresponding metadata for the embedded spectra.
+This will read the MS/MS spectra from all matched mzML files and export the results to a two-dimensional NumPy array of dimension _n_ x 32 in file `GLEAMS_embed.npy`, with _n_ the number of MS/MS spectra read from the mzML files.
+Additionally, a tabular file `GLEAMS_embed.parquet` will be created containing corresponding metadata for the embedded spectra.

 ### Embedding clustering

 After converting the MS/MS spectra to 32-dimensional embeddings, they can be clustered to group spectra with similar embeddings using the `gleams cluster` command. Example:

 ```
-gleams cluster --embed_name GLEAMS.embed --cluster_name GLEAMS.cluster --eps 0.05
+gleams cluster --embed_name GLEAMS_embed --cluster_name GLEAMS_cluster --distance_threshold 0.3
 ```

-This will perform DBSCAN clustering on the embeddings.
-The output will be written to the `GLEAMS.cluster.npy` NumPy file with cluster labels per embedding (`-1` indicates noise, minimum cluster size 2).
-Additionally, a tabular file `GLEAMS.cluster.parquet` will be created containing corresponding metadata for the clustered spectra.
-Note that although this `GLEAMS.cluster.parquet` metadata file contains information for the same spectra as the `GLEAMS.embed.parquet` metadata file, the order of the spectra (matching the clustering results) is different.
+This will perform hierarchical clustering on the embeddings with the given distance threshold.
+The output will be written to the `GLEAMS_cluster.npy` NumPy file with cluster labels per embedding (`-1` indicates noise, minimum cluster size 2).
+Additionally, a file `GLEAMS_cluster_medoids.npy` will be created containing indexes of the cluster representative spectra (medoids).

 ### Advanced usage

 Full configuration of GLEAMS, including various configurations to train the neural network, can be modified in the `gleams/config.py` file.

 Contact
 -------

 For more information you can visit the [official code website](https://github.com/bittremieux/GLEAMS) or send an email to <[email protected]>.
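
The output files named in the updated README could be consumed roughly as follows. This is a minimal sketch, not GLEAMS code: the helper `summarize_clusters` is hypothetical, the arrays are synthetic stand-ins written to a temporary directory so the snippet runs without GLEAMS, and it assumes that the medoid file holds row indexes into the embedding array and that the labels are aligned with the embedding rows, neither of which the README states explicitly.

```python
import tempfile
from pathlib import Path

import numpy as np


def summarize_clusters(embed_path, cluster_path, medoid_path):
    """Load GLEAMS-style output files and return basic cluster statistics.

    Hypothetical helper for illustration; not part of GLEAMS.
    """
    embeddings = np.load(embed_path)   # expected shape: (n, 32)
    labels = np.load(cluster_path)     # expected shape: (n,), -1 marks noise
    medoids = np.load(medoid_path)     # ASSUMPTION: row indexes into `embeddings`
    if embeddings.shape[0] != labels.shape[0]:
        raise ValueError("Expected one cluster label per embedding")
    clustered = labels[labels != -1]
    return {
        "n_spectra": int(embeddings.shape[0]),
        "n_clusters": int(np.unique(clustered).size),
        "n_noise": int(np.sum(labels == -1)),
        "medoid_embeddings": embeddings[medoids],
    }


# Synthetic stand-ins for the files that `gleams embed` and `gleams cluster`
# would write; the values are made up purely so the example is runnable.
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    np.save(tmp / "GLEAMS_embed.npy", rng.normal(size=(6, 32)).astype(np.float32))
    np.save(tmp / "GLEAMS_cluster.npy", np.array([0, 0, 1, 1, 1, -1]))
    np.save(tmp / "GLEAMS_cluster_medoids.npy", np.array([0, 3]))
    summary = summarize_clusters(
        tmp / "GLEAMS_embed.npy",
        tmp / "GLEAMS_cluster.npy",
        tmp / "GLEAMS_cluster_medoids.npy",
    )

print(summary["n_spectra"], summary["n_clusters"], summary["n_noise"])
```

The `.parquet` metadata file is not read here; `pandas.read_parquet` would be one way to load it, provided a parquet engine such as pyarrow is installed.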
@@ -79,4 +79,5 @@
num_probe = 1024

# Clustering.
linkage = 'average'
distance_threshold = 0.35
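
To illustrate what `linkage = 'average'` combined with a `distance_threshold` means, here is a small sketch using SciPy. It assumes that SciPy and Euclidean distance are reasonable stand-ins; GLEAMS itself may use a different library, metric, or procedure. The helper `cluster_embeddings` and the toy 2-D points are hypothetical, and singletons are relabeled `-1` only to mirror the README's "minimum cluster size 2" convention.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Values copied from the config hunk; renamed to avoid shadowing SciPy's `linkage`.
linkage_method = 'average'
distance_threshold = 0.35


def cluster_embeddings(embeddings, method, threshold):
    """Hierarchical clustering cut at a distance threshold.

    Hypothetical helper: clusters with a single member are relabeled -1.
    """
    # Build the merge tree (ASSUMPTION: Euclidean distance between embeddings).
    tree = linkage(embeddings, method=method, metric='euclidean')
    # Cut the tree so that merges above `threshold` are not applied.
    flat = fcluster(tree, t=threshold, criterion='distance')
    # Renumber clusters from 0 and mark singletons as noise (-1).
    unique, inverse, counts = np.unique(
        flat, return_inverse=True, return_counts=True)
    keep = counts >= 2
    new_ids = np.full(unique.shape, -1)
    new_ids[keep] = np.arange(keep.sum())
    return new_ids[inverse]


# Toy data: two tight groups and one isolated point.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0],
    [10.0, 10.0],
])
labels = cluster_embeddings(points, linkage_method, distance_threshold)
print(labels)
```

With average linkage, two clusters are merged only while the mean pairwise distance between their members stays below the threshold, so a smaller `distance_threshold` yields more, tighter clusters and more points left as noise.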