
Investigating splitting #14

Open
mwang87 opened this issue Sep 16, 2021 · 4 comments

Comments

@mwang87
Contributor

mwang87 commented Sep 16, 2021

Sometimes spectra with the same precursor m/z and very similar MS/MS spectra do not end up in the same cluster, even with high EPS values. One example:

Falcon Clustering
Networking

Here we can see repetitions of m/z 327:

network_5fae3956b11346e4b120352b735d54b3_73

Specifically, we can see the repetitions in the clustering here:

Link

Just one example, with two separate clusters:

mzspec:GNPS:TASK-48f893dc8a4147e59798910e6c866ce2-workflow_results/clustered_result.mgf:scan:327
mzspec:GNPS:TASK-48f893dc8a4147e59798910e6c866ce2-workflow_results/clustered_result.mgf:scan:326
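For context on what the EPS value controls, here is a minimal density-based clustering sketch: DBSCAN over a precomputed distance matrix. It assumes Falcon's clustering step behaves like DBSCAN with an `eps` distance threshold; the function name `dbscan_precomputed`, the `min_samples` default and the toy distance matrix are hypothetical and not taken from Falcon's code. It only illustrates how `eps` and the core-point rule decide whether nearby spectra merge; it does not claim to explain the split reported above.

```python
from collections import deque
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def dbscan_precomputed(
    dist: Sequence[Sequence[float]], eps: float, min_samples: int = 2
) -> List[int]:
    """Minimal DBSCAN over a precomputed (e.g. cosine) distance matrix."""
    n = len(dist)
    # Neighborhood of i: every j within eps, including i itself.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    # Core points have at least min_samples neighbors (counting themselves).
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels: List[Optional[int]] = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Start a new cluster at an unassigned core point, expand breadth-first.
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    # Points never reached from any core point are noise.
    return [NOISE if label is None else label for label in labels]


if __name__ == "__main__":
    # Toy distances for four spectra: A~B and C~D are close, B~C is moderate.
    dist = [
        [0.00, 0.05, 0.25, 0.30],
        [0.05, 0.00, 0.20, 0.25],
        [0.25, 0.20, 0.00, 0.05],
        [0.30, 0.25, 0.05, 0.00],
    ]
    print(dbscan_precomputed(dist, eps=0.10))  # [0, 0, 1, 1]
    print(dbscan_precomputed(dist, eps=0.22))  # [0, 0, 0, 0]
```

With `min_samples=1` every point is a core point, so the clusters reduce to the connected components of the graph that links every pair within `eps`; larger `min_samples` values can leave isolated spectra unclustered.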


@mwang87
Contributor Author

mwang87 commented Sep 16, 2021

Here is a clustering at EPS 0.5.

https://proteomics2.ucsd.edu/ProteoSAFe/result.jsp?task=690714c8c2434ab3ad76c6323bd0c4bd&view=view_results#%7B%22main._dyn_%23precursor_mz_lowerinput%22%3A%22327%22%2C%22main._dyn_%23precursor_mz_upperinput%22%3A%22328%22%7D

Some examples:

mzspec:GNPS:TASK-690714c8c2434ab3ad76c6323bd0c4bd-workflow_results/clustered_result.mgf:scan:362
mzspec:GNPS:TASK-690714c8c2434ab3ad76c6323bd0c4bd-workflow_results/clustered_result.mgf:scan:363


@mwang87
Contributor Author

mwang87 commented Sep 20, 2021

Link to the file to be clustered: Link.

@bittremieux
Owner

I need to dig a bit deeper into the pairwise distance matrix to see the hashed vector similarities and figure out why the spectra might not be clustered together.
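To make "hashed vector similarities" concrete, the sketch below hashes binned peaks into a fixed-length vector, builds the pairwise cosine-distance matrix, and lists the pairs that are farther apart than `eps`. This is a generic feature-hashing illustration under assumed settings (`bin_size`, `dim`, CRC32 as the hash function); it is not Falcon's actual vectorization, and all function names are hypothetical.

```python
import math
import zlib
from typing import List, Sequence, Tuple


def hash_spectrum(
    peaks: Sequence[Tuple[float, float]], bin_size: float = 0.05, dim: int = 800
) -> List[float]:
    """Hypothetical feature hashing: bin each m/z, hash the bin into one of `dim` slots."""
    vec = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_size)
        # CRC32 is deterministic across runs, unlike Python's salted str hash().
        slot = zlib.crc32(str(bin_index).encode()) % dim
        vec[slot] += intensity
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumes both vectors are already L2-normalized.
    return 1.0 - sum(a * b for a, b in zip(u, v))


def pairwise_distances(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]


def pairs_above_eps(dist: Sequence[Sequence[float]], eps: float) -> List[Tuple[int, int]]:
    """Index pairs (i < j) whose distance exceeds eps, i.e. not direct neighbors."""
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] > eps]
```

Because peaks are binned before hashing, two nearly identical m/z values that straddle a bin edge land in different bins; whether anything like that matters for these particular spectra would have to be checked against the real vectors.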

Looking at the overview here though, the results don't look that bad. The first cluster contains 1133 spectra, and then there are just a few stragglers spread over a few very small clusters. So to a large extent the spectra are grouped in a single large cluster.
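One way to quantify the "one large cluster plus a few stragglers" picture is to count cluster sizes. This sketch assumes the cluster labels have already been loaded as one integer per spectrum, with -1 for unclustered spectra (an assumed representation, not necessarily Falcon's output format); `summarize_clusters` and the `small_size` cutoff are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

NOISE = -1  # assumed label for unclustered spectra


def summarize_clusters(labels: Sequence[int], small_size: int = 5) -> Dict[str, object]:
    """Size of the largest cluster and the small 'straggler' clusters around it."""
    counts = Counter(label for label in labels if label != NOISE)
    n_total = len(labels)
    largest_id, largest_size = counts.most_common(1)[0] if counts else (None, 0)
    # Every other cluster at or below the cutoff counts as a straggler.
    stragglers = {
        cid: size
        for cid, size in counts.items()
        if cid != largest_id and size <= small_size
    }
    return {
        "n_clusters": len(counts),
        "largest_cluster": largest_id,
        "largest_size": largest_size,
        "fraction_in_largest": largest_size / n_total if n_total else 0.0,
        "small_clusters": stragglers,
        "n_noise": sum(1 for label in labels if label == NOISE),
    }
```

For example, `summarize_clusters([0] * 10 + [1, 1, 2, -1])` reports cluster 0 as the largest (10 of 14 spectra), clusters 1 and 2 as small clusters, and one unclustered spectrum.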

Ideally we want all similar spectra clustered together though, so I'll have to look at the data in more detail.

@mwang87
Contributor Author

mwang87 commented Sep 22, 2021

Yeah, I agree that Falcon is doing a pretty good job (especially compared to how MSCluster performed). I just thought it would be interesting to investigate these stragglers, as they make a qualitatively big difference in how the networks are arranged. If this is difficult, I can definitely think about cleaning things up on my end with a hybrid solution using Falcon.
