MS2 spectra without a precursor charge are ignored #26
Hi!
Hey, yes, it seems like there is no charge reported. Unfortunately, that's common in public metabolomics data. Are you using the charge information in some way, or are you using this basically as a noise filter?
We're mainly using the charge to split the spectra into charge-disjoint groups, so that spectra with different charge states are not clustered together. This is more relevant for proteomics data of course, where you'll encounter a wider range of charges. We can look into how we can generalize the code a bit so that this information is no longer mandatory.
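To make the charge-disjoint grouping described above concrete, here is a minimal sketch. It is not falcon's actual code: the `Spectrum` record and the `split_by_charge` helper are hypothetical names used only for illustration, and the sketch assumes a missing charge is represented as `None`.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Spectrum(NamedTuple):
    # Hypothetical minimal record; not falcon's internal spectrum type.
    identifier: str
    precursor_mz: float
    precursor_charge: Optional[int]  # None when the file reports no charge


def split_by_charge(spectra):
    """Group spectra into charge-disjoint buckets.

    Spectra without a charge are kept in their own `None` bucket
    instead of being dropped.
    """
    groups = defaultdict(list)
    for spectrum in spectra:
        groups[spectrum.precursor_charge].append(spectrum)
    return dict(groups)


spectra = [
    Spectrum("a", 500.25, 2),
    Spectrum("b", 500.26, 2),
    Spectrum("c", 500.25, 3),
    Spectrum("d", 301.14, None),
]
groups = split_by_charge(spectra)

# A run in which no spectrum reports a charge ends up in a single bucket.
uncharged = split_by_charge([Spectrum("x", 301.14, None), Spectrum("y", 301.15, None)])
```

Clustering would then run within each bucket separately. With this assumed design, spectra with and without a charge never land in the same bucket, while a dataset that lacks charges entirely is still clustered as one group.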
Hey, thank you for working on this! I tried out the branch where you implemented it and it seems to work now. However, it appears you won't allow clusters to be formed between spectra with and without charge. Just putting that here in case you have not thought of it. Thank you!
Good point. Considering that we're clustering spectra by charge but in this case that information is unknown, it's not immediately obvious how this should best be handled.
@Janne98 what do you think? Or maybe you have a better solution? Splitting the clusters by charge is pretty essential though, as Ming has demonstrated in the MS-RT paper that cluster quality is significantly degraded when you don't do that.
Thanks! Interesting that the charge state splitting was the main reason for falcon's better performance. I did not completely grasp that from the MS-RT manuscript. If charge states are that essential, what do you think about deriving them from the precursor MS1 scan where possible? I am actually surprised that, for most vendors, this information is already reported in the mzML files. I am not sure about the proportion of (public) metabolomics data where you have this information available, but I think I have stumbled across such data randomly twice at this point, so I imagine it's not that rare. Hence it would probably be an issue for larger-scale public data reanalysis efforts.
Ah, I wouldn't say that this is the main reason, but rather one of the contributing factors. We know it makes some difference, so we don't want to make falcon worse by ignoring the relevant charge information. We know that the actual clustering approach in falcon is also significantly superior to what MS-Cluster does (and we are further upgrading it in recent developments), so there are multiple things at play to explain the performance difference.
Afaik charge state is missing when vendor/on-board instrument software can't determine it. So in that case, how likely is it that it can be derived by post-processing? I actually don't know, I haven't really looked at that yet. We have some plans that would require going back to the MS1 data (#19), so such a task could be tackled at the same time. I'm not sure we'll manage to do that for the next release though.
Indeed. It should mostly be older data though, right? Afaik the charge is missing when no isotopic envelope can be detected, which is (much) more often the case for older instruments with a lower resolution.
Interesting. Typically, as in the files I shared at the beginning of the issue and also files such as this, there does not appear to be charge information for any of the MS2 scans. So this must be due to what is saved into the file rather than what can be derived from the isotope pattern. It also cannot be due to resolution, since in metabolomics we mostly have charge 1, which requires only a bit more than unit resolution.
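The isotopic-envelope argument in the comments above can be illustrated with a small sketch. It relies on the fact that consecutive isotope peaks of an ion with charge z are spaced roughly 1.00335/z apart in m/z. The function name `infer_charge`, its parameters, and the tolerance are assumptions made for this illustration; this is not functionality that falcon provides, and a real implementation would also have to consider peak intensities and overlapping envelopes.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def infer_charge(precursor_mz, ms1_mzs, max_charge=4, tol=0.01):
    """Guess the precursor charge from the first isotope peak in an MS1 scan.

    Hypothetical helper: tries the highest charge first, because an
    envelope of charge z also contains a peak at the charge-1 spacing
    whenever z divides evenly into it. Returns None if nothing matches.
    """
    for charge in range(max_charge, 0, -1):
        expected = precursor_mz + ISOTOPE_SPACING / charge
        if any(abs(mz - expected) <= tol for mz in ms1_mzs):
            return charge
    return None


# Singly charged envelope: peaks about 1.003 apart.
singly = infer_charge(301.1410, [301.1410, 302.1444, 303.1470])

# Doubly charged envelope: peaks about 0.502 apart.
doubly = infer_charge(500.25, [500.25, 500.7517, 501.2534])

# No isotope peak in the scan: charge stays unknown.
unknown = infer_charge(400.0, [400.0])
```

For charge 1 the peaks are about 1 m/z apart, which is why little more than unit resolution is enough to resolve them.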
Yeah, fair enough. I think the vendor also matters: Thermo typically reports the charge, but Waters/Bruker less often. One good thing though: if none of the spectra have a charge, there won't be any charge splitting and all spectra will be clustered together. So maybe it's less of an issue than feared?
I ran into the issue because I like clustering spectra from files of different studies, since that lets me align things despite different retention times (with admittedly high tolerances when clustering across vendors).
Yes, that's a valid use case that we definitely want to support. Looking at the last example, the MS1 actually has a nice isotopic pattern. So it should be possible to correct the charge information for at least some spectra. This might be functionality for version x+1 though. In the meantime, maybe the |
Ah, that would be a good intermediate solution. Unfortunately, |
The documentation here says that TOF data should also work. In any case, this is an interesting feature that we want to support, but it might not be for the next release.
Oh, it seems I have an outdated version. Thanks!
Hello, I was trying to cluster mzXML files from https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=88a7dfeeecb74131a6d6bfb7a9db0a46 in WSL:Ubuntu-22.04 but it does not seem to recognize any spectra. My parameters and output are below:
I tried converting the mzXML files to mzML and MGF via ProteoWizard 3.0.24124, but that did not solve the issue. I have confirmed that the files do indeed contain MS2 spectra.
Thank you for the support!