In its simplest form, this tool uses a variant of DBSCAN to cluster peaks in three stages: first in the mz-ims dimensions within each frame, then in the mz-rt dimensions across frames, and finally in the rt-ims dimensions across frames.
These clusters are then used to generate pseudo-spectra, which are searched internally with Sage.
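To make the clustering step concrete, here is a minimal sketch of plain DBSCAN over two scaled dimensions (m/z and ion mobility). It is not the implementation used by this tool, which uses a variant of the algorithm. The `Peak` type, the function names, and the assumption that each scaling value acts as the per-axis distance unit are all hypothetical and chosen only for illustration.

```rust
/// A peak reduced to the two dimensions being clustered (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Peak {
    pub mz: f64,
    pub ims: f64,
}

/// Assumed distance: each axis is divided by its scaling factor, so a
/// distance of 1.0 means "one scaling unit away".
pub fn scaled_distance(a: &Peak, b: &Peak, mz_scaling: f64, ims_scaling: f64) -> f64 {
    let dmz = (a.mz - b.mz) / mz_scaling;
    let dims = (a.ims - b.ims) / ims_scaling;
    (dmz * dmz + dims * dims).sqrt()
}

/// Textbook DBSCAN with a brute-force neighbor search.
/// Returns one label per peak: `Some(cluster_id)` or `None` for noise.
/// A peak counts as its own neighbor, so `min_n` includes the peak itself.
pub fn dbscan(
    peaks: &[Peak],
    mz_scaling: f64,
    ims_scaling: f64,
    min_n: usize,
) -> Vec<Option<usize>> {
    let neighbors = |i: usize| -> Vec<usize> {
        (0..peaks.len())
            .filter(|&j| scaled_distance(&peaks[i], &peaks[j], mz_scaling, ims_scaling) <= 1.0)
            .collect()
    };

    let mut labels: Vec<Option<usize>> = vec![None; peaks.len()];
    let mut visited = vec![false; peaks.len()];
    let mut next_cluster = 0;

    for i in 0..peaks.len() {
        if visited[i] {
            continue;
        }
        visited[i] = true;
        let seeds = neighbors(i);
        if seeds.len() < min_n {
            // Not a core point; stays noise unless a later cluster claims it.
            continue;
        }
        let cluster = next_cluster;
        next_cluster += 1;
        labels[i] = Some(cluster);

        // Expand the cluster through every density-reachable peak.
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            if labels[j].is_none() {
                labels[j] = Some(cluster);
            }
            if visited[j] {
                continue;
            }
            visited[j] = true;
            let reachable = neighbors(j);
            if reachable.len() >= min_n {
                queue.extend(reachable);
            }
        }
    }
    labels
}
```

Calling `dbscan(&peaks, 0.015, 0.03, 2)` on the peaks of a single frame would correspond to the first stage described above, under these assumptions. The brute-force neighbor search is quadratic and only suitable for illustration.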
cargo build --release
./target/release/ionmesh --help
RUST_LOG=info ./target/release/ionmesh ...
It's a TOML file ...
[denoise_config]
mz_scaling = 0.015
ims_scaling = 0.03
ms2_min_n = 2
ms1_min_n = 3
ms1_min_cluster_intensity = 100
ms2_min_cluster_intensity = 50
[tracing_config]
mz_scaling = 0.019999999552965164
rt_scaling = 2.200000047683716
ims_scaling = 0.02
min_n = 2
min_neighbor_intensity = 200
[pseudoscan_generation_config]
rt_scaling = 0.7
quad_scaling = 5.0
ims_scaling = 0.02
min_n = 4
min_neighbor_intensity = 500
[sage_search_config]
static_mods = [[
"C",
57.0215,
]]
variable_mods = [[
"M",
[15.9949],
]]
fasta_path = "./tmp/UP000005640_9606.fasta"
[output_config] # These options are optional; if one is missing, the corresponding file is not written.
out_features_csv = "features.csv"
debug_traces_csv = "debug_traces.csv"
debug_scans_json = "debug_scans.json"
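The optional behavior of `[output_config]` can be modeled with `Option` fields. The sketch below is an illustration only: the struct and method names are hypothetical and are not taken from the tool's source. It shows how a missing key would translate into a skipped output file.

```rust
/// Hypothetical mirror of the `[output_config]` table: every key is optional.
#[derive(Debug, Default, Clone)]
pub struct OutputConfig {
    pub out_features_csv: Option<String>,
    pub debug_traces_csv: Option<String>,
    pub debug_scans_json: Option<String>,
}

impl OutputConfig {
    /// Returns (label, path) pairs only for the outputs that were configured.
    /// Keys left out of the config produce no entry, so nothing is written.
    pub fn enabled_outputs(&self) -> Vec<(&'static str, &str)> {
        vec![
            ("features", self.out_features_csv.as_deref()),
            ("traces", self.debug_traces_csv.as_deref()),
            ("scans", self.debug_scans_json.as_deref()),
        ]
        .into_iter()
        .filter_map(|(label, path)| path.map(|p| (label, p)))
        .collect()
    }
}
```

With only `out_features_csv` set, `enabled_outputs()` yields a single entry and the two debug files are skipped.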
There are a couple of features for development.
RUST_LOG=info # will change the log level ... levels are standard (info, debug, warn, error, trace)
DEBUG_TRACES_FROM_CACHE=1 # If set and non-empty, loads the traces from the cache.
# Trace generation is skipped and the traces are read from the file specified in the config (handy when optimizing the pseudospectra generation).
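The "set and non-empty" rule for `DEBUG_TRACES_FROM_CACHE` can be expressed with the standard library alone. This is a sketch of that rule, not the tool's code. The function names are hypothetical.

```rust
/// Interprets a flag value as "set and non-empty".
pub fn flag_is_set(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty())
}

/// Reads the flag from the process environment.
/// An unset variable (or one that is not valid Unicode) counts as "not set".
pub fn traces_from_cache() -> bool {
    flag_is_set(std::env::var("DEBUG_TRACES_FROM_CACHE").ok().as_deref())
}
```

Under this rule any non-empty value enables the flag, including `0`, so the variable has to be unset or empty to disable it.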
- Use aggregation metrics to re-score sage search.
- [In progress] Do a two-pass pseudospectrum generation, where the first pass finds the centroids and the second pass aggregates around a radius. (This will prevent the issue where common ions, like b2 ions, are assigned only to the most intense spectrum in a window.)
- Right now I believe it is over-aggregating peaks, which leads to a lot of straggler peaks.
- Redefine the rt parameters in the config as a function of the cycle time instead of raw seconds.
- This can help with the observed fact that parameters perform very differently across instrument/method variants. (Some possible reparametrizations: % of frame intensity instead of absolute intensity, number of cycles instead of retention time, making more active use of the cluster intensity rather than the number of neighbors.)
- Add targeted extraction.
- Add detection of MS1 features + notched search instead of wide window search.
- Clean up some of the features and decide which aggregation steps use internal parallelism. (In some steps, running multiple aggregations in parallel is better than parallelizing operations within the aggregation.)
- Fix nomenclature ... I don't like how inconsistent it is (indexed, indexer, and index are used interchangeably ...).
- Compilation warning cleanup.
- Clean up dead/commented out code.
- Refactor the `max_extension_distances` argument in the generic dbscan implementation to prevent the errors that might arise from mixing up the dimensions.
- Should that be a property of the converter?
- Commit to f32/f64 in specific places ... instead of the harder to maintain generic types.
- Add CI/CD to distribute the pre-compiled binaries.
- Add semver checks to the CI/CD pipeline.
- Add IMS output to the sage report.
- Change pseudo-spectrum aggregation
- I am happy with the trace aggregation (It can maybe be generalized to handle synchro or midia).
- Ids are pretty close to the equivalent DDA runs with the correct parameters ... They do seem good via manual inspection but the number of ids is low compared to peptide-centric searches.