Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matchMz, CompareMassParam #64

Merged
merged 7 commits into from
Mar 23, 2022
Merged

matchMz, CompareMassParam #64

merged 7 commits into from
Mar 23, 2022

Conversation

andreavicini
Copy link
Collaborator

Hi, this PR contains a first implementation of the matchMz, CompareMassParam that we discussed in issue #56. There are a few things I'm not very sure of and I will try to add specific question on the code.

R/matchMz.R Outdated
slots = c(
queryAdducts = "adductClass",
targetAdducts = "adductClass"),
contains = "MzParam",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ok or should I just inherit from "Param"? The name might be misleading but actually "MzParam" would be ok also for mass matching. We could even inherit from "Mass2MzParam" adding only the slot "queryAdducts" (and possibly rename slot "adduct" to "targetAdducts" in "Mass2MzParam"). What do you think would be best?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to rename the adduct slot in Mass2MzParam to targetAdduct and also add a queryAdduct slot to Mass2MzParam (even if we don't use it for now). Then I would rename the CompareMassParam class maybe to Mz2MassParam (see also discussion issue #56 ) and make that extending Mass2MzParam.

I'm actually even wondering if we shouldn't also add a slot FUN to that class with default value @FUN = mass2mz for Mass2MzParam and @FUN = mz2mass for Mz2MassParam - the user should not have the possibility to set this slot (i.e. it should not be included as a parameter in the Mass2MzParam or Mz2MassParam functions...

R/matchMz.R Outdated Show resolved Hide resolved
R/matchMz.R Outdated
target = target_mass,
tolerance = param@tolerance,
ppm = param@ppm)
if (nrow(matches[[i]]))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or create separate columns for query and target adducts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to know which adducts matched.

Copy link
Collaborator Author

@andreavicini andreavicini Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I referenced one line above the correct one. Information on matched adducts is added in the same way as in PR #58.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more helpful to have them in separate columns, i.e. "query_adduct" and "target_adduct".

@andreavicini andreavicini marked this pull request as ready for review March 2, 2022 11:31
@jorainer jorainer mentioned this pull request Mar 2, 2022
Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrea for implementing! While going through the code I realized (like you did) that there is room to avoid redundancy and code duplication. Maybe we should first discuss some suggested changes also in issue #56 and then adapt this PR.

R/matchMz.R Outdated Show resolved Hide resolved
R/matchMz.R Outdated
target = target_mass,
tolerance = param@tolerance,
ppm = param@ppm)
if (nrow(matches[[i]]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more helpful to have them in separate columns, i.e. "query_adduct" and "target_adduct".

R/matchMz.R Show resolved Hide resolved
R/matchMz.R Show resolved Hide resolved
R/matchMz.R Outdated
slots = c(
queryAdducts = "adductClass",
targetAdducts = "adductClass"),
contains = "MzParam",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to rename the adduct slot in Mass2MzParam to targetAdduct and also add a queryAdduct slot to Mass2MzParam (even if we don't use it for now). Then I would rename the CompareMassParam class maybe to Mz2MassParam (see also discussion issue #56 ) and make that extending Mass2MzParam.

I'm actually even wondering if we shouldn't also add a slot FUN to that class with default value @FUN = mass2mz for Mass2MzParam and @FUN = mz2mass for Mz2MassParam - the user should not have the possibility to set this slot (i.e. it should not be included as a parameter in the Mass2MzParam or Mz2MassParam functions...

R/matchMz.R Outdated
metadata = list(param = param))
target_mz <- .mass_to_mz_df(target, param@targetAdducts)
matches <- do.call(
rbind, bpmapply(seq_along(query), query, FUN = .getMatches,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason why for this function we left bpmapply instead of a for loop like in all the other functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't know. bpmapply would make only sense if .getMatches is slow. Question is also if for loop or simple lapply?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me to convert to for loop if it simplifies the code (and is not slower).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly the first implementation had lapply then we moved to bpmapply and finally to the for loop (which is used in all the other cases except this one). To be sure maybe I can check which one is best.

R/matchMz.R Outdated
@@ -430,18 +560,18 @@ setMethod("matchMz",
target = "numeric",
param = "MzParam"),
function(query, target, param, BPPARAM = SerialParam()) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here, like in all the other function where bpmapply is not used maybe BPPARAM is not needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right.

@andreavicini andreavicini requested a review from jorainer March 11, 2022 07:45
Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a comprehensive PR ;) Looks good Andrea, I have some minor change requests but the rest really looks great.

R/matchMz.R Outdated
signature = c(query = "numeric",
target = "data.frameOrSimilar",
param = "ValueParam"),
function(query, target, param, valueColname,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when valueColname is not defined? do you have a unit test for that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Actually since you suggested not to export ValueParam I thought also matchMz, ValueParam wouldn't be used directly by the user but maybe like you said it wouldn't be bad to export it even though maybe it's not very likely that the user wants to match other things than mz and mass. If valueColname is not defined I think an error should be thrown (because there's not default value for it) but I could add a condition if (missing(valueColname)) stop("ValueParam has to be provided") and check this in a unit test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, if we support only column names (i.e. character) a cleaner solution would be to have valueParam = character() in the function definition and check if length(valueParam) > 0.

R/matchMz.R Outdated
metadata = list(param = param))
target_mz <- .mass_to_mz_df(target, param@targetAdducts)
matches <- do.call(
rbind, bpmapply(seq_along(query), query, FUN = .getMatches,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me to convert to for loop if it simplifies the code (and is not slower).

R/matchMz.R Outdated
signature = c(query = "numeric",
target = "numeric",
param = "Mz2MassParam"),
function(query, target, param, BPPARAM = SerialParam()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the BPPARAM if you're using a for loop.

R/matchMz.R Outdated
Matched(query = query, target = target,
matches = do.call(rbind, res),
metadata = list(param = param))
target_mz <- data.frame(index = seq_along(target),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to matchMz, numeric, numeric, ValueParam - so I think you can completely remove this method, since MzParam is also inheriting from ValueParam.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! Thanks!

R/matchMz.R Outdated
signature = c(query = "numeric",
target = "numeric",
param = "ValueParam"),
function(query, target, param, BPPARAM = SerialParam()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop the BPPARAM here.

R/matchMz.R Show resolved Hide resolved
R/matchMz.R Show resolved Hide resolved
@andreavicini
Copy link
Collaborator Author

Hi, I’ve pushed the requested updates and also tried to correct documentation. I would still have the following questions. SummarizedExperiment for query should be supported for all the various matching approaches/parameter objects? I noticed that at the moment it should work only for those using also retention time for the match. Should also a Mz2MassRtParam be added?

@andreavicini andreavicini requested a review from jorainer March 16, 2022 14:45
@michaelwitting
Copy link
Collaborator

Yes, we would need a Mz2MassRtParam for matching between ion modes and use rtime as additional filter. Here two SummarizedExperiment can be also the input, one as query and one as target

@andreavicini
Copy link
Collaborator Author

I have added matchMz, Mz2MassRtParam but SummarizedExperiment is not yet supported and I would like to ask your opinion on that. If besides query we allow also target to be SummarizedExperiment maybe we should create 3 different classes inheriting from Matched. At the moment we have MatchedSummarizedExperiment where only query is expected to be SummarizedExperiment and maybe we should add another 2 for the cases when only target is SummarizedExperiment or both are. Alternatively since the only thing that changes between these is whether or not the rowData is called on query and/or target, maybe we could define one function that depending on the type of query/target returns identity or rowData and use only the Matched class with that. What do you think would be best?

@michaelwitting
Copy link
Collaborator

I would go for the solution using only the rowData and have simple Matched object as results containing the rowData of both, query and target

@andreavicini
Copy link
Collaborator Author

I thought we wanted to have the slot @query of the matched object to contain the object passed via matchMz parameter query but yes as you said we could also could fill slot @query with rowData(query). In that case then we wouldn’t need the MatchedSummarizedExperiment class or similar any more

@michaelwitting
Copy link
Collaborator

Maybe we can get feedback from @jorainer ? I didn't use it so far with a SummarizedExperiment. How would the return data look like when I'm calling matchedData is it the complete data with intensities or only everything in rowData of query and target?

@andreavicini
Copy link
Collaborator Author

Calling matchedData on a MatchedSummarizedExperiment returns by default all columns in rowData(query) and target (plus variables associated to the matches such as score)

@michaelwitting
Copy link
Collaborator

Okay. That is fine for me. I wasn't sure about the return of matchedData, if it would have returned everything that would have been a lot. I think it makes sense then to have a MatchSummarizedExperiment

@jorainer
Copy link
Member

Sorry, I'm a little late on the discussion here. I think @query or @target should always contain the full input object (e.g. the full SummarizedExperiment) but, as @andreavicini pointed out, matching is done on rowData and hence rowData is also presented with the matchedData function.

I haven't thought we will need to support also a target = SummarizedExperiment, but maybe that's good to have - for completeness.

@jorainer
Copy link
Member

Actually, I would prefer to first merge this PR before adding support for SummarizedExperiment @andreavicini - otherwise the PR becomes rather lengthy and hard to read.

@andreavicini
Copy link
Collaborator Author

I agree. I too had the impression that this PR maybe is getting rather lenghty.

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this PR. Let's merge and then work separately on the SummarizedExperiment.

@jorainer jorainer merged commit 193c065 into master Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants