-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
matchMz, CompareMassParam #64
Conversation
Merge branch 'master' into andrea # Conflicts: # NAMESPACE
R/matchMz.R
Outdated
slots = c( | ||
queryAdducts = "adductClass", | ||
targetAdducts = "adductClass"), | ||
contains = "MzParam", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ok or should I just inherit from "Param"
? The name might be misleading but actually "MzParam"
would be ok also for mass matching. We could even inherit from "Mass2MzParam"
adding only the slot "queryAdducts" (and possibly rename slot "adduct" to "targetAdducts" in "Mass2MzParam"). What do you think would be best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to rename the adduct
slot in Mass2MzParam
to targetAdduct
and also add a queryAdduct
slot to Mass2MzParam
(even if we don't use it for now). Then I would rename the CompareMassParam
class maybe to Mz2MassParam
(see also discussion issue #56 ) and make that extending Mass2MzParam
.
I'm actually even wondering if we shouldn't also add a slot FUN
to that class with default value @FUN = mass2mz
for Mass2MzParam
and @FUN = mz2mass
for Mz2MassParam
- the user should not have the possibility to set this slot (i.e. it should not be included as a parameter in the Mass2MzParam
or Mz2MassParam
functions...
R/matchMz.R
Outdated
target = target_mass, | ||
tolerance = param@tolerance, | ||
ppm = param@ppm) | ||
if (nrow(matches[[i]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or create separate columns for query and target adducts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to know which adducts matched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I referenced one line above the correct one. Information on matched adducts is added in the same way as in PR #58.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more helpful to have them in separate columns, i.e. "query_adduct"
and "target_adduct"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Andrea for implementing! While going through the code I realized (like you did) that there is room to avoid redundancy and code duplication. Maybe we should first discuss some suggested changes also in issue #56 and then adapt this PR.
R/matchMz.R
Outdated
target = target_mass, | ||
tolerance = param@tolerance, | ||
ppm = param@ppm) | ||
if (nrow(matches[[i]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more helpful to have them in separate columns, i.e. "query_adduct"
and "target_adduct"
.
R/matchMz.R
Outdated
slots = c( | ||
queryAdducts = "adductClass", | ||
targetAdducts = "adductClass"), | ||
contains = "MzParam", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to rename the adduct
slot in Mass2MzParam
to targetAdduct
and also add a queryAdduct
slot to Mass2MzParam
(even if we don't use it for now). Then I would rename the CompareMassParam
class maybe to Mz2MassParam
(see also discussion issue #56 ) and make that extending Mass2MzParam
.
I'm actually even wondering if we shouldn't also add a slot FUN
to that class with default value @FUN = mass2mz
for Mass2MzParam
and @FUN = mz2mass
for Mz2MassParam
- the user should not have the possibility to set this slot (i.e. it should not be included as a parameter in the Mass2MzParam
or Mz2MassParam
functions...
R/matchMz.R
Outdated
metadata = list(param = param)) | ||
target_mz <- .mass_to_mz_df(target, param@targetAdducts) | ||
matches <- do.call( | ||
rbind, bpmapply(seq_along(query), query, FUN = .getMatches, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason why for this function we left bpmapply
instead of a for
loop like in all the other functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I don't know. bpmapply
would make only sense if .getMatches
is slow. Question is also if for loop or simple lapply
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for me to convert to for loop if it simplifies the code (and is not slower).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly the first implementation had lapply
then we moved to bpmapply
and finally to the for
loop (which is used in all the other cases except this one). To be sure maybe I can check which one is best.
R/matchMz.R
Outdated
@@ -430,18 +560,18 @@ setMethod("matchMz", | |||
target = "numeric", | |||
param = "MzParam"), | |||
function(query, target, param, BPPARAM = SerialParam()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here, like in all the other function where bpmapply
is not used maybe BPPARAM
is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a comprehensive PR ;) Looks good Andrea, I have some minor change requests but the rest really looks great.
R/matchMz.R
Outdated
signature = c(query = "numeric", | ||
target = "data.frameOrSimilar", | ||
param = "ValueParam"), | ||
function(query, target, param, valueColname, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when valueColname
is not defined? do you have a unit test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Actually since you suggested not to export ValueParam
I thought also matchMz, ValueParam
wouldn't be used directly by the user but maybe like you said it wouldn't be bad to export it even though maybe it's not very likely that the user wants to match other things than mz and mass. If valueColname
is not defined I think an error should be thrown (because there's not default value for it) but I could add a condition if (missing(valueColname)) stop("
ValueParam has to be provided")
and check this in a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if we support only column names (i.e. character
) a cleaner solution would be to have valueParam = character()
in the function definition and check if length(valueParam) > 0
.
R/matchMz.R
Outdated
metadata = list(param = param)) | ||
target_mz <- .mass_to_mz_df(target, param@targetAdducts) | ||
matches <- do.call( | ||
rbind, bpmapply(seq_along(query), query, FUN = .getMatches, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for me to convert to for loop if it simplifies the code (and is not slower).
R/matchMz.R
Outdated
signature = c(query = "numeric", | ||
target = "numeric", | ||
param = "Mz2MassParam"), | ||
function(query, target, param, BPPARAM = SerialParam()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the BPPARAM
if you're using a for
loop.
R/matchMz.R
Outdated
Matched(query = query, target = target, | ||
matches = do.call(rbind, res), | ||
metadata = list(param = param)) | ||
target_mz <- data.frame(index = seq_along(target), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks identical to matchMz, numeric, numeric, ValueParam
- so I think you can completely remove this method, since MzParam
is also inheriting from ValueParam
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! Thanks!
R/matchMz.R
Outdated
signature = c(query = "numeric", | ||
target = "numeric", | ||
param = "ValueParam"), | ||
function(query, target, param, BPPARAM = SerialParam()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop the BPPARAM
here.
Hi, I’ve pushed the requested updates and also tried to correct documentation. I would still have the following questions. |
Yes, we would need a |
I have added |
I would go for the solution using only the |
I thought we wanted to have the slot |
Maybe we can get feedback from @jorainer ? I didn't use it so far with a |
Calling |
Okay. That is fine for me. I wasn't sure about the return of |
Sorry, I'm a little late on the discussion here. I think I haven't thought we will need to support also a |
Actually, I would prefer to first merge this PR before adding support for |
I agree. I too had the impression that this PR maybe is getting rather lenghty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this PR. Let's merge and then work separately on the SummarizedExperiment
.
Hi, this PR contains a first implementation of the
matchMz, CompareMassParam
that we discussed in issue #56. There are a few things I'm not very sure of and I will try to add specific question on the code.