closest for m/z - RT pair #20

michaelwitting · 2020-05-31T18:59:44Z

One thing I have to do quite often is to search for a specific m/z - RT pair in a data set. A function similar to closest would be great, one that returns not only the closest match but all matches. To make it future-proof, maybe it could already support CCS values as well?

What do you think @jorainer ?

jorainer · 2021-02-10T07:02:51Z

hm, could be that I already implemented this somewhere (can't remember if it was in xcms or some other package). That's indeed an important function. Maybe named like closestPair or similar? But it's definitely a tricky one!

michaelwitting · 2021-02-10T07:37:02Z

I don't know, could be that I missed it so far. But I think this is definitely something for MetaboCoreUtils. Can be used in MS1 annotation, alignment etc...

jorainer · 2023-09-06T06:35:36Z

Picking that issue up again: I would suggest the following definition:

mclosest <- function(x, table, ppm = 0, tolerance = Inf) {
...
}

where x and table can be two dimensional arrays (matrix or data.frame) with the same number of columns (doesn't have to be limited to 2). The function should then find for each row in x the row in table with the smallest distance considering each pair of columns (i.e. smallest difference between column 1 in both arrays, column 2 in both arrays etc). Other properties:

ppm and tolerance should be numeric of length 1 or equal to the number of columns of x.
the result should be an integer of length equal to the number of rows of x, each element being the index (row) in table with the closest values.
I would not use any similarity algorithm (like Euclidean distance or similar) to calculate the similarity, because the columns are expected to contain values with different units (e.g. if x and table are data frames with m/z and retention time values).
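
A minimal sketch of how ppm and tolerance of length 1 or of length equal to the number of columns could be turned into one accepted difference per column. The helper names expand_param and max_difference are hypothetical, and the rule "tolerance plus the ppm-relative part of the value in x" is an assumption; the thread does not spell out how the two are combined.

```r
# Hypothetical helper (not from the thread): recycle a length-1 parameter to
# one value per column, or fail if the length does not fit.
expand_param <- function(value, n_col, name) {
    if (length(value) == 1L)
        return(rep(value, n_col))
    if (length(value) != n_col)
        stop("'", name, "' has to be of length 1 or ", n_col)
    value
}

# Assumed rule: the maximal accepted difference for one row of x is the
# absolute tolerance plus the ppm-relative part of the value itself.
max_difference <- function(x_row, ppm = 0, tolerance = Inf) {
    n_col <- length(x_row)
    ppm <- expand_param(ppm, n_col, "ppm")
    tolerance <- expand_param(tolerance, n_col, "tolerance")
    tolerance + abs(x_row) * ppm * 1e-6
}

# 10 ppm on the m/z column, 2 (seconds) absolute on the retention time column.
max_difference(c(mz = 200, rt = 60), ppm = c(10, 0), tolerance = c(0, 2))
```

Keeping one limit per column avoids mixing units: the m/z limit scales with the m/z value, while the retention time limit stays absolute.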

Implementation suggestion:

calculate absolute difference between pairwise columns in x and table (i.e. absolute difference of values in column 1 of x and table, absolute difference of values in column 2 of x and table etc.) - might be that we will need to loop over rows in x - or alternatively do some matrix operation?
replace differences larger than allowed by ppm and tolerance with NA
rank differences (or replace with their order)
return for each row in x the index of the row in table with the lowest rank product
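
The four steps above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation that was later merged: the function name mclosest_sketch is hypothetical, it loops over the rows of x, and it assumes the accepted difference per column is tolerance plus the ppm-relative part of the value in x.

```r
mclosest_sketch <- function(x, table, ppm = 0, tolerance = Inf) {
    x <- as.matrix(x)
    table <- as.matrix(table)
    n_col <- ncol(x)
    stopifnot(ncol(table) == n_col,
              length(ppm) %in% c(1L, n_col),
              length(tolerance) %in% c(1L, n_col))
    ppm <- rep_len(ppm, n_col)
    tolerance <- rep_len(tolerance, n_col)
    res <- rep(NA_integer_, nrow(x))
    for (i in seq_len(nrow(x))) {
        # Step 1: absolute difference per column (rows = rows of table).
        diffs <- abs(sweep(table, 2, x[i, ], "-"))
        # Step 2: differences larger than allowed become NA.
        allowed <- tolerance + abs(x[i, ]) * ppm * 1e-6
        diffs[sweep(diffs, 2, allowed, ">")] <- NA
        # Step 3: rank the differences within each column, keeping NA.
        ranks <- matrix(apply(diffs, 2, rank, na.last = "keep"),
                        nrow = nrow(table))
        # Step 4: rank product per row of table; any NA gives NA.
        rank_prod <- apply(ranks, 1, prod)
        if (!all(is.na(rank_prod)))
            res[i] <- which.min(rank_prod)
    }
    res
}

# Toy data (made up for illustration): m/z in column 1, retention time in 2.
x <- data.frame(mz = c(100.001, 200.002, 300), rt = c(50, 120, 10))
tbl <- data.frame(mz = c(200, 100, 100.0005), rt = c(121, 80, 51))

mclosest_sketch(x, tbl)                          # no restriction
mclosest_sketch(x, tbl, tolerance = c(0.01, 2))  # third row of x has no match
```

In this sketch a row of table only qualifies if every column is within its limit, because a single NA makes the whole rank product NA. Ties in the rank product are resolved by which.min, which returns the first minimum.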

The name mclosest should tell that this is a multi closest calculation... not perfect name, so open for alternative suggestions.

would that be something you would be OK with @michaelwitting ? I could let Philippine @philouail implement that.

michaelwitting · 2023-09-06T20:20:28Z

Will this always match columns called mz and then the additional one?
I'm just thinking how this could be used in a flexible manner to match retention times or collisional cross sections. Shall the user be allowed to define name of the column, which shall be used for the additional matching? Of course it has to be present then in both input data frames.

jorainer · 2023-09-07T06:16:44Z

I would require that both x and table have the same number of columns. That would keep this function very generic and could be applied to many different use cases. The user has to ensure that these are provided in the correct order (i.e. first columns being m/z, second columns retention times, third columns ...).

Examples:

mclosest(a[, c("mzmed", "rtmed")], b[, c("mz", "rt")]) would return for each row in a the index of the row in b (the table argument) with the best match.
mclosest(a[, c("mzmed", "rtmed")], b[, c("mz", "rt")], ppm = 0, tolerance = c(0.01, 2)) would also return the best match, but only if the difference between the m/z values in a and b is below 0.01 and the difference in retention times is below 2.
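
To illustrate what the per-column tolerance in the second example means, here is a small base R sketch that does not call mclosest itself. The data frames a and b are made-up toy data, and treating a difference exactly equal to the tolerance as acceptable is an assumption.

```r
# Toy data (hypothetical): a has an extra column that is not used for matching.
a <- data.frame(mzmed = c(100.001, 200.002), rtmed = c(50, 120),
                into = c(1e4, 2e4))
b <- data.frame(mz = c(200, 100, 100.0005), rt = c(121, 80, 51))

# Several columns are selected with a character vector, in matching order:
# m/z first, retention time second.
a_sub <- as.matrix(a[, c("mzmed", "rtmed")])
b_sub <- as.matrix(b[, c("mz", "rt")])

tolerance <- c(0.01, 2)

# Candidates in b for the first row of a: every column has to be within
# its own tolerance.
diffs <- abs(sweep(b_sub, 2, a_sub[1, ], "-"))
within <- sweep(diffs, 2, tolerance, "<=")
candidates <- which(rowSums(within) == ncol(within))
candidates
```

Only the third row of b passes both limits here: the second row is close in m/z but 30 apart in retention time.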

does this make sense?

michaelwitting · 2023-09-07T08:12:25Z

Makes total sense to me.

jorainer · 2023-10-02T08:53:38Z

@philouail implemented this now (PR #71). It's in the main branch and I'll push to Bioconductor.

michaelwitting added the enhancement label May 31, 2020

michaelwitting mentioned this issue Feb 9, 2021

Retention time indexing for metabolomics #33

Closed

philouail self-assigned this Sep 12, 2023

This was referenced Sep 29, 2023

addition of mclosest function #70

Closed

addition mclosest #71

Merged

jorainer closed this as completed Oct 2, 2023
