vec_locate_matches() returns unexpected multiple needle matches #1960

Closed
t-kalinowski opened this issue Jan 17, 2025 · 1 comment

@t-kalinowski
Member

I can't tell if this is a bug or me not understanding what vec_locate_matches() does.

In this example, I would expect that nrow(matches) == length(needles), because of the filter = "max" arg. But the last two entries in needles each have multiple corresponding haystack matches. Is this expected?

library(vctrs)
haystack <-
  c(6L, 107L, 108L, 112L, 113L, 193L, 265L, 375L, 419L, 420L, 544L, 
    639L, 679L, 797L, 798L, 830L, 947L, 996L, 1145L, 1146L, 1183L, 
    1219L, 1358L, 1414L, 1415L, 1448L, 1517L, 1607L, 1722L, 1723L, 
    1727L, 1728L, 1829L, 1865L, 1865L)
needles <-
  c(501L, 693L, 920L, 1179L, 1496L, 1719L, 1948L, 2228L)

matches <- vec_locate_matches(needles,
                              haystack,
                              condition = ">=",
                              filter = "max")


nrow(matches) == length(needles)
#> [1] FALSE
any(duplicated(matches$needles))
#> [1] TRUE

matches
#>    needles haystack
#> 1        1       10
#> 2        2       13
#> 3        3       16
#> 4        4       20
#> 5        5       26
#> 6        6       28
#> 7        7       34
#> 8        7       35
#> 9        8       34
#> 10       8       35

Created on 2025-01-17 with reprex v2.1.1

@t-kalinowski
Member Author

t-kalinowski commented Jan 17, 2025

I just realized it's because the matched entries are duplicates in haystack. Calling unique(haystack) or passing arg multiple = "any" leads to the expected output from vec_locate_matches().
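
A minimal sketch of both workarounds on a smaller input, assuming vctrs is installed. The toy vectors below are made up for illustration and are not the ones from the report above.

```r
library(vctrs)

# Hypothetical toy data: the value 20L appears twice in the haystack.
haystack <- c(10L, 20L, 20L)
needles <- c(15L, 25L)

# filter = "max" keeps every haystack position tied for the maximum,
# so the second needle matches both copies of 20L.
all_ties <- vec_locate_matches(needles, haystack,
                               condition = ">=",
                               filter = "max")
nrow(all_ties) == length(needles)
#> [1] FALSE

# Workaround 1: multiple = "any" returns a single match per needle,
# picking an arbitrary one of the tied positions.
any_tie <- vec_locate_matches(needles, haystack,
                              condition = ">=",
                              filter = "max",
                              multiple = "any")

# Workaround 2: drop duplicate haystack values before matching.
dedup <- vec_locate_matches(needles, unique(haystack),
                            condition = ">=",
                            filter = "max")
```

Note that with the second workaround the returned haystack positions index into unique(haystack), not the original vector, so they need to be mapped back if the original positions matter.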
