More permissive search (substring, typo) #504

TurtleSmoke · 2025-03-09T12:23:29Z

It would be nice to have an option for a more permissive search. Take the word specialize as an example.

What works in the search:

specialize
special
spe

What doesn't work:

specializ (why?)
pecialize (or other substrings)
sepcialize (typo)

I think substrings should be easy to implement, but I'm not familiar enough with the code to do it. Any guidance?

weareoutman · 2025-03-10T02:10:38Z

I agree it would be great to support approximate string matching.

Under the hood, we use Lunr.js, which implements The Porter Stemming Algorithm. The word specialize seems to be stemmed to special.
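To illustrate why specializ may fail while spe and special succeed, here is a minimal sketch. It assumes the index stores only the stem special and that a trailing-wildcard query is compared as a raw prefix without being stemmed itself; both points are assumptions for illustration, and the helper name is hypothetical rather than part of Lunr.js. The complete word specialize would presumably still match because the query itself gets stemmed to special.

```typescript
// Assumption: the index holds only the stem mentioned in this thread.
const indexedTerms: string[] = ["special"];

// Hypothetical helper: a trailing-wildcard query "q*" matches any
// indexed term that starts with q.
function matchesTrailingWildcard(query: string, terms: string[]): boolean {
  return terms.some((term) => term.startsWith(query));
}

for (const query of ["spe", "special", "specializ", "pecialize"]) {
  // "specializ" is longer than the stored stem, so it cannot be a prefix of it.
  console.log(query, matchesTrailingWildcard(query, indexedTerms));
}
```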

Lunr.js also supports wildcard matching, but we currently enable only trailing wildcards. It seems we could implement approximate string matching by constructing wildcard patterns, but that would take some effort.
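As a rough idea of what constructing wildcard patterns could look like, the sketch below replaces a sliding window of characters with a single "*". The function names and the window approach are hypothetical, not something the plugin or Lunr.js provides, and the matcher is a plain string check written only for this example.

```typescript
// Hypothetical helper: replace each window of `width` characters with "*".
function wildcardPatterns(term: string, width: number): string[] {
  const patterns: string[] = [];
  for (let i = 0; i + width <= term.length; i++) {
    patterns.push(term.slice(0, i) + "*" + term.slice(i + width));
  }
  return patterns;
}

// Matches a word against a pattern containing exactly one "*".
function matchesPattern(pattern: string, word: string): boolean {
  const star = pattern.indexOf("*");
  const prefix = pattern.slice(0, star);
  const suffix = pattern.slice(star + 1);
  return (
    word.length >= prefix.length + suffix.length &&
    word.startsWith(prefix) &&
    word.endsWith(suffix)
  );
}

const patterns = wildcardPatterns("sepcialize", 2);
// "s*cialize" covers the swapped letters, so it matches "specialize".
console.log(patterns.some((p) => matchesPattern(p, "specialize")));
// The same pattern also matches "socialize", which shows how noisy this gets.
console.log(matchesPattern("s*cialize", "socialize"));
```

A window of one character would not catch the sepcialize typo, since a transposition touches two positions. That is part of the effort mentioned above.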

weareoutman · 2025-03-12T02:24:31Z

Introduced fuzzyMatchingDistance in v0.49.0; it defaults to 1.

This does not resolve the stemming issue (e.g., specializ still can't match specialize); I will look into it later.

TurtleSmoke · 2025-03-12T09:42:45Z

Thanks for the update. I've tested it and observed some weird behavior with a fuzzy distance of 2:

If I type wriet it will return in order:

Printing (in Title, i.e. # Printing) (error?)
write (in title) (OK)
print (in codeblock) (wriet -> priet -> print: OK)
Writing (in Title) (error?)
write in "text" (OK)
writing in "text" (error?)
wrote in "text" (wriet -> write -> wrote: OK because transposing is allowed)
printing in "text" (error?)
brief in "text" (wriet -> briet -> brief: OK)
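The edit chains above can be checked with a small distance function. This is a minimal sketch of the optimal string alignment variant of Damerau-Levenshtein distance, where swapping two adjacent characters counts as one edit. It is not Lunr.js's implementation, and the function name is made up for the example.

```typescript
// Optimal string alignment distance: insert, delete, substitute,
// or transpose two adjacent characters, each costing 1.
function osaDistance(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d: number[][] = [];
  for (let i = 0; i < rows; i++) {
    const row = new Array<number>(cols).fill(0);
    row[0] = i; // deleting i characters from a
    d.push(row);
  }
  for (let j = 0; j < cols; j++) {
    d[0][j] = j; // inserting j characters of b
  }
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      let best = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost // substitution or match
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        best = Math.min(best, d[i - 2][j - 2] + 1); // adjacent transposition
      }
      d[i][j] = best;
    }
  }
  return d[a.length][b.length];
}

for (const word of ["write", "wrote", "print", "brief", "writing", "printing"]) {
  console.log(word, osaDistance("wriet", word));
}
```

With this measure, write is at distance 1 and wrote, print and brief are at distance 2, while writing and printing are further than 2 from wriet as raw strings. If they still show up, one possible explanation (an assumption, not verified here) is that the index stores stems, so Printing and print share an indexed term.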

I'm curious because Printing, writing, etc. should not be matched according to the Lunr documentation. I think it may be due to the trailing wildcard, which allows better search for incomplete words.

But the main problem IMO is the ordering: write should be ranked higher than printing (even if they both match for whatever reason). Again, I believe it's due to the trailing wildcard, but I don't understand why the priority is Printing > write > Writing in 'Title' but write > writing > printing in 'text'.

Furthermore, highlighting selects what the user typed and not the "fuzzy" word. In this example, if I click on "write", it will try to highlight "wriet".

Also, I feel like there is a weird interaction between the Porter stemming algorithm and the fuzzy matching; I don't know which one has priority over the other. It does not really matter and is only visible with weird queries, but I'm noting it here for the backlog.

weareoutman · 2025-03-12T10:50:20Z

Lunr.js gives the same score to matches at different edit distances. See olivernn/lunr.js#383

We need to add boosts according to the edit distance.
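One way to picture boosting by edit distance is a query string with one clause per distance, each with a smaller boost. The sketch below assumes Lunr's query-string syntax, where ~N sets the edit distance and ^N sets a boost; the helper name and the boost values are hypothetical and are not what the plugin does.

```typescript
// Hypothetical helper: one clause per edit distance, closer matches boosted more.
function buildBoostedQuery(term: string, maxDistance: number): string {
  const clauses: string[] = [];
  for (let distance = 0; distance <= maxDistance; distance++) {
    const boost = maxDistance - distance + 1; // exact match gets the largest boost
    const fuzzy = distance === 0 ? "" : `~${distance}`;
    clauses.push(`${term}${fuzzy}^${boost}`);
  }
  return clauses.join(" ");
}

console.log(buildBoostedQuery("wriet", 2)); // "wriet^3 wriet~1^2 wriet~2^1"
```

A document matching at distance 1 would satisfy both the ~1 and ~2 clauses and collect both boosts, while a document matching only at distance 2 collects just the smallest one.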

Ref #504

weareoutman · 2025-03-12T12:13:57Z

We're now using a different approach (constructing multiple queries in order). Try v0.49.1.
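A minimal sketch of the idea of running multiple queries in order: search with distance 0 first, then 1, then 2, and append only results not already seen. The SearchFn type, the function name, and the fake result table are all hypothetical stand-ins; this is not the plugin's actual code.

```typescript
// Hypothetical search callback: returns document ids for a term at a given edit distance.
type SearchFn = (term: string, distance: number) => string[];

function searchInOrder(term: string, maxDistance: number, search: SearchFn): string[] {
  const seen = new Set<string>();
  const ordered: string[] = [];
  for (let distance = 0; distance <= maxDistance; distance++) {
    for (const id of search(term, distance)) {
      if (!seen.has(id)) {
        seen.add(id); // keep the first (closest) occurrence only
        ordered.push(id);
      }
    }
  }
  return ordered;
}

// Fake results standing in for a real index.
const fakeResults: Record<number, string[]> = {
  0: [],
  1: ["write"],
  2: ["print", "write", "brief"],
};
const fakeSearch: SearchFn = (_term, distance) => fakeResults[distance] ?? [];

console.log(searchInOrder("wriet", 2, fakeSearch)); // ["write", "print", "brief"]
```

Because closer matches are collected first, ranking no longer depends on the scores of individual fuzzy queries.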

TurtleSmoke · 2025-03-12T15:12:49Z

Thanks for the fix!

I've just retried, and in my opinion, it's much better: the ranking feels more natural. However, there are still some issues:

write or wriet returns randomly ranked occurrences of both write and writing (BAD: expected write and then writing)
writing behaves the same way. (BAD: expected writing then write)
writingg does not work, while writinge and writings do. (BAD: expected writing)

With removeDefaultStemmer: true:

write and wriet correctly return all occurrences of write first, followed by wrote, and not writing (GOOD: I think this is expected).
wri**tt**ing first returns all write occurrences, followed by many unrelated words (bit, wi, fait, droit, suit). (BAD: very noisy results)
writingg works and returns only writing. (GOOD: as expected)

So, I feel like the stemming is not well integrated with fuzzy searching. On its own (and on other examples), it works well, but when combined with the fuzzy option, the results feel inconsistent.

weareoutman added the feature request label Mar 10, 2025

weareoutman added a commit that referenced this issue Mar 12, 2025

feat: support fuzzy matching, closes #504

8e072b4

weareoutman mentioned this issue Mar 12, 2025

feat: support fuzzy matching, closes #504 #505

Merged

weareoutman added a commit that referenced this issue Mar 12, 2025

feat: support fuzzy matching, closes #504

ca1e766

weareoutman closed this as completed in #505 Mar 12, 2025

weareoutman closed this as completed in c8310f3 Mar 12, 2025

easyops-eve mentioned this issue Mar 12, 2025

chore(master): release 0.49.0 #506

Merged

weareoutman added a commit that referenced this issue Mar 12, 2025

fix: refine fuzzy matching order

60d2b0b

Ref #504

weareoutman mentioned this issue Mar 12, 2025

fix: refine fuzzy matching order #507

Merged

weareoutman added a commit that referenced this issue Mar 12, 2025

fix: refine fuzzy matching order (#507)

646b5a0

Ref #504

easyops-eve mentioned this issue Mar 12, 2025

chore(master): release 0.49.1 #508

Merged
