-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
feat: support fuzzy matching, closes #504 #505
Conversation
WalkthroughThis pull request introduces fuzzy matching capabilities across various parts of the project. It standardizes code block formatting in the documentation, adds a new configuration option ( Changes
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
✅ Deploy Preview for easyops-cn-docusaurus-search-local ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for fuzzy matching by introducing a new configuration option (fuzzyMatchingDistance) and updating queries generation and tests accordingly. Key changes include:
- Adding a new "fuzzyMatchingDistance" option in configuration, its validation, and documentation.
- Modifying query generation functions and tests to handle fuzzy matching.
- Introducing a new webpack plugin for source mapping in the Docusaurus config.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
docusaurus-search-local/src/server/utils/generate.spec.ts | Added tests for fuzzy matching in the generate module |
website/docusaurus.config.js | Added a webpack plugin for source maps |
docusaurus-search-local/src/client/utils/smartQueries.ts | Updated import and query generation to support fuzzy matching |
docusaurus-search-local/src/client/utils/smartQueries.spec.ts | Added tests to verify fuzzy matching behavior |
docusaurus-search-local/src/server/utils/validateOptions.spec.ts | Updated options validation tests to include fuzzyMatchingDistance |
docusaurus-search-local/src/index.ts | Documented and defined fuzzyMatchingDistance option |
docusaurus-search-local/src/client/theme/worker.ts | Updated query term construction to include fuzzy matching |
README.md | Updated documentation to include fuzzyMatchingDistance |
docusaurus-search-local/src/shared/interfaces.ts | Added an optional "editDistance" property to query term items |
docusaurus-search-local/src/server/utils/validateOptions.ts | Extended validation schema with fuzzyMatchingDistance |
docusaurus-search-local/src/client/utils/mocks/proxiedGeneratedConstants.ts | Added fuzzyMatchingDistance mock configuration |
docusaurus-search-local/src/server/utils/generate.ts | Updated generated constants file to export fuzzyMatchingDistance |
Comments suppressed due to low confidence (2)
docusaurus-search-local/src/client/theme/worker.ts:77
- Spreading null with the spread operator can cause runtime errors; consider using an empty object (e.g., ... (item.editDistance ? { editDistance: item.editDistance } : {})) to safely include the property.
...(item.editDistance ? { editDistance: item.editDistance } : null),
docusaurus-search-local/src/client/utils/smartQueries.ts:112
- [nitpick] Consider renaming the parameter 'editDistance' to 'fuzzyMatchingDistance' for consistency with the rest of the codebase.
function getQueriesMaybeTyping(terms: SmartTerm[], editDistance?: number): SmartQuery[] {
8e072b4
to
ca1e766
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
docusaurus-search-local/src/server/utils/validateOptions.ts (1)
64-64
: Appropriate schema validation for fuzzy matchingAdding
fuzzyMatchingDistance
to the validation schema with a default value of 1 is a good choice. This provides a reasonable default that allows for one character difference in fuzzy matches, while still keeping precise matching as the primary behavior. The use ofJoi.number()
correctly restricts this value to numbers only.Consider documenting what this value actually means (Levenshtein distance) either in a code comment or in the project documentation, to help users understand how to configure this option effectively.
docusaurus-search-local/src/server/utils/generate.spec.ts (1)
240-255
: Great addition of the "fuzzy matching distance" test.This test validates that the generated file reflects the specified fuzzy matching distance. Consider adding boundary or exceptional tests (e.g., negative values or zero) to ensure robust coverage of all possible inputs.
docusaurus-search-local/src/client/utils/smartQueries.ts (1)
131-160
: Appropriate handling of short tokens under fuzzy matching.The logic to skip applying edit distance for tokens shorter than
editDistance
is suitable to prevent overly permissive matches. Nonetheless, be sure this aligns with user expectations; some might need fuzzy matching even on short tokens.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
README.md
(4 hunks)docusaurus-search-local/src/client/theme/worker.ts
(1 hunks)docusaurus-search-local/src/client/utils/__mocks__/proxiedGeneratedConstants.ts
(2 hunks)docusaurus-search-local/src/client/utils/smartQueries.spec.ts
(3 hunks)docusaurus-search-local/src/client/utils/smartQueries.ts
(2 hunks)docusaurus-search-local/src/declarations.ts
(1 hunks)docusaurus-search-local/src/index.ts
(1 hunks)docusaurus-search-local/src/server/utils/generate.spec.ts
(1 hunks)docusaurus-search-local/src/server/utils/generate.ts
(2 hunks)docusaurus-search-local/src/server/utils/validateOptions.spec.ts
(9 hunks)docusaurus-search-local/src/server/utils/validateOptions.ts
(1 hunks)docusaurus-search-local/src/shared/interfaces.ts
(1 hunks)website/docs/tutorial-basics/markdown-features.mdx
(1 hunks)website/docusaurus.config.js
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[duplication] ~143-~143: Possible typo: you repeated a word.
Context: ...earch-local-modal-shadow | inset 1px 1px 0 0 hsla(0, 0%, 100%, 0.5),
`0 3p...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~143-~143: Possible typo: you repeated a word.
Context: ...0%, 0.5),<br />
0 3px 8px 0 #555a64|
inset 1px 1px 0 0 #2c2e40,<br />
0 3px 8px 0 #309...
(ENGLISH_WORD_REPEAT_RULE)
🔇 Additional comments (21)
docusaurus-search-local/src/shared/interfaces.ts (1)
26-26
: Good addition to support fuzzy matchingThe addition of the optional
editDistance
property to theQueryTermItem
interface correctly supports the fuzzy matching functionality being implemented. This property will likely be used to specify the edit distance for matching terms during searches.docusaurus-search-local/src/declarations.ts (1)
33-33
: Good addition of fuzzy matching constantThe declaration of the
fuzzyMatchingDistance
constant aligns perfectly with the PR's objective to implement fuzzy matching functionality. This constant will likely be used throughout the codebase to configure the edit distance limit for fuzzy matching.docusaurus-search-local/src/index.ts (1)
202-209
: LGTM: Well-documented new configuration optionThe
fuzzyMatchingDistance
property has been properly added to the PluginOptions interface with clear JSDoc comments explaining its purpose, behavior, and default value.docusaurus-search-local/src/client/utils/__mocks__/proxiedGeneratedConstants.ts (2)
5-5
: LGTM: New variable for fuzzy matching configurationThe
fuzzyMatchingDistance
variable is properly initialized for mock testing purposes.
14-17
: LGTM: Setter function for test configurationThe
__setFuzzyMatchingDistance
function follows the established pattern used for other configuration options in this mock file.docusaurus-search-local/src/client/theme/worker.ts (1)
77-79
: LGTM: Clean implementation of conditional parameterGood use of the spread operator to conditionally include the editDistance parameter only when it's defined. This enables fuzzy matching functionality while maintaining clean code.
README.md (3)
45-68
: LGTM: Standardized code block formattingThe code block delimiters have been standardized from ````js
to ```js
for better consistency.
103-103
: LGTM: Clear documentation for new optionThe
fuzzyMatchingDistance
option is well-documented in the Theme Options table with a clear description and default value.
140-155
:✅ Verification successful
Verify markdown table formatting
The markdown table formatting looks good, but there are some warnings about repeated words that might be due to the
<br />
tags in the table cells.
🏁 Script executed:
#!/bin/bash # Verify the README.md renders correctly echo "This is a simple check to ensure the markdown table formatting renders correctly." echo "Note: The static analysis warnings about repeated words are likely false positives due to <br /> tags in code blocks." echo "No action needed - just confirming the markdown will render as expected."Length of output: 572
Markdown Table Formatting Verified – No Changes Required
- The markdown table in
README.md
(lines 140–155) renders correctly.- The static analysis warnings regarding repeated words are false positives caused by the
<br />
tags in the code blocks.- No modifications are needed.
🧰 Tools
🪛 LanguageTool
[duplication] ~143-~143: Possible typo: you repeated a word.
Context: ...earch-local-modal-shadow |inset 1px 1px 0 0 hsla(0, 0%, 100%, 0.5),
`0 3p...(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~143-~143: Possible typo: you repeated a word.
Context: ...0%, 0.5),<br />
0 3px 8px 0 #555a64|
inset 1px 1px 0 0 #2c2e40,<br />
0 3px 8px 0 #309...(ENGLISH_WORD_REPEAT_RULE)
website/docusaurus.config.js (1)
131-140
: LGTM! Source map configuration looks good.Adding source maps will help with debugging, which is especially valuable when implementing new features like fuzzy matching. The plugin configuration follows the correct Docusaurus pattern.
docusaurus-search-local/src/server/utils/validateOptions.spec.ts (2)
60-60
: LGTM! Added fuzzyMatchingDistance to default options.The default value of 1 for fuzzyMatchingDistance is consistently applied across test cases, which aligns with the fuzzy matching implementation.
163-164
: LGTM! Testing explicit fuzzyMatchingDistance configuration.Good test case for when users explicitly set fuzzyMatchingDistance to 0, which would disable fuzzy matching.
docusaurus-search-local/src/client/utils/smartQueries.spec.ts (3)
6-6
: LGTM! New import for fuzzy matching configuration.Correctly importing the new function to set fuzzy matching distance for testing.
263-308
: LGTM! Comprehensive test suite for fuzzy matching.This new test suite effectively validates the fuzzy matching functionality:
- It correctly sets up the fuzzy matching distance
- Tests different combinations of tokens
- Verifies the generated query syntax includes the
~1
edit distance notation- Correctly handles cases where fuzzy matching shouldn't be applied (very short tokens)
The tests will ensure the feature works as expected.
326-329
: LGTM! Updated query transformation to include edit distance.This modification correctly implements the fuzzy matching syntax by appending the edit distance when applicable.
docusaurus-search-local/src/server/utils/generate.ts (2)
23-23
: LGTM! Added fuzzyMatchingDistance to configuration extraction.Correctly extracting the fuzzyMatchingDistance from the config object.
152-159
: LGTM! Export fuzzyMatchingDistance for client-side usage.The implementation follows the established pattern for exporting constants and will make the fuzzy matching distance available to the client-side code.
docusaurus-search-local/src/server/utils/generate.spec.ts (1)
236-238
: Looks good regarding the added expectation check.This new expectation ensures that
"export const Mark = null;"
is correctly written when using the "hashed with filename" option. No issues spotted.docusaurus-search-local/src/client/utils/smartQueries.ts (3)
4-8
: Import for fuzzy matching distance looks correct.Importing
fuzzyMatchingDistance
fromproxiedGeneratedConstants
is consistent with the other config-based imports. The approach is straightforward and no issues stand out.
111-115
: Neat approach for implementing the optional edit distance logic.The
getQueriesMaybeTyping
function now correctly passeseditDistance
to support fuzzy queries. The approach is clear. No functional concerns.Also applies to: 123-123
100-108
:❓ Verification inconclusive
Potential performance considerations when concatenating fuzzy and exact queries.
By returning both standard and fuzzy queries, plus their subsets, you might generate a large number of queries for big tokens arrays. This is fine for smaller inputs, but keep in mind possible performance overhead if
fuzzyMatchingDistance
or the number of tokens grows significantly.
🏁 Script executed:
#!/bin/bash # Exploratory script to identify potential performance hot spots when generating queries. # We'll grep usage patterns for smartQueries calls, to see if large token arrays or large distances can occur. rg 'smartQueries\([^)]+' -A 4Length of output: 2180
Performance Impact Consideration: Query Explosion Risk
The code currently builds both exact and fuzzy query arrays (for both primary and extra terms), which is perfectly acceptable for typical input sizes observed in tests and production. However, if the token arrays become very large or if
fuzzyMatchingDistance
is enabled, the number of generated queries can increase significantly, potentially impacting performance. Please ensure that, for scenarios with unusually large inputs or increased fuzzy matching thresholds, appropriate benchmarks or input-size safeguards are in place.
closes #504
Summary by CodeRabbit
New Features
Documentation
Configuration