feat: support fuzzy matching, closes #504 (#505)

Merged: @weareoutman merged 2 commits into master from steve/fuzzy-search on Mar 12, 2025

@weareoutman (Member) commented Mar 12, 2025

closes #504

Summary by CodeRabbit

  • New Features

    • Enhanced search functionality with fuzzy matching for more flexible query results.
    • Introduced a configurable option to adjust fuzzy matching distance via Theme Options.
  • Documentation

    • Updated code snippet formatting and expanded documentation to include fuzzy matching details.
    • Added a new tutorial section demonstrating specialized test examples.
  • Configuration

    • Integrated source-map support for improved debugging during builds.


coderabbitai bot commented Mar 12, 2025

Walkthrough

This pull request introduces fuzzy matching capabilities across various parts of the project. It standardizes code block formatting in the documentation, adds a new configuration option (fuzzyMatchingDistance with a default of 1) to control edit distance in searches, and updates public APIs and interfaces accordingly. Client-side query functions and test cases are enhanced for fuzzy matching, while server utilities and validation schemas are modified to process the new configuration. Additionally, a new section is added to the tutorial docs and a Webpack plugin is introduced in the site configuration.

Changes

  • README.md: Standardized JavaScript code block delimiters; added fuzzyMatchingDistance to Theme Options with a default of 1; reformatted the CSS variables table.
  • docusaurus-search-local/.../theme/worker.ts, docusaurus-search-local/.../utils/__mocks__/proxiedGeneratedConstants.ts, docusaurus-search-local/.../utils/smartQueries.ts, docusaurus-search-local/.../utils/smartQueries.spec.ts: Integrated fuzzy matching into search queries by conditionally including editDistance; introduced the helper __setFuzzyMatchingDistance and updated related tests.
  • docusaurus-search-local/.../declarations.ts, docusaurus-search-local/.../index.ts, docusaurus-search-local/.../shared/interfaces.ts: Added a new public constant and optional properties for fuzzy matching; extended the PluginOptions and QueryTermItem interfaces.
  • docusaurus-search-local/.../server/utils/generate.ts, docusaurus-search-local/.../server/utils/generate.spec.ts, docusaurus-search-local/.../server/utils/validateOptions.ts, docusaurus-search-local/.../server/utils/validateOptions.spec.ts: Updated the generate function and validation schema to incorporate fuzzyMatchingDistance; added tests to verify proper handling and export of this configuration.
  • website/docs/tutorial-basics/markdown-features.mdx, website/docusaurus.config.js: Added a new "Specialize" section to the tutorial document; introduced a new Webpack plugin for source map configuration in the site's configuration.
netlify bot commented Mar 12, 2025

Deploy Preview for easyops-cn-docusaurus-search-local ready!

  • 🔨 Latest commit: 32cca80
  • 🔍 Latest deploy log: https://app.netlify.com/sites/easyops-cn-docusaurus-search-local/deploys/67d0ee88bc9c1f0008127f62
  • 😎 Deploy Preview: https://deploy-preview-505--easyops-cn-docusaurus-search-local.netlify.app
@weareoutman requested a review from Copilot on March 12, 2025 02:08

Pull Request Overview

This PR adds support for fuzzy matching by introducing a new configuration option (fuzzyMatchingDistance) and updating query generation and tests accordingly. Key changes include:

  • Adding a new "fuzzyMatchingDistance" option in configuration, its validation, and documentation.
  • Modifying query generation functions and tests to handle fuzzy matching.
  • Introducing a new webpack plugin for source mapping in the Docusaurus config.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Per-file summary:
  • docusaurus-search-local/src/server/utils/generate.spec.ts: Added tests for fuzzy matching in the generate module
  • website/docusaurus.config.js: Added a webpack plugin for source maps
  • docusaurus-search-local/src/client/utils/smartQueries.ts: Updated imports and query generation to support fuzzy matching
  • docusaurus-search-local/src/client/utils/smartQueries.spec.ts: Added tests to verify fuzzy matching behavior
  • docusaurus-search-local/src/server/utils/validateOptions.spec.ts: Updated options validation tests to include fuzzyMatchingDistance
  • docusaurus-search-local/src/index.ts: Documented and defined the fuzzyMatchingDistance option
  • docusaurus-search-local/src/client/theme/worker.ts: Updated query term construction to include fuzzy matching
  • README.md: Updated documentation to include fuzzyMatchingDistance
  • docusaurus-search-local/src/shared/interfaces.ts: Added an optional "editDistance" property to query term items
  • docusaurus-search-local/src/server/utils/validateOptions.ts: Extended the validation schema with fuzzyMatchingDistance
  • docusaurus-search-local/src/client/utils/__mocks__/proxiedGeneratedConstants.ts: Added fuzzyMatchingDistance mock configuration
  • docusaurus-search-local/src/server/utils/generate.ts: Updated the generated constants file to export fuzzyMatchingDistance
Comments suppressed due to low confidence (2)

docusaurus-search-local/src/client/theme/worker.ts:77

  • Spreading null with the spread operator can cause runtime errors; consider using an empty object (e.g., ...(item.editDistance ? { editDistance: item.editDistance } : {})) to safely include the property.
...(item.editDistance ? { editDistance: item.editDistance } : null),
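For reference, object spread inside an object literal (ES2018) skips null and undefined sources instead of throwing, so the null and {} fallbacks produce the same object and the suggestion is a readability choice rather than a runtime fix. A small TypeScript check, using a hypothetical simplified `TermSketch` type in place of the plugin's QueryTermItem:

```typescript
// Hypothetical stand-in for a query term with an optional edit distance.
interface TermSketch {
  value: string;
  editDistance?: number;
}

// Fallback is null: spreading null in an object literal adds no properties.
function withNullBranch(item: TermSketch): TermSketch {
  return {
    value: item.value,
    ...(item.editDistance ? { editDistance: item.editDistance } : null),
  };
}

// Fallback is an empty object: same result, arguably clearer intent.
function withEmptyObjectBranch(item: TermSketch): TermSketch {
  return {
    value: item.value,
    ...(item.editDistance ? { editDistance: item.editDistance } : {}),
  };
}

console.log(withNullBranch({ value: "docs" })); // { value: 'docs' }
console.log(withNullBranch({ value: "docs", editDistance: 1 })); // { value: 'docs', editDistance: 1 }
```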

docusaurus-search-local/src/client/utils/smartQueries.ts:112

  • [nitpick] Consider renaming the parameter 'editDistance' to 'fuzzyMatchingDistance' for consistency with the rest of the codebase.
function getQueriesMaybeTyping(terms: SmartTerm[], editDistance?: number): SmartQuery[] {

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
docusaurus-search-local/src/server/utils/validateOptions.ts (1)

64-64: Appropriate schema validation for fuzzy matching

Adding fuzzyMatchingDistance to the validation schema with a default value of 1 is a good choice. This provides a reasonable default that allows for one character difference in fuzzy matches, while still keeping precise matching as the primary behavior. The use of Joi.number() correctly restricts this value to numbers only.

Consider documenting what this value actually means (Levenshtein distance) either in a code comment or in the project documentation, to help users understand how to configure this option effectively.
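As a sketch of what such a distance counts, the following TypeScript computes a plain Levenshtein distance (single-character insertions, deletions and substitutions). It is illustrative only: lunr, which performs the actual matching, expands fuzzy terms against its index token set and may also count an adjacent transposition as one edit, so treat this as an approximation. `levenshtein` and `withinDistance` are hypothetical names.

```typescript
// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, or substitutions that turn `a` into `b`.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// With fuzzyMatchingDistance = 1, a query token would match index tokens
// at most one edit away.
const withinDistance = (query: string, token: string, distance: number) =>
  levenshtein(query, token) <= distance;

console.log(levenshtein("serch", "search")); // 1
console.log(withinDistance("docusaurs", "docusaurus", 1)); // true
```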

docusaurus-search-local/src/server/utils/generate.spec.ts (1)

240-255: Great addition of the "fuzzy matching distance" test.

This test validates that the generated file reflects the specified fuzzy matching distance. Consider adding boundary or exceptional tests (e.g., negative values or zero) to ensure robust coverage of all possible inputs.

docusaurus-search-local/src/client/utils/smartQueries.ts (1)

131-160: Appropriate handling of short tokens under fuzzy matching.

The logic to skip applying edit distance for tokens shorter than editDistance is suitable to prevent overly permissive matches. Nonetheless, be sure this aligns with user expectations; some might need fuzzy matching even on short tokens.
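The trade-off is easy to see with a distance of 1: a one-character token would match every other one-character token and any two-character token containing it. A hypothetical sketch of the skip rule, assuming a token must be strictly longer than the distance to receive it (the PR's exact comparison is not shown here):

```typescript
// Hypothetical, simplified stand-in for the plugin's QueryTermItem.
interface QueryTermSketch {
  value: string;
  editDistance?: number;
}

// Attach the edit distance only to tokens strictly longer than it
// (an assumed threshold); shorter tokens stay exact to avoid noisy matches.
function toFuzzyTerms(tokens: string[], editDistance: number): QueryTermSketch[] {
  return tokens.map((value): QueryTermSketch =>
    editDistance > 0 && value.length > editDistance
      ? { value, editDistance }
      : { value },
  );
}

// "a" stays exact; "api" and "search" each get editDistance: 1.
const example = toFuzzyTerms(["a", "api", "search"], 1);
```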

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31f98a5 and ca1e766.

📒 Files selected for processing (14)
  • README.md (4 hunks)
  • docusaurus-search-local/src/client/theme/worker.ts (1 hunks)
  • docusaurus-search-local/src/client/utils/__mocks__/proxiedGeneratedConstants.ts (2 hunks)
  • docusaurus-search-local/src/client/utils/smartQueries.spec.ts (3 hunks)
  • docusaurus-search-local/src/client/utils/smartQueries.ts (2 hunks)
  • docusaurus-search-local/src/declarations.ts (1 hunks)
  • docusaurus-search-local/src/index.ts (1 hunks)
  • docusaurus-search-local/src/server/utils/generate.spec.ts (1 hunks)
  • docusaurus-search-local/src/server/utils/generate.ts (2 hunks)
  • docusaurus-search-local/src/server/utils/validateOptions.spec.ts (9 hunks)
  • docusaurus-search-local/src/server/utils/validateOptions.ts (1 hunks)
  • docusaurus-search-local/src/shared/interfaces.ts (1 hunks)
  • website/docs/tutorial-basics/markdown-features.mdx (1 hunks)
  • website/docusaurus.config.js (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[duplication] ~143-~143: Possible typo: you repeated a word.
Context: ...earch-local-modal-shadow | inset 1px 1px 0 0 hsla(0, 0%, 100%, 0.5),<br />0 3p...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~143-~143: Possible typo: you repeated a word.
Context: ...0%, 0.5),<br />0 3px 8px 0 #555a64|inset 1px 1px 0 0 #2c2e40,<br />0 3px 8px 0 #309...

(ENGLISH_WORD_REPEAT_RULE)

🔇 Additional comments (21)
docusaurus-search-local/src/shared/interfaces.ts (1)

26-26: Good addition to support fuzzy matching

The addition of the optional editDistance property to the QueryTermItem interface correctly supports the fuzzy matching functionality being implemented. This property will likely be used to specify the edit distance for matching terms during searches.

docusaurus-search-local/src/declarations.ts (1)

33-33: Good addition of fuzzy matching constant

The declaration of the fuzzyMatchingDistance constant aligns perfectly with the PR's objective to implement fuzzy matching functionality. This constant will likely be used throughout the codebase to configure the edit distance limit for fuzzy matching.

docusaurus-search-local/src/index.ts (1)

202-209: LGTM: Well-documented new configuration option

The fuzzyMatchingDistance property has been properly added to the PluginOptions interface with clear JSDoc comments explaining its purpose, behavior, and default value.

docusaurus-search-local/src/client/utils/__mocks__/proxiedGeneratedConstants.ts (2)

5-5: LGTM: New variable for fuzzy matching configuration

The fuzzyMatchingDistance variable is properly initialized for mock testing purposes.


14-17: LGTM: Setter function for test configuration

The __setFuzzyMatchingDistance function follows the established pattern used for other configuration options in this mock file.
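A minimal sketch of that mock pattern, assuming the mock exports a mutable `let` plus a setter. Only the two names come from the PR; the initial value of 0 is an assumption for illustration.

```typescript
// Hypothetical excerpt of a Jest manual mock for the generated constants.
// Initial value 0 (fuzzy matching off) is assumed, not taken from the PR.
export let fuzzyMatchingDistance = 0;

// Tests call this setter; modules importing fuzzyMatchingDistance see the
// updated value through the module's live binding.
export function __setFuzzyMatchingDistance(value: number): void {
  fuzzyMatchingDistance = value;
}
```

In a spec, the real module would typically be swapped for this mock via `jest.mock(...)`, and the test would call `__setFuzzyMatchingDistance(1)` (for example in `beforeEach`) before exercising the query builders.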

docusaurus-search-local/src/client/theme/worker.ts (1)

77-79: LGTM: Clean implementation of conditional parameter

Good use of the spread operator to conditionally include the editDistance parameter only when it is set to a truthy value (so an unset or zero distance adds nothing). This enables fuzzy matching while keeping the code clean.

README.md (3)

45-68: LGTM: Standardized code block formatting

The code block delimiters have been standardized from ````js to ```js for better consistency.


103-103: LGTM: Clear documentation for new option

The fuzzyMatchingDistance option is well-documented in the Theme Options table with a clear description and default value.
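For illustration, a site config that sets the option might look like the following TypeScript sketch. The package name is the plugin's npm name; `hashed: true` is just an example of another option, and the note about 0 reflects the validateOptions spec review in this thread rather than a documented guarantee.

```typescript
// Illustrative options for the search theme; values are examples only.
const searchOptions = {
  hashed: true,
  // Maximum edit distance for fuzzy matching (README default: 1).
  // Setting 0 is expected to turn fuzzy matching off.
  fuzzyMatchingDistance: 1,
};

const config = {
  themes: [["@easyops-cn/docusaurus-search-local", searchOptions]],
};

export default config;
```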


140-155:

✅ Verification successful

Verify markdown table formatting

The markdown table formatting looks good, but there are some warnings about repeated words that might be due to the <br /> tags in the table cells.


🏁 Script executed:

#!/bin/bash
# Verify the README.md renders correctly
echo "This is a simple check to ensure the markdown table formatting renders correctly."
echo "Note: The static analysis warnings about repeated words are likely false positives due to <br /> tags in code blocks."
echo "No action needed - just confirming the markdown will render as expected."

Length of output: 572


Markdown Table Formatting Verified – No Changes Required

  • The markdown table in README.md (lines 140–155) renders correctly.
  • The static analysis warnings regarding repeated words are false positives caused by the <br /> tags in the code blocks.
  • No modifications are needed.
website/docusaurus.config.js (1)

131-140: LGTM! Source map configuration looks good.

Adding source maps will help with debugging, which is especially valuable when implementing new features like fuzzy matching. The plugin configuration follows the correct Docusaurus pattern.
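A sketch of what such a plugin could look like, written in TypeScript as it might appear in a docusaurus.config.ts. The plugin name and the `devtool` value are assumptions; the PR's actual plugin body is not shown on this page.

```typescript
// Hypothetical inline plugin; name and devtool value are illustrative.
function sourceMapPlugin() {
  return {
    name: "source-map-plugin",
    // Docusaurus calls configureWebpack during the build and merges the
    // returned partial webpack config into its own.
    configureWebpack() {
      return { devtool: "source-map" };
    },
  };
}

const config = {
  // Docusaurus accepts plugin functions directly in the plugins array.
  plugins: [sourceMapPlugin],
};

export default config;
```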

docusaurus-search-local/src/server/utils/validateOptions.spec.ts (2)

60-60: LGTM! Added fuzzyMatchingDistance to default options.

The default value of 1 for fuzzyMatchingDistance is consistently applied across test cases, which aligns with the fuzzy matching implementation.


163-164: LGTM! Testing explicit fuzzyMatchingDistance configuration.

Good test case for when users explicitly set fuzzyMatchingDistance to 0, which would disable fuzzy matching.

docusaurus-search-local/src/client/utils/smartQueries.spec.ts (3)

6-6: LGTM! New import for fuzzy matching configuration.

Correctly importing the new function to set fuzzy matching distance for testing.


263-308: LGTM! Comprehensive test suite for fuzzy matching.

This new test suite effectively validates the fuzzy matching functionality:

  1. It correctly sets up the fuzzy matching distance
  2. Tests different combinations of tokens
  3. Verifies the generated query syntax includes the ~1 edit distance notation
  4. Correctly handles cases where fuzzy matching shouldn't be applied (very short tokens)

The tests will ensure the feature works as expected.


326-329: LGTM! Updated query transformation to include edit distance.

This modification correctly implements the fuzzy matching syntax by appending the edit distance when applicable.
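To illustrate the `~1` notation the tests check for, here is a hypothetical formatter that renders simplified term objects in lunr's query-string syntax, where a leading `+` marks a required term and a trailing `~N` sets the maximum edit distance. It is a sketch, not the spec's actual helper.

```typescript
// Hypothetical, simplified term shape for illustration.
interface TermSketch {
  value: string;
  required?: boolean;
  editDistance?: number;
}

// Render terms in lunr's query-string notation:
// "+" marks a required term, "~N" sets its maximum edit distance.
function toQueryString(terms: TermSketch[]): string {
  return terms
    .map((term) => {
      const prefix = term.required ? "+" : "";
      const suffix = term.editDistance ? `~${term.editDistance}` : "";
      return `${prefix}${term.value}${suffix}`;
    })
    .join(" ");
}

// Prints "+fuzzy~1 +a": the one-letter token stays exact.
console.log(
  toQueryString([
    { value: "fuzzy", required: true, editDistance: 1 },
    { value: "a", required: true },
  ]),
);
```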

docusaurus-search-local/src/server/utils/generate.ts (2)

23-23: LGTM! Added fuzzyMatchingDistance to configuration extraction.

Correctly extracting the fuzzyMatchingDistance from the config object.


152-159: LGTM! Export fuzzyMatchingDistance for client-side usage.

The implementation follows the established pattern for exporting constants and will make the fuzzy matching distance available to the client-side code.
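Assuming the generated module is built from lines like `export const Mark = null;` (the pattern mentioned in the generate.spec.ts review), a hypothetical helper could serialize each value with JSON.stringify. This is a sketch of the pattern, not the code in generate.ts.

```typescript
// Hypothetical helper: turn one config value into an ES module export line.
// JSON.stringify covers numbers, strings, booleans, null, arrays and plain objects.
function exportConstant(name: string, value: unknown): string {
  return `export const ${name} = ${JSON.stringify(value)};`;
}

const generated = [
  exportConstant("fuzzyMatchingDistance", 1),
  exportConstant("Mark", null),
].join("\n");

console.log(generated);
// export const fuzzyMatchingDistance = 1;
// export const Mark = null;
```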

docusaurus-search-local/src/server/utils/generate.spec.ts (1)

236-238: Looks good regarding the added expectation check.

This new expectation ensures that "export const Mark = null;" is correctly written when using the "hashed with filename" option. No issues spotted.

docusaurus-search-local/src/client/utils/smartQueries.ts (3)

4-8: Import for fuzzy matching distance looks correct.

Importing fuzzyMatchingDistance from proxiedGeneratedConstants is consistent with the other config-based imports. The approach is straightforward and no issues stand out.


111-115: Neat approach for implementing the optional edit distance logic.

The getQueriesMaybeTyping function now correctly passes editDistance to support fuzzy queries. The approach is clear. No functional concerns.

Also applies to: 123-123


100-108:

❓ Verification inconclusive

Potential performance considerations when concatenating fuzzy and exact queries.

By returning both standard and fuzzy queries, plus their subsets, you might generate a large number of queries for big tokens arrays. This is fine for smaller inputs, but keep in mind possible performance overhead if fuzzyMatchingDistance or the number of tokens grows significantly.


🏁 Script executed:

#!/bin/bash
# Exploratory script to identify potential performance hot spots when generating queries.
# We'll grep usage patterns for smartQueries calls, to see if large token arrays or large distances can occur.

rg 'smartQueries\([^)]+' -A 4

Length of output: 2180


Performance Impact Consideration: Query Explosion Risk

The code currently builds both exact and fuzzy query arrays (for both primary and extra terms), which is perfectly acceptable for typical input sizes observed in tests and production. However, if the token arrays become very large or if fuzzyMatchingDistance is enabled, the number of generated queries can increase significantly, potentially impacting performance. Please ensure that, for scenarios with unusually large inputs or increased fuzzy matching thresholds, appropriate benchmarks or input-size safeguards are in place.
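One possible input-size safeguard, sketched under assumptions: both limits below are hypothetical and not part of the PR. It clamps the configured distance and falls back to exact matching for long queries, since each fuzzy term can expand to many index tokens in lunr.

```typescript
// Hypothetical limits; neither is defined by the PR.
const MAX_FUZZY_TOKENS = 6;
const MAX_EDIT_DISTANCE = 2;

// Pick the edit distance for one query: fall back to exact matching (0)
// when the query has too many tokens, otherwise clamp the configured value.
function effectiveEditDistance(tokens: string[], configured: number): number {
  if (tokens.length > MAX_FUZZY_TOKENS) return 0;
  return Math.max(0, Math.min(configured, MAX_EDIT_DISTANCE));
}
```

Clamping keeps a misconfigured large distance from matching most of the index, while the token cap bounds how many fuzzy variants a single long query can produce.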

@weareoutman merged commit c8310f3 into master on Mar 12, 2025
5 of 6 checks passed
@weareoutman deleted the steve/fuzzy-search branch March 12, 2025 02:17
Linked issue: #504 More permissive search (substring, typo)