Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: add wildcard expr #7

Merged
merged 2 commits into from
Oct 31, 2024
Merged

feat: add wildcard expr #7

merged 2 commits into from
Oct 31, 2024

Conversation

laojianzi
Copy link
Owner

@laojianzi laojianzi commented Oct 31, 2024

Summary by Sourcery

Add support for wildcard expressions in the parser, enabling pattern matching with identifiers and strings. Refactor existing expression handling to accommodate this new feature and introduce extensive tests to validate the implementation.

New Features:

  • Introduce wildcard expressions in the parser, allowing for flexible pattern matching with identifiers and strings.

Tests:

  • Add comprehensive tests for wildcard expressions.

@laojianzi laojianzi self-assigned this Oct 31, 2024
Copy link

sourcery-ai bot commented Oct 31, 2024

Reviewer's Guide by Sourcery

This PR refactors the AST package by splitting it into multiple files and adds support for wildcard expressions in the parser. The implementation moves each AST node type into its own file and introduces a new WildcardExpr type for handling wildcard patterns in identifiers and strings.

Class diagram for AST refactoring and WildcardExpr addition

classDiagram
    class Expr {
        <<interface>>
        +Pos() int
        +End() int
        +String() string
    }

    class ParenExpr {
        +L int
        +R int
        +Expr Expr
        +Pos() int
        +End() int
        +String() string
    }
    Expr <|-- ParenExpr

    class CombineExpr {
        +LeftExpr Expr
        +Keyword token.Kind
        +RightExpr Expr
        +Pos() int
        +End() int
        +String() string
    }
    Expr <|-- CombineExpr

    class BinaryExpr {
        +pos int
        +Field string
        +Operator token.Kind
        +Value Expr
        +HasNot bool
        +Pos() int
        +End() int
        +String() string
    }
    Expr <|-- BinaryExpr

    class Literal {
        +pos int
        +end int
        +Kind token.Kind
        +Value string
        +WithDoubleQuote bool
        +Pos() int
        +End() int
        +String() string
    }
    Expr <|-- Literal

    class WildcardExpr {
        +Literal *Literal
        +Indexes []int
        +Pos() int
        +End() int
        +String() string
    }
    Expr <|-- WildcardExpr
    Literal <|-- WildcardExpr

    note for WildcardExpr "New class for handling wildcard patterns in identifiers and strings"
Loading

File-Level Changes

Change Details Files
Split AST package into multiple files for better organization
  • Moved each AST node type (Binary, Combine, Literal, Paren, Wildcard) into separate files
  • Added corresponding test files for each AST node type
  • Removed the original monolithic ast.go file
ast/ast.go
ast/binary.go
ast/binary_test.go
ast/combine.go
ast/combine_test.go
ast/literal.go
ast/literal_test.go
ast/paren.go
ast/paren_test.go
Implement wildcard expression support
  • Added new WildcardExpr type for handling wildcard patterns
  • Modified parser to detect and handle wildcard characters (*) in identifiers and strings
  • Added support for escaped wildcards (*)
  • Added comprehensive tests for wildcard expressions
ast/wildcard.go
ast/wildcard_test.go
parser/parser.go
parser/lexer.go

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

deepsource-io bot commented Oct 31, 2024

Here's the code health analysis summary for commits f992192..16c6e90. View details on DeepSource ↗.

Analysis Summary

AnalyzerStatusSummaryLink
DeepSource Go LogoGo✅ SuccessView Check ↗
DeepSource Test coverage LogoTest coverage✅ Success
❗ 9 occurences introduced
🎯 4 occurences resolved
View Check ↗

Code Coverage Report

MetricAggregateGo
Composite Coverage83.2% (down 1.1% from main)83.2% (down 1.1% from main)
Line Coverage83.2% (down 1.1% from main)83.2% (down 1.1% from main)
New Composite Coverage81%81%
New Line Coverage81%81%

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @laojianzi - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


var indexes []int

runes := []rune(tok.Value)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: The wildcard detection logic needs to properly handle escaped sequences

The current implementation only checks for a single backslash before '*', but doesn't handle cases where the backslash itself is escaped (e.g. "foo\*bar"). Consider implementing proper escape sequence parsing that accounts for all levels of escaping.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progressing #3

@laojianzi laojianzi mentioned this pull request Oct 31, 2024
12 tasks
@laojianzi laojianzi merged commit ce58a86 into main Oct 31, 2024
4 checks passed
Copy link

@llamapreview llamapreview bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 PR Summary

  • Purpose and scope of changes: Introduce support for wildcard expressions in the KQL parser, allowing for flexible pattern matching with identifiers and strings. The changes involve refactoring the existing expression handling to accommodate this new feature and introducing extensive tests to validate the implementation.
  • Key components modified: AST package and parser logic.
  • Impact assessment: Improved query flexibility and robustness, potentially increasing the complexity of the parser logic.

1.2 Architecture Changes

  • System design modifications: Split the AST package into multiple files for better organization. Each AST node type (Binary, Combine, Literal, Paren, Wildcard) is moved into separate files.
  • Component interactions: The parser is modified to detect and handle wildcard characters (*) in identifiers and strings.
  • Integration points: The new WildcardExpr type is introduced to handle wildcard patterns, and corresponding test files are added for each AST node type.

2. Detailed Technical Analysis

2.1 Code Logic Deep-Dive

Core Logic Changes

Wildcard Detection Logic

  • File Path: parser/parser.go
    • Submitted PR Code:
      func (p *defaultParser) parseWildcard() (ast.Expr, error) {
          kind := p.lexer.Token.Kind
      
          tok, err := p.expect(kind)
          if err != nil {
              return nil, err
          }
      
          pos, end := tok.Pos, tok.End
          if kind == token.TokenKindString { // with double quote "
              pos -= 1
              end += 1
          }
      
          lit := ast.NewLiteral(pos, end, kind, tok.Value)
          if kind != token.TokenKindIdent && kind != token.TokenKindString {
              return lit, nil
          }
      
          var indexes []int
          runes := []rune(tok.Value)
          for i := range runes {
              if runes[i] == '*' && (i == 0 || runes[i-1] != '\\') { // skip escaped wildcard
                  indexes = append(indexes, i)
              }
          }
      
          if len(indexes) == 0 { // not found wildcard
              return lit, nil
          }
      
          return ast.NewWildcardExpr(lit, indexes), nil
      }
    • Analysis:
      • The current logic only checks for a single backslash before '*'.
      • Does not handle cases where the backslash itself is escaped (e.g. "foo\*bar").
    • LlamaPReview Suggested Improvements:
      func (p *defaultParser) parseWildcard() (ast.Expr, error) {
          kind := p.lexer.Token.Kind
      
          tok, err := p.expect(kind)
          if err != nil {
              return nil, err
          }
      
          pos, end := tok.Pos, tok.End
          if kind == token.TokenKindString { // with double quote "
              pos -= 1
              end += 1
          }
      
          lit := ast.NewLiteral(pos, end, kind, tok.Value)
          if kind != token.TokenKindIdent && kind != token.TokenKindString {
              return lit, nil
          }
      
          var indexes []int
          runes := []rune(tok.Value)
          escape := false
          for i := range runes {
              if !escape && runes[i] == '\\' {
                  escape = true
                  continue
              }
              if runes[i] == '*' && !escape {
                  indexes = append(indexes, i)
              }
              escape = false
          }
      
          if len(indexes) == 0 { // not found wildcard
              return lit, nil
          }
      
          return ast.NewWildcardExpr(lit, indexes), nil
      }

2.2 Implementation Quality

  • Code organization and structure: The AST package is now better organized with each node type in its own file.
  • Design patterns usage: The parser logic is modular and follows good design principles.
  • Error handling approach: Errors are handled gracefully, but there is room for improvement in the wildcard detection logic.
  • Resource management: Memory usage may increase slightly due to the storage of wildcard indexes.

3. Critical Findings

3.1 Potential Issues

  • 🔴 Critical Issues

    • Issue description: The wildcard detection logic needs to properly handle escaped sequences.
    • Impact: Incorrect parsing of escaped wildcards can lead to unexpected behavior.
    • Recommendation: Improve the wildcard detection logic to handle multiple levels of escaping.
  • 🟡 Warnings

    • Warning description: Ensure comprehensive test coverage for various wildcard patterns.
    • Potential risks: Missing edge cases can lead to incomplete testing.
    • Suggested improvements: Add more test cases to cover edge cases and complex patterns.

3.2 Code Quality Concerns

  • Maintainability aspects: The code is well-organized and modular.
  • Readability issues: The code is generally readable, but the wildcard detection logic could be clearer.
  • Performance bottlenecks: The parser's performance with wildcard expressions needs to be validated through performance testing.

4. Security Assessment

4.1 Security Considerations

  • Authentication/Authorization impacts: No direct impact.
  • Data handling concerns: No direct impact.
  • Input validation: Ensure that wildcard patterns do not introduce new attack vectors.
  • Security best practices: Validate that the parser correctly handles escaped sequences to prevent injection attacks.

4.2 Vulnerability Analysis

  • Potential security risks: Incorrect handling of escaped wildcards.
  • Mitigation strategies: Implement proper escape sequence parsing.
  • Security testing requirements: Test the parser with various wildcard patterns to ensure correct behavior.

5. Testing Strategy

5.1 Test Coverage

  • Unit test analysis: Ensure all AST node types are thoroughly tested.
  • Integration test requirements: Test the parser with complex queries containing wildcards.
  • Edge cases coverage: Cover cases with multiple wildcards and escaped characters.

5.2 Test Recommendations

Suggested Test Cases

func TestWildcard(t *testing.T) {
    type args struct {
        pos             int
        end             int
        kind            token.Kind
        value           string
        withDoubleQuote bool
        indexes         []int
    }

    cases := []struct {
        name        string
        args        args
        wantPos     int
        wantEnd     int
        wantString  string
        wantIndexes []int
    }{
        {
            name: "only wildcard on ident",
            args: args{
                end:   1,
                kind:  token.TokenKindIdent,
                value: "*",
            },
            wantEnd:     1,
            wantString:  "*",
            wantIndexes: []int{0},
        },
        {
            name: "only wildcard on string",
            args: args{
                pos:             1,
                end:             2,
                kind:            token.TokenKindString,
                value:           "*",
                withDoubleQuote: true,
            },
            wantPos:     1,
            wantEnd:     2,
            wantString:  `"*"`,
            wantIndexes: []int{1},
        },
        {
            name: "int value with wildcard on ident",
            args: args{
                end:   3,
                kind:  token.TokenKindIdent,
                value: "4*9",
            },
            wantEnd:     3,
            wantString:  "4*9",
            wantIndexes: []int{1},
        },
        {
            name: "int value with multi-wildcard on ident",
            args: args{
                end:   3,
                kind:  token.TokenKindIdent,
                value: "*0*",
            },
            wantEnd:     3,
            wantString:  "*0*",
            wantIndexes: []int{0, 2},
        },
        {
            name: "float value with wildcard on ident",
            args: args{
                end:   4,
                kind:  token.TokenKindIdent,
                value: "0.*9",
            },
            wantEnd:     4,
            wantString:  "0.*9",
            wantIndexes: []int{2},
        },
        {
            name: "float value with multi-wildcard on ident",
            args: args{
                end:   4,
                kind:  token.TokenKindIdent,
                value: "*.9*",
            },
            wantEnd:     4,
            wantString:  "*.9*",
            wantIndexes: []int{0, 3},
        },
        {
            name: "string value with wildcard on ident",
            args: args{
                end:   3,
                kind:  token.TokenKindIdent,
                value: "f*o",
            },
            wantEnd:     3,
            wantString:  "f*o",
            wantIndexes: []int{1},
        },
        {
            name: "string value with multi-wildcard on ident",
            args: args{
                end:   3,
                kind:  token.TokenKindIdent,
                value: "*o*",
            },
            wantEnd:     3,
            wantString:  "*o*",
            wantIndexes: []int{0, 2},
        },
        {
            name: "value with wildcard on string",
            args: args{
                end:             5,
                kind:            token.TokenKindString,
                value:           "f*o",
                withDoubleQuote: true,
            },
            wantEnd:     5,
            wantString:  `"f*o"`,
            wantIndexes: []int{2},
        },
        {
            name: "value with multi-wildcard on string",
            args: args{
                end:             5,
                kind:            token.TokenKindString,
                value:           "*o*",
                withDoubleQuote: true,
            },
            wantEnd:     5,
            wantString:  `"*o*"`,
            wantIndexes: []int{1, 3},
        },
    }

    for _, c := range cases {
        t.Run(c.name, func(t *testing.T) {
            expr := ast.NewLiteral(c.args.pos, c.args.end, c.args.kind, c.args.value)
            assert.Equal(t, c.wantPos, expr.Pos())
            assert.Equal(t, c.wantEnd, expr.End())
            assert.Equal(t, c.wantString, expr.String())
        })
    }
}
  • Coverage improvements: Aim for at least 90% test coverage.
  • Performance testing needs: Measure the performance impact of wildcard expressions on large queries.

6. Documentation & Maintenance

6.1 Documentation Requirements

  • API documentation updates: Document the new WildcardExpr type and its usage.
  • Architecture documentation: N/A
  • Configuration changes: N/A
  • Usage examples: N/A

6.2 Maintenance Considerations

  • Long-term maintainability: The code is well-organized and modular.
  • Technical debt assessment: Address the wildcard detection logic to handle escaped sequences.
  • Monitoring requirements: Monitor the performance of the parser in production.

7. Deployment & Operations

7.1 Deployment Impact

  • Deployment strategy: Follow standard deployment practices.
  • Rollback plan: N/A
  • Configuration changes: N/A

7.2 Operational Considerations

  • Monitoring requirements: Track any errors related to wildcard expressions.
  • Performance metrics: Measure the parser's performance with wildcard expressions.
  • Resource utilization: Ensure efficient resource usage.

8. Summary & Recommendations

8.1 Key Action Items

  1. Fix the wildcard detection logic to handle multiple levels of escaping (High Priority).
  2. Add comprehensive test cases for various wildcard patterns and edge cases (Medium Priority).
  3. Optimize the parsing logic for wildcards to improve performance (Low Priority).

8.2 Future Considerations

  • Long-term improvements: Continuously improve the parser's efficiency.
  • Technical debt items: Address any remaining issues with the wildcard detection logic.
  • Scalability considerations: Ensure the parser remains efficient with the new feature.

By addressing the identified issues and following the recommended actions, this PR can be improved to ensure a robust and efficient implementation of wildcard expressions in the KQL parser.

@laojianzi laojianzi deleted the feat/value-wildcard branch October 31, 2024 08:10
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant