Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: add fuzz testing for parser and handle escaped colons #10

Merged
merged 3 commits into from
Nov 29, 2024

Conversation

laojianzi
Copy link
Owner

@laojianzi laojianzi commented Nov 29, 2024

Summary by Sourcery

Enhance the parser's robustness by introducing fuzz testing and improving the lexer to handle escaped characters more accurately. Add new test cases to cover edge cases and ensure the parser's stability against malformed input.

New Features:

  • Introduce fuzz testing for the parser to ensure robustness against unexpected input.

Enhancements:

  • Improve the lexer to handle escaped colons correctly and refine token collection logic.

Tests:

  • Add a new test suite for fuzz testing the parser, including a variety of seed inputs to test different query formats.
  • Add specific test cases to handle edge cases in parsing, such as unexpected tokens and escaped characters.

Copy link

sourcery-ai bot commented Nov 29, 2024

Reviewer's Guide by Sourcery

This PR adds fuzzing tests to the parser package and fixes several bugs discovered through fuzzing. The changes include improvements to token handling in the lexer, parser modifications for better operator handling, and new test cases for previously uncovered edge cases.

Sequence diagram for fuzzing test process

sequenceDiagram
    actor Developer
    participant FuzzParser
    participant Parser
    participant Lexer
    Developer->>FuzzParser: Add seed queries
    FuzzParser->>Parser: Parse query
    Parser->>Lexer: Tokenize query
    Lexer-->>Parser: Return tokens
    Parser-->>FuzzParser: Return statement
    FuzzParser->>Parser: Parse statement.String()
    Parser->>Lexer: Tokenize statement
    Lexer-->>Parser: Return tokens
    Parser-->>FuzzParser: Return statement
    FuzzParser-->>Developer: Assert no errors and equality
Loading

Class diagram for parser and lexer changes

classDiagram
    class Parser {
        +Stmt() ast.Expr
    }
    class Lexer {
        +shouldBreak(i int, isString, withEscape bool, endChar rune) bool
        +collectNextToken(start int) string
    }
    class TestParserFuzzBugs {
        +TestParserFuzzBugs(t *testing.T)
    }
    class FuzzParser {
        +FuzzParser(f *testing.F)
    }
    Parser --> Lexer
    TestParserFuzzBugs --> Parser
    FuzzParser --> Parser
    note for Lexer "Improved token handling logic"
    note for Parser "Modified operator handling logic"
Loading

File-Level Changes

Change Details Files
Added fuzzing infrastructure and test cases
  • Implemented a new FuzzParser function with comprehensive seed inputs
  • Added basic safety checks for parsing, including no-panic verification
  • Included validation for String() output stability and re-parsing capability
parser/fuzz_test.go
parser/testdata/fuzz/FuzzParser/45eecbed51a417f5
parser/testdata/fuzz/FuzzParser/74ba8e9a0ce951ee
Enhanced lexer token handling
  • Improved character peeking logic for better escape sequence handling
  • Added special handling for colon characters in non-string contexts
  • Modified token collection break conditions
parser/lexer.go
Updated parser operator validation
  • Added additional check for field token kind in binary expression parsing
parser/parser.go
Added regression tests for fuzzing-discovered bugs
  • Created test cases for edge cases with escaped AND keywords
  • Added tests for various colon placement scenarios
  • Included error case validation for invalid syntax
parser/parser_test.go

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai bot changed the title @sourcery-ai feat: add fuzz testing for parser and handle escaped colons Nov 29, 2024
Copy link
Contributor

deepsource-io bot commented Nov 29, 2024

Here's the code health analysis summary for commits 733963f..d9796a1. View details on DeepSource ↗.

Analysis Summary

AnalyzerStatusSummaryLink
DeepSource Go LogoGo✅ SuccessView Check ↗
DeepSource Test coverage LogoTest coverage✅ Success
❗ 1 occurence introduced
View Check ↗

Code Coverage Report

MetricAggregateGo
Composite Coverage84.9% (down 0.4% from main)84.9% (down 0.4% from main)
Line Coverage84.9% (down 0.4% from main)84.9% (down 0.4% from main)
New Composite Coverage66.7%66.7%
New Line Coverage66.7%66.7%

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @laojianzi - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

@llamapreview llamapreview bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 PR Summary

  • Business value and requirements alignment: The PR enhances the parser's robustness by introducing fuzz testing and improving the lexer to handle escaped characters more accurately. This aligns with the business requirement to ensure the parser can handle a wider range of inputs, including edge cases and malformed queries.
  • Key components modified: The PR primarily modifies the parser, lexer, and test files.
  • Impact assessment: The improvements will make the parser more robust, accurate, and reliable, benefiting users by improving the overall reliability of the KQL parser.

1.2 Architecture Changes

  • System design modifications: Introduction of fuzz testing to the parser package.
  • Component interactions: The lexer's improved handling of escaped characters will directly impact the parser's ability to process queries accurately.
  • Integration points: The new fuzz testing infrastructure will integrate with the existing testing framework.
  • Dependency changes and implications: No new dependencies are introduced, but the testing suite is expanded with fuzz testing.

2. Detailed Technical Analysis

2.1 Code Logic Deep-Dive

parser/fuzz_test.go - FuzzParser

  • Submitted PR Code:
    func FuzzParser(f *testing.F) {
    	// Add initial corpus
    	seeds := []string{
    		"field:value",
    		"field: value",
    		"field : value",
    		`field: "value"`,
    		"field: *",
    		"field: value*",
    		"field: *value",
    		"field: *value*",
    		"field > 10",
    		"field >= 10",
    		"field < 10",
    		"field <= 10",
    		"field: true",
    		"field: false",
    		"field: null",
    		"field1: value1 AND field2: value2",
    		"field1: value1 OR field2: value2",
    		"NOT field: value",
    		"(field: value)",
    		"(field1: value1) AND (field2: value2)",
    		`field: "value with spaces"`,
    		`field: "value with \"escaped\" quotes"`,
    		`field: "value with
     newline"`,
    		"field1: value1 AND field2: value2 OR field3: value3",
    		"field1: (value1 OR value2) AND field2: value3",
    	}
    
    	for _, seed := range seeds {
    		f.Add(seed)
    	}
    
    	f.Fuzz(func(t *testing.T, query string) {
    		if strings.TrimSpace(query) == "" {
    			return
    		}
    
    		// TODO: Current fuzzing implementation has limitations in input/output validation.
    		// This test only covers basic safety checks:
    		// 1. No panics during parsing
    		// 2. String() output can be re-parsed
    		// 3. String() output remains stable
    		//
    		// Contributions welcome for better validation approaches :)
    		stmt, err := parser.New(query).Stmt()
    		if err != nil || stmt == nil {
    			return
    		}
    
    		stmt2, err := parser.New(stmt.String()).Stmt()
    		assert.NoError(t, err)
    		assert.Equal(t, stmt.String(), stmt2.String())
    	})
    }
    • Analysis:
      • Current logic and potential issues: The fuzz testing function adds a variety of seed inputs to test different query formats. The current implementation has basic safety checks but lacks comprehensive input/output validation.
      • Edge cases and error handling: The function handles empty queries by returning early. However, it does not cover all edge cases, such as malformed queries that might cause panics.
      • Cross-component impact: The fuzz testing will help identify issues in the parser and lexer, improving overall robustness.
      • Business logic considerations: Ensures the parser can handle a wide range of inputs, including edge cases and malformed queries.
    • LlamaPReview Suggested Improvements:
      // Enhance the TODO comment with specific suggestions for better validation
      // TODO: Current fuzzing implementation has limitations in input/output validation.
      // Suggestions:
      // 1. Implement a more robust validation mechanism to check the semantic correctness of the parsed statements.
      // 2. Add checks for known edge cases and malformed inputs.
      // 3. Consider using a validation library or framework to automate some of these checks.
    • Improvement rationale:
      • Technical benefits: Enhances the robustness of the fuzz testing by covering more edge cases and validating the semantic correctness of the parsed statements.
      • Business value: Improves the reliability of the parser, ensuring it can handle a wider range of inputs.
      • Risk assessment: Low risk, as the suggestions are mainly about enhancing the validation mechanism without changing the core logic.

parser/lexer.go - shouldBreak

  • Submitted PR Code:
    func (l *defaultLexer) shouldBreak(i int, isString, withEscape bool, endChar rune) bool {
    	ch := l.peek(i)
    	if isString && !withEscape && ch == endChar {
    		return true
    	}
    
    	if !isString && !withEscape {
    		if unicode.IsSpace(ch) || ch == ')' || ch == ':' {
    			return true
    		}
    	}
    
    	// not \:
    	if !isString && withEscape && ch == ':' && (!l.peekOk(i-1) || l.peek(i-1) != '\\') {
    		return true
    	}
    
    	return false
    }
    • Analysis:
      • Current logic and potential issues: The function checks if token collection should stop based on various conditions. The addition of handling escaped colons improves accuracy but might miss some edge cases.
      • Edge cases and error handling: The function handles escaped colons correctly but does not cover all possible escape sequences.
      • Cross-component impact: Improvements in the lexer will directly affect the parser's ability to handle queries accurately.
      • Business logic considerations: Ensures the lexer can handle escaped characters correctly, improving the parser's robustness.
    • LlamaPReview Suggested Improvements:
      // Add a comment explaining the purpose of the new condition
      // Handle escaped colons correctly to prevent premature token breaking
      if !isString && withEscape && ch == ':' && (!l.peekOk(i-1) || l.peek(i-1) != '\\') {
      	return true
      }
    • Improvement rationale:
      • Technical benefits: Clarifies the purpose of the new condition, making the code more maintainable.
      • Business value: Ensures the lexer handles escaped colons correctly, improving the parser's accuracy.
      • Risk assessment: Low risk, as the suggestion is about adding a comment for better code understanding.

parser/parser.go - parseLiteral

  • Submitted PR Code:
    expr, err := p.parseLiteral()
    if err != nil {
    	return nil, err
    }
    
    if !hasNot {
    	pos = expr.Pos()
    }
    
    op := p.lexer.Token.Kind
    if !op.IsOperator() || !p.lexer.lastTokenKind.IsField() {
    	return ast.NewBinaryExpr(pos, "", 0, expr, hasNot), nil
    }
    • Analysis:
      • Current logic and potential issues: The parser checks if the current token is an operator and if the last token is a field. This ensures that only valid binary expressions are parsed. However, it does not handle cases where the operator is valid but the field is not.
      • Edge cases and error handling: The current implementation might miss edge cases where the operator is valid but the field is not, leading to incorrect parsing.
      • Cross-component impact: This change affects the parser's ability to handle queries accurately, impacting the overall robustness of the system.
      • Business logic considerations: Ensures that only valid binary expressions are parsed, improving the accuracy of query processing.
    • LlamaPReview Suggested Improvements:
      // Add a comment explaining the purpose of the new condition
      // Ensure that the operator is valid and the last token is a field
      if !op.IsOperator() || !p.lexer.lastTokenKind.IsField() {
      	return ast.NewBinaryExpr(pos, "", 0, expr, hasNot), nil
      }
    • Improvement rationale:
      • Technical benefits: Clarifies the purpose of the new condition, making the code more maintainable.
      • Business value: Ensures that only valid binary expressions are parsed, improving the accuracy of query processing.
      • Risk assessment: Low risk, as the suggestion is about adding a comment for better code understanding.

parser/parser_test.go - TestParserFuzzBugs

  • Submitted PR Code:
    func TestParserFuzzBugs(t *testing.T) {
    	tests := []struct {
    		query   string
    		want    ast.Expr
    		wantErr error
    	}{
    		{
    			"0 :0",
    			nil,
    			errors.New("line 0:2 expected keyword OR|AND|NOT, but got \":\"

0 :0
^
"),
},
{
"\AND :0",
ast.NewBinaryExpr(0, "\AND", token.TokenKindOperatorEql, ast.NewLiteral(
6, 7, token.TokenKindInt, "0", nil,
), false),
nil,
},
{
"\AND: 0",
ast.NewBinaryExpr(0, "\AND", token.TokenKindOperatorEql, ast.NewLiteral(
6, 7, token.TokenKindInt, "0", nil,
), false),
nil,
},
}

for _, tt := range tests {
	t.Run(tt.query, func(t *testing.T) {
		stmt, err := parser.New(tt.query).Stmt()

		if tt.wantErr == nil {
			assert.NoError(t, err)
		} else {
			assert.EqualError(t, err, tt.wantErr.Error())
		}

		assert.EqualValues(t, tt.want, stmt)
	})
}

}

- **Analysis**:
  * **Current logic and potential issues**: The test cases cover various edge cases, including escaped AND keywords and colon placement scenarios. However, they do not cover all possible edge cases, such as malformed queries with invalid escape sequences.
  * **Edge cases and error handling**: The current implementation handles specific edge cases but might miss others, leading to incomplete test coverage.
  * **Cross-component impact**: These test cases help identify issues in the parser and lexer, improving overall robustness.
  * **Business logic considerations**: Ensures the parser can handle a wide range of inputs, including edge cases and malformed queries.
- **LlamaPReview Suggested Improvements**:
  ```go
  // Add more test cases to cover additional edge cases
  {
  	"invalid\\escape: 0",
  	nil,
  	errors.New("line 0:1 expected keyword OR|AND|NOT, but got \"invalid\\escape\"
invalid\\escape: 0
^
"),
  },
  {
  	"field:\\invalid",
  	nil,
  	errors.New("line 0:7 expected keyword OR|AND|NOT, but got \"\\invalid\"
field:\\invalid
     ^
"),
  },
  ```
- **Improvement rationale**:
  * **Technical benefits**: Enhances test coverage by including more edge cases, ensuring the parser can handle a wider range of inputs.
  * **Business value**: Improves the reliability of the parser, ensuring it can handle various edge cases and malformed queries.
  * **Risk assessment**: Low risk, as the suggestions are about adding more test cases for better coverage.

### 2.2 Implementation Quality
- **Code Structure**:
* **Organization and modularity**: The code is well-organized, with clear separation of concerns between the lexer, parser, and tests.
* **Design pattern adherence**: The code follows Go idioms and best practices, such as using interfaces and structs for modularity.
* **Reusability aspects**: The fuzz testing infrastructure can be reused for other components, enhancing overall test coverage.
* **Maintainability factors**: The code is maintainable, with clear comments and logical structure.

- **Error Handling**:
* **Exception scenarios coverage**: The fuzz testing covers basic safety checks but lacks comprehensive input/output validation.
* **Recovery mechanisms**: The code handles errors gracefully, returning early in case of invalid inputs.
* **Logging and monitoring**: The code uses assertions for testing, but more detailed logging would be beneficial for monitoring.
* **User experience impact**: Improved error handling will enhance the user experience by making the parser more robust.

- **Performance Considerations**:
* **Resource utilization**: The fuzz testing might increase resource utilization due to the large number of test cases.
* **Scalability aspects**: The improvements in the lexer and parser will enhance scalability by handling more complex queries.
* **Bottleneck analysis**: The current implementation might have bottlenecks in handling large inputs, which fuzz testing will help identify.
* **Optimization opportunities**: Optimizing the fuzz testing to cover more edge cases without increasing resource utilization significantly.

## 3. Critical Findings
<details>

### 3.1 Potential Issues
- **Issue**: The fuzz testing lacks comprehensive input/output validation, which might lead to undetected issues in the parser.
- **Impact**:
* **Technical implications**: Potential undetected bugs in the parser that could lead to incorrect query processing.
* **Business consequences**: Decreased reliability of the parser, affecting user trust.
* **User experience effects**: Users might encounter unexpected behavior or errors when using the parser.
- **Suggested Resolution**:
* **Technical changes**: Enhance the fuzz testing to include more comprehensive input/output validation.
* **Configuration updates**: Update the fuzz testing configuration to cover more edge cases.
* **Testing requirements**: Add more test cases to validate the semantic correctness of the parsed statements.

- **Issue**: The lexer's handling of escaped characters might miss some edge cases, leading to incorrect token collection.
- **Impact**:
* **Performance implications**: Incorrect token collection might lead to performance issues in query processing.
* **Maintenance overhead**: Increased maintenance overhead due to potential bugs in the lexer.
* **Future scalability**: Affects the scalability of the parser by limiting its ability to handle complex queries.
- **Suggested Resolution**:
* **Technical changes**: Enhance the lexer to handle more escape sequences accurately.
* **Migration strategy**: Gradually introduce more comprehensive escape sequence handling in the lexer.
* **Testing requirements**: Add test cases to cover the new escape sequences.

- **Issue**: The parser does not handle cases where the operator is valid but the field is not, leading to incorrect parsing.
- **Impact**:
* **Accuracy implications**: Incorrect parsing might lead to incorrect query results, affecting the accuracy of data retrieval.
* **User trust**: Decreased user trust in the parser due to incorrect query results.
* **Maintenance burden**: Increased maintenance burden due to potential bugs in the parser.
- **Suggested Resolution**:
* **Technical changes**: Enhance the parser to handle cases where the operator is valid but the field is not.
* **Configuration updates**: Update the parser configuration to cover more edge cases.
* **Testing requirements**: Add test cases to validate the handling of these edge cases.

- **Issue**: The test cases in `parser/parser_test.go` do not cover all possible edge cases, leading to incomplete test coverage.
- **Impact**:
* **Robustness implications**: Incomplete test coverage might lead to undetected bugs in the parser, affecting its robustness.
* **User experience**: Users might encounter unexpected behavior or errors due to undetected bugs.
* **Maintenance overhead**: Increased maintenance overhead due to potential bugs in the parser.
- **Suggested Resolution**:
* **Technical changes**: Add more test cases to cover additional edge cases.
* **Testing requirements**: Update the test suite to include more comprehensive test cases.

</details>

## 4. Security Assessment
<details>

- **Authentication/Authorization impacts**: No significant security considerations in the current changes.
- **Data handling concerns**: No significant security considerations in the current changes.
- **Input validation**: The current fuzz testing implementation lacks comprehensive input/output validation, which might lead to undetected issues in the parser.
- **Security best practices**: Follow best practices for input validation and error handling to ensure the robustness of the parser.
- **Potential security risks**: The lack of comprehensive input/output validation might lead to undetected bugs in the parser, affecting its robustness and reliability.
- **Mitigation strategies**: Enhance the fuzz testing to include more comprehensive input/output validation.
- **Security testing requirements**: Add more test cases to validate the semantic correctness of the parsed statements.

</details>

## 5. Testing Strategy
<details>

- **Test Coverage**:
* **Unit test requirements**: Ensure comprehensive unit tests for the lexer and parser, covering all edge cases.
* **Integration test scenarios**: Add integration tests to validate the interaction between the lexer and parser.
* **Edge case validation**: The new test cases cover various edge cases, but more can be added for better coverage.
- **Test Recommendations**:
* **Suggested Test Cases**:
  ```go
  // Add more test cases to cover additional edge cases
  {
  	"invalid\\escape: 0",
  	nil,
  	errors.New("line 0:1 expected keyword OR|AND|NOT, but got \"invalid\\escape\"
invalid\\escape: 0
^
"),
  },
  {
  	"field:\\invalid",
  	nil,
  	errors.New("line 0:7 expected keyword OR|AND|NOT, but got \"\\invalid\"
field:\\invalid
     ^
"),
  },
  ```
* **Coverage improvements**: Ensure all critical paths in the lexer and parser are covered by tests.
* **Performance testing needs**: Monitor the performance impact of fuzz testing and optimize as needed.

</details>

## 6. Documentation & Maintenance
<details>

- **Documentation updates needed**: Update the documentation to reflect the new fuzz testing infrastructure and improvements in the lexer and parser.
- **Long-term maintenance considerations**: Ensure the code is well-documented and maintainable, with clear comments and logical structure.
- **Technical debt and monitoring requirements**: Monitor the technical debt and address any potential issues in the lexer and parser.

</details>

## 7. Deployment & Operations
<details>

- **Deployment impact and strategy**: The changes should be thoroughly tested before deployment to ensure they do not introduce new issues.
- **Key operational considerations**: Monitor the performance impact of fuzz testing and optimize as needed.

</details>

## 8. Summary & Recommendations
### 8.1 Key Action Items
1. **Enhance the fuzz testing** to include more comprehensive input/output validation.
2. **Update the lexer** to handle more escape sequences accurately.
3. **Improve the parser** to handle cases where the operator is valid but the field is not.
4. **Add more test cases** to cover additional edge cases and ensure comprehensive test coverage.

### 8.2 Future Considerations
- **Technical evolution path**: Continuously enhance the fuzz testing infrastructure and improve the lexer and parser to handle more complex queries.
- **Business capability evolution**: Ensure the parser can handle a wider range of inputs, including edge cases and malformed queries, to improve overall reliability.
- **System integration impacts**: Monitor the integration of the new fuzz testing infrastructure with the existing testing framework.

---
💡 **LlamaPReview Community**
Have feedback on this AI Code review tool? Join our [GitHub Discussions](https://github.com/JetXu-LLM/LlamaPReview-site/discussions) to share your thoughts and help shape the future of LlamaPReview.

@laojianzi laojianzi merged commit 68d76a1 into main Nov 29, 2024
4 checks passed
@laojianzi laojianzi deleted the test/fuzz branch November 29, 2024 08:22
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant