fix(torii-erc): lenient parsing for invalid json for erc721 #2977

Open
wants to merge 5 commits into base: main

Conversation

@Larkooo (Collaborator) commented Jan 31, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a utility function to sanitize JSON strings by escaping unescaped quotes.
  • Bug Fixes

    • Improved error handling and logging mechanisms in ERC721 token processing.
    • Enhanced metadata fetching error reporting.
  • Chores

    • Streamlined logging statements for better diagnostic clarity.
    • Added tests to validate the new JSON sanitization functionality.

coderabbitai bot commented Jan 31, 2025

Walkthrough

Ohayo, sensei! The pull request focuses on refining error handling and logging in the crates/torii/sqlite/src/executor/erc.rs file. The changes primarily target the process_register_erc721_token_query method, introducing a match statement for clearer error management and enhanced logging mechanisms. Additionally, a new utility function, sanitize_json_string, has been added in crates/torii/sqlite/src/utils.rs to improve JSON string sanitation. These modifications enhance log clarity and error context without fundamentally altering the core functionality of the methods.

Changes

File: crates/torii/sqlite/src/executor/erc.rs
  • Replaced nested structure with a match statement in process_register_erc721_token_query
  • Enhanced error logging with debug! and warn! macros
  • Improved metadata parsing log messages
  • Removed trace! logging statement
File: crates/torii/sqlite/src/utils.rs
  • Added public function sanitize_json_string for escaping unescaped quotes
  • Introduced test function test_sanitize_json_string to validate the new function

Possibly related PRs

Suggested reviewers

  • glihm

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
crates/torii/sqlite/src/executor/erc.rs (1)

Line range hint 303-325: Ohayo! Enhanced logging for data URI handling, sensei!

The additional debug logging helps track the data URI parsing process and decoded content.

However, there are two issues to address:

  1. The decoded string is logged but not used (we're using the original decoded.0)
  2. The formatting needs adjustment per pipeline error

Apply these fixes:

-                debug!(decoded_str = %decoded_str, "Decoded metadata");
-
-                let json: serde_json::Value = serde_json::from_str(&String::from_utf8_lossy(&decoded.0))
+                let json: serde_json::Value = serde_json::from_str(&decoded_str)
🧰 Tools
🪛 GitHub Actions: ci

[error] 322-324: Code formatting error: Incorrect line breaks and indentation in JSON parsing logic
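
The first point can be sketched with the standard library alone: convert the decoded bytes to a string once, then hand that same string to both the log line and the parser. The function name `decode_and_parse` and the closure standing in for `serde_json::from_str` are hypothetical, not part of the PR.

```rust
use std::borrow::Cow;

/// Hypothetical helper: converts the decoded payload once and reuses the
/// resulting string for both logging and parsing.
fn decode_and_parse<T>(
    decoded: &[u8],
    parse: impl Fn(&str) -> Option<T>,
) -> (String, Option<T>) {
    // Lossy conversion: invalid UTF-8 sequences become U+FFFD instead of failing.
    let decoded_str: Cow<'_, str> = String::from_utf8_lossy(decoded);

    // Stand-in for `debug!(decoded_str = %decoded_str, "Decoded metadata")`.
    let logged = format!("Decoded metadata: {decoded_str}");

    // The parser sees exactly the string that was logged; no second conversion.
    let parsed = parse(&decoded_str);
    (logged, parsed)
}
```

Keeping a single conversion means the logged text and the parsed text cannot drift apart, which is what the suggested one-line change achieves.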

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4b733d and b2472e0.

📒 Files selected for processing (1)
  • crates/torii/sqlite/src/executor/erc.rs (4 hunks)
🧰 Additional context used
🪛 GitHub Actions: ci
crates/torii/sqlite/src/executor/erc.rs

[error] 245-247: Code formatting error: Incorrect block formatting for match expression with metadata handling


[error] 322-324: Code formatting error: Incorrect line breaks and indentation in JSON parsing logic

🔇 Additional comments (1)
crates/torii/sqlite/src/executor/erc.rs (1)

13-13: Ohayo! Import optimization looks good, sensei!

The logging imports have been correctly optimized to include only the levels actually used in the code.

Comment on lines 245 to 257
let metadata = Self::fetch_metadata(&token_uri).await;

match metadata {
    Ok(metadata) => serde_json::to_string(&metadata).context("Failed to serialize metadata")?,
    Err(err) => {
        debug!(error = %err, token_uri = %token_uri, "Error fetching metadata");
        warn!(
            contract_address = format!("{:#x}", register_erc721_token.contract_address),
            token_id = %register_erc721_token.actual_token_id,
            "Error fetching metadata, empty metadata will be used instead.",
        );
        "".to_string()
    }

⚠️ Potential issue

Ohayo! Nice error handling improvements, sensei!

The match-based error handling with dual-level logging provides better visibility into metadata fetch failures.

However, the block formatting needs adjustment to fix the pipeline error.

Apply this formatting fix:

-            let metadata = Self::fetch_metadata(&token_uri).await;
-
-            match metadata {
+            match Self::fetch_metadata(&token_uri).await {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            let metadata = Self::fetch_metadata(&token_uri).await;
-            match metadata {
+            match Self::fetch_metadata(&token_uri).await {
                 Ok(metadata) => serde_json::to_string(&metadata).context("Failed to serialize metadata")?,
                 Err(err) => {
                     debug!(error = %err, token_uri = %token_uri, "Error fetching metadata");
                     warn!(
                         contract_address = format!("{:#x}", register_erc721_token.contract_address),
                         token_id = %register_erc721_token.actual_token_id,
                         "Error fetching metadata, empty metadata will be used instead.",
                     );
                     "".to_string()
                 }
🧰 Tools
🪛 GitHub Actions: ci

[error] 245-247: Code formatting error: Incorrect block formatting for match expression with metadata handling
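
The fallback pattern in this comment can be reduced to a small synchronous sketch. Everything here is an assumption for illustration: `fetch_metadata` is a stub (the real one is async and returns parsed JSON), `metadata_or_empty` is a hypothetical name, and `eprintln!` stands in for the `debug!` and `warn!` calls.

```rust
/// Stub standing in for `Self::fetch_metadata`; accepts only two URI schemes.
fn fetch_metadata(token_uri: &str) -> Result<String, String> {
    if token_uri.starts_with("data:") || token_uri.starts_with("https://") {
        Ok(r#"{"name":"example"}"#.to_string())
    } else {
        Err(format!("unsupported URI scheme: {token_uri}"))
    }
}

/// Matches directly on the call result and falls back to empty metadata
/// instead of propagating the fetch error.
fn metadata_or_empty(token_uri: &str) -> String {
    match fetch_metadata(token_uri) {
        Ok(metadata) => metadata,
        Err(err) => {
            // The PR logs the error at debug level and a summary at warn level.
            eprintln!("Error fetching metadata ({err}) for {token_uri}; using empty metadata");
            String::new()
        }
    }
}
```

Matching on the call expression removes the intermediate `metadata` binding, which is the only change the formatting fix asks for.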

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/torii/sqlite/src/utils.rs (2)

56-106: Ohayo sensei! Consider using a JSON parser for more robust handling.

While the current implementation works for basic cases, it might miss edge cases that a proper JSON parser would handle. Consider using serde_json for more robust parsing and string sanitization.

-pub fn sanitize_json_string(s: &str) -> String {
-    let mut result = String::new();
-    let mut chars = s.chars().peekable();
-    let mut in_string = false;
-    
-    while let Some(c) = chars.next() {
-        match c {
-            '"' => {
-                if !in_string {
-                    // Starting a string
-                    result.push('"');
-                    in_string = true;
-                } else {
-                    // Check next char to see if this is the end of the string
-                    match chars.peek() {
-                        Some(&':') | Some(&',') | Some(&'}') => {
-                            // This is end of a JSON string
-                            result.push('"');
-                            in_string = false;
-                        }
-                        _ => {
-                            // This is an internal quote that needs escaping
-                            result.push_str("\\\"");
-                        }
-                    }
-                }
-            }
-            '\\' => {
-                if let Some(&next) = chars.peek() {
-                    if next == '"' {
-                        // Already escaped quote, preserve it without adding extra escapes
-                        result.push('\\');
-                        result.push('"');
-                        chars.next(); // Consume the quote
-                    } else {
-                        // Regular backslash
-                        result.push('\\');
-                    }
-                } else {
-                    result.push('\\');
-                }
-            }
-            _ => {
-                result.push(c);
-            }
-        }
-    }
-    
-    result
+use serde_json::Value;
+
+pub fn sanitize_json_string(s: &str) -> String {
+    // If the input already parses, it is valid JSON and is returned unchanged
+    if serde_json::from_str::<Value>(s).is_ok() {
+        return s.to_string();
+    }
+
+    // If parsing fails, try escaping the quotes. Note that this escapes every
+    // quote, including the structural ones, so the re-parse is unlikely to succeed.
+    let escaped = s.replace(r#"""#, r#"\""#);
+    if serde_json::from_str::<Value>(&escaped).is_ok() {
+        return escaped;
+    }
+
+    // If all attempts fail, return the original string
+    s.to_string()
}
🧰 Tools
🪛 GitHub Actions: ci

[warning] 58-58: Code formatting issue: unnecessary blank line.


[warning] 101-101: Code formatting issue: unnecessary blank line.
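
To make the edge-case concern concrete, here is a condensed copy of the character scanner from the removed side of the diff above. It is intended to behave the same as the PR's function, but it is a restatement for illustration, not the committed code.

```rust
/// Condensed restatement of the scanner: a quote closes a string only when
/// the next character is ':', ',' or '}'; any other quote inside a string
/// is escaped.
fn sanitize_json_string(s: &str) -> String {
    let mut result = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    let mut in_string = false;

    while let Some(c) = chars.next() {
        match c {
            // Opening quote of a JSON string.
            '"' if !in_string => {
                result.push('"');
                in_string = true;
            }
            // Quote seen while inside a string: closing or internal?
            '"' => match chars.peek() {
                Some(&':') | Some(&',') | Some(&'}') => {
                    result.push('"');
                    in_string = false;
                }
                _ => result.push_str("\\\""),
            },
            // Preserve an already escaped quote without adding another escape.
            '\\' => {
                result.push('\\');
                if chars.peek() == Some(&'"') {
                    result.push('"');
                    chars.next();
                }
            }
            _ => result.push(c),
        }
    }
    result
}
```

With this rule, `{"name":""Rage Shout" DireWolf"}` is repaired as the PR's test expects. A string that closes right before `]`, or before whitespace as in `{"a" : 1}`, is treated as still open, so already valid input such as `{"tags":["a"]}` comes back with its closing quote escaped. That suggests trying a strict parse first and sanitizing only on failure.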


221-232: Add more test cases for edge scenarios.

The current test cases cover basic scenarios but miss important edge cases.

 #[test]
 fn test_sanitize_json_string() {
     let input = r#"{"name":""Rage Shout" DireWolf"}"#;
     let expected = r#"{"name":"\"Rage Shout\" DireWolf"}"#;
     let sanitized = sanitize_json_string(input);
     assert_eq!(sanitized, expected);
 
     let input_escaped = r#"{"name":"\"Properly Escaped\" Wolf"}"#;
     let expected_escaped = r#"{"name":"\"Properly Escaped\" Wolf"}"#;
     let sanitized_escaped = sanitize_json_string(input_escaped);
     assert_eq!(sanitized_escaped, expected_escaped);
+    
+    // Test nested objects
+    let input_nested = r#"{"metadata":{"name":""Nested" Object"}}"#;
+    let expected_nested = r#"{"metadata":{"name":"\"Nested\" Object"}}"#;
+    let sanitized_nested = sanitize_json_string(input_nested);
+    assert_eq!(sanitized_nested, expected_nested);
+    
+    // Test arrays
+    let input_array = r#"{"names":[""First"", ""Second""]}"#;
+    let expected_array = r#"{"names":["\"First\"", "\"Second\""]}"#;
+    let sanitized_array = sanitize_json_string(input_array);
+    assert_eq!(sanitized_array, expected_array);
+    
+    // Test empty string
+    assert_eq!(sanitize_json_string(""), "");
+    
+    // Test invalid JSON
+    let input_invalid = r#"{"name":}"#;
+    assert_eq!(sanitize_json_string(input_invalid), input_invalid);
 }
🧰 Tools
🪛 GitHub Actions: ci

[warning] 224-224: Code formatting issue: unnecessary blank line.


[warning] 231-231: Code formatting issue: unnecessary blank line.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42aa4c3 and 8a52e55.

📒 Files selected for processing (2)
  • crates/torii/sqlite/src/executor/erc.rs (4 hunks)
  • crates/torii/sqlite/src/utils.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/torii/sqlite/src/executor/erc.rs
🧰 Additional context used
🪛 GitHub Actions: ci
crates/torii/sqlite/src/utils.rs

[warning] 58-58: Code formatting issue: unnecessary blank line.


[warning] 101-101: Code formatting issue: unnecessary blank line.


[warning] 224-224: Code formatting issue: unnecessary blank line.


[warning] 231-231: Code formatting issue: unnecessary blank line.

@@ -53,6 +53,58 @@ pub fn sql_string_to_felts(sql_string: &str) -> Vec<Felt> {
    sql_string.split(SQL_FELT_DELIMITER).map(|felt| Felt::from_str(felt).unwrap()).collect()
}

/// Sanitizes a JSON string by escaping unescaped double quotes within string values.
pub fn sanitize_json_string(s: &str) -> String {
    let mut result = String::new();

⚠️ Potential issue

Fix formatting issues flagged by CI.

Remove unnecessary blank lines as indicated by the pipeline failures.

 pub fn sanitize_json_string(s: &str) -> String {
-
     let mut result = String::new();
     // ... rest of the function
     match c {
         _ => {
             result.push(c);
         }
-
     }
     // ... rest of the function
     let sanitized = sanitize_json_string(input);
     assert_eq!(sanitized, expected);
-
     let input_escaped = r#"{"name":"\"Properly Escaped\" Wolf"}"#;
     // ... rest of the test
     let sanitized_escaped = sanitize_json_string(input_escaped);
     assert_eq!(sanitized_escaped, expected_escaped);
-
 }

Also applies to: 101-101, 224-224, 231-231

🧰 Tools
🪛 GitHub Actions: ci

[warning] 58-58: Code formatting issue: unnecessary blank line.

@Larkooo changed the title from "fix(torii): metadata data uri parsing for erc721" to "fix(torii-erc): metadata data uri parsing for erc721" on Feb 3, 2025
@Larkooo changed the title from "fix(torii-erc): metadata data uri parsing for erc721" to "fix(torii-erc): lenient parsing for invalid json for erc721" on Feb 3, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
crates/torii/sqlite/src/utils.rs (3)

56-57: Enhance function documentation.

Ohayo sensei! The documentation could be more comprehensive to help users understand the function's purpose and usage better.

-/// Sanitizes a JSON string by escaping unescaped double quotes within string values.
+/// Sanitizes a JSON string by properly escaping unescaped double quotes within string values.
+///
+/// This function processes JSON strings that may have improperly escaped quotes within their string values,
+/// making them valid for JSON parsing. It maintains string boundary awareness to only escape quotes that
+/// are inside JSON string values.
+///
+/// # Arguments
+/// * `s` - A string slice that contains a JSON string that may have unescaped quotes
+///
+/// # Returns
+/// A String with all necessary quotes properly escaped
+///
+/// # Example
+/// ```
+/// let input = r#"{"name":""Rage Shout" DireWolf"}"#;
+/// let sanitized = sanitize_json_string(input);
+/// assert_eq!(sanitized, r#"{"name":"\"Rage Shout\" DireWolf"}"#);
+/// ```

58-106: Consider adding error handling for malformed JSON.

The function assumes well-formed JSON input but could handle edge cases better.

 pub fn sanitize_json_string(s: &str) -> String {
-    let mut result = String::new();
+    let mut result = String::with_capacity(s.len() + 10); // Pre-allocate with some extra space for escapes
     let mut chars = s.chars().peekable();
     let mut in_string = false;
+    let mut last_char_pos = 0;
 
     while let Some(c) = chars.next() {
+        last_char_pos += 1;
         match c {
             '"' => {
                 if !in_string {
                     // Starting a string
                     result.push('"');
                     in_string = true;
                 } else {
                     // Check next char to see if this is the end of the string
                     match chars.peek() {
                         Some(&':') | Some(&',') | Some(&'}') => {
                             // This is end of a JSON string
                             result.push('"');
                             in_string = false;
                         }
                         _ => {
                             // This is an internal quote that needs escaping
                             result.push_str("\\\"");
                         }
                     }
                 }
             }
             // ... rest of the match cases ...
         }
     }
 
+    // Check for unclosed strings
+    if in_string {
+        warn!(
+            "Unclosed string detected at position {}. Input may be malformed.",
+            last_char_pos
+        );
+    }
+
     result
 }
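
One caveat with the counter in the suggestion above: `last_char_pos` is incremented for every character, so by the time the check runs it equals the input length rather than the position where the unclosed string began. A minimal sketch of an alternative records the index of the opening quote instead. The function name `unclosed_string_start` is hypothetical, and the closing rule is copied from the scanner.

```rust
/// Returns the char index of the opening quote of a string that is still open
/// at end of input, using the scanner's rule that a quote closes a string only
/// when followed by ':', ',' or '}'.
fn unclosed_string_start(s: &str) -> Option<usize> {
    let mut chars = s.chars().enumerate().peekable();
    let mut open_at: Option<usize> = None;

    while let Some((i, c)) = chars.next() {
        match c {
            // Remember where the current string started.
            '"' if open_at.is_none() => open_at = Some(i),
            // Inside a string: only a quote before ':', ',' or '}' closes it.
            '"' => {
                if matches!(chars.peek(), Some(&(_, ':' | ',' | '}'))) {
                    open_at = None;
                }
            }
            // Skip an already escaped quote so it is not treated as a boundary.
            '\\' => {
                if matches!(chars.peek(), Some(&(_, '"'))) {
                    chars.next();
                }
            }
            _ => {}
        }
    }
    open_at
}
```

A caller could pass the returned index to `warn!` so the log points at the start of the offending value.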

221-232: Enhance test coverage with additional cases.

The current tests cover basic scenarios, but we should add more edge cases for robustness.

 #[test]
 fn test_sanitize_json_string() {
     let input = r#"{"name":""Rage Shout" DireWolf"}"#;
     let expected = r#"{"name":"\"Rage Shout\" DireWolf"}"#;
     let sanitized = sanitize_json_string(input);
     assert_eq!(sanitized, expected);
 
     let input_escaped = r#"{"name":"\"Properly Escaped\" Wolf"}"#;
     let expected_escaped = r#"{"name":"\"Properly Escaped\" Wolf"}"#;
     let sanitized_escaped = sanitize_json_string(input_escaped);
     assert_eq!(sanitized_escaped, expected_escaped);
+
+    // Test empty string
+    assert_eq!(sanitize_json_string(""), "");
+
+    // Test string without quotes
+    let no_quotes = r#"{"number":42}"#;
+    assert_eq!(sanitize_json_string(no_quotes), no_quotes);
+
+    // Test nested objects
+    let nested = r#"{"data":{"name":""Nested" Object"}}"#;
+    let expected_nested = r#"{"data":{"name":"\"Nested\" Object"}}"#;
+    assert_eq!(sanitize_json_string(nested), expected_nested);
+
+    // Test array of strings
+    let array = r#"{"tags":["Hello "World"", "Test"]}"#;
+    let expected_array = r#"{"tags":["Hello \"World\"", "Test"]}"#;
+    assert_eq!(sanitize_json_string(array), expected_array);
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d45f9fc and 0721b87.

📒 Files selected for processing (2)
  • crates/torii/sqlite/src/executor/erc.rs (4 hunks)
  • crates/torii/sqlite/src/utils.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/torii/sqlite/src/executor/erc.rs
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: docs
  • GitHub Check: ensure-wasm
  • GitHub Check: build

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 67.18750% with 21 lines in your changes missing coverage. Please review.

Project coverage is 57.02%. Comparing base (f4b733d) to head (0721b87).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
crates/torii/sqlite/src/executor/erc.rs | 0.00% | 15 Missing ⚠️
crates/torii/sqlite/src/utils.rs | 87.75% | 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2977      +/-   ##
==========================================
- Coverage   57.10%   57.02%   -0.09%     
==========================================
  Files         424      425       +1     
  Lines       56215    56352     +137     
==========================================
+ Hits        32104    32137      +33     
- Misses      24111    24215     +104     
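
The patch coverage figure can be reproduced from the per-file rows. The per-file line totals used here (15 changed lines in erc.rs, 49 in utils.rs) are inferred from the reported percentages and missing counts, not stated in the report, and `coverage_percent` is a hypothetical helper.

```rust
/// Coverage as a percentage from covered and missing line counts.
/// Returns None when there are no lines to measure.
fn coverage_percent(covered: u32, missing: u32) -> Option<f64> {
    let total = covered + missing;
    if total == 0 {
        return None;
    }
    Some(f64::from(covered) / f64::from(total) * 100.0)
}
```

Under those inferred totals, erc.rs contributes 0 covered and 15 missing lines, and utils.rs contributes 43 covered and 6 missing (43 / 49 is about 87.75%). Together that is 43 covered of 64 changed lines, and `coverage_percent(43, 21)` gives 67.1875, matching the reported 67.18750% with 21 lines missing.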
