Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Update examples for new models #80

Merged
merged 3 commits into from
Mar 6, 2025
Merged

Update examples for new models #80

merged 3 commits into from
Mar 6, 2025

Conversation

boqiny
Copy link
Member

@boqiny boqiny commented Mar 6, 2025

User description

Description

Since we were turning the chart and figure off by default in the extract_args in cdk, we need to ensure the figure output are still returned in the examples, so here. I added the config for the extract_args. Also update the result markdown_list to be properly rendered, previously the examples was not able to run, since the output is a list instead of strings now.

Related Issue

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

How Has This Been Tested?

Screenshots (if applicable)

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes


PR Type

  • Enhancement
  • Documentation

Description

  • Updated notebook cell execution counts.

  • Added extract_args and dotenv configurations.

  • Joined markdown list into single string.

  • Downgraded Python version in examples.


Changes walkthrough 📝

Relevant files
Enhancement
parse_img.ipynb
Update execution counts and display call.                               

examples/parse_img.ipynb

  • Updated cell execution counts.
  • Adjusted markdown display call.
+11/-12 
parse_pdf2.ipynb
Improve PDF example with config and formatting.                   

examples/parse_pdf2.ipynb

  • Added dotenv and OS import.
  • Introduced extract_args for PDF parsing.
  • Joined markdown list elements.
  • Updated timing output and execution numbers.
  • +109/-79
    async_parse_pdf2.ipynb
    Improve async PDF parsing and output display.                       

    examples/async_parse_pdf2.ipynb

  • Added dotenv imports and extract_args usage.
  • Updated cell execution counts.
  • Inserted waiting response messages.
  • Joined markdown output for display.
  • +105/-75
    parse_docx.ipynb
    Refactor DOCX example execution and formatting.                   

    examples/parse_docx.ipynb

  • Updated cell execution counts.
  • Revised table markdown formatting.
  • Adjusted version details and display call.
  • +13/-11 
    Additional files
    async_parse_pdf.ipynb +268/-314
    parse_pdf.ipynb +268/-133

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • Copy link

    github-actions bot commented Mar 6, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Markdown Handling

    The new change uses "display(Markdown(markdown_string[0]))" which assumes the output markdown is a list with at least one element. Verify that this approach consistently renders the complete markdown output across different cells.

    "display(Markdown(markdown_string[0]))\n",
    
    API Key Handling

    The new dotenv configuration loads the API key from the environment but does not include error handling if the key is missing. Consider adding checks or logging to handle missing environment variables.

     "from any_parser import AnyParser\n",
     "import os\n",
     "from dotenv import load_dotenv"
    ]
    
    Python Compatibility

    The Python version has been downgraded from 3.11.10 to 3.10.15. Ensure that this change is intentional and that all dependencies and features remain compatible.

    "version": "3.10.15"
    

    Copy link

    github-actions bot commented Mar 6, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Validate API key presence

    Verify that the API key is obtained successfully to prevent runtime failures.

    examples/parse_pdf2.ipynb [33]

     example_apikey = os.getenv("CAMBIO_API_KEY")
    +if not example_apikey:
    +    raise ValueError("CAMBIO_API_KEY not set")
    Suggestion importance[1-10]: 7

    __

    Why: The suggestion adds a defensive check to raise an error if the API key is not set, which helps prevent runtime failures. It correctly targets the line where the API key is retrieved.

    Medium
    Validate markdown output type

    Ensure that markdown_output is a list before joining to avoid unintended behavior
    when it is a string.

    examples/parse_pdf2.ipynb [200-202]

    -"# Join the list elements with newlines to create a single string\nmarkdown_text = '\\n\\n'.join(markdown_output)\ndisplay(Markdown(markdown_text))"
    +if isinstance(markdown_output, list):
    +    markdown_text = "\n\n".join(markdown_output)
    +else:
    +    markdown_text = markdown_output
    +display(Markdown(markdown_text))
    Suggestion importance[1-10]: 4

    __

    Why: The suggestion adds a type check to ensure that markdown_output is a list before joining, which is a minor defensive coding improvement. However, the PR diff does not indicate issues with markdown_output’s type, so the impact is lower.

    Low

    @lingjiekong
    Copy link
    Member

    @boqiny can you fix the build failure.

    @lingjiekong lingjiekong requested a review from Copilot March 6, 2025 08:19
    Copy link

    @Copilot Copilot AI left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    PR Overview

    This PR updates several Jupyter notebook examples to support new configuration options for extracting figures and charts, while also refining markdown output formatting and standardizing execution counts. Key changes include:

    • Adding dotenv configuration and extract_args for improved extraction parameters.
    • Changing markdown output handling by joining list elements into a single string.
    • Downgrading the Python version metadata to 3.10.15 for consistency.

    Reviewed Changes

    File Description
    examples/parse_pdf2.ipynb Added dotenv and extract_args, refined markdown joining, and updated execution counts and version.
    examples/parse_img.ipynb Updated execution counts and modified the markdown output display to index the joined output list.
    examples/parse_docx.ipynb Refactored markdown formatting and execution counts, along with minor text corrections.
    examples/async_parse_pdf2.ipynb Enhanced async PDF parsing with extract_args, added waiting messages, and updated metadata version.

    Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

    Comments suppressed due to low confidence (2)

    examples/parse_pdf2.ipynb:222

    • The Python version downgrade to 3.10.15 should be confirmed to be compatible with all dependencies; ensure that this change doesn't inadvertently break any functionality.
        "version": "3.10.15"
    

    examples/parse_img.ipynb:175

    • [nitpick] Standardize the handling of markdown output across notebooks; if markdown_string is always a list, consider renaming the variable or documenting its structure for clarity.
    display(Markdown(markdown_string[0]))
    

    @@ -48,6 +59,10 @@
    "name": "stdout",
    "output_type": "stream",
    "text": [
    "Waiting for response...\n",
    Copy link
    Preview

    Copilot AI Mar 6, 2025

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    [nitpick] The repeated 'Waiting for response...' messages might clutter the output; consider reducing the repetition or implementing a dynamic progress indicator.

    Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

    Positive Feedback
    Negative Feedback

    Provide additional feedback

    Please help us improve GitHub Copilot by sharing more details about this comment.

    Please select one or more of the options
    @boqiny
    Copy link
    Member Author

    boqiny commented Mar 6, 2025

    @boqiny can you fix the build failure.

    fixed, preprocess to remove special characters and line breaks, this aligns with the testcases in cdk now.

    @lingjiekong lingjiekong merged commit eca8b0f into main Mar 6, 2025
    1 check passed
    # for free to join this conversation on GitHub. Already have an account? # to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants