Update examples for new models #80

Merged 3 commits on Mar 6, 2025
582 changes: 268 additions & 314 deletions examples/async_parse_pdf.ipynb

Large diffs are not rendered by default.

180 changes: 105 additions & 75 deletions examples/async_parse_pdf2.ipynb
@@ -13,30 +13,41 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display, Markdown\n",
"from any_parser import AnyParser"
"from any_parser import AnyParser\n",
"import os\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"ap = AnyParser(api_key=\"...\")"
"load_dotenv(override=True)\n",
"example_apikey = os.getenv(\"CAMBIO_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"file_id = ap.async_parse(file_path=\"./sample_data/Earnings-Presentation-Q2-2024.pdf\")"
"ap = AnyParser(example_apikey)\n",
"\n",
"# Define extract_args as a dictionary with your desired parameters\n",
"extract_args = {\n",
" \"vqa_figures_flag\": True,\n",
" \"vqa_charts_flag\": True\n",
"}\n",
"\n",
"file_id = ap.async_parse(file_path=\"./sample_data/Earnings-Presentation-Q2-2024.pdf\", extract_args=extract_args)"
]
},
{
@@ -48,6 +59,10 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Waiting for response...\n",
Copilot AI commented on Mar 6, 2025:

[nitpick] The repeated 'Waiting for response...' messages might clutter the output; consider reducing the repetition or implementing a dynamic progress indicator.
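
One way to act on this suggestion is to rewrite a single status line in place instead of printing a new line per poll. The following is a minimal sketch, not the notebook's or `any_parser`'s actual polling code: `poll_until_ready`, its parameters, and the `fetch` callable are hypothetical names introduced for illustration, and it assumes the output stream honors the carriage return (`\r`) as "return to the start of the line".

```python
import sys
import time
from typing import Callable, Optional, TextIO


def poll_until_ready(
    fetch: Callable[[], Optional[str]],
    interval: float = 1.0,
    timeout: float = 60.0,
    out: TextIO = sys.stdout,
) -> str:
    """Call fetch() until it returns a non-None result.

    Hypothetical helper: while waiting, it overwrites one status line
    rather than emitting a new "Waiting for response..." line per poll.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        result = fetch()
        if result is not None:
            if attempts:
                out.write("\n")  # terminate the in-place status line
            return result
        attempts += 1
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            out.write("\n")
            raise TimeoutError(f"no result after {elapsed:.1f}s")
        # "\r" moves the cursor to column 0 so the next write overwrites the line
        out.write(f"\rWaiting for response... ({attempts} polls, {elapsed:.0f}s)")
        out.flush()
        time.sleep(interval)
```

Here `fetch` stands in for whatever call retrieves the parse result and returns `None` while the job is still running; passing a different `out` stream makes the helper easy to exercise without a terminal.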

"Waiting for response...\n",
"Waiting for response...\n",
"Waiting for response...\n",
"Waiting for response...\n"
]
}
@@ -58,37 +73,47 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"## Meta Earnings Presentation\n",
"## Q2 2024\n",
"\n",
"investor.fb.com Meta logo, consisting of a stylized infinity symbol next to the text \"Meta\"\n",
"Revenue by User Geography In Millions Meta logo\n",
"\n",
"| Quarter | Rest of World | Asia-Pacific | Europe | US & Canada |\n",
"|---------|---------------|--------------|--------|-------------|\n",
"| Q2'24 | $39,071 | $7,888 | $7,888 | $9,300 |\n",
"| Q1'24 | $16,847 | $15,824 | $15,824| $16,847 |\n",
"| Q4'23 | $18,585 | $15,824 | $15,824| $16,847 |\n",
"| Q3'23 | $15,777 | $15,777 | $15,777| $16,847 |\n",
"| Q2'23 | $14,422 | $15,190 | $15,190| $16,847 |\n",
"| Q1'23 | $13,342 | $14,422 | $14,422| $16,847 |\n",
"| Q4'22 | $12,655 | $13,249 | $13,249| $16,847 |\n",
"| Q3'22 | $13,797 | $13,035 | $13,035| $16,847 |\n",
"| Q2'22 | $13,782 | $13,050 | $13,050| $16,847 |\n",
"\n",
"The chart shows revenue by user geography in millions of dollars from Q2'22 to Q2'24. It is divided into three categories: Rest of World, Asia-Pacific, and Europe. The data is presented for each quarter, with the y-axis showing revenue in millions of dollars and the x-axis showing the quarters. The legend indicates that the rest of world is represented by light gray, Asia-Pacific by dark gray, and Europe by dark gray.\n",
"Meta Earnings Presentation Q2 2024 \n",
"\n",
"investor.fb.com\n",
"\n",
" Meta logo, consisting of an infinity symbol followed by the text \"Meta\"\n",
"\n",
"Revenue by User Geography Meta logo \n",
"\n",
"In Millions\n",
"\n",
" \n",
"| Quarter | US & Canada | Europe | Asia-Pacific | Rest of World | Total |\n",
"|---|---|---|---|---|---|\n",
"| Q2'24 | 16,847 | 9,300 | 7,888 | 5,036 | 39,071 |\n",
"| Q1'24 | 15,824 | 8,483 | 7,481 | 4,667 | 36,455 |\n",
"| Q4'23 | 18,585 | 9,441 | 7,512 | 4,573 | 40,111 |\n",
"| Q3'23 | 15,190 | 7,777 | 6,928 | 4,251 | 34,146 |\n",
"| Q2'23 | 14,422 | 7,323 | 6,515 | 3,739 | 31,999 |\n",
"| Q1'23 | 13,048 | 6,345 | 5,960 | 3,292 | 28,645 |\n",
"| Q4'22 | 15,636 | 7,050 | 6,050 | 3,429 | 32,165 |\n",
"| Q3'22 | 13,035 | 5,797 | 5,782 | 3,100 | 27,714 |\n",
"| Q2'22 | 13,249 | 6,452 | 5,797 | 3,213 | 28,822 |\n",
"\n",
"This stacked bar chart shows the revenue by user geography for Meta from Q2'22 to Q2'24. The revenue is divided into four categories: US & Canada, Europe, Asia-Pacific, and Rest of World. The total revenue for each quarter is shown at the top of each bar.\n",
" \n",
"\n",
"Our revenue by user geography is geographically apportioned based on our estimation of the geographic location of our users when they perform a revenue-generating activity. This allocation differs from our revenue disaggregated by geography disclosure in our condensed consolidated financial statements where revenue is geographically apportioned based on the addresses of our customers.\n",
"\n",
"3\n",
"Segment Results In Millions Meta logo\n",
" 3\n",
"\n",
"Segment Results Meta logo \n",
"\n",
"In Millions\n",
"\n",
" \n",
"| | Q2'22 | Q3'22 | Q4'22 | Q1'23 | Q2'23 | Q3'23 | Q4'23 | Q1'24 | Q2'24 |\n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| Advertising | $ 28,152 | $ 27,237 | $ 31,254 | $ 28,101 | $ 31,498 | $ 33,643 | $ 38,706 | $ 35,635 | $ 38,329 |\n",
@@ -100,55 +125,65 @@
"| Reality Labs Operating (Loss) | (2,806) | (3,672) | (4,279) | (3,992) | (3,739) | (3,742) | (4,646) | (3,846) | (4,488) |\n",
"| Total Income from Operations | $ 8,358 | $ 5,664 | $ 6,399 | $ 7,227 | $ 9,392 | $ 13,748 | $ 16,384 | $ 13,818 | $ 14,847 |\n",
"| Operating Margin | 29% | 20% | 20% | 25% | 29% | 40% | 41% | 38% | 38% |\n",
" \n",
"\n",
"We report our financial results based on two reportable segments: Family of Apps (FoA) and Reality Labs (RL). FoA includes Facebook, Instagram, Messenger, WhatsApp, and other services. RL includes our virtual, augmented, and mixed reality related consumer hardware, software, and content.\n",
"\n",
"4\n",
"Net Income In Millions Meta logo\n",
"\n",
"| Quarter | Net Income (In Millions) |\n",
"|---------|--------------------------|\n",
"| Q2'22 | 6,687 |\n",
"| Q3'22 | 4,395 |\n",
"| Q4'22 | 4,652 |\n",
"| Q1'23 | 5,709 |\n",
"| Q2'23 | 7,788 |\n",
"| Q3'23 | 11,583 |\n",
"| Q4'23 | 12,369 |\n",
"| Q1'24 | 13,465 |\n",
"| Q2'24 | 14,017 |\n",
"\n",
"This chart shows the Net Income in millions of dollars for Meta from Q2'22 to Q2'24. The y-axis ranges from $6,687 million to $14,017 million, with each quarter's data represented by a bar graph.\n",
"\n",
"7\n",
"Diluted Earnings Per Share Meta logo, consisting of an infinity symbol\n",
"\n",
"| Quarter | Diluted Earnings Per Share |\n",
"|---------|---------------------------|\n",
"| Q2'22 | $2.46 |\n",
"| Q3'22 | $1.64 |\n",
"| Q4'22 | $1.76 |\n",
"| Q1'23 | $2.20 |\n",
"| Q2'23 | $2.98 |\n",
"| Q3'23 | $4.39 |\n",
"| Q4'23 | $5.33 |\n",
"| Q1'24 | $4.71 |\n",
"| Q2'24 | $5.16 |\n",
"\n",
"This chart shows the diluted earnings per share from Q2'22 to Q2'24. The y-axis ranges from $2.46 to $5.16, with each quarter's earnings represented by a bar. The bars are stacked vertically, with the highest bar in Q2'24 and the lowest in Q1'23.\n",
"\n",
"8\n",
"Limitations of Key Metrics and Other Data Meta logo\n",
" 4\n",
"\n",
"Net Income Meta logo \n",
"\n",
"In Millions\n",
"\n",
" \n",
"| Quarter | Net Income |\n",
"|---|---|\n",
"| Q2'22 | $6,687 |\n",
"| Q3'22 | $4,395 |\n",
"| Q4'22 | $4,652 |\n",
"| Q1'23 | $5,709 |\n",
"| Q2'23 | $7,788 |\n",
"| Q3'23 | $11,583 |\n",
"| Q4'23 | $14,017 |\n",
"| Q1'24 | $12,369 |\n",
"| Q2'24 | $13,465 |\n",
"\n",
"This bar chart shows the Net Income in millions for Meta from Q2'22 to Q2'24. The y-axis ranges from $0 to $14,017 million, with increments of $1,000 million. The highest net income was $14,017 million in Q4'23, while the lowest was $4,395 million in Q3'22.\n",
" \n",
"\n",
" 7\n",
"\n",
"Diluted Earnings Per Share Meta logo \n",
"\n",
" \n",
"| Quarter | Earnings Per Share |\n",
"|---|---|\n",
"| Q2'22 | $2.46 |\n",
"| Q3'22 | $1.64 |\n",
"| Q4'22 | $1.76 |\n",
"| Q1'23 | $2.20 |\n",
"| Q2'23 | $2.98 |\n",
"| Q3'23 | $4.39 |\n",
"| Q4'23 | $5.33 |\n",
"| Q1'24 | $4.71 |\n",
"| Q2'24 | $5.16 |\n",
"\n",
"This bar chart shows the Diluted Earnings Per Share for Meta from Q2'22 to Q2'24. The y-axis ranges from $1.64 to $5.33, with increments of $0.02. The chart demonstrates an overall increasing trend in earnings per share over the period, with the highest point in Q4'23 at $5.33 and the lowest in Q3'22 at $1.64.\n",
" \n",
"\n",
" 8\n",
"\n",
"Limitations of Key Metrics and Other Data Meta logo \n",
"\n",
"To calculate our estimates of DAP, we currently use a series of machine learning models that are developed based on internal reviews of limited samples of user accounts and calibrated against user survey data. We apply significant judgment in designing these models and calculating these estimates. For example, to match user accounts within individual products and across multiple products, we use data signals such as similar device information, IP addresses, and user names. We also calibrate our models against data from periodic user surveys of varying sizes and frequency across our products, which survey questions are based on monthly usage, and which are inherently subject to error. The timing and results of such user surveys have in the past contributed, and may in the future contribute, to changes in our reported Family metrics from period to period. In addition, our data limitations may affect our understanding of certain details of our business and increase the risk of error for our Family metrics estimates. Our techniques and models rely on a variety of data signals from different products, and we rely on more limited data signals for some products compared to others. For example, as a result of limited visibility into encrypted products, we have fewer data signals from WhatsApp user accounts and primarily rely on phone numbers and device information to match WhatsApp user accounts with accounts on our other products. Any loss of access to data signals we use in our process for calculating Family metrics, whether as a result of our own product decisions, actions by third-party browser or mobile platforms, regulatory or legislative requirements, or other factors, also may impact the stability or accuracy of our reported Family metrics, as well as our ability to report these metrics at all. 
Our estimates of Family metrics also may change as our methodologies evolve, including through the application of new data signals or technologies, product changes, or other improvements in our user surveys, algorithms, or machine learning that may improve our ability to match accounts within and across our products or otherwise evaluate the broad population of our users. In addition, such evolution may allow us to identify previously undetected violating accounts (as defined below).\n",
"\n",
"We regularly evaluate our Family metrics to estimate the percentage of our DAP consisting solely of \"violating\" accounts. We define \"violating\" accounts as accounts which we believe are intended to be used for purposes that violate our terms of service, including bots and spam. In the first quarter of 2024, we estimated that less than 3% of our worldwide DAP consisted solely of violating accounts. Such estimation is based on an internal review of a limited sample of accounts, and we apply significant judgment in making this determination. For example, we look for account information and behaviors associated with Facebook and Instagram accounts that appear to be authentic to the reviewers, but we have limited visibility into WhatsApp user activity due to encryption. In addition, if we believe an individual person has one or more violating accounts, we do not include such person in our violating accounts estimation as long as we believe they have one account that does not constitute a violating account. From time to time, we disable certain user accounts, make product changes, or take other actions to reduce the number of violating accounts among our users, which may also reduce our DAP estimates in a particular period. We intend to disclose our estimates of the percentage of our DAP consisting solely of violating accounts on an annual basis. Violating accounts are very difficult to measure at our scale, and it is possible that the actual number of violating accounts may vary significantly from our estimates.\n",
"We regularly evaluate our Family metrics to estimate the percentage of our DAP consisting solely of \"violating\" accounts. We define \"violating\" accounts as accounts which we believe are intended to be used for purposes that violate our terms of service, including bots and spam. In the first quarter of 2024, we estimated that less than 3% of our worldwide DAP consisted solely of violating accounts. Such estimation is based on an internal review of a limited sample of accounts, and we apply significant judgment in making this determination. For example, we look for account information and behaviors associated with Facebook and Instagram accounts that appear to be inauthentic to the reviewers, but we have limited visibility into WhatsApp user activity due to encryption. In addition, if we believe an individual person has one or more violating accounts, we do not include such person in our violating accounts estimation as long as we believe they have one account that does not constitute a violating account. From time to time, we disable certain user accounts, make product changes, or take other actions to reduce the number of violating accounts among our users, which may also reduce our DAP estimates in a particular period. We intend to disclose our estimates of the percentage of our DAP consisting solely of violating accounts on an annual basis. Violating accounts are very difficult to measure at our scale, and it is possible that the actual number of violating accounts may vary significantly from our estimates.\n",
"\n",
"## User Geography\n",
"\n",
"Our estimates for revenue by user location, as well as year-over-year percentage changes in ad impressions delivered and the average price per ad by user location, are also affected by data limitations and other challenges in measuring user geography. Our data regarding the geographic location of our users is estimated based on a number of factors, such as the user's IP address and self-disclosed location. These factors may not always accurately reflect the user's actual location. For example, a user may appear to be accessing our products from the location of the proxy server that the user connects to rather than from the user's actual location. The methodologies used to measure our metrics are also susceptible to algorithm or other technical errors.\n",
"\n",
"17"
" 17"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -159,15 +194,10 @@
}
],
"source": [
"display(Markdown(markdown_output))"
"# Join the list elements with newlines to create a single string\n",
"markdown_text = '\\n\\n'.join(markdown_output)\n",
"display(Markdown(markdown_text))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -186,7 +216,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
"version": "3.10.15"
}
},
"nbformat": 4,
24 changes: 13 additions & 11 deletions examples/parse_docx.ipynb
@@ -16,7 +16,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -57,7 +57,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
Expand Down Expand Up @@ -87,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -108,7 +108,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"metadata": {},
"outputs": [
{
@@ -147,26 +147,28 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"## Test document\n",
"\n",
"## Here is an example chart:\n",
"Here is an example chart:\n",
"\n",
" \n",
"| Investor Metrics | FY23 Q1 | FY23 Q2 | FY23 Q3 | FY23 Q4 | FY24 Q1 |\n",
"|------------------|---------|---------|---------|---------|---------|\n",
"|---|---|---|---|---|---|\n",
"| Office Commercial products and cloud services revenue growth (y/y) | 7% / 13% | 7% / 14% | 13% / 17% | 12% / 14% | 15% / 14% |\n",
"| Office Consumer products and cloud services revenue growth (y/y) | 7% / 11% | (2)% / 3% | 1% / 4% | 3% / 6% | 3% / 4% |\n",
"| Office 365 Commercial seat growth (y/y) | 14% | 12% | 11% | 11% | 10% |\n",
"| Microsoft 365 Consumer subscribers (in millions) | 65.1 | 67.7 | 70.8 | 74.9 | 76.7 |\n",
"| Dynamics products and cloud services revenue growth (y/y) | 15% / 22% | 13% / 20% | 17% / 21% | 19% / 21% | 22% / 21% |\n",
"| LinkedIn revenue growth (y/y) | 17% / 21% | 10% / 14% | 8% / 11% | 6% / 8% | 8% |\n",
"\n",
"Growth rates include non-GAAP CC growth (GAP % / CC %)\n",
"Growth rates include non-GAAP CC growth (GAAP % / CC %)\n",
" \n",
"\n",
"Done."
],
@@ -181,7 +183,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Time Elapsed: 3.83 seconds\n"
"Time Elapsed: 9.20 seconds\n"
]
}
],
@@ -191,7 +193,7 @@
"# extract returns a tuple containing the markdown as a string and total time\n",
"markdown_string, total_time = ap.parse(example_local_file)\n",
"\n",
"display(Markdown(markdown_string))\n",
"display(Markdown(markdown_string[0]))\n",
"print(total_time)"
]
},
@@ -225,7 +227,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
"version": "3.10.15"
}
},
"nbformat": 4,
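
Both notebooks in this diff now treat the parse result as a sequence of Markdown strings: `async_parse_pdf2.ipynb` joins `markdown_output` with blank lines, while `parse_docx.ipynb` displays only `markdown_string[0]`. A minimal sketch of a shared helper follows, assuming the result is either a single string or a list of strings; `to_markdown_text` is a hypothetical name and is not part of `any_parser`.

```python
from typing import List, Union


def to_markdown_text(parsed: Union[str, List[str]], separator: str = "\n\n") -> str:
    """Normalize a parse result to one Markdown string.

    Hypothetical helper: accepts a plain string (returned unchanged) or a
    list of strings (joined with a blank line between elements).
    """
    if isinstance(parsed, str):
        return parsed
    return separator.join(parsed)
```

With such a helper, either notebook could call `display(Markdown(to_markdown_text(result)))` regardless of which shape the result takes, and a multi-element result would not be truncated to its first element.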