Is it possible to extract tables that span multiple pages? #531

All-In-Coder · 2025-01-10T10:47:28Z

I have a PDF in which a table is spread across multiple pages. I need it in a single CSV or Excel file.
I have also attached a screenshot of the PDF.

Steps to reproduce the bug

If you run the code below, it extracts the first table correctly, but it does not extract the table below it.

Expected behavior

Both tables should be combined into a single table.

Code

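import os
from typing import Any, Dict

import camelot


# The function name and signature are assumed; the original snippet was the body of an unnamed function.
def extract_tables(pdf_path: str, tables_dir: str) -> Dict[str, Any]:
    extracted_data: Dict[str, Any] = {}
    try:
        tables = camelot.read_pdf(pdf_path, pages="all")  # Extract tables from all pages
    except Exception as e:
        print(f"Error extracting tables from {pdf_path}: {e}")
        return extracted_data

    # Store each table as a CSV file and record its path for the JSON output
    for i, table in enumerate(tables):
        table_filename = f"table_{i + 1}.csv"
        table_path = os.path.join(tables_dir, table_filename)
        table.to_csv(table_path, index=False)  # store as CSV
        extracted_data[f"table_{i + 1}"] = table_path
    return extracted_data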
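
To illustrate the expected result, here is a minimal sketch of a possible workaround that stitches the per-table CSV files written above into one file. It is not a camelot feature: `merge_csv_parts` is a hypothetical helper that uses only the standard library, and it assumes that every piece belongs to the same logical table, that all pieces share the same column layout, and that a continuation page may repeat the header row verbatim.

```python
import csv
from typing import List, Optional, Sequence


def merge_csv_parts(part_paths: Sequence[str], merged_path: str) -> int:
    """Concatenate CSV pieces into one file and return the number of rows written.

    Assumption: every piece belongs to the same logical table and has the
    same column layout. The first row of the first non-empty piece is treated
    as the header; an identical first row in a later piece is dropped.
    """
    first_row: Optional[List[str]] = None
    rows_written = 0
    with open(merged_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file, quoting=csv.QUOTE_ALL)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row_index, row in enumerate(csv.reader(in_file)):
                    if first_row is None:
                        first_row = row  # remember the header of the first piece
                    elif row_index == 0 and row == first_row:
                        continue  # repeated header on a continuation page
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

With the snippet above, a call such as `merge_csv_parts(list(extracted_data.values()), os.path.join(tables_dir, "merged.csv"))` would merge every extracted table, so it only gives a sensible result when the PDF contains a single table split across pages.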

PDF

Screenshots

Environment

OS: [e.g. macOS]
Python version:
Numpy version:
OpenCV version:
Ghostscript version:
camelot version:

Additional context

All-In-Coder added the bug (Something isn't working) label on Jan 10, 2025
