no skip_rows option for Excel files #19161

TiburonEl · 2024-10-09T09:44:13Z

Description

I'm having trouble importing an Excel file where the first few rows contain merged cells. I considered skipping these rows during import, but unfortunately, there is no skip_rows option available for Excel files in Polars (only for CSV files).

Is there a way to skip these rows? If not, could this functionality be added in a future release?

cmdlineluser · 2024-10-09T10:18:43Z

read_options={"skip_rows": ...} will get passed to the underlying engine¹.

¹ http://fastexcel.toucantoco.dev/fastexcel.html#ExcelReader.load_sheet_by_name

TiburonEl · 2024-10-09T11:57:15Z

From what I understand, this parameter can only be an integer, meaning I can only skip a block of rows at the start. What if I want to skip several specific rows? For example, rows 1, 2, and 5.

avimallu · 2024-10-10T12:42:59Z

skip_rows is typically intended to ignore a certain number of rows before reading a full table in, because of the Excel file author's formatting preferences.

The operation you're describing is more of a filter, so I think you should use something like .filter(~pl.int_range(0, pl.len()).is_in(rows_to_skip)).

anapaulagomes · 2024-12-18T02:13:38Z

I'd love to work on this issue.

anapaulagomes · 2024-12-18T02:36:26Z

This is what worked for me by testing the proposed solutions:

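import polars as pl

# with read_options: use row 6 as the header and ignore the rows above it
land = pl.read_excel("data/DTB_2022/RELATORIO_DTB_BRASIL_MUNICIPIO.xls", read_options={"header_row": 6})

# without read_options: read everything, then keep only rows with index >= 5
land = pl.read_excel("data/DTB_2022/RELATORIO_DTB_BRASIL_MUNICIPIO.xls")
land = land.filter(~pl.int_range(0, pl.len()).lt(5))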

I used header_row because it skips the rows above it and uses the given row as the header.

TiburonEl added the enhancement New feature or an improvement of an existing feature label Oct 9, 2024

PrettyWood mentioned this issue Oct 14, 2024: support callable for skip_rows ToucanToco/fastexcel#302 (Open)

alexander-beedie added the A-io-spreadsheet Area: reading/writing Excel/ODS files label Oct 19, 2024
