Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Implement multiple header_rows #166

Open
deanm0000 opened this issue Feb 1, 2024 · 3 comments
Open

Implement multiple header_rows #166

deanm0000 opened this issue Feb 1, 2024 · 3 comments

Comments

@deanm0000
Copy link
Contributor

I find that people will often have Excel sheets where they use multiple header rows to make up their column names.

Here's a snippet of how I deal with that coming from a df generated with python_calamine

header_merge_char="_" # for example
header_rows = 3 # again, for example
df.columns=[
    header_merge_char.join([y for y in x if y != ""])
    for x in zip(*[df.rows()[x] for x in range(header_rows)])
]

this is just a snippet and doesn't handle duplicate column names but that's a separate issue.

A more advanced version of this might infer the header_rows by skipping down (let's say) 10 rows and look for types starting there. Then choose a column which isn't a string and go back to the true row=0 and see how many rows down it needs to go before it no longer sees strings. Then that's the inferred header_rows

@deanm0000
Copy link
Contributor Author

In case it helps, I made this function using python-calamine which does header_row detecting

@PrettyWood
Copy link
Member

Hello @deanm0000
@lukapeschke and I are considering adding this to v0.12.0
Do you have a MRE with an excel file + expected output?
Thanks

@deanm0000
Copy link
Contributor Author

example.xlsx

Here's an attached excel file.

I'd expect the output to be (assuming polars)

pl.DataFrame({
    'index':[1,2,3,4,5,6],
    'Red_apple':[3,10,3,5,0,7],
    'Red_banana':[2,4,0,5,4,6],
    'Red_carrot':[7,9,10,5,1,5,],
    'Blue_apple':[6,4,0,3,2,5],
    'Blue_banana':[8,1,7,0,10,4],
    'Blue_carrot':[0,10,5,8,1,10]
})
shape: (6, 7)
┌───────┬───────────┬────────────┬────────────┬────────────┬─────────────┬─────────────┐
│ index ┆ Red_apple ┆ Red_banana ┆ Red_carrot ┆ Blue_apple ┆ Blue_banana ┆ Blue_carrot │
│ ---   ┆ ---       ┆ ---        ┆ ---        ┆ ---        ┆ ---         ┆ ---         │
│ i64   ┆ i64       ┆ i64        ┆ i64        ┆ i64        ┆ i64         ┆ i64         │
╞═══════╪═══════════╪════════════╪════════════╪════════════╪═════════════╪═════════════╡
│ 1     ┆ 3         ┆ 2          ┆ 7          ┆ 6          ┆ 8           ┆ 0           │
│ 2     ┆ 10        ┆ 4          ┆ 9          ┆ 4          ┆ 1           ┆ 10          │
│ 3     ┆ 3         ┆ 0          ┆ 10         ┆ 0          ┆ 7           ┆ 5           │
│ 4     ┆ 5         ┆ 5          ┆ 5          ┆ 3          ┆ 0           ┆ 8           │
│ 5     ┆ 0         ┆ 4          ┆ 1          ┆ 2          ┆ 10          ┆ 1           │
│ 6     ┆ 7         ┆ 6          ┆ 5          ┆ 5          ┆ 4           ┆ 10          │

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

3 participants