Panic when printing a Struct of Objects #15237

Closed
2 tasks done
douglas-raillard-arm opened this issue Mar 22, 2024 · 3 comments · Fixed by #20680
Labels
A-dtype-object (Area: object data type), A-dtype-struct (Area: struct data type), A-panic (Area: code that results in panic exceptions), bug (Something isn't working), P-medium (Priority: medium), python (Related to Python Polars)

Comments

@douglas-raillard-arm
Contributor

douglas-raillard-arm commented Mar 22, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
pl.DataFrame(dict(a=[1], schema={'a': pl.Duration()}))

Log output

>>> POLARS_VERBOSE=1 python3
Python 3.11.7 (main, Dec  8 2023, 18:56:57) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> pl.DataFrame(dict(a=[1], schema={'a': pl.Duration()}))
[1]    2396717 segmentation fault (core dumped)  POLARS_VERBOSE=1 python3

Issue description

Creating the DataFrame segfaults when a schema that uses pl.Duration() is specified.

Expected behavior

Something other than a segfault.

Installed versions

--------Version info---------
Polars:               0.20.16
Index type:           UInt32
Platform:             Linux-5.15.0-92-generic-x86_64-with-glibc2.31
Python:               3.11.7 (main, Dec  8 2023, 18:56:57) [GCC 9.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.3
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

@douglas-raillard-arm douglas-raillard-arm added bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels Mar 22, 2024
@mcrumiller
Contributor

You are not calling pl.DataFrame correctly: you put your schema inside your dict. So what you are doing is equivalent to:

import polars as pl

pl.DataFrame({
    "a": [1],
    "schema": {"a": pl.Duration()},
})

However, this still segfaults, which is bad. The underlying problem is that creating a struct column whose elements are dtypes crashes:

import polars as pl

df = pl.DataFrame({
    "a": [pl.Int32, pl.UInt8],
})
df.select(pl.struct("a"))  # segmentation fault
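
On the misplaced schema argument described above: the effect of the parenthesis placement can be checked in plain Python, without Polars. The sketch below is only an illustration. `describe_call` is a hypothetical stand-in for a constructor that takes `data` and `schema`, and the string `"Duration"` is a placeholder for `pl.Duration()`.

```python
def describe_call(data=None, schema=None):
    """Hypothetical stand-in for a constructor with `data` and `schema` parameters."""
    return {"columns": sorted(data), "schema": schema}


DURATION = "Duration"  # placeholder for pl.Duration(); an assumption for illustration

# As written in the report: `schema` is a key of the dict, so it becomes a column.
misplaced = describe_call(dict(a=[1], schema={"a": DURATION}))

# Presumably intended: the dict is closed first and `schema` is a keyword argument.
intended = describe_call(dict(a=[1]), schema={"a": DURATION})

print(misplaced)
print(intended)
```

In the first call the constructor never receives a schema at all; it only sees a second column named "schema" whose single value is a dict containing a dtype.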

@douglas-raillard-arm
Contributor Author

Indeed, that probably explains why no-one else had reported something similar :)

@stinodego stinodego added P-low (Priority: low) and removed needs triage (Awaiting prioritization by a maintainer) labels Mar 22, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Mar 22, 2024
@stinodego stinodego added the A-input-parsing (Area: parsing input arguments) label Mar 22, 2024
@orlp
Collaborator

orlp commented Apr 11, 2024

An even simpler reproduction:

import polars as pl

obj = object()
df = pl.DataFrame({"a": [obj]})
struct = df.select(pl.struct("a"))
print(struct) # segfault

@orlp orlp added P-high (Priority: high) and removed P-low (Priority: low) labels Apr 11, 2024
@stinodego stinodego added A-dtype-struct (Area: struct data type) and A-dtype-object (Area: object data type) and removed A-input-parsing (Area: parsing input arguments) labels Apr 11, 2024
@ritchie46 ritchie46 changed the title from "pl.DataFrame(schema={...: pl.Duration()}) segfaults" to "pl.DataFrame(schema={...: pl.Duration()}) panics" Apr 15, 2024
@ritchie46 ritchie46 added P-low (Priority: low) and removed P-high (Priority: high) labels Apr 15, 2024
@stinodego stinodego changed the title from "pl.DataFrame(schema={...: pl.Duration()}) panics" to "Segfault when using a Struct of Objects" Apr 15, 2024
@stinodego stinodego changed the title from "Segfault when using a Struct of Objects" to "Panic when printing a Struct of Objects" May 25, 2024
@stinodego stinodego added the A-panic (Area: code that results in panic exceptions) label Jun 17, 2024
@stinodego stinodego added P-medium (Priority: medium) and removed P-low (Priority: low) labels Jul 3, 2024
@github-project-automation github-project-automation bot moved this from Ready to Done in Backlog Jan 13, 2025
Projects
Status: Done
5 participants