BUG: Integrated pandas can't Read CSV while latest pandas can #730

Open
charliedream1 opened this issue Oct 3, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@charliedream1

Describe the bug

  • Problem 1: The latest pandas loads a 25 GB CSV file properly, but "import xorbits.pandas as pd" can't; Xorbits raises an EOF error.
  • Problem 2: A DataFrame loaded with the latest pandas can't be passed to the dedup function (from xorbits.experimental import dedup).
  • Problem 3: The dedup function can't handle a column containing very long strings, e.g. lengths between 4,000 and 100,000 characters; it fails with the error "too many open files".

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version: 3.10
  2. The version of Xorbits you use: 0.6.3
  3. Versions of crucial packages, such as numpy, scipy and pandas: numpy 1.26.0, scipy 1.11.3, pandas 2.1.1
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

@XprobeBot XprobeBot added the bug Something isn't working label Oct 3, 2023
@XprobeBot XprobeBot added this to the v0.7.0 milestone Oct 3, 2023
@codingl2k1
Contributor

Problem 1: Is your CSV file located on a local disk or at a remote location (accessed by a URL)?
Problem 2: Are you loading the CSV with pandas and then constructing an Xorbits DataFrame from the pandas DataFrame? If so, it could be an out-of-memory crash, because the full data will be serialized to the worker.
Problem 3: The "too many open files" error can be fixed by configuring the ulimit.
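
For Problem 3, the open-file limit can also be inspected and raised from inside Python rather than with the shell's ulimit. The following is only a minimal sketch using the standard-library resource module (Unix only). The helper name raise_open_file_limit and the target of 65536 are illustrative assumptions, not part of Xorbits, and whether this is enough for dedup depends on the hard limit set by the system administrator.

```python
import resource


def raise_open_file_limit(target=65536):
    """Raise the soft RLIMIT_NOFILE toward `target` (hypothetical helper).

    The soft limit is never lowered and never pushed past the hard limit.
    Returns the (soft, hard) pair in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise.
        return soft, hard
    # An unprivileged process may raise its soft limit only up to the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # Some platforms enforce an additional kernel cap; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_open_file_limit())
```

Resource limits are inherited by child processes, so the call would need to happen before any worker processes are started from the same Python session. The shell equivalent is to run ulimit -n with a larger value before launching Python.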

@XprobeBot XprobeBot modified the milestones: v0.7.0, v0.7.1 Oct 23, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2 Nov 21, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.2, v0.7.3 Jan 5, 2024
@XprobeBot XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024
@luweizheng luweizheng removed this from the v0.7.4 milestone Dec 16, 2024