BUG: read_parquet generates a memory allocation error #729

Open
BlackArbsCEO opened this issue Oct 2, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@BlackArbsCEO

Describe the bug

A clear and concise description of what the bug is.

When I try to read a large Parquet file using pd.read_parquet('my_large_file.pqt'), it generates the stack trace below. I know the file fits in memory because pandas can read it, albeit slowly. The files are between 1.5 GB and 4.5 GB in size.

2023-10-02 14:15:39,732 xorbits._mars.deploy.oscar.local 25232 WARNING  Web service started at http://127.0.0.1:59977
  0%|          |   0.00/100 [00:00<?, ?it/s]
2023-10-02 14:15:39,929 xorbits._mars.services.scheduling.worker.execution 25232 ERROR    Failed to run subtask eDnnloDuO0VyUkOf3tneQUCY on band numa-0
Traceback (most recent call last):
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 494, in internal_run_subtask
    subtask_info.result = await self._retry_run_subtask(
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 618, in _retry_run_subtask
    return await _retry_run(subtask, subtask_info, _run_subtask_once)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 192, in _retry_run
    raise ex
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 154, in _retry_run
    return await target_async_func(*args)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 527, in _run_subtask_once
    await quota_ref.request_batch_quota(batch_quota_req)
  File "xoscar\\core.pyx", line 284, in __pyx_actor_method_wrapper
  File "xoscar\\core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\quota.py", line 119, in request_batch_quota
    raise ValueError(
ValueError: Cannot allocate quota size 19629902034.0 larger than total capacity 13668492902.
2023-10-02 14:15:39,932 xorbits._mars.services.scheduling.worker.execution 25232 ERROR    Failed to run subtask GeGwp96LjmNNdpVR6oS8x325 on band numa-0
Traceback (most recent call last):
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 494, in internal_run_subtask
    subtask_info.result = await self._retry_run_subtask(
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 618, in _retry_run_subtask
    return await _retry_run(subtask, subtask_info, _run_subtask_once)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 192, in _retry_run
    raise ex
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 154, in _retry_run
    return await target_async_func(*args)
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\execution.py", line 527, in _run_subtask_once
    await quota_ref.request_batch_quota(batch_quota_req)
  File "xoscar\\core.pyx", line 284, in __pyx_actor_method_wrapper
  File "xoscar\\core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
  File "C:\Users\kngka\miniconda3\envs\agd\lib\site-packages\xorbits\_mars\services\scheduling\worker\quota.py", line 119, in request_batch_quota
    raise ValueError(
ValueError: Cannot allocate quota size 26955130596.0 larger than total capacity 13668492902.
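
To put the two ValueError messages in perspective, here is a minimal sketch that parses them and compares the requested quota with the total capacity. It assumes the figures are byte counts, which the messages themselves do not state; `parse_quota_error` and `describe` are hypothetical helper names, not part of Xorbits.

```python
import re

# The two messages copied verbatim from the traceback above.
MESSAGES = [
    "Cannot allocate quota size 19629902034.0 larger than total capacity 13668492902.",
    "Cannot allocate quota size 26955130596.0 larger than total capacity 13668492902.",
]

# Matches the requested size (printed as a float) and the capacity (an integer).
PATTERN = re.compile(r"quota size (\d+(?:\.\d+)?) larger than total capacity (\d+)")

GIB = 1024 ** 3  # assumption: the numbers in the message are bytes


def parse_quota_error(message):
    """Return (requested, capacity) as integers taken from a quota error message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError(f"not a quota error message: {message!r}")
    requested = int(float(match.group(1)))
    capacity = int(match.group(2))
    return requested, capacity


def describe(message):
    """Format the requested size, the capacity, and their ratio for one message."""
    requested, capacity = parse_quota_error(message)
    return (
        f"requested {requested / GIB:.1f} GiB, "
        f"capacity {capacity / GIB:.1f} GiB, "
        f"ratio {requested / capacity:.2f}x"
    )


if __name__ == "__main__":
    for msg in MESSAGES:
        print(describe(msg))
```

Under that assumption, each failing subtask asks for noticeably more than the whole worker quota, so the request is rejected before any data is read.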

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version
    python 3.10
  2. The version of Xorbits you use
    xorbits 0.6.3 pypi_0 pypi
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.

@XprobeBot XprobeBot added the bug Something isn't working label Oct 2, 2023
@XprobeBot XprobeBot added this to the v0.7.0 milestone Oct 2, 2023
@aresnow1
Contributor

aresnow1 commented Oct 7, 2023

Thanks for reporting this. Could you provide the schema of your Parquet file?
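
One way to collect that information without loading the data is to read only the file footer. The following is a rough sketch, assuming pyarrow is installed; `summarize_parquet` is a hypothetical helper name, and the path in the usage comment is the one from the report.

```python
def summarize_parquet(path):
    """Return the schema and basic size metadata of a Parquet file.

    Only the footer metadata is read; no column data is loaded.
    """
    # Imported lazily so this module still loads where pyarrow is absent.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    metadata = parquet_file.metadata

    # total_byte_size is the uncompressed size of the column data in a row group.
    row_group_bytes = [
        metadata.row_group(i).total_byte_size
        for i in range(metadata.num_row_groups)
    ]

    return {
        "schema": str(parquet_file.schema_arrow),
        "num_rows": metadata.num_rows,
        "num_row_groups": metadata.num_row_groups,
        "uncompressed_bytes": sum(row_group_bytes),
    }


# Usage (path taken from the report above):
# info = summarize_parquet("my_large_file.pqt")
# print(info["schema"])
# print(info["num_rows"], info["num_row_groups"], info["uncompressed_bytes"])
```

The uncompressed size is worth including alongside the schema, since a 1.5 GB to 4.5 GB file on disk can expand to considerably more once decoded.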

@XprobeBot XprobeBot modified the milestones: v0.7.0, v0.7.1 Oct 23, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2 Nov 21, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.2, v0.7.3 Jan 5, 2024
@XprobeBot XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024
@luweizheng luweizheng removed this from the v0.7.4 milestone Dec 16, 2024