MemoryError with more than 1E9 rows #8252
You can try creating the Series separately first (one for each column), then putting them into a dict and building the frame from that. However, you might be having a problem finding contiguous memory.
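A minimal sketch of that suggestion, assuming float64 columns with made-up names (the original code isn't shown in this thread excerpt):

```python
import numpy as np
import pandas as pd

N = 10**7  # scale this up toward 1.5e9 on a large-memory machine

# Build each column as its own Series first; the allocator then only
# needs one column's worth of contiguous memory at a time.
cols = {}
for name in ['a', 'b', 'c']:
    cols[name] = pd.Series(np.random.randn(N))

# Assemble the frame from the dict of Series.
df = pd.DataFrame(cols)
```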
Finally had time to look at this. I think there was an extra copy going on in certain cases, so try this out using master (once I merge this change). It seems to scale much better with a slightly modified version of the code.
@mattdowle I updated the example to give a pretty simplified version that gets pretty good memory performance (e.g. just a bit over 1x the final data size) by not trying to create everything at once.
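The updated example itself isn't reproduced in this excerpt; a rough sketch of the "don't create everything at once" idea (column names, dtypes, and sizes here are assumptions) could look like:

```python
import numpy as np
import pandas as pd

N = 10**7  # scale toward 1.5e9 as memory allows

# Start from an empty frame and add one column at a time, so peak
# usage stays near the final frame size plus one temporary column,
# rather than holding all source arrays and a copied frame at once.
df = pd.DataFrame(index=np.arange(N))
for name in ['a', 'b', 'c']:
    df[name] = np.random.randn(N)
```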
I have 240GB of RAM. Nothing else running on the machine. I'm trying to create 1.5E9 rows, which I think should create a data frame of around 100GB, but I'm getting a MemoryError. This works fine with 1E9 but not 1.5E9. I could understand a limit at about 2^31 (2E9) or 2^32 (4E9), but all 240GB seems exhausted (according to htop) at somewhere between 1E9 and 1.5E9 rows. Any ideas? Thanks.
An earlier question on S.O. is here: http://stackoverflow.com/questions/25631076/is-this-the-fastest-way-to-group-in-pandas
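For reference, a back-of-the-envelope size check (the column count here is an assumption; the issue only quotes the ~100GB total):

```python
n_rows = int(1.5e9)
n_cols = 8           # assumed; the issue doesn't state the column count
bytes_per_value = 8  # float64 / int64

est_gb = n_rows * n_cols * bytes_per_value / 1e9
print(f"estimated frame size: {est_gb:.0f} GB")  # ~96 GB, in line with ~100GB
```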