Dear all,

Thank you for this very useful package.

I don't really know where to put this question, and it is not really an issue. I am using PyFixest on a very large dataset. It seems that when I run pf.feols several times and store the outputs in a list, memory usage increases substantially. I suspect that the dataset I use is being copied and stored in the list each time I append a new pf.feols object.

I would like to know:

- Is that actually the case, or is the increase in size only due to storing the fixed effects?
- Is there an option to shrink the pf.feols output so that it keeps only the information relevant for a regression table?
Best,
Swan
Hi Swan - I've had the exact same problem before, which is why there are two function arguments for feols and fepois that let you configure how the data objects are stored in the Feols / Fepois objects:
store_data: Whether to store the data in the model object; True by default. If set to False, the data is not stored in the model object, which can improve performance and save memory. However, it will no longer be possible to access the data via the data attribute of the model object. This has an impact on post-estimation capabilities that rely on the data, e.g. predict() or vcov().
By setting store_data = False, you lose the ability to update the vcov matrix post-estimation - the reason is that you cannot select a cluster variable that is not part of the initial model if the data is not stored in full. Additionally, it is not possible to call predict() when estimating with fixed effects, as the fixef() method needs access to the input data. etable() will still work - I think this is the option you want.
copy_data lets you opt out of making an internal copy of the input data - this might help with memory as well, but the data might then be changed outside of the function call if it is modified inside the call (which currently shouldn't be the case, but there are no guarantees).
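The trade-off the two flags encode can be illustrated in plain Python. This is a toy result class, not pyfixest's actual implementation, but it shows why storing a private copy of the data in every result object multiplies memory, why a shared reference is cheap but carries a mutation risk, and why not storing the data at all is the smallest option:

```python
import copy

class ToyFit:
    """Toy regression-result container illustrating the store_data /
    copy_data trade-off (hypothetical class, not pyfixest's Feols)."""

    def __init__(self, data, store_data=True, copy_data=True):
        # Stand-in for the actual estimation output.
        self.coef = sum(data) / len(data)
        if not store_data:
            self._data = None                 # smallest object; data-dependent
                                              # post-estimation would fail
        elif copy_data:
            self._data = copy.deepcopy(data)  # safe, but duplicates the data
                                              # once per stored fit
        else:
            self._data = data                 # shared reference: no extra memory,
                                              # but mutations leak across objects

data = list(range(100_000))

full = ToyFit(data)                        # default: private copy of the data
shared = ToyFit(data, copy_data=False)     # reference only
slim = ToyFit(data, store_data=False)      # nothing stored

print(full._data is data)    # False: independent copy, extra memory per fit
print(shared._data is data)  # True: same object shared with the caller
print(slim._data is None)    # True: only the estimation results remain
```

Appending many `full`-style objects to a list is what blows up memory in the scenario above; `slim`-style objects keep only the coefficients, which is all a regression table needs.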
Generally, I am wondering if this is a design flaw and whether we should change the default so that the data objects are not stored in the Feols object at all?
Thank you for this fast answer. It really fits my needs! Since my main use of PyFixest is running feols on large datasets, I think not storing the data would be a better default!