Dear all,

Thank you for this very useful package.

I don't really know where to put this question, and it is not really an issue. I am using PyFixest on a very large dataset. It seems that when I run pf.feols several times and store the outputs in a list, memory usage increases substantially. I suspect that the dataset I use is being copied and stored in the list each time I append a new pf.feols object.

I would like to know:

- Is that actually the case, or is the increase in size only due to storing the fixed effects?
- Is there an option to shrink the pf.feols output so that it keeps only the information relevant for a regression table?
Best,
Swan
Hi Swan - I've had the exact same problem before, which is why there are two function arguments for feols and fepois that let you configure how the data objects are stored in the Feols / Fepois objects:
store_data: Whether to store the data in the model object; True by default. If set to False, the data is not stored in the model object, which can improve performance and save memory. However, it will no longer be possible to access the data via the data attribute of the model object. This has an impact on post-estimation capabilities that rely on the data, e.g. predict() or vcov().
By setting store_data = False, you lose the ability to update the vcov matrix post-estimation - the reason is that you cannot select a cluster variable that is not part of the initial model if the data is not stored in full. Additionally, it is not possible to call predict() when estimating with fixed effects, as the fixef() method needs access to the input data. etable() will still work - I think this is the option you want.
copy_data lets you opt out of making an internal copy of the input data - this might help with memory as well, but the data might then be changed outside of the function call if it is modified inside the call (which currently shouldn't be the case, but there are no guarantees).
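The trade-off the two flags encode can be illustrated in plain Python. This is a toy result class, not pyfixest's actual implementation, but it shows why storing a private copy of the data in every result object multiplies memory, why a shared reference is cheap but carries a mutation risk, and why not storing the data at all is the smallest option:

```python
import copy

class ToyFit:
    """Toy regression-result container illustrating the store_data /
    copy_data trade-off (hypothetical class, not pyfixest's Feols)."""

    def __init__(self, data, store_data=True, copy_data=True):
        # Stand-in for the actual estimation output.
        self.coef = sum(data) / len(data)
        if not store_data:
            self._data = None                 # smallest object; data-dependent
                                              # post-estimation would fail
        elif copy_data:
            self._data = copy.deepcopy(data)  # safe, but duplicates the data
                                              # once per stored fit
        else:
            self._data = data                 # shared reference: no extra memory,
                                              # but mutations leak across objects

data = list(range(100_000))

full = ToyFit(data)                        # default: private copy of the data
shared = ToyFit(data, copy_data=False)     # reference only
slim = ToyFit(data, store_data=False)      # nothing stored

print(full._data is data)    # False: independent copy, extra memory per fit
print(shared._data is data)  # True: same object shared with the caller
print(slim._data is None)    # True: only the estimation results remain
```

Appending many `full`-style objects to a list is what blows up memory in the scenario above; `slim`-style objects keep only the coefficients, which is all a regression table needs.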
Generally, I am wondering if this is a design flaw and whether we should change the default so that the data objects are not stored in the Feols object at all?
Thank you for this fast answer. It really fits my needs! Since my main use of PyFixest is running feols on large datasets, I think not storing the data would be a better default!