Coefficient Distribution and Confidence Intervals #280

AmauryVVK · 2024-12-19T11:12:45Z

Hello DoubleML team,

I have a question about the distribution of the coefficient residuals when running multiple repetitions. As explained in your basics documentation, the residuals are expected to follow a normal distribution. However, the examples generate a new sample at each repetition, and I’m wondering whether this also applies to real-life (fixed-size) datasets.

To test this, I applied the same approach to the 401(k) dataset, except that I sample from the same original dataset at each repetition. I observe that the distribution of residuals narrows as the sample size increases (whether I use PLR or IRM).

Similarly, if I run a model with n_rep > 1 and then compare the reported confidence intervals with the observed quantiles of the per-repetition estimates (i.e. using .all_coef), the observed quantiles are narrower than the calculated CIs.
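
A minimal sketch of this kind of comparison, assuming the per-repetition estimates have already been extracted into a NumPy array (for example from .all_coef). The function name `compare_ci_with_quantiles` and the numbers in the usage lines are hypothetical placeholders, not part of the DoubleML API and not results from the 401(k) data:

```python
from statistics import NormalDist

import numpy as np


def compare_ci_with_quantiles(all_coef, coef, se, level=0.95):
    """Return (normal-based CI, empirical quantile interval) for one coefficient.

    all_coef: 1-D array of per-repetition estimates (hypothetical input).
    coef, se: aggregated point estimate and its standard error.
    """
    alpha = 1.0 - level
    # Two-sided standard normal quantile for the requested level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci = (coef - z * se, coef + z * se)
    # Empirical quantiles of the estimates obtained across repetitions
    lower, upper = np.quantile(np.asarray(all_coef), [alpha / 2.0, 1.0 - alpha / 2.0])
    return ci, (lower, upper)


# Made-up numbers purely for illustration: estimates that differ only slightly
# between repetitions, combined with a much larger standard error.
rng = np.random.default_rng(0)
reps = 1.0 + 0.01 * rng.standard_normal(100)
ci, q = compare_ci_with_quantiles(reps, coef=np.median(reps), se=0.1)
print("normal-based CI:    ", ci)
print("empirical quantiles:", q)
```

With inputs like these, the empirical quantile interval sits well inside the normal-based interval, which is the pattern described above.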

If building confidence intervals requires independent data samples, what is the purpose of using n_rep > 1?

Thanks in advance

SvenKlaassen (Maintainer) · 2024-12-28T07:13:25Z

Dear @AmauryVVK,

Thank you for your question and observations! Let me clarify:

The primary purpose of a confidence interval is to quantify the uncertainty around a point estimate. In the package, confidence intervals are constructed under the assumption of independent sampling, so they reflect how much the estimate would vary across new samples. With a fixed dataset, the observed quantiles tend to be narrower because they do not account for this sampling variability. When the same dataset is reused (as shown in the right figure), the point estimate reflects the fixed dataset and varies mainly because of randomness in the learners, the sample splits, etc.

Even with a fixed dataset, multiple repetitions help stabilize the estimate by averaging out the variability introduced by different sample splits. They also give a sense of robustness, although they cannot replicate the variability observed under independent sampling.
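
The difference between the two sources of variability can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses NumPy alone, a toy partially linear model with linear nuisance functions, and a hand-written cross-fitted partialling-out estimator. The helpers `make_data`, `ols_predict` and `cross_fit_theta` are hypothetical names for this illustration and are not the DoubleML implementation:

```python
import numpy as np


def make_data(n, rng, theta=0.5):
    """Simulate a toy partially linear model: y = theta * d + g(X) + noise."""
    X = rng.standard_normal((n, 5))
    d = X @ np.array([1.0, 0.5, 0.0, -0.5, 0.25]) + rng.standard_normal(n)
    y = theta * d + X @ np.array([0.5, -1.0, 0.25, 0.0, 1.0]) + rng.standard_normal(n)
    return y, d, X


def ols_predict(X_train, t_train, X_test):
    """Fit OLS with an intercept on the training part, predict on the test part."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, t_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta


def cross_fit_theta(y, d, X, rng, n_folds=2):
    """Cross-fitted partialling-out estimate of theta for one random sample split."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    u_hat = np.empty(len(y))  # out-of-fold residuals of y given X
    v_hat = np.empty(len(y))  # out-of-fold residuals of d given X
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        u_hat[test] = y[test] - ols_predict(X[train], y[train], X[test])
        v_hat[test] = d[test] - ols_predict(X[train], d[train], X[test])
    return (v_hat @ u_hat) / (v_hat @ v_hat)


rng = np.random.default_rng(42)
n, n_rep = 500, 200

# (a) One fixed dataset, a new random sample split in every repetition
y, d, X = make_data(n, rng)
fixed = np.array([cross_fit_theta(y, d, X, rng) for _ in range(n_rep)])

# (b) A freshly drawn dataset in every repetition
fresh = np.array([cross_fit_theta(*make_data(n, rng), rng) for _ in range(n_rep)])

print("std across splits, fixed data:", fixed.std())
print("std across independent samples:", fresh.std())
```

In this toy setting the spread in (a) comes only from the sample splits, while the spread in (b) also contains the sampling variability that the confidence intervals are meant to capture. With real learners such as random forests or boosting, the split-induced spread would typically be larger than with OLS, but the qualitative picture should be similar.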

Best regards
