
vector memory exhausted issue #31

Open

leqi0001 opened this issue Jan 7, 2023 · 1 comment

@leqi0001 commented Jan 7, 2023

Hi,

Thanks for developing this package!

I'm following the vignette and trying to run glmmSeq on a relatively small dataset (26 samples × 20k genes). The 26 samples form 13 pairs, modelled as a random effect (1|individual). With the model ~disease+(1|condition)+covar1+covar2+covar3+covar4, R gives me Error: cannot allocate vector of size 6223.5 Gb. It runs fine if I remove one fixed-effect variable. It wouldn't run on an HPC either, and I suppose no machine can handle a vector of this size.
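Roughly, the call I'm running looks like this, following the vignette (the object names counts, meta and disp are placeholders for my own objects):

results <- glmmSeq(~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4,
                   countdata = counts, metadata = meta, dispersion = disp)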

@myles-lewis (Owner) commented

Hi leqi0001,

Thanks. I haven't seen this error before. I suggest you try to isolate the issue as follows (a sketch is shown after this list):

  1. Take a column of data from just 1 gene
  2. Apply log2(x + 1) so the values are closer to Gaussian
  3. Add your metadata
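As a minimal sketch of those steps, assuming a counts matrix counts (genes × samples) and a metadata data frame metadata with one row per sample (both object names, and the gene ID "GENE1", are placeholders):

data <- metadata
data$gene <- log2(as.numeric(counts["GENE1", ]) + 1)   # one gene, log2(x + 1) transform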

Fit your model using:

fit <- lme4::lmer(formula, data)

where your formula is of the form:

gene ~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4

Examine the result using summary(fit).
See if this works on a single gene. If it does, then move on to the negative binomial model:

fit <- lme4::glmer(formula, data, family = MASS::negative.binomial(theta = 1/disp))

Try fixing the dispersion disp to a simple value, e.g. 1, which makes the model simpler as it is then essentially a Poisson model. This time you'll need to provide count data, not Gaussian data: count ~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4
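Continuing the sketch above, with the dispersion fixed at 1 (note the gene column must now hold raw counts rather than the log2-transformed values):

data$gene <- as.numeric(counts["GENE1", ])   # raw counts for the same gene
fit <- lme4::glmer(gene ~ disease + (1|condition) + covar1 + covar2 + covar3 + covar4,
                   data = data, family = MASS::negative.binomial(theta = 1))
summary(fit)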

This way you will find out whether a mixed model of such a magnitude is feasible.

I suspect the model is too large. Mixed models get big quickly because in essence there's a regression for each 'individual' or random effect level.

Best,
Myles
