[BUG]: function power_prune() does not behave correctly when SNPs in a given outcome-study subset have varying sample sizes #557

Open
phageghost opened this issue Aug 29, 2024 · 0 comments
Please make sure that this is a bug! If you have questions about how to use TwoSampleMR please use the Discussions function instead.

Describe the bug (required)

When power_prune() is run with method = 2 and dist.outcome = "continuous", an outcome-study subset whose SNPs have differing samplesize values causes a length mismatch: the iv.se vector has fewer elements than the intermediate data.frame has rows, so assigning it as a column of that data.frame fails.
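As a quick diagnostic, the following sketch lists outcome studies whose SNPs report more than one distinct sample size. It runs on a made-up toy data frame (the study IDs and sample sizes are hypothetical); it assumes only that harmonised data carries the id.outcome and samplesize.outcome columns, and the same tapply() call could be pointed at a real harmonised dat.

```r
# Toy stand-in for harmonised data; IDs and sample sizes are invented for illustration.
toy_dat <- data.frame(
  id.outcome = c("study-a", "study-a", "study-a", "study-b", "study-b"),
  samplesize.outcome = c(1000, 1000, 1200, 5000, 5000)
)

# Count the distinct sample sizes reported within each outcome study.
n_distinct_sizes <- tapply(
  toy_dat$samplesize.outcome,
  toy_dat$id.outcome,
  function(x) length(unique(x))
)

# Studies with more than one distinct value are candidates for triggering the mismatch.
varying <- names(n_distinct_sizes)[n_distinct_sizes > 1]
print(varying)
```

Any study returned here has SNPs with varying sample sizes, which is the condition this report describes.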

Describe the current behaviour you observe (required)

library(TwoSampleMR)

# JWT_TOKEN must hold a valid OpenGWAS JSON web token
bmi_exp_dat <- extract_instruments(outcomes = "ieu-a-2", opengwas_jwt = JWT_TOKEN)
ao <- available_outcomes(opengwas_jwt = JWT_TOKEN)
# Select every study whose trait is coronary heart disease
chd_studies <- subset(ao, trait == "Coronary heart disease")
chd_out_dat <- extract_outcome_data(snps = bmi_exp_dat$SNP, outcomes = chd_studies$id, opengwas_jwt = JWT_TOKEN)
dat <- harmonise_data(
  exposure_dat = bmi_exp_dat,
  outcome_dat = chd_out_dat
)
# This call fails with the error shown in the output that follows
dat <- power_prune(dat, method = 2, dist.outcome = "continuous")

Extracting data for 79 SNP(s) from 5 GWAS(s)

Finding proxies for 10 SNPs in outcome ieu-a-9

Extracting data for 10 SNP(s) from 1 GWAS(s)

Finding proxies for 47 SNPs in outcome ieu-a-6

Extracting data for 47 SNP(s) from 1 GWAS(s)

Finding proxies for 1 SNPs in outcome ebi-a-GCST000998

Extracting data for 1 SNP(s) from 1 GWAS(s)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ebi-a-GCST000998 (ebi-a-GCST000998)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-6 (ieu-a-6)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-8 (ieu-a-8)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-9 (ieu-a-9)

[1] 1
[1] "Body mass index Coronary heart disease"
[1] "identifying best powered summary set: Body mass index || id:ieu-a-2 ieu-a-2 Coronary heart disease || id:ebi-a-GCST000998 ebi-a-GCST000998"
[1] "identifying best powered summary set: Body mass index || id:ieu-a-2 ieu-a-2 Coronary heart disease || id:ieu-a-6 ieu-a-6"
Error in $<-.data.frame(*tmp*, "iv.se", value = c(0.0513059534662941, : replacement has 47 rows, data has 61
Traceback:

  1. power_prune(dat, method = 2, dist.outcome = "continuous")
  2. $<-(*tmp*, "iv.se", value = c(0.0513059534662941, 0.0514040302606308,
    . 0.0513042674065994, 0.0513422440048546, 0.0513025815131199, 0.0513506947075708,
    . 0.0513017386287024, 0.0627769011116508, 0.051311855984565, 0.0513625327048284,
    . 0.0513169169076932, 0.051333797472922, 0.0512966821944926, 0.0514133598706217,
    . 0.0513143862589525, 0.0515334001685128, 0.0513819986293536, 0.0714917801900777,
    . 0.0513456237852549, 0.0513186042148732, 0.0513034244390844, 0.0513067965584807,
    . 0.0513557671326049, 0.0512882581256275, 0.0513219793286036, 0.0629381381083169,
    . 0.0512924696412343, 0.0513101693429018, 0.0513253551083434, 0.0513312643261322,
    . 0.0513879274355384, 0.0513000529844945, 0.0514286374686093, 0.0513076396922313,
    . 0.0513583039088733, 0.0513380202177548, 0.0513160733165124, 0.0514583829379987,
    . 0.0513354864458105, 0.0513084828675495, 0.0513371755854206, 0.0512916272551145,
    . 0.0747620792897615, 0.0512975248296999, 0.0627722684149263, 0.0569081998241544,
    . 0.051317760540479))
  3. $<-.data.frame(*tmp*, "iv.se", value = c(0.0513059534662941,
    . 0.0514040302606308, 0.0513042674065994, 0.0513422440048546, 0.0513025815131199,
    . 0.0513506947075708, 0.0513017386287024, 0.0627769011116508, 0.051311855984565,
    . 0.0513625327048284, 0.0513169169076932, 0.051333797472922, 0.0512966821944926,
    . 0.0514133598706217, 0.0513143862589525, 0.0515334001685128, 0.0513819986293536,
    . 0.0714917801900777, 0.0513456237852549, 0.0513186042148732, 0.0513034244390844,
    . 0.0513067965584807, 0.0513557671326049, 0.0512882581256275, 0.0513219793286036,
    . 0.0629381381083169, 0.0512924696412343, 0.0513101693429018, 0.0513253551083434,
    . 0.0513312643261322, 0.0513879274355384, 0.0513000529844945, 0.0514286374686093,
    . 0.0513076396922313, 0.0513583039088733, 0.0513380202177548, 0.0513160733165124,
    . 0.0514583829379987, 0.0513354864458105, 0.0513084828675495, 0.0513371755854206,
    . 0.0512916272551145, 0.0747620792897615, 0.0512975248296999, 0.0627722684149263,
    . 0.0569081998241544, 0.051317760540479))
  4. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
    . "replacement has %d rows, data has %d"), N, nrows), domain = NA)
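
The error class itself can be reproduced without any network access. The sketch below is not the power_prune() code; it only shows that assigning a 47-element vector to a column of a 61-row data.frame raises the same base R error. The names toy and iv_se are hypothetical, and the lengths are taken from the error message above.

```r
# Toy stand-in for one outcome-study subset with 61 SNP rows.
toy <- data.frame(SNP = paste0("rs", seq_len(61)))

# Hypothetical per-SNP vector that ended up shorter than the data frame.
iv_se <- rep(0.05, 47)

# Column assignment with a mismatched length fails; capture the message instead of stopping.
res <- tryCatch(
  {
    toy$iv.se <- iv_se
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

In an English locale the captured message reads "replacement has 47 rows, data has 61", matching the last frame of the traceback.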

Describe the behaviour you expect (required)

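power_prune() should return a pruned data frame without raising an error, even when the SNPs in an outcome-study subset have different sample sizes.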

R code to reproduce the issue (required)

See the code under "Describe the current behaviour you observe" above.

Contribute a solution (optional)

PR 556

System information

  • Ubuntu 24.04 LTS
  • R version 4.4.1 (2024-06-14) -- "Race for Your Life"

Additional context

This was discovered while extending a documentation example to exercise power_prune(). The existing example on that documentation page is not a complete end-to-end test of the function, because the rest of it deals with only a single outcome subset.

@phageghost phageghost added the bug label Aug 29, 2024