-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
159 lines (126 loc) · 6.86 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# MRcML for Independent Samples and Overlapping Samples
<!-- badges: start -->
<!-- badges: end -->
R package for Mendelian randomization with constraind maximum likelihood (MRcML) methods. Here is the reference for the original MRcML method with two independent samples: [**Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects**](https://www.cell.com/ajhg/pdfExtended/S0002-9297(21)00219-6).
Here is the reference for the extension of MRcML method with overlapping samples:
[**Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data**](https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010762).
## Installation
<!-- You can install the released version of MRcML from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("MRcML")
```
-->
Install the package from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("xue-hr/MRcML")
```
## Example
Here is an example which shows how to apply MRcML methods to make inference about the causal effect from **Fast Glucose (FG)** to **Type-2 Diabetes (T2D)**.
```{r example}
library(MRcML)
summary(T2D_FG)
```
Example data `T2D_FG` is a list which contains estimated effects sizes and standard errors of 17 SNPs on T2D and FG. Now we perfrom the main function with sample size of FG which is 46186, and using 100 random start points. We set the random seed `random_seed = 1` to make sure results are replicable. First we use the function `mr_cML()`, which is for the two independent sample case. Then we apply the function `mr_cML_Overlap()`, which is for the overlapping sample case; for illustration purpose, here we assume the correlations between GWAS summary data being 0.1, i.e. setting `rho = 0.1`.
```{r}
### mr_cML() for two independent samples
cML_result = mr_cML(T2D_FG$b_exp,
T2D_FG$b_out,
T2D_FG$se_exp,
T2D_FG$se_out,
n = 46186,
random_start = 100,
random_seed = 1)
### mr_cML_Overlap() for two overlapping samples
cML_result_Overlap = mr_cML_Overlap(T2D_FG$b_exp,
T2D_FG$b_out,
T2D_FG$se_exp,
T2D_FG$se_out,
n = 46186,
random_start = 100,
random_seed = 1,
rho = 0.1)
```
We get a warning message from the function `cML_estimate_random()`. The reason is: here we use 100 random starting points to minimize the non-convex loss function, some of them may not converge to a local minimum and result in Fisher Information matrices that are not positive definite. It is not likely affecting the optimization result, since in the end we only use the start point gives the minimum loss and discard all other start points including those do not converge. Now lets take a look at the results:
```{r}
cML_result
cML_result_Overlap
```
BIC selected model gives us indices of invalid IVs: 8, 12, 13, 15, 17. Now lets draw the scatter plot, invalid IVs are marked with blue:
```{r, echo = FALSE}
library(ggplot2)
plot_data = data.frame(b_exp = T2D_FG$b_exp,b_out = T2D_FG$b_out,
se_exp = T2D_FG$se_exp,se_out = T2D_FG$se_out)
invalid_ind = as.factor(as.numeric(is.element(1:17,c(8, 12, 13, 15, 17))))
plot_data = cbind(plot_data,invalid_ind = invalid_ind)
fp2 =
ggplot(data = plot_data,aes(x = b_exp,y = b_out)) +
geom_point(aes(col = invalid_ind)) +
geom_errorbar(aes(ymin = b_out-se_out,ymax = b_out+se_out,
col = invalid_ind),width=0) +
geom_errorbarh(aes(xmin = b_exp-se_exp, xmax = b_exp+se_exp,
col = invalid_ind),height=0) +
scale_color_manual(values = c("grey","blue")) +
theme(legend.position = "none") +
#geom_abline(intercept = 0,
# slope = c(TLP_result[i,c(1,BIC_min_ind)]),
# col = c("black","red"),
# linetype=c("dashed","solid"),
# size = c(1,1)
#) +
geom_hline(yintercept = 0, linetype="solid",
color = "black", size=0.5) +
geom_vline(xintercept = 0, linetype="solid",
color = "black", size=0.5) +
labs(x = "Effect Size of FG",
y = "Effect Size of T2D")
fp2
```
Now let us perform cML with data perturbation. The default number of perturbations is 200, and we use 10 random start points. In real application, we recommend use more random start points to get reliable results even it takes more time, like 10 or even 100; in simulations the number of random start points could be set to 0 (i.e. do not use random start) to speed up.
First we use the function `mr_cML_DP()`, which is for the two independent sample case. Then we apply the function `mr_cML_DP_Overlap()`, which is for the overlapping sample case; again for illustration purpose, here we assume the correlations between GWAS summary data being 0.1, i.e. setting `rho = 0.1`.
```{r, warning=FALSE}
### mr_cML_DP() for two independent samples
cML_result_DP = mr_cML_DP(T2D_FG$b_exp,
T2D_FG$b_out,
T2D_FG$se_exp,
T2D_FG$se_out,
n = 46186,
random_start = 10,
random_start_pert = 10,
random_seed = 1,
num_pert = 200)
### mr_cML_DP_Overlap() for two overlapping samples
cML_result_DP_Overlap = mr_cML_DP_Overlap(b_exp = T2D_FG$b_exp,
b_out = T2D_FG$b_out,
se_exp = T2D_FG$se_exp,
se_out = T2D_FG$se_out,
n = 46186,
random_start = 10,
random_start_pert = 10,
random_seed = 1,
num_pert = 200,
rho = 0.1)
```
Results with data perturbation:
```{r}
cML_result_DP
cML_result_DP_Overlap
```
<!-- You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date.
You can also embed plots, for example:
```{r pressure, echo = FALSE}
plot(pressure)
```
In that case, don't forget to commit and push the resulting figure files, so they display on GitHub! -->