-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathAnswer-logreg2.Rmd
138 lines (107 loc) · 3.96 KB
/
Answer-logreg2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
---
title: "Answers to exercises \"Binary data: Logistic regression\" part 2"
output:
github_document:
toc: true
toc_depth: 2
---
# Exercise 1
## 1a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/osteochon.csv"
lr5 <- read.table(adress,header=TRUE,sep=",")
```
## 1b
```{r}
osteo.fit <- glm(oc~factor(food)+factor(ground)+height,
family=binomial,data = lr5)
```
## 1c
```{r}
drop1(osteo.fit,test="Chisq")
```
The variable food has highest p-value so it is left out.
## 1d
```{r}
osteo2.fit <- glm(oc~factor(ground)+height,
family=binomial,data = lr5)
```
ground is not significant
```{r}
osteo3.fit <- glm(oc~height,family=binomial,data = lr5)
drop1(osteo3.fit,test="Chisq")
summary(osteo3.fit)
```
## 1e
Only height has a significant realtion with osteochondroses.
The log-odds ratio for heigth is 0.094: The difference in log-odds if height is increased by 1. And the difference in log-odds is the log oddsratio.
## 1f
```{r}
confint(osteo3.fit)
```
## 1g
The occurence of osteochondrosis was analysed using a logistic regression model with food, ground and height as independent variables. Model reduction was done with the likelihood ratio test. The final model only contained the height variable. The log-odds ratio for heigth is 0.094: The difference in log-odds if height is increased by 1.
(Off course a description of the variables is also needed.)
# Exercise 2
## 2a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/episode.txt"
lr1 <- read.table(adress,header=TRUE)
lr1$immune <- 1*(lr1$cd4<200)
```
## 2b
```{r}
epifit.1 <- glm(episode~immune+age+offset(log(followup)),
family=binomial,data=lr1)
summary(epifit.1)
```
## 2c
The estimates are the differences in log-odds per unit time, or the log-odds ratio's per unit time.
Or for instance: the odds per month follow up for the low immune status group is exp(1.69421)=5.44 times as high as for the high immune statusgroup
## 2d
```{r}
drop1(epifit.1)
```
So the model without age has smallest AIC. Immune cannot be left
out.
# Exercise 3
## 3a
data is in lr5
## 3b
```{r}
osteo1.fit <- glm(oc~factor(father)+factor(food)+factor(ground)
+height,family=binomial,data=lr5)
drop1(osteo1.fit,test="Chisq")
```
```{r}
osteo2.fit <- glm(oc~factor(father)+factor(ground)
+height,family=binomial,data=lr5)
drop1(osteo2.fit,test="Chisq")
```
```{r}
osteo3.fit <- glm(oc~factor(father)+height,family=binomial,data=lr5)
drop1(osteo3.fit,test="Chisq")
```
With some fathers there is a higer chance for daughters of getting oc. This might indicate an heredibility effect. (note some of the standard errors are huge indicating problems with the fit of this model). If a mare at the age of 3 is relatively high then the mare grew harder which might cause problems.
## 3e
The step function can now be used. This will give model selection based on the AIC. The LRT test is only shown, not used.
```{r}
step(osteo1.fit,test="Chisq")
```
## 3f
Variable father is thrown out first, since this leads to the lowest AIC; then food and ground are removed, leaving only height in the model. For this final model the coefficients are given. An important message: model building with LRT tests can be substantially different from model building with AIC. So do not mix AIC with hypothesis testing.
## 3g
```{r}
confint(glm(oc~height,family=binomial,data=lr5))
```
The probability that this interval contains the population value of the parameter is 0.95 or the interval contain plausible values for the log odds ratio in the population in the sense that those values would lead to the conclusion do not reject when tested in a null-hypothesis.
# Exercise 4
# 4a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/lowbirth.dat"
lr2 <-read.table(adress,header=TRUE)
low.fit <- glm(low~age+lwt+factor(race)+smoke+factor(ptl)+
ht+ui+factor(ftv),family=binomial,data=lr2)
step(low.fit)
```
Only age and ftv can be left out.