Name Date
This is an R Markdown document, an example of literate programming which allows users to interweave text, statistical output and the code that produces that output.
To complete your assignment:
-
Edit this file directly, interweaving text and R code as appropriate to answer the questions below. Remember to
Knit
the file to make sure everything is running smoothly. Detailed information on R Markdown is available here, and there is a useful cheat sheet here. -
Use Git to
commit
changes you make in this repo locally. -
Push
the repo, together with this edited.Rmd
file, the corresponding.md
file and any other relevant files to GitHub. -
Note that the output format is specified as
github_document
in the chunk above. The advantage of this is that your submitted assessment can be viewed directly in GitHub. However, if you are using an R package that produces HTML output, you can change the output format tohtml_document
.
You can commit
and push
as often as necessary—your assessment will
be graded on the most recent version of your repo at the assessment due
date.
Good luck!
In this assessment you are provided with observational data on the
health and smoking habits of 654 youth in the lungcap
dataset shipped
with the GLMsData
package. If you don’t have GLMsData
installed you
will need to install it. Your aim is to estimate the total causal effect
of smoking on lung capacity. You will draw a DAG and undertake a
matching analysis for this simple dataset.
When drawing your DAG you can use whatever tool you prefer. Options
include ggdag()
in R, daggity.net, a
screenshot from a DAG drawn in MS Word etc, or even an embedded picture
of a DAG drawn with pen and paper.
The lungcap
dataset is drawn from Tager et al
(1983)
Longitudinal Study of the Effects of Maternal Smoking on Pulmonary
Function in Children. More details on the use of this dataset as a
teaching dataset can be found in Kahn, M. (2003) Data Sleuth, STATS,
37, 24..
Below is a brief description of the available variables, taken from the
R help documentation. More information can be found by entering
?lungcap
at the console (provided the GLMsData
pacakge is installed
and loaded).
- Age the age of the subject in completed years; a numeric vector
- FEV the forced expiratory volume in litres, a measure of lung capacity; a numeric vector
- Ht the height in inches; a numeric vector
- Gender the gender of the subjects: a numeric vector with females coded as 0 and males as 1
- Smoke the smoking status of the subject: a numeric vector with non-smokers coded as 0 and smokers as 1
Answer the questions below with reference to the lungcap
dataset. Your
aim is to estimate the total causal effect of smoking on lung capacity.
-
Draw a DAG to represent the causal relationship between smoking and lung capacity. Write a brief descriptive paragraph that highlights any backdoor path(s) and the observed variables that might be useful to close the path(s). The DAG should be plausible but doesn’t have to include every single possible relevant variable; <10 most relevant nodes should be adequate. (25%)
Hint: Not all variables that appear in the dataset need to be on your DAG and not all nodes that are on your DAG will necessarily be observed variables.
-
Match smokers to non-smokers using a matching method of your choice and matching variables that are consistent with your DAG. Briefly summarise the balance in both cases, drawing on appropriate numerical and graphical summaries. (25%)
-
Fit the model
FEV ~ Smoke
in (i) The raw data and (ii) the matched data. Briefly describe the model results. How does the estimated effect for smoking change and why? (25%) -
Briefly discuss any limitations of the model 3(ii) and argue whether or not the assumption of exchangeability is likely to hold. (25%)
Instructions: Indicate that you understand and agree with the following three statements by typing an x in the square brackets below, e.g. [x].
I declare that this assessment item is my own work, except where acknowledged, and has not been submitted for academic credit elsewhere or previously, or produced independently of this course (e.g. for a third party such as your place of employment) and acknowledge that the assessor of this item may, for the purpose of assessing this item: (i) Reproduce this assessment item and provide a copy to another member of the University; and/or (ii) Communicate a copy of this assessment item to a plagiarism checking service (which may then retain a copy of the assessment item on its database for the purpose of future plagiarism checking).
- I understand and agree
I certify that I have read and understood the University Rules in respect of Student Academic Misconduct.
- I understand and agree
I have a backup copy of the assessment.
- I understand and agree