coding principles
Ivan Rudik edited this page May 7, 2019 · 2 revisions
We generally want to follow the Gentzkow and Shapiro code structure and data storage protocols. Several basic things to re-emphasize:
- The entire project, from initial data to compiling the paper pdf, can be run from one script, typically ~/git/project-name/make-paper.r. This R script will call other files, e.g. Stata, R, and LaTeX.
- Do not manually pre-process data, e.g. manipulate Excel sheets, before importing into R or Stata. All data processing, beginning with the original file, should be automated and, in the final version, called by make-paper.r.
- Keep each line of code under 100 characters wide so that it is easy to read.
- Each dataset has a valid (unique, non-missing) key / observation ID. For example, you might have a dataset of US county characteristics, e.g. square miles and 1969 population, with one row for each county and the key being 1000*state_fips + county_fips.
- Keep datasets normalized (meaning that they contain only variables at the same logical level as the key) as late in the data preparation process as possible. Once you merge a state-level dataset with a county-level dataset, the state-level variables are recorded many times (once for each county). This takes a lot of space and can also confuse other aspects of data preparation.
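The key and normalization points above can be sketched in base R. This is only an illustration, not a prescribed implementation: the data values are made up, and the names counties, states, is_valid_key, and analysis are hypothetical. It builds the county key as 1000*state_fips + county_fips, checks that each key is unique and non-missing, and keeps the county-level and state-level tables separate until a final merge.

```r
# Hypothetical county-level table: one row per county (values are made up).
counties <- data.frame(
  state_fips  = c(1, 1, 2),
  county_fips = c(1, 3, 13),
  sq_miles    = c(600, 1600, 7000)
)

# County key as described above: 1000*state_fips + county_fips.
counties$fips <- 1000 * counties$state_fips + counties$county_fips

# Hypothetical helper: TRUE when a key has no missing values and no duplicates.
is_valid_key <- function(key) {
  !anyNA(key) && anyDuplicated(key) == 0
}
stopifnot(is_valid_key(counties$fips))

# Hypothetical state-level table, kept separate (normalized) with its own key.
states <- data.frame(
  state_fips = c(1, 2),
  state_name = c("State A", "State B")
)
stopifnot(is_valid_key(states$state_fips))

# Merge as late as possible; the result stays at the county level, so the
# county key must still be valid and the row count must not change.
analysis <- merge(counties, states, by = "state_fips", all.x = TRUE)
stopifnot(nrow(analysis) == nrow(counties), is_valid_key(analysis$fips))
```

Running the key check immediately after each dataset is built, and again after every merge, makes an accidental many-to-many merge fail loudly instead of silently duplicating rows.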