Data-Science-AC209

AC209: Data Science is a course offered at Harvard SEAS. I completed 8 coding assignments and a final group project in this course in fall 2016.

What did I learn from this course?

The course focuses on analyzing messy, real-life data and making predictions with statistical and machine learning methods. The material integrates the five key facets of a data-driven investigation:

(1) data collection: data wrangling, cleaning, and sampling to get a suitable data set;
(2) data management: accessing data quickly and reliably;
(3) exploratory data analysis: generating hypotheses and building intuition;
(4) prediction or statistical learning;
(5) communication: summarizing results through visualization, stories, and interpretable summaries.

Skillset:

Python packages:

NumPy, pandas, SciPy, scikit-learn, Matplotlib, BeautifulSoup

Models:

  1. Linear regression
  2. Linear regression with regularization (Ridge and Lasso)
  3. Logistic regression
  4. Multinomial logistic regression
  5. LDA and QDA
  6. KNN
  7. Random forest
  8. Bagging and boosting
  9. SVM
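
As an illustration of one model from the list, here is a minimal sketch of KNN classification written with only the Python standard library rather than scikit-learn. The function name `knn_predict` and the toy points are assumptions made for this example, not code from the course assignments.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Count the labels of the k closest training points.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters labelled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(points, labels, (1.1, 1.0))
prediction_near_b = knn_predict(points, labels, (5.0, 4.8))
```

In practice `k` is a tuning parameter, usually chosen by cross-validation.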

Model building skills:

  1. dimension reduction
  2. variable selection
  3. parameter tuning
  4. bootstrapping and cross-validation
  5. model evaluation
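
Bootstrapping and cross-validation can be sketched in a few lines. The example below uses only the Python standard library. The helper names (`bootstrap_mean_ci`, `k_fold_indices`), the default parameters, and the sample data are illustrative assumptions, not the course's own code.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n points with replacement and record the resample mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Read the interval endpoints off the sorted resample means.
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical sample of ten measurements.
data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2, 2.7, 2.6]
ci_low, ci_high = bootstrap_mean_ci(data)
splits = list(k_fold_indices(len(data), k=5))
```

Each observation appears in exactly one test fold, so every data point is used for evaluation once and for training `k - 1` times.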

Data wrangling and cleaning:

  1. pulling data out of HTML and XML files
  2. imbalanced data
  3. missing data
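
Two of the cleaning steps above, handling missing data and imbalanced data, can be sketched with simple baselines: mean imputation and random oversampling of the minority class. These are common starting points, not necessarily the techniques used in the assignments. The function names and toy values are assumptions made for this example.

```python
import random
import statistics
from collections import Counter


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = statistics.fmean(observed)
    return [fill if x is None else x for x in column]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate smaller-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw extra copies with replacement until this class reaches the target size.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical column with two missing entries.
ages = [34.0, None, 29.0, 41.0, None]
filled = impute_mean(ages)  # each None becomes the mean of 34, 29, and 41

# Hypothetical data set with five rows of class 0 and one row of class 1.
rows = [[0.1], [0.4], [0.3], [0.9], [0.8], [0.2]]
labels = [0, 0, 0, 1, 0, 0]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

Oversampling should be applied to the training split only. Duplicating rows before splitting lets copies of the same observation leak into the test set.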