Data-Science-AC209

AC209: Data Science is a course offered at Harvard SEAS. I completed eight coding assignments and a final group project for this course in fall 2016.

What did I learn from this course?

The course focuses on analyzing messy, real-life data to make predictions with statistical and machine learning methods. The material integrates the five key facets of an investigation using data:
(1) data collection: data wrangling, cleaning, and sampling to get a suitable data set;
(2) data management: accessing data quickly and reliably;
(3) exploratory data analysis: generating hypotheses and building intuition;
(4) prediction or statistical learning;
(5) communication: summarizing results through visualization, stories, and interpretable summaries.

Skillset:

Python packages:

NumPy, pandas, SciPy, scikit-learn, Matplotlib, BeautifulSoup

Models:

Linear regression
Linear regression with regularization (Ridge and Lasso)
Logistic regression
Multinomial logistic regression
Linear and quadratic discriminant analysis (LDA and QDA)
k-nearest neighbors (KNN)
Random forest
Bagging and boosting
Support vector machines (SVM)
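As an illustration of how the classifiers in this list can be compared on equal footing, here is a minimal sketch that scores each one with 5-fold cross-validation. It assumes scikit-learn and uses synthetic data from `make_classification` in place of any course dataset; `compare_classifiers` is a hypothetical helper, and the hyperparameters are illustrative defaults rather than values from the assignments. The regression models (linear, Ridge, Lasso) are left out because they predict a continuous outcome. With more than two classes, `LogisticRegression` using its default lbfgs solver fits a multinomial model.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, cv=5, seed=0):
    """Return the mean cross-validated accuracy of each classifier (hypothetical helper)."""
    models = {
        # Multinomial logistic regression when y has more than two classes.
        "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "lda": LinearDiscriminantAnalysis(),
        "qda": QuadraticDiscriminantAnalysis(),
        # KNN and SVM are distance-based, so features are standardized first.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # BaggingClassifier bags decision trees by default.
        "bagging": BaggingClassifier(n_estimators=50, random_state=seed),
        "boosting": GradientBoostingClassifier(random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic 3-class data stands in for a real dataset.
    X, y = make_classification(
        n_samples=300, n_features=8, n_informative=5, n_redundant=0,
        n_classes=3, random_state=0,
    )
    scores = compare_classifiers(X, y)
    for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:>14}: {acc:.3f}")
```

Putting the scaler inside each pipeline means every fold is standardized using only its own training split, so no information leaks from the held-out fold into the score.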

Model building skills:

Dimension reduction
Variable selection
Parameter tuning
Bootstrapping and cross-validation
Model evaluation
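Several of these skills fit together in one small workflow. The sketch below assumes scikit-learn and synthetic data from `make_regression`; the helper names `tune_lasso`, `bootstrap_mse_ci`, and `n_components_for`, as well as the alpha grid, are hypothetical. `tune_lasso` picks the Lasso penalty with `GridSearchCV` over 5 shuffled folds (parameter tuning and cross-validation) and reports which coefficients stay nonzero (variable selection). `bootstrap_mse_ci` resamples test-set squared errors to get a percentile interval for the MSE (bootstrapping and model evaluation). `n_components_for` uses PCA to count the components needed for a target share of variance (dimension reduction).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_lasso(X_train, y_train, alphas=(0.01, 0.1, 1.0, 10.0), seed=0):
    """Choose the Lasso penalty by 5-fold CV; return the search and kept feature indices."""
    pipe = Pipeline([("scale", StandardScaler()), ("lasso", Lasso(max_iter=10_000))])
    search = GridSearchCV(
        pipe,
        param_grid={"lasso__alpha": list(alphas)},
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="neg_mean_squared_error",
    )
    search.fit(X_train, y_train)  # refits the best pipeline on all of X_train
    coefs = search.best_estimator_.named_steps["lasso"].coef_
    # Variable selection: the L1 penalty drives weak coefficients to exactly zero.
    return search, np.flatnonzero(coefs)


def bootstrap_mse_ci(y_true, y_pred, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for test-set MSE, resampling test points with replacement."""
    rng = np.random.default_rng(seed)
    sq_err = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    n = len(sq_err)
    boot = np.array([sq_err[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)])
    tail = (1 - level) / 2 * 100
    return np.percentile(boot, [tail, 100 - tail])


def n_components_for(X, variance=0.9):
    """Number of principal components (on standardized X) explaining `variance`."""
    pca = PCA(n_components=variance, svd_solver="full")
    pca.fit(StandardScaler().fit_transform(X))
    return pca.n_components_


if __name__ == "__main__":
    # Synthetic data: 20 features, only 5 of which drive y.
    X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                           noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    search, selected = tune_lasso(X_train, y_train)
    y_pred = search.predict(X_test)
    print("best alpha:", search.best_params_["lasso__alpha"])
    print("kept features:", selected.tolist())
    print("test MSE:", mean_squared_error(y_test, y_pred))
    print("95% bootstrap CI for MSE:", bootstrap_mse_ci(y_test, y_pred))
    print("PCs for 90% variance:", n_components_for(X))
```

The penalty is chosen on training folds only and the test set is touched once at the end, so the bootstrap interval reflects out-of-sample error rather than the tuning itself.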

Data wrangling and cleaning:

Pulling data out of HTML and XML files
Handling imbalanced data
Handling missing data
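A small end-to-end sketch of these three wrangling steps, assuming pandas. The course used BeautifulSoup for parsing; to stay dependency-free this sketch swaps in the standard library's `xml.etree.ElementTree` on a hypothetical XML snippet (the `patient` records and field names are invented for illustration). Median imputation and random oversampling of the minority class are two common choices, not necessarily the ones used in the assignments; `parse_records`, `impute_median`, and `oversample_minority` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical XML snippet; empty tags represent missing values.
RAW_XML = """
<patients>
  <patient id="1"><age>34</age><income>52000</income><outcome>0</outcome></patient>
  <patient id="2"><age>51</age><income></income><outcome>1</outcome></patient>
  <patient id="3"><age></age><income>61000</income><outcome>0</outcome></patient>
  <patient id="4"><age>29</age><income>48000</income><outcome>0</outcome></patient>
  <patient id="5"><age>45</age><income>70000</income><outcome>0</outcome></patient>
</patients>
"""


def parse_records(xml_text):
    """Pull one row per <patient> element; empty or absent tags become NaN."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for node in root.iter("patient"):
        row = {"id": int(node.get("id"))}
        for field in ("age", "income", "outcome"):
            text = node.findtext(field)  # "" for an empty tag, None if absent
            row[field] = float(text) if text else float("nan")
        rows.append(row)
    return pd.DataFrame(rows)


def impute_median(df, columns):
    """Fill missing values in `columns` with each column's median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def oversample_minority(df, label, seed=0):
    """Resample each smaller class with replacement up to the majority-class count."""
    counts = df[label].value_counts()
    target = counts.max()
    parts = [
        df[df[label] == cls].sample(n=target, replace=True, random_state=seed)
        if n < target
        else df[df[label] == cls]
        for cls, n in counts.items()
    ]
    return pd.concat(parts, ignore_index=True)


if __name__ == "__main__":
    df = impute_median(parse_records(RAW_XML), ["age", "income"])
    balanced = oversample_minority(df, "outcome")
    print(balanced["outcome"].value_counts())
```

Oversampling should be applied only to the training split; resampling before a train/test split would copy minority rows into both sides and inflate the evaluation.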
