Teaching material for an introductory course on Data Science and Machine Learning. I use this material in a fourth-year undergraduate course in Telecommunication Engineering at University Carlos III in Madrid (Spain), and decided to share it with anyone interested.
I have structured the course with 18 lessons organized as follows:
- Introduction to Python Notebooks and Numpy
- Introduction to Pandas
- Linear regression with a single variable
- Learning curves, regularization, and cross validation
- Numerical optimization with gradient descent
- Nonparametric regression
- Introduction to binary classification and logistic regression
- The kernel trick in logistic regression
- Support Vector Machines
- Unsupervised Learning. Clustering with K-means
- Dimensionality Reduction with PCA. Probabilistic PCA.
- Clustering with Gaussian Mixture Models and the EM algorithm
- Mixtures of Bernoulli distributions
- Introduction to Neural Networks & TensorFlow
- Building an image classifier with Convolutional Neural Networks
- Word Embeddings
- Recurrent Neural Networks for text prediction
Update (April 2018): The course is finally finished, but will continue evolving during the next academic semester! I plan to add more practical examples with real data and certainly expand the introduction to deep learning.
For every lesson, I provide a self-contained notebook that combines model and theoretical descriptions (written in integrated LaTeX Markdown blocks) with practical examples. In sessions 3-5, all steps are implemented manually (so students learn how to implement the methods themselves), but from session 6 onward we use the built-in scikit-learn libraries.
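To give a flavor of the "implemented manually" style of the early sessions, here is a minimal sketch of single-variable linear regression fitted with gradient descent using only NumPy. It is not taken from the course notebooks: the function name `fit_line_gradient_descent`, the learning rate, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_line_gradient_descent(x, y, lr=0.1, n_iters=5000):
    """Fit y ~ w * x + b by minimizing the mean squared error.

    Hypothetical helper for illustration; lr and n_iters are arbitrary choices.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = 0.0, 0.0  # start from the zero model
    for _ in range(n_iters):
        residual = w * x + b - y              # prediction error per sample
        grad_w = 2.0 * np.mean(residual * x)  # d(MSE)/dw
        grad_b = 2.0 * np.mean(residual)      # d(MSE)/db
        w -= lr * grad_w                      # step against the gradient
        b -= lr * grad_b
    return w, b


# Synthetic, noise-free data invented for this sketch: y = 3x + 1
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0
w, b = fit_line_gradient_descent(x, y)
print(f"slope = {w:.4f}, intercept = {b:.4f}")
```

In the later sessions, a fit like this would typically be delegated to a scikit-learn estimator such as `sklearn.linear_model.LinearRegression` rather than written by hand.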
This course is largely inspired by the excellent courses by Andrew Ng and by Emily Fox and Carlos Guestrin, all available on Coursera.
This material is distributed under the MIT License.