This repository is meant for sharing lecture notes, assignments and data for the Predictive Analytics module in the Big Data Analytics (LA) and Big Social Data Analytics (LA) courses at Copenhagen Business School, Denmark.
The Predictive Analytics module is divided into three modules. Every module consists of a mixture of lectures and in-class assignments. Lecture exercises and in-class assignments will be provided as Jupyter notebooks.
An Introduction to Statistical Learning (https://www-bcf.usc.edu/~gareth/ISL/) and its big brother, The Elements of Statistical Learning (https://www.springer.com/gp/book/9780387848570), are mathematically and conceptually rigorous books, and they form the basis for the lecture notes.
Building Machine Learning Systems with Python (https://github.com/luispedro/BuildingMachineLearningSystemsWithPython) gives a good overview of regression analysis.
Time Series Analysis, James Douglas Hamilton, Princeton University Press, Princeton, NJ, 1994 (https://press.princeton.edu/titles/5386.html).
We will use Python to learn statistical and probabilistic approaches for understanding, analyzing and gaining insights from data. Please download and install Anaconda (Python 3.7 version) for your operating system; it is required for running the in-class assignments. The best place to start learning Python is Google's Python Class. A good guide to installing Python, starting a Jupyter notebook and interacting with conda can be found in Unidata's online Python training (https://unidata.github.io/online-python-training/).
Details on each module component will be hosted in a public GitHub repository (https://github.com/pankajk/PredictiveAnalytics). It is a good idea to have a look at Python, numpy, scikit-learn, pandas, matplotlib and pdb, as well as git, LaTeX and Markdown.
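Before the first in-class assignment it can help to confirm that the Python tools listed above are importable. The following is a minimal sketch; the helper `check_environment` and the `PACKAGES` list are hypothetical names chosen for illustration. Note that scikit-learn is imported under the module name `sklearn`, and `pdb` ships with Python's standard library.

```python
import importlib.util
import sys

# Hypothetical list mirroring the tools named above (scikit-learn imports as "sklearn").
PACKAGES = ["numpy", "sklearn", "pandas", "matplotlib", "pdb"]


def check_environment(packages=PACKAGES):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any package reported as `MISSING` can usually be installed through conda from the Anaconda distribution.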
This repository borrows parts of its code and data from Kaggle competitions. The lecture slides are partly taken from various sources, including books, lecture notes and videos. A big shout-out to all of them, even where their names are not duly acknowledged.
In this module, we will cover the mathematical foundations of predictive analytics. The objective is to develop data analytics skills by investigating different ways to organize and represent data, and to describe and analyze variation in data. Probability quantifies how likely an event is to happen, whereas statistics helps us make sense of observed data.
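The difference between probability and statistics can be illustrated with a coin. Probability starts from a known model (a fair coin, P(heads) = 0.5) and predicts outcomes; statistics starts from observed outcomes and estimates the model. The sketch below assumes a simulated fair coin and uses a normal-approximation confidence interval; the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

# Probability: assume a fair coin, P(heads) = 0.5, and predict before seeing data.
p_true = 0.5
n_flips = 10_000
expected_heads = p_true * n_flips

# Statistics: simulate observed data, then estimate P(heads) from the sample.
rng = np.random.default_rng(seed=42)
flips = rng.random(n_flips) < p_true  # True means heads
p_hat = flips.mean()

# Approximate 95% confidence interval (normal approximation to the binomial).
se = np.sqrt(p_hat * (1 - p_hat) / n_flips)
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"expected heads: {expected_heads:.0f}")
print(f"estimated P(heads): {p_hat:.4f}, 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```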
In this module, we will cover different types of linear and non-linear regression analysis with a machine learning flavour. Regression is a powerful predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. It is used for forecasting, time series modelling and, under suitable assumptions, estimating causal effects between variables. Performing a regression helps us determine which factors matter most, which factors can be ignored, and how these factors affect the target.
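A minimal sketch of linear regression fitted by ordinary least squares, using only NumPy. The data are synthetic and the true coefficients (intercept 2, slopes 3 and -1.5) are assumptions made for illustration, so the estimates can be checked against known values; in the course notebooks a library such as scikit-learn may be used instead.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2 + 3*x1 - 1.5*x2 + noise
rng = np.random.default_rng(seed=0)
n = 500
X = rng.normal(size=(n, 2))                      # two predictors
noise = rng.normal(scale=0.5, size=n)
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1] + noise  # target

# Add a column of ones so the model also estimates an intercept.
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimise ||y - X_design @ beta||^2.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# R^2: share of the variance in y explained by the fitted model.
y_pred = X_design @ beta
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("intercept, b1, b2:", np.round(beta, 3))
print(f"R^2: {r2:.3f}")
```

Because both predictors are on the same scale here, the larger absolute slope points to the factor that matters more; with predictors on different scales, they would typically be standardized first.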
Time series analysis has wide applicability in predictive analytics, whether in finance, marketing, engineering or many other fields of practice. This module will illustrate time series analysis using applications from these fields. It will provide exposure to standard time series topics such as regression-based time series modelling, univariate ARMA/ARIMA modelling, (G)ARCH modelling and vector autoregressive (VAR) models, along with forecasting, model identification and diagnostics.
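The simplest member of the ARMA family is the AR(1) model, y_t = c + phi * y_{t-1} + e_t. The sketch below simulates such a series with assumed parameters (c = 0.5, phi = 0.7), estimates them by regressing y_t on y_{t-1}, makes a one-step-ahead forecast and runs a basic residual check. This hand-rolled least-squares fit is only an illustration of the idea; in practice a library such as statsmodels would typically be used for ARIMA, GARCH-type or VAR estimation.

```python
import numpy as np

# Simulate an AR(1) process: y_t = c + phi * y_{t-1} + e_t (parameters are assumptions).
rng = np.random.default_rng(seed=1)
c, phi, sigma, n = 0.5, 0.7, 1.0, 2_000
y = np.empty(n)
y[0] = c / (1 - phi)  # start at the stationary mean
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + rng.normal(scale=sigma)

# Estimate (c, phi) by least squares: regress y_t on a constant and y_{t-1}.
X = np.column_stack([np.ones(n - 1), y[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

# One-step-ahead forecast from the last observation.
forecast = c_hat + phi_hat * y[-1]

# Diagnostics: residuals of a well-specified model should show little autocorrelation.
resid = y[1:] - X @ np.array([c_hat, phi_hat])
lag1_acf = np.corrcoef(resid[:-1], resid[1:])[0, 1]

print(f"c_hat={c_hat:.3f}, phi_hat={phi_hat:.3f}, forecast={forecast:.3f}")
print(f"lag-1 residual autocorrelation: {lag1_acf:.3f}")
```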
An application of time series analysis that finds the best model and predicts future values of the S&P 500 stock index is well documented here.