Course project for UCLA CS145, Introduction to Data Mining
The main driver script is
. It takes in a single argument, the ML model type: [NN, PR, AR, ARIMA, ARMA, MA, SARIMA]
- PR: Polynomial Regression
- NN: Neural Network
- AR: Auto Regression
- MA: Moving Average
py NN
This generates a result CSV file that matches the Kaggle submission format. To change any configuration, refer to the constant variables declared in,,, or (superclass of all prediction models).
To transform input data, run:
It then creates one CSV file per state, each containing that state's daily reports. Miscellaneous states in the input data set are ignored.
NOTE: Each time this script is run, all the <state>.csv files are truncated and refilled from the daily report files.
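The transformation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function name `split_reports_by_state`, the directory arguments, the `states` allow-list, and the added `Date` column are all assumptions made for the example. It assumes every input file is named MM-DD-YYYY.csv.

```python
import csv
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def split_reports_by_state(report_dir, out_dir, states):
    """Write one <state>.csv per state from the daily report files.

    `states` is a hypothetical allow-list; rows whose Province_State is
    not in it (the "miscellaneous" entries) are skipped.
    """
    report_dir, out_dir = Path(report_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows_by_state = defaultdict(list)
    fieldnames = ["Date"]  # assumed extra column holding the report date

    # MM-DD-YYYY names do not sort chronologically as strings, so parse them.
    reports = sorted(
        report_dir.glob("*.csv"),
        key=lambda p: datetime.strptime(p.stem, "%m-%d-%Y"),
    )
    for report in reports:
        with report.open(newline="") as f:
            reader = csv.DictReader(f)
            # Collect the union of columns, since later reports may add fields.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                if row.get("Province_State") in states:
                    row["Date"] = report.stem
                    rows_by_state[row["Province_State"]].append(row)

    for state, rows in rows_by_state.items():
        # Mode "w" truncates the file, so every run rebuilds it from scratch.
        with (out_dir / f"{state}.csv").open("w", newline="") as f:
            writer = csv.DictWriter(
                f, fieldnames=fieldnames, restval="", extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(rows)
    return sorted(rows_by_state)
```

Opening each output in write mode is what gives the truncate-and-refill behavior mentioned in the note: running the function twice produces the same files rather than appending duplicates.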
Data format (copied from
USA daily state reports (csse_covid_19_daily_reports_us)
This table contains an aggregation of each USA State level data.
To create the test.csv file, run:
To get MAPE of the prediction vs truth data, run:
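For reference, MAPE (mean absolute percentage error) in its standard form can be sketched as below. The function name `mape` and its list-based interface are assumptions for illustration; the project's own evaluation script and the exact Kaggle scoring rule may differ.

```python
def mape(truth, prediction):
    """Mean absolute percentage error, returned as a percentage."""
    if len(truth) != len(prediction):
        raise ValueError("truth and prediction must have the same length")
    if not truth:
        raise ValueError("at least one value is required")
    terms = []
    for t, p in zip(truth, prediction):
        if t == 0:
            # The ratio |t - p| / |t| is undefined for a zero truth value.
            raise ValueError("MAPE is undefined when a truth value is zero")
        terms.append(abs((t - p) / t))
    return 100.0 * sum(terms) / len(terms)
```

Because each error is divided by the true value, states with small counts weigh as much as states with large counts, which is worth keeping in mind when comparing models.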
Daily report files are named MM-DD-YYYY.csv, with the date in UTC.
- Province_State - The name of the State within the USA.
- Country_Region - The name of the Country (US).
- Last_Update - The most recent date the file was pushed.
- Lat - Latitude.
- Long_ - Longitude.
- Confirmed - Aggregated case count for the state.
- Deaths - Aggregated death toll for the state.
- Recovered - Aggregated Recovered case count for the state.
- Active - Aggregated confirmed cases that have not been resolved (Active cases = total cases - total recovered - total deaths).
- FIPS - Federal Information Processing Standards code that uniquely identifies counties within the USA.
- Incident_Rate - cases per 100,000 persons.
- People_Tested - Total number of people who have been tested.
- People_Hospitalized - Total number of people hospitalized. (Nullified on Aug 31, see Issue #3083)
- Mortality_Rate - Number of recorded deaths * 100 / Number of confirmed cases.
- UID - Unique Identifier for each row entry.
- ISO3 - Officially assigned country code identifiers.
- Testing_Rate - Total test results per 100,000 persons. The "total test results" are equal to "Total test results (Positive + Negative)" from COVID Tracking Project.
- Hospitalization_Rate - US Hospitalization Rate (%): = Total number hospitalized / Number cases. The "Total number hospitalized" is the "Hospitalized – Cumulative" count from COVID Tracking Project. The "hospitalization rate" and "Total number hospitalized" are only presented for those states which provide cumulative hospital data. (Nullified on Aug 31, see Issue #3083)
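Several of the fields above are derived from the raw counts. The sketch below restates the formulas given for Active, Mortality_Rate, and Incident_Rate; the function name `derived_fields` and the `population` argument are assumptions for illustration (the population figure is not a column in the daily reports).

```python
def derived_fields(confirmed, deaths, recovered, population=None):
    """Recompute derived columns from raw counts, per the field descriptions."""
    # Active cases = total cases - total recovered - total deaths
    active = confirmed - recovered - deaths
    # Mortality_Rate = deaths * 100 / confirmed (undefined with zero cases)
    mortality_rate = deaths * 100 / confirmed if confirmed else None
    # Incident_Rate = cases per 100,000 persons (needs an external population)
    incident_rate = confirmed * 100_000 / population if population else None
    return {
        "Active": active,
        "Mortality_Rate": mortality_rate,
        "Incident_Rate": incident_rate,
    }
```

Recomputing these values can serve as a sanity check on a row, or as a way to fill a derived column when the source left it blank.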
For more details of Neural Network Model please refer to
This class trains a neural network and uses GridSearch to find the best hyperparameters.
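To show what an exhaustive grid search does, here is a dependency-free sketch that expands a parameter grid into every combination and picks the best one. It is a stand-in for illustration only, not the project's implementation: `expand_grid`, `grid_search`, and the `score` callback are hypothetical names, and the grid values are copied from the `self.parameters` listing in this section.

```python
from itertools import product

# Grid copied from the self.parameters listing in this section.
parameters = {
    "hidden_layer_sizes": [(80, 80), (70, 70), (60, 60)],
    "activation": ["relu"],
    "solver": ["adam"],
    "learning_rate": ["adaptive"],
    "learning_rate_init": [0.0001, 0.001, 0.005, 0.0005],
}


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    names = list(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def grid_search(grid, score):
    """Return (best_params, best_score); `score` is lower-is-better.

    `score` is a hypothetical callback, e.g. a function that trains a
    model with the given settings and returns its validation error.
    """
    scored = ((candidate, score(candidate)) for candidate in expand_grid(grid))
    return min(scored, key=lambda pair: pair[1])
```

With this grid the search evaluates 3 x 1 x 1 x 1 x 4 = 12 configurations. The count multiplies with every value added to a list, so adding options to several parameters at once can make the search much slower.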
You can add or remove parameters and their values to search for the optimal NN settings. Please only modify the following in
self.parameters = {
'hidden_layer_sizes': [(80, 80), (70, 70), (60, 60)],
'activation': ['relu'],
'solver': ['adam'],
'learning_rate': ['adaptive'],
'learning_rate_init': [0.0001, 0.001, 0.005, 0.0005]