Dev #8

Merged
merged 32 commits into from
Mar 8, 2018
d953a2c
test
robertmartin8 Mar 2, 2018
9ba86d4
added features to READMe
robertmartin8 Mar 2, 2018
5e00fa9
removed quandlData and dataAcquistion
robertmartin8 Mar 2, 2018
b7695b9
removed .gitignore
robertmartin8 Mar 2, 2018
cbe5e18
fixed download of current_data
robertmartin8 Mar 2, 2018
fd53e8f
rebuilt historical stock price download
robertmartin8 Mar 2, 2018
2576048
updated README (still a work in progress)
robertmartin8 Mar 2, 2018
88b4b15
added testing!
robertmartin8 Mar 3, 2018
345abf1
created utils file to store useful functions
robertmartin8 Mar 3, 2018
194b026
improved data_string_to_float
robertmartin8 Mar 4, 2018
2f61ccd
revamped stock_prediction.py
robertmartin8 Mar 4, 2018
2802daa
added quickstart and backtesting to README
robertmartin8 Mar 4, 2018
fe5d3ef
Improved test structure
robertmartin8 Mar 4, 2018
cdaa8b0
Finished writing and testing download_historical_prices
robertmartin8 Mar 6, 2018
a810e01
wrote tests on the stock_price and sp500_index datasets
robertmartin8 Mar 6, 2018
6243032
finished current_data.py with tests
robertmartin8 Mar 7, 2018
ea769fc
of course the date range needs to go one year later...
robertmartin8 Mar 7, 2018
5f87e6d
increased the end date again
robertmartin8 Mar 7, 2018
b963f14
updated readme
robertmartin8 Mar 7, 2018
485624e
finished parsing_keystats with tests
robertmartin8 Mar 7, 2018
bd26cb2
added a data folder for people to follow along
robertmartin8 Mar 7, 2018
677cdb8
finished the simple backtest
robertmartin8 Mar 8, 2018
9d4ca14
moves status_calc to utils
robertmartin8 Mar 8, 2018
f85cf75
finished backtesting
robertmartin8 Mar 8, 2018
9d8ea6f
finished stock prediction
robertmartin8 Mar 8, 2018
48d6a57
fixed tests of stock_prediction dataset
robertmartin8 Mar 8, 2018
7c20c94
moved status_calc to utils
robertmartin8 Mar 8, 2018
2e73644
removed test_dates
robertmartin8 Mar 8, 2018
285de11
Added requirements file
robertmartin8 Mar 8, 2018
2a71fd6
updated readme
robertmartin8 Mar 8, 2018
4fa255e
improved documentation
robertmartin8 Mar 8, 2018
43970c4
removed .csv files
robertmartin8 Mar 8, 2018
108 changes: 0 additions & 108 deletions .gitignore

This file was deleted.

347 changes: 292 additions & 55 deletions README.md

Large diffs are not rendered by default.

78 changes: 78 additions & 0 deletions backtesting.py
@@ -0,0 +1,78 @@
# Preprocessing
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from utils import status_calc


def backtest():
    """
    A simple backtest, which splits the dataset into a train set and a test set,
    then fits a Random Forest classifier to the train set. We print the accuracy and
    precision of the classifier on the test set, then run a backtest comparing this
    strategy's performance to passive investment in the S&P500.

    Please note that there is a methodological flaw in this backtest which will give
    deceptively good results, so the results here should not encourage you to live trade.
    """
    # Build the dataset, and drop any rows with missing values
    data_df = pd.read_csv("keystats.csv", index_col='Date')
    data_df.dropna(axis=0, how='any', inplace=True)

    # Every column from position 6 onwards is used as a feature
    features = data_df.columns[6:]
    X = data_df[features].values
    # The labels are generated by applying status_calc to the dataframe:
    # '1' if a stock beats the S&P500 by more than 10%, else '0'
    y = list(
        map(status_calc, data_df["stock_p_change"], data_df["SP500_p_change"]))

    # z is required for us to track returns
    z = np.array(data_df[["stock_p_change", "SP500_p_change"]])

    # Generate the train set and test set by randomly splitting the dataset
    X_train, X_test, y_train, y_test, z_train, z_test = train_test_split(
        X, y, z, test_size=0.2)

    # Instantiate a RandomForestClassifier with 100 trees, then fit it to the training data
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)

    # Generate the predictions, then print test set accuracy and precision
    y_pred = clf.predict(X_test)
    print("Classifier performance\n", "=" * 20)
    print(f"Accuracy score: {clf.score(X_test, y_test): .2f}")
    print(f"Precision score: {precision_score(y_test, y_pred): .2f}")

    # Cast the predictions to a boolean mask so that the indexing below selects the
    # predicted rows whether status_calc returns booleans or 0/1 integers
    buy_mask = np.asarray(y_pred).astype(bool)
    num_positive_predictions = int(buy_mask.sum())
    if num_positive_predictions == 0:
        # Nothing to evaluate, and the averages below would divide by zero
        print("No stocks predicted!")
        return

    # Recall that z_test stores the change in stock price in column 0, and the
    # change in S&P500 price in column 1.
    # Whenever a stock is predicted to outperform (y_pred = 1), we 'buy' that stock
    # and simultaneously 'buy' the index for comparison.
    stock_returns = 1 + z_test[buy_mask, 0] / 100
    market_returns = 1 + z_test[buy_mask, 1] / 100

    # Calculate the average growth for each stock we predicted 'buy'
    # and the corresponding index growth
    avg_predicted_stock_growth = sum(stock_returns) / num_positive_predictions
    index_growth = sum(market_returns) / num_positive_predictions

    percentage_stock_returns = 100 * (avg_predicted_stock_growth - 1)
    percentage_market_returns = 100 * (index_growth - 1)
    total_outperformance = percentage_stock_returns - percentage_market_returns

    print("\n Stock prediction performance report \n", "=" * 40)
    print("Total Trades:", num_positive_predictions)
    print(
        f"Average return for stock predictions: {percentage_stock_returns: .1f}%")
    print(
        f"Average market return in the same period: {percentage_market_returns: .1f}%")
    print(
        f"Compared to the index, our strategy earns {total_outperformance: .1f} percentage points more")


if __name__ == "__main__":
    backtest()
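
Review note: the docstring flags a methodological flaw without naming it. One plausible candidate (an assumption here, not something this diff states) is look-ahead bias: `train_test_split` shuffles rows, so the classifier can be trained on snapshots dated after the ones it is tested on. A minimal sketch of a chronological split follows; `chronological_split` and the toy frame are hypothetical stand-ins, not part of this PR.

```python
import pandas as pd


def chronological_split(df, test_fraction=0.2):
    """Split a date-indexed frame so the test rows come last in time.

    Hypothetical helper: rows are cut by position after sorting, so if
    several rows share the cutoff date they may land on both sides.
    """
    ordered = df.sort_index()
    n_test = max(1, round(len(ordered) * test_fraction))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Toy data standing in for keystats.csv (made-up values, unique dates)
toy = pd.DataFrame(
    {"feature": range(10), "stock_p_change": range(10)},
    index=pd.date_range("2010-01-01", periods=10, freq="D").rename("Date"),
)
# Shuffle the rows to show that the helper restores date order itself
toy = toy.sample(frac=1, random_state=0)

train, test = chronological_split(toy)
print(train.index.max(), "<", test.index.min())
```

With unique dates, every training row precedes every test row, which a shuffled split does not guarantee.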
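
Review note: the return calculation indexes `z_test` with `y_pred`. That acts as a row filter only when `y_pred` is a boolean array; if `status_calc` (not shown in this diff) returns 0/1 integers, NumPy treats the values as row positions instead. The sketch below uses made-up numbers to show the difference.

```python
import numpy as np

# Made-up returns: column 0 is the stock change, column 1 the index change
z_test = np.array([
    [12.0, 5.0],
    [-3.0, 5.0],
    [30.0, 8.0],
    [7.0, 8.0],
])
y_pred_int = np.array([1, 0, 1, 0])  # assumed integer labels

# Integer indexing: picks rows 1, 0, 1, 0 rather than filtering
positions = z_test[y_pred_int, 0]

# Boolean mask: keeps only the rows where the prediction is 1
mask = y_pred_int.astype(bool)
selected = z_test[mask, 0]

print(positions)  # values taken from rows 1 and 0 only
print(selected)   # values from the predicted 'buy' rows
```

Casting to `bool` before indexing gives the intended filter for either label type.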