This is Pavel Romanenko's final project for the Spiced Data Science Bootcamp. The project uses live data on enterprise innovation projects from his website Future Atelier, a database and newsletter covering the most remarkable enterprise innovation projects. Based on the preferences of Future Atelier's newsletter subscribers, Pavel built a recommender bot that predicts which innovation projects a new user may like, given that user's input.
Malte Bonart created the template for deploying a Flask app to Heroku, and Paul Wlodkowski adopted this template.
A live version of this recommender bot is available at https://fay-recommender.herokuapp.com/
PLEASE NOTE: The website may take about 20 seconds to load the first time you access it during a session. It runs on a free-tier Heroku server, which goes to sleep when the website has not been visited recently.
For this demo version Pavel created a MySQL database with 100+ enterprise innovation projects.
The information from this database is processed with Python and RegEx to get clean tags and URLs.
database['tags'] = database['tags'].replace('[?⚡⚽⚫✈️]', ',', regex=True)  # Replace the question marks that replaced the emojis (and a few leftover symbols) with commas
database['tags'] = database['tags'].replace('[\U00010000-\U0010ffff]', ',', regex=True)  # Replace all remaining emojis with commas
database['tags'] = database['tags'].replace(r',{2,}', ',', regex=True)  # Collapse repeated commas into a single comma
database['tags'] = database['tags'].replace(r'\s*,\s*', ', ', regex=True)  # Normalize the spacing before and after commas
database['tags'] = database['tags'].replace(r'^,\s*', '', regex=True)  # Remove a leading comma, if there is one (the old '(^..)' cut the first two characters of every string)
database['tags'] = database['tags'].replace(r'^\s+', '', regex=True)  # Remove white space at the beginning of the string
database['tags'] = database['tags'].replace(r'[ \t]+$', '', regex=True)  # Remove white space at the end of the string
database['url'] = database['url'].replace(r'\?utm_source.+', '', regex=True)  # Remove the tracking suffix ?utm_source=Future+Atelier...
database['url'] = database['url'].replace(r'utm_source.+', '', regex=True)  # Remove a utm_source parameter that does not directly follow the '?'
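The cleaning steps above can also be sketched as two small functions that work on single strings, which makes them easy to test in isolation. This is a split-and-rejoin variant rather than the exact chain of replacements used in the project; the names clean_tags and strip_utm, the sample inputs and the example.com URLs are made up for illustration.

```python
import re

# Characters treated as tag separators: '?' placeholders, a few symbols,
# and everything outside the Basic Multilingual Plane (most emojis).
_SEPARATORS = re.compile('[?\u26a1\u26bd\u26ab\u2708\ufe0f\U00010000-\U0010ffff]')


def clean_tags(raw: str) -> str:
    """Return the tags as 'tag1, tag2, ...' (hypothetical helper)."""
    text = _SEPARATORS.sub(',', raw)             # separators become commas
    parts = [part.strip() for part in text.split(',')]
    return ', '.join(part for part in parts if part)  # drop empty pieces


def strip_utm(url: str) -> str:
    """Cut the URL at the utm_source parameter (hypothetical helper)."""
    return re.sub(r'[?&]?utm_source.*', '', url)


print(clean_tags('?AI \u26a1Mobility , Retail'))
print(strip_utm('https://example.com/a?utm_source=Future+Atelier'))
```

Because empty pieces are dropped after splitting, leading, trailing and repeated commas disappear without a dedicated regex for each case. With pandas, the same functions could be applied column-wise, for example database['tags'].map(clean_tags).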
To get recommendations we use Non-Negative Matrix Factorization (NMF) from the sklearn.decomposition module.
To fit the model we need the usage data of the sample users who clicked on the news from the database. Based on this data we can create two matrices to train the model: one for the preferred URLs and one for the preferred tags. With this model we can predict the URLs and tags a new user may like, given one or more
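As a rough illustration of this step, the sketch below fits an NMF model on a tiny, made-up click matrix and scores the projects for a new user. The matrix values, the number of components and the recommend helper are assumptions for the example, not the project's actual data or code.

```python
import numpy as np
from sklearn.decomposition import NMF

# Made-up usage data: rows are sample users, columns are projects (URLs);
# 1.0 means the user clicked on that project.
clicks = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
], dtype=float)

# n_components=2 is an arbitrary choice for this toy example.
model = NMF(n_components=2, init='nndsvda', random_state=42, max_iter=1000)
model.fit(clicks)


def recommend(new_user_clicks, top_n=2):
    """Return column indices of the top_n projects not yet clicked (hypothetical helper)."""
    user = np.asarray(new_user_clicks, dtype=float).reshape(1, -1)
    weights = model.transform(user)                  # new user -> latent factors
    scores = (weights @ model.components_).ravel()   # latent factors -> project scores
    scores[user.ravel() > 0] = -np.inf               # skip projects already clicked
    return np.argsort(scores)[::-1][:top_n].tolist()


print(recommend([1, 0, 0, 0, 0, 0]))
```

The same procedure would apply to the tags matrix, with tags instead of URLs as columns.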
- Fix the mobile version of the demo website