My hipstery Netflix Prize tools.
I play NFP for fun a lot, years after it ended, because I enjoy machine learning and there aren't a whole bunch of high-quality, large machine learning datasets to play with. I have to start over after a machine theft, so I might as well make this attempt public. It's been about seven years since the contest ended, which pretty much means that this repo comes with a cardigan, some Lisa Loeb glasses, and a coupon for a soy chai latte, light mint.
This generates a bunch of useful SQL data, correctly. There's more to come, though. This is importantly incomplete.
These are simple data-munging tools to support a new Netflix Prize attempt. For example, the repo contains SQL import tools that pull the raw files into a MySQL install, and a tool that converts the dataset into a raw integer form for a C-style system. This is not a machine learning toolkit; it is the groundwork to support one, using this specific dataset.
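To give a rough idea of what "raw integer form" can mean, here is a minimal Python sketch. It is not the tool in this repo. It assumes the usual per-movie training file layout (a first line like `1:` followed by `customer,rating,date` rows), and the record layout, the epoch, and the function names are all made up for illustration.

```python
import struct
from datetime import date

# Hypothetical fixed-width record: movie id (u32), customer id (u32),
# rating (u8), days since EPOCH (u16). Little-endian, no padding.
RECORD = struct.Struct("<IIBH")
EPOCH = date(1998, 1, 1)  # assumed epoch, chosen only for this sketch


def parse_movie_file(text):
    """Yield (movie_id, customer_id, rating, day_offset) tuples from one file's text."""
    lines = text.strip().splitlines()
    movie_id = int(lines[0].rstrip(":"))  # header line looks like "1:"
    for line in lines[1:]:
        customer, rating, day = line.split(",")
        year, month, dom = map(int, day.split("-"))
        offset = (date(year, month, dom) - EPOCH).days
        yield movie_id, int(customer), int(rating), offset


def pack_records(records):
    """Pack tuples into a flat byte string a C program could read as packed structs."""
    return b"".join(RECORD.pack(*r) for r in records)


sample = "1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n"
blob = pack_records(parse_movie_file(sample))
```

On the C side, a packed struct with the same field order and widths would read these records back directly. That is the whole appeal of a flat integer format: no parsing at training time.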
I know. And that blows. And the worst part is I understand why Netflix did it, and it's legit. (Someone found a way to partially de-anonymize the data, and used it to learn stuff about people. Creepy stuff.)
At the same time, the internet rarely forgets, so it's pretty easy to find a copy on Google.
It is even easier if you know that the filename you're looking for is nf_prize_dataset.tar.gz.
nfp_hipster is MIT licensed, because viral licenses and newspeak language modification are evil. Free is only free when it's free for everyone.