A simple Python module to clean, organize, preprocess, and make sense of your entire Facebook message history.
This module was originally written to help build a Markov chatbot that emulates our friend Chris after he was kicked from a chat group for excessive crassness. The messages.htm file retrieved from Facebook's downloadable archive was too unwieldy to work with, and I couldn't find a library that processed the corpus as minimally as I wanted, so I wrote this module.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Install python3: Self-explanatory.
- Install BeautifulSoup: Required for HTML parsing.
- Install the lxml parser: lxml is way faster than html.parser or html5lib.
- Download an archive of all your Facebook data.
  - Log into Facebook.
  - Navigate to settings.
  - At the bottom of General Account Settings, find the line of text that says "Download a copy of your Facebook data" and click the link.
  - Follow the instructions on the next page.
  - After your archive is compiled, you will receive an email with your download link.
  - Extract your data dump. Your messages.htm file is located at ./html/messages.htm
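Before moving on, it can help to confirm that the prerequisites above are actually importable. The snippet below is a minimal sketch, not part of this module: `missing_modules` is a hypothetical helper, and it assumes only that BeautifulSoup and lxml are installed under their usual import names, `bs4` and `lxml`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# bs4 is the import name for BeautifulSoup 4; lxml is the parser backend.
missing = missing_modules(["bs4", "lxml"])
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found.")
```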
See examples for more.
Import fbmsgparse
from fbmsgparse import FbMsgParse
Create a FbMsgParse object with the path to your messages.htm document.
messages_path = 'html/messages.htm'
fmp = FbMsgParse(messages_path)
Print out some stats
print(fmp.stats())
Use get_user_messages() to extract a list of all of a specified user's messages from threads with a minimum of 3 members.
u_id = '1234567890'
u_name = 'Name Surname'
msgs = fmp.get_user_messages(u_id, u_name, min_size=3)
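Since the module grew out of a Markov chatbot, here is a rough sketch of how the extracted messages might feed one. It assumes `get_user_messages()` returns a list of plain strings, which is an assumption about the return type rather than something documented here. The helpers `build_chain` and `generate` are hypothetical and not part of fbmsgparse, and the sample list stands in for real output.

```python
import random
from collections import defaultdict


def build_chain(messages):
    """Map each word to the list of words that follow it across all messages."""
    chain = defaultdict(list)
    for text in messages:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, stopping at a dead end or at max_words."""
    words = [start]
    # chain.get avoids creating empty entries in the defaultdict.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Stand-in data; in practice this would be msgs from get_user_messages(),
# assuming that call returns a list of strings.
sample_msgs = ["hello there friend", "hello again"]
chain = build_chain(sample_msgs)
print(generate(chain, "hello", max_words=5))
```

A first-order, word-level chain like this is the simplest variant. Keying on pairs of words instead of single words usually produces more coherent output, but it needs a larger corpus.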
- Beautiful Soup - Scraping galore
This project is licensed under the MIT License - see the LICENSE file for details.