The bot knows about Ben Thompson, Stratechery, and the Stratechery articles listed in data.json. The oldest known article dates back to Nov 8, 2023.
- Stratechery articles were saved as markdown files, split into smaller chunks, and embedded in a Chroma database.
- On (almost) every query, the bot embeds your query, identifies the 7 most similar article chunks and 0-3 most relevant article summaries, and places them into GPT's context to answer your question. This technique is known as Retrieval-Augmented Generation (RAG).
- RAG is far from perfect! I used the open-source all-MiniLM-L6-v2 model to create the embeddings. I used it because it's free (I'm cheap) and has good speed and performance for what you're getting.
- I removed all those markdown Stratechery articles from this repo out of respect for Ben Thompson.
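
For readers new to RAG, here is a minimal sketch of the chunk, embed, and retrieve flow described above. It is not the code in data.py or chatbot_helper.py. Every function name, the chunk sizes, and the prompt wording are hypothetical, and `toy_embed` is a bag-of-words stand-in for a real embedding model such as all-MiniLM-L6-v2, so the sketch runs with only the standard library and without a Chroma database.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=7):
    """Return the k chunks most similar to the query, best first."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=7):
    """Place the retrieved chunks into the context ahead of the question."""
    context = "\n\n".join(top_k(query, chunks, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
```

A real setup would swap `toy_embed` for the embedding model and let the vector database do the similarity search. The overall shape (embed the query, rank the stored chunks, paste the winners into the prompt) stays the same.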
data.py is a messy script that shows how I retrieved, chunked, and embedded the Stratechery articles.
chatbot.py contains the UI logic for the Streamlit chatbot.
chatbot_helper.py contains the helper functions for the Streamlit chatbot. This is where the magic happens with the GPT chat completions.
This bot was built by Ben Wallace. He's been a Stratechery subscriber for about 4 years. He wanted to build a chatbot from scratch and was inspired by the LennyBot, a GPT bot trained on Lenny's Newsletters.
Disclaimer: This app is not affiliated with, endorsed by, or approved by Ben Thompson or Stratechery.