What is the German Bundestag (national parliament of the Federal Republic of Germany) up to? (Was treibt der Bundestag?)
This repo contains all the code needed to set up an autonomous bot that scrapes the Bundestag committees' pages for PDFs, analyzes them using GPT-4, and automatically posts the relevant content on Instagram at @was_treibt_der_bundestag. For more information about the deployment, see the Usage section.
Telegram bot: t.me/was_treibt_der_bundestag_bot [Not properly implemented yet]
Website: wastreibtderbundestag.de [Not properly implemented yet]
Main contributors: Lorenz Hufe @lowrenz, Justus Westerhoff @MassEast, Jakob Maleck
Affiliation: BLISS Berlin
As of March 2024, the German Bundestag's committees regularly publish their work, plans, and activities in the form of PDF files. We think the committees' work is not transparent and 'shareable' enough, so we came up with a simple solution to make it more accessible. In particular, we wanted to highlight which motions the different parties bring forward, so that it is clear what they are working on. Examples:
- 13.03.2024, Antrag der AfD: "Kinderkopftuch als politisch-weltanschauliches Symbol - Verbot in öffentlichen Kindertageseinrichtungen und Schulen" (in English: AfD motion: "Children's headscarves as a political and ideological symbol - ban in public kindergartens and schools")
It is currently deployed on Google Cloud as follows:
```mermaid
flowchart TD;
    S[Cloud Scheduler]-->|trigger|F[Cloud Function: scraper_function.py];
    F ~~~ D[(Firebase on Google Cloud)]
    F-->|save potentially new data|D;
    D-->|get actually new data|F;
    F-->|trigger for each datum in new data|B[Cloud Docker: this repo's Docker]
    B-->|posts|I(Instagram)
    B-->|triggers|T[Cloud Function: telegram_bot.py]
    T-->|broadcasts|TB(Telegram)
    T-->|save chat_id on /start or /sub|D
    D-->|retrieve chat_ids for broadcast|T
```
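For illustration, here is a minimal sketch of the deduplication step between the Cloud Function and the database (assuming Firestore as the Firebase database; the collection, field, and function names are illustrative, not the ones actually used in scraper_function.py):

```python
# Illustrative sketch only: deduplicate scraped PDFs against Firestore.
# Collection/field names are hypothetical, not the repo's actual schema.
from google.cloud import firestore

db = firestore.Client()

def filter_new_documents(scraped_docs: list[dict]) -> list[dict]:
    """Save potentially new documents and return only those not seen before."""
    new_docs = []
    for doc in scraped_docs:
        ref = db.collection("bundestag_pdfs").document(doc["pdf_id"])
        if not ref.get().exists:   # "get actually new data"
            ref.set(doc)           # "save potentially new data"
            new_docs.append(doc)
    return new_docs
```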
Your .env file should contain the URL of the backend that runs the Docker container ("BACKEND_URL"), an OpenAI key ("OPENAI_API_KEY"), an Instagram username ("INSTAGRAM_USERNAME"), and a password ("INSTAGRAM_PASSWORD"). If a Telegram bot is to be used as well, also provide a Telegram bot token ("TELEGRAM_BOT_TOKEN").
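A .env template could hence look like this (all values are placeholders):

```
BACKEND_URL=https://your-backend.example.com
OPENAI_API_KEY=sk-...
INSTAGRAM_USERNAME=your_instagram_username
INSTAGRAM_PASSWORD=your_instagram_password
# optional, only needed for the Telegram bot
TELEGRAM_BOT_TOKEN=123456789:your-telegram-bot-token
```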
Without instagrapi this probably wouldn't have been so easy, since using Meta's official Instagram API turned out to be very restrictive (it requires a Business account and much more). [Meaning: we could not figure it out in one night, the night in which this project was drafted and implemented.]
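For reference, a minimal posting sketch with instagrapi (placeholder image path and caption, not this repo's actual posting code):

```python
# Minimal instagrapi sketch (placeholder image path and caption).
import os
from instagrapi import Client

cl = Client()
# Reusing a saved session (cl.load_settings / cl.dump_settings) helps to stay
# within Instagram's limits, as recommended in instagrapi's best practices.
cl.login(os.environ["INSTAGRAM_USERNAME"], os.environ["INSTAGRAM_PASSWORD"])
cl.photo_upload("post.jpg", caption="Was treibt der Bundestag? ...")
cl.dump_settings("session.json")
```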
At Berlin Hack & Tell #99, we were made aware of the Bundestagszusammenfasser, which does something similar, though it is not focused on producing specialized social media posts but rather on structuring (and also summarizing) very nicely almost everything that can be found on German state websites. Take a look at Sabrina's website!
- become independent from OpenAI and use a self-hosted, open-source LLM instead
- use multi-prompts/multi-agents, e.g., through LangChain, to achieve better and more reliable information retrieval from the PDF files (the currently used prompt is already rather large and could easily be split into subgoals)
- check whether the titles found by the LLM actually appear in the document, to be somewhat certain that it is not making things up (a naive sketch of such a check follows after this list)
- use better text extraction from the PDF, e.g., by better taking the actual PDF structure into account (note that there are out-of-the-box solutions for this with `pdfplumber` and also `pypdf`, but we did not find an easy way to combine these with also highlighting bold text, which we considered more important than the PDF's structure; a `pdfplumber` sketch for the bold-text part also follows after this list)
- follow instagrapi's best practices, because we can easily (update: we did!) overshoot the mentioned safe limit of 4-16 posts and should hence use a proxy, session saving, or delays between requests, as outlined on that page
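A naive version of the title check mentioned above could be a whitespace-normalized substring test (a sketch, not code from this repo):

```python
# Naive hallucination check: does the LLM-extracted title occur verbatim in the PDF text?
import re

def title_appears_in_text(title: str, pdf_text: str) -> bool:
    normalize = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return normalize(title) in normalize(pdf_text)
```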
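And here is a rough sketch of how bold text could be flagged with pdfplumber, using each character's font name as a heuristic (again illustrative, not the extraction currently used in this repo; word spacing is ignored for brevity):

```python
# Heuristic sketch: wrap runs of bold characters in **...** based on the font name.
import pdfplumber

def extract_text_with_bold_markers(path: str) -> str:
    parts, bold_run = [], []

    def flush():
        # Close the current bold run, if any, and mark it.
        if bold_run:
            parts.append("**" + "".join(bold_run) + "**")
            bold_run.clear()

    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for char in page.chars:  # every character dict carries its font name
                if "Bold" in char.get("fontname", ""):
                    bold_run.append(char["text"])
                else:
                    flush()
                    parts.append(char["text"])
            flush()
            parts.append("\n")
    return "".join(parts)
```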