Tweet2Sheet is a Python-based application that scans URLs from RSS feeds and identifies embedded tweets or tweet links. These are then extracted and logged into a Google Spreadsheet for easy access and review.
- Scans RSS Feeds: Utilizes Feedparser to scan and extract URLs from RSS feeds.
- Extracts Tweets: Employs BeautifulSoup to scan URLs and detect embedded or linked tweets.
- Outputs to Google Spreadsheet: Stores all identified tweets into a Google Spreadsheet.
- Ignores Duplicate Entries: When run multiple times, the script disregards duplicate entries, ensuring a unique collection of tweets.
- Configurable Columns: Easily customize the columns in your Google Sheet via the config file.
- Domain Formatting: Automatically formats domain names for consistency (removes 'www.' and '.no').
The output Google Spreadsheet contains the following columns:
Date | Time | Domain | Twitter Handle | Article URL | Tweet URL | Article Title | Article Summary | Published Date | Tweet Count |
---|---|---|---|---|---|---|---|---|---|
The application uses a YAML configuration file (config.yaml
) for easy setup and customization. Here's an example structure:
google_credentials:
# Your Google Sheets API credentials here
master_config_url: "https://docs.google.com/spreadsheets/d/your-spreadsheet-id/edit#gid=0"
rss_feeds:
- "https://example.com/feed1.rss"
- "https://example.com/feed2.rss"
column_names:
- Date
- Time
- Domain
- Twitter Handle
- Article URL
- Tweet URL
- Article Title
- Article Summary
- Published Date
- Tweet Count
max_workers: 10
request_timeout: 10
rate_limit: 1
max_age_days: 7
The repository includes all necessary files to clone and run the project as an app on Heroku. You just need to update the config.yaml
file with your Google account details. If you're unfamiliar with this process, follow the steps outlined in this tutorial.
Credentials are required for Google Sheets API access. These should be placed in the config.yaml
file. You can obtain these credentials from your Google account (a json-file will be provided by Google). For details on how to obtain and use these credentials, refer to the above-mentioned tutorial.
After obtaining the json-file, copy the credentials into the config.yaml
file before running the script. Always be mindful to keep such credentials secure and do not share them publicly.