A fully-featured search app for curated news summaries for competitive extemporaneous speaking.
Extemporaneous speaking (extemp) is a competitive Speech and Debate event practiced at levels from middle school through college. Speakers draw a question about current events and have 30 minutes to prep a seven-minute speech. Speakers must cite the sources they referred to during their prep time, and national finalists can cite upwards of 20 sources in one speech.
Thus, efficient use of prep time is one of the competitive factors that separate good extempers from not-so-great extempers. Finding high-quality sources, quickly filtering information, and summarizing lengthy articles are some of the skills extempers hone to become the best in their event.
Great question! This tool:
- Parses a curated group of RSS feeds from high-quality sources every morning
- Summarizes the content of each article into five sentences or fewer using Natural Language Processing
- Indexes the source information and summary into Elasticsearch
- Lets users search for relevant articles through the search web app, with fully-featured filtering
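The summarization step in the pipeline above can be sketched with a simple extractive approach. The project's actual pipeline uses spaCy; the function below (a hypothetical `summarize` helper, not the project's code) illustrates the idea with plain word-frequency scoring: it keeps the highest-scoring sentences, in their original order, up to the five-sentence cap.

```python
import re
from collections import Counter

def summarize(text: str, max_sentences: int = 5) -> str:
    """Extractive summary: keep the highest-scoring sentences,
    preserving their original order in the article."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    # Score each sentence by the total corpus frequency of its words.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])  # restore article order
    return " ".join(sentences[i] for i in keep)
```

A real NLP summarizer weighs far more signal than raw word counts, but the shape of the step is the same: article text in, a handful of representative sentences out, ready to be indexed alongside the source metadata.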
Google is definitely what most of us think of when we think "search". However, one must wade through sometimes irrelevant articles from low-quality sources that happen to be good at SEO. Extempers also have to skim through entire articles, which can reduce the number of sources they can cover in a limited time.
There are a few paid tools out there that offer a far more complete feature set than this tool. Due to the limited time I have to develop this tool, it is currently focused on the source management aspect of extemp software.
While I support folks making money off of their labor, this tool is a labor of love for the community and not intended to be something I profit off of. If you believe in the philosophy of open source software, then perhaps you'll consider using this tool.
The core languages/frameworks used:
- Python 3.8
- spaCy
- Elastic App Search
- React
- Elastic/Search-UI
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
- Docker installed
- An Elastic App Search cloud host and token (14-day free trial available, or you can self-host as well)
Download the code and unzip it. Navigate to the `rss_scraper` directory and run:

make local

This will install the required dependencies in a local environment in your directory.
I've found the frontend works best in a Docker container. Add your host and token to `src/config/engine.json`.
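For reference, an `engine.json` in this style is typically a small JSON file along these lines. The field names below follow Elastic's Search UI App Search examples, and the engine name and values are placeholders, so check the file shipped in the repo for the exact keys it expects:

```json
{
  "engineName": "extemp-assist",
  "endpointBase": "https://your-deployment.ent.your-region.aws.cloud.es.io",
  "searchKey": "search-your-public-search-key"
}
```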
Then build the Docker image with:

docker build -t search:dev .

You can also run `npm install`, but I've had some conflicts with a brew-installed version of Node on macOS.
To run the code, you'll want to set some environment variables. Run the following before you run the scraper locally:

export ES_HOST=<your host>
export ES_TOKEN=<your ES private token>

Then you can run the scraper:

python rss_scraper/rss_to_elasticsearch.py

This will scrape the feeds, summarize the articles, and push the results up to Elasticsearch.
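Inside the scraper, those two variables are presumably read from the environment with `os.environ`. A minimal sketch of that pattern (the `ScraperConfig` and `load_config` names are hypothetical, not the project's actual code):

```python
import os
from dataclasses import dataclass

@dataclass
class ScraperConfig:
    """Connection settings for the search backend (illustrative helper)."""
    host: str
    token: str

def load_config() -> ScraperConfig:
    # Fail fast with a clear message if either variable is missing,
    # rather than failing later with an opaque connection error.
    try:
        return ScraperConfig(host=os.environ["ES_HOST"],
                             token=os.environ["ES_TOKEN"])
    except KeyError as missing:
        raise SystemExit(
            f"Set the {missing.args[0]} environment variable before running"
        ) from None
```

Reading credentials from the environment keeps secrets out of the repo and matches how they are later supplied as GitHub secrets in the deployment steps below.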
You can either run the UI via the Docker container with:

docker run -p 3000:3000 search:dev

or run `npm start`. Either way, once it's up you can access the UI at localhost:3000.
To deploy the entire tool, you will need to set up a GCP account.

1. Create a new project. Store that project ID as a GitHub secret `PROJECT_ID`.
2. Create a new service account with the following permissions:
   - Container Registry Service Agent
   - Service Account User
   - Cloud Run Admin
   - Storage Admin
3. Create a service account key for the new account, save the JSON, and store it as a GitHub secret `GCP_SA_KEY` (referenced as `secrets.GCP_SA_KEY` in the workflow).
4. Create GitHub secrets for your `ES_HOST` and `ES_TOKEN`.
5. Create a new Cloud Run service in GCP called "extemp-assist-rss"; for now, use the sample container. Set the invoke permissions to internal only, and do not allow unauthenticated invocation.
6. Go to GCP Cloud Scheduler. Create a job with the cron rule `0 4 * * *` to run it every day at 4 AM. Retrieve the URL from step 5 and use that as an HTTP target.
7. Create another GCP Cloud Run service called "extemp-assist-ui". Allow all incoming connections and unauthenticated invocations.
8. Use the main.yml file in `.github/workflows` as your GitHub Actions template and deploy using a GitHub Action.
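For orientation, a workflow like main.yml generally builds the image, pushes it to the project's registry, and deploys to Cloud Run. The trimmed sketch below shows that shape; the step versions and region are assumptions, and the main.yml in `.github/workflows` is the source of truth:

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy-ui:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Authenticate gcloud with the service account key stored as a secret.
      - uses: google-github-actions/setup-gcloud@v0
        with:
          project_id: ${{ secrets.PROJECT_ID }}
          service_account_key: ${{ secrets.GCP_SA_KEY }}
      # Build and push the UI image, then roll it out to Cloud Run.
      - run: gcloud builds submit --tag gcr.io/${{ secrets.PROJECT_ID }}/extemp-assist-ui
      - run: >
          gcloud run deploy extemp-assist-ui
          --image gcr.io/${{ secrets.PROJECT_ID }}/extemp-assist-ui
          --region us-central1 --platform managed
          --allow-unauthenticated
```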