Legal NLP Project - DB_COMP

Overview

This repository is part of a personal project that aims to apply Natural Language Processing (NLP) techniques to the legal domain. The goal is to analyze Competition Law decisions using advanced text processing methods.

Features

DB_COMP Web Scraper

Scrapes decision documents from the DB_COMP website.
Extracts key information such as decision dates, titles and URLs.
Stores the extracted data in a structured format for easy analysis.

Downloading and Processing PDFs from DB COMP website

The code in this section:

Downloads a set of PDFs from given URLs.
Extracts the text content.
Preprocesses the text for further analysis.

Extracting Outcomes of Decisions

This segment:

Extracts the outcomes of decisions from the processed text.
Focuses on identifying and isolating specific sections that pertain to the decisions and their outcomes.

Text Summarization of Outcomes

The repository includes functionality for:

Summarizing the extracted outcomes using the GPT-3.5-turbo-0125 model with a zero-shot prompt.
The prompt guides the model to summarize whether the decision sanctioned or did not sanction the involved firms, starting with 'Sanction' or 'Not Sanction' respectively.

Merging Data and Creating Summary Database

The final step involves:

Merging the processed data with a scraped database.
Adding a column indicating whether the decision sanctioned or did not sanction the involved firms.
Resulting in a database with over 500 decisions, each with a summary of the outcome.

Project Objective

This project aims to leverage NLP techniques to enhance the analysis of legal documents, providing valuable insights and summaries that can aid in understanding Competition Law decisions more effectively.

Contact Details: Mail: pa.malcav@gmail.com Linkedin: https://www.linkedin.com/in/piero-alexis-malca-vilchez-27b3b1129/

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
outputs		outputs
.gitattributes		.gitattributes
1_db_scrapper.ipynb		1_db_scrapper.ipynb
2_processing_pdf_files.ipynb		2_processing_pdf_files.ipynb
3_extracting_outcomes.ipynb		3_extracting_outcomes.ipynb
4_text_summarization.ipynb		4_text_summarization.ipynb
5_creating_database.ipynb		5_creating_database.ipynb
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Legal NLP Project - DB_COMP

Overview

Features

DB_COMP Web Scraper

Downloading and Processing PDFs from DB COMP website

Extracting Outcomes of Decisions

Text Summarization of Outcomes

Merging Data and Creating Summary Database

Project Objective

About

Releases

Packages

Languages

pmalca/db_comp

Folders and files

Latest commit

History

Repository files navigation

Legal NLP Project - DB_COMP

Overview

Features

DB_COMP Web Scraper

Downloading and Processing PDFs from DB COMP website

Extracting Outcomes of Decisions

Text Summarization of Outcomes

Merging Data and Creating Summary Database

Project Objective

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages