
This repository uses LLMs, including GPT-2, GPT-3.5, LLaMA 3, and Gemini, to automate the generation of news articles. It includes examples of generated news articles in Spanish, along with the datasets used for model training.


BrauuHdzM/LLM-for-automated-news-articles-generation


LLM-for-automated-news-articles-generation

This repository contains the source code of the modules described in the article "On the usage of LLM for automated news article generation." This article explores the use of Large Language Models to generate a news article from a brief description of the event of interest. Additionally, it includes the datasets used to train the models and the news articles generated by the models reported in the article.

Prerequisites

Ensure you have the following before running the project:

  • Python 3.x
  • Required libraries: transformers, torch, pandas, spacy, and unsloth, among others.

All necessary dependencies are specified in each notebook.
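
Before opening a notebook, it can help to check which of the libraries listed above are already installed. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the list covers only the libraries named in this README (each notebook may need more).

```python
import importlib.util

# Import names for the libraries named in the prerequisites.
# The README says "among others", so this list is not exhaustive.
REQUIRED = ["transformers", "torch", "pandas", "spacy", "unsloth"]


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed libraries are importable.")
```

Any package reported as missing can be installed from the dependency cell of the notebook you plan to run.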

Project Structure

The repository consists of the following key files and directories:

Models:

  • GPT-2: A fine-tuned GPT-2 model for news generation, available for free use.
  • GPT-3.5: Users must fine-tune the model themselves and run it with their own OpenAI API key. API usage costs apply.
  • LLaMA 3: A fine-tuned version of this model is also available for free use.
  • Gemini: Like GPT-3.5, this model must be trained and used under the user's own Google API key, and API usage costs may apply.
  • GPT-4: Like Gemini and GPT-3.5, this model requires the user's own OpenAI API key to make requests, and API usage costs may apply.

The repository includes the fine-tuning code for each of the models mentioned, along with examples that demonstrate how to use each model individually. Users can refer to these examples to either fine-tune or deploy the models in their own workflows.
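
For the hosted models (GPT-3.5, GPT-4, and Gemini), one common way to supply a key without hard-coding it in a notebook is to read it from an environment variable. The sketch below is illustrative only: the helper `get_api_key` is hypothetical, and the variable names `OPENAI_API_KEY` and `GOOGLE_API_KEY` are assumptions, since the notebooks may read keys differently.

```python
import os

# Assumed environment variable names; the notebooks may expect the key
# to be provided in another way (for example, pasted into a cell).
API_KEY_VARS = {
    "GPT-3.5": "OPENAI_API_KEY",
    "GPT-4": "OPENAI_API_KEY",
    "Gemini": "GOOGLE_API_KEY",
}


def get_api_key(model, env=None):
    """Return the API key for a hosted model, or raise if it is not set."""
    env = os.environ if env is None else env
    var = API_KEY_VARS.get(model)
    if var is None:
        raise ValueError(f"No API key variable is defined for {model!r}")
    key = env.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running the {model} notebook")
    return key
```

GPT-2 and LLaMA 3 are run from fine-tuned weights and are intentionally absent from the mapping.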

Similar News Retrieval:

  • RSS_Similar_News_Retrieval.ipynb: A notebook that implements the similar news retrieval module, with examples and code for integrating news retrieval into your own workflows.

Dataset:

  • News_Dataset_Complete.csv: The full dataset of news.
  • News_Dataset_For_FT.csv: Dataset specifically prepared for fine-tuning the models.
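
This README does not document the column layout of the dataset files, so a quick inspection is a reasonable first step. The following sketch uses only the standard library; the helper `summarize_csv` is hypothetical, and it assumes the files are UTF-8 encoded with a header row.

```python
import csv


def summarize_csv(path, encoding="utf-8"):
    """Return (column_names, row_count) for a CSV file with a header row."""
    # newline="" lets the csv module handle line breaks inside quoted
    # fields, which are likely in multi-paragraph article text.
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    columns, rows = summarize_csv("News_Dataset_For_FT.csv")
    print("Columns:", columns)
    print("Rows:", rows)
```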

Experiments:

  • Quantitative evaluation:
    • UniEval_evaluation.ipynb: A notebook that performs the quantitative evaluation described in the paper on the generated news articles in English.
  • Qualitative evaluation:
    • Qualitative evaluation.xls: The qualitative analysis of the news articles generated by the LLMs. Each article was assessed against the qualitative evaluation criteria described in the paper, and the results are marked directly in the generated text using a color palette.
  • News_Generated_In_Spanish.csv: The news articles, in their original Spanish version, that were generated by the LLMs in the experiments reported in the paper.
  • News_Generated_In_English.csv: English translation of the News_Generated_In_Spanish.csv file (used in the quantitative evaluation).

How to Use the Code

Follow these steps to run the models:

  1. Clone the repository:

    git clone https://github.com/BrauuHdzM/LLM-for-automated-news-articles-generation.git
    cd LLM-for-automated-news-articles-generation
    
  2. Open the notebook for the model you wish to test (e.g., GPT-2.ipynb or GPT-3.5.ipynb) in Jupyter or Colab.

  3. Run the cells in the notebook to load the model, train it if necessary, and generate news articles automatically.
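
After cloning, you may want to see which notebooks are available, since only two are named in the steps above. This is a small sketch with a hypothetical helper, `find_notebooks`; it assumes it is run from the root of the cloned repository and makes no claim about which notebooks exist there.

```python
from pathlib import Path


def find_notebooks(repo_dir="."):
    """Return sorted relative paths of the .ipynb files under repo_dir."""
    root = Path(repo_dir)
    found = []
    for path in root.rglob("*.ipynb"):
        rel = path.relative_to(root)
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in rel.parts:
            continue
        found.append(rel.as_posix())
    return sorted(found)


if __name__ == "__main__":
    for name in find_notebooks():
        print(name)
```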

Contributing

To contribute to this project, you can:

  1. Fork the repository.
  2. Create a new branch with your changes.
  3. Submit a pull request for review.

Suggestions for improvement include:

  • Expanding to other LLMs
  • Fixing bugs
  • Enhancing the code structure
