This repository contains the source code of the modules described in the article "On the usage of LLM for automated news article generation." This article explores the use of Large Language Models to generate a news article from a brief description of the event of interest. Additionally, it includes the datasets used to train the models and the news articles generated by the models reported in the article.
Ensure you have the following before running the project:
- Python 3.x
- Required libraries:
transformers
,torch
,pandas
,spacy
,unsloth
among others.
All the necessary dependencies are specififed in each notebook
The repository consists of the following key files and directories:
- GPT-2: Implementation of the GPT-2 model for news generation, which has been fine-tuned and is available for free use.
- GPT-3.5: Requires users to fine-tune the model themselves and utilize it with their own API keys from OpenAI. Be aware that using GPT-3.5 comes with associated API usage costs.
- LLaMA3: A fine-tuned version of this model is also available for free use in the project.
- Gemini: Like GPT-3.5, this model requires training and usage under the user’s own Google API key, and costs associated with its API usage may apply.
- GPT-4: Similar to Gemini or GPT-3.5, this model requires users to provide their own OpenAI API key to make requests. Users should be aware that API usage costs may apply when using GPT-4.
The repository includes the fine-tuning code for each of the models mentioned, along with examples that demonstrate how to use each model individually. Users can refer to these examples to either fine-tune or deploy the models in their own workflows.
RSS_Similar_News_Retrieval.ipynb
: A notebook that implements the similar news retrieval module. This notebook provides examples and code to help users integrate news retrieval functionalities into their workflows.
News_Dataset_Complete.csv
: The full dataset of news.News_Dataset_For_FT.csv
: Dataset specifically prepared for fine-tuning the models.
- Quantitative evaluation:
UniEval_evaluation.ipynb
: A notebook that performs the quantitative evaluation described in the paper for the generated news article in English.
- Qualitative evaluation:
Qualitative evaluation.xls
: The file shows the qualitative analysis performed on the news articles generated by the LLMs. Each news article was subjected to the qualitative evaluation criteria described in the paper, and its results are shown directly in the generated text using a color palette.
News_Generated_In_Spanish.csv
: The news articles, in their original Spanish version, that were generated by the LLMs in the experiments reported in the paper.News_Generated_In_English.csv
: English translation of the News_Generated_In_Spanish.csv file (used in the quantitative evaluation).
Follow these steps to run the models:
-
Clone the repository:
git clone https://github.com/BrauuHdzM/LLM-for-automated-news-articles-generation.git cd LLM-for-automated-news-articles-generation
-
Open the notebook in Jupyter or Colab for the model you wish to test (e.g.,
GPT-2.ipynb
orGPT-3.5.ipynb
). -
Run the cells in the notebook to load the model, train it if necessary, and generate news articles automatically.
To contribute to this project, you can:
- Fork the repository.
- Create a new branch with your changes.
- Submit a pull request for review.
Suggestions for improvement include:
- Expanding to other LLMs
- Fixing bugs
- Enhancing the code structure