Code for the paper LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments.
We recommend using conda and pip to manage the environment. To set up the environment:
conda create --name lola
conda activate lola
conda install pip
pip install datasets
pip install peft
pip install evaluate
pip install transformers -U
pip install -U scikit-learn
pip install -U matplotlib
pip install progressbar2
pip install openai
# to download the Llama-3 model (only needed for fine-tuning Llama-3), register on huggingface for access to the model and then run the following command
pip install -U "huggingface_hub[cli]"
huggingface-cli login
# type in your huggingface credentials
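After installation, a quick check that every package can be imported may save a failed run later. The snippet below is a minimal sketch and not part of the repository. The function name find_missing is hypothetical, and the import names are assumed from each package's usual convention (scikit-learn imports as sklearn, progressbar2 as progressbar).

```python
import importlib.util

# Maps each pip package name from the setup steps to its import name.
# Two differ from their pip names: scikit-learn -> sklearn, progressbar2 -> progressbar.
REQUIRED_MODULES = {
    "datasets": "datasets",
    "peft": "peft",
    "evaluate": "evaluate",
    "transformers": "transformers",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "progressbar2": "progressbar",
    "openai": "openai",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All packages found.")
```

Run it inside the activated lola environment; any name it prints can be installed with the matching pip command above.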
We provide some intermediate results so that the results in the paper can be reproduced. Using these intermediate results saves time by skipping the OpenAI API calls and the model fine-tuning.
- Run Pure LLM - Prompt/visualize_result.py (this will generate mean_differences_heatmap_multiple.pdf and p_values_heatmap_multiple.pdf)
- Run Pure LLM - Embedding/predict_with_embedding.py
- Run Finetune CTR Prediction/plot.py
- Run the Jupyter Notebook LOLA - Regret Minimize/LOLA_regret_minimize.ipynb
Below are the steps to run all the code, from generating the intermediate results to producing the final results.
The original dataset we used is https://osf.io/jd64p/.
The pre-processed dataset can be downloaded from Kaggle, either from the website or with the kaggle CLI command:
kaggle datasets download -d shuffleofficial/lola-llm-assisted-online-learning-algorithm
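The kaggle command downloads a zip archive that still has to be unpacked. The following is a minimal sketch of one way to do that from Python. The archive name is an assumption (the kaggle CLI usually names the zip after the dataset slug), and both the target directory data and the function name extract_dataset are hypothetical, not something the repository prescribes.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir="data"):
    """Extract a zip archive into target_dir and return the sorted member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumed file name; adjust it if the kaggle CLI saved the archive elsewhere.
    archive = Path("lola-llm-assisted-online-learning-algorithm.zip")
    if archive.exists():
        for name in extract_dataset(archive):
            print(name)
    else:
        print(f"{archive} not found; run the kaggle command first.")
```

The scripts and notebooks may expect the CSV files at specific relative paths, so move the extracted files next to the code that reads them if needed.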
- For data processing
  - Code Path: Upworthy Data Processing.ipynb
  - Running this code will generate a CSV file named ctr-all.csv, along with various data splits
  - Data used: upworthy-archive-holdout-packages-03.12.2020.csv, upworthy-archive-exploratory-packages-03.12.2020.csv and upworthy-archive-confirmatory-packages-03.12.2020.csv (these data are downloaded from https://osf.io/jd64p/)
- For Prompt Engineering Method
  - Code Path: Pure LLM Approaches/Pure LLM - Prompt/main.py and Pure LLM Approaches/Pure LLM - Prompt/visualize_result.py
  - Data used: winner-all.csv
- For CTR prediction using OpenAI and Word2Vec Embedding
  - Run Pure LLM - Embedding/get_embedding.py to get the embeddings for the dataset
  - Run Pure LLM - Embedding/predict_with_embedding.py to get the prediction results
  - Data used: selected_pairs_df_005_256.csv and selected_pairs_df_005_3072.csv
- For LOLA
  - Code Path: LOLA - Regret Minimize/LOLA_regret_minimize.ipynb
  - Data used: LoRA CTR.csv and simulation_results_regret_min
- Survey Results
  - Code and data path: Survey