From bbb189bdde58bca7609b15511558d7a61bd5fed0 Mon Sep 17 00:00:00 2001
From: Sebastian Bordt
Date: Thu, 8 Aug 2024 21:20:29 +0200
Subject: [PATCH] readme

---
 colm-2024-paper-code/README.md                     | 20 ++------------------
 .../run_time_series_experiments.py                 |  1 -
 2 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/colm-2024-paper-code/README.md b/colm-2024-paper-code/README.md
index 980a516..538bd3b 100644
--- a/colm-2024-paper-code/README.md
+++ b/colm-2024-paper-code/README.md
@@ -1,25 +1,9 @@
-# 🐘 Testing Language Models for Memorization of Tabular Datasets
-![PyPI - Version](https://img.shields.io/pypi/v/tabmemcheck)
-![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
-![License](https://img.shields.io/github/license/interpretml/TalkToEBM.svg?style=flat-square)
-[![tests](https://github.com/interpretml/LLM-Tabular-Memorization-Checker/actions/workflows/run_tests.yaml/badge.svg?branch=main)](https://github.com/interpretml/LLM-Tabular-Memorization-Checker/actions/workflows/run_tests.yaml)
-[![Documentation](https://img.shields.io/badge/Documentation-View-blue)](http://interpret.ml/LLM-Tabular-Memorization-Checker/)
+# 🐘 Never Forget: Memorization and Learning of Tabular Data in Large Language Models
 
 Header Test
 
 
-
-Tabmemcheck is an open-source Python library to test language models for memorization of tabular datasets.
-
-Features:
-- [x] Test GPT-3.5, GPT-4, and other LLMs for memorization of tabular datasets.
-- [x] Supports chat models and (base) language models. In chat mode, the prompts are designed toward GPT-3.5 and GPT-4. We recommend testing the base models with other LLMs.
-- [x] Based entirely on prompts (no access to the probability distribution over tokens ('logprobs') is required).
-- [x] The submodule ``tabmemcheck.datasets`` allows to load tabular datasets in perturbed form (``original``, ``perturbed``, ``task``, ``statistical``).
-
-The different tests are described in a Neurips'23 workshop [paper](https://arxiv.org/abs/2403.06644).
-
-The dataset transforms and the consequences of memorization for few-shot learning are discussed in this [pre-print](https://arxiv.org/abs/2404.06209).
+Here we provide the code to replicate the COLM'24 [paper](https://arxiv.org/abs/2404.06209) "Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models".
 
 ### Installation
 
diff --git a/colm-2024-paper-code/run_time_series_experiments.py b/colm-2024-paper-code/run_time_series_experiments.py
index 5bf454d..2a47f85 100644
--- a/colm-2024-paper-code/run_time_series_experiments.py
+++ b/colm-2024-paper-code/run_time_series_experiments.py
@@ -4,7 +4,6 @@
 
 import pandas as pd
 
-import tabmemcheck
 import tabular_queries
 
 import yaml