
Commit

readme
sbordt committed Aug 8, 2024
1 parent 735bd8d commit bbb189b
Showing 2 changed files with 2 additions and 19 deletions.
20 changes: 2 additions & 18 deletions colm-2024-paper-code/README.md
@@ -1,25 +1,9 @@
# 🐘 Testing Language Models for Memorization of Tabular Datasets
![PyPI - Version](https://img.shields.io/pypi/v/tabmemcheck)
![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)
![License](https://img.shields.io/github/license/interpretml/TalkToEBM.svg?style=flat-square)
[![tests](https://github.com/interpretml/LLM-Tabular-Memorization-Checker/actions/workflows/run_tests.yaml/badge.svg?branch=main)](https://github.com/interpretml/LLM-Tabular-Memorization-Checker/actions/workflows/run_tests.yaml)
[![Documentation](https://img.shields.io/badge/Documentation-View-blue)](http://interpret.ml/LLM-Tabular-Memorization-Checker/)
# 🐘 Never Forget: Memorization and Learning of Tabular Data in Large Language Models

<p align="center">
<img src="img/elephant.webp" width="800" alt="Header Test"/>
</p>

Tabmemcheck is an open-source Python library to test language models for memorization of tabular datasets.

Features:
- [x] Test GPT-3.5, GPT-4, and other LLMs for memorization of tabular datasets.
- [x] Supports both chat models and (base) language models. The chat-mode prompts are tailored to GPT-3.5 and GPT-4; we recommend testing other LLMs as base models.
- [x] Based entirely on prompts (no access to the probability distribution over tokens ('logprobs') is required).
- [x] The submodule ``tabmemcheck.datasets`` loads tabular datasets in original and perturbed versions (``original``, ``perturbed``, ``task``, ``statistical``); see the sketch below.
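
A minimal usage sketch of the two features above, assuming an OpenAI API key is configured; the function names ``run_all_tests`` and ``load_dataset`` and their parameters are illustrative assumptions and should be checked against the [documentation](http://interpret.ml/LLM-Tabular-Memorization-Checker/).

```python
# Hypothetical usage sketch -- the function names and signatures below are
# assumptions, not the confirmed tabmemcheck API; consult the documentation.
import tabmemcheck
from tabmemcheck import datasets

# Run the prompt-based memorization tests against a chat model.
# The tests rely only on the model's text output, not on logprobs.
tabmemcheck.run_all_tests("iris.csv", "gpt-3.5-turbo")

# Load a dataset in the original and in a perturbed variant.
df_original = datasets.load_dataset("iris.csv", transform="original")
df_perturbed = datasets.load_dataset("iris.csv", transform="perturbed")
```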

The different tests are described in a NeurIPS'23 workshop [paper](https://arxiv.org/abs/2403.06644).

The dataset transforms and the consequences of memorization for few-shot learning are discussed in this [pre-print](https://arxiv.org/abs/2404.06209).
Here we provide the code to replicate the COLM'24 [paper](https://arxiv.org/abs/2404.06209) "Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models".

### Installation

1 change: 0 additions & 1 deletion colm-2024-paper-code/run_time_series_experiments.py
@@ -4,7 +4,6 @@

import pandas as pd

import tabmemcheck
import tabular_queries

import yaml
