Segmenting Watermarked Texts From Language Models

Implementation of the methods described in "Segmenting Watermarked Texts From Language Models" by Xingchi Li, Guanxun Li, Xianyang Zhang.

Prerequisites

Python environments

Cython==3.0.10
datasets==2.19.1
huggingface_hub==0.23.0
nltk==3.8.1
numpy==1.26.4
sacremoses==0.0.53
scipy==1.13.0
sentencepiece==0.2.0
tokenizers==0.19.1
torch==2.3.0.post100
torchaudio==2.3.0
torchvision==0.18.0
tqdm==4.66.4
transformers==4.40.2
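
Before launching long jobs, it can help to confirm that the active environment matches the pinned versions above. The following is a minimal sketch, not part of the repository: the `PINNED` subset and the `check_pins` helper are hypothetical names chosen for illustration, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata

# A subset of the pins listed above, copied by hand for illustration.
PINNED = {
    "numpy": "1.26.4",
    "scipy": "1.13.0",
    "tokenizers": "0.19.1",
    "transformers": "4.40.2",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every checked package matches its pin. Extend `PINNED` with the remaining entries as needed.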

Set up environments

# PyTorch: https://pytorch.org/get-started/locally
# Transformers: https://huggingface.co/docs/transformers/en/installation
conda install cython scipy nltk sentencepiece sacremoses

Instructions

All experiments are run with the Slurm workload manager. The expected running time and memory usage are given in the corresponding sbatch scripts.

Important

Before running the experiments, update the paths and Slurm mail options in the sbatch scripts, and adjust the GPU resources to match your cluster.

Caution

The Python SeedBS script is adapted from the R version. Its output is not guaranteed to match the R output.
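
For orientation, seeded binary segmentation (SeedBS) searches for change points over a deterministic, multi-scale collection of intervals. The sketch below follows the published definition of seeded intervals and is not taken from 4-seedbs.R or 4.1-seedbs.py. The function name `seeded_intervals`, the `min_length` cutoff, and the default decay are assumptions made for illustration.

```python
import math


def seeded_intervals(n, decay=0.5 ** 0.5, min_length=2):
    """Return sorted half-open (start, end) seeded intervals on range(n).

    Layer k holds 2 * ceil((1 / decay) ** (k - 1)) - 1 evenly shifted
    intervals of length n * decay ** (k - 1); layer 1 is the full range.
    """
    if not 0.5 <= decay < 1:
        raise ValueError("decay must lie in [0.5, 1)")
    intervals = {(0, n)}
    depth = math.ceil(math.log(n) / math.log(1 / decay))
    for k in range(2, depth + 1):
        length = n * decay ** (k - 1)
        if length < min_length:
            break  # hypothetical cutoff: skip intervals too short to test
        count = 2 * math.ceil((1 / decay) ** (k - 1)) - 1
        shift = (n - length) / (count - 1)
        for i in range(count):
            start = math.floor(i * shift)
            end = min(n, math.ceil(i * shift + length))
            intervals.add((start, end))
    return sorted(intervals)
```

For example, `seeded_intervals(8, decay=0.5)` yields the full range, three intervals of length 4, and seven intervals of length 2. Differences in rounding or interval ordering between implementations are one plausible reason two ports can disagree.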

# Set up pyx.
sbatch 1-setup.sh

# Text generation.
bash 2-textgen-helper.sh
sbatch 2-textgen.sh

# Rolling window watermark detection.
bash 3-detect-helper.sh
sbatch 3-detect.sh

# Change point analysis using R.
bash 4-seedbs-helper.sh
sbatch 4-seedbs.sh
# OR using Python.
bash 4.1-seedbs-helper.sh
sbatch 4.1-seedbs.sh

# Result analysis and plotting.
Rscript 5-not.R
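
The rolling window detection step produces one p-value per window of tokens, and that p-value sequence is what the change point analysis consumes. The sketch below illustrates the idea with a generic randomization test and is not the code in 3-detect.py. The function name `rolling_pvalues`, the window-mean statistic, and the convention that larger scores indicate stronger watermark evidence are all assumptions.

```python
from statistics import fmean


def rolling_pvalues(scores, null_scores, window):
    """Randomization-test p-value for every full window of `scores`.

    scores:      per-token scores computed with the true watermark key.
    null_scores: one row of per-token scores per resampled key.
    Larger window means are assumed to indicate stronger evidence.
    """
    n = len(scores)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(scores)")
    pvalues = []
    for start in range(n - window + 1):
        end = start + window
        observed = fmean(scores[start:end])
        # Count resampled keys whose window statistic is at least as large.
        exceed = sum(fmean(row[start:end]) >= observed for row in null_scores)
        # The +1 terms keep the p-value strictly positive.
        pvalues.append((1 + exceed) / (1 + len(null_scores)))
    return pvalues
```

Windows that overlap watermarked text should yield small p-values, so a change in the distribution of the p-value sequence marks a boundary between watermarked and unwatermarked segments.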

Tip

The implementation of NOT can be found in the 5-not.R script from line 348 to 371.
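
Assuming NOT here refers to narrowest-over-threshold selection, the idea can be sketched as follows: among all candidate intervals whose test statistic exceeds a threshold, pick the narrowest, record its best split, and recurse on both sides. This is an illustration only and not a port of 5-not.R. The CUSUM mean-shift statistic is a stand-in for whatever statistic the script applies, and the names `cusum` and `narrowest_over_threshold` are hypothetical.

```python
import math


def cusum(x, start, end):
    """Return (best_split, max |CUSUM|) for a mean change in x[start:end]."""
    n = end - start
    total = sum(x[start:end])
    best_split, best_stat, left = None, 0.0, 0.0
    for split in range(start + 1, end):
        left += x[split - 1]
        n_left = split - start
        n_right = n - n_left
        stat = abs(
            math.sqrt(n_right / (n * n_left)) * left
            - math.sqrt(n_left / (n * n_right)) * (total - left)
        )
        if stat > best_stat:
            best_split, best_stat = split, stat
    return best_split, best_stat


def narrowest_over_threshold(x, intervals, threshold, start=0, end=None):
    """Return sorted change points found inside x[start:end]."""
    end = len(x) if end is None else end
    candidates = []
    for s, e in intervals:
        # Only intervals fully inside the current segment are eligible.
        if start <= s and e <= end and e - s >= 2:
            split, stat = cusum(x, s, e)
            if split is not None and stat > threshold:
                candidates.append((e - s, s, split))
    if not candidates:
        return []
    # Narrowest interval wins; ties go to the smallest start index.
    _, _, split = min(candidates)
    left = narrowest_over_threshold(x, intervals, threshold, start, split)
    right = narrowest_over_threshold(x, intervals, threshold, split, end)
    return left + [split] + right
```

A returned index `t` means the segments are `x[:t]` and `x[t:]`. Preferring the narrowest interval makes it more likely that the selected interval contains only one change point, which is what a single-split statistic such as CUSUM is designed for.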

Citation

@inproceedings{NEURIPS2024_1a8d2958,
  author = {Li, Xingchi and Li, Guanxun and Zhang, Xianyang},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
  pages = {14634--14665},
  publisher = {Curran Associates, Inc.},
  title = {Segmenting Watermarked Texts From Language Models},
  url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/1a8d295871250443f9747d239925b89d-Paper-Conference.pdf},
  volume = {37},
  year = {2024}
}

OpenReview and arXiv

@inproceedings{
  li2024segmenting,
  title={Segmenting Watermarked Texts From Language Models},
  author={Xingchi Li and Guanxun Li and Xianyang Zhang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=FAuFpGeLmx}
}

@misc{li2024segmentingwatermarkedtextslanguage,
  title={Segmenting Watermarked Texts From Language Models}, 
  author={Xingchi Li and Guanxun Li and Xianyang Zhang},
  year={2024},
  eprint={2410.20670},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.20670}, 
}


doccstat/llm-watermark-cpd
