Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"

centerforaisafety/emergent-values

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Website | Paper

Additional code and content will be added soon.

Utility Analysis

You can run experiments using the run_experiments.py script. For example, the following command will compute utilities over outcomes for GPT-4o:

python run_experiments.py --experiments compute_utilities --models gpt-4o

For more details about the experiments and available options, please see the Utility Analysis README.
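If you want to launch the command above from Python rather than from a shell, a small wrapper can help. The following is only a minimal sketch: `build_command` and `run_experiment` are hypothetical helper names that are not part of this repository, and the only flags used are the two shown in the example command (`--experiments` and `--models`). It defaults to a dry run that prints the command, so nothing is executed unless you opt in.

```python
import subprocess
import sys
from typing import List


def build_command(experiment: str, model: str,
                  script: str = "run_experiments.py") -> List[str]:
    """Build the argv list for a single run_experiments.py invocation.

    Hypothetical helper: it only mirrors the flags shown in the README
    example (--experiments and --models) and assumes nothing else.
    """
    return [sys.executable, script,
            "--experiments", experiment,
            "--models", model]


def run_experiment(experiment: str, model: str,
                   dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it, then return the argv list."""
    cmd = build_command(experiment, model)
    if dry_run:
        # Show what would be run without calling any model API.
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Same experiment and model as the README example.
    run_experiment("compute_utilities", "gpt-4o")
```

Pass `dry_run=False` from the repository directory that contains `run_experiments.py` to execute the run. Other experiment names and model identifiers are listed in the Utility Analysis README and are deliberately not guessed here.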

Citation

If you find this useful in your research, please consider citing:

@article{mazeika2025utility,
  title={Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs},
  author={Mazeika, Mantas and Yin, Xuwang and Tamirisa, Rishub and Lim, Jaehyuk and Lee, Bruce W and Ren, Richard and Phan, Long and Mu, Norman and Khoja, Adam and Zhang, Oliver and others},
  journal={arXiv preprint arXiv:2502.08640},
  year={2025}
}
