    Repositories list

    • The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability"
      Python
      0200 Updated Feb 17, 2025
    • CELLO

      Public
      Python
      Apache License 2.0
      0000 Updated Feb 13, 2025
    • MORE

      Public
      JavaScript
      Apache License 2.0
      0000 Updated Feb 13, 2025
    • Python
      Apache License 2.0
      0000 Updated Feb 13, 2025
    • CaLM

      Public
      Python
      Apache License 2.0
      0000 Updated Feb 13, 2025
    • ADCE

      Public
      The official code for paper: Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability.
      Python
      1000 Updated Feb 13, 2025
    • SEER

      Public
      Self-Explainability Enhancement of LLMs’ Representations
      Python
      0500 Updated Feb 11, 2025
    • Python
      37400 Updated Feb 3, 2025
    • ReflectionBench
      Python
      2810 Updated Jan 20, 2025
    • VLSBench

      Public
      Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety
      Python
      13010 Updated Jan 17, 2025
    • Python
      0000 Updated Jan 17, 2025
    • ESC-Eval

      Public
      Python
      0100 Updated Jan 17, 2025
    • Python
      0000 Updated Jan 17, 2025
    • Python
      0200 Updated Jan 17, 2025
    • modpo

      Public
      Python
      0000 Updated Jan 17, 2025
    • .github

      Public
      0000 Updated Jan 16, 2025
    • 【ACL 2024】 SALAD benchmark & MD-Judge
      Python
      Apache License 2.0
      0100 Updated Jan 16, 2025
    • REEF

      Public
      The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models", which aims to protect the IP of open-source LLMs.
      Python
      34000 Updated Jan 16, 2025
    • CredID

      Public
      Python
      1000 Updated Dec 26, 2024
    • Makefile
      The Unlicense
      0005 Updated Dec 23, 2024
    • T2ISafety

      Public
      0100 Updated Nov 25, 2024
    • DEAN

      Public
      Python
      Apache License 2.0
      11000 Updated Oct 25, 2024
    • [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
      Python
      MIT License
      33611 Updated Oct 25, 2024
    • MLLMGuard

      Public
      Python
      22120 Updated Oct 22, 2024
    • ivg

      Public
      Official repository of the paper "Inference-Time Language Model Alignment via Integrated Value Guidance"
      Python
      0000 Updated Sep 27, 2024
    • Persafety

      Public
      The repository of paper "The Better Angels of Machine Personality: How Personality Relates to LLM Safety".
      0300 Updated Jul 2, 2024
    • Python
      Apache License 2.0
      4000 Updated May 22, 2024
    • Flames

      Public
      Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.
      Apache License 2.0
      04110 Updated May 21, 2024
    • Python
      Apache License 2.0
      0500 Updated Mar 22, 2024