@AI45Lab

AI45 Lab

Welcome 👋

to AI45, a safety ecosystem platform developed by Shanghai Artificial Intelligence Laboratory.

Core Philosophy

The platform is guided by the AI-45° Law. From a long-term perspective, AI safety and performance should ideally advance in parallel along a 45° line. Short-term fluctuations are permissible, but in the long run, this balance should neither fall below 45° (as at present) nor exceed it (to avoid constraining development).

Multiple technical pathways may achieve this “AI-45° Law”. We are exploring a causality-centered approach, “the Causal Ladder of Trustworthy AGI”, spanning three progressive layers: Approximate Alignment Layer, Intervenable Layer, and Reflectable Layer.

Core Modules

🔬 Safety Foundation

🛡️ Safety Technology

🏆 Safety Evaluation

🌐 Safety Services

Popular repositories

  1. ActorAttack Public

    Python 74 3

  2. Flames Public

    Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.

    41

  3. REEF Public

    The repository for the paper "REEF: Representation Encoding Fingerprints for Large Language Models", which aims to protect the IP of open-source LLMs.

    Python 40 3

  4. CodeAttack Public

    [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

    Python 36 3

  5. VLSBench Public

    Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety

    Python 30 1

  6. MLLMGuard Public

    Python 21 2

Repositories

Showing 10 of 29 repositories
  • X-Boundary Public

    The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability"

    Python 2 0 0 0 Updated Feb 17, 2025
  • CELLO Public
    Python 0 Apache-2.0 0 0 0 Updated Feb 13, 2025
  • MORE Public
    JavaScript 0 Apache-2.0 0 0 0 Updated Feb 13, 2025
  • SelfConsciousness Public
    Python 0 Apache-2.0 0 0 0 Updated Feb 13, 2025
  • CaLM Public
    Python 0 Apache-2.0 0 0 0 Updated Feb 13, 2025
  • ADCE Public Forked from OpenCausaLab/ADCE

    The official code for paper: Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability.

    Python 0 1 0 0 Updated Feb 13, 2025
  • SEER Public

    Self-Explainability Enhancement of LLMs’ Representations

    Python 5 0 0 0 Updated Feb 11, 2025
  • ActorAttack Public
    Python 74 3 0 0 Updated Feb 3, 2025
  • ReflectionBench Public

    ReflectionBench

    Python 8 2 1 0 Updated Jan 20, 2025
  • VLSBench Public

    Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety

    Python 30 1 1 0 Updated Jan 17, 2025

