Prepare related work section #25

sternakt · 2024-11-26T06:56:57Z

Description

Prepare the initial draft of the "Related Work" section for a technical report on prompt leakage probing using the agentic framework AutoGen. This includes gathering and analyzing relevant literature on attacks on large language models (LLMs), with a specific focus on prompt leakage. The section should be structured to include an overview of attack types, notable studies, and identify gaps related to the use of agentic frameworks for probing.

Checklist

Gather Initial Literature
- Find a comprehensive review article on LLM attacks (e.g., prompt leakage, prompt injection, jailbreak attacks).
- Include key characteristics of each attack type with precise descriptions:
  - Prompt Leakage: Methods and motivations, e.g., accessing hidden or restricted prompt content.
  - Prompt Injection: Manipulating prompts to alter model behavior.
  - Jailbreak Attacks: Techniques for bypassing restrictions or ethical safeguards.
- Identify and cite foundational or highly-cited papers for each attack type.
Analyze Literature
- Summarize key findings from selected works.
- Highlight gaps in existing research relevant to our focus on AutoGen and prompt leakage.
Include Varied Methods of Prompt Leakage
- Search for papers detailing diverse approaches to prompt leakage (e.g., encoding strategies like Base64).
LLM-Generated Attacks on LLMs
- Collect studies where LLMs have been used to design or execute attacks against other LLMs.
Explore Use of Agentic Frameworks
- Investigate whether any prior works have used agentic frameworks, like AutoGen, for probing LLM attacks.
- Note if such work has not been done before (expected outcome).
Draft Related Work Section
- Write the initial draft, integrating findings and citations.
Add missing quotes to defence mechanisms

Notes

Focus on Prompt Leakage: Centralize on studies that explore prompt leakage in diverse contexts, including unconventional methods like Base64 encoding.
Agentic Frameworks Gap: This work aims to highlight the novelty of using AutoGen for probing attacks. Confirm the absence of similar prior work.
Quality Sources: Prioritize high-citation reviews and foundational papers for credibility and impact.

sternakt self-assigned this Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepare related work section #25

Prepare related work section #25

sternakt commented Nov 26, 2024 •

edited

Loading

Prepare related work section #25

Prepare related work section #25

Comments

sternakt commented Nov 26, 2024 • edited Loading

Description

Checklist

Notes

sternakt commented Nov 26, 2024 •

edited

Loading