Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Prepare related work section #25

Open
6 of 7 tasks
sternakt opened this issue Nov 26, 2024 · 0 comments
Open
6 of 7 tasks

Prepare related work section #25

sternakt opened this issue Nov 26, 2024 · 0 comments
Assignees

Comments

@sternakt
Copy link
Collaborator

sternakt commented Nov 26, 2024

Description

Prepare the initial draft of the "Related Work" section for a technical report on prompt leakage probing using the agentic framework AutoGen. This includes gathering and analyzing relevant literature on attacks on large language models (LLMs), with a specific focus on prompt leakage. The section should be structured to include an overview of attack types, notable studies, and identify gaps related to the use of agentic frameworks for probing.

Checklist

  • Gather Initial Literature
    • Find a comprehensive review article on LLM attacks (e.g., prompt leakage, prompt injection, jailbreak attacks).
    • Include key characteristics of each attack type with precise descriptions:
      • Prompt Leakage: Methods and motivations, e.g., accessing hidden or restricted prompt content.
      • Prompt Injection: Manipulating prompts to alter model behavior.
      • Jailbreak Attacks: Techniques for bypassing restrictions or ethical safeguards.
    • Identify and cite foundational or highly-cited papers for each attack type.
  • Analyze Literature
    • Summarize key findings from selected works.
    • Highlight gaps in existing research relevant to our focus on AutoGen and prompt leakage.
  • Include Varied Methods of Prompt Leakage
    • Search for papers detailing diverse approaches to prompt leakage (e.g., encoding strategies like Base64).
  • LLM-Generated Attacks on LLMs
    • Collect studies where LLMs have been used to design or execute attacks against other LLMs.
  • Explore Use of Agentic Frameworks
    • Investigate whether any prior works have used agentic frameworks, like AutoGen, for probing LLM attacks.
    • Note if such work has not been done before (expected outcome).
  • Draft Related Work Section
    • Write the initial draft, integrating findings and citations.
  • Add missing quotes to defence mechanisms

Notes

  • Focus on Prompt Leakage: Centralize on studies that explore prompt leakage in diverse contexts, including unconventional methods like Base64 encoding.
  • Agentic Frameworks Gap: This work aims to highlight the novelty of using AutoGen for probing attacks. Confirm the absence of similar prior work.
  • Quality Sources: Prioritize high-citation reviews and foundational papers for credibility and impact.
@sternakt sternakt self-assigned this Nov 26, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant