Lectures
Date | Topic | Discussions (do readings before class) |
01/11/24 | Intro (slides) | |
01/16/24 | Hacking | Deploying local LLMs/Play prompt injection challenges |
01/18/24 | Intro to Research Methods (reading,writing) |
Papers
Jailbreaks/Alignment/Prompt Injections
- Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
- Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- Jatmo: Prompt Injection Defense by Task-Specific Finetuning
-
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Additional papers:
- Many-shot Jailbreaking
- Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
- Jailbroken: How Does LLM Safety Training Fail?
- Prompt Injection attack against LLM-integrated Applications
- Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Code Generation Security
- Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models
-
Stealthy Backdoor Attack for Code
Additional papers:
- VFCFinder: Seamlessly Pairing Security Advisories and Patches
- TrojanPuzzle: Covertly Poisoning Code-Suggestion Models
- Large Language Models for Code: Security Hardening and Adversarial Testing CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot
HackPack CTF challenges
- Murder Mystery
- ThenanoGPT
- Super Spy
- OPERATION CODESANITIZE
- One is the new two: LLiaM the code reviewer
- Tome of Babel
- nl2sh
- One NLPiece