When
Noon – 1 p.m., April 13, 2026
Where
Abstract: One way AI developers address misalignment is by adding filters that prevent dangerous or forbidden prompts from ever reaching the model itself. However, this approach has a vulnerability: the computational asymmetry between the filter and the LLM. This asymmetry can be exploited using "time-lock puzzles," a construction from cryptography. We will formalize the concept and examine how time-lock puzzles enable bypassing these filters.
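As background for the discussion, here is a minimal sketch of the classic Rivest–Shamir–Wagner time-lock puzzle, which illustrates the computational asymmetry the abstract mentions: the puzzle's creator can encrypt cheaply using a trapdoor, while anyone without it must perform a long chain of sequential squarings. The parameters below are toy values chosen for illustration only; a real puzzle would use large random primes and a much larger t.

```python
# Toy Rivest-Shamir-Wagner time-lock puzzle (illustrative parameters only).
p, q = 1000003, 1000033          # small primes for demonstration
n = p * q
phi = (p - 1) * (q - 1)          # trapdoor: known only to the creator
t = 10_000                       # number of sequential squarings required

message = 42

# Creator's shortcut: reduce the exponent 2^t modulo phi(n) first,
# so encryption costs only two modular exponentiations.
e = pow(2, t, phi)
key = pow(2, e, n)               # equals 2^(2^t) mod n, by Euler's theorem
ciphertext = (message + key) % n

# Solver without phi(n): must do all t squarings one after another;
# the work is inherently sequential and cannot be parallelized.
x = 2
for _ in range(t):
    x = (x * x) % n
recovered = (ciphertext - x) % n
```

The asymmetry is the point: the creator's cost is constant, while the solver's cost grows linearly in t, which is what lets a cheap filter be outpaced by the model behind it.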
This seminar is inspired by the article "Bypassing prompt guards in production with controlled-release prompting" and will include a journal club-style discussion.
Zoom link: https://www.math.arizona.edu/~klin/rtg-zoom