When
Noon – 1 p.m., April 13, 2026
Where
Abstract: One way AI developers address misalignment is by adding filters that prevent dangerous or forbidden prompts from ever reaching the model itself. However, this approach has a vulnerability: the computational asymmetry between the filter and the LLM. This asymmetry can be exploited using "time-lock puzzles," a construction from cryptography. We will formalize the concept and examine how time-lock puzzles enable bypassing these filters.
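As background for the discussion, here is a minimal sketch of the classic Rivest–Shamir–Wagner time-lock puzzle, which illustrates the computational asymmetry the abstract mentions: the puzzle's creator can encrypt cheaply using a trapdoor, while anyone without it must perform a long chain of sequential squarings. The parameters below are toy values chosen for illustration only; a real puzzle would use large random primes and a much larger t.

```python
# Toy Rivest-Shamir-Wagner time-lock puzzle (illustrative parameters only).
p, q = 1000003, 1000033          # small primes for demonstration
n = p * q
phi = (p - 1) * (q - 1)          # trapdoor: known only to the creator
t = 10_000                       # number of sequential squarings required

message = 42

# Creator's shortcut: reduce the exponent 2^t modulo phi(n) first,
# so encryption costs only two modular exponentiations.
e = pow(2, t, phi)
key = pow(2, e, n)               # equals 2^(2^t) mod n, by Euler's theorem
ciphertext = (message + key) % n

# Solver without phi(n): must do all t squarings one after another;
# the work is inherently sequential and cannot be parallelized.
x = 2
for _ in range(t):
    x = (x * x) % n
recovered = (ciphertext - x) % n
```

The asymmetry is the point: the creator's cost is constant, while the solver's cost grows linearly in t, which is what lets a cheap filter be outpaced by the model behind it.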
This seminar is inspired by the article "Bypassing prompt guards in production with controlled-release prompting" and will include a journal club-style discussion.
Zoom link: https://www.math.arizona.edu/~klin/rtg-zoom