Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
Motivated the study of personalized safety in LLMs. Argued that alignment should not be purely global but tailored to a user's background, since risk profiles vary across users. Showed, via a benchmark, where current models fall short, and proposed an inference-time mitigation using LLM-guided Monte Carlo Tree Search to plan safer responses.
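The paper's planning loop can be sketched as generic LLM-guided MCTS over candidate response continuations. This is a minimal illustrative sketch, not the paper's implementation: the `safety_score` judge, the `profile` format, and the candidate `actions` are all hypothetical stand-ins for the LLM-based reward and generation components.

```python
import math
import random

def safety_score(draft, profile):
    # Hypothetical stand-in for an LLM judge: the response counts as "safe"
    # only if it opens with the caution required by this user's profile.
    return 1.0 if draft.startswith(profile["caution_tag"]) else 0.0

class Node:
    def __init__(self, draft, parent=None):
        self.draft = draft      # partial response text
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # accumulated reward

    def uct(self, c=1.4):
        # Upper-confidence bound for trees: exploit mean reward, explore rarely-visited nodes.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def expand(node, actions):
    for a in actions:
        node.children.append(Node(node.draft + a, parent=node))

def mcts(root_draft, actions, profile, iters=200, seed=0):
    random.seed(seed)
    root = Node(root_draft)
    for _ in range(iters):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: add one child per candidate continuation.
        if node.visits > 0 or node is root:
            expand(node, actions)
            node = random.choice(node.children)
        # 3. Simulation: score the draft with the (stand-in) safety judge.
        reward = safety_score(node.draft, profile)
        # 4. Backpropagation: push the reward up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Commit to the most-visited first-level continuation.
    best = max(root.children, key=lambda n: n.visits)
    return best.draft
```

Usage under these toy assumptions: with a profile requiring a nut-allergy caution and two candidate openings, the search concentrates visits on the branch whose drafts the judge scores as safe, so the returned draft begins with the caution.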