Large Language Models Often Know When They Are Being Evaluated

June 16, 2025

If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations, leading to less reliable benchmarks for deployment and governanc…
