AI & Machine Learning
Microsoft scanner detects LLM backdoors via three behavioral signals, with low false positives
Microsoft's AI red team released a lightweight scanner that detects sleeper-agent backdoors in open-weight LLMs without access to training data. The tool flags three telltale patterns: unusual attention on trigger phrases, semantic drift in outputs, and abnormal memory extraction. Because such backdoors can survive safety training and fine-tuning, scanning models before deployment is critical for enterprises adopting open-weight models.