Even if an agent is properly authenticated and authorized, can it still be manipulated into unsafe or policy-violating behavior? 440 executable security tests across 31 modules. MCP + A2A + L402 + ...