Safety · 2025 · 11 min read · Open for response

AI Security, Resilience, and Misuse Prevention

AI-specific security controls across the lifecycle: threat assessments, red-team testing for high-risk systems, model-integrity and access controls, abuse detection and rate limits, supplier security review, fallback and shutdown procedures, and severity-classified incident reporting to the certification body.

By Aiden Muscovitch · Ethicality Policy Institute

Proposal

This policy proposal establishes minimum security, resilience, and misuse-prevention requirements for AI systems certified under AIMSS. AI systems introduce security risks that differ from traditional software risks, including prompt injection, model extraction, data leakage, adversarial manipulation, unsafe autonomous behaviour, synthetic impersonation, and misuse at scale. Organisations must therefore adopt AI-specific security controls that operate across design, development, deployment, monitoring, and decommissioning.

The organisation shall maintain an AI Security and Misuse Risk Management Policy covering all AI systems within certification scope. The policy shall define security ownership, threat assessment methods, minimum testing requirements, incident classification, access controls, monitoring obligations, and response procedures. Each AI system shall be assigned a security owner responsible for ensuring that risks are assessed before deployment and monitored after release.

Before deployment, every material AI system shall undergo an AI threat assessment. This assessment shall identify potential misuse scenarios, likely attackers, vulnerable interfaces, sensitive data exposure risks, dependency risks, and possible harms to users or the public. For high-risk systems, the organisation shall conduct red-team testing or adversarial evaluation before release. Testing shall include attempts to bypass safeguards, extract confidential data, manipulate outputs, generate harmful content, impersonate individuals, or misuse the system for fraud, disinformation, harassment, or automated harm.
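One way to make the pre-deployment requirement checkable is a simple release gate. The sketch below is illustrative only: the `ThreatAssessment` record, the section names in `REQUIRED_SECTIONS`, and the `release_blockers` function are hypothetical names chosen to mirror the items listed above, not a format defined by AIMSS.

```python
from dataclasses import dataclass, field

# Hypothetical section names, mirroring the assessment items in the proposal.
REQUIRED_SECTIONS = (
    "misuse_scenarios",
    "likely_attackers",
    "vulnerable_interfaces",
    "data_exposure_risks",
    "dependency_risks",
    "possible_harms",
)


@dataclass
class ThreatAssessment:
    """Illustrative record of one system's pre-deployment threat assessment."""
    system_name: str
    high_risk: bool
    sections: dict[str, list[str]] = field(default_factory=dict)
    red_team_completed: bool = False


def release_blockers(assessment: ThreatAssessment) -> list[str]:
    """Return the reasons this system may not yet be deployed (empty if none)."""
    # A section counts as missing if it is absent or has no entries.
    blockers = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if not assessment.sections.get(name)
    ]
    # High-risk systems additionally need red-team or adversarial testing.
    if assessment.high_risk and not assessment.red_team_completed:
        blockers.append("red-team testing or adversarial evaluation not completed")
    return blockers
```

A deployment pipeline could refuse to release any system for which `release_blockers` returns a non-empty list, and the returned reasons could be kept as certification evidence.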

The organisation shall implement controls to protect model integrity, system availability, and data confidentiality. These controls shall include secure development practices, access management, logging, encryption where appropriate, change control, vendor security review, and monitoring of abnormal system behaviour. For externally accessible AI systems, the organisation shall maintain rate limits, abuse detection, and response mechanisms to prevent large-scale misuse. Where systems rely on third-party models, APIs, cloud infrastructure, or datasets, the organisation shall assess supplier security risks and document dependency controls.
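The proposal does not prescribe how rate limits and abuse detection should be built. As a minimal sketch, a per-client token bucket combined with a denial counter is one common approach. The class names, the `"allow"`/`"deny"`/`"flag"` outcomes, and all thresholds below are assumptions made for illustration.

```python
class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refilled over time."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class AbuseGate:
    """Rate-limits each client and flags clients that are denied repeatedly."""

    def __init__(self, capacity: float, refill_per_second: float, flag_after_denials: int):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.flag_after_denials = flag_after_denials
        self.buckets: dict[str, TokenBucket] = {}
        self.denials: dict[str, int] = {}

    def check(self, client_id: str, now: float) -> str:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.capacity, self.refill_per_second, now)
        if self.buckets[client_id].allow(now):
            return "allow"
        # Count denials; a persistent offender is escalated for review.
        self.denials[client_id] = self.denials.get(client_id, 0) + 1
        if self.denials[client_id] >= self.flag_after_denials:
            return "flag"
        return "deny"
```

In this sketch the caller supplies `now` (for example from `time.monotonic()`), which keeps the logic deterministic and easy to test. A `"flag"` result would feed the organisation's abuse log and response mechanism rather than simply rejecting the request.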

Resilience shall be treated as a certification requirement. AI systems used in important business or public-facing processes shall have fallback procedures. Where an AI system becomes unavailable, produces unsafe outputs, or shows signs of drift or compromise, the organisation shall be able to suspend it, roll it back, or transfer operations to human review. High-risk AI systems shall not be deployed without defined shutdown criteria and emergency escalation procedures.
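Defined shutdown criteria are easier to audit when they are written as explicit thresholds. The following sketch assumes hypothetical health signals (`unsafe_output_rate`, `drift_score`) and one possible mapping from each condition to an action. The proposal names the available actions but does not say which condition triggers which, so the mapping here is purely an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    HUMAN_REVIEW = "transfer_to_human_review"
    ROLL_BACK = "roll_back"
    SUSPEND = "suspend"


@dataclass(frozen=True)
class HealthSignals:
    available: bool
    unsafe_output_rate: float   # fraction of sampled outputs flagged unsafe
    drift_score: float          # hypothetical drift metric, higher is worse
    compromise_suspected: bool


@dataclass(frozen=True)
class ShutdownCriteria:
    max_unsafe_output_rate: float
    max_drift_score: float


def decide(signals: HealthSignals, criteria: ShutdownCriteria) -> Action:
    """Pick a fallback action; the most serious conditions are checked first."""
    if signals.compromise_suspected:
        return Action.SUSPEND
    if not signals.available:
        return Action.HUMAN_REVIEW
    if signals.unsafe_output_rate > criteria.max_unsafe_output_rate:
        return Action.SUSPEND
    if signals.drift_score > criteria.max_drift_score:
        return Action.ROLL_BACK
    return Action.CONTINUE
```

Keeping the thresholds in a `ShutdownCriteria` record means they can be documented and reviewed before a high-risk system is deployed, and any action other than `CONTINUE` can trigger the emergency escalation procedure.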

The policy shall also require misuse monitoring after deployment. Organisations shall maintain logs of abuse attempts, security incidents, harmful outputs, user reports, and system anomalies. Material incidents shall be classified by severity and subject to root-cause analysis. Severe incidents, including significant data leakage, safety failure, unlawful harm, or large-scale misuse, shall be reported to the certification body within a defined timeframe.
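The classification and reporting rule can be sketched as follows. The category strings mirror the severe incident types named above. The three-level `Severity` scale and the function names are assumptions, and the reporting window is deliberately left as a parameter because the proposal does not fix a timeframe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Severity(Enum):
    MINOR = 1
    MATERIAL = 2
    SEVERE = 3


# Incident types the proposal names as severe (identifiers are illustrative).
SEVERE_CATEGORIES = frozenset({
    "significant_data_leakage",
    "safety_failure",
    "unlawful_harm",
    "large_scale_misuse",
})


@dataclass(frozen=True)
class Incident:
    category: str
    material: bool          # judged material by the security owner
    detected_at: datetime


def classify(incident: Incident) -> Severity:
    if incident.category in SEVERE_CATEGORIES:
        return Severity.SEVERE
    return Severity.MATERIAL if incident.material else Severity.MINOR


def requires_root_cause_analysis(incident: Incident) -> bool:
    # Material incidents, including severe ones, get root-cause analysis.
    return classify(incident) in (Severity.MATERIAL, Severity.SEVERE)


def report_deadline(incident: Incident, window: timedelta) -> Optional[datetime]:
    """Deadline for reporting to the certification body, or None if not required."""
    if classify(incident) is Severity.SEVERE:
        return incident.detected_at + window
    return None
```

For example, `report_deadline(incident, timedelta(hours=72))` would apply a 72-hour window, but that figure is only a placeholder. The actual timeframe would be whatever the certification scheme defines.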

Certification evidence shall include threat assessments, red-team reports, security test results, access control records, incident logs, monitoring dashboards, supplier assessments, vulnerability management records, and corrective action plans. Failure to perform AI-specific security testing should be treated as a major nonconformity. Concealment of serious incidents, uncontrolled high-risk deployment, or repeated failure to prevent known misuse should be treated as a critical nonconformity.