Blueinfy's blog: Defending AI applications

AI Security Controls
When implementing AI systems, covering both non-agentic and agentic use cases, there are certain security controls that should be considered for building secure systems. There are grouped into foundational areas such as data handling, model integrity, access control, context safety, action governance, and operational monitoring. These controls are drawn from established industry frameworks, including NIST AI RMF, OWASP LLM Top 10, SANS AI Security Guidelines, and emerging agentic protocols. They are intended to help development and platform teams integrate security into the design, build, and deployment of AI applications. These controls can be provided as a practical checklist with implementation guidance so they can be adopted consistently during development, validated during testing, and formally handed over to operations for ongoing governance.

Deployment Strategies
Model and Dependency Security Scanning

Scan all models before deployment.
Ensure no malicious code, backdoors, supply-chain implants, biased data, PII or unsafe dependencies exist in the model artifacts.
Regularly perform open-source model license compliance checks.
Continuous model scanning after any retraining, fine-tuning, or indexing.

RAG, Embedding, and Vector Database Controls

Validate and sanitize all documents before adding them for indexing.
Prevent indexing of sensitive or classified documents (which can lead to breach of contracts, copyright issues or anything of that nature).
Use embedding models from trusted, verified sources only.
Enforce metadata-level access control to ensure retrieval respects user permissions.
Periodically purge documents from the vector DB.

Data Protection & Inference Security
Model Input Validation and Prompt Handling

Validate all user inputs/prompts before passing them to the model. Reject or sanitize unexpected formats, extremely long inputs, and potentially harmful content.
Prevent resource exhaustion be rate-limiting implementation.
Do not allow raw user input/prompts to directly control system or developer prompts. Use controlled templates and parameterized instructions.
Remove or neutralize prefixes, control tokens, or patterns that could trigger jailbreaks or prompt injection.
Restrict model-access to sensitive business operations; ensure the model cannot perform unintended administrative actions via crafted prompts.

Model Output Controls and Response Filtering

All model outputs should be validated and processed before rendering them to end users or any other downstream systems.
Content filtering should be implemented for harmful content categories (e.g., violence, hate, self-harm, disallowed medical/legal advice, malware, phishing etc.) in the output.
Ensure the system never executes model outputs as code, commands, or queries.
Strict type-validation should be enforced on model-generated structured data such as JSON, SQL, or code.
Variety of guardrails should be implemented to prevent hallucinated URLs, contacts, API calls, or misrepresentations.

Sensitive Data Controls

Do not expose sensitive, proprietary, copyright or personal data to external users or third-party model APIs.
Apply masking, tokenization, or pseudonymization before sending data to the model.
Restrict the model from storing or self-learning based on user-provided sensitive data.
Inspect prompts and output logs to ensure no leakage of secrets, credentials, API keys, or internal endpoints.
Disable ingestion, training or fine-tuning capabilities without a pre-approved process flow.
Prompt Injection, Jailbreak, and Abuse Prevention
Implementation of layered guardrails - system prompts, content filters, and structured templates.
Strict use of isolation prompts to prevent the model from modifying or revealing its system instructions.
Prevent the model from operating on untrusted references such as user-uploaded documents without sanitization.
Ensure the application ignores user attempts to override model identity, policies, or instructions.
Continuous evaluation of jailbreaks using adversarial test suites and automated red-teaming pipelines.

Bias, Toxicity, and Hallucination Mitigation

Evaluate model outputs for bias across protected categories; log and mitigate recurring patterns.
Policy-based constraints that block discriminatory, abusive, or identity-targeted outputs.
More focused data (RAG, retrieval checks) to reduce hallucinations.
Ensure that the system provides citations or source references for high-risk outputs.
Include human-in-the-loop for critical or decision-impacting outputs.

Model Behaviour Integrity and Drift Monitoring

Track versioning for prompts, model weights, system templates, and embedding indices.
Monitor outputs for accuracy degradation, bias drift, or changes in harmful content Behaviour.
Ensure rollback mechanisms exist to revert to a previous safe model version.
Implement real-time anomaly detection on output patterns using approved monitoring tools.
Perform periodic re-evaluation of the model’s performance on business-critical tasks.

Access Controls

Access Control and Authentication for AI Endpoints

All model inference endpoints must require authentication; public inference endpoints are prohibited unless approved.
Apply role-based access controls for administrative operations such as fine-tuning, dataset upload, vector store rebuild, or configuration updates.
Apply record-level, data-level access controls for protecting against unintended data access.
Enforce strong API key management standards; keys must not appear in client-side code.
Restrict high-cost or high-impact model operations such as batch inference to privileged roles.
Log access patterns, anomalies, and repeated misuse attempts.

Environment, Infrastructure, and API Security

Isolate model workloads in dedicated inference environments; prevent lateral movement from model containers.
Disable shell access and system command execution from within model pipelines.
Encrypt all communications: model API calls, embedding store interactions, and dataset transfers.
Apply strict resource quotas to prevent model abuse, cost spikes, or denial-of-service scenarios.
Redact logs to avoid storing sensitive prompts and outputs in plaintext.

Monitoring & Governance, Risk, Compliance (GRC)

Model Explainability and Governance

Maintain clear documentation for training datasets, model versions, decision boundaries, and fine-tuning sources.
Provide audit logs for all inference requests/prompts tied to a user identity.
Ensure regulatory alignment with data protection, algorithmic accountability, and sector-specific AI guidelines.
Document known limitations or unsafe failure modes for the model.
Integrate with internal AI governance workflows for approvals, reviews, and continuous compliance.

Human Feedback, Reinforcement, and Fine-tuning Controls

Train or tune models only on authorized, clean, governance-approved datasets.
Prevent user-generated adversarial prompts from polluting RLHF or fine-tuning datasets.
Conduct manual review on all human-labelled datasets used for alignment.
Track dataset lineage, consent, provenance, and ownership.
Evaluate fine-tuned models again for safety, bias, and hallucination risk.

Additional Controls (Focused on Agentic AI) (Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent–User Interaction Protocol (AG-UI))
• Authentication & Identity Assurance

Enforce strong authentication for all protocol endpoints, agents, tools, services, and UI sessions.
Each request or message must be tied to a verifiable identity.
No anonymous or implicitly trusted protocol participants.

• Authorization and Capability Restrictions

Restrict the context provided to the agent/tool to the minimum required as per defined task.
Least privilege principle should be followed with proper permission checks against a strict capability list.
Require explicit policy checks before executing any agent-initiated action.

• Message Integrity & Transport Security

Use TLS for all communication channels and enforce message signing or integrity hashes.
Reject tampered, malformed, or unsigned MCP/ACP/AG-UI messages.
Reject duplicated messages, stale actions, or replayed session traffic.

• Context and State Isolation

Segment context and shared state by user, agent, or task.
Prevent leakage across sessions, tools, or agents. Enforce TTL and expiry on all context and state artifacts.

• Tool and Action Execution Controls

Use a sandbox or isolated environment for all external calls or tool executions.
Prevent model outputs from directly executing commands without validation.
Require human approval for high-risk actions defined by policy.

• Tool Discovery and Registration Security

Restrict which tools can be discovered or registered.
Require authentication and integrity checks for tool manifests.
Prevent unapproved or malicious tool descriptors from being exposed to agents.
Validate that agents cannot enumerate or access tools outside their assigned capability scope.

• Information and Session Leakage Prevention

Ensure error responses do not disclose internal details, system state, or sensitive context.
Use short-lived, scoped tokens for all protocol operations.
Store credentials securely and prevent leakage via logs or metadata.
Rotate tokens and invalidate them on session termination.

• Monitoring and Logging

Log all agent requests, decisions, actions, and results in an immutable audit trail.
Continuously monitor the agent for anomalous behaviour or unsafe action patterns.
Maintain rollback mechanisms for any state-changing actions performed by the agent.
Implement a kill-switch to immediately disable agent actions if unsafe conditions occur.

Article by Hemil Shah and Rishita Sarabhai

Pages

Defending AI applications - Security Controls