This blog series establishes a foundation for securing AI implementations across layers and platforms. It covers AI security controls and approaches for addressing AI vulnerabilities, highlighting remediation options and controls at the LLM through configurable guardrails across multiple layers. Azure Content Safety (previous blog entry) demonstrates how native safeguards can be configured in Azure based AI applications. Building on these concepts, this blog focuses on guardrails that can be leveraged within Google Cloud based AI applications.
Google Cloud provides configurable guardrails and security options that allow teams to protect LLM and AI Agents against risks such as prompt injection, hallucinations, and adversarial inputs. It offers configurable content filters, policy enforcement controls, and monitoring features that work across the AI lifecycle. These services help organizations to apply consistent, privacy aware protections without building custom security mechanisms from scratch, supporting safer and more controlled AI deployments.
Dedicated Safety and Security Services
Model Armor
This security service checks the inputs (prompts) and outputs (responses) of language models to find and reduce risks like harmful content and data exposure before they reach applications. It uses adjustable filters, allowing organizations to customize protections.
The below diagram explains the flow and sanitization applied at various steps: -
Image: https://docs.cloud.google.com/model-armor/overview
Model Armor flags content based on risk levels: High (high likelihood), Medium_and_above (medium or high likelihood), and Low_and_above (any likelihood). It features the responsible AI safety filter, which screens for hate speech, harassment, sexually explicit content, and dangerous material, while also protecting sensitive data and blocking harmful URLs to maintain trust and compliance in AI solutions.
Checks Guardrails API
Checks Guardrails are a runtime safety feature from Google that evaluates both input and output of AI models against predefined safety policies. These policies cover areas like Dangerous Content, Personally Identifiable Information (PII), Harassment, Hate Speech, and more. Each policy returns a score from 0 to 1, indicating the likelihood that the content fits the category, along with a result showing whether it passes or fails based on a set threshold.
These guardrails help ensure ethical and legal standards by identifying inappropriate or harmful content before it reaches users, enabling actions like logging, blocking, or reformulating outputs. The scoring provides insights into safety risks and supports trust and compliance in AI usage. The below image shows the supported policies provided by Check Guardrails API: -
Image: https://developers.google.com/checks/guide/ai-safety/guardrailsVertex AI Safety Filters
Vertex AI Safety Filters are safeguards from Google Cloud's Vertex AI platform that screen prompts and output. They operate independently to evaluate content before reaching an application, reducing the risk of harmful responses. There are two types of safety scores: -
- Based on probability of being unsafe
- Based on severity of harmful content
The probability safety attribute reflects the likelihood that an input or model response is associated with the respective safety attribute. The severity safety attribute reflects the magnitude of how harmful an input or model response might be. When content exceeds safety thresholds, it gets blocked.
ShieldGemma
ShieldGemma is a collection of safety classifier models released by Google, designed to evaluate text and images for compliance with safety policies – it is like using LLMs to analyze LLM inputs and LLM generated outputs against defined policies. Built on the Gemma family, the models come in various sizes (ShieldGemma 1 - 2B, 9B, 27B parameters for text and ShieldGemma 2 – 4B parameters for images) and can be fine-tuned.
These classifiers score content against categories such as sexually explicit material, hate speech, and harassment, providing clear labels on safety compliance. Their open weights allow for flexibility and integration into broader safety systems, and mitigating harmful outputs across different generative AI applications.
Conclusion
Google Cloud's AI safety systems aim to reduce harmful inputs and outputs by filtering content, but a right balance is required in the configurability to ensure that harmless information is not blocked. It meaningfully reduces risk for organizations compared to deploying an unprotected AI applications. Still, no tool on its own can provide complete safety. A well-rounded approach requires combining it with solid application design, clear organizational guidelines, tailored configurations, and continuous oversight.
References
https://docs.cloud.google.com/model-armor/overview
https://developers.google.com/checks/guide/ai-safety/guardrails
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-filters
https://ai.google.dev/responsible/docs/safeguards/shieldgemma
Article by Hemil Shah and Rishita Sarabhai





