Understanding Where AI/LLM Vulnerabilities Originate — and How to Fix Them?

Most discussions around AI/LLM security focus on what the vulnerabilities are i.e. prompt injection, data leakage, bias, abuse, or excessive agency including our past blogs. However, during real world engagements, the most important question which needs to be addressed is 

Which part of the AI system is actually introducing these vulnerabilities, and where should those vulnerabilities needs to be fixed?

In non-agentic implementations, the most common vulnerability is prompt injection which results in either unintended data access or data exfiltration or bias, abuse, or content manipulation.  

In agentic implementations, the most common vulnerability is LLM excessive agency, where the system performs actions beyond its intended scope.

These vulnerabilities are rarely caused by the LLM alone. They originate from how the overall system is architected and integrated. In practice, AI systems are not a single component. They are composed of multiple layers, each with a specific responsibility. Vulnerabilities appear when responsibilities of each of these components are blurred or controls are missing. It is important to have defence in depth approach and implement security at all components.

Let’s take an example of a typical AI/LLM architecture and discuss the layers and protections that needs to be implemented at each component: -



The sequence of layers introduces the most impactful vulnerabilities and thus these underlying vulnerabilities at the core should be fixed on top priority. Below are the nature of vulnerabilities arising due to weaknesses at various layers and the type of fixes that can be applied: -

API Layer – Authorization and Access Control
A frequent root cause of unintended data access or excessive agency is due to LLM responses triggering APIs without validating user permissions and/or over-privileged service accounts used by AI workflows.
Fix:

  • Enforce strict authorization checks on every API call
  • Validate user context before returning data
  • Do not rely on the LLM to decide access control
  • Restrict tool access based on task and role
  • Keep an approval-workflow for high-impact tools

The API layer will always be the final gatekeeper and thus this protection helps built secure AI systems. 

Code Layer – Prompt Encoding and Sanitization
Data exfiltration via prompts often succeeds because user inputs or model outputs are passed directly into HTML pages, logs, downstream systems or follow-up prompts as raw data without any encoding. 
Fix:

  • Encode and sanitize all user inputs
  • Encode and sanitize all LLM outputs
  • Treat LLM output as untrusted data, similar to user input

This is a traditional application security control that still applies in all AI systems that are returning user injected data as output to the users.

LLM / Reasoning Layer – Using Provider Guardrails
Most cloud AI services provide security features (like Azure AI Content Safety, Checks Guardrails APIs (Google)), but they are often disabled or misconfigured.
Fix:

  • Enable content filtering and safety controls
  • Tune policies based on use case
  • Do not treat default settings as sufficient

Provider guardrails are a baseline, not a complete solution. We will write a separate detailed blog entry in coming days on various configuration available for some of the LLM providers.

Application Layer – Custom Input and Output Controls
Relying only on LLM provider guardrails is insufficient. These guardrails can be bypassed by various prompt injection techniques and thus additional input/output validations are required to decrease the impact of prompt injection findings.
Introduce application-level validations such as: -

  • Block special characters
  • Input length restrictions
  • Allowed language checks
  • Encoding and decoding validation
  • Custom blacklists or allow lists for specific keywords or characters
  • Output validation before execution or display 

Prompt & LLM Integration Layer – Clear Instructions and Boundaries
Weak or ambiguous system prompts increase the likelihood of prompt injection and excessive agency. System prompts/instructions can be enhanced to clearly define what the model can do/not do, restrict response formats where possible, reinforce boundaries consistently across prompts. System prompts act as policy documents for the model and need continuous enhancements as and when bypasses are discovered – though they are not enough to block any attacks on AI systems. 

Conclusion
Just as with traditional applications, building secure AI systems is not really an afterthought, security has to be designed into the architecture from day one. As described above, AI security issues rarely originate from the LLM alone and are usually the result of missing controls across multiple layers. Thus, effective AI security requires: -

  • Understanding the architecture
  • Mapping risks to the correct layer
  • Applying traditional security principles alongside AI-specific controls

The above details might help organizations move beyond identifying vulnerabilities and build and architect secure AI systems in a structured and sustainable way. In the coming posts, we will share practical configurations and patterns that can be applied across these layers to help teams design and deploy AI implementations with security in mind.

Article by Hemil Shah and Rishita Sarabhai