Strategizing Penetration Testing of AI Implementations!

The global AI market is valued at nearly $500 billion, as of this year, and is projected to grow rapidly. Given the rapid growth, adoption of AI in business tasks and the nature of uncovered vulnerabilities, rigorous testing is a necessity for reaping the benefits of AI without adding any risks. In these contextual implementations, the architecture typically involves a front-end layer with a back-end API connecting to the LLMs - thus the traditional application penetration testing needs to be enhanced to include LLM based test cases. This also includes cases where traditional attacks like RCE and SQLi are identified via LLM prompts as demonstrated in previous blogs. The design and behavior of LLMs makes it less predictable and more resource-intensive than conventional application penetration testing. Here we are listing down some of the generic challenges in designing an ideal approach for testing AI implementations and a strategy/solution to enhance security of AI based applications.


Dynamic Nature & Complexity:
AI/ML implementations are built on distinct algorithms and models with unique features, distinguishing them from traditional applications. These systems are inherently non-deterministic, meaning they can generate different outputs for the same input based on varying conditions. Additionally, AI/ML components often need to integrate with other parts of an application, leading to diverse and intricate architectures for each implementation. In contrast to traditional applications that usually undergo testing and validation only when modified (code change), AI/ML systems continuously learn, train, and adapt to new data and inputs. This continuous learning process adds complexity to traditional assessment approaches and generic risk assessment strategies. The risk of the identified LLM vulnerabilities might reduce with time based on the exploit scenarios but would also increase significantly with new exploit techniques being identified like zero-day vulnerabilities. 

Tools & Techniques:
As there is less industry standardization and most attacks are scenario/implementation driven, this type of testing requires a blend of cybersecurity expertise and advanced knowledge in AI/ML, making it highly interdisciplinary. It is imperative for testers to understand how to manipulate input data and model behaviours to deceive AI systems and perform context-driven testing. It requires a critical thinking ability which typically automated tools do not have.

Adequate and Accurate Training Data:
Adequate and accurate training data is crucial for the successful implementation of AI systems, as the quality of data directly influences the model's performance and reliability. However, obtaining such data is often industry and context-dependent and comprehensive datasets are typically not available at the outset of an AI implementation. Instead, these datasets evolve over time through self-learning and continuous feedback as the application is used. This iterative process allows the AI system to refine its models and improve accuracy, but also introduces challenges in ensuring data quality, relevance, and security of the system.

Risk Assessment:
The risk and severity of vulnerabilities in AI/ML implementations, such as data breaches or model biases, vary significantly depending on the context. Factors like the sensitivity and classification of data (e.g., personal, financial, healthcare), as well as the potential business impact of these vulnerabilities, are crucial considerations. Key influencers in risk assessment include regulatory requirements, ethical implications, the specific characteristics of AI algorithms and their applications, and potential societal impacts. These variables underscore the importance of tailored risk assessments and mitigation strategies that address the unique complexities and potential repercussions of AI/ML vulnerabilities across different scenarios.

Human-driven testing and ongoing evaluation are indispensable for ensuring the reliability, security, and ethical operation of AI-driven applications. 

1. Human-driven testing involves experts manually assessing the AI system's performance, identifying potential biases, vulnerabilities, and unintended behaviours that automated testing might miss. Moreover, scenario/context based implementation bypasses, which lead to vulnerabilities like SQLi, RCE etc., through prompts are uncovered only by critical thinking.
2. Ongoing testing is crucial because AI models evolve continuously, necessitating periodic assessments to detect changes in performance, accuracy, or ethical implications based on self-learning and fine-tuning of LLMs. This iterative testing process is essential for mitigating risks and ensuring that AI-driven implementations consistently meet the business requirements and expectations of users and stakeholders.

It is advisable to combine the above and build a unique testing approach to test AI applications that would include the below coverage: -

AI Model Interpretation - Effective testing begins with a thorough grasp of the underlying AI model driving the application. This involves understanding the core algorithms, input data, and anticipated outcomes. With a detailed understanding of the AI's behaviour, precise test scenarios can be created.

Data Relevance, Biases & LLM Scoring - The AI application should undergo testing with a diverse range of data inputs, including edge cases, to ensure it responds accurately across different scenarios. Furthermore, it's essential to validate the integrity and relevance of the data to avoid biased outcomes. Depending on the context and implementation, specific categories should be defined to analyze outcomes in these particular scenarios. Each implementation should then be scored based on categories like fairness/biases, abuse, ethics, PII (input + output), code, politics, hallucination, out-of-context, sustainability, threat, insult etc.

Scenario Specific Testing - Develop test scenarios that simulate real-world situations. For instance, if the AI application is a Chabot, create scenarios where users ask common questions, unusual queries, or complex inquiries. Evaluate how effectively the AI responds in each scenario. Additionally, consider real-world threats such as phishing attacks, malware distribution, and theft of Personally Identifiable Information (PII), and assess their potential impact on the implementation. Moreover, critical thinking about the scenarios and use cases introduces the potential of uncovering traditional attacks like RCE, SQLi etc. through LLM based vulnerabilities like "Prompt Injection".

Risk/Impact Assessment - An assessment of risk and impact in AI based implementations is the process of evaluating the outcomes of AI-based decisions or actions within real-world contexts. This evaluation includes assessing how AI models influence various aspects like business operations, user trust, regulatory compliances, brand image and societal impacts. The primary step is to comprehensively understand both the intended and unintended behavior of AI based applications. Based on that, organizations can identify potential risks that may arise from the deployment of AI based use cases, analyze the impact on different stakeholders and then take significant measures to mitigate any negative impact.  

An ideal risk assessment approach for reviewing AI-driven applications would be human-in-the-loop (HTIL) continuous penetration testing especially for the LLM based components (after a first full initial review of the implementation) especially due to the below factors: -

1. LLM based vulnerabilities cannot be reproduced directly (based on evidence/steps like in traditional application penetration testing reports) since LLM behavior is driven by the context of that particular chat – the behavior is different even if there is small word change in the chat. In order to fix these vulnerabilities, developers would need real-time validation where someone can test on-the-go for quite some time with the developer’s fine tuning their guardrails to block some exploits/attacks in parallel.

2. The remediation for LLM related vulnerabilities is typically not a code fix (like traditional applications) but introduction of a variety of guardrails/system context/meta prompt. For example, a "Prompt Injection" vulnerability identified in an initial report would be risk rated based on the attack/exploit scenario at that time and various guardrails for input – output content filtering would be introduced to remediate the vulnerability – this calls for a re-assessment of the risk. As the impact of these findings mainly depends on the nature of data and exploitability and LLMs are continuously evolving with new bypass techniques (just like zero-day vulnerabilities) each day – an ongoing assessment (red teaming) of such vulnerabilities should be in place to review the implementations for real-world threats like data leakage, unintended access, brand impact and compliance issues etc. 

3. The automated tools available at this point in time are prompt scanners which run dictionary attacks, like brute-forcing/fuzzing, but lack the logic/context of the implementation. These tools would help in scoring the LLMs on generic categories like ethics, biases, abuse etc. but fail to uncover contextual attacks like inter-tenant data leakage, or retrieving back-end system information etc.   

Article by Rishita Sarabhai & Hemil Shah

Freedom to Customize GPT – Higher Vulnerabilities!


Based on the cost related to developing, training, and deploying large language models from scratch, most organizations prefer to use pre-trained models from established providers like OpenAI, Google, or Microsoft. These models can then be customized by end users to suit their specific needs. This approach allows users to benefit from the advanced capabilities of LLMs without bearing the high costs of building them, enabling tailored implementations and persona's for specific use cases. Instructing large language models (LLMs) often involves three main components: user prompts, assistant responses, and system context. 

Interaction Workflow 

User Prompt: The user provides an input to the LLM, such as a question or task description.
System Context: The system context is either pre-configured or dynamically set, guiding the LLM on how to interpret and respond to the user prompt.
Assistant Response: The LLM processes the user prompt within the constraints and guidelines of the system context and generates an appropriate response.

In this specific implementation, the GPT interface allowed users to customize the GPT by punching in custom instructions and thus utilize the BOT for certain contextual conversations in order to get better output. Moreover, the customized GPT (in form of a persona with custom instructions) could be shared with other users of the application.


An ability to provide custom instructions to the GPT means being able to instruct the GPT in system context. The system context acts as a rulebook for the BOT and thus gives the end users a means to manipulate the behavior of the LLM and share the customized persona with other users of the application. A malicious user can then write instructions that could cause various impacts like along with doing its normal use case of answering contextual questions from the user: -

1.    The BOT trying to steal information (like chat history) by rendering markdown images every time the user asks a question

2.    The BOT trying to poke other users of the BOT to provide their sensitive/PII information

3.    The BOT trying to spread mis-information to the end users

4.    The BOT providing phishing links to the end users in the name of collecting feedback

5.    The BOT using biased, abusive language while providing reverts to end users


The main impact for such kind of LLM attacks is the brand image of the organization. The highest impact would be data exfiltration followed by phishing, data leakage etc. Additionally, an implementation with such behavior would also be a very poorly scored LLM implementation when analyzed based on parameters like fairness/biases, abuse, ethics, PII (input + output), code, politics, hallucination, out-of-context, sustainability, threats and insults etc.

Fixing the Vulnerability?

The first and foremost requirement would be implementing real-time content filtering to detect and block harmful outputs before they reach the user and using moderation tools that flag or block abusive, offensive, and unethical content etc. by scoring/categorization based on various parameters while following the instructions provided to the LLMs. Additionally, any implementation that allows the end users to write instructions to the LLM as a base, requires LLM guardrails at an input level as well such that malicious instructions cannot be fed to the LLM.

Article by Rishita Sarabhai & Hemil Shah