Blueinfy's blog: Strategizing Penetration Testing of AI Implementations!

The global AI market is valued at nearly $500 billion, as of this year, and is projected to grow rapidly. Given the rapid growth, adoption of AI in business tasks and the nature of uncovered vulnerabilities, rigorous testing is a necessity for reaping the benefits of AI without adding any risks. In these contextual implementations, the architecture typically involves a front-end layer with a back-end API connecting to the LLMs - thus the traditional application penetration testing needs to be enhanced to include LLM based test cases. This also includes cases where traditional attacks like RCE and SQLi are identified via LLM prompts as demonstrated in previous blogs. The design and behavior of LLMs makes it less predictable and more resource-intensive than conventional application penetration testing. Here we are listing down some of the generic challenges in designing an ideal approach for testing AI implementations and a strategy/solution to enhance security of AI based applications.

Challenges

Dynamic Nature & Complexity:
AI/ML implementations are built on distinct algorithms and models with unique features, distinguishing them from traditional applications. These systems are inherently non-deterministic, meaning they can generate different outputs for the same input based on varying conditions. Additionally, AI/ML components often need to integrate with other parts of an application, leading to diverse and intricate architectures for each implementation. In contrast to traditional applications that usually undergo testing and validation only when modified (code change), AI/ML systems continuously learn, train, and adapt to new data and inputs. This continuous learning process adds complexity to traditional assessment approaches and generic risk assessment strategies. The risk of the identified LLM vulnerabilities might reduce with time based on the exploit scenarios but would also increase significantly with new exploit techniques being identified like zero-day vulnerabilities.

Tools & Techniques:
As there is less industry standardization and most attacks are scenario/implementation driven, this type of testing requires a blend of cybersecurity expertise and advanced knowledge in AI/ML, making it highly interdisciplinary. It is imperative for testers to understand how to manipulate input data and model behaviours to deceive AI systems and perform context-driven testing. It requires a critical thinking ability which typically automated tools do not have.

Adequate and Accurate Training Data:
Adequate and accurate training data is crucial for the successful implementation of AI systems, as the quality of data directly influences the model's performance and reliability. However, obtaining such data is often industry and context-dependent and comprehensive datasets are typically not available at the outset of an AI implementation. Instead, these datasets evolve over time through self-learning and continuous feedback as the application is used. This iterative process allows the AI system to refine its models and improve accuracy, but also introduces challenges in ensuring data quality, relevance, and security of the system.

Risk Assessment:
The risk and severity of vulnerabilities in AI/ML implementations, such as data breaches or model biases, vary significantly depending on the context. Factors like the sensitivity and classification of data (e.g., personal, financial, healthcare), as well as the potential business impact of these vulnerabilities, are crucial considerations. Key influencers in risk assessment include regulatory requirements, ethical implications, the specific characteristics of AI algorithms and their applications, and potential societal impacts. These variables underscore the importance of tailored risk assessments and mitigation strategies that address the unique complexities and potential repercussions of AI/ML vulnerabilities across different scenarios.

Solution/Strategy
Human-driven testing and ongoing evaluation are indispensable for ensuring the reliability, security, and ethical operation of AI-driven applications.

1. Human-driven testing involves experts manually assessing the AI system's performance, identifying potential biases, vulnerabilities, and unintended behaviours that automated testing might miss. Moreover, scenario/context based implementation bypasses, which lead to vulnerabilities like SQLi, RCE etc., through prompts are uncovered only by critical thinking.
2. Ongoing testing is crucial because AI models evolve continuously, necessitating periodic assessments to detect changes in performance, accuracy, or ethical implications based on self-learning and fine-tuning of LLMs. This iterative testing process is essential for mitigating risks and ensuring that AI-driven implementations consistently meet the business requirements and expectations of users and stakeholders.

It is advisable to combine the above and build a unique testing approach to test AI applications that would include the below coverage: -

AI Model Interpretation - Effective testing begins with a thorough grasp of the underlying AI model driving the application. This involves understanding the core algorithms, input data, and anticipated outcomes. With a detailed understanding of the AI's behaviour, precise test scenarios can be created.

Data Relevance, Biases & LLM Scoring - The AI application should undergo testing with a diverse range of data inputs, including edge cases, to ensure it responds accurately across different scenarios. Furthermore, it's essential to validate the integrity and relevance of the data to avoid biased outcomes. Depending on the context and implementation, specific categories should be defined to analyze outcomes in these particular scenarios. Each implementation should then be scored based on categories like fairness/biases, abuse, ethics, PII (input + output), code, politics, hallucination, out-of-context, sustainability, threat, insult etc.

Scenario Specific Testing - Develop test scenarios that simulate real-world situations. For instance, if the AI application is a Chabot, create scenarios where users ask common questions, unusual queries, or complex inquiries. Evaluate how effectively the AI responds in each scenario. Additionally, consider real-world threats such as phishing attacks, malware distribution, and theft of Personally Identifiable Information (PII), and assess their potential impact on the implementation. Moreover, critical thinking about the scenarios and use cases introduces the potential of uncovering traditional attacks like RCE, SQLi etc. through LLM based vulnerabilities like "Prompt Injection".

Risk/Impact Assessment - An assessment of risk and impact in AI based implementations is the process of evaluating the outcomes of AI-based decisions or actions within real-world contexts. This evaluation includes assessing how AI models influence various aspects like business operations, user trust, regulatory compliances, brand image and societal impacts. The primary step is to comprehensively understand both the intended and unintended behavior of AI based applications. Based on that, organizations can identify potential risks that may arise from the deployment of AI based use cases, analyze the impact on different stakeholders and then take significant measures to mitigate any negative impact.

Conclusion
An ideal risk assessment approach for reviewing AI-driven applications would be human-in-the-loop (HTIL) continuous penetration testing especially for the LLM based components (after a first full initial review of the implementation) especially due to the below factors: -

1. LLM based vulnerabilities cannot be reproduced directly (based on evidence/steps like in traditional application penetration testing reports) since LLM behavior is driven by the context of that particular chat – the behavior is different even if there is small word change in the chat. In order to fix these vulnerabilities, developers would need real-time validation where someone can test on-the-go for quite some time with the developer’s fine tuning their guardrails to block some exploits/attacks in parallel.

2. The remediation for LLM related vulnerabilities is typically not a code fix (like traditional applications) but introduction of a variety of guardrails/system context/meta prompt. For example, a "Prompt Injection" vulnerability identified in an initial report would be risk rated based on the attack/exploit scenario at that time and various guardrails for input – output content filtering would be introduced to remediate the vulnerability – this calls for a re-assessment of the risk. As the impact of these findings mainly depends on the nature of data and exploitability and LLMs are continuously evolving with new bypass techniques (just like zero-day vulnerabilities) each day – an ongoing assessment (red teaming) of such vulnerabilities should be in place to review the implementations for real-world threats like data leakage, unintended access, brand impact and compliance issues etc.

3. The automated tools available at this point in time are prompt scanners which run dictionary attacks, like brute-forcing/fuzzing, but lack the logic/context of the implementation. These tools would help in scoring the LLMs on generic categories like ethics, biases, abuse etc. but fail to uncover contextual attacks like inter-tenant data leakage, or retrieving back-end system information etc.

Article by Rishita Sarabhai & Hemil Shah

Pages

Strategizing Penetration Testing of AI Implementations!