The Fading Line Between Data and Instruction in LLM-Driven Applications

Overview

In the ever-advancing landscape of technology, the once distinct boundary between data and instruction is steadily dissolving, particularly in the realm of Language Model (LM) driven applications. These applications, powered by impressive language models like GPT-3.5, built upon the foundation of Large Language Model (LLM) architecture, possess the extraordinary capability to process and generate vast volumes of text. This remarkable feat allows them to undertake tasks that were previously exclusive to human intelligence, marking a significant paradigm shift.

Over time, language models have undergone remarkable transformations, and LLMs have propelled this evolution to unprecedented heights. Traditional programming paradigms relied on explicit instructions to manipulate data. However, LLMs have revolutionized this approach by harnessing the power of large datasets, extracting intricate patterns, and leveraging their knowledge to generate text. This transformative capability empowers LLMs to undertake a diverse range of tasks, including natural language understanding, text completion, and even creative writing.

Root cause:

The diminishing line between data and instruction in LLM-driven applications can be attributed, in large part, to the dynamic interpretation of input. LLMs possess the remarkable ability to comprehend and interpret the contextual intricacies of the provided data, enabling them to discern the underlying task or instruction at hand. This dynamic interpretation empowers LLMs to generate responses or outputs that align seamlessly with the intended objective, even when the instructions are not explicitly specified.

LLMs shine in their aptitude for contextual understanding, a trait previously reserved for human cognition. By meticulously analyzing the surrounding text, LLMs can discern the desired instructions or tasks implicitly embedded within the data. This context-driven approach equips them to generate outputs that are remarkably accurate and relevant, even when the inputs are incomplete or ambiguous.

Security implication:

While the blurring line between data and instruction in LLM-driven applications opens up new horizons and offers unparalleled convenience, it also raises crucial considerations regarding security. On one hand, this convergence facilitates more seamless and intuitive interactions with technology, as users can input data in a natural and flexible manner. On the other hand, concerns arise regarding the potential for misinterpretation or bias in the outputs generated by these models.

Prompt Leakage

The inadvertent exposure of initial prompts in language models, known as prompt leakage, has sparked concerns regarding the disclosure of sensitive information, biases, and limitations embedded within them. This phenomenon poses potential risks to privacy and security.

Language models, including the powerful LLM (Large Language Model), often come preconfigured with specific prompts to initiate the generation of responses and guide the model's behavior. These prompts may contain sensitive information, limitations, or inherent biases that should be treated with utmost confidentiality. However, prompt leakage occurs when a language model unintentionally reveals its initial prompt configurations, undermining the safeguarding of such sensitive content.

The following link provides a compilation of examples showcasing leaked system prompts: [Link: https://matt-rickard.com/a-list-of-leaked-system-prompts].

Prompt Injection

A Large Language Model (LLM) stands as a formidable and intricate form of language model, distinguished by its immense size and parameter count. LLMs, often based on architectures like Transformers, undergo training on vast datasets, equipping them with the ability to learn patterns, grammar, and semantics from extensive textual sources. These models boast billions of parameters, granting them the capacity to generate coherent and contextually relevant text across a wide array of tasks and applications.

Within the realm of machine learning, particularly in the context of LLMs, the role of a prompt is of paramount importance. A prompt serves as the initial input or instruction provided to the model, shaping its behavior and influencing the output it produces. It acts as a starting point or contextual framework for the model's responses or task execution. Prompts can assume various forms, ranging from a few words to a sentence, a paragraph, or even a dialogue, tailored to the specific requirements of the model and the task at hand. The primary purpose of a prompt is to furnish the necessary context, constraints, or explicit instructions that guide the model's decision-making process and shape the content it generates.

Prompt injection, conversely, entails purposefully embedding a targeted instruction, query, question, or context into the model's prompt to manipulate or influence its subsequent output. Through skillful construction of the prompt, users can steer the model's responses in their desired direction or elicit specific types of answers. Prompt injection grants users greater control over the generated text, enabling them to tailor the model's output to meet specific criteria or objectives.

The concept of prompt injection proves especially valuable when fine-tuning the model's responses to attain desired outcomes or generate content that aligns with specific requirements. It empowers users to guide the model's creative output and shape the conversation in accordance with their needs. Prompt injection finds applications in various domains, including generating creative writing, providing customized responses in chatbots, or facilitating specific tasks such as code generation or translation.

Nevertheless, it is crucial to acknowledge that prompt injection can introduce vulnerabilities and raise ethical concerns. There is a potential for malicious actors to manipulate the model to generate harmful or biased content. Hence, it is of utmost importance to implement safeguards, robust validation mechanisms, and regular model updates to mitigate potential risks.

Here is the link with examples and possbilities - https://simonwillison.net/2023/Apr/14/worst-that-can-happen/

SSRF - Detection, Exploitation, and Mitigation Techniques [Part 2]

In the previous section, we explored different techniques for detecting Server-Side Request Forgery (SSRF) based on the application's scenarios. Now, let's delve into the exploitation techniques associated with SSRF, which come into play once SSRF has been confirmed within the application. These techniques aim to assess the vulnerability's risk or impact. The SSRF exploitation process can be divided into two main parts.

Exploiting Application Internal infrastructure:

  • Companies utilize various architectural patterns for running applications, including reverse proxies, load balancers, cache servers, and different routing methods. It is crucial to determine if an application is running on the same host. URL bypass techniques can be employed to invoke well-known URLs and Protocols like localhost (127.0.0.1) and observe the resulting responses. Malicious payloads can sometimes trigger error messages or responses that inadvertently expose internal IP addresses, providing valuable insights into the internal network.
  • Another approach involves attempting connections to well-known ports on localhost or leaked IP addresses and analyzing the responses received on different ports.
  • Application-specific information, such as the operating system, application server version, load balancer or reverse proxy software/platform, and vulnerable server-side library versions, can aid in targeting specific payloads for exploitation. It is also worthwhile to check if the application permits access to default sensitive files located in predefined locations. For example, on Windows systems, accessing critical files like win.ini, sysprep.inf, sysprep.xml, and NTLM hashes can be highly valuable. A comprehensive list of Windows files is available at https://github.com/soffensive/windowsblindread/blob/master/windows-files.txt. On Linux, an attacker may exfiltrate file:////etc/passwd hashes through SSRF.
  • If the application server runs on Node.js, a protocol redirection attack can be attempted by redirecting from an attacker's HTTPS server endpoint to HTTP. For instance, using a URL like https://attackerserver.com/redirect.aspx?target=http://localhost/test.
  • It is essential to identify all endpoints where the application responds with an 'access denied' (403) error. These URLs can then be used in SSRF to compare differences in responses.
  • By identifying the platform or components used in an application, it becomes possible to exploit platform-specific vulnerabilities through SSRF. For example, if the application relies on WordPress, its admin or configuration internal URLs can be targeted. Platform-specific details can be found at https://github.com/assetnote/blind-ssrf-chains, which assists in exploiting Blind/Time-based SSRF.
  • DNS Rebinding attack: This type of attack occurs when an attacker-controlled DNS server initially responds to a DNS query with a valid IP address with very low TTL value, but subsequently returns internal, local, or restricted IP addresses. The application may allow these restricted IP addresses in later requests while restricting them in the first request. DNS Rebinding attacks can be valuable when the application imposes domain/IP-level restrictions.
  • Cloud metadata exploitation: Cloud metadata URLs operate on specific IP addresses and control the configuration of cloud infrastructures. These endpoints are typically accessible only from the local environment. If an application is hosted on a cloud infrastructure and is susceptible to SSRF, these endpoints can be exploited to gain access to the cloud machine.

Amazon (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html)

  • http://169.254.169.254/
  • http://169.254.169.254/latest/meta-data/
  • http://169.254.169.254/latest/user-data
  • http://169.254.169.254/latest/user-data/iam/security-credentials/<<role>>
  • http://169.254.169.254/latest/meta-data/iam/security-credentials/<<role>>
  • http://169.254.169.254/latest/meta-data/ami-id
  • http://169.254.169.254/latest/meta-data/hostname
  • http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
  • http://169.254.169.254/latest/meta-data/public-keys/<<id>>/openssh-key

Google (https://cloud.google.com/compute/docs/metadata/querying-metadata)

  • http://169.254.169.254/computeMetadata/v1/
  • http://metadata.google.internal/computeMetadata/v1/
  • http://metadata/computeMetadata/v1/
  • http://metadata.google.internal/computeMetadata/v1/instance/hostname
  • http://metadata.google.internal/computeMetadata/v1/instance/id
  • http://metadata.google.internal/computeMetadata/v1/project/project-id

Azure (https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=windows)    

  • http://169.254.169.254/metadata/v1/maintenance 

 

Exploiting external network

  • If an application makes backend API calls and an attacker is aware of the backend API domains, they can exploit SSRF to abuse the application by targeting those backend APIs. Since the application is already authenticated with the domain of the backend API, it provides an avenue for the attacker to manipulate the requests.
  • Furthermore, an attacker can utilize a vulnerable application as a proxy to launch attacks on third-party servers. By leveraging SSRF, they can make requests to external servers through the compromised application, potentially bypassing security measures in place.
  • SSRF can be combined with other vulnerabilities such as XSS (Cross-Site Scripting), XXE (XML External Entity), Open redirect, and Request Smuggling to amplify the impact and severity of the overall vulnerability. This combination of vulnerabilities can lead to more advanced attacks and potentially result in unauthorized access, data leakage, or server-side compromise.

In the next section of this blog, we will delve into various strategies and techniques for preventing and mitigating SSRF attacks in different application scenarios.

Article by Amish Shah