Cyber threats against AI systems are on the rise, and today’s AI developers need a robust approach to securing AI applications that address the unique vulnerabilities and attack patterns associated with AI systems and ML models deployed in production environments.
In this blog, we’re taking a closer look at two specific tools that AI developers can use to help detect cyber threats against AI systems:
You’ll discover the meaning of MLOps for modern AI developer teams, why digital adversaries are targeting AI systems, how to understand threats against AI systems using the MITRE ATLAS framework, and how AI developers can use the MITRE ATLAS framework with MLOps monitoring to detect cyber threats against AI systems.
Machine Learning Operations (MLOps) is a set of tools and best practices for designing, developing, deploying, and operating ML models in production reliably and efficiently. MLOps incorporates theoretical frameworks and best practices from machine learning, data engineering, and DevOps to help AI developers effectively manage the entire lifecycle of machine learning models, from ML model design and development to deployment and operation.
MLOps spans the entire lifecycle of an ML model, from design and development to deployment and operation.
MLOps Monitoring is the continuous process of monitoring, tracking, and observing ML models deployed in production environments. MLOps monitoring can support a variety of use cases, including:
MLOps monitoring is often focused on diagnosing performance issues, preventing model degradation, and ensuring efficient resource utilization, but AI developers are increasingly leveraging an MLOps monitoring approach to detect security threats against AI systems.
Machine learning applications and AI-based intelligent systems are increasingly being targeted by digital adversaries with malicious intent. To identify and defend against these attacks, enterprise security teams should start by understanding the most common motivations behind these attacks. These include:
Digital adversaries often search for ways to manipulate AI models for their own financial gain. This encompasses several types of attacks, such as:
Targeted sabotage and business disruption are another important source of motivation for digital adversaries. By attacking the availability of AI systems or degrading AI models and their outputs, digital adversaries can disrupt the operations of a targeted organization and damage their reputation, often resulting in poor end-user experiences and revenue loss.
A cyber attack against an AI-based system is often used as a means to enable downstream attacks against other systems. This could involve:
The MITRE framework is a knowledge base of documented and categorized cyber threats against AI systems.
MITRE ATLAS, an extension of the MITRE ATT&CK framework, describes 14 adversarial tactics used by cyber criminals to penetrate targeted AI systems, escalate privileges, gain access to confidential information or sensitive data, exfiltrate that data, and ultimately cause harm to the targeted organization. For each tactic, the MITRE ATLAS framework details one or more techniques that have been used by digital adversaries to achieve that tactical objective. Some of these techniques are sufficiently versatile that they can be deployed in different ways to achieve different tactical objectives in the process of attacking an AI system.
Below, we summarize the 14 tactics described in the MITRE ATLAS framework and give brief examples of the techniques used by digital adversaries to achieve each tactical objective.
Reconnaissance is a tactic used by digital adversaries to obtain information about a machine learning system that can be used to plan future attacks. The MITRE ATLAS framework details several different reconnaissance techniques used by digital adversaries, such as:
Resource development is a tactic where digital adversaries attempt to create, purchase, or steal resources that can support an attack against an AI system. Resource development may involve techniques like:
Digital adversaries utilize a range of techniques to gain initial access to ML and AI systems, such as:
In this graphic illustrating a basic prompt injection attack, a user is deploying a malicious prompt to trick an AI system into disclosing private information.
A common goal for digital adversaries is to gain access to the targeted machine learning model, giving them knowledge of the model’s architecture, parameters, and other details. Digital adversaries can gain access to ML models via public APIs, by utilizing ML-enabled products and services, by accessing the physical environment where the model is hosted, or by deploying cyber attacks against the host infrastructure.
If a digital adversary can exfiltrate the entire ML model, they can use it to develop adversarial inputs or craft malicious prompts and verify their effects before deploying them against targeted production systems that run the same models.
In the MITRE ATLAS framework, execution encompasses a range of techniques used by digital adversaries to run malicious code embedded in ML artifacts or software on the target organization’s host machine. These include:
Persistence tactics are used by digital adversaries to maintain their access to ML artifacts or software over time. Documented persistence techniques include:
Privilege Escalation encompasses techniques used by digital adversaries to gain higher-level permissions within an AI system. Adversaries may attempt to exploit system weaknesses, vulnerabilities, or misconfigurations to obtain higher access.
According to the MITRE ATLAS framework, digital adversaries have been observed using techniques like LLM prompt injection and LLM plugin compromise to escalate privileges on targeted AI systems. A third technique known as LLM Jailbreak involves crafting a malicious prompt that will bypass LLM security controls, filters, and usage restrictions, allowing the attacker to exploit the AI system in unintended ways.
Defense Evasion encompasses several techniques used by digital adversaries to avoid detection by ML-enabled security monitoring software:
After successfully compromising the security of an AI system, attackers may search the system to find insecurely stored credentials. Such credentials may be used to escalate privileges and enable further downstream attacks against integrated systems.
Discovery encompasses a group of adversarial techniques whose common goal is to extract useful knowledge about the targeted ML system and the network where that system is hosted. Digital adversaries may attempt to discover specific information like:
Adversaries may also try to discover the initial instructions given to an AI system, also called the “meta prompt”, which can reveal insights about the inner workings of an ML system and its intended purpose.
Collection includes several techniques used by digital adversaries to gather ML artifacts and other information that may be relevant to the core purpose of the attack. Adversaries frequently target ML artifacts on the host network, as well as data from information repositories and from local systems.
ML attack staging is the final tactic in the MITRE ATLAS framework before data exfiltration. At this point, adversaries have gained initial access to an AI system, escalated privileges, evaded defenses, learned valuable information about the targeted ML system, and collected data they wish to exfiltrate from the target system.
ML attack staging encompasses techniques that digital adversaries use to leverage their newly developed knowledge of an AI system into a successful attack. It includes techniques like:
Data exfiltration encompasses the techniques used by digital adversaries to steal ML artifacts, proprietary or confidential information about ML systems, and other kinds of sensitive data.
Data exfiltration may be conducted via the ML Inference API, transferred over a C2 channel, or transmitted over some other network connection. Adversaries can also craft malicious prompts that trick LLMs into releasing other sensitive data, including private user data and proprietary information or IP.
Impact techniques are used by digital adversaries to disrupt the availability of an AI system, often causing financial and reputational damage to the targeted organization. They include techniques like:
In our summary above of the 14 tactics in MITRE ATLAS, we briefly described a broad range of techniques used by digital adversaries to target AI systems. Now, let’s take a closer look at five of the most common threats against AI systems and how AI security engineers can detect those threats with MLOps Monitoring.
Data poisoning is when a digital adversary modifies the training data of an ML model to degrade the accuracy of the model, cause the model to behave incorrectly, and/or introduce vulnerabilities in the model that may be exploited later.
Data poisoning attacks often target ML models in the training and development stage, before the model is deployed in a production environment. As a result, MLOps monitoring that takes place after the ML model is deployed may not be able to detect data poisoning attacks. However, AI developers can deploy MLOps monitoring software in the model training phase to identify data poisoning attacks by:
An ML evasion attack is when a digital adversary manipulates input data in a way that causes an ML model to make incorrect predictions or behave incorrectly.
ML evasion attacks exploit weaknesses in an ML model without affecting the model training process, which can make them challenging to detect. However, AI developers can use MLOps monitoring to detect ML evasion attacks by:
Adversarial examples are deployed against ML-based services in production to degrade classifier performance and create vulnerabilities in the model that can be exploited.
A supply chain compromise attack is when a digital adversary attacks an AI system by targeting external tools, libraries, or other components of the tech stack used to develop an ML model. A successful supply chain compromise can introduce backdoors or other vulnerabilities into the AI system.
AI developers can use MLOps monitoring software tools to proactively search for Indicators of Compromise (IoCs) throughout the supply chain and deploy countermeasures before digital adversaries can successfully exfiltrate their data. Developers can also track changes to ML pipeline using log analytics to identify unauthorized modifications in the tech stack that might indicate a supply chain compromise attack.
An LLM plugin compromise attack is when a digital adversary targets software plugins or extensions that integrate an LLM in production with other applications or systems on the same network. This can allow the attacker to escalate permissions or execute unauthorized API calls on applications that shouldn’t be publicly accessible.
AI developers can use MLOps monitoring software to:
LLM prompt injection is when a digital adversary feeds malicious prompts to an LLM that cause it to behave in unintended ways, such as bypassing security protocols and other usage restrictions.
LLM prompt injection is an excellent candidate for detection via MLOps monitoring, as this type of attack necessarily occurs on LLMs deployed in production. To detect LLM prompt injection attacks, AI developers can implement strategies like:
MLOps monitoring is a vital capability for AI developers focused on detecting cyber threats against ML models and securing AI systems against cyber attacks. With ChaosSearch, AI developers can implement MLOps monitoring at scale to efficiently monitor, track, and observe AI systems in production.
Using cloud object storage as a cost-effective storage backing, ChaosSearch gives AI developers comprehensive observability of AI systems so developers can efficiently measure the performance of ML models, monitor cloud infrastructure and resource utilization, capture and analyze real user behavior data, hunt for security threats, and alert on any suspicious or anomalous activity that could indicate a cyber attack.
With ChaosSearch, AI developers can detect unauthorized changes to ML algorithms and training data, capture security log data at scale to identify suspicious access patterns or login behavior, and detect anomalous activity that could indicate a threat to AI systems.
Discover how you can create a security data lake for MLOps monitoring and safeguard your AI systems in production with ChaoSearch.