How to Detect Threats to AI Systems with MITRE ATLAS Framework
Cyber threats against AI systems are on the rise, and today’s AI developers need a robust approach to securing AI applications that address the unique vulnerabilities and attack patterns associated with AI systems and ML models deployed in production environments.
In this blog, we’re taking a closer look at two specific tools that AI developers can use to help detect cyber threats against AI systems:
- MITRE Adversarial Threat Landscape for Artificial Intelligence Systems (ATLAS) framework
- MLOps Monitoring
You’ll discover the meaning of MLOps for modern AI developer teams, why digital adversaries are targeting AI systems, how to understand threats against AI systems using the MITRE ATLAS framework, and how AI developers can use the MITRE ATLAS framework with MLOps monitoring to detect cyber threats against AI systems.
What is MLOps Monitoring?
Machine Learning Operations (MLOps) is a set of tools and best practices for designing, developing, deploying, and operating ML models in production reliably and efficiently. MLOps incorporates theoretical frameworks and best practices from machine learning, data engineering, and DevOps to help AI developers effectively manage the entire lifecycle of machine learning models, from ML model design and development to deployment and operation.
MLOps spans the entire lifecycle of an ML model, from design and development to deployment and operation.
MLOps Monitoring is the continuous process of monitoring, tracking, and observing ML models deployed in production environments. MLOps monitoring can support a variety of use cases, including:
- Model Performance Monitoring - Tracking the precision and accuracy of model predictions and conducting model evaluation to measure the performance of the model.
- Infrastructure Monitoring - Tracking the ML model’s resource usage to ensure efficiency, avoid bottlenecks or outages, and optimize costs.
- Anomaly Detection and Alerting - Detecting unexpected or suspicious ML model behaviors that might indicate an adversarial attack and sending automatic alerts to AI developer teams.
- Versioning/Deployment Monitoring - Ensuring that the correct version of the model is deployed and tracking changes to the model in production.
- Security Log Analytics - Capturing security, access control, user behavior, and other types of logs from interactions with the ML model and analyzing those logs to diagnose performance or security issues.
MLOps monitoring is often focused on diagnosing performance issues, preventing model degradation, and ensuring efficient resource utilization, but AI developers are increasingly leveraging an MLOps monitoring approach to detect security threats against AI systems.
Why are Digital Adversaries Targeting AI Systems?
Machine learning applications and AI-based intelligent systems are increasingly being targeted by digital adversaries with malicious intent. To identify and defend against these attacks, enterprise security teams should start by understanding the most common motivations behind these attacks. These include:
Financial Gain
Digital adversaries often search for ways to manipulate AI models for their own financial gain. This encompasses several types of attacks, such as:
- Financial Fraud - Digital adversaries may try to commit financial fraud by targeting and manipulating AI-based algorithms intended for applications like fraud prevention, securities trading, and loan approval.
- IP Theft - Digital adversaries may target AI systems in hopes of stealing valuable intellectual property (IP) that may be sold to a competitor company or nation-state.
- Data Theft - Digital adversaries can exploit vulnerabilities to steal sensitive or proprietary data from an AI-based system. This data may be sold to the highest bidder, used to commit identity theft attacks, or used to support downstream attacks against the targeted organization’s people, digital assets, etc.
Targeted Sabotage/Disruption
Targeted sabotage and business disruption are another important source of motivation for digital adversaries. By attacking the availability of AI systems or degrading AI models and their outputs, digital adversaries can disrupt the operations of a targeted organization and damage their reputation, often resulting in poor end-user experiences and revenue loss.
Enabling Downstream Attacks
A cyber attack against an AI-based system is often used as a means to enable downstream attacks against other systems. This could involve:
- Gaining Initial Access - Digital adversaries might attack an AI system in hopes of gaining deeper access to secure networks and systems operated by the targeted organization.
- Exploiting AI Dependency - If an organization’s cybersecurity defenses rely on an AI-based system for threat detection or decision-making, adversaries can exploit that dependency by attacking the AI system in hopes of weakening the target’s defenses and making it more vulnerable to follow-up attacks.
- Weaponizing AI Systems - Digital adversaries can weaponize AI-based applications by hijacking models and automating malicious activities, often supported by the targeted organization’s own data storage and compute resources.
Understanding Attacks on AI Systems with the MITRE ATLAS Framework
The MITRE framework is a knowledge base of documented and categorized cyber threats against AI systems.
MITRE ATLAS, an extension of the MITRE ATT&CK framework, describes 14 adversarial tactics used by cyber criminals to penetrate targeted AI systems, escalate privileges, gain access to confidential information or sensitive data, exfiltrate that data, and ultimately cause harm to the targeted organization. For each tactic, the MITRE ATLAS framework details one or more techniques that have been used by digital adversaries to achieve that tactical objective. Some of these techniques are sufficiently versatile that they can be deployed in different ways to achieve different tactical objectives in the process of attacking an AI system.
Below, we summarize the 14 tactics described in the MITRE ATLAS framework and give brief examples of the techniques used by digital adversaries to achieve each tactical objective.
1. Reconnaissance
Reconnaissance is a tactic used by digital adversaries to obtain information about a machine learning system that can be used to plan future attacks. The MITRE ATLAS framework details several different reconnaissance techniques used by digital adversaries, such as:
- Searching for the target’s publicly available research materials or vulnerability analysis documentation.
- Searching application repositories or websites operated by the target.
- Active Scanning: Actively probing or scanning the targeted ML system for information or vulnerabilities.
2. Resource Development
Resource development is a tactic where digital adversaries attempt to create, purchase, or steal resources that can support an attack against an AI system. Resource development may involve techniques like:
- Acquiring public ML artifacts.
- Obtaining or developing digital capabilities and/or infrastructure to support adversarial operations.
- Training Data Poisoning: Modifying an ML system’s training data to create vulnerabilities that can be exploited later.
- Publishing poisoned datasets, models, or hallucinated entities to public locations where they may be accessed by (and used to target) legitimate users.
3. Initial Access
Digital adversaries utilize a range of techniques to gain initial access to ML and AI systems, such as:
- ML Supply Chain Compromise: Compromising portions of the ML supply chain, especially the tech stack used to develop a targeted AI application.
- Creating valid accounts to access the AI application.
- Stealing access credentials for AI applications via social engineering methods like phishing or impersonation.
- LLM Prompt Injection: Inputting malicious prompts that cause an LLM to act outside its intended purpose.
In this graphic illustrating a basic prompt injection attack, a user is deploying a malicious prompt to trick an AI system into disclosing private information.
4. ML Model Access
A common goal for digital adversaries is to gain access to the targeted machine learning model, giving them knowledge of the model’s architecture, parameters, and other details. Digital adversaries can gain access to ML models via public APIs, by utilizing ML-enabled products and services, by accessing the physical environment where the model is hosted, or by deploying cyber attacks against the host infrastructure.
If a digital adversary can exfiltrate the entire ML model, they can use it to develop adversarial inputs or craft malicious prompts and verify their effects before deploying them against targeted production systems that run the same models.
5. Execution
In the MITRE ATLAS framework, execution encompasses a range of techniques used by digital adversaries to run malicious code embedded in ML artifacts or software on the target organization’s host machine. These include:
- Deploying traditional malware or social engineering attacks against users.
- LLM Plugin Compromise: Exploiting LLM access to compromise plugins that connect the LLM to other systems and executing API calls to integrated applications.
- Abusing command and script interpreters.
6. Persistence
Persistence tactics are used by digital adversaries to maintain their access to ML artifacts or software over time. Documented persistence techniques include:
- LLM Prompt Self-Replication - Injecting malicious self-replicating LLM prompts that propagate and persist on the targeted system.
- Backdoor ML Model - Inserting a backdoor into the model that can be exploited by the attacker to re-establish access when needed by using a specially designed back door trigger.
- Training Data Poisoning
7. Privilege Escalation
Privilege Escalation encompasses techniques used by digital adversaries to gain higher-level permissions within an AI system. Adversaries may attempt to exploit system weaknesses, vulnerabilities, or misconfigurations to obtain higher access.
According to the MITRE ATLAS framework, digital adversaries have been observed using techniques like LLM prompt injection and LLM plugin compromise to escalate privileges on targeted AI systems. A third technique known as LLM Jailbreak involves crafting a malicious prompt that will bypass LLM security controls, filters, and usage restrictions, allowing the attacker to exploit the AI system in unintended ways.
8. Defense Evasion
Defense Evasion encompasses several techniques used by digital adversaries to avoid detection by ML-enabled security monitoring software:
- Evade ML Model - A technique where attackers try to evade ML-based security systems by crafting adversarial data that the ML model cannot identify correctly.
- LLM Prompt Injection
- LLM Jailbreak can be used to bypass security controls and other restrictions inside an AI system.
9. Credential Access
After successfully compromising the security of an AI system, attackers may search the system to find insecurely stored credentials. Such credentials may be used to escalate privileges and enable further downstream attacks against integrated systems.
10. Discovery
Discovery encompasses a group of adversarial techniques whose common goal is to extract useful knowledge about the targeted ML system and the network where that system is hosted. Digital adversaries may attempt to discover specific information like:
- ML Model Ontology - Understanding how the model’s output is restricted.
- ML Model Family - Understanding the type of ML model used to develop an AI system.
- ML Artifacts - Discovering ML artifacts that exist on a secured network.
- LLM Hallucinations - Identifying hallucinated information that may be exploited to launch a cyber attack,
- AI Model Outputs - Identifying AI outputs that may reveal important information about the system or weaknesses that can be exploited.
Adversaries may also try to discover the initial instructions given to an AI system, also called the “meta prompt”, which can reveal insights about the inner workings of an ML system and its intended purpose.
11. Collection
Collection includes several techniques used by digital adversaries to gather ML artifacts and other information that may be relevant to the core purpose of the attack. Adversaries frequently target ML artifacts on the host network, as well as data from information repositories and from local systems.
12. ML Attack Staging
ML attack staging is the final tactic in the MITRE ATLAS framework before data exfiltration. At this point, adversaries have gained initial access to an AI system, escalated privileges, evaded defenses, learned valuable information about the targeted ML system, and collected data they wish to exfiltrate from the target system.
ML attack staging encompasses techniques that digital adversaries use to leverage their newly developed knowledge of an AI system into a successful attack. It includes techniques like:
- Establishing backdoor access to ensure access to the ML model is retained, regardless of whether the attack is detected or prevented.
- Creating a proxy ML model to simulate the attack and verify the outcome before attacking the targeted AI system.
- Crafting adversarial data or inputs that will cause the targeted AI system to behave according to the attacker’s intentions.
13. Data Exfiltration
Data exfiltration encompasses the techniques used by digital adversaries to steal ML artifacts, proprietary or confidential information about ML systems, and other kinds of sensitive data.
Data exfiltration may be conducted via the ML Inference API, transferred over a C2 channel, or transmitted over some other network connection. Adversaries can also craft malicious prompts that trick LLMs into releasing other sensitive data, including private user data and proprietary information or IP.
14. Impact
Impact techniques are used by digital adversaries to disrupt the availability of an AI system, often causing financial and reputational damage to the targeted organization. They include techniques like:
- Erode ML Model Integrity - Eroding the integrity of an ML model with adversarial data inputs.
- Cost Harvesting - Spamming queries to drive up operational costs for an AI system, causing financial damage to the targeted organization.
- Denial of ML Service - Flooding an AI system with requests and junk traffic to degrade service for legitimate users.
How to Detect 5 Common Threats to AI Systems with MLOps Monitoring
In our summary above of the 14 tactics in MITRE ATLAS, we briefly described a broad range of techniques used by digital adversaries to target AI systems. Now, let’s take a closer look at five of the most common threats against AI systems and how AI security engineers can detect those threats with MLOps Monitoring.
1. Data Poisoning
Data poisoning is when a digital adversary modifies the training data of an ML model to degrade the accuracy of the model, cause the model to behave incorrectly, and/or introduce vulnerabilities in the model that may be exploited later.
Data poisoning attacks often target ML models in the training and development stage, before the model is deployed in a production environment. As a result, MLOps monitoring that takes place after the ML model is deployed may not be able to detect data poisoning attacks. However, AI developers can deploy MLOps monitoring software in the model training phase to identify data poisoning attacks by:
- Monitoring training data for anomalies that could indicate a data poisoning attempt.
- Using data provenance tracking to maintain records of data sources and transformations.
- Using model performance monitoring on ML models in the production environment to track performance metrics and identify sudden drops in model precision or accuracy that could indicate the training data has been compromised.
- Implementing data integrity checks that use rules, schemas, or checksums to identify outliers or detect changes to data that could indicate a data poisoning attack.
2. ML Evasion Attack
An ML evasion attack is when a digital adversary manipulates input data in a way that causes an ML model to make incorrect predictions or behave incorrectly.
ML evasion attacks exploit weaknesses in an ML model without affecting the model training process, which can make them challenging to detect. However, AI developers can use MLOps monitoring to detect ML evasion attacks by:
- Training the ML model on adversarial examples that help the algorithm learn to detect adversarial inputs in real-time.
- Detecting and alerting on anomalous activity, such as high misclassification rates (unexpected low ML model precision/accuracy) or unexpected confidence scores that could indicate an ML Evasion Attack.
Adversarial examples are deployed against ML-based services in production to degrade classifier performance and create vulnerabilities in the model that can be exploited.
3. Supply Chain Compromise
A supply chain compromise attack is when a digital adversary attacks an AI system by targeting external tools, libraries, or other components of the tech stack used to develop an ML model. A successful supply chain compromise can introduce backdoors or other vulnerabilities into the AI system.
AI developers can use MLOps monitoring software tools to proactively search for Indicators of Compromise (IoCs) throughout the supply chain and deploy countermeasures before digital adversaries can successfully exfiltrate their data. Developers can also track changes to ML pipeline using log analytics to identify unauthorized modifications in the tech stack that might indicate a supply chain compromise attack.
4. LLM Plugin Compromise
An LLM plugin compromise attack is when a digital adversary targets software plugins or extensions that integrate an LLM in production with other applications or systems on the same network. This can allow the attacker to escalate permissions or execute unauthorized API calls on applications that shouldn’t be publicly accessible.
AI developers can use MLOps monitoring software to:
- Detect and alert on any unauthorized changes to installed plugins.
- Implement continuous MLOps monitoring and threat detection to detect suspicious or unexpected plugin behaviors, such as unexpected API calls or data access.
- Track and analyze plugin access logs to detect unauthorized access or suspicious usage patterns.
5. LLM Prompt Injection
LLM prompt injection is when a digital adversary feeds malicious prompts to an LLM that cause it to behave in unintended ways, such as bypassing security protocols and other usage restrictions.
LLM prompt injection is an excellent candidate for detection via MLOps monitoring, as this type of attack necessarily occurs on LLMs deployed in production. To detect LLM prompt injection attacks, AI developers can implement strategies like:
- Using anomaly detection to analyze ML model outputs for unintended or harmful responses that violate established security rules and could indicate a prompt injection attack.
- Monitoring user interactions with the ML model to identify unusual behavior patterns that could indicate an attempt at LLM prompt injection.
Implement MLOps Monitoring and Secure AI Systems with ChaosSearch
MLOps monitoring is a vital capability for AI developers focused on detecting cyber threats against ML models and securing AI systems against cyber attacks. With ChaosSearch, AI developers can implement MLOps monitoring at scale to efficiently monitor, track, and observe AI systems in production.
Using cloud object storage as a cost-effective storage backing, ChaosSearch gives AI developers comprehensive observability of AI systems so developers can efficiently measure the performance of ML models, monitor cloud infrastructure and resource utilization, capture and analyze real user behavior data, hunt for security threats, and alert on any suspicious or anomalous activity that could indicate a cyber attack.
With ChaosSearch, AI developers can detect unauthorized changes to ML algorithms and training data, capture security log data at scale to identify suspicious access patterns or login behavior, and detect anomalous activity that could indicate a threat to AI systems.
Ready to learn more?
Discover how you can create a security data lake for MLOps monitoring and safeguard your AI systems in production with ChaoSearch.