Modern organizations generate large volumes of logs from multiple data sources, which makes it hard to analyze the data and extract useful insights at scale. Data scientists can tackle these challenges with Mosaic AI, which helps Databricks users build and deploy artificial intelligence (AI) and machine learning (ML) solutions.
In this blog post, we examine three Mosaic AI use cases that enhance log analytics capabilities, allowing Databricks customers to streamline and accelerate insights from enterprise log data at scale.
"The Lake House" by Thomas Paal Photography is licensed under CC BY-NC-SA 2.0.
What is Databricks Mosaic AI?
Mosaic AI is a suite of tools that allows Databricks users to build, manage, and deploy software solutions that incorporate AI, ML, and large language model (LLM) technologies.
Mosaic AI is fully integrated within the Databricks Data Intelligence Platform, which provides a single solution for storing data in a unified data lakehouse, training AI and machine learning models, and deploying those AI/ML solutions in production.
Workflow for building and deploying AI and ML applications using Mosaic AI products on the Databricks platform.
Databricks Mosaic AI encompasses the following products:
- Mosaic AI Vector Search - A queryable vector database integrated with the Databricks Platform, Mosaic AI Vector Search is used in LLM solutions to store and retrieve mathematical representations of the semantic contents of text or image data.
- Mosaic AI Agent Framework - A set of Databricks tools that allow developers to build, deploy, and evaluate AI agents using Retrieval Augmented Generation (RAG), an AI design technique that augments an existing LLM with an external knowledge base.
- Mosaic AI Model Serving - A solution for deploying LLMs and accessing Gen-AI models, including open LLMs (via Foundation Model APIs) and external LLMs hosted outside Databricks.
- Mosaic AI Gateway - A tool for managing the usage of Gen-AI models, Mosaic AI Gateway delivers monitoring, governance, and production readiness features like usage tracking, access permissions, and traffic routing.
- Mosaic AI Model Training - An AI model training solution that allows users to customize open-source LLMs or cost-effectively train new ones using enterprise data.
- Feature Store - A solution for creating, publishing, and re-using features used to train ML models or feed batch inference pipelines.
- Databricks AutoML - A low-code approach to building, training, and deploying ML models.
- MLflow - An open-source platform used to manage artifacts and workflows throughout the MLOps pipeline, from initial model development and training through to deployment and operation.
- Lakehouse Monitoring - A tool for monitoring data quality in the data lakehouse, Lakehouse Monitoring can also be used to track the performance of ML models and model-serving endpoints.
Though not technically a Mosaic AI product, Databricks Unity Catalog is another important service that provides centralized discovery, management, and governance of models and data stored in the Databricks lakehouse.
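To make the vector search idea from the product list more concrete, here is a minimal sketch of similarity-based retrieval in plain Python. It does not use the Mosaic AI Vector Search API; the function names, the toy three-dimensional vectors, and the `TOY_INDEX` contents are all hypothetical, and a real deployment would use embeddings produced by a model rather than hand-written numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical index: log messages mapped to made-up embedding vectors.
TOY_INDEX = {
    "disk usage above 90% on node-3": [0.9, 0.1, 0.0],
    "failed login from unknown IP": [0.0, 0.2, 0.9],
    "disk write latency increased": [0.8, 0.3, 0.1],
}

if __name__ == "__main__":
    # A query vector that points in the "disk" direction of the toy space.
    print(top_k([1.0, 0.0, 0.0], TOY_INDEX, k=2))
```

In a RAG-style application, the retrieved texts would then be passed to an LLM as additional context for answering a question.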
3 Mosaic AI Use Cases to Supercharge Your Log Analytics
Organizations that generate large amounts of log data can centralize their data in a Databricks data lakehouse to create a single source of truth for log analytics initiatives. Centralizing enterprise cloud, security, and event logs allows data science teams to use the logs as training data for AI and ML models. These models can be built and deployed using Mosaic AI to automate log analytics use cases.
Below, we share three of the top Mosaic AI use cases that can help organizations streamline, automate, and accelerate insights from log analytics initiatives.
1. Streamlining Security Operations with AI
SecOps teams analyze security logs from throughout an organization’s IT infrastructure to support vital security functions like anomaly and threat detection, security incident investigation and response, proactive threat hunting, and root cause analysis of security incidents.
As organizations expand their digital presence and generate growing volumes of security log data, SecOps teams run into challenges like:
- Alert Fatigue - Security tooling generates a large number of alerts, many of which do not represent a genuine security concern. This can lead to SecOps teams missing or ignoring important alerts.
- Prioritizing Incidents - Too much data and too many alerts make it difficult for SecOps teams to evaluate, prioritize, and respond to security incidents in a timely fashion, which makes it easier for digital adversaries to achieve their objectives.
- Detecting Complex Threats - The need to process large amounts of data can make it difficult for SecOps teams to track down sophisticated or stealthy threats to enterprise networks and systems.
Databricks Mosaic AI opens up new strategies for overcoming these challenges and helping enterprise SecOps teams protect their assets, especially when dealing with large volumes of log data. SecOps teams can centralize log data in a security data lake, then use the data to build ML models and LLM-based solutions with Mosaic AI. These could include:
- Building Anomaly Detection Models - SecOps teams can leverage Mosaic AI model training capabilities to build anomaly detection models that can automate the process of analyzing security log data in a unified data lakehouse to detect suspicious or anomalous network activity at scale.
- Using LLMs to Summarize Log Data - With help from Mosaic AI, SecOps teams can start using LLMs to summarize the contents of log data in plain text and make it easier for security analysts to understand the context or correlations behind a security incident.
- Enabling AI-Powered Threat Detection - SecOps teams can use Mosaic AI to build ML models that detect known Indicators of Compromise (IoCs) in real time and at scale.
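As a rough illustration of the anomaly detection idea above, the sketch below flags hours in which the number of failed logins sits far from the average. It is a simple z-score rule in plain Python, not a trained Mosaic AI model; the function name, the threshold, and the sample counts are assumptions made up for this example.

```python
from statistics import mean, pstdev


def find_anomalies(counts, threshold=2.5):
    """Return indices whose value is more than `threshold` standard deviations from the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical failed-login counts per hour, aggregated from security logs.
failed_logins_per_hour = [4, 6, 5, 7, 5, 4, 6, 5, 90, 6, 5, 4]

if __name__ == "__main__":
    print(find_anomalies(failed_logins_per_hour))
```

A production model would typically learn from far more history and more features (source IP, user, geography), but the shape is the same: establish what normal looks like, then surface the deviations for an analyst.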
2. Enabling Cloud Observability with AI
Enterprise cloud operations teams analyze log data from cloud-based infrastructure and services to track cloud infrastructure performance, monitor resource utilization, detect unexpected changes in resource usage, identify latency issues, enforce compliance rules, and predict future resource consumption trends.
As organizations grow their presence in the cloud, the operational burden of monitoring and managing cloud infrastructure through log analytics also increases. This can make it more challenging for cloud operations engineers to detect anomalous activity, determine the root cause of latency issues, or predict future resource consumption in a timely fashion.
With Databricks Mosaic AI and a centralized data lakehouse of operational logs from the cloud, cloud operations engineers can build ML models that automate critical aspects of cloud infrastructure monitoring and help drive log analytics ROI. Databricks Mosaic AI use cases that enhance cloud observability include:
- Building ML Models to Predict Service Failures - CloudOps engineers can train and deploy ML models that predict future cloud service failures using Databricks AutoML, MLflow, and historical cloud logs.
- Building ML Models to Detect Anomalous Activity - Anomaly detection is another example of a Mosaic AI use case that enhances cloud observability. Cloud engineers can use ML models to establish a baseline for normal utilization of cloud infrastructure and resources, then detect deviations from that baseline (e.g. traffic spikes, high latency, or recurring errors) by analyzing log and telemetry data in real time.
- Building ML Models to Forecast Resource Demand - CloudOps engineers can use Mosaic AI and historical log data to forecast future cloud resource demands based on past trends.
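The forecasting use case above can be sketched with an ordinary least-squares trend line. This is a deliberately small stand-in for a real forecasting model, written in plain Python; the function names and the `daily_cpu_hours` series are hypothetical, and the approach assumes demand grows roughly linearly with no seasonality.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(values, steps_ahead):
    """Extrapolate the fitted trend for the next `steps_ahead` periods."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + i) for i in range(steps_ahead)]


# Hypothetical daily CPU-hours derived from historical cloud usage logs.
daily_cpu_hours = [100.0, 110.0, 120.0, 130.0, 140.0]

if __name__ == "__main__":
    print(forecast(daily_cpu_hours, 2))
```

Real usage data is rarely this regular, so a practical model would also account for weekly or seasonal cycles and report uncertainty around each prediction.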
3. Extracting User Insights with AI
Enterprise DevOps and product teams analyze user behavior logs from digital products and cloud services to extract insights that help them understand the customer journey, identify application performance issues and bottlenecks, prioritize feature releases, predict user behavior patterns, and develop strategies to improve the overall user experience.
Organizations with large numbers of users interacting across multiple touchpoints generate massive amounts of log data, which frequently leads to analytics challenges around:
- Correlating Data from Disparate Systems - User behavior logs can originate from multiple systems and touchpoints. DevOps and product teams often lack the capability to efficiently correlate user data from disparate systems into a fully contextualized view of user behavior.
- Identifying Noise in Data - Popular apps may experience large volumes of spam or bot traffic. Behavior logs from these “fake” users are often collected and analyzed alongside genuine user data. DevOps and product teams often lack the tools to identify and exclude these logs from user behavior analytics, resulting in misleading insights.
- Decoding Customer Sentiment - Customer sentiment analysis can inform targeted interventions like personalized promotions, offers, or nudges. However, delivering effective interventions requires decoding customer sentiment in real time and at scale, a capability that DevOps and product teams often lack.
Databricks Mosaic AI capabilities give enterprise DevOps and product teams the opportunity to overcome these challenges and accelerate insights from user behavior analysis with help from AI, ML, and LLMs. With Mosaic AI, DevOps and product teams can do things like:
- Build ML Models to Identify Bot Activity - DevOps and product teams can leverage Mosaic AI capabilities to train ML models that can analyze user behavior logs to distinguish between genuine users and bot activity.
- Craft ML Models to Predict Customer Churn - Using Mosaic AI capabilities, product teams can leverage user behavior data from past customers to build an ML model that predicts customer churn based on recent session data and recommends targeted interventions to boost customer retention.
- Use LLMs to Decode User Sentiment - With help from Databricks Mosaic AI, DevOps and product teams can centralize customer data in a unified data lakehouse, then use LLMs to analyze written feedback, reviews, support tickets, chatbot conversations, and other unstructured customer data. This analysis can be correlated with data from user sessions to identify positive/negative sentiment and help prioritize customer support or follow-up activities.
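To illustrate the bot identification use case above, the sketch below derives two simple features from a session's request timestamps (request rate and how regular the gaps between requests are) and applies fixed cutoffs. It is a hand-written heuristic standing in for a trained ML model; the function names, feature choices, and threshold values are all assumptions made up for this example.

```python
from statistics import pstdev


def session_features(timestamps):
    """Compute request rate and gap variability from request times given in seconds."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    duration = ts[-1] - ts[0] if len(ts) > 1 else 0.0
    rate = len(ts) / (duration / 60) if duration > 0 else 0.0
    gap_stddev = pstdev(gaps) if len(gaps) > 1 else 0.0
    return {"requests_per_minute": rate, "gap_stddev": gap_stddev}


def looks_like_bot(timestamps, max_rate=60.0, min_gap_stddev=0.5):
    """Flag sessions that are very fast or suspiciously evenly paced."""
    if len(timestamps) < 3:
        # Too little activity to judge either way.
        return False
    features = session_features(timestamps)
    return (
        features["requests_per_minute"] > max_rate
        or features["gap_stddev"] < min_gap_stddev
    )


# Hypothetical sessions: one perfectly paced, one with irregular human-like pauses.
bot_session = [0, 1, 2, 3, 4, 5]
human_session = [0, 12, 15, 47, 80, 130]

if __name__ == "__main__":
    print(looks_like_bot(bot_session), looks_like_bot(human_session))
```

A trained classifier would learn such cutoffs from labeled sessions and combine many more signals, but features like these are a common starting point for separating automated traffic from genuine users.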
The ChaosSearch / Databricks Partnership: Bringing AI/ML to Observability and Security
ChaosSearch is now a Databricks Technology Partner, bringing Databricks users centralized log analytics for observability and security as part of a seamless experience on the Databricks Data Intelligence Platform.
Databricks users can now centralize log and event data from throughout their IT and cloud infrastructure in the data lakehouse, then index the data with ChaosSearch in near real time to enable full-text search, Gen-AI, and SQL querying.
From there, Databricks users can leverage Mosaic AI capabilities with enterprise logs as training data to build and deploy AI, ML, and LLM solutions that support a variety of log analytics use cases.
Ready to learn more?
Read the solution brief, "Extend Your Databricks with ChaosSearch," to learn how you can activate AI and ML for log analytics use cases with ChaosSearch and Databricks.