To understand the value of logs, those countless digital records of hardware and software events, picture a big jigsaw puzzle: you put all the pieces together to make sense of the whole.
Every day the modern enterprise generates billions of logs, each capturing a user login, an application record change, a network service interruption, or the messages that users, applications, and services send to one another.
Data teams collect and normalize these logs, then use the resulting data to correlate events, describe patterns, and identify anomalies.
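As a rough illustration of that pipeline, the sketch below (Python; the log format, field names, and threshold are hypothetical, not taken from any particular tool) normalizes raw log lines into structured records and flags minutes whose event counts spike well above the average, a crude stand-in for the anomaly detection a real analytics engine would perform.

```python
import re
from collections import Counter
from datetime import datetime
from statistics import mean, stdev

# Hypothetical raw format: "<ISO timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)

def normalize(line: str):
    """Parse one raw log line into a structured record, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.fromisoformat(rec["ts"])
    return rec

def flag_anomalous_minutes(records: list, threshold: float = 3.0) -> list:
    """Flag minutes whose event count sits more than `threshold` std devs above the mean."""
    per_minute = Counter(r["ts"].strftime("%Y-%m-%dT%H:%M") for r in records)
    counts = list(per_minute.values())
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    return [minute for minute, c in per_minute.items()
            if sigma > 0 and (c - mu) / sigma > threshold]

if __name__ == "__main__":
    raw = [
        "2024-05-01T10:00:01 INFO auth user login succeeded",
        "2024-05-01T10:00:02 ERROR billing record change rejected",
        "2024-05-01T10:01:15 WARN network service interruption on link-3",
    ]
    records = [r for r in (normalize(line) for line in raw) if r]
    print(records)
    print(flag_anomalous_minutes(records))
```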
This log analytics process helps control IT operations, reduce security risk, and enable compliance.
Enterprises need log analytics to monitor and manage their fast-digitizing businesses. But rising log volumes can overwhelm their supporting architectures, most notably the ELK stack (comprising the open-source Elasticsearch, Logstash, and Kibana). Enterprise data teams need more efficient ways to index, search, and query all those log files, especially to support AI/ML algorithms. One answer is a new, lightweight index, which might also boost scale and performance for many additional workloads.
There are four primary use cases for log analytics: ITOps, DevOps, security and customer analytics.
These use cases have a common problem: processing data at scale.
Perhaps the greatest chokepoint is indexing.
As log volumes rise, they can inflate data indexes, which in turn drives up processing overhead and chokes search and query workloads. For example, the ELK stack runs on the Apache Lucene search engine, whose “inverted index” lacks (among other things) the compression needed to properly handle larger workloads. Lucene users also must spend time setting up—and scaling—their clusters, schemas, and shards.
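To see why indexing becomes the chokepoint, consider a minimal inverted index, the data structure Lucene builds: every unique term maps to a postings list of the documents that contain it. The toy Python sketch below is an illustration of the concept, not Lucene's actual implementation; in practice the structure grows with both log volume and vocabulary, which is why compression and careful shard planning matter so much.

```python
from collections import defaultdict

def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of document IDs containing it (a postings list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict, *terms: str) -> set:
    """Return IDs of documents containing all query terms (a simple AND query)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

logs = {
    1: "ERROR auth login failed for user alice",
    2: "INFO auth login succeeded for user bob",
    3: "ERROR network timeout contacting billing service",
}
idx = build_inverted_index(logs)
print(search(idx, "ERROR", "auth"))  # -> {1}
```

Every new document and every new unique term adds entries to that mapping, so index size, and the compute needed to build and maintain it, tends to balloon as log volumes rise.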
Figure 1. How Event and Message Logs Overload the ELK Stack
All of this hurts productivity, impairs performance, and pushes cloud compute costs through the roof.
To meet SLAs and budget requirements, many enterprises are forced to scale back by shortening their log retention periods.
This in turn makes analytics output less specific and potentially less accurate. In short, log analytics is yet another case of logjammed data.
ITOps, DevOps and SecOps teams have a few options to break the logjam.
New cloud-based platforms represent one of those options. They compress indexed data rapidly and dramatically, which is critical given the huge number and small size of individual logs. Users can automatically discover, normalize, and catalog all those log files, and assemble metadata to improve query planning, all with a smaller footprint than predecessors such as Lucene. The log data remains in place, yet users query it through a single logical view with familiar visualization tools they already know (such as Kibana, via open APIs).
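Because the query surface stays Elasticsearch-compatible (that is how Kibana keeps working), existing client code continues to run largely unchanged. The sketch below is a minimal example in Python using the requests library and the standard Elasticsearch query DSL; the endpoint URL and index view name are placeholders, not a documented ChaosSearch configuration.

```python
import requests

# Placeholder endpoint and logical view name; substitute your platform's
# Elasticsearch-compatible URL and the view you created over your log data.
SEARCH_URL = "https://logs.example.com/app-logs-view/_search"

# Standard Elasticsearch query DSL: errors from the last 24 hours, newest first.
query = {
    "size": 20,
    "query": {
        "bool": {
            "must": [{"match": {"level": "ERROR"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-24h"}}}],
        }
    },
    "sort": [{"@timestamp": {"order": "desc"}}],
}

resp = requests.post(SEARCH_URL, json=query, timeout=30)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("@timestamp"), hit["_source"].get("message"))
```

Any Elasticsearch client, or Kibana itself, could issue the same query against the logical view.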
Using a new solution like ChaosSearch, enterprise data teams can increase the scope and scale of their log analytics, which makes their ITOps, DevOps and security initiatives more effective.
Enterprise data teams should watch this space. Offerings like ChaosSearch will continue to make this data easier to analyze.
This log analytics case study underscores three guiding principles that apply across enterprise analytics.
First, in a volatile world, find ways to cost-effectively process and analyze as many data points as possible, including historical data. Last month’s logs might help identify the security threat that reappears next month.
Second, build architectures that minimize cloud compute cycles wherever possible to avoid runaway costs.
Third, and perhaps most importantly, seek out technologies that are fit for purpose. In this case, when you have a puzzle with millions of pieces, you need a fast and simple way to index the pieces.