ChaosSearch Blog - Tips for Wrestling Your Data Chaos

Bringing Together Short & Long-Term Log Data with ELK Analytics – Part 2

Written by Les Yetton | May 10, 2018

Part 2: ELK + CHAOSSEARCH — The Future of Log and Event Analytics

Today’s post is a follow-up to last week’s Part 1, which gave an overview of ELK: The Good, the Bad and the Ugly. In this post we discuss how CHAOSSEARCH + Amazon S3 can be implemented next to ELK (or as an Elasticsearch replacement) to deliver a scalable, secure, cost-effective solution for long-term search and analytics on historical log and event data. Think ELK analytics.

CHAOSSEARCH Turns S3 into a Searchable Elastic Cluster

From a functionality and cost standpoint, ELK is great for managing real-time, time-series log data coming in from services and applications. Let’s call this HOT data.

The ELK stack, as we discussed in Part 1 of this series, is arguably the most popular open-source tool used today as a building block in a log management system. A building block? Yes. A complete solution? No. As we discussed, it is expensive to build and maintain, it is expensive to scale, and it is cost-prohibitive for anyone looking to retain data over time. And as I’m sure you know, there’s a lot of value that can be harvested from historical (WARM and cold) log and event data over time. That’s where CHAOSSEARCH comes in.

CHAOSSEARCH is a cloud analytics service built on AWS that extends the power of Elasticsearch and Kibana onto Amazon S3. Built from the ground up, CHAOSSEARCH is a data fabric that adds ELK Stack functionality to Amazon S3. It allows businesses to quickly derive insights from long-term log and event data stored in S3 via the Elasticsearch API and Kibana, with a dramatically reduced data footprint and cost.

The bottom line is that a single Elasticsearch cluster is cost-prohibitive when used as both hot and warm data stores.

Hot data should be separated from warm data because the two data stores have different requirements, such as query response time and data retention. Using CHAOSSEARCH, SaaS businesses can scale back the size and complexity of their ELK clusters; quickly index, search, and visualize data directly in S3; and extend the value of their data into months and years.

CHAOSSEARCH scales your data, not your infrastructure. As shown in the image below, log data is directed to a HOT Elasticsearch (ELK) cluster for real-time alerts and monitoring, while simultaneously streamed into CHAOSSEARCH + S3 for organization, preparation, and indexing for WARM historical analytics.
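To make the dual-path idea concrete, here is a minimal sketch of the fan-out step: every log event is delivered to both a hot sink and a long-term sink. The names `fan_out`, `hot_cluster`, and `s3_bucket` are hypothetical, and the two sinks are plain in-memory lists standing in for an Elasticsearch indexer and an S3 uploader. This is an illustration of the pattern, not how CHAOSSEARCH or any particular log shipper implements it.

```python
from typing import Callable, Dict, Iterable, List

LogEvent = Dict[str, str]
Sink = Callable[[LogEvent], None]


def fan_out(events: Iterable[LogEvent], sinks: List[Sink]) -> int:
    """Deliver every event to every sink and return the number of events seen."""
    count = 0
    for event in events:
        for sink in sinks:
            sink(event)
        count += 1
    return count


# Hypothetical stand-ins: a real pipeline would call an Elasticsearch
# indexer and an S3 uploader here instead of appending to lists.
hot_cluster: List[LogEvent] = []
s3_bucket: List[LogEvent] = []

events = [
    {"@timestamp": "2018-05-10T12:00:00Z", "message": "GET /index.html 200"},
    {"@timestamp": "2018-05-10T12:00:01Z", "message": "GET /missing 404"},
]

delivered = fan_out(events, [hot_cluster.append, s3_bucket.append])
print(f"delivered {delivered} events to {2} sinks")
```

Because each sink is just a callable, the hot path can keep a short retention window while the S3 path keeps everything, without the producer needing to know the difference.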

ELK + CHAOSSEARCH — The Future of Log and Event Analytics

As more businesses move their IT resources to cloud services like AWS, Azure, and GCP, scalable and secure logging solutions will become even more important. In these cloud environments, performance issues in both systems and applications can be difficult to isolate, particularly when systems are tasked with a heavy workload. Log management systems such as the ELK stack are a good solution for real-time monitoring and processing of operating system logs, NGINX and IIS server logs for technical SEO and web traffic analysis, application logs, and ELB and S3 logs on AWS.

But as data grows, so does cost. A typical ELK environment can cost upwards of $4,500/month to support a workload of 100 GB of data per day. Businesses want access to more data over longer time periods. They are looking for more comprehensive long-term business trends and the ability to investigate ongoing performance or security issues. However, due to cost and budget constraints, they are forced to prematurely delete or archive valuable data.
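A quick back-of-the-envelope calculation shows what those figures imply. The sketch below uses only the two numbers quoted above ($4,500/month and 100 GB/day); the 30-day month and the helper names are assumptions for illustration. Since the quoted cost is a lower bound ("upwards of"), the result is a floor rather than an exact price.

```python
MONTHLY_COST_USD = 4500.0  # figure quoted in the text (a lower bound)
DAILY_INGEST_GB = 100.0    # figure quoted in the text
DAYS_PER_MONTH = 30        # assumption: a 30-day month


def cost_per_gb_ingested(monthly_cost: float, daily_gb: float,
                         days: int = DAYS_PER_MONTH) -> float:
    """Monthly cost divided by the volume ingested in that month."""
    return monthly_cost / (daily_gb * days)


def raw_volume_gb(daily_gb: float, retention_days: int) -> float:
    """Raw (uncompressed, unreplicated) data accumulated over a retention window."""
    return daily_gb * retention_days


per_gb = cost_per_gb_ingested(MONTHLY_COST_USD, DAILY_INGEST_GB)
one_year_gb = raw_volume_gb(DAILY_INGEST_GB, 365)

print(f"${per_gb:.2f} per GB ingested")
print(f"{one_year_gb:,.0f} GB of raw data after one year")
```

At 100 GB per day, a year of retention means roughly 36.5 TB of raw data, which is why keeping all of it in a hot cluster gets expensive quickly.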

Rather than incur the potentially disastrous opportunity cost of deleting data permanently, many organizations choose to archive their historical data in other storage solutions, typically at a fraction of the cost. This seems like a valid solution until they try to gain insight from that data at a future date. Because this data is no longer within a structured store, engineering-intensive ETL processes must be used. Once this data is loaded into the new target analytics platform (e.g., a separate Elasticsearch cluster or a relational database), the data is duplicated and storage costs increase.

Optimal log management and analytics strategies combine ELK and CHAOSSEARCH as complementary technologies. The ELK stack is optimized for processing alerts on real-time, time-series data. CHAOSSEARCH turns S3 into a warm, searchable Elastic cluster for cost-effective analytics on historical data sets.

ELK + CHAOSSEARCH provides a balanced short- and long-term solution at a price point that you can sustain over time. CHAOSSEARCH leverages S3 cost economics and enables historical trend and machine learning analytics at a fraction of the cost of an ELK solution. It supports both relational analytics and text-based search from a single solution. With CHAOSSEARCH, log and event data can be organized, managed, indexed, and analyzed directly via REST-based S3 and Elasticsearch APIs, delivering value in minutes and enabling your DevOps teams, data engineers, and data analysts to be more productive.
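For readers who have not used the Elasticsearch API directly, here is a rough sketch of what a historical search request body can look like in the standard Elasticsearch Query DSL: a full-text `match` combined with a `range` filter on a timestamp. The field names `message` and `@timestamp` follow common Logstash conventions and are assumptions, as is the helper name `build_log_search`. Whether a given endpoint accepts this exact query depends on the service and index mapping, so treat it as an illustration only.

```python
import json
from typing import Any, Dict


def build_log_search(text: str, start: str, end: str, size: int = 10) -> Dict[str, Any]:
    """Build a Query DSL body: full-text match restricted to a time window."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the log line (assumed field name).
                "must": [{"match": {"message": text}}],
                # Non-scoring time filter (assumed field name).
                "filter": [{"range": {"@timestamp": {"gte": start, "lt": end}}}],
            }
        },
        # Newest matching events first.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


body = build_log_search("timeout", "2018-01-01T00:00:00Z", "2018-04-01T00:00:00Z")
payload = json.dumps(body, indent=2)
print(payload)
```

A body like this would typically be sent as a POST to an index's `_search` endpoint; the index name and host are left out here because they depend entirely on your own setup.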

Learn more about how to easily and cost-effectively access months and years of log and event data with CHAOSSEARCH at chaossearch.io!