AWS enables you to easily turn on logging for most services and have those logs quickly land in your Amazon S3 bucket. However, once the logs are in your bucket, how are you supposed to make sense of them to derive value?
Why is it hard to get insights into data?
AWS provides best-practices documentation that will point you to numerous services like Lambda, Glue, QuickSight, and Athena. These services are simply building blocks, and it’s up to the user to tie them together to gain any real insights or answers about specific data questions. And even when you do build them into a complete solution, you still can’t hunt and search across your data. For example, finding specific error codes or IP addresses is a challenge if you don’t know where to look. This is why tools like Elasticsearch and the ELK (Elasticsearch, Logstash, Kibana) Stack have exploded in popularity. With the ELK Stack, users can use Kibana, a powerful log analysis tool, to hunt, search, and aggregate their log and event data. However, in order to leverage these solutions, you have to move your data out of S3.
Why move logs out of Amazon S3?
Customers we speak with do so because of some of the inherent limitations of services like Athena. These customers have told us that while Athena can be useful if you know exactly what you are looking for, it does require you to bring your own visualization tool, or perhaps even build one yourself, to get the answers you need. To do any sort of time-trend analysis of your data, or even basic data visualizations and aggregations, you need to integrate Amazon QuickSight with your Athena data repository or set up third-party tools like Tableau. Even after all of this, customers say they want to be able to search across their logs, and that’s where Athena falls short. Athena only gives you SQL-style access to your S3 data.
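To make "SQL-style access" concrete, here is a minimal sketch of what asking Athena a question looks like from Python. The table name (`alb_logs`), the column names, and the helper functions are hypothetical, made up for illustration; the only real API used is boto3's Athena client and its `start_query_execution` call. Notice that you have to know the table, the column, and the exact value before you can ask anything.

```python
def build_status_query(table: str, status: int, limit: int = 100) -> str:
    """Build a SQL query for Athena. Table and column names are hypothetical."""
    # Very small guard so the table name cannot smuggle in extra SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT request_time, client_ip, status "
        f"FROM {table} "
        f"WHERE status = {int(status)} "
        "ORDER BY request_time DESC "
        f"LIMIT {int(limit)}"
    )


def run_on_athena(sql: str, database: str, output_location: str) -> str:
    """Submit the query with boto3 and return the query execution ID.

    Assumes AWS credentials are configured and that output_location is an
    S3 URI you own (Athena writes query results there).
    """
    import boto3  # imported lazily so the builder above works without AWS access

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Something like `build_status_query("alb_logs", 502)` works well when you already know you care about 502s in that one table. It is much less helpful when all you have is a fragment of an error message and no idea which log source it came from.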
We’ve also talked with customers who have taken the plunge and are using Logstash with the S3 input plugin to move logs out of S3 and into a format they can actually search. However, they quickly learn (and this is why they reach out to us) that when your data volume starts growing in Elasticsearch, it becomes extremely expensive in both capital and human resources.
Even if you outsource the operation of Elasticsearch to a hosted service like Amazon Elasticsearch Service, you’ll find that the main storage format, Lucene, was never designed for storing highly structured data. Lucene was designed to ADD structure to unstructured data. Think about indexing emails or other types of documents. In those cases, the underlying indices would always be much smaller than the source data. But with highly structured log data, Lucene wants to add MORE structure. So you end up in a situation where the indices are at best only a tiny fraction smaller than the source data. If you compare the Lucene indices to the source data compressed with something like GZIP, you’ll find the Lucene indices could be many times larger.
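The intuition behind that size gap can be sketched with nothing but the Python standard library. To be clear, this is not Lucene: it is a toy inverted index (every token mapped to the line numbers it appears on) serialized as JSON, and the log lines are synthetic data invented for the example. Real Lucene compresses its postings and also keeps other structures, so the actual numbers will differ. The sketch only shows the direction of the effect: indexing every token of already structured data adds bytes, while GZIP squeezes the redundancy out.

```python
import gzip
import json
from collections import defaultdict


def make_logs(n: int) -> list[str]:
    """Generate synthetic, highly structured access-log lines (made-up data)."""
    paths = ["/api/v1/users", "/api/v1/orders", "/health", "/login", "/static/app.js"]
    statuses = [200, 200, 200, 404, 500]
    return [
        f"2019-01-01T00:{(i // 60) % 60:02d}:{i % 60:02d}Z "
        f"10.0.{i % 7}.{i % 251} GET {paths[i % 5]} {statuses[i % 5]}"
        for i in range(n)
    ]


def toy_inverted_index(lines: list[str]) -> dict[str, list[int]]:
    """Map every whitespace-separated token to the line numbers containing it."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines):
        for token in set(line.split()):
            index[token].append(line_no)
    return dict(index)


def compare_sizes(n: int = 2000) -> dict[str, int]:
    """Return byte counts for the raw logs, the GZIP'd logs, and the toy index."""
    lines = make_logs(n)
    raw = "\n".join(lines).encode("utf-8")
    index = json.dumps(toy_inverted_index(lines)).encode("utf-8")
    return {
        "raw_bytes": len(raw),
        "gzip_bytes": len(gzip.compress(raw)),
        "toy_index_bytes": len(index),
    }


if __name__ == "__main__":
    print(compare_sizes())
```

On this synthetic data the GZIP'd logs come out far smaller than the toy index, because nearly every line contributes a unique timestamp token plus a posting for each of its other fields.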
Don’t move your data. Store everything. Ask anything.
This is where CHAOSSEARCH can help. We don’t want you to move your data out of your S3 buckets. Our fully managed service integrates with your S3 buckets, indexes the data, and writes the indices right back into your Amazon account. The indices are stored in a highly compressed state, which saves you money in the long run. Even though all your data has been compressed, it’s still in a fully searchable state. That highly efficient storage size (on par with GZIP), combined with full searchability, gives you the ability to get immediate answers to your questions and derive value from your data.
What’s even more powerful is that if you’re already an Elasticsearch or ELK user, there is nothing new that you or your team need to learn with CHAOSSEARCH. We’ve extended the Elasticsearch API on top of YOUR data on Amazon S3, so you can continue using the tools you already know, such as Kibana, to hunt, query, and visualize log and event data.
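For a sense of what "the tools you already know" means at the API level, here is a minimal sketch of a standard Elasticsearch `_search` request built in Python. The endpoint URL, the index name, and the field names (`client_ip`, `status`, `request_path`) are all hypothetical, and authentication is left out entirely. This post doesn’t spell out which parts of the Elasticsearch API are supported, so treat this as an illustration of the query style rather than a tested integration.

```python
import json
import urllib.request


def build_search_body(ip: str, status: int) -> dict:
    """Query DSL body: filter on an IP and a status code, then bucket by path.

    Field names are hypothetical; adjust them to match your own log schema.
    """
    return {
        "size": 20,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"client_ip": ip}},
                    {"term": {"status": status}},
                ]
            }
        },
        "aggs": {
            "top_paths": {"terms": {"field": "request_path", "size": 10}}
        },
    }


def search(endpoint: str, index: str, body: dict) -> dict:
    """POST the body to <endpoint>/<index>/_search and return the parsed JSON.

    The endpoint and index are placeholders; real deployments will also need
    credentials, which this sketch does not handle.
    """
    request = urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

This is the same kind of request Kibana issues behind the scenes when you filter on an IP address or an error code and chart the results.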
Interested in learning more about CHAOSSEARCH? We would love to talk with you and demo the platform.