Immediate Insights In-Place: ELB Logs on S3

Amazon offers a couple of powerful load balancers that you can deploy within AWS. The Application Load Balancer (ALB) and the Network Load Balancer (NLB) are the current generation of its Elastic Load Balancing (ELB) service, which aims to provide customers with high-performance load balancing and application delivery. Load balancers can act as the “front door” to your website or customer-facing application, and being able to capture and query log data for these services is critical for the long-term support and overall happiness of your customers. You never want your customers to be the ones monitoring your site, and if you can’t quickly and efficiently debug web applications, you are flying blind. ELB services allow you to log all requests to buckets in your Amazon S3 account, but what do you do when you actually want to ask questions and get answers from your data?

You could spend valuable engineering time provisioning an ELK Stack, but as your data volume grows, the cost and management overhead can easily overwhelm many companies. Alternatively, you could use Amazon Athena to bring SQL-style queries to your log data, but once you reach terabyte-scale data volumes, you can quickly get to a point where costs are measured in hundreds of dollars PER QUERY. And since Athena lacks any native visualization, you are still on the hook to integrate third-party tools before you can start getting answers to your questions.

This is where CHAOSSEARCH comes in. Don’t spend time moving your ELB log data out of your Amazon S3 buckets into ELK; it can take about 150 minutes to download 1 TB of data over a 1 Gbps connection. Also, don’t waste time building schemas and configuring data visualization with Athena. I will show you how you can use the power of CHAOSSEARCH to go from raw ELB log data to in-depth answers, all within minutes. No Elasticsearch, no data movement: CHAOSSEARCH integrates with the data in your S3 bucket and writes its indexes back into your bucket, so the data is always under your control.
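To see where a figure of roughly 150 minutes comes from, here is a back-of-the-envelope sketch. It assumes 1 TB means 10^12 bytes and a 1 Gbps link; the 90% efficiency factor is an illustrative assumption standing in for protocol overhead, not a measured value.

```python
def transfer_minutes(size_bytes: float, link_bits_per_second: float,
                     efficiency: float = 1.0) -> float:
    """Minutes needed to move size_bytes over a link running at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 60


ONE_TB = 1e12    # bytes (decimal terabyte, an assumption)
ONE_GBPS = 1e9   # bits per second

# Theoretical best case: the link is fully saturated the whole time.
ideal = transfer_minutes(ONE_TB, ONE_GBPS)

# Assumed 90% effective throughput to account for overhead.
realistic = transfer_minutes(ONE_TB, ONE_GBPS, efficiency=0.9)

print(f"ideal: {ideal:.0f} min, at 90% efficiency: {realistic:.0f} min")
```

The best case works out to a little over two hours, and any real-world overhead pushes it toward the 150-minute mark.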

In my CHAOSSEARCH account, I have data from many sources streaming into a single bucket, so I’m going to create an “Object Group”. An Object Group looks like a virtual S3 bucket in the CHAOSSEARCH world, and it lets you quickly catalog and categorize your data into logical groups. We can target specific prefixes, or use a regex if our data is spread across many different prefixes. In my case, I’ll use a regex to group together all my ELB logs.
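As a rough illustration of what such a regex might look like, the sketch below filters a list of S3 keys down to ELB access logs. It relies on the standard key layout AWS uses when delivering ELB logs (an AWSLogs/account-id/elasticloadbalancing/region/yyyy/mm/dd/ path). The pattern, the helper name, and the sample keys are all hypothetical, and the regex dialect the Object Group form accepts may differ from Python's.

```python
import re

# Matches ELB access log keys under any optional prefix.
# ALB logs end in .log.gz; Classic ELB logs end in .log.
ELB_KEY_PATTERN = re.compile(
    r"^(?:.*/)?AWSLogs/\d{12}/elasticloadbalancing/"
    r"[a-z0-9-]+/\d{4}/\d{2}/\d{2}/.+\.log(?:\.gz)?$"
)


def select_elb_log_keys(keys):
    """Return only the keys that look like ELB access log objects."""
    return [key for key in keys if ELB_KEY_PATTERN.match(key)]


# Invented sample keys for a bucket that mixes several log sources.
sample_keys = [
    "prod/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2019/03/14/"
    "123456789012_elasticloadbalancing_us-east-1_app.my-alb.abc123_"
    "20190314T2355Z_10.0.0.1_x1y2z3.log.gz",
    "AWSLogs/123456789012/elasticloadbalancing/us-west-2/2019/03/14/"
    "123456789012_elasticloadbalancing_us-west-2_my-classic-elb_"
    "20190314T2355Z_10.0.0.2_a1b2c3.log",
    "AWSLogs/123456789012/CloudTrail/us-east-1/2019/03/14/trail.json.gz",
    "app-logs/nginx/access.log",
]

elb_keys = select_elb_log_keys(sample_keys)
print(len(elb_keys), "of", len(sample_keys), "keys are ELB logs")
```

Anchoring on the `elasticloadbalancing` path segment rather than on a bucket prefix is what lets one pattern pick up load balancer logs wherever they land in the bucket.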

A great time-saving feature of the CHAOSSEARCH platform is the ability to identify and parse your semi-structured log data automatically. With JSON or CSV data, we can quickly and easily parse the dataset because the structure is part of the event. But with semi-structured log data, engineers often need to write the line parsing themselves. We have streamlined this process, providing you with quick, automatic regex parsing for many types of generic log datasets.
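For a sense of the hand-written line parsing that this replaces, here is a minimal sketch of a regex parser for a Classic Load Balancer access log entry. The field names follow the AWS documentation for that format. The parser is only an illustration and not how CHAOSSEARCH parses logs internally, and it makes the simplifying assumption that quoted fields contain no embedded double quotes.

```python
import re

# One named group per space-separated field of a Classic ELB log entry.
ELB_LINE = re.compile(
    r'^(?P<timestamp>\S+) (?P<elb>\S+) (?P<client>\S+) (?P<backend>\S+) '
    r'(?P<request_processing_time>\S+) (?P<backend_processing_time>\S+) '
    r'(?P<response_processing_time>\S+) (?P<elb_status_code>\S+) '
    r'(?P<backend_status_code>\S+) (?P<received_bytes>\d+) (?P<sent_bytes>\d+) '
    r'"(?P<request>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<ssl_cipher>\S+) (?P<ssl_protocol>\S+)$'
)


def parse_elb_line(line: str):
    """Return a dict of fields, or None if the line does not match the format."""
    match = ELB_LINE.match(line.strip())
    return match.groupdict() if match else None


# Example entry in the Classic ELB format, with placeholder values.
sample_line = (
    '2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
    '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -'
)

parsed = parse_elb_line(sample_line)
print(parsed["backend_status_code"], parsed["request"])
```

All values come back as strings; converting status codes and byte counts to integers is left out to keep the sketch short.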

Another huge time saver for technical operators is CHAOSSEARCH’s ability to create your schema and mappings for you automatically. We provide a revolutionary new technology for schema-on-read, which both applies a schema to your data automatically and lets you change that schema without ever having to reindex the data.

CHAOSSEARCH can process both “on-demand” datasets and “real-time” data indexing. If you are continually streaming data into your S3 bucket via Logstash, Fluentd, or even Amazon Kinesis Data Firehose, we can automatically get notified of each S3 PUT request and index the data in minutes. You can also use our recently announced CloudFormation templates, which automatically set up all of the IAM roles and the S3-to-SQS-to-CHAOSSEARCH notification path in one click.
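To make the notification flow a little more concrete, the sketch below shows what a consumer of S3 event notifications delivered through SQS typically has to do: decode the message body and pull out the bucket and key of each newly created object. The message layout is the standard S3 event notification structure. The function name and sample message are hypothetical, and this is not a description of how CHAOSSEARCH consumes these events.

```python
import json
from urllib.parse import unquote_plus


def objects_from_s3_event(message_body: str):
    """Return (bucket, key) pairs for every object-created record in the message."""
    event = json.loads(message_body)
    created = []
    for record in event.get("Records", []):
        # Skip deletes and other event types.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in notifications, so decode them.
        created.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return created


# Invented message body shaped like an S3 event notification.
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "elb/2019/03/14/my+file.log"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "elb/2019/03/13/old.log"},
            },
        },
    ]
})

new_objects = objects_from_s3_event(sample_body)
print(new_objects)
```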

After indexing, we can navigate directly to our fully integrated Kibana interface, where indexed object groups now appear just like Elasticsearch indices.

Now we can start searching and visualizing our ELB logs just like we would in an ELK cluster, without the complexity and expense of a hot Elasticsearch cluster. CHAOSSEARCH also fully indexes ALL fields in the documents, which means you no longer need to choose between storage size and which fields to index.

CHAOSSEARCH indexes are often about 25% of the size of the equivalent Lucene index on Elasticsearch.

Now that I have this data ready to query, I want to understand the rate of non-200 responses in my load balancer logs. I can navigate to the “Visualize” tab and create a pie chart showing all the non-200 backend status codes over the last 7 days.
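The aggregation behind that pie chart is simple to state in code. The sketch below counts non-200 backend status codes and computes each code's share of the failures. The records are invented sample data and do not reflect the logs in this walkthrough, and the `backend_status_code` field name assumes Classic ELB style entries.

```python
from collections import Counter


def non_200_breakdown(records):
    """Map each non-200 backend status code to (count, share of all non-200 responses)."""
    counts = Counter(
        r["backend_status_code"]
        for r in records
        if r["backend_status_code"] != "200"
    )
    total = sum(counts.values())
    return {code: (n, n / total) for code, n in counts.items()}


# Invented sample records, already parsed into dicts.
sample_records = [
    {"backend_status_code": "200"},
    {"backend_status_code": "404"},
    {"backend_status_code": "502"},
    {"backend_status_code": "404"},
    {"backend_status_code": "200"},
    {"backend_status_code": "500"},
]

breakdown = non_200_breakdown(sample_records)
for code, (count, share) in sorted(breakdown.items()):
    print(f"{code}: {count} ({share:.0%})")
```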

From here, I want to visualize the frequency of each non-200 status code over time. Did all the failures happen on the same day, or were they spread more randomly across the week?

As you can see, these errors were distributed somewhat randomly across the last 7 days. Let’s now see if we can find out WHICH endpoints these failures were happening on. Was it a single faulty endpoint, or were a large number of locations in our application failing?
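Outside of Kibana, the same question can be sketched as a small grouping step: take the URL path from each failed request and count failures per path. The `request` field layout (method, URL, protocol) follows the ELB access log format. The helper names and the sample records are hypothetical and unrelated to the data shown in this walkthrough.

```python
from collections import Counter
from urllib.parse import urlsplit


def endpoint_of(request: str) -> str:
    """Extract the URL path from a 'METHOD URL PROTOCOL' request string."""
    parts = request.split(" ")
    if len(parts) != 3:
        return "<unparsed>"
    # Drop the scheme, host, and query string so requests group by path.
    return urlsplit(parts[1]).path or "/"


def failing_endpoints(records, top_n=5):
    """Return the top_n endpoints ranked by number of non-200 backend responses."""
    counts = Counter(
        endpoint_of(r["request"])
        for r in records
        if r["backend_status_code"] != "200"
    )
    return counts.most_common(top_n)


# Invented sample records, already parsed into dicts.
sample_records = [
    {"request": "GET http://www.example.com:80/api/orders HTTP/1.1",
     "backend_status_code": "502"},
    {"request": "GET http://www.example.com:80/api/orders?id=2 HTTP/1.1",
     "backend_status_code": "504"},
    {"request": "POST http://www.example.com:80/api/login HTTP/1.1",
     "backend_status_code": "500"},
    {"request": "GET http://www.example.com:80/ HTTP/1.1",
     "backend_status_code": "200"},
]

top = failing_endpoints(sample_records)
print(top)
```

Stripping the query string matters here: without it, every distinct `id` value would count as a separate endpoint and hide the pattern.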

What we’ve learned is that no single endpoint accounts for all the failures. But that doesn’t mean we can’t keep diving into this dataset. Because CHAOSSEARCH provides customers with unlimited retention of their log data, we can expand the time horizon of this query. Let’s dive into all the failed ELB endpoints, broken out by status code, for the past YEAR.

Now you can see the top endpoints that were failing across all of my ELB requests for the last 12 months of log data. I have now unlocked the value of my data, and since I don’t need to worry about data storage and retention requirements anymore, I’m able to keep asking larger and broader questions across months and years. Imagine being able to forecast your web application’s future scale far more accurately because you can finally visualize seasonality within your log data. One CHAOSSEARCH customer was able to conclusively identify, debug, and fix a bug in their application stack because they were able to create detailed visualizations of user behavior and activity across the last 3 years of their application logs.

What is most amazing to me, as a former technical operator, is that I was able to go from raw data to deep insights and intelligence all within MINUTES of setting up my CHAOSSEARCH account.

What actionable insights and intelligence are you missing out on because you are forced to delete your log and event data due to the increasing cost and complexity of your ELK stack? Reach out today so we can show you how quickly and easily you can get answers from your log and event data.