Guest post by Tom McLaughlin, ServerlessOps
In an earlier post, I talked about how the Build versus Buy argument changes when you are doing serverless development or managing serverless infrastructure. Even if you are buying AWS services, you still need to build the glue between them and build the missing features you may require.
So while I built a serverless application to extract AWS cost data, I want to explain why I chose CHAOSSEARCH to perform my data analysis rather than building my own platform or using a managed Elasticsearch and Kibana stack.
What Was I Building?
As a part of the serverless world, I’ve been interested in a concept called FinDev. At a high level, it’s the tracing of capital flow through serverless systems. While tracking revenue is a long way off, we can already track cost as a first-class system metric. The data for serverless system cost is readily available through the standard AWS Cost and Usage Report, but analyzing and visualizing the data in a useful manner is outside the scope of the current AWS bill analysis tools.
The problem I wanted to understand and dive into with my account billing data was:
How would code changes and features impact the cost of a running system?
I started off by building a serverless application that uses AWS Lambda to normalize some of the billing report data and deposit it into S3. (AWS billing report data has some interesting idiosyncrasies that make tracking cost over time inaccurate unless those cases are handled.) Even a lightly used AWS account produced over 10,000 billing line items per day, all stored as separate objects in S3. This was fine for my early investigation with Amazon Athena, but I quickly outgrew that.
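To make the "one object per line item" layout concrete, here is a minimal sketch of how a single line item might be mapped to an S3 key and body. This is an illustration, not the application's actual code: the function name `line_item_to_object`, the `line-items` prefix, and the date-partitioned key layout are all assumptions, and the sketch skips the normalization step entirely. The column names follow the standard Cost and Usage Report naming.

```python
import json
from datetime import datetime
from typing import Dict, Tuple


def line_item_to_object(item: Dict[str, str], prefix: str = "line-items") -> Tuple[str, str]:
    """Return a (key, body) pair for storing one billing line item in S3.

    Hypothetical layout: <prefix>/<YYYY>/<MM>/<DD>/<line item id>.json
    """
    # Cost and Usage Report timestamps look like 2019-03-05T00:00:00Z.
    start = datetime.strptime(item["lineItem/UsageStartDate"], "%Y-%m-%dT%H:%M:%SZ")
    key = f"{prefix}/{start:%Y/%m/%d}/{item['identity/LineItemId']}.json"
    # Sorting keys keeps the stored JSON stable between runs.
    body = json.dumps(item, sort_keys=True)
    return key, body
```

The resulting pair could then be handed to whatever S3 client the function uses. Partitioning keys by date is one common choice because it keeps a day's line items under a single prefix.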
I knew I had to add a new feature to my serverless application for improved data analysis capability. I settled on Elasticsearch and Kibana due to some familiarity with them. My initial options were to either run my own Elasticsearch cluster or use the Amazon Elasticsearch Service. I had no desire to run EC2 instances on my own, so the Amazon Elasticsearch Service looked attractive.
I was also already familiar with CHAOSSEARCH and intrigued by the idea of using a low-cost analytics service on my existing billing data store. That gave me a third option.
So here were my options for analyzing my billing data further:
- Build and operate my own Elasticsearch cluster
- Buy an Elasticsearch cluster and build the glue to integrate it into my application
- Buy an analytics platform to analyze my data in place
Why I Chose CHAOSSEARCH
Let’s walk through the highlights of why I chose CHAOSSEARCH over alternatives, and some of the benefits.
The first win for CHAOSSEARCH was on integration. In particular, what work would I have to do to use the product with my dataset currently sitting in S3? I’ve already accumulated a large set of cost data, and it will keep growing as my serverless application continues to run daily.
If I were to operate my own Elasticsearch cluster or use a managed solution, I’d need to do two things.
- First, I’d need to build a new feature in my application to additionally ship data to my new Elasticsearch cluster. Writing a single record to Elasticsearch is easy… But writing tens to hundreds of thousands of records to Elasticsearch in unpredictable bursts is not. There’s not just development work, but also operational work required to make this a success.
- Second, I’d need to backfill my existing data in S3 into Elasticsearch. That’s more development and operations work to migrate data between systems. All of this adds development and operational cycles before I can get up and running and start analyzing my data.
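To give a sense of what shipping records in bursts involves, here is a minimal sketch that groups records into newline-delimited payloads in the format the Elasticsearch `_bulk` endpoint expects (an action line followed by the document, with a trailing newline). The function name `bulk_payloads`, the `billing` index name, and the batch size are assumptions for illustration. Actually sending each payload, and handling retries, rejected items, and backpressure, is where the real operational work lies and is not shown.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_payloads(records: Iterable[Dict], index: str, batch_size: int = 500) -> Iterator[str]:
    """Yield NDJSON request bodies for the Elasticsearch _bulk API."""
    lines: List[str] = []
    count = 0
    for record in records:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
        count += 1
        if count == batch_size:
            # The bulk body must end with a newline.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, partially filled batch.
        yield "\n".join(lines) + "\n"
```

Each yielded string would be sent as the body of a POST to `_bulk` with the `Content-Type: application/x-ndjson` header. The same generator could serve both the ongoing daily writes and the one-time backfill from S3.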
By contrast, CHAOSSEARCH allows me to use my existing data in S3 without moving it. No new application features or lengthy process of moving data into Elasticsearch. Instead, I integrate my AWS account with the CHAOSSEARCH platform, create an object group in CHAOSSEARCH, and explore my data.
I had my data ready to be analyzed through CHAOSSEARCH in less time than it took me to plan an Elasticsearch integration before even touching any code.
Single Source of Data Truth
If I had gone the route of running an Elasticsearch cluster or using a managed service, I’d also be dealing with duplication of data. If I ship data to Elasticsearch, I’d still need to store it someplace else, such as S3. Storing all my data in Elasticsearch would gradually become too expensive. As a result, it’s not uncommon for people to age data out of Elasticsearch and rely on S3 as long-term storage because of its cost and reliability. But why do that if you don’t have to?
Instead I can keep using the cheap storage of S3 as both my single source of data truth and for my analysis work. One of the key contributing factors to data growth that leads to management problems is data duplication. CHAOSSEARCH lets me avoid this duplication and the issues that come with it.
I want to scale my data, not Elasticsearch. One of the promises of serverless is that scalability is built in; I choose serverless approaches for my applications because I want to spend more time solving my problems and less time on tasks such as scaling.
Scaling Elasticsearch can become significant work. As your data grows, so does the operational work required to keep Elasticsearch fitting your needs. For managed Elasticsearch services, you pay a premium for that expertise. With CHAOSSEARCH, I wouldn’t have to worry about these issues. In the application that I’ve built, AWS handles the work of scaling S3 for me, and CHAOSSEARCH handles the scaling needed to analyze my growing data.
Finally, let’s mention how CHAOSSEARCH offers a familiar interface and user experience. The UI is Kibana-based! There was no new UI for me to learn. Instead, I could use my knowledge from previous Elasticsearch setups to immediately start combing through my data. This was what I needed: less time learning a new tool and more time analyzing my data for insight. Plus, the promise of broader Elasticsearch API support has me picturing ways I can get more out of the platform and, subsequently, my data.
What Makes a Good SaaS Buy Over AWS?
I opted to use CHAOSSEARCH and have been very happy with that decision. Most important to me was that I went from figuring out how to build an analytics capability into my serverless application to purchasing a product that analyzes my existing data in S3. As a result, I was quickly back to analyzing my data. That meant I was back to solving the problems inside my data and not the problem of how to analyze it.
A good SaaS product compared to an AWS service is more than just technology. It’s everything else around it: the onboarding experience, the documentation, the understanding of what features customers really need to solve their problems, the responsiveness to prospect and customer feedback, and more! If your problem involves analyzing large data sets in an easy and cost-effective way, don’t build an analytics platform before you have a look at CHAOSSEARCH.
Check out this video of Pete Cheslock, VP of Product at CHAOSSEARCH, diving into my billing data to identify anomalies.
If you would like to learn more, reach out to the CHAOSSEARCH team and they can show you how you can get insight into YOUR data on YOUR Amazon S3 account.