Identify Anomalies in your AWS CloudTrail Data

Released in 2013, AWS CloudTrail is an Amazon Web Services (AWS) service that records the API calls made in your AWS account, whether they come from the console, the CLI, the SDKs, or other AWS services. CloudTrail gives you deep visibility into the activity in your account, so you can see exactly who did what, and when. You can use CloudTrail logs both to audit user access and to troubleshoot operational issues. CloudTrail delivers the first copy of management events at no charge, but because the log files are stored in an S3 bucket in your own account, standard S3 charges apply.

CloudTrail is only an API logging tool. Apart from the Event history page in the console, which lets you run basic searches over the last 90 days of events, it has no native way to analyze the data. To visualize usage trends with native AWS services, you would need to load the data into Redshift, or query it with Athena and build dashboards and other visualizations in QuickSight. To run free-form queries on your data, such as wildcard searches across fields, you would need to ingest it into an Elasticsearch cluster.

Elasticsearch is a powerful distributed search engine, but consuming and indexing this data involves a lot of technical complexity. The biggest complaint we hear from customers trying to get value out of their CloudTrail data is the time it takes to build a schema for it. CloudTrail data is notoriously sparse, with hundreds of fields that each need their own mapping. And if you make a mistake when creating the index, you have to spend time and effort updating the schema and reindexing all of your data.

CHAOSSEARCH is the first technology that turns your data on Amazon S3 into a fully searchable cluster, with support for the Elasticsearch API and a fully integrated Kibana interface. Once you get started with CHAOSSEARCH and connect your account through a read-only IAM role, you can create a Virtual Bucket that groups all of your CloudTrail data together for indexing.

CHAOSSEARCH will identify this data as gzip-compressed JSON. If you want the data indexed continuously as AWS writes new logs to your S3 bucket, you can enable SQS notifications, which tell CHAOSSEARCH when new objects are available in the bucket for indexing.
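To make the moving parts concrete, here is a minimal sketch of what any indexer has to do with these notifications and files. It assumes the S3 event notifications are sent straight to an SQS queue (not wrapped in an SNS envelope) and that each CloudTrail log file is a gzip-compressed JSON document with a top-level Records array. The function names are hypothetical illustrations, not CHAOSSEARCH internals.

```python
import gzip
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def objects_from_s3_notification(message_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for new objects in one S3 event notification."""
    event = json.loads(message_body)
    pairs = []
    # S3 test events ("s3:TestEvent") have no Records array and yield nothing.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def read_cloudtrail_records(gz_bytes: bytes) -> List[Dict[str, Any]]:
    """Decompress one CloudTrail log file and return its list of events."""
    document = json.loads(gzip.decompress(gz_bytes))
    return document.get("Records", [])
```

In a real pipeline the bytes would come from an S3 GetObject call on each (bucket, key) pair; that step is left out here so the sketch runs without AWS credentials.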

You won’t need to spend any time creating an index schema or mappings for the data with CHAOSSEARCH: we automatically identify which fields are strings, integers, or time values. Because we use a revolutionary Schema on Read approach to data indexing, you can modify the schema at any time without ever reindexing your data.
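The idea behind schema on read can be shown with a small, hypothetical sketch: types are guessed from raw values, and the chosen schema is applied only when a record is read, so changing a field's type never touches the stored JSON. This is an illustration of the general technique under those assumptions, not CHAOSSEARCH's actual inference logic; the timestamp format matches CloudTrail's eventTime values.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict

# CloudTrail writes eventTime values like "2019-04-28T17:05:12Z".
CLOUDTRAIL_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    return datetime.strptime(value, CLOUDTRAIL_TIME_FORMAT).replace(tzinfo=timezone.utc)


def infer_type(value: Any) -> str:
    """Guess a field type from one raw JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            parse_timestamp(value)
            return "timestamp"
        except ValueError:
            return "string"
    return "object"  # nested dicts and lists


# How each schema type is applied when a field is read.
CASTS: Dict[str, Callable[[Any], Any]] = {
    "string": str,
    "integer": int,
    "float": float,
    "timestamp": parse_timestamp,
}


def read_with_schema(raw_line: str, schema: Dict[str, str]) -> Dict[str, Any]:
    """Apply the schema at read time; the stored raw JSON line is never rewritten."""
    record = json.loads(raw_line)
    return {
        field: CASTS[type_name](record[field])
        for field, type_name in schema.items()
        if record.get(field) is not None
    }
```

Reading the same raw line first with {"eventTime": "string"} and then with {"eventTime": "timestamp"} returns a plain string and then a datetime, with no reindexing step in between.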

After indexing, the data is available in our integrated Kibana interface, where you can see that 455 separate fields have been indexed from this very sparse dataset. Everything looks and feels like a normal Kibana interface, except that all of this data lives in your own Amazon S3 infrastructure.
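Why does CloudTrail produce so many fields? Nested objects such as userIdentity and requestParameters vary by API call, so flattening them into dotted paths yields many field names that each appear in only a fraction of events. The sketch below, a hypothetical helper assuming records are already parsed into Python dicts, counts distinct field paths and how sparse each one is. It flattens arrays the way Elasticsearch does, giving every array element the same path.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs, e.g. ("userIdentity.type", "Root")."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # Every element of an array shares the same field path.
        for item in value:
            yield from flatten(item, prefix)
    else:
        yield prefix, value


def field_coverage(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Return, for each distinct field path, the fraction of records containing it."""
    counts: Counter = Counter()
    total = 0
    for record in records:
        total += 1
        # Count each path once per record, even if several array items repeat it.
        counts.update({path for path, _ in flatten(record)})
    return {path: n / total for path, n in counts.items()} if total else {}
```

The number of keys in the returned dict is the distinct field count, and values well below 1.0 mark the sparse fields that make hand-written mappings tedious.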

From here, we can navigate to the Discover screen to start analyzing API usage over time. Because all the data lives in your Amazon S3 account, you can cost-effectively retain months or years of log and event data. In the event of an operational issue or a security incident, you can always go back to that data, no matter how long ago the event occurred.

In this scenario, I want to analyze my Amazon S3 API calls for any potentially anomalous activity. I can immediately run an aggregation over all events whose source is Amazon S3, that is, events with an eventSource of s3.amazonaws.com.
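Conceptually, that aggregation is a count of eventName values over S3 events. A minimal sketch in plain Python, assuming the CloudTrail records have already been parsed into dicts (the function name is hypothetical):

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def s3_api_call_counts(records: Iterable[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Count Amazon S3 API calls by eventName, most frequent first."""
    counts = Counter(
        record.get("eventName", "unknown")
        for record in records
        if record.get("eventSource") == "s3.amazonaws.com"
    )
    return counts.most_common()
```

Reversing the result (counts[::-1]) puts the rarest calls first, which is the view used for spotting anomalies.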

Often you will want to order by the most frequently used API calls, but in this case I want to investigate the least common API calls by type. When I adjust my query, I can see two DeleteBucket API calls. Let’s keep digging to see which buckets were deleted.
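Because CHAOSSEARCH exposes the Elasticsearch API, the same "least common first" view can in principle be expressed as a terms aggregation ordered by ascending count. The sketch below only builds the request body. It assumes the fields are named eventSource and eventName and are aggregatable as-is; depending on the mapping you may need eventName.keyword instead. Note that the Elasticsearch documentation discourages ascending _count ordering because counts can be inaccurate across shards, and it offers a rare_terms aggregation for this purpose; whether CHAOSSEARCH supports rare_terms is not covered here.

```python
import json
from typing import Any, Dict


def least_common_s3_calls_query(size: int = 10) -> Dict[str, Any]:
    """Build an Elasticsearch search body listing the rarest S3 API calls first."""
    return {
        "size": 0,  # return only aggregation buckets, not individual hits
        "query": {
            "bool": {"filter": [{"term": {"eventSource": "s3.amazonaws.com"}}]}
        },
        "aggs": {
            "by_event_name": {
                "terms": {
                    "field": "eventName",  # may need "eventName.keyword" depending on mapping
                    "size": size,
                    "order": {"_count": "asc"},  # least common first
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(least_common_s3_calls_query(), indent=2))
```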

I can use the native Kibana tools to pin the filter and go back to our raw event view to identify which buckets were deleted.
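In the raw events, the deleted bucket's name is in requestParameters.bucketName and the caller is in userIdentity.arn. A minimal sketch of the same lookup over parsed records (the helper name and the sample data in the test are hypothetical):

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def deleted_buckets(
    records: Iterable[Dict[str, Any]],
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (eventTime, bucketName, caller ARN) for each DeleteBucket call, oldest first."""
    rows = []
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        if record.get("eventName") != "DeleteBucket":
            continue
        params = record.get("requestParameters") or {}
        identity = record.get("userIdentity") or {}
        rows.append((record.get("eventTime", ""), params.get("bucketName"), identity.get("arn")))
    # ISO-8601 UTC timestamps sort chronologically as plain strings.
    return sorted(rows, key=lambda row: row[0])
```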

In my case, I see that a bucket holding logs for my blog was deleted, as was the bucket where the CHAOSSEARCH indexes were stored.

Let’s now continue diving into my CloudTrail data to see if we can identify any anomalous user console logins to the AWS platform.

When I search for the ConsoleLogin event name, I can see that most console logins were made by my IAM user “petecheslock”, but there were also 5 logins by the “root” user. I can’t think of any reason I should be logging in as the root user of my AWS account, so let’s identify when these logins occurred and what activity was taken.
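In CloudTrail records, root activity is marked by userIdentity.type being "Root", while IAM users carry a userIdentity.userName. A minimal sketch that reproduces the per-identity login breakdown over parsed records (the helper name and the test's sample users are hypothetical):

```python
from collections import Counter
from typing import Any, Dict, Iterable


def console_logins_by_identity(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count ConsoleLogin events per identity, reporting the root user as "root"."""
    counts: Counter = Counter()
    for record in records:
        if record.get("eventName") != "ConsoleLogin":
            continue
        identity = record.get("userIdentity") or {}
        if identity.get("type") == "Root":
            counts["root"] += 1
        else:
            # IAM users carry a userName; fall back to the ARN for other identity types.
            counts[identity.get("userName") or identity.get("arn") or "unknown"] += 1
    return counts
```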

You can see that those 5 logins occurred at random times over the last 6 months, but I’m most interested in the most recent one: what did the root user do around the end of April?

By adjusting my query and time scale, I can see that the root user was used to redeem an AWS promo code for my account.
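Narrowing the time scale amounts to finding the latest root ConsoleLogin and listing root events in a window after it. A minimal sketch, assuming parsed records and a hypothetical one-hour window (widen it as needed); it reports eventName values only and makes no assumption about which event a promo code redemption produces.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional, Tuple


def parse_event_time(value: str) -> datetime:
    """Parse a CloudTrail eventTime such as "2019-04-28T17:05:12Z" as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def root_activity_after_last_login(
    records: List[Dict[str, Any]], window: timedelta = timedelta(hours=1)
) -> List[Tuple[datetime, Optional[str]]]:
    """List root-user events from the most recent root console login to login + window."""
    root_events = [r for r in records if (r.get("userIdentity") or {}).get("type") == "Root"]
    logins = [
        parse_event_time(r["eventTime"])
        for r in root_events
        if r.get("eventName") == "ConsoleLogin"
    ]
    if not logins:
        return []
    start = max(logins)  # the most recent root login
    end = start + window
    hits = []
    for r in root_events:
        ts = parse_event_time(r["eventTime"])
        if start <= ts <= end:
            hits.append((ts, r.get("eventName")))
    return sorted(hits, key=lambda hit: hit[0])
```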

This entire process, from raw data in my AWS account to actionable insights, took only minutes after I first set up my AWS account with CHAOSSEARCH. I didn’t have to spend any time deploying, sizing, or sharding a database like Elasticsearch, or creating schemas and mappings for my data. I can leave all my data in my own AWS account, let CHAOSSEARCH index it, and have the indexed data written back to my Amazon S3 buckets. Reach out today and get set up in minutes to start answering your CloudTrail data questions.