ChaosSearch Blog - Tips for Wrestling Your Data Chaos

3 Straightforward Pros and Cons of Datadog for Log Analytics

Written by Dave Armlin | Jan 4, 2024

Observability is a key pillar for today’s cloud-native companies. Cloud elasticity and the emergence of microservices architectures allow these companies to build massively scalable systems, but they also exponentially increase the complexity of IT environments.

At the same time, the vast amounts of machine-generated data these systems create are crucial to an ever-greater set of stakeholders: SRE/DevOps teams and developers for monitoring and troubleshooting, SecOps for threat hunting, and product and data science teams for A/B testing and growth analysis.

This has led to the emergence of observability - the ability to measure a system’s internal state and health based on the telemetry data it generates (i.e., logs, metrics and traces), with Datadog being a tool of choice.

However, while centralizing all telemetry in a single platform works well initially, it creates significant challenges at scale. This is because the underlying data technologies for log analysis and monitoring are fundamentally different, often leading to ballooning costs, reduced data retention, increased operational burden and limited ability to answer relevant analytics questions.

This blog post explores the most important pros and cons of leveraging Datadog for log analytics. We’ll highlight the key features and benefits that have driven Datadog adoption, along with the critical drawbacks that lead organizations to choose additional log management and log analytics solutions.

 

What does Datadog do?

Datadog is an infrastructure monitoring and observability platform primarily used by cloud-native companies. Its features include real user monitoring (RUM), application performance monitoring (APM), security monitoring and log management.

Datadog allows customers to ingest all metrics, traces and logs across applications, infrastructure and third-party services, and to monitor those systems from a single platform, which is why it is popular among fast-growing companies.

Common use cases for Datadog include:

  • Monitoring automation for DevOps
  • Shift-left testing
  • Real-time business intelligence
  • Security analytics
  • Digital experience monitoring
  • And more.

Datadog Pros and Cons for Log Analytics

Now that we’ve reviewed the primary ways to use Datadog, let’s focus on some other questions. What does Datadog do well? And what are the cons of depending fully on Datadog for log analysis in cloud-based environments?

 

Datadog Pros

1. Cloud-native startups love it

Datadog made a name for itself as an infrastructure and application monitoring platform for cloud-native startups. As these startups grew, many stuck with the service, even as they ran into data ingestion and retention limitations: Datadog is excellent at detecting issues, yet finding the root cause is far more complex.

2. Powerful and configurable UI

Many Datadog users love the clean user interface and the out-of-the-box dashboards within the platform. This single pane of glass is useful for visualizing the entire system. Drag-and-drop widgets let you create custom views without having to code, and an array of visualization tools allows you to see data in a variety of formats and easily generate reports.

3. Easy to get up and running

Datadog is simple to set up. You can install and configure the Datadog agent quickly and connect external services via API integrations. However, once you dive deeper into the log analytics use case, you may find that the ingestion, retention and “rehydration” process becomes far more costly and complex than you want to manage.
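For example, turning on log collection in the Datadog agent is a small configuration change. A minimal sketch, assuming a standard Agent v7 install; the application name, file path and source tag below are illustrative, not from this post:

```yaml
# datadog.yaml -- enable log collection globally (it is off by default)
logs_enabled: true

# conf.d/myapp.d/conf.yaml -- tail a hypothetical application log file
# ("myapp", the path, and the source tag are illustrative examples)
logs:
  - type: file
    path: /var/log/myapp/app.log
    service: myapp
    source: python
```

After editing these files, you would restart the agent so it picks up the new configuration.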

 

Datadog Cons

While centralized telemetry and an intuitive UI make Datadog a very popular platform for fast-growing startups, its cost becomes a key challenge as they scale, especially in the current market environment. Nowhere is this more prominent than with logs. While metrics and traces are priced per host, and hence scale only with the number of new services, logs are priced by volume, so they scale directly with usage, especially in microservices architectures.

1. Complex Log Ingestion, Indexing and Retention Process

The log analytics process within Datadog is far more complex than it needs to be. You can send logs to Datadog, but you can’t analyze them until you index and retain them, and there’s a separate pricing structure for ingestion and retention (we’ll cover more about that next). Because of this complexity and cost structure, some organizations choose not to retain as many logs as they need or want. That creates problems for troubleshooting and root cause analysis, especially for persistent issues that outlast the retention period of your logs.

To index and analyze older logs, you need to pull them out of your cloud object storage (e.g. Amazon S3) and “rehydrate” them. This process can take hours and requires someone to manage it. With persistent talent shortages and an overabundance of work for DevOps and site reliability teams, many organizations can’t afford to manage this level of complexity.

2. Costly Datadog Log Analytics Workflow

When it comes to logs, Datadog log management pricing starts at $0.10 per GB to ingest data, but $1.06 (3-day retention) to $2.50 (30-day retention) per million log events retained. To retain logs for longer, you need to contact Datadog for custom pricing, which can quickly add up as companies scale. While Datadog is helpful for monitoring and detection, once it comes to root cause analysis and troubleshooting, Datadog logs pricing can quickly balloon out of control.
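To see how this adds up, here is a back-of-the-envelope sketch using the list prices above. The billing units (per GB ingested, per million events retained), daily log volume and average event size are illustrative assumptions, not quoted figures:

```python
# Rough monthly Datadog log cost at the list prices quoted above.
# Assumptions (illustrative): ingestion billed per GB, 30-day retention
# billed per million log events; the volumes below are hypothetical.
INGEST_PER_GB = 0.10             # $/GB ingested
RETAIN_30D_PER_M_EVENTS = 2.50   # $/million events at 30-day retention

gb_per_day = 500                 # hypothetical daily log volume
avg_event_bytes = 1_000          # hypothetical average event size (~1 KB)

events_per_month = gb_per_day * 1e9 / avg_event_bytes * 30  # ~15 billion
ingest_cost = gb_per_day * 30 * INGEST_PER_GB
retain_cost = events_per_month / 1e6 * RETAIN_30D_PER_M_EVENTS

print(f"ingest: ${ingest_cost:,.0f}/mo  retention: ${retain_cost:,.0f}/mo")
```

At these hypothetical volumes, retention, not ingestion, dominates the bill, which is why shortening the retention window is usually the first cost lever teams reach for.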

3. Scaling Challenges

Shortening log retention windows can become a significant tradeoff and result in a loss of visibility into more complex issues – ranging from lingering application and infrastructure performance problems to advanced persistent security threats. Many startups that begin with Datadog find that as they scale, they end up spending absurd amounts of money to retain their logs. With scale, Datadog becomes both more expensive and harder to use.

Read: Log Analytics and SIEM for Enterprise Security Operations and Threat Hunting

 

Why is Datadog Log Management Pricing So Expensive?

The intricacy of modern cloud computing drives the need for observability. Observability systems gather and examine data from logs, metrics, events, and traces, enhancing system performance and user experience.

Yet, with the rise of microservices across cloud computing environments, understanding system behavior grows harder. Microservices-based applications, combined with rising application usage, multiply the volume of log data and require more complex infrastructure for monitoring and troubleshooting. These escalating data volumes and longer retention requirements make observability systems like Datadog too expensive for many organizations to sustain for certain workflows, like log analytics.

If you’re here looking to reduce the price you pay for Datadog, consider the strategy below.

Datadog Alternatives

If cost is a concern, what are some of the best Datadog alternatives for log analytics at scale?

There are plenty of DevOps tools for continuous monitoring. Unfortunately, the common thread of high ingestion and retention costs is the same for other observability systems in data-heavy cloud environments.



Elasticsearch

Despite its relatively low barrier to entry, Elasticsearch presents a steep learning curve and high levels of management complexity for users. To effectively deploy and manage an Elasticsearch cluster, you’ll need to develop the appropriate knowledge and skills. Combined with a budget-breaking total cost of ownership and troublesome stability and uptime issues, many users seek out alternatives for log management.

 

Splunk

Some alternatives like Splunk are great for certain use cases, such as security observability. Like Datadog, Splunk is highly effective for real-time alerting and analysis. However, as you retain log data for longer periods of time for threat hunting and advanced persistent threats, costs can spiral out of control.

 

New Relic

New Relic is another continuous monitoring tool that delivers full observability of the entire software stack. This single platform brings together four types of telemetry data: events, logs, metrics, and traces. Core features include browser and mobile session monitoring, visibility into servers, on-prem VMs, and cloud-native infrastructure, real user monitoring, and synthetic monitoring capabilities. However, like Splunk and Datadog, New Relic users may find some issues with cost as log data scales, along with performance degradation issues in the log management workflow.

 

While Vantage’s cloud cost report for Q1 2023 identified log management as the top cost driver in Datadog, similar cost and performance issues at scale persist across the observability solutions identified above.

Consider this if you’re still looking for a way to reduce Datadog costs: use Datadog for real-time monitoring and detection, but replace the log management workflow (bonus points if the chosen solution has a seamless Datadog integration). Why? Companies must be able to analyze the log and event data they need, for as long as they need it, without worrying about retention costs. The log management component of an observability system must scale along with data growth, while integrating seamlessly with the existing observability stack.

Watch on-demand: ChaosSearch + DataDog: Better Together

Add ChaosSearch to your Datadog observability stack

ChaosSearch can seamlessly complement Datadog. Our data platform, purpose-built for cost-performant analytics at scale, allows customers to centralize all their logs with unlimited data retention and analyze them performantly via Elastic or SQL APIs at a fraction of the cost.

It’s simple to get started. All you have to do is:

  1. Send your logs to Amazon S3/GCS: you can send them directly from the source, or ingest them into Datadog and use Amazon S3/GCS as the archive destination.
  2. Connect to ChaosSearch: give ChaosSearch read-only access to the raw log buckets, create a new bucket for Chaos Index® and create a few object groups and views.
  3. Analyze your logs via Elastic, SQL or generative AI: analyze all your logs in our console via Kibana (for troubleshooting) or Superset (for relational analytics), or programmatically via the Elastic, SQL or GenAI APIs.

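Because step 3 exposes an Elasticsearch-compatible search API, existing Elastic tooling and query DSL carry over. A minimal sketch of building such a query; the endpoint URL, view name, and field names here are hypothetical placeholders, not real ChaosSearch identifiers:

```python
import json

# Sketch of a query against an Elasticsearch-compatible search endpoint.
# The URL, view name, and field names below are hypothetical examples.
CHAOSSEARCH_URL = "https://example.chaossearch.io/elastic"  # hypothetical
VIEW = "app-logs-view"                                      # a ChaosSearch view

# Standard Elasticsearch query DSL: last 7 days of ERROR-level events.
query = {
    "size": 100,
    "query": {
        "bool": {
            "must": [
                {"match": {"status": "ERROR"}},
                {"range": {"@timestamp": {"gte": "now-7d"}}},
            ]
        }
    },
}

# With the `requests` library installed, the search call would look like:
#   resp = requests.post(f"{CHAOSSEARCH_URL}/{VIEW}/_search",
#                        json=query, auth=(access_key, secret_key))
print(json.dumps(query, indent=2))
```

The point of the Elastic-compatible interface is that dashboards, alerts, and scripts written against Elasticsearch or Kibana can be pointed at the indexed S3/GCS data without being rewritten.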
Furthermore, all of this is done in a fully managed service at a fraction of the cost. ChaosSearch provides unlimited data retention with a starting price of $0.30/GB and significant discounts at scale. No more rehydration processes or discussions about retention. You can index it all and let your users analyze data in their tool of choice.

Set yourself up with the best smoke alarm (Datadog) and forensics tool (ChaosSearch) for all of your internal users at a fraction of the cost. You can have true observability at scale across your systems, free your teams from toil and improve your efficiency to further fuel your growth.