Elasticsearch and OpenSearch are powerful enterprise search and analytics engines that have become popular in the world of data management and telemetry analysis. Their ability to swiftly search, analyze, and visualize data has made them indispensable for organizations.
However, in this blog, we will explore a few key challenges faced by companies using Elasticsearch and OpenSearch, shedding light on important considerations when selecting the right tool for your needs.
Elasticsearch, which started as an open-source project, changed its licensing model significantly in 2021, moving to more restrictive licenses and raising concerns among users accustomed to its open-source roots. This licensing shift led to the emergence of OpenSearch, a fork of Elasticsearch created to provide a community-driven, open-source alternative. OpenSearch is now also available as a managed service from providers such as AWS and Oracle.
Recent benchmarks published by Elastic claim a performance gap between Elasticsearch and OpenSearch, with Elasticsearch reported to be 40-140% faster while using fewer compute resources. Since these figures come from one of the vendors, users may find it beneficial to conduct their own analysis of how the tools perform in their environments. It’s also important to note that both systems use the same underlying architecture, which requires duplicating and moving index data across multiple availability zones.
As we analyzed in a previous blog, a serverless architecture that’s truly stateless can overcome some of the performance and scale challenges with the stateful architectures of both Elasticsearch and OpenSearch.
Elasticsearch's proprietary licenses mean that users must pay for certain features or for use beyond the free tier. These licensing costs add up quickly and can be challenging for organizations with tight budgets, posing operational limitations if accessing the full suite of Elasticsearch's capabilities becomes too expensive.
A proprietary license structure enables Elasticsearch to invest in the development of advanced features and provide more robust support to its users, which can be beneficial. However, it might also lead to vendor lock-in, where users are tied to Elasticsearch due to their substantial financial investment, making it difficult to switch to alternatives if needed.
For users managing extensive data analytics workloads, these licensing costs can be a significant concern, so it’s worth weighing the cost implications and evaluating whether the benefits of Elasticsearch justify the investment. Some users may find a self-managed community version of OpenSearch cheaper, although the management complexity can still add up to a significant financial and talent investment (we’ll cover more about this in the next section).
Regardless of whether you self-manage or deploy your system in a public cloud, you’ll incur high data storage and computing costs that grow quickly with the volume of data you ingest each day and how long you retain it. This includes the cost of scaling your OpenSearch or Elasticsearch cluster with additional nodes (which consume computing resources), as well as data transfer and monitoring costs.
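As a rough illustration, a back-of-envelope calculation shows how the storage footprint, and therefore cost, compounds with daily ingest volume, retention period, and replica copies. Every number below is an assumption for the sake of the sketch, not vendor pricing:

```python
# Back-of-envelope storage cost estimate for a log cluster.
# All volumes, ratios, and prices below are illustrative assumptions.

DAILY_INGEST_GB = 500          # raw log volume ingested per day (assumed)
RETENTION_DAYS = 30            # how long indices are kept (assumed)
REPLICAS = 1                   # one replica copy per primary shard (assumed)
INDEX_OVERHEAD = 1.1           # ~10% indexing/metadata overhead (rough assumption)
PRICE_PER_GB_MONTH = 0.10      # placeholder $/GB-month for node-attached storage

stored_gb = DAILY_INGEST_GB * RETENTION_DAYS * (1 + REPLICAS) * INDEX_OVERHEAD
monthly_storage_cost = stored_gb * PRICE_PER_GB_MONTH

print(f"Data on disk: {stored_gb:,.0f} GB")
print(f"Estimated storage cost: ${monthly_storage_cost:,.0f}/month (storage only)")
```

Even with these modest placeholder figures, doubling either daily ingest or retention doubles the storage footprint, before accounting for compute, data transfer, or monitoring.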
Ultimately, the more data you ingest and the longer you retain it, the higher your data storage and querying costs will climb. To see cost projections for Elasticsearch and how they compare to alternative solutions, download our white paper on the subject (linked below).
The complexity of managing Elasticsearch and OpenSearch is easy to underestimate. Both systems are built on an underlying index structure that can be challenging to work with, adding an extra layer of complexity to data management. That’s because first-generation databases like Elasticsearch and OpenSearch were designed for scale as clusters of individual database nodes connected by a synchronization protocol.
Each node in the cluster works as part of a group but executes in isolation, synchronizing state with its peers through a quorum. A key part of that state synchronization is dividing work across the cluster: partitioning data into shards during ingestion, and then querying those shards, is a central construct in these architectures.
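As a quick illustration of that construct, shard and replica counts are set per index at creation time and determine how the data (and the work) is split across nodes. A minimal sketch using Python’s requests library against the standard index-creation REST endpoint, with a placeholder host and index name:

```python
import requests

# Create an index with an explicit shard layout (hypothetical host and index name).
# number_of_shards partitions the index across the cluster; number_of_replicas adds
# redundant copies that must be kept in sync on other nodes.
resp = requests.put(
    "http://localhost:9200/logs-2024.06",
    json={
        "settings": {
            "number_of_shards": 3,      # primaries: how the index is partitioned
            "number_of_replicas": 1,    # one synchronized copy of each primary
        }
    },
    timeout=10,
)
print(resp.json())
```

Those numbers are fixed decisions the operating team has to make up front, and getting them wrong is one of the most common sources of rebalancing and reindexing work later.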
By their very nature, these databases are “stateful”: they depend on numerous pieces of the system remaining in a persistent state. Storage and compute are still tightly coupled, even though a sharded architecture helps with scale. To perform an operation such as a query or a synchronization, a node must read from disk, load data into memory, and hold state.
These complex database architectures require many teams to continually allocate more budget and personnel to manage them. As Elasticsearch or OpenSearch environments grow in size and complexity, they become increasingly unstable, and when pushed beyond their architectural scalability limits, deployments frequently suffer outages that can severely disrupt the operations relying on them.
Both Elasticsearch and OpenSearch share limitations around ingestion and retention costs. These limitations stem from the fundamental design of their systems and impact the ability to analyze long-term data effectively. Scaling either deployment means adding capacity and performance through additional nodes, shards, and replicas to handle growing data volumes, and as daily log volume increases, users run into ballooning costs, degrading query performance, and increased management overhead. In practice, many teams cap retention at as little as seven days to rein in storage costs, keep indices small enough to preserve query performance, and reduce the need for complex sharding strategies.
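Such retention caps are typically enforced with a lifecycle policy that deletes indices past a certain age. The sketch below targets Elasticsearch’s index lifecycle management (ILM) REST endpoint via Python’s requests library; the host and policy name are placeholders, and OpenSearch offers a comparable capability through its Index State Management plugin rather than this exact endpoint:

```python
import requests

# Hypothetical ILM policy: delete any index managed by this policy once it is
# seven days old, trading analytical depth for lower storage cost.
policy = {
    "policy": {
        "phases": {
            "delete": {
                "min_age": "7d",
                "actions": {"delete": {}},
            }
        }
    }
}

resp = requests.put(
    "http://localhost:9200/_ilm/policy/logs-7d-retention",
    json=policy,
    timeout=10,
)
print(resp.json())
```

A policy like this keeps the cluster healthy, but it does so by throwing data away on a fixed schedule.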
In practical terms, this means the team responsible for the Elasticsearch or OpenSearch environment must limit its growth by constraining how much data is ingested daily and shortening retention periods. It's crucial to remember that a centralized log management system serves as the primary source of truth for an IT environment, so deciding to restrict the data collected and available for analysis creates gaps in insights.
Considering how many use cases depend on access to log data, these gaps can have catastrophic consequences. Whether the team is investigating a data breach or analyzing trends to plan infrastructure capacity, incomplete data can lead to flawed conclusions. The challenge with log data is that it's difficult to predict precisely what data will be needed until the moment arises, so the team managing the Elasticsearch or OpenSearch environment is making these trade-off decisions with limited foresight.
To mitigate these challenges, consider replacing Elasticsearch or OpenSearch with ChaosSearch for high-volume log and metric telemetry analytics workloads that demand longer data retention.
ChaosSearch takes a fundamentally different approach to search and analytics, representing a new generation of log management platforms. Whereas Elasticsearch and OpenSearch are “closed” systems, which transform data during ingest and store it in an internal database with its own data format, ChaosSearch simply connects to and indexes data the customer already stores in their existing cloud object storage (e.g., Amazon S3 or Google Cloud Storage).
With read-only access to this customer data in the cloud, ChaosSearch builds a separate index without manipulating or taking custody of the underlying original data. Upon ingest, ChaosSearch introduces no bottlenecks — the data can stream directly into a customer’s cloud storage in its native format. And because it avoids the burden of “data custody”, ChaosSearch has no internal database size constraint. ChaosSearch simply leverages the performance, scale, and economics of the public cloud. This is the key that allows ChaosSearch to deliver unlimited scalability, industry-leading resiliency, and massive time and cost savings.
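To make the “index in place” model concrete, the sketch below shows the kind of write path this architecture implies: log data lands in the customer’s own bucket in its native form (here via boto3, with hypothetical bucket and key names), and indexing later happens against that object without moving or transforming it. This illustrates the pattern only; the upload uses standard AWS APIs, not a ChaosSearch-specific one:

```python
import gzip
import boto3

# Ship a batch of raw log lines to the customer's own S3 bucket in native
# (gzipped JSON-lines) form. Bucket and key are hypothetical placeholders;
# indexing happens later, against this object, without altering it.
log_lines = b'{"level":"info","msg":"checkout completed"}\n'

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-telemetry-bucket",
    Key="logs/2024/06/01/app-0001.json.gz",
    Body=gzip.compress(log_lines),
    ContentType="application/json",
    ContentEncoding="gzip",
)
```

Because the pipeline’s only write target is object storage the customer already owns, there is no ingest-time transformation step to size, shard, or babysit.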
In conclusion, Elasticsearch and OpenSearch are powerful search and analytics engines, but they come with notable limitations for log analytics at scale. When choosing a solution, it's crucial to consider your organization's specific needs and the trade-offs you can tolerate. By understanding the limitations and opportunities presented by Elasticsearch and OpenSearch, you can make informed decisions about meeting your data analysis requirements. As those needs grow, augmenting or replacing these tools with a modern, serverless alternative like ChaosSearch can be far more cost-efficient.