As big data continues to grow exponentially, enterprises are discovering that legacy data environments (e.g., data warehouses or data marts) were never designed to efficiently process and extract insights from the vast volumes of data they generate today.
In turn, enterprises are shifting investments away from legacy data environments and searching for future-proof alternatives (e.g., data lakes, data lakehouses, data fabrics, or data meshes) to support data-driven, new-generation initiatives.
To understand how modern enterprises are reimagining and reconfiguring their data environments for the future, ChaosSearch commissioned the 2022 Data Delivery and Consumption Patterns Survey.
This survey of 209 IT and data managers looks at the current composition and maturity of data environments in the enterprise, future investment plans for a variety of data environment solutions, and the key opportunities and challenges driving change.
This week’s blog highlights several significant findings from the 2022 Data Delivery and Consumption Patterns Survey. If you’d like to learn more about how today’s enterprises are preparing for a data-driven future, you can download the full report by clicking the link below.
A data warehouse is a type of data environment that provides business intelligence for structured operational data, usually from relational database management systems (RDBMS).
Enterprises use the Extract-Transform-Load (ETL) process to move data from transactional systems, relational databases, and other sources into the warehouse. But as the volume of data increases, the time, cost, and complexity of ETL increase as well, and data warehousing becomes expensive and inefficient.
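To make the ETL steps concrete, here is a minimal sketch in Python. It is an illustration only: the sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from a transactional system (illustrative data only).
RAW_ORDERS = """order_id,customer,amount,currency
1,acme,19.99,usd
2,globex,5.00,USD
3,acme,not-a-number,usd
"""


def extract(raw_csv):
    """Extract: read rows from the source export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: enforce types and normalize values before loading."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        clean.append(
            (int(row["order_id"]), row["customer"], amount, row["currency"].upper())
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a fixed warehouse schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_csv):
    """Run all three steps against an in-memory stand-in for a warehouse."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_csv)))
    return conn


warehouse = run_etl(RAW_ORDERS)
total_by_customer = dict(
    warehouse.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
)
```

Every new source or schema change means revisiting the transform and load steps, which is one reason the effort tends to grow along with data volume.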
Despite this challenge, our survey shows that the majority of enterprises are still deploying data warehousing solutions in business-critical applications: 82% of survey respondents said they were using data warehousing solutions to store and maintain enterprise data for analysis and reporting. Furthermore, the same percentage said that data warehouse environments were critical to running their day-to-day operations.
Next, 36% of respondents told us they relied exclusively on data warehouses and had yet to implement other kinds of data environments (e.g., data lakes, data fabrics, etc.) to support data-driven initiatives.
We also learned that data warehousing solutions in the enterprise had the highest average maturity compared to other kinds of data environments. While the average enterprise data mesh solution in our survey was just 2.1 years old, respondents with data warehouse solutions had been using them for an average of just over 11 years.
Watch: The Rise of Data Mesh and Data Fabric Architectures
Modern enterprises know that public cloud infrastructure provides the best management experience and cost economics for most of their workloads.
As enterprises transition their workloads, apps, and infrastructure into the cloud, they’re also generating more data in the cloud. And since moving data out of the cloud is costly and time-consuming, it only makes sense that enterprises are beginning to move their data environments into the cloud as well.
When we asked survey respondents who relied exclusively on data warehousing solutions where those systems were deployed, 73% told us they were running their data warehouses on-premises. But when we asked where those same systems would be maintained over the next 1-3 years, only 34% indicated that their data warehouses would be staying on-prem.
While the overall trend leans towards the cloud, some data environments are trending faster than others. The general outlook for data mart/OLAP and data warehouse solutions involves more on-prem deployments. In contrast, data lakes, lakehouse solutions, and data mesh/fabric solutions are more likely to be deployed in the cloud.
Read: New Report Shares Best Practices for Modern Enterprise Data Management in Multi-Cloud World
Our survey revealed that data delivery issues are increasing for enterprises across the board, regardless of which data environments they use for analysis and reporting.
For organizations that rely on data warehousing solutions, the exponential growth of big data means allocating more and more resources to data pipelines and the ETL process. When data engineers can’t keep up with the flow of data, the result is increased time to insights, which limits the impact of data-driven decision-making.
For organizations diversifying their data environments with data lake, lakehouse, or fabric solutions, data delivery issues arise from the sophisticated planning required to ensure data quality and reliability at scale.
When we asked our survey respondents whether data delivery issues have increased or decreased within their organizations over the past three years, 65% reported that data delivery issues had at least somewhat increased, while 20% said that data delivery issues had increased substantially for their businesses.
We asked survey respondents in executive and management roles to identify the most important challenges they regularly encountered while working with enterprise data.
Our respondents identified a range of challenges, including data governance, validation and quality enforcement, limited time and resources available for analysis, ensuring data security, and, greatest of all, data preparation. In total, 53% of respondents felt challenged by the overall amount of time and resources spent identifying, cleansing, rationalizing, consolidating, and transforming data.
The ChaosSearch cloud data platform addresses data preparation challenges with a schema-on-read approach that allows enterprises to index and analyze their data with no data movement and no resource-intensive ETL process.
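For readers new to the term, the general idea of schema-on-read can be sketched in a few lines of Python. This is a generic illustration and not the ChaosSearch API: the log lines, field names, and the `read_with_schema` helper are all hypothetical. The raw records are stored once, untouched, and each query applies its own schema at read time.

```python
import json

# Raw records kept exactly as they arrived (hypothetical log lines).
RAW_LOG_LINES = [
    '{"ts": "2022-05-01T10:00:00Z", "status": "200", "path": "/home", "bytes": "512"}',
    '{"ts": "2022-05-01T10:00:05Z", "status": "500", "path": "/api", "latency_ms": "87"}',
    '{"ts": "2022-05-01T10:00:09Z", "status": "200", "path": "/api", "bytes": "2048"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at query time; the stored lines are never rewritten."""
    for line in lines:
        record = json.loads(line)
        # Cast each requested field, or use None when the record lacks it.
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Two different questions, two different schemas, one copy of the data.
traffic_schema = {"path": str, "bytes": int}
error_schema = {"status": int, "path": str}

total_bytes = sum(
    r["bytes"] or 0 for r in read_with_schema(RAW_LOG_LINES, traffic_schema)
)
error_paths = [
    r["path"]
    for r in read_with_schema(RAW_LOG_LINES, error_schema)
    if r["status"] >= 500
]
```

Because the schema lives with the query rather than with the stored data, a new question does not require reshaping or copying the records first.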
Data proliferation is something enterprises should avoid when dealing with big data. When data is replicated, the administrative and financial burden of storing and securing the data is multiplied. In addition to security issues, data replication can also cause data reliability issues when it becomes unclear which data is the most current or accurate.
Today, enterprises are leveraging a growing number of applications to analyze and visualize their data. As a result, data is frequently replicated in various locations across the enterprise. In our survey, 52% of data managers reported having four or more copies of the same data in different places within their enterprise.
To avoid needless replication, enterprise data should be centralized in a single storage repository where it can be queried or analyzed as needed with no data movement. At ChaosSearch, we’re leading the “No Data Movement Movement” with a cloud data platform that enables our customers to analyze data directly in Amazon S3 buckets and put an end to costly and redundant data movement.
As enterprises prepare to meet the data challenges of the future, we’re seeing an increased level of investment in data lake solutions. In total, 56% of the Data Delivery survey respondents said they were planning to accelerate data lake investments over the next three years.
Our 2022 Data Delivery and Consumption Patterns Survey yielded plenty of valuable insights into how enterprises are structuring their data environments and what the future could look like as enterprises execute on their investment plans and diversify their data environments over the next three years.
In addition to what’s presented here, our respondents gave us plenty of valuable insight into the types of workloads they run, management and data integration challenges, latency issues, and future investment priorities. To get all the details, you can read the full survey report by visiting the link below.
Read the Blog: IT Professionals Reveal Cloud Data Platform Highs and Lows
Listen to the Podcast: Differentiate or Drown: Managing Modern-Day Data
Check out the Brief: How a Cloud Data Platform Scales Log Analytics and Fulfills the Data Lake Promise