First Principles of CHAOS

Written by Thomas Hazel | Dec 10, 2018

CHAOSSEARCH, the company, is three years in the making. But as an idea, as an invention, it has defined a career. Put another way, CHAOS is the culmination of a life spent purposely rethinking the theories and algorithms used to wrestle with the problems of managing and analyzing chaotic data, particularly at absurd scale.

To capture this journey (or better yet, company mission) in a word: Information. Not information in the educational sense or the weekend web surfing binge (though see my wiki addiction), but information as in the study thereof. I’ve always been enamored with the theory of information. One could even call me an “eternal groupie” of Claude Shannon, for whom the representation, measurement, and storage of information are a personal life doctrine. Yet for some, thinking about (let alone working towards) such abstract, obscure, and even elusive problems might be akin to Don Quixote tilting at windmills. For me, entropy windmills (a.k.a. chaos monsters) are everywhere, each needing to be slain. And this obsession is where the premise of CHAOSSEARCH began: an idea derived from an exercise in First Principles.

First Principles

With the seed of an idea, I began to apply First Principles. For those not familiar with the term, a first principle is a foundational proposition or assumption that cannot be deduced from any other proposition or assumption. With regard to the CHAOSSEARCH first principles, there were three main areas of focus with respect to Information:

Minimums
Distribution
Analysis thereof

The focus (or better yet, belief) was that if one could take the above constructs and achieve best in class in each, one could algorithmically slay those darn windmills. But beyond the individual principles, if one could holistically leverage each construct together, there would be a multiplier effect, with a magnitude of benefit worthy of starting a company. So that’s what I went off to do...

Theoretical Minimums

The idea is that if one could represent information as small as possible, or at least smaller than anyone else, one would have a significant competitive advantage. In other words, “smallest” reduces the amount of storage required, network utilized, and computation executed.
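
To make the notion of a theoretical minimum a little more concrete, here is a minimal Python sketch. It is an illustration only and has nothing to do with how Data Edge works internally. It computes the zero-order Shannon entropy of a byte string, which is the size floor for any coder that treats each byte as independent, and compares that floor with what zlib achieves on the same data. The sample log line and the function names are assumptions made up for this example.

```python
import math
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Zero-order Shannon entropy: average bits needed per byte when
    every byte is modeled as an independent draw from the observed
    byte frequencies."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_floor_bytes(data: bytes) -> float:
    """Smallest size (in bytes) reachable by a coder that only looks at
    individual byte frequencies, ignoring any structure between bytes."""
    return entropy_bits_per_byte(data) * len(data) / 8


# Hypothetical sample: a highly repetitive log line, as chaotic data often is.
sample = b"GET /index.html 200\n" * 500

floor = entropy_floor_bytes(sample)
compressed = len(zlib.compress(sample, 9))

print(f"raw size:            {len(sample)} bytes")
print(f"zero-order floor:    {floor:.0f} bytes")
print(f"zlib (level 9) size: {compressed} bytes")
```

On repetitive data like this, zlib lands well below the zero-order floor because it models repetition across bytes rather than single-byte frequencies. That is the general point behind chasing a minimum: the achievable size depends on how the information is represented and modeled, not only on which compressor is applied.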

And while laboring over this theoretical minimum principle, I had a breakthrough: an insight into how to uniquely represent information. A representation that allowed for compression ratios greater than a given compression algorithm achieves alone. And yet this revelation was not the ending, but a beginning. This representation (which I call Data Edge) was not just a new way to make data smaller, but a new database index. An index that was “not” monolithic with traditional mathematical limits, but rather distributed in its creation, linking, and analysis. And unlike classic database indices, where over-indexing results in degeneration, Data Edge linearly indexes every aspect of a data source, all the while supporting text search and relational query operations via a single format.

Distributed Execution

As indicated, small is paramount in any big data endeavor and a winning bet. However, information made small in a way that cannot be distributed or scaled easily is not actually small at all. And this is where Data Edge truly shone. Its ability to index data sources independently and link them logically, without the need to rebuild or reorder the index, changed how an architecture could be built. In other words, this representation facilitated a whole new database design for bigger, easier, and most importantly, cost-effective scale.

Analysis of Information

The last principle of CHAOS is the analysis thereof. In other words, the value of information. Making information infinitely small or distributed has zero benefit if it doesn't ultimately lead to some business or financial value. And even if value can be derived, if the cost is greater than the actual benefit, the value will be left dormant... But if information could be stored and analyzed directly on cheap, elastic storage, without the need to design and build silos, all the while accessible via text, relational, or machine learning analytics, certainly it would be something to aspire to.

Formation of a Company

And finally, maybe most importantly, spearheading (let alone founding) a startup built on a new idea and technology requires like-minded teammates. A team that not only believes in the mission, but is inspired by it. A company that can handle and overcome relentless hard. And when I say hard, it doesn’t get much harder than distributed databases. And when you introduce unique technology into the mix, divergent thinking, as a team, is a daily necessity.

Meet this unstoppable and remarkable team: chaossearch.io/company/

In my next blog, I will walk through the study of Data Edge (now called Chaos Index®) and how this representation allowed CHAOSSEARCH to reimagine big information normalization, virtualization, and materialization for both text search and relational query analytics.
