No-SQL. The topography of a database

In our previous blog on No-SQL databases, we focused on Big Data.  We explored the idea that, because of the enormous size of the underlying data, our former notions of data efficiency and order no longer apply.  Rather than spread related data across numerous normalized tables, we strive to keep related data together.  In doing so, we greatly simplify the task of storing and retrieving data.  The data we need is stored in one complex record in one table.  One read and we have it.

But in simplifying the retrieval and storage of data, we create complexity of another kind.  How do we keep track of data that we formerly parsed out with logical precision to individual tables?

  • Customers can place many orders.
  • The orders can contain many line items.
  • The line items can, in turn, represent many products.
  • There are invoices to be sent out,
  • backorders to be dealt with, and
  • payments to be received.

How do we propose to store all of this data in one record?  In answering this question, we find that our data takes on an unusual shape or topography.  Each “record” is no longer flat like Kansas.  On the contrary, it has contours, shapes, and texture, like Colorado.

We find that each of our data records is lumpy.  Each one accommodates all the data necessary to describe the underlying business or information problem.
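To make the idea of a lumpy record concrete, here is a minimal sketch in Python of what one such record might look like, using a plain dictionary.  Every field name and value (customer_id, line_items, unit_price and so on) is a hypothetical chosen for illustration; it is not the schema of any particular No-SQL product.

```python
# Hypothetical customer record: all field names and values are illustrative only.
customer_doc = {
    "customer_id": "C-1001",
    "name": "Example Customer",
    "orders": [
        {
            "order_id": "O-1",
            # Line items live inside the order instead of in a separate table.
            "line_items": [
                {"product": "widget", "quantity": 2, "unit_price": 5.00},
                {"product": "gadget", "quantity": 1, "unit_price": 12.50},
            ],
            "invoices": [{"invoice_id": "I-1", "amount": 22.50}],
            "backorders": [],
            "payments": [{"payment_id": "P-1", "amount": 22.50}],
        }
    ],
}


def order_total(order):
    """Sum quantity * unit_price over the line items embedded in one order."""
    return sum(item["quantity"] * item["unit_price"] for item in order["line_items"])


first_order = customer_doc["orders"][0]
print(order_total(first_order))
```

Everything needed to answer a question about this customer's order sits inside the one record, so no second lookup is required.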

In No-SQL, the records and tables are so different, in fact, that when we refer to them we must use different terms.  We refer to collections rather than tables because the structure of a collection is diverse enough to accommodate many different aspects of one data problem.  And we refer to documents rather than records because a record implies structural uniformity rather than the diversity of information that the No-SQL database can accommodate.

But do not confuse No-SQL documents with a Word document or other kinds of unstructured text files.  These are highly structured, data-rich groupings of information designed expressly to accommodate our high-performance data storage and retrieval needs.

In attempting to understand the benefits of No-SQL, we can find a helpful analogy in physics: the conceptual transition from Newtonian physics to Einsteinian physics.  In Einsteinian physics, space is no longer Euclidean.  It becomes curved.  And time no longer passes in purely fixed intervals.  It behaves differently depending on the relative speed of the objects in question.

Similarly, in No-SQL we no longer think of documents (formerly records) as uniform in length or field count.  Documents can contain a variety of related information that is stored together to describe our business problem or data problem.

We refer to this as the topography of the documents (the diverse shape of the records) in a No-SQL database.  Understanding this topography, and knowing that it is dynamic and can be changed over time with relative ease, is a powerful concept indeed.
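As a rough illustration of documents that differ in field count and of a shape that changes over time, the sketch below models a collection as a plain Python list of dictionaries.  The names customers, field_counts, names_with and loyalty_tier are all hypothetical, and a real No-SQL database would provide its own interface for these operations.

```python
# Hypothetical collection: a plain list of documents. All names are illustrative.
customers = [
    {"name": "Alpha", "email": "alpha@example.com"},
    {"name": "Beta", "orders": [{"order_id": "O-7", "line_items": []}]},
    {"name": "Gamma", "email": "gamma@example.com", "payments": [], "backorders": []},
]


def field_counts(collection):
    """Return the number of top-level fields in each document."""
    return [len(doc) for doc in collection]


def names_with(collection, field):
    """Return the names of the documents that carry the given field."""
    return [doc["name"] for doc in collection if field in doc]


before = field_counts(customers)

# Reshape one document; the other documents are left exactly as they were.
customers[0]["loyalty_tier"] = "gold"

after = field_counts(customers)

print(before, after)
print(names_with(customers, "email"))
```

Only the first document gains the new field.  Nothing has to be declared in advance, and the other documents in the collection are not rewritten.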

No-SQL. When our logical assumptions become illogical

SQL, or Structured Query Language, has been the prevailing mode of database organization for over 40 years.  The fundamental concepts that form the basis of SQL were introduced in the early 1970s.  Fifteen years later, in the mid-1980s, standards were introduced that promoted uniformity across SQL database solutions from many of the most respected and pervasive software vendors in the industry: IBM, Oracle and Microsoft, to name the most prominent.  What more could we ask for?

  • an overwhelmingly logical database structure,
  • accepted by the leaders of our industry,
  • with standards that promote uniformity and compliance across commercial software products.

But beginning in 2006, and increasingly since then, we find two forces emerging that challenge the leadership, acceptance and viability of SQL.

The first emerging force is “big data”.  We are beginning to see databases of 500 billion or more records.  These databases span disk storage devices and even span computers themselves.  For the past 40 years, it was reasonable in traditional SQL to make logical connections between related data sets.  For example:

  • An order and the underlying order items are related.
  • The customer and customer payments are related.

But the tables in our databases are beginning to take on sizes that exceed our wildest imagination.  And while the task of joining or connecting these “related” tables seems logically sensible, the practical task of doing so for immense data tables is no longer feasible.

Faced with this dilemma, we begin to question the logical purity of SQL and revisit the most fundamental of our data organization questions and assumptions.  Rather than continually splintering a logical data record and reassembling its components when needed, why don’t we simply store all related information in one record?
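The difference between splintering and storing together can be sketched with a toy example in Python.  The lists below stand in for normalized tables and the dictionary stands in for a document store; the names orders, order_items, order_docs, assemble_order and read_order are hypothetical.  A real SQL engine would use indexes and a query planner rather than the simple scans shown here, so this illustrates only the shape of the work, not its actual cost.

```python
# Normalized layout: related data lives in separate "tables" (lists of rows).
orders = [{"order_id": 1, "customer": "Alpha"}]
order_items = [
    {"order_id": 1, "product": "widget", "quantity": 2},
    {"order_id": 1, "product": "gadget", "quantity": 1},
    {"order_id": 2, "product": "sprocket", "quantity": 4},
]


def assemble_order(order_id):
    """Rebuild one logical order by matching rows across both tables."""
    order = next(row for row in orders if row["order_id"] == order_id)
    items = [
        {"product": row["product"], "quantity": row["quantity"]}
        for row in order_items
        if row["order_id"] == order_id
    ]
    return {**order, "items": items}


# Document layout: the same order is stored whole, keyed by its id.
order_docs = {
    1: {
        "order_id": 1,
        "customer": "Alpha",
        "items": [
            {"product": "widget", "quantity": 2},
            {"product": "gadget", "quantity": 1},
        ],
    }
}


def read_order(order_id):
    """Fetch the complete order with a single lookup."""
    return order_docs[order_id]


print(assemble_order(1) == read_order(1))
```

Both paths yield the same logical order.  The first has to find and match rows from two tables on every read, while the second retrieves a record that was assembled once, when it was written.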

We are finding that the very cornerstone and foundation of database logic is being turned on its head.  As is frequently the case, once we challenge our most fundamental assumptions, the related ideas that rested on that foundation also begin to give way.  Things that were previously difficult now become easy.  But the contrary is also true.  Things that were easy with SQL become more difficult.

In subsequent blogs, we will construct a new logical vision.  We will explore these logical issues, with their benefits and costs, as we leave the accepted realm of SQL and enter the world of No-SQL.