Intermission: Scaling

As I work on articulating the issues of capacity and scaling for a current blog post, I have to deal with half-truths about what logging systems can do today. The truth is that what major companies sell as back-end large-data systems are flat files and relational databases, and their performance numbers are skewed.


In general, people underestimate the complexity of big data. Like the Cult of the Offensive in World War I, the general populace thinks that sending more resources over the top will create progress. Big data is not the same battlefield as before, and most network and computer engineers do not understand how things scale. The resources thrown over the top are wasted, and those who never sent their resources into the breach are the temporary winners. Temporary, because how big data is implemented is a revolution against the classic server-oriented architectures that most commercial tools still sell.

Big Data is not a big database. That is like thinking a car dealership is a big parking lot. The dealership does not make money by parking cars, it makes money by selling them. Big Data does not make money by storing data, it makes money by making data accessible. Companies already know how to store data and build large data centers; making that data useful is the reason for big data.

Big Data solutions avoid talking about the speed of accessing data and performing analytics over the data at rest. Most of the analytics, as in Splunk, are generated by preprocessors as data enters the system. A statement about ingesting 100k events per second and preprocessing that data might be true. The claims are just not true at the same time: the syslog process can handle 100k events per second (EPS); the Java preprocessor might handle 30k EPS; while the Splunk Python script handles 3k EPS. Often the numbers are a snapshot of one component's maximum capability, but it is the slowest process (3k EPS) that marks what the system can really do. In the end, there is a manager trying to get an engineer to meet the performance they thought they bought.
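Here is a minimal sketch of that bottleneck arithmetic, using the hypothetical stage rates above (none of these figures are vendor benchmarks):

```python
# Sustained throughput of a staged ingest pipeline is bounded by its
# slowest stage. Stage names and rates are the hypothetical figures
# from the paragraph above, not measured numbers.
stages_eps = {
    "syslog receiver": 100_000,      # events per second
    "Java preprocessor": 30_000,
    "Splunk Python script": 3_000,
}

bottleneck = min(stages_eps, key=stages_eps.get)
sustained = stages_eps[bottleneck]

print(f"Advertised front-end rate: {stages_eps['syslog receiver']:,} EPS")
print(f"Bottleneck: {bottleneck} at {sustained:,} EPS")

# Anything arriving faster than the bottleneck either queues up or is dropped.
backlog_per_hour = (stages_eps["syslog receiver"] - sustained) * 3600
print(f"Backlog at the advertised rate: {backlog_per_hour:,} events per hour")
```

The number the buyer actually lives with is the 3k EPS at the end of the chain, not the 100k EPS on the datasheet.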

The ability to scale, whether in business or infrastructure, is the most powerful trait today. When we think of Google’s Search, we are really thinking of their ability to make searching relevant at the scale of the Internet. Today’s games focus on being online, and interacting with others is a key element of modern game design; the games scale to the level of their own society. The most powerful element of Microsoft’s Xbox is its ability to scale to its user base. It’s not easy, and they still have outages.

Scaling requires a process to give us structure; it’s really the process that scales. If computers are designed for repetitive tasks, why do they fail to scale? The answer is that as systems grow, stress falls on characteristics that were hidden at lower loads. This is the case for disk I/O. But disk I/O is not the only process involved in storing and accessing data.
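As a rough illustration of a cost that hides at low load, here is a sketch with made-up per-event costs (the 0.1 ms of disk time and 10 µs of CPU are assumptions for illustration, not measurements):

```python
# Sketch with hypothetical costs: a small fixed per-event disk cost that is
# invisible at low event rates becomes the limiting factor as load grows.
DISK_IO_PER_EVENT_S = 0.0001    # assumed 0.1 ms of disk time per event (amortized)
CPU_PER_EVENT_S = 0.00001       # assumed 10 microseconds of CPU per event

for eps in (100, 1_000, 10_000, 100_000):
    cpu_util = eps * CPU_PER_EVENT_S        # fraction of one core in use
    disk_util = eps * DISK_IO_PER_EVENT_S   # fraction of one disk in use
    status = "fine" if disk_util < 1 else "disk saturated"
    print(f"{eps:>7} EPS  cpu {cpu_util:7.1%}  disk {disk_util:8.1%}  {status}")
```

At 100 EPS the disk term is noise; at 100k EPS it is the whole story. The characteristic was there all along, just hidden at lower stress.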

The attributes that prevent scaling are hidden in abstraction. When we teach computing, we focus on abstraction. We create logical boundaries and build off these blocks. With proper structure, we are taught, we can keep on building. Logical structures, we assume, are not bound by the same laws that bind physical structures. This boundless universe is a lie.

At the edge of the abstraction, we see a laptop with limits on processing power, network bandwidth, and disk space. What is not seen are the countless logical abstractions and their individual constraints.

The weakest parts of computer systems are the logical links that connect the components: the motherboard, the disk I/O, and the bridges. The Cray supercomputers pushed innovation by aiming at the weakest elements of the system as it scaled. The creation of components like the High-Performance Parallel Interface (HIPPI) bus showed the importance of moving data between components. I have heard it joked that Cray did not have a memory bus but a memory freighter. In short, the work of moving data from one step of a process to the next is often overlooked. Attention to this detail was the power of Cray.

Cray was the computer revolution of the 1970s. True Big Data is the revolution of the 2010s. The key as we develop and engineer going forward is to recognize what from the past to keep and what from today to build upon.
