Reasoning About Data and Consistency in Systems

In this episode, Daniel Norman, CTO of güdTECH, joins us to introduce the nomenclature of consistency models and to discuss their various pros and cons. He invites us to consider some deeper truths about our software systems, their limitations, and the ways we think about them in order to further our business objectives.

The highlights:

  • What is the purpose of software systems?

  • What do the humans expect?

  • What is a consistency model, and why does it matter?

  • What consistency models am I likely to encounter?

  • What does my software system look like in these terms?

All this, and much more on today’s edition of 7CTOs Collaborate.
 

In this edition of 7CTOs Collaborate, Daniel reviews the most basic motivations of software systems, the reality of user expectations, and the way we reason about our own systems. He argues that our systems exist to model physical reality, and to compute a useful work product in a way that satisfies these user expectations. If we do this well, we will earn users' approval and improve the bottom line.

What is the purpose of software systems?

All software systems exist to model some aspect of physical reality. From accounting software to video games and everywhere in between, our software seeks to create a model of reality to which we can apply various computations. We do this to yield a useful work product in service of some objective. This may seem trite, but bear with me a moment. We’re going somewhere with this...

What do the humans expect?

Each of our systems has user stakeholders whose expectations we must attempt to meet. They expect our systems to be fast, to be available, and to be consistent with their worldview. In some cases we’ve temporarily convinced them to tolerate less, and in some of those cases the implications of missing said expectations could be seen as trivial. But above all, these expectations aren’t arbitrary. They are rooted in the physical reality into which we were each born. The human expectation is fundamentally causal in nature, commensurate with those laws of physics under whose inexorable thumb we all live: cause and effect. Objects placed upon a table, for instance, tend not to jump spontaneously to the other side of said table without some external cause; nor do they hang motionless in midair when we release them. Humans tend to react negatively when “weird” things happen that violate this causal expectation.

This is equally true of software systems as it is of ghost stories and hallucinations. When values in a form spontaneously become un-set, or the user interface becomes unresponsive or unavailable, the user becomes upset. If you’re lucky, you’ll get an angry phone call; if you’re unlucky, they’ll merely grumble to themselves and you’ll risk losing market share when something better comes along.

The challenge, then, is to design, implement, and maintain a system which satisfies these expectations with the rarest exceptions possible. The way we do this in practice tends to be an odd mix of knowledge work and guesswork. Sometimes “weird” things happen to our systems, like a race condition that is occasionally lost, or a database failure that may take hours to recover from. We get “weird” logjams, hiccups, and interruptions. Why is this? Why does it happen to us?

The problem?

In short, the problem is that we tend to suffer from blind spots, both in terms of what our systems are composed of and in terms of the actual extent and boundaries of our systems. More on that to follow.

What is a consistency model, and why does it matter?

The technical definition of a consistency model, according to Kyle Kingsbury, is:
“A set of all histories of operations allowable under a system”

This is a pretty good technical definition, but let’s break it down a bit:

A consistency model is a contract of sorts. It offers certain guarantees which we can rely on. It also comes with certain limitations which are important to understand. It is, in a sense, a conceptual building block which we can use to build our systems. It has properties that we can trust (i.e., “invariants”) without requiring a detailed understanding of the things inside.

What consistency models am I likely to encounter?

Let’s review some common consistency models in our systems:

Linearizable Consistency

AKA: Total Ordering

This consistency model is very common. You probably use it in dozens of ways you might not even realize, but one of the most recognizable uses is the relational database. MySQL, PostgreSQL, MSSQL, Oracle, and the like all use a linearizable (or in some cases serializable) consistency model.

Under a linearizable consistency model, all operations have to complete in a single, specific order. A previous operation must be completed before the next one begins. By maintaining a single, exclusive list of operations applied to the database, the system ensures that there is a very clear and decisive order in which operations occurred.

This consistency model is considered by many to be strongly consistent insofar as no ambiguity or concurrency is permitted in our write operations. Because the list of operations is an exclusive resource, everyone gets the same answer to their queries. None of these answers at a given point in time will disagree, because they’re all drawing from the same “source of truth,” as some like to call it.

One way we can visualize this is as a single-file line, or a chain. Because the end of a chain can only exist at a single point in space, that necessarily means we must travel to it.

This consistency model is weak, however, in the sense that travel requires patience: your operations and their responses have to travel to and from the central arbiter. As travel takes time, and can be fraught with hazards, your data may be delayed, or perhaps not arrive at all. Operations cannot be guaranteed to ever complete. Under some “weird” circumstances, components implementing linearizability may become stuck, or logjammed.
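The "single exclusive list of operations" idea can be sketched with a toy in-memory register. Names here are illustrative, not drawn from any real database: every operation acquires the same exclusive lock, so all reads and writes fall into one agreed-upon total order.

```python
import threading

class LinearizableRegister:
    """Toy register: every operation takes one exclusive lock,
    so all reads and writes fall into a single total order."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self.history = []  # the single, decisive order of operations

    def write(self, value):
        with self._lock:  # exclusive: one operation at a time
            self._value = value
            self.history.append(("write", value))

    def read(self):
        with self._lock:
            self.history.append(("read", self._value))
            return self._value
```

The lock is also the bottleneck the section describes: every caller must "travel" to it and wait its turn, which is exactly the price of the strong guarantee.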

Serializable Consistency

This consistency model is quite similar to linearizability, except that it allows some operations to execute concurrently, provided they are not in direct conflict with any other operations underway at the time. This tends to entail record-level locking.

You could imagine a single-file line for a buffet, where the person behind you is able to jump ahead of you to get some green beans, but only if you look uninterested in them.

Like linearizability, serializability can also be said to be strongly consistent in the sense that all queries at a given time receive a consistent answer.

It is also weak in the sense that it requires patience to travel to the central arbiter, and offers no promises about if and when you’ll get a response.
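Record-level locking can be sketched with a hypothetical in-memory key-value store (all names are illustrative): operations touching different records may proceed concurrently, while operations on the same record serialize against each other.

```python
import threading
from collections import defaultdict

class RecordStore:
    """Toy store with one lock per record: operations on different
    records can run concurrently; conflicting ones take turns."""
    def __init__(self):
        # lock creation is simplified for this sketch
        self._locks = defaultdict(threading.Lock)
        self._data = {}

    def update(self, key, fn):
        with self._locks[key]:  # lock only the record being touched
            self._data[key] = fn(self._data.get(key))
            return self._data[key]
```

This is the buffet-line intuition from above: two diners reaching for different dishes don't block each other, but two hands on the green beans must serialize.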

Sequential Consistency

This consistency model starts to differ a bit from the previous two. Operations can be originated in any order, concurrently and without waiting. Only the application of these operations must happen in a specific order. There is no guarantee of any ordering correlation between the initiation of operations and the application thereof; the only guarantee provided is that they will be applied in some specific order that everyone will agree upon. The sequential consistency model tends to be employed by message queues, like RabbitMQ or Amazon SQS.
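The submit-in-any-order, apply-in-one-order split can be sketched with a toy log (hypothetical names; real queues like RabbitMQ add delivery and acknowledgement semantics on top): producers enqueue without waiting, and a single consumer applies operations in one order that every observer then agrees on.

```python
import queue

class SequentialLog:
    """Producers submit without waiting; a single consumer applies
    operations in one specific order all observers agree upon."""
    def __init__(self):
        self._pending = queue.Queue()
        self.applied = []  # the agreed-upon application order

    def submit(self, op):
        self._pending.put(op)  # returns immediately, no coordination

    def drain(self):
        # apply everything pending, fixing the order as we go
        while not self._pending.empty():
            self.applied.append(self._pending.get())
        return self.applied
```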

Eventual Consistency

This consistency model is said to be among the weakest consistency models available. Essentially the only guarantee it offers is that, given a quiescent system and enough time, all live nodes will eventually agree on their ordering of events and come to a stable, consistent state. Some systems that use this model are MongoDB, Cassandra, and Amazon SimpleDB.

The main benefit of this consistency model is that it tends to enable high-availability configurations, with no waiting or coordination required for readers or writers. This is desirable because the user expects the overarching system to be available and fast, a feat which is difficult, or perhaps even impossible, with one of the above consistency models, as each of them requires a central arbiter to determine the order of events.

The downside is that, when using an eventual consistency model as a component, we as system builders have very little to work with when it comes to meeting our overarching causal expectation. Sure, it can be faster and more likely to succeed without all that coordination overhead, but the user might also get a “weird” result that is not to their liking.

Because of this, eventual consistency can have a fairly insidious side effect in practice. Because it provides so few guarantees of its own, developers have a nasty habit of working out their own half-baked consistency models to address the need. Done carefully, this is all well and good, but more often than not there are mistakes. Some of these can be relatively innocuous, others maddening, and others still lie in wait until the day they take your system down without warning. At best, this can be a time-waster for your engineering team; at worst, it could cost you customers.
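The "eventually agree" guarantee can be sketched with a last-write-wins merge, one simple convergence rule (real systems such as Cassandra use timestamps plus more elaborate conflict handling; everything below is illustrative): replicas accept writes locally, then exchange state until they agree.

```python
class EventualNode:
    """Toy replica: accepts writes locally and converges with peers
    using last-write-wins on a logical timestamp."""
    def __init__(self):
        self.value, self.stamp = None, 0

    def write(self, value, stamp):
        if stamp > self.stamp:  # keep the newest write we know of
            self.value, self.stamp = value, stamp

    def merge(self, other):
        # anti-entropy: exchange state so both replicas converge
        self.write(other.value, other.stamp)
        other.write(self.value, self.stamp)
```

Note the "weird" result the section warns about: until `merge` runs, two replicas can happily serve different answers, and the losing write silently disappears.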

Causal Consistency

Gradually becoming more popular, the causal consistency model is the big winner. While a little more complicated to implement, it tends to offer the best of both worlds. Free from the dangers of coordination and single arbiters, operations can execute in any order, so long as they faithfully record their precursors. Observers can also apply operations in any order, so long as they consider those recorded precursors. This means that no waiting is required, and the system will tend to be more stable in the face of network partitions or outages than other consistency models. Readers and writers can work concurrently, providing better performance and better availability.

The best part of the causal consistency model is that it is precisely what is needed to meet the user’s causal expectation. Yes, it requires conflict resolution procedures, but the same applies to the physical world.

At present, Riak is among the best examples of causal consistency models in production.
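The "record your precursors, apply in any precursor-respecting order" rule can be sketched as a small topological apply (names are illustrative; real systems like Riak use vector clocks and sibling resolution rather than this exact scheme):

```python
class CausalOp:
    """An operation carries the ids of the operations it depends on."""
    def __init__(self, op_id, precursors):
        self.op_id, self.precursors = op_id, set(precursors)

def causal_apply(ops):
    """Apply ops in any order that respects recorded precursors."""
    applied, order, pending = set(), [], list(ops)
    while pending:
        # ready = ops whose precursors have all been applied
        ready = [o for o in pending if o.precursors <= applied]
        if not ready:
            raise ValueError("an op references a missing precursor")
        for o in ready:
            applied.add(o.op_id)
            order.append(o.op_id)
            pending.remove(o)
    return order
```

Operations can arrive in any order, concurrently and without coordination; causality is preserved because an effect is never applied before its recorded cause.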

What does my software system look like in these terms?

System extent and boundaries

It’s easy to think about our software systems discretely, imagining them to end at the datacenter walls. In truth, the full extent of our systems, whether we like to admit it or not, includes not only the servers but the users themselves: their workstations, their browsers, their computers, and their connections over the internet to the datacenter as well. It’s important to remember this as you design and maintain software systems, as the factors outside the datacenter oftentimes play a crucial role in the success of your enterprise.

System components, and their consistency models

The reality is that we don’t have merely one consistency model. Our systems are composed of many consistency models. These building blocks are cobbled together in the best ways we can, but we often short-change ourselves in our conceptualization thereof.

One could argue that the expectation of this overall system is causal. We users have the same expectation of software as we do of the physical world. Just as we do not tolerate objects jumping across tables spontaneously, we will not tolerate software acting strangely or becoming unavailable. When this kind of behavior is seen, it is questioned, however subtly, and in our case seen as a mark of poor craftsmanship or poor trustworthiness.

As we strive to provide technology leadership, it can be helpful to take a moment to see the bigger picture and better understand the tools at our disposal. With that understanding, we hope to remain relevant, and to innovate rather than be disrupted.

Episode Resources

CTO and Founder, güdTECH
Founder: Unbase
Twitter: DreamingInCode
Co-Organizer: Papers We Love San Diego
Co-Organizer: San Diego Rust
LinkedIn: Daniel Norman
Personal Blog: AbNorman.com