Patterns of an Eventually Consistent Bounded Context: Out of Band Healing

By Matt Baker - November 3, 2017


When working in a distributed system, your overall system is composed of discrete components. These components go by different names, e.g. microservices or sub-systems. I’m going to call these discrete components “bounded contexts”, a term borrowed from Eric Evans’s book Domain-Driven Design¹.

Sometimes these bounded contexts need to be eventually consistent². This series of articles is an attempt to catalog common patterns that I’ve encountered when working with eventually consistent bounded contexts.

In this installment we will cover Out of Band Healing, a pattern that can be used to reduce temporal coupling when healing your server-side caches.

The House That Temporal Coupling Built

You’ve built a bounded context that handles incoming requests. Your client calls you, and in order to issue a response to your client you first need to call another bounded context. Your bounded context has a dependency on another bounded context. The architecture might look something like this:

Naive Architecture

Within this architecture, as is the case with all architectures, there is coupling. There is one particular kind of coupling that I’d like to focus on. Before we can respond to a request from the client we need to call our dependency. In other words, we cannot respond to our client until we’ve received a response from our dependency. This is a form of temporal coupling. We can define temporal coupling as “the degree to which the sending and handling of a message are connected in time”³. This coupling produces an interesting reality for us – our state, at any point in time, depends on the state of our dependency.
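To make the coupling concrete, here is a minimal Python sketch of the naive request path. The names handle_request, slow_dependency and broken_dependency are hypothetical stand-ins, not part of any real system; the point is only that the handler cannot return until the dependency call does.

```python
import time
from typing import Callable


def handle_request(key: str, call_dependency: Callable[[str], str]) -> str:
    """Answer the client by asking the dependency first."""
    # We block here until the dependency answers. If it stalls or raises,
    # so does our response to the client: this is the temporal coupling.
    data = call_dependency(key)
    return f"response for {key}: {data}"


def slow_dependency(key: str) -> str:
    """Hypothetical dependency that takes a while to answer."""
    time.sleep(0.05)  # stand-in for network latency
    return key.upper()


def broken_dependency(key: str) -> str:
    """Hypothetical dependency that is currently unreachable."""
    raise ConnectionError("dependency unavailable")
```

Calling handle_request("a", slow_dependency) takes at least as long as the dependency does, and handle_request("a", broken_dependency) fails outright because nothing stands between the client and the outage.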

To be sure, this architecture is just fine assuming you have satisfactory answers to the following questions:

Is it okay if the client waits while you make a synchronous call to the dependency?
What do you do if the dependency is down or otherwise unavailable in a timely manner?

It may be that you can return an error and/or a default response to the client when you cannot communicate with the dependency. If that is the case, I think we’re done here. However, if an error code or default response is not a solution that fits your constraints, let’s talk about caching.

Cashing in With Cache

A great way to begin reducing a temporal coupling is to introduce caching. Applied here, our architecture might evolve into something like this:

Architecture With Caching

When we receive a request, we’ll get the necessary data from our dependency. Prior to responding to our client, we will save that data to our cache. The next time a similar request comes in, instead of going all the way to our dependency we will use the data found in our cache. How long we keep this data in our cache is context specific – it might be 5 minutes, it might be 5 days. The amount of time that we keep this data in our cache is the exact degree by which we have reduced our temporal coupling.
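A rough sketch of this caching step, assuming a simple in-process dictionary as the cache and an injectable clock so the expiry is easy to reason about. CachingHandler and its parameters are illustrative names, not an API from the article.

```python
import time
from typing import Callable, Dict, Tuple


class CachingHandler:
    """Serve from the cache while a record is fresh; otherwise call the dependency."""

    def __init__(
        self,
        call_dependency: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._call_dependency = call_dependency
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (data, time at which the data was stored)
        self._cache: Dict[str, Tuple[str, float]] = {}

    def handle_request(self, key: str) -> str:
        entry = self._cache.get(key)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]  # fresh data: no call to the dependency
        # No usable data: we still have to wait on the dependency here.
        data = self._call_dependency(key)
        self._cache[key] = (data, self._clock())
        return data
```

With ttl_seconds=300, repeated requests for the same key reach the dependency at most once every five minutes; every other request is answered without touching it.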

While we’ve reduced our temporal coupling to a degree, we’ve yet to totally remove it. Anytime that we are not able to use the data in our cache to respond to our client, we have to call out to our dependency in order to get the data we need – hello temporal coupling.

It may be that you are perfectly fine with this degree of temporal coupling. If you are familiar with the CAP theorem⁴, this is a great architecture if you are optimizing for consistency. However, it may be that you are not optimizing for consistency, or maybe this degree of temporal coupling is not acceptable. If that is the case, let’s keep walking towards the light.

In-Band Healing

I need to take a moment here and define a term. When a client makes a request to our bounded context, everything that happens prior to issuing a response to that client is considered “in-band”. If we cannot serve a response to our client based on data in our cache, we need to heal our cache by getting data from our dependency. In other words, we don’t respond to our client until we have healed our cache. This is known as “in-band healing”.

Kick Cache Out of the Band

We still have a temporal coupling to our dependency anytime that we need to perform in-band healing. To remove this coupling, we need to move the healing out-of-band. Our third and final iteration looks like this:

Architecture With Out-of-Band Healing

When requesting data from our cache, one of two things is going to happen. Either we will have a cache-miss, or we will have a cache-hit. A cache-hit occurs when there is valid data in our cache for a given request. A cache-miss occurs when:

There is no data in our cache.
There is data in our cache but it has an expired TTL.

A TTL (time to live) for cached data is simply a duration associated with a given cache record that determines how long to consider the record up-to-date. An expired TTL tells us that the record is stale and that we need to perform some action on the cached data.
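The hit and miss cases above can be sketched as a small lookup function. This is one possible shape, assuming each record stores an absolute expiry time; Lookup, CacheRecord and lookup are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Lookup(Enum):
    HIT = "hit"                    # valid data in the cache
    MISS_EXPIRED = "miss-expired"  # data present, but its TTL has expired
    MISS_EMPTY = "miss-empty"      # no data at all


@dataclass
class CacheRecord:
    data: str
    expires_at: float  # time after which the record is no longer up-to-date


def lookup(
    cache: Dict[str, CacheRecord], key: str, now: float
) -> Tuple[Lookup, Optional[CacheRecord]]:
    """Classify a cache read as a hit or as one of the two kinds of miss."""
    record = cache.get(key)
    if record is None:
        return Lookup.MISS_EMPTY, None
    if now >= record.expires_at:
        # The stale record is still handed back so the caller can decide
        # whether to serve it.
        return Lookup.MISS_EXPIRED, record
    return Lookup.HIT, record
```

Returning the expired record alongside the miss keeps the decision about serving stale data with the caller rather than inside the cache.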

If we experience a cache-miss when reading data from our cache, we need to decide how to respond to our client. If the cache-miss is due to expired data, we might just respond with the expired data. If the cache-miss is due to no data⁵, we can issue a default response to our client, or an error. In addition to handling the cache-miss, we will also publish a “cache-missed” event. This is where the “healer” comes into play. The healer is subscribed to the cache-missed event. Upon receiving the cache-missed event, the healer will retrieve the necessary data from our dependency and write it to our cache, ensuring that we have valid data for the next time a similar request comes through.

The crucial part to note is that the healer is operating out-of-band. When we experience a cache-miss, we don’t turn around and call our dependency. Instead, we respond as best we can and raise a cache-missed event that will be handled asynchronously, or out-of-band. If you examine the final diagram, you’ll note that we are no longer directly coupled to our dependency. Our temporal coupling is completely removed – our bounded context can continue to operate even if our dependency is completely unreachable.
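Here is a minimal sketch of the whole flow, under some loud assumptions: an in-process queue.Queue stands in for whatever messaging infrastructure carries the cache-missed event, a background thread plays the healer, and DEFAULT_RESPONSE, BoundedContext and run_healer are hypothetical names. A real system would likely use a message broker and a shared cache instead.

```python
import queue
import threading
import time
from typing import Callable, Dict, Tuple

DEFAULT_RESPONSE = "default"  # hypothetical fallback when nothing is cached


class BoundedContext:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cache: Dict[str, Tuple[str, float]] = {}  # key -> (data, expires_at)
        self.events: "queue.Queue[str]" = queue.Queue()  # cache-missed events
        self.ttl = ttl_seconds
        self.clock = clock

    def handle_request(self, key: str) -> str:
        entry = self.cache.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]          # cache-hit
        self.events.put(key)         # publish cache-missed; do not wait for healing
        if entry is not None:
            return entry[0]          # expired data: respond with the stale value
        return DEFAULT_RESPONSE      # no data: respond with a default


def run_healer(ctx: BoundedContext,
               call_dependency: Callable[[str], str],
               stop: threading.Event) -> None:
    """Consume cache-missed events and refresh the cache, out-of-band."""
    while not stop.is_set():
        try:
            key = ctx.events.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            data = call_dependency(key)
            ctx.cache[key] = (data, ctx.clock() + ctx.ttl)
        except Exception:
            # Dependency unreachable: leave the cache alone. The next miss
            # for this key publishes another event and healing is retried.
            pass
        finally:
            ctx.events.task_done()
```

Note that handle_request never receives a reference to the dependency; only the healer does. If the dependency is down, requests keep returning stale or default data and the healer simply fails quietly until it recovers.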

Conclusion

Out-of-band healing allows us to remove the temporal coupling that is present when healing our server-side caches. I hope that I’ve demonstrated the reasoning behind this pattern and left you better equipped to use it, if necessary. While this pattern removes temporal coupling, it also introduces extra complexity. As with all patterns, you should assess the benefit and ensure that it outweighs the cost for your given context. When in doubt, I like to start simple and evolve my solution to meet the needs that I discover along the way.

Footnotes

1: https://www.goodreads.com/book/show/179133.Domain_Driven_Design
2: https://en.wikipedia.org/wiki/Eventual_consistency
3: https://www.infoq.com/news/2009/04/coupling
4: https://en.wikipedia.org/wiki/CAP_theorem
5: There are strategies you can use to prevent cache-misses due to no cached data. This will be the topic of another article.

Categories: technical

Tags: architecture, messaging, APIs