Switching Databases

By Allan Stewart - March 23, 2017

6 minutes - 1267 words

In the summer of 2014, Pluralsight suffered a significant outage when our primary database failed. After recovering from the immediate problem, we decided we should migrate to a completely different database to improve performance and availability. Though it took some time to complete, the transition was surprisingly simple thanks to a powerful pattern.

Repository Pattern

The repository pattern separates data structures from how they are persisted. A repository exposes a generic interface that hides any database details. Here is a contrived C# example of an abstract repository for users:

public interface IUserRepository {
    void Save(Account account);
    Account Find(string accountId);
    Account FindByEmail(string email);
    List<Account> GetAccountsCreatedBetween(DateTimeOffset start, DateTimeOffset end);
    List<Account> GetActiveAccounts(int perPage, int offset);
}

The UserRepository implementing that interface deals with all the database queries and any required data mapping.

The repository pattern is simple to implement in just about any language and does not require adopting any kind of framework. It can work with simple data-transfer objects or classes that encapsulate data with behavior.

Because our monolith made extensive use of this pattern, it was a simple matter of writing up new repositories that used the same interface but accessed a different database. Dependency injection made it simple to swap implementations. But we also wanted to build confidence in the new database and avoid downtime for data migration.

Decoration to the Rescue!

To ease into using the new database, we applied the decorator pattern to create a third instance of the repository interface which composed the other two. Inspired by a Netflix blog post, the decorator used a set of feature toggles to control when the old and new repositories (and consequently their respective databases) were used.

With all the toggles turned off, the decorator would just act as a pass-through to the old repository. Then we would enable dual writing which would cause all of the write operations (such as Save in the example above) to go to both databases. We made sure to wrap try..catch logic around the new repository and log any problems. That way we could find and fix problems without impacting the our users.

With new and updated data being written to both databases, we could then start migrating existing data from the old database. In addition to a standard ETL job, the decorator also provided the option for us to write programs to just read and re-save all records. In some cases this was much easier than trying to map dissimilar table structures (e.g. NoSQL documents vs. relational SQL columns).

The next step was to enable shadow reads. When this option is enabled, the decorator reads data from both repositories, compares the results, then returns the data from the old database. As with dual writing, exceptions from the new database were caught and logged to avoid impacting end users. When results differed, we knew there were still bugs to fix. However, some repository methods could return a lot of data, so for some methods we skipped the shadow read step and opted to check consistency via a separate process rather than comparing thousands (or millions!) of records every time.

The last feature toggle would cause the decorator to return results from the new database. Because the data is still being written to both databases, we could safely try the new database and switch back to the old if something went wrong.

Once enough time passed to give us confidence in the new database, we took the final step of removing the decorator and the old interface. Database migration complete!

Well, for one repository anyway…

Reflect on Duplication

Our monolith had a fair number of repositories that needed to switch. With few exceptions, each repository was backed by a single database table. This kept coupling low, but meant that each repository had a different interface and would need its own decorator. To avoid the duplication of creating many decorators, some of my smart (or perhaps daring?) co-workers turned to the power of reflection.

First, they created some attributes to signal which repository methods were writes and which were reads. For example:

public interface IUserRepository {
    [RepositoryMethods.Write]
    void Save(Account account);

    [RepositoryMethods.Read]
    Account Find(string accountId);

    [RepositoryMethods.Read]
    Account FindByEmail(string email);

    [RepositoryMethods.Read]
    List<Account> GetAccountsCreatedBetween(DateTimeOffset start, DateTimeOffset end);

    [RepositoryMethods.Read]
    List<Account> GetActiveAccounts(int perPage, int offset);
}

Then they created a RepositoryProxy<T> class with a static method to create the decorated repository. With those in place, a ready-to-use decorator could be created by calling RepositoryProxy.Create(oldRepo, newRepo);.

Of course, reflection can be a dangerous tool to wield. It worked well for us, but in other contexts or other languages, there might be better ways to handle this. Or it might be best to just accept a small amount of repetition in creating decorators.

The Database: an Implementation Detail

Being able to switch from one database to another is pretty cool. But that’s really just one positive outcome of a much deeper architectural concept: your database should be loosely coupled to your code.

I’ve worked in codebases where the database code was so intertwined with the business logic that there was little hope of ever switching from one SQL-based database to another, let alone moving to NoSQL. I’ve also worked on projects where an ORM abstracted our database only to introduce tight coupling with that framework. Those were not fun projects to work on because changes tended to be a slog.

Maybe you will never need to swap databases. Maybe an ORM will not give you grief. Whether you use the repository pattern or not, you will always find benefit in keeping your database coupling low. Because at the end of the day, the database should just be an implementation detail.

Conclusion

There are many reasons why you might switch databases, including factors like scalability and cost. In our case, the Pluralsight monolith was using (or perhaps misusing) a database which was not giving us the performance and availability we needed. I didn’t name the specific databases here to protect the innocent, but really it doesn’t matter since this pattern is database agnostic. Use of the repository pattern drastically lowered the difficulty of switching databases.

Reflection Details

Describing how we utilized reflection for the RepositoryProxy<T> goes outside the scope of this article. The details are very specific to C#, and honestly it goes outside my comfort zone! But for those who are interested, here is a simplified example:

public class RepositoryProxy<T> : RealProxy
{
    readonly T oldRepository;
    readonly T newRepository;
    readonly Config config;

    RepositoryProxy(T oldRepository, T newRepository, Config config) : base(typeof (T))
    {
        this.oldRepository = oldRepository;
        this.newRepository = newRepository;
        this.config = config;
    }

    public static T Create(T oldRepository, T newRepository, Config config)
    {
        return (T) new RepositoryProxy<T>(oldRepository, newRepository, config).GetTransparentProxy();
    }

    public override IMessage Invoke(IMessage msg)
    {
        var methodCall = (IMethodCallMessage) msg;
        var method = (MethodInfo) methodCall.MethodBase;

        if (method.DeclaringType != typeof(T))
        {
            var returnValue = method.Invoke(this, methodCall.Args);
            return new ReturnMessage(returnValue, null, 0, methodCall.LogicalCallContext, methodCall);
        }

        var attributes = method.GetCustomAttributes(typeof (IRepositoryAttribute), true);
        if (attributes.Any())
        {
            var returnValue = ((IRepositoryAttribute) attributes.First()).Execute(methodCall, method, oldRepository, newRepository, config);
            return new ReturnMessage(returnValue, null, 0, methodCall.LogicalCallContext, methodCall);
        }

        throw new NotImplementedException($"'{method.Name}' is not marked with a RepositoryAttribute on '{typeof (T).Name}'");
    }
}

public interface IRepositoryAttribute
{
    object Execute<T>(IMethodCallMessage methodCall, MethodInfo method, T oldRepository, T newRepository, Config config);
}

public class WriteAttribute : Attribute, IRepositoryAttribute
{
    public object Execute<T>(IMethodCallMessage methodCall, MethodInfo method, T oldRepository, T newRepository, Config config)
    {
        // logic to write to the repositories based on configured feature toggles
    }
}

public class ReadAttribute : Attribute, IRepositoryAttribute
{
    public object Execute<T>(IMethodCallMessage methodCall, MethodInfo method, T oldRepository, T newRepository, Config config)
    {
        // logic to read from the repositories and compare results based on configured feature toggles
    }
}

Categories: technical

Tags: devops, database, clean code