Engineering @ Pluralsight: Responsible, Autonomous Teams
At Pluralsight we’ve put a lot of thought and effort into developing the engineering practices we use every day. These practices grow out of principles that we’ve come to value. We’ve documented those principles and practices in our Engineering at Pluralsight document. On the left side of each page is a principle that we value, and on the right side is a list of items that we Do, Encourage, and Avoid. This post is going to focus on the principle: We Architect Our System to Support Responsible, Autonomous Teams.
What We Do
To support responsibly architecting our system for autonomy, here are the practices that we ask all engineers at Pluralsight to adopt:
We build highly cohesive, loosely coupled bounded contexts. There are three important concepts in this statement: highly cohesive, loosely coupled, and bounded contexts. We’ll start with the last one: bounded contexts. The concept of a bounded context comes from Domain-Driven Design and largely encompasses the idea of strategically dividing up a larger, complex system into smaller, manageable subsystems.
One significant advantage of bounded contexts is that as systems reach a certain size, it becomes increasingly difficult to have a unified model that works across a large codebase and a large database, etc. This is where the loosely coupled part comes in – we create a very loose coupling between bounded contexts which allows each context to have its own data and code models. In fact, each bounded context could even have its own tech stack. Each context also has its own database and if one bounded context’s data is needed in another bounded context, we prefer to share that data asynchronously via pub/sub messaging. In some cases we need to allow synchronous communication via APIs, but in no case do we ever allow one bounded context to talk directly to another bounded context’s database. This pushes the cross-bounded-context integration to the edges of the system – where messages are published or consumed or APIs are exposed.
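As a rough illustration of integration at the edges, here is a minimal sketch of two bounded contexts that each keep their own data model and share data only through published messages. Everything in it is hypothetical: the `MessageBus` class is a toy in-process stand-in for a real pub/sub broker (which would deliver messages asynchronously), and the `CoursesContext` and `SearchContext` names, the `course.updated` topic, and the message fields are invented for the example rather than taken from our systems.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


class MessageBus:
    """Toy in-process stand-in for a pub/sub broker (hypothetical).

    A real broker would deliver asynchronously; this one calls handlers
    inline so the sketch stays self-contained.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


class CoursesContext:
    """Owns course data. No other context reads its store directly."""

    def __init__(self, bus: MessageBus) -> None:
        self._bus = bus
        self._db: Dict[str, dict] = {}  # private to this context

    def save_course(self, course_id: str, title: str, author: str) -> None:
        self._db[course_id] = {"title": title, "author": author}
        # Integration lives at the edge: publish the change as a message.
        self._bus.publish(
            "course.updated",
            {"id": course_id, "title": title, "author": author},
        )


@dataclass
class SearchEntry:
    """The search context's own model of a course, shaped for lookups."""

    course_id: str
    text: str


class SearchContext:
    """Keeps its own copy of course data in its own store and model."""

    def __init__(self, bus: MessageBus) -> None:
        self._index: Dict[str, SearchEntry] = {}  # private to this context
        bus.subscribe("course.updated", self._on_course_updated)

    def _on_course_updated(self, message: dict) -> None:
        text = f'{message["title"]} {message["author"]}'.lower()
        self._index[message["id"]] = SearchEntry(message["id"], text)

    def search(self, term: str) -> List[str]:
        needle = term.lower()
        return [e.course_id for e in self._index.values() if needle in e.text]


bus = MessageBus()
courses = CoursesContext(bus)
search = SearchContext(bus)
courses.save_course("c1", "Domain-Driven Design Basics", "Ada Example")
matches = search.search("domain")  # answered from the search context's own store
```

The only thing the two contexts agree on is the shape of the message. Either side can change its internal model, or its whole tech stack, without the other noticing.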
With the integration pieces pushed to the edges, this allows the internals of the bounded contexts to be isolated and cohesive. Bounded contexts are cohesive because everything that is needed to deliver a user experience from design all the way to production is contained within that bounded context. This is because bounded contexts are created by dividing our system into vertical, not horizontal, slices.
We divide our product into full vertical slices. A vertical slice means that everything from the database to the UI is contained in a single bounded context. Contrast this with dividing the system horizontally, where you could end up with, for example, a front-end app and back-end services. Dividing the system horizontally means that bounded contexts (and therefore teams) are highly coupled to each other. If one team owns the UI and another team owns the back-end API for that UI, then the front-end team gets blocked whenever they need changes to the API.
When you divide a system vertically, you find seams within the system and cut along them such that everything from the database to the front-end is contained within the bounded context. When you combine this with cohesive, loosely coupled bounded contexts, you create an environment that scales because every team can work autonomously with very little friction between teams. You can keep creating teams and while the overall system complexity increases, the local complexity for each team is minimized and each team can execute mostly independently.
We collaborate with our architecture team on system design. Our architecture team is largely a support organization instead of a dictatorial authority. They certainly have opinions about and provide guidance to teams on how they should architect their systems, and ideally teams are reaching out to their architects for support, but ultimately the internal architecture of a bounded context is up to the teams. Having said that, the architecture team does own the overall system architecture. This is because there are important architectural decisions that affect all teams, especially due to the global complexity that is created by our loosely coupled bounded contexts. We all agree to work with the architecture team on items that affect the system globally such as how we communicate between bounded contexts. We also involve our architects when making big changes internal to a bounded context to seek their input.
We follow our technology radar. This is a piece of our Engineering @ Pluralsight document that probably needs a slight update. Our technology radar used to determine which technologies and tools were allowed at Pluralsight. We have recently pushed much of that decision-making authority down to the teams and their associated business units, but we still use our technology radar as a guide. The technology radar shows which technologies, languages, tools, frameworks, etc. are used and how broadly. We value autonomy, but we also encourage responsibility. Teams should consult the technology radar and the architecture team when considering which technologies they are going to use within a bounded context. To be good citizens within our organization, we need to consider how our decisions might affect the organization and Pluralsight as a whole. While we prefer to create local simplicity for teams, we need to be sure we’re not compromising the entire system by over-optimizing at the local level.
We participate in architecture guild. Guilds at Pluralsight are groups of people who share a common interest and so they create a guild where they can talk and learn together. The architecture guild is a place to discuss architectural concerns that affect the system as a whole. This guild is an opportunity to share cross-cutting concerns that are important for each team to know. We ask that a member of each team attend regularly. Anyone can bring up concerns or questions and together we address them.
What We Encourage
To support architecting our system for responsible, autonomous teams, here are the practices that we encourage all engineers at Pluralsight to adopt. They are important to us and we believe that they will bring additional value, but we also respect the autonomy of our teams and engineers and so, while we believe that these are valuable, we allow some flexibility here based on the context in which each team is working.
We favor asynchronous integrations between bounded contexts. Asynchronous communication has a few benefits. First of all, it is easier to push it to the edges of a bounded context’s systems. A team can create listeners that listen to published messages and store the data in their database, and those listeners can be completely separate from the rest of the bounded context’s systems.
Secondly, and equally important, asynchronous communication is more resilient. Teams publish data as it changes and that data is then replicated, via messaging, to all bounded contexts that need it. And this happens in advance of when the data is needed. Each bounded context stores the data, as a local cache, in their own database. The big advantage here is that if one part of the system goes down, other parts of the system can continue to operate with their cached data. Of course, this doesn’t work out perfectly, so we try to create a workable, degraded functionality when a part of our system is down and this is much easier when we already have a set of cached data.
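To make the resilience argument concrete, here is a minimal sketch of a context that serves requests purely from its locally cached copy of another context’s data. The names (`CoursePageContext`, `on_author_updated`, the `author_id` and `name` message fields) are hypothetical and chosen only for illustration. The listener is called directly here, where in practice it would be wired to a message subscription.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CoursePage:
    title: str
    author_name: str
    degraded: bool  # True when we had to fall back to a placeholder


class CoursePageContext:
    """Renders course pages using only data held in its own store."""

    def __init__(self) -> None:
        # Local cache of author data, replicated ahead of time via messages.
        self._authors: Dict[str, str] = {}

    def on_author_updated(self, message: dict) -> None:
        """Listener at the edge of the context: store the published data."""
        self._authors[message["author_id"]] = message["name"]

    def render(self, title: str, author_id: str) -> CoursePage:
        # No call to the authors context is made at request time, so its
        # availability does not affect whether this page can be served.
        name = self._authors.get(author_id)
        if name is None:
            # Data has not been replicated yet: serve a workable,
            # degraded page rather than failing the request.
            return CoursePage(title, "Unknown author", degraded=True)
        return CoursePage(title, name, degraded=False)


pages = CoursePageContext()
pages.on_author_updated({"author_id": "a1", "name": "Ada Example"})
cached_page = pages.render("Intro to Messaging", "a1")
fallback_page = pages.render("Intro to Messaging", "a2")
```

Because the read path never leaves the context, an outage in the owning context only means the cached copy stops receiving updates. The fallback branch covers the case where the data never arrived at all.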
Of course, there are cases when asynchronous communication doesn’t make sense, such as algorithmically heavy data that is computed in real time, or cases when there is a very low tolerance for eventual consistency. We’ve found, however, that those cases are the exception rather than the rule, and so we encourage the use of asynchronous communication where possible. There’s a lot that goes into making this work well, including self-healing and cache invalidation, but that could be a whole blog post of its own.
We choose practices that match our team and problem context. Every team is unique, as is every bounded context it owns; architecting our systems to allow for team autonomy makes it possible for teams to choose the practices that fit their unique situation. Engineering @ Pluralsight acts as a guide rail and includes both prescribed and encouraged practices, but teams can also discover and create practices of their own that are unique to them. This is much easier because our systems and teams are loosely coupled. There are a lot of decisions that teams have the freedom to make on their own so long as they are responsibly considering the impacts their decisions have on the whole system and organization.
What We Avoid
There is one practice that we ask teams to avoid in order to support responsible, autonomous teams. It has to do with how we share data between teams.
We do not share data stores between bounded contexts. This is an important rule for autonomy. As soon as one team can access another team’s database, that database becomes more rigid and harder to change. It’s important that teams are able to move quickly within their bounded context, and wherever there is coupling, especially across teams, it slows us down. Our teams know that they can freely change their database, including dropping columns, changing data types, or dropping tables, so long as their own systems are ready for the change. There’s no need to check with other teams because we strictly prohibit teams from touching another team’s database.
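One way to picture why this rule keeps databases changeable is the sketch below: a team swaps its internal storage layout while the message it publishes stays the same, so no consumer can tell the difference. It is an illustration under assumptions, not a description of any real schema; `SingleColumnStore`, `SplitColumnStore`, and `author_updated_message` are hypothetical names, and plain dictionaries stand in for database tables.

```python
from typing import Dict, Protocol, Tuple


class AuthorStore(Protocol):
    """What the owning team's own code needs from its storage."""

    def save(self, author_id: str, first: str, last: str) -> None: ...

    def display_name(self, author_id: str) -> str: ...


class SingleColumnStore:
    """Original layout: one full-name column per author."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, author_id: str, first: str, last: str) -> None:
        self._rows[author_id] = f"{first} {last}"

    def display_name(self, author_id: str) -> str:
        return self._rows[author_id]


class SplitColumnStore:
    """New layout: the full-name column is dropped in favor of two columns."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, str]] = {}

    def save(self, author_id: str, first: str, last: str) -> None:
        self._rows[author_id] = (first, last)

    def display_name(self, author_id: str) -> str:
        first, last = self._rows[author_id]
        return f"{first} {last}"


def author_updated_message(store: AuthorStore, author_id: str) -> dict:
    """Build the message other contexts consume; its shape is the contract."""
    return {"author_id": author_id, "name": store.display_name(author_id)}


old_store = SingleColumnStore()
new_store = SplitColumnStore()
for store in (old_store, new_store):
    store.save("a1", "Ada", "Example")

old_message = author_updated_message(old_store, "a1")
new_message = author_updated_message(new_store, "a1")
```

Both stores yield an identical message. Since other teams only ever see that message, the owning team needs to coordinate the schema change with nobody but itself.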
Conclusion
The end result of all of this is that we have 40-plus teams at Pluralsight who all are able to work autonomously and deliver within their own systems with minimized friction across teams. When a developer gets to work in the morning, they know that they can work on a feature with their team and ship it all the way to production as soon as they’re ready. Of course, we need to collaborate with other teams because we’re all working within the same system, but the goal is to minimize, as much as possible, the rigidity and friction of cross-team coupling. Architecting our system to support responsible, autonomous teams helps with that.