Fixed time/fixed scope engineering work at Pluralsight
Arguably, the question most frequently asked of an engineer is: “When will this be done?” And yet in our engineering principles document we stipulate that “we don’t do fixed time/fixed scope work”. No wonder, then, that one of the most frequent questions from other parts of our organization is: “How can we make an agreement or sign a contract if we don’t have a fixed point in time when we can all agree that something will be ready for our clients?”
In order to answer this question, we first need to connect it to a few more of our principles and practices:
“We optimize for value delivery”
One of the ways that we optimize for flow efficiency (Lean framework) is to remove secondary work. For example: we don’t keep detailed backlogs, and we don’t have “backlog grooming” sessions. We don’t spend a lot of time arranging work that is not actively being worked on. Of course there is planning of features and coordination at the product roadmap level. For development though, as part of our Lean methodology framework, we practice “just in time” discovery and design of a feature or card.
Contrary to “just in time” work, estimation (the main process by which we can agree to a timeline) needs a certain degree of technical discovery done ahead of time. The more time spent on technical discovery, the more confidence we can have in our estimates, although only up to a certain point; there are some features that will only reveal their complexity once the team has started working on them. Also, there is the risk of discovery work becoming obsolete by the time the actual work is ready to begin.
One of the factors that can complicate the estimation of a feature is the maturity of the codebases involved. Older codebases can hide a lot of pitfalls behind a seemingly simple update. I have compared an old codebase to an episode of “This Old House” (a construction-related TV series in which the crew renovates old houses), where moving an outlet by a foot could mean rebuilding half the house (!), depending on what gets uncovered behind the wall (fire hazards, systems not up to code, etc.).
On the other hand, a new codebase can produce unforeseen problems or challenges of its own. Working on a new codebase, like new construction, means designing and building at the same time. Some of the design can be done ahead of time, but a lot is determined at the moment of building. Usually that process involves a million small, yet important, decisions, sometimes within the course of a single day. For these decisions, we need the whole team to be present and weigh in on the trade-offs. This is part of the way we work, a process baked into our pairing and mobbing practices, in order to keep everyone informed of the decisions and to get everyone’s input and ideas. In cases like this, the process (and progress) can be slow at times, because we are prioritizing the resilience of the team and the codebase, which we trade off against the speed of development.
“We are empowered and responsible”
Usually a feature or project cannot be completed entirely within the boundaries of a single team. Following our principles around how a team works brings new complexity in this scenario. Let’s take the example of a new project or user journey that needs to be implemented by at least two different teams. The two teams will need to choose how to address the problem, but will also need to identify ways to collaborate; maybe share ownership and functionality of a single page, or work out a solution to minimize the dependency on each other’s Bounded Context (BC) in case of failures or errors in cross-BC communication.
Architecting a solution in this scenario is a creative process that might require its own brainstorming, cost-benefit analysis, alternative design production, synthesis, and agreements. Most of the time, the more fail-safe solutions require the heaviest investment in time and effort, but they also pay off in the stability and maintainability of our systems.
Side effects of such efforts will probably include re-prioritization of each team’s work, maybe even remaking agreements with other teams or business commitments. OKRs and product roadmaps are also impacted through this process.
“We create and maintain psychological safety” and “We create quality products that we are proud of”
Those of us who have worked in client services, consultancy, and/or as contractors can tell you that the art of estimation (to meet an agreed timeline) invokes strong negative feelings, especially anxiety about the accuracy of the estimates.
Why?
Estimates of time and effort, in some businesses, are the main way of calculating cost, price, and client billings. Consequently, they have to be made as accurate as possible. However, estimation is an inaccurate process and could be off, with detrimental consequences:
- usually personal overtime needs to be done ASAP at the risk of burn-out,
- the business will have to invest time and effort in damage control,
- no one gets compensated for the extra time and effort,
- the person or team responsible for the delay may feel like an outcast.
Engineering at Pluralsight is part of our effort to shift the above common practice to a very different narrative: Teams use OKRs, as well as product and technology roadmaps, in order to support the company’s strategic goals and commit to timelines (at quarter-level granularity). Additional work and/or time-sensitive work that interferes with these goals is carefully vetted for its effectiveness and cost-benefit before getting added to the team’s workload. All teams involved in that effort agree to the timeline and scope. Agreements are (re)made, and the high-risk points are understood by everyone involved in the process. Flexibility, along with clear ownership of the steps and good use of decision-making frameworks such as RAPID, helps support all efforts.
Risks
You might ask: “What is the real problem when an estimate goes wrong? Why can’t we put in some extra hours of work, or add a couple more people (or a team) to work on it?” The main issue with adding more people to a process is that the process has to slow down first, while the new people are onboarded, before it can speed up (see Fred Brooks’s “The Mythical Man-Month”).
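The slowdown described above can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions, not data from our teams: the function names, the four-week ramp-up, and the mentoring cost are all hypothetical numbers chosen for illustration. The only part taken from “The Mythical Man-Month” is the observation that the number of pairwise communication paths in a team of n people is n(n-1)/2.

```python
def communication_paths(team_size: int) -> int:
    """Pairwise communication paths in a team: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


def weekly_output(veterans: int, newcomers: int, week: int,
                  ramp_up_weeks: int = 4, mentoring_cost: float = 0.5) -> float:
    """Toy estimate of team output (in 'engineer-weeks') for a given week.

    Hypothetical assumptions:
    - each veteran produces 1.0 unit per week;
    - a newcomer ramps up linearly from 0.0 to 1.0 over ramp_up_weeks;
    - while ramping up, each newcomer consumes veteran time for mentoring,
      starting at mentoring_cost and shrinking as the newcomer ramps up.
    """
    ramp = min(1.0, week / ramp_up_weeks)
    mentoring = newcomers * mentoring_cost * (1.0 - ramp)
    veteran_output = max(0.0, veterans - mentoring)
    newcomer_output = newcomers * ramp
    return veteran_output + newcomer_output


if __name__ == "__main__":
    print("paths with 4 people:", communication_paths(4))
    print("paths with 6 people:", communication_paths(6))
    for week in range(0, 5):
        before = weekly_output(veterans=4, newcomers=0, week=week)
        after = weekly_output(veterans=4, newcomers=2, week=week)
        print(f"week {week}: without newcomers {before}, with newcomers {after}")
```

In this toy model, a team of four that adds two people produces less in the first week than it did before, and only overtakes its old pace as the newcomers ramp up. The real curve depends on the codebase and the team, which is exactly why it is hard to estimate.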
As engineers, we start from the principle that “We create quality products that we are proud of”, yet sometimes, in crunch mode, the time trade-off is too great to avoid. Work done in crunch mode will inevitably accrue more technical debt than work on other codebases (i.e., more code will need to be refactored, code comments might reveal a need for a 2.0 version, or our solution won’t scale to a generic solution).
Lack of extensibility might also be a side effect of a codebase being worked on in crunch mode: The use case that we are solving for is the use case that we have in front of us right now, and we might not be thinking about the more generalized use case that will need to be supported in the future. The trade-offs need to be understood and a solution needs to be agreed upon by all the stakeholders.
The more complex the interrelation between the functionality owned by multiple teams, the easier it might be for one team to implement a solution in their BC that would have been better owned by another team/BC. This way a BC might end up owning functionality that will first need to be moved and then iterated on by the proper team. This process goes against our goal of delivering value fast.
Any process of the “building the plane as we fly it” type involves a constant conversation or negotiation of scope as new or more defined needs take shape around the new functionality. Each conversation might lead to reevaluating or rewriting part of the codebase.
A time-sensitive deliverable might jeopardize the long-term strategic goals of the business. The “urgent” and the “important” might get confused and the wrong things might get prioritized. It happens. Sometimes because people are in the hot seat with clients or big contracts, sometimes because fear and panic take over.
Last but not least: Sometimes scope changes seem harmless, but for every exception made, there is a possible future where the exception was not communicated to every person or team involved, and miscommunication and misalignment are the natural consequences.
Mitigation of risk: What can each of us do to help?
- We acknowledge that sometimes, when asked the question of “Can we do X?”, the answer cannot be “No”, but we will respectfully challenge the basic assumptions so that we can collectively arrive at an elegant solution and agreement.
- We allow for enough buffer time when coming up with estimates.
- We make our best effort to prioritize work that will unblock other teams that have more critical or time-sensitive work to complete.
- We document functional or technical specs for this type of work and keep them updated with any pivots or new decisions.
- We include all the teams impacted by an agreement in the making of that agreement. It seems self-evident, but the more teams involved, the harder it is to get everyone at the table at critical moments.
- We provide clarity as to the ownership of all parts and we provide RAPIDs, which take some time to set up but work well.
- We present a common front to external customers, and we try to provide documentation of the details and intricacies so revenue and leadership can effectively explain and advocate based on the nuances and trade-offs.
- We try to minimize additional scope creep and we try to avoid doing constant discovery “on the fly” for any question related to a feature or product, because it interferes with the teams’ focus and speed.
- After the completion of this kind of work, we allow the teams time to take care of technical debt that might have been accrued, and we reflect on what went well and what didn’t in order to update our practices.
Despite best efforts, and even if all risk mitigation is implemented, there might come a time with fixed time/fixed scope work where we are faced with an impossible deadline. In those situations, transparency, frequent communication, and creative solutioning are to be prioritized in order to minimize the impact on the teams as well as the business. Explaining the constraints and pivots to our clients goes a long way. We have seen big clients agree to delaying the general release of a product by a month or more due to problems encountered along the way.
In conclusion, we would like to invite all groups in Pluralsight involved in a signed deal to think with a bit more flexibility in mind. Can we be creative about the progression of a timeline? Can we bake language into our contracts to allow for flexibility in case of a “perfect storm”? (Think of a hypothetical scenario where there is a pandemic + reorganization + attrition + hiring freeze happening all at the same time.) What could we have put into place in our contracts to ease some of the finality of the deliverables? (Could we have remade some agreements?)
Note: This post has been created in collaboration with the Center of Excellence, specifically with the help of Todd Fisher and Markus Neuhoff.