Improving the User Experience by Scaling Experimentation
At Pluralsight we strive to be deliberate in everything we do, especially when it comes to the product experience. Long-time readers of this blog have no doubt heard about Directed Discovery, our framework for user-centered design pioneered by Nate Walkingshaw. Product managers (PMs) have leveraged it for years to discover unmet needs and delight customers. Recently, as our data infrastructure has matured, we’ve examined how quantitative data gathering could play a larger role within Directed Discovery.
Voice of customer (VOC) interviews are a great way to learn about users’ needs and motivations and to understand their pain points, which helps define the key problem space. Customer preference testing (CPT) interviews are great for getting direct feedback on mockups and understanding why users prefer particular design solutions. However, these interviews can be prone to bias. For example, only certain kinds of users will be willing to take the time to chat. And without building an MVP, it’s difficult to put users in context, so they can’t know how they’d actually react to a particular design or solution. As important as interviews are, one of the overarching issues with such qualitative feedback is, as Nir Eyal states in Hooked, that “people’s declared preferences… are far different from their revealed preferences.” In other words, what people say they want is often different from what they demonstrate by their actions.
Context is Crucial
How exactly can product practitioners get around that? First, by usability testing, where users are observed in context. And, more importantly for this post, via data collected as users traverse a live AB test. While we had done some AB testing on the product at Pluralsight, we hadn’t invested the time or effort needed to efficiently integrate it into our Directed Discovery process, so we determined we needed to skill up. If you’re unfamiliar, AB testing is a way to randomly present users with one of two page designs, for example, and then track how they respond over time on that page or in other parts of the app. In other words, a product team might hypothesize that their new design increases user engagement, so they’d set up a test to determine whether users who are randomly assigned the B experience subsequently engage more than those who are randomly assigned design A (i.e., the current or control experience). Often, experiments lead to specific qualitative inquiry to understand the why behind behaviors seen in the experiment.
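To make the mechanics concrete, here is a minimal sketch, in Python, of how users might be deterministically assigned to a variant. It is purely illustrative (the function name, hashing scheme, and ids are invented for this post, not our actual implementation), but the property it shows, that a given user always lands in the same bucket for a given experiment, is what makes downstream measurement trustworthy.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically bucket a user into 'A' (control) or 'B' (treatment).

    Hashing the experiment name together with the user id gives each user a
    stable assignment per experiment while keeping the overall split ~50/50.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map to a number in [0, 1]
    return "B" if bucket < split else "A"

# The same user sees the same experience every time for this experiment.
print(assign_variant("user-123", "new-dashboard-layout"))  # hypothetical ids
```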
While such experimentation had been used in the product before, we hadn’t invested the time to determine how it should be scaled across our 20+ product experience teams. If we were going to scale it, we wanted to provide appropriate tools, process, and education to do it via best practices. This required research, and yes, some discovery efforts.
Build Measure Learn
As this blog has mentioned before, Pluralsight engineering culture is dedicated to Lean Development, which focuses on flow efficiency, short iteration cycles, and eliminating unnecessary code or functionality. Our engineering teams have long implemented many of these principles via mobbing, single-piece flow, and autonomous teams (including a product manager and designer) that plan their own work. An important lesson from The Lean Startup is that we should be relentless in eliminating the waste of building features or products that don’t improve the user experience. The way to best achieve this is by leveraging a build-measure-learn loop.
While it’s important for developers to work efficiently, it’s crucial that they ship the right thing. The build-measure-learn loop from The Lean Startup helps verify that they do. Instead of focusing on shipping quickly, teams focus on maximizing the rate at which they learn about their users. This means rigorously testing the assumptions that come out of user interviews via MVPs and AB tests. Instead of celebrating any new feature, it means celebrating features that have moved the needle on key user experience or business metrics.
Before vetting potential AB testing solutions to help us accelerate such efforts, we needed to determine the cultural and technical hurdles that inhibited past experimentation.
Discovery to Improve our Discovery
We leverage Directed Discovery not only with external customers but also when building tools or establishing processes internally. Here we interviewed internal product managers to determine pain points, needs, and tooling in the testing space; we combined this with external interviews and research to determine how other firms had scaled experimentation on their product teams.
We brought internal folks together from product, design, data science, product analytics, and engineering to study the interview transcripts and build affinity maps organizing the various concepts, thoughts, and feelings that arose in the interviews. We then worked through a customer journey map to chart the ideal flow of a product manager through an AB testing framework. A few key needs arose out of this work (not all of them technical):
- A framework around when teams should move from interviews to actually building MVPs and running experiments
- A way to easily create two experiences and deploy them in a short but rigorous experiment
- A clear idea of what metrics are most important to the business (for judging the test)
- Broad visibility into standard and reproducible analyses
- Socialization of results to maximize organization, distribution, and understanding
When Should I Test?
Product teams at Pluralsight have long known when and how to do customer interviews, when to generate hypotheses, and when to gather feedback on mockups. As we researched integrating quantitative insights into Directed Discovery, we found that product managers had as many questions around when to run a test as they did around how to run one. For example: How certain do we have to be before building? If we test, do we still run CPTs with mockups? When do we skip the test altogether? Scaling experimentation requires nearly as much process education as technical investment. To facilitate this, our practices organization is developing best practices for blending qualitative and quantitative research (i.e., mixed methods) within Directed Discovery.
Just Let Me Test
We found that for many PMs, being able to test an assumption quickly is key. Product teams at Pluralsight are autonomous and used to making decisions without onerous process. Following our cultural adoption of lean, these testing tools (since they’re internally focused) should treat the PM as the flow unit. Of course, the ultimate goal is still to improve the customer experience. Focusing the tooling on the PM requires that the various components of an experiment (building, feature flagging, web tagging, analysis, and so on) be flow efficient for them, i.e., provide a high proportion of value-adding activities relative to throughput time.
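As a rough illustration of that flow-efficiency goal, here is a hypothetical sketch (all names invented for this post) of what PM-facing tooling might aim for: the experiment is declared once, and variant selection plus exposure tracking happen behind a single call rather than being hand-wired for every test.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    split: float = 0.5             # fraction of users who see variant B
    metric: str = "course_starts"  # primary metric the test is judged on (hypothetical)

    def variant_for(self, user_id: str) -> str:
        # Same deterministic bucketing idea as the earlier sketch.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return "B" if int(digest[:8], 16) / 0xFFFFFFFF < self.split else "A"

def track_event(user_id: str, event: str, **props) -> None:
    """Stand-in for the web-tagging/analytics call a team would already have."""
    print(f"track {event} user={user_id} props={props}")

# The PM declares the experiment once; feature code asks for a variant and the
# exposure is logged automatically instead of being wired up per test.
dashboard_test = Experiment(name="new-dashboard-layout")

def render_dashboard(user_id: str) -> str:
    variant = dashboard_test.variant_for(user_id)
    track_event(user_id, "experiment_exposure",
                experiment=dashboard_test.name, variant=variant)
    return "new layout" if variant == "B" else "current layout"

print(render_dashboard("user-123"))
```

The point of a design like this is that the PM’s touchpoints shrink to declaring the experiment and reading the results; everything in between is shared plumbing.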
Metric Clarity
In our interviews, despite the fact that Pluralsight has KPIs and quarterly OKRs, even seasoned product managers expressed frustration with the lack of metric focus in their part of the app. While they felt empowered to make decisions, they often didn’t know the right metric to optimize toward. We’ll discuss this more in the next post, but a crystal-clear focus on one or two north star metrics is crucial not only for scalability but also for running meaningful tests that build on each other rather than pulling the product in opposite directions.
Trustable Analyses
While PMs, of course, trust their own analyses, the ad hoc nature of past experiments created trust issues around the quantitative research conducted by other PMs. You’ll always trust your own conclusions more than others’ findings, but we were determined to close that gap. One of the main technical needs that came out of our experimentation discovery was standardized calculations, reproducible analyses, and visibility into the process: in other words, a stats engine with clear visibility into its source code.
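To give a flavor of what standardized, visible calculations could look like, here is a minimal sketch of the kind of shared function such a stats engine might expose: a standard two-proportion z-test for a conversion-style metric. The function and the example numbers are illustrative only, not our actual engine.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a: int, n_a: int,
                          conversions_b: int, n_b: int):
    """Standard two-proportion z-test for a conversion-style metric.

    Returns the absolute lift (B minus A) and a two-sided p-value, so every
    team reads the same numbers computed the same way.
    """
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Illustrative inputs only: 10,000 users per variant, made-up conversion counts.
lift, p = two_proportion_z_test(conversions_a=480, n_a=10_000,
                                conversions_b=540, n_b=10_000)
print(f"lift={lift:.4f}, p-value={p:.3f}")
```

Keeping calculations like this in one version-controlled place, with the source readable by every PM, is what closes the trust gap described above.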
Can I Find That Research?
Considering our long history of qualitative user research, we have good internal tooling for centralizing qualitative findings (i.e., a knowledge repo). This allows teams to organize their own work and study past results before embarking on new research. We found that a similar solution is needed for quantitative results. We determined that this internal product was needed not only to organize research, but also to trumpet successes and provide visibility into team impacts on the business.
Conclusion
Whew, easy, right? Some of these solutions require process change and cultural shifts. Those issues aren’t fixed in a week, but they can be solved with coordination and education. The remaining issues, though (standardizing metrics, calculations, and analyses, plus socializing results in a knowledge repo), require serious technical investment. To do product experimentation well, and for it to scale as we grow, we’ll have to either build a solution or rely on a third-party vendor. The decision-making process around build vs. buy, and what an MVP might look like, will be detailed in Part 2. Come back soon!
For Part 2 in this series, see here