Measuring Impact In The Dojo
Last month at Agile Day Chicago, I (Joel) had the pleasure of listening to Mark Graban speak about separating signal from noise in our measurements. Mark referenced Process Behavior Charts, a technique described in the book Understanding Variation: The Key to Managing Chaos by Donald J. Wheeler. This simple tool helps us look at metrics over time and understand the difference between naturally occurring variations and signals, or variation in the metrics representing real changes. Wheeler calls both of these (signal and noise) “The Voice of the Process,” with the key being able to distinguish between the two. Signals can be indicators that a desired change is manifesting, or they can be indicators that something is wrong and requires further investigation.
We immediately saw the value in being able to separate signal from noise when evaluating the types of metrics we’re capturing in the Dojo that we talked about in our last post. We both grabbed copies of the book, devoured it quickly, and started brainstorming on applications for Process Behavior Charts.
Let's look at an example of how to use Process Behavior Charts in the Dojo.
This may sound obvious, but before you start any measurement think about the questions you want to answer or the decisions you want to make with the data you’ll collect.
In the Dojo, we help teams shift from a project to product mindset. We focus on delivering specific outcomes, not simply more features. When delivering a new feature, the obvious question is: did the feature have the desired outcome?
Imagine yourself in this type of situation…
We’re working with a team and we’re helping them move from a project model to a product model. In the past, the team cranked out features based on stakeholders’ wishes and success was simply judged on whether the features were delivered or not. We’re helping the team shift to judging success on whether outcomes are achieved or not.
We’re also working with the stakeholders and there’s resistance to moving to a product model because there’s fear around empowering the teams to make product decisions. New features are already queued up for delivery. Before we give the team more ownership of the product, the stakeholders want delivery of some of the features in the queue.
We can use this as a coaching opportunity.
The stakeholders believe the next feature in the queue will lead to more sales - more conversions of customers. The team delivers the feature. Now we need to see if we achieved the desired outcome.
Our first step is to establish a baseline using historical data. Luckily, we’re already capturing conversion rates and for the 10 days prior to the introduction of the new feature the numbers look like this:
Sample Data Set
Then we look at the data for the next 10 days. On day 11, we have 14 conversions. Success, right? But on day 12, we have 4 conversions. Certain failure?
Here’s the full set of data for the next 10 days:
Sample Data Set, Next 10 Days After Feature Introduced. Did It Matter?
Overall, it looks better, right? The average number of conversions has increased from 6.1 to 7.9. The stakeholders who pushed for the new feature shout “success!”
Given a system that is reasonably stable, a Process Behavior Chart shows you what values the system will produce without interference. In our case, that means what values we can expect without introducing the new feature. Let's create a Process Behavior Chart for our example and see if our new feature made a difference.
First Step - Chart Your Data In A Time Series and Mark the Average
Plotting Daily Conversions, Average Marked In Dotted Red Line
What does this show us? Well, not much. Roughly half of our points are below average and half are above average (some might call that the definition of average).
Second Step - Calculate the Moving Range Average
Our next step is to calculate the average change from day to day. Our day-to-day changes would be 2, 4, 4, 2, 6, 3, 2, 5, 3 for an average change of 3.4. All this means is that on average, the number of conversions changes by about 3 from one day to the next. If we were to plot these day-to-day changes, we would see roughly half above and half below that average - again, the definition of average.
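For readers who would rather script this step, here is a minimal Python sketch. It starts from the nine day-to-day changes quoted above, since those are the values we can reproduce here. The helper name moving_range_values is our own invention for illustration, and it assumes you pass it your own list of daily counts.

```python
from statistics import mean


def moving_range_values(daily_counts):
    """Return the absolute change between each pair of consecutive days.

    Hypothetical helper: pass in your own list of daily counts.
    """
    return [abs(today - yesterday)
            for yesterday, today in zip(daily_counts, daily_counts[1:])]


# The nine day-to-day changes quoted in the text for the first 10 days.
moving_ranges = [2, 4, 4, 2, 6, 3, 2, 5, 3]

# The moving range average is just the mean of those changes.
average_moving_range = mean(moving_ranges)
print(round(average_moving_range, 1))  # 3.4
```

Ten days of data give nine day-to-day changes, which is why the list has nine entries.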
Third Step - Calculate The Upper And Lower Bounds
To calculate the upper and lower bounds, you take the moving range average and multiply it by 2.66. Why 2.66? Great question - and it is well covered in Don Wheeler's book. In brief, you could calculate out the standard deviation and look at 3 sigma, but 2.66 is faster, easier to remember, and ultimately tells the same story.
We take our moving range average of 3.4 and multiply it by 2.66, giving us 9.044. What does this number mean? It means that with normal variance (the Voice of the Process), we can expect conversions to fluctuate by up to 9.044 above or below our average number of conversions, which was 6.1.
To put it more clearly, without any intervention or new features added, we should expect between 0 and 15 conversions per day - and that would be completely normal.
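The same arithmetic can be sketched in a few lines of Python. The function name process_limits and its floor parameter are our own assumptions for illustration. The floor reflects that a count of conversions cannot go below zero.

```python
def process_limits(average, average_moving_range, factor=2.66, floor=0.0):
    """Return (lower, upper) limits around the average.

    factor is the 2.66 constant from the text; floor clamps the lower
    limit because a negative number of conversions is not possible.
    """
    spread = factor * average_moving_range
    return max(average - spread, floor), average + spread


# Values from the text: average of 6.1 conversions, moving range average of 3.4.
lower, upper = process_limits(6.1, 3.4)
print(lower, round(upper, 1))  # 0.0 15.1
```

Without the floor, the lower limit works out to roughly -2.9, which is why the range is stated as 0 to 15.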
Let's visualize this data. We add our upper and lower bounds to our chart for our first 10 days. It now looks like this:
Data With Process Control Limits Applied. UPC - Upper Process Control, LPC - Lower Process Control. NOTE - Since the LPC is actually -3, we use 0 since a negative is not possible
Fourth Step - Introduce Change & Continue To Measure
We have established the upper and lower bounds of what we can expect to happen. We know that after the feature was introduced, our conversion numbers looked better. Remember, the average went up almost 30% (from 6.1 to 7.9) - so that is success, right?
We extend our chart and look to see if the change actually made a difference.
Conversion Chart With Upper And Lower Process Controls. Note - Average, UPC, and LPC Were Not Updated With New Data Points To Prove The Next 10 Days Fell Within Previous Dataset Limits
Our average for the next 10 days was higher, but looking at what we could normally expect the system to produce, all of the conversions were within the expected range. In essence, the feature we delivered did not have a meaningful impact on our conversions.
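If you want to automate this check, a rough Python sketch might look like the following. It covers only the simplest rule, a single point falling outside the limits. The resources we recommend at the end of this post describe other patterns that count as signals. The function name is hypothetical, and the sample input is limited to the two post-release values quoted earlier (14 on day 11 and 4 on day 12).

```python
def points_outside_limits(values, lower, upper, first_day=1):
    """Return (day, value) pairs that fall outside the process limits."""
    return [(day, value)
            for day, value in enumerate(values, start=first_day)
            if value < lower or value > upper]


# Limits from the first 10 days: average 6.1, plus or minus 2.66 * 3.4 = 9.044,
# with the lower limit floored at 0.
lower, upper = 0.0, 6.1 + 9.044

# Only the two post-release values quoted in the text (days 11 and 12).
known_values = [14, 4]
print(points_outside_limits(known_values, lower, upper, first_day=11))  # []
```

An empty list means no single day stands out from the noise. Note that the limits stay fixed at their baseline values, so the new points are judged against what the system did before the change.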
Note, we’re not saying that nothing could be learned from delivering the new feature. The point we’re making is that prior to delivering the feature we assumed it would lead to an increase in conversions. Using a Process Behavior Chart we were able to show our assumption was invalid.
Now we can continue the conversation with the stakeholders around empowering the team to improve the product. Maybe now they'll be more open to listening to what the team thinks will lead to an increase in conversions.
We like using this visual display of data to help us concretely answer questions focused on whether or not our actions are leading to the intended outcomes. For example, we are experimenting with Process Behavior Charts to measure the impact of teaching new engineering and DevOps practices in the Dojo.
Process Behavior Charts can be powerful, but they require that you ask the right questions, collect the right data, and take the right perspective. Using a Process Behavior Chart to prove a change is beneficial to one part of the value stream (e.g., the “Dev” group) while not taking into consideration the impact to another group (e.g., the “Ops” group) would be missing the point. Consider the complete value stream when you are looking at these charts.
For more information on these charts, as well as the math behind them and what other trends in data are significant, we recommend the following:
Understanding Variation - The Key To Managing Chaos; Don Wheeler
Lean Blog - Mark Graban, in particular this post on home runs in the World Series
Process Behavior Charts (also called Shewhart Charts) – this article talks about various patterns that are statistically significant
If you found this helpful and you adopt Process Behavior Charts, please let us know how you are using them and what you are discovering.
Dojo Metrics - Moving From What Is Easy to Capture to What Matters
A fair question to ask when starting a Dojo (or any initiative for that matter) is “how do we know this is working?” Invariably, right on the heels of that question somebody always brings up the idea of capturing metrics. Then they turn to us and say “What are the right metrics for the Dojo?”
The best metrics provide insights that help us take action to improve the current situation. In the case of a new initiative like a Dojo, that action might be making a decision to continue the initiative, modify it, or end it.
Sadly, metrics are often arbitrary or they tell an incomplete story. Single metrics fail to capture the interplay and tradeoffs between different metrics. We’ve heard many stories of how organizations optimizing for one metric created detrimental results overall. (We’re looking at you, capacity utilization.)
How Do We Measure the Effectiveness of the Dojo?
The primary goal of the Dojo is to foster learning. We need to measure the effectiveness of that learning and ultimately, we need to measure the economic impact that learning has on the organization. But it’s not learning at any cost. We’re aligned with Don Reinertsen on this point.
In product development, neither failure, nor success, nor knowledge creation, nor learning is intrinsically good. In product development our measure of “goodness” is economic: does the activity help us make money? In product development we create value by generating valuable information efficiently. Of course, it is true that success and failure affect the efficiency with which we generate information, but in a more complex way than you may realize. It is also true that learning and knowledge sometimes have economic value; but this value does not arise simply because learning and knowledge are intrinsically “good.” Creating information, resolving uncertainty, and generating new learning only improve economic outcomes when cost of creating this learning is less than its benefit.
Don Reinertsen - "The Four Impostors: Success, Failure, Knowledge Creation, and Learning"
Reinertsen stresses the need to generate information efficiently. This is easy to understand when thinking in terms of generating information that helps you make decisions about your product. For example, it’s a fairly straightforward exercise to determine the costs for generating information by running low-fi, paper prototype tests that answer the question “should we include this feature or not?”
It’s also easy to understand how you might measure the effectiveness of knowledge creation when helping teams make improvements on their continuous delivery pipelines. We can calculate the cost of learning DevOps practices and compare that to expenses saved by automating manual processes.
What’s not as easy to understand is how to measure the impact of learning cloud native architecture or microservices - or something even more nebulous, like product thinking and the impact of learning a design practice like personas.
We would expect the impact of these learnings to result in lower development costs, decreased cycle times, and increased revenues resulting from better market fit for our products. But – there is a high degree of uncertainty as to the level of impact these learnings are going to have on the organization. (Again, hat tip to Don Reinertsen. His post about looking at the economics of technical debt influences our thinking here.)
In addition, during a team’s tenure in the Dojo it’s quite probable that their productivity will decrease as the team is creating new knowledge and incorporating new practices. The team's investment in learning carries a cost.
Ultimately, we need to understand the impact the Dojo has on lifecycle profits. That impact will often occur after a team has left the Dojo.
We have started organizing metrics in the Dojo into three groups. Our goal is to help orient stakeholders, leaders, and teams around what actions these metrics will help them take. We also want to help them understand the level of effort required to collect the metrics and the timeframes in which they will be available.
Three Categories of Metrics for the Dojo
Simple To Capture - Organizational Reach
These metrics simply show the amount of “touch” the Dojo has.
Examples include:
Number of teams going through the Dojo
Total number of attendees
Number of Programs / Portfolios touched
Astute readers may critically call these “vanity metrics” and they would not be wrong. These metrics do not equate to impact. They don’t help us answer the questions “Were the right teams involved?”, “Did the amount of learning that happened justify the investment?”, or “How much learning stuck?”
However, these metrics are simple to collect and can be used as leading indicators once we have metrics on the economic impact the Dojo has on teams. For many organizations, these metrics are important because they imply value as the Dojo is being bootstrapped, even though they don't prove it. They are metrics everyone is comfortable with.
Harder To Capture – Directional/Team Based Improvements
Metrics in this category are more important than the previous category in the sense that these metrics look at the directional impact of learning in the Dojo and how that learning is impacting teams.
Examples include:
Number of automated tests
SQALE code quality index
Percentage reduction in defects
Cycle time reduction to deliver a product increment
Velocity / Story count (with the obvious caveat that these can be easily gamed)
Again, these metrics are far from perfect. The testing related metrics do not prove the right tests were written (or the right code for that matter). Metrics showing products were built faster don’t shed any light on whether those products should have been built in the first place (what if nobody buys them?).
What these metrics do show is the incorporation of product delivery practices that are being taught in the Dojo - practices that our experience and the experiences of other organizations have shown to have a positive impact on lifecycle profits. These metrics can be collected with agile project management software, SonarQube, Hygieia, or other comparable tools.
When we use these types of metrics we need to have a baseline. It’s helpful to have data for teams for two to three months prior to when they enter the Dojo. We don’t always have this baseline, however, and in some cases the best we can do during a team’s tenure in the Dojo is help them establish the baseline. Obviously, we want to track these metrics for teams after they’ve left the Dojo to see how well new practices are sticking.
Difficult To Capture – Impact/Economic Improvements
Metrics in this group are challenging - not only to collect but also because using them to drive action challenges the way many organizations work. These are the metrics that force us to look at the question “Is this initiative having a positive economic impact on the organization?”
Examples include:
Increase in sales conversion
Cycle time reduction for a delivery with impact (not just delivery, but a delivery that mattered)
Systematic cost reductions (not silo optimizations that may have detrimental effects in other areas)
Savings resulting from killing bad product ideas early in the discovery/delivery cycle
Metrics like these can prove initiatives like the Dojo are having a positive impact on lifecycle profits. These metrics will be substantially harder to collect. We need to collect data for a much longer period of time. We need to align with the finance department in our organizations. And, we need whole product communities aligned around a shared understanding of what successful outcomes look like. In addition, we need to understand how to separate real signals of change from noise. (This post has more on that topic.)
Ultimately, this last category of metrics is what matters. This is where the Dojo shines. We work with teams to teach the practices, thinking, and communication strategies that will have an impact on lifecycle profits.
This is an ongoing area of improvement for us. This is what we are currently practicing. These categories of metrics are helping foster conversations about what knowledge individual metrics can provide and about the value of investing in the Dojo.