Testable Improvements

…the Scrum Team has completed a Sprint and wishes to improve in the next Sprint. Team members have collected information on their current performance, and are doing a Sprint Retrospective to determine what they should do to improve. Naturally, a team wants to take actions that have a real, lasting effect on performance.

✥       ✥       ✥ 

Self-improvement efforts are typically abstract platitudes. If a performance boost follows a planned change, it may simply be a coincidence.

It’s easy to decide to do something in the hope that it will improve team performance. The success of a good kaizen (incremental improvement; see Kaizen and Kaikaku) depends first on agreeing on a plan of attack, second on adhering to that plan, and third on testing whether the plan worked. To change the plan without testing the consequences is arbitrary behavior. Not following the plan of attack may mean that team members have interpreted the plan differently and that each is following his or her own interpretation. It is a waste of effort for the team to follow its plan blindly without heeding data that suggest the team is headed in the wrong direction. It feels good to focus on improvement, and it’s easy to confuse how hard the team is trying with the degree to which the team is remaining faithful to the new discipline. Without feedback about results, a change is as likely to wreak havoc as to create improvement. If the team is not conscious of what it is doing and of the degree to which it is faithful to the agreed kaizen, it will be difficult to ascribe any change in performance to the planned kaizen itself. This leaves the team not knowing whether to continue a given behavior in the long term.

When we take action to improve results, we expect to see improved results. Without specific objective measurement, we might just imagine improvement. For example, people who buy a dietary supplement designed to make them feel healthier and stronger may enjoy a placebo effect that leads them to feel healthier and stronger, regardless of any physiological improvement.

Similarly, when we take some action to improve results, we may subconsciously change other behaviors because we are watching ourselves. For example, a person who buys a fuel efficiency device will probably see an improvement in gas mileage, not because of the device but because he or she subconsciously changes driving habits to drive more efficiently. Who knows if the fuel efficiency device is even operating? A kaizen proposing that inspections might increase the fault detection rate brings as much focus to fault density as to the inspections themselves. Team members are likely to be unconsciously more circumspect about preventing faults during their work leading up to the inspection. It’s an evil form of the old adage that you get the results you measure for, and the fault lies in measuring results alone.

Some actions sound good, but without measurement we may invest considerable time and effort with no objective understanding of their impact. One of the authors worked at a large company while his division sought ISO 9001 certification. He asked the process coordinator whether he thought the actions required by certification would really result in any team improvement. The process coordinator was fully confident that ISO certification would spark considerable improvement. However, this author never saw any studies to determine whether quality or productivity had improved after ISO certification. The unspoken goal had become the certification rather than the culture of improvement it might have introduced. Furthermore, the certification evaluation is by nature somewhat subjective, and entails personal judgment about the degree of compliance.


Write improvement plans in terms of specific concrete actions (not goals) that the team can measure objectively to assess whether the team is applying the process change. First, measure to see if the team is following the planned action. Second, measure the change in performance to evaluate whether the kaizen had the desired results.

In short: say what you will do, and do what you say.

✥       ✥       ✥ 

If the team knows it has been following a planned set of improvement actions, then its members can assess results to know whether to continue on that path or rather to try something else.

In a Community of Trust such as a good Scrum Team there is no independent or external testing agent to scrutinize adherence to a plan of action. While Scrum follows the Toyota Production System in promoting transparency, research shows that decreasing employee monitoring can actually increase net transparency ([1]). The ScrumMaster should encourage team members to assess themselves with checklists and reflection.

Important: This pattern is as much about knowing whether people are taking the agreed-upon actions as it is about measuring whether the team has improved (e.g., did your velocity increase?). First, we want to find out if the team is actually adhering to the planned action. Second, we measure whether the desired improvement actually came to pass.

This is all about getting away from wishing and hoping we will get better, moving to actually doing something concrete to get better, and understanding whether actions really had the desired results.


Avoid measures that are hard to quantify, such as:

Note that if the actions are not testable, it is usually hard to know how to do them. For example, exactly how do you go about “communicating better”? The natural reaction is to not bother trying. You end up with retrospectives that are only window dressing, and the sense that this may be true becomes the elephant in the room. [2] Such pro forma actions lead people to view Scrum as an arbitrarily imposed set of hoops through which everyone must jump. That feeling can fuel apathy and cynicism.

Most Scrum traditions measure success in terms of ROI, some other impact on financials, or some other value proposition (see Value and ROI). Chris Matts [3] suggests that the business hold the Product Owners accountable to some direct, measurable outcome of their Value Stream management, within their purview of influence. For example, instead of measuring ROI, we might measure the market engagement with the product; a financial group or management can convert that to ROI if needed.

In order to implement Testable Improvements, you must have regular Retrospectives. Within a Retrospective, do the following:

  1. Examine the previous Testable Improvements to see whether the team actually did them, and whether they had a positive impact.
  2. For each proposed improvement, ask how the team will validate the improvement (how you will know whether the team took the planned action, and to what extent). If you can’t validate the proposed improvement, don’t accept it.
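
The two Retrospective steps above can be sketched in Python. This is only an illustration, not part of the pattern: the `Improvement` record, the `accept` and `review` functions, and the `min_adherence` threshold of 0.8 are all hypothetical names and values chosen for the example, and a team would substitute its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Improvement:
    """One proposed or previously accepted improvement (hypothetical record)."""
    action: str
    validation: Optional[str] = None            # how the team will check it took the action
    adherence: Optional[float] = None           # fraction of occasions the action was taken, 0.0 to 1.0
    performance_change: Optional[float] = None  # change against the baseline, on the team's own scale


def accept(proposal: Improvement) -> bool:
    """Step 2: accept a proposal only if the team has said how it will validate it."""
    return bool(proposal.validation and proposal.validation.strip())


def review(previous: Improvement, min_adherence: float = 0.8) -> str:
    """Step 1: did the team do it, and did it have a positive impact?

    min_adherence is an assumed threshold; a team would choose its own.
    """
    if previous.adherence is None or previous.performance_change is None:
        return "not measured"
    if previous.adherence < min_adherence:
        return "not followed"
    if previous.performance_change > 0:
        return "followed, positive impact"
    return "followed, no positive impact"


if __name__ == "__main__":
    vague = Improvement(action="Communicate better")
    concrete = Improvement(
        action="Everyone attends every Daily Scrum",
        validation="Count the Daily Scrums nobody missed",
    )
    print(accept(vague))     # False: there is no way to validate it
    print(accept(concrete))  # True

    concrete.adherence = 0.9
    concrete.performance_change = 2.0
    print(review(concrete))  # followed, positive impact
```

Note that `review` checks adherence before it looks at the performance change: if the team did not follow the action, any change in performance cannot be ascribed to it.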

Note that some improvements require that the team “stop doing” something (e.g., “Stop picking your nose!”). Such improvements are generally easy to measure (e.g., record yourself on video and count the number of times you pick your nose).

To measure whether an improvement action actually works, you need a meter, a scale, and a baseline of the current performance. The team uses the scale to quantify the improvement, such as the percentage of Daily Scrum meetings nobody missed. The meter is the process for establishing a location on the scale, for example, counting the Daily Scrum meetings nobody missed and calculating the percentage at the end of the Sprint. The baseline is the reading taken before the change, against which later readings are compared. As with other measures such as velocity, these measures are owned by the team and are for the team’s use, though they remain transparent to all stakeholders. They should not be used for any external assessment of the team’s performance.
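
As a minimal sketch of the Daily Scrum example, the meter, scale, and baseline might look like this in Python. The `SprintRecord` structure, the function names, and the attendance figures are assumptions made up for illustration, not data from any real team.

```python
from dataclasses import dataclass


@dataclass
class SprintRecord:
    """Attendance tally for one Sprint (hypothetical structure)."""
    daily_scrums_held: int
    daily_scrums_nobody_missed: int


def meter(record: SprintRecord) -> float:
    """The meter: place a Sprint on the scale.

    The scale is the percentage of Daily Scrum meetings nobody missed.
    """
    if record.daily_scrums_held == 0:
        raise ValueError("no Daily Scrums were held this Sprint")
    return 100.0 * record.daily_scrums_nobody_missed / record.daily_scrums_held


def change_from_baseline(baseline: SprintRecord, current: SprintRecord) -> float:
    """Difference, in percentage points, between the current Sprint and the baseline."""
    return meter(current) - meter(baseline)


if __name__ == "__main__":
    # Illustrative numbers only.
    baseline = SprintRecord(daily_scrums_held=10, daily_scrums_nobody_missed=6)
    current = SprintRecord(daily_scrums_held=10, daily_scrums_nobody_missed=9)
    print(f"baseline: {meter(baseline):.0f}%")
    print(f"current:  {meter(current):.0f}%")
    print(f"change:   {change_from_baseline(baseline, current):+.0f} points")
```

Measuring the baseline with the same meter as later Sprints keeps the comparison honest: the only thing that varies between readings is the team’s behavior.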

You will be able to test whether you are moving toward Greatest Value.

Regarding objectivity and subjectivity: Jerry Weinberg says, “You can turn anything into a number.” (Jerry has long said this, and confirmed it again in a conversation with Jim Coplien on December 15, 2017.) This is a two-edged sword. On one hand, you can come up with numbers that are meaningless, and people grant them far more statistical significance than they deserve. On the other hand, you can take deeply meaningful “subjective” concepts such as team engagement and quantify their trend as improving (+1), getting worse (-1), or staying the same (0). These measures provide a foundation for discussion and powerful discovery, particularly as one explores the why behind them. See Happiness Metric.
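
One possible way to collect such a trend, sketched in Python: each team member reports +1, 0, or -1 for a concept such as team engagement, and the team tallies the answers. The `summarize` function and the sample votes are hypothetical and exist only for this example; the tally is a starting point for discussion, not a statistic.

```python
from collections import Counter
from typing import Dict, Iterable

VALID_VOTES = {1, 0, -1}  # improving, staying the same, getting worse


def summarize(votes: Iterable[int]) -> Dict[str, int]:
    """Tally +1 / 0 / -1 trend votes from the team (hypothetical helper)."""
    votes = list(votes)
    for vote in votes:
        if vote not in VALID_VOTES:
            raise ValueError(f"vote must be +1, 0, or -1, got {vote!r}")
    tally = Counter(votes)
    return {
        "improving": tally[1],
        "same": tally[0],
        "worse": tally[-1],
        "net": sum(votes),
    }


if __name__ == "__main__":
    # Illustrative votes from a five-person team.
    print(summarize([1, 1, 0, -1, 1]))
```

Keeping the three counts alongside the net figure preserves the disagreement within the team, which is often where the discussion of “why” begins.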

Contrast this pattern with Definition of Done, which is more about the result than the process used to obtain it. To a rough approximation, Testable Improvements is best when the process is the primary focus, and Definition of Done when the focus is on improving the product.

This is loosely related to the Japanese concept of kamishibai, which is a record of observations of conformance to standard work. The term kamishibai usually applies to management activities. Here, we intend that every team member self-monitor and that, in addition, the ScrumMaster continuously assess the team members’ faithfulness to their charted kaizen direction.

See also [4], pp. 73–128.

[1] Ethan S. Bernstein. “The Transparency Paradox: A Role for Privacy in Organizational Learning and Operational Control.” In Administrative Science Quarterly 57(2), 21 June, 2012, http://journals.sagepub.com/doi/abs/10.1177/0001839212453028 (accessed 2 November 2017).

[2] —. Wikipedia: Elephant in the room. https://en.wikipedia.org/wiki/Elephant_in_the_room (accessed 28 January 2019).

[3] Chris Matts. “Why business cases are toxic.” The Risk Manager, https://theitriskmanager.wordpress.com/2017/08/20/why-business-cases-are-toxic/, 20 August 2017 (accessed 7 April 2018).

[4] Mike Rother. Toyota Kata: Managing People for Improvement, Adaptiveness and Superior Results. New York: McGraw-Hill Education, 2009, pp. 73–128.

Picture credits: Shutterstock.com.