Part 1: Requirements
Ever since childhood, we have all wanted to know what grades we get on our report cards, and what these grades mean about how well we are doing. We want to be evaluated on parameters that we understand and can affect by our efforts.
A key issue in manufacturing is consistency as we go from a shop to a department, to an entire plant, and to the company as a whole. We don’t want to use parameters in terms of which excellent local performance can aggregate to poor global performance. Once performance measures are identified, the next challenge is to use them as a basis for management decisions that are in the best interest of the company while being fair and nonthreatening to employees. In particular, actions taken to improve one aspect of performance must not degrade another. In addition, in a lean environment, we need to consider the impact of improvement projects, both before and after they are carried out.
Measuring process compliance or results?
One possible approach to performance evaluation is to measure how closely our practice matches a standard of how things should be done. This is how you will be evaluated if you apply for the Malcolm Baldrige award or for ISO-900x certification. It matters little whether your outgoing quality is any good, as long as you follow the “right” processes. Iwao Kobayashi’s “20-keys” approach follows the same logic. The keys have names like “Cleaning and organizing,” or “Quick changeover,” and each key has 5 levels of achievement. By definition, a plant that is at level 5 in all 20 keys is excellent.
The advantage of process measures is that the corrective action for bad performance is always to bring the plant closer to compliance. But is it impossible for a plant to be at level 5 in all 20 keys and still chronically lose money? Don’t some of the keys matter more than others? The world would be simpler if a process existed such that compliance guaranteed excellence.
In fact, all the stakeholders in a factory care much more about the results it achieves than the processes by which it achieves them. Most commonly used are the five dimensions of Quality, Cost, Delivery, Safety, and Morale. More generally, Harvard’s R. Kaplan has proposed a “balanced scorecard” to measure multiple aspects of business performance, as opposed to just manufacturing performance.
Requirements on metrics
Metrics should be focused on results rather than process compliance. The Malcolm Baldrige award criteria, ISO-900x, and Kobayashi’s “20 Keys to Workplace Improvement” promote performance measurement based on checklists of how close actual shop practices are to some norm. The problem with this approach is that it is possible to score high on any of these checklists and still go bankrupt. In other words, it’s not what you do that counts but what good it does. The following quality metrics are examples of measuring results:
- Ratings by external agencies for consumer goods. The JD Power and Associates Initial Quality Surveys rate products based on the number of problems reported by buyers within the first three months of purchase, which reflect manufacturing quality. Consumer Reports publishes reliability and maintainability ratings for cars that have been on the road for several years, which are more indicative of design quality.
- Counts of customer claims. For parts sold to OEMs, Quality Problem Reports (QPR) are the equivalent of traffic tickets issued by customers. They require failure analysis by the supplier’s quality department and a report to the customer in a prescribed format, such as Ford’s “8D” or Honda’s “5P.” The rate at which such claims are issued is clearly a metric of the customers’ perception of your product quality.
- Rejection rates. Defectives are weeded out at Receiving, at various points inside the manufacturing process, at final test, and at customers’ Receiving. Rejection rates are calculated at each of these points and fed back to the managers in charge. There may, however, be too many of these rates to be usable by top management.
- First-pass yield. The first-pass yield may be the most useful measure of in-process quality for top management. It is the percentage of the production volume that makes it through the entire process unscathed — that is, without failing any inspection or test, and without being waived, reworked, or scrapped (See Figure 2).
Figure 2. The First-Pass Yield
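The first-pass yield definition above can be sketched in code. This is a minimal illustration with hypothetical unit records and flag names, not any plant's actual tracking system:

```python
# First-pass yield: the fraction of units that make it through the entire
# process without failing any inspection or test, and without being
# waived, reworked, or scrapped. Record fields are hypothetical.

def first_pass_yield(units):
    """units: list of dicts with boolean outcome flags per unit."""
    clean = sum(
        1 for u in units
        if not (u["failed_inspection"] or u["waived"]
                or u["reworked"] or u["scrapped"])
    )
    return clean / len(units)

# Example: 10 units; 2 reworked after failing inspection, 1 scrapped.
units = (
    [{"failed_inspection": False, "waived": False,
      "reworked": False, "scrapped": False}] * 7
    + [{"failed_inspection": True, "waived": False,
        "reworked": True, "scrapped": False}] * 2
    + [{"failed_inspection": True, "waived": False,
        "reworked": False, "scrapped": True}]
)
print(first_pass_yield(units))  # 0.7
```

Note that the two reworked units may well end up shipped as good product, which is why the plain yield overstates in-process quality relative to the first-pass yield.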
Part 3: Equipment
The aggregate metric for equipment most often mentioned in the Lean literature is Overall Equipment Effectiveness (OEE). I first encountered it 15 years ago, when a client’s intern, who had been slated to help on a project I was coaching, was instead commandeered to spend the summer calculating the OEEs of machine tools. I argued that the project was a better opportunity than just taking measurements, both for the improvements at stake for the client and for the intern’s learning experience, but I lost. Looking closer at the OEE itself, I felt that it was difficult to calculate accurately, analyze, and act on. In other words, it does not meet the requirements listed in Part 1.
The OEE is usually defined as follows:
OEE = Availability × Performance × Quality
A perfect machine works whenever you need it, and is therefore available 100% of the time. It works at its nominal speed, and therefore its performance is 100%, and it never makes a defective product, so that its yield is 100%, and so is its OEE. The OEE of a real machine is intended to reflect the combination of its failures to live up to these ideals.
The first problem is the meaning of availability. When we say of any device that it is available, we mean that we can use it right now. For a production machine that does one operation at a time, it would mean that it is both up and not in use. The main reason for it to be unavailable is that it is busy, which really shouldn’t count against it, should it? In telecommunications, availability for a telephone switch is synonymous with the probability that it is up. This is because it is supposed to be up all the time, and to have the capacity to handle all incoming calls. In principle, it could be unavailable because of saturation, but the availability formula does not even consider it. It is only based on uptime and downtime, or on time between failures and time to repair.
But a lathe doesn’t work like a telephone switch in at least two ways:
- It is rarely expected to work all the time: it may work two shifts a day, five days a week, and whether it is down the rest of the time has no effect on performance.
- If you have one work piece on a spindle, you can’t load another one at the same time, and the spindle is unavailable.
In the OEE context, we are not talking about the machine being available in the sense of being up and ready to take on a new task, but instead about the time available to a scheduler to assign it work in the course of a planning period, which may be a shift, a day, or whatever time interval is used in the factory.
If, in a 480-minute shift, a machine stops during a 30-minute break and has up to 60 minutes of unscheduled downtime and setups, then the planner can count on 480 − 30 − 60 = 390 minutes in which to schedule work. Against the 450 minutes of planned production time left after the break, this yields a ratio of: Availability = 390/450 = 87%.
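This arithmetic can be written out as a short function. The figures are the ones from the example; the convention of deducting breaks from the denominator follows common OEE practice but, as noted, conventions vary across companies:

```python
# Availability as a planner sees it: the fraction of planned production
# time (shift net of breaks) left after downtime and setups. Conventions
# for what to deduct vary; this follows the example in the text.

def availability(shift_min=480, breaks_min=30, downtime_setup_min=60):
    planned = shift_min - breaks_min            # 450 min of planned production time
    schedulable = planned - downtime_setup_min  # 390 min the planner can count on
    return schedulable / planned

print(round(availability(), 2))  # 0.87
```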
This assumes that the machine’s ability to do work is proportional to the time it is up. My first moped as a teenager was a hand-me-down from a relative that had been garaged for 7 years. It started fine when cold, but the spark plug started malfunctioning once it was warm, about 15 minutes later. It would stay up for 75% of a 20-minute ride but that didn’t mean it completed 75% of the rides. It actually left me stranded about 100% of the time; it was unusable. Likewise, your link to a server may work 99% of the time while uploading a large file and break every time you try to save it. The formula makes it look as if it has 99% availability when in fact it is 0%.
There is also an issue with deducting setups from available time, because, unlike breakdowns, setup time is not just a matter of the technical performance of the machine but is directly affected by operating policies. The planner can influence the amount of time used for setups, reducing it by increasing the size of production runs or, if setup times vary with each pair of from- and to-products, by sequencing them so as to minimize the total setup time.
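The sequencing idea can be illustrated with a simple heuristic. The following is a greedy nearest-neighbor sketch over a hypothetical from/to setup-time matrix; real planners use more sophisticated methods, since this problem is a form of the traveling salesman problem and greedy choices are not guaranteed optimal:

```python
# Sequence-dependent setups: total changeover time depends on the order
# of production runs. Greedy heuristic: from the current product, always
# switch to the remaining product with the shortest setup time.

def greedy_sequence(setup, start):
    """setup[a][b] = setup minutes when switching from product a to b."""
    remaining = set(setup) - {start}
    order, current, total = [start], start, 0
    while remaining:
        nxt = min(remaining, key=lambda p: setup[current][p])
        total += setup[current][nxt]
        remaining.remove(nxt)
        order.append(nxt)
        current = nxt
    return order, total

# Hypothetical 3-product setup-time matrix, in minutes.
setup = {
    "A": {"B": 10, "C": 40},
    "B": {"A": 15, "C": 5},
    "C": {"A": 20, "B": 30},
}
order, total = greedy_sequence(setup, "A")
print(order, total)  # ['A', 'B', 'C'] 15
```

Running A → B → C costs 10 + 5 = 15 minutes of setups, versus 40 + 30 = 70 for A → C → B: the same machine, the same products, and very different "availability."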
This is not to say that the formula is wrong but only that it commingles the effects of many causes and that its relevance is not universal. There may be better ways to quantify availability depending on the characteristics of a machine and the work it is assigned. Companies that calculate OEEs often do not bother with such subtleties and simply equate availability with uptime.
Performance is a generic term with many different meanings. As a factor in the OEE, it is the ratio of nominal to actual process time of the machine. If the machine actually takes two minutes to process a part when it is supposed to take only one, its performance is 50%. The times used are net of setups and don’t consider any quality issue, because quality is accounted for in the last factor. This factor is meant to account for microstoppages and reduced speeds, and it is a relevant and important equipment metric in its own right.
As discussed in Part 2, Quality is not a metric but a whole dimension of manufacturing performance with many relevant metrics. In the OEE, this factor is just the yield of the operation, meaning the ratio of good parts to total parts produced. It is not the First-Pass Yield, because reworked parts are still counted as good.
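Putting the three factors together shows how small shortfalls compound into a low aggregate number. The inputs below are illustrative values consistent with the earlier examples, not data from any particular plant:

```python
# The OEE multiplies three factors that are individually meaningful.
# Illustrative values: availability from the shift example, performance
# from the 2-minutes-instead-of-1 example, and a 95% yield.

def oee(availability, performance, quality):
    return availability * performance * quality

a = 390 / 450   # schedulable minutes / planned production minutes
p = 1.0 / 2.0   # nominal process time / actual process time
q = 95 / 100    # good parts (rework counted as good) / total parts

print(round(oee(a, p, q), 3))  # 0.412
```

The product, about 41%, tells management that something is wrong but not what: the availability, performance, and quality factors each point to different corrective actions, which is the case for unbundling.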
Conclusions on the OEE
While the OEE summarizes metrics that are individually of interest, not much use can be made of it without unbundling it into its different factors. Since the meaning and the calculation methods for its factors vary across companies, it cannot be used for benchmarking. Within a company and even within a plant, it is not always obvious that raising the OEE of every machine enhances the performance of the plant as a whole.
In principle, it should. Who doesn’t want all machines to be always available, perform to standard, and make good parts? The problem is that, in practice, increasing the OEE is often confused with increasing utilization, and that there are machines for which it is not a worthwhile goal. Such machines may be cheap auxiliaries to expensive ones, like a drill press following a large milling machine in a cell, or they may have been bought for their ability to take on a large variety of tasks on demand.
Unbundling the OEE into its component factors yields a more easily understandable set of equipment metrics that is less likely to mislead management. While these metrics can be collected on each piece of equipment, management must then be wary of aggregating them over machines that are intended to be used differently.
Part 4: Gaming and How to Prevent It
As massively practiced today, Management-by-Objectives (MBO) boils down to management imposing numerical targets on a few half-baked metrics, cascading this approach down the organization and giving individuals a strong incentive to spin their numbers. It is a caricature of the process Peter Drucker recommended almost 60 years ago, and he deserves no more of the blame for it than Toyota does for what passes as Lean in most companies that claim to implement it.
A non-manufacturing example of decadent MBO is the French police under former president Sarkozy, which was tasked by the government to decrease the crime rate by 3%/year while increasing the proportion of solved cases. According to the French press, this was achieved by gaming the numbers. The journalists first latched on to a reported yearly decrease in identity theft, which seemed unlikely. They found that police stations routinely refused to register complaints about identity theft on the grounds that the victims were the banks and not the individuals whose identities were stolen. A retired officer also explained how crimes were systematically downgraded with, for example, an attempted break-in recorded as the less severe “vandalism.”
The fastest way the police had found to boost the rate of case solutions was to focus on violations detected through their own actions, such as undocumented aliens found through identity checks. The solution rate for such crimes is 100%, because they are simultaneously discovered and solved. The challenge is to generate just enough of such cases to boost the solution rate without increasing the overall crime rate… To achieve this result, packs of police officers stalked train stations in search of offenders, as reported both by cops who felt this was not what they had joined up to do, and innocent citizens who complained about being harassed for their ethnicity.
In organizations affected by this kind of gaming, members work to make numbers look good rather than fulfill their missions. It is a widely held belief that you get what you measure and that people will always work to improve their performance metrics, but this is a simplistic view of human nature. This behavior does not come naturally. On their own, schoolteachers focus on educating children, not boosting test scores, and production operators on making parts they can take pride in. It takes heavy-handed management to turn conscientious professionals into metrics-obsessed gamers, in the form, for example, of daily meetings focused entirely on the numbers, backed up by matching human resource policies on retention, promotion, raises and bonuses.
But enough about police work. Let us return to Manufacturing, and list a few of the most common ways of gaming metrics in our environment:
- Taking advantage of bad metrics. As discussed in The Staying Power of Bad Metrics, many metrics commonly used in manufacturing are poorly defined, providing gaming opportunities, such as outsourcing in order to increase sales per employee.
- Stealing from the future. In sports, nothing is more dramatic than a victory clinched by points scored in the final seconds. The bell rings right after the ball spirals into the basket and the Cinderella team wins the championship. In business, the end of an accounting period is the end of a game, and, as it approaches, sales scrambles to close last-minute deals and manufacturing to ship a few more orders. This is what Eli Goldratt called the “hockey stick effect.” Of course, this is done by moving up activities that would otherwise have taken place a few days later, during the beginning of the next accounting period. As a consequence, the beginning of the period is almost quiescent. Not much is going on, but it will be made up at the end…
- Redefining 100%. Many ratios, by definition, top out at 100%. A machine cannot run 25 hours/day, and a manufacturing process cannot produce more good parts than the total it makes. This is why ratios like equipment uptime and first-pass yield top out at 100%. Any result under 100%, however, invites questions on how it could be improved. A common way to fob off the questioners is to decree, for example, that a particular machine could not possibly be up more than 85% of the time, and redefine the scale so that 85% uptime is 100% performance. For production rates in manual operations, the ratio of an operator’s output to a work standard is often used instead of process times or piece rates. Such ratios have the advantage of being comparable across operations, and are not capped at 100%. But their relevance depends on the work standard, and, when everybody in a shop performs at 140% of standard, chances are that the standards are engineered for this purpose.
- Leveraging ambiguity. Terms like availability, cycle time, or value added are used with different meanings in different organizations, creating many opportunities to game the metrics. If the product’s market share in the first quarter went from 1% to 2%, it doubled, but, if it went back to 1% in the second quarter, it went down by 1%.
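The market-share example above turns on quietly switching between relative change and percentage-point change. A few lines make the asymmetry explicit:

```python
# The same swing, described two ways: relative change on the way up
# ("it doubled"), percentage points on the way down ("down by 1%").

def relative_change(old, new):
    return (new - old) / old

def point_change(old, new):
    return new - old

q1, q2 = 0.01, 0.02             # market share: 1% in Q1, 2% in Q2
print(relative_change(q1, q2))  # 1.0  -> +100%: "it doubled"
print(relative_change(q2, q1))  # -0.5 -> relatively, it halved
print(point_change(q2, q1))     # -0.01 -> reported as "down by 1%"
```

Unless a report says which definition it uses, the same numbers can be spun as a triumph or as a minor dip.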
Why do people who, in other parts of their lives, may be model citizens, engage in such behaviors, ranging from spinning to cheating? One answer is that, with what MBO has degenerated into in many companies, management is co-opting metrics gamers into its ranks. It is not that gaming is human nature, but instead that you are actively weeding out those who don’t engage in it. Changing such habits in an organization is obviously not easy.
Assume, for example, that your goal is to be competitive by having a skilled work force, and that your analysis shows that it requires employees to stay for entire careers so that what they learn at the company stays in the company. You then apply a number of different methods to make it happen:
- Communications. You make sure that all employees know what you are doing.
- Career planning. You have human resources develop a plan with all employees so that each one knows what he or she can aspire to by staying with the company.
- Organized professional development. You organize formal training, on-the-job training, and continuous improvement to provide opportunities for employees to develop the skills they need to execute their plans.
- Job enrichment. You redesign the jobs themselves to make more effective use of each employee’s talents.
If employees appreciate their jobs and have long-term career perspectives within the company, few of them should quit or make excuses not to come to work today, and the results should be visible in lower employee turnover rates and absenteeism.
The metrics are there to validate the approaches taken to reach the goal, but the goal is not to improve the metrics. It is a subtle difference. If you have the flu, you have a fever, but your goal is to heal, not just to bring down the fever. Once you are healed, your fever will be gone, and the decrease in your temperature is therefore a relevant indicator of your healing process, but it is not the healing process. If bringing down the fever were the goal, you could “game” your temperature and bring it down without healing. This distinction existed in Drucker’s original writings about MBO, but got lost in implementation.
So, what can you do to prevent metrics gaming? Let us examine three strategies:
- Review the metrics themselves. Use the requirements listed in my first post on this subject. You may not be able to completely game-proof your metrics, but at least you could make sure that they make sense for your business and are not trivially easy to game.
- Decouple the metrics from immediate rewards. Piece rates used to be the most common form of payment for production work, but have almost entirely vanished in advanced manufacturing economies, and been replaced by hourly wages. Performance expectations are attached, but there is no direct link to the amount produced in a given hour of a given day. There are many reasons for this evolution:
- The pace of work is often set by machines or by a moving line, rather than by the individual.
- The best performance for the plant is not necessarily achieved by every individual maximizing output at all times.
- More is expected of all individuals than just putting out work pieces, including training or participating in improvement activities.
One consequence of this decoupling is that time studies are easier and more accurate than in a piece rate environment. The same logic applies in management: the more direct the link between metrics and individual evaluations, the more intense the gaming. Don’t make the metrics the key to promotions or to prizes representing a substantial part of a manager’s compensation. Use them only as indicators to inform discussions on plans and strategies.
- Increase the measurement frequency. The longer the reporting period, the more opportunities it offers for gaming the metrics by stealing from the future, and the more pronounced the hockey stick effect. Conversely, you can reduce it by measuring more often, and eliminate it by monitoring continuously, as is done, for example, by the electronic production monitors that keep a running tally of planned versus actual production in a line during a shift. Periods exist in accounting because of the limitations of data processing technology at the time the accounting methods were developed. In the days of J.P. Morgan, closing the books was a major effort that a company could only be done every so often. In 2012, there is no technical impediment to the “anytime close,” but the publication of periodic reports continues by force of habit. Metrics in the language of things as well as the language of money can be monitored continuously.
- Have third parties calculate the metrics. In principle, counting chips should be done more accurately by agents with no stake in where they may fall. In practice, this is not only expensive but does not always produce the desired result. It is the approach used in Management Accounting. A plant’s accounting manager, or comptroller, is not chosen by the plant manager; he or she reports directly to corporate finance, and has no motivation to humor the plant manager. This is a double-edged sword because, with neutrality, comes a distance from the object of the measurement that may cause misunderstandings, and Management Accounting leaders like Robert Kaplan, Orrie Fiume, or Brian Maskell have been struggling with the challenge of providing relevant, actionable information to managers for the past 30 years. Outside of Accounting, for metrics in the language of things, the closest you can come to having a third party produce the measurements is to have a computer system do it, based on automatic data acquisition. There is no opportunity for gaming, but the issues of relevance are as acute as in Management Accounting.
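The running tally of planned versus actual production mentioned in the discussion of measurement frequency can be sketched as follows. The takt time and checkpoint counts are hypothetical, and a real line-side monitor would of course update continuously rather than at a few checkpoints:

```python
# Continuous monitoring instead of end-of-period reporting: a running
# tally of planned versus actual cumulative output through a shift,
# as a line-side electronic monitor would display it.

def running_tally(takt_min, checkpoints):
    """takt_min: planned minutes per unit.
    checkpoints: list of (elapsed_minutes, actual_cumulative_count)."""
    report = []
    for minute, actual in checkpoints:
        planned = minute // takt_min        # units planned by this minute
        report.append((minute, planned, actual, actual - planned))
    return report

# Takt of 2 min/unit; hypothetical checkpoints 60, 120, 180 min into the shift.
for minute, planned, actual, gap in running_tally(2, [(60, 28), (120, 61), (180, 88)]):
    print(f"min {minute}: planned {planned}, actual {actual}, gap {gap:+d}")
```

Because the gap is visible within minutes of opening up, there is no quiet beginning-of-period lull to make up at the end, which is precisely what defeats the hockey stick.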
Editor’s note: this is part of a continuing series by Michel Baudin. As new sections become available we will add them to this article. You can also check out Michel’s blog here.