Thursday, November 1, 2007

Non-Normal Distributions in the Real World

The Normal Law rests on the Central Limit Theorem, which is probably why the vast majority of people who have taken a statistics course at a business or engineering school believe that the Normal Law is common in the real world.

That's a big myth: the Normal Law is very rare, especially in manufacturing. In fact, this is the very foundation of Shewhart's notion of a "controlled process": if the Normal Law applied everywhere as if by magic, he would have had no need to introduce that concept in his book "Statistical Method from the Viewpoint of Quality Control".

It is astonishing that some practitioners in the quality field reinvent the wheel even after they have heard of Walter A. Shewhart, whereas others seem never to have even started down that path. Below I reproduce an article by Thomas Pyzdek.

I'm not affiliated with him and don't know him at all. I am posting his views here only to illustrate the claim above and to serve as a reference for the article on my other blog about "Shewhart/Deming Statistical Process Control vs Six Sigma".

Non-Normal Distributions in the Real World
Copyright © 2000 by Thomas Pyzdek, all rights reserved
Reproduction allowed if no changes are made to content

One day, early in my quality career, I was approached by my friend Wayne, the manager of our galvanizing plant.

"Tom," he began, "I've really been pushing quality in my area lately, and everyone's involved. We're currently working on a problem with plating thickness. Your reports always show a 3-percent to 7-percent reject rate, and we want to drive that number down to zero."

I, of course, was pleased. The galvanizing area had been the company's perennial problem child. "How can I help?" I asked.

"We've been trying to discover the cause of the low thicknesses, but we're stumped. I want to show copies of the quality reports to the team so they can see what was happening with the process when the low thicknesses were produced."

"No problem," I said, "I'll have them for you this afternoon."

Wayne left, and I went to my galvanizing reports file. The inspection procedure called for seven light poles to be sampled and plotted each hour. Using the reports, I computed the daily average and standard deviation by hand (this was before the age of personal computers). Then, using a table of normal distribution areas, I found the estimated percent below the low specification limit. This number had been reported to Wayne and a number of others. As Wayne had said, the rate tended to be between 3 percent and 7 percent.

I searched through hundreds of galvanizing reports, but I didn't find a single thickness below the minimum. My faith in the normal distribution wasn't shaken, however. I concluded that the operators must be "adjusting" their results by not recording out-of-tolerance thicknesses. I set out for the storage yard, my thickness gage in hand, to prove my theory.

Hundreds of parts later, I admitted defeat. I simply couldn't find any thickness readings below the minimum requirement. The hard-working galvanizing teams met this news with shock and dismay.

"How could you people do this to us?" Wayne asked.

This embarrassing experience led me to begin a personal exploration of just how common normal distributions really are. After nearly two decades of research involving thousands of real-world manufacturing and nonmanufacturing operations, I have an announcement to make: Normal distributions are not the norm.

You can easily prove this by collecting data from live processes and evaluating it with an open mind. In fact, the early quality pioneers (such as Walter A. Shewhart) were fully aware of the scarcity of normally distributed data. Today, the prevailing wisdom seems to say, "If it ain't normal, something's wrong." That's just not so.

For instance, most business processes don't produce normal distributions. There are many reasons why this is so. One important reason is that the objective of most management and engineering activity is to control natural processes tightly, eliminating sources of variation whenever possible. This control often results in added value to the customer. Other distortions occur when we try to measure our results. Some examples of "de-normalizing" activities include human behavior patterns, physical laws and inspection.

Human Behavior Patterns

Figure 1 shows a histogram of real data from a billing process. A control chart of days-to-pay (i.e., the number of days customers take to pay their bills) for nonprepaid invoices showed statistical control. The histogram indicates that some customers like to prepay, thus eliminating the work associated with tracking accounts payable. Customers who don't prepay tend to send payments that arrive just after the due date. There is a second, smaller spike after statements are sent, then a gradual drop-off. The high end is unbounded because a few of the customers will never pay their bills. This pattern suggests a number of possible process improvements, but the process will probably never produce a normally distributed result. Human behavior is rarely random, and processes involving human behavior are rarely normal.

Figure 1. Days between mailing of invoice and receipt of payment.

Physical Laws

Nature doesn't always follow the "Normal Law" either. Natural phenomena often produce distinctly non-normal patterns. The hot-dip galvanizing process discussed previously is an example. A metallurgist described the process to me (but too late, alas, to prevent the aforementioned debacle) as the creation of a zinc-iron alloy at the boundary. The alloy forms when the base material reaches the temperature of the molten zinc. Pure zinc will accumulate after the alloy layer has formed. However, if the part is removed before the threshold temperature is reached, no zinc will adhere to the base metal. Such parts are so obviously defective that they're never made.

Thus, the distribution is bounded on the low side by the alloy-layer thickness, but (for all practical purposes) unbounded on the high side because pure zinc will accumulate on top of the alloy layer as long as the part remains submerged. Figure 2 shows the curve for the process - a non-normal curve.

Figure 2. The distribution of zinc-plating thicknesses.
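The galvanizing mismatch can be reproduced with a small simulation. The sketch below is only an illustration under assumed numbers: a floor of 3.0 thickness units for the alloy layer, an exponential build-up of pure zinc with mean 1.0, and a lower specification limit of 2.5. None of these values, nor the function names, come from the article. It fits a normal curve to the simulated readings, as the hand calculation with the normal table did, and compares the predicted fraction below the limit with the fraction actually observed.

```python
import random
import statistics
from statistics import NormalDist


def simulate_thickness(n, floor=3.0, mean_excess=1.0, seed=0):
    """Hypothetical model: alloy-layer floor plus exponential zinc build-up."""
    rng = random.Random(seed)
    return [floor + rng.expovariate(1.0 / mean_excess) for _ in range(n)]


def normal_estimate_below(data, lower_limit):
    """Fraction below lower_limit predicted by a normal fit to the data."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return NormalDist(mu, sigma).cdf(lower_limit)


def observed_below(data, lower_limit):
    """Fraction of readings actually below lower_limit."""
    return sum(x < lower_limit for x in data) / len(data)


thickness = simulate_thickness(10_000)
lsl = 2.5  # assumed lower specification limit, below the physical floor
predicted = normal_estimate_below(thickness, lsl)
actual = observed_below(thickness, lsl)
print(f"normal-table estimate below limit: {predicted:.1%}")
print(f"observed below limit:              {actual:.1%}")
```

With these assumed parameters the normal fit predicts that several percent of parts fall below the limit, while the observed fraction is exactly zero, because no simulated reading can fall under the floor.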


Inspection

Sometimes inspection itself can create non-normal data. ANSI Y14.5, a standard for dimensioning and tolerancing used by aerospace and defense contractors, describes a concept called "true position." The true position of a feature is found by converting an X and Y deviation from target to a radial deviation and multiplying by two. Even if X and Y are normally distributed (of course, they usually aren't), the true position won't be. True position is bounded at zero and the shape often depends solely on the standard deviation.
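The true-position calculation is easy to try out. The sketch below assumes, purely for illustration, that the X and Y deviations are independent, centered on target and share one standard deviation; the function names are hypothetical. It converts each simulated pair to twice its radial deviation and reports a simple skewness measure.

```python
import math
import random
import statistics


def true_position(dx, dy):
    """Twice the radial deviation from target."""
    return 2.0 * math.hypot(dx, dy)


def simulate_true_position(n, sigma=1.0, seed=0):
    """True positions for normal X and Y deviations (assumed independent)."""
    rng = random.Random(seed)
    return [
        true_position(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for _ in range(n)
    ]


def skewness(data):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mu) / sd) ** 3 for x in data) / len(data)


tp = simulate_true_position(20_000)
print("minimum :", min(tp))
print("mean    :", statistics.fmean(tp))
print("median  :", statistics.median(tp))
print("skewness:", skewness(tp))
```

Under these assumptions the simulated values are never negative and are skewed to the right (the mean sits above the median), and changing sigma only rescales them.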

Many other inspection procedures create non-normal distributions from otherwise normal data. Perpendicularity might be normally distributed if the actual angle were measured and recorded. Quite often, though, perpendicularity is measured as the deviation from 90 degrees, with 88 degrees and 92 degrees both being shown as 2 degrees from 90 degrees. The shape of the resulting distribution varies depending on the mean and standard deviation. Its shape can range from a normal curve to a decidedly non-normal curve. This apparent discrepancy also applies to flatness, camber and most other form callouts in ANSI Y14.5. The shape of the curve tells you nothing about your control of the process.
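The effect of recording only the unsigned deviation can be sketched the same way. The example below assumes normally distributed angles with a standard deviation of 1 degree and two hypothetical process means, 90 and 95 degrees; these numbers are illustrative, not from the article.

```python
import random
import statistics


def recorded_deviation(angle_deg, nominal=90.0):
    """What the inspector writes down: unsigned deviation from nominal."""
    return abs(angle_deg - nominal)


def simulate_recorded(mean_angle, sigma, n=20_000, seed=0):
    """Recorded deviations for normally distributed actual angles."""
    rng = random.Random(seed)
    return [recorded_deviation(rng.gauss(mean_angle, sigma)) for _ in range(n)]


def skewness(data):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mu = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mu) / sd) ** 3 for x in data) / len(data)


centered = simulate_recorded(90.0, 1.0)  # mean on nominal: values fold at zero
offset = simulate_recorded(95.0, 1.0)    # mean far from nominal: folding is rare

print("88 and 92 degrees record as:", recorded_deviation(88.0), recorded_deviation(92.0))
print("skewness, mean at 90:", skewness(centered))
print("skewness, mean at 95:", skewness(offset))
```

With the mean on nominal the recorded values pile up against zero and are strongly skewed; with the mean well away from nominal the same measurement rule gives a nearly symmetric, normal-looking curve.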

At this point, a purist might say, "So what?" After all, any model is merely an abstraction of reality and in error to some extent. Nevertheless, when the error is so large that it has drastic consequences, the model should be re-evaluated and perhaps discarded. Such is often the case with the normal model.

Process capability analysis (PCA) is a procedure used to predict the long-term performance of statistically controlled processes. Virtually all PCA techniques assume that the process distribution is normal. If it isn't, PCA methods, such as Cpk, may show an incapable process as capable, or vice versa. Such methods may predict high reject rates even though no rejects ever appear (as with the galvanizing process discussed earlier) or vice versa.
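The galvanizing story shows a normal model predicting rejects that never appear. The opposite error can be sketched with the usual Cpk definition, min(USL - mean, mean - LSL) / (3 * sigma). The data model below (10 plus an exponential with mean 1) and the specification limits of 8 and 14 are assumptions chosen for illustration, not figures from the article.

```python
import random
import statistics
from statistics import NormalDist


def cpk(data, lsl, usl):
    """Cpk from the sample mean and standard deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


def normal_fraction_outside(data, lsl, usl):
    """Reject fraction implied by a normal fit to the data."""
    nd = NormalDist(statistics.fmean(data), statistics.stdev(data))
    return nd.cdf(lsl) + (1.0 - nd.cdf(usl))


def observed_fraction_outside(data, lsl, usl):
    """Reject fraction actually present in the data."""
    return sum(x < lsl or x > usl for x in data) / len(data)


rng = random.Random(0)
# Assumed skewed process: bounded below at 10, long tail on the high side.
data = [10.0 + rng.expovariate(1.0) for _ in range(50_000)]
lsl, usl = 8.0, 14.0  # assumed specification limits

index = cpk(data, lsl, usl)
predicted = normal_fraction_outside(data, lsl, usl)
actual = observed_fraction_outside(data, lsl, usl)
print(f"Cpk: {index:.2f}")
print(f"rejects predicted by normal model: {predicted:.2%}")
print(f"rejects observed:                  {actual:.2%}")
```

Here the index comes out close to 1, yet the long upper tail produces several times more rejects than the normal model predicts.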

If you're among the enlightened few who have abandoned the use of "goal-post tolerances" and PCA, you'll find that assuming normality hampers your efforts at continuous improvement. If the process distribution is skewed, the optimal setting (or target) will be somewhere other than the center of the engineering tolerance, but you'll never find it if you assume normality. Your quality-improvement plan must begin with a clear understanding of the process and its distribution.

Failure to understand non-normality leads to tampering, increased reject rates, sub-optimal process settings, failure to detect special causes, missed opportunities for improvement, and many other problems. The result is loss of face, loss of faith in SPC in general, and strained customer-supplier relations.

1 comment:

Christopher said...

While I agree with a great deal of what is in this article... Tom does not approach the notion of sampling in normality to discover when something is truly a special cause (regardless of the distribution). Am I missing something here?