Nutek, Inc. Quality Engineering Seminar, Software, and Consulting (Since 1987)
Multiple Criteria of Evaluations for Designed Experiments
Ranjit K. Roy, Ph.D., P.E.
Proper measurement and evaluation of performance is key to comparing the performances of products and processes. When there is only one objective, a carefully defined quantitative evaluation most often serves the purpose. However, when the product or process under study must satisfy multiple objectives, performances can be scientifically compared only when the individual criteria of evaluation are combined into a single number. This short paper deals with a method of handling evaluations of multiple objectives together by combining them into an Overall Evaluation Criteria (OEC).
In engineering and scientific applications, measurements and evaluations of performance are everyday affairs. Although there are situations where measured performances are expressed in terms of attributes such as Good, Poor, Acceptable, Deficient, etc., most evaluations can be expressed in terms of numerical quantities (instead of Good and Bad, use a 0 - 10 scale). When performance evaluations are expressed in numbers, they can be conveniently compared to select the preferred candidate. The task of selecting the best product, a better machine, a taller building, a champion athlete, etc. is much simpler when there is only one objective with performance measured in terms of a single number. Consider a product such as a 9-volt transistor battery whose functional life, expressed in hours, is the only characteristic of concern. Given two batteries, Brand A (20 hours) and Brand B (22.5 hours), it is easy to determine which one is preferable. Now suppose that you are concerned not only about the functional life, but also about the unit costs, which are $1.25 for Brand A and $1.45 for Brand B. The decision about which brand of battery is better is no longer straightforward.
Multiple performance objectives (or goals) are quite frequent in the industrial arena. A rational means of combining various performances evaluated by different units of measurement is essential for comparing one product performance or process output with another. In experimental studies utilizing the Design of Experiments (DOE) technique, performances of a set of planned experiments are compared to determine the influence of the factors and the combination of the factor levels that produce the most desirable performance. In this case the presence of multiple objectives poses a challenge for analysis of results. Inability to treat multiple criteria of evaluations (measure of multiple performance objectives) often renders some planned experiments ineffective.
Combining multiple criteria of evaluation into a single number is quite common practice in academic institutions and sporting events. Consider the method of expressing a Grade Point Average (GPA, a single number) as an indicator of a student's academic performance. The GPA is determined simply by averaging the grades of all the courses (such as scores in Math, Physics, or Chemistry - the individual criteria of evaluation) which the student has taken. Another example is a figure skating competition, where all performers are rated on a scale of 0 to 6. The performer who receives 5.92 wins over another whose score is 5.89. How do the judges come up with these scores? People judging such events follow and evaluate each performer on an agreed-upon list of items (criteria of evaluation) such as style, music, height of jump, stability of landing, etc. Perhaps each item is scored on a scale of 0 - 6, and then the average scores of all the judges are averaged to come up with the final scores.
If academic performance and athletic ability can be evaluated by multiple criteria and expressed in terms of a single number, why isn't the same commonly done in engineering and science? There is no good reason why it should not be. For a little extra effort in data reduction, multiple criteria can easily be incorporated into most experimental data analysis schemes.
To understand the extra work necessary, let us examine how scientific evaluations differ from those of student achievement or athletic events. In academic as well as athletic performance, all individual evaluations are compiled in the same way, say on a 0 - 4 scale (in the case of student grades, there are no units). They also carry the same Quality Characteristic (QC), or sense of desirability (the higher the score, the better), and the same Relative Weight (level of importance). Individual evaluations (like the grades of individual courses) can simply be added as long as their (a) units of measurement, (b) sense of desirability, and (c) relative weight (importance) are the same for all courses (criteria). Unfortunately, in most engineering and scientific evaluations, the individual criteria are likely to have different units of measurement, Quality Characteristics, and relative weights. Therefore, methods specific to the application, which overcome the difficulties posed by these differences among the criteria of evaluation, must be devised.
(a) Units of Measurement - Unlike GPA or figure skating, the criteria of evaluation in engineering and science generally have different units of measurement. For example, in an effort to select a better automobile, the selection criteria may consist of fuel efficiency measured in miles/gallon, engine output measured in horsepower, reliability measured as defects/1000, etc. When the units of measurement for the criteria are different, they cannot be combined easily. To better understand these difficulties, consider a situation where we are to evaluate two industrial pumps of comparable performance (shown below). Based on a 60% priority on higher discharge pressure and 40% on lower operating noise, which pump would we select?
Table 1. Performance of Two Brands of Industrial Pumps
Evaluation Criteria     Rel. Weighting     Pump A          Pump B
Discharge Pressure      60%                160 psi         140 psi
Operating Noise         40%                90 decibels     85 decibels
Totals                  --                 250 (?)         225 (?)
Pump A delivers more pressure, but is noisier. Pump B has a slightly lower pressure, but is quieter. What can we do with the evaluation numbers? Could we add them? If we were to add them, what units would the resulting number have? Would the totals be of use? Is Pump A, with a total of 250, better than Pump B?
Obviously, addition of numbers (evaluations) with different units of measurement is not permissible. If such numbers are added, the total serves no useful purpose: we have no units to assign to it, nor do we know whether a bigger or smaller value is better. If the evaluations are to be added, they must first be made dimensionless (normalized). This can easily be done by dividing all evaluations of a criterion (such as 160 psi and 140 psi) by a fixed number (such as 200 psi), so that the resulting number is a unitless fraction.
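As a minimal sketch, this normalization step can be expressed in a few lines of Python (the 200 psi reference value is the one used in the text as an example; any fixed reference produces unitless fractions):

```python
# Minimal sketch: make evaluations dimensionless by dividing each one
# by a fixed reference value (200 psi, the example used in the text).
def normalize(value, reference):
    """Return the evaluation as a unitless fraction of the reference."""
    return value / reference

pressure_a = normalize(160, 200)  # 0.8 for Pump A
pressure_b = normalize(140, 200)  # 0.7 for Pump B
```

Once every criterion is reduced to a unitless fraction in this way, the fractions can be weighted and added without mixing psi with decibels.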
(b) Quality Characteristic (QC) - Just because two numbers have the same units, or no units, does not mean they can be meaningfully added. Consider the following two players and attempt to determine who is better.
Table 2. Golf and Basketball Scores of Two Players

Criteria          Rel. Weighting    Player 1    Player 2    QC
Golf (9 holes)    50%               42          52          Smaller
Basketball        50%               28          18          Bigger
Total Score       --                70          70          --
Observe that the total score for Player 1 is (42 + 28 =) 70, and the total for Player 2 is also (52 + 18 =) 70. Are these two players of equal caliber? Is the addition of the scores meaningful and logical? Unfortunately, the totals do not reflect the degree to which Player 1 is superior to Player 2 (a score of 42 in Golf is better than 52, and a score of 28 in Basketball is better than 18). The total scores are meaningful only when the QCs of both criteria are made the same before they are added together.
One way to combine the two scores is to first change the QC of the Golf score by subtracting it from a fixed number, say 100, and then add the result to the Basketball score. With the equal 50% weights of Table 2, the overall scores become:

Overall score for Player 1 = (28 + (100 - 42)) x 0.50 = 86 x 0.50 = 43
Overall score for Player 2 = (18 + (100 - 52)) x 0.50 = 66 x 0.50 = 33

The overall scores indicate the relative merit of the players. Player 1, with a score of 43, is a better sportsman than Player 2, who has a score of 33.
(c) Relative Weight - In formulating the GPA, all of a student's courses are weighted the same. This is generally not the case in scientific studies. For the two players above, the skills in Golf and Basketball were weighted equally, so the relative weights did not influence the judgment of their abilities. If the relative weights are not the same for all criteria, the contribution of each criterion must be multiplied by its respective relative weight. For example, if Golf had a relative weight of 40% and Basketball had 60%, the computation of the overall scores must reflect the influence of the relative weights as shown below.
Overall score for Player 1 = 28 x 0.60 + (100 - 42) x 0.40 = 16.8 + 23.2 = 40
Overall score for Player 2 = 18 x 0.60 + (100 - 52) x 0.40 = 10.8 + 19.2 = 30
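The align-then-weight procedure can be sketched as a small Python function. The scores are those of Table 2, and the fixed number 100 and the 40%/60% weighting are the ones discussed in the text; this is an illustration, not a general-purpose routine:

```python
# Minimal sketch of combining two criteria with different QCs and
# different relative weights (scores from Table 2 above).
def overall_score(basketball, golf, w_basketball=0.60, w_golf=0.40):
    # Golf has a Smaller-is-better QC; align it to Bigger-is-better
    # by subtracting the score from a fixed number (100), then apply
    # each criterion's relative weight and add.
    return basketball * w_basketball + (100 - golf) * w_golf

player_1 = overall_score(basketball=28, golf=42)  # about 40
player_2 = overall_score(basketball=18, golf=52)  # about 30
```

With equal 50% weights the same function (called with `w_basketball=0.50, w_golf=0.50`) reproduces the equal-weight comparison of the previous paragraph.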
The Relative Weight is a subjective number assigned to each individual criterion of evaluation. Generally it is determined by team consensus during the experiment planning session, and the weights are assigned such that the total of all weights is 100 (a value set arbitrarily).
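To illustrate, a consensus weighting of this kind might be scaled to a total of 100 as below. The vote counts and criterion names here are hypothetical placeholders, not from the text:

```python
# Minimal sketch: scale hypothetical consensus votes so the
# relative weights total 100 (the convention used in the text).
raw_votes = {"Criterion A": 11, "Criterion B": 4, "Criterion C": 5}
total = sum(raw_votes.values())
weights = {name: 100 * votes / total for name, votes in raw_votes.items()}
print(weights)  # each weight is now a share of 100
```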
Thus, once these three concerns are addressed, multiple criteria of evaluation can be combined into a single number, as the following application example describes.
An Example Application - A group of process engineers and researchers involved in manufacturing baked food products planned an experiment to determine the best recipe for one of their current brands of cake. Surveys showed that the best cake is judged on taste, moistness, and smoothness as rated by customers. The traditional approach has been to decide the recipe based on one criterion (e.g., Taste) at a time. Experience, however, has shown that when the recipe is optimized based on one criterion, subsequent analyses using the other criteria do not necessarily produce the same recipe. When the resulting ingredients differ, selection of a compromise final recipe becomes a difficult task. Arbitrary or subjectively compromised recipes had not brought the desired customer satisfaction. The group therefore decided to follow a path of consensus decision and carefully devise a scientific scheme to incorporate all criteria of evaluation simultaneously into the analysis process.
In the planning session convened for the Cake Baking Experiment, and from subsequent reviews of experimental data, the applicable Evaluation Criteria and their characteristics, shown in Table 3 below, were identified. Taste, being a subjective criterion, was to be evaluated on a scale of 0 to 12, with 12 assigned to the best-tasting cake. Moistness was to be measured by weighing a standard-size cake and noting its weight in grams. It was the consensus that a weight of about 40 grams represents the most desirable moistness, which means its Quality Characteristic is of the Nominal type. In this evaluation, results above and below the nominal are considered equally undesirable. Smoothness was measured by counting the number of voids in the cake, which makes this evaluation of the Smaller-is-better type (QC). The relative weights were assigned such that the total was 100. The notations x1, x2, and x3, shown next to the criteria descriptions in Table 3, represent the evaluations of an arbitrary sample cake.
Table 3. Evaluation Criteria for Cake Baking Experiments

Criteria Description    Worst Evaluation    Best Evaluation    Quality Characteristic (QC)    Rel. Weighting
Taste (x1)              0                   12                 Bigger is better               55
Moistness (x2)          25 or 55 grams      40 grams           Nominal is best                20
Smoothness (x3)         8                   2                  Smaller is better              25
Three samples were tested (cakes baked) in each of the eight trial conditions. The performance evaluations for the three samples in Trial # 1 are shown below. Note that each sample is evaluated under the three criteria of evaluation, which are combined into a single number for each sample (OEC = 66 for Sample 1) that represents the performance of the cake.
Table 4. Trial # 1 Evaluations

Criteria           Sample 1    Sample 2    Sample 3
Taste (x1)         9           ...         ...
Moistness (x2)     34.19       ...         ...
Smoothness (x3)    5           ...         ...
OEC                66          64          58.67
The individual sample evaluations were combined into a single number, called the Overall Evaluation Criteria (OEC), by appropriate Normalization. Here the term Normalization includes reducing the evaluations to dimensionless quantities, aligning their Quality Characteristics in a common direction (Bigger or Smaller), and allowing each criterion to contribute in proportion to its Relative Weight. The OEC equation appropriate for the Cake Baking project is shown below.

OEC = (x1 / 12) x 55 + (1 - |x2 - 40| / 15) x 20 + (1 - (x3 - 2) / 6) x 25
The contribution of each criterion is turned into a fraction (a dimensionless quantity) by dividing the evaluation by a fixed number, such as the difference between the Best and the Worst of the respective sample evaluations (12 - 0 for Taste; see Table 3). The numerator is the evaluation reduced by the smaller in magnitude of the Worst and Best Evaluations in the case of Bigger and Smaller QCs, and by the Nominal value in the case of a Nominal QC. The contributions of the individual criteria are then multiplied by their respective Relative Weights (55, 20, etc.). The Relative Weights, used as fractions of 100, assure that the OEC values fall within 0 - 100.
Since Criterion 1 (Taste) has the highest Relative Weight, all other criteria are aligned to its Bigger-is-better QC. In the case of a Nominal QC, as for Moistness (the second term in the equation above), the evaluation is first reduced to its deviation from the nominal value (|x2 - nominal|). An evaluation reduced to a deviation naturally becomes a Smaller-is-better quantity. The contributions from Moistness and Smoothness, both of which now have Smaller QCs, are aligned with the Bigger QC by subtracting each from 1. An example OEC calculation using the evaluations of Sample 1 of Trial # 1 (see Table 4) is shown below.
Sample calculation, Trial 1, Sample 1 (x1 = 9, x2 = 34.19, x3 = 5):

OEC = (9 / 12) x 55 + (1 - (40 - 34.19) / 15) x 20 + (1 - (5 - 2) / 6) x 25
    = 41.25 + 12.25 + 12.5 = 66 (shown in Table 4)
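The same calculation can be sketched in Python. The constants (the 0 - 12 Taste range, the 40-gram nominal with its 15-gram worst-case deviation, the 2 - 8 Smoothness range, and the 55/20/25 weights) come from Table 3; this is an illustration of the method, not the author's software:

```python
# Minimal sketch of the Cake Baking OEC (constants from Table 3).
def oec(taste, moistness, smoothness):
    taste_term = (taste / 12) * 55                    # Bigger is better
    moist_term = (1 - abs(moistness - 40) / 15) * 20  # Nominal is best (40 g)
    smooth_term = (1 - (smoothness - 2) / 6) * 25     # Smaller is better
    return taste_term + moist_term + smooth_term

print(round(oec(9, 34.19, 5), 2))  # about 66, as for Trial 1, Sample 1
```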
Similarly, the OECs for the other sample cakes of Trial # 1 (OEC = 64 for Sample 2 and OEC = 58.67 for Sample 3) and for the other trials of the experiment are calculated. The OEC values are treated as the results for the purposes of the analysis of the designed experiment.
The OEC concept was first published by the author in the reference text in 1989. Since then it has been used successfully in numerous industrial experiments, particularly those following the Taguchi approach to experimental design. The OEC scheme has been found to work well for all kinds of experimental studies, whether or not they use designed experiments.
References:
1. Roy, Ranjit K., A Primer on the Taguchi Method, Society of Manufacturing Engineers, P.O. Box 6028, Dearborn, Michigan, USA 48121. Fax: 1-313-240-8252 or 1-313-271-2861. ISBN 0-87263-468-X.
2. Qualitek-4 Software for Automatic Design and Analysis of Taguchi Experiments, Nutek, Inc., 30600 Telegraph Road, Suite 2230, Birmingham, Michigan, USA 48025. Ph: 1-248-642-4560, Fax: 1-248-642-4609, Web site: http://www.wwnet.com/~rkroy (free demo version available for download).
Author: Ranjit K. Roy, Ph.D., P.E. (Mechanical Engineering)
(A sure way to implement modern quality improvement techniques is to follow a planned application that drives the long-term learning process.)
As part of a continuous improvement process, manufacturing companies and their suppliers can benefit greatly by embracing and applying modern quality improvement tools and techniques in their business practices. Lists of such techniques have been compiled by several major manufacturing organizations. One technique widely used today for product design, process optimization, and problem solving is the statistical method called Design of Experiments (DOE). This brief report shares the writer's experience in helping many companies with implementation of, and training in, the DOE technique in the workplace.
How should companies proceed to implement such techniques? What is a sure way to make it a common practice within the organization?
Several terms, such as implement, deploy, and institute, are used for the same purpose: applying a new technique in business practices throughout the organization. For organizations large or small, implementation is successful when a number of people in the organization have learned the technique and it has been made part of the day-to-day business practice. Continuing to apply the technique requires that a number of people (a critical mass) keep their application knowledge current, and that management continue to stress the need for the technique in the business. What most employers want is assurance that the business gets the most out of the training dollars spent.
As a manager, what should one do? What can be done to translate the training dollars into applications?

The traditional approach has been to train all those who are supposed to be trained and hope that applications will follow. Experience shows this does not occur naturally.

What else should managers do besides approving the cost and checking the training box in the company's skills inventory form? There may not be a general solution for all companies, but there are other ways.
One approach that has worked well for a large automotive manufacturing client of the writer may serve as a model for others to follow. The plant had trained hundreds of engineering personnel in the DOE technique, yet when it was time to solve an urgent production problem, there was no one in the plant to provide application support. This time, the plant manager formed a small functional group, calling it something like a Statistical Experiment Group. A progressive-thinking individual was appointed to head the group, and the group's function was clearly defined: it was to become the center of expertise within the plant, identifying applications and providing help with them.
How is this group supposed to help others while its members themselves need help? Management, of course, quickly recognized that the group would first need to gain experience. Therefore, an expert consultant was retained to provide application support on a regular basis. The consultant first worked with groups that had applications ready, and provided training to the groups involved as demanded by the participants. Over a period of three years, the plant solved many problems and worked on many product and process designs. Above all, the plant now has several people who maintain application-ready expertise. No longer dependent on the consultant, the plant can now happily recommend that its suppliers and sister plants do the same.
How can an organization start the implementation process? Obviously, this question arises only after what is to be implemented has been determined. Although DOE has been used as the example here, this implementation strategy can work well for similar techniques such as Statistical Process Control (SPC), Failure Mode and Effects Analysis (FMEA), Reliability Test Planning and Evaluation, etc. Once the technique is identified, the senior management of the organization can follow these steps:
1. Form a dedicated group. The group must be formed by the head of the organization and should have a leader. The lead person's job responsibilities will include developing application expertise within the group and helping others in the organization who have application needs. The group should have a name consistent with its function, but not specific to the technique.
2. Keep the focus on applications. It is important to keep the focus on application and to justify help in terms of application. There should be provision for training costs, but training should be approved only when requested by those who need it for project applications. It is generally known that when employees are sent to training without immediate application needs, they are reluctant to request the needed assistance when they are ready for applications, particularly if the help comes from external sources. On the other hand, when a group starts to apply a new technique to its own project, its members are more willing to solicit help and more likely to retain the application knowledge.
3. Secure experienced support. The group will benefit from experienced and competent help, whether external or internal. Management must establish a long-range support arrangement and understanding with the consultant. The cost is generally not expected to be much more than that of a few training sessions.
4. Measure and recognize results. The true benefits of employee training come from application of the knowledge. Although improvements made by using newly acquired knowledge are not always quantifiable, for many projects dollar benefits are easily estimated. Management must demand such data and document the return on investment. Recognition of groups or individuals who have successfully completed projects is an effective means of providing the necessary incentive; generally, professional and peer recognition works much better than monetary rewards.