## Description

*The Project File Details*

*Name: STEPWISE PROCEDURES IN DISCRIMINANT ANALYSIS*

*Type: PDF and MS Word (DOC)*

*Size: 319 KB*

*Length: 72 Pages*

## ABSTRACT

Several multivariate measurements require variable selection and ordering. Stepwise procedures provide a step-by-step method through which these variables are selected and ordered, usually for discrimination and classification purposes. Stepwise procedures in discriminant analysis ensure that only important variables are selected, while redundant variables (variables that contribute little in the presence of other variables) are discarded. Stepwise procedures are employed to obtain a classification rule with a low error rate. In this work, variables are selected based on Wilks’ lambda and the partial F. In forward selection, the variable with the minimum Wilks’ lambda and maximum F is included in the model first, followed by the next most important variable. Backward elimination deletes the variable with the smallest F and the largest Wilks’ lambda in a step-by-step fashion. SPSS is used to illustrate how stepwise procedures can be employed to identify the most important variables to be included in the model based on Wilks’ lambda and the partial F. The analysis revealed that only variables X1 (head width at the widest dimension) and X4 (eye-to-top-of-head measurement) are important enough to be worth including in the discriminant function.

## TABLE OF CONTENTS

Title Page

Approval Page

Dedication

Acknowledgement

Abstract

Table of Contents

CHAPTER ONE: INTRODUCTION

1.1 Discriminant Analysis

1.2 Stepwise Discriminant Analysis

1.3 Steps Involved in Discriminant Analysis

1.4 Goals for Discriminant Analysis

1.5 Examples of Discriminant Analysis Problems

1.6 Aims and Objectives

1.7 Definition of Terms

1.7.1 Discriminant Function

1.7.2 The Eigenvalue

1.7.3 Discriminant Score

1.7.4 Cutoff

1.7.5 The Relative Percentage

1.7.6 The Canonical Correlation, R*

1.7.7 Mahalanobis Distance

1.7.8 The Classification Table

1.7.9 Hit Ratio

1.7.10 Tolerance

CHAPTER TWO: LITERATURE REVIEW

2.1 Discriminant Analysis

2.2 Stepwise Discriminant Analysis

2.3 Linear Discriminant Function

2.4 Criteria for Good Discriminant Functions

2.4.1 Fisher’s Criterion

2.4.2 Welch’s Criterion

2.4.3 Von Mises Criterion

2.4.4 Bayes Criterion

2.4.5 Unequal Cost of Misclassification Criterion

CHAPTER THREE: RESEARCH METHODOLOGY

3.1 Stepwise Methodologies in Discriminant Analysis

3.2 The F-Distribution

3.3 The Wilks’ Lambda Distribution

3.4 Using the Linear Discriminant Function

3.5 Interpretation of Linear Discriminant Function

3.6 Limitation of Discriminant Function

3.7 Limitation of Stepwise Methods of Discriminant Analysis

3.8 Ways of Dealing with the Problems Inherent with Stepwise Discriminant Analysis

3.9 Test of Equality of Two Mean Vectors

3.10 Test of Equality of Two Dispersion Matrices

3.11 Estimating Misclassification Rate

3.11.1 Probability of Misclassification

3.12 Improved Estimates of Error Rates

CHAPTER FOUR: DATA ANALYSIS

4.1 Method of Data Collection

4.2 Discriminant (All Independent Variables) Analysis

4.3 Summary of Canonical Discriminant Function

4.4 Classification Statistics

4.5 Discriminant (Stepwise Method) Analysis

4.6 Stepwise Statistics

4.7 Summary of Stepwise Canonical Discriminant Functions

4.8 Classification Statistics for Stepwise Procedures

CHAPTER FIVE: RESULTS, CONCLUSION AND RECOMMENDATION

5.1 Results

5.2 Conclusion

5.3 Recommendation

References

Appendix I

Appendix II

## CHAPTER ONE

INTRODUCTION

1.1 DISCRIMINANT ANALYSIS

Discriminant Analysis (D.A.) is a multivariate technique used to classify cases into distinct groups. It separates distinct sets of objects (or observations) and allocates new objects (or observations) to previously defined groups. Discriminant analysis is concerned with the problem of classification, which arises when a researcher, having made a number of measurements on an individual, wishes to classify the individual into one of several categories on the basis of these multivariate measurements (Onyeagu, 2003).

Discriminant analysis will help us analyze the differences between groups and provide us with a means to assign or classify any case into the group which it most closely resembles.

There are two aspects of discriminant analysis:

1. Predictive Discriminant Analysis (PDA), or classification, which is concerned with classifying objects into one of several groups; and

2. Descriptive Discriminant Analysis (DDA), which focuses on revealing major differences among the groups (Stevens, 1996).

According to Huberty (1994), descriptive discriminant analysis includes the collection of techniques involving two or more criterion variables and a set of one or more grouping variables, each with two or more levels. Whereas in predictive discriminant analysis (PDA) the multiple response variables play the role of predictor variables, in descriptive discriminant analysis (DDA) they are viewed as outcome variables, with the grouping variable(s) as the explanatory variable(s). That is, the roles of the two types of variables involved in a multivariate multigroup setting in DDA are reversed from their roles in PDA.

1.2 STEPWISE DISCRIMINANT ANALYSIS

A researcher may wish to discard variables that are redundant (in the presence of other variables) when a large number of variables are available for group separation. In discriminant analysis, the variables (say, the y’s) are selected and the basic model does not change, unlike regression, where independent variables are selected and the model is consequently altered.

Stepwise selection is a combination of the forward and backward variable selection methods. In forward selection, the variable entered at each step is the one that maximizes the partial F-statistic based on Wilks’ lambda. This yields the maximal additional separation of groups above and beyond the separation already attained by the other variables. Because the largest F is chosen at each step, the proportion of these F’s that exceed Fα is greater than α. In backward selection (elimination), the variable that contributes least, as shown by the partial F, is deleted at each step.

In stepwise selection, variables are selected one at a time, and at each step the variables already entered are re-examined to see whether any of them has become redundant in the presence of recently added variables. The procedure stops when the largest partial F among the variables available for entry fails to exceed a preset threshold value.
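The forward-selection part of this procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS implementation: Wilks’ lambda is computed as |W| / |T| (within-groups over total sums of squares and cross-products), the partial F for a candidate variable uses the standard form F = ((1 − Λp) / Λp) · (N − g − q) / (g − 1) with q variables already entered, the function names and the small data set are hypothetical, and the entry threshold of 3.84 is only an illustrative value. The re-examination (removal) step of full stepwise selection is deliberately left out.

```python
def det(m):
    """Determinant of a small square matrix by Gaussian elimination."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[piv][i]) < 1e-12:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def sscp(rows, cols):
    """Sums of squares and cross-products about the mean for the chosen columns."""
    n = len(rows)
    means = [sum(r[c] for r in rows) / n for c in cols]
    return [[sum((r[ci] - mi) * (r[cj] - mj) for r in rows)
             for cj, mj in zip(cols, means)]
            for ci, mi in zip(cols, means)]


def wilks_lambda(groups, cols):
    """Wilks' lambda = |W| / |T| for the variables listed in cols."""
    if not cols:
        return 1.0
    k = len(cols)
    within = [[0.0] * k for _ in range(k)]
    for g in groups:
        s = sscp(g, cols)
        for i in range(k):
            for j in range(k):
                within[i][j] += s[i][j]
    total = sscp([r for g in groups for r in g], cols)
    return det(within) / det(total)


def forward_select(groups, f_to_enter=3.84):
    """Enter, one at a time, the variable with the largest partial F."""
    n = sum(len(g) for g in groups)
    g = len(groups)
    p = len(groups[0][0])
    chosen = []
    while len(chosen) < p:
        q = len(chosen)
        base = wilks_lambda(groups, chosen)
        best = None
        for v in range(p):
            if v in chosen:
                continue
            partial = wilks_lambda(groups, chosen + [v]) / base
            f = (1 - partial) / partial * (n - g - q) / (g - 1)
            if best is None or f > best[1]:
                best = (v, f)
        if best[1] <= f_to_enter:  # largest F fails to exceed the threshold
            break
        chosen.append(best[0])
    return chosen


# Hypothetical data: two groups, three variables; only variable 0 separates them.
group_a = [(1.0, 5.0, 2.0), (1.2, 4.0, 3.0), (0.8, 6.0, 2.5), (1.1, 5.5, 3.5), (0.9, 4.5, 2.0)]
group_b = [(3.0, 5.2, 2.4), (3.2, 4.1, 3.1), (2.8, 5.9, 2.2), (3.1, 5.4, 3.6), (2.9, 4.6, 2.3)]
selected = forward_select([group_a, group_b])
```

In this made-up example, variable 0 has by far the smallest Wilks’ lambda on its own, so it is the first variable entered.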

Stepwise discriminant analysis is a form of discriminant analysis. During the selection process no discriminant functions are calculated. However, once the subset selection is complete, the discriminant function is calculated for the selected variables. These variables can also be used in the construction of classification functions.

1.3 STEPS INVOLVED IN DISCRIMINANT ANALYSIS

1. Construct the discriminant function.

2. Evaluate the discriminant function for population one (1) by substituting the mean values of $X_1, X_2, \dots, X_p$ into $Y = L_1 X_1 + L_2 X_2 + \dots + L_p X_p$, and label the value obtained $Y_1$.

3. Repeat step 2 for population two (2) and label the value obtained $Y_2$.

4. Since one is usually greater than the other, assume $Y_2 > Y_1$.

5. Compute the critical value, $Y_C = \dfrac{Y_1 + Y_2}{2}$.

6. State the discriminating procedure as follows: assign the new individual to population one (1) if $Y < Y_C$ and to population two (2) if $Y > Y_C$.
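The steps above can be sketched in Python as follows. This is only an illustration: the coefficients and group means are hypothetical numbers, the function names are invented for the example, and the handling of a score exactly equal to the critical value (assigned here to the higher-scoring population) is an assumption, since the steps do not say what to do in that case.

```python
def score(coeffs, x):
    """Discriminant score Y = L1*X1 + L2*X2 + ... + Lp*Xp."""
    return sum(l * xi for l, xi in zip(coeffs, x))


def make_rule(coeffs, mean1, mean2):
    """Return the critical value YC and a classifier built from it."""
    y1 = score(coeffs, mean1)   # step 2: score at the mean of population 1
    y2 = score(coeffs, mean2)   # step 3: score at the mean of population 2
    yc = (y1 + y2) / 2          # step 5: critical value
    # Step 4: whichever population has the smaller mean score gets the "low" side.
    low, high = (1, 2) if y1 < y2 else (2, 1)

    def classify(x):
        # Assumption: a score exactly equal to YC goes to the "high" population.
        return low if score(coeffs, x) < yc else high

    return yc, classify


# Hypothetical coefficients and group means for two variables.
coeffs = [0.5, 1.0]
yc, classify = make_rule(coeffs, mean1=[2.0, 1.0], mean2=[4.0, 3.0])
first = classify([2.5, 1.5])    # score 2.75, below YC
second = classify([4.0, 2.5])   # score 4.5, above YC
```

With these numbers the mean scores are 2.0 and 5.0, so the critical value is 3.5; the first new individual is assigned to population one and the second to population two.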

1.4 GOALS FOR DISCRIMINANT ANALYSIS

Johnson and Wichern (1992) defined two goals of discriminant analysis as:

1. To describe, either graphically (in at most three dimensions) or algebraically, the differential features of objects (or observations) from several known collections (populations). We try to find discriminants such that the collections are separated as much as possible.

2. To sort objects (observations) into two or more labeled classes. The emphasis is on deriving a rule that can be used to optimally assign a new object to the labeled classes.

Johnson and Wichern (1992) used the term discrimination to refer to goal 1 and classification or allocation to refer to goal 2.

The goals of discriminant analysis include identifying the relative contribution of the p variables to the separation of the groups and finding the optimal plane onto which the points can be projected to illustrate the configuration of the groups.

1.5 EXAMPLES OF DISCRIMINANT ANALYSIS PROBLEMS

1. A geologist might wish to classify fossils into their respective fossil groups on the basis of measurements of the sizes, shapes and ages of the fossils.

2. A doctor may intend to classify newborn babies into different blood groups, based on measurements obtained from the babies’ blood samples.

3. Students applying for admission into a university are given a Common Entrance Examination (CEE). The vector of a student’s scores in the entrance examination is a set of measurements, X. The problem is to classify a student on the basis of his or her scores in the entrance examination.

4. An automobile engineer might decide to classify an automobile engine into one of several categories of engine on the basis of measurements of its power output, size and shape.

5. A nutritionist might classify food substances into categories of food nutrient, such as carbohydrate, minerals, water, protein, fat and oil, and vitamins, on the basis of measurements of the comparative amounts of different nutrients in the food.

As the examples above show, individuals are assigned to groups on the basis of data related to the groups.

1.6 AIMS AND OBJECTIVES OF THE STUDY

The purposes of this study are:

1. To classify cases into groups using the stepwise methodologies of discriminant analysis;

2. To identify and discard redundant variables or variables which are little related to group distinction;

3. To compare the probabilities of misclassification and the hit ratios obtained with discriminant analysis using all independent variables to those obtained with stepwise procedures.

1.7 DEFINITION OF TERMS

1.7.1 Discriminant Function

This is a latent variable which is created as a linear combination of discriminating variables, such that

$Y = L_1 x_1 + L_2 x_2 + \dots + L_p x_p$

where the L’s are the discriminant coefficients and the x’s are the discriminating variables.

1.7.2 The Eigenvalue

The eigenvalue of a discriminant function reflects the relative importance of the dimension along which that function classifies cases of the dependent variable; it is the ratio of the between-groups to the within-groups sum of squares of the discriminant scores. There is one eigenvalue for each discriminant function. With more than one discriminant function, the first eigenvalue will be the largest and the most important in explanatory power, while the last eigenvalue will be the smallest and the least important in explanatory power.

Relative importance is assessed by eigenvalues since they reflect the percentage of variance explained in the dependent variable, cumulating to 100% for all functions. Eigenvalues are part of the default output in SPSS (Analyze, Classify, Discriminant).

1.7.3 The Discriminant Score

This is the value obtained by applying a discriminant function formula to the data for a given case. For standardized data, the Z score is the discriminant score.

1.7.4 Cutoff

In a two-group discriminant analysis with equal group sizes, the cutoff is the mean of the two group centroids. If the group sizes are unequal, the cutoff is the weighted mean of the centroids. A case is classed as 0 if its discriminant score is less than or equal to the cutoff, and as 1 if its score is above it.

1.7.5 The Relative Percentage

This is equal to the eigenvalue of a function divided by the sum of the eigenvalues of all discriminant functions in the model. It is the percentage of the model’s discriminating power that is associated with a particular discriminant function. It tells us how many functions are important. The ratio of eigenvalues indicates the relative discriminating power of the discriminant functions.

1.7.6 The Canonical Correlation, R*

This measures the association between the groups formed by the dependent variable and the given discriminant function. A large canonical correlation indicates a high correlation between the discriminant function and the groups. An R* of 1.0 shows that all of the variability in the discriminant scores can be accounted for by that dimension. The relative percentage and R* do not have to be correlated. The canonical correlation, R*, also shows how useful each function is in determining group differences.
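The relationship between eigenvalues, relative percentages and canonical correlations can be illustrated with a short Python sketch. The eigenvalues below are hypothetical, the function names are invented for the example, and the sketch assumes the standard relation R* = sqrt(λ / (1 + λ)) between a function’s eigenvalue λ and its canonical correlation, which is not stated in the definitions above.

```python
import math


def relative_percentages(eigenvalues):
    """Each eigenvalue as a percentage of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def canonical_correlation(eigenvalue):
    """R* for one discriminant function, assuming R* = sqrt(ev / (1 + ev))."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


# Hypothetical eigenvalues for a model with two discriminant functions.
eigenvalues = [3.0, 1.0]
percents = relative_percentages(eigenvalues)
r_star = [canonical_correlation(ev) for ev in eigenvalues]
```

Here the first function carries 75% of the discriminating power and the second 25%, while their canonical correlations are about 0.87 and 0.71. A function with a small relative percentage can therefore still be strongly related to the groups.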

1.7.7 Mahalanobis Distances

This is the distance between a case and the centroid of each group (of the dependent variable) in attribute space (an n-dimensional space defined by n variables). Each case has one Mahalanobis distance for each group, and the case is classified as belonging to the group for which its Mahalanobis distance is smallest. This means that the closer the case is to the group centroid, the smaller the Mahalanobis distance. Mahalanobis distance is measured in terms of standard deviations from the centroid.
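As a rough illustration of this classification rule, the sketch below computes the squared Mahalanobis distance D² = (x − c)′ S⁻¹ (x − c) from a case x to each group centroid c and assigns the case to the nearest group. The covariance matrix S, the centroids and the function names are hypothetical, and a single covariance matrix shared by both groups is assumed.

```python
def solve(a, b):
    """Solve a @ z = b for a small square matrix by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(n):
            if r != i:
                f = m[r][i] / m[i][i]
                for c in range(i, n + 1):
                    m[r][c] -= f * m[i][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(x, centroid, cov):
    """Squared Mahalanobis distance from case x to a group centroid."""
    d = [xi - ci for xi, ci in zip(x, centroid)]
    z = solve(cov, d)  # z = inverse(cov) @ d, without forming the inverse
    return sum(di * zi for di, zi in zip(d, z))


def nearest_group(x, centroids, cov):
    """Label of the group whose centroid has the smallest distance to x."""
    return min(centroids, key=lambda g: mahalanobis_sq(x, centroids[g], cov))


# Hypothetical shared covariance matrix and group centroids.
cov = [[2.0, 1.0], [1.0, 2.0]]
centroids = {1: [0.0, 0.0], 2: [4.0, 0.0]}
case = [1.0, 1.0]
d1 = mahalanobis_sq(case, centroids[1], cov)
d2 = mahalanobis_sq(case, centroids[2], cov)
assigned = nearest_group(case, centroids, cov)
```

For this case the squared distances are 2/3 to group 1 and 26/3 to group 2, so the case is assigned to group 1.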

1.7.8 The Classification Table

This is a table in which the rows are the observed categories of the dependent variable and the columns are the predicted categories of the dependent variable. When prediction is perfect, all cases lie on the diagonal.

1.7.9 Hit Ratio

This is the percentage of cases on the diagonal of a confusion matrix (the classification table); that is, the percentage of correct classifications. The higher the hit ratio, the lower the misclassification error rate; the lower the hit ratio, the higher the error rate.
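A classification table and its hit ratio can be built with a few lines of Python. The observed and predicted group labels below are made up for illustration, and the function names are hypothetical.

```python
def classification_table(observed, predicted, labels):
    """Rows are observed categories, columns are predicted categories."""
    table = {o: {p: 0 for p in labels} for o in labels}
    for o, p in zip(observed, predicted):
        table[o][p] += 1
    return table


def hit_ratio(table):
    """Percentage of cases that lie on the diagonal of the table."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[k][k] for k in table)
    return 100.0 * correct / total


# Hypothetical observed and predicted group memberships for eight cases.
observed = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]
table = classification_table(observed, predicted, labels=[0, 1])
ratio = hit_ratio(table)
```

Six of the eight cases fall on the diagonal, giving a hit ratio of 75% and an apparent error rate of 25%.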

1.7.10 Tolerance

This is the proportion of the variation in an independent variable that is not explained by the variables already in the model. Zero tolerance means that the independent variable under consideration is a perfect linear combination of other variables already in the model. A tolerance of 1 implies that the predictor variable is completely independent of the other predictor variables already in the model. Most computer packages set the minimum tolerance at 0.01 as the default option.
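The two extreme cases can be checked with a small Python sketch. It is a simplification under stated assumptions: only one variable is already in the model, so tolerance reduces to 1 − r², where r is the correlation between the candidate variable and that variable, and the correlation is computed from the raw data rather than within groups, as a discriminant analysis package would typically do. The data and the function name are hypothetical.

```python
def tolerance(candidate, in_model):
    """1 - r^2 between a candidate variable and the single variable in the model."""
    n = len(candidate)
    mx = sum(in_model) / n
    my = sum(candidate) / n
    sxx = sum((x - mx) ** 2 for x in in_model)
    syy = sum((y - my) ** 2 for y in candidate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(in_model, candidate))
    r_squared = sxy * sxy / (sxx * syy)
    return 1.0 - r_squared


# Hypothetical variable already in the model and two candidate variables.
in_model = [1.0, 2.0, 3.0, 4.0, 5.0]
redundant = [2.0, 4.0, 6.0, 8.0, 10.0]    # exact multiple of in_model
unrelated = [2.0, 1.0, 3.0, 1.0, 2.0]     # uncorrelated with in_model
t_redundant = tolerance(redundant, in_model)
t_unrelated = tolerance(unrelated, in_model)
```

The first candidate is an exact linear function of the variable in the model, so its tolerance is 0 and it would be excluded; the second is uncorrelated with it, so its tolerance is 1.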