
## ABSTRACT

Dimensionality reduction provides a compact representation of high-dimensional data: only the vital information is retained, so the reduced data requires no further processing. For this reason, it is an invaluable preprocessing step before the application of many

machine learning algorithms that perform poorly on high-dimensional data. In this thesis, the

perceptron classification algorithm – an eager learner – is applied to three two-class datasets

(Student, Weather and Ionosphere datasets). The k-Nearest Neighbors classification algorithm – a

lazy learner – is also applied to the same two-class datasets. Each dataset is then reduced using

fifteen different dimensionality reduction techniques. The perceptron and k-nearest neighbor

classification algorithms are applied to each reduced set, and the performance of the dimensionality reduction techniques (evaluated using confusion matrices) in preserving the classification of a dataset by the k-nearest neighbors and perceptron algorithms is compared. This

investigation revealed that the dimensionality reduction techniques implemented in this thesis

seem to perform much better at preserving k-nearest neighbors classification than they do at

preserving the classification of the original datasets using the perceptron. In general, the

dimensionality reduction techniques prove to be very efficient in preserving the classification of

both the lazy and eager learners used for this investigation.

Keywords: classification, confusion matrix, dimensionality reduction, eager learner, k-nearest neighbors, lazy learner, perceptron.


## TABLE OF CONTENTS


Certification ……………………………………………………………………………………………………………………. i

Dedication …………………………………………………………………………………………………………………….. iii

Acknowledgement …………………………………………………………………………………………………………. iv

Abstract …………………………………………………………………………………………………………………………. v

TABLE OF CONTENTS ………………………………………………………………………………………………… vi

LIST OF TABLES ……………………………………………………………………………………………………….. viii

LIST OF FIGURES ……………………………………………………………………………………………………….. ix

CHAPTER ONE: INTRODUCTION ………………………………………………………………………………… 1

1.1 Background of the Study ………………………………………………………………………………………… 1

1.2 Problem Statement …………………………………………………………………………………………………. 3

1.3 Aim and Objectives………………………………………………………………………………………………… 4

1.4 Scope and Limitations…………………………………………………………………………………………….. 5

1.5 Thesis Structure …………………………………………………………………………………………………….. 6

CHAPTER TWO: LITERATURE REVIEW ……………………………………………………………………… 7

2.1 Dimensionality Reduction ………………………………………………………………………………………. 7

2.2 Machine Learning ………………………………………………………………………………………………….. 8

2.3 Dimensionality Reduction and Machine Learning ……………………………………………………. 10

CHAPTER THREE: METHODOLOGY …………………………………………………………………………. 12

3.1 Dimensionality Reduction Techniques ……………………………………………………………………. 12

3.1.1 The New Random Approach …………………………………………………………………………… 12

3.1.2 Modified New Random Approach ……………………………………………………………………. 12

3.1.3 Singular Value Decomposition (SVD) ……………………………………………………………… 13

3.1.4 Principal Component Analysis (PCA) ………………………………………………………………. 14

3.1.5 The Variance Approach ………………………………………………………………………………….. 14

3.1.6 The Combined Approach ………………………………………………………………………………… 15

3.1.7 The Direct Approach ………………………………………………………………………………………. 16

3.1.8 Random projection (RP) …………………………………………………………………………………. 17

3.1.9 Bottom-Up Approach……………………………………………………………………………………… 17

3.1.10 Top-Down Approach ……………………………………………………………………………………. 19

3.1.11 The New Top-Down Approach ……………………………………………………………………… 19

## CHAPTER ONE

INTRODUCTION

1.1 Background of the Study

Data volume and variety are increasing at an alarming rate, making it very tedious to

glean useful information from these large data sets. Extracting or mining useful information and

hidden patterns from the data is becoming more and more important but can be very challenging

at the same time [1]. Much of the research done in domains like Biology, Astronomy, Engineering, Consumer Transactions and Agriculture deals with extensive sets of observations daily.

Traditional statistical techniques encounter some challenges in analyzing these datasets due to their

large sizes. The biggest challenge is the number of variables (dimensions) associated with each

observation. However, not all dimensions are required to understand the phenomenon under

investigation in high-dimensional datasets; this means that reducing the dimension of the dataset

can improve accuracy and efficiency of the analysis [2]. In other words, it is of great help if we can map a set of n points in d-dimensional space into a p-dimensional space, where p << d, so that the inherent properties of that set of points, such as their inter-point distances and their labels, do not suffer great distortion. This process is known as dimensionality reduction [3].
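As a minimal illustrative sketch of one such mapping (random projection with a Gaussian matrix; the dimensions and data here are made up for the example, not taken from the thesis datasets):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 1000, 100
X = rng.standard_normal((n, d))      # n points in d-dimensional space

# Map into p dimensions with a scaled Gaussian random matrix; by the
# Johnson-Lindenstrauss lemma this approximately preserves
# inter-point distances.
R = rng.standard_normal((d, p)) / np.sqrt(p)
X_reduced = X @ R                    # the same n points, now p-dimensional

# One inter-point distance before and after the mapping.
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(X_reduced[0] - X_reduced[1])
```

The reduced points live in a much smaller space, yet the distance between any fixed pair of points changes only slightly with high probability.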

A lot of methods exist for reducing the dimensionality of data. There are two categories of these

methods; in the first category, each attribute in the reduced dataset is a linear combination of the

attributes of the original dataset. In the second category, the set of attributes in the reduced dataset

is a subset of the set of attributes in the original dataset [4]. Techniques belonging to the first

category include Random Projection (RP), Singular Value Decomposition (SVD), Principal

Component Analysis (PCA), and so on; while techniques in the second category include but are

not limited to the Combined Approach (CA), Direct Approach (DA), Variance Approach (Var),


New Top-Down Approach (NTDn), New Bottom-Up Approach (NBUp), New Top-Down

Approach (modified version) and New Bottom-Up Approach (modified version) [5].
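The distinction between the two categories can be sketched with simple representatives of each: PCA (every reduced attribute is a linear combination of the originals) and variance-based attribute selection (the reduced attributes are a subset of the originals). This sketch is illustrative only, not the exact implementations described in Chapter 3:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 10))            # 100 observations, 10 attributes

# Category 1: each reduced attribute is a linear combination of the
# originals (PCA via SVD of the mean-centered data).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:3].T                # project onto the top 3 components

# Category 2: the reduced attributes are a subset of the originals
# (here, the 3 attributes with the largest variance).
keep = np.argsort(X.var(axis=0))[-3:]
X_subset = X[:, keep]
```

Both reductions yield 100 observations with 3 attributes, but only the second keeps attributes that exist, unchanged, in the original dataset.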

Dimensionality reduction provides a compact representation of high-dimensional data: only the vital information is retained, so the reduced data requires no further processing and can be used with many machine learning algorithms that perform poorly on high-dimensional

data [6]. Calculation of inter-point distances is essential for many machine learning

tasks and when the dimensionality increases, it has been proved that “the distance of a sample

point to the nearest point becomes very similar to the distance of the sample point to the most

distant point”, thereby deteriorating the performance of machine learning algorithms [7].
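This concentration of distances can be demonstrated with a small sketch (illustrative only; uniform random data, not the thesis datasets): the ratio of the farthest to the nearest distance from a query point shrinks toward 1 as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(d, n=500):
    """Ratio of the farthest to the nearest distance from a random
    query point to n uniform random points in d dimensions."""
    points = rng.random((n, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.max() / dists.min()

low_dim = distance_contrast(2)       # large contrast in 2 dimensions
high_dim = distance_contrast(1000)   # contrast much closer to 1 in 1000 dimensions
```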

Therefore, dimensionality reduction is an invaluable preprocessing step before the application of

many machine learning algorithms.

Machine learning is a scientific field in which computer systems can automatically and

intelligently learn their computation and improve on it through experience [8], [9]. Machine

learning algorithms are of two main types: supervised learning algorithms and unsupervised

learning algorithms. These algorithms have been used in solving a lot of complex real-world

problems [10], [11]. In unsupervised learning, the observations are categorized into groups (clusters) based on their similarity to one another. This categorization is

otherwise known as clustering [8]. Many clustering algorithms exist, among which k-means

clustering is the most famous for a large number of observations [12].

Unlike clustering, classification is a supervised learning method in which the corresponding label

for any valid input is predicted based on a number of training examples referred to as “training

set,” [8], [12]. Classification algorithms can further be categorized into eager and lazy learners,

and this investigation considers one from each category. Eager learning algorithms attempt to


construct a general rule or create a generalization during the training phase which can further be

used in classifying unseen instances [13]. Examples of eager learners include decision trees, support vector machines, and the perceptron.

The perceptron, an eager learner invented by Rosenblatt [14], is one of the earliest and simplest of all classification algorithms. It is used for classifying each point of a data set into either a positive or a negative label (1 or -1, good or bad, hot or cold, man or woman, etc.) [15]. It is interesting that, in its basic form, it is still as valid as when it was first published [16].
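For illustration, a minimal sketch of the basic perceptron learning rule for +1/-1 labels follows (a simplified version on made-up data, not the implementation described in Chapter 3):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Rosenblatt's perceptron rule: learn weights w and bias b so that
    sign(w @ x + b) matches the +1/-1 label of each training point."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # point is misclassified
                w += yi * xi            # nudge the boundary toward it
                b += yi
                errors += 1
        if errors == 0:                 # converged on the training set
            break
    return w, b

# Tiny linearly separable example: label is +1 when x0 + x1 > 1.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [2., 2.]])
y = np.array([-1, -1, -1, 1, 1])
w, b = perceptron_train(X, y)
preds = np.sign(X @ w + b)
```

On linearly separable data such as this, the update rule is guaranteed to converge to a separating boundary in a finite number of steps.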

On the other hand, a lazy learner delays any generalization or model construction until it is

presented with an unseen instance to be classified [17]. Because it performs no processing until an unseen instance arrives, a lazy learner requires a lot of memory for storing all of the training instances and must process them each time it is presented with a new unseen instance. An example of a lazy learner is the k-nearest neighbor classifier

[18]. In this algorithm, the label of a given instance is predicted as the label most common among its k nearest neighbors, where k is a user-defined positive integer, normally with a small value [15].
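A minimal sketch of this prediction rule (illustrative; the two-cluster dataset and the value k = 3 are made up for the example):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x as the label most common among its
    k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]   # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Two well-separated clusters labeled -1 and +1.
X_train = np.array([[0., 0.], [0., 1.], [1., 0.],
                    [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([-1, -1, -1, 1, 1, 1])
label = knn_predict(X_train, y_train, np.array([5.5, 5.5]))
```

Note that all of the work (distance computation and voting) happens at prediction time, which is precisely what makes the learner "lazy".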

1.2 Problem Statement

In data classification, the corresponding label (class) for any valid input is predicted based on a

number of training examples referred to as the “training set”. This is achieved by applying a classifier-model learning algorithm to a training set made up of past examples having the same set of attributes as the unseen example [8], [12]. However, before starting the training, the label of

each example in the “training set” is known [19].


To build a classifier model, an eager learner attempts to construct a general rule in the training

phase which will subsequently be used in classifying unseen instances while a lazy learner delays

the process until it is presented with an unseen instance [13]. The main disadvantage in eager

learning is the long time which the learner takes in constructing the classification model but after

the model is constructed, an eager learner is very fast in classifying unseen instances, while for a

lazy learner, the disadvantage is the amount of space it consumes in memory and the time it takes

during the classification [17]. This makes dimensionality reduction a very crucial preprocessing

step because it facilitates classification, and compression of high-dimensional data and thus

conserves memory and provides a compact representation of an original high-dimensional data

[5].

Research has been conducted on how dimensionality reduction techniques affect the

performance of classifiers [20]–[22]. However, very little attention is given to the extent to which

these reduction techniques facilitate and preserve classification. Therefore, this thesis attempts to

advance the research by investigating the extent to which dimensionality reduction preserves the

classification of the weather, student, and ionosphere datasets obtained from the “UCI machine learning repository”, in order to fill the gap in the literature and provide steps for further

research in the area of machine learning.

1.3 Aim and Objectives

The aim of this research is to investigate the extent to which dimensionality reduction techniques

preserve classification.


The objectives of the research are as follows:

1. Implementation of fifteen dimensionality reduction techniques and using these techniques to

reduce the weather and student datasets, as well as the ionosphere dataset obtained from the UCI

machine learning repository [23].

2. Implementation of the perceptron classification algorithm and using it to classify the data points

of a two-class dataset. It shall also be applied to the datasets reduced from this two-class dataset

using the techniques above, and comparisons will be made to determine the extent to which the

reduction methods preserve the classification of the original dataset using the perceptron.

3. Implementation of the k-Nearest Neighbors classification algorithm and comparing the

performance of the dimensionality reduction techniques on preserving the classification of a

dataset by the k-nearest neighbors and perceptron classification algorithms.

4. Using confusion matrices to show the extent to which each dimensionality reduction method preserves the classification of the original datasets, and comparing the methods with one another.

1.4 Scope and Limitations

This project is limited to showing the extent to which each of the dimensionality reduction methods

implemented in this thesis preserves the classification of the original datasets by the perceptron

and k-Nearest Neighbors classification algorithms. Accuracy will be used as the performance

measure for showing the extent of the classification preservation, and this shall be obtained using

confusion matrices.
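For illustration, accuracy can be read directly off a two-class confusion matrix, as in this sketch (the actual and predicted labels here are made up, not results from the thesis experiments):

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for +1/-1 labels:
    rows = actual class, columns = predicted class, ordered (+1, -1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))    # true positives
    fn = np.sum((y_true == 1) & (y_pred == -1))   # false negatives
    fp = np.sum((y_true == -1) & (y_pred == 1))   # false positives
    tn = np.sum((y_true == -1) & (y_pred == -1))  # true negatives
    return np.array([[tp, fn], [fp, tn]])

# Made-up actual and predicted labels for six instances.
y_true = np.array([1, 1, -1, -1, 1, -1])
y_pred = np.array([1, -1, -1, -1, 1, 1])
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()   # (TP + TN) / total instances
```

Accuracy is simply the sum of the diagonal (correctly classified instances) divided by the total number of instances.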


1.5 Thesis Structure

This thesis consists of five chapters. Chapter 1 introduces dimensionality reduction and discusses

its importance and applications to machine learning tasks. After presenting the problem to be

addressed, the aim of the research is stated and the objectives are outlined.

Chapter 2 presents a review of literature related to dimensionality reduction and machine learning

in general. Existing literature on single-layer neural networks is also reviewed.

Chapter 3 describes the methodology used in this thesis. It discusses the methods in detail and

explains how they are applied in achieving the objectives of the thesis. The results obtained from

the methodology are presented and discussed fully in Chapter 4.

Chapter 5, which is the final chapter, provides a summary of the work and the results obtained in

this thesis, concludes the research and also gives a possible recommendation for further research.
