ABSTRACT

 

In this project, we implement the hierarchical clustering algorithm and apply it to several data sets, including a weather data set, a student data set, and a patient data set. We then reduce these data sets using six dimensionality reduction approaches: Random Projections (RP), Principal Component Analysis (PCA), Variance (Var), the New Random Approach (NRA), the Combined Approach (CA) and the Direct Approach (DA).
The Rand index and the adjusted Rand index (ARI) are implemented to measure the extent to which a given dimensionality reduction method preserves the hierarchical clustering of a data set. Finally, the six reduction methods are compared by runtime and by how well they preserve the inter-point distances, the variance and the hierarchical clustering of the original data set.
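As a rough illustration of the Rand index and ARI named in the abstract, the following Python sketch computes both from two flat cluster labelings of the same points. It is not the project's MATLAB implementation: the function names `rand_index` and `adjusted_rand_index` are hypothetical, and the sketch assumes each hierarchical clustering has already been cut into flat cluster labels.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree.

    A pair "agrees" when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    if len(labels_a) < 2:
        raise ValueError("need at least two points")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement (contingency-table form)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two points")
    # Pair counts inside each contingency cell, each row and each column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings are a single cluster):
        # treated here as perfect agreement, which is an assumption.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

In this setting, `labels_a` would come from clustering the original data and `labels_b` from clustering the reduced data; values closer to 1 indicate that the reduction preserved the clustering more faithfully. Both measures ignore the label names themselves, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` count as identical clusterings.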

 

TABLE OF CONTENTS

 

DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 HIERARCHICAL CLUSTERING
1.1 SNIPPET OF CLUSTERED DATA
3 DIMENSIONALITY REDUCTION TECHNIQUES
3.1.1 RANDOM PROJECTIONS (RP)
3.1.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
3.1.3 NEW RANDOM APPROACH
3.1.4 VARIANCE
3.1.5 COMBINED APPROACH
3.1.6 DIRECT APPROACH
4 IMPLEMENTATION
4.1.1 RANDOM PROJECTION (RP)
4.1.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
4.1.3 NEW RANDOM APPROACH
4.1.4 VARIANCE
4.1.5 DIRECT APPROACH
4.1.6 COMBINED APPROACH
5 RAND INDEX
6 CONCLUSION
7 REFERENCES
8 APPENDIX A: MATLAB CODES USED FOR IMPLEMENTATION

 

CHAPTER ONE

 

INTRODUCTION
Given a data set of n points in a high-dimensional space, it is often helpful to project it onto a lower-dimensional space without great distortion. This process is called dimensionality reduction. Essentially, dimensionality reduction reduces the number of variables under consideration while retaining the relevant information in the data.
Dimensionality reduction reduces the runtime of algorithms whose cost depends on the dimension of the working space. It also widens the choice of methods available for processing the data, and it provides a form of complexity control that helps avoid overfitting the training data.
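The idea of projecting points onto a lower-dimensional space can be sketched with a Gaussian random projection. This is only one common form of random projection, chosen here as an assumption; the excerpt does not say which variant the project uses. The function names `random_projection` and `euclidean` and the `seed` parameter are hypothetical, introduced only for this illustration.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project d-dimensional points onto k dimensions with a random matrix.

    Assumption: entries are drawn from N(0, 1) and scaled by 1/sqrt(k), so
    that squared inter-point distances are preserved in expectation.
    """
    if not points or k < 1:
        raise ValueError("need at least one point and k >= 1")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    d = len(points[0])
    scale = 1.0 / math.sqrt(k)
    # d x k projection matrix.
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    # Each projected point is the row vector p multiplied by the matrix.
    return [
        [sum(p[i] * matrix[i][j] for i in range(d)) for j in range(k)]
        for p in points
    ]


def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

One way to check the distortion is to compare `euclidean(points[i], points[j])` with the same distance measured between the projected points. The two values generally differ for any single pair, and the gap tends to shrink as `k` grows.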
Dimensionality reduction can be applied in several domains, including text data, image data, nearest-neighbor search, and clustering and classification. Clustering is the assignment of a set of observations to subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning. Classification, on the other hand, is a method of supervised learning: the task of the supervised learner is to predict the value of the target function for any valid input after having seen a number of training examples (i.e. pairs of input and target output). As mentioned above, this project focuses on the categorization of data using hierarchical clustering.
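To make the notion of hierarchical clustering concrete, here is a minimal agglomerative sketch in Python. It assumes single linkage and Euclidean distance, neither of which is specified in this excerpt, and the function name `single_linkage` is hypothetical. It is a naive version written for readability, not the project's MATLAB code.

```python
import math


def single_linkage(points, num_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Assumption: cluster distance is the smallest Euclidean distance between
    any two of their members (single linkage). Returns one label per point.
    """
    if not 1 <= num_clusters <= len(points):
        raise ValueError("num_clusters must be between 1 and len(points)")
    # Start with every point in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a; b > a, so index a stays valid.
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Stopping the merging at `num_clusters` corresponds to cutting the hierarchy at one level. Running the same function on the original and on the reduced data gives two labelings that can then be compared to judge how well the clustering was preserved.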
