1.1 Background of the Study
An Information Retrieval System is a system capable of storing, retrieving and maintaining information. The general objective of an Information Retrieval System is to minimize the overhead a user incurs in locating needed information. Overhead can be expressed as the time a user spends on all of the steps leading to reading an item that contains the needed information. The two major measures commonly associated with information retrieval systems are precision and recall. Information Retrieval (IR) is a large and growing field within Natural Language Processing (Magnus, 2006).
In file systems, a cluster (formerly called an allocation unit) is the smallest logical amount of disk space that can be allocated to hold a file or directory. In data analysis the term has a different meaning: cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups. It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields including machine learning, pattern recognition, image analysis, information retrieval and bioinformatics (Linus, 2014). Cluster analysis itself is not one specific algorithm but the general task to be solved.
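Precision and recall, the two measures named at the start of this section, are usually defined as the fraction of retrieved items that are relevant and the fraction of relevant items that are retrieved. The minimal Python sketch below computes both for a single query; the function name and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: IDs returned by a search vs. the truly relevant IDs.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"], relevant=["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning 0.0 when a set is empty is a convention chosen here to avoid division by zero; other definitions leave the value undefined in that case.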
Cluster analysis is a technique that assigns items to automatically created groups based on a calculation of the degree of association between items and groups. In the information retrieval (IR) field, cluster analysis has been used to create groups of documents with the goal of improving the efficiency and effectiveness of retrieval, or to determine the structure of the literature of a field. The terms in a document collection can also be clustered to show their relationships. The two main types of cluster analysis methods are the nonhierarchical methods, which divide a data set of N items into M clusters, and the hierarchical methods, which produce a nested data set in which pairs of items or clusters are successively linked. Nonhierarchical methods, such as the single-pass and reallocation methods, are heuristic in nature and require less computation than hierarchical methods. Clustered files are often suggested as a way to cut down search time in similarity-based systems (Caroline and Stephen). In such an organization, similar documents are grouped together in clusters, and only the most promising clusters are searched.
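The single-pass method and the idea of searching only the most promising clusters can be sketched together as follows. This is a minimal illustration under stated assumptions, not the study's implementation: documents are assumed to be term-frequency dictionaries, similarity is assumed to be cosine similarity, each cluster is represented by its centroid, and the threshold value (0.3) and the function names `single_pass` and `cluster_search` are hypothetical choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average term weights over the member vectors of a cluster."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def single_pass(docs, threshold=0.3):
    """Visit each document once: join the most similar existing cluster if that
    similarity reaches `threshold`, otherwise start a new cluster."""
    clusters = []  # each cluster: {"members": [...], "vectors": [...], "centroid": {...}}
    for doc_id, vec in docs.items():
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(doc_id)
            best["vectors"].append(vec)
            best["centroid"] = centroid(best["vectors"])  # recompute after joining
        else:
            clusters.append({"members": [doc_id], "vectors": [vec], "centroid": dict(vec)})
    return clusters


def cluster_search(query, clusters, docs, n_clusters=1):
    """Rank clusters by query-centroid similarity, then score only the
    documents inside the top `n_clusters` clusters."""
    ranked = sorted(clusters, key=lambda c: cosine(query, c["centroid"]), reverse=True)
    candidates = [d for c in ranked[:n_clusters] for d in c["members"]]
    return sorted(candidates, key=lambda d: cosine(query, docs[d]), reverse=True)


# Hypothetical toy collection: term-frequency vectors built from short texts.
docs = {
    "d1": Counter("vehicle registration plate number".split()),
    "d2": Counter("vehicle registration owner licence".split()),
    "d3": Counter("road accident report highway".split()),
    "d4": Counter("highway accident injury report".split()),
}
clusters = single_pass(docs, threshold=0.3)
print([c["members"] for c in clusters])  # [['d1', 'd2'], ['d3', 'd4']]
print(cluster_search(Counter(["accident", "injury"]), clusters, docs))  # ['d4', 'd3']
```

The result of single-pass clustering depends on the order in which documents arrive and on the threshold, which is one reason it is described as heuristic: it is cheap, but a different ordering can produce different clusters.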
The cluster hypothesis states the fundamental assumption we make when using clustering in information retrieval:
Cluster hypothesis: “Documents in the same cluster behave similarly with respect to relevance to information needs.”
The hypothesis states that if a document from a cluster is relevant to a search request, then other documents from the same cluster are likely to be relevant as well (Linus, 2014). This is because clustering puts together documents that share many terms; the underlying assumption is that similar documents behave similarly with respect to relevance.
Tree clustering is a form of clustering algorithm that successively joins objects into clusters using some measure of similarity or distance. A typical example of this kind of clustering is the hierarchical tree. Hierarchical clustering is based on the core idea that objects are more related to nearby objects than to objects farther away. As such, these algorithms connect objects to form clusters based on their distances.
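A naive bottom-up (agglomerative) version of hierarchical clustering can be sketched as follows. The text does not specify how the distance between two clusters is measured; this sketch assumes single linkage (the distance between the closest pair of members) and Euclidean distance, and the function name `single_linkage` is hypothetical. The returned merge history is the information a dendrogram draws.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest. Returns a list of
    (cluster_a, cluster_b, distance) merges; merged clusters get new ids
    len(points), len(points) + 1, ...
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> point indices
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Single linkage: cluster distance = smallest pairwise point distance.
        candidates = (
            ((a, b), min(math.dist(points[i], points[j])
                         for i in clusters[a] for j in clusters[b]))
            for a, b in combinations(clusters, 2)
        )
        (a, b), dist = min(candidates, key=lambda item: item[1])
        merges.append((a, b, dist))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (11, 11)]  # made-up 2-D data
for a, b, d in single_linkage(points):
    print(a, b, round(d, 2))
# 0 1 1.0
# 2 3 1.0
# 5 6 6.4
# 4 7 7.81
```

Recomputing every pairwise distance at each step makes this roughly cubic in the number of points, so it only suits small examples; library implementations use more efficient update rules.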
1.2 Statement of the Problem
Although Benue State is still a developing state, this has not slowed the growth in the number of vehicle owners in the state, and this means more work for the Federal Road Safety Corps (FRSC) in Benue State. Clear statistics of vehicle owners in each local government area, and in Benue State as a whole, are needed to combat the menace of fake vehicle registration, false driving licences, vehicle theft and similar offences, and to ensure that road users obey road rules and regulations through proper registration and monitoring. Achieving this requires an information system for advanced storage, processing and easy retrieval of records. With this in mind, this study is necessary: it will improve the vehicle registration process and give the FRSC quick and easy access to information on registered vehicles and their owners anywhere and at any time in the state. The system will, to a large extent, reduce the challenges and restrictions associated with the manual process of registering vehicles.
1.3 Justification for the Study
This study provides a means of easily storing and retrieving information on vehicles and their owners for the FRSC in Benue State. It removes the need to search through the entire directory when retrieving information on an existing record, and it will provide clear statistics of vehicle owners in each local government area of the state. The output of the study shall serve as a benchmark for the Federal Road Safety Corps on the application of tree-structured clustering in information retrieval, and the study will also serve as reference material for those who use this project.
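One hypothetical way to organize the records so that a lookup does not scan the entire directory is a two-level tree keyed first by local government area (LGA) and then by number plate. This is a plain grouping on a categorical field rather than a clustering algorithm, and the class name `RegistryTree`, the record fields and the sample plate numbers and owners are illustrative assumptions, not the study's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VehicleRecord:
    # Hypothetical fields for illustration only; not the study's schema.
    plate: str
    owner: str
    lga: str  # local government area of registration


class RegistryTree:
    """Two-level tree of records: LGA -> number plate -> VehicleRecord."""

    def __init__(self):
        self._by_lga = defaultdict(dict)

    def register(self, record):
        branch = self._by_lga[record.lga]
        if record.plate in branch:
            # Reject a second registration of the same plate in the same LGA.
            raise ValueError(f"plate {record.plate} already registered in {record.lga}")
        branch[record.plate] = record

    def find(self, lga, plate):
        # Only one LGA branch is consulted; .get() avoids creating empty branches.
        return self._by_lga.get(lga, {}).get(plate)

    def registrations_per_lga(self):
        # Per-LGA counts: the kind of statistic the study calls for.
        return {lga: len(branch) for lga, branch in self._by_lga.items()}


reg = RegistryTree()
reg.register(VehicleRecord("BN-0001", "Owner A", "Makurdi"))
reg.register(VehicleRecord("BN-0002", "Owner B", "Gboko"))
reg.register(VehicleRecord("BN-0003", "Owner C", "Makurdi"))
print(reg.find("Makurdi", "BN-0003").owner)  # Owner C
print(reg.registrations_per_lga())  # {'Makurdi': 2, 'Gboko': 1}
```

The duplicate check here only looks inside one LGA; catching the same plate registered in two different LGAs would need an additional state-wide index on the plate number, which is left out to keep the sketch short.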
1.4 Aim and Objectives of the Study
This study is designed to help the Federal Road Safety Corps in Benue State register vehicles and their owners easily and retrieve this information efficiently at any time and from anywhere, within and outside the state. The objectives of the study are:
i. To review clustering as a technique for structuring data for easy storage and retrieval, as an alternative to manual storage and access.
ii. To develop a system that will minimize or curb the menace of fake registration.
iii. To evaluate the system's performance based on accuracy, speed and safety.
1.5 Scope of the Study
The system developed is based on the retrieval of records of vehicle owners registered in different local government areas, using Benue State as a case study. The system can be used by a government agency, mainly the Federal Road Safety Commission, for quick and easy access to registered records. Through it, records can be easily updated or
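The terms used in this project work are defined as follows:
Cluster: in file systems, the smallest logical amount of disk space that can be allocated to hold a file or directory; in cluster analysis, a group of objects that are more similar to each other than to objects in other groups.
Algorithm: a sequence of steps used to find a solution to a particular problem.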
Database: a computerized or automated record keeping system
Query: a formal statement of an information need.
Oracle Database: a relational database which is queried using Structured Query Language (SQL).
Hierarchical clustering: a method of data analysis which seeks to build a hierarchy of clusters.
FRSC: Federal Road Safety Commission. A government agency charged with the
statutory responsibility of road safety administration in Nigeria.
Information Retrieval (IR): a discipline involved with the organization, structuring,
analysis, storage, searching and dissemination of information.
Cluster Centroid: the point with coordinates equal to the average values of the variables
for the observations in that cluster.
Multivariate dataset: a collection of data items described by multiple variables.
Dendrogram: a tree diagram used to illustrate the arrangement of clusters produced by hierarchical clustering.
Categorical Data: values or observations that can be sorted into groups or categories.
Numerical Data: values or observations that can be measured.
Number plate: a metal or plastic plate attached to a vehicle for official registration and identification purposes.
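The cluster centroid defined above can be written as a formula. Here C_k denotes the set of observations in cluster k and x_i the vector of variable values of observation i; this notation is introduced only for illustration.

```latex
\mathbf{c}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
```

For example, a cluster with the three two-variable observations (2, 4), (4, 6) and (6, 8) has centroid ((2 + 4 + 6)/3, (4 + 6 + 8)/3) = (4, 6). The centroid is defined for numerical data; categorical data has no average, so clusters of categorical values are usually summarized differently (for example by their most frequent category).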