Project File Details


File Type: MS Word (DOC) & PDF
File Size: 1,125KB
Number of Pages: 78

ABSTRACT

Data mining is used to extract rules for predicting information in many areas, including information technology, medical science, biology, education, and human resources. Applied to medical data, it can uncover novel, useful, and potentially life-saving knowledge that reduces treatment cost, increases diagnostic and prediction accuracy, and saves human resources. Data mining involves several techniques, such as anomaly detection, classification, regression, clustering, time series analysis, association rules, and summarization. Classification is among the most important applications of data mining. In this thesis, we use a classification technique called Naïve Bayes (a supervised learner) to build a hybrid framework for classifying and predicting the status of malaria and its complications in a suspected patient from their clinical presentation. For the purpose of this study, we considered the following parameters as the main distinct clinical symptoms: fever, headache, nausea, vomiting, respiratory distress, convulsion, and coma. The method is easy to construct, can classify categorical data, assumes that the attributes are independent of one another, and works well on high-dimensional data. The framework was divided into two phases, Classification Phase 1 and Classification Phase 2, and was implemented in Java on top of the Weka library, version 3.8.0. It was trained on data acquired from a hospital and tested for performance accuracy using the Receiver Operating Characteristic (ROC) and the Confusion Matrix (CM). The system achieved a performance accuracy of 90% and 98% on the confusion matrix, and 92% and 99% on the ROC area under the curve (ROC-AUC), for Classification Phase 1 and Classification Phase 2 respectively. In other words, ROC-AUC gave higher scores than the confusion matrix, and such a system should be useful in rural areas where clinicians or medical equipment are not available to assist in predicting malaria in suspected malaria patients.

TABLE OF CONTENTS

Declaration
Certificate
Acknowledgements
Abstract
List of Abbreviations
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
1.2 Data Mining
1.3 Classification
1.4 Problem Statement
1.5 Research Aim & Objectives
1.5.1 Research Objectives
1.6 Limitation of the Study
1.7 Chapterization
CHAPTER TWO
LITERATURE REVIEW
2.1 Basic Concept of Machine Learning, Data Mining, and Classification
2.2 Data Mining
2.2.1 Clustering
2.2.2 Classification
2.2.3 Decision Tree Classifier
2.2.4 Support Vector Machine (SVM)
2.3 Model of Prediction
2.4 Review of Literature
2.4.1 Prediction Systems Using Naïve Bayes Technique
2.4.2 Hybrid Model of Prediction Systems
2.4.3 Model Evaluation
CHAPTER THREE
MATERIALS AND METHODS
3.1 Concept of Classification Technique
3.2 Naïve Bayes Classification
3.3 Software Design Phase
3.3.1 Requirement Elicitation
3.3.2 Outline of the Requirement
3.3.3 Software Requirement
3.3.4 Hardware Requirement
3.4 Use Case of the System
3.5 Naïve Bayes Algorithm
3.6 The Proposed Framework of Prediction
3.6.1 Prediction Procedure
3.7 Data Collection and Sample Size Technique
3.7.1 Sample Size Determination
3.8 Experimental Setup
3.9 Example 1
3.10 Performance Measure of Classifier
3.10.1 Accuracy Check
3.10.2 Confusion Matrix
3.10.3 ROC and Area under Curve
CHAPTER FOUR
RESULTS AND DISCUSSION
4.1 Presentation of Results
4.1.1 Add Record
4.1.2 Designed Naïve Bayes Model
4.1.3 Prediction Interface
4.1.4 Performance Result
4.2 Our Contribution
CHAPTER FIVE
SUMMARY, CONCLUSION, AND RECOMMENDATIONS
5.1 Summary
5.2 Conclusion
5.3 Recommendations
5.4 Future Work

CHAPTER ONE

INTRODUCTION
This chapter introduces the basic ideas of malaria, data mining, and classification; the problem statement, which concerns the severity of malaria; our research objectives and contribution; and finally the chapterization of the entire thesis.
1.1 Background of the Study
Malaria is a life-threatening parasitic disease. The parasite is carried in the saliva of mosquitoes and is transmitted through a bite (Pirnstill & Coté, 2015; Razzak, 2015). After a person is bitten, the parasite takes about 45 minutes to spread through the bloodstream (Bartoloni & Zammarchi, 2012). The infection then begins attacking the body's red blood cells and liver cells, altering the body's biochemistry and the structural attributes of the cells (Pirnstill & Coté, 2015). The four most common species of malaria parasite found in Sub-Saharan Africa are P. falciparum, P. vivax, P. ovale, and P. malariae (Calderaro, Piccolo, Gorrini, Rossi, Montecchini, Dell'Anna, & Arcangeletti, 2013). Among them, P. falciparum is regarded as the most common cause of malaria and of its severe cases (Gomes, Vitorino, Costa, Mendonça, Oliveira, & Siqueira-Batista, 2011). Historically, therefore, malaria in Sub-Saharan Africa has been attributed to P. falciparum (Howes et al., 2015).
Estimates of the burden of this infection have been widely emphasized, with initial estimates of 300-500 million clinical cases annually leading to 1-3 million deaths globally (Chotivanich, Silamut, & Day, 2007). P. falciparum malaria alone was estimated to affect approximately 40% of the world's population (2.4 billion people) (Gomes et al., 2011). According to the World Malaria Report (2010), 225 million cases and 781,000 deaths were recorded worldwide in 2009 (Mohapatra, Jangid, & Mohanty, 2014), and of a total of 198 million cases of malaria documented in 2013, 584,000 resulted in death (Gu, Chen, & Yang, 2015). The most devastating effect of malaria is that it mainly targets children under 5 years of age, with annual estimates of more than 300 million cases globally and more than 3,000 pediatric deaths per day (Stauffer & Fischer, 2003). From a clinical point of view, the infection presents in patients at many levels of severity, and at some degree it may cause a neurological complication of the brain known as cerebral malaria (Idro, Marsh, John, & Newton, 2010).
Cerebral malaria can be defined as any abnormality of mental status in a patient with malaria, and it has a case fatality rate between 15% and 50%. It is a rapidly progressive, potentially lethal complication of P. falciparum infection, characterized by unarousable, persistent coma along with regular motor signs. The most vulnerable groups are pregnant women, children, and adults with weak immune systems. Many scholars agree that the most frequently occurring severe complications of malaria are severe anemia and cerebral malaria. Cerebral malaria is the most evident neurological complication of P. falciparum infection. The syndrome is clinically characterized by the presence of asexual forms of the parasite and by coma that is not caused by any other concomitant disease (Idro et al., 2010).
1.2 Data Mining
Data mining is the process of scrutinizing data from different viewpoints and extracting knowledge from it (Hemanth, Vastrad, & Nagaraju, 2011; Dangare & Apte, 2012). It involves the use of numerous techniques in pattern recognition and knowledge presentation. In practice, data mining tasks are accomplished using association, class description, classification, prediction, clustering, and time series analysis (Srinivas, Rani & Govrdhan, 2010).
The healthcare industry in the 21st century is rich in data, and this data is the raw material for data mining and knowledge discovery (Dangare & Apte, 2012). Knowledge discovery is a well-defined process of distinct phases, and data mining is the key phase in which useful hidden knowledge is discovered in large databases (Soni, Ansari, Sharma, & Soni, 2011; Srinivas et al., 2010). For example, the huge amount of medical data gathered in hospitals on a daily basis contains hidden information. This hidden information can be extracted using various data mining techniques and used for decision making (Taneja, 2013). Data mining provides a mechanism for discovering novel, previously unobserved patterns in data (Srinivas et al., 2010). It can also be described as the process of discovering patterns and mining them from large datasets (Jothi, Rashid, & Husain, 2015).
Data mining has been used in the healthcare sector to provide assistive tools for early detection and prediction systems for various diseases that can be used for decision making (Jothi et al., 2015). It sometimes combines classification and clustering techniques to achieve a common goal (Patil, Chopade, Mishra, Sane, & Sargar, 2016). Data mining can be useful in answering many vital and critical questions about health care (Srinivas et al., 2010). For example, it has been used to predict malaria outbreaks using a large data set from Maharashtra state, India. The prediction was achieved using two data mining classification techniques, Support Vector Machine (SVM) and Artificial Neural Networks (Sharma, Kumar, Panat, & Karajkhede, 2015). Data mining is also used to improve quality of service in the healthcare industry and to assist medical practitioners in reducing the number of adverse drug effects and in recommending cheaper, medicinally equivalent substitutes (Srinivas et al., 2010).
1.3 Classification
Classification is the process of finding a model that describes and distinguishes class labels, so that the model can be used to predict the class of tuples whose class label is unknown (Soni, Ansari, Sharma, & Soni, 2011). The derived model is built from the analysis of training data, i.e., data tuples whose class label is known (Tribhuvan, Tribhuvan, & Gade, 2015).
One key area of data mining that demonstrates its application in the healthcare sector is the use of classification techniques to classify and predict various diseases (Srinivas et al., 2010). Classification is a supervised learning method that extracts models describing significant data classes or predicting upcoming trends (Soni et al., 2011). It is the process of assigning a class to previously unseen records as accurately as possible, using a collection of records called a training dataset, in which each tuple comprises a set of attributes, one of which is the class. The objective is to build a classification model for the class attribute and then validate it on a test data set in order to determine the accuracy of the model (Kuar & Wasan, 2006). This technique has been used in the healthcare environment to automatically diagnose a patient's disease so that immediate treatment can be chosen while lab-test results are awaited (Nikam, 2015).
Several classification techniques exist in data mining, including Support Vector Machine (SVM), K-nearest neighbor, Naive Bayes, IB3, Artificial Neural Network (ANN), Decision Tree, and J48 (the C4.5 version of decision tree classification) (Nikam, 2015). Which of these techniques is used to classify records depends on the nature of the patterns in the data and on the phenomena under investigation (Kriegel, Borgwardt, Kröger, Pryakhin, Schubert, & Zimek, 2007).
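The train-then-validate procedure described in this section can be illustrated with a minimal sketch of a categorical Naive Bayes classifier. This is not the thesis's Weka-based implementation: the function names, the Laplace (add-one) smoothing, the two attributes (fever, headache), and every record in the toy dataset are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
    # Number of distinct values per attribute, used for add-one smoothing.
    n_values = [len({r[i] for r in records}) for i in range(len(records[0]))]
    return class_counts, value_counts, n_values


def predict(model, record):
    """Return the class with the highest prior times smoothed likelihoods."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(record):
            seen = value_counts[(i, label)][value]
            score *= (seen + 1) / (count + n_values[i])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, records, labels):
    """Fraction of held-out records whose predicted class matches the label."""
    correct = sum(predict(model, r) == y for r, y in zip(records, labels))
    return correct / len(labels)


# Hypothetical toy data: each record is (fever, headache); labels are
# "P" (positive) or "N" (negative). These are NOT the thesis's hospital data.
train_records = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
                 ("no", "no"), ("no", "yes"), ("no", "no")]
train_labels = ["P", "P", "P", "N", "N", "N"]
test_records = [("yes", "yes"), ("no", "no")]
test_labels = ["P", "N"]

model = train_naive_bayes(train_records, train_labels)
print("held-out accuracy:", accuracy(model, test_records, test_labels))
```

The model is fitted only on the training tuples, and accuracy is measured only on records the model has not seen, which mirrors the separation between training set and test set described above.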
1.4 Problem Statement
The clinical presentation of malaria is the set of symptomatic features presented by a patient. These features indicate the course of the disease and are therefore directly significant in guiding clinicians' decisions. The combination of symptoms and signs has proven highly successful in predicting disease (Lubezky, Ben-Haim, Nakache, Lahat, Blachar, Brazowski, & Klausner, 2010). Even the standard diagnostic criteria developed by clinicians and researchers were based on clinical manifestations, which support an integrated approach to the treatment and management of disease (Laishram, Sutton, Nanda, Sharma, Sobti, Carlton, & Joshi, 2012; Patil et al., 2016).
The main challenge confronting the healthcare industry is providing quality service at minimal cost. Quality service implies diagnosing patients and predicting their conditions correctly (Chaurasia & Pal, 2013), administering therapy that is effective, and sometimes even understanding the complications that may result from diseases (Srinivas et al., 2010; Dangare & Apte, 2012). Failure to do so can lead to poor clinical decisions with devastating and unacceptable consequences (Dangare & Apte, 2012).
The availability of patients' medical data has driven the need among clinicians, payers, and patients for a computer-based assessment tool that can assist in decision making (Soni et al., 2011). For example, physicians can compare diagnostic information across numerous patients with the same condition, and they can check their findings against those of other physicians dealing with a similar case in another part of the country (Srinivas et al., 2010). For a disease that can become complicated, such as malaria, the patient is first classified as positive or negative for malaria, and the severity of the disease is then classified as either uncomplicated (mild) or complicated (severe) based on clinical manifestation (Bartoloni & Zammarchi, 2012). In each case, it is highly important to understand the clinical features of these classes (Gomes et al., 2011). For classifying positive malaria, the most common clinical features are fever, headache, vomiting, and loss of appetite, in accordance with the report on predicting malaria (Ndyomugyenyi, Magnussen, & Clarke, 2007). On this basis, we consider these features in predicting positive malaria in phase 1. Further classification as complicated or uncomplicated takes place in phase 2. The following classes are considered in the study:
i. Positive Class (P): A patient is confirmed to have positive malaria when the patient has one or more of the above symptoms and the infection has also been confirmed by laboratory tests (Mutanda, Cheruiyot, Hodges, Ayodo, Odero, & John, 2014). A positive diagnostic result confirms the clinical suspicion of malaria (Bartoloni & Zammarchi, 2012). Therefore, in the course of this study, we considered fever, headache, nausea, and vomiting as the most frequently occurring symptoms in a patient with malaria.
ii. Negative Class (N): A patient may have some of the parameters (symptoms) of positive malaria, but after several diagnostic tests malaria remains undetectable (CDC, 2013). This means that the signs may be the result of another concomitant disease rather than of malaria (Rai & Abraham, 2012).
iii. Mild or Uncomplicated Class (U): Mild malaria is the presence of one or a few signs of clinical manifestation, such as mild fever, sweating, weakness, chills, and loss of appetite, coupled with a headache or a recent history of malaria, but with no signs of severity (Arévalo-Herrera, Lopez-Perez, Medina, Moreno, Gutierrez, & Herrera, 2015). The core features in this case are fever and headache (Rai & Abraham, 2012).
iv. Severe or Complicated Class (C): Severe malaria manifests with one or more of the following features: repeated generalized convulsions, impaired consciousness or coma, circulatory collapse or shock, acute respiratory distress syndrome, severe anemia, renal failure, metabolic acidosis, hypoglycemia, hyperthermia, abnormal bleeding, and hyperparasitaemia (Bartoloni & Zammarchi, 2012; Laishram et al., 2012). Headache and high fever are common across all classes of patients with suspected malaria (Rai & Abraham, 2012).
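The two-phase scheme and the four classes above can be read as a cascade: a phase 1 model decides between positive (P) and negative (N), and only positive cases are passed to a phase 2 model that decides between uncomplicated (U) and complicated (C). The sketch below illustrates that control flow only, under stated assumptions. The split of symptoms between the phases is one reading of the text, the names `classify_patient`, `stub_phase1`, and `stub_phase2` are hypothetical, and the stub functions are placeholders for trained classifiers, not clinical rules.

```python
from typing import Callable, Dict

Symptoms = Dict[str, bool]

# Assumed feature split, based on the symptoms named in the text.
PHASE1_FEATURES = ("fever", "headache", "nausea", "vomiting")
PHASE2_FEATURES = ("respiratory_distress", "convulsion", "coma")


def classify_patient(symptoms: Symptoms,
                     phase1: Callable[[Symptoms], str],
                     phase2: Callable[[Symptoms], str]) -> str:
    """Return 'N', 'U', or 'C' by chaining two classifiers.

    phase1 must return 'P' or 'N'; phase2 must return 'U' or 'C'.
    Symptoms missing from the input are treated as absent.
    """
    phase1_input = {name: symptoms.get(name, False) for name in PHASE1_FEATURES}
    if phase1(phase1_input) == "N":
        return "N"  # negative cases never reach phase 2
    phase2_input = {name: symptoms.get(name, False) for name in PHASE2_FEATURES}
    return phase2(phase2_input)


# Hypothetical stand-ins for trained models. They exist only so the sketch
# runs end to end and must not be read as diagnostic criteria.
def stub_phase1(features: Symptoms) -> str:
    return "P" if features["fever"] else "N"


def stub_phase2(features: Symptoms) -> str:
    return "C" if any(features.values()) else "U"


print(classify_patient({"fever": True, "headache": True}, stub_phase1, stub_phase2))
print(classify_patient({"fever": True, "coma": True}, stub_phase1, stub_phase2))
print(classify_patient({"headache": True}, stub_phase1, stub_phase2))
```

Passing the two classifiers in as arguments keeps the cascade independent of how each phase is trained, so either stub could be swapped for a fitted Naive Bayes model without changing the control flow.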
1.5 Research Aim & Objectives
The aim of this study is to design a hybrid model for predicting malaria that utilizes a large dataset obtained from a hospital.
1.5.1 Research Objectives
The major objectives of this research are:
