ABSTRACT

Heart disease is reported to be the leading cause of death worldwide. The health sector holds a large amount of data, but these data are not well utilized because of a lack of effective analysis tools for discovering salient trends. Data mining can help retrieve valuable knowledge from the available data. It makes it possible to train models that predict patients’ health status faster than clinical experimentation. Much research has been carried out on the Cleveland heart disease dataset. Machine learning algorithms such as K-Nearest Neighbor, Support Vector Machine, Logistic Regression, and Naïve Bayes have been applied to it, but modeling with Bayesian belief networks has been limited. This research addresses that gap. It applies Bayesian network (BN) modeling to discover the relationships among the 14 relevant attributes of the Cleveland heart disease dataset from the University of California, Irvine. The BN produces a reliable and transparent graphical representation of the relationships between the attributes and can predict new scenarios, which makes it an artificial intelligence tool. The model has an accuracy of 85%, precision of 86%, recall of 85%, and F1-score of 85%. It was concluded that the model outperformed the Naïve Bayes classifier, which had an accuracy of 80%, precision of 81%, recall of 80%, and F1-score of 80%.
Keywords: Naïve Bayes classifier, Bayesian network, machine learning, data mining, artificial intelligence
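
The four evaluation metrics quoted in the abstract can all be derived from the counts of a confusion matrix. The following is a minimal sketch, assuming binary labels with 1 as the positive (disease) class. The function name binary_metrics and the label lists are hypothetical and made up for illustration; they are not the thesis's data or code, and the thesis may have averaged its metrics differently.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one labelled example is required")
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels purely for illustration (NOT results from the thesis).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

In a diagnostic setting, recall is often the metric to watch, since a false negative corresponds to a missed case of disease.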

 

 

TABLE OF CONTENTS

 

CERTIFICATION
ABSTRACT
DEDICATION
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER ONE
INTRODUCTION
1.1 Research Background
1.2 Problem Statement
1.3 Research Aim and Objectives
1.4 Expected Contributions
1.5 Thesis Structure
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Machine Learning
2.2.1 Supervised Learning
2.2.2 Unsupervised Learning
2.2.3 Semi-Supervised Learning
2.2.4 Reinforcement Learning
2.3 Naïve Bayes
2.4 Bayesian Belief Network
2.4.1 Some Basic Definitions in BB Network
2.5 Application of Bayesian Network Model
2.6 Some Programming Modules for Bayesian Network Programming
2.7 Review of Literature
CHAPTER THREE
METHODOLOGY
3.1 Introduction
3.2 Network Design
3.3 Cleveland Heart Disease Data Set
3.4 Preprocessing Data
3.4.1 Data Retrieval
3.4.2 Handling Missing Values
3.4.3 Target Class Transformation
3.4.4 Data Discretization
3.5 Performance Metrics
3.6 Tools Used
CHAPTER FOUR
IMPLEMENTATION
4.1 Introduction
4.2 Data Preprocessing
4.2.1 Data Retrieval
4.2.2 Handling Missing Values
4.2.3 Target Class Transformation
4.2.4 Label Encoding
4.2.5 Data Discretization
4.3 Structure Learning and Parameter Learning
4.3.1 Structure Learning using Hill Climbing Algorithm
4.3.2 Parameter Learning
4.4 Training the Network
4.5 Testing
4.6 Performance Evaluation
4.6.1 Comparison with Naïve Bayes
CHAPTER FIVE
CONCLUSION
5.1 Conclusion

 

CHAPTER ONE

 

INTRODUCTION
1.1 Research Background
The heart is a vital organ of the human body. It pumps blood through the vessels of the circulatory system, and the blood carries the oxygen that body cells need to function. The heart beats about 100,000 times per day. Heart diseases, also called cardiovascular diseases (CVDs), are the most common cause of death globally. According to the WHO, men and women are equally affected by heart disease. The WHO estimated that 17.9 million people died from heart disease in 2016, representing 31% of all global deaths; 85% of these deaths were caused by stroke and heart attack (WHO, 2016).
Cardiovascular diseases arise when the heart and blood vessels do not work normally, and other problems often accompany them. Arteriosclerosis is the hardening of the arteries, in which the arteries become thicker and inflexible. Atherosclerosis is the narrowing of the arteries by buildups, so that less blood flows through them (Varun, Mounika, Sahoo, & Eswaran, 2019). Heart attacks generally occur when a blood clot or other blockage obstructs the flow of blood to the heart.
To underline the importance of reducing deaths from cardiovascular diseases, the WHO launched a new program called Global Hearts on 22 September 2016 (WHO, 2017).
Factors that predispose people to heart disease include smoking, high cholesterol, high blood pressure, physical inactivity, unhealthy diet, obesity, and poorly controlled diabetes. Heart disease is usually diagnosed by taking a medical history and by using a stethoscope, ultrasound, and electrocardiography (ECG).
Data mining helps to identify useful trends in large data sets. As the amount of health data gathered through electronic health record (EHR) systems grows, strong analysis tools become increasingly important. With so much data available, health care providers now use data mining to optimize the efficiency of their organizations. Data mining has helped the health care industry to reduce costs by increasing efficiency, to improve patients’ quality of life, and, most importantly, to save more lives. In health care, data mining has proven effective in areas such as predictive medicine, customer relationship management, detection of fraud and abuse, health care management, and measuring the effectiveness of certain treatments (USF, 2019). Data mining can be applied to health data for many different purposes and investigations. These applications can be grouped roughly into the four main categories discussed below (Tekieh & Raahemi, 2015).
Clinical Decision Making
Patients are normally examined by clinicians to diagnose their diseases. This process is experimental in nature and there is a possibility of the diagnosis being wrong. Data mining gives the experts in the field a second opinion for most diagnoses, especially to make sure the disease is not under-estimated during diagnosis. This information can help the clinicians to make more accurate decisions. It also helps the providers to deliver higher quality services.
Biomedicine and Genetics
This is another application of data mining in health care. Some diseases are studied at the biomedical and molecular levels in addition to the clinical level. As the amount of biomedical data grows, the effects of genetics on different diseases can be investigated at the micro level. In microarray data analysis, clustering techniques have received more attention than classification and association because little information is available about genes, in contrast to health conditions and disease symptoms, about which much is known (Yoo, et al., 2011).
Population Health
Epidemiologists and other health analysts who focus on the prevalence of diseases are interested in identifying the patterns, trends, and causes of the spread of a specific disease across a population. For these studies, they consider different risk factors and health determinants, including early-life, lifestyle, and socio-demographic factors (Tekieh & Raahemi, 2015).
Health Administration and Policies
Handling insurance plans is a major challenge in health administration, and insurance fraud is a persistent problem. Data mining has been applied to detect fraud in which doctors, patients, or hospitals claim for drugs that were not necessary or procedures that never took place. Such fraud can drive an insurance company to bankruptcy. One solution is a real-time predictive model that helps determine which drugs are necessary for each diagnosis.
1.2 Problem Statement
Clinical treatments carry risks such as delayed results and the unavailability of medical facilities to many people, so a prediction model is recommended. Although a prediction model is not an alternative to clinical treatment, it can serve as a first-line tool for becoming aware of a disease and preparing for it.
1.3 Research Aim and Objectives
The aim of this research is to build a probabilistic graphical model, specifically a Bayesian network, to understand the relationships between the attributes of the Cleveland heart disease dataset. The objectives set to achieve this aim are:
i. To transform the Cleveland dataset into a form suitable for probabilistic graphical modeling.
ii. To learn the structure and parameters of the model from the dataset.
iii. To perform inference on the constructed model.
iv. To compare the model with the Naïve Bayes algorithm.
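
Objectives ii and iii can be illustrated with a small sketch of parameter learning and inference in a discrete Bayesian network. Everything in it is an assumption made for illustration: the three variables, the fixed structure age_band -> disease -> chest_pain, the toy records, and the helper names fit_cpt, joint and posterior are hypothetical and are not the network, data or code of the thesis. The structure is fixed by hand here rather than learned, parameters are estimated by counting with Laplace smoothing, and inference is done by brute-force enumeration.

```python
from collections import Counter
from itertools import product

# Hypothetical variables and hand-fixed structure (not the learned network).
STATES = {
    "age_band": ["under_55", "55_plus"],
    "disease": [0, 1],
    "chest_pain": ["typical", "other"],
}
PARENTS = {"age_band": [], "disease": ["age_band"], "chest_pain": ["disease"]}

# Made-up toy records purely for illustration (NOT Cleveland data).
ROWS = [
    {"age_band": "under_55", "disease": 0, "chest_pain": "other"},
    {"age_band": "under_55", "disease": 0, "chest_pain": "other"},
    {"age_band": "under_55", "disease": 0, "chest_pain": "typical"},
    {"age_band": "under_55", "disease": 1, "chest_pain": "typical"},
    {"age_band": "55_plus", "disease": 1, "chest_pain": "typical"},
    {"age_band": "55_plus", "disease": 1, "chest_pain": "typical"},
    {"age_band": "55_plus", "disease": 1, "chest_pain": "other"},
    {"age_band": "55_plus", "disease": 0, "chest_pain": "other"},
]


def fit_cpt(rows, child, parents, alpha=1.0):
    """Estimate P(child | parents) by counting, with Laplace smoothing."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    cpt = {}
    for pa in product(*(STATES[p] for p in parents)):
        total = (sum(counts[(pa, s)] for s in STATES[child])
                 + alpha * len(STATES[child]))
        cpt[pa] = {s: (counts[(pa, s)] + alpha) / total
                   for s in STATES[child]}
    return cpt


# Parameter learning: one conditional probability table per node.
CPTS = {v: fit_cpt(ROWS, v, PARENTS[v]) for v in STATES}


def joint(assignment):
    """Joint probability as the product of each node's CPT entry."""
    p = 1.0
    for v, parents in PARENTS.items():
        pa = tuple(assignment[q] for q in parents)
        p *= CPTS[v][pa][assignment[v]]
    return p


def posterior(query, evidence):
    """P(query | evidence) by summing the joint over unobserved variables."""
    hidden = [v for v in STATES if v != query and v not in evidence]
    scores = {}
    for q in STATES[query]:
        total = 0.0
        for values in product(*(STATES[h] for h in hidden)):
            assignment = {**evidence, query: q, **dict(zip(hidden, values))}
            total += joint(assignment)
        scores[q] = total
    z = sum(scores.values())
    return {q: s / z for q, s in scores.items()}


p_given_pain = posterior("disease", {"chest_pain": "typical"})
p_given_both = posterior(
    "disease", {"chest_pain": "typical", "age_band": "55_plus"})
```

Enumeration is exponential in the number of unobserved variables, so it only suits very small networks; a dedicated Bayesian network library would normally be used for a model with 14 attributes.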
1.4 Expected Contributions
It is expected that the Bayesian network model will assist in making inferences about heart disease, thereby serving as a diagnostic tool that supports medical practitioners.
1.5 Thesis Structure
The thesis contains five basic chapters.
Chapter 1 introduces heart disease and presents the problem statement, aim, objectives, expected contributions, and thesis structure.
Chapter 2 gives an overview of machine learning and Bayesian networks and critically reviews the literature.
Chapter 3 discusses the research methodology and the network design. It also describes the Cleveland heart disease dataset, the data preprocessing steps, and the tools used for the study.
Chapter 4 provides a detailed discussion of the system implementation and the results.
Chapter 5 rounds off the research with the conclusion.
