ABSTRACT

 

Breast cancer is a prevalent disease that mostly affects women, and early diagnosis expedites its treatment. In recent times, machine learning (ML) techniques have been employed in biomedicine and informatics to help fight breast cancer. This research work proposes an ML model for the classification of breast cancer. To achieve this, we employed logistic regression (LR) and compared our model’s performance with that of other extant ML models, namely Support Vector Machine (SVM), Naïve Bayes (NB), and Multilayer Perceptron (MLP). The original Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used. Performance was evaluated in two phases: Phase 1, when the dataset is scaled (feature scaling), and Phase 2, when the dataset is not scaled. Without feature scaling, all models except MLP performed well, with f1-scores of LR = 97%, SVM = 97%, NB = 95%, and MLP = 52%. With feature scaling, all four models achieved f1-scores above 90% (SVM = 98%, LR = 97%, NB = 97%, MLP = 97%). Notably, the f1-score for LR was the same in both cases. We therefore concluded that LR, given its simplicity and low time complexity, is a good model to employ for binomial classification.
Keywords: Logistic regression, machine learning, supervised learning, feature scaling, prediction models, performance metrics.

 

TABLE OF CONTENTS

Certification
Dedication
Acknowledgement
Table of Contents
List of Tables
List of Figures
List of Abbreviations
CHAPTER ONE
INTRODUCTION
1.1 Research Background
1.1.1 Data Mining
1.1.2 Classification
1.2 Problem statement
1.3 Research Aim and objectives
1.4 Limitation of study
1.5 Paper organization
CHAPTER TWO
LITERATURE REVIEW
2.1 Basic Terminologies and Concepts
2.1.1 Data Pre-processing
2.1.2 Feature scaling
2.1.3 Supervised Learning
2.1.4 Classification
2.2 Literature Review
CHAPTER THREE
MATERIALS AND METHOD
3.1 Concept of Classification Technique
3.2 Software Design Phase
3.3 Hardware Requirement
3.4 Proposed Framework
3.4.1 Experiments
3.4.2 Data collection
3.4.2 Data pre-processing

 

 

CHAPTER ONE

INTRODUCTION
1.1 Research Background
Breast cancer is now one of the most prevalent cancers affecting humans, especially women, and early diagnosis would go a long way towards reducing the damage it does to its victims. The causes of breast cancer are multifactorial and include family history, obesity, hormones, radiation therapy, and reproductive factors. Every year, one million women are newly diagnosed with breast cancer; according to a World Health Organization report, half of them will die because the cancer is usually detected late (Aaltonen et al., 1998). Breast cancer can be categorized into two types: malignant and benign. A case can be classified as malignant or benign by scientifically studying the features of breast tumours, lumps, or other abnormalities found in the breast. A benign case carries less risk and is not life-threatening, whereas a malignant case is life-threatening (Huang, Chen, Lin, Ke, & Tsai, 2017). Malignant tumours invade neighbouring cells and can spread to other parts of the body, whereas benign masses cannot spread to other tissues; their growth is confined to the mass itself (Aaltonen et al., 1998; Huang et al., 2017).
To accurately classify breast cancer as benign or malignant, researchers have employed machine learning, a branch of artificial intelligence (AI). Machine learning algorithms are used to build models that accept as input the attributes that characterize a breast cancer case and produce as output a label for the type of cancer: label 1 for benign or label 2 for malignant.
Machine learning models such as neural networks, Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Decision Tree, Naïve Bayes (NB), and logistic regression (LR) have all been used in the past to classify breast cancer. Accurate classification of breast cancer would translate to early detection, diagnosis, and treatment and, where possible, full eradication of the cancer.
1.1.1 Data Mining
Data mining can be seen as “mining knowledge in data”, that is, the extraction of information from a large or voluminous dataset (K.Srinivas, Rani, & A.Govrdhan, 2010). It is the most important aspect of machine learning (Kaymak, Helwan, & Uzun, 2017), and its salient focus is pattern recognition (Jothi, Rashid, & Husain, 2015). Data mining techniques can be applied to medical records to trace and foresee salient patterns in order to save lives, increase treatment accuracy, reduce the cost of treatment, and reduce human error (Manjusha, Sankaranarayanan, & Seena, 2015). Techniques such as abnormality detection, regression, clustering, summarization, and association rules employ data mining. Finding meaningful patterns in data mining involves several steps, namely:
i. Pre-processing: this involves cleaning, feature extraction, feature selection, and dimensionality reduction.
ii. Clustering: an unsupervised learning technique that groups sets of related data.
iii. Classification: a supervised machine learning technique in which a data set (the training data) is required to establish relationships between data items. Whenever test data are supplied, the system classifies them based on the learnt relationships. In this research work, we focus on classification.
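As an illustration of one pre-processing step, the sketch below applies feature scaling by standardization (z-score) to a small table of numeric features. The choice of z-score scaling, the function name `standardize`, and the sample values are assumptions made for illustration only; this chapter does not state which scaling method was used in the experiments.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero standard deviation; divide by 1.0 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical values for two features measured on very different scales.
samples = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
scaled = standardize(samples)

for row in scaled:
    print([round(value, 4) for value in row])
```

After scaling, both columns have the same range of values, so neither feature dominates a model merely because of its unit of measurement.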
1.1.2 Classification
Classification in data mining basically involves two processes. The first is model training, after which the model is applied to test data to determine the class labels of unknown test instances. The second is performance evaluation, which checks the accuracy of the classifier by comparing the predicted and actual class values for each tuple in the test dataset (Jouni, Issa, Harb, Jacquemod, & Leduc, 2016; Kaymak et al., 2017).
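The two processes can be sketched with a small logistic regression classifier written in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in this work: the one-feature toy data, the 0/1 label encoding (rather than the 1/2 labels mentioned earlier), the learning rate, and the number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            error = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Process 1: train on labelled data (hypothetical, already scaled feature).
X_train = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(X_train, y_train)

# Process 2: evaluate by comparing predicted and actual test labels.
X_test, y_test = [[-1.2], [1.3]], [0, 1]
predictions = [predict(weights, bias, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

Keeping the training data and the test data separate is what makes the second process meaningful: the classifier is scored only on instances it did not see while fitting.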
1.2 Problem statement
One of the problems of classification lies in choosing an appropriate method to fit the model, given the nature of the data. Which machine learning model performs best in the presence of dependency among the data features, unbalanced data, and sparsely valued data features remains an open research question.
1.3 Research Aim and objectives
The aim is to develop a prediction system for detecting breast cancer.
The main objectives are:
1. Study and apply logistic regression for the classification of breast cancer.
2. Compare logistic regression with other extant machine learning classification models on the same data set.
3. Analyse the performance of the models and draw conclusions.
1.4 Limitation of study
This paper is restricted to the study of logistic regression for the classification of breast cancer using the Wisconsin Breast Cancer Dataset (WBCD) from the UCI machine learning online repository. The performance of the model is measured using the precision score, recall score, and f1-score only.
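For reference, the three metrics can be computed from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function name `precision_recall_f1`, the choice of 1 as the positive class, and the example labels are hypothetical and are not results from this work.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and f1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical actual and predicted labels for eight test instances.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision is the fraction of predicted positives that are correct, recall is the fraction of actual positives that are found, and the f1-score is their harmonic mean.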
1.5 Paper organization
This paper is organized into five chapters. Chapter one introduces the motivation, aim, and objectives of the research. Chapter two reviews previous work related to this research. Chapter three describes the materials used and the methodology employed. Chapter four presents the performance analysis and discussion. Finally, chapter five gives a summary of the work, the conclusion, and recommendations for future work.