ABSTRACT

Cyber threats are on the rise, and individuals, industrial control systems (ICSs), critical infrastructures (CIs) and entire nations have been subjected to attacks that cause great losses. Among the cyber threats used in these attacks is the advanced persistent threat (APT), which typically uses highly sophisticated tools to attack targeted organizations or a nation’s critical infrastructure. The capabilities of big data can be leveraged for advanced analytics, gathering intelligence from potential security events and network activities to report and predict intrusions in a timely manner. This work proposes a big data approach in which a Hadoop ecosystem was integrated with a honeypot to collect massive amounts of data on network activities and attacker behaviour for forensics. A decision tree classification algorithm was used to build a predictive model for network intrusion detection. An accuracy of 92.46% was recorded, suggesting that the model can keep false positive alarm rates low.
Keywords: Cyber threat, Cyberattacks, Big data, Honeypot, Hadoop Ecosystem, Predictive model for network intrusion detection
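Accuracy and the false positive alarm rate are computed from different cells of a confusion matrix, so a given accuracy does not by itself fix the false positive rate. The minimal sketch below, for a binary attack-versus-normal classifier, makes the distinction concrete. The function name `binary_metrics` and the counts in the usage example are illustrative assumptions, not results from this work.

```python
def _ratio(num, den):
    # Avoid ZeroDivisionError when a class is absent from the test data.
    return num / den if den else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Metrics for an attack-vs-normal classifier ('positive' means attack).

    tp: attacks flagged as attacks    fp: normal traffic flagged as attacks
    tn: normal traffic passed         fn: attacks missed
    """
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    false_positive_rate = _ratio(fp, fp + tn)  # false alarms on normal traffic
    detection_rate = _ratio(tp, tp + fn)       # recall on attacks
    return accuracy, false_positive_rate, detection_rate


# Two hypothetical classifiers evaluated on 1,000 attacks and 1,000 normal connections.
print(binary_metrics(tp=900, fp=50, tn=950, fn=100))   # (0.925, 0.05, 0.9)
print(binary_metrics(tp=990, fp=140, tn=860, fn=10))   # (0.925, 0.14, 0.99)
```

Both hypothetical classifiers reach 92.5% accuracy, yet their false positive rates differ (5% versus 14%), which is why the false positive rate is usually reported alongside accuracy.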

 

 

TABLE OF CONTENTS

CERTIFICATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
LIST OF FIGURES
LIST OF TABLES
LIST OF APPENDICES
LIST OF ABBREVIATIONS AND ACRONYMS
CHAPTER ONE INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Research Objectives
1.4 Technologies for Implementation
1.5 Scope of the Work
1.6 Document Road Map
CHAPTER TWO LITERATURE REVIEW
2.1 Cyber Threats
2.1.1 Cyber Threat Terminologies
2.1.2 Cyber Threat Categories
2.2 The Stages/Phases of an Advanced Attack
2.2.1 Reconnaissance Phase
2.2.2 Weaponization Phase
2.2.3 Delivery Phase
2.2.4 Exploitation Phase
2.2.5 Installation Phase
2.2.6 Command and Control (C&C) Phase
2.2.7 Action on Objectives Phase
2.3 Emerging Data Techniques and Advanced Analytics
2.3.1 Big Data as Emerging Data Technique
2.3.2 Big Data Analytics (BDA)
2.3.3 Machine Learning (ML)
2.4 Combating Cyber Threats
2.4.1 Types of Cyber Defence
2.4.2 Honeypots/Honeynets
2.4.3 Intrusion Detection Systems
2.4.4 Classifications of Intrusion Detection Systems (IDS)
2.5 Review of State of the Art
CHAPTER THREE RESEARCH DESIGN AND IMPLEMENTATION
3.1 The Big Data Platform
3.1.1 The Apache Hadoop
3.2 Apache Spark
3.3 The Predictive Model for Intrusion Detection
3.3.1 Performance Metrics of the Trained Model
Table 3.1: Categories of the Predicted Data Points
Table 3.2: Trained Classifier Model Evaluation Metrics Definitions
3.4 Dataset for the Detection Algorithm
3.5 Big Data Architecture for the Security Framework
3.5.1 Set-up of Hadoop and Other Components
3.5.2 Experimental Set-up of the Honeypot: HoneyDrive
CHAPTER FOUR EXPERIMENTAL RESULTS AND EVALUATION
4.1 Findings from the Honeynet
4.2 Evaluation of the Intrusion Classification Model
Table 4.1: Sub-categories of Attacks in the Dataset
Table 4.2: Features Generated in the Decision Tree Model
Table 4.3: Brief Statistics of the Dataset Used for Training and Testing
Table 4.4: Summary of Evaluation Metrics
Table 4.5: Confusion Matrix for the Predicted Connection
CHAPTER FIVE LIMITATIONS, CONCLUSION AND FUTURE RESEARCH
5.1 Limitations
5.2 Summary and Conclusion
5.3 Future Research
APPENDICES
Appendix A: Screen Capture of the Configured Standalone Hadoop Ecosystem
Appendix B: Overview of the Installed Honeypot: HoneyDrive
Appendix C: KDD Cup 1999 Dataset Features
Appendix D: Learned Classification Tree Model
Appendix E: PySpark Code for the Analysis and Classifier Model
REFERENCES

 

CHAPTER ONE

INTRODUCTION
1.1 Background
In recent years, society has become dependent on computers and computer networks. We are more connected through ubiquitous technologies than ever before (Koutský, 2014; Mohsen et al., 2017; RSA, n.d.). Consequently, securing the systems and networks we depend on has become unavoidable and increasingly important for individual safety, economic security and national defence (Tech Georgia, 2016).
With digitization, most critical infrastructures (CIs), such as the financial sector, energy supply, government services and healthcare, depend on information technology networks for their daily operations, and society in turn depends on these infrastructures (Brasso, 2016). Wang & Alexander (2015) state that “critical infrastructure is a complex system of components that ensure transport, safety, health, communication, production, and any other activities necessary for a nation’s needs”.
Because of these dependencies, when critical infrastructures are disrupted, the resulting downtime affects the activities and the social and economic well-being of their users, and can affect a nation as a whole (European Commission, 2013; Grottke, Sun, Fricks & Trivedi, 2008). Adversaries deliberately target critical infrastructures, probing their vulnerabilities and infiltrating their control systems, and are prepared to invest considerable cost and time in gaining the expertise needed to accomplish their goals (Hosburgh, 2016; Virvilis, Serrano & Dandurand, 2014).
The terms cyberattack and cyber threat are sometimes used interchangeably. Cyber threats can damage, or gain unauthorised access to, computers, computer networks and information systems (Gloag, n.d.). According to Wang & Alexander (2015), “cyber threats include targeted attacks, malware, spam, system privilege abuse, classified information leakage, vulnerabilities exposed by poor maintenance, user indiscretions (unintentional information leaking), and web defacements (misinformation/discredit), etc.”
Statistics show an alarming increase in the number of attacks on these infrastructures (Mainone Cable, 2017; PandaLabs, 2016), and attack scenarios are becoming more varied. PandaLabs (2016) reported that about 18 million new types of malware were recorded in the third quarter of 2016. Between April 2016 and March 2017, the number of ransomware victims increased by 11% compared with the previous twelve months (April 2015 to March 2016): the number of users worldwide who fell victim rose from 2,315,931 to 2,581,026 (Kaspersky Lab, 2017).
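Reading the two Kaspersky Lab figures as the victim counts for the earlier and later twelve-month periods, the reported growth can be checked directly:

```latex
\frac{2{,}581{,}026 - 2{,}315{,}931}{2{,}315{,}931}
  = \frac{265{,}095}{2{,}315{,}931}
  \approx 0.1145
```

That is, roughly an 11% increase, consistent with the percentage quoted.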
New threats emerge every year, among them distributed denial of service (DDoS), advanced persistent threats (APTs), ransomware and social engineering attacks, while others, such as adware, phishing attacks and Trojans, have increased sharply (Boehmer, 2014; FBI, n.d.; Michael, 2017; US Government, n.d.). Indeed, Virvilis et al. (2014) observed that the sophistication, complexity and number of cyber threats and cyberattacks had increased steadily over the preceding years.
Ciaran Martin, then Chief Executive of the UK’s National Cyber Security Centre, warned that “cyber threats will continue to evolve, which is why the countries must work together at the pace to deliver hard outcomes and ground-breaking innovation to reduce the cyber threat to critical services and deter would-be attackers” (National Crime Agency (NCA), 2017).
This implies that there is no fixed way of mitigating some of these attacks, because they keep evolving in scale and sophistication (Ernst & Young, 2015; Javaid et al., 2016).
Steve Langan reported that in 2016 “cybercrime cost the global economy over $450 billion, over 2 billion personal records were stolen and in the U.S. alone over 100 million Americans had their medical records stolen” (Graham, 2017). According to one report, the US Government alone planned to invest over $19 billion in cyber defence and security in its 2017 fiscal year budget (Steve, 2016).
The emergence of big data platforms and machine learning techniques has advanced knowledge discovery and data science in ways that can be leveraged to tackle these cyber threats. “Big data analytics is defined as enabling organizations to discover previously unseen patterns and to develop actionable insights about their businesses and environments, including cyber defence. Cyber analytics applies big data tools and techniques to capture, process and refine network activity data, applies algorithms for near-real-time review of every network node and employs visualization tools to easily identify anomalous behaviour required for fast response or investigation” (Ponemon Institute, 2013).
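The idea of reviewing activity at every network node and surfacing anomalous behaviour can be illustrated with a simple batch sketch. It assumes network activity has already been reduced to hypothetical `(node, event_type)` records and flags nodes whose event volume exceeds the mean by more than `k` population standard deviations. The function name, the record layout and the default `k = 2.0` are illustrative assumptions; a production system would run such checks continuously over streaming data rather than over a fixed list.

```python
from collections import Counter
from statistics import mean, pstdev


def flag_anomalous_nodes(events, k=2.0):
    """Return nodes whose event count exceeds mean + k * std across all nodes.

    `events` is an iterable of (node, event_type) tuples, a hypothetical
    simplification of real network activity logs.
    """
    counts = Counter(node for node, _ in events)
    if len(counts) < 2:
        return []  # not enough nodes to establish a baseline
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(node for node, count in counts.items() if count > threshold)


# Nine hosts with 10 events each and one host with 100 events.
events = [(f"10.0.0.{i}", "conn") for i in range(1, 10) for _ in range(10)]
events += [("10.0.0.99", "conn")] * 100
print(flag_anomalous_nodes(events))  # ['10.0.0.99']
```

Here the mean is 19 events and the population standard deviation is 27, so the threshold is 73 and only the busy host is flagged.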
Against this background, this thesis examines cyber threats and proposes a novel approach to combating cyberattacks that leverages emerging data techniques and advanced analytics.
1.2 Problem Statement
Critical infrastructures are currently experiencing a high rate of cyberattacks using highly sophisticated techniques.
Considerable effort has gone into cyber defence, yet with the emergence of new threat landscapes, crafty attackers circumvent existing controls and traditional tools. Ernst & Young (2015) reported that security advancement in the industry has not kept pace with today’s diverse set of threat actors. This leads to the research question: how can security professionals combat these cyber threats by leveraging the capabilities of big data to analyse large data sets from disparate sources of potential security events?
This problem motivates the present research, which seeks to use emerging data techniques and advanced analytics to combat cyber threats, in particular the advanced persistent threat.
1.3 Research Objectives
The basic objective of this research work is to combat the cyber threats used in well-orchestrated cyberattacks. This will be achieved through the goals outlined below:
1. Review state-of-the-art research on the use of big data analytics and advanced prediction in investigating cyber threats for intrusion detection.
2. Use big data analytics to significantly enhance defenders’ detection capabilities, enabling them to detect APT activities that pass under the radar of traditional security solutions.
3. Leverage big data methods and explore new detection algorithms capable of processing large amounts of data from diverse data sources.
4. Generate a predictive analytical model for the prediction of cyberattacks.
1.4 Technologies for Implementation
The following technologies and platforms were used to implement this work:
1. Honeypot: used for active defence by deception, detection and network forensics.
2. Hadoop: used for dynamic data collection, consolidation and correlation of data from any number of diverse data sources, such as network traffic and event data (e.g., from network devices and IDSs).
3. Predictive analytics tools: machine learning algorithms for the classification and prediction of network attacks.
4. Apache Spark: used to analyse streaming data for the classification (prediction) of intrusions.
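Hadoop’s role above, consolidating and correlating data from diverse sources, rests on the MapReduce pattern. The sketch below imitates a reduce-side join in plain Python rather than on a Hadoop cluster: a map phase tags each record with its source, a shuffle groups the tagged values by source IP, and a reduce phase keeps only IPs seen by both the honeypot and the IDS. The record formats, the field names `src_ip`, `port` and `signature`, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_honeypot(record):
    # Emit (key, tagged value) pairs keyed on the connecting IP address.
    yield record["src_ip"], ("honeypot", record["port"])


def map_ids(record):
    yield record["src_ip"], ("ids", record["signature"])


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # Keep only IPs reported by both sources and summarise what each saw.
    ports = sorted({v for src, v in values if src == "honeypot"})
    signatures = sorted({v for src, v in values if src == "ids"})
    if ports and signatures:
        return key, {"honeypot_ports": ports, "ids_signatures": signatures}
    return None


def correlate(honeypot_records, ids_records):
    pairs = chain(
        chain.from_iterable(map_honeypot(r) for r in honeypot_records),
        chain.from_iterable(map_ids(r) for r in ids_records),
    )
    results = {}
    for key, values in shuffle(pairs).items():
        joined = reduce_join(key, values)
        if joined is not None:
            results[joined[0]] = joined[1]
    return results


honeypot = [{"src_ip": "203.0.113.5", "port": 22},
            {"src_ip": "203.0.113.5", "port": 23},
            {"src_ip": "198.51.100.7", "port": 80}]
ids = [{"src_ip": "203.0.113.5", "signature": "SSH brute force"},
       {"src_ip": "192.0.2.1", "signature": "port scan"}]
print(correlate(honeypot, ids))
```

Keying every record on the source IP is what lets events from unrelated tools meet in the same reducer; on a real cluster the shuffle is performed by the framework across machines.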
1.5 Scope of the Work
The scope of the research work includes the following:
1. To integrate a honeypot system into Hadoop to capture data for advanced analytics, threat intelligence and forensics.
2. To develop an intrusion detection system that will predict and classify network intrusions and attacks with emphasis on the advanced persistent threat.
3. To explore existing technologies and tools, and to use the KDD Cup 99 dataset to train a supervised machine learning model.
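In the raw KDD Cup 99 files, each connection record is a comma-separated line of 41 features followed by a label that ends in a full stop (for example `normal.` or `smurf.`), and the 22 attack types of the 10% training subset are conventionally grouped into four categories: DoS, Probe, R2L and U2R. The sketch below, which is illustrative rather than the code used in this work, parses one record and maps its label to a category. The function and constant names are hypothetical, and labels outside the mapping (such as attack types that appear only in the test data) are returned as `unknown`.

```python
# Conventional grouping of the KDD Cup 99 training-set attack labels.
CATEGORIES = {
    "dos": {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "probe": {"ipsweep", "nmap", "portsweep", "satan"},
    "r2l": {"ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"},
    "u2r": {"buffer_overflow", "loadmodule", "perl", "rootkit"},
}
LABEL_TO_CATEGORY = {label: cat for cat, labels in CATEGORIES.items() for label in labels}

# Indices of the symbolic features: protocol_type, service, flag.
SYMBOLIC = {1, 2, 3}


def parse_record(line):
    """Split a raw record into (features, label, category)."""
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    # Keep symbolic features as strings and convert the rest to floats.
    features = [f if i in SYMBOLIC else float(f) for i, f in enumerate(fields[:41])]
    label = fields[41].rstrip(".")
    category = "normal" if label == "normal" else LABEL_TO_CATEGORY.get(label, "unknown")
    return features, label, category


sample = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,"
          "0.00,0.00,0.00,0.00,0.00,normal.")
features, label, category = parse_record(sample)
print(features[:6], label, category)
```

For a binary intrusion detector, every category other than `normal` can then be collapsed into a single attack class before training.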
1.6 Document Road Map
This master’s thesis is organised into five chapters. Chapter One introduces the thesis. Chapter Two reviews the related literature, detailing the underlying concepts of the research as well as the state-of-the-art advances made by different authors. Chapter Three presents the proposed methodology for the implementation, while Chapter Four covers the implementation, experimentation and analysis of the system’s results. Finally, Chapter Five presents the limitations, conclusions and a discussion of future work.
