ABSTRACT

 

Due to the growing demand for Cloud Computing services, the need for and importance of Distributed Systems cannot be overemphasized. However, it is difficult to use the traditional Message Passing Interface (MPI) approach to implement synchronization and coordination, and to prevent deadlocks, in distributed systems. This difficulty is lessened by the use of Apache's Hadoop/MapReduce and Zookeeper to provide Fault Tolerance in a homogeneously distributed hardware/software environment.
In this thesis, a mathematical model for the availability of the JobTracker in Hadoop/MapReduce using Zookeeper's Leader Election Service is examined. Though the availability is less than what is expected of a k Fault Tolerant system for higher values of the hardware failure rate, this approach makes coordination and synchronization easy, reduces the effect of crash failures, and provides Fault Tolerance for distributed systems.
The availability model starts with a Markov state diagram for the general case of N Zookeeper servers, followed by the specific cases of 3, 4, and 5 servers. Both software and hardware faults are considered, in addition to the effect of hardware and software repair rates. Comparisons show that system availability changes with the number of Zookeeper servers, with 3 servers giving the highest availability.
The model presented in this study can be used to decide how many servers are optimal for maximum availability and from which vendor they should be purchased. It can also help determine when to use a Zookeeper-coordinated Hadoop cluster to perform critical tasks.

 

TABLE OF CONTENTS

Declaration
Acknowledgement
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Problem Statement
  1.2 Objectives
  1.3 Thesis Organization
2 Cloud Computing and Fault Tolerance
  2.1 Cloud Computing
  2.2 Types of Clouds
  2.3 Virtualization in the Cloud
    2.3.1 Advantages of Virtualization
  2.4 Fault, Error and Failure
    2.4.1 Fault Types
  2.5 Fault Tolerance
    2.5.1 Fault-Tolerance Properties
    2.5.2 K Fault Tolerant Systems
    2.5.3 Hardware Fault Tolerance
    2.5.4 Software Fault Tolerance
  2.6 Properties of a Fault Tolerant Cloud
    2.6.1 Availability
    2.6.2 Reliability
    2.6.3 Scalability
3 Hadoop/MapReduce Architecture
  3.1 Hadoop/MapReduce
  3.2 MapReduce
  3.3 Hadoop/MapReduce versus Other Systems
    3.3.1 Relational Database Management Systems (RDBMS)
    3.3.2 Grid Computing
    3.3.3 Volunteer Computing
  3.4 Features of MapReduce
    3.4.1 Automatic Parallelization and Distribution of Work
    3.4.2 Fault Tolerance in Hadoop/MapReduce
    3.4.3 Cost Efficiency
    3.4.4 Simplicity
  3.5 Limitations of Hadoop/MapReduce
  3.6 Apache's ZooKeeper
    3.6.1 ZooKeeper Data Model
    3.6.2 ZooKeeper Guarantees
    3.6.3 ZooKeeper Primitives
    3.6.4 ZooKeeper Fault Tolerance
  3.7 Related Work
4 Availability Model
  4.1 JobTracker Availability Model
    4.1.1 Related Work
  4.2 Model Assumptions
  4.3 Markov Model for a Multi-Host System
    4.3.1 The Parameter s(t)
  4.4 Markov Model for a Three-Host (N = 3) Hadoop/MapReduce Cluster Using Zookeeper as Coordinating Service
  4.5 Numerical Solution to the System of Differential Equations
    4.5.1 Interpretation of the Availability Plot of the JobTracker
  4.6 Discussion of Results
    4.6.1 Sensitivity Analysis
5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Appendix
  Appendix A: Differential Equations for Boundary Conditions
  Appendix B: Differential Equations for a Cluster of Four Servers (N = 4)
  Appendix C: Differential Equations for a Cluster of Five Servers (N = 5)
  Appendix D: MATLAB Solution to the (N = 3) System of Kolmogorov Differential Equations
  Appendix E: How to Set Up Hadoop/MapReduce on Ubuntu 11.04
    Single Node Hadoop Cluster
    Multi-Node (N = 3) Hadoop Cluster
    Submitting a Word Count Job to the Hadoop/MapReduce Cluster
  Appendix F: How to Install and Run Zookeeper on Ubuntu 11.04
    Deploying a Zookeeper Ensemble on a Single Machine
    Deploying a Zookeeper Ensemble across a Network
References

 

 

CHAPTER ONE

 

Introduction
The effectiveness of most modern information (data) processing depends on the ability to process huge datasets in parallel to meet stringent time constraints and organizational needs. A major challenge facing organizations today is the ability to organize and process the large volumes of data generated by their customers. According to Nielsen Online [1], there are more than 1,733,993,741 internet users. How much data these users generate, and how it is processed, largely determines the success of the organization concerned. Consider the social networking site Facebook: as of August 2011, it had over 750 million active users [2] who spend 700 billion minutes per month on the network. They install over 20 million applications every day and interact with 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) each month. Since April 2010, when social plugins were launched, an average of 10,000 new websites have integrated with Facebook. The amount of data generated at Facebook is estimated as follows [3]:

- 12 TB of compressed data added per day
- 800 TB of compressed data scanned per day
- 25,000 map-reduce jobs per day
- 65 million files in HDFS
- 30,000 simultaneous clients to the HDFS NameNode
It was a similar demand to process large datasets at Google that inspired Google's engineers to introduce MapReduce [4]. At Google, MapReduce is used to build the index for Google Search, to cluster articles for Google News, and to perform statistical machine translation. At Yahoo!, it is used to build the index for Yahoo! Search and for spam detection, and at Facebook, MapReduce is used for data mining, ad optimization, and spam detection [5]. MapReduce is designed to use commodity nodes (it runs on cheaper machines) that can fail at any time, and its performance does not degrade significantly due to network latency.

Figure 1.1: Data growth at Facebook [3]

MapReduce exhibits high fault tolerance and is easy to use by programmers who have no prior experience in parallel programming. Apache's Hadoop [6] is an open-source implementation of Google's MapReduce. It is made up of MapReduce and the Hadoop Distributed File System (HDFS). A client submits a job to the Master node. The Master node consists of the NameNode and JobTracker daemons, running on a single machine or on different machines depending on the size of the cluster. The JobTracker distributes the client's job to selected slave machines that run the TaskTracker and DataNode daemons. Each slave node must periodically send a heartbeat signal to the JobTracker machine. If the Master does not receive a heartbeat signal from a slave, it assumes the slave is down and consequently re-schedules the tasks assigned to the dead slave node to another idle node. Hadoop/MapReduce makes it possible to process huge datasets in good time compared to other platforms such as Database Management Systems; because improvements in drive seek time have been slow, applications that need to analyze a whole dataset in batch experience high latency in a DBMS.
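To make the programming model concrete, the sketch below shows the classic word-count job written against the Hadoop MapReduce Java API (the same kind of example that Appendix E submits to the cluster). The class name, job name, and input/output paths are illustrative placeholders, not part of the implementation described in this thesis.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in a line of input.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all the 1s emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text word, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) {
        sum += c.get();
      }
      context.write(word, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");     // newer Hadoop releases prefer Job.getInstance(conf, ...)
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not yet exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the JobTracker simply re-runs the map or reduce attempts of a failed TaskTracker on another node, map and reduce functions such as these should be deterministic and free of external side effects.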
1.1 Problem Statement
The availability of cloud computing services can be enhanced if proper Fault Tolerance mechanisms are implemented in the data centers. Cloud reliability problems can have serious consequences for both the provider and its customers when time is money. Every year, many cloud service providers battle with service outages (Table 1.1).

Table 1.1: Outages in different cloud services [7]

A major concern is how to minimize service down-time for a cloud provider such as Amazon or Facebook that has thousands of clients connected at any given point in time. Hadoop/MapReduce was developed to achieve maximum performance, high fault tolerance, availability, and transparency as far as possible. However, these objectives can be elusive if the following issues are left unattended:
1. Hadoop/MapReduce is currently implemented as a Master-Slave architecture; this makes both the Hadoop Distributed File System Master node (NameNode) and the MapReduce Master node (JobTracker) single points of failure. The failure of a Slave node (DataNode or TaskTracker) does not pose a serious challenge, since the Master node simply re-assigns the tasks that were to be processed by the failed node to another node. This implies that the failure of either the JobTracker or the NameNode makes the service unavailable until they are up and running again. However, Hadoop provides a standby NameNode implementation called the AvatarNode, which can be run as a Primary avatar or a Secondary avatar. During failover, for instance, the Primary AvatarNode on machine M1 is killed and the Standby AvatarNode on machine M2 is instructed to assume Primary avatar status. This is practically instantaneous, making the recovery a matter of a few seconds.
The failure of the JobTracker machine, on the other hand, makes the service unavailable until it is restarted. Between the time of failure and restart, clients must be made to wait, which is undesirable.
2. How available is the solution that is proposed for problem 1 above?
Our concern, then, is how to keep the cluster available when the active JobTracker goes down, and also to determine mathematically how much availability the JobTracker has. This is needed to avoid unnecessary down-time.
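As a point of reference for what "how much availability" means quantitatively, the steady-state availability of a single repairable component with an assumed constant failure rate λ and repair rate μ follows from a two-state Markov model. This is only a textbook baseline sketch; the actual N-server JobTracker model is developed in Chapter Four.

\frac{dP_{\mathrm{up}}(t)}{dt} = -\lambda\,P_{\mathrm{up}}(t) + \mu\,\bigl(1 - P_{\mathrm{up}}(t)\bigr), \qquad P_{\mathrm{up}}(0) = 1

A(t) = P_{\mathrm{up}}(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t},
\qquad
A(\infty) = \frac{\mu}{\lambda+\mu} = \frac{\mathrm{MTTF}}{\mathrm{MTTF}+\mathrm{MTTR}}

Here MTTF = 1/λ and MTTR = 1/μ; increasing the repair rate, or adding redundant servers as in the chapters that follow, is what pushes A(t) toward 1.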
1.2 Objectives
The major objective of this thesis is to determine the availability of a proposed automatic fail-over (recovery) mechanism that addresses the Hadoop/MapReduce JobTracker being a single point of failure. The implementation is based on the Leader Election framework described in Zookeeper [8] (a minimal sketch of the election pattern follows this list). That is:

- Providing an automatic failover mechanism for the JobTracker.
- Maintaining only one active JobTracker in the cluster.
- Letting only the active JobTracker serve JobClients and TaskTrackers.
- Facilitating redirection of JobClients and TaskTrackers to the new active JobTracker.
- Determining the Availability of the JobTracker.
- Determining how sensitive the Availability of the JobTracker is to changes in the model parameters.
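The sketch below illustrates the leader-election pattern on which such a failover mechanism can rest: each would-be JobTracker creates an ephemeral sequential znode, and the candidate owning the smallest sequence number acts as the active JobTracker. The connection string, the /election path, and the JobTrackerCandidate class are assumptions made for illustration; this is not the thesis's actual implementation.

import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative leader-election candidate (assumed class): each would-be JobTracker
// creates an ephemeral sequential znode under /election; the owner of the smallest
// sequence number is the leader, and every other candidate watches the znode that
// immediately precedes its own.
public class JobTrackerCandidate implements Watcher {

  private static final String ELECTION_PATH = "/election"; // assumed election root znode
  private final ZooKeeper zk;
  private String myNode; // e.g. /election/candidate_0000000003

  public JobTrackerCandidate(String connectString) throws Exception {
    // A 3-second session timeout is an arbitrary illustrative value.
    this.zk = new ZooKeeper(connectString, 3000, this);
  }

  public void volunteer() throws KeeperException, InterruptedException {
    // Ephemeral: the znode disappears if this candidate's session dies (crash failure).
    // Sequential: ZooKeeper appends a monotonically increasing counter to the name.
    myNode = zk.create(ELECTION_PATH + "/candidate_", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    checkLeadership();
  }

  private void checkLeadership() throws KeeperException, InterruptedException {
    List<String> children = zk.getChildren(ELECTION_PATH, false);
    Collections.sort(children);
    String myName = myNode.substring(myNode.lastIndexOf('/') + 1);
    int myIndex = children.indexOf(myName);
    if (myIndex == 0) {
      becomeActiveJobTracker(); // smallest sequence number -> leader
    } else {
      // Watch only the immediate predecessor to avoid a herd effect: when it goes
      // away, re-check whether this candidate is now the leader.
      String predecessor = ELECTION_PATH + "/" + children.get(myIndex - 1);
      if (zk.exists(predecessor, this) == null) {
        checkLeadership(); // predecessor vanished between calls; re-evaluate
      }
    }
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDeleted) {
      try {
        checkLeadership();
      } catch (Exception e) {
        // A real system would trigger reconnection/retry logic here.
        e.printStackTrace();
      }
    }
  }

  private void becomeActiveJobTracker() {
    // Placeholder: start serving JobClients and TaskTrackers as the active JobTracker.
    System.out.println(myNode + " is now the active JobTracker");
  }
}

A candidate would be constructed with the ensemble connection string (for example "zk1:2181,zk2:2181,zk3:2181") and volunteer() called at start-up; standby JobTrackers simply wait until their watch fires. Watching only the immediate predecessor, rather than the leader itself, avoids waking every standby when the active JobTracker dies.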
1.3 Thesis Organization
This thesis is organized as follows: Chapter One introduces the topic under discussion and defines the problem at stake; it also clarifies what this thesis aims to achieve. Chapter Two reviews the literature on Cloud Computing and Fault Tolerance, areas where availability is vital for high performance. Chapter Three introduces Hadoop/MapReduce and Zookeeper, which are used to implement the proposed cluster. Chapter Four presents a mathematical model of the cluster proposed in Chapter Three, aimed at determining how available the cluster is. The work is concluded in Chapter Five, where proposed future areas of interest are given.
