PROJECT TOPIC AND MATERIAL ON A CLUSTERING BASED WEB PREFETCHING IN HIGH TRAFFIC ENVIRONMENT

The File Details

  • Name: A Clustering Based Web Prefetching In High Traffic Environment
  • Type: PDF and MS Word (DOC)
  • Size: 912KB
  • Length: 68 Pages

 

ABSTRACT

The continued increase in demand for objects on the Internet causes high web traffic and, consequently, slow response times for users, which is one of the major bottlenecks in the network world. Increasing bandwidth is a possible solution to the problem, but it comes at increased economic cost. An alternative solution is web prefetching. Web prefetching is the process of predicting and fetching web pages in advance by a proxy server before a request is sent by a user. Prefetching is performed during the server idle time. Most literature based on the classical prefetch algorithm assumes that the server idle time is large enough to prefetch all of the users' predicted requests, which is not true in real-life situations. This research aims at improving web prefetching by developing a prefetching technique that remains effective in a high traffic environment, when the server idle time is very low. Log files were collected and preprocessed for several client groups within a domain. The preprocessed log files were used to create a web navigation graph, which shows the transitions from one web page to another. Support and confidence thresholds were used to remove web pages with values below the thresholds. Several clusters were formed within a particular client group. When the prefetch time is predicted to be too small for prefetching, all the clusters formed from the various domains are used to create a prioritized cluster based on the requests of several users. The model was evaluated based on hit rate, byte rate, precision, accuracy of prediction and usefulness of prediction. The results show that the proposed WebClustering algorithm performs better than the classical prefetch technique when the server idle time is small, and behaves the same as the classical algorithm once the server idle time becomes large enough to prefetch all users' predictions.

TABLE OF CONTENTS

TITLE
DECLARATION
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER ONE
INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Motivation
1.4 Aim and Objectives
1.5 Research Method
1.6 Organization of Dissertation
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Web Caching
2.3 Types of Web Cache
2.3.1 Client Side Cache
2.3.2 Proxy Server Cache
2.3.3 Origin Server Cache
2.4 Cache Replacement Policy
2.5 Proxy Caching
2.5.1 Forward Proxy Caching
2.5.2 Reverse Proxy Caching
2.5.3 Transparent Caching
2.6 Web Prefetching
2.6.1 Short Term Prefetching
2.6.2 Long Term Prefetching
2.7 Related Works
2.8 Literature Gap and Contribution of This Work
CHAPTER THREE
HIGH TRAFFIC WEB PREFETCHING MODEL
3.1 Introduction
3.2 The Prefetching Model Architecture
3.3 Preprocessing of Proxy Access Log Files
3.3.1 Data Collection
3.3.2 Data Cleaning
3.3.3 User & Session Identification
3.4 Clustering Of Preprocessed Log Files
3.4.1 Web Navigation Graph (WNG)
3.7 Inter Clustering
CHAPTER FOUR
IMPLEMENTATION AND RESULT
4.1 Introduction
4.2 Implementation Details
4.2.1 Programming Language
4.2.2 Squid Proxy Server
4.2.3 Dataset
4.3 System Specification
4.4 Implementation Result
4.4.1 System Model
4.5 Performance Evaluation Criteria
4.6 Discussion of Results
4.7.1 Accuracy of Prediction
4.7.2 Usefulness of Prediction
4.7.3 Precision
4.7.4 Hit Ratio
4.7.5 Byte Ratio
4.7 Summary of The Result
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary
5.2 Conclusion
5.3 Recommendations
REFERENCES

LIST OF FIGURES
Figure 2.1: Web Prefetching and Caching Environment
Figure 2.2: Block Diagram for Clustering, prefetching and caching mechanism
Figure 3.1: A web prefetching and caching environment
Figure 3.2: Sample Access log file
Figure 3.3: Cleaned access log file
Figure 3.4: Client navigation and the corresponding web navigation graph
Figure 3.5: Clustering of web objects in a client group
Figure 3.6: Prefetching in high traffic environment
Figure 3.7: Prioritized Prefetching
Figure 4.1: Graph of Accuracy of Prediction at different time interval
Figure 4.2: Graph for usefulness of prediction at different time interval
Figure 4.3: Graph of Precision at different time interval
Figure 4.4: Graph of Precision at Different time interval
Figure 4.5: Graph of Byte Ratio at different time interval

LIST OF TABLES
Table 4.1: Dataset Table
Table 4.3: Usefulness of Prediction
Table 4.4: Precision
Table 4.5: Hit Ratio
Table 4.6: Byte Ratio

ACRONYMS
WNG: Web Navigation Graph
BFS: Breadth First Search
LRU: Least Recently Used
LFU: Least Frequently Used
ISP: Internet Service Provider
HTTP: Hypertext Transfer Protocol
HTML: Hypertext Markup Language
DG: Dependency Graph
ART: Adaptive Resonance Theory
NASA: National Aeronautics and Space Administration
IPGDSF#: Intelligent Predictive Greedy Dual Size Frequency
ANN: Artificial Neural Network
PSO: Particle Swarm Optimization
XML: Extensible Markup Language
SVM: Support Vector Machine
PPM: Prediction by Partial Matching
URL: Uniform Resource Locator
FIFO: First In First Out
GIF: Graphics Interchange Format
JPEG: Joint Photographic Experts Group
CPU: Central Processing Unit
CLR: Common Language Runtime
HR: Hit Ratio

CHAPTER 1

INTRODUCTION

1.1 Background of Study
The web is a collection of text documents and other resources, linked by hyperlinks and Uniform Resource Locators (URLs), usually accessed by web browsers from web servers. The web started as a simple information sharing system and has now grown into a rich collection of dynamic and interactive services. The tremendous growth of the web has resulted in high demand for bandwidth and in delays in fetching user requests (Neha, 2013). Users sometimes experience unpredictable delays while retrieving web pages from the server. Increasing bandwidth is a possible solution to the problem, but it involves high economic cost. Web caching reduces the latency perceived by the user, reduces bandwidth utilization and reduces the load on the origin servers (Pallis, 2007). Latency refers to the time that elapses between the sending of a request and the sender's receipt of the requested information.
Many latency-tolerant techniques have been developed over the years to solve this problem without necessarily increasing the bandwidth. The most notable are caching and prefetching. Web prefetching fetches and caches users' requests during server idle time, which reduces the load on the origin server. To reduce the access delay experienced by users, it is advisable to predict and prefetch web objects based on user access patterns and to cache them. Studies on web prefetching are mostly based on the history of user access patterns. If the history information shows an access pattern in which URL A is followed by URL B with high probability, then B will be prefetched once A is accessed (Cheng-Zhong, 2000).
Web prefetching is the process of obtaining web pages in advance by a proxy server before a request is sent by a user. When a client makes a request for a web object, it may then be served from the cache rather than by sending the request to the web server. The main factor in selecting a web prefetching algorithm is its ability to predict the web objects to be prefetched in order to reduce latency. Web prefetching exploits the spatial locality of web pages, i.e. pages that are linked from the current page will be accessed with higher probability than other pages. Web prefetching can be applied in a web environment between clients and the web server, between proxy servers and the web server, and between clients and the proxy server (Greeshma, 2012).
Web prefetching techniques are categorized into probability-based techniques and clustering-based techniques that use weight functions. In probability-based prefetching, probabilities are calculated using the history of data access. This method assumes that the request sequence follows a pattern and calculates the probability of that pattern being followed. Clustering-based prefetching methods make decisions using information about the web pages that have been fetched previously, and assume that pages that are close to the previously fetched pages are more likely to be requested in the near future (Greeshma, 2012). Moreover, web prefetching is a research topic that has gained increasing attention in recent years. Web prefetching fetches some web objects before users actually request them, and thus helps to reduce the user-perceived latency. Many studies have shown that the combination of caching and prefetching doubles the performance compared to caching alone (Waleed, 2012).
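The "A followed by B" rule can be illustrated with a minimal sketch. It assumes that the access history is available as ordered lists of pages per session and that a page is prefetched when its estimated transition probability reaches a chosen threshold. The function names, the sample sessions and the 0.5 threshold are hypothetical and are not taken from the cited work.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from ordered page visits."""
    pair_counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive (current, following) pair in the session.
        for current, following in zip(session, session[1:]):
            pair_counts[current][following] += 1
    probs = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probs[current] = {page: n / total for page, n in followers.items()}
    return probs


def pages_to_prefetch(probs, current, threshold=0.5):
    """Return successors of `current` whose estimated probability meets the threshold."""
    followers = probs.get(current, {})
    return sorted(page for page, p in followers.items() if p >= threshold)


# Hypothetical access history: each inner list is one user session.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "C"],
    ["B", "C"],
]
probs = transition_probabilities(sessions)
print(pages_to_prefetch(probs, "A"))  # B follows A in 2 of 3 observed transitions
```

In this toy history, B is the only page that follows A often enough to be prefetched once A is accessed.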

Web caching is a well-known strategy for improving the performance of web-based systems by keeping web objects that are likely to be used in the near future in a location closer to the user. Web caching mechanisms are implemented at three levels: client level, proxy level and origin server level. Proxy servers play a key role between users and web sites in reducing the response time of user requests and saving network bandwidth. Therefore, to achieve better response time, an efficient caching approach should be built into the proxy server (Waleed, 2011). Due to the limitation of cache space, an intelligent mechanism is required to manage the web cache content efficiently. The classical caching policies are not efficient in web caching since they consider only one of recency, frequency or size, and ignore combinations of the factors that affect the efficiency of web caching. Unfortunately, the cache hit ratio is not improved much with classical caching schemes. Even with a cache of infinite size, the hit ratio is still limited regardless of the caching scheme. This is because most people browse and explore new web pages in search of new information. To improve the cache hit ratio, web prefetching is integrated with web caching to overcome these limitations.
Knowing the user’s browsing history provides extra information such as the type of user or his/her preferences. This information can help to improve prediction accuracy in the prefetching process (Lenka, 2010).
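To make the point about classical, recency-only policies concrete, the sketch below replays a request trace through a Least Recently Used (LRU) cache and reports two ratios. It assumes that hit ratio means hits divided by requests and that byte ratio means bytes served from the cache divided by bytes requested. These definitions, the function name and the sample trace are assumptions for illustration; the dissertation's own definitions are not given in this excerpt.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (url, size) requests through an LRU cache.

    Returns (hit ratio, byte hit ratio) under the assumed definitions.
    """
    cache = OrderedDict()  # url -> size, least recently used entry first
    used = 0
    hits = hit_bytes = total_bytes = 0
    for url, size in requests:
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[url] = size
        used += size
    n = len(requests)
    hit_ratio = hits / n if n else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_ratio, byte_ratio


# Hypothetical trace: three 40-byte objects competing for an 80-byte cache.
requests = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40), ("a", 40)]
hit_ratio, byte_ratio = simulate_lru(requests, capacity_bytes=80)
print(hit_ratio, byte_ratio)
```

Only one of the six requests is a hit here, because each new object evicts one that is requested again shortly afterwards. This is the kind of limitation that prefetching is meant to offset.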

1.2 Problem Statement
As the Internet continues to grow in size and popularity, web traffic and network bottlenecks have become major issues in the network world. The continued increase in demand for objects on the Internet causes severe traffic and leaves too little idle time to prefetch all the clusters generated from users' requests. Clustering-based prefetching has been explored in several ways, all of which assume that the server idle time is large enough to accommodate the prefetching. In a real scenario this is not always the case since, under high traffic, the idle time may not be long enough to accommodate the prefetching of large amounts of data. This work therefore seeks to address this lack of consideration of high traffic volume during prefetching.

1.3 Motivation
Internet users expect the web to be more friendly and meaningful with reduced network traffic. Every user needs a channel with high bandwidth and low traffic. In order to reduce the web server load and the access latency, and to relieve network bandwidth under heavy traffic, a web prefetching scheme that consumes little bandwidth during high traffic is considered.

1.4 Aim and Objectives
The aim of this research is to improve web prefetching by developing a prefetching technique that can be effective in a high traffic environment, when the server idle time is very low. The specific objectives are to:
a) predict user requests based on user history;
b) determine which pages will be requested by the majority of users in the near future;
c) prioritize the prefetching based on the frequency of the server idle time;
d) evaluate the algorithm with respect to an existing prefetching algorithm.
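Objective (c) can be illustrated with a small sketch of prioritized prefetching under a limited idle window. It assumes that each candidate page carries a count of how many users are predicted to request it, and that the idle time and link bandwidth together give a byte budget. The greedy ranking, the function name and the sample figures are hypothetical and are not the prioritization scheme of the dissertation, which is not detailed in this excerpt.

```python
def select_for_prefetch(candidates, idle_time, bandwidth):
    """Greedily choose pages to prefetch within the idle-time budget.

    candidates: list of (url, predicted_request_count, size_bytes)
    idle_time:  predicted server idle time in seconds
    bandwidth:  assumed transfer rate in bytes per second
    """
    budget = idle_time * bandwidth  # bytes transferable during the idle window
    # Most requested first; ties go to the smaller object, then to URL order.
    ranked = sorted(candidates, key=lambda c: (-c[1], c[2], c[0]))
    chosen = []
    for url, _count, size in ranked:
        if size <= budget:
            chosen.append(url)
            budget -= size
    return chosen


# Hypothetical candidates gathered from several client groups.
candidates = [
    ("/news", 9, 300),
    ("/sports", 7, 500),
    ("/video", 8, 900),
    ("/about", 2, 100),
]
print(select_for_prefetch(candidates, idle_time=1, bandwidth=1000))
print(select_for_prefetch(candidates, idle_time=10, bandwidth=1000))
```

With a one-second window the large `/video` object is skipped in favour of smaller pages that still fit. With a ten-second window every candidate is prefetched, which matches the behaviour expected of a classical scheme when idle time is plentiful.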

1.5 Research Method
In order to meet the objectives of this work, the following steps will be taken in the proposed inter clustering scheme:
a) Review of existing literature in the field of study.
b) Log files of user requests will be collected using the Squid proxy server. The log files will pass through stages of cleaning to remove irrelevant information, and user identification will then be performed on the page requests made by users during a visit to a particular site.
c) The preprocessed log file will be used to construct a weighted Web Navigation Graph (WNG). The nodes of the graph represent the web pages, while its edges represent the movement from one web page to another. The edges are assigned weights based on the frequency of visiting a page. Support and confidence thresholds will be applied to the WNG to eliminate pages with low support and confidence values.
d) The graph will be traversed using the Breadth First Search (BFS) algorithm to form several clusters within a domain.
e) In a high traffic environment, clusters will be formed in favour of the requested web objects by setting the support and confidence values to accommodate the requested web objects from several domains. An inter-domain cluster will be reconstructed from the several clusters.
f) C# will be used to implement the algorithm.
g) The proposed technique will be compared with that of Thulase et al. (2014) based on hit ratio, byte ratio, usefulness of prediction, accuracy of prediction and precision.
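Steps (c) and (d) can be sketched as follows. Python is used here for brevity and the code is not the C# implementation. The sketch assumes that support is the raw count of a transition and that confidence is that count divided by all transitions leaving the source page. It also assumes that a cluster is a group of pages connected by the surviving edges. These definitions, the function names and the sample sessions are assumptions for illustration; the excerpt does not define them.

```python
from collections import Counter, defaultdict, deque


def build_wng(sessions):
    """Weighted Web Navigation Graph: edge (u, v) -> number of u -> v transitions."""
    edges = Counter()
    for session in sessions:
        for u, v in zip(session, session[1:]):
            if u != v:  # ignore page reloads
                edges[(u, v)] += 1
    return edges


def prune(edges, min_support, min_confidence):
    """Keep edges whose support and confidence both meet the thresholds."""
    out_total = Counter()
    for (u, _v), w in edges.items():
        out_total[u] += w
    return {
        (u, v): w
        for (u, v), w in edges.items()
        if w >= min_support and w / out_total[u] >= min_confidence
    }


def bfs_clusters(edges):
    """Group pages connected by surviving edges using breadth-first search."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)  # edge direction is ignored when grouping
    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            page = queue.popleft()
            cluster.append(page)
            for nxt in sorted(neighbours[page] - seen):
                seen.add(nxt)
                queue.append(nxt)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical cleaned sessions from one client group.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/news"],
    ["/home", "/news", "/sports"],
    ["/shop", "/cart"],
    ["/shop", "/cart"],
    ["/home", "/shop"],
]
wng = build_wng(sessions)
clusters = bfs_clusters(prune(wng, min_support=2, min_confidence=0.5))
print(clusters)
```

The single `/home` to `/shop` transition falls below both thresholds, so the two groups of pages stay in separate clusters.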

1.6 Organization of Dissertation
The rest of the work is organized as follows: Chapter 2 is the literature review; the proposed web prefetching scheme is discussed in Chapter 3; Chapter 4 presents the results and analysis; and Chapter 5 summarizes, concludes and recommends future work.
