Project File Details



File Type: MS Word (DOC) & PDF

File Size: 1,374 KB

Number of Pages: 68

 

ABSTRACT

 

In recent times, the rapid growth of information available on the internet has resulted in large amounts of data and an increase in online users. Recommendation systems have been employed to help users make informed and accurate decisions amid this vast abundance of information. In this research, we propose a hybrid recommender engine that combines Content-based and Collaborative filtering recommendations, and we explore how prediction accuracy can be enhanced in existing collaborative filtering frameworks.
We investigate whether a recommendation system combining Content-based and Collaborative filtering, built with the Mahout framework on Hadoop, improves recommendation accuracy and alleviates the scalability issues currently experienced when processing large volumes of data to recommend items to users.
We employed the feature augmentation hybrid technique, in which the output of the Content-based recommendation is used as an input to Collaborative filtering. The well-known MovieLens data was matched with the Internet Movie Database (IMDB) in order to extract user and item content features. The input files generated from the integration of both databases were converted to text files, which serve as input to the Collaborative filtering framework in Mahout.
Through a series of experiments, we determined the best parameter settings of the Mahout components for our model. We then evaluated these models by comparing the Root Mean Square Error (RMSE) of our model against that of the state-of-the-art model.
The proposed model showed a significant improvement over the pure collaborative model. Our analysis demonstrated that the extracted user and item content features can, in some cases, lead to better prediction accuracy. More precisely, we found that the user feature gender has no marginal impact on our underlying model, while an item feature such as country is more beneficial than genre, contrary to the findings of some other research.

 

TABLE OF CONTENTS

CERTIFICATION
ABSTRACT
ACKNOWLEDGEMENT
DEDICATION
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF THE STUDY
1.2 PROBLEM STATEMENT
1.3 AIM AND OBJECTIVES
1.4 SIGNIFICANCE OF THE STUDY
1.6 SYNOPSIS
CHAPTER TWO
LITERATURE REVIEW
2.1 INFORMATION RETRIEVAL AND FILTERING
2.2 RECOMMENDER SYSTEM TYPES AND TECHNIQUES
2.2.1 ENTITIES IN RECOMMENDATION SYSTEMS
2.2.2 COLLABORATIVE FILTERING (CF)
2.2.3 CONTENT-BASED RECOMMENDATION (CBR)
2.2.3.1 THE STRENGTH AND WEAKNESS OF CONTENT-BASED RECOMMENDATION
2.2.4 HYBRID RECOMMENDATION AND APPROACH
2.2.4.1 POSSIBLE COMBINATION OF HYBRID RECOMMENDATION
2.3 APACHE MAHOUT
2.3.1 DEVELOPMENT OF A SIMPLE RECOMMENDER USING MAHOUT LIBRARY
2.4 HADOOP
2.5 RELATED WORK
CHAPTER THREE
RESEARCH METHODOLOGY
3.1 INTRODUCTION
3.2 METHODOLOGY
3.3 CONTENT BASED RECOMMENDATION
3.4 COLLABORATIVE FILTERING USING MAHOUT
3.5 RECAP
CHAPTER FOUR
IMPLEMENTATION, RESULTS, PRESENTATION AND DISCUSSION
4.1 OVERVIEW OF THE IMPLEMENTATION APPROACH
4.2 EXTRACTION OF IMDB DATA
4.2.1 SOFTWARE TOOLS
4.2.1.1 SQLObject
4.2.1.2 PSYCOPG
4.2.1.3 POSTGRESQL
4.3 EXTRACTION OF MOVIELENS DATA
4.3.1 MOVIELENS RATING INFORMATION
4.3.2 MOVIELENS ITEM INFORMATION
4.3.3 EXTRACTING MOVIELENS USER FEATURES
4.4 ITEM FEATURES EXTRACTION AND COMBINATION
4.5 IMPLEMENTATION OF RECOMMENDER ENGINE BY APACHE MAHOUT
4.5.1 CLOUDERA
4.5.2 APACHE MAVEN
4.6 MAHOUT RECOMMENDER COMPONENTS – PARAMETERS OPTIMIZATION
4.6.1 DATASET
4.6.2 SIMILARITY METRICS AND NEIGHBORHOOD CRITERIA
4.7 SYSTEM EVALUATION
4.7.1 PERFORMANCE MEASURE
4.7.2 USER CONTENT FEATURES
4.7.3 ITEM CONTENT FEATURES
4.7.4 COMPARING USER/ITEM CONTENT FEATURES
CHAPTER FIVE
SUMMARY AND CONCLUSIONS
5.1 SUMMARY
5.2 CONCLUSION
5.3 RECOMMENDATION AND FUTURE WORKS
REFERENCES
APPENDIX A: SOURCE CODE SNIPPET
APPENDIX B: RECOMMENDER ENGINE – JAVA PROGRAM
APPENDIX C: EXPERIMENTAL RESULT

 

 

CHAPTER ONE

INTRODUCTION
1.1 BACKGROUND OF THE STUDY
The rate at which information is growing on the internet has resulted in large amounts of data and an increase in online users. This explosion of data floods users with large volumes of information and hence poses a great challenge in terms of information overload. As a result, it has become very difficult for people to process such information manually and to find the right information, and making informed and accurate decisions amid this sheer abundance of information often creates immense confusion. Large internet companies such as Amazon, Google and Facebook have faced difficulty in managing this explosion of information, and recommendation systems have been employed to address the problem in a smart way. Figure 1.1 shows how recommender engines have stepped in to rescue users from such confusion.
The vast increase in online data and users led to the rise of big data, and within the big data world the recommendation system has received considerable attention. Big data has improved the capacity to make recommendations on a large scale, and it has made recommendation systems more important to users because they predict the right piece of information out of vast amounts of information. A recommendation system is a particular form of information filtering that exploits a user's past behavior, or the behavior of similar users, to generate a list of information items personally tailored to that end user's preferences.
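To make the idea of exploiting "the behavior of similar users" concrete, the following is a minimal user-based collaborative filtering sketch in plain Java. It assumes ratings held in memory as a map of user ID to (item ID to rating), Pearson correlation over co-rated items as the similarity, and a mean-centred weighted average over the k most similar, positively correlated users. The class and method names are hypothetical; this is neither the program developed in this work nor a reproduction of Mahout, whose Taste library provides comparable components such as PearsonCorrelationSimilarity, NearestNUserNeighborhood and GenericUserBasedRecommender.

```java
import java.util.*;

/** Hypothetical in-memory user-based CF sketch; not the thesis code and not Mahout's implementation. */
final class UserBasedCf {
    private final Map<Long, Map<Long, Double>> ratings; // userId -> (itemId -> rating)

    UserBasedCf(Map<Long, Map<Long, Double>> ratings) {
        this.ratings = ratings;
    }

    /** Mean of all ratings given by a user (0 if the user is unknown). */
    double mean(long user) {
        return ratings.getOrDefault(user, Map.of()).values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Pearson correlation over the items both users rated; 0 when it is undefined. */
    double pearson(long u, long v) {
        Map<Long, Double> ru = ratings.getOrDefault(u, Map.of());
        Map<Long, Double> rv = ratings.getOrDefault(v, Map.of());
        List<Long> common = new ArrayList<>();
        for (Long item : ru.keySet()) {
            if (rv.containsKey(item)) common.add(item);
        }
        if (common.size() < 2) return 0.0; // too little overlap to correlate
        double mu = 0, mv = 0;
        for (Long i : common) { mu += ru.get(i); mv += rv.get(i); }
        mu /= common.size();
        mv /= common.size();
        double num = 0, du = 0, dv = 0;
        for (Long i : common) {
            double a = ru.get(i) - mu, b = rv.get(i) - mv;
            num += a * b;
            du += a * a;
            dv += b * b;
        }
        return (du == 0 || dv == 0) ? 0.0 : num / Math.sqrt(du * dv);
    }

    /** Predicts a rating from the k most similar users who rated the item. */
    double predict(long user, long item, int k) {
        List<Map.Entry<Long, Double>> neighbours = new ArrayList<>();
        for (Long v : ratings.keySet()) {
            if (v == user || !ratings.get(v).containsKey(item)) continue;
            double s = pearson(user, v);
            if (s > 0) neighbours.add(Map.entry(v, s)); // keep positively correlated users only
        }
        neighbours.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        double num = 0, den = 0;
        for (Map.Entry<Long, Double> n : neighbours.subList(0, Math.min(k, neighbours.size()))) {
            // Neighbour's deviation from their own mean, weighted by similarity.
            num += n.getValue() * (ratings.get(n.getKey()).get(item) - mean(n.getKey()));
            den += n.getValue();
        }
        return den == 0 ? mean(user) : mean(user) + num / den;
    }
}
```

With toy ratings user 1 = {10: 5, 20: 3, 30: 4}, user 2 = {10: 4, 20: 2, 30: 3, 40: 4} and user 3 = {10: 1, 20: 5, 40: 2}, users 1 and 2 are perfectly correlated, user 3 is negatively correlated and ignored, and predict(1, 40, 2) returns 4 + (4 - 3.25) = 4.75.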
At present, Recommendation Systems (RSs) are broadly used in e-commerce for information filtering, delivering personalized information by predicting users' preferences for particular items [1]. RSs attempt to suggest items (movies, music, books, news, web pages, etc.) that are most likely to interest the users. Amazon, Netflix and similar portals use RSs extensively to suggest content to their users. RSs aim to alleviate information overload by presenting the most attractive and relevant content, and they have become a basic requirement of every e-commerce portal.
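Suggesting the items "most likely to interest the users" usually ends with a top-N step over predicted scores. The sketch below assumes the predicted scores are already available (for example from a collaborative or hybrid predictor) and keeps the N best items the user has not yet rated, using a bounded min-heap. The class name TopN is hypothetical.

```java
import java.util.*;

/** Hypothetical top-N selection over already-computed predicted scores. */
final class TopN {
    /** Returns up to n unrated item IDs, highest predicted score first. */
    static List<Long> recommend(Map<Long, Double> predicted, Set<Long> alreadyRated, int n) {
        // Min-heap ordered by score: its head is the weakest of the current best n.
        PriorityQueue<Map.Entry<Long, Double>> heap =
                new PriorityQueue<>(Map.Entry.<Long, Double>comparingByValue());
        for (Map.Entry<Long, Double> e : predicted.entrySet()) {
            if (alreadyRated.contains(e.getKey())) continue; // never re-recommend rated items
            heap.offer(e);
            if (heap.size() > n) heap.poll(); // drop the current weakest candidate
        }
        List<Long> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result); // heap yields ascending scores; reverse to descending
        return result;
    }
}
```

The heap keeps memory proportional to N rather than to the catalogue size, which matters when every movie in the catalogue receives a score.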
Figure 1.1: The relevance of a Recommendation Engine to Users
1.2 PROBLEM STATEMENT
Recently, a number of machine learning techniques and hybrid filtering techniques have been implemented to achieve quality recommendations and to handle the problems of pure Collaborative Filtering (CF). Sparsity, cold start, scalability, neighbor transitivity and accuracy are the main problems of CF [1]. To handle these problems, other recommendation techniques such as Content-based filtering [1], [5] and Knowledge-based filtering [1], [4] have been combined with CF in hybrid algorithms.
In this work, we introduce a novel hybrid system that combines Content-based filtering and Collaborative techniques. We investigate whether combining content features, obtained by matching MovieLens data with the Internet Movie Database (IMDB), with Collaborative filtering based on the Mahout framework built on top of Hadoop will solve the accuracy and scalability issues currently experienced in processing large volumes of data for recommending items to users, and we propose an effective model that improves recommendation accuracy.
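One common way to feed content features into collaborative filtering, in the spirit of content-boosted collaborative filtering (a feature augmentation hybrid), is to let a content-based predictor fill in pseudo-ratings that the collaborative stage then consumes. The sketch below illustrates that idea under stated assumptions and is not the procedure used in this work: item features are plain string sets (for example genre or country tags), content similarity is Jaccard overlap, and the output uses the comma-separated userID,itemID,rating line format that Mahout's FileDataModel reads. All class and method names are hypothetical.

```java
import java.util.*;

/** Hypothetical feature-augmentation sketch: content-based pseudo-ratings feed the CF stage. */
final class FeatureAugmentation {
    /** Jaccard similarity between two items' content-feature sets (e.g. genre, country). */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    /** Content-based prediction: similarity-weighted average of the user's own ratings. */
    static OptionalDouble contentPredict(Map<Long, Double> userRatings, long target,
                                         Map<Long, Set<String>> features) {
        Set<String> targetFeatures = features.getOrDefault(target, Set.of());
        double num = 0, den = 0;
        for (Map.Entry<Long, Double> e : userRatings.entrySet()) {
            double s = jaccard(targetFeatures, features.getOrDefault(e.getKey(), Set.of()));
            num += s * e.getValue();
            den += s;
        }
        return den == 0 ? OptionalDouble.empty() : OptionalDouble.of(num / den);
    }

    /** Emits "userID,itemID,rating" lines: observed ratings plus content-based pseudo-ratings. */
    static List<String> augment(Map<Long, Map<Long, Double>> ratings,
                                Map<Long, Set<String>> features) {
        SortedSet<Long> items = new TreeSet<>(features.keySet());
        ratings.values().forEach(r -> items.addAll(r.keySet()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Double>> u : new TreeMap<>(ratings).entrySet()) {
            for (long item : items) {
                Double actual = u.getValue().get(item);
                if (actual != null) {
                    lines.add(u.getKey() + "," + item + "," + actual); // observed rating
                } else {
                    OptionalDouble p = contentPredict(u.getValue(), item, features);
                    if (p.isPresent()) {
                        lines.add(u.getKey() + "," + item + "," + p.getAsDouble()); // pseudo-rating
                    }
                }
            }
        }
        return lines;
    }
}
```

For user 7, who rated item 1 {Drama, USA} with 4.0 and item 3 {Comedy, USA} with 2.0, the unrated item 2 {Drama, UK} overlaps only with item 1, so it receives a pseudo-rating of 4.0 and the output is 7,1,4.0, 7,2,4.0 and 7,3,2.0. Here pseudo-ratings are written exactly like observed ones; weighting them differently is a common refinement.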
1.3 AIM AND OBJECTIVES
The aim of the project is to develop a hybridized recommendation system on movie data using Collaborative and Content-based filtering techniques on top of a Hadoop [9] platform, using Apache Mahout [10] and the MovieLens dataset [11], in order to assess its performance in terms of scalability and speedup and to alleviate the data sparsity and cold start problems associated with pure CF.
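The aim names speedup as a performance criterion without defining it. Assuming the conventional definition, speedup compares the wall-clock time $T_1$ of a job on a single node with its time $T_p$ on $p$ nodes of the Hadoop cluster:

```latex
S_p = \frac{T_1}{T_p}
```

Ideal linear scaling would give $S_p = p$; in practice, job start-up and data-shuffling overheads typically keep measured speedup below that.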
Objectives:
The following steps have been outlined to achieve this aim:
• To study the different ways of combining Collaborative filtering and Content-based methods into a hybrid recommender system.
• To determine the most effective hybrid system by incorporating content-based characteristics into a collaborative approach implemented on Apache Mahout.
• To implement the system on top of Hadoop to address scalability issues.
• To determine the effect of adjusting different Mahout component parameters on our hybridized model.
• To evaluate the performance of the developed hybrid recommendation engine against existing models. Our novel approach will establish the influence of different content features on recommendation accuracy.
• To use the well-known MovieLens datasets [11].
• To extract movie content features from the Internet Movie Database (IMDB), matching user ratings from the MovieLens dataset with movie features from IMDB in order to find appropriate item features.
• To show that the extracted movie content features have a positive impact on the prediction accuracy of our hybrid recommendation system.
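Matching MovieLens ratings with IMDB features requires joining the two sources on some movie key. The text does not say how the match was made; the sketch below assumes a normalized title-plus-year key, since MovieLens item titles carry the release year in parentheses (for example "Toy Story (1995)") and sometimes move a leading article to the end ("Usual Suspects, The (1995)"). The IMDB side is assumed to be pre-loaded into a map from the same key to a feature set. TitleMatcher and its methods are hypothetical.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical title+year matcher between MovieLens titles and an IMDB feature table. */
final class TitleMatcher {
    // "Title (1995)": group 1 = title, group 2 = four-digit year.
    private static final Pattern TITLE_YEAR = Pattern.compile("^(.*?)\\s*\\((\\d{4})\\)\\s*$");
    // "Usual Suspects, The": group 1 = main title, group 2 = trailing article.
    private static final Pattern TRAILING_ARTICLE = Pattern.compile("^(.*), (The|A|An)$");

    /** Builds a normalized "title|year" key, or empty if the title carries no year. */
    static Optional<String> key(String rawTitle) {
        Matcher m = TITLE_YEAR.matcher(rawTitle.trim());
        if (!m.matches()) return Optional.empty();
        String title = m.group(1);
        Matcher a = TRAILING_ARTICLE.matcher(title);
        if (a.matches()) title = a.group(2) + " " + a.group(1); // move the article back to the front
        String normalized = title.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", "")  // drop punctuation
                .replaceAll("\\s+", " ")
                .trim();
        return Optional.of(normalized + "|" + m.group(2));
    }

    /** Attaches IMDB features to each MovieLens item whose key matches; unmatched items are skipped. */
    static Map<Integer, Set<String>> match(Map<Integer, String> movieLensTitles,
                                           Map<String, Set<String>> imdbFeaturesByKey) {
        Map<Integer, Set<String>> matched = new TreeMap<>();
        for (Map.Entry<Integer, String> e : movieLensTitles.entrySet()) {
            key(e.getValue()).map(imdbFeaturesByKey::get)
                    .ifPresent(f -> matched.put(e.getKey(), f));
        }
        return matched;
    }
}
```

Titles that do not normalize to the same key (alternative titles, differing years) are silently dropped here, so counting unmatched items is a sensible check on match coverage.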
1.4 SIGNIFICANCE OF THE STUDY
Collaborative filtering (CF) has been the most promising and widely used recommendation technique compared with the other recommendation techniques developed recently [2], [3]. Although CF has achieved success in many application settings, the approach still has significant limitations, notably in handling data sparsity, cold start and scalability [4]. Data sparsity, the situation in which users in general rate only a limited number of items, reduces the appropriateness and relevance of CF. A further limitation arises when data is inadequate for new users and new items (the cold start problem), and another is the inability of CF to handle the exponential growth of both users and items in the database (the scalability problem). This research seeks to improve the prediction accuracy of the existing collaborative filtering framework by incorporating Content-based features.
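Data sparsity can be quantified as the fraction of empty cells in the user-item rating matrix, 1 - |R| / (|U| × |I|), where |R| is the number of ratings, |U| the number of users and |I| the number of items. The helper below is a hypothetical illustration of that formula. The example figures are those of the public MovieLens 100K release (943 users, 1,682 movies, 100,000 ratings), chosen only because they are well documented; the text does not state which MovieLens release this study uses.

```java
/** Hypothetical helper: sparsity of a user-item rating matrix (fraction of unrated cells). */
final class Sparsity {
    static double sparsity(long users, long items, long ratings) {
        if (users <= 0 || items <= 0) throw new IllegalArgumentException("empty matrix");
        // Use double arithmetic so users * items cannot overflow for large catalogues.
        return 1.0 - (double) ratings / ((double) users * items);
    }
}
```

For MovieLens 100K, sparsity(943, 1682, 100000) is about 0.937, so roughly 93.7% of all user-movie pairs carry no rating, which is why two users often share few co-rated movies.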
It is expected that at the end of the study, we would have:
• Developed a hybridized recommender engine based on Content-based and Collaborative algorithms, using Mahout on Hadoop to achieve scalability.
• Developed an effective hybrid recommendation engine with improved accuracy and efficiency.
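Accuracy in this work is compared through the Root Mean Square Error, RMSE = sqrt((1/n) Σ (p_i - r_i)²), where p_i is a predicted rating and r_i the corresponding held-out actual rating. Mahout includes an RMSRecommenderEvaluator for this purpose; the plain-Java function below, with a hypothetical class name, only illustrates the formula itself.

```java
/** Hypothetical helper: RMSE between predicted and actual ratings on a held-out test set. */
final class Rmse {
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d; // squared error penalises large misses more than small ones
        }
        return Math.sqrt(sum / predicted.length);
    }
}
```

For predictions {4, 3, 5} against actual ratings {5, 3, 3}, the squared errors are 1, 0 and 4, giving RMSE = sqrt(5/3) ≈ 1.29; lower values mean more accurate predictions.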
1.6 SYNOPSIS
The rest of this thesis is organized as follows. Chapter two reviews existing work on recommendation systems, Collaborative filtering, Content-based recommendation, hybrid recommendation, the different ways to combine Collaborative and Content-based filtering, big data implementation (Apache Mahout and Hadoop) and other related research areas considered important to this study. Chapter three presents the methodology of the proposed system: matching MovieLens data with IMDB to extract movie content features, and implementing a Java application based on the Mahout recommendation framework running on top of Hadoop for scalability.
Chapter four discusses the implementation of the system and evaluates the obtained results against existing models. Chapter five concludes with a summary of the work and proposes future areas of research in hybrid recommendation systems.

 
