Analysis of recommender system algorithms in the Movielens dataset
Abstract
Recommender systems encompass a class of techniques and algorithms that suggest “relevant” items to users. Ideally, the suggested items are as relevant to the user as possible, so that the user engages with them: YouTube videos, news articles, online products and so on. Items are ranked by relevance, and the most relevant ones are shown to the user. Relevance is something the recommender system must determine, and it is estimated mainly from historical data. Recommender systems are generally divided into two main categories: collaborative filtering and content-based systems. This thesis focuses on collaborative filtering algorithms. The first part gives an introduction to recommendation engines, a classification of recommender system algorithms, and definitions of the evaluation metrics. Chapter 2 presents the Movielens dataset and the alterations made to the original data because of the limitations of the machine that ran the algorithms. Chapter 3 presents a variety of algorithms that generate item-to-item recommendations. These algorithms work solely on the interaction matrix and on transformations based on natural language preprocessing techniques. In the fourth and fifth parts, three algorithms that predict user-item ratings are built. The first two, alternating least squares (ALS) and singular value decomposition (SVD), are matrix factorization techniques; the third is a recommender system based on a neural network of latent features. The evolution of the errors over the training iterations is also presented. The analysis makes clear that, even though the dataset at hand contained very limited information, consisting only of user and movie IDs along with the given rating, a recommender system could be built that achieves astonishing results in terms of error.
The strength of latent feature representations was underlined: all algorithms that used them produced very promising results, achieving very low errors on their test sets. Neural network architectures based on embeddings showed the best results, with the lowest rating prediction error.
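The abstract does not describe the exact ALS formulation used in the thesis. As a rough illustration only, the sketch below shows one common way alternating least squares can factorize a small rating matrix into user and item latent features, recording the training RMSE after every iteration. The toy matrix, the function names (`als`, `rmse`), and the hyperparameters (`rank`, `reg`, `iters`) are assumptions made for this example and are not taken from the thesis or from the Movielens data.

```python
import numpy as np


def rmse(ratings, pred, mask):
    """Root mean squared error over the observed entries only."""
    diff = (ratings - pred)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def als(ratings, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Approximate ratings ~ U @ V.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user latent features
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item latent features
    ridge = reg * np.eye(rank)
    history = []
    for _ in range(iters):
        # Hold V fixed: each user row is a small ridge regression.
        for u in range(n_users):
            seen = mask[u]
            if seen.any():
                Vu = V[seen]
                U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ ratings[u, seen])
        # Hold U fixed: each item row is a small ridge regression.
        for i in range(n_items):
            seen = mask[:, i]
            if seen.any():
                Ui = U[seen]
                V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ ratings[seen, i])
        history.append(rmse(ratings, U @ V.T, mask))
    return U, V, history


# Hypothetical toy data: rows are users, columns are movies, 0 marks "not rated".
R = np.array([
    [5, 3, 0, 1, 4],
    [4, 0, 0, 1, 5],
    [1, 1, 0, 5, 2],
    [0, 1, 5, 4, 0],
], dtype=float)
M = R > 0

U, V, history = als(R, M)
print("training RMSE per iteration:", [round(h, 3) for h in history])
```

This sketch measures error only on the entries it was trained on. An evaluation like the one the abstract describes would hold out part of the ratings as a test set and report the error there.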