Online Book Recommendation System Project 2024

Online Book Recommendation System Project: Books2Rec is a book recommendation system that started as a project for a Big Data Science class at NYU. Using your Goodreads profile, Books2Rec applies machine learning techniques to give you highly personalized book recommendations. Don’t have a Goodreads profile? We’ve got you covered: just search for your favorite book.

Recommendation systems are at the forefront of how content-serving sites such as Facebook, Amazon, and Spotify interact with their users. It is said that 35% of Amazon.com’s revenue comes from its recommendation engine.

As a trio of book lovers, we checked out Goodreads, the world’s largest site for readers and book recommendations. It is owned by Amazon, which itself has a large recommendation engine. However, we found that its recommendations left a lot to be desired.

Here’s an example of Goodreads recommending a book about a harrowing journey to the western US frontier based on my high score for the sequel to Charlie and the Chocolate Factory. I think we can do better.

Below, we use a hybrid recommendation system, which combines user ratings with content features, to provide recommendations for Goodreads users.

An example of our recommendations based on the book’s pure metadata attributes. Notice how it applies to all of the author’s other books anyway

We use a hybrid recommendation system to power our recommendations. Hybrid systems combine two other types of recommendation systems: content-based filtering and collaborative filtering. Content-based filtering recommends items based on the similarity of their attributes. That is, if I like the first Lord of the Rings book, and the second book is similar to the first, the system can recommend the second book. Collaborative filtering uses user ratings to determine similarities between users or between items. If a large proportion of users rate the first and the second Lord of the Rings books similarly, the two books are considered similar.

Our hybrid system uses both of these approaches. Our item similarities are a combination of user ratings and features from the books themselves.

SVD (Singular Value Decomposition) is undoubtedly one of the most monumental algorithms in the history of recommender systems. Over time, we aim to improve our recommendations using the latest trends in recommendation systems.

The SVD algorithm made famous by the Netflix challenge differs from standard SVD in that it does NOT assume that the missing values are 0.

Standard SVD gives a perfect reconstruction of the matrix, but it has one drawback for our purposes: if a user has not rated a book (which is the case for most books), SVD models that rating as 0 for every missing book.

To use SVD to predict ratings, you must update the values in the decomposed matrices to negate this effect. To achieve this, you can run Gradient Descent on the error function of the predicted ratings. Once Gradient Descent has run for enough iterations, each value in the decomposed matrices better reflects what is needed to predict the missing ratings, rather than what is needed to reconstruct the matrix.
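The idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the Books2Rec training code: the function names, the toy ratings, the learning rate, the regularization term, and the choice to center predictions on the global mean rating are all assumptions made for the example. The key point is that the updates loop only over observed (user, book, rating) triples, so missing ratings are never treated as 0.

```python
import numpy as np

def train_funk_svd(ratings, n_users, n_items, n_factors=2,
                   n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user and item factors by stochastic gradient descent.

    `ratings` is a list of (user_index, item_index, rating) triples.
    Only these observed entries contribute to the error.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0.0, 0.1, (n_items, n_factors))  # item factors
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean (assumption)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + P[u] @ Q[i])  # prediction error on a known rating
            p_u = P[u].copy()             # keep the old value for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return mu, P, Q

def predict(mu, P, Q, u, i):
    """Predicted rating of item i by user u."""
    return mu + P[u] @ Q[i]

# Hypothetical toy data: 3 users, 3 books, 6 known ratings out of 9 cells.
RATINGS = [(0, 0, 5), (0, 1, 4), (1, 0, 5), (1, 2, 1), (2, 1, 4), (2, 2, 2)]

if __name__ == "__main__":
    mu, P, Q = train_funk_svd(RATINGS, n_users=3, n_items=3)
    # User 0 never rated book 2; the model still produces an estimate.
    print(round(float(predict(mu, P, Q, 0, 2)), 2))
```

The regularization term keeps the factors small so that the model does not simply memorize the handful of known ratings.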

As with all machine learning-based projects, you want to make sure that what you’re using is “better” than other popular methods. As discussed earlier, we used RMSE to evaluate the performance of our trained latent factor (SVD) model. Below are the RMSEs for several algorithms that we calculated during the development of this project.

There are two widely used metrics in recommendation systems that we also use. Mean Absolute Error, otherwise known as MAE, is the average absolute difference between the predicted rating and the actual rating. Its close cousin, Root Mean Squared Error (otherwise known as RMSE), is still an average distance, but the difference between predicted and actual ratings is squared before averaging, meaning that missing by a large margin is penalized much more heavily than missing by a small margin.
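As a small worked example of the difference between the two metrics, the sketch below computes both for two hypothetical sets of predictions. The helper names and the numbers are made up for illustration. Both sets have the same MAE, but the one with a single large miss has a higher RMSE.

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: square root of the average squared difference."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

ACTUAL = [4, 4, 4, 4]
SMALL_MISSES = [3, 3, 5, 5]    # off by 1 on every rating
ONE_BIG_MISS = [4, 4, 4, 0]    # exact three times, off by 4 once

if __name__ == "__main__":
    print(mae(SMALL_MISSES, ACTUAL), rmse(SMALL_MISSES, ACTUAL))  # 1.0 1.0
    print(mae(ONE_BIG_MISS, ACTUAL), rmse(ONE_BIG_MISS, ACTUAL))  # 1.0 2.0
```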

Note: Not all HPC grid search results are shown here, only the best model of each series (small, medium, and large parameter settings).

Our final model uses SVD with 300 factors trained for 100 epochs. Overall, low-factor models consistently outperformed very high-factor models, but this mid-level setting (300 factors, 100 epochs) gave the absolute best result of our grid search. We also subjectively preferred the recommendations it made for test users over those of the very small factor model. This is because a model with only 10 factors is very general. While such a model may achieve a small error on rating predictions, the recommendations it made seemed to make no sense.

Why not use just one hyper-optimized latent factor model (SVD) instead of combining it with a content-based model?

The answer is that a pure SVD model can lead to nonsensical “black box” recommendations that can turn users off. The trained SVD model simply assigns factor strengths to each item in a matrix so as to minimize some cost function. This cost function only tries to minimize the prediction error on held-out ratings in the test set. The result is a highly optimized model that, when finally used to make recommendations for new users, can produce some subjectively strange recommendations.

For example, let’s say there is a book A that, when passed through the trained SVD model, is most similar to book B in terms of ratings. The problem is that book B may be completely unrelated to A by “traditional” standards (i.e., genre and so on). This could lead to a book like The Lord of the Rings: The Return of the King ending up most similar to The Sisterhood of the Traveling Pants (yes, this really happened). It can happen when these two books are consistently rated the same way by users, so the SVD model learns to always recommend them together, since this minimizes the error function. However, if you ask most fantasy readers, they would probably prefer to be recommended more fantasy books (though not only other Tolkien books).

This comes down to finding a balance between exploration (using SVD to recommend books that are similar only in how tens of thousands of users rated them) and intuitive recommendations (using content features to recommend other fantasy books if the user enjoyed the Lord of the Rings books). To solve this problem, we combine the trained SVD matrix with the feature matrix. That way, when we map a user to this matrix, the user is mapped to all the hidden concept spaces that SVD has learned. All books returned by the model are then weighted according to how similar they are to the features of the books that the user rated highly. As a result, you get recommendations that are not limited to the genre you already like, but that also do not completely ignore the types of books you like.
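One way to picture the combination is sketched below. This is an illustrative assumption about how latent and content information can be blended, not the exact Books2Rec procedure: the function names, the content_weight parameter, and the toy vectors are all hypothetical. Each book gets a vector made of its normalized latent factors followed by its normalized content features, and similarity is the cosine between those combined vectors.

```python
import numpy as np

def l2_normalize(matrix):
    """Scale each row to unit length (rows of zeros are left unchanged)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)

def hybrid_item_vectors(latent, content, content_weight=1.0):
    """Concatenate normalized latent factors with weighted content features."""
    return np.hstack([l2_normalize(latent), content_weight * l2_normalize(content)])

def most_similar(vectors, idx):
    """Index of the item with the highest cosine similarity to item `idx`."""
    unit = l2_normalize(vectors)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # never return the query item itself
    return int(np.argmax(sims))

# Hypothetical toy data, chosen to mimic the situation described above.
BOOKS = ["The Return of the King", "The Sisterhood of the Traveling Pants",
         "The Hobbit", "A cookbook"]
LATENT = np.array([[1.0, 0.0], [0.98, 0.2], [0.6, 0.8], [0.0, 1.0]])
# Content columns: fantasy, contemporary young adult, cooking.
CONTENT = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

if __name__ == "__main__":
    print(BOOKS[most_similar(LATENT, 0)])                                # ratings only
    print(BOOKS[most_similar(hybrid_item_vectors(LATENT, CONTENT), 0)])  # hybrid
```

With these made-up numbers, the ratings-only neighbor of The Return of the King is The Sisterhood of the Traveling Pants, while the hybrid neighbor is The Hobbit. Raising or lowering content_weight shifts the balance between the two sources of similarity.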

We used 6 million ratings from Goodreads, available in the goodbooks-10k repository. Along with ratings, this data also includes excellent book metadata, which was used for the content-based model.

Data preprocessing is the most significant part of any data science project. The most difficult part of our data preprocessing was combining the Goodreads data with the Amazon ratings. Amazon ratings have an Amazon Standard Identification Number (ASIN) attached to them, but not an ISBN. We mapped ASINs to book titles and Goodreads book IDs to book titles, then performed a hash join on the two sets of titles to join the two sets of ratings together.
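The title-based join can be sketched with a plain dictionary, which is what a hash join amounts to in Python. Everything specific here is an assumption for illustration: the function names, the placeholder identifiers, and the title normalization (lowercasing and collapsing whitespace), which the original write-up does not describe.

```python
def normalize_title(title):
    """Lowercase and collapse whitespace so near-identical titles match (assumption)."""
    return " ".join(title.lower().split())

def join_on_title(asin_to_title, goodreads_id_to_title):
    """Hash join: map each ASIN to the Goodreads book ID with the same title."""
    # Build phase: hash the Goodreads side by normalized title.
    title_to_goodreads_id = {
        normalize_title(title): book_id
        for book_id, title in goodreads_id_to_title.items()
    }
    # Probe phase: look up each Amazon title in the hash table.
    matches = {}
    for asin, title in asin_to_title.items():
        book_id = title_to_goodreads_id.get(normalize_title(title))
        if book_id is not None:
            matches[asin] = book_id
    return matches

# Placeholder identifiers, not real ASINs or Goodreads IDs.
AMAZON = {"ASIN_A": "The Hobbit", "ASIN_B": "the  return of the king", "ASIN_C": "Unmatched Book"}
GOODREADS = {101: "The Hobbit", 102: "The Return of the King"}

if __name__ == "__main__":
    print(join_on_title(AMAZON, GOODREADS))
```

Titles that differ by more than case or spacing (subtitles, editions) would not match under this simple normalization, and two different books sharing a title would collide.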

We used visualizations to see the difference in score distributions between the two datasets. Visualizations are created using

The next step was to create book features, which we did by constructing tf-idf vectors from book descriptions, tags, and shelves. Many book images were also missing from the data, which greatly reduced the quality of our web application, so we retrieved those images from Goodreads.
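To show what a tf-idf vector is, here is a minimal pure-Python sketch. It uses the textbook definition (term frequency times the log of inverse document frequency) and whitespace tokenization; library implementations typically add smoothing and normalization, so their numbers will differ. The function name and the sample descriptions are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dictionary per document.

    weight = (term count / document length) * log(N / number of documents with the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical book descriptions.
DESCRIPTIONS = [
    "book epic fantasy quest",
    "book fantasy dragons",
    "book summer friendship",
]

if __name__ == "__main__":
    for vector in tfidf_vectors(DESCRIPTIONS):
        print(vector)
```

A word that appears in every description ("book" here) gets a weight of 0, while a word unique to one description gets the largest weight, which is why tf-idf highlights the terms that distinguish one book from another.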

After these steps, the data was clean enough to be hosted on a web server and converted into a numerical format that could be consumed by machine learning algorithms.

RapidMiner is a data science platform that enables rapid prototyping of machine learning algorithms. We used RapidMiner to get a “feel” for our data. It was great for quickly deploying models and viewing their results, but it turned out to be inflexible and unable to handle more than 12,000 users before a memory error or array leak occurred. With it, we were able to achieve an RMSE of 0.864 and an MAE of 0.685.

Surprise is a Python library for building and evaluating recommender systems. It provides a good API and a good pipeline for recommendation systems, but we found that it is not as flexible as we would like. It turned out to be quite difficult to get the different types of recommenders to work well with its pipeline, but