Hello, and welcome!
In this video, we will show you how to build a recommendation system based on collaborative
filtering, using a Restricted Boltzmann Machine and the TensorFlow library.
To recommend items, this system will find users that are similar to each other based
on their item ratings.
Let's download the datasets that we'll use for our application, and extract the contents
into an accessible directory.
The datasets were acquired by GroupLens, and they contain movies, users, and movie ratings
by these users.
Next we'll import the necessary libraries.
TensorFlow and Numpy will be used for the RBM model, and Pandas will allow us to easily
manipulate our datasets.
Now we'll load the data into our program.
The .dat files containing our data are similar to CSV files, but instead of using a comma
to separate entries, they use two colon characters.
To accommodate this, we can simply call the Pandas "read_csv"
function with the additional argument sep="::".
Our datasets don't have any headers so we'll also add in the parameter "header=None".
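As a rough sketch of that call, here is the same idea applied to a small in-memory sample instead of the real file. The two sample rows are illustrative only, and engine="python" is added here because Pandas needs its Python parser for a separator longer than one character.

```python
import io
import pandas as pd

# Illustrative sample in the same "::"-separated layout as the .dat files.
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)

# header=None: the data has no header row.
# engine="python": needed for a multi-character separator such as "::".
movies_df = pd.read_csv(sample, sep="::", header=None, engine="python")
print(movies_df.head())
```

With the real data, the StringIO object would be replaced by the path to the extracted .dat file.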
We just imported and displayed the movie dataset, so let's do the same for the ratings.
Our "movies_df"
variable contains a dataframe that stores a movie's unique ID number, title, and genres.
Our "ratings_df"
variable stores a unique User ID number, an ID for movies that the user has watched, and
the user's rating for the movie.
Let's now rename the columns in these dataframes to something more intuitive.
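One way the renaming might look is shown below. Title, Genres, and Timestamp are the names used later in this video; MovieID, UserID, and Rating are assumed names chosen for this sketch, and the single rows of data are made up.

```python
import pandas as pd

# Tiny stand-ins for the dataframes loaded with header=None,
# which leaves the columns numbered 0, 1, 2, ...
movies_df = pd.DataFrame([[1, "Toy Story (1995)", "Animation|Children's|Comedy"]])
ratings_df = pd.DataFrame([[1, 1193, 5, 978300760]])

# Assign readable names (MovieID, UserID, and Rating are assumptions).
movies_df.columns = ["MovieID", "Title", "Genres"]
ratings_df.columns = ["UserID", "MovieID", "Rating", "Timestamp"]

print(movies_df.columns.tolist())
print(ratings_df.columns.tolist())
```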
Here are our final datasets.
Now let's take a look at our RBM model.
A Restricted Boltzmann Machine has two layers of neurons: the visible input layer, and the
hidden layer.
The hidden layer is used to learn features from the information fed through the input
layer.
For our model, the input is going to contain "X"
neurons, where "X" is the number of movies in our dataset.
Each of these neurons will possess a normalized rating value ranging from 0 to 1.
0 means that a user has not watched that movie, and the closer the value is to 1, the more
the user likes the movie that the neuron's representing.
These values will be extracted and normalized from the ratings dataset.
Let's format the datasets so that they conform to the RBM's structure.
To begin, let's see how many movies we have, and if the movie IDs correspond to that value.
Notice that we have 3883 movies, while our IDs range from 1 to 3952.
Because of this, we can't index movies directly by their ID: some IDs are larger than the
number of movies, so we would get out-of-range indexing errors.
So instead we can create a column that shows what spot in our list that particular movie is in.
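A small sketch of that idea, using a made-up dataframe whose IDs have gaps just like the real one. The column name MovieID is an assumption; "List Index" is the name used in this video.

```python
import pandas as pd

# Made-up movies with gaps in their IDs (3, 5, and 6 are missing).
movies_df = pd.DataFrame({
    "MovieID": [1, 2, 4, 7],
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D"],
})

# The largest ID is bigger than the number of movies,
# so the ID cannot be used directly as a position.
print(len(movies_df), movies_df["MovieID"].max())

# Store each movie's position in the list as its own column.
movies_df["List Index"] = movies_df.index
print(movies_df)
```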
Now let's merge the ratings dataframe into the movies one so we can have the List Index
values in both dataframes.
We're also going to drop the Timestamp, Title, and Genres columns since we won't be needing
them to make recommendations.
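The merge and the column drop could look roughly like this. The data is made up, and MovieID, UserID, and Rating are assumed column names.

```python
import pandas as pd

movies_df = pd.DataFrame({
    "MovieID": [1, 2, 4],
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genres": ["Comedy", "Drama", "Action"],
    "List Index": [0, 1, 2],
})
ratings_df = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [1, 4, 2],
    "Rating": [5, 3, 4],
    "Timestamp": [978300760, 978302109, 978301968],
})

# Join on the shared MovieID column so every rating row
# also carries the movie's List Index.
merged_df = movies_df.merge(ratings_df, on="MovieID")

# Drop the columns that are not needed for recommendations.
merged_df = merged_df.drop(columns=["Timestamp", "Title", "Genres"])
print(merged_df)
```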
Now, we can start formatting the data into input for the RBM.
We're going to store the normalized user ratings into a list of lists called "trX".
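Here is one possible way to build "trX", sketched on made-up data. It assumes ratings on a 1 to 5 scale, so dividing by 5 gives values between 0 and 1, and it assumes the column names UserID and Rating.

```python
import pandas as pd

# Made-up merged data: two users, four movies in total.
merged_df = pd.DataFrame({
    "UserID": [1, 1, 2],
    "List Index": [0, 2, 1],
    "Rating": [5, 3, 4],
})
num_movies = 4

trX = []
# One row per user; groupby visits users in ascending UserID order.
for user_id, user_ratings in merged_df.groupby("UserID"):
    row = [0.0] * num_movies  # 0 means "not watched"
    for idx, rating in zip(user_ratings["List Index"], user_ratings["Rating"]):
        # Assumes a 1-5 rating scale, so the result lies in (0, 1].
        row[idx] = float(rating) / 5.0
    trX.append(row)

print(trX)
```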
Let's build our RBM with Tensorflow.
After choosing the number of hidden units, we create placeholder variables for the visible
layer biases, hidden layer biases, and the weights that connect the hidden layer with the visible layer.
We arbitrarily use 20 units in our hidden layer.
Next we create the visible and hidden layer units.
For the activation functions, we'll use the nonlinear "tf.sigmoid" and "tf.relu" functions.
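To make the roles of the two functions concrete, here is a sketch of one visible-to-hidden-to-visible pass written in NumPy rather than TensorFlow, so it is not the code from this video. It assumes the common pattern where sigmoid gives an activation probability and relu applied to the sign of (probability minus uniform noise) turns that probability into a 0/1 sample. The helper names and the six-movie input are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def sample(prob):
    # sign(prob - noise) is +1 where noise < prob and -1 otherwise;
    # relu maps that to 1 and 0, i.e. a Bernoulli sample.
    return relu(np.sign(prob - rng.random(prob.shape)))

visible_units, hidden_units = 6, 20

# Parameters start at zero here, purely for illustration.
W = np.zeros((visible_units, hidden_units))
vb = np.zeros(visible_units)
hb = np.zeros(hidden_units)

# One user's normalized ratings (made up).
v0 = np.array([[1.0, 0.0, 0.6, 0.0, 0.8, 0.0]])

h0_prob = sigmoid(v0 @ W + hb)    # hidden activation probabilities
h0 = sample(h0_prob)              # sampled hidden states
v1_prob = sigmoid(h0 @ W.T + vb)  # reconstructed visible probabilities
v1 = sample(v1_prob)              # sampled reconstruction
h1_prob = sigmoid(v1 @ W + hb)    # hidden probabilities for the reconstruction
```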
Now we set the RBM training parameters and functions.
Then we set the error function, which will be the Mean Absolute Error.
We also have to initialize our variables.
Thankfully, NumPy has a "zeros" function for this.
We'll train the RBM for 15 epochs.
Each epoch uses 10 batches with size 100.
After training, we'll print out a graph with the error by epoch.
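The training loop can be sketched as follows, again in NumPy instead of TensorFlow and on randomly generated ratings rather than the real dataset. It assumes a one-step contrastive divergence update, which is a common way to train an RBM but is not confirmed as the exact rule used in this video. The learning rate of 0.1 and the synthetic data size are assumptions; 1000 users with a batch size of 100 gives 10 batches per epoch. The errors are printed instead of plotted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(prob):
    # Bernoulli sample: 1 with probability prob, else 0.
    return (rng.random(prob.shape) < prob).astype(float)

# Synthetic stand-in for trX: ratings 0-5 divided by 5.
num_users, num_movies, hidden_units = 1000, 30, 20
trX = rng.integers(0, 6, size=(num_users, num_movies)) / 5.0

# Initialize all parameters with zeros.
W = np.zeros((num_movies, hidden_units))
vb = np.zeros(num_movies)
hb = np.zeros(hidden_units)

epochs, batch_size = 15, 100
alpha = 0.1  # learning rate (assumed value)
errors = []

for epoch in range(epochs):
    for start in range(0, num_users, batch_size):
        v0 = trX[start:start + batch_size]
        # Positive phase: hidden units driven by the data.
        h0 = sample(sigmoid(v0 @ W + hb))
        # Negative phase: reconstruct, then re-infer hidden units.
        v1 = sample(sigmoid(h0 @ W.T + vb))
        h1_prob = sigmoid(v1 @ W + hb)
        # One-step contrastive divergence update, averaged over the batch.
        n = v0.shape[0]
        W += alpha * (v0.T @ h0 - v1.T @ h1_prob) / n
        vb += alpha * (v0 - v1).mean(axis=0)
        hb += alpha * (h0 - h1_prob).mean(axis=0)
    # Mean absolute reconstruction error over the whole training set.
    recon = sigmoid(sigmoid(trX @ W + hb) @ W.T + vb)
    errors.append(float(np.mean(np.abs(trX - recon))))

print(errors)
```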
With our model trained, we can finally predict movies that an arbitrarily selected user might like.
This can be done by feeding the user's watched movie preferences into the RBM
and then reconstructing the input.
The values that the RBM gives us will estimate the user's preferences for movies that they
haven't watched, based on the preferences of the users that the RBM was trained on.
To find the 20 most recommended movies for our mock user, we can sort by the scores that
our model provided.
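The reconstruction and the sorting step might look like the sketch below. The weights are random stand-ins for a trained model, the five movies and the user's ratings are made up, and the column name "RecommendationScore" is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

movies_df = pd.DataFrame({
    "MovieID": [1, 2, 4, 7, 9],
    "Title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
})
num_movies, hidden_units = len(movies_df), 20

# Random parameters standing in for a trained RBM.
W = rng.normal(scale=0.1, size=(num_movies, hidden_units))
vb = np.zeros(num_movies)
hb = np.zeros(hidden_units)

# The mock user's normalized ratings (0 = not watched).
input_user = np.array([[1.0, 0.0, 0.6, 0.0, 0.0]])

# Feed the ratings forward, then reconstruct the visible layer.
hidden = sigmoid(input_user @ W + hb)
rec = sigmoid(hidden @ W.T + vb)

# Attach the scores to the movies and keep the 20 highest.
scored = movies_df.assign(RecommendationScore=rec[0])
top = scored.sort_values("RecommendationScore", ascending=False).head(20)
print(top)
```

With only five movies in this toy example, head(20) simply returns all of them in score order.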
Feel free to modify parameters such as the number of hidden units or the loss function.
These parameters have an impact on run time and performance.
By now, you should understand how to implement a basic recommendation system using an RBM
and the TensorFlow library.
Thank you for watching this video.