Introduction to Restricted Boltzmann Machines Using PyTorch

In this tutorial, we’re going to talk about a type of unsupervised learning model known as Boltzmann machines. We assume the reader is well-versed in machine learning and deep learning. We’ll use PyTorch to build a simple model using restricted Boltzmann machines. This model will predict whether or not a user will like a movie.

The Boltzmann Machine

A Boltzmann machine defines a probability distribution over binary-valued patterns. What makes Boltzmann machine models different from other deep learning models is that they’re undirected and don’t have an output layer. The other key difference is that every hidden and visible node is connected to every other node. Due to this interconnection, Boltzmann machines can generate data on their own, which is why they’re classified as generative deep learning models.

Energy Based Models

Boltzmann machines are energy-based models: they’re built around an energy equation borrowed from statistical physics, and the goal of training is to minimize this energy.
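
For the restricted Boltzmann machine we’ll build below, with visible units v, hidden units h, visible bias b, hidden bias a, and weight matrix W (matching the variable names used in the code later in this tutorial), the standard energy function is:

E(v, h) = -\sum_i b_i v_i - \sum_j a_j h_j - \sum_{i,j} W_{ji} v_i h_j

The model assigns each joint configuration the probability p(v, h) = e^{-E(v, h)} / Z, where Z is the partition function that normalizes the distribution. Configurations with lower energy are more probable, so training amounts to lowering the energy of configurations that resemble the training data.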

Restricted Boltzmann Machines (RBMs)

What makes RBMs different from Boltzmann machines is that visible nodes aren’t connected to each other, and hidden nodes aren’t connected with each other. Other than that, RBMs are exactly the same as Boltzmann machines.

Since RBMs are undirected, they don’t adjust their weights through standard backpropagation. Instead, they adjust their weights through a process called contrastive divergence. At the start of this process, the weights are randomly initialized and used to compute the hidden nodes from the visible nodes. The hidden nodes then use the same weights to reconstruct the visible nodes; a single weight matrix is shared in both directions throughout. However, the reconstructed visible nodes are not identical to the original input, because the visible nodes aren’t connected to each other and are resampled from the hidden nodes. The gap between the original input and its reconstruction is what drives the weight update.
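
To make this concrete, here is a minimal sketch of a single contrastive divergence (CD-1) update in PyTorch. The sizes, variable names, and learning rate lr are illustrative assumptions; the tutorial’s full implementation appears later.

import torch

# Illustrative sizes: 5 visible units, 3 hidden units, one training example
W = torch.randn(3, 5)                      # weights (hidden x visible)
a = torch.randn(1, 3)                      # hidden bias
b = torch.randn(1, 5)                      # visible bias
v0 = torch.randint(0, 2, (1, 5)).float()   # one observed binary visible vector
lr = 0.1                                   # learning rate (assumed)

# Positive phase: hidden probabilities and samples computed from the data
ph0 = torch.sigmoid(torch.mm(v0, W.t()) + a)
h0 = torch.bernoulli(ph0)

# Negative phase: reconstruct the visible units, then recompute hidden probabilities
pv1 = torch.sigmoid(torch.mm(h0, W) + b)
v1 = torch.bernoulli(pv1)
ph1 = torch.sigmoid(torch.mm(v1, W.t()) + a)

# CD-1 update: correlations under the data minus correlations under the reconstruction
W += lr * (torch.mm(ph0.t(), v0) - torch.mm(ph1.t(), v1))
b += lr * torch.sum(v0 - v1, 0)
a += lr * torch.sum(ph0 - ph1, 0)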

Deep Belief Networks

A deep belief network is a stack of restricted Boltzmann machines, where each RBM layer communicates with both the previous and subsequent layers. The nodes within any single layer don’t communicate with each other laterally.

Deep Boltzmann Machines

Deep Boltzmann machines are a series of restricted Boltzmann machines stacked on top of each other. The hidden units are grouped into layers such that there’s full connectivity between subsequent layers, but no connectivity within layers or between non-neighboring layers.

Building a Simple RBM Model Using PyTorch

In order to install PyTorch, head over to the official PyTorch website and follow the installation instructions for your operating system. We’ll use the MovieLens ratings datasets available from GroupLens.

Importing the necessary libraries

We kick off by importing the libraries that we’ll need, namely:

  • NumPy for scientific computation
  • pandas for loading our dataset
  • torch, the core PyTorch library
  • torch.nn as nn for initializing the neural network
  • torch.nn.parallel for parallel computations
  • torch.optim as optim for the optimizer
  • torch.utils.data for data loading and processing
  • autograd for automatic differentiation

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

Loading the dataset

In the next step, we import the users, ratings, and movies datasets. In our case, the fields are separated by double colons (::). The files don’t have a header row, so we pass header = None. We also use the latin-1 encoding, since some of the movie titles contain special characters.

We set the engine to 'python' because pandas’ default C parser doesn’t support multi-character separators like '::'. In the ratings dataset, the first column is the user ID, the second column is the movie ID, the third column is the rating, and the fourth column is the timestamp.

movies = pd.read_csv('ml-1m/movies.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
users = pd.read_csv('ml-1m/users.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
ratings = pd.read_csv('ml-1m/ratings.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
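
As a quick sanity check, you can peek at what was just loaded. The column labels in the comment are our own; the raw files carry no headers:

# Columns: user ID, movie ID, rating (1-5), timestamp
print(ratings.head())
print('shapes:', users.shape, movies.shape, ratings.shape)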

Preparing the test set and training set

Let’s now prepare our training set and test set. Our training and test sets are tab separated, so we pass '\t' as the delimiter argument. As we know, pandas imports the data as a data frame; however, we need to convert it to an array so we can use it in PyTorch tensors. We do that using the np.array function from NumPy. We also specify that our array should hold integers, since we’re dealing with integer data types.

training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t')
training_set = np.array(training_set, dtype = 'int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t')
test_set = np.array(test_set, dtype = 'int')

Generating the matrix for use in the RBM

In order to build the RBM, we need a matrix with the users’ ratings. This matrix will have the users as the rows and the movies as the columns. The matrix will contain a user’s rating of a specific movie. Zeros will represent observations where a user didn’t rate a specific movie.

In order to create this matrix, we need the number of users and the number of movies in our dataset. To get no_users, we index column zero (the user ID column), take the maximum user ID in both the training and test sets, and then take the larger of the two with max. We wrap the whole expression in int to force the result to be an integer. We obtain the number of movies in a similar fashion from column one. (For the MovieLens 100k data, this gives 943 users and 1,682 movies.)

no_users = int(max(max(training_set[:,0]), max(test_set[:,0])))
no_movies = int(max(max(training_set[:,1]), max(test_set[:,1])))

Next, we create a function that will create the matrix. The reason for doing this is to set up the dataset in a way that the RBM expects as input. We create a function called convert, which takes in our data as input and converts it into the matrix.

First, we create an empty list called new_data. We then create a for loop that goes through the dataset and fetches all the movies rated by a specific user, along with that user’s ratings. Notice that we loop up to no_users + 1 to include the last user ID, since the range function doesn’t include the upper bound.

Since there are movies that the user didn’t rate, we first create a row of zeros. We then update the zeros with the user’s ratings. When indexing the movie ratings, we use id_movies - 1 because Python indices start from zero while movie IDs start from one, so movie 1 lands at index 0. We append each user’s ratings to new_data as a list, which creates a list of lists. Later, we’ll convert this into Torch tensors; the function that does the conversion expects a list of lists.

def convert(data):
    new_data = []
    for id_users in range(1, no_users + 1):
        # All movie IDs and ratings belonging to this user
        id_movies = data[:,1][data[:,0] == id_users]
        id_ratings = data[:,2][data[:,0] == id_users]
        # Start from a row of zeros (unrated), then fill in this user's ratings
        ratings = np.zeros(no_movies)
        ratings[id_movies - 1] = id_ratings
        new_data.append(list(ratings))
    return new_data

Now let’s use our function and convert our training and test data into a matrix.

training_set = convert(training_set)
test_set = convert(test_set)

Converting the data into Torch tensors

Since we’re using PyTorch, we need to convert the data into Torch tensors. We do this with the FloatTensor utility, which turns our list of lists into a 2-D PyTorch tensor.

training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

Converting the ratings into zeros and ones

Next, we convert these ratings into binary ratings, since we want the RBM to make a binary prediction. Remember that the dataset already contains zeros wherever a user didn’t rate a movie; we replace those entries with -1 to mark movies that were never rated. We then convert ratings of 1 and 2 to 0 (disliked) and ratings of 3, 4, and 5 to 1 (liked). We do this for both the training set and the test set.

training_set[training_set == 0] = -1
training_set[training_set == 1] = 0
training_set[training_set == 2] = 0
training_set[training_set >= 3] = 1
test_set[test_set == 0] = -1
test_set[test_set == 1] = 0
test_set[test_set == 2] = 0
test_set[test_set >= 3] = 1
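
As a quick sanity check, both sets should now contain only the values -1, 0, and 1:

print(torch.unique(training_set))   # expected: tensor([-1., 0., 1.])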

Creating the RBM Architecture

Now we need to create a class to define the architecture of the RBM. Inside the __init__ function we specify two parameters: the first is the number of visible nodes nv, and the second is the number of hidden nodes nh.

Next, we initialize the weights and biases randomly from a normal distribution using torch.randn. The weight matrix is of size nh by nv. We then define two biases: a is the bias used when computing the probability of the hidden nodes given the visible nodes, and b is the bias used when computing the probability of the visible nodes given the hidden nodes. In declaring them, we pass 1 as the first dimension, which corresponds to the batch dimension.

The next step is to create a function sample_h which will sample the hidden nodes. It takes x as an argument, which represents the visible neurons.

Next, we compute the probability of h given v, where h and v represent the hidden and visible nodes respectively. This probability is the sigmoid activation applied to the product of x and the transposed weight matrix, plus the hidden bias a. The matrix product is computed with the mm utility from Torch. Since the hidden units are binary, we also return Bernoulli samples drawn from these probabilities.

Next, we create a function sample_v that will sample the visible nodes. The function is similar to the sample_h function.

The next function we create is the training function. It takes the following parameters: the input vector v0 containing the movie ratings, the visible nodes vk obtained after k rounds of sampling, the vector of probabilities ph0 of the hidden nodes given the original input, and the probabilities phk of the hidden nodes after k rounds of sampling.

class RBM():
    def __init__(self, nv, nh):
        # Weights (hidden x visible) and biases, initialized from a normal distribution
        self.W = torch.randn(nh, nv)
        self.a = torch.randn(1, nh)  # hidden bias
        self.b = torch.randn(1, nv)  # visible bias

    def sample_h(self, x):
        # p(h | v) = sigmoid(x W^T + a), plus Bernoulli samples of the hidden units
        wx = torch.mm(x, self.W.t())
        activation = wx + self.a.expand_as(wx)
        p_h_given_v = torch.sigmoid(activation)
        return p_h_given_v, torch.bernoulli(p_h_given_v)

    def sample_v(self, y):
        # p(v | h) = sigmoid(y W + b), plus Bernoulli samples of the visible units
        wy = torch.mm(y, self.W)
        activation = wy + self.b.expand_as(wy)
        p_v_given_h = torch.sigmoid(activation)
        return p_v_given_h, torch.bernoulli(p_v_given_h)

    def train(self, v0, vk, ph0, phk):
        # Contrastive divergence update: positive phase minus negative phase.
        # The transpose keeps the update's shape consistent with W (nh x nv).
        self.W += (torch.mm(v0.t(), ph0) - torch.mm(vk.t(), phk)).t()
        self.b += torch.sum((v0 - vk), 0)
        self.a += torch.sum((ph0 - phk), 0)

Now we set the number of visible nodes to the length of one row of the training set, which is the number of movies; the visible nodes correspond to the features in our training set. The number of hidden nodes determines the number of features we’d like our RBM to detect, and we set it to 200. We also set a batch size of 100 and then instantiate the RBM class.

nv = len(training_set[0])
nh = 200
batch_size = 100
rbm = RBM(nv, nh)
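
Before training, it’s worth confirming that the dimensions line up. Pushing one batch of users through sample_h should yield hidden probabilities and samples of shape (batch_size, nh):

ph, h = rbm.sample_h(training_set[:batch_size])
print(ph.shape, h.shape)   # torch.Size([100, 200]) for both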

Training the RBM

The first step in training the RBM is to define the number of epochs, then loop over the users in batches. Within each batch, we run ten rounds of Gibbs sampling: we sample the hidden nodes from the visible nodes, then resample the visible nodes from the hidden nodes, each time freezing the -1 entries (movies the user never rated) at their original values. After each batch, the weights are adjusted through contrastive divergence so the reconstructions improve, and we accumulate the mean absolute difference between the original and reconstructed ratings as the training loss.

nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    s = 0.
    for id_user in range(0, no_users - batch_size, batch_size):
        vk = training_set[id_user:id_user+batch_size]
        v0 = training_set[id_user:id_user+batch_size]
        ph0,_ = rbm.sample_h(v0)
        # k = 10 rounds of Gibbs sampling
        for k in range(10):
            _,hk = rbm.sample_h(vk)
            _,vk = rbm.sample_v(hk)
            vk[v0<0] = v0[v0<0]  # keep unrated movies (-1) frozen
        phk,_ = rbm.sample_h(vk)
        rbm.train(v0, vk, ph0, phk)
        train_loss += torch.mean(torch.abs(v0[v0>=0] - vk[v0>=0]))
        s += 1.
    print('epoch: '+str(epoch)+' loss: '+str(train_loss/s))

Testing the RBM

Next we test our RBM. For each user, we feed that user’s training-set ratings in to activate the hidden neurons, then reconstruct the visible nodes; the reconstruction serves as the prediction for the ratings held out in the test set. We then compute the test loss as the mean absolute difference between the predictions and the actual test-set ratings.

test_loss = 0
s = 0.
for id_user in range(no_users):
    v = training_set[id_user:id_user+1]
    vt = test_set[id_user:id_user+1]
    if len(vt[vt>=0]) > 0:
        _,h = rbm.sample_h(v)
        _,v = rbm.sample_v(h)
        test_loss += torch.mean(torch.abs(vt[vt>=0] - v[vt>=0]))
        s += 1.
print('test loss: '+str(test_loss/s))
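
With the trained model, the same two sampling functions can be used to generate recommendations for a single user. Here’s a minimal sketch; the user index and the 0.5 probability threshold are our own illustrative choices:

user_id = 0   # an arbitrary user (assumed for illustration)
v = training_set[user_id:user_id+1]
_, h = rbm.sample_h(v)
p_v, _ = rbm.sample_v(h)

# Unrated movies (-1 in the training set) whose predicted "like" probability
# exceeds 0.5 are candidate recommendations
unseen = v[0] < 0
recommended = torch.nonzero(unseen & (p_v[0] > 0.5)).flatten()
print('candidate movie indices:', recommended[:10])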

Conclusion

This model can be improved using autoencoders, another family of unsupervised neural networks that’s commonly applied to the same recommendation task. You can learn more about RBMs and Boltzmann machines from the references shared below.

References

  • Deep Boltzmann Machines
  • Deep-Belief Networks
  • Deep Belief Nets
  • Deep Learning Course
  • Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient
  • Notes on Contrastive Divergence
  • Restricted Boltzmann Machines
  • A Tutorial on Energy-Based Learning
