Originally published at: https://www.sitepoint.com/keras-digit-recognition-tutorial/
Over the last decade, the use of artificial neural networks (ANNs) has increased considerably. People have used ANNs in medical diagnoses, to predict Bitcoin prices, and to create fake Obama videos! With all the buzz about deep learning and artificial neural networks, haven’t you always wanted to create one for yourself? In this tutorial, we’ll create a model to recognize handwritten digits.
We use the `keras` library for training the model in this tutorial. Keras is a high-level Python library that acts as a wrapper over TensorFlow, CNTK and Theano. By default, Keras uses a TensorFlow backend, and we'll use the same to train our model.
Artificial Neural Networks
An artificial neural network is a mathematical model that converts a set of inputs to a set of outputs through a number of hidden layers. Each hidden layer holds an intermediate representation of the data, computed from the output of the layer before it. In a typical (fully connected) neural network, each node of a layer takes all nodes of the previous layer as input. A model may have one or more hidden layers.
An ANN receives data through its input layer and transforms it through the hidden layers. The network is initialized by assigning random weights and biases to each of its nodes. As the training data is fed into the model, it adjusts these weights and biases using the errors generated at each step. This is how the model "learns" the patterns in the training data.
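The learning loop described above can be sketched for the simplest possible case: a single fully connected layer with one output node, trained by gradient descent on mean squared error. This is a minimal NumPy sketch under made-up assumptions (the toy data, the learning rate of 0.1 and the 100 steps are all hypothetical); Keras runs the same kind of update for every layer of a deep network via backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 8 samples with 3 inputs each. The targets follow a
# known linear rule so we can watch the model move toward it.
X = rng.normal(size=(8, 3))
true_W = np.array([[1.0], [-2.0], [0.5]])
y = X @ true_W + 0.3

# Random initial weights and a zero bias for one fully connected output node
W = rng.normal(scale=0.1, size=(3, 1))
b = np.zeros(1)


def mse(W, b):
    """Mean squared error of the layer's predictions on the toy data."""
    return float(np.mean((X @ W + b - y) ** 2))


loss_before = mse(W, b)
learning_rate = 0.1
for _ in range(100):
    error = X @ W + b - y                 # prediction error for each sample
    grad_W = 2 * X.T @ error / len(X)     # gradient of the MSE w.r.t. W
    grad_b = 2 * error.mean(axis=0)       # gradient of the MSE w.r.t. b
    W -= learning_rate * grad_W           # step against the gradient
    b -= learning_rate * grad_b
loss_after = mse(W, b)

print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Each pass computes the errors, then nudges every weight in the direction that reduces them; repeated over many steps, this is what "learning" means in practice.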
Convolutional Neural Networks
In this tutorial, we're going to identify digits, which is a simple version of image classification. An image is essentially a collection of dots, or pixels. A color pixel can be described by its component colors (RGB), while a grayscale pixel, like those in the MNIST digits we'll use, is a single intensity value. Therefore, the input data of an image is essentially a 2D array of pixel values.
If we were to train a regular neural network on image data, we'd have to provide a long list of inputs (one per pixel), each of which would be connected to every node of the next hidden layer. The number of connections grows quickly with image size, which makes the process difficult to scale up.
In a convolutional neural network (CNN), the layers are arranged in 3D arrays (X-axis coordinate, Y-axis coordinate and color channel). Consequently, a node of a hidden layer is only connected to a small region of the previous layer, making the process far more efficient than in a traditional neural network. CNNs are, therefore, popular when it comes to working with images and videos.
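To make the scaling difference concrete, here is the parameter arithmetic for one 28 x 28 grayscale image, assuming a hypothetical fully connected layer of 128 nodes versus a hypothetical convolutional layer of 32 filters of size 3 x 3, both with biases. These are the same counts Keras reports in `model.summary()` for a `Dense(128)` layer on 784 inputs and a `Conv2D(32, (3, 3))` layer on a 28 x 28 x 1 input.

```python
# Inputs for one MNIST image: 28 x 28 pixels, 1 grayscale channel
rows, cols, channels = 28, 28, 1

# Fully connected: every one of the 784 inputs connects to every hidden node
hidden_nodes = 128  # hypothetical layer size
dense_params = rows * cols * channels * hidden_nodes + hidden_nodes  # weights + biases

# Convolutional: each filter only looks at a small kernel_size x kernel_size
# region, and the same filter weights are reused at every position
filters, kernel_size = 32, 3  # hypothetical layer settings
conv_params = kernel_size * kernel_size * channels * filters + filters

print(dense_params)  # 100480
print(conv_params)   # 320
```

The convolutional layer's parameter count does not depend on the image size at all, which is why CNNs scale to larger images far better.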
The various types of layers in a CNN are as follows:
- convolutional layers: these run input through certain filters, which identify features in the image
- pooling layers: these downsample the convolutional features (for example, by keeping the maximum value in each small window), helping in feature reduction
- flatten layers: these convert an N-dimensional layer to a 1D layer
- classification layer: the final layer, which tells us the final result
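To see what each layer type does to the data, here is a minimal NumPy sketch of one pass through the four layers. It assumes a single hand-picked 3 x 3 filter, 2 x 2 max pooling and untrained random weights for the classification layer; the helper names `conv2d`, `max_pool` and `softmax` are hypothetical. It is a simplification, not Keras's implementation: real convolutional layers learn many filters and usually add biases and an activation function such as ReLU.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like Keras's Conv2D, this is a cross-correlation: the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
image = rng.random((28, 28))               # stand-in for one grayscale digit
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # hypothetical vertical-edge filter

features = conv2d(image, kernel)  # convolutional layer: (26, 26)
pooled = max_pool(features)       # pooling layer: (13, 13)
flat = pooled.flatten()           # flatten layer: (169,)

weights = rng.normal(scale=0.01, size=(flat.size, 10))  # untrained weights
probs = softmax(flat @ weights)   # classification layer: one probability per digit
print(features.shape, pooled.shape, flat.shape, probs.shape)
```

The shapes shrink from 28 x 28 to 26 x 26 (a 3 x 3 filter cannot be centered on the border pixels without padding), then to 13 x 13 after pooling, and finally to a flat vector of 169 values that the classification layer maps to 10 scores.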
Let's now explore the data.
Explore MNIST Dataset
As you may have realized by now, we need labelled data to train any model. In this tutorial, we'll use the MNIST dataset of handwritten digits. This dataset is a part of the Keras package. It contains a training set of 60000 examples and a test set of 10000 examples. We'll train the model on the training set and validate the results on the test set. Further, we'll create an image of our own to test whether the model can correctly predict it.
First, let's import the MNIST dataset from Keras. The `.load_data()` method returns both the training and testing datasets:

```python
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
```
Let's try to visualize the digits in the dataset. If you're using Jupyter notebooks, run the `%matplotlib inline` magic function to show Matplotlib plots inline.
Next, import the `pyplot` module from `matplotlib` and use the `.imshow()` method to display the image:

```python
import matplotlib.pyplot as plt

image_index = 35
print(y_train[image_index])
plt.imshow(x_train[image_index], cmap='Greys')
plt.show()
```
The label of the image is printed and then the image is displayed.
![label printed and image displayed|912x646](upload://3VRSGRfukrkbAShfkHO1X0fVMZb.png)
Let's verify the sizes of the training and testing datasets by printing `x_train.shape` and `x_test.shape`. Notice that each image has the dimensions 28 x 28:

```
(60000, 28, 28)
(10000, 28, 28)
```
Next, we may also wish to explore the dependent variable, stored in `y_train`. Let's print all labels up to and including the digit that we visualized above:

```python
print(y_train[:image_index + 1])
```

```
[5 0 4 1 9 2 1 3 1 4 3 5 3 6 1 7 2 8 6 9 4 0 9 1 1 2 4 3 2 7 3 8 6 9 0 5]
```
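A quick way to see how the classes are distributed is `np.bincount`, which counts how often each integer occurs. The sketch below runs it on the 36 labels printed above so it works without downloading the data; on the full dataset you would pass `y_train` itself, before it is converted to a class matrix.

```python
import numpy as np

# The first 36 labels of y_train, copied from the output above
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8,
                   6, 9, 4, 0, 9, 1, 1, 2, 4, 3, 2, 7, 3, 8, 6, 9, 0, 5])

# Count occurrences of each digit; minlength keeps a slot for all 10 classes
counts = np.bincount(labels, minlength=10)
for digit, count in enumerate(counts):
    print(digit, count)
```

A strongly uneven count on the full training set would be worth knowing before training, since the model could then favor the more frequent digits.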
Now that we've seen the structure of the data, let's work on it further before creating the model.
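To work with the Keras API, we need to reshape each image to the format of (M x N x 1), that is, with a single color channel. We'll use the `.reshape()` method to perform this action. Finally, normalize the image data by dividing each pixel value by 255 (since each pixel value ranges from 0 to 255). The arrays are loaded as 8-bit integers, so we convert them to floats first; dividing an integer array in place raises an error:

```python
# save input image dimensions
img_rows, img_cols = 28, 28

# add a channel axis: (samples, 28, 28) -> (samples, 28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)

# convert from uint8 to float so the in-place division below works
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# scale pixel values to the range [0, 1]
x_train /= 255
x_test /= 255
```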
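To check the preprocessing without downloading MNIST, here is a small sketch on synthetic data, assuming arrays of dtype `uint8` like the ones `mnist.load_data()` returns. It also shows why the float conversion is needed: NumPy refuses to store the float results of an in-place division back into an integer array.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for MNIST: 5 random 28 x 28 images stored as uint8
fake = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# In-place division of an integer array fails with a TypeError (subclass)
inplace_failed = False
try:
    bad = fake.copy()
    bad /= 255
except TypeError:
    inplace_failed = True
print("in-place division on uint8 failed:", inplace_failed)

# Add the channel axis, convert to float, then scale to [0, 1]
x = fake.reshape(fake.shape[0], 28, 28, 1).astype('float32')
x /= 255
print(x.shape, x.dtype, x.min(), x.max())
```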
Next, we need to convert the dependent variable from integers to a binary class matrix. This can be achieved by the `to_categorical()` function:

```python
from keras.utils import to_categorical

num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
```
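A binary class matrix has one row per label and one column per class, with a single 1 in the column of that label's class (one-hot encoding). The NumPy sketch below illustrates the idea on the first three MNIST labels; it is an equivalent construction for illustration, not Keras's own implementation.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4])  # the first three MNIST training labels

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the labels builds the binary class matrix
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot)
```

Taking `argmax` along each row recovers the original integer labels, which is also how a model's predicted probabilities are turned back into a digit.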
We're now ready to create the model and train it!