Image Recognition | Handwritten Digit Recognition with Keras (with Code)

Introduction

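The wave of artificial intelligence has swept the globe, and terms such as deep learning and artificial intelligence (AI) are now everywhere around us. AI has developed through three rises and two falls, and the leading approach has changed each time (here ">" means "more prominent than"). In the 1990s: knowledge-based reasoning > neural networks > machine learning. Around 2005: machine learning > knowledge (Semantic Web) > neural networks. Since 2017: deep-learning-based neural networks > knowledge (knowledge graphs) > machine learning.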

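The convolutional neural network (CNN), a representative deep learning architecture, traces its inspiration to the neurobiologists Hubel and Wiesel, whose experiments on cells in the cat visual cortex around 1960 showed that the visual cortex is organized in layers (the layered structure of a CNN closely mirrors this). Deep learning, a subfield of machine learning (ML), has seen a dramatic revival thanks to greater computing power and the availability of large amounts of data. Whether deep learning can be equated with, or taken to represent, artificial intelligence is in my view still open to debate; it is better regarded as an important technique at the current stage of AI development. Since this article is a hands-on introduction to deep learning, it does not go deep into the underlying concepts. Instead, I use a practical example to walk through the general workflow of processing images with deep learning.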

Overview:

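This article uses handwritten digit recognition as an introductory deep learning project, built on the Keras library. TensorFlow and the other required modules must be installed beforehand. Also check the file paths used for saving and loading models and figures: when running the code on your own machine, create those directories or change the paths. The workflow covers: loading the MNIST dataset with Keras, building a LeNet model, saving and loading the model with Keras, training and predicting on the handwritten digit dataset, and finally plotting the loss curve over epochs.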

About the handwritten digit dataset:

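Handwritten digit recognition is practically the standard introductory dataset for deep learning. Keras provides a built-in loader for MNIST: the training set contains 60,000 samples and the test set contains 10,000. Every sample is a single-channel grayscale image of 28×28 pixels, and there are 10 classes, the digits 0 through 9.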

Importing the required modules:

# import the necessary packages
# (these import paths follow the standalone Keras 2.x API)
import numpy as np
from keras.utils import np_utils
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras import backend as K
from keras.models import load_model

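Loading the MNIST dataset

Keras can implement many kinds of neural network models and can load a variety of datasets for evaluating them. The code below loads the MNIST dataset automatically.

# load mnist data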
from keras.datasets import mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()

Display the first 6 images of the MNIST training set:

# plot the first 6 training images in grayscale
import matplotlib.pyplot as plt

# lay the images out on a 3x2 grid; subplot indices start at 1
for i in range(6):
    plt.subplot(3, 2, i + 1)
    plt.imshow(x_train[i], cmap=plt.get_cmap('gray'))
# show
plt.show()

Data preprocessing

First, reshape the data into a 4-D tensor [samples][height][width][channels] so that it matches the model's input.

# reshape the data to four dimensions to match the model input
# reshape to be [samples][height][width][channels]
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype('float32')
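The cast to float32 has a memory cost that is easy to estimate. The following is a dependency-free sketch, assuming the 60,000 training images are loaded as one byte per pixel (uint8) before the cast; the helper name `array_bytes` is made up for illustration.

```python
def array_bytes(shape, bytes_per_element):
    """Return the number of bytes needed for a dense array of the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total

# [samples][height][width][channels] for the MNIST training images
train_shape = (60000, 28, 28, 1)

uint8_bytes = array_bytes(train_shape, 1)    # raw pixels, one byte each
float32_bytes = array_bytes(train_shape, 4)  # after .astype('float32')

print("uint8:   %.1f MB" % (uint8_bytes / 1e6))
print("float32: %.1f MB" % (float32_bytes / 1e6))
```

The float32 copy is four times the size of the raw pixels, which is still small for MNIST but matters for larger image datasets.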

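To help the model train well, image data is usually normalized; here the pixel values are scaled into the range [0, 1].

# normalization: scale pixel values from [0, 255] to [0, 1]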
x_train = x_train / 255.0
x_test = x_test / 255.0
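As a small illustration of what the division does, here is a sketch on a tiny hand-written "image" using plain Python lists. The 2×2 patch and the helper `normalize` are hypothetical and only stand in for the real arrays.

```python
def normalize(image, max_value=255.0):
    """Scale every pixel of a nested-list image from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]

# a made-up 2x2 grayscale patch: black, mid-gray, mid-gray, white
patch = [[0, 128],
         [128, 255]]

scaled = normalize(patch)
print(scaled)
```

Black stays at 0.0, white becomes 1.0, and everything else lands in between, so all inputs share the same small numeric range.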

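Finally, the labels in the original MNIST dataset are the integers 0-9, which are usually converted to one-hot vectors. For example, a training label of 1 becomes the vector [0,1,0,0,0,0,0,0,0,0].

# one-hot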
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
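The one-hot encoding itself is simple enough to write out by hand. The sketch below is a plain-Python illustration of the idea, not the Keras implementation; the helpers `one_hot` and `decode` are hypothetical names, and integers are used for readability where Keras returns a float array.

```python
def one_hot(label, num_classes=10):
    """Return a list with a 1 at position `label` and 0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label %r is outside 0..%d" % (label, num_classes - 1))
    vector = [0] * num_classes
    vector[label] = 1
    return vector

def decode(vector):
    """Invert the encoding: return the index of the largest entry (argmax)."""
    return max(range(len(vector)), key=vector.__getitem__)

encoded = one_hot(1)
print(encoded)
print(decode(encoded))
```

The same argmax step is what turns the softmax output of the trained model back into a predicted digit.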

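Building and training the model

Training hyperparameters:

# parameters
EPOCHS = 10       # number of passes over the training set
INIT_LR = 1e-3    # initial learning rate for Adam
BS = 32           # batch size
CLASS_NUM = 10    # digits 0-9
norm_size = 28    # input images are norm_size x norm_size pixels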

This article uses the LeNet architecture, defined below. To switch to a different network, such as VGGNet, GoogLeNet, Inception, ResNet, or one of your own design, modify this function.

# define lenet model
def l_model(width, height, depth, NB_CLASS):
    model = Sequential()
    # default to "channels last" ordering (the TensorFlow default)
    inputShape = (height, width, depth)
    # if the backend uses "channels first", update the input shape
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
    # first set of CONV => RELU => POOL layers
    model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # second set of CONV => RELU => POOL layers
    model.add(Conv2D(50, (5, 5), padding="same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    # first (and only) set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation("relu"))
 
    # softmax classifier
    model.add(Dense(NB_CLASS))
    model.add(Activation("softmax"))
 
    # return the constructed network architecture
    return model
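To see how the layers above transform a 28×28×1 input, the shapes and parameter counts can be traced by hand. This is a dependency-free sketch under the assumptions visible in the function: stride-1 convolutions with "same" padding, and 2×2 max pooling with stride 2 and Keras's default "valid" padding. The helper names are made up for illustration.

```python
def conv_same(shape, filters, kernel):
    """Conv2D with padding='same' and stride 1: spatial size is unchanged."""
    h, w, c_in = shape
    params = kernel * kernel * c_in * filters + filters  # weights + biases
    return (h, w, filters), params

def max_pool(shape, size=2, stride=2):
    """Max pooling with 'valid' padding."""
    h, w, c = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)

def dense(n_in, units):
    """Fully connected layer: returns (output size, parameter count)."""
    return units, n_in * units + units

shape = (28, 28, 1)
shape, p_conv1 = conv_same(shape, filters=20, kernel=5)
shape = max_pool(shape)
shape, p_conv2 = conv_same(shape, filters=50, kernel=5)
shape = max_pool(shape)
final_conv_shape = shape

flat = shape[0] * shape[1] * shape[2]   # size of the Flatten() output
n, p_fc1 = dense(flat, 500)
n, p_fc2 = dense(n, 10)
total_params = p_conv1 + p_conv2 + p_fc1 + p_fc2

print(final_conv_shape, flat, total_params)
```

Almost all of the parameters sit in the first fully connected layer, which is typical for this kind of small convolutional network.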

Two more classic models are included for reference:

VGG16 (written against the TensorFlow 1.x graph API):

import inspect
import os
 
import numpy as np
import tensorflow as tf
import time
 
VGG_MEAN = [103.939, 116.779, 123.68]
 
 
class Vgg16:
    def __init__(self, vgg16_npy_path=None):
        if vgg16_npy_path is None:
            path = inspect.getfile(Vgg16)
            path = os.path.abspath(os.path.join(path, os.pardir))
            path = os.path.join(path, "vgg16.npy")
            vgg16_npy_path = path
            print(path)
 
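        # the weights file stores a pickled dict, which recent NumPy versions only load with allow_pickle=True
        self.data_dict = np.load(vgg16_npy_path, encoding='latin1', allow_pickle=True).item()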
        print("npy file loaded")
 
    def build(self, rgb):
        """
        load variable from npy to build the VGG
        :param rgb: rgb image [Batch, height, width, 3] values scaled [0, 1]
        """
 
        start_time = time.time()
        print("build model started")
        rgb_scaled = rgb * 255.0
 
        # Convert RGB to BGR
        red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
        assert red.get_shape().as_list()[1:] == [224, 224, 1]
        assert green.get_shape().as_list()[1:] == [224, 224, 1]
        assert blue.get_shape().as_list()[1:] == [224, 224, 1]
        bgr = tf.concat(axis=3, values=[
            blue - VGG_MEAN[0],
            green - VGG_MEAN[1],
            red - VGG_MEAN[2],
        ])
        assert bgr.get_shape().as_list()[1:] == [224, 224, 3]
 
        self.conv1_1 = self.conv_layer(bgr, "conv1_1")
        self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
        self.pool1 = self.max_pool(self.conv1_2, 'pool1')
 
        self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
        self.pool2 = self.max_pool(self.conv2_2, 'pool2')
 
        self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
        self.pool3 = self.max_pool(self.conv3_3, 'pool3')
 
        self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
        self.pool4 = self.max_pool(self.conv4_3, 'pool4')
 
        self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
        self.pool5 = self.max_pool(self.conv5_3, 'pool5')
 
        self.fc6 = self.fc_layer(self.pool5, "fc6")
        assert self.fc6.get_shape().as_list()[1:] == [4096]
        self.relu6 = tf.nn.relu(self.fc6)
 
        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)
 
        self.fc8 = self.fc_layer(self.relu7, "fc8")
 
        self.prob = tf.nn.softmax(self.fc8, name="prob")
 
        self.data_dict = None
        print(("build model finished: %ds" % (time.time() - start_time)))
 
    def avg_pool(self, bottom, name):
        return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
 
    def max_pool(self, bottom, name):
        return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)
 
    def conv_layer(self, bottom, name):
        with tf.variable_scope(name):
            filt = self.get_conv_filter(name)
 
            conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
 
            conv_biases = self.get_bias(name)
            bias = tf.nn.bias_add(conv, conv_biases)
 
            relu = tf.nn.relu(bias)
            return relu
 
    def fc_layer(self, bottom, name):
        with tf.variable_scope(name):
            shape = bottom.get_shape().as_list()
            dim = 1
            for d in shape[1:]:
                dim *= d
            x = tf.reshape(bottom, [-1, dim])
 
            weights = self.get_fc_weight(name)
            biases = self.get_bias(name)
 
            # Fully connected layer. tf.nn.bias_add adds the bias vector
            # to every row of the matmul result.
            fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
 
            return fc
 
    def get_conv_filter(self, name):
        return tf.constant(self.data_dict[name][0], name="filter")
 
    def get_bias(self, name):
        return tf.constant(self.data_dict[name][1], name="biases")
 
    def get_fc_weight(self, name):
        return tf.constant(self.data_dict[name][0], name="weights")
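The preprocessing at the top of `build` is easy to check on a single pixel. The following is a plain-Python sketch of the same arithmetic (scale to [0, 255], reorder RGB to BGR, subtract the per-channel mean), plus a trace of the spatial size through the five stride-2 'SAME' pooling layers. The helper names are hypothetical.

```python
import math

VGG_MEAN = [103.939, 116.779, 123.68]  # per-channel means in BGR order

def rgb01_to_bgr_centered(pixel):
    """Mirror build(): scale an RGB pixel from [0, 1] to [0, 255], swap to BGR, subtract the mean."""
    r, g, b = (channel * 255.0 for channel in pixel)
    return [b - VGG_MEAN[0], g - VGG_MEAN[1], r - VGG_MEAN[2]]

def pool_same_stride2(size):
    """Output size of a stride-2 pooling layer with 'SAME' padding."""
    return math.ceil(size / 2)

white = rgb01_to_bgr_centered((1.0, 1.0, 1.0))
black = rgb01_to_bgr_centered((0.0, 0.0, 0.0))

sizes = [224]
for _ in range(5):  # pool1 .. pool5
    sizes.append(pool_same_stride2(sizes[-1]))

print(white, black, sizes)
```

A 224×224 input therefore reaches the first fully connected layer as a 7×7 feature map.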

GoogLeNet:

from keras.models import Model
from keras.utils import plot_model
from keras import regularizers
from keras import backend as K
from keras.layers import Input,Flatten, Dense,Dropout,BatchNormalization, concatenate
from keras.layers.convolutional import Conv2D,MaxPooling2D,AveragePooling2D
 
# Global Constants
NB_CLASS=20
LEARNING_RATE=0.01
MOMENTUM=0.9
ALPHA=0.0001
BETA=0.75
GAMMA=0.1
DROPOUT=0.4
WEIGHT_DECAY=0.0005
LRN2D_NORM=True
DATA_FORMAT='channels_last' # Theano:'channels_first' Tensorflow:'channels_last'
USE_BN=True
IM_WIDTH=224
IM_HEIGHT=224
EPOCH=50
 
def conv2D_lrn2d(x,filters,kernel_size,strides=(1,1),padding='same',dilation_rate=(1,1),activation='relu',
                 use_bias=True,kernel_initializer='glorot_uniform',bias_initializer='zeros',
                 kernel_regularizer=None,bias_regularizer=None,activity_regularizer=None,
                 kernel_constraint=None,bias_constraint=None,lrn2d_norm=LRN2D_NORM,weight_decay=WEIGHT_DECAY):
    # L2 regularization (weight decay) on kernels and biases
    if weight_decay:
        kernel_regularizer=regularizers.l2(weight_decay)
        bias_regularizer=regularizers.l2(weight_decay)
    else:
        kernel_regularizer=None
        bias_regularizer=None
    x=Conv2D(filters=filters,kernel_size=kernel_size,strides=strides,padding=padding,dilation_rate=dilation_rate,
             activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
             bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
             activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    if lrn2d_norm:
        #batch normalization
        x=BatchNormalization()(x)
 
    return x
 
def inception_module(x,params,concat_axis,padding='same',dilation_rate=(1,1),activation='relu',
                     use_bias=True,kernel_initializer='glorot_uniform',bias_initializer='zeros',
                     kernel_regularizer=None,bias_regularizer=None,activity_regularizer=None,kernel_constraint=None,
                     bias_constraint=None,lrn2d_norm=LRN2D_NORM,weight_decay=None):
    (branch1,branch2,branch3,branch4)=params
    if weight_decay:
        kernel_regularizer=regularizers.l2(weight_decay)
        bias_regularizer=regularizers.l2(weight_decay)
    else:
        kernel_regularizer=None
        bias_regularizer=None
    #1x1
    pathway1=Conv2D(filters=branch1[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    #1x1->3x3
    pathway2=Conv2D(filters=branch2[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    pathway2=Conv2D(filters=branch2[1],kernel_size=(3,3),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway2)
    #1x1->5x5
    pathway3=Conv2D(filters=branch3[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(x)
    pathway3=Conv2D(filters=branch3[1],kernel_size=(5,5),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway3)
    #3x3 max pool->1x1
    pathway4=MaxPooling2D(pool_size=(3,3),strides=1,padding=padding,data_format=DATA_FORMAT)(x)
    pathway4=Conv2D(filters=branch4[0],kernel_size=(1,1),strides=1,padding=padding,dilation_rate=dilation_rate,
                    activation=activation,use_bias=use_bias,kernel_initializer=kernel_initializer,
                    bias_initializer=bias_initializer,kernel_regularizer=kernel_regularizer,bias_regularizer=bias_regularizer,
                    activity_regularizer=activity_regularizer,kernel_constraint=kernel_constraint,bias_constraint=bias_constraint)(pathway4)
 
    return concatenate([pathway1,pathway2,pathway3,pathway4],axis=concat_axis)
 
class GoogleNet:
    @staticmethod
    def build(width, height, depth, NB_CLASS):
        INP_SHAPE = (height, width, depth)
        img_input = Input(shape=INP_SHAPE)
        CONCAT_AXIS = 3
        # Data format: tensorflow, channels_last; theano, channels_first
        if K.image_data_format() == 'channels_first':
            INP_SHAPE = (depth, height, width)
            img_input = Input(shape=INP_SHAPE)
            CONCAT_AXIS = 1
        x = conv2D_lrn2d(img_input, 64, (7, 7), 2, padding='same', lrn2d_norm=False)
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
        x = BatchNormalization()(x)
 
        x = conv2D_lrn2d(x, 64, (1, 1), 1, padding='same', lrn2d_norm=False)
 
        x = conv2D_lrn2d(x, 192, (3, 3), 1, padding='same', lrn2d_norm=True)
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
 
        x = inception_module(x, params=[(64,), (96, 128), (16, 32), (32,)], concat_axis=CONCAT_AXIS)  # 3a
        x = inception_module(x, params=[(128,), (128, 192), (32, 96), (64,)], concat_axis=CONCAT_AXIS)  # 3b
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
 
        x = inception_module(x, params=[(192,), (96, 208), (16, 48), (64,)], concat_axis=CONCAT_AXIS)  # 4a
        x = inception_module(x, params=[(160,), (112, 224), (24, 64), (64,)], concat_axis=CONCAT_AXIS)  # 4b
        x = inception_module(x, params=[(128,), (128, 256), (24, 64), (64,)], concat_axis=CONCAT_AXIS)  # 4c
        x = inception_module(x, params=[(112,), (144, 288), (32, 64), (64,)], concat_axis=CONCAT_AXIS)  # 4d
        x = inception_module(x, params=[(256,), (160, 320), (32, 128), (128,)], concat_axis=CONCAT_AXIS)  # 4e
        x = MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(x)
 
        x = inception_module(x, params=[(256,), (160, 320), (32, 128), (128,)], concat_axis=CONCAT_AXIS)  # 5a
        x = inception_module(x, params=[(384,), (192, 384), (48, 128), (128,)], concat_axis=CONCAT_AXIS)  # 5b
        x = AveragePooling2D(pool_size=(1, 1), strides=1, padding='valid')(x)
 
        x = Flatten()(x)
        x = Dropout(DROPOUT)(x)
        x = Dense(units=NB_CLASS, activation='linear')(x)
        x = Dense(units=NB_CLASS, activation='softmax')(x)
 
        # Create a Keras Model (Keras 2 keyword names: units, inputs, outputs)
        model = Model(inputs=img_input, outputs=[x])
        model.summary()
        # Save a PNG of the Model Build
        #plot_model(model, to_file='../imgs/GoogLeNet.png')
        # return the constructed network architecture
        return model
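Each inception module concatenates its four pathways along the channel axis, so its output depth is the sum of the last filter count of every branch. The sketch below recomputes those depths from the `params` lists used in `build`, and traces the spatial size assuming a 224×224 input and 'same' padding on the five stride-2 layers. The dictionary and helper names are made up for illustration.

```python
import math

# params copied from the inception_module calls in GoogleNet.build
MODULES = {
    "3a": [(64,), (96, 128), (16, 32), (32,)],
    "3b": [(128,), (128, 192), (32, 96), (64,)],
    "4a": [(192,), (96, 208), (16, 48), (64,)],
    "4b": [(160,), (112, 224), (24, 64), (64,)],
    "4c": [(128,), (128, 256), (24, 64), (64,)],
    "4d": [(112,), (144, 288), (32, 64), (64,)],
    "4e": [(256,), (160, 320), (32, 128), (128,)],
    "5a": [(256,), (160, 320), (32, 128), (128,)],
    "5b": [(384,), (192, 384), (48, 128), (128,)],
}

def inception_out_channels(params):
    """Output depth = sum of the final filter count of each of the four branches."""
    return sum(branch[-1] for branch in params)

channels = {name: inception_out_channels(p) for name, p in MODULES.items()}

side = 224
for _ in range(5):  # stem conv (stride 2) + four max-pooling layers
    side = math.ceil(side / 2)

# AveragePooling2D with pool_size (1, 1) leaves the feature map unchanged
flat_features = side * side * channels["5b"]

print(channels, side, flat_features)
```

Under these assumptions `Flatten()` receives a 7×7×1024 feature map, since the final 1×1 average pooling does not reduce the spatial size.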

Choose the optimizer and loss function, then compile the model:

model = l_model(width=norm_size, height=norm_size, depth=1, NB_CLASS=CLASS_NUM)
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
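The `decay` argument makes the learning rate shrink as training proceeds. As a rough sketch, assuming the time-based rule used by the Keras 2.x optimizers, lr / (1 + decay * iterations) with `iterations` counted in batch updates, and assuming 60000 // 32 updates per epoch, the effective learning rate at the start of each epoch can be estimated as follows. The function name `decayed_lr` is hypothetical.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: the rate shrinks with the number of batch updates so far."""
    return initial_lr / (1.0 + decay * iterations)

INIT_LR = 1e-3
EPOCHS = 10
BS = 32
steps_per_epoch = 60000 // BS   # batch updates per epoch on the MNIST training set
decay = INIT_LR / EPOCHS

# learning rate at the start of each epoch, plus the value after the last one
schedule = [decayed_lr(INIT_LR, decay, epoch * steps_per_epoch)
            for epoch in range(EPOCHS + 1)]

for epoch, lr in enumerate(schedule):
    print("after %2d epochs: lr = %.6f" % (epoch, lr))
```

With these settings the rate falls to roughly a third of its initial value by the end of training.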

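This article feeds the data through a generator to save memory: augmented batches are produced on the fly instead of being stored.

# Use generators to save memory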
# horizontal_flip is disabled here: a mirrored digit is generally not a valid digit
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=False, fill_mode="nearest")
 
H = model.fit_generator(aug.flow(x_train, y_train, batch_size=BS),
                        steps_per_epoch=len(x_train) // BS,
                        epochs=EPOCHS, verbose=2)
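Once training finishes, the model can be written to disk and read back with the `load_model` function imported earlier. The following is a minimal sketch, assuming a Keras 2.x model and an HDF5 target file; the helper names and the example path are hypothetical, and the directory is created first because saving fails if it does not exist.

```python
import os

def ensure_parent_dir(path):
    """Create the directory that will contain `path` if it does not exist yet."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)
    return parent

def save_and_reload(model, path):
    """Save a trained Keras model to `path`, then load it back from disk."""
    # imported here so that ensure_parent_dir stays usable without Keras installed
    from keras.models import load_model
    ensure_parent_dir(path)
    model.save(path)          # a single file holding architecture, weights and optimizer state
    return load_model(path)
```

A call such as `restored = save_and_reload(model, '../model/lenet_mnist.h5')` would then return a model that can be used for prediction, for example with `restored.predict(x_test[:1])`.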

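Results

The code below plots the loss and accuracy for each training epoch. With the number of epochs set to 10, the model already reaches a training accuracy of 0.98.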

# plot the iteration process
N = EPOCHS
# the accuracy key is 'acc' in older Keras releases and 'accuracy' in newer ones
acc = H.history.get('acc', H.history.get('accuracy'))
plt.figure()
plt.plot(np.arange(0, N), H.history['loss'], label='loss')
plt.plot(np.arange(0, N), acc, label='train_acc')
plt.title('Training Loss and Accuracy on mnist-img classifier')
plt.xlabel('Epoch')
plt.ylabel('Loss/Accuracy')
plt.legend(loc='lower left')
# the ../figure directory must already exist
plt.savefig('../figure/Figure_2.png')

WeChat official account: 帕帕科技喵
