2016-11-08, Chen Weicai, AI Academy (人工智能学堂)
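In the TensorFlow getting-started tutorial we trained a softmax model on the MNIST image set; in just a few lines of code it reached an accuracy of about 92%. The previous article introduced a more sophisticated algorithm, the convolutional neural network (CNN). This article uses a CNN to learn MNIST, which improves considerably on the softmax accuracy.
A quick recap of MNIST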
MNIST is a collection of 28*28 grayscale images, each showing a handwritten digit from 0 to 9. The softmax implementation learned only two parameters, one Weight matrix and one Bias vector, which is the direct reason its accuracy could not be pushed any higher.
Weight = tf.Variable(tf.zeros([784, 10]))
Bias = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, Weight) + Bias)
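To make the last line of the snippet concrete, here is a minimal pure-Python sketch of what a softmax does to one row of scores. The helper name softmax_list and the example scores are made up for illustration; this is not TensorFlow's implementation.

```python
import math

def softmax_list(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the ten digit classes (not taken from a real model).
example_logits = [0.5, 2.0, -1.0, 0.0, 0.1, 3.0, -0.5, 0.2, 1.0, 0.3]
example_probs = softmax_list(example_logits)
# The predicted digit is the index with the highest probability.
predicted_digit = example_probs.index(max(example_probs))
print(predicted_digit, round(sum(example_probs), 6))
```

The largest score always receives the largest probability, so taking the argmax of the probabilities gives the same digit as taking the argmax of the raw scores.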
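Implementing a convolutional neural network for MNIST
The two most important operations in a CNN are convolution and pooling. Pooling was not covered in the earlier CNN article, so this article explains it at the point where it is used.
Initializing the weight parameters
A CNN involves a large number of weight and bias parameters, so we factor their initialization into two helper functions, shown in the snippet below: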
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
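Weights are initialized with tf.truncated_normal, whose prototype is tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None). It returns a tensor whose shape is given by the shape argument and whose values follow a truncated normal distribution; the default data type is tf.float32. For example, weight_variable([5, 5, 1, 32]) returns a tensor variable of shape 5*5*1*32.
Initializing the bias is much simpler than initializing the weights. bias_variable also returns a tensor of the requested shape, filled with the constant 0.1. For example, bias_variable([32]) returns a one-dimensional variable of 32 elements, each equal to 0.1.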
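The sketch below illustrates what "truncated" means here: samples that land more than two standard deviations from the mean are discarded and drawn again. It uses simple rejection sampling from Python's random module; the function name truncated_normal_list and the seed are assumptions for illustration, and TensorFlow's internal sampling method may differ.

```python
import random

def truncated_normal_list(count, mean=0.0, stddev=1.0, seed=None):
    """Draw `count` samples, re-drawing any that land more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            samples.append(value)
    return samples

# 5*5*1*32 = 800 values, the element count of W_conv1, kept flat for simplicity.
weights = truncated_normal_list(5 * 5 * 1 * 32, stddev=0.1, seed=0)
# The initial contents of bias_variable([32]): 32 copies of 0.1.
biases = [0.1] * 32
print(len(weights), max(abs(w) for w in weights) <= 0.2)
```

With stddev=0.1 every initial weight lies in [-0.2, 0.2], so no unit starts with an unusually large weight.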
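Convolution and pooling
Convolution and pooling are the two most important stages of a CNN. Convolution comes first, then pooling, and the pair is repeated over several layers: convolution -> pooling -> convolution -> pooling -> convolution -> pooling -> ... and so on. We therefore also factor the convolution and pooling calls into helper functions, as shown below: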
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
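The conv2d helper returns the result of a 2-D convolution of x with the kernel W. It is a thin wrapper around tf.nn.conv2d, whose prototype is tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None). The input and filter arguments are both 4-D tensors. strides is a 1-D integer tensor with four elements, one per dimension of input, giving the stride along that dimension; here we use a stride of 1 in all four dimensions. padding selects the padding algorithm, either "VALID" or "SAME".
The VALID and SAME padding algorithms are documented at https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#convolution and work as follows, where W is the input height or width, F is the filter size, and S is the stride: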
1. padding = VALID
new_height = new_width = (W - F + 1) / S (rounded up)
2. padding = SAME
new_height = new_width = W / S (rounded up)
The number of pixels to pad along the height is
pad_needed_height = (new_height - 1) × S + F - W
From this, the number of rows added above the input is pad_top = pad_needed_height / 2 (rounded down), and the number added below is pad_down = pad_needed_height - pad_top. Likewise, along the width, pad_needed_width = (new_width - 1) × S + F - W, pad_left = pad_needed_width / 2 (rounded down), and pad_right = pad_needed_width - pad_left.
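The formulas above translate almost line for line into a small helper. This is a sketch for one spatial dimension; the function names are hypothetical, and the clamp to zero is an addition that guards the case where a large stride makes the raw padding formula negative.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)

def conv_output_and_padding(size, filter_size, stride, padding):
    """Return (output_size, pad_before, pad_after) along one spatial dimension."""
    if padding == "VALID":
        return ceil_div(size - filter_size + 1, stride), 0, 0
    if padding == "SAME":
        out = ceil_div(size, stride)
        # Clamp at zero (not part of the formulas quoted in the text).
        pad_needed = max((out - 1) * stride + filter_size - size, 0)
        pad_before = pad_needed // 2
        return out, pad_before, pad_needed - pad_before
    raise ValueError("padding must be 'VALID' or 'SAME'")

print(conv_output_and_padding(28, 5, 1, "SAME"))   # 5x5 filter, stride 1
print(conv_output_and_padding(28, 5, 1, "VALID"))
print(conv_output_and_padding(28, 2, 2, "SAME"))   # 2x2 pooling window, stride 2
```

For a 28-pixel side, SAME padding with a 5-wide filter and stride 1 keeps the size at 28 by adding 2 pixels on each side, while VALID padding shrinks it to 24.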
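The max_pool_2x2 helper wraps the pooling step and uses max pooling. It calls tf.nn.max_pool, whose prototype is tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None). With ksize=[1, 2, 2, 1] and strides=[1, 2, 2, 1], each non-overlapping 2*2 window of a feature map is replaced by its largest value, which halves the height and the width.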
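As a rough illustration of 2*2 max pooling with stride 2, the following sketch pools a single-channel feature map stored as a list of lists. The helper name max_pool_2x2_lists and the sample values are invented for this example, and it assumes even height and width so no padding is needed.

```python
def max_pool_2x2_lists(image):
    """2x2 max pooling with stride 2 on a 2-D list whose sides are even."""
    height, width = len(image), len(image[0])
    pooled = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            # Collect the four values of the current 2x2 window.
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled_map = max_pool_2x2_lists(feature_map)
print(pooled_map)  # [[4, 5], [6, 7]]
```

The 4*4 input becomes 2*2, in the same way that a 28*28 feature map becomes 14*14.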
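The first convolutional layer
Now we can build the first convolution and pooling layer. The input is a 28*28*1 image (28 wide, 28 high, and a depth of 1 because the image is grayscale). The first layer uses 5*5*1 filters and outputs 32 channels, so the parameters involved are: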
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
tf.nn.conv2d expects a 4-D input tensor, so we reshape the input image x into 4 dimensions; the -1 lets TensorFlow infer the batch size. As shown below:
x_image = tf.reshape(x, [-1, 28, 28, 1])
Next we apply convolution and pooling to x_image. The convolution output is passed through the ReLU (rectified linear unit) activation function, as shown below:
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
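Let us work out the output size after the first layer. Because we use SAME padding, the formulas above give 2 pixels of padding on each of the top, bottom, left, and right, so the 28*28*1 input is padded to 32*32*1.
With a filter size of 5 and a stride of 1, 32 - 5 + 1 = 28, so the output of the first convolution is 28*28*32. After max pooling it becomes 14*14*32.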
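The pad-then-slide arithmetic above can be checked with a naive single-channel convolution. This sketch assumes a square image, a square odd-sized kernel, and stride 1; the all-ones image and kernel are placeholders, and the kernel is applied without flipping. It is far slower than tf.nn.conv2d and is only meant to show where the 32 and the 28 come from.

```python
def conv2d_same_lists(image, kernel):
    """Stride-1 'SAME' convolution of a square 2-D list with an odd-sized square kernel."""
    size, k = len(image), len(kernel)
    pad = (k - 1) // 2  # 2 on each side for a 5x5 kernel
    padded_size = size + 2 * pad
    # Zero-pad the image on all four sides.
    padded = [[0.0] * padded_size for _ in range(padded_size)]
    for r in range(size):
        for c in range(size):
            padded[r + pad][c + pad] = image[r][c]
    # Slide the kernel over every position where it fits entirely.
    out_size = padded_size - k + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += padded[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return padded_size, output

image = [[1.0] * 28 for _ in range(28)]   # placeholder all-ones image
kernel = [[1.0] * 5 for _ in range(5)]    # placeholder all-ones 5x5 filter
padded_size, conv_out = conv2d_same_lists(image, kernel)
print(padded_size, len(conv_out), len(conv_out[0]))
```

In the interior all 25 filter taps overlap the image, while at a corner only a 3*3 patch of real pixels is covered and the rest is zero padding.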
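The second convolutional layer
The first layer outputs 32 channels; in the second layer we build a deeper stack of channels. The input to the second layer is the 14*14*32 output of the first pooling step, so each filter is 5*5*32, and the layer outputs 64 channels, as shown below: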
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
As before, we apply the second convolution and pooling:
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
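Let us again work out the output shape after the second convolution and pooling. With SAME padding the input again needs 2 pixels of padding on each side, so the 14*14*32 input is padded to 18*18*32. With a filter size of 5 and a stride of 1, 18 - 5 + 1 = 14, so the shape after the second convolution is 14*14*64. After max pooling the output shape is 7*7*64.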
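The shape bookkeeping for both layers can be collected into one short trace. The sketch below assumes SAME padding throughout, stride 1 for every convolution, and stride 2 for every pooling step, as in the helper functions earlier; the function names are hypothetical.

```python
def same_output_size(size, stride):
    """Output size along one dimension under SAME padding: size / stride, rounded up."""
    return -(-size // stride)

def trace_shapes(height, width, channels, layers):
    """Follow (height, width, channels) through conv (stride 1) + pool (stride 2) layers."""
    shapes = [(height, width, channels)]
    for out_channels in layers:
        # Convolution: SAME padding with stride 1 keeps height and width.
        height, width = same_output_size(height, 1), same_output_size(width, 1)
        shapes.append((height, width, out_channels))
        # 2x2 max pooling with stride 2 halves height and width.
        height, width = same_output_size(height, 2), same_output_size(width, 2)
        shapes.append((height, width, out_channels))
    return shapes

shapes = trace_shapes(28, 28, 1, layers=[32, 64])
for shape in shapes:
    print(shape)
```

The last entry, (7, 7, 64), is the 7 * 7 * 64 = 3136 values per image that the next layer flattens.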
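The fully connected layer
The image is now down to 7x7. We add a fully connected layer of 1024 neurons that processes the whole image. We reshape the tensor from the pooling layer into a batch of vectors, multiply by the weight matrix, add the bias, and apply ReLU, as shown below: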
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
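The output of the fully connected layer is a vector of 1024 elements for each image.
Dropout
To reduce overfitting, we add dropout before the output layer. A placeholder holds the probability that a neuron's output is kept during dropout, which lets us enable dropout during training and disable it during testing. Besides masking neuron outputs, TensorFlow's tf.nn.dropout op automatically rescales the outputs that are kept (by 1 / keep_prob), so dropout can be used without any extra scaling. As shown below: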
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
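To see why the rescaling matters, here is a small pure-Python sketch of dropout with the 1 / keep_prob scaling. The helper name dropout_list, the seed, and the all-ones activations are placeholders chosen for illustration; this is not TensorFlow's implementation.

```python
import random

def dropout_list(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale survivors by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000  # placeholder activations, all equal to 1
dropped = dropout_list(activations, keep_prob=0.5, rng=rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / float(len(dropped))
mean_after = sum(dropped) / len(dropped)
print(kept_fraction, mean_after)

# keep_prob = 1.0 keeps every value unchanged, which is how dropout is turned off for evaluation.
unchanged = dropout_list(activations, keep_prob=1.0, rng=rng)
```

About half of the values are zeroed and the survivors are doubled, so the average activation stays close to 1. That is why the same weights can be used at test time with keep_prob set to 1.0.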
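The output layer
Finally we produce the output. After the fully connected layer there are 1024 features, and what we want to learn is the probability of each of the ten digits 0 to 9, so the final layer has 10 outputs. As shown below: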
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
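With every layer defined, it is instructive to count the trainable parameters and compare them with the single-layer softmax model. The sketch below multiplies out the shapes passed to weight_variable and bias_variable in the snippets above; the helper name count_elements is made up for this example.

```python
def count_elements(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

# Shapes copied from the weight_variable / bias_variable calls in the article.
cnn_shapes = {
    "W_conv1": [5, 5, 1, 32], "b_conv1": [32],
    "W_conv2": [5, 5, 32, 64], "b_conv2": [64],
    "W_fc1": [7 * 7 * 64, 1024], "b_fc1": [1024],
    "W_fc2": [1024, 10], "b_fc2": [10],
}
cnn_params = sum(count_elements(shape) for shape in cnn_shapes.values())
# The softmax model: a 784x10 weight matrix plus 10 biases.
softmax_params = count_elements([784, 10]) + count_elements([10])
print(cnn_params, softmax_params)
```

Most of the parameters sit in W_fc1, the matrix that connects the 3136 pooled values to the 1024 fully connected neurons.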
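Training and evaluation
Training and evaluation use almost the same code as the earlier single-layer softmax model. The differences are that we use the more sophisticated Adam optimizer for the gradient-based minimization, and that we add an extra keep_prob entry to feed_dict to control the dropout rate. A log line is printed every 100 iterations.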
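For readers curious about what Adam does differently from plain gradient descent, here is a minimal one-variable sketch of the textbook Adam update. The toy objective f(x) = x^2, the starting point, and the learning rate of 0.1 are assumptions chosen so the example converges quickly; tf.train.AdamOptimizer may differ in implementation details.

```python
import math

def adam_minimize(grad_fn, x, steps, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function of one variable with Adam, given its gradient."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = x^2 with gradient 2x (hypothetical, not the MNIST loss).
x_after_one = adam_minimize(lambda x: 2.0 * x, x=5.0, steps=1, lr=0.1)
x_final = adam_minimize(lambda x: 2.0 * x, x=5.0, steps=2000, lr=0.1)
print(x_after_one, abs(x_final) < 1.0)
```

The first step moves x by almost exactly the learning rate regardless of how large the gradient is, because the update divides the averaged gradient by the square root of the averaged squared gradient.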
Results
# ~/tensorflow/bin/python2.7 tf_cnn_mnist.py
Extracting /tmp/MNIST_data/train-images-idx3-ubyte.gz
Extracting /tmp/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /tmp/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/MNIST_data/t10k-labels-idx1-ubyte.gz
step 0, training accuracy 0.16
step 100, training accuracy 0.84
step 200, training accuracy 0.9
step 300, training accuracy 0.88
step 400, training accuracy 0.96
step 500, training accuracy 0.92
step 600, training accuracy 0.98
step 700, training accuracy 0.98
step 800, training accuracy 0.88
step 900, training accuracy 1
step 1000, training accuracy 0.96
step 1100, training accuracy 0.96
step 1200, training accuracy 0.96
step 1300, training accuracy 0.98
step 1400, training accuracy 0.94
step 1500, training accuracy 0.98
step 1600, training accuracy 0.98
step 1700, training accuracy 0.94
step 1800, training accuracy 0.94
step 1900, training accuracy 1
step 2000, training accuracy 0.9
step 2100, training accuracy 0.92
step 2200, training accuracy 0.92
step 2300, training accuracy 1
step 2400, training accuracy 0.98
step 2500, training accuracy 0.98
step 2600, training accuracy 1
step 2700, training accuracy 1
step 2800, training accuracy 1
step 2900, training accuracy 0.96
step 3000, training accuracy 0.98
step 3100, training accuracy 0.98
step 3200, training accuracy 0.98
step 3300, training accuracy 1
step 3400, training accuracy 0.98
step 3500, training accuracy 1
step 3600, training accuracy 1
step 3700, training accuracy 0.96
step 3800, training accuracy 1
step 3900, training accuracy 0.98
step 4000, training accuracy 1
step 4100, training accuracy 0.96
step 4200, training accuracy 0.98
step 4300, training accuracy 1
step 4400, training accuracy 0.98
step 4500, training accuracy 0.98
step 4600, training accuracy 1
step 4700, training accuracy 0.98
step 4800, training accuracy 1
step 4900, training accuracy 0.96
step 5000, training accuracy 0.96
step 5100, training accuracy 0.98
step 5200, training accuracy 1
step 5300, training accuracy 1
step 5400, training accuracy 1
step 5500, training accuracy 1
step 5600, training accuracy 0.98
step 5700, training accuracy 1
step 5800, training accuracy 0.96
step 5900, training accuracy 1
step 6000, training accuracy 0.98
step 6100, training accuracy 0.98
step 6200, training accuracy 0.98
step 6300, training accuracy 0.98
step 6400, training accuracy 0.98
step 6500, training accuracy 1
step 6600, training accuracy 1
step 6700, training accuracy 1
step 6800, training accuracy 1
step 6900, training accuracy 1
step 7000, training accuracy 1
step 7100, training accuracy 0.98
step 7200, training accuracy 1
step 7300, training accuracy 0.98
step 7400, training accuracy 1
step 7500, training accuracy 0.98
step 7600, training accuracy 1
step 7700, training accuracy 1
step 7800, training accuracy 0.98
step 7900, training accuracy 1
step 8000, training accuracy 1
step 8100, training accuracy 0.98
step 8200, training accuracy 1
step 8300, training accuracy 1
...
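The log shows that with a CNN the training accuracy climbs quickly: from step 2300 onward every logged value is at least 0.96, and most are 0.98 or 1, a large improvement over the roughly 92% of the softmax model. Note that these figures are measured on the current training batch of 50 images, so each misclassified image moves the reported value by 0.02; the test accuracy is printed only after all 20000 steps and is not part of the excerpt above.
Complete code
The complete code is on GitHub at https://github.com/chenweicai/tensorflow-study/blob/master/tf_cnn_mnist.py and is reproduced below: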
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
mnist = input_data.read_data_sets("/tmp/MNIST_data", one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
# W, b and y are left over from the single-layer softmax model; the CNN below does not use them
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])
# convolution layer 1
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# convolution layer 2
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
# fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# output layer, softmax
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
# model training
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # evaluate on the current batch with dropout switched off
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
# final evaluation on the test set, again with dropout switched off
print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))