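I. Preface

I have recently been studying deep learning for object detection. Having read the YOLOv2 paper and the YAD2K implementation source, I am writing up what I learned to consolidate my understanding.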

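II. About Object Detection

Object detection can be roughly split into two tasks: classification and localising bounding boxes. Deep-learning detectors currently fall into two families: two-stage and one-stage algorithms. Two-stage methods first generate a set of candidate boxes and then classify them with a convolutional neural network; one-stage methods skip the candidate step and treat box localisation directly as a regression problem. Because of this difference the two families perform differently: two-stage methods lead in detection accuracy and localisation precision, while one-stage methods lead in speed. YOLO (You Only Look Once) is a one-stage detector; at the time of writing three versions have been released: YOLOv1, YOLOv2 and YOLOv3. This post focuses on YOLOv2.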

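III. Improvements in YOLOv2

The paper groups the improvements in YOLOv2 under three headings: Better, Faster, Stronger. They are not the main thing I want to share in this post, since plenty of bloggers have already covered them thoroughly, so this part only sketches the authors' ideas, and I have mostly left out the formulas because they are awkward to typeset. The one-line summary under each item gives the gist; for more detail, other blog posts are worth a search.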

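[Figure: YOLOv2 improvements]

1. Better

(1) Batch Normalization
Batch Normalization is used after every convolutional layer.
Batch Normalization speeds up convergence and also acts as a mild regulariser, reducing overfitting. In YOLOv2 a Batch Normalization layer is added after every convolutional layer and dropout is no longer used. With Batch Normalization, YOLOv2's mAP improves by 2.4 percentage points.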

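[Figure: Batch Normalization]
(2) High Resolution Classifier
The classification model is pre-trained on higher-resolution images.
YOLOv1, like most detectors, first pre-trains the main body of the model on the ImageNet classification dataset at 224x224 to get a good classifier, and then raises the network input from 224x224 to 448x448 when training for detection. Switching resolution this abruptly makes it hard for the detection model to adapt quickly to high-resolution input. YOLOv2 therefore adds an intermediate stage: fine-tuning the classification network on ImageNet with 448x448 input for 10 epochs, so that the model is already used to high-resolution input before it is fine-tuned on the detection dataset. With the high-resolution classifier, YOLOv2's mAP improves by about 4 percentage points.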

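[Figure: the three training stages of YOLOv2]
(3) Convolutional With Anchor Boxes
Anchor boxes are used to predict bounding boxes; the final fully connected layers are removed, so the network uses only convolutional and pooling layers.
In YOLOv1 the input image is ultimately divided into a 7x7 grid of cells, and each cell predicts 2 bounding boxes. YOLOv1 predicts the boxes directly from fully connected layers, with box width and height expressed relative to the whole image. Because objects come in many scales and aspect ratios, it is hard for YOLOv1 to learn to adapt to all these shapes during training, which is one reason it localises poorly. YOLOv2 introduces anchor boxes with the aim of raising recall: YOLOv1 predicts only 98 boxes per image (7x7x2), whereas with anchor boxes the paper speaks of more than a thousand (the 13x13 grid with 5 anchors described here gives 845). The fully connected layers are removed, which preserves some spatial structure, and the network is built only from convolutional and pooling layers. The input changes from 448x448 to 416x416; with a total downsampling factor of 32 the output is 13x13x5x25. An odd grid size is chosen because large objects tend to sit at the centre of the image, and it is better to have a single centre cell predict them than four neighbouring cells. As a result mAP drops slightly from 69.5 to 69.2 (down 0.3), while recall rises from 81% to 88% (up 7 points). Despite the lower mAP, the higher recall means the model has more room to improve.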
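The box counts and the output shape quoted above follow from simple arithmetic. Here is a small sketch of it, assuming the 20-class setting used later in this post; the function name `detection_grid` is my own and does not come from the paper or from YAD2K.

```python
def detection_grid(input_size, stride, boxes_per_cell, num_classes):
    """Return (grid side, total boxes, output channels) for an anchor-based head."""
    grid = input_size // stride
    total_boxes = grid * grid * boxes_per_cell
    # Each anchor carries x, y, w, h, confidence plus one score per class.
    channels = boxes_per_cell * (5 + num_classes)
    return grid, total_boxes, channels


# YOLOv1: 448 input, 7x7 grid (stride 64), 2 boxes per cell.
# Only the first two values apply, since YOLOv1 shares class scores per cell.
print(detection_grid(448, 64, 2, 20)[:2])   # (7, 98)
# YOLOv2: 416 input, stride 32, 5 anchors per cell.
print(detection_grid(416, 32, 5, 20))       # (13, 845, 125)
```

The 125 channels are the 5x25 of the 13x13x5x25 output, flattened along the last axis.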
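(4) Dimension Clusters (first anchor-box question: how to choose the sizes)
k-means clustering solves the problem of choosing anchor-box sizes.
In Faster R-CNN and SSD the dimensions (width and height) of the prior boxes are hand-picked, which is somewhat subjective. If the priors are well chosen, the model learns more easily and predicts better. YOLOv2 therefore runs k-means clustering on the bounding boxes of the training set; after weighing complexity against accuracy, k = 5 is chosen. Since the main purpose of the priors is to give predicted boxes a better IOU with the ground truth, the clustering uses the IOU between a box and the cluster-centre box as its distance measure: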

$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid})$$

[Figure: Dimension_Clusters.png]
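To make the clustering concrete, here is a minimal pure-Python sketch of k-means over (w, h) pairs with 1 - IOU as the distance. It is my own illustration, not the authors' code: the names `iou_wh` and `kmeans_anchors`, the random initialisation, the mean-based centroid update and the toy boxes are all assumptions.

```python
import random


def iou_wh(a, b):
    """IOU of two boxes given as (w, h), both centred at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs using d = 1 - IOU as the distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # The smallest 1 - IOU is the same as the largest IOU.
            nearest = max(range(k), key=lambda c: iou_wh(box, centroids[c]))
            clusters[nearest].append(box)
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:  # an empty cluster keeps its old centroid
                new_centroids.append(centroids[c])
                continue
            new_centroids.append((sum(w for w, _ in members) / len(members),
                                  sum(h for _, h in members) / len(members)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


toy_boxes = [(1, 1), (1.2, 0.9), (0.9, 1.1), (5, 5), (5.5, 4.5), (4.8, 5.2)]
print(kmeans_anchors(toy_boxes, k=2))
```

With a Euclidean distance, large boxes would dominate the error; the IOU-based distance treats small and large boxes alike, which is the point of the formula above.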
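(5) Direct location prediction (second anchor-box question: how to determine the position)
A sigmoid is used to predict the offsets, which solves the anchor-box position problem; a new loss function is adopted.
The authors borrow the anchor boxes of the RPN to predict bounding-box offsets, with four values (x, y, w, h) fixing the position of each box. In early iterations, however, x and y are very unstable: an RPN predicts once per region, whereas YOLO predicts for all 169 grid cells at once, so the x, y predicted from grid cell A can wander into grid cell B and end up anywhere. The authors neatly apply a sigmoid to constrain x, y to (0, 1), which keeps each predicted centre inside its own cell. For w, h, YOLOv1's squared difference of square roots is replaced by the log-space encoding used in the RPN.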
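As I understand the paper, the decoding is b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h), where (c_x, c_y) is the cell's top-left corner and (p_w, p_h) the anchor size, all in grid-cell units. A minimal sketch follows; the function name `decode_box` and the sample numbers are mine.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t, cell, prior):
    """Turn raw outputs (tx, ty, tw, th) into a box in grid-cell units."""
    tx, ty, tw, th = t
    cx, cy = cell      # top-left corner of the responsible cell
    pw, ph = prior     # anchor (prior) width and height
    bx = sigmoid(tx) + cx   # the centre cannot leave the cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # log-space scaling of the anchor size
    bh = ph * math.exp(th)
    return bx, by, bw, bh


print(decode_box((0, 0, 0, 0), (6, 6), (3.42, 4.41)))   # (6.5, 6.5, 3.42, 4.41)
```

However large or small t_x and t_y become, the centre stays within cell (6, 6), which is what removes the early-training instability.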
(6) Fine-Grained Features
A passthrough layer is used to capture finer-grained features.
YOLOv2 introduces a passthrough layer to make use of a finer feature map; with Fine-Grained Features, YOLOv2's performance improves by 1%.
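The paper describes the passthrough layer as stacking adjacent spatial features into channels, turning a 26x26x512 map into 13x13x2048 so it can be concatenated with the 13x13 map. Below is a rough pure-Python sketch of that space-to-depth step on nested lists; the function name is mine, and the channel ordering is an assumption that may differ from Darknet or TensorFlow.

```python
def space_to_depth(feature_map, block=2):
    """Stack each block x block spatial neighbourhood into the channel axis.

    feature_map is a nested list indexed as [row][col][channel].
    """
    height, width = len(feature_map), len(feature_map[0])
    assert height % block == 0 and width % block == 0
    out = []
    for r in range(0, height, block):
        out_row = []
        for c in range(0, width, block):
            stacked = []
            for dr in range(block):
                for dc in range(block):
                    stacked.extend(feature_map[r + dr][c + dc])
            out_row.append(stacked)
        out.append(out_row)
    return out


# A 4x4 map with one channel becomes a 2x2 map with four channels.
fm = [[[r * 4 + c] for c in range(4)] for r in range(4)]
print(space_to_depth(fm)[0][0])   # [0, 1, 4, 5]
```

No values are discarded: the spatial resolution halves and the channel count quadruples.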
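(7) Multi-Scale Training
Training on images of different sizes improves robustness.
Because the YOLOv2 model contains only convolutional and pooling layers, its input is not restricted to 416x416 images. To make the model more robust, YOLOv2 trains with multi-scale input: the input image size is changed every fixed number of iterations during training. Since the total downsampling stride of YOLOv2 is 32, the input sizes are chosen from multiples of 32: {320, 352, 384, ..., 608}. The smallest input is 320x320, giving a 10x10 feature map (no longer odd, which is admittedly a little awkward), and the largest is 608x608, giving a 19x19 feature map. During training a new input size is picked at random every 10 iterations, and only the handling of the final detection layer has to change before training continues. With Multi-Scale Training, YOLOv2 adapts to images of different sizes and still predicts well.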
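The size schedule is easy to sketch. The snippet below only illustrates the rule described above (a new size from the multiples of 32 every 10 iterations); `SIZES` and `pick_input_size` are names I made up, not part of Darknet or YAD2K.

```python
import random

SIZES = list(range(320, 608 + 1, 32))   # 320, 352, ..., 608


def pick_input_size(batch_index, current_size, rng, every=10):
    """Draw a new square input size every `every` batches, else keep the current one."""
    if batch_index % every == 0:
        return rng.choice(SIZES)
    return current_size


rng = random.Random(0)
size = 416
for batch_index in range(1, 31):
    size = pick_input_size(batch_index, size, rng)
    if batch_index % 10 == 0:
        print(batch_index, size, size // 32)   # iteration, input size, grid side
```

The grid side `size // 32` ranges from 10 to 19, matching the feature-map sizes mentioned above.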

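2. Faster

Most detection frameworks rely on VGG-16 as the base feature extractor. VGG-16 is a powerful, accurate classification network, but it is needlessly complex: its convolutional layers need 30.69 billion floating-point operations for one forward pass over a single 224x224 image. The YOLO framework uses a custom network based on the GoogLeNet architecture. It is faster than VGG-16, needing only 8.52 billion operations per forward pass, but slightly less accurate: for single-crop top-5 accuracy at 224x224 on ImageNet, YOLO's custom model reaches 88.0% against 90.0% for VGG-16. YOLOv2 uses the Darknet-19 network, with 19 convolutional layers and 5 max-pooling layers, a slimmer design than YOLOv1's 24 convolutional layers and 2 fully connected layers.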

[Figure: YOLOv2 network architecture]

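3. Stronger

The authors' idea here is also novel: it addresses the problem that the classes of two different datasets are not mutually exclusive. They propose WordTree, a tree structure that resolves this conflict between datasets. Classification is predicted hierarchically along the tree, stopping either when a threshold is reached or at a leaf node. The figure below should help with the WordTree concept.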
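As I read the paper, each node predicts a probability conditional on its parent, and the probability of a label is the product along the path from the root. The sketch below walks such a tree greedily and stops at a leaf or when the path probability would fall below a threshold; the tree, the probabilities and the function name are invented for illustration.

```python
def predict_wordtree(children, cond_prob, root, threshold=0.5):
    """Follow the most probable child at each level of the tree.

    Stops at a leaf, or when the path probability would drop below `threshold`.
    """
    node, path_prob = root, 1.0
    while children.get(node):
        best = max(children[node], key=lambda child: cond_prob[child])
        if path_prob * cond_prob[best] < threshold:
            break
        node, path_prob = best, path_prob * cond_prob[best]
    return node, path_prob


# Hypothetical three-level tree with made-up conditional probabilities.
children = {
    "physical object": ["animal", "vehicle"],
    "animal": ["dog", "cat"],
    "dog": ["terrier", "hound"],
}
cond_prob = {"animal": 0.9, "vehicle": 0.1, "dog": 0.8, "cat": 0.2,
             "terrier": 0.5, "hound": 0.5}
print(predict_wordtree(children, cond_prob, "physical object"))
```

Here the walk stops at "dog": the model is confident it sees a dog but not which breed, so the coarser label is returned.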

[Figure: WordTree]

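IV. YAD2K Code Walkthrough

YAD2K implements YOLOv2 in roughly 90% Keras and 10% TensorFlow. The rest of this section walks through the code in /yad2k/models/keras_yolo.py.
Tip: the box coordinates are actually ordered [y, x, h, w] rather than [x, y, w, h].
Pipeline: the data first goes through preprocess_true_boxes(), and after some further processing is fed to the model; the loss function is yolo_loss(); the output of the network's last convolutional layer is the input to yolo_head(), and yolo_eval() is then applied to get the final result.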

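1. preprocess_true_boxes()

This function produces detectors_mask (marking the best-matching anchor box; every true box is assigned one anchor box) and matching_true_boxes (the targets later subtracted from pred_boxes to compute the loss). The code below is commented in some detail.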

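import numpy as np


def preprocess_true_boxes(true_boxes, anchors, image_size):
    """
    Parameters
    ----------
    true_boxes : positions and classes of the ground-truth boxes (our input). Two dimensions:
        dim 1: number of ground-truth boxes in the image
        dim 2: [x, y, w, h, class]; x, y are the box centre, w, h the box width and height.
               x, y, w, h are all divided by the image resolution, so they lie in [0, 1].

    anchors : the anchor boxes; the paper uses five. Two dimensions:
        dim 1: number of anchor boxes, 5 here
        dim 2: [w, h], both expressed relative to the width and height of a grid cell.
               [1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]

    image_size : actual size of the image, 416x416 here.

    Returns
    -------
    detectors_mask : values are 0 or 1; shape [13, 13, 5, 1] here. Four dimensions:
        dim 1: row of the grid cell containing the true box centre (index along y)
        dim 2: column of the grid cell containing the true box centre (index along x)
        dim 3: which anchor box
        dim 4: 0/1; 1 marks the anchor box used to predict this true box

    matching_true_boxes : shape [13, 13, 5, 5] here. Four dimensions:
        dim 1: row of the grid cell containing the true box centre (index along y)
        dim 2: column of the grid cell containing the true box centre (index along x)
        dim 3: which anchor box
        dim 4: [x, y, w, h, class]. x, y are offsets relative to the grid cell, w, h are
               log-encoded, and class is the class index; the code below shows the details.
    """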

    height, width = image_size
    num_anchors = len(anchors)

    assert height % 32 == 0, 'Image height must be a multiple of 32.'
    assert width % 32 == 0, 'Image width must be a multiple of 32.'

    conv_height = height // 32  # number of grid cells along y
    conv_width = width // 32  # number of grid cells along x

    num_box_params = true_boxes.shape[1] 
    detectors_mask = np.zeros(
        (conv_height, conv_width, num_anchors, 1), dtype=np.float32)
    matching_true_boxes = np.zeros(
        (conv_height, conv_width, num_anchors, num_box_params),
        dtype=np.float32)  # both outputs get their final shape, filled with zeros

    for box in true_boxes:  # loop over the ground-truth boxes
        box_class = box[4:5]  # class information: which class the box belongs to

        box = box[0:4] * np.array(
            [conv_width, conv_height, conv_width, conv_height])  # convert to grid-cell units

        i = np.floor(box[1]).astype('int')  # grid row (index along y)
        j = np.floor(box[0]).astype('int')  # grid column (index along x)
        best_iou = 0
        best_anchor = 0


        # Compute the IOU between the true box and each anchor box to find the single best-matching anchor.
        for k, anchor in enumerate(anchors):
            # Find IOU between box shifted to origin and anchor box.
            box_maxes = box[2:4] / 2.
            box_mins = -box_maxes
            anchor_maxes = (anchor / 2.)
            anchor_mins = -anchor_maxes

            intersect_mins = np.maximum(box_mins, anchor_mins)
            intersect_maxes = np.minimum(box_maxes, anchor_maxes)
            intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
            intersect_area = intersect_wh[0] * intersect_wh[1]
            box_area = box[2] * box[3]
            anchor_area = anchor[0] * anchor[1]
            iou = intersect_area / (box_area + anchor_area - intersect_area)
            if iou > best_iou:
                best_iou = iou
                best_anchor = k


        if best_iou > 0:
            detectors_mask[i, j, best_anchor] = 1  # mark the best-matching anchor box
            adjusted_box = np.array(
                [
                    box[0] - j, box[1] - i,  # x, y relative to the cell: top-left [0, 0], bottom-right [1, 1]
                    np.log(box[2] / anchors[best_anchor][0]),  # log of true-box w over anchor w
                    np.log(box[3] / anchors[best_anchor][1]),  # log of true-box h over anchor h
                    box_class[0]  # class index, taken as a scalar so the list is not ragged
                ],
                dtype=np.float32)
            matching_true_boxes[i, j, best_anchor] = adjusted_box
    return detectors_mask, matching_true_boxes
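To see what the function stores for a single box, here is a worked pure-Python version of the same steps (scale to grid units, find the cell, pick the anchor with the best IOU, encode the target). It is a simplified sketch of mine, not YAD2K code: it drops the class entry and the output tensors, and `encode_true_box` is a made-up name.

```python
import math

ANCHORS = [(1.08, 1.19), (3.42, 4.41), (6.63, 11.38), (9.42, 5.11), (16.62, 10.52)]


def encode_true_box(box, grid=13, anchors=ANCHORS):
    """Return (row, col, best anchor index, encoded target) for one true box."""
    x, y, w, h = (v * grid for v in box)   # image-relative values -> grid-cell units
    row, col = int(y), int(x)              # cell containing the box centre

    def iou(anchor):
        # Both boxes are centred at the origin, so only sizes matter.
        inter = min(w, anchor[0]) * min(h, anchor[1])
        return inter / (w * h + anchor[0] * anchor[1] - inter)

    best = max(range(len(anchors)), key=lambda k: iou(anchors[k]))
    aw, ah = anchors[best]
    target = (x - col, y - row, math.log(w / aw), math.log(h / ah))
    return row, col, best, target


# A box centred in the image, a quarter of its width and 30% of its height.
print(encode_true_box((0.5, 0.5, 0.25, 0.3)))
```

For this box the centre falls in cell (6, 6) and the second anchor (3.42, 4.41) fits best, so detectors_mask[6, 6, 1] would be set to 1 and the encoded target stored at matching_true_boxes[6, 6, 1].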

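2. yolo_head()

This function takes the features of YOLO's output layer and converts them into box centres x, y and box sizes w, h. The offsets are predicted relative to a grid cell and the sizes relative to an anchor, but the code then adds the cell index and divides by the grid size, so the returned values are normalised to the whole image. pred_confidence is the probability that an object is present, and pred_class_prob holds the per-class probabilities after softmax. In the loss function the returned x, y, w, h are used to compute the IOU for the IOU loss; pred_confidence is used for confidence_loss and pred_class_prob for classification_loss.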

import tensorflow as tf  # used further down by yolo_loss, yolo_filter_boxes and yolo_eval
from keras import backend as K


def yolo_head(feats, anchors, num_classes):
    """Convert final layer features to bounding box parameters.

    Parameters
    ----------
    feats : output of the network's final layer, shape: [-1, 13, 13, 125]

    anchors : the anchor boxes; the paper uses five. Two dimensions:
        dim 1: number of anchor boxes, 5 here
        dim 2: [w, h], both expressed relative to the width and height of a grid cell.
        [1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]

    num_classes : number of classes

    Returns
    -------
    box_xy : centre x, y of every pred_box in every grid cell of every image. The sigmoid
        offset within the cell is added to the cell index and divided by the grid size,
        so the values are normalised to the whole image.
        Five dimensions, shape: [-1, 13, 13, 5, 2].
        dim 1: image index
        dim 2: grid row of the box (index along y)
        dim 3: grid column of the box (index along x)
        dim 4: which anchor box is used
        dim 5: [x, y], the box centre

    box_wh : w, h of every pred_box in every grid cell of every image, normalised to
        the whole image in the same way.
        Five dimensions, shape: [-1, 13, 13, 5, 2].
        dim 1: image index
        dim 2: grid row of the box (index along y)
        dim 3: grid column of the box (index along x)
        dim 4: which anchor box is used
        dim 5: [w, h]

    box_confidence : probability that each pred_box contains a detectable object.
        Five dimensions, shape: [-1, 13, 13, 5, 1]; dimensions as above.

    box_class_probs : per-class probabilities (after softmax) for each pred_box.
        shape: [-1, 13, 13, 5, 20]

    """
    num_anchors = len(anchors)
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.variable(anchors), [1, 1, 1, num_anchors, 2])

    conv_dims = K.shape(feats)[1:3]  # number of grid cells the image is divided into, 13x13 here
    # In YOLO the height index is the inner most iteration.
    conv_height_index = K.arange(0, stop=conv_dims[0])
    conv_width_index = K.arange(0, stop=conv_dims[1])
    conv_height_index = K.tile(conv_height_index, [conv_dims[1]])

    conv_width_index = K.tile(
        K.expand_dims(conv_width_index, 0), [conv_dims[0], 1])
    conv_width_index = K.flatten(K.transpose(conv_width_index))
    conv_index = K.transpose(K.stack([conv_height_index, conv_width_index]))
    conv_index = K.reshape(conv_index, [1, conv_dims[0], conv_dims[1], 1, 2])  # shape: [1, 13, 13, 1, 2]
    conv_index = K.cast(conv_index, K.dtype(feats))

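    # tile(): repeat a tensor along the given axes
    # expand_dims(): add a dimension
    # transpose(): transpose
    # flatten(): collapse to one dimension
    # stack(): stack tensors, adding a dimension
    # conv_index: [0,0],[1,0],...,[12,0],[0,1],[1,1],...,[12,12] (the first component varies fastest)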

    feats = K.reshape(
        feats, [-1, conv_dims[0], conv_dims[1], num_anchors, num_classes + 5])
    conv_dims = K.cast(K.reshape(conv_dims, [1, 1, 1, 1, 2]), K.dtype(feats))

    box_xy = K.sigmoid(feats[..., :2])
    box_wh = K.exp(feats[..., 2:4])
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.softmax(feats[..., 5:])

    # Adjust predictions to each spatial grid point and anchor size.
    # Note: YOLO iterates over height index before width index.
    box_xy = (box_xy + conv_index) / conv_dims
    box_wh = box_wh * anchors_tensor / conv_dims

    return box_xy, box_wh, box_confidence, box_class_probs
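The index bookkeeping in yolo_head is the hardest part to follow, so here is a pure-Python replica of the tile / transpose / stack sequence that I used to check which value ends up in which cell. The helper name `conv_index_grid` is my own; the mapping of each line to a backend call is my reading of the code above.

```python
def conv_index_grid(height, width):
    """Pure-Python replica of the tile / transpose / stack steps in yolo_head."""
    # K.tile(K.arange(height), [width]): 0..height-1 repeated `width` times
    first = list(range(height)) * width
    # tile + transpose + flatten: each value of 0..width-1 repeated `height` times
    second = [i for i in range(width) for _ in range(height)]
    # stack + transpose: one (first, second) pair per cell, in flat order
    pairs = list(zip(first, second))
    # reshape to [height, width, 2] in row-major order
    return [pairs[r * width:(r + 1) * width] for r in range(height)]


grid = conv_index_grid(13, 13)
print(grid[0][:3])   # [(0, 0), (1, 0), (2, 0)]
print(grid[2][5])    # (5, 2): column (x) first, then row (y)
```

So for the square 13x13 grid, the cell at row r and column c receives the offset (c, r), which is the (x, y) order that box_xy needs. As far as I can tell this only lines up when height equals width.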

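3. yolo_loss()

YOLOv2's loss function also changes considerably from YOLOv1. It has three main parts: the IOU loss, the classification loss and the coordinate loss. The IOU loss is split into no_objects_loss and objects_loss, with objects_loss penalised more heavily. The differences from YOLOv1 are outlined below.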

3.1 confidence_loss:

YOLOv2 has 845 anchor_boxes in total. Those matched to true_boxes are used to predict pred_boxes; those not matched to any true box are used to predict background.

objects_loss (anchor_boxes matched to true_boxes)
objects_loss is computed between the anchor_boxes matched to true_boxes and their predicted pred_boxes.
no_objects_loss (anchor_boxes not matched to true_boxes)
1. For an unmatched anchor box whose IOU with a true box is > 0.6, no loss is computed.
2. For an unmatched anchor box whose IOU with the true_boxes is < 0.6, no_objects_loss is computed.

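This part raises a lot of questions, is rather convoluted and hard to follow, and I got it wrong myself at first. My current understanding: confidence measures how confident we are that an anchor box contains an object. For anchor boxes responsible for the foreground (pred_boxes) we must compute objects_loss; for anchor boxes responsible for the background, if the IOU with the true_boxes is < 0.6 we compute no_objects_loss. Both are easy to accept, since each box is just doing its own job. But when the IOU with a true box is > 0.6, no_objects_loss is not computed. Why? Because it has given us a pleasant surprise and we cannot bring ourselves to scold it. An anchor box that was supposed to predict background turns out to have an IOU above 0.6 with a true box, framing it even better than some anchors that were assigned the foreground; it eats grass and gives milk, so how could we punish it? Joking aside, my own view is that the centre of a true box may lie in a neighbouring grid cell while the box itself is large, so its IOU with the anchor boxes of nearby cells is high; the loss from those boxes can be skipped, since they do frame the object well. It is like the anchors with 0.3 < IOU < 0.7 in Faster R-CNN, which contribute no loss because they are not what mainly needs optimising.
Another difference from YOLOv1 is the weighting: in YOLOv1 the coefficients of no_objects_loss and objects_loss are 0.5 and 1, whereas in YOLOv2 they are 1 and 5.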
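The three cases are easier to keep apart when written for a single predicted box. The scalar function below is my own condensed restatement of the masking logic in YAD2K's yolo_loss, with the weights and the 0.6 threshold taken from that code; the function name and argument names are assumptions.

```python
def confidence_loss(pred_conf, is_matched, best_iou,
                    object_scale=5.0, no_object_scale=1.0,
                    ignore_thresh=0.6, rescore=False):
    """Confidence loss of one predicted box (scalar version of the tensor code)."""
    if is_matched:
        # Matched to a true box: push confidence towards 1 (or towards the IOU).
        target = best_iou if rescore else 1.0
        return object_scale * (target - pred_conf) ** 2
    if best_iou > ignore_thresh:
        # Unmatched but overlapping a true box well: not penalised.
        return 0.0
    # Plain background: push confidence towards 0.
    return no_object_scale * pred_conf ** 2


print(confidence_loss(0.5, True, 0.8))    # 1.25
print(confidence_loss(0.5, False, 0.7))   # 0.0
print(confidence_loss(0.5, False, 0.2))   # 0.25
```

With the same confidence error of 0.5, a matched box costs five times as much as a background box, and a well-overlapping unmatched box costs nothing.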

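3.2 classification_loss:

This part is essentially the same as in YOLOv1: the squared error of the 20-dimensional vector obtained after softmax() (the dataset has 20 classes) against the one-hot class label.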

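3.3 coordinates_loss:

This part changes more relative to YOLOv1. The x, y error is the squared error of the offset relative to the grid cell, where the predicted offset is the output of a sigmoid and therefore lies in (0, 1); YOLOv1 regresses its x, y offset directly, without the sigmoid. The w, h error changes from the squared difference of the square roots of w and h to the squared difference of log-encoded values: the log of the ratio between w, h and the width and height of the anchor box matched to the true box. The idea is the same as in YOLOv1: for an equal absolute error, penalise large objects less and small objects more. The weighting coefficient is changed from 5 to 1 for both the x, y term and the w, h term.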
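A quick numeric check of the claim about large and small objects. The helper below is hypothetical (not from YAD2K); it applies the log encoding to a true size and a predicted size and returns the squared difference.

```python
import math


def wh_loss(true_size, pred_size, anchor_size):
    """Squared error between log-encoded true and predicted sizes."""
    t_true = math.log(true_size / anchor_size)
    t_pred = math.log(pred_size / anchor_size)
    return (t_true - t_pred) ** 2


# The same absolute error of 1 grid unit on a small and on a large box.
small = wh_loss(true_size=2.0, pred_size=3.0, anchor_size=1.0)
large = wh_loss(true_size=10.0, pred_size=11.0, anchor_size=1.0)
print(small > large)   # True
```

Since log(t/a) - log(p/a) = log(t/p), the loss depends only on the ratio of true to predicted size, so an error of one unit matters far more on a 2-unit box than on a 10-unit box.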

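def yolo_loss(args,
              anchors,
              num_classes,
              rescore_confidence=False,
              print_loss=False):
    """
    Parameters
    ----------
    args : tuple (yolo_output, true_boxes, detectors_mask, matching_true_boxes), described below.

    yolo_output : output of the network's final layer, shape: [batch_size, 13, 13, 125]

    true_boxes : positions and classes of the ground-truth boxes (our input). Three dimensions:
        dim 1: image index
        dim 2: number of ground-truth boxes in an image
        dim 3: [x, y, w, h, class]; x, y are the box centre, w, h the box width and height.
               x, y, w, h are all divided by the image resolution, so they lie in [0, 1].

    detectors_mask : values are 0 or 1; shape [batch_size, 13, 13, 5, 1] here; see the
        output of preprocess_true_boxes(). Five dimensions:
        dim 1: image index
        dim 2: row of the grid cell containing the true box centre (index along y)
        dim 3: column of the grid cell containing the true box centre (index along x)
        dim 4: which anchor box
        dim 5: 0/1; 1 marks the anchor box used to predict this true box

    matching_true_boxes : shape [-1, 13, 13, 5, 5] here; see the output of
        preprocess_true_boxes(). Five dimensions:
        dim 1: image index
        dim 2: row of the grid cell containing the true box centre (index along y)
        dim 3: column of the grid cell containing the true box centre (index along x)
        dim 4: which anchor box
        dim 5: [x, y, w, h, class]. x, y are offsets relative to the grid cell, w, h are
               log-encoded, and class is the class index.

    anchors : the anchor boxes; the paper uses five. Two dimensions:
        dim 1: number of anchor boxes, 5 here
        dim 2: [w, h], both expressed relative to the width and height of a grid cell.
        [1.08, 1.19], [3.42, 4.41], [6.63, 11.38], [9.42, 5.11], [16.62, 10.52]

    num_classes : number of classes

    rescore_confidence : bool; False and True compute the objects_loss part of
        confidence_loss differently, as the code below shows.

    print_loss : bool; whether to print the losses (total, IOU, classification, coordinate)

    Returns
    -------
    total_loss : float, the total loss
    """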
    (yolo_output, true_boxes, detectors_mask, matching_true_boxes) = args
    num_anchors = len(anchors)
    object_scale = 5  # confidence-loss weight when an object is assigned to the box
    no_object_scale = 1  # confidence-loss weight when no object is assigned to the box
    class_scale = 1  # weight of the classification loss
    coordinates_scale = 1  # weight of the coordinate loss

    pred_xy, pred_wh, pred_confidence, pred_class_prob = yolo_head(
        yolo_output, anchors, num_classes)

    yolo_output_shape = K.shape(yolo_output)
    feats = K.reshape(yolo_output, [
        -1, yolo_output_shape[1], yolo_output_shape[2], num_anchors,
        num_classes + 5])  # shape: [-1, 13, 13, 5, 25]

    pred_boxes = K.concatenate(
        (K.sigmoid(feats[..., 0:2]), feats[..., 2:4]), axis=-1)
    # x, y, w, h of pred_boxes, compared with matching_true_boxes for the
    # coordinate loss; shape: [-1, 13, 13, 5, 4]


    # Expand pred x,y,w,h to allow comparison with ground truth.
    # batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params
    pred_xy = K.expand_dims(pred_xy, 4)  # add a dimension: [-1, 13, 13, 5, 2] -> [-1, 13, 13, 5, 1, 2]
    pred_wh = K.expand_dims(pred_wh, 4)  # add a dimension: [-1, 13, 13, 5, 2] -> [-1, 13, 13, 5, 1, 2]

    pred_wh_half = pred_wh / 2.
    pred_mins = pred_xy - pred_wh_half
    pred_maxes = pred_xy + pred_wh_half
    # pred_mins / pred_maxes: top-left and bottom-right corners of pred_boxes

    true_boxes_shape = K.shape(true_boxes)

    true_boxes = K.reshape(true_boxes, [
        true_boxes_shape[0], 1, 1, 1, true_boxes_shape[1], true_boxes_shape[2]])
    # shape: [-1, 1, 1, 1, -1, 5], i.e. batch, conv_height, conv_width, num_anchors, num_true_boxes, box_params

    true_xy = true_boxes[..., 0:2]
    true_wh = true_boxes[..., 2:4]

    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half
    true_maxes = true_xy + true_wh_half
    # true_mins / true_maxes: top-left and bottom-right corners of true_boxes


    intersect_mins = K.maximum(pred_mins, true_mins)
    intersect_maxes = K.minimum(pred_maxes, true_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]

    pred_areas = pred_wh[..., 0] * pred_wh[..., 1]
    true_areas = true_wh[..., 0] * true_wh[..., 1]

    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores = intersect_areas / union_areas
    # IOU between every predicted box (845 per image here) and every true box;
    # shape: [-1, 13, 13, 5, num_true_boxes]


    best_ious = K.max(iou_scores, axis=4)
    # An interesting detail: for each predicted box only the largest IOU over the
    # true boxes is kept. best_ious cares about how large the best overlap is,
    # not about which true box it came from.

    best_ious = K.expand_dims(best_ious)  # shape: [-1, 13, 13, 5, 1]

    object_detections = K.cast(best_ious > 0.6, K.dtype(best_ious))
    # Boxes with IOU above 0.6 are picked out and their no-object loss is ignored.
    # cast() converts the boolean mask to 0/1 values in the dtype of best_ious.

    no_object_weights = (no_object_scale * (1 - object_detections) *
                         (1 - detectors_mask))
    no_objects_loss = no_object_weights * K.square(-pred_confidence)

    if rescore_confidence:
        objects_loss = (object_scale * detectors_mask *
                        K.square(best_ious - pred_confidence))
    else:
        objects_loss = (object_scale * detectors_mask *
                        K.square(1 - pred_confidence))
    confidence_loss = objects_loss + no_objects_loss
    # confidence_loss: no_objects_loss is the background error; objects_loss is the
    # error of the anchor boxes matched to a true box, which matters more and
    # therefore carries a weight of 5.


    matching_classes = K.cast(matching_true_boxes[..., 4], 'int32')
    matching_classes = K.one_hot(matching_classes, num_classes)
    classification_loss = (class_scale * detectors_mask *
                           K.square(matching_classes - pred_class_prob))
    # classification_loss: squared difference of the 20-dimensional class vectors
    
    matching_boxes = matching_true_boxes[..., 0:4]
    coordinates_loss = (coordinates_scale * detectors_mask *
                        K.square(matching_boxes - pred_boxes))
    # coordinates_loss: squared error of the x, y offsets and of the log-encoded w, h;
    # similar to YOLOv1's squared difference of square roots, with slightly better results.

    confidence_loss_sum = K.sum(confidence_loss)
    classification_loss_sum = K.sum(classification_loss)
    coordinates_loss_sum = K.sum(coordinates_loss)
    total_loss = 0.5 * (
        confidence_loss_sum + classification_loss_sum + coordinates_loss_sum)
    if print_loss:
        total_loss = tf.Print(
            total_loss, [
                total_loss, confidence_loss_sum, classification_loss_sum,
                coordinates_loss_sum
            ],
            message='yolo_loss, conf_loss, class_loss, box_coord_loss:')

    return total_loss

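4. yolo_boxes_to_corners()

This function is simple: it takes the box_xy and box_wh returned by yolo_head() and computes the top-left and bottom-right corners of each box. The result is the input to yolo_filter_boxes() and can be used to draw the bounding boxes.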

def yolo_boxes_to_corners(box_xy, box_wh):

    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)

    return K.concatenate([
        box_mins[..., 1:2],  # y_min
        box_mins[..., 0:1],  # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]  # x_max
    ])

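5. yolo_filter_boxes()

From the 845 pred_boxes, this selects those whose score is at least the threshold (0.6 by default; 0.3 was actually used during training) as the final predicted bounding boxes, and returns their top-left and bottom-right corner coordinates, scores and classes.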

def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=.6):
   
    box_scores = box_confidence * box_class_probs  # per-class score of each box; shape: [-1, 13, 13, 5, 20]
    box_classes = K.argmax(box_scores, axis=-1)  # index of the highest score, i.e. the predicted class
    box_class_scores = K.max(box_scores, axis=-1)  # the highest score, used as the box's confidence
    prediction_mask = box_class_scores >= threshold  # boolean mask of boxes at or above the threshold, used with tf.boolean_mask()

    boxes = tf.boolean_mask(boxes, prediction_mask)  # bounding boxes that pass the threshold
    scores = tf.boolean_mask(box_class_scores, prediction_mask)  # their scores
    classes = tf.boolean_mask(box_classes, prediction_mask)  # their predicted classes

    return boxes, scores, classes
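The same filtering written with plain lists, to show what happens to individual boxes. This is a simplified sketch of mine (a flat list of boxes instead of the [-1, 13, 13, 5, ...] tensors), and `filter_boxes` is a made-up name.

```python
def filter_boxes(boxes, confidences, class_probs, threshold=0.6):
    """Keep boxes whose best class score (confidence * class probability) passes the threshold."""
    kept = []
    for box, conf, probs in zip(boxes, confidences, class_probs):
        scores = [conf * p for p in probs]
        best_class = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_class] >= threshold:
            kept.append((box, scores[best_class], best_class))
    return kept


boxes = [(0, 0, 1, 1), (0, 0, 2, 2)]
confidences = [0.9, 0.9]
class_probs = [[0.1, 0.8, 0.1], [0.4, 0.3, 0.3]]
print(len(filter_boxes(boxes, confidences, class_probs)))   # 1
```

Both boxes are equally confident that an object is present, but the second is unsure of the class, so its best score (0.9 * 0.4) stays below 0.6 and it is dropped.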

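6. yolo_eval()

This function calls yolo_boxes_to_corners() and yolo_filter_boxes(), then applies non-maximum suppression to the bounding boxes that survive the score filter. It outputs boxes, scores and classes: the top-left and bottom-right corner coordinates of the bounding boxes, their scores, and their predicted classes.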

def yolo_eval(yolo_outputs,
              image_shape,
              max_boxes=10,
              score_threshold=.6,
              iou_threshold=.5):
    box_xy, box_wh, box_confidence, box_class_probs = yolo_outputs  # yolo_outputs is the output of yolo_head
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    boxes, scores, classes = yolo_filter_boxes(
        boxes, box_confidence, box_class_probs, threshold=score_threshold)

    # image_shape, e.g. (416, 416)
    height = image_shape[0]  
    width = image_shape[1]
    image_dims = K.stack([height, width, height, width])
    image_dims = K.reshape(image_dims, [1, 4])
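    boxes = boxes * image_dims  # scale by the image resolution to get corner coordinates in pixels

    # Run NMS (non-maximum suppression). iou_threshold defaults to 0.5 but was set to 0.9
    # during training; NMS is not applied per class here, which I suspect is why the
    # threshold was raised.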
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))
    nms_index = tf.image.non_max_suppression(
        boxes, scores, max_boxes_tensor, iou_threshold=iou_threshold)
    boxes = K.gather(boxes, nms_index)  # gather(): pick the values at the given indices
    scores = K.gather(scores, nms_index)
    classes = K.gather(classes, nms_index)
    return boxes, scores, classes
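For readers who have not seen NMS spelled out, here is a greedy pure-Python version. It is a sketch of the general technique, not the TensorFlow implementation; like the call above it ignores classes, and the function names and sample boxes are mine.

```python
def iou(a, b):
    """IOU of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Greedily keep the highest-scoring boxes that do not overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) == max_boxes:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))                      # [0, 2]
print(non_max_suppression(boxes, scores, iou_threshold=0.9))   # [0, 1, 2]
```

The first two boxes overlap with an IOU of about 0.68, so at the default threshold the weaker one is suppressed, while at 0.9 both survive. Because classes are ignored, a low threshold would also suppress an overlapping box of a different class.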

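V. Strengths and Weaknesses of YOLO

I can't help admiring the authors' inventiveness in giving us something as good as YOLO. Its strengths speak for themselves: you only look once, it is light on compute, and it runs fast while keeping accuracy acceptable. Its weaknesses are equally clear: bounding-box positions are not precise enough, small and densely packed objects are detected poorly, and recall is low; these are exactly the points YOLOv2 sets out to improve.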

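VI. An Open Question

I would also like to raise a question about YOLO. In YAD2K, if an image contains two true boxes that fall in the same grid cell and are best matched to the same anchor box, it seems that only one of them, the one with the best IOU, can be predicted. I do not know whether the original YOLOv2 source has the same issue, and I would welcome pointers from anyone more experienced.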
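To make the collision concrete, the sketch below reproduces just the slot assignment from preprocess_true_boxes: each true box is mapped to a (row, column, anchor) slot, and a slot holds one target. Going by the code shown earlier, a later box simply overwrites an earlier one in the same slot. The helper names and the two sample boxes are mine.

```python
ANCHORS = [(1.08, 1.19), (3.42, 4.41), (6.63, 11.38), (9.42, 5.11), (16.62, 10.52)]


def best_anchor(w, h):
    """Index of the anchor with the highest IOU against a (w, h) box in grid units."""
    def iou(a):
        inter = min(w, a[0]) * min(h, a[1])
        return inter / (w * h + a[0] * a[1] - inter)
    return max(range(len(ANCHORS)), key=lambda k: iou(ANCHORS[k]))


def assign_slots(true_boxes, grid=13):
    """Map each true box (x, y, w, h in [0, 1]) to its (row, col, anchor) slot."""
    slots = {}
    for index, (x, y, w, h) in enumerate(true_boxes):
        slot = (int(y * grid), int(x * grid), best_anchor(w * grid, h * grid))
        slots[slot] = index   # a later box replaces an earlier one in the same slot
    return slots


# Two similar boxes whose centres fall in the same cell.
two_boxes = [(0.50, 0.50, 0.25, 0.30), (0.52, 0.51, 0.26, 0.32)]
print(assign_slots(two_boxes))   # {(6, 6, 1): 1}
```

Only the second box survives as a training target, so under these assumptions the target-building step, rather than the network, is what drops one of the two objects.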

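VII. Summary

Writing this post was also a way of summarising things for myself. Reading papers and grinding through code is hard going, but it gave me a much deeper understanding of how YOLO came about and of the authors' thinking, and I got more out of it than I expected. Next I want to explore YOLOv3, and if I find the time I will share that learning process as well. Onward!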
