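I. Main Contributions
Taking RetinaNet and FCOS as representatives, the authors analyze why anchor-based and anchor-free detectors perform differently and identify three differences:
- 1. The number of anchors per location. RetinaNet tiles several anchors at each location, whereas FCOS has a single anchor point per location.
- 2. The definition of positive and negative samples. RetinaNet uses two IoU thresholds, whereas FCOS uses spatial and scale constraints.
- 3. The initial state of regression. RetinaNet refines a preset anchor box, whereas FCOS regresses from an anchor point.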
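To make the second difference concrete, here is a minimal numpy sketch of the two assignment rules for a single ground-truth box. The 0.5/0.4 thresholds are RetinaNet's usual values; the function names, the single-gt simplification, and the toy boxes are assumptions for illustration, and details found in real codebases are omitted.

```python
import numpy as np


def box_iou(boxes, gt):
    """IoU between N boxes and one gt box, all in (x1, y1, x2, y2) format."""
    ix1 = np.maximum(boxes[:, 0], gt[0])
    iy1 = np.maximum(boxes[:, 1], gt[1])
    ix2 = np.minimum(boxes[:, 2], gt[2])
    iy2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_boxes + area_gt - inter)


def retinanet_assign(anchors, gt, pos_thr=0.5, neg_thr=0.4):
    """Two IoU thresholds: 1 = positive, 0 = negative, -1 = ignored."""
    iou = box_iou(anchors, gt)
    labels = np.full(len(anchors), -1)
    labels[iou >= pos_thr] = 1
    labels[iou < neg_thr] = 0
    return labels


def fcos_assign(points, gt, scale_range):
    """Spatial + scale constraints: 1 = positive, 0 = negative."""
    ltrb = np.stack([points[:, 0] - gt[0], points[:, 1] - gt[1],
                     gt[2] - points[:, 0], gt[3] - points[:, 1]], axis=1)
    inside = ltrb.min(axis=1) > 0   # spatial: the point lies inside the gt box
    max_reg = ltrb.max(axis=1)      # scale: largest regression target
    in_scale = (max_reg >= scale_range[0]) & (max_reg <= scale_range[1])
    return (inside & in_scale).astype(int)


gt = np.array([0.0, 0.0, 10.0, 10.0])
anchors = np.array([[0, 0, 10, 10], [2, 2, 12, 12], [20, 20, 30, 30]], dtype=float)
points = np.array([[5, 5], [9, 9], [15, 15]], dtype=float)
print(retinanet_assign(anchors, gt))     # [ 1 -1  0]
print(fcos_assign(points, gt, (0, 64)))  # [1 1 0]
print(fcos_assign(points, gt, (6, 64)))  # [0 1 0]
```

The second anchor (IoU about 0.47) falls between RetinaNet's two thresholds and is ignored, while FCOS never looks at IoU: the same point can flip from positive to negative purely because of the level's scale range.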
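Main contributions of the ATSS paper:
- 1. It shows that the essential difference between anchor-based and anchor-free detectors lies in how positive and negative samples are defined.
- 2. It proposes Adaptive Training Sample Selection, which assigns positive and negative samples adaptively during training according to statistical characteristics of each object.
- 3. It shows that tiling multiple anchors at each location to detect objects is inefficient.
- 4. It achieves state-of-the-art performance on COCO without introducing any additional overhead.
More broadly, the paper raises a core question in object detection, label assignment: how should positive and negative samples be assigned?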
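II. Analyzing the Gap Between Anchor-free and Anchor-based Methods
To compare the two fairly, the authors trained both with the same training methods and tricks and set RetinaNet to one anchor per location. Even so, a 0.8% AP gap remained.

The authors then analyzed where the remaining gap comes from:
- 1. How positive and negative samples are defined.
- 2. The initial state of regression, i.e., whether regression starts from an anchor box or from a center point.
By swapping each of these two factors between the two detectors in controlled experiments, the authors conclude that the definition of positive and negative samples is the essential cause of the gap.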
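The second factor, the initial state of regression, only changes how the box target is encoded. The sketch below contrasts a common anchor-box delta encoding (dx, dy, dw, dh) with FCOS-style point-to-side distances (l, t, r, b). The function names are hypothetical, and the normalization terms real implementations add are deliberately left out.

```python
import numpy as np


def encode_from_anchor(anchor, gt):
    """Anchor-based: offsets of the gt relative to a preset anchor box
    ((dx, dy, dw, dh) delta encoding, without mean/std normalization)."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return np.array([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)])


def encode_from_point(point, gt):
    """Anchor-free: distances from a point to the gt's left, top, right, bottom sides."""
    x, y = point
    return np.array([x - gt[0], y - gt[1], gt[2] - x, gt[3] - y])


gt = np.array([0.0, 0.0, 16.0, 8.0])
print(encode_from_anchor(np.array([0.0, 0.0, 8.0, 8.0]), gt))  # dx=0.5, dy=0, dw=log(2), dh=0
print(encode_from_point((4.0, 4.0), gt))                       # l=4, t=4, r=12, b=4
```

Both encodings describe the same target box from the same location; the experiments above indicate that which one is used matters far less than which locations are labeled positive.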

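III. Adaptive Training Sample Selection (ATSS)
During training, ATSS divides samples into positives and negatives automatically, based on statistical characteristics of each object. The procedure is:
1. For each ground-truth box $g$, on each pyramid level select the $k$ anchors whose centers are closest to the center of $g$ by $L_2$ distance. With $L$ feature pyramid levels, $g$ has $k \times L$ candidate positive samples.
2. Compute the IoU between these candidates and $g$, denoted $D_g$, together with its mean $m_g$ and standard deviation $v_g$.
3. From these two statistics, obtain the IoU threshold $t_g = m_g + v_g$.
4. Mark a candidate as positive if its IoU is greater than or equal to $t_g$ and its center lies inside $g$; all remaining anchors are negatives.
5. If an anchor box is assigned to more than one ground truth, keep only the assignment with the highest IoU.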
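Steps 1 to 4 can be sketched for a single ground truth in numpy as follows. This is an illustrative reimplementation, not the authors' code: the function names and toy anchors are hypothetical, step 5 is omitted because it needs several ground truths, and `ddof=1` is used to mirror the default (unbiased) `torch.std` in the mmdetection code in Section V.

```python
import numpy as np


def box_iou(boxes, gt):
    """IoU between N boxes and one gt box, all in (x1, y1, x2, y2) format."""
    ix1 = np.maximum(boxes[:, 0], gt[0])
    iy1 = np.maximum(boxes[:, 1], gt[1])
    ix2 = np.minimum(boxes[:, 2], gt[2])
    iy2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_boxes + area_gt - inter)


def atss_assign_single_gt(anchors_per_level, gt, k=9):
    """Boolean mask over all anchors (levels concatenated): True = positive."""
    gt_center = np.array([(gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2])
    all_anchors = np.concatenate(anchors_per_level, axis=0)
    centers = np.stack([(all_anchors[:, 0] + all_anchors[:, 2]) / 2,
                        (all_anchors[:, 1] + all_anchors[:, 3]) / 2], axis=1)

    # Step 1: on each level, take the k anchors whose centers are closest (L2) to the gt center.
    candidates, start = [], 0
    for level_anchors in anchors_per_level:
        end = start + len(level_anchors)
        dist = np.linalg.norm(centers[start:end] - gt_center, axis=1)
        candidates.append(np.argsort(dist)[:k] + start)
        start = end
    candidates = np.concatenate(candidates)  # up to k * L candidates

    # Steps 2-3: IoU statistics of the candidates give t_g = m_g + v_g.
    iou = box_iou(all_anchors[candidates], gt)
    threshold = iou.mean() + iou.std(ddof=1)

    # Step 4: keep candidates with IoU >= t_g whose centers lie inside the gt.
    cx, cy = centers[candidates, 0], centers[candidates, 1]
    inside = (cx > gt[0]) & (cx < gt[2]) & (cy > gt[1]) & (cy < gt[3])
    is_pos = np.zeros(len(all_anchors), dtype=bool)
    is_pos[candidates[(iou >= threshold) & inside]] = True
    return is_pos


gt = np.array([0.0, 0.0, 32.0, 32.0])
level0 = np.array([[8, 8, 24, 24], [16, 8, 32, 24], [40, 40, 56, 56]], dtype=float)
level1 = np.array([[0, 0, 32, 32], [32, 0, 64, 32]], dtype=float)
mask = atss_assign_single_gt([level0, level1], gt, k=2)
print(np.flatnonzero(mask))  # [3]
```

Here the candidate IoUs are 0.25, 0.25, 1.0 and 0.0, so $t_g \approx 0.375 + 0.433 = 0.808$ and only the level-1 anchor that matches the object exactly becomes positive.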
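1. Why select candidates by the Euclidean distance between centers?
For both RetinaNet and FCOS, anchors (or points) closer to the center of the ground truth give better detection results.
2. Why use the mean plus the standard deviation as the IoU threshold?
It lets the threshold adapt to each object. The mean $m_g$ reflects how well the preset anchors fit the object: a high mean means high-quality candidates, so the threshold should be high. The standard deviation $v_g$ reflects which pyramid levels suit the object. A high $v_g$ usually means one FPN level yields much higher IoUs than the others, i.e., that level is particularly suited to this object, so the threshold rises and the final positives all come from that level. A low $v_g$ means several FPN levels suit the object, so positives are selected from multiple levels.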
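A small worked example of this behavior, assuming hypothetical candidate IoUs for one ground truth with $k = 3$ candidates on each of 3 levels (the numbers are invented for illustration, not taken from the paper):

```python
import numpy as np


def atss_threshold(ious):
    """t_g = m_g + v_g over all candidate IoUs (ddof=1, like torch.std's default)."""
    ious = np.asarray(ious, dtype=float)
    return ious.mean() + ious.std(ddof=1)


def positive_levels(case):
    """Return (threshold, sorted levels contributing at least one positive)."""
    ious = np.concatenate([np.asarray(v, dtype=float) for v in case.values()])
    thr = atss_threshold(ious)
    return thr, sorted({lvl for lvl, v in case.items() for x in v if x >= thr})


# Case A: level 2 fits much better than the others -> high std.
case_a = {0: [0.10, 0.12, 0.08], 1: [0.20, 0.18, 0.22], 2: [0.70, 0.65, 0.60]}
# Case B: levels 1 and 2 fit about equally well -> low std.
case_b = {0: [0.35, 0.30, 0.25], 1: [0.55, 0.45, 0.35], 2: [0.55, 0.45, 0.35]}

for name, case in (("A", case_a), ("B", case_b)):
    thr, levels = positive_levels(case)
    print(name, round(thr, 3), levels)
# A 0.572 [2]
# B 0.506 [1, 2]
```

In case A the standard deviation is about 0.255 and every positive comes from level 2; in case B it is about 0.106 and positives come from two levels.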
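3. Why require the anchor box center to lie inside the ground truth?
Anchor boxes whose centers fall outside the ground truth are usually poor candidates: they would predict the object using features from outside it, so they are removed.
4. Is this way of dividing positive and negative samples effective?
By statistical theory, for a normal distribution about 16% of samples lie above $m_g + v_g$. Although the candidate IoUs do not follow a standard normal distribution, in practice each ground truth still receives roughly $0.2 \times kL$ positive samples, regardless of its scale, aspect ratio, and location. In contrast, under the RetinaNet and FCOS assignment strategies larger objects get more positive samples, which is not fair to objects of different sizes.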
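The 16% figure is the upper-tail mass of a normal distribution beyond one standard deviation above the mean. A quick check, assuming normally distributed IoUs for the simulation and the illustrative values $k = 9$, $L = 5$ for the expected count (these are assumptions, not measurements from the paper):

```python
import math

import numpy as np

# Upper-tail mass beyond mean + 1 std: P(X >= mu + sigma) = 0.5 * (1 - erf(1 / sqrt(2)))
tail = 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0)))
print(f"theoretical fraction: {tail:.4f}")  # theoretical fraction: 0.1587

# Apply the same mean + std rule to sample statistics of simulated IoUs.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.3, scale=0.1, size=100_000)
frac = np.mean(samples >= samples.mean() + samples.std())
print(f"empirical fraction: {frac:.4f}")

# Assumed setting: k = 9 candidates per level, L = 5 pyramid levels.
k, num_levels = 9, 5
print(f"expected positives per gt: {tail * k * num_levels:.1f}")  # expected positives per gt: 7.1
```

Because the rule uses per-object statistics, this fraction does not depend on the object's size, which is the fairness argument above.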
5. How should the hyperparameter $k$ be chosen?
The results are not sensitive to the choice of $k$.
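IV. Experimental Validation
1. With ATSS, RetinaNet and FCOS show no obvious performance gap.

2. Results are robust to anchor boxes of different scales and aspect ratios.

3. With ATSS, the number of anchors tiled per location has no clear effect on the results.
4. Overall performance of ATSS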

V. Source Code Implementation
The code below follows the mmdetection implementation:
```python
# Imports as in mmdetection 2.x (mmdet/core/bbox/assigners/atss_assigner.py);
# the relative imports only resolve inside that package.
import torch

from ..builder import BBOX_ASSIGNERS
from ..iou_calculators import build_iou_calculator
from .assign_result import AssignResult
from .base_assigner import BaseAssigner


@BBOX_ASSIGNERS.register_module()
class ATSSAssigner(BaseAssigner):
    """Assign a corresponding gt bbox or background to each bbox.

    Each proposals will be assigned with `0` or a positive integer
    indicating the ground truth index.

    - 0: negative sample, no assigned gt
    - positive integer: positive sample, index (1-based) of assigned gt

    Args:
        topk (int): number of bbox selected in each level
    """

    def __init__(self,
                 topk,
                 iou_calculator=dict(type='BboxOverlaps2D'),
                 ignore_iof_thr=-1):
        self.topk = topk
        self.iou_calculator = build_iou_calculator(iou_calculator)
        self.ignore_iof_thr = ignore_iof_thr

    # https://github.com/sfzhang15/ATSS/blob/master/atss_core/modeling/rpn/atss/loss.py

    def assign(self,
               bboxes,
               num_level_bboxes,
               gt_bboxes,
               gt_bboxes_ignore=None,
               gt_labels=None):
        """Assign gt to bboxes.

        The assignment is done in following steps

        1. compute iou between all bbox (bbox of all pyramid levels) and gt
        2. compute center distance between all bbox and gt
        3. on each pyramid level, for each gt, select k bbox whose center
           are closest to the gt center, so we total select k*l bbox as
           candidates for each gt
        4. get corresponding iou for the these candidates, and compute the
           mean and std, set mean + std as the iou threshold
        5. select these candidates whose iou are greater than or equal to
           the threshold as positive
        6. limit the positive sample's center in gt

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            num_level_bboxes (List): num of bboxes in each level
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        INF = 100000000
        bboxes = bboxes[:, :4]
        num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)

        # compute iou between all bbox and gt
        overlaps = self.iou_calculator(bboxes, gt_bboxes)

        # assign 0 by default
        assigned_gt_inds = overlaps.new_full((num_bboxes, ),
                                             0,
                                             dtype=torch.long)

        if num_gt == 0 or num_bboxes == 0:
            # No ground truth or boxes, return empty assignment
            max_overlaps = overlaps.new_zeros((num_bboxes, ))
            if num_gt == 0:
                # No truth, assign everything to background
                assigned_gt_inds[:] = 0
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = overlaps.new_full((num_bboxes, ),
                                                    -1,
                                                    dtype=torch.long)
            return AssignResult(
                num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

        # compute center distance between all bbox and gt
        gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
        gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
        gt_points = torch.stack((gt_cx, gt_cy), dim=1)

        bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
        bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
        bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)

        distances = (bboxes_points[:, None, :] -
                     gt_points[None, :, :]).pow(2).sum(-1).sqrt()

        if (self.ignore_iof_thr > 0 and gt_bboxes_ignore is not None
                and gt_bboxes_ignore.numel() > 0 and bboxes.numel() > 0):
            ignore_overlaps = self.iou_calculator(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            ignore_idxs = ignore_max_overlaps > self.ignore_iof_thr
            distances[ignore_idxs, :] = INF
            assigned_gt_inds[ignore_idxs] = -1

        # Selecting candidates based on the center distance
        candidate_idxs = []
        start_idx = 0
        for level, bboxes_per_level in enumerate(num_level_bboxes):
            # on each pyramid level, for each gt,
            # select k bbox whose center are closest to the gt center
            end_idx = start_idx + bboxes_per_level
            distances_per_level = distances[start_idx:end_idx, :]
            selectable_k = min(self.topk, bboxes_per_level)
            _, topk_idxs_per_level = distances_per_level.topk(
                selectable_k, dim=0, largest=False)
            candidate_idxs.append(topk_idxs_per_level + start_idx)
            start_idx = end_idx
        candidate_idxs = torch.cat(candidate_idxs, dim=0)

        # get corresponding iou for the these candidates, and compute the
        # mean and std, set mean + std as the iou threshold
        candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]
        overlaps_mean_per_gt = candidate_overlaps.mean(0)
        overlaps_std_per_gt = candidate_overlaps.std(0)
        overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt

        is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]

        # limit the positive sample's center in gt:
        # offset each gt's candidate indices so that they index a flattened
        # (num_gt * num_bboxes) array
        for gt_idx in range(num_gt):
            candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
        ep_bboxes_cx = bboxes_cx.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        ep_bboxes_cy = bboxes_cy.view(1, -1).expand(
            num_gt, num_bboxes).contiguous().view(-1)
        candidate_idxs = candidate_idxs.view(-1)

        # calculate the left, top, right, bottom distance between positive
        # bbox center and gt side
        l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
        t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
        r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
        b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
        is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01
        is_pos = is_pos & is_in_gts

        # if an anchor box is assigned to multiple gts,
        # the one with the highest IoU will be selected.
        # -INF marks (anchor, gt) pairs that are not positive.
        overlaps_inf = torch.full_like(overlaps,
                                       -INF).t().contiguous().view(-1)
        index = candidate_idxs.view(-1)[is_pos.view(-1)]
        overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
        overlaps_inf = overlaps_inf.view(num_gt, -1).t()

        max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
        assigned_gt_inds[
            max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
            pos_inds = torch.nonzero(
                assigned_gt_inds > 0, as_tuple=False).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None
        return AssignResult(
            num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)
```




