DAC-DETR: Divide the Attention Layers and Conquer

Posted Feb 27, 2024 Updated Feb 27, 2024

By Geonu-Lee 3 min read

DAC-DETR: Divide the Attention Layers and Conquer

NeurIPS 2023, 2024-02-27 기준 1회 인용

Task

Object Detection
DETR

Contribution

cross-attention 과 self-attention 이 서로 다른 impact를 주는 것을 발견
서로 다른 역할을 하눈 두 attention 을 효율적으로 학습하기 위한 Divide-And-Conquer DETR (DAC-DETR) 구조를 제안
auxiliary decoder 를 통해서 cross-attetion layers 가 효과적으로 학습되도록 함

Motivation

이미 학습 완료된 Deformable DETR 을 기반으로 분석
Query 선택은 IoU 큰 query 4개를 선택
첫번째 행은 box position, 두번째 행은 feature distribution

(a) 는 decoder 이전의 query 를 visualization

Cross-attention layers tend to gather multiple queries around the same object

(b) 는 self-attention layer 를 빼고 cross-attention layer 만 사용했을 때 결과
(a) 의 query 들보다 object 중심으로 모이게 하는 효과가 있음

-> Cross-attention layers 는 여러 queries 를 같은 object 으로 모이게 하는 영향이 있다

Self-attention layers disperse these queries from each other

(c) 는 self-attention layers, cross-attention layers 모두 사용한 일반 Deformable DETR 결과
한개의 query를 제외하고 object 로부터 멀어지는 효과가 있음, 한개의 query는 object에 더 가까워 진다

-> Self-attention layers 는 single query만 object에 가깝게 하고 나머지는 멀어지게 한다

self-attention layers play a critical role in removing duplicates

이와 같이, self-attention 과 cross-attention 에 의한 영향이 다르니 이를 효과적으로 학습하는 방법을 제안

Proposemd Method

두개의 Decoder 로 구성
각 Decoder 의 첫번째 Layer 로 들어가는 쿼리들은 동일
$Q_O^0 = Q_C^0 = Q$

O-Decoder

Original Decoder (O-Decoder)

기존 Decoder 와 동일하게 self-attention, cross-attention

C-Decoder

focus on the cross-attention 이라서 C-Decoder

self-attention 을 제거하고 cross-attention 만 사용
model weights 는 모두 share

cross-attention layer 역할이 하나의 object 에 여러 쿼리들이 모이게 하는 것이기 때문에 해당 decoder 에 대한 결과는 one-to-many 방식으로 loss 부여

classification score 와 IoU 를 기반으로 matching score 를 연산
두가지 조건을 만족할때만 positive 로 matching 하고 나머지는 negative

matching score 가 threshold 값을 넘을 때
해당 matching score 가 top-k 안으로 들어올때

각 threshold 마다 높은 matching score 갖는 쿼리수를 count

제안하는 방법 DAC-DETR 이 Deformable-DETR 보다 높은 수를 보여준다
이는 object 로 모이는 쿼리수가 많다

$t=0.8$ 일때 Deformable-DETR 은 평균적으로 1이 안넘는다
제안하는 DAC-DETR 은 평균 수가 1을 넘기에 good classification results 와 good IoU 를 갖는 case 가 있다

Experimental Results

ResNet50 backbone 에 대한 결과
제안하는 방법의 결과를 적용했을 때 성능이 향상 된다

$*$ 는 Align-DETR 에서 제안하는 Loss 를 사용

Swin-L backbone 에 대한 결과

Object Detection DETR

This post is licensed under CC BY 4.0 by the author.

Trending Tags

Object Detection DETR Anomaly Detection Generation Diffusion Data Augmentation Segmentation Active Learning Anomaly Localization Contrastive Learning