DETRDistill: A Universal Knowledge Distillation Framework for DETR-families

Posted Jan 9, 2024 Updated Feb 12, 2024

By Geonu-Lee 4 min read

ICCV 2023, cited 5 times as of 2024-01-09

Task

Object Detection
DETR
Knowledge Distillation

Contribution

Proposes a knowledge distillation method for DETR-families, built from three components:
Hungarian-matching logits distillation
Target-aware feature distillation
Query-prior assignment distillation

Teacher: ResNet-101
Student: ResNet-50

When KD is applied to DETR methods in the way this paper proposes, the student reaches performance that surpasses the teacher

Proposed Method

Analyze KD methods designed for convolution-based detectors

When KD methods originally proposed for convolution-based detectors are applied to AdaMixer, performance either drops or barely improves

logits-level distillation methods

In DETR methods, the decoder produces box predictions as an unordered set
This makes it hard to put the teacher's predictions and the student's predictions in one-to-one correspondence

no natural one-to-one correspondence of predicted boxes between teacher and student for a logits-level distillation
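
To make the ordering problem concrete, here is a small sketch (not from the paper). It assumes a hypothetical teacher and student that predict exactly the same boxes, only in a different query order. Comparing them index by index reports a large error, while the best one-to-one matching reports zero. The box values, the `(cx, cy, w, h)` layout, and the function names are illustrative assumptions, and the matching is brute force over permutations, which is only feasible for a handful of boxes.

```python
from itertools import permutations

def l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def index_wise_cost(teacher, student):
    """Compare prediction i of the teacher with prediction i of the student."""
    return sum(l1(t, s) for t, s in zip(teacher, student))

def best_matching_cost(teacher, student):
    """Cost of the best one-to-one matching (brute force, tiny inputs only)."""
    n = len(teacher)
    return min(
        sum(l1(teacher[i], student[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )

# Hypothetical boxes: the student predicts the same set, in a different order.
teacher_boxes = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.3, 0.3), (0.8, 0.2, 0.1, 0.4)]
student_boxes = [teacher_boxes[2], teacher_boxes[0], teacher_boxes[1]]

print(index_wise_cost(teacher_boxes, student_boxes))     # large, although the sets agree
print(best_matching_cost(teacher_boxes, student_boxes))  # zero once the order is resolved
```

An index-wise logits loss would therefore penalize the student even when its set of predictions is identical to the teacher's, which is why a matching step is needed first.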

feature-level distillation methods

generation mechanism being different between convolution and transformer

The mechanism that generates features differs between convolution and transformer

Therefore, directly using previous feature-level KD methods for DETRs may not necessarily bring performance gains

Earlier feature-level KD methods are not well suited to being applied directly to DETR

Overview

Hungarian-matching Logits Distillation

Match the teacher's predictions with the student's predictions

Positive distillation

Since teacher’s positive predictions are target closely related

Match the teacher's positive predictions with the student's positive predictions (the teacher model's outputs serve as pseudo GT)
However, the number of positive predictions from the teacher model is very limited (an image contains about 7 positive boxes on average)
-> In other words, the teacher model's many negative predictions end up being ignored
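
The matching step of positive distillation can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the cost (negative class probability plus weighted L1 box distance), the `box_weight` value, the data, and all function names are hypothetical. The optimal assignment is found by brute force over permutations, which gives the same result as the Hungarian algorithm on tiny inputs; a real implementation would use a Hungarian solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_cost(student_pred, teacher_pred, box_weight=5.0):
    """Assumed cost: low when the student agrees with the teacher's class and box."""
    s_probs, s_box = student_pred
    t_probs, t_box = teacher_pred
    # The teacher's most likely class plays the role of the pseudo GT label.
    t_label = max(range(len(t_probs)), key=t_probs.__getitem__)
    return -s_probs[t_label] + box_weight * l1(s_box, t_box)

def match_student_to_teacher(student_preds, teacher_positives):
    """Assign one distinct student query to each teacher positive prediction.

    Returns (student_index, teacher_index) pairs with minimal total cost.
    Brute force stand-in for the Hungarian algorithm; requires
    len(student_preds) >= len(teacher_positives).
    """
    n, k = len(student_preds), len(teacher_positives)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n), k):
        cost = sum(
            match_cost(student_preds[s], teacher_positives[t])
            for t, s in enumerate(perm)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return [(s, t) for t, s in enumerate(best_perm)]

# Hypothetical data: class probabilities are [cat, dog, background].
teacher_positives = [
    ([0.90, 0.05, 0.05], (0.20, 0.20, 0.10, 0.10)),
    ([0.10, 0.80, 0.10], (0.70, 0.60, 0.20, 0.30)),
]
student_preds = [
    ([0.10, 0.10, 0.80], (0.50, 0.50, 0.50, 0.50)),
    ([0.20, 0.70, 0.10], (0.68, 0.62, 0.20, 0.28)),
    ([0.05, 0.05, 0.90], (0.10, 0.90, 0.10, 0.10)),
    ([0.80, 0.10, 0.10], (0.22, 0.18, 0.10, 0.12)),
]

matches = match_student_to_teacher(student_preds, teacher_positives)
print(matches)  # [(3, 0), (1, 1)]
```

Only two of the four student queries receive a teacher target here, and none of the teacher's negative predictions take part, which mirrors the limitation noted above.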