Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

Posted Apr 3, 2024 Updated Apr 3, 2024

By Geonu-Lee 2 min read


CVPR 2021, cited 973 times as of 2024-04-03

Task

Object Detection

Contributions

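Proposes a purely sparse method for object detection in images
Previous methods are not fully sparse
- Faster R-CNN : factors such as anchor box size and aspect ratio change how dense the candidates are, and the results vary considerably as a consequence; that is, it is sensitive to heuristic assignment rules
- DETR : each of the N object queries has to interact with the global image feature, so it is not purely sparse

DETR also takes a long time to train
Sparse R-CNN achieves better performance with a training time similar to that of existing dense methods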

Proposed Method

Sparse boxes - N learnable proposal boxes ( N x 4 )
Sparse features - N learnable proposal features ( N x C ), with C = 256
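
To make the two learnable inputs concrete, here is a minimal NumPy sketch of their shapes. It is only an illustration: N = 100, the normalized (cx, cy, w, h) box format, the whole-image starting box, and the random feature initialization are assumptions not stated in this post, and in a real model both arrays would be trainable parameters rather than fixed arrays. The helper `cxcywh_to_xyxy` is a hypothetical name used only here.

```python
import numpy as np

N = 100  # number of proposals (assumed value; the post only says "N")
C = 256  # proposal feature dimension, as stated above

rng = np.random.default_rng(0)

# Sparse boxes: (N, 4) in normalized (cx, cy, w, h) format (assumed format).
# Every box starts as the whole image (assumed initialization).
proposal_boxes = np.tile(np.array([0.5, 0.5, 1.0, 1.0], dtype=np.float32), (N, 1))

# Sparse features: (N, C), randomly initialized (assumed initialization).
proposal_features = rng.standard_normal((N, C)).astype(np.float32)


def cxcywh_to_xyxy(boxes, width, height):
    """Convert normalized (cx, cy, w, h) boxes to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes.T
    return np.stack(
        [
            (cx - w / 2) * width,
            (cy - h / 2) * height,
            (cx + w / 2) * width,
            (cy + h / 2) * height,
        ],
        axis=1,
    )


# Pixel-space boxes for a hypothetical 640x480 image.
pixel_boxes = cxcywh_to_xyxy(proposal_boxes, width=640, height=480)
print(proposal_boxes.shape, proposal_features.shape)
```

Both arrays are independent of the input image, which is what allows them to be learned once and shared across all images.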

Dynamic instance interactive head
Uses the learnable proposal features to interact with the RoI features extracted via the sparse boxes

Each RoI feature interacts with its corresponding proposal feature to filter out ineffective bins and output the final object feature

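Training procedure (steps 2-7 are repeated k times)
1. Extract features with the backbone feature extractor
2. Obtain RoI features using the N proposal boxes
3. Apply self-attention to the proposal features → captures the relations between objects
4. Generate the weights of two 1x1 convolutions from the proposal features
5. Pass the RoI features through the two convolutions
6. Pass the result through the classification layer and the bounding-box layer
7. Use the predicted bboxes and proposal features as the input to the next stage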
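
The control flow of the procedure above can be sketched as a simple loop over stages. This is a rough illustration only: `toy_head` is a hypothetical placeholder that stands in for RoI pooling, the interaction, and the prediction layers, and k = 6, the class count, and all tensor sizes are assumed values chosen for the example.

```python
import numpy as np


def toy_head(feature_map, boxes, feats):
    """Hypothetical stand-in for one stage; it does no real computation."""
    num_classes = 3  # assumed value for the sketch
    logits = np.zeros((boxes.shape[0], num_classes), dtype=np.float32)
    new_boxes = boxes.copy()  # a real head would refine the boxes here
    new_feats = feats + 1.0   # a real head would emit updated object features
    return logits, new_boxes, new_feats


def run_stages(feature_map, boxes, feats, heads):
    """Feed each stage's boxes and features into the next stage."""
    outputs = []
    for head in heads:
        logits, boxes, feats = head(feature_map, boxes, feats)
        outputs.append((logits, boxes))
    return outputs, feats


# The backbone runs once; its output is reused by every stage.
feature_map = np.zeros((256, 32, 32), dtype=np.float32)
boxes = np.tile(np.array([0.5, 0.5, 1.0, 1.0], dtype=np.float32), (4, 1))
feats = np.zeros((4, 256), dtype=np.float32)

k = 6  # number of stages (assumed value; the post only says "k")
outputs, final_feats = run_stages(feature_map, boxes, feats, [toy_head] * k)
print(len(outputs), final_feats[0, 0])
```

The point of the sketch is that only the boxes and the per-proposal features are carried from one stage to the next, while the backbone feature map is computed once and shared.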

Pseudo-code of the k-th dynamic instance interaction (steps 3-5)
The weights of two 1x1 convolutions are generated from the proposal features
The generated convolution weights are then applied to the RoI features
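
As a rough illustration of the weight generation and application described above, the following NumPy sketch treats each 1x1 convolution as a per-location matrix multiplication. It is a simplified sketch under stated assumptions, not the authors' implementation: the bottleneck width `d`, the 7x7 pooled size, the single linear generator (`w_gen`, `b_gen`), and the small dimensions are all assumed for the example, and the self-attention step, normalization layers, and final projection are omitted.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def dynamic_instance_interaction(pro_feats, roi_feats, w_gen, b_gen, d):
    """pro_feats: (N, C), roi_feats: (N, S*S, C), w_gen: (C, 2*C*d), b_gen: (2*C*d,)."""
    n, c = pro_feats.shape
    # Each proposal feature generates its own pair of 1x1 conv weights.
    params = pro_feats @ w_gen + b_gen              # (N, 2*C*d)
    param1 = params[:, : c * d].reshape(n, c, d)    # first conv: C -> d
    param2 = params[:, c * d :].reshape(n, d, c)    # second conv: d -> C
    # A 1x1 conv over S*S bins equals a matrix multiply at every bin;
    # np.matmul batches it so proposal i only sees RoI feature i.
    x = relu(np.matmul(roi_feats, param1))          # (N, S*S, d)
    x = relu(np.matmul(x, param2))                  # (N, S*S, C)
    return x


rng = np.random.default_rng(0)
N, C, S, D = 5, 16, 7, 4  # small assumed sizes to keep the example fast
pro_feats = rng.standard_normal((N, C))
roi_feats = rng.standard_normal((N, S * S, C))
w_gen = rng.standard_normal((C, 2 * C * D)) * 0.1
b_gen = np.zeros(2 * C * D)

obj_feats = dynamic_instance_interaction(pro_feats, roi_feats, w_gen, b_gen, D)
print(obj_feats.shape)
```

Because the convolution weights come from the proposal feature itself, each of the N proposals filters its own RoI feature with its own kernels, which is how ineffective bins can be suppressed per instance.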