Beyond 2D image space: exploring 3D geometric variation for visual object recognition

Korean Abstract

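The advent of deep convolutional neural networks has brought dramatic performance gains in visual object recognition. However, existing techniques for encoding geometric invariance in deep convolutional neural networks exploit only transformations in 2D image space. They do not build on the fact that an object in 2D image space is the projection of a 3D object, and so their ability to cope with large changes in camera viewpoint is limited. To overcome this limitation, this dissertation studies the geometric variation of objects in 3D space for visual object recognition from a single 2D image, including object classification, detection, and re-identification. The main contributions of this dissertation are as follows.
First, we propose cylindrical convolutional networks, which exploit a cylindrical representation of a convolutional kernel defined in 3D space. Cylindrical convolutional networks extract view-specific features to predict object classification probabilities at each viewpoint. Using the view-specific features and the proposed sinusoidal soft-argmax module, we simultaneously estimate the object category and the viewpoint. Because training data for joint object detection and viewpoint estimation is limited, we additionally propose a semi-supervised learning scheme that uses only existing object detection data, without viewpoint labels.
In the second part of this dissertation, we propose a method that estimates object variation in 3D space from a single image using only fine-grained object classification image data, without 3D training labels. To this end, we project the object into a 3D canonical space composed of 3D shape and appearance, making the representation independent of camera viewpoint changes. Unlike existing methods that estimate the geometric variation of an object within 2D image space, the proposed method can estimate the object's appearance features in the 3D canonical space, which makes the object classifier robust to 3D geometric variation. In addition, unlike existing methods, the proposed method also uses 3D shape variation to improve the discriminative power of object classification.
Through this dissertation, we confirm that estimating the geometric variation of an object in 3D space improves the discriminability of the object, and that this yields performance improvements over existing methods in visual object recognition, including object detection, viewpoint estimation, fine-grained object recognition, and re-identification. We expect that the techniques proposed here for inferring 3D geometric variation from 2D images can be applied to a variety of computer vision fields in the future.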


Multilingual Abstract

Visual object recognition has recently advanced significantly with the advent of convolutional neural networks (CNNs). However, existing techniques for encoding geometric invariance within CNNs model transformation fields only in 2D image space. They do not account for the fact that objects in 2D image space are projections of 3D ones, and thus have limited ability to handle severe camera viewpoint changes. To overcome this limitation, this dissertation models the geometric variation of an object in 3D space for visual recognition tasks, including object classification, detection, and re-identification, from a single 2D image. The primary contributions of this dissertation are as follows:
First, we introduce a learnable module, cylindrical convolutional networks (CCNs), which exploits a cylindrical representation of a convolutional kernel defined in 3D space. CCNs extract view-specific features through view-specific convolutional kernels to predict object category scores at each viewpoint. With the view-specific features, we simultaneously determine object category and viewpoint using the proposed sinusoidal soft-argmax module. Because training data for joint object detection and viewpoint estimation is rather limited, we propose leveraging existing object detection datasets without viewpoint annotation to enable semi-supervised viewpoint learning.
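The abstract does not spell out how the sinusoidal soft-argmax is computed. As a minimal sketch of one plausible reading, assuming the viewpoint bins are evenly spaced around the circle, the per-bin scores can be turned into softmax weights, used to average the sine and cosine of each bin angle, and mapped back to a continuous angle with `atan2`. The function name, the `temperature` parameter, and the bin layout are illustrative assumptions, not details taken from the dissertation.

```python
import math


def sinusoidal_soft_argmax(scores, temperature=1.0):
    """Return a continuous angle in [0, 2*pi) from per-viewpoint-bin scores.

    Hypothetical helper: assumes bin i is centered at 2*pi*i / n_bins.
    """
    n_bins = len(scores)
    angles = [2.0 * math.pi * i / n_bins for i in range(n_bins)]

    # Numerically stable softmax over the bin scores.
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Average on the unit circle rather than over bin indices, so that
    # bins on either side of the 0 / 2*pi seam are treated as neighbors.
    sin_mean = sum(w * math.sin(a) for w, a in zip(weights, angles))
    cos_mean = sum(w * math.cos(a) for w, a in zip(weights, angles))

    # Map the averaged direction back to an angle in [0, 2*pi).
    return math.atan2(sin_mean, cos_mean) % (2.0 * math.pi)
```

Averaging sines and cosines is what distinguishes this from a plain soft-argmax over bin indices: with eight bins and equal peaks in the first and last bin, the sketch returns an angle midway between them across the seam (15π/8), whereas an index-based expectation would land on the opposite side of the circle.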
In the second part of this dissertation, we propose a novel framework that learns to recover object variation in 3D space from a single image, trained on an image collection of fine-grained object categories without ground-truth 3D annotation. We accomplish this by representing an object as a composition of 3D shape and appearance in a canonical configuration, which eliminates the effect of camera viewpoint variation. Unlike conventional methods that model spatial variation in 2D images only, our method can reconfigure the appearance feature in a canonical 3D space, enabling the subsequent object classifier to be invariant under 3D geometric variation. Our representation also goes beyond existing methods by incorporating 3D shape variation as an additional cue for object recognition. To learn the model without ground-truth 3D annotation, we deploy a differentiable renderer in an analysis-by-synthesis framework. By incorporating 3D shape and appearance jointly in a deep representation, our method learns a discriminative representation of the object.
We demonstrate that modeling the geometric variation of an object in 3D space yields a discriminative representation of the object and achieves competitive performance on visual object recognition tasks, including joint object detection and viewpoint estimation, fine-grained image recognition, and vehicle re-identification. We believe that the techniques proposed in this dissertation for understanding 3D geometric variation from 2D images will provide an essential tool and potentially benefit various computer vision applications.
