RISS Search - Thesis Details

Korean Abstract

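This thesis proposes a detailed image caption model that combines a double embedding technique, which improves sentence expressiveness and prevents image feature vectors from vanishing, with a bidirectional recurrent neural network (Bi-RNN) that generates a sentence order appropriate to the context. In the double embedding technique, Embedding Ⅰ, the word embedding step, vectorizes the caption words of the dataset through one-hot encoding to improve the expressiveness of captions. Embedding Ⅱ fuses image feature vectors with word vectors so that image features do not vanish during caption generation, which prevents sentence components from being omitted. The decoder is composed of a Bi-RNN that acquires vocabulary and image features in both directions and learns a sentence order that fits the context. Finally, the whole-image, sentence expression, and sentence order features obtained through the encoder and decoder are fused in a multimodal layer, a single vector space, to generate detailed captions that account for both sentence order and expressiveness. The proposed model was trained and evaluated on image caption datasets such as Flickr 8K, Flickr 30K, and MSCOCO, and its performance was demonstrated with the objective BLEU and METEOR scores. Compared with three other caption models, the proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points.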
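The data flow described in the abstract (one-hot words, Embedding Ⅰ, Embedding Ⅱ, Bi-RNN decoder, multimodal layer) can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions, not the thesis's implementation: the abstract does not give the layer sizes, the fusion operator, or the RNN cell type, so the dimensions, the use of concatenation for fusion, the plain tanh RNN cell, and every name below (`W_e1`, `W_e2`, `caption_step_probs`, and so on) are hypothetical. Weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are assumptions chosen for illustration only.
V, E, F, H, M = 12, 8, 16, 10, 14  # vocab, word-embedding, image-feature, hidden, multimodal

# Random parameters; a real model would learn these.
W_e1 = rng.normal(0, 0.1, (V, E))       # Embedding I: one-hot word -> dense word vector
W_e2 = rng.normal(0, 0.1, (E + F, E))   # Embedding II: fuses word vector with image feature
W_xf = rng.normal(0, 0.1, (E, H))       # forward RNN input weights
W_hf = rng.normal(0, 0.1, (H, H))       # forward RNN recurrent weights
W_xb = rng.normal(0, 0.1, (E, H))       # backward RNN input weights
W_hb = rng.normal(0, 0.1, (H, H))       # backward RNN recurrent weights
W_m = rng.normal(0, 0.1, (F + E + 2 * H, M))  # multimodal fusion layer
W_o = rng.normal(0, 0.1, (M, V))        # projection to vocabulary logits


def rnn(xs, W_x, W_h):
    """Run a plain tanh RNN over the rows of xs and return all hidden states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states)


def caption_step_probs(word_ids, img_feat):
    """Return a (T, V) matrix of per-position word probabilities for one caption."""
    one_hot = np.eye(V)[word_ids]                    # (T, V) one-hot encoding
    e1 = one_hot @ W_e1                              # (T, E) Embedding I
    img = np.tile(img_feat, (len(word_ids), 1))      # (T, F) image feature repeated per step
    # Embedding II: the image feature is re-injected at every step (assumed: concatenation).
    e2 = np.tanh(np.concatenate([e1, img], axis=1) @ W_e2)   # (T, E)
    h_fwd = rnn(e2, W_xf, W_hf)                      # left-to-right pass
    h_bwd = rnn(e2[::-1], W_xb, W_hb)[::-1]          # right-to-left pass, realigned to positions
    # Multimodal layer: image, sentence expression, and sentence order features in one space.
    fused = np.concatenate([img, e1, h_fwd, h_bwd], axis=1)
    m = np.tanh(fused @ W_m)                         # (T, M)
    logits = m @ W_o
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability for softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


probs = caption_step_probs(np.array([1, 4, 2, 7, 3]), rng.normal(size=F))
print(probs.shape)
```

Because the backward pass reads the whole caption, this sketch scores a given caption rather than generating one word at a time; how the thesis handles generation is not stated in the abstract.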

Multilingual Abstract

This thesis proposes a detailed image caption model that applies a double embedding technique to improve sentence expressiveness and to prevent image feature vectors from vanishing. It uses a bidirectional recurrent neural network (Bi-RNN) to generate a sentence order that fits the context. In the double embedding technique, Embedding Ⅰ is a word embedding step that vectorizes the caption words of the dataset through one-hot encoding to improve the expressiveness of the captions. Embedding Ⅱ fuses image feature vectors with word vectors so that image features do not vanish during caption generation, which prevents sentence components from being omitted. The decoder, composed of a Bi-RNN that acquires vocabulary and image features in both directions, learns a sentence order that fits the context. Finally, the whole-image, sentence expression, and sentence order features obtained through the encoder and decoder are fused in a multimodal layer, a single vector space, to generate a detailed caption that accounts for both sentence order and expressiveness. The proposed model was trained and evaluated on image caption datasets such as Flickr 8K, Flickr 30K, and MSCOCO, and its performance was demonstrated with BLEU and METEOR scores. Compared with three other caption models, the proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points.
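For readers unfamiliar with the BLEU metric used in the evaluation, the following is a minimal sketch of a sentence-level BLEU score: clipped n-gram precision combined with a brevity penalty. It is an illustration only. The abstract does not state the n-gram order, smoothing, or tokenization used in the thesis, so the default `max_n=4`, the absence of smoothing, and the function names `ngrams` and `bleu` are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU in [0, 1] for tokenized inputs."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # For each n-gram, the largest count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words cannot inflate precision.
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


caption = "a dog runs on the grass".split()
print(bleu(caption, [caption]))
```

A candidate identical to its reference scores 1.0, and a candidate that shares no words with any reference scores 0.0.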

Table of Contents

Contents
List of Figures and Tables
Abstract
Chapter 1 Introduction
Chapter 2 Neural Networks and Evaluation Metrics
2.1 Convolutional Neural Network
2.2 Recurrent Neural Network
2.3 Long Short-Term Memory
2.4 Gated Recurrent Unit
2.5 Bidirectional Recurrent Neural Network
2.6 Bi-Lingual Evaluation Understudy
2.7 Metric for Evaluation of Translation with Explicit ORdering
Chapter 3 Proposed Image Caption Model
3.1 Caption Composition Process Using the Double Embedding Technique and Bi-RNN
3.2 Caption Generation Process Using the Multimodal Layer
Chapter 4 Experiments and Results
4.1 Datasets and Preprocessing
4.2 Analysis of Experimental Results
Chapter 5 Conclusion
References

A Study on Image Caption using Double Embedding Technique and Bi-RNN (original title: 이중 임베딩 기법과 Bi-RNN을 이용한 이미지 캡션에 관한 연구)
