The field of NLP (Natural Language Processing) has advanced greatly in recent years with the help of deep learning technology. In particular, BERT-based deep learning models have been in the spotlight due to their outstanding performance. However, existing NLU (Natural Language Understanding) models learn natural language only through contextual information, without considering latent characteristics inherent in natural language that stem from its writers and the environmental factors surrounding them. To take these characteristics hidden behind the context into account, we incorporate the concept of OLAP into NLP: we treat natural language text data as fact attribute values, and the information accompanying the creation of that text, such as time, location, and writer, as dimension attribute values.
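To make this fact/dimension mapping concrete, the sketch below shows one possible representation of such a record in Python. The field names, example values, and the tagged serialization at the end are illustrative assumptions only, not the paper's actual encoding, which is defined by the two methods proposed below.

```python
# A minimal, hypothetical sketch (not from the paper) of the OLAP-style view
# of text records described above: the sentence itself is the fact attribute,
# and creation metadata (time, location, writer) are dimension attributes.
from dataclasses import dataclass

@dataclass
class TextRecord:
    text: str      # fact attribute: the natural language content
    year: int      # dimension attribute: creation time
    location: str  # dimension attribute: where the text was created
    writer: str    # dimension attribute: who wrote it

# An illustrative record, loosely in the style of a DBLP bibliographic entry.
record = TextRecord(
    text="We propose a transformer-based model for semantic role labeling.",
    year=2021,
    location="USA",
    writer="J. Doe",
)

# One conceivable (purely illustrative) way to expose dimension attribute
# values to a BERT-style model: serialize them alongside the fact attribute.
model_input = (
    f"[YEAR={record.year}] [LOC={record.location}] "
    f"[WRITER={record.writer}] {record.text}"
)
print(model_input)
```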
In this paper, we propose two methods that allow BERT-based models to use dimension attribute values when learning from natural language text data. In addition, we introduce DBLP-RC, a record-based general-purpose corpus we built for pre-training our general-purpose NLU model (called OLAP-BERT), as well as two record-based labeled datasets we also built, DBLP-RDfSRL and DBLP-RDfCIC, for fine-tuning the model on an SRL (Semantic Role Labeling) task and a CIC (Citation Intent Classification) task. In experiments with vanilla BERT models as baselines, our OLAP-BERT model outperforms the baselines on both tasks.