The field of NLP (Natural Language Processing) has advanced greatly in recent years with the help of deep learning technology. In particular, BERT-based deep learning models have been in the spotlight due to their outstanding performance. However, existing NLU (Natural Language Understanding) models learn natural language only through contextual information, without considering latent characteristics inherent in natural language that stem from its writers and the environmental factors surrounding them. To take these characteristics hidden behind the context into account, we incorporate the concept of OLAP into NLP: we treat natural language text data as fact attribute values, and the information accompanying the creation of that text, such as time, location, and writer, as dimension attribute values.
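To make this fact/dimension mapping concrete, the sketch below shows one possible representation of such a record in Python. The field names, example values, and the tagged serialization at the end are illustrative assumptions only, not the paper's actual encoding, which is defined by the two methods proposed below.

```python
# A minimal, hypothetical sketch (not from the paper) of the OLAP-style view
# of text records described above: the sentence itself is the fact attribute,
# and creation metadata (time, location, writer) are dimension attributes.
from dataclasses import dataclass

@dataclass
class TextRecord:
    text: str      # fact attribute: the natural language content
    year: int      # dimension attribute: creation time
    location: str  # dimension attribute: where the text was created
    writer: str    # dimension attribute: who wrote it

# An illustrative record, loosely in the style of a DBLP bibliographic entry.
record = TextRecord(
    text="We propose a transformer-based model for semantic role labeling.",
    year=2021,
    location="USA",
    writer="J. Doe",
)

# One conceivable (purely illustrative) way to expose dimension attribute
# values to a BERT-style model: serialize them alongside the fact attribute.
model_input = (
    f"[YEAR={record.year}] [LOC={record.location}] "
    f"[WRITER={record.writer}] {record.text}"
)
print(model_input)
```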
In this paper, we propose two methods that allow BERT-based models to use dimension attribute values when learning from natural language text data. In addition, we introduce DBLP-RC, a record-based general-purpose corpus we built for pre-training our general-purpose NLU model (called OLAP-BERT), as well as two record-based labeled datasets we also built, DBLP-RDfSRL and DBLP-RDfCIC, for fine-tuning the model on an SRL (Semantic Role Labeling) task and a CIC (Citation Intent Classification) task. In experiments with vanilla BERT models as baselines, our OLAP-BERT model outperforms the baselines on both tasks.