Multilingual Abstract

A supervised learning approach is suitable for classifying insulting sentences, but it requires training sentences that have already been labeled as abusive or normal. A character-level convolutional neural network (CNN) is well suited to classifying abusive sentences because it is robust at the level of individual characters; however, it has the drawback of requiring a large number of training sentences. In this paper, we propose a transfer learning method in which the CNN filters are first trained on generated abusive/normal sentence pairs so that they capture the characteristics of offensive words, and the trained filters are then reused when training the actual classifier. This reduces the effects of data shortage and class imbalance and yields higher classifier performance. We conducted experiments and evaluations on three datasets, and the character-level CNN classifier achieved a higher F1-score with transfer learning on all of them.
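The abstract does not specify the network architecture or how the abusive/normal pairs are generated, so the following is only a minimal pure-Python sketch of the idea under stated assumptions. Pairs are built by appending a word from a placeholder offensive-word lexicon to a normal sentence (hypothetical `make_pairs`). The model is a single convolution layer over one-hot characters with max-over-time pooling and a logistic output (hypothetical `CharCNN`). "Transfer" means copying the pretrained filters into a new model whose classifier head is re-initialized (hypothetical `transfer`). None of these names or choices come from the paper.

```python
import copy
import math
import random

PAD = "\0"  # padding symbol so very short inputs still yield one window


def make_pairs(normal_sentences, offensive_words, seed=0):
    """Hypothetical pair generator: each normal sentence (label 0) gets an
    abusive twin (label 1) made by appending a word from a placeholder lexicon."""
    rng = random.Random(seed)
    pairs = []
    for sentence in normal_sentences:
        pairs.append((sentence, 0))
        pairs.append((sentence + " " + rng.choice(offensive_words), 1))
    return pairs


class CharCNN:
    """One-hot characters -> 1-D convolution filters -> max-over-time pooling
    -> logistic output. A toy model, not the architecture used in the paper."""

    def __init__(self, alphabet, num_filters=8, width=3, seed=0):
        rng = random.Random(seed)
        symbols = [PAD] + [ch for ch in alphabet if ch != PAD]
        self.index = {ch: i for i, ch in enumerate(symbols)}
        self.width = width
        # filters[k][j][c]: weight of filter k for character c at offset j
        self.filters = [[[rng.uniform(-0.1, 0.1) for _ in symbols]
                         for _ in range(width)] for _ in range(num_filters)]
        self.filter_bias = [0.0] * num_filters
        self.reset_head(seed + 1)

    def reset_head(self, seed):
        """Re-initialize only the classifier layer on top of the filters."""
        rng = random.Random(seed)
        self.head = [rng.uniform(-0.1, 0.1) for _ in self.filters]
        self.head_bias = 0.0

    def _ids(self, text):
        ids = [self.index[ch] for ch in text if ch in self.index]  # unknown chars dropped
        return ids + [self.index[PAD]] * max(0, self.width - len(ids))

    def forward(self, text):
        ids = self._ids(text)
        pooled, winners = [], []
        for k, filt in enumerate(self.filters):
            best, best_t = -math.inf, 0
            for t in range(len(ids) - self.width + 1):
                s = self.filter_bias[k] + sum(filt[j][ids[t + j]] for j in range(self.width))
                if s > best:
                    best, best_t = s, t
            pooled.append(best)
            winners.append(best_t)
        z = self.head_bias + sum(v * h for v, h in zip(self.head, pooled))
        z = max(-30.0, min(30.0, z))  # clamp the logit to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z)), pooled, winners, ids

    def train(self, data, epochs=20, lr=0.1, freeze_filters=False):
        """Plain SGD on binary cross-entropy; label 1 = abusive, 0 = normal."""
        for _ in range(epochs):
            for text, label in data:
                p, pooled, winners, ids = self.forward(text)
                dz = p - label  # d(loss)/d(logit) for sigmoid + cross-entropy
                for k in range(len(self.filters)):
                    dh = dz * self.head[k]  # gradient reaching pooled feature k
                    self.head[k] -= lr * dz * pooled[k]
                    if not freeze_filters:
                        # Max pooling routes the gradient only to the winning window.
                        t = winners[k]
                        for j in range(self.width):
                            self.filters[k][j][ids[t + j]] -= lr * dh
                        self.filter_bias[k] -= lr * dh
                self.head_bias -= lr * dz


def transfer(source, seed=123):
    """Reuse the pretrained filters in a new model with a fresh classifier head."""
    target = copy.deepcopy(source)
    target.reset_head(seed)
    return target


if __name__ == "__main__":
    pretrain = make_pairs(["good morning", "see you soon"], ["idiot", "moron"])
    real = [("you are an idiot", 1), ("have a nice day", 0)]  # stand-in for labeled data
    alphabet = sorted({ch for text, _ in pretrain + real for ch in text})
    base = CharCNN(alphabet)
    base.train(pretrain)   # step 1: filters adapt to the generated pairs
    model = transfer(base)
    model.train(real)      # step 2: fine-tune on the real training sentences
    print(model.forward("moron")[0])
```

The alphabet should cover the characters of both the generated pairs and the real data, since this sketch silently drops unknown characters. The hypothetical `freeze_filters=True` option keeps the reused filters fixed and trains only the head, which is one common variant of filter reuse; the abstract does not say whether the authors froze or fine-tuned the filters.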

Korean Abstract

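To classify abusive sentences with a supervised learning approach, training sentences labeled as abusive or non-abusive are required. A character-level convolutional neural network is well suited to abuse classification because it is robust at the level of each character, but it has the drawback of requiring a large amount of training data. To address this, this paper proposes a transfer learning method: a classifier based on a convolutional neural network is first trained on arbitrarily generated abusive/non-abusive sentence pairs so that its convolutional filters are tuned to capture the features of abusive language, and these filters are then reused when the classifier is trained on the actual training sentences.
This is expected to reduce the effects of data shortage and class imbalance and thereby improve classification performance. Experiments and evaluations were carried out on three datasets in total, and the classifier based on a character-level convolutional neural network achieved a higher F1 score on every dataset when transfer learning was applied.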
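Both abstracts report results as F1 scores and point to class imbalance as a problem. The abstracts do not state which class is treated as positive or how F1 is averaged, so this short sketch assumes binary F1 with the abusive class (label 1) as positive. It illustrates why F1 is more informative than accuracy when abusive sentences are rare: a classifier that never predicts "abusive" can still reach high accuracy.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive class (assumed here: 1 = abusive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced set: 2 abusive sentences out of 10.
y_true = [1, 1] + [0] * 8
always_normal = [0] * 10
print(accuracy(y_true, always_normal), f1_score(y_true, always_normal))  # 0.8 0.0
partly_right = [1, 0] + [0] * 7 + [1]
print(accuracy(y_true, partly_right), f1_score(y_true, partly_right))    # 0.8 0.5
```

Both toy classifiers have the same accuracy, but only F1 separates the one that detects no abusive sentences from the one that finds half of them.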