RISS Search - Theses and Dissertations

1
An Initialization Method for Model-Based Cluster Analysis Using Kernel Density Estimates: An Application to Corporate Default Big Data

조현주, Kookmin University General Graduate School, Department of Data Science (Data Science major), 2018, Master's thesis (Korea)

The Expectation-Maximization (EM) algorithm is widely used to estimate the parameters of models with hidden structure, such as Gaussian mixture models. However, depending on its initial values, the EM algorithm may converge to a local maximum of the likelihood, and it can take a long time to converge. To address these drawbacks, this study uses the Modal-EM algorithm to find the local modes of a kernel density estimate and then computes the maximum likelihood estimate, using parameters located at those local modes as the initial values. Simulation studies show that the proposed initialization method performs better at finding the global maximum of the likelihood function than existing random initialization methods, namely 10EM, 10CEM-EM, 10em-EM, and SEMmax-EM. Finally, the existing methods and the proposed kernel-based initialization are compared by applying them to real corporate default data.
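The initialization idea above can be illustrated with a minimal one-dimensional sketch. For a Gaussian kernel density estimate with bandwidth h, each Modal-EM step weights every kernel by its share of the density at the current point (E-step) and moves the point to the weighted mean of the data (M-step), which climbs to a local mode. Points that reach the same mode form one cluster, and those clusters seed an ordinary EM fit of a Gaussian mixture. The function names, the fixed bandwidth, the mode-merging tolerance of 0.1h, and the rule for initial variances (spread around each mode, floored at h²) are assumptions made for illustration; the thesis's own multivariate procedure, bandwidth selection, and choice of the number of components are not described in the abstract.

```python
import math

def gauss(x, mu, sd):
    """Density of N(mu, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def modal_em(data, h, tol=1e-6, max_iter=1000):
    """Hill-climb from every data point to a local mode of a 1-D Gaussian KDE.
    Returns (modes, labels); labels[i] is the index of the mode reached from data[i]."""
    modes, labels = [], []
    for x in data:
        for _ in range(max_iter):
            # E-step: weight of each kernel component at the current point
            w = [gauss(x, xi, h) for xi in data]
            # M-step: with Gaussian kernels the maximizer is the weighted mean
            x_new = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
            done = abs(x_new - x) < tol
            x = x_new
            if done:
                break
        # Merge end points closer than 0.1 * h into one mode (assumed tolerance)
        for j, m in enumerate(modes):
            if abs(x - m) < 0.1 * h:
                labels.append(j)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return modes, labels

def init_from_modes(data, modes, labels, h):
    """Assumed rule: weights = cluster sizes, means = modes, variances floored at h^2."""
    n = len(data)
    pi, var = [], []
    for j, m in enumerate(modes):
        pts = [x for x, lab in zip(data, labels) if lab == j]
        pi.append(len(pts) / n)
        var.append(max(sum((x - m) ** 2 for x in pts) / len(pts), h * h))
    return pi, list(modes), var

def em_gmm(data, pi, mu, var, n_iter=100):
    """Standard EM for a 1-D Gaussian mixture, started from the given parameters."""
    pi, mu, var = list(pi), list(mu), list(var)
    k, n = len(mu), len(data)
    for _ in range(n_iter):
        # E-step: responsibilities r[j] = P(component j | x)
        resp = []
        for x in data:
            p = [pi[j] * gauss(x, mu[j], math.sqrt(var[j])) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: closed-form updates of weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-9)
    return pi, mu, var

def log_likelihood(data, pi, mu, var):
    return sum(math.log(sum(p * gauss(x, m, math.sqrt(v)) for p, m, v in zip(pi, mu, var)))
               for x in data)

# Toy data with two well-separated groups; bandwidth chosen by hand
H = 0.8
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 7.0, 7.5, 8.0, 8.5, 9.0]
modes, labels = modal_em(data, H)
pi, mu, var = em_gmm(data, *init_from_modes(data, modes, labels, H))
print(modes, pi, mu, var, log_likelihood(data, pi, mu, var))
```

Because every EM run starts from the modes of the data density rather than from random points, the same start is obtained on every run, which is the property the abstract contrasts with repeated random initialization such as 10EM.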
2
Topic Modeling-Based Book Recommendation Considering Online Purchase Behavior

정영진, Kookmin University General Graduate School, Department of Data Science (Data Science major), 2018, Doctoral dissertation (Korea)

With the development of social media, general users who were once only recipients of information have also become providers of information and knowledge. Paradoxically, the abundance of information makes it harder for users to reach purchase decisions. Recommendation systems, which infer users' preferences to help them find the items they want to buy, have been used to address this information and knowledge overload, but it has repeatedly been questioned whether they adequately reflect what users actually prefer, and much research has sought to improve their performance. In the book market in particular, consumers consider a book's content, its publication date, and its price when making a purchase, yet no previous study has incorporated all of these purchase behaviors. This study therefore proposes a methodology that analyzes book topics by applying topic modeling to purchase data, reflects user characteristics such as price and preference for recent publications, and provides appropriate recommendations to customers in the book market. Book topics are extracted with LDA topic modeling and each user's preferences are inferred; similarities are then computed from these preferences to generate a list of recommendation candidates, which is filtered by the user's preference for recently published books and by their price acceptance to produce the final recommendation list. Experiments verify that the proposed methodology outperforms a conventional collaborative filtering recommender and reduces data sparsity. In addition, using book recency and the price range of the user's past purchases (price acceptance) to build more refined recommendation lists appears to contribute substantially to the performance gain. The proposed system is therefore expected to contribute to research on recommender systems and to have a positive impact on customer satisfaction and business management.
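The pipeline described above (topic-based user preferences, similarity ranking into a candidate list, then recency and price filters) can be sketched as follows. This is a hypothetical illustration, not the thesis's implementation: the LDA topic distributions are supplied by hand rather than estimated, a user's preference vector is assumed to be the mean topic distribution of the books they bought, similarity is assumed to be cosine similarity between that vector and each unpurchased book, price acceptance is taken as the range between the cheapest and most expensive past purchase, and a user is assumed to prefer recent books when at least half of their purchases were published within `recent_window` years.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Book:
    book_id: str
    topics: List[float]  # topic distribution, e.g. from a fitted LDA model (hand-made here)
    year: int            # publication year
    price: int           # price (hypothetical units)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def user_preference(purchased):
    """Hypothetical: mean topic distribution of the user's purchased books."""
    k = len(purchased[0].topics)
    return [sum(b.topics[t] for b in purchased) / len(purchased) for t in range(k)]

def prefers_recent(purchased, current_year, recent_window):
    """Hypothetical rule: at least half of past purchases were recent publications."""
    recent = sum(1 for b in purchased if current_year - b.year <= recent_window)
    return recent >= len(purchased) / 2

def recommend(purchased, catalog, n_candidates=3, current_year=2018,
              recent_window=2, recent: Optional[bool] = None):
    pref = user_preference(purchased)
    owned = {b.book_id for b in purchased}
    # 1) Candidate list: unpurchased books ranked by similarity to the preference vector
    ranked = sorted((b for b in catalog if b.book_id not in owned),
                    key=lambda b: cosine(pref, b.topics), reverse=True)
    candidates = ranked[:n_candidates]
    # 2) Filters: publication recency (only if preferred) and price acceptance
    if recent is None:
        recent = prefers_recent(purchased, current_year, recent_window)
    lo = min(b.price for b in purchased)
    hi = max(b.price for b in purchased)
    return [b.book_id for b in candidates
            if (not recent or current_year - b.year <= recent_window)
            and lo <= b.price <= hi]

purchased = [Book("p1", [0.9, 0.1], 2017, 15000), Book("p2", [0.8, 0.2], 2018, 20000)]
catalog = [
    Book("c1", [0.85, 0.15], 2018, 18000),  # similar, recent, within price range
    Book("c2", [0.9, 0.1], 2005, 16000),    # similar but old
    Book("c3", [0.7, 0.3], 2017, 50000),    # similar but outside price range
    Book("c4", [0.1, 0.9], 2018, 17000),    # dissimilar topics
]
print(recommend(purchased, catalog))               # ['c1']
print(recommend(purchased, catalog, recent=False))  # ['c1', 'c2']
```

Filtering after ranking, rather than before, keeps the topic similarity as the primary signal and uses recency and price only to prune the candidate list, which matches the order of steps given in the abstract.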
3
Korean Sentiment Analysis Using RNNs: Focusing on Online Movie Reviews

이재준, Kookmin University General Graduate School, Department of Data Science (Data Science major), 2018, Master's thesis (Korea)

With the growth of the Internet, online reviews containing other people's opinions and experiences have become easy to access. Because reviews influence actual purchases of products and services, companies can use them as an important part of their marketing strategy. This has motivated research on sentiment analysis, which detects subjective information such as opinions and attitudes expressed in language. Early studies relied on sentiment lexicons that encode sentiment intensity, but deep learning-based sentiment analysis has recently become more common. Deep learning is also used in language models, which learn from a corpus to predict the next word or sentence. Experiments on language models for Korean sentences have used a variety of input units, whereas Korean sentiment analysis has been performed only with morpheme-level input. This thesis performs sentiment analysis that classifies Korean sentences as positive or negative using several kinds of input units. Rather than building the sentiment lexicon used in earlier sentiment analysis, it predicts sentiment solely from the units that make up each sentence. In addition to a morpheme-level model, as in previous studies, it builds phoneme-level models, based on the smallest unit of Hangul, and word-level (eojeol) models, and compares model performance by input unit. The phoneme-level models contain three recurrent layers of one type: LSTM, vanilla RNN, or GRU. The morpheme- and word-level models first convert the corpus into distributed representations with Word2vec and then apply two or three LSTM layers. Performance was evaluated by accuracy and loss, and training time was compared across models. Among the phoneme-level models the LSTM performed best, and for the morpheme- and word-level models the number of LSTM layers did not affect performance. Training time was longest for the phoneme-level model, because splitting sentences into the smallest units makes them longest, and shortest for the word-level model. These experiments, which train sentiment classifiers on Korean sentences with several kinds of input units, are expected to be useful for Korean language processing and speech recognition, areas that have received less research than English.
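The phoneme-level and word-level (eojeol) input units compared above can be produced with plain Unicode arithmetic, without a sentiment lexicon. The sketch below relies on the standard layout of precomposed Hangul syllables (U+AC00 to U+D7A3): each syllable's offset equals (initial × 21 + medial) × 28 + final, so integer division recovers the jamo. The function names are hypothetical, non-Hangul characters such as spaces are kept as their own tokens (an assumption), and morpheme-level segmentation is omitted because it requires a morphological analyzer.

```python
# Compatibility jamo tables in Unicode syllable order
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 medial vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # index 0 = no final
assert len(CHOSEONG) == 19 and len(JUNGSEONG) == 21 and len(JONGSEONG) == 28

SYLLABLE_FIRST, SYLLABLE_LAST = 0xAC00, 0xD7A3

def to_jamo(text):
    """Phoneme-level tokens: split each precomposed Hangul syllable into jamo."""
    tokens = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_FIRST <= code <= SYLLABLE_LAST:
            idx = code - SYLLABLE_FIRST
            tokens.append(CHOSEONG[idx // (21 * 28)])
            tokens.append(JUNGSEONG[(idx % (21 * 28)) // 28])
            if idx % 28:                      # a final consonant is present
                tokens.append(JONGSEONG[idx % 28])
        else:
            tokens.append(ch)                 # keep spaces, digits, punctuation as-is
    return tokens

def to_eojeol(text):
    """Word-level (eojeol) tokens: Korean words are separated by spaces."""
    return text.split()

review = "영화 좋다"
print(to_eojeol(review))                             # ['영화', '좋다']
print(to_jamo(review))
print(len(to_eojeol(review)), len(to_jamo(review)))  # 2 11
```

Here the two-eojeol review "영화 좋다" ("the movie is good") becomes an 11-token jamo sequence, which illustrates why phoneme-level input yields much longer sequences and, consistent with the abstract, longer training times.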