RISS Academic Research Information Service

      KCI listed

      A Study on a Method for Estimating Market Size by Product Group Using Word2Vec

      https://www.riss.kr/link?id=A106635038

      Multilingual Abstract

      With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support rational decision making by business participants.
      Market information such as market size, market growth rate, and market share is essential for setting companies’ business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information.
      In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies’ product information in a bottom-up manner. The overall process is as follows. First, data related to product information are collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are then derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products are summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea’s microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korean Standard Industrial Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies’ sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson’s correlation coefficient was 0.513.
      Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be adjusted easily and efficiently according to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can address unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms.
      The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the product group clustering method can be changed to other types of unsupervised machine learning algorithms. Our gro
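
The bottom-up process described in the abstract (select product names by cosine similarity to an index word, sum their sales, then check the estimates against known market sizes with Pearson's correlation) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors stand in for trained Word2Vec embeddings, and the product names, sales figures, and function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def estimate_market_size(index_vec, product_vecs, sales, threshold):
    """Group products whose similarity to the index word meets the
    threshold, then sum their sales as the group's market size."""
    members = [
        name
        for name, vec in product_vecs.items()
        if cosine(index_vec, vec) >= threshold
    ]
    return members, sum(sales[name] for name in members)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical toy data: hand-made vectors in place of trained embeddings.
product_vecs = {
    "led lamp": [0.9, 0.1, 0.0],
    "led bulb": [0.8, 0.2, 0.1],
    "desk lamp": [0.6, 0.6, 0.0],
    "steel pipe": [0.0, 0.1, 0.9],
}
sales = {"led lamp": 120.0, "led bulb": 80.0, "desk lamp": 50.0, "steel pipe": 300.0}
index_vec = [1.0, 0.0, 0.0]  # stands in for the vector of one index word

narrow_members, narrow_size = estimate_market_size(index_vec, product_vecs, sales, 0.9)
broad_members, broad_size = estimate_market_size(index_vec, product_vecs, sales, 0.7)
```

With this toy data, lowering the threshold from 0.9 to 0.7 admits one more product into the group, which mirrors the abstract's point that the cosine similarity threshold controls how broad a market category is. In practice, the estimated sizes of several product groups would be passed to `pearson` together with their known market sizes.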
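      Journal history

      Date | History type | Details | Listing status
      2027 | Evaluation scheduled | Eligible for re-accreditation evaluation (re-accreditation) |
      2021-01-01 | Evaluation | Listed journal status maintained (re-accreditation) | KCI listed
      2018-01-01 | Evaluation | Listed journal status maintained (listing maintained) | KCI listed
      2015-03-25 | Society name change | English name: unregistered -> Korea Intelligent Information Systems Society | KCI listed
      2015-03-17 | Journal name change | Foreign-language name: unregistered -> Journal of Intelligence and Information Systems | KCI listed
      2015-01-01 | Evaluation | Listed journal status maintained (listing maintained) | KCI listed
      2011-01-01 | Evaluation | Listed journal status maintained (listing maintained) | KCI listed
      2009-01-01 | Evaluation | Listed journal status maintained (listing maintained) | KCI listed
      2008-02-11 | Journal name change | Korean name: 한국지능정보시스템학회 논문지 -> 지능정보연구 | KCI listed
      2007-01-01 | Evaluation | Listed journal status maintained (listing maintained) | KCI listed
      2004-01-01 | Evaluation | Selected as listed journal (second-round candidate) | KCI listed
      2003-01-01 | Evaluation | Passed first-round candidate evaluation (first-round candidate) | KCI candidate
      2001-07-01 | Evaluation | Selected as candidate journal (new evaluation) | KCI candidate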

      Journal citation information

      Base year | WOS-KCI integrated IF (2-year) | KCIF (2-year) | KCIF (3-year)
      2016 | 1.51 | 1.51 | 1.99
      KCIF (4-year) | KCIF (5-year) | Centrality index (3-year) | Immediacy index
      1.78 | 1.54 | 2.674 | 0.38
