RISS Academic Research Information Service

      • KCI-indexed

        Multi-Vector Document Embedding Methodology through Semantic Decomposition of Complex Documents

        박종인,김남규 한국지능정보시스템학회 2019 지능정보연구 Vol.25 No.3

        According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining research has focused on the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to preserve the meaning of words and documents when representing text data as vectors. Unlike structured data, to which a variety of operations and traditional analysis techniques can be applied directly, unstructured text must first be structured into a form a computer can understand. Mapping arbitrary objects into a fixed-dimensional space while preserving their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document as a single vector, is the most widely used. However, traditional document embedding methods represented by doc2Vec generate a vector for each document from all of the words it contains, so the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, which makes it difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods; since keyword extraction is not the core subject of the proposed method, we describe the process of applying the method to documents whose keywords are predefined in the text. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. Specifically, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a keyword-vector set for that document.
        Next, clustering is conducted on each document's keyword-vector set to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector-based traditional approach cannot properly map complex documents because of interference among subjects within each vector. With the proposed multi-vector-based method, we ascertained that complex documents ...

        To support diverse analyses of text data, methods for structuring unstructured text have recently been studied actively. Existing document embedding methods, represented by doc2Vec, build a vector from every word a document contains, so the document vector is influenced by peripheral words as well as core words. Moreover, because existing methods represent each document as a single vector, it is difficult to accurately map a complex document that combines multiple subjects. To overcome these two limitations, this paper proposes a new multi-vector document embedding methodology: it vectorizes a document using only its core words rather than all of its words, and decomposes the subjects a document contains so that one document is represented as a set of multiple vectors. Experiments on 3,147 papers collected from KISS confirmed the vector distortion that occurs when a complex document is expressed as a single vector, and showed that the proposed methodology, which semantically decomposes a complex document into multiple vectors, corrects this distortion and embeds each document more accurately.
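        The five-step pipeline in this abstract can be pictured with a short sketch. The snippet below is a minimal illustration, assuming gensim's Word2Vec for step (2) and k-means for step (4); the toy corpus, keyword lists, and cluster count are invented for the example and are not the authors' data or settings.

```python
# Minimal sketch: word embedding -> keyword vector extraction ->
# keyword clustering -> multi-vector generation.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

docs = [["text", "mining", "document", "embedding", "vector", "cluster"],
        ["network", "node", "link", "mapping", "substrate", "virtual"]]
keywords = [["embedding", "vector", "cluster"], ["node", "link", "substrate"]]

# (2) Word embedding: every token becomes an N-dimensional real vector.
model = Word2Vec(sentences=docs, vector_size=50, min_count=1, seed=1)

multi_vectors = []
for kw in keywords:
    # (3) Keyword vector extraction: keep only keyword vectors, so
    # miscellaneous words cannot pull the document representation around.
    vecs = np.array([model.wv[w] for w in kw])
    # (4) Keyword clustering: each cluster approximates one subject.
    k = min(2, len(vecs))
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(vecs)
    # (5) Multi-vector generation: one centroid vector per subject cluster.
    multi_vectors.append([vecs[labels == c].mean(axis=0) for c in range(k)])

print(len(multi_vectors[0]), "subject vectors for the first document")
```

        Collapsing each centroid list back to a single mean vector would reproduce the one-vector-per-document baseline the paper argues against.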

      • KCI-indexed

        Virtual Network Embedding based on Node Connectivity Awareness and Path Integration Evaluation

        ( Zhiyuan Zhao ),( Xiangru Meng ),( Yuze Su ),( Zhentao Li ) 한국인터넷정보학회 2017 KSII Transactions on Internet and Information Systems Vol.11 No.7

        As a main challenge in network virtualization, the virtual network embedding problem is increasingly important, and heuristic algorithms for it are of great interest. Aiming at the problems of poor correlation between node embedding and link embedding, long distances between adjacent virtual nodes, and imbalanced resource consumption of network components during embedding, we herein propose a two-stage virtual network embedding algorithm, NA-PVNM. In the node embedding stage, resource requirements and a breadth-first search algorithm are introduced to sort the virtual nodes, and a node fitness function is developed to find the best substrate node. In the link embedding stage, a path fitness function is developed to find the best path, considering available bandwidth, CPU, and path length. Simulation results showed that the proposed algorithm shortens the link embedding distance and increases the acceptance ratio and revenue-to-cost ratio compared to previously reported algorithms. We also analyzed the impact of position constraints and substrate network attributes on algorithm performance, as well as the utilization of substrate network resources during embedding. The results showed that, under the constraints of substrate resource distribution and virtual network requests, the critical factor in improving the success ratio is reducing resource consumption during embedding.
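        The abstract names two fitness functions but not their exact form. The toy sketch below, under assumed weightings (not the paper's NA-PVNM formulas), shows how such scores steer the two stages:

```python
# Toy fitness scores for the two embedding stages. The weightings are
# illustrative assumptions, not the published NA-PVNM formulas.
def node_fitness(cpu_free, adjacent_bw, dist_to_mapped_neighbors):
    # Favor well-resourced substrate nodes close to already-mapped
    # neighbors, which shortens later link embeddings.
    return cpu_free * adjacent_bw / (1.0 + dist_to_mapped_neighbors)

def path_fitness(min_bw_on_path, min_cpu_on_path, hop_count):
    # Prefer short paths that still leave bandwidth and CPU headroom,
    # balancing resource consumption across network components.
    return (min_bw_on_path + min_cpu_on_path) / hop_count

# Pick the best substrate node among (cpu_free, adjacent_bw, distance) tuples.
candidates = [(40.0, 120.0, 2.0), (60.0, 80.0, 5.0), (55.0, 100.0, 1.0)]
print("best node:", max(candidates, key=lambda c: node_fitness(*c)))
```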

      • KCI-indexed

        Reversible Data Hiding Scheme Using Turbo Code

        Vidya Sawant,Archana Bhise 대한전자공학회 2019 IEIE Transactions on Smart Processing & Computing Vol.8 No.6

        A reversible data hiding scheme provides security for a secret message by embedding it in a cover image. After extracting the secret message at the receiver, the scheme restores the original message and the cover image without any distortion. However, data transmitted by a reversible data hiding scheme is vulnerable to noise on the communications channel. A reversible data hiding scheme using turbo code is proposed in this paper to address that problem. The proposed scheme integrates modified matrix embedding, turbo code, and syndrome embedding to provide improved embedding capacity, security, and error correction. Modified matrix embedding hides the secret message in the cover image. The resulting stego-image is then encrypted and encoded by the turbo code. The turbo code deploys a secret-key interleaver that shuffles the incoming bits using elliptic curve arithmetic and a secret key. Exact recovery of the cover image and the secret message by the receiver depends on the user-specific secret key. Modified matrix embedding hides a large number of secret message bits, which distorts the stego-image; however, encrypting the stego-image with the secret-key interleaver makes the embedding invisible. The proposed syndrome embedding transmits the locations of the modified bits in the cover image, enabling the receiver to reverse the effects of embedding. Simulation results with the proposed scheme demonstrate a high embedding capacity and improved error-correction performance, with an error rate of 10^-4 on an additive white Gaussian noise (AWGN) channel at an SNR of 2 dB. The scheme also recovers the exact cover image and secret message, unlike existing schemes such as matrix embedding and reversible data hiding in the encrypted domain. Moreover, the results show high resistance to brute force, statistical, noise, and cropping attacks.
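        For readers unfamiliar with matrix embedding, the sketch below shows the classic [7,4] Hamming variant, which hides three message bits in seven cover bits while flipping at most one of them; the paper's modified matrix embedding, turbo coding, and secret-key interleaver are not reproduced here.

```python
# Classic Hamming matrix embedding: embed 3 bits into 7 cover bits by
# flipping at most one bit; extraction reads the syndrome.
import numpy as np

# Parity-check matrix: column i (1..7) is the binary representation of i.
H = np.array([[(i >> b) & 1 for i in range(1, 8)] for b in (2, 1, 0)])

def embed(cover7, msg3):
    # Flip at most one cover bit so the syndrome equals the message bits.
    s = (H @ cover7) % 2
    diff = s ^ msg3
    stego = cover7.copy()
    if diff.any():
        # The column of H equal to `diff` marks the single bit to flip.
        idx = int("".join(map(str, diff)), 2) - 1
        stego[idx] ^= 1
    return stego

def extract(stego7):
    return (H @ stego7) % 2

cover = np.array([1, 0, 1, 1, 0, 0, 1])
msg = np.array([1, 0, 1])
stego = embed(cover, msg)
assert (extract(stego) == msg).all()
print("bits changed:", int((cover != stego).sum()))  # at most 1
```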

      • Named Entity Recognition using Word Embedding as a Feature

        Miran Seok,Hye-Jeong Song,Chan-Young Park,Jong-Dae Kim,Yu-seop Kim 보안공학연구지원센터(IJSEIA) 2016 International Journal of Software Engineering and Its Applications Vol.10 No.2

        This study applied word embedding as a feature for named entity recognition (NER) training, using CRF as the learning algorithm. Named entities are phrases that contain the names of persons, organizations, and locations, and recognizing these entities in text is one of the important tasks of information extraction. Word embedding, in which the words of a sentence are mapped to real-valued vectors in a low-dimensional space, is helpful in many NLP learning algorithms. We used GloVe, Word2Vec, and CCA as the embedding methods. The Reuters Corpus Volume 1 was used to create the word embeddings, and the CoNLL 2003 shared task corpus (English) was used for training and testing. Comparing the performance of the embedding techniques for NER, we found that CCA (85.96%) performed best on Test A and Word2Vec (80.72%) on Test B. Using word embedding as a feature for NER yields better results than a baseline that does not use word embedding. Also, to verify that the word embeddings performed well, we ran an additional experiment calculating the similarity between words.
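        A minimal sketch of this feature setup, assuming gensim Word2Vec vectors and the sklearn-crfsuite package; the toy sentences, labels, and feature layout are invented stand-ins for the CoNLL-2003 data:

```python
# Word vectors as real-valued CRF features for NER, next to simple
# surface features.
from gensim.models import Word2Vec
import sklearn_crfsuite

sents = [["John", "lives", "in", "Paris"], ["Mary", "visited", "Seoul"]]
tags = [["B-PER", "O", "O", "B-LOC"], ["B-PER", "O", "B-LOC"]]

wv = Word2Vec(sentences=sents, vector_size=10, min_count=1, seed=1).wv

def features(word):
    # Each embedding dimension becomes one numeric CRF feature.
    feats = {"lower": word.lower(), "is_title": word.istitle()}
    feats.update({f"emb_{i}": float(v) for i, v in enumerate(wv[word])})
    return feats

X = [[features(w) for w in s] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X)[0])
```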

      • KCI-indexed

        Conjunction and Embedding

        유현경 국어학회 2011 국어학 Vol.60 No.-

        The aim of this research is to reconsider the implications of conjunction and embedding in Korean syntax by examining the basic concepts related to the two grammatical categories and the syntactic issues they share. In the second chapter, I review the concepts, grammatical morphemes, and linguistic units relating to conjunction and embedding. In the third chapter, the syntactic issues of conjunction and embedding are discussed. First, this paper covers the problem of how to regard the structure of conjunction and embedding. The second issue is the relationship between conjunction and embedding, focusing on adverbial clauses and subordinate conjunctive sentences. In Korean, complex sentences are constructed through conjunction and embedding, and various syntactic phenomena arise depending on the relationship between the two clauses that form the complex sentence. The final issue concerns these syntactic phenomena of complex sentences. The debate on conjunction and embedding remains open because many problems in the syntax of complex sentences still need new interpretations. Descriptions of Korean grammar have so far been based on the syntax of simple sentences; discussion of the syntactic phenomena found in complex sentences will enable Korean grammar to flourish.

      • KCI-indexed

        Virtual Network Embedding through Security Risk Awareness and Optimization

        ( Shuiqing Gong ),( Jing Chen ),( Conghui Huang ),( Qingchao Zhu ),( Siyi Zhao ) 한국인터넷정보학회 2016 KSII Transactions on Internet and Information Systems Vol.10 No.7

        Network virtualization promises to play a dominant role in shaping the future Internet by overcoming the Internet ossification problem. However, because additional virtualization layers are injected into the network architecture, network virtualization introduces several new security risks. Although traditional protection mechanisms can help in a virtualized environment, they are not guaranteed to succeed and may incur high security overheads. By performing virtual network (VN) embedding in a security-aware way, the risks exposed to both the virtual and substrate networks can be minimized, and the additional techniques adopted to enhance network security can be reduced. Unfortunately, existing embedding algorithms largely ignore these widespread security risks, making their applicability in a realistic environment rather doubtful. In this paper, we address the security risks by integrating security factors into VN embedding. We first abstract the security requirements and protection mechanisms into numerical concepts of security demands and security levels, and introduce the corresponding security constraints into VN embedding. Based on this abstraction, we develop three security-risky modes that model various levels of risk in the virtualized environment, enabling more flexible VN embedding. We then present a mixed integer linear programming formulation for the VN embedding problem in the different security-risky modes. Moreover, we design three heuristic embedding algorithms to solve the problem, all based on the same proposed node-ranking approach, which quantifies the embedding potential of each substrate node, and the k-shortest path algorithm for mapping virtual links. Simulation results demonstrate the effectiveness and efficiency of our algorithms.
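        A hedged sketch of the two shared ingredients the abstract names, a security-constrained node ranking and k-shortest-path link mapping, using networkx; the ranking formula and node attributes are illustrative assumptions, not the paper's model:

```python
# Security-constrained node ranking plus k-shortest-path candidates
# for link mapping (k = 2 here).
import itertools
import networkx as nx

G = nx.Graph()
G.add_node("a", cpu=50, sec_level=3)
G.add_node("b", cpu=80, sec_level=1)
G.add_node("c", cpu=60, sec_level=2)
G.add_edges_from([("a", "b", {"bw": 100}), ("b", "c", {"bw": 40}),
                  ("a", "c", {"bw": 70})])

def rank(n, demand_sec=2):
    # A substrate node is eligible only if its security level covers the
    # virtual node's security demand; rank eligible nodes by CPU times
    # adjacent bandwidth as a rough embedding-potential score.
    if G.nodes[n]["sec_level"] < demand_sec:
        return -1.0
    return G.nodes[n]["cpu"] * sum(d["bw"] for _, _, d in G.edges(n, data=True))

best = max(G.nodes, key=rank)
paths = list(itertools.islice(nx.shortest_simple_paths(G, "a", "c"), 2))
print("best node:", best, "candidate paths:", paths)
```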

      • KCI-indexed

        Word Elimination Based on Information Gain and Similarity, and Selective Word Embedding for Sentence Classification

        이민석,양석우,이홍주 한국지능정보시스템학회 2019 지능정보연구 Vol.25 No.4

        Dimensionality reduction is one of the methods for handling big data in text mining, and the density of the data has a significant influence on sentence classification performance. Higher-dimensional data require more computation, which can lead to high computational cost and overfitting in the model; a dimension reduction step is therefore necessary to improve model performance. Diverse methods have been proposed, from merely reducing noise in the data, such as misspellings and informal text, to incorporating semantic and syntactic information. In addition, how text features are represented and selected affects classifier performance in sentence classification, one of the fields of natural language processing. The common goal of dimension reduction is to find a latent space that represents the raw data from the observation space. Existing methods employ various algorithms for dimensionality reduction, such as feature extraction and feature selection. Beyond these algorithms, word embeddings, which learn low-dimensional vector-space representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, we assume that words similar to them also have little impact on sentence classification. This study proposes two ways to achieve more accurate classification, both of which perform selective word elimination under specific rules and construct word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to find similar words. First, we eliminate words with comparatively low information gain values from the raw text and build the word embedding. Second, we additionally select words similar to those with low information gain values, eliminate them as well, and build the word embedding. Finally, the filtered text and word embeddings are fed to deep learning models: a Convolutional Neural Network and an attention-based bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB movie reviews, and Yelp user reviews as datasets, and classifies each dataset with the deep learning models. Reviews that received more than five helpful votes with a helpful-vote ratio over 70% were classified as helpful reviews. Yelp only shows the number of helpful votes, so we randomly sampled 100,000 reviews with more than five helpful votes from among 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance against Word2Vec and GloVe embeddings built from all of the words. One of the proposed methods outperformed the embeddings that use all words: removing unimportant words improves performance, although removing too many words lowers it.
        For future research, diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity between words should be considered. Also, we applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be applied, making it possible to explore combinations of word embedding methods and elimination methods.

        In sentence classification, which determines whether text data belongs to a particular category, how sentence features are represented and which features are selected strongly affect classifier performance. The goal of feature selection is to find a representation that still describes the data well after dimensionality reduction. Various approaches have been proposed, such as selecting features with the Fisher score or information gain algorithms, or reducing dimensionality by representing words as vectors learned with a Word2Vec model that captures contextual meaning and syntactic information. Modifying word embeddings according to pre-defined positive and negative word scores has also been attempted. This study proposes methods that perform selective word elimination and then apply embedding to improve sentence classification accuracy: one removes words with low information gain values from the text data and applies word embedding; the other additionally selects neighboring words with high cosine similarity to the low-information-gain words, removes them from the text data as well, and rebuilds the word embedding. The data used were customer reviews of the Kindle product on Amazon.com, movie reviews from IMDB, and user reviews from Yelp. An Amazon.com review was judged helpful if it received at least five helpful votes and the ratio of helpful votes was at least 70%. For Yelp, 100,000 reviews were randomly sampled from about 750,000 reviews with at least five helpful votes. The deep learning models used for training were a CNN and an attention-based bidirectional LSTM, and the word embeddings were Word2Vec and GloVe. We tested for statistical significance by comparing Word2Vec and GloVe embeddings applied without word removal against the proposed selective word removal with Word2Vec embedding.
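        The two elimination schemes can be condensed into a short sketch, assuming scikit-learn's mutual information as the information gain measure and gensim Word2Vec for cosine similarity; the toy reviews and thresholds are invented for illustration:

```python
# Scheme 1: drop low-information-gain words. Scheme 2: also drop their
# nearest cosine neighbors. Then re-train Word2Vec on the filtered text.
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

texts = ["great book loved it", "terrible plot wasted time",
         "loved the story", "wasted money terrible"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)
ig = mutual_info_classif(X, labels, discrete_features=True, random_state=1)
vocab = vec.get_feature_names_out()
low_ig = {w for w, g in zip(vocab, ig) if g < 0.1}   # scheme 1

w2v = Word2Vec([t.split() for t in texts], vector_size=20, min_count=1, seed=1)
# Scheme 2: also remove words highly similar to a low-IG word.
neighbors = {n for w in low_ig
             for n, sim in w2v.wv.most_similar(w, topn=2) if sim > 0.2}
to_remove = low_ig | neighbors

filtered = [[w for w in t.split() if w not in to_remove] for t in texts]
w2v_filtered = Word2Vec(filtered, vector_size=20, min_count=1, seed=1)
print("removed:", sorted(to_remove))
```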

      • SCOPUS, KCI-indexed

        Energy-Aware Virtual Data Center Embedding

        ( Xiao Ma ),( Zhongbao Zhang ),( Sen Su ) 한국정보처리학회 2020 Journal of information processing systems Vol.16 No.2

        As one of the most significant challenges in the virtual data center, virtual data center embedding has attracted extensive attention from researchers. Existing research mainly focuses on designing algorithms that increase operating revenue, but it ignores the energy consumption of the physical data center during virtual data center embedding. In this paper, we study the energy-aware virtual data center embedding problem. Specifically, we first propose an energy consumption model, comprising models for the virtual machine node and the virtual switch node, to quantitatively measure the energy consumed in virtual data center embedding. Based on this model, we propose two embedding algorithms: one heuristic, the other based on particle swarm optimization. The second algorithm finds better embeddings by leveraging the evolutionary process of particle swarm optimization. Finally, experimental results show that our algorithms effectively save energy while guaranteeing the embedding success rate.
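        As a rough illustration of the particle-swarm idea, the sketch below treats each particle as a VM-to-server assignment and minimizes a toy energy model; the power constants, swarm settings, and discrete update rule are assumptions, not the paper's formulation:

```python
# Toy discrete PSO: particles are VM-to-server assignments; fitness is
# modeled energy, so packing VMs onto fewer servers wins.
import random

SERVERS, VMS = 4, 6
IDLE, PER_VM = 100.0, 30.0   # assumed idle and per-VM power draw

def energy(assignment):
    # Powered-on servers pay an idle cost plus a per-VM cost.
    return len(set(assignment)) * IDLE + len(assignment) * PER_VM

random.seed(1)
swarm = [[random.randrange(SERVERS) for _ in range(VMS)] for _ in range(20)]
best = min(swarm, key=energy).copy()
for _ in range(50):
    for p in swarm:
        # Discrete "velocity" step: pull one VM toward the global best,
        # otherwise explore a random server.
        i = random.randrange(VMS)
        p[i] = best[i] if random.random() < 0.7 else random.randrange(SERVERS)
    cand = min(swarm, key=energy)
    if energy(cand) < energy(best):
        best = cand.copy()
print("best assignment:", best, "energy:", energy(best))
```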

      • KCI-indexed

        Song Generation Using Note Embedding and Bar Embedding, and a Quantitative Evaluation Method

        이영배 ( Young-bae Lee ),정성훈 ( Sung Hoon Jung ) 한국정보처리학회 2021 정보처리학회논문지. 소프트웨어 및 데이터 공학 Vol.10 No.11

        To learn existing songs and create new ones with an artificial neural network, a song must first be converted, as a preprocessing step, into numerical data the network can recognize, and one-hot encoding has been used for this until now. In this paper, we propose a note embedding method that uses the note as its basic unit and a bar embedding method that uses the bar, and compare their performance with existing one-hot encoding. The comparison uses quantitative evaluation methods from the field of natural language processing to determine which method produces songs most similar to those written by the composer. In the evaluation, songs created with bar embedding were the best, followed by note embedding. This is significant in that the note embedding and bar embedding proposed in this paper create songs more similar to the composer's than existing one-hot encoding does.
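        As a loose analogy to the proposed encodings, the sketch below trains skip-gram embeddings over note tokens and over bar tokens, the two units the paper compares against one-hot encoding; the token format and toy melodies are invented, and the paper's neural song generator is not reproduced:

```python
# Notes vs. bars as the embedding unit, trained skip-gram style.
from gensim.models import Word2Vec

# Note-level tokens: "pitch_duration"; bar-level tokens: one symbol per bar.
note_seqs = [["C4_q", "E4_q", "G4_h", "C5_q"], ["C4_q", "G4_h", "E4_q"]]
bar_seqs = [["bar:C4_q|E4_q", "bar:G4_h|C5_q"], ["bar:C4_q|G4_h", "bar:E4_q"]]

note_model = Word2Vec(note_seqs, vector_size=16, min_count=1, sg=1, seed=1)
bar_model = Word2Vec(bar_seqs, vector_size=16, min_count=1, sg=1, seed=1)

# Unlike one-hot codes, nearby embeddings reflect co-occurrence in melodies,
# which is what lets a generator produce more composer-like continuations.
print(note_model.wv.most_similar("C4_q", topn=2))
```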

      • KCI-indexed

        Local Bilingual Embedding of a Korean-English Law Corpus

        최순영,Andrew Stuart Matteson,임희석 한국융합학회 2018 한국융합학회논문지 Vol.9 No.10

        Recently, studies about bilingual word embedding have been gaining much attention. However, bilingual word embedding with Korean is not actively pursued due to the difficulty of obtaining a sizable, high-quality parallel-aligned corpus, and local bilingual word embeddings that can be applied to specific domains are relatively rare. Additionally, multi-word vocabulary is problematic due to the lack of one-to-one word-level correspondence in translation pairs. In this paper, we crawl 868,163 paragraphs from a Korean-English law corpus and propose three mapping strategies for word embedding. These strategies address the aforementioned issues, including multi-word translation, and improve translation pair quality on paragraph-aligned data. We demonstrate a twofold increase in translation pair quality compared to the global bilingual word embedding baseline.
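        One common way to link two monolingual spaces, shown here as a hedged sketch, is a least-squares translation matrix over seed pairs (Mikolov-style); the paper's three paragraph-level mapping strategies are not reproduced, and the random vectors and seed pairs below are toy assumptions:

```python
# Learn a linear map W from Korean vectors to English vectors over seed
# translation pairs, then translate by nearest cosine neighbor.
import numpy as np

rng = np.random.default_rng(1)
ko = {w: rng.normal(size=50) for w in ["법률", "계약", "판결"]}
en = {w: rng.normal(size=50) for w in ["law", "contract", "ruling"]}
pairs = [("법률", "law"), ("계약", "contract"), ("판결", "ruling")]

X = np.stack([ko[k] for k, _ in pairs])        # source-side matrix
Y = np.stack([en[e] for _, e in pairs])        # target-side matrix
W, *_ = np.linalg.lstsq(X, Y, rcond=None)      # minimize ||XW - Y||

def translate(word):
    # Map a Korean vector into the English space, return nearest neighbor.
    q = ko[word] @ W
    sims = {e: q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
            for e, v in en.items()}
    return max(sims, key=sims.get)

print(translate("계약"))
```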
