RISS (Academic Research Information Service)

      • Clustering analysis for nonlinear and noisy patterns in high dimensional data

        유재홍 Graduate School, Korea University 2018 Doctoral dissertation (domestic)


        Many modern industries generate an unprecedented wealth of data because of recent rapid improvements in technology for sensing, accumulating, and storing data. Data mining algorithms facilitate the discovery of useful concepts or information from these huge amounts of data. Among the numerous data mining techniques available, unsupervised learning methods have attracted particular interest in recent years. Among unsupervised learning approaches, clustering analysis is one of the most widely used for extracting hidden patterns from data and eliciting their natural groupings. Clustering analysis systematically partitions a data set by minimizing within-group variation and maximizing between-group variation. Clustering techniques have been applied in various industrial fields, including manufacturing, text categorization, image segmentation, and biomedicine. Obtaining superior results in clustering analysis requires appropriate clustering algorithms and distance measures. Although most existing clustering techniques perform reasonably well in the situations for which they were designed, no consensus exists about which one is the best all-around performer in real-life situations. In real-world settings, clustering analysis encounters several obstacles in the data structure, including nonlinearity and locality, noisy patterns, and high dimensionality. To address these structural issues, this thesis aims to establish unsupervised learning methods that yield more robust clustering analysis. First, a graph-based clustering algorithm built on a novel density-of-graph structure is proposed. Numerous researchers have recently focused on graph-based clustering algorithms because the graph structure is useful for modeling local relationships among observations, which allows these algorithms to discover nonlinear and locally patterned clusters. However, no consensus exists about which algorithm best satisfies all the conditions encountered in real-world situations. In the proposed algorithm, a density coefficient defined for each node is used to classify dense and sparse nodes; the main structures of the clusters are identified through the dense nodes, and the sparse nodes are then assigned to specific clusters. Experiments on various simulation and benchmark data sets were conducted to examine the properties of the proposed algorithm and to compare its performance with that of existing spectral clustering and modularity-based algorithms. The results demonstrated that the proposed algorithm performed better than its competitors, especially when the cluster structures in the data were inherently noisy and nonlinearly distributed. To ensure satisfactory results from clustering analysis, an appropriate distance measure should also be used. Despite its significance, relatively few studies have examined which distance measures are most effective. Recently, geodesic distance has been widely applied in clustering algorithms for nonlinear and locally patterned groupings. However, geodesic distance is sensitive to noisy patterns; hence, geodesic distance-based clustering may fail to identify nonlinear and locally patterned clusters in noisy regions. To overcome this sensitivity, this thesis proposes a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark data sets were conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The results confirmed that a clustering algorithm using the proposed distance measure outperformed its competitors, especially when the cluster structures in the data were inherently noisy and nonlinearly patterned. Finally, this thesis proposes a feature ranking method to address high dimensionality. Feature ranking is a widely used feature selection approach that evaluates features with importance scores and selects those with high scores. Conventional unsupervised feature ranking methods do not consider information on cluster structures and therefore may fail to select the features relevant to clustering analysis. To address this limitation, a feature ranking algorithm based on silhouette decomposition is proposed. The algorithm calculates ensemble importance scores by decomposing the silhouette statistics of random subspaces, so that the contribution of a feature to generating cluster structures is represented more clearly. Experiments on different benchmark data sets were conducted to examine the properties of the proposed algorithm and to compare it with existing ensemble-based feature ranking methods; the proposed algorithm outperformed its existing counterparts.
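The density-based geodesic distance itself is not specified in the abstract, but the underlying idea of measuring distance along a neighborhood graph can be sketched as below. This is a minimal illustration only, assuming a toy two-moons data set, a 7-nearest-neighbour graph, and plain (not density-weighted) shortest-path distances fed to an off-the-shelf distance-based clusterer.

```python
# Sketch: geodesic (graph shortest-path) distances as input to a clusterer.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path
from sklearn.cluster import AgglomerativeClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-nearest-neighbour graph with Euclidean edge lengths
knn = kneighbors_graph(X, n_neighbors=7, mode="distance")

# geodesic distance between two points = length of the shortest path in the graph
D = shortest_path(knn, method="D", directed=False)
D[np.isinf(D)] = D[np.isfinite(D)].max() * 10.0   # bridge any disconnected components

# any clusterer that accepts a precomputed distance matrix can consume D
labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                 linkage="average").fit_predict(D)
```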

      • New criterion on portfolio selection based on trimmed clusters of stock market

        정승재 Seoul National University 2016 Doctoral dissertation (domestic)


        Decades have passed since the financial market began to receive attention from academia. Financial economics became a solid branch of economics, and statistical tools and econometrics were exhaustively employed to analyze financial data from every possible direction. Frameworks built on mathematical models catalyzed the expansion of the market. Analytical tools from other fields, such as signal processing and physics, uncovered phenomena ubiquitous in financial data and established the stylized facts. Although these studies deepened the understanding of the financial market, they were not sufficient to prevent financial crises. Institutional investors and policy makers helplessly watched the market tumble, and academia was unable to provide a clear answer and was often held responsible for the crises. An apparent conclusion was that the financial market is far from being fully grasped and that further study is necessary. Recently, network theory has gained popularity as a tool for interpreting the structure of the financial market. Analytical methods from this field, such as the hierarchical tree and the minimum spanning tree, are effective for visualizing the relative positions of and interactions between assets. A network analysis begins with a similarity/dissimilarity measure to represent the system of objects, and statistical correlation between assets has been extensively studied and accepted as a good quantity for measuring the similarity/dissimilarity of assets. Another approach that uses a dissimilarity measure is data mining. Data mining methods are particularly useful for processing large amounts of data in exploratory research, and given that the financial market is yet to be fully explained, such an approach might provide insights that were previously overlooked. Therefore, clustering analysis, one of the best-known data mining methods, was applied to the financial market. In this study, correlation coefficients between stocks were measured and transformed with a distance function; a well-established distance function preserves the topology of the original correlation matrix and is a good metric for seeing the stocks as they are. Clustering analysis was then performed on the dissimilarity matrices, that is, the correlation matrices transformed by the distance function. Clustering analysis is designed to put similar objects together in a cluster. Though not in as quantitative a way as clustering analysis, investors and market participants already have a framework for grouping similar firms: categorization by industrial sector, such as MSCI's Global Industry Classification Standard, is accepted as the standard approach. Investors compare the firms in the same sector and add the most promising ones to their stock portfolios. One objective of this study is to test whether the quantitative methods agree with the traditional classification by sector. Firms were grouped by the correlations of their stock returns, and the members of each cluster were individually identified by their industrial sector. When a small data set of the 30 largest firms by market capitalization in Korea was used to create a dendrogram, firms in the same sector were often found next to each other on the tree, suggesting that they are close to each other. There were a few exceptions, and the overall structure of the trees varied for different correlation coefficients, but a large part of the data agreed with the classification by sector.
        However, when a clustering analysis was performed on a larger data set of 200 firms in Korea, most clusters were made up of firms from different sectors, and clusters rarely had more than 75% of their members from a single sector, which implies that even within a sector there is no clear dominant pattern that its members follow. Portfolios of stocks were then constructed based on the clustering analysis, under the hypothesis that if the clustering analysis captured the market structure, a portfolio created from this information should outperform benchmarks such as the market index. The 200 largest firms by market capitalization in Korea were used for the analysis, and portfolios of 10, 20, and 30 stocks were built and their performance was recorded. Stocks were chosen randomly from each cluster, and the average performance of 1,000 such portfolios was compared with the benchmarks. Since the stocks were chosen randomly, another benchmark, a portfolio of stocks randomly chosen from the entire data set, was created; its purpose is to determine whether there is a statistical difference between choosing stocks from clusters and choosing them in a completely random fashion. All clustering portfolios outperformed the market index, but many failed to beat the random portfolio in terms of return-to-risk ratio. One possible explanation is that the clustering analysis identified groups of underperforming stocks, and by choosing an equal number of stocks from each cluster, the clustering portfolios held a relatively large number of underperforming stocks. For an investor, the purpose of creating a stock portfolio is not to analyze the market structure but to buy diverse stocks with varying risk profiles, thereby generating positive excess returns with an acceptable level of downside risk. Therefore, a trading simulation was performed to see whether the clustering portfolios can serve this purpose. Correlations of stocks were estimated using historical data before a portfolio was launched, and the portfolios were then constructed in the same manner as before; because the portfolios were launched after the correlation estimation period, no information about the investment period was incorporated into them. Although the clustering portfolios did outperform the market index, none of them beat the random portfolio, and a marked underperformance was detected for most of the portfolios. Detailed analyses of each portfolio and its clusters revealed clusters of underperforming stocks, and the portfolios held a disproportionately large number of such stocks. By trimming the underperforming clusters, thereby removing them from the portfolio construction step, the clustering portfolios were able to beat the random portfolio. The framework was then formalized, and an extended portfolio management exercise over 20 years was simulated using the US market: clustering portfolios were constructed in 1990 and managed until the end of 2015. Three rebalancing periods of 3, 6, and 12 months were chosen, with assets reallocated every rebalancing period, to study the effect of rebalancing frequency, and three correlation estimation periods of 1, 3, and 5 years were chosen to study how changing the correlation estimation period would affect portfolio performance.
        Many clustering portfolios were unable to outperform the market index, and neither the rebalancing period nor the correlation estimation period had a linear relationship with portfolio performance. The cluster trimming process was formalized with rules: when the cluster containing the most firms with net earnings losses over a long correlation estimation period of 3 or 5 years was removed, the average return and return-to-risk ratio improved. This result makes sense because firms with persistent earnings losses are likely to be struggling, and adding them to a portfolio is likely to be detrimental to its performance. Another rule found was that removing clusters with more firms with net earnings losses than firms with net earnings gains improved portfolio performance significantly. The two rules were applied simultaneously, and all clusters satisfying the conditions were removed; the portfolios created without those clusters outperformed the other clustering portfolios and the benchmark index. The purpose of this research was to analyze the market structure using clustering analysis based on correlation coefficients and to propose a framework for creating a stock portfolio. It was found that classification by sector is insufficient for creating a diversified portfolio. A framework for constructing a portfolio based on clustering analysis was proposed, and a trimming process to remove clusters of inferior stocks was introduced.
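The pipeline described above (correlation, distance transform, clustering, one stock per cluster) can be sketched roughly as follows. This is a minimal illustration, not the thesis's procedure: the returns are simulated placeholders, sqrt(2(1 - rho)) is one common choice of correlation-based distance, and the earnings-based trimming rules are omitted.

```python
# Sketch: correlation -> distance -> hierarchical clusters -> one stock per cluster.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.02, size=(250, 30)),
                       columns=[f"STK{i:02d}" for i in range(30)])  # placeholder returns

corr = returns.corr()
dist = np.sqrt(2.0 * (1.0 - corr.values))           # one common correlation-based distance
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
clusters = fcluster(Z, t=10, criterion="maxclust")  # e.g. 10 clusters -> 10-stock portfolio

picks = (pd.Series(corr.columns, index=clusters)
           .groupby(level=0)
           .apply(lambda s: rng.choice(s.to_numpy())))  # one random ticker per cluster
print(list(picks))
```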

      • 동시출현단어 분석을 통한 국내 오픈액세스 지적구조에 관한 연구 (A study on the intellectual structure of open access in Korea through co-word analysis)

        신주은 Graduate School, Chung-Ang University 2021 Master's thesis (domestic)


        In this study, the intellectual structure of the open access field in Korea was investigated through co-word analysis, and its various sub-areas were identified. Co-word analysis is a method for visualizing the structure of a research field by extracting keywords from a set of documents on a specific topic, calculating the co-occurrence frequency of each keyword pair, and partitioning the field into sub-areas according to the degree of association between keywords. Data were collected from all academic fields in KCI and RISS. The search keywords included 'open access' and its Korean spelling variants ('오픈액세스', '오픈엑세스', '오픈억세스'), and the search period was not limited. Theses that had also been published as journal articles were treated as duplicates and excluded. As a result, 124 documents (98 journal articles and 26 theses) were selected as the final research objects. Keywords were extracted from titles and abstracts and went through two rounds of cleaning. A total of 1,157 keywords were extracted, and 77 keywords with a frequency of at least 7 and a TF×IDF weight of at least 0.004 were selected for the final analysis. A co-occurrence matrix, a cosine similarity matrix, and a Pearson correlation matrix were created from the selected keywords. Network analysis was conducted on the cosine similarity matrix, and the intellectual relationships among keywords were visualized through a pathfinder network (PFNet) with parallel nearest-neighbor clustering (PNNC). Centrality analysis for weighted networks identified keywords with high global centrality, local centrality, and betweenness centrality. Cluster analysis was then conducted on the Pearson correlation matrix, and the results were displayed on a multidimensional scaling map. The intellectual structure was described on the basis of the correlations among keywords, and the complementary use of several analysis methods supported the validity of the interpretation. First, the intellectual structure of the domestic open access field revealed by network analysis is as follows. Three domains and twenty sub-clusters were extracted through PFNet with PNNC, which made it possible to distinguish topic areas and to grasp the relationships among keywords. In the centrality analysis of the weighted network, the keyword with the highest average similarity (AVGSIM) was 'open access', followed by 'institution', 'researcher', 'copyright', and 'user'. The keyword with the highest nearest-neighbor centrality (NC) was 'open access', followed by 'academic journal', 'institution', 'institutional repository', and 'copyright'. The keyword with the highest triangle betweenness centrality (TBC) was 'open access', followed by 'institution', 'researcher', 'user', and 'copyright'. The keyword with the highest betweenness centrality was 'open access', followed by 'user', 'survey', 'researcher', and 'institution'. These centrality indices identified the keywords that are broadly connected across domestic open access research, the most influential keywords within sub-areas, and the keywords that bridge distinct topic areas. Second, the intellectual structure revealed by cluster analysis and multidimensional scaling is as follows. Cluster analysis produced five clusters, namely institutional repository, journal, library, open access, and copyright, which represent the sub-areas of the domestic open access field. A multidimensional scaling map was then drawn according to the correlations among keywords, and when the five clusters were displayed on the map, most keywords were located by cluster: the library and journal clusters on the left, the institutional repository and copyright clusters on the right, and the open access cluster in the center. This shows that the open access cluster is highly correlated with the other clusters and forms the core topic area of the field. Third, comparing the structure obtained by network analysis with that obtained by cluster analysis and multidimensional scaling, the detailed keywords of the three network-analysis domains and the five clusters were 80.52% consistent. In addition, keywords with high AVGSIM values, such as 'open access', 'current state', 'researcher', 'university', and 'copyright', were located at the center of the multidimensional scaling map, confirming again that these keywords play a central role across the domestic open access field. On this basis, the core topic areas of domestic open access research conducted across all academic fields from 2003 to 2020 can be divided into research on repositories, centered on university repositories, and research on open access journals and articles. This study collected open access research from all academic fields in Korea and applied co-word analysis to identify its intellectual structure, using several analysis methods in a complementary way. Because each similarity coefficient and analysis method has strengths and weaknesses, using them together made it possible to grasp the core topic areas effectively and to confirm the consistency of the intellectual structure. No previous study has identified the intellectual structure of the domestic open access field. This study is meaningful in that it examined the field from a comprehensive and diverse perspective through informetric analysis based on content analysis. The results make the intellectual structure of domestic open access research visually accessible and can serve as basic data for anticipating the future direction of open access research in Korea.
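A rough sketch of the first co-word steps (co-occurrence matrix, cosine similarity, keyword clusters) is shown below under simplifying assumptions: toy keyword strings stand in for the 124 documents, and ordinary hierarchical clustering replaces the PFNet/PNNC and multidimensional scaling steps used in the study.

```python
# Sketch: keyword co-occurrence matrix -> cosine similarity -> keyword clusters.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

docs = ["open_access repository copyright institution",
        "open_access journal apc researcher",
        "library repository metadata institution",
        "copyright license journal open_access"]      # placeholder keyword strings

X = CountVectorizer(binary=True).fit_transform(docs)  # document-keyword incidence matrix
cooc = (X.T @ X).toarray()                            # keyword-keyword co-occurrence counts
np.fill_diagonal(cooc, 0)

sim = cosine_similarity(cooc)                         # similarity of co-occurrence profiles
dist = 1.0 - sim                                      # turn similarity into a distance
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")       # sub-clusters of keywords
```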

      • Social Protection in Asia: Cluster Analysis of the Disaggregated Social Protection Index

        링스타드알빈 Graduate School of Public Administration, Seoul National University 2016 Master's thesis (domestic)


        The aim of this thesis is to cluster the countries of Asia based on their social protection. To that end, the Social Protection Index has been disaggregated into different indicators. This was previously impossible, but the Asian Development Bank's data collection has now made it available, and it is therefore used in this thesis. It is important for researchers and policy makers alike to understand and learn from the countries of Asia, and to extend the scope beyond East Asia and look at Asia in its entirety. The database contains detailed information on social protection in most countries of Asia. In order to use the index for clustering, it has been disaggregated into three indicators: the first measures the coverage of social protection; the second measures the average expenditure per beneficiary adjusted to the relative poverty line; and the third measures gender spending by dividing the total amount of social protection spending on women by the total amount spent on men. The three indicators serve as the variables in a hierarchical cluster analysis using Ward's method. The results of the cluster analysis are displayed through dendrograms that are further analyzed in order to cluster the countries over time. At first, all cases are clustered into two clusters, a "High-Performing Cluster" and a "Low-Performing Cluster"; within these, the worst- and best-performing clusters are identified for each year. The countries that move between the high- and low-performing clusters are given special attention in order to understand why they move. Over the three years the results are generalized, and the analysis partially reinforces the clusters' geographical belongingness, with one or more exceptions per area. Further, this study explores the importance of coverage, gender spending, and depth, both in terms of justice and in terms of societal outcomes. It shows how depth can only be understood through the coverage indicator, and thus also serves as a critique of the Social Protection Index, which does not take this into account. The results also shed light on the importance of gender spending: although the other indicators may not be improving, the gender spending indicator accounts for some of the major changes throughout the years analyzed. Finally, the thesis suggests a way forward for social protection in the region, as well as a global data collection mechanism, both to expand the scope of countries and to enable researchers to look over a longer time period.
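The core step (Ward's hierarchical clustering on the three standardized indicators, visualized as a dendrogram) can be sketched as follows. The country labels and indicator values are illustrative placeholders, not ADB Social Protection Index data.

```python
# Sketch: Ward clustering of three standardized indicators, with a dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

countries = ["A", "B", "C", "D", "E", "F"]             # placeholder country labels
X = np.array([[0.62, 0.40, 0.95],                      # coverage, depth, gender-spending ratio
              [0.21, 0.10, 0.50],
              [0.71, 0.55, 1.02],
              [0.15, 0.22, 0.41],
              [0.80, 0.60, 1.10],
              [0.30, 0.12, 0.58]])

Z = linkage(zscore(X), method="ward")                  # Ward's method on z-scored indicators
groups = fcluster(Z, t=2, criterion="maxclust")        # high- vs. low-performing split
print(dict(zip(countries, groups)))

dendrogram(Z, labels=countries)
plt.tight_layout()
plt.show()
```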

      • 스포츠 소비자의 관여도에 따른 스포츠 소비행동 분석 (An analysis of sport consumption behavior by sport consumers' involvement)

        김태민 Graduate School, Yonsei University 2001 Master's thesis (domestic)


        The purpose of this study was to segment sport consumers by their type of involvement in sport and to analyze the demographic characteristics and the present and future sport consumption behavior of each cluster, in order to support market-oriented marketing strategies and to provide basic data for improving sport consumers' satisfaction with sport consumption. The population was defined as adults aged 20 or older with a regular monthly income who worked at companies with 100 or more employees in the Seoul area. Respondents were sampled using the convenience sampling method, a non-probability sampling technique; 700 questionnaires were distributed and 573 were used in the analysis. The data were analyzed with the SPSS 8.0 package using frequency analysis, factor analysis, reliability analysis, cluster analysis, multiple response analysis, chi-square tests, and one-way ANOVA. The results are as follows. First, sport consumers were segmented into three clusters by involvement type: cluster 1, which values enjoyment, importance, and self-expression; cluster 2, which values risk, enjoyment, and importance; and cluster 3, which values self-expression and risk. Second, in the demographic characteristics by cluster, men made up the larger share of clusters 1 and 2 and women the larger share of cluster 3; in all clusters, the most common job grade was staff level and the most common monthly income was between 1,000,000 and 2,000,000 won. There were no significant differences by age, education, or marital status. Third, there were differences among the clusters in the sports participated in and in participation status; in particular, "not currently participating" and "will participate in the future" accounted for a high proportion of responses in each cluster. Fourth, weekly participation in sport activity did not differ significantly among the clusters for either the present or the future, but mean comparisons showed that participation is expected to increase in the future in all clusters. Fifth, monthly spending on sport activity showed no significant difference among the clusters at present but a significant difference for the future, with more spending expected in the future than at present.
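The segmentation and the gender cross-tabulation reported above can be sketched roughly as follows. The 5-point involvement scores are simulated placeholders, and scikit-learn's k-means stands in for the SPSS clustering procedure used in the original study.

```python
# Sketch: k-means segmentation on involvement scores, then a chi-square test vs. gender.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
survey = pd.DataFrame({
    "enjoyment":       rng.integers(1, 6, 573),        # simulated 5-point items
    "importance":      rng.integers(1, 6, 573),
    "self_expression": rng.integers(1, 6, 573),
    "risk":            rng.integers(1, 6, 573),
    "gender":          rng.choice(["M", "F"], 573),
})

items = StandardScaler().fit_transform(
    survey[["enjoyment", "importance", "self_expression", "risk"]])
survey["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(items)

chi2, p, dof, _ = chi2_contingency(pd.crosstab(survey["cluster"], survey["gender"]))
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```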

      • Nautical Route Clustering Analysis in Coastal Sea using Statistical Trajectory-Distance Metric

        유원철 Graduate School, Seoul National University 2022 Doctoral dissertation (domestic)


        As autonomous navigation technology advances, automatic route planning for surface ships is also becoming important. Unlike route planning in the open sea, route planning in coastal waters is more difficult because additional risks such as grounding, contact, and collision must be considered. There have been various attempts to use historical AIS trajectories and navigators' previous route plans for coastal route planning, and clustering analysis of coastal routes is one of them. Unlike other trajectory clustering problems, coastal route clustering must sometimes separate trajectories of similar shape because of islands and other obstructions. This study proposes a statistical trajectory-distance metric and a coastal route clustering method based on it. The statistical trajectory-distance metric quantifies the difference between trajectories using the cross-track distance (XTD) set during route planning and is specialized for coastal route clustering. It is defined by replacing the distance term of the dynamic time warping (DTW) metric, one of the most widely used trajectory-distance metrics, with a statistical distance, here a linear combination of the Jensen-Shannon divergence and the Wasserstein distance. Each waypoint of a route is modeled as a discrete, asymmetric binomial normal distribution defined by its XTD, and the statistical distance between waypoints is computed from these distributions. Route clustering is performed by applying the statistical trajectory-distance metric to density-based spatial clustering of applications with noise (DBSCAN). Because the metric is computed from XTDs that already reflect grounding and contact risks, it can cluster routes while accounting for hazards such as islands. The median trajectory is proposed for extracting the representative route of each resulting cluster, a choice suited to coastal route data for which a mean is difficult to compute. The proposed method was validated on routes extracted, together with their XTD information, from AIS data for the southern sea of the Korean Peninsula, and the clustering results were compared with those of other popular trajectory-distance metrics. The proposed method was more effective than the alternatives when trajectories pass on both sides of a small island, which is a frequent case in coastal route clustering.
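A rough sketch of the statistical trajectory distance is given below: a DTW whose local cost is a linear combination of Jensen-Shannon divergence and Wasserstein distance between per-waypoint distributions, fed to DBSCAN as a precomputed metric. The Gaussian-shaped histograms, the mixing weight alpha, and the DBSCAN eps are illustrative assumptions; the thesis itself models waypoints with asymmetric, XTD-defined binomial normal distributions.

```python
# Sketch: DTW with a JS-divergence + Wasserstein local cost, clustered by DBSCAN.
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance
from sklearn.cluster import DBSCAN

support = np.linspace(-1.0, 1.0, 21)                  # cross-track offsets (illustrative units)

def waypoint_pmf(center, spread=0.2):
    """Discrete stand-in for a waypoint's XTD-based distribution."""
    w = np.exp(-0.5 * ((support - center) / spread) ** 2)
    return w / w.sum()

def stat_dist(p, q, alpha=0.5):
    js = jensenshannon(p, q) ** 2                     # squared JS distance = JS divergence
    wd = wasserstein_distance(support, support, p, q)
    return alpha * js + (1.0 - alpha) * wd

def dtw(route_a, route_b):
    n, m = len(route_a), len(route_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = stat_dist(route_a[i - 1], route_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy routes: each route is a sequence of waypoint distributions
routes = [[waypoint_pmf(c) for c in cs]
          for cs in ([-0.5, -0.4, -0.3], [-0.5, -0.35, -0.3], [0.4, 0.5, 0.6])]

M = np.array([[dtw(a, b) for b in routes] for a in routes])    # pairwise route distances
labels = DBSCAN(eps=0.5, min_samples=1, metric="precomputed").fit_predict(M)
print(labels)                                                  # first two routes group together
```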

      • Spatio-temporal Cluster Analysis of Chickenpox Incidence among School Students in Seoul, Republic of Korea : NEIS, 2011-2018

        최진영 Graduate School, Seoul National University 2020 Master's thesis (domestic)


        Chickenpox is an acute, highly contagious, and predominantly childhood disease caused by the varicella-zoster virus (VZV). The global epidemiology of chickenpox changed dramatically upon the introduction of varicella vaccines, which are highly effective in reducing the incidence and burden of the disease. In the Republic of Korea, however, the incidence of chickenpox increased between 2006 and 2017 despite the implementation of a routine one-dose varicella vaccination program in 2005. This study aims to identify chickenpox clusters among students in Seoul over the past eight years and to explore the risk of transmission in an effort to control the incidence of chickenpox. Chickenpox incidence data were collected from the National Education Information System (NEIS) for 983 schools within 25 districts between March 2011 and December 2018, and sociodemographic data were obtained from the Korean National Statistics Office. Global (Moran's I) and local (LISA) spatial autocorrelation were calculated. Space-time permutation scan statistics were used to identify clusters in the case-only data setting, and a discrete Poisson model was used for the aggregated data. Associations with risk factors, namely student density, vaccination coverage, and local deprivation index scores, were estimated via R-INLA (Integrated Nested Laplace Approximation). SaTScan, QGIS, GeoDa, and R were used for statistical processing. LISA at the district level indicated areas that were not spatially autocorrelated. However, clusters originating from high-incidence areas gradually spread to neighboring districts and diminished over time, which led to an overall increase in incidence across the city. Differentiation by school grade was similar to the overall analysis, with an obvious increase of middle- and high-school-aged children within clusters. Semiannual analyses detected clusters in the same northeastern region from 2011 to 2013 and in the southeastern area from 2016 to 2018. A chickenpox incidence cluster was also predicted prospectively for a peak week in 2018: for the fourth week of May, sensitivity was 53.6%, but the most likely continuing cluster could still be identified on the basis of a high positive predictive value (78.7%). When incidence was consistent in a certain region, predicting the most likely location was feasible. This process is based on the concept of syndromic surveillance, and we expect to use it for the early detection of spatio-temporal chickenpox clusters and for the expeditious control of outbreaks in the future. This study describes the post-licensure epidemiology of chickenpox incidence from a spatio-temporal point of view. The overall increase in chickenpox incidence among schools in Seoul could be attributed to the scattering of chickenpox from high-incidence clusters to neighboring schools. Schools therefore have a crucial role in spreading the disease, and school surveillance and the construction of a complementary surveillance system are essential to prevent and control chickenpox.
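The study's spatial statistics were computed with SaTScan, GeoDa, and R; as a minimal sketch of the first of those measures, global Moran's I can be computed directly from an incidence vector and a spatial weights matrix. The district rates and adjacency list below are placeholders, not the Seoul NEIS data.

```python
# Sketch: global Moran's I from an incidence vector and row-standardized contiguity weights.
import numpy as np

incidence = np.array([12.0, 15.0, 9.0, 30.0, 28.0, 7.0])   # placeholder district rates
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 5], 3: [1, 4], 4: [3], 5: [2]}  # placeholder adjacency

n = len(incidence)
W = np.zeros((n, n))
for i, js in neighbors.items():
    W[i, js] = 1.0
W = W / W.sum(axis=1, keepdims=True)                        # row-standardize the weights

z = incidence - incidence.mean()
morans_I = (n / W.sum()) * (z @ W @ z) / (z @ z)
print(f"Moran's I = {morans_I:.3f}  (expected value under no autocorrelation: {-1/(n-1):.3f})")
```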

      • Automated clustering of the three-dimensional mandibular canal course using unsupervised machine learning method

        김영현 Graduate School, Yonsei University 2021 Doctoral dissertation (domestic)


        Purpose: The aims of this study were to apply cluster analysis, an unsupervised machine learning method, to the three-dimensional (3D) classification of the mandibular canal (MC) course and to visualize the standard MC courses in the Korean population derived by the cluster analysis. Methods: A total of 429 cone-beam computed tomography images acquired at Yonsei University Dental Hospital were used. Four sites were selected for measuring the MC course, and two vertical and two horizontal parameters were measured per site. Cluster analysis was carried out in the following order: parameter selection, parameter normalization, evaluation of clustering tendency, determination of the optimal number of clusters, and k-means cluster analysis. Results: Three types of 3D MC course were derived as clusters 1, 2, and 3, with statistically significant mean differences among the clusters. Cluster 1 ran toward the lingual side in the axial view and showed the steepest slope in the sagittal view. Cluster 2 ran in an almost straight line closest to the lingual and inferior borders of the mandible. Cluster 3 showed a path that bent buccally in the axial view and an increasing slope in the sagittal view in the posterior area. Cluster 1 had the smallest share of cases (26.0%), with females accounting for 59.2% and males 40.8%. Cluster 2 was the most common type (42.1%) and contained more males (57.1%) than females (42.9%). Cluster 3 comprised a similar ratio of male and female cases and accounted for 31.9% of the total. For all three clusters, the distributions of the right and left sides did not differ significantly. Conclusions: The 3D MC courses were automatically classified into three types through cluster analysis. Cluster analysis enables an unbiased classification of anatomical structures by reducing observer variability and can provide representative standard information for each classified group.
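The workflow (parameter standardization, choice of cluster count, k-means) can be sketched as follows. The measurement matrix is a simulated placeholder, and a silhouette sweep is used here as one common way to choose k; the study's own criterion and its cluster-tendency check are not reproduced.

```python
# Sketch: standardize the canal parameters, choose k with a silhouette sweep, run k-means.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
X = rng.normal(size=(429, 8))                      # placeholder: 429 cases x 8 course parameters
X = StandardScaler().fit_transform(X)

scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10,
                                        random_state=0).fit_predict(X))
          for k in range(2, 7)}
best_k = max(scores, key=scores.get)               # k with the best average silhouette

labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```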

      • Deep learning-based stock price clustering using image encoder for time series

        구도현 Graduate School, Seoul National University 2021 Master's thesis (domestic)


        Cluster analysis, which classifies objects into groups based on their similarity, is actively studied in various academic fields, and further analyses examine the characteristics shared within each cluster and the differences across clusters. In the stock market, cluster analysis of stock prices is actively used for various purposes, including prediction of future prices and direction, algorithmic trading, investment recommendation systems, portfolio management, and outlier detection. Traditional methods typically reduce the high dimensionality of stock data using principal component analysis and form clusters using k-means or hierarchical clustering. In this study, we encoded stock price series as images using the Gramian Angular Field to build the input data. The first proposed model employed deep clustering, which is actively researched in computer vision, to jointly optimize dimensionality reduction and clustering. A reconstruction loss was computed with a CNN autoencoder, and the probability of each observation belonging to each cluster was obtained with a Student's t-distribution kernel. The normalized distribution, the probability divided by the cluster size, was defined as the auxiliary target distribution and treated as the true label, which allowed the inherently unsupervised clustering problem to be approached as supervised learning. The difference between the two probability distributions, measured by the Kullback-Leibler divergence, was defined as the clustering loss, and the model was trained on the final loss, the sum of the reconstruction and clustering losses. The second model reduced the image dimension with a CNN autoencoder and clustered the codes with the k-means algorithm. The data used in the experiment comprised approximately 500 S&P 500 stocks over 960 business days from late 2016 to mid-2020. The proposed models were compared with traditional clustering methods on four validation metrics. The proposed models extracted similarities from the training data to form highly associated clusters, which resulted in better performance on future validation data; a paired-sample t-test identified a significant difference on the validation metrics, especially the correlation-based one. The experiments also showed that cluster size imbalance is problematic when traditional clustering algorithms are applied to stock prices, so the cluster size ratio, the number of observations in the largest cluster divided by that in the smallest, was defined and compared across the alternative clustering models. Owing to the normalizing effect of the auxiliary target distribution, the number of stocks per cluster was relatively balanced compared with the alternatives. Finally, a portfolio was constructed by sampling stocks within each cluster, with optimal weights obtained by solving the tangency portfolio problem, and its return was compared with those of the alternative models.
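The clustering loss described above follows the DEC-style recipe; a minimal NumPy sketch of the soft assignment, auxiliary target distribution, and KL clustering loss is shown below, with random vectors standing in for the CNN-autoencoder embeddings and centroids.

```python
# Sketch: DEC-style soft assignment, auxiliary target distribution, and KL clustering loss.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 10))         # placeholder latent codes (stand-in for autoencoder output)
mu = rng.normal(size=(5, 10))          # placeholder centroids of 5 clusters

# q_ij: Student's t kernel (1 degree of freedom) between code i and centroid j
d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
q = 1.0 / (1.0 + d2)
q /= q.sum(axis=1, keepdims=True)

# p_ij: target distribution, sharpened and normalized by soft cluster size
f = q.sum(axis=0)
p = q ** 2 / f
p /= p.sum(axis=1, keepdims=True)

clustering_loss = np.sum(p * np.log(p / q))        # KL(P || Q)
# total training loss in this setup = autoencoder reconstruction loss + clustering_loss
```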

      • Unsupervised Cluster Analysis of Aortic Stenosis Patients Reveals Distinct Population with Different Phenotypes and Outcomes

        곽순구 Graduate School, Seoul National University 2020 Master's thesis (domestic)


        There is a lack of studies investigating the heterogeneity of patients with aortic stenosis (AS). We explored whether cluster analysis of AS patients identifies distinct subgroups with different prognostic significance. Newly diagnosed moderate or severe AS patients were prospectively enrolled between 2013 and 2016 (n=398, mean age 71 years, 55% male). Among 32 demographic, laboratory, and echocardiographic parameters, 11 variables were selected through dimension reduction and used for unsupervised clustering. Phenotypes and causes of mortality were compared between clusters. Three clusters with markedly different features were identified. Cluster 1 (n=60) was predominantly associated with cardiac dysfunction, cluster 2 (n=86) consisted of elderly patients with comorbidities, especially chronic kidney disease, whereas cluster 3 (n=252) demonstrated neither cardiac dysfunction nor comorbidities. Although AS severity did not differ, there was a significant difference in adverse outcomes between the clusters during a median 2.4 years of follow-up (mortality rate 13.3% vs. 19.8% vs. 6.0% for clusters 1, 2, and 3, P<0.001). In particular, compared with cluster 3, cluster 1 was associated only with cardiac mortality (adjusted hazard ratio [aHR] 7.37, 95% confidence interval [CI] 2.00-27.13, P=0.003), whereas cluster 2 was associated with higher non-cardiac mortality (aHR 3.35, 95% CI 1.26-8.90, P=0.015). The phenotypes and the associations of the clusters with specific outcomes were reproduced in an independent validation cohort (n=262). Unsupervised clustering of AS patients thus revealed three distinct groups with different causes of death, providing a new perspective on the categorization of AS patients that takes into account comorbidities and extravalvular cardiac dysfunction.
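The abstract does not name the specific clustering algorithm; as a rough sketch of the overall shape of the analysis, a Gaussian mixture (one common model-based clustering method) is used below as a stand-in on simulated data, followed by a crude outcome comparison by cluster.

```python
# Sketch: standardized features -> Gaussian-mixture clustering -> outcome comparison by cluster.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
patients = pd.DataFrame(rng.normal(size=(398, 11)),
                        columns=[f"var{i}" for i in range(11)])   # placeholder selected variables
patients["died"] = rng.integers(0, 2, 398)                        # placeholder outcome flag

X = StandardScaler().fit_transform(patients.filter(like="var"))
patients["cluster"] = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

print(patients.groupby("cluster")["died"].mean())                 # crude mortality by cluster
```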
