RISS Search - Thesis Details

Multilingual Abstract

Finding abnormal transactions among credit card transactions is known as credit card fraud detection. With the recent rapid growth of e-commerce, the volume of credit card transactions is increasing exponentially, and abnormal transaction patterns are becoming more complex and sophisticated. As customer losses from abnormal transactions grow, companies have deployed fraud detection systems to minimize the damage. A fraud detection system learns the patterns of normal and abnormal transactions from large volumes of credit card transaction data through machine learning, and uses the learned model to predict whether an actual transaction is abnormal. In this dissertation, we propose a method for building a high-performing fraud detection system.

In terms of data, credit card transactions form an imbalanced dataset in which normal and abnormal transactions are unevenly distributed. General machine learning methods are known to be suboptimal for such imbalanced classification. A popular solution is to balance the training data by oversampling the underrepresented classes (or undersampling the overrepresented classes) before applying machine learning algorithms. However, despite its popularity, the effectiveness of sampling has not been rigorously and comprehensively evaluated. To address this issue, we evaluated combinations of seven sampling methods and eight machine learning classifiers (56 combinations in total) on 31 datasets with varying degrees of imbalance. We used the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) as performance measures. AUPRC is known to be more informative than AUROC for imbalanced classification. We observed that sampling significantly changed classifier performance (paired t-test, P < 0.05) in only a few cases (12.2% for AUPRC and 10.0% for AUROC).
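The balancing step described above can be illustrated with its two simplest variants, random oversampling and random undersampling. This is a minimal sketch in plain Python, not the dissertation's implementation; the function names, the `seed` parameter, and the toy data are assumptions made for illustration. Balancing is applied to training data only.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the rows of X under their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Toy training set (hypothetical): 8 normal rows (label 0), 2 abnormal rows (label 1).
X_train = [[i] for i in range(10)]
y_train = [0] * 8 + [1] * 2
_, y_over = random_oversample(X_train, y_train)    # 8 rows per class
_, y_under = random_undersample(X_train, y_train)  # 2 rows per class
print(Counter(y_over), Counter(y_under))
```

Oversampling adds no new information, only repeated rows, and undersampling discards majority-class rows, which is one plausible reading of why undersampling fared worst in the reported experiments.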
Surprisingly, sampling tended to degrade rather than improve classification performance. Furthermore, the negative effects of sampling were more pronounced in AUPRC than in AUROC. Among the sampling methods, undersampling performed worse than the others. Sampling was also more effective at improving linear classifiers. Most importantly, sampling was not needed to obtain the best classifier for most of the 31 datasets. In addition, we found two interesting examples in which sampling significantly reduced AUPRC while significantly improving AUROC (paired t-test, P < 0.05). In conclusion, the applicability of sampling is limited because it can be ineffective or even harmful, and the choice of performance measure is critical to decision making. Our results provide valuable insights into the effects and characteristics of sampling for imbalanced classification.

Credit card fraud detection is a typical classification problem to which various machine learning methods have been applied. In previous studies, deep neural networks and gradient boosting-based methods have generally shown excellent classification performance. However, it is difficult to determine which machine learning method should be applied to fraud detection in practice, because the machine learning methods, performance measures, experimental datasets, and test-performance estimation methods differ from study to study. To address method selection, we applied nine machine learning methods to two publicly available real credit card transaction datasets and analyzed the results to see which method is best suited to credit card fraud detection. Our experimental results show that the gradient boosting methods, extreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM), have the highest classification accuracy on both datasets.
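To make the contrast between the two performance measures concrete, the sketch below computes both from a ranked list of scores. It assumes AUROC is computed as the probability that a positive outranks a negative, and AUPRC as average precision, which is one common estimator; the dissertation may use a different interpolation. The function names and the toy scores are hypothetical.

```python
def auroc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy example (hypothetical): 2 abnormal transactions among 10, ranked 2nd and 4th.
y_true = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(auroc(y_true, scores))              # 0.8125
print(average_precision(y_true, scores))  # 0.5
```

In this toy ranking, the two false alarms placed above the positives barely move AUROC, because the many negatives ranked below dominate it, while they halve the precision. This is the sense in which the two measures can disagree on imbalanced data.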
We also achieved better results than the previous state of the art on the credit card fraud detection dataset. In terms of prediction time, LGBM was more than 40 times faster than XGBoost. Based on these results, we propose that gradient boosting-based methods, especially LGBM, are suitable for credit card fraud detection.

This dissertation also raises a number of issues to consider when building a fraud detection system for real-world use. Our analysis indicates that optimizing AUPRC is directly related to minimizing the costs associated with credit card fraud detection. Therefore, AUPRC is the more desirable measure for evaluating a fraud detection system. In addition, because credit card transaction data is large and has many kinds of features, data analysis and preprocessing must be grounded in an understanding of the domain. It is also necessary to recognize that patterns of fraudulent use in credit card transaction data change over time, so periodic retraining is essential to prevent performance degradation. Finally, we explain that fraud types should be distinguished by whether they are detected as abnormal almost immediately or only after a delay, and that a separate detection strategy is needed for each type.

In addition to the two major issues of handling imbalanced datasets and selecting machine learning methods, several other issues need to be studied comprehensively in order to build an excellent fraud detection system, and they should be addressed through continued research.

Korean Abstract (Abstract)

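Finding abnormal transactions among credit card transactions is called credit card fraud detection. With the recent rapid growth of e-commerce, the volume of credit card transactions is growing exponentially while fraud patterns are evolving to become more complex and sophisticated. As customer losses from fraudulent transactions increase, companies have introduced and now operate fraud detection systems to minimize the damage. A fraud detection system learns the patterns of normal and abnormal transactions through machine learning on the vast amount of data related to credit card transactions, and uses the learned model to predict whether an actual transaction is fraudulent.

This dissertation proposes a method for building a high-performing fraud detection system. First, in terms of data, credit card transactions are imbalanced data in which the distribution of normal and abnormal transactions is uneven. General machine learning methods are known to produce results biased toward the majority class in classification problems on such imbalanced data. The most widely used remedy is to balance the data by sampling before applying machine learning. Despite the popularity of sampling, however, its actual effect has not been comprehensively verified. To examine the effect of sampling, we applied seven sampling methods and eight machine learning methods, 56 combinations in total, to 31 imbalanced datasets, including two credit card transaction datasets. The area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) were used as the classification accuracy measures. AUPRC is known to be better suited than AUROC to measuring classification performance on imbalanced data. In the experiments, sampling changed classification performance (paired t-test, P < 0.05) in only a small share of cases (12.2% for AUPRC and 10.0% for AUROC). In particular, cases where performance degraded outnumbered cases where it improved. The degradation was more pronounced in AUPRC than in AUROC. Among the sampling methods, undersampling most often degraded performance. Sampling was also more effective at improving linear machine learning methods. Most importantly, however, sampling was not needed to obtain the best performance on most of the 31 datasets; that is, training the machine learning method as-is, without sampling, achieved the best performance. Finally, we also observed results in which sampling improved AUROC but reduced AUPRC. In conclusion, this dissertation offers the insight that applying sampling to imbalanced data is in most cases ineffective or even degrades performance. It also suggests that choosing the classification accuracy measure appropriately is very important for sound decision making.

The second issue to consider in building a high-performing fraud detection system is the selection of a suitable machine learning method. Fraud detection is a typical classification problem, and various machine learning methods have been applied and proposed. In previous studies, deep neural networks and gradient boosting-based methods generally showed excellent classification accuracy. However, because the machine learning methods applied and compared, the accuracy measures, the experimental data, and the test-performance estimation methods differ from study to study, it is difficult to judge clearly which machine learning method should be applied to credit card fraud detection in practice.

To address method selection, we applied nine machine learning methods to two publicly available real credit card transaction datasets and analyzed which method is best suited to credit card fraud detection.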
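In the experiments, the gradient boosting methods extreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM) showed the highest classification accuracy on both datasets. In particular, our results on the Creditcard data, one of the most widely used public credit card transaction datasets, were better than those of the previous study that had reported the most accurate results. In fraud prediction time, LGBM was more than 40 times faster than XGBoost. Based on these results, we propose that gradient boosting-based methods, especially LGBM, are suitable for credit card fraud detection.

Finally, we offer recommendations on several matters to consider when building a fraud detection system in a real company. Our analysis indicates that optimizing AUPRC is directly related to minimizing the costs associated with the credit card fraud detection problem. Therefore, it is preferable to use AUPRC when evaluating a fraud detection system. We also point out that, because credit card transaction data is large and its features are varied, data analysis and preprocessing based on an understanding of the credit card business are needed. In addition, patterns of fraudulent use in credit card transaction data change over time, so periodic retraining of the system is essential to prevent performance degradation. Lastly, we explain that fraud types should be separated according to when a transaction is confirmed as fraudulent, and that a separate detection-system strategy is needed for each.

These practical considerations, together with handling imbalanced data and selecting machine learning methods, must be examined comprehensively to build a high-performing system, and should be refined through continued research.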
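The abstracts report significance with paired t-tests at P < 0.05, and the table of contents names a 5x2 CV paired t-test (Section 4.3.4). The sketch below shows the standard form of that statistic, assuming each input value is the difference in a score (for example AUPRC with sampling minus AUPRC without) on one fold of one of five 2-fold cross-validation replications. The function name, the constant name, and the example differences are hypothetical and are not results from the dissertation.

```python
import math

# Two-sided 5% critical value of Student's t with 5 degrees of freedom.
T_CRIT_5DF = 2.571


def five_by_two_cv_t(diffs):
    """5x2 CV paired t statistic.

    diffs: five (fold_1, fold_2) pairs, one pair per replication, each value
    being the score difference between the two compared setups on that fold.
    Raises ZeroDivisionError if every replication has zero variance.
    """
    if len(diffs) != 5:
        raise ValueError("expected five replications of 2-fold CV")
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    # Numerator: the difference on the first fold of the first replication.
    return diffs[0][0] / math.sqrt(sum(variances) / 5)


# Hypothetical score differences for illustration only.
example = [(0.02, 0.04), (0.03, 0.01), (0.02, 0.02), (0.05, 0.03), (0.01, 0.03)]
t_stat = five_by_two_cv_t(example)
print(round(t_stat, 4), abs(t_stat) > T_CRIT_5DF)
```

With these made-up differences the statistic stays below the critical value, so the change would not count as significant at the 5% level.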

Table of Contents

Chapter 1 Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Research Contributions
1.4 Organization of the Dissertation
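Chapter 2 Theoretical Background
2.1 Imbalanced Data
2.1.1 Definition of Imbalanced Data
2.1.2 Types of Imbalanced Data
2.1.3 Approaches to Handling Imbalanced Data in Classification
2.2 Balancing Imbalanced Data
2.2.1 Types of Balancing
2.2.2 Oversampling Method 1: Random Oversampling
2.2.3 Oversampling Method 2: SMOTE
2.2.4 Oversampling Method 3: Borderline SMOTE
2.2.5 Undersampling Method 1: Random Undersampling
2.2.6 Undersampling Method 2: Condensed nearest neighbors
2.2.7 Undersampling Method 3: NearMiss2
2.2.8 Hybrid Method: SMOTETomek
2.3 Accuracy Measures for Imbalanced Classification
2.3.1 Types of Accuracy Measures
2.3.2 Measures Suited to Imbalanced Classification
2.4 Machine Learning Methods
2.4.1 Linear Discriminant Analysis
2.4.2 Regularized Logistic Regression
2.4.3 Boosting
2.4.4 Random forests
2.4.5 Support vector machines
2.4.6 Neural Network Classifier Pretrained with an Autoencoder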
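Chapter 3 Related Work
3.1 Studies on the Effect of Data Balancing
3.1.1 Previous Studies
3.1.2 Discussion of Previous Studies
3.2 Studies on Machine Learning Methods for Credit Card Fraud Detection
3.2.1 Studies on the Creditcard Data
3.2.2 Studies on the Fraud_Detection Data
3.2.3 Discussion of Previous Studies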
Chapter 4 Evaluation of Sampling for Imbalanced Classification
4.1 Effect of Sampling Methods
4.2 The 31 Imbalanced Datasets
4.2.1 Creditcard Data
4.2.2 Shuttle Data
4.2.3 Covtype Data
4.2.4 Abalone Data
4.2.5 Yeast Data
4.2.6 Fraud_Detection Data
4.2.7 Letter Data
4.2.8 Glass Data
4.2.9 Balance Data
4.2.10 Pendigit Data
4.2.11 Pageblock Data
4.2.12 Ecoli Data
4.2.13 Segment Data
4.2.14 Vehicle Data
4.2.15 Parkinson Data
4.2.16 Haberman Data
4.2.17 Wine Data
4.2.18 German Data
4.2.19 Iris Data
4.2.20 Ionosphere Data
4.2.21 Spambase Data
4.2.22 Heart Data
4.2.23 Sonar Data
4.3 Experimental Procedure
4.3.1 Data Preprocessing
4.3.2 Sampling Methods
4.3.3 Machine Learning Methods and Hyperparameter Optimization
4.3.4 5x2 CV paired t-test
4.4 Experimental Results
4.4.1 Changes in Classification Accuracy Due to Sampling
4.4.2 Comparison by Sampling Method
4.4.3 Comparison by Machine Learning Method
4.4.4 Comparison of Sampling Method and Machine Learning Method Combinations
4.4.5 Differences in the Effect of Sampling by Accuracy Measure
4.4.6 Boosting Methods and Sampling Methods
4.5 Conclusion
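Chapter 5 Fraud Detection Based on Credit Card Transaction Information Using Gradient Boosting Methods
5.1 Selecting Machine Learning Methods for Credit Card Fraud Detection
5.2 Experimental Procedure
5.2.1 Machine Learning
5.2.2 Data Preprocessing
5.2.3 Hyperparameter Optimization and Classification Accuracy Evaluation
5.3 Experimental Results
5.3.1 Comparison by Machine Learning Method
5.3.2 Comparison with the Results of J. L. Leevy et al.
5.3.3 Comparison of Training and Prediction Time
5.4 Conclusion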
Chapter 6 Issues to Consider in Practice
6.1 Performance Measures and Costs
6.2 Characteristics of Real Data
Chapter 7 Conclusion
7.1 Summary and Implications
7.2 Future Research Directions
References

Credit Card Fraud Detection Using Machine Learning without Data Balancing
