RISS Search - Thesis Details

Korean Abstract

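As database applications multiply and organizations depend on them more heavily, the volume of data accumulated in databases is growing rapidly. There is a growing recognition that this data need not serve only day-to-day operations, but can also provide concrete evidence for analyzing the characteristics of the business domain. Consequently, data mining, which aims to efficiently discover knowledge in large databases that cannot be anticipated in advance yet is useful for decision making, has recently become an active research area.
Among the many areas of data mining, this thesis focuses on discovering associations among events. Such associations are expressed as association rules, which state the tendency for the occurrence of one set of events to imply the occurrence of other events. Existing association rules mainly consider only associations that hold over the entire domain in which the events occur. However, some association rules hold with high confidence in a particular period or region even if their confidence over the whole domain is not very high, and knowing this would be very useful for a variety of decisions. This thesis therefore defines an association that has high confidence in an arbitrary sub-range as a ranged association rule, and proposes an efficient algorithm that searches a large database for the sub-ranges in which ranged association rules hold.
First, for a given binary association rule, we present a method for finding sub-ranges that have high confidence and whose size is arbitrary rather than fixed in advance. The proposed technique uses a data-driven search that sets up hypothetical sub-ranges based on the distribution of the data itself, so unnecessary sub-ranges are excluded from the search. In addition, to reduce repeated database scans, we design an effective data structure that can be kept in main memory. The data structure supplies the information needed for the next search step, in which sub-ranges are enlarged, and it is built with a single database scan.
The search for ranged association rules is first presented for a single binary association rule over a one-dimensional event domain, and is then extended to an algorithm that handles multiple binary association rules. In this extension, related rules are grouped according to the event sets they contain, and the search is run for only one rule per group, which considerably improves the algorithm's performance. The event domain is also extended to multiple dimensions, broadening the applicability of ranged association rules. Finally, experiments show that the proposed algorithm runs at a time cost acceptable for real business settings.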

Multilingual Abstract

As database systems become widespread and many business applications rely heavily on database facilities, the volume of data stored in databases is increasing rapidly. It is now recognized that databases can serve as concrete evidence of domain characteristics rather than being used only for their own operational purposes. In this regard, data mining techniques, which discover hidden but potentially useful knowledge for decision making from large databases, are receiving growing attention in many applications.
Among the various areas of data mining, this study focuses on the discovery of associations among events. An association rule expresses the tendency for the occurrence of some events to imply the co-occurrence of other events. Previous research on association rules mainly deals with associations that hold over the whole domain. Some association rules, however, can have very high confidence in a sub-interval or subrange of the domain even though their confidence over the whole domain is not particularly high. Such association rules are expected to be very useful in various decision-making problems. In this paper, we define a ranged association rule as an association whose confidence in a subdomain is high enough to deserve special attention, and we propose an efficient algorithm that finds ranged association rules.
First, we propose a data mining method that discovers subranges in which given binary association rules have high confidence. Such subranges are not delimited by predefined boundaries. In addition, the proposed method is data-driven in the sense that hypothetical subranges are built from the data distribution itself, so no unnecessary subranges are probed during mining. To avoid redundant database scans, we devise an effective in-memory data structure that collects the essential information for the subsequent mining process in a single database scan.
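The idea of a rule that is weak over the whole domain but strong in a subrange can be illustrated with a small Python sketch. It is not the thesis's algorithm: the transaction layout, the function names `confidence` and `widest_strong_subrange`, and the thresholds `min_conf` and `min_count` are assumptions made for illustration, and the quadratic search stands in for the in-memory data structure mentioned in the abstract. The only data-driven aspect it keeps is that candidate boundaries are taken from the positions of transactions that satisfy the rule, not from a predefined grid.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical layout: (position in a 1-D domain, set of events in the transaction).
Transaction = Tuple[float, Set[str]]


def confidence(txns: List[Transaction], antecedent: Set[str], consequent: Set[str],
               lo: float = float("-inf"), hi: float = float("inf")) -> float:
    """Confidence of antecedent -> consequent among transactions with lo <= position <= hi."""
    covered = [events for pos, events in txns if lo <= pos <= hi and antecedent <= events]
    if not covered:
        return 0.0
    hits = sum(1 for events in covered if consequent <= events)
    return hits / len(covered)


def widest_strong_subrange(txns: List[Transaction], antecedent: Set[str],
                           consequent: Set[str], min_conf: float,
                           min_count: int) -> Optional[Tuple[float, float, int, float]]:
    """Naive search: return (lo, hi, count, confidence) for the subrange holding the
    most antecedent transactions whose confidence is at least min_conf."""
    # Keep only transactions containing the antecedent, ordered by position.
    # The boolean records whether the transaction also contains the consequent.
    rel = sorted((pos, consequent <= events) for pos, events in txns if antecedent <= events)
    best = None
    for i in range(len(rel)):
        if not rel[i][1]:
            continue  # a subrange starts only at a transaction that satisfies the rule
        hits = 0
        for j in range(i, len(rel)):
            hits += rel[j][1]
            count = j - i + 1
            # A subrange also ends only at a transaction that satisfies the rule.
            if rel[j][1] and count >= min_count and hits / count >= min_conf:
                if best is None or count > best[2]:
                    best = (rel[i][0], rel[j][0], count, hits / count)
    return best


# Made-up sample data: the rule A -> B holds only around positions 3 to 5.
txns = [(1, {"A"}), (2, {"A"}), (3, {"A", "B"}), (4, {"A", "B"}),
        (5, {"A", "B"}), (6, {"A"}), (7, {"A"}), (8, {"A", "C"})]

whole = confidence(txns, {"A"}, {"B"})         # 0.375 over the whole domain
local = confidence(txns, {"A"}, {"B"}, 3, 5)   # 1.0 inside the subrange [3, 5]
best = widest_strong_subrange(txns, {"A"}, {"B"}, min_conf=0.75, min_count=3)
```

On this sample the rule would be rejected at a 0.75 confidence threshold over the whole domain, yet it holds without exception in the subrange from 3 to 5, which is the kind of region a ranged association rule is meant to expose.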
In the mining algorithm for ranged association rules, we first present the process of exploring subranges of a one-dimensional domain for a single binary association rule, and then extend it to handle multiple binary rules. In this phase, we identify groups of related association rules based on their event sets. Because only one association rule per group is evaluated during mining, the performance of the process improves significantly. The event domains are also extended to multiple dimensions, which broadens the applicability of the algorithm. In addition, our simulation shows that the proposed algorithm performs reliably at an acceptable time cost in real application areas.
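Grouping rules by their event sets can be sketched as follows. This is an illustrative guess, not the thesis's procedure: it assumes a rule's event set is the union of its antecedent and consequent, and the helper names `event_set`, `group_by_event_set`, and `relation` are hypothetical. The labels loosely mirror the identical, nested, and overlapping groups listed in sections 4.2.2.1 to 4.2.2.3 of the table of contents; the abstract does not give their exact definitions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical rule layout: (antecedent events, consequent events).
Rule = Tuple[Set[str], Set[str]]


def event_set(rule: Rule) -> FrozenSet[str]:
    """All events mentioned by a rule (assumed here to be antecedent plus consequent)."""
    antecedent, consequent = rule
    return frozenset(antecedent) | frozenset(consequent)


def group_by_event_set(rules: List[Rule]) -> Dict[FrozenSet[str], List[Rule]]:
    """Collect rules that mention exactly the same events into one group."""
    groups = defaultdict(list)
    for rule in rules:
        groups[event_set(rule)].append(rule)
    return dict(groups)


def relation(rule_a: Rule, rule_b: Rule) -> str:
    """Classify how the event sets of two rules relate to each other."""
    a, b = event_set(rule_a), event_set(rule_b)
    if a == b:
        return "identical"
    if a < b or b < a:
        return "nested"       # one event set strictly contains the other
    if a & b:
        return "overlapping"  # some shared events, neither contains the other
    return "disjoint"


# Made-up rules for illustration.
rules = [({"A"}, {"B"}), ({"B"}, {"A"}), ({"A", "B"}, {"C"}), ({"C"}, {"D"})]
groups = group_by_event_set(rules)
```

Here `A -> B` and `B -> A` fall into the same group because both mention exactly `{A, B}`, so a search that depends only on the event set would need to run once for the pair.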

Table of Contents

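List of Figures = iii
List of Tables = v
Korean Abstract = vi
Chapter 1 Introduction = 1
1.1 Research Background = 1
1.2 Research Objectives = 4
1.3 Research Content and Thesis Organization = 6
Chapter 2 Related Work = 9
2.1 Data Mining = 9
2.2 Association Rule Mining = 16
2.3 Association Rule Mining Algorithms = 19
2.3.1 Apriori Algorithm = 20
2.3.2 PARTITION Algorithm = 23
2.3.3 Quantitative Association Rule Mining Algorithms = 25
2.3.4 Approaches of Existing Algorithms = 31
Chapter 3 One-Dimensional Ranged Association Rules = 34
3.1 Definition of One-Dimensional Ranged Association Rules = 35
3.2 Validity Measures for One-Dimensional Ranged Association Rules = 37
3.3 Data-Driven Partitioning = 41
3.3.1 Classification of Transactions = 41
3.3.2 Generation of Partitions = 42
Chapter 4 Mining Algorithm for One-Dimensional Ranged Association Rules = 46
4.1 Sub-range Search for a Single Association Rule = 47
4.1.1 Building the Partition Criteria Table = 49
4.1.2 Building and Merging Partition Tables = 52
4.2 Sub-range Search for Multiple Association Rules = 56
4.2.1 General Properties of Association Rules = 56
4.2.2 Characteristics of the Generated Association Rules = 60
4.2.2.1 Identical Groups = 63
4.2.2.2 Nested Groups = 65
4.2.2.3 Overlapping Groups = 67
4.3 Sub-range Search Algorithm for Multiple Association Rules = 69
Chapter 5 Extending the Event Domain = 76
5.1 Multi-Dimensional Ranged Association Rules = 76
5.2 Mining Algorithm for Multi-Dimensional Ranged Association Rules = 80
Chapter 6 Performance Analysis = 83
Chapter 7 Conclusion = 91
References = 93
English Abstract = 99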

영역 연관규칙을 위한 데이타 탐사 기법 = Data-driven exploration for ranged association rules
