RISS Search - Dissertation Detail View

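Korean Abstract (translated)

Artificial intelligence (AI) technology is driving innovative change across many industries, and advances in the computing systems that support it are essential. In particular, because AI workloads demand substantial computational resources and complex data flows, developing system architectures and management techniques that optimize and efficiently process them has emerged as an important research challenge. This dissertation presents new approaches for maximizing the efficiency of AI workloads in three major computing environments: on-device environments, cloud cluster environments, and cloud computing environments.
The first study focuses on using limited resources effectively in on-device environments to prevent performance degradation of AI workloads. It analyzes the memory reference characteristics of AI workloads in depth and, based on this analysis, proposes a new system architecture that uses non-volatile memory. The analysis shows that AI workloads have low temporal locality and irregular write patterns, which severely degrade processing performance under conventional system designs. The proposed architecture addresses these problems by using non-volatile memory as a write accelerator. In simulation, it improves memory I/O performance by more than 80% over existing systems.
The second study explores scheduling strategies that maximize GPU utilization in multi-tenant cloud clusters. Cloud environments combine diverse resources, such as heterogeneous GPUs, and resource fragmentation frequently lowers GPU utilization. This study proposes a scheduling methodology based on genetic algorithms that achieves efficient use of cluster resources and improved GPU utilization at the same time. In experiments using real cluster workload data, the proposed method improves GPU utilization by 12.8% over the existing round-robin and Tetris algorithms, without adversely affecting job completion time.
The third study presents a hierarchical framework that jointly optimizes resource allocation and power management in cloud computing environments. Existing resource management techniques cannot efficiently handle the resource demands of dynamic AI workloads, and they are limited in how well they balance power consumption against performance. This study designs a framework, based on state-of-the-art deep reinforcement learning (DRL), that integrates global resource allocation with local power management. The framework reduces power consumption by up to 13.97% while minimizing job latency, substantially improving resource management efficiency in cloud environments.
This dissertation presents an integrated, systematic approach to processing AI workloads efficiently through three distinct studies: memory management in on-device environments, scheduling optimization in cloud clusters, and integrated resource and power management in cloud computing environments. The approach maximizes the performance and resource utilization of AI workloads across diverse computing environments. It also provides an important foundation for the broad adoption and sustainability of next-generation AI applications. The work is expected to help AI technologies have a practical, innovative impact across a wide range of industries.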

Multilingual Abstract

Artificial intelligence (AI) technology is bringing innovative changes to many industries, and advances in the computing systems that support it are essential. In particular, AI workloads demand substantial computational resources and complex data flows. This makes it an important research challenge to develop system architectures and management techniques that optimize these workloads and process them efficiently. This dissertation presents new approaches for maximizing the efficiency of AI workloads in three main computing environments: on-device environments, cloud cluster environments, and cloud computing environments.
The first study focuses on making effective use of the limited resources of on-device environments to prevent performance degradation of AI workloads. It analyzes the memory reference characteristics of AI workloads in depth and, based on this analysis, proposes a new system architecture that leverages non-volatile memory. The analysis shows that AI workloads exhibit low temporal locality and irregular write patterns, which can severely degrade processing performance under conventional system designs. The proposed architecture addresses these issues by using non-volatile memory as a write accelerator. Simulation results show an improvement of over 80% in memory input/output performance compared with existing systems.
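One standard way to quantify the "low temporal locality" the first study reports is to replay a memory reference trace through an LRU cache and measure the hit ratio at a given capacity. The sketch below is illustrative only. The trace format (a list of page IDs) and the function name `lru_hit_ratio` are hypothetical, and this is not the analysis method described in Chapter II.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Replay a page-reference trace through an LRU cache holding `capacity` pages.

    Returns the fraction of references that hit. Low values at small
    capacities indicate low temporal locality (pages are rarely reused soon).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # refresh recency on a hit
        else:
            cache[page] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(trace) if trace else 0.0


if __name__ == "__main__":
    # Hypothetical traces: a tight loop reuses pages; a wider cyclic scan thrashes LRU.
    print(lru_hit_ratio([1, 2, 1, 2, 1, 2], capacity=2))
    print(lru_hit_ratio([1, 2, 3, 1, 2, 3], capacity=2))
```

The first trace hits on four of six references. The second hits on none, because each page is evicted just before it is reused. Evaluating a write accelerator such as the one proposed would need a different measurement (for example, counting writes absorbed before they reach slower storage); this sketch covers only the locality side.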
The second study explores scheduling strategies that maximize GPU utilization in multi-tenant cloud clusters. Cloud environments combine diverse resources, such as heterogeneous GPUs, and resource fragmentation frequently lowers GPU utilization. This study proposes a scheduling methodology based on genetic algorithms that simultaneously achieves efficient use of cluster resources and improved GPU utilization. Experiments using real cluster workload data show that the proposed method improves GPU utilization by 12.8% compared with the existing round-robin and Tetris algorithms, without negatively affecting job completion times.
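This page does not describe the thesis's fitness function, encoding, or operators (Chapter III.C). The following is therefore only a generic genetic-algorithm sketch of GPU job placement, built on these assumptions:

- Each gene holds the node index assigned to one job, with -1 meaning the job stays queued.
- Fitness is the fraction of cluster GPUs occupied, and oversubscribed placements score zero.
- The operators are tournament selection, one-point crossover, random-reset mutation, and elitist replacement.

All names (`schedule`, `fitness`, and so on) are hypothetical. The model ignores GPU heterogeneity, job duration, and fragmentation over time.

```python
import random


def fitness(chrom, job_gpus, node_caps):
    """Fraction of cluster GPUs occupied by placed jobs; 0.0 if any node is oversubscribed.

    chrom[j] is the node index assigned to job j, or -1 if job j stays queued.
    """
    used = [0] * len(node_caps)
    for job, node in enumerate(chrom):
        if node >= 0:
            used[node] += job_gpus[job]
    if any(u > c for u, c in zip(used, node_caps)):
        return 0.0  # infeasible placement
    return sum(used) / sum(node_caps)


def tournament(pop, scores, k, rng):
    """Tournament selection: return the fittest of k randomly drawn chromosomes."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """One-point crossover: prefix from parent a, suffix from parent b."""
    if len(a) < 2:
        return a[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(chrom, n_nodes, rate, rng):
    """Random-reset mutation: redraw each gene from {-1, ..., n_nodes-1} with probability `rate`."""
    return [rng.randrange(-1, n_nodes) if rng.random() < rate else g for g in chrom]


def schedule(job_gpus, node_caps, pop_size=40, generations=100, seed=0):
    """Evolve a job-to-node placement; returns (best chromosome, its fitness)."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(job_gpus), len(node_caps)
    pop = [[rng.randrange(-1, n_nodes) for _ in range(n_jobs)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c, job_gpus, node_caps) for c in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        nxt = [elite[:]]  # elitist replacement: the best chromosome always survives
        while len(nxt) < pop_size:
            p1 = tournament(pop, scores, 3, rng)
            p2 = tournament(pop, scores, 3, rng)
            nxt.append(mutate(crossover(p1, p2, rng), n_nodes, 0.1, rng))
        pop = nxt
    scores = [fitness(c, job_gpus, node_caps) for c in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


if __name__ == "__main__":
    # Hypothetical toy cluster: two 4-GPU nodes, five jobs requesting 4, 2, 2, 1, 1 GPUs.
    plan, util = schedule([4, 2, 2, 1, 1], [4, 4])
    print(plan, util)
```

The elite chromosome is copied forward unchanged, so the best fitness never decreases from one generation to the next. A realistic scheduler would also need to encode GPU types and re-plan as jobs arrive and finish.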
The third study presents a hierarchical framework for the integrated optimization of resource allocation and power management in cloud computing environments. Existing resource management techniques struggle to handle the resource demands of dynamic AI workloads efficiently, and they have limited ability to balance power consumption against performance. This research designs a framework, based on state-of-the-art deep reinforcement learning (DRL), that integrates global resource allocation with local power management. As a result, it reduces power consumption by up to 13.97% while minimizing job latency, significantly improving resource management efficiency in cloud environments.
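The abstract names DRL, and the table of contents lists A2C for global allocation and RL for local power management, but neither gives the state, action, or reward design. As a much simpler stand-in, the sketch below applies tabular Q-learning (not A2C, and no neural network) to a toy local power-management problem:

- The state is a discretized load level.
- The action is how many servers to keep active.
- The reward penalizes both power (active servers) and latency (missing capacity).

The constants, the uniformly random load, and all names are assumptions made for illustration.

```python
import random

LEVELS = 3              # hypothetical discretized load: 0 = low, 1 = medium, 2 = high
ACTIONS = [1, 2, 3]     # hypothetical action: number of active servers
POWER_PER_SERVER = 1.0  # assumed cost of keeping one server awake
SLA_PENALTY = 5.0       # assumed cost per missing server (a proxy for job latency)


def reward(load, active):
    """Negative cost: power for active servers plus a penalty for under-provisioning."""
    needed = load + 1
    return -(POWER_PER_SERVER * active + SLA_PENALTY * max(0, needed - active))


def train(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVELS)]
    state = rng.randrange(LEVELS)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))                          # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
        r = reward(state, ACTIONS[a])
        nxt = rng.randrange(LEVELS)  # toy model: load evolves independently of the action
        q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
        state = nxt
    return q


def greedy_policy(q):
    """Number of active servers the learned Q-table picks for each load level."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these assumed costs, the learned greedy policy keeps exactly `load + 1` servers awake, the cheapest choice that avoids the latency penalty. Lowering `SLA_PENALTY` below `POWER_PER_SERVER` would instead make the agent under-provision to save power, a simple illustration of the power/latency trade-off.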
This dissertation presents an integrated and systematic approach to the efficient processing of AI workloads through three distinct studies: memory management in on-device environments, scheduling optimization in cloud clusters, and integrated resource and power management in cloud computing environments. The approach aims to maximize the performance and resource utilization of AI workloads across diverse computing environments. It also provides a foundation for the broad adoption and sustainability of next-generation AI applications. The work is expected to help AI technologies have a practical and innovative impact across industrial sectors.

Table of Contents

I. Introduction
II. Memory Reference Analysis and Optimization Strategy for AI Workloads in On-device Systems
A. Introduction
B. Characterization of Memory References in Artificial Intelligence Workloads
1. Data Set and Workloads
2. Analyzing Memory Reference Characteristics
C. System Architecture for Artificial Intelligence Workloads in On-Device Systems
D. Performance Evaluations
E. Conclusions
III. Optimized Scheduling of AI Applications in Multi-Tenant Cluster Systems with Evolutionary Computation
A. Introduction
B. Problem Definition and the Basic Model
1. Overview of GPU Utilization in Multi-Tenant Environments
2. Challenges Associated with GPU Sharing Technology
3. GPU Fragmentation and Allocation Challenges
4. Scheduling System Architecture
5. Optimization of GPU Cluster Utilization
C. Scheduling Optimization by Genetic Algorithms
1. Fitness Function
2. Problem Encoding and Population Size
3. Selection
4. Crossover and Mutation
5. Replacement
D. Performance Evaluations
1. Convergence Test
2. GPU Utilization
3. Completion Time
4. Average Slowdown
5. Throughput
E. Discussion
F. Conclusions
IV. Enhanced Resource Allocation for AI Workloads in Cloud Computing Systems
A. Introduction
B. Background
1. Agent-Environment Interaction Modeling
2. Continuous-Time Q-learning for Advantage Actor-Critic (A2C)
C. Problem Statement and the System Model
D. Global System Model for Resource Allocation
1. Advantage Actor-Critic (A2C)-based Global Resource Allocation
E. Local System Model for Dynamic Power Management
1. Workload Prediction through Bidirectional Long Short-Term Memory (BiLSTM)
2. Reinforcement Learning (RL)-based Dynamic Power Management for Servers
F. Discussion
G. Experimental Results
1. Convergence Test
2. Job Latency
3. Power Consumption
4. Throughput
5. Trade-off
H. Conclusions
V. Conclusion
Bibliography
Abstract (in Korean)

System Architecture and Optimization Strategies for Efficient AI Workload Management