
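Korean Abstract (English translation)

Unlike general-purpose CPUs, GPUs (Graphics Processing Units) have evolved as specialized many-core streaming processors, and thanks to their outstanding parallel processing capability they are gradually taking over the role of the CPU in a growing number of areas. In line with this trend, NVIDIA recently announced CUDA (Compute Unified Device Architecture), a GPGPU (General Purpose GPU) architecture that provides a more flexible GPU programming environment. In general, programming with the CUDA API requires an accurate understanding of the characteristics of the various elements of the GPU's computational structure in order to develop efficient parallel software. This paper describes optimization techniques for CUDA programming obtained through a variety of experiments and trial and error, and examines how those techniques affect the efficiency of program execution. In particular, for specific example problems, we experimentally analyze several rules that affect performance, such as effective access to the hierarchical memory, the core activation ratio (occupancy), and latency hiding, and thereby present concrete measures that can be usefully applied to effective CUDA-based parallel programming in the future.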

Multilingual Abstract

Unlike general-purpose CPUs, GPUs have been specialized as many-core streaming processors and are replacing CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In response to this trend, NVIDIA has recently released a new parallel computing architecture called CUDA (Compute Unified Device Architecture), offering a flexible GPU programming environment for GPGPU (General Purpose GPU) computing. In general, programmers who use the CUDA API must clearly understand many aspects of the GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through extensive experimentation and trial and error, and review how those techniques affect the performance of code execution. In particular, we use a specific example problem to analyze several factors that affect performance, such as effective access to the hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several guidelines that can be applied effectively in CUDA-based parallel programming.
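The abstract names processor occupancy as one of the factors under study. As a rough illustration of the arithmetic involved, the sketch below estimates occupancy as the ratio of resident warps to the maximum number of warps a multiprocessor can hold, limited by registers, shared memory, and the block count. It is not taken from the paper: the `SMLimits` class, the `occupancy` function, and the default limits (chosen to resemble a GTX 200 class device) are assumptions for illustration, and the model ignores the allocation granularity that real hardware applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-multiprocessor resource limits (assumed, GTX 200 class values)."""
    max_warps: int = 32            # resident warps per multiprocessor
    max_blocks: int = 8            # resident thread blocks per multiprocessor
    registers: int = 16384         # 32-bit registers per multiprocessor
    shared_mem_bytes: int = 16384  # shared memory per multiprocessor
    warp_size: int = 32            # threads per warp


def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              lim=SMLimits()):
    """Estimate occupancy = resident warps / maximum resident warps.

    Simplified model: resources are divided exactly, without the
    allocation granularity used by real devices.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")

    # Warps needed by one block (ceiling division).
    warps_per_block = -(-threads_per_block // lim.warp_size)

    # How many blocks fit under each individual limit.
    by_warps = lim.max_warps // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * lim.warp_size
    by_regs = lim.registers // regs_per_block if regs_per_block else lim.max_blocks
    by_smem = (lim.shared_mem_bytes // smem_per_block
               if smem_per_block else lim.max_blocks)

    # The tightest limit decides how many blocks are resident.
    blocks = min(lim.max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / lim.max_warps


if __name__ == "__main__":
    # 256 threads, 16 registers per thread, 4 KB shared memory per block.
    print(occupancy(256, 16, 4096))  # 1.0
    # Doubling register use halves the number of resident blocks.
    print(occupancy(256, 32, 4096))  # 0.5
```

Under these assumed limits, raising register use from 16 to 32 per thread cuts the number of resident blocks from four to two, which leaves fewer warps available to hide memory latency. Small blocks can also cap occupancy: with 64 threads per block, the eight-block limit is reached at only 16 resident warps.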

Table of Contents

Summary
Abstract
1. Introduction
2. CUDA Programming Structure and Hierarchical Memory Access Optimization
3. Optimization through Occupancy Control and Latency Hiding
4. Analysis of the Effects of the Optimization Techniques on the Sobel Operation Problem
5. Analysis of the Effects of the Optimization Techniques on Matrix-Vector Multiplication
6. Conclusion
References
Appendix: CUDA Example Code Used in This Paper

References

1 Shuai Che, "Performance Study of General-Purpose Applications on Graphics Processors Using CUDA" 2008

2 NVIDIA, "Optimizing CUDA"

3 Shane Ryoo, "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA" ACM Press 2008

4 NVIDIA, "NVIDIA CUDA Visual Profiler (Version 2.3)"

5 NVIDIA, "NVIDIA CUDA Compute Unified Device Architecture: Technical Brief NVIDIA GeForce GTX 200 GPU Architectural Overview"

6 NVIDIA, "NVIDIA CUDA Compute Unified Device Architecture: Programming Guide (Version 2.3)"

7 "NVIDIA"

8 Mark Segal, Kurt Akeley, "The OpenGL Graphics System: A Specification (Version 2.1 - December 1)"

9 B. Parhami, "Introduction to Parallel Processing: Algorithms and Architectures" Plenum Press 377-379, 1999

10 Victor Podlozhnyuk, "Image Convolution with CUDA"

11 Joe Stam, "Convolution Soup" NVIDIA 2009

12 Maryam Moazeni, "A Memory Optimization Technique for Software-Managed Scratchpad Memory in GPUs" University of California 2009

13 Sobel, I, "A 3x3 Isotropic Gradient Operator for Image Processing", presented at a talk at the Stanford Artificial Project

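Date	Event type	Details
2022	Evaluation scheduled	Subject to application for re-accreditation evaluation (re-accreditation)
2019-01-01	Evaluation	Listed journal status maintained (continuing evaluation)
2016-01-01	Evaluation	Listed journal status maintained (continuing evaluation)
2015-01-01	Evaluation	Listed journal status maintained (listing maintained)
2014-09-16	Journal name change	Korean name: 정보과학회논문지 : 컴퓨팅의 실제 및 레터 -> 정보과학회 컴퓨팅의 실제 논문지; foreign-language name: Journal of KIISE : Computing Practices and Letters -> KIISE Transactions on Computing Practices
2013-04-26	Journal name change	Foreign-language name: Journal of KISS : Computing Practices and Letters -> Journal of KIISE : Computing Practices and Letters
2011-01-01	Evaluation	Listed journal status maintained (listing maintained)
2009-01-01	Evaluation	Listed journal status maintained (listing maintained)
2008-10-02	Journal name change	Korean name: 정보과학회논문지 : 컴퓨팅의 실제 -> 정보과학회논문지 : 컴퓨팅의 실제 및 레터; foreign-language name: Journal of KISS : Computing Practices -> Journal of KISS : Computing Practices and Letters
2007-01-01	Evaluation	Listed journal status maintained (listing maintained)
2005-01-01	Evaluation	Listed journal status maintained (listing maintained)
2002-01-01	Evaluation	Selected as listed journal (listing candidate, second round)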

Base year	WOS-KCI combined IF (2-year)	KCIF (2-year)	KCIF (3-year)	KCIF (4-year)	KCIF (5-year)	Centrality index (3-year)	Immediacy index
2016	0.29	0.29	0.27	0.24	0.21	0.503	0.04

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)
