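Software Performance Acceleration to Improve the Real-Time Capability of a VLIW (Very Long Instruction Word) Drone FCC (Flight Control Computer)
Doo-San Cho. Korean Society of Industry Convergence, 2017. Journal of the Korean Society of Industry Convergence, Vol.20 No.1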
Most conventional processors execute program instructions sequentially. A VLIW processor, by contrast, can execute multiple instructions at the same time, exploiting instruction-level parallelism to improve system performance. To do so, the program code must be rearranged into the VLIW instruction format by a compiler, which determines an optimal execution order for the program's instructions. This ordering is called instruction scheduling: an algorithm that decides the execution order of the instructions in the loop parts of a program so that instruction-level parallelism is maximized. In this research, we apply an existing scheduling algorithm to a VLIW FCC and analyze the results to find ways to further improve its performance. We then present a solution to some limitations of the existing scheduling technique. With our solution, the FCC's performance improves by up to 32% compared with using the existing scheduling alone.
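To make "instruction scheduling" concrete, here is a minimal sketch of greedy list scheduling that packs the operations of one basic block into VLIW words of a fixed issue width. It is a generic textbook technique, not the algorithm used in the paper, and it does not perform the loop-level (modulo) scheduling the abstract refers to. The function name, the per-operation latency table, and the assumption that any operation fits any issue slot are hypothetical simplifications.

```python
from collections import defaultdict

def list_schedule(ops, deps, latency, width):
    """Greedy list scheduling of one basic block into VLIW bundles.

    ops     -- operation names in program order
    deps    -- (producer, consumer) pairs; must form a DAG
    latency -- op -> cycles until its result can be consumed (>= 1)
    width   -- issue slots per VLIW instruction word (assumed uniform)
    Returns one list of ops per cycle; an empty list is a NOP word.
    """
    order = {op: i for i, op in enumerate(ops)}
    preds, succs = defaultdict(list), defaultdict(list)
    for p, c in deps:
        preds[c].append(p)
        succs[p].append(c)

    # Priority = latency-weighted path length to the end of the block,
    # so operations on the critical path are issued first.
    height = {}
    def h(op):
        if op not in height:
            height[op] = latency[op] + max((h(s) for s in succs[op]), default=0)
        return height[op]

    earliest = {op: 0 for op in ops}  # cycle when all operands are available
    issued = set()
    bundles = []
    cycle = 0
    while len(issued) < len(ops):
        ready = [op for op in ops
                 if op not in issued
                 and all(p in issued for p in preds[op])
                 and earliest[op] <= cycle]
        ready.sort(key=lambda op: (-h(op), order[op]))
        bundle = ready[:width]  # fill at most `width` slots this cycle
        for op in bundle:
            issued.add(op)
            for s in succs[op]:
                earliest[s] = max(earliest[s], cycle + latency[op])
        bundles.append(bundle)
        cycle += 1
    return bundles
```

For example, with operations `a`, `b`, `c`, `d`, edges `a->c`, `b->c`, `c->d`, unit latencies and `width=2`, the result is `[["a", "b"], ["c"], ["d"]]`: the two independent operations share one instruction word, which is exactly the parallelism a VLIW compiler tries to expose.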
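A Hybrid Memory Design Exploration Technique for Artificial Intelligence Applications
Doo-San Cho. Korean Society of Industry Convergence, 2021. Journal of the Korean Society of Industry Convergence, Vol.24 No.5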
As artificial intelligence technology advances, it is being applied to a wide range of application fields, and it performs especially well in image recognition and classification. Chips specialized for this field are also being actively studied; such AI-specific chips are designed to deliver optimal performance for their target applications. In this design task, optimizing the memory components has become an important issue, because the memory size is a key factor in producing a design that meets performance, cost, and power-consumption requirements. In this study, we present an optimal algorithm for memory size exploration.
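The abstract does not describe how the exploration algorithm works, so the following is only a generic illustration of memory-size design-space exploration: an exhaustive sweep that keeps the cheapest candidate meeting latency and power limits. It treats the memory as a single size parameter, whereas a hybrid memory would add further dimensions. `explore_memory_sizes`, `toy_model`, and every number in the model are hypothetical.

```python
def explore_memory_sizes(candidates, evaluate, max_latency, max_power):
    """Exhaustive design-space sweep over candidate memory sizes.

    evaluate(size) -> (latency, cost, power) is a user-supplied model
    (e.g. from simulation). Returns the cheapest size that meets both
    the latency and power limits, or None if no candidate is feasible.
    """
    best_size, best_cost = None, None
    for size in candidates:
        latency, cost, power = evaluate(size)
        if latency > max_latency or power > max_power:
            continue  # violates a requirement
        if best_cost is None or cost < best_cost:
            best_size, best_cost = size, cost
    return best_size


def toy_model(size_kb):
    # Hypothetical model: a bigger on-chip buffer cuts off-chip traffic
    # (lower latency) but costs more area and leakage power.
    latency = 4096 // size_kb
    cost = size_kb
    power = size_kb // 4 + 8
    return latency, cost, power
```

With `candidates=[16, 32, 64, 128, 256]`, `max_latency=64` and `max_power=50`, the sweep returns `64`: the two smaller sizes are too slow, `256` exceeds the power limit, and `64` is cheaper than `128`. A real exploration would replace `toy_model` with measured or simulated figures and would likely prune the space rather than enumerate it.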
Data Cluster Detection for Low Power Embedded Memory Subsystems
Park, Dae Jin; Cho, Jeong Hun; Cho, Doo San. Trans Tech Publications, 2015. Applied Mechanics and Materials
This work proposes a technique for optimizing the placement of application-wide reused data so that it resides in the scratchpad memories of processing elements in multiprocessor systems-on-chip. The proposed technique identifies, at fine granularity, the data elements that can profitably be placed in the scratchpad memories to maximize performance and energy gains. We present a heuristic approach that exploits the scratchpad memories efficiently using the memory access footprint. Our experimental results indicate that our approach reduces energy consumption by 30% compared with cache-based memory subsystems.
Park, Sang-Hyun; Cho, Doo-San; Kim, Tae-Song; Paek, Yun-Heung. The Institute of Electronics and Information Engin, 2006. Journal of Semiconductor Technology and Science, Vol.6 No.4
In the past decade, several tools have been developed to automate floating-point to fixed-point conversion for DSP systems. The conversion process introduces a number of scaling shifts, which inevitably alter the original code sequence. We have recently observed that this alteration can adversely affect a compiler, which consequently fails to generate efficient machine code for its target processor. In this paper, we present an optimization technique that safely migrates scaling shifts to other places within the code so that the compiler can produce better-quality code. We consider our technique safe in that it introduces no new overflows while preserving the original SQNR. Experiments on a commercial fixed-point DSP processor show that our technique achieves tangible improvements in code size and speed for a set of benchmarks.
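Computer Systems and Theory: An Effective Load Balancing Technique for Multicore Mobile Systems
Jung Seok Cho; Doo San Cho. Korea Information Processing Society, 2015. KIPS Transactions on Computer and Communication Systems, Vol.4 No.5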
The effectiveness of a multicore system depends on how efficiently its scheduler assigns tasks to the cores. On a heterogeneous multicore platform, an application's execution time depends on which core it runs on, so task assignment is a key factor in overall system performance. This work proposes a load scheduling technique that analyzes the execution time of each task by profiling. The profiling results provide the basic information needed to predict which task-to-core mapping is likely to give the best performance. Using this information, the proposed technique achieves a performance gain of about 26%.
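To show how a profiled execution-time table can drive task-to-core mapping, here is a minimal sketch of a greedy earliest-finish-time heuristic: take tasks longest first and put each on the core where it would finish soonest. This is a generic heuristic, not necessarily the paper's technique; it assumes independent tasks, one task at a time per core, and a profiled time for every task on every core. Function name, core names, and timings are hypothetical.

```python
def map_tasks(profile):
    """Greedy task-to-core mapping from profiled execution times.

    profile -- task -> {core -> profiled execution time}; every task must
               list every core.
    Returns (task -> core mapping, makespan).
    """
    cores = sorted({c for times in profile.values() for c in times})
    load = {c: 0 for c in cores}  # accumulated busy time per core
    mapping = {}
    # Longest tasks first (by their fastest profiled time), so big tasks
    # are not left over to unbalance the cores at the end.
    for task in sorted(profile, key=lambda t: min(profile[t].values()),
                       reverse=True):
        # Core on which this task would finish earliest; ties by name.
        best = min(cores, key=lambda c: (load[c] + profile[task][c], c))
        mapping[task] = best
        load[best] += profile[task][best]
    return mapping, max(load.values())
```

With a hypothetical big.LITTLE-style profile where `T1`, `T2`, `T3`, `T4` take 4, 3, 2, 1 time units on `big` and 10, 6, 3, 2 on `little`, the sketch maps `T2` to `little` and the rest to `big`, for a makespan of 7 instead of the 10 obtained by running everything on `big`.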
Instruction Format Design for Low Power Embedded Systems
Youn, Jong Hee M.; Park, Dae Jin; Cho, Jeong Hun; Cho, Doo San. Trans Tech Publications, 2015. Applied Mechanics and Materials
Embedded systems must deliver high performance while running on batteries. In such environments, systems must be optimized with the available techniques to reduce energy consumption without sacrificing performance; in mobile devices especially, power consumption is an important design constraint. Switching activity accounts for over 90% of total power consumption in a digital circuit. In this paper, we describe an approach to designing an instruction format for low-power instruction fetch. The proposed method reduces switching activity in the instruction fetch logic using a heuristic that minimizes switching between adjacent instructions: it encodes opcodes so that frequently executed instruction pairs have fewer bit changes.
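The cost the encoding tries to minimize can be written as the sum, over adjacent instruction pairs, of the pair's execution count times the number of opcode bits that flip (the Hamming distance). The sketch below computes that cost and finds the best encoding by exhaustive search. Exhaustive search is only feasible for a tiny opcode set and is used here in place of the paper's heuristic; the function names and the pair counts are hypothetical, and operand fields are ignored.

```python
from itertools import permutations

def switching_cost(encoding, pair_counts):
    """Sum of (pair count x bits flipped) over adjacent opcode pairs."""
    return sum(n * bin(encoding[a] ^ encoding[b]).count("1")
               for (a, b), n in pair_counts.items())

def best_encoding(opcodes, bits, pair_counts):
    """Try every assignment of distinct `bits`-wide codes to the opcodes."""
    if len(opcodes) > 2 ** bits:
        raise ValueError("not enough codes for all opcodes")
    best, best_cost = None, None
    for codes in permutations(range(2 ** bits), len(opcodes)):
        enc = dict(zip(opcodes, codes))
        cost = switching_cost(enc, pair_counts)
        if best_cost is None or cost < best_cost:
            best, best_cost = enc, cost
    return best, best_cost
```

With opcodes `LD`, `ADD`, `ST`, `BR`, 2-bit codes, and hypothetical pair counts `LD->ADD: 50`, `ADD->ST: 40`, `ST->LD: 30`, `ADD->BR: 5`, `BR->LD: 5`, the minimum cost is 165. Because `LD`, `ADD` and `ST` form a triangle, one of those pairs must differ in both bits; the search gives that penalty to the least frequent one, `ST->LD`.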
Kim, Yong-Joo; Youn, Jong-Hee; Cho, Doo-San; Paek, Yun-Heung. Korea Information Processing Society, 2012. KIPS Transactions Part A, Vol.19 No.1
As the mobile and small electronic device markets expand, demand for high-performance processors is soaring. A CGRA (Coarse-Grained Reconfigurable Architecture) satisfies both high-performance and low-power demands, making it a strong alternative to ASICs whose hardware can also be easily redesigned. Depending on the structure of an application, the total execution time on a CGRA can depend more on data transfer time than on the processor's own execution time. This paper presents a novel low-power mapping algorithm that reduces power consumption by optimizing the number of computation resources used in the mapping phase according to the data transfer time. Compared with a previous mapping algorithm, ours reduces energy consumption by up to 73%, and by 56.4% on average.
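The intuition behind transfer-aware resource selection can be sketched with a very simple model: if a kernel is bound by data transfer time, processing elements (PEs) beyond the number needed to hide the computation under that transfer add power without shortening execution. The function below picks that smallest PE count. It is a simplified illustration, not the paper's mapping algorithm; it assumes independent single-cycle operations, one operation per PE per cycle, and computation fully overlapped with transfer, and the name `min_pes` is hypothetical.

```python
def min_pes(num_ops, transfer_cycles, max_pes):
    """Fewest PEs whose compute time still fits under the transfer time.

    Assumes num_ops independent single-cycle ops, one op per PE per
    cycle, and computation fully overlapped with data transfer.
    """
    for k in range(1, max_pes + 1):
        compute_cycles = -(-num_ops // k)  # ceil(num_ops / k)
        if compute_cycles <= transfer_cycles:
            return k  # more PEs would not shorten a transfer-bound run
    return max_pes  # compute-bound: every PE is useful
```

For 64 operations on a 16-PE array, a 20-cycle transfer needs only 4 PEs (16 compute cycles), whereas a 2-cycle transfer leaves the kernel compute-bound and all 16 PEs are used. A real CGRA mapper must also respect data dependences and interconnect routing, which this model ignores.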