Energy-Efficient Design of Processing Element for Convolutional Neural Network
Choi, Yeongjae,Bae, Dongmyung,Sim, Jaehyeong,Choi, Seungkyu,Kim, Minhye,Kim, Lee-Sup IEEE 2017 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS PART 2 E Vol.64 No.11
<P>The convolutional neural network (CNN) is the most prominent algorithm owing to its wide usage and good performance. Although the processing element (PE) plays an important role in CNN processing, no study has focused on PE design optimized for state-of-the-art CNN algorithms. In this brief, we propose an optimal PE implementation, including a data representation scheme, circuit block configurations, and control signals, for energy-efficient CNN processing. To validate the proposed design, we compared it with several previous methods and fabricated a silicon chip. Software simulation results show that data bit lengths can be reduced by 54% with negligible accuracy loss. Our PE optimization saves up to 47% of computing power, and an accelerator exploiting our method shows superior results in terms of power, area, and external DRAM access.</P>
Seungwook Paek,Wongyu Shin,Jaehyeong Sim,Lee-Sup Kim IEEE 2013 IEEE transactions on computer-aided design of inte Vol.32 No.10
<P>Temperature-to-power technique is useful for post-silicon power model validation. However, the previous works were applicable only to the steady-state analysis. In this paper, we propose a new temperature-to-power technique, named PowerField, supporting both transient and steady-state analysis based on a probabilistic approach. Unlike the previous works, PowerField uses two consecutive thermal images to find the most feasible power distribution that causes the change between the two input images. To obtain the power map with the highest probability, we adopted maximum a posteriori Markov random field (MAP-MRF). For MAP-MRF framework, we modeled the spatial thermal system as a set of thermal nodes and derived an approximated transient heat transfer equation that requires only the local information of each thermal node. Experimental results with a thermal simulator show that PowerField outperforms the previous method in transient analysis reducing the error by half on average. We also show that our framework works well for steady-state analysis by using two identical steady-state thermal maps as inputs. Lastly, an application to determining the binary power patterns of an FPGA device is presented achieving 90.7% average accuracy.</P>
Taeho Lee,Yong-Hun Kim,Jaehyeong Sim,Jun-Seok Park,Lee-Sup Kim IEEE 2016 IEEE transactions on very large scale integration Vol.24 No.4
<P>A digital clock and data recovery (CDR) employing a time-dithered delta-sigma modulator (TDDSM) is presented. By enabling hybrid dithering of a sampling period as well as an output bit of the TDDSM, the proposed CDR enhances the resolution of the digitally controlled oscillator, removes a low-pass filter in the integral path, and reduces jitter generation. Fabricated in a 65-nm CMOS process, the proposed CDR operates at a 5-Gb/s data rate with BER < 10^-12 for PRBS 31. The CDR consumes 13.32 mW at 5 Gb/s and achieves long-term rms and peak-to-peak jitter of 2.14 and 29.7 ps, respectively.</P>
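As a rough illustration of why dithering an output bit can raise the effective resolution of a digitally controlled oscillator, the sketch below implements a generic first-order delta-sigma modulator. It is not the TDDSM from the abstract (it does not dither the sampling period), and the function name `delta_sigma_bits` and its parameters are hypothetical. The only point it makes is that a 1-bit stream can represent a fractional control value through its running average.

```python
def delta_sigma_bits(value, n):
    """Generic first-order delta-sigma modulator (illustrative only).

    value: fractional input in [0, 1], e.g. the sub-LSB part of a control word.
    n:     number of output bits to generate.
    Returns a list of 0/1 bits whose average approximates `value`.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    acc = 0.0          # accumulator holding the running quantization error
    bits = []
    for _ in range(n):
        acc += value
        if acc >= 1.0:  # overflow: emit a 1 and subtract the full-scale step
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
    return bits


if __name__ == "__main__":
    stream = delta_sigma_bits(0.25, 16)
    print(stream, sum(stream) / len(stream))
```

Averaged over many cycles, the bit stream toggles the least significant control bit just often enough to realize an intermediate value that the raw control word could not encode.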
ToMato: Accelerating Vision Transformer Using Token Merging
Sooyoung Kwon,Minseo Kwon,Hyojin Kim,Jaehyeong Sim The Institute of Electronics and Information Engineers (대한전자공학회) 2023 IEIE Conference Vol.2023 No.11
ViT (Vision Transformer) shows outstanding performance in various vision tasks by splitting images into patches and passing them through transformer blocks. However, the large model size and computational cost of ViT result in high inference latency and make acceleration difficult. To accelerate ViT efficiently, we introduce ToMato (Token Merging at Once), a simple framework that recursively merges tokens by comparing each token's similarity to its adjacent tokens at the first transformer block. Applying ToMato to the DeiT-base model reduces latency by 22.19% while maintaining a high Top-1 accuracy of 80.14%. Our code is available at https://github.com/Transformer04/ToMato.
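The idea of repeatedly merging similar adjacent tokens can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the ToMato repository: it treats tokens as a 1-D sequence, uses cosine similarity with a fixed threshold, and merges by plain averaging. The names `merge_pass`, `merge_tokens`, `threshold`, and `max_rounds` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def merge_pass(tokens, threshold):
    """One left-to-right pass: average each adjacent pair that is similar enough."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and cosine(tokens[i], tokens[i + 1]) >= threshold:
            # Assumption: merge by unweighted averaging of the two tokens.
            merged.append([(x + y) / 2 for x, y in zip(tokens[i], tokens[i + 1])])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def merge_tokens(tokens, threshold=0.9, max_rounds=4):
    """Repeat merge passes until nothing merges or max_rounds is reached."""
    for _ in range(max_rounds):
        reduced = merge_pass(tokens, threshold)
        if len(reduced) == len(tokens):
            break  # no pair exceeded the threshold in this round
        tokens = reduced
    return tokens


if __name__ == "__main__":
    patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(merge_tokens(patches))
```

Because every later transformer block then operates on the shorter sequence, reducing the token count once at an early block lowers the cost of all subsequent attention and MLP layers. A practical implementation would also need to account for the 2-D patch layout and for tokens that already represent several merged patches, which this sketch ignores.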
QTNAAS: A Template-Based Quantized Neural Architecture and Accelerator Search Framework
HaYoung Lim,Kyungmi Kim,Yeseo Jang,Juyeon Kim,Jaehyeong Sim The Institute of Electronics and Information Engineers (대한전자공학회) 2023 IEIE Conference Vol.2023 No.11
This work presents QTNAAS, a quantized template-based neural architecture and accelerator search framework. Each neural operator, at each of its quantization levels, is paired with the hardware block that executes it most efficiently. We call this pair a quantized template. This approach reduces both the search space and the time spent on performance estimation, making it more likely that an optimal neural architecture is found. The resulting neural architecture is bound to the hardware through templates, sustaining 100% utilization. Evaluation results show that our method can build a neural network with comparable accuracy, even when employing mixed precision, together with an accelerator that has lower latency and energy consumption than existing works.