As deep learning models continue to grow in scale, their parameter counts have increased sharply, creating a significant memory bottleneck in conventional von Neumann architecture-based systems. To address this issue, new memory technologies such as Processing-In-Memory (PIM) are being developed, and their importance is steadily growing. However, because PIM adds logic to the existing memory structure, an in-depth analysis of which workloads suit PIM is required in advance to avoid unnecessary overhead in the design process. In this paper, to verify the suitability of the recently popular Vision Transformer (ViT) model for PIM, we build a deep learning model analysis environment on the McSimA+ simulator and analyze the memory bottleneck of the ViT inference workload layer by layer. The analysis shows that ViT, which consists of embedding, multi-head self-attention, and multi-layer perceptron layers, is a highly memory-intensive workload: its Last-to-First Miss Ratio (LFMR) and Last-Level Cache Misses Per Kilo Instruction (LLC MPKI) are 88.64 and 45.31, respectively, on average. Consequently, unlike computationally intensive convolutional neural networks (CNNs), ViT is an appropriate workload for achieving significant system acceleration and power savings through PIM systems.
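For readers unfamiliar with the two metrics, the minimal Python sketch below shows how they can be derived from per-layer cache counters. The counter names are hypothetical placeholders rather than actual McSimA+ output fields, and the LFMR definition used here (last-level cache misses divided by first-level cache misses, expressed as a percentage) is the conventional one, stated as an assumption about the paper's usage.

```python
# Minimal sketch: computing the two memory-intensity metrics from
# per-layer cache counters. Counter names are hypothetical placeholders,
# not actual McSimA+ statistics fields.

def lfmr(l1_misses: int, llc_misses: int) -> float:
    """Last-to-First Miss Ratio: the share of first-level cache misses
    that also miss in the last-level cache (assumed here to be reported
    as a percentage)."""
    return 100.0 * llc_misses / l1_misses

def llc_mpki(llc_misses: int, instructions: int) -> float:
    """Last-Level Cache Misses Per Kilo Instruction."""
    return 1000.0 * llc_misses / instructions

# Made-up counter values for one layer, chosen only so the outputs land
# near the paper's reported averages:
print(f"LFMR     = {lfmr(l1_misses=1_000_000, llc_misses=886_400):.2f}")
print(f"LLC MPKI = {llc_mpki(llc_misses=886_400, instructions=19_563_000):.2f}")
```

Under these definitions, a high LFMR means most accesses that escape the L1 cache also miss in the LLC and go to main memory, which together with a high MPKI is what marks a workload as memory-bound and thus a candidate for PIM acceleration.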