RISS (Academic Research Information Service)

      • KCI-indexed

        Memory Representations in Visual Working Memory: Representational Quality and Memory Access

        신은삼, Monica Fabiani, Gabriele Gratton. Korean Society for Cognitive and Biological Psychology, 2013. The Korean Journal of Cognitive and Biological Psychology, Vol.25 No.4

        Previously, Shin and colleagues (2006) reported sequential deflections of encoding-related lateralization (ERL) waveforms in event-related potentials (ERPs). One of these deflections, observed at posterior electrode sites (P7/P8), started about 400 ms poststimulus and depended on both memory set-size and the degree of matching between memory sets and test probes. These findings suggest a level of processing at which relations among items and the degree of memory access matter in visual working memory. Based on these findings, the present study investigated representational quality and degree of memory access. It was hypothesized that representational quality could be lowered by competition between stimuli (local suppression), and that the degree of memory access would be lowered when probes only partially match memory-set stimuli (partial matching). The relative distance (close or far) and similarity (homogeneous or heterogeneous) between memory-set stimuli were varied. ERPs were recorded while participants made old or new responses to single probes preceded by memory sets (of size 2 or 4). ERL results from 33 participants showed (a) that large ERL effects were found at the P7/P8 sites with a latency of 400-700 ms from probe onset, similar to Shin et al. (2006); (b) that significant ERL activity was observed only for the homogeneous memory sets presented far apart; and (c) that the heterogeneous memory sets presented nearby showed significantly smaller ERL activity than set-size 2 memory sets (representing no suppression and complete matching). These results support a hybrid of the local suppression and partial matching hypotheses, suggesting that representational quality and degree of memory access can jointly influence visual working memory processing.

      • Access region cache with register guided memory reference partitioning

        Le, G.; Shi, Y. Elsevier, 2009. Journal of Systems Architecture, Vol.55 No.10

        Wide-issue and high-frequency processors require not only a low-latency but also a high-bandwidth memory system to achieve high performance. Previous studies have shown that using multiple small single-ported caches instead of a monolithic large multi-ported one for the L1 data cache can be a scalable and inexpensive way to provide higher bandwidth. Various schemes for directing memory references have been proposed in order to approach the performance of an ideal multi-ported cache. However, most existing designs seldom take dynamic data access patterns into consideration, and thus suffer from access conflicts within a cache and unbalanced load across the caches. This paper observes that if the data references in a program can be grouped into several regions (access regions) that allow parallel accesses, providing a separate small cache (an access region cache) for each region can yield better performance. A register-guided memory reference partitioning approach is proposed that effectively identifies these semantic regions and organizes them into multiple caches adaptively to maximize concurrent accesses. The base register name, not its content, in the memory reference instruction is used as the basic guide for instruction steering. Starting from an initial assignment to a specific access region cache per base register name, a reassignment mechanism captures the access pattern as the program moves across its access regions. In addition, a distribution mechanism adaptively lets access regions extend or shrink among the physical caches to further reduce potential conflicts. Simulations of SPEC CPU2000 benchmarks show that the semantic-based scheme reduces conflicts effectively and obtains considerable IPC improvement: with 8 access region caches, integer benchmarks achieve 25-33% higher IPC than a comparable 8-banked cache, while the benefit for floating-point benchmarks is smaller, at most 19%.
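The register-guided steering idea in this abstract can be illustrated with a toy model: each memory reference is routed to one of several small access region caches based on the base register named in the instruction, with a reassignment hook for when the access pattern shifts. All class and method names below are illustrative assumptions, not Le and Shi's actual design.

```python
# Toy model of register-guided memory reference partitioning: references
# are steered by the base register *name* in the instruction, not by the
# address it computes.

class AccessRegionCaches:
    def __init__(self, num_caches):
        self.num_caches = num_caches
        self.assignment = {}           # base register -> cache index
        self.loads = [0] * num_caches  # references steered to each cache

    def steer(self, base_register):
        """Return the cache index for a reference using this base register."""
        if base_register not in self.assignment:
            # Initial assignment: hash the register name to a cache.
            self.assignment[base_register] = hash(base_register) % self.num_caches
        idx = self.assignment[base_register]
        self.loads[idx] += 1
        return idx

    def reassign(self, base_register, cache_index):
        """Reassignment hook: move a register's region to another cache
        when the observed access pattern shifts."""
        self.assignment[base_register] = cache_index

caches = AccessRegionCaches(num_caches=4)
a = caches.steer("r1")            # first use: hashed to some cache
b = caches.steer("r1")            # same register, same cache
caches.reassign("r1", (a + 1) % 4)
c = caches.steer("r1")            # after reassignment: the new cache
```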

      • KCI-indexed

        T-CAS: A Timer-Based Concurrency Control Scheme for Shared-Memory Systems Without Atomic Compare & Swap

        신재권, 최용석, 안신영, 이상길, 김정열, 김준형, 조경연, 이철훈. Korean Institute of Information Scientists and Engineers (KIISE), 2022. KIISE Transactions on Computing Practices, Vol.28 No.5

        Concurrency control is used to prevent race conditions when two or more entities access the same memory. To prevent race conditions when multiple physically separated nodes access shared memory, prior work has used transactional memory or control by a master node. However, these centralized approaches suffer from slow access to shared memory and heavy system overhead. This paper therefore presents an algorithm that lets computing resources access shared memory mutually exclusively, without central control, in environments where shared memory is accessed at high speed, and demonstrates through testing that the atomicity of shared-memory data can be guaranteed.
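The abstract does not describe the T-CAS protocol itself, so the sketch below only illustrates the general idea its title names: replacing an atomic Compare & Swap with a timer-based lease that expires if its holder stalls. The lease rule and all names are assumptions for illustration, not the paper's algorithm.

```python
# Timer-based mutual exclusion sketch: a node owns the shared region only
# while its lease is valid; an expired lease can be taken over by another
# node, so no atomic Compare & Swap instruction is required.
import time

class TimerLease:
    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.owner = None
        self.expires_at = 0.0

    def try_acquire(self, node_id, now=None):
        now = time.monotonic() if now is None else now
        # The lease is free if unowned or if the previous owner timed out.
        if self.owner is None or now >= self.expires_at:
            self.owner = node_id
            self.expires_at = now + self.lease_seconds
            return True
        return self.owner == node_id

    def release(self, node_id):
        if self.owner == node_id:
            self.owner = None

lease = TimerLease(lease_seconds=0.05)
assert lease.try_acquire("node-A", now=0.00)      # A gets the lease
assert not lease.try_acquire("node-B", now=0.01)  # B is excluded
assert lease.try_acquire("node-B", now=0.06)      # A's lease has expired
```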

      • SCIE / SCOPUS / KCI-indexed

        Algorithmic GPGPU Memory Optimization

        Jang, Byunghyun; Choi, Minsu; Kim, Kyung Ki. The Institute of Electronics and Information Engineers, 2014. Journal of Semiconductor Technology and Science, Vol.14 No.4

        The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that can exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses present in serial loop nest to underlying data-parallel architectures based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.
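A minimal sketch of the kind of static access-pattern analysis this methodology builds on: classifying an affine index expression a*tid + b by its stride across adjacent thread ids. The three categories and the model below are simplified assumptions; the paper's mathematical model is far richer.

```python
# Classify the memory addresses touched by consecutive GPU thread ids for
# an affine access pattern a*tid + b. Unit stride means adjacent threads
# hit adjacent elements, which hardware can coalesce into few transactions.

def classify_access(a, b, num_threads=8):
    """Classify addresses a*tid + b touched by consecutive thread ids."""
    addrs = [a * tid + b for tid in range(num_threads)]
    strides = {addrs[i + 1] - addrs[i] for i in range(num_threads - 1)}
    if strides == {0}:
        return "uniform"      # all threads read the same element
    if strides == {1}:
        return "coalesced"    # unit stride: ideal for GPU global memory
    return "strided"          # larger strides imply scattered transactions

assert classify_access(a=1, b=0) == "coalesced"
assert classify_access(a=0, b=5) == "uniform"
assert classify_access(a=4, b=0) == "strided"
```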

      • SCI / SCIE / SCOPUS / KCI-indexed

        Distributed memory access architecture and control for fully disaggregated datacenter network

        Kyeong-Eun Han, Ji Wook Youn, Jongtae Song, Dae-Ub Kim, Joon Ki Lee. Electronics and Telecommunications Research Institute (ETRI), 2022. ETRI Journal, Vol.44 No.6

        In this paper, we propose a novel disaggregated memory module (dMM) architecture and memory access control schemes that solve the collision and contention problems of memory disaggregation, reducing the average memory access time to less than 1 μs. In these schemes, the distributed scheduler in each dMM determines the order of memory read/write access based on delay-sensitive priority requests in the disaggregated memory access frame (dMAF). We use the memory-intensive-first (MIF) algorithm and a priority-based MIF (p-MIF) algorithm, which prioritize delay-sensitive and/or memory-intensive (MI) traffic over CPU-intensive (CI) traffic. We evaluated the performance of the proposed schemes through OPNET simulation and hardware implementation. Our results showed that when the offered load was below 0.7 and the dMAF payload was 256 bytes, the average round-trip time (RTT) was lowest, at about 0.676 μs. The dMM scheduling algorithms, MIF and p-MIF, achieved delays of less than 1 μs for all MI traffic with less than 10% transmission overhead.
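The MIF/p-MIF ordering described in the abstract can be sketched as a priority queue that serves delay-sensitive requests first, then memory-intensive ones, with FIFO order inside each class. The field names and the exact tie-break below are assumptions, not the paper's scheduler.

```python
# Priority-based memory-intensive-first (p-MIF) scheduling sketch for a
# disaggregated memory module: delay-sensitive > memory-intensive (MI) >
# CPU-intensive (CI), FIFO within a class.
import heapq
import itertools

class PMIFScheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a class

    def submit(self, request_id, memory_intensive, delay_sensitive=False):
        # Lower tuples pop first: delay-sensitive, then MI, then arrival order.
        key = (0 if delay_sensitive else 1,
               0 if memory_intensive else 1,
               next(self._seq))
        heapq.heappush(self._heap, (key, request_id))

    def next_request(self):
        return heapq.heappop(self._heap)[1]

sched = PMIFScheduler()
sched.submit("ci-1", memory_intensive=False)
sched.submit("mi-1", memory_intensive=True)
sched.submit("mi-urgent", memory_intensive=True, delay_sensitive=True)
order = [sched.next_request() for _ in range(3)]
assert order == ["mi-urgent", "mi-1", "ci-1"]
```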

      • KCI-indexed

        A Memory Controller Architecture That Adaptively Accommodates Transmission-Line Delays of High-Speed Memories

        Lee, Chanho; Koo, Kyochul. Institute of Korean Electrical and Electronics Engineers (IKEEE), 2013. Journal of IKEEE, Vol.17 No.2

        When a memory controller and a high-speed memory are mounted on a PCB, the propagation delay of data depends on the shape and length of the interconnection lines. Because this delay changes with component placement and routing, the controller's I/O logic must be redesigned, or its internal settings reconfigured at initialization, each time a new board is built. In this paper, we propose a controller architecture with configuration logic that estimates the delay during memory initialization by writing and reading test patterns. The configuration logic writes test patterns to the memory and reads them back while shifting the read timing in minimum time steps until the patterns are read correctly; the successful timing is stored, and the controller applies it when normal operation begins after initialization. The proposed method solves the controller-memory timing mismatch caused by variation in placement and routing, since the delay is recalibrated at initialization even on a newly built PCB. It can be applied to high-speed SRAM, DRAM, and flash memories.
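The training procedure described in the abstract can be sketched as a sweep over delay taps: write a test pattern, read it back at each tap setting, and keep the center of the passing window for maximum timing margin. The readback function below is a hypothetical stand-in for real hardware, and all names are assumptions.

```python
# Delay-line training sketch: find the read-timing tap at which a written
# test pattern is read back correctly, then pick the middle of the passing
# window so the setting has margin on both sides.

TEST_PATTERN = 0xA5

def read_with_delay(tap, passing_taps):
    """Hypothetical readback: correct data only inside the valid window."""
    return TEST_PATTERN if tap in passing_taps else 0x00

def train_delay(num_taps, passing_taps):
    passing = [t for t in range(num_taps)
               if read_with_delay(t, passing_taps) == TEST_PATTERN]
    if not passing:
        raise RuntimeError("no working delay setting found")
    # Center of the passing window gives the most timing margin.
    return passing[len(passing) // 2]

# Suppose taps 3..6 sample the data eye correctly on this board.
tap = train_delay(num_taps=16, passing_taps=range(3, 7))
assert tap == 5
```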

      • KCI-indexed

        CPWL: Clock and Page Weight Based Disk Buffer Management Policy for Flash Memory Systems

        Byung Kook Kang, Jong Wook Kwak. Korean Society of Computer Information, 2020. Journal of the Korea Society of Computer and Information, Vol.25 No.2

        The use of NAND flash memory continues to grow with the demand for mobile data in the IT industry. However, erase operations in flash memory incur long latency and high power consumption, and each cell endures only a limited number of erases, so frequent write/erase operations reduce both the performance and the lifetime of the flash memory. To address this, techniques that use a disk buffer to reduce the write and erase operations issued to flash storage have been studied. In this paper, we propose CPWL, a disk buffer management policy that minimizes the number of write operations by managing read and write pages separately according to buffer memory access patterns. By arranging pages according to the access mode of requested pages, CPWL reduces the number of writes, extending the lifetime of the flash memory and decreasing energy consumption.
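The abstract gives only the outline of CPWL, so the sketch below shows just the read/write separation idea: evict clean read pages first so that dirty pages stay buffered and are flushed to flash less often. Page weights and the clock hand are not modeled, and all names are assumptions rather than the paper's policy.

```python
# Disk buffer with separated read (clean) and write (dirty) page lists.
# Evicting a clean page is free; evicting a dirty page costs a flash write,
# so clean pages are sacrificed first.
from collections import OrderedDict

class RWSeparatedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reads = OrderedDict()   # clean pages: cheap to drop
        self.writes = OrderedDict()  # dirty pages: eviction costs a flash write
        self.flash_writes = 0

    def access(self, page, is_write):
        self.reads.pop(page, None)
        if not is_write and page in self.writes:
            return  # already buffered dirty; keep it in the write list
        target = self.writes if is_write else self.reads
        target[page] = True
        target.move_to_end(page)
        while len(self.reads) + len(self.writes) > self.capacity:
            if self.reads:
                self.reads.popitem(last=False)   # free eviction
            else:
                self.writes.popitem(last=False)  # costs a flash write
                self.flash_writes += 1

buf = RWSeparatedBuffer(capacity=2)
buf.access("p1", is_write=True)
buf.access("p2", is_write=False)
buf.access("p3", is_write=False)   # evicts clean p2, not dirty p1
assert "p1" in buf.writes and buf.flash_writes == 0
```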

      • KCI-indexed

        A Fusion Processor Architecture for Efficient Main-Memory Access in CPU-GPU Environments

        Hyun-Moon Park, Jin-San Kwon, Tae-Ho Hwang, Dong-Sun Kim. The Korea Institute of Electronic Communication Sciences, 2016. The Journal of the Korea Institute of Electronic Communication Sciences, Vol.11 No.2

        The Heterogeneous System Architecture (HSA) resolves a long-standing problem of CPU and GPU architectures by letting both units access each other's memory pools directly through unified virtual memory. In a physically realized system, however, frequent data exchanges between CPU and GPU for virtual memory handling cause bottlenecks and coherence-request overheads. In this paper, we propose a fusion processor architecture for efficient main-memory access from both CPU and GPU. It consists of a Job Manager, a Re-mapper, and a Pre-fetcher that control, organize, and distribute workloads and working memory regions across GPU cores. These components reduce memory exchanges between the two processors and improve overall efficiency by eliminating page-table requests for addresses absent from CPU memory. To verify the proposed architecture, we developed an emulator based on QEMU (Quick EMUlator) and compared it against CUDA (Compute Unified Device Architecture), OpenMP, and OpenCL implementations. In the evaluation, the proposed fusion processor architecture ran up to 198% faster than the alternatives by removing unnecessary memory copies and cache-miss overheads.
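The Job Manager's distribution step described in the abstract can be sketched as splitting a CPU-requested memory region into near-equal chunks, one per GPU core. This is an assumed illustration of that step only; the Re-mapper and Pre-fetcher behaviors are not modeled, and all names are hypothetical.

```python
# Partition a memory region [start, start + length) into one contiguous
# chunk per GPU core, distributing the remainder to the leading cores.

def distribute_region(start, length, num_cores):
    """Return (core, offset, size) chunks covering the region exactly."""
    base, extra = divmod(length, num_cores)
    chunks, offset = [], start
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)
        chunks.append((core, offset, size))
        offset += size
    return chunks

chunks = distribute_region(start=0x1000, length=10, num_cores=4)
assert [c[2] for c in chunks] == [3, 3, 2, 2]          # sizes per core
assert chunks[-1][1] + chunks[-1][2] == 0x1000 + 10    # region fully covered
```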
