An Efficient Merging Algorithm for Recovery and Garbage Collection in Incremental Checkpointing
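Junyoung Heo, Sangho Yi, Jiman Hong, Yookun Cho Chosun University Research Institute of Electronics and Information Communication 2003 Journal of the Research Institute of Electronics and Information Communication Vol.6 No.2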
Incremental checkpointing uses page write-protection to save only the pages modified since the previous checkpoint. Although this reduces checkpointing overhead, old checkpoints cannot simply be merged or deleted, because the process's memory pages become spread out over many incremental checkpoints. In this paper, we present an efficient merging algorithm for recovery and garbage collection in incremental checkpointing. The proposed algorithm can merge several incremental checkpoints into a full checkpoint for recovery and can delete unnecessary incremental checkpoints.
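The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, assuming each incremental checkpoint can be modeled as a mapping from page number to saved page contents. The names `merge_checkpoints` and `collectable` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Assumption: a checkpoint is modeled as {page_number: page_contents}.
Checkpoint = Dict[int, bytes]


def merge_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Merge checkpoints (ordered oldest first) into one full checkpoint.

    For every page, the copy from the newest checkpoint containing it wins.
    """
    full: Checkpoint = {}
    for ckpt in checkpoints:
        full.update(ckpt)  # later checkpoints overwrite older page copies
    return full


def collectable(checkpoints: List[Checkpoint]) -> List[int]:
    """Return indices of checkpoints whose pages are all saved again later.

    Such checkpoints contribute nothing to the merged state, so in this
    simplified model they could be deleted without affecting recovery.
    """
    seen_later: Set[int] = set()
    garbage: List[int] = []
    # Walk from newest to oldest, tracking pages already covered.
    for idx in range(len(checkpoints) - 1, -1, -1):
        pages = set(checkpoints[idx])
        if pages <= seen_later:
            garbage.append(idx)
        seen_later |= pages
    return sorted(garbage)


ckpts = [
    {0: b"a0", 1: b"b0", 2: b"c0"},  # initial full checkpoint
    {1: b"b1"},
    {0: b"a2", 1: b"b2"},
    {2: b"c3"},
]
full = merge_checkpoints(ckpts)
unneeded = collectable(ckpts)
```

In this toy run, the two oldest checkpoints are fully superseded, and recovery from the remaining two yields the same merged state.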
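A Page-Based Incremental Checkpointing Technique for Efficient Use of Stable Storage
Junyoung Heo, Sangho Yi, Boncheol Gu, Yookun Cho, Jiman Hong Korean Institute of Information Scientists and Engineers 2007 Journal of KIISE: Computer Systems and Theory Vol.34 No.11·12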
Page-based incremental checkpointing reduces checkpointing overhead by saving only the pages of a process's memory state that have been modified. However, the cumulative size of the incremental checkpoints grows steadily as the number of checkpoints increases, because a page that is modified after one checkpoint is saved again at the next, and the older copies are never deleted. Checkpoints cannot be deleted carelessly, since all of them may be needed to reconstruct the saved state of the process at recovery. In this paper, we introduce Pickpt, a page-level incremental checkpointing facility, and describe how it addresses the growth in cumulative checkpoint size. Pickpt provides space-efficient techniques that aim to minimize the use of disk space. Experiments showed that Pickpt significantly reduced disk space usage compared with existing incremental checkpointing.
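Checkpoint Interval Optimization for Real-Time Multi-Tasks with Arbitrary Periods
Seong Woo Kwak, Jung-Min Yang The Korean Institute of Electrical Engineers 2011 The Transactions of the Korean Institute of Electrical Engineers Vol.60 No.1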
This paper presents an optimal checkpoint strategy for fault tolerance in real-time systems. In our environment, multiple real-time tasks with arbitrary periods are scheduled by the Rate Monotonic (RM) algorithm, and checkpoints are inserted at a constant interval within each task, although the interval length may differ from task to task. We propose a method to determine the optimal checkpoint interval for each task so that the probability of completing all the tasks is maximized. Whenever a fault occurs in a checkpoint interval of a task, the execution time of the task is prolonged by rollback and re-execution from the last checkpoint. Our scheme includes a schedulability test to examine whether a task can be completed with an extended execution time. A numerical experiment is conducted to demonstrate the applicability of the proposed scheme.
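The abstract does not say which schedulability test the paper uses. As an illustration only, the sketch below swaps in the classic Liu and Layland utilization bound for RM, which is sufficient but not necessary, and applies it to execution times extended by checkpoints and rollbacks. The cost model in `extended_time` (each fault re-runs one full interval) is an assumption, and all function names are hypothetical.

```python
from typing import List, Tuple


def liu_layland_bound(n: int) -> float:
    """Utilization bound n(2^(1/n) - 1) for n tasks under Rate Monotonic."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks: List[Tuple[float, float]]) -> bool:
    """Sufficient RM test for tasks given as (execution_time, period).

    True means the task set is schedulable; False is inconclusive,
    because the bound is not a necessary condition.
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= liu_layland_bound(len(tasks))


def extended_time(exec_time: float, n_intervals: int,
                  ckpt_cost: float, n_faults: int) -> float:
    """Execution time extended by checkpoints and rollbacks.

    Assumed model: one checkpoint per interval, and each fault forces
    one full interval to be re-executed (worst case).
    """
    interval = exec_time / n_intervals
    return exec_time + n_intervals * ckpt_cost + n_faults * interval


# Two tasks as (fault-free execution time, period); tolerate one fault each.
base = [(1.0, 8.0), (2.0, 16.0)]
extended = [(extended_time(c, 4, 0.05, 1), t) for c, t in base]
ok = rm_schedulable(extended)
```

Varying `n_intervals` per task shows the trade-off the abstract describes: more checkpoints add overhead but shorten the work repeated after a fault.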
New execution model for CAPE using multiple threads on multicore clusters
Do, Xuan Huyen,Ha, Viet Hai,Tran, Van Long,Renault, Eric Electronics and Telecommunications Research Instit 2021 ETRI Journal Vol.43 No.5
Based on its simplicity and user-friendly characteristics, OpenMP has become the standard model for programming on shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that utilizes the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by utilizing the DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after executing these distributed jobs. Although this model has been proven to be effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that utilizes two levels of parallelism. In the proposed model, we add another level of parallelism in the form of multithreaded processes on slave machines with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model provides significantly enhanced CAPE performance.
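Analysis of a Checkpointing Model with Instantaneous Error Detection
이유태 The Korea Institute of Information and Communication Engineering 2022 Journal of the Korea Institute of Information and Communication Engineering Vol.26 No.1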
Reactive failure management techniques are required to mitigate the impact of errors in high-performance computing. Checkpointing is the standard recovery technique for coping with errors. An application employing checkpoints periodically saves its state, so that when an error occurs while some task is executing, the application is rolled back to its last checkpointed task and resumes execution from that task onward. In this paper, assuming that the times to error are independent and identically distributed according to a general distribution, we analyze the checkpointing model with instantaneous error detection. The conventional assumption that at most one error occurs between two consecutive checkpoints is removed. Given the checkpointing time, downtime, and recovery time, we derive the reliability of the checkpointing model. When the time to error follows an exponential distribution, we obtain the optimal checkpointing interval that achieves the maximum reliability.
An Adaptive Checkpointing Scheme for Fault Tolerance of Real-Time Control Systems with Concurrent Fault Detection
Sang-Moon Ryu Institute of Control, Robotics and Systems 2011 Journal of Institute of Control, Robotics and Systems Vol.17 No.1
The checkpointing scheme is a well-known technique for coping with transient faults in digital systems. This paper proposes an adaptive checkpointing scheme for improving the reliability of real-time control systems with concurrent fault detection capability. With concurrent fault detection, the effects of transient faults are assumed to be detected with no latency. The proposed adaptive checkpointing scheme is based on the reliability analysis of an equidistant checkpointing scheme. Numerical data show that the proposed adaptive scheme outperforms the equidistant scheme from a reliability point of view.
An Adaptive Checkpointing Scheme for Fault Tolerance of Real-Time Control Systems
Sang-Moon Ryu Institute of Control, Robotics and Systems 2009 Journal of Institute of Control, Robotics and Systems Vol.15 No.6
The checkpointing scheme is a well-known technique for coping with transient faults in digital systems. This paper proposes an adaptive checkpointing scheme for improving the reliability of real-time control systems. The proposed scheme builds on previous work on the reliability of an equidistant checkpointing scheme. To derive the adaptive scheme, we introduce conditions that must be satisfied for an equidistant checkpointing scheme to improve reliability. Numerical data show that the proposed adaptive scheme outperforms the equidistant scheme from a reliability point of view.
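Performance Analysis of a Dual Modular Redundant Structure and Checkpointing Scheme for Overcoming Faults in Real-Time Control Systems
Sang-Moon Ryu Institute of Control, Robotics and Systems 2008 Journal of Institute of Control, Robotics and Systems Vol.14 No.4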
This paper presents a performance analysis of real-time control systems that employ DMR (dual modular redundancy) to detect transient errors and a checkpointing technique to tolerate them. Transient errors are caused by transient faults and are the most significant type of error in reliable computer systems. Transient faults are assumed to occur according to a Poisson process and to be detected by a dual modular redundant structure. In addition, an equidistant checkpointing strategy is considered. We derive the probability of successful task completion in a real-time control system where periodic checkpointing operations are performed during the execution of a real-time control task. Numerical examples show how the checkpointing scheme influences the probability of task completion. In addition, the result of the analysis is compared with the simulation result.
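The paper's own derivation is not reproduced in the abstract. The sketch below is a simplified stand-in, not the author's model, and rests on three assumptions: a task of length `T` is split into `n` equal intervals, each followed by a checkpoint of cost `c`; faults arrive as a Poisson process with rate `lam`; and a fault anywhere in an interval is caught by the DMR comparison at the end of that interval, forcing the whole interval to be re-run. Under these assumptions every attempt takes the same time and succeeds independently, so meeting a deadline `D` reduces to a binomial tail. The function name and parameters are hypothetical.

```python
import math


def completion_probability(T: float, n: int, c: float,
                           lam: float, D: float) -> float:
    """Probability of finishing n checkpointed intervals by deadline D.

    Assumed model: each attempt at an interval takes T/n + c and is
    fault-free with probability exp(-lam * (T/n + c)).
    """
    slot = T / n + c                 # duration of one attempt
    p = math.exp(-lam * slot)        # attempt succeeds (no fault)
    attempts = int(D // slot)        # attempts that fit before D
    if attempts < n:
        return 0.0
    # Need at least n successful attempts among those available.
    return sum(
        math.comb(attempts, k) * p ** k * (1 - p) ** (attempts - k)
        for k in range(n, attempts + 1)
    )


# Sweep the number of checkpoints for one hypothetical task.
results = {n: completion_probability(T=10.0, n=n, c=0.5, lam=0.05, D=16.0)
           for n in range(1, 9)}
best_n = max(results, key=results.get)
```

Sweeping `n` this way illustrates the kind of numerical example the abstract mentions: too few checkpoints make each rollback expensive, while too many consume the slack before the deadline.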