Multilingual Abstract

This paper introduces a cross-entropy planning method that updates the prior probabilities used in its optimization process. Cross-entropy planning is a popular online planning method: it draws samples from a simulation environment and selects the optimal action based on the values of those samples. Its performance is limited because each optimization run starts afresh, without using previous planning results. We propose a method that updates the prior probabilities for the optimization process based on the action sequences obtained from cross-entropy planning. The proposed method improves on the performance of cross-entropy planning as planning epochs progress. We evaluate it by comparison with cross-entropy planning in a physics-based simulation environment (OpenAI Gym).

Korean Abstract

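This paper describes a cross-entropy planning method that updates its prior probabilities. Cross-entropy planning is widely used in online planning: it draws samples from a simulated environment and selects the optimal action based on the values estimated from those samples. Conventional cross-entropy planning does not reuse previously obtained search results during optimization and instead searches from scratch every time. Consequently, when the search must finish within a fixed time budget, the achievable performance is limited. We propose a method that uses the output of cross-entropy planning over the action dimensions to update the prior probabilities used in the optimization process, so that performance improves progressively. In the experiments, we evaluate the proposed method by comparison with cross-entropy planning in a physics-based simulated environment (OpenAI Gym).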
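The idea in the abstract can be illustrated with a minimal sketch, using only the Python standard library. Everything below is an assumption made for illustration and is not taken from the paper: the toy one-dimensional simulator `simulate` (standing in for an OpenAI Gym environment), the Gaussian sampling distribution, the sample and elite counts, and especially `update_prior`, which blends the prior mean toward the last planned action sequence. The paper's actual update rule is not given in the abstract.

```python
import math
import random

HORIZON = 5  # planning horizon (hypothetical toy value)


def simulate(state, actions):
    """Toy stand-in for a simulator: return of an open-loop action sequence."""
    pos, total = state, 0.0
    for a in actions:
        pos += max(-1.0, min(1.0, a))  # actions are clipped to [-1, 1]
        total -= abs(pos)              # reward: negative distance to the goal at 0
    return total


def cem_plan(state, prior_mean, prior_std, rng,
             n_samples=32, n_elite=4, n_iters=2):
    """One cross-entropy planning call starting from the given Gaussian prior."""
    mean, std = list(prior_mean), list(prior_std)
    for _ in range(n_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the highest-return sequences as the elite set.
        elite = sorted(samples, key=lambda seq: simulate(state, seq),
                       reverse=True)[:n_elite]
        # Refit the sampling distribution to the elite set.
        mean = [sum(seq[t] for seq in elite) / n_elite
                for t in range(len(mean))]
        std = [max(1e-3, math.sqrt(
                   sum((seq[t] - mean[t]) ** 2 for seq in elite) / n_elite))
               for t in range(len(mean))]
    return mean


def update_prior(prior_mean, planned, rate=0.5):
    """Assumed update rule: move the prior mean toward the planned sequence."""
    return [(1.0 - rate) * p + rate * q for p, q in zip(prior_mean, planned)]


def run_demo(seed=0, epochs=10, start=5.0):
    """Return (baseline return, updated-prior return) after the last epoch."""
    rng = random.Random(seed)
    init_mean, init_std = [0.0] * HORIZON, [1.0] * HORIZON
    prior_mean = list(init_mean)
    for _ in range(epochs):
        # Baseline: every call restarts from the same uninformed prior.
        cold_plan = cem_plan(start, init_mean, init_std, rng)
        # Variant: each call starts from the prior updated by earlier plans.
        warm_plan = cem_plan(start, prior_mean, init_std, rng)
        prior_mean = update_prior(prior_mean, warm_plan)
    return simulate(start, cold_plan), simulate(start, warm_plan)
```

Calling `cold, warm = run_demo()` gives the returns of the final plans for the baseline and for the updated-prior variant. In this toy setup the best achievable return is -10 (five steps of -1 from position 5), so both values can be read against that bound. The planning budget is kept deliberately small (`n_iters=2`) to mimic the fixed time budget described in the abstract.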
