Inspired by the human learning system, reinforcement learning (RL) has achieved superhuman performance, and a growing number of model-based (MB) RL approaches pursue its potential benefits of higher sample efficiency and faster adaptation. However, recent benchmark studies show that MB RL is not always superior to model-free (MF) RL: it can struggle on tasks that are relatively easy for humans, and the formation of a world model can be hindered by uncertainty in the task's options. To generalize across task conditions, an RL agent should use MB and MF learning strategies in parallel. Recent findings in computational neuroscience provide mounting evidence that a key principle underlying RL in the human brain is meta-control, such as arbitration between MB and MF control based on prediction error (PE). To this end, we propose a novel neuroscience-inspired RL algorithm, Meta-Dyna, which flexibly adapts to frequent environmental changes, including changes in both goals and latent state-transition uncertainty, based on the concept of prefrontal meta-control. We evaluate this approach in three environments: i) the Two-stage MDT, widely used to investigate the characteristics of human RL; ii) GridWorldLoCA, a benchmark environment for MB RL; and iii) a modified Atari-Pong, newly designed on top of OpenAI Gym Atari-Pong, to which we applied the goal conditions and state-transition probabilities of the Two-stage MDT. Experimental results show that our proposal outperforms baseline RL models with respect to average reward, choice optimality, and energy efficiency (p<0.001, independent-sample t-test). By applying meta-control inspired by the prefrontal cortex, Meta-Dyna achieves a favorable performance-speed-efficiency balance, as evidenced by the highest average rewards (Two-stage MDT: 0.61 tabular, 0.71 neural network; Atari-Pong: -0.091), rapid convergence to the optimum (GridWorldLoCA), and lower learning costs (fewer timesteps in Atari-Pong). A deeper insight into these results would allow us not only to advance the computational theory of RL but also to build human-like RL agents.
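To make the arbitration idea concrete, the following is a minimal, illustrative sketch of PE-based arbitration between MF and MB controllers in a tabular setting; it is not the paper's implementation, and all names (ArbitrationAgent, the reliability-update rule, and the learning-rate constants) are hypothetical choices made for this example, assuming rewards roughly in [0, 1].

```python
import numpy as np


class ArbitrationAgent:
    """Illustrative PE-based arbitration between MF (TD) and MB (learned model) values."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, beta=5.0):
        self.q_mf = np.zeros((n_states, n_actions))        # model-free Q-values
        self.q_mb = np.zeros((n_states, n_actions))        # model-based Q-values
        self.trans_counts = np.ones((n_states, n_actions, n_states))  # transition counts
        self.reward_model = np.zeros((n_states, n_actions))
        self.rel_mf = 0.5   # reliability of the MF system, driven by reward PE
        self.rel_mb = 0.5   # reliability of the MB system, driven by state PE
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def act(self, s):
        # Arbitration weight: the more reliable system gets more control over the policy.
        w_mb = self.rel_mb / (self.rel_mb + self.rel_mf + 1e-8)
        q = w_mb * self.q_mb[s] + (1.0 - w_mb) * self.q_mf[s]
        p = np.exp(self.beta * (q - q.max()))
        return np.random.choice(len(q), p=p / p.sum())

    def update(self, s, a, r, s_next):
        # Model-free TD update and reward prediction error (RPE).
        rpe = r + self.gamma * self.q_mf[s_next].max() - self.q_mf[s, a]
        self.q_mf[s, a] += self.alpha * rpe
        # Model learning and state prediction error (SPE = 1 - predicted prob. of observed s').
        spe = 1.0 - self.trans_counts[s, a, s_next] / self.trans_counts[s, a].sum()
        self.trans_counts[s, a, s_next] += 1.0
        self.reward_model[s, a] += self.alpha * (r - self.reward_model[s, a])
        # Each system's reliability drifts toward 1 when its prediction error is small.
        self.rel_mf += self.alpha * ((1.0 - min(abs(rpe), 1.0)) - self.rel_mf)
        self.rel_mb += self.alpha * ((1.0 - spe) - self.rel_mb)
        # One-step model-based backup; a Dyna-style variant would repeat this on sampled (s, a).
        p_next = self.trans_counts[s, a] / self.trans_counts[s, a].sum()
        self.q_mb[s, a] = self.reward_model[s, a] + self.gamma * p_next @ self.q_mb.max(axis=1)
```

In this sketch the arbitration weight shifts toward the MB values when state-transition predictions are accurate (low SPE) and toward the MF values when reward predictions are accurate (low RPE), which captures, in simplified form, the PE-driven meta-control principle described above.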