
Multilingual Abstract

Successor representation (SR) is a model of human reinforcement learning (RL) that mimics the mechanism by which hippocampal cells construct cognitive maps. SR uses the features learned in these maps to respond adaptively to frequent reward changes. In this paper, we evaluated the performance of SR in settings where changes in the latent variables of an environment trigger changes in the reward structure. As a benchmark, we adopted SR-Dyna, an integration of SR into the goal-driven Dyna RL algorithm, in the 2-stage Markov Decision Task (MDT), in which we can intentionally manipulate the latent variables, namely state-transition uncertainty and goal condition. To investigate the characteristics of SR precisely, we conducted the experiments while controlling each latent variable that affects changes in the reward structure. The evaluation showed that SR-Dyna could learn to respond to reward changes arising from changes in the latent variables, but could not learn rapidly in that situation. This points to the need for more robust RL models that can rapidly learn to respond to frequent environmental changes in which the latent variables and the reward structure change at the same time.


Abstract (translated from Korean)

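Reinforcement learning (RL) algorithms based on the successor representation (SR) are RL models developed from neuroscientific mechanisms observed in the brain. By exploiting information about environment structure derived from the cognitive maps formed in the hippocampus, SR is an RL method modeled on natural intelligence that can learn and make decisions quickly and flexibly even in changing environments, and it is well known for robust performance in rapidly learning and adapting to uncertain changes in reward structure.
In this paper, we examine whether SR-based RL algorithms can respond and learn robustly not only in environments where the surface reward structure changes, but also in situations where latent variables within the environment structure, such as state-transition probabilities, cause the reward structure to change. To assess performance, we simulated an SR-Dyna RL agent, which integrates SR into a goal-driven RL algorithm, in a two-stage Markov decision environment in which state-transition uncertainty and the resulting reward-structure changes occur simultaneously. In addition, to better observe the characteristics of SR, we ran further experiments that controlled the latent variables driving environmental change one at a time and compared the results with the original environment. The results show that SR-Dyna learned the reward changes caused by changes in state-transition probabilities only to a limited extent. Compared with the results in the original environment, however, SR-Dyna did not rapidly learn the reward-structure changes caused by latent-variable changes. We expect these results to inform the design of SR-based RL agents that operate robustly even when the environment structure changes rapidly.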
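
As a rough illustration of why SR is usually expected to adapt quickly to reward changes, the sketch below learns a successor matrix M by temporal-difference updates and computes state values as V = M r. It is a minimal tabular example, not the SR-Dyna agent or the 2-stage MDT from the paper; the function name `learn_sr`, the three-state chain, and the values of `gamma`, `alpha`, and `sweeps` are all assumptions chosen for illustration.

```python
import numpy as np

def learn_sr(transitions, n_states, gamma=0.9, alpha=0.1, sweeps=200):
    """TD-learn a successor matrix M from (state, next_state) samples.

    Hypothetical helper for illustration only. M[s, j] estimates the
    discounted future occupancy of state j when starting from state s.
    """
    M = np.eye(n_states)  # each state at least "occupies" itself
    for _ in range(sweeps):
        for s, s_next in transitions:
            onehot = np.eye(n_states)[s]
            # TD target: occupy s now, then discounted occupancy from s_next
            M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M

# Assumed toy environment: deterministic chain 0 -> 1 -> 2, state 2 terminal.
M = learn_sr([(0, 1), (1, 2)], n_states=3)

# Values under one reward vector, then under a changed reward vector.
# M is reused unchanged; only the reward vector r differs.
v_before = M @ np.array([0.0, 0.0, 1.0])
v_after = M @ np.array([0.0, 1.0, 0.0])
print(v_before, v_after)
```

In this sketch a reward change only replaces the vector `r`, so values update without new experience. A change in the transition structure, by contrast, invalidates `M` itself, which then has to be re-learned from new samples.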

