RISS Search - Thesis Details

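Korean Abstract (Abstract)

SBOORL: Reinforcement Learning-Based
StarCraft Build Order Optimization

Summary

With advances in artificial intelligence, StarCraft, a real-time strategy game whose environment is complex and resembles the real world, has become a major research subject. Like rock-paper-scissors, StarCraft has no simple best strategy, and the link between cause and effect is not immediate. The game demands a proper balance of micromanagement and macromanagement. In macromanagement in particular, winning requires an optimized build order. As a rule of thumb, build order accounts for roughly one-third or more of the factors that decide victory or defeat in StarCraft.
This paper presents SBOORL, a build order model that applies reinforcement learning, as a way to improve on existing research into StarCraft build orders. We define the MDP, an essential element of reinforcement learning, and propose two forms of reward. To overcome the difficulty of applying pure reinforcement learning, we propose a method that combines supervised learning with reinforcement learning. To combine supervised learning, which uses a data set built from replay data[1][2], with reinforcement learning, which uses data extracted in real time, both kinds of data are defined in the same format. To build the reinforcement learning environment and let the StarCraft AI bot play a full game, we modified the Production Manager in the macromanagement of UAlbertaBot[3] and integrated it with the SBOORL model.
To evaluate and apply the proposed model, we verified the supervised learning model through basic metric measurements on the data set as well as match experiments against the built-in AI bot. In addition, in the results reported in this paper, SBOORL recorded the highest win rate in matches against mid-level AI bots registered on SSCAIT[4].
However, a shortcoming of the SBOORL model is that, judging from the match results, it learned toward a particular build order rather than issuing a variety of build orders. To address this, research on AI bots that issue more dynamic and creative build orders by introducing a self-play learning process is left as future work.
This study does not solve an abstract, small-scale problem space. The build order problem addressed here can therefore be regarded as a problem with a complex, real-world environment. Accordingly, we believe the proposed model is helpful for, and worth applying to, the many real-world decision-making problems in which order matters.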

Multilingual Abstract

SUMMARY

SBOORL :
Starcraft Build Order Optimization
Based on Reinforcement Learning

Kim, Kyeong Seok
Dep. of Computer Engineering
Graduate School of
Korea Aerospace University
(Advisor : Prof. Song, Dong Ho, Ph. D.)

With the advancement of artificial intelligence, StarCraft, a real-time strategy game whose environment is complex and similar to the real world, is becoming a major research topic. Like rock-paper-scissors, StarCraft has no simple best strategy, and its cause and effect are not immediate. StarCraft requires a proper balance of micromanagement and macromanagement; in macromanagement in particular, an optimized build order is required to win the game. As a rule of thumb, build order accounts for roughly one-third or more of the factors that decide a win or loss in StarCraft.
In this paper, we present SBOORL, a build order model that applies reinforcement learning, as a solution that can improve on existing research on StarCraft build orders. We define the Markov Decision Process (MDP), an essential element of reinforcement learning, and propose two types of rewards. Because applying reinforcement learning alone is difficult, we propose a method that combines supervised learning with reinforcement learning. To combine supervised learning, which uses a data set built from replay data[1][2], with reinforcement learning, which uses data extracted in real time, both kinds of data are defined in the same format. To build the reinforcement learning environment and let the StarCraft AI bot operate as a full game, the Production Manager in the macromanagement of UAlbertaBot[3] was modified and integrated with the SBOORL model.
To evaluate and apply the proposed model, the supervised learning model was verified through basic metric measurements on the data set as well as match experiments against the built-in AI bot. SBOORL was also evaluated in repeated matches against mid-ranked AI bots registered on SSCAIT[4] and recorded the highest win rate, as shown in the test results in this paper.
However, a shortcoming of the SBOORL model is that, judging from the match results, it learned toward a specific build order rather than producing a variety of build orders. To address this problem, research on AI bots that produce more dynamic and creative build orders by introducing a self-play learning process is left as future work.
This study does not address an abstract, small-scale problem space. The build order problem dealt with here can therefore be regarded as a problem with a complex real-world environment. Accordingly, we consider the proposed model helpful for, and worth applying to, the many real-world decision-making problems that involve ordering.
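
The two-phase idea in the abstract, supervised pre-training on replay data followed by policy-gradient reinforcement learning over samples in one shared format, can be sketched on a toy problem. The Python sketch below is illustrative only and is not the thesis implementation: the three-action set, the four-step horizon, the tabular softmax policy, the `toy_reward` rule, the learning rates, and all function names are assumptions made for this example. REINFORCE is used because the table of contents lists it (Section 3.1.3); the two reward forms proposed in the thesis are not reproduced here.

```python
import math
import random

# Hypothetical toy setup: three build actions and a four-step episode.
ACTIONS = ["worker", "barracks", "marine"]
HORIZON = 4


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    """Gradient of log pi(action) with respect to the softmax logits."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def toy_reward(actions):
    """Assumed sparse terminal reward: 1.0 if some marine follows a barracks."""
    seen_barracks = False
    for a in actions:
        if ACTIONS[a] == "barracks":
            seen_barracks = True
        elif ACTIONS[a] == "marine" and seen_barracks:
            return 1.0
    return 0.0


def pretrain(theta, samples, lr=0.5, epochs=50):
    """Supervised phase: raise the log-likelihood of (state, action) pairs."""
    for _ in range(epochs):
        for state, action in samples:
            grad = grad_log_prob(softmax(theta[state]), action)
            theta[state] = [w + lr * g for w, g in zip(theta[state], grad)]


def run_episode(theta, rng):
    """Sample one episode, returned in the same (state, action) format."""
    samples = []
    for state in range(HORIZON):
        probs = softmax(theta[state])
        action = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        samples.append((state, action))
    return samples, toy_reward([a for _, a in samples])


def reinforce(theta, rng, episodes=500, lr=0.1, gamma=1.0):
    """REINFORCE phase: Monte Carlo policy-gradient updates."""
    for _ in range(episodes):
        samples, reward = run_episode(theta, rng)
        for t, (state, action) in enumerate(samples):
            # Only the last step is rewarded, so the return from step t is
            # the terminal reward discounted over the remaining steps.
            ret = gamma ** (HORIZON - 1 - t) * reward
            grad = grad_log_prob(softmax(theta[state]), action)
            theta[state] = [w + lr * ret * g for w, g in zip(theta[state], grad)]


def greedy_build(theta):
    """Index of the highest-scoring action at each step."""
    return [row.index(max(row)) for row in theta]


def train(seed=0):
    """Pre-train on one hypothetical replay, then fine-tune with REINFORCE."""
    replay = [(0, 0), (1, 1), (2, 2), (3, 2)]  # worker, barracks, marine, marine
    theta = [[0.0] * len(ACTIONS) for _ in range(HORIZON)]
    pretrain(theta, replay)
    reinforce(theta, random.Random(seed))
    return theta


if __name__ == "__main__":
    print([ACTIONS[a] for a in greedy_build(train())])
```

Both phases consume the same `(state, action)` tuples, which mirrors the abstract's point about defining replay data and real-time data in one format. In a full bot the lookup table would presumably be replaced by a learned model and `toy_reward` by a signal derived from the game itself.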

Table of Contents

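Chapter 1 Introduction
Chapter 2 Related Work
2.1 StarCraft AI Bots
2.2 Previous Build Order Research
2.2.1 Goal-Based Build Order Planning
2.2.2 COEP (Continual Online Evolutionary Planning)
2.2.3 Build Order Planning Using Deep Learning from Replay Data
2.2.4 Build Order Planning Using Relation Networks
2.3 Reinforcement Learning
2.3.1 What Is Reinforcement Learning?
2.3.2 MDP (Markov Decision Process)
2.3.3 Prediction and Control in Reinforcement Learning
2.3.4 Bellman Equation
2.3.5 Model-Based Reinforcement Learning
2.3.6 Model-Free Reinforcement Learning
2.3.7 Deep Reinforcement Learning
2.3.7.1 Deep Value-Based Reinforcement Learning
2.3.7.2 Deep Policy-Based Reinforcement Learning
Chapter 3 Modeling
3.1 Preparing the Ingredients for Reinforcement Learning
3.1.1 A StarCraft Environment for Applying Reinforcement Learning
3.1.2 MDP of the StarCraft Build Order Model
3.1.3 REINFORCE (Monte Carlo Policy Gradient)
3.2 Pre-training the Build Order Model with Supervised Learning
3.2.1 Replay File Data Preprocessing
3.2.2 Supervised Learning Model from the Refined Data
3.3 Build Order Model Using Reinforcement Learning
3.3.1 Build Order Model with Reinforcement Learning Only
3.3.2 SBOORL
3.4 Integrating the Build Order Model with Macromanagement
Chapter 4 Implementation and Experimental Results
4.1 Implementing the Build Order Model with Reinforcement Learning Only
4.2 Implementing SBOORL
4.2.1 Implementing Replay File Data Preprocessing
4.2.2 Implementing the Supervised Learning Model from the Refined Data Set
4.2.3 SBOORL Implementation Process
4.3 Implementing the Integration of the Build Order Model with Macromanagement
4.4 Experimental Results
4.4.1 Experimental Environment
4.4.2 Experiments on the Build Order Model with Reinforcement Learning Only
4.4.3 SBOORL Experiments
4.4.3.1 SBOORL Supervised Learning Experiments
4.4.3.2 SBOORL Reinforcement Learning Experiments
4.4.4 SBOORL Results
4.4.5 Results of Matches Against AI Bots
Chapter 5 Conclusion and Future Work
References
SUMMARY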
