1 Sunehag P, "Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward" 2085-2087, 2018
2 Sharma P K, "Survey of recent multi-agent reinforcement learning algorithms utilizing centralized training" III : 2021
3 Foerster J, "Stabilising experience replay for deep multi-agent reinforcement learning" 1146-1155, 2017
4 Claudine Badue, "Self-driving cars : A survey" 165 : 113816-, 2021
5 X. Li, "Reinforcement learning based overtaking decision making for highway autonomous driving" IEEE 336-342, 2015
6 Rashid, T., "QMIX : Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning" 4295-4304, 2018
7 D. Koller, "Probabilistic Graphical Models: Principles and Techniques" MIT Press 2009
8 Lin M, "Policy Gradient Adaptive Critic Designs for Model-Free Optimal Tracking Control With Experience Replay" 1-12, 2021
9 Mnih, V., "Playing Atari with deep reinforcement learning"
10 Q. Wei, "Optimal elevator group control via deep asynchronous actor-critic learning" 31 (31): 5245-5256, 2020
11 R. Lowe, "Multiagent actor-critic for mixed cooperative-competitive environments" 6382-6393, 2017
12 Parunak, "Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence" MIT Press 377-421, 2000
13 Stone P, "Multiagent Systems : A Survey from a Machine Learning Perspective" 8 : 345-383, 2000
14 Kuyer, L, "Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs" Springer 2008
15 M. Tan, "Multi-agent reinforcement learning: Independent vs. cooperative agents" 330-337, 1993
16 Y. Yang, "Mean field multiagent reinforcement learning" 5571-5580, 2018
17 Littman, M. L, "Markov games as a framework for multi-agent reinforcement learning" Morgan Kaufmann Publishers 157-163, 1994
18 M. L. Puterman, "Markov decision processes: discrete stochastic dynamic programming" John Wiley & Sons 2014
19 Jiang, "Learning attentional communication for multi-agent cooperation" 7265-7275, 2018
20 Z. Zhang, "Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization" 2083-2085, 2020
21 Zhang Y, "Human-like Autonomous Vehicle Speed Control by Deep Reinforcement Learning with Double Q-Learning" IV : 1251-1256, 2018
22 Mnih, V., "Human-level control through deep reinforcement learning" 518 (518): 529-533, 2015
23 D. Huang, "Ensemble clustering using factor graph" 50 : 131-142, 2016
24 Zawadzki, E., "Empirically evaluating multiagent learning algorithms" 2014
25 P. Kravets, "Dynamic coordination of strategies for multi-agent systems" Springer 653-670, 2020
26 C. Yu, "Distributed multiagent coordinated learning for autonomous driving in highways based on dynamic coordination graphs" 21 (21): 735-748, 2020
27 K. Shah, "Distributed independent reinforcement learning (DIRL) approach to resource management in wireless sensor networks" 2007
28 T. Hester, "Deep Q-learning from demonstrations" 32 : 2018
29 W. Böhmer, "Deep coordination graphs" 980-991, 2020
30 Farinelli A, "Decentralised coordination of low-power embedded devices using the max-sum algorithm" International Foundation for Autonomous Agents and Multiagent Systems 2008
31 Foerster J, "Counterfactual Multi-Agent Policy Gradients"
32 C. Guestrin, "Coordinated reinforcement learning" 2 : 227-234, 2002
33 Gupta, J. K., "Cooperative Multi-agent Control Using Deep Reinforcement Learning" Springer 2017
34 Kok J R, "Collaborative multiagent reinforcement learning by payoff propagation" 7 : 1789-1828, 2006
35 R. Dechter, "Bucket elimination : A unifying framework for reasoning" 113 (113): 41-85, 1999
36 N. A. Khalid, "An adaptive agent-based partner selection for routing packet in distributed wireless sensor network" 2016
37 W. Du, "A survey on multi-agent deep reinforcement learning : from the perspective of challenges and applications" 54 : 3215-3238, 2021
38 Grigorescu S, "A survey of deep learning techniques for autonomous driving" 37 (37): 362-386, 2020
39 H. Liu, "A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting" 202 : 117794-, 2020
40 D. Ye, "A multi-agent framework for packet routing in wireless sensor networks" 15 (15): 10026-10047, 2015
41 Smirnov N, "A game theory-based approach for modeling autonomous vehicle behavior in congested, urban lane-changing scenarios" 21 (21): 1523-, 2021
42 Meiyu Liu, "A cellular automata traffic flow model combined with a BP neural network based microscopic lane changing decision model" 23 (23): 309-318, 2019
43 Zeyu Zhu, "A Survey of Deep RL and IL for Autonomous Driving Policy Learning"
44 Zhu, Z., "A Survey of Deep RL and IL for Autonomous Driving Policy Learning" 2021