1 Schulman, J, "Trust Region Policy Optimization"
2 Palmas, A, "The Reinforcement Learning Workshop" Packt Publishing Ltd 483-549, 2020
3 Williams, R. J, "Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning" 8 : 229-256, 1992
4 Sutton, R. S, "Reinforcement Learning: An Introduction" MIT Press 1998
5 Altuntas, N, "Reinforcement Learning-based Mobile Robot Navigation" 24 (24): 1747-1767, 2016
6 Watkins, C. J. C. H, "Q-Learning" 8 : 279-292, 1992
7 Schulman, J, "Proximal Policy Optimization Algorithms"
8 Mnih, V, "Playing Atari with Deep Reinforcement Learning"
9 Ye, D, "Modeling, Simulation and Fabrication of a Balancing Robot, 2.151: Advanced System Dynamics & Control" Massachusetts Institute of Technology 2012
10 Siegwart, R, "Introduction to Autonomous Mobile Robots" MIT Press 62-103, 2004
11 Géron, A, "Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow" O’Reilly Media 2019