TCP-PPCC: Online-Learning Proximal Policy for Congestion Control
Shiwei Wang, Jing Li, Yuyao Guan, Pengpeng Ding, Korea Institute of Communications and Information Sciences 2020 APNOMS Vol.2020 No.09
Effective network congestion control strategies are key to securing the normal operation of complex and changeable networks. The fundamental assumptions of many existing TCP congestion control variants, dominated by hand-crafted heuristic algorithms, are no longer valid. We propose an algorithm called TCP-Proximal Policy Congestion Control (TCP-PPCC), which is based on the deep reinforcement learning algorithm Proximal Policy Optimization (PPO). TCP-PPCC updates the policy offline from the features of the preceding network state and feedback from the current network environment, and adjusts the congestion window online with the updated policy. Senders with TCP-PPCC can learn about changes in network bandwidth more accurately and adjust the congestion window in time. We demonstrate the performance of TCP-PPCC by comparing it with the traditional congestion control algorithm NewReno in four network scenarios with the ns-3 simulator. The results show that in scenario 2, TCP-PPCC achieves a 58.75% improvement in average delay and a 27.80% improvement in throughput compared with NewReno.
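TCP-PPCC's update step inherits PPO's clipped surrogate objective. The following is a minimal sketch of that per-sample loss; the paper's network-state features, reward design, and network architecture are not reproduced here, and the numbers are purely illustrative.

```python
# Minimal sketch of PPO's clipped surrogate objective, the update rule
# underlying TCP-PPCC's offline policy learning (illustrative only).

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss for a single (state, action) sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage -- estimated advantage of the action
    eps       -- clipping range (0.2 in the original PPO paper)
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # PPO maximizes the minimum of the two terms (loss is its negative),
    # which prevents destructively large policy updates.
    return -min(unclipped, clipped)

# A large ratio with positive advantage is clipped at 1 + eps:
print(ppo_clip_loss(1.5, 1.0))   # -1.2
print(ppo_clip_loss(0.5, -1.0))  # 0.8
```

The clipping is what makes repeated minibatch updates on the same rollout safe, which matters when the policy is trained offline between congestion-window adjustments as described above.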
A Study on Data-Driven Model-Free Control Techniques: Longitudinal Model Control of a Skid-To-Turn Missile Using Proximal Policy Optimization
Jeong Woo Kim, Donggil Lee, Hyuntae Kim, Jeong Mo Seong, Hamin Chang, Hyeong-gwan Kang, Youngjun Lee, Hyungbo Shim, Youdan Kim, Jongho Park, The Korean Society for Aeronautical and Space Sciences 2021 KSAS Conference Proceedings Vol.2021 No.11
박관우, 김정수, Institute of Control, Robotics and Systems (ICROS) 2022 Journal of Institute of Control, Robotics and Systems Vol.28 No.12
In this paper, we develop an artificial intelligence Tetris robot that plays the Tetris game autonomously. The Tetris robot consists of a game agent that learns how to play the Tetris game using reinforcement learning, and hardware that plays the actual game. To develop the game agent using deep reinforcement learning, the Markov decision process was defined and a policy-based deep reinforcement learning method was applied. In this paper, the Tetris game agent was trained by applying the PPO (Proximal Policy Optimization) algorithm. In particular, a multi-agent learning method was employed for the PPO learning. For learning, the PPO-based game agent took the game screen as input and applied actions to the game through software, playing the Tetris game 500,000 times. For the robot to play the actual game, the neural network corresponding to the learned game agent was stored on a Jetson Xavier, and motors and a camera were used. In other words, the standalone Tetris robot, separate from the computer where the Tetris game is running, consists of a Jetson Xavier, one camera, one Arduino MEGA, three servo motors, and three fingers. To evaluate the performance of the robot, the value function of the game agent was presented, and the performance of the actual robot was verified through demonstration.
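A policy-based agent like the Tetris player above ultimately selects one of a discrete set of actions by sampling from the policy network's output distribution. A toy sketch of that sampling step follows; the action names and logits are hypothetical placeholders, not the trained network's values.

```python
import math
import random

def softmax(logits):
    """Convert raw policy logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_action(logits, rng):
    """Sample an action index from the categorical distribution."""
    probs = softmax(logits)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

rng = random.Random(0)
actions = ["left", "right", "rotate", "drop"]  # hypothetical action set
logits = [0.1, 0.5, 2.0, -1.0]                 # hypothetical network output
print(actions[sample_action(logits, rng)])     # rotate
```

Sampling (rather than always taking the argmax) is what gives a policy-gradient method like PPO the exploration it needs during the 500,000 training games.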
Wafer Burn-in Test Process Scheduling Using Reinforcement Learning and Simulation
Soon-Woo Kwon, Won-Jun Oh, Seong-Hyeok Ahn, Hyun-Seo Lee, Hoyeoul Lee, In-Beom Park, Korean Society of Semiconductor & Display Technology 2024 Journal of the Semiconductor & Display Technology Vol.23 No.2
Scheduling of semiconductor test facilities has been crucial, since effective scheduling contributes to the profits of semiconductor enterprises and enhances the quality of semiconductor products. This study aims to solve the scheduling problems of the wafer burn-in test facilities in the semiconductor back-end process by utilizing simulation and deep reinforcement learning-based methods. To solve the scheduling problem considered in this study, we propose novel state, action, and reward designs based on the Markov decision process. Furthermore, a neural network is trained by employing a recent RL-based method, proximal policy optimization (PPO). Experimental results showed that the proposed method outperformed traditional heuristic-based scheduling techniques, achieving a higher job due-date compliance rate in terms of total job completion time.
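The MDP formulation above hinges on the reward design. A hypothetical sketch of the kind of per-step reward such a formulation implies, rewarding due-date compliance while penalizing added completion time, is shown below; the weights and fields are illustrative assumptions, not the authors' actual design.

```python
# Hypothetical reward shaping for a burn-in test scheduling MDP:
# reward a scheduling action for finishing a job on time, and
# penalize it in proportion to the completion time it adds.

def step_reward(finish_time, due_date, added_time, w_due=1.0, w_time=0.01):
    """Reward for completing one job in the burn-in test schedule.

    finish_time -- when this job finishes under the chosen action
    due_date    -- the job's due date
    added_time  -- completion time this action adds to the schedule
    """
    on_time_bonus = w_due if finish_time <= due_date else -w_due
    return on_time_bonus - w_time * added_time

# An on-time job with little added completion time scores positively:
print(step_reward(finish_time=95, due_date=100, added_time=10))   # 0.9
# A late job is penalized even before the time term:
print(step_reward(finish_time=105, due_date=100, added_time=0))   # -1.0
```

Balancing the two weights is the usual design lever: a larger `w_due` pushes the trained policy toward due-date compliance, a larger `w_time` toward shorter schedules.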
An Efficient Load Balancing Scheme for Gaming Server Using Proximal Policy Optimization Algorithm
Hye-young Kim, Korea Information Processing Society 2021 Journal of Information Processing Systems Vol.17 No.2
Large amounts of data are being generated in gaming servers due to the increase in the number of users and the variety of game services being provided. In particular, load balancing schemes for gaming servers are a crucial consideration. The existing literature proposes algorithms that distribute loads across servers by mostly concentrating on load balancing and cooperative offloading. However, many proposed schemes impose heavy restrictions and assumptions, and such a limited service classification method is not enough to satisfy the wide range of service requirements. We propose a load balancing agent that combines the dynamic allocation programming method, a type of greedy algorithm, with proximal policy optimization, a reinforcement learning method. We also compare the performance of our proposed scheme with that of a scheme from previous literature, ProGreGA, by running a simulation.
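The greedy half of such a hybrid can be sketched as classic least-loaded assignment: route each incoming request to the gaming server with the smallest current load. Only this greedy step is shown; how the paper's PPO agent augments it is not reproduced, and the loads and costs below are made up.

```python
# Greedy dynamic allocation: send each request to the currently
# least-loaded server (illustrative sketch, not the paper's full agent).

def assign_requests(server_loads, requests):
    """Greedily assign each request's cost to the least-loaded server.

    server_loads -- mutable list of current per-server load
    requests     -- iterable of request costs
    Returns the list of chosen server indices, in arrival order.
    """
    choices = []
    for cost in requests:
        i = min(range(len(server_loads)), key=lambda k: server_loads[k])
        server_loads[i] += cost
        choices.append(i)
    return choices

loads = [5, 2, 8]
print(assign_requests(loads, [3, 3, 1]))  # [1, 0, 1]
print(loads)                              # [8, 6, 8]
```

The limitation of the pure greedy rule, that it reacts only to instantaneous load and ignores future arrivals, is precisely the gap a learned PPO policy is meant to close.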
Hu Duoxiu, Dong Wenhan, Xie Wujie, He Lei, The Korean Society for Aeronautical and Space Sciences 2022 International Journal of Aeronautical and Space Sciences Vol.23 No.2
A Markov decision process model with two stages, long-distance autonomous guidance and short-distance autonomous tracking with obstacle avoidance, was developed in this study to address the problem of multi-rotor unmanned aerial vehicles (UAVs) tracking dynamic ground targets. On this basis, an improved proximal policy optimization (PPO) algorithm is proposed. The proposed algorithm uses a long short-term memory (LSTM) network to calculate reward values, update network parameters, and perform adaptive optimization iterations from status information such as the real-time position relationship between the UAV and the target, taking into account the time-sequential data received from the UAV and the environmental context information. Finally, simulation experiments were performed on a robot-control-system-based platform. The results showed that the proposed method can safely and effectively realize autonomous maneuvering during the entire reconnaissance mission. Compared with the traditional PPO algorithm, the introduction of the LSTM network shortened model training time, considerably improved the efficiency of tracking and obstacle avoidance, and further strengthened the robustness, accuracy, and real-time ability of the algorithm.
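Whatever network produces the rewards, a PPO update still needs advantage estimates computed from them. The sketch below shows generalized advantage estimation (GAE), the standard estimator paired with PPO; the paper's LSTM-based reward computation and its network parameters are not reproduced, and the rollout values are illustrative.

```python
# Generalized advantage estimation (GAE) over one rollout, the usual
# companion to PPO updates (illustrative sketch, not the paper's code).

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one rollout, working backwards.

    rewards    -- per-step rewards r_t
    values     -- critic estimates V(s_t), same length as rewards
    last_value -- bootstrap value V(s_T) after the final step
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # exponentially weighted sum of future residuals
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages

adv = gae([1.0, 1.0], [0.5, 0.5], 0.0)
print([round(a, 3) for a in adv])
```

The lambda parameter trades bias against variance: lam=0 reduces to one-step TD residuals, lam=1 to full Monte Carlo returns minus the baseline.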
Multi-Agent Deep Reinforcement Learning for Fighting Game: A Comparative Study of PPO and A2C
Yoshua Kaleb Purwanto, Dae-Ki Kang, The Institute of Internet, Broadcasting and Communication 2024 International Journal of Internet, Broadcasting and Communication Vol.16 No.3
This paper investigates the application of multi-agent deep reinforcement learning in the fighting game Samurai Shodown using the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Initially, agents are trained separately for 200,000 timesteps using a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP) with LSTM networks. PPO demonstrates superior performance early on with stable policy updates, while A2C shows better adaptation and higher rewards over extended training periods, culminating in A2C outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short-term training and A2C's advantages in long-term learning scenarios, emphasizing the importance of algorithm selection based on training duration and task complexity. The code can be found at https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO.
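The core algorithmic difference behind this comparison is the per-sample actor loss: A2C uses the plain policy-gradient term with no clipping, while PPO clips the probability ratio. A toy side-by-side follows, with illustrative numbers rather than values from the Samurai Shodown agents.

```python
import math

# Per-sample actor losses: A2C's unclipped policy gradient versus
# PPO's clipped surrogate (illustrative comparison only).

def a2c_actor_loss(action_prob, advantage):
    """A2C: -log pi(a|s) * advantage, no constraint on update size."""
    return -math.log(action_prob) * advantage

def ppo_actor_loss(ratio, advantage, eps=0.2):
    """PPO: clip the probability ratio pi_new/pi_old to [1-eps, 1+eps]."""
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return -min(ratio * advantage, clipped * advantage)

# With a policy ratio well outside the clip range, PPO limits the
# effective update while A2C would push as hard as the gradient says:
print(round(a2c_actor_loss(0.25, 2.0), 4))  # 2.7726
print(ppo_actor_loss(1.5, 2.0))             # -2.4 (clipped at ratio 1.2)
```

That clipping is one plausible reason for the pattern reported above: PPO's constrained updates stabilize early training, while A2C's unconstrained gradient can keep adapting over much longer horizons.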
변영훈, 김한솔, Institute of Control, Robotics and Systems (ICROS) 2022 Journal of Institute of Control, Robotics and Systems Vol.28 No.11
This paper presents a proximal policy optimization (PPO)-based model reference tracking controller design for quadrotor unmanned aerial vehicles (UAVs). First, the quadrotor UAV is divided into the attitude and position systems, and each system is expressed as a nonlinear state-space equation. Thereafter, we design a linear reference model, which is used to derive the tracking error dynamics. The proposed neural network model implements a PPO-based reinforcement learning algorithm consisting of an actor approximating a policy and a critic corresponding to a state-value function. The actor receives the state variables of the tracking error dynamics as input and outputs the thrust values to be applied to each axis. Further, we decentralize the PPO-based controller into attitude and position controllers, which are then trained separately. For learning purposes, we implement an environment code that expresses the tracking error dynamics of the quadrotor by extending the OpenAI Gym environment. Finally, a simulation example is provided to show position tracking errors of at most 0.0166 m and 0.0254 m for the horizontal and vertical axes, respectively.
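The tracking-error state the PPO actor consumes can be illustrated with scalar stand-ins: a discrete-time linear reference model generates the desired trajectory, and the error between plant and reference states is what the policy sees. The gains below are illustrative scalars, not the quadrotor's actual attitude or position dynamics.

```python
# Toy model-reference tracking setup: roll a scalar plant and a scalar
# linear reference model forward and record the tracking error, the
# quantity fed to the PPO actor (illustrative sketch only).

def simulate_tracking(x0, ref0, a_plant, a_ref, steps):
    """Return the tracking error e_t = x_t - r_t at each step.

    x0, ref0        -- initial plant and reference states
    a_plant, a_ref  -- scalar state-transition gains
    """
    x, r, errors = x0, ref0, []
    for _ in range(steps):
        x = a_plant * x       # simplified (uncontrolled) plant update
        r = a_ref * r         # linear reference-model update
        errors.append(x - r)  # tracking error: the actor's input state
    return errors

errs = simulate_tracking(x0=1.0, ref0=1.0, a_plant=0.9, a_ref=0.8, steps=3)
print([round(e, 3) for e in errs])  # [0.1, 0.17, 0.217]
```

In the actual controller the actor's thrust output would enter the plant update and be trained to drive this error toward zero; here the plant is left uncontrolled so the error signal itself is visible.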