Search : [ keyword: 강화학습 (reinforcement learning) ] (29)

A Reinforcement Learning-Based Path Optimization for Autonomous Underwater Vehicle Mission Execution in Dynamic Marine Environments

Hyojun Ahn, Shincheon Ahn, Emily Jimin Roh, Ilseok Song, Jooeun Kwon, Sei Kwon, Youngdae Kim, Soohyun Park, Joongheon Kim

http://doi.org/10.5626/JOK.2025.52.6.519

This paper proposes an AOPF (Autonomous Underwater Vehicle Optimal Path Finder) algorithm for AUV mission execution and path optimization in dynamic marine environments. The proposed algorithm utilizes a PPO (Proximal Policy Optimization)-based reinforcement learning method in combination with a 3-degree-of-freedom (DOF) model, enabling a balanced approach between obstacle avoidance and effective target approach. This method is designed to achieve faster convergence and higher mission performance compared to the DDPG (Deep Deterministic Policy Gradient) algorithm. Experimental results demonstrated that the algorithm enabled stable learning and generated efficient paths. Furthermore, the proposed approach shows strong potential for real-world deployment in complex marine environments and offers scalability to multi-AUV cooperative control scenarios.
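
The abstract does not spell out the reward design or vehicle model, so the following is a minimal Python sketch of how such a PPO-style reward could trade off target approach against obstacle proximity, paired with a simplified planar 3-DOF kinematic step; all function names, weights, and the safety radius are illustrative assumptions, not the published AOPF formulation.

```python
import numpy as np

def aopf_style_reward(position, target, obstacles,
                      w_progress=1.0, w_obstacle=0.5, safe_radius=5.0):
    """Illustrative shaped reward: move toward the target, stay clear of obstacles.
    Weights and the safety radius are assumed values, not the published ones."""
    # Negative distance to the target rewards progress toward it.
    reward = -w_progress * np.linalg.norm(target - position)
    # Penalize proximity to any obstacle that enters the safety radius.
    for obs in obstacles:
        d = np.linalg.norm(obs - position)
        if d < safe_radius:
            reward -= w_obstacle * (safe_radius - d)
    return reward

def step_3dof(state, action, dt=0.1):
    """Simplified planar 3-DOF (x, y, yaw) kinematic update; the policy outputs
    surge speed and yaw rate. A stand-in for a full hydrodynamic AUV model."""
    x, y, yaw = state
    surge, yaw_rate = action
    x += surge * np.cos(yaw) * dt
    y += surge * np.sin(yaw) * dt
    yaw += yaw_rate * dt
    return np.array([x, y, yaw])

# Example step: the PPO policy would consume the new state and this reward.
state = step_3dof(np.array([0.0, 0.0, 0.0]), action=(1.5, 0.1))
print(aopf_style_reward(state[:2], target=np.array([50.0, 20.0]),
                        obstacles=[np.array([3.0, 1.0])]))
```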

Reinforcement Learning with the Law of Diminishing Marginal Utility: Efficient and Equitable Resource Allocation in Multi-Agent Systems

Yunsu Lee, Byoung-Tak Zhang

http://doi.org/10.5626/JOK.2025.52.5.374

The law of diminishing marginal utility is an economic theory stating that as additional units of a good are consumed, the utility gained from each additional unit decreases. We incorporated the law of diminishing marginal utility into multi-agent reinforcement learning for resource allocation, demonstrating that optimal distribution could emerge without direct communication among agents. This approach aligns with market principles, where individual self-interested actions can lead to maximization of total utility. Experimental results in a grid-world environment showed that when two agents competed for two resources, applying the law of diminishing marginal utility led to a more equitable and Pareto-optimal allocation of resources.
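
As a rough illustration of the idea (not the paper's actual reward), the sketch below assumes a geometrically decaying marginal utility: the more units of a resource an agent already holds, the smaller the reward for collecting another, which nudges competing agents toward a more even split.

```python
def marginal_utility(units_already_held, base_utility=1.0, decay=0.5):
    """Reward for collecting one more unit of a resource, shrinking geometrically
    with the number of units the agent already holds (diminishing marginal utility).
    The geometric form and constants are illustrative assumptions."""
    return base_utility * (decay ** units_already_held)

# Two agents competing for two resources: once an agent holds several units of
# one resource, another unit of it is worth little, so pursuing the other
# resource becomes more attractive and the allocation evens out.
holdings = {"agent_a": {"resource_1": 3, "resource_2": 0},
            "agent_b": {"resource_1": 0, "resource_2": 3}}
for agent, counts in holdings.items():
    print(agent, {res: marginal_utility(n) for res, n in counts.items()})
```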

A Similarity-Based Multi-Knowledge Transfer Algorithm for Enhancing Learning Efficiency of Reinforcement Learning-Based Autonomous Agent

Yeryeong Cho, Soohyun Park, Joongheon Kim

http://doi.org/10.5626/JOK.2025.52.4.310

This paper proposes a similarity-based multi-knowledge transfer algorithm (SMTRL) to enhance the learning efficiency of autonomous agents in reinforcement learning. SMTRL calculates the similarity between pre-trained models and the current model and dynamically adjusts the knowledge transfer ratio based on this similarity to maximize learning efficiency. In complex environments, autonomous agents face significant challenges when learning independently, as this process can be time-consuming and inefficient, making knowledge transfer essential. However, differences between pre-trained models and actual environments can result in negative transfer, leading to diminished learning performance. To tackle this issue, SMTRL dynamically adjusts the ratio of knowledge transferred from highly similar pre-trained models, thereby stabilizing and accelerating learning. Furthermore, experimental results demonstrated that the proposed algorithm outperformed traditional reinforcement learning and traditional knowledge transfer learning in terms of convergence speed. Therefore, this paper introduces a novel approach to efficient knowledge transfer for autonomous agents and discusses its applicability to complex mobility environments and directions for future research.
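
The abstract does not give the similarity measure or transfer rule, so the following is a hypothetical sketch in which cosine similarity between the current policy and each pre-trained teacher is turned into softmax transfer ratios; the function names and the temperature parameter are assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def transfer_ratios(current_policy_vec, teacher_policy_vecs, temperature=1.0):
    """Turn similarities between the current policy and each pre-trained teacher
    into knowledge-transfer ratios via a softmax, so more similar teachers
    contribute more and dissimilar ones (potential negative transfer) contribute less."""
    sims = np.array([cosine_similarity(current_policy_vec, t)
                     for t in teacher_policy_vecs])
    logits = sims / temperature
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

# Example with three teachers: the first is closest to the current policy and
# therefore receives the largest share of the transfer loss.
current = np.array([0.2, 0.9, 0.1])
teachers = [np.array([0.1, 1.0, 0.0]),
            np.array([1.0, 0.0, 0.0]),
            np.array([0.0, 0.5, 0.9])]
print(transfer_ratios(current, teachers))
```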

Research on Action Selection Techniques and Dynamic Dense Reward Application for Efficient Exploration in Policy-Based Reinforcement Learning

Junhyuk Kim, Junoh Kim, Kyungeun Cho

http://doi.org/10.5626/JOK.2025.52.4.293

Nowadays, reinforcement learning is being studied and utilized in various fields, including autonomous driving, robotics, and gaming. The goal of reinforcement learning is to find the optimal policy for an agent interacting with its environment. Depending on the environment and the specific problem, either a policy-based algorithm or a value-based algorithm is selected. Policy-based algorithms can learn effectively in continuous and high-dimensional action spaces, but they face challenges such as sensitivity to the learning rate and increased difficulty in converging to an optimal policy in complex environments. To address these issues, this paper proposes an action selection technique and a dynamic dense reward design based on a simulated annealing algorithm. The proposed method is applied to two different environments, and experimental results show that policy-based reinforcement learning algorithms using this method outperform the standard reinforcement learning algorithms.
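
A minimal sketch of the general idea, assuming Boltzmann (softmax) action selection whose temperature follows a simulated-annealing-style cooling schedule; the schedule, constants, and function names are illustrative, not the paper's exact design.

```python
import numpy as np

def anneal_temperature(step, t_start=1.0, t_end=0.05, decay=0.999):
    """Geometric cooling schedule in the spirit of simulated annealing."""
    return max(t_end, t_start * (decay ** step))

def select_action(action_preferences, step, rng=None):
    """Boltzmann (softmax) action selection with an annealed temperature: a high
    early temperature keeps exploration nearly uniform, while the cooled
    temperature later concentrates probability on the preferred action."""
    rng = rng or np.random.default_rng()
    t = anneal_temperature(step)
    logits = np.asarray(action_preferences, dtype=float) / t
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Early in training the choice is close to uniform; after many steps the
# highest-preference action dominates the sampling distribution.
prefs = [0.1, 0.5, 0.4]
print(select_action(prefs, step=0), select_action(prefs, step=5000))
```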

CraftGround: A Flexible Reinforcement Learning Environment Based on the Latest Minecraft

Hyeonseo Yang, Minsu Lee, Byoung-Tak Zhang

http://doi.org/10.5626/JOK.2025.52.3.189

This paper presents CraftGround, an innovative reinforcement learning environment based on the latest version of Minecraft (1.21). CraftGround provides flexible experimental setups and supports reinforcement learning in complex 3D environments, offering a variety of observational data, including visual information, audio cues, biome-specific contexts, and in-game statistics. Our experiments evaluated several agents, such as VPT (Video PreTraining), PPO, RecurrentPPO, and DQN, across various tasks, including tree chopping, evading hostile monsters, and fishing. The results indicated that VPT performed exceptionally well due to its pretraining, achieving higher performance and efficiency in structured tasks. In contrast, online learning algorithms like PPO and RecurrentPPO demonstrated a greater ability to adapt to environmental changes, showing improvement over time. These findings highlight CraftGround's potential to advance research on adaptive agent behaviors in dynamic 3D simulations.

Improving Retrieval Models through Reinforcement Learning with Feedback

Min-Taek Seo, Joon-Ho Lim, Tae-Hyeong Kim, Hwi-Jung Ryu, Du-Seong Chang, Seung-Hoon Na

http://doi.org/10.5626/JOK.2024.51.10.900

Open-domain question answering involves retrieving clues through search to solve problems. In such tasks, it is crucial that the search model provides appropriate clues, as this directly impacts the final performance. Moreover, information retrieval is an important function frequently used in everyday life. Recognizing the significance of these challenges, this paper aims to improve the performance of search models. Just as the recent trend involves adjusting the outputs of decoder models using Reinforcement Learning from Human Feedback (RLHF), this study seeks to enhance search models through reinforcement learning. Specifically, we defined two rewards: the loss of the answer model and the similarity between the retrieved documents and the correct document. Based on these, we applied reinforcement learning to adjust the probability score of the top-ranked document in the search model's document probability distribution. Through this approach, we confirmed the generality of the reinforcement learning method and its potential for further performance improvements.
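
The abstract names two reward signals (the answer-model loss and the similarity to the correct document) and an adjustment to the top-ranked document's score; the sketch below is one hypothetical way to combine them, with the weighting `alpha`, the learning rate, and the embedding inputs all assumed.

```python
import numpy as np

def retrieval_reward(answer_loss, retrieved_doc_emb, gold_doc_emb, alpha=0.5):
    """Combine the two reward signals named in the abstract: a lower answer-model
    loss and a higher similarity to the correct document both raise the reward.
    `alpha` and the embedding inputs are assumptions for illustration."""
    sim = float(np.dot(retrieved_doc_emb, gold_doc_emb) /
                (np.linalg.norm(retrieved_doc_emb) * np.linalg.norm(gold_doc_emb) + 1e-8))
    return alpha * (-answer_loss) + (1.0 - alpha) * sim

def adjust_top_document(doc_scores, reward, lr=0.1):
    """Nudge the top-ranked document's score up (positive reward) or down
    (negative reward), then renormalize into a probability distribution."""
    scores = np.array(doc_scores, dtype=float)
    scores[int(np.argmax(scores))] += lr * reward
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

# Example: a good answer (low loss) and a relevant retrieval raise the top
# document's probability slightly.
r = retrieval_reward(answer_loss=0.3, retrieved_doc_emb=np.array([0.9, 0.1]),
                     gold_doc_emb=np.array([1.0, 0.0]))
print(adjust_top_document([2.0, 1.0, 0.5], r))
```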

Development of Personalized Autonomous Driving Agents Using Imitation Learning

Ji Hye Ok, Wookyoung Kim, Honguk Woo

http://doi.org/10.5626/JOK.2024.51.6.558

The rise of Autonomous Vehicles (AVs) has brought humans and robots together on the same roads. As AVs integrate into the existing road system, it is crucial for them to establish a connection with human drivers and operate in a way that is convenient for humans. Moreover, as the desire for personalized autonomous driving experiences grows, there is a need to meet the demand for ‘personalized’ AVs. This paper examines imitation learning methods that imitate the driving behaviors of rule-based agents. It also proposes a controlled multi-objective imitation learning approach to generate diverse driving policies based on given data. Additionally, the study assesses the derived policies in various scenarios using the Carla simulator.
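
As a hedged illustration of "controlled multi-objective imitation learning" (not the paper's formulation), the sketch below mixes per-style behavioral-cloning losses with a preference vector, so different preferences yield differently personalized policies; the styles, data shapes, and loss form are assumptions.

```python
import numpy as np

def multi_objective_bc_loss(policy_actions, expert_actions_by_style, preference):
    """Hypothetical 'controlled' behavioral-cloning objective: per-style imitation
    errors are mixed by a preference vector, so changing the preference steers the
    learned policy toward a different driving style."""
    losses = [np.mean((policy_actions - expert_actions) ** 2)
              for expert_actions in expert_actions_by_style.values()]
    return float(np.dot(preference, np.array(losses)))

# Example with two rule-based driving styles: a preference of [0.8, 0.2] pulls
# the policy toward the cautious expert's actions.
policy = np.array([[0.3, 0.1], [0.2, 0.0]])
experts = {"cautious":   np.array([[0.2, 0.0], [0.1, 0.0]]),
           "aggressive": np.array([[0.9, 0.4], [0.8, 0.3]])}
print(multi_objective_bc_loss(policy, experts, preference=np.array([0.8, 0.2])))
```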

Improving Portfolio Optimization Performance based on Reinforcement Learning through Episode Randomization and Action Noise

Saehyeong Woo, Doguk Kim

http://doi.org/10.5626/JOK.2024.51.4.370

Portfolio optimization is essential to reduce investment management risk and maximize returns. With the rapid development of artificial intelligence technology in recent years, research is being conducted to utilize it in various fields; in particular, the application of reinforcement learning in the financial sector is being actively investigated. However, most studies do not address the problem of agent overfitting caused by iterative training on historical financial data. In this study, we propose a technique to mitigate overfitting through episode randomization and action noise in reinforcement learning-based portfolio optimization. The proposed technique randomizes the duration of the training data in each episode so that the agent experiences different market conditions, providing a data augmentation effect, and leverages action noise to promote exploration so that the agent can respond to varied situations. Experimental results show that the proposed technique improves the performance of the existing reinforcement learning agent, and comparative experiments confirm that both techniques contribute to performance improvement under various conditions.
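
A minimal sketch of the two proposed mechanisms as they might look in code, assuming a historical price series and a simplex-constrained portfolio action; the episode-length bounds and noise scale are illustrative, not the paper's settings.

```python
import numpy as np

def sample_random_episode(prices, min_len=60, max_len=250, rng=None):
    """Draw an episode with a random start index and random length from the
    historical price series, so each episode covers different market conditions."""
    rng = rng or np.random.default_rng()
    length = int(rng.integers(min_len, max_len + 1))
    start = int(rng.integers(0, len(prices) - length))
    return prices[start:start + length]

def noisy_portfolio_action(weights, noise_std=0.05, rng=None):
    """Add Gaussian noise to the agent's portfolio weights and re-project onto the
    simplex, encouraging exploration around the policy's chosen allocation."""
    rng = rng or np.random.default_rng()
    noisy = np.clip(np.asarray(weights, dtype=float)
                    + rng.normal(0.0, noise_std, len(weights)), 0.0, None)
    total = noisy.sum()
    return noisy / total if total > 0 else np.full(len(noisy), 1.0 / len(noisy))

# Example: a 1,000-step synthetic price series and a three-asset allocation.
prices = np.cumsum(np.random.default_rng(0).normal(0, 1, 1000)) + 100
episode = sample_random_episode(prices)
print(len(episode), noisy_portfolio_action([0.5, 0.3, 0.2]))
```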

A Reinforcement Learning based Adaptive Container Scheduling Back-off Scheme for Reducing Cold Starts in FaaS Platforms

Sungho Kang, Junyeol Yu, Euiseong Seo

http://doi.org/10.5626/JOK.2024.51.3.191

Function as a Service (FaaS) is a cloud computing service model that virtualizes computing resources and provides them in units of functions. As it enables flexible and easy service deployment, its use is rapidly growing in cloud-native architectures. However, the initial execution of a function requested by a user on a FaaS platform involves several initialization steps, and this initialization overhead, known as a cold start, delays function execution. We propose that when a request arrives for the same function as one that is already running, waiting rather than immediately processing the request can reduce the occurrence of cold starts. In this paper, we propose a reinforcement learning-based FaaS request waiting policy model that chooses between dispatching a function execution request immediately and deferring it. In comparison experiments on OpenWhisk, the frequency of cold starts was reduced by up to 57% and the average function execution time by up to 81%.
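
The abstract frames the policy as a choice between dispatching a request immediately and waiting for a warm container; the sketch below shows one hypothetical tabular Q-learning formulation of that decision, with the state discretization and reward shape assumed rather than taken from the paper.

```python
import numpy as np

# Hypothetical tabular Q-learning sketch of the back-off decision: in each
# discretized platform state, the agent either DISPATCHes the request to a new
# container immediately or WAITs for a warm container to free up.
DISPATCH, WAIT = 0, 1

class BackoffAgent:
    def __init__(self, n_states, lr=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.q = np.zeros((n_states, 2))
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:      # explore occasionally
            return int(self.rng.integers(2))
        return int(np.argmax(self.q[state]))      # otherwise exploit

    def update(self, state, action, reward, next_state):
        # The reward would penalize both cold starts and the extra queueing
        # delay incurred by waiting; its exact shape is an assumption here.
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.lr * (target - self.q[state, action])

# Example usage: one transition where waiting avoided a cold start.
agent = BackoffAgent(n_states=10)
a = agent.act(state=3)
agent.update(state=3, action=a, reward=1.0, next_state=4)
```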

UnityPGTA: A Unity Platformer Game Testing Automation Tool Using Reinforcement Learning

Se-chan Park, Deock Yeop Kim, Woo Jin Lee

http://doi.org/10.5626/JOK.2024.51.2.149

The cost of game testing in the video game industry is significant, accounting for nearly half of total expenses. Research efforts are underway to automate testing processes in order to reduce testing costs. However, existing research on test automation often involves manual tasks such as script writing, which is costly and labor-intensive. Additionally, implementations using virtual environments like VGDL and GVG-AI pose challenges when applied to real game testing. In this paper, we propose a tool for automating game testing with the aim of detecting system faults, focusing on Unity platformer games. The proposed tool is based on a commercial game engine and autonomously analyzes the game without human intervention to establish an automated game testing environment. We compare and analyze the error detection results of the proposed tool against a random baseline model on open-source games, demonstrating the tool's effectiveness in performing automated game analysis and test environment setup, ultimately reducing testing costs and improving quality and stability.

