Enhancing Stability and Performance of Reinforcement Learning Algorithms through Q Function-based Lyapunov Stability Constraints


Vol. 53, No. 2, pp. 109-116, Feb. 2026
DOI: 10.5626/JOK.2026.53.2.109



  Abstract

We introduce a lightweight Q-function-based stability regularizer for actor-critic methods, specifically Soft Actor-Critic (SAC) and Twin Delayed DDPG (TD3). Inspired by Lyapunov-style intuition, though without formal guarantees, the regularizer adds a one-sided hinge penalty to the policy loss that discourages updates which reduce the critic value at on-policy states. The training loop, including the replay buffer, target networks, and delayed policy updates, remains unchanged; the only additional computation is extra forward passes through the target critic(s) during policy updates (one for SAC, two for TD3), which in our experiments added roughly 1-2% wall-clock time and negligible memory. We evaluate the approach on MuJoCo tasks (InvertedPendulum, InvertedDoublePendulum, and HumanoidStandup) under an identical environment-step budget. Performance is assessed along two dimensions: MeanRegret@K (lower is better) for early learning speed and MeanCost@K (lower is better) for safety, with an optional composite that weights the two equally (0.5 each). Across tasks, the regularizer generally improves the speed-safety trade-off, with the most consistent gains on TD3. Results on HumanoidStandup show higher variance owing to sensitivity to contact dynamics; we report aggregate trends in the main text and full distributions in the appendix. Overall, the method is a practical regularizer that complements constraint-based approaches such as Constrained Policy Optimization (CPO) and PPO-Lagrangian (PPO-Lag). Limitations include sensitivity to the penalty weight and reliance on the accuracy of critic estimates.
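
To make the mechanism concrete, the following is a minimal PyTorch sketch of how such a one-sided hinge penalty could be attached to a TD3-style policy update. It assumes the usual TD3 components (an actor, a live critic, and two target critics); the function name, the choice of reference value in the hinge, and the weight "lam" are illustrative assumptions for this sketch, not the authors' released code.

import torch
import torch.nn.functional as F

def td3_policy_loss_with_hinge(actor, critic1,
                               critic1_target, critic2_target,
                               states, buffer_actions, lam=0.1):
    """TD3-style policy loss plus a one-sided hinge stability penalty.

    The reference value and the weight `lam` are illustrative
    assumptions; the paper's exact formulation may differ.
    """
    # Standard TD3 policy objective: ascend Q1 under the current actor.
    new_actions = actor(states)
    base_loss = -critic1(states, new_actions).mean()

    # Fixed reference: the target critics' estimate at the same
    # on-policy states for the replayed actions (no gradient).
    with torch.no_grad():
        ref_q = torch.min(critic1_target(states, buffer_actions),
                          critic2_target(states, buffer_actions))

    # Value of the proposed actions under the frozen target critics;
    # these are the extra forward passes noted in the abstract.
    new_q = torch.min(critic1_target(states, new_actions),
                      critic2_target(states, new_actions))

    # One-sided hinge: penalize only updates that would *reduce* the
    # critic value relative to the reference; increases cost nothing.
    hinge = F.relu(ref_q - new_q).mean()

    return base_loss + lam * hinge

For SAC, the analogous penalty would presumably be computed from a single extra target-critic pass on actions sampled from the current policy, matching the one-pass overhead stated in the abstract.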




  Cite this article

[IEEE Style]

H. J. Kim and J. W. Lee, "Enhancing Stability and Performance of Reinforcement Learning Algorithms through Q Function-based Lyapunov Stability Constraints," Journal of KIISE, JOK, vol. 53, no. 2, pp. 109-116, 2026. DOI: 10.5626/JOK.2026.53.2.109.


[ACM Style]

Hyung Jin Kim and Jung Woo Lee. 2026. Enhancing Stability and Performance of Reinforcement Learning Algorithms through Q Function-based Lyapunov Stability Constraints. Journal of KIISE, JOK, 53, 2, (2026), 109-116. DOI: 10.5626/JOK.2026.53.2.109.


[KCI Style]

Hyung Jin Kim and Jung Woo Lee, "Enhancing Stability and Performance of Reinforcement Learning Algorithms through Q Function-based Lyapunov Stability Constraints," Journal of KIISE, vol. 53, no. 2, pp. 109-116, 2026. DOI: 10.5626/JOK.2026.53.2.109.








