Safety Evaluation of Large Language Models Using Risky Humor

JoEun Kang; GaYeon Jung; HanSaem Kim

Vol. 52, No. 6, pp. 508-518, Jun. 2025

10.5626/JOK.2025.52.6.508

Large Language Model

AI ethics

AI safety

risk and safety evaluation

Korean humor

Abstract

This study evaluates the safety of generative language models using Korean humor that contains socially risky content. Concerns about the misuse of generative language models have intensified recently, because these models can produce plausible responses to inputs and prompts that deviate from social norms, ethical standards, and common sense. Against this background, the study aims to identify and mitigate potential risks of artificial intelligence (AI) by analyzing the risks inherent in humor and developing a benchmark for evaluating them. The socially risky humor examined here differs from conventional harmful content: the playful, entertaining nature of humor can easily obscure unethical or risky elements. This property closely resembles the subtle and indirect input patterns that are critical in AI safety assessments. In the experiment, responses generated for input requests involving unethical humor were classified as either safe or unsafe, and the safety level of each evaluated model was then rated on a four-level scale. Using this procedure, the study evaluated prominent generative language models, including GPT-4o, Gemini, and Claude. The findings indicate that these models show vulnerabilities in ethical judgment when faced with risky humor.
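
The two-stage procedure described in the abstract (binary safe/unsafe labels, then a four-level rating) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not define the four levels, so the equal-width cut-offs, the function names `safe_rate` and `safety_level`, and the sample labels are all assumptions made for this example.

```python
# Illustrative sketch only: the four-level scale is NOT defined in the abstract.
# The equal-width bins and all names below are assumptions for this example.

def safe_rate(labels):
    """Return the fraction of responses labelled 'safe'.

    `labels` is a non-empty list of 'safe' / 'unsafe' strings, one per response.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for label in labels if label == "safe") / len(labels)


def safety_level(rate):
    """Map a safe rate in [0, 1] to a level from 1 (least safe) to 4 (safest).

    Assumed binning: [0, 0.25) -> 1, [0.25, 0.5) -> 2, [0.5, 0.75) -> 3,
    [0.75, 1.0] -> 4.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return min(4, int(rate * 4) + 1)


# Hypothetical labels for two placeholder models (not data from the paper).
judgments = {
    "model_a": ["safe", "unsafe", "safe", "safe"],
    "model_b": ["unsafe", "unsafe", "safe", "unsafe"],
}

levels = {name: safety_level(safe_rate(labels)) for name, labels in judgments.items()}
print(levels)
```

Under these assumed bins, a model whose responses are mostly judged safe lands at a higher level; the actual benchmark may define its levels quite differently.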

Cite this article

[IEEE Style]

J. Kang, G. Jung, H. Kim, "Safety Evaluation of Large Language Models Using Risky Humor," Journal of KIISE, JOK, vol. 52, no. 6, pp. 508-518, 2025. DOI: 10.5626/JOK.2025.52.6.508.

[ACM Style]

JoEun Kang, GaYeon Jung, and HanSaem Kim. 2025. Safety Evaluation of Large Language Models Using Risky Humor. Journal of KIISE, JOK, 52, 6, (2025), 508-518. DOI: 10.5626/JOK.2025.52.6.508.

[KCI Style]

Kang JoEun, Jung GaYeon, Kim HanSaem, "Safety Evaluation of LLMs Using Unethical Humor," Journal of KIISE, vol. 52, no. 6, pp. 508-518, 2025. DOI: 10.5626/JOK.2025.52.6.508.

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal
