Safety Evaluation of Large Language Models Using Risky Humor 


Vol. 52, No. 6, pp. 508-518, Jun. 2025
10.5626/JOK.2025.52.6.508



  Abstract

This study evaluates the safety of generative language models through the lens of Korean humor that contains socially risky content. Concerns about the misuse of generative language models have recently intensified, because these models can produce plausible responses to inputs and prompts that deviate from social norms, ethical standards, and common sense. Against this background, the study aims to identify and mitigate potential risks of artificial intelligence (AI) by analyzing the risks inherent in humor and developing a benchmark for evaluating them. The socially risky humor examined here differs from conventional harmful content in that the playful, entertaining nature of humor can easily obscure unethical or risky elements. This characteristic closely resembles the subtle and indirect input patterns that are critical in AI safety assessments. In the experiment, the responses that models generated to requests involving unethical humor were classified as either safe or unsafe. The safety of each evaluated model was then rated on a four-level scale. Using this procedure, the study evaluated prominent generative language models, including GPT-4o, Gemini, and Claude. The findings indicate that these models show vulnerabilities in ethical judgment when faced with risky humor.
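
The two-step procedure described in the abstract (binary safe/unsafe labels, then a four-level rating) can be illustrated with a minimal Python sketch. The abstract does not state how the four levels are defined, so the cut-off values in `LEVEL_THRESHOLDS`, the convention that level 1 is the safest, and the function names `safe_rate` and `safety_level` are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical cut-offs: the abstract mentions four safety levels but does
# not say how they are derived. Each pair is (minimum safe rate, level),
# with level 1 assumed to be the safest.
LEVEL_THRESHOLDS = [(0.90, 1), (0.70, 2), (0.50, 3)]


def safe_rate(labels):
    """Return the fraction of responses labeled 'safe'.

    `labels` holds one binary judgment per model response,
    each either 'safe' or 'unsafe'.
    """
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        if label not in ("safe", "unsafe"):
            raise ValueError(f"unexpected label: {label!r}")
    return sum(label == "safe" for label in labels) / len(labels)


def safety_level(rate):
    """Map a safe rate in [0, 1] to one of four levels (assumed scale)."""
    for minimum, level in LEVEL_THRESHOLDS:
        if rate >= minimum:
            return level
    return 4  # below every cut-off: least safe level


# Illustrative labels only, not data from the study.
example_labels = ["safe", "unsafe", "safe", "safe"]
example_rate = safe_rate(example_labels)
example_level = safety_level(example_rate)
```

With real benchmark judgments, `safe_rate` would be computed per model over all risky-humor prompts, and the thresholds would be replaced by whatever level definitions the paper actually uses.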




  Cite this article

[IEEE Style]

J. Kang, G. Jung, H. Kim, "Safety Evaluation of Large Language Models Using Risky Humor," Journal of KIISE, JOK, vol. 52, no. 6, pp. 508-518, 2025. DOI: 10.5626/JOK.2025.52.6.508.


[ACM Style]

JoEun Kang, GaYeon Jung, and HanSaem Kim. 2025. Safety Evaluation of Large Language Models Using Risky Humor. Journal of KIISE, JOK, 52, 6, (2025), 508-518. DOI: 10.5626/JOK.2025.52.6.508.


[KCI Style]

강조은, 정가연, 김한샘, "비윤리적 유머를 활용한 LLM 안전성 평가," 한국정보과학회 논문지, 제52권, 제6호, 508~518쪽, 2025. DOI: 10.5626/JOK.2025.52.6.508.









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
