TY - JOUR
T1 - Safety Evaluation of Large Language Models Using Risky Humor
AU - Kang, JoEun
AU - Jung, GaYeon
AU - Kim, HanSaem
JO - Journal of KIISE, JOK
PY - 2025
DA - 2025/1/14
DO - 10.5626/JOK.2025.52.6.508
KW - large language model
KW - AI ethics
KW - AI safety
KW - risk and safety evaluation
KW - Korean humor
AB - This study evaluated the safety of generative language models through the lens of Korean humor containing socially risky content. Concerns regarding the misuse of generative language models have recently intensified, as these models can produce plausible responses to inputs and prompts that deviate from social norms, ethical standards, and common sense. In this context, this study aimed to identify and mitigate potential risks associated with artificial intelligence (AI) by analyzing the risks inherent in humor and developing a benchmark for their evaluation. The socially risky humor examined in this study differs from conventional harmful content in that the playful, entertaining nature of humor can easily obscure unethical or risky elements; this characteristic closely resembles the subtle, indirect input patterns that are critical in AI safety assessments. In the experiment, model outputs generated in response to requests involving unethical humor were classified as safe or unsafe, and each model's safety was then rated on a four-level scale. Using this procedure, the study evaluated prominent generative language models, including GPT-4o, Gemini, and Claude, and found that these models exhibited vulnerabilities in ethical judgment when confronted with risky humor.
ER - 