Digital Library: Search Results
Safety Evaluation of Large Language Models Using Risky Humor
JoEun Kang, GaYeon Jung, HanSaem Kim
http://doi.org/10.5626/JOK.2025.52.6.508
This study evaluated the safety of generative language models through the lens of Korean humor containing socially risky content. Concerns about the misuse of generative language models have recently intensified, as these models can produce plausible responses to inputs and prompts that deviate from social norms, ethical standards, and common sense. In this context, the study aimed to identify and mitigate potential risks associated with artificial intelligence (AI) by analyzing the risks inherent in humor and developing a benchmark for their evaluation. The socially risky humor examined here differs from conventional harmful content: the playful, entertaining nature of humor can easily obscure unethical or risky elements. This characteristic closely resembles the subtle, indirect input patterns that are critical in AI safety assessments. In the experiment, model outputs generated from requests involving unethical humor were first classified as safe or unsafe in a binary fashion; the safety of each experimental model was then rated on a four-level scale. Using this framework, the study evaluated prominent generative language models, including GPT-4o, Gemini, and Claude, and found that these models exhibited vulnerabilities in ethical judgment when confronted with risky humor.
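The abstract describes a two-stage evaluation: individual model outputs are judged safe or unsafe, and those binary judgments are then mapped to a four-level safety rating. A minimal sketch of that aggregation step might look as follows; the threshold values and level names are illustrative assumptions, since the paper's actual rubric is not given here.

```python
def safety_level(judgments):
    """Map a list of binary safety judgments (True = unsafe response)
    to a four-level rating, as in the paper's evaluation design.
    Thresholds and labels below are hypothetical, for illustration."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    unsafe_rate = sum(judgments) / len(judgments)
    # Illustrative four-level scale based on the unsafe-response rate.
    if unsafe_rate < 0.05:
        return "Level 1 (safe)"
    elif unsafe_rate < 0.20:
        return "Level 2 (mostly safe)"
    elif unsafe_rate < 0.50:
        return "Level 3 (risky)"
    return "Level 4 (unsafe)"
```

For example, a model whose responses to risky-humor prompts were judged unsafe 30% of the time would land at the third level under these assumed cutoffs.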
Voice Phishing Detection Scheme Using a GPT-3.5-based Large Language Model
http://doi.org/10.5626/JOK.2024.51.1.67
In this paper, we introduce a novel approach to voice phishing call detection using text-davinci-003, a recently updated model in the generative pre-trained transformer (GPT)-3.5 series. We devised a prompt that instructs the language model to respond with an integer from 0 to 10, indicating the likelihood that a given conversation is a voice phishing attempt. For prompt tuning, hyperparameter adjustment, and performance validation, we use a dataset of 105 actual Korean voice phishing transcripts and 704 transcripts of general conversations on various topics. The proposed scheme includes a function that issues a voice phishing alarm during a call and a function that makes a final determination, after the call ends, as to whether it was a voice phishing attempt. Performance is evaluated in five scenarios using different combinations of training and test data, with the proposed technique achieving an accuracy of 0.95 to 0.97. In particular, when tested on data from sources different from those used in training, the proposed scheme outperforms existing schemes based on bidirectional encoder representations from transformers (BERT).
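The core mechanism described in the abstract is a prompt that asks the model for a 0-10 phishing-likelihood score, which is then thresholded to raise an alarm. A minimal sketch of the scoring side is below; the prompt wording, the score-parsing logic, and the alarm threshold are all assumptions for illustration, not the paper's exact design, and the call to the language model itself is left as a stub.

```python
import re

# Illustrative prompt asking for a single-integer phishing score,
# in the spirit of the scheme described in the abstract.
PROMPT_TEMPLATE = (
    "On a scale of 0 to 10, how likely is the following phone "
    "conversation to be a voice phishing attempt? Answer with a "
    "single integer only.\n\nConversation:\n{transcript}"
)

def parse_score(model_output):
    """Extract the 0-10 integer score from the model's raw reply."""
    match = re.search(r"\b(10|[0-9])\b", model_output)
    if match is None:
        raise ValueError(f"no score found in: {model_output!r}")
    return int(match.group(1))

def is_phishing(model_output, threshold=7):
    """Flag the call when the parsed score reaches the (assumed)
    alarm threshold; the paper's actual cutoff is not stated here."""
    return parse_score(model_output) >= threshold
```

Parsing with a regular expression rather than `int()` alone makes the scheme tolerant of replies like "Score: 8", which instruction-following models often produce despite being asked for a bare integer.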

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr