Journal of KIISE

Search : [ author: Hyeonwoo Jung ] (1)

In-Depth Evaluations of the Primality Testing Capabilities of Large Language Models: with a Focus on ChatGPT and PaLM 2

http://doi.org/10.5626/JOK.2024.51.8.699

This study thoroughly evaluates the primality testing capabilities of two large language models, ChatGPT and PaLM 2. For each given number, we pose two different yes/no questions: whether it is prime and whether it is composite. A model is deemed successful only if it answers both questions correctly and makes no division errors in the generated prompt. Analyzing the inference results on a dataset of 664 prime and 1458 composite numbers, we found that testing accuracy decreased as the difficulty of the target numbers increased. When calculation errors were also taken into account, testing accuracy decreased for both models, and PaLM 2 failed at primality testing for every high-difficulty four-digit composite number. These findings show that evaluations of language models' reasoning abilities based on simple questions can be misleading, and they underscore the need for comprehensive assessments.
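
The success criterion described in the abstract (both yes/no answers correct, and no division errors) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the trial-division ground truth, and the `(dividend, divisor, quotient, remainder)` tuple format for divisions extracted from a model's generated text are all assumptions made for illustration.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Ground truth by trial division (adequate for small integers)."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def division_ok(dividend: int, divisor: int, quotient: int, remainder: int) -> bool:
    """Verify one division claimed in the generated text (hypothetical format)."""
    return divisor != 0 and divmod(dividend, divisor) == (quotient, remainder)

def is_success(n: int, says_prime: bool, says_composite: bool, divisions=()) -> bool:
    """Assumed scoring rule for n >= 2: both answers must match the ground
    truth, and every extracted division must be arithmetically correct."""
    truth = is_prime(n)
    answers_ok = (says_prime == truth) and (says_composite == (not truth))
    return answers_ok and all(division_ok(*d) for d in divisions)
```

Under this sketch, a model that correctly calls 91 composite but supports the answer with the claim "91 / 7 = 12" would still be counted as a failure, which is the kind of case a single yes/no question would miss.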

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr
