Resolving Ambiguity in Visual Question Answering through an Iterative Clarifying QA-based Framework 


Vol. 52, No. 9, pp. 778-786, Sep. 2025
10.5626/JOK.2025.52.9.778



  Abstract

This paper presents a three-stage framework for the ambiguous-object problem in Visual Question Answering (VQA), in which the object a question refers to is unclear because the image contains multiple candidates. The framework (1) detects whether the question is ambiguous, (2) generates clarification questions when ambiguity is detected, and (3) uses the resulting question-answer history to perform the final VQA. The model generates clarification questions directly from visual features, without any additional training, and iteratively refines them by incorporating the history of previous question-answer pairs. In experiments with the LLaVA v1.6 model, the proposed framework improves accuracy by 6.7% and semantic accuracy by 5.6% over the baseline. Moreover, combining ambiguity detection with an early stopping strategy reduces the inefficiency of multi-turn interaction, cutting execution time by 44%. The study thus offers a practical solution to the ambiguous-object problem: real-time clarification without additional training, leading to improved VQA accuracy.
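
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ClarifyingVQA`, its four callables, and the `max_turns` cap are all hypothetical, and the choice to reuse the ambiguity detector as the early-stopping test is an assumption made only for illustration. In practice each callable would wrap a prompt to a vision-language model such as LLaVA v1.6, or a real user for `ask_user`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (clarifying question, user answer) pairs collected so far
History = List[Tuple[str, str]]


@dataclass
class ClarifyingVQA:
    # All four callables are hypothetical stand-ins for model prompts
    # (or, for `ask_user`, a human); none is an API taken from the paper.
    is_ambiguous: Callable[[str, str, History], bool]
    make_clarifying_question: Callable[[str, str, History], str]
    ask_user: Callable[[str], str]
    answer: Callable[[str, str, History], str]
    max_turns: int = 3  # assumed cap on clarification rounds

    def run(self, image: str, question: str) -> Tuple[str, History]:
        history: History = []
        for _ in range(self.max_turns):
            # Stage 1: ambiguity detection. Assumption: the same check is
            # reused as the early-stopping test once history is available.
            if not self.is_ambiguous(image, question, history):
                break
            # Stage 2: generate a clarifying question conditioned on history.
            clarifying = self.make_clarifying_question(image, question, history)
            history.append((clarifying, self.ask_user(clarifying)))
        # Stage 3: final VQA using the accumulated question-answer history.
        return self.answer(image, question, history), history


def demo() -> Tuple[str, History]:
    # Toy stubs: the question counts as ambiguous until one answer is given.
    replies = {"Which cup do you mean?": "the red one"}
    pipeline = ClarifyingVQA(
        is_ambiguous=lambda img, q, h: len(h) == 0,
        make_clarifying_question=lambda img, q, h: "Which cup do you mean?",
        ask_user=lambda cq: replies[cq],
        answer=lambda img, q, h: f"answer given {len(h)} clarification(s)",
    )
    return pipeline.run("kitchen.jpg", "What is in the cup?")
```

With these stubs, an unambiguous question skips the loop entirely and goes straight to the final answer, which is the kind of saving that ambiguity detection and early stopping are meant to provide.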




  Cite this article

[IEEE Style]

Y. Sung, G. Park, S. Park, "Resolving Ambiguity in Visual Question Answering through an Iterative Clarifying QA-based Framework," Journal of KIISE, JOK, vol. 52, no. 9, pp. 778-786, 2025. DOI: 10.5626/JOK.2025.52.9.778.


[ACM Style]

Yu-Jeong Sung, Gyu-Min Park, and Seong-Bae Park. 2025. Resolving Ambiguity in Visual Question Answering through an Iterative Clarifying QA-based Framework. Journal of KIISE, JOK, 52, 9, (2025), 778-786. DOI: 10.5626/JOK.2025.52.9.778.


[KCI Style]

성유정, 박규민, 박성배, "반복적 질의응답 기반 명확화 프레임워크를 통한 시각적 질의 응답 내 모호성 해소," 한국정보과학회 논문지, 제52권, 제9호, 778~786쪽, 2025. DOI: 10.5626/JOK.2025.52.9.778.









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
