Search : [ author: 이상구 (Sang-goo Lee) ] (4 results)

Improving Conversational Query Rewriting through Generative Coreference Resolution

Heejae Yu, Sang-goo Lee

http://doi.org/10.5626/JOK.2024.51.11.1028

Conversational search enables retrieval of relevant passages for the current turn query by understanding its contextual meaning within a multi-turn dialogue. In conversational search, Conversational Query Reformulation makes it possible to use off-the-shelf retrievers by transforming context-dependent queries into self-contained forms. Existing approaches primarily fine-tune pre-trained language models using human-rewritten queries as labels, or prompt large language models (LLMs), to address the ambiguity inherent in the current turn query, such as ellipsis and coreference. However, our preliminary experimental results indicate that existing models continue to struggle with coreference resolution. This paper addresses two main research questions: 1) Can a model be trained to distinguish anaphoric mentions that need further clarification? and 2) Can a model be trained to clarify detected coreferent mentions into more specific phrases? To investigate these questions, we devised two main components: the detector and the decoder. Our experiments demonstrated that the fine-tuned detector could identify diverse anaphoric phrases within questions, while the fine-tuned decoder could successfully clarify them, ultimately enabling effective coreference resolution for query rewriting. Building on these components, we present a novel paradigm, Coreference Aware Conversational Query Reformulation.
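At a high level, the detector and decoder described in the abstract form a tag-then-rewrite pipeline. The following is a minimal sketch of how such a pipeline could be wired together with generic Hugging Face Transformers interfaces; the checkpoint paths, the token-tag scheme, and the prompt format are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical detector + decoder pipeline for coreference-aware query
# rewriting. Checkpoint names and the label scheme are placeholders.
import torch
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          AutoModelForSeq2SeqLM)

DETECTOR_CKPT = "path/to/finetuned-detector"   # hypothetical checkpoint
DECODER_CKPT = "path/to/finetuned-decoder"     # hypothetical checkpoint

det_tok = AutoTokenizer.from_pretrained(DETECTOR_CKPT)
detector = AutoModelForTokenClassification.from_pretrained(DETECTOR_CKPT)
dec_tok = AutoTokenizer.from_pretrained(DECODER_CKPT)
decoder = AutoModelForSeq2SeqLM.from_pretrained(DECODER_CKPT)

def detect_anaphora(query: str) -> list[str]:
    """Return the tokens the detector tags as anaphoric mentions."""
    enc = det_tok(query, return_tensors="pt")
    with torch.no_grad():
        labels = detector(**enc).logits.argmax(-1)[0]
    tokens = det_tok.convert_ids_to_tokens(enc["input_ids"][0])
    # Assume label id 1 marks an anaphoric token (scheme is illustrative).
    return [t for t, l in zip(tokens, labels) if l.item() == 1]

def rewrite(query: str, history: list[str]) -> str:
    """Clarify detected mentions into a self-contained query."""
    mentions = detect_anaphora(query)
    prompt = " [SEP] ".join(history + [query, "mentions: " + ", ".join(mentions)])
    ids = dec_tok(prompt, return_tensors="pt").input_ids
    out = decoder.generate(ids, max_new_tokens=64)
    return dec_tok.decode(out[0], skip_special_tokens=True)
```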

Fast Personalized PageRank Computation on Very Large Graphs

Sungchan Park, Youna Kim, Sang-goo Lee

http://doi.org/10.5626/JOK.2022.49.10.859

Computation of Personalized PageRank (PPR) on graphs is an important function that is widely used in myriad application domains such as search, recommendation, and knowledge discovery. Because computing PPR is expensive, many innovative and efficient algorithms for computing it have been developed. However, efficient computation of PPR on very large graphs with millions of nodes or more is still an open problem. Moreover, previously proposed algorithms cannot handle updates efficiently, which severely limits their ability to handle dynamic graphs. In this paper, we present a fast-converging algorithm that guarantees high and controlled precision. We improve on the convergence rate of traditional Power Iteration approximation methods as well as fully exact methods. The results revealed that the proposed algorithm is at least 20 times faster than Power Iteration and outperforms other state-of-the-art algorithms in terms of computation time.
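For reference, the Power Iteration baseline the abstract compares against can be summarized in a few lines: probability mass is repeatedly propagated along edges, with a restart back to the source node at each step. The sketch below shows only this baseline, not the paper's faster-converging algorithm.

```python
# Power Iteration approximation of Personalized PageRank on a small
# dense adjacency matrix (baseline method, for illustration only).
import numpy as np

def ppr_power_iteration(adj, source, alpha=0.15, tol=1e-8, max_iter=1000):
    """Personalized PageRank vector for `source`.

    adj[i, j] = 1 if there is an edge i -> j; `alpha` is the restart
    (teleport) probability back to the source node.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1                  # avoid division by zero
    P = adj / out_deg                          # row-stochastic transition matrix
    s = np.zeros(n); s[source] = 1.0           # personalization (restart) vector
    p = s.copy()
    for _ in range(max_iter):
        p_next = alpha * s + (1 - alpha) * (P.T @ p)
        if np.abs(p_next - p).sum() < tol:     # L1 convergence check
            return p_next
        p = p_next
    return p

# Example: a small 4-node directed graph.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(ppr_power_iteration(A, source=0))
```

Because the error shrinks roughly by a factor of (1 - alpha) per iteration, reaching a tolerance of 1e-8 with alpha = 0.15 takes on the order of a hundred iterations; this slow geometric convergence is what the paper's algorithm improves on.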

A Product Review Summarization Considering Additional Information

Jaeyeun Yoon, Ig-hoon Lee, Sang-goo Lee

http://doi.org/10.5626/JOK.2020.47.2.180

Automatic document summarization is the task of generating, from an existing document, a condensed version suited to a particular user or purpose. As use of the Internet increases, data of all kinds, including text, is exploding, and the value of document summarization technology is growing with it. While the latest deep learning-based models show reliable performance in document summarization, their performance depends on the quantity and quality of the training data. For example, it is difficult for existing models to generate reliable summaries from the product review text of online shopping malls because of typos and grammatically incorrect sentences. Online malls and portal web services are struggling to solve this problem. Thus, to generate appropriate summaries even when the quality and quantity of product review training data are poor, this study proposes a model that generates product review summaries using additional information. Our experiments showed that the proposed model improved on the existing model in terms of relevance and readability of the generated product review summaries.
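As a rough illustration of conditioning a summarizer on additional information, the sketch below prepends assumed product metadata (category and rating) to the review text before generation. The checkpoint path and the concrete form of the additional information are hypothetical assumptions and need not match the paper's setup.

```python
# Hypothetical review summarizer that conditions on extra metadata by
# prepending it to the noisy review text before seq2seq generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CKPT = "path/to/review-summarizer"             # hypothetical checkpoint
tok = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSeq2SeqLM.from_pretrained(CKPT)

def summarize(review: str, category: str, rating: int) -> str:
    # The decoder can fall back on the metadata even when the review
    # itself contains typos or ungrammatical sentences.
    source = f"category: {category} rating: {rating} review: {review}"
    ids = tok(source, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_new_tokens=48, num_beams=4)
    return tok.decode(out[0], skip_special_tokens=True)
```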

Korean Morphological Analyzer for Neologism and Spacing Error based on Sequence-to-Sequence

Byeongseo Choe, Ig-hoon Lee, Sang-goo Lee

http://doi.org/10.5626/JOK.2020.47.1.70

In order to analyze Internet text data from Korean online communities, a morphological analyzer must perform accurately even on sentences with spacing errors and must adequately restore the original forms of out-of-vocabulary (OOV) inputs. However, existing Korean morphological analyzers often rely on dictionaries and complicated preprocessing for this restoration. In this paper, we propose a Korean morphological analyzer based on the sequence-to-sequence model that can effectively handle both the spacing problem and the OOV problem. In addition, the model uses syllable bigrams and graphemes as additional input features. The proposed model does not use a dictionary and minimizes rule-based preprocessing. In experiments on the Sejong corpus, the proposed model showed better performance than other dictionary-free morphological analyzers. Better performance was also evident on the dataset with spaces removed and on a sample dataset collected from the Internet.
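The two auxiliary input features mentioned above, syllable bigrams and graphemes, can be extracted with a few lines of standard-library Python. The exact feature format consumed by the paper's encoder is not specified in the abstract, so the sketch below is illustrative only.

```python
# Illustrative extraction of syllable bigrams and graphemes (jamo)
# from a Korean sentence; the paper's actual feature encoding may differ.
import unicodedata

def syllable_bigrams(text: str) -> list[str]:
    """Adjacent syllable pairs, ignoring whitespace."""
    syllables = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(syllables, syllables[1:])]

def graphemes(text: str) -> list[str]:
    """Constituent jamo of each syllable via canonical (NFD) decomposition."""
    # NFD splits a precomposed Hangul syllable into its initial consonant,
    # vowel, and optional final consonant (conjoining jamo).
    return [j for c in text if not c.isspace()
            for j in unicodedata.normalize("NFD", c)]

sentence = "아버지가방에들어가신다"   # classic spacing-ambiguity example
print(syllable_bigrams(sentence))     # adjacent syllable pairs
print(graphemes(sentence))            # jamo sequence for each syllable
```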

