Search : [ author: Youngjin Jang ] (6)

Denoising Method for Document Grounded Conversation Datasets via Back Translation Process

Damrin Kim, Boeun Kim, Youngjin Jang, Harksoo Kim

http://doi.org/10.5626/JOK.2024.51.1.34

Document Grounded Conversation is a conversation between two or more speakers based on a given document. A document-based dialogue system generates a response to the last utterance of a dialogue, and various English document-based dialogue datasets have been released and actively studied. In Korean, however, little research has been conducted because no document-based conversation dataset has been available. KoDoc2Dial, a Korean translation of the English document-based conversation dataset Doc2Dial, was recently released, but it contains noise introduced during translation. This noise should be reduced because noisy datasets can negatively affect training and system consistency. In this paper, we propose a method for reducing the noise in KoDoc2Dial by filtering with a back translation process. Experiments showed that the proposed method improved SacreBLEU by about 3.6%p compared with the unfiltered data.
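The filtering idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure or threshold the authors use, so the token-overlap F1, the `threshold` value, the field names, and the sample sentences below are all hypothetical, and the back-translated text is assumed to have been produced beforehand by some translation system.

```python
from collections import Counter

def token_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized strings (a stand-in metric)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    overlap = sum((Counter(ref) & Counter(cand)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def filter_by_back_translation(examples, threshold=0.5):
    """Keep examples whose back-translation stays close to the original English."""
    return [
        ex for ex in examples
        if token_f1(ex["source_en"], ex["back_en"]) >= threshold
    ]

# Hypothetical records: original English, its Korean translation,
# and the Korean translated back into English.
examples = [
    {"source_en": "how do I renew my license",
     "ko": "면허를 어떻게 갱신하나요",
     "back_en": "how do I renew my license"},
    {"source_en": "how do I renew my license",
     "ko": "허가는 새로운 것입니다",
     "back_en": "the permit is a new thing"},
]
kept = filter_by_back_translation(examples)
```

Here the second record is dropped because its back-translation shares no tokens with the source, which is the kind of translation noise the filtering step is meant to catch.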

Grammar Accuracy Evaluation (GAE): Quantifiable Qualitative Evaluation of Machine Translation Models

Dojun Park, Youngjin Jang, Harksoo Kim

http://doi.org/10.5626/JOK.2022.49.7.514

Natural Language Generation (NLG) refers to expressing the computation results of a system in human language. Because the quality of sentences generated by an NLG model cannot be fully captured by quantitative evaluation alone, they are also evaluated qualitatively by humans, who score the meaning or grammar of a sentence according to subjective criteria. However, existing evaluation methods suffer from large score deviations depending on each evaluator's criteria. In this paper, we propose Grammar Accuracy Evaluation (GAE), which provides specific evaluation criteria. An analysis of machine translation quality using BLEU and GAE confirmed that the BLEU score does not represent the absolute performance of machine translation models and that GAE compensates for the shortcomings of BLEU through flexible evaluation of synonyms and changes in sentence structure.

A Span Matrix-based Answer Candidates Detection Model used 2-Step Learning

Boeun Kim, Youngjin Jang, Harksoo Kim

http://doi.org/10.5626/JOK.2021.48.5.539

Automatic data construction refers to technology that automatically constructs data through algorithms or deep neural networks. The automatic construction of question-answer data, which is the focus of this paper, has mainly been studied through question generation models, which generate questions related to a given paragraph. Previously, a paragraph and answer candidates were input to the question generation model, and related questions were generated. The answer candidates input to the question generation model were detected through a rule-based method or a deep neural network. We judged that answer detection, a subtask of question generation, has a great influence on question generation. Consequently, we propose an answer candidate detection model and a 2-step learning method using a Span Matrix. An experiment was conducted to determine how questions generated from answer candidates extracted by various methods affect the question-answering system. The proposed model extracted more correct answers than the existing model, and noise in the learning process was compensated for by using a named entity dataset. It was confirmed that the question-answer data generated from the answer candidates extracted by the proposed model contributed the most to the performance of the question-answering system.

Deep Learning-based Text Classification Model for Poisonous Clauses Classification

Gihyeon Choi, Youngjin Jang, Harksoo Kim, Kwanwoo Kim

http://doi.org/10.5626/JOK.2020.47.11.1054

Most companies sign a contract before carrying out a task. However, several problems can occur if poisonous clauses are not identified before the contract is concluded. To prevent this, companies have an expert review the contract, but such a review requires much time and money. A system that identifies poisonous clauses through prior review of the contract could reduce this cost and time. Thus, this paper proposes a text classification model that takes each paragraph of a contract as input and identifies any poisonous clause. To improve classification performance, the importance of each sentence is calculated from the relationship between the sentence in the paragraph and the target class, and classification is performed with this importance reflected in each sentence. The proposed model achieved an F1 score of 84.51% in experiments using actual contract data, and the highest F1 score, 93.64%, in experiments using the WOS-5736 dataset for comparison with existing text classification models.

Passage Re-ranking Method Based on Sentence Similarity Through Multitask Learning

Youngjin Jang, Hyeon-gu Lee, Jihyun Wang, Chunghee Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2020.47.4.416

A machine reading comprehension (MRC) system is a question answering system in which a computer understands a given passage and answers questions. Recently, with the development of deep neural networks, research on MRC systems has been actively conducted, and research is under way on open domain MRC systems, which identify the correct answer from the results of an information retrieval (IR) model rather than from a given passage. However, if the IR model fails to retrieve a passage containing the correct answer, the MRC system cannot answer the question. That is, the performance of an open domain MRC system depends on the performance of the IR model. Thus, for an open domain MRC system to achieve high performance, a high-performance IR model is a prerequisite. IR models have previously been improved through query expansion and re-ranking. In this paper, we propose a re-ranking method using deep neural networks. The proposed model re-ranks the retrieval results (passages) using sentence similarity learned through multi-task learning, and in experiments on 58,980 pairs of MRC data it improves performance by approximately 8% over the existing IR model.
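The general shape of sentence-similarity re-ranking can be illustrated with a small sketch. It is not the model from the paper: the learned, multi-task sentence similarity is replaced here by a plain bag-of-words cosine, and the function names, the period-based sentence splitting, and the sample passages are assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for a learned similarity model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def rerank(question: str, passages: list) -> list:
    """Sort retrieved passages by their best question-sentence similarity."""
    def best_sentence_score(passage: str) -> float:
        # Naive sentence split on periods; real text needs a proper splitter.
        sentences = [s.strip() for s in passage.split(".") if s.strip()]
        return max((cosine(question, s) for s in sentences), default=0.0)
    return sorted(passages, key=best_sentence_score, reverse=True)

# Hypothetical retrieval results, in the order the IR model returned them.
passages = [
    "Seoul hosted the Olympics in 1988. The city is large.",
    "The capital of Korea is Seoul. It lies on the Han River.",
]
ranked = rerank("what is the capital of Korea", passages)
```

In this toy example the second passage moves to the top because one of its sentences overlaps heavily with the question, so an MRC reader that only sees the top-ranked passage would now receive the one containing the answer.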

Jamo Unit Convolutional Neural Network Based Automatic Classification of Frequently Asked Questions with Spelling Errors

Youngjin Jang, Harksoo Kim, Dongho Kang, Sebin Kim, Hyunki Jang

http://doi.org/10.5626/JOK.2019.46.6.563

Web and mobile users obtain the information they want from the frequently asked questions (FAQ) listed on a homepage. An FAQ system displays the query-response candidate most similar to the input, based on an information retrieval model. However, the information retrieval model depends on its index and is therefore vulnerable to spelling errors in the input sentence. This paper proposes a model that applies a sentence classifier to the FAQ system to minimize the effect of spelling errors. An embedding layer with a jamo-based convolutional neural network reduces the effect of spelling errors in the user input. The performance of the classifier was improved using class embedding and a feed-forward neural network. In FAQ classification with 457 and 769 classes, the model achieved Micro F1 scores of 81.32% and 61.11%, respectively. We used the sigmoid function to quantify the reliability of the model prediction.
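A jamo-level model first needs each Hangul syllable split into its component letters. The sketch below shows one common way to do this using the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3); it covers only this preprocessing step, not the convolutional network, and the function name `to_jamo` is hypothetical rather than taken from the paper.

```python
# Jamo tables in Unicode composition order: 19 initials, 21 medials,
# and 27 finals plus the "no final consonant" case.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3

def to_jamo(text: str) -> list:
    """Split precomposed Hangul syllables into jamo; pass other characters through."""
    units = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            units.append(CHOSEONG[index // (21 * 28)])    # initial consonant
            units.append(JUNGSEONG[(index % (21 * 28)) // 28])  # medial vowel
            final = JONGSEONG[index % 28]                  # optional final consonant
            if final:
                units.append(final)
        else:
            units.append(ch)
    return units

jamo = to_jamo("한글")
```

Because a misspelled syllable usually differs from the intended one in only a single jamo, most of the jamo sequence is unchanged by a typo, which is the property a jamo-based embedding can exploit.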



Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr