Search: [author: 정상근 (Sangkeun Jung)] (4 results)

PatentQ&A: Proposal of Patent Q&A Neural Search System Using Transformer Model

Yoonmin Lee, Taewook Hwang, Sangkeun Jung, Hyein Seo, Yoonhyung Roh

http://doi.org/10.5626/JOK.2023.50.4.306

Recent neural search has enabled semantic search that goes beyond search based on statistical methods, and it can find accurate results even when the query contains typos. This paper proposes a neural network-based PatentQ&A search system that returns the answer closest to the intent of the user's question when members of the general public without patent expertise search for patent information using everyday terms. A patent dataset was constructed from the patent customer consultation data posted on the Korean Intellectual Property Office website. Patent-KoBERT (Triplet) and Patent-KoBERT (CrossEntropy) were fine-tuned on this dataset and used to extract questions similar to the one entered by the user and to re-rank them. In the experiments, both Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) reached 0.96, confirming that the answers most similar to the intent of the user's input were well selected.
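The two evaluation metrics reported above can be computed as follows. This is a minimal sketch, assuming each user query comes with a ranked list of retrieved question IDs and a set of relevant IDs; the function names and toy IDs are hypothetical, and the relevance judgments and ranking cutoff used in the paper are not stated in the abstract.

```python
from typing import Sequence, Set, Tuple

# One evaluation run: (ranked list of retrieved IDs, set of relevant IDs).
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant item,
    divided by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_reciprocal_rank(runs: Sequence[Run]) -> float:
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def mean_average_precision(runs: Sequence[Run]) -> float:
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Two hypothetical queries with their retrieved question IDs.
    runs = [(["q3", "q1", "q7"], {"q1", "q7"}), (["q2", "q5"], {"q2"})]
    print(round(mean_reciprocal_rank(runs), 3))
    print(round(mean_average_precision(runs), 3))
```

For these two toy queries MRR is 0.75 (ranks 2 and 1) and MAP is about 0.792, since the first query's relevant items sit at ranks 2 and 3. MRR only rewards the first correct answer, whereas MAP also rewards placing every relevant question high.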

Epoch Score: Dataset Verification using Quantitative Data Quality Assessment

Sungryeol Kim, Taewook Hwang, Sangkeun Jung, Yoonhyung Roh

http://doi.org/10.5626/JOK.2023.50.3.250

It is difficult to determine whether a dataset is suitable for a model or a specific domain, or whether it contains errors. In this paper, we propose the Epoch Score, which expresses the difficulty of each data sample as a score, using the incorrect-answer data obtained by training several times under the same conditions but with different seeds. Using it, we verified KLUE's Topic Classification dataset and obtained a performance improvement of about 0.8% by correcting high-scoring samples that we judged to contain errors. The Epoch Score can be applied to any supervised learning task regardless of data type, such as natural language or images, and the performance of a model can be inferred from the area of the Epoch Score.
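One way to turn repeated training runs into per-sample scores is sketched below. The abstract does not give the exact formula, so this assumes a simple definition: a sample's score is the fraction of (seed, epoch) checkpoints at which the model misclassified it. The function names, the input layout, and the 0.9 flagging threshold are hypothetical.

```python
from typing import List, Sequence

# wrong[s][e][i] is True if sample i was misclassified after epoch e
# of the training run that used seed s (hypothetical layout).
WrongLog = Sequence[Sequence[Sequence[bool]]]


def epoch_scores(wrong: WrongLog) -> List[float]:
    """Per-sample fraction of (seed, epoch) checkpoints with a wrong prediction."""
    n_checkpoints = sum(len(run) for run in wrong)
    n_samples = len(wrong[0][0])
    counts = [0] * n_samples
    for run in wrong:
        for epoch in run:
            for i, is_wrong in enumerate(epoch):
                if is_wrong:
                    counts[i] += 1
    return [c / n_checkpoints for c in counts]


def flag_suspicious(scores: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Indices of samples that are almost always wrong: candidates for label review."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Two seeds x two epochs x three samples.
    log = [
        [[True, False, True], [True, False, False]],  # seed 0
        [[True, True, False], [True, False, False]],  # seed 1
    ]
    scores = epoch_scores(log)
    print(scores)
    print(flag_suspicious(scores))
```

In this toy log, sample 0 is wrong at all four checkpoints and scores 1.0, while samples 1 and 2 score 0.25; only sample 0 is flagged for manual inspection, mirroring the idea of reviewing high-scoring data for label errors.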

Semantic Similarity-based Intent Analysis using Pre-trained Transformer for Natural Language Understanding

Sangkeun Jung, Hyein Seo, Hyunji Kim, Taewook Hwang

http://doi.org/10.5626/JOK.2020.47.8.748

Natural language understanding (NLU) is a core technique for developing robots, smart messengers, and natural interfaces. In this study, we propose a novel similarity-based intent analysis method in place of the typical classification methods for intent analysis problems in NLU. To accomplish this, neural network-based text and semantic frame readers are introduced to learn semantic vectors from paired text and semantic frame instances. Text-to-vector and semantic-frame-to-vector projection methods using a pre-trained transformer are proposed. We then propose a method that assigns to a query sentence the intent tag of the nearest training sentence, found by measuring semantic vector distances in the vector space. Experiments on four natural language corpora suggest that the proposed method outperforms existing intent analysis techniques. Two of the corpora are Korean (weather and navigation), and two are English (air travel information systems and a voice platform).
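The nearest-neighbor tagging step can be sketched as follows, assuming the semantic vectors have already been produced by an encoder (the pre-trained transformer readers are not reproduced here). Cosine similarity is used as the closeness measure, which is an assumption; the toy vectors and intent tags are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_intent(query: Vector, train: List[Tuple[Vector, str]]) -> str:
    """Return the intent tag of the training sentence whose vector is closest to the query."""
    _, best_tag = max(train, key=lambda pair: cosine(query, pair[0]))
    return best_tag


if __name__ == "__main__":
    # Hypothetical encoded training sentences with their intent tags.
    train = [
        ([1.0, 0.0, 0.0], "weather_query"),
        ([0.0, 1.0, 0.0], "navigation_route"),
        ([0.0, 0.0, 1.0], "music_play"),
    ]
    print(nearest_intent([0.9, 0.2, 0.0], train))
    print(nearest_intent([0.1, 0.8, 0.3], train))
```

The two queries are tagged `weather_query` and `navigation_route`. Because the tag comes from the closest stored example rather than a fixed classifier head, new labeled sentences can be added to `train` without retraining an output layer.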

Effective Korean Token Units for Sequence Encoding in Deep Learning

Sangkeun Jung

http://doi.org/10.5626/JOK.2018.45.5.457

Deep learning has emerged as a new area of machine learning research and has been successfully applied to natural language processing tasks such as machine translation and sentence classification. In this work, we investigate effective Korean input token units for encoding Korean sentences in classification problems such as topic detection. Recurrent and convolutional neural networks for Korean sentence encoding are briefly reviewed, and various Korean input token units are explored, including character, morpheme-tag, morpheme, word, subword, syllable window, and hybrids of morpheme and character. Extensive experiments on sentiment analysis, topic detection, and intention understanding tasks are conducted to find effective input token units.
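Three of the token units listed above can be illustrated with plain string operations. This is a sketch under stated assumptions: "character" is taken to mean each non-space character (one Hangul syllable block), "word" means a whitespace-delimited eojeol, and the syllable-window definition (sliding n-character windows inside each word) is hypothetical. Morpheme, morpheme-tag, and subword units would additionally need a morphological analyzer or a subword tokenizer, which are not shown.

```python
from typing import List


def char_tokens(sentence: str) -> List[str]:
    """Character units: every non-space character (one Hangul syllable block each)."""
    return [ch for ch in sentence if not ch.isspace()]


def word_tokens(sentence: str) -> List[str]:
    """Word units: whitespace-delimited eojeol."""
    return sentence.split()


def syllable_windows(sentence: str, n: int = 2) -> List[str]:
    """Hypothetical syllable-window units: sliding windows of n characters
    inside each word; words of length <= n are emitted whole."""
    windows: List[str] = []
    for word in sentence.split():
        if len(word) <= n:
            windows.append(word)
        else:
            windows.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return windows


if __name__ == "__main__":
    s = "서울 날씨 알려줘"  # "Tell me the weather in Seoul"
    print(char_tokens(s))
    print(word_tokens(s))
    print(syllable_windows(s))
```

For this sentence, character units give seven tokens, word units give three (`서울`, `날씨`, `알려줘`), and 2-syllable windows give `서울`, `날씨`, `알려`, `려줘`. Windows trade a larger vocabulary than characters for some robustness to the many inflected forms a single Korean word can take.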


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr