Digital Library: Search Results
Korean Coreference Resolution through BERT Embedding at the Morpheme Level
Kyeongbin Jo, Yohan Choi, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.6.495
Coreference resolution is a natural language processing task that identifies the mentions in a given document that are candidates for coreference, then finds and groups the mentions that refer to the same entity. Korean coreference resolution has mainly been studied with end-to-end methods, which must consider all spans as potential mentions, so memory usage and time complexity increase. In this paper, a word-level coreference resolution model, which performs coreference resolution after mapping sub-tokens back to word units, was applied to Korean; the token representations of the word-level model are computed with CorefBERT to reflect the characteristics of Korean, and named-entity and dependency-parsing features were then added. In experiments on the ETRI Q&A-domain evaluation set, the model achieved an F1 of 70.68%, a 1.67% improvement over the existing end-to-end coreference resolution model, while memory usage improved by 2.4 times and speed by 1.82 times.
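The mapping of sub-tokens back to word units is the core mechanism of the word-level model described above. The sketch below shows one common way to realize it, mean pooling of sub-token vectors into word vectors; it assumes a generic encoder output (e.g., from CorefBERT) rather than the authors' actual pipeline, and the function name and toy inputs are illustrative only.

```python
import torch

def subtokens_to_words(subtoken_embeddings, word_ids):
    """Pool sub-token embeddings back to word-level vectors by averaging.

    subtoken_embeddings: (num_subtokens, hidden) tensor from an encoder.
    word_ids:            list mapping each sub-token to its word index,
                         as produced by a typical subword tokenizer.
    Returns a (num_words, hidden) tensor of word representations.
    """
    num_words = max(word_ids) + 1
    hidden = subtoken_embeddings.size(-1)
    sums = torch.zeros(num_words, hidden)
    counts = torch.zeros(num_words, 1)
    for i, w in enumerate(word_ids):
        sums[w] += subtoken_embeddings[i]
        counts[w] += 1
    return sums / counts.clamp(min=1)

# Toy example: 5 sub-tokens belonging to 3 words.
subtok = torch.randn(5, 8)
word_ids = [0, 0, 1, 2, 2]
word_repr = subtokens_to_words(subtok, word_ids)
print(word_repr.shape)  # torch.Size([3, 8])
```

Mean pooling is only one choice; taking the first sub-token of each word is another common option, and the paper may use a different scheme. The key effect is the same: coreference scoring then operates over far fewer word-level candidates than span-level candidates.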
Korean End-to-End Coreference Resolution with BERT for Long Document
Kyeongbin Jo, Youngjun Jung, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.1.32
Coreference resolution is a natural language processing task that identifies the mentions that are coreference targets, finds the mentions that refer to the same entity, and groups them together. Recently, end-to-end models that use BERT to derive contextual word representations while simultaneously performing mention detection and coreference resolution have been the main line of study. However, BERT's input-length limit degrades its performance on long documents. Therefore, this paper proposes the following model. First, a long document is split into segments of 512 or fewer tokens, and each segment is encoded with an existing Local BERT to obtain a first contextual representation of each word. The segments are then recombined, and a global positional embedding value defined over the original document is computed and added. Finally, coreference resolution is performed by computing the full-document contextual representation with a Global BERT layer. In experiments, the proposed model showed performance similar to the existing model, while GPU memory usage decreased by 1.4 times and speed improved by 2.1 times.
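The split/encode/recombine scheme described above can be sketched as follows. This is a minimal illustration under assumed shapes and layer choices: a single Transformer encoder layer stands in for Local BERT and for the Global BERT layer, and all names and dimensions are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

MAX_SEGMENT_LEN = 512  # BERT-style input-length limit

class LongDocEncoder(nn.Module):
    """Sketch of local/global encoding for long documents."""

    def __init__(self, vocab_size=30000, hidden=256, max_doc_len=4096):
        super().__init__()
        self.local = nn.Sequential(          # placeholder for Local BERT
            nn.Embedding(vocab_size, hidden),
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
        )
        self.global_pos = nn.Embedding(max_doc_len, hidden)
        self.global_layer = nn.TransformerEncoderLayer(  # placeholder for the Global BERT layer
            hidden, nhead=4, batch_first=True)

    def forward(self, token_ids):            # token_ids: (doc_len,)
        # 1) Split the document into segments of at most 512 tokens.
        segments = token_ids.split(MAX_SEGMENT_LEN)
        # 2) Encode each segment locally, then recombine in document order.
        local_out = torch.cat(
            [self.local(seg.unsqueeze(0)) for seg in segments], dim=1)
        # 3) Add global positional embeddings over the original document.
        positions = torch.arange(token_ids.size(0)).unsqueeze(0)
        hidden = local_out + self.global_pos(positions)
        # 4) Compute the full-document representation with the global layer.
        return self.global_layer(hidden)

doc = torch.randint(0, 30000, (1200,))       # a 1,200-token document
out = LongDocEncoder()(doc)
print(out.shape)                              # torch.Size([1, 1200, 256])
```

In this arrangement only the global layer attends across segment boundaries, which is what keeps GPU memory below that of encoding the full document with a single BERT pass.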
Journal of KIISE
- ISSN: 2383-630X (Print)
- ISSN: 2383-6296 (Electronic)
- KCI Accredited Journal