Digital Library [Search Result]
Korean Coreference Resolution through BERT Embedding at the Morpheme Level
Kyeongbin Jo, Yohan Choi, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.6.495
Coreference resolution is a natural language processing task that identifies the candidate mentions in a given document and groups together the mentions that refer to the same entity. Korean coreference resolution has mainly been studied with end-to-end models, which must consider every span as a potential mention, so memory usage and time complexity grow quickly. In this paper, a word-level coreference resolution model, which maps sub-tokens back to word units before resolving coreference, was applied to Korean; the model's token representations are computed with CorefBERT to reflect the characteristics of Korean, and named-entity and dependency-parsing features were then added. In experiments on the ETRI Q&A domain evaluation set, the model achieved an F1 of 70.68%, a 1.67% improvement over the existing end-to-end coreference resolution model, while memory usage improved by a factor of 2.4 and speed by a factor of 1.82.
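A minimal sketch of the word-level idea described above (not the authors' code; all names are illustrative): sub-token vectors from a BERT-style encoder are pooled back to one vector per word, so mentions are scored over words instead of over all sub-token spans.

    import torch

    def pool_to_words(hidden, word_ids, num_words):
        """Average the sub-token vectors that belong to the same word.
        hidden: (T, D) sub-token representations; word_ids: word index per sub-token."""
        out = torch.zeros(num_words, hidden.size(1))
        counts = torch.zeros(num_words, 1)
        for t, w in enumerate(word_ids):
            out[w] += hidden[t]
            counts[w] += 1
        return out / counts.clamp(min=1)

    # Toy usage: 5 sub-tokens spread over 3 words.
    h = torch.randn(5, 8)
    print(pool_to_words(h, [0, 0, 1, 2, 2], num_words=3).shape)  # torch.Size([3, 8])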
Korean End-to-End Coreference Resolution with BERT for Long Document
Kyeongbin Jo, Youngjun Jung, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.1.32
Coreference resolution is a natural language processing task that identifies the mentions in a document and groups together the mentions that refer to the same entity. Recently, coreference resolution has mainly been studied with end-to-end models that use BERT to derive contextual word representations while performing mention detection and coreference resolution simultaneously. However, BERT's input-length limit degrades its performance on long documents. This paper therefore proposes the following model: a long document is split into segments of 512 or fewer tokens, each segment is encoded with an existing (local) BERT to obtain initial contextual word representations, the segments are recombined, a global positional embedding for each token's position in the original document is added, and a Global BERT layer then computes representations over the entire context, on which coreference resolution is performed. In experiments, the proposed model showed performance similar to the existing model while reducing GPU memory usage by a factor of 1.4 and improving speed by a factor of 2.1.
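A minimal sketch of the local-then-global architecture described above, under assumed dimensions (the class and parameter names are hypothetical): segments encoded independently by a local BERT are concatenated, given document-level positional embeddings, and mixed by one "global" transformer layer.

    import torch
    import torch.nn as nn

    class GlobalOverLocal(nn.Module):
        """Recombine per-segment encoder outputs and mix them globally."""
        def __init__(self, dim=768, max_doc_len=4096):
            super().__init__()
            self.global_pos = nn.Embedding(max_doc_len, dim)   # position in the full document
            self.global_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

        def forward(self, segment_states):
            # segment_states: list of (seg_len, dim) outputs from a local encoder
            h = torch.cat(segment_states, dim=0)               # (doc_len, dim)
            h = h + self.global_pos(torch.arange(h.size(0)))   # add document-level positions
            return self.global_layer(h.unsqueeze(0)).squeeze(0)

    # Toy usage with random stand-ins for two local-BERT segment outputs.
    doc = GlobalOverLocal()([torch.randn(512, 768), torch.randn(300, 768)])
    print(doc.shape)  # torch.Size([812, 768])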
Long-distant Coreference Resolution by Clustering-extended BERT for Korean and English Document
Cheolhun Heo, Kuntae Kim, Key-sun Choi
http://doi.org/10.5626/JOK.2020.47.12.1126
Coreference resolution is a natural language processing task of identifying all mentions that refer to the same entity in a given document. It improves the performance of various natural language processing tasks by resolving co-referents that arise from linguistically replaceable realizations, such as pronouns, demonstratives, and abbreviations, while preventing the co-referencing of homonyms (same form but different meaning). We propose a novel approach to coreference resolution, particularly for identifying long-distance co-referents, by applying long-distance clustering of surface forms on top of a BERT-based model that performs well in English. We compare the proposed model with other models on Korean and English datasets. The results demonstrate that our model grasps contextual elements better than the other models.
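The abstract does not spell out the clustering step, so the following is only a toy illustration of one plausible reading: mentions with the same normalized surface form are grouped into a cluster no matter how far apart they occur in the document.

    from collections import defaultdict

    def surface_form_clusters(mentions):
        """Group mention positions by normalized surface form.
        mentions: list of (position, surface_form) pairs."""
        groups = defaultdict(list)
        for pos, form in mentions:
            groups[form.lower()].append(pos)
        return [sorted(v) for v in groups.values() if len(v) > 1]

    mentions = [(3, "IBM"), (120, "the company"), (845, "IBM"), (910, "it")]
    print(surface_form_clusters(mentions))  # [[3, 845]] -- a long-distance link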
Korean End-to-end Neural Coreference Resolution with BERT
Kihun Kim, Cheonum Park, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2020.47.10.942
Coreference resolution is a natural language processing task that identifies the mentions in a given document and finds and clusters the mentions of the same entity. For Korean coreference resolution, two approaches have been used: an end-to-end model that performs mention detection and mention clustering simultaneously, and a pointer network based on an encoder-decoder model. The BERT model released by Google has been applied to many natural language processing tasks and has produced large performance improvements. In this paper, we propose a Korean end-to-end neural coreference resolution model with BERT. The model uses KorBERT, pre-trained on Korean data, and applies dependency parsing results and named-entity recognition features to reflect the structural and semantic characteristics of Korean. Experimental results on the ETRI Q&A domain dataset show CoNLL F1 scores of 71.00% (DEV) and 69.01% (TEST), higher than previous studies.
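A minimal sketch of the pair-scoring step in such an end-to-end model (illustrative names and sizes, not the authors' code): two span vectors, their element-wise product, and embeddings of extra features such as named-entity type and dependency label are concatenated and scored by a small feed-forward network.

    import torch
    import torch.nn as nn

    class PairScorer(nn.Module):
        """Score a candidate (mention, antecedent) pair."""
        def __init__(self, span_dim=768, feat_dim=40):
            super().__init__()
            self.ffnn = nn.Sequential(
                nn.Linear(3 * span_dim + feat_dim, 150), nn.ReLU(), nn.Linear(150, 1))

        def forward(self, g_i, g_j, feats):
            # g_i, g_j: (span_dim,) span vectors; feats: (feat_dim,) feature embeddings
            return self.ffnn(torch.cat([g_i, g_j, g_i * g_j, feats]))

    score = PairScorer()(torch.randn(768), torch.randn(768), torch.randn(40))
    print(score.item())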
Coreference Resolution using Multi-resolution Pointer Networks
Cheoneum Park, Changki Lee, Hyunki Kim
http://doi.org/10.5626/JOK.2019.46.4.334
A multi-resolution RNN models parallel sequences with RNNs. Coreference resolution is a natural language processing task in which the mentions in a document that refer to the same entity are grouped into a cluster, and it can be solved with a pointer network. When a pointer network is used, the encoder input sequence consists of all morphemes of the document and the decoder input sequence consists of all nouns in the document. In this paper, we propose three multi-resolution pointer network models that encode the document's morpheme sequence and noun list in parallel and decode using both sets of encoded hidden states, and we solve coreference resolution with the proposed models. Experimental results show CoNLL F1 scores of 71.44% for Multi-resolution1, 70.52% for Multi-resolution2, and 70.59% for Multi-resolution3.
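A minimal sketch of the parallel-encoding idea (illustrative code, not the paper's): the morpheme sequence and the noun list are encoded by separate RNNs, and a decoder would then attend over both sets of hidden states.

    import torch
    import torch.nn as nn

    class MultiResolutionEncoder(nn.Module):
        """Encode the morpheme sequence and the noun sequence in parallel."""
        def __init__(self, dim=128):
            super().__init__()
            self.morph_rnn = nn.GRU(dim, dim, batch_first=True)
            self.noun_rnn = nn.GRU(dim, dim, batch_first=True)

        def forward(self, morph_emb, noun_emb):
            h_morph, _ = self.morph_rnn(morph_emb)  # (1, n_morphemes, dim)
            h_noun, _ = self.noun_rnn(noun_emb)     # (1, n_nouns, dim)
            return h_morph, h_noun                  # both available to the decoder

    hm, hn = MultiResolutionEncoder()(torch.randn(1, 50, 128), torch.randn(1, 12, 128))
    print(hm.shape, hn.shape)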
Mention Detection with Pointer Networks
http://doi.org/10.5626/JOK.2017.44.8.774
Mention detection takes a noun or noun phrase as a head and constructs the text chunk, including any modifiers, that carries its meaning; the term refers to extracting such mentions from a document. Coreference resolution then determines which of the extracted mentions refer to the same entity. A pointer network is a model based on a recurrent neural network (RNN) encoder-decoder that outputs a list of elements of the input sequence. In this paper, we propose mention detection using pointer networks. Applying a pointer network to mention detection solves the problem of overlapping mentions, which sequence labeling cannot handle. In experiments, the proposed mention detection model achieved an F1 of 80.07%, 7.65%p higher than rule-based mention detection; coreference resolution using this mention detection model achieved CoNLL F1 scores of 52.67% (mention boundary) and 60.11% (head boundary), 7.68%p and 1.5%p higher, respectively, than coreference resolution using rule-based mention detection.
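A minimal sketch of the pointer step itself (using plain dot-product attention for brevity; names are illustrative): the decoder state is scored against every encoder state, and the argmax of the softmaxed scores "points at" one input position, e.g. a mention boundary.

    import torch

    def pointer_distribution(decoder_state, encoder_states):
        """Distribution over input positions; its argmax is the pointed-at token."""
        scores = encoder_states @ decoder_state  # (T,) dot-product attention scores
        return torch.softmax(scores, dim=0)

    p = pointer_distribution(torch.randn(64), torch.randn(10, 64))  # 10 input tokens
    print(int(p.argmax()))  # index of the predicted position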
Coreference Resolution for Korean Pronouns using Pointer Networks
A pointer network is a deep-learning model based on a recurrent neural network (RNN) that uses an attention mechanism to output a list of elements of the input sequence. Coreference resolution for pronouns is the natural language processing (NLP) task of finding, for each pronoun in a document, the antecedents that belong to the same entity. In this paper, we propose a pronoun coreference resolution method that finds the relation between antecedents and pronouns using pointer networks, together with input methods for the pointer network, namely the chaining order between the words in an entity. Among the proposed methods, the chaining order Coref2 performed best, with a MUC F1 of 81.40%, which is 31.00%p and 19.28%p better than the rule-based (50.40%) and statistics-based (62.12%) coreference resolution systems for Korean pronouns, respectively.
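The abstract does not define the individual chaining orders, so the following is only a hypothetical illustration of chaining as such: each mention in an entity is linked to its predecessor, and reordering the mention list would give a different chaining variant.

    def chain_pairs(entity_mentions):
        """Link each mention in an entity to its predecessor (one possible
        chaining; not necessarily the paper's Coref2 scheme)."""
        return [(entity_mentions[i], entity_mentions[i - 1])
                for i in range(1, len(entity_mentions))]

    # '그' (he) points back to '그 교수' (the professor), which points back to '김 교수'.
    print(chain_pairs(["김 교수", "그 교수", "그"]))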
Korean Coreference Resolution using the Multi-pass Sieve
Cheon-Eum Park, Kyoung-Ho Choi, Changki Lee
Coreference resolution finds all expressions that refer to the same entity in a document and is important for information extraction, document classification, document summarization, and question answering. In this paper, we adapt Stanford's multi-pass sieve system, one of the best rule-based coreference resolution models, to Korean. All noun phrases are considered as mentions. Unlike Stanford's system, the dependency parse tree is used for mention extraction, and a Korean acronym list is built 'dynamically'. In addition, we propose a method that computes weights by applying the transitive property of centers from centering theory when resolving Korean pronouns. Experiments show that our system obtains MUC 59.0%, B3 59.5%, CEAFe 63.5%, and CoNLL (mean) 60.7%.
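A minimal sketch of the multi-pass sieve control flow (illustrative, not Stanford's or the authors' code): sieves are ordered from high to low precision, and each one merges the clusters produced by the previous passes.

    def multi_pass_sieve(mentions, sieves):
        """Start from singleton clusters and let each sieve merge them."""
        clusters = [[m] for m in mentions]
        for sieve in sieves:            # ordered high- to low-precision
            clusters = sieve(clusters)
        return clusters

    def exact_match_sieve(clusters):
        """Example sieve: merge clusters whose representative mentions match exactly."""
        merged = {}
        for c in clusters:
            merged.setdefault(c[0].lower(), []).extend(c)
        return list(merged.values())

    print(multi_pass_sieve(["Obama", "obama", "he"], [exact_match_sieve]))
    # [['Obama', 'obama'], ['he']]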