Digital Library [Search Result]
A Comparative Study on the Performance of Named Entity Recognition in Materials and Chemistry Fields through Multiple Embedding Combination Based on a Pre-trained Neural Network Language Model
Myunghoon Lee, Hyeonho Shin, Hong-Woo Chun, Jae-Min Lee, Taehyun Ha, Sung-Pil Choi
http://doi.org/10.5626/JOK.2021.48.6.696
Recently, with the rapid development of the materials and chemistry fields, the volume of academic literature has increased exponentially. Accordingly, studies are being conducted to extract meaningful information from the accumulated data, and Named Entity Recognition (NER) is one of the methodologies used. NER in the materials and chemistry fields is the task of extracting standardized entities such as materials, material properties, and experimental conditions from academic literature and classifying the types of those entities. In this paper, we studied NER in the materials and chemistry fields using embedding combinations and a bidirectional LSTM-CRF model built on existing published language models, without pre-training a neural network language model ourselves. As a result, we found the best-performing embedding combinations and analyzed their performance. Additionally, a pre-trained language model was fine-tuned as an NER model for performance comparison. The process showed that using a public pre-trained language model for embedding combinations can yield meaningful results for NER in the materials and chemistry fields.
CRF based Named Entity Recognition Using a Korean Lexical Semantic Network
http://doi.org/10.5626/JOK.2021.48.5.556
Named Entity Recognition (NER) is the process of classifying words with unique meanings, which often appear as OOV within a sentence, into categories of predefined entities. Recently, much research has used deep learning to compose word embeddings via Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, or trained language models. However, models using these deep networks or language models require high-performance computing power and have low practicality due to slow speed. For practicality, this paper proposes a Conditional Random Field (CRF) based NER model using a Korean lexical semantic network (UWordMap). Using hypernym, dependency, and case-particle information as training features, our model achieved 90.54% accuracy at a processing speed of 1,461 sentences/sec.
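A CRF tagger of this kind typically consumes a feature dictionary per token. The sketch below is illustrative only: the feature names and the toy lexical-network lookup are stand-ins, not the paper's actual UWordMap features.

```python
# Hedged sketch of per-token CRF features; the hypernym lookup is a toy
# stand-in for a lexical semantic network such as UWordMap.
def token_features(tokens, i, hypernyms):
    word = tokens[i]
    return {
        "word": word,
        "hypernym": hypernyms.get(word, "<none>"),          # lexical-network feature
        "prev_word": tokens[i - 1] if i > 0 else "<bos>",   # left context
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<eos>",  # right context
    }

hypernyms = {"서울": "지명"}  # toy entry: "Seoul" -> "place name"
feats = token_features(["서울", "에", "갔다"], 0, hypernyms)
print(feats["hypernym"])  # 지명
```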
Automatic Data Augmentation for Named Entity Recognition using a Text Infilling technique and Generative Adversarial Network
Cheon-Young Park, Kong Joo Lee
http://doi.org/10.5626/JOK.2021.48.4.462
Deep neural networks have been widely used in many NLP applications. However, successful construction of deep networks requires a large training corpus, and collecting a large training corpus that contains label information such as named entities is difficult, leading to a lack of data. Automatic data augmentation is a solution to this data scarcity problem. In this paper, we propose an automatic data augmentation technique for named entity recognition (NER) based on a text infilling model and generative adversarial networks. A text infilling model fills in the missing components of a template to generate complete sentences; using it, we can fill in the blanks of a template to generate complete, semantically coherent text with accurate named entity labels. Sentences generated by our model show lower perplexity and higher diversity than those generated by previous approaches, and text augmentation based on our model improves the performance of a conventional NER system.
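The core idea of template-based infilling augmentation can be sketched very roughly: labeled entities stay fixed in the template, and only the blanked-out context is generated. In this toy sketch a random choice stands in for the trained infilling model, and the template and candidate lists are made up.

```python
import random

# Toy sketch: entities (with their labels) are fixed; blanks ("__") are
# filled by a stand-in for the text infilling model.
random.seed(0)
template = ["<PER:John>", "__", "<LOC:Seoul>", "__"]
context_candidates = {1: ["visited", "toured"],
                      3: ["yesterday", "today"]}

filled = [random.choice(context_candidates[i]) if tok == "__" else tok
          for i, tok in enumerate(template)]
print(" ".join(filled))
```

Because the entity tokens are never rewritten, every generated sentence keeps accurate entity labels by construction, which is the property the abstract emphasizes.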
A BIT Named Entity Format Suitable for Low Resource Environments
Ho Yoon, Chang-Hyun Kim, Min-ah Cheon, Ho-min Park, Young Namgoong, Min-seok Choi, Jae-kyun Kim, Jae-Hoon Kim
http://doi.org/10.5626/JOK.2021.48.3.293
Named entity recognition (NER) seeks to locate and classify named entities into predefined categories such as person names, organizations, locations, and others. Most named entities consist of more than one word, so most annotated corpora for NER are encoded in the BIO (short for Beginning, Inside, and Outside) format: a “B-” prefix indicates that a tag begins a named entity, an “I-” prefix indicates that a tag is inside a named entity, and an “O” tag indicates that a word belongs to no named entity. In this format, words with “O” tags amount to roughly 90% of the corpus and thus cause two problems: the high perplexity of “O”-tagged words and imbalanced learning. In this paper, we propose a novel format for representing NER corpora, called the BIT format, which uses “T” (short for POS Tags) tags in place of “O” tags. Experiments have shown that the BIT format outperforms the BIO format when the meaning projection of the word representation is unreliable, namely when the word embedding is trained on a relatively small number of words.
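The BIO-to-BIT conversion described above amounts to replacing each “O” tag with the token's POS tag. A minimal sketch, with made-up example tokens and a generic POS tag set:

```python
# Sketch of the BIO -> BIT conversion: "O" tags become the token's POS tag
# (the "T" tags); "B-"/"I-" tags are kept as-is.
def bio_to_bit(bio_tags, pos_tags):
    return [pos if bio == "O" else bio
            for bio, pos in zip(bio_tags, pos_tags)]

tokens = ["Barack", "Obama", "visited", "Seoul", "."]
bio    = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pos    = ["NNP", "NNP", "VBD", "NNP", "."]

print(bio_to_bit(bio, pos))
# -> ['B-PER', 'I-PER', 'VBD', 'B-LOC', '.']
```

The entity spans are untouched, but the dominant “O” class is split into many POS classes, which is how the format addresses the imbalance noted above.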
Joint Model of Morphological Analysis and Named Entity Recognition Using Shared Layer
Hongjin Kim, Seongsik Park, Harksoo Kim
http://doi.org/10.5626/JOK.2021.48.2.167
Named entity recognition is a natural language processing task that finds words with unique meanings, such as person names, place names, organization names, dates, and times, in sentences and attaches category labels to them. Korean morphological analysis is generally divided into morpheme segmentation and part-of-speech tagging. In general, named entity recognition and morphological analysis have been studied independently. However, in this architecture, errors in morphological analysis propagate to named entity recognition. To alleviate this error propagation problem, we propose an integrated model using a Label Attention Network (LAN). Experimental results show that our model outperforms single models of named entity recognition and morphological analysis, as well as previous integrated models.
Biomedical Named Entity Recognition using Multi-head Attention with Highway Network
Minsoo Cho, Jinuk Park, Jihwan Ha, Chanhee Park, Sanghyun Park
http://doi.org/10.5626/JOK.2019.46.6.544
Biomedical named entity recognition (BioNER) is the process of extracting biomedical entities such as diseases, genes, proteins, and chemicals from biomedical literature, and is an indispensable technique for extracting meaningful data from biomedical domains. The proposed model employs a deep learning based Bi-LSTM-CRF architecture, which eliminates the need for hand-crafted feature engineering. Additionally, the model uses multi-head attention to capture the relevance between words, which is used when predicting the label of each input token. In the input embedding layer, the model integrates character-level embedding with word-level embedding and passes the combined embedding through a highway network, which adaptively carries each embedding to the input of the Bi-LSTM model. Two English biomedical benchmark datasets were used to evaluate performance. The proposed model achieved a higher F1-score than previously studied models, demonstrating the effectiveness of the proposed methods for biomedical named entity recognition.
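A highway layer of the kind mentioned above gates between a transformed input and the input itself, so the network can learn how much of each embedding to carry through unchanged. A minimal NumPy sketch, with illustrative dimensions (the paper's actual layer sizes and framework are not specified here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway(x, W_h, b_h, W_t, b_t):
    h = np.tanh(x @ W_h + b_h)        # transformed representation H(x)
    t = sigmoid(x @ W_t + b_t)        # transform gate T(x)
    return t * h + (1.0 - t) * x      # carry (1 - t) of the raw input through

rng = np.random.default_rng(0)
d = 8
char_emb = rng.normal(size=(1, 4))    # character-level embedding (toy dim)
word_emb = rng.normal(size=(1, 4))    # word-level embedding (toy dim)
x = np.concatenate([char_emb, word_emb], axis=1)  # combined embedding, dim 8

W_h, b_h = rng.normal(size=(d, d)), np.zeros(d)
W_t, b_t = rng.normal(size=(d, d)), np.zeros(d)
y = highway(x, W_h, b_h, W_t, b_t)
print(y.shape)  # (1, 8)
```

When the gate saturates toward zero, the layer passes the combined embedding through unchanged; when it saturates toward one, only the transformed representation is used.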
A Named-Entity Recognition Training Method Using Bagging-Based Bootstrapping
Yujin Jeong, Juae Kim, Youngjoong Ko, Jungyun Seo
http://doi.org/10.5626/JOK.2018.45.8.825
Most previous named-entity (NE) recognition studies have been based on supervised learning methods. Although supervised learning-based NE recognition performs well, it requires much time and cost to construct a large labeled corpus. In this paper, we propose an NE recognition training method that uses an automatically generated labeled corpus to solve this problem. Since the proposed method uses a large machine-labeled corpus, it can greatly reduce the time and cost of generating a labeled corpus manually. In addition, a bagging-based bootstrapping technique is applied to correct errors in the machine-labeled data. Experimental results show that the proposed method achieves its highest F1-score of 70.76% with the bagging-based bootstrapping technique, which is 5.17%p higher than that of the baseline system.
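One common way to use bagging to correct machine-labeled data is to have an ensemble of models, each trained on a bootstrap sample, re-label the data and keep only labels the ensemble agrees on. The sketch below shows just that filtering step; the tag names, threshold, and voting rule are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

# Hedged sketch: k bagged models each re-predict a machine-assigned label;
# we keep the example only if the majority vote confirms the machine label.
def majority_vote(predictions):
    tag, count = Counter(predictions).most_common(1)[0]
    return tag, count / len(predictions)

machine_label = "B-PER"
ensemble_preds = ["B-PER", "B-PER", "O", "B-PER", "B-PER"]  # 5 bagged models

vote, agreement = majority_vote(ensemble_preds)
keep = (vote == machine_label and agreement >= 0.6)  # illustrative threshold
print(vote, agreement, keep)  # B-PER 0.8 True
```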
Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs
Named entity recognition (NER) seeks to locate and classify named entities in text into predefined categories such as names of persons, organizations, locations, expressions of time, etc. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM-CRFs. Deep learning models based on long short-term memory (LSTM) generally depend on word representations as input. In this paper, we propose an approach that expands the word representation by combining pre-trained word embeddings, part-of-speech (POS) tag embeddings, syllable embeddings, and named-entity dictionary feature vectors. Our experiments show that the proposed approach creates useful word representations as input to bidirectional LSTM-CRFs, performing 8.05%p better than baseline NERs that use only the pre-trained word embedding vector.
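Mechanically, this kind of expansion is a per-token concatenation of the feature vectors. A minimal sketch; the dimensions below are made up for illustration and are not the paper's actual sizes:

```python
import numpy as np

# Hypothetical per-token feature vectors (dimensions are illustrative).
word_emb = np.random.rand(100)          # pre-trained word embedding
pos_emb  = np.random.rand(10)           # POS tag embedding
syl_emb  = np.random.rand(30)           # syllable embedding
ne_dict  = np.array([0., 1., 0., 0.])   # NE-dictionary feature (toy one-hot)

# Expanded word representation fed to the bidirectional LSTM-CRF.
token_repr = np.concatenate([word_emb, pos_emb, syl_emb, ne_dict])
print(token_repr.shape)  # (144,)
```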
Korean Named Entity Recognition and Classification using Word Embedding Features
Named Entity Recognition and Classification (NERC) is the task of recognizing and classifying named entities such as person names, locations, and organizations. Various studies have been carried out on Korean NERC, but they lack some of the features used in English NERC. In this paper, we propose a method that uses word embeddings as features for Korean NERC. We generate word vectors using a Continuous Bag-of-Words (CBOW) model from a POS-tagged corpus, and word cluster symbols using a K-means algorithm over the word vectors. We use the word vectors and word cluster symbols as word embedding features in Conditional Random Fields (CRFs). Experimental results show performance improvements of 1.17%, 0.61%, and 1.19% over the baseline system for the TV, Sports, and IT domains, respectively. Showing better performance than other NERC systems, we demonstrate the effectiveness and efficiency of the proposed method.
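The cluster-symbol feature can be sketched end to end: cluster the word vectors with K-means and use each word's cluster ID as a categorical feature. The words, vectors, and the tiny K-means below are toy stand-ins (in the paper the vectors come from a CBOW model trained on a POS-tagged corpus).

```python
import numpy as np

# Toy 2-d "word vectors"; in practice these would come from a CBOW model.
vecs = {
    "seoul":   np.array([0.9, 0.1]),
    "busan":   np.array([0.8, 0.2]),
    "samsung": np.array([0.1, 0.9]),
    "lg":      np.array([0.2, 0.8]),
}

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's K-means; returns a cluster label per point."""
    rng = np.random.default_rng(seed)
    cent = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - cent[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                cent[j] = points[labels == j].mean(axis=0)
    return labels

words = list(vecs)
labels = kmeans(np.stack([vecs[w] for w in words]), k=2)

# Cluster symbols usable as categorical CRF features.
cluster_symbol = {w: f"C{l}" for w, l in zip(words, labels)}
print(cluster_symbol)  # seoul/busan share one symbol; samsung/lg the other
```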
Named Entity Recognition Using Distant Supervision and Active Bagging
Seong-hee Lee, Yeong-kil Song, Hark-soo Kim
Named entity recognition is a process that extracts named entities from sentences and determines their categories. Previous studies on named entity recognition have primarily been based on supervised learning, which needs a large training corpus manually annotated with named entity categories; manually constructing such a corpus is time-consuming and labor-intensive. We propose a semi-supervised learning method to minimize the cost of training corpus construction and to rapidly enhance the performance of named entity recognition. The proposed method uses distant supervision to construct the initial training corpus. It then effectively removes noisy sentences from the initial training corpus through active bagging, an ensemble method combining bagging and active learning. In experiments, the proposed method improved the F1-score of named entity recognition from 67.36% to 76.42% after 15 iterations of active bagging.
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr