Digital Library: Search Results
A Deep Learning-based Two-Steps Pipeline Model for Korean Morphological Analysis and Part-of-Speech Tagging
http://doi.org/10.5626/JOK.2021.48.4.444
Recent neural approaches to Korean morphological analysis typically perform morpheme segmentation and part-of-speech tagging first, and then restore the original forms of morphemes with a dictionary as a postprocessing step. In this study, we divide morphological analysis into two steps: the original form of each morpheme is restored first with a sequence-to-sequence model, and then morpheme segmentation and part-of-speech tagging are performed with BERT. Pipelining these two steps achieves performance comparable to other approaches, without a morpheme restoration dictionary that requires hand-crafted rules or compound-tag processing.
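The two-step pipeline can be sketched as two composed stages. The lookup tables and function names below are hypothetical stand-ins for illustration; the paper uses a trained sequence-to-sequence model for step 1 and BERT for step 2.

```python
# Hypothetical sketch of the two-step pipeline: restore first, then tag.

def restore_morphemes(eojeol: str) -> str:
    """Step 1: restore contracted surface forms to original morpheme forms
    (a seq2seq model in the paper; a toy lookup here)."""
    restorations = {"갔다": "가았다"}  # e.g. 갔다 ("went") -> 가 + 았 + 다
    return restorations.get(eojeol, eojeol)

def tag_morphemes(restored: str) -> list[tuple[str, str]]:
    """Step 2: segment the restored sequence and assign POS tags
    (BERT in the paper; a toy lookup here)."""
    toy_analysis = {"가았다": [("가", "VV"), ("았", "EP"), ("다", "EF")]}
    return toy_analysis.get(restored, [(restored, "UNK")])

def analyze(eojeol: str) -> list[tuple[str, str]]:
    # The pipeline order is the point: restoration happens before tagging,
    # so no restoration dictionary is needed after tagging.
    return tag_morphemes(restore_morphemes(eojeol))
```

Because restoration precedes tagging, the tagger only ever sees canonical morpheme forms, which is what removes the need for a rule-based restoration dictionary in postprocessing.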
Long-distant Coreference Resolution by Clustering-extended BERT for Korean and English Document
Cheolhun Heo, Kuntae Kim, Key-sun Choi
http://doi.org/10.5626/JOK.2020.47.12.1126
Coreference resolution is the natural language processing task of identifying all mentions that refer to the same entity in a given document. It improves the performance of various downstream tasks by linking linguistically interchangeable realizations, such as pronouns, demonstratives, and abbreviations, while keeping homonyms (same form but different meaning) apart. We propose a novel approach to coreference resolution that identifies long-distance co-referents by applying long-distance clustering of surface forms on top of a BERT-based model that performs well in English. We compare the proposed model with other models on Korean and English datasets. The results demonstrate that our model grasps contextual elements better than the other models.
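The surface-form clustering idea can be illustrated in isolation. The sketch below is a hypothetical simplification (exact-match grouping of normalized mention strings); the paper combines such clustering with a BERT-based mention scorer rather than using string matching alone.

```python
# Hypothetical sketch: cluster mentions by normalized surface form,
# regardless of how far apart they occur in the document.
from collections import defaultdict

def cluster_by_surface_form(mentions):
    """mentions: list of ((start, end), surface_text) pairs.
    Returns clusters linking two or more mentions, keyed by normalized form."""
    clusters = defaultdict(list)
    for span, text in mentions:
        clusters[text.lower().strip()].append(span)
    # Singletons are not coreference links; keep only multi-mention clusters.
    return {form: spans for form, spans in clusters.items() if len(spans) > 1}

# Spans (0,2) and (40,42) are far apart but share a surface form.
mentions = [((0, 2), "KIISE"), ((5, 6), "it"), ((40, 42), "KIISE")]
```

String identity alone would wrongly merge homonyms, which is why the contextual BERT representations remain necessary in the full model.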
A Small-Scale Korean-Specific BERT Language Model
Sangah Lee, Hansol Jang, Yunmee Baik, Suzi Park, Hyopil Shin
http://doi.org/10.5626/JOK.2020.47.7.682
Recent sentence-embedding models rely on huge corpora and parameter counts; pre-training them requires massive data, large hardware, and extensive time. This tendency raises the need for a model with comparable performance that uses training data economically. In this study, we propose a Korean-specific model, KR-BERT, using sub-character-level to character-level Korean dictionaries and a BidirectionalWordPiece Tokenizer. Our KR-BERT model performs comparably to, and sometimes better than, existing pre-trained models while using one-tenth the training data. This demonstrates that in a morphologically complex, low-resource language, sub-character-level representations and the BidirectionalWordPiece Tokenizer capture language-specific linguistic phenomena that the multilingual BERT model misses.
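The "sub-character level" for Korean refers to decomposing Hangul syllables into their constituent jamo. The sketch below shows the standard Unicode arithmetic for that decomposition; it is an illustration of the granularity involved, not the paper's actual tokenizer.

```python
# Toy decomposition of precomposed Hangul syllables (U+AC00..U+D7A3) into
# sub-character units (jamo), using the standard Unicode arithmetic.

CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")        # 19 initial consonants
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")     # 21 vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + none

def decompose(syllable: str) -> list[str]:
    """Split one precomposed Hangul syllable into its jamo."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code <= 11171:
        return [syllable]  # not a precomposed syllable: pass through
    cho, jung, jong = code // 588, (code % 588) // 28, code % 28
    return [CHO[cho], JUNG[jung]] + ([JONG[jong]] if jong else [])
```

Operating below the character level gives the vocabulary far more unit reuse across Korean's rich inflection, which is one reason a much smaller training corpus can suffice.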
A Query Result Integrity Assurance Scheme Using an Order-preserving Encryption Scheme in the Database Outsourcing Environment
Recently, research on database encryption for data protection and on query-result authentication has been performed actively in the database outsourcing environment. Existing database encryption schemes are vulnerable to order-matching and counting attacks by intruders who have background knowledge of the original database domain, and existing query-result integrity auditing methods suffer from the transmission overhead of verification objects. To resolve these problems, we propose a group-order-preserving encryption index and a query-result authentication method based on it. Our group-order-preserving encryption index groups the original data for encryption and supports query processing without decryption. We generate group ids with a Hilbert curve so that the group information remains protected during query processing. Finally, our periodic-function-based data grouping and query-result authentication scheme reduces the size of the query-result verification data. Performance evaluation shows that our method outperforms an existing bucket-based verification scheme: it is 1.6 times faster in query processing and produces verification data that is 20 times smaller.
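The core of a group-order-preserving index can be sketched as bucketing values so that range queries are answered over group ids alone. The bucket width, placeholder ciphertexts, and function names below are assumptions for illustration; the paper additionally hides group ids via a Hilbert curve and adds periodic-function-based result verification.

```python
# Hypothetical sketch of a group-order-preserving index for outsourced data.

GROUP_WIDTH = 10  # assumed bucket width for illustration

def group_id(value: int) -> int:
    """Map a plaintext value to its group. Group ids preserve the order of
    groups, so range predicates can be evaluated on ids alone."""
    return value // GROUP_WIDTH

def range_query(index, low, high):
    """Server-side filtering without decryption: return encrypted rows whose
    group overlaps the query range [low, high]."""
    glo, ghi = group_id(low), group_id(high)
    return [row for gid, row in index if glo <= gid <= ghi]

# Outsourced index: (group_id, ciphertext) pairs; "enc(v)" is a placeholder.
index = [(group_id(v), f"enc({v})") for v in [3, 17, 25, 42, 58]]
```

Boundary groups may contribute false positives, which the client filters after decryption; only group-level order leaks to the server, which is what blunts order-matching and counting attacks compared with value-level order-preserving encryption.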

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr