Unified Methodology of Multiple POS Taggers for Large-scale Korean Linguistic GS Set Construction
Tae-Young Kim, Pum-Mo Ryu, Hansaem Kim, Hyo-Jung Oh
http://doi.org/10.5626/JOK.2020.47.6.596
In recent years, there has been national support for constructing, sharing, and distributing a large-scale Korean linguistic GS set for Korean information processing. As part of this corpus construction project, this study proposes a methodology for constructing the Korean linguistic GS set using multiple Korean language analysis modules developed in Korea. To build a large-scale training set, we drew on automatically tagged candidate answers produced by N modules. We then minimized manual effort by classifying the error types found in the candidate answers and semi-automatically correcting the major ones. We normalized the morphological analysis results and constructed the GS set in a unified format, U-POS. The resulting Korean linguistic GS set comprises 348,229 sentences totaling 9,455,930 words. It can serve as a basic training resource for Korean information processing.
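The workflow the abstract describes (normalize each module's output to a shared tagset, then send only disagreements to manual or semi-automatic correction) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the module names, the tag mappings, the `UNK` fallback, and the majority-vote rule are hypothetical and are not the paper's U-POS inventory or its actual procedure.

```python
from collections import Counter

# Hypothetical mappings from each module's native tags to a shared tagset.
# Module names and tags are illustrative only, not the paper's U-POS scheme.
TAG_MAPS = {
    "tagger_a": {"NNG": "NOUN", "VV": "VERB", "JKS": "PARTICLE"},
    "tagger_b": {"N": "NOUN", "V": "VERB", "J": "PARTICLE"},
    "tagger_c": {"noun": "NOUN", "verb": "VERB", "josa": "PARTICLE"},
}


def normalize(module, analysis):
    """Map one module's (morpheme, tag) pairs onto the shared tagset."""
    mapping = TAG_MAPS[module]
    # Tags missing from the mapping become "UNK" so they surface as errors.
    return tuple((morph, mapping.get(tag, "UNK")) for morph, tag in analysis)


def merge_candidates(candidates):
    """Return (most common analysis, needs_review) for one word.

    candidates: dict of module name -> list of (morpheme, tag) pairs.
    The word is flagged for review unless every module agrees
    after normalization.
    """
    normalized = [normalize(m, a) for m, a in candidates.items()]
    best, votes = Counter(normalized).most_common(1)[0]
    return list(best), votes < len(normalized)


# All three hypothetical modules agree once their tags are normalized.
agree = {
    "tagger_a": [("학교", "NNG")],
    "tagger_b": [("학교", "N")],
    "tagger_c": [("학교", "noun")],
}
# One module disagrees, so the word would go to the correction step.
disagree = dict(agree, tagger_c=[("학교", "verb")])

agree_result = merge_candidates(agree)
disagree_result = merge_candidates(disagree)
```

In this sketch only the flagged words need human attention, which is one plausible way the manual effort mentioned in the abstract could be reduced.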

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal