Search : [ author: Minho Kim ] (6)

Gender Classification Model Based on Colloquial Text in Korean for Author Profiling of Messenger Data

Jihye Kang, Minho Kim, Hyuk-Chul Kwon

http://doi.org/10.5626/JOK.2023.50.12.1063

With the explosive growth of social network services (SNS), a large amount of text data has been generated through messenger services. In addition, recent advances in natural language processing have led to the development of various applications such as sentiment analysis, abusive text detection, and chatbots. However, there has been no attempt to classify author characteristics such as the gender and age of speakers in Korean colloquial texts. In this study, we propose a gender classification model for author profiling using Korean colloquial texts. To classify speaker gender from Kakao Talk data, domain adaptation is performed by further training KcBERT (Korean Comments BERT), a model pre-trained on Korean comments, on ‘Nate Pan’ data. Experiments with a model that additionally incorporates external lexical information showed improved performance, reaching an accuracy of approximately 95%. The self-collected ‘Nate Pan’ data and the "daily conversation" data provided by the National Institute of the Korean Language were used for domain adaptation, and the ‘Korean SNS’ data from AI HUB was used for model training and evaluation.
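The abstract does not specify how external lexical information is combined with KcBERT. One common pattern is to concatenate hand-built lexicon features to the encoder's sentence vector before a linear classification layer; the sketch below assumes that design. The `LEXICON` entries, category names, weights, and function names are hypothetical, and in practice the sentence vector would come from the fine-tuned encoder rather than being typed in by hand.

```python
import math

# Hypothetical lexicon for illustration only (not the paper's resource):
# maps colloquial tokens to lexical categories.
LEXICON = {"ㅋㅋ": "laugh", "ㅎㅎ": "laugh", "헐": "interjection", "대박": "interjection"}
CATEGORIES = sorted(set(LEXICON.values()))  # fixed feature order


def lexical_features(tokens):
    """Relative frequency of each lexicon category among the message tokens."""
    counts = {c: 0 for c in CATEGORIES}
    for tok in tokens:
        if tok in LEXICON:
            counts[LEXICON[tok]] += 1
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[c] / n for c in CATEGORIES]


def combined_features(sentence_vector, tokens):
    """Concatenate an encoder sentence vector (e.g. a [CLS] vector) with lexical features."""
    return list(sentence_vector) + lexical_features(tokens)


def predict_proba(features, weights, bias):
    """Numerically stable sigmoid of a linear layer over the combined features."""
    if len(features) != len(weights):
        raise ValueError("feature and weight dimensions differ")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


tokens = ["헐", "대박", "ㅋㅋ", "진짜"]
print(combined_features([0.1, -0.2], tokens))  # [0.1, -0.2, 0.5, 0.25]
```

Keeping the lexical features as a separate, interpretable slice of the input makes it easy to compare the classifier with and without them.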

Rules-based Korean Dependency Parsing Using Sentence Pattern Information

Sung-Tae Kim, Minho Kim, Hyuna Kim, Hyuk-Chul Kwon

http://doi.org/10.5626/JOK.2020.47.5.488

The parser proposed in this paper is a wide-range dependency parser that establishes dependency relations for all possible candidates appearing in a sentence. It outputs parse trees for all candidates in sentences where ambiguity can occur and uses rules to improve their ranking. It uses an agenda mechanism to form governor-dependent relations with a graph-based method and creates candidate trees from the input sentence through a four-stage analysis process. Additionally, to make proper use of a sentence pattern information corpus, we implemented rules and algorithms that overcome the limitations of previous studies and improved the ranking of candidate parse trees using sentence pattern information. The ranking of [noun - determiner] relations, which are difficult to rank, was also strengthened using sentence pattern information about qualities. As a result, the UAS (unlabeled attachment score) of the top-ranked parse tree improved by 0.74 percentage points, and the average rank of the correct candidate tree improved by 28.1%. The highest performance achieved was a UAS of 94.02%.
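The results above are reported as UAS and as the average rank of the correct tree among the candidates. As a minimal sketch of how these two measures are usually computed (not the authors' evaluation code), assume each parse is a list of head indices, one per eojeol, using 1-based positions with 0 marking the root. The function names are hypothetical.

```python
def uas(gold_sents, pred_sents):
    """Unlabeled attachment score over a corpus: the share of tokens whose
    predicted head equals the gold head. Each sentence is a list of head
    indices (1-based positions, 0 = root)."""
    if len(gold_sents) != len(pred_sents):
        raise ValueError("corpus sizes differ")
    correct = total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        if len(gold) != len(pred):
            raise ValueError("sentence lengths differ")
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return correct / total if total else 0.0


def rank_of_gold(candidates, gold):
    """1-based rank of the first candidate tree identical to the gold tree,
    or None if the gold tree is not among the candidates."""
    for rank, cand in enumerate(candidates, start=1):
        if cand == gold:
            return rank
    return None


# Korean is head-final, so most heads point to a later eojeol.
gold = [[2, 3, 0]]
top1 = [[3, 3, 0]]
print(uas(gold, top1))                                # 0.6666666666666666
print(rank_of_gold([[3, 3, 0], [2, 3, 0]], gold[0]))  # 2
```

Averaging `rank_of_gold` over a test set gives the "average rank of the correct candidate tree"; a better ranker lowers that number.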

Comparison of Context-Sensitive Spelling Error Correction using Embedding Techniques

Jung-Hun Lee, Minho Kim, Hyuk-Chul Kwon

http://doi.org/10.5626/JOK.2020.47.2.147

This paper focuses on the use of embedding techniques to solve problems in context-sensitive spelling correction and compares the performance of each technique. Word vectors obtained through embedding training are used to calculate the distance between the correction target word and the surrounding context words. We aimed to improve correction performance in two ways: by handling words not included in the training corpus, and by reflecting the surrounding contextual information of the words being corrected. The embedding techniques used for correction were divided into word-based embeddings and embeddings that reflect contextual information. We performed correction experiments with these embedding techniques, focusing on the two improvement goals above, and obtained reliable correction performance.
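A minimal sketch of the distance-based decision described above, assuming each correction target comes with a fixed set of replacement candidates and that word vectors are already trained. The toy 2-dimensional vectors and English words are invented for illustration. Skipping context words that have no vector is only a crude stand-in for the out-of-vocabulary handling the paper addresses, and `best_candidate` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def best_candidate(candidates, context, emb):
    """Pick the candidate with the highest mean cosine similarity to the
    context words. Context words missing from `emb` are skipped."""
    ctx = [emb[w] for w in context if w in emb]

    def score(word):
        if word not in emb or not ctx:
            return float("-inf")
        return sum(cosine(emb[word], v) for v in ctx) / len(ctx)

    return max(candidates, key=score)


# Toy embeddings (invented): "peace" sits near "war", "piece" near food words.
EMB = {
    "peace": [1.0, 0.0], "piece": [0.0, 1.0], "war": [0.9, 0.1],
    "cake": [0.1, 0.9], "slice": [0.2, 0.8],
}
print(best_candidate(["peace", "piece"], ["a", "slice", "of", "cake"], EMB))  # piece
```

A context-aware embedding would replace the fixed `EMB` lookup with vectors computed from the whole sentence, while the comparison step stays the same.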

Algorithm for Detecting Double-Spending in Blockchain

Minho Kim, Sujin Kim, Hoon Choi

http://doi.org/10.5626/JOK.2018.45.8.848

The blockchain is a key technology of Bitcoin, which is widely used as an electronic cash system. In Bitcoin, a given unit of digital currency is valid for only one transaction. Making two or more transactions with the same digital currency is called double-spending, a type of illegal transaction. When the blockchain forks, the blockchain specification treats the longer chain as valid. However, a chain containing double-spending may become longer than a chain containing normal transactions, so comparing chain lengths cannot completely prevent illegal transactions. In this paper, we propose an algorithm that detects double-spending and a mechanism that notifies other nodes after detection. The algorithm was implemented and verified using Bitcoin Core.
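The core check behind double-spending detection can be sketched by recording which previous output (outpoint) each transaction input spends and flagging any outpoint spent by two different transactions. The sketch assumes a simplified transaction model (a hypothetical `Tx` with a `txid` and a tuple of `(prev_txid, output_index)` inputs) and scans both branches of a fork together. It does not use Bitcoin Core's data structures, is not the paper's exact algorithm, and leaves out the notification of other nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Simplified transaction: an id plus the outpoints it spends."""
    txid: str
    inputs: tuple  # ((prev_txid, output_index), ...)


def find_double_spends(transactions):
    """Return (outpoint, first_txid, second_txid) for every outpoint that is
    spent by more than one distinct transaction."""
    spent_by = {}
    conflicts = []
    for tx in transactions:
        for outpoint in tx.inputs:
            first = spent_by.get(outpoint)
            if first is None:
                spent_by[outpoint] = tx.txid
            elif first != tx.txid:  # the same tx on both branches is not a conflict
                conflicts.append((outpoint, first, tx.txid))
    return conflicts


def check_fork(branch_a, branch_b):
    """Scan the transactions of two competing branches (lists of blocks,
    each block a list of Tx) for conflicting spends."""
    txs = [tx for block in branch_a + branch_b for tx in block]
    return find_double_spends(txs)


pay_merchant = Tx("a1", (("coin0", 0),))
pay_self = Tx("b1", (("coin0", 0),))
print(check_fork([[pay_merchant]], [[pay_self]]))
# [(('coin0', 0), 'a1', 'b1')]
```

Because the check looks at spent outpoints rather than chain length, it flags the conflict regardless of which branch ends up longer.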

Statistical Ranking Recommendation System of Hangul-to-Roman Conversion for Korean Names

Jung-Hun Lee, Minho Kim, Hyuk-Chul Kwon

http://doi.org/10.5626/JOK.2017.44.12.1269

This paper focuses on the Hangul-to-Roman conversion of Korean names. The proposed method recognizes existing notations and provides results ranked by frequency of use. There are two main reasons for the diversity of romanized Korean names. The first is the indiscriminate use of varied notations at home and abroad. The second is the continued use of customary notations that differ from the current standard notation. For these reasons, a single Korean name can be written in many different romanized forms. The system builds a statistical dictionary from the data of 4 million people and uses it for conversion. In the first step, the input is identified as a person's name by matching the surname. In the second step, the given name is looked up and converted using the statistical dictionary. In the last step, the syllables of the name are compared and converted, and the results are ranked by frequency of use. We compared the system's performance with existing conversion services on the web; the proposed system performed somewhat better than the other systems.
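The three steps above can be sketched as follows, assuming the statistical dictionary is stored as frequency tables and that a candidate's score is the product of relative frequencies. The tables, counts, and function names are hypothetical stand-ins, not the paper's 4-million-person dictionary. For simplicity the sketch handles only one-syllable surnames and uses the syllable fallback only when the whole given name is missing.

```python
# Hypothetical frequency tables; the counts are invented for illustration.
SURNAMES = {"김": {"Kim": 950, "Gim": 30}, "이": {"Lee": 800, "Yi": 120, "Rhee": 40}}
GIVEN_NAMES = {"민호": {"Minho": 70, "Min-ho": 25}}
SYLLABLES = {"민": {"Min": 90}, "호": {"Ho": 80, "Hoh": 5}, "지": {"Ji": 85, "Jee": 10}}


def to_probs(freqs):
    """Turn raw usage counts into relative frequencies."""
    total = sum(freqs.values())
    return {form: count / total for form, count in freqs.items()}


def given_name_probs(given):
    # Step 2: use the whole given name if the dictionary has it.
    if given in GIVEN_NAMES:
        return to_probs(GIVEN_NAMES[given])
    # Step 3: otherwise compose the name syllable by syllable.
    probs = {"": 1.0}
    for syl in given:
        if syl not in SYLLABLES:
            return {}
        combined = {}
        for prefix, p in probs.items():
            for form, q in to_probs(SYLLABLES[syl]).items():
                key = prefix + form.lower() if prefix else form
                combined[key] = combined.get(key, 0.0) + p * q
        probs = combined
    return probs


def romanize_name(name, top_k=3):
    """Rank romanizations of a Hangul name by estimated frequency of use."""
    # Step 1: the name must start with a known (one-syllable) surname.
    surname, given = name[:1], name[1:]
    if surname not in SURNAMES or not given:
        return []
    scored = [
        (p * q, f"{s_form} {g_form}")
        for s_form, p in to_probs(SURNAMES[surname]).items()
        for g_form, q in given_name_probs(given).items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, text in scored[:top_k]]


print(romanize_name("김민호"))  # ['Kim Minho', 'Kim Min-ho', 'Gim Minho']
print(romanize_name("이지호"))  # ['Lee Jiho', 'Yi Jiho', 'Lee Jeeho']
```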

Context-sensitive Spelling Error Correction using Eojeol N-gram

Minho Kim, Hyuk-Chul Kwon, Sungki Choi

http://doi.org/

Context-sensitive spelling-error correction methods are broadly classified into rule-based methods and statistical methods, the latter being more often preferred in research. Statistical error correction methods treat context-sensitive spelling errors as a word-sense disambiguation problem: given a correction word pair consisting of a correction target word and a replacement candidate word, they choose between the two according to the context. This paper proposes integrating an eojeol (word-phrase) n-gram model into a conventional model, namely the probability model based on correction word pairs developed in this research team's previous study, in order to improve its performance. Two integrated models are proposed: one interpolates the sentence probabilities calculated by each model, and the other applies the two models sequentially. Both integrated models show relatively high precision and recall compared with the conventional models or with a model that uses only the n-gram.
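The two integration strategies can be sketched as follows. Assumptions: the eojeol model is an add-one-smoothed bigram model over whitespace-separated eojeols; the pair-based model from the earlier study is represented by an arbitrary probability function `p_pair`; the interpolation weight `lam` is a free parameter; and "sequential" is read as falling back to the n-gram model only when the first model cannot decide. The toy corpus (a 로서/로써 confusion) and all function names are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count eojeol unigrams and bigrams (eojeols are whitespace-separated)."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi


def bigram_prob(sentence, uni, bi):
    """Add-one smoothed eojeol bigram probability of a whole sentence."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    v = len(uni)
    p = 1.0
    for a, b in zip(toks, toks[1:]):
        p *= (bi[(a, b)] + 1) / (uni[a] + v)
    return p


def pick_interpolated(original, candidate, p_pair, p_ngram, lam=0.5):
    """Keep the sentence with the higher interpolated probability."""
    def score(s):
        return lam * p_pair(s) + (1 - lam) * p_ngram(s)
    return candidate if score(candidate) > score(original) else original


def pick_sequential(original, candidate, p_pair, p_ngram):
    """Decide with the pair model first; use the n-gram model only on a tie."""
    a, b = p_pair(original), p_pair(candidate)
    if a != b:
        return candidate if b > a else original
    return candidate if p_ngram(candidate) > p_ngram(original) else original


uni, bi = train_bigram(["그는 학생으로서 최선을 다했다", "학생으로서 책임을 진다"])


def p_ngram(s):
    return bigram_prob(s, uni, bi)


def p_pair(s):
    return 0.0  # placeholder: the pair model has no evidence in this toy case


wrong = "그는 학생으로써 최선을 다했다"
fixed = "그는 학생으로서 최선을 다했다"
print(pick_sequential(wrong, fixed, p_pair, p_ngram))  # 그는 학생으로서 최선을 다했다
```

For long sentences, summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow.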


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr