Search : [ author: 안재현 ] (2)

Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure

Jaehyun An, Youngjoong Ko

http://doi.org/10.5626/JOK.2019.46.5.440

In this paper, we proposed a semi-automatic expansion method to expand a chatting corpus using a large amount of utterance data from movie subtitles and drama scripts. To expand the chatting corpus, the proposed system used previously constructed chatting corpus and a similarity measure. If the similarity is calculated between a previously constructed chatting corpus and the input utterance was greater than a threshold value set in the experiment, the input utterance was selected as a new chatting utterance, that it is a correct chatting pair. We used morpheme-unit word embeddings and a Convolutional Neural Networks to efficiently calculate the similarity of the utterance embedding. In order to improve the speed of the semi-automatic expansion process, we proposed to reduce the amount of computation by clustering chat corpus by K-means clustering algorithm. Experimental results showed that the precision, recall, and F1 score of the proposed system were 61.28%, 53.19%, and 56.94%, respectively, which was 5.16%p, 6.09%, and 5.73%p higher than that of the baseline system. The term frequency and the speed of our system were also about a hundred times faster.

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia

Hokyung Lee, Jaehyun An, Jeongmin Yoon, Kyoungman Bae, Youngjoong Ko

http://doi.org/10.5626/JOK.2017.44.8.813

Entity Linking find the meaning of an entity mention, which indicate the entity using different expressions, in a user’s query by linking the entity mention and the entity in the knowledge base. This task has four challenges, including the difficult knowledge base construction problem, multiple presentation of the entity mention, ambiguity of entity linking, and NIL entity recognition. In this paper, we first construct the entity name dictionary based on Wikipedia to build a knowledge base and solve the multiple presentation problem. We then propose various methods for NIL entity recognition and solve the ambiguity of entity linking by training the support vector machine based on several features, including the similarity of the context, semantic relevance, clue word score, named entity type similarity of the mansion, entity name matching score, and object popularity score. We sequentially use the proposed two methods based on the constructed knowledge base, to obtain the good performance in the entity linking. In the result of the experiment, our system achieved 83.66% and 90.81% F1 score, which is the performance of the NIL entity recognition to solve the ambiguity of the entity linking.


Search




Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr