Search: [ keyword: Wikipedia ] (4 results)

Information Collection of COVID-19 Pandemic Using Wikipedia Template Network

Danu Kim, Damin Lee, Jaehyeon Myung, Changwook Jung, Inho Hong, Diego Sáez-Trumper, Jinhyuk Yun, Woo-Sung Jung, Meeyoung Cha

http://doi.org/10.5626/JOK.2022.49.5.347

Access to accurate information is essential to reduce the social damage caused by the Coronavirus Disease 2019 (COVID-19) pandemic. Information about ongoing events such as COVID-19 is quickly updated on Wikipedia, an accessible internet encyclopedia that users can edit themselves. However, existing Wikipedia information retrieval methods are limited in their ability to collect information that includes the relationships between documents. Wikipedia's template format reflects the structure of information, since templates are selectively applied as links to highly relevant documents. This study collected COVID-19 information from Wikipedia in 10 languages using templates and reorganized it into networks. Across the 10 networks, comprising 130,662 nodes and 202,258 edges, languages with many active users produced template networks of greater size and depth, and documents highly related to COVID-19 were found within a 3-hop connection structure. This research proposes a new information retrieval method applicable to multiple languages and contributes to the construction of document lists related to specific topics.
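
Editor's note: below is a minimal sketch, not the authors' actual pipeline, of how a template-based collection step like the one described could look. It queries the MediaWiki API for articles that transclude a template and for templates used by an article, and grows a network by breadth-first search. The seed template, language edition, hop limit, and the "Template:" prefix check are illustrative assumptions; pagination and rate limiting are ignored for brevity.

```python
# Sketch: build a Wikipedia template network (articles and templates as nodes,
# edges for transclusion), up to a fixed number of hops from a seed template.
import requests
import networkx as nx

API = "https://en.wikipedia.org/w/api.php"  # assumed language edition

def pages_embedding(template_title):
    """Articles (namespace 0) that transclude the given template (first batch only)."""
    params = {"action": "query", "format": "json", "list": "embeddedin",
              "eititle": template_title, "einamespace": 0, "eilimit": "max"}
    data = requests.get(API, params=params).json()
    return [p["title"] for p in data["query"]["embeddedin"]]

def templates_of(page_title):
    """Templates (namespace 10) transcluded by the given article (first batch only)."""
    params = {"action": "query", "format": "json", "prop": "templates",
              "titles": page_title, "tlnamespace": 10, "tllimit": "max"}
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return [t["title"] for t in page.get("templates", [])]

def build_template_network(seed_template, max_hops=3):
    """BFS alternating template -> articles -> templates, up to max_hops."""
    graph = nx.Graph()
    frontier, seen = [seed_template], {seed_template}
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            # "Template:" prefix check assumes the English edition's namespace name.
            neighbors = (pages_embedding(node) if node.startswith("Template:")
                         else templates_of(node))
            for nb in neighbors:
                graph.add_edge(node, nb)
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return graph

if __name__ == "__main__":
    g = build_template_network("Template:COVID-19 pandemic", max_hops=2)
    print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```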

A Method to Solve Entity Linking Ambiguity and NIL Entity Recognition for Efficient Entity Linking Based on Wikipedia

Hokyung Lee, Jaehyun An, Jeongmin Yoon, Kyoungman Bae, Youngjoong Ko

http://doi.org/10.5626/JOK.2017.44.8.813

Entity linking finds the meaning of an entity mention, which refers to an entity using different expressions, in a user's query by linking the mention to an entity in the knowledge base. This task presents four challenges: the difficulty of knowledge base construction, multiple representations of an entity mention, ambiguity in entity linking, and NIL entity recognition. In this paper, we first construct an entity name dictionary based on Wikipedia to build a knowledge base and to solve the multiple-representation problem. We then propose various methods for NIL entity recognition and resolve the ambiguity of entity linking by training a support vector machine on several features, including context similarity, semantic relevance, clue word score, named entity type similarity of the mention, entity name matching score, and entity popularity score. We apply the two proposed methods sequentially over the constructed knowledge base to obtain good entity linking performance. In the experiments, our system achieved F1 scores of 83.66% and 90.81% for NIL entity recognition and for resolving the ambiguity of entity linking.
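
Editor's note: the sketch below illustrates only the SVM-based disambiguation step described above, assuming each (mention, candidate) pair has already been scored on the six listed features; the feature extraction, knowledge base, and the paper's actual NIL methods are not reproduced. The dummy feature values, the score threshold for NIL, and the candidate titles are assumptions.

```python
# Sketch: rank knowledge-base candidates for a mention with a linear SVM over
# the six features named in the abstract; fall back to NIL if no candidate
# scores above a threshold (one simple illustration, not the paper's method).
import numpy as np
from sklearn.svm import SVC

FEATURES = ["context_similarity", "semantic_relevance", "clue_word_score",
            "ne_type_similarity", "name_match_score", "popularity"]

# Training rows: one per (mention, candidate) pair; label 1 = correct entity.
X_train = np.array([
    [0.82, 0.74, 0.60, 1.0, 0.90, 0.55],   # correct candidate
    [0.31, 0.22, 0.10, 0.0, 0.85, 0.70],   # wrong candidate
    [0.78, 0.69, 0.40, 1.0, 1.00, 0.20],
    [0.15, 0.30, 0.05, 0.0, 0.60, 0.95],
])
y_train = np.array([1, 0, 1, 0])

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

def link(candidates, nil_threshold=0.0):
    """Return the best candidate's title, or None (NIL) if every SVM decision
    value falls below the threshold."""
    scores = clf.decision_function(np.array([c["features"] for c in candidates]))
    best = int(np.argmax(scores))
    return None if scores[best] < nil_threshold else candidates[best]["title"]

# Example: two hypothetical candidates for a single mention.
candidates = [
    {"title": "Seoul",        "features": [0.80, 0.70, 0.50, 1.0, 0.95, 0.60]},
    {"title": "Seoul_(band)", "features": [0.20, 0.25, 0.05, 0.0, 0.95, 0.10]},
]
print(link(candidates))
```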

A Semi-automatic Construction Method of a Named Entity Dictionary Based on Wikipedia

Yeongkil Song, Seokwon Jeong, Harksoo Kim

http://doi.org/

A named entity (NE) dictionary is an important resource for NE recognition performance. However, constructing an NE dictionary manually is not easy, since human annotation is time-consuming and labor-intensive. To save construction time and reduce human labor, we propose a semi-automatic system for constructing an NE dictionary. The proposed system constructs a pseudo-document from Wiki-categories for each NE class by using an active learning technique. It then calculates similarities between Wiki entries and the pseudo-documents using BM25, a well-known information retrieval model. Finally, it classifies each Wiki entry into an NE class based on these similarities. In experiments with three different types of NE class sets, the proposed system showed high performance (macro-average F1-score of 0.9028 and micro-average F1-score of 0.9554).
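
Editor's note: the following is a small sketch of the classification step only: score a Wiki entry against per-class pseudo-documents with BM25 and assign the highest-scoring class. The toy pseudo-documents, tokenization, and BM25 parameters (k1, b) are assumptions, and the active-learning step that builds the pseudo-documents is omitted.

```python
# Sketch: BM25 similarity between a Wiki entry and per-NE-class pseudo-documents,
# followed by argmax classification.
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """BM25 score of a query against one document; IDF is taken over corpus."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[term]
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc_tokens) / avgdl))
    return score

# Toy pseudo-documents: bags of words collected per NE class.
pseudo_docs = {
    "PERSON":       "singer actor politician born president professor".split(),
    "LOCATION":     "city province river mountain country capital".split(),
    "ORGANIZATION": "company university party founded institute agency".split(),
}
corpus = list(pseudo_docs.values())

def classify(wiki_entry_text):
    """Assign the NE class whose pseudo-document is most similar by BM25."""
    tokens = wiki_entry_text.lower().split()
    scores = {cls: bm25_score(tokens, doc, corpus) for cls, doc in pseudo_docs.items()}
    return max(scores, key=scores.get)

print(classify("A river and the capital city of the province"))  # -> LOCATION
```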

Building a Korean-English Parallel Corpus by Measuring Sentence Similarities Using Sequential Matching of Language Resources and Topic Modeling

JuRyong Cheon, YoungJoong Ko

http://doi.org/

In this paper, we propose a method to build a Korean-English parallel corpus from Wikipedia by finding similar sentences based on language resources and topic modeling. We first apply language resources (a Wiki-dictionary, numbers, and the Daum online dictionary) to match words sequentially. The Wiki-dictionary is constructed from titles in Wikipedia, and to take advantage of Wikipedia we use translation probabilities from the Wiki-dictionary for word matching. In addition, we improve the accuracy of the sentence similarity measure by using word distributions obtained from topic modeling. In the experiments, a previous study achieved an F1-score of 48.4% using only language resources with a linear combination, and 51.6% when additionally applying topic modeling over entire word distributions. Our sequential matching method, which adds translation probabilities to the language resources, achieved 58.3%, a 9.9 percentage point improvement over the previous study. When the sequential matching of language resources was combined with topic modeling over important word distributions, the proposed system achieved 59.1%, a 7.5 percentage point improvement.
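
Editor's note: below is a rough sketch of the combined similarity idea: Korean words are matched to English words through a Wiki-style dictionary with translation probabilities, and that match score is linearly combined with a cosine similarity over topic distributions. The dictionary entries, probabilities, topic vectors, and the weighting alpha are all illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: sentence similarity = alpha * dictionary-based word matching
#                             + (1 - alpha) * topic-distribution cosine similarity.
import math

# Toy Wiki-dictionary: Korean word -> {English translation: translation probability}
wiki_dict = {
    "서울":     {"seoul": 0.9},
    "대한민국": {"korea": 0.7, "republic": 0.2},
    "수도":     {"capital": 0.8},
}

def dictionary_match(ko_tokens, en_tokens):
    """Average best translation probability of Korean words found in the English
    sentence (a stand-in for the sequential matching step)."""
    if not ko_tokens:
        return 0.0
    total = 0.0
    for w in ko_tokens:
        cands = wiki_dict.get(w, {})
        total += max((p for t, p in cands.items() if t in en_tokens), default=0.0)
    return total / len(ko_tokens)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_similarity(ko_tokens, en_tokens, ko_topics, en_topics, alpha=0.6):
    """Linear combination of dictionary matching and topic similarity;
    alpha is an assumed weighting."""
    return alpha * dictionary_match(ko_tokens, en_tokens) + \
           (1 - alpha) * cosine(ko_topics, en_topics)

# Example: topic vectors would come from a topic model trained on the comparable
# Wikipedia articles; the values here are made up.
ko = ["서울", "대한민국", "수도"]
en = ["seoul", "is", "the", "capital", "of", "korea"]
print(sentence_similarity(ko, en, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```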

Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr