
Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Wonsuk Yang, Hancheol Park, Jong C. Park

http://doi.org/

We present a system that automatically annotates unverified Web sentences with information from credible sources. The system applies neural theorem proving to the task of annotating cancer-related Wikipedia data (1,486 propositions) with Korean National Cancer Center data (19,304 propositions). By replacing the recursive module of a neural theorem prover with a word embedding module, we overcome the fundamental problem of excessive learning time. In an identical environment, the original neural theorem prover was estimated to require 233.9 days of learning time, whereas the revised neural theorem prover took only 102.1 minutes. We demonstrate that a neural theorem prover, which encodes a proposition in a tensor, includes a classic theorem prover for exact matches and enables end-to-end differentiable logic for analogous words.

Finding the Minimum MBRs Embedding K Points

Keonwoo Kim, Younghoon Kim

http://doi.org/

There has been a recent surge in the use of mobile devices equipped with GPS sensors, such as smartphones. This trend enables the posting of geo-tagged messages (i.e., multimedia messages with GPS locations) on social media such as Twitter and Facebook, and the volume of such spatial data is growing rapidly. However, the relationship between the location and the content of a message is not always explicit in such geo-tagged messages. Thus, the need arises to reorganize search results to reveal the relationship between keywords and the spatial distribution of messages. We find the smallest minimum bounding rectangle (MBR) that embeds k or more points in order to locate the densest rectangle of data, which is useful in a location search system. In this paper, we suggest efficient algorithms to discover a group of closely located 2-dimensional spatial data points, bounded by such an MBR. The efficiency of the proposed algorithms is confirmed experimentally with synthetic and real data sets.
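The problem stated in this abstract can be illustrated with a brute-force baseline. The sketch below is not the authors' algorithm: it assumes axis-aligned rectangles, takes "smallest" to mean smallest area, and uses a hypothetical function name `smallest_mbr`. It only shows what is being searched for, not how to search efficiently.

```python
def smallest_mbr(points, k):
    """Return (area, (x_min, y_min, x_max, y_max)) for the smallest-area
    axis-aligned rectangle containing at least k of the 2-D points.

    Brute force: try every pair of point x-coordinates as left/right edges,
    then slide a window of k consecutive y-values inside that vertical strip.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    xs = sorted({x for x, _ in points})
    best = None
    for i, x_lo in enumerate(xs):
        for x_hi in xs[i:]:
            # y-values of the points that fall inside the current x-range
            strip = sorted(y for x, y in points if x_lo <= x <= x_hi)
            if len(strip) < k:
                continue
            # The tightest y-range holding k points is a window of k
            # consecutive values in sorted order.
            for j in range(len(strip) - k + 1):
                y_lo, y_hi = strip[j], strip[j + k - 1]
                area = (x_hi - x_lo) * (y_hi - y_lo)
                if best is None or area < best[0]:
                    best = (area, (x_lo, y_lo, x_hi, y_hi))
    return best


points = [(0, 0), (1, 0), (1, 1), (0, 1), (5, 5), (6, 6), (9, 2)]
print(smallest_mbr(points, 4))
```

This baseline re-scans all points for every pair of x-coordinates, so it is only practical for small inputs; the abstract's contribution is precisely a set of more efficient algorithms for the same search.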

Improving The Performance of Triple Generation Based on Distant Supervision By Using Semantic Similarity

Hee-Geun Yoon, Su Jeong Choi, Seong-Bae Park

http://doi.org/

Existing pattern-based triple generation systems based on distant supervision can be flawed because of the assumption underlying distant supervision. To mitigate the flaws caused by this excessive assumption, previous studies have commonly used statistical information to measure the confidence of patterns. In this study, we propose a more accurate confidence measure based on the semantic similarity between patterns and properties. An unsupervised learning method, word embedding, and WordNet-based similarity measures were adopted to learn the meanings of words and to measure semantic similarity. To resolve the language discordance between patterns and properties, we adopted CCA for aligning bilingual word embedding models and a translation-based approach for the WordNet-based measure. The results of our experiments indicate that the accuracy of triples filtered by the semantic similarity-based confidence measure was 16% higher than that of the statistics-based approach. These results suggest that the semantic similarity-based confidence measure is more effective than the statistics-based approach for generating high-quality triples.

Korean Named Entity Recognition and Classification using Word Embedding Features

Yunsu Choi, Jeongwon Cha

http://doi.org/

Named Entity Recognition and Classification (NERC) is the task of recognizing and classifying named entities such as a person's name, a location, or an organization. Various studies have been carried out on Korean NERC, but they have some problems, for example lacking some of the features used in English NERC. In this paper, we propose a method that uses word embedding as features for Korean NERC. We generate word vectors from a POS-tagged corpus using a Continuous Bag-of-Words (CBOW) model, and word cluster symbols from the word vectors using a K-means algorithm. We use the word vectors and word cluster symbols as word embedding features in Conditional Random Fields (CRFs). In the experiments, performance improved over the baseline system by 1.17%, 0.61%, and 1.19% for the TV, Sports, and IT domains, respectively. The proposed method also outperformed other NERC systems, which demonstrates its effectiveness and efficiency.

Linking Korean Predicates to Knowledge Base Properties

Yousung Won, Jongseong Woo, Jiseong Kim, YoungGyun Hahm, Key-Sun Choi

http://doi.org/

Relation extraction plays a role in the process of transforming a sentence into a form suitable for a knowledge base. In this paper, we focus on predicates in a sentence and aim to identify the relevant knowledge base properties required to elucidate the relationship between entities, which enables a computer to understand the meaning of a sentence more clearly. Distant supervision is a well-known approach for relation extraction, and it performs lexicalization tasks for knowledge base properties by automatically generating a large amount of labeled data. In other words, the predicate in a sentence is linked or mapped to the possible properties defined by ontologies in the knowledge base. This lexical and ontological linking of information provides a way of generating structured information and a basis for enriching the knowledge base.

Error Correction in Korean Morpheme Recovery using Deep Learning

Hyunsun Hwang, Changki Lee

http://doi.org/

Korean morphological analysis is a difficult process. Because Korean is an agglutinative language, one of the most important steps in morphological analysis is morpheme recovery. Methods using heuristic rules and pre-analyzed partial words have been examined for this step, but their performance is limited because they do not use contextual information. In this study, we built a Korean morpheme recovery system using deep learning, which uses word embedding to exploit contextual information. In ‘들/VV’ and ‘듣/VV’ morpheme recovery, the system showed 97.97% accuracy, outperforming an SVM (Support Vector Machine), which showed 96.22% accuracy.

A Post-Verification Method of Near-Duplicate Image Detection using SIFT Descriptor Binarization

Yu Jin Lee, Jongho Nang

http://doi.org/

In recent years, near-duplicate images have increased explosively with the spread of the Internet and of image-editing technology that allows easy access to image content, and related research has been active. However, BoF (Bag-of-Features), the most frequently used method for near-duplicate image detection, can treat identical features as different, or different features as identical, in the quantization process that approximates high-level local features with low-level ones. Therefore, a post-verification method for BoF is required to overcome this limitation of vector quantization. In this paper, we propose a post-verification method for BoF and analyze its performance. The method converts SIFT (Scale Invariant Feature Transform) descriptors into 128-bit binary codes and uses the codes to compare binary distances over the short ranked list produced by BoF. An experiment using 1,500 original images showed that near-duplicate detection accuracy improved by approximately 4% over the previous BoF method.
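The post-verification idea described in this abstract, binarizing 128-dimensional SIFT descriptors and comparing them by binary distance, can be sketched as follows. The abstract does not state the binarization rule, so thresholding each component at the descriptor's own median is an assumption made here for illustration. The names `binarize`, `hamming`, and `verify` are hypothetical, and each candidate is represented by a single descriptor for brevity, whereas a real image has many.

```python
from statistics import median


def binarize(descriptor):
    """Pack a 128-D descriptor into a 128-bit integer.

    Assumed rule (not from the paper): bit i is 1 when component i is
    strictly greater than the median of the descriptor.
    """
    if len(descriptor) != 128:
        raise ValueError("expected a 128-dimensional descriptor")
    threshold = median(descriptor)
    code = 0
    for value in descriptor:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def verify(query_code, candidates, max_distance):
    """Re-check a short ranked list: keep the candidates whose code lies
    within max_distance of the query, ordered by increasing distance."""
    scored = sorted((hamming(query_code, code), name) for name, code in candidates)
    return [name for distance, name in scored if distance <= max_distance]


query = list(range(128))
near = [200] + list(range(1, 128))   # one component changed
far = list(reversed(range(128)))     # very different descriptor
shortlist = [("near", binarize(near)), ("far", binarize(far))]
print(verify(binarize(query), shortlist, max_distance=16))
```

Storing each descriptor as one integer keeps the comparison down to an XOR and a bit count, which is what makes a second verification pass over a short list inexpensive.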

Locally Linear Embedding for Face Recognition with Simultaneous Diagonalization

Eun-Sol Kim, Yung-Kyun Noh, Byoung-Tak Zhang

http://doi.org/

Locally linear embedding (LLE) [1] is a manifold learning algorithm that preserves the inner product values between high-dimensional data points when embedding them in a low-dimensional space. LLE embeds data points on the same subspace close together in the low-dimensional space, because such points have significant inner product values. On the other hand, if data points are orthogonal to each other, they are embedded separately in the low-dimensional space, even when they are close to each other in the high-dimensional space. Meanwhile, it is well known that facial images of the same person under varying illumination lie in a low-dimensional linear subspace [2]. In this study, we suggest an improved LLE method for the face recognition problem. The method exploits the property that LLE embeds data points completely separately when they are orthogonal to each other. To accomplish this, the subspaces formed by the individual classes are forced to be mutually orthogonal, using the simultaneous diagonalization (SD) technique. Experimental results show that the suggested method dramatically improves the embedding results and classification performance.

Effective Importance-Based Entity Grouping Method in Continual Graph Embedding

Kyung-Hwan Lee, Dong-Wan Choi

http://doi.org/10.5626/JOK.2025.52.7.627

This study proposed a novel approach to improving entity importance evaluation in continual graph embeddings by incorporating edge betweenness centrality as a weighting factor in a Weighted PageRank algorithm. By normalizing and integrating betweenness centrality, the proposed method effectively propagated entity importance while accounting for the significance of information flow through edges. Experimental results demonstrated significant performance improvements in MRR and Hit@N metrics across various datasets using the proposed method compared to existing methods. Notably, the proposed method showed enhanced learning performance after the initial snapshot in scenarios where new entities and relationships were continuously added. These findings highlight the effectiveness of leveraging edge centrality in promoting efficient and accurate learning in continual knowledge graph embeddings.
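The weighting idea in this abstract can be illustrated with a minimal weighted PageRank in which each node passes its rank along outgoing edges in proportion to an edge weight. This is a sketch under stated assumptions, not the authors' formulation: the weights are taken as given (standing in for normalized edge betweenness centrality, which is not computed here), and the function name `weighted_pagerank`, the damping factor, and the iteration count are illustrative choices.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph with weighted edges.

    edges maps (source, target) to a non-negative weight. In this sketch
    the weight is assumed to be a precomputed, normalized edge betweenness
    value.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = {node: 0.0 for node in nodes}
    for (source, _), weight in edges.items():
        out_weight[source] += weight

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for (source, target), weight in edges.items():
            if out_weight[source] > 0:
                share = weight / out_weight[source]
                new_rank[target] += damping * rank[source] * share
        rank = new_rank
    return rank


# Hypothetical weights: the edge a->b carries more "information flow" than a->c.
edges = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "a"): 1.0, ("c", "a"): 1.0}
for node, score in weighted_pagerank(edges).items():
    print(node, round(score, 3))
```

With these weights, node b receives most of a's rank and ends up more important than c, which is the effect the abstract attributes to accounting for information flow through edges.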

A Pretrained Model-Based Approach to Improve Generalization Performance for ADMET Prediction of Drug Candidates

Yoonju Kim, Sanghyun Park

http://doi.org/10.5626/JOK.2025.52.7.601

Accurate prediction of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties plays an important role in reducing clinical trial failure rates and lowering drug development costs. In this study, we propose a novel method to improve ADMET prediction performance for drug candidate compounds by integrating molecular embeddings from a graph transformer model with pretrained embeddings from a UniMol model. The proposed model can capture bond type information from molecular graph structures, generating chemically refined representations, while leveraging UniMol’s pretrained 3D embeddings to effectively learn spatial molecular characteristics. Through this, the model is designed to address the problem of data scarcity and enhance the generalization performance. In this study, we conducted prediction experiments on 10 ADMET properties. The experiment results demonstrated that our proposed model outperformed existing methods and that the prediction accuracy for ADMET properties could be improved by effectively integrating atomic bond information and 3D structures.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr