Digital Library [Search Result]
Korean Dependency Parsing Using Sequence Labeling
http://doi.org/10.5626/JOK.2024.51.12.1053
Dependency parsing is a crucial step in language analysis: it identifies the relationships between words within a sentence. Recently, many models based on pre-trained transformers have shown impressive performance across natural language processing research, and they have also been applied to dependency parsing. Traditional approaches to dependency parsing using pre-trained models generally consist of two main stages: 1) merging token-level embeddings generated by the pre-trained model into word-level embeddings; and 2) analyzing dependency relations by comparing or classifying the merged embeddings. However, due to the large number of parameters and the additional layers required for embedding construction, comparison, and classification, these models can be inefficient in terms of time and memory usage. This paper proposes a dependency parsing technique based on sequence labeling that improves the efficiency of training and inference by defining dependency parsing units and simplifying model layers. The proposed model eliminates the word-level embedding merging step by using special tokens to define parsing units, and it effectively reduces the number of parameters by simplifying model layers. As a result, training and inference time is significantly shortened. With these optimizations, the proposed model maintains meaningful dependency parsing performance.
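The idea of casting parsing as sequence labeling can be illustrated with a minimal sketch, not the paper's model: each word's dependency arc is encoded as a single label combining the head's relative offset and the dependency relation. The label scheme and the example sentence are illustrative assumptions.

```python
# Toy encoding of dependency arcs as one label per word, so a standard
# sequence labeler can predict them without word-level embedding merging.

def encode_arcs(heads, deprels):
    """Turn (head index, relation) pairs into one label per word.

    heads[i] is the 1-based index of word i's head (0 = root).
    The label stores the head's position relative to word i+1.
    """
    labels = []
    for i, (head, rel) in enumerate(zip(heads, deprels)):
        offset = head - (i + 1)          # relative head position
        labels.append(f"{offset:+d}|{rel}")
    return labels

def decode_arcs(labels):
    """Invert the encoding back to (head, relation) pairs."""
    heads, deprels = [], []
    for i, label in enumerate(labels):
        offset, rel = label.split("|")
        heads.append(i + 1 + int(offset))
        deprels.append(rel)
    return heads, deprels

# "She reads books": word 1 depends on word 2, word 2 is the root,
# word 3 depends on word 2.
labels = encode_arcs([2, 0, 2], ["nsubj", "root", "obj"])
```

Because each word now carries exactly one label, the parser can score labels per token directly, which is what removes the embedding-merging stage described above.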
Drug-Drug Interaction Prediction Model Based on Deep Learning Using Drug Information Document Embedding
http://doi.org/10.5626/JOK.2024.51.6.503
The use of polypharmacy has emerged as a promising approach for various diseases, including cancer, hypertension, and asthma. However, polypharmacy can result in unexpected interactions, which may lead to adverse drug effects. Therefore, predicting drug-drug interactions (DDI) is essential for safe medication practices. In this study, we propose a deep learning-based drug-drug interaction prediction model that represents each drug with a document embedding. We generate drug information documents by combining DrugBank data, which includes drug descriptions, indications, mechanisms of action, pharmacodynamics, and toxicity. We then use the Doc2Vec and BioSentVec language models to generate drug representation vectors from these documents. The two drug vectors are paired and fed into the deep learning-based prediction model, which outputs the likelihood of an interaction between the two drugs. Our goal is to construct the optimal model for predicting drug-drug interactions by comparing performance under various conditions, including the choice of language embedding model and adjustments for data imbalance. We expect the proposed model to be used for the advance prediction of drug interactions during the drug prescription process and the clinical stages of new drug development.
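The pairing step can be sketched as follows. This is an assumed, minimal stand-in for the paper's architecture: two drug document vectors are concatenated and passed to a scoring function (here a single logistic unit in place of the deep prediction model); the vectors and weights are illustrative.

```python
import math

def pair_features(vec_a, vec_b):
    """Concatenate the two drug representation vectors into one input."""
    return vec_a + vec_b

def interaction_score(features, weights, bias=0.0):
    """Logistic score standing in for the deep DDI prediction model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

drug_a = [0.2, -0.1, 0.4]   # e.g. a Doc2Vec embedding of drug A's document
drug_b = [0.0, 0.3, -0.2]   # e.g. a BioSentVec embedding of drug B's document
score = interaction_score(pair_features(drug_a, drug_b), [0.5] * 6)
```

The output lies in (0, 1) and can be read as the likelihood of an interaction, matching the abstract's description of the model's output.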
Prediction of Dehydrogenation Enthalpy Using Graph Isomorphism Network
Kun Young Choi, Woo Hyun Yuk, Jeong Woo Han, Cham Kill Hong
http://doi.org/10.5626/JOK.2024.51.5.406
This paper addresses the prediction of dehydrogenation enthalpy, which plays an important role in selecting optimal liquid organic hydrogen carriers. We employed graph convolutional networks (GCNs) to produce molecular embeddings for the prediction. Specifically, we adopted the Graph Isomorphism Network (GIN), known to be the most expressive graph-based representation learning model. Our approach outperformed conventional machine learning solutions and traditional representations based on chemical physics algorithms. In addition, the performance of the proposed model could be improved with small batch sizes and deeper GCN layers using skip connections.
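The GIN aggregation that underlies such molecular embeddings follows the update h_v' = MLP((1 + eps) * h_v + sum of neighbor features). A minimal sketch, with scalar node features, a 3-node path graph, and an identity "MLP" as illustrative assumptions:

```python
# One GIN update step over a graph given as an adjacency list.
def gin_update(h, adj, eps=0.0, mlp=lambda x: x):
    """Apply h_v' = mlp((1 + eps) * h_v + sum of neighbors' features)."""
    out = []
    for v, h_v in enumerate(h):
        neighbor_sum = sum(h[u] for u in adj[v])
        out.append(mlp((1 + eps) * h_v + neighbor_sum))
    return out

h = [1.0, 2.0, 3.0]                 # one scalar feature per atom/node
adj = {0: [1], 1: [0, 2], 2: [1]}   # path graph 0-1-2
updated = gin_update(h, adj)        # [3.0, 6.0, 5.0]
```

Summation (rather than mean or max) plus the learnable eps term is what gives GIN its expressiveness: distinct neighbor multisets map to distinct aggregates.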
SBERT-PRO: Predicate Oriented Sentence Embedding Model for Intent and Event Detection
Dongryul Ko, Jeayun Lee, Dahee Lee, Yuri Son, Sangmin Kim, Jaeeun Jang, Munhyeong Kim, Sanghyun Park, Jaieun Kim
http://doi.org/10.5626/JOK.2024.51.2.165
Intent detection is a crucial task in conversational systems for understanding user intentions. Additionally, event detection is vital for identifying important events within various texts, including news articles, social media posts, and reports. Among diverse approaches, the sentence embedding similarity-based method has been widely adopted to solve open-domain classification tasks. However, conventional embedding models tend to focus on specific keywords within a sentence and are not suitable for tasks that require a high-level semantic understanding of a sentence rather than a narrow focus on specific details. This limitation becomes particularly evident in tasks such as intent detection, which requires a broader understanding of the intention of a sentence, and event detection, which requires an emphasis on the actual events within a sentence. In this paper, we construct a training dataset suitable for intent and event detection using entity attribute information and entity relation information. Our approach is inspired by the significance of emphasizing the embedding of predicates, which unfold the content of a sentence, as opposed to focusing on entity attributes within a sentence. Furthermore, we suggest an adaptive learning strategy for the existing sentence embedding model and demonstrate that our proposed model, SBERT-PRO (PRedicate Oriented), outperforms conventional models.
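The embedding-similarity classification the abstract refers to can be sketched as follows. The prototype vectors and intent labels are illustrative assumptions; in practice the embeddings would come from a model such as SBERT-PRO.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify(embedding, prototypes):
    """Assign the label whose prototype embedding is most similar."""
    return max(prototypes, key=lambda lbl: cosine(embedding, prototypes[lbl]))

prototypes = {"book_flight": [0.9, 0.1], "cancel_order": [0.1, 0.9]}
intent = classify([0.8, 0.2], prototypes)   # "book_flight"
```

Because no classifier head is trained per label set, this style of classification extends to open-domain tasks by simply adding prototypes, which is why the similarity-based method is attractive for intent and event detection.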
Hierarchical Representation and Label Embedding for Semantic Classification of Domestic Research Paper
Heejin Kook, Yeonghwa Kim, Sehui Yoon, Byungha Kang, Youhyun Shin
http://doi.org/10.5626/JOK.2024.51.1.41
Sentence semantics in research papers have a hierarchical structure, and the data is imbalanced across subcategories. In addition, the meaning of a sentence in a paper is closely related to its position within the paper. Existing flat classification methods mainly consider only subcategories, leading to a decrease in classification accuracy due to data imbalance. In response, this study proposes hierarchical representation and label embedding methods to perform hierarchical semantic classification of sentences effectively. In addition, the section names of the paper are actively utilized to represent the positional information of paper sentences. Experiments demonstrate that the proposed method, which explicitly considers hierarchical and positional information, achieves excellent performance in terms of F1 score on the KISTI domestic paper sentence semantic tagging dataset.
Optimizing Computation of Tensor-Train Decomposed Embedding Layer
Seungmin Yu, Hayun Lee, Dongkun Shin
http://doi.org/10.5626/JOK.2023.50.9.729
Personalized recommendation systems are ubiquitous in daily life. However, the huge memory requirement for storing the embedding tables used by deep learning-based recommendation models takes up most of the resources of industrial AI data centers. One solution to this problem is Tensor-Train (TT) decomposition, a promising compression technique for deep neural networks. In this study, we analyze unnecessary computations in TT-Gather and Reduce (TT-GnR), the embedding layer operation under TT decomposition. To address this, we define a computational unit called a group that binds item vectors together, and propose the Group Reduced TT-Gather and Reduce (GRT-GnR) operation, which reduces unnecessary operations by computing over groups. Since the GRT-GnR operation is computed in groups, its computational cost varies depending on how item vectors are grouped. Experimental results showed that the GRT-GnR operation reduced latency by 41% compared to the conventional TT-GnR operation.
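A minimal sketch of what a TT-decomposed embedding lookup computes, under assumed toy shapes (a 4 x 4 table factored into two cores); the core values are illustrative, and real deployments use larger factorizations where the cores hold far fewer numbers than the table they encode:

```python
I1, I2 = 2, 2   # row-index factors: I1 * I2 = 4 table rows
D1, D2 = 2, 2   # embedding-dimension factors: D1 * D2 = dim 4
R = 2           # TT rank

def tt_lookup(row, core1, core2):
    """Reconstruct one embedding row from two TT cores.

    core1[i1][d1] is a length-R vector; core2[r][i2][d2] is a scalar.
    Only the cores are stored; the full table is never materialized.
    """
    i1, i2 = divmod(row, I2)
    return [sum(core1[i1][d1][r] * core2[r][i2][d2] for r in range(R))
            for d1 in range(D1) for d2 in range(D2)]

core1 = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]   # shape (I1, D1, R)
core2 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # shape (R, I2, D2)
row0 = tt_lookup(0, core1, core2)              # embedding of item 0
```

Each lookup performs the multiplications on the fly, which is exactly where redundant work appears when many looked-up rows share index factors; grouping item vectors, as GRT-GnR does, lets shared partial products be reused.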
A Contrastive Learning Method for Automated Fact-Checking
Seonyeong Song, Jejun An, Kunwoo Park
http://doi.org/10.5626/JOK.2023.50.8.680
As online misinformation proliferates, the importance of automated fact-checking, which enables real-time evaluation, has been emphasized. In this study, we propose a contrastive learning method for automated fact-checking in Korean. The proposed method treats a sentence similar to the evidence as a positive sample when determining the authenticity of a given claim. In evaluation experiments, we found that the proposed method was more effective in the sentence selection step of finding evidence sentences for a given claim than previous methods, such as a fine-tuned pretrained language model and SimCSE. This study shows the potential of contrastive learning for automated fact-checking.
Deep Neural Network-Based Automated Essay Trait Scoring Model Incorporating Argument Structure Information
http://doi.org/10.5626/JOK.2023.50.8.662
Automated essay scoring is the task of having a model read a given essay and evaluate it automatically. This paper presents a method for automated essay scoring that creates essay representations reflecting the argument structure of the essay using Argument Mining, and learns an essay representation for each trait score. Our experiments indicated that the proposed essay representation outperformed representations obtained from pre-trained language models. Furthermore, learning a different representation for each evaluation criterion proved more effective for essay evaluation. The performance of the proposed model, as measured by the Quadratic Weighted Kappa (QWK) metric, improved from 0.543 to 0.627, showing a high level of agreement with human evaluations. Qualitative evaluations also showed that the proposed model exhibits evaluation tendencies similar to those of human raters.
Improving the Performance of Knowledge Tracing Models using Quantized Correctness Embeddings
Yoonjin Im, Jaewan Moon, Eunseong Choi, Jongwuk Lee
http://doi.org/10.5626/JOK.2023.50.4.329
Knowledge tracing is the task of monitoring the proficiency of knowledge based on learners' interaction records. Despite the flexible use of deep neural network-based models for this task, existing methods disregard the difficulty of each question, resulting in poor performance for learners who get easy questions wrong or hard questions correct. In this paper, we propose quantizing learners' response information based on question difficulty so that knowledge tracing models can learn both the response and the difficulty of the question, improving performance. We design a method that effectively discriminates between negative samples (incorrect responses) to questions with a high correct answer rate and positive samples (correct responses) to questions with a low correct answer rate. To this end, we use sinusoidal positional encoding (SPE), which maximizes the distance between embedding representations in the latent space. Experiments show that the AUC value improves by up to 17.89% in the target section compared to existing methods.
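The role of SPE here can be sketched with the standard transformer-style encoding, mapping each quantized correctness level to a vector; the 4-level quantization scheme and the dimension are illustrative assumptions, not the paper's exact design.

```python
import math

def sinusoidal_encoding(k, dim=8):
    """Standard sinusoidal positional encoding for integer level k."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(k * freq))
        enc.append(math.cos(k * freq))
    return enc

# Assumed quantization: 0 = wrong on an easy question ...
# 3 = correct on a hard question. Each level gets its own embedding.
levels = {k: sinusoidal_encoding(k) for k in range(4)}
```

Because neighboring levels receive smoothly varying but distinct encodings, levels that should be far apart (e.g. wrong-on-easy vs. correct-on-hard) map to well-separated points in the latent space, which is the discrimination property the abstract describes.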
CoEM: Contrastive Embedding Mapper for Audio-visual Latents
Gihun Lee, Kyungchae Lee, Minchan Jeong, Myungjin Lee, Se-young Yun, Chan-hyun Yun
http://doi.org/10.5626/JOK.2023.50.1.80
Human perception can link audio-visual information, making it possible to recall visual information from audio information and vice versa. This ability is naturally acquired by experiencing situations where the two kinds of information are combined. However, it is hard to obtain video datasets that are richly combined with both types of information and, at the same time, labeled for the semantics of each scene. This paper proposes the Contrastive Embedding Mapper (CoEM), which maps an embedding from one modality to the other according to its category. Paired data is not required; CoEM learns by contrasting the mapped embeddings by their categories. We validated the efficacy of CoEM on audio and visual embeddings trained to classify 20 shared categories. In experiments, embeddings mapped by CoEM were capable of retrieving and generating data in the mapped domain.
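A category-contrastive objective of the kind such training might use can be sketched as follows; this InfoNCE-style form and the toy vectors are assumptions, not the paper's exact loss.

```python
import math

def contrastive_loss(mapped, positives, negatives, temperature=0.1):
    """InfoNCE-style loss: pull the mapped embedding toward same-category
    embeddings of the target modality and push it from other categories."""
    def sim(a, b):
        return sum(x * y for x, y in zip(a, b)) / temperature
    pos = sum(math.exp(sim(mapped, p)) for p in positives)
    neg = sum(math.exp(sim(mapped, n)) for n in negatives)
    return -math.log(pos / (pos + neg))

mapped = [1.0, 0.0]                  # audio embedding mapped to visual space
same_cat = [[0.9, 0.1]]              # visual embeddings, same category
other_cat = [[-1.0, 0.0]]            # visual embeddings, other categories
loss = contrastive_loss(mapped, same_cat, other_cat)
```

Because only category labels are needed to form the positive and negative sets, no paired audio-video examples are required, matching the abstract's claim.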
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr