A Visual Analytics System for Interpretable Machine Learning
http://doi.org/10.5626/JOK.2023.50.1.57
Interpretable machine learning is a technology that helps people understand the behavior and predictions of machine learning systems. This study proposes a visual analytics system for interpreting how machine learning models relate input data to output results, so that users can interpret the models easily and clearly. The proposed system interprets a machine learning model through an iterative adjustment procedure that filters and groups model decision results according to input variables, target variables, and predicted/classified values. Through use case analysis and in-depth user interviews, we confirmed that our system could provide insights into the complex behavior of machine learning models, support a scientific understanding of input variables, target variables, and model predictions, and help users understand the stability and reliability of models.
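The filter-and-group step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the `age_band` input variable, and the function name `group_accuracy` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def group_accuracy(records, key):
    """Group prediction records by the input variable `key` and
    return the fraction of correct predictions in each group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = r[key]
        totals[group] += 1
        hits[group] += int(r["predicted"] == r["target"])
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical model decisions: one input variable, the target, the prediction.
records = [
    {"age_band": "young", "target": 1, "predicted": 1},
    {"age_band": "young", "target": 0, "predicted": 1},
    {"age_band": "old", "target": 1, "predicted": 1},
    {"age_band": "old", "target": 0, "predicted": 0},
]

# Filter: keep only the misclassified decisions for closer inspection.
misclassified = [r for r in records if r["predicted"] != r["target"]]

# Group: compare model behavior across values of an input variable.
per_group = group_accuracy(records, "age_band")
```

Repeating the filter and group steps with different keys is one plausible reading of the "iterative adjustment procedure"; the actual system is interactive and visual.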
Competition Relation Extraction based on Combining Machine Learning and Filtering
ChungHee Lee, YoungHoon Seo, HyunKi Kim
This study was directed at the design of a hybrid algorithm for competition relation extraction. Previous works on relation extraction have relied on various lexical and deep parsing indicators and have mostly used machine learning alone. We present a new algorithm integrating machine learning with various filtering methods. Some simple but useful features for competition relation extraction are also introduced, and an optimum feature set is proposed. The goal of this paper was to increase the precision of competition relation extraction by combining supervised learning with various filtering methods. Filtering methods were employed to classify whether a competition relation occurs, to filter feature pairs by a distance restriction, and to classify whether or not a candidate entity pair is spam. For evaluation, a test set consisting of 2,565 sentences was examined. The proposed method was compared with the rule-based method and the general relation extraction method. The rule-based method achieved a positive precision of 0.812 and an accuracy of 0.568, while the general relation extraction method achieved 0.612 and 0.563, respectively. The proposed system obtained a positive precision of 0.922 and an accuracy of 0.713. These results demonstrate that the developed method is effective for competition relation extraction.
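The two metrics reported above, and the effect a filter can have on them, can be illustrated with a small sketch. It assumes a hypothetical classifier output and a single hypothetical distance filter; the `Candidate` structure, the threshold of 10 tokens, and the toy data are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A candidate entity pair (hypothetical structure for illustration)."""
    token_distance: int            # tokens between the two entities
    classifier_says_compete: bool  # output of a supervised classifier
    gold_compete: bool             # gold-standard label

def predict(c, max_distance=10):
    """Classifier decision followed by a distance filter.
    The threshold is an assumption, not a value from the paper."""
    return c.classifier_says_compete and c.token_distance <= max_distance

def positive_precision(cands, preds):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for c, p in zip(cands, preds) if p and c.gold_compete)
    fp = sum(1 for c, p in zip(cands, preds) if p and not c.gold_compete)
    return tp / (tp + fp) if tp + fp else 0.0

def accuracy(cands, preds):
    """Fraction of all candidates labeled correctly."""
    correct = sum(1 for c, p in zip(cands, preds) if p == c.gold_compete)
    return correct / len(cands) if cands else 0.0

cands = [
    Candidate(3, True, True),
    Candidate(25, True, False),  # entities far apart: rejected by the filter
    Candidate(4, True, True),
    Candidate(6, False, True),   # a positive the classifier misses
    Candidate(8, False, False),
]
unfiltered = [c.classifier_says_compete for c in cands]
filtered = [predict(c) for c in cands]
```

On this toy data the filter removes the one false positive, so positive precision rises from 2/3 to 1.0 and accuracy from 0.6 to 0.8. Real gains depend on the data and on the filters actually used.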
MOnCa2: High-Level Context Reasoning Framework based on User Travel Behavior Recognition and Route Prediction for Intelligent Smartphone Applications
MOnCa2 is a framework for building intelligent smartphone applications based on smartphone sensors and ontology reasoning. In previous studies, MOnCa determined and inferred user situations based on sensor values represented by ontology instances. When this approach is applied, recognizing user space information or objects in user surroundings is possible, whereas determining the user’s physical context (travel behavior, travel destination) is impossible. In this paper, MOnCa2 is used to build recognition models for travel behavior and routes using smartphone sensors to analyze the user’s physical context, infer basic context regarding the user’s travel behavior and routes by adapting these models, and generate high-level context by applying ontology reasoning to the basic context for creating intelligent applications. This paper is focused on approaches that are able to recognize the user’s travel behavior using smartphone accelerometers, predict personal routes and destinations using GPS signals, and infer high-level context by applying realization.
Learning Multiple Instance Support Vector Machine through Positive Data Distribution
Joong-Won Hwang, Seong-Bae Park, Sang-Jo Lee
This paper proposes a modified MI-SVM algorithm by considering data distribution. The previous MI-SVM algorithm seeks the margin by considering the “most positive” instance in a positive bag. Positive instances included in positive bags are located in a similar area in a feature space. In order to reflect this characteristic of positive instances, the proposed method selects the “most positive” instance by calculating the distance between each instance in the bag and a pivot point that is the intersection point of all positive instances. This paper suggests two ways to select the “most positive” pivot point in the training data. First, the algorithm seeks the “most positive” pivot point along the current predicted parameter, and then selects the nearest instance in the bag as a representative from the pivot point. Second, the algorithm finds the “most positive” pivot point by using a Diverse Density framework. Our experiments on 12 benchmark multi-instance data sets show that the proposed method results in higher performance than the previous MI-SVM algorithm.
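The selection step described above, picking the instance in each positive bag that lies nearest to a pivot point, can be sketched as follows. The pivot is taken as given here; how it is found (along the current predicted parameter, or with a Diverse Density framework) is not reproduced, and the function names and toy bags are hypothetical.

```python
import math

def nearest_to_pivot(bag, pivot):
    """Return the instance in `bag` with the smallest Euclidean
    distance to `pivot`."""
    return min(bag, key=lambda x: math.dist(x, pivot))

def select_representatives(positive_bags, pivot):
    """Pick one representative instance per positive bag."""
    return [nearest_to_pivot(bag, pivot) for bag in positive_bags]

# Two toy positive bags in a 2-D feature space.
bags = [
    [(0.0, 0.0), (2.0, 2.1), (5.0, -1.0)],
    [(1.9, 2.0), (-3.0, 4.0)],
]
pivot = (2.0, 2.0)  # assumed to be supplied by a separate pivot-finding step

reps = select_representatives(bags, pivot)
```

Here `reps` holds `(2.0, 2.1)` and `(1.9, 2.0)`, the instances that cluster around the pivot. In an MI-SVM style training loop these representatives would stand in for their bags when the margin is computed.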
Creating Level Set Trees Using One-Class Support Vector Machines
A level set tree provides a useful representation of a multidimensional density function. Visualizing the data structure as a tree offers many advantages for data analysis and clustering. In this paper, we present a level set tree estimation algorithm for use with a set of data points. The proposed algorithm creates a level set tree from a family of level sets estimated over the whole range of levels from zero to infinity. Instead of estimating the density function and then thresholding it, we directly estimate the density level sets using one-class support vector machines (OC-SVMs). The level set estimation is facilitated by the OC-SVM solution path algorithm. We demonstrate the proposed level set tree algorithm on benchmark data sets.
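To show what a level set tree encodes, the sketch below uses the simpler density-then-threshold route that the paper deliberately avoids: it thresholds a known 1-D density on a grid and lists the connected components at each level. It does not use OC-SVMs or the solution path algorithm, and the density values and levels are made up for illustration.

```python
def components(density, level):
    """Return maximal runs of consecutive grid indices whose density
    is at least `level`, as (start, end) index pairs."""
    runs, start = [], None
    for i, value in enumerate(density):
        if value >= level and start is None:
            start = i                    # a new component begins
        elif value < level and start is not None:
            runs.append((start, i - 1))  # the current component ends
            start = None
    if start is not None:
        runs.append((start, len(density) - 1))
    return runs

# A toy bimodal density sampled on an 8-point grid.
density = [0.1, 0.4, 0.9, 0.4, 0.2, 0.5, 0.8, 0.3]

# One component at the lowest level splits into two as the level rises.
tree_levels = {lvl: components(density, lvl) for lvl in (0.1, 0.3, 0.6)}
```

The single component at level 0.1 is the root of the tree, and the two components at levels 0.3 and 0.6 are its branches, one per mode of the density.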
Exploiting Friend’s Username to De-anonymize Users across Heterogeneous Social Networking Sites
Social networking sites (SNSs), such as Twitter, LinkedIn, and Tumblr, are coming to the forefront as their numbers of users grow. While users voluntarily provide their information on SNSs, privacy leakages resulting from the use of SNSs are becoming a problem owing to the evolution of large data processing techniques and the rising awareness of privacy. To address this problem, studies on protecting privacy on SNSs based on graph and machine learning techniques have been conducted. However, examples of privacy leakages resulting from the advent of a new SNS are consistently being uncovered. In this paper, we propose a technique enabling a user to detect privacy leakages beforehand in the case where the service provider or a third-party application developer maliciously threatens the SNS user’s privacy.
Developing an Automated English Sentence Scoring System for Middle-school Level Writing Test by Using Machine Learning Techniques
In this paper, we introduce an automatic scoring system for middle-school level writing tests based on machine learning techniques. We discuss the overall process and the features used for building an automatic English writing scoring system. A "concept answer", which represents the abstract meaning of a text, is newly introduced in order to evaluate the elaboration of a student's answer. In this work, multiple machine learning algorithms are adopted for scoring English writing. We suggest a decision process, "optimal combination", which optimally combines the outputs of multiple machine learning algorithms and generates a single final output in order to improve the performance of automatic scoring. Through experiments with actual test data, we evaluate the performance of the overall automated English writing scoring system.
Study of State Machine Diagram Robustness Testing using Casual Relation of Events
Seon Yeol Lee, Heung Seok Chae
Fault injection into state machine diagrams has been studied as a way to generate robustness test cases. Conventional studies have, however, tended to inject too many faults into diagrams because they have considered only the structural aspects of diagrams. In this paper, we propose a method that aims to reduce the number of injected faults without decreasing the effectiveness of robustness testing. The proposed method is demonstrated using a microwave oven state machine diagram and evaluated using a hash table state machine diagram. The evaluation shows that the number of injected faults is decreased by 43% and the number of test cases is decreased by 63% without a decrease in the effectiveness of the hash table robustness test.
Syllable-based Probabilistic Models for Korean Morphological Analysis
This paper proposes three probabilistic models for syllable-based Korean morphological analysis and presents the performance of the proposed models. Probabilities for the models are acquired from a POS-tagged corpus. The results of 10-fold cross-validation experiments show that a 98.3% answer inclusion rate is achieved when the models are trained with the Sejong POS-tagged corpus of 10 million eojeols. In our models, POS tags are assigned to each syllable before spelling recovery and morpheme generation, which enables more efficient morphological analysis than the previous probabilistic models, where spelling recovery is performed at the first stage. This efficiency speeds up morphological analysis. Experiments show that morphological analysis is performed at a rate of 147K eojeols per second, which is almost 174 times faster than the previous probabilistic models for Korean morphology.
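As a rough check on the speed-up claim, the throughput of the previous models can be estimated from the two reported figures. This is an inference, not a number stated in the abstract, and since the abstract says "almost 174 times" the result is only approximate; the symbol $v_{\text{prev}}$ is introduced here for illustration.

```latex
\[
v_{\text{prev}} \approx \frac{147{,}000\ \text{eojeols/s}}{174} \approx 845\ \text{eojeols/s}
\]
```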
Pretrained Large Language Model-based Drug-Target Binding Affinity Prediction for Mutated Proteins
Taeung Song, Jin Hyuk Kim, Hyeon Jun Park, Jonghwan Choi
http://doi.org/10.5626/JOK.2025.52.6.539
Drug development is a costly and time-consuming process. Accurately predicting the impact of protein mutations on drug-target binding affinity remains a major challenge. Previous studies have utilized long short-term memory (LSTM) and transformer models for amino acid sequence processing. However, LSTMs suffer from long-sequence dependency issues, while transformers face high computational costs. In contrast, pretrained large language models (pLLMs) excel in handling long sequences, yet prompt-based approaches alone are insufficient for accurate binding affinity prediction. This study proposed a method that could leverage pLLMs to analyze protein structural data, transform it into embedding vectors, and use a separate machine learning model for numerical binding affinity prediction. Experimental results demonstrated that the proposed approach outperformed conventional LSTM and prompt-based methods, achieving lower root mean square error (RMSE) and higher Pearson correlation coefficient (PCC), particularly in mutation-specific predictions. Additionally, performance analysis of pLLM quantization confirmed that the method maintained sufficient accuracy with reduced computational cost.
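The two evaluation metrics named above, RMSE and PCC, can be computed as in the following minimal sketch. The affinity values are invented for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Made-up binding affinities: measured values and model predictions.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

error = rmse(measured, predicted)            # 0.5
correlation = pearson(measured, predicted)   # about 0.894
```

A lower RMSE means predictions are closer to the measured affinities in absolute terms, while a higher PCC means they follow the same trend, which is why the two are reported together.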
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr