Digital Library[ Search Result ]
Multidimensional Subset-based Systems for Bias Elimination Within Binary Classification Datasets
KyeongSu Byun, Goo Kim, Joonho Kwon
http://doi.org/10.5626/JOK.2023.50.5.383
As artificial intelligence technology develops, fairness issues related to artificial intelligence are drawing attention. Many studies have addressed these issues, but most have focused on developing models and training methods. Research on removing the bias in the data used for learning, which is a fundamental cause, is still insufficient. In this paper, we design and implement a system that divides the biases within the data into label biases and subgroup biases and removes them to generate datasets with improved fairness. The proposed system consists of two steps: (1) subset generation and (2) bias removal. First, the subset generator divides the existing data into subsets formed by combinations of attribute values in the dataset. Each subset is then divided into dominant and weak groups based on the fairness indicator values obtained by validating the existing dataset against the validation dataset. Next, the bias remover reduces the bias in each subset by repeatedly extracting and verifying the dominant group of that subset, in sequence, to reduce its difference from the weak group. Afterwards, the debiased subsets are merged and a fair dataset is returned. The fairness indicators used for verification are the F1 score and equalized odds. Comprehensive experiments using real-world Census Income data, COMPAS data, and bank marketing data as verification data demonstrated that the proposed system outperformed the existing technique, yielding a better fairness improvement rate and higher accuracy for most machine learning algorithms.
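As an illustration of the equalized odds indicator named in this abstract, the sketch below computes one common form of it: the larger of the true-positive-rate gap and the false-positive-rate gap between two groups. The abstract does not give the paper's exact formula, so the function names, the 0/1 group encoding, and the choice of taking the maximum gap are assumptions made for this example.

```python
from typing import Sequence, Tuple


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels; a rate is 0.0 if its denominator is empty."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def equalized_odds_gap(y_true: Sequence[int], y_pred: Sequence[int],
                       group: Sequence[int]) -> float:
    """Largest TPR or FPR difference between group 0 and group 1 (assumed encoding)."""
    per_group = {}
    for g in (0, 1):
        idx = [i for i, v in enumerate(group) if v == g]
        per_group[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    (tpr0, fpr0), (tpr1, fpr1) = per_group[0], per_group[1]
    return max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))


# Hypothetical toy data: group 0 is classified perfectly, group 1 is not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(equalized_odds_gap(y_true, y_pred, group))  # 0.5
```

A gap of 0 means both groups see the same error rates; larger values indicate a stronger disparity.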
Dynamic Group Management to Improve the Scalability of PBFT
Jinsung Cho, Gwangyong Kim, Geunmo Kim, Bongjae Kim, Min Choi
http://doi.org/10.5626/JOK.2023.50.5.369
A consensus algorithm, which affects the reliability and performance of a blockchain, is used so that the nodes participating in consensus reach identical decisions. PBFT (Practical Byzantine Fault Tolerance) is a voting-based consensus algorithm with O(n^2) time complexity, so its scalability is generally poor. This paper proposes a scheme that groups the nodes participating in a PBFT-based blockchain network and dynamically manages the groups by arranging them in layers. In addition, we create a mathematical model for estimating the expected consensus time of the proposed scheme. We then propose a dynamic consensus algorithm that adjusts the structures of groups and layers based on this model. In experiments, the proposed scheme improved consensus time by about 97% on average compared to group-based PBFT without hierarchical structures.
PatentQ&A: Proposal of Patent Q&A Neural Search System Using Transformer Model
Yoonmin Lee, Taewook Hwang, Sangkeun Jung, Hyein Seo, Yoonhyung Roh
http://doi.org/10.5626/JOK.2023.50.4.306
Recent neural network-based search has enabled semantic search beyond statistical methods and finds accurate results even when queries contain typos. This paper proposes a neural network-based patent Q&A search system that returns the answer closest to the intent of the user's question when members of the general public without patent expertise search for patent information using general terms. A patent dataset was constructed from patent customer consultation data posted on the Korean Intellectual Property Office website. Patent-KoBERT (Triplet) and Patent-KoBERT (CrossEntropy) were fine-tuned on this patent dataset and used to extract questions similar to the user's question and re-rank them. In experiments, both the Mean Reciprocal Rank (MRR) and the Mean Average Precision (MAP) were 0.96, confirming that the answers most similar to the intent of the user input were well selected.
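For readers unfamiliar with the two ranking metrics reported in this abstract, the following minimal sketch shows the standard definitions of MRR and MAP over ranked result lists. The question IDs and the input layout (a ranked list paired with a set of relevant IDs) are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Query = Tuple[Sequence[str], Set[str]]  # (ranked result IDs, relevant IDs)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant result appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values: Iterable[float]) -> float:
    vals: List[float] = list(values)
    return sum(vals) / len(vals) if vals else 0.0


def mrr(queries: Sequence[Query]) -> float:
    return mean(reciprocal_rank(r, rel) for r, rel in queries)


def mean_ap(queries: Sequence[Query]) -> float:
    return mean(average_precision(r, rel) for r, rel in queries)


# Hypothetical example: the relevant question is ranked 1st, then 2nd.
queries = [(["q7", "q2", "q9"], {"q7"}), (["q4", "q1", "q8"], {"q1"})]
print(mrr(queries), mean_ap(queries))  # 0.75 0.75
```

When every query has exactly one relevant item, MRR and MAP coincide, as in this example.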
Prediction of Antibiotic Resistance to Ciprofloxacin in Patients with Upper Urinary Tract Infection through Exploratory Data Analysis and Machine Learning
http://doi.org/10.5626/JOK.2023.50.3.263
Emergency medicine physicians use an empirical treatment strategy to select antibiotics before an antibiotic resistance profile has been clinically confirmed for a patient with a urinary tract infection. Empirical treatment is challenging given concerns about increasing antibiotic resistance among urinary tract pathogens in the community. In this single-institution retrospective study, we propose a method for predicting antibiotic resistance using machine learning for patients diagnosed with upper urinary tract infection in the emergency department. First, we selected significant predictors using statistical tests and the game-theory-based SHAP (SHapley Additive exPlanations) method, respectively. Next, we compared the performance of four classifiers and proposed an algorithm that assists decision-making in empirical treatment by adjusting the prediction probability threshold. The SVM classifier using predictors selected through SHAP (65% of the total) showed the highest AUROC (0.775) among all conditions tested. By adjusting the prediction probability threshold of the SVM, we achieved a specificity 3.9 times higher than that of empirical treatment while preserving the 98% sensitivity of the doctor's empirical treatment.
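The threshold adjustment described in this abstract can be illustrated with a small sketch: among candidate thresholds on the predicted probability, pick the largest one whose sensitivity still meets a target, which also gives the best specificity available under that constraint. The function names, the "score >= threshold means positive" convention, and the toy data are assumptions for this example, not details from the study.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = fn = tn = fp = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if t == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   min_sensitivity: float) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for the largest observed score
    whose sensitivity is at least min_sensitivity, or None if no score qualifies."""
    # Lowering the threshold can only raise sensitivity and lower specificity,
    # so the first qualifying threshold from the top has the best specificity.
    for thr in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(y_true, scores, thr)
        if sens >= min_sensitivity:
            return thr, sens, spec
    return None


# Hypothetical scores for six patients (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
print(pick_threshold(y_true, scores, min_sensitivity=1.0))
```

In practice the threshold would be chosen on a validation split and then checked on held-out data, since a threshold tuned and evaluated on the same patients gives an optimistic estimate.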
Performance Improvement of a Korean Open Domain Q&A System by Applying the Trainable Re-ranking and Response Filtering Model
Hyeonho Shin, Myunghoon Lee, Hong-Woo Chun, Jae-Min Lee, Sung-Pil Choi
http://doi.org/10.5626/JOK.2023.50.3.273
Research on open domain Q&A, which identifies answers to user inquiries without a target paragraph being prepared in advance, is currently under way as deep learning is applied to natural language processing. However, existing studies that rely on keyword-based information retrieval are limited in semantic matching. Deep learning-based information retrieval is being studied to address this, but few domestic studies have applied it empirically to real systems. This paper proposes a two-step method for improving the performance of a Korean open domain Q&A system. The method sequentially applies a machine learning-based re-ranking model and a response filtering model to a baseline system that combines a search engine and an MRC model. The baseline system achieved an F1 score of 74.43 and an EM score of 60.79, and performance improved to an F1 score of 82.5 and an EM score of 68.82 when the proposed method was applied.
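The EM and F1 scores quoted in this abstract are the usual answer-level metrics for machine reading comprehension. The sketch below follows the widely used SQuAD-style definitions with whitespace tokens; the abstract does not say how the paper tokenizes Korean answers (character-level or morpheme-level matching is also common), so the normalization and tokenization here are assumptions.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a deliberately minimal normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and gold."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers: the prediction contains one extra token.
print(exact_match("the red car", "red car"))  # 0.0
print(round(token_f1("the red car", "red car"), 3))  # 0.8
```

Reported scores such as 82.5 F1 are these per-question values averaged over the test set and multiplied by 100.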
DNN Retraining Method Reducing Accuracy Degradation in Packet-Lossy Environments
Dongwhee Kim, Yujin Lim, Syngha Han, Jungrae Kim
http://doi.org/10.5626/JOK.2023.50.3.285
Limited resources on mobile devices have necessitated collaboration with cloud servers, called “Collaborative Intelligence”, to process Deep Neural Network (DNN) models of growing size. In collaborative intelligence, sending large amounts of feature data from clients to servers takes a long time. The transfer time can be reduced by using the User Datagram Protocol (UDP), but packets dropped during UDP transfer reduce inference accuracy. This paper proposes a DNN retraining method for developing a DNN model that is robust to such losses. The server-side layers are retrained to tolerate lossy features by modeling the contiguous feature losses that result from a packet drop. Our results show that the method mitigates the accuracy degradation caused by packet losses, maintains reliably high accuracy as the communication environment changes, and reduces the storage overhead on mobile devices.
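To make the idea of modeling feature losses from dropped packets concrete, the sketch below zeroes whole packet-sized runs of a flattened feature vector at a given loss rate, which is the kind of corrupted input a server-side model could be retrained on. The abstract does not specify the loss model, so the fixed packet size, independent per-packet drops, and zero-filling are all assumptions of this example.

```python
import random
from typing import List, Sequence


def drop_packets(features: Sequence[float], packet_size: int,
                 loss_rate: float, rng: random.Random) -> List[float]:
    """Return a copy of `features` in which each packet-sized chunk is
    replaced by zeros with probability `loss_rate`."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    out = list(features)
    for start in range(0, len(out), packet_size):
        if rng.random() < loss_rate:
            end = min(start + packet_size, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


# Hypothetical 12-value feature vector sent as 4-value packets.
rng = random.Random(0)  # fixed seed so the example is repeatable
features = [float(i + 1) for i in range(12)]
print(drop_packets(features, packet_size=4, loss_rate=0.3, rng=rng))
```

During retraining, a function like this would be applied to the intermediate features of each training batch before they are passed to the server-side layers.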
Shortest Paths Between Line Segments in the Presence of Rectangular Obstacles
Chanyang Seo, Taehoon Ahn, Hee-Kap Ahn
http://doi.org/10.5626/JOK.2023.50.3.204
In this paper, we present an algorithm for computing L1 shortest paths between two line segments in the presence of rectangular obstacles. A path between two line segments is a shortest path between two points, one selected from each segment. The selected points depend on how a path between the line segments is defined. We consider the minimum shortest path, defined by the points that minimize the length, and the maximum shortest path, defined by the points that maximize the length. We present an O(n log n)-time algorithm for computing the minimum shortest path and an O(n^2)-time algorithm for computing the maximum shortest path.
Effective Detection of Generated Images Using Frequency Transform
Hyoungwon Seo, Dongsu Kim, Seoyoen Oh, Jisang Lee, Haneol Jang
http://doi.org/10.5626/JOK.2025.52.4.350
In today's digital era, advanced image generation techniques have produced counterfeit images that are nearly indistinguishable from real ones, thereby undermining the trustworthiness of digital information. Conventional machine learning and deep learning methods have shown limitations when confronting these evolving generative algorithms. This study introduces a novel approach that can analyze characteristics of generated images in the frequency domain. Specifically, we independently applied the Fast Fourier Transform (FFT) and the Discrete Cosine Transform (DCT) to evaluate the effectiveness of each method for detecting generated images. Experimental results revealed that the FFT-based model improved the test accuracy by approximately 12.8%, while the DCT-based model demonstrated a performance enhancement of about 22.2%. These findings confirm that a frequency domain approach outperforms traditional spatial domain-based detection techniques. It is expected to make a substantial contribution to enhancing image reliability in digital forensics.
Enhancing Retrieval-Augmented Generation Through Zero-Shot Sentence-Level Passage Refinement with LLMs
Taeho Hwang, Soyeong Jeong, Sukmin Cho, Jong C. Park
http://doi.org/10.5626/JOK.2025.52.4.304
This study presents a method for enhancing Retrieval-Augmented Generation (RAG) by using Large Language Models (LLMs) to remove irrelevant content from retrieved documents at the sentence level. The approach refines passages using LLMs alone, without additional training or data, with the goal of improving performance on knowledge-intensive tasks. The proposed method was tested in an open-domain question answering (QA) setting, where it effectively removed unnecessary content and outperformed traditional RAG methods. These results show that the approach can improve the accuracy of RAG in a zero-shot setting.
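The sentence-level refinement described in this abstract can be sketched as a filter placed between retrieval and generation. In the sketch below, `is_relevant` stands in for a zero-shot LLM judgment, and `keyword_judge` is a toy replacement used only so that the example runs without a model. Both names, the regex sentence splitter, and the keep-or-drop decision per sentence are assumptions, not the paper's prompt or pipeline.

```python
import re
from typing import Callable, List

Judge = Callable[[str, str], bool]  # (question, sentence) -> keep this sentence?


def split_sentences(passage: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [p for p in parts if p]


def refine_passage(question: str, passage: str, is_relevant: Judge) -> str:
    """Keep only the sentences the judge marks as relevant to the question."""
    kept = [s for s in split_sentences(passage) if is_relevant(question, s)]
    return " ".join(kept)


def keyword_judge(question: str, sentence: str) -> bool:
    """Toy stand-in for an LLM call: keep a sentence that shares any word
    longer than three characters with the question."""
    q_words = {w for w in re.findall(r"\w+", question.lower()) if len(w) > 3}
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return bool(q_words & s_words)


question = "Who designed the Eiffel Tower?"
passage = ("The Eiffel Tower is in Paris. It was designed by Gustave Eiffel's "
           "company. Paris has many cafes.")
print(refine_passage(question, passage, keyword_judge))
```

In a real system the judge would prompt an LLM with the question and each sentence (or the whole passage with sentence markers), and the refined passage would replace the original in the generator's context.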
Generating Counterfactual Examples through Generating Adversarial Examples
http://doi.org/10.5626/JOK.2022.49.12.1132
The advance of artificial intelligence (AI) has brought numerous conveniences. However, the complex structure of AI models makes it challenging to understand their inner workings. Counterfactual explanation is a method that explains AI using counterfactual examples, in which minimal perceptible perturbations are applied to change the classification result. Adversarial examples are data modified to cause AI models to misclassify them. Unlike those in counterfactual examples, the perturbations applied to adversarial examples are difficult for humans to perceive. In a simple model, generating adversarial examples is similar to generating counterfactual examples. In deep learning, by contrast, the two differ because the cognitive difference between humans and deep learning models is often large. Nevertheless, we confirmed that adversarial examples generated by certain deep learning models were similar to counterfactual examples. In this paper, we analyze the structure and conditions of deep learning models for which adversarial examples are similar to counterfactual examples. We also propose a new metric, partial concentrated change (PCC), and compare adversarial examples generated from different models using existing metrics and the proposed PCC.
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr