Search : [ author: Hyun Kim ] (32)

A Source-code Similarity-based Automatic Tutoring Method for Online Coding Test Service

Hyunsu Mun, Soohyun Kim, Jihye Kim, Youngseok Lee

http://doi.org/10.5626/JOK.2021.48.9.1044

Recently, as IT companies have adopted online coding test judge systems to evaluate developers' abilities, IT job seekers and students have been practicing for coding tests. However, when a user has difficulty solving a problem, a suitable hint is hard to find because a coding test problem has many valid answers. To address this problem, this study proposes a method that finds the appropriate data type to use as a hint for the user. The method measures the data-type-based distance between the source code submitted by each user and the answer source codes, and selects the closest answer. In an analysis of 29,592 source code samples collected from five university programming courses, the average accuracy was 85.33%, with a difference of about 7% depending on the level of the students, indicating that the proposed method is useful. This study contributes an automatic tutoring method based on the source code submitted by the user rather than on a guide prepared in advance.
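
The abstract does not say how data types are extracted or which distance is used, so the following is only an illustrative sketch of the general idea. It assumes Python submissions, collects container types from the syntax tree, and uses Jaccard distance between the resulting sets. The names `data_types`, `jaccard_distance`, and `suggest_hint` are hypothetical and do not come from the paper.

```python
import ast

# Assumption: only these literal forms and constructor names count as "data types".
_LITERALS = {
    ast.List: "list", ast.ListComp: "list",
    ast.Dict: "dict", ast.DictComp: "dict",
    ast.Set: "set", ast.SetComp: "set",
    ast.Tuple: "tuple",
}
_CONSTRUCTORS = {"list", "dict", "set", "tuple", "deque", "defaultdict"}


def data_types(source):
    """Return the set of container types that appear in a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        kind = _LITERALS.get(type(node))
        if kind is not None:
            found.add(kind)
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id in _CONSTRUCTORS):
            found.add(node.func.id)
    return found


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def suggest_hint(user_source, answer_sources):
    """Pick the closest answer and return the data types the user has not used yet."""
    user = data_types(user_source)
    closest = min(answer_sources,
                  key=lambda src: jaccard_distance(user, data_types(src)))
    return data_types(closest) - user


USER = "seen = []\nfor x in [1, 2, 2]:\n    if x not in seen:\n        seen.append(x)\n"
ANSWER_A = "items = [1, 2, 2]\nunique = set(items)\n"
ANSWER_B = "from collections import deque\nqueue = deque()\npair = (1, 2)\n"

print(suggest_hint(USER, [ANSWER_A, ANSWER_B]))  # {'set'}
```

In this toy case the user's code only uses lists, the closest answer uses a list and a set, and the hint is therefore the set type.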

ILP-based Schedule Synthesis of Time-Sensitive Networking

Jin Hyun Kim, Hyonyoung Choi, Kyong Hoon Kim, Insup Lee, Se-Hoon Kim

http://doi.org/10.5626/JOK.2021.48.6.595

IEEE 802.1Qbv Time-Sensitive Networking (TSN), the latest real-time Ethernet standard, is designed to guarantee the temporal accuracy of streams. TSN is an Ethernet-based network system that is actively being developed for factory automation and automotive network systems. TSN controls the flow of data streams based on schedules generated statically offline to satisfy end-to-end delay or jitter requirements. However, generating TSN schedules is an NP-hard problem, so constraint-solving techniques such as SMT (Satisfiability Modulo Theories) and ILP (Integer Linear Programming) have mainly been proposed as solutions. This paper presents a new approach that combines a greedy, incremental heuristic algorithm with ILP to reduce the complexity of computing TSN schedules and improve schedule generation performance. Finally, we compare the proposed method with the existing SMT solver approach to evaluate its performance.

Analysis of Limits in Applying AP-QoS-based Wi-Fi Slicing for Real-Time Systems

Jin Hyun Kim, Hyonyoung Choi, Gangjin Kim, Yundo Choi, Tae-Won Ban, Se-Hoon Kim

http://doi.org/10.5626/JOK.2021.48.6.723

Network slicing is a new network technology that guarantees the quality of network services according to the application service or user type. Wi-Fi, the IEEE 802.11-based wireless LAN, is the most widely used short-range wireless network and continues to attract growing attention from users. Recently, the use of Wi-Fi by safety-critical IoT devices, such as medical devices, has been increasing drastically. Moreover, enterprises require Wi-Fi network slicing to provide prioritized QoS depending on the customer's service type. This paper analyzes the limits and difficulties of applying AP-QoS-based network slicing to hard real-time systems that demand temporally deterministic streaming services. We define a formal framework to analyze the QoS-providing IEEE 802.11e Enhanced Distributed Channel Access (EDCA), present the worst-case streaming scenarios, and thereby demonstrate why the temporal determinism of network streaming is broken. In addition, NS-3 simulation results of AP-QoS-based network slicing are presented to show these limits and difficulties. Moreover, Wi-Fi network slicing techniques based on the EDCA of AP-QoS for real-time systems are presented in our technical report referenced in this paper.

2-Phase Passage Re-ranking Model based on Neural-Symbolic Ranking Models

Yongjin Bae, Hyun Kim, Joon-Ho Lim, Hyun-ki Kim, Kong Joo Lee

http://doi.org/10.5626/JOK.2021.48.5.501

Previous research on QA systems has focused on extracting exact answers for given questions and passages. However, when the problem is expanded from machine reading comprehension to open-domain question answering, finding the passage that contains the correct answer is as important as machine reading comprehension. DrQA reported that Exact Match@Top1 performance decreased from 69.5 to 27.1 when the QA system included an initial search step. In the present work, we propose a 2-phase passage re-ranking model to improve the performance of the question answering system. The proposed model integrates the results of the symbolic and neural ranking models and re-ranks the passages. The symbolic ranking model was trained with the CatBoost algorithm on manually designed features between the question and passage. The neural model was trained by fine-tuning the KorBERT model. The second-stage model was based on a neural regression model. We maximized the performance by combining ranking models with different characteristics. Finally, the proposed model achieved 85.8% MRR and 82.2% BinaryRecall@Top1 in an evaluation on 1,000 questions, improvements of 17.3% (MRR) and 22.3% (BR@Top1) over the baseline model.
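
For readers unfamiliar with the two reported measures, the sketch below shows how they are commonly computed. It assumes that each question comes with a ranked list of 0/1 labels (1 means the passage at that rank contains the answer) and that BinaryRecall@Top1 means the fraction of questions whose top-ranked passage contains the answer. That reading, and the function names, are assumptions rather than definitions taken from the paper.

```python
def mean_reciprocal_rank(ranked_labels):
    """Average of 1/rank of the first relevant passage; 0 if none is relevant."""
    total = 0.0
    for labels in ranked_labels:
        for rank, label in enumerate(labels, start=1):
            if label:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


def binary_recall_at_1(ranked_labels):
    """Fraction of questions whose top-ranked passage is relevant."""
    hits = sum(1 for labels in ranked_labels if labels and labels[0])
    return hits / len(ranked_labels)


# Four toy questions: first relevant passage at rank 1, rank 2, never, and rank 3.
example = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]

print(round(mean_reciprocal_rank(example), 4))  # 0.4583
print(binary_recall_at_1(example))              # 0.25
```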

A BIT Named Entity Format Suitable for Low Resource Environments

Ho Yoon, Chang-Hyun Kim, Min-ah Cheon, Ho-min Park, Young Namgoong, Min-seok Choi, Jae-kyun Kim, Jae-Hoon Kim

http://doi.org/10.5626/JOK.2021.48.3.293

Named entity recognition (NER) seeks to locate and classify named entities into predefined categories such as person names, organizations, locations, and others. Most named entities consist of more than one word, so many annotated corpora for NER are encoded in the BIO (short for Beginning, Inside, and Outside) format: a “B-” prefix before a tag indicates that the tag is the beginning of a named entity, and an “I-” prefix before a tag indicates that the tag is inside a named entity. An “O” tag indicates that a word belongs to no named entity. In this format, words with “O” tags amount to roughly 90% or more of the words in the corpora and can thus cause two problems: high perplexity of words with “O” tags and imbalanced learning. In this paper, we propose a novel format for representing NER corpora, called the BIT format, which uses “T” (short for POS Tags) tags in place of “O” tags. Experiments show that the BIT format outperforms the BIO format when the meaning projection of the word representation is unreliable, namely, when word embeddings are trained on a relatively small number of words.
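
A small sketch of the conversion described above: every “O” tag is replaced by a tag derived from the word's part of speech, while “B-” and “I-” tags are left unchanged. The exact label layout is not given in the abstract, so the `T-<POS>` form, the Penn-style POS tags in the example, and the function name `bio_to_bit` are assumptions made for illustration.

```python
def bio_to_bit(tokens):
    """Convert (word, pos_tag, bio_tag) triples to (word, bit_tag) pairs.

    Assumption: an "O" tag becomes "T-<POS>"; B-/I- tags are kept as they are.
    """
    converted = []
    for word, pos, tag in tokens:
        new_tag = f"T-{pos}" if tag == "O" else tag
        converted.append((word, new_tag))
    return converted


sentence = [
    ("Barack", "NNP", "B-PER"),
    ("Obama", "NNP", "I-PER"),
    ("visited", "VBD", "O"),
    ("Seoul", "NNP", "B-LOC"),
    (".", ".", "O"),
]

print([tag for _, tag in bio_to_bit(sentence)])
# ['B-PER', 'I-PER', 'T-VBD', 'B-LOC', 'T-.']
```

With this encoding the single dominant “O” class is split into several POS-based classes, which is the property the abstract credits for reducing label imbalance.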

Improvement in Tor-based Dark Web Crawling Performance by Eliminating Web Browser Rendering and Scripting Tasks

Hyunsu Mun, Soohyun Kim, Youngseok Lee

http://doi.org/10.5626/JOK.2020.47.10.1008

The dark web, represented by Tor, has become a place where various illegal services, content, and transactions, such as exchanges of drugs, child pornography, weapons, and contracts, are conducted because of the anonymity guaranteed by the protocol. A Tor-based dark web service requires at least 3 tunneling nodes, which makes Tor-based services 2.2 times slower than the general web. This slow speed makes it difficult to monitor illegal services that open irregularly. Therefore, this paper proposes a method for improving the speed of collecting Tor-based dark web data by removing rendering and scripting tasks and collecting data through the Tor SOCKS5 proxy server. The performance of the existing and proposed crawlers was tested on 651 dark web addresses. By removing rendering and scripting, the collection performance was improved by up to 10.04 times.

Breast Cancer Subtype Classification Using Multi-omics Data Integration Based on Neural Network

Joungmin Choi, Jiyoung Lee, Jieun Kim, Jihyun Kim, Heejoon Chae

http://doi.org/10.5626/JOK.2020.47.9.835

Breast cancer is a highly heterogeneous disease involving multiple biological factors, which gives rise to multiple subtypes. Early diagnosis and accurate subtype prediction of breast cancer play a critical role in the prognosis of cancer and are crucial to providing appropriate treatment for each patient with a different subtype. To identify significant patterns in enormous volumes of genetic and epigenetic data, machine learning-based methods have been adopted for breast cancer subtype classification. Recently, multi-omics data integration has attracted much attention as a promising approach to recognizing complex molecular mechanisms and providing a comprehensive view of patients. However, because of their high dimensionality, multi-omics-based approaches are limited in prediction accuracy. In this paper, we propose a neural network-based breast cancer subtype classification model using multi-omics data integration. The gene expression, DNA methylation, and miRNA omics datasets were integrated after preprocessing, and the neural network classification model was trained on the integrated dataset. Our performance evaluation showed that the proposed model outperforms all other compared methods, providing the highest classification accuracy of 90.45%. We expect this model to be useful in predicting the subtypes of breast cancer and improving patients’ prognosis.

Realtime Video Streaming System over Narrowband Networks

Hyunmin Noh, Seunghwan Lee, Jeung Won Choi, Donghyun Kim, Kyungwoo Kim, Yunsoo Ko, Sangheon Shin, Hyungjun Kim, Hwangjun Song

http://doi.org/10.5626/JOK.2020.47.9.885

In this paper, we propose a real-time video streaming system that provides high-quality video services over narrowband networks. The proposed system uses the Raptor code, a forward error correction code, to support reliable and stable data transmission in narrowband networks. The system also adaptively controls the Raptor parameters (source symbol size, number of source symbols, and code rate) according to the narrowband network condition and the remaining buffer status. The proposed system is fully implemented on Android devices and evaluated using real-time video transmission. Experimental results show that the proposed system provides high-quality streaming services over narrowband networks.

Defining Chunks and Chunking using Its Corpus and Bi-LSTM/CRFs in Korean

Young Namgoong, Chang-Hyun Kim, Min-ah Cheon, Ho-min Park, Ho Yoon, Min-seok Choi, Jae-kyun Kim, Jae-Hoon Kim

http://doi.org/10.5626/JOK.2020.47.6.587

Korean dependency parsing has notorious problems, including the head position problem and the constituent unit problem. Such problems can be partially resolved by chunking. Chunking seeks to locate constituents, referred to as chunks, and classify them into predefined categories. So far, several studies in Korean have been conducted without a fully clear definition of chunks. Thus, we define chunks in Korean thoroughly, build a chunk-tagged corpus based on the definition, and propose a Bi-LSTM/CRF chunking model trained on the corpus. Through experiments, we show that the proposed model achieves an F1-score of 98.54% and can be used for practical applications. We also analyzed performance variations across word embeddings and found that fastText showed the best performance. Error analysis was performed so that its results can be used to improve the proposed model in the near future.

A Recommendation Scheme for an Optimal Pre-processing Permutation Towards High-Quality Big Data Analytics

Seounghyun Kim, Young-Kyoon Suh, Byungchul Tak

http://doi.org/10.5626/JOK.2020.47.3.319

Today, due to the explosive increase in data, research on intelligent services through big data analysis has been actively conducted in various domains. Pre-processing of training data is essential to big data analytics via data mining or machine learning. Although incomplete or inadequate pre-processing of a given dataset can result in unreliable analysis, it is challenging for users to choose the optimal set and sequence of pre-processing functions that leads to the best results. To address this problem, we have designed and implemented a pre-processing evaluation platform that analyzes the performance of various permutations of pre-processing functions for a given user dataset and then recommends the best permutation. Evaluation results using a real-world dataset demonstrate that the recommended pre-processing permutation yields the best performance in terms of accuracy when compared to the worst pre-processing permutation. By applying the proposed method, users can choose the best pre-processing permutation and are thus expected to obtain high-quality big data analysis results.
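
The search over "set and sequence" of pre-processing functions can be pictured as a brute-force loop over ordered subsets. The sketch below is a toy illustration under stated assumptions, not the platform from the paper: the three pre-processing functions, the `toy_score` stand-in (negative variance instead of model accuracy), and the name `recommend` are all hypothetical.

```python
from itertools import permutations
from statistics import pvariance


def drop_missing(rows):
    """Remove missing values."""
    return [x for x in rows if x is not None]


def fill_missing(rows):
    """Replace missing values with the mean of the present ones."""
    present = [x for x in rows if x is not None]
    mean = sum(present) / len(present)
    return [mean if x is None else x for x in rows]


def clip_outliers(rows, upper=10.0):
    """Cap values at `upper`, leaving missing values untouched."""
    return [x if x is None else min(x, upper) for x in rows]


def toy_score(rows):
    """Hypothetical stand-in for model accuracy: higher is better."""
    if any(x is None for x in rows):
        return float("-inf")  # data with missing values is unusable here
    return -pvariance(rows)


def recommend(functions, rows, score):
    """Try every ordered subset of `functions` and return the best (order, score)."""
    best_order, best_score = (), score(rows)
    for size in range(1, len(functions) + 1):
        for order in permutations(functions, size):
            out = rows
            for func in order:
                out = func(out)
            current = score(out)
            if current > best_score:  # strict: ties keep the shorter, earlier order
                best_order, best_score = order, current
    return best_order, best_score


data = [1.0, None, 100.0, 3.0]
order, score = recommend([drop_missing, fill_missing, clip_outliers], data, toy_score)
print([f.__name__ for f in order])  # ['clip_outliers', 'fill_missing']
```

The example shows why order matters: clipping before filling gives a different result from filling before clipping, because the mean used for filling changes. The number of ordered subsets grows factorially with the number of functions, so exhaustive search like this is only practical for a small function set.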


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr