Search : [ keyword: dataset ] (6)

An Object Pseudo-Label Generation Technique based on Self-Supervised Vision Transformer for Improving Dataset Quality

Dohyun Kim, Jiwoong Jeon, Seongtaek Lim, Hongchul Lee

http://doi.org/10.5626/JOK.2024.51.1.49

Image segmentation is one of the most important tasks in computer vision. It localizes objects with bounding boxes and classifies the pixels in an image. Training an instance segmentation model that performs well requires datasets with labels for objects of various sizes. However, the recently released "Image for Small Object Detection" dataset lacks labels for large and common objects, which can degrade performance. In this paper, we address this problem by improving dataset quality: we generate pseudo-labels for general objects using an unsupervised learning-based pseudo-labeling methodology. Specifically, small object detection performance improved by 2.54 AP compared to the original dataset. Moreover, we demonstrated a performance gain using only a small amount of data. These results confirm that the proposed method improves the quality of the dataset.

Vision-based Position Deviation Fault Injection Method for Building a Collaborative Robot Motion Fault Dataset

Donghee Yun, Dongyeon Yoo, Jungwon Lee

http://doi.org/10.5626/JOK.2023.50.9.795

Data-based fault detection, which collects data from internal and external sensors in real time and predicts faults, is being applied to collaborative robots, which are key facilities in smart factories. This approach requires a large amount of training data, and in particular a large amount of data labeled as a fault state. However, it is difficult to obtain large amounts of actual fault data in industrial settings. Therefore, in this study, the output of a collaborative robot in the fault state, as observed by a vision sensor, was analyzed and compared with the output in the normal state, and a fault injection method was proposed based on the deviation between the analyzed output signals. Collaborative robot data collected in the actual fault state could be replaced with data collected in the proposed fault injection state. A comparison of the model trained with fault injection data and the model trained with actual fault data showed almost no difference in performance, with average accuracies of 0.97 and 0.98, thus verifying the effectiveness of the proposed fault injection method.

Epoch Score: Dataset Verification using Quantitative Data Quality Assessment

Sungryeol Kim, Taewook Hwang, Sangkeun Jung, Yoonhyung Roh

http://doi.org/10.5626/JOK.2023.50.3.250

It is difficult to determine whether a dataset is suitable for a given model or field, or whether it contains errors. In this paper, we propose the Epoch Score, which expresses the difficulty of each data item as a score, using the incorrectly answered data obtained by training several times under the same conditions but with different seeds. Using this score, we verified KLUE's Topic Classification dataset and obtained a performance improvement of about 0.8% by correcting high-scoring data, which we judged to contain errors. The Epoch Score can be used for any supervised learning task regardless of the data type, such as natural language or images, and the performance of the model can be inferred from the area of the Epoch Score.
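
The abstract does not give the exact Epoch Score formula, so the following is only a minimal sketch of the general idea it describes: train several times with different seeds, record which items are answered incorrectly, and turn that into a per-item difficulty score. The function name `misclassification_score` and the choice of "fraction of runs answered incorrectly" as the score are assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def misclassification_score(predictions, labels):
    """Hypothetical per-item difficulty score (not the paper's formula).

    predictions: array of shape (n_runs, n_samples), one row per seed.
    labels:      array of shape (n_samples,), the dataset labels.
    Returns the fraction of runs in which each item was answered incorrectly.
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    wrong = predictions != labels      # labels broadcast across runs
    return wrong.mean(axis=0)          # average over runs, per item

# Toy data: 3 runs (seeds), 4 items.
labels = np.array([0, 1, 1, 0])
predictions = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
])

scores = misclassification_score(predictions, labels)
# Items sorted from most to least frequently misclassified;
# the top of this list is where one would look for label errors first.
suspects = np.argsort(-scores, kind="stable")
```

Items that every run gets wrong are candidates for manual inspection, which matches the abstract's step of correcting high-scoring data judged to contain errors.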

RDID-GAN: Reconstructing a De-identified Image Dataset to Generate Effective Learning Data

Wonseok Oh, Kangmin Bae, Yuseok Bae

http://doi.org/10.5626/JOK.2021.48.12.1329

Recently, CCTVs have been installed to prevent or handle various social problems, and there have been many efforts to develop visual surveillance systems based on deep neural networks. However, the datasets collected from CCTVs are unsuitable for training models due to privacy issues. Therefore, in this paper, we propose RDID-GAN, an effective dataset de-identification method that removes privacy issues while mitigating the negative effects caused by modifying the dataset through a de-identification procedure. RDID-GAN adopts an attention module to focus on the de-identified region and produce competitive results. Through experiments, we compared RDID-GAN with conventional image-to-image translation models both qualitatively and quantitatively.

Korean Machine Reading Comprehension with S²-Net

Cheoneum Park, Changki Lee, Sulyn Hong, Yigyu Hwang, Taejoon Yoo, Hyunki Kim

http://doi.org/10.5626/JOK.2018.45.12.1260

Machine reading comprehension is the task of understanding a given context and identifying the right answer within that context. The simple recurrent unit (SRU) addresses the vanishing gradient problem of recurrent neural networks (RNNs) by using neural gates, as the gated recurrent unit (GRU) does, and removes the previous hidden state from the gate input to improve speed. The self-matching network, which is used in R-Net, has an effect similar to coreference resolution because it can capture similar semantic context information by calculating attention weights over its own RNN sequence. In this paper, we propose an S²-Net model that adds a self-matching layer to an encoder built from stacked SRUs, and we construct a Korean machine reading comprehension dataset. Experimental results show that the proposed S²-Net model achieves an EM of 70.81% and an F1 of 82.48% on Korean machine reading comprehension.
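
To illustrate why removing the previous hidden state from the gate input improves speed, here is a minimal NumPy sketch of a single SRU layer. It assumes the commonly cited SRU formulation (forget gate, reset gate, and a highway connection); the variant actually used in S²-Net may differ, and the function and parameter names are chosen for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, W_f, b_f, W_r, b_r):
    """One SRU layer over a sequence (assumed formulation, for illustration).

    x: float array of shape (T, d); weights are (d, d), biases are (d,).
    Returns hidden states of shape (T, d).
    """
    # The gates depend only on the input x_t, not on h_{t-1}, so all
    # matrix products can be computed for every time step at once.
    x_tilde = x @ W
    f = sigmoid(x @ W_f + b_f)   # forget gate
    r = sigmoid(x @ W_r + b_r)   # reset (highway) gate

    c = np.zeros(x.shape[1])
    h = np.empty_like(x)
    # Only cheap element-wise operations remain inside the recurrence.
    for t in range(x.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h

# Toy usage with random inputs and weights.
rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.standard_normal((T, d))
params = [rng.standard_normal(s) * 0.1 for s in [(d, d), (d, d), (d,), (d, d), (d,)]]
h = sru_layer(x, *params)
```

In a GRU, each gate needs a matrix product with the previous hidden state, which forces the expensive computation to run step by step. In this sketch the loop contains no matrix multiplication at all.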

A Re-configuration Scheme for Social Network Based Large-scale SMS Spam

Sihyun Jeong, Giseop Noh, Hayoung Oh, Chong-Kwon Kim

http://doi.org/

The Short Message Service (SMS) is one of the most popular communication tools in the world. As the cost of SMS has decreased, SMS spam has grown substantially. Although there are many existing studies on SMS spam detection, researchers are commonly limited in their ability to collect users' private SMS contents. Because intelligent spammers are aware of social networks, researchers need to gather information about the social network as well as personal SMS messages. Therefore, this paper proposes the Social network Building Scheme for SMS spam detection (SBSS), an algorithm that builds a realistic synthetic social network dataset without collecting private information. We also analyze and categorize the attack types of SMS spam in order to build a more complete and realistic social network dataset that includes SMS spam.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr