Search : [ keyword: Semi-supervised learning ] (9)

Semi-Supervised Object Detection for Small Imbalanced Drama Dataset

Dojin Kim, Unsang Park

http://doi.org/10.5626/JOK.2024.51.11.978

Images from dramas are typically zoomed in on people. As a result, person-centric images predominate in drama data, and class imbalance naturally arises. This paper addresses this class imbalance in drama data for object detection tasks and proposes various sampling methods to tackle it within a semi-supervised learning framework. Experimental evaluations demonstrated that the proposed semi-supervised learning approach with specialized sampling methods outperformed conventional supervised and semi-supervised methods. This study underscores the importance of selecting appropriate training data and sampling methods to optimize object detection performance on specialized datasets with unique characteristics.
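
The abstract does not detail the paper's specific sampling methods; as a hedged illustration of the general idea, the sketch below oversamples images that contain rare classes (the function name and the inverse-frequency weighting are assumptions for illustration, not the authors' method):

```python
import random
from collections import Counter

def class_balanced_sample(image_labels, n_samples, seed=0):
    """Draw a training sample that favors images containing rare classes.

    image_labels: one set of class names per image.
    Each image is weighted by the inverse frequency of its rarest class,
    so person-only frames are down-weighted in person-dominated data.
    """
    rng = random.Random(seed)
    freq = Counter(c for labels in image_labels for c in labels)
    weights = [
        max(1.0 / freq[c] for c in labels) if labels else 0.0
        for labels in image_labels
    ]
    return rng.choices(range(len(image_labels)), weights=weights, k=n_samples)

# Toy "drama" data: the person class dominates.
data = [{"person"}] * 8 + [{"person", "car"}, {"dog"}]
picked = class_balanced_sample(data, 1000)
```

With this weighting, the two images containing rare classes (car, dog) are drawn far more often than their 20% share of the data would suggest.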

An Object Pseudo-Label Generation Technique based on Self-Supervised Vision Transformer for Improving Dataset Quality

Dohyun Kim, Jiwoong Jeon, Seongtaek Lim, Hongchul Lee

http://doi.org/10.5626/JOK.2024.51.1.49

Instance segmentation is one of the most important vision tasks: it localizes objects with bounding boxes and classifies every pixel in an image. Training an instance segmentation model requires datasets with labels for objects of various sizes. However, the recently released "Image for Small Object Detection" dataset leaves many large, common objects unlabeled, causing potential performance degradation. In this paper, we improve dataset quality by generating pseudo-labels for such general objects using an unsupervised pseudo-labeling methodology based on a self-supervised vision transformer. Specifically, small object detection performance improved by +2.54 AP over the original dataset. Moreover, we demonstrated a performance gain using only a small amount of data. These results confirm that the proposed method improves the quality of the dataset.
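
As a hedged sketch of the pseudo-labeling idea (the paper's vision-transformer-based scoring is replaced here by a plain confidence threshold; the IoU duplicate filter, threshold values, and function names are illustrative assumptions):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def augment_annotations(gt_boxes, predictions, score_thresh=0.9, iou_thresh=0.5):
    """Keep high-confidence predicted boxes that do not duplicate an
    existing ground-truth label, and add them as pseudo-labels."""
    return [
        p for p in predictions
        if p["score"] >= score_thresh
        and all(iou(p["box"], g) < iou_thresh for g in gt_boxes)
    ]

gt = [(10, 10, 200, 180)]  # the only existing annotation
preds = [
    {"box": (12, 11, 198, 178), "label": "person", "score": 0.95},  # duplicate
    {"box": (220, 20, 300, 120), "label": "sofa", "score": 0.93},   # unlabeled object
    {"box": (5, 40, 60, 90), "label": "cup", "score": 0.42},        # low confidence
]
added = augment_annotations(gt, preds)
```

Only the confident, non-duplicate "sofa" box is added to the dataset, which mirrors the goal of labeling the large common objects the original annotations missed.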

FedGC: Global Consistency Regularization for Federated Semi-supervised Learning

Gubon Jeong, Dong-Wan Choi

http://doi.org/10.5626/JOK.2022.49.12.1108

Recently, in the field of artificial intelligence, methods of training neural network models in distributed environments with sufficient data and hardware have been actively studied. Among them, federated learning, which guarantees privacy preservation without sharing data, has become the dominant scheme. However, existing federated learning methods assume supervised learning using only labeled data. Since supervised learning incurs labeling costs, the assumption that clients hold only labeled data is unrealistic. Therefore, this study proposes a federated semi-supervised learning method that uses both labeled and unlabeled data, considering the more realistic situation where labeled data exist only on the server and unlabeled data on the clients. We designed a loss function with a consistency regularization term between the output distributions of the server and client models and analyzed how to adjust the influence of this regularization. The proposed method improved on existing semi-supervised learning methods in federated learning settings, and through additional experiments we analyzed the influence of the loss term and verified the validity of the proposed method.
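
A minimal sketch of what such a consistency term between output distributions can look like (the paper's exact loss is not given in this abstract; the KL form, the lambda weighting, and the function names below are assumptions for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(client_logits, server_logits):
    """Mean KL(server || client) between the two output distributions,
    one plausible form of a global consistency regularizer."""
    p = softmax(np.asarray(server_logits, dtype=float))
    q = softmax(np.asarray(client_logits, dtype=float))
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

def client_loss(sup_loss, client_logits, server_logits, lam=0.5):
    """Supervised loss plus a tunable consistency term; lam controls the
    influence of the regularization, as the paper analyzes."""
    return sup_loss + lam * consistency_loss(client_logits, server_logits)

server = np.array([[2.0, 0.0, -1.0]])
client = np.array([[0.0, 2.0, -1.0]])
reg = consistency_loss(client, server)   # > 0 when the distributions differ
```

The term vanishes when client and server agree, so lam directly trades off local fitting against global consistency.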

Semi-Supervised Learning Exploiting Robust Loss Function for Sparse Labeled Data

Youngjun Ahn, Kyuseok Shim

http://doi.org/10.5626/JOK.2021.48.12.1343

This paper proposes a semi-supervised learning method that uses data augmentation and a robust loss function when labeled data are extremely sparse. Existing semi-supervised learning methods augment unlabeled data and use one-hot vector labels predicted by the current model when the confidence of the prediction is high. Since this discards low-confidence data, a recent work has incorporated low-confidence data into training by utilizing a robust loss function. However, when labeled data are extremely sparse, predictions can be incorrect even when their confidence is high. In this paper, we propose to improve classification performance in this setting by using the predicted probability distribution, instead of a one-hot vector, as the label. Experiments show that the proposed method improves the performance of a classification model.
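
The core idea, using the full predicted distribution rather than its one-hot argmax as the target, can be sketched as follows (a minimal illustration with hypothetical numbers; the paper's exact robust loss and augmentation pipeline are not reproduced here):

```python
import numpy as np

def soft_label_cross_entropy(pred_probs, target_probs):
    """Cross-entropy against a full probability distribution as the target,
    instead of the argmax one-hot vector."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    return float(-(target_probs * np.log(pred_probs + 1e-12)).sum(axis=-1).mean())

# Model prediction on an augmented unlabeled example (hypothetical numbers).
teacher = np.array([[0.6, 0.3, 0.1]])   # used directly as a soft label
student = np.array([[0.5, 0.4, 0.1]])

soft_loss = soft_label_cross_entropy(student, teacher)
hard_loss = soft_label_cross_entropy(student, np.eye(3)[teacher.argmax(axis=-1)])
```

Unlike the one-hot target, the soft target preserves the teacher's uncertainty, so a confidently wrong argmax does not fully dominate the gradient.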

A Study on the Method for Automatically Constructing a Domain-Specific Sentiment Lexicon Based on Lexical Relations and Contextual Information

Sangmin Park, Byung-Won On

http://doi.org/10.5626/JOK.2020.47.10.926

A sentiment lexicon is a set of sentiment words, each with a sentiment polarity, and is a basic resource for sentiment analysis. However, the meaning of some words can differ, or even disappear, across domains, so many sentiment words depend on a specific domain. For example, the verb phrase "slept well" usually has a negative meaning, while it has a positive meaning in the movie domain. Thus, given a particular domain such as hotels, the sentiment lexicon should be constructed so that domain-dependent words reflect the meaning of that domain. Using a domain-dependent sentiment lexicon yields more accurate results than existing sentiment lexicons that do not consider domain-dependent words. Various studies have addressed building domain-dependent sentiment lexicons, but they have many limitations, including human intervention and the use of local rather than contextual information. In this paper, we propose a novel method of automatically constructing a domain-dependent sentiment lexicon based on global and contextual information and an existing sentiment lexicon (i.e., the KNU sentiment lexicon, GloVe vectors, and conjunction relations).
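
As a small illustration of how conjunction relations can propagate polarity from a seed lexicon to domain words (a hypothetical sketch only; the authors' full method also uses KNU lexicon entries and GloVe-style contextual vectors):

```python
def propagate_polarity(seed_lexicon, conjunction_pairs):
    """Spread polarity through conjunction relations: words joined by
    'and' share polarity, while 'but' flips it."""
    lexicon = dict(seed_lexicon)
    changed = True
    while changed:
        changed = False
        for w1, conj, w2 in conjunction_pairs:
            flip = -1 if conj == "but" else 1
            if w1 in lexicon and w2 not in lexicon:
                lexicon[w2] = flip * lexicon[w1]
                changed = True
            elif w2 in lexicon and w1 not in lexicon:
                lexicon[w1] = flip * lexicon[w2]
                changed = True
    return lexicon

seed = {"good": 1}                       # +1 positive, -1 negative
pairs = [("good", "and", "cozy"),        # "good and cozy" -> same polarity
         ("cozy", "but", "noisy")]       # "cozy but noisy" -> opposite
lex = propagate_polarity(seed, pairs)
```

Each pass extends the lexicon from already-known words, so polarity reaches words several conjunction hops away from the seed.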

Semi-Supervised Learning for Detecting Abusive Sentences on Twitter Using a Deep Neural Network with Fuzzy Category Representation

Da-Sol Park, Jeong-Won Cha

http://doi.org/10.5626/JOK.2018.45.11.1185

The number of people suffering damage from hate speech on SNS (Social Network Services) is increasing rapidly. Most existing methods judge whether a sentence is hate speech by comparing it against a blacklist of hate-speech words, and thus fail to identify skillful and subtle abusive expressions. In this paper, we propose a detection method using semi-supervised learning and a deep neural network that determines whether a Twitter sentence is abusive, including implicitly abusive sentences that go beyond simple dictionary matching. To this end, we built a corpus of Korean Twitter sentences labeled as hate speech or not: a training set of 44,000 sentences and a test set of 13,082 sentences. For explicitly abusive sentences, the F1 score was 86.13% with a model using a 1-layer syllable CNN and sequence vectors. For implicitly abusive sentences, the F1 score was 25.53% with a model using 1-layer and 2-layer syllable CNNs and sequence vectors. The proposed method can be used to detect cyber-bullying.

A Named-Entity Recognition Training Method Using Bagging-Based Bootstrapping

Yujin Jeong, Juae Kim, Youngjoong Ko, Jungyun Seo

http://doi.org/10.5626/JOK.2018.45.8.825

Most previous named-entity (NE) recognition studies have been based on supervised learning methods. Although supervised NE recognition performs well, constructing a large labeled corpus requires considerable time and cost. In this paper, we propose an NE recognition training method that uses an automatically generated labeled corpus to solve this problem. Since the proposed method uses a large machine-labeled corpus, it can greatly reduce the time and cost of generating a labeled corpus manually. In addition, a bagging-based bootstrapping technique is applied to correct errors in the machine-labeled data. Experimental results show that the proposed method achieves the highest F1 score of 70.76% when the bagging-based bootstrapping technique is added, which is 5.17%p higher than that of the baseline system.
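
A minimal sketch of the bagging-based correction idea, assuming an ensemble of taggers trained on bootstrap resamples of the machine-labeled corpus and a per-token majority vote (the function names and toy taggers below are illustrative, not the authors' models):

```python
import random
from collections import Counter

def bootstrap_bag(corpus, seed):
    """One bootstrap resample of the machine-labeled corpus; in the full
    method, a separate tagger would be trained on each bag."""
    rng = random.Random(seed)
    return [rng.choice(corpus) for _ in corpus]

def majority_relabel(taggers, sentence):
    """Relabel each token by majority vote over the ensemble, filtering
    errors left in the machine-labeled data."""
    return [
        Counter(tagger(tok) for tagger in taggers).most_common(1)[0][0]
        for tok in sentence
    ]

# Toy taggers standing in for models trained on different bags.
taggers = [
    lambda t: "B-PER" if t[0].isupper() else "O",
    lambda t: "B-PER" if t == "Yujin" else "O",
    lambda t: "O",
]
labels = majority_relabel(taggers, ["Yujin", "proposed", "bagging"])
```

A single tagger's mistake is outvoted as long as most bags lead to the correct tag, which is how the ensemble cleans the machine-labeled data.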

Graph Construction Based on Fast Low-Rank Representation in Graph-Based Semi-Supervised Learning

Byonghwa Oh, Jihoon Yang

http://doi.org/10.5626/JOK.2018.45.1.15

Low-Rank Representation (LRR)-based methods are widely used in practical applications such as face clustering and object detection because they guarantee high prediction accuracy when used to construct graphs in graph-based semi-supervised learning. However, solving the LRR problem requires a singular value decomposition of a square matrix whose size is the number of data points at every iteration of the algorithm, which is computationally inefficient. To solve this problem, we propose an improved, faster LRR method based on the recently published Fast LRR (FaLRR) and suggest ways to introduce and optimize additional constraints on the underlying optimization objective, addressing the fact that FaLRR is fast but performs poorly on classification problems. Our experiments confirm that the proposed method finds a better solution than LRR does. We also propose Fast MLRR (FaMLRR), which shows better results when the additional minimization objective is added.
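
The per-iteration cost that motivates FaLRR comes from singular value thresholding, the proximal step for the nuclear norm, which standard LRR solvers apply to an n x n matrix at every iteration (a generic sketch of that step, not the FaLRR algorithm itself):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: SVD the matrix, shrink the singular
    values by tau, and reconstruct. This O(n^3) step is what LRR solvers
    repeat on an n x n coefficient matrix at every iteration."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

M = np.diag([3.0, 1.0, 0.5])
A = svt(M, 1.0)   # singular values shrink from (3, 1, 0.5) to (2, 0, 0)
```

Shrinking small singular values to zero is what produces the low-rank coefficient matrix used to build the graph; avoiding the full SVD on the n x n matrix is precisely where FaLRR-style methods gain speed.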

A Label Inference Algorithm Considering Vertex Importance in Semi-Supervised Learning

Byonghwa Oh, Jihoon Yang, Hyun-Jin Lee

http://doi.org/

Semi-supervised learning is an area of machine learning that employs both labeled and unlabeled data to train a model and can improve prediction performance compared to supervised learning. Graph-based semi-supervised learning has recently come into focus, with two phases: graph construction, which converts the input data into a graph, and label inference, which predicts appropriate labels for unlabeled data using the constructed graph. The inference relies on the smoothness assumption of semi-supervised learning. In this study, we propose an enhanced label inference algorithm that incorporates the importance of each vertex. In addition, we prove the convergence of the proposed algorithm and verify its effectiveness.
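
A hedged sketch of importance-aware label inference in the familiar F <- alpha * S F + (1 - alpha) * Y style (the specific way the paper incorporates vertex importance is not given in this abstract; scaling edge weights by an importance score is an illustrative assumption):

```python
import numpy as np

def propagate_labels(W, Y, importance, alpha=0.9, iters=200):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y, with the symmetric
    normalized matrix S built from edge weights scaled by per-vertex
    importance (the scaling scheme here is a hypothetical choice)."""
    Wv = W * np.outer(importance, importance)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(Wv.sum(axis=1), 1e-12))
    S = Wv * np.outer(d_inv_sqrt, d_inv_sqrt)
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)

# Chain 0-1-2-3: vertex 0 labeled class 0, vertex 3 labeled class 1.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
Y = np.array([[1., 0.], [0., 0.], [0., 0.], [0., 1.]])
labels = propagate_labels(W, Y, importance=np.ones(4))
```

Because the spectral radius of alpha * S is below 1 for alpha < 1, the iteration converges to a fixed point, which is the kind of convergence guarantee the abstract refers to.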


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr