A Study on Development of Technology to Improve Imbalanced Data Problems in Numerical Dataset Using Tomek Links Method combined with Balancing GAN 


Vol. 47, No. 10, pp. 974-984, Oct. 2020
DOI: 10.5626/JOK.2020.47.10.974



  Abstract

Machine learning performs well and is applied in many fields, including data classification, voice recognition, and predictive modeling. However, class imbalance in the training dataset degrades classification performance on the minority class. In this paper, we propose a new data augmentation method that combines Balancing GAN and the Tomek Links method to address the imbalanced data problem and find a clear decision boundary. To verify the proposed method, we evaluated its performance by classification model on five datasets and compared it with data sampling and GAN-based data augmentation techniques. In 17 of the 25 performance evaluations, classification performance was maintained or improved by 0.05~0.195. The proposed method shows potential as a new approach to the imbalanced data problem.
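
For readers unfamiliar with Tomek links, the following is a minimal sketch of the cleaning step only, not the authors' implementation. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; removing one member of each pair tends to sharpen the class boundary. The abstract does not say how this step is combined with Balancing GAN, so the GAN part is omitted here. The function names, the toy data, and the choice to drop the majority-class member of each link are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def find_tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link (assumed policy)."""
    links = find_tomek_links(points, labels)
    drop = {k for pair in links for k in pair if labels[k] == majority_label}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: class 0 is the majority, class 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0), (1.05, 1.0), (5.0, 5.0)]
labels = [0, 0, 0, 0, 1, 1]

links = find_tomek_links(points, labels)
clean_points, clean_labels = remove_majority_in_links(points, labels, majority_label=0)
```

In this toy data, the samples at indices 2 and 4 are mutual nearest neighbors with different labels, so they form the only Tomek link, and the majority-class sample at index 2 is removed. In practice, a library implementation such as the `TomekLinks` under-sampler in imbalanced-learn would normally be preferred over a quadratic-time sketch like this one.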




  Cite this article

[IEEE Style]

H. Na, S. Park, D. Choi, "A Study on Development of Technology to Improve Imbalanced Data Problems in Numerical Dataset Using Tomek Links Method combined with Balancing GAN," Journal of KIISE, JOK, vol. 47, no. 10, pp. 974-984, 2020. DOI: 10.5626/JOK.2020.47.10.974.


[ACM Style]

Hyunsik Na, Sohee Park, and Daeseon Choi. 2020. A Study on Development of Technology to Improve Imbalanced Data Problems in Numerical Dataset Using Tomek Links Method combined with Balancing GAN. Journal of KIISE, JOK, 47, 10, (2020), 974-984. DOI: 10.5626/JOK.2020.47.10.974.


[KCI Style]

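Hyunsik Na, Sohee Park, Daeseon Choi, "A Technique for Improving the Imbalanced Data Problem in Numerical Datasets by Combining the Tomek Links Method and Balancing GAN," Journal of KIISE, vol. 47, no. 10, pp. 974~984, 2020. DOI: 10.5626/JOK.2020.47.10.974.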









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
