Multidimensional Subset-based Systems for Bias Elimination Within Binary Classification Datasets

KyeongSu Byun; Goo Kim; Joonho Kwon

Vol. 50, No. 5, pp. 383-394, May 2023

10.5626/JOK.2023.50.5.383

improve fairness

data fairness

artificial intelligence fairness

Data

bias

data bias

Abstract

As artificial intelligence technology advances, fairness issues related to artificial intelligence are drawing increasing attention. Many studies have addressed these issues, but most have focused on developing models and training methods. Research on removing the bias present in training data, which is a fundamental cause of unfairness, remains insufficient. In this paper, we therefore design and implement a system that divides the biases within the data into label biases and subgroup biases and removes them to generate datasets with improved fairness. The proposed system consists of two steps: (1) subset generation and (2) bias removal. First, the subset generator divides the existing data into subsets formed by combinations of values in the dataset. Each subset is then divided into a dominant group and a weak group based on fairness indicator values obtained by validating the existing dataset against a validation dataset. Next, the bias remover reduces the bias in each subset by repeatedly extracting from the dominant group of that subset and verifying the result, so as to reduce the difference from the weak group. Finally, the debiased subsets are merged and a fair dataset is returned. The fairness indicators used for verification are the F1 score and equalized odds. Comprehensive experiments using the real-world Census Income, COMPAS, and bank marketing datasets as verification data showed that the proposed system outperformed the existing technique, yielding a better fairness improvement rate and higher accuracy for most machine learning algorithms.
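
The subset generation step and the two fairness indicators named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`make_subsets`, `rates`, `f1_score`, `equalized_odds_gap`), the toy records, and the attribute names `sex` and `race` are all hypothetical. The equalized odds measure is assumed here to be the larger of the true-positive-rate gap and the false-positive-rate gap between groups, which is one common formulation. The iterative bias remover is not shown.

```python
from collections import defaultdict

def make_subsets(rows, attributes):
    """Group rows by the combination of values they take on `attributes`."""
    subsets = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attributes)
        subsets[key].append(row)
    return dict(subsets)

def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels; a rate is 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def equalized_odds_gap(y_true, y_pred, groups):
    """Largest TPR or FPR difference across groups (assumed formulation)."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    tprs, fprs = zip(*(rates(t, p) for t, p in by_group.values()))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Hypothetical toy records: two attributes, a true label, and a model prediction.
rows = [
    {"sex": "M", "race": "A", "label": 1, "pred": 1},
    {"sex": "M", "race": "A", "label": 0, "pred": 0},
    {"sex": "M", "race": "B", "label": 1, "pred": 1},
    {"sex": "M", "race": "B", "label": 0, "pred": 1},
    {"sex": "F", "race": "A", "label": 1, "pred": 0},
    {"sex": "F", "race": "A", "label": 0, "pred": 0},
    {"sex": "F", "race": "B", "label": 1, "pred": 1},
    {"sex": "F", "race": "B", "label": 0, "pred": 0},
]

subsets = make_subsets(rows, ["sex", "race"])
labels = [r["label"] for r in rows]
preds = [r["pred"] for r in rows]
gap = equalized_odds_gap(labels, preds, [r["sex"] for r in rows])
f1 = f1_score(labels, preds)
```

In this toy data, grouping on two attributes yields four subsets, and the equalized odds gap is computed between the two values of `sex`. A system like the one described would compare such indicator values across groups within each subset to decide which group is dominant and which is weak.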

Cite this article

[IEEE Style]

K. Byun, G. Kim, J. Kwon, "Multidimensional Subset-based Systems for Bias Elimination Within Binary Classification Datasets," Journal of KIISE, JOK, vol. 50, no. 5, pp. 383-394, 2023. DOI: 10.5626/JOK.2023.50.5.383.

[ACM Style]

KyeongSu Byun, Goo Kim, and Joonho Kwon. 2023. Multidimensional Subset-based Systems for Bias Elimination Within Binary Classification Datasets. Journal of KIISE, JOK, 50, 5, (2023), 383-394. DOI: 10.5626/JOK.2023.50.5.383.

[KCI Style]

KyeongSu Byun, Goo Kim, Joonho Kwon, "Multidimensional Subset-based Systems for Bias Elimination Within Binary Classification Datasets," Journal of KIISE, Vol. 50, No. 5, pp. 383-394, 2023. DOI: 10.5626/JOK.2023.50.5.383.

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal