A Dimension Reduction Method for Unsupervised Outlier Detection in High Dimensional Data

Cheong Hee Park

http://doi.org/10.5626/JOK.2022.49.7.537

Among outlier detection methods, Isolation Forest is known to be highly effective. However, it is difficult to apply to high dimensional data because the data are sparse and only a limited number of attributes can be selected for node partitioning. In this paper, we propose a dimension reduction method for unsupervised outlier detection in high dimensional data. Dimension reduction is performed by a linear transformation that maximizes kurtosis, and Isolation Forest is then applied in the transformed space. Kurtosis is a statistical measure that can be interpreted as the degree to which outliers are present in a distribution. The linear transformation is found using a simple one-layer neural network whose input features are the subset of features with the highest kurtosis and whose objective function maximizes kurtosis at the output nodes. Experimental results on text data demonstrate the high detection performance of Isolation Forest modeled in the space transformed by the proposed dimension reduction method.
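
The two steps described in the abstract, selecting the highest-kurtosis features and learning a linear map that maximizes kurtosis at the output, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a single output node, a hand-written gradient-ascent loop with step halving in place of the one-layer neural network training, and synthetic Gaussian data instead of text data. The function names (`kurtosis`, `select_top_kurtosis`, `fit_projection`) and all parameter values are hypothetical.

```python
import numpy as np

def kurtosis(X):
    """Fourth standardized moment along axis 0 (non-excess: about 3 for a Gaussian)."""
    Xc = X - X.mean(axis=0)
    m2 = (Xc ** 2).mean(axis=0)
    m4 = (Xc ** 4).mean(axis=0)
    return m4 / np.maximum(m2 ** 2, 1e-12)

def select_top_kurtosis(X, k):
    """Indices of the k columns with the highest kurtosis."""
    return np.argsort(kurtosis(X))[::-1][:k]

def fit_projection(X, n_steps=100, lr=0.5):
    """Unit vector w that (locally) maximizes the kurtosis of y = Xc @ w.

    Assumption: one output node only. A step is accepted only if it
    improves the objective; otherwise the step size is halved.
    """
    Xc = X - X.mean(axis=0)
    n, d = Xc.shape
    w = np.zeros(d)
    w[np.argmax(kurtosis(Xc))] = 1.0          # start from the best single feature
    best = kurtosis(Xc @ w)
    for _ in range(n_steps):
        y = Xc @ w
        m2 = (y ** 2).mean()
        m4 = (y ** 4).mean()
        # Gradient of m4 / m2**2 with respect to w.
        grad = 4 * (Xc.T @ y ** 3) / (n * m2 ** 2) - 4 * m4 * (Xc.T @ y) / (n * m2 ** 3)
        norm = np.linalg.norm(grad)
        if norm < 1e-12:
            break
        cand = w + lr * grad / norm
        cand /= np.linalg.norm(cand)          # kurtosis is scale-invariant
        score = kurtosis(Xc @ cand)
        if score > best:
            w, best = cand, score
        else:
            lr *= 0.5
    return w

# Synthetic stand-in for high dimensional data: 300 samples, 20 features,
# with 5 outlying rows that deviate in features 3 and 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
X[:5, 3] += 8.0
X[:5, 7] += 8.0

idx = select_top_kurtosis(X, k=2)             # input features for the projection
X_sel = X[:, idx]
w = fit_projection(X_sel)
y = (X_sel - X_sel.mean(axis=0)) @ w          # one-dimensional transformed data
```

In the pipeline the abstract describes, an Isolation Forest would then be fit on the transformed data (for example `sklearn.ensemble.IsolationForest` from scikit-learn, which is an assumption here, not something the abstract specifies). With several output nodes the transformed space would have more than one dimension; the sketch keeps a single node for brevity.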


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr