Search : [ keyword: 상관관계 (correlation) ] (7)

Measuring Anonymized Data Utility through Correlation Indicator

Yongki Hong, Gihyuk Ko, Heedong Yang, Chanho Ryu, Seung Hwan Ryu

http://doi.org/10.5626/JOK.2023.50.12.1163

As we transition into an artificial intelligence-driven society, data collection and utilization are progressing rapidly. Consequently, technologies and privacy models are emerging that convert original data into anonymized data while ensuring that privacy guidelines are not violated. Notably, privacy models such as k-anonymity, l-diversity, and t-closeness are in active use. Depending on the purpose of the data, the situation, and the required degree of privacy, it is crucial to choose appropriate models and parameters. Ideally, data utility should be maximized while the privacy conditions are still met; this process is called Privacy-Preserving Data Publishing (PPDP). To achieve it, both utility and privacy indicators must be considered. This paper introduces a new utility indicator, the Effect Size Average Cost, which can assist privacy administrators in efficiently creating anonymized data. The indicator measures the change in correlation between quasi-identifiers and sensitive attributes. In this study, we computed and compared the indicator on tables to which k-anonymity, l-diversity, and t-closeness were applied respectively. The results show significant differences in the Effect Size Average Cost across the three cases, indicating that the indicator can serve as a valid basis for deciding which privacy model to adopt.
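
The paper's exact definition of the indicator is not reproduced here, but the underlying idea, measuring how much the association between quasi-identifiers and a sensitive attribute changes after anonymization, can be sketched as follows. In this hypothetical Python illustration the choice of effect size measure (Cramér's V) and the simple averaging over quasi-identifiers are assumptions, not necessarily the paper's formulation.

    # Sketch: average absolute change in effect size (Cramér's V) between each
    # quasi-identifier and the sensitive attribute, original vs. anonymized table.
    # Hypothetical illustration; the paper's Effect Size Average Cost may differ.
    import numpy as np
    import pandas as pd
    from scipy.stats import chi2_contingency

    def cramers_v(x: pd.Series, y: pd.Series) -> float:
        """Effect size of the association between two categorical columns."""
        table = pd.crosstab(x, y)
        chi2 = chi2_contingency(table)[0]
        n = table.values.sum()
        r, k = table.shape
        return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

    def effect_size_average_cost(original: pd.DataFrame, anonymized: pd.DataFrame,
                                 quasi_identifiers: list, sensitive: str) -> float:
        """Mean absolute change in effect size over all quasi-identifiers."""
        changes = [abs(cramers_v(original[q], original[sensitive])
                       - cramers_v(anonymized[q], anonymized[sensitive]))
                   for q in quasi_identifiers]
        return float(np.mean(changes))

Under this reading, a lower value means anonymization preserved the quasi-identifier-to-sensitive-attribute correlations better, i.e., higher utility.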

Predicting Significant Blood Marker Values for Pressure Ulcer Forecasting Utilizing Feature Minimization and Selection

Yeonhee Kim, Hoyoul Jung, Jang-Hwan Choi

http://doi.org/10.5626/JOK.2023.50.12.1054

Pressure ulcers are difficult to treat once they occur, and the treatment process incurs large economic costs. Predicting their occurrence is therefore important both for reducing patient suffering and economically. In this study, the correlation between lab codes (features) obtained from blood tests of patients with spinal cord injury and pressure ulcers was analyzed to provide meaningful feature information for pressure ulcer prediction. We compare and analyze the Pearson, Spearman, and Kendall's tau correlation coefficients, which are widely used in feature selection methods. In addition, feature importance is calculated using XGBoost and LightGBM, machine learning methods based on gradient boosting. To verify performance, we use a long short-term memory (LSTM) model to predict other features from the top-5 features by importance. In this way, unnecessary features can be minimized when diagnosing pressure ulcers, and guidelines can be provided to medical personnel.
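
The comparison of the three correlation measures described above can be sketched briefly. The column names and the ranking by absolute Pearson correlation below are illustrative assumptions, not the study's actual lab codes or protocol.

    # Sketch: Pearson, Spearman, and Kendall's tau between each blood-test feature
    # and a pressure-ulcer label. Column names are hypothetical.
    import pandas as pd
    from scipy.stats import pearsonr, spearmanr, kendalltau

    def feature_correlations(df: pd.DataFrame, target: str = "pressure_ulcer") -> pd.DataFrame:
        rows = []
        for col in df.columns.drop(target):
            rows.append({
                "feature": col,
                "pearson": pearsonr(df[col], df[target])[0],
                "spearman": spearmanr(df[col], df[target])[0],
                "kendall": kendalltau(df[col], df[target])[0],
            })
        table = pd.DataFrame(rows).set_index("feature")
        # Order features by absolute Pearson correlation to surface strong candidates.
        return table.reindex(table["pearson"].abs().sort_values(ascending=False).index)

Gradient-boosting importances (XGBoost/LightGBM) would then be computed separately and cross-checked against this table.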

A GCN-based Time-Series Data Anomaly Detection Method using Sensor-specific Time Lagged Cross Correlation

Kangwoo Lee, Yunyeong Kim, Sungwon Jung

http://doi.org/10.5626/JOK.2023.50.9.805

Anomaly detection of equipment from time-series data is very important because it can prevent further damage and contribute to productivity improvement. Although research on time-series anomaly detection is actively being conducted, existing methods have the following limitations. First, unnecessary false alarms occur because correlations with other sensors are not considered. Second, although complete-graph modeling and GAT have been applied to analyze inter-sensor correlations, this approach requires considerable time owing to the increase in unnecessary operations. In this paper, we propose SC-GCNAD (Sensor-specific Correlation GCN Anomaly Detection) to address these problems. SC-GCNAD analyzes the correlation of each sensor accurately by applying time-lagged cross-correlation (TLCC), which reflects the characteristics of time-series data, and it utilizes a GCN with strong model expressiveness. As a result, SC-GCNAD improves F1-score by up to 6.37% and reduces analysis time by up to 95.31% compared to the baseline model.
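
Only the TLCC building block is sketched below; the graph construction and GCN are not reproduced, and treating the lag search as a plain Pearson-correlation scan is an assumption for illustration.

    # Sketch: time-lagged cross-correlation (TLCC) between two sensor series.
    # A simplified illustration, not the full SC-GCNAD pipeline.
    import numpy as np

    def tlcc(x: np.ndarray, y: np.ndarray, max_lag: int) -> tuple:
        """Return (lag, correlation) where |correlation| peaks over -max_lag..max_lag."""
        best_lag, best_corr = 0, 0.0
        for lag in range(-max_lag, max_lag + 1):
            if lag < 0:
                a, b = x[:lag], y[-lag:]
            elif lag > 0:
                a, b = x[lag:], y[:-lag]
            else:
                a, b = x, y
            corr = np.corrcoef(a, b)[0, 1]
            if abs(corr) > abs(best_corr):
                best_lag, best_corr = lag, corr
        return best_lag, best_corr

In an SC-GCNAD-style pipeline, the peak correlation per sensor pair could serve as an edge weight in the adjacency matrix consumed by the GCN.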

A Network Topology Scaling Method for Improving Network Comparison Using Colon Cancer Transcriptome Data

Eonyong Han, Inuk Jung

http://doi.org/10.5626/JOK.2022.49.8.646

Various analysis methods based on gene expression information have been proposed for disease analysis models. In cancer transcriptome data analysis, methods that discover hidden characteristics at the pathway level are useful for interpreting results. In this study, gene correlation networks were compared and analyzed per pathway based on gene co-expression data. When the two networks being compared differ in size, the imbalance in the amount of information biases the comparison toward the larger network. To resolve this bias, the networks of patients from different backgrounds were adjusted so that the same amount of information was used in network construction. Using the scaled networks, important gene groups were comparatively analyzed based on the characteristics of biological networks: 202 pathway networks were normalized using data from four colon cancer subtypes, and five pathways showing subtype-specific results were identified.
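
The scaling idea (using the same amount of information when configuring each network) can be read, for illustration only, as keeping an equal number of the strongest co-expression edges in both networks. The sketch below assumes that reading and is not the paper's actual procedure.

    # Sketch: build per-pathway co-expression edge lists and trim both networks to
    # the same number of strongest edges so that comparison is not size-biased.
    # The equal-edge-count criterion is an assumption; the paper's rule may differ.
    import numpy as np

    def coexpression_edges(expr: np.ndarray) -> list:
        """expr: samples x genes expression matrix; returns (i, j, |r|) edges."""
        corr = np.corrcoef(expr, rowvar=False)
        n = corr.shape[0]
        return [(i, j, abs(corr[i, j])) for i in range(n) for j in range(i + 1, n)]

    def scale_networks(edges_a: list, edges_b: list) -> tuple:
        """Keep only the k strongest edges in each network (k = size of the smaller)."""
        k = min(len(edges_a), len(edges_b))
        strongest = lambda edges: sorted(edges, key=lambda e: e[2], reverse=True)[:k]
        return strongest(edges_a), strongest(edges_b)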

A Selection Technique of Source Project in Heterogeneous Defect Prediction based on Correlation Coefficients

Eunseob Kim, Jongmoon Baik, Duksan Ryu

http://doi.org/10.5626/JOK.2021.48.8.920

Software defect prediction techniques use previous defect data to predict defect-prone modules and ensure the quality of software under development. Recently, heterogeneous defect prediction (HDP) techniques have applied defect prediction even when the metrics of the source and target projects differ. Previous HDP techniques focused on improving prediction performance for a given pair of source and target projects. However, in a real development environment more than one source project exists for a target project, so identifying a project suitable as source data is challenging. This paper proposes a correlation-based selection technique for source projects in HDP. After the metric matching process, correlation coefficients are calculated for each pair of corresponding metrics, and the project with the highest score is selected as the source data. Experiments show that the proposed selection method outperforms random selection, and that removing projects with fewer than 100 instances from the source candidates further improves performance. Therefore, the proposed selection technique can improve prediction accuracy in HDP.
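
A minimal sketch of the selection step follows. How each per-metric correlation is computed (here, Pearson correlation between matched metric quantiles) is an assumption, since the two projects contain different modules and the abstract does not detail the pairing.

    # Sketch: score each candidate source project by the mean |correlation| of its
    # matched metrics with the target project's metrics, then pick the best one.
    # Correlating metric quantiles is an assumed simplification.
    import numpy as np
    import pandas as pd

    def source_score(source: pd.DataFrame, target: pd.DataFrame, matched: list) -> float:
        """matched: list of (source_metric, target_metric) column-name pairs."""
        qs = np.linspace(0.0, 1.0, 101)
        scores = []
        for s_col, t_col in matched:
            s_q = np.quantile(source[s_col].to_numpy(dtype=float), qs)
            t_q = np.quantile(target[t_col].to_numpy(dtype=float), qs)
            scores.append(abs(np.corrcoef(s_q, t_q)[0, 1]))
        return float(np.mean(scores))

    def select_source(candidates: dict, target: pd.DataFrame, matchings: dict) -> str:
        """candidates: {name: DataFrame}; matchings: {name: matched column pairs}."""
        # Per the abstract, dropping candidates with fewer than 100 instances helps.
        candidates = {n: df for n, df in candidates.items() if len(df) >= 100}
        return max(candidates,
                   key=lambda name: source_score(candidates[name], target, matchings[name]))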

Visual Analytics for Abnormal Event Detection using Seasonal-Trend Decomposition and Serial-Correlation

Hanbyul Yeon, Yun Jang

http://doi.org/

In this paper, we present a visual analytics system that uses serial correlation to detect abnormal events in spatio-temporal data. Our approach extracts topic models from spatio-temporal tweets and then filters abnormal event candidates using seasonal-trend decomposition based on Loess smoothing (STL). We re-extract topics from these candidates and then apply STL again to obtain the second candidates. Finally, we analyze the serial correlation between the first and second candidates to detect abnormal events. Because we take a visual analytics approach, users can intuitively analyze abnormal event trends and cyclical patterns. As case studies, we verified our visual analytics system by analyzing information related to two different events: the ‘Gyeongju Mauna Resort collapse’ and the ‘Jindo-ferry sinking’.
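
The STL-based filtering and the lag-wise serial-correlation check can be sketched as follows. The tweet topic-modelling step is omitted, and the residual z-score threshold and lag range are illustrative assumptions.

    # Sketch: flag candidate abnormal time points via STL residuals, then measure
    # serial correlation between two candidate series. Not the paper's full system.
    import pandas as pd
    from statsmodels.tsa.seasonal import STL

    def stl_residual_outliers(counts: pd.Series, period: int, z: float = 3.0) -> pd.Series:
        """Boolean mask of points whose STL residual exceeds z standard deviations.
        period: seasonal period, e.g., 24 for hourly counts with daily seasonality."""
        resid = STL(counts, period=period).fit().resid
        return (resid - resid.mean()).abs() > z * resid.std()

    def serial_correlation(a: pd.Series, b: pd.Series, max_lag: int = 12) -> float:
        """Maximum |correlation| between the two series over lags 0..max_lag."""
        return max(abs(a.corr(b.shift(lag))) for lag in range(max_lag + 1))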

Secure Multi-Party Computation of Correlation Coefficients

Sun Kyong Hong, Sang Pil Kim, Hyo Sang Lim, Yang Sae Moon

http://doi.org/

In this paper, we address the problem of computing Pearson correlation coefficients and Spearman’s rank correlation coefficients in a secure manner, so that data providers preserve the privacy of their own data in a distributed environment. For data mining or data analysis in a distributed environment, data providers (data owners) need to share their original data with each other. However, the original data often contain very sensitive information, so data providers prefer not to disclose them in order to preserve privacy. We formally define secure correlation computation (SCC) as the problem of computing correlation coefficients in a distributed computing environment while preserving the data privacy (i.e., not disclosing the sensitive data) of multiple data providers. We then present SCC solutions for Pearson and Spearman’s correlation coefficients based on the secure scalar product. We show the correctness and security of the proposed solutions by formally stating and proving theorems, and we empirically show that the solutions are practical in terms of performance.
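
The reduction to a secure scalar product can be seen from the standard form of Pearson's coefficient. With one party holding x and the other holding y, a sketch of the decomposition (details of the paper's protocol, such as how the final division is carried out securely, are not reproduced here):

    \[
    r_{xy} \;=\; \frac{n\sum_{i=1}^{n} x_i y_i \;-\; \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
    {\sqrt{\,n\sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2\,}\;
     \sqrt{\,n\sum_{i=1}^{n} y_i^2 - \bigl(\sum_{i=1}^{n} y_i\bigr)^2\,}}
    \]

Every term except the cross term \(\sum_i x_i y_i\) can be computed locally by one of the parties; the cross term is exactly a scalar product of the two private vectors, which a secure scalar product protocol supplies without revealing the vectors. Spearman's coefficient reduces the same way: for rank vectors \(r^x, r^y\) (no ties), \(\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}\) with \(\sum_i d_i^2 = \sum_i (r^x_i)^2 + \sum_i (r^y_i)^2 - 2\sum_i r^x_i r^y_i\), so again only one scalar product of private rank vectors is required.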

