Search : [ keyword: correlation ] (10)

Measuring Anonymized Data Utility through Correlation Indicator

Yongki Hong, Gihyuk Ko, Heedong Yang, Chanho Ryu, Seung Hwan Ryu

http://doi.org/10.5626/JOK.2023.50.12.1163

As we transition into an artificial intelligence-driven society, data collection and utilization are actively progressing. Consequently, technologies and privacy models are emerging that convert original data into anonymized data while ensuring that it does not violate privacy guidelines. Notably, privacy models including k-anonymity, l-diversity, and t-closeness are actively being used. Depending on the purpose of the data, the situation, and the degree of privacy, it's crucial to choose the appropriate models and parameters. Ideally, the best scenario would be maximizing data utility while meeting privacy conditions. This process is called Privacy-Preserving Data Publishing (PPDP). To derive this ideal scenario, it is essential to consider both utility and privacy indicators. This paper introduces a new utility indicator, the Effect Size Average Cost, which can assist privacy administrators in efficiently creating anonymized data. This indicator pertains to the correlation change between quasi-identifiers and sensitive attributes. In this study, we conducted experiments to compute and compare this indicator on tables to which k-anonymity, l-diversity, and t-closeness were applied, respectively. The results identified significant differences in the Effect Size Average Costs for each case, indicating the potential of this indicator as a valid basis for determining which privacy model to adopt.
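
As a point of reference for the privacy models named in this abstract, the following minimal sketch checks the k-anonymity level of a small table. It is not the paper's Effect Size Average Cost, which the abstract does not define; the function name `k_of`, the column names, and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_of(rows, quasi_identifiers):
    """Return the largest k for which `rows` is k-anonymous.

    A table is k-anonymous when every combination of quasi-identifier
    values that occurs in it is shared by at least k rows.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical, already generalized table (illustrative data only).
rows = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cold"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

k = k_of(rows, ["age", "zip"])
print(k)
```

The smallest equivalence class here has two rows, so the table is 2-anonymous with respect to `age` and `zip`.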

Predicting Significant Blood Marker Values for Pressure Ulcer Forecasting Utilizing Feature Minimization and Selection

Yeonhee Kim, Hoyoul Jung, Jang-Hwan Choi

http://doi.org/10.5626/JOK.2023.50.12.1054

Pressure ulcers are difficult to treat once they occur, and huge economic costs are incurred during the treatment process. Therefore, predicting the occurrence of pressure ulcers is important in terms of both patient suffering and economics. In this study, the correlation between the lab codes (features) and pressure ulcers obtained from blood tests of patients with spinal cord injury was analyzed to provide meaningful characteristic information for the prediction of pressure ulcers. We compare and analyze the Pearson, Spearman, and Kendall's tau correlation coefficients, which are mainly used in feature selection methods. In addition, the importance of features is calculated using XGBoost and LightGBM, which are machine learning methods based on gradient boosting. To verify the performance of this model, we use the long short-term memory (LSTM) model to predict other features using the five features ranked highest in importance. In this way, unnecessary features can be minimized in diagnosing pressure ulcers and guidelines can be provided to medical personnel.
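
The three coefficients compared in this abstract can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions, not the authors' code: the Kendall variant shown is tau-a (no tie correction), the helper names are hypothetical, and the sample vectors are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# A monotone but non-linear relationship (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

On this monotone, non-linear sample the two rank-based coefficients reach 1 while Pearson stays below 1, which is the usual reason for comparing all three when selecting features.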

A GCN-based Time-Series Data Anomaly Detection Method using Sensor-specific Time Lagged Cross Correlation

Kangwoo Lee, Yunyeong Kim, Sungwon Jung

http://doi.org/10.5626/JOK.2023.50.9.805

Anomaly detection of equipment through time series data is very important because it can prevent further damage and contribute to productivity improvement. Although research on time series data anomaly detection is being actively conducted, existing studies have the following restrictions. First, unnecessary false alarms occur because correlations with other sensors are not considered. Second, although complete graph modeling and GAT have been applied to analyze the correlation of each sensor, this approach requires a lot of time due to the increase in unnecessary operations. In this paper, we propose SC-GCNAD (Sensor-specific Correlation GCN Anomaly Detection) to address these problems. SC-GCNAD can analyze the exact correlation of each sensor by applying time lagged cross correlation (TLCC), which reflects the characteristics of time series data. It also utilizes GCN, which has excellent model expressiveness. As a result, SC-GCNAD can improve F1-Score by up to 6.37% and reduce analysis time by up to 95.31% compared to the baseline model.
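
Time lagged cross correlation, as named in this abstract, can be sketched generically: shift one series against the other, compute the Pearson correlation at each lag, and keep the lag with the strongest correlation. The sketch below is a plain TLCC illustration, not the sensor-specific variant of SC-GCNAD, whose details the abstract does not give; the function names and the two sample series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = sqrt(vx * vy)
    return cov / denom if denom else 0.0


def tlcc(x, y, max_lag):
    """Return {lag: r}, where r correlates x[t] with y[t + lag]."""
    n = len(x)
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        scores[lag] = pearson(a, b)
    return scores


# Sensor y repeats sensor x two steps later (illustrative data only).
x = [0, 1, 4, 2, 7, 3, 8, 5, 9, 6, 2, 1]
y = [0, 0] + x[:-2]

scores = tlcc(x, y, max_lag=3)
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

A positive best lag means that `y` follows `x`; in a graph-based detector such lags could be used to decide which sensor pairs are worth connecting, although how SC-GCNAD does this is not stated in the abstract.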

A Network Topology Scaling Method for Improving Network Comparison Using Colon Cancer Transcriptome Data

Eonyong Han, Inuk Jung

http://doi.org/10.5626/JOK.2022.49.8.646

Various research methods based on gene expression information have been proposed for disease analysis models. In cancer transcriptome data analysis, methods that discover hidden characteristics based on pathways are useful for interpreting results. In this study, gene correlation networks were compared and analyzed at the pathway level based on gene co-expression data. If the two networks to be compared differ in size, the difference in the amount of information biases the comparison toward the larger network. To resolve this bias, the networks of patients from different backgrounds were adjusted so that they were built from the same amount of information. Using the normalized networks, we carried out a comparative analysis of important gene groups based on the characteristics of biological networks: we normalized 202 pathway networks using data from four colon cancer subtypes and identified five pathways with subtype-specific results.

A Selection Technique of Source Project in Heterogeneous Defect Prediction based on Correlation Coefficients

Eunseob Kim, Jongmoon Baik, Duksan Ryu

http://doi.org/10.5626/JOK.2021.48.8.920

Software defect prediction techniques try to predict defect-prone modules and ensure the quality of the software under development using previous defect data. Recently, heterogeneous defect prediction (HDP) techniques have made defect prediction applicable even when the metrics of the source and target projects differ. Previous HDP techniques focused on improving prediction performance when the source and target projects were given. However, in a real development environment, more than one source project exists for one target project, so identifying a project that is suitable as source data is challenging. This paper suggests a correlation-based selection technique for source projects in HDP. After the metric matching process, correlation coefficients are calculated for each corresponding metric, and the project with the highest score is selected as source data. The experiment shows that the performance of the proposed selection method is higher than that of random selection, and that removing projects with fewer than 100 instances from the source candidates improves the performance. Therefore, using the proposed selection technique could improve the prediction accuracy in HDP.

A Traffic-Classification Method Using the Correlation of the Network Flow

YoungHoon Goo, Kyuseok Shim, Sungho Lee, Baraka D. Sija, MyungSup Kim

http://doi.org/

Presently, the ubiquitous emergence of high-speed network environments has led to a rapid increase in the variety of applications, making network traffic increasingly complicated. To manage networks efficiently, traffic classification into specific units is essential. While various traffic-classification methods have been studied, a method for the complete classification of network traffic has not yet been developed. In this paper, a correlation model of the network flow is defined, and a traffic-classification method that uses this model is proposed. The proposed network-correlation model for traffic classification consists of a similarity model and a connectivity model. The effectiveness of the proposed method is demonstrated in terms of accuracy and completeness through experiments.

Rank Correlation Coefficient of Energy Data for Identification of Abnormal Sensors in Buildings

Naeon Kim, Sihyun Jeong, Boyeon Jang, Chong-Kwon Kim

http://doi.org/

Anomaly detection is the identification of data that do not conform to a normal pattern or behavior model in a dataset. It can be utilized for detecting errors among data generated by devices or changes in user behavior in a social network data set. In this study, we proposed a new approach that uses the rank correlation coefficient to efficiently detect abnormal data in the devices of a building. With the increased push for energy conservation, many energy efficiency solutions have been proposed over the years. An HVAC (Heating, Ventilating and Air Conditioning) system monitors and manages thousands of sensors such as thermostats, air conditioners, and lighting in large buildings. Currently, operators use the building’s HVAC system to control energy consumption efficiently. By using the proposed approach, it is possible to observe changes in the ranking relationships between the devices in an HVAC system and identify abnormal behavior in a social network.

Improving The Performance of Triple Generation Based on Distant Supervision By Using Semantic Similarity

Hee-Geun Yoon, Su Jeong Choi, Seong-Bae Park

http://doi.org/

Existing pattern-based triple generation systems based on distant supervision can be flawed by the assumption underlying distant supervision. To mitigate the flaws arising from this excessive assumption, previous studies have commonly used statistical information to measure the confidence of patterns. In this study, we proposed a more accurate confidence measure based on the semantic similarity between patterns and properties. An unsupervised learning method, word embedding, and WordNet-based similarity measures were adopted for learning the meaning of words and measuring semantic similarity. To resolve the language discordance between patterns and properties, we adopted CCA for aligning bilingual word embedding models and a translation-based approach for the WordNet-based measure. The results of our experiments indicated that the accuracy of triples filtered by the semantic similarity-based confidence measure was 16% higher than that of the statistics-based approach. These results suggest that the semantic similarity-based confidence measure is more effective than the statistics-based approach for generating high-quality triples.

Visual Analytics for Abnormal Event detection using Seasonal-Trend Decomposition and Serial-Correlation

Hanbyul Yeon, Yun Jang

http://doi.org/

In this paper, we present a visual analytics system that uses serial-correlation to detect an abnormal event in spatio-temporal data. Our approach extracts the topic-model from spatio-temporal tweets and then filters the abnormal event candidates using a seasonal-trend decomposition procedure based on Loess smoothing (STL). We re-extract the topic from the candidates, and then we apply STL to the second candidate. Finally, we analyze the serial-correlation between the first candidates and the second candidate in order to detect abnormal events. Because we use a visual analytics approach to detect the abnormal events, users can intuitively analyze abnormal event trends and cyclical patterns. For the case study, we verified our visual analytics system by analyzing information related to two different events: the ‘Gyeongju Mauna Resort collapse’ and the ‘Jindo-ferry sinking’.

Secure Multi-Party Computation of Correlation Coefficients

Sun Kyong Hong, Sang Pil Kim, Hyo Sang Lim, Yang Sae Moon

http://doi.org/

In this paper, we address the problem of computing Pearson correlation coefficients and Spearman’s rank correlation coefficients in a secure manner while data providers preserve the privacy of their own data in a distributed environment. For data mining or data analysis in a distributed environment, data providers (data owners) need to share their original data with each other. However, the original data may often contain very sensitive information, and thus data providers prefer not to disclose their original data in order to preserve privacy. In this paper, we formally define secure correlation computation, SCC in short, as the problem of computing correlation coefficients in a distributed computing environment while preserving the data privacy (i.e., not disclosing the sensitive data) of multiple data providers. We then present SCC solutions for Pearson and Spearman’s correlation coefficients using secure scalar product. We show the correctness and security of the proposed solutions by presenting theorems and proving them formally. We also empirically show that the proposed solutions perform well enough to be used in practical applications.
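
One way to see why a secure scalar product is a natural building block here is that the Pearson coefficient can be written entirely in terms of per-party sums and a single cross-party scalar product. The sketch below only illustrates that algebraic decomposition; it is not the paper's SCC protocol and provides no security at all, since `plain_scalar_product` is a hypothetical stand-in computed in the clear where a real system would run a secure scalar product protocol. All names and data are assumptions for illustration.

```python
from math import sqrt


def local_aggregates(values):
    """Quantities one party can compute from its own data alone."""
    return len(values), sum(values), sum(v * v for v in values)


def plain_scalar_product(x, y):
    """INSECURE placeholder for the one step that needs both parties' data."""
    return sum(a * b for a, b in zip(x, y))


def pearson_from_aggregates(n, sx, sxx, sy, syy, sxy):
    """Pearson r from n, sum(x), sum(x^2), sum(y), sum(y^2) and x . y."""
    numerator = n * sxy - sx * sy
    denominator = sqrt(n * sxx - sx * sx) * sqrt(n * syy - sy * sy)
    return numerator / denominator


# Party A holds x, party B holds y (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n, sx, sxx = local_aggregates(x)      # computed by party A
_, sy, syy = local_aggregates(y)      # computed by party B
sxy = plain_scalar_product(x, y)      # the only cross-party quantity

r = pearson_from_aggregates(n, sx, sxx, sy, syy, sxy)
print(r)
```

Because Spearman's coefficient is the Pearson coefficient of the rank vectors, the same decomposition applies once each party has ranked its own values locally.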


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr