Digital Library: Search Results
Model Contrastive Federated Learning on Re-Identification
Seongyoon Kim, Woojin Chung, Sungwoo Cho, Yongjin Yang, Shinhyeok Hwang, Se-Young Yun
http://doi.org/10.5626/JOK.2024.51.9.841
Advances in data collection and computing power have dramatically increased the integration of AI technology into various services. Traditional centralized cloud data processing raises concerns over the exposure of sensitive user data. To address these issues, federated learning (FL) has emerged as a decentralized training method in which clients train models locally on their data and send the locally updated models to a central server. The central server aggregates these locally updated models to improve a global model without directly accessing local data, thereby enhancing data privacy. This paper presents FedCON, a novel FL framework specifically designed for re-identification (Re-ID) tasks across various domains. FedCON integrates contrastive learning with FL to enhance feature representation, which is crucial for Re-ID tasks that match identities across different images based on the similarity between feature vectors. By focusing on feature similarity, FedCON effectively addresses data heterogeneity challenges and improves the global model's performance in Re-ID applications. Empirical studies on person and vehicle Re-ID datasets demonstrated that FedCON outperformed existing FL methods for Re-ID. Our experiments with FedCON on various CCTV datasets for person Re-ID showed superior performance over several baselines. Additionally, FedCON significantly enhanced vehicle Re-ID performance on real-world datasets such as VeRi-776 and VRIC, demonstrating its practical applicability.
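The abstract gives no implementation details; as a rough illustration of the core ingredient, the following is a minimal sketch of a supervised contrastive loss over Re-ID feature embeddings, of the kind a FedCON-style client might optimize locally. The function name, temperature value, and batching scheme are illustrative assumptions, not the paper's code.
```python
# Minimal sketch (illustrative, not the paper's code): a batch-wise supervised
# contrastive loss over L2-normalized Re-ID embeddings. Images of the same
# identity are pulled together; different identities are pushed apart.
import torch
import torch.nn.functional as F

def contrastive_reid_loss(embeddings, identity_labels, temperature=0.1):
    """embeddings: (B, D) feature vectors; identity_labels: (B,) integer IDs."""
    z = F.normalize(embeddings, dim=1)                  # compare in cosine space
    sim = z @ z.t() / temperature                       # (B, B) similarity logits
    b = z.size(0)
    eye = torch.eye(b, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))           # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: pairs that share an identity label (diagonal excluded).
    pos = (identity_labels.unsqueeze(0) == identity_labels.unsqueeze(1)) & ~eye
    pos_counts = pos.sum(dim=1)
    valid = pos_counts > 0                              # anchors with >=1 positive
    masked = log_prob.masked_fill(~pos, 0.0)            # keep positive pairs only
    loss = -masked[valid].sum(dim=1) / pos_counts[valid]
    return loss.mean()
```
In an FL setting, each client would minimize such a loss on its local batches before sending the updated model to the server for aggregation, per the abstract's description.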
A Differential-Privacy Technique for Publishing Density-based Clustering Results
Namil Kim, Incheol Baek, Hyubjin Lee, Minsoo Kim, Yon Dohn Chung
http://doi.org/10.5626/JOK.2024.51.4.380
Clustering techniques group data with similar characteristics. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is widely used in various fields because it can detect outliers and makes no assumptions about the data distribution. However, the conventional DBSCAN method has a vulnerability in that privacy-sensitive personal information in the original data can be easily exposed in the clustering results. Disclosing and distributing such data without appropriate privacy protection therefore poses risks. This paper proposes a method to generate DBSCAN results that satisfy differential privacy. Additionally, a post-processing technique is introduced to effectively reduce the noise introduced when applying differential privacy and to prepare the data for further analysis. Through experiments, we observed that the proposed method enhances the utility of the data while satisfying differential privacy.
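For intuition, here is a minimal sketch of the basic building block such a method rests on: releasing DBSCAN cluster sizes under the Laplace mechanism, followed by a simple post-processing step (rounding and clipping negatives). The paper's actual mechanism and post-processing are more sophisticated; everything below is an illustrative assumption.
```python
# Illustrative sketch only: Laplace-noised per-cluster counts, the basic DP
# building block, applied to DBSCAN output before publication.
import numpy as np
from sklearn.cluster import DBSCAN

def noisy_cluster_sizes(points, eps_radius, min_samples, epsilon):
    labels = DBSCAN(eps=eps_radius, min_samples=min_samples).fit_predict(points)
    sizes = np.bincount(labels[labels >= 0])       # ignore noise points (-1)
    sensitivity = 1.0                              # one record shifts one count by 1
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=sizes.shape)
    # Post-processing (round, clip negatives) does not consume privacy budget.
    return np.maximum(np.round(sizes + noise), 0)
```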
Homomorphic Encryption-Based Support Computation for Privacy-Preserving Association Analysis
Yunsoo Park, Lynin Sokhonn, Munkyu Lee
http://doi.org/10.5626/JOK.2024.51.3.203
Homomorphic encryption is a cryptographic scheme that enables computation on ciphertexts without decryption. It is attracting attention as a cryptographic technology that can address user privacy invasion in machine learning and cloud services. A representative homomorphic encryption scheme is CKKS, an approximate scheme that supports operations on real and complex numbers. In this paper, we propose a method to efficiently compute support, one of the evaluation metrics of association analysis, using the CKKS scheme, along with a method to compute supports for multiple itemsets in parallel using matrix multiplication. We implemented and evaluated the proposed method using the HEaaN library. According to the evaluation results, the support values calculated by the proposed method were almost identical to those calculated without encryption, confirming that the proposed method can effectively calculate support while protecting user data privacy.
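As a plaintext illustration of the arithmetic such a CKKS circuit would evaluate (real code would operate on packed ciphertext slots, e.g., via the HEaaN library), the sketch below computes support by multiplying 0/1 item columns, plus a batched variant via matrix multiplication; all names and conventions are assumptions.
```python
# Plaintext sketch of the support arithmetic that a CKKS circuit could evaluate
# homomorphically: transactions are 0/1 vectors, logical AND becomes
# multiplication, and the support count becomes a slot-wise sum. Illustrative only.
import numpy as np

def support(transactions, itemset):
    """transactions: (n, m) 0/1 matrix; itemset: list of item column indices."""
    indicator = np.ones(transactions.shape[0])
    for item in itemset:
        indicator = indicator * transactions[:, item]  # multiplicative AND
    return indicator.sum() / transactions.shape[0]

def supports_by_matmul(transactions, itemset_matrix):
    """Batch variant via matrix multiplication for k itemsets at once:
    itemset_matrix is (m, k) 0/1; a transaction supports itemset j iff its
    dot product with column j reaches that column's sum |X_j|."""
    counts = transactions @ itemset_matrix             # (n, k) item hits
    sizes = itemset_matrix.sum(axis=0)                 # |X_j| per itemset
    return (counts == sizes).sum(axis=0) / transactions.shape[0]
```
Note that the equality test in the batched variant is not natively CKKS-friendly; under encryption it would be replaced by a polynomial approximation or avoided by multiplying item columns directly, as in the first function.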
Measuring Anonymized Data Utility through Correlation Indicator
Yongki Hong, Gihyuk Ko, Heedong Yang, Chanho Ryu, Seung Hwan Ryu
http://doi.org/10.5626/JOK.2023.50.12.1163
As we transition into an artificial intelligence-driven society, data collection and utilization are actively progressing. Consequently, technologies and privacy models are emerging that convert original data into anonymized data without violating privacy guidelines. Notably, privacy models such as k-anonymity, l-diversity, and t-closeness are actively used. Depending on the purpose of the data, the situation, and the required degree of privacy, it is crucial to choose the appropriate models and parameters. Ideally, data utility should be maximized while the privacy conditions are met. This process is called Privacy-Preserving Data Publishing (PPDP). To achieve this ideal, it is essential to consider both utility and privacy indicators. This paper introduces a new utility indicator, the Effect Size Average Cost, which can assist privacy administrators in efficiently creating anonymized data. The indicator captures the change in correlation between quasi-identifiers and sensitive attributes. In this study, we conducted experiments to compute and compare this indicator on tables to which k-anonymity, l-diversity, and t-closeness were applied, respectively. The results identified significant differences in the Effect Size Average Cost across the cases, indicating the indicator's potential as a valid basis for deciding which privacy model to adopt.
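The paper's exact formula is not given here; the following hypothetical sketch only conveys the general idea of a correlation-change indicator, measuring how much an effect size between each quasi-identifier and the sensitive attribute shifts after anonymization. Cramér's V is used below purely as a stand-in effect size; every name and formula is an assumption.
```python
# Hypothetical sketch in the spirit of a correlation-change utility indicator:
# average, over quasi-identifiers, of the change in effect size against the
# sensitive attribute between the original and anonymized tables.
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramer's V effect size between two categorical columns."""
    table = pd.crosstab(x, y).to_numpy()
    n = table.sum()
    expected = np.outer(table.sum(1), table.sum(0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k)) if k > 0 else 0.0

def effect_size_average_cost(original, anonymized, quasi_ids, sensitive):
    """Mean absolute change in effect size across quasi-identifiers."""
    costs = [abs(cramers_v(original[q], original[sensitive])
                 - cramers_v(anonymized[q], anonymized[sensitive]))
             for q in quasi_ids]
    return float(np.mean(costs))
```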
Privacy-Preserving Data Publishing: Research on Trends in De-identification Techniques for Structured and Unstructured Data
Yongki Hong, Gihyuk Ko, Heedong Yang, Seung Hwan Ryu
http://doi.org/10.5626/JOK.2023.50.11.1008
The advent of AI has increased the demand for data for AI development, leading to a proliferation of data sharing and distribution. However, there is also a risk of personal information disclosure during data utilization, so it is necessary to de-identify data before distributing it. Privacy-Preserving Data Publishing (PPDP) is a series of procedures aimed at adhering to specified privacy guidelines while maximizing the utility of data, and it has been continuously researched and developed. Since the early 2000s, techniques for de-identifying structured data (e.g., tables or relational data) have been studied. As a significant and growing portion of collected data is now unstructured, research on de-identification techniques for unstructured data is also actively being conducted. In this paper, we introduce the existing de-identification techniques for structured data and discuss recent trends in de-identification techniques for unstructured data.
A Privacy-preserving Histogram Construction Method Guaranteeing the Differential Privacy
In Cheol Baek, Jongseon Kim, Yon Dohn Chung
http://doi.org/10.5626/JOK.2022.49.6.488
With the widespread use of data collection and analysis, the need to preserve the privacy of individuals is growing. Various privacy models have been proposed to guarantee privacy while collecting and analyzing data, and among them differential privacy stands as the de facto standard. In this paper, we propose a privacy-preserving histogram construction method that guarantees differential privacy. The proposed method consists of a histogram bin-setting stage and a frequency-calculation stage. In the first stage, we apply the Laplace mechanism to heuristic bin-setting algorithms to select the number of bins in a differentially private manner. In the second stage, we apply the Laplace mechanism to the frequency of each bin to output differentially private frequencies. We prove that the proposed method guarantees differential privacy, and we experimentally compare its accuracy across privacy budget values and data distributions.
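A minimal sketch of the two-stage structure described above, assuming a simple square-root bin heuristic and an even budget split; both choices are illustrative stand-ins, not the paper's algorithms.
```python
# Illustrative two-stage DP histogram: stage 1 privately picks the bin count,
# stage 2 releases Laplace-noised per-bin frequencies.
import numpy as np

def dp_histogram(data, lo, hi, epsilon, max_bins=64):
    eps1, eps2 = epsilon / 2, epsilon / 2          # split the privacy budget
    # Stage 1: perturb a heuristic bin count so the structure itself is private.
    heuristic = np.sqrt(len(data))
    k = int(np.clip(np.round(heuristic + np.random.laplace(0, 1 / eps1)),
                    1, max_bins))
    # Stage 2: a histogram count has sensitivity 1, so Laplace(1/eps2) suffices.
    counts, edges = np.histogram(data, bins=k, range=(lo, hi))
    noisy = counts + np.random.laplace(0, 1 / eps2, size=k)
    return np.maximum(noisy, 0), edges             # clip negatives (post-processing)
```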
Network-level Tracker Detection Using Features of Encrypted Traffic
Dongkeun Lee, Minwoo Joo, Wonjun Lee
http://doi.org/10.5626/JOK.2022.49.4.314
Third-party trackers breach users' data privacy by compiling large amounts of personal data, such as location or browsing history, through web tracking techniques. Although previous research has proposed several methods to protect users from web tracking by detecting and blocking it, their effectiveness is limited by dependency on particular browsers or devices and by performance overhead. To this end, this paper proposes a novel approach to detecting trackers at the network level using features of encrypted traffic. The proposed method first builds a classification model based on features extracted from the side-channel information of encrypted traffic generated by trackers. It then prevents leakage of user information by accurately detecting tracker traffic within the network, independently of the user's browser or device. We validate the feasibility of using features of encrypted traffic for tracker detection by studying the distinctive characteristics of tracker traffic derived from real-world encrypted traffic analysis.
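As a rough sketch of this kind of pipeline (not the paper's implementation), one can compute statistical features from the side-channel metadata of encrypted flows and train an off-the-shelf classifier on labeled tracker traffic; the feature set and model choice below are assumptions.
```python
# Illustrative network-level tracker detection: per-flow features from
# metadata visible despite TLS (sizes, timings, directions), fed to a classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def flow_features(packet_sizes, inter_arrival_times, directions):
    """One feature vector per flow; directions: +1 outbound, -1 inbound."""
    sizes = np.asarray(packet_sizes, dtype=float)
    gaps = np.asarray(inter_arrival_times, dtype=float)
    return np.array([
        sizes.mean(), sizes.std(), sizes.sum(),      # volume statistics
        gaps.mean(), gaps.std(),                     # timing statistics
        float(np.mean(np.asarray(directions) > 0)),  # share of outbound packets
        float(len(sizes)),                           # packet count
    ])

# X: (n_flows, n_features) built with flow_features; y: 1 = tracker flow.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
# clf.fit(X_train, y_train); predictions = clf.predict(X_test)
```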
Time-series Location Data Collection and Analysis Under Local Differential Privacy
Kijung Jung, Hyukki Lee, Yon Dohn Chung
http://doi.org/10.5626/JOK.2022.49.4.305
With the prevalence of smart devices that generate location data, the number of location-based services is growing explosively. Since users' location data are sensitive information, utilizing the data in their original form could breach the privacy of individuals. In this study, we proposed a time-series location data collection and analysis method that satisfies local differential privacy, a strong privacy model for the data collection environment, while considering the characteristics of time-series location data. In the data collection process, the location of an individual is expressed as a bit array, and each bit of the array is then perturbed by randomized response for privacy preservation. In the data analysis process, we analyzed location frequencies using a hidden Markov model. Moreover, we performed additional spatiotemporal correlation analysis, which is not possible with existing analysis methods. To demonstrate the performance of the proposed method, we generated trajectory data based on the Seoul subway and analyzed the results.
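A minimal sketch of the perturbation and estimation steps described above, assuming basic per-bit randomized response over a one-hot region encoding; the paper's encoding and mechanism may differ in detail.
```python
# Illustrative local DP collection: each user one-hot encodes a location and
# flips each bit via randomized response; the server inverts the bias to
# estimate region frequencies.
import numpy as np

def perturb_location(location_index, num_regions, epsilon):
    bits = np.zeros(num_regions, dtype=int)
    bits[location_index] = 1                       # one-hot location encoding
    p = np.exp(epsilon) / (np.exp(epsilon) + 1)    # probability a bit is kept
    keep = np.random.random(num_regions) < p
    return np.where(keep, bits, 1 - bits)          # flip the bits not kept

def estimate_frequencies(reports, epsilon):
    """Unbiased frequency estimate from stacked perturbed bit arrays (n, R)."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1)
    observed = np.mean(reports, axis=0)
    # E[report] = (1 - p) + b * (2p - 1), so invert the randomized response:
    return (observed - (1 - p)) / (2 * p - 1)
```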
Privacy-preserving Pre-computation of Join Selectivity using Differential Privacy for the Proliferation of Pseudonymized Data Combination
Hyubjin Lee, Jong Seon Kim, Yon Dohn Chung
http://doi.org/10.5626/JOK.2022.49.3.250
With the enforcement of the three data acts, pseudonymized information from various domains can be joined through certified expert agencies. Before joining all the pseudonymized information, the expert agency provides a service that computes the join selectivity in advance. However, existing join selectivity pre-computation methods have vulnerabilities that can lead to privacy breaches. In this paper, we propose a privacy-preserving join selectivity pre-computation method that anonymizes data through a one-way hash technique using randomly generated one-time key values provided by the expert agency, and that ensures differential privacy when pre-computing the join selectivity. The proposed method ensures the anonymity of the data sent by the join-requesting institutions to the expert agency and prevents the privacy breaches that may occur with previous join selectivity pre-computation methods. The experimental results showed that the proposed method provides effective join selectivity estimates while satisfying differential privacy.
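The following sketch illustrates the two ideas the method combines: keyed one-way hashing with a one-time key, and Laplace noise on the intersection count. It assumes unique join keys per institution for simplicity; the actual protocol is more detailed, and all names are illustrative.
```python
# Illustrative pre-computation of join selectivity under DP.
import hashlib
import hmac
import numpy as np

def anonymize_keys(join_keys, one_time_key):
    """Each institution hashes its join column with the agency's one-time key
    (one_time_key: bytes) so raw identifiers never leave the institution."""
    return {hmac.new(one_time_key, k.encode(), hashlib.sha256).hexdigest()
            for k in join_keys}

def dp_join_selectivity(hashed_a, hashed_b, epsilon):
    """Noisy selectivity |A intersect B| / (|A| * |B|); a membership count
    has sensitivity 1, so Laplace(1/epsilon) noise suffices."""
    inter = len(hashed_a & hashed_b)
    noisy = inter + np.random.laplace(0, 1 / epsilon)
    return max(noisy, 0) / (len(hashed_a) * len(hashed_b))
```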
Research on WGAN models with Rényi Differential Privacy
Sujin Lee, Cheolhee Park, Dowon Hong, Jae-kum Kim
http://doi.org/10.5626/JOK.2021.48.1.128
Personal data are collected through various services; managers extract value from the collected data and provide individually customized services based on the analysis results. However, data that contain sensitive information, such as medical data, must be protected from privacy breaches. Accordingly, to mitigate privacy invasion, Generative Adversarial Networks (GANs) are widely used as models for generating synthetic data. Still, privacy vulnerabilities exist because GAN models can learn not only the characteristics of the original data but also the sensitive information it contains. Hence, many studies have been conducted to protect the privacy of GAN models, with particularly active research on differential privacy, a strict privacy notion. However, differentially private models are often insufficient for real environments in terms of data utility. In this paper, we studied GAN models with Rényi differential privacy, which preserve the utility of the original data while ensuring privacy protection. Specifically, we focused on the WGAN and WGAN-GP models, compared synthetic data generated from non-private and differentially private models, and analyzed the data utility in each scenario.
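As a compact sketch of the differential-privacy ingredient commonly used in such models (not the paper's code), the critic's per-sample gradients are clipped and Gaussian noise is added; the cumulative privacy loss of repeated noisy updates is what Rényi DP accounting tracks. Libraries such as Opacus implement this machinery in full.
```python
# Illustrative DP-SGD-style noisy gradient for a WGAN critic update:
# clip each example's gradient to norm C, sum, add Gaussian noise, average.
import torch

def dp_noisy_gradient(per_sample_grads, clip_norm, noise_multiplier):
    """per_sample_grads: (B, P) flattened gradients, one row per example."""
    norms = per_sample_grads.norm(dim=1, keepdim=True)
    scale = (clip_norm / norms).clamp(max=1.0)          # clip each row to C
    clipped = per_sample_grads * scale
    summed = clipped.sum(dim=0)
    noise = torch.randn_like(summed) * noise_multiplier * clip_norm
    return (summed + noise) / per_sample_grads.size(0)  # averaged noisy gradient
```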