Measuring Anonymized Data Utility through Correlation Indicator
Yongki Hong, Gihyuk Ko, Heedong Yang, Chanho Ryu, Seung Hwan Ryu
http://doi.org/10.5626/JOK.2023.50.12.1163
As we transition into an artificial intelligence-driven society, data collection and utilization are actively progressing. Consequently, technologies and privacy models are emerging that convert original data into anonymized data while ensuring that it does not violate privacy guidelines. Notably, privacy models including k-anonymity, l-diversity, and t-closeness are actively being used. It is crucial to choose the appropriate models and parameters according to the purpose of the data, the situation, and the required degree of privacy. Ideally, data utility is maximized while the privacy conditions are met; the process that pursues this goal is called Privacy-Preserving Data Publishing (PPDP). To approach this ideal, it is essential to consider both utility and privacy indicators. This paper introduces a new utility indicator, the Effect Size Average Cost, which can help privacy administrators create anonymized data efficiently. The indicator captures the change in correlation between quasi-identifiers and sensitive attributes. In this study, we conducted experiments to compute and compare this indicator on tables to which k-anonymity, l-diversity, and t-closeness were respectively applied. The results showed significant differences in the Effect Size Average Cost across the cases, indicating the potential of this indicator as a valid basis for deciding which privacy model to adopt.
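For readers unfamiliar with the privacy models named in this abstract, the following is a minimal sketch of how the k-anonymity and distinct l-diversity levels of a small table might be measured. It is an illustration only, not the authors' method, and it does not compute the Effect Size Average Cost, whose definition is not given in the abstract. The function names, the column names (`age`, `zip`, `disease`), and the sample rows are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous,
    i.e. the size of the smallest equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def distinct_l_diversity(rows, quasi_identifiers, sensitive):
    """Return the largest l for distinct l-diversity, i.e. the fewest
    distinct sensitive values found in any equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


# Hypothetical, already-generalized table (not data from the paper).
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

k = k_anonymity(table, ["age", "zip"])
l = distinct_l_diversity(table, ["age", "zip"], "disease")
print(f"k = {k}, l = {l}")
```

In this toy table every equivalence class holds two rows, so the table is 2-anonymous, but the second class contains a single disease value, so it only reaches l = 1. This is the kind of gap between privacy models that motivates comparing them with a utility indicator.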
Privacy-Preserving Data Publishing: Research on Trends in De-identification Techniques for Structured and Unstructured Data
Yongki Hong, Gihyuk Ko, Heedong Yang, Seung Hwan Ryu
http://doi.org/10.5626/JOK.2023.50.11.1008
The advent of AI has increased the demand for data for AI development, leading to a proliferation of data sharing and distribution. However, data utilization carries the risk of disclosing personal information, so data must be de-identified before it is distributed. Privacy-Preserving Data Publishing (PPDP) is a series of procedures aimed at adhering to specified privacy guidelines while maximizing the utility of data, and it has been continuously researched and developed. Techniques for de-identifying structured data (e.g., tables or relational data) have been studied since the early 2000s. As a significant and growing portion of collected data is now unstructured, research on de-identification techniques for unstructured data is also actively being conducted. In this paper, we introduce the existing de-identification techniques for structured data and discuss recent trends in de-identification techniques for unstructured data.

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr