AI Glossary: What Is T-Closeness? Definition & Meaning

T-Closeness is a privacy model designed to enhance data protection in the context of data sharing and publication. It extends earlier models like k-anonymity and l-diversity by introducing the concept of distributional similarity for sensitive attributes.

In traditional data anonymization, techniques like k-anonymity focus on making individual records indistinguishable from one another within groups to protect identity. However, these methods can still expose sensitive information by allowing adversaries to infer details based on the remaining data. T-Closeness addresses this vulnerability by ensuring that the distribution of sensitive attribute values in any group of records is close to the overall distribution of these values in the entire dataset.

The ‘T’ in T-Closeness represents a threshold, which defines how close the distribution of sensitive values in a given group must be to the distribution of the same values in the full dataset. Specifically, T-Closeness requires that the Earth Mover’s Distance between these two distributions does not exceed the predetermined threshold T. A smaller T gives stronger protection, because each group must mirror the full dataset more closely, while a larger T preserves more of the data's utility. Choosing T is therefore a trade-off between how much can be inferred about sensitive values and how useful the published data remains.
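The threshold check described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a categorical sensitive attribute in which every pair of distinct values is equally far apart, in which case the Earth Mover's Distance reduces to half the sum of absolute differences between the two distributions. The function names (`distribution`, `emd_equal_distance`, `satisfies_t_closeness`) and the sample data are hypothetical and chosen only for this example.

```python
from collections import Counter


def distribution(values, categories):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def emd_equal_distance(p, q):
    """Earth Mover's Distance under the assumption that all distinct
    categories are at distance 1 from each other. In that special case
    the EMD equals half the L1 distance between the distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def satisfies_t_closeness(groups, t):
    """Check that every group's sensitive-value distribution is within
    distance `t` of the distribution over the whole dataset."""
    all_values = [v for group in groups for v in group]
    categories = sorted(set(all_values))
    overall = distribution(all_values, categories)
    return all(
        emd_equal_distance(distribution(group, categories), overall) <= t
        for group in groups
    )


# Hypothetical sensitive values (diagnoses) for two groups of records.
groups = [
    ["flu", "flu", "cancer", "asthma"],
    ["flu", "cancer", "cancer", "asthma"],
]

print(satisfies_t_closeness(groups, 0.2))  # True: both groups are at distance 0.125
print(satisfies_t_closeness(groups, 0.1))  # False: 0.125 exceeds the threshold
```

For numerical or ordered sensitive attributes, the distance between values is usually not uniform, so a different ground distance (and a different EMD computation) would be needed; the sketch above does not cover that case.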

Overall, T-Closeness is an important model for data privacy, particularly in scenarios where sensitive information must be shared or analyzed. It strikes a balance between data utility and privacy protection, making it a valuable tool in the fields of data science, healthcare, and any domain where sensitive data is prevalent.