AI Glossary: What Is T-Closeness? Definition & Meaning

T-Closeness is a privacy model designed to strengthen data protection when data is shared or published. It extends earlier models such as k-anonymity and l-diversity by introducing the notion of distributional similarity for sensitive attributes.

In traditional data anonymization, techniques like k-anonymity focus on making individual records indistinguishable from one another within groups to protect identity. However, these methods can still expose sensitive information: if most records in a group share the same sensitive value, an adversary who knows which group a person belongs to can infer that value. T-Closeness addresses this vulnerability by ensuring that the distribution of sensitive attribute values in any group of records is close to the overall distribution of those values in the entire dataset.
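
The weakness described above can be illustrated with a minimal Python sketch. The records, the column layout, and the helper names (group_by_quasi_identifiers, is_k_anonymous, homogeneous_groups) are hypothetical and chosen only for illustration. The sketch shows a table that is 3-anonymous yet still reveals the diagnosis of everyone in one group.

```python
from collections import defaultdict

# Hypothetical toy records: (generalized ZIP code, age range, diagnosis).
# The first two fields act as quasi-identifiers; the diagnosis is sensitive.
records = [
    ("476**", "20-29", "flu"),
    ("476**", "20-29", "flu"),
    ("476**", "20-29", "flu"),
    ("479**", "30-39", "flu"),
    ("479**", "30-39", "cancer"),
    ("479**", "30-39", "asthma"),
]


def group_by_quasi_identifiers(rows):
    """Collect the sensitive values of all rows sharing the same quasi-identifiers."""
    groups = defaultdict(list)
    for zip_code, age_range, diagnosis in rows:
        groups[(zip_code, age_range)].append(diagnosis)
    return dict(groups)


def is_k_anonymous(groups, k):
    """True if every group contains at least k records."""
    return all(len(values) >= k for values in groups.values())


def homogeneous_groups(groups):
    """Return the groups in which every record has the same sensitive value."""
    return [key for key, values in groups.items() if len(set(values)) == 1]


groups = group_by_quasi_identifiers(records)
print(is_k_anonymous(groups, 3))   # every group has three records
print(homogeneous_groups(groups))  # groups that leak their sensitive value
```

Here the first group passes the k-anonymity check, but anyone known to fall in it must have the flu. This is the kind of inference that a distribution-based requirement is meant to limit.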

The ‘T’ in T-Closeness represents a threshold, which defines how close the distribution of sensitive values in a given group must be to the distribution of the same values in the full dataset. Specifically, T-Closeness requires that the Earth Mover’s Distance (EMD) between these two distributions does not exceed the predetermined threshold T. Because T is tunable, the model allows a more nuanced approach to privacy: a smaller T limits what an adversary can learn about sensitive values from group membership, while a larger T preserves more of the data’s utility.
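
The threshold check can be sketched in a few lines of Python. This is a simplified illustration, not a full anonymization procedure. It assumes a categorical sensitive attribute in which every pair of distinct values is equally far apart, in which case the EMD reduces to half the sum of absolute differences between the two probability vectors. The function names and the toy groups are hypothetical, and every group is assumed to be non-empty.

```python
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in values (values must be non-empty)."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def emd_equal_distance(p, q):
    """EMD between two categorical distributions, assuming all distinct
    categories are equally far apart (ground distance 1)."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def group_distances(groups):
    """Distance of each group's distribution from the overall distribution."""
    all_values = [v for group in groups for v in group]
    categories = sorted(set(all_values))
    overall = distribution(all_values, categories)
    return [emd_equal_distance(distribution(g, categories), overall) for g in groups]


def satisfies_t_closeness(groups, t):
    """True if no group's distribution is farther than t from the overall one."""
    return all(d <= t for d in group_distances(groups))


# Hypothetical sensitive values, already split into groups of records.
groups = [
    ["flu", "flu", "flu"],
    ["flu", "cancer", "asthma"],
]

print(group_distances(groups))
print(satisfies_t_closeness(groups, 0.4))
print(satisfies_t_closeness(groups, 0.2))
```

In this toy data both groups lie at a distance of one third from the overall distribution, so the table passes with T = 0.4 and fails with T = 0.2. For ordered or numeric attributes a different ground distance would apply, which this sketch does not cover.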

Overall, T-Closeness provides a robust framework for data protection, particularly in scenarios where sensitive information must be shared or analyzed. It strikes a balance between data utility and privacy protection, making it a valuable tool in the fields of data science, healthcare, and any domain where sensitive data is prevalent.