Problem Statement & Scope
Recent years have seen an increase in the use of digital devices with location capabilities. Mobility/trajectory data are collected by various devices, such as smartphones and cameras. Pervasive use of these devices, although convenient, leaves a non-erasable digital trace of the user. The contextual information attached to a trace can be used to uncover users' habitual patterns and activities. Our work quantifies the privacy leakage in a given mobility dataset after it has been anonymized with a location privacy protection mechanism (LPPM). We follow an adversarial approach: taking the role of an attacker, we attempt to recover the original trajectories given the anonymized trajectory dataset and the mobility profiles of individuals.
Our problem statement can be summarized as:
"How much user information is leaked after a trajectory dataset has been protected with a Location Privacy Preservation Mechanism?"
CONTRIBUTIONS OF OUR WORK:
Identify the right metric to measure users' privacy in an anonymized mobility dataset
Quantify privacy for users in a published anonymized mobility dataset
Identify the set of methods and parameters that provide maximum privacy for mobility datasets
The dataset provides mobility traces of cabs in San Francisco, USA. It contains GPS coordinates of 536 taxis collected during May 2008. Each record has the format [latitude, longitude, occupancy, time]: time is a UNIX epoch timestamp, latitude and longitude are in decimal degrees, and occupancy is 0 or 1 depending on whether the cab is occupied.
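The record layout above maps directly to a small parser. The following is a minimal sketch, assuming whitespace-separated fields in the order [latitude, longitude, occupancy, time]; the function name and sample coordinates are illustrative, not taken from the dataset distribution.

```python
from datetime import datetime, timezone

def parse_trace_line(line):
    """Parse one record of the form: latitude longitude occupancy unix_time."""
    lat, lon, occ, ts = line.split()
    return {
        "latitude": float(lat),            # decimal degrees
        "longitude": float(lon),           # decimal degrees
        "occupied": occ == "1",            # 1 = cab occupied, 0 = free
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
    }

# Illustrative record (coordinates/timestamp are made up for the example):
record = parse_trace_line("37.75134 -122.39488 1 1211000000")
```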
A trajectory dataset containing GPS traces of taxis in the city of Beijing, provided by T-Drive under the Microsoft Research License Agreement (MSR-LA). It contains week-long trajectories of 10,537 taxis with around 15 million data points, recorded from Feb 2 to Feb 8, 2008.
An Adversarial Approach
Our project quantifies the user location privacy of a given mobility dataset through adversarial attacks against different standard Location Privacy Preserving Mechanisms (LPPMs).
An adversary is an entity that takes the role of an attacker to recover the users' original traces given the anonymized trajectory dataset and the mobility profiles of individuals. The adversary is assumed to know the anonymization function used to anonymize the actual traces, as well as some part of the actual traces.
The approach can be divided into three steps:
(i) KNOWLEDGE CONSTRUCTION: The adversary builds knowledge from the known part of the actual trace, collecting various pieces of information about the mobility of the users.
(ii) DE-OBFUSCATION: Obtaining the actual location of users which was previously obfuscated.
(iii) TRACING ATTACK: The adversary attempts to recover the whole sequence (or a partial subsequence) of the user's actual trace. In our work, the scope of the tracing attack is limited to reconstructing the entire trajectory.
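A common way to realize a tracing attack of this kind is to model the user's mobility profile as a Markov chain over discretized regions and run Viterbi decoding to find the most likely region sequence given the obfuscated observations. The sketch below makes that assumption; the function name, the shape of the observation-likelihood matrix, and the toy inputs are all illustrative, not the exact attack used in our work.

```python
import numpy as np

def viterbi_trace(obs_lik, trans, prior):
    """Reconstruct the most likely region sequence (tracing attack sketch).

    obs_lik: (T, R) likelihood of each obfuscated observation given each region
    trans:   (R, R) mobility profile, trans[i, j] = P(next=j | current=i)
    prior:   (R,)   initial region distribution
    """
    T, R = obs_lik.shape
    delta = np.log(prior) + np.log(obs_lik[0])     # best log-score ending in each region
    back = np.zeros((T, R), dtype=int)             # backpointers for path recovery
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans)    # (R, R): from-region x to-region
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(obs_lik[t])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                  # walk backpointers to the start
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With a sticky two-region mobility profile and observations that favor region 0 early and region 1 at the end, the decoder switches regions only when the evidence outweighs the transition penalty.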
Quantifying Privacy of Users
Distortion/correctness was identified as the metric to quantify the success of the adversarial attack and measure users' privacy.
Correctness (or distortion) measures how much of the original trace was reconstructed by the adversary.
Correctness measures the distance between the adversary's outcome and the actual outcome. The actual outcome is what we want to hide from the adversary.
If the distance is 0, the adversary's outcome and the actual outcome are equal: correctness is maximal, meaning the adversary has complete knowledge and is able to identify the user's actual trace.
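One concrete instantiation of this metric is the mean great-circle distance between the adversary's reconstructed trace and the actual trace, so that 0 means the attack fully succeeded. This is a minimal sketch under that assumption; the function names and the haversine choice of distance are illustrative.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def distortion(actual, reconstructed):
    """Mean distance between the actual trace and the adversary's outcome.
    0 means the adversary recovered the trace exactly (lowest privacy)."""
    return sum(haversine_km(a, r) for a, r in zip(actual, reconstructed)) / len(actual)
```

Higher distortion means the adversary's reconstruction is farther from the truth, i.e. more residual privacy for the user.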
Parameters to maximize Privacy
The impact of various LPPM parameters on privacy was studied, and we tuned these parameters to maximize privacy.
These parameters included:
Number of Regions
Location Hiding Level
Hiding Levels Probability
Fake Noise Injection Probability
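To make the hiding and fake-injection parameters concrete, the sketch below applies a simplified per-point LPPM: each point is hidden with one probability or replaced by a fake location with another, otherwise released unchanged. This is an illustrative toy, not the exact mechanisms we evaluated, and the function and parameter names are assumptions.

```python
import random

def obfuscate(trace, hide_prob, fake_prob, candidate_locations, rng=None):
    """Toy LPPM: per point, hide with prob hide_prob, inject a fake location
    with prob fake_prob, otherwise release the true location."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = []
    for point in trace:
        u = rng.random()
        if u < hide_prob:
            out.append(None)                                 # location hiding
        elif u < hide_prob + fake_prob:
            out.append(rng.choice(candidate_locations))      # fake/noise injection
        else:
            out.append(point)                                # released unchanged
    return out
```

Raising either probability hides more of the true trace (more privacy) at the cost of releasing less accurate data, which is exactly the trade-off the parameter study explores.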
One question remains: how usable will the dataset be to a researcher if the actual information is hidden and distorted? The scope of our project was limited to the adversary's reconstruction of users' traces; evaluating the usability of datasets after applying these LPPMs is left as future work.