Description
While there is a plethora of mechanisms to ensure lawful access to
privacy-protected data, additional research is required in order to
reassure individuals that their personal data is being used for
the purpose that they consented to. This is particularly
important in the context of new data mining approaches, as used, for
instance, in biomedical research and commercial data mining.
In this project we investigate the use of computational workflows to
ensure and enforce appropriate use of sensitive personal
data. Computational workflows describe in a declarative manner the
data processing steps and the expected results of complex data
analysis processes such as data mining. We see workflows as an
artifact that captures, among other things, how data is being used and
for what purpose.
We therefore believe that computational workflow systems are a good
starting point and could be extended to support a variety of
privacy-related tasks, including:
- Ensuring compliance of a data analysis system with
specified privacy policies before enabling execution and during
execution via monitoring.
- Assisting users to comply with required privacy policies
by selecting data analysis workflows that comply with those
policies for the datasets to be analyzed.
- Enabling transparency of data analysis systems that use
sensitive information, including the generation of detailed
provenance trails.
- Supporting accountability with respect to the appropriate
use of data in compliance with privacy policies.
- Supporting negotiation and relaxation of privacy policies
as well as access to data, by providing evidence for the "need to
know" of sensitive data and, conversely, the ability to identify
opportunities for an increase in privacy where such measures do not
adversely affect quality.
More specifically, we are extending the Wings Workflow System.
Reasoning about Privacy Policies in Wings
We created a prototype of a workflow system that checks privacy
policies for workflows based on Wings. The workflows describe how data
is used in terms of how it is analyzed and processed. To exemplify
applications that could raise privacy concerns regarding use, we
modeled data mining algorithms that could be used as workflow steps,
called components, and created semantic representations of data
and workflows that use those components. Both components and data
were described in OWL/RDF.
We first defined a component catalog that contained a range
of data mining algorithms as well as privacy preservation
techniques. The catalog was not meant to be exhaustive, but rather
to be representative of the kinds of algorithms that are relevant to
reasoning about privacy. Data mining algorithms included
clustering methods (e.g., k-means, Gaussian mixture models),
manifold learning (e.g., GTM), and classification (e.g., SVM).
Privacy preservation techniques were divided into two subclasses:
per attribute and per dataset. The former had several subclasses
including anonymization, perturbation, and encryption. The class
of privacy preservation techniques per dataset included
generalization algorithms such as k-anonymity.
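As an illustration only, the class hierarchy described above can be sketched in plain Python. The actual catalog was expressed in OWL/RDF; the class names below, the `SUBCLASS_OF` table, and the helper functions are hypothetical stand-ins chosen for this sketch, not identifiers from Wings.

```python
# Hypothetical sketch of the component catalog taxonomy.
# Each entry maps a class to its direct superclass, mirroring a
# subclass relation in an ontology. All names are assumptions.
SUBCLASS_OF = {
    "DataMiningAlgorithm": "Component",
    "Clustering": "DataMiningAlgorithm",
    "KMeans": "Clustering",
    "GaussianMixtureModel": "Clustering",
    "ManifoldLearning": "DataMiningAlgorithm",
    "GTM": "ManifoldLearning",
    "Classification": "DataMiningAlgorithm",
    "SVM": "Classification",
    "PrivacyPreservation": "Component",
    "PerAttribute": "PrivacyPreservation",
    "Anonymization": "PerAttribute",
    "Perturbation": "PerAttribute",
    "Encryption": "PerAttribute",
    "PerDataset": "PrivacyPreservation",
    "Generalization": "PerDataset",
    "KAnonymity": "Generalization",
}


def ancestors(name):
    """Return the superclasses of `name`, nearest first."""
    chain = []
    while name in SUBCLASS_OF:
        name = SUBCLASS_OF[name]
        chain.append(name)
    return chain


def is_a(name, cls):
    """True if `name` is `cls` or one of its (transitive) subclasses."""
    return name == cls or cls in ancestors(name)
```

A privacy rule can then ask whether a workflow step is any kind of privacy preservation technique, for example `is_a("KAnonymity", "PrivacyPreservation")`, without enumerating every algorithm.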
We also defined a data ontology with semantic
representations of datasets, which essentially provided a meta-data
vocabulary that we could use to reason about how datasets are
transformed by the workflow components upon execution. Roughly,
attributes of datasets had associated properties that expressed
whether the attributes were protected by privacy preservation
methods (e.g., whether they were anonymized).
In addition, domain-specific ontologies were used to express the
use that was authorized by the individuals when the data was
collected.
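The following is a minimal sketch of how such dataset meta-data might be represented and updated when a privacy preservation component runs. It assumes a simplified model in plain Python rather than the OWL/RDF ontology itself; `Attribute`, `Dataset`, `apply_protection`, and the example dataset are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attribute:
    name: str
    # Protections applied so far, e.g. {"anonymized"} (assumed vocabulary).
    protections: frozenset = frozenset()


@dataclass(frozen=True)
class Dataset:
    name: str
    attributes: tuple
    # Uses authorized by individuals at collection time (assumed vocabulary).
    authorized_uses: frozenset = frozenset()


def apply_protection(dataset, attribute_name, protection):
    """Describe the output of a per-attribute privacy step.

    Returns new meta-data; the input description is left unchanged.
    """
    new_attributes = tuple(
        replace(a, protections=a.protections | {protection})
        if a.name == attribute_name
        else a
        for a in dataset.attributes
    )
    return replace(
        dataset,
        name=f"{dataset.name}_{protection}",
        attributes=new_attributes,
    )


# Hypothetical example: anonymize the "name" attribute of a patient dataset.
raw = Dataset(
    "patients",
    (Attribute("name"), Attribute("age")),
    frozenset({"biomedical_research"}),
)
protected = apply_protection(raw, "name", "anonymized")
```

Keeping the descriptions immutable means each intermediate dataset in a workflow has its own meta-data record, which is convenient when reasoning about what a later step is allowed to see.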
Using this data ontology, we populated a data catalog with
initial datasets and specified their meta-data attributes and
values. Finally, we defined workflows whose
computational steps were elements of the component catalog and
whose input datasets were elements of the data catalog.
We defined rules that would represent reasonable constraints to
address privacy protection. Each rule had a context that referred to
the condition where the underlying policy was relevant, so that the
policy applied only if this condition was satisfied, and a set of
requirements that represented non-amendable conditions under which the
use of data was required or not allowed.
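The context/requirements structure of a rule can be sketched as follows. This is a simplified illustration in Python, not the rule language used in the prototype; `PolicyRule`, `violations`, the dictionary-based workflow description, and the example rule are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed workflow description: a dict with an ordered list of step
# names and a flag saying whether the input contains sensitive data.
Workflow = Dict[str, object]

CLUSTERING = {"KMeans", "GaussianMixtureModel"}


@dataclass
class PolicyRule:
    name: str
    context: Callable[[Workflow], bool]             # when the policy applies
    requirements: List[Callable[[Workflow], bool]]  # what must then hold

    def check(self, wf):
        """A rule is satisfied if it does not apply or all requirements hold."""
        if not self.context(wf):
            return True
        return all(req(wf) for req in self.requirements)


def violations(wf, rules):
    """Names of the rules that the workflow violates."""
    return [r.name for r in rules if not r.check(wf)]


def uses_clustering_on_sensitive_data(wf):
    return bool(wf["sensitive_input"]) and any(s in CLUSTERING for s in wf["steps"])


def anonymized_before_clustering(wf):
    steps = wf["steps"]
    first = min(i for i, s in enumerate(steps) if s in CLUSTERING)
    return "Anonymization" in steps[:first]


# Hypothetical rule: sensitive data must be anonymized before clustering.
rule = PolicyRule(
    "anonymize_before_clustering",
    uses_clustering_on_sensitive_data,
    [anonymized_before_clustering],
)

compliant = {"steps": ["Anonymization", "KMeans"], "sensitive_input": True}
non_compliant = {"steps": ["KMeans"], "sensitive_input": True}
not_applicable = {"steps": ["KMeans"], "sensitive_input": False}
```

Separating the context from the requirements means a rule that does not apply to a workflow is never reported as violated, which matches the description that a policy applies only if its condition is satisfied.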