Research
Open Questions and Requirements
In our research we consider the following open questions and
requirements, which we derived from insights into use cases
studied in conjunction with our prototype implementation.
A Usage-Oriented Policy Language
A language for representing privacy policies for workflows needs
to be developed, together with a semantics for reasoning about it. The
language needs to capture a variety of aspects of private
information and privacy-relevant algorithms, and support novel
types of privacy policies, such as:
- Algorithmic policies, to specify what kinds of data analysis
algorithms are allowed. These could be allowed for specified data
types, for specific data sources, or in a data-independent manner.
For example, group detection algorithms could be disallowed for
use with medical data sources. Another example would be to disable
the use of group detection followed by event detection algorithms
unless the accuracy of the data sources is above a certain level. Such
a policy could prevent individuals from being identified as threats
with an accuracy so low that it endangers their civil liberties.
Algorithmic policies may be contingent on properties of
intermediate data products. Such policies may also express that
certain steps have to be performed before storing a result, or
transmitting data over an unsecured network. Expressing and
reasoning about these types of policies may build on Linear
Temporal Logic which has proved useful in other areas
of computer science, most notably software verification and more
recently automated planning.
- Query-based policies, to specify what kinds of questions
the system is allowed to act upon. These include both
user-issued queries as well as system-generated intermediate
sub-queries. For example, queries regarding payments may be
allowed to the system in accessing any kind of sources including
medical and financial sources, while any sub-queries regarding
the nature or details of patient treatment may be disallowed.
- Data integration policies, to specify at the workflow
level whether diverse data sources could be integrated through
data mining steps. These would essentially control the legal
joining of workflow strands.
- Data creation policies, to specify what kinds of data
may be created by the workflow. This could be specified via
attribute types, entity types, or specific values.
- Provenance policies, to specify what information needs to
be recorded and for how long it needs to be kept. These would
reflect privacy needs for auditing and the statute of
limitations for such requirements. Without these policies, there
is no limit to the amount of detail that a system could be
expected to provide long after a workflow is used, so it is best
to state these expectations up front.
These policies augment and are complementary to access policies
for specific data sources or services in the system.
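As a concrete illustration of how such a policy language might be represented, the following sketch encodes one algorithmic policy (group detection disallowed for medical data sources) and checks a candidate workflow step against it. All class and function names here are hypothetical assumptions; the source does not define a concrete syntax for the language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlgorithmicPolicy:
    """Hypothetical policy: forbids an algorithm on sources tagged
    with any of the listed data types."""
    algorithm: str
    forbidden_data_types: frozenset

@dataclass(frozen=True)
class DataSource:
    name: str
    data_types: frozenset

def step_allowed(policy, algorithm, source):
    """A workflow step is allowed unless the policy names its algorithm
    AND the source carries a forbidden data type."""
    return not (algorithm == policy.algorithm
                and policy.forbidden_data_types & source.data_types)

# Example from the text: group detection disallowed for medical sources.
policy = AlgorithmicPolicy("group_detection", frozenset({"medical"}))
records = DataSource("hospital_records", frozenset({"medical"}))
sales = DataSource("sales_log", frozenset({"financial"}))

assert not step_allowed(policy, "group_detection", records)
assert step_allowed(policy, "group_detection", sales)
assert step_allowed(policy, "event_detection", records)
```

A real language would additionally need temporal operators (e.g., in the style of Linear Temporal Logic) to express sequencing constraints such as "anonymize before storing"; this sketch covers only the simplest, stateless case.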
Extending Workflow Systems
Given this language, existing workflow systems would need to be
extended in the following three ways.
- Workflow creation and execution subsystems need to be
extended.
The workflow creation process that is responsible for selecting
the data mining processes and data sources to be used in answering
a query or line of inquiry needs to be governed by privacy
policies that place constraints on the choices of data sources and
algorithms.
The extended workflow system should exercise full control over the
design of the end-to-end data mining process before any
computation occurs.
The execution system needs to enforce privacy constraints on
decisions about where data is analyzed, and to enforce aspects
that can only be evaluated during execution itself.
For example, a privacy policy may state that if the output of a
clustering algorithm contains a cluster with less than k
individuals then the analysis is not allowed. Generally the
fidelity of the models of applied components will not be high
enough to predict such situations ahead of execution.
- Workflow systems need to leave detailed provenance trails of
how data was processed and what mechanisms the workflow used to
ensure compliance with privacy policies, both in its design and
in its execution, in order to support transparency and
accountability regarding violations of privacy policies on the
use of data. Re-execution of workflows through provenance
trails could be used to prove, during an audit, that a given
result was obtained as advertised.
- Workflow systems should support a distributed architecture
for storage and retrieval of policy information. There may be
several ways in which privacy requirements enter the system.
Privacy rules need to be associated with different entities in the
system. Some privacy policies should be associated with data when
it is collected. Other privacy policies would be associated with
collections or types of data (e.g., all the data collected by a
clinical trial). Yet other policies may be application or system
specific (e.g., federal or state privacy laws that may apply).
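The runtime-enforcement point above can be sketched as a policy gate around an analysis output: the example policy from the text states that a clustering result may not be released if any cluster contains fewer than k individuals, a condition that generally cannot be predicted before execution. Function names are hypothetical.

```python
def clusters_satisfy_k(clusters, k):
    """True if every cluster in an analysis output contains at least
    k individuals; otherwise the result must be withheld."""
    return all(len(cluster) >= k for cluster in clusters)

def release_if_allowed(clusters, k):
    """Runtime policy gate: plan-time models of the components cannot
    predict cluster sizes, so the check runs on the actual output."""
    if not clusters_satisfy_k(clusters, k):
        return None  # suppress the result instead of releasing it
    return clusters

# A cluster with a single individual violates a k=2 policy.
output = [["p1", "p2", "p3"], ["p4"]]
assert release_if_allowed(output, k=2) is None
```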
An important open issue
is the trade-off between privacy and result quality. Many
privacy-preserving operations abstract away information from the
data, which leads to less accurate results. Data descriptions and
algorithm models will have to be extended to represent the relative
accuracy of algorithms operating on abstracted data features.
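One very simple way such extended models could be used is to attach an estimated accuracy factor to each abstraction step and let the workflow planner compose them into an end-to-end estimate. The multiplicative composition and the step names below are assumptions for illustration, not a model proposed by the source.

```python
# Assumed per-step accuracy factors (hypothetical values).
STEP_ACCURACY = {
    "generalize_zip_to_region": 0.9,
    "suppress_rare_values": 0.85,
    "cluster_analysis": 0.95,
}

def estimated_accuracy(steps):
    """Compose per-step accuracy factors into an end-to-end estimate
    (assuming, simplistically, that losses multiply)."""
    acc = 1.0
    for step in steps:
        acc *= STEP_ACCURACY[step]
    return acc

plan = ["generalize_zip_to_region", "suppress_rare_values",
        "cluster_analysis"]
# A planner could reject plans whose estimate falls below a threshold.
assert estimated_accuracy(plan) < 0.9
```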
Reasoning about Privacy and Privacy Policies
An important open question is the negotiation of policies.
Mechanisms need to be developed that support arguing a
"need to know" in order to relax privacy requirements when
necessary. When the privacy policies are too constraining for the
system to find a solution to a query, it is possible to explore
relaxations of some subset of policies that would enable the
original request to be fulfilled.
By articulating the choices that the system rejected and the
privacy policies that forbade those analyses, the system would
make explicit its "need to know" for specific data sources and
data products.
Conversely, the developed mechanisms could be used to check
whether existing information disclosure agreements are indeed
necessary for the purpose, or whether the level of privacy
could be increased, e.g., via the inclusion of additional
anonymization steps, without adversely affecting the quality of the
final result.
Such mechanisms for reasoning about policies
may also assist in
the design of privacy policies themselves, by enabling exploration
of allowable but undesirable workflows under a given set of
policies. This is important, because it may be difficult to design
policies that are complete, in the sense that there is no way to
exploit sensitive data when complying with them.
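A minimal sketch of the relaxation mechanism described above: given a candidate plan and a set of policies expressed as predicates, the system reports exactly which policies forbid the plan, which articulates its "need to know" for the request. The policy names and plan encoding are hypothetical.

```python
def needed_relaxations(plan, policies):
    """Return the names of the policies that forbid a candidate plan;
    reporting these articulates the system's "need to know"."""
    return [name for name, allows in policies.items() if not allows(plan)]

# Hypothetical policies over a plan described as a set of step names.
policies = {
    "no_group_detection_on_medical":
        lambda plan: "group_detection(medical)" not in plan,
    "anonymize_before_storage":
        lambda plan: "store" not in plan or "anonymize" in plan,
}

plan = {"group_detection(medical)", "store"}
print(needed_relaxations(plan, policies))
# → ['no_group_detection_on_medical', 'anonymize_before_storage']
```

The same check run in the other direction supports the converse use in the text: if adding an "anonymize" step to the plan leaves no violated policies, the stricter workflow can be adopted without any relaxation.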