Welcome to the CS 295 D homepage!
This is a special topics course on data privacy and confidentiality. Data privacy and confidentiality issues arise whenever we collect information about individuals from which sensitive properties or attributes could be inferred. Traditionally, privacy research has focused on data collected by census bureaus, which periodically collect a variety of demographic and occupational information. Such information is used for a variety of purposes, from planning to policy. Since the information may contain attributes deemed sensitive, a variety of privacy technologies have been explored in that context. More recently, over the past two decades, with the advent of computerization, an abundance of hardware for data capture, and the proliferation of the Web, data collection has exploded. Today, web portals such as Google and Yahoo capture information about individuals to customize their web experience; sensor-based pervasive spaces continuously capture information about individuals to provide improved or customized services; and so on. Indeed, the ease with which data can be captured, together with the benefits of such capture, has resulted in a situation where data about individuals is omnipresent. This has led to research on privacy and confidentiality challenges. In this course, we will explore these challenges and the technologies being developed to address them, focusing on the following contexts:
Data Publishing -- when an entity such as a census bureau or any other organization publishes data to support analysis tasks by others. The data being published may contain identifying and sensitive information, which must be scrubbed before the data is released. A typical example is medical information drawn from medical records. Publishing such micro-data is useful for a variety of purposes, but any such release is subject to privacy laws such as HIPAA.
Data Outsourcing -- in data outsourcing scenarios, a user/data owner uses cloud services to push his/her data to a third party to manage. Such outsourcing has numerous advantages, ranging from improved data management to support for mobility to improved data sharing. However, the data being outsourced may contain sensitive information, which raises privacy concerns, confidentiality concerns, or both. Such information therefore needs to be transformed so that the data service provider cannot infer any sensitive information from it. If the only service provided were storage, outsourcing would be relatively straightforward given today's encryption technologies. The challenge grows as service providers offer increasingly complex services, including search, sharing, etc.
Data Collection in Pervasive/Multimedia -- pervasive spaces follow the paradigm of data collection, analysis, and adaptation to enable a variety of new functionalities and applications. Such data is typically captured through sensors, which can take multiple forms. Sensor data capture and representation pose new privacy challenges and potentially expose interesting tradeoffs in terms of privacy: the more invasive the sensing, the better the representation of the individual and of the state of the system can be, which can lead to better adaptations. The papers in this area develop techniques for inference in pervasive spaces, establish privacy policies, explore these tradeoffs, or do some combination of the three.
Data Sharing/Exchange -- while outsourcing, data collection, and publishing consider an owner of data allowing access to others for a variety of purposes, sharing/exchange looks at a more symmetric use of information, wherein data owners share data with each other to support some analysis. For instance, two pharmacies may want to share data for data mining purposes, e.g., trend analysis. However, sharing at the data level is prohibited (or at least not desired) for a variety of reasons. The goal becomes how to achieve minimal disclosure of data and yet support applications that require data from multiple owners to be analyzed.
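As a rough illustration of the scrubbing step mentioned under Data Publishing above, the sketch below drops a direct identifier, coarsens two quasi-identifiers, and checks a k-anonymity condition. This is only one possible formalization and is not prescribed by the course description: the choice of k-anonymity, the field names (name, age, zip, diagnosis), the generalization rules, and the sample records are all assumptions made for illustration.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: 10-year age bucket and 3-digit ZIP prefix.
    (Hypothetical generalization rules, chosen only for this example.)"""
    decade = (record["age"] // 10) * 10
    return (f"{decade}-{decade + 9}", record["zip"][:3] + "**")

def is_k_anonymous(records, k):
    """True if every generalized quasi-identifier combination occurs at least k times."""
    counts = Counter(generalize(r) for r in records)
    return all(c >= k for c in counts.values())

def scrub(records):
    """Drop the direct identifier (name); keep generalized quasi-identifiers
    together with the sensitive attribute."""
    return [generalize(r) + (r["diagnosis"],) for r in records]

# Made-up sample micro-data, not real records.
records = [
    {"name": "A", "age": 34, "zip": "92617", "diagnosis": "flu"},
    {"name": "B", "age": 37, "zip": "92612", "diagnosis": "asthma"},
    {"name": "C", "age": 52, "zip": "92697", "diagnosis": "flu"},
    {"name": "D", "age": 58, "zip": "92660", "diagnosis": "diabetes"},
]

if __name__ == "__main__":
    print(scrub(records))
    print(is_k_anonymous(records, 2))
    print(is_k_anonymous(records, 3))
```

With these sample records, each generalized (age range, ZIP prefix) pair is shared by two people, so the check passes for k = 2 and fails for k = 3. Grouping alone does not stop an attacker from learning a sensitive value when everyone in a group shares it, which is one reason scrubbing is harder than it first appears.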
The bulk of the class will focus on reading and presenting recent papers from the database, applied crypto, and security communities. The course will also include class projects.
Time and place:
Class: Mondays and Wednesdays, 2:00-3:20pm in ICS 253
Final: Friday Mar 19, 1:30-3:30pm
Contact:
Sharad Mehrotra: sharad[at]ics[dot]uci[dot]edu
Bijit Hore: bhore[at]ics[dot]uci[dot]edu