RDA/NISO Privacy Implications of Research Data Sets IG Case Statement
Version 3 of this statement, updated as an Interest Group Charter, is linked in the attachments to this page below.
[Old Case Statement]
Working Group Charter
This Working Group will explore issues related to scientific research data sets that contain human subject information, as well as related datasets that could be combined in ways that expose private information. Over a period of 18 months, the group will develop a framework for how researchers and repositories should appropriately manage human-subject datasets, develop a metadata set to describe the privacy-related aspects of research datasets, compile a bibliography of related resources, and build awareness of the privacy implications of research-data sharing. Privacy is one part of the broader ethical, legal, and data-publishing issues surrounding data management; this working group will focus specifically on privacy-related concerns and will support, where appropriate, the related work of other RDA working groups.
The group will work to achieve the following specific outcomes:
- Development of a framework that explains, at a high level, the precautions that data creators, repositories, aggregators, and scientists should take in creating, using, preserving, and providing access to research data.
- Definition of the key vectors through which privacy issues arise in the ecosystem of data sharing and reuse.
- An outline of situations where the privacy principles would be applied.
- Identification of key areas of variance in privacy laws or regulations at national and international levels that are significant when sharing data worldwide.
- Definition of a set of technical metadata that can describe the privacy-related information contained within a data set and its parameters for use, along with a description of where the metadata should be applied.
- A bibliography of data- and privacy-related materials, gathered and shared for public use.
- Advancement of adoption of the principles through an outreach and communications campaign.
The information that comes out of this project will enhance the privacy of people worldwide whose personal data become the subject of research, and will offer guidelines to those involved in the collection, preservation, sharing, use, and re-use of those data. The latter group is very broad, as the number of fields that are using, or could benefit from applying, human-subject research data is tremendous. Medicine and psychology are obvious examples, but data science is also being integrated into other fields such as the humanities and social sciences. Disciplines have developed ethics for researchers in these situations, and human-subjects review protocols ensure proper treatment of research subjects during studies, but no generalized guidelines yet exist for the same privacy issues in the deposit, preservation, and re-use of such datasets. Institutional Review Boards (IRBs), primarily within the United States, vet a variety of research processes, including data management and reuse policies; yet even there, a 2014 report by the National Academies recommended a number of changes to the IRB system, changes that will be informed by the recommendations produced by this working group. Other parties that might benefit from this effort include research funding bodies, governments, and academic data repositories.
1) Improving the understanding of the privacy issues that relate specifically to research data from distinct stakeholder perspectives.
2) Support of a worldwide dialogue about the privacy issues that surround the sharing, combination, and reuse of research data.
3) Reduction of the risk of an unintentional release of personally identifiable information through the sharing or reuse of research data.
4) Reduction, through the creation and adoption of this framework, of the potential risk to scientific discovery writ large that might be caused by the unintentional but significant exposure of personal data.
Engagement with existing work in the area
NISO and RDA are both involved in related work. Related efforts are also being undertaken by outside organizations, and this working group will in some cases include individuals involved in those endeavors. In other cases, the results of the outside work being undertaken will be studied and, where applicable, applied to this project.
The following are some related projects.
NISO has completed a project, funded by the Mellon Foundation, on the privacy of patron data in library, publisher, and software-provider systems. This effort created a high-level set of principles that will give the scholarly communications community a benchmark for addressing these issues. The principles were distributed and discussed in late 2015. Although that project focused explicitly on the U.S. market, and on publisher and library end-user services rather than on data, it is related to, and will inform, this work.
The Research Data Alliance has a number of groups that are exploring related issues as well. An interest group within the RDA is focused on Legal Interoperability for data sets. This group has been developing a core set of principles and guidelines that include best practices through which legal interoperability can be achieved. For human subject data, a core component of legal interoperability deals directly with privacy issues.
A new RDA working group has formed to explore security and trust as they relate to research data. The group, which has posted its case statement for comment, will focus primarily on the technological aspects of security and on the trust-building needed to secure data that could cause harm if released. Security is certainly a component of privacy protection, and there are many examples of efforts to share information securely, but a significant portion of privacy-related issues are matters of policy rather than technology.
Yet another intersecting group within RDA is centered on the topic of Ethics and Social Aspects of Data (ESAD). This group is studying a broad set of issues surrounding data sharing and the culture of scientists. It is creating an annotated bibliography and plans to pursue two additional deliverables: educational materials and case studies of ethical dilemmas faced by researchers working with data. Privacy is one of many concerns the group is focused on, and many of the ethical issues it addresses extend well beyond privacy. Conversations between ESAD and this group have already begun, and the efforts should be complementary.
Several other groups have similar connections to privacy. As this project develops, liaisons and points of contact with them will be explored and fostered. One such effort is the work by the Data & Society Institute and its Council for Big Data, Ethics, and Society (BDES). Data & Society is “a research institute in New York City that is focused on social, cultural, and ethical issues arising from data-centric technological development.” In 2015, the project produced a survey report entitled Human-Subjects Protections and Big Data: Open Questions and Changing Landscapes, which outlines some of the challenges related to scholarly data resources and privacy. In September of that year, BDES announced a new network to “facilitate information sharing, discussion, and community building among academics, practitioners, researchers, and others who seek to raise important questions, share opportunities, and ask for help navigating complex data ethics issues.”
There is a significant project led by the Harvard School of Engineering and Applied Sciences entitled Privacy Tools for Sharing Research Data. This effort is part of a larger National Science Foundation Secure and Trustworthy Cyberspace project and has received additional support from the Sloan Foundation and Google, Inc. The project’s goals are “to help enable the collection, analysis, and sharing of sensitive data while providing privacy for individual subjects.” Much of its work has centered on tools that support differential privacy and on risk assessment as a framework for decision making about the risks and controls necessary to protect privacy. The group has developed open course materials, hosted seminars, and produced a variety of papers and presentations. It also organized a public symposium, Privacy in a Networked World, hosted by the Harvard Institute for Applied Computational Science on Friday, January 23, 2015. Speakers included Edward Snowden, Bruce Schneier, John DeLong, John Wilbanks, Lee Rainie, and Cynthia Dwork. This initiative is primarily, though not exclusively, focused on the technological and computational elements of privacy and data protection. NISO has worked closely with several members of this team and plans to include them in this working group.
The group will focus on world-wide legal frameworks and the impacts these frameworks have on data sharing, especially with human-subject data. After gathering these legal strictures and comparing the differences and similarities, the group will begin crafting a set of principles that will provide guidance to the researcher and repository communities on how to manage these data when they are received. Building on these, the group will craft a set of use cases on how the principles will be applied. After these elements are completed, an effort to advance the principles through promotion and community outreach will be developed and executed.
This working group will be open to all RDA and NISO members.
Mechanisms and Coordination
The group will meet face-to-face during the RDA Plenaries to craft its work plan, build interest and awareness, and share its work with other working and interest groups concerned with privacy and data sharing.
The group will meet virtually twice per month between plenaries, using NISO’s virtual meeting service as well as RDA’s supporting services, such as the RDA website and wiki.
Meetings and group communications will be coordinated by NISO staff and the other co-chairs.
NISO is planning to host a public symposium on research data and privacy in coordination with RDA P8 in Denver, CO, in the fall of 2016.
All documents related to this project will be publicly available on the RDA website and mirrored on the NISO website.
A timeline for the project is included within the full Case Statement.
For more details on the group timeline and potential group membership, download the full Case Statement. A revision to the full Case Statement has been posted.
Author: George Alter
Date: 24 Jan, 2016
This is an important issue, and it is a valuable addition to RDA.
My only reservation is that the Case Statement puts most of its focus on legal frameworks. International differences in legal frameworks are an important topic, but there has been extensive development of policies, procedures, and technologies for sharing confidential data that are not driven by specific legislation. There is also rapid development in areas like synthetic data, differential privacy, and secure multi-party computing, which won't be captured by focusing on laws and regulations. It would be useful for the WG to conduct a survey of practices for sharing confidential data to reveal how repositories comply with privacy legislation.
Author: Todd Carpenter
Date: 26 Feb, 2016
Thank you for your comment. Perhaps the Case Statement reads too much as if the focus is solely on legal frameworks. That is not the intention, as we are hoping to make this project broader than that. A better formulation might be "policy frameworks", which covers technological, legal, institutional, and ethical norms. That said, there are ethical and systems issues that are adequately covered by other IGs and WGs, with which this WG will partner. With that in mind, we won't be digging too deeply into specific technological frameworks for security and technology practice. Reference to existing practice
Your idea of a survey of existing practice is a useful one, which we will consider and try to execute. Our initial phase of work will include a variety of data gathering, and this fits well within that.
An informal and *very* small-scale survey of some institutional repositories (only in the US) indicated an unwillingness to host data that contains personally identifiable information. This outreach didn't cover a host of domain-specific repositories that specialize in these types of data, and I am very reluctant to draw any conclusions from those anecdotes. I expect a more robust survey will highlight existing practice and topics around which this committee can build.
Thank you again for the suggestion. I hope we'll be able to connect about this effort, if not in Tokyo, then via phone in the near future.