
TO ALL GROUPS Feedback from Tom Dafoe


    Natalie Harrower
    Participant

    Dear moderators,
    We received a document with very specific comments, well laid out by page number, from Tom Dafoe. I have posted it here for your reference: https://docs.google.com/document/d/1NKZvUvic8W2psegSWaBtvLza30cTm5Tb/edit
    To make things easier, I am pasting below the parts that relate specifically to each of your sections, with handy colour coding to add a bit of flair to such a long email. There is something for everyone here.
    I pass the baton to you to work these into the final release (which is still here https://docs.google.com/document/d/1qEa6dnumnQBbXDDFYWwVGi7ILElfPSpi/edi…)
    Thank you!
    Natalie
    CLINICAL
    Pg. 20 – “Measures should be taken in order to organise the transferral of data and trial documents to a suitable and secure data repository to help ensure that the data are properly prepared, available in the longer term, stored securely and subject to rigorous governance.”
    Consider rephrasing to “… available in the longer term, stored securely (with respect to access control, confidentiality, and integrity), and …”.
    Pg. 21 – “Due to pressure to rapidly publish and make data available, there may be a greater risk of data not being properly de-identified (anonymised) prior to data sharing. For this reason, measures to protect and properly de-identify data is paramount (e.g. specific data use agreements).”
    Anonymisation and de-identification are clearly delineated later in the document (e.g., pg. 56, pg. 69-70) but are conflated somewhat here. Revision 5 reduces the degree to which this occurs compared with revision 4, but this is an example (among a few others) that remains. Consider rephrasing to align more clearly with similar references in the rest of the document and to ensure consistency/clarity throughout.
    _________________
    OMICS
    Pg. 31 – “There are no widely accepted standards for X-ray raw data files. Generally these are stored and archived in the vendor’s native formats.”
    Is this consistent with the imaging data statement on pg. 24, regarding the DICOM format and tags?
    _________________
    EPIDEMIOLOGY
    Pg. 37 – “Develop systems that support workflows to link and share data between different domains, while protecting privacy and security. Use domain specific, time stamped, encrypted person identifiers for this purpose.”
    Consider rephrasing to: “Use domain specific, time stamped, encrypted person identifiers for this purpose, based on industry-standard encryption and cryptographic constructions”.
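    For illustration, a minimal sketch of what such a domain-specific, time-stamped identifier could look like, using HMAC-SHA-256 (an industry-standard keyed cryptographic construction). All names and parameters here are hypothetical, and key management is deliberately omitted:

    ```python
    import hashlib
    import hmac

    def pseudonymous_id(person_id: str, domain: str, day: str, secret_key: bytes) -> str:
        """Derive a domain-specific, time-stamped pseudonym for a person.

        The pseudonym differs per domain and per day, so records cannot be
        linked across domains (or over time) without the secret key.
        """
        message = f"{domain}|{day}|{person_id}".encode()
        return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    ```

    The point of the suggested wording is that the protective value of such identifiers rests entirely on standard constructions like the above, rather than on ad hoc hashing or encoding.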
    Pg. 40 – “Data sharing is essential to improve epidemiological analysis, cross-border pandemic modelling, and coordinated policy development between countries. To ensure privacy, both pseudo-anonymisation of direct identifiers (e.g. patient specific ID’s) and anonymisation of indirect identifiers (e.g. socio-demographic information on individuals) must be applied. In addition, it is necessary to control statistical disclosure risk to prevent identification of individuals and their health status using a combination of indirect identifiers such as education level, sex, age, and clinical condition, among others (Duncan et al., 2011; Templ et al., 2015; Templ, 2017). Using synthetic data may be an option to lower re-identification risks while retaining properties of the original data sets.”
    I have some concerns about the clarity of this section, and how it contrasts with the content presented on pages 69 and 70. De-identification is not mentioned in this section, yet the risks presented are relevant to de-identification. Consider reviewing sources such as the following to improve the clarity of this section and align it with other content in the document:
    https://iapp.org/news/a/de-identification-vs-anonymization/
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6502465/
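    To make the disclosure-risk point concrete: the combination of indirect identifiers the document lists (education level, sex, age, clinical condition) forms a set of quasi-identifiers, and the standard check is the size of the smallest group of records sharing the same quasi-identifier values (k-anonymity). A minimal sketch, with illustrative names only:

    ```python
    from collections import Counter

    def k_anonymity(records, quasi_identifiers):
        """Return the smallest equivalence-class size over the given
        quasi-identifier columns.

        Records in classes of size below the chosen threshold k are at
        elevated re-identification risk and need further treatment
        (generalisation, suppression, or synthetic replacement).
        """
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return min(groups.values())
    ```

    This is of course only one facet of statistical disclosure control (the Templ references cover the field properly), but it shows why the section's risks belong under de-identification rather than under a generic "anonymisation" label.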
    _________________
    SOCIAL SCIENCES
    Pg. 45 – “Researchers with sensitive data or data with disclosure risk should seek a storage solution for their data which offers flexibility and protection, such as a solution offering remote access work (German Data Forum (RatSWD), 2020).”
    Consider rephrasing to “… which offers flexibility and security safeguards, such as a solution offering secure remote access”.
    Pg. 45 – “Sensitive data and human subject data containing personally identifiable information (PII) or protected health information (PHI) should be adequately protected and encrypted when at rest or in transit, and no matter where or how it is stored.”
    Consider rephrasing to “… should be adequately protected and encrypted, using industry-standard methods, when at rest or in transit …”.
    Pg. 45 – “Ensure that data should be backed up in multiple locations all under the same security conditions …”
    Consider rephrasing to “… data should be backed up in multiple authorized, managed locations, all under the same security conditions and agreements …”.
    Pg. 45 – “Using data from social media introduces additional issues. Individuals creating and sharing content may not regard this as a public space and have an expectation of a degree of privacy. Furthermore, social networks by definition reveal connections between many individuals; thus an individual post or tweet may provide information on many different data subjects without their knowledge or consent. In addition, researchers collecting data from the web should ensure they have sufficient rights to do so to safeguard their ability to use the data; many websites have terms and conditions that prohibit data collection, particularly via web scraping and other automated methods.”
    Surprised not to see any mention of metadata, the re-identification risk associated with data that has been extracted or derived from social media, or caution regarding the reliability of this data.
    Pg. 47 – “Deposit quality-controlled research data in a data repository, whenever possible in a trustworthy data repository committed to preservation.”
    Most similar wording in the document includes the word “secure” (e.g., pg. 8, pg. 20). Suggest including the term in such descriptions to underscore the necessary technical requirements.
    _________________
    COMMUNITY
    Pg. 49 – “Data processors/ data custodians/ data controllers: determine the purposes and methods of the processing of personal data, perform the data processing, including analysis, anonymisation, storing and preservation, sharing e.g. researchers, app developer, funders, policymakers, health authority”
    Another instance where de-identification is not mentioned, even though the document delineates it in subsequent sections.
    Pg. 49 – “What do we mean by app development for community-generated data? 1. Symptom tracking apps (health monitoring apps where users self-report COVID-19 symptoms). 2. Contact tracing apps (mobile phone tracking used to identify the potential geographic spread of COVID-19).”
    Consider use of the term “exposure notification” for such mobile apps, vs. “contact tracing”.
    Pg. 50 – “A balance must be achieved between timely testing and contact tracing, emergency response and community safety alongside individual privacy concerns such as surveillance, unauthorised use of personal data and forms of abuse that might result from the identification of subjects.”
    Consider rephrasing to: “… between timely testing and contact tracing, exposure notification, emergency response, and community safety alongside individual privacy concerns such as surveillance …”
    Pg. 51 – “Adequate medical, social and emotional support networks need to be established before apps relay to users they may have been in close proximity to a COVID-19 positive individual. Data governance comes with accountability and the need to work with the relevant local, national and international authorities to ensure appropriate support networks are in place and the app coordinates with these authorities in such matters.”
    Strongly support the intent of this section, but consider bolstering even further – such as requiring that all exposure notification app content/messaging be generated and tailored to the context of a user being newly alerted to contact and potential exposure.
    Pg. 51 – Nothing in this section, nor pg. 52, regarding log and audit requirements. Ontario is one jurisdiction that now has legislated, mandatory log and audit requirements for certain kinds of health data. The document could generally make more clear statements about the log and audit requirements that go into “repository” and “remote access” arrangements.
    Pg. 51 – “4. Make sensitive technical considerations such as transmitting anonymised codes as a means to alert individuals to exposure.”
    Again, strongly support the intent here, but this wording is not terribly clear. Consider rephrasing to “Use sensitivity to guide technical choices and considerations, such as a decision to only transmit anonymized codes or keys as a means to alert individuals within exposure notification apps.”
    Pg. 51 – “Contact tracing apps should adhere to the same development recommendations as other software, particularly to build public trust (see Research Software and Data Sharing).”
    Suggest that “adhere to or exceed the minimum requirements within development recommendations for other software” be considered for this section, given the significance, privacy and security concerns, and public trust implications. It also squares far better with the “utmost importance” and “key importance” language that immediately follows on pg. 52.
    Pg. 52 – “Ensure apps and participatory response coordination platforms are developed with the research, emergency response and health care questions are the central concept and only gather data needed to address these questions.”
    This wording is quite unclear. Suggest this be revisited.
    Pg. 52 – “4. Protecting personal data are of the utmost importance when developing applications. Use protocols and methods that aim to protect personal data e.g. Decentralised Privacy-Preserving Proximity Tracing (DP-3T).”
    Should the Apple/Google APIs be mentioned in this section?
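    For the benefit of readers unfamiliar with DP-3T, the core idea behind "transmitting anonymised codes" is that devices broadcast short-lived codes derived from a daily secret key, which are unlinkable without that key. A heavily simplified sketch (the actual DP-3T specification expands the key with AES in counter mode; the names and parameters here are illustrative only):

    ```python
    import hashlib
    import hmac

    def ephemeral_ids(daily_key: bytes, epochs_per_day: int = 96) -> list:
        """Derive short-lived broadcast codes from a daily secret key.

        Devices broadcast one 16-byte code per epoch (e.g. every 15
        minutes). Without the daily key, the codes cannot be linked to
        each other or to the device, which is the property that makes
        decentralised exposure notification privacy-preserving.
        """
        prf = hmac.new(daily_key, b"broadcast key", hashlib.sha256).digest()
        # Expand the PRF output into one truncated code per epoch.
        return [
            hashlib.sha256(prf + epoch.to_bytes(4, "big")).digest()[:16]
            for epoch in range(epochs_per_day)
        ]
    ```

    On the question above: the Apple/Google Exposure Notification APIs implement a closely related decentralised design, so mentioning them alongside DP-3T would seem natural.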
    _________________
    INDIGENOUS DATA
    Pg. 57 – “Indigenous data governance is also a prerequisite for determining appropriate future use of data. As contact tracing becomes a key tool in the fight against COVID-19 there has been a noticeable shift from paper based to electronic tracking, and to increasing centralisation. Mobile phone location tracking is also another tool being employed by nations and states to mitigate the spread of COVID19.”
    Suggest that “location tracking” be clarified as only some proposals and jurisdictions have pursued this method. Consider rephrasing to “Mobile phone proximity and/or location tracking”.
    _________________
    RESEARCH SOFTWARE
    Pg. 59 – “Policy makers should enact policies that encourage software to be available under an open source software licence, or at least require the software to be accessible.”
    Is “available” the intent here? Or actual accessibility? Suggest this be revisited for clarity.
    _________________
    LEGAL & ETHICAL
    Pg. 67 – “The obligation to limit the identifiability of personal data as far as possible – including via pseudonymisation techniques.”
    This is the first appearance of “pseudonymization” in the document after numerous generic references to anonymization. Suggest that these terms be reviewed and harmonized throughout the document, and certain concepts introduced sooner.
    Pg. 68 – “The obligation to use anonymised data instead of personal data, or minimise personal data use, or de-identify where possible.”
    This is the point in the document where it begins to delineate de-identification, despite numerous generic references to anonymisation prior. Suggest that these terms be reviewed and harmonized throughout the document. Even the order and manner in which these terms are presented may not accurately reflect their applicability, suitability, or protective value in a given context. This should be revisited generally throughout.
    Pg. 69 – “Researchers may thus need to take into account the possibility of future re-identification”
    The next page (pg. 70) clearly recommends a re-identification risk assessment. The need to assess, vs. merely “take [risk] into account”, should be highlighted in this sentence. Consider rephrasing to “may need to take into account the possibility of future re-identification, and manage this risk by means of a risk assessment”.
    Pg. 71 – “4. Carry out an impact assessment in regard to the impact on the data subject (the individual identified) before disclosure or publication, and introduce additional measures (Statistical Disclosure Control) to mitigate the risk.”
    This wording is confusing: it suggests that the re-identification risk assessment considers impact only for a single data subject, without any consideration of aggregate sensitivity or impact, in a section that otherwise concerns itself with datasets in their entirety. This section and its intent should be revisited and clarified.
    _______
    Read our statement on ‘Playing Our Part during COVID-19’
    _________________
    Dr. Natalie Harrower
    Director, Digital Repository of Ireland
    Royal Irish Academy
    ***@***.*** | @natalieharrower
    http://www.dri.ie | @dri_ireland
    RDA COVID-19 Working Group
    European Commission FAIR data expert group
    European Open Science Cloud (EOSC) FAIR working group
    The Academy is subject to the FOI Act 2014, the Data Protection Acts 1988-2003 and 2018, GDPR (EU 2016/679) and S.I. No. 336/2011, EC Privacy & Electronic Communications Regulations. For further information see our website http://www.ria.ie/privacy-and-data-protection


    Thanks, Natalie – that’s very useful feedback!
    d.
    On Tue, Jun 9, 2020 at 1:53 PM natalieharrower via RDA COVID19 Coordination wrote:
