I felt very privileged to have attended the 8th Research Data Alliance (RDA) plenary in Denver, Colorado, as data is not my background. I am a final-year medical student at Newcastle University, and while I have completed research projects and audits, the scope of RDA was a completely new world. So let me start by saying, thank goodness there was a newcomers’ session!
How did I end up at the conference?
I am part of the National Student Association of Medical Research (NSAMR), a non-profit organisation that aims to foster research among medical students. To help achieve this aim we are setting up the United Kingdom’s first online open access journal that is authored, peer reviewed, and edited by medical students. While this may not initially seem important, there are many reasons why it is, and they are set out here (https://rd-alliance.org/blogs/rda-p7-ecp-implementing-rda-recommendations-student-journal.html). I attended the 8th RDA plenary to learn about RDA outputs that could be incorporated into our journal.
The aspect of RDA that really stood out to me was its openness and emphasis on forward thinking. It is an international organisation that creates solutions that work worldwide. Initially, I did not appreciate how important something like this could be. Data problems are rarely isolated, and while they may come in different guises, they are normally encountered by multiple organisations in many countries. By mandating that working and interest groups have contributors from different continents, RDA makes it possible to create a universal solution to a specific problem. This approach is more efficient because it stops multiple groups from developing solutions that are very similar or that only work for themselves.
The purpose of RDA
Before I arrived at the plenary I thought that RDA was the primary implementer of its recommendations. Instead, RDA develops recommendations and allows others to implement them of their own accord. RDA therefore needs platforms on which to test its outputs. This is where the NSAMR journal comes in. A newly formed journal does not have an archive of articles that it would need to alter if changes were implemented. Additionally, it is uncommon for medical students to have PubMed publications, so they do not need to alter the bibliographic data for existing publications. RDA recommendations can therefore be implemented in the journal, and altered by RDA, easily. The journal offers a low-risk environment for RDA to test its outputs and recommendations regarding journals, open science, and open access.
Among the sessions I attended, the two that really stood out were the Healthcare Data IG and the Data Citation WG. The Healthcare Data IG was on the second day of the conference and focused on how data can be used for personalised medicine, repository development, and drug development. After a day of terminology and acronyms that I didn’t understand, it was a relief to be back in my jargon comfort zone.
There are many ways in which healthcare data can improve the quality of care for patients. It can help inform diagnostic decision-making and image interpretation. Together these could be used to develop virtual assistants or clinics that assist clinicians in the management and triage of patients. However, healthcare data differs from many other types of data: it is often personal and may be damaging if it is revealed. Thus a large component of the IG’s work is addressing privacy and security concerns. Besides security there are other problems with healthcare data. It’s often incomplete, unstructured, and doubles in size every five years! Furthermore, with the introduction of wearable technologies there is going to be a tsunami of healthcare data that needs to be protected and interpreted.
How can we overcome these problems?
Collaboration is key, and for successful collaboration there needs to be the open sharing of data. One way in which this could be achieved is by using data repositories, which are large databases where individuals can enter and access similar data sets. The benefit of repositories is that the volume of data available to researchers is much larger and the data contained can be re-used and re-purposed to suit the researcher’s needs. However, for this to occur in healthcare there needs to be an improvement in security. Approximately 50% of companies have had issues with data security, and it is surprising there haven’t been more breaches. With such sensitive data this is a major obstacle that must be addressed before any major contributors will want to be involved.
While these problems are being addressed the focus should be on submission of healthcare-related research data to repositories. Historically, data has been distributed through subscription-only journals. While this is an adequate way of conveying results it can limit their overall impact. Instead, results could be published in an open access journal and the data submitted to a repository. Submission to open repositories would make the data available to other researchers, who could incorporate it into their own research, thus increasing the data’s utility. This is what the NSAMR journal aims to achieve, and it is particularly important given the types of research the journal will receive. The majority of medical students complete an audit, small research project or pilot study as part of their curriculum. Often these projects are repeated in different institutions but the results are never combined. This is wasteful and inefficient. There are two solutions to this problem: submission to repositories and development of a national database of medical student projects.
Submission to repositories is important for these types of projects because of their size. Alone, the projects are not sufficiently powered to support reliable conclusions. However, if a data set could be combined with similar data sets, a significant overall conclusion may be possible. NSAMR journal submission policies will include clauses stating that authors should submit data sets to appropriate repositories. However, repository data is only as useful as its metadata is complete. Therefore, the NSAMR journal must also ensure that authors submit the metadata pertaining to their data sets.
A national database would minimise wasteful repetition of completed projects and allow projects that were left incomplete because of size limitations to be repeated. NSAMR has also founded a national database of medical student projects, where principal investigators can post projects they wish to complete and students can sign up to them.
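To make the point about size concrete, here is a minimal sketch in Python of what combining small projects can buy. Everything in it is an assumption made for illustration: the three audits and their counts are invented, the function names are hypothetical, and naive pooling is only reasonable if each audit measured the same standard in the same way, which is exactly what complete metadata would allow a reader to check. It uses the Wilson score interval, a standard way of putting a 95% confidence interval around a proportion.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def interval_width(events, n):
    """Width of the Wilson interval: smaller means a more precise estimate."""
    low, high = wilson_interval(events, n)
    return high - low


# Hypothetical audits (invented numbers):
# (site, patients meeting the standard, patients audited)
audits = [("Site A", 14, 20), ("Site B", 17, 25), ("Site C", 11, 18)]

# Naive pooling: add up the counts as if they came from one larger audit.
pooled_events = sum(events for _, events, _ in audits)
pooled_n = sum(n for _, _, n in audits)

for site, events, n in audits + [("Pooled", pooled_events, pooled_n)]:
    low, high = wilson_interval(events, n)
    print(f"{site}: {events}/{n} = {events / n:.2f} "
          f"(95% CI {low:.2f} to {high:.2f})")
```

Run as written, the pooled interval comes out narrower than the interval for any single site, which is the sense in which small projects gain from being combined. A real analysis would also need to account for differences between sites rather than simply adding the counts together.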
Why do we need open science, open access and open data?
Access to knowledge is expensive, conditions of use can hamper development, and ownership of data is unclear. Open access and open data remove these limitations, and open science allows transparency. Open data in healthcare is the future and has tremendous power to transform clinical practice. Open healthcare data will be a great leap forward. However, there are many obstacles that must be overcome before that leap can be made. In the meantime, small steps need to be taken while legal and security issues are addressed. These small steps should begin by increasing the utility of data relating to healthcare, something that can be achieved through open access, repositories, and research project databases.