Presenting my research – neuroscience & my interest in RDA
Since I am new to the RDA network, I would like to present my research area, which is neuroscience. I work as a postdoctoral researcher and senior teaching assistant at the Faculty of Medicine, Osijek, Croatia. My research interests are chronic stress, neurodegeneration, and sex-specific differences in neuronal and metabolic diseases.
I am starting my first small independent scientific project, which tries to connect chronic stress to neurodegeneration through a mechanism beyond genetics: epigenetic modification. This work requires big data analysis, an area in which I am not very experienced, but I am determined to improve my knowledge.
I am also involved in a study of the sex-specific regulation of diabetes, which shows different responses between the sexes and between two different interventions in an animal model of diabetes. One goal of the project is to develop a mathematical model for analyzing glucose and insulin tolerance tests, which are often used as clinical and preclinical tests for diagnosing diabetes. These mathematical models produce data that are suitable for open sharing within the medical and biomedical community, in order to develop better tests and more precise, personalized diagnostics for diabetic patients.
I would like to improve my knowledge of big data and open sharing, specifically in the field of biomedicine. I would also like to establish collaborations with experts in big data analysis and in the open sharing of data among scientists. I am interested in the following RDA working and interest groups. The Big Data IG is valuable for understanding how to select a big data solution for a specific scientific application and for following up-to-date technological solutions in big data analysis. The RDA/WDS Publishing Data Workflows WG is another group that interests me; it is important for researchers who develop large databases that reflect the worldwide situation in certain pathologies. Reproducibility is a big issue in modern science, so the Good Research Data Management (RDM) interest group offers useful tools for scientists, treating data management as a key component of research integrity, reproducible research, and the development of resources for sharing data.
Author: Christian Pagé
Date: 30 Mar, 2020
Thanks for this nice poster! It is very interesting from a scientific point of view.
For the data and RDA related aspects, I have a few questions:
1- What is the estimated data volume? Is all the data local too?
2- In your community, do you already have some standards for data formats and metadata schema?
3- Also, when processing data, do you record provenance and lineage information? If so, do you use W3C-PROV?
Author: Marta Balog
Date: 31 Mar, 2020
Thank you for your interest in my poster and for your questions. I am very inexperienced in big data analysis; I have just started a pilot project that is supposed to generate big data. I am therefore sorry that I cannot answer all your questions in detail.
Here are the answers:
1. The data will come from a study performed in an animal model (rats). There are 80 brain samples in total. Each sample will give information on many different genes, which we are choosing right now. Each gene can show multiple sites of mutation or epigenetic change. Since this is my first "big data" project, I am not sure about the data volume, but I expect it to be quite demanding. The study was divided into two parts, one performed locally and the other in Hungary.
2. We usually collect data in Excel spreadsheets. However, the sequencing data from this project will be provided in the output formats of the company that will do the sequencing, so I don't know yet. The other data on the animals, apart from the sequencing, were already collected in classical Excel spreadsheets or simple .txt files that can easily be imported into R or Python for further analysis.
3. Previously we had no such data, so we have not used W3C-PROV yet. For the project mentioned above, data provenance and lineage were recorded in Excel spreadsheets. We are planning a project that will provide data from patients in different countries, and for that purpose we will record provenance and lineage more rigorously and will probably use W3C-PROV.
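As an illustration of what recording provenance beyond a spreadsheet note could look like, here is a minimal Python sketch. It is not the project's actual workflow: the identifiers (`ex:raw-table`, `ex:cleaned-table`, `ex:clean-spreadsheet`, `ex:researcher-1`), the example file contents, and the timestamps are all hypothetical, and the dictionary is only loosely modelled on the entity/activity/agent layout of PROV-JSON. It has not been validated against the W3C PROV specifications.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest that pins down one exact version of a file."""
    return hashlib.sha256(data).hexdigest()


def make_provenance_record(raw: bytes, derived: bytes, activity: str,
                           agent: str, started: str, ended: str) -> dict:
    """Describe 'derived was produced from raw by activity, run by agent'.

    The key names follow the PROV-JSON style as an assumption; check them
    against the specification before relying on this layout.
    """
    return {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {
            "ex:raw-table": {"ex:sha256": sha256_of(raw)},
            "ex:cleaned-table": {"ex:sha256": sha256_of(derived)},
        },
        "activity": {
            activity: {"prov:startTime": started, "prov:endTime": ended},
        },
        "agent": {agent: {}},
        "used": {
            "_:u1": {"prov:activity": activity, "prov:entity": "ex:raw-table"},
        },
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": "ex:cleaned-table", "prov:activity": activity},
        },
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": activity, "prov:agent": agent},
        },
    }


# Hypothetical example: a raw export and the cleaned table derived from it.
raw_bytes = b"animal_id,sex,glucose\nR01,f,5.4\n"
cleaned_bytes = b"animal_id,sex,glucose_mmol_l\nR01,female,5.4\n"

record = make_provenance_record(
    raw_bytes,
    cleaned_bytes,
    activity="ex:clean-spreadsheet",
    agent="ex:researcher-1",
    started="2020-03-31T09:00:00Z",
    ended="2020-03-31T09:05:00Z",
)
print(json.dumps(record, indent=2))
```

Storing the file hashes alongside the activity means a later reader can check that the table they hold is the exact one the record describes.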
Author: Rob Hooft
Date: 31 Mar, 2020
When reading the poster I was already wondering whether you had any specific experience with data interoperability, and this question is strengthened by looking at your earlier answer about using EXCEL!
Excel spreadsheets, even if they have a header, are very bad at defining exactly what data is given. For example, a column may be labeled "Gender", but this (a) does not clarify whether the column is filled with m/f/unknown strings or with a 1/2 encoding, maybe even following an ISO standard, and (b) does not define what kind of gender is registered: chromosomal, self-reported, or characteristics at birth.
Especially when you need to join data from different tables, you really need to know these details. Are these issues you are encountering? And if so, how do you deal with them?
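The encoding problem described above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the two tables, the column names (`animal_id`, `sex`, `glucose`, `weight_g`), and the helper functions are hypothetical, and the numeric codes are assumed to follow ISO/IEC 5218 (1 = male, 2 = female). Real tables would need their own documented code list.

```python
# Mapping from the encodings seen in different tables to one shared vocabulary.
# Assumption: numeric codes follow ISO/IEC 5218 (1 = male, 2 = female).
SEX_CODES = {
    "m": "male", "male": "male", "1": "male",
    "f": "female", "female": "female", "2": "female",
}


def normalise_sex(value) -> str:
    """Map one raw cell value onto 'male', 'female' or 'unknown'."""
    key = str(value).strip().lower()
    return SEX_CODES.get(key, "unknown")


def join_on_animal_id(left_rows, right_rows):
    """Inner-join two lists of row dictionaries on 'animal_id'.

    Raises ValueError when the two tables disagree about an animal's sex
    after normalisation, instead of silently keeping one of the values.
    """
    right_by_id = {row["animal_id"]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_id.get(row["animal_id"])
        if match is None:
            continue  # no partner row, so this animal is left out of the join
        left_sex = normalise_sex(row["sex"])
        right_sex = normalise_sex(match["sex"])
        if left_sex != right_sex:
            raise ValueError(
                f"sex mismatch for {row['animal_id']}: {left_sex} vs {right_sex}"
            )
        joined.append({**match, **row, "sex": left_sex})
    return joined


# Hypothetical tables: one uses letters, the other numeric codes.
glucose_table = [
    {"animal_id": "R01", "sex": "M", "glucose": 5.4},
    {"animal_id": "R02", "sex": "f", "glucose": 6.1},
]
weight_table = [
    {"animal_id": "R01", "sex": 1, "weight_g": 310},
    {"animal_id": "R02", "sex": 2, "weight_g": 245},
]

joined = join_on_animal_id(glucose_table, weight_table)
for row in joined:
    print(row)
```

Note that this handles only point (a), the encoding. Point (b), what kind of sex or gender was registered, cannot be recovered by code and has to be written down in the metadata.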
Author: Marta Balog
Date: 02 Apr, 2020
Dear Rob, thank you for your questions and comments.
You are completely right. So far, I have had no experience with data interoperability, and since my data were not so demanding (coming from one experiment), Excel spreadsheets worked out just fine. However, I am aware that adjustments will be necessary when switching to larger data volumes. Our projects are evolving towards omics data, together with equipment that produces big data, so we have to adjust. That is why I am interested in RDA.