Due to the size and complexity of data, algorithms are typically run on computational backends where the data resides. Furthermore, data is often sensitive and cannot be shared, and good-quality training data is costly to collect and treated as a valuable asset. Machine learning, on the other hand, needs large amounts of data to train models. All of these constraints work against openness and limit data sharing.
New paradigms are emerging, however, such as federated machine learning. It makes it possible to build machine learning models without granting unauthorised stakeholders full access to the data: models are trained locally and aggregated at a later stage, so only an abstract representation of the data is shared, never the raw data itself.
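The local-training-plus-aggregation idea can be sketched in a few lines. The following is a minimal, hypothetical simulation (not a reference to any specific framework): two data holders fit a linear model on their own data, and only the model weights, aggregated FedAvg-style by sample count, ever leave each site.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=20):
    # Local update: gradient descent on a linear model, run entirely
    # where the data resides; the raw X and y never leave the site.
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(local_weights, sizes):
    # Aggregation step: only model parameters are shared and combined,
    # weighted by each site's sample count (FedAvg-style averaging).
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(local_weights, sizes))

# Two simulated data holders with private data drawn from y = 2x.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(50, 1)), rng.normal(size=(80, 1))
y1, y2 = 2 * X1[:, 0], 2 * X2[:, 0]

global_w = np.zeros(1)
for _ in range(10):  # communication rounds
    w1 = local_train(global_w, X1, y1)
    w2 = local_train(global_w, X2, y2)
    global_w = federated_average([w1, w2], [len(y1), len(y2)])
```

After a few communication rounds the global weight approaches the true coefficient, even though neither party ever saw the other's data.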
The goal of this BoF is to identify common challenges in "visiting" data, i.e. bringing algorithms to where the data resides. We want to define a common, domain-agnostic framework for accessing protected data for machine learning. In the long term, this work can feed into the definition of EOSC participation criteria.
The proposer of this session has been active in the area of machine learning for about 15 years, also teaching the subject at the Vienna University of Technology, and has worked at an information security research organisation for the past 8 years, on a number of projects dealing with data management and privacy-preserving analysis.