Removing barriers in data visiting for privacy preserving machine learning


05 Dec 2019


Submitted by Rudolf Mayer


Meeting objectives: 

  • engage with stakeholders involved in privacy preserving machine learning who work with data in various domains, such as medical, earth observation, cyber-physical systems, etc. 
  • present current solutions and discuss their limitations and open issues
  • identify gaps with respect to accessing data for machine learning purposes, e.g. visiting data vs sharing data, federated vs centralised learning, FAIR vs open data 
  • identify common challenges in bringing algorithms to train the models without compromising ownership of visited data 
  • receive feedback on all of the above
  • discuss and identify topics for a potential new IG/WG that would address common problems faced by all domains working with sensitive data in ML
Meeting agenda: 

Part 1 – Meeting objectives and getting to know participants

Part 2 – Overview of current work in the area of privacy preserving machine learning

  • Lightning talks
  • Discussion

Part 3 – Identifying common challenges

  • work in groups
  • reporting and discussion

Part 4 – Wrap up and planning next joint activities

Type of Meeting: 
Working meeting
Short introduction describing any previous activities: 

Due to the size and complexity of the data, algorithms are typically run on the computational backends where the data resides. Furthermore, the data is often sensitive and cannot be shared, and collecting good-quality training data is costly, which makes it a valuable asset. Machine learning, on the other hand, needs plenty of data to train models. Together, these constraints hinder openness and limit data sharing.

However, new paradigms are emerging, such as federated machine learning. Federated learning makes it possible to build machine learning models without granting unauthorised stakeholders full access to the data: models are trained locally and aggregated at a later stage, so only an abstract representation of the data (the model parameters) is shared, never the raw data itself.
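The train-locally-then-aggregate idea can be illustrated with a minimal sketch of federated averaging. Everything here is an illustrative assumption, not part of any specific framework: two hypothetical parties each fit a simple linear model to their own records, and only the learned weights (weighted by dataset size) are combined into a global model; the raw records never leave their owner.

```python
# Minimal federated-averaging sketch (illustrative; names and the
# linear model are assumptions, not a reference implementation).

def local_train(weights, X, y, lr=0.1, epochs=50):
    """Train a linear model y ~ w.x locally via stochastic gradient descent."""
    w = list(weights)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = sum(wj * xj for wj, xj in zip(w, xi))
            err = pred - yi
            # Gradient step on the squared error for this record.
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

def federated_average(local_models, sizes):
    """Aggregate local models, weighting each by its party's data size."""
    total = sum(sizes)
    dim = len(local_models[0])
    return [
        sum(m[j] * n for m, n in zip(local_models, sizes)) / total
        for j in range(dim)
    ]

# Two parties hold disjoint records generated by the same rule
# y = 2*x1 + 1*x2; each trains on its own data only.
party_a = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 1.0, 3.0])
party_b = ([[2.0, 1.0], [1.0, 2.0]], [5.0, 4.0])

start = [0.0, 0.0]
model_a = local_train(start, *party_a)
model_b = local_train(start, *party_b)

# Only the model parameters cross organisational boundaries.
global_model = federated_average([model_a, model_b],
                                 [len(party_a[0]), len(party_b[0])])
print([round(w, 2) for w in global_model])
```

In practice this aggregation step runs over many rounds and many parties, and the shared parameters may themselves leak information, which is one of the open issues this session aims to discuss.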

 

The goal of this BoF is to identify common challenges in visiting data. We want to define a common, domain-agnostic framework for accessing protected data for machine learning purposes. In the long term, this could lead to the definition of EOSC participation criteria.

 

The proposer of this session has been active in the area of machine learning for around 15 years, also teaching the subject at the Vienna University of Technology, and has spent the past 8 years at an information security research organisation, working on a number of projects dealing with data management and privacy-preserving analysis.

Remote participation availability (only for physical Plenaries): 
Yes