Bridging the Data/HPC divide: lessons learned from the community talks

11 Jun 2023

Submitted by Maijastiina Arvola


Meeting objectives: 

 

 

Scientific advances that address global challenges require researchers to have seamless access to data and computing, increasingly including high performance computing (HPC). A certain disconnect has characterized the relationship between the HPC and data communities, and it needs to be addressed in order to fully support today's data- and compute-intensive science.

For example, members of the Artificial Intelligence/Machine Learning (AI/ML) community are starting to use HPC resources, as HPC systems include GPUs for acceleration. Reciprocally, AI/ML is an emerging method in science, which is the main consumer of HPC resources. Hybrid HPC/ML workflows combine ML tasks with traditional numerical simulations on HPC resources, for instance for data-intensive analysis, experiment optimization, and data mitigation when data values are missing. Running AI/ML on HPC resources presents new data challenges related to metadata and provenance, I/O, and reproducibility. Another topic that requires bridging the two communities is the growing development of digital twins in environmental sciences, which requires bringing together diverse data from various sources and adapting it to HPC workflows.

This BoF session will continue the discussion started in RDA P18 session “Combining Data community strengths and High Performance Computing opportunities”, followed by BoF sessions “Leveraging Data community strengths and High Performance Computing opportunities” and “Bridging the HPC/Data Divide” in IDW 2022, Supercomputing 2022 and International Supercomputing 2023 conferences. The purpose of these sessions has been to discuss topics that bridge the divide between the communities, as well as the challenges and solutions the communities share. Both communities have been asked for feedback on the divide, and the audiences have been actively engaged in the sessions.

The community feedback showed that both communities recognize the divide and the need to share best practices and success stories and to provide concrete solutions and data management techniques for HPC users. At the same time, it is evident that awareness of data management principles varies significantly within the HPC community.

This BoF will wrap up the discussions and feedback collected from the previous sessions, and elaborate further on efforts to bridge the gap and the lessons learned from that work. A few topics identified in the previous discussions will be considered more closely, such as solutions for sensitive data management and FAIR data in HPC environments. The expected outcome of this session is an increased understanding of the division between the communities, but also of their synergies, overlaps, and opportunities for closer collaboration. The need to establish an RDA working group or interest group will be discussed with a view to overcoming the gaps and building shared understanding, a common language, and common ways of working between the communities.

Meeting agenda: 
  1. Problem statement
  2. Outcomes and community feedback from the series of discussions
  3. Deeper dive into efforts to bridge the gap and lessons learned (e.g. solutions in sensitive data management, FAIR data in HPC environments)
  4. Recommendations for the way forward
Type of Meeting: 
Informative meeting
Short introduction describing any previous activities: 

This BoF session will continue the discussion started in RDA P19 session “Leveraging Data community strengths and High Performance Computing opportunities”, followed by BoF sessions “Bridging the HPC/Data Divide” in IDW 2022, Supercomputing 2022 and International Supercomputing 2023 conferences. In these sessions, feedback has been collected from both communities through interactive polls and open discussion. This BoF will wrap up the discussions and analyze the feedback received to provide recommendations for the way forward.

BoF chair serving as contact person: 
Meeting presenters: 
Maijastiina Arvola (CSC – IT Center for Science), Christine Kirkpatrick (SDSC), Mark Gray (Pawsey), Maciej Cytowski (Pawsey), Marek Michalewicz (NSCC), Line Pouchard (BNL)
Contact for group (email): 
If "Other," please specify: 
HPC
Driven by RDA Organisational Member: 
Driven by RDA Organisational Member
Applicable Pathways: 
FAIR, CARE, TRUST - Principles
FAIR, CARE, TRUST - Adoption, Implementation, and Deployment
Other
Estimate of the required room capacity: 
30-50
I Understand a Chair Must be Present at the Event to Hold the Breakout Session: 
Yes
Please indicate at least three (3) breakout slots that would suit your meeting: 
Breakout 2
Breakout 3
Breakout 4