RDA Microsoft Roundtable: Data readiness and Data-centric AI skills | May 15th

As AI continues to transform research, data quality, accessibility, and workforce readiness are key factors in ensuring responsible and effective AI-driven innovation. However, gaps remain in data readiness, metadata standards, and AI-specific training. How can research institutions, industry leaders, and policymakers collaborate to establish AI-ready data and develop the necessary expertise?

The RDA-Microsoft Roundtables will bring together leading researchers, industry experts, and policymakers across the globe to explore: 

  • The State of Data Readiness for AI – identifying challenges in data quality, metadata standardisation, and interoperability. 
  • Best Practices in AI Data Management – frameworks for assessing data readiness and improving dataset alignment across disciplines. 
  • Workforce and Skills Development – equipping researchers and professionals with essential AI-related data curation and governance skills.  

Agenda

Featuring keynote talks, expert panels, and interactive breakout sessions, this event is your chance to hear directly from the experts on data-centric AI and data readiness for AI. The agenda will include:

  • Keynote speeches from industry experts
  • Audience Q&A and discussion
  • Breakout discussion rooms

Insights will feed into a joint RDA-Microsoft white paper, guiding the global research community forward.

Confirmed Speakers

Haritha Thilakarathne

Cloud Solution Architect (AI), Microsoft, Sydney, Australia

Dr. Haritha Thilakarathne is a Cloud Solution Architect in AI at Microsoft, specialising in scalable, secure, and intelligent cloud architectures on Azure, with experience in machine learning and deep learning. He has been named a Microsoft Most Valuable Professional (MVP) in AI eight times. Haritha holds a Ph.D. in Deep Learning and actively contributes to the technical community through conferences, meetups, and his blog.

Talk: ‘Accelerating Research Workflows with Data-Centric Small Language Models’

Abstract: This talk explores the emerging role of Small Language Models (SLMs), such as Microsoft’s open model Phi-4, in advancing data-centric AI. When paired with high-quality, well-curated data, SLMs can be fine-tuned or adapted to specific domains, enabling diverse applications including data curation, preprocessing, summarization, and information retrieval. Their lightweight, open-source design and ability to operate locally make them particularly effective in sensitive or resource-constrained research environments. This presentation highlights how these characteristics position SLMs as powerful, accessible tools for domain-specific innovation, aligning closely with the broader shift toward data-centric approaches in AI development. 

Kyong-Ha Lee

Director of the Hyper-scale AI Research Center at the Korea Institute of Science and Technology Information, Daejeon, South Korea

Dr. Kyong-Ha Lee received his bachelor’s, master’s, and doctoral degrees from Chungnam National University in South Korea. He then served as a postdoctoral researcher at the University of Arizona in the United States and as a senior researcher at ETRI. He currently serves as the Director of the Hyper-scale AI Research Center at KISTI. He is also a lifetime member of the Korean Institute of Information Scientists and Engineers (KIISE) and the Korea Information Processing Society (KIPS), and has served as a board director for both organizations.

In addition, he is a member of the AI Committee of the Korean Federation of Science and Technology Societies (KOFST) and serves on the Subcommittee for National Defense Technology Strategy under the Ministry of Science and ICT of Korea.

Talk: ‘Generative AI and AI for Science’

Abstract: Since the release of ChatGPT, large language models (LLMs) have rapidly transformed society—and their impact is now reaching scientific research as well.  

In this keynote, I will explore how generative AI technologies, exemplified by LLMs, are reshaping the landscape of scientific discovery.  

I will also discuss what we, as a research and data community, must be prepared for in terms of data readiness, responsible AI practices, and capacity building to harness these innovations effectively and ethically.

Mukkesh Kumar

Head of Data Management Platform, A*STAR, Singapore

Dr Mukkesh Kumar is the Head of Data Management Platform at A*STAR, leading the Multi-modal Data Management, Data Curation & Data Stewardship, Healthcare Data Analytics & Reporting, and Healthcare Software Development teams. A PhD alumnus of the NUS Saw Swee Hock School of Public Health, he is interested in developing generative AI and agentic solutions for biomedical informatics. He has forged collaborations with the National University Hospital (NUH) in Value-based Healthcare Strategy, including the Early Screening for Gestational Diabetes Mellitus in a Low Risk Population (EaGeR) pilot study, conducted at NUH for the real-world deployment of an early-pregnancy GDM predictor AI model. Working in close partnership with Singapore’s Ministry of Health (MOH), he is shaping national data standardization and standardized data analytics strategies. He has also been mentoring Data Managers at Boston Children’s Hospital/Harvard Medical School in the US for multi-centre clinical research studies, building talent and capabilities in the global research ecosystem.

Talk: ‘AI-Driven Innovation Strategies for Data Curation and Management in International Collaborative Research’

Abstract: This talk explores key challenges faced in advancing data readiness for AI, both within individual research efforts and across international scientific communities. It introduces the Observational Health Data Sciences and Informatics (OHDSI) initiative and the OMOP Common Data Model, with a focus on ongoing implementations and collaborations in Singapore. A practical case study on data curation will be presented to illustrate innovative methodologies and the integration of generative AI tools in enhancing data quality and utility. The session aims to provide insights into scalable, collaborative approaches for improving AI-readiness in complex healthcare data ecosystems. 
