status: Recognised & Endorsed

Chair(s): Peter Baumann, Morris Riedel

Group Email: [group_email]

Secretariat Liaison:


RDA Big Data Interest Group Charter

Mission

The ultimate goal of the RDA Big Data Interest Group is to produce a set of recommendation documents that advise diverse research communities on two questions:

How can an appropriate Big Data solution that delivers optimal value be selected for a particular science application?

What are the best practices for dealing with the various data and computing issues associated with such a solution?

Objectives

To achieve our mission, we need to attain the following objectives while taking due account of related activities and results, such as those of the International Organization for Standardization (ISO), the Open Geospatial Consortium (OGC), and the US National Institute of Standards and Technology (NIST) Big Data Public Working Group (NBD-PWG), as well as other relevant organizations and undertakings.

  • Clarifying, and sometimes defining, terminologies related to Big Data.

    • Any Big Data solution for scientific research involves many disciplines, such as computing hardware/software infrastructure and architecture, data management/curation, and analyses and algorithms. Discussions will be more effective when there is no confusion in terminology.

    • The efforts of the RDA Terminologies WG, as well as other relevant efforts (e.g. from NBD-PWG), will be consulted, and feedback will be provided whenever necessary.

  • Characterizing leading Big Data technologies.

    • Investigations will be carried out 1) directly through spin-off Working Groups (WGs) and 2) in collaboration with other RDA groups, to characterize the technologies.

    • The characterization of a Big Data technology will include its strengths, weaknesses, and limitations. In other words, it will state what the technology is good for, in what sort of environment, and for what kinds of analyses/algorithms.

    • Example evaluation criteria include, but are not limited to:

      • Performance, resource utilization, and scalability,

      • Usability,

      • Flexibility and extensibility, and

      • Propensity to support scientific collaborations.

  • Collaborating with external entities through IG member involvement.

    • These external organizations and enterprises include, but are not limited to, ISO, OGC, NIST, EarthCube, EarthServer, and even individual research projects.

    • These interactions with external entities will ensure BDIG is up to date with new developments and enable us to leverage others’ efforts.

    • Examples of such interactions include establishing a connection with NIST NBD-PWG activities, participating in its discussion of a Big Data reference architecture, and exercising the common Big Data use cases collected by NBD-PWG and BDIG, as well as similar activities with OGC on Big Geo Data and with the ISO SQL WG on flexible retrieval from Big Science Data.

  • Producing a set of recommendation documents based on the results of the activities undertaken to attain the above objectives.

    • This set of documents will include:

      • A systematic classification of algorithms pertinent to the characterization of Big Data technologies,

      • Characterizations of Big Data technologies investigated, especially their value characteristics in each category of use cases,

      • Frequency of each class of algorithms and/or queries used by workflows in various use cases, delineated by science domains/subdomains, and

      • Feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries.

    • These recommendation documents are intended to serve as a best-practice guide for scientific groups/communities interested in investing in Big Data technologies.

Participation

The Big Data Interest Group (BDIG) is open to all RDA members. The following participants are especially relevant:

  • Domain scientists wishing to utilize Big Data solutions for their research and/or applications,

  • Data specialists with experience in data production, curation, analysis, and management, especially involving large volumes and varieties of data,

  • Computational scientists or software engineers with special interests in data analysis techniques and algorithm analysis, especially pertaining to Big-Data-relevant technologies and tools,

  • Experts, or aspiring experts, in various Big Data technologies and tools,

  • Computational infrastructure and architecture experts in fields such as distributed computing, high-performance computing, and database systems,

  • Data scientists with a blended interest involving some subset of the activities mentioned above, in particular the sharing, use, and reuse of open scientific data collections, and

  • Managers involved in any combination of the activities mentioned above.

Interaction Mechanism

BDIG will utilize capabilities provided by the RDA platform to communicate and collaborate effectively to achieve its goals. These include:

  • Monthly teleconferences/WebEx sessions with a planned agenda to discuss specific issues

  • Asynchronous collaboration using Google Docs, the wiki, and email list servers

  • Sessions at the semiannual RDA Plenary meetings for face-to-face interaction amongst members and to inform other RDA members of BDIG's ongoing activities

Outcome

BDIG will be considered a success if the Interest Group:

  • Develops recommendation documents accepted within and beyond RDA

  • Demonstrates visibility and uptake of its results within and beyond RDA (for example at OGC and ISO, and through discussion and outreach at domain-specific events such as IGARSS, EGU, and AGU)

All BDIG documents will be publicly available on the RDA server.

References

[1] NIST Big Data Use Cases: http://bigdatawg.nist.gov/usecases.php

[2] EarthServer Big Data Use Cases: http://www.earthserver.eu/Services

[3] OGC Big Data Domain Working Group: http://external.opengeospatial.org/twiki_public/BigDataDwg/

 


Posts

07 January 2021

Dark Data and FAIR

by Jack Casey

Dear RDA members,
21 October 2020

EUXDAT Infrastructure Webinars

by Karel Charvat

Dear friends, EUXDAT has developed an e-Infrastructure for the analysis of agriculture- and environment-related data. The whole solution is based on open source and open standards. Migrating the infrastructure from one cloud to another demonstrated that the solution is easily transferable and can be replicated. We are now running a series of webinars about the infrastructure and its pilot implementations. The infrastructure was introduced on Monday, and the video of that presentation of the EUXDAT e-Infrastructure is now publicly available.
31 July 2020

July RDA Newsletter Published

by Jamie Lupo-Petta

Dear RDA Members, The July issue of the RDA Newsletter has been published. It is full of information this month, including a reminder about the Call for Sessions deadline (which is next week - August 4) for Virtual Plenary 16, newly endorsed outputs, group updates and more. Check it out at https://bit.ly/33b0CuY. Have a good weekend! Regards, Jamie Petta, on behalf of the RDA Secretariat
07 January 2020

RDA Plenary 15 session acceptance: "Data Properties: Metadata as an Economic Good" Joint Session

by Stefanie Kethers

Dear Metadata Interest Group, Data Economics Interest Group, Big Data Interest Group, and Vocabulary Services Interest Group, Thank you for responding to the TAB comments on your session proposal "Data Properties: Metadata as an Economic Good" for RDA Plenary 15 in March 2020. Your session proposal has now been accepted. Congratulations! We look forward to seeing you in Melbourne. Best wishes, Stefanie Kethers (RDA Secretariat)
23 December 2019

RDA P15 Melbourne - Session Acceptance (Improve before Accept)

by Isabelle PERSEIL

Dear Metadata Interest Group, Data Economics Interest Group, Big Data Interest Group and Vocabulary Services Interest Group, At its meeting on December 19, the RDA Technical Advisory Board considered your proposal for a session at Plenary 15 in 2020. Unfortunately, TAB IS NOT ABLE TO ACCEPT YOUR SESSION PROPOSAL UNLESS YOU MAKE THE CHANGES THAT ARE LISTED BELOW. IF YOU STILL WISH TO PROCEED YOU WILL NEED TO: 1. Update the session description to address TAB's concerns BY OR BEFORE JAN 2ND, 2020
09 December 2019

P15 Session Proposal Received

by Jamie Lupo-Petta

Dear RDA Chair, Thank you for your session proposal for Plenary 15 titled “Data Properties: Metadata as an Economic Good”. 
06 December 2019

RDA Plenary 15 Announcements - Registration and Calls for Co-located Events and Posters Now Open

by Alexandra Delipalta

RDA is excited to announce several important pieces of information related to RDA Plenary 15: Data for Real-World Impact, which will be held from 18-20 March 2020 in Melbourne, Australia at the Melbourne Convention Exhibition Centre, MCEC (https://mcec.com.au/). This event will be hosted by CSIRO (Commonwealth Scientific and Industrial Research Organisation) with the support of the Australian Research Data Commons (ARDC).
28 November 2019

Deadline For P15 Group Session Submissions Extended To 5 December 2019

by Alexandra Delipalta

Dear RDA members,  RDA's 15th Plenary meeting, Melbourne, Australia, 18-20 March 2020  The deadline to submit session proposals for Plenary 15 in Melbourne, Australia has been extended to 5 December 2019, midnight  UTC. Group session application form Joint meeting application form
12 November 2019

Deadline for P15 Group Session Submission Fast Approaching

by Jamie Lupo-Petta

Dear Group Members,   The deadline for group session submissions for Plenary 15 is 28 November 2019, midnight UTC – just two weeks away.
