Big Data IG

Group details

Chair(s): Peter Baumann, Ben Evans, Kwo-Sen Kuo, Morris Riedel
Case Statement: 
 

RDA Big Data Interest Group Charter

Mission

The ultimate goal of the RDA Big Data Interest Group is to produce a set of recommendation documents that advise diverse research communities on two questions:

  • How can an appropriate Big Data solution be selected for a particular science application so that it delivers optimal value?

  • What are the best practices for dealing with the data and computing issues associated with such a solution?

Objectives

In order to achieve our mission, we need to attain the following objectives while duly taking into account related activities and results, such as those of the International Organization for Standardization (ISO), the Open Geospatial Consortium (OGC), and the US National Institute of Standards and Technology (NIST) Big Data Public Working Group (NBD-PWG), as well as other relevant organizations and undertakings.

  • Clarifying, and sometimes defining, terminologies related to Big Data.

    • Any Big Data solution for scientific research involves many disciplines, such as computing hardware/software infrastructure and architecture, data management/curation, and analyses and algorithms. Discussions will be more effective when there is no confusion in terminology.

    • The efforts of the RDA Terminologies WG, as well as other relevant efforts (e.g. from NBD-PWG), will be consulted and feedback will be provided whenever necessary.

  • Characterizing leading Big Data technologies.

    • Investigations will be carried out 1) directly through spin-off Working Groups (WGs) and 2) in collaboration with other RDA groups, to characterize the technologies.

    • The characterization of a Big Data technology will include its strengths, weaknesses, and limitations: in other words, what it is good for, in what sort of environment, and for what kind of analyses/algorithms.

    • Example evaluation criteria include, but are not limited to:

      • Performance, resource utilization, and scalability,

      • Usability,

      • Flexibility and extensibility, and

      • Propensity to support scientific collaborations.

  • Collaborating with external entities through IG member involvement.

    • These external organizations and enterprises include, but are not limited to, ISO, OGC, NIST, EarthCube, EarthServer, and even individual research projects.

    • These interactions with external entities will ensure BDIG is up to date with new developments and enable us to leverage others’ efforts.

    • Examples of such interactions include establishing a connection with NIST NBD-PWG activities, participating in its discussion of a Big Data reference architecture, and exercising the common Big Data use cases collected by NBD-PWG and BDIG, as well as similar activities with OGC on Big Geo Data and with the ISO SQL WG on flexible retrieval from Big Science Data.

  • Producing a set of recommendation documents based on the results of the activities carried out under the objectives above.

    • This set of documents will include:

      • A systematic classification of algorithms pertinent to the characterization of Big Data technologies,

      • Characterizations of Big Data technologies investigated, especially their value characteristics in each category of use cases,

      • Frequency of each class of algorithms and/or queries used by workflows in various use cases, delineated by science domains/subdomains, and

      • Feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries.

    • These recommendation documents are intended to serve as a best-practice guide for scientific groups/communities interested in investing in Big Data technologies.

Participation

The Big Data Interest Group (BDIG) is open to all RDA members. The following participants are especially relevant:

  • Domain scientists wishing to utilize Big Data solutions for their research and/or applications,

  • Data specialists with experience in data production, curation, analysis, and management, especially involving large volumes and varieties of data,

  • Computational scientists or software engineers with special interests in data analysis techniques and algorithm analysis, especially pertaining to Big-Data-relevant technologies and tools,

  • Experts, or aspiring experts, of various Big Data technologies and tools,

  • Computational infrastructure and architecture experts in fields such as distributed computing, high-performance computing, and database systems,

  • Data scientists with a blended interest involving some subset of the activities mentioned above, in particular the sharing, use, and reuse of open scientific data collections, and

  • Managers involved in any combination of the activities mentioned above.

Interaction Mechanism

BDIG will utilize capabilities provided by the RDA platform to communicate and collaborate effectively to achieve its goals. These include:

  • Monthly teleconferences/WebEx sessions with a planned agenda to discuss specific issues,

  • Asynchronous collaboration using Google Docs, a wiki, and email list servers, and

  • Semiannual RDA Plenary meetings, with sessions for face-to-face interaction among members and for informing other RDA members of ongoing activities.

Outcome

BDIG will be considered a success if the Interest Group:

  • Develops recommendation documents that are accepted within and beyond RDA, and

  • Demonstrates visibility and uptake of its results within and beyond RDA (for example at OGC and ISO, and through discussion and outreach at domain-specific events such as IGARSS, EGU, and AGU).

All BDIG documents will be publicly available on the RDA server.

References

[1] NIST Big Data Use Cases: http://bigdatawg.nist.gov/usecases.php

[2] EarthServer Big Data Use Cases: http://www.earthserver.eu/Services

[3] OGC Big Data Domain Working Group: http://external.opengeospatial.org/twiki_public/BigDataDwg/

Recent Activity

18 Sep 2017

“R programming language to manage metadata, data complying with OGC standards and controlled vocabularies: the case of Tuna Fisheries”

Dear IG's / WG's,

We would like to invite you to an RDA-related event that we believe to be relevant to your activities: “R programming language to manage metadata and data by complying with OGC (EML, CF conventions) standards and controlled vocabularies: the case of Tuna Fisheries”.

When: Tuesday 19 September 2017, from 11:30 to 13:00 local time

Where: room: Mansfield 10, RDA Plenary Meeting, Montreal, Canada

04 Apr 2017

AW: [rda-bigdata-ig][rda-edu-ig] Re: [rda-bigdata-ig][rda-edu-ig] Fwd: FW: Join The Atlantic on 3/30 to talk higher ed and tomorrow's workforce

Dear Yuri,
Interesting, thanks.
I can't be in RDA as planned due to an ad-hoc
proposal defense in Berlin I need to attend.
But the links are surely interesting!
Take care,
Morris
-----Original Message-----
From: y.demchenko=***@***.***-groups.org [mailto:***@***.***-groups.org] On Behalf Of ***@***.***
Sent: Monday, April 03, 2017 6:58 PM

03 Apr 2017

Re: [rda-bigdata-ig][rda-edu-ig] Fwd: FW: Join The Atlantic on 3/30 to talk higher ed and tomorrow's workforce

FYI:
The recording of very interesting discussions during this event is
available at the same URL
>> https://www.theatlantic.com/live/events/crunching-the-numbers/2017/

01 Apr 2017

Re: Are we requesting dial in access for the Big Data IG and the Array Database WG Breakout sessions at Barcelona?

Dear all,
for those who cannot participate physically, here is a dial-in opportunity for
joining us. Please proceed as described below.
cu,
Peter
--
Dr. Peter Baumann
- Professor of Computer Science, Jacobs University Bremen
www.faculty.jacobs-university.de/pbaumann
mail: ***@***.***-university.de
tel: +49-421-200-3178, fax: +49-421-200-493178
- Executive Director, rasdaman GmbH Bremen (HRB 26793)

30 Mar 2017

Fwd: FW: Join The Atlantic on 3/30 to talk higher ed and tomorrow's workforce

Dear RDA IG/WG Members:
I want to share with you this invitation to the event by The Atlantic.
The event will start at 14:00 CET but will be recorded.
Shortly after this the PwC and BHEF report "Investing in America’s data
science and analytics talent" will be openly available.
EDISON project cooperated with both organisations in making their
recommendations useful for our community.
Please use this opportunity and we can discuss some interesting issues
next week at RDA9.
Best Regards,
Yuri Demchenko

30 Mar 2017

Re: Agenda for Joint IG Session Thursday April 6, 14:00-15:30

FYI pls see below
best,
Peter
--
Dr. Peter Baumann
- Professor of Computer Science, Jacobs University Bremen
www.faculty.jacobs-university.de/pbaumann
mail: ***@***.***-university.de
tel: +49-421-200-3178, fax: +49-421-200-493178
- Executive Director, rasdaman GmbH Bremen (HRB 26793)
www.rasdaman.com, mail: ***@***.***

23 Mar 2017

Big Data IG Members' Survey (4th and Last Mailing)

I've invited you to fill out the following form:
RDA IG Big Data Survey
To fill it out, visit:
https://docs.google.com/forms/d/e/1FAIpQLSelrcvb2U3SyR0wYIlgD7E5Z_CCG_lH...
This is the 4th and final mailing requesting participation in the survey.
The survey will close on Friday, 31 March 2017. We have 33 responses now.
Please participate as soon as possible.

16 Mar 2017

Big Data IG Members' Survey (3rd Mailing)

I've invited you to fill out the following form:
RDA IG Big Data Survey
To fill it out, visit:
https://docs.google.com/forms/d/e/1FAIpQLSelrcvb2U3SyR0wYIlgD7E5Z_CCG_lH...
Dear Member of RDA Big Data Interest Group,
We have 29 responses now, 10 more than last week! Please participate.
In order to better serve the members of this interest group in promoting

09 Mar 2017

Big Data IG Members' Survey (2nd Mailing)

I've invited you to fill out the following form:
RDA IG Big Data Survey
To fill it out, visit:
https://docs.google.com/forms/d/e/1FAIpQLSelrcvb2U3SyR0wYIlgD7E5Z_CCG_lH...
Dear Member of RDA Big Data Interest Group,
(There are 100+ members in our Interest Group. We have received 19
responses since the 1st mailing but would like to have more, much more.)

02 Mar 2017

Join Peter Baumann for the Datacubes as an Enabling "Big Data" Paradigm webinar, 6 March, 14:00 UTC

Dear all,
We kindly invite you to take part in the Datacubes as an Enabling "Big
Data" Paradigm webinar,
presented on March 6 at 14:00 UTC by Peter Baumann, co-chair of the RDA
Array Database Assessment and the Big Data groups.
/This webinar will present the concept of datacubes and services
around them, based on practical large-scale science applications and
several "Big Datacube" standards initiated and edited by the
presenter, so as to proliferate knowledge and stimulate discussion./