Big Data Infrastructure Working Group (BDI-WG)
Please review and comment on the attached case statement proposing a Big Data Infrastructure Working Group (BDI-WG) led by Wo Chang (NIST), Geoffrey Fox (Indiana U.), and Yuri Demchenko (U. of Amsterdam).
Document at https://www.rd-alliance.org/filedepot/folder/132?fid=397
Author: Talapady Bhat
Date: 31 Jan, 2014
Metadata and re-used terminology are a major focus area of RDA, as they play a key role in data integration and management.
For this reason I suggest the following changes, shown in red:
Big Data is the term used to describe the deluge of data in our networked, digitized, sensor-laden, information-driven world. There is broad agreement among commercial, academic, and government leaders about the remarkable potential of “Big Data” to spark innovation, fuel commerce, and drive progress. The availability of vast data resources carries the potential to answer questions that were previously unanswerable. However, there is also broad agreement on the ability of Big Data to overwhelm traditional approaches. The rate at which data volumes, speeds, and complexity are growing is outpacing scientific and technological advances in data analytics, management, transport, and more.
Despite the widespread agreement on the opportunities and current limitations of Big Data, the lack of a best practice implementation guide will hold back future Big Data application deployment advancements. How to curate or pre-process data at rest or in motion from a central location or distributed sites? How to decide whether to transfer large datasets or analytic tools between data storage and process site? How best to provision and configure computing cluster and resources? What security and privacy measures are needed? How to manage and monitor massive computing nodes from the traditional computing environment? What metadata terminologies are to be used to document data and provenance parameters?
Big Data best practice implementation guide will provide data scientists the best guideline on documenting and orchestrating Big Data applications across a diversified range of domains using a
Author: Keith Jeffery
Date: 31 Jan, 2014
Good to see this initiative - it is very important and game-changing. I agree with the comment from Talapady Bhat; metadata will be critical and so cross-links with MIG, MSDWG and DICIG will be important (but I would say that wouldn't I!)
The membership is very much US-dominated, so I suggest getting some people from Europe. I'll volunteer! My old team at Rutherford Lab (of which John Wood was CEO when I was IT Director) are good candidates - Juan Bicarregui for example. ERCIM (www.ercim.eu) has an initiative on big data - I suggest contacting the new President, Domenico Laforenza.
The NIST reference architecture is a great starting point but please do not ignore all the work done in Europe (then not labelled big data) on e-Science, GRIDs and CLOUDs. From a data perspective, recall the 4th Paradigm (it is a great pity we don't have Jim Gray anymore - he would have strong views). Tony Hey has a good overview of this.
I don't see any of my old colleagues from the VLDB (Very Large Data Bases) Endowment Board, all of whom have been working in what is now called big data for years and have a lot of experience. Mike Brodie (ex Verizon chief scientist) springs to mind, Mike Stonebraker of course, and from Europe people like Martin Kersten (MONET DB System and Data Distilleries company), Stefano Ceri, Paolo Atzeni, and others. Similarly the guys from CERN and ESA have real experience of big data.
Author: Wo Chang
Date: 27 Feb, 2014
Thanks to TN for the comment about RDA’s interest in data integration and management, which are crucial to RDA’s mission. This is exactly why we need BDI-WG to enable and provide Big Data computing infrastructure, so that results from analytics processing of massive amounts of data can be shared and accessed by others. BDI-WG is focused on how the System Orchestrator (or data scientists) interacts with other Big Data ecosystem components (data source, analytics processing (before, during, and after), computing cluster, and external access) without going into any specific business/application domains, information models & network architects, security & privacy, metadata & terminology, etc. The ultimate goal of the Big Data ecosystem is a vendor- and technology-agnostic ecosystem that can be re-used and re-purposed for Big Data analytics tools for years to come. Thanks for suggesting “Metadata architecture” as part of the System Orchestrator, but I would use “Metadata Specialists” since we need architects and specialists to enable orchestration. Thank you for your great input!
Thanks to Keith for the comment! Yes, with our limited access to the European Big Data community, our interaction is currently very limited within the US. Thanks for pointing out the great development of Big Data initiatives in Europe; we would very much like to engage with them and get their input. Yes, our intent is to use our community-driven (mostly US) NIST Reference Architecture, but by all means we would like to include input from other regions such as Europe, Asia, etc. This is exactly how the ISO/IEC JTC 1 Big Data Study Group was formed (http://jtc1bigdatasg.nist.gov), and our current strategy is to work with Big Data communities from the US, Europe, and Asia regions (http://jtc1bigdatasg.nist.gov/committees.php). With RDA’s broad interest in data and its tremendous data expertise ranging from Agricultural Data Interoperability and Big Data Analytics to Biodiversity Data Integration, etc., it is my belief that the BDI-WG can draw many synergies from these data groups. Yes, I would very much like to work with you offline to get connected with your Big Data colleagues and see how we can make this effort an international collaboration. Thank you for your great comment!