Data Fabric IG

Group details

IG Established
 

The Data Fabric IG (DFIG) identified that working with data in many scientific labs, and most probably also in other areas such as industry and governance, is highly inefficient and too costly. Excellent scientists working on data-intensive science tasks are forced to spend about 75% of their time managing, finding, combining and curating data. What a waste of time and capacity. The DFIG is therefore looking at the data creation and consumption cycle to identify opportunities to optimize the work with data, to place current RDA activities in the overall landscape, to look at what other communities are doing in this area, and to foster testing and adoption of RDA outputs. Ultimately, the goal of DFIG is to identify common components and define their characteristics and services so that they can be used across boundaries and combined to solve a variety of data scenarios, such as replicating data in federations, developing virtual research environments, and automating regular data management tasks. Much important work is being done on data publishing and citation, but DFIG believes that we need to start early, in the "Data Fabrics" in the labs, to organize, document and manage data professionally if we want to meet the requirements of the coming decades.

  

DFIG is focusing on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on identifying ways to make this work more efficient and thus more cost-effective.

DFIG's goal is to identify common components and define their characteristics and services that can be used across boundaries in such a way that they can be combined to solve a variety of data scenarios.

Over time, DFIG has shepherded multiple spin-off groups into existence that deal with specific aspects of the cycle and the components involved, particularly Persistent Identifiers (PIDs) and their relevance and applicability to data referencing and management issues. These efforts have brought forth a new understanding, which is summarized in an overview document here.

The group is currently reassessing the overall landscape to identify the next challenges, components or other work areas of interest. An overview is contained in The Future Trends for the Data Fabric.


Recent Activity

16 May 2018

Final version of GDON document

Dear all,
we would also like to finally submit the Virtual Layer recommendations
summary to RDA as a Supporting Output. The final version is attached;
unless there are further comments, we will hand this in at the end of May.
Best, Tobias
--
Dr. Tobias Weigel
Abteilung Datenmanagement
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • 20146 Hamburg • Germany
Phone: +49 40 460094-431
Email: ***@***.***
URL: http://www.dkrz.de

15 May 2018

Seeking feedback on GEDE draft supporting output

Dear Data Fabric group members,
the GEDE group has worked on a larger report that is focused on the PID
topic area (see attached). This is coordinated by Peter Wittenburg, whom
many of you know, and he asks whether the report can be submitted as an
official DFIG Supporting Output. I believe that the report is within the
scope of topics DFIG has discussed in the past and that it is good that
you as the DFIG expert members have a final view on the report before it
is put out for endorsement as a Supporting Output by RDA. Some of you

16 Mar 2018

Data Fabric P11 session agenda

Dear all,
the RDA P11 Data Fabric IG session will take place during breakout 8,
Friday 11:00-12:30, Room A04. The planned session agenda is as follows:
1. Introduction (10 min.)
2. Future trends – individual topics under common object and collection
management (40 min.):
a. Object management and provenance in data analytics
b. ENVRI provenance concerns and their mapping to Data Fabric components
c. Metadata components and metadata fabrics
d. Activities by the Chinese Academy of Sciences
3. Open discussion (20 min.)

07 Mar 2018

Call details for today's meeting

Talk to you in about an hour. Details below.
Data Fabric VC
Wed, Mar 7, 2018 1:00 PM - 2:00 PM CET
Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/983339573
You can also dial in using your phone.
United States: +1 (646) 749-3129
Access Code: 983-339-573
More phone numbers
Canada: +1 (647) 497-9391
Finland: +358 923 17 0568
France: +33 157 329 484
Germany: +49 69 5880 7802 75

07 Mar 2018

Future Trends for the Data Fabric & Berlin P11

Dear all,
following our last call in December, I have received multiple detailed
ideas for the future activities of the Data Fabric IG for P11 and
beyond, which I have now compiled into the attached document. This is an
open collection and more contributions are most welcome; we don't need
elaborate abstracts at this point, but 1-2 paragraphs will help to find
out what the scope of specific activities will be.
We should also discuss this at our call later today and see how it fits

02 Mar 2018

paper on infrastructure evolution patterns

Dear colleagues of the DFT and Data Fabric Group,
I would like to inform you about a paper that George Strawn and I have written, based on many discussions in DFIG, DFT and other RDA groups. In it, we compare the evolution of different large infrastructures, extract some patterns and compare them with the state of the data domain.
http://doi.org/10.23728/b2share.4e8ac36c0dd343da81fd9e83e72805a0
If you would like to read it, we would be glad to receive your comments.

31 Jan 2018

Supporting output discussion call

Hi,
This is the work we did revising the definitions of the proposal. The document attached shows some suggestions for improvements and a glossary of terms which could help.
Should I mail it to the Group?
Regards
Abraham
________________________________
From: weigel=***@***.***-groups.org <***@***.***-groups.org> on behalf of TobiasWeigel <***@***.***>
Sent: Tuesday, January 30, 2018 1:38 PM
To: TobiasWeigel

19 Dec 2017

Data Fabric P11 session planning

Dear all,
I have transferred the Google document contents to the attached document and
in view of the discussion on how to limit presentation time, I've now
structured the possible contributions into three areas. We will then use
the pre-P11 call to figure out how to specifically populate each of the
three areas. The original list of individual contributions is still
included in the appendix, but this will not go into the formal
submission; instead, I'd like to base our collaborative discussion

19 Dec 2017

Scheduling pre-P11 call and supporting output discussion call

Dear all,
thank you for joining today's meeting.
As discussed during the call, we will have two more calls before P11:
1.) Supporting output discussion. The point should be to react to the
review comment with minimal new content to be included, and make sure
that the state as described in the document is understandable by
outsiders. We must be fully aware that this is an unfinished discussion.
The supporting output document and Mark P.'s comments are here:

12 Dec 2017

Data Fabric VC on Tuesday, Dec 19, 12:00 UTC

Dear all,
we will have a Data Fabric virtual meeting on Tuesday, Dec 19,
12:00-13:00 UTC. The focus of discussion will be on plans for a Data
Fabric session at the RDA P11 in Berlin and possible joint sessions with
other groups.
Best, Tobias
--
Dr. Tobias Weigel
Abteilung Datenmanagement
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • 20146 Hamburg • Germany
Phone: +49 40 460094-104
Email: ***@***.***
URL: http://www.dkrz.de