Data Fabric IG


Group details

IG Established

The Data Fabric IG (DFIG) identified that working with data in many scientific labs, and most probably also in other areas such as industry and governance, is highly inefficient and too costly. Excellent scientists working on data-intensive science tasks are forced to spend about 75% of their time managing, finding, combining and curating data. What a waste of time and capacity. The DFIG is therefore looking at the data creation and consumption cycle to identify opportunities to optimize the work with data, to place current RDA activities in the overall landscape, to look at what other communities are doing in this area, and to foster testing and adoption of RDA outputs. Ultimately, the goal of DFIG is to identify common components and define their characteristics and services so that they can be used across boundaries and combined to solve a variety of data scenarios, such as replicating data in federations, developing virtual research environments, and automating regular data management tasks. Much important work is being done on data publishing and citation, but DFIG believes that we need to start early in the "Data Fabrics" in the labs to organize, document and manage data professionally if we want to meet the requirements of the coming decades.


DFIG is focusing on the data creation and consumption cycle as it happens daily in scientific and industrial labs, and on identifying ways to make this work more efficient and thus more cost-effective.

DFIG's goal is to identify common components and define their characteristics and services that can be used across boundaries in such a way that they can be combined to solve a variety of data scenarios.

Throughout its existence, DFIG has shepherded multiple spin-off groups into existence, each dealing with specific aspects of the cycle and the components involved, particularly Persistent Identifiers (PIDs) and their relevance and applicability to data referencing and management issues. These efforts have brought forth a new understanding, which is summarized in an overview document.

The group is currently reassessing the overall landscape to identify the next challenges, components or other work areas of interest. An overview is contained in The Future Trends for the Data Fabric.

Recent Activity

16 Mar 2018

Data Fabric P11 session agenda

Dear all,
the RDA P11 Data Fabric IG session will take place during breakout 8,
Friday 11:00-12:30, Room A04. The planned session agenda is as follows:
1. Introduction (10 min.)
2. Future trends – individual topics under common object and collection
management (40 min.):
a. Object management and provenance in data analytics
b. ENVRI provenance concerns and their mapping to Data Fabric components
c. Metadata components and metadata fabrics
d. Activities by the Chinese Academy of Sciences
3. Open discussion (20 min.)

07 Mar 2018

Call details for today's meeting

Talk to you in about an hour. Details below.
Data Fabric VC
Wed, Mar 7, 2018 1:00 PM - 2:00 PM CET
Please join my meeting from your computer, tablet or smartphone.
You can also dial in using your phone.
United States: +1 (646) 749-3129
Access Code: 983-339-573
More phone numbers
Canada: +1 (647) 497-9391
Finland: +358 923 17 0568
France: +33 157 329 484
Germany: +49 69 5880 7802 75

07 Mar 2018

Future Trends for the Data Fabric & Berlin P11

Dear all,
following our last call in December, I have received multiple detailed
ideas for the future activities of the Data Fabric IG for P11 and
beyond, which I have now compiled into the attached document. This is an
open collection and more contributions are most welcome; we don't need
elaborate abstracts at this point, but 1-2 paragraphs will help to find
out what the scope of specific activities will be.
We should also discuss this at our call later today and see how it fits

02 Mar 2018

paper on infrastructure evolution patterns

Dear colleagues of the DFT and Data Fabric Group,
I would like to inform you about a paper that George Strawn and I have written, based on many discussions in DFIG, DFT and other RDA groups. In it, we compare the evolution of different large infrastructures, extract some patterns and compare them with the state of the data domain.
If you would like to read it, we would be glad to receive your comments.

31 Jan 2018

Supporting output discussion call

This is the work we did revising the definitions in the proposal. The attached document shows some suggestions for improvements and a glossary of terms that could help.
Should I mail it to the Group?

19 Dec 2017

Data Fabric P11 session planning

Dear all,
I have transferred the Google document contents to the attached document and
in view of the discussion on how to limit presentation time, I've now
structured the possible contributions into three areas. We will then use
the pre-P11 call to figure out how to specifically populate each of the
three areas. The original list of individual contributions is still
included in the appendix, but this will not go into the formal
submission; instead, I'd like to base our collaborative discussion

19 Dec 2017

Scheduling pre-P11 call and supporting output discussion call

Dear all,
thank you for joining today's meeting.
As discussed during the call, we will have two more calls before P11:
1.) Supporting output discussion. The point should be to react to the
review comment with minimal new content to be included, and make sure
that the state as described in the document is understandable by
outsiders. We must be fully aware that this is an unfinished discussion.
The supporting output document and Mark P.'s comments are here:

12 Dec 2017

Data Fabric VC on Tuesday, Dec 19, 12:00 UTC

Dear all,
we will have a Data Fabric virtual meeting on Tuesday, Dec 19,
12:00-13:00 UTC. The focus of discussion will be on plans for a Data
Fabric session at the RDA P11 in Berlin and possible joint sessions with
other groups.
Best, Tobias
Dr. Tobias Weigel
Abteilung Datenmanagement
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • 20146 Hamburg • Germany
Phone: +49 40 460094-104
Email: ***@***.***

07 Dec 2017

Re: A potential use case for the DFIG

Hello Abraham, Barbara,
thank you for sharing this perspective from ENVRIplus (slides attached
again). I agree that the provenance metadata concerns you illustrate
have relevance to the Data Fabric concepts. We have been discussing
provenance as a strong driver use case particularly in view of PID
record usage, which led to the creation of the PID Kernel Information
WG. The main approach here is that the most basic provenance relations
can be formulated by maintaining essential (kernel) information within

05 Dec 2017

Data Fabric P11 planning VC

Dear all,
we would like to schedule a brief call before the RDA P11 session
submission deadline. Jianhui's and my schedules did not leave us a lot of
options, but if you would like to participate, please state your
availability in the following Doodle by the end of this week:
Best, Tobias