Data Type Registries (DTR) Recommendations (Endorsed)

  Data Type Registries Working Group

Recommendation Title: Data Type Model and Registry

Impact: Ensures that data producers classify their data sets using standard data types, allowing data users to automatically identify tools to process and visualise the data.

Recommendation package DOI: dx.doi.org/10.15497/A5BCD108-ECC4-41BE-91A7-20112FF77458

Responsible RDA Working Group Co-Chairs: 

Larry Lannom - Corporation for National Research Initiatives, Virginia USA

Daan Broeder - Max Planck Institute for Psycholinguistics, Netherlands

Summary

The RDA Data Type Registries (DTR) Working Group (WG) was approved at the first RDA Plenary (March 2013, Gothenburg, Sweden). The basic goal of this group was to aid data sharing efforts through improved data typing, specifically to make clear the details and assumptions buried in other people’s data. This was seen primarily as a problem of defining a data model appropriate to a wide potential collection of data types, prototyping that model in a registry, and developing a federation strategy across multiple registry instances, all following an analysis of use cases and related efforts. Larry Lannom of CNRI and Daan Broeder of MPI took on the co-chair tasks.

The WG attracted a great deal of interest, both at the conceptual level and in the details of the prototype, which the co-chairs took as confirmation of the relevance of the issue. The prototype was successfully deployed and a number of use cases were implemented, allowing us to gain experience with DTR issues and to discuss the community’s reactions and comments. However, the scope of the issues involved proved too broad for a single WG, especially with respect to community-specific typing needs, and therefore a follow-on WG, provisionally named Data Typing, will be proposed. The follow-on WG will primarily try to identify the data model that will allow people to specify and represent data types from selected communities. In the end, the outcomes of the DTR WG can be summarized as follows:

- Confirmation that detailed and precise data typing is a key consideration in data sharing and reuse, and that a federated registry system for such types is highly desirable and needs to accommodate each community’s own requirements.

- Deployment of a prototype registry implementing one potential data model, against which various use cases can be tested.

- Involvement of multiple ongoing scientific data management efforts, across a variety of domains, in actively planning for and testing the use of data types and associated registries.

- Integration with one additional RDA WG (Persistent Identifier Types) and at least one Interest Group (RDA/CODATA Materials Data, Infrastructure & Interoperability IG).

- Development of a set of questions that require further consideration before a detailed recommendation on data typing can be issued.

Finally, we believe that the DTR WG served as an excellent example of the benefits that RDA can and will bring to solving the problems of data sharing, by bringing together what would otherwise be disparate domain-specific groups to focus on common problems at the data level as opposed to the domain level. The remainder of this report will provide details on the outcomes summarized above. 
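The summary names three pieces: a data model for types, a registry that stores them, and a federation strategy across registry instances. The Python sketch below shows one way those pieces could fit together. It is illustrative only and rests on assumptions. The `DataType` fields, the `prefix/suffix` identifier form, the `reg-a`/`reg-b` prefixes, and the `TypeRegistry` and `FederatedResolver` classes are all hypothetical. None of them is the prototype's actual data model or API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DataType:
    """Hypothetical type record; field names are illustrative, not the DTR model."""
    identifier: str                           # assumed "prefix/suffix" form
    name: str
    description: str
    related_standards: Tuple[str, ...] = ()   # optional links to existing standards


class TypeRegistry:
    """One registry instance that owns every identifier under a single prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._types: Dict[str, DataType] = {}

    def register(self, data_type: DataType) -> None:
        # Refuse identifiers minted under another registry's prefix.
        if not data_type.identifier.startswith(self.prefix + "/"):
            raise ValueError(f"{data_type.identifier!r} is not under prefix {self.prefix!r}")
        self._types[data_type.identifier] = data_type

    def lookup(self, identifier: str) -> Optional[DataType]:
        return self._types.get(identifier)


class FederatedResolver:
    """Single entry point that routes each lookup to the registry owning its prefix."""

    def __init__(self, registries: Iterable[TypeRegistry]) -> None:
        self._by_prefix = {r.prefix: r for r in registries}

    def resolve(self, identifier: str) -> DataType:
        prefix = identifier.partition("/")[0]
        registry = self._by_prefix.get(prefix)
        if registry is None:
            raise KeyError(f"no federated registry for prefix {prefix!r}")
        data_type = registry.lookup(identifier)
        if data_type is None:
            raise KeyError(f"{identifier!r} is not registered under {prefix!r}")
        return data_type


# Usage with two made-up registry prefixes.
reg_a = TypeRegistry("reg-a")
reg_b = TypeRegistry("reg-b")
reg_a.register(DataType("reg-a/air-temp-c", "Air temperature",
                        "Air temperature in degrees Celsius"))
resolver = FederatedResolver([reg_a, reg_b])
print(resolver.resolve("reg-a/air-temp-c").name)  # Air temperature
```

Routing on the identifier prefix is only one possible federation strategy. A central index searched across all member registries would be another. Neither strategy is taken from the DTR prototype.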

The output is now available for public comment; please have a look.

  • Author: James Passmore

    Date: 02 Jun, 2015

    I couldn't see in the discussion paper anything about existing mechanisms to aid data sharing, particularly the publication of ISO 19139 metadata for data and services. How do these registries tie in with such metadata?

    Similarly, I'm concerned that you seem to be defining a new mechanism for data typing when others already exist, such as O&M, SWE. I'm probably missing something obvious. Did you look at such standards and reject them as not fit for purpose?

  • Author: Rufus Pollock

    Date: 03 Jun, 2015

    Consider connections with:

    - JSON Table Schema - http://dataprotocols.org/json-table-schema/

    - Data Package profiles - https://github.com/okfn/data.okfn.org/issues/184 and https://github.com/dataprotocols/dataprotocols/issues/183

    In particular, regarding the latter: can we share data type references between the two?

  • Author: Paul Millar

    Date: 03 Jun, 2015

    Storing objects along with arbitrary metadata (as a JSON object), fetching by object ID, and querying for objects matching metadata predicates are all supported by the CDMI standard, originally developed by SNIA and now an ISO standard:

    http://www.snia.org/cdmi

    It would be interesting to see whether the recommendations of this group can be reduced to a CDMI profile document.  Is the standard sufficient or are additional features required?

    SNIA has a community process where interested parties can propose profiles or extensions to CDMI: http://www.snia.org/tech_activities/publicreview/cdmi.  In this way, this WG can have a direct impact on the future direction of storage technology.

    Cheers,

    Paul.

  • Author: Michael Lutz

    Date: 17 Jun, 2015

    We are working on the topic of publishing registers in the domain of INSPIRE (Infrastructure for Spatial Information in Europe, http://inspire.ec.europa.eu/) as well.

    You can take a look at the INSPIRE registry (http://inspire.ec.europa.eu/registry) which currently includes 7 registers, ranging from simple lists of data themes (http://inspire.ec.europa.eu/theme) to more complex hierarchical registers for code lists and their values (http://inspire.ec.europa.eu/codelist/).

    We are currently discussing the creation of a register to represent the INSPIRE UML data models (including packages, classes and properties), which is similar to the model presented in your study.
     
    We are also working on a study to set up a federation of registers hosted in different organisations. The federation shall support local extensions of central registers, as well as search and retrieval across the registers in the federation through a central access point. Part of this work will be the definition of APIs to access (and modify) the content of the federated registers.

    If this work is of interest for your group, we can provide you with more detailed information.

  • Author: Larry Lannom

    Date: 26 Jul, 2015

    Thanks all for these comments. I want to apologize for the extremely
    delayed response, although I hope the April Outputs document gave at
    least the spirit of a response to some of the comments.
     
    In answer to the various questions and comments on how DTR fits with
    existing efforts, I want to emphasize that the Data Type Registry is
    not intended to be a stand-alone source of all type information. As
    stated in the Output document section on related efforts: "In
    some ways a group of federated type registries can be seen as
    multiplexing across existing efforts, providing a common API and
    system of unique identifiers for referential integrity of existing
    efforts and providing a low barrier of entry to adding to and
    enhancing existing work. The ‘Related Standards and Recommendations’
    component of the proposed data model would provide the connectivity."
     
    The incipient follow-on WG, which will be called Data Typing, will
    look into establishing those relationships, while still allowing
    freedom of action for easily establishing types that don't fit into
    existing services and efforts, for whatever reason. The new group will
    specifically look into the efforts mentioned in the above comments,
    such as the INSPIRE work. There will be a BOF at P6 to begin to map
    this out and I encourage those who are interested in this work to
    attend.
     
     
    Thanks for the interest and information,
     
    Larry

