New organization structure proposed for DFT candidate vocabulary

11 Jan 2014

I have uploaded a draft document to filedepot (see https://www.rd-alliance.org/filedepot/folder/100?fid=371) discussing a framework that might be helpful for organizing and understanding the candidate vocabulary list, which is being tested using the RDA Term Collection Wiki prototype currently under development.

 

I've also attached a copy of it here.

This is a possible revision of Peter Wittenburg's preliminary list of candidate terms, organized into essentially two groups: Organizational and Procedural.  I use more upper-level categories and take a stab at some foundational ideas, as well as creating a preliminary listing of items in the categories.  This is all tentative and meant for comment, as a way of moving towards an organization the group may agree on.

Gary Berg-Cross

File Attachment: 
  • Author: Thomas Zastrow

    Date: 13 Jan, 2014

    Hi Gary,
    Thank you very much for the document. From a technical perspective,
    there should be no obstacle to reorganizing our terms.
    If you want to change the category or categories of an existing term,
    just go to edit mode and search for the string:
    [Freelinking: unknown plugin indicator "Category"]
    and change it to whatever you would like. Changing an existing category
    is a little bit more tricky; please let me know if you try to do so.
    Best,
    Tom

  • Author: Diana Hendrickx

    Date: 05 Feb, 2014

    When interviewing people from my field (chemical safety) for the RDA Europe project, I discovered that the terms below are often not well understood, or are interpreted differently by different people. I think it would be useful to include a consensus definition of these terms in the RDA term wiki collection.

    List of terms:

    centralized system

    distributed system

    GRID

    Database Management System

    Relational system

    Non-relational system

    Proprietary file systems

    Data warehouse

    Hadoop

    Cloud

    MapReduce

    Ontology

    Taxonomy

    Data integration

    Data harmonization

    Data federation

    Data harvesting

    Data discovery

    Data preservation

    Workflow management

    Data replication

    Open data principle

  • Author: Peter Wittenburg

    Date: 05 Feb, 2014

    Thanks Diana.
    That list is impressive again. There are some terms which are close to what DFT discussed based on the models submitted.
    But there are also a number of terms such as Data warehouse, Hadoop, Cloud, MapReduce which are special and not in DFT's focus.
    Best
    Peter

  • Author: Herman Stehouwer

    Date: 05 Feb, 2014

    Dear Peter,
    I agree.
    However, given that some of the terms fall outside the scope of the terminology group, how do we clarify these for the community?
    They are obviously general enough that people should know about them.
    (though I don’t want to be the one defining “cloud”)
    Cheers,
    Herman

  • Author: Gary Berg-Cross

    Date: 05 Feb, 2014

    I like the list and will check the literature for some candidate definitions.
    Perhaps you might help by noting some elements of confusion or
    disagreement people had on things like distributed systems.
    I would think that a "data warehouse" could be part of our focus. Certainly
    it is part of big data reality.
    Gary

  • Author: Thomas Zastrow

    Date: 05 Feb, 2014

    I'm working on a new version of the wiki where it will be possible to
    set different scopes for any term. Anyway, I also wouldn't integrate
    terms like "Hadoop"; it's too specific, and maybe in 5 years nobody will
    talk about Hadoop anymore ;-)

  • Author: Diana Hendrickx

    Date: 05 Feb, 2014

    For example, a system in which multiple computers in the same institute communicate through a network is called a "centralized system" by some people (because all computers are at the same location), while others call it a "distributed system" (because information is spread over different computers).

    I think it would be helpful if, for future interviews, a link to clear definitions of terms could be added to questionnaires.

     

    Diana

  • Author: Peter Wittenburg

    Date: 05 Feb, 2014

    Gary,
    The term "data warehouse" typically comes from the data mining folks, and I assume that they have a history of discussing this term.
    It would be good if we had an IG/WG to make use of their knowledge and have them define such terms.
    I am afraid to put up definitions taken from various sources without having the core folks on board. We risk creating a domain of incoherent definitions, etc.
    Best
    Peter

  • Author: Gary Berg-Cross

    Date: 05 Feb, 2014

    Peter,
    I believe we can usefully distinguish data warehouses from operational
    data stores by data aggregation, and both from data marts.
    In my experience the use of queries is as prominent as data mining.

  • Author: Peter Wittenburg

    Date: 05 Feb, 2014

    Diana,
    It depends a bit on the circumstances whether you speak about a centralized or a distributed system. Technology continuously offers new choices. I am just sitting with High-Performance-Computation people where these two terms again have different flavors.
    Yes, we can try to define such terms, but it is not trivial to come to a definition which will be taken seriously. Let's also not forget that RDA is about data. The terms centralized and distributed have their history in computation.
    So let's focus on what we first find in our models - we need to be grounded.
    Peter

  • Author: Gary Berg-Cross

    Date: 07 Feb, 2014

    Diana,
    The explanations you provide make sense and would not be obvious to
    everyone, which explains why the definitions that will be useful to people
    are grounded in real experience.
    What I take out of the distinctions is that there are degrees or types of
    centralization: some are total, while others are location-based, with data
    distributed between these local resources. This is a distinction that can
    at least be broached.
    Gary Berg-Cross, Ph.D.
    ***@***.***
    http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross
    NSF INTEROP Project
    http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0955816
    SOCoP Executive Secretary
    Knowledge Strategies
    Potomac, MD
    240-426-0770
