I have uploaded a draft document to filedepot (see https://www.rd-alliance.org/filedepot/folder/100?fid=371) discussing a framework that might be helpful for organizing and understanding the candidate vocabulary list, which is being tested using the RDA Term Collection Wiki prototype currently under development.
I've also attached a copy of it here.
This is a possible revision of Peter Wittenburg's preliminary list of candidate terms, which organized them into essentially two groups: Organizational and Procedural. I use more upper-level categories, take a stab at some foundational ideas, and create a preliminary listing of items in the categories. This is all tentative and meant for comments, as a way of moving towards an organization the group may agree on.
Gary Berg-Cross
Author: Thomas Zastrow
Date: 13 Jan, 2014
Hi Gary,
Thank you very much for the document. From the technical perspective,
there should be no obstacle to re-organize our terms.
If you want to change the category/ies for an existing term, just go to
edit mode and search for the string:
[[Category:...]]
and change it to whatever you would like. Changing an existing category
itself is a little trickier; please let me know if you try to do so.
Best,
Tom
Author: Diana Hendrickx
Date: 05 Feb, 2014
When interviewing people from my field (chemical safety) for the RDA Europe project, I discovered that the terms below are often poorly understood or interpreted differently by different people. I think it would be useful to include a consensus definition of these terms in the RDA term wiki collection.
List of terms:
centralized system
distributed system
GRID
Database Management System
Relational system
Non-relational system
Proprietary file systems
Data warehouse
Hadoop
Cloud
MapReduce
Ontology
Taxonomy
Data integration
Data harmonization
Data federation
Data harvesting
Data discovery
Data preservation
Workflow management
Data replication
Open data principle
Author: Peter Wittenburg
Date: 05 Feb, 2014
Thanks Diana.
Again, an impressive list. Some terms are close to what DFT discussed based on the submitted models.
But there are also a number of terms, such as Data warehouse, Hadoop, Cloud, and MapReduce, which are specialized and outside DFT's focus.
Best
Peter
--
Full post: https://rd-alliance.org/new-organization-structure-proposed-dft-candidat...
Manage my subscriptions: https://rd-alliance.org/mailinglist
Stop emails for this post: https://rd-alliance.org/mailinglist/unsubscribe/1174
Author: Herman Stehouwer
Date: 05 Feb, 2014
Dear Peter,
I agree.
However, given that some of the terms fall outside the scope of the terminology group, how do we clarify these for the community?
They are obviously general enough that people should know about them.
(though I don’t want to be the one defining “cloud”)
Cheers,
Herman
Author: Gary Berg-Cross
Date: 05 Feb, 2014
I like the list and will check the literature for some candidate defs.
Perhaps you might help by noting some of the confusion or
disagreements people had about things like distributed systems.
I would think that a "data warehouse" could be part of our focus. Certainly
it is part of big data reality.
Gary
Author: Thomas Zastrow
Date: 05 Feb, 2014
I'm working on a new version of the wiki where it will be possible to
set different scopes for any term. Anyway, I also wouldn't integrate
terms like "Hadoop"; it's too specific, and maybe in 5 years nobody will
talk about Hadoop anymore ;-)
Author: Diana Hendrickx
Date: 05 Feb, 2014
For example, a system where multiple computers in the same institute communicate through a network is called a "centralized system" by some people (because all computers are in the same location), while others call it a "distributed system" (because information is spread over different computers).
I think it would be helpful if, for future interviews, a link to clear definitions of terms could be added to questionnaires.
Diana
Author: Peter Wittenburg
Date: 05 Feb, 2014
Gary,
The term "data warehouse" typically comes from the data mining folks and I assume that they have a history of discussing this term.
It would be good to have an IG/WG that could draw on their knowledge and define such terms.
I am wary of putting up definitions taken from various sources without having the core folks on board. We risk creating a domain of incoherent definitions, etc.
Best
Peter
Author: Gary Berg-Cross
Date: 05 Feb, 2014
Peter
I believe we can usefully distinguish data warehouses from operational
data stores by data aggregation, and both from data marts.
In my experience, the use of queries is as prominent as data mining.
Author: Peter Wittenburg
Date: 05 Feb, 2014
Diana,
It depends a bit on the circumstances whether you speak of a centralized or a distributed system. Technology continuously offers new choices. I am just sitting with High-Performance Computing people, for whom these two terms again have different flavors.
Yes, we can try to define such terms, but it is not trivial to arrive at a definition that will be taken seriously. Let's also not forget that RDA is about data. The terms centralized and distributed have their history in computation.
So let's focus on what we first find in our models - we need to be grounded.
Peter
Author: Gary Berg-Cross
Date: 07 Feb, 2014
Diana,
The explanations you provide make sense and would not be obvious to
everyone, which is why definitions that will be useful to people need to
be grounded in real experience.
What I take from these distinctions is that there are degrees or types of
centralization: some are total, while others are location-based, with data
distributed among these local resources. This is a distinction that can
at least be broached.
Gary Berg-Cross, Ph.D.
***@***.***
http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross
NSF INTEROP Project
http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0955816
SOCoP Executive Secretary
Knowledge Strategies
Potomac, MD
240-426-0770