Re: [rda-datafabric-ig][rda-collection-wg] Some thoughts on "Data Aggregations" terminology & concepts

12 Apr 2016

Hi Gary, all,

I agree with Thomas: this is turning into a more and more
philosophical debate - I like this, and perhaps we should continue it
over a beer in Denver. But to shorten the decision process here, let me
assume that an undisputed goal is to set up the foundations for building
automated processes on collections, and try to bring it down to a simple
question:

Do we want to be able to prove the correctness of processes on
collections or not? If this is the case, we need a mathematically solid
definition of the object we are working on. I'm not saying that we have
to prove correctness for all processes, btw.; that's not common practice
in computer science anyway.

Alternatively, we can say that we give up the possibility of correctness
proofs and use artificial intelligence. In this case we can just use
language and, if really wanted, ontologies.

The obvious resulting question in this case is how and why AI processes
would need a concept of collection. I suppose these processes would not
reflect on collections but just use the links inside collections in an
unstructured, recursive way, just as a crawler would. A concept of
collections becomes unnecessary for such processes; they just work.

But understanding how they work brings us back to the foundations of
automated processes on collections and to correctness proofs of our
understanding. That's why I think we should rely on sound definitions.

To Juha and Keith (1.):
We are still talking about whether we use PIDs or IDs or both inside
collections. The major point is that we want to formalize the
references as the major structural element of collections.

The phrase "But saying that a collection is a PID is a bit like saying
that a book is an ISBN." is great and shows what is irritating here,
even if it makes sense from a mathematical viewpoint.

My suggestion would be to change the definition to: A collection is
+++referenced by+++ a PID pointing to a digital object consisting of a
set/list of PIDs/Ids and a set of additional pointers/links and metadata
together with each PID/Id.

Juha's phrase then becomes: "But saying that a collection is referenced
by a PID is a bit like saying that a book is referenced by an ISBN."
which sounds reasonable to me.
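
To make the revised definition a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not an agreed data model: the names Member, CollectionObject and resolver, as well as the example PIDs, are hypothetical. The point it shows is that the PID only references the digital object, and the digital object holds the list of PIDs/Ids together with their links and metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A PID/Id inside a collection, with the links and metadata attached to it."""
    ref: str                                         # PID or local Id of the member
    links: list[str] = field(default_factory=list)   # additional pointers/links
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class CollectionObject:
    """The digital object a collection PID points to: a list of members."""
    members: list[Member] = field(default_factory=list)

    def refs(self) -> list[str]:
        """Return the PIDs/Ids of all members, in order."""
        return [m.ref for m in self.members]


# Hypothetical resolver: maps a PID to the digital object it references.
resolver: dict[str, CollectionObject] = {}

resolver["pid:coll-1"] = CollectionObject(
    members=[
        Member("pid:doc-a", metadata={"role": "primary"}),
        Member("id:local-7", links=["pid:doc-a"]),
    ]
)

# The collection is referenced by its PID; it is not the PID itself.
collection = resolver["pid:coll-1"]
print(collection.refs())
```

Note that an empty CollectionObject is a valid value in this sketch, so a collection with zero items can still be referenced by a PID.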

To Keith (2.):
it should be possible to express relationships between any collections
(or any DO) whether hierarchic (‘belongs to’/‘is part of’) or in a fully
connected graph where it may be that one collection is a proper subset
of another (or superset of >1 other collections) or that collection A
was derived from collection B by process X or that collection C was
derived from collection D with process U and from collection E with
process W - and all with appropriate date/time stamping so that
provenance is recorded (and all associated descriptive / contextual /
actionable metadata).

I understand this as a statement about possible counterexamples, but
actually it is a great 'collection' of test cases in which one can see
the possibilities of the given definition:

The fully connected graph is the resulting description of collections
seen as vertices, with (directed) edges given by the PIDs/Ids inside each
of the collections. The fully connected graph is therefore a collection
given by the process 'give me all vertices and edges connected to one of
its vertices'. Proper subsets of collections are within the scope of the
definition as well. And that collection A could be derived from
collection B by process X is something I said before anyway. Date/time
stamping and provenance for collections are on the roadmap of the
collections WG too. So from my point of view, at least, this all fits
quite well.
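
The process 'give me all vertices and edges connected to one of its vertices' can be sketched as a plain graph traversal. The following Python sketch rests on assumptions of mine: collections are given as a mapping from a collection PID to the PIDs/Ids it contains, "connected" is read as "reachable by following the directed references", and the function name and example PIDs are hypothetical.

```python
from collections import deque


def reachable_subgraph(collections, start):
    """Return (vertices, edges) reachable from `start` by following references."""
    vertices = {start}
    edges = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # A PID without an entry is treated as a plain digital object (a leaf).
        for target in collections.get(current, []):
            edges.add((current, target))
            if target not in vertices:
                vertices.add(target)
                queue.append(target)
    return vertices, edges


# Hypothetical example: keys are collection PIDs, values the PIDs/Ids they contain.
collections = {
    "pid:A": ["pid:B", "pid:x"],
    "pid:B": ["pid:y", "pid:A"],   # cycles are allowed
    "pid:C": ["pid:z"],            # not reachable from pid:A
}

vertices, edges = reachable_subgraph(collections, "pid:A")
print(sorted(vertices))
print(sorted(edges))
```

The result is again just a set of PIDs/Ids plus the links between them, so under the definition above it could itself be stored as a collection. Because visited vertices are tracked, the traversal terminates even when collections reference each other in a cycle.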
On 12.04.2016 at 00:12, Gary Berg-Cross wrote:
> Ulrich
> In response to your reductive assumption in:
>
> >To Gary: of course a collection is something different to an ordinary
> PID also in my reductionist approach. It is a PID, that points to a
> very special kind of DO. My assumption is, that this is sufficient for
> all underlying "substance". But this of course still has to be proven.
> But perhaps the examples I mentioned already give a feeling of the
> possibilities, that such a definition can have.
>
> PID doesn't seem to be the substrate even if it can be formalized
> neatly and recursed. Behind the idea of a PID is that of identity, but even
> this doesn't seem like a basis for building up a Collection concept.
> Data collections pre-existed digital data, and thus PIDs, as a practical
> example.
>
>
> I am more in the camp of ontologists like John Sowa who see
> ontological concepts as the material which logical operators are used
> to express concepts.
>
> "Pure logic is ontologically neutral. It makes no presuppositions
> about what exists or may exist in any domain or any language for
> talking about the domain. To represent knowledge about a specific
> domain, it must be supplemented with an ontology that defines the
> categories of things in that domain and the terms that people use to
> talk about them. The ontology defines the words of a natural language,
> the predicates of predicate calculus, the concept and relation types
> of conceptual graphs, the classes of an object-oriented language, or
> the tables and fields of a relational database." (from "Ontology,
> Metadata, and Semiotics", John F. Sowa)
>
> So as a basis of Collection, if you want to find an atom for the
> molecule of Collection it might be the idea of "and" or "partOf" which
> produces aggregations & wholes. But there are just so many ways of
> building larger structures from smaller ones, and this is a sub-part of
> ontology called mereology.
>
> So to me we can't start with mathematical and logical terms and expect
> to build a world unless we use concepts with terms from that world.
>
> Again to quote Sowa on this language effort:
> "No ontology, formal or informal, is independent of the vocabulary
> and the methodologies (i.e., language games) used to analyze the data.
> Natural language terms have been the starting point for every ontology
> from Aristotle to the present. Even the most abstract ontologies of
> mathematics and science are analyzed, debated, explained, and taught
> in natural languages. For computer applications, the users who enter
> data and choose options on menus, think in the words of the NL
> vocabulary. Any options that cannot be explained in words the users
> understand are open invitations to mistakes, confusions, and system
> vulnerabilities. Therefore, every ontology that has any practical
> application must have a mapping, direct or indirect, to and from
> natural languages." (from John Sowa's "The Role of Logic and Ontology
> in Language and Reasoning")
>
> Gary Berg-Cross, Ph.D.
> ***@***.***
> http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross
> Member, Ontolog Board of Trustees
> Independent Consultant
> Potomac, MD
> 240-426-0770
>
On 12.04.2016 at 09:33, jehakala wrote:
> Hello,
>
> The Dublin Core community discussed the definition of collection a lot
> when we were drafting the DC Collections application profile, available at
> http://dublincore.org/groups/collections/collection-application-profile/.
> After trying several other alternatives we finally decided to use simply
> “collection is an aggregation of items” since adding more detail would
> have limited the applicability of the definition. The definition allows
> even collections with zero items (one of the things which also caused
> problems). An item in turn is a physical or digital resource, and these
> resources may be complex, like research data sets.
>
> Like Keith, I do not think it is a good idea to use PID in the
> definition of a collection. There are a lot of collections out there
> which do not and may never have PIDs or any other kind of identifiers.
> An identifier such as ISCI (International Standard Collection Identifier,
> ISO 27730, http://www.iso.org/iso/catalogue_detail.htm?csnumber=44293) is
> one of the key metadata elements describing a collection, and I’m fine
> with, for instance, making it mandatory in RDA. But saying that a
> collection is a PID is a bit like saying that a book is an ISBN. RDA can
> of course use whatever collection definition it wishes, but other
> communities may not follow the example, or understand fully what is
> going on. On the other hand, using or refining the Dublin Core
> definition of collection (or something else that is already out there)
Hi Gary, all,
I agree with Thomas: this now tends to become a more and more
philosophical debate - I like this, and we should continue this perhaps
with a beer in Denver. But to shorten the decisions process here let me
assume that an undoubted goal is to setup the foundations to build
automated processes on collections and try to bring it down to a simple
question:
Do we want to be able to prove the correctness of processes on
collections or not. If this is case, we need a mathematical solid
definition of the object we are working on. I'm not saying, that we have
to prove correctness for all processes, btw., that's not common practice
in computer science anyway.
Alternatively we also can say, we omit the possibility of correctness
proves and use artificial intelligence. In this case we can just use
language and, if really wanted, ontologies.
The obvious resulting question in this case is, how and why AI processes
would need a concept of collection. I suppose, these processes would not
reflect on collections but just use the links inside collections in an
unstructured, recursive way, just as a crawler would work. A concept of
collections becomes unnecessary for such processes, they just work.
But to understand, how they work brings us back to the foundations of
automated processes on collections and the correctness proves of our
understanding. That's why I think we should rely on sound definitions.
To Juha and Keith (1.):
we are still talking about whether we use PID or ID or both inside
collections. The mayor point is, that we want to formalize the
references as the mayor structural element of collections.
The phrase "But saying that a collection is a PID is a bit like saying
that a book is an ISBN." is great and shows, what is irritating here,
even if it makes sense from a mathematical viewpoint.
May suggestion would be to change the definition to: A collection is
+++referenced by+++ a PID pointing to a digital object consisting of a
set/list of PIDs/Ids and a set of additional pointers/links and metadata
together with each PID/Id.
Juha's phrase then becomes: "But saying that a collection is referenced
by a PID is a bit like saying that a book is referenced by an ISBN."
which sounds reasonable for me.
To Keith (2.):
it should be possible to express relationships between any collections
(or any DO) whether hierarchic (‘belongs to’/is part of’) or in a fully
connected graph where it may be that one collection is a proper subset
of another (or superset of >1 other collections) or that collection A
was derived from Collection B by process X or that collection C was
derived from collection D with process U and from collection E with
process W - and all with appropriate date/time stamping so that
provenance is recorded (and all associated descriptive / contextual /
actionable metadata).
I understand this as a statement about possible counterexamples, but
actually it is a great 'collection' of test cases, where one can see the
possiblities of the given definition:
The fully connected graph is a resulting description of collections seen
as vertices with (directed) edges given by the PIDs/Ids inside each of
the collections. The fully connected graph therefore is a collection
given by the process 'give me all vertices and edges connected to one of
its vertices'. Proper subsets of collections are in the scope of the
definition as well. And that collection A could be derived from
Collection B by process X, was something I said before anyway. Date/time
stamping and provenance for collections is on the roadmap of the
collections WG too. So from my point of view at least this all fits
quite well.
Am 12.04.2016 um 00:12 schrieb Gary Berg-Cross:
> Ulrich
> In response to your reductive assumption in:
>
> >To Gary: of course a collection is something different to an ordinary
> PID also in my reductionist approach. It is a PID, that points to a
> very special kind of DO. My assumption is, that this is sufficient for
> all underlying "substance". But this of course still has to be proven.
> But perhaps the examples I mentioned already give a feeling of the
> possibilities, that such a definition can have.
>
> PID doesn't seem to be the substrate even if it can be formalized
> nearly and recursed. Behind a PID idea is that of Identity, but even
> this doesn't seem like a basis for build up a Collection concept.
> Data collections pre-existed digital data and thus PID as a practical
> example.
>
>
> I am more in the camp of ontologists like John Sowa who see
> ontological concepts as the material which logical operators are used
> to express concepts.
>
> "Pure logic is ontologically neutral. It makes no presuppositions
> about what exists or may exist in any domain or any language for
> talking about the domain. To represent knowledge about a specific
> domain, it must be supplemented with an ontology that defines the
> categories of things in that domain and the terms that people use to
> talk about them. The ontology defines the words of a natural language,
> the predicates of predicate calculus, the concept and relation types
> of conceptual graphs, the classes of an object-oriented language, or
> the tables and fields of a relational database." from *"**Ontology,
> Metadata, and Semiotics"* *John F. Sowa*
>
> So as a basis of Collection, if you want to find an atom for the
> molecule of Collection, it might be the idea of "and" or "partOf", which
> produces aggregations & wholes. But there are just so many ways of
> building larger structures from smaller ones, and this is the sub-part
> of ontology called mereology.
>
> So to me we can't start with mathematical and logical terms and expect
> to build a world unless we use concepts with terms from that world.
>
> Again to quote Sowa on this language effort:
> "No ontology, formal or informal, is independent of the vocabulary
> and the methodologies (i.e., language games) used to analyze the data.
> Natural language terms have been the starting point for every ontology
> from Aristotle to the present. Even the most abstract ontologies of
> mathematics and science are analyzed, debated, explained, and taught
> in natural languages. For computer applications, the users who enter
> data and choose options on menus, think in the words of the NL
> vocabulary. Any options that cannot be explained in words the users
> understand are open invitations to mistakes, confusions, and system
> vulnerabilities. Therefore, every ontology that has any practical
> application must have a mapping, direct or indirect, to and from
> natural languages." (from John Sowa's "The Role of Logic and Ontology
> In Language and Reasoning.")
>
> Gary Berg-Cross, Ph.D.
> ***@***.***
> _http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross_
> Member, Ontolog Board of Trustees
> Independent Consultant
> Potomac, MD
> 240-426-0770
>
On 12.04.2016 at 09:33, jehakala wrote:
>
> Hello,
>
>
>
> The Dublin Core community discussed the definition of collection a lot
> when we were drafting the DC Collections application profile, available at
> http://dublincore.org/groups/collections/collection-application-profile/. After
> trying several other alternatives we finally decided to use simply
> “collection is an aggregation of items”, since adding more detail would
> have limited the applicability of the definition. The definition allows
> even collections with zero items (one of the things which also caused
> problems). An item in turn is a physical or digital resource, and these
> resources may be complex, like research data sets.
>
>
>
> Like Keith, I do not think it is a good idea to use PID in the
> definition of a collection. There are a lot of collections out there
> which do not and may never have PIDs or any other kind of identifiers.
> An identifier such as ISCI (International Standard Collection Identifier,
> ISO 27730 http://www.iso.org/iso/catalogue_detail.htm?csnumber=44293) is
> one of the key metadata elements describing a collection, and I’m fine
> with, for instance, making it mandatory in RDA. But saying that a
> collection is a PID is a bit like saying that a book is an ISBN. RDA can
> of course use whatever collection definition it wishes, but other
> communities may not follow the example, or understand fully what is
> going on. On the other hand, using or refining the Dublin Core
> definition of collection (or something else that is already out there)
> would make the RDA approach easier to grasp.
>
>
>
> One of the things I like in the DC Collections application profile is its
> data model, which was inherited from an earlier research project carried
> out in the UK. RDA is of course free to develop its own data model, but
> IMO it would do no harm to take a look at what the Dublin Core community has
> done. The DC data model does not explicitly present sub- and
> super-collections, but they have been taken into account at the metadata
> level, just like associated collections and associated publications,
> which are both relevant for research data collections.
>
>
>
> The International Standard Collection Identifier, by the way, is a
> semantic identifier which is based on the standard identifier of the
> agent which owns the collection. For instance, any ISCI owned by the
> National Library of Finland would start with FI-NL, which is the
> library’s ISIL standard identifier. Deciding what kind of (standard)
> identifiers collections should have can be non-trivial.
>
>
>
> Best,
>
>
>
> Juha
>
>
>
>
>
> From: keith.jeffery=***@***.***-groups.org
[mailto:***@***.***-groups.org] On
Behalf Of ***@***.***
> Sent: 11 April 2016 20:23
> To: uschwar1 <***@***.***>; Jeremy York
<***@***.***>; TobiasWeigel <***@***.***>; Data Fabric
IG <***@***.***-groups.org>; Research Data Collections WG
Hi Gary, all,
I agree with Thomas: this is turning into a more and more
philosophical debate. I like this, and perhaps we should continue it
over a beer in Denver. But to shorten the decision process here, let me
assume that an undisputed goal is to set up the foundations for building
automated processes on collections, and try to bring it down to a simple
question:
Do we want to be able to prove the correctness of processes on
collections or not? If this is the case, we need a mathematically sound
definition of the object we are working on. I'm not saying that we have
to prove correctness for all processes, by the way; that's not common
practice in computer science anyway.
Alternatively, we can say that we give up the possibility of correctness
proofs and use artificial intelligence. In this case we can just use
language and, if really wanted, ontologies.
The obvious resulting question in this case is how and why AI processes
would need a concept of collection. I suppose these processes would not
reflect on collections but just use the links inside collections in an
unstructured, recursive way, just as a crawler would. A concept of
collections becomes unnecessary for such processes; they just work.
But understanding how they work brings us back to the foundations of
automated processes on collections and the correctness proofs of our
understanding. That's why I think we should rely on sound definitions.
To Juha and Keith (1.):
we are still talking about whether we use PIDs or IDs or both inside
collections. The major point is that we want to formalize the
references as the major structural element of collections.
The phrase "But saying that a collection is a PID is a bit like saying
that a book is an ISBN." is great and shows what is irritating here,
even if it makes sense from a mathematical viewpoint.
My suggestion would be to change the definition to: A collection is
+++referenced by+++ a PID pointing to a digital object consisting of a
set/list of PIDs/Ids and a set of additional pointers/links and metadata
together with each PID/Id.
Juha's phrase then becomes: "But saying that a collection is referenced
by a PID is a bit like saying that a book is referenced by an ISBN."
which sounds reasonable to me.
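
As a rough illustration of the revised definition, here is a minimal
sketch in Python. It is only one possible reading: the class names
(Member, CollectionObject), the in-memory `registry` standing in for a
PID resolver, and the `hdl:0000/...` identifiers are all hypothetical
and not part of any RDA specification.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One entry of a collection: a PID/Id plus its own links and metadata."""
    ref: str                                        # PID or local Id of the member
    links: list[str] = field(default_factory=list)  # additional pointers/links
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class CollectionObject:
    """The digital object that a collection PID points to."""
    members: list[Member] = field(default_factory=list)


# Hypothetical stand-in for a PID resolver: PID -> digital object.
registry: dict[str, CollectionObject] = {}


def resolve(pid: str) -> CollectionObject:
    """Look up the digital object referenced by a PID."""
    return registry[pid]


# The collection is *referenced by* its PID; the PID is not the collection.
registry["hdl:0000/coll-a"] = CollectionObject(
    members=[
        Member("hdl:0000/data-1", metadata={"role": "raw data"}),
        Member("hdl:0000/coll-b", metadata={"role": "sub-collection"}),
    ]
)
registry["hdl:0000/coll-b"] = CollectionObject()  # a collection with zero items

member_refs = [m.ref for m in resolve("hdl:0000/coll-a").members]
```

Because a member reference may itself resolve to another collection
object, nesting and recursion need no extra machinery in this reading.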
To Keith (2.):
it should be possible to express relationships between any collections
(or any DO) whether hierarchic (‘belongs to’/‘is part of’) or in a fully
connected graph where it may be that one collection is a proper subset
of another (or superset of >1 other collections) or that collection A
was derived from Collection B by process X or that collection C was
derived from collection D with process U and from collection E with
process W - and all with appropriate date/time stamping so that
provenance is recorded (and all associated descriptive / contextual /
actionable metadata).
I understand this as a statement about possible counterexamples, but
actually it is a great 'collection' of test cases in which one can see the
possibilities of the given definition:
The fully connected graph is the resulting description of collections seen
as vertices with (directed) edges given by the PIDs/Ids inside each of
the collections. The fully connected graph is therefore a collection
given by the process 'give me all vertices and edges connected to one of
its vertices'. Proper subsets of collections are within the scope of the
definition as well. And that collection A could be derived from
collection B by process X was something I said before anyway. Date/time
stamping and provenance for collections are on the roadmap of the
collections WG too. So from my point of view, at least, this all fits
quite well.
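
The process 'give me all vertices and edges connected to one of its
vertices' can be sketched as a breadth-first traversal. This is only an
illustration under assumptions: the `references` table and the names
A to D are made up, and "connected" is read here as "reachable by
following the directed references outward", which is one possible
interpretation.

```python
from collections import deque

# Hypothetical reference table: collection PID -> PIDs/Ids listed inside it.
references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],  # cycle back to A
    "D": ["A"],  # D references A, but nothing references D
}


def reachable_graph(start, references):
    """Return (vertices, edges) reachable from start along directed references."""
    vertices = {start}
    edges = set()
    queue = deque([start])
    while queue:
        source = queue.popleft()
        for target in references.get(source, []):
            edges.add((source, target))
            if target not in vertices:  # visit each vertex once, so cycles terminate
                vertices.add(target)
                queue.append(target)
    return vertices, edges


vertices, edges = reachable_graph("A", references)
```

The result is again just a set of references, so under the definition
above it could itself be stored as a collection.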
On 12.04.2016 at 00:12, Gary Berg-Cross wrote:
> Ulrich
> In response to your reductive assumption in:
>
> >To Gary: of course a collection is something different to an ordinary
> PID also in my reductionist approach. It is a PID, that points to a
> very special kind of DO. My assumption is, that this is sufficient for
> all underlying "substance". But this of course still has to be proven.
> But perhaps the examples I mentioned already give a feeling of the
> possibilities, that such a definition can have.
>
> PID doesn't seem to be the substrate even if it can be formalized
> neatly and recursed. Behind the PID idea is that of identity, but even
> this doesn't seem like a basis for building up a Collection concept.
> Data collections pre-existed digital data, and thus PIDs, as a practical
> example.
>
>
> I am more in the camp of ontologists like John Sowa who see
> ontological concepts as the material which logical operators are used
> to express concepts.
>
> "Pure logic is ontologically neutral. It makes no presuppositions
> about what exists or may exist in any domain or any language for
> talking about the domain. To represent knowledge about a specific
> domain, it must be supplemented with an ontology that defines the
> categories of things in that domain and the terms that people use to
> talk about them. The ontology defines the words of a natural language,
> the predicates of predicate calculus, the concept and relation types
> of conceptual graphs, the classes of an object-oriented language, or
> the tables and fields of a relational database." (from John F. Sowa,
> "Ontology, Metadata, and Semiotics")
>
> So as a basis of Collection, if you want to find an atom for the
> molecule of Collection, it might be the idea of "and" or "partOf", which
> produces aggregations & wholes. But there are just so many ways of
> building larger structures from smaller ones, and this is the sub-part of
> ontology called mereology.
>
> So to me we can't start with mathematical and logical terms and expect
> to build a world unless we use concepts with terms from that world.
>
> Again to quote Sowa on this language effort:
> "No ontology, formal or informal, is independent of the vocabulary
> and the methodologies (i.e., language games) used to analyze the data.
> Natural language terms have been the starting point for every ontology
> from Aristotle to the present. Even the most abstract ontologies of
> mathematics and science are analyzed, debated, explained, and taught
> in natural languages. For computer applications, the users who enter
> data and choose options on menus, think in the words of the NL
> vocabulary. Any options that cannot be explained in words the users
> understand are open invitations to mistakes, confusions, and system
> vulnerabilities. Therefore, every ontology that has any practical
> application must have a mapping, direct or indirect, to and from
> natural languages." (from John Sowa's "The Role of Logic and Ontology
> in Language and Reasoning")
>
> Gary Berg-Cross, Ph.D.
> ***@***.***
> http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross
> Member, Ontolog Board of Trustees
> Independent Consultant
> Potomac, MD
> 240-426-0770
>
On 12.04.2016 at 09:33, jehakala wrote:
>
> Hello,
>
> The Dublin Core community discussed the definition of collection a lot
> when we were drafting the DC Collections application profile, available at
> http://dublincore.org/groups/collections/collection-application-profile/.
> After trying several other alternatives we finally decided to use simply
> “collection is an aggregation of items”, since adding more detail would
> have limited the applicability of the definition. The definition allows
> even collections with zero items (one of the things which also caused
> problems). An item in turn is a physical or digital resource, and these
> resources may be complex, like research data sets.
>
> Like Keith, I do not think it is a good idea to use PID in the
> definition of a collection. There are a lot of collections out there
> which do not and may never have PIDs or any other kind of identifiers.
> An identifier such as ISCI (International Standard Collection Identifier,
> ISO 27730 http://www.iso.org/iso/catalogue_detail.htm?csnumber=44293) is
> one of the key metadata elements describing a collection, and I’m fine
> with, for instance, making it mandatory in RDA. But saying that a
> collection is a PID is a bit like saying that a book is an ISBN. RDA can
> of course use whatever collection definition it wishes, but other
> communities may not follow the example, or understand fully what is
> going on. On the other hand, using or refining the Dublin Core
> definition of collection (or something else that is already out there)
> would make the RDA approach easier to grasp.
>
> One of the things I like in the DC Collections application profile is its
> data model, which was inherited from an earlier research project carried
> out in the UK. RDA is of course free to develop its own data model, but
> IMO it would do no harm to take a look at what the Dublin Core community
> has done. The DC data model does not explicitly present sub- and
> super-collections, but they have been taken into account at the metadata
> level, just like associated collections and associated publications,
> which are both relevant for research data collections.
>
> International Standard Collection Identifier, by the way, is a
> semantic identifier which is based on the standard identifier of the
> agent which owns the collection. For instance, any ISCI owned by the
> National Library of Finland would start with FI-NL, which is the
> library’s ISIL standard identifier. Deciding what kind of (standard)
> identifiers collections should have can be non-trivial.
>
> Best,
>
> Juha
>
> From: keith.jeffery=***@***.***-groups.org
> [mailto:***@***.***-groups.org] On Behalf Of ***@***.***
> Sent: 11 April 2016 20:23
> To: uschwar1 <***@***.***>; Jeremy York <***@***.***>;
> TobiasWeigel <***@***.***>; Data Fabric IG <***@***.***-groups.org>;
> Research Data Collections WG <***@***.***-groups.org>
> Cc: ThomasZastrow <***@***.***>; Gary <***@***.***>
> Subject: [rda-datafabric-ig] RE:
> [rda-datafabric-ig][rda-collection-wg] Re:
> [rda-datafabric-ig][rda-collection-wg] Re:
> [rda-datafabric-ig][rda-collection-wg] Some thoughts on "Data
> Aggregations" terminology & concepts
>
> All –
>
> Let me rejoin now.
>
> 1. I don’t like ‘a collection is a PID’. A collection is a
> collection and a PID is something that identifies it uniquely and
> permanently
>
> 2. The recursive approach is elegant but limited; it should be
> possible to express relationships between any collections (or any DO)
> whether hierarchic (‘belongs to’/‘is part of’) or in a fully connected
> graph where it may be that one collection is a proper subset of another
> (or superset of >1 other collections) or that collection A was derived
> from Collection B by process X or that collection C was derived from
> collection D with process U and from collection E with process W - and
> all with appropriate date/time stamping so that provenance is recorded
Hi Gary, all,
I agree with Thomas: this is turning into a more and more philosophical
debate. I like this, and perhaps we should continue it over a beer in
Denver. But to shorten the decision process here, let me assume that an
undisputed goal is to set up the foundations for building automated
processes on collections, and try to bring it down to a simple question:
Do we want to be able to prove the correctness of processes on
collections or not? If this is the case, we need a mathematically solid
definition of the object we are working on. I'm not saying that we have
to prove correctness for all processes, btw.; that's not common practice
in computer science anyway.
Alternatively, we can say that we give up the possibility of correctness
proofs and use artificial intelligence. In this case we can just use
language and, if really wanted, ontologies.
The obvious resulting question in this case is how and why AI processes
would need a concept of collection. I suppose these processes would not
reflect on collections but just use the links inside collections in an
unstructured, recursive way, just as a crawler would. A concept of
collections becomes unnecessary for such processes; they just work.
But understanding how they work brings us back to the foundations of
automated processes on collections and to correctness proofs of our
understanding. That's why I think we should rely on sound definitions.
To Juha and Keith (1.):
We are still talking about whether we use PIDs or IDs or both inside
collections. The major point is that we want to formalize the
references as the major structural element of collections.
The phrase "But saying that a collection is a PID is a bit like saying
that a book is an ISBN." is great and shows what is irritating here,
even if it makes sense from a mathematical viewpoint.
My suggestion would be to change the definition to: A collection is
+++referenced by+++ a PID pointing to a digital object consisting of a
set/list of PIDs/Ids and a set of additional pointers/links and metadata
together with each PID/Id.
Juha's phrase then becomes: "But saying that a collection is referenced
by a PID is a bit like saying that a book is referenced by an ISBN."
which sounds reasonable to me.
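The revised definition can be illustrated with a minimal Python sketch.
It is only one possible reading, not a WG specification: the class
names, the in-memory `registry` standing in for a PID resolver, and the
example PIDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Member:
    """One entry of a collection: a PID/Id plus what travels with it."""
    ref: str                                            # PID or local Id of the member
    links: List[str] = field(default_factory=list)      # additional pointers/links
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class CollectionObject:
    """The digital object that a collection PID points to."""
    members: List[Member] = field(default_factory=list)


# Hypothetical stand-in for a PID resolution service.
registry: Dict[str, CollectionObject] = {}


def resolve(pid: str) -> CollectionObject:
    """Follow a PID to the digital object it references."""
    return registry[pid]


# The PID references the collection; it is not the collection itself.
registry["hdl:example/coll-1"] = CollectionObject(members=[
    Member("hdl:example/obj-a", metadata={"role": "raw data"}),
    Member("hdl:example/obj-b", links=["hdl:example/doc-1"]),
])

collection = resolve("hdl:example/coll-1")
member_refs = [m.ref for m in collection.members]
```

The separation between the key in `registry` and the `CollectionObject`
it maps to mirrors the ISBN/book distinction: the identifier can be
exchanged without touching the object it references.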
To Keith (2.):
> it should be possible to express relationships between any collections
> (or any DO) whether hierarchic (‘belongs to’/‘is part of’) or in a fully
> connected graph where it may be that one collection is a proper subset
> of another (or superset of >1 other collections) or that collection A
> was derived from Collection B by process X or that collection C was
> derived from collection D with process U and from collection E with
> process W - and all with appropriate date/time stamping so that
> provenance is recorded (and all associated descriptive / contextual /
> actionable metadata).
I understand this as a statement about possible counterexamples, but
actually it is a great 'collection' of test cases in which one can see
the possibilities of the given definition:
The fully connected graph results from viewing collections as vertices
with (directed) edges given by the PIDs/Ids inside each of the
collections. The fully connected graph is therefore itself a collection,
given by the process 'give me all vertices and edges connected to one of
its vertices'. Proper subsets of collections are in the scope of the
definition as well. And that collection A could be derived from
collection B by process X is something I said before anyway. Date/time
stamping and provenance for collections are on the roadmap of the
collections WG too. So from my point of view, at least, this all fits
quite well.
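
The process 'give me all vertices and edges connected to one of its
vertices' can be sketched as a breadth-first traversal. This is a
minimal illustration under stated assumptions: the `members` table, the
function name and the example identifiers are hypothetical, and
"connected" is read here as "reachable by following references in their
own direction".

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def reachable_graph(
    start: str, members: Dict[str, List[str]]
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect every vertex and directed edge reachable from `start`.

    `members` maps a collection's PID to the PIDs/Ids it contains.
    Identifiers without an entry are treated as leaves (plain objects).
    """
    vertices: Set[str] = {start}
    edges: Set[Tuple[str, str]] = set()
    queue = deque([start])
    while queue:
        pid = queue.popleft()
        for ref in members.get(pid, []):
            edges.add((pid, ref))
            if ref not in vertices:      # visit each vertex once, so cycles terminate
                vertices.add(ref)
                queue.append(ref)
    return vertices, edges


members = {
    "coll:A": ["coll:B", "obj:1"],
    "coll:B": ["obj:2", "coll:A"],   # B refers back to A: a cycle
    "coll:C": ["obj:3"],             # not referenced from A or B
}

vertices, edges = reachable_graph("coll:A", members)

# A proper-subset relation between two member lists is a plain set comparison.
is_proper_subset = set(members["coll:C"]) < {"obj:3", "obj:4"}
```

The resulting vertex set can itself be stored as the member list of a
new collection, which is the sense in which the graph "is a collection".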
On 12.04.2016 at 00:12, Gary Berg-Cross wrote:
> Ulrich
> In response to your reductive assumption in:
>
> >To Gary: of course a collection is something different to an ordinary
> PID also in my reductionist approach. It is a PID, that points to a
> very special kind of DO. My assumption is, that this is sufficient for
> all underlying "substance". But this of course still has to be proven.
> But perhaps the examples I mentioned already give a feeling of the
> possibilities, that such a definition can have.
>
> A PID doesn't seem to be the substrate, even if it can be formalized
> neatly and recursed. Behind the PID idea is that of identity, but even
> this doesn't seem like a basis for building up a Collection concept.
> As a practical example, data collections pre-existed digital data and
> thus PIDs.
>
>
> I am more in the camp of ontologists like John Sowa, who see
> ontological concepts as the material on which logical operators are
> used to express concepts.
>
> "Pure logic is ontologically neutral. It makes no presuppositions
> about what exists or may exist in any domain or any language for
> talking about the domain. To represent knowledge about a specific
> domain, it must be supplemented with an ontology that defines the
> categories of things in that domain and the terms that people use to
> talk about them. The ontology defines the words of a natural language,
> the predicates of predicate calculus, the concept and relation types
> of conceptual graphs, the classes of an object-oriented language, or
> the tables and fields of a relational database." (from John F. Sowa,
> "Ontology, Metadata, and Semiotics")
>
> So as a basis of Collection, if you want to find an atom for the
> molecule of Collection, it might be the idea of "and" or "partOf", which
> produces aggregations & wholes. But there are just so many ways of
> building larger structures from smaller ones, and this is a sub-part of
> ontology called mereology.
>
> So to me we can't start with mathematical and logical terms and expect
> to build a world unless we use concepts with terms from that world.
>
> Again to quote Sowa on this language effort:
> "No ontology, formal or informal, is independent of the vocabulary
> and the methodologies (i.e., language games) used to analyze the data.
> Natural language terms have been the starting point for every ontology
> from Aristotle to the present. Even the most abstract ontologies of
> mathematics and science are analyzed, debated, explained, and taught
> in natural languages. For computer applications, the users who enter
> data and choose options on menus, think in the words of the NL
> vocabulary. Any options that cannot be explained in words the users
> understand are open invitations to mistakes, confusions, and system
> vulnerabilities. Therefore, every ontology that has any practical
> application must have a mapping, direct or indirect, to and from
> natural languages." (from John Sowa's "The Role of Logic and Ontology
> in Language and Reasoning")
>
> Gary Berg-Cross, Ph.D.
> ***@***.***
> http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross
> Member, Ontolog Board of Trustees
> Independent Consultant
> Potomac, MD
> 240-426-0770
>
On 12.04.2016 at 09:33, jehakala wrote:
>
> Hello,
>
>
>
> The Dublin Core community discussed the definition of collection a lot
> when we were drafting the DC Collections application profile, available
> at http://dublincore.org/groups/collections/collection-application-profile/.
> After trying several other alternatives, we finally decided to use
> simply “collection is an aggregation of items”, since adding more detail
> would have limited the applicability of the definition. The definition
> allows even collections with zero items (one of the things which also
> caused problems). An item in turn is a physical or digital resource, and
> these resources may be complex, like research data sets.
>
>
>
> Like Keith, I do not think it is a good idea to use PID in the
> definition of a collection. There are a lot of collections out there
> which do not and may never have PIDs or any other kind of identifiers.
> An identifier such as ISCI (International Standard Collection
> Identifier, ISO 27730,
> http://www.iso.org/iso/catalogue_detail.htm?csnumber=44293) is one of
> the key metadata elements describing a collection, and I’m fine with,
> for instance, making it mandatory in RDA. But saying that a collection
> is a PID is a bit like saying that a book is an ISBN. RDA can of course
> use whatever collection definition it wishes, but other communities may
> not follow the example, or fully understand what is going on. On the
> other hand, using or refining the Dublin Core definition of collection
> (or something else that is already out there) would make the RDA
> approach easier to grasp.
>
>
>
> One of the things I like in the DC Collections application profile is
> its data model, which was inherited from an earlier research project
> carried out in the UK. RDA is of course free to develop its own data
> model, but IMO it would do no harm to take a look at what the Dublin
> Core community has done. The DC data model does not explicitly present
> sub- and super-collections, but they have been taken into account at
> the metadata level, just like associated collections and associated
> publications, which are both relevant for research data collections.
>
>
>
> The International Standard Collection Identifier, by the way, is a
> semantic identifier which is based on the standard identifier of the
> agent which owns the collection. For instance, any ISCI owned by the
> National Library of Finland would start with FI-NL, which is the
> library’s ISIL standard identifier. Deciding what kind of (standard)
> identifiers collections should have can be non-trivial.
>
>
>
> Best,
>
>
>
> Juha
>
>
>
>
>
> From: keith.jeffery=***@***.***-groups.org
> [mailto:***@***.***-groups.org] On Behalf Of ***@***.***
> Sent: 11 April 2016 20:23
> To: uschwar1 <***@***.***>; Jeremy York <***@***.***>; TobiasWeigel
> <***@***.***>; Data Fabric IG <***@***.***-groups.org>; Research Data
> Collections WG <***@***.***-groups.org>
> Cc: ThomasZastrow <***@***.***>; Gary <***@***.***>
> Subject: [rda-datafabric-ig] RE: [rda-datafabric-ig][rda-collection-wg]
> Re: [rda-datafabric-ig][rda-collection-wg] Re:
> [rda-datafabric-ig][rda-collection-wg] Some thoughts on "Data
> Aggregations" terminology & concepts
>
>
>
> All –
>
> Let me rejoin now.
>
> 1. I don’t like ‘a collection is a PID’. A collection is a
> collection and a PID is something that identifies it uniquely and
> permanently.
>
> 2. The recursive approach is elegant but limited; it should be
> possible to express relationships between any collections (or any DO)
> whether hierarchic (‘belongs to’/‘is part of’) or in a fully connected
> graph where it may be that one collection is a proper subset of another
> (or superset of >1 other collections) or that collection A was derived
> from Collection B by process X or that collection C was derived from
> collection D with process U and from collection E with process W - and
> all with appropriate date/time stamping so that provenance is recorded
> (and all associated descriptive / contextual / actionable metadata).
>
> So this is an appeal that we do not simplify to a level where we lose
--
With kind regards
Ulrich Schwardmann
Phone:+49-551-201-1542 Email:***@***.*** _____ _____ ___
Gesellschaft fuer wissenschaftliche / __\ \ / / \ / __|
Datenverarbeitung mbH Goettingen (GWDG) | (_--\ \/\/ /| |) | (_--
Am Fassberg 11 D-37077 Goettingen Germany \___| \_/\_/ |___/ \___|
URL: http://www.gwdg.de E-Mail: ***@***.***
Tel.: +49 (0)551 201-1510 Fax: +49 (0)551 201-2150
Geschaeftsfuehrer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Dipl.-Kfm. Markus Hoppe
Sitz der Gesellschaft: Goettingen Registergericht: Goettingen
Handelsregister-Nr. B 598 Zertifiziert nach ISO 9001