Research Data Collections WG Activity Overview Re: [rda-datafabric-ig][rda-dft][rda-collection-wg] Re: [rda-datafabric-ig] Re: [rda-datafabric-ig][rda-collection-wg] Re: [rda-datafabri…

April 12, 2016 at 7:48 pm #83418

Frederik Baumgardt
Member

Hi,
it’s great to see so much activity and I’ve tried to follow all the arguments going on here. I will base my responses on a list of points I have extracted from the discussion so far (https://rd-alliance.org/group/research-data-collections-wg/wiki/mailing-… ). I hope I got everything somewhat right and complete, apologies if I missed or misstated anyone’s opinion – and feel free to correct me where I’m wrong.
With respect to DEFINITIONS. Yes, we need definitions. And I would hope that we figure out ways to cover as many different definitions as possible and manage the conflicting ones so that they still support their use cases as much as possible. I agree with Gary that set theory cannot fully express the meaning we want to capture in our model. But it might be sufficient to formalise some terms, together with a foundational terminology defined in natural language.
With respect to IDENTIFIERS. I fully agree with the distinction between referrer and referent. As Jacob pointed out, it gets trickier with the distinction between referrer and locator: knowledge about the location has to be embedded somewhere. And I think the definition Gary pointed to is a pragmatic one: identifiers exist in the context of an infrastructure to dereference them. With this in mind, I disagree with Jacob’s characterisation of identifiers as mere labels. RDA infrastructure provides typing of identifiers (https://rd-alliance.org/groups/pid-information-types-wg.html), which embeds ontology into identifiers, and I think we should make use of that.
With respect to UNIQUENESS and PERSISTENCE. As Jacob pointed out, they are not intrinsic to identifiers; they are implemented by the supporting infrastructure. This adds serious complexity to the requirements on collection models that aim to provide both characteristics. I think they’re not boolean properties, and it is important to specify their data types and how they propagate in aggregations. E.g. depending on the underlying model, a collection that refers to non-persistent objects may or may not be persistent itself.
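To make the propagation question concrete, here is a minimal sketch of persistence as an ordered, non-boolean type with two possible aggregation rules. The level names and both function names are assumptions for illustration only; they do not come from any RDA recommendation.

```python
from enum import IntEnum


class Persistence(IntEnum):
    # Hypothetical ordered levels, chosen only for this illustration.
    NONE = 0
    BEST_EFFORT = 1
    GUARANTEED = 2


def by_reference(own, members):
    """Model A (assumed): the collection only stores identifiers,
    so its persistence does not depend on its members."""
    return own


def by_containment(own, members):
    """Model B (assumed): the collection is only as persistent
    as its weakest member."""
    return min([own, *members])


members = [Persistence.GUARANTEED, Persistence.NONE]
model_a = by_reference(Persistence.GUARANTEED, members)
model_b = by_containment(Persistence.GUARANTEED, members)
```

Under the first rule the collection stays GUARANTEED even though one member is not persistent; under the second it drops to NONE. Which rule applies is exactly the modelling decision that would need to be made explicit.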
With respect to IDENTITY. As far as I can tell, we have different concepts of identity floating around; see the previous example. So far I have broadly distinguished between semantic identity (invariant under changes in structure) and structural identity; there might be other categories or sub-categories. E.g. structural identity might be recursive or not (collection identity changes when a collection item is changed vs. only when it is removed). Or a semantic identity might be elastic to a certain point (constrained by object properties).
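The recursive vs. non-recursive distinction can be sketched with content hashes. This is only one possible reading, assuming a collection is a mapping from item identifier to item content; the function names and the `item:` identifiers are hypothetical. A semantic identity would instead be a separately assigned label that neither function touches.

```python
import hashlib


def digest(parts):
    """Order-independent SHA-256 digest over a set of strings."""
    h = hashlib.sha256()
    for part in sorted(parts):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent parts cannot run together
    return h.hexdigest()


def shallow_identity(collection):
    """Non-recursive: depends only on which item identifiers are present."""
    return digest(collection.keys())


def recursive_identity(collection):
    """Recursive: depends on the identifiers and on each item's content."""
    return digest(f"{pid}={content}" for pid, content in collection.items())


before = {"item:1": "a", "item:2": "b"}
edited = {"item:1": "a", "item:2": "B"}   # one item changed
removed = {"item:1": "a"}                 # one item removed
```

Editing an item changes only the recursive identity, while removing an item changes both.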
With respect to ACCESS CONTROL. Obviously there is infrastructure to deal with it on a technical level. Conceptually, I think it is a special case of the model pertaining to persistence and identity. Not having access to parts of a collection is similar to the collection being changed. How we deal with changes to objects/collections that are supposed to be immutable needs to be part of the discussion. I would just argue that we should solve easier problems first.
With respect to DEFINITION OF COLLECTIONS. I like both approaches, the enumerative/list-based and the generative/rule-based one. The rule-based/generative/query-based approach is in line with recommendations from the Data Citation WG (https://rd-alliance.org/group/data-citation-wg/outcomes/data-citation-re…). Although in a way enumeration is just a very explicit query, correct? I am thinking of a set of operators that can be chained, where disjunction/conjunction operators would address Keith’s concern about the limitations of strictly hierarchical subsets.
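A rough sketch of what such chainable operators could look like, with enumeration treated as just another rule. The `Rule` class, the `where` and `enumerate_ids` helpers, and the sample records are all hypothetical and are not part of the Data Citation WG recommendation.

```python
class Rule:
    """A collection definition expressed as a membership predicate."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __and__(self, other):
        # Conjunction: an object must satisfy both rules.
        return Rule(lambda obj: self.predicate(obj) and other.predicate(obj))

    def __or__(self, other):
        # Disjunction: an object may satisfy either rule.
        return Rule(lambda obj: self.predicate(obj) or other.predicate(obj))

    def members(self, universe):
        return [obj for obj in universe if self.predicate(obj)]


def where(key, value):
    """Generative rule: select objects by a property value."""
    return Rule(lambda obj: obj.get(key) == value)


def enumerate_ids(*ids):
    """Enumerative definition, written as a very explicit query."""
    wanted = frozenset(ids)
    return Rule(lambda obj: obj["id"] in wanted)


universe = [
    {"id": "a", "type": "image", "year": 2015},
    {"id": "b", "type": "image", "year": 2016},
    {"id": "c", "type": "text", "year": 2016},
]
rule = (where("type", "image") & where("year", 2016)) | enumerate_ids("c")
selected = [obj["id"] for obj in rule.members(universe)]
```

Here the resulting collection is not a subset of any single hierarchy branch: it combines a rule-based selection with one explicitly listed object.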
With respect to POWER COLLECTIONS. This is possibly jumping a step ahead, but power sets are well suited to managing the relations between collections efficiently, using bitmask implementations of the characteristic functions (https://en.wikipedia.org/wiki/Power_set#Representing_subsets_as_functions).
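As a small illustration of the bitmask idea, assuming a fixed and ordered universe of objects (the object names and helper functions below are made up for the example): each collection is an integer whose bit i is set when object i is a member, so intersection, union and subset tests become single bitwise operations.

```python
# Hypothetical fixed universe; the position in the list is the bit index.
universe = ["obj-a", "obj-b", "obj-c", "obj-d"]
index = {name: i for i, name in enumerate(universe)}


def to_mask(members):
    """Encode a collection as the bitmask of its characteristic function."""
    mask = 0
    for name in members:
        mask |= 1 << index[name]
    return mask


def from_mask(mask):
    """Decode a bitmask back into the list of member names."""
    return [name for name, i in index.items() if (mask >> i) & 1]


def is_subset(a, b):
    """True if every member of a is also a member of b."""
    return a & ~b == 0


x = to_mask(["obj-a", "obj-b"])
y = to_mask(["obj-b", "obj-c"])
intersection = from_mask(x & y)
union = from_mask(x | y)
```

The trade-off is that the universe has to be known and stable, since adding or reordering objects changes the meaning of every stored mask.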
Best,
Frederik