Research Data Collections WG Activity Overview Re: [rda-collection-wg] [rda-collection-wg] Collection requirements, streaming -…

March 21, 2016 at 5:02 pm #122790

Frederik Baumgardt
Member

Hi Thomas,
you make a couple good points, here’s my take on them:
> On 21.03.2016 at 10:32, ThomasZastrow wrote:
>
> Dear all,
>
> I had this discussion – about immutable/mutable objects a PID is pointing to – already a few times in the past … and depending on who is attending the discussion and which current use cases are under consideration, the answers (*if* there are answers) are subject to change.
This is really what led me to go back to the question of what’s inside a PID. There are a couple of different conceptions floating around, and I’d like to gather them and see which ones, if not all, we can and want to cover. Or at least identify and understand the canonical ones, i.e. those underlying frequently used PID services.
> We are talking about research data *collections*. We should not talk about versioning at all. If the content of a collection is static – fine. If it is not static – also fine. We don’t know and we don’t care. The concept of a collection and its PID can only guarantee the persistence of the collection itself – but it doesn’t say anything about the individual components. This is in line with the concept of PIDs in general: the PID is persistent, but maybe it is pointing to something which no longer exists. Same goes for the collection and its PID: the collection is persistently there, but who knows what has happened to its content …? Also, if the collection is not able to determine changes made to its individual items, versioning etc. on the collection level is not possible.
>
> Some use cases:
>
> – You already mentioned the ORCID use case. If my postal address changes, my ORCID ID stays the same. Let’s say someone creates a collection of ORCID IDs – if some of them are subject to change, the collection itself has no chance of finding out and should also not reflect these changes.
> – A collection of sensor data – “Collecting started at 1.1.1990 and is still going on”. Maybe the sensor data of one day is an immutable thing, but not the collection itself. Creating every day a new collection is not always an option.
>
> My personal opinion: Per definition, PIDs are not able to deal with versioning at all. They also can’t reflect changes in the data they are pointing to. So it makes no sense to implement versioning etc. on a logical level of data management which is defined by PIDs.
I don’t think we agree here. First, given the distinction between an ID and a PID, we should be able to propagate their characteristics into their aggregations. By which I mean that a collection purely consisting of PID-references in fact does guarantee persistence of its content. Conversely, a PID-referenced collection of non-persistent IDs is little more than a list of Strings. An ID-referenced aggregation of a mix of ID- and PID-referenced objects would work, on the other hand (if you don’t need or expect PID characteristics, that is). Of course building a model around this distinction is not possible without a sound foundation at the ID, PID, PIT and data type level.
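A minimal Python sketch of the propagation idea above, under stated assumptions: the names `RefKind`, `Reference`, `Collection` and `is_consistent` are hypothetical and not taken from any existing PID service. It only illustrates one possible reading, namely that a PID-referenced collection should contain nothing but PID references, while an ID-referenced collection may mix both kinds.

```python
from dataclasses import dataclass
from enum import Enum


class RefKind(Enum):
    ID = "id"    # plain identifier, no persistence guarantee
    PID = "pid"  # persistent identifier


@dataclass(frozen=True)
class Reference:
    kind: RefKind
    value: str   # placeholder strings only, not real identifiers


@dataclass(frozen=True)
class Collection:
    ref: Reference                  # how the collection itself is referenced
    members: tuple[Reference, ...]  # references to the collection items

    def members_all_persistent(self) -> bool:
        """True if every member is referenced by a PID."""
        return all(m.kind is RefKind.PID for m in self.members)

    def is_consistent(self) -> bool:
        """Check the propagation rule assumed in this sketch.

        A PID-referenced collection promises persistence, so it should
        hold only PID references. An ID-referenced collection makes no
        such promise and may mix ID and PID references.
        """
        if self.ref.kind is RefKind.PID:
            return self.members_all_persistent()
        return True
```

Under this reading, the "list of Strings" case corresponds to a collection whose own reference is a PID while `members_all_persistent()` is false.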
Secondly, if the data that’s referenced by an ID can change then it is not persistent, from a data citation standpoint. That’s what I’d like to address with persistent traits – ORCID or any other PID implementation that wants to reference partly mutable objects could define a data mask, clarifying which properties are part of its persistent and its mutable identities.
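As a rough illustration of such a data mask, the Python sketch below splits a record into its persistent and mutable parts. The field names (`person_id`, `postal_address`) and the functions `split_by_mask` and `same_persistent_identity` are made up for this example; they do not describe the actual ORCID schema or any existing API.

```python
def split_by_mask(record: dict, persistent_keys: frozenset) -> tuple:
    """Split a record into (persistent part, mutable part) using the mask."""
    persistent = {k: v for k, v in record.items() if k in persistent_keys}
    mutable = {k: v for k, v in record.items() if k not in persistent_keys}
    return persistent, mutable


def same_persistent_identity(a: dict, b: dict, persistent_keys: frozenset) -> bool:
    """Two states of an object match if their masked (persistent) parts agree."""
    return split_by_mask(a, persistent_keys)[0] == split_by_mask(b, persistent_keys)[0]


# Hypothetical mask: only the identifier field belongs to the persistent identity.
MASK = frozenset({"person_id"})

before = {"person_id": "example-0001", "postal_address": "Old Street 1"}
after = {"person_id": "example-0001", "postal_address": "New Street 2"}

# The address changed, but the persistent identity did not.
print(same_persistent_identity(before, after, MASK))
```

A citation would then only rely on the properties covered by the mask, while everything outside it is explicitly declared mutable.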
By the way, I think one key premise on my part is that collection items (or any PID referenced object) aren’t atomic, i.e. are actually (associative) arrays and have a sub-object address space. We might not share this notion to the same degree. (Sidenote: Aggregate objects being collections would bring ‘persistent traits’ into the scope of our WG rather than PIT, etc.)
Thirdly, in my view, the models we were discussing aim at implementing versioning not within a single PID but as a relation between a series of PIDs. Versioning of PID-referenced data thus becomes a genealogical relation rather than the record of change in an individual object.
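To make the "relation between a series of PIDs" concrete, here is a small Python sketch. It assumes each version is an immutable object with its own PID and that a hypothetical `predecessor_of` mapping records which PID each version was derived from; the PID strings are placeholders.

```python
from typing import Optional


def lineage(pid: str, predecessor_of: dict) -> list:
    """Walk the predecessor relation from a PID back to the first version."""
    chain = []
    seen = set()  # guards against accidental cycles in the relation
    current: Optional[str] = pid
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = predecessor_of.get(current)
    return chain


# Hypothetical relation: every PID points to the PID it supersedes.
predecessor_of = {
    "pid:example/v3": "pid:example/v2",
    "pid:example/v2": "pid:example/v1",
    "pid:example/v1": None,
}

print(lineage("pid:example/v3", predecessor_of))
# ['pid:example/v3', 'pid:example/v2', 'pid:example/v1']
```

No individual PID records a change here; the version history exists only as the chain of relations between otherwise immutable objects.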
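> Before you kill me – one thing I would provide in this sense is a flag “static” or “non static” which is assigned to the collection as a whole.
Fully agree. Different models of collections which are configured with flags like this, be it explicit or implicit, are a major part of my vision for the output of this WG.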
If any of our differences of opinion are due to mistakes on my part, please correct me (and apologies for any frustration). Otherwise I’d suggest we gather these differences and figure out if and how we can fit them into a unified model. Obviously I am open to reduce said model to whatever degree is necessary to fit it inside the WG time constraints, but at the same time I’d like to achieve a satisfying measure of coverage and utility.
Best,
Frederik
March 23, 2016 at 5:49 pm #133270

Tobias Weigel
Member

Hello Frederik and Tom,
answers inline..
-------- Original Message --------
Subject: [rda-collection-wg] Re: [rda-collection-wg] [rda-collection-wg] Collection requirements, streaming -…
From: fbaumgardt
To: ThomasZastrow, Research Data Collections WG
Date: 21 Mar 2016, 18:02
> Hi Thomas,
> you make a couple good points, here’s my take on them:
>
>> On 21.03.2016 at 10:32, ThomasZastrow wrote:
>>
>> Dear all,
>>
>> I had this discussion – about immutable/mutable objects a PID is pointing to – already a few times in the past … and depending on who is attending the discussion and which current use cases are under consideration, the answers (*if* there are answers) are subject to change.
>
> This is really what led me to go back to the question of what’s inside a PID. There are a couple of different conceptions floating around, and I’d like to gather them and see which ones, if not all, we can and want to cover. Or at least identify and understand the canonical ones, i.e. those underlying frequently used PID services.
>
>> We are talking about research data *collections*. We should not talk about versioning at all. If the content of a collection is static – fine. If it is not static – also fine. We don’t know and we don’t care. The concept of a collection and its PID can only guarantee the persistence of the collection itself – but it doesn’t say anything about the individual components. This is in line with the concept of PIDs in general: the PID is persistent, but maybe it is pointing to something which no longer exists. Same goes for the collection and its PID: the collection is persistently there, but who knows what has happened to its content …? Also, if the collection is not able to determine changes made to its individual items, versioning etc. on the collection level is not possible.
>>
>> Some use cases:
>>
>> – You already mentioned the ORCID use case. If my postal address changes, my ORCID ID stays the same. Let’s say someone creates a collection of ORCID IDs – if some of them are subject to change, the collection itself has no chance of finding out and should also not reflect these changes.
>> – A collection of sensor data – “Collecting started at 1.1.1990 and is still going on”. Maybe the sensor data of one day is an immutable thing, but not the collection itself. Creating every day a new collection is not always an option.
>>
>> My personal opinion: Per definition, PIDs are not able to deal with versioning at all. They also can’t reflect changes in the data they are pointing to. So it makes no sense to implement versioning etc. on a logical level of data management which is defined by PIDs.
>
> I don’t think we agree here. First, given the distinction between an ID and a PID, we should be able to propagate their characteristics into their aggregations. By which I mean that a collection purely consisting of PID-references in fact does guarantee persistence of its content. Conversely, a PID-referenced collection of non-persistent IDs is little more than a list of Strings. An ID-referenced aggregation of a mix of ID- and PID-referenced objects would work, on the other hand (if you don’t need or expect PID characteristics, that is). Of course building a model around this distinction is not possible without a sound foundation at the ID, PID, PIT and data type level.
I agree on this with Frederik, though I’d phrase it as: A collection
purely consisting of PID references does guarantee persistence of the
collection’s *structure*. (If you say content, you are making a
statement on the referenced objects, which may disappear as Tom already
pointed out).
I also think we need a solid foundation for the ID, PID, PIT etc.
concepts. This is what DF is also working on. To come back to your
example: A collection of non-P-IDs may still make sense (an ID can also
be globally resolvable, though not persistent), but you cannot say
anything about the validity of the references. Looking at the other
discussions within RDA (DF etc.), I would tend to exclude such generic
IDs. Persistent IDs are increasingly seen as a necessity, for good reasons.
The mixed case is an interesting thought, though in the end it might be
a bit off for the same reasons as above. But we may still keep it in
mind; such hybrid cases can be conceptually disturbing, but appear often
enough in practice.
>
> Secondly, if the data that’s referenced by an ID can change then it is not persistent, from a data citation standpoint. That’s what I’d like to address with persistent traits – ORCID or any other PID implementation that wants to reference partly mutable objects could define a data mask, clarifying which properties are part of its persistent and its mutable identities.
>
> By the way, I think one key premise on my part is that collection items (or any PID referenced object) aren’t atomic, i.e. are actually (associative) arrays and have a sub-object address space. We might not share this notion to the same degree. (Sidenote: Aggregate objects being collections would bring ‘persistent traits’ into the scope of our WG rather than PIT, etc.)
>
Can you define a bit better what you mean by ‘persistent traits’? I’m
not sure I completely got your point about why collection items are not
atomic. I would not go so far as saying that every collection consists
of items which can always be further subdivided (that would seem a bit
too much recursion to me).
> Thirdly, in my view, the models we were discussing aim at implementing versioning not within a single PID but as a relation between a series of PIDs. Versioning of PID-referenced data thus becomes a genealogical relation rather than the record of change in an individual object.
What do you mean by the genealogical relation?
>
>> Before you kill me – one thing I would provide in this sense is a flag “static” or “non static” which is assigned to the collection as a whole.
>
> Fully agree. Different models of collections which are configured with flags like this, be it explicit or implicit, are a major part of my vision for the output of this WG.
I also think that this can be one fundamental outcome.
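A minimal Python sketch of what such a collection-level flag could look like, purely as an assumption for discussion: `FlaggedCollection`, its `static` field and `add_member` are hypothetical names, and the PID strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FlaggedCollection:
    pid: str                   # PID of the collection itself
    static: bool               # True: membership is frozen; False: may still grow
    members: list = field(default_factory=list)  # PIDs of the items

    def add_member(self, member_pid: str) -> None:
        """Append an item; refused when the collection is flagged static."""
        if self.static:
            raise ValueError(f"collection {self.pid} is static")
        self.members.append(member_pid)


# A growing sensor-data collection stays non-static and keeps one PID.
sensor = FlaggedCollection("pid:example/sensor", static=False)
sensor.add_member("pid:example/day-1")
sensor.add_member("pid:example/day-2")
print(sensor.members)
```

The flag says nothing about whether the referenced items themselves change; it only describes the membership of the collection as a whole.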
>
> If any of our differences of opinion are due to mistakes on my part, please correct me (and apologies for any frustration). Otherwise I’d suggest we gather these differences and figure out if and how we can fit them into a unified model. Obviously I am open to reduce said model to whatever degree is necessary to fit it inside the WG time constraints, but at the same time I’d like to achieve a satisfying measure of coverage and utility.
>
Well phrased. No one should be afraid of getting bashed for questioning
(or bashing) the principles we work on. That’s what we are here for. 🙂
Best, Tobias