Re: [rda-collection-wg] [rda-collection-wg] Collection requirements, streaming -…

  • Creator
    Discussion
  • #122790

    Hi Thomas,
    you make a couple good points, here’s my take on them:
    > On 21.03.2016 at 10:32, ThomasZastrow wrote:
    >
    > Dear all,
    >
    > I had this discussion – about immutable/mutable objects a PID is pointing to – already a few times in the past … and depending on who is attending the discussion and which use cases are under consideration, the answers (*if* there are answers) are subject to change.
    This is really what led me to go back to the question of what’s inside a PID. There are a couple of different conceptions floating around, and I’d like to gather them and see which ones, if not all, we can and want to cover. Or at least identify and understand the canonical ones, i.e. those underlying frequently used PID services.
    > We are talking about research data *collections*. We should not talk about versioning at all. If the content of a collection is static – fine. If it is not static – also fine. We don’t know and we don’t care. The concept of a collection and its PID can only guarantee the persistence of the collection itself – but it doesn’t say anything about the individual components. This is in line with the concept of PIDs in general: the PID is persistent, but maybe it is pointing to something which no longer exists. The same goes for the collection and its PID: the collection is persistently there, but who knows what has happened to its content …? Also, if the collection is not able to determine changes made to its individual items, versioning etc. on the collection level is not possible.
    >
    > Some use cases:
    >
    > – You already mentioned the ORCID use case. If my postal address changes, my ORCID ID stays the same. Let’s say someone creates a collection of ORCID IDs – if some of them are subject to change, the collection itself has no chance of finding out and should also not reflect these changes.
    > – A collection of sensor data – “Collecting started at 1.1.1990 and is still going on”. Maybe the sensor data of one day is an immutable thing, but not the collection itself. Creating every day a new collection is not always an option.
    >
    > My personal opinion: Per definition, PIDs are not able to deal with versioning at all. They also can’t reflect changes in the data they are pointing to. So it makes no sense to implement versioning etc. on a logical level of data management which is defined by PIDs.
    I don’t think we agree here. First, given the distinction between an ID and a PID, we should be able to propagate their characteristics into their aggregations. By which I mean that a collection purely consisting of PID-references in fact does guarantee persistence of its content. Conversely, a PID-referenced collection of non-persistent IDs is little more than a list of Strings. An ID-referenced aggregation of a mix of ID- and PID-referenced objects would work, on the other hand (if you don’t need or expect PID characteristics, that is). Of course building a model around this distinction is not possible without a sound foundation at the ID, PID, PIT and data type level.
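A minimal sketch of the propagation idea in the paragraph above, assuming a reference is just a string tagged as ID or PID. The names `RefKind`, `Reference`, `Collection` and `carries_pid_guarantee` are hypothetical and not part of any WG output; whether the resulting guarantee covers content or only structure is exactly what is under discussion in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RefKind(Enum):
    ID = "id"    # resolvable, but with no persistence promise
    PID = "pid"  # persistent identifier


@dataclass(frozen=True)
class Reference:
    value: str
    kind: RefKind


@dataclass(frozen=True)
class Collection:
    own_ref: Reference              # how the collection itself is referenced
    members: tuple[Reference, ...]  # references to the member objects


def carries_pid_guarantee(c: Collection) -> bool:
    """True only if the collection and every member are referenced by PIDs."""
    return c.own_ref.kind is RefKind.PID and all(
        m.kind is RefKind.PID for m in c.members
    )
```

A PID-referenced collection with one plain-ID member fails the check, which corresponds to the "little more than a list of Strings" case; an ID-referenced collection fails it too, but by construction never promised PID characteristics in the first place.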
    Secondly, if the data that’s referenced by an ID can change then it is not persistent, from a data citation standpoint. That’s what I’d like to address with persistent traits – ORCID or any other PID implementation that wants to reference partly mutable objects could define a data mask, clarifying which properties are part of its persistent and its mutable identities.
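One way to picture such a data mask, as a sketch only: treat the referenced object as a dictionary and declare which keys belong to its persistent identity. The function name, the field names and the placeholder identifier below are all made up for illustration and do not reflect the actual ORCID data model.

```python
import hashlib
import json


def persistent_fingerprint(record: dict, persistent_keys: frozenset) -> str:
    """Hash only the properties that the mask declares persistent."""
    masked = {k: record[k] for k in sorted(persistent_keys) if k in record}
    payload = json.dumps(masked, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical record: "id" and "registered" are persistent, "address" is mutable.
MASK = frozenset({"id", "registered"})
before = {"id": "0000-0000-0000-0000", "registered": "2016-03-21", "address": "Old Street 1"}
after = dict(before, address="New Street 2")

# A change to a mutable property leaves the persistent identity untouched.
assert persistent_fingerprint(before, MASK) == persistent_fingerprint(after, MASK)
```

Under this reading, a citation could pin the fingerprint of the persistent part while the mutable part is free to change.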
    By the way, I think one key premise on my part is that collection items (or any PID referenced object) aren’t atomic, i.e. are actually (associative) arrays and have a sub-object address space. We might not share this notion to the same degree. (Sidenote: Aggregate objects being collections would bring ‘persistent traits’ into the scope of our WG rather than PIT, etc.)
    Thirdly, in my view, the models we were discussing aim at implementing versioning not within a single PID but as a relation between a series of PIDs. Versioning of PID-referenced data thus becomes a genealogical relation rather than the record of change in an individual object.
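As an illustration of versioning as a relation between PIDs, here is a small sketch in which each PID points to the PID it was derived from. The PID strings and the `lineage` helper are hypothetical; no particular PID service is assumed to store such links.

```python
from typing import Dict, List, Optional

# Hypothetical PIDs: each maps to its predecessor (None for the first version).
PREDECESSOR: Dict[str, Optional[str]] = {
    "pid:coll/v1": None,
    "pid:coll/v2": "pid:coll/v1",
    "pid:coll/v3": "pid:coll/v2",
}


def lineage(pid: str, predecessor: Dict[str, Optional[str]]) -> List[str]:
    """Walk back from `pid` to its earliest known ancestor, newest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = pid
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in version relation at {current}")
        seen.add(current)
        chain.append(current)
        current = predecessor.get(current)  # unknown PIDs end the walk
    return chain
```

Each object behind a PID stays immutable here; a "new version" is simply a new PID with a link to its ancestor.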
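    > Before you kill me – one thing I would provide in this sense is a flag “static” or “non static” which is assigned to the collection as a whole.
    Fully agree. Different models of collections which are configured with flags like this, be it explicit or implicit, are a major part of my vision for the output of this WG.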
    If any of our differences of opinion are due to mistakes on my part, please correct me (and apologies for any frustration). Otherwise I’d suggest we gather these differences and figure out if and how we can fit them into a unified model. Obviously I am open to reducing said model to whatever degree is necessary to fit it inside the WG time constraints, but at the same time I’d like to achieve a satisfying measure of coverage and utility.
    Best,
    Frederik

  • Author
    Replies
  • #133270

    Hello Frederik and Tom,
    answers inline..
    ——– Original Message ——–
    Subject: [rda-collection-wg] Re: [rda-collection-wg] [rda-collection-wg] Collection requirements, streaming -…
    From: fbaumgardt
    To: ThomasZastrow, Research Data Collections WG
    Date: 21 Mar 2016, 18:02
    > Hi Thomas,
    > you make a couple good points, here’s my take on them:
    >
    >> On 21.03.2016 at 10:32, ThomasZastrow wrote:
    >>
    >> Dear all,
    >>
    >> I had this discussion – about immutable/mutable objects a PID is pointing to – already a few times in the past … and depending on who is attending the discussion and which use cases are under consideration, the answers (*if* there are answers) are subject to change.
    >
    > This is really what led me to go back to the question of what’s inside a PID. There are a couple of different conceptions floating around, and I’d like to gather them and see which ones, if not all, we can and want to cover. Or at least identify and understand the canonical ones, i.e. those underlying frequently used PID services.
    >
    >> We are talking about research data *collections*. We should not talk about versioning at all. If the content of a collection is static – fine. If it is not static – also fine. We don’t know and we don’t care. The concept of a collection and its PID can only guarantee the persistence of the collection itself – but it doesn’t say anything about the individual components. This is in line with the concept of PIDs in general: the PID is persistent, but maybe it is pointing to something which no longer exists. The same goes for the collection and its PID: the collection is persistently there, but who knows what has happened to its content …? Also, if the collection is not able to determine changes made to its individual items, versioning etc. on the collection level is not possible.
    >>
    >> Some use cases:
    >>
    >> – You already mentioned the ORCID use case. If my postal address changes, my ORCID ID stays the same. Let’s say someone creates a collection of ORCID IDs – if some of them are subject to change, the collection itself has no chance of finding out and should also not reflect these changes.
    >> – A collection of sensor data – “Collecting started at 1.1.1990 and is still going on”. Maybe the sensor data of one day is an immutable thing, but not the collection itself. Creating every day a new collection is not always an option.
    >>
    >> My personal opinion: Per definition, PIDs are not able to deal with versioning at all. They also can’t reflect changes in the data they are pointing to. So it makes no sense to implement versioning etc. on a logical level of data management which is defined by PIDs.
    >
    > I don’t think we agree here. First, given the distinction between an ID and a PID, we should be able to propagate their characteristics into their aggregations. By which I mean that a collection purely consisting of PID-references in fact does guarantee persistence of its content. Conversely, a PID-referenced collection of non-persistent IDs is little more than a list of Strings. An ID-referenced aggregation of a mix of ID- and PID-referenced objects would work, on the other hand (if you don’t need or expect PID characteristics, that is). Of course building a model around this distinction is not possible without a sound foundation at the ID, PID, PIT and data type level.
    I agree on this with Frederik, though I’d phrase it as: A collection
    purely consisting of PID references does guarantee persistence of the
    collection’s *structure*. (If you say content, you are making a
    statement on the referenced objects, which may disappear as Tom already
    pointed out).
    I also think we need a solid foundation for the ID, PID, PIT etc.
    concepts. This is what DF is also working on. To come back to your
    example: A collection of non-P-IDs may still make sense (an ID can also
    be globally resolvable, though not persistent), but you cannot say
    anything about the validity of the references. Looking at the other
    discussions within RDA (DF etc.), I would tend to exclude such generic
    IDs. Persistent IDs are increasingly seen as a necessity, for good reasons.
    The mixed case is an interesting thought, though in the end it might be
    a bit off for the same reasons as above. But we may still keep it in
    mind; such hybrid cases can be conceptually disturbing, but appear often
    enough in practice.
    >
    > Secondly, if the data that’s referenced by an ID can change then it is not persistent, from a data citation standpoint. That’s what I’d like to address with persistent traits – ORCID or any other PID implementation that wants to reference partly mutable objects could define a data mask, clarifying which properties are part of its persistent and its mutable identities.
    >
    > By the way, I think one key premise on my part is that collection items (or any PID referenced object) aren’t atomic, i.e. are actually (associative) arrays and have a sub-object address space. We might not share this notion to the same degree. (Sidenote: Aggregate objects being collections would bring ‘persistent traits’ into the scope of our WG rather than PIT, etc.)
    >
    Can you define a bit better what you mean by ‘persistent traits’? I’m
    not sure I completely got your point about why collection items are not
    atomic. I would not go so far as saying that every collection consists
    of items which can always be further subdivided (that would seem a bit
    too much recursion to me).
    > Thirdly, in my view, the models we were discussing aim at implementing versioning not within a single PID but as a relation between a series of PIDs. Versioning of PID-referenced data thus becomes a genealogical relation rather than the record of change in an individual object.
    What do you mean by the genealogical relation?
    >
    >> Before you kill me – one thing I would provide in this sense is a flag “static” or “non static” which is assigned to the collection as a whole.
    >
    > Fully agree. Different models of collections which are configured with flags like this, be it explicit or implicit, are a major part of my vision for the output of this WG.
    I also think that this can be one fundamental outcome.
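A collection-level flag of the kind proposed above could look roughly like the following sketch. The class name `FlaggedCollection`, its fields and the choice of `PermissionError` are assumptions made for illustration, not an agreed WG model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlaggedCollection:
    pid: str
    static: bool  # True: membership is frozen; False: members may still be added
    members: List[str] = field(default_factory=list)

    def add(self, member_pid: str) -> None:
        """Append a member PID, unless the collection is declared static."""
        if self.static:
            raise PermissionError(f"{self.pid} is static; membership cannot change")
        self.members.append(member_pid)
```

The sensor use case from earlier in the thread would be a non-static collection that keeps growing, while a collection cited in a publication would typically be declared static.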
    Hello Frederik and Tom,
    answers inline..
    ——– Original Message ——–
    Subject: [rda-collection-wg] Re: [rda-collection-wg] [rda-collection-wg]
    Collection requirements, streaming -…
    From: fbaumgardt
    To: ThomasZastrow
    , Research Data Collections
    WG
    Date: 21 Mar 2016, 18:02
    > Hi Thomas,
    > you make a couple good points, here’s my take on them:
    >
    >> Am 21.03.2016 um 10:32 schrieb ThomasZastrow
    :
    >>
    >> Dear all,
    >>
    >> I had this discussion – about immutable/mutable objects a PID is pointing to – already a few times in the past … and depending who is attending the discussion and which current use cases are in consideration the answers (*if* there are answers) are subject of change.
    >
    > This is really what led me to go back to the question of what’s inside a PID, there are a couple different conceptions floating around and I’d like to gather them and see which ones, if not all, we can and want to cover. Or at least identify and understand the canonical ones, i.e. those underlying frequently used PID services.
    >
    >> We are talking about research data *collections*. We should not talk about versioning at all. If the content of a collection is static – fine. If it is not static – also fine. We don’t know and we don’t care. The concept of a collection and its PID can only guarantee the persistence of the collection itself – but it doesn’t says anything about the individual components. This is along the concept of PIDs in general: the PID is persistent, but maybe it is pointing to something which no longer exists. Same goes for the collection and its PID: the collection is persistently there, but who knows what has happened to its content …? Also, if the collection is not able to determine changes done to its individual items, versioning etc. on the collection level is not possible.
    >>
    >> Some use cases:
    >>
    >> – You mentioned already the ORCID use case. If my postal adress changes, my ORCID ID stays the same. Lets say someone creates a collection of ORCID IDs – if some of them are subject of change, the collection itself has even no chance to find out and should also not reflect these changes
    >> – A collection of sensor data – “Collecting started at 1.1.1990 and is still going on”. Maybe the sensor data of one day is an immutable thing, but not the collection itself. Creating every day a new collection is not always an option.
    >>
    >> My personal opinion: Per definition, PIDs are not able to deal with versioning at all. They also can’t reflect changes in the data they are pointing to. So it makes no sense to implement versioning etc. on a logical level of data management which is defined by PIDs.
    >
    > I don’t think we agree here. First, given the distinction between an ID and a PID, we should be able to propagate their characteristics into their aggregations. By which I mean that a collection purely consisting of PID-references in fact does guarantee persistence of its content. Conversely, a PID-referenced collection of non-persistent IDs is little more than a list of Strings. An ID-referenced aggregation of a mix of ID- and PID-referenced objects would work, on the other hand (if you don’t need or expect PID characteristics, that is). Of course building a model around this distinction is not possible without a sound foundation at the ID, PID, PIT and data type level.
    I agree on this with Frederik, though I’d phrase it as: A collection
    purely consisting of PID references does guarantee persistence of the
    collection’s *structure*. (If you say content, you are making a
    statement on the referenced objects, which may disappear as Tom already
    pointed out).
    I also think we need a solid foundation for the ID, PID, PIT etc.
    concepts. This is what DF is also working on. To come back to your
    example: A collection of non-P-IDs may still make sense (an ID can also
    be globally resolvable, though not persistent), but you cannot say
    anything about the validity of the references. Looking at the other
    discussions within RDA (DF etc.), I would tend to exclude such generic
    IDs. Persistent IDs are increasingly seen as a necessity, for good reasons.
    The mixed case is an interesting thought, though in the end it might be
    a bit off for the same reasons as above. But we may still keep it in
    mind; such hybrid cases can be conceptually disturbing, but appear often
    enough in practice.
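
To make the distinction concrete, here is a minimal Python sketch of how reference characteristics might propagate to a collection. All names (`RefKind`, `Reference`, `structure_is_persistent`) and the example identifiers are hypothetical illustrations, not part of any PID service or WG specification. The check covers only the persistence of the collection's structure and says nothing about whether the referenced objects still exist.

```python
from dataclasses import dataclass
from enum import Enum


class RefKind(Enum):
    ID = "id"    # resolvable, but carries no persistence promise
    PID = "pid"  # persistent identifier


@dataclass(frozen=True)
class Reference:
    value: str     # hypothetical identifier string
    kind: RefKind


def structure_is_persistent(members: list[Reference]) -> bool:
    """True only if every member reference is a PID.

    This is a statement about the references, not about the
    objects they point to, which may have disappeared.
    """
    return all(ref.kind is RefKind.PID for ref in members)


pure = [
    Reference("hdl:example/1", RefKind.PID),
    Reference("hdl:example/2", RefKind.PID),
]
# The mixed (hybrid) case: one generic ID among PIDs.
mixed = pure + [Reference("http://example.org/item/3", RefKind.ID)]

print(structure_is_persistent(pure))   # True
print(structure_is_persistent(mixed))  # False
```

In this reading, a single generic ID is enough to downgrade the whole collection, which is one way to treat the hybrid case; other policies are conceivable.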
    >
    > Secondly, if the data that’s referenced by an ID can change then it is not persistent, from a data citation standpoint. That’s what I’d like to address with persistent traits – ORCID or any other PID implementation that wants to reference partly mutable objects could define a data mask, clarifying which properties are part of its persistent and its mutable identities.
    >
    > By the way, I think one key premise on my part is that collection items (or any PID referenced object) aren’t atomic, i.e. are actually (associative) arrays and have a sub-object address space. We might not share this notion to the same degree. (Sidenote: Aggregate objects being collections would bring ‘persistent traits’ into the scope of our WG rather than PIT, etc.)
    >
    Can you define a bit better what you mean by ‘persistent traits’? I’m
    not sure I completely got your point about why collection items are not
    atomic. I would not go so far as saying that every collection consists
    of items which can always be further subdivided (that would seem a bit
    too much recursion to me).
    > Thirdly, in my view, the models we were discussing aim at implementing versioning not within a single PID but as a relation between a series of PIDs. Versioning of PID-referenced data thus becomes a genealogical relation rather than the record of change in an individual object.
    What do you mean by the genealogical relation?
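
One possible reading of "versioning as a relation between a series of PIDs", sketched in Python for the sake of discussion: each version gets its own PID, and a record links it to its predecessor. The names (`VersionRecord`, `lineage`) and the `pid:v1` style identifiers are assumptions made up for this illustration; the quoted text does not specify a data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRecord:
    pid: str                    # hypothetical PID of this version
    predecessor: Optional[str]  # PID of the previous version, None for the first


def lineage(pid: str, records: dict[str, VersionRecord]) -> list[str]:
    """Walk predecessor links from `pid` back to the first version."""
    chain: list[str] = []
    current: Optional[str] = pid
    while current is not None:
        chain.append(current)
        current = records[current].predecessor
    return chain


records = {
    "pid:v1": VersionRecord("pid:v1", None),
    "pid:v2": VersionRecord("pid:v2", "pid:v1"),
    "pid:v3": VersionRecord("pid:v3", "pid:v2"),
}

print(lineage("pid:v3", records))  # ['pid:v3', 'pid:v2', 'pid:v1']
```

No individual PID changes under this scheme; the version history lives entirely in the links between records.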
    >
    >> Before you kill me – one thing I would provide in this sense is a flag “static” or “non static” which is assigned to the collection as a whole.
    >
    > Fully agree. Different models of collections which are configured with flags like this, be it explicit or implicit, are a major part of my vision for the output of this WG.
    I also think that this can be one fundamental outcome.
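
As a rough illustration of the flag idea, the following Python sketch attaches a `static` flag to the collection as a whole and rejects membership changes when it is set. The `Collection` class, its field names, and the identifiers are hypothetical; this is only one way such a flag could be enforced, assuming the collection service controls all writes.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    pid: str
    static: bool  # collection-level flag: True means membership is fixed
    members: list[str] = field(default_factory=list)

    def add(self, member_pid: str) -> None:
        """Append a member, unless the collection is flagged static."""
        if self.static:
            raise ValueError(f"collection {self.pid} is static; membership is fixed")
        self.members.append(member_pid)


# A growing sensor series: non-static, so new days can be appended.
sensor = Collection(pid="pid:sensor-series", static=False)
sensor.add("pid:day-1990-01-01")
sensor.add("pid:day-1990-01-02")

# A frozen snapshot of the same members: static, so additions are rejected.
snapshot = Collection(pid="pid:snapshot", static=True, members=list(sensor.members))
try:
    snapshot.add("pid:day-1990-01-03")
except ValueError as err:
    print(err)
```

Note that the flag only governs membership; it makes no claim about whether the member objects themselves are mutable.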
    >
    > If any of our differences of opinion are due to mistakes on my part, please correct me (and apologies for any frustration). Otherwise I’d suggest we gather these differences and figure out if and how we can fit them into a unified model. Obviously I am open to reducing said model to whatever degree is necessary to fit it inside the WG time constraints, but at the same time I’d like to achieve a satisfying measure of coverage and utility.
    >
    Well phrased. No one should be afraid of getting bashed for questioning
    (or bashing) the principles we work on. That’s what we are here for. 🙂
    Best, Tobias
