Minutes from yesterday's call

01 Jun 2016

Dear all,
here are the merged minutes from yesterday's group call. The next call
will take place in 2 weeks at the usual timeslot: Tuesday June 14, 13:00
GMT.
Best, Tobias
Attendees: Frederik, Ulrich, Christopher, Tobias
Notes:
* Formal definitions - set theory:
o potentially use abstract data types (ADTs) as an intermediate step
between model and implementation
o look into the Haskell docs and use them to arrive at a small core
definition for sorted/unsorted and multi-membership/unique
* Models are essentially a set of attributes and rules about them
(plus operations)
o Traits are then useful because they group things together and
make it simpler to explain what a certain formal concept (like
sorting) implies in terms of practically useful properties and
methods
o The clone operation (i.e. cache a snapshot of elements in
recursive collections) is useful because it may solve the
complexity issues that arise once we have recursion
o for citation use case: the copy/clone method (i.e. making
snapshots) may solve the issue that collections whose deep
membership changes may not remain citable
+ citation use case is present in many communities, and we
need to cover their differences. Snapshotting may not be
possible for everyone.
+ but we may figure out a way to compress the snapshotting
process so we can reconstruct a snapshot on request.
o for very dynamic data we may want to state that there are
policies that must be observed (basic versioning)
+ Christopher Harrison: use case with high volume and lots of
change every day - not possible to snapshot it statically -
we need a use case (UC) description to see where the gaps are
wrt the formal model we end up with
* Collection API - what about a member API? Such an API would answer
e.g. "which collection(s) does this object belong to?" - different
from scope of current swagger API
o Might be solvable through PIT + registered property, but not
sure whether this is the only action of that API
o Close to a global search, so costly/difficult - might not be
implementable by all use cases, but perhaps relevant across
multiple RDA use cases
o Ulrich: nice if every member of a collection has a pinned PID to
its parent; but only feasible for static collections inside a
repository
+ across repositories, you will need a crawler that then
creates a graph; the query then changes to "which
subcollection(s) of a given (entry point) collection does
this object belong to?", which leads to a two-parameter
function SubcollectionsMemberBelongsTo(member,collection) -
we can include this in the hierarchy trait
+ actually, the pointer to parents is a collection in itself
o Christopher: Subscription model could help with dynamic data as well
o Frederik: RSS or blog pingback might provide useful ideas and
infrastructure
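As an illustration of the two-parameter query noted above, here is a minimal Python sketch. It assumes a crawler has already produced a graph that maps each collection to its direct members. The GRAPH data, the identifiers in it, and the snake_case function name are hypothetical and not part of any agreed API.

```python
from collections import deque

# Hypothetical crawled graph: collection id -> ids of its direct members.
# A member id that also appears as a key is itself a (sub)collection.
GRAPH = {
    "root": ["climate", "ocean"],
    "climate": ["run-1", "obs-7"],
    "ocean": ["obs-7", "deep"],
    "deep": ["obs-9"],
    "other": ["obs-7"],
}


def subcollections_member_belongs_to(member, collection, graph=GRAPH):
    """Return the collections reachable from `collection` (itself included)
    that list `member` as a direct member."""
    found = []
    seen = {collection}
    queue = deque([collection])
    while queue:
        current = queue.popleft()
        children = graph.get(current, [])
        if member in children:
            found.append(current)
        for child in children:
            # Only descend into ids that are collections and not yet visited,
            # so recursive (cyclic) collections still terminate.
            if child in graph and child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(found)
```

With this sample data, subcollections_member_belongs_to("obs-7", "root") returns ['climate', 'ocean']; the collection "other" also contains "obs-7" but is not reachable from the entry point, so it is not reported.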
--
Tobias Weigel
Abteilung Datenmanagement
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • 20146 Hamburg • Germany
Phone: +49 40 460094-104
Email: ***@***.***
URL: http://www.dkrz.de
ORCID: orcid.org/0000-0002-4040-0215
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784

    Author: Thomas Zastrow

    Date: 03 Jun, 2016

    Dear all,
    ... as I wasn't able to participate in the last videomeeting and because
    we want some discussion on the mailing list ... and weekend is coming ...
    I have the feeling we try to say too much about the individual items
    inside a collection. From my perspective, *anything* which has an address
    can be an item inside of a collection. But that means it is difficult to
    say anything about the item itself from the collection's perspective. And
    it is not necessary: there is the PIT API or similar approaches.
    a)
    It is not necessary that an item is capable of storing information about
    its parents - and the creator of a collection may not even have write
    permissions on the items he/she is adding to it. I also don't know
    any programming language where you can ask an object: "Tell me which
    collections you belong to". Even in the closed namespace / memory area
    of an application this is not realized, so how should it work on a higher
    / global level?
    b)
    I'm not sure if we really should care about a differentiation between
    dynamic / static collections wrt the items a collection contains. A
    collection itself can be dynamic or static - I agree. But we can't say
    *anything* about the items inside of a collection. Maybe there are items
    which are defined as a rule like "Give me the last measurement". I don't
    see how such an item could tell the collection "Today I'm giving
    back other results than yesterday". It could also be that such an item
    doesn't exist anymore - we don't have a "wayback" channel and yes, I'm
    not talking only about PIDs here. If static just means that the
    collection cannot be changed anymore - that's fine.
    c)
    I'm not sure I understood the "trait" thing (I have never heard of
    anything like it before): is it about collecting properties/functions
    into a functional group? Do we really need this level of abstraction?
    Best,
    Tom

    Author: Tobias Weigel

    Date: 06 Jun, 2016

    Hello Tom,
    I concur that we may allow anything that has an address to be inside a
    collection, however, there may be an important side condition: That
    either the agent adding the item to a collection or the one responsible
    for managing the collection can make a realistic claim about the item's
    current life cycle state and expected development. There is not a clear
    distinction here, which is probably the cause for many of our problems.
    Do we want to be totally arbitrary regarding the items? Or should the
    benefit of using a collection API rather be that you *can* assume that
    some essential information about item status will be available? I don't
    think we have a clear take on this yet.
    On your item a) - parts knowing what they are part of - the best examples
    of such cases are trees, which you would otherwise be unable to traverse.
    On b) - I think you are right and there may be a line here that we do
    not want to cross wrt the "backchannel" from item to collection you
    explained. A collection should specify whether its constituency is
    dynamic or static, but it is probably too difficult to answer this by
    redirecting to individual items and leaving the answer up to them.
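
A collection that states its own constituency, as described above, could be sketched roughly as follows. This is only an illustration under assumptions: the Collection class, its fields, and the snapshot method are hypothetical names, and the sketch copies direct membership only, so it does not address deep membership in recursive collections.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection that declares its own constituency."""
    pid: str
    static: bool = False
    members: list = field(default_factory=list)

    def add(self, item_pid):
        # The collection answers the static/dynamic question itself;
        # no item is ever asked about its own state.
        if self.static:
            raise ValueError(f"{self.pid} is static; membership is frozen")
        self.members.append(item_pid)

    def snapshot(self, snapshot_pid):
        # Copy the current direct membership into a new, static collection.
        return Collection(snapshot_pid, static=True, members=list(self.members))
```

A dynamic collection can keep changing after a snapshot is taken, while the snapshot keeps the membership it had at that moment and rejects further additions.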
    On the traits: It is in principle as you describe, gathering properties
    and methods into flexible "chunks" that can be recombined and their
    recombination may give rise to other special methods. I like the model
    because it at least circumvents some of the issues with multiple
    inheritance. I am currently sticking with it because it is very
    flexible, but I am also not sure if this will be reflected in the API at
    the end. Traits-based programming [1] is probably not the most
    accessible paradigm and I'm not completely sure if this is the right
    description for what's currently in the document.
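
To make the idea of recombinable "chunks" a little more concrete, here is a small sketch. Python has no native traits, so it approximates them with mixin classes and cooperative super() calls; unlike true traits, the result depends on the order in which the mixins are listed. All class and method names are hypothetical and are not taken from the working document.

```python
class SortedTrait:
    """Members are kept in sorted order; contributes a first() method."""

    def add(self, item):
        super().add(item)
        self.items.sort()

    def first(self):
        return self.items[0]


class UniqueTrait:
    """Each item may appear at most once (no multi-membership)."""

    def add(self, item):
        if item not in self.items:
            super().add(item)


class BaseCollection:
    """Plain unsorted collection that allows duplicates."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


class SortedUniqueCollection(SortedTrait, UniqueTrait, BaseCollection):
    """Recombination of two traits on top of the base collection."""
```

Adding 3, 1, 3, 2 to a SortedUniqueCollection leaves items as [1, 2, 3], and first() is only available because the sorted chunk is part of the combination.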
    Best, Tobias
    [1] https://en.wikipedia.org/wiki/Trait_%28computer_programming%29
    -------- Original Message --------
    Subject: Re: [rda-collection-wg] Minutes from yesterday's call
    From: ThomasZastrow
    <***@***.***>
    To: TobiasWeigel <***@***.***>, RDA Collections WG
    <***@***.***-groups.org>
    Date: 03 Jun 2016, 15:42

    Author: Bridget Almas

    Date: 06 Jun, 2016

    On 06/06/2016 05:20 AM, TobiasWeigel wrote:
    > Hello Tom,
    >
    > I concur that we may allow anything that has an address to be inside a
    > collection, however, there may be an important side condition: That
    > either the agent adding the item to a collection or the one
    > responsible for managing the collection can make a realistic claim
    > about the item's current life cycle state and expected development.
    > There is not a clear distinction here, which is probably the cause for
    > many of our problems. Do we want to be totally arbitrary regarding the
    > items? Or should the benefit of using a collection API rather be that
    > you *can* assume that some essential information about item status
    > will be available? I don't think we have a clear take on this yet.
    >> The latter is what I have been assuming, and without it it would be
    hard for me to justify the value of the collections API to our use cases.
    >
    > On your item a) - parts known what they are part of - the best example
    > for such cases are trees, where you would be unable to traverse
    > otherwise.
    >
    > On b) - I think you are right and there may be a line here that we do
    > not want to cross wrt the "backchannel" from item to collection you
    > explained. A collection should specify whether its constituency is
    > dynamic or static, but it is probably too difficult to answer this by
    > redirecting to individual items and leave the answer up to them.
    >
    > On the traits: It is in principle as you describe, gathering
    > properties and methods into flexible "chunks" that can be recombined
    > and their recombination may give rise to other special methods. I like
    > the model because it at least circumvents some of the issues with
    > multiple inheritance. I am currently sticking with it because it is
    > very flexible, but I am also not sure if this will be reflected in the
    > API at the end. Traits-based programming [1] is probably not the most
    > accessible paradigm and I'm not completely sure if this is the right
    > description for what's currently in the document.
    >
    >> I think the traits are what I had previously been thinking of as

    Author: Tobias Weigel

    Date: 06 Jun, 2016

    Hello Bridget,
    well said - the ability to retrieve essential item status and membership
    information through the collection API is a deciding feature for many
    use case providers, including yours and also ours, actually.
    Best, Tobias
    -------- Original Message --------
    Subject: Re: [rda-collection-wg] Minutes from yesterday's call
    From: balmas <***@***.***>
    To: ***@***.***-groups.org
    Date: 06 Jun 2016, 15:45
