PID Kernel Information WG: Aug 17 meeting

Kernel Information WG VC Aug 17

General topics on the call:

  • Structural metadata discussion

  • Ulrich’s e-mail on decision-making

  • Performance questions

  • etc.



  • Structural metadata as used in the library and database communities

    • physical info *about* an object; pages, chapters, size, etc. - minimally describing the structure

    • library community: clear distinction between structural vs. descriptive metadata

      • descriptive metadata has no bounding - we don't want that

    • Q: if we limit KI to structural MD, can we enable the use cases? Beth thinks so - see assessment at end of e-mail. Particular point for RDA: relation to a registered service (via the DTR), e.g. a reader service

      • Experience from past projects is that good value can be delivered based on structural MD alone

      • this goes beyond the 'condition & yes/no decision' scope put forth by Ulrich

      • this is the later process

    • The overhead in duplicate storage has to be worth it

    • Important: In this group everyone knows that we are not focused on the descriptive MD - HS is not a discovery service - but that’s not obvious for others. Will be important later on when designing recs.

    • Are there defined structural MD schemas other than METS?

      • No... ask Alex/IDigBio again?

    • so: what do we do?

    • Ulrich: metadata in his e-mail was also driven by discovery intent

      • is structural metadata really that well bounded?

    • Beth: structural MD is *at* the object - not *inside* it. Characteristics of the objects and not their content.

      • Mark: Take provenance as an example - it is not at content level, but is it really at object level?

    • Maggie: Need for automated workflows to be able to make decisions about DOs based on a *minimum* number of calls (to PID registries and then following resource locators to find e.g. landing pages or other metadata sources for the DO) -> desire to have as many of the “important” fields as possible available at kernel level...

    • TW: process view on decision-making, selection, then taking action

      • Need to understand those decisions first

      • Beth: Can we/some third agent trust those decisions? That’s part of the decision process - it’s one use case. Then take further action.

    • Ulrich: danger of restricting the KI too much within the RDA common profile; might not be useful for communities. Instead: in addition to a precise KI set, provide hints on how to extend it

      • Beth: in the absence of agreement, we need to take several options into experimentation. One benchmark is that we might easily blow up the HS.

        • Two limits on drawing the scope really broadly: the limitations of the HS and the diluted benefit of a common profile if it gets too large.

      • Performance/efficiency: Client has to be in the picture because it actually executes the decision making.

        • This also goes as far as covering the content of the DTR and the multitude (and possible explosion) of PIDs in there...

      • Blockchains: There should be PIDs in there, not objects/data.

      • Decision making layer - where is it actually located?

        • On top of HS (server-side) - not the HS itself, it just stores PID records

          • Not entirely correct: HS REST API bulk operations - upload a script; but only focused on single HS

        • At client

        • At a cache in between - harvested from HS; but has limitations wrt real-time data / time series (e.g. seismology)

    • Do we want to have a meta-framework around KI profiles that incorporates all of the facets we’ve talked about so far?

      • Particularly: decision-making process, complexity, structural MD, trust questions, ...
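
Maggie's point about deciding on a minimum number of calls, combined with the "at client" option for the decision layer, can be sketched roughly as follows. This is only an illustration: the record layout, the field names (`format`, `size_bytes`, `checksum`) and the `decide` helper are hypothetical, not an agreed KI profile and not a Handle System API. The record stands in for what a single PID resolution might return.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical kernel information record, as it might come back from ONE
# PID resolution. Field names are illustrative, not an agreed profile.
KernelRecord = Dict[str, object]


@dataclass
class Condition:
    """A single yes/no test on one kernel information field."""
    field: str
    test: Callable[[object], bool]


def decide(record: KernelRecord, conditions: List[Condition]) -> bool:
    """Return True only if every condition holds on the record itself.

    A missing field counts as "no": the client would otherwise need
    further calls (landing page, external metadata) to find out.
    """
    for cond in conditions:
        if cond.field not in record:
            return False
        if not cond.test(record[cond.field]):
            return False
    return True


# Assumed example values, for illustration only.
record: KernelRecord = {
    "format": "application/x-netcdf",
    "size_bytes": 52_428_800,
    "checksum": "md5:0123456789abcdef",
}

conditions = [
    Condition("format", lambda v: v == "application/x-netcdf"),
    Condition("size_bytes", lambda v: isinstance(v, int) and v < 10**9),
]

print(decide(record, conditions))
```

The design choice shown here is that the decision uses only what is in the record; anything not present at kernel level forces either a "no" or an extra round trip, which is the trade-off discussed above.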


Next call we should start with P10 agenda discussion

  • Next time: use GoToMeeting. Mark: request an account from the RDA secretariat.

Beth’s e-mail

I'd like to advocate that the decision criterion for inclusion / exclusion
of kernel information be based on whether the information is structural
metadata.  If a field is structural in nature, it qualifies for
inclusion. If not structural, it does not.  I'd like to discuss this
tomorrow if agenda permits.

A definition of structural metadata that I think is most appropriate
comes out of the digital library community*:  

In digital library community usage, structural metadata describes the
intellectual or physical elements of a digital object. For a file that
represents a single page as a compound document (e.g., a JPEG 2000 jpm
file), the structural metadata may include information on page layout.
In a multi-file digital object (e.g., a scanned book with many page
images), structural metadata describes the object's components and their
relationships: pages, chapters, tables of contents, index, etc. Such
metadata can support sophisticated search and retrieval actions as well
as the navigation and presentation of digital objects. METS offers one
model for the encoding of structural metadata.

In some engineering and related technical contexts, the term structural
metadata refers to physical or technical information about a digital
object, such as file format, size, media, etc.


What can be done with structural metadata alone?   Structural
metadata alone supports:

-- linking related objects,
-- verifying quality,
-- making optimization decisions (based on size),
-- associating actionable entities with a data object (such as registering
a reader in DTR for a file format or media),
-- allows information about object granularity to

This, I would argue, is a strong set of functionality.   Additionally,
structural metadata can be bounded far more easily than can trying to
draw a fuzzy line around and through descriptive metadata.
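
The inclusion criterion proposed in the e-mail can be read as a simple filter over candidate fields. The sketch below assumes a hypothetical list of candidate fields, each hand-labelled as structural or descriptive; the field names and labels are illustrative only and were not agreed by the group.

```python
from typing import Dict, List, NamedTuple


class CandidateField(NamedTuple):
    name: str
    kind: str  # "structural" or "descriptive" (labels assigned by hand)


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    CandidateField("file_format", "structural"),
    CandidateField("size_bytes", "structural"),
    CandidateField("part_of", "structural"),
    CandidateField("title", "descriptive"),
    CandidateField("keywords", "descriptive"),
]


def kernel_fields(candidates: List[CandidateField]) -> List[str]:
    """Apply the criterion: structural fields qualify, all others do not."""
    return [c.name for c in candidates if c.kind == "structural"]


def project(metadata: Dict[str, object], allowed: List[str]) -> Dict[str, object]:
    """Keep only the fields of a full metadata record that qualify."""
    return {k: v for k, v in metadata.items() if k in allowed}


full_record = {
    "file_format": "image/jp2",
    "size_bytes": 1_048_576,
    "title": "Scanned page 12",
}

allowed = kernel_fields(CANDIDATES)
print(allowed)
print(project(full_record, allowed))
```

The hard part, as the discussion notes, is the labelling itself (for example, whether provenance counts as structural), which this sketch simply takes as given.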