Bridget Almas, Dave Dubin, Anna Krohn, Isuru Suriarachchi, Jason Jackson, Kim Fortun, Julia Collins, Gary Berg-Cross, Raphael Ritz
* share progress on Anna and Dave's work on development of provenance terminology and discuss approaches to contributing this to the DFT working group's registry deliverable.
* understand interests of the DPHE group wrt provenance issues and explore possibilities for cross-group collaboration
Discussion Notes/Action Items:
Dave gave an overview of the in-progress work on collecting core terminology on data provenance. He demonstrated the use of SKOS concepts as linked data and described plans to mint PURL identifiers as persistent URIs for the terminology.
Raphael confirmed the DFT group's interest in ingesting the prov terminology into the registry and identified a few issues that need to be worked out wrt use of SKOS concepts (which the DFT registry isn't currently using) and handling of relationships.
Gary also responded following the meeting with additional notes and questions via email, copied below:
"As I wrote on the chat DFT would be interested in several things such as a definition of Information Content that Dave mentioned along with the use cases which we might use to focus some key provenance terms.
Is it correct to think of your core terms as largely within the Metadata concept?
If so are there active connections with some of the metadata groups to work on some of this?
Another thing that came up was the connection to data stored in libraries that give context for the more dynamic forms of "research data" flowing from sensors and social media. The DFT group has only some small connection to the vocabularies here such as in archiving efforts, but if there are people out there that are interested in this as a vital part of RDA we would welcome input."
Action Item: Dave will look into options for integrating the prov linked data set into the DFT Semantic MediaWiki solution. Raphael and Gary will also give this some thought, and discussions will continue over email leading up to a face-to-face discussion at P4.
Jason, Julia and Kim gave an overview of the work and interests of the DPHE group. Jason discussed the development of Open Folklore's Ethnographic Thesaurus and Local Contexts licenses for sharing ethnographic data. He raised the need for the domain to gain an understanding of how to take general metadata frameworks like Dublin Core, PROV, etc., and make them work in an ethnographic context. Julia reiterated a need to explore ways for the application frameworks they use to record data (e.g. Nunaliit) to capture provenance information as well. Kim discussed a near-term project initiative of the DPHE group to set forth best practices for storing and sharing researcher-created ethnographic data such as field notes, i.e. data that is traditionally kept in non-digital forms. Key questions pertain to defining the citation structure and metadata requirements. The heterogeneity of the ethnographic data presents some interesting challenges.
Dave suggested that one way to start would be for the DPHE group to provide a natural language narrative of case studies and scenarios of interest, and from there we can start to construct a model of what the provenance data requirements might be. Bridget offered to provide an example use case from the linguistic domain (posted to the wiki at https://rd-alliance.org/group/research-data-provenance/wiki/use-case-provenance-linguistic-annotations.html ). She also suggested taking a look at the ANDS Prov Use Cases tool for other examples.
Jason followed up via email with the following information:
"The following applied to the museum object corner of our space. It will be harder work to do the key task of making explicit implicit practices in other corners of our realm (field photography, found documents, etc.).
In an ethnographic museum case, what I will describe now could have been true for the other museums where I have worked. It comes in actuality out of work currently underway at the MMWC.
Like (all/) most museums, the MMWC has an idiosyncratic implementation of “customary” cataloging and accessioning practices. Its current catalog system is built upon older iterations of a once new system. That new system was inspired by unknown older museums, but it has its own quirks. These are usually the product of the original insights of some founding curator and some preconceived notions of what the collection is or will be like in the future and of the needs of future users. Many middle-aged museum systems have smart numbers and other elements designed to encode extra metadata into simple elements. These are almost all despised by those who inherited them. An example from the Gilcrease Museum where I once worked is a prefix in the catalog number that told you whether an object came from east or west of the Mississippi River. Which turns out to be wonderful when cataloging objects from, say, Brazil.
At MMWC our idiosyncratic system is not too bad as these things go. Our work over the last few years has involved mapping (building crosswalks) our unique fields into Dublin Core fields. The first place where we have needed to accomplish this task is in our use of Omeka to build digital exhibitions on the basis of our collections. In this example, we have to export data out of our collections database (Filemaker files to Comma Separated Value Files into Omeka import).
When we do this, because of the public nature of the target outcome (an object in our Omeka site), some information is purposefully filtered out en route from our in-house database to the public content space. A clear example of this would be insurance value or some other information of back house relevance that we are not wanting to share widely. Another kind of transformation is that we might filter out information about which we do not have sufficient confidence to share, a speculative attribution for instance. Our database may record that someone “thinks” a pottery bowl is from Uganda or Santa Clara pueblo on the basis of style or some other curatorial skills-based analysis, but we might not want to commit to that publicly. We might not export a whole category of data for this reason, we might massage it on a single object basis, or we might apply a more general description across a group of objects. (If we were confident about Africa universally but not confident about Country, County, City, etc. of origin, we might attend to this as the data moves.)
This trial and error work is easier in the Omeka case because it is done on a small, exhibition level basis and it does not affect us wholesale. It is teaching us things that we will probably go back and deal with systemically in our underlying collections database as a whole. As when Dublin Core Extended provides for a field we now know we need but that was absent in our ancestral database scheme (DC: "Rights Holder", for instance).
(Something to know about how Omeka works with Dublin Core Extended: it does not present empty fields. This example object http://dlib.indiana.edu/omeka/mathers/items/show/288 shows a small amount of metadata because we did not bring much over. We are not using Omeka for our catalog, just as a means of doing exhibitions. The content in this case is more like an exhibition label and less like a full database item.)
We have not gone there yet, but our Omeka site should be harvestable by Open Folklore and cognate efforts."
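As a starting point for modelling the provenance requirements, the export path Jason describes (in-house fields crosswalked to Dublin Core, back-of-house fields dropped, speculative attributions generalized, empty fields omitted) can be sketched as below. This is a rough illustration under stated assumptions, not the MMWC's actual process: every in-house field name, the `OriginConfidence` and `OriginRegion` columns, and the choice of Dublin Core targets other than "Rights Holder" are hypothetical.

```python
import csv
import io

# Hypothetical in-house field names mapped to Dublin Core elements.
CROSSWALK = {
    "ObjectName": "Title",
    "Maker": "Creator",
    "Origin": "Coverage",
    "CreditLine": "Rights Holder",
}

# Hypothetical back-of-house fields that are never exported.
PRIVATE_FIELDS = {"InsuranceValue"}


def to_public_record(row):
    """Map one in-house record to Dublin Core, dropping private and empty fields."""
    public = {}
    for field, value in row.items():
        if field in PRIVATE_FIELDS or field not in CROSSWALK or not value:
            continue
        public[CROSSWALK[field]] = value
    # A speculative attribution is replaced by a more general region,
    # or removed entirely if no confident region is recorded.
    if row.get("OriginConfidence") == "speculative" and "Coverage" in public:
        region = row.get("OriginRegion") or ""
        if region:
            public["Coverage"] = region
        else:
            del public["Coverage"]
    return public


def export_csv(rows):
    """Write the public records as CSV text with one column per mapped element."""
    columns = list(dict.fromkeys(CROSSWALK.values()))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(to_public_record(row))
    return out.getvalue()


if __name__ == "__main__":
    bowl = {
        "ObjectName": "Pottery bowl",
        "Maker": "",
        "Origin": "Uganda",
        "OriginRegion": "Africa",
        "OriginConfidence": "speculative",
        "InsuranceValue": "1200",
        "CreditLine": "MMWC",
    }
    print(export_csv([bowl]))
```

Each branch in `to_public_record` is a point where the public record diverges from the source record, which is the kind of transformation a provenance model for this use case would need to capture.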