Here are the notes from the Prov IG session at P6 (with thanks due to early career fellow Hien Truong). Will follow up shortly with a poll to schedule our next webconference.
See also the Session Agenda.
Mireille Louys and Mathieu Servillat put forth for discussion the topic of prototyping an astronomical data provenance model for the International Virtual Observatory Alliance (IVOA). (See slides). They were particularly interested in getting feedback from the group on other experiences of incorporating the W3C PROV ontology into a domain-specific model.
Subsequent discussion points and questions:
- The VO model took its design pattern from W3C PROV but didn't implement all of it. The main model classes used were Activity, Entity, and Agent.
- VO is using PROV-N notation because it allows you to trace the execution scenario in simple text and W3C provides converters to standard formats (JSON, XML, etc.)
- Q: Does IVOA have requirements for the provenance query format?
- A: Human-readable input.
- Q: Identifiers of data objects?
- A: RunID is kept all the time, and it relates to other data objects in the run.
- Q: Is PROV-N suitable for machine processing?
- A: Use the converters to enable that.
- Q: Is PROV data included in the header of the FITS files?
- A: IVOA hasn't decided yet; it will depend upon the serialization format desired. They might use IDs to bind prov metadata to data, as outlined in the PROV-AQ recommendations.
- Q: relevance of trust, and history of astronomy domain?
- A: In this context, trust issues mostly revolve around the quality of the data, as can be determined by the completeness and level of granularity of the provenance data (not so much about whether the data providers themselves are trusted).
- The primary use case here is one of reuse, which requires very fine-grained information about the data set, some of which might already have been filtered out depending upon who the original audience was. The problem is one of defining "fitness for use" clearly. It's important that each individual object in a data set be "trustable" in terms of its quality.
- Q: How to deal with very complex methodologies and incorporate that level of detail into the provenance chain? The geo community has data sets that include physical as well as virtual data.
- Q: Is capturing who contributed data difficult?
- A: Capturing the "who" of the provenance information is well tackled in astronomy, at least for the human contributors. The PI and contact person are always tightly bound to the data.
- Q: Is the current VO implementation of "Agent" too narrow? Maybe it needs to take into account the possibility of machines or institutions, for example, as agents.
- The VO team needs guidance on how to leverage PID definitions to hook the Prov metadata to the data. They will write these questions up and send them to the group.
- Slides are posted on the group RDA website; attendees are encouraged to post questions/answers/discussion points to the list.
- Group members also asked to consider whether any would like to step up to help co-chair the group.
- Group members encouraged to add information to our RDA site, participate in between plenary Web conferences, suggest topics for future work.
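
For readers unfamiliar with PROV-N, the following is a minimal sketch of what the plain-text notation discussed above might look like for the three main classes (Entity, Activity, Agent). It is not the VO team's model: the `ex` prefix, the `http://example.org/` namespace, the `ex:runId` attribute, and all object names are assumptions made for illustration, and the text is assembled with plain Python strings rather than any PROV library. The two agent types shown (`prov:Person`, `prov:Organization`) are defined by W3C PROV and relate to the question of whether institutions can be agents.

```python
def provn_document(run_id: str) -> str:
    """Build a tiny PROV-N document as plain text (illustrative only)."""
    ns = "ex"  # hypothetical namespace prefix, not part of any VO model
    lines = [
        "document",
        f"  prefix {ns} <http://example.org/>",
        # Entities: data objects, tagged with the run they belong to
        f'  entity({ns}:raw_image, [{ns}:runId="{run_id}"])',
        f'  entity({ns}:calibrated_image, [{ns}:runId="{run_id}"])',
        # Activity: the processing step that links the two entities
        f"  activity({ns}:calibration)",
        # Agents: a person and an institution
        f"  agent({ns}:pi, [prov:type='prov:Person'])",
        f"  agent({ns}:observatory, [prov:type='prov:Organization'])",
        # Relations; '-' marks an optional field that is left unspecified
        f"  used({ns}:calibration, {ns}:raw_image, -)",
        f"  wasGeneratedBy({ns}:calibrated_image, {ns}:calibration, -)",
        f"  wasAssociatedWith({ns}:calibration, {ns}:pi, -)",
        f"  wasAttributedTo({ns}:calibrated_image, {ns}:observatory)",
        "endDocument",
    ]
    return "\n".join(lines)


print(provn_document("run-42"))
```

A text document of this kind could then be passed to one of the converters mentioned above to obtain JSON or XML for machine processing.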