RE: [Biodiversity Community] Comments re yesterday's meeting

24 Sep 2015

Dear Donat,
First of all, I presume that your intention was to send this email to the RDA Biodiversity Data Integration IG and not the Biodiversity Informatics mailing list. In any case, I am copying my reply to the RDA group's mailing list.
As per the agenda circulated a few weeks ago, the IG addressed issues related to the mobilisation of biological collection data and especially their potential use in climate change research (the theme of the RDA P6 plenary). Digitisation of collections is a major undertaking by a significant number of museums around the world. The breadth and volume of data produced by such digitisation workflows are substantial. As the ongoing digitisation programmes are running in parallel, it is pivotal to support the museums in coordinating their actions and producing interoperable, openly accessible, machine-readable datasets. It is also critical that we do this in the context of how these datasets can be re-used and effectively linked to other datasets.
For the museums, however, the above comes with a series of social and technical challenges. Only by addressing these challenges will we be able to support the collection holders in moving towards a common goal. So, I believe that RDA, and the BDI IG in particular, is a very good place to act on this. In fact, museums can be ideal adopters of RDA outputs. As this was the first time we addressed this big issue in the context of RDA, it was also important to describe these challenges. This was the premise behind the BDI IG agenda at P6. Please also let me disagree with the way you interpret Mark's comments to the session. Our discussion highlighted the fact that we also need to aim at providing incentives at the institutional level and not only at the researcher level. This is something that RDA has not put enough emphasis on up to now. Mark's question about whether these challenges for bio-collections digitisation also apply to other collections in fact shows his interest in this topic rather than anything else.
Regarding the Names Services process: this is an ongoing activity that started in Amsterdam and continues as an initiative to form an RDA WG. This can continue in parallel with the activities of the BDI IG. Interest Groups have by default a wider view on issues and should be more inclusive, touching upon different topics in alignment with their mission statement. So, the fact that the BDI IG session focused on the urgent issues of collection data mobilisation does not mean that we have dropped our efforts on the Names Services.
I will follow up with an email to the members of the under-development WG team with an update on the progress and a proposal.
Thank you for your comments, and I hope the above clarifies things a bit.
Kind regards,
From: Roderic Page [***@***.***]
Sent: 24 September 2015 17:16
To: Donat Agosti
Cc: Dimitris Koureas; ***@***.***
Subject: Re: [Biodiversity Community] Comments re yesterday's meeting
Hi Donat,
I’m curious about the context for this email, especially:
A concern is that we have a very limited resource and time budget so we need to focus on something that we can deliver. Still, names, or a cleverly chosen subset of them, might be a choice – but see above.
Also, consider potential adopters. Of names, we might be able to leverage support of up to 30% of the annual output of taxonomic work.
Who is providing a budget and a deadline? Where does the figure 30% of annual output of names come from?
On 24 Sep 2015, at 14:59, Donat Agosti <***@***.***> wrote:
Dear Dimitris
Please find attached my comments regarding yesterday’s biodivdatainterop meeting.
Biodiversity Data Interoperability IG
The focus of this meeting has been on digitized specimens, based on a set of lectures and a brief discussion.
In my view this was a step back for this IG. The presentations were reports from ongoing work, rather than lessons learned about what all these projects share and a proposal for where we should go (this then being the outcome of the BDIIG). Digital collections and how to get there are not something unique to the RDA world (there are other groups collecting physical specimens), and thus the critique by Mark Parsons (Secretary General RDA) was that what we talk about here is probably better discussed in a wider context. The meeting ended in the rather usual way, reviewing the state of the art rather than presenting a vision of how to actually solve digitization and handling of specimens.
The older focus on name infrastructure did not surface at all.
I personally still think that names are the only unique contribution we can make to RDA – but at the same time, I think this too is an issue that could be tackled in more general ways. Names in crystallography, molecules, agriculture, and even names of people, that is, named entities, might be a way forward. All these share the issue of getting identifiers, and the issue of links to a reference (scientific article) and to synonyms, acronyms, etc.
Though I put the idea forward focusing on names, I would be willing to reconsider.
Donald Hobern came to a similar conclusion, pushing for the adoption or development of an approach that would put our names into a bigger context.
The disadvantage, though, is that unless we have a substantial name service involved on our side that makes a commitment to developments, this is hard to organize. The reason, as I see it, is that somebody from our community needs to take the lead. This person needs to understand the subject, needs to have a vision, needs to be connected in the identifier world, and should have some executive power to implement the proposed system in their own service and then promote the use of the “chosen” identifiers in the community, including the development of what data should be returned. There also needs to be a commitment to define the links between names (syn., nov. comb., etc.).
An alternative might be to use Wikidata as an intermediate stage. This would allow a dump of all the existing names in COL, GBIF, IPNI and ZooBank into Wikidata, where they would immediately be a source for Wikimedia, could be edited by the community (which might involve a layered system of super editors from the respective taxonomic groups), and could be used and cited by users like GBIF. Plazi does this already, but it is an avenue that needs to be explored, not least because of the social issues involved.
A concern is that we have a very limited resource and time budget so we need to focus on something that we can deliver. Still, names, or a cleverly chosen subset of them, might be a choice – but see above.
Also, consider potential adopters. Of names, we might be able to leverage support of up to 30% of the annual output of taxonomic work.
    Since this has become a web discussion, I'll add my thoughts.

    Our involvement in RDA needs to focus on how we can contribute to, and benefit from, responses to the big cross-domain challenges in achieving the RDA vision ("researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society").  We have many venues in which to discuss our own domain challenges and do not need to revisit them in a diluted form here.  Working sessions on biodiversity data integration at RDA simply distract us from getting involved in the bigger picture issues.

    How we operate our own digitisation processes and how we handle the specific semantic challenges of our data is unnecessary detail here, except when it can be recognised as a pattern shared across many domains. Here are some of the areas where we could truly benefit from RDA:

    • How can a largely volunteer, globally-dispersed community successfully manage large-scale open-ended vocabularies important to its domain? This is where taxon names and concepts come in - we should abstract away from the specifics of biological taxonomy and understand how others are handling large-scale gazetteers, catalogues, etc., all of which share similar issues around authoritativeness, completeness, currency, stability and deduplication.
    • How can we build global systems which facilitate integration of all relevant data yet simultaneously support access restrictions of various kinds for more sensitive data elements (whether as part of a data embargo mechanism or, in our case, to protect endangered species or to exercise due diligence around notifiable pest data)?
    • What are the best patterns for handling versioning of datasets in repositories?
    • What mechanisms should we use to express data provenance in search results and can this information be incorporated into the citation trees for publications and data?
    • How do we model community trust in particular experts and associate this with access rights for manipulating data?
    • How should metadata be structured to maximise interpretability across domains?
    • How should we manage dispersed data annotations as enhancements and extensions to historical data?

    These are areas where we are all building our own solutions.  If we can find ways to work with other communities to develop best practices and common standards in these areas, then we can all benefit because there will be significant impetus for software components and tools to implement these standards.  Using those standards will lower our own implementation costs and maximise discoverability and use even in cross-domain analyses.

    We certainly have problems which matter to RDA but we should be dispersing ourselves to participate in the big-topic working groups to ensure that our use cases are understood and that we can contribute to the shared solution.


