Andreas Rauber
Forum Replies Created
-
AuthorReplies
-
Andreas Rauber (Member)
Dear all,
This is absolutely correct! In a nutshell, we need to differentiate
between two aspects, namely 1) **identifying** the precise
subset/collection of a – potentially changing – dataset, and 2) the
actual information making up a citation to that dataset.
For 1) this WG has come up with an answer, a single principle, that so
far seems to work across all types of data and solutions for
implementing a data repository, whereas for 2) some recommendations can
be made, but it will ultimately depend strongly on the domain, the type
of data and its use.
1) For the identification we again have two sub-challenges, namely a)
the evolution of data (new data being added, errors corrected, …); and
b) the identification of arbitrary subsets. 1a) is solved by versioning
all changes to data, whereas 1b) is addressed by resolving any subset
dynamically via a reproducible operation (referred to, in the
guidelines, as a query) that was executed at a certain timestamp, and
that needs to be stored and associated with a persistent identifier.
This could be the check-out from a Git repository at a certain point in
time, a “list-directory” command against a versioned file system, or an
SQL query against a temporal table. We have seen solutions using
slice/dice operators against a NetCDF file, but it could also – to use
the audio example referred to before – be a pointer to a specific offset
in an audio file (or, e.g., a 30-second segment starting at minute 1 for
all audio files with a 44 kHz sampling rate in the collection, as
sometimes used to be done for music retrieval experiments). This also
works across distributed repositories, as each only needs to keep the
queries it processed as well as the timestamps locally, without any need
to synchronize clocks. An aggregator would then simply store the
individual PIDs of the responses from a federated system.
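To make the identification principle concrete, here is a minimal sketch of
versioning plus a stored, timestamped query. It assumes an in-memory SQLite
table with validity intervals as the versioned store and plain integers as
logical timestamps. The table layout, the function names and the "pid:"
prefix are hypothetical placeholders for illustration only. They are not
part of the recommendations, and the "pid:" strings are not real
persistent identifiers.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER, value REAL, valid_from INTEGER, valid_to INTEGER);
CREATE TABLE query_store (pid TEXT PRIMARY KEY, query TEXT, executed_at INTEGER);
""")

def write(record_id, value, ts):
    """Versioned write: close the current version (if any), never overwrite it.
    Passing value=None models a deletion."""
    conn.execute(
        "UPDATE records SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (ts, record_id),
    )
    if value is not None:
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, NULL)", (record_id, value, ts)
        )

def run_as_of(query, ts):
    """Run a query against the state of the data as it was at timestamp ts."""
    return conn.execute(query, {"ts": ts}).fetchall()

def cite(query, ts):
    """Store the query and its timestamp under a placeholder identifier."""
    pid = "pid:" + hashlib.sha256(f"{query}|{ts}".encode()).hexdigest()[:12]
    conn.execute("INSERT OR IGNORE INTO query_store VALUES (?, ?, ?)", (pid, query, ts))
    return pid

def resolve(pid):
    """Re-execute the stored query at its stored timestamp."""
    query, ts = conn.execute(
        "SELECT query, executed_at FROM query_store WHERE pid = ?", (pid,)
    ).fetchone()
    return run_as_of(query, ts)

# Hypothetical subset definition: all values >= 10 that were valid at :ts.
SUBSET = (
    "SELECT id, value FROM records "
    "WHERE value >= 10 AND valid_from <= :ts "
    "AND (valid_to IS NULL OR valid_to > :ts) ORDER BY id"
)

write(1, 12.0, ts=1)
write(2, 7.0, ts=1)
pid = cite(SUBSET, ts=2)       # the subset is cited at logical time 2
write(2, 15.0, ts=3)           # later correction of record 2
write(3, 20.0, ts=4)           # later addition of record 3

cited = resolve(pid)           # [(1, 12.0)]
latest = run_as_of(SUBSET, 5)  # [(1, 12.0), (2, 15.0), (3, 20.0)]
```

Resolving the identifier returns the subset as it was when cited, while the
same query against the current state also picks up the correction and the
new record. Only the query text and the timestamp are stored, not a copy of
the data.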
2) Concerning the actual citation text, we may want to revisit that
topic to see how much more specific we can get while still making sure
that any recommendation works across possibly all types of data and
domains. So far, we have stayed with a very limited set of metadata,
borrowing from the analogy of citations to literature and recommending
the use of two identifiers: one for the (continuously evolving) data
source, and one for the specific subset extracted from it at a given
point in time. (Analogy: a specific, static paper identified e.g. via a
DOI, in a journal that is evolving, i.e. growing, with new editions being
added, and that is identified e.g. via an ISSN.) The creator of the
subset may be compared to the author of a paper, whereas the
owner/operator of the data source may be likened to the editor of
proceedings – but any such mapping will already differ quite a lot
across repositories and types of data, so it is not part of the general
recommendations, beyond the statement that each data center should
provide a recommended way of phrasing/expressing a citation. This may
well be worth picking up if we have a feeling that this core can be
extended, now that we have a better understanding of
the identification and resolving process.
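As an illustration of the two-identifier idea, here is a small sketch of a
citation record that carries both identifiers and renders one possible
citation text. The field names, the wording of the citation and all example
values are assumptions made up for this sketch. The recommendations leave
the actual phrasing to each data center.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubsetCitation:
    creator: str          # who extracted the subset (compare: author of a paper)
    source_operator: str  # who runs the data source (compare: editor of proceedings)
    source_title: str     # name of the continuously evolving data source
    source_pid: str       # identifier of the evolving data source
    subset_pid: str       # identifier of the subset as of executed_at
    executed_at: str      # query timestamp as an ISO 8601 string

    def text(self) -> str:
        """Render one hypothetical phrasing that contains both identifiers."""
        year = self.executed_at[:4]
        return (
            f'{self.creator} ({year}). Subset of "{self.source_title}", '
            f"{self.source_operator}. Subset: {self.subset_pid}. "
            f"Data source: {self.source_pid}."
        )

# Placeholder values only; none of these identifiers exist.
example = SubsetCitation(
    creator="A. Researcher",
    source_operator="Example Data Center",
    source_title="Example Sensor Archive",
    source_pid="pid:example-source",
    subset_pid="pid:example-subset",
    executed_at="2021-03-01T12:00:00Z",
)
citation_text = example.text()
```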
The pre-print version of the paper we’ve prepared on the recommendations,
as well as on reference implementations and deployed adoptions, could be
useful for reviewing these principles in different settings:
http://doi.org/10.5281/zenodo.4571616
best regards,
Andi -
Andreas Rauber (Member)
Hi Kheeran,
Thanks a lot for the pointer to SAIL and the UKSerP. This infrastructure is, indeed, conceptually very similar to ours – but several orders of magnitude larger. It very likely took a bit longer to set up: some places seem to have been a bit more forward-looking than others in setting up such infrastructures. We, unfortunately, had (and still have) nothing like that in place at this scale.
While setting up a solution like UKSerP was out of the question given the time and budget constraints, we luckily are also facing a somewhat simpler setting with the current project: rather than setting up a central system, we want to enable data owners to deploy the solution directly within their own environment, which should make it easier (both from a legal perspective and concerning trust) to make their data accessible. It also keeps the data where the domain expertise is available. This, however, also limits the interlinking of data, i.e. integrating data from different sources. But we hope that once trust has been gained in the individual systems operated by the data owners themselves, they will also be more willing to trust a third-party system providing the linkage of data while still keeping the data owners in control.
Thus, I wouldn’t dare to say in any way that we have managed to *solve* the social dynamics. On the contrary, as long as benefits are to be gained from having control over the data, and as long as handing over data to somebody else would risk reducing these benefits, we will see all kinds of activities happening to impede access to data. Our hope is that via a bottom-up process institutions can learn about the opportunities, assess the risks that each such infrastructure carries in spite of all protection mechanisms, understand the trade-offs and implications, and thus gain trust and see the benefits. It’s a rather organic process, allowing institutions to learn from examples and eventually try it themselves. In our case, these institutions also include industry as a key stakeholder: a lot of the data that was relevant in our discussions was commercial data (supply chain analysis, logistics data, etc.) held by companies that seem less eager to feed such sensitive data into centralized repositories. At least that’s our hope and vision in preparing these building blocks and set-up descriptions.
Andi -
Andreas Rauber (Member)
Dear all,
Just to let you know and to allow you to prepare for the upcoming plenary:
We have already submitted a request to hold a break-out session again at the upcoming plenary. Assuming that it will be accepted, we will again be able to share updates on a number of ongoing adoptions as well as discuss a few questions that came up in recent discussions and presentations.
If you think you may have an adoption story to share or have specific issues you would like to see discussed in relation to our recommendations, please let me know so that we can prepare the schedule accordingly.
Andi -
Andreas Rauber (Member)
This also looks good from my side – unless we see a really strong need for an additional body emerging, we should keep the structure as simple as possible. Informal ad-hoc meetings may serve the purpose perfectly well…
Andi -
Andreas Rauber (Member)
Hi Mark,
Very intrigued!! 🙂
It would be great to hear about both the easy experience and the challenging setting! (You know that I frequently find challenges more interesting than solutions 🙂)
I’ll plan a slot for you and get back to you with more details a bit later as the whole agenda evolves…
Thanks a lot!!
Andi -
Andreas Rauber (Member)
Hi Andrew,
sorry, I just noticed that I hadn’t sent out the details yet, so here
they are:
– Timing:
The meeting will run from Wed. noon until Fri. noon, i.e. 2 full days
spread over three. (Which shouldn’t keep you from arriving a day early
to settle in and enjoy Vienna – I am sure we can fill any free time you
might have somehow, be it with meetings or otherwise 🙂)
– Venue:
The meeting will be held in the main building of the Technical
University of Vienna, at Karlsplatz 13 (metro station Karlsplatz, metro
lines U1, U2, U4), so it is easily reachable from virtually all
directions. The meeting will take place in the “Festsaal”; detailed
location info is available in the attached PDF document. This is right
in the city center, about a 10-minute walk to the Opera, Musikverein,
Konzerthaus, Theater an der Wien, etc.
– Hotels:
There are a number of hotels in the vicinity where TU Wien has
negotiated preferential rates for events at TU Wien; see the attached
list. But basically any hotel that’s near a metro station (or
anywhere around the Karlsplatz area; Vienna is a pretty small,
walkable city) is fine.
– Arrival:
Vienna International Airport is approx. 20 minutes to the east of the
city. Getting downtown is usually pretty straightforward: either via the
Airport Express Train (CAT, 16 mins) or the slightly slower but cheaper
regular train S7 (20 mins), both of which will take you to the metro
stop Landstraße-Wien Mitte (U3, U4), or by an express train to the
central railway station Hauptbahnhof (17 mins, U1). There are also
several bus lines, as well as several pre-booked taxi services. The
biggest company at the airport is Airportdriver,
https://www.airportdriver.at/en
(fixed price of 33 EUR to any place within Vienna; pre-book at least
24 hours before arrival).
More detailed arrival info and links are available on the TU Wien
homepage at
https://www.tuwien.ac.at/en/contactsearch/visit_us_travelling_information/
If there is any additional information that any of you needs, please let
me know!
Looking forward to welcoming you in Vienna!
Andi -
Andreas Rauber (Member)
Hi Mark, Ahmed,
That’s great, thank you! We definitely want to devote most of the time
of the meeting again to presenting and discussing new pilots, lessons
learned from implementations and new issues identified. The Deep Carbon
Observatory pilot is definitely most welcome, specifically your
observations on versioning approaches.
This is also a call for any other groups who have implemented or are
starting to implement (part of) the recommendations and would like to
present their current status, questions, etc. at the upcoming plenary:
please let me know so that we can plan slots accordingly.
Andi -
Andreas Rauber (Member)
Dear Mark, all,
That’s brilliant news, congratulations!!
As you know all the details at least as well as, if not better than, I do, there might be no need for it, but just in case: if you have any questions, let me know, I’ll be happy to help!!
Andi -
Andreas Rauber (Member)
Dear all,
The webinar by Gianmaria Silvello from the University of Padova will
start in 20 minutes.
To join, connect to
https://attendee.gotowebinar.com/register/8910633932765250051
Gianmaria will talk about automatically generating citation text from
queries (Recommendation 10): Citation generation for
– XML (rule-based and machine-learning based), for
– RDF (view-based) and for
– RDBMS (view-based),
with demonstrations from real applications for each:
– EAD/Archives+Pharmacological data for XML,
– Eagle-i (bioresources) for RDF and
– IUPHAR (pharmacological data) for RDBMS
best regards,
Andreas -
Andreas Rauber (Member)
Dear all,
Just a reminder of our upcoming Working Group Break-out Meeting at
P10 in Montreal, taking place on Wed. Oct 20, 15:30-17:30.
Please let Ari and me know if you would like to *** present ***
a use-case, pilot, or raise issues that should be addressed by the
Working Group. We will then plan a corresponding slot in the agenda!
If you want to present something, it would be great if you could send us
the slides a day in advance so we can integrate them into a single slide
set.
For those new to this Working Group, I encourage you to take a look at
* 2-page Flyer summarizing the recommendations of the working group
https://www.rd-alliance.org/recommendations-working-group-data-citation-…
(http://dx.doi.org/10.15497/RDA00016)
* A short description of the recommendations, published in the Bulletin
of the IEEE Technical Committee on Digital Libraries, 12:1, 2016.
http://www.ieee-tcdl.org/Bulletin/v12n1/papers/IEEE-TCDL-DC-2016_paper_1…
* The set of webinars presenting the recommendations as well as numerous
adoptions for different datasets (medical, forestry, astronomical,
long-tail/small-scale CSV, climate data). For all webinars, the
recordings, slide sets, and supporting materials are available at
https://www.rd-alliance.org/group/data-citation-wg/webconference/webconf…
Looking forward to seeing many of you in Montreal next week!
Best regards,
Andreas Rauber -
Andreas Rauber (Member)
Dear all,
We are happy to announce the next webinar on implementations of the recommendations of the RDA Working Group on Data Citation (RDA WGDC), which might be of interest to the Weather, climate and air quality IG. Chris Schubert from the Climate Change Centre Austria (CCCA) will present how they implemented the recommendations of the RDA WG on Dynamic Data Citation (WGDC) in their data center.
Title: Implementation of the RDA Data Citation Recommendations by the
Climate Change Centre Austria (CCCA) for a repository of
NetCDF files
Presenter: Chris Schubert, Head of the CCCA Data Center, Vienna, Austria
Time: Thu., 29. 6. 2017, 16:00 CEST (Vienna), which is
07:00 San Francisco
10:00 Washington, DC
15:00 London
16:00 Amsterdam
22:00 Beijing
23:00 Tokyo
00:00+1 Sydney
02:00+1 Auckland
Registration:
https://attendee.gotowebinar.com/register/7377638743291503105
The WGDC Web conference page also contains recordings, slides and links to supporting material/papers for all preceding webinars in this series, specifically:
https://www.rd-alliance.org/group/data-citation-wg/webconference/webconference-data-citation-wg.html
* Implementing the RDA Data Citation Recommendations for
Long-Tail Research Data / CSV files
Presenter: Stefan Pröll
* Implementing the RDA Data Citation Recommendations in the
Distributed Infrastructure of the Virtual and Atomic Molecular
Data Center (VAMDC)
Presenter: Carlo Maria Zwölf, VAMDC, Observatoire de Paris, France
* Implementation of Dynamic Data Citation at the Vermont
Monitoring Cooperative
Presenter: James Duncan, VMC, University of Vermont, Burlington, VT
* Adoption of the RDA Data Citation of Evolving Data Recommendation to
Electronic Health Records
Presenter: Leslie McIntosh, PhD, MPH, Director, Center for
Biomedical Informatics, Washington University in
St. Louis
* Enabling Precise Identification and Citeability of Dynamic Data:
Recommendations of the RDA Working Group on Data Citation (WGDC)
Presenter: Andreas Rauber, Vienna University of Technology
Looking forward to talking to many of you during the upcoming webinar.
best regards,
Andreas Rauber
___________________________________________________________________
Andreas Rauber
Dept. of Software Technology and Interactive Systems
Vienna Univ. of Technology | Tel: (+43) 1 58801 18826
Favoritenstr. 9 – 11 / 188 | Fax: (+43) 1 58801 18899
A – 1040 Wien | e-mail: rauber@ifs.tuwien.ac.at
AUSTRIA | http://www.ifs.tuwien.ac.at/~andi/
__________________________________________________________________ -
Andreas Rauber (Member)
Dear all,
The registration for the next Webinar of the Working Group on Data Citation on Fri, Mar 31 at 16:00 Paris / CET is available at
https://attendee.gotowebinar.com/register/8283022795138686465
The webinar will be given by Carlo Maria Zwölf, who will present the implementation of the WGDC
Recommendations on Dynamic Data Citation within the distributed
infrastructure of the Virtual and Atomic Molecular Data Center (VAMDC).
Details are available at:
https://www.rd-alliance.org/group/data-citation-wg/webconference/webconf…
Best regards, Andi -
Andreas Rauber (Member)
The GoTo Web System seems to be down – I managed to connect as Organizer
via telephone dial-in.
I could start the Webinar, but obviously without any web access, thus
not allowing anybody else to join who tries to connect via the web. I
have no clue what the problem is – never encountered this before in any
GoTo Meeting…
Andi -
Andreas Rauber (Member)
Dear all,
Thanks a lot for attending today’s break-out session! The slides of all presentations are now available on the RDA website under our WG repository at
https://www.rd-alliance.org/8th-plenary-wgdc-session-slides
They provide links to more detailed background material on the pilots presented, beyond what was possible to discuss in the limited time available, as well as contact details of the PIs and their teams presenting the pilots. In the next few weeks, we will follow up with information on upcoming conference calls that will allow the individual pilots to be presented in more detail.
Thanks again for attending, and as usual, any feedback on the session, the pilots, the recommendations etc. is highly welcome!
best regards, Andi
-
Andreas Rauber (Member)
Dear all,
Welcome to all of you who have already arrived in Denver for the P8 meeting. We will have an exciting break-out session bringing together updates on the current implementations of the recommendations as well as comments from different communities.
The meeting will take place on Fri., Sep. 16, 11:00-12:30, in Tower Court D.
The agenda currently looks as follows:
11:00 Welcome and quick recap of the recommendations
11:10 Reports from individual adoption activities
- Stefan Pröll: CSV reference implementation (presented by Andreas Rauber)
- Cynthia Hudson Vitale (WUSTL)
- Cynthia Chandler (WHOI / BCO-DMO)
- James Duncan (UVM / VMC)
- Justin Buck (BODC / ARGO)
- Carlo Maria Zwölf (OBSPM / VAMDC)
- Anita Smyth (TERN / ANDS WG)
- Martin Fenner (DataCite)
12:20 Q&A, Future Plans
Please let us know if anybody else wants to present any updates on implementations (or if I missed anybody in this list, apologies in advance if so) and we’ll do our best to squeeze it into the compressed schedule.
Those who are not familiar with the recommendations may want to take a look at the summary page, which also provides download links to the 2-page flyer and the full article with some more background information on the individual recommendations, all available at https://rd-alliance.org/group/data-citation-wg/wiki/wgdc-recommendations.html
Looking forward to seeing you at the break-out meeting!
Andi
-