
Forum Replies Created

  • in reply to: #129509

    Dear all,
    This is absolutely correct! In a nutshell, we need to differentiate
    between two aspects, namely (1) **identifying** the precise
    subset/collection of a – potentially changing – dataset, and (2) the
    actual information making up a citation to that dataset.
    For 1) this WG has come up with an answer, a single principle, that so
    far seems to work across all types of data and solutions for
    implementing a data repository, whereas for 2) some recommendations can
    be made, but it will ultimately depend strongly on the domain, the type
    of data and its use.
    1) For the identification we again have two sub-challenges, namely a)
    the evolution of data (new data being added, errors corrected, …); and
    b) the identification of arbitrary subsets. 1a) is solved by versioning
    all changes to data, whereas 1b) is addressed by resolving any subsets
    dynamically via a reproducible operation (referred to, in the
    guidelines, as a query) that was executed at a certain timestamp, that
    needs to be stored and that is associated with a persistent identifier.
    This could be the check-out from a Git repository at a certain point in
    time, it could be a “list-directory” command against a versioned file
    system, it can be an SQL query against a temporal table, we have seen
    solutions using slice/dice operators against a NetCDF file, but it could
    also – to use the audio example referred to before – even be a pointer to a
    specific offset in an audio file (or, e.g. a 30sec segment starting at
    minute 1 for all audio files with a 44kHz sampling rate in the
    collection, as sometimes used to be done for music retrieval
    experiments). This also works across distributed repositories, as each
    only needs to keep the queries it processed as well as the timestamps
    locally, without any need to synchronize clocks. An aggregator would
    then simply store the individual PIDs of the responses from a federated
    system.
    2) Concerning the actual citation text, we may want to re-visit that
    topic to see how much more specific we can get while still making sure
    that any recommendation works across possibly all types of data and
    domains. Currently, we stayed at a very limited set of metadata,
    borrowing from the analogy of citations to literature, recommending the
    use of two identifiers: one for the (continuously evolving) data source,
    and one for the specific subset extracted from it at a given point in
    time (analogy: a specific (static) paper identified e.g. via a DOI in an
    (evolving, i.e. growing, with new editions being added) journal
    (identified via e.g. an ISSN)). The creator of the subset may be compared
    to the author of a paper, whereas the owner/operator of the data source
    may be likened to the editor of proceedings – but any such mapping will
    already differ quite a lot across repositories and types of data, so it
    is not part of the general recommendations – beyond the statement that
    each data center should provide a recommended way of phrasing/expressing
    citation. This may well be worth picking up if we have a feeling that
    this core can be extended, now that we have a better understanding of
    the identification and resolving process.
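    As a rough illustration of the identification principle in 1), here is a
    minimal sketch in Python. It assumes a small SQLite table with validity
    intervals as the versioned store and a second table as the query store.
    The table layout, the function names, the "hypothetical-pid:" format and
    the result hash used as a sanity check are all assumptions made for this
    example; they are not prescribed by the recommendations. Timestamps are
    plain local integers, since each repository only needs its own clock.

```python
import hashlib
import sqlite3


def make_repository():
    """Create an in-memory repository: a versioned data table plus a query store."""
    db = sqlite3.connect(":memory:")
    # Rows are never overwritten; each version carries its validity interval.
    db.execute(
        "CREATE TABLE obs (id INTEGER, value REAL, "
        "valid_from INTEGER, valid_to INTEGER)"
    )
    db.execute(
        "CREATE TABLE query_store (pid TEXT PRIMARY KEY, where_sql TEXT, "
        "executed_at INTEGER, result_hash TEXT)"
    )
    return db


def write(db, rec_id, value, now):
    """Add or correct a record: close the current version, append a new one."""
    db.execute(
        "UPDATE obs SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (now, rec_id),
    )
    db.execute("INSERT INTO obs VALUES (?, ?, ?, NULL)", (rec_id, value, now))


def run_as_of(db, where_sql, ts):
    """Run the subset query against the state of the data at local time ts."""
    # where_sql is treated as trusted input in this sketch only.
    sql = (
        "SELECT id, value FROM obs "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        f"AND ({where_sql}) ORDER BY id"
    )
    return db.execute(sql, (ts, ts)).fetchall()


def _digest(obj):
    """Hash the repr of an object; good enough for this illustration."""
    return hashlib.sha256(repr(obj).encode("utf-8")).hexdigest()


def cite(db, where_sql, now):
    """Execute the query, store it with its timestamp, and return a PID."""
    rows = run_as_of(db, where_sql, now)
    pid = "hypothetical-pid:" + _digest((where_sql, now))[:12]
    db.execute(
        "INSERT OR IGNORE INTO query_store VALUES (?, ?, ?, ?)",
        (pid, where_sql, now, _digest(rows)),
    )
    return pid, rows


def resolve(db, pid):
    """Re-execute the stored query at its stored timestamp."""
    row = db.execute(
        "SELECT where_sql, executed_at, result_hash "
        "FROM query_store WHERE pid = ?",
        (pid,),
    ).fetchone()
    if row is None:
        raise KeyError(pid)
    where_sql, ts, expected = row
    rows = run_as_of(db, where_sql, ts)
    if _digest(rows) != expected:
        raise ValueError("re-executed subset does not match the cited one")
    return rows
```

    Calling cite(db, "value > 5", 2) after two writes at time 1 returns a PID
    for that subset; corrections or additions written at time 3 do not change
    what resolve(db, pid) returns, while run_as_of(db, "value > 5", 4) shows
    the current state. The same pattern should carry over to the other
    settings mentioned above (Git check-outs, versioned file systems, NetCDF
    slices), with the "query" being whatever reproducible operation applies.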
    The pre-print version of the paper we’ve prepared on the recommendations,
    as well as on reference implementations and deployed adoptions, could be
    useful to review these principles in different settings:
    http://doi.org/10.5281/zenodo.4571616
    best regards,
    Andi

  • in reply to: #129715

    Hi Kheeran,
    Thanks a lot for the pointer to SAIL and the UKSerP. This infrastructure is, indeed, conceptually very similar to ours – but several orders of magnitude larger. It very likely took a bit longer to set up: some places seem to have been a bit more forward-looking than others in setting up such infrastructures. We, unfortunately, had (and still have) nothing like that in place at this scale.
    While setting up a solution like UKSerP was out of the question given the time and budget constraints, we luckily are also facing a somewhat simpler setting with the current project: rather than setting up a central system, we want to enable data owners to deploy the solution directly within their own environment, which should make it easier (both from a legal perspective and concerning trust) to make their data accessible. It also keeps the data where the domain expertise is available. This, however, also limits interlinking between data and integrating data from different sources. But we hope that once trust has been gained in the individual systems operated by the data owners themselves, they will also be more willing to trust a third-party system providing the linkage of data while still keeping the data owners in control.
    Thus, I wouldn’t dare to say in any way that we have managed to *solve* the social dynamics. On the contrary, as long as benefits are to be gained from having control over the data, and as long as handing over data to somebody else would risk reducing these benefits, we will see all kinds of activities happening to impede access to data. Our hope is that via a bottom-up process institutions can learn about the opportunities, assess for themselves the risks that each such infrastructure carries in spite of all protection mechanisms, understand the trade-offs and implications, and thus gain trust and see the benefits. In our case, these institutions also include industry as a key stakeholder: a lot of the data that was relevant in our discussions was commercial data (supply chain analysis, logistics data, etc.) held by companies which seem less eager to feed such sensitive data into centralized repositories. It’s a rather organic process, allowing institutions to learn from examples and eventually try it themselves. At least that’s our hope and vision in preparing these building blocks and set-up descriptions.
    Andi

  • in reply to: #130276

    Dear all,
    Just to let you know and to allow you to prepare for the upcoming plenary:
    We have already submitted a request to hold a break-out session again at the upcoming plenary. Assuming that it is accepted, we will again be able to share updates on a number of ongoing adoptions as well as discuss a few questions that came up in recent discussions and presentations.
    If you think you may have an adoption story to share or have specific issues you would like to see discussed in relation to our recommendations, please let me know so that we can prepare a corresponding schedule.
    Andi

  • in reply to: #130482

    This also looks good from my side – unless we see a really strong need for an additional body emerging we should keep the structure as simple as possible. Informal ad-hoc meetings may serve the purpose perfectly well…
    Andi

  • in reply to: #130928

    Hi Mark,
    Very intrigued!! 🙂
    It would be great to hear about both the easy experience and the challenging setting! (You know that I frequently find challenges more interesting than solutions 🙂)
    I’ll plan a slot for you and get back to you with more details a bit later as the whole agenda evolves…
    Thanks a lot!!
    Andi

  • in reply to: #131138

    Hi Andrew,
    Sorry, I just noticed that I hadn’t sent out the details yet, so here
    they are:
    – Timing:
    The meeting will run from Wed. noon until Fri. noon, i.e. 2 full days
    spread over three. (Which shouldn’t keep you from arriving a day early
    to settle in and enjoy Vienna – I am sure we can fill any free time you
    might have somehow, be it with meetings or otherwise 🙂)
    – Venue:
    The meeting will be held in the main building of the Technical
    University of Vienna, at Karlsplatz 13 (metro station Karlsplatz, metro
    lines U1, U2, U4), so easily reachable from virtually all directions. The
    meeting will take place in the “Festsaal”; detailed location info is
    available in the attached PDF document. This is right in the city
    center, about 10mins walk to the Opera, Musikverein, Konzerthaus,
    Theater an der Wien, etc.
    – Hotels:
    There are a number of hotels in the vicinity where TU Wien has
    negotiated preferential rates for events at TU Wien, see the attached
    list. But basically any hotel that’s near to a metro station (or
    anywhere surrounding the Karlsplatz area, Vienna is a pretty small,
    walkable city) is fine.
    – Arrival:
    Vienna International Airport is approx. 20 minutes to the east of the
    city. Getting downtown is usually pretty straightforward, either via the
    Airport Express Train (CAT, 16 mins), the slightly slower but cheaper
    regular train S7 (20 mins), both of which will take you to the metro
    stop in Landstraße-Wien Mitte (U3, U4), or by an express train to the
    central railway station Hauptbahnhof (17 mins, U1), as well as several bus
    lines. There are also several pre-booked taxi services; the biggest
    company at the airport is Airportdriver (https://www.airportdriver.at/en),
    with a fixed price of 33 EUR to any place within Vienna; pre-book at least
    24 hours before arrival.
    More detailed arrival info and links are available at the TU Wien
    homepage at
    https://www.tuwien.ac.at/en/contactsearch/visit_us_travelling_information/
    If there is any additional information that any of you needs, please let
    me know!
    Looking forward to welcoming you in Vienna!
    Andi

  • in reply to: #131239

    Hi Mark, Ahmed,
    That’s great, thank you! We definitely want to devote most of the time
    of the meeting again to presenting and discussing new pilots, lessons learned
    from implementations and new issues identified. The Deep Carbon
    Observatory pilot is definitely most welcome, specifically your
    observations on versioning approaches.
    This is also a call for any other groups who have implemented or are
    starting to implement (part of) the recommendations and would like to
    present their current status, questions, etc. at the upcoming plenary:
    please let me know so that we can plan slots accordingly.
    Andi

  • in reply to: #131260

    Dear Mark, all,
    That’s brilliant news, congratulations!!
    As you know all the details at least as well as, if not better than, I do, there might be no need for it, but just in case: if you have any questions, let me know, I’ll be happy to help!!
    Andi

  • in reply to: #131400

    Dear all,
    The webinar by Gianmaria Silvello from the University of Padova will
    start in 20 minutes.
    To join, connect to
    https://attendee.gotowebinar.com/register/8910633932765250051
    Gianmaria will talk about automatically generating citation text from
    queries (Recommendation 10): Citation generation for
    – XML (rule-based and machine-learning based), for
    – RDF (view-based) and for
    – RDBMS (view-based),
    with demonstrations from real applications for each:
    – EAD/Archives+Pharmacological data for XML,
    – Eagle-i (bioresources) for RDF and
    – IUPHAR (pharmacological data) for RDBMS
    best regards,
    Andreas

  • in reply to: #131493

    Dear all,
    Just as a reminder for our upcoming Working Group Break-out Meeting at
    P10 in Montreal, taking place on Wed. Oct 20, 15:30-17:30
    Please let Ari and me know if you would like to *** present ***
    a use-case, pilot, or raise issues that should be addressed by the
    Working Group. We will then plan a corresponding slot in the agenda!
    If you want to present something, it would be great if you could send us
    the slides a day in advance so we can integrate them into a single slide
    set.
    For those new to this Working Group, I encourage you to take a look at:
    * 2-page Flyer summarizing the recommendations of the working group
    https://www.rd-alliance.org/recommendations-working-group-data-citation-
    (http://dx.doi.org/10.15497/RDA00016)
    * A short description of the recommendations, published in the Bulletin
    of the IEEE Technical Committee on Digital Libraries, 12:1, 2016.
    http://www.ieee-tcdl.org/Bulletin/v12n1/papers/IEEE-TCDL-DC-2016_paper_1
    * The set of webinars presenting the recommendations as well as numerous
    adoptions for different datasets (medical, forestry, astronomical,
    long-tail/small-scale CSV, climate data). For all webinars the
    recordings, set of slides, as well as supporting materials are available at
    https://www.rd-alliance.org/group/data-citation-wg/webconference/webconf
    Looking forward to seeing many of you in Montreal next week!
    Best regards,
    Andreas Rauber

  • in reply to: #131750

    Dear all,

    We are happy to announce the next Webinar on implementations of the recommendations of the RDA Working Group on Data Citation (RDA WGDC) that might be of interest to the Weather, climate and air quality IG. Chris Schubert from the Climate Change Centre Austria (CCCA) will be presenting how they implemented the Recommendations of the RDA WG on Dynamic Data Citation (WGDC) in their data center.

    Title: Implementation of the RDA Data Citation Recommendations by the
           Climate Change Centre Austria (CCCA) for a repository of
           NetCDF files
    Presenter: Chris Schubert, Head of the CCCA Data Center, Vienna, Austria
    Time: Thu., 29. 6. 2017, 16:00 CEST (Vienna), which is
          07:00 San Francisco
          10:00 Washington, DC
          15:00 London
          16:00 Amsterdam
          22:00 Beijing
          23:00 Tokyo
          00:00+1 Sydney
          02:00+1 Auckland
    Registration:
    https://attendee.gotowebinar.com/register/7377638743291503105

    The WGDC Web conference page also contains recordings, slides and links to supporting material/papers for all preceding webinars in this series, specifically:

    https://www.rd-alliance.org/group/data-citation-wg/webconference/webconference-data-citation-wg.html

    * Implementing the RDA Data Citation Recommendations for
      Long-Tail Research Data / CSV files
      Presenter: Stefan Pröll
    * Implementing the RDA Data Citation Recommendations in the
      Distributed Infrastructure of the Virtual and Atomic Molecular
      Data Center (VAMDC)
      Presenter: Carlo Maria Zwölf, VAMDC, Observatoire de Paris, France
    * Implementation of Dynamic Data Citation at the Vermont
      Monitoring Cooperative
      Presenter: James Duncan, VMC, University of Vermont, Burlington, VT
    * Adoption of the RDA Data Citation of Evolving Data Recommendation to
      Electronic Health Records
      Presenter: Leslie McIntosh, PhD, MPH, Director Center for
                 Biomedical Informatics, Washington University in
                 St. Louis
    * Enabling Precise Identification and Citeability of Dynamic Data:
      Recommendations of the RDA Working Group on Data Citation (WGDC)
      Presenter: Andreas Rauber, Vienna University of Technology

    Looking forward to talking to many of you during the upcoming webinar.

    best regards,
    Andreas Rauber

    ___________________________________________________________________
    Andreas Rauber
    Dept. of Software Technology and Interactive Systems
    Vienna Univ. of Technology  | Tel:    (+43) 1 58801 18826
    Favoritenstr. 9 – 11 / 188  | Fax:    (+43) 1 58801 18899
    A – 1040 Wien               | e-mail: rauber@ifs.tuwien.ac.at
    AUSTRIA                     | http://www.ifs.tuwien.ac.at/~andi/
    __________________________________________________________________

     

  • in reply to: #131992

    Dear all,
    The registration for the next Webinar of the Working Group on Data Citation on Fri, Mar 31 at 16:00 Paris / CET is available at
    https://attendee.gotowebinar.com/register/8283022795138686465
    The webinar will be held by Carlo Maria Zwölf who will present the implementation of the WGDC
    Recommendations on Dynamic Data Citation within the distributed
    infrastructure of the Virtual and Atomic Molecular Data Center (VAMDC).
    Details are available at:
    https://www.rd-alliance.org/group/data-citation-wg/webconference/webconf
    Best regards, Andi

  • in reply to: #132102

    The GoTo Web System seems to be down – I managed to connect as Organizer
    via telephone dial-in.
    I could start the Webinar, but obviously without any web access, thus
    not allowing anybody else to join who tries to connect via the web. I
    have no clue what the problem is – never encountered this before in any
    GoTo Meeting…
    Andi

  • in reply to: #132430

    Dear all,

    Thanks a lot for attending today’s break-out session! The slides of all presentations are now available at the RDA Website under our WG repository at

    https://www.rd-alliance.org/8th-plenary-wgdc-session-slides

    They provide links to more detailed background material on the pilots presented, beyond what was possible to discuss in the limited time available, as well as contact details of the PIs and their teams presenting the pilots. Over the next few weeks we will follow up with information on upcoming conference calls that allow the individual pilots to be presented in more detail.

    Thanks again for attending, and as usual, any feedback on the session, the pilots, the recommendations etc. is highly welcome!

    best regards, Andi

     

  • in reply to: #132439

    Dear all,

    Welcome to all of you who have already arrived in Denver for the P8 meeting. We will have an exciting break-out session bringing together updates on the current implementations of the recommendations as well as comments from different communities.

    The meeting will take place on Fri., Sep.16, 11:00-12:30, in Tower Court D

    The agenda currently looks as follows:

    11:00 Welcome and quick recap of the recommendations

    11:10 Reports from individual adoption activities

    • Stefan Pröll: CSV reference implementation (presented by Andreas Rauber)
    • Cynthia Hudson Vitale (WUSTL)
    • Cynthia Chandler (WHOI / BCO-DMO)
    • James Duncan (UVM / VMC)
    • Justin Buck (BODC / ARGO)
    • Carlo Maria Zwölf (OBSPM / VAMDC)
    • Anita Smyth (TERN / ANDS WG)
    • Martin Fenner (DataCite)

    12:20 Q&A, Future Plans

    Please let us know if anybody else wants to present any updates on implementations (or if I missed anybody in this list, apologies in advance if so) and we’ll do our best to squeeze it into the compressed schedule.

    Those who are not familiar with the recommendations may want to take a look at the summary page, which also provides download links to the 2-page flyer and the full article with more background information on the individual recommendations, all available at https://rd-alliance.org/group/data-citation-wg/wiki/wgdc-recommendations.html

    Looking forward to seeing you at the break-out meeting!

    Andi
