There is an overarching problem in dataset citation, and you point out half
of it. Datasets are frequently aggregate works (collections), and there is
no easy way to reference the components of the aggregate unit. I advocate
making bibliographic metadata available, and biblatex is marginally better
than bibtex, but neither was designed to be the source authority format
for archival records.
The second, and more frequently ignored, problem in dataset referencing is
that datasets as archival objects are often miscategorized.
Some take the position that all objects in digital form are data… but is
software data? And this leads to an important philosophical question about
the role of institutional repositories: should they persist data, or should
they persist the evidentiary record? That is, is the term data at all even
If I have an aggregate work of audio materials, it may be cited/referenced
as an album and each sub-unit as a track. There is no reason to categorize
this as a “dataset”. The same is true of a set of ethnographic interviews
that are just dumped into a repository: they are interviews, not just
recordings or a “dataset”. So, depending on the media type, some things
should not be datasets. I find the dcmitype vocabulary very helpful in this
regard. Dublin Core says that every artifact should have a one-to-one record
in the catalogue, so each audio recording should get its own record and a
relationship to the record for the aggregate work, which would be the album.
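To make the one-record-per-artifact idea concrete, here is a minimal Python sketch, not a real cataloguing system. The `Record` class, the `catalogue_album` function and the identifier scheme are hypothetical, and typing the album as DCMI `Collection` and each track as `Sound` is my own assumption. Each track gets its own record with an isPartOf link to the album, and the album record lists its parts (in the spirit of dcterms:isPartOf / dcterms:hasPart).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# A small subset of the DCMI Type Vocabulary, used here only for validation.
DCMI_TYPES = {"Collection", "Dataset", "Sound", "Text", "Software"}


@dataclass
class Record:
    """Hypothetical catalogue record: one record describes one resource."""
    identifier: str
    title: str
    dcmi_type: str
    is_part_of: Optional[str] = None                    # like dcterms:isPartOf
    has_part: List[str] = field(default_factory=list)   # like dcterms:hasPart

    def __post_init__(self) -> None:
        if self.dcmi_type not in DCMI_TYPES:
            raise ValueError(f"unknown DCMI type: {self.dcmi_type}")


def catalogue_album(album_id: str, album_title: str, track_titles: List[str]) -> List[Record]:
    """Create one record for the album and one record per track, linked both ways."""
    album = Record(album_id, album_title, "Collection")  # assumed type for the aggregate
    records = [album]
    for n, title in enumerate(track_titles, start=1):
        track_id = f"{album_id}-track-{n:02d}"  # hypothetical identifier scheme
        records.append(Record(track_id, title, "Sound", is_part_of=album_id))
        album.has_part.append(track_id)
    return records


for rec in catalogue_album("album-001", "Field Recordings", ["Intro", "Song"]):
    print(rec.identifier, rec.dcmi_type, rec.is_part_of, rec.has_part)
```

Nothing here is labelled “Dataset”: the aggregate and its components each keep their own type and their own citable record.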
Datasets are a legitimate item type, but as the dcmitype identifies them,
they are tabular data, ready for ingest into a computer application. In
this respect they are distinct from the dcmitype for text, in that they are
not designed for human literary consumption.
The need to accurately identify item types comes back to repositories and
how they identify content and make those identifications available via
pre-formatted bibliographic records. If the repository says that everything
is a “dataset”, then the use of biblatex @dataset versus bibtex @misc is a
moot point, because both are equally unhelpful and ambiguous to the end user
who might look to reuse the bibliographic metadata.
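A rough sketch of why accurate item types matter downstream: if a repository records a DCMI type per item, it can emit a more specific biblatex entry type instead of a blanket @misc or @dataset. The `BIBLATEX_TYPE` mapping and the `to_biblatex` function are hypothetical illustrations, not any repository's actual export; @dataset and @software are biblatex entry types, but how (or whether) a given style renders them, or a type like @audio, varies by style.

```python
# Hypothetical mapping from DCMI Type terms to biblatex entry types.
BIBLATEX_TYPE = {"Dataset": "dataset", "Software": "software", "Sound": "audio"}


def to_biblatex(key: str, dcmi_type: str, title: str, year: int) -> str:
    """Build a minimal biblatex entry, falling back to @misc for unmapped types."""
    entry_type = BIBLATEX_TYPE.get(dcmi_type, "misc")
    return f"@{entry_type}{{{key},\n  title = {{{title}}},\n  year = {{{year}}},\n}}"


print(to_biblatex("interviews2021", "Sound", "Ethnographic Interviews", 2021))
```

If every item arrives typed as “Dataset”, this function can only ever emit @dataset, which is exactly the ambiguity described above.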
Also note that APA 7th edition is out. I don’t like it, but it is out.
Some food for thought,
All the best,
– Hugh
On Thu, Apr 1, 2021 at 10:26 AM mtrognitz via Data Citation WG <
***@***.***> wrote: