Taking care of the Long Tail of Data: e-IRG report
Data-driven science is seen as the fourth paradigm of scientific research, after experimental science, theoretical science and computational science.
While Big Data (structured as well as unstructured data that tends to be homogeneous and standardized) receives great attention, Long Tail Data at the opposite end of the spectrum, characterized as heterogeneous, relatively small, often following unique standards and unregulated, has so far received less attention and poses its own challenges.
Long tail data exist across all disciplines, very often only on individual computers or university servers, and often with minimal or no attached metadata or documentation, which is a major obstacle to reusability.
The Long Tail of data is thus often at risk, but it is ubiquitous, and taking it properly into account is one of the keys to fully exploiting the potential of Open Science. It is attracting growing interest, in particular with the development of institutional and general repositories, both public and commercial, and in the library community, where librarians are seeing their traditional tasks evolve significantly and are interested in becoming scientific data curators.
The e-IRG Task Force on the Long Tail of Data has produced a report on the subject, with help from the community and input from RDA open data advocates Françoise Genova and Wolfram Horstmann.
The full report is available online for download.