Curating Historical Statistics Data on Baltic Countries in 1897-1939: Providing Data with Rich Metadata
With the expansion of information and communication technologies and the more recent surge in computing power and big data, the research community has focused its attention on collecting and analyzing vast amounts of textual, numeric and visual data available in social media, business processes, health records, etc. At the same time, interest in other types of data has waned somewhat. To a certain extent this also applies to collections of historical data. To be sure, several databases provide information on manifold aspects of societies in historical perspective (in some cases spanning more than two centuries): ourworldindata.org, clio-infra.eu, www.gapminder.org, stats.oecd.org, data.worldbank.org, www.historicalstatistics.org, etc. Some of them provide raw data, visualizations and quite extensive metadata. However, historical statistics are, in general, rather sparse (data are available for a limited number of countries and for limited periods of time) and poorly documented (sources are described only in general terms). Historical researchers usually have to rely on data collected in the “Western world” or analyze rather limited time series (spanning only 30-50 years).
Historical statistics on the Baltic countries are rather scarce, for two main reasons. First, these countries were occupied by Russia, Germany and the Soviet Union for extended periods of time, so national-level historical data are simply not available for the periods when they did not exist as separate countries. However, this situation could be remedied by collecting sub-national data on the provinces and other territorial units of the occupying countries that coincide (more or less) with the boundaries of the currently independent Baltic states. Second, historical statistics were collected and published during the interwar period, when the countries gained independence; however, these data have not yet been digitized into a form (data tables) that is readily usable for research. Again, this situation could be changed by researchers collecting and digitizing the available data sources.
With these two possible sources of historical data in mind, researchers at Vilnius University initiated a project (see www.lidata.eu/en/BalticHistory). One of the project's activities is the systematic collection of historical statistics on the comparative social and economic development of the Baltic States in 1897-1939 and their subsequent publication in the Lithuanian Data Archive for Social Sciences and Humanities (LiDA), hosted by the Kaunas University of Technology. Two major objectives guide the data collection and publishing: 1) user-friendly visualizations of time series and regional data (along the lines of ourworldindata.org) and 2) detailed documentation of all data sources and of the cells in each table. Thus, in this poster we present an attempt to develop a data documentation model for historical statistics that includes metadata not only at the variable (column) level (which is usual for metadata standards such as DDI) but also at the data table cell level and at the case (row) level. We also present an implementation of historical statistics data visualization (including detailed metadata) using Shiny applications.
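The three metadata levels named above (variable, case and cell) can be illustrated with a minimal sketch. It is not the project's actual model or its Shiny implementation: the class names `Meta` and `DocumentedTable`, the field names `source` and `note`, and every value in the usage lines are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Meta:
    """Hypothetical documentation record for a column, a row, or a cell."""
    source: str = ""
    note: str = ""


@dataclass
class DocumentedTable:
    """A data table carrying metadata at the column, row and cell level."""
    columns: Dict[str, Meta] = field(default_factory=dict)  # variable level
    rows: Dict[str, Meta] = field(default_factory=dict)     # case level
    values: Dict[Tuple[str, str], Optional[float]] = field(default_factory=dict)
    cells: Dict[Tuple[str, str], Meta] = field(default_factory=dict)  # cell level

    def set_cell(self, row: str, column: str, value: Optional[float],
                 meta: Optional[Meta] = None) -> None:
        """Store a value and, optionally, documentation specific to that cell."""
        if row not in self.rows or column not in self.columns:
            raise KeyError(f"undeclared row or column: {row!r}, {column!r}")
        self.values[(row, column)] = value
        if meta is not None:
            self.cells[(row, column)] = meta

    def describe(self, row: str, column: str) -> dict:
        """Return a cell's value together with every metadata level that applies."""
        return {
            "value": self.values.get((row, column)),
            "column": self.columns[column],
            "row": self.rows[row],
            "cell": self.cells.get((row, column)),  # None if undocumented
        }


# Usage with placeholder content only (no real figures or sources).
table = DocumentedTable()
table.columns["indicator_x"] = Meta(note="placeholder variable description")
table.rows["region_a_1923"] = Meta(note="placeholder case description")
table.set_cell(
    "region_a_1923", "indicator_x", 100.0,
    Meta(source="placeholder source", note="placeholder cell-level remark"),
)
info = table.describe("region_a_1923", "indicator_x")
print(info["value"], info["cell"].source)
```

Keeping cell-level records in a separate mapping keyed by (row, column) means that only cells needing their own documentation carry it, while column-level and row-level descriptions are stored once and looked up for every cell they cover.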
This poster may be relevant to the 'Empirical Humanities Metadata Working Group', which aims to analyze best metadata practices, as well as to the related 'Digital Practices in History and Ethnography (DPHE) Interest Group', which states as one of its goals the advancement of 'capacity to share, integrate, visualize and act with different kinds of data and analyses'. It may also be relevant to the 'Social Sciences Research Data (SSRD) Interest Group', which aims to investigate (among other things) best practices for implementing Reuse (one of the FAIR building blocks) by identifying 'the quality and provenance of the data: where do they come from, how were they collected and curated, etc.'