FAIRness in the multi-service data infrastructure of the Tropospheric Ozone Assessment Report (TOAR) and Artificial Intelligence for Air Quality (IntelliAQ) project
IntelliAQ is a European project that aims to develop novel approaches for the analysis and synthesis of global air quality data based on deep neural networks. A core element of the project's strategy is the linkage of several different types of data, including time series of air quality observations, high-resolution geospatial data, high-resolution weather model data, and satellite retrievals of air pollutants. To achieve this linkage, IntelliAQ builds upon and expands the data infrastructure of the Tropospheric Ozone Assessment Report (TOAR), which has collected multi-year time series of ground-level ozone observations from over 30 providers at more than 10,000 sites around the world. Based on experience from the first phase of TOAR, we have re-designed the database, designed new web services for access to the geospatial data, created a new workflow for data submissions that includes semi-automatic data publication, and developed tools for efficient parallel processing of large volumes of meteorological data. All data services have been developed with the FAIR principles in mind from the start. Together, they will form the central data portal supporting the second phase of TOAR, which started in December 2019. All major elements of the IntelliAQ/TOAR data infrastructure offer REST APIs with a query syntax that is as uniform as possible. Where possible, we provide free, open, and unrestricted access to TOAR data under the CC BY 4.0 license. TOAR data publications carry a DOI and are offered both for individual data submissions and as a central repository for datasets that are analyzed in TOAR-related publications. We have started the process of having the TOAR data centre certified under the CoreTrustSeal requirements. IntelliAQ and TOAR aim to produce datasets that can be reused for several decades.
Besides its main role as a community data repository, the TOAR data centre acts as a platform to test novel, high-performance workflows for heterogeneous data sets, primarily in the context of machine learning applications.
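As a rough illustration of what a uniform query syntax across several REST services could look like from a client's point of view, the sketch below builds query URLs for two services with one shared set of parameter conventions. The base URL, the service names (`timeseries`, `geodata`), and all parameter names are hypothetical placeholders chosen for this example; they are not the actual IntelliAQ/TOAR API.

```python
from urllib.parse import urlencode

# Hypothetical base URL and service names; NOT the actual TOAR/IntelliAQ API.
BASE_URL = "https://example.org/api/v1"


def build_query(service: str, **params) -> str:
    """Build a query URL with the same parameter conventions for every service."""
    # Drop unset parameters and sort keys so that equal queries give equal URLs.
    clean = {key: value for key, value in sorted(params.items()) if value is not None}
    return f"{BASE_URL}/{service}/?{urlencode(clean)}"


# The same (assumed) bounding-box and format parameters are reused across services.
common = {"bounding_box": "49.0,6.0,51.0,8.0", "format": "json"}
timeseries_url = build_query("timeseries", variable="o3", **common)
geodata_url = build_query("geodata", layer="population_density", **common)

print(timeseries_url)
print(geodata_url)
```

Sharing one helper and one set of parameter names across services means a client that has learned to query one service can query the others with little extra effort, which is one way to support the interoperability aspect of FAIR.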