In case you have not heard, AGU is coming again this year... in December.
Please allow me to bring your attention to 7 sessions that specifically address Science Data Analytics and/or Data Science (listed below for your convenience). Note that the first one targets the education-minded community. The others are associated with Earth and Space Science Informatics, but their conveners are very interested in insights and experiences from the physical science communities as well as from the informatics community. See you in SF.
Teaching Science Data Analytics Skills Needed to Facilitate Heterogeneous Data/Information Research: The Future Is Here
Session ID#: 1879
Scientists are increasingly exploring heterogeneous data analysis methodologies to tease out information and knowledge. Science data analytics techniques need to be well understood and advanced in order to maximize cross-dataset integration and usability. Given the increasing amount of heterogeneous data available, the data analytics skills that Data Scientists need in order to uncover non-obvious relationships across datasets are becoming more appreciated and more significant. This session seeks papers that: describe the science-domain-oriented data scientist (and data analytics) training that universities and other organizations provide to students; and identify the science-research-oriented analytics skills and expertise that need to be taught so that students can move into high-demand science-domain data scientist (data analytics) positions. Topics covered include curricula and science data research projects that teach or use machine learning, statistics, data mining, decision support modeling, or other analytics techniques.
Steven J Kempler
Emily Law, Sara J Graves, Chung-Lin Shie
EARTH AND SPACE SCIENCE INFORMATICS
Identifying and Better Understanding Data Science Activities, Experiences, Challenges, and Gaps Areas
Session ID#: 1809
Today, industries are calling Data Science “the sexiest job of the 21st century”. But how do Data Scientists contribute to scientific research? What experiences, challenges, and solutions have Data Scientists had that future Data Scientists can learn from? What do Data Scientists need to know to support Earth and Space science research? This session seeks papers that describe Data Science activities, experiences, and challenges, as well as the expertise and skills Data Scientists need. Areas that may be covered include the data lifecycle phases (data modelling, acquisition, cleaning, integration, analysis, and interpretation), each of which introduces challenges, problems, and solutions. We invite papers that address:
Type of work a Data Scientist performs
Data Science experiences and lessons
Data Science challenges
Data Science top problems (and solutions). For example:
Ensuring data and meta-data consistency
Maintaining analytics expertise per science domain
Supporting quality and uncertainty
Advancing data analytics techniques
John S Hughes and Steven J Kempler
Advancing Analytics using Big Data Climate Information System
Session ID#: 3022
Earth system science has seen a massive increase in both observational data and modeling outputs. This constitutes a Big-Data challenge that demands Big-Data technologies to address. However, it is difficult for individual investigators or research groups to implement the petabyte-scale platforms required for the data analyses needed. It is also increasingly obvious that we must share and leverage our infrastructure investments in order to scale our research and development efforts and to increase scientific productivity or throughput. Thus, in this session we seek presentations on innovative techniques, systems, or infrastructures that address the challenges of petascale data analysis and of collaborative research and development.
The focus of this session is on data analytics, rather than search, access, or curation. Subtopics of interest include:
Science applications focusing on integrating climate modeling and satellite observations.
Techniques, systems, infrastructures that enable seamless collaborations.
Innovative approaches of interactive visualization enhancing analytics of large data sets.
Tsengdar J Lee, Michael S Seablom, Ramakrishna R Nemani
Big Data in the Geosciences: New Analytics Methods and Parallel Algorithm
Session ID#: 3292
Earth and space science data are increasingly large and complex--often representing high spatial/temporal/spectral resolution and dimensions from remote sensing or model results--making such data difficult to analyze, visualize, interpret, and understand by traditional methods. This session focuses on the application and development of new geoscientific data analytics approaches (statistical, data mining, assimilation, machine learning, etc.) and of parallel algorithms and software that employ high performance computing resources for scalable analysis and for novel applications of traditional methods to large geoscience data sets. Analysis methods that operate in-situ with parallel simulations to reduce output data volumes are also of interest. Abstracts focused on analysis, synthesis and knowledge extraction from large and complex Earth science data from all disciplines are invited.
Robert L Jacob, Forrest M Hoffman, Miguel D Mahecha
Leveraging Enabling Technologies and Architectures to enable Data Intensive Science
Session ID#: 3041
The objective of this session is to share innovative concepts, emerging solutions, and applications for Big Earth and Space Data to enable Data-Intensive Science. Data-Intensive Science defines three high-level activities: capture, curation, and analysis of data. Being able to handle massive amounts of data impacts our architectural decisions and approaches. Topics include demonstrations, studies, methods, and/or architectural discussions on:
Common enabling technologies
Automated techniques for data analysis
Science analysis and visualization
Real time decision support
Implications of Data-Intensive Science for education
Data management lifecycle functions from data capture through analysis
Architecture that spans multiple data systems and organizations
Rahul Ramachandran, Daniel J Crichton, Morris Riedel
Open source solutions for analyzing big earth observation data
Session ID#: 3080
Most current earth observation data has become freely available, but has also become too large to download and analyse on local machines. Several solutions exist to analyse "near the data", e.g. array databases, solutions built on Hadoop, solutions that use R or Python to organize a cluster, or Google Earth Engine. Not all of these are open source, and so not all are suitable for transparent, reproducible scientific research. This session will attract papers that present solutions to and experiences with analysing big earth observation data near the data, using open source software. It will also accept contributions based on non-open-source solutions, provided the authors are willing to discuss transparency and reproducibility.
Edzer J Pebesma
James Frew, Robert J Hijmans, Jonathan A Greenberg
Technology Trends for Big Science Data Management
Session ID#: 2525
The technology trend toward the use of ontologies, models and information representation is predicted to favorably impact system architectures for big science data management. Data scientists in the space and earth sciences are developing system architectural components that incorporate these technologies into the data lifecycle, from ground systems through to the archives, and that help drive science analysis by producing interoperable systems and correlatable data. Technologies include ontologies for model-driven development, science and engineering discipline ontologies, metadata and provenance standards vocabularies, and the use of semantic infrastructure for integration, publication, and analysis of science data to promote cross-disciplinary studies. This session invites papers on these and related technologies that are intended to improve the discovery and correct use of data and help meet the expectations of modern scientists in the Big Data era. Reference: National Research Council (U.S.). Frontiers in Massive Data Analysis. 2013.
John S Hughes
Daniel J Crichton, Yolanda Gil, Bernd Ritschel
7 AGU Sessions addressing: Science Data Analytics; Data Science