Digital catalogs of ocean data have been available for decades, but advances in standardized services and software for catalog searches and data access now make it possible to create catalog-driven workflows that automate, end to end, the search, analysis, and visualization of data from multiple distributed sources. Further, these workflows can be shared, reused, and adapted with ease.
A new workflow developed within the US Integrated Ocean Observing System (IOOS) automates the skill assessment of water temperature forecasts from multiple ocean forecast models, allowing improved forecast products to be delivered for an open water swim event.
A series of Jupyter Notebooks is used to capture and document the end-to-end workflow using a collection of Python tools that facilitate working with standardized catalog and data services. The workflow first searches a catalog of metadata using the Open Geospatial Consortium (OGC) Catalog Service for the Web (CSW), then accesses data service endpoints found in the metadata records, using the OGC Sensor Observation Service (SOS) for in situ sensor data and OPeNDAP services for remotely sensed and model data. Skill metrics are computed, and time series comparisons of forecast model and observed data are displayed interactively, leveraging the capabilities of modern web browsers.
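The paper does not prescribe a single skill metric here, but a common pair of choices for comparing a forecast time series against observations is the mean bias and the centered (bias-removed) root-mean-square difference. A minimal sketch in plain Python, with hypothetical water temperature values for illustration:

```python
from math import sqrt

def bias(obs, mod):
    """Mean difference between model and observations (model minus obs)."""
    return sum(m - o for o, m in zip(obs, mod)) / len(obs)

def centered_rmse(obs, mod):
    """Root-mean-square model-observation difference after removing the mean bias."""
    b = bias(obs, mod)
    return sqrt(sum((m - o - b) ** 2 for o, m in zip(obs, mod)) / len(obs))

# Hypothetical hourly water temperatures (degrees C): observed vs. one forecast model
obs = [16.2, 16.4, 16.5, 16.3, 16.1, 15.9]
mod = [16.7, 16.9, 17.1, 16.8, 16.6, 16.4]

print(f"bias = {bias(obs, mod):+.2f} C")           # prints "bias = +0.52 C"
print(f"centered RMSE = {centered_rmse(obs, mod):.2f} C")  # prints "centered RMSE = 0.04 C"
```

In a warm-bias case like this one, the two numbers separate a systematic offset (which a swimmer-facing product might correct for) from the remaining scatter between model and observations.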
Executed with input parameters for the 2015 Boston Light Swim, the workflow discovered in situ temperature data, temperature data from a remotely sensed dataset, and output from four different forecast models covering the region. The resulting workflow not only solves a specific, challenging problem, but also highlights the benefits of dynamic, reusable workflows in general.
These workflows adapt as new data enter the system, facilitate reproducible science, provide templates from which new scientific workflows can be developed, and encourage data providers to use standardized services. As applied to the ocean swim event, the workflow exposed problems with two of the ocean forecast products, which led to improved regional forecasts once the errors were corrected.
Through catalog searches and the use of interoperable web services, dynamic, reusable workflows enable more effective assessment of large, distributed collections of data. More eyes on the model results mean more feedback to modelers, resulting in better models. Complex data analysis from a variety of sources can be automated and can respond dynamically as new data enter (or leave) the system. The notebook approach allows rich documentation of the workflow, and automatic generation of software environments allows users to easily run specific notebooks on local computers, on remote machines, or in the cloud. The workflows serve as training and demonstration by example. They can also be easily modified to form the basis for new scientific applications. Often a new application will highlight issues that need fixing before it can function. Regardless of the issue, fixing it for a specific workflow enables success not only for that workflow, but for an entire class of workflows and, thus, the larger geoscience community. With their numerous benefits demonstrated here, the researchers anticipate increased use of dynamic notebooks across geoscience domains.
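The automatically generated software environments mentioned above are typically expressed as a declarative dependency file. The file below is an illustrative sketch of a conda `environment.yml` for this kind of notebook workflow; the name and package list are assumptions, not taken from the paper:

```yaml
# Illustrative environment file; the package selection is an assumption,
# chosen to match the services the workflow uses (CSW, SOS, OPeNDAP).
name: ocean-skill-assessment
channels:
  - conda-forge
dependencies:
  - python=3
  - jupyter    # notebook interface for the documented workflow
  - owslib     # OGC clients, including CSW catalog search and SOS access
  - netcdf4    # reads OPeNDAP endpoints for model and remotely sensed data
  - pandas     # time series handling for skill comparisons
```

With such a file, `conda env create -f environment.yml` reproduces the same environment on a laptop, a remote machine, or a cloud instance, which is what makes the notebooks portable.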
The workflow was tested by a team from the U.S. Geological Survey, the Southeast Coastal Ocean Observing Regional Association, and Axiom Data Science. The lead researcher was Richard P. Signell of the USGS Woods Hole Coastal and Marine Science Center, and the results were presented at the 14th Estuarine and Coastal Modeling Conference. Read the full paper here.