Two benchmark climate change reports offer differing ranges of future sea level rise. How could that be? A deeper look reveals that one report gauges the most likely change while the other gauges plausible change. But that insight is gained only through access to the data and methodologies behind the documents, details that are often difficult to trace. Tracking data provenance using Semantic Web technology offers a solution for decision-makers considering results contained in such science-based documents, argues a group of researchers in a commentary published in the June edition of Nature Climate Change.

“Data provenance, which helps identify the origin of a scientific conclusion, is very important. We want to link a scientific conclusion back to the data source, methods, computer program, to the activities and people who produced this conclusion,” said Xiaogang Ma, lead author of the commentary and a postdoctoral fellow in the Tetherless World Constellation at Rensselaer Polytechnic Institute. Co-authors include Peter Fox, Tetherless World Research Constellation Chair and a professor of earth and environmental sciences and computer science at Rensselaer, as well as researchers from NASA, the University of Arizona, and Second Nature Inc.

As discussed in “Capturing and presenting provenance of global change information,” the two sea-level reports serve different goals, with one intended for broad policy framing and the other focused on coastal risk assessment. The commentary authors used the examples of the reports – the 2013 Fifth Assessment Report of the Intergovernmental Panel on Climate Change and a 2012 report published by the National Oceanic and Atmospheric Administration in support of the National Climate Assessment of the U.S. Global Change Research Program – to explain the benefits that Semantic Web technologies can bring to decision-makers.

For example, a graph in the third U.S. National Climate Assessment tracks three sea level indicators over time – tide gauge data, satellite data, and proxy data such as sediment records – to predict a range of possible future sea-level rise. But the graph offers little background on the datasets, people, and models used to compile the figure. Fox and the Tetherless World Constellation are taking part in an effort to supply that background, creating a Global Change Information System that will track data incorporated into the National Climate Assessment.

“A lot of money, effort, and human resources have been spent on global climate change research and its impacts, and the output is generally in the form of these reports,” Ma said. “The reports are used in decision-making because they relate global climate change to societal and economic issues, and in their work, decision-makers want to know – how did you generate these conclusions?”

As the commentary states:

For decision-makers to understand (and feel confident in their understanding of) the value and nature of the different approaches, scenarios, and projections, it is essential for them to be able to look into the data, models, and analytical sources behind the reports. And they need to be able to access this information easily and in interpretable format – a goal that provenance tracking can support.

Data provenance captures that information and, together with application tools, makes it easily accessible both to human users and to machines via the Web. Provenance begins with metadata – data about data – which can now be systematically labeled and annotated using a number of established “metadata schema.” Such metadata might include information on people, scientific instruments, fieldwork, conferences, organizations – any element that contributes to a scientific conclusion.

Once the metadata are labeled, a conceptual model maps the relationships between the different elements of data. The model, or “ontology,” starts as a diagram mapping those relationships, and is used as the basis for a series of linked Web pages for data sets and metadata.
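To make the idea of labeled elements and mapped relationships concrete, here is a minimal sketch in Python that stores provenance as subject-relation-object statements. It is an illustration only, not the researchers' system: every identifier (the figure, datasets, person, and organization) is hypothetical, and the relation names are loosely borrowed from the W3C PROV vocabulary as an assumption about what such a model might use.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) statements. None of these
# identifiers come from the actual reports; they only illustrate the shape.
TRIPLES = [
    ("figure:sea_level_indicators", "wasGeneratedBy", "activity:compile_figure"),
    ("activity:compile_figure", "used", "dataset:tide_gauge"),
    ("activity:compile_figure", "used", "dataset:satellite"),
    ("activity:compile_figure", "used", "dataset:proxy_records"),
    ("activity:compile_figure", "wasAssociatedWith", "person:analyst_a"),
    ("dataset:tide_gauge", "wasAttributedTo", "org:agency_x"),
]


def build_index(triples):
    """Group statements by subject, then by relation, preserving order."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in triples:
        index[subject][relation].append(obj)
    return index


def objects(index, subject, relation):
    """Return everything linked to `subject` through `relation`."""
    return list(index.get(subject, {}).get(relation, []))


index = build_index(TRIPLES)
datasets_used = objects(index, "activity:compile_figure", "used")
```

Here `datasets_used` lists the three hypothetical datasets behind the figure. A production system would more likely express the same statements in a standard Semantic Web format rather than Python tuples.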

“Our aim is to make it possible to track the whole workflow that was used to generate the scientific conclusion,” said Ma. “With a well-organized back-end and well-designed user interface, you can easily pose queries and generate new information based on the report. It’s far more powerful than the original (static) print or PDF document.”
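The kind of query Ma describes, following a conclusion back through the whole workflow, can be sketched as a simple graph traversal. The Python below is a rough illustration under stated assumptions: the node names and the `EDGES` mapping are invented for this example, and a real deployment would presumably run such queries against a Semantic Web data store rather than an in-memory dictionary.

```python
from collections import deque

# Hypothetical "depends on" links from each item to its direct sources.
EDGES = {
    "conclusion:projected_range": ["figure:sea_level_indicators"],
    "figure:sea_level_indicators": ["activity:compile_figure"],
    "activity:compile_figure": [
        "dataset:tide_gauge",
        "dataset:satellite",
        "model:projection_model",
        "person:analyst_a",
    ],
    "dataset:tide_gauge": ["org:agency_x"],
}


def trace(start, edges):
    """Breadth-first walk returning every upstream item behind `start`."""
    visited = {start}
    upstream = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source in edges.get(node, []):
            if source not in visited:
                visited.add(source)
                upstream.append(source)
                queue.append(source)
    return upstream


lineage = trace("conclusion:projected_range", EDGES)
# Narrow the full lineage to one kind of element, here the datasets.
datasets = [item for item in lineage if item.startswith("dataset:")]
```

Starting from a single conclusion, `lineage` collects the figure, activity, datasets, model, person, and organization behind it, and `datasets` filters that list down to the data sources alone, which is the sort of derived view a static document cannot provide.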