Advancing Data Transparency in Industrial Ecology
Two actions will help the industrial ecology (IE) community move toward data transparency and accessibility:
(1) a minimum publication requirement for IE research, to be adopted by the Journal of Industrial Ecology; and
(2) a system of optional data openness badges rewarding journal articles that contain transparent and accessible data.
The FAIR Guiding Principles define four fundamental attributes of open data: data should be (1) findable, (2) accessible, (3) interoperable, and (4) reusable. It is envisaged that these attributes can be achieved if authors publish appropriate metadata alongside their datasets. Here, we summarize the guidance provided by Wilkinson and colleagues (2016) for data to meet the FAIR criteria:
- (1) Findable: (meta)data are indexed or archived with unique identifiers (e.g., digital object identifiers [DOIs]) at a searchable resource;
- (2) Accessible: (meta)data use an open standard for machine readability and are made permanently available;
- (3) Interoperable: (meta)data use standard data vocabularies, in a formal, open, and broadly applicable language, and include references to connected data;
- (4) Reusable: (meta)data are defined with relevant attributes for reuse, such as a clearly defined license statement.
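As a rough illustration of how these four attributes could be checked before a dataset is deposited, the sketch below tests a metadata record for empty or missing fields. The field names, the grouping of fields under each attribute, and the example record (including its placeholder DOI and URL) are assumptions made for illustration; they are not a schema defined by Wilkinson and colleagues (2016).

```python
# Hypothetical mapping from FAIR attribute to metadata fields.
# These field names are illustrative assumptions, not an official schema.
REQUIRED_FIELDS = {
    "findable": ["identifier", "repository"],
    "accessible": ["file_format", "access_url"],
    "interoperable": ["vocabulary", "related_identifiers"],
    "reusable": ["license"],
}


def missing_fair_fields(metadata):
    """Return {attribute: [field names]} for fields that are absent or empty."""
    gaps = {}
    for attribute, fields in REQUIRED_FIELDS.items():
        missing = [name for name in fields if not metadata.get(name)]
        if missing:
            gaps[attribute] = missing
    return gaps


# Example record with made-up values; the DOI and URL are placeholders.
record = {
    "identifier": "doi:10.0000/example",
    "repository": "Zenodo",
    "file_format": "text/csv",
    "access_url": "https://example.org/dataset.csv",
    "vocabulary": "",
    "related_identifiers": [],
    "license": "CC-BY-4.0",
}

print(missing_fair_fields(record))
```

For this record the check reports gaps only under the interoperable attribute, because the vocabulary and the references to connected data are empty. A check like this covers only the presence of metadata, not its quality.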
The data repository registry re3data now lists over 1,500 individual data repositories from multiple scientific fields, ranging from general-purpose ones such as Figshare, Zenodo, and Dryad to subject-specific ones such as GenBank for genetic sequence data, PANGAEA for Earth and Environmental Science, or the Interdisciplinary Earth Data Alliance.
We identify two key issues that often make inspection of IE research difficult: (1) digital data are typically inadequately identified; and (2) data extraction is often more difficult than necessary. The requirements below are intended to address these two issues.
Minimum Publication Requirement 1: Data citation: All secondary data and databases used in the analysis must be cited in accordance with the journal's citation style. This information can include database version, database settings (e.g., allocation), date accessed, and DOI, if pertinent. This requirement both clarifies data sources and provides incentives for publication of reusable and citable data. Data may be cited in the main section of the paper or in the supporting information.
Minimum Publication Requirement 2: Enumerate primary results: The data that are represented in each graph or figure in an article must be published in numerical form, clearly referenced in the text, and labeled. For example, a simple spreadsheet containing the quantitative data shown in figures and tables in an article fulfills this requirement; such data can be provided in supporting information or in a publicly accessible repository. This requirement should facilitate the unambiguous inspection and usage of quantitative information contained in all key results presented as figures and graphs. The underlying quantitative data would become directly accessible, avoiding the need to visually estimate them from figures or manually copy them from tables and thus avoiding any uncertainties or errors introduced from this process. This requirement aims to facilitate increased citation, reuse, and meta-analyses of published work.
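A minimal sketch of what such a numerical file could look like is given below, using only the Python standard library. The column names, the figure label, and all values are invented for illustration; the requirement itself does not prescribe a particular layout or file format.

```python
import csv
import io

# Hypothetical column layout for the data behind one figure.
FIELDNAMES = ["figure", "series", "year", "value", "unit"]

# Made-up example values standing in for the points plotted in a figure.
figure_2_rows = [
    {"figure": "Figure 2", "series": "scenario A", "year": 2020, "value": 1.5, "unit": "Mt/yr"},
    {"figure": "Figure 2", "series": "scenario A", "year": 2030, "value": 2.25, "unit": "Mt/yr"},
    {"figure": "Figure 2", "series": "scenario B", "year": 2020, "value": 1.5, "unit": "Mt/yr"},
]


def figure_data_to_csv(rows, fieldnames):
    """Serialize the rows behind a figure as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = figure_data_to_csv(figure_2_rows, FIELDNAMES)
print(csv_text)
```

The resulting text can be written to a file in the supporting information or deposited in a repository. Keeping the figure label and unit in every row is one possible design choice; it makes each row interpretable on its own when tables from several figures are merged.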
In all cases, the data supplied should be published in the supporting information or archived in a trusted repository, preferably an official repository which assigns DOIs, and cited accordingly in the original article. We expect practices in this regard to evolve as scientific publishing continues to address data transparency and accessibility.
We introduce a Python toolbox for IE that includes the life cycle assessment
(LCA) framework Brightway2, the ecospold2matrix module that parses unallocated data
in ecospold format, the pySUT and pymrio modules for building and analyzing multiregion
input-output models and supply and use tables, and the dynamic stock model class for
dynamic stock modeling.
Input-Output
Specific software is needed for most I-O models because of
the large number of system variables. Aside from play models for illustrative purposes, the number of sectors in the system typically lies between several hundred (single-region I-O)
and 10^4 to 10^5 (for integrated hybrid models such as THEMIS
or the Eora MRIO table [Hertwich et al. 2015; Lenzen et al.
2013]). I-O modeling requires many intermediate steps that require software, including balancing algorithms, trade linking
tools for multiregion input-output (MRIO), constructs to build
I-O models from supply and use tables (SUTs), and aggregation/disaggregation routines (Wood 2015; Lenzen et al. 2009;
Majeau-Bettez et al. 2014b; Miller and Blair 2009). Data processing and analysis scripts, however, are not generally made
public, and we only know of two exceptions: De Koning and
colleagues (2015) and CIRAIG (2015). A recent comparative
study of six major MRIO frameworks by Lutter and Giljum
(2014, 7) finds a general lack of transparency: “Procedures for
manipulating IO tables, e.g. for disaggregating existing tables or
harmonizing IO tables from different national sources, [are] often not well documented.” This is a problem, given that these
models are rapidly gaining relevance in climate and resource
policy making.
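To make the scale argument concrete, the core calculation of an I-O model can be sketched for a play model with two sectors. The sketch assumes the standard Leontief demand-driven model, x = Ax + y, and solves (I - A)x = y with a hand-written elimination routine so that it has no dependencies. The coefficient matrix and final demand are invented numbers, and the function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [matrix | rhs] as floats.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def total_output(A, y):
    """Total sector output x that satisfies x = A x + y."""
    n = len(y)
    identity_minus_A = [
        [(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)
    ]
    return solve(identity_minus_A, y)


# Invented technical coefficients and final demand for two sectors.
A = [[0.2, 0.3],
     [0.4, 0.1]]
y = [100.0, 50.0]

x = total_output(A, y)
print(x)
```

With several hundred to 10^5 sectors, the same calculation requires dense or sparse linear algebra libraries rather than a hand-written solver, which is one reason dedicated software is needed.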
Databases
Databases for IE are more developed than model software.
Existing databases are accepted and widely used
by the community; they are comprehensive and often well
documented. Examples of such databases include ecoinvent
for LCA (Frischknecht et al. 2005; Weidema et al. 2013);
the Eora world MRIO model (Lenzen et al. 2013), IELab
(Lenzen et al. 2014), and EXIOBASE (Wood et al. 2014) for
I-O; and a global database of material flows for MFA
(SERI/WU 2014). Collaborative data frameworks have been
proposed by several researchers (Davis et al. 2010; Lenzen et al.
2014).
Level of software development: The importance of good software is widely understood within IE, but this is rarely reflected in the common practice of our field. No widely accepted, readily available implementations of many common computational routines exist. At present, most IE models are coded in spreadsheets or form monolithic blocks of software in various programming languages. They are developed as in-house projects for single-case studies. Often, the quality of documentation does not match the complexity of the code. In many cases, the code is difficult to reuse and is therefore abandoned.
Level of software openness: To our knowledge, there is neither an established standard nor a lively debate regarding the transparency and reproducibility of computations behind published quantitative research conducted under the label IE. A general lack of reproducibility may lower the scientific quality of the field as a whole, which, in the long run, can impede interaction with other scientific fields and the acquisition of research funding. Low levels of reproducibility and transparency exclude noninsiders from verifying the conclusions drawn, which can undermine the credibility of our research.
Guidelines for Developing, Testing, and Documenting Software Tools in Industrial Ecology
The Python Toolbox for Industrial Ecology
Life Cycle Assessment (Brightway2 and Extensions)
Brightway2 is a framework for LCA, covering everything
from data input/output (I/O) and processing to calculations and interpretation
(Mutel 2014). The software itself is split into different modules,
each with a specific focus and limited set of capabilities. In
addition to the core components, extension modules provide
user interfaces, regional and dynamic LCA, and data interfaces.
The ecospold2matrix module reorganizes the ecoinvent 3 database as a collection of matrices. It can notably:
- assemble the unallocated dataset in a supply and use table framework (see pySUT);
- perform basic quality checks on the reallocated data sets and arrange them as Leontief technical coefficient matrices with environmental extensions;
- optionally change sign conventions for waste flows and properties to align with the waste input-output (WIO) model (Nakamura and Kondo 2002).
Multiregional Input-Output Analysis (pymrio)
Approximately half a dozen environmentally extended
MRIO tables were published over the last 2 years (Tukker and
Dietzenbacher 2013). Most of these tables are freely available.
Each model has its own file format, classification, and indexing;
efficient handling and analyzing of MRIO models therefore requires a certain degree of training. The pymrio module (Stadler
2014a) allows for easy handling of global MRIO models. It provides a comprehensive, well-documented (Stadler 2014b) set
of commands for manipulating and analyzing (MR)IO tables,
including:
- parsing global MRIO tables;
- modifying region and sector classification;
- restructuring extensions;
- calculating various accounts (footprint, territorial, impacts embodied in trade);
- exporting to various formats (CSV, HTML, MS Excel);
- visualization and automated report generation.
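To illustrate what modifying a sector classification involves, the sketch below aggregates a small flow table into coarser sectors by summing all flows that map to the same pair of aggregated sectors. It is a generic, dependency-free sketch and does not use or reproduce the pymrio interface; the function name, sector names, mapping, and flow values are all invented for illustration.

```python
def aggregate_flows(flows, mapping):
    """Sum a {(origin, destination): value} flow table into aggregated sectors.

    `mapping` assigns every detailed sector to exactly one aggregated sector.
    """
    aggregated = {}
    for (origin, destination), value in flows.items():
        key = (mapping[origin], mapping[destination])
        aggregated[key] = aggregated.get(key, 0.0) + value
    return aggregated


# Invented inter-sector flows (arbitrary monetary units).
flows = {
    ("wheat", "wheat"): 1.0,
    ("wheat", "rice"): 2.0,
    ("rice", "steel"): 3.0,
    ("steel", "wheat"): 4.0,
    ("steel", "steel"): 5.0,
}

# Hypothetical concordance from detailed to aggregated sectors.
mapping = {
    "wheat": "agriculture",
    "rice": "agriculture",
    "steel": "manufacturing",
}

aggregated = aggregate_flows(flows, mapping)
print(aggregated)
```

Aggregation of this kind preserves the total of all flows, which gives a simple consistency check after reclassification. Each published MRIO table uses its own classification, so the concordance between classifications is usually the laborious part.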