Industrial Ecology Notes (Part 5)

Advancing Data Transparency in Industrial Ecology

Two actions would help the IE community move toward data transparency and accessibility:

(1) a minimum publication requirement for IE research to be adopted by the Journal of Industrial Ecology;

(2) a system of optional data openness badges rewarding journal articles that contain transparent and accessible data.

The FAIR Guiding Principles define a fundamental set of four attributes of open data; they should be: (1) findable; (2) accessible; (3) interoperable; and (4) reusable. It is envisaged that these attributes can be achieved if authors publish appropriate sets of metadata alongside their specific datasets. Here, we summarize the specific guidance provided in Wilkinson and colleagues (2016) for data to meet FAIR criteria:

(1) findable: (meta)data are indexed or archived with unique identifiers (e.g., digital object identifiers [DOIs]) at a searchable resource;
(2) accessible: (meta)data use an open standard for machine readability and are made permanently available;
(3) interoperable: (meta)data use standard data vocabularies, in a formal, open, and broadly applicable language, and include references to connected data;
(4) reusable: (meta)data are defined with relevant attributes for reuse, such as a clearly defined license statement.
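As a rough illustration of the four criteria above, a metadata record could be checked for the fields each attribute depends on. This is only a sketch: the field names, the placeholder DOI and URL, and the mapping from FAIR attributes to fields are assumptions made for the example, not a schema defined by Wilkinson and colleagues (2016).

```python
# Hypothetical metadata record; all field names and values are illustrative.
metadata = {
    "title": "Example material flow dataset",
    "identifier": "doi:10.0000/placeholder",       # placeholder, not a real DOI
    "access_url": "https://example.org/dataset",   # placeholder URL
    "format": "text/csv",                          # open, machine-readable format
    "vocabulary": "example-classification-v1",     # assumed name of a standard vocabulary
    "related_identifiers": ["doi:10.0000/related-placeholder"],
    "license": "CC-BY-4.0",
}

# Assumed mapping from each FAIR attribute to the fields that support it.
FAIR_FIELDS = {
    "findable": ["identifier"],
    "accessible": ["access_url", "format"],
    "interoperable": ["vocabulary", "related_identifiers"],
    "reusable": ["license"],
}


def missing_fair_fields(record):
    """Return, per FAIR attribute, the fields that are absent or empty."""
    return {
        attribute: [field for field in fields if not record.get(field)]
        for attribute, fields in FAIR_FIELDS.items()
    }


report = missing_fair_fields(metadata)
```

A record with every field filled in yields an empty list for each attribute; a record without a license statement would be flagged under "reusable".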

The data repository registry re3data now lists over 1,500 individual data repositories from multiple scientific fields, ranging from general-purpose ones such as Figshare, Zenodo, and Dryad to subject-specific ones such as GenBank for genetic sequence data, PANGAEA for Earth and Environmental Science, or the Interdisciplinary Earth Data Alliance.

We identify two key issues that often make inspection of IE research difficult: (1) digital data are typically inadequately identified; and (2) data extraction is often more difficult than necessary. The requirements below are intended to address these two issues.

Minimum Publication Requirement 1: Data citation: All secondary data and databases used in the analysis must be cited in accordance with the journal's citation style. This information can include database version, database settings (e.g., allocation), date accessed, and DOI, if pertinent. This requirement both clarifies data sources and provides incentives for publication of reusable and citable data. Data may be cited in the main section of the paper or in the supporting information.

Minimum Publication Requirement 2: Enumerate primary results: The data that are represented in each graph or figure in an article must be published in numerical form, clearly referenced in the text, and labeled. For example, a simple spreadsheet containing the quantitative data shown in figures and tables in an article fulfills this requirement; such data can be provided in supporting information or in a publicly accessible repository. This requirement should facilitate the unambiguous inspection and usage of quantitative information contained in all key results presented as figures and graphs. The underlying quantitative data would become directly accessible, avoiding the need to visually estimate them from figures or manually copy them from tables and thus avoiding any uncertainties or errors introduced from this process. This requirement aims to facilitate increased citation, reuse, and meta-analyses of published work.
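A minimal sketch of how Requirement 2 might be met in practice: the values behind a figure are written out as a labeled CSV table. The column names and all numbers are hypothetical placeholders, not data from any study.

```python
import csv
import io

# Hypothetical values standing in for the points plotted in a figure.
figure_data = [
    {"figure": "Figure 1", "series": "steel stock", "year": 2000, "value": 12.5, "unit": "Mt"},
    {"figure": "Figure 1", "series": "steel stock", "year": 2010, "value": 15.0, "unit": "Mt"},
]

COLUMNS = ["figure", "series", "year", "value", "unit"]


def figure_data_to_csv(rows):
    """Serialize figure data as CSV text with one header line and one line per data point."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = figure_data_to_csv(figure_data)
```

The resulting text can be saved to a file and placed in the supporting information or a repository. Keeping the figure label and unit in every row lets a reader match each number to the figure it came from.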

In all cases, the data supplied should be published in the supporting information or archived in a trusted repository, preferably an official repository which assigns DOIs, and cited accordingly in the original article. We expect practices in this regard to evolve as scientific publishing continues to address data transparency and accessibility.

Industrial ecologists: Make research results available! - International Society for Industrial Ecology - ISIE (is4ie.org)

Lifting Industrial Ecology Modeling to a New Level of Quality and Transparency

We introduce a Python toolbox for IE that includes the life cycle assessment (LCA) framework Brightway2, the ecospold2matrix module that parses unallocated data in ecospold format, the pySUT and pymrio modules for building and analyzing multiregion input-output models and supply and use tables, and the dynamic stock model class for dynamic stock modeling.

Input-Output

Specific software is needed for most I-O models because of the large number of system variables. Aside from play models for illustrative purposes, the number of sectors in the system typically lies between several hundred (single-region I-O) and 10^4 to 10^5 (for integrated hybrid models such as THEMIS or the Eora MRIO table [Hertwich et al. 2015; Lenzen et al. 2013]). I-O modeling requires many intermediate steps that require software, including balancing algorithms, trade linking tools for multiregion input-output (MRIO), constructs to build I-O models from supply and use tables (SUTs), and aggregation/disaggregation routines (Wood 2015; Lenzen et al. 2009; Majeau-Bettez et al. 2014b; Miller and Blair 2009). Data processing and analysis scripts, however, are not generally made public, and we only know of two exceptions: De Koning and colleagues (2015) and CIRAIG (2015). A recent comparative study of six major MRIO frameworks by Lutter and Giljum (2014, 7) finds a general lack of transparency: “Procedures for manipulating IO tables, e.g. for disaggregating existing tables or harmonizing IO tables from different national sources, [are] often not well documented.” This is a problem, given that these models are rapidly gaining relevance in climate and resource policy making.
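To make one of the intermediate steps named above concrete, the following sketch shows a simple sector aggregation of an intersectoral flow matrix using a concordance matrix. The four-sector matrix, the grouping into two sectors, and the function name are assumptions chosen for illustration; real aggregation routines also have to handle final demand, value added, and extensions.

```python
import numpy as np

# Hypothetical 4-sector flow matrix Z (rows sell to columns), in monetary units.
Z = np.array([
    [10.0, 2.0, 0.0, 1.0],
    [3.0, 8.0, 1.0, 0.0],
    [0.0, 1.0, 6.0, 2.0],
    [2.0, 0.0, 3.0, 5.0],
])

# Concordance matrix G: one row per aggregated sector, one column per original
# sector. Here sectors 1-2 are merged into group A and sectors 3-4 into group B.
G = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])


def aggregate_flows(flows, concordance):
    """Sum flows over the sector groups defined by the concordance matrix."""
    return concordance @ flows @ concordance.T


Z_agg = aggregate_flows(Z, G)
```

Because each original sector belongs to exactly one group, the total of all flows is preserved by the aggregation, which is a quick sanity check worth publishing alongside such a script.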

Databases

Databases for IE are further developed than is the case for model software. Existing databases are accepted and widely used by the community; they are comprehensive and often well documented. Examples for such databases include ecoinvent for LCA (Frischknecht et al. 2005; Weidema et al. 2013); the Eora world MRIO model (Lenzen et al. 2013), IELab (Lenzen et al. 2014), or EXIOBASE (Wood et al. 2014) for I-O, as well as a global database of materials flows for MFA (SERI/WU 2014). Collaborative data frameworks have been proposed by several researchers (Davis et al. 2010; Lenzen et al. 2014).

Level of software development:

The understanding of the importance of good software is ubiquitous within IE, but this is rarely reflected in common practice of our field. No widely accepted, readily available implementations of many common computational routines exist. At present, most IE models are coded in spreadsheets or form monolithic blocks of software in various programming languages. They are developed as in-house projects for single-case studies. Often, the quality of documentation does not match the complexity of the code. In many cases, the code is difficult to reuse and is therefore abandoned.

Level of software openness:

To our knowledge, there is neither an established standard nor a vivid debate regarding the transparency and reproducibility of computations behind published quantitative research conducted under the label IE. A general lack of reproducibility may lower the scientific quality of the field as a whole, which, in the long run, can impede interaction with other scientific fields and the acquisition of research funding. Low levels of reproducibility and transparency exclude noninsiders from verifying the conclusions drawn, which can undermine the credibility of our research.

Guidelines for Developing, Testing, and Documenting Software Tools in Industrial Ecology

The Python Toolbox for Industrial Ecology

Life Cycle Assessment (Brightway2 and Extensions)

Brightway2 is a framework for LCA, covering everything from data input/output and processing to calculations and interpretation (Mutel 2014). The software itself is split into different modules, each with a specific focus and limited set of capabilities. In addition to the core components, extension modules provide user interfaces, regional and dynamic LCA, and data interfaces.

The ecospold2matrix module reorganizes the ecoinvent 3 database as a collection of matrices. It can notably:

(1) assemble the unallocated dataset in a supply and use table framework (see pySUT);
(2) perform basic quality checks on the reallocated data sets and arrange them as Leontief technical coefficient matrices with environmental extensions;
(3) optionally change sign conventions for waste flows and properties to align with the waste input-output (WIO) model (Nakamura and Kondo 2002).
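For readers unfamiliar with the matrix form mentioned here, the sketch below shows how a Leontief technical coefficient matrix with an environmental extension is typically used once it has been assembled. It relies on plain NumPy rather than ecospold2matrix, and the two-process system, coefficients, and variable names are hypothetical.

```python
import numpy as np

# Hypothetical technical coefficient matrix A: A[i, j] is the input of
# product i required per unit output of process j.
A = np.array([
    [0.1, 0.2],
    [0.3, 0.1],
])

# Hypothetical environmental extension S: emissions per unit output of each process.
S = np.array([[0.5, 1.0]])

# Hypothetical final demand for the two products.
y = np.array([100.0, 50.0])

# Total output x solves (I - A) x = y; solving the linear system avoids
# forming the Leontief inverse explicitly.
x = np.linalg.solve(np.eye(2) - A, y)

# Total emissions caused by satisfying the final demand y.
emissions = S @ x
```

The same two lines scale to databases with thousands of processes, although sparse solvers are then usually preferable to dense ones.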

Multiregional Input-Output Analysis (pymrio)

Approximately half a dozen environmentally extended MRIO tables were published over the last 2 years (Tukker and Dietzenbacher 2013). Most of these tables are freely available. Each model has its own file format, classification, and indexing; efficient handling and analysis of MRIO models therefore requires a certain degree of training. The pymrio module (Stadler 2014a) allows for easy handling of global MRIO models. It provides a comprehensive, well-documented (Stadler 2014b) set of commands for manipulating and analyzing (MR)IO tables, including:

(1) parsing global MRIO tables;
(2) modifying region and sector classification;
(3) restructuring extensions;
(4) calculating various accounts (footprint, territorial, impacts embodied in trade);
(5) exporting to various formats (csv, html, MS Excel);
(6) visualization and automated report generation.
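The accounts listed above can be illustrated with a toy two-region system. The sketch below does not use pymrio itself; it is a plain NumPy example with hypothetical numbers and variable names, one sector per region, and no direct household emissions.

```python
import numpy as np

# Hypothetical interregional technical coefficient matrix (one sector per region).
A = np.array([
    [0.10, 0.05],
    [0.20, 0.10],
])

# Hypothetical final demand: rows are producing regions, columns are consuming regions.
Y = np.array([
    [80.0, 10.0],
    [20.0, 60.0],
])

# Hypothetical emission intensity of each region's production.
s = np.array([0.2, 0.8])

L = np.linalg.inv(np.eye(2) - A)   # Leontief inverse
x = L @ Y.sum(axis=1)              # total output per region

territorial = s * x                # emissions occurring within each region
footprint = s @ L @ Y              # emissions caused by each region's final demand

# Positive values mean a region's consumption causes more emissions than occur
# on its territory, i.e. it is a net importer of embodied emissions.
net_embodied_imports = footprint - territorial
```

Summed over all regions, footprint and territorial accounts are equal, so the net embodied imports cancel out globally. That identity is a useful check when comparing results across MRIO tables.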

Alan Fortuny Sicart
