Python API

Class IamDataFrame

class pyam.IamDataFrame(data, **kwargs)[source]

This class is a wrapper for dataframes following the IAMC format. It provides diagnostic features (including data validation and checks for completeness of the variables provided) as well as a range of visualization and plotting tools.

Methods

append(other, ignore_meta_conflict=False, inplace=False, **kwargs)[source]

Append any castable object to this IamDataFrame. Columns in other.meta that are not in self.meta are always merged; duplicate region-variable-unit-year rows raise a ValueError.

Parameters:

other: pyam.IamDataFrame, ixmp.TimeSeries, ixmp.Scenario, pd.DataFrame or data file

An IamDataFrame, TimeSeries or Scenario (requires ixmp), pandas.DataFrame or data file with IAMC-format data columns

ignore_meta_conflict : bool, default False

If False and other is an IamDataFrame, raise an error if any meta columns present in self and other are not identical.

inplace : bool, default False

If True, do operation inplace and return None
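The duplicate-row check described above can be sketched in plain pandas (an illustrative sketch with made-up example data; the actual method additionally merges metadata and casts the other supported input types):

```python
import pandas as pd

def append_timeseries(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    # Concatenate two IAMC-style tables; reject rows that duplicate the
    # model-scenario-region-variable-unit-year index, as append() does.
    idx = ["model", "scenario", "region", "variable", "unit", "year"]
    merged = pd.concat([a, b], ignore_index=True)
    if merged.duplicated(subset=idx).any():
        raise ValueError("duplicate rows in appended data")
    return merged

cols = ["model", "scenario", "region", "variable", "unit", "year", "value"]
a = pd.DataFrame([["m", "s", "World", "Primary Energy", "EJ/y", 2005, 1.0]],
                 columns=cols)
b = a.assign(year=2010)          # same index except for the year
combined = append_timeseries(a, b)
```

Appending a frame to itself would trip the duplicate check and raise a ValueError.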

as_pandas(with_metadata=False)[source]

Return this as a pd.DataFrame

Parameters:

with_metadata : bool, default False

if True, join data with existing metadata

bar_plot(*args, **kwargs)[source]

Plot timeseries bars of existing data

see pyam.plotting.bar_plot() for all available options

categorize(name, value, criteria, color=None, marker=None, linestyle=None)[source]

Assign scenarios to a category according to specific criteria or display the category assignment

Parameters:

name: str

category column name

value: str

category identifier

criteria: dict

dictionary with variables mapped to applicable checks (‘up’ and ‘lo’ for respective bounds, ‘year’ for years - optional)

color: str

assign a color to this category for plotting

marker: str

assign a marker to this category for plotting

linestyle: str

assign a linestyle to this category for plotting

check_aggregate(variable, components=None, units=None, exclude_on_fail=False, multiplier=1, **kwargs)[source]

Check whether the timeseries data match the aggregation of components or sub-categories

Parameters:

variable: str

variable to be checked for matching aggregation of sub-categories

components: list of str, default None

list of variables, defaults to all sub-categories of variable

units: str or list of str, default None

filter variable and components for given unit(s)

exclude_on_fail: boolean, default False

flag scenarios failing validation as exclude: True

multiplier: number, default 1

factor when comparing variable and sum of components

kwargs: passed to `np.isclose()`
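The core of the check can be sketched with numpy's isclose on hypothetical example data (illustrative only; the real method discovers sub-categories and handles units and the multiplier):

```python
import numpy as np
import pandas as pd

# Compare a variable's total against the sum of its sub-categories,
# as check_aggregate() does; variable names are hypothetical.
data = pd.DataFrame({
    "variable": ["Primary Energy", "Primary Energy|Coal", "Primary Energy|Gas"],
    "value": [10.0, 6.0, 4.0],
})
total = data.loc[data.variable == "Primary Energy", "value"].sum()
components = data.loc[data.variable != "Primary Energy", "value"].sum()
consistent = bool(np.isclose(total, components))  # rtol/atol kwargs pass through
```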

check_aggregate_regions(variable, region='World', components=None, units=None, exclude_on_fail=False, **kwargs)[source]

Check whether the region timeseries data match the aggregation of components

Parameters:

variable: str

variable to be checked for matching aggregation of components data

region: str

region to be checked for matching aggregation of components data

components: list of str, default None

list of regions, defaults to all regions except region

units: str or list of str, default None

filter variable and components for given unit(s)

exclude_on_fail: boolean, default False

flag scenarios failing validation as exclude: True

kwargs: passed to `np.isclose()`

check_internal_consistency(**kwargs)[source]

Check whether the database is internally consistent

We check that all variables are equal to the sum of their sectoral components and that all the regions add up to the World total. If the check is passed, None is returned, otherwise a dictionary of inconsistent variables is returned.

Note: at the moment, this method’s regional checking is limited to checking that all the regions sum to the World region. We cannot make this more automatic unless we start to store how the regions relate, see [this issue](https://github.com/IAMconsortium/pyam/issues/106).

Parameters:

kwargs: passed to `np.isclose()`

col_apply(col, func, *args, **kwargs)[source]

Apply a function to a column

Parameters:

col: string

column in either data or metadata

func: function

function to apply

convert_unit(conversion_mapping, inplace=False)[source]

Converts units based on provided unit conversion factors

Parameters:

conversion_mapping: dict

for each unit for which a conversion should be carried out, provide current unit and target unit and conversion factor {<current unit>: [<target unit>, <conversion factor>]}

inplace: bool, default False

if True, do operation inplace and return None
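The conversion_mapping format can be sketched as follows (an illustrative re-implementation with hypothetical units and an approximate conversion factor, not the library code):

```python
import pandas as pd

def convert_unit(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    # Apply {<current unit>: [<target unit>, <factor>]} conversions,
    # mirroring the mapping format documented above.
    df = df.copy()
    for current, (target, factor) in mapping.items():
        rows = df["unit"] == current
        df.loc[rows, "value"] *= factor
        df.loc[rows, "unit"] = target
    return df

df = pd.DataFrame({"unit": ["EJ/y", "EJ/y"], "value": [1.0, 2.0]})
out = convert_unit(df, {"EJ/y": ["TWh/y", 277.78]})
```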

export_metadata(path)[source]

Export metadata to Excel

Parameters:

path: string

path/filename for xlsx file of metadata export

filter(filters=None, keep=True, inplace=False, **kwargs)[source]

Return a filtered IamDataFrame (i.e., a subset of current data)

Parameters:

keep: bool, default True

keep all scenarios satisfying the filters (if True) or the inverse

inplace: bool, default False

if True, do operation inplace and return None

filters by kwargs or dict (deprecated):

The following columns are available for filtering:
  • metadata columns: filter by category assignment in metadata
  • ‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard
  • ‘level’: the maximum “depth” of IAM variables (number of ‘|’), excluding the strings given in the ‘variable’ argument
  • ‘year’: takes an integer, a list of integers or a range
    note that the last year of a range is not included, so range(2010,2015) is interpreted as [2010, ..., 2014]
  • ‘regexp=True’ overrides pseudo-regexp syntax in pattern_match()
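The wildcard behavior resembles shell-style globbing, and ‘level’ counts the ‘|’ separators; a rough sketch using the standard library (an approximation of pattern_match(), which has additional rules):

```python
from fnmatch import fnmatch

# '*' matches any substring, as in filter(variable='Primary Energy*').
variables = ["Primary Energy", "Primary Energy|Coal",
             "Secondary Energy|Electricity"]
matched = [v for v in variables if fnmatch(v, "Primary Energy*")]
# 'level' corresponds to the number of '|' separators in a variable name.
depth = {v: v.count("|") for v in variables}
```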
head(*args, **kwargs)[source]

Identical to pd.DataFrame.head() operating on data

interpolate(year)[source]

Interpolate missing values in timeseries (linear interpolation)

Parameters:

year: int

year to be interpolated
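Linear interpolation of a missing year reduces to the following (illustrative, with hypothetical values; the method applies this across all timeseries in the data):

```python
import numpy as np

# Interpolate a value for 2010 from the surrounding known years.
years = np.array([2005, 2015])
values = np.array([1.0, 3.0])
value_2010 = float(np.interp(2010, years, values))  # midpoint of 1.0 and 3.0
```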

line_plot(x='year', y='value', **kwargs)[source]

Plot timeseries lines of existing data

see pyam.plotting.line_plot() for all available options

load_metadata(path, *args, **kwargs)[source]

Load metadata exported from pyam.IamDataFrame instance

Parameters:

path: string

xlsx file with metadata exported from pyam.IamDataFrame instance

map_regions(map_col, agg=None, copy_col=None, fname=None, region_col=None, remove_duplicates=False, inplace=False)[source]

Map data from the current regions to a new regional classification (e.g., ISO codes or 5_region aggregates), optionally aggregating values

Parameters:

map_col: string

The column used to map new regions to. Common examples include iso and 5_region.

agg: string, optional

Perform a data aggregation. Options include: sum.

copy_col: string, optional

Copy the existing region data into a new column for later use.

fname: string, optional

Use a non-default region mapping file

region_col: string, optional

Use a non-default column name for regions to map from.

remove_duplicates: bool, optional, default: False

If there are duplicates in the mapping from one regional level to another, then remove these duplicates by counting the most common mapped value. This option is most useful when mapping from high resolution (e.g., model regions) to low resolution (e.g., 5_region).

inplace : bool, default False

if True, do operation inplace and return None

models()[source]

Get a list of models

pie_plot(*args, **kwargs)[source]

Plot a pie chart

see pyam.plotting.pie_plot() for all available options

pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)[source]

Returns a pivot table

Parameters:

index: str or list of strings

rows for Pivot table

columns: str or list of strings

columns for Pivot table

values: str, default ‘value’

dataframe column to aggregate or count

aggfunc: str or function, default ‘count’

function used for aggregation, accepts ‘count’, ‘mean’, and ‘sum’

fill_value: scalar, default None

value to replace missing values with

style: str, default None

output style for pivot table formatting accepts ‘highlight_not_max’, ‘heatmap’
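The method wraps the standard pandas pivot machinery; a sketch of the default aggfunc='count' behavior with hypothetical data (the actual method adds the styling options listed above):

```python
import pandas as pd

# Count entries per region and year, mirroring aggfunc='count'.
df = pd.DataFrame({
    "region": ["World", "World", "Asia"],
    "year": [2005, 2010, 2005],
    "value": [1.0, 2.0, 3.0],
})
table = pd.pivot_table(df, index="region", columns="year",
                       values="value", aggfunc="count", fill_value=0)
```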

region_plot(**kwargs)[source]

Plot regional data for a single model, scenario, variable, and year

see pyam.plotting.region_plot() for all available options

regions()[source]

Get a list of regions

rename(mapping, inplace=False)[source]

Rename and aggregate column entries using groupby.sum() on values. When renaming models or scenarios, the uniqueness of the index must be maintained, and the function will raise an error otherwise.

Parameters:

mapping: dict

for each column where entries should be renamed, provide current name and target name {<column name>: {<current_name_1>: <target_name_1>, <current_name_2>: <target_name_2>}}

inplace: bool, default False

if True, do operation inplace and return None
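The rename-then-aggregate behavior can be sketched in plain pandas (illustrative only; the real method also enforces index uniqueness when renaming models or scenarios):

```python
import pandas as pd

def rename_and_aggregate(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    # Rename entries per {<column>: {<old>: <new>}} and collapse the
    # resulting duplicates with groupby().sum() on 'value'.
    df = df.copy()
    for col, names in mapping.items():
        df[col] = df[col].replace(names)
    group_cols = [c for c in df.columns if c != "value"]
    return df.groupby(group_cols, as_index=False)["value"].sum()

df = pd.DataFrame({"region": ["R1", "R2"], "year": [2010, 2010],
                   "value": [1.0, 2.0]})
out = rename_and_aggregate(df, {"region": {"R1": "R", "R2": "R"}})
```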

require_variable(variable, unit=None, year=None, exclude_on_fail=False)[source]

Check whether all scenarios have a required variable

Parameters:

variable: str

required variable

unit: str, default None

name of unit (optional)

year: int or list, default None

years (optional)

exclude_on_fail: bool, default False

flag scenarios missing the required variable as exclude: True

reset_exclude()[source]

Reset exclusion assignment for all scenarios to exclude: False

scatter(x, y, **kwargs)[source]

Plot a scatter chart using metadata columns

see pyam.plotting.scatter() for all available options

scenarios()[source]

Get a list of scenarios

set_meta(meta, name=None, index=None)[source]

Add metadata columns as pd.Series, list or value (int/float/str)

Parameters:

meta: pd.Series, list, int, float or str

column to be added to metadata (by [‘model’, ‘scenario’] index if possible)

name: str, optional

meta column name (defaults to meta pd.Series.name); either a meta.name or the name kwarg must be defined

index: pyam.IamDataFrame, pd.DataFrame or pd.MultiIndex, optional

index to be used for setting meta column ([‘model’, ‘scenario’])

stack_plot(*args, **kwargs)[source]

Plot timeseries stacks of existing data

see pyam.plotting.stack_plot() for all available options

tail(*args, **kwargs)[source]

Identical to pd.DataFrame.tail() operating on data

timeseries()[source]

Returns a dataframe in the standard IAMC format

to_csv(path, index=False, **kwargs)[source]

Write data to a csv file

Parameters:

index: boolean, default False

write row names (index)

to_excel(path=None, writer=None, sheet_name='data', index=False, **kwargs)[source]

Write timeseries data to Excel using the IAMC template convention (wrapper for pd.DataFrame.to_excel())

Parameters:

path: string, optional

file path for the output file

writer: ExcelWriter, optional

existing ExcelWriter object to write into

sheet_name: string, default ‘data’

name of the sheet that will contain the (filtered) IamDataFrame

index: boolean, default False

write row names (index)

validate(criteria={}, exclude_on_fail=False)[source]

Validate scenarios using criteria on timeseries values

Parameters:

criteria: dict

dictionary with variable keys and check values

(‘up’ and ‘lo’ for respective bounds, ‘year’ for years)

exclude_on_fail: bool, default False

flag scenarios failing validation as exclude: True
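The criteria format can be sketched as follows (an illustrative re-implementation with hypothetical data; the real method operates per scenario and supports the exclude flag):

```python
import pandas as pd

def validate(df: pd.DataFrame, criteria: dict) -> pd.DataFrame:
    # Return rows violating {<variable>: {'up': ..., 'lo': ..., 'year': ...}}
    # bounds, mirroring the criteria format documented above.
    fails = []
    for variable, checks in criteria.items():
        rows = df[df["variable"] == variable]
        if "year" in checks:
            rows = rows[rows["year"] == checks["year"]]
        if "up" in checks:
            fails.append(rows[rows["value"] > checks["up"]])
        if "lo" in checks:
            fails.append(rows[rows["value"] < checks["lo"]])
    return pd.concat(fails) if fails else df.iloc[0:0]

df = pd.DataFrame({"variable": ["Temperature"] * 2,
                   "year": [2050, 2100], "value": [1.4, 2.5]})
failed = validate(df, {"Temperature": {"up": 2.0, "year": 2100}})
```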

variables(include_units=False)[source]

Get a list of variables

Parameters:

include_units: boolean, default False

include the units

Useful pyam functions

pyam.filter_by_meta(data, df, join_meta=False, **kwargs)[source]

Filter by and join meta columns from an IamDataFrame to a pd.DataFrame

Parameters:

data: pd.DataFrame instance

DataFrame to which meta columns are to be joined, index or columns must include [‘model’, ‘scenario’]

df: IamDataFrame instance

IamDataFrame from which meta columns are filtered and joined (optional)

join_meta: bool, default False

join selected columns from df.meta on data

kwargs:

meta columns to be filtered/joined, where col=… applies filters by the given arguments (using utils.pattern_match()) and col=None joins the column without filtering (setting col to np.nan if (model, scenario) not in df.meta.index)

pyam.cumulative(x, first_year, last_year)[source]

Returns the cumulative sum of a timeseries (indexed over years), using linear interpolation between years and ignoring nan's in the range. The sum includes the last-year value of the series; if first_year or last_year is outside the timeseries range, the function raises a warning and returns nan.

Parameters:

x: pandas.Series

a timeseries to be summed over time

first_year: int

first year of the sum

last_year: int

last year of the sum (inclusive)
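The interpolate-then-sum behavior can be sketched as follows (illustrative, with hypothetical values; the real function additionally handles nan's and out-of-range years as described above):

```python
import numpy as np
import pandas as pd

def cumulative(x: pd.Series, first_year: int, last_year: int) -> float:
    # Fill every year in [first_year, last_year] by linear interpolation,
    # then sum the annual values (last year inclusive).
    years = np.arange(first_year, last_year + 1)
    filled = np.interp(years, x.index.values, x.values)
    return float(filled.sum())

x = pd.Series([1.0, 3.0], index=[2005, 2007])
total = cumulative(x, 2005, 2007)  # 1.0 + 2.0 + 3.0
```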

pyam.fill_series(x, year)[source]

Returns the value of a timeseries (indexed over years) for a year by linear interpolation.

Parameters:

x: pandas.Series

a timeseries to be interpolated

year: int

year of interpolation

Class Statistics

This class provides a wrapper for generating descriptive summary statistics for timeseries data using various groupbys or filters. It uses pandas.DataFrame.describe() internally and hides the tedious work of filtering, groupby operations and merging of dataframes.

class pyam.Statistics(df, groupby=None, filters=None, rows=False, percentiles=[0.25, 0.5, 0.75])[source]

This class provides a wrapper for descriptive statistics of IAMC-style timeseries data.

Parameters:

df: pyam.IamDataFrame

an IamDataFrame from which to retrieve metadata for grouping, filtering

groupby: str or dict

a column of df.meta to be used for the groupby feature, or a dictionary of {column: list}, where list is used for ordering

filters: list of tuples

arguments for filtering and describing, either (index, dict) or ((index[0], index[1]), dict); when also using groupby, index must have length 2

percentiles: list-like of numbers, optional

The percentiles to include in the output of pandas.describe(). All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
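The underlying machinery is a grouped describe(); a minimal sketch with hypothetical data (the wrapper adds the filtering and merging described above):

```python
import pandas as pd

# Descriptive statistics per group, as the Statistics wrapper builds on.
df = pd.DataFrame({"category": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})
stats = df.groupby("category")["value"].describe(percentiles=[0.25, 0.5, 0.75])
```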

Methods

add(data, header, row=None, subheader=None)[source]

Filter data by arguments of this SummaryStats instance, then apply pandas.DataFrame.describe() and format the statistics

Parameters:

data : pd.DataFrame or pd.Series

data for which summary statistics should be computed

header : str

column name for descriptive statistics

row : str

row name for descriptive statistics (required if pyam.Statistics(rows=True))

subheader : str, optional

column name (level=1) if data is an unnamed pd.Series

reindex(copy=True)[source]

Reindex the summary statistics dataframe

summarize(center='mean', fullrange=None, interquartile=None, custom_format='{:.2f}')[source]

Format the compiled statistics to a concise string output