Python API
Class IamDataFrame
class pyam.IamDataFrame(data, **kwargs)
This class is a wrapper for dataframes following the IAMC format. It provides a number of diagnostic features (including validation of data and completeness of variables provided) as well as a number of visualization and plotting tools.
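A minimal sketch of creating an IamDataFrame from a wide-format pandas.DataFrame (the column layout follows the IAMC convention; all names and values below are illustrative):

    >>> import pandas as pd
    >>> import pyam
    >>> data = pd.DataFrame(
    ...     [['MESSAGE', 'scen_a', 'World', 'Primary Energy', 'EJ/y', 500, 600]],
    ...     columns=['model', 'scenario', 'region', 'variable', 'unit', 2010, 2020])
    >>> df = pyam.IamDataFrame(data=data)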
Methods
append(other, ignore_meta_conflict=False, inplace=False, **kwargs)
Append any castable object to this IamDataFrame. Columns in other.meta that are not in self.meta are always merged; duplicate region-variable-unit-year rows raise a ValueError.
Parameters: other: pyam.IamDataFrame, ixmp.TimeSeries, ixmp.Scenario, pd.DataFrame or data file
An IamDataFrame, TimeSeries or Scenario (requires ixmp), pandas.DataFrame or data file with IAMC-format data columns
ignore_meta_conflict: bool, default False
If False and other is an IamDataFrame, raise an error if any meta columns present in both self and other are not identical.
inplace: bool, default False
if True, do operation inplace and return None
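A brief sketch of appending another IamDataFrame (other_df is illustrative):

    >>> combined = df.append(other_df)     # returns a new IamDataFrame
    >>> df.append(other_df, inplace=True)  # or modify df directly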
as_pandas(with_metadata=False)
Return this IamDataFrame as a pandas.DataFrame
Parameters: with_metadata : bool, default False
if True, join data with existing metadata
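For example, to obtain a long-format pandas.DataFrame with the meta columns joined on:

    >>> pdf = df.as_pandas(with_metadata=True)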
bar_plot(*args, **kwargs)
Plot timeseries bars of existing data
see pyam.plotting.bar_plot() for all available options
categorize(name, value, criteria, color=None, marker=None, linestyle=None)
Assign scenarios to a category according to specific criteria or display the category assignment
Parameters: name: str
category column name
value: str
category identifier
criteria: dict
dictionary mapping variables to applicable checks (‘up’ and ‘lo’ for upper and lower bounds, ‘year’ for years; optional)
color: str
assign a color to this category for plotting
marker: str
assign a marker to this category for plotting
linestyle: str
assign a linestyle to this category for plotting
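A hedged sketch: assign scenarios whose temperature timeseries stays below a bound to a category (variable name and threshold are illustrative, not from this page):

    >>> df.categorize(
    ...     name='warming-category', value='Below 2C',
    ...     criteria={'Temperature|Global Mean': {'up': 2.0}},
    ...     color='cornflowerblue')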
check_aggregate(variable, components=None, units=None, exclude_on_fail=False, multiplier=1, **kwargs)
Check whether the timeseries data match the aggregation of components or sub-categories
Parameters: variable: str
variable to be checked for matching aggregation of sub-categories
components: list of str, default None
list of variables, defaults to all sub-categories of variable
units: str or list of str, default None
filter variable and components for given unit(s)
exclude_on_fail: bool, default False
flag scenarios failing validation as exclude: True
multiplier: number, default 1
factor when comparing variable and sum of components
kwargs: passed to np.isclose()
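For example, checking that a top-level variable equals the sum of its sub-categories (variable name illustrative; None is returned if the data are consistent, otherwise the inconsistent values):

    >>> diff = df.check_aggregate('Primary Energy')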
check_aggregate_regions(variable, region='World', components=None, units=None, exclude_on_fail=False, **kwargs)
Check whether the regional timeseries data match the aggregation of components
Parameters: variable: str
variable to be checked for matching aggregation of components data
region: str
region to be checked for matching aggregation of components data
components: list of str, default None
list of regions, defaults to all regions except region
units: str or list of str, default None
filter variable and components for given unit(s)
exclude_on_fail: bool, default False
flag scenarios failing validation as exclude: True
kwargs: passed to np.isclose()
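Analogously, a one-line sketch checking that regional values add up to the ‘World’ total:

    >>> df.check_aggregate_regions('Primary Energy', region='World')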
check_internal_consistency(**kwargs)
Check whether the database is internally consistent
We check that all variables are equal to the sum of their sectoral components and that all regions add up to the World total. If the check is passed, None is returned; otherwise, a dictionary of inconsistent variables is returned.
Note: at the moment, this method’s regional checking is limited to verifying that all regions sum to the World region. We cannot make this more automatic unless we start to store how the regions relate; see https://github.com/IAMconsortium/pyam/issues/106.
Parameters: kwargs: passed to np.isclose()
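A short sketch running the full check (a return value of None means all aggregation checks passed):

    >>> inconsistencies = df.check_internal_consistency()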
col_apply(col, func, *args, **kwargs)
Apply a function to a column
Parameters: col: string
column in either data or metadata
func: function
function to apply
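A minimal sketch, assuming the function is applied element-wise to the chosen column:

    >>> df.col_apply('region', str.upper)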
convert_unit(conversion_mapping, inplace=False)
Convert units based on provided unit conversion factors
Parameters: conversion_mapping: dict
for each unit for which a conversion should be carried out, provide current unit and target unit and conversion factor {<current unit>: [<target unit>, <conversion factor>]}
inplace: bool, default False
if True, do operation inplace and return None
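For example, converting all values reported in ‘EJ/y’ to ‘TWh/y’ (1 EJ = 277.778 TWh):

    >>> df.convert_unit({'EJ/y': ['TWh/y', 277.778]}, inplace=True)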
export_metadata(path)
Export metadata to Excel
Parameters: path: string
path/filename for xlsx file of metadata export
filter(filters=None, keep=True, inplace=False, **kwargs)
Return a filtered IamDataFrame (i.e., a subset of current data)
Parameters: keep: bool, default True
keep all scenarios satisfying the filters (if True) or the inverse
inplace: bool, default False
if True, do operation inplace and return None
filters by kwargs or dict (deprecated):
The following columns are available for filtering:
- metadata columns: filter by category assignment in metadata
- ‘model’, ‘scenario’, ‘region’, ‘variable’, ‘unit’: string or list of strings, where * can be used as a wildcard
- ‘level’: the maximum “depth” of IAM variables (number of ‘|’), excluding the strings given in the ‘variable’ argument
- ‘year’: takes an integer, a list of integers or a range; note that the last year of a range is not included, so range(2010, 2015) is interpreted as [2010, ..., 2014]
- ‘regexp=True’ overrides pseudo-regexp syntax in pattern_match()
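For example, keeping only ‘World’ data for CO2 emissions variables over 2020-2030 (names illustrative):

    >>> world_co2 = df.filter(region='World', variable='Emissions|CO2*',
    ...                       year=range(2020, 2031))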
interpolate(year)
Interpolate missing values in timeseries (linear interpolation)
Parameters: year: int
year to be interpolated
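For example, filling in 2015 values by linear interpolation from the surrounding timesteps:

    >>> df.interpolate(2015)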
line_plot(x='year', y='value', **kwargs)
Plot timeseries lines of existing data
see pyam.plotting.line_plot() for all available options
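A one-line sketch (assuming the color keyword is passed through to pyam.plotting.line_plot()):

    >>> df.line_plot(color='scenario')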
load_metadata(path, *args, **kwargs)
Load metadata exported from a pyam.IamDataFrame instance
Parameters: path: string
xlsx file with metadata exported from pyam.IamDataFrame instance
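A short round-trip sketch covering export_metadata() and load_metadata() (filename illustrative):

    >>> df.export_metadata('meta.xlsx')
    >>> df.load_metadata('meta.xlsx')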
map_regions(map_col, agg=None, copy_col=None, fname=None, region_col=None, remove_duplicates=False, inplace=False)
Map the regions of this IamDataFrame to a different regional classification
Parameters: map_col: string
The column used to map new regions to. Common examples include iso and 5_region.
agg: string, optional
Perform a data aggregation. Options include: sum.
copy_col: string, optional
Copy the existing region data into a new column for later use.
fname: string, optional
Use a non-default region mapping file
region_col: string, optional
Use a non-default column name for regions to map from.
remove_duplicates: bool, optional, default: False
If there are duplicates in the mapping from one regional level to another, then remove these duplicates by counting the most common mapped value. This option is most useful when mapping from high resolution (e.g., model regions) to low resolution (e.g., 5_region).
inplace: bool, default False
if True, do operation inplace and return None
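A hedged sketch: map model-native regions to ISO codes using the default region-mapping file, aggregating duplicates by sum:

    >>> iso_df = df.map_regions('iso', agg='sum')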
pie_plot(*args, **kwargs)
Plot a pie chart
see pyam.plotting.pie_plot() for all available options
pivot_table(index, columns, values='value', aggfunc='count', fill_value=None, style=None)
Returns a pivot table
Parameters: index: str or list of strings
rows for the pivot table
columns: str or list of strings
columns for the pivot table
values: str, default ‘value’
dataframe column to aggregate or count
aggfunc: str or function, default ‘count’
function used for aggregation; accepts ‘count’, ‘mean’, and ‘sum’
fill_value: scalar, default None
value to replace missing values with
style: str, default None
output style for pivot table formatting; accepts ‘highlight_not_max’ and ‘heatmap’
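For example, averaging values by region (rows) and year (columns):

    >>> table = df.pivot_table(index='region', columns='year',
    ...                        values='value', aggfunc='mean')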
region_plot(**kwargs)
Plot regional data for a single model, scenario, variable, and year
see pyam.plotting.region_plot() for all available options
rename(mapping, inplace=False)
Rename and aggregate column entries using groupby.sum() on values. When renaming models or scenarios, the uniqueness of the index must be maintained, and the function will raise an error otherwise.
Parameters: mapping: dict
for each column where entries should be renamed, provide current name and target name {<column name>: {<current_name_1>: <target_name_1>, <current_name_2>: <target_name_2>}}
inplace: bool, default False
if True, do operation inplace and return None
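For example, harmonizing a scenario label (rows whose index becomes identical after renaming are aggregated):

    >>> df.rename({'scenario': {'scen_a': 'scen-a'}}, inplace=True)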
require_variable(variable, unit=None, year=None, exclude_on_fail=False)
Check whether all scenarios have a required variable
Parameters: variable: str
required variable
unit: str, default None
name of unit (optional)
year: int or list, default None
years (optional)
exclude_on_fail: bool, default False
flag scenarios missing the required variable as exclude: True
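For example, listing scenarios that do not report a required variable in 2030 and flagging them (variable name illustrative):

    >>> missing = df.require_variable('Primary Energy', year=2030,
    ...                               exclude_on_fail=True)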
scatter(x, y, **kwargs)
Plot a scatter chart using metadata columns
see pyam.plotting.scatter() for all available options
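A one-line sketch, assuming two hypothetical meta columns:

    >>> df.scatter(x='cumulative-emissions', y='warming')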
set_meta(meta, name=None, index=None)
Add metadata columns as pd.Series, list or value (int/float/str)
Parameters: meta: pd.Series, list, int, float or str
column to be added to metadata (by [‘model’, ‘scenario’] index if possible)
name: str, optional
meta column name (defaults to meta pd.Series.name); either meta.name or the name kwarg must be defined
index: pyam.IamDataFrame, pd.DataFrame or pd.MultiIndex, optional
index to be used for setting meta column ([‘model’, ‘scenario’])
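For example, setting a meta column from a pd.Series indexed by [‘model’, ‘scenario’] (names and values illustrative):

    >>> import pandas as pd
    >>> idx = pd.MultiIndex.from_tuples([('MESSAGE', 'scen_a')],
    ...                                 names=['model', 'scenario'])
    >>> df.set_meta(pd.Series([2.5], index=idx, name='climate-sensitivity'))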
stack_plot(*args, **kwargs)
Plot timeseries stacks of existing data
see pyam.plotting.stack_plot() for all available options
to_csv(path, index=False, **kwargs)
Write data to a csv file
Parameters: index: boolean, default False
write row names (index)
to_excel(path=None, writer=None, sheet_name='data', index=False, **kwargs)
Write timeseries data to Excel using the IAMC template convention (wrapper for pd.DataFrame.to_excel())
Parameters: path: string, default None
file path for the output file (if no writer is given)
writer: ExcelWriter, default None
existing pandas ExcelWriter object
sheet_name: string, default ‘data’
name of the sheet that will contain the (filtered) IamDataFrame
index: boolean, default False
write row names (index)
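For example, writing the (possibly filtered) data to an IAMC-formatted workbook:

    >>> df.to_excel('output.xlsx', sheet_name='data')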
validate(criteria={}, exclude_on_fail=False)
Validate scenarios using criteria on timeseries values
Parameters: criteria: dict
dictionary with variable keys and check values (‘up’ and ‘lo’ for respective bounds, ‘year’ for years)
exclude_on_fail: bool, default False
flag scenarios failing validation as exclude: True
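For example, flagging scenarios where a variable exceeds an upper bound in a given year (names and values illustrative; the failing data points are returned, if any):

    >>> failed = df.validate(
    ...     criteria={'Primary Energy': {'up': 1000, 'year': 2050}},
    ...     exclude_on_fail=True)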
Useful pyam functions
pyam.filter_by_meta(data, df, join_meta=False, **kwargs)
Filter by and join meta columns from an IamDataFrame to a pd.DataFrame
Parameters: data: pd.DataFrame instance
DataFrame to which meta columns are to be joined, index or columns must include [‘model’, ‘scenario’]
df: IamDataFrame instance
IamDataFrame from which meta columns are filtered and joined (optional)
join_meta: bool, default False
join selected columns from df.meta on data
kwargs:
meta columns to be filtered/joined, where col=… applies filters by the given arguments (using utils.pattern_match()) and col=None joins the column without filtering (setting col to np.nan if (model, scenario) not in df.meta.index)
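A hedged sketch: filter a plain DataFrame by a meta column and join it (‘category’ is a hypothetical meta column):

    >>> filtered = pyam.filter_by_meta(data, df, join_meta=True,
    ...                                category='Below 2C')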
pyam.cumulative(x, first_year, last_year)
Returns the cumulative sum of a timeseries (indexed over years), implementing linear interpolation between years and ignoring nan’s in the range. The function includes the last-year value of the series; it raises a warning and returns nan if first_year or last_year is outside of the timeseries range.
Parameters: x: pandas.Series
a timeseries to be summed over time
first_year: int
first year of the sum
last_year: int
last year of the sum (inclusive)
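For example, summing a sparse annual series over 2020-2050 with linear interpolation between the reported years:

    >>> import pandas as pd
    >>> ts = pd.Series([10.0, 20.0], index=[2020, 2050])
    >>> total = pyam.cumulative(ts, first_year=2020, last_year=2050)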
Class Statistics
This class provides a wrapper for generating descriptive summary statistics for timeseries data using various groupbys or filters. It uses the pandas.describe() function internally and hides the tedious work of filters, groupbys and merging of dataframes.
class pyam.Statistics(df, groupby=None, filters=None, rows=False, percentiles=[0.25, 0.5, 0.75])
This class provides a wrapper for descriptive statistics of IAMC-style timeseries data.
Parameters: df: pyam.IamDataFrame
an IamDataFrame from which to retrieve metadata for grouping, filtering
groupby: str or dict
a column of df.meta to be used for the groupby feature, or a dictionary of {column: list}, where list is used for ordering
filters: list of tuples
arguments for filtering and describing, either (index, dict) or ((index[0], index[1]), dict); when also using groupby, index must have length 2.
percentiles: list-like of numbers, optional
The percentiles to include in the output of pandas.describe(). All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
Methods
add(data, header, row=None, subheader=None)
Filter data by arguments of this Statistics instance, then apply pd.describe() and format the statistics
Parameters: data : pd.DataFrame or pd.Series
data for which summary statistics should be computed
header : str
column name for descriptive statistics
row : str
row name for descriptive statistics (required if pyam.Statistics(rows=True))
subheader : str, optional
column name (level=1) if data is an unnamed pd.Series
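A hedged sketch of the workflow; the ‘category’ meta column, the timeseries() accessor, and the summarize() call are assumptions based on the package’s tutorials, not on this section:

    >>> stats = pyam.Statistics(df=df, groupby={'category': ['Below 2C']})
    >>> pe = df.filter(variable='Primary Energy').timeseries()[2050]
    >>> stats.add(data=pe, header='Primary Energy', subheader='2050')
    >>> summary = stats.summarize()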