message_ix_models.project.ssp.data.SSPOriginal

class message_ix_models.project.ssp.data.SSPOriginal(*args, **kwargs)

Bases: SSPDataSource

Provider of exogenous data from the original SSP database.

This database is accessible at https://tntcat.iiasa.ac.at/SspDb/dsd.

To use data from this source:

  1. Read the general documentation for project.ssp.data.

  2. If necessary, obtain a copy of the original data file(s).

  3. Call SSPOriginal.add_tasks() with keyword arguments corresponding to SSPDataSource.Options. In particular:

    • model should be one of:

      • IIASA GDP

      • IIASA-WiC POP

      • NCAR

      • OECD Env-Growth

      • PIK GDP-32

    • measure: The measures available differ according to the model; see the source data for details.

    • unit is not recognized and has no effect.

Example

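>>> from message_ix_models.project.ssp.data import SSPOriginal
>>> # `computer` is a genno.Computer; `context` is a message_ix_models Context
>>> keys = SSPOriginal.add_tasks(
...     computer, context=context, ssp_id="3", measure="POP", model="IIASA-WiC POP",
... )
>>> result = computer.get(keys[0])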

__init__(*args, **kwargs) → None

Create an instance and prepare info for transform()/get().

The base implementation:

  • Sets options—if not already set—by passing kwargs to Options.

  • Raises an exception if there are other/unhandled args or kwargs.

  • If key is not set, constructs it with:

    • Name: name or measure, in lower case.

    • Dimensions: dims.

    Subclasses may pre-empt this behaviour by setting key statically or dynamically.

A concrete class implementation must:

  • Set options, either directly or by calling super().__init__() with or without keyword arguments.

  • Set key, either directly or by calling super().__init__(). In the latter case, it may set name, measure, and/or dims to control the behaviour.

  • Raise an exception if unrecognized or invalid kwargs are passed.

and may:

  • Transform kwargs or options arguments into other values, for instance by mapping certain values to others, applying regular expressions, or other operations.

  • Store those values as instance attributes for use in get().

  • Log messages that give information that helps to debug exceptions.

It must not perform any time- or memory-intensive operations, such as actually loading or fetching data. Those operations should be in get().
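The contract above can be sketched with a stand-alone class. This is a minimal illustration under stated assumptions, not the package's code: `DemoOptions`, `DemoSource`, and the plain-string key are hypothetical stand-ins, and nothing is imported from message_ix_models or genno.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoOptions:
    """Hypothetical stand-in for an Options class."""

    measure: str = ""
    name: str = ""
    dims: tuple = ("n", "y")


class DemoSource:
    """Hypothetical stand-in for a concrete data source class."""

    def __init__(self, *args, **kwargs):
        # Reject positional arguments and unrecognized keyword arguments
        known = {f.name for f in fields(DemoOptions)}
        unknown = set(kwargs) - known
        if args or unknown:
            raise ValueError(f"Unhandled args {args!r} or kwargs {sorted(unknown)}")

        # Set options from the keyword arguments
        self.options = DemoOptions(**kwargs)

        # Build a key-like label: name or measure in lower case, plus dims.
        # A plain string is used here; the real class uses a genno Key.
        name = (self.options.name or self.options.measure).lower()
        self.key = f"{name}:{'-'.join(self.options.dims)}"

        # No data is loaded here; that is deferred to get()
        self.loaded = False

    def get(self):
        # Expensive loading or fetching belongs here, not in __init__()
        self.loaded = True
        return {}


source = DemoSource(measure="POP")
print(source.key)  # pop:n-y
```

Creating an instance stays cheap because it only validates arguments and prepares the key. Data access happens only when get() is called.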

Methods

__init__(*args, **kwargs)

Create an instance and prepare info for transform()/get().

add_tasks(c, *args[, context, strict])

Add tasks to c to provide and transform the data.

get()

Return the data.

make_query(dim_case, model_scenario, unit)

Assemble and store a pandas.DataFrame.query() string.

transform(c, base_key)

Add tasks to c to transform raw data from base_key.

Attributes

filename

Name of file containing the data.

model_date

One-to-one correspondence between "model" codes and date fragments in scenario codes.

replace

Replacements to apply when loading the data.

unique

unique argument to iamc.to_quantity().

use_test_data

True to allow the class to look up and use test data.

variable

Alias from short measure IDs to IAMC 'variable'.

where

where argument to path_fallback().

options

Instance of the Options class.

key

Key for the returned Quantity.

class Options(aggregate: bool = True, interpolate: bool = True, measure: str = '', name: str = '', dims: tuple[str, ...] = ('n', 'y'), model: str = '', source: str = '', ssp_id: str = '')

Bases: BaseOptions

aggregate: bool = True

True if ExoDataSource.transform() should aggregate data on the \(n\) dimension.

dims: tuple[str, ...] = ('n', 'y')

Dimensions for the returned Key/Quantity.

classmethod from_args(source_id: str | ExoDataSource, *args, **kwargs)

Construct an instance from keyword arguments.

Parameters:

source_id – For backwards-compatibility with prepare_computer().

handle_source(prefix: str) → None

Check that source starts with prefix; update ssp_id.

interpolate: bool = True

True if ExoDataSource.transform() should interpolate data on the \(y\) dimension.

measure: str = ''

Identifier for the primary measure of retrieved/returned data.

model: str = ''

Model name.

name: str = ''

Name for the returned Key/Quantity.

source: str = ''

Partial URN for a code in the SSP code list, e.g. “ICONICS:SSP(2017).1”. ssp_id should be preferred.

ssp_id: str = ''

Short ID of the SSP code, e.g. “1”.
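The relationship between source and ssp_id that handle_source() manages might look like the following sketch. The helper `ssp_id_from_source` is hypothetical, and the prefix value passed to it is an assumption chosen to match the example URN given for source; the actual method may behave differently.

```python
def ssp_id_from_source(source: str, prefix: str) -> str:
    """Return the short SSP ID encoded at the end of `source`.

    Hypothetical helper; raises ValueError if `source` lacks `prefix`.
    """
    if not source.startswith(prefix):
        raise ValueError(f"{source!r} does not start with {prefix!r}")
    # Whatever follows the prefix and its "." separator is taken as the short ID
    return source[len(prefix):].lstrip(".")


# Assumed prefix value, chosen to match the example URN given for `source`
print(ssp_id_from_source("ICONICS:SSP(2017).1", "ICONICS:SSP(2017)"))  # 1
```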

classmethod add_tasks(c: Computer, *args, context: Context | None = None, strict: bool = True, **kwargs) → tuple

Add tasks to c to provide and transform the data.

The first returned key is key, and will trigger the following tasks:

  1. Load or retrieve data by invoking ExoDataSource.get().

  2. If BaseOptions.aggregate is True, aggregate on the \(n\) (node) dimension according to Config.regions.

  3. If BaseOptions.interpolate is True, interpolate on the \(y\) (year) dimension according to Config.years.

Steps (2) and (3) are added by transform() and may differ in concrete classes.

Other returned keys include further transformations:

  • key + "y0_indexed": same as key, but indexed to the values as of the first model period.
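As a rough illustration of the "y0_indexed" quantity, the sketch below divides each value by the value in the first model period. It assumes that "indexed" means exactly this ratio; the function name `index_to_first` and the numbers are hypothetical.

```python
def index_to_first(values: dict, y0: int) -> dict:
    """Divide every value by the value in period `y0` (hypothetical helper)."""
    base = values[y0]
    return {year: value / base for year, value in values.items()}


# Made-up values for a single node, keyed by year
population = {2020: 50.0, 2030: 75.0, 2040: 100.0}
print(index_to_first(population, 2020))  # {2020: 1.0, 2030: 1.5, 2040: 2.0}
```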

Other keys that are created but not returned can be accessed on c:

  • key + "message_ix_models.foo.bar.CLASS": the raw data, with a tag from the fully-qualified name of the ExoDataSource class.

To support the loading and transformation of data, add_structure() is first called with c.

Todo

Add option/tasks to index to a particular label on the \(n\) dimension.

Parameters:
  • context – Passed to add_structure().

  • strict – Passed to add_structure().

Return type:

tuple of Key

filename = 'SspDb_country_data_2013-06-12.csv.zip'

Name of file containing the data.

get()

Return the data.

Implementations in concrete classes may load data from file, retrieve from remote sources or local caches, generate data, or anything else.

The Quantity returned by this method must have dimensions corresponding to key. If the original/upstream/raw data has different dimensionality (fewer or more dimensions; different dimension IDs), a concrete class must transform these, make appropriate selections, etc.

key: Key

Key for the returned Quantity. This may either be set statically on a concrete subclass, or created via __init__().

make_query(dim_case: Callable[[str], str], model_scenario: Iterable[tuple[str, str]], unit: str) → None

Assemble and store a pandas.DataFrame.query() string.

Parameters:
  • dim_case – Function to apply to IAMC dimension IDs, for instance str.upper() to use “MODEL”.

  • model_scenario – Iterable of (model_name, scenario_name) pairs. model_name may be an empty string.

  • unit – Units. May be an empty string.
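One way such a query string could be assembled from these three parameters is sketched below. This is an assumption about the general approach, not the method's actual implementation: `build_query` is a hypothetical helper that returns the string instead of storing it, and the scenario names are placeholders.

```python
from typing import Callable, Iterable


def build_query(
    dim_case: Callable[[str], str],
    model_scenario: Iterable[tuple[str, str]],
    unit: str,
) -> str:
    """Hypothetical helper: assemble a string for pandas.DataFrame.query()."""
    pairs = []
    for model, scenario in model_scenario:
        parts = []
        if model:  # An empty model name adds no condition
            parts.append(f"{dim_case('model')} == {model!r}")
        parts.append(f"{dim_case('scenario')} == {scenario!r}")
        pairs.append("(" + " and ".join(parts) + ")")

    clauses = []
    if pairs:
        clauses.append("(" + " or ".join(pairs) + ")")
    if unit:  # An empty unit adds no condition
        clauses.append(f"{dim_case('unit')} == {unit!r}")
    return " and ".join(clauses)


query = build_query(str.upper, [("IIASA GDP", "scenario-a"), ("", "scenario-b")], "")
print(query)
# ((MODEL == 'IIASA GDP' and SCENARIO == 'scenario-a') or (SCENARIO == 'scenario-b'))
```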

model_date = {'IIASA GDP': '130219', 'IIASA-WiC POP': '130115', 'NCAR': '130115', 'OECD Env-Growth': '130325', 'PIK GDP-32': '130424'}

One-to-one correspondence between “model” codes and date fragments in scenario codes.

options: Options

Instance of the Options class.

A concrete class that overrides Options should redefine this attribute, to facilitate type checking.

replace: dict[str, str | dict[str, str]] = {'billion US$2005/yr': 'billion USD_2005/yr'}

Replacements to apply when loading the data.

transform(c: Computer, base_key: Key) → Key

Add tasks to c to transform raw data from base_key.

base_key refers to the Quantity returned by get(). Via add_tasks(), transform() adds additional tasks to c that further transform the data. (Such operations may be done in get() directly, but transform() allows use of genno operators and conveniences.)

In the default implementation:

  1. If aggregate is True, aggregate the data (genno.operator.aggregate()) on the \(n\) dimension using the key “n::groups”.

  2. If interpolate is True, interpolate the data (genno.operator.interpolate()) on the \(y\) dimension using “y::coords”.

Concrete classes may override this method to, for instance, change how aggregate and interpolate are handled, or add further steps. Such overrides may call the base implementation, or not.

Returns:

A key referring to the data from base_key after any transformation. This may be the same as base_key.

Return type:

Key
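The two default steps of transform() can be illustrated in plain Python, without genno. The sketch assumes that aggregation sums the members of each group and that interpolation is linear between known years; the function names, the region code `R_EU`, and the numbers are hypothetical.

```python
def aggregate(data: dict, groups: dict) -> dict:
    """Sum values over the members of each group on the node dimension."""
    result = {}
    for group, members in groups.items():
        for (node, year), value in data.items():
            if node in members:
                result[(group, year)] = result.get((group, year), 0.0) + value
    return result


def interpolate(series: dict, years: list) -> dict:
    """Linearly interpolate {year: value} to `years` lying within the known range."""
    known = sorted(series)
    out = {}
    for year in years:
        if year in series:
            out[year] = series[year]
            continue
        lo = max(k for k in known if k < year)
        hi = min(k for k in known if k > year)
        weight = (year - lo) / (hi - lo)
        out[year] = series[lo] + weight * (series[hi] - series[lo])
    return out


# Made-up country-level data keyed by (node, year)
raw = {("AUT", 2020): 9.0, ("DEU", 2020): 83.0, ("AUT", 2030): 9.5, ("DEU", 2030): 84.5}
regional = aggregate(raw, {"R_EU": ["AUT", "DEU"]})
series = {year: value for (_, year), value in regional.items()}
print(interpolate(series, [2020, 2025, 2030]))  # {2020: 92.0, 2025: 93.0, 2030: 94.0}
```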

unique: str = 'MODEL SCENARIO VARIABLE UNIT'

unique argument to iamc.to_quantity().

use_test_data: bool = False

True to allow the class to look up and use test data. If no test data exists, this setting has no effect. See _where().

variable = {'GDP': 'GDP|PPP', 'POP': 'Population'}

Alias from short measure IDs to IAMC ‘variable’. See make_query().
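The model_date and variable mappings could feed a query roughly as follows. The dictionary values are copied from the attributes documented above, but two things are assumptions: the scenario-code pattern `SSP<id>_v9_<date fragment>`, and the fallback to the measure ID itself when no alias exists. The function `resolve` is hypothetical.

```python
# Values copied from the model_date and variable attributes documented above
MODEL_DATE = {
    "IIASA GDP": "130219",
    "IIASA-WiC POP": "130115",
    "NCAR": "130115",
    "OECD Env-Growth": "130325",
    "PIK GDP-32": "130424",
}
VARIABLE = {"GDP": "GDP|PPP", "POP": "Population"}


def resolve(model: str, measure: str, ssp_id: str) -> tuple:
    """Return (IAMC variable, scenario code) for the given options."""
    if model not in MODEL_DATE:
        raise ValueError(f"model must be one of {sorted(MODEL_DATE)}; got {model!r}")
    # Assumption: fall back to the measure ID itself when there is no alias
    variable = VARIABLE.get(measure, measure)
    # Assumption: scenario codes look like "SSP<id>_v9_<date fragment>"
    scenario = f"SSP{ssp_id}_v9_{MODEL_DATE[model]}"
    return variable, scenario


print(resolve("IIASA-WiC POP", "POP", "3"))  # ('Population', 'SSP3_v9_130115')
```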

where: list['str | Path'] = ['local', 'package', 'private']

where argument to path_fallback(). In order:

  1. Currently data is stored in message-static-data, cloned and linked from within the user’s ‘local’ data directory.

  2. Previously some files were stored directly within message_ix_models (available in an editable install from a clone of the git repository, ‘package’) or in message_data (‘private’). These settings are only provided for backward compatibility.

Fuzzed/random test data (‘test’) is also available, but not enabled by default.
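The ordered lookup implied by where can be sketched with the standard library alone. This does not reproduce path_fallback() or its signature; `first_existing` is a hypothetical helper, and the directories are temporary stand-ins for the 'local', 'package', and 'private' locations.

```python
import tempfile
from pathlib import Path


def first_existing(filename: str, locations: list) -> tuple:
    """Return (label, path) for the first location that contains `filename`."""
    for label, directory in locations:
        candidate = directory / filename
        if candidate.exists():
            return label, candidate
    raise FileNotFoundError(f"{filename!r} not found in any of the given locations")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Stand-in directories, searched in the same order as `where`
    locations = [(name, root / name) for name in ("local", "package", "private")]
    for _, directory in locations:
        directory.mkdir()
    # Place the file only in the second location
    (root / "package" / "data.csv").write_text("x")
    found_label, found_path = first_existing("data.csv", locations)

print(found_label)  # package
```

Because the search stops at the first match, a file in the 'local' directory would take precedence over copies in the older locations.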