Energy Score
The energy score (ES) is a scoring rule for evaluating multivariate probabilistic forecasts. It is defined as

\[
\mathrm{ES}(F, \mathbf{y}) = \mathbb{E} \| \mathbf{X} - \mathbf{y} \| - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \|,
\]

where \(\| \cdot \|\) is the Euclidean norm, \(\mathbf{y} \in \mathbb{R}^{d}\) is the multivariate observation (\(d > 1\)), and \(\mathbf{X}\) and \(\mathbf{X}^{\prime}\) are independent random vectors that follow the multivariate forecast distribution \(F\) (Gneiting and Raftery, 2007)[1]. If the dimension \(d\) were equal to one, the energy score would reduce to the continuous ranked probability score (CRPS).
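To make the expectations concrete, the sketch below estimates the energy score of a parametric forecast by Monte Carlo: two independent samples stand in for \(\mathbf{X}\) and \(\mathbf{X}^{\prime}\), and their sample means replace the expectations. The helpers `energy_score_mc` and `crps_normal` are hypothetical names written for illustration (they are not part of scoringrules), and the standard normal forecast and the observation values are arbitrary choices. Setting \(d = 1\) lets the estimate be compared with the standard closed-form CRPS of a normal distribution, which illustrates the reduction to the CRPS.

```python
import math

import numpy as np


def energy_score_mc(sample_x, sample_x_prime, y):
    """Monte Carlo estimate of ES(F, y) = E||X - y|| - 0.5 * E||X - X'||.

    sample_x, sample_x_prime: independent draws from F, shape (n, d).
    y: observation, shape (d,).
    """
    term1 = np.linalg.norm(sample_x - y, axis=-1).mean()
    term2 = np.linalg.norm(sample_x - sample_x_prime, axis=-1).mean()
    return term1 - 0.5 * term2


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of the normal distribution N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


rng = np.random.default_rng(42)
n = 200_000

# Bivariate standard normal forecast (d = 2) evaluated at an arbitrary observation.
x = rng.standard_normal((n, 2))
x_prime = rng.standard_normal((n, 2))
es_2d = energy_score_mc(x, x_prime, np.array([0.5, -1.0]))

# With d = 1 the estimate should approach the CRPS of N(0, 1) at the same point.
x1 = rng.standard_normal((n, 1))
x1_prime = rng.standard_normal((n, 1))
es_1d = energy_score_mc(x1, x1_prime, np.array([0.5]))
print(es_1d, crps_normal(0.0, 1.0, 0.5))
```

With this many draws the two printed numbers should agree to within a few thousandths; the Monte Carlo error shrinks at the usual \(1/\sqrt{n}\) rate.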
While multivariate probabilistic forecasts could belong to a parametric family of distributions, such as the multivariate normal distribution, in practice they are more commonly ensemble forecasts; that is, the forecast consists of a predictive sample \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\), where each ensemble member \(\mathbf{x}_{m} \in \mathbb{R}^{d}\).
In this case, the expectations in the definition of the energy score can be replaced by sample means over the ensemble members, yielding the following representation of the energy score for an ensemble forecast \(F_{ens}\) with \(M\) members:

\[
\mathrm{ES}(F_{ens}, \mathbf{y}) = \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| - \frac{1}{2M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \|.
\]
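The ensemble representation translates almost directly into NumPy. The following is a minimal, unoptimised sketch for a single observation, assuming the observation has shape `(d,)` and the ensemble has shape `(M, d)`; `energy_score_ens` is a hypothetical name, not the scoringrules implementation. The double sum is evaluated by broadcasting all pairwise differences, which needs \(O(M^{2} d)\) memory.

```python
import numpy as np


def energy_score_ens(obs, fct):
    """Ensemble energy score for one observation.

    obs: observation, shape (d,).
    fct: ensemble members x_1, ..., x_M, shape (M, d).
    """
    obs = np.asarray(obs, dtype=float)
    fct = np.asarray(fct, dtype=float)
    m = fct.shape[0]
    # (1 / M) * sum_m ||x_m - y||
    term1 = np.linalg.norm(fct - obs, axis=-1).sum() / m
    # (1 / (2 M^2)) * sum_m sum_j ||x_m - x_j||, via an (M, M, d) difference array
    pairwise = np.linalg.norm(fct[:, None, :] - fct[None, :, :], axis=-1)
    term2 = pairwise.sum() / (2 * m**2)
    return term1 - term2


# Two members (0, 0) and (3, 4), observation (0, 0):
# term1 = (0 + 5) / 2 = 2.5, term2 = (5 + 5) / 8 = 1.25, so ES = 1.25.
print(energy_score_ens([0.0, 0.0], [[0.0, 0.0], [3.0, 4.0]]))
```

For a one-member ensemble the second term vanishes and the score is just the Euclidean distance between the member and the observation.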
scoringrules.energy_score
energy_score(
observations: Array,
forecasts: Array,
/,
m_axis: int = -2,
v_axis: int = -1,
*,
backend: Backend = None,
) -> Array
Compute the Energy Score for a finite multivariate ensemble.
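The Energy Score is a multivariate scoring rule expressed as

\[
\mathrm{ES}(F_{ens}, \mathbf{y}) = \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| - \frac{1}{2M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \|,
\]

where \(\| \cdot \|\) is the Euclidean norm over the input dimensions (the variables).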
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `observations` | `Array` | The observed values, where the variables dimension is by default the last axis. | required |
| `forecasts` | `Array` | The predicted forecast ensemble, where the ensemble dimension is by default represented by the second-to-last axis and the variables dimension by the last axis. | required |
| `m_axis` | `int` | The axis corresponding to the ensemble dimension on the forecasts array. Defaults to -2. | `-2` |
| `v_axis` | `int` | The axis corresponding to the variables dimension on the forecasts array (or the observations array with an extra dimension on …). Defaults to -1. | `-1` |
| `backend` | `Backend` | The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `energy_score` | `Array` of shape `(...)` | The computed Energy Score. |
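The `m_axis` and `v_axis` arguments allow many forecast cases to be scored in one call. The hypothetical `energy_score_batched` below sketches one way such axis handling could work; it is an illustration of the broadcasting idea, not a description of how scoringrules handles axes internally. It assumes the observations have the shape of the forecasts with the ensemble axis removed, inserts a singleton dimension at `m_axis` so the two arrays line up, and moves the ensemble and variable axes to the end before applying the ensemble formula.

```python
import numpy as np


def energy_score_batched(observations, forecasts, m_axis=-2, v_axis=-1):
    """Hypothetical batched ensemble energy score (illustration only).

    Assumes `observations` has the shape of `forecasts` with the ensemble
    axis removed. Returns one score per remaining (batch) index.
    """
    obs = np.asarray(observations, dtype=float)
    fct = np.asarray(forecasts, dtype=float)
    # Give the observations a singleton ensemble axis so both arrays line up.
    obs = np.expand_dims(obs, axis=m_axis)
    # Move ensemble and variable axes to the end: (..., M, d) and (..., 1, d).
    fct = np.moveaxis(fct, (m_axis, v_axis), (-2, -1))
    obs = np.moveaxis(obs, (m_axis, v_axis), (-2, -1))
    m = fct.shape[-2]
    # Mean distance between each member and the observation.
    term1 = np.linalg.norm(fct - obs, axis=-1).mean(axis=-1)
    # Pairwise member distances, shape (..., M, M), summed and scaled by 1 / (2 M^2).
    diffs = fct[..., :, None, :] - fct[..., None, :, :]
    term2 = np.linalg.norm(diffs, axis=-1).sum(axis=(-2, -1)) / (2 * m**2)
    return term1 - term2


rng = np.random.default_rng(0)
fct = rng.normal(size=(3, 5, 2))  # 3 cases, 5 members, 2 variables
obs = rng.normal(size=(3, 2))
scores = energy_score_batched(obs, fct)  # one score per case, shape (3,)

# The same forecasts stored with the ensemble axis first: shape (5, 3, 2).
scores_m0 = energy_score_batched(obs, np.moveaxis(fct, 1, 0), m_axis=0)
```

Both calls describe the same forecasts, so `scores` and `scores_m0` should coincide. If scoringrules is installed, `scoringrules.energy_score(obs, fct)` is the natural reference to compare against.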
Weighted versions
The energy score provides a measure of overall forecast performance. However, certain outcomes are often of more interest than others, making it desirable to assign more weight to these outcomes when evaluating forecast performance. This can be achieved using weighted scoring rules. Weighted scoring rules typically introduce a weight function into conventional scoring rules, and users can choose the weight function depending on which outcomes they want to emphasise. Allen et al. (2022)[2] discuss three weighted versions of the energy score, all of which are available in scoringrules.
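Firstly, the outcome-weighted energy score (originally introduced by Holzmann and Klar (2017)[3]) is defined as

\[
\mathrm{owES}(F, \mathbf{y}; w) = \frac{1}{\bar{w}} \mathbb{E} \left[ \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) \right] - \frac{1}{2 \bar{w}^{2}} \mathbb{E} \left[ \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X}) w(\mathbf{X}^{\prime}) w(\mathbf{y}) \right],
\]

where \(w : \mathbb{R}^{d} \to [0, \infty)\) is the non-negative weight function used to target particular multivariate outcomes, and \(\bar{w} = \mathbb{E}[w(\mathbf{X})]\). As before, \(\mathbf{X}, \mathbf{X}^{\prime} \sim F\) are independent.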
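The weight function is the user's choice. Two illustrative options are sketched below; both are hypothetical helpers, not scoringrules API. The first is an indicator that equals one only when every component exceeds a threshold \(t\); the second is a smooth alternative built from a product of normal CDFs. For an ensemble, \(\bar{w}\) is estimated by the mean weight of the members. If no member receives positive weight, this estimate is zero and the owES is undefined; a strictly positive smooth weight avoids that situation.

```python
import math

import numpy as np


def indicator_weight(x, t):
    """Hypothetical weight: 1 if every component of x exceeds t, else 0."""
    x = np.asarray(x, dtype=float)
    return np.all(x > t, axis=-1).astype(float)


def normal_cdf_weight(x, t, scale=1.0):
    """Hypothetical smooth weight: product over components of Phi((x_i - t) / scale)."""
    x = np.asarray(x, dtype=float)
    erf = np.vectorize(math.erf, otypes=[float])
    cdf = 0.5 * (1.0 + erf((x - t) / (scale * math.sqrt(2.0))))
    return np.prod(cdf, axis=-1)


# Three bivariate ensemble members and a threshold t = 1.
ens = np.array([[0.5, 2.0], [1.5, 1.2], [3.0, 0.1]])
w_members = indicator_weight(ens, t=1.0)  # only the second member exceeds t in both components
w_bar = w_members.mean()                  # ensemble estimate of E[w(X)]
w_smooth = normal_cdf_weight(ens, t=1.0)  # strictly positive weights for every member
```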
scoringrules.owenergy_score
owenergy_score(
observations: Array,
forecasts: Array,
w_func: tp.Callable[[ArrayLike], ArrayLike],
/,
m_axis: int = -2,
v_axis: int = -1,
*,
backend: Backend = None,
) -> Array
Compute the Outcome-Weighted Energy Score (owES) for a finite multivariate ensemble.
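Computation is performed using the ensemble representation of the owES in Allen et al. (2022):

\[
\mathrm{owES}(F_{ens}, \mathbf{y}; w) = \frac{1}{M \bar{w}} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| w(\mathbf{x}_{m}) w(\mathbf{y}) - \frac{1}{2 M^{2} \bar{w}^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \| w(\mathbf{x}_{m}) w(\mathbf{x}_{j}) w(\mathbf{y}),
\]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(\| \cdot \|\) is the Euclidean norm, \(w\) is the chosen weight function, and \(\bar{w} = \sum_{m=1}^{M}w(\mathbf{x}_{m})/M\).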
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `observations` | `Array` | The observed values, where the variables dimension is by default the last axis. | required |
| `forecasts` | `Array` | The predicted forecast ensemble, where the ensemble dimension is by default represented by the second-to-last axis and the variables dimension by the last axis. | required |
| `w_func` | `Callable[[ArrayLike], ArrayLike]` | Weight function used to emphasise particular outcomes. | required |
| `m_axis` | `int` | The axis corresponding to the ensemble dimension. Defaults to -2. | `-2` |
| `v_axis` | `int` | The axis corresponding to the variables dimension. Defaults to -1. | `-1` |
| `backend` | `Backend` | The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `owenergy_score` | `ArrayLike` of shape `(...)` | The computed Outcome-Weighted Energy Score. |
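A direct NumPy transcription of the ensemble owES is sketched below. It assumes a single observation of shape `(d,)`, an ensemble of shape `(M, d)`, and a weight function that maps an array of shape `(..., d)` to one weight per vector; `owenergy_score_ens` is a hypothetical name, and the calling convention that scoringrules expects for `w_func` may differ. Two sanity checks follow from the formula: with \(w \equiv 1\) the owES equals the unweighted energy score, and if \(w(\mathbf{y}) = 0\) both terms vanish.

```python
import numpy as np


def owenergy_score_ens(obs, fct, w_func):
    """Hypothetical ensemble owES for one observation.

    obs: shape (d,); fct: shape (M, d); w_func maps (..., d) -> (...).
    """
    obs = np.asarray(obs, dtype=float)
    fct = np.asarray(fct, dtype=float)
    m = fct.shape[0]
    w_x = np.asarray(w_func(fct), dtype=float)  # member weights, shape (M,)
    w_y = float(w_func(obs))                    # observation weight
    w_bar = w_x.mean()
    if w_bar == 0.0:
        raise ValueError("all members have zero weight, so the owES is undefined")
    # (1 / (M w_bar)) * sum_m ||x_m - y|| w(x_m) w(y)
    term1 = np.sum(np.linalg.norm(fct - obs, axis=-1) * w_x) * w_y / (m * w_bar)
    # (1 / (2 M^2 w_bar^2)) * sum_m sum_j ||x_m - x_j|| w(x_m) w(x_j) w(y)
    pairwise = np.linalg.norm(fct[:, None, :] - fct[None, :, :], axis=-1)
    term2 = np.sum(pairwise * np.outer(w_x, w_x)) * w_y / (2 * m**2 * w_bar**2)
    return term1 - term2


def constant_weight(x):
    """w = 1 everywhere: the owES reduces to the unweighted energy score."""
    return np.ones(np.shape(x)[:-1])


def first_above_one(x):
    """Illustrative weight: 1 if the first component exceeds 1, else 0."""
    return (np.asarray(x)[..., 0] > 1.0).astype(float)


fct = np.array([[0.0, 0.0], [3.0, 4.0]])
obs = np.array([0.0, 0.0])
print(owenergy_score_ens(obs, fct, constant_weight))  # unweighted ES of this example
print(owenergy_score_ens(obs, fct, first_above_one))  # w(y) = 0, so the score is 0
```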
Secondly, Allen et al. (2022) introduced the threshold-weighted energy score as

\[
\mathrm{twES}(F, \mathbf{y}; v) = \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{y}) \| - \frac{1}{2} \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{X}^{\prime}) \|,
\]

where \(v : \mathbb{R}^{d} \to \mathbb{R}^{d}\) is a so-called chaining function. The threshold-weighted energy score transforms the forecasts and observations according to the chaining function \(v\) prior to calculating the unweighted energy score. Choosing a chaining function is generally more difficult than choosing a weight function when emphasising particular outcomes.
scoringrules.twenergy_score
twenergy_score(
observations: Array,
forecasts: Array,
v_func: tp.Callable[[ArrayLike], ArrayLike],
/,
m_axis: int = -2,
v_axis: int = -1,
*,
backend: Backend = None,
) -> Array
Compute the Threshold-Weighted Energy Score (twES) for a finite multivariate ensemble.
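Computation is performed using the ensemble representation of the twES in Allen et al. (2022):

\[
\mathrm{twES}(F_{ens}, \mathbf{y}; v) = \frac{1}{M} \sum_{m=1}^{M} \| v(\mathbf{x}_{m}) - v(\mathbf{y}) \| - \frac{1}{2M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| v(\mathbf{x}_{m}) - v(\mathbf{x}_{j}) \|,
\]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(\| \cdot \|\) is the Euclidean norm, and \(v\) is the chaining function used to target particular outcomes.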
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `observations` | `Array` | The observed values, where the variables dimension is by default the last axis. | required |
| `forecasts` | `Array` | The predicted forecast ensemble, where the ensemble dimension is by default represented by the second-to-last axis and the variables dimension by the last axis. | required |
| `v_func` | `Callable[[ArrayLike], ArrayLike]` | Chaining function used to emphasise particular outcomes. | required |
| `m_axis` | `int` | The axis corresponding to the ensemble dimension. Defaults to -2. | `-2` |
| `v_axis` | `int` | The axis corresponding to the variables dimension. Defaults to -1. | `-1` |
| `backend` | `Backend` | The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `twenergy_score` | `ArrayLike` of shape `(...)` | The computed Threshold-Weighted Energy Score. |
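Because the twES is the energy score of the transformed forecasts and observation, it can be sketched by composing a chaining function with an unweighted ensemble energy score. The componentwise chaining function \(v(\mathbf{z}) = \max(\mathbf{z}, t)\) used here is only an illustrative choice: it maps every component below the threshold \(t\) onto \(t\), so differences below the threshold no longer contribute. `twenergy_score_ens` and `max_chaining` are hypothetical names, not scoringrules API.

```python
import numpy as np


def energy_score_ens(obs, fct):
    """Unweighted ensemble energy score; obs: shape (d,), fct: shape (M, d)."""
    m = fct.shape[0]
    term1 = np.linalg.norm(fct - obs, axis=-1).sum() / m
    pairwise = np.linalg.norm(fct[:, None, :] - fct[None, :, :], axis=-1)
    return term1 - pairwise.sum() / (2 * m**2)


def twenergy_score_ens(obs, fct, v_func):
    """Hypothetical twES: the energy score after applying the chaining function."""
    obs = np.asarray(obs, dtype=float)
    fct = np.asarray(fct, dtype=float)
    return energy_score_ens(v_func(obs), v_func(fct))


def max_chaining(t):
    """Illustrative chaining function v(z) = componentwise max(z, t)."""
    def v(z):
        return np.maximum(z, t)
    return v


obs = np.array([0.0, 0.0])
fct = np.array([[0.0, 0.0], [-1.0, 0.5]])

# Every point lies below t = 1 in each component, so v maps them all to (1, 1)
# and the threshold-weighted score is zero.
print(twenergy_score_ens(obs, fct, max_chaining(1.0)))
```

With \(v\) the identity the sketch returns the ordinary energy score, which is a convenient check when experimenting with other chaining functions.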
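As an alternative, the vertically re-scaled energy score is defined as

\[
\begin{aligned}
\mathrm{vrES}(F, \mathbf{y}; w, \mathbf{x}_{0}) = {} & \mathbb{E} \left[ \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) \right] - \frac{1}{2} \mathbb{E} \left[ \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X}) w(\mathbf{X}^{\prime}) \right] \\
& + \left( \mathbb{E} \left[ \| \mathbf{X} - \mathbf{x}_{0} \| w(\mathbf{X}) \right] - \| \mathbf{y} - \mathbf{x}_{0} \| w(\mathbf{y}) \right) \left( \mathbb{E} \left[ w(\mathbf{X}) \right] - w(\mathbf{y}) \right),
\end{aligned}
\]

where \(w : \mathbb{R}^{d} \to [0, \infty)\) is the non-negative weight function used to target particular multivariate outcomes, and \(\mathbf{x}_{0} \in \mathbb{R}^{d}\). As before, \(\mathbf{X}, \mathbf{X}^{\prime} \sim F\) are independent. Typically, \(\mathbf{x}_{0}\) is chosen to be zero.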
scoringrules.vrenergy_score
vrenergy_score(
observations: Array,
forecasts: Array,
w_func: tp.Callable[[ArrayLike], ArrayLike],
/,
*,
m_axis: int = -2,
v_axis: int = -1,
backend: Backend = None,
) -> Array
Compute the Vertically Re-scaled Energy Score (vrES) for a finite multivariate ensemble.
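Computation is performed using the ensemble representation of the vrES in Allen et al. (2022):

\[
\begin{aligned}
\mathrm{vrES}(F_{ens}, \mathbf{y}; w) = {} & \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| w(\mathbf{x}_{m}) w(\mathbf{y}) - \frac{1}{2M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \| w(\mathbf{x}_{m}) w(\mathbf{x}_{j}) \\
& + \left( \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} \| w(\mathbf{x}_{m}) - \| \mathbf{y} \| w(\mathbf{y}) \right) \left( \frac{1}{M} \sum_{m=1}^{M} w(\mathbf{x}_{m}) - w(\mathbf{y}) \right),
\end{aligned}
\]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(\| \cdot \|\) is the Euclidean norm, \(w\) is the weight function used to target particular outcomes, and \(\mathbf{x}_{0}\) is taken to be zero.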
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `observations` | `Array` | The observed values, where the variables dimension is by default the last axis. | required |
| `forecasts` | `Array` | The predicted forecast ensemble, where the ensemble dimension is by default represented by the second-to-last axis and the variables dimension by the last axis. | required |
| `w_func` | `Callable[[ArrayLike], ArrayLike]` | Weight function used to emphasise particular outcomes. | required |
| `m_axis` | `int` | The axis corresponding to the ensemble dimension. Defaults to -2. | `-2` |
| `v_axis` | `int` | The axis corresponding to the variables dimension. Defaults to -1. | `-1` |
| `backend` | `Backend` | The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `vrenergy_score` | `ArrayLike` of shape `(...)` | The computed Vertically Re-scaled Energy Score. |
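The ensemble vrES can be transcribed in the same way. The sketch below uses the hypothetical name `vrenergy_score_ens` and exposes \(\mathbf{x}_{0}\) as an optional argument that defaults to the zero vector; the scoringrules function documented above takes no such argument. With \(w \equiv 1\) the factor \(\frac{1}{M}\sum_{m} w(\mathbf{x}_{m}) - w(\mathbf{y})\) vanishes, so the vrES reduces to the energy score for any \(\mathbf{x}_{0}\).

```python
import numpy as np


def vrenergy_score_ens(obs, fct, w_func, x0=None):
    """Hypothetical ensemble vrES for one observation.

    obs: shape (d,); fct: shape (M, d); w_func maps (..., d) -> (...).
    x0 defaults to the zero vector.
    """
    obs = np.asarray(obs, dtype=float)
    fct = np.asarray(fct, dtype=float)
    x0 = np.zeros(fct.shape[-1]) if x0 is None else np.asarray(x0, dtype=float)
    m = fct.shape[0]
    w_x = np.asarray(w_func(fct), dtype=float)  # member weights, shape (M,)
    w_y = float(w_func(obs))                    # observation weight
    # (1 / M) * sum_m ||x_m - y|| w(x_m) w(y)
    term1 = np.sum(np.linalg.norm(fct - obs, axis=-1) * w_x) * w_y / m
    # (1 / (2 M^2)) * sum_m sum_j ||x_m - x_j|| w(x_m) w(x_j)
    pairwise = np.linalg.norm(fct[:, None, :] - fct[None, :, :], axis=-1)
    term2 = np.sum(pairwise * np.outer(w_x, w_x)) / (2 * m**2)
    # Re-scaling term: (mean ||x_m - x0|| w(x_m) - ||y - x0|| w(y)) * (mean w(x_m) - w(y))
    scale_diff = np.mean(np.linalg.norm(fct - x0, axis=-1) * w_x) - np.linalg.norm(obs - x0) * w_y
    weight_diff = w_x.mean() - w_y
    return term1 - term2 + scale_diff * weight_diff


def constant_weight(x):
    """w = 1 everywhere: the vrES reduces to the unweighted energy score."""
    return np.ones(np.shape(x)[:-1])


def first_above_one(x):
    """Illustrative weight: 1 if the first component exceeds 1, else 0."""
    return (np.asarray(x)[..., 0] > 1.0).astype(float)


fct = np.array([[0.0, 0.0], [3.0, 4.0]])
obs = np.array([0.0, 0.0])
print(vrenergy_score_ens(obs, fct, constant_weight))
print(vrenergy_score_ens(obs, fct, first_above_one))
```

In the second call \(w(\mathbf{y}) = 0\), yet the score is not zero: unlike the owES, the re-scaling term still penalises a forecast that places weight on outcomes that did not occur.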
Each of these weighted energy scores targets particular outcomes in a different way. Further details regarding the differences between these scoring rules, as well as choices for the weight and chaining functions, can be found in Allen et al. (2022). The weighted energy scores can easily be computed for ensemble forecasts by replacing the expectations with sample means over the ensemble members.
1. Tilmann Gneiting and Adrian E. Raftery. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 2007. URL: https://doi.org/10.1198/016214506000001437, doi:10.1198/016214506000001437.
2. Sam Allen, David Ginsbourger, and Johanna Ziegel. Evaluating forecasts for high-impact events using transformed kernel scores. arXiv preprint arXiv:2202.12732, 2022.
3. Hajo Holzmann and Bernhard Klar. Focusing on regions of interest in forecast evaluation. The Annals of Applied Statistics, 11:2404–2431, 2017.