Energy Score

The energy score (ES) is a scoring rule for evaluating multivariate probabilistic forecasts. It is defined as

\[\text{ES}(F, \mathbf{y})= \mathbb{E} \| \mathbf{X} - \mathbf{y} \| - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \|, \]

where \(\mathbf{y} \in \mathbb{R}^{d}\) is the multivariate observation (\(d > 1\)), and \(\mathbf{X}\) and \(\mathbf{X}^{\prime}\) are independent random variables that follow the multivariate forecast distribution \(F\) (Gneiting and Raftery, 2007)¹. If the dimension \(d\) were equal to one, the energy score would reduce to the continuous ranked probability score (CRPS).
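
To make the reduction explicit (a short worked identity, assuming \(F\) has a finite first moment): when \(d = 1\), the Euclidean norm is the absolute value, and the energy score coincides with the kernel representation of the CRPS,

\[\text{ES}(F, y) = \mathbb{E} | X - y | - \frac{1}{2} \mathbb{E} | X - X^{\prime} | = \int_{-\infty}^{\infty} \left( F(z) - \mathbb{1}\{ y \leq z \} \right)^{2} \, dz = \text{CRPS}(F, y). \]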

While multivariate probabilistic forecasts could belong to a parametric family of distributions, such as a multivariate normal distribution, it is more common in practice that these forecasts are ensemble forecasts; that is, the forecast comprises a predictive sample \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\), where each ensemble member \(\mathbf{x}_{m} \in \mathbb{R}^{d}\) for \(m = 1, \dots, M\).

In this case, the expectations in the definition of the energy score can be replaced by sample means over the ensemble members, yielding the following representation of the energy score when evaluating an ensemble forecast \(F_{ens}\) with \(M\) members.

scoringrules.energy_score

energy_score(
    observations: Array,
    forecasts: Array,
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    backend: Backend = None,
) -> Array

Compute the Energy Score for a finite multivariate ensemble.

The Energy Score is a multivariate scoring rule expressed as

\[\text{ES}(F_{ens}, \mathbf{y})= \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| - \frac{1}{2 M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \| \]

where \(\| \cdot \|\) is the Euclidean norm over the input dimensions (the variables).

Parameters:

Name	Type	Description	Default
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension on the forecasts array. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension on the forecasts array (or the observations array with an extra dimension on `m_axis`). Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`energy_score`	`Array of shape (...)`	The computed Energy Score.
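
The ensemble formula above can be sketched in dependency-free Python. This is a minimal illustration of the arithmetic for a single observation, not the library's implementation; the helper name `energy_score_ensemble` and the toy data are assumptions made for the example.

```python
import math


def energy_score_ensemble(obs, ens):
    """Energy score of one observation (length d) against M ensemble members.

    Hypothetical helper for illustration; `obs` is a sequence of d floats and
    `ens` is a sequence of M such sequences.
    """
    m = len(ens)
    # First term: mean Euclidean distance between each member and the observation.
    term_obs = sum(math.dist(x, obs) for x in ens) / m
    # Second term: mean distance over all M^2 ordered pairs of members
    # (pairs with identical indices contribute zero).
    term_spread = sum(math.dist(xi, xj) for xi in ens for xj in ens) / m**2
    return term_obs - 0.5 * term_spread


obs = [0.0, 0.0]
ens = [[3.0, 4.0], [0.0, 0.0]]  # M = 2 members, d = 2 variables
score = energy_score_ensemble(obs, ens)
print(score)  # (5 + 0) / 2 - (5 + 5) / (2 * 4) = 1.25
```

Assuming the package is installed, calling `scoringrules.energy_score` on the same data, with observations of shape `(d,)` and forecasts of shape `(M, d)`, should give the same value up to floating-point error.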

Weighted versions

The energy score provides a measure of overall forecast performance. However, it is often the case that certain outcomes are of more interest than others, making it desirable to assign more weight to these outcomes when evaluating forecast performance. This can be achieved using weighted scoring rules. Weighted scoring rules typically introduce a weight function into conventional scoring rules, and users can choose the weight function depending on what outcomes they want to emphasise. Allen et al. (2022)² discuss three weighted versions of the energy score. These are all available in scoringrules.

Firstly, the outcome-weighted energy score (originally introduced by Holzmann and Klar (2017)³) is defined as

\[\text{owES}(F, \mathbf{y}; w)= \frac{1}{\bar{w}} \mathbb{E} \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) - \frac{1}{2 \bar{w}^{2}} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X})w(\mathbf{X}^{\prime})w(\mathbf{y}), \]

where \(w : \mathbb{R}^{d} \to [0, \infty)\) is the non-negative weight function used to target particular multivariate outcomes, and \(\bar{w} = \mathbb{E}[w(\mathbf{X})]\). As before, \(\mathbf{X}, \mathbf{X}^{\prime} \sim F\) are independent.

scoringrules.owenergy_score

owenergy_score(
    observations: Array,
    forecasts: Array,
    w_func: tp.Callable[[ArrayLike], ArrayLike],
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    backend: Backend = None,
) -> Array

Compute the Outcome-Weighted Energy Score (owES) for a finite multivariate ensemble.

Computation is performed using the ensemble representation of the owES in Allen et al. (2022):

\[ \mathrm{owES}(F_{ens}, \mathbf{y}) = \frac{1}{M \bar{w}} \sum_{m = 1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| w(\mathbf{x}_{m}) w(\mathbf{y}) - \frac{1}{2 M^{2} \bar{w}^{2}} \sum_{m = 1}^{M} \sum_{j = 1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \| w(\mathbf{x}_{m}) w(\mathbf{x}_{j}) w(\mathbf{y}), \]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(\| \cdot \|\) is the Euclidean norm, \(w\) is the chosen weight function, and \(\bar{w} = \sum_{m=1}^{M}w(\mathbf{x}_{m})/M\).

Parameters:

Name	Type	Description	Default
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`w_func`	`Callable[[ArrayLike], ArrayLike]`	Weight function used to emphasise particular outcomes.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`owenergy_score`	`ArrayLike of shape (...)`	The computed Outcome-Weighted Energy Score.
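
The ensemble representation of the owES can be sketched as follows. This is an illustrative pure-Python version for a single observation, not the library's code; the helper `owenergy_score_ensemble`, the indicator weight `exceeds_one` and the data are hypothetical choices.

```python
import math


def owenergy_score_ensemble(obs, ens, w_func):
    """Outcome-weighted energy score for one observation (illustrative helper)."""
    m = len(ens)
    w_ens = [w_func(x) for x in ens]
    w_obs = w_func(obs)
    w_bar = sum(w_ens) / m  # ensemble mean of the weights
    if w_bar == 0:
        raise ValueError("the score is undefined when every member has zero weight")
    # Weighted member-observation distances, normalised by M * w_bar.
    term_obs = sum(math.dist(x, obs) * w for x, w in zip(ens, w_ens))
    term_obs *= w_obs / (m * w_bar)
    # Weighted pairwise distances, normalised by 2 * M^2 * w_bar^2.
    term_spread = sum(
        math.dist(xi, xj) * wi * wj
        for xi, wi in zip(ens, w_ens)
        for xj, wj in zip(ens, w_ens)
    )
    term_spread *= w_obs / (2 * m**2 * w_bar**2)
    return term_obs - term_spread


def exceeds_one(x):
    """Assumed weight: 1 if the first variable exceeds the threshold 1, else 0."""
    return 1.0 if x[0] > 1.0 else 0.0


obs = [2.0, 0.0]
ens = [[5.0, 4.0], [2.0, 0.0], [0.0, 0.0]]
score = owenergy_score_ensemble(obs, ens, exceeds_one)
print(score)  # approximately 1.25
```

With this indicator weight, the third member has zero weight, so the result matches the unweighted energy score of the two remaining members; an observation outside the region of interest receives a score of zero.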

Secondly, Allen et al. (2022) introduced the threshold-weighted energy score as

\[\text{twES}(F, \mathbf{y}; v)= \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{y}) \| - \frac{1}{2} \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{X}^{\prime}) \|, \]

where \(v : \mathbb{R}^{d} \to \mathbb{R}^{d}\) is a so-called chaining function. The threshold-weighted energy score transforms the forecasts and observations according to the chaining function \(v\), prior to calculating the unweighted energy score. Choosing a chaining function is generally more difficult than choosing a weight function when emphasising particular outcomes.

scoringrules.twenergy_score

twenergy_score(
    observations: Array,
    forecasts: Array,
    v_func: tp.Callable[[ArrayLike], ArrayLike],
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    backend: Backend = None,
) -> Array

Compute the Threshold-Weighted Energy Score (twES) for a finite multivariate ensemble.

Computation is performed using the ensemble representation of the twES in Allen et al. (2022):

\[ \mathrm{twES}(F_{ens}, \mathbf{y}) = \frac{1}{M} \sum_{m = 1}^{M} \| v(\mathbf{x}_{m}) - v(\mathbf{y}) \| - \frac{1}{2 M^{2}} \sum_{m = 1}^{M} \sum_{j = 1}^{M} \| v(\mathbf{x}_{m}) - v(\mathbf{x}_{j}) \|, \]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(\| \cdot \|\) is the Euclidean norm, and \(v\) is the chaining function used to target particular outcomes.

Parameters:

Name	Type	Description	Default
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`v_func`	`Callable[[ArrayLike], ArrayLike]`	Chaining function used to emphasise particular outcomes.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`twenergy_score`	`ArrayLike of shape (...)`	The computed Threshold-Weighted Energy Score.
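
Because the twES is the unweighted energy score applied to transformed forecasts and observations, it can be sketched in a few lines. The helper `twenergy_score_ensemble` and the chaining function `clip_below`, which raises every component below a threshold up to that threshold, are illustrative assumptions rather than part of the library.

```python
import math


def twenergy_score_ensemble(obs, ens, v_func):
    """Threshold-weighted energy score for one observation (illustrative helper)."""
    # Transform the observation and every member with the chaining function.
    v_obs = v_func(obs)
    v_ens = [v_func(x) for x in ens]
    m = len(v_ens)
    # Then evaluate the ordinary ensemble energy score on the transformed values.
    term_obs = sum(math.dist(x, v_obs) for x in v_ens) / m
    term_spread = sum(math.dist(xi, xj) for xi in v_ens for xj in v_ens) / m**2
    return term_obs - 0.5 * term_spread


def clip_below(x, threshold=1.0):
    """Assumed chaining function: component-wise max(x_i, threshold)."""
    return [max(xi, threshold) for xi in x]


obs = [0.0, 0.0]                  # mapped to [1.0, 1.0]
ens = [[4.0, 5.0], [0.5, -2.0]]   # mapped to [4.0, 5.0] and [1.0, 1.0]
score = twenergy_score_ensemble(obs, ens, clip_below)
print(score)  # (5 + 0) / 2 - (5 + 5) / (2 * 4) = 1.25
```

With this chaining function, all values below the threshold are treated as identical, so differences between forecast and observation only count in the region above it. Passing the identity as `v_func` recovers the unweighted energy score.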

Thirdly, the vertically re-scaled energy score is defined as

\[ \begin{split} \text{vrES}(F, \mathbf{y}; w, \mathbf{x}_{0}) = & \mathbb{E} \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) \\ & - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X})w(\mathbf{X}^{\prime}) \\ & + \left( \mathbb{E} \| \mathbf{X} - \mathbf{x}_{0} \| w(\mathbf{X}) - \| \mathbf{y} - \mathbf{x}_{0} \| w(\mathbf{y}) \right) \left(\mathbb{E}[w(\mathbf{X})] - w(\mathbf{y}) \right), \end{split} \]

where \(w : \mathbb{R}^{d} \to [0, \infty)\) is the non-negative weight function used to target particular multivariate outcomes, and \(\mathbf{x}_{0} \in \mathbb{R}^{d}\). Typically, \(\mathbf{x}_{0}\) is chosen to be zero.

scoringrules.vrenergy_score

vrenergy_score(
    observations: Array,
    forecasts: Array,
    w_func: tp.Callable[[ArrayLike], ArrayLike],
    /,
    *,
    m_axis: int = -2,
    v_axis: int = -1,
    backend: Backend = None,
) -> Array

Compute the Vertically Re-scaled Energy Score (vrES) for a finite multivariate ensemble.

Computation is performed using the ensemble representation of the vrES in Allen et al. (2022):

\[ \begin{split} \mathrm{vrES}(F_{ens}, \mathbf{y}) = & \frac{1}{M} \sum_{m = 1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| w(\mathbf{x}_{m}) w(\mathbf{y}) - \frac{1}{2 M^{2}} \sum_{m = 1}^{M} \sum_{j = 1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \| w(\mathbf{x}_{m}) w(\mathbf{x}_{j}) \\ & + \left( \frac{1}{M} \sum_{m = 1}^{M} \| \mathbf{x}_{m} \| w(\mathbf{x}_{m}) - \| \mathbf{y} \| w(\mathbf{y}) \right) \left( \frac{1}{M} \sum_{m = 1}^{M} w(\mathbf{x}_{m}) - w(\mathbf{y}) \right), \end{split} \]

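where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(\| \cdot \|\) is the Euclidean norm, and \(w\) is the weight function used to target particular outcomes. This representation corresponds to the choice \(\mathbf{x}_{0} = \mathbf{0}\).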

Parameters:

Name	Type	Description	Default
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`w_func`	`Callable[[ArrayLike], ArrayLike]`	Weight function used to emphasise particular outcomes.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`vrenergy_score`	`ArrayLike of shape (...)`	The computed Vertically Re-scaled Energy Score.
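
The three terms of the ensemble vrES can be sketched in plain Python as below, taking \(\mathbf{x}_{0}\) to be the origin as in the ensemble representation. The helper `vrenergy_score_ensemble`, the indicator weight `exceeds_one` and the data are hypothetical and only illustrate the arithmetic; this is not the library's implementation.

```python
import math


def vrenergy_score_ensemble(obs, ens, w_func):
    """Vertically re-scaled energy score for one observation (illustrative helper)."""
    m = len(ens)
    origin = [0.0] * len(obs)  # x_0 = 0
    w_ens = [w_func(x) for x in ens]
    w_obs = w_func(obs)
    # Weighted member-observation distances.
    term_obs = sum(math.dist(x, obs) * w for x, w in zip(ens, w_ens)) * w_obs / m
    # Weighted pairwise distances between members.
    term_spread = sum(
        math.dist(xi, xj) * wi * wj
        for xi, wi in zip(ens, w_ens)
        for xj, wj in zip(ens, w_ens)
    ) / (2 * m**2)
    # Correction term: (weighted norm difference) * (weight difference).
    norm_diff = (
        sum(math.dist(x, origin) * w for x, w in zip(ens, w_ens)) / m
        - math.dist(obs, origin) * w_obs
    )
    weight_diff = sum(w_ens) / m - w_obs
    return term_obs - term_spread + norm_diff * weight_diff


def exceeds_one(x):
    """Assumed weight: 1 if the first variable exceeds the threshold 1, else 0."""
    return 1.0 if x[0] > 1.0 else 0.0


obs = [3.0, 4.0]
ens = [[3.0, 4.0], [0.0, 0.0]]
score = vrenergy_score_ensemble(obs, ens, exceeds_one)
print(score)  # 0 - 0 + (-2.5) * (-0.5) = 1.25
```

With a constant weight function the weight difference is zero, so the correction term vanishes and the score reduces to the unweighted energy score.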

Each of these weighted energy scores targets particular outcomes in a different way. Further details regarding the differences between these scoring rules, as well as choices for the weight and chaining functions, can be found in Allen et al. (2022). The weighted energy scores can easily be computed for ensemble forecasts by replacing the expectations with sample means over the ensemble members.

¹ Tilmann Gneiting and Adrian E. Raftery. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 2007. URL: https://doi.org/10.1198/016214506000001437, doi:10.1198/016214506000001437.
² Sam Allen, David Ginsbourger, and Johanna Ziegel. Evaluating forecasts for high-impact events using transformed kernel scores. arXiv preprint arXiv:2202.12732, 2022.
³ Hajo Holzmann and Bernhard Klar. Focusing on regions of interest in forecast evaluation. The Annals of Applied Statistics, 11:2404–2431, 2017.