Variogram Score

The varigoram score (VS) is a scoring rule for evaluating multivariate probabilistic forecasts. It is defined as

\[\text{VS}_{p}(F, \mathbf{y})= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \mathbb{E} | X_{i} - X_{j} |^{p} - | y_{i} - y_{j} |^{p} \right)^{2}, \]

where \(p > 0\), \(\mathbf{y} = (y_{1}, \dots, y_{d}) \in \mathbb{R}^{d}\) is the multivariate observation (\(d > 1\)), and \(\mathbf{X} = (X_{1}, \dots, X_{d})\) is a random vector that follows the multivariate forecast distribution \(F\) (Scheuerer and Hamill, 2015)¹. The exponent \(p\) is typically chosen to be 0.5 or 1.

The variogram score is less sensitive to marginal forecast performance than the energy score, and Scheuerer and Hamill (2015) argue that it should therefore be more sensitive to errors in the forecast's dependence structure.

While multivariate probabilistic forecasts could belong to a parametric family of distributions, such as a multivariate normal distribution, it is more common in practice that these forecasts are ensemble forecasts; that is, the forecast is comprised of a predictive sample \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\), where each ensemble member \(\mathbf{x}_{i} = (x_{i, 1}, \dots, x_{i, d}) \in \R^{d}\) for \(i = 1, \dots, M\).

In this case, the expectation in the definition of the variogram score can be replaced by a sample mean over the ensemble members, yielding the following representation of the variogram score when evaluating an ensemble forecast \(F_{ens}\) with \(M\) members.

scoringrules.variogram_score

variogram_score(
    observations: Array,
    forecasts: Array,
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    p: float = 1.0,
    backend: Backend = None,
) -> Array

Compute the Variogram Score for a finite multivariate ensemble.

For a \(D\)-variate ensemble the Variogram Score (Sheuerer and Hamill, 2015) of order \(p\) is expressed as

\[\text{VS}_{p}(F_{ens}, \mathbf{y})= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \frac{1}{M} \sum_{m=1}^{M} | x_{m,i} - x_{m,j} |^{p} - | y_{i} - y_{j} |^{p} \right)^{2}. \]

where \(\mathbf{X}\) and \(\mathbf{X'}\) are independently sampled ensembles from from \(F\).

Parameters:

Name	Type	Description	Default
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`p`	`float`	The order of the Variogram Score. Typical values are 0.5, 1.0 or 2.0. Defaults to 1.0.	`1.0`
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`variogram_score`	`Array`	The computed Variogram Score.

Weighted versions

It is often the case that certain outcomes are of more interest than others when evaluating forecast performance. These outcomes can be emphasised by employing weighted scoring rules. Weighted scoring rules typically introduce a weight function into conventional scoring rules, and users can choose the weight function depending on what outcomes they want to emphasise. Allen et al. (2022)² introduced three weighted versions of the variogram score. These are all available in scoringrules.

Firstly, the outcome-weighted variogram score (see also Holzmann and Klar (2014)³) is defined as

\[\text{owVS}_{p}(F, \mathbf{y}; w) = \frac{1}{\bar{w}} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{y}) w(\mathbf{X}) w(\mathbf{y}) ] - \frac{1}{2 \bar{w}^{2}} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{X}^{\prime}) w(\mathbf{X}) w(\mathbf{X}^{\prime}) w(\mathbf{y}) ], \]

where

\[ \rho_{p}(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{d} \sum_{j=1}^{d} \left( |x_{i} - x_{j}|^{p} - |z_{i} - z_{j}|^{p} \right)^{2}, \]

for \(\mathbf{x} = (x_{1}, \dots, x_{d}) \in \mathbb{R}^{d}\) and \(\mathbf{z} = (z_{1}, \dots, z_{d}) \in \mathbb{R}^{d}\).

Here, \(w : \mathbb{R}^{d} \to [0, \infty)\) is the non-negative weight function used to target particular multivariate outcomes, and \(\bar{w} = \mathbb{E}[w(X)]\). As before, \(\mathbf{X}, \mathbf{X}^{\prime} \sim F\) are independent.

scoringrules.owvariogram_score

owvariogram_score(
    observations: Array,
    forecasts: Array,
    w_func: tp.Callable,
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    p: float = 1.0,
    backend: Backend = None,
) -> Array

Compute the Outcome-Weighted Variogram Score (owVS) for a finite multivariate ensemble.

Computation is performed using the ensemble representation of the owVS in Allen et al. (2022):

\[ \begin{split} \mathrm{owVS}(F_{ens}, \mathbf{y}) = & \frac{1}{M \bar{w}} \sum_{m=1}^{M} \sum_{i,j=1}^{D}(|y_{i} - y_{j}|^{p} - |x_{m,i} - x_{m,j}|^{p})^{2} w(\mathbf{x}_{m}) w(\mathbf{y}) \\ & - \frac{1}{2 M^{2} \bar{w}^{2}} \sum_{k,m=1}^{M} \sum_{i,j=1}^{D} \left( |x_{k,i} - x_{k,j}|^{p} - |x_{m,i} - x_{m,j}|^{p} \right)^{2} w(\mathbf{x}_{k}) w(\mathbf{x}_{m}) w(\mathbf{y}), \end{split} \]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(w\) is the chosen weight function, and \(\bar{w} = \sum_{m=1}^{M}w(\mathbf{x}_{m})/M\).

Parameters:

Name	Type	Description	Default
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`p`	`float`	The order of the Variogram Score. Typical values are 0.5, 1.0 or 2.0. Defaults to 1.0.	`1.0`
`w_func`	`Callable`	Weight function used to emphasise particular outcomes.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`owvariogram_score`	`ArrayLike of shape (...)`	The computed Outcome-Weighted Variogram Score.

Secondly, Allen et al. (2022) introduced the threshold-weighted variogram score as

\[\text{twVS}_{p}(F, \mathbf{y}; v)= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \mathbb{E} | v(\mathbf{X})_{i} - v(\mathbf{X})_{j} |^{p} - | v(\mathbf{y})_{i} - v(\mathbf{y})_{j} |^{p} \right)^{2}, \]

where \(v : \mathbb{R}^{d} \to \mathbb{R}^{d}\) is a so-called chaining function, so that \(v(\mathbf{X}) = (v(\mathbf{X})_{1}, \dots, v(\mathbf{X})_{d}) \in \mathbb{R}^{d}\). The threshold-weighted variogram score transforms the forecasts and observations according to the chaining function \(v\), prior to calculating the unweighted variogram score. Choosing a chaining function is generally more difficult than choosing a weight function when emphasising particular outcomes.

scoringrules.twvariogram_score

twvariogram_score(
    observations: Array,
    forecasts: Array,
    v_func: tp.Callable,
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    p: float = 1.0,
    backend: Backend = None,
) -> Array

Compute the Threshold-Weighted Variogram Score (twVS) for a finite multivariate ensemble.

Computation is performed using the ensemble representation of the twVS in Allen et al. (2022):

\[ \mathrm{twVS}(F_{ens}, \mathbf{y}) = \sum_{i,j=1}^{D}(|v(\mathbf{y})_i - v(\mathbf{y})_{j}|^{p} - \frac{1}{M} \sum_{m=1}^{M}|v(\mathbf{x}_{m})_{i} - v(\mathbf{x}_{m})_{j}|^{p})^{2}, \]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, and \(v\) is the chaining function used to target particular outcomes.

Parameters:

Name	Type	Description	Default
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`p`	`float`	The order of the Variogram Score. Typical values are 0.5, 1.0 or 2.0. Defaults to 1.0.	`1.0`
`v_func`	`Callable`	Chaining function used to emphasise particular outcomes.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`twvariogram_score`	`ArrayLike of shape (...)`	The computed Threshold-Weighted Variogram Score.

As an alternative, the vertically re-scaled variogram score is defined as

\[\text{vrVS}_{p}(F, \mathbf{y}; w) = \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{y}) w(\mathbf{X}) w(\mathbf{y}) ] - \frac{1}{2} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{X}^{\prime}) w(\mathbf{X}) w(\mathbf{X}^{\prime}) ] + \left( \mathbb{E} [ \rho_{p} ( \mathbf{X}, \mathbf{x}_{0} ) w(\mathbf{X}) ] - \rho_{p} ( \mathbf{y}, \mathbf{x}_{0}) w(\mathbf{y}) \right) \left(\mathbb{E}[w(\mathbf{X})] - w(\mathbf{y}) \right), \]

where \(w\) and \(\rho_{p}\) are as defined above, and \(\mathbf{x}_{0} \in \mathbb{R}^{d}\). Typically, \(\mathbf{x}_{0}\) is chosen to be the zero vector.

scoringrules.vrvariogram_score

vrvariogram_score(
    observations: Array,
    forecasts: Array,
    w_func: tp.Callable,
    /,
    m_axis: int = -2,
    v_axis: int = -1,
    *,
    p: float = 1.0,
    backend: Backend = None,
) -> Array

Compute the Vertically Re-scaled Variogram Score (vrVS) for a finite multivariate ensemble.

Computation is performed using the ensemble representation of the vrVS in Allen et al. (2022):

\[ \begin{split} \mathrm{vrVS}(F_{ens}, \mathbf{y}) = & \frac{1}{M} \sum_{m=1}^{M} \sum_{i,j=1}^{D}(|y_{i} - y_{j}|^{p} - |x_{m,i} - x_{m,j}|^{p})^{2} w(\mathbf{x}_{m}) w(\mathbf{y}) \\ & - \frac{1}{2 M^{2}} \sum_{k,m=1}^{M} \sum_{i,j=1}^{D} \left( |x_{k,i} - x_{k,j}|^{p} - |x_{m,i} - x_{m,j}|^{p} \right)^{2} w(\mathbf{x}_{k}) w(\mathbf{x}_{m})) \\ & + \left( \frac{1}{M} \sum_{m = 1}^{M} \sum_{i,j=1}^{D}(|x_{m,i} - x_{m,j}|^{p} w(\mathbf{x}_{m}) - \sum_{i,j=1}^{D}(|y_{i} - y_{j}|^{p} w(\mathbf{y}) \right) \left( \frac{1}{M} \sum_{m = 1}^{M} w(\mathbf{x}_{m}) - w(\mathbf{y}) \right), \end{split} \]

where \(F_{ens}\) is the ensemble forecast \(\mathbf{x}_{1}, \dots, \mathbf{x}_{M}\) with \(M\) members, \(w\) is the chosen weight function, and \(\bar{w} = \sum_{m=1}^{M}w(\mathbf{x}_{m})/M\).

Parameters:

Name	Type	Description	Default
`observations`	`Array`	The observed values, where the variables dimension is by default the last axis.	required
`forecasts`	`Array`	The predicted forecast ensemble, where the ensemble dimension is by default represented by the second last axis and the variables dimension by the last axis.	required
`p`	`float`	The order of the Variogram Score. Typical values are 0.5, 1.0 or 2.0. Defaults to 1.0.	`1.0`
`w_func`	`Callable`	Weight function used to emphasise particular outcomes.	required
`m_axis`	`int`	The axis corresponding to the ensemble dimension. Defaults to -2.	`-2`
`v_axis`	`int`	The axis corresponding to the variables dimension. Defaults to -1.	`-1`
`backend`	`Backend`	The name of the backend used for computations. Defaults to 'numba' if available, else 'numpy'.	`None`

Returns:

Name	Type	Description
`vrvariogram_score`	`ArrayLike of shape (...)`	The computed Vertically Re-scaled Variogram Score.

Each of these weighted variogram scores targets particular outcomes in a different way. Further details regarding the differences between these scoring rules, as well as choices for the weight and chaining functions, can be found in Allen et al. (2022). The weighted variogram scores can easily be computed for ensemble forecasts by replacing the expectations with sample means over the ensemble members.

Michael Scheuerer and Thomas M. Hamill. Variogram-Based Proper Scoring Rules for Probabilistic Forecasts of Multivariate Quantities. Monthly Weather Review, 2015. URL: https://journals.ametsoc.org/view/journals/mwre/143/4/mwr-d-14-00269.1.xml, doi:10.1175/MWR-D-14-00269.1. ↩
Sam Allen, David Ginsbourger, and Johanna Ziegel. Evaluating forecasts for high-impact events using transformed kernel scores. arXiv preprint arXiv:2202.12732, 2022. ↩
Hajo Holzmann and Bernhard Klar. Focusing on regions of interest in forecast evaluation. The Annals of Applied Statistics, 11:2404–2431, 2017. ↩