Statistical tests and tools

lmo.diagnostic

Hypothesis tests, estimator properties, and performance metrics.

lmo.diagnostic.HypothesisTestResult

Bases: NamedTuple

Results of a hypothesis test.

is_valid: np.bool_ | npt.NDArray[np.bool_] property

Check whether the statistic is finite and not NaN.

is_significant(level=0.05)

Whether the null hypothesis can be rejected at the given significance level (5% by default).
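
For example, a result with a p-value of 0.011 is significant at the default 5% level, but not at 1%. A constructed example (the statistic and pvalue fields match those documented under normaltest below):

>>> from lmo.diagnostic import HypothesisTestResult
>>> res = HypothesisTestResult(statistic=6.5, pvalue=0.011)
>>> bool(res.is_significant())
True
>>> bool(res.is_significant(level=0.01))
False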

lmo.diagnostic.normaltest(a, /, *, axis=None)

Statistical hypothesis test for non-normality, based on the L-skewness and L-kurtosis coefficients of the sample data.

Adapted from Harri & Coble (2011), and includes Hosking’s correction.

Definition
  • H0: The data was drawn from a normal distribution.
  • H1: The data was drawn from a non-normal distribution.

Examples:

Compare the testing power with scipy.stats.normaltest, given 10,000 samples from a contaminated normal distribution.

>>> import numpy as np
>>> from lmo.diagnostic import normaltest
>>> from scipy.stats import normaltest as normaltest_scipy
>>> rng = np.random.default_rng(12345)
>>> n = 10_000
>>> x = 0.9 * rng.normal(0, 1, n) + 0.1 * rng.normal(0, 9, n)
>>> normaltest(x)[1]
0.04806618
>>> normaltest_scipy(x)[1]
0.08435627

At a 5% significance level, Lmo’s test is significant (i.e. indicating non-normality), whereas scipy’s test isn’t (i.e. inconclusive).

Parameters:

  • a (AnyArrayFloat, required): Array-like of sample data.
  • axis (int | None, default None): Axis along which to compute the test.

Returns:

HypothesisTestResult, a named tuple with:

  • statistic: The \(\tau^2_{3, 4}\) test statistic.
  • pvalue: Two-sided chi-squared probability for \(H_0\).
References

A. Harri & K.H. Coble (2011) - Normality testing: Two new tests using L-moments

lmo.diagnostic.l_moment_gof(rv_or_cdf, l_moments, n_obs, /, trim=0, **kwargs)

Goodness-of-fit (GOF) hypothesis test for the null hypothesis that the observed L-moments come from the given scipy.stats distribution or cumulative distribution function (CDF).

  • H0: The theoretical probability distribution, with the given CDF, is a good fit for the observed L-moments.
  • H1: The distribution is not a good fit for the observed L-moments.

The test statistic is the squared Mahalanobis distance between the \(n\) observed L-moments and the theoretical L-moments. It asymptotically (in sample size) follows the \(\chi^2\) distribution with \(n\) degrees of freedom.

The sample L-moments are expected to be of consecutive orders \(r = 1, 2, \dots, n\). Generally, the number of L-moments \(n\) should not be less than the number of parameters of the distribution, including the location and scale parameters. Therefore, it is required to have \(n \ge 2\).
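
As a minimal sketch of the statistic itself (not Lmo's implementation; the argument names are illustrative): given the observed L-moments, their theoretical counterparts, and the covariance matrix of the sample L-moments, the statistic and p-value follow from the squared Mahalanobis distance and the \(\chi^2\) survival function:

>>> import numpy as np
>>> from scipy.stats import chi2
>>> def gof_sketch(l_obs, l_theo, cov):
...     err = np.asarray(l_obs) - np.asarray(l_theo)
...     stat = err @ np.linalg.solve(np.asarray(cov), err)  # squared Mahalanobis distance
...     return stat, chi2.sf(stat, df=len(err))  # statistic, p-value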

Notes

The theoretical L-moments and their covariance matrix are calculated from the CDF using numerical integration (scipy.integrate.quad and scipy.integrate.nquad). Undefined or infinite integrals cannot be detected, in which case the results might be incorrect.

If an IntegrationWarning is issued, or the function is very slow, then the results are probably incorrect, and larger degrees of trimming should be used.

Examples:

Test if the samples are drawn from a normal distribution.

>>> import lmo
>>> import numpy as np
>>> from lmo.diagnostic import l_moment_gof
>>> from scipy.stats import norm
>>> rng = np.random.default_rng(12345)
>>> X = norm(13.12, 1.66)
>>> n = 1_000
>>> x = X.rvs(n, random_state=rng)
>>> x_lm = lmo.l_moment(x, [1, 2, 3, 4])
>>> l_moment_gof(X, x_lm, n).pvalue
0.82597

Contaminated samples:

>>> y = 0.9 * x + 0.1 * rng.normal(X.mean(), X.std() * 10, n)
>>> y_lm = lmo.l_moment(y, [1, 2, 3, 4])
>>> y_lm.round(3)
array([13.193,  1.286,  0.006,  0.168])
>>> l_moment_gof(X, y_lm, n).pvalue
0.0

lmo.diagnostic.l_stats_gof(rv_or_cdf, l_stats, n_obs, /, trim=0, **kwargs)

Analogous to lmo.diagnostic.l_moment_gof, but using the L-stats (see lmo.l_stats).

lmo.diagnostic.l_moment_bounds(r, /, trim=0, scale=1.0)

l_moment_bounds(r: AnyOrderND, /, trim: AnyTrim = ..., scale: float = ...) -> _ArrF8
l_moment_bounds(r: AnyOrder, /, trim: AnyTrim = ..., scale: float = ...) -> float

Returns the absolute upper bounds \(L^{(s,t)}_r\) on L-moments \(\lambda^{(s,t)}_r\), proportional to the scale \(\sigma_X\) (standard deviation) of the probability distribution of random variable \(X\). So \(\left| \lambda^{(s,t)}_r(X) \right| \le \sigma_X \, L^{(s,t)}_r\), given that standard deviation \(\sigma_X\) of \(X\) exists and is finite.

Warning

These bounds do not apply to distributions with undefined variance, e.g. the Cauchy distribution, even if trimmed L-moments are used. Distributions with infinite variance (e.g. Student’s t with \(\nu=2\)) are a grey area:

For the L-scale (\(r=2\)), the corresponding bound will not be a valid one. However, it can still be used to find the L-ratio bounds, because the \(\sigma_X\) terms will cancel out. Doing this is not for the faint of heart, as it requires dividing infinity by infinity. So be sure to wear safety glasses.

The bounds are derived by applying the Cauchy-Schwarz inequality to the covariance-based definition of the generalized trimmed L-moment, for \(r > 1\):

\[ \lambda^{(s,t)}_r(X) = \frac{r+s+t}{r} \frac{B(r,\, r+s+t)}{B(r+s,\, r+t)} \mathrm{Cov}\left[ X,\; F(X)^s \big(1 - F(X)\big)^t P^{(\alpha, \beta)}_r(X) \right] \;, \]

where \(B\) is the Beta function, \(P^{(\alpha, \beta)}_r\) the Jacobi polynomial, and \(F\) the cumulative distribution function of random variable \(X\).

After a lot of work, one can (and one did) derive the closed-form inequality:

\[ \left| \lambda^{(s,t)}_r(X) \right| \le \frac{\sigma_X}{\sqrt{2 \pi}} \frac{\Gamma(r+s+t+1)}{r} \sqrt{\frac{ B(r-\frac{1}{2}, s+\frac{1}{2}, t+\frac{1}{2}) }{ \Gamma(s+t+1) \Gamma(r+s) \Gamma(r+t) }} \]

for \(r \in \mathbb{N}_{\ge 2}\) and \(s, t \in \mathbb{R}_{\ge 0}\), where \(\Gamma\) is the Gamma function, and \(B\) the multivariate Beta function.

For the untrimmed L-moments, this simplifies to

\[ \left| \lambda_r(X) \right| \le \frac{\sigma_X}{\sqrt{2 r - 1}} \,. \]
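
This simplified bound is easy to check numerically; a quick sketch of the formula above, with \(\sigma_X = 1\):

>>> import numpy as np
>>> r = np.array([2, 3, 4])
>>> 1 / np.sqrt(2 * r - 1)  # untrimmed L-moment bounds
array([0.57735027, 0.4472136 , 0.37796447])
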
Notes

For \(r=1\) there are no bounds, i.e. float('inf') is returned.

There are no references; this novel finding is not (yet..?) published by the author, @jorenham.

Parameters:

  • r (AnyOrder | AnyOrderND, required): The L-moment order(s), non-negative integer or array-like of integers.
  • trim (AnyTrim, default 0): Left- and right-trim orders \((s, t)\), as a tuple of non-negative ints or floats.
  • scale (float, default 1.0): The standard deviation \(\sigma_X\) of the random variable \(X\).

Returns:

  • out (float | _ArrF8): Float scalar or array, shaped like r.


lmo.diagnostic.l_ratio_bounds(r, /, trim=0, *, legacy=False)

l_ratio_bounds(
    r: AnyOrderND, /, trim: AnyTrim = ..., *, legacy: bool = ...
) -> _Tuple2[_ArrF8]
l_ratio_bounds(
    r: AnyOrder, /, trim: AnyTrim = ..., *, legacy: bool = ...
) -> _Tuple2[float]

Unlike the standardized product-moments, the L-moment ratios with \( r \ge 2 \) are bounded above and below.

Specifically, Hosking derived in 2007 that

\[ | \tlratio{r}{s,t}| \le \frac 2 r \frac{\ffact{r + s + t}{r - 2}}{\ffact{r - 1 + s \wedge t}{r - 2}} . \]

But this derivation relies on unnecessarily loose Jacobi polynomial bounds. If the actual min and max of the Jacobi polynomials are used instead, the following (tighter) inequality is obtained:

\[ \frac{\dot{w}_r^{(s, t)}}{\dot{w}_2^{(s, t)}} \min_{u \in [0, 1]} \left[ \shjacobi{r - 1}{t + 1}{s + 1}{u} \right] \le \tlratio{r}{s, t} \le \frac{\dot{w}_r^{(s, t)}}{\dot{w}_2^{(s, t)}} \max_{u \in [0, 1]} \left[ \shjacobi{r - 1}{t + 1}{s + 1}{u} \right], \]

where

\[ \dot{w}_r^{(s, t)} = \frac{\B(r - 1,\ r + s + t + 1)}{r \B(r + s,\ r + t)}. \]
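
These weights can be evaluated directly from this definition; a minimal sketch using scipy's Beta function (the name w_dot is illustrative):

>>> from scipy.special import beta
>>> def w_dot(r, s, t):
...     return beta(r - 1, r + s + t + 1) / (r * beta(r + s, r + t))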

Examples:

Without trim, the lower- and upper-bounds of the L-skewness and L-kurtosis are:

>>> l_ratio_bounds(3)
(-1.0, 1.0)
>>> l_ratio_bounds(4)
(-0.25, 1.0)

For the L-kurtosis, the “legacy” bounds by Hosking (2007) are clearly looser:

>>> l_ratio_bounds(4, legacy=True)
(-1.0, 1.0)

For the symmetrically trimmed TL-moment ratios:

>>> l_ratio_bounds(3, trim=3)
(-1.2, 1.2)
>>> l_ratio_bounds(4, trim=3)
(-0.15, 1.5)

Similarly, those of the LL-ratios are:

>>> l_ratio_bounds(3, trim=(0, 3))
(-0.8, 2.0)
>>> l_ratio_bounds(4, trim=(0, 3))
(-0.233333, 3.5)

The LH-skewness bounds are “flipped” w.r.t. the LL-skewness bounds, but the L*-kurtosis bounds are the same:

>>> l_ratio_bounds(3, trim=(3, 0))
(-2.0, 0.8)
>>> l_ratio_bounds(4, trim=(3, 0))
(-0.233333, 3.5)

The bounds of multiple L-ratios can be calculated in one shot:

>>> np.stack(l_ratio_bounds([3, 4, 5, 6], trim=(1, 2)))
array([[-1.        , -0.19444444, -1.12      , -0.14945848],
       [ 1.33333333,  1.75      ,  2.24      ,  2.8       ]])

Parameters:

  • r (AnyOrder | AnyOrderND, required): Scalar or array-like with the L-moment ratio order(s).
  • trim (AnyTrim, default 0): L-moment ratio trim-length(s).
  • legacy (bool, default False): If set to True, uses the (looser) bounds by Hosking (2007).

Returns:

  • _Tuple2[float | _ArrF8]: A 2-tuple with scalars or arrays of the lower- and upper bounds.

References

J.R.M. Hosking (2007) - Some theory and practical uses of trimmed L-moments

lmo.diagnostic.rejection_point(influence_fn, /, rho_min=0, rho_max=np.inf)

Evaluate the approximate rejection point of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).

\[ \rho^*_{T|F} = \inf_{r > 0} \left\{ r : \left| \psi_{T|F}(x) \right| \le \epsilon \;\; \forall \, |x| > r \right\} \]

with \(\epsilon\) a small positive number, corresponding to the tol parameter of e.g. l_moment_influence, which defaults to 1e-8.
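
A minimal sketch of the idea (not necessarily Lmo's implementation): scan candidate radii on a grid, and return the smallest one beyond which the influence function stays within tolerance on both tails:

>>> import numpy as np
>>> def rejection_point_sketch(psi, rho_max=1e3, tol=1e-8, num=10_000):
...     rho = np.linspace(0, rho_max, num)[1:]
...     ok = np.array([abs(psi(r)) <= tol and abs(psi(-r)) <= tol for r in rho])
...     # tail_ok[i]: |psi| stays within tol at every radius from rho[i] onwards
...     tail_ok = np.cumprod(ok[::-1])[::-1].astype(bool)
...     return rho[np.argmax(tail_ok)] if tail_ok.any() else np.nan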

Examples:

The untrimmed L-location isn’t robust, e.g. with the standard normal distribution:

>>> import numpy as np
>>> from scipy.stats import distributions as dists
>>> from lmo.diagnostic import rejection_point
>>> if_l_loc_norm = dists.norm.l_moment_influence(1, trim=0)
>>> if_l_loc_norm(np.inf)
inf
>>> rejection_point(if_l_loc_norm)
nan

For the TL-location of the Gaussian distribution, and even of the Student's t distribution with 4 degrees of freedom (3 also works, but is very slow), the rejection points do exist:

>>> influence_norm = dists.norm.l_moment_influence(1, trim=1)
>>> influence_t4 = dists.t(4).l_moment_influence(1, trim=1)
>>> influence_norm(np.inf), influence_t4(np.inf)
(0.0, 0.0)
>>> rejection_point(influence_norm), rejection_point(influence_t4)
(6.0, 206.0)
Notes

Large rejection points (e.g. >1000) are unlikely to be found.

For instance, that of the TL-location of the Student's t distribution with 2 degrees of freedom lies somewhere between 1e4 and 1e5, but will not be found. In this case, using trim=2 will return 166.0.

Parameters:

  • influence_fn (Callable[[float], float], required): Univariate influence function.
  • rho_min (float, default 0): The minimum \(\rho^*_{T|F}\) of the search space. Must be finite and non-negative.
  • rho_max (float, default inf): The maximum \(\rho^*_{T|F}\) of the search space. Must be larger than rho_min.

Returns:

  • float: A finite or infinite scalar.


lmo.diagnostic.error_sensitivity(influence_fn, /, domain=(-math.inf, math.inf))

Evaluate the gross-error sensitivity of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).

\[ \gamma^*_{T|F} = \max_{x} \left| \psi_{T|F}(x) \right| \]
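
A minimal sketch of this maximization (not necessarily Lmo's implementation), using bounded scalar minimization of \(-|\psi|\); an infinite domain would first need to be truncated or transformed to finite bounds:

>>> from scipy.optimize import minimize_scalar
>>> def error_sensitivity_sketch(psi, domain=(-1e6, 1e6)):
...     res = minimize_scalar(lambda x: -abs(psi(x)), bounds=domain, method="bounded")
...     return abs(psi(res.x))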

Examples:

Evaluate the gross-error sensitivity of the standard exponential distribution’s LL-skewness (\(\tau^{(0, 1)}_3\)) and LL-kurtosis (\(\tau^{(0, 1)}_4\)) coefficients:

>>> from lmo.diagnostic import error_sensitivity
>>> from scipy.stats import expon
>>> ll_skew_if = expon.l_ratio_influence(3, 2, trim=(0, 1))
>>> ll_kurt_if = expon.l_ratio_influence(4, 2, trim=(0, 1))
>>> error_sensitivity(ll_skew_if, domain=(0, float("inf")))
1.814657
>>> error_sensitivity(ll_kurt_if, domain=(0, float("inf")))
1.377743

Parameters:

  • influence_fn (Callable[[float], float], required): Univariate influence function.
  • domain (_Tuple2[float], default (-inf, inf)): Domain of the CDF.

Returns:

  • float: Gross-error sensitivity \(\gamma^*_{T|F}\).


lmo.diagnostic.shift_sensitivity(influence_fn, /, domain=(-math.inf, math.inf))

Evaluate the local-shift sensitivity of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).

\[ \lambda^*_{T|F} = \max_{x \neq y} \left| \frac{ \psi_{T|F}(y) - \psi_{T|F}(x) }{ y - x } \right| \]

Represents the effect of shifting an observation slightly from \(x\) to a neighbouring point \(y\), e.g. adding an observation at \(y\) and removing one at \(x\).
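
A minimal sketch of this quantity (not necessarily Lmo's implementation): approximate the maximum slope of \(\psi\) with finite differences on a dense grid over a finite part of the domain:

>>> import numpy as np
>>> def shift_sensitivity_sketch(psi, domain=(0.0, 100.0), num=100_001):
...     x = np.linspace(*domain, num)
...     y = np.array([psi(t) for t in x])
...     return np.max(np.abs(np.diff(y) / np.diff(x)))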

Examples:

Evaluate the local-shift sensitivity of the standard exponential distribution’s LL-skewness (\(\tau^{(0, 1)}_3\)) and LL-kurtosis (\(\tau^{(0, 1)}_4\)) coefficients:

>>> from lmo.diagnostic import shift_sensitivity
>>> from scipy.stats import expon
>>> ll_skew_if = expon.l_ratio_influence(3, 2, trim=(0, 1))
>>> ll_kurt_if = expon.l_ratio_influence(4, 2, trim=(0, 1))
>>> domain = 0, float("inf")
>>> shift_sensitivity(ll_skew_if, domain)
0.837735
>>> shift_sensitivity(ll_kurt_if, domain)
1.442062

Let’s compare these with the untrimmed ones:

>>> shift_sensitivity(expon.l_ratio_influence(3, 2), domain)
1.920317
>>> shift_sensitivity(expon.l_ratio_influence(4, 2), domain)
1.047565

Parameters:

  • influence_fn (Callable[[float], float], required): Univariate influence function.
  • domain (_Tuple2[float], default (-inf, inf)): Domain of the CDF.

Returns:

  • float: Local-shift sensitivity \(\lambda^*_{T|F}\).
