Statistical tests and tools¶
lmo.diagnostic
¶
Hypothesis tests, estimator properties, and performance metrics.
lmo.diagnostic.HypothesisTestResult
¶
Bases: NamedTuple
Results of a hypothesis test.
lmo.diagnostic.normaltest(a, /, *, axis=None)
¶
Statistical hypothesis test for non-normality, using the L-skewness and L-kurtosis coefficients on the sample data.
Adapted from Harri & Coble (2011), and includes Hosking’s correction.
Definition
- H0: The data was drawn from a normal distribution.
- H1: The data was drawn from a non-normal distribution.
Examples:
Compare the testing power with scipy.stats.normaltest given 10,000 samples from a contaminated normal distribution.
>>> import numpy as np
>>> from lmo.diagnostic import normaltest
>>> from scipy.stats import normaltest as normaltest_scipy
>>> rng = np.random.default_rng(12345)
>>> n = 10_000
>>> x = 0.9 * rng.normal(0, 1, n) + 0.1 * rng.normal(0, 9, n)
>>> normaltest(x)[1]
0.04806618
>>> normaltest_scipy(x)[1]
0.08435627
At a 5% significance level, Lmo’s test is significant (i.e. indicating non-normality), whereas scipy’s test isn’t (i.e. inconclusive).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
a | AnyArrayFloat | Array-like of sample data. | required |
axis | int | None | Axis along which to compute the test. | None |
Returns:
Type | Description |
---|---|
HypothesisTestResult | A named tuple with the test statistic and pvalue. |
lmo.diagnostic.l_moment_gof(rv_or_cdf, l_moments, n_obs, /, trim=0, **kwargs)
¶
Goodness-of-fit (GOF) hypothesis test for the null hypothesis that the observed L-moments come from the given scipy.stats distribution or cumulative distribution function (CDF).
- H0: The theoretical probability distribution, with the given CDF, is a good fit for the observed L-moments.
- H1: The distribution is not a good fit for the observed L-moments.
The test statistic is the squared Mahalanobis distance between the \(n\) observed L-moments and the theoretical L-moments. It asymptotically (in sample size) follows the \(\chi^2\) distribution, with \(n\) degrees of freedom.
The sample L-moments are expected to be of consecutive orders \(r = 1, 2, \dots, n\). Generally, the number of L-moments \(n\) should not be less than the number of parameters of the distribution, including the location and scale parameters. Therefore, it is required to have \(n \ge 2\).
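The mechanics of this statistic can be sketched as follows. This is an illustrative reimplementation under the assumptions stated above (squared Mahalanobis distance referred to a \(\chi^2_n\) distribution), not Lmo's internal code; the function name and its covariance argument are hypothetical:

```python
import numpy as np
from scipy.stats import chi2


def gof_pvalue_sketch(lm_obs, lm_theo, cov):
    """Squared Mahalanobis distance between observed and theoretical
    L-moments, referred to a chi-squared distribution with n dof."""
    err = np.asarray(lm_obs) - np.asarray(lm_theo)
    d2 = err @ np.linalg.solve(cov, err)  # squared Mahalanobis distance
    return d2, chi2(df=len(err)).sf(d2)  # (statistic, p-value)
```

With observed L-moments equal to the theoretical ones, the distance is zero and the p-value is one; the larger the (covariance-weighted) discrepancy, the smaller the p-value.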
Notes
The theoretical L-moments and their covariance matrix are calculated from the CDF using numerical integration (scipy.integrate.quad and scipy.integrate.nquad). Undefined or infinite integrals cannot be detected, in which case the results might be incorrect.
If an IntegrationWarning is issued, or the function is very slow, then the results are probably incorrect, and larger degrees of trimming should be used.
Examples:
Test if the samples are drawn from a normal distribution.
>>> import lmo
>>> import numpy as np
>>> from lmo.diagnostic import l_moment_gof
>>> from scipy.stats import norm
>>> rng = np.random.default_rng(12345)
>>> X = norm(13.12, 1.66)
>>> n = 1_000
>>> x = X.rvs(n, random_state=rng)
>>> x_lm = lmo.l_moment(x, [1, 2, 3, 4])
>>> l_moment_gof(X, x_lm, n).pvalue
0.82597
Contaminated samples:
>>> y = 0.9 * x + 0.1 * rng.normal(X.mean(), X.std() * 10, n)
>>> y_lm = lmo.l_moment(y, [1, 2, 3, 4])
>>> y_lm.round(3)
array([13.193, 1.286, 0.006, 0.168])
>>> l_moment_gof(X, y_lm, n).pvalue
0.0
lmo.diagnostic.l_stats_gof(rv_or_cdf, l_stats, n_obs, /, trim=0, **kwargs)
¶
Analogous to lmo.diagnostic.l_moment_gof, but using the L-stats (see lmo.l_stats).
lmo.diagnostic.l_moment_bounds(r, /, trim=0, scale=1.0)
¶
l_moment_bounds(r: AnyOrderND, /, trim: AnyTrim = ..., scale: float = ...) -> _ArrF8
l_moment_bounds(r: AnyOrder, /, trim: AnyTrim = ..., scale: float = ...) -> float
Returns the absolute upper bounds \(L^{(s,t)}_r\) on L-moments \(\lambda^{(s,t)}_r\), proportional to the scale \(\sigma_X\) (standard deviation) of the probability distribution of random variable \(X\). So \(\left| \lambda^{(s,t)}_r(X) \right| \le \sigma_X \, L^{(s,t)}_r\), given that standard deviation \(\sigma_X\) of \(X\) exists and is finite.
Warning
These bounds do not apply to distributions with undefined variance, e.g. the Cauchy distribution, even if trimmed L-moments are used. Distributions with infinite variance (e.g. Student’s t with \(\nu=2\)) are a grey area:
For the L-scale (\(r=2\)), the corresponding bound will not be a valid one. However, it can still be used to find the L-ratio bounds, because the \(\sigma_X\) terms will cancel out. Doing this is not for the faint of heart, as it requires dividing infinity by infinity. So be sure to wear safety glasses.
The bounds are derived by applying the Cauchy-Schwarz inequality to the covariance-based definition of generalized trimmed L-moment, for \(r > 1\):
where \(B\) is the Beta function, \(P^{(\alpha, \beta)}_r\) the Jacobi polynomial, and \(F\) the cumulative distribution function of random variable \(X\).
After a lot of work, one can (and one did) derive the closed-form inequality:
for \(r \in \mathbb{N}_{\ge 2}\) and \(s, t \in \mathbb{R}_{\ge 0}\), where \(\Gamma\) is the Gamma function, and \(B\) the multivariate Beta function
For the untrimmed L-moments, this simplifies to
Notes
For \(r=1\) there are no bounds, i.e. float('inf') is returned.
There are no references; this novel finding is not (yet..?) published by the author, @jorenham.
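As a concrete check of the untrimmed case, applying Cauchy-Schwarz with the shifted Legendre polynomials yields the classic bound \(|\lambda_r| \le \sigma_X / \sqrt{2r - 1}\) for \(r \ge 2\) (so e.g. \(|\lambda_2| \le \sigma_X / \sqrt{3}\)). A minimal sketch of this simplified bound, not Lmo's implementation:

```python
import math


def l_moment_bound_untrimmed(r: int, scale: float = 1.0) -> float:
    """Cauchy-Schwarz upper bound on |lambda_r| for untrimmed L-moments:
    sigma / sqrt(2r - 1) for r >= 2; unbounded for r = 1."""
    if r == 1:
        return math.inf  # the mean has no scale-proportional bound
    return scale / math.sqrt(2 * r - 1)
```

For instance, the L-scale of any finite-variance distribution satisfies \(\lambda_2 \le \sigma_X / \sqrt{3} \approx 0.577 \, \sigma_X\); the standard normal has \(\lambda_2 = 1/\sqrt{\pi} \approx 0.564\), comfortably within the bound.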
Parameters:
Name | Type | Description | Default |
---|---|---|---|
r | AnyOrder | AnyOrderND | The L-moment order(s), non-negative integer or array-like of integers. | required |
trim | AnyTrim | Left- and right-trim orders \((s, t)\), as a tuple of non-negative ints or floats. | 0 |
scale | float | The standard deviation \(\sigma_X\) of the random variable \(X\). Defaults to 1. | 1.0 |
Returns:
Name | Type | Description |
---|---|---|
out | float | _ArrF8 | The L-moment bound(s), as a scalar or array matching the shape of r. |
lmo.diagnostic.l_ratio_bounds(r, /, trim=0, *, legacy=False)
¶
l_ratio_bounds(
r: AnyOrderND, /, trim: AnyTrim = ..., *, legacy: bool = ...
) -> _Tuple2[_ArrF8]
l_ratio_bounds(
r: AnyOrder, /, trim: AnyTrim = ..., *, legacy: bool = ...
) -> _Tuple2[float]
Unlike the standardized product-moments, the L-moment ratios with \( r \ge 2 \) are bounded above and below.
Specifically, Hosking derived in 2007 that
But this derivation relies on unnecessarily loose Jacobi polynomial bounds. If the actual min and max of the Jacobi polynomials are used instead, the following (tighter) inequality is obtained:
where
Examples:
Without trim, the lower- and upper-bounds of the L-skewness and L-kurtosis are:
>>> l_ratio_bounds(3)
(-1.0, 1.0)
>>> l_ratio_bounds(4)
(-0.25, 1.0)
For the L-kurtosis, the “legacy” bounds by Hosking (2007) are clearly looser:
>>> l_ratio_bounds(4, legacy=True)
(-1.0, 1.0)
For the symmetrically trimmed TL-moment ratios:
>>> l_ratio_bounds(3, trim=3)
(-1.2, 1.2)
>>> l_ratio_bounds(4, trim=3)
(-0.15, 1.5)
Similarly, those of the LL-ratios are
>>> l_ratio_bounds(3, trim=(0, 3))
(-0.8, 2.0)
>>> l_ratio_bounds(4, trim=(0, 3))
(-0.233333, 3.5)
The LH-skewness bounds are “flipped” w.r.t. the LL-skewness, but they are the same for the L*-kurtosis:
>>> l_ratio_bounds(3, trim=(3, 0))
(-2.0, 0.8)
>>> l_ratio_bounds(4, trim=(3, 0))
(-0.233333, 3.5)
The bounds of multiple L-ratios can be calculated in one shot:
>>> np.stack(l_ratio_bounds([3, 4, 5, 6], trim=(1, 2)))
array([[-1. , -0.19444444, -1.12 , -0.14945848],
[ 1.33333333, 1.75 , 2.24 , 2.8 ]])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
r | AnyOrder | AnyOrderND | Scalar or array-like with the L-moment ratio order(s). | required |
trim | AnyTrim | L-moment ratio trim-length(s). | 0 |
legacy | bool | If set to True, use the (looser) legacy bounds of Hosking (2007). | False |
Returns:
Type | Description |
---|---|
_Tuple2[float | _ArrF8] | A 2-tuple with arrays or scalars, of the lower- and upper bounds. |
lmo.diagnostic.rejection_point(influence_fn, /, rho_min=0, rho_max=np.inf)
¶
Evaluate the approximate rejection point of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).
with \(\epsilon\) a small positive number, corresponding to the tol param of e.g. l_moment_influence, which defaults to 1e-8.
Examples:
The untrimmed L-location isn’t robust, e.g. with the standard normal distribution:
>>> import numpy as np
>>> from scipy.stats import distributions as dists
>>> from lmo.diagnostic import rejection_point
>>> if_l_loc_norm = dists.norm.l_moment_influence(1, trim=0)
>>> if_l_loc_norm(np.inf)
inf
>>> rejection_point(if_l_loc_norm)
nan
For the TL-location of the Gaussian distribution, and even for the Student’s t distribution with 4 degrees of freedom (3 also works, but is very slow), the rejection points exist:
>>> influence_norm = dists.norm.l_moment_influence(1, trim=1)
>>> influence_t4 = dists.t(4).l_moment_influence(1, trim=1)
>>> influence_norm(np.inf), influence_t4(np.inf)
(0.0, 0.0)
>>> rejection_point(influence_norm), rejection_point(influence_t4)
(6.0, 206.0)
Notes
Large rejection points (e.g. >1000) are unlikely to be found.
For instance, that of the TL-location of the Student’s t distribution with 2 degrees of freedom lies somewhere between 1e4 and 1e5, but will not be found. In this case, using trim=2 will return 166.0.
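A naive way to approximate such a rejection point is a grid search for the first \(\rho\) at which the influence function drops below the tolerance. The hypothetical sketch below is illustrative only, not Lmo's implementation, and assumes \(|\psi|\) stays below tol beyond that point (which it does not verify):

```python
import numpy as np


def rejection_point_sketch(psi, rho_max=1e3, tol=1e-8):
    """Hypothetical sketch: smallest rho in [0, rho_max] with
    |psi(+rho)| < tol and |psi(-rho)| < tol, found by grid search."""
    for rho in np.linspace(0.0, rho_max, 10_001):
        if abs(psi(rho)) < tol and abs(psi(-rho)) < tol:
            return float(rho)
    return float("nan")  # unbounded influence: no rejection point found
```

An influence function that never decays (like that of the untrimmed L-location of the normal distribution) yields nan, matching the behaviour shown in the example above.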
Parameters:
Name | Type | Description | Default |
---|---|---|---|
influence_fn | Callable[[float], float] | Univariate influence function. | required |
rho_min | float | The minimum \(\rho^*_{T|F}\) of the search space. Must be finite and non-negative. Defaults to \(0\). | 0 |
rho_max | float | The maximum \(\rho^*_{T|F}\) of the search space. Must be larger than rho_min. Defaults to \(\infty\). | inf |
Returns:
Type | Description |
---|---|
float | A finite or infinite scalar. |
lmo.diagnostic.error_sensitivity(influence_fn, /, domain=(-math.inf, math.inf))
¶
Evaluate the gross-error sensitivity of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).
Examples:
Evaluate the gross-error sensitivity of the standard exponential distribution’s LL-skewness (\(\tau^{(0, 1)}_3\)) and LL-kurtosis (\(\tau^{(0, 1)}_4\)) coefficients:
>>> from lmo.diagnostic import error_sensitivity
>>> from scipy.stats import expon
>>> ll_skew_if = expon.l_ratio_influence(3, 2, trim=(0, 1))
>>> ll_kurt_if = expon.l_ratio_influence(4, 2, trim=(0, 1))
>>> error_sensitivity(ll_skew_if, domain=(0, float("inf")))
1.814657
>>> error_sensitivity(ll_kurt_if, domain=(0, float("inf")))
1.377743
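Conceptually, the gross-error sensitivity is the supremum of \(|\psi_{T|F}(x)|\) over the domain. A dense-grid approximation (an illustrative sketch with a hypothetical function name, not Lmo's implementation, and assuming the supremum is attained inside a truncated finite domain):

```python
import numpy as np


def error_sensitivity_sketch(psi, domain=(-10.0, 10.0), num=100_001):
    """Approximate gamma* = sup |psi(x)| by evaluating the influence
    function on a dense, evenly spaced grid over the (finite) domain."""
    xs = np.linspace(*domain, num)
    return float(np.max(np.abs([psi(x) for x in xs])))
```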
Parameters:
Name | Type | Description | Default |
---|---|---|---|
influence_fn | Callable[[float], float] | Univariate influence function. | required |
domain | _Tuple2[float] | Domain of the CDF. Defaults to \((-\infty, \infty)\). | (-inf, inf) |
Returns:
Type | Description |
---|---|
float | Gross-error sensitivity \(\gamma^*_{T|F}\) . |
lmo.diagnostic.shift_sensitivity(influence_fn, /, domain=(-math.inf, math.inf))
¶
Evaluate the local-shift sensitivity of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).
Represents the effect of shifting an observation slightly from \(x\), to a neighbouring point \(y\). For instance, adding an observation at \(y\) and removing one at \(x\).
Examples:
Evaluate the local-shift sensitivity of the standard exponential distribution’s LL-skewness (\(\tau^{(0, 1)}_3\)) and LL-kurtosis (\(\tau^{(0, 1)}_4\)) coefficients:
>>> from lmo.diagnostic import shift_sensitivity
>>> from scipy.stats import expon
>>> ll_skew_if = expon.l_ratio_influence(3, 2, trim=(0, 1))
>>> ll_kurt_if = expon.l_ratio_influence(4, 2, trim=(0, 1))
>>> domain = 0, float("inf")
>>> shift_sensitivity(ll_skew_if, domain)
0.837735
>>> shift_sensitivity(ll_kurt_if, domain)
1.442062
Let’s compare these with the untrimmed ones:
>>> shift_sensitivity(expon.l_ratio_influence(3, 2), domain)
1.920317
>>> shift_sensitivity(expon.l_ratio_influence(4, 2), domain)
1.047565
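The local-shift sensitivity is the supremum of \(|\psi(y) - \psi(x)| \, / \, |y - x|\), i.e. the steepest slope of the influence function. A finite-difference sketch (hypothetical name, illustrative only, assuming a finite grid over a truncated domain captures the steepest region):

```python
import numpy as np


def shift_sensitivity_sketch(psi, domain=(-10.0, 10.0), num=100_001):
    """Approximate lambda* = sup |psi(y) - psi(x)| / |y - x| with
    forward differences of psi on a dense, evenly spaced grid."""
    xs = np.linspace(*domain, num)
    ys = np.array([psi(x) for x in xs])
    return float(np.max(np.abs(np.diff(ys))) / (xs[1] - xs[0]))
```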
Parameters:
Name | Type | Description | Default |
---|---|---|---|
influence_fn | Callable[[float], float] | Univariate influence function. | required |
domain | _Tuple2[float] | Domain of the CDF. Defaults to \((-\infty, \infty)\). | (-inf, inf) |
Returns:
Type | Description |
---|---|
float | Local-shift sensitivity \(\lambda^*_{T|F}\) . |