Statistical tests and tools¶
lmo.diagnostic
¶
Hypothesis tests, estimator properties, and performance metrics.
lmo.diagnostic.HypothesisTestResult
¶
Bases: NamedTuple
Results of a hypothesis test.
lmo.diagnostic.normaltest(a, /, *, axis=None)
¶
Statistical hypothesis test for non-normality, using the L-skewness and L-kurtosis coefficients on the sample data.
Adapted from Harri & Coble (2011), and includes Hosking’s correction.
Definition
- H0: The data was drawn from a normal distribution.
- H1: The data was drawn from a non-normal distribution.
Examples:
Compare the testing power with scipy.stats.normaltest given 10,000 samples from a contaminated normal distribution.
>>> import numpy as np
>>> from lmo.diagnostic import normaltest
>>> from scipy.stats import normaltest as normaltest_scipy
>>> rng = np.random.default_rng(12345)
>>> n = 10_000
>>> x = 0.9 * rng.normal(0, 1, n) + 0.1 * rng.normal(0, 9, n)
>>> normaltest(x)[1]
0.04806618
>>> normaltest_scipy(x)[1]
0.08435627
At a 5% significance level, Lmo’s test is significant (i.e. indicating non-normality), whereas scipy’s test isn’t (i.e. inconclusive).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
a | AnyArrayFloat | Array-like of sample data. | required |
axis | int | None | Axis along which to compute the test. | None |
Returns:
Type | Description |
---|---|
HypothesisTestResult | A named tuple with the test statistic and pvalue. |
lmo.diagnostic.l_moment_gof(rv_or_cdf, l_moments, n_obs, /, trim=0, **kwargs)
¶
Goodness-of-fit (GOF) hypothesis test for the null hypothesis that the observed L-moments come from the given scipy.stats distribution or cumulative distribution function (CDF).
- H0: The theoretical probability distribution, with the given CDF, is a good fit for the observed L-moments.
- H1: The distribution is not a good fit for the observed L-moments.
The test statistic is the squared Mahalanobis distance between the \(n\) observed L-moments and the theoretical L-moments. It asymptotically (in sample size) follows the \(\chi^2\) distribution, with \(n\) degrees of freedom.
The sample L-moments are expected to be of consecutive orders \(r = 1, 2, \dots, n\). Generally, the number of L-moments \(n\) should not be less than the number of parameters of the distribution, including the location and scale parameters. Therefore, it is required to have \(n \ge 2\).
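The mechanics of this statistic can be sketched as follows. This is an illustrative reimplementation under the assumptions stated above (squared Mahalanobis distance referred to a \(\chi^2_n\) distribution), not Lmo's internal code; the function name and its covariance argument are hypothetical:

```python
import numpy as np
from scipy.stats import chi2


def gof_pvalue_sketch(lm_obs, lm_theo, cov):
    """Squared Mahalanobis distance between observed and theoretical
    L-moments, referred to a chi-squared distribution with n dof."""
    err = np.asarray(lm_obs) - np.asarray(lm_theo)
    d2 = err @ np.linalg.solve(cov, err)  # squared Mahalanobis distance
    return d2, chi2(df=len(err)).sf(d2)  # (statistic, p-value)
```

With observed L-moments equal to the theoretical ones, the distance is zero and the p-value is one; the larger the (covariance-weighted) discrepancy, the smaller the p-value.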
Notes
The theoretical L-moments and their covariance matrix are calculated from the CDF using numerical integration (scipy.integrate.quad and scipy.integrate.nquad). Undefined or infinite integrals cannot be detected, in which case the results might be incorrect.
If an IntegrationWarning is issued, or the function is very slow, then the results are probably incorrect, and larger degrees of trimming should be used.
Examples:
Test if the samples are drawn from a normal distribution.
>>> import lmo
>>> import numpy as np
>>> from lmo.diagnostic import l_moment_gof
>>> from scipy.stats import norm
>>> rng = np.random.default_rng(12345)
>>> X = norm(13.12, 1.66)
>>> n = 1_000
>>> x = X.rvs(n, random_state=rng)
>>> x_lm = lmo.l_moment(x, [1, 2, 3, 4])
>>> l_moment_gof(X, x_lm, n).pvalue
0.82597
Contaminated samples:
>>> y = 0.9 * x + 0.1 * rng.normal(X.mean(), X.std() * 10, n)
>>> y_lm = lmo.l_moment(y, [1, 2, 3, 4])
>>> y_lm.round(3)
array([13.193, 1.286, 0.006, 0.168])
>>> l_moment_gof(X, y_lm, n).pvalue
0.0
lmo.diagnostic.l_stats_gof(rv_or_cdf, l_stats, n_obs, /, trim=0, **kwargs)
¶
Analogous to lmo.diagnostic.l_moment_gof, but using the L-stats (see lmo.l_stats).
lmo.diagnostic.l_moment_bounds(r, /, trim=0, scale=1.0)
¶
l_moment_bounds(r: AnyOrderND, /, trim: AnyTrim = ..., scale: float = ...) -> _ArrF8
l_moment_bounds(r: AnyOrder, /, trim: AnyTrim = ..., scale: float = ...) -> float
Returns the absolute upper bounds \(L^{(s,t)}_r\) on L-moments \(\lambda^{(s,t)}_r\), proportional to the scale \(\sigma_X\) (standard deviation) of the probability distribution of random variable \(X\). So \(\left| \lambda^{(s,t)}_r(X) \right| \le \sigma_X \, L^{(s,t)}_r\), given that standard deviation \(\sigma_X\) of \(X\) exists and is finite.
Warning
These bounds do not apply to distributions with undefined variance, e.g. the Cauchy distribution, even if trimmed L-moments are used. Distributions with infinite variance (e.g. Student’s t with \(\nu=2\)) are a grey area:
For the L-scale (\(r=2\)), the corresponding bound will not be a valid one. However, it can still be used to find the L-ratio bounds, because the \(\sigma_X\) terms will cancel out. Doing this is not for the faint of heart, as it requires dividing infinity by infinity. So be sure to wear safety glasses.
The bounds are derived by applying the Cauchy-Schwarz inequality to the covariance-based definition of generalized trimmed L-moment, for \(r > 1\):
where \(B\) is the Beta function, \(P^{(\alpha, \beta)}_r\) the Jacobi polynomial, and \(F\) the cumulative distribution function of random variable \(X\).
After a lot of work, one can (and one did) derive the closed-form inequality:
for \(r \in \mathbb{N}_{\ge 2}\) and \(s, t \in \mathbb{R}_{\ge 0}\), where \(\Gamma\) is the Gamma function, and \(B\) the multivariate Beta function
For the untrimmed L-moments, this simplifies to
Notes
For \(r=1\) there are no bounds, i.e. float('inf') is returned.
There are no references; this novel finding is not (yet..?) published by the author, @jorenham.
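As a concrete check of the untrimmed case, applying Cauchy-Schwarz with the shifted Legendre polynomials yields the classic bound \(|\lambda_r| \le \sigma_X / \sqrt{2r - 1}\) for \(r \ge 2\) (so e.g. \(|\lambda_2| \le \sigma_X / \sqrt{3}\)). A minimal sketch of this simplified bound, not Lmo's implementation:

```python
import math


def l_moment_bound_untrimmed(r: int, scale: float = 1.0) -> float:
    """Cauchy-Schwarz upper bound on |lambda_r| for untrimmed L-moments:
    sigma / sqrt(2r - 1) for r >= 2; unbounded for r = 1."""
    if r == 1:
        return math.inf  # the mean has no scale-proportional bound
    return scale / math.sqrt(2 * r - 1)
```

For instance, the L-scale of any finite-variance distribution satisfies \(\lambda_2 \le \sigma_X / \sqrt{3} \approx 0.577 \, \sigma_X\); the standard normal has \(\lambda_2 = 1/\sqrt{\pi} \approx 0.564\), comfortably within the bound.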
Parameters:
Name | Type | Description | Default |
---|---|---|---|
r | AnyOrder | AnyOrderND | The L-moment order(s), non-negative integer or array-like of integers. | required |
trim | AnyTrim | Left- and right-trim orders \((s, t)\), as a tuple of non-negative ints or floats. | 0 |
scale | float | The standard deviation \(\sigma_X\) of the random variable \(X\). Defaults to 1. | 1.0 |
Returns:
Name | Type | Description |
---|---|---|
out | float | _ArrF8 | The L-moment bound(s), as a scalar or array matching the shape of r. |
lmo.diagnostic.l_ratio_bounds(r, /, trim=0, *, legacy=False)
¶
l_ratio_bounds(
r: AnyOrderND, /, trim: AnyTrim = ..., *, legacy: bool = ...
) -> _Tuple2[_ArrF8]
l_ratio_bounds(
r: AnyOrder, /, trim: AnyTrim = ..., *, legacy: bool = ...
) -> _Tuple2[float]
Unlike the standardized product-moments, the L-moment ratios with \( r \ge 2 \) are bounded above and below.
Specifically, Hosking derived in 2007 that
But this derivation relies on unnecessarily loose Jacobi polynomial bounds. If the actual min and max of the Jacobi polynomials are used instead, the following (tighter) inequality is obtained:
where
Examples:
Without trim, the lower- and upper-bounds of the L-skewness and L-kurtosis are:
>>> l_ratio_bounds(3)
(-1.0, 1.0)
>>> l_ratio_bounds(4)
(-0.25, 1.0)
For the L-kurtosis, the “legacy” bounds by Hosking (2007) are clearly looser:
>>> l_ratio_bounds(4, legacy=True)
(-1.0, 1.0)
For the symmetrically trimmed TL-moment ratios:
>>> l_ratio_bounds(3, trim=3)
(-1.2, 1.2)
>>> l_ratio_bounds(4, trim=3)
(-0.15, 1.5)
Similarly, those of the LL-ratios are
>>> l_ratio_bounds(3, trim=(0, 3))
(-0.8, 2.0)
>>> l_ratio_bounds(4, trim=(0, 3))
(-0.233333, 3.5)
The LH-skewness bounds are “flipped” w.r.t. the LL-skewness, but they are the same for the L*-kurtosis:
>>> l_ratio_bounds(3, trim=(3, 0))
(-2.0, 0.8)
>>> l_ratio_bounds(4, trim=(3, 0))
(-0.233333, 3.5)
The bounds of multiple L-ratios can be calculated in one shot:
>>> np.stack(l_ratio_bounds([3, 4, 5, 6], trim=(1, 2)))
array([[-1. , -0.19444444, -1.12 , -0.14945848],
[ 1.33333333, 1.75 , 2.24 , 2.8 ]])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
r | AnyOrder | AnyOrderND | Scalar or array-like with the L-moment ratio order(s). | required |
trim | AnyTrim | L-moment ratio trim-length(s). | 0 |
legacy | bool | If set to True, use the (looser) legacy bounds of Hosking (2007). | False |
Returns:
Type | Description |
---|---|
_Tuple2[float | _ArrF8] | A 2-tuple with arrays or scalars, of the lower- and upper bounds. |
lmo.diagnostic.rejection_point(influence_fn, /, rho_min=0, rho_max=np.inf)
¶
Evaluate the approximate rejection point of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).
with \(\epsilon\) a small positive number, corresponding to the tol param of e.g. l_moment_influence, which defaults to 1e-8.
Examples:
The untrimmed L-location isn’t robust, e.g. with the standard normal distribution:
>>> import numpy as np
>>> from scipy.stats import distributions as dists
>>> from lmo.diagnostic import rejection_point
>>> if_l_loc_norm = dists.norm.l_moment_influence(1, trim=0)
>>> if_l_loc_norm(np.inf)
inf
>>> rejection_point(if_l_loc_norm)
nan
For the TL-location of the Gaussian distribution, and even for the Student’s t distribution with 4 degrees of freedom (3 also works, but is very slow), the rejection points exist:
>>> influence_norm = dists.norm.l_moment_influence(1, trim=1)
>>> influence_t4 = dists.t(4).l_moment_influence(1, trim=1)
>>> influence_norm(np.inf), influence_t4(np.inf)
(0.0, 0.0)
>>> rejection_point(influence_norm), rejection_point(influence_t4)
(6.0, 206.0)
Notes
Large rejection points (e.g. >1000) are unlikely to be found.
For instance, that of the TL-location of the Student’s t distribution with 2 degrees of freedom lies somewhere between 1e4 and 1e5, but will not be found. In this case, using trim=2 will return 166.0.
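A naive way to approximate such a rejection point is a grid search for the first \(\rho\) at which the influence function drops below the tolerance. The hypothetical sketch below is illustrative only, not Lmo's implementation, and assumes \(|\psi|\) stays below tol beyond that point (which it does not verify):

```python
import numpy as np


def rejection_point_sketch(psi, rho_max=1e3, tol=1e-8):
    """Hypothetical sketch: smallest rho in [0, rho_max] with
    |psi(+rho)| < tol and |psi(-rho)| < tol, found by grid search."""
    for rho in np.linspace(0.0, rho_max, 10_001):
        if abs(psi(rho)) < tol and abs(psi(-rho)) < tol:
            return float(rho)
    return float("nan")  # unbounded influence: no rejection point found
```

An influence function that never decays (like that of the untrimmed L-location of the normal distribution) yields nan, matching the behaviour shown in the example above.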
Parameters:
Name | Type | Description | Default |
---|---|---|---|
influence_fn | Callable[[float], float] | Univariate influence function. | required |
rho_min | float | The minimum \(\rho^*_{T|F}\) of the search space. Must be finite and non-negative. Defaults to \(0\). | 0 |
rho_max | float | The maximum \(\rho^*_{T|F}\) of the search space. Must be larger than rho_min. Defaults to \(\infty\). | inf |
Returns:
Type | Description |
---|---|
float | A finite or infinite scalar. |
lmo.diagnostic.error_sensitivity(influence_fn, /, domain=(-math.inf, math.inf))
¶
Evaluate the gross-error sensitivity of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).
Examples:
Evaluate the gross-error sensitivity of the standard exponential distribution’s LL-skewness (\(\tau^{(0, 1)}_3\)) and LL-kurtosis (\(\tau^{(0, 1)}_4\)) coefficients:
>>> from lmo.diagnostic import error_sensitivity
>>> from scipy.stats import expon
>>> ll_skew_if = expon.l_ratio_influence(3, 2, trim=(0, 1))
>>> ll_kurt_if = expon.l_ratio_influence(4, 2, trim=(0, 1))
>>> error_sensitivity(ll_skew_if, domain=(0, float("inf")))
1.814657
>>> error_sensitivity(ll_kurt_if, domain=(0, float("inf")))
1.377743
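Conceptually, the gross-error sensitivity is the supremum of \(|\psi_{T|F}(x)|\) over the domain. A dense-grid approximation (an illustrative sketch with a hypothetical function name, not Lmo's implementation, and assuming the supremum is attained inside a truncated finite domain):

```python
import numpy as np


def error_sensitivity_sketch(psi, domain=(-10.0, 10.0), num=100_001):
    """Approximate gamma* = sup |psi(x)| by evaluating the influence
    function on a dense, evenly spaced grid over the (finite) domain."""
    xs = np.linspace(*domain, num)
    return float(np.max(np.abs([psi(x) for x in xs])))
```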
Parameters:
Name | Type | Description | Default |
---|---|---|---|
influence_fn | Callable[[float], float] | Univariate influence function. | required |
domain | _Tuple2[float] | Domain of the CDF. Defaults to \((-\infty, \infty)\). | (-inf, inf) |
Returns:
Type | Description |
---|---|
float | Gross-error sensitivity \(\gamma^*_{T|F}\) . |
lmo.diagnostic.shift_sensitivity(influence_fn, /, domain=(-math.inf, math.inf))
¶
Evaluate the local-shift sensitivity of an influence function \(\psi_{T|F}(x)\) given a statistical functional \(T\) (e.g. an L-moment) and cumulative distribution function \(F(x)\).
Represents the effect of shifting an observation slightly from \(x\), to a neighbouring point \(y\). For instance, adding an observation at \(y\) and removing one at \(x\).
Examples:
Evaluate the local-shift sensitivity of the standard exponential distribution’s LL-skewness (\(\tau^{(0, 1)}_3\)) and LL-kurtosis (\(\tau^{(0, 1)}_4\)) coefficients:
>>> from lmo.diagnostic import shift_sensitivity
>>> from scipy.stats import expon
>>> ll_skew_if = expon.l_ratio_influence(3, 2, trim=(0, 1))
>>> ll_kurt_if = expon.l_ratio_influence(4, 2, trim=(0, 1))
>>> domain = 0, float("inf")
>>> shift_sensitivity(ll_skew_if, domain)
0.837735
>>> shift_sensitivity(ll_kurt_if, domain)
1.442062
Let’s compare these with the untrimmed ones:
>>> shift_sensitivity(expon.l_ratio_influence(3, 2), domain)
1.920317
>>> shift_sensitivity(expon.l_ratio_influence(4, 2), domain)
1.047565
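The local-shift sensitivity is the supremum of \(|\psi(y) - \psi(x)| \, / \, |y - x|\), i.e. the steepest slope of the influence function. A finite-difference sketch (hypothetical name, illustrative only, assuming a finite grid over a truncated domain captures the steepest region):

```python
import numpy as np


def shift_sensitivity_sketch(psi, domain=(-10.0, 10.0), num=100_001):
    """Approximate lambda* = sup |psi(y) - psi(x)| / |y - x| with
    forward differences of psi on a dense, evenly spaced grid."""
    xs = np.linspace(*domain, num)
    ys = np.array([psi(x) for x in xs])
    return float(np.max(np.abs(np.diff(ys))) / (xs[1] - xs[0]))
```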
Parameters:
Name | Type | Description | Default |
---|---|---|---|
influence_fn | Callable[[float], float] | Univariate influence function. | required |
domain | _Tuple2[float] | Domain of the CDF. Defaults to \((-\infty, \infty)\). | (-inf, inf) |
Returns:
Type | Description |
---|---|
float | Local-shift sensitivity \(\lambda^*_{T|F}\) . |