module Estimate : sig .. end
Deepest numerology.
type estimator = float array -> float array -> e
estimator xs ys is an estimate of the true value of dependent ys, given independent xs.
type e = {
  a : ;
  b : ;
  r2 : ;
  bounds : ;
}
A linear estimate, modeling the dependent variable as Y = a + bX.
r2 is R² of the fit.
bounds is the 99% confidence interval for b.
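As an illustration of how the model Y = a + bX is read, the following minimal sketch evaluates a fitted line at a given X. The record `line` and the function `predict` are hypothetical names introduced for this example, and the sketch assumes that a and b are plain floats; it is not part of the documented interface.

```ocaml
(* Hypothetical record holding only the line parameters; assumes floats. *)
type line = { a : float; b : float }

(* Predicted value of the dependent variable at x, following Y = a + bX. *)
let predict { a; b } x = a +. b *. x
```

For example, `predict { a = 1.; b = 2. } 3.` evaluates 1 + 2 × 3.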
val pp_e : e Unmark.fmt
Pretty-prints e.
val validity : e -> [ `Bad | `Good | `Meh | `Undef ]
Gives an interpretation of e's validity. One of:
`Good - e describes the data extremely well.
`Meh - e should be further investigated. It does not fully describe the data.
`Bad - e is likely meaningless. It poorly describes the data.
`Undef - e is not defined over the data.
e can fail to describe the data for reasons ranging from successfully filtering out the noise, to the data being non-linear and e being completely meaningless. Rules of thumb:
If b stays stable over repeated runs, the measurement is noisy, but the estimate is good.
e is most commonly undefined when the data is all-0, in which case 0 is still a good summary of it.
validity is currently implemented by checking R². `Undef is R² = 0, `Bad is 0 < R² < 0.9, and `Meh is 0.9 <= R² < 0.98.
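The thresholds above can be sketched as a small classifier over R². This is an illustration only, not the library's code: `validity_of_r2` is a hypothetical name, and two cases that the text leaves unstated are filled in as assumptions, namely that R² >= 0.98 maps to `Good and that a negative R² is treated as `Bad.

```ocaml
(* Sketch of the documented thresholds, operating on a bare R² value.
   Assumptions: R² >= 0.98 is `Good, and negative R² falls under `Bad. *)
let validity_of_r2 r2 =
  if r2 = 0. then `Undef
  else if r2 < 0.9 then `Bad
  else if r2 < 0.98 then `Meh
  else `Good
```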
val ols : estimator
Ordinary Least Squares is the usual linear regression that minimizes average squared residuals (and maximizes R²).
OLS assumptions are somewhat violated by the benchmarking procedure: the errors have a heavy positive tail (instead of being normally distributed), and their variance grows with the number of iterations (instead of being independent of X). As a result, the estimates are overly affected by noise and tend to have less stability across runs, while the confidence intervals tend to be too narrow to predict this instability.
On the upside, OLS is quick to compute, well known, and readily interpretable.
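For reference, the textbook closed-form OLS fit can be sketched as below. This is not Unmark's implementation: the record `fit` and the functions `mean`, `sum2` and `ols_sketch` are hypothetical names, the fields are assumed to be floats, and the confidence interval (bounds) is left out entirely. The sketch expects at least two distinct xs and non-constant ys.

```ocaml
(* Hypothetical record mirroring the documented fields a, b and r2. *)
type fit = { a : float; b : float; r2 : float }

let mean xs = Array.fold_left ( +. ) 0. xs /. float_of_int (Array.length xs)

(* Sum of [f xs.(i) ys.(i)] over all indices. *)
let sum2 f xs ys =
  let acc = ref 0. in
  Array.iteri (fun i x -> acc := !acc +. f x ys.(i)) xs;
  !acc

(* Closed-form least squares for Y = a + bX, plus R² = 1 - SS_res / SS_tot. *)
let ols_sketch xs ys =
  let mx = mean xs and my = mean ys in
  let sxy = sum2 (fun x y -> (x -. mx) *. (y -. my)) xs ys in
  let sxx = sum2 (fun x _ -> (x -. mx) *. (x -. mx)) xs ys in
  let b = sxy /. sxx in
  let a = my -. b *. mx in
  let ss_res =
    sum2 (fun x y -> let d = y -. (a +. b *. x) in d *. d) xs ys in
  let ss_tot = sum2 (fun _ y -> (y -. my) *. (y -. my)) xs ys in
  { a; b; r2 = 1. -. ss_res /. ss_tot }
```

Because every residual is squared, a single large positive outlier pulls both a and b toward it, which is the sensitivity to heavy-tailed noise described above.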
val tse : estimator
Theil–Sen Estimator is an order-based linear regression.
It provides a robust estimate, making TSE resilient against the skewed benchmark noise and giving it good stability across runs.
Note that by design, TSE does not maximize R². For this reason, R² is not an entirely accurate way of assessing goodness-of-fit for this method, and perfectly good estimates are sometimes flagged as having poor validity. A particular failure case happens with results resembling a step-function, where TSE sometimes (correctly) assigns a vanishingly small b, and a biased a that offsets the line. Such predictions have an error larger than that of the sample mean, resulting in a negative R².
Persistently low R² should be investigated by plotting the results.
TSE is the default estimator in Unmark.
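The behaviour described for TSE can be illustrated with the textbook Theil–Sen construction: the slope is the median of all pairwise slopes, and the intercept is the median of the residual offsets. This is a sketch under stated assumptions, not Unmark's implementation: `line`, `median`, `tse_sketch`, `r2` and the `step_*` values are hypothetical names, the fields are assumed to be floats, and the intercept rule is one common choice among several. The input needs at least two distinct xs.

```ocaml
(* Hypothetical record: only the two line parameters, assumed to be floats. *)
type line = { a : float; b : float }

(* Median of a non-empty float array; the input is left unmodified. *)
let median xs =
  let s = Array.copy xs in
  Array.sort compare s;
  let n = Array.length s in
  if n mod 2 = 1 then s.(n / 2) else (s.((n / 2) - 1) +. s.(n / 2)) /. 2.

(* Textbook Theil-Sen: median of pairwise slopes, then median offset. *)
let tse_sketch xs ys =
  let n = Array.length xs in
  let slopes = ref [] in
  for i = 0 to n - 2 do
    for j = i + 1 to n - 1 do
      (* Pairs with equal x have no defined slope and are skipped. *)
      if xs.(i) <> xs.(j) then
        slopes := ((ys.(j) -. ys.(i)) /. (xs.(j) -. xs.(i))) :: !slopes
    done
  done;
  let b = median (Array.of_list !slopes) in
  let a = median (Array.mapi (fun i x -> ys.(i) -. (b *. x)) xs) in
  { a; b }

(* R² of a given line over the data: 1 - SS_res / SS_tot. *)
let r2 { a; b } xs ys =
  let my = Array.fold_left ( +. ) 0. ys /. float_of_int (Array.length ys) in
  let ss_res = ref 0. and ss_tot = ref 0. in
  Array.iteri
    (fun i x ->
      let d = ys.(i) -. (a +. (b *. x)) and t = ys.(i) -. my in
      ss_res := !ss_res +. (d *. d);
      ss_tot := !ss_tot +. (t *. t))
    xs;
  1. -. (!ss_res /. !ss_tot)

(* Step-like data: seven samples at 0 followed by three samples at 10. *)
let step_xs = Array.init 10 (fun i -> float_of_int (i + 1))
let step_ys = Array.init 10 (fun i -> if i < 7 then 0. else 10.)
let step_fit = tse_sketch step_xs step_ys
let step_r2 = r2 step_fit step_xs step_ys
```

On the step-like data, more than half of the pairwise slopes are zero, so the sketch yields a flat line whose squared error exceeds that of the sample mean, and `step_r2` comes out negative. This is the failure case described above: the fit is a reasonable summary of the dominant level, yet R² flags it as poor.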