Uses of Student's t-distribution
1 Uses
1.1 In frequentist statistical inference
1.1.1 Hypothesis testing
1.1.2 Confidence intervals
1.1.3 Prediction intervals
1.2 In Bayesian statistics
1.3 Robust parametric modeling
Uses
In frequentist statistical inference
Student's t-distribution arises in a variety of statistical estimation problems where the goal is to estimate an unknown parameter, such as a mean value, in a setting where the data are observed with additive errors. If (as in nearly all practical statistical work) the population standard deviation of these errors is unknown and has to be estimated from the data, the t-distribution is used to account for the extra uncertainty that results from this estimation. In most such problems, if the standard deviation of the errors were known, a normal distribution would be used instead of the t-distribution.
Confidence intervals and hypothesis tests are two statistical procedures in which the quantiles of the sampling distribution of a particular statistic (e.g. the standard score) are required. In any situation where this statistic is a linear function of the data, divided by the usual estimate of the standard deviation, the resulting quantity can be rescaled and centered to follow Student's t-distribution. Statistical analyses involving means, weighted means, and regression coefficients all lead to statistics having this form.
Quite often, textbook problems treat the population standard deviation as if it were known and thereby avoid the need to use Student's t-distribution. These problems are generally of two kinds: (1) those in which the sample size is so large that one may treat a data-based estimate of the variance as if it were certain, and (2) those that illustrate mathematical reasoning, in which the problem of estimating the standard deviation is temporarily ignored because that is not the point the author or instructor is explaining.
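The cost of treating an estimated standard deviation as if it were known can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `coverage`, the sample size of 5, the number of trials, and the seed are all hypothetical choices, and the critical value 2.776 is the usual table value for 4 degrees of freedom. It counts how often a nominal 95% interval for the mean contains the true mean when the critical value comes from the normal distribution and when it comes from the t-distribution.

```python
import math
import random
import statistics

def coverage(n, critical, trials=20000, mu=0.0, sigma=1.0, seed=0):
    """Fraction of simulated normal samples of size n for which the interval
    mean +/- critical * s / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with the usual n - 1 divisor.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        if abs(mean - mu) <= critical * s / math.sqrt(n):
            hits += 1
    return hits / trials

Z_975 = statistics.NormalDist().inv_cdf(0.975)  # normal quantile, about 1.96
T_975_DF4 = 2.776  # assumed table value of the t quantile for 4 degrees of freedom

cov_z = coverage(5, Z_975)      # estimated s, but normal critical value
cov_t = coverage(5, T_975_DF4)  # estimated s with the matching t critical value
print(f"normal critical value: {cov_z:.3f}, t critical value: {cov_t:.3f}")
```

With such a small sample the normal-based interval should cover the true mean noticeably less often than the nominal 95% (around 88% in theory), while the t-based interval should come close to 95%.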
Hypothesis testing
A number of statistics can be shown to have t-distributions for samples of moderate size under null hypotheses that are of interest, so that the t-distribution forms the basis for significance tests. For example, the distribution of Spearman's rank correlation coefficient ρ, in the null case (zero correlation), is well approximated by the t-distribution for sample sizes above about 20.
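The Spearman example can be checked empirically. The sketch below assumes the standard transformation t = ρ·sqrt((n − 2)/(1 − ρ²)) with n − 2 degrees of freedom, which the text above does not spell out, and uses the table value 2.069 for 23 degrees of freedom. The function name, sample size, trial count, and seed are hypothetical. Under the null hypothesis the ranks of one variable are a random permutation of the ranks of the other, so the simulation shuffles ranks and counts how often the t-based test rejects at the 5% level.

```python
import math
import random

def spearman_null_rejection_rate(n, critical, trials=20000, seed=0):
    """Estimate how often |t| exceeds `critical` when the true correlation is zero."""
    rng = random.Random(seed)
    ranks = list(range(n))
    rejections = 0
    for _ in range(trials):
        perm = ranks[:]
        rng.shuffle(perm)  # independent variables: ranks are randomly paired
        d2 = sum((i - p) ** 2 for i, p in zip(ranks, perm))
        rho = 1 - 6 * d2 / (n * (n * n - 1))  # Spearman's rho without ties
        if abs(rho) >= 1:
            rejections += 1  # perfect monotone association, t is unbounded
            continue
        t_stat = rho * math.sqrt((n - 2) / (1 - rho * rho))
        if abs(t_stat) > critical:
            rejections += 1
    return rejections / trials

T_975_DF23 = 2.069  # assumed table value for n - 2 = 23 degrees of freedom
rate = spearman_null_rejection_rate(25, T_975_DF23)
print(f"empirical rejection rate at nominal 5%: {rate:.3f}")
```

If the approximation is adequate at n = 25, the printed rate should be close to 0.05.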
Confidence intervals
Suppose the number A is chosen so that
$$\Pr(-A < T < A) = 0.9,$$
when T has a t-distribution with n − 1 degrees of freedom. By symmetry, this is the same as saying that A satisfies
$$\Pr(T < A) = 0.95,$$
so A is the 95th percentile of this probability distribution, or
$$A = t_{(0.05,\,n-1)}.$$
Then, with $\overline{X}_n$ the sample mean and $S_n$ the sample standard deviation of n observations,
$$\Pr\left(-A < \frac{\overline{X}_n - \mu}{S_n/\sqrt{n}} < A\right) = 0.9,$$
and this is equivalent to
$$\Pr\left(\overline{X}_n - A\,\frac{S_n}{\sqrt{n}} < \mu < \overline{X}_n + A\,\frac{S_n}{\sqrt{n}}\right) = 0.9.$$
Therefore, the interval whose endpoints are
$$\overline{X}_n \pm A\,\frac{S_n}{\sqrt{n}}$$
is a 90% confidence interval for μ. Consequently, if we find the mean of a set of observations that we can reasonably expect to have a normal distribution, we can use the t-distribution to examine whether the confidence limits on that mean include some theoretically predicted value, such as the value predicted on a null hypothesis.
It is this result that is used in Student's t-tests: since the difference between the means of samples from two normal distributions is itself distributed normally, the t-distribution can be used to examine whether that difference can reasonably be supposed to be zero.
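The 90% interval for μ derived above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the helper names (`t_pdf`, `t_cdf`, `t_quantile`, `mean_confidence_interval`) and the sample data are hypothetical, and the quantile is obtained by Simpson integration of the density followed by bisection only so that the example needs nothing beyond the standard library.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, panels=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    if x == 0:
        return 0.5
    h = abs(x) / panels  # panels must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Invert the CDF by bisection (assumes the quantile lies in [-100, 100])."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_confidence_interval(data, level=0.90):
    """Two-sided interval: mean +/- A * s / sqrt(n), A a quantile with n - 1 df."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    a = t_quantile(1 - (1 - level) / 2, n - 1)
    half_width = a * s / math.sqrt(n)
    return mean - half_width, mean + half_width

data = [4.9, 5.1, 5.0, 5.2, 4.8, 5.3, 4.7, 5.0, 5.1, 4.9]  # hypothetical measurements
low, high = mean_confidence_interval(data, level=0.90)
print(f"90% confidence interval for the mean: ({low:.3f}, {high:.3f})")
```

A hypothesized value of μ that falls outside the printed interval would be rejected by the corresponding two-sided t-test at the 10% level.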
If the data are normally distributed, the one-sided (1 − α) upper confidence limit (UCL) of the mean can be calculated using the following equation:
$$\mathrm{UCL}_{1-\alpha} = \overline{X}_n + t_{\alpha,\,n-1}\,\frac{S_n}{\sqrt{n}}.$$
The resulting UCL is the greatest average value that will occur for a given confidence level and sample size. In other words, with $\overline{X}_n$ being the mean of the set of observations, the probability that the mean of the distribution is less than $\mathrm{UCL}_{1-\alpha}$ is equal to the confidence level 1 − α.
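As a worked illustration of the upper confidence limit, take purely hypothetical values: 16 observations with sample mean 10 and sample standard deviation 2, and a significance level of 0.05 (a 95% upper limit). The table value of the upper 5% point of the t-distribution with 15 degrees of freedom is approximately 1.753, so

$$\mathrm{UCL}_{0.95} = 10 + 1.753 \times \frac{2}{\sqrt{16}} = 10 + 0.8765 \approx 10.88.$$

Under the normality assumption, the true mean lies below about 10.88 with 95% confidence.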
Prediction intervals
The t-distribution can be used to construct a prediction interval for an unobserved sample from a normal distribution with unknown mean and variance.
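The construction can be written out explicitly. The notation below is an assumption made for illustration and is not fixed by the text above: the sample mean and sample standard deviation of n observations are written with a subscript n, and the next, unobserved value carries the index n + 1. For independent normal observations, the standardized difference between the new value and the sample mean follows a t-distribution with n − 1 degrees of freedom:

$$\frac{X_{n+1} - \overline{X}_n}{S_n\sqrt{1 + 1/n}} \sim t_{n-1},$$

so a prediction interval with coverage 1 − α for the new observation is

$$\overline{X}_n \pm t_{\alpha/2,\,n-1}\, S_n \sqrt{1 + \frac{1}{n}},$$

where the multiplier is the upper α/2 point of the t-distribution with n − 1 degrees of freedom. The extra 1 under the square root reflects the variability of the new observation itself, which is why a prediction interval is wider than the confidence interval for the mean.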
In Bayesian statistics
Student's t-distribution, especially in its three-parameter (location-scale) version, arises frequently in Bayesian statistics as a result of its connection with the normal distribution. Whenever the variance of a normally distributed random variable is unknown and a conjugate prior placed over it follows an inverse gamma distribution, the resulting marginal distribution of the variable follows a Student's t-distribution. Equivalent constructions with the same results involve a conjugate scaled-inverse-chi-squared distribution over the variance, or a conjugate gamma distribution over the precision. If an improper prior proportional to $\sigma^{-2}$ is placed over the variance, the t-distribution also arises. This is the case regardless of whether the mean of the normally distributed variable is known, is unknown and distributed according to a conjugate normally distributed prior, or is unknown and distributed according to an improper constant prior.
Related situations that also produce a t-distribution are:
The marginal posterior distribution of the unknown mean of a normally distributed variable, with unknown prior mean and variance following the above model.
The prior predictive distribution and posterior predictive distribution of a new normally distributed data point when a series of independent identically distributed normally distributed data points have been observed, with prior mean and variance as in the above model.
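The connection between an unknown variance and the t-distribution can be illustrated by simulation. The sketch below assumes a specific parameterization that the text above does not fix: the precision is drawn from a gamma distribution with shape ν/2 and rate ν/2 (equivalently, the variance is inverse gamma with the same parameters), which yields a standard t-distribution with ν degrees of freedom. The function name, ν = 5, sample size, and seed are hypothetical, and 2.571 is the usual table value for 5 degrees of freedom.

```python
import math
import random
import statistics

def sample_marginal(nu, size, seed=0):
    """Draw x | precision ~ Normal(0, 1/precision) with a gamma-distributed precision."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        # random.gammavariate takes (shape, scale); scale 2/nu means rate nu/2.
        precision = rng.gammavariate(nu / 2, 2 / nu)
        draws.append(rng.gauss(0.0, 1.0) / math.sqrt(precision))
    return draws

NU = 5
T_975_DF5 = 2.571  # assumed table value: upper 2.5% point of t with 5 df

draws = sample_marginal(NU, 100_000)
tail_fraction = sum(abs(x) > T_975_DF5 for x in draws) / len(draws)
normal_tail = 2 * (1 - statistics.NormalDist().cdf(T_975_DF5))

print(f"simulated two-sided tail mass beyond 2.571: {tail_fraction:.4f}")
print(f"same tail mass under a standard normal:     {normal_tail:.4f}")
```

If the marginal distribution is indeed t with 5 degrees of freedom, the simulated tail mass should be close to 0.05, several times larger than the corresponding normal tail.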
Robust parametric modeling
The t-distribution is often used as an alternative to the normal distribution as a model for data, which often has heavier tails than the normal distribution allows for; see e.g. Lange et al. The classical approach was to identify outliers and exclude or downweight them in some way. However, it is not always easy to identify outliers (especially in high dimensions), and the t-distribution is a natural choice of model for such data and provides a parametric approach to robust statistics.
A Bayesian account can be found in Gelman et al. The degrees of freedom parameter controls the kurtosis of the distribution and is correlated with the scale parameter. The likelihood can have multiple local maxima and, as such, it is often necessary to fix the degrees of freedom at a fairly low value and estimate the other parameters taking this as given. Some authors report that values between 3 and 9 are often good choices. Venables and Ripley suggest that a value of 5 is often a good choice.
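Fitting location and scale with the degrees of freedom held fixed can be sketched with a simple EM-style iteration. This is a minimal illustration under stated assumptions, not the procedure of any of the cited authors: the function name, the starting values, the iteration count, and the data (eight readings near 10 plus one gross outlier) are hypothetical, and the degrees of freedom are fixed at 5 following the suggestion quoted above. Each observation receives a weight that shrinks as it moves away from the current location estimate, so outliers are downweighted automatically.

```python
import math
import statistics

def fit_t_location_scale(data, df=5.0, iterations=200):
    """Estimate location and scale of a t model with fixed degrees of freedom.

    Assumes the data are not all identical. Because the likelihood may have
    several local maxima, the result can depend on the starting values.
    """
    n = len(data)
    mu = statistics.fmean(data)          # start from the ordinary mean
    sigma2 = statistics.pvariance(data)  # and the ordinary variance
    for _ in range(iterations):
        # Weights are small for points far from mu relative to the scale.
        weights = [(df + 1) / (df + (x - mu) ** 2 / sigma2) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        sigma2 = sum(w * (x - mu) ** 2 for w, x in zip(weights, data)) / n
    return mu, math.sqrt(sigma2)

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0]  # hypothetical, one outlier
mu_t, sigma_t = fit_t_location_scale(data, df=5.0)
print(f"ordinary mean: {statistics.fmean(data):.2f}, t-model location: {mu_t:.2f}")
```

On data like these, the ordinary mean is pulled well above 10 by the single outlier, whereas the t-model location should stay close to the bulk of the observations.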