Likelihood function

For statistical inference using likelihood functions, see maximum-likelihood estimation and likelihood-ratio testing.

In statistics, a likelihood function (often simply the likelihood) is a function of the parameters of a statistical model given data. Likelihood functions play a key role in statistical inference, especially methods of estimating a parameter from a set of statistics. In informal contexts, "likelihood" is often used as a synonym for "probability." In statistics, a distinction is made depending on the roles of outcomes vs. parameters. Probability is used before data are available to describe possible future outcomes given a fixed value for the parameter (or parameter vector). Likelihood is used after data are available to describe a function of a parameter (or parameter vector) for a given outcome.

Definition

The likelihood of a set of parameter values, θ, given outcomes x, is equal to the probability of those observed outcomes given those parameter values, that is

\mathcal{L}(\theta |x) = P(x | \theta)

The likelihood function is defined differently for discrete and continuous probability distributions.

Discrete probability distribution

Let X be a random variable with a discrete probability distribution p depending on a parameter θ. Then the function

{\mathcal {L}}(\theta |x)=p_{\theta }(x)=P_{\theta }(X=x)

considered as a function of θ, is called the likelihood function (of θ, given the outcome x of the random variable X). The probability of the value x of X for the parameter value θ is sometimes written as $P(X=x|\theta)$, but it is often written as $P(X=x;\theta)$ to emphasize that it differs from $\mathcal{L}(\theta |x)$, which is not a conditional probability, because θ is a parameter and not a random variable.
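As an illustration of the discrete case, the following minimal sketch evaluates a binomial likelihood on a grid of θ values. The binomial model, the data (7 successes in 10 trials) and the function name binomial_likelihood are assumptions chosen for the example, not part of the definition above. It also shows that the same expression sums to one over outcomes x for fixed θ, but need not integrate to one over θ for fixed x.

```python
from math import comb

def binomial_likelihood(theta, k, n):
    """L(theta | k) = P_theta(X = k) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Hypothetical data: k = 7 successes in n = 10 trials.
k, n = 7, 10
step = 0.01
grid = [i / 100 for i in range(101)]            # theta = 0.00, 0.01, ..., 1.00
likelihoods = [binomial_likelihood(t, k, n) for t in grid]
theta_best = grid[likelihoods.index(max(likelihoods))]

# Fixed theta, varying outcome x: a probability distribution, so the sum is 1.
total_over_x = sum(binomial_likelihood(0.3, x, n) for x in range(n + 1))

# Fixed outcome, varying theta: a likelihood function; its area is not 1.
# (Trapezoid rule; the endpoint values are both zero here.)
area_over_theta = sum(likelihoods) * step

print(theta_best, total_over_x, area_over_theta)
```

On this grid the likelihood peaks at the observed proportion k/n, while the area under the likelihood curve is well below one.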

Continuous probability distribution

Let X be a random variable following an absolutely continuous probability distribution with density function f depending on a parameter θ. Then the function

\mathcal{L}(\theta |x) = f_{\theta} (x), \,

considered as a function of θ, is called the likelihood function (of θ, given the outcome x of X). Sometimes the density function for the value x of X for the parameter value θ is written as $f(x|\theta )$ ; this should not be confused with $\mathcal{L}(\theta |x)$ which should not be considered a conditional probability density.
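A small sketch of the continuous case follows, assuming an exponential model with rate θ, so that f_θ(x) = θ e^{−θx}. The model, the observed value and the name exp_density are illustrative assumptions. The example evaluates the density at a fixed observation as a function of θ, and also shows that a density value can exceed 1, which is one reason the likelihood should not be read as a probability.

```python
from math import exp

def exp_density(x, theta):
    """Density of an exponential distribution with rate theta, evaluated at x."""
    return theta * exp(-theta * x) if x >= 0 else 0.0

x_obs = 2.0                                   # hypothetical single observation
grid = [i / 1000 for i in range(1, 5001)]     # theta in (0, 5]
lik = [exp_density(x_obs, t) for t in grid]   # likelihood: x fixed, theta varies
theta_hat = grid[lik.index(max(lik))]         # grid maximizer, close to 1 / x_obs

# A density is not a probability: its value can be larger than 1.
density_above_one = exp_density(0.01, 10.0)

print(theta_hat, density_above_one)
```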

In general

In measure-theoretic probability theory, the density function is defined as the Radon–Nikodym derivative of the probability distribution relative to a dominating measure. This provides a likelihood function for any probability model whose distributions are all dominated by that measure, whether they are discrete, absolutely continuous, a mixture or something else. (Likelihoods will be comparable, e.g., for parameter estimation, only if they are Radon–Nikodym derivatives with respect to the same dominating measure.)

Log-likelihood

For many applications, the natural logarithm of the likelihood function, called the log-likelihood, is more convenient to work with. Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum value at the same points as the function itself, and hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation and related techniques. Finding the maximum of a function often involves taking the derivative of a function and solving for the parameter being maximized, and this is often easier when the function being maximized is a log-likelihood rather than the original likelihood function.

For example, some likelihood functions are for the parameters that explain a collection of statistically independent observations. In such a situation, the likelihood function factors into a product of individual likelihood functions. The logarithm of this product is a sum of individual logarithms, and the derivative of a sum of terms is often easier to compute than the derivative of a product. In addition, several common distributions have likelihood functions that contain products of factors involving exponentiation. The logarithm of such a function is a sum of products, again easier to differentiate than the original function.
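The factorization described above can be sketched numerically. The example below assumes independent Poisson observations with rate lam; the model, the data and the function names are assumptions made for illustration. It computes the likelihood as a product and the log-likelihood as a sum, and checks on a grid that both are maximized at the same parameter value.

```python
from math import exp, factorial, lgamma, log

def poisson_likelihood(lam, xs):
    """Product of individual Poisson probabilities (independent observations)."""
    value = 1.0
    for x in xs:
        value *= exp(-lam) * lam**x / factorial(x)
    return value

def poisson_loglik(lam, xs):
    """Sum of individual log-probabilities; lgamma(x + 1) equals log(x!)."""
    return sum(-lam + x * log(lam) - lgamma(x + 1) for x in xs)

xs = [2, 3, 1, 4, 2]                              # hypothetical counts
grid = [i / 100 for i in range(1, 1001)]          # lam = 0.01, ..., 10.00
lam_from_product = max(grid, key=lambda lam: poisson_likelihood(lam, xs))
lam_from_sum = max(grid, key=lambda lam: poisson_loglik(lam, xs))
sample_mean = sum(xs) / len(xs)

print(lam_from_product, lam_from_sum, sample_mean)
```

For this model the maximizer coincides with the sample mean, which follows from setting the derivative of the sum to zero.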

Edwards ^[1] established the axiomatic basis for use of the log-likelihood ratio as a measure of relative support for one hypothesis against another. The support function is then the natural logarithm of the likelihood function. Both terms are used in phylogenetics but were not adopted in a general treatment of the topic of statistical evidence.^[2]

Example: the gamma distribution

The gamma distribution has two parameters α and β. The likelihood function is

\mathcal{L} (\alpha, \beta \,|\, x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}

Finding the maximum likelihood estimate of β for a single observed value x by working with this expression directly looks rather daunting. The logarithm of the likelihood is much simpler to work with:

\log \mathcal{L}(\alpha,\beta \,|\, x) = \alpha \log \beta - \log \Gamma(\alpha) + (\alpha-1) \log x - \beta x. \,

Maximizing the log-likelihood first requires taking the partial derivative with respect to β:

\frac{\partial \log \mathcal{L}(\alpha,\beta \,|\, x)}{\partial \beta} = \frac{\alpha}{\beta} - x

If there are a number of independent observations x₁, ..., x_n, then the joint log-likelihood will be the sum of individual log-likelihoods, and the derivative of this sum will be a sum of derivatives of each individual log-likelihood:

\frac{\partial \log \mathcal{L}(\alpha,\beta \,|\, x_1, \ldots, x_n)}{\partial \beta} = \frac{\partial \log \mathcal{L}(\alpha,\beta \,|\, x_1)}{\partial \beta} + \cdots + \frac{\partial \log \mathcal{L}(\alpha,\beta \,|\, x_n)}{\partial \beta} = \frac{n \alpha}{\beta} - \sum_{i=1}^n x_i.

To complete the maximization procedure for the joint log-likelihood, the equation is set to zero and solved for β:

\hat\beta = \frac{\alpha}{\bar{x}}.

Here ${\hat {\beta }}$ denotes the maximum-likelihood estimate, and $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$ is the sample mean of the observations.
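The closed-form estimate can be checked numerically. The sketch below treats α as known, simulates data with Python's random.gammavariate (whose second argument is a scale, so the rate β enters as 1/β), and verifies that the derivative of the joint log-likelihood vanishes at α divided by the sample mean. The seed, sample size and parameter values are arbitrary assumptions.

```python
import random
from math import lgamma, log

def gamma_loglik(alpha, beta, xs):
    """Joint log-likelihood of independent gamma observations (rate beta)."""
    n = len(xs)
    return (n * alpha * log(beta) - n * lgamma(alpha)
            + (alpha - 1) * sum(log(x) for x in xs) - beta * sum(xs))

def score_beta(alpha, beta, xs):
    """Partial derivative of the joint log-likelihood with respect to beta."""
    return len(xs) * alpha / beta - sum(xs)

random.seed(0)
alpha, true_beta = 3.0, 2.0                       # assumed values for the simulation
xs = [random.gammavariate(alpha, 1.0 / true_beta) for _ in range(1000)]

x_bar = sum(xs) / len(xs)
beta_hat = alpha / x_bar                          # closed-form maximizer for known alpha

print(beta_hat, score_beta(alpha, beta_hat, xs))
```

The estimate depends on the data only through the sample mean, as the formula indicates.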

Likelihood function of a parameterized model

Among many applications, we consider here one of broad theoretical and practical importance. Given a parameterized family of probability density functions (or probability mass functions in the case of discrete distributions)

x\mapsto f(x\mid\theta), \!

where θ is the parameter, the likelihood function is

\theta\mapsto f(x\mid\theta), \!

written

\mathcal{L}(\theta \mid x)=f(x\mid\theta), \!

where x is the observed outcome of an experiment. In other words, when f(x | θ) is viewed as a function of x with θ fixed, it is a probability density function, and when viewed as a function of θ with x fixed, it is a likelihood function.

This is not the same as the probability that those parameters are the right ones, given the observed sample. Attempting to interpret the likelihood of a hypothesis given observed evidence as the probability of the hypothesis is a common error, with potentially disastrous consequences in medicine, engineering or jurisprudence. See prosecutor's fallacy for an example of this.

From a geometric standpoint, if we consider f(x, θ) as a function of two variables, then the family of probability distributions can be viewed as a family of curves parallel to the x-axis, while the family of likelihood functions is the family of orthogonal curves parallel to the θ-axis.

Likelihoods for continuous distributions

The use of the probability density instead of a probability in specifying the likelihood function above is justified as follows. The likelihood that an observation $x$ lies in the interval $[x_{j},x_{j}+h]$, where $x_{j}$ is a specific observed value and $h>0$ a constant, is given by ${\mathcal {L}}(\theta |x\in [x_{j},x_{j}+h])$. Observe that $\arg \max _{\theta }{\mathcal {L}}(\theta |x\in [x_{j},x_{j}+h])=\arg \max _{\theta }{\frac {1}{h}}{\mathcal {L}}(\theta |x\in [x_{j},x_{j}+h])$, since $h$ is positive and constant. Because $\arg \max _{\theta }{\frac {1}{h}}{\mathcal {L}}(\theta |x\in [x_{j},x_{j}+h])=\arg \max _{\theta }{\frac {1}{h}}\mathrm {Pr} (x_{j}\leq x\leq x_{j}+h|\theta )=\arg \max _{\theta }{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x|\theta )\,dx$, where $f(x|\theta )$ is the probability density function of the variable $x$, it follows that $\arg \max _{\theta }{\mathcal {L}}(\theta |x\in [x_{j},x_{j}+h])=\arg \max _{\theta }{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x|\theta )\,dx$. The first fundamental theorem of calculus and l'Hôpital's rule together provide that $\lim _{h\to 0^{+}}{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x|\theta )\,dx=\lim _{h\to 0^{+}}{\frac {{\frac {d}{dh}}\int _{x_{j}}^{x_{j}+h}f(x|\theta )\,dx}{\frac {dh}{dh}}}=\lim _{h\to 0^{+}}{\frac {f(x_{j}+h|\theta )}{1}}=f(x_{j}|\theta )$. The factor $1/h$ must be kept inside the limit, because without it the interval probability tends to zero for every θ. Then, $\arg \max _{\theta }{\mathcal {L}}(\theta |x_{j})=\arg \max _{\theta }\left[\lim _{h\to 0^{+}}{\frac {1}{h}}{\mathcal {L}}(\theta |x\in [x_{j},x_{j}+h])\right]=\arg \max _{\theta }\left[\lim _{h\to 0^{+}}{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x|\theta )\,dx\right]=\arg \max _{\theta }f(x_{j}|\theta )$. Therefore,

\arg \max _{\theta }{\mathcal {L}}(\theta |x_{j})=\arg \max _{\theta }f(x_{j}|\theta )\!

and so maximizing the probability density at $x_{j}$ amounts to maximizing the likelihood of the specific observation $x_{j}$ .
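The limiting argument can be illustrated numerically. The sketch below assumes a normal model with unknown mean θ and unit variance (an assumption made only for this example) and uses statistics.NormalDist from the Python standard library. It shows the scaled interval probability approaching the density at x_j as h shrinks, and the maximizing θ moving from the interval midpoint towards x_j.

```python
from statistics import NormalDist

def scaled_interval_likelihood(theta, x_j, h):
    """(1/h) * Pr(x_j <= X <= x_j + h | theta) for X ~ Normal(theta, 1)."""
    dist = NormalDist(mu=theta, sigma=1.0)
    return (dist.cdf(x_j + h) - dist.cdf(x_j)) / h

x_j = 1.0        # hypothetical observed value
theta = 0.3      # an arbitrary parameter value at which to compare
density_at_xj = NormalDist(mu=theta, sigma=1.0).pdf(x_j)

# The gap to f(x_j | theta) shrinks as h tends to zero from above.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    gap = abs(scaled_interval_likelihood(theta, x_j, h) - density_at_xj)
    print(f"h = {h:g}: gap = {gap:.2e}")

# Grid search over theta for a wide and a narrow interval.
thetas = [-2 + i / 100 for i in range(601)]       # theta = -2.00, ..., 4.00

def argmax_theta(h):
    values = [scaled_interval_likelihood(t, x_j, h) for t in thetas]
    return thetas[values.index(max(values))]

theta_hat_wide = argmax_theta(0.5)      # maximizer sits at the interval midpoint
theta_hat_narrow = argmax_theta(1e-3)   # maximizer is (on this grid) x_j itself
print(theta_hat_wide, theta_hat_narrow)
```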

Likelihoods for mixed continuous–discrete distributions

The above can be extended in a simple way to allow consideration of distributions which contain both discrete and continuous components. Suppose that the distribution consists of a number of discrete probability masses p_k(θ) and a density f(x | θ), where the sum of all the p's added to the integral of f is always one. Assuming that it is possible to distinguish an observation corresponding to one of the discrete probability masses from one which corresponds to the density component, the likelihood function for an observation from the continuous component can be dealt with in the manner shown above. For an observation from the discrete component, the likelihood function is simply

\mathcal{L}(\theta \mid x )= p_k(\theta), \!

where k is the index of the discrete probability mass corresponding to observation x, because maximizing the probability mass (or probability) at x amounts to maximizing the likelihood of the specific observation.

The fact that the likelihood function can be defined in a way that includes contributions that are not commensurate (the density and the probability mass) arises from the way in which the likelihood function is defined up to a constant of proportionality, where this "constant" can change with the observation x, but not with the parameter θ.
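A minimal sketch of a mixed likelihood follows. It assumes a hypothetical model with a single probability mass p0 at x = 0 and, on x > 0, the sub-density (1 − p0)·λ·e^{−λx}, so that the mass and the integral of the density add to one. The model, data and function name are illustrative assumptions. Each observation contributes either a probability mass or a density value to the log-likelihood.

```python
from math import log

def mixed_loglik(p0, lam, xs):
    """Log-likelihood for a point mass p0 at 0 plus an exponential part on x > 0."""
    total = 0.0
    for x in xs:
        if x == 0:
            total += log(p0)                        # discrete component: probability mass
        else:
            total += log((1 - p0) * lam) - lam * x  # continuous component: density
    return total

xs = [0, 0, 1.2, 0.4, 0, 2.5, 0.9, 0]               # hypothetical observations
zeros = sum(1 for x in xs if x == 0)
positives = [x for x in xs if x > 0]

# For this particular model the maximizers have closed forms.
p0_hat = zeros / len(xs)
lam_hat = len(positives) / sum(positives)

print(p0_hat, lam_hat, mixed_loglik(p0_hat, lam_hat, xs))
```

The two kinds of contribution are on different scales, but since neither scale depends on the parameters, the location of the maximum is unaffected.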

Example 1

Figure: The likelihood function for estimating the probability of a coin landing heads-up, without prior knowledge, after observing HH.

Figure: The likelihood function for estimating the probability of a coin landing heads-up, without prior knowledge, after observing HHT.

Let $p_\text{H}$ be the probability that a certain coin lands heads up (H) when tossed. So, the probability of getting two heads in two tosses (HH) is $p_\text{H}^2$ . If $p_\text{H} = 0.5$ , then the probability of seeing two heads is 0.25:

P(\text{HH} | p_\text{H}=0.5) = 0.25.

With this, we can say that the likelihood that $p_\text{H} = 0.5$ , given the observation HH, is 0.25, that is

\mathcal{L}(p_\text{H}=0.5 | \text{HH}) = P(\text{HH} | p_\text{H}=0.5) = 0.25.

But this is not the same as saying that the probability that $p_\text{H} = 0.5$, given the observation HH, is 0.25. For that, we need concepts from Bayesian inference. In particular, Bayes' theorem says that the posterior probability (density) is proportional to the likelihood times the prior probability. When tossing a physical coin, the probability is zero that $p_\text{H}$ is exactly 0.5, because any physical device has imperfections. The edges of any coin will be slightly beveled, and the mass distribution will never be perfectly uniform. This will generate a distribution for $p_\text{H}$. Moreover, the features on the coin generate slight imbalances, suggesting that even the average of this distribution will likely not be exactly 0.5. However, it might be hard to find a coin that is demonstrably not fair, i.e., for which $p_\text{H}$ is clearly greater than (or less than) 0.5.
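The coin example can be reproduced with a short sketch. The likelihood of p_H after observing a sequence with a given number of heads and tails is p_H^heads · (1 − p_H)^tails. The grid resolution and the uniform prior used for the posterior density at the end are assumptions made for illustration.

```python
def likelihood(p_h, heads, tails):
    """Likelihood of p_h given an observed sequence with these counts."""
    return p_h**heads * (1 - p_h)**tails

grid = [i / 300 for i in range(301)]              # p_H = 0, 1/300, ..., 1
lik_hh = [likelihood(p, 2, 0) for p in grid]      # after observing HH
lik_hht = [likelihood(p, 2, 1) for p in grid]     # after observing HHT

mle_hh = grid[lik_hh.index(max(lik_hh))]          # maximum at p_H = 1
mle_hht = grid[lik_hht.index(max(lik_hht))]       # maximum at p_H = 2/3

# Likelihood of p_H = 0.5 given HH, as in the text.
lik_half = likelihood(0.5, 2, 0)

# Assuming a uniform prior on [0, 1], the posterior density is the likelihood
# divided by its integral (trapezoid rule on the grid).
step = 1 / 300
area = step * (sum(lik_hh) - 0.5 * (lik_hh[0] + lik_hh[-1]))
posterior_density_half = lik_half / area

print(mle_hh, mle_hht, lik_half, posterior_density_half)
```

The likelihood value 0.25 and the posterior density at 0.5 differ, which is the distinction the paragraph above draws.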

Example 2

Main article: German tank problem

Consider a jar containing N lottery tickets numbered from 1 through N. If you pick a ticket randomly then you get a positive integer n, with probability 1/N if n ≤ N and with probability zero if n > N. This can be written

P(n|N)= \frac{[n \le N]}{N}

where the Iverson bracket [n ≤ N] is 1 when n ≤ N and 0 otherwise. When considered a function of n for fixed N this is the probability distribution, but when considered a function of N for fixed n this is a likelihood function. The maximum likelihood estimate for N is N₀ = n (by contrast, the unbiased estimate is 2n − 1).

This likelihood function is not a probability distribution, because the total

\sum_{N=1}^\infty P(n|N) = \sum_{N} \frac{[N \ge n]}{N} = \sum_{N=n}^\infty \frac{1}{N}

is a divergent series.

Suppose, however, that you pick two tickets rather than one.

The probability of the outcome {n₁, n₂}, where n₁ < n₂, is

P(\{n_1,n_2\}|N)= \frac{[n_2 \le N]}{\binom N 2} .

When considered a function of N for fixed n₂, this is a likelihood function. The maximum likelihood estimate for N is N₀ = n₂.

This time the total

\sum_{N=1}^\infty P(\{n_1,n_2\}|N) = \sum_{N} \frac{[N\ge n_2]}{\binom N 2} =\frac 2 {n_2-1}

is a convergent series, and so this likelihood function can be normalized into a probability distribution.

If you pick 3 or more tickets, the likelihood function has a well defined mean value, which is larger than the maximum likelihood estimate. If you pick 4 or more tickets, the likelihood function has a well defined standard deviation too.

With 2 or more tickets, the probability distributions just derived match the results from a Bayesian analysis assuming an improper, uniform prior for N over all positive integers. The use of improper priors is often justified by saying that the information from the data dominates the information from the prior. If only a very few tickets are available, and a precise answer is important, this can justify the work of collecting relevant information from other sources to use as an informative prior.
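The divergence for one ticket and the convergence for two tickets can be checked with partial sums. In the sketch below, the observed ticket number 10 and the truncation points are arbitrary assumptions; the closed form 2/(n₂ − 1) is the one stated above.

```python
from math import comb

def likelihood_one_ticket(N, n):
    """P(n | N) with the Iverson bracket [n <= N]."""
    return 1 / N if N >= n else 0.0

def likelihood_two_tickets(N, n2):
    """P({n1, n2} | N) for n1 < n2; n2 >= 2, so comb(N, 2) > 0 whenever N >= n2."""
    return 1 / comb(N, 2) if N >= n2 else 0.0

def partial_sum(likelihood, observed, upper):
    """Sum of the likelihood over N = 1, ..., upper."""
    return sum(likelihood(N, observed) for N in range(1, upper + 1))

n = n2 = 10                                        # hypothetical largest ticket seen

# One ticket: partial sums keep growing (harmonic tail).
one_small = partial_sum(likelihood_one_ticket, n, 1_000)
one_large = partial_sum(likelihood_one_ticket, n, 100_000)

# Two tickets: partial sums approach 2 / (n2 - 1).
two_large = partial_sum(likelihood_two_tickets, n2, 100_000)
closed_form = 2 / (n2 - 1)

print(one_small, one_large, two_large, closed_form)
```

Dividing the two-ticket likelihood by closed_form gives the normalized distribution over N mentioned above.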

Relative likelihood

Relative likelihood function

Suppose that the maximum likelihood estimate for θ is ${\hat {\theta }}$ . Relative plausibilities of other θ values may be found by comparing the likelihood of those other values with the likelihood of ${\hat {\theta }}$ . The relative likelihood of θ is defined^[3]^[4] as $\mathcal{L}(\theta | x)/\mathcal{L}(\hat \theta | x).$

A 10% likelihood region for θ is

\{\theta : \mathcal{L}(\theta | x)/\mathcal{L}(\hat \theta | x) \ge 0.10\},

and more generally, a p% likelihood region for θ is defined^[3]^[4] to be

\{\theta : \mathcal{L}(\theta | x)/\mathcal{L}(\hat \theta | x) \ge p/100 \}.

If θ is a single real parameter, a p% likelihood region will typically comprise an interval of real values. In that case, the region is called a likelihood interval.^[3]^[4]^[5]

Likelihood intervals can be compared to confidence intervals. If θ is a single real parameter, then under certain conditions, a 14.7% likelihood interval for θ will be the same as a 95% confidence interval.^[3] In a slightly different formulation suited to the use of log-likelihoods (see Wilks' theorem), the test-statistic is twice the difference in log-likelihoods and the probability distribution of the test statistic is approximately a chi-squared distribution with degrees-of-freedom (df) equal to the difference in df's between the two models (therefore, the e⁻² likelihood interval is the same as the 0.954 confidence interval; assuming difference in df's to be 1).^[5]
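The two numerical correspondences quoted above can be reproduced from the chi-squared approximation with one degree of freedom, using the fact that such a variable is the square of a standard normal variable. The sketch below uses statistics.NormalDist from the Python standard library; the variable names are illustrative.

```python
from math import exp
from statistics import NormalDist

std_normal = NormalDist()

# 95% confidence: the 0.95 quantile of chi-squared with 1 df is z**2,
# where z is the two-sided 95% standard normal quantile (about 1.96).
z = std_normal.inv_cdf(0.975)
chi2_95 = z * z

# Twice the log-likelihood difference equals chi2_95 at the interval boundary,
# so the relative likelihood there is exp(-chi2_95 / 2), about 0.147.
threshold_95 = exp(-chi2_95 / 2)

# Conversely, a relative-likelihood cutoff of e^-2 corresponds to a test
# statistic of 4, i.e. |z| = 2, with coverage of about 0.954.
coverage_e2 = 2 * std_normal.cdf(2.0) - 1

print(threshold_95, coverage_e2)
```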

The idea of basing an interval estimate on the relative likelihood goes back to Fisher in 1956 and has been used by many authors since then.^[5] A likelihood interval can be used without claiming any particular coverage probability; as such, it differs from confidence intervals.

Relative likelihood of models

The definition of relative likelihood can be generalized to compare different statistical models. This generalization is based on AIC (Akaike information criterion), or sometimes AICc (Akaike Information Criterion with correction).

Suppose that, for some dataset, we have two statistical models, M₁ and M₂. Also suppose that AIC(M₁) ≤ AIC(M₂). Then the relative likelihood of M₂ with respect to M₁ is defined^[6] to be

exp((AIC(M₁)−AIC(M₂))/2)

To see that this is a generalization of the earlier definition, suppose that we have some model M with a (possibly multivariate) parameter θ. Then for any θ, set M₂ = M(θ), and also set M₁ = M( ${\hat {\theta }}$ ). The general definition now gives the same result as the earlier definition.
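The definition can be written as a short function. The sketch below uses the standard form AIC = 2k − 2 log L, where k is the number of parameters; the function names and the numerical values are assumptions for illustration. The second example assumes both models are assigned the same parameter count, in which case the expression reduces to the likelihood ratio of the earlier definition.

```python
from math import exp

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood

def relative_likelihood(aic_best, aic_other):
    """Relative likelihood of the other model, where aic_best <= aic_other."""
    return exp((aic_best - aic_other) / 2)

# Hypothetical AIC values for two models.
rel = relative_likelihood(100.0, 102.0)            # equals exp(-1)

# Same parameter count: the result is L(theta) / L(theta_hat).
loglik_hat, loglik_theta, k = -10.0, -12.0, 3
rel_same_k = relative_likelihood(aic(loglik_hat, k), aic(loglik_theta, k))
likelihood_ratio = exp(loglik_theta - loglik_hat)

print(rel, rel_same_k, likelihood_ratio)
```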

Likelihoods that eliminate nuisance parameters

In many cases, the likelihood is a function of more than one parameter but interest focuses on the estimation of only one, or at most a few of them, with the others being considered as nuisance parameters. Several alternative approaches have been developed to eliminate such nuisance parameters so that a likelihood can be written as a function of only the parameter (or parameters) of interest; the main approaches being marginal, conditional and profile likelihoods.^[7]^[8]

These approaches are useful because standard likelihood methods can become unreliable or fail entirely when there are many nuisance parameters or when the nuisance parameters are high-dimensional. This is particularly true when the nuisance parameters can be considered to be "missing data"; they represent a non-negligible fraction of the number of observations and this fraction does not decrease when the sample size increases. Often these approaches can be used to derive closed-form formulae for statistical tests when direct use of maximum likelihood requires iterative numerical methods. These approaches find application in some specialized topics such as sequential analysis.

Conditional likelihood

Sometimes it is possible to find a sufficient statistic for the nuisance parameters, and conditioning on this statistic results in a likelihood which does not depend on the nuisance parameters.

One example occurs in 2×2 tables, where conditioning on all four marginal totals leads to a conditional likelihood based on the non-central hypergeometric distribution. This form of conditioning is also the basis for Fisher's exact test.

Marginal likelihood

Main article: Marginal likelihood

Sometimes we can remove the nuisance parameters by considering a likelihood based on only part of the information in the data, for example by using the set of ranks rather than the numerical values. Another example occurs in linear mixed models, where considering a likelihood for the residuals only after fitting the fixed effects leads to residual maximum likelihood estimation of the variance components.

Profile likelihood

When the likelihood function depends on many parameters, depending on the application, we might be interested in only a subset of these parameters. It is often possible to reduce the number of the uninteresting (nuisance) parameters by writing them as functions of the parameters of interest. For example, the functions might be the value of the nuisance parameter which maximizes the likelihood given the value of the other (interesting) parameters.

This procedure is called concentration of the parameters and results in the concentrated likelihood function, also occasionally known as the maximized likelihood function, but most often called the profile likelihood function. It is then possible (and simpler) to find the values of the parameters that maximize the profile likelihood function, in the same way as in ordinary maximum likelihood estimation.

For example, consider a regression analysis model with normally distributed errors. The most likely value of the error variance is the variance of the residuals. The residuals depend on all other parameters. Hence the variance parameter can be written as a function of the other parameters.
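The regression example can be sketched for the simplest case. The code below assumes a hypothetical model y = b·x + ε with independent normal errors of variance σ², no intercept, and made-up data. For each slope b the variance is replaced by the residual variance at that b, which gives a profile log-likelihood in b alone.

```python
from math import log, pi

def full_loglik(b, sigma2, xs, ys):
    """Normal log-likelihood in both the slope b and the error variance sigma2."""
    n = len(xs)
    rss = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * log(2 * pi * sigma2) - rss / (2 * sigma2)

def sigma2_hat(b, xs, ys):
    """Variance that maximizes the likelihood for a given slope: mean squared residual."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def profile_loglik(b, xs, ys):
    """Profile log-likelihood: the nuisance variance is written as a function of b."""
    return full_loglik(b, sigma2_hat(b, xs, ys), xs, ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]                     # hypothetical predictors
ys = [2.1, 3.9, 6.2, 7.8, 10.1]                    # hypothetical responses

# Least-squares slope, which also maximizes the profile log-likelihood here.
b_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(b_ols, profile_loglik(b_ols, xs, ys))
```

For any fixed slope, the profile value is at least as large as the full log-likelihood evaluated at any other variance.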

Unlike conditional and marginal likelihoods, profile likelihood methods can always be used, even when the profile likelihood cannot be written down explicitly. However, the profile likelihood is not a true likelihood, as it is not based directly on a probability distribution, and this leads to some less satisfactory properties. Attempts have been made to improve this, resulting in modified profile likelihood.

The idea of profile likelihood can also be used to compute confidence intervals that often have better small-sample properties than those based on asymptotic standard errors calculated from the full likelihood. In the case of parameter estimation in partially observed systems, the profile likelihood can also be used for identifiability analysis.^[9] Results from profile likelihood analysis can be incorporated in uncertainty analysis of model predictions.^[10]

Partial likelihood

A partial likelihood is a factor component of the likelihood function that isolates the parameters of interest.^[11] It is a key component of the proportional hazards model.

Historical remarks

Term	Likelihood of the outcome
Virtually certain	99–100% probability
Very likely	90–100% probability
Likely	66–100% probability
About as likely as not	33–66% probability
Unlikely	0–33% probability
Very unlikely	0–10% probability
Exceptionally unlikely	0–1% probability

Notes

[1] Edwards, A.W.F. 1972. Likelihood. Cambridge University Press, Cambridge (expanded edition, 1992, Johns Hopkins University Press, Baltimore). ISBN 0-8018-4443-6
[2] Royall, R. 1997. Statistical Evidence. Chapman and Hall / CRC, Boca Raton.
[3] Kalbfleisch J.G. (1985) Probability and Statistical Inference, Springer (§9.3.)
[4] Sprott D.A. (2000) Statistical Inference in Science, Springer (chap.2)
[5] Hudson, D. J. (1971). "Interval Estimation from the Likelihood Function". Journal of the Royal Statistical Society, Series B. 33 (2): 256–262.
[6] Burnham K. P. & Anderson D.R. (2002), Model Selection and Multimodel Inference, §2.8 (Springer).
[7] Pawitan, Yudi (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press. ISBN 0-19-850765-8.
[8] Wen Hsiang Wei. "Generalized linear model course notes". Tung Hai University, Taichung, Taiwan. pp. Chapter 5. Retrieved 2007-01-23.
[9] Raue, A; Kreutz, C; Maiwald, T; Bachmann, J; Schilling, M; Klingmüller, U; Timmer, J (2009). "Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood". Bioinformatics. 25 (15): 1923–9. doi:10.1093/bioinformatics/btp358. PMID 19505944.
[10] Vanlier, J; Tiemann, C; Hilbers, P; van Riel, N (2012). "An integrated strategy for prediction uncertainty analysis". Bioinformatics. 28 (8): 1130–5. doi:10.1093/bioinformatics/bts088. PMID 22355081.
[11] Cox, D. R. (1975). "Partial likelihood". Biometrika. 62 (2): 269–276. doi:10.1093/biomet/62.2.269. MR 0400509.
[12] James Franklin (2001), The Science of Conjecture: Evidence and Probability before Pascal, The Johns Hopkins University Press, ISBN 0-8018-7109-3
[13] Anders Hald (1998). A History of Mathematical Statistics from 1750 to 1930. New York: Wiley. ISBN 0-471-17912-4.
[14] Steffen L. Lauritzen, Aspects of T. N. Thiele’s Contributions to Statistics. Bulletin of the International Statistical Institute, 58, 27–30, 1999.
[15] Steffen L. Lauritzen (2002). Thiele: Pioneer in Statistics. [Oxford University Press]. p. 288. ISBN 978-0-19-850972-1.
[16] Stigler, Stephen M. (2002). Statistics on the Table: The History of Statistical Concepts and Methods. Harvard University Press. p. 195. ISBN 9780674009790. [Peirce] found that [his subjects'] estimates varied directly with the log odds that they actually were correct, a remarkable early appearance of the log odds as an experimentally determined measure of certainty
[17] Fisher, R.A. (1922). "On the mathematical foundations of theoretical statistics". Philosophical Transactions of the Royal Society A. 222 (594–604): 309–368. doi:10.1098/rsta.1922.0009. JFM 48.1280.02. JSTOR 91208.
[18] M. D. Mastrandrea, C. B. Field, T. F. Stocker, O. Edenhofer, K. L. Ebi, D. J. Frame, H. Held, E. Kriegler, K. J. Mach, P. R. Matschoss, G.-K. Plattner, G. W. Yohe, and F. W. Zwiers, Guidance Note for Lead Authors of the IPCC Fifth Assessment Report on Consistent Treatment of Uncertainties, Intergovernmental Panel on Climate Change, 2010.

References

Hald, A. (1998), A History of Mathematical Statistics from 1750 to 1930, John Wiley & Sons, ISBN 0-471-17912-4.
Hald, A. (1999), "On the History of Maximum Likelihood in Relation to Inverse Probability and Least Squares", Statistical Science, 14 (2): 214–222, doi:10.1214/ss/1009212248, JSTOR 2676741.
Pratt, J. W. (May 1976), "F. Y. Edgeworth and R. A. Fisher on the Efficiency of Maximum Likelihood Estimation", The Annals of Statistics, 4 (3): 501–514, doi:10.1214/aos/1176343457, JSTOR 2958222.
Stigler, S. M. (1978), "Francis Ysidro Edgeworth, Statistician", Journal of the Royal Statistical Society, Series A, 141 (3): 287–322, doi:10.2307/2344804, JSTOR 2344804.
Stigler, S. M. (1986), The History of Statistics: The Measurement of Uncertainty before 1900, Harvard University Press, ISBN 0-674-40340-1.
Stigler, S. M. (1999), Statistics on the Table: The History of Statistical Concepts and Methods, Harvard University Press, ISBN 0-674-83601-4.


This article is issued from Wikipedia - version of the 12/1/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.