
This article is about mathematics. For the alternate meaning, see variance (land use).

In probability theory and statistics, the variance of a random variable is a measure of its statistical dispersion, indicating how far from the expected value its values typically are. The variance of a real-valued random variable is its second central moment, and also its second cumulant (cumulants differ from central moments only at and above degree 4).

1 Definition

If μ = E(X) is the expected value (mean) of the random variable X, then the variance is

$$\operatorname{var}(X) = \operatorname{E}\left((X - \mu)^2\right).$$

That is, it is the expected value of the square of the deviation of X from its own mean. In plain language, it is "the average of the square of the distance of each data point from the mean". It is thus the mean squared deviation. The variance of a random variable X is typically designated as var(X), σ_X², or simply σ².

Note that the above definition can be used for both discrete and continuous random variables.
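For a discrete random variable, the definition can be evaluated directly. The following is a minimal sketch, assuming the distribution is supplied as a dictionary mapping each value to its probability; the function name `variance` and the fair-die example are illustrative choices, not part of the article.

```python
from fractions import Fraction

def variance(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())                # mu = E(X)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())  # E((X - mu)^2)

# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(variance(die))  # 35/12
```

Exact fractions are used here only to keep the arithmetic free of rounding; ordinary floats would work as well.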

Many distributions, such as the Cauchy distribution, do not have a variance because the relevant integral diverges. In particular, if a distribution does not have an expected value, it does not have a variance either. The converse is not true: there are distributions for which the expected value exists, but the variance does not.
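As an illustration of a distribution with a mean but no variance, consider the density f(x) = 2/x³ for x ≥ 1 (and zero elsewhere). This density is an example chosen for this sketch, not one taken from the article. Its mean is finite, but its second moment diverges:

$$\operatorname{E}(X) = \int_1^\infty x \cdot \frac{2}{x^3}\,dx = \int_1^\infty \frac{2}{x^2}\,dx = 2, \qquad \operatorname{E}(X^2) = \int_1^\infty x^2 \cdot \frac{2}{x^3}\,dx = \int_1^\infty \frac{2}{x}\,dx = \infty.$$

Since E(X²) is infinite, so is E((X - 2)²), and the variance is not defined as a finite number.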

2 Properties

If the variance is defined, we can conclude that it is never negative because the squares are positive or zero. The unit of variance is the square of the unit of observation. For example, the variance of a set of heights measured in centimeters will be given in square centimeters. This fact is inconvenient and has motivated many statisticians to instead use the square root of the variance, known as the standard deviation, as a summary of dispersion.

It can be proven easily from the definition that the variance does not depend on the mean value μ. That is, if the variable is "displaced" by an amount b by taking X + b, the variance of the resulting random variable is left unchanged. By contrast, if the variable is multiplied by a scaling factor a, the variance is multiplied by a². More formally, if a and b are real constants and X is a random variable whose variance is defined,

$$\operatorname{var}(aX + b) = a^2 \operatorname{var}(X).$$
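The shift and scaling behaviour can be checked numerically. This sketch treats a small data set as a random variable that takes each listed value with equal probability, and uses `statistics.pvariance` from the Python standard library; the data and the constants a and b are arbitrary choices for illustration.

```python
from fractions import Fraction
from statistics import pvariance

# Hypothetical data, treated as equally likely values of X.
data = [Fraction(v) for v in (2, 4, 4, 4, 5, 5, 7, 9)]
a, b = Fraction(3), Fraction(-10)

shifted = [x + b for x in data]          # X + b
transformed = [a * x + b for x in data]  # aX + b

print(pvariance(data))         # 4
print(pvariance(shifted))      # 4  (shifting leaves the variance unchanged)
print(pvariance(transformed))  # 36 (scaling by a multiplies it by a squared)
```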

Another formula for the variance that follows in a straightforward manner from the above definition is:

$$\operatorname{var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2.$$

This is often used to calculate the variance in practice.
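The alternative formula follows by expanding the square in the definition and using the linearity of the expected value, with μ = E(X) treated as a constant:

$$\begin{aligned}
\operatorname{var}(X) &= \operatorname{E}\left((X - \mu)^2\right) \\
&= \operatorname{E}\left(X^2 - 2\mu X + \mu^2\right) \\
&= \operatorname{E}(X^2) - 2\mu \operatorname{E}(X) + \mu^2 \\
&= \operatorname{E}(X^2) - 2\mu^2 + \mu^2 \\
&= \operatorname{E}(X^2) - \mu^2.
\end{aligned}$$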

One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum (or difference) of independent random variables is the sum of their variances. A weaker condition than independence, called uncorrelatedness, also suffices. In general,

$$\operatorname{var}(aX + bY) = a^2 \operatorname{var}(X) + b^2 \operatorname{var}(Y) + 2ab \operatorname{cov}(X, Y).$$

Here cov(X, Y) is the covariance, which is zero for uncorrelated random variables.
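The general relation var(aX + bY) = a² var(X) + b² var(Y) + 2ab cov(X, Y) can be verified on a small joint distribution. The sketch below assumes the joint distribution is given as a dictionary from (x, y) pairs to probabilities; the helper names and the example distribution are hypothetical.

```python
from fractions import Fraction

def expect(joint, f):
    """E(f(X, Y)) for a joint distribution given as {(x, y): probability}."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def variance_of_combination(joint, a, b):
    """Return var(aX + bY) computed directly and via the covariance formula."""
    mean_x = expect(joint, lambda x, y: x)
    mean_y = expect(joint, lambda x, y: y)
    var_x = expect(joint, lambda x, y: (x - mean_x) ** 2)
    var_y = expect(joint, lambda x, y: (y - mean_y) ** 2)
    cov_xy = expect(joint, lambda x, y: (x - mean_x) * (y - mean_y))

    # Direct computation from the definition, applied to Z = aX + bY.
    mean_z = a * mean_x + b * mean_y
    direct = expect(joint, lambda x, y: (a * x + b * y - mean_z) ** 2)

    via_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return direct, via_formula

# Hypothetical joint distribution in which X and Y are correlated.
joint = {(0, 0): Fraction(1, 2), (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}
direct, via_formula = variance_of_combination(joint, a=2, b=-3)
print(direct, via_formula)  # 19/16 19/16
```

In this example the covariance is 1/8 rather than zero, so dropping the last term would give the wrong answer.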

3 Population variance and sample variance

In statistics, the concept of variance can also be used to describe a set of data. When the set of data is a population, it is called the population variance. If the set is a sample, we call it the sample variance.

The population variance of a finite population y_i, where i = 1, 2, ..., N, is given by

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2,$$

where μ is the population mean. In practice, when dealing with large populations, it is almost never possible to find the exact value of the population variance, due to time, cost, and other resource constraints.
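When the whole population is available, the population variance is a direct computation. A minimal sketch, assuming the population fits in a list; the function name and the height data are made up for illustration.

```python
def population_variance(values):
    """Mean squared deviation of a complete population from its mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / n

# Hypothetical population of five heights in centimeters.
heights_cm = [150, 160, 170, 180, 190]
print(population_variance(heights_cm))  # 200.0
```

Because the inputs are in centimeters, the result of 200.0 is in square centimeters.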

A common method of estimating the population variance is sampling. When estimating the population variance using n random samples x_i, where i = 1, 2, ..., n, the following formula is an unbiased estimator:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \overline{x})^2,$$

where x̄ is the sample mean.
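The estimator with n - 1 in the denominator can be sketched as follows. The function name and sample values are illustrative; the comparison uses `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics

def sample_variance(xs):
    """Estimate of the population variance from a sample, dividing by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical sample.
sample = [1, 2, 3, 4, 5]
print(sample_variance(sample))      # 2.5
print(statistics.variance(sample))  # 2.5
```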

Note that the n - 1 in the denominator above contrasts with the N in the equation for population variance. One common source of confusion is that the term sample variance and the notation s² may refer either to the unbiased estimator of the population variance given above, or to what is strictly speaking the variance of the sample, computed by using n instead of n - 1.

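Intuitively, computing the variance by dividing by n instead of n - 1 gives an underestimate of the population variance. This is because we are using the sample mean as an estimate of the population mean μ, which we do not know. The sample mean is the value that minimizes the sum of squared deviations of the sample, so deviations measured from it are, on average, smaller than deviations measured from μ. In practice, for large n, the distinction is often a minor one.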
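The underestimate can be made concrete by averaging both estimators over every possible sample. This sketch assumes a tiny hypothetical population and samples of size 2 drawn with replacement, so that all 16 ordered samples are equally likely. The parameter name `ddof` (the amount subtracted from n in the denominator) is borrowed from common numerical-library usage and is not from the article.

```python
from fractions import Fraction
from itertools import product

def variance(xs, ddof):
    """Sum of squared deviations from the mean of xs, divided by len(xs) - ddof."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

population = [1, 2, 3, 4]  # hypothetical population
n = 2                      # sample size

# Every ordered sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

avg_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)
avg_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)

print(variance(population, ddof=0))  # 5/4 (population variance)
print(avg_unbiased)                  # 5/4 (dividing by n - 1 matches it on average)
print(avg_biased)                    # 5/8 (dividing by n underestimates it)
```

Here the n-divisor estimator averages to (n - 1)/n times the population variance, which is half of it for n = 2; the gap shrinks as n grows.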

See also algorithms for calculating variance.




