Binomial distribution
In mathematics, the binomial distribution is a discrete probability distribution that describes the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. The binomial distribution is the basis for the popular binomial test of statistical significance.
A typical example is the following: assume 5% of the population is HIV-positive. You pick 500 people at random. How likely is it that you get 30 or more HIV-positive people?
The number of HIV-positives you pick is a random variable X which follows a binomial distribution with n = 500 and p = .05. We are interested in the probability Pr[X ≥ 30].
In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes is given by
$$\Pr[X = k] = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n,$$
where
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
is the binomial coefficient "n choose k" (also denoted C(n, k)), whence the name of the distribution. The formula can be understood as follows: we want k successes (p^k) and n − k failures ((1 − p)^(n − k)). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
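As a minimal sketch, the probability of exactly k successes can be evaluated directly as the product of C(n, k), p to the power k, and (1 − p) to the power n − k, and the opening example (n = 500, p = 0.05, at least 30 successes) then follows by summing the relevant terms. The function names `binom_pmf` and `binom_tail` are illustrative choices, not part of any library.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k_min: int, n: int, p: float) -> float:
    """Pr[X >= k_min], obtained by summing the pmf term by term."""
    return sum(binom_pmf(k, n, p) for k in range(max(k_min, 0), n + 1))


# The example from the text: n = 500, p = 0.05, at least 30 successes.
prob = binom_tail(30, 500, 0.05)
print(f"Pr[X >= 30] = {prob:.4f}")
```

Direct summation is fine at this size; for very large n, a statistics library or one of the approximations is usually the more practical route.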
The cumulative distribution function can be expressed in terms of the regularized incomplete beta function, as follows:
$$F(k; n, p) = \Pr[X \le k] = I_{1 - p}(n - k,\, k + 1), \qquad 0 \le k < n.$$
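The relation between the cumulative distribution function and the regularized incomplete beta function can be checked numerically. The sketch below compares a direct sum of the probability mass function with I evaluated at x = 1 − p, a = n − k, b = k + 1. The helper `reg_inc_beta` is a hypothetical, hand-rolled Simpson's-rule integrator written only for this illustration; it assumes small parameters a, b ≥ 1 and is no substitute for a library routine.

```python
from math import comb, gamma


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr[X <= k] by direct summation of the probability mass function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def reg_inc_beta(x: float, a: float, b: float, steps: int = 2000) -> float:
    """Regularized incomplete beta I_x(a, b) via composite Simpson's rule.

    Illustrative only: assumes a, b >= 1 and an even number of steps.
    """
    def integrand(t: float) -> float:
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    beta_ab = gamma(a) * gamma(b) / gamma(a + b)  # complete beta function B(a, b)
    return integral / beta_ab


n, k, p = 10, 3, 0.3
direct = binom_cdf(k, n, p)
via_beta = reg_inc_beta(1 - p, n - k, k + 1)
print(direct, via_beta)  # the two values should agree closely
```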
If X ~ B(n, p), then the expected value of X is
$$\operatorname{E}[X] = np$$
and the variance is
$$\operatorname{Var}(X) = np(1 - p).$$
For 0 < p < 1, the most likely value or mode of X is given by the largest integer less than or equal to (n + 1)p; if m = (n + 1)p is itself an integer, then m − 1 and m are both modes.
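The expected value, variance, and mode can all be recovered by brute force from the probability mass function, which gives a quick sanity check on the closed forms. The sketch below is illustrative: the function `summary` and the parameter choices (n = 20, p = 0.3, and n = 9, p = 0.5 for the two-mode case) are assumptions made for the example.

```python
from math import comb, floor


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def summary(n: int, p: float):
    """Return (mean, variance, modes) computed directly from the pmf."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    peak = max(pmf)
    # Treat values within a tiny relative tolerance of the peak as tied modes.
    modes = [k for k, q in enumerate(pmf) if abs(q - peak) <= 1e-12 * peak]
    return mean, var, modes


n, p = 20, 0.3
mean, var, modes = summary(n, p)
print(mean, n * p)                   # mean vs. np
print(var, n * p * (1 - p))          # variance vs. np(1 - p)
print(modes, floor((n + 1) * p))     # mode vs. floor((n + 1)p)

# Two-mode case: (n + 1)p = 10 * 0.5 = 5 is an integer, so 4 and 5 tie.
two_modes = summary(9, 0.5)[2]
print(two_modes)
```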
If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its distribution is
$$X + Y \sim B(n + m, p).$$
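The distribution of a sum of independent variables is the convolution of their probability mass functions, so the additivity property can be verified numerically. The following sketch uses arbitrarily chosen parameters (n = 6, m = 9, p = 0.35) and hypothetical helper names; it confirms that convolving B(n, p) with B(m, p) reproduces B(n + m, p) up to rounding error.

```python
from math import comb


def binom_pmf_list(n: int, p: float) -> list:
    """pmf of B(n, p) as a list indexed by the number of successes."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a: list, b: list) -> list:
    """pmf of the sum of two independent variables with pmfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


n, m, p = 6, 9, 0.35
sum_pmf = convolve(binom_pmf_list(n, p), binom_pmf_list(m, p))
target = binom_pmf_list(n + m, p)
max_diff = max(abs(s - t) for s, t in zip(sum_pmf, target))
print(max_diff)  # expected to be at the level of floating-point rounding
```

Note that both variables must share the same p; with different success probabilities the sum is in general not binomial.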
Two other important distributions arise as approximations of binomial distributions:
- If both np and n(1 − p) are greater than 5 or so, then an excellent approximation to B(n, p) (provided a suitable continuity correction is used) is given by the normal distribution with mean np and variance np(1 − p):
  $$N(np,\; np(1 - p)).$$
  This approximation is a huge time-saver; historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1733. Nowadays, it can be seen as a consequence of the central limit theorem, since B(n, p) is a sum of n independent, identically distributed 0-1 indicator variables. Warning: this approximation gives inaccurate results unless a continuity correction is used. Note that the picture gives the normal and binomial probability density functions (PDFs) and not the cumulative distribution functions.
  For example, suppose you randomly sample n people out of a large population and ask them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If you sampled groups of n people repeatedly and truly randomly, the proportions would follow an approximately normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = √(p(1 − p)/n). Large sample sizes n are good because the standard deviation gets smaller, which allows a more precise estimate of the unknown parameter p.
- If n is large and p is small, so that np is of moderate size, then the Poisson distribution with parameter λ = np is a good approximation to B(n, p).
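To see how the two approximations behave in practice, the sketch below revisits the opening example (n = 500, p = 0.05, Pr[X ≥ 30]) and compares the exact binomial tail with the normal approximation, with and without a continuity correction, and with the Poisson approximation using λ = np. The helper names are hypothetical, and the continuity correction is taken here to mean evaluating the normal distribution at 29.5 rather than at 30.

```python
from math import comb, erf, exp, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def exact_tail(k_min: int, n: int, p: float) -> float:
    """Exact Pr[X >= k_min] for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def poisson_tail(k_min: int, lam: float) -> float:
    """Pr[Y >= k_min] for Y ~ Poisson(lam), building terms by recurrence."""
    term = exp(-lam)  # Pr[Y = 0]
    below = 0.0
    for k in range(k_min):
        below += term
        term *= lam / (k + 1)  # Pr[Y = k + 1] from Pr[Y = k]
    return 1 - below


n, p, k_min = 500, 0.05, 30
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = exact_tail(k_min, n, p)
normal_cc = 1 - normal_cdf(k_min - 0.5, mu, sigma)  # with continuity correction
normal_plain = 1 - normal_cdf(k_min, mu, sigma)     # without correction
poisson = poisson_tail(k_min, n * p)

print(f"exact           {exact:.4f}")
print(f"normal (cc)     {normal_cc:.4f}")
print(f"normal (no cc)  {normal_plain:.4f}")
print(f"Poisson         {poisson:.4f}")
```

In this example the corrected normal value should land noticeably closer to the exact tail than the uncorrected one, which illustrates the warning about the continuity correction.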
The formula for Bézier curves was inspired by the binomial distribution.