Informally, mutual information measures the information that X and Y share. If X and Y are independent, then X contains no information about Y and vice versa, so their mutual information is zero. If X and Y are identical (meaning they describe the same trial, not two trials of the same sort occurring together), then all information conveyed by X is shared with Y: once X is known, Y reveals nothing new, and vice versa. The mutual information is then the same as the information conveyed by X (or Y) alone, namely the entropy of X. In a specific sense (see below), mutual information quantifies how far the joint distribution of X and Y is from the product of their marginal distributions.
Formally, if the joint probability distribution of X and Y is p with p(x, y) = Pr(X=x, Y=y), the probability distribution of X alone is f with f(x) = Pr(X=x), and the probability distribution of Y alone is g with g(y) = Pr(Y=y), then the mutual information of X and Y is given by I(X, Y), defined as follows for the discrete case:
$$I(X, Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{f(x)\, g(y)}$$
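The discrete definition can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `mutual_information`, the example table, and the choice of base-2 logarithms (so the result is in bits) are assumptions made here, not part of the definition above.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a discrete joint table joint[x][y]."""
    f = [sum(row) for row in joint]          # marginal distribution of X
    g = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    total = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:                        # terms with p(x, y) = 0 contribute nothing
                total += p * math.log2(p / (f[x] * g[y]))
    return total

# Hypothetical joint distribution of two binary variables that tend to agree.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mi = mutual_information(joint)
print(mi)
```

For this table both marginals are uniform, and the result is about 0.278 bits.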
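In the continuous case, replace summation by a definite double integral:
$$I(X, Y) = \int \int p(x, y) \log \frac{p(x, y)}{f(x)\, g(y)} \, dx \, dy$$
where p, f and g now denote the corresponding probability density functions.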
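Mutual information is a measure of dependence in the following sense: I(X, Y) = 0 iff X and Y are independent random variables. This is easy to see in one direction: if X and Y are independent, then p(x, y) = f(x) × g(y), and therefore:
$$\log \frac{p(x, y)}{f(x)\, g(y)} = \log 1 = 0$$
for every pair (x, y), so every term in the definition vanishes and I(X, Y) = 0.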
Moreover, mutual information is nonnegative (i.e. I(X,Y) ≥ 0; see below) and symmetric (i.e. I(X,Y) = I(Y,X)).
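These properties can be checked numerically. The sketch below is illustrative only: the helper `mutual_information`, the example distributions, and the use of base-2 logarithms are assumptions chosen for the demonstration.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a discrete joint table joint[x][y]."""
    f = [sum(row) for row in joint]          # marginal distribution of X
    g = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    return sum(p * math.log2(p / (f[x] * g[y]))
               for x, row in enumerate(joint)
               for y, p in enumerate(row) if p > 0)

# Independence: the joint table is the product of the marginals.
f = [0.3, 0.7]
g = [0.2, 0.5, 0.3]
independent = [[fx * gy for gy in g] for fx in f]
mi_independent = mutual_information(independent)

# Symmetry: transposing the table swaps the roles of X and Y.
dependent = [[0.10, 0.25, 0.05],
             [0.10, 0.25, 0.25]]
transposed = [list(col) for col in zip(*dependent)]
mi_xy = mutual_information(dependent)
mi_yx = mutual_information(transposed)

# Identical variables: all probability mass lies on the diagonal.
identical = [[0.3, 0.0],
             [0.0, 0.7]]
mi_identical = mutual_information(identical)
entropy_x = -sum(fx * math.log2(fx) for fx in f)

print(mi_independent, mi_xy, mi_yx, mi_identical, entropy_x)
```

Up to floating-point rounding, the independent table gives zero, the two orderings agree, and the identical case reproduces the entropy of X.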
Several generalizations of mutual information to more than two random variables have been proposed, but no widely agreed-upon definition has yet emerged.
Mutual information can be equivalently expressed as
$$I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
where H(X) and H(X|Y) are the unconditional entropy and conditional entropy of X, and likewise H(Y) and H(Y|X) are the unconditional and conditional entropy of Y, with
$$H(X) = -\sum_{x} f(x) \log f(x)$$
and
$$H(X|Y) = -\sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{g(y)}$$
Since H(X) ≥ H(X|Y), with equality exactly when X and Y are independent, this gives the nonnegativity property stated above.
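The entropy form can be compared against the direct definition on a small example. The following is a sketch under stated assumptions: the joint table is hypothetical, logarithms are taken to base 2, and H(X) and H(X|Y) are computed from their standard definitions, H(X) = −Σ f(x) log f(x) and H(X|Y) = −Σ p(x, y) log(p(x, y) / g(y)).

```python
import math

# Hypothetical joint distribution p(x, y), indexed as joint[x][y].
joint = [[0.4, 0.1],
         [0.1, 0.4]]
f = [sum(row) for row in joint]          # marginal distribution of X
g = [sum(col) for col in zip(*joint)]    # marginal distribution of Y

# Unconditional entropy H(X).
h_x = -sum(fx * math.log2(fx) for fx in f if fx > 0)

# Conditional entropy H(X|Y), using p(x, y) / g(y) as the conditional probability.
h_x_given_y = -sum(p * math.log2(p / g[y])
                   for row in joint
                   for y, p in enumerate(row) if p > 0)

mi_from_entropies = h_x - h_x_given_y

# Direct definition, for comparison.
mi_direct = sum(p * math.log2(p / (f[x] * g[y]))
                for x, row in enumerate(joint)
                for y, p in enumerate(row) if p > 0)

print(h_x, h_x_given_y, mi_from_entropies, mi_direct)
```

The two values of the mutual information agree up to rounding, and H(X) is no smaller than H(X|Y) for this table.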
Mutual information can also be expressed in terms of the Kullback-Leibler divergence. Let q(x, y) = f(x) × g(y); then
$$I(X, Y) = \mathrm{KL}(p, q) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{q(x, y)}$$
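Furthermore, let h_y(x) = p(x, y) / g(y). Then
$$I(X, Y) = \sum_{y} g(y) \sum_{x} h_y(x) \log \frac{h_y(x)}{f(x)} = \sum_{y} g(y)\, \mathrm{KL}(h_y, f) = \mathrm{E}_Y\left[\mathrm{KL}(h_y, f)\right]$$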
Thus mutual information can also be understood as the expectation of the Kullback-Leibler divergence between the conditional distribution h of X given Y and the univariate distribution f of X: the more the distributions f and h differ, the greater the information gain.
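Both Kullback-Leibler readings can be compared numerically. The sketch below is an illustration, not a definitive implementation: the helper `kl_divergence`, the example table, and base-2 logarithms are assumptions, and every g(y) is assumed to be strictly positive so that the conditional distributions are well defined.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits between two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical joint distribution p(x, y), indexed as joint[x][y].
joint = [[0.10, 0.25, 0.05],
         [0.10, 0.25, 0.25]]
f = [sum(row) for row in joint]          # marginal distribution of X
g = [sum(col) for col in zip(*joint)]    # marginal distribution of Y

# Form 1: divergence between the joint distribution p and the product q = f x g.
p_flat = [p for row in joint for p in row]
q_flat = [fx * gy for fx in f for gy in g]
mi_kl = kl_divergence(p_flat, q_flat)

# Form 2: expectation over Y of the divergence between the conditional
# distribution h_y(x) = p(x, y) / g(y) and the marginal f.
mi_expected = 0.0
for y, gy in enumerate(g):
    h_y = [joint[x][y] / gy for x in range(len(f))]
    mi_expected += gy * kl_divergence(h_y, f)

print(mi_kl, mi_expected)
```

The two numbers coincide up to floating-point rounding, as the identities above require.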