In classical information theory, the **mutual information** of two random variables is a quantity that measures the mutual dependence of the two variables. Intuitively, the mutual information *I*(*X*;*Y*) measures the information about *X* that is shared by *Y*.

If *X* and *Y* are independent, then *X* contains no information about *Y* and vice versa, so their mutual information is zero. If *X* and *Y* are identical, then all information conveyed by *X* is shared with *Y*: knowing *X* determines the value of *Y* and vice versa. In this case the mutual information is the same as the information conveyed by *X* (or *Y*) alone, namely the entropy of *X*. In a specific sense (see below), mutual information quantifies the distance between the joint distribution of *X* and *Y* and the product of their marginal distributions.

For a pair of discrete random variables (*X*, *Y*), the mutual information can be formally defined as *I*(*X*;*Y*) := *H*(*X*) + *H*(*Y*) − *H*(*X*, *Y*), where *H*(*X*) and *H*(*Y*) are the Shannon entropies of *X* and *Y*, and *H*(*X*, *Y*) is the Shannon entropy of the pair (*X*, *Y*). In terms of the probabilities, the mutual information can be written as

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{f(x)\,g(y)},$$

where *p* is the joint probability distribution function of *X* and *Y*, and *f* and *g* are the marginal probability distribution functions of *X* and *Y* respectively.
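The discrete formula can be evaluated directly. The following sketch (the joint probability table is a hypothetical example; any nonnegative table summing to 1 works) computes *I*(*X*;*Y*) in bits for binary *X* and *Y*:

```python
import math

# A small hypothetical joint distribution p(x, y) for binary X and Y.
p = {(0, 0): 0.4, (0, 1): 0.1,
     (1, 0): 0.1, (1, 1): 0.4}

# Marginals f(x) and g(y): sum the joint over the other variable.
f = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (0, 1)}
g = {y: sum(v for (_, yy), v in p.items() if yy == y) for y in (0, 1)}

# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (f(x) g(y)) ), in bits.
# Terms with p(x,y) = 0 contribute nothing and are skipped.
I = sum(pxy * math.log2(pxy / (f[x] * g[y]))
        for (x, y), pxy in p.items() if pxy > 0)
print(I)
```

Here the marginals are uniform, and the strong diagonal of the joint table yields a positive mutual information of roughly 0.28 bits; setting the table to the product of its marginals would make *I* exactly zero.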

In the continuous case, we replace summation by a definite double integral:

$$I(X;Y) = \int_Y \int_X p(x,y) \log \frac{p(x,y)}{f(x)\,g(y)} \; dx \,dy, \!$$

where *p* is now the joint probability *density* function of *X* and *Y*, and *f* and *g* are the marginal probability density functions of *X* and *Y* respectively.
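The continuous formula can be checked numerically. For a bivariate normal with unit variances and correlation ρ, the mutual information has the closed form −½ ln(1 − ρ²) nats; the sketch below (a crude midpoint Riemann sum over a truncated grid, not a rigorous integrator) approximates the double integral and compares it to that value:

```python
import math

rho = 0.5  # correlation of the bivariate normal (example value)

def phi(t):
    """Standard normal density: marginal of each coordinate."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def p(x, y):
    """Bivariate normal density with unit variances and correlation rho."""
    z = (x * x - 2 * rho * x * y + y * y) / (1 - rho ** 2)
    return math.exp(-z / 2) / (2 * math.pi * math.sqrt(1 - rho ** 2))

# Midpoint Riemann sum over [-L, L]^2; the tails beyond 6 sigma are negligible.
h, L = 0.05, 6.0
n = int(2 * L / h)
I = 0.0
for i in range(n):
    x = -L + (i + 0.5) * h
    for j in range(n):
        y = -L + (j + 0.5) * h
        pxy = p(x, y)
        if pxy > 0:
            I += pxy * math.log(pxy / (phi(x) * phi(y))) * h * h

exact = -0.5 * math.log(1 - rho ** 2)
print(I, exact)  # the two values should agree to several decimal places
```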

Mutual information is nonnegative (i.e. *I*(*X*;*Y*) ≥ 0; see below), which follows from the subadditivity of the Shannon entropy, and symmetric (i.e. *I*(*X*;*Y*) = *I*(*Y*;*X*)).

### Relation to other quantities

Mutual information can be equivalently expressed as

*I*(*X*; *Y*) = *H*(*X*) − *H*(*X*∣*Y*) = *H*(*Y*) − *H*(*Y*∣*X*) = *H*(*X*) + *H*(*Y*) − *H*(*X*, *Y*)

where *H*(*X*∣*Y*) = *H*(*X*, *Y*) − *H*(*Y*) is the conditional entropy of *X* given *Y* (and similarly for *H*(*Y*∣*X*)).
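All three expressions can be checked on a small example. The sketch below (the joint distribution is a hypothetical example) computes the entropies directly and obtains the conditional entropies via the chain rule:

```python
import math

def H(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical joint distribution of (X, Y).
pxy = {(0, 0): 0.125, (0, 1): 0.375,
       (1, 0): 0.375, (1, 1): 0.125}
px = {x: pxy[(x, 0)] + pxy[(x, 1)] for x in (0, 1)}  # marginal of X
py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}  # marginal of Y

# Conditional entropies via the chain rule H(X|Y) = H(X,Y) - H(Y).
H_X_given_Y = H(pxy) - H(py)
H_Y_given_X = H(pxy) - H(px)

I1 = H(px) - H_X_given_Y            # H(X) - H(X|Y)
I2 = H(py) - H_Y_given_X            # H(Y) - H(Y|X)
I3 = H(px) + H(py) - H(pxy)         # H(X) + H(Y) - H(X,Y)
print(I1, I2, I3)  # all three expressions coincide
```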

Mutual information can also be expressed in terms of the Kullback-Leibler divergence between the joint distribution of two random variables *X* and *Y* and the product of their marginal distributions. Let *q*(*x*, *y*) = *f*(*x*) × *g*(*y*); then

*I*(*X*; *Y*) = *KL*(*p*, *q*).

Furthermore, let *h*_{*y*}(*x*) = *p*(*x*, *y*) / *g*(*y*). Then

$$I(X;Y) = \sum_y g(y) \sum_x h_y(x) \log_2 \frac{h_y(x)}{f(x)} = \sum_y g(y)\, KL(h_y, f) = \mathrm{E}_Y[KL(h_y, f)].$$

Thus mutual information can also be understood as the expectation of the Kullback-Leibler divergence between the conditional distribution *h* of *X* given *Y* and the univariate distribution *f* of *X*: the more different the distributions *f* and *h*, the greater the information gain.
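This expectation-of-divergences form can be verified against the direct formula. In the sketch below (the joint distribution is a hypothetical example), the conditional distributions *h*_{*y*} are formed explicitly and their KL divergences from the marginal *f* are averaged with weights *g*(*y*):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p, q) in bits between two
    distributions given as {outcome: prob}."""
    return sum(pi * math.log2(pi / q[k]) for k, pi in p.items() if pi > 0)

# Hypothetical joint distribution of (X, Y).
p = {(0, 0): 0.3, (0, 1): 0.2,
     (1, 0): 0.1, (1, 1): 0.4}
f = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (0, 1)}
g = {y: sum(v for (_, yy), v in p.items() if yy == y) for y in (0, 1)}

# Conditional distributions h_y(x) = p(x, y) / g(y).
h = {y: {x: p[(x, y)] / g[y] for x in (0, 1)} for y in (0, 1)}

# E_Y[ KL(h_y, f) ]: average divergence, weighted by g(y).
I_kl = sum(g[y] * kl(h[y], f) for y in (0, 1))

# Direct evaluation of I(X;Y) from the joint, for comparison.
I_direct = sum(v * math.log2(v / (f[x] * g[y])) for (x, y), v in p.items())
print(I_kl, I_direct)  # the two values agree
```

The further each conditional *h*_{*y*} sits from the marginal *f*, the larger the weighted average, matching the "information gain" reading in the text.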

Category:Handbook of Quantum Information Category:Classical Information Theory