# Shannon's noiseless coding theorem

Shannon's noiseless coding theorem places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.

### Shannon's statement

Let X be a random variable taking values in some finite alphabet Σ₁ and let f be a decipherable code from Σ₁ to Σ₂*, where ∣Σ₂∣ = a. Let S denote the resulting wordlength of f(X).

If f is optimal in the sense that it has the minimal expected wordlength for X, then

$$\frac{H(X)}{\log_2 a} \leq \mathbb{E}S &lt; \frac{H(X)}{\log_2 a} +1$$
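As a quick numerical illustration of the statement, a dyadic distribution (all probabilities powers of 2) attains the lower bound exactly. The sketch below is a minimal check, assuming a binary target alphabet (a = 2) and using a small Huffman construction, which is known to be an optimal prefix-free code:

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the codeword lengths of a binary Huffman code for `probs`."""
    # Each heap entry: (subtree probability, tiebreak, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees lengthens every codeword inside them by one.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]              # dyadic example distribution
H = -sum(p * math.log2(p) for p in probs)      # entropy: 1.75 bits
lengths = huffman_lengths(probs)               # optimal lengths: [1, 2, 3, 3]
ES = sum(p * l for p, l in zip(probs, lengths))  # expected wordlength: 1.75
```

Here H(X) = 𝔼S = 1.75 bits, so the lower bound holds with equality, as expected for a dyadic source.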

### Proof

Let sᵢ denote the wordlength of each possible value xᵢ of X (i = 1, …, n) and let pᵢ = P(X = xᵢ). Define

$$q_i = \frac{a^{-s_i}}{C}, \qquad C = \sum_{i=1}^n a^{-s_i},$$

so that ∑ᵢqᵢ = 1.

Then

$$\begin{matrix} H(X) &amp;=&amp; - \sum_{i=1}^n p_i \log_2 p_i \\ &amp;&amp; \\ &amp;\leq&amp; - \sum_{i=1}^n p_i \log_2 q_i \\ &amp;&amp; \\ &amp;=&amp; - \sum_{i=1}^n p_i \log_2 a^{-s_i} + \sum_{i=1}^n p_i \log_2 C \\ &amp;&amp; \\ &amp;=&amp; \sum_{i=1}^n s_i p_i \log_2 a + \log_2 C \\ &amp;&amp; \\ &amp;\leq&amp; \sum_{i=1}^n s_i p_i \log_2 a \\ &amp;&amp; \\ &amp;=&amp; \mathbb{E}S \log_2 a \\ \end{matrix}$$

where the second line follows from Gibbs' inequality, and the final inequality from Kraft's inequality: since f is decipherable,

$$C = \sum_{i=1}^n a^{-s_i} \leq 1,$$

so log₂C ≤ 0.
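Gibbs' inequality states that for any two probability distributions p and q over the same n outcomes, −∑pᵢlog₂pᵢ ≤ −∑pᵢlog₂qᵢ, with equality if and only if p = q. A quick numerical sanity check (illustrative only, on randomly generated distributions):

```python
import math
import random

random.seed(0)

def normalize(weights):
    """Scale nonnegative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]

# Gibbs' inequality: the cross-entropy of p relative to q is never
# smaller than the entropy of p.
for _ in range(1000):
    n = random.randint(2, 8)
    p = normalize([random.random() for _ in range(n)])
    q = normalize([random.random() for _ in range(n)])
    entropy = -sum(pi * math.log2(pi) for pi in p)
    cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
    assert entropy <= cross_entropy + 1e-12   # tolerance for float rounding
```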

For the second inequality we may set

$$s_i = \left\lceil -\log_a p_i \right\rceil$$

so that

$$- \log_a p_i \leq s_i &lt; -\log_a p_i + 1$$

and so

$$a^{-s_i} \leq p_i$$

and

$$\sum_{i=1}^n a^{-s_i} \leq \sum_{i=1}^n p_i = 1$$

and so by Kraft's inequality there exists a prefix-free code having these wordlengths. Since the optimal code can do no worse, the minimal expected wordlength satisfies

$$\begin{matrix} \mathbb{E}S &amp; = &amp; \sum_{i=1}^n p_i s_i \\ &amp;&amp; \\ &amp; &lt; &amp; \sum_{i=1}^n p_i \left( -\log_a p_i +1 \right) \\ &amp;&amp; \\ &amp; = &amp; - \sum_{i=1}^n p_i \frac{\log_2 p_i}{\log_2 a} +1 \\ &amp;&amp; \\ &amp; = &amp; \frac{H(X)}{\log_2 a} +1 \\ \end{matrix}$$
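The construction in the proof is easy to check numerically. The sketch below, assuming an arbitrary example distribution and a ternary target alphabet (a = 3), computes the Shannon lengths sᵢ = ⌈−log_a pᵢ⌉, verifies Kraft's inequality, and confirms both bounds of the theorem:

```python
import math

a = 3                               # size of the target alphabet (ternary)
probs = [0.4, 0.3, 0.2, 0.1]        # example source distribution (hypothetical)

# Shannon's construction from the proof: s_i = ceil(-log_a p_i).
s = [math.ceil(-math.log(p, a)) for p in probs]

# These lengths satisfy Kraft's inequality, so a prefix-free code
# with these wordlengths exists.
kraft = sum(a ** -si for si in s)
assert kraft <= 1

H = -sum(p * math.log2(p) for p in probs)     # entropy in bits
ES = sum(p * si for p, si in zip(probs, s))   # expected wordlength
lower = H / math.log2(a)
assert lower <= ES < lower + 1                # both bounds of the theorem
```

For this distribution the lengths come out to [1, 2, 2, 3], giving 𝔼S = 1.7 ternary symbols against a lower bound of H(X)/log₂3 ≈ 1.165.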

Category: Quantum Information Theory