Classical Entropy Measures

Contents

Introduction

Classical entropy quantities are a way to quantify how much information is revealed in a random event. Shannon first introduced a way to quantify information by associating to an event occuring with probability math, an amount of information math (as is standard in information theory, logarithms are taken in base 2 and so information is measured in bits). If an unlikely event occurs, one gains a large amount of information. The Shannon entropy is the average amount of information gained by observing the outcome of an event. In some cases, in particular ones relevant to cryptography, the average amount of information is not a good quantity, since it assumes many independent repetitions of the same experiment.


Random Events

When we discuss random events, we assume that they occur according to a pre-defined ensemble of possible outcomes of that event, and associated probabilities. Event math is a single instance drawn from the ensemble math with probabilities math. We call this probability distribution math. The terminology math refers to a single instance drawn from this distribution taking the value math. One similarly defines distributions over more than one random variable. For instance, math is the joint distribution of math and math, and math is the distribution of math conditioned on the fact that math takes the value math.

Shannon Entropy

It was Shannon who pioneered the mathematical formulation of information. In essence his insight was that an event that occurs with probability math could be associated with an amount of information math. Consider many independent repetitions of random event math. The average information revealed by each instance of math is given by the Shannon entropy of math defined as follows.

The Shannon entropy associated with an event math drawn from random distribution math is

math.

Likewise, one can define conditional Shannon entropies. math denotes the Shannon entropy of math given math. It measures the average amount of information one learns from a single instance of math if one possesses string math, where math are chosen according to joint distribution math. One can average this quantity to form math, the conditional Shannon entropy.

The conditional Shannon entropy of an event math given math is defined by

math.

This leads one to define the mutual Shannon information between math and math by

math.

In some sense, this is the amount of information in common to the two strings math and math.

Shannon information was first used to solve problems of compression, and communication over a noisy channel, as given in the following theorems.

Source coding theorem : Consider a source emitting independent and indentically distributed (i.i.d.) random variables drawn from distribution math. For any math and math, there exists an encoder such that for sufficiently large math, any sequence drawn from math can be compressed to length math, and a decoder such that, except with probability math, the original sequence can be restored from the compressed string.

Furthermore, if one tries to compress the same source using math bits per instance, it is virtually certain that information will be lost.

For a discrete, memoryless channel, in which Alice sends a random variable drawn from math to Bob who receives math, the channel capacity is defined by math.

Noisy channel coding theorem : Consider Alice communicate with Bob via a discrete memoryless channel which has the property that if Alice draws from an i.i.d. source math, Bob receives math. For any math and math, for large enough math, there exists an encoding of length math and a decoder such that math bits of information are conveyed by the channel for each encoder-channel-decoder cycle, except with probability math.

Notice that in the noisy channel coding theorem, the channel is memoryless, and Alice has an i.i.d. source. In other words, all uses of the channel are independent of one another. This is the situation in which Shannon information is useful. However, in cryptographic scenarios where the channel may be controlled by an eavesdropper, such an assumption is not usually valid. Instead, other entropy measures have been developed that apply for these cases.

Beyond Shannon entropy

Rényi introduced the following generalization of the Shannon entropy.

The Rényi entropy of order math is defined by

math

We have, math. Two other important cases are math and math. A useful property is that, for math, math.

math is sometimes called the max entropy of math. It is important for information reconciliation, which in essence is error correction.

math is sometimes called the min entropy of math. It is important for privacy amplification. There, the presence of an eavesdropper means that it no longer suffices to consider each use of the channel as independent. The min entropy represents the maximum amount of information that could be learned from the event math, so describes the worst case scenario. In a cryptographic application, one wants to be assured security even in the worst case.

In general, Rényi entropies are strongly discontinuous. As an example, consider the two distributions math and math defined on math. Take math, math, and math to be the uniform distribution. The difference between min entropies is math. In the large math limit, the two distributions have distance math, which is exponentially small, while the difference in min entropies becomes arbitrarily large.

Smoothed versions of these quantities have been introduced which remove such discontinuities. In essence, these smoothed quantities involve optimizing such quantities over a small region of probability space. They have operational significance in cryptography in that they provide the relevant quantities for information reconciliation and privacy amplification.

The conditional versions of these smoothed entropies are relevant in cryptography, hence we provide a definition of these directly.

For a distribution math, and smoothing parameter math, we define the following smoothed Rényi entropies:

math

where math is a set of events with total probability at least math.

More generally, the smooth Rényi entropy of order math can be defined. However, up to an additive constant these equal either math (for math) or math (for math). It is also worth noting that for a large number of independent repetitions of the same experiment, the Rényi entropies tend to the Shannon entropy, that is,

math


Significance to cryptography

Information Reconciliation

The task of information reconciliation can be stated as follows. Alice has string math and Bob math, these being chosen with joint distribution math. Alice also possesses some additional independent random string math. What is the minimum length of string math that Alice can compute such that math is uniquely obtainable by Bob using math, math and math, except with probability less than math?

Renner and Wolf denote this quantity math. It is tightly bounded by the relation

math

where math.

It is intuitively clear why math is the correct quantity. Recall the definition

math

where math is a set of events with total probability at least math. The size of the set of strings math that could have generated math given math is math. Alice's additional information needs to point to one of these. It hence requires math bits to encode. Since Alice does not know math, she must assume the worst, hence we maximize on math. Furthermore, since some error is tolerable, we minimize on math, by cutting away unlikely events from the probability distribution.


Privacy Amplification

In essence, this task seeks to find the maximum length of string Alice and Bob can form from their shared string such that Eve has no information on this string.

This task can be stated more formally as follows. Alice possesses string math and Eve math, distributed according to math. Alice also has some uncorrelated random string math. What is the maximum length of a binary string math, such that for a uniform random variable math that is independent of math and math, we have math, except with probability less than math?

Again, this quantity, denoted math, has been defined and bounded by Renner and Wolf :

math

where math.

The reason that this quantity is relevant follows from the definition of extractors, which are functions commonly exploited for privacy amplification. In essence they take a non-uniform random string, and some additional catalytic randomness, to form a smaller string which is arbitrarily close to being uniform. The length reduction in the string is bounded by its min-entropy.

References

C. E. Shannon, Bell System Technical Journal 27, 379 (1948).

A. Rényi, in Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability (1961), vol. 1.

R. Renner and S. Wolf, in Advances in Cryptology --- ASIACRYPT 2005, edited by B. Roy (Springer-Verlag, 2005), vol. 3788, pp. 199--216.