Contents |
Introduction
Classical entropy quantities are a way to quantify how much information is revealed in a random event. Shannon first introduced a way to quantify information by associating to an event occuring with probability
, an amount of information
(as is standard in information theory, logarithms are taken in base 2 and so information is measured in bits). If an unlikely event occurs, one gains a large amount of information. The Shannon entropy is the average amount of information gained by observing the outcome of an event. In some cases, in particular ones relevant to cryptography, the average amount of information is not a good quantity, since it assumes many independent repetitions of the same experiment.
Random Events
When we discuss random events, we assume that they occur according to
a pre-defined ensemble of possible outcomes of that event, and
associated probabilities. Event
is a single instance drawn from
the ensemble
with probabilities
. We call this probability distribution
. The terminology
refers to a single instance drawn from
this distribution taking the value
. One similarly defines
distributions over more than one random variable. For instance,
is the joint distribution of
and
, and
is
the distribution of
conditioned on the fact that
takes the
value
.
Shannon Entropy
It was Shannon who pioneered the mathematical formulation of
information. In essence his insight was that an event
that occurs with probability
could be associated with an amount of
information
. Consider many independent
repetitions of random event
. The average information revealed by
each instance of
is given by the Shannon entropy of
defined as
follows.
The Shannon entropy associated with an event
drawn from random
distribution
is
.Likewise, one can define conditional Shannon entropies.
denotes the Shannon entropy of
given
. It measures the average
amount of information one learns from a single instance of
if one
possesses string
, where
are chosen according to joint
distribution
. One can average this quantity to form
, the conditional Shannon entropy.
The conditional Shannon entropy of an event
given
is defined by
.This leads one to define the mutual Shannon information between
and
by
.In some sense, this is the amount of information in common to the two strings
and
.
Shannon information was first used to solve problems of compression, and communication over a noisy channel, as given in the following theorems.
Source coding theorem :
Consider a source emitting independent and indentically distributed
(i.i.d.) random variables drawn from distribution
. For any
and
, there exists an encoder such that for
sufficiently large
, any sequence drawn from
can be
compressed to length
, and a decoder such that, except with
probability
, the original sequence can be restored from
the compressed string.
Furthermore, if one tries to compress the same source using
bits per instance, it is virtually certain that information will be
lost.
For a discrete, memoryless channel, in which Alice sends a random
variable drawn from
to Bob who receives
, the channel capacity is defined by
.
Noisy channel coding theorem :
Consider Alice communicate with Bob via a discrete memoryless channel
which has the property that if Alice draws from an i.i.d. source
,
Bob receives
. For any
and
, for large enough
, there exists an encoding of length
and a decoder such that
bits of information are conveyed by the channel for each
encoder-channel-decoder cycle, except with probability
.
Notice that in the noisy channel coding theorem, the channel is memoryless, and Alice has an i.i.d. source. In other words, all uses of the channel are independent of one another. This is the situation in which Shannon information is useful. However, in cryptographic scenarios where the channel may be controlled by an eavesdropper, such an assumption is not usually valid. Instead, other entropy measures have been developed that apply for these cases.
Beyond Shannon entropy
Rényi introduced the following generalization of the Shannon entropy.
The Rényi entropy of order
is defined by

We have,
.
Two other important cases are
and
. A useful property is that, for
,
.
is sometimes called the max entropy of
. It is important for information reconciliation, which in essence is error correction.
is sometimes called the min entropy of
.
It is important for privacy amplification. There, the
presence of an eavesdropper means that it no longer suffices to
consider each use of the channel as independent. The min entropy
represents the maximum amount of information that could be learned
from the event
, so describes the worst case scenario. In a
cryptographic application, one wants to be assured security even in
the worst case.
In general, Rényi entropies are strongly
discontinuous. As an example, consider the two distributions
and
defined on
. Take
,
, and
to be the uniform
distribution. The difference between min entropies is
. In the large
limit, the two distributions have distance
,
which is exponentially small, while the difference in min entropies
becomes arbitrarily large.
Smoothed versions of these quantities have been introduced which remove such discontinuities. In essence, these smoothed quantities involve optimizing such quantities over a small region of probability space. They have operational significance in cryptography in that they provide the relevant quantities for information reconciliation and privacy amplification.
The conditional versions of these smoothed entropies are relevant in cryptography, hence we provide a definition of these directly.
For a distribution
, and smoothing parameter
, we
define the following smoothed Rényi entropies:

where
is a set of events with total probability at least
.
More generally, the smooth Rényi entropy of order
can be
defined. However, up to an additive constant these equal
either
(for
) or
(for
). It is
also worth noting that for a large number of independent repetitions
of the same experiment, the Rényi entropies tend to the Shannon
entropy, that is,

Significance to cryptography
Information Reconciliation
The task of information reconciliation can be stated as follows.
Alice has string
and Bob
, these being chosen with joint
distribution
. Alice also possesses some additional
independent random string
. What is the minimum length of string
that Alice can compute such that
is uniquely obtainable
by Bob using
,
and
, except with probability less than
?
Renner and Wolf denote this quantity
. It is tightly bounded by the relation

It is intuitively clear why
is the correct
quantity. Recall the definition

where
is a set of events with total probability at least
. The size of the set of strings
that could have
generated
given
is
.
Alice's additional information needs to point to one of these. It
hence requires
bits to encode.
Since Alice does not know
, she must assume the worst, hence we
maximize on
. Furthermore, since some error is tolerable, we
minimize on
, by cutting away unlikely events from the
probability distribution.
Privacy Amplification
In essence, this task seeks to find the maximum length of string Alice and Bob can form from their shared string such that Eve has no information on this string.
This task can be stated more formally as follows. Alice possesses
string
and Eve
, distributed according to
. Alice also
has some uncorrelated random string
. What is the maximum length
of a binary string
, such that for a uniform random
variable
that is independent of
and
, we have
, except
with probability less than
?
Again, this quantity, denoted
, has been defined
and bounded by Renner and Wolf :

The reason that this quantity is relevant follows from the definition of extractors, which are functions commonly exploited for privacy amplification. In essence they take a non-uniform random string, and some additional catalytic randomness, to form a smaller string which is arbitrarily close to being uniform. The length reduction in the string is bounded by its min-entropy.
References
C. E. Shannon, Bell System Technical Journal 27, 379 (1948).
A. Rényi, in Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability (1961), vol. 1.
R. Renner and S. Wolf, in Advances in Cryptology --- ASIACRYPT 2005, edited by B. Roy (Springer-Verlag, 2005), vol. 3788, pp. 199--216.


