Differential Entropy: Analytical Calculation and Sample-based Estimates

written by Ge Yang

THE GOAL  ~~ Compare the differential entropy computed analytically from the probability density function (PDF) with estimates computed from a collection of samples.

An Analytical Example

Consider a uniform distribution on [0, 1/2]:

p(x) = \begin{cases} 2 & \text{if } x \in [0, 1/2] \\ 0 & \text{otherwise.} \end{cases}

The differential entropy is

\begin{aligned} H(u) &= - \int_0^{1/2} 2 \log(2) \, \mathrm{d}x \\ &= - \log 2 \\ &\approx -0.693 \end{aligned}

There are two ways to compute this numerically. The first is to compute it directly from the distribution: we discretize the probability density function into bins and normalize the resulting sequence of probabilities. This gives us H(x) = -\log 2 \approx -0.69:

import numpy as np

def H_d(ps):
    # ps holds the density values p(x_i) on an evenly spaced grid.
    # Since sum(ps) * Δx ≈ 1, we have Δx ≈ 1 / sum(ps), so the Riemann sum
    # -Σ p_i log(p_i) Δx reduces to -Σ (p_i / Σ p_j) log(p_i).
    ps_norm = ps / ps.sum()
    return -np.sum(np.log(ps) * ps_norm)

xs = np.linspace(0, 1 / 2, 1001)  # grid over the support [0, 1/2]
ps = np.ones(1001) * 2            # p(x) = 2 everywhere on the support
h_d = H_d(ps)
doc.print(f"analytical: {h_d:0.2f}")
analytical: -0.69

Can We Compute This Via scipy entropy?

The scipy.stats.entropy function computes the Shannon entropy of a discrete distribution. It does not assume that the probabilities are normalized, so it normalizes ps internally first. This means that to recover the differential entropy, we need to subtract the log of the sum of ps from the result.
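To see why, write \bar p_i = p_i / \sum_j p_j for the normalized bins. Our H_d computes -\sum_i \bar p_i \log p_i, and

\begin{aligned} -\sum_i \bar p_i \log p_i &= -\sum_i \bar p_i \left( \log \bar p_i + \log \sum\nolimits_j p_j \right) \\ &= H_\mathrm{Shannon}(\bar p) - \log \sum\nolimits_j p_j, \end{aligned}

which is exactly stats.entropy(ps) - np.log(ps.sum()).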

from scipy import stats

h_d_analytic = stats.entropy(ps) - np.log(ps.sum())
doc.print(f"analytic entropy w/ scipy: {h_d_analytic:0.3f}")
analytic entropy w/ scipy: -0.693

This shows that we can get the discrete scipy.stats.entropy to agree with our analytically defined differential entropy function H_d: ps -> R.


Estimating Entropy Using Samples From a Probability Density Function

When we do not have access to the PDF, we can instead estimate the entropy from a set of samples, using the differential_entropy function from scipy.stats:

from scipy import stats

samples = np.random.uniform(0, 1/2, 10001)
doc.print(f"verify the distribution: min: {samples.min():0.2f}, max: {samples.max():0.2f}")
verify the distribution: min: 0.00, max: 0.50
h_d_sample = stats.differential_entropy(samples)
doc.print(f"sample-based: {h_d_sample:0.3f}")
sample-based: -0.702
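Per the SciPy documentation, differential_entropy also exposes several estimators through its method argument ('vasicek', 'van es', 'ebrahimi', 'correa', or 'auto'). Here is a minimal sketch comparing them on the same kind of samples, assuming a SciPy version that supports the method keyword; the printed values will depend on the random draw:

from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(0, 1 / 2, 10001)

# All estimators should land near the analytical value -log(2) ≈ -0.693.
for method in ["vasicek", "van es", "ebrahimi", "correa", "auto"]:
    h = stats.differential_entropy(samples, method=method)
    print(f"{method:>8s}: {h:0.3f}")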

Effect of Sampling On Differential Entropy

THE GOAL  ~~ Vary the number of samples drawn from the distribution, and compare the sample-based estimate against the analytical solution (a similar comparison appears on the scipy documentation page 1). Here we first check that the analytical value itself does not depend on how finely we discretize the distribution.

Again, consider the uniform distribution from above.

def H_d(ps):
    ps_norm = ps / ps.sum()
    return - np.sum(np.log(ps) * ps_norm)

def get_H_d(N):
    xs = np.linspace(0, 1 / 2, N)
    ps = np.ones(N) * 2
    h_d = H_d(ps)
    doc.print(f"N={N} => h_d={h_d:0.2f}")

for n in [101, 201, 401, 801, 1601]:
    get_H_d(n)
N=101 => h_d=-0.69
N=201 => h_d=-0.69
N=401 => h_d=-0.69
N=801 => h_d=-0.69
N=1601 => h_d=-0.69

The differential entropy does not depend on the number of points in the discretized distribution as long as the discretization is done properly.
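The sample-based estimate, on the other hand, does fluctuate with the number of samples. Below is a minimal sketch of the sample-size sweep described in the goal above, using stats.differential_entropy; the printed values will vary from run to run, but should converge toward the analytical value -\log 2 \approx -0.693:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# The estimate should approach the analytical value -log(2) as n grows.
for n in [100, 1_000, 10_000, 100_000]:
    samples = rng.uniform(0, 1 / 2, n)
    h = stats.differential_entropy(samples)
    print(f"n={n:>6d} => h_d={h:0.3f} (analytical: {-np.log(2):0.3f})")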

Effect of Change of Variable On Differential Entropy

THE GOAL  ~~ Investigate the effect of a change of variable by varying the scale of the distribution and looking at the entropy. We can do this both with the analytical differential entropy and with the sample-based estimates 1.
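For reference, the change-of-variable rule for differential entropy under an invertible map y = g(x) is

H(Y) = H(X) + \mathbb{E}\left[ \log \left\vert \frac{\mathrm{d}g}{\mathrm{d}x} \right\vert \right], \qquad \text{so for } Y = cX, \quad H(Y) = H(X) + \log \vert c \vert.

Scaling the distribution therefore shifts the entropy by the log of the scale factor, which is what varying the width below probes.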

Now consider a more general case, a uniform distribution on [a, b]:

p(x) = \begin{cases} \frac{1}{\vert b - a \vert} & \text{if } x \in [a, b] \\ 0 & \text{otherwise.} \end{cases}

The differential entropy is \log\left(\vert b - a \vert\right). When a = 0 and b = 1/2, H(x) = -\log 2.
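This follows directly from the definition (for b > a):

H(x) = -\int_a^b \frac{1}{b - a} \log \frac{1}{b - a} \, \mathrm{d}x = \log(b - a).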

We can verify this numerically. First let's define the entropy function

def H_d(ps):
    ps_norm = ps / ps.sum()
    return - np.sum(np.log(ps) * ps_norm)

We can then compare the numerical result against \log(b - a): the delta turns out to be zero across all of the scale variants below.

def get_H_d(a, b, N=201):
    xs = np.linspace(a, b, N)
    ps = np.ones(N) / (b - a)
    h_d = H_d(ps)
    delta = h_d - np.log(b - a)
    doc.print(f"a={a}, b={b} => h_d={h_d:0.2f}, δ={delta}")

for b in [1/4, 1/2, 1, 2, 4]:
    get_H_d(0, b)
a=0, b=0.25 => h_d=-1.39, δ=0.0
a=0, b=0.5 => h_d=-0.69, δ=0.0
a=0, b=1 => h_d=-0.00, δ=-0.0
a=0, b=2 => h_d=0.69, δ=0.0
a=0, b=4 => h_d=1.39, δ=0.0
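The same shift shows up in the sample-based estimate: scaling the samples by c corresponds to the change of variable y = c x, so the estimated entropy should move up by roughly \log c. A minimal sketch using stats.differential_entropy (the printed values depend on the random draw):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
base = rng.uniform(0, 1 / 2, 10001)  # H ≈ -log(2) ≈ -0.693

# Scaling the samples by c should shift the estimate by approximately log(c).
for c in [1, 2, 4, 8]:
    h = stats.differential_entropy(c * base)
    print(f"scale={c} => h_d={h:0.3f} (analytical: {np.log(c) - np.log(2):0.3f})")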

Using The Wrong Measure

What happens if we apply the Shannon-style formula H_d directly to the samples, treating the sample values themselves as probabilities?

samples = np.random.uniform(0, 1/2, 10001)
doc.print(f"verify the distribution: min: {samples.min():0.2f}, max: {samples.max():0.2f}")
verify the distribution: min: 0.00, max: 0.50
h_d_wrong = H_d(samples)
doc.print(f"Shannon Entropy is (incorrectly): {h_d_wrong:0.3f}")
Shannon Entropy is (incorrectly): 1.195

The number is meaningless: the samples are draws from the distribution, not probability masses, so normalizing them and feeding them to H_d computes the entropy of an unrelated discrete distribution rather than the differential entropy of p(x), which we know is -\log 2 \approx -0.693.