Distances & Divergences — Minimal Examples
Each distance/divergence is shown as a rendered equation, then its minimal
numpy code, then the computed output. The
numbers match the Example lines in the post. A few intractable objects are described
in notes instead of code.
1. Vector & metric distances
Classical distances between two points $\mathbf p,\mathbf q\in\mathbb R^n$. All are true metrics.
p = np.array([0.0, 0.0]); q = np.array([3.0, 4.0])
print("Euclidean L2:", np.sqrt(((p - q) ** 2).sum()))Euclidean L2: 5.0
p = np.array([0.0, 0.0]); q = np.array([3.0, 4.0])
print("Manhattan L1:", np.abs(p - q).sum())Manhattan L1: 7.0
p = np.array([0.0, 0.0]); q = np.array([3.0, 4.0]); k = 3
print("Minkowski k=3:", (np.abs(p - q) ** k).sum() ** (1 / k))Minkowski k=3: 4.497941445275415
p = np.array([0.0, 0.0]); q = np.array([3.0, 4.0])
print("Chebyshev Linf:", np.abs(p - q).max())Chebyshev Linf: 4.0
p = np.array([0.0, 0.0]); q = np.array([3.0, 4.0])
Q = np.array([[2.0, 0.5], [0.5, 1.0]]); d = p - q
print("Quadratic:", np.sqrt(d @ Q @ d))Quadratic: 6.782329983125268
a = np.array([2.0, 1.0]); b = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]]); diff = a - b
print("Mahalanobis:", np.sqrt(diff @ np.linalg.inv(Sigma) @ diff))Mahalanobis: 1.5118578920369088
s1, s2 = "10110", "10011"
print("Hamming:", sum(c1 != c2 for c1, c2 in zip(s1, s2)))Hamming: 2
def levenshtein(s, t):
m, n = len(s), len(t)
dp = np.zeros((m + 1, n + 1), int)
dp[:, 0] = np.arange(m + 1); dp[0, :] = np.arange(n + 1)
for i in range(1, m + 1):
for j in range(1, n + 1):
cost = 0 if s[i-1] == t[j-1] else 1
dp[i, j] = min(dp[i-1, j] + 1, dp[i, j-1] + 1, dp[i-1, j-1] + cost)
return dp[m, n]
def dtw(x, y):
n, m = len(x), len(y)
D = np.full((n + 1, m + 1), np.inf); D[0, 0] = 0
for i in range(1, n + 1):
for j in range(1, m + 1):
c = abs(x[i-1] - y[j-1])
D[i, j] = c + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
return D[n, m]
print("Levenshtein('kitten','sitting'):", levenshtein("kitten", "sitting"))
print("DTW([1,2,3],[1,2,2,3]):", dtw([1, 2, 3], [1, 2, 2, 3]))Levenshtein('kitten','sitting'): 3
DTW([1,2,3],[1,2,2,3]): 0.02. Riemannian & information geometry
Distance becomes the length of the shortest path (geodesic) on a curved manifold; for statistical models the curvature is the Fisher information.
# Gaussian N(mu, sigma^2): I(mu) = 1 / sigma^2, verified by Monte Carlo.
sigma = 2.0
rng = np.random.default_rng(0)
x = rng.normal(0.0, sigma, size=200000)
score = x / sigma**2 # d/dmu log p at mu = 0
print("Fisher info I(mu) analytic:", 1 / sigma**2)
print("Fisher info I(mu) numeric :", (score**2).mean())Fisher info I(mu) analytic: 0.25 Fisher info I(mu) numeric : 0.25060033195535436
These require solving a variational/geodesic problem on a manifold, so there is no one-line snippet. On a unit sphere the geodesic between points at angular separation $\theta$ is the arc $L=\theta$ (orthogonal points $\Rightarrow \pi/2\approx1.571$). Locally the Fisher–Rao distance satisfies $\rho_{FR}(p,q)\approx\sqrt{2\,\mathrm{KL}(p\|q)}$; for the $p,q$ below that is $\sqrt{2\cdot0.511}\approx1.011$.
3a. $f$-divergences (Csiszár / Ali–Silvey)
Template $D_f(p\|q)=\int p\,f(q/p)\,d\mu$. Examples use the pmfs $p=[0.5,0.5]$, $q=[0.9,0.1]$.
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("KL(p||q) nats:", (p * np.log(p / q)).sum())
print("KL(q||p) nats:", (q * np.log(q / p)).sum()) # asymmetricKL(p||q) nats: 0.5108256237659907 KL(q||p) nats: 0.3680642071684971
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("Reverse KL = KL(q||p) nats:", (q * np.log(q / p)).sum())Reverse KL = KL(q||p) nats: 0.3680642071684971
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("Pearson chi2:", (((q - p) ** 2) / p).sum())Pearson chi2: 0.6400000000000001
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("Neyman chi2:", (((p - q) ** 2) / q).sum())Neyman chi2: 1.7777777777777781
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("Hellinger:", np.sqrt(((np.sqrt(p) - np.sqrt(q)) ** 2).sum() / 2))Hellinger: 0.32491969623290634
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("Total variation:", 0.5 * np.abs(p - q).sum())Total variation: 0.4
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1]); alpha = 0.0
t = q / p
f = 4 / (1 - alpha**2) * (1 - t ** ((1 + alpha) / 2))
print("Amari alpha=0:", (p * f).sum())Amari alpha=0: 0.4222912360003366
3b. Bregman divergences
Template $B_F(x\|y)=F(x)-F(y)-\langle x-y,\nabla F(y)\rangle$ for a strictly convex potential $F$.
x = np.array([1.0, 2.0]); y = np.array([0.0, 0.0])
print("Bregman sq-Euclidean:", ((x - y) ** 2).sum())Bregman sq-Euclidean: 5.0
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
print("Bregman (F=-H) = KL(p||q):", (p * np.log(p / q)).sum())Bregman (F=-H) = KL(p||q): 0.5108256237659907
pi = np.array([1.0, 2.0, 3.0]); qi = np.array([1.0, 1.0, 4.0])
print("Itakura-Saito:", (pi / qi - np.log(pi / qi) - 1).sum())Itakura-Saito: 0.3445348918918354
P = np.array([[2.0, 0.0], [0.0, 1.0]]); Q = np.eye(2); n = 2
PQ = P @ np.linalg.inv(Q)
print("Log-Det:", np.trace(PQ) - np.log(np.linalg.det(PQ)) - n)Log-Det: 0.3068528194400546
3c. Overlap / $\alpha$-power family
All built from the affinity $\int p^{\alpha}q^{1-\alpha}d\mu$ (same $p,q$ pmfs).
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
BC = np.sqrt(p * q).sum()
print("Bhattacharyya coeff:", BC)
print("Bhattacharyya dist :", -np.log(BC))Bhattacharyya coeff: 0.8944271909999159 Bhattacharyya dist : 0.11157177565710491
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
als = np.linspace(0.01, 0.99, 99)
print("Chernoff info:", max(-np.log((p**a * q**(1 - a)).sum()) for a in als))Chernoff info: 0.11237628278879076
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1]); a = 0.5
print("Renyi div a=0.5:", 1 / (a - 1) * np.log((p**a * q**(1 - a)).sum()))Renyi div a=0.5: 0.22314355131420957
3d. Symmetrized & Jensen-type divergences
Symmetric combinations of KL and Jensen gaps (same $p,q$ pmfs).
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
kl = lambda u, v: (u * np.log(u / v)).sum()
print("Jeffreys:", kl(p, q) + kl(q, p))Jeffreys: 0.8788898309344878
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
kl = lambda u, v: (u * np.log(u / v)).sum()
m = (p + q) / 2
js = 0.5 * kl(p, m) + 0.5 * kl(q, m)
print("Jensen-Shannon:", js, " sqrt(JS):", np.sqrt(js))Jensen-Shannon: 0.10174922507919676 sqrt(JS): 0.31898154347735663
p = np.array([0.5, 0.5]); q = np.array([0.9, 0.1])
negH = lambda z: (z * np.log(z)).sum() # F = -Shannon entropy
m = (p + q) / 2
print("Burbea-Rao (=JS):", (negH(p) + negH(q)) / 2 - negH(m))Burbea-Rao (=JS): 0.10174922507919681
4. Entropies
Uncertainty of a single distribution. Examples use $r=[0.5,0.25,0.25]$.
r = np.array([0.5, 0.25, 0.25])
print("Shannon (nats):", -(r * np.log(r)).sum())
print("Shannon (bits):", -(r * np.log2(r)).sum())Shannon (nats): 1.0397207708399179 Shannon (bits): 1.5
r = np.array([0.5, 0.25, 0.25]); alpha = 2.0
print("Renyi entropy a=2:", 1 / (1 - alpha) * np.log((r**alpha).sum()))Renyi entropy a=2: 0.9808292530117262
r = np.array([0.5, 0.25, 0.25]); alpha = 2.0
print("Tsallis a=2:", 1 / (1 - alpha) * ((r**alpha).sum() - 1))Tsallis a=2: 0.625
r = np.array([0.5, 0.25, 0.25]); alpha = 2.0; beta = 3.0
print("Sharma-Mittal a=2,b=3:", 1 / (1 - beta) * ((r**alpha).sum() ** ((1 - beta) / (1 - alpha)) - 1))Sharma-Mittal a=2,b=3: 0.4296875
5. Distances between sets & metric spaces
How far apart whole sets / shapes are.
A = np.array([[0, 0], [1, 0], [0, 1.0]]); B = np.array([[0, 0], [1, 1.0]])
def directed(X, Y):
return max(np.min(np.sqrt(((x - Y) ** 2).sum(1))) for x in X)
print("Hausdorff:", max(directed(A, B), directed(B, A)))Hausdorff: 1.0
Compares whole metric spaces up to isometry; computing it exactly is NP-hard (an infimum over all isometric embeddings). Two isometric shapes give $d_{GH}=0$; in practice it is approximated via the Gromov–Wasserstein relaxation.
6. Optimal transport & integral probability metrics (IPMs)
An IPM is $\gamma_{\mathcal F}(p,q)=\sup_{f\in\mathcal F}|\int f\,dp-\int f\,dq|$.
u = np.array([0.0, 1, 2, 3]); v = np.array([1.0, 2, 3, 4])
print("Wasserstein-1:", np.abs(np.sort(u) - np.sort(v)).mean())Wasserstein-1: 1.0
def mmd2(X, Y, g=0.5):
XX = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
YY = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
XY = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
return np.exp(-g * XX).mean() + np.exp(-g * YY).mean() - 2 * np.exp(-g * XY).mean()
X = np.array([[0.0], [1.0], [2.0]]); Y = np.array([[3.0], [4.0], [5.0]])
print("MMD^2 (RBF, g=0.5):", mmd2(X, Y))MMD^2 (RBF, g=0.5): 1.0634645194698213
u = np.array([0.0, 1, 2, 3]); v = np.array([1.0, 2, 3, 4])
grid = np.sort(np.concatenate([u, v]))
Fu = np.searchsorted(np.sort(u), grid, side="right") / len(u)
Fv = np.searchsorted(np.sort(v), grid, side="right") / len(v)
print("KS statistic:", np.abs(Fu - Fv).max())KS statistic: 0.25
The Stein discrepancy is a supremum over a Stein-RKHS ball (needs the score $\nabla\log p$, not the normalizer) — estimated from samples, not a one-liner. Lévy–Prokhorov is an infimum over all Borel sets; for a point mass shifted by $\delta$ it equals $\min(\delta,1)$ (e.g. $\delta=0.3\Rightarrow0.3$).
7. Quantum geometry (density matrices)
Densities become density matrices $\rho$; integrals become traces. With diagonal $\rho$ these reduce to the classical formulas on the eigenvalues.
rho = np.array([[0.7, 0.0], [0.0, 0.3]])
ev = np.linalg.eigvalsh(rho)
print("Von Neumann entropy (nats):", -(ev * np.log(ev)).sum())Von Neumann entropy (nats): 0.6108643020548935
def logm_sym(M):
w, V = np.linalg.eigh(M)
return V @ np.diag(np.log(w)) @ V.T
P = np.array([[0.7, 0.0], [0.0, 0.3]]); Q = np.array([[0.5, 0.0], [0.0, 0.5]])
print("Von Neumann divergence (nats):", np.trace(P @ (logm_sym(P) - logm_sym(Q)) - P + Q))Von Neumann divergence (nats): 0.08228287850505178