Information, Entropy, and Cross-Entropy | WhiteDLG's Blog

1. Information

$$ I(x) = \log_2(\frac{1}{p(x)}) = - \log_2(p(x)) $$

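The information (self-information) of an event is defined as the logarithm of the reciprocal of its probability. With base 2, it is measured in bits.
The logarithm is used so that the information of independent events adds up: their joint probability is a product, and the logarithm turns that product into a sum.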
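
To make the additivity concrete, here is a short worked check. It assumes two independent events $x$ and $y$ (symbols introduced here for illustration), so that $p(x, y) = p(x)\,p(y)$:

$$ I(x, y) = \log_2\left(\frac{1}{p(x)\,p(y)}\right) = \log_2\left(\frac{1}{p(x)}\right) + \log_2\left(\frac{1}{p(y)}\right) = I(x) + I(y) $$

For instance, two independent fair coin flips both landing heads have probability $0.5 \times 0.5 = 0.25$, and $\log_2(1/0.25) = 2 = 1 + 1$.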

Examples:

Fair coin: $$ \begin{aligned} & \text{heads: } p(h) = 0.5 \quad \Rightarrow \quad I_p(h) = \log_2(1/0.5) = 1 \\ & \text{tails: } p(t) = 0.5 \quad \Rightarrow \quad I_p(t) = \log_2(1/0.5) = 1 \end{aligned} $$
Biased coin: $$ \begin{aligned} & \text{heads: } q(h) = 0.2 \quad \Rightarrow \quad I_q(h) = \log_2(1/0.2) \approx 2.32 \\ & \text{tails: } q(t) = 0.8 \quad \Rightarrow \quad I_q(t) = \log_2(1/0.8) \approx 0.32 \end{aligned} $$

Conclusion: a low-probability event carries a lot of information when it occurs, while a high-probability event carries little.
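
The coin values above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `self_information` is an illustrative choice, not something from the original post.

```python
import math


def self_information(p: float) -> float:
    """Self-information, in bits, of an event with probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    # log2 of the reciprocal of the probability, as in the definition of I(x)
    return math.log2(1.0 / p)


# The fair coin and the biased coin from the examples
for label, p in [("fair heads", 0.5), ("biased heads", 0.2), ("biased tails", 0.8)]:
    print(f"{label}: p = {p}, I = {self_information(p):.2f} bits")
```

The rarer the event, the larger the value returned, which matches the conclusion above.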

2. Entropy

$$ H(p) = \sum_i p_i I_i^p = \sum_i p_i \log_2\left(\frac{1}{p_i}\right) = - \sum_i p_i \log_2(p_i) $$

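Formula notes: $p_i$ is the probability of the $i$-th outcome and $I_i^p = \log_2(1/p_i)$ is its information, so $H(p)$ is the expected (probability-weighted average) information under $p$.

Interpretation:
When the two outcomes are equally likely, the result is hardest to predict, so the entropy (average information) is largest. Conversely, if one outcome has probability 0.8 and the other 0.2, the probability is more concentrated, the randomness is smaller, and the entropy is smaller as well.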

Worked examples:

Example 1: a coin with $p(h) = 0.5, \ p(t) = 0.5$

$$ H(p) = p(h) \times \log_2(1/p(h)) + p(t) \times \log_2(1/p(t)) = 0.5 \times 1 + 0.5 \times 1 = 1 $$

Example 2: a coin with $q(h) = 0.2, \ q(t) = 0.8$

$$ H(q) = q(h) \times \log_2(1/q(h)) + q(t) \times \log_2(1/q(t)) \approx 0.2 \times 2.32 + 0.8 \times 0.32 \approx 0.72 $$
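
As a quick numerical check of Examples 1 and 2, the entropy formula can be sketched in Python. The name `entropy` and the choice to skip zero-probability outcomes (treating $0 \log_2(1/0)$ as 0) are assumptions made for this illustration.

```python
import math


def entropy(dist) -> float:
    """Shannon entropy, in bits, of a discrete distribution given as probabilities.

    Outcomes with probability 0 are skipped, i.e. they contribute 0 (assumed convention).
    """
    return sum(p * math.log2(1.0 / p) for p in dist if p > 0)


print(f"{entropy([0.5, 0.5]):.2f}")  # fair coin: 1.00
print(f"{entropy([0.2, 0.8]):.2f}")  # biased coin: 0.72
```

The fair coin reaches 1 bit, while the more predictable biased coin stays below it.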

3. Cross-Entropy

$$ H(p, q) = \sum_i p_i I_i^q = \sum_i p_i \log_2\left(\frac{1}{q_i}\right) = - \sum_i p_i \log_2(q_i) $$

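Formula notes: $p$ is the true distribution and $q$ is the estimated distribution. Each outcome's information $I_i^q = \log_2(1/q_i)$ is computed from $q$ but weighted by the true probability $p_i$.

Worked examples:
(Note: the true distribution is assumed to be $p(h)=0.5, p(t)=0.5$.)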

Case 1:

$$ q(h) = 0.2, \quad q(t) = 0.8 $$
$$ H(p, q) = p(h) \times \log_2(1/q(h)) + p(t) \times \log_2(1/q(t)) \approx 0.5 \times 2.32 + 0.5 \times 0.32 = 1.32 $$

Case 2:

$$ q(h) = 0.4, \quad q(t) = 0.6 $$
$$ H(p, q) = p(h) \times \log_2(1/q(h)) + p(t) \times \log_2(1/q(t)) \approx 0.5 \times 1.32 + 0.5 \times 0.74 = 1.03 $$
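
The two cases can be checked with a small Python sketch. The function name `cross_entropy` and its handling of zero probabilities are assumptions made for this illustration, not part of the original post.

```python
import math


def cross_entropy(p, q) -> float:
    """Cross-entropy H(p, q), in bits, of true distribution p and estimate q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # outcomes that never occur contribute nothing
        if q_i == 0:
            return math.inf  # q rules out an outcome that does occur under p
        total += p_i * math.log2(1.0 / q_i)
    return total


p = [0.5, 0.5]  # true distribution: the fair coin
print(f"{cross_entropy(p, [0.2, 0.8]):.2f}")  # Case 1: 1.32
print(f"{cross_entropy(p, [0.4, 0.6]):.2f}")  # Case 2: 1.03
print(f"{cross_entropy(p, p):.2f}")           # q equal to p: 1.00, the entropy H(p)
```

As the estimate $q$ moves closer to the true distribution $p$, the cross-entropy drops from 1.32 to 1.03, and it equals the entropy $H(p) = 1$ when $q = p$.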

Tags: Math, Information Theory, Entropy, Machine Learning