This post is a set of reading notes, with some personal interpretation, from studying on Scratchapixel. It does not translate the whole article verbatim; the original text is kept for later self-study, and I only distill a few key points from the article for the record.
Variance and Standard Deviation: Standard deviation is simply the square root of variance, and variance is defined as the expected value of the squared difference between the outcome of the experiment and its expected value.
$$Var(X) = \sigma^2 = E[(X - E[X])^2] = \sum_i (x_i - E[X])^2 p_i.$$
$$\text{Standard Deviation} = \sqrt{\sigma^2}.$$
If you look closely, the symbol for variance carries a square. This avoids potential sign ambiguity and, in essence, states that variance and standard deviation can never be negative.
Using the additivity property of expectation (derived in the previous note), let $\mu = E[X]$; and if the random variable is a constant $c$, then $E[c] = c$.
$$\begin{array}{lcl} E[(X - E[X])^2] & = & E[(X - \mu)^2] \\ & = & E[X^2 - 2\mu X + \mu^2] \\ & = & E[X^2] - 2\mu E[X] + E[\mu^2] \\ & = & E[X^2] - 2\mu^2 + \mu^2 \\ & = & E[X^2] - \mu^2 \\ & = & \sum_i x_i^2 p_i - \mu^2 \\ & = & \sum_i x_i^2 p_i - \left(\sum_i x_i p_i\right)^2 \end{array}$$
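A minimal Python sketch (not from the original article) that checks the two equivalent forms $E[(X - \mu)^2]$ and $E[X^2] - \mu^2$ on a made-up discrete distribution; the outcomes and probabilities are arbitrary illustration values:

```python
# Verify Var(X) = E[(X - mu)^2] = E[X^2] - mu^2 on a small discrete distribution.
xs = [1.0, 2.0, 4.0]   # hypothetical outcomes x_i
ps = [0.2, 0.5, 0.3]   # probabilities p_i, summing to 1

mu = sum(x * p for x, p in zip(xs, ps))                   # E[X]
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # E[(X - mu)^2]
var_alt = sum(x * x * p for x, p in zip(xs, ps)) - mu**2  # E[X^2] - mu^2

print(mu, var_def, var_alt)  # the two variance values agree
```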
If the random variable represents an experiment whose outcomes are all equally likely, the variance can be computed directly from the sample mean $\bar{X} = E[X]$:
$$\begin{array}{lcl} Var(X) & = & \sum_i (x_i - E[X])^2 p_i \\ & = & \sum_{i=1}^n \dfrac{(x_i - \bar{X})^2}{n} \end{array}$$
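For the equally likely case, here is a small sketch computing the mean, variance, and standard deviation directly from a list of outcomes (the data values are made up):

```python
# Variance of equally likely outcomes: sum((x_i - mean)^2) / n
data = [2.0, 3.0, 5.0, 7.0, 11.0]  # hypothetical equally likely outcomes
n = len(data)

mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = variance ** 0.5

print(mean, variance, std_dev)
```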
Properties of Variance
If $Pr(X = c) = 1$, then its variance is $Var(X) = \sum_i x_i^2 p_i - \mu^2 = c^2 \cdot 1 - c^2 = 0$. In other words, the variance of a certain event is 0.
If $Y = aX + b$, then its variance is
$$\begin{array}{lcl} Var(Y) & = & E[(Y - E[Y])^2] \\ & = & E[(aX + b - E[aX + b])^2] \\ & = & E[(aX + b - aE[X] - b)^2] \\ & = & a^2 E[(X - E[X])^2] \\ & = & a^2 Var(X) \end{array}$$
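The scaling property $Var(aX + b) = a^2 Var(X)$ can also be checked numerically; the distribution and the values of `a` and `b` below are arbitrary assumptions for illustration:

```python
# Check Var(aX + b) == a^2 * Var(X) on a discrete distribution.
xs = [1.0, 2.0, 4.0]
ps = [0.2, 0.5, 0.3]
a, b = 3.0, 5.0

def variance(values, probs):
    mu = sum(v * p for v, p in zip(values, probs))
    return sum((v - mu) ** 2 * p for v, p in zip(values, probs))

ys = [a * x + b for x in xs]  # Y = aX + b keeps the same probabilities
print(variance(ys, ps), a ** 2 * variance(xs, ps))  # both values match
```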
If $X_1, \dots, X_n$ are independent random variables, then $Var(X_1 + \dots + X_n) = Var(X_1) + \dots + Var(X_n).$
Here we only derive the case of two random variables; the general case follows by induction. Let $\mu_1 = E[X_1]$ and $\mu_2 = E[X_2]$; since $E[X_1 + X_2] = E[X_1] + E[X_2] = \mu_1 + \mu_2$, we have:
$$\begin{array}{lcl} Var(X_1 + X_2) & = & E[(X_1 + X_2 - E[X_1 + X_2])^2] \\ & = & E[(X_1 + X_2 - \mu_1 - \mu_2)^2] \\ & = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + E[2(X_1 - \mu_1)(X_2 - \mu_2)] \\ & = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2\,E[X_1 - \mu_1]\,E[X_2 - \mu_2] \\ & = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2(\mu_1 - \mu_1)(\mu_2 - \mu_2) \\ & = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] \\ & = & Var(X_1) + Var(X_2) \end{array}$$

(The cross term factors into a product of expectations only because $X_1$ and $X_2$ are independent, which is why the result requires independence.)
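A simulation sketch of this additivity property for independent variables; the two distributions are chosen arbitrarily, and the empirical variances only agree approximately because they are estimated from finite samples:

```python
import random

random.seed(0)
N = 200_000

# Two independent random variables with arbitrary distributions.
x1 = [random.uniform(0.0, 1.0) for _ in range(N)]
x2 = [random.gauss(3.0, 2.0) for _ in range(N)]

def var(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

print(var([a + b for a, b in zip(x1, x2)]))  # ~ Var(X1 + X2)
print(var(x1) + var(x2))                     # ~ Var(X1) + Var(X2), nearly equal
```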
Probability Distribution: Part 2 (Normal Distribution):
$$p(x) = \mathcal{N}(\mu, \sigma) = \dfrac{1}{\sigma \sqrt{2\pi}}\, e^{-\dfrac{(x - \mu)^2}{2\sigma^2}}.$$
Here $\mu$ is the expectation of the normal distribution and $\sigma$ is its standard deviation; the whole curve is symmetric about $\mu$.
(See the accompanying figure in the original article.)
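The density above is easy to evaluate directly; a minimal sketch, where `normal_pdf` is just a hypothetical helper name:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """p(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))"""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The curve is symmetric about mu: p(mu - d) == p(mu + d).
print(normal_pdf(-1.0), normal_pdf(1.0))  # equal, since mu = 0
print(normal_pdf(0.0))                    # peak value 1/sqrt(2*pi) ~ 0.3989
```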
Sampling Distribution: Each sample on its own is a random variable, but because each one now represents the mean of a certain number $n$ of items in the population, we label them $\bar{X}$. We can repeat this experiment $N$ times, which gives a series of samples $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_N$. This collection of samples is what we call a sampling distribution.
Expected value of the distribution of means: We can apply to samples or statistics the same method for computing a mean as the one we used to calculate the mean of random variables.
Note the difference between a sampling distribution and an ordinary population distribution. In a sampling distribution, suppose each sample takes three observations from the population; since each observation is itself random, an observation is a random variable $x$. Such a sample has size 3, so its sample mean is $\bar{X}_1 = E[x] = \frac{\sum_{i=1}^n x_i}{n}$ and its sample variance is $Var(\bar{X}_1) = \frac{\sum_{i=1}^n (x_i - \bar{X}_1)^2}{n}$.
The computation of the expected value of the sample mean described above, i.e. taking the mean of the most basic observations as the expectation of a random variable, treats the sample as a whole as one random variable $\bar{X}$. Repeating such sampling over the population yields a series $\bar{X}_1, \bar{X}_2, \dots, \bar{X}_N$, and then the expectation of the sample means is $\mu_{\bar{X}} = E[\bar{X}] = \frac{\sum_{i=1}^N \bar{X}_i}{N}$ and the variance of the sample means is $Var(\bar{X}) = \frac{\sum_{i=1}^N (\bar{X}_i - \mu_{\bar{X}})^2}{N}$.
So it is essential to be clear about what the original article means by "Expected value of the distribution of mean" before going any further with the calculations.
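To make the sampling distribution concrete, here is a sketch that draws $N$ samples of size $n$ from a synthetic population and records each sample mean $\bar{X}_i$; the population parameters and the sizes are made-up illustration values:

```python
import random

random.seed(1)

population = [random.gauss(10.0, 4.0) for _ in range(100_000)]  # hypothetical population
n = 3       # observations per sample (the size-3 example above)
N = 10_000  # number of samples drawn

sample_means = []
for _ in range(N):
    sample = [random.choice(population) for _ in range(n)]  # one sample of size n
    sample_means.append(sum(sample) / n)                    # its mean, X_bar_i

# Expected value of the distribution of means: mu_X_bar = sum(X_bar_i) / N
mu_xbar = sum(sample_means) / N
print(mu_xbar)  # close to the population mean (10 here)
```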
Central Limit Theorem (CLT): The mean of the sampling distribution of the mean $\mu_{\bar{X}}$ equals the mean of the population $\mu$, and the standard error of the distribution of means $\sigma_{\bar{X}}$ is equal to the standard deviation of the population $\sigma$ divided by the square root of $n$. In addition, the sampling distribution of the mean will approach a normal distribution $N(\mu, \frac{\sigma}{\sqrt{n}})$. These relationships may be summarized as follows:
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$$
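Both CLT relations can be checked by simulation: the mean of the sample means should approach $\mu$ and their standard deviation should approach $\sigma/\sqrt{n}$. A self-contained sketch with arbitrary parameters:

```python
import random, statistics

random.seed(2)

mu, sigma = 10.0, 4.0
population = [random.gauss(mu, sigma) for _ in range(100_000)]
n, N = 25, 20_000  # sample size and number of samples

# Draw N samples of size n (with replacement) and keep each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n)) for _ in range(N)
]

print(statistics.fmean(sample_means), mu)                 # mu_X_bar ~ mu
print(statistics.pstdev(sample_means), sigma / n ** 0.5)  # sigma_X_bar ~ sigma / sqrt(n)
```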
Properties of the Sample Mean
1. The sample mean is defined as
$$\bar X = \dfrac{1}{n}(X_1 + \dots + X_n)$$
2. The expected value of the sample mean equals the population mean:
$$E[\bar X_n] = \dfrac{1}{n} \sum_{i=1}^n E[X_i] = \dfrac{1}{n} \cdot n\mu = \mu.$$
3. It obeys the same expectation properties as the underlying random variable $x$ (the expectation of the sample mean is itself computed from the expectations of $x$, and item 2 already shows that it equals the population expectation):
$$\begin{array}{l} E[aX + b] = aE[X] + b \\ E[X_1 + \dots + X_n] = E[X_1] + \dots + E[X_n] \end{array}$$
4. This is also how the expectation of the sample mean follows from its definition:
$$\begin{array}{lcl} E[\bar X] & = & E\left[\dfrac{1}{n}(X_1 + \dots + X_n)\right] \\ & = & \dfrac{1}{n} E[X_1 + \dots + X_n] \\ & = & \dfrac{1}{n} \sum_{i=1}^n E[X_i]. \end{array}$$
5. The variance of the sample mean:
$$\begin{array}{lcl} Var(\bar X_n) & = & \dfrac{1}{n^2} Var\!\left(\sum_{i=1}^n X_i\right) \\ & = & \dfrac{1}{n^2} \sum_{i=1}^n Var(X_i) = \dfrac{1}{n^2} \cdot n\sigma^2 = \dfrac{\sigma^2}{n}. \end{array}$$
6. As stated earlier in the definition of variance, the variance of the sample mean also inherits the following properties:
$$\begin{array}{l} Var(aX + b) = a^2 Var(X) \\ Var(X_1 + \dots + X_n) = Var(X_1) + \dots + Var(X_n). \end{array}$$
Therefore the variance of the sample mean is:
$$\begin{array}{lcl} Var(\bar X) & = & Var\!\left(\dfrac{1}{n}(X_1 + \dots + X_n)\right) \\ & = & \dfrac{1}{n^2} Var(X_1 + \dots + X_n) \\ & = & \dfrac{1}{n^2} \sum_{i=1}^n Var(X_i). \end{array}$$
7. Because the variance of the sample mean, $\frac{\sigma^2}{n}$, is smaller than the population variance $\sigma^2$ (in other words, its standard deviation is $\frac{\sigma}{\sqrt{n}}$), the sample mean $\bar{X}$ will be closer to the expectation $\mu$ than a single observation $X_i$, as sketched below.
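A small sketch of property 7, comparing how far single observations and sample means of size 100 typically land from $\mu$ (all parameters are arbitrary illustration values):

```python
import random, statistics

random.seed(3)
mu, sigma = 10.0, 4.0

def one_sample_mean(n):
    # Mean of n independent draws from a hypothetical N(mu, sigma) population.
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

trials = 5_000
# Average absolute distance from mu: single observations vs. sample means (n = 100).
err_single = statistics.fmean(abs(random.gauss(mu, sigma) - mu) for _ in range(trials))
err_mean = statistics.fmean(abs(one_sample_mean(100) - mu) for _ in range(trials))

print(err_single, err_mean)  # the sample mean sits much closer to mu on average
```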