[ Bingo ]

Camera Model

Pinhole Camera Model

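This article records my personal understanding from studying camera models on Scratchapixel. It is not a translation or a copy of the original; it only describes what I took away from it.

PS: This chapter skips the concept of depth of field.

Clarifying Easily Confused Concepts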

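Camera Parameter	Description
Focal Length	Distance from the eye to the real film plane; together with the film aperture it is used to compute the FOV/AOV. It is easily confused with the distance from the eye to the virtual film plane (the canvas) of a virtual camera, which is usually placed at the near clipping plane.
Camera Aperture	The film aperture defines the physical dimensions of the film in a real camera; together with the focal length it is used to compute the FOV/AOV. Its two dimensions also define the film gate aspect ratio. Wikipedia lists the parameters of most common film formats.
Clipping Planes	The near and far clipping planes are virtual planes inside the camera's viewing frustum; only objects between them are rendered. Because the canvas is usually placed on the near clipping plane, take care not to confuse its distance with the focal length.
Image Size	Size of the output image in pixels; it defines the resolution gate aspect ratio.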

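From the concepts above, the following variables can be derived:

Variable	Description
Angle of View	Computed from the focal length and the film size (camera aperture).
Canvas/Screen Window	Its aspect ratio matches the film gate aspect ratio defined by the real film aperture; it can be computed from the canvas size and the film gate aspect ratio.
Film Gate Aspect Ratio	$\frac{\text{film width}}{\text{film height}}$, where film refers to the film size of the real camera.
Resolution Gate Aspect Ratio	$\frac{\text{image width}}{\text{image height}}$, where image refers to the output image size in pixels.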

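Aperture: in the pinhole camera model, the aperture is simply the small hole in the dark box. In real life, however, a pinhole aperture comes with a series of problems.

First, a larger aperture produces a blurrier image (light reflected from a single point on the object spreads over a larger area of the film, so the images of neighboring points overlap), but it greatly increases the amount of incoming light and shortens the exposure time (the longer the exposure, the more a moving object blurs).

Conversely, to obtain an image with sharp, clearly distinguishable edges, the aperture diameter has to be reduced. That in turn requires a longer exposure to keep the image bright, which again raises the chance of motion blur.

The two animations below illustrate these ideas.

To resolve the conflict between exposure time and sharp edges, the hole of the pinhole camera was replaced with a convex lens. As shown in the figure, a larger aperture lets in more light and shortens the exposure time, while the lens also refocuses light reflected from objects within a certain distance range onto the film (different focal lengths give different depths of field).

Depth of field is the distance between the nearest and the farthest objects in the scene that appear acceptably sharp. A pinhole camera therefore has infinite depth of field: it simply projects the light paths onto the film, without the convergence of rays that occurs in a lens camera.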

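Angle of View (AOV) / Field of View (FOV): the angle of view depends on two parameters, the film size and the focal length. Changing either of them changes the angle of view.

> With the same focal length, a smaller film size captures less of the scene in the final image.

Likewise, to capture the same content on films of different sizes, the focal length has to be adjusted accordingly. This affects sharpness: the larger the film, the more detail with which the same content is recorded.

These three parameters are interdependent: knowing any two of them, the third can be computed. For digital cameras, of course, the film size no longer matters; the sensor size takes its place.

Film gate ratio and resolution gate ratio: the two may well differ, for example when the camera uses an anamorphic lens.

When the two ratios differ, you have to decide how the image is fitted. Maya offers two options for this, Overscan and Fill, as shown in the figure below.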

Deriving the Basic Unknowns

The camera origin can be thought of as the position of the eye.

$\begin{array}{l}\tan({\theta_H \over 2}) & = & {A \over B} \\& = & \color{red}{\dfrac {\dfrac { (\text{Film Aperture Width} * 25.4) } { 2 } } { \text{Focal Length} }}.\end{array}$

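$\theta_H$ is the horizontal FOV (angle of view); it follows explicitly as $\theta_H = 2\arctan(\frac{A}{B})$. The factor 25.4 converts the film aperture from inches to millimeters, the unit of the focal length.

The canvas lies on the near clipping plane, so its size changes with the position of that plane. When the near clipping plane is at $z = 0$, the computed canvas size is $0$.

$\begin{array}{l}\tan({\theta_H \over 2}) = {A \over B} =\dfrac{\dfrac{\text{Canvas Width} } { 2 } } { Z_{near} }, \\\dfrac{\text{Canvas Width} } { 2 } = \tan({\theta_H \over 2}) * Z_{near}, \\\text{Canvas Width} = 2 * \tan({\theta_H \over 2}) * Z_{near}.\end{array}$

Once the canvas size is known, the values of [left, top, right, bottom] follow from the aspect ratio and a simple coordinate transformation.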

$\text{right} = \color{red}{\dfrac {\dfrac { (\text{Film Aperture Width} * 25.4) } { 2 } } { \text{Focal Length} }} * Z_{near}$

$\text{top} = \color{red}{\dfrac {\dfrac { (\text{Film Aperture Height} * 25.4) } { 2 } } { \text{Focal Length} }} * Z_{near}$
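The derivation above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the film aperture is given in inches and the focal length in millimeters, the canvas is centered on the optical axis (so left = -right and bottom = -top), and the function names and the example film gate values are hypothetical.

```python
import math

INCH_TO_MM = 25.4  # assumption: film aperture in inches, focal length in millimeters


def screen_window(film_width_in, film_height_in, focal_length_mm, z_near):
    """Return (left, right, bottom, top) of the canvas placed on the near clipping plane."""
    right = (film_width_in * INCH_TO_MM / 2) / focal_length_mm * z_near
    top = (film_height_in * INCH_TO_MM / 2) / focal_length_mm * z_near
    return -right, right, -top, top


def horizontal_fov_degrees(film_width_in, focal_length_mm):
    """Horizontal angle of view: theta_H = 2 * atan((film width / 2) / focal length)."""
    return math.degrees(2 * math.atan((film_width_in * INCH_TO_MM / 2) / focal_length_mm))


# Illustrative values only: a 0.980 x 0.735 inch film gate with a 35 mm lens.
left, right, bottom, top = screen_window(0.980, 0.735, 35.0, z_near=0.1)
print(left, right, bottom, top, horizontal_fov_degrees(0.980, 35.0))
```

Moving the near clipping plane only rescales the window; the ratio right / top stays equal to the film gate aspect ratio.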

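Camera Coordinate Spaces in PBRT

Camera Space: the camera origin is the origin of the coordinate system, the z axis is the viewing direction and the y axis is the up direction. This space is convenient for deciding whether an object is visible to the camera.
Screen Space: screen space is defined on the image plane. Despite the name, the z value is still meaningful: it relates to the near and far clipping planes and ranges over $[0, 1]$.
Normalized device coordinate (NDC) space: the coordinate system of the image actually being rendered. x and y both range over $[0, 1]$, z is as meaningful as in screen space, and the space is essentially a linear transformation of screen space.
Raster Space: essentially the same as NDC space, except that x and y range over $[0, xResolution]$ and $[0, yResolution]$.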

Mathematical Foundations of Monte Carlo Methods 4

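These notes are a partial translation of, and personal commentary on, what I studied on Scratchapixel. The article is not translated in full; original passages are kept for later reference, and only the key points are distilled here.

Hit-or-Miss Monte Carlo Method

Monte Carlo methods: numerical methods that use random sampling to solve mathematical problems.

The basic Monte Carlo method allows sampling from an arbitrary distribution. The purpose of any non-uniform sampling is to reduce variance and make the estimator more efficient, which is the idea of importance sampling.

Isotropic and anisotropic scattering: when a photon is scattered on entering a medium and its new direction is uniformly random over all directions, the scattering is isotropic; if the new direction is instead confined to a cone of directions, the scattering is anisotropic.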

Monte Carlo Estimator

The integral of a function $f(x)$ can be interpreted as an area:

$F = \int_a^b f(x)\;dx.$

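In the figure below, a single point $x$ is sampled at random in $[a, b]$ and the area is approximated by $f(x) * (b-a)$. This is clearly far from the true value $F$.

Increase the number of samples to 4 and average the areas obtained from these points. By the law of large numbers, the approximation approaches the true value as the number of samples grows.

The following formula expresses this idea: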

$\langle F^N\rangle = (b-a) \dfrac{1}{N } \sum_{i=0}^{N-1} f(X_i).$

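Here $\langle F^N\rangle$ denotes the approximation of $F$ after drawing N samples from the sample space $S$; it plays the role of the sample mean $\bar X_n$ discussed earlier.

The samples are uniformly distributed, i.e. $pdf(x) = \frac{1}{b-a}$.

$\langle F^N \rangle$ is itself a random variable (a sum of random variables), and its expectation is $F$.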

$\begin{array}{lcl}E[\langle F^N \rangle] & = & E \left[ (b-a) \dfrac{1}{N } \sum_{i=0}^{N-1} f(X_i)\right],\\& = & (b-a)\dfrac{1}{N}\sum_{i=0}^{N-1}E[f(X_i)],\\& = &(b-a)\dfrac{1}{N} \sum_{i=0}^{N-1} \int_a^b f(x)pdf(x)\:dx,\\& = & \dfrac{1}{N} \sum_{i=0}^{N-1} \int_a^b f(x)\:dx,\\&=& \int_a^b f(x)\:dx,\\&=&F.\end{array}$

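The formula in the original article contains an error; the summation part has been corrected here.

The derivation above assumed a uniform $pdf(x)$; it generalizes to an arbitrary $pdf(x)$. The general form of the Monte Carlo estimator is: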

$\langle F^N \rangle = \dfrac{1}{N} \sum_{i=0}^{N-1} \dfrac{f(X_i)}{pdf(X_i)}.$

Taking its expectation verifies that it is correct:

$\begin{array}{lcl}E[\langle F^N \rangle ] & = & E \left[ \dfrac{1}{N } \sum_{i=0}^{N-1} \dfrac{f(X_i)}{pdf(X_i)} \right],\\& = & \dfrac{1}{N} \sum_{i=0}^{N-1} E\left[ \dfrac{f(X_i)}{pdf(X_i) }\right],\\& = & \dfrac{1}{N} \sum_{i=0}^{N-1} \int_\Omega \dfrac{f(x)}{pdf(x)} pdf(x)\;dx,\\& = & \dfrac{1}{N} \sum_{i=0}^{N-1} \int_\Omega f(x) \; dx,\\& = & F.\end{array}$

The interval length $(b-a)$ is hidden in the general form. The reason is simple: $(b-a)$ comes from the uniform $pdf(x) = \frac{1}{b-a}$ inside the estimator term $h(x) = \frac{f(x)}{pdf(x)}$.

So the uniform case is really $\langle F^N \rangle = \frac{1}{N}\sum_{i=0}^{N-1}\frac{f(X_i)}{\frac{1}{b-a}}$, while the form $\langle F^N\rangle = (b-a) \dfrac{1}{N } \sum_{i=0}^{N-1} f(X_i)$ is easier to read off directly from the figure.
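The two forms of the estimator can be compared with a small Python sketch. The function names and the test integrand ($\sin$ on $[0, \frac{\pi}{2}]$) are illustrative choices, not part of the original article.

```python
import math
import random


def mc_uniform(f, a, b, n, rng):
    """Uniform-sampling form: (b - a) * (1/N) * sum f(X_i)."""
    return (b - a) / n * sum(f(rng.uniform(a, b)) for _ in range(n))


def mc_general(f, pdf, sample, n, rng):
    """General form: (1/N) * sum f(X_i) / pdf(X_i), with X_i drawn by sample(rng)."""
    total = 0.0
    for _ in range(n):
        x = sample(rng)
        total += f(x) / pdf(x)
    return total / n


a, b = 0.0, math.pi / 2
rng = random.Random(0)
print(mc_uniform(math.sin, a, b, 100_000, rng))
# The general form with pdf(x) = 1 / (b - a) is the same estimator.
print(mc_general(math.sin, lambda x: 1 / (b - a), lambda r: r.uniform(a, b), 100_000, rng))
```

Both calls estimate the same integral, whose exact value is 1; with a uniform pdf the general form reduces to the $(b-a)$ form term by term.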

Properties of Monte Carlo Integration

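The Monte Carlo estimate converges to the integral $F$ of the function $f(x)$: $\text{Pr} \left ( \lim_{N\to\infty} \langle F^N \rangle = F \right ) = 1$
The Monte Carlo estimator is unbiased and consistent.
The rate of convergence depends on the variance $\sigma^2$ of the function. The variance of the estimator itself is $\frac{\sigma^2}{N}$, so halving the error of the estimate requires four times as many samples ( $\sigma[\langle F^N \rangle] \propto { 1 \over \sqrt{N} }$ ).

Unbiased: the expectation of the estimator equals the integral being computed.

Consistent: as the sample size grows, the estimator gets closer and closer to the true value of the population parameter.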

Importance Sampling

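Importance sampling is one of many variance reduction methods, and its idea is fairly direct.

The figure below shows how sample points drawn from different sampling distributions affect the approximation (which may come out above or below the true value of the integral).

In this figure, uniform sampling gives a passable value, but the samples appear to miss an important part of the function (a peak is ignored). The hand-picked samples on the right are not a good approach either, since they introduce bias.

If the integrand is a constant function, sampling from a uniform distribution gives the exact result.

Now take a function $f'(x)$ that is proportional to $f(x)$ (here $f'$ denotes a second function, not a derivative):

$f'(x) = c f(x)$

Therefore

$\dfrac{f(x)}{f'(x) } = \dfrac{1}{c}$

Substitute this into the Monte Carlo estimator with $pdf(x) = f'(x)$, and recall the conclusion for sampling a constant function:

$\langle F^N \rangle = \dfrac{1}{N} \sum_{i=0}^{N-1} {\dfrac{\color{orange}{f(X_i)}}{\color{red}{pdf(X_i)}}} = \dfrac{1}{N}\sum_{i=0}^{N-1}{\dfrac{\color{orange}{f(X_i)}}{\color{red}{f'(X_i)}}} = \dfrac{1}{N}\sum_{i=0}^{N-1}{\dfrac{\color{orange}{1}}{\color{red}{c}}} = \dfrac{1}{c}$

That is, as long as $pdf(x)$ is proportional to the integrand $f(x)$, the variance of the Monte Carlo estimate is 0 (a constant function has zero variance). In other words, the more closely $pdf(x)$ resembles the integrand $f(x)$, the lower the variance.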

Take $f(x) = sin(x)$ on the interval $[0, \frac{\pi}{2}]$ as an example:

$\begin{array}{lcl}F & = & \int_0^{\pi \over 2} \sin(x) \; dx \\& = & \left[ -\cos(x) \right]_0^{\pi \over 2} \\& = & -\cos(\dfrac{\pi}{2}) - (- \cos(0)) \\& = & 1.\end{array}$

Two different choices of $pdf(x)$ are compared:

#	Uniform	Importance	Error Uniform %	Error Importance %
0	1.125890	0.969068	12%	-3%
1	1.277833	0.925675	27%	-7%
2	1.054394	0.980940	5%	-1%
3	1.125890	0.969068	12%	-1%
4	1.125890	0.969068	12%	-6%
5	0.830151	1.041751	-16%	4%
6	1.062268	0.989363	6%	-1%
7	0.849265	1.043809	-15%	4%
8	0.921527	1.020279	-7%	2%
9	1.002310	0.994284	0%	0%

Clearly, the results agree well with the theory of importance sampling.
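A comparison of this kind can be reproduced with the sketch below. The notes do not state which importance pdf was used, so the choice $pdf(x) = \frac{8x}{\pi^2}$ (a straight line roughly following $\sin(x)$ on $[0, \frac{\pi}{2}]$) is an assumption, as are the function names, the seed and the sample counts. Samples from this pdf are drawn by inverting its CDF $\frac{4x^2}{\pi^2}$, which gives $x = \frac{\pi}{2}\sqrt{u}$.

```python
import math
import random
import statistics

HALF_PI = math.pi / 2


def estimate_uniform(n, rng):
    """Uniform pdf 2/pi on [0, pi/2]."""
    return HALF_PI / n * sum(math.sin(rng.uniform(0.0, HALF_PI)) for _ in range(n))


def estimate_importance(n, rng):
    """Assumed importance pdf(x) = 8x / pi^2, sampled with x = (pi/2) * sqrt(u)."""
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()            # u in (0, 1], so x is never exactly 0
        x = HALF_PI * math.sqrt(u)
        total += math.sin(x) / (8.0 * x / math.pi ** 2)
    return total / n


rng = random.Random(1)
runs_uniform = [estimate_uniform(16, rng) for _ in range(1000)]
runs_importance = [estimate_importance(16, rng) for _ in range(1000)]
print("uniform    mean/stdev:", statistics.fmean(runs_uniform), statistics.pstdev(runs_uniform))
print("importance mean/stdev:", statistics.fmean(runs_importance), statistics.pstdev(runs_importance))
```

Both estimators average to about 1, but the spread of the importance-sampled runs is noticeably smaller, which is the variance reduction described above.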

Quasi Monte Carlo

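With purely random sampling it is unavoidable that some sample points nearly coincide (clumping). The information carried by one of the clumped samples is then effectively wasted in the final computation, which slows down convergence.

Stratified sampling: The interval of integration is divided into N subintervals or cells (also often called strata), samples are placed in the middle of these subintervals but are jittered by some negative or positive random offset which can’t be greater than half the width of a cell.

In other words, $\langle F^N \rangle = { (b-a) \over N} \sum_{i=0}^{N-1} f(a + ( { {i+\xi} \over N } ) (b-a))$, where $\xi$ is a uniform random number with $0 \leq \xi < 1$ drawn anew for each cell (equivalently, an offset of at most half the cell width $h = \frac{b-a}{N}$ from the cell center). Stratified sampling sits between purely random sampling and uniform (regular) sampling.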
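A minimal sketch of the stratified estimator, assuming one jittered sample per cell with the jitter expressed as $\xi \in [0, 1)$ measured from the left edge of the cell; the function name and the test integrand are illustrative.

```python
import math
import random


def stratified_estimate(f, a, b, n, rng):
    """One jittered sample per cell: x_i = a + (i + xi) / n * (b - a), xi in [0, 1)."""
    h = (b - a) / n                       # width of one cell (stratum)
    return h * sum(f(a + (i + rng.random()) * h) for i in range(n))


rng = random.Random(3)
print(stratified_estimate(math.sin, 0.0, math.pi / 2, 64, rng))
```

Because every cell receives exactly one sample, clumping across cells cannot occur, and for a smooth integrand the error is typically much smaller than with the same number of purely random samples.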

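Low-discrepancy sequences: The goal is to generate sequences of samples which are not exactly uniformly distributed (uniformly distributed samples cause aliasing) and yet which appear to have some regularity in the way they are spaced.

An introduction to the Van der Corput sequence can be found via the link. The idea is to write the integer $n$ in binary and mirror its digits about the radix point: $\phi_2(n) = {d_0 \over {2^1}} + {d_1 \over {2^2}} + \cdots + {d_{N-1}\over {2^N}}$, where $d_0$ is the least significant binary digit of $n$, turns the given integer into a fraction in $[0, 1)$.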
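The digit mirroring can be sketched as a radical inverse function. The function name and the optional `base` parameter are illustrative; base 2 corresponds to the formula above.

```python
def van_der_corput(n, base=2):
    """Mirror the base-`base` digits of the integer n about the radix point."""
    result, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)        # peel off the least significant digit d_k
        denom *= base
        result += digit / denom           # d_k / base^(k+1)
    return result


print([van_der_corput(i) for i in range(1, 9)])
```

For example, 3 is `11` in binary and maps to `0.11` (0.75), while 4 is `100` and maps to `0.001` (0.125), so consecutive integers land far apart in $[0, 1)$.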

Mathematical Foundations of Monte Carlo Methods 3

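These notes are a partial translation of, and personal commentary on, what I studied on Scratchapixel. The article is not translated in full; original passages are kept for later reference, and only the key points are distilled here.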

The Probability Distribution Function

Probability density function (pdf): a function such as the normal distribution that defines a continuous probability distribution is called a probability density function. In other words, pdfs are used for continuous random variables.

The PDF can be used to calculate the probability that a random variable lies within an interval:

$Pr(a \leqslant X \leqslant b) = \int^b_a pdf(x)dx$

The integral of a probability density function over its whole domain must equal 1:

$\int^{\infty}_{-\infty}pdf(x)dx = 1$

A probability mass function (pmf) describes a discrete random variable; a probability density function (pdf) describes a continuous random variable.

Cumulative distribution function (CDF, also called probability distribution function): CDFs are monotonically increasing functions. It’s not strictly monotonic though. There may be intervals of constancy.

$pdf(x) = \frac{d}{dx}cdf(x)$

cdf(x) is the integral of the pdf over $(-\infty, x]$, and pdf(x) is the slope (derivative) of the cdf at the point x.

Expected Value of the Function of a Random Variable: Law of the Unconscious Statistician

Law of the unconscious statistician: In practice, you don’t necessarily know the probability distribution of F(X). Of course you can calculate it, but this is an extra step, which you can avoid if you use the second method.

$E[F(X)] = E[Y] = \sum F(X_i) P_X(X_i)$

Suppose $F(X)$ is a mapping of the random variable $X$ (so $F(X)$ is itself a random variable), for example $F(X) = (X - 3)^2$, and we want the expectation of $F(X)$.

By the definition of expectation (discrete: $E[X] = \sum_{i}X_i\,pmf(X_i)$; continuous: $E[X] = \int^{\infty}_{-\infty}X\,pdf(X)\,dX$), we would need to know the distribution $pdf(F(X))$ of $F(X)$.

If the sample space of $X$ is $S = \{1,2,3,4,5,6\}$ with equally likely outcomes, evaluate $F(X)$ for every possible value:

$\begin{array}{l}X = 1, \; F(1) = (1-3)^2 = 4,\\X = 2, \; F(2) = (2-3)^2 = 1,\\X = 3,\;F(3) = (3-3)^2 = 0,\\X = 4,\;F(4) = (4-3)^2 = 1,\\X=5,\;F(5) = (5-3)^2 = 4,\\X = 6,\;F(6) = (6-3)^2 = 9.\end{array}$

This gives the "discrete form" of $pdf(F(X))$ (the continuous and discrete cases are the same in essence):

$\begin{array}{lcl}Pr(F(X) = 0) &=& \dfrac{1}{6},\\Pr(F(X) = 1) &=& \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{2}{6},\\Pr(F(X) = 4) &=& \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{2}{6},\\Pr(F(X) = 9) &=& \dfrac{1}{6}.\end{array}$

The last step, computing the expectation of $F(X)$, then follows naturally:

$\begin{array}{lcl}E[F(X)]&=&0 \times Pr(F(X) = 0) + 1 \times Pr(F(X) = 1) + \\&&4 \times Pr(F(X) = 4) + 9 \times Pr(F(X) = 9),\\&=&0 \times \dfrac{1}{6} + 1 \times \dfrac{2}{6} + 4 \times \dfrac{2}{6} + 9 \times \dfrac{1}{6} = \dfrac{19}{6} \approx 3.167.\end{array}$

So, writing the random variable $Y$ in place of $F(X)$, we have

$E[F(X)] = E[Y] = \sum Y_i P_Y(Y_i) \text{… …}<1>$

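There is in fact no need to know the probability distribution of $F(X)$: the expectation of $Y$ can be computed directly from the probability distribution of $X$.

$\begin{array}{lcl}E[F(X)] & = &(1-3)^2\times Pr(X = 1) + (2-3)^2\times Pr(X = 2) +\\&& (3-3)^2\times Pr(X = 3)+ (4-3)^2\times Pr(X = 4) +\\&& (5-3)^2\times Pr(X = 5) + (6-3)^2\times Pr(X = 6)\\&=&4 \times \dfrac{1}{6} + 1 \times \dfrac{1}{6} + 0 \times \dfrac{1}{6} + 1 \times \dfrac{1}{6} + 4 \times \dfrac{1}{6} + 9 \times \dfrac{1}{6} = \dfrac{19}{6} \approx 3.167.\end{array}$

The reason is that the first computation did an extra piece of work: it merged the probabilities of equal values of $F(X_i)$ ( $Pr(F(X) = F(X_i)) = \sum_{j}{Pr(X_j)}$ over all $j$ with $F(X_j) = F(X_i)$ ), solely to obtain the $pdf$ of $F(X)$. That step is not actually needed to compute the expectation.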

$E[F(X)] = E[Y] = \sum{F(X_i)P_X(X_i)}\text{… …}<2>$

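Compared with $<1>$, the $Pr(Y_i)$ obtained by the intermediate merging step has been replaced directly by the probability distribution of $X$. The law is called "unconscious" because the step is so intuitive that statisticians reach this conclusion without consciously thinking about it, and it is also undoubtedly correct; see "How did the Law of the Unconscious Statistician get its name?".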
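Both routes to $E[F(X)]$ can be checked exactly with a short sketch, assuming a fair six-sided die as in the example above; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

p_x = {x: Fraction(1, 6) for x in range(1, 7)}   # distribution of X (fair die)


def F(x):
    return (x - 3) ** 2


# Method <1>: first build the distribution of Y = F(X) by merging equal values.
p_y = defaultdict(Fraction)
for x, p in p_x.items():
    p_y[F(x)] += p
expectation_via_y = sum(y * p for y, p in p_y.items())

# Method <2>: weight F(x) directly with the distribution of X.
expectation_via_x = sum(F(x) * p for x, p in p_x.items())

print(dict(p_y), expectation_via_y, expectation_via_x)
```

The two results are identical; method <2> simply skips the merging step.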

The Inverse Transform Sampling Method

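Background: we have a set of tabulated data for a function $PDF(X)$ (either a natural probability distribution that can be handled analytically, or an arbitrary $PDF(X)$). By computing the inverse $InvCDF(X)$ of $CDF(X)$ and feeding it uniformly distributed samples, we obtain samples distributed according to $PDF(X)$.

1. How is this implemented on a computer?

First, accumulate the data to obtain the corresponding cumulative distribution function (CDF), as shown in the figure.

Then sample uniformly along the y axis. Suppose the generated random number is $r = 0.491$; find its lower bound, i.e. the tabulated point closest to that y value with $y_{sample} \leqslant r$.

The derivation below differs slightly from the original article and is easier to follow.

Let the tabulated points $n = 15$ and $n = 16$ in the figure have coordinates $(x_1, y_1)$ and $(x_2, y_2)$.

We seek the x coordinate of the point $(?, r)$. Let $dx$ be the spacing between tabulated points and $k$ the difference between the x coordinate "?" and the lower bound $x_1$, where the sampled interval is $[min, max]$ and the number of tabulated points is $nSamples$.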

$dx = \frac{max - min}{nSamples}$

$\frac{k}{dx} = \frac{r-y_1}{y_2-y_1} = t$

$k = t * dx$

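The sought "?" is then

$? = x_1 + k = min + (n_{lower} + t) * dx$

where $n_{lower}$ is the index of the lower-bound point (so $x_1 = min + n_{lower} * dx$).

To bring the result into the interval $[0, 1]$ (the domain on which $pdf(X)$ is defined here), one more mapping is needed. Writing $x$ for the value in $[min, max]$ just computed and "?" for the mapped value:

$\begin{array}{l}x \backsim [min, max] \\ ? \backsim [0, 1]\end{array}$

$\begin{array}{l}\frac{x-min}{?-0} = \frac{max-x}{1-?} \\? = \frac{x-min}{max-min}\end{array}$

Finally, with the x coordinates obtained (that is, sampling uniformly along the x axis of $invCDF(X)$ and reading off the y values), plotting gives the result shown.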

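2. The procedure above never explicitly computes an inverse function, so why is an inverse needed?

In essence, sampling uniformly along the y axis of $CDF(X)$ and solving for x already evaluates the inverse implicitly. Not every $CDF(X)$ can be inverted in closed form, so the computer implementation uses this general numerical procedure.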
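The tabulated procedure can be sketched as follows. This is an illustration under stated assumptions, not the code from the original article: the pdf is evaluated at cell midpoints, the CDF is normalized to end at 1, and the helper names (`build_cdf`, `sample_from_cdf`) and the bell-shaped test pdf are hypothetical.

```python
import bisect
import math
import random


def build_cdf(pdf, lo, hi, n_samples):
    """Accumulate pdf values over n_samples cells; return the normalized CDF table and dx."""
    dx = (hi - lo) / n_samples
    cdf = [0.0]
    for i in range(n_samples):
        cdf.append(cdf[-1] + pdf(lo + (i + 0.5) * dx) * dx)
    total = cdf[-1]
    return [c / total for c in cdf], dx


def sample_from_cdf(cdf, lo, dx, r):
    """Map a uniform r in [0, 1) to x: find the lower bound, then interpolate linearly."""
    n_lower = min(bisect.bisect_right(cdf, r) - 1, len(cdf) - 2)
    y1, y2 = cdf[n_lower], cdf[n_lower + 1]
    t = (r - y1) / (y2 - y1) if y2 > y1 else 0.0
    return lo + (n_lower + t) * dx        # ? = min + (n_lower + t) * dx


def bell(x):
    return math.exp(-0.5 * x * x)         # unnormalized normal-shaped pdf


cdf, dx = build_cdf(bell, -5.0, 5.0, 32)
rng = random.Random(5)
samples = [sample_from_cdf(cdf, -5.0, dx, rng.random()) for _ in range(20000)]
print(sum(samples) / len(samples))
```

A histogram of `samples` should follow the shape of the tabulated pdf, up to the piecewise-linear approximation of the CDF.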

3. Deriving the inverse CDF of the exponential distribution

$PDF(X) = \lambda e^{-\lambda x}$

$\begin{array}{lcl}P(X>t) & = & \int^{\infty}_t \lambda e^{-\lambda x}\,dx \\& = & \int_{x=t}^{x=\infty} \lambda e^{-\lambda x} \left(-\frac{1}{\lambda}\right)d(-\lambda x)\end{array}$

Let $u = -\lambda x$. When $x = t$, $u = - \lambda t$; when $x = \infty$, $u = -\infty$. The change of variable $x = -\frac{1}{\lambda}u$ is single-valued on $(-\infty, -\lambda t]$ and $\frac{dx}{du} = -\frac{1}{\lambda}$ is continuous there, so

$\begin{array}{lcl}P(X>t) & = & -\int^{-\infty}_{-\lambda t}e^u\,du \\& = & \int^{-\lambda t}_{-\infty}e^u\,du \\& = & \left[ e^u \right]^{-\lambda t}_{-\infty} \\& = & e^{- \lambda t} - e^{-\infty} \\& = & e^{- \lambda t}\end{array}$

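Next, invert the CDF $P(X<x) = 1-e^{-\lambda x} = y$:

$\begin{array}{l}y = 1-e^{-\lambda x} \\e^{-\lambda x} = 1 - y \\x = -\frac{1}{\lambda}ln(1-y)\end{array}$

Evaluating this function at uniformly distributed $y \in [0, 1)$ then yields samples that follow the exponential distribution.

The original article contains errors: a $dx$ is missing, the exponent of e in the final result lacks the sign and $\lambda$, and the expression for y is wrong. This has been reported to Scratchapixel.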
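The closed-form inverse can be tried out with a short sketch; the function name, the rate $\lambda = 2$ and the sample count are illustrative choices.

```python
import math
import random


def sample_exponential(lam, rng):
    """Inverse transform sampling: x = -ln(1 - y) / lambda with y uniform in [0, 1)."""
    y = rng.random()
    return -math.log(1.0 - y) / lam


rng = random.Random(11)
lam = 2.0
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
print(sum(samples) / len(samples), 1.0 / lam)   # sample mean vs. theoretical mean 1/lambda
```

Since $y < 1$, the argument of the logarithm stays strictly positive, and the sample mean should settle near $\frac{1}{\lambda}$.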

Estimators

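Parameter: a parameter is one or more numerical values describing a characteristic of the population, for example the population mean, a population proportion or the population variance, or, for two or more populations, numerical characteristics such as correlation coefficients, partial correlation coefficients, multiple correlation coefficients and regression coefficients.

In general, population parameters are unknown. For example, the total population of a country or region, total GDP, total wheat output, per-capita disposable income and the pass rate of a product are all unknown population parameters, and obtaining them through a full census is very costly. The purpose of parameter estimation is to use the information in a sample to estimate the unknown population parameters (for details see the concept of a parameter).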

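Estimator and estimate: The sample mean is a form of estimator, but in the general sense, an estimator is a function operating on observable data and returning an estimate of the population’s parameter value $θ$ .

This function $δ$ is what we call an estimator of the parameter $θ$ and the result of $δ(x_1,…,x_n)$ is called an estimate of $θ$ (an estimation of the population’s parameter $θ$ ).

The sample mean is one estimator of an unknown population parameter; an estimator is essentially a function of a set of data. An estimator is a mapping of the random variables $X_1, . . . , X_n$ , so it is itself a random variable.

Common estimators

Sample mean $\bar{X} = \frac{1}{n} \sum^n_{i=1}X_i$ , an estimator of the population mean $E[X] = \mu$
Sample variance $S^2 = \frac{1}{n-1}\sum^n_{i=1}(X_i - \bar{X})^2$ , an estimator of the population variance $D(X)=\sigma^2$ ; the sample standard deviation $S = \sqrt{\frac{1}{n-1}\sum^n_{i=1}(X_i - \bar{X})^2}$ is an estimator of the population standard deviation $\sigma$
Sample proportion $\bar{p}=\frac{n_1}{n}$ , an estimator of the population proportion p, where $n_1$ is the number of sample units having a given characteristic.

Difference between an estimator and an estimate: An estimate is a specific value $δ(x_1,…,x_n)$ of the estimator which we can determine by using observable values $x_1,…,x_n$ . The estimator is a function $δ(X)$ of the random vector $X$ while again, an estimate is just a specific value $δ(x)$ .

In one sentence: an estimate is the concrete result obtained by plugging the sample values into an estimator of the unknown population parameter.

Point estimation and interval estimation are not covered further here; the original briefly introduces basic concepts such as the confidence interval, the confidence level (also called confidence probability or confidence coefficient) and the upper and lower confidence limits.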

Properties of Estimators

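Unbiasedness (Unbiased):

As the number of samples tends to infinity, the sample mean converges in probability to the population mean itself:

$\bar X_n \xrightarrow{p} \theta \quad \text{for } n \rightarrow \infty$

and the expectation of the sample mean satisfies

$E[\bar X_n] - \theta = 0.$

The sample mean, which satisfies the unbiasedness property, is itself an estimator. Replacing $\bar{X_n}$ with a general estimator $\delta(X)$ gives two cases: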

$E[\delta_{unbiased}(X)] - \theta = 0.$

$E[\delta_{biased}(X)] - \theta \neq 0.$

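This difference is the bias itself:

$E[\delta_{biased}(X)] - \theta = \text{ bias }.$

As mentioned earlier, computer graphics often uses biased methods, because they can converge faster (provided they are consistent, a property that matters more for an estimator than unbiasedness) while deviating only marginally from the true value.

Consistency: as the sample size grows, the estimator gets closer and closer to the true value of the population parameter. Let $\theta$ be the population parameter and $\delta$ an estimator; $\delta$ is consistent if it converges in probability to $\theta$ as the sample size $n \to \infty$ , i.e.

$\underset{n \to \infty}{\operatorname{plim}}\;\delta = \theta$

If an estimator is consistent, the precision and reliability of the estimate can be improved by increasing the sample size.

It can be shown that the sample mean $\bar{X}$ is a consistent estimator of the population mean $\mu$ ; the sample proportion $\bar{p}$ is a consistent estimator of the population proportion $p$ ; the sample variance $S^2$ is a consistent estimator of the population variance $\sigma^2$ ; and the sample standard deviation $S$ is a consistent estimator of the population standard deviation $\sigma$ (for details see the criteria for evaluating estimators).

Efficiency (Variance): efficiency describes how widely an estimator is dispersed around the population parameter. If two estimators are both unbiased, the one with the smaller dispersion is the more efficient. Dispersion is measured by variance, so among unbiased estimators, the smaller the variance, the more efficient the estimator.

Let $\theta_1$ and $\theta_2$ be unbiased estimators of the population parameter $\theta$ , i.e. $E(\theta_1)=\theta$ and $E(\theta_2)=\theta$ , and compare their variances. If

$D(\theta_1) \leqslant D(\theta_2)$

then $\theta_1$ is said to be more efficient than $\theta_2$ .

As defined here, efficiency is a comparative and therefore relative property; no estimator is efficient on its own in an absolute sense.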
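Bias can be made concrete with an exact enumeration. The sketch below assumes a fair six-sided die as the population (variance $\frac{35}{12}$) and samples of size $n = 2$; it compares the sample variance with divisor $n-1$ against the variant with divisor $n$. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 2                                     # sample size


def squared_spread(sample):
    """Sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)


all_samples = list(product(faces, repeat=n))           # every equally likely sample
expect_unbiased = sum(squared_spread(s) / (n - 1) for s in all_samples) / len(all_samples)
expect_biased = sum(squared_spread(s) / n for s in all_samples) / len(all_samples)
population_variance = Fraction(35, 12)

print(expect_unbiased, expect_biased, population_variance)
```

The divisor $n-1$ reproduces the population variance in expectation, while the divisor $n$ underestimates it by the factor $\frac{n-1}{n}$; that shortfall is the bias, and it vanishes as $n$ grows, so the biased variant is still consistent.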

Mathematical Foundations of Monte Carlo Methods 2

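These notes are a partial translation of, and personal commentary on, what I studied on Scratchapixel. The article is not translated in full; original passages are kept for later reference, and only the key points are distilled here.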

Variance and Standard Deviation

Variance and standard deviation: Standard deviation is simply the square root of variance, and variance is defined as the expected value of the square difference between the outcome of the experiment and its expected value.

$Var(X) = \sigma^2 = E[(X - E[X])^2] = \sum_i (x_i - E[X])^2p_i.$

$\text{Standard Deviation} = \sqrt{\sigma^2}.$

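Note the square in the symbol $\sigma^2$ for variance: it removes any ambiguity about sign and makes explicit that variance and standard deviation can never be negative.

Using linearity of expectation (derived in the previous note), let $\mu = E[X]$ , and recall that for a constant $c$ , $E[c] = c$ .

$\begin{array}{lcl}E[(X - E[X])^2] & = & E[(X - \mu)^2] \\& = & E[X^2 - 2 \mu X + \mu^2] \\& = & E[X^2] - 2 \mu E[X] + E[\mu^2] \\& = & E[X^2] - 2\mu^2 + \mu^2 \\& = & E[X^2] - \mu^2 \\& = & \sum_i x_i^2 p_i - \mu^2 \\& = & \sum_i x_i^2 p_i - (\sum_i x_i p_i)^2\end{array}$

If the random variable describes equally likely outcomes, the variance can be computed directly from the mean $\bar{X} = E[X]$ of those outcomes: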

$\begin{array}{l}Var(X) & = & \sum_i(x_i - E[X])^2 p_i\\& = & \sum_{i=1}^n \frac{(x_i - \bar{X})^2}{n}\end{array}$

Properties of Variance

If $Pr(X = c) = 1$ , then $Var(X) = \sum_i x_i^2p_i - \mu^2 = c^2*1 - c^2 = 0$ . In other words, the variance of a certain event is 0.
If $Y = aX + b$ , then its variance is

$\begin{array}{lcl}Var(Y) & = & E[(Y - E[Y])^2] \\& = & E[(aX + b - E[aX + b])^2] \\& = & E[(aX + b - aE[X] - b)^2] \\& = & a^2 E[(X - E[X])^2] \\& = & a^2Var(X)\end{array}$
If $X_1, …, X_n$ are independent random variables, then $Var(X_1 + … + X_n) = Var(X_1) + … + Var(X_n).$

Only the sum of two random variables is derived here; the general case follows by induction. Let $\mu_1 = E[X_1], \mu_2 = E[X_2]$ . Since $E[X_1 + X_2] = E[X_1] + E[X_2] = \mu_1 + \mu_2$ and, by independence, $E[(X_1 - \mu_1)(X_2 - \mu_2)] = E[X_1 - \mu_1]E[X_2 - \mu_2]$ :

$\begin{array}{lcl}Var(X_1 + X_2) & = & E[(X_1 + X_2 - E[X_1 + X_2])^2] \\& = & E[(X_1 + X_2 - \mu_1 - \mu_2)^2] \\& = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + E[2(X_1 - \mu_1)(X_2 - \mu_2)] \\& = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2(E[X_1 - \mu_1]*E[X_2 - \mu_2]) \\& = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] + 2((\mu_1 - \mu_1)(\mu_2 - \mu_2)) \\& = & E[(X_1 - \mu_1)^2] + E[(X_2 - \mu_2)^2] \\& = & Var(X_1) + Var(X_2)\end{array}$
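These properties can be checked exactly on a small example. The sketch assumes fair six-sided dice and uses the equiprobable form of the variance; the coefficients $a = 3$, $b = 5$ and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def variance(values):
    """Variance of equally likely outcomes: sum (x_i - mean)^2 / n."""
    vals = list(values)
    mean = Fraction(sum(vals), len(vals))
    return sum((v - mean) ** 2 for v in vals) / len(vals)


var_x = variance(faces)
var_affine = variance(3 * x + 5 for x in faces)                        # Y = aX + b
var_sum = variance(x1 + x2 for x1, x2 in product(faces, repeat=2))    # two independent dice

print(var_x, var_affine, var_sum)
```

The offset $b$ drops out, the factor $a$ enters squared, and the variance of the sum of two independent dice is twice the variance of one die.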

Probability Distribution: Part 2

Normal distribution: $p(x) = \mathcal{N}(\mu, \sigma) = {\dfrac{1}{\sigma \sqrt {2 \pi} } } e^{-{\dfrac{(x -\mu)^2}{2\sigma^2}}}.$

Here $\mu$ is the expectation of the normal distribution and $\sigma$ its standard deviation; the whole curve is symmetric about $\mu$ .

See the figure.

Sampling Distribution

Sampling distribution: Each sample on its own, is a random variable, but because now they represent the mean of certain number n of items in the population, we label them with the upper letter $X$ . We can repeat this experiment $N$ times which gives a series of samples: $X_1,X_2,…X_N$ . This collection of samples is what we call a sampling distribution.

Expected value of the distribution of means: We can apply to samples or statistics the same method for computing a mean as the method we used to calculate the mean of random variables.

Note the difference between a sampling distribution and the ordinary population distribution. Suppose each sample consists of three observations drawn from the population; because each observation is random, an observation is a random variable $x$ . Such a sample has size $n = 3$ , so its sample mean is $\bar{X_1} = \frac{\sum^n_{i=1}x_i}{n}$ and its variance is $\frac{\sum^n_{i=1}(x_i - \bar{X_1})^2}{n}$ .

The expected value of the sample mean mentioned above treats the mean of one whole sample as a random variable $\bar{X}$ . Repeating the sampling from the population yields a series $\bar{X_1}, \bar{X_2}, … \bar{X_N}$ , whose mean is $\mu_{\bar{X}} = E[\bar{X}] = \frac{\sum^N_{i=1}\bar{X_i}}{N}$ and whose variance is $Var(\bar{X}) = \frac{\sum^N_{i=1}(\bar{X_i} - \mu_{\bar{X}})^2}{N}$ .

It is therefore essential to be clear about what "Expected value of the distribution of mean" means in the original article before computing any further.

Central limit theorem (CLT): The mean of the sampling distribution of the mean $\mu_{\bar{X}}$ equals the mean of the population $\mu$ and the standard error of the distribution of means $\sigma_{\bar{X}}$ is equal to the standard deviation of the population $\sigma$ divided by the square root of $n$ . In addition, the sampling distribution of the mean will approach a normal distribution $N(\mu, {\frac{\sigma}{\sqrt{n}}})$ . These relationships may be summarized as follows:

$\mu_{\bar{X}} = \mu, \quad \sigma_{\bar{X}}=\frac{ \sigma }{\sqrt{n}}$

Properties of the Sample Mean

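1. Definition of the sample mean:

$\bar X = \dfrac{1}{n} (X_1 + … + X_n)$

2. The expectation of the sample mean equals the population mean:

$E[\bar X_n] = \dfrac{1}{n} \sum_{i=1}^n E[X_i] = \dfrac{1}{n} \cdot { n \mu } = \mu.$

3. The sample mean follows the same rules as the basic observation $x$ (the expectation of the sample mean is just the mean of the expectations of the random variable $x$ ; item 2 already showed that it equals the population expectation):

$\begin{array}{l}E[aX+b] = aE[X] + b\\E[X_1 + … + X_n] = E[X_1] + … + E[X_n]\end{array}$

4. This also gives the definition of the expectation of the sample mean:

$\begin{array}{lcl}E[\bar X]&=&E[\dfrac{1}{n}(X_1 + … + X_n)]\\&=&\dfrac{1}{n}E[X_1 + … + X_n]\\&=&\dfrac{1}{n} \sum_{i=1}^n E[X_i].\end{array}$

5. Variance of the sample mean:

$\begin{array}{lcl}Var(\bar X_n)&=&\dfrac{1}{n^2} Var \left( \sum_{i=1}^n X_i \right) \\&=&\dfrac{1}{n^2} \sum_{i=1}^n Var(X_i) = \dfrac{1}{n^2} \cdot n \sigma^2 = \dfrac{\sigma^2}{n}.\end{array}$

6. As stated earlier in the definition of variance, the variance of the sample mean inherits the following properties:

$\begin{array}{l}Var(aX + b) = a^2Var(X)\\Var(X_1+…+X_n) = Var(X_1) + … + Var(X_n).\end{array}$

Hence the variance of the sample mean is:

$\begin{array}{lcl}Var(\bar X)&=&Var(\dfrac{1}{n}(X_1 + … X_n))\\&=&\dfrac{1}{n^2 } Var(X_1 + … X_n)\\&=&\dfrac{1}{n^2 } \sum_{i=1}^n Var(X_i).\end{array}$

7. Because the variance of the sample mean, $\frac{\sigma^2}{n}$ , is smaller than the population variance $\sigma^2$ (equivalently, its standard deviation is $\frac{\sigma}{\sqrt{n}}$ ), the sample mean $\bar{X}$ is likely to lie closer to the expectation $\mu$ than a single observation $X_i$ does.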
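The relation $Var(\bar X) = \frac{\sigma^2}{n}$ can be observed in a small simulation. The sketch assumes a fair six-sided die as the population ($\mu = 3.5$, $\sigma^2 = \frac{35}{12}$); the sample size, number of repetitions, seed and names are illustrative.

```python
import random
import statistics

rng = random.Random(0)
n, trials = 10, 20000                     # sample size and number of repeated samples


def sample_mean(n, rng):
    """Mean of n independent rolls of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


means = [sample_mean(n, rng) for _ in range(trials)]
population_variance = 35 / 12

print("mean of sample means:    ", statistics.fmean(means))
print("variance of sample means:", statistics.pvariance(means))
print("sigma^2 / n:             ", population_variance / n)
```

The sample means cluster around 3.5 with a variance close to $\frac{\sigma^2}{n}$, far tighter than single rolls.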

Mathematical Foundations of Monte Carlo Methods 1

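These notes are a partial translation of, and personal commentary on, what I studied on Scratchapixel. The article is not translated in full; original passages are kept for later reference, and only the key points are distilled here.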

Random Variables and Probability

Random variable: A random variable is not a fixed value, but a function, mapping or associating a unique numerical value to each possible outcome of a random process which is not necessarily a number.

A random variable is essentially a function $X(e)$ that maps the outcomes of a random experiment to numerical values.

Sample space: A sample space defines the set of all possible outcomes from an experiment.

The sample space can be used to distinguish elementary from non-elementary events. Suppose you have 10 cards: 3 labeled 0, 5 labeled 1 and 2 labeled 2. The sample space of non-elementary events is then $S = \{0, 0, 0, 1, 1, 1, 1, 1, 2, 2 \}$ , while the sample space of elementary events is $S = \{ 0, 1, 2 \}$ .

Term	Definition
Random Variable	A random variable is a function X defined from a sample space S to a measurable space (1, 0). Random variables are denoted with upper case letters.
Probability	A probability provides a quantitative description of the likely occurrence of a particular event.
Observation or Realization	A realization, or observed value, of a random variable is the value that is actually observed.
Event	An event is any collection of outcomes of an experiment. Any subset of the sample space is an event.
Sample Space	Exhaustive list of all the possible outcomes of an experiment. Each possible result of such experiment is represented by one and only one point in the sample space, which is usually denoted by $S$ . The elements of the sample space can be thought of as all the different possibilities that could happen.

Probability Distribution

Bernoulli trial: In probability theory, a random process that has only two outcomes is called a Bernoulli trial.

Binomial distribution: We want to find the probability that $S=n$ , where $n\le N$ , which is the probability that $n$ of the $N$ samples take on the value of 1, and $N-n$ samples take on the value of 0:

$Pr(S=n)=C^N_n p^n (1-p)^{(N-n)}$

for n = 0, 1, 2, …, N, where:

$C^N_n=\frac{N!}{n!(N-n)!}$

More probability distributions, such as the uniform and Poisson distributions, can be found in the List of probability distributions.

Probability Properties

Collectively exhaustive events: A set of events is said to be jointly or collectively exhaustive if at least one of the events must occur.

Mutually exclusive events: Two sets A and B are said to be mutually exclusive or disjoint if A and B have no elements in common.

A coin toss is both collectively exhaustive and mutually exclusive: the only possible outcomes are heads and tails (exhaustive), and if one side comes up the other cannot, and vice versa (mutually exclusive).

Independent events: When you toss a coin the probability of getting heads or tails is $\frac{1}{2}$ as we know it, but if you toss the coin a second time, the probability of getting either heads or tails is still $\frac{1}{2}$ . In other words, the first toss did not change the outcome of the second toss or to say it differently again, the chances of getting heads or tails on the second toss are completely independent of whether or not we got “tails” or “heads” in the first toss.

Independence differs from the previous two notions: it only states that the probability of an event is not affected by whether a previous event occurred. Independent events obey the multiplication rule, $Pr(A \cap B) = Pr(A)Pr(B)$ .

Introduction to Statistics

Statistics: The goal of statistics is to provide information on random variables and probability distributions we don’t know anything about in the first place.

Bias: By “randomly” we mean that the process by which we select elements in the population, doesn’t give more importance to some elements than others. If it was the case we would introduce what we call bias in the calculation of this estimate.

In other words, no human preference should be introduced into the random selection process, because that causes bias. As noted before, however, bias is not always bad: in graphics a biased method can sometimes produce a result faster with little visible impact, or even get closer to the converged value than an unbiased method in the same time. It is a trade-off.

Sample or statistics: Our random variable really is some sort of “sampler”, it’s a tool or a function on the population, that we can use to collect data on that population, and the collected data makes up what we call the observations and the group of observations itself is what we call a sample or statistics.

Expected Value

Expected value: The mean and the expected value are equal however the mean is a simple average of numbers not weighted by anything, while the expected value is a sum of numbers weighted by their probability:

$E = \sum^N_{i=1}p_ix_i$

In other words, as the number of samples keeps growing, the average of the observed values of a random variable converges to a single value, and that value is the mathematical expectation.

Sample mean: the mean of a collection of observations produced by a random variable X, is called a sample mean:

$\bar{X}_n = \frac{1}{n}(X_1+X_2+X_3+…+X_n)$

Independent and identically distributed (i.i.d.): Where $X_1,X_2, …$ is a sequence of random variables which have the property to be independent and identically distributed.

If the random variables in a sequence (or any other collection) all have the same probability distribution and are mutually independent, they are independent and identically distributed.

When $\omega$ is one outcome in the sample space of the random variable, we can write $x=X(\omega)$ . So $\bar{X}=\frac{1}{n}(x_1+x_2+…+x_n)$ can also be written as $\bar{X}=\frac{1}{n} (X_1(\omega)+X_2(\omega)+…+X_n(\omega))$ . The $X_1,X_2, …$ above can be regarded as instances of the random variable $X$ .

Independent: Imagine that this coin lands on heads with probability 2/3, and tails with probability 1/3. If we flip the coin twice, the outcome of the first coin flip will not change the outcome of the second.

Identically distributed: When the coin was flipped the first time, the probability of either getting heads or tail was 2/3, and 1/3 respectively. When the coin is flipped the second time, the probability of actually getting either heads or tails is still 2/3, and 1/3. The probability that you get either heads or tails after the first flip doesn’t change.

Law of large numbers (LLN): The idea that the sample mean converges in value and probability to the expected value as the sample size increases.

A worked example of the law of large numbers: … If you toss a coin 10 times, what is the probability that you get 5 heads? This can actually be computed analytically using the binomial distribution:

$\left( \begin{array}{c} 10 \\ 5 \end{array} \right) \left( { 1 \over 2 }\right)^5 \left(1 - {1\over 2}\right)^5 = 0.2461.$

But if you now consider 100 trials, the probability becomes:

$\left( \begin{array}{c} 100 \\ 50 \end{array} \right) \left( { 1 \over 2 }\right)^{50} \left(1 - {1\over 2}\right)^{50} = 0.0796.$

The higher the number of trials, the smaller the probability of getting exactly N/2 number of heads. … however as mentioned before, interestingly the probability to get exactly N/2 heads gets smaller. Let’s for example calculate the probability that we get any number of heads between 40 and 60 for 100 trials:

$Pr(40 \leq X \leq 60) = \sum_{i=40}^{60} C^{100}_i \left( \dfrac{1}{2} \right)^i \left( 1 - \dfrac{1}{2} \right)^{100 -i} = 0.9648.$

However, if we compute the probability of getting any number of heads in the interval [4,6] for 10 trials, then we get:

$Pr(4 \leq X \leq 6) = \sum_{i=4}^{6} C^{10}_i \left( \dfrac{1}{2} \right)^i \left( 1 - \dfrac{1}{2} \right)^{10 -i} = 0.6563.$

Clearly, the probability of getting close to 1/2 increases as the number of trials increases.

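Picture the distribution of the number of heads after N tosses as approximately normal. With only 10 tosses this distribution is sampled coarsely (there are few possible outcomes), so the probability of one particular outcome, such as exactly 5 heads ( $C^{10}_5$ ), is much larger than the probability of one particular outcome in the more finely sampled case of 100 tosses ( $C^{100}_{50}$ ).

Comparing the intervals $[40, 60]$ and $[4, 6]$ , however, makes it clear that with more trials the fraction of heads is increasingly likely to fall close to $\frac{1}{2}$ .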
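The four probabilities quoted above can be recomputed with the binomial formula. A minimal sketch using `math.comb` from the Python standard library; the helper names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_range(lo, hi, n, p=0.5):
    """Probability that the number of successes lies in [lo, hi]."""
    return sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))


print(binomial_pmf(5, 10), binomial_pmf(50, 100))
print(binomial_range(4, 6, 10), binomial_range(40, 60, 100))
```

The probability of hitting exactly N/2 heads shrinks as N grows, while the probability of landing within 10% of N/2 grows, which is the behavior the law of large numbers describes.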

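Compare this with continuous random variables: as the sampling frequency tends to infinity, $f \to \infty$ , the probability that a continuous random variable takes any single value is $Pr(X = x) = 0$ (its probability density function is in fact defined through $Pr(a<X<b) = \int^b_a f(x)dx$ , also written $X \sim f(x)$ ). So as the number of possible outcomes of a discrete random variable keeps increasing, as in the law of large numbers example, the probability of any single outcome also tends to 0.

Conclusion: the sample mean $\bar{X}$ of a random sample always converges in probability to the population mean $\mu$ of the population from which the random sample was taken … If we know the distribution of the random variable we can compute the expected value directly

$E[X] = \sum_{i=1} p_i x_i = \sum_{\omega \in S} X(\omega) p(\omega).$

When we write $E[X_1]$ , $X_1$ is just one draw of the random variable and may take any value, but its expectation $E[X_1]$ is fixed and is the same for every draw: $E[X_1] = E[X_2] = E[X_3]… = E[X_n]$ . The reason is simple: all these draws share the same probability distribution.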

Properties of Expectations

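If $Y = aX + b$ , then $E[Y] = aE[X] + b$ . For instance, if every face value of a die is increased by b, the expectation increases by b as well.

$\begin{array}{lcl}E[aX] &=& \sum_i a x_i P(X = x_i) \\&=&a \sum_i x_i P(X = x_i)\\&=&aE[X].\end{array}$

The expectation of a sum of random variables equals the sum of their expectations:

$E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]$

The expectation of X+Y can be expanded as follows:

$\begin{array}{lcl}E[X + Y] &=&\sum_i \sum_j (x_i + y_j) Pr(X = x_i, Y = y_j) \\& = & \sum_i \sum_j x_i Pr(X = x_i, Y = y_j) + \sum_i \sum_j y_j Pr(X = x_i, Y = y_j) \\& = & \sum_i x_i Pr(X = x_i) + \sum_j y_j Pr(Y = y_j) \\&=&E[X] + E[Y].\end{array}$

Here the sum over the values of Y has been carried out (it contributes a factor of 1) and is therefore omitted; the same holds for X: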

$\sum_j Pr(X = x_i, Y = y_j) = Pr(X = x_i),$

$\sum_i Pr(X = x_i, Y = y_j) = Pr(Y = y_j).$

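The probability 1 arises as follows:

$\begin{array}{lcl}\sum_j Pr(X = x_i, Y = y_j) & = & \sum_j Pr(X = x_i)Pr(Y = y_j \mid X = x_i) \\& = & Pr(X = x_i)\sum_j Pr(Y = y_j \mid X = x_i) \\& = & Pr(X = x_i) \cdot 1.\end{array}$

Here $Pr(X = x_i, Y = y_j)$ is the probability that $X = x_i$ and $Y = y_j$ occur together, which can be expanded with the multiplication rule.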

Path Tracing Notes 1

Diffuse Reflection

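These notes were written after reading Syntopia, which cleared up many conceptual and practical difficulties.

Angle between the incident light and the surface normal:

$cos\theta = \vec{n} \cdot \vec{l}$

Outgoing radiance: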

$L_{out}(\vec{w_o}) = \int{K * L_{in}(\vec{w_i}) cos\theta} d\vec{w_i}$

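In the diffuse model, light arriving at the surface from any direction is reflected in a random direction. To compute the radiance leaving in a given outgoing direction, the incident light over the entire hemisphere around the point has to be considered, because light from any of those directions may leave in that direction.

The constant K essentially determines how much of the incoming radiance leaves in the given outgoing direction. By energy conservation, $\int K cos\theta \, d\vec{w_i} \le 1$ : the outgoing radiance can never exceed the incoming radiance.

Since $\int cos\theta \, d\vec{w_i} = \pi$ over the hemisphere, it follows that $K \le \frac{1}{\pi}$

In the diffuse model the albedo is $albedo = K\pi$ ; in a physically based material model the albedo always lies in [0, 1], which is exactly energy conservation.

The rendering equation for the diffuse model can therefore be rewritten as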

$L_{out}(\vec{w_o}) = \int{\frac{albedo}{\pi}*L_{in}(\vec{w_i})cos\theta d\vec{w_i}}$

The factor K of the diffuse model is simply a special case of the BRDF in the general rendering equation:

$L_{out}(\vec{w_o}) = \int{BRDF(\vec{w_i}, \vec{w_o})*L_{in}(\vec{w_i})cos\theta d\vec{w_i}}$

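To evaluate the integral above, which ranges over infinitely many incoming directions, Monte Carlo sampling can be used. The idea is to obtain an estimate of the integral by averaging over an increasing number of samples: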

$\int_a^b f(x) dx \approx{\frac{b-a}{N}\sum^N_{i=1}f(X_i)}$

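Applying this to the diffuse model

means replacing $BRDF(\vec{w_i}, \vec{w_o})$ with its albedo form and $cos\theta$ with a dot product of direction vectors. With $N$ directions $\vec{w_i}$ sampled uniformly over the hemisphere (solid angle $2\pi$ ):

$L_{out}(\vec{w_o}) = \int{\frac{albedo}{\pi}*L_{in}(\vec{w_i})cos\theta d\vec{w_i}}$

$\approx \frac{2\pi}{N}\sum_{\vec{w_i}}(\frac{albedo}{\pi})L_{in}(\vec{w_i})(\vec{n} \cdot \vec{w_i})$

$= \frac{2 \cdot albedo}{N} \sum_{\vec{w_i}}L_{in}(\vec{w_i})(\vec{n} \cdot \vec{w_i})$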

Importance Sampling

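Because of the factor $cos\theta$ , i.e. $(\vec{n} \cdot \vec{w_i})$ , incoming light close to the normal direction contributes more radiance to the reflected light.

In the general Monte Carlo form, approximating an integral requires dividing $f(X_i)$ by the PDF (probability density function) from which the samples are drawn.

The more closely the PDF is proportional to the integrand f(x), the faster the estimate converges in theory. Here f(x) contains the factor $cos\theta$ , so the PDF can be chosen proportional to it; normalizing over the hemisphere ( $\int cos\theta \, d\vec{w_i} = \pi$ ) gives

$PDF = \frac{cos\theta}{\pi}$

The $cos\theta$ factor then cancels in the final estimate:

$L_{out}(\vec{w_o}) \approx \frac{1}{N}\sum_{\vec{w_i}}\dfrac{\frac{albedo}{\pi}L_{in}(\vec{w_i})cos\theta}{\frac{cos\theta}{\pi}} = \frac{albedo}{N}\sum_{\vec{w_i}}L_{in}(\vec{w_i})$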
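The uniform and the cosine-weighted estimators can be compared with a small sketch. To keep it short, it makes several assumptions that are not in the notes: the incoming radiance depends only on $cos\theta$ (a hypothetical sky with $L_{in} = cos\theta$), so only $cos\theta$ needs to be sampled; for uniform hemisphere directions $cos\theta$ is uniform in $[0, 1)$, and for the pdf $\frac{cos\theta}{\pi}$ it is $\sqrt{u}$. For this sky the exact answer is $\frac{2}{3} albedo$.

```python
import math
import random


def l_in(cos_theta):
    """Hypothetical incoming radiance: brighter toward the surface normal."""
    return cos_theta


def estimate_uniform(albedo, n, rng):
    """Uniform hemisphere sampling, pdf = 1 / (2*pi): (2 * albedo / N) * sum L_in * cos."""
    total = 0.0
    for _ in range(n):
        c = rng.random()                  # cos(theta) of a uniformly sampled direction
        total += l_in(c) * c
    return 2.0 * albedo / n * total


def estimate_cosine(albedo, n, rng):
    """Cosine-weighted sampling, pdf = cos(theta) / pi: (albedo / N) * sum L_in."""
    total = 0.0
    for _ in range(n):
        c = math.sqrt(rng.random())       # cos(theta) of a cosine-weighted direction
        total += l_in(c)
    return albedo / n * total


rng = random.Random(2)
albedo = 0.8
print(estimate_uniform(albedo, 50_000, rng), estimate_cosine(albedo, 50_000, rng), 2 * albedo / 3)
```

Both estimators converge to the same value; the cosine-weighted one has the smaller per-sample variance because the $cos\theta$ factor no longer appears in the summand.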

Direct Lighting / Next Event Estimation

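I first learned about this optimization from a video tutorial and assumed that Next Event Estimation was a rather niche technique. Only after reading Advanced Global Illumination and PBRT did I realize that Next Event Estimation is essentially direct lighting, a very common way of using rays efficiently.

A very common approach is to replace the idealized area light with an environment map, which greatly improves ray utilization.

Path tracing essentially shoots rays from the camera and lets them bounce through the scene until they hit a light source. The key point is that a light source must be hit.

What matters most here is the area of the light source. With an idealized point light, path tracing usually produces a completely black image, because the probability of a ray hitting a point light is 0. A rough calculation for a simple scene:

Angular diameter describes the apparent size of a sphere or circle observed from a given viewpoint; it is also called the visual angle.

The angular diameter of the sun is about 32 arcminutes, i.e. roughly 0.5 degrees. With $\theta$ equal to half the angular diameter, its solid angle is

$\Omega = 2\pi(1-cos\theta)$

So the sun covers roughly $6*10^{-5}$ steradians, or about $\frac{1}{100000}$ of the hemisphere. Even with nearly 70000 samples over the hemisphere, there is only a $1-(1-10^{-5})^{70000} \approx 50\%$ chance of capturing any sunlight (in other words, a 50% chance that at least one of the rays hits the sun).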
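The rough numbers above can be rechecked in a few lines. The sketch assumes $\theta$ is half of the 32 arcminute angular diameter and that rays are distributed uniformly over the hemisphere; the variable names are illustrative.

```python
import math

angular_diameter = math.radians(32 / 60)          # 32 arcminutes in radians
half_angle = angular_diameter / 2
solid_angle = 2 * math.pi * (1 - math.cos(half_angle))
fraction_of_hemisphere = solid_angle / (2 * math.pi)

rays = 70_000
p_at_least_one_hit = 1 - (1 - fraction_of_hemisphere) ** rays

print(solid_angle, fraction_of_hemisphere, p_at_least_one_hit)
```

The result lands in the same ballpark as the rounded figures in the text: a solid angle of a few $10^{-5}$ steradians, about one hundred-thousandth of the hemisphere, and roughly even odds that any of the 70000 rays finds the sun.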

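Direct lighting is essentially a straightforward application of importance sampling. Every time a ray hits a surface, we shoot a ray toward a known light source and gather its radiance explicitly (and when a path does hit the light source directly, that hit is given a weight of 0, because the light's contribution has already been accounted for).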