Flow-Models
Key Idea
Map a simple distribution (easy to sample from and whose density is easy to evaluate) to a complex distribution through an invertible transformation.
The setup is similar to that of Variational Autoencoders.
Change of Variables Formula
**Change of Variables formula (1D case)**

If \(X=f(Z)\) and \(f(\cdot)\) is monotone with inverse \(Z=f^{-1}(X)=h(X)\), then:
\[ p_{X}(x)=p_{Z}(h(x))\,|h'(x)|=\frac{p_{Z}(z)}{|f'(z)|} \]
(Proof, assuming \(f\) is monotonically increasing; the decreasing case is analogous and yields the absolute value:
\[ P(X \le x)=P(f(Z) \leq x) = P(Z \leq f^{-1}(x))=P(Z \leq h(x)) \]
Differentiating with respect to \(x\):
\[ p_{X}(x)=p_{Z}(h(x))|h'(x)| \]
which completes the proof.)
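A quick numerical sanity check of the 1D formula, as a minimal sketch (not part of the original notes): take \(Z \sim \mathcal{N}(0,1)\) and \(f(z)=e^{z}\), so \(h(x)=\ln x\) and \(X\) is log-normal.

```python
import numpy as np
from scipy.stats import norm, lognorm

f = np.exp                      # forward map  x = f(z)
h = np.log                      # inverse map  z = h(x) = f^{-1}(x)
h_prime = lambda x: 1.0 / x     # h'(x)

x = np.linspace(0.1, 5.0, 50)
p_x_formula = norm.pdf(h(x)) * np.abs(h_prime(x))   # p_Z(h(x)) |h'(x)|
p_x_exact = lognorm.pdf(x, s=1.0)                   # known log-normal density of e^Z

assert np.allclose(p_x_formula, p_x_exact)
```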
**Change of Variables formula (general case)**

The mapping between \(Z\) and \(X\), given by \(f:\mathbb{R}^n \to \mathbb{R}^n\), is invertible such that \(X=f(Z)\) and \(Z=f^{-1}(X)\). Then
\[ p_{X}(x)=p_{Z}(f^{-1}(x))\left| \det \left( \frac{\partial f^{-1}(x)}{\partial x} \right) \right| \]
- Unlike in a VAE, \(x\) and \(z\) must both be continuous and have the same dimension
- If \(f\) is an invertible linear transformation, then
\[ p_{X}(x)=p_{Z}(f^{-1}(x))\left| \det \left( \frac{\partial f(z)}{\partial z} \right) \right|^{-1} \]
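The same check for the linear case: with \(Z \sim \mathcal{N}(0, I)\) and \(X = AZ\) for an invertible matrix \(A\), the formula must agree with the exact Gaussian density \(\mathcal{N}(0, AA^{\top})\) of \(X\). A minimal sketch (the matrix \(A\) is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5],
              [0.3, 1.5]])                  # an invertible linear map x = A z
A_inv = np.linalg.inv(A)

p_Z = multivariate_normal(np.zeros(2), np.eye(2))   # prior p_Z = N(0, I)
p_X = multivariate_normal(np.zeros(2), A @ A.T)     # exact law of X = A Z

x = np.array([1.0, -0.4])
density_via_formula = p_Z.pdf(A_inv @ x) / abs(np.linalg.det(A))
assert np.isclose(density_via_formula, p_X.pdf(x))
```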
Normalizing Flow Model
Flow of Transformations
Apply a sequence of such invertible transformations, so that moving from the initial simple distribution to the final complex distribution becomes easier:
\[ x=z_{M}=f_{\theta}^M \circ \cdots \circ f_{\theta}^1(z_{0}) := f_{\theta}(z_{0}) \]
The change of variables formula then becomes
\[ p_{X}(x;\theta)=p_{Z}(f_{\theta}^{-1}(x))\prod_{m=1}^{M}\left| \det \left( \frac{\partial (f_{\theta}^m)^{-1}(z_{m})}{\partial z_{m}}\right) \right| \]
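To make the product concrete, here is a minimal sketch of evaluating \(\log p_X(x;\theta)\) for a stack of elementwise affine layers \(f^{m}(z)=e^{s_m}\odot z + t_m\); the layers and their parameters are illustrative assumptions, not a model from these notes. Inverting the flow layer by layer and accumulating each \(\log\lvert\det\rvert\) (which is simply \(-\sum_i s_{m,i}\) for these layers) gives the log-density:

```python
import numpy as np
from scipy.stats import norm

# Each layer f^m(z) = exp(s_m) * z + t_m (elementwise); parameters are made up.
layers = [(np.array([0.3, -0.2]), np.array([1.0, 0.5])),
          (np.array([-0.1, 0.4]), np.array([0.0, -1.0]))]

def log_prob(x):
    z, log_det = x, 0.0
    for s, t in reversed(layers):      # x = z_M -> z_{M-1} -> ... -> z_0
        z = (z - t) * np.exp(-s)       # (f^m)^{-1}(z_m)
        log_det += -np.sum(s)          # log |det d(f^m)^{-1}/dz_m|
    return norm.logpdf(z).sum() + log_det   # log p_Z(z_0) + sum of log|det|

print(log_prob(np.array([0.7, -1.2])))
```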
Learning and Inference
We still use Maximum-Likelihood-Learning:
\[ \max_{\theta}\log p_{X}(\mathcal{D};\theta)=\max_{\theta }\sum_{x \in \mathcal{D}} \left( \log p_{Z}(f_{\theta}^{-1}(x)) + \log \left| \det \left(\frac{ \partial f_{\theta}^{-1}(x) }{ \partial x } \right) \right| \right) \]
\[ \begin{align} \nabla_{\theta}\log p_{X}(x;\theta) &=\nabla_{\theta}\left( \log p_{Z} \left(f_{\theta}^{-1}(x)\right) + \log \left| \det \left( \frac{ \partial f_{\theta}^{-1}(x) }{ \partial x } \right) \right| \right) \\ & = \frac{1}{p_{Z}(f_{\theta}^{-1}(x))} \nabla_{\theta}p_{Z}\left(f_{\theta}^{-1}(x)\right) + \frac{1}{\left| \det\left(\frac{ \partial f_{\theta}^{-1}(x) }{ \partial x } \right) \right| } \nabla_{\theta} \left| \det \left( \frac{ \partial f_{\theta}^{-1}(x) }{ \partial x } \right) \right| \end{align} \]
Using the forward map \(f\) and the inverse map \(f^{-1}\), we can perform tasks analogous to the decoder and encoder in a VAE.
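A hedged sketch of what maximum-likelihood training could look like in PyTorch for the same toy elementwise affine flow; the dataset, number of layers, and optimizer settings are illustrative assumptions rather than a prescribed recipe:

```python
import math
import torch

torch.manual_seed(0)
data = 2.0 * torch.randn(1024, 2) + 3.0     # toy 2D dataset to fit

# Theta: two affine layers x = exp(s) * z + t (elementwise, hence trivially invertible).
params = [(torch.zeros(2, requires_grad=True), torch.zeros(2, requires_grad=True))
          for _ in range(2)]
opt = torch.optim.Adam([p for layer in params for p in layer], lr=1e-2)

def log_prob(x):
    z, log_det = x, 0.0
    for s, t in reversed(params):            # apply the inverses (f^m)^{-1}
        z = (z - t) * torch.exp(-s)
        log_det = log_det - s.sum()          # log |det| of each inverse Jacobian
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return log_pz + log_det                  # log p_Z(f^{-1}(x)) + sum of log|det|

for step in range(2000):
    opt.zero_grad()
    loss = -log_prob(data).mean()            # negative log-likelihood
    loss.backward()
    opt.step()
```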
Desiderata
- The latent variable and the data must both be continuous and have the same dimension
- A simple prior distribution \(p_{Z}(z)\), e.g. an isotropic Gaussian
- An invertible and efficiently computable transformation \(f\)
- A Jacobian with special structure so that its determinant is cheap to compute (in general a determinant costs \(O(n^3)\)), e.g. a triangular matrix: write \(x=(x_{1}, x_{2}, \dots,x_{n})=f(z)=(f_{1}(z),\dots,f_{n}(z))\) with \(z=(z_{1},\dots,z_{n})\); if \(x_{i}\) depends only on \(z_{\le i}\), the Jacobian is lower triangular and its determinant is just the product of the diagonal entries (see the sketch after the matrix below)
\[ J=\frac{ \partial f }{ \partial z } = \begin{pmatrix} \frac{ \partial f_{1} }{ \partial z_{1} } & \cdots & \frac{ \partial f_{1} }{ \partial z_{n} } \\ \vdots & \ddots & \vdots \\ \frac{ \partial f_{n} }{ \partial z_{1} } & \cdots & \frac{ \partial f_{n} }{ \partial z_{n} } \end{pmatrix} = \begin{pmatrix} \frac{ \partial f_{1} }{ \partial z_{1} } & \cdots & 0 \\ \vdots & \ddots & \vdots \\ \frac{ \partial f_{n} }{ \partial z_{1} } & \cdots & \frac{ \partial f_{n} }{ \partial z_{n} } \end{pmatrix} \]
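A small check of this claim, using a made-up three-dimensional autoregressive map where each \(x_i\) depends only on \(z_{\le i}\); the Jacobian computed by autograd comes out lower triangular, and its determinant equals the product of the diagonal entries, which costs only \(O(n)\):

```python
import torch

def f(z):
    # x_i depends only on z_1, ..., z_i  =>  lower-triangular Jacobian
    x1 = z[0]
    x2 = torch.exp(z[0]) * z[1]
    x3 = torch.sin(z[1]) + 2.0 * z[2]
    return torch.stack([x1, x2, x3])

z = torch.tensor([0.3, -1.2, 0.7])
J = torch.autograd.functional.jacobian(f, z)

assert torch.allclose(J, torch.tril(J))                       # upper triangle is zero
assert torch.isclose(torch.det(J), torch.diagonal(J).prod())  # det = product of diagonal
```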