[Deep Learning] How Diffusion Models Work, Part 2

Because of CSDN's length limit on a single article, the previous part is linked here: [Deep Learning] How Diffusion Models Work, Part 1

3.2 Solving the Objective Function

The last term, $q(x_T|x_0)$, was shown earlier to be approximately standard normal, and $P(x_T)$ is assumed to be standard normal. Both terms can therefore be evaluated directly and contain no learnable parameters.

What actually needs to be optimized are the first and second terms. The first is the reconstruction loss; the second is a KL divergence, in which $P(x_{t-1}|x_t)$ has to be approximated by a neural network.

The paper notes that $q(x_{t-1}|x_t)$ is a normal distribution, but since it cannot be computed, we approximate it with $P(x_{t-1}|x_t)$.

$q(x_{t-1}|x_t,x_0)$, on the other hand, is a normal distribution (proof) that we can compute in closed form.

Completing the square in $q(x_{t-1}|x_t,x_0)$ directly to obtain its mean and variance is tedious, so let us work in reverse instead.

For a multivariate Gaussian $P(x)$ we have
$$\begin{aligned} P(x)=&\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\} \\=&\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x^T\Sigma^{-1}x-\mu^T\Sigma^{-1}x-x^T\Sigma^{-1}\mu+\mu^T\Sigma^{-1}\mu)\right\} \\=&\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x^T\Sigma^{-1}x-2\mu^T\Sigma^{-1}x+\mu^T\Sigma^{-1}\mu)\right\} \end{aligned}\tag{11}$$
Only two terms involve the random variable $x$: $x^T\Sigma^{-1}x$ and $2\mu^T\Sigma^{-1}x$. The first contains two factors of $x$ and is quadratic; the second contains one and is linear.

Likewise, for $q(x_{t-1}|x_t,x_0)$ we only need to identify the corresponding quadratic and linear terms in $x_{t-1}$ to read off the mean and covariance.
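This "read the mean and variance off the quadratic and linear coefficients" trick can be sanity-checked numerically. A minimal numpy sketch in one dimension (the scalar parameters are arbitrary made-up values):

```python
import numpy as np

# "Hidden" parameters we will recover from the exponent alone.
mu_true, sigma2_true = 1.7, 0.25

def log_p(x):
    # Unnormalized Gaussian log-density: -1/2 (x - mu)^2 / sigma^2.
    return -0.5 * (x - mu_true) ** 2 / sigma2_true

# Fit the exponent as a polynomial a2*x^2 + a1*x + a0.
x = np.linspace(-5.0, 5.0, 101)
a2, a1, a0 = np.polyfit(x, log_p(x), 2)

# Matching against -1/2 (x^2 - 2 mu x + mu^2) / sigma^2:
#   a2 = -1/(2 sigma^2)  =>  sigma^2 = -1/(2 a2)
#   a1 = mu/sigma^2      =>  mu      = -a1/(2 a2)
assert np.isclose(-1 / (2 * a2), sigma2_true)
assert np.isclose(-a1 / (2 * a2), mu_true)
```

The mean and variance fall out of the coefficients alone; no completing-the-square is needed.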

$$\begin{aligned}q(x_{t-1}|x_t,x_0)=&\frac{q(x_{t-1},x_{t}|x_0)}{q(x_t|x_0)}\\=&\frac{q(x_t|x_{t-1},x_0)\,q(x_{t-1}|x_0)}{q(x_t|x_0)}\\=&\frac{q(x_t|x_{t-1})\,q(x_{t-1}|x_0)}{q(x_t|x_0)}\\=&\frac{N(\sqrt{\alpha_t}x_{t-1},(1-\alpha_t)I)\,N(\sqrt{\bar\alpha_{t-1}}x_{0},(1-\bar\alpha_{t-1})I)}{N(\sqrt{\bar\alpha_t}x_{0},(1-\bar\alpha_t)I)}\\\propto&\exp\left\{-\left[\frac{(x_t-\sqrt{\alpha_t}x_{t-1})^T(x_t-\sqrt{\alpha_t}x_{t-1})}{2(1-\alpha_t)}+\frac{(x_{t-1}-\sqrt{\bar\alpha_{t-1}}x_{0})^T(x_{t-1}-\sqrt{\bar\alpha_{t-1}}x_{0})}{2(1-\bar\alpha_{t-1})}-\frac{(x_{t}-\sqrt{\bar\alpha_{t}}x_{0})^T(x_{t}-\sqrt{\bar\alpha_{t}}x_{0})}{2(1-\bar\alpha_{t})}\right]\right\}\\=&\exp\left\{-\left[\frac{x_t^Tx_t-2\sqrt{\alpha_t}x_{t}^Tx_{t-1}+\alpha_tx_{t-1}^Tx_{t-1}}{2(1-\alpha_t)}+\frac{x_{t-1}^Tx_{t-1}-2\sqrt{\bar\alpha_{t-1}}x_0^Tx_{t-1}+\bar\alpha_{t-1}x_0^Tx_0}{2(1-\bar\alpha_{t-1})}-\frac{(x_{t}-\sqrt{\bar\alpha_{t}}x_{0})^T(x_{t}-\sqrt{\bar\alpha_{t}}x_{0})}{2(1-\bar\alpha_{t})}\right]\right\}\\=&\exp\left\{-\frac{1}{2}\left(x_{t-1}^T\frac{1-\bar\alpha_t}{\beta_t(1-\bar\alpha_{t-1})}x_{t-1}-2\,\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t^T+\sqrt{\bar\alpha_{t-1}}(1-\alpha_t)x_0^T}{1-\bar\alpha_t}\cdot\frac{1-\bar\alpha_t}{\beta_t(1-\bar\alpha_{t-1})}\,x_{t-1}\right)+C\right\}\end{aligned}$$

By Eq. (11), this gives

$$q(x_{t-1}|x_t,x_0)\sim N\!\left(x_{t-1}\,\Big|\;\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t+\sqrt{\bar\alpha_{t-1}}(1-\alpha_t)x_0}{1-\bar\alpha_t},\;\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_tI\right)$$

A simple rearrangement gives
$$q(x_{t-1}|x_t,x_0)\sim N\!\left(x_{t-1}\,\Big|\;\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\sqrt{\bar\alpha_{t-1}}\beta_t x_0}{1-\bar\alpha_t},\;\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_tI\right)$$
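The completed-square coefficients behind this posterior can be checked numerically in the scalar case. A small sketch, assuming a toy linear β-schedule (the schedule values are illustrative, not from the post):

```python
import numpy as np

# Toy linear beta schedule; t is an arbitrary interior step.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

t = 50  # 0-indexed; alpha_bars[t-1] plays the role of \bar\alpha_{t-1}
a_t, ab_t, ab_tm1, b_t = alphas[t], alpha_bars[t], alpha_bars[t - 1], betas[t]

# Quadratic coefficient of x_{t-1} from the two Gaussian factors ...
lhs = a_t / (1 - a_t) + 1 / (1 - ab_tm1)
# ... equals the posterior precision (1 - abar_t) / (beta_t (1 - abar_{t-1})).
rhs = (1 - ab_t) / (b_t * (1 - ab_tm1))
assert np.isclose(lhs, rhs)

# Its inverse is exactly the stated posterior variance.
var = (1 - ab_tm1) / (1 - ab_t) * b_t
assert np.isclose(var, 1 / rhs)
```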
With this in hand, we can now evaluate $KL(q(x_{t-1}|x_t,x_0)\,\|\,P(x_{t-1}|x_t))$.

Write $q(x_{t-1}|x_{t},x_0)\sim N(x_{t-1}\mid\mu_\phi^{t-1},\Sigma_\phi^{t-1})$; for brevity I drop the $t-1$ index and write $q(x_{t-1}|x_{t},x_0)\sim N(x_{t-1}\mid\mu_\phi,\Sigma_\phi)$.

Note that the covariance of $q(x_{t-1}|x_t,x_0)$ does not depend on $x_0$ or $x_t$; it is a fixed value.

So let $P(x_{t-1}|x_t)\sim N(x_{t-1}\mid\mu_{\theta}^{t-1},\Sigma_{\theta}^{t-1})$, again dropping the time index: $P(x_{t-1}|x_t)\sim N(x_{t-1}\mid\mu_{\theta},\Sigma_{\theta})$.

We set $\Sigma_\theta$ directly equal to the covariance of $q(x_{t-1}|x_t,x_0)$, i.e. $\Sigma_\phi=\Sigma_\theta$.

The KL divergence between two multivariate normal distributions is:

$$KL(N(\mu_1,\Sigma_1)\,\|\,N(\mu_2,\Sigma_2))=\frac{1}{2}\left[(\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)-\log\det(\Sigma_2^{-1}\Sigma_1)+Tr(\Sigma_2^{-1}\Sigma_1)-n\right]$$

where $n$ is the dimension of the random variable $x$.

See reference ③ for the derivation.
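This KL formula is easy to implement and cross-check against the well-known univariate special case. A numpy sketch (the helper name `gaussian_kl` and the scalar test values are mine):

```python
import numpy as np

def gaussian_kl(mu1, S1, mu2, S2):
    """KL( N(mu1, S1) || N(mu2, S2) ) for multivariate normals."""
    n = len(mu1)
    S2_inv = np.linalg.inv(S2)
    d = mu1 - mu2
    return 0.5 * (d @ S2_inv @ d
                  - np.log(np.linalg.det(S2_inv @ S1))
                  + np.trace(S2_inv @ S1)
                  - n)

# Cross-check against the scalar formula
#   log(s2/s1) + (s1^2 + (m1-m2)^2) / (2 s2^2) - 1/2.
m1, s1, m2, s2 = 0.3, 0.8, -0.1, 1.2
kl_nd = gaussian_kl(np.array([m1]), np.array([[s1**2]]),
                    np.array([m2]), np.array([[s2**2]]))
kl_1d = np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5
assert np.isclose(kl_nd, kl_1d)
```

Identical distributions give a KL of zero, as expected.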

Substituting directly into the formula gives:
$$\begin{aligned}KL(q(x_{t-1}|x_t,x_0)\,\|\,P(x_{t-1}|x_t))=&\frac{1}{2}\left[(\mu_\phi-\mu_\theta)^T\Sigma_\theta^{-1}(\mu_\phi-\mu_\theta)-\log \det(\Sigma_\theta^{-1}\Sigma_\phi)+Tr(\Sigma_\theta^{-1}\Sigma_\phi)-n\right]\\=&\frac{1}{2}\left[(\mu_\phi-\mu_\theta)^T\Sigma_\theta^{-1}(\mu_\phi-\mu_\theta)-\log1+n-n\right]\\=&\frac{1}{2}\left[(\mu_\phi-\mu_\theta)^T\Sigma_\theta^{-1}(\mu_\phi-\mu_\theta)\right]\\=&\frac{1}{2\sigma^2_t}\left[\|\mu_\phi-\mu_\theta(x_t,t)\|^2\right]\end{aligned}\tag{12}$$
Here $\sigma_t^2$ stands for the variance $\Sigma_\theta$; since it is fixed and given, we write it this way for simplicity.

The paper compares the options experimentally: whether $\sigma_t^2$ is taken directly as $\Sigma_\theta$ (i.e. $\tilde\beta_t=\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$) or simply as $\beta_t$, the results come out roughly the same.
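The two variance choices are in fact numerically close over most of a typical schedule, which is consistent with them performing similarly. A quick numpy check on an assumed linear schedule:

```python
import numpy as np

# The two fixed-variance choices, on a toy linear schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

# \tilde\beta_t = beta_t * (1 - abar_{t-1}) / (1 - abar_t), for t >= 1.
beta_tilde = betas[1:] * (1 - alpha_bars[:-1]) / (1 - alpha_bars[1:])

# Since abar_{t-1} >= abar_t, beta_tilde never exceeds beta_t,
# and the two converge as abar_t -> 0 at large t.
assert np.all(beta_tilde <= betas[1:])
assert np.isclose(beta_tilde[-1], betas[-1], rtol=1e-3)
```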

This yields the final loss function.

The paper also rewrites this loss in other forms. We showed above that
$$\mu_\phi=\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\sqrt{\bar\alpha_{t-1}}\beta_t x_0}{1-\bar\alpha_t}\tag{13}$$
For $P(x_{t-1}|x_t)$, the only unknown in this expression is $x_0$, so it suffices to have the neural network predict $x_0$. Denoting the network's prediction of $x_0$ by $f_\theta(x_t,t)$, Eq. (12) can be rewritten as:
$$\begin{aligned}\frac{1}{2\sigma^2_t}\left[\|\mu_\phi-\mu_\theta(x_t,t)\|^2\right]=&\frac{1}{2\sigma^2_t}\left[\left\|\left(\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\sqrt{\bar\alpha_{t-1}}\beta_t x_0}{1-\bar\alpha_t}\right)-\left(\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\sqrt{\bar\alpha_{t-1}}\beta_t f_\theta(x_t,t)}{1-\bar\alpha_t}\right)\right\|^2\right]\\=&\frac{\bar\alpha_{t-1}\beta_t^2}{2\sigma^2_t(1-\bar\alpha_t)^2}\left[\|x_0-f_\theta(x_t,t)\|^2\right]\end{aligned}\tag{14}$$
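The weighting coefficient in Eq. (14) can be verified numerically: the squared difference of the two posterior means really does collapse to that constant times $\|x_0-f_\theta\|^2$. A numpy sketch with random stand-in vectors (the schedule and vectors are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy schedule and an arbitrary interior step.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
t = 40
a_t, ab_t, ab_tm1, b_t = alphas[t], alpha_bars[t], alpha_bars[t - 1], betas[t]

def mu_phi(x_t, x_0):
    # Posterior mean from the closed form above (Eq. 13).
    return (np.sqrt(a_t) * (1 - ab_tm1) * x_t
            + np.sqrt(ab_tm1) * b_t * x_0) / (1 - ab_t)

x_t = rng.normal(size=4)
x_0 = rng.normal(size=4)
x_0_hat = rng.normal(size=4)   # stand-in for f_theta(x_t, t)

lhs = np.sum((mu_phi(x_t, x_0) - mu_phi(x_t, x_0_hat)) ** 2)
rhs = ab_tm1 * b_t**2 / (1 - ab_t) ** 2 * np.sum((x_0 - x_0_hat) ** 2)
assert np.isclose(lhs, rhs)
```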
Beyond this, one can instead predict the noise.


$$x_t=\sqrt{\bar\alpha_t}x_{0}+\sqrt{1-\bar\alpha_t}\epsilon_t\ \Rightarrow\ x_0=\frac{x_t-\sqrt{1-\bar\alpha_t}\epsilon_t}{\sqrt{\bar\alpha_t}}\tag{14}$$
x 0 x_0 x0代入式(13)
$$\begin{aligned}\mu_\phi=&\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\sqrt{\bar\alpha_{t-1}}\beta_t}{1-\bar\alpha_t}\cdot\frac{x_t-\sqrt{1-\bar\alpha_t}\epsilon_t}{\sqrt{\bar\alpha_t}}\\=&\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\sqrt{\bar\alpha_{t-1}}\beta_tx_t}{(1-\bar\alpha_t)\sqrt{\bar\alpha_t}}-\frac{\sqrt{\bar\alpha_{t-1}}\beta_t\sqrt{1-\bar\alpha_t}\epsilon_t}{(1-\bar\alpha_t)\sqrt{\bar\alpha_t}}\\=&\frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\beta_tx_t}{(1-\bar\alpha_t)\sqrt{\alpha_t}}-\frac{\beta_t\sqrt{1-\bar\alpha_t}\epsilon_t}{(1-\bar\alpha_t)\sqrt{\alpha_t}}\\=&\frac{1}{\sqrt{\alpha_t}}\left[\frac{\alpha_t(1-\bar\alpha_{t-1})x_t}{1-\bar\alpha_t}+\frac{\beta_tx_t}{1-\bar\alpha_t}-\frac{\beta_t\sqrt{1-\bar\alpha_t}\epsilon_t}{1-\bar\alpha_t}\right]\\=&\frac{1}{\sqrt{\alpha_t}}\left[\left(\frac{\alpha_t(1-\bar\alpha_{t-1})+\beta_t}{1-\bar\alpha_t}\right)x_t-\frac{\beta_t\sqrt{1-\bar\alpha_t}\epsilon_t}{1-\bar\alpha_t}\right]\\=&\frac{1}{\sqrt{\alpha_t}}\left[x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\epsilon_t\right]\end{aligned}\tag{15}$$
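Equations (13) and (15) are two parameterizations of the same posterior mean, which is easy to confirm numerically: build $x_t$ from $x_0$ and a noise sample via Eq. (14), then evaluate both forms. A numpy sketch on a made-up schedule:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
t = 40
a_t, ab_t, ab_tm1, b_t = alphas[t], alpha_bars[t], alpha_bars[t - 1], betas[t]

x_0 = rng.normal(size=4)
eps = rng.normal(size=4)
x_t = np.sqrt(ab_t) * x_0 + np.sqrt(1 - ab_t) * eps   # forward process, Eq. (14)

# Posterior mean written in terms of (x_t, x_0), Eq. (13) ...
mu_from_x0 = (np.sqrt(a_t) * (1 - ab_tm1) * x_t
              + np.sqrt(ab_tm1) * b_t * x_0) / (1 - ab_t)
# ... and rewritten in terms of (x_t, eps), Eq. (15).
mu_from_eps = (x_t - b_t / np.sqrt(1 - ab_t) * eps) / np.sqrt(a_t)

assert np.allclose(mu_from_x0, mu_from_eps)
```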
At this point the only remaining unknown in $P(x_{t-1}|x_t)$ is $\epsilon_t$, so we have the neural network predict the noise:
$$\begin{aligned}\frac{1}{2\sigma^2_t}\left[\|\mu_\phi-\mu_\theta(x_t,t)\|^2\right]=&\frac{1}{2\sigma_t^2}\left\|\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\epsilon_t\right)-\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\epsilon_\theta(x_t,t)\right)\right\|^2\\=&\frac{\beta_t^2}{2\sigma_t^2(1-\bar\alpha_t)\alpha_t}\|\epsilon_t-\epsilon_\theta(x_t,t)\|^2\end{aligned}$$
With that, we have finally obtained the optimization objective for the KL-divergence terms.
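In practice, training draws a random step $t$ and a noise sample per example, forms $x_t$ in closed form, and regresses the predicted noise onto the true noise (the DDPM paper further drops the time-dependent weight in its simplified objective). A minimal numpy sketch where `eps_theta` is only a placeholder for a real network:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def eps_theta(x_t, t):
    # Placeholder for the noise-prediction network (in practice a U-Net).
    return np.zeros_like(x_t)

def training_loss(x_0):
    # One Monte Carlo sample of the (unweighted) noise-prediction objective.
    t = rng.integers(0, T)
    eps = rng.normal(size=x_0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x_0 + np.sqrt(1 - alpha_bars[t]) * eps
    return np.mean((eps - eps_theta(x_t, t)) ** 2)

loss = training_loss(rng.normal(size=8))
assert np.isfinite(loss) and loss > 0
```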

Next, let us look at the reconstruction loss:
$$\begin{aligned}\max \mathbb{E}_{q(x_1|x_0)}[\log P(x_0|x_1)]\approx&\max\frac{1}{n}\sum\limits_{i=1}^n\log P(x_0^i|x_1^i)\\=&\max\frac{1}{n}\sum\limits_{i=1}^n\log \frac{1}{(2\pi)^{D/2}|\Sigma_\theta|^{1/2}}\exp\left\{-\frac{1}{2}(x_0^i-\mu_\theta(x_1^i,1))^T\Sigma_\theta^{-1}(x_0^i-\mu_\theta(x_1^i,1))\right\}\\=&\max\frac{1}{n}\sum\limits_{i=1}^n\log \frac{1}{(2\pi)^{D/2}|\Sigma_\theta|^{1/2}}-\frac{1}{2}(x_0^i-\mu_\theta(x_1^i,1))^T\Sigma_\theta^{-1}(x_0^i-\mu_\theta(x_1^i,1))\\\propto &\min \frac{1}{n}\sum\limits_{i=1}^n(x_0^i-\mu_\theta(x_1^i,1))^T(x_0^i-\mu_\theta(x_1^i,1))\\=&\min \frac{1}{n}\sum\limits_{i=1}^n\|x_0^i-\mu_\theta(x_1^i,1)\|^2\end{aligned}$$

Substituting the $x_0$ from Eq. (14) and the mean from Eq. (15):
$$\begin{aligned}\|x_0^i-\mu_\theta(x_1^i,1)\|^2=&\left\|\frac{x_1-\sqrt{1-\bar\alpha_1}\epsilon_1}{\sqrt{\bar\alpha_1}}-\frac{1}{\sqrt{\alpha_1}}\left[x_1-\frac{\beta_1}{\sqrt{1-\bar\alpha_1}}\epsilon_\theta(x_1,1)\right]\right\|^2\\\propto&\|\epsilon_\theta(x_1,1)-\epsilon_1\|^2\end{aligned}$$
So the overall procedure is:

(Figure: the resulting training and sampling procedure.)
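The sampling half of the procedure starts from pure noise and repeatedly applies the reverse step derived above. A minimal numpy sketch (`eps_theta` is an untrained placeholder; `sigma_t` takes the $\beta_t$ variance choice mentioned earlier):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta(x_t, t):
    # Placeholder noise predictor; a trained network goes here.
    return np.zeros_like(x_t)

# Ancestral sampling: x_T ~ N(0, I), then denoise step by step with
# x_{t-1} = (x_t - beta_t/sqrt(1-abar_t) * eps_theta) / sqrt(alpha_t) + sigma_t z.
x = rng.normal(size=(4,))
for t in range(T - 1, -1, -1):
    z = rng.normal(size=x.shape) if t > 0 else 0.0  # no noise at the last step
    sigma_t = np.sqrt(betas[t])
    x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_theta(x, t)) \
        / np.sqrt(alphas[t]) + sigma_t * z

assert x.shape == (4,) and np.all(np.isfinite(x))
```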

4. Conclusion

That's everything for this post. If you spot any mistakes, please point them out. Arigatou!


5. References

①一文解释 Diffusion Model (一) DDPM 理论推导 - 知乎 (zhihu.com)

②Diffusion Model入门(8)——Denoising Diffusion Probabilistic Models(完结篇) - 知乎 (zhihu.com)

③两个多元正态分布的KL散度、巴氏距离和W距离 - 科学空间|Scientific Spaces (kexue.fm)

④什么是扩散模型?

⑤diffusion model 最近在图像生成领域大红大紫,如何看待它的风头开始超过 GAN ? - 知乎 (zhihu.com)
