Introduction

I have updated Answers of exercises on Hoff, A first course in Bayesian statistical methods (adding 10-4), so I’d like to briefly go over its content in this blog post as well.

First, let me briefly explain how this problem relates to the textbook. I previously wrote an article titled Hoff/標準ベイズのM-Hアルゴリズムがworkすることの証明でつまずいた話 (A story about stumbling on the proof that the M-H algorithm in Hoff/Standard Bayes works); in Hoff (2009), the proof in Chapter 10 that the M-H algorithm works proceeds in the following steps:

  1. The M-H algorithm generates a Markov chain that is irreducible, aperiodic, and positive recurrent.

  2. By the Ergodic Theorem, as \(s \to \infty\), there exists a unique \(\pi\) such that

    • \(\mathrm{Pr}(x^{(s)} \in A) \to \pi(A)\) for any set \(A\);
    • \(\frac{1}{S} \sum_{s=1}^{S} g(x^{(s)}) \to \int g(x) \pi(x) \, dx\).
  3. Such a \(\pi\) is called the stationary distribution and has the following property:

    If \(x^{(s)} \sim \pi\),
    and \(x^{(s+1)} \) is generated from the Markov chain starting at \(x^{(s)}\),
    then \(\mathrm{Pr}(x^{(s+1)} \in A) = \pi(A)\). (Once sampled from the stationary distribution, subsequent samples are also sampled from the stationary distribution.)

  4. \(\pi(x) = p_0(x)\) holds, i.e., the stationary distribution is the target distribution \(p_0\).

From steps 1 through 4, the Markov chain generated by the M-H algorithm can be used to approximate the target distribution \(p_0\).
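As a small numerical sketch of what these steps buy us (my own illustration, not from Hoff): below is a random-walk M-H chain targeting \(p_0 = N(0, 1)\), whose ergodic averages approach expectations under \(p_0\) as \(S\) grows. The target, proposal step size, and test set \(A\) are all arbitrary choices for the demo.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def log_p0(x):
    # Log density of the target p0 = N(0, 1), up to an additive constant.
    return -0.5 * x**2

def metropolis_hastings(n_steps, x0=0.0, step=1.0):
    # Random-walk M-H with a symmetric normal proposal.
    xs = np.empty(n_steps)
    x = x0
    for s in range(n_steps):
        x_prop = x + step * rng.normal()
        # Accept with probability min(1, p0(x_prop) / p0(x)).
        if np.log(rng.uniform()) < log_p0(x_prop) - log_p0(x):
            x = x_prop
        xs[s] = x
    return xs

xs = metropolis_hastings(50_000)

# Ergodic averages (step 2): (1/S) * sum g(x^(s)) -> E_p0[g(X)].
print("mean of x    :", xs.mean(), "(target 0)")
print("mean of x^2  :", (xs**2).mean(), "(target 1)")
# Pr(x^(s) in A) -> pi(A) for A = (-inf, 1].
print("freq of x<=1 :", (xs <= 1).mean(), "(target", stats.norm.cdf(1), ")")
```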

Hoff (2009) proves the fourth step for the M-H algorithm in the one-dimensional, discrete case. Exercise 10-4, however, asks for a proof in the case of Gibbs sampling with multi-dimensional, continuous parameters.

Specifically, the problem asks us to show that, for a parameter vector \(\bm{\phi}\), if \(\bm{\phi}^{(s)}\) is sampled from the target distribution \(p(\bm{\phi})\) and \(\bm{\phi}^{(s+1)}\) is generated by Gibbs sampling, then \(\text{Pr}(\bm{\phi}^{(s+1)} \in A) = \int_A p(\bm{\phi}) d \bm{\phi}\) holds.
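To make the claim concrete, here is a hedged numerical illustration with an example of my own choosing (not part of the exercise): take \(p(\bm{\phi})\) to be a standard bivariate normal with correlation \(\rho\), draw \(\bm{\phi}^{(s)} \sim p\), apply one full Gibbs sweep, and check that \(\bm{\phi}^{(s+1)}\) still behaves like a draw from \(p\).

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])

def gibbs_sweep(phi):
    # One systematic-scan Gibbs sweep for the standard bivariate normal:
    # phi_1 | phi_2 ~ N(rho * phi_2, 1 - rho^2), and symmetrically for phi_2.
    phi = phi.copy()
    phi[0] = rng.normal(rho * phi[1], np.sqrt(1 - rho**2))
    phi[1] = rng.normal(rho * phi[0], np.sqrt(1 - rho**2))
    return phi

n = 100_000
phi_s = rng.multivariate_normal(np.zeros(2), cov, size=n)    # phi^(s) ~ p
phi_s1 = np.array([gibbs_sweep(phi) for phi in phi_s])       # one Gibbs update each

# Both samples should have (approximately) the same covariance, and
# Pr(phi^(s+1) in A) should match the probability of A under p.
print("cov before:\n", np.cov(phi_s.T))
print("cov after :\n", np.cov(phi_s1.T))
in_A_after = ((phi_s1[:, 0] > 0) & (phi_s1[:, 1] > 0)).mean()
in_A_before = ((phi_s[:, 0] > 0) & (phi_s[:, 1] > 0)).mean()
print("Pr(phi^(s+1) in first quadrant):", in_A_after)
print("Pr under p (empirical)         :", in_A_before)
```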

My solution is as follows.

My Solution

Let \(\bm{\phi}\) be a \(d\)-dimensional random vector.

Let \(\bm{\phi}_a\) and \(\bm{\phi}_b\) be arbitrary vectors in \(\mathbb{R}^d\), and write \(\bm{\phi}_a = (a_1, \dots, a_d)\) and \(\bm{\phi}_b = (b_1, \dots, b_d)\). Then,

\begin{equation} \label{eq:10-4-dbc} \begin{aligned} & p(\bm{\phi}_a) \mathrm{Pr}( \bm{\phi}^{(t)} = \bm{\phi}_a \to \bm{\phi}^{(t+1)} = \bm{\phi}_b ) \\ = & p(a_1, \dots, a_d) p(b_1 | a_2, \dots, a_d) p(b_2 | b_1, a_3, \dots, a_d) \\ & \times p(b_3 | b_1, b_2, a_4, \dots, a_d) \dots p(b_d | b_1, \dots, b_{d-1}) \\ = & p(a_1|a_2, \dots, a_d) p(a_2, \dots, a_d) p(b_1 | a_2, \dots, a_d) p(b_2 | b_1, a_3, \dots, a_d) \\ & \times p(b_3 | b_1, b_2, a_4, \dots, a_d) \dots p(b_d | b_1, \dots, b_{d-1}) \\ = & p(a_1|a_2, \dots, a_d) p(b_1, a_2, \dots, a_d) p(b_2 | b_1, a_3, \dots, a_d) \\ & \times p(b_3 | b_1, b_2, a_4, \dots, a_d) \dots p(b_d | b_1, \dots, b_{d-1}) \\ = & p(a_1|a_2, \dots, a_d) p(a_2 | b_1, a_3, \dots, a_d) p(b_1, a_3, \dots, a_d) p(b_2 | b_1, a_3, \dots, a_d) \\ & \times p(b_3 | b_1, b_2, a_4, \dots, a_d) \dots p(b_d | b_1, \dots, b_{d-1}) \\ = & p(a_1|a_2, \dots, a_d) p(a_2 | b_1, a_3, \dots, a_d) p(b_1, b_2, a_3, \dots, a_d) \\ & \times p(b_3 | b_1, b_2, a_4, \dots, a_d) \dots p(b_d | b_1, \dots, b_{d-1}) \\ = & \ldots \\ = &p(a_1 | a_2, \dots, a_d) p(a_2 | b_1, a_3, \dots, a_d) \dots p(a_d | b_1, \dots, b_{d-1}) p(b_1, \dots, b_d) \\ = & p(\bm{\phi}_b) \mathrm{Pr}( \bm{\phi}^{(t)} = \bm{\phi}_b \to \bm{\phi}^{(t+1)} = \bm{\phi}_a ) \\ \end{aligned} \end{equation}

holds. (Strictly speaking, the product of full conditionals in the last line is the transition probability of a Gibbs sweep from \(\bm{\phi}_b\) that updates the coordinates in the reverse order \(d, d-1, \dots, 1\); for the argument below, what matters is only that this product integrates to 1 over \(\bm{\phi}_a\).)
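As a sanity check on \eqref{eq:10-4-dbc} (my own check, not part of the solution), the \(d = 2\) case can be verified numerically with the bivariate-normal example from above: both sides of \(p(a_1, a_2)\, p(b_1 \mid a_2)\, p(b_2 \mid b_1) = p(b_1, b_2)\, p(a_1 \mid a_2)\, p(a_2 \mid b_1)\) are evaluated at fixed points and should agree up to rounding error.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.8
sd = np.sqrt(1 - rho**2)
p = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def cond(x, given):
    # Full conditional of one coordinate given the other: N(rho * given, 1 - rho^2).
    return norm.pdf(x, loc=rho * given, scale=sd)

a1, a2 = 0.3, -1.2   # phi_a
b1, b2 = 1.1, 0.4    # phi_b

lhs = p.pdf([a1, a2]) * cond(b1, a2) * cond(b2, b1)   # p(phi_a) * Pr(phi_a -> phi_b)
rhs = p.pdf([b1, b2]) * cond(a1, a2) * cond(a2, b1)   # p(phi_b) * (reverse-order transition)
print(lhs, rhs)   # the two values should agree up to floating-point error
```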

Therefore,

\begin{equation*} \begin{aligned} \text{Pr}(\bm{\phi}^{(s+1)} \in A) &= \int_{\mathbb{R}^d} \left[ \int_A p(\bm{\phi}^{(s)}) \mathrm{Pr}(\bm{\phi}^{(t)} = \bm{\phi}^{(s)} \to \bm{\phi}^{(t+1)} = \bm{\phi}^{(s+1)}) d\bm{\phi}^{(s+1)} \right] d\bm{\phi}^{(s)} & (\text{by assumption and the law of total probability}) \\ &= \int_{\mathbb{R}^d} \left[ \int_A p(\bm{\phi}^{(s+1)}) \mathrm{Pr}(\bm{\phi}^{(t)} = \bm{\phi}^{(s+1)} \to \bm{\phi}^{(t+1)} = \bm{\phi}^{(s)}) d\bm{\phi}^{(s+1)} \right] d\bm{\phi}^{(s)} &(\because \eqref{eq:10-4-dbc}) \\ &= \int_A p(\bm{\phi}^{(s+1)}) \left[ \int_{\mathbb{R}^d} \mathrm{Pr}(\bm{\phi}^{(t)} = \bm{\phi}^{(s+1)} \to \bm{\phi}^{(t+1)} = \bm{\phi}^{(s)}) d\bm{\phi}^{(s)} \right] d\bm{\phi}^{(s+1)} \\ &= \int_A p(\bm{\phi}^{(s+1)}) \times 1 \, d\bm{\phi}^{(s+1)} & (\text{the transition probability, a product of full conditionals, integrates to 1 over } \bm{\phi}^{(s)}) \\ &= \int_A p(\bm{\phi}) d\bm{\phi} \end{aligned} \end{equation*}

Note: the key step combines the relation \eqref{eq:10-4-dbc}, the law of total probability, and the fact that the transition probability integrates to 1 over \(\bm{\phi}^{(s)}\). A more standard way to show stationarity is:

\begin{equation*} \begin{aligned} \int_{\mathbb{R}^d} p(\bm{\phi}^{(s)}) \mathrm{Pr}(\bm{\phi}^{(s)} \to \bm{\phi}^{(s+1)}) d\bm{\phi}^{(s)} &= \int_{\mathbb{R}^d} p(\bm{\phi}^{(s+1)}) \mathrm{Pr}(\bm{\phi}^{(s+1)} \to \bm{\phi}^{(s)}) d\bm{\phi}^{(s)} \quad (\text{using detailed balance}) \\ &= p(\bm{\phi}^{(s+1)}) \int_{\mathbb{R}^d} \mathrm{Pr}(\bm{\phi}^{(s+1)} \to \bm{\phi}^{(s)}) d\bm{\phi}^{(s)} \\ &= p(\bm{\phi}^{(s+1)}) \times 1 \\ &= p(\bm{\phi}^{(s+1)}) \end{aligned} \end{equation*}

Integrating this result over set A gives the desired outcome.
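This pointwise identity can also be checked numerically (again with the bivariate-normal example and a brute-force grid, both of which are my own assumptions): integrating \(p(\bm{\phi}^{(s)})\) times the Gibbs transition density over \(\bm{\phi}^{(s)}\) should return \(p(\bm{\phi}^{(s+1)})\) at a fixed point.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.8
sd = np.sqrt(1 - rho**2)
p = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def cond(x, given):
    # Full conditional of one coordinate given the other: N(rho * given, 1 - rho^2).
    return norm.pdf(x, loc=rho * given, scale=sd)

b1, b2 = 0.7, -0.5                      # fixed phi^(s+1)
grid = np.linspace(-8.0, 8.0, 801)      # integration grid for a1 and a2
da = grid[1] - grid[0]
A1, A2 = np.meshgrid(grid, grid)

# Transition density of one Gibbs sweep: K(a -> b) = p(b1 | a2) * p(b2 | b1).
K = cond(b1, A2) * cond(b2, b1)
joint_a = p.pdf(np.dstack([A1, A2]))    # p(a1, a2) evaluated on the grid

integral = np.sum(joint_a * K) * da * da
print("integral  :", integral)
print("p(b1, b2) :", p.pdf([b1, b2]))   # the two values should agree closely
```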

Thoughts on Solving

As mentioned earlier, the textbook contains the proof for the M-H algorithm case. Since Gibbs sampling is a special case of the M-H algorithm and the more general proof is already in the textbook, I initially assumed this problem would be easy.

However, I got surprisingly stuck on this problem (is it just me?). One reason is that Hoff (2009) carries out the proof without using concepts such as the “kernel” or the “detailed balance condition,” which made it difficult to relate it to other textbooks when I consulted them.

Respecting Hoff’s approach, I also carried out the proof without explicitly using concepts such as the “kernel” or the “detailed balance condition.” If there are any mistakes or a more elegant solution, I would appreciate it if you could point them out. Thank you again for reading to the end.

References

Hoff, Peter D. 2009. A First Course in Bayesian Statistical Methods. 1st ed. Springer Texts in Statistics. New York, NY: Springer.