Bayesian Inference

Bayesian inference is a statistical approach to making predictions or inferences about unknown parameters in a given model. This approach is named after the 18th-century mathematician and statistician Thomas Bayes.

Bayesian vs Frequentist Inference

The Bayesian approach contrasts with the frequentist approach to statistical inference. Frequentist inference considers only the probability of the observed data given the model, whereas Bayesian inference combines that likelihood with a prior probability distribution over the model parameters.

In frequentist inference, the parameter is considered fixed, and the probability distribution of the observed data is calculated under that assumption. In contrast, Bayesian inference treats the parameter as a random variable with a prior distribution, and updates the distribution of the parameter based on the observed data.
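The practical difference is easiest to see on a small data set. The sketch below (in Python, with purely illustrative numbers: 7 heads in 10 flips and a Beta(2, 2) prior) contrasts the frequentist maximum-likelihood point estimate with the Bayesian posterior mean; the conjugate Beta update it relies on is derived in the example later in this section.

```python
# A minimal sketch, assuming illustrative data (7 heads in 10 flips)
# and an assumed Beta(2, 2) prior.

n_flips, n_heads = 10, 7

# Frequentist view: theta is a fixed unknown; summarize it with the
# maximum-likelihood point estimate.
theta_mle = n_heads / n_flips  # 0.7

# Bayesian view: theta is a random variable with a Beta(a, b) prior;
# the posterior is Beta(a + heads, b + tails), and we report its mean.
a, b = 2, 2
posterior_mean = (a + n_heads) / (a + b + n_flips)  # 9 / 14 ≈ 0.643

print(f"MLE: {theta_mle:.3f}, posterior mean: {posterior_mean:.3f}")
```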

Bayes' Theorem

Bayes' theorem is the foundation of Bayesian inference. It states that the posterior probability of a hypothesis is proportional to the prior probability of the hypothesis, multiplied by the likelihood of the data given the hypothesis.

P(\theta | x) = \frac{P(x | \theta)P(\theta)}{P(x)}

where:

  • P(\theta | x) is the posterior probability of the parameter \theta given the data x
  • P(x | \theta) is the likelihood of the data x given the parameter \theta
  • P(\theta) is the prior probability of the parameter \theta
  • P(x) is the marginal probability of the data x

Bayesian inference involves updating the prior probability distribution of the parameter based on the observed data using Bayes' theorem.
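As a concrete illustration of this update, the short Python sketch below applies Bayes' theorem to a hypothetical discrete parameter: \theta is either 0.5 (a fair coin) or 0.8 (a biased coin), each assigned prior probability 0.5, and we observe a single flip that lands heads. All of these numbers are assumptions chosen for the example.

```python
# Bayes' theorem for a discrete parameter (assumed, illustrative setup):
# theta is 0.5 (fair) or 0.8 (biased), each with prior probability 0.5,
# and the observed data x is a single flip that lands heads.

priors = {0.5: 0.5, 0.8: 0.5}                      # P(theta)
likelihoods = {theta: theta for theta in priors}   # P(x = heads | theta)

# Marginal probability of the data: P(x) = sum_theta P(x | theta) P(theta)
evidence = sum(likelihoods[t] * priors[t] for t in priors)

# Posterior: P(theta | x) = P(x | theta) P(theta) / P(x)
posteriors = {t: likelihoods[t] * priors[t] / evidence for t in priors}

print(posteriors)  # {0.5: ~0.385, 0.8: ~0.615}
```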

Example

Suppose we want to determine the probability of a coin landing heads up. We can model the outcome of a coin flip with a Bernoulli distribution, where x=1 represents heads and x=0 represents tails. The probability of the coin landing heads is denoted \theta, and we assume a prior distribution \theta \sim \text{Beta}(a,b).

Let's say we flip the coin 10 times and observe 6 heads and 4 tails. We can update the prior distribution of \theta using Bayes' theorem:

\begin{aligned}
P(\theta|x) &= \frac{P(x|\theta)P(\theta)}{P(x)} \\
&= \frac{\theta^6(1-\theta)^4\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1\theta^6(1-\theta)^4\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}\,d\theta} \\
&= \frac{\theta^{a+6-1}(1-\theta)^{b+4-1}}{\int_0^1\theta^{a+6-1}(1-\theta)^{b+4-1}\,d\theta} \\
&= \frac{\Gamma(a+b+10)}{\Gamma(a+6)\Gamma(b+4)}\,\theta^{a+6-1}(1-\theta)^{b+4-1}
\end{aligned}

The posterior distribution of \theta is a Beta distribution with parameters a+6 and b+4. We can use this posterior distribution to make predictions about the probability of the coin landing heads up.
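A short Python sketch of this conjugate update is given below. It assumes a uniform Beta(1, 1) prior (the choice of a and b is illustrative) and uses scipy.stats.beta to summarize the resulting Beta(7, 5) posterior.

```python
# Sketch of the conjugate update from the example above, assuming a
# uniform Beta(1, 1) prior; a and b are illustrative choices.
from scipy import stats

a, b = 1, 1          # prior Beta(a, b)
heads, tails = 6, 4  # observed data: 6 heads, 4 tails

# Conjugacy: the posterior is Beta(a + heads, b + tails) = Beta(7, 5) here.
posterior = stats.beta(a + heads, b + tails)

print("posterior mean:", posterior.mean())               # 7 / 12 ≈ 0.583
print("95% credible interval:", posterior.interval(0.95))  # equal-tailed interval

# For the Beta-Bernoulli model, the posterior mean is also the predictive
# probability that the next flip lands heads.
```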

Conclusion

Bayesian inference offers a powerful framework for making predictions and inferences about model parameters. It allows prior knowledge to be incorporated and then updated in light of observed data. By using Bayes' theorem, we can calculate the posterior distribution of the parameters and make probabilistic predictions about future observations.
