Let’s consider the example from Chapter 1 on sex ratio among births. For European-race populations the proportion of female births is approximately .485. Scientists are interested in factors that affect this proportion. Here we consider whether the medical condition placentia previa impacts the sex ratio. One study in Germany found 437 female births out of 980 total births.
# We input the data
n <- 980
y <- 437
# First assume a beta(1,1) prior
# Then posterior distribution is beta(438,544)
x <- c(0:1000)/1000
postd <- dbeta(x,438,544)
priord <- dbeta(x,1,1)
lhood <- exp(437*log(x) + 543*log(1-x))
par(mfrow=c(3,1))
plot(x,priord,type="l",main="prior")
plot(x,lhood,type="l",main="likelihood")
plot(x,postd,type="l",main="posterior")
We can summarize the posterior distribution analytically. For example
qbeta(c(.025,.25,.50,.75,.975),438,544)
## [1] 0.4150655 0.4353081 0.4459919 0.4567090 0.4771998
Or via simulation.
postdraws <- rbeta(1000,438,544)
mean(postdraws)
## [1] 0.4457587
sqrt(var(postdraws))
## [1] 0.0156546
quantile(postdraws,probs=c(0.025,.25,.50,.75,.975))
## 2.5% 25% 50% 75% 97.5%
## 0.4157086 0.4348037 0.4461792 0.4570891 0.4766284
hist(postdraws,breaks=c(35:55)/100)
lines(x,10*postd,type="l")
I frequently want to see the mean, s.d., and percentiles. So I can write a function to compute these and it will be available when I go into R
describe <- function(x){c(mn=mean(x),sd=sqrt(var(x)),quantile(x,probs=c(0.025,.25,.50,.75,.975)))}
describe(postdraws)
## mn sd 2.5% 25% 50% 75% 97.5%
## 0.4457587 0.0156546 0.4157086 0.4348037 0.4461792 0.4570891 0.4766284
Now we can try other prior distributions. Indeed we can write a short function to perform conjugate analysis for the binomial distribution.
binomial.conj <- function(n,y,a,b) {
postdraws <- rbeta(1000,y+a,n-y+b)
describe(postdraws)}
binomial.conj(980,437,1,1)
## mn sd 2.5% 25% 50% 75%
## 0.44641523 0.01622191 0.41329165 0.43591843 0.44649917 0.45745187
## 97.5%
## 0.47904013
binomial.conj(980,437,12,12) # prior mean = .5, prior s.d. = .1
## mn sd 2.5% 25% 50% 75%
## 0.44765721 0.01572398 0.41648732 0.43676972 0.44757439 0.45895388
## 97.5%
## 0.47688277
binomial.conj(980,437,50,50) # prior mean = .5, prior s.d. = .05
## mn sd 2.5% 25% 50% 75%
## 0.45073276 0.01521337 0.42085756 0.43999008 0.45070197 0.46111891
## 97.5%
## 0.47955554
What if we don’t want to use a conjugate prior distribution? The simulation approach is very flexible and can easily accommodate any prior distribution. We can for the moment just describe our prior distribution as putting probabilities on the grid we have been using (.001,……,.999). For one example, suppose we want a bimodal prior distribution that puts probability primarily between .4 and .6 but also allows some probability on the full interval.
prior <- c(rep(1,400),1+.1*c(0:100), 11-.1*c(1:100),rep(1,400))
plot(x,prior,type="l")
postdist <- prior*lhood
postdist <- postdist/sum(postdist)
postdraws <- sample(x,1000,replace=TRUE,prob=postdist)
describe(postdraws)
## mn sd 2.5% 25% 50% 75%
## 0.44998500 0.01553706 0.42000000 0.44000000 0.45000000 0.46100000
## 97.5%
## 0.47902500