演習の概要

  • 一般化線形モデルを推定する
  • ここでは、ポアソン回帰モデル、負の二項分布モデル、ゼロ過剰ポアソン回帰モデル、ゼロ過剰負の二項分布モデルを扱う

注意事項

  • 慶應義塾大学SFCで開講している「ベイズ統計」の授業履修者向けの演習用ページです。
  • 必ずしも全てのバージョンのRやOSで動作確認を行っていません。
  • Rの更新などにより、Rコードなどを予告無しに変更する場合があります。

分析の準備

パッケージのインストール

install.packages("pscl")
install.packages("brms")
install.packages("sp")
install.packages("tidyverse")
library(pscl)
library(brms)
library(sp)
library(tidyverse)

データ

  • 全国の市区町村別病院数のデータ(2006年)をもちいます。ここからダウンロードできます。
  • このデータには、市区町村コードID、一般病院数Hospital、医師数Doctors、人口数Pop、交通事故発生件数TrAcc、人口密度Popd、65歳以上人口割合Pop65r、一人あたり所得IncPが含まれています。
data84 <- read.csv("data84.csv", header=TRUE, sep=",")
head(data84)
##     ID Hospital Doctors     Pop TrAcc    Popd Pop65r  Emp3r   IncP
## 1 1100      191    5682 1880863 10000 16.7766 0.1730 0.8689 1.3579
## 2 1202       32     797  294264  1523  4.3409 0.2394 0.8452 1.1173
## 3 1203       15     341  142161   540  5.8430 0.2742 0.7892 1.0415
## 4 1204       38    1221  355004  1749  4.7486 0.2219 0.8259 1.1288
## 5 1205        6     292   98372   347 12.1974 0.2569 0.7236 1.1881
## 6 1206       16     352  190478   658  1.3977 0.2118 0.8228 1.1357
  • ヒストグラムを作図してみよう。(一般病院数が30以上の自治体数は非常に少ないため省略)
hist(data84$Hospital, 0:200, 
     xlab="一般病院数", ylab="自治体数",
     cex=1.5, cex.axis=1.5, main="", 
     xlim=c(0,30), ylim=c(0,1000))

ポアソン回帰モデル

最尤推定

  • glm()を用いてポアソン回帰モデルを最尤推定する
model84_p <- glm(Hospital~Popd+Pop65r, 
                 data=data84, 
                 family=poisson)
summary(model84_p)
## 
## Call:
## glm(formula = Hospital ~ Popd + Pop65r, family = poisson, data = data84)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.5396  -2.0722  -1.1382   0.1121  30.2976  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.0029314  0.0489047   61.40   <2e-16 ***
## Popd         0.0101086  0.0002775   36.43   <2e-16 ***
## Pop65r      -7.4496953  0.2096749  -35.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 18036  on 1815  degrees of freedom
## Residual deviance: 14235  on 1813  degrees of freedom
## AIC: 18285
## 
## Number of Fisher Scoring iterations: 6

ベイズ推定

  • bms()を用いてポアソン回帰モデルをベイズ推定する
model84_pb <- brm(Hospital~Popd+Pop65r, 
                 data=data84, 
                 family=poisson,
                 prior=NULL,
                 iter=2000,
                 warmup=1000,
                 chains=3)
  • 推定結果は以下のようになる
summary(model84_pb)
##  Family: poisson 
##   Links: mu = log 
## Formula: Hospital ~ Popd + Pop65r 
##    Data: data84 (Number of observations: 1816) 
## Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 3000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     3.00      0.05     2.91     3.10        905 1.01
## Popd          0.01      0.00     0.01     0.01       2508 1.00
## Pop65r       -7.44      0.21    -7.86    -7.03        734 1.01
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
  • 事後分布を図示する
plot(model84_pb)

負の二項分布回帰モデル

最尤推定

  • glm.nb()を用いて負の二項分布回帰モデルを最尤推定する
model84_nb <- glm.nb(Hospital~Popd+Pop65r, 
                 data=data84)
summary(model84_nb)
## 
## Call:
## MASS::glm.nb(formula = Hospital ~ Popd + Pop65r, data = data84, 
##     init.theta = 0.7022697332, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0712  -1.1105  -0.5038   0.1036   5.4334  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.369913   0.138035  24.413   <2e-16 ***
## Popd         0.012066   0.001371   8.804   <2e-16 ***
## Pop65r      -9.211951   0.528993 -17.414   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.7023) family taken to be 1)
## 
##     Null deviance: 2495.3  on 1815  degrees of freedom
## Residual deviance: 1900.6  on 1813  degrees of freedom
## AIC: 8495.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.7023 
##           Std. Err.:  0.0290 
## 
##  2 x log-likelihood:  -8487.5820

ベイズ推定

  • bms()を用いて負の二項分布回帰モデルをベイズ推定する
model84_nbb <- brm(Hospital~Popd+Pop65r, 
                 data=data84, 
                 family="negbinomial",
                 prior=NULL,
                 iter=2000,
                 warmup=1000,
                 chains=3)
  • 推定結果は以下のようになる
summary(model84_nbb)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: Hospital ~ Popd + Pop65r 
##    Data: data84 (Number of observations: 1816) 
## Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 3000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     3.37      0.15     3.08     3.67       2245 1.00
## Popd          0.01      0.00     0.01     0.02       2996 1.00
## Pop65r       -9.22      0.57   -10.33    -8.11       2152 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## shape     0.70      0.03     0.65     0.76       2409 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
  • 事後分布を図示する
plot(model84_nbb)

ゼロ過剰ポアソン回帰モデル

最尤推定

  • psclパッケージのzeroinfl()を用いてゼロ過剰ポアソン回帰モデルを最尤推定する
model84_zip <-  zeroinfl(Hospital~Popd+Pop65r, 
                 data=data84)
summary(model84_zip)
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
## 
## Call:
## pscl::zeroinfl(formula = Hospital ~ Popd + Pop65r, data = data84)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1163 -1.1179 -0.7098  0.1100 51.8565 
## 
## Count model coefficients (poisson with log link):
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.9821538  0.0529822   56.29   <2e-16 ***
## Popd         0.0084273  0.0002854   29.53   <2e-16 ***
## Pop65r      -6.2381281  0.2324180  -26.84   <2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.75176    0.33649  -5.206 1.93e-07 ***
## Popd        -0.05581    0.01195  -4.673 2.98e-06 ***
## Pop65r       3.28326    1.18070   2.781  0.00542 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 12 
## Log-likelihood: -8449 on 6 Df

ベイズ推定

  • bms()を用いてゼロ過剰ポアソン回帰モデルをベイズ推定する
model84_zipb <- brm(Hospital~Popd+Pop65r, 
                 data=data84, 
                 family="zero_inflated_poisson",
                 prior=NULL,
                 iter=2000,
                 warmup=1000,
                 chains=3,
                 seed=1234)
  • 推定結果は以下のようになる
summary(model84_zipb)
##  Family: zero_inflated_poisson 
##   Links: mu = log; zi = identity 
## Formula: Hospital ~ Popd + Pop65r 
##    Data: data84 (Number of observations: 1816) 
## Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 3000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     3.05      0.05     2.95     3.16       1676 1.00
## Popd          0.01      0.00     0.01     0.01       2946 1.00
## Pop65r       -6.61      0.23    -7.06    -6.16       1418 1.00
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## zi     0.21      0.01     0.19     0.24       1753 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
  • 事後分布を図示する
plot(model84_zipb)

ゼロ過剰負の二項分布モデル

最尤推定

  • psclパッケージのzeroinfl()の引数dist="negbin"を指定して、ゼロ過剰負の二項分布モデルを最尤推定する
  • 基本モデル
model84_zinb1 <- zeroinfl(Hospital~Popd+Pop65r, 
                          data=data84, 
                          dist="negbin")
summary(model84_zinb1)
## 
## Call:
## pscl::zeroinfl(formula = Hospital ~ Popd + Pop65r, data = data84, 
##     dist = "negbin")
## 
## Pearson residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83454 -0.65700 -0.39170  0.09742 20.22244 
## 
## Count model coefficients (negbin with log link):
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.419307   0.150732  22.685  < 2e-16 ***
## Popd         0.011475   0.001517   7.564 3.92e-14 ***
## Pop65r      -9.300731   0.580367 -16.026  < 2e-16 ***
## Log(theta)  -0.312045   0.043232  -7.218 5.28e-13 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)    5.332      2.087   2.555  0.01063 * 
## Popd          -2.999      1.166  -2.572  0.01012 * 
## Pop65r       -24.598      9.473  -2.597  0.00941 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.7319 
## Number of iterations in BFGS optimization: 53 
## Log-likelihood: -4233 on 7 Df
  • 0値の発生確率を定数項で与えるモデル
model84_zinb2 <- zeroinfl(Hospital~Popd+Pop65r|1, 
                          data=data84, 
                          dist="negbin")
summary(model84_zinb2)
## 
## Call:
## zeroinfl(formula = Hospital ~ Popd + Pop65r | 1, data = data84, 
##     dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8180 -0.6588 -0.4054  0.1080 20.3269 
## 
## Count model coefficients (negbin with log link):
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.369923   0.150447  22.399  < 2e-16 ***
## Popd         0.012066   0.001561   7.729 1.08e-14 ***
## Pop65r      -9.211989   0.574493 -16.035  < 2e-16 ***
## Log(theta)  -0.353440   0.041357  -8.546  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -14.92      85.61  -0.174    0.862
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.7023 
## Number of iterations in BFGS optimization: 20 
## Log-likelihood: -4244 on 5 Df
  • カウントデータのモデル(Hospital~Popd+Pop65r+Emp3r)とzero-inflated model(Hospital~Popd+Emp3r)の組み合わせモデル
model84_zinb3 <- zeroinfl(Hospital~Popd+Pop65r+Emp3r|Popd+Emp3r, 
                          data=data84, 
                          dist="negbin")
summary(model84_zinb3)
## 
## Call:
## zeroinfl(formula = Hospital ~ Popd + Pop65r + Emp3r | Popd + Emp3r, 
##     data = data84, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9607 -0.6778 -0.3665  0.2443  9.8948 
## 
## Count model coefficients (negbin with log link):
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.970845   0.244539  -3.970 7.18e-05 ***
## Popd         0.005735   0.001282   4.472 7.75e-06 ***
## Pop65r      -7.906842   0.531503 -14.876  < 2e-16 ***
## Emp3r        5.823930   0.289447  20.121  < 2e-16 ***
## Log(theta)  -0.045769   0.045452  -1.007    0.314    
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -27.6700     7.4594  -3.709 0.000208 ***
## Popd         -2.8474     0.9427  -3.020 0.002524 ** 
## Emp3r        33.5572     8.9136   3.765 0.000167 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.9553 
## Number of iterations in BFGS optimization: 37 
## Log-likelihood: -4048 on 8 Df

ベイズ推定

  • bms()を用いてゼロ過剰ポアソン回帰モデルをベイズ推定する
model84_zinbb <- brm(Hospital~Popd+Pop65r, 
                 data=data84, 
                 family="zero_inflated_negbinomial",
                 prior=NULL,
                 iter=2000,
                 warmup=1000,
                 chains=3,
                 seed=1234)
  • 推定結果は以下のようになる
summary(model84_zinbb)
##  Family: zero_inflated_negbinomial 
##   Links: mu = log; shape = identity; zi = identity 
## Formula: Hospital ~ Popd + Pop65r 
##    Data: data84 (Number of observations: 1816) 
## Samples: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 3000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     3.37      0.15     3.07     3.67       2535 1.00
## Popd          0.01      0.00     0.01     0.02       3579 1.00
## Pop65r       -9.21      0.57   -10.29    -8.11       2336 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## shape     0.70      0.03     0.65     0.76       2569 1.00
## zi        0.00      0.00     0.00     0.01       3341 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
  • 事後分布を図示する
plot(model84_zinbb)

  • 事後予測をチェックする
pp_check(model84_zinbb)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.