演習の概要

注意事項

  • 慶應義塾大学SFCで開講している「統計解析」の授業履修者向けの演習用ページです。
  • 必ずしも全てのバージョンのRやOSで動作確認を行っていません。この演習用ページを作成している段階では、R3.6.0を使っています。
  • Rの更新などにより、Rコードなどを予告無しに変更する場合があります。

データ分析の準備

パッケージのインストール

  • 今回の演習では、以下のパッケージを使います。
install.packages("ggplot2")
install.packages("datasets")
install.packages("jtools")
install.packages("ggstance")
install.packages("broom")
library(ggplot2)
library(datasets)
library(jtools)
library(ggstance)
library(broom)

データの要約

  • datasetsパッケージのwarpbreaksデータを用います
  • warpbreaksデータの統計量は以下のようになります。平均と比較して分散が非常に大きいため、過分散なデータであると言えます。
summary(warpbreaks)
     breaks      wool   tension
 Min.   :10.00   A:27   L:18   
 1st Qu.:18.25   B:27   M:18   
 Median :26.00          H:18   
 Mean   :28.15                 
 3rd Qu.:34.00                 
 Max.   :70.00                 
c(mean=mean(warpbreaks$breaks), var=var(warpbreaks$breaks), ratio=var(warpbreaks$breaks)/mean(warpbreaks$breaks))
      mean        var      ratio 
 28.148148 174.204053   6.188828 
  • ヒストグラムは以下のようになります。
ggplot2::ggplot(warpbreaks, ggplot2::aes(x=breaks)) + 
  ggplot2::geom_histogram()

ggplot2::ggplot(warpbreaks, ggplot2::aes(x=breaks, fill=wool)) + 
  ggplot2::geom_histogram(position = "identity", alpha = 0.5) 

ggplot2::ggplot(warpbreaks, ggplot2::aes(x=breaks, fill=tension)) + 
  ggplot2::geom_histogram(position = "identity", alpha = 0.5) 

モデルの推定

ポアソン回帰モデルの推定

  • glm()関数を用いてポアソン回帰モデルを推定します。リンク関数をfamily = poisson(link = "log")と指定します。
wb.poisson <- glm(breaks~wool+tension, data=warpbreaks, 
                  family = poisson(link = "log"))
summary(wb.poisson)

Call:
glm(formula = breaks ~ wool + tension, family = poisson(link = "log"), 
    data = warpbreaks)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.6871  -1.6503  -0.4269   1.1902   4.2616  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: 493.06

Number of Fisher Scoring iterations: 4

疑似ポアソン回帰モデルの推定

  • glm()関数を用いて疑似ポアソン回帰モデルを推定します。リンク関数をfamily = quasipoisson(link = "log")と指定します。
wb.qpoisson <- glm(breaks~wool+tension, data=warpbreaks, 
                  family = quasipoisson(link = "log"))
summary(wb.qpoisson)

Call:
glm(formula = breaks ~ wool + tension, family = quasipoisson(link = "log"), 
    data = warpbreaks)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.6871  -1.6503  -0.4269   1.1902   4.2616  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.69196    0.09374  39.384  < 2e-16 ***
woolB       -0.20599    0.10646  -1.935 0.058673 .  
tensionM    -0.32132    0.12441  -2.583 0.012775 *  
tensionH    -0.51849    0.13203  -3.927 0.000264 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 4.261537)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

負の二項分布モデルの推定

  • MASSパッケージのglm.nb()関数を用い負の二項分布モデルを推定します。
library(MASS)
wb.negbin <- glm.nb(breaks~wool+tension, data=warpbreaks)
summary(wb.negbin)

Call:
glm.nb(formula = breaks ~ wool + tension, data = warpbreaks, 
    init.theta = 9.944385436, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0144  -0.9319  -0.2240   0.5828   1.8220  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   3.6734     0.0979  37.520  < 2e-16 ***
woolB        -0.1862     0.1010  -1.844   0.0651 .  
tensionM     -0.2992     0.1217  -2.458   0.0140 *  
tensionH     -0.5114     0.1237  -4.133 3.58e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(9.9444) family taken to be 1)

    Null deviance: 75.464  on 53  degrees of freedom
Residual deviance: 53.723  on 50  degrees of freedom
AIC: 408.76

Number of Fisher Scoring iterations: 1

              Theta:  9.94 
          Std. Err.:  2.56 

 2 x log-likelihood:  -398.764 

モデル間のパラメータ比較

  • jtoolsパッケージのplot_summs()関数を用いて、モデルパラメータを視覚的に比較できます。
#jtools::plot_summs(wb.poisson, 
#           scale = TRUE, exp = TRUE)
jtools::plot_summs(wb.poisson, wb.qpoisson, wb.negbin, 
                   scale = TRUE, exp = TRUE)
Note: Pseudo-R2 for quasibinomial/quasipoisson families is calculated by refitting
the fitted and null models as binomial/poisson.

予測

  • predict()関数を使って、予測結果をかえすことができます。ここでは、wool="A", tension="L"の場合の予測結果を比較します。
predict(wb.poisson, newdata=data.frame(wool="A", tension="L"), type="response")
       1 
40.12354 
predict(wb.qpoisson, newdata=data.frame(wool="A", tension="L"), type="response")
       1 
40.12354 
predict(wb.negbin, newdata=data.frame(wool="A", tension="L"), type="response")
      1 
39.3838 
LS0tCnRpdGxlOiAi57Wx6KiI6Kej5p6QUua8lOe/kjUiCiNhdXRob3I6ICJUb21veXVraSBGdXJ1dGFuaSIKI2RhdGU6ICIyMDE5LzgvOSIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6IGRlZmF1bHQKICBodG1sX2RvY3VtZW50OiBkZWZhdWx0Ci0tLQoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkKa25pdHI6Om9wdHNfY2h1bmskc2V0KGNhY2hlID0gVFJVRSkKa25pdHI6Om9wdHNfY2h1bmskc2V0KGVycm9yID0gVFJVRSkKYGBgCgojIyDmvJTnv5Ljga7mpoLopoEKLSDjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjgIHnlpHkvLzjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjgIHosqDjga7kuozpoIXliIbluIPjg6Ljg4fjg6sKCiMjIyDms6jmhI/kuovpoIUKLSDmhbbmh4nnvqnlob7lpKflraZTRkPjgafplovorJvjgZfjgabjgYTjgovjgIzntbHoqIjop6PmnpDjgI3jga7mjojmpa3lsaXkv67ogIXlkJHjgZHjga7mvJTnv5LnlKjjg5rjg7zjgrjjgafjgZnjgIIKLSDlv4XjgZrjgZfjgoLlhajjgabjga7jg5Djg7zjgrjjg6fjg7Pjga5S44KET1Pjgafli5XkvZznorroqo3jgpLooYzjgaPjgabjgYTjgb7jgZvjgpPjgILjgZPjga7mvJTnv5LnlKjjg5rjg7zjgrjjgpLkvZzmiJDjgZfjgabjgYTjgovmrrXpmo7jgafjga/jgIFSMy42LjDjgpLkvb/jgaPjgabjgYTjgb7jgZnjgIIKLSBS44Gu5pu05paw44Gq44Gp44Gr44KI44KK44CBUuOCs+ODvOODieOBquOBqeOCkuS6iOWRiueEoeOBl+OBq+WkieabtOOBmeOCi+WgtOWQiOOBjOOBguOCiuOBvuOBmeOAgiAKCiMjIOODh+ODvOOCv+WIhuaekOOBrua6luWCmQojIyMg44OR44OD44Kx44O844K444Gu44Kk44Oz44K544OI44O844OrCi0g5LuK5Zue44Gu5ryU57+S44Gn44Gv44CB5Lul5LiL44Gu44OR44OD44Kx44O844K444KS5L2/44GE44G+44GZ44CCCmBgYHtyIGluc3RhbGwucGFja2FnZXMsIGV2YWw9RkFMU0V9Cmluc3RhbGwucGFja2FnZXMoImdncGxvdDIiKQppbnN0YWxsLnBhY2thZ2VzKCJkYXRhc2V0cyIpCmluc3RhbGwucGFja2FnZXMoImp0b29scyIpCmluc3RhbGwucGFja2FnZXMoImdnc3RhbmNlIikKaW5zdGFsbC5wYWNrYWdlcygiYnJvb20iKQpgYGAKCmBgYHtyIGxpYnJhcnksIGV2YWw9RkFMU0V9CmxpYnJhcnkoZ2dwbG90MikKbGlicmFyeShkYXRhc2V0cykKbGlicmFyeShqdG9vbHMpCmxpYnJhcnkoZ2dzdGFuY2UpCmxpYnJhcnkoYnJvb20pCmBgYAoKIyMjIOODh+ODvOOCv+OBruimgee0hAotIGBkYXRhc2V0c2Djg5Hjg4PjgrHjg7zjgrjjga5gd2FycGJyZWFrc2Djg4fjg7zjgr/jgpLnlKjjgYTjgb7jgZkKLSBgd2FycGJyZWFrc2Djg4fjg7zjgr/jga7ntbHoqIjph4/jga/ku6XkuIvjga7jgojjgYbjgavjgarjgorjgb7jgZnjgILlubPlnYfjgajmr5TovIPjgZfjgabliIbmlaPjgYzpnZ7luLjjgavlpKfjgY3jgYTjgZ/jgoHjgIHpgY7liIbmlaPjgarjg4fjg7zjgr/jgafjgYLjgovjgajoqIDjgYjjgb7jgZnjgIIKYGBge3Igc3VtbWFyeX0Kc3VtbWFyeSh3YXJwYnJlYWtzKQpgYGAKYGBge3IgbWVhbnZhcn0KYyhtZWFuPW1lYW4od2FycGJyZWFrcyRicmVha3MpLCB2YXI9dmFyKHdhcnBicmVha3MkYnJlYWtzKSwgcmF0aW89dmFyKHdhcnBicmVha3MkYnJlYWtzKS9tZWFuKHdhcnBicmVha3MkYnJlYWtzKSkKYGBgCgoKLSDjg5Ljgrnjg4jjgrDjg6njg6Djga/ku6XkuIvjga7jgojjgYbjgavjgarjgorjgb7jgZnjgIIKYGBge3IgaGlzdDEsIGZpZy5oZWlnaHQ9MS41LCBmaWcud2lkdGg9MS41fQpnZ3Bsb3QyOjpnZ3Bsb3Qod2FycGJyZWFrcywgZ2dwbG90Mjo6YWVzKHg9YnJlYWtzKSkgKyAKICBnZ3Bsb3QyOjpnZW9tX2hpc3RvZ3JhbSgpCmBgYApgYGB7ciBoaXN0MiwgZmlnLmhlaWdodD0xLjUsIGZpZy53aWR0aD0xLjV9CmdncGxvdDI6OmdncGxvdCh3YXJwYnJlYWtzLCBnZ3Bsb3QyOjphZXMoeD1icmVha3MsIGZpbGw9d29vbCkpICsgCiAgZ2dwbG90Mjo6Z2VvbV9oaXN0b2dyYW0ocG9zaXRpb24gPSAiaWRlbnRpdHkiLCBhbHBoYSA9IDAuNSkgCmBgYApgYGB7ciBoaXN0MywgZmlnLmhlaWdodD0xLjUsIGZpZy53aWR0aD0xLjV9CmdncGxvdDI6OmdncGxvdCh3YXJwYnJlYWtzLCBnZ3Bsb3QyOjphZXMoeD1icmVha3MsIGZpbGw9dGVuc2lvbikpICsgCiAgZ2dwbG90Mjo6Z2VvbV9oaXN0b2dyYW0ocG9zaXRpb24gPSAiaWRlbnRpdHkiLCBhbHBoYSA9IDAuNSkgCmBgYAoKIyMg44Oi44OH44Or44Gu5o6o5a6aCiMjIyDjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjga7mjqjlrpoKLSBgZ2xtKClg6Zai5pWw44KS55So44GE44Gm44Od44Ki44K944Oz5Zue5biw44Oi44OH44Or44KS5o6o5a6a44GX44G+44GZ44CC44Oq44Oz44Kv6Zai5pWw44KSYGZhbWlseSA9IHBvaXNzb24obGluayA9ICJsb2ciKWDjgajmjIflrprjgZfjgb7jgZnjgIIKYGBge3Igd2IucG9pc3Nvbn0Kd2IucG9pc3NvbiA8LSBnbG0oYnJlYWtzfndvb2wrdGVuc2lvbiwgZGF0YT13YXJwYnJlYWtzLCAKICAgICAgICAgICAgICAgICAgZmFtaWx5ID0gcG9pc3NvbihsaW5rID0gImxvZyIpKQpzdW1tYXJ5KHdiLnBvaXNzb24pCmBgYAoKIyMjIOeWkeS8vOODneOCouOCveODs+WbnuW4sOODouODh+ODq+OBruaOqOWumgotIGBnbG0oKWDplqLmlbDjgpLnlKjjgYTjgabnlpHkvLzjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjgpLmjqjlrprjgZfjgb7jgZnjgILjg6rjg7Pjgq/plqLmlbDjgpJgZmFtaWx5ID0gcXVhc2lwb2lzc29uKGxpbmsgPSAibG9nIilg44Go5oyH5a6a44GX44G+44GZ44CCCmBgYHtyIHdiLnFwb2lzc29ufQp3Yi5xcG9pc3NvbiA8LSBnbG0oYnJlYWtzfndvb2wrdGVuc2lvbiwgZGF0YT13YXJwYnJlYWtzLCAKICAgICAgICAgICAgICAgICAgZmFtaWx5ID0gcXVhc2lwb2lzc29uKGxpbmsgPSAibG9nIikpCnN1bW1hcnkod2IucXBvaXNzb24pCmBgYAojIyMg6LKg44Gu5LqM6aCF5YiG5biD44Oi44OH44Or44Gu5o6o5a6aCi0gYE1BU1Ng44OR44OD44Kx44O844K444GuYGdsbS5uYigpYOmWouaVsOOCkueUqOOBhOiyoOOBruS6jOmgheWIhuW4g+ODouODh+ODq+OCkuaOqOWumuOBl+OBvuOBmeOAggpgYGB7ciB3Yi5uZWdiaW59CmxpYnJhcnkoTUFTUykKd2IubmVnYmluIDwtIGdsbS5uYihicmVha3N+d29vbCt0ZW5zaW9uLCBkYXRhPXdhcnBicmVha3MpCnN1bW1hcnkod2IubmVnYmluKQpgYGAKCiMjIyDjg6Ljg4fjg6vplpPjga7jg5Hjg6njg6Hjg7zjgr/mr5TovIMKLSBganRvb2xzYOODkeODg+OCseODvOOCuOOBrmBwbG90X3N1bW1zKClg6Zai5pWw44KS55So44GE44Gm44CB44Oi44OH44Or44OR44Op44Oh44O844K/44KS6KaW6Kaa55qE44Gr5q+U6LyD44Gn44GN44G+44GZ44CCCmBgYHtyIHBsb3Quc3VtbXMsIGZpZy5oZWlnaHQ9MS41LCBmaWcud2lkdGg9MS41fQojanRvb2xzOjpwbG90X3N1bW1zKHdiLnBvaXNzb24sIAojICAgICAgICAgICBzY2FsZSA9IFRSVUUsIGV4cCA9IFRSVUUpCmp0b29sczo6cGxvdF9zdW1tcyh3Yi5wb2lzc29uLCB3Yi5xcG9pc3Nvbiwgd2IubmVnYmluLCAKICAgICAgICAgICAgICAgICAgIHNjYWxlID0gVFJVRSwgZXhwID0gVFJVRSkKYGBgCiMjIyDkuojmuKwKLSBgcHJlZGljdCgpYOmWouaVsOOCkuS9v+OBo+OBpuOAgeS6iOa4rOe1kOaenOOCkuOBi+OBiOOBmeOBk+OBqOOBjOOBp+OBjeOBvuOBmeOAguOBk+OBk+OBp+OBr+OAgWB3b29sPSJBIiwgdGVuc2lvbj0iTCJg44Gu5aC05ZCI44Gu5LqI5ris57WQ5p6c44KS5q+U6LyD44GX44G+44GZ44CCCmBgYHtyIHByZWRpY3RfcmVzfQpwcmVkaWN0KHdiLnBvaXNzb24sIG5ld2RhdGE9ZGF0YS5mcmFtZSh3b29sPSJBIiwgdGVuc2lvbj0iTCIpLCB0eXBlPSJyZXNwb25zZSIpCnByZWRpY3Qod2IucXBvaXNzb24sIG5ld2RhdGE9ZGF0YS5mcmFtZSh3b29sPSJBIiwgdGVuc2lvbj0iTCIpLCB0eXBlPSJyZXNwb25zZSIpCnByZWRpY3Qod2IubmVnYmluLCBuZXdkYXRhPWRhdGEuZnJhbWUod29vbD0iQSIsIHRlbnNpb249IkwiKSwgdHlwZT0icmVzcG9uc2UiKQpgYGAKCg==