データ分析の準備
パッケージのインストール
install.packages("ggplot2")
install.packages("datasets")
install.packages("jtools")
install.packages("ggstance")
install.packages("broom")
library(ggplot2)
library(datasets)
library(jtools)
library(ggstance)
library(broom)
データの要約
datasets
パッケージのwarpbreaks
データを用います
warpbreaks
データの統計量は以下のようになります。平均と比較して分散が非常に大きいため、過分散なデータであると言えます。
summary(warpbreaks)
breaks wool tension
Min. :10.00 A:27 L:18
1st Qu.:18.25 B:27 M:18
Median :26.00 H:18
Mean :28.15
3rd Qu.:34.00
Max. :70.00
c(mean=mean(warpbreaks$breaks), var=var(warpbreaks$breaks), ratio=var(warpbreaks$breaks)/mean(warpbreaks$breaks))
mean var ratio
28.148148 174.204053 6.188828
ggplot2::ggplot(warpbreaks, ggplot2::aes(x=breaks)) +
ggplot2::geom_histogram()
ggplot2::ggplot(warpbreaks, ggplot2::aes(x=breaks, fill=wool)) +
ggplot2::geom_histogram(position = "identity", alpha = 0.5)
ggplot2::ggplot(warpbreaks, ggplot2::aes(x=breaks, fill=tension)) +
ggplot2::geom_histogram(position = "identity", alpha = 0.5)
モデルの推定
ポアソン回帰モデルの推定
glm()
関数を用いてポアソン回帰モデルを推定します。リンク関数をfamily = poisson(link = "log")
と指定します。
wb.poisson <- glm(breaks~wool+tension, data=warpbreaks,
family = poisson(link = "log"))
summary(wb.poisson)
Call:
glm(formula = breaks ~ wool + tension, family = poisson(link = "log"),
data = warpbreaks)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.6871 -1.6503 -0.4269 1.1902 4.2616
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.69196 0.04541 81.302 < 2e-16 ***
woolB -0.20599 0.05157 -3.994 6.49e-05 ***
tensionM -0.32132 0.06027 -5.332 9.73e-08 ***
tensionH -0.51849 0.06396 -8.107 5.21e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 297.37 on 53 degrees of freedom
Residual deviance: 210.39 on 50 degrees of freedom
AIC: 493.06
Number of Fisher Scoring iterations: 4
疑似ポアソン回帰モデルの推定
glm()
関数を用いて疑似ポアソン回帰モデルを推定します。リンク関数をfamily = quasipoisson(link = "log")
と指定します。
wb.qpoisson <- glm(breaks~wool+tension, data=warpbreaks,
family = quasipoisson(link = "log"))
summary(wb.qpoisson)
Call:
glm(formula = breaks ~ wool + tension, family = quasipoisson(link = "log"),
data = warpbreaks)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.6871 -1.6503 -0.4269 1.1902 4.2616
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.69196 0.09374 39.384 < 2e-16 ***
woolB -0.20599 0.10646 -1.935 0.058673 .
tensionM -0.32132 0.12441 -2.583 0.012775 *
tensionH -0.51849 0.13203 -3.927 0.000264 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 4.261537)
Null deviance: 297.37 on 53 degrees of freedom
Residual deviance: 210.39 on 50 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
負の二項分布モデルの推定
MASS
パッケージのglm.nb()
関数を用い負の二項分布モデルを推定します。
library(MASS)
wb.negbin <- glm.nb(breaks~wool+tension, data=warpbreaks)
summary(wb.negbin)
Call:
glm.nb(formula = breaks ~ wool + tension, data = warpbreaks,
init.theta = 9.944385436, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0144 -0.9319 -0.2240 0.5828 1.8220
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.6734 0.0979 37.520 < 2e-16 ***
woolB -0.1862 0.1010 -1.844 0.0651 .
tensionM -0.2992 0.1217 -2.458 0.0140 *
tensionH -0.5114 0.1237 -4.133 3.58e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(9.9444) family taken to be 1)
Null deviance: 75.464 on 53 degrees of freedom
Residual deviance: 53.723 on 50 degrees of freedom
AIC: 408.76
Number of Fisher Scoring iterations: 1
Theta: 9.94
Std. Err.: 2.56
2 x log-likelihood: -398.764
モデル間のパラメータ比較
jtools
パッケージのplot_summs()
関数を用いて、モデルパラメータを視覚的に比較できます。
#jtools::plot_summs(wb.poisson,
# scale = TRUE, exp = TRUE)
jtools::plot_summs(wb.poisson, wb.qpoisson, wb.negbin,
scale = TRUE, exp = TRUE)
Note: Pseudo-R2 for quasibinomial/quasipoisson families is calculated by refitting
the fitted and null models as binomial/poisson.
予測
predict()
関数を使って、予測結果をかえすことができます。ここでは、wool="A", tension="L"
の場合の予測結果を比較します。
predict(wb.poisson, newdata=data.frame(wool="A", tension="L"), type="response")
1
40.12354
predict(wb.qpoisson, newdata=data.frame(wool="A", tension="L"), type="response")
1
40.12354
predict(wb.negbin, newdata=data.frame(wool="A", tension="L"), type="response")
1
39.3838
LS0tCnRpdGxlOiAi57Wx6KiI6Kej5p6QUua8lOe/kjUiCiNhdXRob3I6ICJUb21veXVraSBGdXJ1dGFuaSIKI2RhdGU6ICIyMDE5LzgvOSIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6IGRlZmF1bHQKICBodG1sX2RvY3VtZW50OiBkZWZhdWx0Ci0tLQoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkKa25pdHI6Om9wdHNfY2h1bmskc2V0KGNhY2hlID0gVFJVRSkKa25pdHI6Om9wdHNfY2h1bmskc2V0KGVycm9yID0gVFJVRSkKYGBgCgojIyDmvJTnv5Ljga7mpoLopoEKLSDjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjgIHnlpHkvLzjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjgIHosqDjga7kuozpoIXliIbluIPjg6Ljg4fjg6sKCiMjIyDms6jmhI/kuovpoIUKLSDmhbbmh4nnvqnlob7lpKflraZTRkPjgafplovorJvjgZfjgabjgYTjgovjgIzntbHoqIjop6PmnpDjgI3jga7mjojmpa3lsaXkv67ogIXlkJHjgZHjga7mvJTnv5LnlKjjg5rjg7zjgrjjgafjgZnjgIIKLSDlv4XjgZrjgZfjgoLlhajjgabjga7jg5Djg7zjgrjjg6fjg7Pjga5S44KET1Pjgafli5XkvZznorroqo3jgpLooYzjgaPjgabjgYTjgb7jgZvjgpPjgILjgZPjga7mvJTnv5LnlKjjg5rjg7zjgrjjgpLkvZzmiJDjgZfjgabjgYTjgovmrrXpmo7jgafjga/jgIFSMy42LjDjgpLkvb/jgaPjgabjgYTjgb7jgZnjgIIKLSBS44Gu5pu05paw44Gq44Gp44Gr44KI44KK44CBUuOCs+ODvOODieOBquOBqeOCkuS6iOWRiueEoeOBl+OBq+WkieabtOOBmeOCi+WgtOWQiOOBjOOBguOCiuOBvuOBmeOAgiAKCiMjIOODh+ODvOOCv+WIhuaekOOBrua6luWCmQojIyMg44OR44OD44Kx44O844K444Gu44Kk44Oz44K544OI44O844OrCi0g5LuK5Zue44Gu5ryU57+S44Gn44Gv44CB5Lul5LiL44Gu44OR44OD44Kx44O844K444KS5L2/44GE44G+44GZ44CCCmBgYHtyIGluc3RhbGwucGFja2FnZXMsIGV2YWw9RkFMU0V9Cmluc3RhbGwucGFja2FnZXMoImdncGxvdDIiKQppbnN0YWxsLnBhY2thZ2VzKCJkYXRhc2V0cyIpCmluc3RhbGwucGFja2FnZXMoImp0b29scyIpCmluc3RhbGwucGFja2FnZXMoImdnc3RhbmNlIikKaW5zdGFsbC5wYWNrYWdlcygiYnJvb20iKQpgYGAKCmBgYHtyIGxpYnJhcnksIGV2YWw9RkFMU0V9CmxpYnJhcnkoZ2dwbG90MikKbGlicmFyeShkYXRhc2V0cykKbGlicmFyeShqdG9vbHMpCmxpYnJhcnkoZ2dzdGFuY2UpCmxpYnJhcnkoYnJvb20pCmBgYAoKIyMjIOODh+ODvOOCv+OBruimgee0hAotIGBkYXRhc2V0c2Djg5Hjg4PjgrHjg7zjgrjjga5gd2FycGJyZWFrc2Djg4fjg7zjgr/jgpLnlKjjgYTjgb7jgZkKLSBgd2FycGJyZWFrc2Djg4fjg7zjgr/jga7ntbHoqIjph4/jga/ku6XkuIvjga7jgojjgYbjgavjgarjgorjgb7jgZnjgILlubPlnYfjgajmr5TovIPjgZfjgabliIbmlaPjgYzpnZ7luLjjgavlpKfjgY3jgYTjgZ/jgoHjgIHpgY7liIbmlaPjgarjg4fjg7zjgr/jgafjgYLjgovjgajoqIDjgYjjgb7jgZnjgIIKYGBge3Igc3VtbWFyeX0Kc3VtbWFyeSh3YXJwYnJlYWtzKQpgYGAKYGBge3IgbWVhbnZhcn0KYyhtZWFuPW1lYW4od2FycGJyZWFrcyRicmVha3MpLCB2YXI9dmFyKHdhcnBicmVha3MkYnJlYWtzKSwgcmF0aW89dmFyKHdhcnBicmVha3MkYnJlYWtzKS9tZWFuKHdhcnBicmVha3MkYnJlYWtzKSkKYGBgCgoKLSDjg5Ljgrnjg4jjgrDjg6njg6Djga/ku6XkuIvjga7jgojjgYbjgavjgarjgorjgb7jgZnjgIIKYGBge3IgaGlzdDEsIGZpZy5oZWlnaHQ9MS41LCBmaWcud2lkdGg9MS41fQpnZ3Bsb3QyOjpnZ3Bsb3Qod2FycGJyZWFrcywgZ2dwbG90Mjo6YWVzKHg9YnJlYWtzKSkgKyAKICBnZ3Bsb3QyOjpnZW9tX2hpc3RvZ3JhbSgpCmBgYApgYGB7ciBoaXN0MiwgZmlnLmhlaWdodD0xLjUsIGZpZy53aWR0aD0xLjV9CmdncGxvdDI6OmdncGxvdCh3YXJwYnJlYWtzLCBnZ3Bsb3QyOjphZXMoeD1icmVha3MsIGZpbGw9d29vbCkpICsgCiAgZ2dwbG90Mjo6Z2VvbV9oaXN0b2dyYW0ocG9zaXRpb24gPSAiaWRlbnRpdHkiLCBhbHBoYSA9IDAuNSkgCmBgYApgYGB7ciBoaXN0MywgZmlnLmhlaWdodD0xLjUsIGZpZy53aWR0aD0xLjV9CmdncGxvdDI6OmdncGxvdCh3YXJwYnJlYWtzLCBnZ3Bsb3QyOjphZXMoeD1icmVha3MsIGZpbGw9dGVuc2lvbikpICsgCiAgZ2dwbG90Mjo6Z2VvbV9oaXN0b2dyYW0ocG9zaXRpb24gPSAiaWRlbnRpdHkiLCBhbHBoYSA9IDAuNSkgCmBgYAoKIyMg44Oi44OH44Or44Gu5o6o5a6aCiMjIyDjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjga7mjqjlrpoKLSBgZ2xtKClg6Zai5pWw44KS55So44GE44Gm44Od44Ki44K944Oz5Zue5biw44Oi44OH44Or44KS5o6o5a6a44GX44G+44GZ44CC44Oq44Oz44Kv6Zai5pWw44KSYGZhbWlseSA9IHBvaXNzb24obGluayA9ICJsb2ciKWDjgajmjIflrprjgZfjgb7jgZnjgIIKYGBge3Igd2IucG9pc3Nvbn0Kd2IucG9pc3NvbiA8LSBnbG0oYnJlYWtzfndvb2wrdGVuc2lvbiwgZGF0YT13YXJwYnJlYWtzLCAKICAgICAgICAgICAgICAgICAgZmFtaWx5ID0gcG9pc3NvbihsaW5rID0gImxvZyIpKQpzdW1tYXJ5KHdiLnBvaXNzb24pCmBgYAoKIyMjIOeWkeS8vOODneOCouOCveODs+WbnuW4sOODouODh+ODq+OBruaOqOWumgotIGBnbG0oKWDplqLmlbDjgpLnlKjjgYTjgabnlpHkvLzjg53jgqLjgr3jg7Plm57luLDjg6Ljg4fjg6vjgpLmjqjlrprjgZfjgb7jgZnjgILjg6rjg7Pjgq/plqLmlbDjgpJgZmFtaWx5ID0gcXVhc2lwb2lzc29uKGxpbmsgPSAibG9nIilg44Go5oyH5a6a44GX44G+44GZ44CCCmBgYHtyIHdiLnFwb2lzc29ufQp3Yi5xcG9pc3NvbiA8LSBnbG0oYnJlYWtzfndvb2wrdGVuc2lvbiwgZGF0YT13YXJwYnJlYWtzLCAKICAgICAgICAgICAgICAgICAgZmFtaWx5ID0gcXVhc2lwb2lzc29uKGxpbmsgPSAibG9nIikpCnN1bW1hcnkod2IucXBvaXNzb24pCmBgYAojIyMg6LKg44Gu5LqM6aCF5YiG5biD44Oi44OH44Or44Gu5o6o5a6aCi0gYE1BU1Ng44OR44OD44Kx44O844K444GuYGdsbS5uYigpYOmWouaVsOOCkueUqOOBhOiyoOOBruS6jOmgheWIhuW4g+ODouODh+ODq+OCkuaOqOWumuOBl+OBvuOBmeOAggpgYGB7ciB3Yi5uZWdiaW59CmxpYnJhcnkoTUFTUykKd2IubmVnYmluIDwtIGdsbS5uYihicmVha3N+d29vbCt0ZW5zaW9uLCBkYXRhPXdhcnBicmVha3MpCnN1bW1hcnkod2IubmVnYmluKQpgYGAKCiMjIyDjg6Ljg4fjg6vplpPjga7jg5Hjg6njg6Hjg7zjgr/mr5TovIMKLSBganRvb2xzYOODkeODg+OCseODvOOCuOOBrmBwbG90X3N1bW1zKClg6Zai5pWw44KS55So44GE44Gm44CB44Oi44OH44Or44OR44Op44Oh44O844K/44KS6KaW6Kaa55qE44Gr5q+U6LyD44Gn44GN44G+44GZ44CCCmBgYHtyIHBsb3Quc3VtbXMsIGZpZy5oZWlnaHQ9MS41LCBmaWcud2lkdGg9MS41fQojanRvb2xzOjpwbG90X3N1bW1zKHdiLnBvaXNzb24sIAojICAgICAgICAgICBzY2FsZSA9IFRSVUUsIGV4cCA9IFRSVUUpCmp0b29sczo6cGxvdF9zdW1tcyh3Yi5wb2lzc29uLCB3Yi5xcG9pc3Nvbiwgd2IubmVnYmluLCAKICAgICAgICAgICAgICAgICAgIHNjYWxlID0gVFJVRSwgZXhwID0gVFJVRSkKYGBgCiMjIyDkuojmuKwKLSBgcHJlZGljdCgpYOmWouaVsOOCkuS9v+OBo+OBpuOAgeS6iOa4rOe1kOaenOOCkuOBi+OBiOOBmeOBk+OBqOOBjOOBp+OBjeOBvuOBmeOAguOBk+OBk+OBp+OBr+OAgWB3b29sPSJBIiwgdGVuc2lvbj0iTCJg44Gu5aC05ZCI44Gu5LqI5ris57WQ5p6c44KS5q+U6LyD44GX44G+44GZ44CCCmBgYHtyIHByZWRpY3RfcmVzfQpwcmVkaWN0KHdiLnBvaXNzb24sIG5ld2RhdGE9ZGF0YS5mcmFtZSh3b29sPSJBIiwgdGVuc2lvbj0iTCIpLCB0eXBlPSJyZXNwb25zZSIpCnByZWRpY3Qod2IucXBvaXNzb24sIG5ld2RhdGE9ZGF0YS5mcmFtZSh3b29sPSJBIiwgdGVuc2lvbj0iTCIpLCB0eXBlPSJyZXNwb25zZSIpCnByZWRpY3Qod2IubmVnYmluLCBuZXdkYXRhPWRhdGEuZnJhbWUod29vbD0iQSIsIHRlbnNpb249IkwiKSwgdHlwZT0icmVzcG9uc2UiKQpgYGAKCg==