Purpose of the workshop

This site is prepared for the GIS workshop at Universität Wien. The main purpose of the workshop is to encourage Japanologie researchers to understand GIS and public statistics regarding Japan.

In this workshop, how to use GIS in R will be lectured by hands-on. R could be available from the following URL https://cran.r-project.org/. R studio https://www.rstudio.com/ is also recommended to use because of its beautiful GUI.

Preparation and setup

When using GIS in R, it is necessary to use several packages. In order to use packages, we need to install them by using install.packages() function like this. ggplot2 package can be installed via github.

Once you install packages, usually you don’t need to install them next time. When some of them are upgraded, it is recommended to install the packages again.

install.packages(c("maptools","classInt","RColorBrewer", "readr",
                   "magrittr","dplyr","kokudosuuchi", "estatapi",
                   "sf","purrr", "needs", "DT"))
devtools::install_github("tidyverse/ggplot2")

Installed packages becomes active by using library() function.

library(maptools)
library(classInt)
library(RColorBrewer)
library(readr)
library(magrittr)
library(dplyr)
library(kokudosuuchi)
library(estatapi)
library(sf)
library(purrr)
library(needs)
library(ggplot2)
library(DT)

magrittr() of needs package should be prioritized so that pipes %>% are available.

needs::prioritize(magrittr) 

Create a new folder named as gisdata for the working space. In this case, gisdata folder will be made under the user folder.

setwd() is used to move directory and getwd() returns the current directory.

setwd("./")
dir.create("gisdata")
setwd("gisdata")
getwd()

Once you made a working directory, you don’t need to use the first and the second line of the above chunk.

Download and visualize GIS mesh map

First of all, we download a 1km mesh estiamted population data set from Kokudo Suuchi Joho (KSJ) from MLIT (Ministry of Lnad Infrastructure and Transportation) web site and then make a thematic map of estimated population of Kanagawa in 2020.

Kokudo Suuchi Joho Web site is as following; http://nlftp.mlit.go.jp/ksj/index.html

getKSJSummary() of kokudosuuchi package provides summary information of Kokudo Suuchi Joho via API

kokudosuuchi::getKSJSummary() %>% 
  filter(title=="1kmメッシュ別将来推計人口(H29国政局推計)(shape形式版)")
## # A tibble: 1 x 5
##   identifier title                                 field1  field2 areaType
##   <chr>      <chr>                                 <chr>   <chr>  <chr>   
## 1 m1000      1kmメッシュ別将来推計人口(H29国政局推計)(shape形式版)… 各種統計… " "    3

After getting URL of Kokudo Suuchi Joho by getKSJURL(), Kokudo Suuchi Joho data could be downloaded by curl_download of curl package.

url <- kokudosuuchi::getKSJURL("m1000", prefCode = 14) %>%
  dplyr::select(zipFileUrl) %>% 
  dplyr::pull(1)
tf <- tempfile(fileext = ".zip")
curl::curl_download(url, tf)
unzip(tf, exdir = getwd(), junkpaths = TRUE)
unlink(tf)
rm(tf)

st_read of sf package is used to read ArcGIS shape files.

d <- sf::st_read('Mesh3_POP_14.shp')

Plot a map of estimated population in 2020 of Kanagawa prefecture by using ggplot2 as followings.

ggplot2::ggplot(data = d, aes(fill=POP2020)) + 
  geom_sf() + 
  scale_fill_distiller(palette="RdYlGn") + 
  theme_bw() +
  labs(title = "Population of Kanagawa in 2020 (est.)")

In this case, class of the attribute we’d like to show in the map is defined according to quantile style by using classIntervals() funciton of classInt package.

Find another class style from help(classIntervals) or the following URL (in Japanese).

http://web.sfc.keio.ac.jp/~maunz/wiki/index.php?asakura_sp_chap04

Class interval of population 2020 is five by defining n=5. Color palet of the class interval is defined by using findClours(). Here, a grey scale color pallet is applied.

pal0 <- c("white","grey","grey2")
ci <- classInt::classIntervals(d$POP2020, n=5, style = "quantile")
ci_Col <- classInt::findColours(ci,pal0)
You can plot a white map as following.

Color of borders of mesh (or polygon boundary) can be defined by border="".

plot(sf::st_geometry(d), border="grey", col="white")

A thematic map can also be drawn like this. Here, title and legend are added by using title() and legend().

plot(sf::st_geometry(d), border="grey" ,col=ci_Col)
title("Population of Kanagawa in 2020 (Estimated)", cex=1.4)
legend("topleft",fill=attr(ci_Col,"palette"), cex=0.8,
       legend=names(attr(ci_Col,"table")),bty="n")

The following chunk shows another way to make a thematic map by using pipe %>%.

In this case, RColorBrewer is applied to define a color pallet. Quantitative pallets, sequential pallets and diverging pallets can be employed in RColorBrewer. (i.e. see https://moderndata.plot.ly/create-colorful-graphs-in-r-with-rcolorbrewer-and-plotly/)

d %>% 
  sf::st_geometry(d) %>% 
  plot(col=classInt::findColours(ci, rev(RColorBrewer::brewer.pal(7, "RdYlGn"))))

A grey scale map can be depicted like this.

d %>% 
  sf::st_geometry(d) %>% 
  plot(col=classInt::findColours(ci, pal0), border="grey")

You can select areas arbitrary by using filter() function of dplyr package. In this case, Yokohama city boundary is selected by CITY_CODE.

市区町村コード or city codes are available from the following URL. http://www.soumu.go.jp/denshijiti/code.html

yoko <- dplyr::filter(d, CITY_CODE>=14101&CITY_CODE<=14143)
yoko %>% 
  sf::st_geometry(yoko) %>% 
  plot(col=classInt::findColours(ci, rev(RColorBrewer::brewer.pal(7, "RdYlGn"))))

Secondary, we’d like to dray a map of 500m mesh scale population in 2020 of Yokohama.

Download a 500m mesh estiamted population data set from KSJ and make a thematic map.

kokudosuuchi::getKSJSummary() %>%
  filter(title=="500mメッシュ別将来推計人口(H29国政局推計)(shape形式版)")
url <- kokudosuuchi::getKSJURL("m500", prefCode = 14) %>%
  select(zipFileUrl) %>% 
  pull(1)
tf <- tempfile(fileext = ".zip")
curl::curl_download(url, tf)
unzip(tf, exdir = getwd(), junkpaths = TRUE)
unlink(tf)
rm(tf)
d2 <- sf::st_read('Mesh4_POP_14.shp')
d2 %>% 
  sf::st_geometry(d2) %>% 
  plot(col=classInt::findColours(
    classInt::classIntervals(d2$POP2020, n=5, style = "sd"), 
    rev(RColorBrewer::brewer.pal(7, "RdYlGn"))),
    border="grey")

Now, we’d like to draw a Yokohama city area population and then to overlay a railway map and administrative boundary map to make a thematic map.

A railway GIS data and a boundary GIS data are also downloaded from KSJ.

# kokudosuuchi::getKSJSummary() %>% filter(title=="鉄道")
url <- kokudosuuchi::getKSJURL("N02") %>%
  select(zipFileUrl) %>% 
  pull(1)
tf <- tempfile(fileext = ".zip")
curl::curl_download(url, tf)
unzip(tf, exdir = getwd(), junkpaths = TRUE)
unlink(tf)
rm(tf)
rw_ln <- sf::st_read('N02-05-g_RailroadSection.shp')
# kokudosuuchi::getKSJSummary() %>% filter(title=="行政区域")
kokudosuuchi::getKSJURL("N03", prefCode = 14) %>%
  filter(year==2015) %>%  select(zipFileUrl) %>% 
  pull(1) -> url
tf <- tempfile(fileext = ".zip")
curl::curl_download(url, tf)
unzip(tf, exdir = getwd(), junkpaths = TRUE)
unlink(tf)
rm(tf)
# Read a boundary of Kanagawa Pref.
ad <- sf::st_read('N03-15_14_150101.shp')
# Select Yokohama City boundary and plot it
ad %>%
  filter(N03_007 %in% c(14101:14118)) -> y_ad
# Select Yokohama city: City codes is from 14101 to 14118
y2 <- dplyr::filter(d2, CITY_CODE>=14101&CITY_CODE<=14118)
y2 %>% 
  sf::st_geometry(y2) %>% 
  plot(col=classInt::findColours(
    classInt::classIntervals(y2$POP2020, n=5, style = "sd"), 
    rev(RColorBrewer::brewer.pal(7, "RdYlGn"))),
    border="grey")

plot(rw_ln, add=TRUE)
## Warning in plot.sf(rw_ln, add = TRUE): ignoring all but the first attribute
plot(y_ad, col="transparent", border="red", add=TRUE)
## Warning in plot.sf(y_ad, col = "transparent", border = "red", add = TRUE):
## ignoring all but the first attribute
title("Population of Yokohama CIty in 2020 (Estimated)")
legend("bottomright",
       fill=attr(
         classInt::findColours(
           classInt::classIntervals(d$POP2020, n=5, style = "quantile"), 
           rev(RColorBrewer::brewer.pal(7, "RdYlGn"))),"palette"), 
       cex=1.0,
       legend=names(attr(
         classInt::findColours(
           classInt::classIntervals(d$POP2020, n=5, style = "quantile"), 
           rev(RColorBrewer::brewer.pal(7, "RdYlGn"))),"table")),bty="n")

Download multiple GIS data from KSJ

Here we show a case to download multiple boundary data of Kanto region (Saitama, Chiba, Tokyo and Kanagawa).

preflist <- 11:14
urls <- kokudosuuchi::getKSJURL("N03", prefCode = preflist) %>%
  filter(year==2015)  %$% zipFileUrl
d <- purrr::map(urls, getKSJData,cache_dir = "cached_zip")
zip_exdir <- tempfile()
dir.create(zip_exdir)
setwd("cached_zip")
map(list.files(), unzip, exdir = zip_exdir)

mdata <- file.path(zip_exdir,
                  list.files(zip_exdir), 
                  paste("N03-15_",11:14,"_150101.shp", sep="")) %>% 
  map(st_read) %>% 
  do.call(rbind, .)

setwd("../")
st_write(mdata, "N03-15_Kanto_150101.shp")

Bind a shape file and a table file

Here, we’d like to demonstrate how to bind a shape file of Tokyo 23ku(wards) GIS data and a house land price (Koji Chika Data of House).

Land price (Koji Chika) data of Tokyo in 2018 is available from Tokyo metropolitan government’s web site. Here, land price data is provided as a point data of table format.

A CSV file (comma (,) deliminated file) can be read by read_csv() of readr.

Select land price points of houses from the data set by using select() and filter().

lp_url <- sprintf("http://www.zaimu.metro.tokyo.jp/kijunchi/30kouji/30kouji_chiten_data.csv")
curl::curl_download(lp_url, "lp_tky.csv")
lph_tky <- readr::read_csv("lp_tky.csv", skip=1, 
                          col_types = cols(都道府県市区町村コード = "c"),
                          locale = locale(encoding = "CP932")) %>% 
  select(CODE = 2,
         LP2018 = 10,
         Chiseki = 13,
         Yoto = 34) %>% 
  filter(Yoto %in% c("1住居", "1中専", "1低専", "2住居", "2中専", "2低専", "準住居"))

And then, mean house land prices of “Ku” are calculated by summarize() of dply by CODE. This is like making a pivot table in EXCEL.

lph_tky %>%
  dplyr::group_by(CODE) %>%
  dplyr::summarise(LP2018.Mean = mean(LP2018))

Tokyo metropolitan prefecture’s boundary is also available from KSJ.

kokudosuuchi::getKSJURL("N03", prefCode = 13) %>%
  filter(year==2015) %>%  select(zipFileUrl) %>% 
  pull(1) -> url
tf <- tempfile(fileext = ".zip")
curl::curl_download(url, tf)
unzip(tf, exdir = getwd(), junkpaths = TRUE)
unlink(tf)
rm(tf)

Read a boundary of Tokyo-to and select Tokyo 23ku boundary.

tky_bd <- sf::st_read('N03-15_13_150101.shp') %>%
  dplyr::group_by(N03_007) %>%
  dplyr::summarise(geometry = st_union(geometry)) %>%
  dplyr::ungroup()
tky_bd %>%
  filter(N03_007 %in% c(13101:13123)) -> tky23ku_bd

To bind a shape file and a table file, use inner_join() of dplyr.

# Join a shape file and a table
tky23ku_lph <- dplyr::inner_join(tky23ku_bd, lph_tky, by = c("N03_007" = "CODE"))
## Warning: Column `N03_007`/`CODE` joining factor and character vector,
## coercing into character vector

Finally, you can depict a thematic map of house land price data of Tokyo 23 Ku in 2018.

plot(tky23ku_lph["LP2018"])

Download government statistics of Japan from e-Stat

e-Stat URL: https://www.e-stat.go.jp/

Before downloading data sets, it is necessary to register e-Stat API from the following URL https://www.e-stat.go.jp/mypage/user/preregister

You need to enter your e-Stat api ID as myappID

You can search stats table by stas data ID of e-Stat if you know it. estat_getStatsList() of estatapi returns the list of the stats.

# Search stats list by stats data ID of e-stat if known
StatsList <- estatapi::estat_getStatsList(
  appId = myappID, 
  lang = c("J"),
  searchWord = "",
  statsDataID = "0000010109"
)
str(StatsList)
## Classes 'tbl_df', 'tbl' and 'data.frame':    10000 obs. of  13 variables:
##  $ @id                 : chr  "0003110904" "0003110914" "0003110917" "0003110918" ...
##  $ STAT_NAME           : chr  "国民経済計算" "国民経済計算" "国民経済計算" "国民経済計算" ...
##  $ GOV_ORG             : chr  "内閣府" "内閣府" "内閣府" "内閣府" ...
##  $ STATISTICS_NAME     : chr  "国民経済計算年次推計 平成2年基準 平成10年度確報 フロー編" "国民経済計算年次推計 平成2年基準 平成10年度確報 フロー編" "国民経済計算年次推計 平成2年基準 平成10年度確報 フロー編" "国民経済計算年次推計 平成2年基準 平成10年度確報 フロー編" ...
##  $ TITLE               : chr  "1.統合勘定 2. 国民可処分所得と処分勘定 年度" "1.統合勘定 1. 国内総生産と総支出勘定 年度" "1.統合勘定 2. 国民可処分所得と処分勘定 暦年" "1.統合勘定 3. 資本調達勘定 (1)実物取引 暦年" ...
##  $ CYCLE               : chr  "年次" "年次" "年次" "年次" ...
##  $ SURVEY_DATE         : chr  "199801-199812" "199801-199812" "199801-199812" "199801-199812" ...
##  $ OPEN_DATE           : chr  "2015-09-30" "2015-09-30" "2015-09-30" "2015-09-30" ...
##  $ SMALL_AREA          : chr  "0" "0" "0" "0" ...
##  $ MAIN_CATEGORY       : chr  "企業・家計・経済" "企業・家計・経済" "企業・家計・経済" "企業・家計・経済" ...
##  $ SUB_CATEGORY        : chr  "国民経済計算" "国民経済計算" "国民経済計算" "国民経済計算" ...
##  $ OVERALL_TOTAL_NUMBER: chr  "616" "748" "616" "396" ...
##  $ UPDATED_DATE        : chr  "2016-08-15" "2016-08-15" "2016-08-15" "2016-08-15" ...

estat_getMetaInfo() returns meta data information of the stats.

# Get metadata information
meta_info <- estatapi::estat_getMetaInfo(
  appId = myappID, 
  statsDataId = "0000010109"
)
meta_info
## $tab
## # A tibble: 1 x 3
##   `@code` `@name` `@level`
##   <chr>   <chr>   <chr>   
## 1 00001   観測値  1       
## 
## $cat01
## # A tibble: 384 x 4
##    `@code` `@name`                        `@level` `@unit`
##    <chr>   <chr>                          <chr>    <chr>  
##  1 I1101   I1101_平均余命(0歳)(男)   1        年     
##  2 I1102   I1102_平均余命(0歳)(女)   1        年     
##  3 I1201   I1201_平均余命(20歳)(男) 1        年     
##  4 I1202   I1202_平均余命(20歳)(女) 1        年     
##  5 I1301   I1301_平均余命(40歳)(男) 1        年     
##  6 I1302   I1302_平均余命(40歳)(女) 1        年     
##  7 I1401   I1401_平均余命(60歳)(男) 1        年     
##  8 I1402   I1402_平均余命(60歳)(女) 1        年     
##  9 I1501   I1501_平均余命(65歳)(男) 1        年     
## 10 I1502   I1502_平均余命(65歳)(女) 1        年     
## # ... with 374 more rows
## 
## $area
## # A tibble: 48 x 3
##    `@code` `@name` `@level`
##    <chr>   <chr>   <chr>   
##  1 00000   全国    1       
##  2 01000   北海道  2       
##  3 02000   青森県  2       
##  4 03000   岩手県  2       
##  5 04000   宮城県  2       
##  6 05000   秋田県  2       
##  7 06000   山形県  2       
##  8 07000   福島県  2       
##  9 08000   茨城県  2       
## 10 09000   栃木県  2       
## # ... with 38 more rows
## 
## $time
## # A tibble: 42 x 3
##    `@code`    `@name`  `@level`
##    <chr>      <chr>    <chr>   
##  1 1975100000 1975年度 1       
##  2 1976100000 1976年度 1       
##  3 1977100000 1977年度 1       
##  4 1978100000 1978年度 1       
##  5 1979100000 1979年度 1       
##  6 1980100000 1980年度 1       
##  7 1981100000 1981年度 1       
##  8 1982100000 1982年度 1       
##  9 1983100000 1983年度 1       
## 10 1984100000 1984年度 1       
## # ... with 32 more rows
## 
## $.names
## # A tibble: 4 x 2
##   id    name          
##   <chr> <chr>         
## 1 tab   観測値        
## 2 cat01 I 健康・医療
## 3 area  地域          
## 4 time  調査年

Now, stats data can be obtained by using estat_getStatsData(). By adding condition on statsDataId, cdCat01 or lvArea, you can narrowe down the condition of stats.

# Get stats from e-Stat
SDS_df <- estatapi::estat_getStatsData(
  appId = myappID, 
  statsDataId = "0000010109", # 社会・人口統計体系 都道府県データ 健康・医療
  cdCat01 = "I1601",          # 健康寿命
#  cdCat01 = c("I1601", "I1602"),         # 健康寿命
  lvArea = 2                 # Level of Area
#  lvTime = 1         # Level of Time
#  cdtime = "2010100000"            # 2010年度
#  surveyYears = 2010
  ) 
## Fetching record 1-94... (total: 94 records)
h2013_df <- subset(SDS_df, SDS_df$調査年 %in% "2013年度") %>% 
  as.data.frame()
h2013_df$KENCODE <- as.numeric(h2013_df$area_code)/1000
head(h2013_df)
##   tab_code 観測値 cat01_code       I 健康・医療 area_code   地域
## 1    00001 観測値      I1601 I1601_健康寿命(男)     01000 北海道
## 2    00001 観測値      I1601 I1601_健康寿命(男)     02000 青森県
## 3    00001 観測値      I1601 I1601_健康寿命(男)     03000 岩手県
## 4    00001 観測値      I1601 I1601_健康寿命(男)     04000 宮城県
## 5    00001 観測値      I1601 I1601_健康寿命(男)     05000 秋田県
## 6    00001 観測値      I1601 I1601_健康寿命(男)     06000 山形県
##    time_code   調査年 unit value KENCODE
## 1 2013100000 2013年度   年 71.11       1
## 2 2013100000 2013年度   年 70.29       2
## 3 2013100000 2013年度   年 70.68       3
## 4 2013100000 2013年度   年 71.99       4
## 5 2013100000 2013年度   年 70.71       5
## 6 2013100000 2013年度   年 71.34       6
str(h2013_df)
## 'data.frame':    47 obs. of  11 variables:
##  $ tab_code      : chr  "00001" "00001" "00001" "00001" ...
##  $ 観測値        : chr  "観測値" "観測値" "観測値" "観測値" ...
##  $ cat01_code    : chr  "I1601" "I1601" "I1601" "I1601" ...
##  $ I 健康・医療: chr  "I1601_健康寿命(男)" "I1601_健康寿命(男)" "I1601_健康寿命(男)" "I1601_健康寿命(男)" ...
##  $ area_code     : chr  "01000" "02000" "03000" "04000" ...
##  $ 地域          : chr  "北海道" "青森県" "岩手県" "宮城県" ...
##  $ time_code     : chr  "2013100000" "2013100000" "2013100000" "2013100000" ...
##  $ 調査年        : chr  "2013年度" "2013年度" "2013年度" "2013年度" ...
##  $ unit          : chr  "年" "年" "年" "年" ...
##  $ value         : num  71.1 70.3 70.7 72 70.7 ...
##  $ KENCODE       : num  1 2 3 4 5 6 7 8 9 10 ...

Prefecture boundary GIS data is available my web site as following.

# Download GIS data set from my web page
url.gis <- sprintf("http://web.sfc.keio.ac.jp/~maunz/asakura_sp/asakura_sp_data_1101.zip")
tf <- tempfile(fileext = ".zip")
curl::curl_download(url.gis, tf)
unzip(tf, exdir = getwd(), junkpaths = TRUE)
# Read jpn_pref.shp = prefecture boundary data
jpn_pref <- sf::st_read('jpn_pref.shp')

Finally, by joining a prefecture boundary GIS data and a house land price data, you can plot a thematic map. Here, we plot “Kenko Jumyo (Health life expectancy)” of male in 2013 by prefectural level on the map.

# Join SDS health care data and jpn_pref.shp
jpn_pref_h <- dplyr::inner_join(jpn_pref, h2013_df, by = c("PREF_CODE" = "KENCODE"))
# Plot a map
ggplot(data = jpn_pref_h, aes(fill= value)) + 
  geom_sf() + 
  scale_fill_distiller(palette="RdYlGn") + 
  theme_bw() +
  labs(title = "Health life expectancy of male in Japan (2013)")