Using R and weathercan
to compute weather indices in Canada
When a colleague told me he assigned an undergrad student full-time to copy-paste historical weather data from Environment Canada, I searched into my files and found some code which I could format and share. The weathercan
package, a gem written by Steffi LaZerte, allowed me to automate fetching tasks and save countless hours. All I had to provide was the spatial coordinates of my sites and, since my work was about agricultural data, time periods expressed as beginning of the season to harvest.
library("tidyverse")
library("weathercan")
My csv table contained coordinates and dates.
weather <- read_csv("data/sites.csv")
## Parsed with column specification:
## cols(
## siteid = coldouble(),
## latitude = coldouble(),
## longitude = coldouble(),
## start = coldate(format = “”),
## end = coldate(format = “”)
## )
weather %>%
sample_n(6)
site_id | latitude | longitude | start | end |
---|---|---|---|---|
5 | 45.80300 | -74.96984 | 2016-06-01 | 2016-09-26 |
19 | 45.22333 | -73.53528 | 2017-05-18 | 2017-09-01 |
26 | 45.24454 | -73.48610 | 2014-06-03 | 2014-09-08 |
27 | 46.09112 | -72.34862 | 2014-05-30 | 2014-09-30 |
24 | 45.24550 | -73.48829 | 2015-05-13 | 2015-09-15 |
3 | 46.73023 | -72.26801 | 2016-05-21 | 2016-09-13 |
The first thing to do is to look for the closest stations to each site, which I did within a loop. For each row of my weather
table, I ran the stations_search()
function from the weathercan
package. stations_search()
takes the coordinates, a maximum distance and an interval on which weather data are archived (hourly, daily, weekly, etc.). If I wanted hourly data, stations with a daily resolution would have been discarded.
I store each output of stations_search()
in a list, which is a very useful object type where each element can be anything, from strings to machine learning models.
stations <- list()
for (i in 1:nrow(weather)) {
stations[[i]] <- stations_search(coords = c(weather$latitude[i], weather$longitude[i]),
dist = 50, interval = "day")
}
I verified each element of the list and wrote down the station identifier I wanted. This process could have been done with if
statements within a loop, but since I had not so many sites, this was done quickly by hand. For example, the third site:
stations[[3]]
prov | station_name | station_id | climate_id | WMO_id | TC_id | lat | lon | elev | tz | interval | start | end | distance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
QC | ST UBALD | 5288 | 7017767 | NA | NA | 46.68 | -72.32 | NA | Etc/GMT+5 | day | 1963 | 1964 | 6.854702 |
QC | ST ALBAN | 5255 | 7016800 | NA | NA | 46.72 | -72.08 | 76.2 | Etc/GMT+5 | day | 1949 | 2018 | 14.417353 |
QC | STE ANNE DE LA PERADE | 5257 | 7016840 | NA | NA | 46.58 | -72.23 | 16.0 | Etc/GMT+5 | day | 1949 | 2018 | 16.951883 |
QC | LAC AUX SABLES | 5203 | 701LEEH | NA | NA | 46.87 | -72.40 | 160.0 | Etc/GMT+5 | day | 1964 | 2018 | 18.518982 |
QC | HERVEY JONCTION | 5226 | 7013102 | NA | NA | 46.85 | -72.47 | 177.4 | Etc/GMT+5 | day | 1926 | 1978 | 20.374795 |
QC | ST TITE | 5287 | 7017760 | NA | NA | 46.73 | -72.57 | 141.7 | Etc/GMT+5 | day | 1920 | 1985 | 23.083509 |
QC | DESCHAMBAULT | 27325 | 7011983 | 71389 | WHQ | 46.69 | -71.97 | 61.0 | Etc/GMT+5 | day | 1997 | 2018 | 23.222388 |
QC | ST NARCISSE | 5281 | 7017585 | NA | NA | 46.53 | -72.43 | 46.0 | Etc/GMT+5 | day | 1973 | 2018 | 25.481852 |
QC | HEROUXVILLE | 5225 | 7013100 | NA | NA | 46.67 | -72.60 | 145.0 | Etc/GMT+5 | day | 1966 | 2018 | 26.258655 |
QC | DESCHAMBAULT | 5220 | 7011982 | NA | NA | 46.67 | -71.92 | 15.2 | Etc/GMT+5 | day | 1971 | 2018 | 27.445147 |
QC | STE CHRISTINE | 5266 | 7017000 | NA | NA | 46.82 | -71.92 | 152.1 | Etc/GMT+5 | day | 1950 | 2018 | 28.390738 |
QC | CHAMPLAIN | 5211 | 7011290 | NA | NA | 46.47 | -72.33 | 12.0 | Etc/GMT+5 | day | 1980 | 2011 | 29.315576 |
QC | RIVIERE A PIERRE | 5253 | 7016560 | NA | NA | 47.00 | -72.17 | 221.0 | Etc/GMT+5 | day | 1949 | 1994 | 30.907166 |
QC | FORTIERVILLE | 5362 | 7022494 | NA | NA | 46.48 | -72.05 | 53.3 | Etc/GMT+5 | day | 1973 | 2018 | 32.445967 |
QC | CHUTE PANET | 5214 | 7011600 | NA | NA | 46.87 | -71.87 | 152.4 | Etc/GMT+5 | day | 1949 | 1982 | 34.126112 |
QC | GENTILLY | 5367 | 7022700 | NA | NA | 46.40 | -72.37 | 6.1 | Etc/GMT+5 | day | 1973 | 1975 | 37.533097 |
QC | ST JOSEPH DE MEKINAC | 5276 | 7017422 | NA | NA | 46.92 | -72.68 | 121.9 | Etc/GMT+5 | day | 1973 | 1994 | 37.858884 |
QC | STE FRANCOISE ROMAINE | 5482 | 7027267 | NA | NA | 46.48 | -71.93 | 91.4 | Etc/GMT+5 | day | 1963 | 1985 | 38.005007 |
QC | STE CROIX | 5459 | 7027088 | NA | NA | 46.62 | -71.78 | 70.0 | Etc/GMT+5 | day | 1973 | 1993 | 39.299534 |
QC | SHAWINIGAN | 27646 | 7018001 | 71370 | XSH | 46.56 | -72.73 | 110.0 | Etc/GMT+5 | day | 1998 | 2018 | 40.113102 |
QC | DONNACONA | 5221 | 7012070 | NA | NA | 46.67 | -71.75 | 10.7 | Etc/GMT+5 | day | 1918 | 1964 | 40.179312 |
QC | SHAWINIGAN | 5290 | 7018000 | NA | NA | 46.57 | -72.75 | 121.9 | Etc/GMT+5 | day | 1902 | 2004 | 40.971074 |
QC | DONNACONA 2 | 5222 | 7012071 | NA | NA | 46.68 | -71.73 | 45.7 | Etc/GMT+5 | day | 1952 | 2008 | 41.520499 |
QC | RIVIERE VERTE OUEST | 5254 | 7016675 | NA | NA | 46.98 | -71.83 | 213.4 | Etc/GMT+5 | day | 1966 | 2018 | 43.436700 |
QC | CAP DE LA MADELEINE | 5209 | 7011045 | NA | NA | 46.37 | -72.53 | 17.1 | Etc/GMT+5 | day | 1920 | 1932 | 44.802297 |
QC | BECANCOUR | 5316 | 7020570 | NA | NA | 46.33 | -72.43 | 14.9 | Etc/GMT+5 | day | 1966 | 1994 | 46.193809 |
QC | TROIS-RIVIERES | 10764 | 7018562 | 71724 | WTY | 46.35 | -72.52 | 6.0 | Etc/GMT+5 | day | 1993 | 2018 | 46.477383 |
QC | LAC MINOGAMI | 5235 | 7013678 | NA | NA | 46.67 | -72.87 | 259.1 | Etc/GMT+5 | day | 1964 | 1978 | 46.524598 |
QC | TROIS RIVIERES AQUEDUC | 5201 | 701HE63 | NA | NA | 46.38 | -72.62 | 54.9 | Etc/GMT+5 | day | 1974 | 2009 | 47.374297 |
QC | TROIS RIVIERES | 5292 | 7018564 | NA | NA | 46.37 | -72.60 | 53.3 | Etc/GMT+5 | day | 1934 | 1986 | 47.453040 |
QC | CLUB TOURILLI | 5216 | 7011800 | NA | NA | 47.08 | -71.90 | 249.9 | Etc/GMT+5 | day | 1949 | 1953 | 47.938720 |
QC | DUCHESNAY | 5224 | 7012240 | NA | NA | 46.87 | -71.65 | 166.1 | Etc/GMT+5 | day | 1963 | 1994 | 49.670924 |
QC | MANSEAU | 5402 | 7024615 | NA | NA | 46.33 | -71.98 | 96.0 | Etc/GMT+5 | day | 1978 | 1994 | 49.675412 |
I wrote down each station_id
in a vector merged to my original table, so that each site has its station identifier.
weather$station_ids <- c(5266, 5255, 5255, 5619, 5619, 5619, 5257, 5255, 5203, 5619,
5619, 8321, 8321, 5266, 5393, 10872, 5940, 10869, 10762, 5237,
10843, 5522, 5522, 10843, 5237, 10843, 5522, 5237, 5532)
Now let's fetch our weather data. This is done with the weather_dl()
function. Within a loop for each site (each row of the weather
data table), I used the station_id
, the start time and the end time to fetch daily weather data from Environment Canada and store the extracted data in a list. Of course, you must be connected to the web to do this (and expect some warnings).
weather_tables <- list()
for (i in 1:nrow(weather)) {
weather_tables[[i]] <- weather_dl(station_ids = weather$station_ids[i],
start = weather$start[i], end = weather$end[i],
interval="day")
}
Daily weather data are stored in each element of the list. For example, the first site with selected columns.
weather_tables[[1]] %>%
select(year, month, day, mean_temp, total_precip) %>%
head(10)
year | month | day | mean_temp | total_precip |
---|---|---|---|---|
2016 | 05 | 09 | NA | 0.8 |
2016 | 05 | 10 | 9.5 | 0.0 |
2016 | 05 | 11 | NA | 0.0 |
2016 | 05 | 12 | NA | 0.0 |
2016 | 05 | 13 | 14.8 | 6.2 |
2016 | 05 | 14 | NA | NA |
2016 | 05 | 15 | NA | NA |
2016 | 05 | 16 | 2.3 | 1.4 |
2016 | 05 | 17 | NA | 0.0 |
2016 | 05 | 18 | NA | 0.0 |
You will likely inspect your data tables further. For my part, I’ll jump right away to the computation of weather indices: I need daily precipitations, mean temperature, a Shannon diversity index on precipitations (SDI, 0 means all precipitations occurred the same day, and 1 means it fell uniformly through the season) and growing degree days (GDD, the sum of Celsius degrees higher than a threshold). Total precipitations and mean temperature can be computed with sum and a mean. SDI and GDD will need custom functions.
SDI_f <- function(x) {
p <- x/sum(x, na.rm = TRUE)
SDI <- -sum(p * log(p), na.rm = TRUE) / log(length(x))
return(SDI)
}
GDD_f <- function(x, delim = 5) {
sum(x[x >= delim], na.rm = TRUE)
}
Since I want to store these indices in my original table, I create empty columns.
weather$total_precip <- NA
weather$mean_temp <- NA
weather$SDI <- NA
weather$GDD <- NA
I compute each index in this final loop, where total precipitation is the sum of daily total precipitations and mean temperature is the mean of daily temperatures. SDI and GDD are computed with the functions I defined earlier.
for (i in 1:nrow(weather)) {
# cumulated prcipitations
weather$total_precip[i] <- sum(weather_tables[[i]]$total_precip, na.rm = TRUE)
# mean temperature
weather$mean_temp[i] <- mean(weather_tables[[i]]$mean_temp, na.rm = TRUE)
# Shannon diversity index of precipitations
weather$SDI[i] <- SDI_f(weather_tables[[i]]$total_precip)
# Growing degree days
weather$GDD[i] <- GDD_f(weather_tables[[i]]$mean_temp, delim = 5)
}
That' it, our weather table is done!
weather %>%
sample_n(6)
site_id | latitude | longitude | start | end | station_ids | total_precip | mean_temp | SDI | GDD |
---|---|---|---|---|---|---|---|---|---|
20 | 45.93424 | -73.32026 | 2016-05-11 | 2016-09-19 | 5237 | 471.0 | 19.16308 | 0.5665234 | 2486.9 |
7 | 46.53156 | -72.23351 | 2017-05-21 | 2017-10-03 | 5257 | 324.5 | 17.63529 | 0.7089004 | 2398.4 |
21 | 45.24248 | -73.51408 | 2016-05-20 | 2016-09-06 | 10843 | 314.2 | 19.86542 | 0.6048280 | 2125.6 |
3 | 46.73023 | -72.26801 | 2016-05-21 | 2016-09-13 | 5255 | 405.6 | 18.53846 | 0.7054556 | 1687.0 |
12 | 46.43994 | -72.66337 | 2017-05-27 | 2017-09-28 | 8321 | 421.4 | 17.30579 | 0.7033883 | 2094.0 |
5 | 45.80300 | -74.96984 | 2016-06-01 | 2016-09-26 | 5619 | 428.8 | 17.56250 | 0.6450922 | 281.0 |
Data and codes are available on Gitlab