-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathClimatology.Rmd
1427 lines (1197 loc) · 77.9 KB
/
Climatology.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Quantifying relationships between environmental factors and accumulated tornado power on the most prolific days in the largest ``outbreaks"
author: "Zoe Schroder/James Elsner"
date: "7/31/2018"
output: github_notebook
editor_options:
chunk_output_type: console
---
Research to be submitted to the **International Journal of Climatology**
Title: Quantifying relationships between environmental factors and power dissipation on the most prolific days in the largest tornado "outbreaks"
##Abstract
Load packages and suppress the messages of the packages.
```{r}
suppressMessages(library(lubridate))
suppressMessages(library(sf))
suppressMessages(library(tmap))
suppressMessages(library(USAboundaries))
suppressMessages(library(rgeos))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(xts))
suppressMessages(library(raster))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(lme4))
suppressMessages(library(xtable))
suppressMessages(library(ggrepel))
suppressMessages(library(viridis))
suppressMessages(library(gridExtra))
```
##Introduction
Tornadoes pose a significant risk to life and property in the United States. The strongest tornadoes producing the majority of fatalities occur in clusters \citep{ElsnerElsnerJagger2015, FuhrmannEtAl2014, DeanandSchneider2012, SchneiderEtAl2004, Mercer2009, Galway1977}. In fact, three-fourths of all fatalities occur on days with the most tornadoes within a multi-day outbreak. For example, the April 27, 2011 outbreak produced 199 tornadoes that resulted in 316 fatalities and more than 2700 injuries. Insured losses exceeded \$11 billion \citep{Knupp2014}.
Tornado clusters (`outbreaks') occur generally east of the Rocky Mountains and west of the Appalachian Mountains \citep{Dean2010} during the months of April, May, and June \citep{DixonEtAl2014, TippettEtAl2014, TippettEtAl2012, Trapp2014, Dean2010, Galway1977}. Outbreaks are largely confined to the Southeast during the late fall and winter months \citep{Dean2010}. The percentage of all U.S.~tornadoes occurring in clusters is on the rise \citep{Moore2017, TippettEtAl2016, ElsnerElsnerJagger2015, TippettEtAl2014, BrooksEtAl2014}. Why this is happening could be related to changes in environmental factors that influence the amount and intensity of deep convection.
To shed light on this question, here we examine the relationships between collective tornado activity within a cluster and the associated environmental variables. Studies have identified environmental factors important to the development of tornadoes \citep{AndersonFrey2018, ChengEtAl2016, DeanandSchneider2012, JacksonandBrown2009, DoswellandEvans2003, Brown2002, Craven2002, Brooks1994}. Missing from these studies is a quantification of the relationships between environmental factors and collective tornado activity. Specifically, how much convective available potential energy is needed, for example, to produce a ten percent increase in accumulated tornado power (ATP)?
The objective here is to quantify the extent to which environmental factors influence ATP. We first identify the biggest days in the largest clusters of tornadoes. To quantify the relationships we regress ATP onto convective available potential energy (CAPE), convective inhibition (CIN), helicity, and bulk shear using tornadoes occurring on these big days. Values for the predictor variables are extracted from reanalysis data. Finally we examine model residuals for goodness of fit.
The paper is outlined as follows. The method to define tornado clusters and the selection criteria for determining the big days within large clusters are described in \S2. Tornado power dissipation is defined and estimated in \S3. The environmental variables are described in \S4 and the statistical relationships between ATP and the environmental variables are modeled and described in \S5. A summary of the paper and a list of conclusions along with ways the study can be improved are given in \S6.
##Tornado Clusters
`Definitions and Data`
A tornado can occur in isolation or within a cluster with other tornadoes. The American Meteorological Society formally defines a tornado outbreak as "multiple tornado occurrences associated with a particular synoptic-scale system" \citep{AMS2018}. Less formally it is understood that an outbreak is a cluster of several to dozens of tornadoes that occur within a relatively short time scale and over a limited geographic region \citep{MalamudTurcotteBrooks2016, TippettEtAl2016, ElsnerElsnerJagger2015}. We focus on tornado clusters in this work rather than on individual tornadoes because clusters have a larger spatial and temporal extent than individual tornadoes that better matches the scale represented by the environmental data. We refer to them as "clusters" rather than "outbreaks"" because we make no attempt to associate the clusters with a particular synoptic-scale system.
In this paper we consider only tornadoes occurring on convective days having at least ten tornadoes when those days are part of a cluster of at least 30 tornadoes. This requires a two step approach. Step one groups the tornadoes into clusters and step two selects tornadoes from clusters with at least 30 tornadoes on days with at least ten tornadoes.
We obtain the tornado data from the Storm Prediction Center’s extensive tornado record (\url{https://www.spc.noaa.gov/wcm/#data}). Date, time, and location of each tornado are used to delineate groups of tornadoes. The data are subset to include only contiguous United States tornadoes that occur from 1994 to 2017, inclusive. The start year marks the beginning of the extensive use of the WSR-88D radar. There are 29,372 tornadoes in the available record over this period of time.
**The newest GIS shapefile contains missing geometries for more than 30% of the tornadoes. The number of missing geometries is highest after 1995. Instead here we use the csv file from https://www.spc.noaa.gov/wcm/#data Use the start lon/lat and create a `sp` object then convert to `sf`. Set the coordinate reference system (crs) to ESPG 4326.**
```{r, eval = FALSE}
Tor.spdf <- read.csv(file = "1950-2017_actual_tornadoes.csv")
sp::coordinates(Tor.spdf) <- ~ slon + slat
Tor.sfdf <- st_as_sf(Tor.spdf)
st_crs(Tor.sfdf) <- 4326
```
**Remove tornadoes in Hawaii, Alaska, and Puerto Rico and those occurring before 1994. That year marks the beginning of comprehensive WSR-88D radar. For missing EF ratings use the modification rules (if/else) defined here: https://www.spc.noaa.gov/wcm/OneTor_F-scale-modifications.pdf **
```{r, eval = FALSE}
All_Tornadoes <- Tor.sfdf %>%
filter(yr >= 1994,
!st %in% c("AK", "PR", "HI")) %>%
mutate(mag = ifelse(mag == -9 & len <= 5, 0, mag),
mag = ifelse(mag == -9 & len > 5, 1, mag))
```
**Add a data/time column also add columns for path length, width, and area in metric units. Leave the time zone as native CDT. Create a convective day (6AM to 6AM) column taking hours 00:00:00 -> 05:59:59 and assigning it to the previous date (this associates the previous day's date to tornadoes occurring up to 6 hours after local midnight).**
```{r, eval = FALSE}
All_Tornadoes <- All_Tornadoes %>%
mutate(dy = format(as.Date(date,format="%m/%d/%y"), "%d"),
DateTime = as.POSIXct(paste(yr, mo, dy, time), format = "%Y%m%d%H:%M:%S"),
Hour = hour(DateTime),
Year = year(DateTime),
cDateTime = DateTime - as.difftime(6, unit = "hours"),
cDate = as.Date(as_datetime(ifelse(Hour < 6, (DateTime - 86400), cDateTime), tz = Sys.timezone())),
Length = len * 1609.34,
Length = ifelse(Length == 0, min(Length[Length > 0]), Length), #takes care of zero length
Width = wid * .9144,
Width = ifelse(Width == 0, min(Width[Width > 0]), Width), #takes care of zero width
Width = ifelse(Year >= 1995, Width * pi/4, Width), #takes care of change: avg to max
cas = inj + fat,
AreaPath = Length * Width,
Ma = factor(month.abb[mo], levels = month.abb[1:12])) %>%
sf::st_sf()
dim(All_Tornadoes)[1]
```
**The geometry type is `POINT`. Each tornado is represented as a single point location geometry (start location). Add power dissipation per tornado. **
```{r, eval = FALSE}
perc <- c(1, 0, 0, 0, 0, 0,
.772, .228, 0, 0, 0, 0,
.616, .268, .115, 0, 0, 0,
.529, .271, .133, .067, 0, 0,
.543, .238, .131, .056, .032, 0,
.538, .223, .119, .07, .033, .017)
percM <- matrix(perc, ncol = 6, byrow = TRUE)
threshW <- c(29.06, 38.45, 49.62, 60.8, 74.21, 89.41)
midptW <- c(diff(threshW)/2 + threshW[-length(threshW)], threshW[length(threshW)] + 7.5)
ef <- All_Tornadoes$mag + 1
EW3 <- numeric()
for(i in 1:length(ef)) EW3[i] = midptW^3 %*% percM[ef[i], ]
All_Tornadoes <- All_Tornadoes %>%
mutate(ED = EW3 * AreaPath)
```
## Group tornadoes into clusters
First we project the geographic coordinates of the tornado locations using a Lambert conic conformal projection for the contiguous United States. The projection origin is situated in eastern Kansas (39$^\circ$ N latitude and 96$^\circ$ W longitude). Then the Euclidean distance ($d_{ij}$) between the genesis location of tornadoes $i$ and $j$ is computed for all tornado pairs. Similarly, the time separating each tornado pair ($t_{ij}$) is computed and added to a scaled Euclidean distance to give a space-time difference ($\delta_k$). The equation is
$$
\delta_k = d_{ij}/s + t_{ij},
$$
where $s$ is a scaling factor and where $k = n(n+1)/2$ indexes the unique tornado pairs where $n$ is the number of tornadoes. The scaling factor is set to 15~m~s$^{-1}$ to match the units of $t_{ij}$ (seconds) and to match the average speed of tornado-producing thunderstorms (although there is wide variation in this average).
Next, the $k$ space-time differences ($\delta_k$) are used to group the individual tornadoes into clusters. If tornado $i$ is in close proximity to tornado $j$ based on a small value of $\delta_k$, then the two tornadoes are considered to belong to the same cluster. Clustering is done using the single-linkage method whereby the two tornadoes with the smallest $\delta_k$ are grouped first. Then the two tornadoes (or the first tornado cluster and another tornado) with the next smallest $\delta_k$ are grouped second. The procedure continues by grouping tornado pairs, cluster-tornado pairs, and cluster-cluster pairs until there is a single large cluster. A cluster-tornado pair occurs when the shortest distance is between the closest tornado in the cluster and a tornado not in the cluster. For example, three tornadoes each 100 km ($\sim$2 hours at 15~m~s$^{-1}$) apart occurring at the same time are considered a cluster. A fourth tornado is considered in the cluster if it is no more than 100 km from {\it any} one of the other three tornadoes. The grouping is done with the {\texttt hclust} function from the {\texttt stats} package in R.
**Determine the distance between tornadoes in space and time. Use a projection, not lat/lon. See https://epsg.io/102004. Extract the coordinates of the start locations as a N by 2 matrix, where N is the number of tornadoes. Also extract the date-time as a vector of class `POSIXct`.**
```{r, eval = FALSE}
All_Tornadoes <- st_transform(All_Tornadoes, crs = 102004)
space <- st_coordinates(All_Tornadoes)
time <- All_Tornadoes$DateTime
```
**Next compute pairwise Euclidean distances in space and, separately, in time using the `dist()` function. Divide the spatial distance by 15 so that the values are commensurate with the time 'distance' based on the assumption of 15 meters per second (~34 mph) for an average speed of tornado-generating storms. Compare: Distance from New York to Denver is 2.622 x 10^6 meters. There are 3.154 x 10^7 seconds in a year. This will capture the historic multiday tornado outbreaks. For analysis we want to consider each day in the multiday group separately. As the value of the divisor increases cluster areas get larger. Remove `ds` and `dt` to free memory. Distances are saved as an object of class `dist` containing a vector of length N * (N-1)/2, which is the number of unique point pairs.**
```{r, eval = FALSE}
ds <- dist(space) / 15
dt <- dist(time)
dst <- ds + dt
rm(ds, dt)
```
Our interest centers on clusters that are not too small (e.g., a family of tornadoes from a single supercell) and not too large (e.g., all tornadoes during a week). So we stop grouping once there are no additional pairs within a $\delta_k$\ of 50K ($\sim$14 space-time hours). This results in 6,156 unique clusters and 155 large (at least 30 tornadoes) clusters. The largest cluster occurred from April 26--28, 2011. It contains 293 tornadoes. Duration of the clusters ranges from 46 one-day clusters to one five-day cluster (Table~\ref{GroupsbyDuration}). Multi-day clusters account for 70\% of all our clusters. The cluster with the longest duration occurred from September 4--8, 2004 and contained 103 tornadoes (Fig.~\ref{LargestGroup}). Our clusters match the outbreaks identified subjectively by \cite{Forbes2006} with an agreement rate of 88\%.
**Next group the tornadoes based on the space-time distances. This is done with the `hclust()` (hierarchical cluster) function. Initially, each tornado is assigned to its own group and then the algorithm joins the two closest tornadoes determined by values in `dst`. The algorithm continues by joining tornadoes (and tornado groups) until there is a single large group. The single linkage method (`method = "single"`) is related to the minimal spanning tree (MST) and adopts a 'friends of friends' grouping strategy. An edge-weighted graph is a graph where each edge has a weight (or cost). Here weights are space-time distances between tornadoes. A MST of an edge-weighted graph is a spanning tree whose weight (the sum of the weights of its edges) is no larger than the weight of any other spanning tree. A spanning tree of a graph on N vertices (tornado centroids) is a subset of N-1 edges that form a tree (Skiena 1990, p. 227).The `cutree()` function is used to extract a group number for each tornado. Tornadoes in each group are close in space & time. Here the tree is cut at a height of 50000 space-time units. Making `h` smaller results in smaller groups (fewer tornadoes per group).**
```{r, eval = FALSE}
stime <- proc.time()
tree <- hclust(dst, method = "single")
groupNumber <- as.integer(cutree(tree, h = 50000))
proc.time() - stime
```
**Add the group number to each tornado. **
```{r, eval = FALSE}
All_Tornadoes$groupNumber <- groupNumber
```
**Compute group-level statistics. Keep only tornado groups with at least 30 tornadoes.**
```{r, eval = FALSE}
Groups.sfdfT <- All_Tornadoes %>%
group_by(groupNumber) %>%
summarize(Year = first(Year),
Month = first(mo),
FirstDate = first(date),
LastDate = last(date),
Name = paste(FirstDate, "to", LastDate),
FirstcDate = first(cDate),
LastcDate = last(cDate),
ncD = n_distinct(cDate),
nT = n(),
n0 = sum(mag == 0),
n1 = sum(mag == 1),
n2 = sum(mag == 2),
n3 = sum(mag == 3),
n4 = sum(mag == 4),
n5 = sum(mag == 5),
ATP = sum(ED),
ATP_TW = paste(round(ATP/10^12), "TW"),
maxEF = max(mag),
nD = n_distinct(date),
StartTime = first(DateTime),
EndTime = last(DateTime),
Duration = difftime(EndTime, StartTime, units = "secs"),
cas = sum(inj + fat)) %>%
filter(nT >= 30)
dim(Groups.sfdfT)
```
```{r}
as.data.frame(Groups.sfdfT) %>%
group_by(nD) %>%
summarize(tot_events = n(),
tot_torn = sum(nT))
(155-46)/155 * 100
```
`Table 1: The total number of large groups and tornadoes by duration.` #CORRECT
**Get the tornadoes that are in the 155 large groups. **
```{r, eval = FALSE}
GroupTornadoes <- All_Tornadoes %>%
filter(groupNumber %in% Groups.sfdfT$groupNumber)
```
############################
## Forbes 2004 Comparison ##
############################
**Over the period 1994-2017 there were 155 tornado groups with at least 30 tornadoes. How many of these groups are not considered `outbreaks`? How many previously considered `outbreaks` are missed by this algorithm?**
```{r, eval = FALSE}
#FORBES 2004
Groups.sfdfT %>%
filter(Year <= 2004) %>%
top_n(n = 13, wt = nT) %>%
arrange(desc(nT))
```
**False Positive: How many outbreaks Schroder and Elsner identified, not identified by Forbes/Fuhrmann**
***False Negative: How many Forbes/Fuhrmann picked up that we did not**
**% Match: F match Z / sum(F & Z )**
```{r, eval = FALSE}
FPFN <- read.csv("FPFN_Forbes.csv")
FPFN <- FPFN[1:8,]
FPFN
```
**Map of the longest tornado outbreak. Tornado points colored by cDate. **
```{r, eval = FALSE}
longestgroup <- Groups.sfdfT %>%
filter(groupNumber == 3075)
longestgroup <- st_convex_hull(longestgroup)
#longestdays <- BigDays.sfdfT %>%
# filter(groupNumber == 3075)
longestdaytorns <- GroupTornadoes %>%
filter(groupNumber == 3075)
```
```{r, eval = FALSE}
states.sf <- us_states()
counties.sf <- us_counties()
longestdaytorns <- longestdaytorns %>%
mutate(ID = as.factor(gsub("-", "", cDate)))
longestdaytorns$ID <- factor(longestdaytorns$ID, levels = rev(levels(longestdaytorns$ID)))
```
```{r}
sts <- state.name[!state.name %in% c("Alaska", "Hawaii")]
stateBorders <- us_states(states = sts)
tm_shape(stateBorders) +
tm_borders(col = "gray70") +
tm_layout(legend.bg.color = "white", legend.text.size = .75) +
#tm_shape(counties.sf) +
# tm_borders(col = "gray40", alpha = .3) +
tm_shape(longestgroup, projection = "merc", is.master = TRUE) +
tm_borders(col = "gray15", alpha = 1, lwd = 5) +
tm_scale_bar(width = .3, size = 0.9, lwd = 2, color.dark = "gray70") +
tm_compass(size = 3, lwd = 2, fontsize = 1, color.dark = "gray70") +
tm_format("World", legend.position = c("left", "top"),
attr.position = c("left", "top"),
legend.frame = FALSE,
#title = "Longest Group of Tornadoes",
#title.size = 1.3,
#title.position = c("left", "TOP"),
inner.margins = c(.05, .2, .05, .2)) + #.15S, .15W, .1,N .2 E
tm_layout(legend.bg.color = "white", legend.text.size = .5) +
tm_shape(longestdaytorns) +
tm_symbols(size = 4, alpha = 0.8, col = "ID", n = 5, palette = "seq", title.col = "September", labels = c("8th", "7th", "6th", "5th", "4th"), border.alpha = 0) +
tm_layout(legend.title.size = 1.1,
legend.position = c("right", "bottom"),
legend.stack = "horizontal",
legend.frame = FALSE,
legend.text.size = 1, legend.width = -0.3, aes.palette = list(seq = "-Greys"))
```
`Figure 1: A cluster of tornadoes in 2004 that occurred between September 4th and 8th. Each circle is a tornado genesis location colored by the day of occurrence. The black line is the minimum convex polygon surrounding all the genesis locations (convex hull)` #CORRECT
`Select tornadoes from large clusters on days with at least ten tornadoes`
Our objective is to quantify the extent to which well-known environmental factors statistically explain tornado activity at an aggregate level as measure by the accumulated power dissipation. Since some of the environmental factors have large diurnal fluctuations that can confound a multi-day analysis, we narrow our focus further by considering only the most prolific days in these largest groups. We define the day as the 24-hour period starting at 6 AM local time (often referred to as the `convective' day) \citep{DoswellEtAl2006}. A big convective day (big day) as part of a large cluster is defined as one with at least ten tornadoes.
**Filter individual tornadoes to remove those that are not part of a large group. Group by group number and convective dates. Remove days within big groups (group days) having fewer than 10 tornadoes. There are 212 big days in large groups. These will be used for further analysis and modeling.**
```{r, eval = FALSE}
BigDays.sfdfT <- All_Tornadoes %>%
filter(groupNumber %in% Groups.sfdfT$groupNumber) %>%
group_by(groupNumber, cDate) %>%
summarize(nT = n(),
n0 = sum(mag == 0),
n1 = sum(mag == 1),
n2 = sum(mag == 2),
n3 = sum(mag == 3),
n4 = sum(mag == 4),
n5 = sum(mag == 5),
GroupDayTotalED = sum(ED),
GroupDayMaxED = max(ED),
GroupDayMeanED = mean(ED),
GroupDayCas = sum(cas),
GroupDayFat = sum(fat),
StartTime_CST = first(DateTime),
EndTime_CST= last(DateTime),
StartTime_UTC = StartTime_CST + 21600,
EndTime_UTC = EndTime_CST + 21600,
Duration = difftime(EndTime_CST, StartTime_CST, units = "secs")) %>%
filter(nT >= 10) %>%
mutate(Year = year(cDate),
Mo = month(cDate),
Month = format(cDate, "%m"), # this is needed to preserve the leading zeros
Day = format(cDate, "%d"),
ATP = GroupDayTotalED/10^12)
dim(BigDays.sfdfT)
```
With this definition, we find 212 big days within our large clusters. Note that there are sometimes more than one big day in a single large cluster. Also, ten or more tornadoes can occur within smaller clusters, and our set of big days accounts for only 28.6\% of all days with at least ten tornadoes. The top two big days (April 26, 2011, and April 27, 2011) are associated with the largest tornado cluster (Table~\ref{BiggestDays}). Note that this set of big days identified and analyzed in this paper is unchanged for values of $s$ (Eq.~\ref{stdiff}) ranging between 8 and 20~m~s$^{-1}$.
**What is the percentage of all big days (>= 10 tornadoes) that occur within a big group? 29% of all big days (>= 10 tornadoes) occur within a big group/outbreak (>= 30 tornadoes)**
```{r, eval = FALSE}
TotalBigDays <- All_Tornadoes %>%
group_by(groupNumber, cDate) %>%
summarize(nT = n()) %>%
filter(nT >= 10)
dim(BigDays.sfdfT)[1]/dim(TotalBigDays)[1] * 100
```
**Create a unique ID for each big day and each tornado. Extract the tornadoes associated with each big day using the unique ID.**
```{r, eval = FALSE}
BigDayTornadoes <- All_Tornadoes %>%
mutate(ID = paste0(gsub("-", "", cDate), groupNumber))
BigDays.sfdfT <- BigDays.sfdfT %>%
mutate(ID = paste0(gsub("-", "", cDate), groupNumber))
BigDayTornadoes <- BigDayTornadoes %>%
filter(ID %in% BigDays.sfdfT$ID)
sum(BigDays.sfdfT$nT)
```
```{r, eval = FALSE}
as.data.frame(BigDays.sfdfT) %>%
dplyr::select(cDate, nT, GroupDayCas, ATP) %>%
top_n(n = 10, wt = nT) %>%
arrange(desc(nT))
```
`Table 2: The top ten big days in the largest tornado clusters. ATP is the accumulated tornado power.` #CORRECT
Figure~\ref{May30} is an example of a big day in a large cluster. There were 88 tornadoes on that day. The cluster is identified as the eighth most prolific by our method (and the first most prolific by \cite{Forbes2006}) and extended over a two convective day period beginning on May 30th. This is the seventh largest of our big days as defined by the number of tornadoes in any large cluster.
**Create a map showing the frequency of big tornado days by county. First transform the CRS of the county boundaries to match the CRS of `BigDays.sfdfT`.**
```{r, eval = FALSE}
counties.sf <- st_transform(counties.sf,
crs = st_crs(BigDays.sfdfT))
stateBorders <- st_transform(stateBorders,
crs = st_crs(BigDays.sfdfT))
```
**Extract the geographic centroids of tornado genesis on big tornado days by creating an `sf` object using `st_centroid()` that reduces the MULTIPOINT geometry to a POINT geometry. Compute the area of the group and the number of tornadoes per area (density).**
```{r}
groupDayCentroids.sfdfT <- st_centroid(BigDays.sfdfT)
groupDayCentroids.sfdfT$groupArea <- st_area(st_convex_hull(BigDays.sfdfT))
groupDayCentroids.sfdfT$groupDensity <- groupDayCentroids.sfdfT$nT/groupDayCentroids.sfdfT$groupArea
```
*Extract the Hull, centroids, and tornadoes for the May 30, 2004 big day. *
```{r, eval = FALSE}
May30 <- BigDays.sfdfT %>%
filter(cDate == "2004-05-30")
May30 <- st_convex_hull(May30)
May30centroid <- groupDayCentroids.sfdfT %>%
filter(cDate == "2004-05-30")
May30tornadoes <- BigDayTornadoes %>%
filter(groupNumber == May30centroid$groupNumber)
May30tornadoes <- May30tornadoes %>%
mutate(Hour2 = ifelse(Hour <= 6, Hour + 24, Hour))
```
**Make a map of the May 30 tornado day. Obtain the state and county boundaries from the `USAboundaries` package. **
```{r, eval = FALSE}
May30tornadoes$Hour2 <- cut(May30tornadoes$Hour2, breaks=c(6,12,18,24,30))
tm_shape(stateBorders) +
tm_borders(col = "gray70", alpha = 1) +
tm_scale_bar(width = .3, size = 0.8, lwd = 1.75, color.dark = "gray70") +
tm_compass(size = 3, fontsize = 1, lwd = 2, color.dark = "gray70") +
tm_layout(legend.bg.color = "white",
legend.text.size = .75,
attr.position = c("right", "top"),
inner.margins = c(.15, .15, .1, .2)) +
#tm_shape(counties.sf) +
# tm_borders(col = "gray40", alpha = .3) +
# tm_scale_bar(width = 8, size = 8, color.dark = "gray70") +
#tm_format("World", legend.position = c("right", "top"),
# attr.position = c("right", "top"),
# legend.frame = FALSE,
#title = "May 30th Tornado Group",
#title.size = 1.3,
#title.position = c("left", "TOP"),
# inner.margins = c(.2, .2, .2, .2)) +
tm_shape(May30, is.master = TRUE, projection = "merc") +
tm_borders(col = "black", lwd = 3) +
tm_shape(May30tornadoes) +
tm_symbols(size = 4, col = "Hour2", alpha = 0.8, palette = "Greys", title.col = "Time [CST]", labels = c("6 to 12", "12 to 18", "18 to 24", "0 to 6"), border.alpha = 0) +
tm_layout(legend.title.size = 1.1,
legend.position = c("right", "bottom"),
legend.stack = "horizontal",
legend.frame = FALSE,
legend.text.size = 1, legend.width = -0.2) +
tm_shape(May30centroid) +
tm_symbols(size = 1.25, col = "black", shape = 24)
```
`Figure 2: Tornadoes occurring on May 30, 2004 as part of a big day within a large cluster. Each point represents a genesis location and is colored by the hour it occurred. The black triangle is the geographic center of the genesis locations. The black line is the minimum convex polygon around the genesis locations (convex hull).` #CORRECT
Most big days occur east of the Rockies and west of the Appalachians depicted by the centroids (Fig.~\ref{Centroids}). In particular, there is a group of centroids that spans the middle South extending northwestward toward the central Great Plains. There is also a tendency for days having the most tornadoes to occur farther to the east.
**Obtain the group day hulls. Transform the CRS to match that of the environmental data raster grids.**
```{r, eval = FALSE}
BigDays.sfdfT <- st_convex_hull(BigDays.sfdfT)
BigDays.sfdfT$HullArea <- st_area(BigDays.sfdfT)
BigDays.sfdfT <- st_transform(BigDays.sfdfT,
crs = "+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
**Plot the centroid for each group day and size the aesthetic by the number of tornadoes.**
```{r}
cr <- RColorBrewer::brewer.pal(6, "Greys")
cr <- cr[-c(1:2)]
tm_shape(stateBorders) +
tm_borders(col = "gray70") +
tm_compass(size = 3, position = c("left", "bottom"), fontsize = 1, lwd = 2, color.dark = "gray70") +
tm_scale_bar(width = 0.2, size = 0.9, lwd = 2, color.dark = "gray70") +
#tm_shape(counties.sf) +
# tm_borders(col = "gray40", alpha = .3) +
# tm_compass(position = c("left", "bottom"), color.dark = "gray70") +
# tm_scale_bar(width = 8, size = 8, color.dark = "gray70") +
tm_format("World",
#title = "Big Day Centroids",
#title.size = 1.3,
#title.position = c("left", "TOP"),
inner.margins = c(.1, .1, .1, .1)) +
tm_shape(groupDayCentroids.sfdfT, is.master = TRUE, projection = "merc") +
tm_symbols(col = "nT", n = 4, alpha = 0.8,
palette = cr,
title.col = "Count",
labels = c("10 to 50", "50 to 100", "100 to 150", "150 to 200"),
shape =24,
scale = 2, border.alpha = 0) +
tm_layout(legend.title.size=1.1,
legend.text.size = 1,
legend.position = c("right","top"),
legend.stack = "horizontal",
legend.bg.color = "white",
legend.frame = FALSE,
legend.width = -0.17)
```
`Figure 3: Centroids of genesis locations occurring on big days in large clusters. Triangles are sized and colored by the number of tornadoes on that day.` #CORRECT
`Accumulated Tornado Power`
We use tornado counts to define clusters and big days but our interest centers on the accumulated power dissipated over all tornadoes occurring during a big day. The standard indicator of tornado strength is the Enhanced Fujita scale \citep{MalamudTurcotte2012}, but path length and width are sometimes used to compute other intensity metrics \citep{BrooksLeeCraven2003, FuhrmannEtAl2014, MalamudTurcotte2012}. Over a cluster of tornadoes, the Destructive Potential Index (DPI) has been used as a measure of the potential for damage and casualties \citep{ThompsonVescio1998}. The adjusted Fujita mile is a collective measure that uses the highest EF rating multiplied by the tornado track length \citep{FuhrmannEtAl2014}.
Here we follow the work of \cite{FrickerElsnerJagger2017} in defining the power dissipation ($E$) of a tornado as the potential of the wind to inflict damage to objects on the surface. It is calculated using damage path area ($A_p$), air density ($\rho$), midpoint wind speed ($v_j$) for each EF rating ($j = 0, \cdots, J$, where $J$ is the maximum EF rating), and the fraction of the damage path ($w_j$) associated with each rating. $E$ is strongly correlated to DPI but more useful here because it is an extensive variable. As such we sum $E$ over all tornadoes occurring during a big day to get the accumulated tornado power (ATP). Mathematically, we express $E$ and ATP as:
$$
E = A_p \rho \sum_{j=0}^{J} w_j v_j^{3},
$$
and
$$
ATP = \sum_{i = 1}^n E_i
$$
where $n$ is the number of tornadoes occurring in the big day.
The reporting of path width changed from an `average' to the maximum in 1994. Our study starts with 1994 and so it is not impacted by this change. ATP is calculated using path width and the highest EF rating of each tornado on the big day. Therefore, ATP is considered a maximum estimate of power dissipation on a given day.
A list of the top ten big days in large clusters by ATP includes the infamous days of April 27, 2011 and May 4, 2003 (Table~\ref{BiggestDays}). The ATP on April 27, 2011 is nearly four times the ATP on the next most powerful day (April 26, 2011). The Spearman rank correlation between ATP and the number of tornadoes is 0.63. Big days occurring as part of a large cluster are most likely to occur during April through June (Table~\ref{seasonalATP}). July and August have the fewest big days. Monthly average ATP peaks in April with the next highest months being March and May. May and November have similar values of average ATP. There are fewer big tornado days during November, but when they occur they tend to include stronger tornadoes with longer paths leading to more ATP.
```{r}
BigDays.sfdfT %>%
group_by(Month) %>%
summarize(avgATP = mean(ATP),
nT = sum(nT),
nBD = n())
```
`Table 3: Seasonal variation in accumulated tornado power (ATP), number of tornadoes, and the number of big days by month. The number of tornadoes and the number of big days are based on tornadoes occurring during the period 1994--2017.` #CORRECT
*Use a Spearman's correlation to quantify the relationship between ATP and the number of tornadoes. *
```{r}
cor.test(x = BigDays.sfdfT$ATP,
y = BigDays.sfdfT$nT,
method = 'spearman')
```
*Spearman's rank correlation between ATP and number of tornadoes is .633*
`Environmental Variables`
To quantify the relationship between ATP and environmental factors on big days we obtain environmental variables from the National Center for Environmental Prediction's (NCEP) North American Regional Reanalysis (NARR). The data are available from the National Center for Atmospheric Research (NCAR). Variables from the NARR have been used previously to analyze convective environments \citep{BrooksLeeCraven2003, GensiniandAshley2011, MesingerEtAl2006}. Tornado environments have been studied without NARR using proximity soundings and weather stations \citep{PotvinEtAl2010}. Here we are interested in aggregate tornado activity occurring over a broad spatial scale so the NARR variables are used rather than proximity soundings.
We use the original NARR 3-hourly files containing environmental data for each day ranging from 00~UTC to 21~UTC in 3-hour increments. For each big day, we choose the closest time before the occurrence of the first tornado (Table~\ref{BigDayZtime}). This allows us to capture the environment {\it before} the occurrence of tornadoes. The majority of times selected are after 12 UTC with the peak occurring at 12~UTC.
**Round the UTC time to nearest 6 hours. This is done with the `align.time()` function from the `xts` package. Adjust it by 3 hours to get the closest time. This falls within the outbreak so you need to subtract by 3 hours (10800 seconds). This will produce the closest 3 hour NARR time that occurs before and not within the big day. **
```{r, eval = FALSE}
BigDays.sfdfT$StartTime_UTC <- force_tz(BigDays.sfdfT$StartTime_UTC, tzone = "UTC")
BigDays.sfdfT$NARRtime <- (align.time(BigDays.sfdfT$StartTime_UTC, n = (60 * 60 * 3)) - 3600 * 3)
```
**Split the NARR date and time into their individual variables. Then bind the columns for BigDays.sfdfT. NOTE: cannot do a mutate because 00Z produces NAs.**
```{r, eval = FALSE}
NARRday = format(as.POSIXct(strptime(BigDays.sfdfT$NARRtime,"%Y-%m-%d %H:%M:%S",tz="")) ,format = "%Y/%m/%d")
NARRZtime = format(as.POSIXct(strptime(BigDays.sfdfT$NARRtime,"%Y-%m-%d %H:%M:%S",tz="")) ,format = "%H")
BigDays.sfdfT <- cbind(BigDays.sfdfT, NARRday, NARRZtime)
```
**Create a downloadable string of information for the varying NARR times. **
```{r, eval = FALSE}
BigDays.sfdfT <- BigDays.sfdfT %>%
mutate(YrMoDa = gsub("/", "", NARRday),
slug = paste0("merged_AWIP32.",YrMoDa, NARRZtime),
slug2 = paste0("merged_AWIP32.",YrMoDa))
```
**Extract a vector of the big days. Save as a .csv for NARR download. **
```{r, eval = FALSE}
bigdays <- BigDays.sfdfT$NARRday
bigdaytimes <- BigDays.sfdfT$NARRZtime
x <- cbind(as.character(bigdays), as.character(bigdaytimes))
#write.csv(x, "BigDays.csv")
```
**Obtain the group day hulls. Transform the CRS to match that of the environmental data raster grids.**
```{r, eval = FALSE}
BigDays.sfdfT <- st_convex_hull(BigDays.sfdfT)
BigDays.sfdfT$HullArea <- st_area(BigDays.sfdfT)
BigDays.sfdfT <- st_transform(BigDays.sfdfT,
crs = "+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
**Get the centroid (central point of the tornado activity) for each big day. **
```{r, eval = FALSE}
BigDayCentroids.df <- st_centroid(BigDays.sfdfT)
BigDayCentroids.df$groupArea <- st_area(st_convex_hull(BigDays.sfdfT))
BigDayCentroids.df$groupDensity <- BigDayCentroids.df$nT/BigDayCentroids.df$groupArea
```
**Create a table to show how many big days fall in each start Z time. **
```{r, eval = FALSE}
BigDays.sfdfT %>%
group_by(NARRZtime) %>%
summarize(count = n())
```
`Table 4: Total number of big days associated with each Z time.` #CORRECT
Each NARR file contains 434 atmospheric variables. We consider only a small subset of the them representing convective instability and wind shear including the 180 to 0 hPa above ground level (AGL) CAPE and CIN (layer 375, 376), the 0 to 3000 m AGL helicity (layer 323), and the 0 to 6000 m AGL U and V components of storm motion (layer 324, 325). Additionally, we download the U and V components of wind for the 1000 hPa (layer 260, 261) and 500 hPa (layer 117, 118) levels. We compute total storm motion as the square root of the sum of the velocity components squared. We compute the bulk shear as the square root of the sum of the squared differences between the u- and v-components of the wind at 1000 hPa and 500 hPa levels. We choose these variables because they are well known to be associated with tornado development \citep{Brooks1994, JacksonandBrown2009, Brown2002, Craven2002, DeanandSchneider2012, AndersonFrey2018, DoswellandEvans2003, ChengEtAl2016}.
**Data is downloaded from NCAR's North American Regional Reanalysis (https://rda.ucar.edu/datasets/ds608.0/#!access). It extends from 1-1-1979 to 11-1-2018. Use the NCAR NARR 3-hourly files.Spatial Extent: Longitude Range: Westernmost = 148.64E Easternmost = 2.568W; Latitude Range: Southernmost = 0.897N Northernmost = 85.333N**
```{r, eval = FALSE}
BigDays.sfdfT <- st_transform(BigDays.sfdfT,
crs = "+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
**The list of all variables can be found here: http://www.emc.ncep.noaa.gov/mmb/rreanl/merged_land_AWIP32.pdf **
```{r, eval = FALSE}
slug <- BigDays.sfdfT$slug
slug2 <- BigDays.sfdfT$slug2
```
**Read the grib files as raster bricks and assign the CAPE and helicity variables to separate raster layers. Extract the average (and extreme) environmental values within each of the big days in large groups hulls.**
```{r, eval = FALSE}
avgCAPE <- numeric()
avgHLCY <- numeric()
avgCIN <- numeric()
avgUSTM <- numeric()
avgVSTM <- numeric()
avgBS <- numeric()
avgSM <- numeric()
avgRATIO <- numeric()
maxCAPE <- numeric()
maxHLCY <- numeric()
minCIN <- numeric()
maxUSTM <- numeric()
maxVSTM <- numeric()
maxBS <- numeric()
maxSM <- numeric()
for(i in 1:length(slug)){
print(i)
rb <- brick(paste0("/Volumes/Zoe's Work/NCARNARR/All/", BigDays.sfdfT$slug2[i], "/",BigDays.sfdfT$slug[i])) #<-- this is for varying NARR times
CAPE <- raster(rb, layer = 375)
HLCY <- raster(rb, layer = 323)
CIN <- raster(rb, layer = 376)
USTM <- raster(rb, layer = 324)
VSTM <- raster(rb, layer = 325)
UGRD500 <- raster(rb, layer = 117)
VGRD500 <- raster(rb, layer = 118)
UGRDsfc <- raster(rb, layer = 293)
VGRDsfc <- raster(rb, layer = 294)
SM <- sqrt(USTM^2 + VSTM^2)
RATIO <- CAPE/abs(CIN)
BS <- sqrt(((UGRD500 - UGRDsfc)**2) + ((VGRD500 - VGRDsfc)**2))
avgCAPE <- c(avgCAPE, as.numeric(raster::extract(CAPE, BigDays.sfdfT[i, ], fun = mean)))
avgHLCY <- c(avgHLCY, as.numeric(raster::extract(HLCY, BigDays.sfdfT[i, ], fun = mean)))
avgCIN <- c(avgCIN, as.numeric(raster::extract(CIN, BigDays.sfdfT[i, ], fun = mean)))
avgUSTM <- c(avgUSTM, as.numeric(raster::extract(USTM, BigDays.sfdfT[i, ], fun = mean)))
avgVSTM <- c(avgVSTM, as.numeric(raster::extract(VSTM, BigDays.sfdfT[i, ], fun = mean)))
avgSM <- c(avgSM, as.numeric(raster::extract(SM, BigDays.sfdfT[i, ], fun = mean)))
avgRATIO <- c(avgRATIO, as.numeric(raster::extract(RATIO, BigDays.sfdfT[i, ], fun = mean)))
avgBS <- c(avgBS, as.numeric(raster::extract(BS, BigDays.sfdfT[i, ], fun = mean)))
maxCAPE <- c(maxCAPE, as.numeric(raster::extract(CAPE, BigDays.sfdfT[i, ], fun = max)))
maxHLCY <- c(maxHLCY, as.numeric(raster::extract(HLCY, BigDays.sfdfT[i, ], fun = max)))
minCIN <- c(minCIN, as.numeric(raster::extract(CIN, BigDays.sfdfT[i, ], fun = min)))
maxUSTM <- c(maxUSTM, as.numeric(raster::extract(USTM, BigDays.sfdfT[i, ], fun = max)))
maxVSTM <- c(maxVSTM, as.numeric(raster::extract(VSTM, BigDays.sfdfT[i, ], fun = max)))
maxSM <- c(maxSM, as.numeric(raster::extract(SM, BigDays.sfdfT[i, ], fun = max)))
maxBS <- c(maxBS, as.numeric(raster::extract(BS, BigDays.sfdfT[i, ], fun = max)))
}
```
**Add environmental data values to the group day means data frame.**
```{r, eval = FALSE}
BigDays.sfdfT$avgCAPE <- avgCAPE
BigDays.sfdfT$avgHLCY <- avgHLCY
BigDays.sfdfT$avgCIN <- avgCIN
BigDays.sfdfT$avgUSTM <- avgUSTM
BigDays.sfdfT$avgVSTM <- avgVSTM
BigDays.sfdfT$avgBS <- avgBS
BigDays.sfdfT$avgRATIO <- avgRATIO
BigDays.sfdfT$avgSM <- avgSM
BigDays.sfdfT$maxCAPE <- maxCAPE
BigDays.sfdfT$maxHLCY <- maxHLCY
BigDays.sfdfT$minCIN <- minCIN
BigDays.sfdfT$maxUSTM <- maxUSTM
BigDays.sfdfT$maxVSTM <- maxVSTM
BigDays.sfdfT$maxBS <- maxBS
BigDays.sfdfT$maxSM <- maxSM
```
**Scale the variables to make them easier to read and input for models. **
```{r, eval = FALSE}
BigDays.sfdfT$avgCAPE2 <- BigDays.sfdfT$avgCAPE/1000
BigDays.sfdfT$avgHLCY2 <- BigDays.sfdfT$avgHLCY/100
BigDays.sfdfT$avgCIN2 <- BigDays.sfdfT$avgCIN/100
BigDays.sfdfT$avgBS2 <- BigDays.sfdfT$avgBS/10
BigDays.sfdfT$avgUSTM2 <- BigDays.sfdfT$avgUSTM/10
BigDays.sfdfT$avgVSTM2 <- BigDays.sfdfT$avgVSTM/10
BigDays.sfdfT$avgSM2 <- BigDays.sfdfT$avgSM/10
BigDays.sfdfT$maxCAPE2 <- BigDays.sfdfT$maxCAPE/1000
BigDays.sfdfT$maxHLCY2 <- BigDays.sfdfT$maxHLCY/100
BigDays.sfdfT$minCIN2 <- BigDays.sfdfT$minCIN/100
BigDays.sfdfT$maxBS2 <- BigDays.sfdfT$maxBS/10
BigDays.sfdfT$maxUSTM2 <- BigDays.sfdfT$maxUSTM/10
BigDays.sfdfT$maxVSTM2 <- BigDays.sfdfT$maxVSTM/10
BigDays.sfdfT$maxSM2 <- BigDays.sfdfT$maxSM/10
```
Selected and computed NARR variables are available in the form of a 277 by 349 rectangular raster. The corresponding big day convex hull encompassing the tornado genesis locations is used as a spatial mask, and the raster values falling under the mask are reduced to a single value. For the variables CAPE, bulk shear, and helicity, the reduction consists of taking the highest value under the mask. For CIN the reduction consists of taking the smallest value under the mask (Fig.~\ref{May6Environments}). In this way, every big day value of ATP is associated with one value for each of the environmental variables. The single highest (or lowest) value ensures that the unstable air mass is represented. To varying degrees this approach distinguishes the environmental variables when considering extremes in ATP (Table~\ref{TopBottomDays}). This ability to distinguish extremes in ATP is particularly true for bulk shear and, to a lesser extent, CAPE and foreshadows the regression results presented next.
```{r, eval = FALSE}
BigDays.sfdfT %>%
dplyr::select(cDate, maxCAPE, minCIN, maxHLCY, maxBS, ATP) %>%
top_n(n = 3, wt = ATP) %>%
arrange(desc(ATP))
BigDays.sfdfT %>%
dplyr::select(cDate, maxCAPE, minCIN, maxHLCY, maxBS, ATP) %>%
top_n(n = 3, wt = -ATP) %>%
arrange(desc(ATP))
```
`Table 5: Single values for the environmental variables on big days. Big days are separated into top three and bottom three groups based on the value of ATP.` #CORRECTED
Save `BigDays.sfdfT` so we can work on the models below without needing to run all the code above.
```{r}
#save(BigDays.sfdfT, BigDayTornadoes, Groups.sfdfT, GroupTornadoes, All_Tornadoes, file = "Climatology.RData")
load("Climatology.RData")
dim(BigDays.sfdfT)
```
#######################################
## Plot Environments for May 6, 2003 ##
#######################################
```{r}
BigDays.sfdfT <- BigDays.sfdfT %>%
mutate(ID = paste0(gsub("-", "", cDate), groupNumber))
```
*May 6, 2003 Big Day*
```{r}
May6 <- BigDays.sfdfT %>%
filter(ID == "200305062624")
May6 <- st_convex_hull(May6)
May6centroid <- groupDayCentroids.sfdfT %>%
filter(ID == "200305062624")
May6tornadoes <- BigDayTornadoes %>%
filter(ID == "200305062624")
```
*Plot the CAPE, HLCY, and CIN for the May 6, 2003 Big Day. First, read in the raster. *
```{r}
rb <- brick(paste0("/Users/zoeschroder/Desktop/Projects/Tor-clusters/merged_AWIP32.20030506/merged_AWIP32.2003050612")) #<-- this is for varying NARR times
CAPE <- raster(rb, layer = 375)
HLCY <- raster(rb, layer = 323)
CIN <- raster(rb, layer = 376)
UGRD500 <- raster(rb, layer = 117)
VGRD500 <- raster(rb, layer = 118)
UGRDsfc <- raster(rb, layer = 293)
VGRDsfc <- raster(rb, layer = 294)
BS <- sqrt(((UGRD500 - UGRDsfc)**2) + ((VGRD500 - VGRDsfc)**2))
```
**Convective Available Potential energy **
```{r}
CAPE_May6 <- crop(CAPE, extent(May6))
CAPE_extent<- mask(CAPE_May6, May6)
plot(CAPE_extent)
maxCAPE <- which.max(CAPE_extent) #Returns a pixel number
#Test that this max value equals the one in BigDays.sfdfT
test <- getValues(CAPE_extent)
test<-as.matrix(test, ncol=1)
```
```{r}
CAPEmaxpos <- xyFromCell(CAPE_extent, maxCAPE)
xy <- data.frame(CAPEmaxpos)
CAPE_latlon <- data.frame(lon=xy$x, lat=xy$y)
print(CAPE_latlon)
CAPE_latlon <- SpatialPoints(CAPE_latlon)
proj4string(CAPE_latlon) = CRS("+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
```{r}
counties <- us_counties()
states <- us_states()
```
```{r, eval = FALSE}
p1 <- tm_shape(CAPE_extent) +
tm_raster(title = "", n = 6, palette = "Greys") +
tm_format(title = expression(paste("CAPE [J ", kg^-1, "]")), "World", legend.position = c("right", "bottom"),
attr.position = c("right", "bottom"),
legend.frame = FALSE,
inner.margins = c(.1, 0, .1, .18), #bottom, left, top
legend.text.size = 1, legend.width = -0.28, legend.title.size=1, legend.bg.color = "white", title.size = 1.4) +
#tm_shape(counties) +
#tm_borders(col = "grey") +
tm_scale_bar(width = 0.3, size = 0.8, position = c("right", "top"), color.dark = "gray70") +
tm_shape(states) +
tm_borders() +
tm_shape(May6, is.master = TRUE, projection = "merc") +
tm_borders(col = "black", lwd = 3) +
#tm_shape(May6centroid) +
# tm_symbols(size = 1.25, col = "black", shape = 24) +
tm_shape(CAPE_latlon) +
tm_symbols(size = 1.4, col = "black", shape = 22)
p1
```
**Helicity**
```{r}
HLCY_May6 <- crop(HLCY, extent(May6))
HLCY_extent<- mask(HLCY_May6, May6)
plot(HLCY_extent)
maxHLCY <- which.max(HLCY_extent)
```
```{r}
HLCYmaxpos <- xyFromCell(HLCY_extent, maxHLCY)
xy <- data.frame(HLCYmaxpos)
HLCY_latlon <- data.frame(lon=xy$x, lat=xy$y)
print(HLCY_latlon)
HLCY_latlon <- SpatialPoints(HLCY_latlon)
proj4string(HLCY_latlon) = CRS("+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
```{r, eval = FALSE}
p2 <- tm_shape(HLCY_extent) +
tm_raster(title = "", n = 5, palette = "Greys") + #"YlGnBu") +
tm_format(title = expression(paste("Helicity [",m^2, s^-2, "]")), "World", legend.position = c("right", "bottom"),
attr.position = c("right", "bottom"),
legend.frame = FALSE,
inner.margins = c(.1, 0, .1, .15), #bottom, left, top
legend.text.size = 1, legend.width = -0.25, legend.title.size=1, legend.bg.color = "white", title.size = 1.4) +
#tm_shape(counties) +
# tm_borders(col = "grey") +
tm_scale_bar(width = 0.3, size = 0.8, position = c("right", "top"), color.dark = "gray70") +
tm_shape(states) +
tm_borders() +
tm_shape(May6, is.master = TRUE, projection = "merc") +
tm_borders(col = "black", lwd = 3) +
#tm_shape(May6centroid) +
# tm_symbols(size = 1.25, col = "black", shape = 24) +
tm_shape(HLCY_latlon) +
tm_symbols(size = 1.4, col = "black", shape = 22)
p2
```
**Convective Inhibition **
```{r}
CIN_May6 <- crop(CIN, extent(May6))
CIN_extent<- mask(CIN_May6, May6)
plot(CIN_extent)
minCIN <- which.min(CIN_extent)
```
```{r}
CINminpos <- xyFromCell(CIN_extent, minCIN)
xy <- data.frame(CINminpos)
CIN_latlon <- data.frame(lon=xy$x, lat=xy$y)
print(CIN_latlon)
CIN_latlon <- SpatialPoints(CIN_latlon)
proj4string(CIN_latlon) = CRS("+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
```{r, eval = FALSE}
p3 <- tm_shape(CIN_extent) +
tm_raster(title = "", n =5, palette = "Greys") + #"YlOrBr") +
tm_format(title = expression(paste("CIN [J ", kg^-1, "]")), "World", legend.position = c("right", "bottom"),
attr.position = c("right", "bottom"),
legend.frame = FALSE,
inner.margins = c(.1, 0, .1, .15), #bottom, left, top
legend.text.size = 1., legend.width = -0.3, legend.title.size=1, legend.bg.color = "white", title.size = 1.4) +
#tm_shape(counties) +
# tm_borders(col = "grey") +
tm_scale_bar(width = 0.3, size = 0.8, position = c("right", "top"), color.dark = "gray70") +
tm_shape(states) +
tm_borders() +
tm_shape(May6, is.master = TRUE, projection = "merc") +
tm_borders(col = "black", lwd = 3) +
#tm_shape(May6centroid) +
# tm_symbols(size = 1.25, col = "black", shape = 24) +
tm_shape(CIN_latlon) +
tm_symbols(size = 1.4, col = "black", shape = 22)
p3
```
**Bulk Shear**
```{r}
BS_May6 <- crop(BS, extent(May6))
BS_extent<- mask(BS_May6, May6)
plot(BS_extent)
maxBS <- which.max(BS_extent)
```
```{r}
BSmaxpos <- xyFromCell(BS_extent, maxBS)
xy <- data.frame(BSmaxpos)
BS_latlon <- data.frame(lon=xy$x, lat=xy$y)
print(BS_latlon)
BS_latlon <- SpatialPoints(BS_latlon)
proj4string(BS_latlon) = CRS("+proj=lcc +lat_1=50 +lat_2=50 +lat_0=50 +lon_0=-107 +x_0=0 +y_0=0 +a=6371200 +b=6371200 +units=m +no_defs")
```
```{r, eval = FALSE}
p4 <- tm_shape(BS_extent) +
tm_raster(title = "", n = 5, palette = "Greys") +
tm_format(title = expression(paste("Bulk Shear [m ", s^-1, "]")), "World", legend.position = c("right", "bottom"),
attr.position = c("right", "bottom"),
legend.frame = FALSE,
inner.margins = c(.1, 0, .1, .15), #bottom, left, top
legend.text.size = 1, legend.width = -0.2, legend.title.size=1, legend.bg.color = "white", title.size = 1.4) +
#tm_shape(counties) +
# tm_borders(col = "grey") +
tm_scale_bar(width = 0.3, size = 0.8, position = c("right", "top"), color.dark = "gray70") +
tm_shape(states) +
tm_borders() +
tm_shape(May6, is.master = TRUE, projection = "merc") +
tm_borders(col = "black", lwd = 3) +
#tm_shape(May6centroid) +
# tm_symbols(size = 1.25, col = "black", shape = 24) +
tm_shape(BS_latlon) +
tm_symbols(size = 1.4, col = "black", shape = 22)
p4
```
**Combine into one figure: **
```{r}
tmap_arrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
```
`Figure 4: Environmental conditions at 12 UTC on May 6, 2003. The black line is the spatial extent of the tornado genesis locations and the black triangle is the centroid of the locations. The first tornado in the cluster started at 14:20 UTC. The black square indicates the locations of the highest value of CAPE (3660~J~kg$^{-1}$), the lowest value of CIN ($-$149~J~kg$^{-1}$), the highest value of helicity (308~m$^2$~s$^{-2}$) and the highest value of bulk shear (33~m~s$^{-1}$).`
## Quantifying the relationship between ATP and environmental factors
We use our collated data representing 212 big days to regress ATP onto the environmental variables whose values are chosen within the area defined by the tornado cluster as described above. The regression model quantifies the effect of each environmental variable on ATP while holding the other variables constant. Due to the large seasonal variability in ATP (Table~\ref{seasonalATP}), the month of the big day occurrence is included as a random effect (an offset to the intercept term). Environmental variables are considered fixed effects as is the year during which the big day occurred. Year is included as a fixed effect because ATP is increasing over time \citep{ElsnerFrickerSchroder2018}. If year is not included in the model, the increasing trend could confound the influence of the other fixed effects. The coefficient on year is the annual trend.
```{r}
cor(BigDays.sfdfT$maxBS, BigDays.sfdfT$maxSM)
```
**Test Average vs Max using Models:**
```{r}
m1 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo),
weights = nT,
data = BigDays.sfdfT)
summary(m1) #SIG
m2 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) + maxCAPE2,
weights = nT,
data = BigDays.sfdfT)
summary(m2) #SIG max...not avg
m3 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) + maxBS2,
weights = nT,
data = BigDays.sfdfT)
summary(m3) #SIG max
m4 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) + maxSM2,
weights = nT,
data = BigDays.sfdfT)
summary(m4) #SIG max
m5 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) + maxHLCY2,
weights = nT,
data = BigDays.sfdfT)
summary(m5) #SIG max...not avg
m6 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) + minCIN2,
weights = nT,
data = BigDays.sfdfT)
summary(m6) #SIG max...not avg
```
*Fit the best model for log(ATP):*
```{r}
model1 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) +
maxCAPE2 + maxBS2 + maxHLCY2 + minCIN2 ,
weights = nT,
data = BigDays.sfdfT)
summary(model1)
model2 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) +
maxBS2 + minCIN2 + maxCAPE2 * maxHLCY2,
weights = nT,
data = BigDays.sfdfT)
summary(model2)
model3 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) +
maxCAPE2 + maxBS2 + maxHLCY,
weights = nT,
data = BigDays.sfdfT)
summary(model3)
model4 <- lmer(log(GroupDayTotalED) ~ I(Year - 2004) + (1|Mo) +
maxCAPE2 * maxBS2 + maxHLCY2 + minCIN2 ,
weights = nT,
data = BigDays.sfdfT)
summary(model4)
AIC(model1, model2, model3, model4) # model 1 is better...without the interaction
confint(model1, method = "Wald")
```
Values of ATP are skewed to the right with most big days having less than 5 TW of ATP. However, the top ten days have more than 30 TW each of ATP with the top day having 221 TW (Table~\ref{BiggestDays}). The distribution of ATP on a log scale is nearly symmetric about the mean value of 9 TW. The median value is 3.2 TW, and the geometric mean is 2.6 TW. So, the model uses the logarithm of ATP as the response variable. Mathematically the model is given by
$$
\ln(\hbox{ATP}_i) = &\beta_0 + \beta_{\hbox{Year}} \hbox{Year}_i + \beta_{\hbox{CAPE}} \hbox{CAPE}_i \\
& \beta_{\hbox{Shear}} \hbox{Shear}_i + \beta_{\hbox{Helicity}} \hbox{Helicity}_i + \beta_{\hbox{CIN}} \hbox{CIN}_i + \\
& \beta_{\hbox{Month}} (1 | \hbox{Month}_i) + \epsilon_i,
$$
where the $\beta_{\hbox{Year}}$, $\beta_{\hbox{CAPE}}$, $\beta_{\hbox{Shear}}$ $\beta_{\hbox{Helicity}}$, $\beta_{\hbox{CIN}}$ and $\beta_{\hbox{Month}}$ are the model coefficients. Month is a random effect as mentioned above so $\beta_{\hbox{Month}}$ is a vector of coefficients with one element for each month of the year. To make interpreting the coefficients easier, we divide the values of CAPE and CIN by 1000, helicity by 100, and bulk shear by 10. The coefficients are determined via an interactive maximum likelihood approach with the {\texttt lmer} function from the {\texttt lme4} package for R \citep{BatesEtAl2015}.
The regression model is best in the sense that it has the lowest Akaike Information Criterion (AIC) score, which measures the overall quality (goodness of fit and simplicity) of the model. Due to a large correlation between bulk shear and relative storm motion (0.55) we retain only bulk shear in the model. We determined that interactions between the environmental variables did not improve the model fit based on higher AIC scores when they were included. We also determined that {\it spatially averaged} values for the environmental variables makes the fit worse. The maximum (and minimum) values within the cluster area provide a better representation of the environmental conditions for the tornadoes on each big day because they are less contaminated by mesoscale and synoptic scale features.
*Table of coefficients.*
```{r}
x <- summary(model1)
xtable(x$coefficients, digits = 3)
```
`Table 6: Coefficient estimates from a regression model of ATP onto year, CAPE, bulk shear, CIN, and helicity using data from n = 212 big days in large groups over the period. The standard error is on the estimate and its t value is the ratio of the estimate to the standard error. The coefficients were determined via an interactive maximum likelihood approach with the lmer function from the lme4 package for R (Bates et al. 2015)` #CORRECT
```{r}
confint(model1)
(exp(0.050) - 1) * 100
```