In this document, I’ve included the results from the initial exploration of the different model outputs: the ranking of covariate influence, performance metrics, and prediction maps. This first set of models includes only covariate data extracted at a daily temporal resolution, but I am also considering models that include covariate data at a seasonal or annual temporal resolution. The pseudo-absences used in these models were generated using background sampling. Hyperparameters were tuned using the caret package; across all models, a learning rate of 0.05 and a tree complexity of 3 gave the highest accuracy. Lastly, the ‘pred_var’ predictor is a set of random numbers used to identify uninformative predictors: any variable with a lower relative influence than pred_var is no more informative than random noise and is a candidate for exclusion from the final model.
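A minimal sketch of how the pred_var screen could work, written in base R. Two assumptions are not stated in the original: that pred_var is drawn from a uniform distribution, and that relative influence is available as a named numeric vector (for a fitted gbm model, for example, built from the var and rel.inf columns returned by summary(model, plotit = FALSE)). The helper names add_random_predictor and screen_by_random are hypothetical. The example uses the relative influence values from the first base model reported below (brt_outputs[7]).

```r
# Hypothetical helper: append a purely random predictor to a data frame.
# Assumption: uniform random numbers; the notes only say "a random set of numbers".
add_random_predictor <- function(df, seed = 1) {
  set.seed(seed)
  df$pred_var <- runif(nrow(df))
  df
}

# Hypothetical helper: keep predictors whose relative influence exceeds
# that of the random variable, sorted from most to least influential.
screen_by_random <- function(rel_inf, random_var = "pred_var") {
  cutoff <- rel_inf[[random_var]]
  keep <- rel_inf[names(rel_inf) != random_var & rel_inf > cutoff]
  sort(keep, decreasing = TRUE)
}

# Relative influence reported for the first base model (brt_outputs[7])
rel_inf_base <- c(
  bathy_mean = 37.726838, temp_mean = 23.806676, sal_mean = 7.021355,
  chl_mean = 5.980357, ssh_mean = 5.413233, uostr_mean = 5.244057,
  vostr_mean = 3.838429, bathy_sd = 2.871536, mld_mean = 2.581925,
  uo_mean = 2.392186, vo_mean = 1.868975, pred_var = 1.254433
)

kept <- screen_by_random(rel_inf_base)
names(kept)  # all 11 covariates exceed pred_var in this model
```

In this particular model every covariate clears the pred_var threshold, so the screen would only start removing variables in models where some covariate falls below the random predictor.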
The hypotheses I would like to test with these models are as follows:
H1: The aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) model will perform better than both the dissolved oxygen (DO) model and the null model, and the DO model will perform better than the null model.
Study objective addressed: Which model performs best and produces the best predictions (i.e., the best predictive performance scores and the most ecologically realistic suitability maps)?
H2: Including dissolved oxygen at deeper depths will produce better, more ecologically realistic habitat suitability predictions than the DO model that uses surface values alone.
Study objective addressed: How does dissolved oxygen at different depths influence habitat suitability predictions relative to dissolved oxygen at the surface?
H3: Including the AGI at deeper depths will produce better, more ecologically realistic habitat suitability predictions than the AGI model that uses surface values alone.
Study objective addressed: How does the AGI at different depths influence habitat suitability predictions relative to the AGI at the surface?
H4: Dissolved oxygen and the AGI will show important relationships with latitude and distance to the coast.
Study objective addressed: Are there important relationships between dissolved oxygen or the AGI (at the surface or at depth) and latitude or distance to the coast?
H5: Relative to the DO and AGI models, the null model will predict higher habitat suitability in areas, seasons, or periods (e.g., upwelling or La Niña) with lower dissolved oxygen through the water column.
Study objective addressed: How do the habitat suitability maps differ among the models, and how do these differences vary over time?
Base models
These three models represent different options for the base model: one with neither spatial nor tag ID predictors, one with tag ID only, and one with both spatial and tag ID predictors (presented in that order below). They were developed using a 75/25 train/test split of the data set, so that split is the evaluation approach used here. Once a model is selected, I can run additional evaluation approaches (e.g., LOO or k-fold cross-validation), or I can run them now if that is typically done at this stage.
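A minimal base-R sketch of the two evaluation schemes mentioned above, assuming a 0/1 response vector (1 = presence, 0 = pseudo-absence). Both helpers are hypothetical and stratify by class so presences and pseudo-absences keep the same ratio in every subset; they are not the code used to produce the splits in this document.

```r
# Hypothetical helper: stratified 75/25 train/test split.
split_train_test <- function(y, train_frac = 0.75, seed = 1) {
  set.seed(seed)
  idx <- seq_along(y)
  # Sample the training rows separately within each class
  train <- unlist(lapply(split(idx, y), function(i) {
    i[sample.int(length(i), size = round(train_frac * length(i)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(idx, train))
}

# Hypothetical helper: stratified k-fold assignment (fold numbers 1..k),
# balanced within each class.
assign_folds <- function(y, k = 5, seed = 1) {
  set.seed(seed)
  folds <- integer(length(y))
  for (cls in unique(y)) {
    i <- which(y == cls)
    folds[i] <- rep_len(seq_len(k), length(i))[sample.int(length(i))]
  }
  folds
}

y <- rep(c(1, 0), times = c(40, 40))  # toy presence/pseudo-absence vector
s <- split_train_test(y)
table(y[s$train])              # 30 presences and 30 pseudo-absences
table(assign_folds(y, k = 5))  # 16 observations per fold
```

If LOO is meant as leaving out one tagged individual at a time, the fold vector could simply be the tag ID column. If the models are fit with dismo::gbm.step (the printed gbm call suggests this), a custom fold vector can be supplied through its fold.vector argument.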
explore_brt(mod_file_path = brt_outputs[7],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862741
Residual.Deviance 0.2948092
Correlation 0.9249630
AUC 0.9922000
Per.Expl 78.7336988
cvDeviance 0.5909966
cvCorrelation 0.8025147
cvAUC 0.9464300
cvPer.Expl 57.3679835
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 37.726838
temp_mean 23.806676
sal_mean 7.021355
chl_mean 5.980357
ssh_mean 5.413233
uostr_mean 5.244057
vostr_mean 3.838429
bathy_sd 2.871536
mld_mean 2.581925
uo_mean 2.392186
vo_mean 1.868975
pred_var 1.254433
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 835.60
2 10 bathy_mean 8 ssh_mean 650.18
3 8 ssh_mean 2 temp_mean 556.11
4 10 bathy_mean 3 sal_mean 496.83
5 10 bathy_mean 4 uo_mean 406.37
6 3 sal_mean 2 temp_mean 343.56
7 8 ssh_mean 1 chl_mean 337.30
[1] "External percent deviance explained"
[1] -3.437823
[1] "TPR"
[1] 0.2602644
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.8898691 -0.8883681 0.01999832 0.9823639 -3.437823 0.787337
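The Per.Expl and cvPer.Expl values above are consistent with the standard percent deviance explained, computed from the total (null) deviance and either the residual or the cross-validated deviance:

$$
\text{\% deviance explained} = 100 \times \frac{D_{\text{total}} - D_{\text{resid}}}{D_{\text{total}}}
$$

For the first base model, this reproduces both reported values:

$$
100 \times \frac{1.3862741 - 0.2948092}{1.3862741} \approx 78.73, \qquad 100 \times \frac{1.3862741 - 0.5909966}{1.3862741} \approx 57.37
$$

The total deviance of about 1.386 is close to $2\ln 2 \approx 1.386$, the mean binomial deviance of an intercept-only model that predicts 0.5 everywhere, which is consistent with presences and pseudo-absences carrying roughly equal total weight. If the external value is computed the same way on the test data, a negative result (here -3.44) means the test-set deviance exceeds the total deviance, i.e., the model predicts the held-out data worse than an intercept-only model.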
explore_brt(mod_file_path = brt_outputs[8],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38627408
Residual.Deviance 0.09736975
Correlation 0.98457770
AUC 0.99990000
Per.Expl 92.97615463
cvDeviance 0.34392232
cvCorrelation 0.89862302
cvAUC 0.97914000
cvPer.Expl 75.19088555
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.0639964
tag 24.5885909
temp_mean 18.9524289
ssh_mean 4.9851264
sal_mean 4.0438501
uostr_mean 3.9759944
chl_mean 3.8548424
vostr_mean 2.6340512
bathy_sd 1.2289919
uo_mean 1.1341238
vo_mean 1.1196672
mld_mean 0.8787739
pred_var 0.5395627
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1883.53
2 11 bathy_mean 1 tag 770.37
3 2 chl_mean 1 tag 714.07
4 3 temp_mean 1 tag 626.86
5 9 ssh_mean 1 tag 604.60
6 8 vostr_mean 1 tag 409.93
7 7 vo_mean 1 tag 382.85
8 6 uostr_mean 1 tag 370.45
[1] "External percent deviance explained"
[1] -6.314042
[1] "TPR"
[1] 0.2532462
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6500 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9504276 -0.9615518 0.004527448 0.9781609 -6.314042 0.9297615
explore_brt(mod_file_path = brt_outputs[9],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38627408
Residual.Deviance 0.08949741
Correlation 0.98503325
AUC 0.99990000
Per.Expl 93.54403230
cvDeviance 0.29985176
cvCorrelation 0.91378722
cvAUC 0.98270000
cvPer.Expl 78.36995123
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.6117764
tag 20.0388250
lat 8.8678338
temp_mean 4.2491755
bathy_mean 3.6159044
chl_mean 2.7994230
sal_mean 2.3492197
ssh_mean 1.1726958
vostr_mean 1.1624008
vo_mean 0.6291947
uo_mean 0.6153267
bathy_sd 0.5647726
uostr_mean 0.5065045
mld_mean 0.4821748
pred_var 0.3347723
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 737.42
2 5 sal_mean 1 tag 551.43
3 12 bathy_mean 1 tag 502.86
4 3 chl_mean 1 tag 464.36
5 14 dist_coast 1 tag 419.05
6 9 vostr_mean 1 tag 274.27
7 8 vo_mean 1 tag 270.06
8 10 ssh_mean 1 tag 227.94
9 4 temp_mean 1 tag 195.29
10 13 bathy_sd 1 tag 186.34
11 7 uostr_mean 1 tag 171.34
[1] "External percent deviance explained"
[1] -6.741773
[1] "TPR"
[1] 0.2528088
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9557107 -0.9657932 0.003646813 0.980038 -6.741773 0.9354403
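The test-set TPR and TSS reported for each model depend on a classification threshold, while the C-index is rank-based. Since TSS = sensitivity + specificity - 1, a TSS of exactly 0 means the classifier does no better than random at the chosen threshold. If the C-index is computed in the usual way, a value near 0 (e.g., 0.02 for the first base model) means test-set presences are ranked almost entirely below pseudo-absences. Given the high training and cross-validated AUCs, a from-scratch calculation may be a useful cross-check on the test-set output. The sketch below is not explore_brt's implementation; the 0.5 threshold, the helper names, and the toy inputs are assumptions.

```r
# Hypothetical helper: threshold-dependent metrics for 0/1 observations
tpr_tss <- function(obs, pred, threshold = 0.5) {
  pred_class <- as.integer(pred >= threshold)
  tp <- sum(pred_class == 1 & obs == 1)
  fn <- sum(pred_class == 0 & obs == 1)
  tn <- sum(pred_class == 0 & obs == 0)
  fp <- sum(pred_class == 1 & obs == 0)
  sensitivity <- tp / (tp + fn)  # true positive rate
  specificity <- tn / (tn + fp)  # true negative rate
  c(TPR = sensitivity, TSS = sensitivity + specificity - 1)
}

# Hypothetical helper: threshold-free AUC (equal to the C-index for a binary
# outcome), computed from ranks via the Mann-Whitney U statistic
auc_rank <- function(obs, pred) {
  r <- rank(pred)  # average ranks handle ties
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

obs  <- c(1, 1, 1, 0, 0, 0)
pred <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6)
tpr_tss(obs, pred)       # TPR = 0.667, TSS = 0.333
auc_rank(obs, pred)      # 0.889
auc_rank(obs, 1 - pred)  # inverted predictions give 1 - 0.889 = 0.111
```

The last line shows how reversing the direction of the predictions turns a good ranking into a C-index near 0.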
DO models
I ran a suite of models with different combinations of dissolved oxygen depth layers and spatial predictors; all of them include tag ID. Moving forward, I would also like to include DO and the other environmental predictor variables at longer time scales (seasonal/annual).
The models are presented in the following order:
0m, no spatial, yes tag
0m, yes spatial, yes tag
0m & 60m, no spatial, yes tag
0m & 250m, no spatial, yes tag
0m, 60m, & 250m, no spatial, yes tag
0m, 60m, & 250m, yes spatial, yes tag
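Moving to seasonal or annual time scales mainly requires collapsing the daily covariate values to period means. A minimal base-R sketch, assuming a data frame with a Date column named date, meteorological seasons (DJF, MAM, JJA, SON), and December assigned to the following year's DJF. The helper name aggregate_covariate and the example values are hypothetical.

```r
# Month number (1-12) to meteorological season
month_to_season <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
                     "JJA", "JJA", "SON", "SON", "SON", "DJF")

# Hypothetical helper: mean of one covariate per season-year or per year
aggregate_covariate <- function(df, var, scale = c("season", "year")) {
  scale <- match.arg(scale)
  month <- as.integer(format(df$date, "%m"))
  year <- as.integer(format(df$date, "%Y"))
  if (scale == "season") {
    year <- year + (month == 12)  # December counts toward next year's DJF
    df$period <- paste(year, month_to_season[month], sep = "_")
  } else {
    df$period <- as.character(year)
  }
  aggregate(df[var], by = list(period = df$period), FUN = mean, na.rm = TRUE)
}

daily <- data.frame(
  date = as.Date(c("2020-12-15", "2021-01-10", "2021-02-01", "2021-07-01")),
  o2_mean_0m = c(200, 210, 220, 150)  # illustrative values only
)
aggregate_covariate(daily, "o2_mean_0m", "season")  # 2021_DJF = 210, 2021_JJA = 150
aggregate_covariate(daily, "o2_mean_0m", "year")    # 2020 = 200, 2021 = 193.33
```

For gridded covariates the grouping would presumably also include the grid cell or location, so that each observation can be matched to the seasonal or annual mean at its position.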
explore_brt(mod_file_path = brt_outputs[14],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.08039145
Correlation 0.98792844
AUC 1.00000000
Per.Expl 94.20097610
cvDeviance 0.30084003
cvCorrelation 0.91332970
cvAUC 0.98319000
cvPer.Expl 78.29895482
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.6469598
o2_mean_0m 26.9115748
tag 20.0046262
temp_mean 4.4981160
chl_mean 3.6729261
ssh_mean 2.6241221
uostr_mean 2.1630542
sal_mean 2.0577804
vostr_mean 1.8988935
mld_mean 0.9525212
uo_mean 0.7373759
bathy_sd 0.7185793
vo_mean 0.7077598
pred_var 0.4057107
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 5 sal_mean 1 tag 1251.29
2 4 temp_mean 2 o2_mean_0m 856.53
3 2 o2_mean_0m 1 tag 838.02
4 12 bathy_mean 1 tag 811.97
5 4 temp_mean 1 tag 452.47
6 3 chl_mean 1 tag 413.64
7 13 bathy_sd 1 tag 363.30
8 8 vo_mean 1 tag 348.54
9 7 uostr_mean 1 tag 340.42
10 9 vostr_mean 1 tag 299.61
[1] "External percent deviance explained"
[1] -6.928509
[1] "TPR"
[1] 0.2512725
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9619643 -0.9797619 0.0005028151 0.9977697 -6.928509 0.9420098
explore_brt(mod_file_path = brt_outputs[15],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06074708
Correlation 0.99206350
AUC 1.00000000
Per.Expl 95.61801965
cvDeviance 0.26396205
cvCorrelation 0.92768023
cvAUC 0.98584000
cvPer.Expl 80.95914152
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.1425792
tag 18.4444753
o2_mean_0m 10.5833390
lat 7.0053558
bathy_mean 3.1295802
chl_mean 2.2557239
sal_mean 1.5213093
temp_mean 1.2786828
vostr_mean 0.9690416
ssh_mean 0.9508931
mld_mean 0.5242121
vo_mean 0.5180216
uo_mean 0.4790866
bathy_sd 0.4771174
uostr_mean 0.4558059
pred_var 0.2647762
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 779.87
2 2 lat 1 tag 692.69
3 5 temp_mean 3 o2_mean_0m 636.47
4 15 dist_coast 1 tag 466.74
5 6 sal_mean 1 tag 426.43
6 4 chl_mean 1 tag 421.31
7 13 bathy_mean 1 tag 420.13
8 14 bathy_sd 1 tag 344.47
9 9 vo_mean 1 tag 303.38
10 10 vostr_mean 1 tag 230.16
11 5 temp_mean 1 tag 222.22
12 8 uostr_mean 1 tag 208.55
13 15 dist_coast 3 o2_mean_0m 174.48
[1] "External percent deviance explained"
[1] -7.624278
[1] "TPR"
[1] 0.2515557
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9704327 -0.9864821 0.0001846725 0.9964614 -7.624278 0.9561802
explore_brt(mod_file_path = brt_outputs[13],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.07118474
Correlation 0.98998036
AUC 1.00000000
Per.Expl 94.86510106
cvDeviance 0.28488053
cvCorrelation 0.92010488
cvAUC 0.98419000
cvPer.Expl 79.45019060
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.0714692
o2_mean_0m 26.9541769
tag 18.8753315
o2_mean_60m 10.1542238
chl_mean 3.3838023
ssh_mean 2.9041357
temp_mean 1.9892805
sal_mean 1.6667460
vostr_mean 1.1791073
uostr_mean 1.0524747
mld_mean 0.7605733
uo_mean 0.6030973
vo_mean 0.5369837
bathy_sd 0.5062655
pred_var 0.3623321
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 o2_mean_0m 1 tag 914.39
2 5 sal_mean 1 tag 778.04
3 12 bathy_mean 1 tag 774.90
4 4 temp_mean 2 o2_mean_0m 449.37
5 3 chl_mean 1 tag 439.91
6 13 bathy_sd 1 tag 427.34
7 4 temp_mean 1 tag 381.87
8 14 o2_mean_60m 1 tag 355.36
9 9 vostr_mean 1 tag 293.42
10 8 vo_mean 1 tag 292.69
11 10 ssh_mean 1 tag 259.83
[1] "External percent deviance explained"
[1] -7.112088
[1] "TPR"
[1] 0.2514715
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9657599 -0.9823985 0.000340465 0.9978181 -7.112088 0.948651
explore_brt(mod_file_path = brt_outputs[10],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06836473
Correlation 0.99037738
AUC 1.00000000
Per.Expl 95.06852165
cvDeviance 0.28369551
cvCorrelation 0.92037443
cvAUC 0.98436000
cvPer.Expl 79.53567176
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 27.2629760
bathy_mean 25.2498534
tag 18.4421903
o2_mean_250m 16.5248776
chl_mean 2.2649308
temp_mean 1.9626402
sal_mean 1.7897130
ssh_mean 1.5479838
uostr_mean 1.1276963
vostr_mean 1.0966645
bathy_sd 0.7244073
vo_mean 0.5873495
mld_mean 0.5347793
uo_mean 0.5208155
pred_var 0.3631225
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 5 sal_mean 1 tag 1081.33
2 2 o2_mean_0m 1 tag 831.07
3 4 temp_mean 2 o2_mean_0m 634.92
4 14 o2_mean_250m 1 tag 580.53
5 3 chl_mean 1 tag 508.53
6 12 bathy_mean 1 tag 461.58
7 9 vostr_mean 1 tag 296.67
8 4 temp_mean 1 tag 295.22
9 8 vo_mean 1 tag 272.25
10 14 o2_mean_250m 2 o2_mean_0m 254.15
11 13 bathy_sd 1 tag 249.56
[1] "External percent deviance explained"
[1] -7.449017
[1] "TPR"
[1] 0.251507
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9667664 -0.9823896 0.0003823276 0.9973991 -7.449017 0.9506852
explore_brt(mod_file_path = brt_outputs[11],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06749084
Correlation 0.99030698
AUC 1.00000000
Per.Expl 95.13155977
cvDeviance 0.27553664
cvCorrelation 0.92358978
cvAUC 0.98486000
cvPer.Expl 80.12421074
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 27.1899223
bathy_mean 24.5062176
tag 17.3552811
o2_mean_250m 14.4641723
o2_mean_60m 5.7991520
ssh_mean 1.9059494
chl_mean 1.8501917
sal_mean 1.4611193
temp_mean 1.4363430
uostr_mean 0.8686807
vostr_mean 0.8420687
bathy_sd 0.5476465
vo_mean 0.5305118
mld_mean 0.5035008
uo_mean 0.4481542
pred_var 0.2910886
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 o2_mean_0m 1 tag 817.46
2 5 sal_mean 1 tag 587.30
3 4 temp_mean 2 o2_mean_0m 558.25
4 15 o2_mean_250m 1 tag 478.71
5 3 chl_mean 1 tag 429.20
6 12 bathy_mean 1 tag 410.22
7 4 temp_mean 1 tag 305.22
8 14 o2_mean_60m 1 tag 300.61
9 9 vostr_mean 1 tag 270.41
10 7 uostr_mean 1 tag 228.06
11 13 bathy_sd 1 tag 205.43
12 15 o2_mean_250m 2 o2_mean_0m 203.41
13 10 ssh_mean 1 tag 189.41
[1] "External percent deviance explained"
[1] -7.413317
[1] "TPR"
[1] 0.2514821
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9680402 -0.9840536 0.0002799894 0.997662 -7.413317 0.9513156
explore_brt(mod_file_path = brt_outputs[12],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06674273
Correlation 0.99036056
AUC 1.00000000
Per.Expl 95.18552429
cvDeviance 0.25849355
cvCorrelation 0.92868348
cvAUC 0.98632000
cvPer.Expl 81.35361122
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 50.7979905
tag 17.4187334
o2_mean_0m 11.1532694
o2_mean_250m 4.8956535
lat 4.3567015
o2_mean_60m 2.7427545
chl_mean 1.7488786
bathy_mean 1.2868608
sal_mean 1.0812867
temp_mean 0.9791718
vostr_mean 0.6174175
ssh_mean 0.6127769
uostr_mean 0.5321097
bathy_sd 0.4024418
vo_mean 0.3965878
mld_mean 0.3741117
uo_mean 0.3487283
pred_var 0.2545257
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 662.12
2 5 temp_mean 3 o2_mean_0m 481.90
3 2 lat 1 tag 385.52
4 6 sal_mean 1 tag 348.70
5 4 chl_mean 1 tag 336.40
6 14 bathy_sd 1 tag 330.96
7 17 o2_mean_250m 1 tag 293.11
8 13 bathy_mean 1 tag 276.05
9 15 dist_coast 1 tag 246.21
10 10 vostr_mean 1 tag 223.18
11 16 o2_mean_60m 1 tag 213.22
12 9 vo_mean 1 tag 206.61
13 16 o2_mean_60m 5 temp_mean 175.27
14 5 temp_mean 1 tag 144.27
15 15 dist_coast 3 o2_mean_0m 140.51
16 11 ssh_mean 1 tag 109.39
[1] "External percent deviance explained"
[1] -7.47654
[1] "TPR"
[1] 0.2514392
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9687462 -0.9846778 0.0002475989 0.9969652 -7.47654 0.9518552
AGI models
I ran a suite of models with different combinations of AGI depth layers and spatial predictors; all of them include tag ID. Moving forward, I would also like to include AGI and the other environmental predictor variables at longer time scales (seasonal/annual).
The models are presented in the following order:
0m, no spatial, yes tag
0m, yes spatial, yes tag
0m & 60m, no spatial, yes tag
0m & 250m, no spatial, yes tag
0m, 60m, & 250m, no spatial, yes tag
0m, 60m, & 250m, yes spatial, yes tag
explore_brt(mod_file_path = brt_outputs[5],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.08532138
Correlation 0.98702006
AUC 1.00000000
Per.Expl 93.84534182
cvDeviance 0.31310146
cvCorrelation 0.91077834
cvAUC 0.98123000
cvPer.Expl 77.41442573
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.3849938
tag 22.9924922
temp_mean 19.0362797
ssh_mean 5.1405835
uostr_mean 4.5173002
AGI_0m 3.9279250
sal_mean 3.5882270
chl_mean 2.8743197
vostr_mean 2.5250985
bathy_sd 1.2576208
uo_mean 0.9020629
vo_mean 0.8232203
mld_mean 0.6439118
pred_var 0.3859644
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1816.06
2 4 sal_mean 1 tag 1301.25
3 3 temp_mean 1 tag 851.13
4 11 bathy_mean 1 tag 729.91
5 2 chl_mean 1 tag 428.42
6 7 vo_mean 1 tag 417.68
7 9 ssh_mean 1 tag 328.79
8 8 vostr_mean 1 tag 326.69
9 11 bathy_mean 3 temp_mean 318.21
10 13 AGI_0m 1 tag 295.11
[1] "External percent deviance explained"
[1] -6.590472
[1] "TPR"
[1] 0.2511185
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9603852 -0.9788467 0.0005035391 0.9568836 -6.590472 0.9384534
explore_brt(mod_file_path = brt_outputs[6],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.07272435
Correlation 0.98931402
AUC 1.00000000
Per.Expl 94.75402902
cvDeviance 0.27470458
cvCorrelation 0.92333512
cvAUC 0.98468000
cvPer.Expl 80.18418485
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.4969130
tag 19.1224933
lat 8.8661139
temp_mean 4.4530414
bathy_mean 3.4033437
AGI_0m 2.7486874
chl_mean 2.2110726
sal_mean 1.8920127
ssh_mean 1.0292439
vostr_mean 0.8247087
bathy_sd 0.5853612
vo_mean 0.5707580
uo_mean 0.5420383
uostr_mean 0.5176506
mld_mean 0.4773298
pred_var 0.2592316
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 711.46
2 5 sal_mean 1 tag 430.54
3 13 bathy_sd 1 tag 389.47
4 3 chl_mean 1 tag 388.51
5 12 bathy_mean 1 tag 385.22
6 15 dist_coast 1 tag 319.70
7 14 AGI_0m 1 tag 299.95
8 14 AGI_0m 4 temp_mean 291.55
9 4 temp_mean 1 tag 271.29
10 8 vo_mean 1 tag 270.02
11 9 vostr_mean 1 tag 263.25
12 14 AGI_0m 2 lat 173.01
13 10 ssh_mean 1 tag 168.72
[1] "External percent deviance explained"
[1] -7.078305
[1] "TPR"
[1] 0.251129
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9665712 -0.9839926 0.00022761 0.9554261 -7.078305 0.9475403
explore_brt(mod_file_path = brt_outputs[4],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.07558239
Correlation 0.98942417
AUC 1.00000000
Per.Expl 94.54786401
cvDeviance 0.29847916
cvCorrelation 0.91616548
cvAUC 0.98261000
cvPer.Expl 78.46920527
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.1481369
tag 22.3947478
temp_mean 19.6320724
AGI_0m 4.5763214
uostr_mean 4.0458198
AGI_60m 3.9505830
ssh_mean 3.2901697
sal_mean 3.2745439
vostr_mean 2.2604520
chl_mean 2.1686907
bathy_sd 0.8962724
uo_mean 0.7362601
vo_mean 0.7075609
mld_mean 0.5826191
pred_var 0.3357501
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1887.49
2 4 sal_mean 1 tag 1176.99
3 3 temp_mean 1 tag 753.28
4 11 bathy_mean 1 tag 637.60
5 14 AGI_60m 1 tag 576.97
6 8 vostr_mean 1 tag 433.15
7 2 chl_mean 1 tag 415.56
8 12 bathy_sd 1 tag 400.28
9 11 bathy_mean 3 temp_mean 363.16
10 7 vo_mean 1 tag 338.63
11 9 ssh_mean 1 tag 259.86
[1] "External percent deviance explained"
[1] -6.698273
[1] "TPR"
[1] 0.2511296
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9642625 -0.9823706 0.0003412825 0.9570393 -6.698273 0.9454786
explore_brt(mod_file_path = brt_outputs[1],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.09622781
Correlation 0.98319352
AUC 0.99980000
Per.Expl 93.05860674
cvDeviance 0.30428517
cvCorrelation 0.91335265
cvAUC 0.98178000
cvPer.Expl 78.05038891
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.2607017
tag 20.7337462
temp_mean 17.6619552
AGI_250m 12.8869113
uostr_mean 5.3151550
ssh_mean 4.0761838
AGI_0m 3.6152744
sal_mean 3.0128057
chl_mean 1.9236281
vostr_mean 1.2761366
bathy_sd 1.1067112
vo_mean 0.6687755
uo_mean 0.6510241
mld_mean 0.5242352
pred_var 0.2867560
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1354.07
2 4 sal_mean 1 tag 1099.13
3 3 temp_mean 1 tag 650.85
4 14 AGI_250m 1 tag 423.38
5 12 bathy_sd 1 tag 385.64
6 2 chl_mean 1 tag 371.29
7 11 bathy_mean 1 tag 357.84
8 9 ssh_mean 1 tag 292.44
9 7 vo_mean 1 tag 283.14
10 8 vostr_mean 1 tag 229.35
11 13 AGI_0m 1 tag 216.49
[1] "External percent deviance explained"
[1] -6.525803
[1] "TPR"
[1] 0.2510634
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4900 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9585593 -0.9766923 0.0006610235 0.9583741 -6.525803 0.9305861
explore_brt(mod_file_path = brt_outputs[2],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.08163443
Correlation 0.98713893
AUC 1.00000000
Per.Expl 94.11130005
cvDeviance 0.29071385
cvCorrelation 0.91828031
cvAUC 0.98297000
cvPer.Expl 79.02935599
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 25.7475201
tag 20.6577555
temp_mean 18.3535453
AGI_250m 12.2429766
uostr_mean 4.4230811
ssh_mean 4.1802787
AGI_0m 4.0304067
sal_mean 2.6720505
AGI_60m 1.6636685
chl_mean 1.5954124
vostr_mean 1.2580929
bathy_sd 1.1907574
vo_mean 0.7000913
uo_mean 0.5201387
mld_mean 0.4689463
pred_var 0.2952780
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1653.43
2 4 sal_mean 1 tag 1094.62
3 3 temp_mean 1 tag 585.48
4 14 AGI_60m 1 tag 369.06
5 8 vostr_mean 1 tag 336.94
6 11 bathy_mean 1 tag 335.06
7 15 AGI_250m 1 tag 309.88
8 12 bathy_sd 1 tag 303.52
9 2 chl_mean 1 tag 295.26
10 9 ssh_mean 1 tag 239.61
11 7 vo_mean 1 tag 210.65
12 13 AGI_0m 1 tag 163.15
13 5 uo_mean 1 tag 142.51
[1] "External percent deviance explained"
[1] -6.697318
[1] "TPR"
[1] 0.251067
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.963364 -0.9805785 0.0004149873 0.9566989 -6.697318 0.941113
explore_brt(mod_file_path = brt_outputs[3],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.06368071
Correlation 0.99128982
AUC 1.00000000
Per.Expl 95.40639170
cvDeviance 0.26342427
cvCorrelation 0.92717887
cvAUC 0.98558000
cvPer.Expl 80.99788972
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.3167942
tag 19.4852974
lat 8.2334957
temp_mean 3.9473184
AGI_250m 3.3725702
AGI_0m 2.2700357
bathy_mean 1.9914815
chl_mean 1.7837172
sal_mean 1.4620165
AGI_60m 1.1307988
ssh_mean 0.8257918
vostr_mean 0.5574685
bathy_sd 0.5546149
vo_mean 0.5191201
uostr_mean 0.4545932
uo_mean 0.4254801
mld_mean 0.4233323
pred_var 0.2460737
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 716.35
2 13 bathy_sd 1 tag 437.20
3 3 chl_mean 1 tag 402.35
4 5 sal_mean 1 tag 378.54
5 16 AGI_60m 1 tag 342.53
6 17 AGI_250m 1 tag 264.53
7 12 bathy_mean 1 tag 261.67
8 4 temp_mean 1 tag 240.35
9 15 dist_coast 1 tag 215.24
10 14 AGI_0m 1 tag 210.42
11 8 vo_mean 1 tag 200.95
12 9 vostr_mean 1 tag 193.50
13 14 AGI_0m 4 temp_mean 189.18
14 14 AGI_0m 2 lat 124.05
15 6 uo_mean 1 tag 115.00
16 10 ssh_mean 1 tag 113.62
[1] "External percent deviance explained"
[1] -7.26371
[1] "TPR"
[1] 0.2512885
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9698598 -0.9855271 0.000225953 0.9550295 -7.26371 0.9540639
Summary table of results
output_sum <- read.csv(here::here("data/brt/mod_outputs/brt_bckg_output_summary.csv"))
kableExtra::kable(output_sum)
base_0m_Nspat_Ntag          78.734  0.724  0.739  0.870  0.979  0.231  0.888  0.787
base_0m_Nspat_Ytag          92.976  0.876  0.761  0.961  0.994  0.141  0.960  0.930
base_0m_Yspat_Ytag          93.544  0.887  0.770  0.964  0.995  0.125  0.963  0.935
do_0m_Nspat_Ytag            94.201  0.901  0.772  0.971  0.996  0.124  0.969  0.942
do_0m_Yspat_Ytag            95.618  0.920  0.788  0.977  0.997  0.110  0.976  0.956
do_0m_60m_Nspat_Ytag        94.865  0.908  0.775  0.973  0.997  0.119  0.972  0.949
do_0m_250m_Nspat_Ytag       95.069  0.909  0.783  0.974  0.996  0.119  0.972  0.951
do_0m_60m_250m_Nspat_Ytag   95.132  0.913  0.783  0.976  0.997  0.116  0.973  0.951
do_0m_60m_250m_Yspat_Ytag   95.186  0.918  0.784  0.977  0.997  0.113  0.975  0.952
agi_0m_Nspat_Ytag           93.845  0.901  0.765  0.971  0.997  0.124  0.970  0.938
agi_0m_Yspat_Ytag           94.754  0.916  0.776  0.975  0.998  0.114  0.974  0.948
agi_0m_60m_Nspat_Ytag       94.548  0.908  0.765  0.973  0.997  0.119  0.972  0.945
agi_0m_250m_Nspat_Ytag      93.059  0.897  0.767  0.967  0.997  0.129  0.967  0.931
agi_0m_60m_250m_Nspat_Ytag  94.111  0.907  0.767  0.972  0.997  0.122  0.971  0.941
agi_0m_60m_250m_Yspat_Ytag  95.406  0.920  0.777  0.976  0.998  0.111  0.975  0.954
ggplot(output_sum, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial models w/ tag ID
Base models: Relative to the CRW PA base models, these had drastically higher AUC scores and deviance explained values. The base model with no spatial or tag ID predictors was the lowest-scoring model.
DO and AGI models: Model performance generally increased with the added depth layers, but all models were fairly comparable to each other. Models with spatial and tag ID predictors performed the best, but, as described in the CRW PA document, we will likely not include these predictors in these models because they would not be included in the projection work and are not essential for addressing this study’s objectives.
Performance metrics were much more similar between comparable DO and AGI models than they were for the models fitted with the CRW PA data.
DO models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this set of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions and how its relative importance compares to that of other covariates typically included in SDMs. Additionally, because BRTs are nonparametric, tag ID does not need to be included.
Models (in the order shown below): 0m, no spatial, no tag; 0m, yes spatial, no tag; 0m & 60m, no spatial, no tag; 0m & 250m, no spatial, no tag; 0m, 60m, & 250m, no spatial, no tag; 0m, 60m, & 250m, yes spatial, no tag
explore_brt(mod_file_path = brt_outputs_Ntag[12],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2227822
Correlation 0.9477851
AUC 0.9963000
Per.Expl 83.9296423
cvDeviance 0.5119148
cvCorrelation 0.8357045
cvAUC 0.9593600
cvPer.Expl 63.0731081
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 37.042591
o2_mean_0m 29.610389
temp_mean 8.255550
chl_mean 5.168471
ssh_mean 3.874249
sal_mean 3.296404
vostr_mean 2.770317
mld_mean 2.274538
bathy_sd 2.115492
uostr_mean 1.764755
uo_mean 1.535677
vo_mean 1.263401
pred_var 1.028166
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 1203.43
2 2 chl_mean 1 o2_mean_0m 685.80
3 11 bathy_mean 3 temp_mean 629.37
4 11 bathy_mean 5 uo_mean 482.06
5 9 ssh_mean 3 temp_mean 428.24
6 10 mld_mean 7 vo_mean 397.02
7 11 bathy_mean 9 ssh_mean 393.82
8 11 bathy_mean 4 sal_mean 391.94
[1] "External percent deviance explained"
[1] -4.109901
[1] "TPR"
[1] 0.2542114
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9154336 -0.9314576 0.007539246 1.002372 -4.109901 0.8392964
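`pred_var` is a column of random numbers, so predictors whose relative influence does not exceed it can be treated as uninformative. The sketch below applies that rule to the relative influence values printed above for the surface DO model. `screen_by_random_predictor` is a hypothetical helper, and the strict "greater than pred_var" cutoff is an assumption about how the screening will be done.

```r
# Relative influence values copied from the DO 0m, no spatial, no tag model above
rel_inf <- c(
  bathy_mean = 37.042591, o2_mean_0m = 29.610389, temp_mean = 8.255550,
  chl_mean = 5.168471, ssh_mean = 3.874249, sal_mean = 3.296404,
  vostr_mean = 2.770317, mld_mean = 2.274538, bathy_sd = 2.115492,
  uostr_mean = 1.764755, uo_mean = 1.535677, vo_mean = 1.263401,
  pred_var = 1.028166
)

# Hypothetical screening rule: a predictor counts as informative only if its
# relative influence is strictly greater than that of the random `pred_var`.
screen_by_random_predictor <- function(rel_inf, benchmark = "pred_var") {
  stopifnot(benchmark %in% names(rel_inf))
  cutoff <- rel_inf[[benchmark]]
  others <- rel_inf[names(rel_inf) != benchmark]
  list(
    keep = names(others)[others > cutoff],
    drop = names(others)[others <= cutoff]
  )
}

screen <- screen_by_random_predictor(rel_inf)
screen$keep
screen$drop  # character(0): every covariate beats pred_var in this model
```

In this model all 12 covariates clear the benchmark, although vo_mean (1.26) sits only slightly above pred_var (1.03). Relative influence from a single fit varies between runs. Comparing against pred_var across several fits or cross-validation folds would therefore likely give a more stable screen.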
explore_brt(mod_file_path = brt_outputs_Ntag[13],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.1924089
Correlation 0.9564733
AUC 0.9975000
Per.Expl 86.1206180
cvDeviance 0.4707743
cvCorrelation 0.8515532
cvAUC 0.9652100
cvPer.Expl 66.0407773
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 53.5317897
o2_mean_0m 12.2975738
lat 8.1708571
bathy_mean 5.8707423
chl_mean 3.6108876
temp_mean 3.2303458
sal_mean 2.5444041
ssh_mean 2.2134207
vostr_mean 1.7157868
mld_mean 1.5219427
uo_mean 1.1835532
bathy_sd 1.1832015
vo_mean 1.0970184
uostr_mean 0.9877975
pred_var 0.8406787
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 890.78
2 12 bathy_mean 4 temp_mean 775.24
3 14 dist_coast 11 mld_mean 564.53
4 3 chl_mean 2 o2_mean_0m 249.12
5 2 o2_mean_0m 1 lat 209.95
6 12 bathy_mean 1 lat 192.27
7 12 bathy_mean 6 uo_mean 190.20
8 12 bathy_mean 5 sal_mean 178.54
9 7 uostr_mean 2 o2_mean_0m 145.22
10 12 bathy_mean 3 chl_mean 137.46
11 12 bathy_mean 10 ssh_mean 127.01
[1] "External percent deviance explained"
[1] -4.442145
[1] "TPR"
[1] 0.2531495
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9259653 -0.9413396 0.005478969 1.00341 -4.442145 0.8612062
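For a Bernoulli model, percent deviance explained is 100 × (1 - model deviance / null deviance). This reproduces the Per.Expl values above from Total.Deviance and Residual.Deviance: for example, 100 × (1 - 0.1924089 / 1.3862928) ≈ 86.12.

The Total.Deviance of 1.3862928 is also very close to 2 ln 2 ≈ 1.3862944. That is the mean null deviance when presences and pseudo-absences are equally common.

A negative external value, such as the -4.44 printed above, means the model's deviance on the test data is larger than that of a null model. The minimal sketch below shows both cases. The helper names are hypothetical, and this is not necessarily how `explore_brt` computes its external statistic.

```r
# Mean Bernoulli deviance: -2 * mean(y * log(p) + (1 - y) * log(1 - p))
bernoulli_deviance <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)  # clamp to avoid log(0)
  -2 * mean(y * log(p) + (1 - y) * log(1 - p))
}

# Percent deviance explained relative to an intercept-only (null) model that
# predicts the observed prevalence everywhere. Negative values mean the model
# deviance exceeds the null deviance.
pct_deviance_explained <- function(y, p) {
  null_dev <- bernoulli_deviance(y, rep(mean(y), length(y)))
  100 * (1 - bernoulli_deviance(y, p) / null_dev)
}

y <- c(1, 1, 0, 0)  # balanced presences / pseudo-absences
null_dev <- bernoulli_deviance(y, rep(mean(y), length(y)))
null_dev                                           # 2 * log(2), about 1.386
pct_deviance_explained(y, c(0.9, 0.8, 0.2, 0.1))  # positive: better than null
pct_deviance_explained(y, c(0.1, 0.2, 0.8, 0.9))  # negative: worse than null
```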
explore_brt(mod_file_path = brt_outputs_Ntag[11],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2018195
Correlation 0.9539294
AUC 0.9971000
Per.Expl 85.4417866
cvDeviance 0.4904242
cvCorrelation 0.8443609
cvAUC 0.9626400
cvPer.Expl 64.6233345
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.8872618
o2_mean_0m 29.0113534
o2_mean_60m 11.2428960
ssh_mean 4.9169342
chl_mean 4.7561458
temp_mean 4.0546910
sal_mean 2.9788618
vostr_mean 1.8897280
mld_mean 1.8536327
bathy_sd 1.6149277
uo_mean 1.4543292
uostr_mean 1.3424132
vo_mean 1.1227679
pred_var 0.8740571
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 811.01
2 6 uostr_mean 1 o2_mean_0m 445.93
3 2 chl_mean 1 o2_mean_0m 381.63
4 11 bathy_mean 4 sal_mean 370.20
5 10 mld_mean 7 vo_mean 347.32
6 11 bathy_mean 5 uo_mean 325.10
7 13 o2_mean_60m 3 temp_mean 284.60
8 9 ssh_mean 2 chl_mean 279.34
9 11 bathy_mean 3 temp_mean 279.32
10 11 bathy_mean 1 o2_mean_0m 238.68
[1] "External percent deviance explained"
[1] -4.350923
[1] "TPR"
[1] 0.2538241
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9223194 -0.9369767 0.006545141 1.002044 -4.350923 0.8544179
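For a binary response, the C-index is the same quantity as the AUC. It is the probability that a randomly chosen presence receives a higher prediction than a randomly chosen absence. The evaluation row above reports a C-index of about 0.0065 even though the cross-validated AUC is about 0.96. A value near 0 means the test presences are ranked almost entirely below the test absences.

The minimal sketch below uses a hypothetical `c_index` helper on toy data. It computes the statistic directly and shows that swapping the 0/1 labels turns C into 1 - C.

```r
# AUC / C-index via the Mann-Whitney formulation: the share of
# presence-absence pairs in which the presence gets the higher prediction
# (ties count as one half).
c_index <- function(obs, pred) {
  pos <- pred[obs == 1]
  neg <- pred[obs == 0]
  diffs <- outer(pos, neg, FUN = "-")  # all presence-minus-absence pairs
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

obs  <- c(1, 1, 1, 0, 0, 0)
pred <- c(0.9, 0.7, 0.6, 0.4, 0.3, 0.8)
c_index(obs, pred)      # 7/9, about 0.778
c_index(1 - obs, pred)  # 2/9: swapping the labels gives 1 - C
```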
explore_brt(mod_file_path = brt_outputs_Ntag[8],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2217747
Correlation 0.9471965
AUC 0.9959000
Per.Expl 84.0023199
cvDeviance 0.4919018
cvCorrelation 0.8427542
cvAUC 0.9622200
cvPer.Expl 64.5167458
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.0545282
o2_mean_0m 30.5314940
o2_mean_250m 15.9557395
temp_mean 3.9670785
chl_mean 3.9145848
sal_mean 2.7179083
ssh_mean 2.6863495
bathy_sd 1.7546135
vostr_mean 1.5258534
mld_mean 1.4129971
uostr_mean 1.3536665
uo_mean 1.3228056
vo_mean 1.0371771
pred_var 0.7652041
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 733.46
2 13 o2_mean_250m 1 o2_mean_0m 339.34
3 2 chl_mean 1 o2_mean_0m 329.37
4 13 o2_mean_250m 4 sal_mean 297.21
5 6 uostr_mean 5 uo_mean 278.72
6 11 bathy_mean 3 temp_mean 265.17
7 11 bathy_mean 1 o2_mean_0m 261.90
8 11 bathy_mean 4 sal_mean 224.97
9 9 ssh_mean 4 sal_mean 194.26
10 4 sal_mean 1 o2_mean_0m 187.63
[1] "External percent deviance explained"
[1] -4.187577
[1] "TPR"
[1] 0.2540331
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9189645 -0.9341724 0.00722521 1.002586 -4.187577 0.8400232
explore_brt(mod_file_path = brt_outputs_Ntag[9],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.1928786
Correlation 0.9567911
AUC 0.9976000
Per.Expl 86.0867365
cvDeviance 0.4815743
cvCorrelation 0.8475797
cvAUC 0.9638500
cvPer.Expl 65.2617211
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 30.1719892
bathy_mean 29.0380795
o2_mean_250m 12.9073690
o2_mean_60m 7.1186837
chl_mean 3.2788350
temp_mean 3.2112162
sal_mean 2.8197533
ssh_mean 2.5431217
bathy_sd 1.6008272
vostr_mean 1.4381590
mld_mean 1.4332633
uostr_mean 1.3219324
uo_mean 1.2491223
vo_mean 1.0814297
pred_var 0.7862186
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 632.37
2 2 chl_mean 1 o2_mean_0m 631.48
3 13 o2_mean_60m 3 temp_mean 326.47
4 11 bathy_mean 5 uo_mean 292.91
5 14 o2_mean_250m 1 o2_mean_0m 266.03
6 11 bathy_mean 3 temp_mean 212.56
7 11 bathy_mean 4 sal_mean 183.05
8 6 uostr_mean 5 uo_mean 176.12
9 14 o2_mean_250m 11 bathy_mean 166.45
10 4 sal_mean 1 o2_mean_0m 153.66
11 9 ssh_mean 4 sal_mean 153.50
[1] "External percent deviance explained"
[1] -4.421676
[1] "TPR"
[1] 0.2532748
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9253238 -0.9416763 0.005680135 1.003422 -4.421676 0.8608674
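For H2, it can help to see how much of the interaction structure in the all-depth DO model involves oxygen. The top interactions printed above can be filtered for terms whose names start with `o2_`. This minimal sketch uses only the eleven interactions listed in that output, so the counts describe that subset rather than every pairwise interaction in the model.

```r
# Top pairwise interactions printed above for the DO model with all depth
# layers (no spatial, no tag ID predictors)
ints <- data.frame(
  var1 = c("temp_mean", "chl_mean", "o2_mean_60m", "bathy_mean", "o2_mean_250m",
           "bathy_mean", "bathy_mean", "uostr_mean", "o2_mean_250m", "sal_mean",
           "ssh_mean"),
  var2 = c("o2_mean_0m", "o2_mean_0m", "temp_mean", "uo_mean", "o2_mean_0m",
           "temp_mean", "sal_mean", "uo_mean", "bathy_mean", "o2_mean_0m",
           "sal_mean"),
  int_size = c(632.37, 631.48, 326.47, 292.91, 266.03, 212.56, 183.05,
               176.12, 166.45, 153.66, 153.50)
)

# Flag interactions where at least one member is an oxygen layer
involves_o2 <- grepl("^o2_", ints$var1) | grepl("^o2_", ints$var2)
ints[involves_o2, ]

# How often each oxygen depth appears among the listed interactions
o2_terms <- c(ints$var1, ints$var2)
o2_terms <- o2_terms[grepl("^o2_", o2_terms)]
table(o2_terms)
```

Here, six of the eleven listed interactions involve at least one oxygen layer, and o2_mean_0m appears in four of them.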
explore_brt(mod_file_path = brt_outputs_Ntag[10],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.1730551
Correlation 0.9630163
AUC 0.9983000
Per.Expl 87.5167023
cvDeviance 0.4571302
cvCorrelation 0.8567712
cvAUC 0.9671700
cvPer.Expl 67.0249860
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.0947600
o2_mean_0m 11.9843937
o2_mean_250m 7.5420896
lat 5.0225211
bathy_mean 3.7536001
o2_mean_60m 3.5987315
chl_mean 2.9175112
temp_mean 2.7347070
sal_mean 2.4927940
ssh_mean 1.8254724
mld_mean 1.3819650
vostr_mean 1.1186016
uo_mean 1.0635226
uostr_mean 0.9547572
bathy_sd 0.9449802
vo_mean 0.9346186
pred_var 0.6349742
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 1104.53
2 12 bathy_mean 4 temp_mean 566.77
3 14 dist_coast 11 mld_mean 551.57
4 12 bathy_mean 1 lat 485.06
5 12 bathy_mean 5 sal_mean 373.98
6 5 sal_mean 2 o2_mean_0m 254.84
7 12 bathy_mean 3 chl_mean 244.77
8 16 o2_mean_250m 12 bathy_mean 192.70
9 15 o2_mean_60m 4 temp_mean 184.99
10 16 o2_mean_250m 2 o2_mean_0m 154.32
11 16 o2_mean_250m 1 lat 133.79
12 2 o2_mean_0m 1 lat 130.57
13 15 o2_mean_60m 3 chl_mean 117.28
14 3 chl_mean 2 o2_mean_0m 114.22
[1] "External percent deviance explained"
[1] -4.646858
[1] "TPR"
[1] 0.2525845
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9319747 -0.9496718 0.004143605 1.002288 -4.646858 0.875167
AGI models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this set of models, I am interested in identifying the role that AGI may play in habitat suitability predictions and how its relative importance compares to that of other covariates typically included in SDMs. Additionally, because BRTs are nonparametric, tag ID does not need to be included.
Models (in the order shown below): 0m, no spatial, no tag; 0m, yes spatial, no tag; 0m & 60m, no spatial, no tag; 0m & 250m, no spatial, no tag; 0m, 60m, & 250m, no spatial, no tag; 0m, 60m, & 250m, yes spatial, no tag
explore_brt(mod_file_path = brt_outputs_Ntag[5],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.2497858
Correlation 0.9395089
AUC 0.9945000
Per.Expl 81.9816994
cvDeviance 0.5307930
cvCorrelation 0.8283713
cvAUC 0.9563800
cvPer.Expl 61.7112444
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 36.9573034
temp_mean 21.9226886
AGI_0m 9.9456826
ssh_mean 5.1190726
uostr_mean 5.0894934
sal_mean 4.6868399
chl_mean 4.6532916
vostr_mean 3.5292848
bathy_sd 2.1704713
uo_mean 1.7266251
mld_mean 1.7123633
vo_mean 1.5145409
pred_var 0.9723426
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 6924.06
2 10 bathy_mean 4 uo_mean 486.67
3 12 AGI_0m 8 ssh_mean 471.86
4 10 bathy_mean 8 ssh_mean 421.48
5 12 AGI_0m 4 uo_mean 404.95
6 10 bathy_mean 2 temp_mean 375.23
7 10 bathy_mean 3 sal_mean 341.20
8 8 ssh_mean 5 uostr_mean 230.04
[1] "External percent deviance explained"
[1] -3.883058
[1] "TPR"
[1] 0.2539667
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9104791 -0.9295857 0.007261449 0.9609996 -3.883058 0.819817
explore_brt(mod_file_path = brt_outputs_Ntag[6],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1967422
Correlation 0.9556098
AUC 0.9974000
Per.Expl 85.8080043
cvDeviance 0.4860996
cvCorrelation 0.8459321
cvAUC 0.9628400
cvPer.Expl 64.9352059
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 53.5244509
lat 10.5089650
AGI_0m 7.1058947
bathy_mean 6.5961745
temp_mean 5.3593100
chl_mean 3.3173597
sal_mean 2.8581691
ssh_mean 2.0268410
uo_mean 1.5245115
vostr_mean 1.4444895
mld_mean 1.3388584
bathy_sd 1.2377161
uostr_mean 1.2096878
vo_mean 1.1623617
pred_var 0.7852103
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 2714.25
2 13 AGI_0m 1 lat 626.22
3 11 bathy_mean 3 temp_mean 463.19
4 11 bathy_mean 2 chl_mean 314.46
5 3 temp_mean 1 lat 308.75
6 11 bathy_mean 5 uo_mean 282.97
7 13 AGI_0m 11 bathy_mean 248.68
8 11 bathy_mean 1 lat 234.37
9 14 dist_coast 8 vostr_mean 177.05
10 11 bathy_mean 9 ssh_mean 176.04
11 11 bathy_mean 4 sal_mean 173.67
[1] "External percent deviance explained"
[1] -4.389091
[1] "TPR"
[1] 0.2526244
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9251414 -0.9441604 0.004502158 0.9584907 -4.389091 0.85808
explore_brt(mod_file_path = brt_outputs_Ntag[4],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1981884
Correlation 0.9571046
AUC 0.9975000
Per.Expl 85.7036795
cvDeviance 0.5045864
cvCorrelation 0.8398032
cvAUC 0.9604600
cvPer.Expl 63.6016628
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 34.034848
temp_mean 21.042165
AGI_0m 9.921024
AGI_60m 5.785260
sal_mean 5.106237
uostr_mean 4.656571
chl_mean 4.204098
ssh_mean 4.041292
vostr_mean 3.376607
bathy_sd 1.942936
uo_mean 1.777164
mld_mean 1.692922
vo_mean 1.399912
pred_var 1.018965
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 6728.90
2 10 bathy_mean 2 temp_mean 594.17
3 12 AGI_0m 8 ssh_mean 365.47
4 10 bathy_mean 3 sal_mean 336.23
5 10 bathy_mean 8 ssh_mean 334.83
6 13 AGI_60m 10 bathy_mean 328.60
7 10 bathy_mean 4 uo_mean 286.71
8 12 AGI_0m 4 uo_mean 200.22
9 13 AGI_60m 2 temp_mean 196.14
10 5 uostr_mean 2 temp_mean 190.43
[1] "External percent deviance explained"
[1] -4.229205
[1] "TPR"
[1] 0.2524369
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5100 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9233124 -0.9460456 0.004115473 0.9609968 -4.229205 0.8570368
explore_brt(mod_file_path = brt_outputs_Ntag[1],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.2223382
Correlation 0.9477111
AUC 0.9960000
Per.Expl 83.9616359
cvDeviance 0.5036610
cvCorrelation 0.8404252
cvAUC 0.9600700
cvPer.Expl 63.6684121
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.079609
temp_mean 20.440342
AGI_250m 12.927116
AGI_0m 8.557389
uostr_mean 5.276555
ssh_mean 4.592386
sal_mean 4.485481
chl_mean 3.402499
bathy_sd 2.123828
vostr_mean 2.027799
uo_mean 1.572834
vo_mean 1.438505
mld_mean 1.305188
pred_var 0.770471
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 4454.06
2 10 bathy_mean 3 sal_mean 445.32
3 13 AGI_250m 2 temp_mean 412.52
4 13 AGI_250m 3 sal_mean 286.51
5 13 AGI_250m 10 bathy_mean 285.61
6 12 AGI_0m 10 bathy_mean 283.01
7 10 bathy_mean 2 temp_mean 274.09
8 12 AGI_0m 4 uo_mean 267.22
9 12 AGI_0m 8 ssh_mean 234.66
10 12 AGI_0m 11 bathy_sd 234.29
[1] "External percent deviance explained"
[1] -4.130385
[1] "TPR"
[1] 0.253199
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9190934 -0.9393155 0.005542045 0.9595282 -4.130385 0.8396164
explore_brt(mod_file_path = brt_outputs_Ntag[2],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1822323
Correlation 0.9612794
AUC 0.9981000
Per.Expl 86.8546720
cvDeviance 0.4900686
cvCorrelation 0.8457470
cvAUC 0.9625500
cvPer.Expl 64.6489000
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.5789429
temp_mean 20.2111119
AGI_250m 12.8943905
AGI_0m 8.2276644
uostr_mean 5.6018303
sal_mean 4.1762023
ssh_mean 3.7524824
AGI_60m 3.4167547
chl_mean 3.2497693
vostr_mean 1.9240122
bathy_sd 1.8515668
uo_mean 1.5600765
mld_mean 1.3810041
vo_mean 1.3729043
pred_var 0.8012876
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 5794.97
2 10 bathy_mean 3 sal_mean 437.23
3 14 AGI_250m 2 temp_mean 429.14
4 12 AGI_0m 10 bathy_mean 421.21
5 13 AGI_60m 10 bathy_mean 414.15
6 12 AGI_0m 8 ssh_mean 331.82
7 4 uo_mean 2 temp_mean 322.86
8 10 bathy_mean 4 uo_mean 294.72
9 10 bathy_mean 2 temp_mean 294.26
10 14 AGI_250m 3 sal_mean 239.00
11 8 ssh_mean 3 sal_mean 167.69
[1] "External percent deviance explained"
[1] -4.403407
[1] "TPR"
[1] 0.252197
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5150 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9290584 -0.9515003 0.003382733 0.9601138 -4.403407 0.8685467
explore_brt(mod_file_path = brt_outputs_Ntag[3],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1859167
Correlation 0.9590992
AUC 0.9978000
Per.Expl 86.5888987
cvDeviance 0.4697185
cvCorrelation 0.8519017
cvAUC 0.9651800
cvPer.Expl 66.1168520
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.8697060
lat 9.6962485
AGI_0m 6.1261413
bathy_mean 5.4406471
AGI_250m 4.8968222
temp_mean 4.8373778
chl_mean 2.9005605
sal_mean 2.7254972
AGI_60m 2.2333278
ssh_mean 1.9217436
uo_mean 1.3083809
mld_mean 1.2604230
vostr_mean 1.2375953
vo_mean 1.0230573
uostr_mean 0.9963094
bathy_sd 0.9226174
pred_var 0.6035448
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1149.59
2 13 AGI_0m 1 lat 966.63
3 13 AGI_0m 11 bathy_mean 645.51
4 15 AGI_60m 11 bathy_mean 335.90
5 11 bathy_mean 3 temp_mean 275.79
6 11 bathy_mean 5 uo_mean 265.44
7 6 uostr_mean 1 lat 221.40
8 16 AGI_250m 11 bathy_mean 194.80
9 13 AGI_0m 5 uo_mean 193.30
10 11 bathy_mean 1 lat 183.75
11 3 temp_mean 1 lat 176.25
12 15 AGI_60m 3 temp_mean 165.89
13 11 bathy_mean 2 chl_mean 152.85
14 8 vostr_mean 5 uo_mean 137.36
[1] "External percent deviance explained"
[1] -4.428819
[1] "TPR"
[1] 0.252357
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9289301 -0.9497639 0.00375298 0.9585225 -4.428819 0.865889
Summary table of results
output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
base_0m_Nspat_Ntag 78.734 0.724 0.739 0.870 0.979 0.231 0.888 0.787
do_0m_Nspat_Ntag 83.930 0.785 0.744 0.906 0.987 0.199 0.919 0.839
do_0m_Yspat_Ntag 86.121 0.810 0.746 0.921 0.990 0.186 0.930 0.861
do_0m_60m_Nspat_Ntag 85.442 0.802 0.745 0.919 0.989 0.189 0.927 0.854
do_0m_250m_Nspat_Ntag 84.002 0.789 0.744 0.910 0.987 0.197 0.920 0.840
do_0m_60m_250m_Nspat_Ntag 86.087 0.809 0.746 0.917 0.990 0.187 0.929 0.861
do_0m_60m_250m_Yspat_Ntag 87.517 0.823 0.747 0.928 0.992 0.179 0.935 0.875
agi_0m_Nspat_Ntag 81.982 0.775 0.743 0.903 0.987 0.204 0.915 0.820
agi_0m_Yspat_Ntag 85.808 0.809 0.746 0.922 0.990 0.186 0.930 0.858
agi_0m_60m_Nspat_Ntag 85.704 0.805 0.745 0.922 0.990 0.187 0.929 0.857
agi_0m_250m_Nspat_Ntag 83.962 0.793 0.744 0.914 0.988 0.195 0.923 0.840
agi_0m_60m_250m_Nspat_Ntag 86.855 0.818 0.746 0.928 0.991 0.179 0.935 0.869
agi_0m_60m_250m_Yspat_Ntag 86.589 0.820 0.746 0.928 0.991 0.180 0.935 0.866
output_sum_Ntag_Nspat <- output_sum_Ntag %>%
  filter(!grepl("Yspat", model))
ggplot(output_sum_Ntag_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial models w/o tag ID
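If only considering models that did not include spatial predictors, the AGI models performed comparably to or slightly better than the DO models once deeper layers were included (85.704 vs. 85.442 percent deviance explained with 0m & 60m; 86.855 vs. 86.087 with all depth layers). However, the surface-only DO model outperformed the surface-only AGI model (83.930 vs. 81.982), and the 0m & 250m models were nearly identical (84.002 vs. 83.962).
The AGI model with all depth layers performed the best of these models, although only slightly better than the comparable DO model.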
For the DO model with all depth layers, DO_0m was the predictor variable with the highest relative influence, closely followed by bathymetry. DO_250m was the third most influential predictor, but its relative influence was considerably lower than that of DO_0m and bathymetry. The partial plots show drastically different relationships than those from the CRW PA models, with DO_250m having a positive relationship and DO_0m having an inverse sweet spot.
For the AGI model with all depth layers, bathymetry and temperature were the two predictors with the highest relative influence. AGI_250m ranked third, followed somewhat closely by AGI_0m. The partial plots for AGI_250m and AGI_0m are similar to those of the DO models, but less extreme.
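The first numeric column of the summary table matches each model's Per.Expl (for example, 83.930 for do_0m_Nspat_Ntag). Pairing the DO and AGI models that use the same depth layers makes the H1 comparison explicit. The sketch below hard-codes those values for the models without spatial or tag ID predictors. `compare_pairs` is a hypothetical helper that matches models by stripping the `do_`/`agi_` prefix.

```r
# Percent deviance explained (first numeric column of the Ntag summary table)
# for the models without spatial or tag ID predictors
dev_exp <- c(
  do_0m_Nspat_Ntag = 83.930,          agi_0m_Nspat_Ntag = 81.982,
  do_0m_60m_Nspat_Ntag = 85.442,      agi_0m_60m_Nspat_Ntag = 85.704,
  do_0m_250m_Nspat_Ntag = 84.002,     agi_0m_250m_Nspat_Ntag = 83.962,
  do_0m_60m_250m_Nspat_Ntag = 86.087, agi_0m_60m_250m_Nspat_Ntag = 86.855
)

# Pair each DO model with the AGI model that uses the same depth layers,
# then report AGI minus DO for each configuration
compare_pairs <- function(x) {
  do_vals  <- x[startsWith(names(x), "do_")]
  agi_vals <- x[startsWith(names(x), "agi_")]
  names(do_vals)  <- sub("^do_", "", names(do_vals))
  names(agi_vals) <- sub("^agi_", "", names(agi_vals))
  shared <- intersect(names(do_vals), names(agi_vals))
  data.frame(config = shared,
             do = unname(do_vals[shared]),
             agi = unname(agi_vals[shared]),
             agi_minus_do = unname(agi_vals[shared] - do_vals[shared]))
}

cmp <- compare_pairs(dev_exp)
cmp
```

With these values, AGI minus DO is about -1.95 for the surface-only models, 0.26 with 0m & 60m, -0.04 with 0m & 250m, and 0.77 with all depth layers.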
Base models w/o tag ID and w/ data at seasonal and annual resolutions
For these models, the environmental raster data were averaged by season and by year. Environmental values were then extracted at the observed and pseudo-absence locations from these averaged rasters, with each location matched to the raster for its season or year.
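A minimal sketch of the seasonal averaging and matching step, using a small data frame as a stand-in for the daily rasters. The following are assumptions for illustration, not the study's actual processing:

- the column names;
- the `month_to_season` helper;
- the month-based season definition (December assigned to winter of the same calendar year).

```r
# Hypothetical daily environmental values on a small grid (one row per cell-day)
daily <- data.frame(
  cell = rep(c(1, 2), each = 4),
  date = as.Date(rep(c("2019-01-10", "2019-02-10", "2019-07-10", "2019-08-10"), 2)),
  o2   = c(200, 210, 180, 170, 220, 230, 190, 200)
)

# Assumed season definition by calendar month (Dec-Feb = winter, etc.)
month_to_season <- function(m) {
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}
daily$year   <- as.integer(format(daily$date, "%Y"))
daily$season <- month_to_season(as.integer(format(daily$date, "%m")))

# Average the daily values by cell, year, and season (the seasonal "raster")
seasonal <- aggregate(o2 ~ cell + year + season, data = daily, FUN = mean)

# Observed and pseudo-absence locations, already assigned to a grid cell
locs <- data.frame(id = c("obs1", "pa1"), cell = c(1, 2),
                   date = as.Date(c("2019-01-25", "2019-07-30")))
locs$year   <- as.integer(format(locs$date, "%Y"))
locs$season <- month_to_season(as.integer(format(locs$date, "%m")))

# Match each location to the seasonal mean for its own cell, year, and season
matched <- merge(locs, seasonal, by = c("cell", "year", "season"), all.x = TRUE)
matched[, c("id", "season", "o2")]  # obs1: winter, 205; pa1: summer, 195
```

The annual version would drop `season` from both the aggregation formula and the merge keys.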
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = base_test_seasonal)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862930
Residual.Deviance 0.3718825
Correlation 0.8913160
AUC 0.9811000
Per.Expl 73.1743220
cvDeviance 0.5439165
cvCorrelation 0.8203274
cvAUC 0.9543000
cvPer.Expl 60.7646778
[1] "Relative influence of predictor variables"
rel.inf
vo_mean 37.7484397
vostr_mean 13.0995207
bathy_mean 9.5110530
uostr_mean 8.6973249
ssh_mean 8.4917915
sal_mean 6.2551008
temp_mean 5.2891595
mld_mean 3.9670224
chl_mean 2.9443099
uo_mean 1.9103745
bathy_sd 1.2421311
pred_var 0.8437721
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 sal_mean 1 mld_mean 1130.27
2 10 bathy_mean 6 uostr_mean 473.01
3 8 vostr_mean 4 temp_mean 345.16
4 7 vo_mean 3 ssh_mean 238.10
5 7 vo_mean 2 sal_mean 188.96
6 8 vostr_mean 3 ssh_mean 179.53
7 4 temp_mean 2 sal_mean 164.83
[1] "External percent deviance explained"
[1] -3.262271
[1] "TPR"
[1] 0.2625646
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.8871873 -0.8752483 0.02439327 0.990689 -3.262271 0.7317432
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = base_test_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862892
Residual.Deviance 0.3485522
Correlation 0.9016247
AUC 0.9844000
Per.Expl 74.8571794
cvDeviance 0.5423354
cvCorrelation 0.8223235
cvAUC 0.9539500
cvPer.Expl 60.8786270
[1] "Relative influence of predictor variables"
rel.inf
vo_mean 38.5581912
vostr_mean 16.9760400
uostr_mean 11.7424763
bathy_mean 10.1812331
chl_mean 5.1572250
sal_mean 4.4084308
temp_mean 3.6019413
ssh_mean 3.1522272
mld_mean 2.3610565
uo_mean 1.8300039
bathy_sd 1.2618894
pred_var 0.7692852
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 vostr_mean 6 uostr_mean 1088.69
2 7 vo_mean 6 uostr_mean 501.60
3 10 bathy_mean 8 vostr_mean 396.66
4 3 ssh_mean 1 mld_mean 391.80
5 8 vostr_mean 3 ssh_mean 319.72
6 8 vostr_mean 4 temp_mean 298.87
7 8 vostr_mean 1 mld_mean 259.45
[1] "External percent deviance explained"
[1] -3.253788
[1] "TPR"
[1] 0.2632895
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.8860142 -0.87391 0.0259118 0.9730752 -3.253788 0.7485718
DO models w/o tag ID and w/ data at seasonal and annual resolutions
Models (in the order shown below): Seasonal, Nspat, Ntag; Seasonal, Yspat, Ntag; Annual, Nspat, Ntag; Annual, Yspat, Ntag; Daily, seasonal, and annual, Nspat, Ntag; Daily, seasonal, and annual, Yspat, Ntag
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2470047
Correlation 0.9382402
AUC 0.9942000
Per.Expl 82.1822454
cvDeviance 0.4886605
cvCorrelation 0.8432439
cvAUC 0.9622300
cvPer.Expl 64.7503346
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m_seas 29.8253334
bathy_mean 28.6073228
o2_mean_250m_seas 12.8824390
o2_mean_60m_seas 7.4286303
ssh_mean 3.7142049
chl_mean 3.4442744
temp_mean 3.3428766
sal_mean 2.5604620
uostr_mean 1.4038380
mld_mean 1.3638226
bathy_sd 1.3610024
vostr_mean 1.2779076
uo_mean 1.2051556
vo_mean 0.8640793
pred_var 0.7186511
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_seas 3 sal_mean 327.14
2 15 o2_mean_250m_seas 10 bathy_mean 303.66
3 10 bathy_mean 1 chl_mean 255.27
4 10 bathy_mean 4 uo_mean 243.69
5 10 bathy_mean 8 ssh_mean 225.29
6 15 o2_mean_250m_seas 13 o2_mean_0m_seas 198.72
7 10 bathy_mean 2 temp_mean 198.07
8 13 o2_mean_0m_seas 8 ssh_mean 191.82
9 13 o2_mean_0m_seas 2 temp_mean 185.50
10 10 bathy_mean 3 sal_mean 165.98
11 13 o2_mean_0m_seas 10 bathy_mean 164.29
[1] "External percent deviance explained"
[1] -3.957167
[1] "TPR"
[1] 0.2572183
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6850 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9093165 -0.9106175 0.01362216 0.9920977 -3.957167 0.8218225
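The Per.Expl and cvPer.Expl values reported above are consistent with percent deviance explained computed as 100 * (total deviance - residual deviance) / total deviance. The training value uses the residual deviance, and the cross-validated value uses cvDeviance. Here is a quick check using the numbers from this seasonal DO model. `pct_dev_explained()` is a hypothetical helper.

```r
# Percent deviance explained: 100 * (total - residual) / total.
# Inputs are copied from the seasonal DO model output (Nspat, Ntag) above.
pct_dev_explained <- function(total, residual) {
  100 * (total - residual) / total
}

total_dev <- 1.3862840
pct_dev_explained(total_dev, 0.2470047)  # training: about 82.18, matches Per.Expl
pct_dev_explained(total_dev, 0.4886605)  # cross-validated: about 64.75, matches cvPer.Expl
```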
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2393353
Correlation 0.9405329
AUC 0.9946000
Per.Expl 82.7354798
cvDeviance 0.4837428
cvCorrelation 0.8448644
cvAUC 0.9630300
cvPer.Expl 65.1050721
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.7884390
o2_mean_0m_seas 11.3114005
o2_mean_250m_seas 8.0490924
o2_mean_60m_seas 4.3433808
bathy_mean 4.0318500
lat 3.9720294
chl_mean 3.0187321
temp_mean 2.7062765
sal_mean 2.2069866
ssh_mean 1.8282680
mld_mean 1.2179747
uo_mean 1.0978100
vostr_mean 1.0915666
bathy_sd 1.0099016
uostr_mean 0.9010789
vo_mean 0.8156201
pred_var 0.6095928
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 394.73
2 17 o2_mean_250m_seas 11 bathy_mean 298.69
3 11 bathy_mean 2 chl_mean 293.55
4 11 bathy_mean 5 uo_mean 198.66
5 16 o2_mean_60m_seas 4 sal_mean 174.92
6 16 o2_mean_60m_seas 11 bathy_mean 171.41
7 15 o2_mean_0m_seas 1 lat 160.69
8 11 bathy_mean 9 ssh_mean 156.06
9 13 dist_coast 10 mld_mean 151.33
10 11 bathy_mean 4 sal_mean 125.29
11 6 uostr_mean 1 lat 112.97
12 15 o2_mean_0m_seas 4 sal_mean 111.65
13 16 o2_mean_60m_seas 3 temp_mean 100.81
14 9 ssh_mean 4 sal_mean 99.29
[1] "External percent deviance explained"
[1] -4.031325
[1] "TPR"
[1] 0.2570792
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6800 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9113597 -0.9126288 0.0133201 0.990409 -4.031325 0.8273548
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2418740
Correlation 0.9426300
AUC 0.9953000
Per.Expl 82.5523491
cvDeviance 0.5203305
cvCorrelation 0.8308027
cvAUC 0.9580000
cvPer.Expl 62.4658114
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.6559956
o2_mean_0m_ann 22.1577042
o2_mean_250m_ann 13.9331703
temp_mean 7.5838024
o2_mean_60m_ann 7.1954997
chl_mean 4.4958699
sal_mean 3.5287622
ssh_mean 2.9167102
uostr_mean 2.2307148
vostr_mean 1.6575902
mld_mean 1.5932019
bathy_sd 1.5853359
uo_mean 1.4712842
vo_mean 1.1971742
pred_var 0.7971843
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_ann 2 temp_mean 476.01
2 14 o2_mean_60m_ann 10 bathy_mean 249.69
3 10 bathy_mean 2 temp_mean 246.40
4 10 bathy_mean 1 chl_mean 230.23
5 10 bathy_mean 4 uo_mean 223.84
6 10 bathy_mean 3 sal_mean 190.35
7 8 ssh_mean 5 uostr_mean 157.00
8 10 bathy_mean 8 ssh_mean 137.07
9 14 o2_mean_60m_ann 13 o2_mean_0m_ann 133.08
10 13 o2_mean_0m_ann 3 sal_mean 125.77
11 15 o2_mean_250m_ann 13 o2_mean_0m_ann 123.86
[1] "External percent deviance explained"
[1] -3.831515
[1] "TPR"
[1] 0.2571994
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9064786 -0.910928 0.01359871 0.9935801 -3.831515 0.8255235
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2251524
Correlation 0.9476293
AUC 0.9962000
Per.Expl 83.7585691
cvDeviance 0.5016073
cvCorrelation 0.8374576
cvAUC 0.9609200
cvPer.Expl 63.8164123
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.0391515
lat 7.5722313
o2_mean_250m_ann 5.9824841
o2_mean_0m_ann 5.6170844
bathy_mean 5.3985391
chl_mean 4.1449974
temp_mean 3.4237513
o2_mean_60m_ann 3.1501892
sal_mean 2.8294737
ssh_mean 2.2021890
vostr_mean 1.4298185
mld_mean 1.3292879
uo_mean 1.2080682
bathy_sd 1.0629375
uostr_mean 1.0065255
vo_mean 0.9086555
pred_var 0.6946158
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 552.75
2 16 o2_mean_60m_ann 11 bathy_mean 437.44
3 11 bathy_mean 2 chl_mean 329.67
4 16 o2_mean_60m_ann 3 temp_mean 246.01
5 6 uostr_mean 1 lat 204.09
6 13 dist_coast 10 mld_mean 180.83
7 15 o2_mean_0m_ann 1 lat 172.78
8 11 bathy_mean 9 ssh_mean 157.39
9 11 bathy_mean 5 uo_mean 129.80
10 16 o2_mean_60m_ann 4 sal_mean 121.32
11 8 vostr_mean 5 uo_mean 119.76
12 16 o2_mean_60m_ann 1 lat 111.11
13 17 o2_mean_250m_ann 1 lat 97.43
14 3 temp_mean 1 lat 84.62
[1] "External percent deviance explained"
[1] -4.025613
[1] "TPR"
[1] 0.2564111
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8400 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9113126 -0.9164691 0.01199286 0.9927838 -4.025613 0.8375857
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.1951649
Correlation 0.9560043
AUC 0.9974000
Per.Expl 85.9217263
cvDeviance 0.4585485
cvCorrelation 0.8553534
cvAUC 0.9665500
cvPer.Expl 66.9224706
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.7948276
o2_mean_0m 21.5078589
o2_mean_250m_seas 10.0195711
o2_mean_0m_seas 8.3152092
o2_mean_60m_seas 5.4522658
o2_mean_250m_ann 3.7071018
o2_mean_0m_ann 3.4441561
chl_mean 2.6938371
temp_mean 2.5287651
o2_mean_60m_ann 1.9918635
sal_mean 1.9409870
ssh_mean 1.9325752
o2_mean_250m 1.8523367
o2_mean_60m 1.6689402
vostr_mean 1.0311990
bathy_sd 1.0131874
mld_mean 1.0037532
uostr_mean 0.9663831
uo_mean 0.9203555
vo_mean 0.6778727
pred_var 0.5369536
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 370.38
2 18 o2_mean_250m_seas 14 o2_mean_250m 306.19
3 18 o2_mean_250m_seas 11 bathy_mean 305.67
4 13 o2_mean_60m 11 bathy_mean 224.97
5 17 o2_mean_60m_seas 4 sal_mean 204.17
6 11 bathy_mean 2 chl_mean 135.89
7 20 o2_mean_60m_ann 11 bathy_mean 134.81
8 11 bathy_mean 3 temp_mean 133.33
9 11 bathy_mean 5 uo_mean 129.99
10 2 chl_mean 1 o2_mean_0m 114.45
11 11 bathy_mean 4 sal_mean 102.36
12 16 o2_mean_0m_seas 3 temp_mean 96.45
13 16 o2_mean_0m_seas 9 ssh_mean 94.23
14 20 o2_mean_60m_ann 7 vo_mean 90.49
15 11 bathy_mean 9 ssh_mean 87.48
16 4 sal_mean 3 temp_mean 83.53
17 8 vostr_mean 1 o2_mean_0m 81.07
18 21 o2_mean_250m_ann 18 o2_mean_250m_seas 80.52
19 18 o2_mean_250m_seas 1 o2_mean_0m 72.67
20 6 uostr_mean 1 o2_mean_0m 72.01
21 19 o2_mean_0m_ann 17 o2_mean_60m_seas 67.89
22 13 o2_mean_60m 3 temp_mean 66.18
[1] "External percent deviance explained"
[1] -4.334845
[1] "TPR"
[1] 0.2553172
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9223993 -0.9289301 0.009656197 0.9896762 -4.334845 0.8592173
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.1892159
Correlation 0.9579015
AUC 0.9977000
Per.Expl 86.3508545
cvDeviance 0.4502540
cvCorrelation 0.8578266
cvAUC 0.9676900
cvPer.Expl 67.5207993
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 49.9204334
o2_mean_0m 8.5367913
o2_mean_0m_seas 5.4191441
o2_mean_250m_seas 4.6052682
lat 3.6613526
bathy_mean 3.5322270
o2_mean_60m_seas 3.1409177
chl_mean 2.4859345
o2_mean_250m_ann 2.4025619
temp_mean 2.2239870
sal_mean 1.7501465
o2_mean_60m 1.6449577
o2_mean_250m 1.5228690
o2_mean_60m_ann 1.5065172
ssh_mean 1.4200972
o2_mean_0m_ann 0.9673241
vostr_mean 0.9644953
mld_mean 0.9434420
uo_mean 0.8649698
bathy_sd 0.7097241
uostr_mean 0.6677465
vo_mean 0.6428025
pred_var 0.4662903
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 20 o2_mean_250m_seas 12 bathy_mean 350.89
2 12 bathy_mean 3 chl_mean 339.39
3 12 bathy_mean 4 temp_mean 252.74
4 20 o2_mean_250m_seas 16 o2_mean_250m 233.93
5 12 bathy_mean 5 sal_mean 231.91
6 14 dist_coast 9 vostr_mean 230.46
7 22 o2_mean_60m_ann 12 bathy_mean 220.69
8 4 temp_mean 2 o2_mean_0m 210.49
9 14 dist_coast 11 mld_mean 185.40
10 15 o2_mean_60m 12 bathy_mean 177.56
11 3 chl_mean 2 o2_mean_0m 147.74
12 12 bathy_mean 6 uo_mean 144.14
13 12 bathy_mean 10 ssh_mean 142.95
14 9 vostr_mean 2 o2_mean_0m 136.67
15 15 o2_mean_60m 4 temp_mean 103.32
16 21 o2_mean_0m_ann 1 lat 99.30
17 19 o2_mean_60m_seas 5 sal_mean 97.76
18 18 o2_mean_0m_seas 13 bathy_sd 89.75
19 2 o2_mean_0m 1 lat 78.99
20 18 o2_mean_0m_seas 1 lat 77.18
21 22 o2_mean_60m_ann 2 o2_mean_0m 76.72
22 22 o2_mean_60m_ann 1 lat 68.39
23 18 o2_mean_0m_seas 10 ssh_mean 61.97
24 23 o2_mean_250m_ann 20 o2_mean_250m_seas 60.12
25 18 o2_mean_0m_seas 8 vo_mean 56.72
26 21 o2_mean_0m_ann 19 o2_mean_60m_seas 56.55
[1] "External percent deviance explained"
[1] -4.38333
[1] "TPR"
[1] 0.2550322
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9237878 -0.9312165 0.009047522 0.9890362 -4.38333 0.8635085
AGI models w/o tag ID and w/ data at seasonal and annual resolutions
Model variants: Seasonal (Nspat, Ntag); Seasonal (Yspat, Ntag); Annual (Nspat, Ntag); Annual (Yspat, Ntag); Daily, Seasonal, and Annual (Nspat, Ntag); Daily, Seasonal, and Annual (Yspat, Ntag)
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2421421
Correlation 0.9418055
AUC 0.9951000
Per.Expl 82.5329751
cvDeviance 0.5074723
cvCorrelation 0.8361282
cvAUC 0.9592900
cvPer.Expl 63.3932574
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 28.8388522
temp_mean 21.3968548
AGI_250m_seas 15.4230725
uostr_mean 5.4030524
AGI_0m_seas 5.3820944
sal_mean 4.6015206
AGI_60m_seas 4.2373703
chl_mean 3.4946950
ssh_mean 2.7499598
mld_mean 1.7088384
vostr_mean 1.7086736
bathy_sd 1.6872755
uo_mean 1.3773158
vo_mean 1.2699474
pred_var 0.7204775
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 3 sal_mean 513.24
2 15 AGI_250m_seas 2 temp_mean 439.74
3 10 bathy_mean 2 temp_mean 305.07
4 15 AGI_250m_seas 10 bathy_mean 256.15
5 14 AGI_60m_seas 10 bathy_mean 207.41
6 13 AGI_0m_seas 2 temp_mean 201.25
7 13 AGI_0m_seas 6 vo_mean 184.66
8 14 AGI_60m_seas 2 temp_mean 181.04
9 10 bathy_mean 4 uo_mean 143.19
10 2 temp_mean 1 chl_mean 132.10
11 6 vo_mean 3 sal_mean 131.10
[1] "External percent deviance explained"
[1] -3.890904
[1] "TPR"
[1] 0.257047
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9083448 -0.9128889 0.01332724 0.9877051 -3.890904 0.8253298
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2155743
Correlation 0.9500420
AUC 0.9967000
Per.Expl 84.4494519
cvDeviance 0.4841299
cvCorrelation 0.8444286
cvAUC 0.9628100
cvPer.Expl 65.0770781
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.3084109
lat 7.5685093
AGI_250m_seas 6.5650645
bathy_mean 4.9318242
temp_mean 4.7807364
AGI_0m_seas 4.5184742
AGI_60m_seas 3.7734783
sal_mean 3.4432083
chl_mean 3.2167249
ssh_mean 1.8300516
mld_mean 1.4287051
vostr_mean 1.1808924
uo_mean 1.1091414
vo_mean 0.9476019
uostr_mean 0.9377008
bathy_sd 0.8703324
pred_var 0.5891434
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 467.69
2 15 AGI_0m_seas 7 vo_mean 276.78
3 17 AGI_250m_seas 11 bathy_mean 271.41
4 3 temp_mean 1 lat 255.78
5 13 dist_coast 10 mld_mean 230.24
6 15 AGI_0m_seas 4 sal_mean 207.85
7 4 sal_mean 1 lat 151.54
8 11 bathy_mean 2 chl_mean 151.13
9 6 uostr_mean 1 lat 142.70
10 3 temp_mean 2 chl_mean 141.51
11 11 bathy_mean 9 ssh_mean 131.67
12 13 dist_coast 8 vostr_mean 107.96
13 16 AGI_60m_seas 11 bathy_mean 106.29
14 13 dist_coast 1 lat 100.32
[1] "External percent deviance explained"
[1] -4.131152
[1] "TPR"
[1] 0.2558366
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9161296 -0.9231533 0.01083895 0.9905151 -4.131152 0.8444945
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2628935
Correlation 0.9354953
AUC 0.9940000
Per.Expl 81.0360614
cvDeviance 0.5359687
cvCorrelation 0.8240669
cvAUC 0.9553400
cvPer.Expl 61.3376613
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.262030
temp_mean 21.315013
AGI_250m_ann 14.991258
uostr_mean 5.778944
sal_mean 5.124400
AGI_60m_ann 4.443709
chl_mean 3.980368
ssh_mean 3.370811
AGI_0m_ann 2.546665
bathy_sd 2.059610
vostr_mean 1.952209
mld_mean 1.619756
uo_mean 1.510746
vo_mean 1.324461
pred_var 0.720020
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 3 sal_mean 726.81
2 14 AGI_60m_ann 10 bathy_mean 459.19
3 13 AGI_0m_ann 2 temp_mean 378.68
4 15 AGI_250m_ann 2 temp_mean 273.55
5 13 AGI_0m_ann 3 sal_mean 218.74
6 10 bathy_mean 2 temp_mean 216.81
7 14 AGI_60m_ann 2 temp_mean 206.70
8 15 AGI_250m_ann 14 AGI_60m_ann 187.66
9 15 AGI_250m_ann 13 AGI_0m_ann 184.90
10 14 AGI_60m_ann 3 sal_mean 136.67
11 6 vo_mean 3 sal_mean 133.79
[1] "External percent deviance explained"
[1] -3.619672
[1] "TPR"
[1] 0.2575821
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7900 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9025154 -0.9080129 0.0144611 0.9867976 -3.619672 0.8103606
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2360291
Correlation 0.9435098
AUC 0.9956000
Per.Expl 82.9739352
cvDeviance 0.5061817
cvCorrelation 0.8352547
cvAUC 0.9599800
cvPer.Expl 63.4863619
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.6411632
lat 8.4674358
AGI_60m_ann 6.0984294
bathy_mean 5.2557204
AGI_250m_ann 4.8594233
temp_mean 4.7777648
chl_mean 3.8187141
sal_mean 3.5230954
AGI_0m_ann 2.2155331
ssh_mean 2.0640587
mld_mean 1.3580563
vostr_mean 1.3000387
uo_mean 1.1310191
uostr_mean 1.0961652
bathy_sd 0.9743069
vo_mean 0.8236064
pred_var 0.5954694
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 16 AGI_60m_ann 11 bathy_mean 511.73
2 11 bathy_mean 3 temp_mean 409.15
3 6 uostr_mean 1 lat 217.27
4 3 temp_mean 1 lat 197.20
5 15 AGI_0m_ann 3 temp_mean 188.34
6 11 bathy_mean 1 lat 169.31
7 2 chl_mean 1 lat 168.59
8 4 sal_mean 1 lat 164.04
9 17 AGI_250m_ann 16 AGI_60m_ann 155.97
10 15 AGI_0m_ann 4 sal_mean 147.21
11 13 dist_coast 10 mld_mean 142.77
12 13 dist_coast 8 vostr_mean 121.93
13 11 bathy_mean 2 chl_mean 113.84
14 15 AGI_0m_ann 11 bathy_mean 110.48
[1] "External percent deviance explained"
[1] -3.873646
[1] "TPR"
[1] 0.2563679
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9105987 -0.9175181 0.0118898 0.9886284 -3.873646 0.8297394
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.1865064
Correlation 0.9596064
AUC 0.9980000
Per.Expl 86.5462763
cvDeviance 0.4570929
cvCorrelation 0.8569081
cvAUC 0.9661000
cvPer.Expl 67.0274031
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.3057785
temp_mean 19.7302461
AGI_250m_seas 10.1427791
AGI_0m 6.7539420
uostr_mean 5.0843977
sal_mean 3.7797233
AGI_0m_seas 3.4398070
AGI_250m_ann 3.0656614
AGI_250m 2.9094675
AGI_60m_seas 2.6130226
chl_mean 2.2934217
AGI_60m_ann 2.2513231
ssh_mean 2.1152245
AGI_60m 1.3693592
bathy_sd 1.3626378
vostr_mean 1.2977303
AGI_0m_ann 1.1555551
uo_mean 1.0380480
vo_mean 0.9418376
mld_mean 0.9201980
pred_var 0.4298396
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 3310.95
2 20 AGI_60m_ann 10 bathy_mean 357.98
3 19 AGI_0m_ann 16 AGI_0m_seas 305.39
4 16 AGI_0m_seas 6 vo_mean 284.87
5 10 bathy_mean 3 sal_mean 213.26
6 18 AGI_250m_seas 2 temp_mean 183.13
7 18 AGI_250m_seas 10 bathy_mean 172.38
8 12 AGI_0m 5 uostr_mean 170.11
9 19 AGI_0m_ann 10 bathy_mean 160.66
10 4 uo_mean 2 temp_mean 149.15
11 20 AGI_60m_ann 16 AGI_0m_seas 137.50
12 16 AGI_0m_seas 13 AGI_60m 136.91
13 12 AGI_0m 10 bathy_mean 134.92
14 10 bathy_mean 2 temp_mean 129.71
15 21 AGI_250m_ann 3 sal_mean 113.98
16 10 bathy_mean 4 uo_mean 111.18
17 19 AGI_0m_ann 11 bathy_sd 109.09
18 12 AGI_0m 3 sal_mean 106.63
19 5 uostr_mean 2 temp_mean 94.19
20 12 AGI_0m 8 ssh_mean 86.97
21 18 AGI_250m_seas 14 AGI_250m 85.14
22 13 AGI_60m 10 bathy_mean 83.26
[1] "External percent deviance explained"
[1] -4.308013
[1] "TPR"
[1] 0.254485
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8350 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9259021 -0.9373609 0.00797838 0.9897775 -4.308013 0.8654628
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.1794387
Correlation 0.9613569
AUC 0.9982000
Per.Expl 87.0561087
cvDeviance 0.4428947
cvCorrelation 0.8609286
cvAUC 0.9681400
cvPer.Expl 68.0515947
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 50.6297312
lat 7.4773131
AGI_0m 5.8649772
AGI_60m_ann 4.3063983
bathy_mean 3.7610000
temp_mean 3.3346837
AGI_0m_seas 3.0194786
AGI_250m_seas 3.0002579
sal_mean 2.3172896
chl_mean 2.2989785
AGI_60m_seas 1.9360137
AGI_250m_ann 1.6666857
AGI_250m 1.5038736
ssh_mean 1.3536386
AGI_0m_ann 1.2578651
AGI_60m 1.1737055
vostr_mean 0.9598372
mld_mean 0.8927208
uo_mean 0.8547717
uostr_mean 0.7902823
vo_mean 0.6354592
bathy_sd 0.6025325
pred_var 0.3625061
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 868.66
2 13 AGI_0m 1 lat 559.83
3 22 AGI_60m_ann 11 bathy_mean 478.77
4 21 AGI_0m_ann 18 AGI_0m_seas 388.44
5 18 AGI_0m_seas 7 vo_mean 227.09
6 13 AGI_0m 11 bathy_mean 216.84
7 14 dist_coast 10 mld_mean 212.25
8 20 AGI_250m_seas 11 bathy_mean 183.69
9 21 AGI_0m_ann 11 bathy_mean 174.45
10 18 AGI_0m_seas 15 AGI_60m 169.88
11 14 dist_coast 8 vostr_mean 150.05
12 6 uostr_mean 1 lat 144.30
13 22 AGI_60m_ann 18 AGI_0m_seas 105.52
14 15 AGI_60m 11 bathy_mean 78.87
15 3 temp_mean 1 lat 68.14
16 14 dist_coast 1 lat 67.04
17 19 AGI_60m_seas 13 AGI_0m 66.47
18 23 AGI_250m_ann 4 sal_mean 62.44
19 11 bathy_mean 2 chl_mean 62.31
20 11 bathy_mean 3 temp_mean 61.34
21 23 AGI_250m_ann 9 ssh_mean 61.03
22 22 AGI_60m_ann 4 sal_mean 57.74
23 13 AGI_0m 6 uostr_mean 57.31
24 11 bathy_mean 9 ssh_mean 56.58
25 23 AGI_250m_ann 22 AGI_60m_ann 45.98
26 13 AGI_0m 9 ssh_mean 43.15
[1] "External percent deviance explained"
[1] -4.421388
[1] "TPR"
[1] 0.2541551
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8150 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.9284584 -0.9393181 0.007349749 0.9900364 -4.421388 0.8705611
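All of the seasonal and annual models above report a test-split C-index of roughly 0.007 to 0.026 and a negative test correlation, even though their training and cross-validated AUCs exceed 0.95. The following minimal sketch shows the pairwise definition for binary outcomes, which for 0/1 data equals the AUC. It assumes the C-index reported by `explore_brt()` follows this standard concordance definition, since the function's internals are not shown. Under that definition, a value near 0 means background points are ranked above presences almost everywhere, so the ranking is nearly reversed. `c_index()` is a hypothetical helper.

```r
# Concordance index (C-index) for binary outcomes: the proportion of
# presence/background pairs in which the presence receives the higher
# prediction, with ties counted as 0.5. For 0/1 data this equals the AUC.
c_index <- function(obs, pred) {
  pres <- pred[obs == 1]
  back <- pred[obs == 0]
  # Compare every presence prediction with every background prediction
  cmp <- outer(pres, back, FUN = "-")
  (sum(cmp > 0) + 0.5 * sum(cmp == 0)) / length(cmp)
}

obs  <- c(1, 1, 1, 0, 0, 0)
pred <- c(0.9, 0.8, 0.7, 0.2, 0.1, 0.3)
c_index(obs, pred)      # 1: presences always ranked above background points
c_index(obs, 1 - pred)  # 0: ranking fully reversed
```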
Summary table of results
output_sum_seas_ann <- read.csv(here("data/brt/mod_outputs/brt_background_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum)
base_0m_Nspat_Ntag          78.734 0.724 0.739 0.870 0.979 0.231 0.888 0.787
base_0m_Nspat_Ytag          92.976 0.876 0.761 0.961 0.994 0.141 0.960 0.930
base_0m_Yspat_Ytag          93.544 0.887 0.770 0.964 0.995 0.125 0.963 0.935
do_0m_Nspat_Ytag            94.201 0.901 0.772 0.971 0.996 0.124 0.969 0.942
do_0m_Yspat_Ytag            95.618 0.920 0.788 0.977 0.997 0.110 0.976 0.956
do_0m_60m_Nspat_Ytag        94.865 0.908 0.775 0.973 0.997 0.119 0.972 0.949
do_0m_250m_Nspat_Ytag       95.069 0.909 0.783 0.974 0.996 0.119 0.972 0.951
do_0m_60m_250m_Nspat_Ytag   95.132 0.913 0.783 0.976 0.997 0.116 0.973 0.951
do_0m_60m_250m_Yspat_Ytag   95.186 0.918 0.784 0.977 0.997 0.113 0.975 0.952
agi_0m_Nspat_Ytag           93.845 0.901 0.765 0.971 0.997 0.124 0.970 0.938
agi_0m_Yspat_Ytag           94.754 0.916 0.776 0.975 0.998 0.114 0.974 0.948
agi_0m_60m_Nspat_Ytag       94.548 0.908 0.765 0.973 0.997 0.119 0.972 0.945
agi_0m_250m_Nspat_Ytag      93.059 0.897 0.767 0.967 0.997 0.129 0.967 0.931
agi_0m_60m_250m_Nspat_Ytag  94.111 0.907 0.767 0.972 0.997 0.122 0.971 0.941
agi_0m_60m_250m_Yspat_Ytag  95.406 0.920 0.777 0.976 0.998 0.111 0.975 0.954
# Keep only the models fit without spatial predictors (no "Yspat" in the name)
output_sum_seas_ann_Nspat <- output_sum_seas_ann %>%
  filter(!grepl("Yspat", model))

# AUC vs. TSS for each model, colored by deviance explained
ggplot(output_sum_seas_ann_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial seasonal/annual models
Seasonal and annual base models performed better than the daily resolution base models, with the annual base model performing better than the seasonal one.
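The DO and AGI models that include all depth layers and temporal resolutions performed best and had nearly identical scores across evaluation metrics. Their cross-validated percent deviance explained was 66.9 to 68.1%, compared with 61.3 to 65.1% for the seasonal-only and annual-only models. Models that also included spatial predictors performed slightly better than those without, but the two sets were broadly comparable. The external test-split metrics for all of these models (TSS = 0, negative external deviance explained, negative test correlation, C-index near 0) appear inconsistent with the cross-validated results. These comparisons therefore rest on the training and cross-validation metrics.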
For the DO model with all temporal resolutions (Nspat), the predictors with the highest relative influence were bathymetry (bathy_mean, 26.8%) and daily surface DO (o2_mean_0m, 21.5%). The next most influential variables, at considerably lower values, were seasonal DO at 250 m (o2_mean_250m_seas, 10.0%) and seasonal surface DO (o2_mean_0m_seas, 8.3%). Partial plots follow trends similar to those described previously.
For the AGI model with all temporal resolutions (Nspat), bathymetry (bathy_mean, 27.3%) and temperature (temp_mean, 19.7%) had the highest relative influence. The next most influential variables, at considerably lower values, were seasonal AGI at 250 m (AGI_250m_seas, 10.1%) and daily surface AGI (AGI_0m, 6.8%).
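The comparison behind these conclusions can be made explicit by collecting the cross-validated metrics reported above into one table and ranking it. In this small sketch, the short model labels are hypothetical shorthand for the .rds file names. The cvPer.Expl values are rounded to two decimals, and cvAUC is copied as printed.

```r
# Cross-validated metrics copied from the explore_brt() outputs above for the
# seasonal/annual DO and AGI models (labels are hypothetical shorthand).
cv_summary <- data.frame(
  model = c("do_seas_Nspat", "do_seas_Yspat", "do_ann_Nspat", "do_ann_Yspat",
            "do_dail_seas_ann_Nspat", "do_dail_seas_ann_Yspat",
            "agi_seas_Nspat", "agi_seas_Yspat", "agi_ann_Nspat", "agi_ann_Yspat",
            "agi_dail_seas_ann_Nspat", "agi_dail_seas_ann_Yspat"),
  cvPer.Expl = c(64.75, 65.11, 62.47, 63.82, 66.92, 67.52,
                 63.39, 65.08, 61.34, 63.49, 67.03, 68.05),
  cvAUC = c(0.96223, 0.96303, 0.95800, 0.96092, 0.96655, 0.96769,
            0.95929, 0.96281, 0.95534, 0.95998, 0.96610, 0.96814),
  stringsAsFactors = FALSE
)

# Order from best to worst by cross-validated percent deviance explained
ranked <- cv_summary[order(cv_summary$cvPer.Expl, decreasing = TRUE), ]
rownames(ranked) <- NULL
head(ranked, 4)
```

The ranking uses the cross-validated metrics rather than the test-split metrics because the test-split TSS is 0 for every one of these models. On these numbers, the four all-resolution models occupy the top four places on both cvPer.Expl and cvAUC.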
Model fine-tuning and selection
Here, I take the two best-performing models from the sections above (AGI and DO with all depths and temporal resolutions, without tag ID or spatial variables as predictors) and use them as overfit reference models. The following model options exclude the wind predictors, which typically ranked among the least influential variables, with relative influence only slightly above that of the random predictor variable (pred_var). I also included a combined model that uses AGI at 250 m and DO at 0 m across temporal resolutions. Lastly, the final models also remove DO/AGI at 60 m and at seasonal resolution, as these were typically the variables with the lowest predictive performance relative to the other depth layers and resolutions.
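Excluding the wind predictors relies on comparing each predictor's relative influence with that of the random pred_var. The following minimal sketch makes that comparison using values copied from the DO model with all depths and temporal resolutions (Nspat, Ntag). It treats uo_mean, vo_mean, uostr_mean, and vostr_mean as the "wind" group. That grouping is an assumption based on which variables are absent from the refined models below. `influence_vs_random()` is a hypothetical helper.

```r
# Relative influence (%) from the DO model with all depths and temporal
# resolutions (Nspat, Ntag), copied from the output above (subset only).
rel_inf <- c(bathy_sd = 1.0131874, mld_mean = 1.0037532, vostr_mean = 1.0311990,
             uostr_mean = 0.9663831, uo_mean = 0.9203555, vo_mean = 0.6778727,
             pred_var = 0.5369536)

# Express each predictor's influence as a multiple of the random predictor's.
# A ratio <= 1 would mean "no more informative than random noise".
influence_vs_random <- function(x, random_var = "pred_var") {
  ratio <- x[names(x) != random_var] / x[[random_var]]
  sort(ratio)
}

wind_vars <- c("uo_mean", "vo_mean", "uostr_mean", "vostr_mean")  # assumed grouping
round(influence_vs_random(rel_inf)[wind_vars], 2)
```

In this model, each of the four variables has between one and two times the influence of pred_var. That result reflects "only slightly above random" rather than "below random".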
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862910
Residual.Deviance 0.3084231
Correlation 0.9196032
AUC 0.9905000
Per.Expl 77.7519251
cvDeviance 0.6001754
cvCorrelation 0.7986359
cvAUC 0.9450200
cvPer.Expl 56.7063877
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 39.840307
temp_mean 26.480383
sal_mean 8.651181
ssh_mean 8.204863
chl_mean 6.841676
bathy_sd 4.361221
mld_mean 3.560571
pred_var 2.059798
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 6 bathy_mean 4 ssh_mean 1488.69
2 6 bathy_mean 2 temp_mean 1321.98
3 6 bathy_mean 3 sal_mean 1048.20
[1] "External percent deviance explained"
[1] 0.731709
[1] "TPR"
[1] 0.740062
[1] "TSS"
[1] 0.8763352
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4750 iterations were performed.
There were 8 predictors of which 8 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2267729 0.8935136 0.9811617 0.9938523 0.731709 0.7775193
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.1909322
Correlation 0.9569687
AUC 0.9975000
Per.Expl 86.2271541
cvDeviance 0.4463350
cvCorrelation 0.8600541
cvAUC 0.9682600
cvPer.Expl 67.8037354
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.2112221
o2_mean_0m 21.2023536
o2_mean_250m_seas 10.1374648
o2_mean_0m_seas 8.4989877
o2_mean_60m_seas 5.2518224
o2_mean_250m_ann 4.0182405
o2_mean_0m_ann 3.4428506
chl_mean 2.9180736
temp_mean 2.7140167
o2_mean_250m 2.6253593
ssh_mean 2.6104859
sal_mean 2.4001687
o2_mean_60m_ann 2.3289369
o2_mean_60m 2.3143331
bathy_sd 1.3897784
mld_mean 1.2730290
pred_var 0.6628767
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 780.03
2 16 o2_mean_60m_ann 7 bathy_mean 417.55
3 13 o2_mean_60m_seas 4 sal_mean 365.89
4 7 bathy_mean 2 chl_mean 256.48
5 4 sal_mean 1 o2_mean_0m 238.13
6 16 o2_mean_60m_ann 13 o2_mean_60m_seas 224.10
7 16 o2_mean_60m_ann 9 o2_mean_60m 214.13
8 12 o2_mean_0m_seas 3 temp_mean 202.33
9 7 bathy_mean 5 ssh_mean 191.77
10 9 o2_mean_60m 7 bathy_mean 190.11
11 2 chl_mean 1 o2_mean_0m 171.76
12 5 ssh_mean 4 sal_mean 164.64
13 16 o2_mean_60m_ann 3 temp_mean 158.76
14 7 bathy_mean 3 temp_mean 143.24
[1] "External percent deviance explained"
[1] 0.8138392
[1] "TPR"
[1] 0.7447363
[1] "TSS"
[1] 0.9235923
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.182767 0.9320937 0.9901405 0.9997562 0.8138392 0.8622715
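The full DO model spreads oxygen information over nine layers. One way to read its relative-influence table in light of H2 is to pool influence by depth. The sketch below sums the `rel.inf` values printed above by the depth tag in each predictor name. The `pool_by_depth()` helper and its regular expression are assumptions based on the naming convention in these outputs, and pooled influence is a descriptive summary, not a formal test.

```r
# Relative influence of the oxygen layers in brt_do_0m_60m_250m_dail_seas_ann_no_wind
rel_inf <- c(o2_mean_0m = 21.2023536, o2_mean_250m_seas = 10.1374648,
             o2_mean_0m_seas = 8.4989877, o2_mean_60m_seas = 5.2518224,
             o2_mean_250m_ann = 4.0182405, o2_mean_0m_ann = 3.4428506,
             o2_mean_250m = 2.6253593, o2_mean_60m_ann = 2.3289369,
             o2_mean_60m = 2.3143331)

# Hypothetical helper: extract the depth tag ("0m", "60m", "250m") from each
# name, then sum the relative influence within each depth
pool_by_depth <- function(rel_inf) {
  depth <- sub("^.*_([0-9]+m)(_.*)?$", "\\1", names(rel_inf))
  sapply(split(rel_inf, depth), sum)
}

by_depth <- pool_by_depth(rel_inf)
print(round(by_depth, 2))
```

The same table can be pooled by temporal resolution by extracting the `_seas`/`_ann` suffix instead of the depth tag.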
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862937
Residual.Deviance 0.1804510
Correlation 0.9611911
AUC 0.9978000
Per.Expl 86.9832054
cvDeviance 0.4441876
cvCorrelation 0.8615975
cvAUC 0.9680800
cvPer.Expl 67.9586240
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.0158951
temp_mean 22.6707636
AGI_250m_seas 9.1916259
AGI_0m 6.6966664
AGI_0m_seas 3.8422741
sal_mean 3.5425059
AGI_250m_ann 3.2096436
ssh_mean 3.1559174
AGI_60m_seas 2.9237078
AGI_60m_ann 2.8613663
chl_mean 2.6961486
AGI_250m 2.6213203
AGI_60m 1.6572538
AGI_0m_ann 1.6292092
bathy_sd 1.6259866
mld_mean 1.0917162
pred_var 0.5679992
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 4282.43
2 15 AGI_0m_ann 6 bathy_mean 498.47
3 16 AGI_60m_ann 6 bathy_mean 428.77
4 15 AGI_0m_ann 12 AGI_0m_seas 302.09
5 6 bathy_mean 3 sal_mean 265.89
6 3 sal_mean 2 temp_mean 254.57
7 12 AGI_0m_seas 9 AGI_60m 224.51
8 14 AGI_250m_seas 2 temp_mean 215.85
9 8 AGI_0m 4 ssh_mean 210.70
10 12 AGI_0m_seas 2 temp_mean 189.86
11 16 AGI_60m_ann 12 AGI_0m_seas 180.44
12 6 bathy_mean 2 temp_mean 138.32
13 9 AGI_60m 6 bathy_mean 130.20
14 17 AGI_250m_ann 3 sal_mean 123.95
[1] "External percent deviance explained"
[1] 0.8308918
[1] "TPR"
[1] 0.7459277
[1] "TSS"
[1] 0.9328179
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1727172 0.9399568 0.9925128 0.9983244 0.8308918 0.8698321
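Because the first models in this section serve as overfit reference models, one simple diagnostic is the gap between training and cross-validated percent deviance explained. The sketch below computes that gap from the `Per.Expl` and `cvPer.Expl` values printed above. Treating a larger gap as a sign of more overfitting is a common heuristic, not a formal criterion.

```r
# Training and cross-validated percent deviance explained, copied from the
# model summaries above
ref <- data.frame(
  model = c("brt_base_0m_dail_no_wind",
            "brt_do_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_agi_0m_60m_250m_dail_seas_ann_no_wind"),
  per_expl = c(77.7519251, 86.2271541, 86.9832054),
  cv_per_expl = c(56.7063877, 67.8037354, 67.9586240)
)

# Gap between in-sample and cross-validated fit (percentage points)
ref$gap <- ref$per_expl - ref$cv_per_expl
print(ref[order(ref$gap, decreasing = TRUE), c("model", "gap")])
```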
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_250_DO_0_dail_seas_ann.rds",
            test_data = readRDS(here("data/brt/mod_eval/back/agi_do_test_daily_seasonal_annual.rds")))
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862937
Residual.Deviance 0.3060410
Correlation 0.9199450
AUC 0.9904000
Per.Expl 77.9237999
cvDeviance 0.5653657
cvCorrelation 0.8125080
cvAUC 0.9506300
cvPer.Expl 59.2174651
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.7670441
temp_mean 23.3647635
AGI_250m_seas 12.4431526
ssh_mean 5.8816844
sal_mean 5.3796841
chl_mean 4.5083168
AGI_250m_ann 4.4835903
AGI_250m 3.0054229
bathy_sd 2.2749334
mld_mean 1.9481681
pred_var 1.1300635
o2_mean_0m_seas 0.9638811
o2_mean_0m 0.9352572
o2_mean_0m_ann 0.9140382
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 AGI_250m_seas 3 sal_mean 566.08
2 6 bathy_mean 3 sal_mean 463.43
3 10 AGI_250m_seas 8 AGI_250m 362.22
4 6 bathy_mean 2 temp_mean 352.20
5 4 ssh_mean 3 sal_mean 297.74
6 3 sal_mean 2 temp_mean 283.95
7 2 temp_mean 1 chl_mean 272.89
8 10 AGI_250m_seas 6 bathy_mean 255.92
9 10 AGI_250m_seas 2 temp_mean 254.76
10 11 AGI_250m_ann 10 AGI_250m_seas 219.10
[1] "External percent deviance explained"
[1] 0.7119036
[1] "TPR"
[1] 0.73811
[1] "TSS"
[1] 0.8507341
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2394772 0.8793008 0.9773403 0.9982518 0.7119036 0.779238
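The random predictor `pred_var` gives a simple screening rule: a predictor whose relative influence falls below it carries no more information than noise. Applied to the combined AGI/DO model above, the sketch below flags all three surface-oxygen layers. `screen_by_random()` is a hypothetical helper written for illustration. It assumes relative-influence values normalized to sum to 100, as in the summaries printed here.

```r
# Hypothetical helper: split predictors into those that beat the random
# predictor and those that do not
screen_by_random <- function(rel_inf, random_var = "pred_var") {
  threshold <- rel_inf[[random_var]]
  others <- rel_inf[names(rel_inf) != random_var]
  list(keep = names(others)[others > threshold],
       drop = names(others)[others <= threshold])
}

# Relative influence from brt_agi_250_DO_0_dail_seas_ann
combo <- c(bathy_mean = 32.7670441, temp_mean = 23.3647635, AGI_250m_seas = 12.4431526,
           ssh_mean = 5.8816844, sal_mean = 5.3796841, chl_mean = 4.5083168,
           AGI_250m_ann = 4.4835903, AGI_250m = 3.0054229, bathy_sd = 2.2749334,
           mld_mean = 1.9481681, pred_var = 1.1300635, o2_mean_0m_seas = 0.9638811,
           o2_mean_0m = 0.9352572, o2_mean_0m_ann = 0.9140382)

stopifnot(abs(sum(combo) - 100) < 1e-4)  # sanity check: influences sum to 100
res <- screen_by_random(combo)
print(res$drop)
```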
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.2038413
Correlation 0.9528997
AUC 0.9971000
Per.Expl 85.2959607
cvDeviance 0.4624112
cvCorrelation 0.8532735
cvAUC 0.9663100
cvPer.Expl 66.6440829
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.1431098
o2_mean_0m 21.2576581
o2_mean_250m_seas 14.6524628
o2_mean_0m_seas 10.3343085
o2_mean_250m_ann 3.7807478
chl_mean 3.5499333
temp_mean 3.4191758
o2_mean_250m 3.3145643
o2_mean_0m_ann 2.9473000
sal_mean 2.8866095
ssh_mean 2.6457934
bathy_sd 1.7336484
mld_mean 1.5223918
pred_var 0.8122965
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 550.04
2 12 o2_mean_250m_seas 4 sal_mean 330.52
3 5 ssh_mean 4 sal_mean 277.30
4 7 bathy_mean 2 chl_mean 228.67
5 13 o2_mean_0m_ann 3 temp_mean 217.07
6 7 bathy_mean 3 temp_mean 200.66
7 11 o2_mean_0m_seas 5 ssh_mean 200.53
8 2 chl_mean 1 o2_mean_0m 197.61
9 7 bathy_mean 5 ssh_mean 180.78
10 14 o2_mean_250m_ann 7 bathy_mean 176.73
[1] "External percent deviance explained"
[1] 0.8042935
[1] "TPR"
[1] 0.7442145
[1] "TSS"
[1] 0.9193297
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1882474 0.9277648 0.9891198 0.9997909 0.8042935 0.8529596
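Each block ends with an evaluation on a 75/25 train/test split. The sketch below shows one way to make such a split in base R while keeping the presence/pseudo-absence ratio the same in both parts. The data frame, its `pa` column, the `split_75_25()` helper, and the seed are all hypothetical, and the original workflow may split the data differently.

```r
set.seed(42)

# Hypothetical data: 1 = presence, 0 = background pseudo-absence
dat <- data.frame(pa = rep(c(1, 0), each = 200), x = rnorm(400))

# Stratified 75/25 split: sample 75% of the rows within each class
split_75_25 <- function(df, class_col = "pa", prop = 0.75) {
  idx_by_class <- split(seq_len(nrow(df)), df[[class_col]])
  train_idx <- unlist(lapply(idx_by_class, function(i) {
    i[sample.int(length(i), round(prop * length(i)))]
  }))
  list(train = df[train_idx, , drop = FALSE],
       test = df[-train_idx, , drop = FALSE])
}

parts <- split_75_25(dat)
print(table(parts$train$pa))  # 150 of each class
print(table(parts$test$pa))   # 50 of each class
```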
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.2049642
Correlation 0.9527920
AUC 0.9969000
Per.Expl 85.2149594
cvDeviance 0.4668691
cvCorrelation 0.8528284
cvAUC 0.9654000
cvPer.Expl 66.3225079
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 29.2528201
bathy_mean 27.2139698
o2_mean_250m_ann 10.3488658
o2_mean_60m_ann 5.1891939
o2_mean_60m 4.3862218
o2_mean_250m 4.1055387
o2_mean_0m_ann 3.5061360
chl_mean 3.3512300
temp_mean 3.2633686
ssh_mean 2.7930406
sal_mean 2.7415436
bathy_sd 1.5875312
mld_mean 1.4668771
pred_var 0.7936627
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 868.81
2 13 o2_mean_60m_ann 9 o2_mean_60m 616.51
3 13 o2_mean_60m_ann 7 bathy_mean 508.10
4 2 chl_mean 1 o2_mean_0m 331.31
5 7 bathy_mean 5 ssh_mean 225.81
6 7 bathy_mean 2 chl_mean 202.56
7 4 sal_mean 1 o2_mean_0m 189.29
8 9 o2_mean_60m 7 bathy_mean 174.66
9 5 ssh_mean 4 sal_mean 147.68
10 5 ssh_mean 1 o2_mean_0m 141.09
[1] "External percent deviance explained"
[1] 0.8031895
[1] "TPR"
[1] 0.7442169
[1] "TSS"
[1] 0.9181552
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8500 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1884739 0.927647 0.9890864 1.001509 0.8031895 0.8521496
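The true skill statistic (TSS) is usually defined as sensitivity + specificity - 1 at a chosen probability threshold, and it is often reported at the threshold that maximizes it. The sketch below implements that standard definition for observed 0/1 labels and predicted probabilities. The threshold choice inside `explore_brt()` is not shown in this document, so these helpers (`tss_at()`, `max_tss()`, both hypothetical) may not reproduce its TPR and TSS values exactly.

```r
# True skill statistic at a single probability threshold
tss_at <- function(obs, pred, threshold) {
  hit <- pred >= threshold
  sens <- sum(hit & obs == 1) / sum(obs == 1)   # true positive rate
  spec <- sum(!hit & obs == 0) / sum(obs == 0)  # true negative rate
  sens + spec - 1
}

# Maximum TSS, using each distinct predicted value as a candidate threshold
max_tss <- function(obs, pred) {
  thresholds <- sort(unique(pred))
  tss <- vapply(thresholds, function(t) tss_at(obs, pred, t), numeric(1))
  list(tss = max(tss), threshold = thresholds[which.max(tss)])
}

obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05)
print(tss_at(obs, pred, 0.5))  # sensitivity 0.75, specificity 0.75, so TSS 0.5
print(max_tss(obs, pred))      # TSS 0.75, first reached at threshold 0.3
```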
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.2292116
Correlation 0.9440405
AUC 0.9955000
Per.Expl 83.4658799
cvDeviance 0.4740094
cvCorrelation 0.8484897
cvAUC 0.9646700
cvPer.Expl 65.8074456
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m_seas 27.9505776
bathy_mean 26.9351820
o2_mean_250m_seas 9.8844047
o2_mean_60m_seas 6.3680630
o2_mean_250m_ann 5.0995438
o2_mean_0m_ann 3.8092643
ssh_mean 3.7296212
chl_mean 3.3491603
o2_mean_60m_ann 3.2545759
temp_mean 3.2487986
sal_mean 2.7231927
mld_mean 1.4744168
bathy_sd 1.4443932
pred_var 0.7288057
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 o2_mean_60m_seas 3 sal_mean 479.91
2 13 o2_mean_60m_ann 2 temp_mean 478.20
3 13 o2_mean_60m_ann 6 bathy_mean 418.57
4 13 o2_mean_60m_ann 10 o2_mean_60m_seas 254.04
5 12 o2_mean_0m_ann 3 sal_mean 220.61
6 9 o2_mean_0m_seas 4 ssh_mean 220.06
7 12 o2_mean_0m_ann 2 temp_mean 194.82
8 6 bathy_mean 4 ssh_mean 187.13
9 11 o2_mean_250m_seas 10 o2_mean_60m_seas 179.16
10 13 o2_mean_60m_ann 12 o2_mean_0m_ann 167.38
[1] "External percent deviance explained"
[1] 0.7919195
[1] "TPR"
[1] 0.7437238
[1] "TSS"
[1] 0.9095314
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1958424 0.9215876 0.9881686 0.9957744 0.7919195 0.8346588
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.2448288
Correlation 0.9388358
AUC 0.9943000
Per.Expl 82.3393335
cvDeviance 0.4840649
cvCorrelation 0.8454748
cvAUC 0.9630700
cvPer.Expl 65.0820953
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 29.2796312
bathy_mean 28.8141343
o2_mean_250m_ann 13.4961708
o2_mean_250m 5.7919607
o2_mean_0m_ann 4.5850995
temp_mean 3.9269822
chl_mean 3.6838678
sal_mean 3.1729942
ssh_mean 2.9109508
bathy_sd 1.9790000
mld_mean 1.4918958
pred_var 0.8673126
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 944.89
2 2 chl_mean 1 o2_mean_0m 382.19
3 11 o2_mean_0m_ann 3 temp_mean 371.49
4 5 ssh_mean 4 sal_mean 265.73
5 9 o2_mean_250m 4 sal_mean 251.19
6 7 bathy_mean 2 chl_mean 230.35
7 7 bathy_mean 5 ssh_mean 220.35
[1] "External percent deviance explained"
[1] 0.778455
[1] "TPR"
[1] 0.7426068
[1] "TSS"
[1] 0.9045105
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2031527 0.9150966 0.9860506 1.001674 0.778455 0.8233933
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_ann_refined.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862943
Residual.Deviance 0.2398020
Correlation 0.9406621
AUC 0.9948000
Per.Expl 82.7019431
cvDeviance 0.4915797
cvCorrelation 0.8422300
cvAUC 0.9620000
cvPer.Expl 64.5400167
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 31.140330
bathy_mean 28.749425
o2_mean_250m_ann 19.411834
temp_mean 4.579720
chl_mean 4.062427
sal_mean 3.623802
ssh_mean 3.214311
bathy_sd 2.324424
mld_mean 1.777783
pred_var 1.115944
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 1044.00
2 2 chl_mean 1 o2_mean_0m 519.00
3 5 ssh_mean 4 sal_mean 363.68
4 7 bathy_mean 3 temp_mean 352.14
5 7 bathy_mean 5 ssh_mean 333.50
[1] "External percent deviance explained"
[1] 0.7801465
[1] "TPR"
[1] 0.7428095
[1] "TSS"
[1] 0.9019239
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8050 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2024326 0.9157098 0.9864209 1.001757 0.7801465 0.8270194
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.2010998
Correlation 0.9542854
AUC 0.9970000
Per.Expl 85.4936623
cvDeviance 0.4550922
cvCorrelation 0.8583417
cvAUC 0.9666300
cvPer.Expl 67.1719242
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.944500
temp_mean 20.937253
AGI_250m_seas 10.045099
AGI_0m 7.256943
ssh_mean 6.121208
AGI_250m_ann 4.606362
sal_mean 4.514486
AGI_0m_seas 4.190922
chl_mean 3.030145
AGI_250m 2.700179
bathy_sd 2.167797
AGI_0m_ann 2.088657
mld_mean 1.396450
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 4136.97
2 6 bathy_mean 3 sal_mean 529.40
3 3 sal_mean 2 temp_mean 284.85
4 13 AGI_250m_ann 3 sal_mean 275.46
5 11 AGI_250m_seas 6 bathy_mean 272.17
6 8 AGI_0m 4 ssh_mean 255.06
7 13 AGI_250m_ann 12 AGI_0m_ann 244.55
8 12 AGI_0m_ann 10 AGI_0m_seas 213.62
[1] "External percent deviance explained"
[1] 0.8167617
[1] "TPR"
[1] 0.7449762
[1] "TSS"
[1] 0.9286281
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1797948 0.9348405 0.9907361 0.9974215 0.8167617 0.8549366
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862937
Residual.Deviance 0.2032064
Correlation 0.9540643
AUC 0.9968000
Per.Expl 85.3417499
cvDeviance 0.4663864
cvCorrelation 0.8534616
cvAUC 0.9653400
cvPer.Expl 66.3573172
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.5304868
temp_mean 23.4915348
AGI_250m_ann 9.6682724
AGI_0m 8.1928263
AGI_250m 4.5223145
AGI_60m_ann 4.1493352
sal_mean 3.8868656
ssh_mean 3.5482652
chl_mean 3.2514216
AGI_60m 2.3822009
AGI_0m_ann 2.2132672
bathy_sd 2.0657467
mld_mean 1.3576929
pred_var 0.7397698
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 6156.07
2 12 AGI_0m_ann 6 bathy_mean 479.59
3 6 bathy_mean 3 sal_mean 425.20
4 13 AGI_60m_ann 6 bathy_mean 339.23
5 8 AGI_0m 6 bathy_mean 242.36
6 3 sal_mean 2 temp_mean 206.38
7 6 bathy_mean 2 temp_mean 198.40
8 9 AGI_60m 6 bathy_mean 162.81
9 8 AGI_0m 4 ssh_mean 153.26
10 14 AGI_250m_ann 6 bathy_mean 147.24
[1] "External percent deviance explained"
[1] 0.8166798
[1] "TPR"
[1] 0.7452172
[1] "TSS"
[1] 0.9252425
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8900 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1810393 0.9338172 0.9912345 0.9977871 0.8166798 0.8534175
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862937
Residual.Deviance 0.2074970
Correlation 0.9534637
AUC 0.9970000
Per.Expl 85.0322475
cvDeviance 0.4828283
cvCorrelation 0.8458918
cvAUC 0.9632300
cvPer.Expl 65.1712825
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.5184692
temp_mean 23.3377420
AGI_250m_seas 10.3276529
AGI_250m_ann 5.5760228
AGI_0m_seas 5.3250063
sal_mean 4.6047463
AGI_60m_seas 4.1155872
chl_mean 3.4446687
AGI_60m_ann 3.1418888
ssh_mean 3.0303836
AGI_0m_ann 2.1425244
bathy_sd 1.9534824
mld_mean 1.6448171
pred_var 0.8370084
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 6 bathy_mean 3 sal_mean 711.68
2 12 AGI_0m_ann 6 bathy_mean 407.71
3 13 AGI_60m_ann 6 bathy_mean 383.91
4 3 sal_mean 2 temp_mean 325.84
5 13 AGI_60m_ann 9 AGI_0m_seas 286.42
6 10 AGI_60m_seas 2 temp_mean 250.03
7 9 AGI_0m_seas 1 chl_mean 247.46
8 14 AGI_250m_ann 11 AGI_250m_seas 235.08
9 11 AGI_250m_seas 2 temp_mean 227.58
10 12 AGI_0m_ann 9 AGI_0m_seas 189.43
[1] "External percent deviance explained"
[1] 0.8063937
[1] "TPR"
[1] 0.7446677
[1] "TSS"
[1] 0.9211093
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9350 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1861977 0.929935 0.9901722 0.9965861 0.8063937 0.8503225
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862937
Residual.Deviance 0.2335348
Correlation 0.9436237
AUC 0.9953000
Per.Expl 83.1540136
cvDeviance 0.4849331
cvCorrelation 0.8459238
cvAUC 0.9627400
cvPer.Expl 65.0194514
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.5059991
temp_mean 23.2340032
AGI_250m_ann 10.6427249
AGI_0m 8.6595033
ssh_mean 5.0994310
sal_mean 4.4591895
AGI_250m 4.4112764
chl_mean 3.6254212
AGI_0m_ann 2.6455869
bathy_sd 2.4434493
mld_mean 1.3756022
pred_var 0.8978129
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 6122.61
2 6 bathy_mean 3 sal_mean 400.07
3 11 AGI_0m_ann 2 temp_mean 290.87
4 11 AGI_0m_ann 6 bathy_mean 235.60
5 12 AGI_250m_ann 11 AGI_0m_ann 218.64
6 8 AGI_0m 4 ssh_mean 217.18
7 6 bathy_mean 2 temp_mean 206.81
[1] "External percent deviance explained"
[1] 0.7947049
[1] "TPR"
[1] 0.7441577
[1] "TSS"
[1] 0.9103937
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8050 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1946114 0.9229508 0.9891775 0.9984766 0.7947049 0.8315401
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_ann_refined.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862937
Residual.Deviance 0.2609379
Correlation 0.9333431
AUC 0.9931000
Per.Expl 81.1773004
cvDeviance 0.5019193
cvCorrelation 0.8391199
cvAUC 0.9603500
cvPer.Expl 63.7941598
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 33.849922
temp_mean 22.651298
AGI_250m_ann 14.888652
AGI_0m 9.378249
ssh_mean 5.130144
sal_mean 5.086498
chl_mean 3.946693
bathy_sd 2.378719
mld_mean 1.594325
pred_var 1.095502
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 6464.78
2 6 bathy_mean 3 sal_mean 374.16
3 10 AGI_250m_ann 2 temp_mean 326.45
4 10 AGI_250m_ann 3 sal_mean 289.62
5 6 bathy_mean 2 temp_mean 284.01
[1] "External percent deviance explained"
[1] 0.7778535
[1] "TPR"
[1] 0.7429219
[1] "TSS"
[1] 0.9018982
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7500 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2038216 0.9148892 0.9868442 0.9964849 0.7778535 0.811773
Summary table of results
output_sum_refined <- read.csv(here("data/brt/mod_outputs/brt_bckg_refined_output_summary.csv"))
kableExtra::kable(output_sum_refined)
Model                                      Per.Expl  Ext.Dev.Expl  TPR    TSS    AUC    RMSE   Cor    PseudoR2
brt_base_0m_dail_no_wind                   77.752    0.732         0.740  0.876  0.981  0.227  0.894  0.778
brt_do_0m_60m_250m_dail_seas_ann_no_wind   86.227    0.814         0.745  0.924  0.990  0.183  0.932  0.862
brt_agi_0m_60m_250m_dail_seas_ann_no_wind  86.983    0.831         0.746  0.933  0.993  0.173  0.939  0.869
brt_agi_250_do_0_dail_seas_ann             77.924    0.712         0.738  0.851  0.997  0.239  0.879  0.779
brt_do_0m_250m_dail_seas_ann               85.296    0.804         0.744  0.919  0.989  0.188  0.928  0.853
brt_do_0m_60m_250m_dail_ann                85.215    0.803         0.744  0.918  0.989  0.188  0.928  0.852
brt_do_0m_60m_250m_seas_ann                83.466    0.792         0.744  0.910  0.988  0.196  0.922  0.845
brt_do_0m_250m_dail_ann                    82.339    0.778         0.743  0.905  0.986  0.203  0.915  0.823
brt_do_0m_250m_dail_ann_refined            82.701    0.780         0.743  0.902  0.986  0.202  0.916  0.827
brt_agi_0m_250m_dail_seas_ann              85.494    0.817         0.745  0.929  0.991  0.180  0.935  0.855
brt_agi_0m_60m_250m_dail_ann               85.342    0.817         0.745  0.925  0.991  0.181  0.934  0.853
brt_agi_0m_60m_250m_seas_ann               85.032    0.806         0.745  0.921  0.990  0.186  0.930  0.850
brt_agi_0m_250m_dail_ann                   83.154    0.795         0.744  0.910  0.989  0.195  0.923  0.831
brt_agi_0m_250m_dail_ann_refined           81.177    0.778         0.743  0.902  0.987  0.204  0.915  0.812
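To read the summary table against H1, it can help to rank the models on the external metrics. The sketch below rebuilds the model, TSS, and external-deviance columns of the table as a data frame and orders the models by TSS, breaking ties by external deviance explained. The column names here (`tss`, `ext_dev`) are chosen for illustration and may differ from those in `brt_bckg_refined_output_summary.csv`.

```r
# Model names with external TSS and external deviance explained, from the summary table
summ <- data.frame(
  model = c("brt_base_0m_dail_no_wind", "brt_do_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_agi_0m_60m_250m_dail_seas_ann_no_wind", "brt_agi_250_do_0_dail_seas_ann",
            "brt_do_0m_250m_dail_seas_ann", "brt_do_0m_60m_250m_dail_ann",
            "brt_do_0m_60m_250m_seas_ann", "brt_do_0m_250m_dail_ann",
            "brt_do_0m_250m_dail_ann_refined", "brt_agi_0m_250m_dail_seas_ann",
            "brt_agi_0m_60m_250m_dail_ann", "brt_agi_0m_60m_250m_seas_ann",
            "brt_agi_0m_250m_dail_ann", "brt_agi_0m_250m_dail_ann_refined"),
  tss = c(0.876, 0.924, 0.933, 0.851, 0.919, 0.918, 0.910, 0.905, 0.902,
          0.929, 0.925, 0.921, 0.910, 0.902),
  ext_dev = c(0.732, 0.814, 0.831, 0.712, 0.804, 0.803, 0.792, 0.778, 0.780,
              0.817, 0.817, 0.806, 0.795, 0.778)
)

# Highest TSS first; ties broken by external deviance explained
ranked <- summ[order(-summ$tss, -summ$ext_dev), ]
ranked$rank <- seq_len(nrow(ranked))
print(ranked, row.names = FALSE)
```

With these values, the full AGI model ranks first on TSS and the combined AGI 250 m / DO 0 m model ranks last.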
ggplot(output_sum_refined, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from refined models