In this document, I’ve included the results from the initial model exploration: model outputs, rankings of covariate influence, performance metrics, and prediction maps.
Most of the predictors included in the following models are at a daily temporal resolution. However, for the dissolved oxygen (DO) and aerobic growth index (AGI) models, we also investigated including these two predictors at seasonal and annual temporal resolutions. The remaining environmental predictors are also available at these resolutions and can be included in follow-up models.
The pseudo-absences used in these models were generated using correlated random walk approaches; a separate Quarto document includes models with background-sampling pseudo-absences. Hyperparameters were tuned using the caret package; across all models, a learning rate of 0.05 and a tree complexity of 3 resulted in the highest accuracy. Finally, the ‘pred_var’ predictor is a set of random numbers that serves as a benchmark for identifying which predictor variables are informative enough to include in the final model and which are not.
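One way to use the random ‘pred_var’ benchmark is to flag any predictor whose relative influence does not exceed that of ‘pred_var’. The sketch below is illustrative only: it uses the relative-influence values reported for the first base model (no spatial predictors, no tag ID) in this document, and the object names (`rel_inf`, `benchmark`, `informative`, `uninformative`) are hypothetical rather than taken from the project code.

```r
# Relative influence (%) reported for the first base model
rel_inf <- c(
  bathy_mean = 27.367481, temp_mean = 18.477169, sal_mean = 10.540214,
  chl_mean = 8.710175, ssh_mean = 6.015610, mld_mean = 5.861958,
  vostr_mean = 5.303627, bathy_sd = 5.270798, vo_mean = 3.496298,
  uo_mean = 3.392592, uostr_mean = 2.849782, pred_var = 2.714296
)

# Influence of the random benchmark predictor
benchmark <- rel_inf[["pred_var"]]

# Predictors that outperform random noise (candidates to keep)
informative <- names(rel_inf)[rel_inf > benchmark]

# Predictors that do no better than random noise (candidates to drop)
uninformative <- setdiff(names(rel_inf), c(informative, "pred_var"))

informative
uninformative
```

A strict "greater than the benchmark" rule is only one possible cutoff; a margin above the benchmark, or repeated fits with different random draws, may be more robust.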
The hypotheses I would like to test with these models are as follows:
H1: The AGI model will perform better than the dissolved oxygen and null models, and the dissolved oxygen model will perform better than the null model.
study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?
H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.
study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?
H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.
study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?
H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.
study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?
H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.
study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?
Base models
These three models represent different options for the base model: one with neither spatial predictors nor a tag ID predictor, one with a tag ID predictor only, and one with both. The models were developed by splitting the data set into 75% training and 25% test data, so that is the model evaluation approach used here. Once a model is selected, I can run additional evaluation approaches (e.g., leave-one-out or k-fold cross-validation). I can also complete these now, depending on when that is typically performed.
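For reference, a 75/25 split can be sketched as follows. This is a minimal example on a hypothetical stand-in data frame (`dat`, with made-up `PA` and `temp_mean` columns) using a simple random split; the split used for the models in this document may have been constructed differently (e.g., stratified by presence/pseudo-absence or by tag).

```r
set.seed(42)  # arbitrary seed so the split is reproducible

# Hypothetical stand-in for the presence/pseudo-absence data set
dat <- data.frame(
  PA = rep(c(0, 1), each = 100),             # 0 = pseudo-absence, 1 = presence
  temp_mean = rnorm(200, mean = 15, sd = 2)  # simulated values, not study data
)

# Randomly assign 75% of rows to training; the remaining 25% form the test set
train_idx <- sample(seq_len(nrow(dat)), size = floor(0.75 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

nrow(train)
nrow(test)
```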
explore_brt(mod_file_path = brt_outputs[7],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.7986447
Correlation 0.7174286
AUC 0.9148000
Per.Expl 42.3894599
cvDeviance 1.0127285
cvCorrelation 0.5642493
cvAUC 0.8220800
cvPer.Expl 26.9464423
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.367481
temp_mean 18.477169
sal_mean 10.540214
chl_mean 8.710175
ssh_mean 6.015610
mld_mean 5.861958
vostr_mean 5.303627
bathy_sd 5.270798
vo_mean 3.496298
uo_mean 3.392592
uostr_mean 2.849782
pred_var 2.714296
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 363.44
2 6 vo_mean 3 sal_mean 193.16
3 12 pred_var 4 uo_mean 168.67
4 8 ssh_mean 2 temp_mean 162.10
5 2 temp_mean 1 chl_mean 129.68
6 10 bathy_mean 8 ssh_mean 110.34
7 8 ssh_mean 1 chl_mean 98.98
[1] "External percent deviance explained"
[1] 0.3850404
[1] "TPR"
[1] 0.6952086
[1] "TSS"
[1] 0.6133057
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4150 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3706824 0.6788681 0.8917619 1.003817 0.3850404 0.4238946
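As a check on how to read these metrics, the reported percent deviance explained is consistent with (total deviance - residual deviance) / total deviance x 100, and the cross-validated version with the same formula using the cross-validated deviance. The short sketch below reproduces both from the values printed above; the variable names are my own, and the formula is inferred from the numbers rather than taken from the `explore_brt` source.

```r
total_dev    <- 1.3862823  # Total.Deviance
residual_dev <- 0.7986447  # Residual.Deviance
cv_dev       <- 1.0127285  # cvDeviance

# Percent deviance explained on the training data
per_expl <- (total_dev - residual_dev) / total_dev * 100

# Cross-validated percent deviance explained
cv_per_expl <- (total_dev - cv_dev) / total_dev * 100

round(per_expl, 3)
round(cv_per_expl, 3)
```

The gap between the two values indicates how much of the training fit carries over to held-out folds.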
explore_brt(mod_file_path = brt_outputs[8],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.3815785
Correlation 0.8963614
AUC 0.9887000
Per.Expl 72.4746916
cvDeviance 0.6107316
cvCorrelation 0.7793802
cvAUC 0.9410600
cvPer.Expl 55.9446474
[1] "Relative influence of predictor variables"
rel.inf
tag 50.031513
bathy_mean 16.608199
temp_mean 8.783888
sal_mean 5.805500
chl_mean 3.982330
ssh_mean 3.974665
vostr_mean 2.251400
mld_mean 2.045141
bathy_sd 1.668791
vo_mean 1.313441
uostr_mean 1.275039
uo_mean 1.217342
pred_var 1.042751
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 1 tag 1234.81
2 3 temp_mean 1 tag 1203.85
3 4 sal_mean 1 tag 1164.68
4 9 ssh_mean 1 tag 423.01
5 2 chl_mean 1 tag 377.50
6 12 bathy_sd 1 tag 206.76
7 13 pred_var 1 tag 181.75
8 11 bathy_mean 3 temp_mean 178.96
[1] "External percent deviance explained"
[1] 0.6770616
[1] "TPR"
[1] 0.7374214
[1] "TSS"
[1] 0.8360522
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2551359 0.864596 0.9762279 0.9982626 0.6770616 0.7247469
explore_brt(mod_file_path = brt_outputs[9],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.3356425
Correlation 0.9105272
AUC 0.9916000
Per.Expl 75.7883011
cvDeviance 0.5396134
cvCorrelation 0.8105200
cvAUC 0.9550200
cvPer.Expl 61.0747802
[1] "Relative influence of predictor variables"
rel.inf
tag 47.5168117
dist_coast 18.2926688
lat 7.4869016
bathy_mean 5.5577763
temp_mean 4.8362634
sal_mean 4.3607456
chl_mean 3.0609643
ssh_mean 1.8297866
vostr_mean 1.6011683
mld_mean 1.4125607
pred_var 0.9512854
bathy_sd 0.9198428
vo_mean 0.7636076
uo_mean 0.7143868
uostr_mean 0.6952301
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 944.49
2 12 bathy_mean 1 tag 585.86
3 4 temp_mean 1 tag 510.39
4 5 sal_mean 1 tag 378.67
5 14 dist_coast 1 tag 349.40
6 3 chl_mean 1 tag 287.67
7 10 ssh_mean 1 tag 178.03
8 11 mld_mean 1 tag 147.57
9 15 pred_var 1 tag 129.07
10 13 bathy_sd 1 tag 105.68
11 8 vo_mean 1 tag 86.99
[1] "External percent deviance explained"
[1] 0.7120117
[1] "TPR"
[1] 0.7398529
[1] "TSS"
[1] 0.8502702
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2400452 0.8807886 0.9809981 0.9991846 0.7120117 0.757883
DO models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables at longer time scales (seasonal/annual).
0m, no spatial, yes tag
0m, yes spatial, yes tag
0m & 60m, no spatial, yes tag
0m & 250m, no spatial, yes tag
0m, 60m, & 250m, no spatial, yes tag
0m, 60m, & 250m, yes spatial, yes tag
explore_brt(mod_file_path = brt_outputs[14],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3442681
Correlation 0.9078579
AUC 0.9908000
Per.Expl 75.1662880
cvDeviance 0.5595631
cvCorrelation 0.8035769
cvAUC 0.9512600
cvPer.Expl 59.6360292
[1] "Relative influence of predictor variables"
rel.inf
tag 46.1232094
bathy_mean 16.5549379
o2_mean_0m 14.2363652
temp_mean 4.3829271
sal_mean 4.0579305
chl_mean 3.6413222
ssh_mean 2.7853999
mld_mean 1.8441505
bathy_sd 1.3707436
vostr_mean 1.3122101
vo_mean 1.0065403
pred_var 0.9193902
uostr_mean 0.9162099
uo_mean 0.8486632
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 1167.29
2 2 o2_mean_0m 1 tag 789.99
3 4 temp_mean 1 tag 769.65
4 5 sal_mean 1 tag 747.93
5 10 ssh_mean 1 tag 246.55
6 3 chl_mean 1 tag 244.15
7 4 temp_mean 2 o2_mean_0m 194.97
8 13 bathy_sd 1 tag 194.44
9 8 vo_mean 1 tag 143.85
10 14 pred_var 1 tag 108.26
[1] "External percent deviance explained"
[1] 0.7150451
[1] "TPR"
[1] 0.7406633
[1] "TSS"
[1] 0.852678
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7650 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2388119 0.8826053 0.9824973 0.9967742 0.7150451 0.7516629
explore_brt(mod_file_path = brt_outputs[15],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3077135
Correlation 0.9191926
AUC 0.9929000
Per.Expl 77.8031473
cvDeviance 0.5030906
cvCorrelation 0.8262495
cvAUC 0.9613100
cvPer.Expl 63.7096644
[1] "Relative influence of predictor variables"
rel.inf
tag 45.2009449
dist_coast 18.4105318
o2_mean_0m 10.3754042
lat 6.4377993
bathy_mean 5.1604232
sal_mean 2.8517201
temp_mean 2.3917958
chl_mean 2.0825112
ssh_mean 1.4059547
mld_mean 1.2399969
vostr_mean 1.0909863
bathy_sd 0.8612787
pred_var 0.8202809
vo_mean 0.6310177
uostr_mean 0.6049444
uo_mean 0.4344099
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 935.71
2 3 o2_mean_0m 1 tag 599.92
3 13 bathy_mean 1 tag 570.45
4 15 dist_coast 1 tag 297.19
5 5 temp_mean 1 tag 273.14
6 6 sal_mean 1 tag 259.17
7 4 chl_mean 1 tag 148.96
8 11 ssh_mean 1 tag 138.61
9 5 temp_mean 3 o2_mean_0m 137.74
10 14 bathy_sd 1 tag 122.14
11 9 vo_mean 1 tag 106.95
12 12 mld_mean 1 tag 97.74
13 16 pred_var 1 tag 91.82
[1] "External percent deviance explained"
[1] 0.7416577
[1] "TPR"
[1] 0.7422192
[1] "TSS"
[1] 0.8637654
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7350 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.226637 0.8945919 0.9855039 0.9997086 0.7416577 0.7780315
explore_brt(mod_file_path = brt_outputs[13],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3336455
Correlation 0.9113159
AUC 0.9914000
Per.Expl 75.9325541
cvDeviance 0.5484820
cvCorrelation 0.8074909
cvAUC 0.9531600
cvPer.Expl 60.4353619
[1] "Relative influence of predictor variables"
rel.inf
tag 45.8318388
bathy_mean 15.6731379
o2_mean_0m 13.4749675
temp_mean 4.0324904
o2_mean_60m 4.0124893
sal_mean 3.7440800
chl_mean 3.4171857
ssh_mean 2.5269272
mld_mean 1.6460394
bathy_sd 1.2069586
vostr_mean 1.1031311
vo_mean 0.8942394
pred_var 0.8418653
uo_mean 0.8246185
uostr_mean 0.7700309
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 bathy_mean 1 tag 863.04
2 2 o2_mean_0m 1 tag 752.48
3 4 temp_mean 1 tag 727.72
4 5 sal_mean 1 tag 659.23
5 14 o2_mean_60m 1 tag 352.84
6 10 ssh_mean 1 tag 208.39
7 4 temp_mean 2 o2_mean_0m 204.79
8 3 chl_mean 1 tag 199.75
9 13 bathy_sd 1 tag 155.00
10 8 vo_mean 1 tag 132.19
11 15 pred_var 1 tag 104.14
[1] "External percent deviance explained"
[1] 0.7206636
[1] "TPR"
[1] 0.7408662
[1] "TSS"
[1] 0.8563363
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2363376 0.8849535 0.9829041 0.9978414 0.7206636 0.7593255
explore_brt(mod_file_path = brt_outputs[10],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3421213
Correlation 0.9070514
AUC 0.9904000
Per.Expl 75.3211530
cvDeviance 0.5490893
cvCorrelation 0.8069965
cvAUC 0.9530700
cvPer.Expl 60.3915546
[1] "Relative influence of predictor variables"
rel.inf
tag 46.4292453
o2_mean_0m 15.1805295
o2_mean_250m 13.2507504
bathy_mean 7.9891172
sal_mean 3.2341166
temp_mean 2.9293936
ssh_mean 2.2599756
chl_mean 2.1392953
mld_mean 1.2551674
bathy_sd 1.1236108
uostr_mean 0.8767264
pred_var 0.8541930
vostr_mean 0.8531117
vo_mean 0.8470162
uo_mean 0.7777511
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 o2_mean_0m 1 tag 801.68
2 12 bathy_mean 1 tag 732.17
3 5 sal_mean 1 tag 671.55
4 4 temp_mean 1 tag 593.41
5 14 o2_mean_250m 1 tag 331.86
6 3 chl_mean 1 tag 224.17
7 10 ssh_mean 1 tag 172.47
8 4 temp_mean 2 o2_mean_0m 125.51
9 14 o2_mean_250m 2 o2_mean_0m 114.20
10 8 vo_mean 1 tag 114.05
11 13 bathy_sd 1 tag 110.04
[1] "External percent deviance explained"
[1] 0.7167128
[1] "TPR"
[1] 0.740492
[1] "TSS"
[1] 0.8476697
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.238839 0.8821954 0.9821616 0.9976817 0.7167128 0.7532115
explore_brt(mod_file_path = brt_outputs[11],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3305617
Correlation 0.9113032
AUC 0.9913000
Per.Expl 76.1550015
cvDeviance 0.5396544
cvCorrelation 0.8109503
cvAUC 0.9547000
cvPer.Expl 61.0721392
[1] "Relative influence of predictor variables"
rel.inf
tag 45.8298025
o2_mean_0m 14.8053415
o2_mean_250m 12.7273136
bathy_mean 7.4986643
o2_mean_60m 3.1614481
sal_mean 2.8046144
temp_mean 2.7978182
chl_mean 2.0527019
ssh_mean 1.9440515
mld_mean 1.3218104
bathy_sd 1.1178508
uostr_mean 0.9653779
pred_var 0.8301538
vo_mean 0.7652047
vostr_mean 0.7020946
uo_mean 0.6757517
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 1 tag 682.07
2 2 o2_mean_0m 1 tag 669.63
3 12 bathy_mean 1 tag 604.89
4 5 sal_mean 1 tag 436.04
5 15 o2_mean_250m 1 tag 284.97
6 3 chl_mean 1 tag 230.31
7 14 o2_mean_60m 1 tag 197.79
8 10 ssh_mean 1 tag 168.44
9 13 bathy_sd 1 tag 135.70
10 8 vo_mean 1 tag 119.67
11 4 temp_mean 2 o2_mean_0m 114.73
12 16 pred_var 1 tag 88.20
13 11 mld_mean 1 tag 74.37
[1] "External percent deviance explained"
[1] 0.7222488
[1] "TPR"
[1] 0.7408139
[1] "TSS"
[1] 0.8538549
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.236182 0.8848858 0.9827898 0.9981021 0.7222488 0.76155
explore_brt(mod_file_path = brt_outputs[12],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.3097863
Correlation 0.9179801
AUC 0.9925000
Per.Expl 77.6536275
cvDeviance 0.5032433
cvCorrelation 0.8262230
cvAUC 0.9611900
cvPer.Expl 63.6986447
[1] "Relative influence of predictor variables"
rel.inf
tag 44.6832785
dist_coast 14.9083920
o2_mean_0m 11.3088495
o2_mean_250m 7.7993714
lat 4.4617062
bathy_mean 2.7810933
o2_mean_60m 2.4314643
sal_mean 2.1541513
temp_mean 2.0608261
chl_mean 1.5384187
mld_mean 1.1011340
ssh_mean 1.0869767
pred_var 0.7795164
bathy_sd 0.7163208
vostr_mean 0.5989924
uostr_mean 0.5474190
vo_mean 0.5378007
uo_mean 0.5042887
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 619.06
2 2 lat 1 tag 558.22
3 13 bathy_mean 1 tag 365.25
4 5 temp_mean 1 tag 298.79
5 6 sal_mean 1 tag 283.50
6 15 dist_coast 1 tag 209.23
7 17 o2_mean_250m 1 tag 162.76
8 4 chl_mean 1 tag 129.59
9 16 o2_mean_60m 1 tag 123.37
10 11 ssh_mean 1 tag 94.28
11 14 bathy_sd 1 tag 88.17
12 12 mld_mean 1 tag 81.84
13 9 vo_mean 1 tag 65.29
14 18 pred_var 1 tag 65.25
15 5 temp_mean 3 o2_mean_0m 59.87
16 10 vostr_mean 1 tag 43.83
[1] "External percent deviance explained"
[1] 0.7414953
[1] "TPR"
[1] 0.7422514
[1] "TSS"
[1] 0.8677531
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6800 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2264786 0.8948487 0.9856034 1.00013 0.7414953 0.7765363
AGI models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables at longer time scales (seasonal/annual).
0m, no spatial, yes tag
0m, yes spatial, yes tag
0m & 60m, no spatial, yes tag
0m & 250m, no spatial, yes tag
0m, 60m, & 250m, no spatial, yes tag
0m, 60m, & 250m, yes spatial, yes tag
explore_brt(mod_file_path = brt_outputs[5],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3488727
Correlation 0.9075879
AUC 0.9911000
Per.Expl 74.8339450
cvDeviance 0.5715675
cvCorrelation 0.7967747
cvAUC 0.9489800
cvPer.Expl 58.7697755
[1] "Relative influence of predictor variables"
rel.inf
tag 45.6682568
bathy_mean 15.8051134
temp_mean 9.0349774
AGI_0m 6.3891728
ssh_mean 5.4588690
sal_mean 4.9996184
chl_mean 3.1875195
vostr_mean 1.8324591
mld_mean 1.6778310
uostr_mean 1.3701035
bathy_sd 1.3575468
vo_mean 1.2788779
uo_mean 1.0123857
pred_var 0.9272687
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1385.24
2 4 sal_mean 1 tag 1008.93
3 3 temp_mean 1 tag 980.93
4 11 bathy_mean 1 tag 980.72
5 13 AGI_0m 1 tag 359.38
6 9 ssh_mean 1 tag 297.36
7 2 chl_mean 1 tag 241.30
8 13 AGI_0m 9 ssh_mean 214.62
9 12 bathy_sd 1 tag 205.71
10 7 vo_mean 1 tag 161.67
[1] "External percent deviance explained"
[1] 0.7058188
[1] "TPR"
[1] 0.7398468
[1] "TSS"
[1] 0.8467854
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2418833 0.8794963 0.9810396 1.000017 0.7058188 0.7483395
explore_brt(mod_file_path = brt_outputs[6],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3197760
Correlation 0.9158473
AUC 0.9925000
Per.Expl 76.9328462
cvDeviance 0.5137785
cvCorrelation 0.8210419
cvAUC 0.9596800
cvPer.Expl 62.9384050
[1] "Relative influence of predictor variables"
rel.inf
tag 43.8312768
dist_coast 18.8936048
lat 7.3197274
bathy_mean 5.3601903
AGI_0m 5.2578297
temp_mean 4.8902634
sal_mean 3.8487744
chl_mean 2.5368902
ssh_mean 2.1700563
vostr_mean 1.1761165
mld_mean 1.1622948
pred_var 0.7803509
bathy_sd 0.7748458
uostr_mean 0.7138587
vo_mean 0.6796771
uo_mean 0.6042428
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 951.49
2 14 AGI_0m 4 temp_mean 683.10
3 12 bathy_mean 1 tag 473.92
4 4 temp_mean 1 tag 424.60
5 5 sal_mean 1 tag 293.15
6 14 AGI_0m 1 tag 282.92
7 15 dist_coast 1 tag 263.55
8 3 chl_mean 1 tag 175.09
9 10 ssh_mean 1 tag 142.20
10 13 bathy_sd 1 tag 107.14
11 8 vo_mean 1 tag 92.97
12 16 pred_var 1 tag 87.84
13 11 mld_mean 1 tag 75.98
[1] "External percent deviance explained"
[1] 0.7283939
[1] "TPR"
[1] 0.7411294
[1] "TSS"
[1] 0.8566402
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.232085 0.889198 0.9835227 1.000057 0.7283939 0.7693285
explore_brt(mod_file_path = brt_outputs[4],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3375275
Correlation 0.9115731
AUC 0.9920000
Per.Expl 75.6523377
cvDeviance 0.5596743
cvCorrelation 0.8023107
cvAUC 0.9513600
cvPer.Expl 59.6277012
[1] "Relative influence of predictor variables"
rel.inf
tag 45.3698332
bathy_mean 14.8376454
temp_mean 8.7196282
AGI_0m 6.4385954
sal_mean 4.6470954
ssh_mean 4.6224771
AGI_60m 3.4371631
chl_mean 3.0979502
vostr_mean 1.7466955
mld_mean 1.5192247
uostr_mean 1.2840192
vo_mean 1.2375403
bathy_sd 1.2360217
uo_mean 0.9357369
pred_var 0.8703735
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1150.88
2 3 temp_mean 1 tag 1069.42
3 4 sal_mean 1 tag 977.35
4 11 bathy_mean 1 tag 872.62
5 13 AGI_0m 1 tag 303.61
6 14 AGI_60m 1 tag 276.48
7 9 ssh_mean 1 tag 249.01
8 12 bathy_sd 1 tag 204.60
9 2 chl_mean 1 tag 187.86
10 7 vo_mean 1 tag 144.86
11 15 pred_var 1 tag 124.26
[1] "External percent deviance explained"
[1] 0.7139032
[1] "TPR"
[1] 0.74046
[1] "TSS"
[1] 0.8501959
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2381082 0.8834379 0.9822011 1.00085 0.7139032 0.7565234
explore_brt(mod_file_path = brt_outputs[1],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3407856
Correlation 0.9092219
AUC 0.9912000
Per.Expl 75.4173115
cvDeviance 0.5592736
cvCorrelation 0.8020908
cvAUC 0.9511400
cvPer.Expl 59.6566018
[1] "Relative influence of predictor variables"
rel.inf
tag 46.4790375
AGI_250m 12.2874054
temp_mean 8.9924380
bathy_mean 8.4880695
AGI_0m 5.6191772
sal_mean 3.8744780
ssh_mean 3.7625577
chl_mean 2.5548718
vostr_mean 1.3289345
mld_mean 1.3003549
uostr_mean 1.2354471
vo_mean 1.1455409
bathy_sd 1.1337923
pred_var 0.8997732
uo_mean 0.8981217
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1018.34
2 3 temp_mean 1 tag 834.21
3 13 AGI_0m 3 temp_mean 793.98
4 11 bathy_mean 1 tag 654.52
5 14 AGI_250m 1 tag 308.87
6 13 AGI_0m 1 tag 302.48
7 9 ssh_mean 1 tag 278.94
8 2 chl_mean 1 tag 198.59
9 12 bathy_sd 1 tag 164.86
10 7 vo_mean 1 tag 131.62
11 15 pred_var 1 tag 126.78
[1] "External percent deviance explained"
[1] 0.7098012
[1] "TPR"
[1] 0.7398088
[1] "TSS"
[1] 0.8471372
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7800 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2411697 0.8796409 0.9809232 1.001943 0.7098012 0.7541731
explore_brt(mod_file_path = brt_outputs[2],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3446793
Correlation 0.9077153
AUC 0.9909000
Per.Expl 75.1364360
cvDeviance 0.5538145
cvCorrelation 0.8046978
cvAUC 0.9522800
cvPer.Expl 60.0503988
[1] "Relative influence of predictor variables"
rel.inf
tag 45.7582300
AGI_250m 12.1128304
temp_mean 9.0027738
bathy_mean 8.2161329
AGI_0m 5.5044249
sal_mean 3.6823711
ssh_mean 3.0624512
AGI_60m 2.7807343
chl_mean 2.4205258
vostr_mean 1.2805676
uostr_mean 1.2661180
mld_mean 1.2345471
vo_mean 1.0742638
bathy_sd 1.0404232
uo_mean 0.8104982
pred_var 0.7531076
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 930.68
2 3 temp_mean 1 tag 920.62
3 13 AGI_0m 3 temp_mean 685.43
4 11 bathy_mean 1 tag 579.12
5 13 AGI_0m 1 tag 253.04
6 14 AGI_60m 1 tag 237.86
7 15 AGI_250m 1 tag 233.39
8 9 ssh_mean 1 tag 210.53
9 12 bathy_sd 1 tag 175.95
10 2 chl_mean 1 tag 152.48
11 7 vo_mean 1 tag 127.49
12 16 pred_var 1 tag 95.31
13 13 AGI_0m 9 ssh_mean 86.29
[1] "External percent deviance explained"
[1] 0.7085886
[1] "TPR"
[1] 0.7397452
[1] "TSS"
[1] 0.8460373
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7200 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2417386 0.8791051 0.9808666 1.002203 0.7085886 0.7513644
explore_brt(mod_file_path = brt_outputs[3],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.3080448
Correlation 0.9195783
AUC 0.9934000
Per.Expl 77.7790835
cvDeviance 0.5084655
cvCorrelation 0.8226110
cvAUC 0.9603100
cvPer.Expl 63.3216647
[1] "Relative influence of predictor variables"
rel.inf
tag 44.6291838
dist_coast 15.2851304
lat 6.9231034
AGI_250m 6.7387247
AGI_0m 4.8196657
temp_mean 4.5949137
bathy_mean 3.2716900
sal_mean 2.6278408
AGI_60m 2.1064618
chl_mean 2.0757314
ssh_mean 1.7362553
mld_mean 1.0240873
pred_var 0.7992513
bathy_sd 0.7490174
vostr_mean 0.7444577
vo_mean 0.6774726
uostr_mean 0.6498551
uo_mean 0.5471578
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 838.15
2 4 temp_mean 1 tag 365.46
3 12 bathy_mean 1 tag 330.46
4 5 sal_mean 1 tag 314.53
5 14 AGI_0m 4 temp_mean 311.04
6 14 AGI_0m 1 tag 262.60
7 15 dist_coast 1 tag 206.65
8 17 AGI_250m 1 tag 178.43
9 3 chl_mean 1 tag 144.07
10 16 AGI_60m 1 tag 123.29
11 13 bathy_sd 1 tag 120.30
12 10 ssh_mean 1 tag 103.70
13 8 vo_mean 1 tag 82.19
14 18 pred_var 1 tag 82.03
15 11 mld_mean 1 tag 72.98
16 9 vostr_mean 1 tag 54.22
[1] "External percent deviance explained"
[1] 0.7347234
[1] "TPR"
[1] 0.7415656
[1] "TSS"
[1] 0.8603764
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2295486 0.8915997 0.9842994 1.001493 0.7347234 0.7777908
Summary table of results
output_sum <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary.csv"))
kableExtra::kable(output_sum)
model                       Per.Expl  DevianceExplained  TPR    TSS    C-index  RMSE   Cor    PseudoR2
base_0m_Nspat_Ntag          42.389    0.385              0.695  0.613  0.892    0.371  0.679  0.892
base_0m_Nspat_Ytag          72.475    0.677              0.737  0.836  0.976    0.255  0.865  0.725
base_0m_Yspat_Ytag          75.788    0.712              0.740  0.850  0.981    0.240  0.881  0.758
do_0m_Nspat_Ytag            75.166    0.715              0.741  0.853  0.982    0.239  0.883  0.752
do_0m_Yspat_Ytag            77.803    0.742              0.742  0.864  0.986    0.227  0.895  0.778
do_0m_60m_Nspat_Ytag        75.933    0.721              0.741  0.856  0.983    0.236  0.885  0.759
do_0m_250m_Nspat_Ytag       75.321    0.717              0.740  0.848  0.982    0.239  0.882  0.753
do_0m_60m_250m_Nspat_Ytag   76.155    0.722              0.741  0.854  0.983    0.236  0.885  0.762
do_0m_60m_250m_Yspat_Ytag   77.654    0.741              0.742  0.868  0.986    0.226  0.895  0.777
agi_0m_Nspat_Ytag           74.834    0.706              0.740  0.847  0.981    0.242  0.879  0.748
agi_0m_Yspat_Ytag           76.933    0.728              0.741  0.857  0.984    0.232  0.889  0.769
agi_0m_60m_Nspat_Ytag       75.672    0.714              0.740  0.850  0.982    0.238  0.883  0.757
agi_0m_250m_Nspat_Ytag      75.417    0.710              0.740  0.847  0.981    0.241  0.880  0.754
agi_0m_60m_250m_Nspat_Ytag  75.136    0.709              0.740  0.846  0.981    0.242  0.879  0.751
agi_0m_60m_250m_Yspat_Ytag  77.780    0.735              0.742  0.860  0.984    0.230  0.892  0.778
ggplot(output_sum, aes(x = AUC, y = TSS, color = deviance_exp, text = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial models w/ tag ID
Base models: Bathymetry was consistently one of the top predictor variables across all base models, and percent deviance explained increased greatly after including the spatial predictors and tag ID (from 42.4% with neither to 75.8% with both). After running these initial models, we decided to run the spatial analysis separately (GLMs, GAMs) rather than including the spatial variables as predictors in the hSDMs, in order to specifically investigate the relationships between latitude, distance to coast, and the AGI or DO at different depth layers. Additionally, we will not include tag ID as a predictor variable, as it would not be included in any projection work and is not critical for the main objectives of this study.
DO models: Performance metrics generally increased, though only subtly, after adding the deeper depth layers to the DO_0m model. Relative to the base models, however, including DO considerably improved model performance. Across depth layers, DO at 0m and DO at 250m were consistently among the five predictors with the highest relative influence, and their contributions were comparable. The partial plots generally show an intermediate optimum (a sweet spot) for DO at 0m and a negative relationship for DO at 250m.
AGI models: Performance metrics were comparable between the DO and AGI models, and the patterns observed for the DO models generally held for the AGI models as well. Model performance improved greatly for the AGI models relative to the base models, and it also increased subtly after including the additional depth layers. A primary difference for the AGI models is the relative influence of the AGI at 250m: it is the only AGI depth layer that appears among the variables with the highest relative influence, while the AGI at 0m and 60m typically rank lower in the list. The AGI partial plots show patterns similar to the DO plots, with a less dramatic negative relationship for the AGI at 250m.
The random predictor variable (pred_var) typically had the lowest relative influence, but in some models it ranked above the predictors related to wind stress and wind stress curl.
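As a minimal sketch of how the random predictor could serve as a screening threshold, the snippet below takes the tail of the relative influence table printed above (values copied from that output) and flags predictors whose relative influence does not exceed that of `pred_var`. The data frame name `rel_inf` and the rule of flagging anything at or below the random predictor are assumptions made for illustration, not necessarily the rule that will be used to build the final model.

```r
# Tail of the relative influence table from the model output above
# (values copied from that output; the top rows of the table are not included).
rel_inf <- data.frame(
  var = c("bathy_mean", "sal_mean", "AGI_60m", "chl_mean", "ssh_mean", "mld_mean",
          "pred_var", "bathy_sd", "vostr_mean", "vo_mean", "uostr_mean", "uo_mean"),
  rel.inf = c(3.2716900, 2.6278408, 2.1064618, 2.0757314, 1.7362553, 1.0240873,
              0.7992513, 0.7490174, 0.7444577, 0.6774726, 0.6498551, 0.5471578),
  stringsAsFactors = FALSE
)

# The relative influence of the random predictor acts as the noise threshold.
threshold <- rel_inf$rel.inf[rel_inf$var == "pred_var"]

# Assumed rule: predictors no more influential than random noise are
# candidates for removal from the final model.
uninformative <- rel_inf$var[rel_inf$rel.inf <= threshold & rel_inf$var != "pred_var"]
print(uninformative)
```

Because relative influence is rescaled to sum to 100 within each model, a check like this would need to be repeated for every fitted model rather than applied once across models.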
DO models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
Models, in order: 0m, no spatial, no tag; 0m, yes spatial, no tag; 0m & 60m, no spatial, no tag; 0m & 250m, no spatial, no tag; 0m, 60m, & 250m, no spatial, no tag; 0m, 60m, & 250m, yes spatial, no tag
explore_brt(mod_file_path = brt_outputs_Ntag[12],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.7008164
Correlation 0.7644341
AUC 0.9373000
Per.Expl 49.4467567
cvDeviance 0.9191696
cvCorrelation 0.6261518
cvAUC 0.8570800
cvPer.Expl 33.6958924
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 25.276448
bathy_mean 24.692429
temp_mean 9.643582
sal_mean 7.962940
chl_mean 6.550115
ssh_mean 4.947427
mld_mean 3.952063
bathy_sd 3.566587
vostr_mean 3.058564
vo_mean 2.889361
uo_mean 2.880227
pred_var 2.378356
uostr_mean 2.201901
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 669.74
2 9 ssh_mean 1 o2_mean_0m 242.37
3 9 ssh_mean 4 sal_mean 166.84
4 11 bathy_mean 1 o2_mean_0m 158.43
5 11 bathy_mean 3 temp_mean 150.85
6 7 vo_mean 4 sal_mean 123.20
7 9 ssh_mean 3 temp_mean 94.56
8 13 pred_var 8 vostr_mean 91.68
[1] "External percent deviance explained"
[1] 0.4500602
[1] "TPR"
[1] 0.7078792
[1] "TSS"
[1] 0.6709833
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4400 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3467092 0.7274861 0.9171592 0.9981085 0.4500602 0.4944676
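For reference, the percent deviance explained values in the output above appear to follow directly from the reported deviances, as (total - residual) / total x 100. The check below reproduces `Per.Expl` and `cvPer.Expl` from the printed numbers. The helper name `pct_dev_explained` is hypothetical and is not part of `explore_brt`.

```r
# Hypothetical helper: percent of the total (null) deviance removed by the model.
pct_dev_explained <- function(total_dev, model_dev) {
  100 * (total_dev - model_dev) / total_dev
}

# Values copied from the "0m, no spatial, no tag" DO model output above.
total_dev    <- 1.3862935  # Total.Deviance
residual_dev <- 0.7008164  # Residual.Deviance (training data)
cv_dev       <- 0.9191696  # cvDeviance (cross-validation)

per_expl    <- pct_dev_explained(total_dev, residual_dev)
cv_per_expl <- pct_dev_explained(total_dev, cv_dev)

print(round(c(Per.Expl = per_expl, cvPer.Expl = cv_per_expl), 3))
```

The total deviance of about 1.386 equals 2 x ln(2), the mean Bernoulli deviance of a null model that predicts 0.5 everywhere. That would be consistent with balanced (or equally weighted) presences and pseudo-absences, although this is an inference from the number rather than something stated in the output.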
explore_brt(mod_file_path = brt_outputs_Ntag[13],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6614427
Correlation 0.7820692
AUC 0.9456000
Per.Expl 52.2869659
cvDeviance 0.8779106
cvCorrelation 0.6486562
cvAUC 0.8709500
cvPer.Expl 36.6720992
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 26.671181
o2_mean_0m 18.726788
lat 9.290748
temp_mean 7.857842
bathy_mean 7.494543
sal_mean 6.766416
chl_mean 4.487413
ssh_mean 3.382554
mld_mean 2.832160
vostr_mean 2.487294
vo_mean 2.263713
pred_var 2.105882
uo_mean 2.030506
bathy_sd 1.927916
uostr_mean 1.675046
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 534.86
2 2 o2_mean_0m 1 lat 195.38
3 10 ssh_mean 2 o2_mean_0m 187.36
4 14 dist_coast 5 sal_mean 171.61
5 12 bathy_mean 4 temp_mean 151.42
6 4 temp_mean 1 lat 140.73
7 7 uostr_mean 1 lat 129.13
8 10 ssh_mean 1 lat 68.75
9 9 vostr_mean 5 sal_mean 66.17
10 14 dist_coast 4 temp_mean 65.09
11 5 sal_mean 2 o2_mean_0m 62.75
[1] "External percent deviance explained"
[1] 0.4774787
[1] "TPR"
[1] 0.7122652
[1] "TSS"
[1] 0.6959527
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3369501 0.7453633 0.9259331 1.000626 0.4774787 0.5228697
explore_brt(mod_file_path = brt_outputs_Ntag[11],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6829345
Correlation 0.7734095
AUC 0.9413000
Per.Expl 50.7366588
cvDeviance 0.9080272
cvCorrelation 0.6319024
cvAUC 0.8602700
cvPer.Expl 34.4996481
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 23.261017
bathy_mean 23.180777
temp_mean 8.946731
o2_mean_60m 7.584790
sal_mean 7.116600
chl_mean 5.956366
ssh_mean 4.122221
bathy_sd 3.632138
mld_mean 3.629388
vostr_mean 2.942032
vo_mean 2.838769
uo_mean 2.590805
pred_var 2.267411
uostr_mean 1.930953
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 725.63
2 9 ssh_mean 1 o2_mean_0m 207.72
3 13 o2_mean_60m 3 temp_mean 146.75
4 11 bathy_mean 3 temp_mean 143.76
5 4 sal_mean 3 temp_mean 138.38
6 11 bathy_mean 1 o2_mean_0m 118.26
7 9 ssh_mean 4 sal_mean 109.57
8 6 uostr_mean 1 o2_mean_0m 101.66
9 9 ssh_mean 3 temp_mean 75.73
10 14 pred_var 5 uo_mean 69.79
[1] "External percent deviance explained"
[1] 0.4630292
[1] "TPR"
[1] 0.7102091
[1] "TSS"
[1] 0.6853121
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3418533 0.7367275 0.9218172 0.9995302 0.4630292 0.5073666
explore_brt(mod_file_path = brt_outputs_Ntag[8],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6791781
Correlation 0.7746510
AUC 0.9422000
Per.Expl 51.0076259
cvDeviance 0.9080124
cvCorrelation 0.6325770
cvAUC 0.8609300
cvPer.Expl 34.5007109
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 25.069876
o2_mean_250m 21.370418
bathy_mean 11.146251
temp_mean 8.048127
sal_mean 7.380057
chl_mean 5.021190
ssh_mean 4.259103
mld_mean 3.260225
bathy_sd 2.908945
vo_mean 2.563888
vostr_mean 2.452558
uo_mean 2.395906
pred_var 2.102817
uostr_mean 2.020640
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 623.90
2 9 ssh_mean 4 sal_mean 206.20
3 9 ssh_mean 1 o2_mean_0m 176.32
4 13 o2_mean_250m 1 o2_mean_0m 156.94
5 7 vo_mean 4 sal_mean 145.70
6 11 bathy_mean 1 o2_mean_0m 110.15
7 14 pred_var 8 vostr_mean 107.16
8 4 sal_mean 3 temp_mean 93.56
9 8 vostr_mean 4 sal_mean 86.63
10 13 o2_mean_250m 3 temp_mean 70.13
[1] "External percent deviance explained"
[1] 0.4672029
[1] "TPR"
[1] 0.7109929
[1] "TSS"
[1] 0.6804633
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3403401 0.7395697 0.9233672 0.9988469 0.4672029 0.5100763
explore_brt(mod_file_path = brt_outputs_Ntag[9],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6675583
Correlation 0.7801596
AUC 0.9448000
Per.Expl 51.8458202
cvDeviance 0.9001398
cvCorrelation 0.6364350
cvAUC 0.8631700
cvPer.Expl 35.0685995
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 23.999794
o2_mean_250m 21.235677
bathy_mean 9.216478
temp_mean 7.566296
sal_mean 6.750525
o2_mean_60m 6.595268
chl_mean 4.699962
ssh_mean 3.230278
mld_mean 2.930399
bathy_sd 2.862417
vo_mean 2.465057
uo_mean 2.292054
vostr_mean 2.217181
pred_var 2.123455
uostr_mean 1.815160
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 514.84
2 9 ssh_mean 4 sal_mean 230.69
3 14 o2_mean_250m 1 o2_mean_0m 137.44
4 9 ssh_mean 1 o2_mean_0m 130.29
5 4 sal_mean 3 temp_mean 117.26
6 7 vo_mean 4 sal_mean 113.45
7 15 pred_var 5 uo_mean 99.85
8 13 o2_mean_60m 3 temp_mean 87.88
9 9 ssh_mean 3 temp_mean 75.78
10 11 bathy_mean 1 o2_mean_0m 74.79
11 11 bathy_mean 3 temp_mean 73.68
[1] "External percent deviance explained"
[1] 0.4730226
[1] "TPR"
[1] 0.7118979
[1] "TSS"
[1] 0.6942536
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.338087 0.7435281 0.9252005 0.997885 0.4730226 0.5184582
explore_brt(mod_file_path = brt_outputs_Ntag[10],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862935
Residual.Deviance 0.6260018
Correlation 0.7998334
AUC 0.9540000
Per.Expl 54.8434923
cvDeviance 0.8717790
cvCorrelation 0.6520278
cvAUC 0.8730600
cvPer.Expl 37.1143990
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 21.448387
o2_mean_0m 17.884136
o2_mean_250m 11.445241
lat 6.825416
temp_mean 6.586808
sal_mean 6.132765
o2_mean_60m 5.658457
chl_mean 3.792773
bathy_mean 3.672083
ssh_mean 2.744783
mld_mean 2.548687
vo_mean 2.115490
uo_mean 1.958505
vostr_mean 1.921788
pred_var 1.875404
bathy_sd 1.700138
uostr_mean 1.689138
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 562.48
2 10 ssh_mean 2 o2_mean_0m 142.65
3 2 o2_mean_0m 1 lat 117.54
4 10 ssh_mean 5 sal_mean 105.70
5 14 dist_coast 5 sal_mean 100.26
6 5 sal_mean 4 temp_mean 99.98
7 14 dist_coast 9 vostr_mean 95.55
8 12 bathy_mean 4 temp_mean 94.87
9 4 temp_mean 1 lat 93.76
10 10 ssh_mean 1 lat 90.14
11 16 o2_mean_250m 1 lat 81.26
12 17 pred_var 9 vostr_mean 69.31
13 15 o2_mean_60m 4 temp_mean 64.27
14 9 vostr_mean 5 sal_mean 60.88
[1] "External percent deviance explained"
[1] 0.5003296
[1] "TPR"
[1] 0.7165041
[1] "TSS"
[1] 0.7177324
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3278095 0.7619584 0.9344126 1.000316 0.5003296 0.5484349
AGI models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
Models, in order: 0m, no spatial, no tag; 0m, yes spatial, no tag; 0m & 60m, no spatial, no tag; 0m & 250m, no spatial, no tag; 0m, 60m, & 250m, no spatial, no tag; 0m, 60m, & 250m, yes spatial, no tag
explore_brt(mod_file_path = brt_outputs_Ntag[5],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.7138692
Correlation 0.7575339
AUC 0.9336000
Per.Expl 48.5047989
cvDeviance 0.9289380
cvCorrelation 0.6208177
cvAUC 0.8541100
cvPer.Expl 32.9907313
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 22.731429
AGI_0m 17.791729
temp_mean 15.222220
sal_mean 9.295392
ssh_mean 7.358567
chl_mean 5.645338
bathy_sd 3.705780
vo_mean 3.438996
vostr_mean 3.398845
mld_mean 3.390303
uo_mean 2.903321
uostr_mean 2.782337
pred_var 2.335742
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 3448.33
2 12 AGI_0m 8 ssh_mean 241.53
3 10 bathy_mean 2 temp_mean 223.16
4 12 AGI_0m 10 bathy_mean 194.18
5 12 AGI_0m 4 uo_mean 140.23
6 7 vostr_mean 2 temp_mean 89.54
7 8 ssh_mean 2 temp_mean 83.29
8 6 vo_mean 3 sal_mean 80.67
[1] "External percent deviance explained"
[1] 0.4366052
[1] "TPR"
[1] 0.7045997
[1] "TSS"
[1] 0.6523948
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3521742 0.7161327 0.9105999 0.9996408 0.4366052 0.485048
explore_brt(mod_file_path = brt_outputs_Ntag[6],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.6608846
Correlation 0.7823616
AUC 0.9458000
Per.Expl 52.3268580
cvDeviance 0.8877425
cvCorrelation 0.6432731
cvAUC 0.8676200
cvPer.Expl 35.9623875
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 26.940501
AGI_0m 14.237577
lat 11.432881
temp_mean 9.137101
sal_mean 8.383518
bathy_mean 6.450135
chl_mean 4.606477
ssh_mean 4.094246
mld_mean 2.781658
vostr_mean 2.141951
vo_mean 2.063669
pred_var 2.032190
uo_mean 1.995454
uostr_mean 1.873742
bathy_sd 1.828900
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1902.42
2 3 temp_mean 1 lat 616.06
3 13 AGI_0m 9 ssh_mean 167.94
4 8 vostr_mean 3 temp_mean 162.76
5 6 uostr_mean 1 lat 154.65
6 14 dist_coast 4 sal_mean 144.17
7 13 AGI_0m 11 bathy_mean 142.36
8 13 AGI_0m 1 lat 134.68
9 9 ssh_mean 3 temp_mean 89.31
10 8 vostr_mean 4 sal_mean 70.22
11 13 AGI_0m 4 sal_mean 68.82
[1] "External percent deviance explained"
[1] 0.4743424
[1] "TPR"
[1] 0.7112435
[1] "TSS"
[1] 0.6897118
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3383874 0.7424042 0.9238761 0.9986839 0.4743424 0.5232686
explore_brt(mod_file_path = brt_outputs_Ntag[4],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.6808364
Correlation 0.7747488
AUC 0.9423000
Per.Expl 50.8876286
cvDeviance 0.9168888
cvCorrelation 0.6270623
cvAUC 0.8579000
cvPer.Expl 33.8599080
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 21.700356
AGI_0m 16.232947
temp_mean 14.727427
sal_mean 8.696719
ssh_mean 6.145377
chl_mean 5.640991
AGI_60m 5.466690
bathy_sd 3.771708
vostr_mean 3.420011
mld_mean 3.394063
vo_mean 3.152506
uo_mean 2.837063
uostr_mean 2.639898
pred_var 2.174244
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 3156.43
2 12 AGI_0m 10 bathy_mean 213.03
3 10 bathy_mean 2 temp_mean 207.88
4 12 AGI_0m 8 ssh_mean 201.47
5 7 vostr_mean 2 temp_mean 135.57
6 6 vo_mean 3 sal_mean 96.14
7 8 ssh_mean 2 temp_mean 95.85
8 13 AGI_60m 10 bathy_mean 88.11
9 5 uostr_mean 2 temp_mean 80.23
10 14 pred_var 4 uo_mean 74.79
[1] "External percent deviance explained"
[1] 0.4577255
[1] "TPR"
[1] 0.7087203
[1] "TSS"
[1] 0.6718425
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3441022 0.7322665 0.9188448 0.9987076 0.4577255 0.5088763
explore_brt(mod_file_path = brt_outputs_Ntag[1],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.7036843
Correlation 0.7613252
AUC 0.9355000
Per.Expl 49.2394844
cvDeviance 0.9131700
cvCorrelation 0.6290762
cvAUC 0.8592000
cvPer.Expl 34.1281627
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m 20.028572
temp_mean 15.736663
AGI_0m 15.502156
bathy_mean 11.082676
sal_mean 8.284880
ssh_mean 5.695943
chl_mean 4.456598
bathy_sd 3.261830
mld_mean 3.150028
vo_mean 2.968082
uo_mean 2.742510
uostr_mean 2.688339
vostr_mean 2.523998
pred_var 1.877725
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 2261.05
2 12 AGI_0m 8 ssh_mean 243.87
3 13 AGI_250m 12 AGI_0m 192.33
4 12 AGI_0m 10 bathy_mean 149.62
5 6 vo_mean 3 sal_mean 110.20
6 13 AGI_250m 2 temp_mean 96.73
7 10 bathy_mean 2 temp_mean 87.09
8 12 AGI_0m 4 uo_mean 81.43
9 7 vostr_mean 2 temp_mean 70.55
10 13 AGI_250m 3 sal_mean 62.69
[1] "External percent deviance explained"
[1] 0.4483766
[1] "TPR"
[1] 0.7067332
[1] "TSS"
[1] 0.6632542
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3477708 0.7247524 0.914827 0.9985637 0.4483766 0.4923948
explore_brt(mod_file_path = brt_outputs_Ntag[2],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.6776093
Correlation 0.7746133
AUC 0.9421000
Per.Expl 51.1204139
cvDeviance 0.9081818
cvCorrelation 0.6309660
cvAUC 0.8602600
cvPer.Expl 34.4879906
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m 18.677103
temp_mean 15.215390
AGI_0m 14.842826
bathy_mean 11.576550
sal_mean 8.159404
ssh_mean 4.885997
chl_mean 4.267141
AGI_60m 3.982295
mld_mean 3.144571
bathy_sd 2.910730
vo_mean 2.764213
uostr_mean 2.569658
vostr_mean 2.536394
uo_mean 2.474835
pred_var 1.992893
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 2217.12
2 12 AGI_0m 8 ssh_mean 266.83
3 12 AGI_0m 10 bathy_mean 171.01
4 14 AGI_250m 12 AGI_0m 121.70
5 6 vo_mean 3 sal_mean 119.30
6 7 vostr_mean 2 temp_mean 89.77
7 14 AGI_250m 2 temp_mean 85.04
8 13 AGI_60m 10 bathy_mean 70.39
9 10 bathy_mean 3 sal_mean 54.79
10 10 bathy_mean 2 temp_mean 54.17
11 11 bathy_sd 3 sal_mean 51.71
[1] "External percent deviance explained"
[1] 0.4637131
[1] "TPR"
[1] 0.7096986
[1] "TSS"
[1] 0.6705067
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3419866 0.7360071 0.920779 0.9997123 0.4637131 0.5112041
explore_brt(mod_file_path = brt_outputs_Ntag[3],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862829
Residual.Deviance 0.6484015
Correlation 0.7886623
AUC 0.9489000
Per.Expl 53.2273293
cvDeviance 0.8776781
cvCorrelation 0.6485564
cvAUC 0.8704400
cvPer.Expl 36.6883816
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 22.326115
AGI_0m 13.657497
lat 10.234998
AGI_250m 10.174094
temp_mean 8.542396
sal_mean 7.029363
bathy_mean 4.398889
chl_mean 3.838414
ssh_mean 3.348338
AGI_60m 3.161974
mld_mean 2.653839
uo_mean 1.954712
vo_mean 1.922921
uostr_mean 1.732725
pred_var 1.726678
vostr_mean 1.726454
bathy_sd 1.570595
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1330.23
2 3 temp_mean 1 lat 514.47
3 13 AGI_0m 1 lat 160.00
4 13 AGI_0m 9 ssh_mean 146.13
5 13 AGI_0m 11 bathy_mean 137.65
6 6 uostr_mean 1 lat 137.36
7 16 AGI_250m 13 AGI_0m 93.98
8 14 dist_coast 8 vostr_mean 93.55
9 14 dist_coast 4 sal_mean 81.11
10 8 vostr_mean 4 sal_mean 71.57
11 8 vostr_mean 3 temp_mean 70.69
12 12 bathy_sd 4 sal_mean 60.30
13 13 AGI_0m 4 sal_mean 59.27
14 5 uo_mean 3 temp_mean 57.28
[1] "External percent deviance explained"
[1] 0.4821228
[1] "TPR"
[1] 0.71269
[1] "TSS"
[1] 0.6885405
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3356191 0.7477048 0.926785 0.9980565 0.4821228 0.5322733
Summary table of results
output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
| model | Per.Expl | Ext. deviance explained | TPR | TSS | AUC | RMSE | Cor | PseudoR2 |
|---|---|---|---|---|---|---|---|---|
| base_0m_Nspat_Ntag | 42.389 | 0.385 | 0.695 | 0.613 | 0.892 | 0.371 | 0.679 | 0.424 |
| do_0m_Nspat_Ntag | 49.447 | 0.450 | 0.708 | 0.671 | 0.917 | 0.347 | 0.727 | 0.494 |
| do_0m_Yspat_Ntag | 52.287 | 0.477 | 0.712 | 0.696 | 0.926 | 0.337 | 0.745 | 0.523 |
| do_0m_60m_Nspat_Ntag | 50.737 | 0.463 | 0.710 | 0.685 | 0.922 | 0.342 | 0.737 | 0.507 |
| do_0m_250m_Nspat_Ntag | 51.008 | 0.467 | 0.711 | 0.680 | 0.923 | 0.340 | 0.740 | 0.510 |
| do_0m_60m_250m_Nspat_Ntag | 51.846 | 0.473 | 0.712 | 0.694 | 0.925 | 0.338 | 0.744 | 0.518 |
| do_0m_60m_250m_Yspat_Ntag | 54.843 | 0.500 | 0.717 | 0.718 | 0.934 | 0.328 | 0.762 | 0.548 |
| agi_0m_Nspat_Ntag | 48.505 | 0.437 | 0.705 | 0.652 | 0.911 | 0.352 | 0.716 | 0.485 |
| agi_0m_Yspat_Ntag | 52.327 | 0.474 | 0.711 | 0.690 | 0.924 | 0.338 | 0.742 | 0.523 |
| agi_0m_60m_Nspat_Ntag | 50.888 | 0.458 | 0.709 | 0.672 | 0.919 | 0.344 | 0.732 | 0.509 |
| agi_0m_250m_Nspat_Ntag | 49.239 | 0.448 | 0.707 | 0.663 | 0.915 | 0.348 | 0.724 | 0.492 |
| agi_0m_60m_250m_Nspat_Ntag | 51.120 | 0.464 | 0.710 | 0.671 | 0.912 | 0.342 | 0.736 | 0.511 |
| agi_0m_60m_250m_Yspat_Ntag | 53.227 | 0.482 | 0.713 | 0.689 | 0.927 | 0.336 | 0.748 | 0.532 |
output_sum_Ntag_Nspat <- output_sum_Ntag %>%
  filter(!grepl("Yspat", model))
ggplot(output_sum_Ntag_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial models w/o tag ID
These models were all developed using predictor data at a daily resolution. Considering only the models without spatial predictors, the DO model with DO at 0m, 60m, and 250m performed best (TSS = 0.694), and the comparable AGI model had lower TSS and AUC scores (TSS = 0.671).
The DO and AGI models both outperformed the base model.
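To make the comparison with the base model concrete, a short sketch such as the following could compute each model's gain over `base_0m_Nspat_Ntag`. The values are copied from the summary table above for the models without spatial predictors. The data frame name `no_spat` and the column name `pct_dev` (the training percent deviance explained) are labels chosen for this example and may not match the column names in the CSV.

```r
# Values copied from the summary table above (models without spatial predictors).
no_spat <- data.frame(
  model = c("base_0m_Nspat_Ntag",
            "do_0m_Nspat_Ntag", "do_0m_60m_Nspat_Ntag",
            "do_0m_250m_Nspat_Ntag", "do_0m_60m_250m_Nspat_Ntag",
            "agi_0m_Nspat_Ntag", "agi_0m_60m_Nspat_Ntag",
            "agi_0m_250m_Nspat_Ntag", "agi_0m_60m_250m_Nspat_Ntag"),
  pct_dev = c(42.389, 49.447, 50.737, 51.008, 51.846,
              48.505, 50.888, 49.239, 51.120),
  TSS = c(0.613, 0.671, 0.685, 0.680, 0.694,
          0.652, 0.672, 0.663, 0.671),
  stringsAsFactors = FALSE
)

# Gains relative to the base model (positive = better than base).
base_row <- no_spat[no_spat$model == "base_0m_Nspat_Ntag", ]
no_spat$pct_dev_gain <- no_spat$pct_dev - base_row$pct_dev
no_spat$TSS_gain     <- no_spat$TSS - base_row$TSS

# Rank models from highest to lowest TSS.
no_spat <- no_spat[order(-no_spat$TSS), ]
print(no_spat, row.names = FALSE)
```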
DO at 0m and DO at 250m were the two environmental predictors with the highest relative influence, while DO at 60m ranked considerably lower. This pattern held whether or not spatial predictor variables were included, although distance to coast ranked first when they were. Performance metrics still improved for the DO_0m_60m_250m model relative to the DO_0m_250m model. Partial plot patterns for DO at 0m and 250m were the same as in the original models that included tag ID as a predictor (sweet spot for 0m, negative relationship for 250m).
AGI at 250m was the most important predictor variable, followed by temperature and AGI at 0m (the latter two had nearly identical relative influence values). However, when spatial predictors were included, AGI at 0m became more influential than AGI at 250m. As with the DO models, differences in model performance between the AGI_0m_250m and AGI_0m_60m_250m models were small. Partial plot patterns for the AGI at 0m and 250m remained the same as described above.
Base models w/o tag ID and w/ data at seasonal and annual resolutions
For these models, the environmental raster data were averaged by season and by year. Environmental values were then extracted from these averaged rasters at the observed and pseudo-absence locations, with each location matched to the raster for its season or year.
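A minimal sketch of this matching step, using plain data frames in place of raster files and ignoring the spatial dimension: daily values are averaged by season and year, and each location record is then joined to the average for its own season. All object names (`daily_env`, `locs`, `season_of`, `PA`) are hypothetical, the numbers are toy values, and the season definition (three-month seasons, with December assigned to the winter of the same calendar year) is an assumption, since the season boundaries used for the actual rasters are not stated here.

```r
# Assumed season definition: Dec-Feb winter, Mar-May spring,
# Jun-Aug summer, Sep-Nov fall (December stays in its own calendar year).
season_of <- function(dates) {
  m <- as.integer(format(dates, "%m"))
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}

# Toy daily environmental values (stand-in for one raster cell's daily series).
daily_env <- data.frame(
  date = as.Date(c("2010-01-15", "2010-02-15", "2010-07-01", "2010-07-31")),
  o2_mean_0m = c(250, 260, 210, 230)
)
daily_env$season <- season_of(daily_env$date)
daily_env$year   <- as.integer(format(daily_env$date, "%Y"))

# Seasonal averages: one value per season-year combination.
seas_env <- aggregate(o2_mean_0m ~ season + year, data = daily_env, FUN = mean)
names(seas_env)[names(seas_env) == "o2_mean_0m"] <- "o2_mean_0m_seas"

# Toy observed (1) and pseudo-absence (0) records, matched by season and year.
locs <- data.frame(
  date = as.Date(c("2010-01-20", "2010-07-10")),
  PA = c(1, 0)
)
locs$season <- season_of(locs$date)
locs$year   <- as.integer(format(locs$date, "%Y"))

locs_seas <- merge(locs, seas_env, by = c("season", "year"), all.x = TRUE)
print(locs_seas)
```

The annual case would follow the same pattern with `year` as the only matching key.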
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = base_test_seasonal)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.8809143
Correlation 0.6567031
AUC 0.8771000
Per.Expl 36.4552432
cvDeviance 0.9683417
cvCorrelation 0.5954101
cvAUC 0.8407800
cvPer.Expl 30.1486725
[1] "Relative influence of predictor variables"
rel.inf
vostr_mean 17.333628
uostr_mean 13.508666
bathy_mean 13.232902
vo_mean 11.309019
temp_mean 10.168831
ssh_mean 9.233046
sal_mean 8.770811
mld_mean 5.895077
chl_mean 5.118059
uo_mean 2.402328
bathy_sd 1.581218
pred_var 1.446416
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 sal_mean 1 mld_mean 133.60
2 10 bathy_mean 6 uostr_mean 81.44
3 10 bathy_mean 2 sal_mean 73.84
4 8 vostr_mean 4 temp_mean 63.53
5 6 uostr_mean 4 temp_mean 60.80
6 7 vo_mean 4 temp_mean 56.52
7 4 temp_mean 2 sal_mean 48.36
[1] "External percent deviance explained"
[1] 0.3488583
[1] "TPR"
[1] 0.682944
[1] "TSS"
[1] 0.5561876
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8350 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.38543 0.6400077 0.8671911 1.005559 0.3488583 0.3645524
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = base_test_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.7271350
Correlation 0.7489326
AUC 0.9294000
Per.Expl 47.5482367
cvDeviance 0.9391765
cvCorrelation 0.6117233
cvAUC 0.8502500
cvPer.Expl 32.2526569
[1] "Relative influence of predictor variables"
rel.inf
vostr_mean 20.172003
uostr_mean 13.485912
sal_mean 10.802089
bathy_mean 9.230921
vo_mean 8.994202
mld_mean 7.785211
chl_mean 7.103797
temp_mean 6.996080
ssh_mean 5.855904
uo_mean 3.741738
bathy_sd 2.977250
pred_var 2.854894
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 ssh_mean 2 sal_mean 854.62
2 8 vostr_mean 4 temp_mean 733.30
3 8 vostr_mean 6 uostr_mean 401.72
4 6 uostr_mean 3 ssh_mean 381.20
5 10 bathy_mean 8 vostr_mean 275.00
6 6 uostr_mean 2 sal_mean 226.43
7 9 chl_mean 3 ssh_mean 186.31
[1] "External percent deviance explained"
[1] 0.4327516
[1] "TPR"
[1] 0.7036107
[1] "TSS"
[1] 0.6509996
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4900 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.354432 0.7109286 0.9086044 1.002374 0.4327516 0.4754824
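As a quick consistency check on the metrics above, the sketch below recomputes the percent deviance explained from the reported deviances. It assumes that Per.Expl and cvPer.Expl are defined as 100 * (total - residual) / total, which is not stated in this document but agrees with the reported values for the annual base model to the precision shown. The variable names are illustrative.

```r
# Deviances copied from the annual base model output above
total_dev <- 1.3862928
resid_dev <- 0.7271350
cv_dev    <- 0.9391765

# Assumed definition: share of total deviance removed by the model, in percent
per_expl    <- 100 * (total_dev - resid_dev) / total_dev
cv_per_expl <- 100 * (total_dev - cv_dev) / total_dev

# Compare against the reported Per.Expl (47.548) and cvPer.Expl (32.253)
print(round(c(per_expl = per_expl, cv_per_expl = cv_per_expl), 3))
```

The gap between the two values (training versus cross-validated) is one simple indicator of how much a model fits noise in the training data.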
DO models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag | Seasonal, Yspat, Ntag | Annual, Nspat, Ntag | Annual, Yspat, Ntag | Daily, Seasonal, and Annual, Nspat, Ntag | Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862856
Residual.Deviance 0.8087782
Correlation 0.7015642
AUC 0.9035000
Per.Expl 41.6586195
cvDeviance 0.9224917
cvCorrelation 0.6249818
cvAUC 0.8574200
cvPer.Expl 33.4558676
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_seas 25.0249198
o2_mean_0m_seas 24.7052133
o2_mean_60m_seas 9.0417226
temp_mean 8.8706495
bathy_mean 8.1790158
sal_mean 5.6940018
chl_mean 4.2801120
ssh_mean 3.9981758
mld_mean 2.3770798
bathy_sd 1.6902741
vostr_mean 1.5184271
vo_mean 1.3453968
uo_mean 1.2410073
uostr_mean 1.0733586
pred_var 0.9606457
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 204.60
2 15 o2_mean_250m_seas 13 o2_mean_0m_seas 120.96
3 13 o2_mean_0m_seas 2 temp_mean 108.56
4 14 o2_mean_60m_seas 3 sal_mean 104.18
5 13 o2_mean_0m_seas 8 ssh_mean 102.23
6 13 o2_mean_0m_seas 10 bathy_mean 73.29
7 13 o2_mean_0m_seas 1 chl_mean 56.07
8 10 bathy_mean 3 sal_mean 43.98
9 14 o2_mean_60m_seas 2 temp_mean 35.27
10 14 o2_mean_60m_seas 10 bathy_mean 33.37
11 7 vostr_mean 3 sal_mean 28.21
[1] "External percent deviance explained"
[1] 0.4130282
[1] "TPR"
[1] 0.6999559
[1] "TSS"
[1] 0.6280722
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3615509 0.697535 0.901256 1.002082 0.4130282 0.4165862
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862856
Residual.Deviance 0.7888981
Correlation 0.7126061
AUC 0.9098000
Per.Expl 43.0926725
cvDeviance 0.9023416
cvCorrelation 0.6369539
cvAUC 0.8648600
cvPer.Expl 34.9093998
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 24.1818242
o2_mean_0m_seas 18.7439025
o2_mean_250m_seas 12.5092916
temp_mean 7.7772110
o2_mean_60m_seas 6.6435930
sal_mean 6.4633552
lat 5.4361197
chl_mean 4.0752844
bathy_mean 3.0942439
ssh_mean 2.6982385
mld_mean 2.0105217
vostr_mean 1.3598262
vo_mean 1.0937062
uo_mean 1.0536335
uostr_mean 0.9697403
bathy_sd 0.9471132
pred_var 0.9423949
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 97.65
2 15 o2_mean_0m_seas 9 ssh_mean 79.80
3 15 o2_mean_0m_seas 3 temp_mean 68.01
4 17 o2_mean_250m_seas 15 o2_mean_0m_seas 57.47
5 16 o2_mean_60m_seas 4 sal_mean 56.75
6 4 sal_mean 1 lat 40.08
7 15 o2_mean_0m_seas 1 lat 39.48
8 15 o2_mean_0m_seas 2 chl_mean 36.64
9 6 uostr_mean 3 temp_mean 35.68
10 16 o2_mean_60m_seas 8 vostr_mean 30.12
11 8 vostr_mean 4 sal_mean 30.08
12 6 uostr_mean 1 lat 27.81
13 13 dist_coast 4 sal_mean 25.67
14 16 o2_mean_60m_seas 3 temp_mean 21.36
[1] "External percent deviance explained"
[1] 0.4291891
[1] "TPR"
[1] 0.7037666
[1] "TSS"
[1] 0.6477985
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3555959 0.7105654 0.9089065 1.002524 0.4291891 0.4309267
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862856
Residual.Deviance 0.7161664
Correlation 0.7594624
AUC 0.9361000
Per.Expl 48.3391884
cvDeviance 0.9443926
cvCorrelation 0.6087538
cvAUC 0.8489600
cvPer.Expl 31.8760420
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_ann 22.815335
temp_mean 12.948825
o2_mean_0m_ann 10.632947
o2_mean_60m_ann 8.366439
bathy_mean 8.305431
sal_mean 7.217211
chl_mean 6.745035
ssh_mean 4.679201
bathy_sd 3.764948
mld_mean 3.059999
vostr_mean 2.774661
uo_mean 2.445189
vo_mean 2.303757
pred_var 2.190658
uostr_mean 1.750363
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 o2_mean_0m_ann 2 temp_mean 270.07
2 14 o2_mean_60m_ann 2 temp_mean 163.36
3 10 bathy_mean 2 temp_mean 143.17
4 14 o2_mean_60m_ann 3 sal_mean 118.14
5 10 bathy_mean 8 ssh_mean 103.69
6 13 o2_mean_0m_ann 3 sal_mean 103.08
7 14 o2_mean_60m_ann 13 o2_mean_0m_ann 95.55
8 8 ssh_mean 1 chl_mean 89.14
9 14 o2_mean_60m_ann 10 bathy_mean 79.14
10 7 vostr_mean 3 sal_mean 71.34
11 15 o2_mean_250m_ann 14 o2_mean_60m_ann 60.83
[1] "External percent deviance explained"
[1] 0.4783583
[1] "TPR"
[1] 0.7163317
[1] "TSS"
[1] 0.7136478
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4700 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3344052 0.7557594 0.9340804 0.9990281 0.4783583 0.4833919
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862856
Residual.Deviance 0.6536089
Correlation 0.7921300
AUC 0.9520000
Per.Expl 52.8517851
cvDeviance 0.9204287
cvCorrelation 0.6225846
cvAUC 0.8570200
cvPer.Expl 33.6046878
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 20.297776
o2_mean_250m_ann 12.518497
temp_mean 9.519565
lat 8.436190
sal_mean 7.739106
chl_mean 6.172511
o2_mean_60m_ann 5.512454
o2_mean_0m_ann 5.467624
bathy_mean 4.211227
ssh_mean 3.787317
mld_mean 2.916206
vostr_mean 2.473318
pred_var 2.428564
uo_mean 2.233912
bathy_sd 2.205305
vo_mean 2.138529
uostr_mean 1.941896
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 16 o2_mean_60m_ann 1 lat 185.52
2 16 o2_mean_60m_ann 3 temp_mean 172.29
3 15 o2_mean_0m_ann 3 temp_mean 169.48
4 16 o2_mean_60m_ann 13 dist_coast 165.08
5 6 uostr_mean 1 lat 153.50
6 3 temp_mean 1 lat 99.53
7 9 ssh_mean 3 temp_mean 93.04
8 13 dist_coast 4 sal_mean 92.60
9 11 bathy_mean 3 temp_mean 89.52
10 16 o2_mean_60m_ann 4 sal_mean 77.49
11 15 o2_mean_0m_ann 4 sal_mean 75.36
12 9 ssh_mean 2 chl_mean 69.98
13 8 vostr_mean 4 sal_mean 62.33
14 11 bathy_mean 9 ssh_mean 60.04
[1] "External percent deviance explained"
[1] 0.5204087
[1] "TPR"
[1] 0.7236268
[1] "TSS"
[1] 0.7422174
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3173993 0.7858205 0.9486514 0.998094 0.5204087 0.5285179
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862856
Residual.Deviance 0.5979158
Correlation 0.8131143
AUC 0.9593000
Per.Expl 56.8692188
cvDeviance 0.8555333
cvCorrelation 0.6625779
cvAUC 0.8793100
cvPer.Expl 38.2859278
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_ann 16.178947
o2_mean_0m 15.177742
o2_mean_0m_seas 9.907204
temp_mean 6.497884
o2_mean_250m_seas 6.092396
o2_mean_60m_seas 5.404267
bathy_mean 5.055538
o2_mean_60m_ann 4.745217
sal_mean 4.411907
chl_mean 3.792187
o2_mean_0m_ann 3.029557
o2_mean_250m 2.746472
ssh_mean 2.606318
o2_mean_60m 2.599873
mld_mean 2.186092
vostr_mean 1.834691
bathy_sd 1.699575
vo_mean 1.699286
uo_mean 1.648957
pred_var 1.562635
uostr_mean 1.123254
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 16 o2_mean_0m_seas 1 o2_mean_0m 314.48
2 3 temp_mean 1 o2_mean_0m 289.71
3 11 bathy_mean 3 temp_mean 193.84
4 18 o2_mean_250m_seas 14 o2_mean_250m 183.63
5 20 o2_mean_60m_ann 11 bathy_mean 145.03
6 19 o2_mean_0m_ann 16 o2_mean_0m_seas 85.20
7 16 o2_mean_0m_seas 9 ssh_mean 79.30
8 13 o2_mean_60m 11 bathy_mean 71.76
9 16 o2_mean_0m_seas 11 bathy_mean 65.70
10 20 o2_mean_60m_ann 3 temp_mean 62.46
11 21 o2_mean_250m_ann 18 o2_mean_250m_seas 55.73
12 19 o2_mean_0m_ann 4 sal_mean 51.58
13 8 vostr_mean 4 sal_mean 51.37
14 15 pred_var 8 vostr_mean 51.21
15 17 o2_mean_60m_seas 4 sal_mean 48.85
16 11 bathy_mean 1 o2_mean_0m 44.77
17 19 o2_mean_0m_ann 3 temp_mean 44.26
18 12 bathy_sd 8 vostr_mean 44.13
19 11 bathy_mean 4 sal_mean 43.04
20 11 bathy_mean 9 ssh_mean 42.54
21 10 mld_mean 8 vostr_mean 40.56
22 5 uo_mean 3 temp_mean 39.63
[1] "External percent deviance explained"
[1] 0.5616733
[1] "TPR"
[1] 0.7273396
[1] "TSS"
[1] 0.7702744
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5250 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3019287 0.8070669 0.956105 0.9990883 0.5616733 0.5686922
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862856
Residual.Deviance 0.5671679
Correlation 0.8278610
AUC 0.9657000
Per.Expl 59.0872269
cvDeviance 0.8361002
cvCorrelation 0.6730030
cvAUC 0.8850800
cvPer.Expl 39.6877369
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 16.282045
o2_mean_0m 12.496115
o2_mean_250m_ann 8.725825
o2_mean_0m_seas 8.005763
temp_mean 6.076514
o2_mean_60m_seas 4.786964
sal_mean 4.595102
lat 4.281887
o2_mean_250m_seas 3.733350
o2_mean_60m_ann 3.494279
chl_mean 3.473096
o2_mean_250m 2.798671
bathy_mean 2.791004
o2_mean_0m_ann 2.672064
ssh_mean 2.344051
o2_mean_60m 2.321452
mld_mean 2.100745
vostr_mean 1.812754
pred_var 1.605802
uo_mean 1.593011
vo_mean 1.490482
bathy_sd 1.280824
uostr_mean 1.238201
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 281.68
2 4 temp_mean 1 lat 188.66
3 22 o2_mean_60m_ann 14 dist_coast 161.35
4 20 o2_mean_250m_seas 16 o2_mean_250m 153.50
5 18 o2_mean_0m_seas 2 o2_mean_0m 137.29
6 14 dist_coast 9 vostr_mean 97.66
7 21 o2_mean_0m_ann 18 o2_mean_0m_seas 87.06
8 18 o2_mean_0m_seas 3 chl_mean 84.54
9 17 pred_var 9 vostr_mean 69.98
10 15 o2_mean_60m 14 dist_coast 66.32
11 12 bathy_mean 4 temp_mean 65.44
12 22 o2_mean_60m_ann 1 lat 62.14
13 18 o2_mean_0m_seas 10 ssh_mean 58.26
14 21 o2_mean_0m_ann 5 sal_mean 54.97
15 23 o2_mean_250m_ann 20 o2_mean_250m_seas 46.00
16 14 dist_coast 5 sal_mean 44.14
17 17 pred_var 5 sal_mean 44.13
18 22 o2_mean_60m_ann 4 temp_mean 40.49
19 9 vostr_mean 5 sal_mean 38.35
20 5 sal_mean 4 temp_mean 37.25
21 13 bathy_sd 9 vostr_mean 37.03
22 22 o2_mean_60m_ann 12 bathy_mean 36.84
23 21 o2_mean_0m_ann 4 temp_mean 36.29
24 19 o2_mean_60m_seas 5 sal_mean 35.59
25 12 bathy_mean 10 ssh_mean 33.24
26 20 o2_mean_250m_seas 18 o2_mean_0m_seas 29.74
[1] "External percent deviance explained"
[1] 0.582118
[1] "TPR"
[1] 0.7302281
[1] "TSS"
[1] 0.7874428
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2928683 0.820827 0.9619062 0.9987813 0.582118 0.5908723
AGI models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag | Seasonal, Yspat, Ntag | Annual, Nspat, Ntag | Annual, Yspat, Ntag | Daily, Seasonal, and Annual, Nspat, Ntag | Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862835
Residual.Deviance 0.8251404
Correlation 0.6939540
AUC 0.8991000
Per.Expl 40.4782338
cvDeviance 0.9404219
cvCorrelation 0.6134191
cvAUC 0.8508300
cvPer.Expl 32.1623635
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_seas 22.274332
temp_mean 18.540774
bathy_mean 12.674723
AGI_0m_seas 10.926103
sal_mean 8.104305
AGI_60m_seas 6.579498
chl_mean 4.148226
ssh_mean 4.078015
mld_mean 2.972459
vostr_mean 2.221736
vo_mean 1.866460
bathy_sd 1.737114
uo_mean 1.398152
uostr_mean 1.319364
pred_var 1.158739
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m_seas 2 temp_mean 250.88
2 14 AGI_60m_seas 2 temp_mean 103.94
3 10 bathy_mean 2 temp_mean 77.52
4 15 AGI_250m_seas 3 sal_mean 56.09
5 15 AGI_250m_seas 2 temp_mean 56.09
6 7 vostr_mean 2 temp_mean 42.38
7 13 AGI_0m_seas 9 mld_mean 37.60
8 14 AGI_60m_seas 10 bathy_mean 35.78
9 2 temp_mean 1 chl_mean 35.67
10 6 vo_mean 3 sal_mean 32.86
11 13 AGI_0m_seas 8 ssh_mean 30.74
[1] "External percent deviance explained"
[1] 0.3934875
[1] "TPR"
[1] 0.6960424
[1] "TSS"
[1] 0.6104789
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3684756 0.6826742 0.893425 1.009876 0.3934875 0.4047823
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862835
Residual.Deviance 0.8006785
Correlation 0.7068682
AUC 0.9066000
Per.Expl 42.2427977
cvDeviance 0.9168479
cvCorrelation 0.6273558
cvAUC 0.8592600
cvPer.Expl 33.8628864
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 28.8893087
temp_mean 10.5432446
AGI_0m_seas 10.1288917
AGI_250m_seas 10.0417742
lat 8.8687497
sal_mean 7.4016025
AGI_60m_seas 4.3633851
bathy_mean 4.1233081
chl_mean 3.7589339
ssh_mean 3.0280635
mld_mean 2.1940657
vostr_mean 1.5684098
vo_mean 1.2555925
pred_var 1.0180637
uo_mean 0.9757527
bathy_sd 0.9564136
uostr_mean 0.8844400
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 97.96
2 16 AGI_60m_seas 3 temp_mean 97.27
3 15 AGI_0m_seas 3 temp_mean 75.27
4 17 AGI_250m_seas 4 sal_mean 68.92
5 4 sal_mean 1 lat 61.71
6 8 vostr_mean 3 temp_mean 52.21
7 3 temp_mean 1 lat 38.14
8 3 temp_mean 2 chl_mean 35.84
9 15 AGI_0m_seas 14 pred_var 31.75
10 15 AGI_0m_seas 10 mld_mean 29.54
11 6 uostr_mean 1 lat 27.49
12 17 AGI_250m_seas 15 AGI_0m_seas 24.40
13 9 ssh_mean 1 lat 21.37
14 13 dist_coast 4 sal_mean 20.69
[1] "External percent deviance explained"
[1] 0.4108189
[1] "TPR"
[1] 0.6997377
[1] "TSS"
[1] 0.6225376
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3625151 0.6956699 0.9008476 1.010332 0.4108189 0.422428
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862835
Residual.Deviance 0.7076969
Correlation 0.7638716
AUC 0.9383000
Per.Expl 48.9500550
cvDeviance 0.9482559
cvCorrelation 0.6060927
cvAUC 0.8473700
cvPer.Expl 31.5972584
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 22.322069
temp_mean 17.196598
bathy_mean 8.824421
sal_mean 8.746002
AGI_60m_ann 6.666562
chl_mean 6.027215
AGI_0m_ann 5.506478
ssh_mean 4.904209
mld_mean 3.679155
vostr_mean 3.311048
bathy_sd 2.955120
vo_mean 2.697215
uostr_mean 2.496864
uo_mean 2.416078
pred_var 2.250965
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 15 AGI_250m_ann 13 AGI_0m_ann 145.88
2 2 temp_mean 1 chl_mean 139.37
3 6 vo_mean 3 sal_mean 114.16
4 7 vostr_mean 2 temp_mean 101.76
5 15 AGI_250m_ann 2 temp_mean 81.32
6 12 pred_var 4 uo_mean 70.90
7 3 sal_mean 2 temp_mean 63.15
8 8 ssh_mean 2 temp_mean 63.05
9 14 AGI_60m_ann 8 ssh_mean 61.26
10 13 AGI_0m_ann 2 temp_mean 55.14
11 8 ssh_mean 1 chl_mean 53.38
[1] "External percent deviance explained"
[1] 0.4801424
[1] "TPR"
[1] 0.7163946
[1] "TSS"
[1] 0.7031821
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3344379 0.7553953 0.934186 1.006753 0.4801424 0.4895005
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862835
Residual.Deviance 0.6515083
Correlation 0.7920543
AUC 0.9517000
Per.Expl 53.0032392
cvDeviance 0.9139383
cvCorrelation 0.6265787
cvAUC 0.8589900
cvPer.Expl 34.0727658
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 22.423691
AGI_250m_ann 10.765829
temp_mean 10.510588
lat 9.523071
sal_mean 7.696498
chl_mean 5.973845
AGI_60m_ann 4.842763
AGI_0m_ann 4.753056
ssh_mean 3.983158
bathy_mean 3.959771
mld_mean 2.875991
vostr_mean 2.499355
vo_mean 2.351126
pred_var 2.277252
uo_mean 1.961842
bathy_sd 1.834693
uostr_mean 1.767471
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 lat 162.87
2 8 vostr_mean 3 temp_mean 139.21
3 3 temp_mean 2 chl_mean 125.54
4 16 AGI_60m_ann 1 lat 96.96
5 6 uostr_mean 1 lat 93.81
6 13 dist_coast 10 mld_mean 91.10
7 11 bathy_mean 3 temp_mean 90.92
8 15 AGI_0m_ann 3 temp_mean 86.23
9 17 AGI_250m_ann 13 dist_coast 85.41
10 13 dist_coast 4 sal_mean 77.67
11 8 vostr_mean 1 lat 76.13
12 16 AGI_60m_ann 13 dist_coast 74.50
13 15 AGI_0m_ann 4 sal_mean 67.33
14 16 AGI_60m_ann 9 ssh_mean 59.22
[1] "External percent deviance explained"
[1] 0.5203384
[1] "TPR"
[1] 0.723391
[1] "TSS"
[1] 0.7405006
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3183485 0.7838702 0.9482195 1.006969 0.5203384 0.5300324
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862835
Residual.Deviance 0.5664194
Correlation 0.8279527
AUC 0.9661000
Per.Expl 59.1411534
cvDeviance 0.8483558
cvCorrelation 0.6645373
cvAUC 0.8805500
cvPer.Expl 38.8035803
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 12.823573
temp_mean 12.702689
AGI_0m 11.412626
bathy_mean 7.586913
AGI_0m_seas 7.377428
sal_mean 5.367600
AGI_60m_ann 4.888697
AGI_250m_seas 4.318591
AGI_0m_ann 3.839247
AGI_250m 3.720149
AGI_60m_seas 3.717035
ssh_mean 3.636899
chl_mean 3.126271
vostr_mean 2.371861
mld_mean 2.186303
AGI_60m 2.058995
bathy_sd 2.002734
vo_mean 1.870397
uo_mean 1.766527
pred_var 1.653718
uostr_mean 1.571746
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 1419.86
2 20 AGI_60m_ann 16 AGI_0m_seas 311.03
3 12 AGI_0m 10 bathy_mean 135.86
4 7 vostr_mean 2 temp_mean 124.41
5 16 AGI_0m_seas 14 AGI_250m 100.59
6 19 AGI_0m_ann 16 AGI_0m_seas 94.31
7 12 AGI_0m 3 sal_mean 91.29
8 16 AGI_0m_seas 2 temp_mean 69.16
9 18 AGI_250m_seas 14 AGI_250m 67.35
10 20 AGI_60m_ann 8 ssh_mean 65.77
11 12 AGI_0m 8 ssh_mean 56.81
12 13 AGI_60m 10 bathy_mean 51.64
13 16 AGI_0m_seas 11 bathy_sd 47.71
14 16 AGI_0m_seas 7 vostr_mean 44.87
15 19 AGI_0m_ann 14 AGI_250m 42.03
16 20 AGI_60m_ann 10 bathy_mean 36.72
17 16 AGI_0m_seas 9 mld_mean 35.92
18 8 ssh_mean 3 sal_mean 35.35
19 17 AGI_60m_seas 12 AGI_0m 34.64
20 21 AGI_250m_ann 2 temp_mean 34.02
21 16 AGI_0m_seas 15 pred_var 33.74
22 15 pred_var 7 vostr_mean 32.25
[1] "External percent deviance explained"
[1] 0.581436
[1] "TPR"
[1] 0.730664
[1] "TSS"
[1] 0.7829999
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2937248 0.8196993 0.9627405 1.002194 0.581436 0.5914115
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862835
Residual.Deviance 0.5708909
Correlation 0.8249740
AUC 0.9649000
Per.Expl 58.8186041
cvDeviance 0.8334177
cvCorrelation 0.6724646
cvAUC 0.8849000
cvPer.Expl 39.8811495
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 19.788149
AGI_0m 10.839193
lat 6.905782
temp_mean 6.888633
AGI_0m_seas 6.749013
AGI_250m_ann 6.354972
sal_mean 5.264005
AGI_60m_ann 4.058445
AGI_0m_ann 3.397817
AGI_60m_seas 3.367171
AGI_250m 3.334527
bathy_mean 3.156230
chl_mean 2.908569
ssh_mean 2.889818
AGI_250m_seas 2.355303
AGI_60m 1.897931
mld_mean 1.739738
pred_var 1.565287
vo_mean 1.551979
vostr_mean 1.506959
uo_mean 1.287787
uostr_mean 1.126974
bathy_sd 1.065720
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 868.54
2 3 temp_mean 1 lat 252.16
3 22 AGI_60m_ann 18 AGI_0m_seas 241.04
4 8 vostr_mean 3 temp_mean 137.38
5 6 uostr_mean 1 lat 120.49
6 21 AGI_0m_ann 18 AGI_0m_seas 108.12
7 13 AGI_0m 11 bathy_mean 102.44
8 13 AGI_0m 9 ssh_mean 70.50
9 20 AGI_250m_seas 4 sal_mean 69.96
10 23 AGI_250m_ann 14 dist_coast 69.31
11 18 AGI_0m_seas 16 AGI_250m 66.03
12 4 sal_mean 1 lat 65.97
13 20 AGI_250m_seas 16 AGI_250m 57.06
14 13 AGI_0m 1 lat 56.19
15 22 AGI_60m_ann 11 bathy_mean 51.55
16 14 dist_coast 10 mld_mean 50.67
17 21 AGI_0m_ann 1 lat 47.56
18 17 pred_var 8 vostr_mean 44.15
19 22 AGI_60m_ann 1 lat 38.77
20 22 AGI_60m_ann 14 dist_coast 38.74
21 18 AGI_0m_seas 3 temp_mean 36.97
22 18 AGI_0m_seas 15 AGI_60m 35.31
23 18 AGI_0m_seas 17 pred_var 31.98
24 18 AGI_0m_seas 13 AGI_0m 31.73
25 16 AGI_250m 11 bathy_mean 30.47
26 12 bathy_sd 7 vo_mean 29.92
[1] "External percent deviance explained"
[1] 0.5777579
[1] "TPR"
[1] 0.729998
[1] "TSS"
[1] 0.772311
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.295842 0.816219 0.9614117 1.004362 0.5777579 0.588186
Summary table of results
output_sum_seas_ann <- read.csv(here("data/brt/mod_outputs/brt_crw_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum_seas_ann)
model                                         Per.Expl  DevianceExplained  TPR    TSS    AUC    RMSE   Cor    PseudoR2
brt_base_0m_seas_Nspat_Ntag                   36.455    0.349              0.683  0.556  0.867  0.385  0.640  0.365
brt_base_0m_ann_Nspat_Ntag                    47.548    0.433              0.704  0.651  0.909  0.354  0.711  0.475
brt_do_0m_60m_250m_seas_Nspat_Ntag            41.659    0.400              0.696  0.617  0.894  0.366  0.686  0.417
brt_do_0m_60m_250m_seas_Yspat_Ntag            43.093    0.412              0.699  0.624  0.899  0.363  0.695  0.431
brt_do_0m_60m_250m_ann_Nspat_Ntag             48.339    0.450              0.709  0.668  0.919  0.347  0.729  0.483
brt_do_0m_60m_250m_ann_Yspat_Ntag             52.852    0.485              0.715  0.698  0.932  0.334  0.754  0.529
brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag   56.869    0.531              0.721  0.733  0.944  0.316  0.783  0.569
brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag   59.087    0.547              0.724  0.747  0.949  0.309  0.793  0.591
brt_agi_0m_60m_250m_seas_Nspat_Ntag           40.478    0.381              0.692  0.595  0.886  0.373  0.672  0.405
brt_agi_0m_60m_250m_seas_Yspat_Ntag           42.243    0.397              0.696  0.612  0.893  0.367  0.684  0.422
brt_agi_0m_60m_250m_ann_Nspat_Ntag            48.950    0.442              0.706  0.659  0.914  0.350  0.722  0.490
brt_agi_0m_60m_250m_ann_Yspat_Ntag            53.003    0.479              0.713  0.694  0.928  0.336  0.749  0.530
brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag  59.141    0.542              0.723  0.743  0.947  0.311  0.790  0.591
brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag  58.819    0.543              0.723  0.743  0.947  0.311  0.791  0.588
base_0m_daily_Nspat_Ntag                      42.389    0.385              0.695  0.613  0.892  0.371  0.679  0.424
do_0m_daily_Nspat_Ntag                        49.447    0.450              0.708  0.671  0.917  0.347  0.727  0.494
agi_0m_daily_Nspat_Ntag                       48.505    0.437              0.705  0.652  0.911  0.352  0.716  0.485
# Keep only the models without spatial predictors (drop names containing "Yspat")
output_sum_seas_ann_Nspat <- output_sum_seas_ann %>%
  filter(!grepl("Yspat", model))

# Plot AUC against TSS, colored by deviance explained, with non-overlapping model labels
ggplot(output_sum_seas_ann_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial seasonal/annual models
Seasonal and annual base models were comparable in performance to the daily resolution base model, with seasonal performing slightly worse, and annual performing slightly better.
Among the models with no spatial predictor variables, the AGI model with all depth layers and temporal resolutions performed best (TSS 0.743, AUC 0.947), but the comparable DO model performed similarly (TSS 0.733, AUC 0.944).
Annual models generally performed better than seasonal ones, but the models that combined data at daily, seasonal, and annual resolutions performed considerably better than either.
For the DO model with all depths and temporal resolutions, the two predictors with the highest relative influence (with quite comparable values) were DO_250m_annual and DO_0m_daily. The seasonal DO values were also highly ranked. Partial plots show either a negative relationship or an optimal ("sweet spot") range of DO values at each of the depth layers and resolutions.
For the AGI model with all depths and temporal resolutions, the top predictor variable is AGI_250m_annual, closely followed by daily temperature at 0 m. Lower down the list are AGI_0m_daily, bathymetry, and AGI_0m_seasonal. Partial plots show trends similar to those described above for the DO model.
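The comparison across temporal resolutions can be laid out compactly. The sketch below is only an illustration: it copies the TSS values for the DO and AGI models without spatial predictors from the summary table above, cross-tabulates them by predictor and resolution, and computes the gain of the combined (daily, seasonal, and annual) models over the annual-only ones. The object names and the "all" label are assumptions made for this example.

```r
# TSS values copied from the summary table (Nspat, Ntag models only)
tss <- data.frame(
  predictor  = rep(c("DO", "AGI"), each = 3),
  resolution = rep(c("seasonal", "annual", "all"), times = 2),
  TSS        = c(0.617, 0.668, 0.733, 0.595, 0.659, 0.743)
)

# Order the resolutions from coarsest single resolution to the combined model
tss$resolution <- factor(tss$resolution, levels = c("seasonal", "annual", "all"))

# One row per predictor type, one column per resolution
tss_wide <- xtabs(TSS ~ predictor + resolution, data = tss)

# Improvement of the combined-resolution model over the annual-only model
gain_over_annual <- tss_wide[, "all"] - tss_wide[, "annual"]

print(tss_wide)
print(round(gain_over_annual, 3))
```

For both predictor types the ordering is seasonal < annual < combined, which is the pattern summarized in the conclusions above.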
Model fine-tuning and selection
Here, I take the two best-performing models from the sections above (AGI and DO with all depths and temporal resolutions, without tag ID or spatial variables as predictors) and use them as overfit reference models. The following model options exclude the wind predictors, as these consistently had lower relative importance than the random predictor variable we included. I also include a combo model that uses AGI at 250 m and DO at 0 m across temporal resolutions. Lastly, the final models also remove DO/AGI at 60 m and at a seasonal resolution, as these were typically the variables with the lowest predictive performance relative to the other depth layers and resolutions.
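The screening rule described here (drop predictors that rank below the random `pred_var` variable) can be sketched for a single model as follows. This is a minimal illustration, not the code used to build the refined models: the relative influence values are copied from the annual DO model without spatial predictors reported earlier, and `rel_inf` and `below_random` are hypothetical names.

```r
# Relative influence values copied from the annual DO model (Nspat, Ntag) output
rel_inf <- c(
  o2_mean_250m_ann = 22.815335, temp_mean  = 12.948825,
  o2_mean_0m_ann   = 10.632947, o2_mean_60m_ann = 8.366439,
  bathy_mean       = 8.305431,  sal_mean   = 7.217211,
  chl_mean         = 6.745035,  ssh_mean   = 4.679201,
  bathy_sd         = 3.764948,  mld_mean   = 3.059999,
  vostr_mean       = 2.774661,  uo_mean    = 2.445189,
  vo_mean          = 2.303757,  pred_var   = 2.190658,
  uostr_mean       = 1.750363
)

# Predictors that are less influential than the random noise variable
below_random <- names(rel_inf)[rel_inf < rel_inf[["pred_var"]]]
print(below_random)
```

Applied to one model, the rule flags only the predictors below `pred_var` in that particular fit; the decision in the text rests on the pattern across all of the models, so in practice the check would be repeated for each fitted model.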
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862823
Residual.Deviance 0.8207228
Correlation 0.7026177
AUC 0.9063000
Per.Expl 40.7968459
cvDeviance 1.0092555
cvCorrelation 0.5671984
cvAUC 0.8240500
cvPer.Expl 27.1969674
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.402818
temp_mean 21.775619
sal_mean 12.415293
chl_mean 9.738094
bathy_sd 7.545582
ssh_mean 7.489137
mld_mean 6.792296
pred_var 3.841160
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 6 bathy_mean 2 temp_mean 704.51
2 4 ssh_mean 2 temp_mean 449.02
3 3 sal_mean 2 temp_mean 290.44
[1] "External percent deviance explained"
[1] 0.3727562
[1] "TPR"
[1] 0.691841
[1] "TSS"
[1] 0.5944967
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4200 iterations were performed.
There were 8 predictors of which 8 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3753995 0.6673045 0.8850325 1.001694 0.3727562 0.4079685
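As a reading aid for the metrics above, the percent deviance explained values (`Per.Expl`, `cvPer.Expl`) can be reproduced from the reported deviances. The sketch below assumes the usual definition, 100 × (total − residual) / total; `pct_deviance_explained()` is a hypothetical helper name, and the numbers are copied from the base model output.

```r
# Percent deviance explained = 100 * (total - residual) / total
pct_deviance_explained <- function(total_dev, resid_dev) {
  100 * (total_dev - resid_dev) / total_dev
}

# Values copied from the base model output above
total_dev <- 1.3862823

train_pct <- pct_deviance_explained(total_dev, 0.8207228)  # from Residual.Deviance
cv_pct    <- pct_deviance_explained(total_dev, 1.0092555)  # from cvDeviance

round(c(Per.Expl = train_pct, cvPer.Expl = cv_pct), 3)
```

The gap between the two values (roughly 41% on the training data versus 27% under cross-validation) is one simple indicator of how much a model's fit depends on the data it was trained on.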
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862871
Residual.Deviance 0.5891149
Correlation 0.8163435
AUC 0.9606000
Per.Expl 57.5041190
cvDeviance 0.8440735
cvCorrelation 0.6692730
cvAUC 0.8827300
cvPer.Expl 39.1126493
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 17.287654
o2_mean_250m_ann 14.631246
o2_mean_0m_seas 8.910687
o2_mean_250m_seas 7.142539
temp_mean 6.627435
o2_mean_60m_ann 5.495030
o2_mean_60m_seas 5.325783
bathy_mean 4.964121
sal_mean 4.592187
o2_mean_250m 4.465235
chl_mean 4.270698
o2_mean_0m_ann 3.305816
o2_mean_60m 3.042965
ssh_mean 2.916029
mld_mean 2.724895
bathy_sd 2.244270
pred_var 2.053409
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 508.25
2 12 o2_mean_0m_seas 1 o2_mean_0m 301.24
3 14 o2_mean_250m_seas 10 o2_mean_250m 166.76
4 12 o2_mean_0m_seas 5 ssh_mean 150.68
5 12 o2_mean_0m_seas 2 chl_mean 122.58
6 4 sal_mean 3 temp_mean 101.22
7 15 o2_mean_0m_ann 4 sal_mean 89.73
8 16 o2_mean_60m_ann 7 bathy_mean 86.63
9 9 o2_mean_60m 7 bathy_mean 81.59
10 9 o2_mean_60m 8 bathy_sd 77.18
11 11 pred_var 7 bathy_mean 66.90
12 7 bathy_mean 3 temp_mean 65.75
13 12 o2_mean_0m_seas 4 sal_mean 63.82
14 13 o2_mean_60m_seas 4 sal_mean 63.74
[1] "External percent deviance explained"
[1] 0.5285492
[1] "TPR"
[1] 0.7203563
[1] "TSS"
[1] 0.7319403
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3166705 0.7799527 0.9421311 1.001803 0.5285492 0.5750412
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862598
Residual.Deviance 0.1784155
Correlation 0.9620158
AUC 0.9980000
Per.Expl 87.1297181
cvDeviance 0.4443144
cvCorrelation 0.8616943
cvAUC 0.9680100
cvPer.Expl 67.9486949
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.410142
temp_mean 22.090687
AGI_250m_seas 8.885923
AGI_0m 7.036726
AGI_0m_seas 4.031791
sal_mean 3.863634
AGI_250m_ann 3.717057
AGI_250m 3.185755
AGI_60m_ann 3.018782
ssh_mean 2.964206
chl_mean 2.686814
AGI_60m_seas 2.575530
AGI_0m_ann 1.639762
AGI_60m 1.581991
bathy_sd 1.553865
mld_mean 1.073780
pred_var 0.683555
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 3465.55
2 15 AGI_0m_ann 12 AGI_0m_seas 386.42
3 6 bathy_mean 3 sal_mean 359.44
4 16 AGI_60m_ann 6 bathy_mean 297.40
5 16 AGI_60m_ann 12 AGI_0m_seas 290.80
6 17 AGI_250m_ann 3 sal_mean 258.66
7 14 AGI_250m_seas 6 bathy_mean 192.39
8 14 AGI_250m_seas 2 temp_mean 163.51
9 8 AGI_0m 4 ssh_mean 162.57
10 6 bathy_mean 2 temp_mean 154.13
11 14 AGI_250m_seas 10 AGI_250m 153.38
12 3 sal_mean 2 temp_mean 150.76
13 12 AGI_0m_seas 2 temp_mean 123.54
14 8 AGI_0m 3 sal_mean 103.44
[1] "External percent deviance explained"
[1] -2.981096
[1] "TPR"
[1] 0.3431204
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.8144573 -0.5895665 0.1853505 0.6929731 -2.981096 0.8712972
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_250_DO_0_dail_seas_ann.rds",
            test_data = readRDS(here("data/brt/mod_eval/agi_do_test_daily_seasonal_annual.rds")))
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862903
Residual.Deviance 0.7427241
Correlation 0.7474077
AUC 0.9299000
Per.Expl 46.4236261
cvDeviance 0.9683087
cvCorrelation 0.5940334
cvAUC 0.8398100
cvPer.Expl 30.1510880
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 19.506760
temp_mean 18.786242
bathy_mean 11.359335
sal_mean 9.001494
AGI_250m_seas 7.495655
chl_mean 6.480442
ssh_mean 5.010443
AGI_250m 4.757546
bathy_sd 4.426992
mld_mean 4.362426
pred_var 2.438232
o2_mean_0m_ann 2.265174
o2_mean_0m 2.169848
o2_mean_0m_seas 1.939412
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 AGI_250m_seas 3 sal_mean 214.08
2 10 AGI_250m_seas 8 AGI_250m 181.77
3 6 bathy_mean 2 temp_mean 176.86
4 3 sal_mean 2 temp_mean 124.68
5 11 AGI_250m_ann 10 AGI_250m_seas 107.03
6 4 ssh_mean 2 temp_mean 106.46
7 2 temp_mean 1 chl_mean 72.76
8 6 bathy_mean 4 ssh_mean 67.22
9 11 AGI_250m_ann 2 temp_mean 60.92
10 4 ssh_mean 3 sal_mean 57.36
[1] "External percent deviance explained"
[1] 0.3919503
[1] "TPR"
[1] 0.6946288
[1] "TSS"
[1] 0.6173527
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3692691 0.6793932 0.8906042 1.010997 0.3919503 0.4642363
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862871
Residual.Deviance 0.6163731
Correlation 0.8037426
AUC 0.9552000
Per.Expl 55.5378474
cvDeviance 0.8577154
cvCorrelation 0.6617449
cvAUC 0.8786300
cvPer.Expl 38.1285877
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 17.645183
o2_mean_250m_ann 17.206276
o2_mean_0m_seas 10.345249
temp_mean 7.771990
o2_mean_250m_seas 7.199949
bathy_mean 7.171107
sal_mean 6.275883
chl_mean 5.065733
o2_mean_0m_ann 4.840965
o2_mean_250m 4.413205
ssh_mean 4.028459
mld_mean 3.069241
bathy_sd 2.700490
pred_var 2.266270
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 510.24
2 13 o2_mean_0m_ann 3 temp_mean 283.65
3 12 o2_mean_250m_seas 4 sal_mean 254.65
4 11 o2_mean_0m_seas 1 o2_mean_0m 212.12
5 4 sal_mean 3 temp_mean 189.20
6 11 o2_mean_0m_seas 5 ssh_mean 181.69
7 7 bathy_mean 3 temp_mean 158.82
8 13 o2_mean_0m_ann 4 sal_mean 151.35
9 12 o2_mean_250m_seas 9 o2_mean_250m 148.24
10 5 ssh_mean 4 sal_mean 132.70
[1] "External percent deviance explained"
[1] 0.5114889
[1] "TPR"
[1] 0.7177333
[1] "TSS"
[1] 0.7216508
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3235065 0.7685859 0.9368822 1.002484 0.5114889 0.5553785
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862871
Residual.Deviance 0.6155217
Correlation 0.8049238
AUC 0.9559000
Per.Expl 55.5992622
cvDeviance 0.8649027
cvCorrelation 0.6571466
cvAUC 0.8761100
cvPer.Expl 37.6101303
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 22.911813
o2_mean_250m_ann 20.906459
temp_mean 7.579760
o2_mean_60m_ann 6.536482
bathy_mean 6.283049
sal_mean 5.382627
o2_mean_60m 5.220181
chl_mean 4.808321
o2_mean_250m 4.411980
o2_mean_0m_ann 4.227451
ssh_mean 3.567634
mld_mean 3.088428
bathy_sd 2.746170
pred_var 2.329644
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 435.39
2 12 o2_mean_0m_ann 4 sal_mean 215.51
3 12 o2_mean_0m_ann 3 temp_mean 157.08
4 5 ssh_mean 4 sal_mean 130.86
5 13 o2_mean_60m_ann 9 o2_mean_60m 108.59
6 9 o2_mean_60m 8 bathy_sd 107.81
7 7 bathy_mean 1 o2_mean_0m 107.33
8 4 sal_mean 3 temp_mean 102.56
9 14 o2_mean_250m_ann 5 ssh_mean 95.09
10 14 o2_mean_250m_ann 1 o2_mean_0m 86.49
[1] "External percent deviance explained"
[1] 0.5100508
[1] "TPR"
[1] 0.7177138
[1] "TSS"
[1] 0.7178233
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.32378 0.768407 0.9368439 1.002037 0.5100508 0.5559926
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862871
Residual.Deviance 0.6251108
Correlation 0.8003773
AUC 0.9539000
Per.Expl 54.9075532
cvDeviance 0.8721020
cvCorrelation 0.6521736
cvAUC 0.8737200
cvPer.Expl 37.0908065
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m_seas 19.995636
o2_mean_250m_ann 18.349971
temp_mean 8.136337
o2_mean_60m_ann 6.495673
o2_mean_250m_seas 5.955392
bathy_mean 5.754092
o2_mean_60m_seas 5.701270
sal_mean 5.400275
chl_mean 5.206935
o2_mean_0m_ann 4.813953
ssh_mean 4.746915
bathy_sd 3.881525
mld_mean 3.166622
pred_var 2.395403
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 9 o2_mean_0m_seas 4 ssh_mean 284.85
2 12 o2_mean_0m_ann 9 o2_mean_0m_seas 187.29
3 13 o2_mean_60m_ann 2 temp_mean 130.99
4 10 o2_mean_60m_seas 3 sal_mean 130.12
5 13 o2_mean_60m_ann 6 bathy_mean 127.90
6 4 ssh_mean 3 sal_mean 126.43
7 6 bathy_mean 2 temp_mean 122.51
8 12 o2_mean_0m_ann 2 temp_mean 113.98
9 12 o2_mean_0m_ann 3 sal_mean 113.39
10 9 o2_mean_0m_seas 3 sal_mean 102.47
[1] "External percent deviance explained"
[1] 0.5045185
[1] "TPR"
[1] 0.7169489
[1] "TSS"
[1] 0.7132807
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5750 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.32614 0.7645057 0.9353368 1.000552 0.5045185 0.5490755
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_daily_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862871
Residual.Deviance 0.6484473
Correlation 0.7885861
AUC 0.9484000
Per.Expl 53.2241715
cvDeviance 0.8798869
cvCorrelation 0.6493045
cvAUC 0.8712000
cvPer.Expl 36.5292446
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 23.928416
o2_mean_250m_ann 22.476323
temp_mean 8.228036
bathy_mean 7.977447
sal_mean 7.293728
o2_mean_250m 6.043256
o2_mean_0m_ann 5.555636
chl_mean 5.205159
ssh_mean 4.371214
bathy_sd 3.247484
mld_mean 3.079187
pred_var 2.594113
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 399.72
2 5 ssh_mean 4 sal_mean 271.74
3 11 o2_mean_0m_ann 3 temp_mean 259.43
4 11 o2_mean_0m_ann 4 sal_mean 153.76
5 5 ssh_mean 1 o2_mean_0m 110.68
6 5 ssh_mean 3 temp_mean 109.51
7 7 bathy_mean 3 temp_mean 106.18
[1] "External percent deviance explained"
[1] 0.4903438
[1] "TPR"
[1] 0.7145817
[1] "TSS"
[1] 0.7052563
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3316533 0.7548044 0.9305779 1.003816 0.4903438 0.5322417
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_daily_ann_refined.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862871
Residual.Deviance 0.6761948
Correlation 0.7734595
AUC 0.9410000
Per.Expl 51.2225994
cvDeviance 0.8926418
cvCorrelation 0.6411554
cvAUC 0.8660500
cvPer.Expl 35.6091636
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_250m_ann 26.649713
o2_mean_0m 25.554358
temp_mean 9.448079
bathy_mean 9.055228
sal_mean 7.947775
chl_mean 6.066081
ssh_mean 4.828383
bathy_sd 3.788760
mld_mean 3.688052
pred_var 2.973574
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 659.50
2 10 o2_mean_250m_ann 5 ssh_mean 184.31
3 5 ssh_mean 1 o2_mean_0m 174.16
4 10 o2_mean_250m_ann 1 o2_mean_0m 162.94
5 4 sal_mean 3 temp_mean 153.47
[1] "External percent deviance explained"
[1] 0.4725454
[1] "TPR"
[1] 0.7110877
[1] "TSS"
[1] 0.6838483
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3391331 0.7405838 0.9236049 1.005523 0.4725454 0.512226
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862903
Residual.Deviance 0.6277548
Correlation 0.7973163
AUC 0.9522000
Per.Expl 54.7169289
cvDeviance 0.8617301
cvCorrelation 0.6582930
cvAUC 0.8768100
cvPer.Expl 37.8391330
[1] "Relative influence of predictor variables"
rel.inf
temp_mean 15.504668
AGI_250m_ann 15.163685
AGI_0m 11.740021
bathy_mean 11.710734
AGI_0m_seas 8.197830
sal_mean 6.807123
AGI_250m_seas 6.751818
AGI_0m_ann 5.161877
chl_mean 5.090897
AGI_250m 4.624616
bathy_sd 3.784137
mld_mean 3.106183
pred_var 2.356411
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 7 AGI_0m 2 temp_mean 2843.30
2 7 AGI_0m 5 bathy_mean 355.65
3 12 AGI_0m_ann 10 AGI_0m_seas 197.02
4 10 AGI_0m_seas 8 AGI_250m 181.68
5 13 AGI_250m_ann 12 AGI_0m_ann 165.05
6 11 AGI_250m_seas 8 AGI_250m 149.59
7 10 AGI_0m_seas 2 temp_mean 136.02
8 13 AGI_250m_ann 11 AGI_250m_seas 122.24
[1] "External percent deviance explained"
[1] 0.4999456
[1] "TPR"
[1] 0.7155658
[1] "TSS"
[1] 0.7086853
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5450 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.328907 0.7586342 0.9325364 1.007354 0.4999456 0.5471693
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862903
Residual.Deviance 0.6367013
Correlation 0.7941612
AUC 0.9510000
Per.Expl 54.0715738
cvDeviance 0.8740148
cvCorrelation 0.6515205
cvAUC 0.8726800
cvPer.Expl 36.9529728
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 17.274493
temp_mean 16.133953
AGI_0m 14.099815
bathy_mean 9.641570
AGI_60m_ann 7.408950
sal_mean 7.026667
AGI_0m_ann 4.754954
chl_mean 4.594943
AGI_250m 4.537709
mld_mean 3.251739
AGI_60m 3.225000
bathy_sd 3.186472
uo_mean 2.713621
pred_var 2.150112
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 2976.46
2 8 AGI_0m 6 bathy_mean 252.24
3 14 AGI_250m_ann 12 AGI_0m_ann 165.77
4 8 AGI_0m 4 uo_mean 162.60
5 12 AGI_0m_ann 5 mld_mean 93.44
6 8 AGI_0m 3 sal_mean 86.30
7 14 AGI_250m_ann 2 temp_mean 83.71
8 12 AGI_0m_ann 7 bathy_sd 80.94
9 12 AGI_0m_ann 2 temp_mean 67.97
10 12 AGI_0m_ann 11 pred_var 65.20
[1] "External percent deviance explained"
[1] 0.4910487
[1] "TPR"
[1] 0.7138839
[1] "TSS"
[1] 0.7005274
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.3321372 0.7529645 0.9291586 1.008817 0.4910487 0.5407157
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862903
Residual.Deviance 0.6158797
Correlation 0.8025426
AUC 0.9544000
Per.Expl 55.5735382
cvDeviance 0.8570209
cvCorrelation 0.6610243
cvAUC 0.8783300
cvPer.Expl 38.1788311
[1] "Relative influence of predictor variables"
rel.inf
temp_mean 15.578233
AGI_250m_ann 15.210565
AGI_0m 11.973672
bathy_mean 9.776419
AGI_0m_seas 7.890212
AGI_250m_seas 6.641468
sal_mean 6.271472
AGI_60m_ann 6.177322
AGI_60m_seas 5.458597
AGI_0m_ann 4.936646
mld_mean 3.089618
uo_mean 2.844618
pred_var 2.119871
bathy_sd 2.031289
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 7 AGI_0m 1 temp_mean 2341.72
2 13 AGI_60m_ann 9 AGI_0m_seas 353.40
3 7 AGI_0m 5 bathy_mean 239.91
4 12 AGI_0m_ann 9 AGI_0m_seas 116.40
5 12 AGI_0m_ann 6 bathy_sd 106.43
6 13 AGI_60m_ann 1 temp_mean 101.25
7 7 AGI_0m 3 uo_mean 95.95
8 14 AGI_250m_ann 12 AGI_0m_ann 95.63
9 9 AGI_0m_seas 1 temp_mean 82.38
10 8 pred_var 3 uo_mean 64.47
[1] "External percent deviance explained"
[1] 0.5069531
[1] "TPR"
[1] 0.7162487
[1] "TSS"
[1] 0.7110636
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.326375 0.7627901 0.9339043 1.007275 0.5069531 0.5557354
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_daily_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862903
Residual.Deviance 0.6487719
Correlation 0.7878082
AUC 0.9478000
Per.Expl 53.2008622
cvDeviance 0.8800365
cvCorrelation 0.6486868
cvAUC 0.8714400
cvPer.Expl 36.5185981
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 19.948966
temp_mean 16.717759
AGI_0m 15.017058
bathy_mean 11.083678
sal_mean 7.668646
ssh_mean 5.326252
AGI_0m_ann 5.279991
chl_mean 4.620618
AGI_250m 4.413306
bathy_sd 4.127255
mld_mean 3.363532
pred_var 2.432939
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 2154.06
2 12 AGI_250m_ann 11 AGI_0m_ann 350.53
3 8 AGI_0m 4 ssh_mean 292.80
4 11 AGI_0m_ann 5 mld_mean 171.71
5 12 AGI_250m_ann 2 temp_mean 169.96
6 8 AGI_0m 6 bathy_mean 165.47
7 8 AGI_0m 3 sal_mean 105.61
[1] "External percent deviance explained"
[1] 0.4839341
[1] "TPR"
[1] 0.7127614
[1] "TSS"
[1] 0.6940011
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.334548 0.7485892 0.9269441 1.007254 0.4839341 0.5320086
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_daily_ann_refined.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862903
Residual.Deviance 0.6893252
Correlation 0.7662559
AUC 0.9372000
Per.Expl 50.2755506
cvDeviance 0.8949519
cvCorrelation 0.6400720
cvAUC 0.8654600
cvPer.Expl 35.4426768
[1] "Relative influence of predictor variables"
rel.inf
AGI_250m_ann 22.930180
temp_mean 17.534897
AGI_0m 16.389380
bathy_mean 12.369388
sal_mean 8.490533
ssh_mean 5.773889
chl_mean 5.512047
bathy_sd 4.792948
mld_mean 3.560432
pred_var 2.646305
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 2783.04
2 8 AGI_0m 6 bathy_mean 264.58
3 8 AGI_0m 4 ssh_mean 255.08
4 10 AGI_250m_ann 2 temp_mean 236.26
5 6 bathy_mean 2 temp_mean 152.65
[1] "External percent deviance explained"
[1] 0.4592981
[1] "TPR"
[1] 0.7080003
[1] "TSS"
[1] 0.6700011
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.344129 0.7301866 0.9173835 1.0093 0.4592981 0.5027555
Summary table of results
output_sum_refined <- read.csv(here("data/brt/mod_outputs/brt_crw_refined_output_summary.csv"))
kableExtra::kable(output_sum_refined)
| model | Per.Expl | External deviance explained | TPR | TSS | AUC (C-index) | RMSE | Cor | PseudoR2 |
|---|---|---|---|---|---|---|---|---|
| brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag | 56.869 | 0.531 | 0.721 | 0.733 | 0.944 | 0.316 | 0.783 | 0.569 |
| brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag | 59.141 | 0.542 | 0.723 | 0.743 | 0.947 | 0.311 | 0.790 | 0.591 |
| base_0m_daily_Nspat_Ntag | 42.389 | 0.385 | 0.695 | 0.613 | 0.892 | 0.371 | 0.679 | 0.424 |
| do_0m_daily_Nspat_Ntag | 49.447 | 0.450 | 0.708 | 0.671 | 0.917 | 0.347 | 0.727 | 0.494 |
| agi_0m_daily_Nspat_Ntag | 48.505 | 0.437 | 0.705 | 0.652 | 0.911 | 0.352 | 0.716 | 0.485 |
| brt_base_0m_dail_no_wind | 40.797 | 0.373 | 0.692 | 0.594 | 0.885 | 0.375 | 0.667 | 0.408 |
| brt_do_0m_60m_250m_dail_seas_ann_no_wind | 57.504 | 0.529 | 0.720 | 0.732 | 0.942 | 0.317 | 0.780 | 0.575 |
| brt_agi_0m_60m_250m_dail_seas_ann_no_wind | 57.975 | 0.524 | 0.719 | 0.725 | 0.940 | 0.319 | 0.775 | 0.580 |
| brt_agi_250_do_0_dail_seas_ann | 46.424 | 0.392 | 0.695 | 0.617 | 0.891 | 0.369 | 0.679 | 0.464 |
| brt_do_0m_250m_dail_seas_ann | 55.538 | 0.511 | 0.718 | 0.722 | 0.937 | 0.324 | 0.769 | 0.555 |
| brt_do_0m_60m_250m_dail_ann | 55.599 | 0.510 | 0.718 | 0.719 | 0.937 | 0.324 | 0.768 | 0.556 |
| brt_do_0m_60m_250m_seas_ann | 54.908 | 0.505 | 0.717 | 0.713 | 0.935 | 0.326 | 0.765 | 0.549 |
| brt_do_0m_250m_dail_ann | 53.224 | 0.490 | 0.715 | 0.705 | 0.931 | 0.332 | 0.755 | 0.532 |
| brt_do_0m_250m_dail_ann_refined | 51.223 | 0.473 | 0.711 | 0.683 | 0.924 | 0.339 | 0.741 | 0.512 |
| brt_agi_0m_250m_dail_seas_ann | 55.086 | 0.501 | 0.716 | 0.701 | 0.933 | 0.328 | 0.759 | 0.551 |
| brt_agi_0m_60m_250m_dail_ann | 54.072 | 0.491 | 0.714 | 0.701 | 0.929 | 0.332 | 0.753 | 0.541 |
| brt_agi_0m_60m_250m_seas_ann | 55.574 | 0.507 | 0.716 | 0.711 | 0.934 | 0.326 | 0.763 | 0.556 |
| brt_agi_0m_250m_dail_ann | 53.201 | 0.484 | 0.713 | 0.694 | 0.927 | 0.335 | 0.749 | 0.532 |
| brt_agi_0m_250m_dail_ann_refined | 50.276 | 0.459 | 0.708 | 0.670 | 0.917 | 0.344 | 0.730 | 0.503 |
ggplot(output_sum_refined, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
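For readers less familiar with the TSS axis in this plot, here is a minimal sketch of the standard definition of the true skill statistic (sensitivity + specificity − 1) at a single probability threshold. It is an assumption that this matches the report's workflow: how `explore_brt` chooses its threshold and derives the reported TPR and TSS values is not shown in this document. `tss_at_threshold()` is a hypothetical helper and the toy data are made up for illustration.

```r
# True skill statistic at one threshold: sensitivity + specificity - 1
tss_at_threshold <- function(obs, pred, threshold = 0.5) {
  pred_class <- as.integer(pred >= threshold)
  sensitivity <- sum(pred_class == 1 & obs == 1) / sum(obs == 1)  # true positive rate
  specificity <- sum(pred_class == 0 & obs == 0) / sum(obs == 0)  # true negative rate
  c(TPR = sensitivity, TNR = specificity, TSS = sensitivity + specificity - 1)
}

# Toy presences (1) and pseudo-absences (0) with predicted suitability;
# these values are invented for illustration, not taken from the study
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.7, 0.1)

res <- tss_at_threshold(obs, pred)
res
```

TSS ranges from −1 to 1, with values near 0 indicating no better than random classification at the chosen threshold.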
Conclusions from adjusted models
Annual DO or AGI at 250 m and daily DO or AGI at 0 m were consistently the oxygen-related predictors with the highest relative influence.
The reference models (which are likely overfit) still performed the best, with the AGI model having the highest scores across performance metrics.
Removing the wind predictors barely changed the performance of the reference models, so we can move forward without them.
All modified models that dropped a temporal resolution or depth layer stayed within about 0.05 TSS and AUC of the reference models.
The combined AGI and DO model performed poorly, so it will be best to keep the two predictors in separate models.
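The TSS and AUC comparison in these conclusions can be checked directly against the summary table. The sketch below does this for the DO models only, using the values copied from the table and treating the no-wind model with all depths and temporal resolutions as the reference (an assumption about which model is the intended baseline). The AGI rows can be added in the same way.

```r
# Test-set TSS and AUC copied from the summary table
# (first row = assumed reference: all depths and resolutions, no wind predictors)
scores <- data.frame(
  model = c("brt_do_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_do_0m_250m_dail_seas_ann",
            "brt_do_0m_60m_250m_dail_ann",
            "brt_do_0m_60m_250m_seas_ann",
            "brt_do_0m_250m_dail_ann",
            "brt_do_0m_250m_dail_ann_refined"),
  TSS = c(0.732, 0.722, 0.719, 0.713, 0.705, 0.683),
  AUC = c(0.942, 0.937, 0.937, 0.935, 0.931, 0.924)
)

# Drop in each metric relative to the reference model
reference <- scores[1, ]
scores$TSS_drop <- reference$TSS - scores$TSS
scores$AUC_drop <- reference$AUC - scores$AUC

# Smallest to largest loss in TSS
scores[order(scores$TSS_drop), c("model", "TSS_drop", "AUC_drop")]
```

Sorting by the drop makes it easy to see which simplification costs the least predictive skill for the fewest predictors.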