Mako hSDM BRT explore (CRW PAs)

Author

Emily Nazario

Published

August 8, 2024

In this document, I’ve included results from the initial model exploration: model outputs, rankings of covariate influence, performance metrics, and prediction maps.

The majority of the predictors included in the following models are at a daily temporal resolution. However, for the dissolved oxygen (DO) and aerobic growth index (AGI) models, we also investigated including these two predictors at seasonal and annual temporal resolutions. The remaining environmental predictors are also available at these resolutions and can be included in follow-up models.

The pseudo-absences used in these models were generated using a correlated random walk (CRW) approach; a separate Quarto document covers models with background-sampled pseudo-absences. Hyperparameters were tuned using the caret package, and across all models a learning rate of 0.05 and a tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a set of random numbers that will be used to identify which predictor variables should be included in the final model and which are not informative.
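
The random-predictor screen described above could be applied along the following lines. This is a minimal sketch, not the workflow actually used here: `flag_uninformative()` is a hypothetical helper, and the rule it applies (flag any predictor whose relative influence does not exceed that of ‘pred_var’) is an assumption about how the screen will be used. The relative influence values are copied from the third base model output reported below.

```r
# Relative influence values copied from the third base model output in this document
rel_inf <- data.frame(
  var = c("tag", "dist_coast", "lat", "bathy_mean", "temp_mean", "sal_mean",
          "chl_mean", "ssh_mean", "vostr_mean", "mld_mean", "pred_var",
          "bathy_sd", "vo_mean", "uo_mean", "uostr_mean"),
  rel.inf = c(47.5168117, 18.2926688, 7.4869016, 5.5577763, 4.8362634,
              4.3607456, 3.0609643, 1.8297866, 1.6011683, 1.4125607,
              0.9512854, 0.9198428, 0.7636076, 0.7143868, 0.6952301),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the original analysis): return the predictors
# whose relative influence is at or below that of the random predictor.
flag_uninformative <- function(rel_inf, random_var = "pred_var") {
  threshold <- rel_inf$rel.inf[rel_inf$var == random_var]
  rel_inf$var[rel_inf$rel.inf <= threshold & rel_inf$var != random_var]
}

candidates_to_drop <- flag_uninformative(rel_inf)
print(candidates_to_drop)
```

Under this assumed rule, the flagged predictors are candidates for removal rather than automatic exclusions, since relative influence shifts from model to model.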

The hypotheses I would like to test with these models are as follows:

H1: The AGI model will perform better than the dissolved oxygen and null models, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; the ratio of environmental oxygen supply to theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Base models

These three models represent three options for the base model: one with neither spatial predictors nor a tag ID predictor, one with a tag ID predictor, and one with both a tag ID predictor and spatial predictors (lat and dist_coast). The models were developed by splitting the data set 75/25 into training and test sets, so that split is the model evaluation approach used here. Once a model is selected, I can run additional evaluation (e.g., leave-one-out or k-fold cross-validation). I can also complete these now, depending on when that is typically performed.
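
As a rough illustration of the 75/25 split and of how folds for a later k-fold evaluation could be assigned, here is a minimal base R sketch. The data frame `dat`, its columns, the seed, and the choice of five folds are all placeholders rather than values from this analysis, and the source does not state whether the actual split was made by row or by tag.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical stand-in for the presence/pseudo-absence data set
dat <- data.frame(
  PA = rep(c(1, 0), each = 50),
  temp_mean = rnorm(100, mean = 18, sd = 2)
)

# 75/25 train/test split by row
train_idx <- sample(seq_len(nrow(dat)), size = floor(0.75 * nrow(dat)))
train <- dat[train_idx, ]
test <- dat[-train_idx, ]

# Fold labels for a later k-fold cross-validation (k = 5 is an assumption)
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))
table(fold_id)
```

Because the tag ID is such a strong predictor in the models below, assigning whole tags to folds instead of individual rows may be worth considering when the k-fold evaluation is set up.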

explore_brt(mod_file_path = brt_outputs[7], 
            test_data = base_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.7986447
Correlation        0.7174286
AUC                0.9148000
Per.Expl          42.3894599
cvDeviance         1.0127285
cvCorrelation      0.5642493
cvAUC              0.8220800
cvPer.Expl        26.9464423
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 27.367481
temp_mean  18.477169
sal_mean   10.540214
chl_mean    8.710175
ssh_mean    6.015610
mld_mean    5.861958
vostr_mean  5.303627
bathy_sd    5.270798
vo_mean     3.496298
uo_mean     3.392592
uostr_mean  2.849782
pred_var    2.714296
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   363.44
2          6    vo_mean          3   sal_mean   193.16
3         12   pred_var          4    uo_mean   168.67
4          8   ssh_mean          2  temp_mean   162.10
5          2  temp_mean          1   chl_mean   129.68
6         10 bathy_mean          8   ssh_mean   110.34
7          8   ssh_mean          1   chl_mean    98.98
[1] "External percent deviance explained"
[1] 0.3850404

[1] "TPR"
[1] 0.6952086
[1] "TSS"
[1] 0.6133057
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4150 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3706824 0.6788681 0.8917619  1.003817         0.3850404 0.4238946
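
The Per.Expl and cvPer.Expl rows in the metrics table above can be reproduced from the deviance rows as 100 × (total deviance − residual deviance) / total deviance. The sketch below checks this with the values reported for this model; `pct_dev_explained()` is a hypothetical helper name, not a function from the analysis code.

```r
# Hypothetical helper: percent deviance explained from total and residual deviance
pct_dev_explained <- function(total_dev, resid_dev) {
  100 * (total_dev - resid_dev) / total_dev
}

total_dev <- 1.3862823  # Total.Deviance reported above

per_expl <- pct_dev_explained(total_dev, 0.7986447)     # uses Residual.Deviance
cv_per_expl <- pct_dev_explained(total_dev, 1.0127285)  # uses cvDeviance

round(c(Per.Expl = per_expl, cvPer.Expl = cv_per_expl), 2)

# Mean Bernoulli null deviance when presences and absences are equally common
2 * log(2)
```

The total deviance is very close to 2·ln(2) ≈ 1.3863, which is what a Bernoulli null model gives when presences and pseudo-absences are present in roughly equal numbers.
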
explore_brt(mod_file_path = brt_outputs[8], 
            test_data = base_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.3815785
Correlation        0.8963614
AUC                0.9887000
Per.Expl          72.4746916
cvDeviance         0.6107316
cvCorrelation      0.7793802
cvAUC              0.9410600
cvPer.Expl        55.9446474
[1] "Relative influence of predictor variables"

             rel.inf
tag        50.031513
bathy_mean 16.608199
temp_mean   8.783888
sal_mean    5.805500
chl_mean    3.982330
ssh_mean    3.974665
vostr_mean  2.251400
mld_mean    2.045141
bathy_sd    1.668791
vo_mean     1.313441
uostr_mean  1.275039
uo_mean     1.217342
pred_var    1.042751
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         11 bathy_mean          1        tag  1234.81
2          3  temp_mean          1        tag  1203.85
3          4   sal_mean          1        tag  1164.68
4          9   ssh_mean          1        tag   423.01
5          2   chl_mean          1        tag   377.50
6         12   bathy_sd          1        tag   206.76
7         13   pred_var          1        tag   181.75
8         11 bathy_mean          3  temp_mean   178.96
[1] "External percent deviance explained"
[1] 0.6770616

[1] "TPR"
[1] 0.7374214
[1] "TSS"
[1] 0.8360522
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2551359 0.864596 0.9762279 0.9982626         0.6770616 0.7247469
explore_brt(mod_file_path = brt_outputs[9], 
            test_data = base_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.3356425
Correlation        0.9105272
AUC                0.9916000
Per.Expl          75.7883011
cvDeviance         0.5396134
cvCorrelation      0.8105200
cvAUC              0.9550200
cvPer.Expl        61.0747802
[1] "Relative influence of predictor variables"

              rel.inf
tag        47.5168117
dist_coast 18.2926688
lat         7.4869016
bathy_mean  5.5577763
temp_mean   4.8362634
sal_mean    4.3607456
chl_mean    3.0609643
ssh_mean    1.8297866
vostr_mean  1.6011683
mld_mean    1.4125607
pred_var    0.9512854
bathy_sd    0.9198428
vo_mean     0.7636076
uo_mean     0.7143868
uostr_mean  0.6952301
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   944.49
2          12 bathy_mean          1        tag   585.86
3           4  temp_mean          1        tag   510.39
4           5   sal_mean          1        tag   378.67
5          14 dist_coast          1        tag   349.40
6           3   chl_mean          1        tag   287.67
7          10   ssh_mean          1        tag   178.03
8          11   mld_mean          1        tag   147.57
9          15   pred_var          1        tag   129.07
10         13   bathy_sd          1        tag   105.68
11          8    vo_mean          1        tag    86.99
[1] "External percent deviance explained"
[1] 0.7120117

[1] "TPR"
[1] 0.7398529
[1] "TSS"
[1] 0.8502702
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2400452 0.8807886 0.9809981 0.9991846         0.7120117 0.757883
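
For reference, the true positive rate (TPR) and true skill statistic (TSS) are conventionally computed from a thresholded confusion matrix, with TSS = sensitivity + specificity − 1. The sketch below uses that textbook definition on made-up data; `tss_from_predictions()` is a hypothetical helper, the 0.5 threshold is an assumption, and the source does not show how `explore_brt()` calculates these values.

```r
# Hypothetical helper: TPR, TNR and TSS at a single probability threshold
tss_from_predictions <- function(observed, predicted_prob, threshold = 0.5) {
  predicted <- as.integer(predicted_prob >= threshold)
  tp <- sum(predicted == 1 & observed == 1)  # true positives
  fn <- sum(predicted == 0 & observed == 1)  # false negatives
  tn <- sum(predicted == 0 & observed == 0)  # true negatives
  fp <- sum(predicted == 1 & observed == 0)  # false positives
  tpr <- tp / (tp + fn)  # sensitivity
  tnr <- tn / (tn + fp)  # specificity
  c(TPR = tpr, TNR = tnr, TSS = tpr + tnr - 1)
}

# Made-up observations and predicted probabilities, for illustration only
observed <- c(1, 1, 1, 1, 0, 0, 0, 0)
predicted_prob <- c(0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7)

metrics <- tss_from_predictions(observed, predicted_prob)
print(metrics)
```

Under this definition, TSS cannot exceed TPR when both use the same threshold, so the TPR and TSS values reported here may come from different thresholds or from an averaged calculation; the source does not say.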

DO models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables at longer time scales (seasonal/annual).

explore_brt(mod_file_path = brt_outputs[14], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3442681
Correlation        0.9078579
AUC                0.9908000
Per.Expl          75.1662880
cvDeviance         0.5595631
cvCorrelation      0.8035769
cvAUC              0.9512600
cvPer.Expl        59.6360292
[1] "Relative influence of predictor variables"

              rel.inf
tag        46.1232094
bathy_mean 16.5549379
o2_mean_0m 14.2363652
temp_mean   4.3829271
sal_mean    4.0579305
chl_mean    3.6413222
ssh_mean    2.7853999
mld_mean    1.8441505
bathy_sd    1.3707436
vostr_mean  1.3122101
vo_mean     1.0065403
pred_var    0.9193902
uostr_mean  0.9162099
uo_mean     0.8486632
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12 bathy_mean          1        tag  1167.29
2           2 o2_mean_0m          1        tag   789.99
3           4  temp_mean          1        tag   769.65
4           5   sal_mean          1        tag   747.93
5          10   ssh_mean          1        tag   246.55
6           3   chl_mean          1        tag   244.15
7           4  temp_mean          2 o2_mean_0m   194.97
8          13   bathy_sd          1        tag   194.44
9           8    vo_mean          1        tag   143.85
10         14   pred_var          1        tag   108.26
[1] "External percent deviance explained"
[1] 0.7150451

[1] "TPR"
[1] 0.7406633
[1] "TSS"
[1] 0.852678
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7650 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2388119 0.8826053 0.9824973 0.9967742         0.7150451 0.7516629
explore_brt(mod_file_path = brt_outputs[15], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3077135
Correlation        0.9191926
AUC                0.9929000
Per.Expl          77.8031473
cvDeviance         0.5030906
cvCorrelation      0.8262495
cvAUC              0.9613100
cvPer.Expl        63.7096644
[1] "Relative influence of predictor variables"

              rel.inf
tag        45.2009449
dist_coast 18.4105318
o2_mean_0m 10.3754042
lat         6.4377993
bathy_mean  5.1604232
sal_mean    2.8517201
temp_mean   2.3917958
chl_mean    2.0825112
ssh_mean    1.4059547
mld_mean    1.2399969
vostr_mean  1.0909863
bathy_sd    0.8612787
pred_var    0.8202809
vo_mean     0.6310177
uostr_mean  0.6049444
uo_mean     0.4344099
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   935.71
2           3 o2_mean_0m          1        tag   599.92
3          13 bathy_mean          1        tag   570.45
4          15 dist_coast          1        tag   297.19
5           5  temp_mean          1        tag   273.14
6           6   sal_mean          1        tag   259.17
7           4   chl_mean          1        tag   148.96
8          11   ssh_mean          1        tag   138.61
9           5  temp_mean          3 o2_mean_0m   137.74
10         14   bathy_sd          1        tag   122.14
11          9    vo_mean          1        tag   106.95
12         12   mld_mean          1        tag    97.74
13         16   pred_var          1        tag    91.82
[1] "External percent deviance explained"
[1] 0.7416577

[1] "TPR"
[1] 0.7422192
[1] "TSS"
[1] 0.8637654
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7350 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.226637 0.8945919 0.9855039 0.9997086         0.7416577 0.7780315
explore_brt(mod_file_path = brt_outputs[13], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3336455
Correlation        0.9113159
AUC                0.9914000
Per.Expl          75.9325541
cvDeviance         0.5484820
cvCorrelation      0.8074909
cvAUC              0.9531600
cvPer.Expl        60.4353619
[1] "Relative influence of predictor variables"

               rel.inf
tag         45.8318388
bathy_mean  15.6731379
o2_mean_0m  13.4749675
temp_mean    4.0324904
o2_mean_60m  4.0124893
sal_mean     3.7440800
chl_mean     3.4171857
ssh_mean     2.5269272
mld_mean     1.6460394
bathy_sd     1.2069586
vostr_mean   1.1031311
vo_mean      0.8942394
pred_var     0.8418653
uo_mean      0.8246185
uostr_mean   0.7700309
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index  var1.names var2.index var2.names int.size
1          12  bathy_mean          1        tag   863.04
2           2  o2_mean_0m          1        tag   752.48
3           4   temp_mean          1        tag   727.72
4           5    sal_mean          1        tag   659.23
5          14 o2_mean_60m          1        tag   352.84
6          10    ssh_mean          1        tag   208.39
7           4   temp_mean          2 o2_mean_0m   204.79
8           3    chl_mean          1        tag   199.75
9          13    bathy_sd          1        tag   155.00
10          8     vo_mean          1        tag   132.19
11         15    pred_var          1        tag   104.14
[1] "External percent deviance explained"
[1] 0.7206636

[1] "TPR"
[1] 0.7408662
[1] "TSS"
[1] 0.8563363
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2363376 0.8849535 0.9829041 0.9978414         0.7206636 0.7593255
explore_brt(mod_file_path = brt_outputs[10], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3421213
Correlation        0.9070514
AUC                0.9904000
Per.Expl          75.3211530
cvDeviance         0.5490893
cvCorrelation      0.8069965
cvAUC              0.9530700
cvPer.Expl        60.3915546
[1] "Relative influence of predictor variables"

                rel.inf
tag          46.4292453
o2_mean_0m   15.1805295
o2_mean_250m 13.2507504
bathy_mean    7.9891172
sal_mean      3.2341166
temp_mean     2.9293936
ssh_mean      2.2599756
chl_mean      2.1392953
mld_mean      1.2551674
bathy_sd      1.1236108
uostr_mean    0.8767264
pred_var      0.8541930
vostr_mean    0.8531117
vo_mean       0.8470162
uo_mean       0.7777511
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           2   o2_mean_0m          1        tag   801.68
2          12   bathy_mean          1        tag   732.17
3           5     sal_mean          1        tag   671.55
4           4    temp_mean          1        tag   593.41
5          14 o2_mean_250m          1        tag   331.86
6           3     chl_mean          1        tag   224.17
7          10     ssh_mean          1        tag   172.47
8           4    temp_mean          2 o2_mean_0m   125.51
9          14 o2_mean_250m          2 o2_mean_0m   114.20
10          8      vo_mean          1        tag   114.05
11         13     bathy_sd          1        tag   110.04
[1] "External percent deviance explained"
[1] 0.7167128

[1] "TPR"
[1] 0.740492
[1] "TSS"
[1] 0.8476697
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.238839 0.8821954 0.9821616 0.9976817         0.7167128 0.7532115
explore_brt(mod_file_path = brt_outputs[11], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3305617
Correlation        0.9113032
AUC                0.9913000
Per.Expl          76.1550015
cvDeviance         0.5396544
cvCorrelation      0.8109503
cvAUC              0.9547000
cvPer.Expl        61.0721392
[1] "Relative influence of predictor variables"

                rel.inf
tag          45.8298025
o2_mean_0m   14.8053415
o2_mean_250m 12.7273136
bathy_mean    7.4986643
o2_mean_60m   3.1614481
sal_mean      2.8046144
temp_mean     2.7978182
chl_mean      2.0527019
ssh_mean      1.9440515
mld_mean      1.3218104
bathy_sd      1.1178508
uostr_mean    0.9653779
pred_var      0.8301538
vo_mean       0.7652047
vostr_mean    0.7020946
uo_mean       0.6757517
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           4    temp_mean          1        tag   682.07
2           2   o2_mean_0m          1        tag   669.63
3          12   bathy_mean          1        tag   604.89
4           5     sal_mean          1        tag   436.04
5          15 o2_mean_250m          1        tag   284.97
6           3     chl_mean          1        tag   230.31
7          14  o2_mean_60m          1        tag   197.79
8          10     ssh_mean          1        tag   168.44
9          13     bathy_sd          1        tag   135.70
10          8      vo_mean          1        tag   119.67
11          4    temp_mean          2 o2_mean_0m   114.73
12         16     pred_var          1        tag    88.20
13         11     mld_mean          1        tag    74.37
[1] "External percent deviance explained"
[1] 0.7222488

[1] "TPR"
[1] 0.7408139
[1] "TSS"
[1] 0.8538549
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.236182 0.8848858 0.9827898 0.9981021         0.7222488  0.76155
explore_brt(mod_file_path = brt_outputs[12], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3097863
Correlation        0.9179801
AUC                0.9925000
Per.Expl          77.6536275
cvDeviance         0.5032433
cvCorrelation      0.8262230
cvAUC              0.9611900
cvPer.Expl        63.6986447
[1] "Relative influence of predictor variables"

                rel.inf
tag          44.6832785
dist_coast   14.9083920
o2_mean_0m   11.3088495
o2_mean_250m  7.7993714
lat           4.4617062
bathy_mean    2.7810933
o2_mean_60m   2.4314643
sal_mean      2.1541513
temp_mean     2.0608261
chl_mean      1.5384187
mld_mean      1.1011340
ssh_mean      1.0869767
pred_var      0.7795164
bathy_sd      0.7163208
vostr_mean    0.5989924
uostr_mean    0.5474190
vo_mean       0.5378007
uo_mean       0.5042887
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3   o2_mean_0m          1        tag   619.06
2           2          lat          1        tag   558.22
3          13   bathy_mean          1        tag   365.25
4           5    temp_mean          1        tag   298.79
5           6     sal_mean          1        tag   283.50
6          15   dist_coast          1        tag   209.23
7          17 o2_mean_250m          1        tag   162.76
8           4     chl_mean          1        tag   129.59
9          16  o2_mean_60m          1        tag   123.37
10         11     ssh_mean          1        tag    94.28
11         14     bathy_sd          1        tag    88.17
12         12     mld_mean          1        tag    81.84
13          9      vo_mean          1        tag    65.29
14         18     pred_var          1        tag    65.25
15          5    temp_mean          3 o2_mean_0m    59.87
16         10   vostr_mean          1        tag    43.83
[1] "External percent deviance explained"
[1] 0.7414953

[1] "TPR"
[1] 0.7422514
[1] "TSS"
[1] 0.8677531
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6800 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2264786 0.8948487 0.9856034   1.00013         0.7414953 0.7765363
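
To compare the six DO models side by side, the headline metrics can be gathered into one table and ranked. The sketch below copies cvAUC, external deviance explained, and TSS from the outputs above; the `predictors` labels are my shorthand, inferred from each model's relative influence table (every model also includes tag and the shared environmental predictors), and ranking by external deviance explained is only one possible choice.

```r
# Metrics copied from the six DO model outputs above; labels are inferred shorthand
do_models <- data.frame(
  output_index = c(14, 15, 13, 10, 11, 12),
  predictors = c("o2 0m",
                 "o2 0m + lat + dist_coast",
                 "o2 0m + 60m",
                 "o2 0m + 250m",
                 "o2 0m + 60m + 250m",
                 "o2 0m + 60m + 250m + lat + dist_coast"),
  cvAUC = c(0.95126, 0.96131, 0.95316, 0.95307, 0.95470, 0.96119),
  ext_dev_expl = c(0.7150451, 0.7416577, 0.7206636, 0.7167128, 0.7222488, 0.7414953),
  TSS = c(0.852678, 0.8637654, 0.8563363, 0.8476697, 0.8538549, 0.8677531),
  stringsAsFactors = FALSE
)

# Rank from highest to lowest external deviance explained
ranked <- do_models[order(-do_models$ext_dev_expl), ]
print(ranked, row.names = FALSE)
```

On these numbers, the two models that include lat and dist_coast lead on all three metrics, though the margin between them is small.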

AGI models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables at longer time scales (seasonal/annual).

explore_brt(mod_file_path = brt_outputs[5], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3488727
Correlation        0.9075879
AUC                0.9911000
Per.Expl          74.8339450
cvDeviance         0.5715675
cvCorrelation      0.7967747
cvAUC              0.9489800
cvPer.Expl        58.7697755
[1] "Relative influence of predictor variables"

              rel.inf
tag        45.6682568
bathy_mean 15.8051134
temp_mean   9.0349774
AGI_0m      6.3891728
ssh_mean    5.4588690
sal_mean    4.9996184
chl_mean    3.1875195
vostr_mean  1.8324591
mld_mean    1.6778310
uostr_mean  1.3701035
bathy_sd    1.3575468
vo_mean     1.2788779
uo_mean     1.0123857
pred_var    0.9272687
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1385.24
2           4   sal_mean          1        tag  1008.93
3           3  temp_mean          1        tag   980.93
4          11 bathy_mean          1        tag   980.72
5          13     AGI_0m          1        tag   359.38
6           9   ssh_mean          1        tag   297.36
7           2   chl_mean          1        tag   241.30
8          13     AGI_0m          9   ssh_mean   214.62
9          12   bathy_sd          1        tag   205.71
10          7    vo_mean          1        tag   161.67
[1] "External percent deviance explained"
[1] 0.7058188

[1] "TPR"
[1] 0.7398468
[1] "TSS"
[1] 0.8467854
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2418833 0.8794963 0.9810396  1.000017         0.7058188 0.7483395
explore_brt(mod_file_path = brt_outputs[6], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3197760
Correlation        0.9158473
AUC                0.9925000
Per.Expl          76.9328462
cvDeviance         0.5137785
cvCorrelation      0.8210419
cvAUC              0.9596800
cvPer.Expl        62.9384050
[1] "Relative influence of predictor variables"

              rel.inf
tag        43.8312768
dist_coast 18.8936048
lat         7.3197274
bathy_mean  5.3601903
AGI_0m      5.2578297
temp_mean   4.8902634
sal_mean    3.8487744
chl_mean    2.5368902
ssh_mean    2.1700563
vostr_mean  1.1761165
mld_mean    1.1622948
pred_var    0.7803509
bathy_sd    0.7748458
uostr_mean  0.7138587
vo_mean     0.6796771
uo_mean     0.6042428
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   951.49
2          14     AGI_0m          4  temp_mean   683.10
3          12 bathy_mean          1        tag   473.92
4           4  temp_mean          1        tag   424.60
5           5   sal_mean          1        tag   293.15
6          14     AGI_0m          1        tag   282.92
7          15 dist_coast          1        tag   263.55
8           3   chl_mean          1        tag   175.09
9          10   ssh_mean          1        tag   142.20
10         13   bathy_sd          1        tag   107.14
11          8    vo_mean          1        tag    92.97
12         16   pred_var          1        tag    87.84
13         11   mld_mean          1        tag    75.98
[1] "External percent deviance explained"
[1] 0.7283939

[1] "TPR"
[1] 0.7411294
[1] "TSS"
[1] 0.8566402
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
      RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.232085 0.889198 0.9835227  1.000057         0.7283939 0.7693285
explore_brt(mod_file_path = brt_outputs[4], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3375275
Correlation        0.9115731
AUC                0.9920000
Per.Expl          75.6523377
cvDeviance         0.5596743
cvCorrelation      0.8023107
cvAUC              0.9513600
cvPer.Expl        59.6277012
[1] "Relative influence of predictor variables"

              rel.inf
tag        45.3698332
bathy_mean 14.8376454
temp_mean   8.7196282
AGI_0m      6.4385954
sal_mean    4.6470954
ssh_mean    4.6224771
AGI_60m     3.4371631
chl_mean    3.0979502
vostr_mean  1.7466955
mld_mean    1.5192247
uostr_mean  1.2840192
vo_mean     1.2375403
bathy_sd    1.2360217
uo_mean     0.9357369
pred_var    0.8703735
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1150.88
2           3  temp_mean          1        tag  1069.42
3           4   sal_mean          1        tag   977.35
4          11 bathy_mean          1        tag   872.62
5          13     AGI_0m          1        tag   303.61
6          14    AGI_60m          1        tag   276.48
7           9   ssh_mean          1        tag   249.01
8          12   bathy_sd          1        tag   204.60
9           2   chl_mean          1        tag   187.86
10          7    vo_mean          1        tag   144.86
11         15   pred_var          1        tag   124.26
[1] "External percent deviance explained"
[1] 0.7139032

[1] "TPR"
[1] 0.74046
[1] "TSS"
[1] 0.8501959
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2381082 0.8834379 0.9822011   1.00085         0.7139032 0.7565234
explore_brt(mod_file_path = brt_outputs[1], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3407856
Correlation        0.9092219
AUC                0.9912000
Per.Expl          75.4173115
cvDeviance         0.5592736
cvCorrelation      0.8020908
cvAUC              0.9511400
cvPer.Expl        59.6566018
[1] "Relative influence of predictor variables"

              rel.inf
tag        46.4790375
AGI_250m   12.2874054
temp_mean   8.9924380
bathy_mean  8.4880695
AGI_0m      5.6191772
sal_mean    3.8744780
ssh_mean    3.7625577
chl_mean    2.5548718
vostr_mean  1.3289345
mld_mean    1.3003549
uostr_mean  1.2354471
vo_mean     1.1455409
bathy_sd    1.1337923
pred_var    0.8997732
uo_mean     0.8981217
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag  1018.34
2           3  temp_mean          1        tag   834.21
3          13     AGI_0m          3  temp_mean   793.98
4          11 bathy_mean          1        tag   654.52
5          14   AGI_250m          1        tag   308.87
6          13     AGI_0m          1        tag   302.48
7           9   ssh_mean          1        tag   278.94
8           2   chl_mean          1        tag   198.59
9          12   bathy_sd          1        tag   164.86
10          7    vo_mean          1        tag   131.62
11         15   pred_var          1        tag   126.78
[1] "External percent deviance explained"
[1] 0.7098012

[1] "TPR"
[1] 0.7398088
[1] "TSS"
[1] 0.8471372
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7800 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2411697 0.8796409 0.9809232  1.001943         0.7098012 0.7541731
explore_brt(mod_file_path = brt_outputs[2], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3446793
Correlation        0.9077153
AUC                0.9909000
Per.Expl          75.1364360
cvDeviance         0.5538145
cvCorrelation      0.8046978
cvAUC              0.9522800
cvPer.Expl        60.0503988
[1] "Relative influence of predictor variables"

              rel.inf
tag        45.7582300
AGI_250m   12.1128304
temp_mean   9.0027738
bathy_mean  8.2161329
AGI_0m      5.5044249
sal_mean    3.6823711
ssh_mean    3.0624512
AGI_60m     2.7807343
chl_mean    2.4205258
vostr_mean  1.2805676
uostr_mean  1.2661180
mld_mean    1.2345471
vo_mean     1.0742638
bathy_sd    1.0404232
uo_mean     0.8104982
pred_var    0.7531076
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag   930.68
2           3  temp_mean          1        tag   920.62
3          13     AGI_0m          3  temp_mean   685.43
4          11 bathy_mean          1        tag   579.12
5          13     AGI_0m          1        tag   253.04
6          14    AGI_60m          1        tag   237.86
7          15   AGI_250m          1        tag   233.39
8           9   ssh_mean          1        tag   210.53
9          12   bathy_sd          1        tag   175.95
10          2   chl_mean          1        tag   152.48
11          7    vo_mean          1        tag   127.49
12         16   pred_var          1        tag    95.31
13         13     AGI_0m          9   ssh_mean    86.29
[1] "External percent deviance explained"
[1] 0.7085886

[1] "TPR"
[1] 0.7397452
[1] "TSS"
[1] 0.8460373
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7200 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2417386 0.8791051 0.9808666  1.002203         0.7085886 0.7513644
explore_brt(mod_file_path = brt_outputs[3], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3080448
Correlation        0.9195783
AUC                0.9934000
Per.Expl          77.7790835
cvDeviance         0.5084655
cvCorrelation      0.8226110
cvAUC              0.9603100
cvPer.Expl        63.3216647
[1] "Relative influence of predictor variables"

              rel.inf
tag        44.6291838
dist_coast 15.2851304
lat         6.9231034
AGI_250m    6.7387247
AGI_0m      4.8196657
temp_mean   4.5949137
bathy_mean  3.2716900
sal_mean    2.6278408
AGI_60m     2.1064618
chl_mean    2.0757314
ssh_mean    1.7362553
mld_mean    1.0240873
pred_var    0.7992513
bathy_sd    0.7490174
vostr_mean  0.7444577
vo_mean     0.6774726
uostr_mean  0.6498551
uo_mean     0.5471578
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   838.15
2           4  temp_mean          1        tag   365.46
3          12 bathy_mean          1        tag   330.46
4           5   sal_mean          1        tag   314.53
5          14     AGI_0m          4  temp_mean   311.04
6          14     AGI_0m          1        tag   262.60
7          15 dist_coast          1        tag   206.65
8          17   AGI_250m          1        tag   178.43
9           3   chl_mean          1        tag   144.07
10         16    AGI_60m          1        tag   123.29
11         13   bathy_sd          1        tag   120.30
12         10   ssh_mean          1        tag   103.70
13          8    vo_mean          1        tag    82.19
14         18   pred_var          1        tag    82.03
15         11   mld_mean          1        tag    72.98
16          9 vostr_mean          1        tag    54.22
[1] "External percent deviance explained"
[1] 0.7347234

[1] "TPR"
[1] 0.7415656
[1] "TSS"
[1] 0.8603764
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2295486 0.8915997 0.9842994  1.001493         0.7347234 0.7777908

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary.csv"))
kableExtra::kable(output_sum)
model                       percent_explained  deviance_exp  TPR_mean    TSS    AUC   RMSE  SpearmanCor  PseudoR2
base_0m_Nspat_Ntag                     42.389         0.385     0.695  0.613  0.892  0.371        0.679     0.892
base_0m_Nspat_Ytag                     72.475         0.677     0.737  0.836  0.976  0.255        0.865     0.725
base_0m_Yspat_Ytag                     75.788         0.712     0.740  0.850  0.981  0.240        0.881     0.758
do_0m_Nspat_Ytag                       75.166         0.715     0.741  0.853  0.982  0.239        0.883     0.752
do_0m_Yspat_Ytag                       77.803         0.742     0.742  0.864  0.986  0.227        0.895     0.778
do_0m_60m_Nspat_Ytag                   75.933         0.721     0.741  0.856  0.983  0.236        0.885     0.759
do_0m_250m_Nspat_Ytag                  75.321         0.717     0.740  0.848  0.982  0.239        0.882     0.753
do_0m_60m_250m_Nspat_Ytag              76.155         0.722     0.741  0.854  0.983  0.236        0.885     0.762
do_0m_60m_250m_Yspat_Ytag              77.654         0.741     0.742  0.868  0.986  0.226        0.895     0.777
agi_0m_Nspat_Ytag                      74.834         0.706     0.740  0.847  0.981  0.242        0.879     0.748
agi_0m_Yspat_Ytag                      76.933         0.728     0.741  0.857  0.984  0.232        0.889     0.769
agi_0m_60m_Nspat_Ytag                  75.672         0.714     0.740  0.850  0.982  0.238        0.883     0.757
agi_0m_250m_Nspat_Ytag                 75.417         0.710     0.740  0.847  0.981  0.241        0.880     0.754
agi_0m_60m_250m_Nspat_Ytag             75.136         0.709     0.740  0.846  0.981  0.242        0.879     0.751
agi_0m_60m_250m_Yspat_Ytag             77.780         0.735     0.742  0.860  0.984  0.230        0.892     0.778
ggplot(output_sum, aes(x = AUC, y = TSS, color = deviance_exp, text = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50', 
                  max.overlaps = 20,
                  label.size = 0.5)
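
To put numbers on the differences in the summary table, the following is a minimal sketch that computes each model's change in deviance explained, TSS, and AUC relative to the base model with tag ID and no spatial predictors. It is not part of the original analysis: the data frame is a hand-typed subset of the table above (so the block runs without the CSV), and the object names `sum_subset`, `metrics`, and `delta` are placeholders chosen for illustration.

```r
# Hand-typed subset of the summary table (Nspat_Ytag models only);
# in the real workflow this would come from brt_crw_output_summary.csv
sum_subset <- data.frame(
  model = c("base_0m_Nspat_Ytag", "do_0m_Nspat_Ytag",
            "do_0m_60m_250m_Nspat_Ytag", "agi_0m_Nspat_Ytag",
            "agi_0m_60m_250m_Nspat_Ytag"),
  deviance_exp = c(0.677, 0.715, 0.722, 0.706, 0.709),
  TSS          = c(0.836, 0.853, 0.854, 0.847, 0.846),
  AUC          = c(0.976, 0.982, 0.983, 0.981, 0.981)
)

# Reference row: base model with tag ID, no spatial predictors
base_row <- sum_subset[sum_subset$model == "base_0m_Nspat_Ytag", ]
metrics  <- c("deviance_exp", "TSS", "AUC")

# Subtract the base model's value from every model, metric by metric
delta <- sum_subset
for (m in metrics) {
  delta[[m]] <- round(sum_subset[[m]] - base_row[[m]], 3)
}

print(delta)
```

Positive values indicate an improvement over the base model. Swapping the reference row (for example to `do_0m_Nspat_Ytag`) would isolate the effect of adding depth layers instead.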

Conclusions from initial models w/ tag ID

  • Base models: Bathymetry was consistently one of the top predictor variables across all base models, and percent explained increased substantially once tag ID and the spatial predictors (latitude, distance to coast) were added (42.4% with neither, 72.5% with tag ID, 75.8% with both). After running these initial models, we decided to run the spatial analysis separately (GLMs, GAMs) rather than including latitude and distance to coast as predictors in the hSDMs, so that we can specifically investigate how they relate to the AGI or DO at different depth layers. Additionally, we will not include tag ID as a predictor variable, as it would not be included in any projection work and is not critical for the main objectives of this study.

  • DO models: Performance metrics generally increased, though only subtly, after including the additional depth layers relative to the DO_0m model. Relative to the base models, however, including DO considerably improved model performance. Across depth layers, DO at 0m and 250m were consistently among the top 5 predictors by relative influence and had comparable contributions. From the partial plots, we generally see a sweet spot for DO values at 0m and a negative relationship for DO at 250m.

  • AGI models: Performance metrics were comparable between the DO and AGI models, and the patterns observed for the DO models generally held for the AGI models as well. Model performance improved greatly for the AGI models relative to the base models, and performance also increased subtly after including the additional depth layers. A primary difference for the AGI models is the relative influence of the AGI at 250m. In these models, the AGI at 250m is the only AGI layer that appears among the variables with the highest relative influence, while the AGI at 0m and 60m typically ranks lower in the list. The AGI partial plots show patterns similar to the DO plots, with a less dramatic negative relationship for the AGI at 250m.

  • The random predictor variable (pred_var) typically had the lowest relative influence of any predictor, but in some models it ranked above the predictors related to wind stress and wind stress curl.
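
The screening idea behind the random predictor can be sketched as follows, assuming the rule is simply "flag any predictor whose relative influence does not exceed that of `pred_var`". That rule and the helper `flag_uninformative()` are assumptions made for illustration and are not taken from the original code. The relative influence values are copied from the last AGI model above (`brt_outputs[3]`, the one including `lat`, `dist_coast`, and `tag`).

```r
# Relative influence (%) copied from the brt_outputs[3] output above
rel_inf <- c(
  tag = 44.6291838, dist_coast = 15.2851304, lat = 6.9231034,
  AGI_250m = 6.7387247, AGI_0m = 4.8196657, temp_mean = 4.5949137,
  bathy_mean = 3.2716900, sal_mean = 2.6278408, AGI_60m = 2.1064618,
  chl_mean = 2.0757314, ssh_mean = 1.7362553, mld_mean = 1.0240873,
  pred_var = 0.7992513, bathy_sd = 0.7490174, vostr_mean = 0.7444577,
  vo_mean = 0.6774726, uostr_mean = 0.6498551, uo_mean = 0.5471578
)

# Hypothetical helper: return the names of predictors whose relative
# influence is no higher than that of the random predictor
flag_uninformative <- function(rel_inf, random_name = "pred_var") {
  stopifnot(random_name %in% names(rel_inf))
  threshold <- rel_inf[[random_name]]
  below <- names(rel_inf)[rel_inf <= threshold]
  setdiff(below, random_name)  # drop the random predictor itself
}

uninformative <- flag_uninformative(rel_inf)
print(uninformative)
```

For this model the flagged set is `bathy_sd` plus the four current and wind stress components (`vostr_mean`, `vo_mean`, `uostr_mean`, `uo_mean`). A single fit is a noisy basis for dropping variables, so in practice the comparison would probably be repeated across models or resampled fits before deciding.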

DO models w/o tag ID

Here, I have run the same DO models as above, but without tag ID as a predictor variable. For this set of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not necessary for tag ID to be included.

explore_brt(mod_file_path = brt_outputs_Ntag[12], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.7008164
Correlation        0.7644341
AUC                0.9373000
Per.Expl          49.4467567
cvDeviance         0.9191696
cvCorrelation      0.6261518
cvAUC              0.8570800
cvPer.Expl        33.6958924
[1] "Relative influence of predictor variables"

             rel.inf
o2_mean_0m 25.276448
bathy_mean 24.692429
temp_mean   9.643582
sal_mean    7.962940
chl_mean    6.550115
ssh_mean    4.947427
mld_mean    3.952063
bathy_sd    3.566587
vostr_mean  3.058564
vo_mean     2.889361
uo_mean     2.880227
pred_var    2.378356
uostr_mean  2.201901
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          3  temp_mean          1 o2_mean_0m   669.74
2          9   ssh_mean          1 o2_mean_0m   242.37
3          9   ssh_mean          4   sal_mean   166.84
4         11 bathy_mean          1 o2_mean_0m   158.43
5         11 bathy_mean          3  temp_mean   150.85
6          7    vo_mean          4   sal_mean   123.20
7          9   ssh_mean          3  temp_mean    94.56
8         13   pred_var          8 vostr_mean    91.68
[1] "External percent deviance explained"
[1] 0.4500602

[1] "TPR"
[1] 0.7078792
[1] "TSS"
[1] 0.6709833
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4400 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3467092 0.7274861 0.9171592 0.9981085         0.4500602 0.4944676
explore_brt(mod_file_path = brt_outputs_Ntag[13], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6614427
Correlation        0.7820692
AUC                0.9456000
Per.Expl          52.2869659
cvDeviance         0.8779106
cvCorrelation      0.6486562
cvAUC              0.8709500
cvPer.Expl        36.6720992
[1] "Relative influence of predictor variables"

             rel.inf
dist_coast 26.671181
o2_mean_0m 18.726788
lat         9.290748
temp_mean   7.857842
bathy_mean  7.494543
sal_mean    6.766416
chl_mean    4.487413
ssh_mean    3.382554
mld_mean    2.832160
vostr_mean  2.487294
vo_mean     2.263713
pred_var    2.105882
uo_mean     2.030506
bathy_sd    1.927916
uostr_mean  1.675046
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           4  temp_mean          2 o2_mean_0m   534.86
2           2 o2_mean_0m          1        lat   195.38
3          10   ssh_mean          2 o2_mean_0m   187.36
4          14 dist_coast          5   sal_mean   171.61
5          12 bathy_mean          4  temp_mean   151.42
6           4  temp_mean          1        lat   140.73
7           7 uostr_mean          1        lat   129.13
8          10   ssh_mean          1        lat    68.75
9           9 vostr_mean          5   sal_mean    66.17
10         14 dist_coast          4  temp_mean    65.09
11          5   sal_mean          2 o2_mean_0m    62.75
[1] "External percent deviance explained"
[1] 0.4774787

[1] "TPR"
[1] 0.7122652
[1] "TSS"
[1] 0.6959527
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3369501 0.7453633 0.9259331  1.000626         0.4774787 0.5228697
explore_brt(mod_file_path = brt_outputs_Ntag[11], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6829345
Correlation        0.7734095
AUC                0.9413000
Per.Expl          50.7366588
cvDeviance         0.9080272
cvCorrelation      0.6319024
cvAUC              0.8602700
cvPer.Expl        34.4996481
[1] "Relative influence of predictor variables"

              rel.inf
o2_mean_0m  23.261017
bathy_mean  23.180777
temp_mean    8.946731
o2_mean_60m  7.584790
sal_mean     7.116600
chl_mean     5.956366
ssh_mean     4.122221
bathy_sd     3.632138
mld_mean     3.629388
vostr_mean   2.942032
vo_mean      2.838769
uo_mean      2.590805
pred_var     2.267411
uostr_mean   1.930953
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index  var1.names var2.index var2.names int.size
1           3   temp_mean          1 o2_mean_0m   725.63
2           9    ssh_mean          1 o2_mean_0m   207.72
3          13 o2_mean_60m          3  temp_mean   146.75
4          11  bathy_mean          3  temp_mean   143.76
5           4    sal_mean          3  temp_mean   138.38
6          11  bathy_mean          1 o2_mean_0m   118.26
7           9    ssh_mean          4   sal_mean   109.57
8           6  uostr_mean          1 o2_mean_0m   101.66
9           9    ssh_mean          3  temp_mean    75.73
10         14    pred_var          5    uo_mean    69.79
[1] "External percent deviance explained"
[1] 0.4630292

[1] "TPR"
[1] 0.7102091
[1] "TSS"
[1] 0.6853121
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3418533 0.7367275 0.9218172 0.9995302         0.4630292 0.5073666
explore_brt(mod_file_path = brt_outputs_Ntag[8], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6791781
Correlation        0.7746510
AUC                0.9422000
Per.Expl          51.0076259
cvDeviance         0.9080124
cvCorrelation      0.6325770
cvAUC              0.8609300
cvPer.Expl        34.5007109
[1] "Relative influence of predictor variables"

               rel.inf
o2_mean_0m   25.069876
o2_mean_250m 21.370418
bathy_mean   11.146251
temp_mean     8.048127
sal_mean      7.380057
chl_mean      5.021190
ssh_mean      4.259103
mld_mean      3.260225
bathy_sd      2.908945
vo_mean       2.563888
vostr_mean    2.452558
uo_mean       2.395906
pred_var      2.102817
uostr_mean    2.020640
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3    temp_mean          1 o2_mean_0m   623.90
2           9     ssh_mean          4   sal_mean   206.20
3           9     ssh_mean          1 o2_mean_0m   176.32
4          13 o2_mean_250m          1 o2_mean_0m   156.94
5           7      vo_mean          4   sal_mean   145.70
6          11   bathy_mean          1 o2_mean_0m   110.15
7          14     pred_var          8 vostr_mean   107.16
8           4     sal_mean          3  temp_mean    93.56
9           8   vostr_mean          4   sal_mean    86.63
10         13 o2_mean_250m          3  temp_mean    70.13
[1] "External percent deviance explained"
[1] 0.4672029

[1] "TPR"
[1] 0.7109929
[1] "TSS"
[1] 0.6804633
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3403401 0.7395697 0.9233672 0.9988469         0.4672029 0.5100763
explore_brt(mod_file_path = brt_outputs_Ntag[9], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6675583
Correlation        0.7801596
AUC                0.9448000
Per.Expl          51.8458202
cvDeviance         0.9001398
cvCorrelation      0.6364350
cvAUC              0.8631700
cvPer.Expl        35.0685995
[1] "Relative influence of predictor variables"

               rel.inf
o2_mean_0m   23.999794
o2_mean_250m 21.235677
bathy_mean    9.216478
temp_mean     7.566296
sal_mean      6.750525
o2_mean_60m   6.595268
chl_mean      4.699962
ssh_mean      3.230278
mld_mean      2.930399
bathy_sd      2.862417
vo_mean       2.465057
uo_mean       2.292054
vostr_mean    2.217181
pred_var      2.123455
uostr_mean    1.815160
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3    temp_mean          1 o2_mean_0m   514.84
2           9     ssh_mean          4   sal_mean   230.69
3          14 o2_mean_250m          1 o2_mean_0m   137.44
4           9     ssh_mean          1 o2_mean_0m   130.29
5           4     sal_mean          3  temp_mean   117.26
6           7      vo_mean          4   sal_mean   113.45
7          15     pred_var          5    uo_mean    99.85
8          13  o2_mean_60m          3  temp_mean    87.88
9           9     ssh_mean          3  temp_mean    75.78
10         11   bathy_mean          1 o2_mean_0m    74.79
11         11   bathy_mean          3  temp_mean    73.68
[1] "External percent deviance explained"
[1] 0.4730226

[1] "TPR"
[1] 0.7118979
[1] "TSS"
[1] 0.6942536
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.338087 0.7435281 0.9252005  0.997885         0.4730226 0.5184582
explore_brt(mod_file_path = brt_outputs_Ntag[10], 
            test_data = do_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.6260018
Correlation        0.7998334
AUC                0.9540000
Per.Expl          54.8434923
cvDeviance         0.8717790
cvCorrelation      0.6520278
cvAUC              0.8730600
cvPer.Expl        37.1143990
[1] "Relative influence of predictor variables"

               rel.inf
dist_coast   21.448387
o2_mean_0m   17.884136
o2_mean_250m 11.445241
lat           6.825416
temp_mean     6.586808
sal_mean      6.132765
o2_mean_60m   5.658457
chl_mean      3.792773
bathy_mean    3.672083
ssh_mean      2.744783
mld_mean      2.548687
vo_mean       2.115490
uo_mean       1.958505
vostr_mean    1.921788
pred_var      1.875404
bathy_sd      1.700138
uostr_mean    1.689138
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           4    temp_mean          2 o2_mean_0m   562.48
2          10     ssh_mean          2 o2_mean_0m   142.65
3           2   o2_mean_0m          1        lat   117.54
4          10     ssh_mean          5   sal_mean   105.70
5          14   dist_coast          5   sal_mean   100.26
6           5     sal_mean          4  temp_mean    99.98
7          14   dist_coast          9 vostr_mean    95.55
8          12   bathy_mean          4  temp_mean    94.87
9           4    temp_mean          1        lat    93.76
10         10     ssh_mean          1        lat    90.14
11         16 o2_mean_250m          1        lat    81.26
12         17     pred_var          9 vostr_mean    69.31
13         15  o2_mean_60m          4  temp_mean    64.27
14          9   vostr_mean          5   sal_mean    60.88
[1] "External percent deviance explained"
[1] 0.5003296

[1] "TPR"
[1] 0.7165041
[1] "TSS"
[1] 0.7177324
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3278095 0.7619584 0.9344126  1.000316         0.5003296 0.5484349
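
As a check on how the `Per.Expl` and `cvPer.Expl` rows relate to the deviance rows, the sketch below recomputes them for the last DO model above (`brt_outputs_Ntag[10]`) as the percent reduction from total deviance. The formula is inferred from the reported numbers rather than from the source of `explore_brt()`, and the variable names are placeholders.

```r
# Values copied from the "Model performance metrics" block for
# brt_outputs_Ntag[10] (DO at 0m, 60m, 250m with spatial predictors, no tag ID)
total_dev    <- 1.3862935
residual_dev <- 0.6260018
cv_dev       <- 0.8717790

# Percent deviance explained = 100 * (total - remaining) / total
per_expl    <- 100 * (total_dev - residual_dev) / total_dev
cv_per_expl <- 100 * (total_dev - cv_dev) / total_dev

# Mean deviance of a Bernoulli null model that predicts 0.5 everywhere
null_dev_balanced <- -2 * log(0.5)

print(round(c(Per.Expl = per_expl, cvPer.Expl = cv_per_expl), 3))
print(null_dev_balanced)
```

The recomputed values agree with the reported 54.84 and 37.11 to within rounding of the inputs, and `Per.Expl / 100` matches the `PseudoR2` column of the train/test evaluation. The total deviance is also very close to 2 ln 2, which is what a null model with equal numbers of presences and pseudo-absences would give; that is an inference from the number, not something the output states.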

AGI models w/o tag ID

Here, I have run the same AGI models as above, but without tag ID as a predictor variable. For this set of models, I am interested in identifying the role that the AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not necessary for tag ID to be included.

explore_brt(mod_file_path = brt_outputs_Ntag[5], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.7138692
Correlation        0.7575339
AUC                0.9336000
Per.Expl          48.5047989
cvDeviance         0.9289380
cvCorrelation      0.6208177
cvAUC              0.8541100
cvPer.Expl        32.9907313
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 22.731429
AGI_0m     17.791729
temp_mean  15.222220
sal_mean    9.295392
ssh_mean    7.358567
chl_mean    5.645338
bathy_sd    3.705780
vo_mean     3.438996
vostr_mean  3.398845
mld_mean    3.390303
uo_mean     2.903321
uostr_mean  2.782337
pred_var    2.335742
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         12     AGI_0m          2  temp_mean  3448.33
2         12     AGI_0m          8   ssh_mean   241.53
3         10 bathy_mean          2  temp_mean   223.16
4         12     AGI_0m         10 bathy_mean   194.18
5         12     AGI_0m          4    uo_mean   140.23
6          7 vostr_mean          2  temp_mean    89.54
7          8   ssh_mean          2  temp_mean    83.29
8          6    vo_mean          3   sal_mean    80.67
[1] "External percent deviance explained"
[1] 0.4366052

[1] "TPR"
[1] 0.7045997
[1] "TSS"
[1] 0.6523948
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.3521742 0.7161327 0.9105999 0.9996408         0.4366052 0.485048
explore_brt(mod_file_path = brt_outputs_Ntag[6], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.6608846
Correlation        0.7823616
AUC                0.9458000
Per.Expl          52.3268580
cvDeviance         0.8877425
cvCorrelation      0.6432731
cvAUC              0.8676200
cvPer.Expl        35.9623875
[1] "Relative influence of predictor variables"

             rel.inf
dist_coast 26.940501
AGI_0m     14.237577
lat        11.432881
temp_mean   9.137101
sal_mean    8.383518
bathy_mean  6.450135
chl_mean    4.606477
ssh_mean    4.094246
mld_mean    2.781658
vostr_mean  2.141951
vo_mean     2.063669
pred_var    2.032190
uo_mean     1.995454
uostr_mean  1.873742
bathy_sd    1.828900
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1902.42
2           3  temp_mean          1        lat   616.06
3          13     AGI_0m          9   ssh_mean   167.94
4           8 vostr_mean          3  temp_mean   162.76
5           6 uostr_mean          1        lat   154.65
6          14 dist_coast          4   sal_mean   144.17
7          13     AGI_0m         11 bathy_mean   142.36
8          13     AGI_0m          1        lat   134.68
9           9   ssh_mean          3  temp_mean    89.31
10          8 vostr_mean          4   sal_mean    70.22
11         13     AGI_0m          4   sal_mean    68.82
[1] "External percent deviance explained"
[1] 0.4743424

[1] "TPR"
[1] 0.7112435
[1] "TSS"
[1] 0.6897118
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3383874 0.7424042 0.9238761 0.9986839         0.4743424 0.5232686
explore_brt(mod_file_path = brt_outputs_Ntag[4], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.6808364
Correlation        0.7747488
AUC                0.9423000
Per.Expl          50.8876286
cvDeviance         0.9168888
cvCorrelation      0.6270623
cvAUC              0.8579000
cvPer.Expl        33.8599080
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 21.700356
AGI_0m     16.232947
temp_mean  14.727427
sal_mean    8.696719
ssh_mean    6.145377
chl_mean    5.640991
AGI_60m     5.466690
bathy_sd    3.771708
vostr_mean  3.420011
mld_mean    3.394063
vo_mean     3.152506
uo_mean     2.837063
uostr_mean  2.639898
pred_var    2.174244
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  3156.43
2          12     AGI_0m         10 bathy_mean   213.03
3          10 bathy_mean          2  temp_mean   207.88
4          12     AGI_0m          8   ssh_mean   201.47
5           7 vostr_mean          2  temp_mean   135.57
6           6    vo_mean          3   sal_mean    96.14
7           8   ssh_mean          2  temp_mean    95.85
8          13    AGI_60m         10 bathy_mean    88.11
9           5 uostr_mean          2  temp_mean    80.23
10         14   pred_var          4    uo_mean    74.79
[1] "External percent deviance explained"
[1] 0.4577255

[1] "TPR"
[1] 0.7087203
[1] "TSS"
[1] 0.6718425
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3441022 0.7322665 0.9188448 0.9987076         0.4577255 0.5088763
explore_brt(mod_file_path = brt_outputs_Ntag[1], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.7036843
Correlation        0.7613252
AUC                0.9355000
Per.Expl          49.2394844
cvDeviance         0.9131700
cvCorrelation      0.6290762
cvAUC              0.8592000
cvPer.Expl        34.1281627
[1] "Relative influence of predictor variables"

             rel.inf
AGI_250m   20.028572
temp_mean  15.736663
AGI_0m     15.502156
bathy_mean 11.082676
sal_mean    8.284880
ssh_mean    5.695943
chl_mean    4.456598
bathy_sd    3.261830
mld_mean    3.150028
vo_mean     2.968082
uo_mean     2.742510
uostr_mean  2.688339
vostr_mean  2.523998
pred_var    1.877725
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  2261.05
2          12     AGI_0m          8   ssh_mean   243.87
3          13   AGI_250m         12     AGI_0m   192.33
4          12     AGI_0m         10 bathy_mean   149.62
5           6    vo_mean          3   sal_mean   110.20
6          13   AGI_250m          2  temp_mean    96.73
7          10 bathy_mean          2  temp_mean    87.09
8          12     AGI_0m          4    uo_mean    81.43
9           7 vostr_mean          2  temp_mean    70.55
10         13   AGI_250m          3   sal_mean    62.69
[1] "External percent deviance explained"
[1] 0.4483766

[1] "TPR"
[1] 0.7067332
[1] "TSS"
[1] 0.6632542
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3477708 0.7247524 0.914827 0.9985637         0.4483766 0.4923948
explore_brt(mod_file_path = brt_outputs_Ntag[2], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.6776093
Correlation        0.7746133
AUC                0.9421000
Per.Expl          51.1204139
cvDeviance         0.9081818
cvCorrelation      0.6309660
cvAUC              0.8602600
cvPer.Expl        34.4879906
[1] "Relative influence of predictor variables"

             rel.inf
AGI_250m   18.677103
temp_mean  15.215390
AGI_0m     14.842826
bathy_mean 11.576550
sal_mean    8.159404
ssh_mean    4.885997
chl_mean    4.267141
AGI_60m     3.982295
mld_mean    3.144571
bathy_sd    2.910730
vo_mean     2.764213
uostr_mean  2.569658
vostr_mean  2.536394
uo_mean     2.474835
pred_var    1.992893
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  2217.12
2          12     AGI_0m          8   ssh_mean   266.83
3          12     AGI_0m         10 bathy_mean   171.01
4          14   AGI_250m         12     AGI_0m   121.70
5           6    vo_mean          3   sal_mean   119.30
6           7 vostr_mean          2  temp_mean    89.77
7          14   AGI_250m          2  temp_mean    85.04
8          13    AGI_60m         10 bathy_mean    70.39
9          10 bathy_mean          3   sal_mean    54.79
10         10 bathy_mean          2  temp_mean    54.17
11         11   bathy_sd          3   sal_mean    51.71
[1] "External percent deviance explained"
[1] 0.4637131

[1] "TPR"
[1] 0.7096986
[1] "TSS"
[1] 0.6705067
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3419866 0.7360071 0.920779 0.9997123         0.4637131 0.5112041
explore_brt(mod_file_path = brt_outputs_Ntag[3], 
            test_data = agi_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.6484015
Correlation        0.7886623
AUC                0.9489000
Per.Expl          53.2273293
cvDeviance         0.8776781
cvCorrelation      0.6485564
cvAUC              0.8704400
cvPer.Expl        36.6883816
[1] "Relative influence of predictor variables"

             rel.inf
dist_coast 22.326115
AGI_0m     13.657497
lat        10.234998
AGI_250m   10.174094
temp_mean   8.542396
sal_mean    7.029363
bathy_mean  4.398889
chl_mean    3.838414
ssh_mean    3.348338
AGI_60m     3.161974
mld_mean    2.653839
uo_mean     1.954712
vo_mean     1.922921
uostr_mean  1.732725
pred_var    1.726678
vostr_mean  1.726454
bathy_sd    1.570595
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1330.23
2           3  temp_mean          1        lat   514.47
3          13     AGI_0m          1        lat   160.00
4          13     AGI_0m          9   ssh_mean   146.13
5          13     AGI_0m         11 bathy_mean   137.65
6           6 uostr_mean          1        lat   137.36
7          16   AGI_250m         13     AGI_0m    93.98
8          14 dist_coast          8 vostr_mean    93.55
9          14 dist_coast          4   sal_mean    81.11
10          8 vostr_mean          4   sal_mean    71.57
11          8 vostr_mean          3  temp_mean    70.69
12         12   bathy_sd          4   sal_mean    60.30
13         13     AGI_0m          4   sal_mean    59.27
14          5    uo_mean          3  temp_mean    57.28
[1] "External percent deviance explained"
[1] 0.4821228

[1] "TPR"
[1] 0.71269
[1] "TSS"
[1] 0.6885405
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3356191 0.7477048 0.926785 0.9980565         0.4821228 0.5322733
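
As a quick check on how the printed metrics relate to one another, the training percent deviance explained (Per.Expl) can be reproduced from the total and residual deviance reported for each model. The sketch below uses the values printed for the last model in this section (AGI at 0m, 60m, and 250m with spatial predictors). The helper name `pct_dev_explained` is hypothetical and is not part of `explore_brt`.

```r
# Hypothetical helper (not part of explore_brt): percent deviance explained
# computed from the total (null) deviance and the residual deviance.
pct_dev_explained <- function(total_dev, resid_dev) {
  100 * (total_dev - resid_dev) / total_dev
}

# Values copied from the "Model performance metrics" block printed above
total_dev <- 1.3862829
resid_dev <- 0.6484015

per_expl <- pct_dev_explained(total_dev, resid_dev)
print(round(per_expl, 3))
```

The result should agree with the printed Per.Expl to within rounding of the inputs. In the outputs shown here, Per.Expl divided by 100 also matches the PseudoR2 column of the evaluation table.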

Summary table of results

output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
model                       percent_explained  deviance_exp  TPR_mean    TSS    AUC   RMSE  SpearmanCor  PseudoR2
base_0m_Nspat_Ntag                     42.389         0.385     0.695  0.613  0.892  0.371        0.679     0.424
do_0m_Nspat_Ntag                       49.447         0.450     0.708  0.671  0.917  0.347        0.727     0.494
do_0m_Yspat_Ntag                       52.287         0.477     0.712  0.696  0.926  0.337        0.745     0.523
do_0m_60m_Nspat_Ntag                   50.737         0.463     0.710  0.685  0.922  0.342        0.737     0.507
do_0m_250m_Nspat_Ntag                  51.008         0.467     0.711  0.680  0.923  0.340        0.740     0.510
do_0m_60m_250m_Nspat_Ntag              51.846         0.473     0.712  0.694  0.925  0.338        0.744     0.518
do_0m_60m_250m_Yspat_Ntag              54.843         0.500     0.717  0.718  0.934  0.328        0.762     0.548
agi_0m_Nspat_Ntag                      48.505         0.437     0.705  0.652  0.911  0.352        0.716     0.485
agi_0m_Yspat_Ntag                      52.327         0.474     0.711  0.690  0.924  0.338        0.742     0.523
agi_0m_60m_Nspat_Ntag                  50.888         0.458     0.709  0.672  0.919  0.344        0.732     0.509
agi_0m_250m_Nspat_Ntag                 49.239         0.448     0.707  0.663  0.915  0.348        0.724     0.492
agi_0m_60m_250m_Nspat_Ntag             51.120         0.464     0.710  0.671  0.912  0.342        0.736     0.511
agi_0m_60m_250m_Yspat_Ntag             53.227         0.482     0.713  0.689  0.927  0.336        0.748     0.532
output_sum_Ntag_Nspat <- output_sum_Ntag %>%
  filter(!grepl("Yspat", model))

ggplot(output_sum_Ntag_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50', 
                  max.overlaps = 20, 
                  label.size = 0.5)

Conclusions from initial models w/o tag ID

  • These models were all developed using predictor data at a daily resolution. Considering only models with no spatial predictors, the DO model with DO data at 0m, 60m, and 250m performed best; the comparable AGI model had lower TSS and AUC scores.

  • The DO and AGI models both performed better relative to the base model.

  • DO at 0m and DO at 250m were the two predictors with the highest relative influence, while DO at 60m was considerably lower in the list. This pattern held whether or not spatial predictor variables were included. Still, performance metrics improved for the DO_0m_60m_250m model relative to the DO_0m_250m model. Partial plot patterns for DO at 0m and 250m were the same as in the original models that included tag ID as a predictor (sweet spot for 0m, negative relationship for 250m).

  • AGI at 250m was the most important predictor variable, followed by temperature and AGI at 0m (the two had nearly identical relative influence values). However, AGI at 0m became more influential when spatial predictors were included. Differences in model performance between the AGI_0m_250m and AGI_0m_60m_250m models were small, as they were for the DO models. Partial plot patterns for AGI at 0m and 250m remained the same as described above.
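
The comparisons in these bullets can be traced back to the summary table with a few lines of base R. The sketch below is only an illustration: the values are copied by hand from the non-spatial (Nspat) rows of the table rather than read from the CSV, and the object names (`nspat`, `tss_of`, `gain_do`, `gain_agi`) are hypothetical.

```r
# Non-spatial (Nspat) rows copied by hand from the summary table above
nspat <- data.frame(
  model = c("base_0m_Nspat_Ntag",
            "do_0m_Nspat_Ntag", "do_0m_60m_Nspat_Ntag",
            "do_0m_250m_Nspat_Ntag", "do_0m_60m_250m_Nspat_Ntag",
            "agi_0m_Nspat_Ntag", "agi_0m_60m_Nspat_Ntag",
            "agi_0m_250m_Nspat_Ntag", "agi_0m_60m_250m_Nspat_Ntag"),
  TSS = c(0.613, 0.671, 0.685, 0.680, 0.694, 0.652, 0.672, 0.663, 0.671),
  AUC = c(0.892, 0.917, 0.922, 0.923, 0.925, 0.911, 0.919, 0.915, 0.912),
  stringsAsFactors = FALSE
)

# Rank the models by TSS, highest first
ranked <- nspat[order(-nspat$TSS), ]
best_model <- ranked$model[1]

# Do all DO and AGI models beat the base model on TSS?
beats_base <- all(nspat$TSS[-1] > nspat$TSS[1])

# Change in TSS from adding the 60m layer to the 0m + 250m model
tss_of <- function(name) nspat$TSS[nspat$model == name]
gain_do  <- tss_of("do_0m_60m_250m_Nspat_Ntag")  - tss_of("do_0m_250m_Nspat_Ntag")
gain_agi <- tss_of("agi_0m_60m_250m_Nspat_Ntag") - tss_of("agi_0m_250m_Nspat_Ntag")

print(ranked)
print(c(gain_do = gain_do, gain_agi = gain_agi))
```

Ranking on a single metric is a simplification; the same pattern could be repeated for AUC or percent deviance explained before drawing conclusions.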

Base models w/o tag ID and w/ data at seasonal and annual resolutions

For these models, the environmental raster data were averaged by season and by year. Environmental values were then extracted from these averaged rasters at the observed and pseudo-absence locations, with each location matched to the raster for its season or year.
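
The averaging and matching steps can be sketched in base R on a toy data set. This is not the workflow used to build the models: the daily values are simulated, grid cells stand in for raster pixels, and the month-to-season mapping and all object names (`daily`, `season_of`, `seas_mean`, `ann_mean`, `locs`) are assumptions made for illustration.

```r
# Toy stand-in for a daily raster stack: one row per day and grid cell
set.seed(1)
days  <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
daily <- data.frame(
  date = rep(days, times = 4),
  cell = rep(1:4, each = length(days))
)
daily$o2 <- rnorm(nrow(daily), mean = 200, sd = 10)  # simulated values

# Assumed season definition by calendar month (the source does not state one)
season_of <- function(d) {
  m <- as.integer(format(d, "%m"))
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}
daily$season <- season_of(daily$date)
daily$year   <- as.integer(format(daily$date, "%Y"))

# Average the daily values per cell by season and per cell by year
seas_mean <- aggregate(o2 ~ cell + season, data = daily, FUN = mean)
ann_mean  <- aggregate(o2 ~ cell + year,   data = daily, FUN = mean)
names(seas_mean)[3] <- "o2_mean_seas"
names(ann_mean)[3]  <- "o2_mean_ann"

# Toy locations (presences or pseudo-absences), each with a date and a cell
locs <- data.frame(
  date = as.Date(c("2010-02-14", "2010-07-04", "2011-10-31")),
  cell = c(1L, 3L, 4L)
)
locs$season <- season_of(locs$date)
locs$year   <- as.integer(format(locs$date, "%Y"))

# Match each location to the average for its season and for its year
locs <- merge(locs, seas_mean, by = c("cell", "season"), all.x = TRUE)
locs <- merge(locs, ann_mean,  by = c("cell", "year"),   all.x = TRUE)
print(locs)
```

Here the seasonal average pools the same season across years; averaging each season within each year separately would only require adding `year` to the first `aggregate` call and to the first merge key.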

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = base_test_seasonal)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.8809143
Correlation        0.6567031
AUC                0.8771000
Per.Expl          36.4552432
cvDeviance         0.9683417
cvCorrelation      0.5954101
cvAUC              0.8407800
cvPer.Expl        30.1486725
[1] "Relative influence of predictor variables"

             rel.inf
vostr_mean 17.333628
uostr_mean 13.508666
bathy_mean 13.232902
vo_mean    11.309019
temp_mean  10.168831
ssh_mean    9.233046
sal_mean    8.770811
mld_mean    5.895077
chl_mean    5.118059
uo_mean     2.402328
bathy_sd    1.581218
pred_var    1.446416
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          2   sal_mean          1   mld_mean   133.60
2         10 bathy_mean          6 uostr_mean    81.44
3         10 bathy_mean          2   sal_mean    73.84
4          8 vostr_mean          4  temp_mean    63.53
5          6 uostr_mean          4  temp_mean    60.80
6          7    vo_mean          4  temp_mean    56.52
7          4  temp_mean          2   sal_mean    48.36
[1] "External percent deviance explained"
[1] 0.3488583

[1] "TPR"
[1] 0.682944
[1] "TSS"
[1] 0.5561876
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8350 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
     RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.38543 0.6400077 0.8671911  1.005559         0.3488583 0.3645524
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = base_test_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.7271350
Correlation        0.7489326
AUC                0.9294000
Per.Expl          47.5482367
cvDeviance         0.9391765
cvCorrelation      0.6117233
cvAUC              0.8502500
cvPer.Expl        32.2526569
[1] "Relative influence of predictor variables"

             rel.inf
vostr_mean 20.172003
uostr_mean 13.485912
sal_mean   10.802089
bathy_mean  9.230921
vo_mean     8.994202
mld_mean    7.785211
chl_mean    7.103797
temp_mean   6.996080
ssh_mean    5.855904
uo_mean     3.741738
bathy_sd    2.977250
pred_var    2.854894
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          3   ssh_mean          2   sal_mean   854.62
2          8 vostr_mean          4  temp_mean   733.30
3          8 vostr_mean          6 uostr_mean   401.72
4          6 uostr_mean          3   ssh_mean   381.20
5         10 bathy_mean          8 vostr_mean   275.00
6          6 uostr_mean          2   sal_mean   226.43
7          9   chl_mean          3   ssh_mean   186.31
[1] "External percent deviance explained"
[1] 0.4327516

[1] "TPR"
[1] 0.7036107
[1] "TSS"
[1] 0.6509996
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4900 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.354432 0.7109286 0.9086044  1.002374         0.4327516 0.4754824

DO models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862856
Residual.Deviance  0.8087782
Correlation        0.7015642
AUC                0.9035000
Per.Expl          41.6586195
cvDeviance         0.9224917
cvCorrelation      0.6249818
cvAUC              0.8574200
cvPer.Expl        33.4558676
[1] "Relative influence of predictor variables"

                     rel.inf
o2_mean_250m_seas 25.0249198
o2_mean_0m_seas   24.7052133
o2_mean_60m_seas   9.0417226
temp_mean          8.8706495
bathy_mean         8.1790158
sal_mean           5.6940018
chl_mean           4.2801120
ssh_mean           3.9981758
mld_mean           2.3770798
bathy_sd           1.6902741
vostr_mean         1.5184271
vo_mean            1.3453968
uo_mean            1.2410073
uostr_mean         1.0733586
pred_var           0.9606457
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1          10        bathy_mean          2       temp_mean   204.60
2          15 o2_mean_250m_seas         13 o2_mean_0m_seas   120.96
3          13   o2_mean_0m_seas          2       temp_mean   108.56
4          14  o2_mean_60m_seas          3        sal_mean   104.18
5          13   o2_mean_0m_seas          8        ssh_mean   102.23
6          13   o2_mean_0m_seas         10      bathy_mean    73.29
7          13   o2_mean_0m_seas          1        chl_mean    56.07
8          10        bathy_mean          3        sal_mean    43.98
9          14  o2_mean_60m_seas          2       temp_mean    35.27
10         14  o2_mean_60m_seas         10      bathy_mean    33.37
11          7        vostr_mean          3        sal_mean    28.21
[1] "External percent deviance explained"
[1] 0.4130282

[1] "TPR"
[1] 0.6999559
[1] "TSS"
[1] 0.6280722
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE      Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3615509 0.697535 0.901256  1.002082         0.4130282 0.4165862
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862856
Residual.Deviance  0.7888981
Correlation        0.7126061
AUC                0.9098000
Per.Expl          43.0926725
cvDeviance         0.9023416
cvCorrelation      0.6369539
cvAUC              0.8648600
cvPer.Expl        34.9093998
[1] "Relative influence of predictor variables"

                     rel.inf
dist_coast        24.1818242
o2_mean_0m_seas   18.7439025
o2_mean_250m_seas 12.5092916
temp_mean          7.7772110
o2_mean_60m_seas   6.6435930
sal_mean           6.4633552
lat                5.4361197
chl_mean           4.0752844
bathy_mean         3.0942439
ssh_mean           2.6982385
mld_mean           2.0105217
vostr_mean         1.3598262
vo_mean            1.0937062
uo_mean            1.0536335
uostr_mean         0.9697403
bathy_sd           0.9471132
pred_var           0.9423949
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1          11        bathy_mean          3       temp_mean    97.65
2          15   o2_mean_0m_seas          9        ssh_mean    79.80
3          15   o2_mean_0m_seas          3       temp_mean    68.01
4          17 o2_mean_250m_seas         15 o2_mean_0m_seas    57.47
5          16  o2_mean_60m_seas          4        sal_mean    56.75
6           4          sal_mean          1             lat    40.08
7          15   o2_mean_0m_seas          1             lat    39.48
8          15   o2_mean_0m_seas          2        chl_mean    36.64
9           6        uostr_mean          3       temp_mean    35.68
10         16  o2_mean_60m_seas          8      vostr_mean    30.12
11          8        vostr_mean          4        sal_mean    30.08
12          6        uostr_mean          1             lat    27.81
13         13        dist_coast          4        sal_mean    25.67
14         16  o2_mean_60m_seas          3       temp_mean    21.36
[1] "External percent deviance explained"
[1] 0.4291891

[1] "TPR"
[1] 0.7037666
[1] "TSS"
[1] 0.6477985
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3555959 0.7105654 0.9089065  1.002524         0.4291891 0.4309267
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862856
Residual.Deviance  0.7161664
Correlation        0.7594624
AUC                0.9361000
Per.Expl          48.3391884
cvDeviance         0.9443926
cvCorrelation      0.6087538
cvAUC              0.8489600
cvPer.Expl        31.8760420
[1] "Relative influence of predictor variables"

                   rel.inf
o2_mean_250m_ann 22.815335
temp_mean        12.948825
o2_mean_0m_ann   10.632947
o2_mean_60m_ann   8.366439
bathy_mean        8.305431
sal_mean          7.217211
chl_mean          6.745035
ssh_mean          4.679201
bathy_sd          3.764948
mld_mean          3.059999
vostr_mean        2.774661
uo_mean           2.445189
vo_mean           2.303757
pred_var          2.190658
uostr_mean        1.750363
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index      var2.names int.size
1          13   o2_mean_0m_ann          2       temp_mean   270.07
2          14  o2_mean_60m_ann          2       temp_mean   163.36
3          10       bathy_mean          2       temp_mean   143.17
4          14  o2_mean_60m_ann          3        sal_mean   118.14
5          10       bathy_mean          8        ssh_mean   103.69
6          13   o2_mean_0m_ann          3        sal_mean   103.08
7          14  o2_mean_60m_ann         13  o2_mean_0m_ann    95.55
8           8         ssh_mean          1        chl_mean    89.14
9          14  o2_mean_60m_ann         10      bathy_mean    79.14
10          7       vostr_mean          3        sal_mean    71.34
11         15 o2_mean_250m_ann         14 o2_mean_60m_ann    60.83
[1] "External percent deviance explained"
[1] 0.4783583

[1] "TPR"
[1] 0.7163317
[1] "TSS"
[1] 0.7136478
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4700 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3344052 0.7557594 0.9340804 0.9990281         0.4783583 0.4833919
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862856
Residual.Deviance  0.6536089
Correlation        0.7921300
AUC                0.9520000
Per.Expl          52.8517851
cvDeviance         0.9204287
cvCorrelation      0.6225846
cvAUC              0.8570200
cvPer.Expl        33.6046878
[1] "Relative influence of predictor variables"

                   rel.inf
dist_coast       20.297776
o2_mean_250m_ann 12.518497
temp_mean         9.519565
lat               8.436190
sal_mean          7.739106
chl_mean          6.172511
o2_mean_60m_ann   5.512454
o2_mean_0m_ann    5.467624
bathy_mean        4.211227
ssh_mean          3.787317
mld_mean          2.916206
vostr_mean        2.473318
pred_var          2.428564
uo_mean           2.233912
bathy_sd          2.205305
vo_mean           2.138529
uostr_mean        1.941896
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index      var1.names var2.index var2.names int.size
1          16 o2_mean_60m_ann          1        lat   185.52
2          16 o2_mean_60m_ann          3  temp_mean   172.29
3          15  o2_mean_0m_ann          3  temp_mean   169.48
4          16 o2_mean_60m_ann         13 dist_coast   165.08
5           6      uostr_mean          1        lat   153.50
6           3       temp_mean          1        lat    99.53
7           9        ssh_mean          3  temp_mean    93.04
8          13      dist_coast          4   sal_mean    92.60
9          11      bathy_mean          3  temp_mean    89.52
10         16 o2_mean_60m_ann          4   sal_mean    77.49
11         15  o2_mean_0m_ann          4   sal_mean    75.36
12          9        ssh_mean          2   chl_mean    69.98
13          8      vostr_mean          4   sal_mean    62.33
14         11      bathy_mean          9   ssh_mean    60.04
[1] "External percent deviance explained"
[1] 0.5204087

[1] "TPR"
[1] 0.7236268
[1] "TSS"
[1] 0.7422174
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3173993 0.7858205 0.9486514  0.998094         0.5204087 0.5285179
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862856
Residual.Deviance  0.5979158
Correlation        0.8131143
AUC                0.9593000
Per.Expl          56.8692188
cvDeviance         0.8555333
cvCorrelation      0.6625779
cvAUC              0.8793100
cvPer.Expl        38.2859278
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_250m_ann  16.178947
o2_mean_0m        15.177742
o2_mean_0m_seas    9.907204
temp_mean          6.497884
o2_mean_250m_seas  6.092396
o2_mean_60m_seas   5.404267
bathy_mean         5.055538
o2_mean_60m_ann    4.745217
sal_mean           4.411907
chl_mean           3.792187
o2_mean_0m_ann     3.029557
o2_mean_250m       2.746472
ssh_mean           2.606318
o2_mean_60m        2.599873
mld_mean           2.186092
vostr_mean         1.834691
bathy_sd           1.699575
vo_mean            1.699286
uo_mean            1.648957
pred_var           1.562635
uostr_mean         1.123254
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index        var2.names int.size
1          16   o2_mean_0m_seas          1        o2_mean_0m   314.48
2           3         temp_mean          1        o2_mean_0m   289.71
3          11        bathy_mean          3         temp_mean   193.84
4          18 o2_mean_250m_seas         14      o2_mean_250m   183.63
5          20   o2_mean_60m_ann         11        bathy_mean   145.03
6          19    o2_mean_0m_ann         16   o2_mean_0m_seas    85.20
7          16   o2_mean_0m_seas          9          ssh_mean    79.30
8          13       o2_mean_60m         11        bathy_mean    71.76
9          16   o2_mean_0m_seas         11        bathy_mean    65.70
10         20   o2_mean_60m_ann          3         temp_mean    62.46
11         21  o2_mean_250m_ann         18 o2_mean_250m_seas    55.73
12         19    o2_mean_0m_ann          4          sal_mean    51.58
13          8        vostr_mean          4          sal_mean    51.37
14         15          pred_var          8        vostr_mean    51.21
15         17  o2_mean_60m_seas          4          sal_mean    48.85
16         11        bathy_mean          1        o2_mean_0m    44.77
17         19    o2_mean_0m_ann          3         temp_mean    44.26
18         12          bathy_sd          8        vostr_mean    44.13
19         11        bathy_mean          4          sal_mean    43.04
20         11        bathy_mean          9          ssh_mean    42.54
21         10          mld_mean          8        vostr_mean    40.56
22          5           uo_mean          3         temp_mean    39.63
[1] "External percent deviance explained"
[1] 0.5616733

[1] "TPR"
[1] 0.7273396
[1] "TSS"
[1] 0.7702744
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5250 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3019287 0.8070669 0.956105 0.9990883         0.5616733 0.5686922
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862856
Residual.Deviance  0.5671679
Correlation        0.8278610
AUC                0.9657000
Per.Expl          59.0872269
cvDeviance         0.8361002
cvCorrelation      0.6730030
cvAUC              0.8850800
cvPer.Expl        39.6877369
[1] "Relative influence of predictor variables"

                    rel.inf
dist_coast        16.282045
o2_mean_0m        12.496115
o2_mean_250m_ann   8.725825
o2_mean_0m_seas    8.005763
temp_mean          6.076514
o2_mean_60m_seas   4.786964
sal_mean           4.595102
lat                4.281887
o2_mean_250m_seas  3.733350
o2_mean_60m_ann    3.494279
chl_mean           3.473096
o2_mean_250m       2.798671
bathy_mean         2.791004
o2_mean_0m_ann     2.672064
ssh_mean           2.344051
o2_mean_60m        2.321452
mld_mean           2.100745
vostr_mean         1.812754
pred_var           1.605802
uo_mean            1.593011
vo_mean            1.490482
bathy_sd           1.280824
uostr_mean         1.238201
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index        var2.names int.size
1           4         temp_mean          2        o2_mean_0m   281.68
2           4         temp_mean          1               lat   188.66
3          22   o2_mean_60m_ann         14        dist_coast   161.35
4          20 o2_mean_250m_seas         16      o2_mean_250m   153.50
5          18   o2_mean_0m_seas          2        o2_mean_0m   137.29
6          14        dist_coast          9        vostr_mean    97.66
7          21    o2_mean_0m_ann         18   o2_mean_0m_seas    87.06
8          18   o2_mean_0m_seas          3          chl_mean    84.54
9          17          pred_var          9        vostr_mean    69.98
10         15       o2_mean_60m         14        dist_coast    66.32
11         12        bathy_mean          4         temp_mean    65.44
12         22   o2_mean_60m_ann          1               lat    62.14
13         18   o2_mean_0m_seas         10          ssh_mean    58.26
14         21    o2_mean_0m_ann          5          sal_mean    54.97
15         23  o2_mean_250m_ann         20 o2_mean_250m_seas    46.00
16         14        dist_coast          5          sal_mean    44.14
17         17          pred_var          5          sal_mean    44.13
18         22   o2_mean_60m_ann          4         temp_mean    40.49
19          9        vostr_mean          5          sal_mean    38.35
20          5          sal_mean          4         temp_mean    37.25
21         13          bathy_sd          9        vostr_mean    37.03
22         22   o2_mean_60m_ann         12        bathy_mean    36.84
23         21    o2_mean_0m_ann          4         temp_mean    36.29
24         19  o2_mean_60m_seas          5          sal_mean    35.59
25         12        bathy_mean         10          ssh_mean    33.24
26         20 o2_mean_250m_seas         18   o2_mean_0m_seas    29.74
[1] "External percent deviance explained"
[1] 0.582118

[1] "TPR"
[1] 0.7302281
[1] "TSS"
[1] 0.7874428
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2928683 0.820827 0.9619062 0.9987813          0.582118 0.5908723

AGI models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862835
Residual.Deviance  0.8251404
Correlation        0.6939540
AUC                0.8991000
Per.Expl          40.4782338
cvDeviance         0.9404219
cvCorrelation      0.6134191
cvAUC              0.8508300
cvPer.Expl        32.1623635
[1] "Relative influence of predictor variables"

                rel.inf
AGI_250m_seas 22.274332
temp_mean     18.540774
bathy_mean    12.674723
AGI_0m_seas   10.926103
sal_mean       8.104305
AGI_60m_seas   6.579498
chl_mean       4.148226
ssh_mean       4.078015
mld_mean       2.972459
vostr_mean     2.221736
vo_mean        1.866460
bathy_sd       1.737114
uo_mean        1.398152
uostr_mean     1.319364
pred_var       1.158739
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          13   AGI_0m_seas          2  temp_mean   250.88
2          14  AGI_60m_seas          2  temp_mean   103.94
3          10    bathy_mean          2  temp_mean    77.52
4          15 AGI_250m_seas          3   sal_mean    56.09
5          15 AGI_250m_seas          2  temp_mean    56.09
6           7    vostr_mean          2  temp_mean    42.38
7          13   AGI_0m_seas          9   mld_mean    37.60
8          14  AGI_60m_seas         10 bathy_mean    35.78
9           2     temp_mean          1   chl_mean    35.67
10          6       vo_mean          3   sal_mean    32.86
11         13   AGI_0m_seas          8   ssh_mean    30.74
[1] "External percent deviance explained"
[1] 0.3934875

[1] "TPR"
[1] 0.6960424
[1] "TSS"
[1] 0.6104789
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3684756 0.6826742 0.893425  1.009876         0.3934875 0.4047823
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862835
Residual.Deviance  0.8006785
Correlation        0.7068682
AUC                0.9066000
Per.Expl          42.2427977
cvDeviance         0.9168479
cvCorrelation      0.6273558
cvAUC              0.8592600
cvPer.Expl        33.8628864
[1] "Relative influence of predictor variables"

                 rel.inf
dist_coast    28.8893087
temp_mean     10.5432446
AGI_0m_seas   10.1288917
AGI_250m_seas 10.0417742
lat            8.8687497
sal_mean       7.4016025
AGI_60m_seas   4.3633851
bathy_mean     4.1233081
chl_mean       3.7589339
ssh_mean       3.0280635
mld_mean       2.1940657
vostr_mean     1.5684098
vo_mean        1.2555925
pred_var       1.0180637
uo_mean        0.9757527
bathy_sd       0.9564136
uostr_mean     0.8844400
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          11    bathy_mean          3   temp_mean    97.96
2          16  AGI_60m_seas          3   temp_mean    97.27
3          15   AGI_0m_seas          3   temp_mean    75.27
4          17 AGI_250m_seas          4    sal_mean    68.92
5           4      sal_mean          1         lat    61.71
6           8    vostr_mean          3   temp_mean    52.21
7           3     temp_mean          1         lat    38.14
8           3     temp_mean          2    chl_mean    35.84
9          15   AGI_0m_seas         14    pred_var    31.75
10         15   AGI_0m_seas         10    mld_mean    29.54
11          6    uostr_mean          1         lat    27.49
12         17 AGI_250m_seas         15 AGI_0m_seas    24.40
13          9      ssh_mean          1         lat    21.37
14         13    dist_coast          4    sal_mean    20.69
[1] "External percent deviance explained"
[1] 0.4108189

[1] "TPR"
[1] 0.6997377
[1] "TSS"
[1] 0.6225376
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.3625151 0.6956699 0.9008476  1.010332         0.4108189 0.422428
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862835
Residual.Deviance  0.7076969
Correlation        0.7638716
AUC                0.9383000
Per.Expl          48.9500550
cvDeviance         0.9482559
cvCorrelation      0.6060927
cvAUC              0.8473700
cvPer.Expl        31.5972584
[1] "Relative influence of predictor variables"

               rel.inf
AGI_250m_ann 22.322069
temp_mean    17.196598
bathy_mean    8.824421
sal_mean      8.746002
AGI_60m_ann   6.666562
chl_mean      6.027215
AGI_0m_ann    5.506478
ssh_mean      4.904209
mld_mean      3.679155
vostr_mean    3.311048
bathy_sd      2.955120
vo_mean       2.697215
uostr_mean    2.496864
uo_mean       2.416078
pred_var      2.250965
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1          15 AGI_250m_ann         13 AGI_0m_ann   145.88
2           2    temp_mean          1   chl_mean   139.37
3           6      vo_mean          3   sal_mean   114.16
4           7   vostr_mean          2  temp_mean   101.76
5          15 AGI_250m_ann          2  temp_mean    81.32
6          12     pred_var          4    uo_mean    70.90
7           3     sal_mean          2  temp_mean    63.15
8           8     ssh_mean          2  temp_mean    63.05
9          14  AGI_60m_ann          8   ssh_mean    61.26
10         13   AGI_0m_ann          2  temp_mean    55.14
11          8     ssh_mean          1   chl_mean    53.38
[1] "External percent deviance explained"
[1] 0.4801424

[1] "TPR"
[1] 0.7163946
[1] "TSS"
[1] 0.7031821
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4800 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3344379 0.7553953 0.934186  1.006753         0.4801424 0.4895005
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862835
Residual.Deviance  0.6515083
Correlation        0.7920543
AUC                0.9517000
Per.Expl          53.0032392
cvDeviance         0.9139383
cvCorrelation      0.6265787
cvAUC              0.8589900
cvPer.Expl        34.0727658
[1] "Relative influence of predictor variables"

               rel.inf
dist_coast   22.423691
AGI_250m_ann 10.765829
temp_mean    10.510588
lat           9.523071
sal_mean      7.696498
chl_mean      5.973845
AGI_60m_ann   4.842763
AGI_0m_ann    4.753056
ssh_mean      3.983158
bathy_mean    3.959771
mld_mean      2.875991
vostr_mean    2.499355
vo_mean       2.351126
pred_var      2.277252
uo_mean       1.961842
bathy_sd      1.834693
uostr_mean    1.767471
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3    temp_mean          1        lat   162.87
2           8   vostr_mean          3  temp_mean   139.21
3           3    temp_mean          2   chl_mean   125.54
4          16  AGI_60m_ann          1        lat    96.96
5           6   uostr_mean          1        lat    93.81
6          13   dist_coast         10   mld_mean    91.10
7          11   bathy_mean          3  temp_mean    90.92
8          15   AGI_0m_ann          3  temp_mean    86.23
9          17 AGI_250m_ann         13 dist_coast    85.41
10         13   dist_coast          4   sal_mean    77.67
11          8   vostr_mean          1        lat    76.13
12         16  AGI_60m_ann         13 dist_coast    74.50
13         15   AGI_0m_ann          4   sal_mean    67.33
14         16  AGI_60m_ann          9   ssh_mean    59.22
[1] "External percent deviance explained"
[1] 0.5203384

[1] "TPR"
[1] 0.723391
[1] "TSS"
[1] 0.7405006
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3183485 0.7838702 0.9482195  1.006969         0.5203384 0.5300324
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862835
Residual.Deviance  0.5664194
Correlation        0.8279527
AUC                0.9661000
Per.Expl          59.1411534
cvDeviance         0.8483558
cvCorrelation      0.6645373
cvAUC              0.8805500
cvPer.Expl        38.8035803
[1] "Relative influence of predictor variables"

                rel.inf
AGI_250m_ann  12.823573
temp_mean     12.702689
AGI_0m        11.412626
bathy_mean     7.586913
AGI_0m_seas    7.377428
sal_mean       5.367600
AGI_60m_ann    4.888697
AGI_250m_seas  4.318591
AGI_0m_ann     3.839247
AGI_250m       3.720149
AGI_60m_seas   3.717035
ssh_mean       3.636899
chl_mean       3.126271
vostr_mean     2.371861
mld_mean       2.186303
AGI_60m        2.058995
bathy_sd       2.002734
vo_mean        1.870397
uo_mean        1.766527
pred_var       1.653718
uostr_mean     1.571746
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          12        AGI_0m          2   temp_mean  1419.86
2          20   AGI_60m_ann         16 AGI_0m_seas   311.03
3          12        AGI_0m         10  bathy_mean   135.86
4           7    vostr_mean          2   temp_mean   124.41
5          16   AGI_0m_seas         14    AGI_250m   100.59
6          19    AGI_0m_ann         16 AGI_0m_seas    94.31
7          12        AGI_0m          3    sal_mean    91.29
8          16   AGI_0m_seas          2   temp_mean    69.16
9          18 AGI_250m_seas         14    AGI_250m    67.35
10         20   AGI_60m_ann          8    ssh_mean    65.77
11         12        AGI_0m          8    ssh_mean    56.81
12         13       AGI_60m         10  bathy_mean    51.64
13         16   AGI_0m_seas         11    bathy_sd    47.71
14         16   AGI_0m_seas          7  vostr_mean    44.87
15         19    AGI_0m_ann         14    AGI_250m    42.03
16         20   AGI_60m_ann         10  bathy_mean    36.72
17         16   AGI_0m_seas          9    mld_mean    35.92
18          8      ssh_mean          3    sal_mean    35.35
19         17  AGI_60m_seas         12      AGI_0m    34.64
20         21  AGI_250m_ann          2   temp_mean    34.02
21         16   AGI_0m_seas         15    pred_var    33.74
22         15      pred_var          7  vostr_mean    32.25
[1] "External percent deviance explained"
[1] 0.581436

[1] "TPR"
[1] 0.730664
[1] "TSS"
[1] 0.7829999
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2937248 0.8196993 0.9627405  1.002194          0.581436 0.5914115
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862835
Residual.Deviance  0.5708909
Correlation        0.8249740
AUC                0.9649000
Per.Expl          58.8186041
cvDeviance         0.8334177
cvCorrelation      0.6724646
cvAUC              0.8849000
cvPer.Expl        39.8811495
[1] "Relative influence of predictor variables"

                rel.inf
dist_coast    19.788149
AGI_0m        10.839193
lat            6.905782
temp_mean      6.888633
AGI_0m_seas    6.749013
AGI_250m_ann   6.354972
sal_mean       5.264005
AGI_60m_ann    4.058445
AGI_0m_ann     3.397817
AGI_60m_seas   3.367171
AGI_250m       3.334527
bathy_mean     3.156230
chl_mean       2.908569
ssh_mean       2.889818
AGI_250m_seas  2.355303
AGI_60m        1.897931
mld_mean       1.739738
pred_var       1.565287
vo_mean        1.551979
vostr_mean     1.506959
uo_mean        1.287787
uostr_mean     1.126974
bathy_sd       1.065720
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          13        AGI_0m          3   temp_mean   868.54
2           3     temp_mean          1         lat   252.16
3          22   AGI_60m_ann         18 AGI_0m_seas   241.04
4           8    vostr_mean          3   temp_mean   137.38
5           6    uostr_mean          1         lat   120.49
6          21    AGI_0m_ann         18 AGI_0m_seas   108.12
7          13        AGI_0m         11  bathy_mean   102.44
8          13        AGI_0m          9    ssh_mean    70.50
9          20 AGI_250m_seas          4    sal_mean    69.96
10         23  AGI_250m_ann         14  dist_coast    69.31
11         18   AGI_0m_seas         16    AGI_250m    66.03
12          4      sal_mean          1         lat    65.97
13         20 AGI_250m_seas         16    AGI_250m    57.06
14         13        AGI_0m          1         lat    56.19
15         22   AGI_60m_ann         11  bathy_mean    51.55
16         14    dist_coast         10    mld_mean    50.67
17         21    AGI_0m_ann          1         lat    47.56
18         17      pred_var          8  vostr_mean    44.15
19         22   AGI_60m_ann          1         lat    38.77
20         22   AGI_60m_ann         14  dist_coast    38.74
21         18   AGI_0m_seas          3   temp_mean    36.97
22         18   AGI_0m_seas         15     AGI_60m    35.31
23         18   AGI_0m_seas         17    pred_var    31.98
24         18   AGI_0m_seas         13      AGI_0m    31.73
25         16      AGI_250m         11  bathy_mean    30.47
26         12      bathy_sd          7     vo_mean    29.92
[1] "External percent deviance explained"
[1] 0.5777579

[1] "TPR"
[1] 0.729998
[1] "TSS"
[1] 0.772311
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
      RMSE      Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.295842 0.816219 0.9614117  1.004362         0.5777579 0.588186

Summary table of results

output_sum_seas_ann <- read.csv(here("data/brt/mod_outputs/brt_crw_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum_seas_ann)
model                                         percent_explained  deviance_exp  TPR_mean    TSS    AUC   RMSE  SpearmanCor  PseudoR2
brt_base_0m_seas_Nspat_Ntag                              36.455         0.349     0.683  0.556  0.867  0.385        0.640     0.365
brt_base_0m_ann_Nspat_Ntag                               47.548         0.433     0.704  0.651  0.909  0.354        0.711     0.475
brt_do_0m_60m_250m_seas_Nspat_Ntag                       41.659         0.400     0.696  0.617  0.894  0.366        0.686     0.417
brt_do_0m_60m_250m_seas_Yspat_Ntag                       43.093         0.412     0.699  0.624  0.899  0.363        0.695     0.431
brt_do_0m_60m_250m_ann_Nspat_Ntag                        48.339         0.450     0.709  0.668  0.919  0.347        0.729     0.483
brt_do_0m_60m_250m_ann_Yspat_Ntag                        52.852         0.485     0.715  0.698  0.932  0.334        0.754     0.529
brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag              56.869         0.531     0.721  0.733  0.944  0.316        0.783     0.569
brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag              59.087         0.547     0.724  0.747  0.949  0.309        0.793     0.591
brt_agi_0m_60m_250m_seas_Nspat_Ntag                      40.478         0.381     0.692  0.595  0.886  0.373        0.672     0.405
brt_agi_0m_60m_250m_seas_Yspat_Ntag                      42.243         0.397     0.696  0.612  0.893  0.367        0.684     0.422
brt_agi_0m_60m_250m_ann_Nspat_Ntag                       48.950         0.442     0.706  0.659  0.914  0.350        0.722     0.490
brt_agi_0m_60m_250m_ann_Yspat_Ntag                       53.003         0.479     0.713  0.694  0.928  0.336        0.749     0.530
brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag             59.141         0.542     0.723  0.743  0.947  0.311        0.790     0.591
brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag             58.819         0.543     0.723  0.743  0.947  0.311        0.791     0.588
base_0m_daily_Nspat_Ntag                                 42.389         0.385     0.695  0.613  0.892  0.371        0.679     0.424
do_0m_daily_Nspat_Ntag                                   49.447         0.450     0.708  0.671  0.917  0.347        0.727     0.494
agi_0m_daily_Nspat_Ntag                                  48.505         0.437     0.705  0.652  0.911  0.352        0.716     0.485
# Keep only the models fit without spatial predictors ("Nspat" in the model name)
output_sum_seas_ann_Nspat <- output_sum_seas_ann %>%
  filter(!grepl("Yspat", model))

# Plot AUC against TSS for each model, colored by deviance_exp and labeled by model name
ggplot(output_sum_seas_ann_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50', 
                  max.overlaps = 20, 
                  label.size = 0.5)

Conclusions from initial seasonal/annual models

  • Seasonal and annual base models were comparable in performance to the daily-resolution base model (TSS 0.613), with the seasonal model performing slightly worse (TSS 0.556) and the annual model performing slightly better (TSS 0.651).

  • Among the models with no spatial predictor variables, the AGI model with all depth layers and resolutions performed best (TSS 0.743, AUC 0.947), but the comparable DO model performed similarly (TSS 0.733, AUC 0.944).

  • Annual models generally performed better than seasonal ones, but the models that combine data at daily, seasonal, and annual resolutions performed considerably better than either.

  • For the DO model with all depths and temporal resolutions, the two predictors with the highest relative influence (with quite comparable values) were DO_250m_annual (o2_mean_250m_ann) and DO_0m_daily (o2_mean_0m). The remaining seasonal DO predictors were also highly ranked. At each depth layer and resolution, the partial plots show either a negative relationship with DO or a "sweet spot" range of DO values.

  • For the AGI model with all depths and temporal resolutions, the top predictor variable is AGI_250m_annual (AGI_250m_ann), closely followed by daily temperature at 0 m (temp_mean). Lower down the list are AGI_0m_daily (AGI_0m), bathymetry, and AGI_0m_seasonal (AGI_0m_seas). The partial plots show trends similar to those described for the DO model.
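As a rough cross-check of the resolution comparison in the conclusions above, the sketch below averages TSS and AUC by temporal resolution for the DO and AGI models without spatial predictors. The values are copied from the summary table; the `resolution` labels and the pattern matching on model names are assumptions made for illustration, not part of the original analysis.

```r
# TSS and AUC copied from the summary table (DO and AGI models, Nspat only)
summ <- data.frame(
  model = c("brt_do_0m_60m_250m_seas_Nspat_Ntag",
            "brt_do_0m_60m_250m_ann_Nspat_Ntag",
            "brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag",
            "brt_agi_0m_60m_250m_seas_Nspat_Ntag",
            "brt_agi_0m_60m_250m_ann_Nspat_Ntag",
            "brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag"),
  TSS = c(0.617, 0.668, 0.733, 0.595, 0.659, 0.743),
  AUC = c(0.894, 0.919, 0.944, 0.886, 0.914, 0.947)
)

# Hypothetical labels derived from the model names.
# The combined pattern is checked first because those names also contain "_ann_".
summ$resolution <- ifelse(grepl("dail_seas_ann", summ$model), "daily+seasonal+annual",
                   ifelse(grepl("_ann_", summ$model), "annual", "seasonal"))

# Mean TSS and AUC per temporal resolution, sorted from lowest to highest TSS
res_means <- aggregate(cbind(TSS, AUC) ~ resolution, data = summ, FUN = mean)
res_means <- res_means[order(res_means$TSS), ]
print(res_means)
```

Averaging over only two models per group is coarse, so this is best read as a quick sanity check on the ordering rather than a formal comparison.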

Model fine-tuning and selection

Here, I take the two best-performing models from the sections above (the AGI and DO models with all depths and temporal resolutions, and without tag ID or spatial variables as predictors) to use as overfit reference models. The model options that follow exclude the wind predictors, as their relative influence was generally close to or below that of the random predictor variable (pred_var). I also included a combined model that uses AGI at 250 m and DO at 0 m across temporal resolutions. Lastly, the final models also drop DO/AGI at 60 m and at the seasonal resolution, as these were typically the variables that contributed the least relative to the other depth layers and resolutions.
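The comparison against the random predictor can be sketched as follows. The relative influence values are copied from the two all-resolution AGI outputs printed earlier (without and with spatial predictors); `below_random()` is a hypothetical helper written for this illustration and is not part of the original workflow.

```r
# Relative influence of the wind predictors and pred_var, copied from the
# brt_agi_0m_60m_250m_dail_seas_ann outputs above (Nspat and Yspat versions)
rel_inf <- data.frame(
  predictor = c("vostr_mean", "vo_mean", "uo_mean", "uostr_mean", "pred_var"),
  Nspat = c(2.371861, 1.870397, 1.766527, 1.571746, 1.653718),
  Yspat = c(1.506959, 1.551979, 1.287787, 1.126974, 1.565287)
)

# Hypothetical helper: names of predictors whose relative influence is
# strictly below that of the random predictor
below_random <- function(predictor, influence, random_name = "pred_var") {
  threshold <- influence[predictor == random_name]
  predictor[influence < threshold]
}

below_nspat <- below_random(rel_inf$predictor, rel_inf$Nspat)
below_yspat <- below_random(rel_inf$predictor, rel_inf$Yspat)

print(below_nspat)
print(below_yspat)
```

Running the same check on every fitted model would show how often each predictor falls below pred_var, which is one way to make the exclusion rule explicit.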

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.8207228
Correlation        0.7026177
AUC                0.9063000
Per.Expl          40.7968459
cvDeviance         1.0092555
cvCorrelation      0.5671984
cvAUC              0.8240500
cvPer.Expl        27.1969674
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 30.402818
temp_mean  21.775619
sal_mean   12.415293
chl_mean    9.738094
bathy_sd    7.545582
ssh_mean    7.489137
mld_mean    6.792296
pred_var    3.841160
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          6 bathy_mean          2  temp_mean   704.51
2          4   ssh_mean          2  temp_mean   449.02
3          3   sal_mean          2  temp_mean   290.44
[1] "External percent deviance explained"
[1] 0.3727562

[1] "TPR"
[1] 0.691841
[1] "TSS"
[1] 0.5944967
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4200 iterations were performed.
There were 8 predictors of which 8 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3753995 0.6673045 0.8850325  1.001694         0.3727562 0.4079685
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.5891149
Correlation        0.8163435
AUC                0.9606000
Per.Expl          57.5041190
cvDeviance         0.8440735
cvCorrelation      0.6692730
cvAUC              0.8827300
cvPer.Expl        39.1126493
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.287654
o2_mean_250m_ann  14.631246
o2_mean_0m_seas    8.910687
o2_mean_250m_seas  7.142539
temp_mean          6.627435
o2_mean_60m_ann    5.495030
o2_mean_60m_seas   5.325783
bathy_mean         4.964121
sal_mean           4.592187
o2_mean_250m       4.465235
chl_mean           4.270698
o2_mean_0m_ann     3.305816
o2_mean_60m        3.042965
ssh_mean           2.916029
mld_mean           2.724895
bathy_sd           2.244270
pred_var           2.053409
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   508.25
2          12   o2_mean_0m_seas          1   o2_mean_0m   301.24
3          14 o2_mean_250m_seas         10 o2_mean_250m   166.76
4          12   o2_mean_0m_seas          5     ssh_mean   150.68
5          12   o2_mean_0m_seas          2     chl_mean   122.58
6           4          sal_mean          3    temp_mean   101.22
7          15    o2_mean_0m_ann          4     sal_mean    89.73
8          16   o2_mean_60m_ann          7   bathy_mean    86.63
9           9       o2_mean_60m          7   bathy_mean    81.59
10          9       o2_mean_60m          8     bathy_sd    77.18
11         11          pred_var          7   bathy_mean    66.90
12          7        bathy_mean          3    temp_mean    65.75
13         12   o2_mean_0m_seas          4     sal_mean    63.82
14         13  o2_mean_60m_seas          4     sal_mean    63.74
[1] "External percent deviance explained"
[1] 0.5285492

[1] "TPR"
[1] 0.7203563
[1] "TSS"
[1] 0.7319403
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3166705 0.7799527 0.9421311  1.001803         0.5285492 0.5750412
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862598
Residual.Deviance  0.1784155
Correlation        0.9620158
AUC                0.9980000
Per.Expl          87.1297181
cvDeviance         0.4443144
cvCorrelation      0.8616943
cvAUC              0.9680100
cvPer.Expl        67.9486949
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean    29.410142
temp_mean     22.090687
AGI_250m_seas  8.885923
AGI_0m         7.036726
AGI_0m_seas    4.031791
sal_mean       3.863634
AGI_250m_ann   3.717057
AGI_250m       3.185755
AGI_60m_ann    3.018782
ssh_mean       2.964206
chl_mean       2.686814
AGI_60m_seas   2.575530
AGI_0m_ann     1.639762
AGI_60m        1.581991
bathy_sd       1.553865
mld_mean       1.073780
pred_var       0.683555
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  3465.55
2          15    AGI_0m_ann         12 AGI_0m_seas   386.42
3           6    bathy_mean          3    sal_mean   359.44
4          16   AGI_60m_ann          6  bathy_mean   297.40
5          16   AGI_60m_ann         12 AGI_0m_seas   290.80
6          17  AGI_250m_ann          3    sal_mean   258.66
7          14 AGI_250m_seas          6  bathy_mean   192.39
8          14 AGI_250m_seas          2   temp_mean   163.51
9           8        AGI_0m          4    ssh_mean   162.57
10          6    bathy_mean          2   temp_mean   154.13
11         14 AGI_250m_seas         10    AGI_250m   153.38
12          3      sal_mean          2   temp_mean   150.76
13         12   AGI_0m_seas          2   temp_mean   123.54
14          8        AGI_0m          3    sal_mean   103.44
[1] "External percent deviance explained"
[1] -2.981096

[1] "TPR"
[1] 0.3431204
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.8144573 -0.5895665 0.1853505 0.6929731         -2.981096 0.8712972
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_250_DO_0_dail_seas_ann.rds",
            test_data = readRDS(here("data/brt/mod_eval/agi_do_test_daily_seasonal_annual.rds")))
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.7427241
Correlation        0.7474077
AUC                0.9299000
Per.Expl          46.4236261
cvDeviance         0.9683087
cvCorrelation      0.5940334
cvAUC              0.8398100
cvPer.Expl        30.1510880
[1] "Relative influence of predictor variables"

                  rel.inf
AGI_250m_ann    19.506760
temp_mean       18.786242
bathy_mean      11.359335
sal_mean         9.001494
AGI_250m_seas    7.495655
chl_mean         6.480442
ssh_mean         5.010443
AGI_250m         4.757546
bathy_sd         4.426992
mld_mean         4.362426
pred_var         2.438232
o2_mean_0m_ann   2.265174
o2_mean_0m       2.169848
o2_mean_0m_seas  1.939412
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index    var2.names int.size
1          10 AGI_250m_seas          3      sal_mean   214.08
2          10 AGI_250m_seas          8      AGI_250m   181.77
3           6    bathy_mean          2     temp_mean   176.86
4           3      sal_mean          2     temp_mean   124.68
5          11  AGI_250m_ann         10 AGI_250m_seas   107.03
6           4      ssh_mean          2     temp_mean   106.46
7           2     temp_mean          1      chl_mean    72.76
8           6    bathy_mean          4      ssh_mean    67.22
9          11  AGI_250m_ann          2     temp_mean    60.92
10          4      ssh_mean          3      sal_mean    57.36
[1] "External percent deviance explained"
[1] 0.3919503

[1] "TPR"
[1] 0.6946288
[1] "TSS"
[1] 0.6173527
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3692691 0.6793932 0.8906042  1.010997         0.3919503 0.4642363
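Each relative-influence table includes `pred_var`, the random-number predictor used to judge which variables are informative. The sketch below shows one possible screening rule, using the values from the combined AGI 250 m / DO 0 m model output above. The helper `flag_uninformative()` is hypothetical, and treating "relative influence at or below that of `pred_var`" as the cut-off is an assumption, not necessarily the rule used for the refined models.

```r
# Relative influence values copied from the combined AGI 250 m / DO 0 m model output
rel_inf <- data.frame(
  var = c("AGI_250m_ann", "temp_mean", "bathy_mean", "sal_mean", "AGI_250m_seas",
          "chl_mean", "ssh_mean", "AGI_250m", "bathy_sd", "mld_mean",
          "pred_var", "o2_mean_0m_ann", "o2_mean_0m", "o2_mean_0m_seas"),
  rel.inf = c(19.506760, 18.786242, 11.359335, 9.001494, 7.495655,
              6.480442, 5.010443, 4.757546, 4.426992, 4.362426,
              2.438232, 2.265174, 2.169848, 1.939412),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return predictors whose relative influence does not
# exceed that of the random predictor (assumed screening rule)
flag_uninformative <- function(rel_inf, random_var = "pred_var") {
  threshold <- rel_inf$rel.inf[rel_inf$var == random_var]
  stopifnot(length(threshold) == 1)
  rel_inf$var[rel_inf$rel.inf <= threshold & rel_inf$var != random_var]
}

uninformative <- flag_uninformative(rel_inf)
print(uninformative)
```

Under this rule, the three surface oxygen predictors fall below `pred_var` in the combined model, while every AGI predictor stays above it.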
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6163731
Correlation        0.8037426
AUC                0.9552000
Per.Expl          55.5378474
cvDeviance         0.8577154
cvCorrelation      0.6617449
cvAUC              0.8786300
cvPer.Expl        38.1285877
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.645183
o2_mean_250m_ann  17.206276
o2_mean_0m_seas   10.345249
temp_mean          7.771990
o2_mean_250m_seas  7.199949
bathy_mean         7.171107
sal_mean           6.275883
chl_mean           5.065733
o2_mean_0m_ann     4.840965
o2_mean_250m       4.413205
ssh_mean           4.028459
mld_mean           3.069241
bathy_sd           2.700490
pred_var           2.266270
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   510.24
2          13    o2_mean_0m_ann          3    temp_mean   283.65
3          12 o2_mean_250m_seas          4     sal_mean   254.65
4          11   o2_mean_0m_seas          1   o2_mean_0m   212.12
5           4          sal_mean          3    temp_mean   189.20
6          11   o2_mean_0m_seas          5     ssh_mean   181.69
7           7        bathy_mean          3    temp_mean   158.82
8          13    o2_mean_0m_ann          4     sal_mean   151.35
9          12 o2_mean_250m_seas          9 o2_mean_250m   148.24
10          5          ssh_mean          4     sal_mean   132.70
[1] "External percent deviance explained"
[1] 0.5114889

[1] "TPR"
[1] 0.7177333
[1] "TSS"
[1] 0.7216508
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3235065 0.7685859 0.9368822  1.002484         0.5114889 0.5553785
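In these outputs, `Per.Expl` and `PseudoR2` appear to be the same quantity on different scales: the share of total deviance that the fitted model removes. The sketch below recomputes it from the `Total.Deviance` and `Residual.Deviance` values printed for the `brt_do_0m_250m_dail_seas_ann` model above. The function name `pct_deviance_explained()` is hypothetical, and the formula is inferred from the printed numbers rather than taken from the `explore_brt()` source.

```r
# Deviance values copied from the brt_do_0m_250m_dail_seas_ann output
total_dev <- 1.3862871
resid_dev <- 0.6163731

# Hypothetical helper: percent of total deviance explained by the model
pct_deviance_explained <- function(total, residual) {
  100 * (total - residual) / total
}

per_expl <- pct_deviance_explained(total_dev, resid_dev)
pseudo_r2 <- per_expl / 100

print(round(per_expl, 3))
print(round(pseudo_r2, 3))
```

The same calculation applied to `cvDeviance` in place of `Residual.Deviance` reproduces `cvPer.Expl`, which is why the cross-validated percentage is always lower than the training one.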
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6155217
Correlation        0.8049238
AUC                0.9559000
Per.Expl          55.5992622
cvDeviance         0.8649027
cvCorrelation      0.6571466
cvAUC              0.8761100
cvPer.Expl        37.6101303
[1] "Relative influence of predictor variables"

                   rel.inf
o2_mean_0m       22.911813
o2_mean_250m_ann 20.906459
temp_mean         7.579760
o2_mean_60m_ann   6.536482
bathy_mean        6.283049
sal_mean          5.382627
o2_mean_60m       5.220181
chl_mean          4.808321
o2_mean_250m      4.411980
o2_mean_0m_ann    4.227451
ssh_mean          3.567634
mld_mean          3.088428
bathy_sd          2.746170
pred_var          2.329644
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index  var2.names int.size
1           3        temp_mean          1  o2_mean_0m   435.39
2          12   o2_mean_0m_ann          4    sal_mean   215.51
3          12   o2_mean_0m_ann          3   temp_mean   157.08
4           5         ssh_mean          4    sal_mean   130.86
5          13  o2_mean_60m_ann          9 o2_mean_60m   108.59
6           9      o2_mean_60m          8    bathy_sd   107.81
7           7       bathy_mean          1  o2_mean_0m   107.33
8           4         sal_mean          3   temp_mean   102.56
9          14 o2_mean_250m_ann          5    ssh_mean    95.09
10         14 o2_mean_250m_ann          1  o2_mean_0m    86.49
[1] "External percent deviance explained"
[1] 0.5100508

[1] "TPR"
[1] 0.7177138
[1] "TSS"
[1] 0.7178233
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
     RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.32378 0.768407 0.9368439  1.002037         0.5100508 0.5559926
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6251108
Correlation        0.8003773
AUC                0.9539000
Per.Expl          54.9075532
cvDeviance         0.8721020
cvCorrelation      0.6521736
cvAUC              0.8737200
cvPer.Expl        37.0908065
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m_seas   19.995636
o2_mean_250m_ann  18.349971
temp_mean          8.136337
o2_mean_60m_ann    6.495673
o2_mean_250m_seas  5.955392
bathy_mean         5.754092
o2_mean_60m_seas   5.701270
sal_mean           5.400275
chl_mean           5.206935
o2_mean_0m_ann     4.813953
ssh_mean           4.746915
bathy_sd           3.881525
mld_mean           3.166622
pred_var           2.395403
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index      var2.names int.size
1           9  o2_mean_0m_seas          4        ssh_mean   284.85
2          12   o2_mean_0m_ann          9 o2_mean_0m_seas   187.29
3          13  o2_mean_60m_ann          2       temp_mean   130.99
4          10 o2_mean_60m_seas          3        sal_mean   130.12
5          13  o2_mean_60m_ann          6      bathy_mean   127.90
6           4         ssh_mean          3        sal_mean   126.43
7           6       bathy_mean          2       temp_mean   122.51
8          12   o2_mean_0m_ann          2       temp_mean   113.98
9          12   o2_mean_0m_ann          3        sal_mean   113.39
10          9  o2_mean_0m_seas          3        sal_mean   102.47
[1] "External percent deviance explained"
[1] 0.5045185

[1] "TPR"
[1] 0.7169489
[1] "TSS"
[1] 0.7132807
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5750 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
     RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.32614 0.7645057 0.9353368  1.000552         0.5045185 0.5490755
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_daily_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6484473
Correlation        0.7885861
AUC                0.9484000
Per.Expl          53.2241715
cvDeviance         0.8798869
cvCorrelation      0.6493045
cvAUC              0.8712000
cvPer.Expl        36.5292446
[1] "Relative influence of predictor variables"

                   rel.inf
o2_mean_0m       23.928416
o2_mean_250m_ann 22.476323
temp_mean         8.228036
bathy_mean        7.977447
sal_mean          7.293728
o2_mean_250m      6.043256
o2_mean_0m_ann    5.555636
chl_mean          5.205159
ssh_mean          4.371214
bathy_sd          3.247484
mld_mean          3.079187
pred_var          2.594113
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index     var1.names var2.index var2.names int.size
1          3      temp_mean          1 o2_mean_0m   399.72
2          5       ssh_mean          4   sal_mean   271.74
3         11 o2_mean_0m_ann          3  temp_mean   259.43
4         11 o2_mean_0m_ann          4   sal_mean   153.76
5          5       ssh_mean          1 o2_mean_0m   110.68
6          5       ssh_mean          3  temp_mean   109.51
7          7     bathy_mean          3  temp_mean   106.18
[1] "External percent deviance explained"
[1] 0.4903438

[1] "TPR"
[1] 0.7145817
[1] "TSS"
[1] 0.7052563
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3316533 0.7548044 0.9305779  1.003816         0.4903438 0.5322417
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_daily_ann_refined.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6761948
Correlation        0.7734595
AUC                0.9410000
Per.Expl          51.2225994
cvDeviance         0.8926418
cvCorrelation      0.6411554
cvAUC              0.8660500
cvPer.Expl        35.6091636
[1] "Relative influence of predictor variables"

                   rel.inf
o2_mean_250m_ann 26.649713
o2_mean_0m       25.554358
temp_mean         9.448079
bathy_mean        9.055228
sal_mean          7.947775
chl_mean          6.066081
ssh_mean          4.828383
bathy_sd          3.788760
mld_mean          3.688052
pred_var          2.973574
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index       var1.names var2.index var2.names int.size
1          3        temp_mean          1 o2_mean_0m   659.50
2         10 o2_mean_250m_ann          5   ssh_mean   184.31
3          5         ssh_mean          1 o2_mean_0m   174.16
4         10 o2_mean_250m_ann          1 o2_mean_0m   162.94
5          4         sal_mean          3  temp_mean   153.47
[1] "External percent deviance explained"
[1] 0.4725454

[1] "TPR"
[1] 0.7110877
[1] "TSS"
[1] 0.6838483
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.3391331 0.7405838 0.9236049  1.005523         0.4725454 0.512226
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6277548
Correlation        0.7973163
AUC                0.9522000
Per.Expl          54.7169289
cvDeviance         0.8617301
cvCorrelation      0.6582930
cvAUC              0.8768100
cvPer.Expl        37.8391330
[1] "Relative influence of predictor variables"

                rel.inf
temp_mean     15.504668
AGI_250m_ann  15.163685
AGI_0m        11.740021
bathy_mean    11.710734
AGI_0m_seas    8.197830
sal_mean       6.807123
AGI_250m_seas  6.751818
AGI_0m_ann     5.161877
chl_mean       5.090897
AGI_250m       4.624616
bathy_sd       3.784137
mld_mean       3.106183
pred_var       2.356411
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index    var2.names int.size
1          7        AGI_0m          2     temp_mean  2843.30
2          7        AGI_0m          5    bathy_mean   355.65
3         12    AGI_0m_ann         10   AGI_0m_seas   197.02
4         10   AGI_0m_seas          8      AGI_250m   181.68
5         13  AGI_250m_ann         12    AGI_0m_ann   165.05
6         11 AGI_250m_seas          8      AGI_250m   149.59
7         10   AGI_0m_seas          2     temp_mean   136.02
8         13  AGI_250m_ann         11 AGI_250m_seas   122.24
[1] "External percent deviance explained"
[1] 0.4999456

[1] "TPR"
[1] 0.7155658
[1] "TSS"
[1] 0.7086853
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5450 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.328907 0.7586342 0.9325364  1.007354         0.4999456 0.5471693
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6367013
Correlation        0.7941612
AUC                0.9510000
Per.Expl          54.0715738
cvDeviance         0.8740148
cvCorrelation      0.6515205
cvAUC              0.8726800
cvPer.Expl        36.9529728
[1] "Relative influence of predictor variables"

               rel.inf
AGI_250m_ann 17.274493
temp_mean    16.133953
AGI_0m       14.099815
bathy_mean    9.641570
AGI_60m_ann   7.408950
sal_mean      7.026667
AGI_0m_ann    4.754954
chl_mean      4.594943
AGI_250m      4.537709
mld_mean      3.251739
AGI_60m       3.225000
bathy_sd      3.186472
uo_mean       2.713621
pred_var      2.150112
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           8       AGI_0m          2  temp_mean  2976.46
2           8       AGI_0m          6 bathy_mean   252.24
3          14 AGI_250m_ann         12 AGI_0m_ann   165.77
4           8       AGI_0m          4    uo_mean   162.60
5          12   AGI_0m_ann          5   mld_mean    93.44
6           8       AGI_0m          3   sal_mean    86.30
7          14 AGI_250m_ann          2  temp_mean    83.71
8          12   AGI_0m_ann          7   bathy_sd    80.94
9          12   AGI_0m_ann          2  temp_mean    67.97
10         12   AGI_0m_ann         11   pred_var    65.20
[1] "External percent deviance explained"
[1] 0.4910487

[1] "TPR"
[1] 0.7138839
[1] "TSS"
[1] 0.7005274
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3321372 0.7529645 0.9291586  1.008817         0.4910487 0.5407157
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6158797
Correlation        0.8025426
AUC                0.9544000
Per.Expl          55.5735382
cvDeviance         0.8570209
cvCorrelation      0.6610243
cvAUC              0.8783300
cvPer.Expl        38.1788311
[1] "Relative influence of predictor variables"

                rel.inf
temp_mean     15.578233
AGI_250m_ann  15.210565
AGI_0m        11.973672
bathy_mean     9.776419
AGI_0m_seas    7.890212
AGI_250m_seas  6.641468
sal_mean       6.271472
AGI_60m_ann    6.177322
AGI_60m_seas   5.458597
AGI_0m_ann     4.936646
mld_mean       3.089618
uo_mean        2.844618
pred_var       2.119871
bathy_sd       2.031289
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index  var2.names int.size
1           7       AGI_0m          1   temp_mean  2341.72
2          13  AGI_60m_ann          9 AGI_0m_seas   353.40
3           7       AGI_0m          5  bathy_mean   239.91
4          12   AGI_0m_ann          9 AGI_0m_seas   116.40
5          12   AGI_0m_ann          6    bathy_sd   106.43
6          13  AGI_60m_ann          1   temp_mean   101.25
7           7       AGI_0m          3     uo_mean    95.95
8          14 AGI_250m_ann         12  AGI_0m_ann    95.63
9           9  AGI_0m_seas          1   temp_mean    82.38
10          8     pred_var          3     uo_mean    64.47
[1] "External percent deviance explained"
[1] 0.5069531

[1] "TPR"
[1] 0.7162487
[1] "TSS"
[1] 0.7110636
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.326375 0.7627901 0.9339043  1.007275         0.5069531 0.5557354
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_daily_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6487719
Correlation        0.7878082
AUC                0.9478000
Per.Expl          53.2008622
cvDeviance         0.8800365
cvCorrelation      0.6486868
cvAUC              0.8714400
cvPer.Expl        36.5185981
[1] "Relative influence of predictor variables"

               rel.inf
AGI_250m_ann 19.948966
temp_mean    16.717759
AGI_0m       15.017058
bathy_mean   11.083678
sal_mean      7.668646
ssh_mean      5.326252
AGI_0m_ann    5.279991
chl_mean      4.620618
AGI_250m      4.413306
bathy_sd      4.127255
mld_mean      3.363532
pred_var      2.432939
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index   var1.names var2.index var2.names int.size
1          8       AGI_0m          2  temp_mean  2154.06
2         12 AGI_250m_ann         11 AGI_0m_ann   350.53
3          8       AGI_0m          4   ssh_mean   292.80
4         11   AGI_0m_ann          5   mld_mean   171.71
5         12 AGI_250m_ann          2  temp_mean   169.96
6          8       AGI_0m          6 bathy_mean   165.47
7          8       AGI_0m          3   sal_mean   105.61
[1] "External percent deviance explained"
[1] 0.4839341

[1] "TPR"
[1] 0.7127614
[1] "TSS"
[1] 0.6940011
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.334548 0.7485892 0.9269441  1.007254         0.4839341 0.5320086
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_daily_ann_refined.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6893252
Correlation        0.7662559
AUC                0.9372000
Per.Expl          50.2755506
cvDeviance         0.8949519
cvCorrelation      0.6400720
cvAUC              0.8654600
cvPer.Expl        35.4426768
[1] "Relative influence of predictor variables"

               rel.inf
AGI_250m_ann 22.930180
temp_mean    17.534897
AGI_0m       16.389380
bathy_mean   12.369388
sal_mean      8.490533
ssh_mean      5.773889
chl_mean      5.512047
bathy_sd      4.792948
mld_mean      3.560432
pred_var      2.646305
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index   var1.names var2.index var2.names int.size
1          8       AGI_0m          2  temp_mean  2783.04
2          8       AGI_0m          6 bathy_mean   264.58
3          8       AGI_0m          4   ssh_mean   255.08
4         10 AGI_250m_ann          2  temp_mean   236.26
5          6   bathy_mean          2  temp_mean   152.65
[1] "External percent deviance explained"
[1] 0.4592981

[1] "TPR"
[1] 0.7080003
[1] "TSS"
[1] 0.6700011
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.344129 0.7301866 0.9173835    1.0093         0.4592981 0.5027555

Summary table of results

output_sum_refined <- read.csv(here("data/brt/mod_outputs/brt_crw_refined_output_summary.csv"))
kableExtra::kable(output_sum_refined)
model                                         percent_explained  deviance_exp  TPR_mean    TSS    AUC   RMSE  SpearmanCor  PseudoR2
brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag              56.869         0.531     0.721  0.733  0.944  0.316        0.783     0.569
brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag             59.141         0.542     0.723  0.743  0.947  0.311        0.790     0.591
base_0m_daily_Nspat_Ntag                                 42.389         0.385     0.695  0.613  0.892  0.371        0.679     0.424
do_0m_daily_Nspat_Ntag                                   49.447         0.450     0.708  0.671  0.917  0.347        0.727     0.494
agi_0m_daily_Nspat_Ntag                                  48.505         0.437     0.705  0.652  0.911  0.352        0.716     0.485
brt_base_0m_dail_no_wind                                 40.797         0.373     0.692  0.594  0.885  0.375        0.667     0.408
brt_do_0m_60m_250m_dail_seas_ann_no_wind                 57.504         0.529     0.720  0.732  0.942  0.317        0.780     0.575
brt_agi_0m_60m_250m_dail_seas_ann_no_wind                57.975         0.524     0.719  0.725  0.940  0.319        0.775     0.580
brt_agi_250_do_0_dail_seas_ann                           46.424         0.392     0.695  0.617  0.891  0.369        0.679     0.464
brt_do_0m_250m_dail_seas_ann                             55.538         0.511     0.718  0.722  0.937  0.324        0.769     0.555
brt_do_0m_60m_250m_dail_ann                              55.599         0.510     0.718  0.719  0.937  0.324        0.768     0.556
brt_do_0m_60m_250m_seas_ann                              54.908         0.505     0.717  0.713  0.935  0.326        0.765     0.549
brt_do_0m_250m_dail_ann                                  53.224         0.490     0.715  0.705  0.931  0.332        0.755     0.532
brt_do_0m_250m_dail_ann_refined                          51.223         0.473     0.711  0.683  0.924  0.339        0.741     0.512
brt_agi_0m_250m_dail_seas_ann                            55.086         0.501     0.716  0.701  0.933  0.328        0.759     0.551
brt_agi_0m_60m_250m_dail_ann                             54.072         0.491     0.714  0.701  0.929  0.332        0.753     0.541
brt_agi_0m_60m_250m_seas_ann                             55.574         0.507     0.716  0.711  0.934  0.326        0.763     0.556
brt_agi_0m_250m_dail_ann                                 53.201         0.484     0.713  0.694  0.927  0.335        0.749     0.532
brt_agi_0m_250m_dail_ann_refined                         50.276         0.459     0.708  0.670  0.917  0.344        0.730     0.503
# Compare models on AUC vs. TSS, colored by external deviance explained
ggplot(output_sum_refined, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  # Label each point with its model name (inherited from aes above), repelled to limit overlap
  ggrepel::geom_label_repel(box.padding   = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps  = 20,
                            label.size    = 0.5)

Conclusions from adjusted models

  • The annual DO or AGI values at 250 m and the daily DO or AGI values at 0 m were consistently among the predictors with the highest relative importance.
  • The reference models (which are likely overfit) still performed best, with the AGI reference model scoring highest across the performance metrics.
  • Removing the wind predictors barely changes the performance of the reference models, so we can move forward without them.
  • All modified models that lacked a temporal resolution or depth layer were within roughly 0.05 of one another in both TSS and AUC.
  • The combined AGI and DO model performed poorly, so it will be best to keep the two predictors in separate models.
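
The spread in TSS and AUC among the modified models can be checked against the summary table. This is a minimal sketch with two assumptions: the "modified" models are taken to be the ten rows from `brt_do_0m_250m_dail_seas_ann` onward, and the values are re-entered by hand from the table rather than read from the CSV. The helper `metric_spread()` is hypothetical.

```r
# TSS and AUC copied from the summary table for the ten modified DO and AGI models
modified <- data.frame(
  model = c("brt_do_0m_250m_dail_seas_ann", "brt_do_0m_60m_250m_dail_ann",
            "brt_do_0m_60m_250m_seas_ann", "brt_do_0m_250m_dail_ann",
            "brt_do_0m_250m_dail_ann_refined",
            "brt_agi_0m_250m_dail_seas_ann", "brt_agi_0m_60m_250m_dail_ann",
            "brt_agi_0m_60m_250m_seas_ann", "brt_agi_0m_250m_dail_ann",
            "brt_agi_0m_250m_dail_ann_refined"),
  TSS = c(0.722, 0.719, 0.713, 0.705, 0.683, 0.701, 0.701, 0.711, 0.694, 0.670),
  AUC = c(0.937, 0.937, 0.935, 0.931, 0.924, 0.933, 0.929, 0.934, 0.927, 0.917),
  stringsAsFactors = FALSE
)

# Hypothetical helper: max minus min of each metric across the supplied models
metric_spread <- function(df, metrics = c("TSS", "AUC")) {
  sapply(df[metrics], function(x) diff(range(x)))
}

spread_all <- metric_spread(modified)
spread_unrefined <- metric_spread(modified[!grepl("_refined$", modified$model), ])

print(round(spread_all, 3))
print(round(spread_unrefined, 3))
```

With the values as entered, the TSS spread is about 0.052 across all ten rows and about 0.028 once the two `_refined` models are excluded. The AUC spread stays at or below 0.02 in both cases.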