Mako hSDM BRT explore (Background PAs)

Author

Emily Nazario

Published

August 15, 2024

In this document, I’ve included the results from the initial exploration of the different model outputs, rankings of covariate influence, performance metrics, and prediction maps. This first set of models only includes covariate data extracted at a daily temporal resolution, but I am also considering models that include covariate data at a seasonal or annual temporal resolution. The pseudo-absences used in these models were generated using a background sampling approach. Hyperparameters were tuned using the caret package; across all models, a learning rate of 0.05 and a tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a set of random numbers that will be used to identify which predictor variables are informative enough to include in the final model and which are not.
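As a rough illustration of how the ‘pred_var’ screening could work, the sketch below keeps only the predictors whose relative influence exceeds that of the random predictor. The values are copied from the relative influence table of the first base model (brt_outputs[7]) in this document. The `screen_predictors()` helper and the strict greater-than rule are assumptions for illustration, not necessarily the rule that will be applied to the final model.

```r
# Relative influence values copied from the first base model (brt_outputs[7]).
rel_inf <- data.frame(
  var = c("bathy_mean", "temp_mean", "sal_mean", "chl_mean", "ssh_mean",
          "uostr_mean", "vostr_mean", "bathy_sd", "mld_mean", "uo_mean",
          "vo_mean", "pred_var"),
  rel.inf = c(37.726838, 23.806676, 7.021355, 5.980357, 5.413233,
              5.244057, 3.838429, 2.871536, 2.581925, 2.392186,
              1.868975, 1.254433),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the predictors that are more influential
# than the random benchmark predictor (assumed rule: strictly greater).
screen_predictors <- function(rel_inf, random_var = "pred_var") {
  threshold <- rel_inf$rel.inf[rel_inf$var == random_var]
  stopifnot(length(threshold) == 1)
  keep <- rel_inf$var != random_var & rel_inf$rel.inf > threshold
  rel_inf$var[keep]
}

informative <- screen_predictors(rel_inf)
print(informative)
```

In this particular table every environmental predictor sits above ‘pred_var’, so the rule as sketched would not drop any of them.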

The hypotheses I would like to test with these models are as follows:

H1: The AGI model will perform better than the dissolved oxygen and null models, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; the ratio of environmental oxygen supply to theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: Relative to the dissolved oxygen and AGI models, the null model will predict higher habitat suitability in areas, seasons, or periods (upwelling or La Niña) with lower dissolved oxygen through the water column.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Base models

These three models represent three different options for the base model: one with neither spatial predictors nor a tag ID predictor, one with the tag ID predictor only, and one with both. The models were developed by splitting the data set 75/25 into training and test sets, so that is the model evaluation approach used here. Once a model is selected, I can run additional evaluations (e.g., LOO, k-fold). I can also complete these now, depending on when that step is typically performed.
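A minimal sketch of a 75/25 split is shown below, assuming a simple random split over rows. The toy data frame, its column names, and the seed are hypothetical; the actual split may have been stratified or grouped differently (for example, by tag).

```r
# Toy presence/pseudo-absence table standing in for the real data set
# (column names and values are hypothetical).
set.seed(42)
dat <- data.frame(
  PA = rep(c(1, 0), each = 100),
  temp_mean = rnorm(200, mean = 18, sd = 2)
)

# Randomly assign 75% of the rows to training; the rest form the test set.
train_idx <- sample(seq_len(nrow(dat)), size = floor(0.75 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

print(c(train = nrow(train), test = nrow(test)))
```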

explore_brt(mod_file_path = brt_outputs[7], 
            test_data = base_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.2948092
Correlation        0.9249630
AUC                0.9922000
Per.Expl          78.7336988
cvDeviance         0.5909966
cvCorrelation      0.8025147
cvAUC              0.9464300
cvPer.Expl        57.3679835
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 37.726838
temp_mean  23.806676
sal_mean    7.021355
chl_mean    5.980357
ssh_mean    5.413233
uostr_mean  5.244057
vostr_mean  3.838429
bathy_sd    2.871536
mld_mean    2.581925
uo_mean     2.392186
vo_mean     1.868975
pred_var    1.254433
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   835.60
2         10 bathy_mean          8   ssh_mean   650.18
3          8   ssh_mean          2  temp_mean   556.11
4         10 bathy_mean          3   sal_mean   496.83
5         10 bathy_mean          4    uo_mean   406.37
6          3   sal_mean          2  temp_mean   343.56
7          8   ssh_mean          1   chl_mean   337.30
[1] "External percent deviance explained"
[1] -3.437823

[1] "TPR"
[1] 0.2602644
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained PseudoR2
1 0.8898691 -0.8883681 0.01999832 0.9823639         -3.437823 0.787337
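For reference when reading the external evaluation rows, the sketch below shows one common way to compute percent deviance explained, AUC (C-index), and TSS on held-out data for a Bernoulli model. This is not the code inside `explore_brt()`, whose internals are not shown here; the function names, the 0.5 threshold, and the toy values are all assumptions.

```r
# Mean Bernoulli deviance of predicted probabilities p against 0/1 labels y.
bern_dev <- function(y, p) {
  p <- pmin(pmax(p, 1e-15), 1 - 1e-15)  # guard against log(0)
  -2 * mean(y * log(p) + (1 - y) * log(1 - p))
}

# Percent deviance explained relative to an intercept-only (null) model.
pct_dev_explained <- function(y, p) {
  null_dev <- bern_dev(y, rep(mean(y), length(y)))
  100 * (null_dev - bern_dev(y, p)) / null_dev
}

# AUC via the rank-sum (Mann-Whitney) formulation.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# True skill statistic = sensitivity + specificity - 1 at a given threshold.
tss <- function(y, p, threshold = 0.5) {
  pred <- as.integer(p >= threshold)
  sens <- sum(pred == 1 & y == 1) / sum(y == 1)
  spec <- sum(pred == 0 & y == 0) / sum(y == 0)
  sens + spec - 1
}

# Toy held-out labels and predictions (hypothetical values).
y <- c(1, 1, 1, 0, 0, 0)
p <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6)

metrics <- c(pct_dev = pct_dev_explained(y, p), auc = auc(y, p), tss = tss(y, p))
print(round(metrics, 3))
```

Under these definitions, a negative percent deviance explained means the held-out predictions fit worse than the intercept-only model, and an AUC below 0.5 means presences are, on average, ranked below pseudo-absences.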

explore_brt(mod_file_path = brt_outputs[8], 
            test_data = base_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38627408
Residual.Deviance  0.09736975
Correlation        0.98457770
AUC                0.99990000
Per.Expl          92.97615463
cvDeviance         0.34392232
cvCorrelation      0.89862302
cvAUC              0.97914000
cvPer.Expl        75.19088555
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 32.0639964
tag        24.5885909
temp_mean  18.9524289
ssh_mean    4.9851264
sal_mean    4.0438501
uostr_mean  3.9759944
chl_mean    3.8548424
vostr_mean  2.6340512
bathy_sd    1.2289919
uo_mean     1.1341238
vo_mean     1.1196672
mld_mean    0.8787739
pred_var    0.5395627
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          4   sal_mean          1        tag  1883.53
2         11 bathy_mean          1        tag   770.37
3          2   chl_mean          1        tag   714.07
4          3  temp_mean          1        tag   626.86
5          9   ssh_mean          1        tag   604.60
6          8 vostr_mean          1        tag   409.93
7          7    vo_mean          1        tag   382.85
8          6 uostr_mean          1        tag   370.45
[1] "External percent deviance explained"
[1] -6.314042

[1] "TPR"
[1] 0.2532462
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6500 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9504276 -0.9615518 0.004527448 0.9781609         -6.314042 0.9297615

explore_brt(mod_file_path = brt_outputs[9], 
            test_data = base_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38627408
Residual.Deviance  0.08949741
Correlation        0.98503325
AUC                0.99990000
Per.Expl          93.54403230
cvDeviance         0.29985176
cvCorrelation      0.91378722
cvAUC              0.98270000
cvPer.Expl        78.36995123
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 52.6117764
tag        20.0388250
lat         8.8678338
temp_mean   4.2491755
bathy_mean  3.6159044
chl_mean    2.7994230
sal_mean    2.3492197
ssh_mean    1.1726958
vostr_mean  1.1624008
vo_mean     0.6291947
uo_mean     0.6153267
bathy_sd    0.5647726
uostr_mean  0.5065045
mld_mean    0.4821748
pred_var    0.3347723
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   737.42
2           5   sal_mean          1        tag   551.43
3          12 bathy_mean          1        tag   502.86
4           3   chl_mean          1        tag   464.36
5          14 dist_coast          1        tag   419.05
6           9 vostr_mean          1        tag   274.27
7           8    vo_mean          1        tag   270.06
8          10   ssh_mean          1        tag   227.94
9           4  temp_mean          1        tag   195.29
10         13   bathy_sd          1        tag   186.34
11          7 uostr_mean          1        tag   171.34
[1] "External percent deviance explained"
[1] -6.741773

[1] "TPR"
[1] 0.2528088
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9557107 -0.9657932 0.003646813  0.980038         -6.741773 0.9354403

DO models

I ran a suite of models that include various combinations of DO data at depth, spatial predictors, and a tag ID predictor. Moving forward, I would also like to include DO and the other environmental predictor variables at longer time scales (seasonal/annual).
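One possible way to derive the longer time scales from the daily extractions is sketched below, assuming the seasonal and annual values are simple means of the daily values. The toy series, the column names, and the month-based season definition are hypothetical and may not match how the covariates will actually be aggregated.

```r
# Toy daily covariate series (hypothetical column names and values).
daily <- data.frame(
  date = seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
)
day_of_year <- as.integer(format(daily$date, "%j"))
daily$o2_mean_0m <- 200 + 20 * sin(2 * pi * day_of_year / 365)

# Assign each day to a season from its month (assumed definition:
# Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov fall).
month <- as.integer(format(daily$date, "%m"))
daily$season <- c("winter", "winter", "spring", "spring", "spring",
                  "summer", "summer", "summer", "fall", "fall", "fall",
                  "winter")[month]

# Seasonal and annual means of the daily values.
seasonal <- aggregate(o2_mean_0m ~ season, data = daily, FUN = mean)
annual   <- mean(daily$o2_mean_0m)

print(seasonal)
print(annual)
```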

explore_brt(mod_file_path = brt_outputs[14], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38629281
Residual.Deviance  0.08039145
Correlation        0.98792844
AUC                1.00000000
Per.Expl          94.20097610
cvDeviance         0.30084003
cvCorrelation      0.91332970
cvAUC              0.98319000
cvPer.Expl        78.29895482
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 32.6469598
o2_mean_0m 26.9115748
tag        20.0046262
temp_mean   4.4981160
chl_mean    3.6729261
ssh_mean    2.6241221
uostr_mean  2.1630542
sal_mean    2.0577804
vostr_mean  1.8988935
mld_mean    0.9525212
uo_mean     0.7373759
bathy_sd    0.7185793
vo_mean     0.7077598
pred_var    0.4057107
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           5   sal_mean          1        tag  1251.29
2           4  temp_mean          2 o2_mean_0m   856.53
3           2 o2_mean_0m          1        tag   838.02
4          12 bathy_mean          1        tag   811.97
5           4  temp_mean          1        tag   452.47
6           3   chl_mean          1        tag   413.64
7          13   bathy_sd          1        tag   363.30
8           8    vo_mean          1        tag   348.54
9           7 uostr_mean          1        tag   340.42
10          9 vostr_mean          1        tag   299.61
[1] "External percent deviance explained"
[1] -6.928509

[1] "TPR"
[1] 0.2512725
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9619643 -0.9797619 0.0005028151 0.9977697         -6.928509 0.9420098

explore_brt(mod_file_path = brt_outputs[15], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38629281
Residual.Deviance  0.06074708
Correlation        0.99206350
AUC                1.00000000
Per.Expl          95.61801965
cvDeviance         0.26396205
cvCorrelation      0.92768023
cvAUC              0.98584000
cvPer.Expl        80.95914152
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 51.1425792
tag        18.4444753
o2_mean_0m 10.5833390
lat         7.0053558
bathy_mean  3.1295802
chl_mean    2.2557239
sal_mean    1.5213093
temp_mean   1.2786828
vostr_mean  0.9690416
ssh_mean    0.9508931
mld_mean    0.5242121
vo_mean     0.5180216
uo_mean     0.4790866
bathy_sd    0.4771174
uostr_mean  0.4558059
pred_var    0.2647762
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           3 o2_mean_0m          1        tag   779.87
2           2        lat          1        tag   692.69
3           5  temp_mean          3 o2_mean_0m   636.47
4          15 dist_coast          1        tag   466.74
5           6   sal_mean          1        tag   426.43
6           4   chl_mean          1        tag   421.31
7          13 bathy_mean          1        tag   420.13
8          14   bathy_sd          1        tag   344.47
9           9    vo_mean          1        tag   303.38
10         10 vostr_mean          1        tag   230.16
11          5  temp_mean          1        tag   222.22
12          8 uostr_mean          1        tag   208.55
13         15 dist_coast          3 o2_mean_0m   174.48
[1] "External percent deviance explained"
[1] -7.624278

[1] "TPR"
[1] 0.2515557
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9704327 -0.9864821 0.0001846725 0.9964614         -7.624278 0.9561802

explore_brt(mod_file_path = brt_outputs[13], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38629281
Residual.Deviance  0.07118474
Correlation        0.98998036
AUC                1.00000000
Per.Expl          94.86510106
cvDeviance         0.28488053
cvCorrelation      0.92010488
cvAUC              0.98419000
cvPer.Expl        79.45019060
[1] "Relative influence of predictor variables"

               rel.inf
bathy_mean  29.0714692
o2_mean_0m  26.9541769
tag         18.8753315
o2_mean_60m 10.1542238
chl_mean     3.3838023
ssh_mean     2.9041357
temp_mean    1.9892805
sal_mean     1.6667460
vostr_mean   1.1791073
uostr_mean   1.0524747
mld_mean     0.7605733
uo_mean      0.6030973
vo_mean      0.5369837
bathy_sd     0.5062655
pred_var     0.3623321
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index  var1.names var2.index var2.names int.size
1           2  o2_mean_0m          1        tag   914.39
2           5    sal_mean          1        tag   778.04
3          12  bathy_mean          1        tag   774.90
4           4   temp_mean          2 o2_mean_0m   449.37
5           3    chl_mean          1        tag   439.91
6          13    bathy_sd          1        tag   427.34
7           4   temp_mean          1        tag   381.87
8          14 o2_mean_60m          1        tag   355.36
9           9  vostr_mean          1        tag   293.42
10          8     vo_mean          1        tag   292.69
11         10    ssh_mean          1        tag   259.83
[1] "External percent deviance explained"
[1] -7.112088

[1] "TPR"
[1] 0.2514715
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained PseudoR2
1 0.9657599 -0.9823985 0.000340465 0.9978181         -7.112088 0.948651

explore_brt(mod_file_path = brt_outputs[10], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38629281
Residual.Deviance  0.06836473
Correlation        0.99037738
AUC                1.00000000
Per.Expl          95.06852165
cvDeviance         0.28369551
cvCorrelation      0.92037443
cvAUC              0.98436000
cvPer.Expl        79.53567176
[1] "Relative influence of predictor variables"

                rel.inf
o2_mean_0m   27.2629760
bathy_mean   25.2498534
tag          18.4421903
o2_mean_250m 16.5248776
chl_mean      2.2649308
temp_mean     1.9626402
sal_mean      1.7897130
ssh_mean      1.5479838
uostr_mean    1.1276963
vostr_mean    1.0966645
bathy_sd      0.7244073
vo_mean       0.5873495
mld_mean      0.5347793
uo_mean       0.5208155
pred_var      0.3631225
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           5     sal_mean          1        tag  1081.33
2           2   o2_mean_0m          1        tag   831.07
3           4    temp_mean          2 o2_mean_0m   634.92
4          14 o2_mean_250m          1        tag   580.53
5           3     chl_mean          1        tag   508.53
6          12   bathy_mean          1        tag   461.58
7           9   vostr_mean          1        tag   296.67
8           4    temp_mean          1        tag   295.22
9           8      vo_mean          1        tag   272.25
10         14 o2_mean_250m          2 o2_mean_0m   254.15
11         13     bathy_sd          1        tag   249.56
[1] "External percent deviance explained"
[1] -7.449017

[1] "TPR"
[1] 0.251507
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9667664 -0.9823896 0.0003823276 0.9973991         -7.449017 0.9506852

explore_brt(mod_file_path = brt_outputs[11], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38629281
Residual.Deviance  0.06749084
Correlation        0.99030698
AUC                1.00000000
Per.Expl          95.13155977
cvDeviance         0.27553664
cvCorrelation      0.92358978
cvAUC              0.98486000
cvPer.Expl        80.12421074
[1] "Relative influence of predictor variables"

                rel.inf
o2_mean_0m   27.1899223
bathy_mean   24.5062176
tag          17.3552811
o2_mean_250m 14.4641723
o2_mean_60m   5.7991520
ssh_mean      1.9059494
chl_mean      1.8501917
sal_mean      1.4611193
temp_mean     1.4363430
uostr_mean    0.8686807
vostr_mean    0.8420687
bathy_sd      0.5476465
vo_mean       0.5305118
mld_mean      0.5035008
uo_mean       0.4481542
pred_var      0.2910886
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           2   o2_mean_0m          1        tag   817.46
2           5     sal_mean          1        tag   587.30
3           4    temp_mean          2 o2_mean_0m   558.25
4          15 o2_mean_250m          1        tag   478.71
5           3     chl_mean          1        tag   429.20
6          12   bathy_mean          1        tag   410.22
7           4    temp_mean          1        tag   305.22
8          14  o2_mean_60m          1        tag   300.61
9           9   vostr_mean          1        tag   270.41
10          7   uostr_mean          1        tag   228.06
11         13     bathy_sd          1        tag   205.43
12         15 o2_mean_250m          2 o2_mean_0m   203.41
13         10     ssh_mean          1        tag   189.41
[1] "External percent deviance explained"
[1] -7.413317

[1] "TPR"
[1] 0.2514821
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9680402 -0.9840536 0.0002799894  0.997662         -7.413317 0.9513156

explore_brt(mod_file_path = brt_outputs[12], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38629281
Residual.Deviance  0.06674273
Correlation        0.99036056
AUC                1.00000000
Per.Expl          95.18552429
cvDeviance         0.25849355
cvCorrelation      0.92868348
cvAUC              0.98632000
cvPer.Expl        81.35361122
[1] "Relative influence of predictor variables"

                rel.inf
dist_coast   50.7979905
tag          17.4187334
o2_mean_0m   11.1532694
o2_mean_250m  4.8956535
lat           4.3567015
o2_mean_60m   2.7427545
chl_mean      1.7488786
bathy_mean    1.2868608
sal_mean      1.0812867
temp_mean     0.9791718
vostr_mean    0.6174175
ssh_mean      0.6127769
uostr_mean    0.5321097
bathy_sd      0.4024418
vo_mean       0.3965878
mld_mean      0.3741117
uo_mean       0.3487283
pred_var      0.2545257
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3   o2_mean_0m          1        tag   662.12
2           5    temp_mean          3 o2_mean_0m   481.90
3           2          lat          1        tag   385.52
4           6     sal_mean          1        tag   348.70
5           4     chl_mean          1        tag   336.40
6          14     bathy_sd          1        tag   330.96
7          17 o2_mean_250m          1        tag   293.11
8          13   bathy_mean          1        tag   276.05
9          15   dist_coast          1        tag   246.21
10         10   vostr_mean          1        tag   223.18
11         16  o2_mean_60m          1        tag   213.22
12          9      vo_mean          1        tag   206.61
13         16  o2_mean_60m          5  temp_mean   175.27
14          5    temp_mean          1        tag   144.27
15         15   dist_coast          3 o2_mean_0m   140.51
16         11     ssh_mean          1        tag   109.39
[1] "External percent deviance explained"
[1] -7.47654

[1] "TPR"
[1] 0.2514392
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9687462 -0.9846778 0.0002475989 0.9969652          -7.47654 0.9518552

AGI models

I ran a suite of models that include various combinations of AGI data at depth, spatial predictors, and a tag ID predictor. Moving forward, I would also like to include AGI and the other environmental predictor variables at longer time scales (seasonal/annual).

explore_brt(mod_file_path = brt_outputs[5], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38628958
Residual.Deviance  0.08532138
Correlation        0.98702006
AUC                1.00000000
Per.Expl          93.84534182
cvDeviance         0.31310146
cvCorrelation      0.91077834
cvAUC              0.98123000
cvPer.Expl        77.41442573
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 31.3849938
tag        22.9924922
temp_mean  19.0362797
ssh_mean    5.1405835
uostr_mean  4.5173002
AGI_0m      3.9279250
sal_mean    3.5882270
chl_mean    2.8743197
vostr_mean  2.5250985
bathy_sd    1.2576208
uo_mean     0.9020629
vo_mean     0.8232203
mld_mean    0.6439118
pred_var    0.3859644
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1816.06
2           4   sal_mean          1        tag  1301.25
3           3  temp_mean          1        tag   851.13
4          11 bathy_mean          1        tag   729.91
5           2   chl_mean          1        tag   428.42
6           7    vo_mean          1        tag   417.68
7           9   ssh_mean          1        tag   328.79
8           8 vostr_mean          1        tag   326.69
9          11 bathy_mean          3  temp_mean   318.21
10         13     AGI_0m          1        tag   295.11
[1] "External percent deviance explained"
[1] -6.590472

[1] "TPR"
[1] 0.2511185
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9603852 -0.9788467 0.0005035391 0.9568836         -6.590472 0.9384534

explore_brt(mod_file_path = brt_outputs[6], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38628958
Residual.Deviance  0.07272435
Correlation        0.98931402
AUC                1.00000000
Per.Expl          94.75402902
cvDeviance         0.27470458
cvCorrelation      0.92333512
cvAUC              0.98468000
cvPer.Expl        80.18418485
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 52.4969130
tag        19.1224933
lat         8.8661139
temp_mean   4.4530414
bathy_mean  3.4033437
AGI_0m      2.7486874
chl_mean    2.2110726
sal_mean    1.8920127
ssh_mean    1.0292439
vostr_mean  0.8247087
bathy_sd    0.5853612
vo_mean     0.5707580
uo_mean     0.5420383
uostr_mean  0.5176506
mld_mean    0.4773298
pred_var    0.2592316
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   711.46
2           5   sal_mean          1        tag   430.54
3          13   bathy_sd          1        tag   389.47
4           3   chl_mean          1        tag   388.51
5          12 bathy_mean          1        tag   385.22
6          15 dist_coast          1        tag   319.70
7          14     AGI_0m          1        tag   299.95
8          14     AGI_0m          4  temp_mean   291.55
9           4  temp_mean          1        tag   271.29
10          8    vo_mean          1        tag   270.02
11          9 vostr_mean          1        tag   263.25
12         14     AGI_0m          2        lat   173.01
13         10   ssh_mean          1        tag   168.72
[1] "External percent deviance explained"
[1] -7.078305

[1] "TPR"
[1] 0.251129
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9665712 -0.9839926 0.00022761 0.9554261         -7.078305 0.9475403

explore_brt(mod_file_path = brt_outputs[4], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38628958
Residual.Deviance  0.07558239
Correlation        0.98942417
AUC                1.00000000
Per.Expl          94.54786401
cvDeviance         0.29847916
cvCorrelation      0.91616548
cvAUC              0.98261000
cvPer.Expl        78.46920527
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 31.1481369
tag        22.3947478
temp_mean  19.6320724
AGI_0m      4.5763214
uostr_mean  4.0458198
AGI_60m     3.9505830
ssh_mean    3.2901697
sal_mean    3.2745439
vostr_mean  2.2604520
chl_mean    2.1686907
bathy_sd    0.8962724
uo_mean     0.7362601
vo_mean     0.7075609
mld_mean    0.5826191
pred_var    0.3357501
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1887.49
2           4   sal_mean          1        tag  1176.99
3           3  temp_mean          1        tag   753.28
4          11 bathy_mean          1        tag   637.60
5          14    AGI_60m          1        tag   576.97
6           8 vostr_mean          1        tag   433.15
7           2   chl_mean          1        tag   415.56
8          12   bathy_sd          1        tag   400.28
9          11 bathy_mean          3  temp_mean   363.16
10          7    vo_mean          1        tag   338.63
11          9   ssh_mean          1        tag   259.86
[1] "External percent deviance explained"
[1] -6.698273

[1] "TPR"
[1] 0.2511296
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9642625 -0.9823706 0.0003412825 0.9570393         -6.698273 0.9454786

explore_brt(mod_file_path = brt_outputs[1], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38628958
Residual.Deviance  0.09622781
Correlation        0.98319352
AUC                0.99980000
Per.Expl          93.05860674
cvDeviance         0.30428517
cvCorrelation      0.91335265
cvAUC              0.98178000
cvPer.Expl        78.05038891
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 26.2607017
tag        20.7337462
temp_mean  17.6619552
AGI_250m   12.8869113
uostr_mean  5.3151550
ssh_mean    4.0761838
AGI_0m      3.6152744
sal_mean    3.0128057
chl_mean    1.9236281
vostr_mean  1.2761366
bathy_sd    1.1067112
vo_mean     0.6687755
uo_mean     0.6510241
mld_mean    0.5242352
pred_var    0.2867560
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1354.07
2           4   sal_mean          1        tag  1099.13
3           3  temp_mean          1        tag   650.85
4          14   AGI_250m          1        tag   423.38
5          12   bathy_sd          1        tag   385.64
6           2   chl_mean          1        tag   371.29
7          11 bathy_mean          1        tag   357.84
8           9   ssh_mean          1        tag   292.44
9           7    vo_mean          1        tag   283.14
10          8 vostr_mean          1        tag   229.35
11         13     AGI_0m          1        tag   216.49
[1] "External percent deviance explained"
[1] -6.525803

[1] "TPR"
[1] 0.2510634
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4900 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor      C-index PredRatio DevianceExplained  PseudoR2
1 0.9585593 -0.9766923 0.0006610235 0.9583741         -6.525803 0.9305861

explore_brt(mod_file_path = brt_outputs[2], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38628958
Residual.Deviance  0.08163443
Correlation        0.98713893
AUC                1.00000000
Per.Expl          94.11130005
cvDeviance         0.29071385
cvCorrelation      0.91828031
cvAUC              0.98297000
cvPer.Expl        79.02935599
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 25.7475201
tag        20.6577555
temp_mean  18.3535453
AGI_250m   12.2429766
uostr_mean  4.4230811
ssh_mean    4.1802787
AGI_0m      4.0304067
sal_mean    2.6720505
AGI_60m     1.6636685
chl_mean    1.5954124
vostr_mean  1.2580929
bathy_sd    1.1907574
vo_mean     0.7000913
uo_mean     0.5201387
mld_mean    0.4689463
pred_var    0.2952780
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1653.43
2           4   sal_mean          1        tag  1094.62
3           3  temp_mean          1        tag   585.48
4          14    AGI_60m          1        tag   369.06
5           8 vostr_mean          1        tag   336.94
6          11 bathy_mean          1        tag   335.06
7          15   AGI_250m          1        tag   309.88
8          12   bathy_sd          1        tag   303.52
9           2   chl_mean          1        tag   295.26
10          9   ssh_mean          1        tag   239.61
11          7    vo_mean          1        tag   210.65
12         13     AGI_0m          1        tag   163.15
13          5    uo_mean          1        tag   142.51
[1] "External percent deviance explained"
[1] -6.697318

[1] "TPR"
[1] 0.251067
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
      RMSE        Cor      C-index PredRatio DevianceExplained PseudoR2
1 0.963364 -0.9805785 0.0004149873 0.9566989         -6.697318 0.941113

explore_brt(mod_file_path = brt_outputs[3], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                      Model 1
Total.Deviance     1.38628958
Residual.Deviance  0.06368071
Correlation        0.99128982
AUC                1.00000000
Per.Expl          95.40639170
cvDeviance         0.26342427
cvCorrelation      0.92717887
cvAUC              0.98558000
cvPer.Expl        80.99788972
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 52.3167942
tag        19.4852974
lat         8.2334957
temp_mean   3.9473184
AGI_250m    3.3725702
AGI_0m      2.2700357
bathy_mean  1.9914815
chl_mean    1.7837172
sal_mean    1.4620165
AGI_60m     1.1307988
ssh_mean    0.8257918
vostr_mean  0.5574685
bathy_sd    0.5546149
vo_mean     0.5191201
uostr_mean  0.4545932
uo_mean     0.4254801
mld_mean    0.4233323
pred_var    0.2460737
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   716.35
2          13   bathy_sd          1        tag   437.20
3           3   chl_mean          1        tag   402.35
4           5   sal_mean          1        tag   378.54
5          16    AGI_60m          1        tag   342.53
6          17   AGI_250m          1        tag   264.53
7          12 bathy_mean          1        tag   261.67
8           4  temp_mean          1        tag   240.35
9          15 dist_coast          1        tag   215.24
10         14     AGI_0m          1        tag   210.42
11          8    vo_mean          1        tag   200.95
12          9 vostr_mean          1        tag   193.50
13         14     AGI_0m          4  temp_mean   189.18
14         14     AGI_0m          2        lat   124.05
15          6    uo_mean          1        tag   115.00
16         10   ssh_mean          1        tag   113.62
[1] "External percent deviance explained"
[1] -7.26371

[1] "TPR"
[1] 0.2512885
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9698598 -0.9855271 0.000225953 0.9550295          -7.26371 0.9540639
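
As a quick check on how the percent deviance explained values relate to the deviances printed with them, the sketch below recomputes `Per.Expl` and `cvPer.Expl` for the last model above. It assumes both are calculated as one minus the ratio of the remaining deviance to the total deviance; `pct_dev_explained` is a hypothetical helper and is not part of `explore_brt`.

```r
# Values copied from the "Model performance metrics" output directly above.
total_dev    <- 1.38628958
residual_dev <- 0.06368071
cv_dev       <- 0.26342427

# Hypothetical helper (an assumption, not the author's function):
# percent of the total deviance accounted for by the model.
pct_dev_explained <- function(total, remaining) {
  (1 - remaining / total) * 100
}

per_expl    <- pct_dev_explained(total_dev, residual_dev)
cv_per_expl <- pct_dev_explained(total_dev, cv_dev)

# These should agree with the printed Per.Expl and cvPer.Expl up to rounding.
round(c(Per.Expl = per_expl, cvPer.Expl = cv_per_expl), 3)
```

The difference between the two values gives a rough sense of how much of the training fit does not carry over to the cross-validation folds.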

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary.csv"))
kableExtra::kable(output_sum)

model	percent_explained	deviance_exp	TPR_mean	TSS	AUC	RMSE	SpearmanCor	PseudoR2
base_0m_Nspat_Ntag	78.734	0.724	0.739	0.870	0.979	0.231	0.888	0.787
base_0m_Nspat_Ytag	92.976	0.876	0.761	0.961	0.994	0.141	0.960	0.930
base_0m_Yspat_Ytag	93.544	0.887	0.770	0.964	0.995	0.125	0.963	0.935
do_0m_Nspat_Ytag	94.201	0.901	0.772	0.971	0.996	0.124	0.969	0.942
do_0m_Yspat_Ytag	95.618	0.920	0.788	0.977	0.997	0.110	0.976	0.956
do_0m_60m_Nspat_Ytag	94.865	0.908	0.775	0.973	0.997	0.119	0.972	0.949
do_0m_250m_Nspat_Ytag	95.069	0.909	0.783	0.974	0.996	0.119	0.972	0.951
do_0m_60m_250m_Nspat_Ytag	95.132	0.913	0.783	0.976	0.997	0.116	0.973	0.951
do_0m_60m_250m_Yspat_Ytag	95.186	0.918	0.784	0.977	0.997	0.113	0.975	0.952
agi_0m_Nspat_Ytag	93.845	0.901	0.765	0.971	0.997	0.124	0.970	0.938
agi_0m_Yspat_Ytag	94.754	0.916	0.776	0.975	0.998	0.114	0.974	0.948
agi_0m_60m_Nspat_Ytag	94.548	0.908	0.765	0.973	0.997	0.119	0.972	0.945
agi_0m_250m_Nspat_Ytag	93.059	0.897	0.767	0.967	0.997	0.129	0.967	0.931
agi_0m_60m_250m_Nspat_Ytag	94.111	0.907	0.767	0.972	0.997	0.122	0.971	0.941
agi_0m_60m_250m_Yspat_Ytag	95.406	0.920	0.777	0.976	0.998	0.111	0.975	0.954

ggplot(output_sum, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50',
                  max.overlaps = 20,
                  label.size = 0.5)

Conclusions from initial models w/ tag ID

Base models: Relative to the CRW PA base models, these had drastically higher AUC scores and deviance explained values. The base model with no spatial or tag ID predictors was the lowest-scoring model.
DO and AGI: Model performance generally increased with the added depth layers, but all models were fairly comparable to each other. Models with spatial and tag ID predictors performed the best, but as described in the CRW PA document, we will likely not include these predictors here, as they would not be included in the projection work and are not essential for addressing this study’s objectives.
The performance metrics across comparable DO and AGI models were much more similar to each other than they were in the models built with the CRW PA data.
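
One way to make the DO versus AGI comparison explicit is to pair models that share the same depth layers and predictor settings and then difference their metrics. The sketch below is a minimal example under stated assumptions: it retypes a subset of the summary table above rather than reading the CSV, and it assumes the model names always follow the `family_configuration` pattern shown in the table.

```r
# Subset of the summary table above (values copied from the rendered table).
output_sub <- data.frame(
  model = c("do_0m_Nspat_Ytag", "do_0m_60m_Nspat_Ytag",
            "do_0m_250m_Nspat_Ytag", "do_0m_60m_250m_Nspat_Ytag",
            "agi_0m_Nspat_Ytag", "agi_0m_60m_Nspat_Ytag",
            "agi_0m_250m_Nspat_Ytag", "agi_0m_60m_250m_Nspat_Ytag"),
  TSS = c(0.971, 0.973, 0.974, 0.976, 0.971, 0.973, 0.967, 0.972),
  AUC = c(0.996, 0.997, 0.996, 0.997, 0.997, 0.997, 0.997, 0.997)
)

# Split each name into covariate family ("do"/"agi") and configuration
# (depth layers plus spatial/tag settings).
output_sub$family <- sub("_.*$", "", output_sub$model)
output_sub$config <- sub("^[^_]+_", "", output_sub$model)

# Pair DO and AGI models that share a configuration.
paired <- merge(output_sub[output_sub$family == "do", ],
                output_sub[output_sub$family == "agi", ],
                by = "config", suffixes = c("_do", "_agi"))

# Positive differences mean the DO model scored higher.
paired$TSS_diff <- paired$TSS_do - paired$TSS_agi
paired$AUC_diff <- paired$AUC_do - paired$AUC_agi

paired[, c("config", "TSS_do", "TSS_agi", "TSS_diff", "AUC_diff")]
```

Running the same pairing on the CRW PA summary table would allow the two sets of differences to be compared directly.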

DO models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not necessary to include tag ID.

explore_brt(mod_file_path = brt_outputs_Ntag[12], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.2227822
Correlation        0.9477851
AUC                0.9963000
Per.Expl          83.9296423
cvDeviance         0.5119148
cvCorrelation      0.8357045
cvAUC              0.9593600
cvPer.Expl        63.0731081
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 37.042591
o2_mean_0m 29.610389
temp_mean   8.255550
chl_mean    5.168471
ssh_mean    3.874249
sal_mean    3.296404
vostr_mean  2.770317
mld_mean    2.274538
bathy_sd    2.115492
uostr_mean  1.764755
uo_mean     1.535677
vo_mean     1.263401
pred_var    1.028166
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          3  temp_mean          1 o2_mean_0m  1203.43
2          2   chl_mean          1 o2_mean_0m   685.80
3         11 bathy_mean          3  temp_mean   629.37
4         11 bathy_mean          5    uo_mean   482.06
5          9   ssh_mean          3  temp_mean   428.24
6         10   mld_mean          7    vo_mean   397.02
7         11 bathy_mean          9   ssh_mean   393.82
8         11 bathy_mean          4   sal_mean   391.94
[1] "External percent deviance explained"
[1] -4.109901

[1] "TPR"
[1] 0.2542114
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9154336 -0.9314576 0.007539246  1.002372         -4.109901 0.8392964
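
The random `pred_var` predictor can serve as a baseline when reading the relative influence values. The sketch below uses the values printed for the model above and flags predictors whose relative influence exceeds that of `pred_var`. The screening rule itself (keep anything above the random predictor) is an assumption made for illustration; the final selection criterion may differ.

```r
# Relative influence values copied from the output above.
rel_inf <- c(bathy_mean = 37.042591, o2_mean_0m = 29.610389,
             temp_mean  =  8.255550, chl_mean   =  5.168471,
             ssh_mean   =  3.874249, sal_mean   =  3.296404,
             vostr_mean =  2.770317, mld_mean   =  2.274538,
             bathy_sd   =  2.115492, uostr_mean =  1.764755,
             uo_mean    =  1.535677, vo_mean    =  1.263401,
             pred_var   =  1.028166)

# Baseline: influence attributed to the random predictor.
baseline <- rel_inf[["pred_var"]]

# Assumed rule: a predictor is treated as informative if its relative
# influence is greater than that of the random predictor.
informative <- names(rel_inf)[rel_inf > baseline]

# Ratio to the baseline shows how far each predictor sits above the noise.
round(sort(rel_inf[informative] / baseline, decreasing = TRUE), 2)
```

For this model every covariate sits above `pred_var`, although the current velocity predictors are only marginally higher than the random baseline.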

explore_brt(mod_file_path = brt_outputs_Ntag[13], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.1924089
Correlation        0.9564733
AUC                0.9975000
Per.Expl          86.1206180
cvDeviance         0.4707743
cvCorrelation      0.8515532
cvAUC              0.9652100
cvPer.Expl        66.0407773
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 53.5317897
o2_mean_0m 12.2975738
lat         8.1708571
bathy_mean  5.8707423
chl_mean    3.6108876
temp_mean   3.2303458
sal_mean    2.5444041
ssh_mean    2.2134207
vostr_mean  1.7157868
mld_mean    1.5219427
uo_mean     1.1835532
bathy_sd    1.1832015
vo_mean     1.0970184
uostr_mean  0.9877975
pred_var    0.8406787
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1           4  temp_mean          2 o2_mean_0m   890.78
2          12 bathy_mean          4  temp_mean   775.24
3          14 dist_coast         11   mld_mean   564.53
4           3   chl_mean          2 o2_mean_0m   249.12
5           2 o2_mean_0m          1        lat   209.95
6          12 bathy_mean          1        lat   192.27
7          12 bathy_mean          6    uo_mean   190.20
8          12 bathy_mean          5   sal_mean   178.54
9           7 uostr_mean          2 o2_mean_0m   145.22
10         12 bathy_mean          3   chl_mean   137.46
11         12 bathy_mean         10   ssh_mean   127.01
[1] "External percent deviance explained"
[1] -4.442145

[1] "TPR"
[1] 0.2531495
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9259653 -0.9413396 0.005478969   1.00341         -4.442145 0.8612062

explore_brt(mod_file_path = brt_outputs_Ntag[11], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.2018195
Correlation        0.9539294
AUC                0.9971000
Per.Expl          85.4417866
cvDeviance         0.4904242
cvCorrelation      0.8443609
cvAUC              0.9626400
cvPer.Expl        64.6233345
[1] "Relative influence of predictor variables"

               rel.inf
bathy_mean  32.8872618
o2_mean_0m  29.0113534
o2_mean_60m 11.2428960
ssh_mean     4.9169342
chl_mean     4.7561458
temp_mean    4.0546910
sal_mean     2.9788618
vostr_mean   1.8897280
mld_mean     1.8536327
bathy_sd     1.6149277
uo_mean      1.4543292
uostr_mean   1.3424132
vo_mean      1.1227679
pred_var     0.8740571
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index  var1.names var2.index var2.names int.size
1           3   temp_mean          1 o2_mean_0m   811.01
2           6  uostr_mean          1 o2_mean_0m   445.93
3           2    chl_mean          1 o2_mean_0m   381.63
4          11  bathy_mean          4   sal_mean   370.20
5          10    mld_mean          7    vo_mean   347.32
6          11  bathy_mean          5    uo_mean   325.10
7          13 o2_mean_60m          3  temp_mean   284.60
8           9    ssh_mean          2   chl_mean   279.34
9          11  bathy_mean          3  temp_mean   279.32
10         11  bathy_mean          1 o2_mean_0m   238.68
[1] "External percent deviance explained"
[1] -4.350923

[1] "TPR"
[1] 0.2538241
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9223194 -0.9369767 0.006545141  1.002044         -4.350923 0.8544179

explore_brt(mod_file_path = brt_outputs_Ntag[8], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.2217747
Correlation        0.9471965
AUC                0.9959000
Per.Expl          84.0023199
cvDeviance         0.4919018
cvCorrelation      0.8427542
cvAUC              0.9622200
cvPer.Expl        64.5167458
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean   31.0545282
o2_mean_0m   30.5314940
o2_mean_250m 15.9557395
temp_mean     3.9670785
chl_mean      3.9145848
sal_mean      2.7179083
ssh_mean      2.6863495
bathy_sd      1.7546135
vostr_mean    1.5258534
mld_mean      1.4129971
uostr_mean    1.3536665
uo_mean       1.3228056
vo_mean       1.0371771
pred_var      0.7652041
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3    temp_mean          1 o2_mean_0m   733.46
2          13 o2_mean_250m          1 o2_mean_0m   339.34
3           2     chl_mean          1 o2_mean_0m   329.37
4          13 o2_mean_250m          4   sal_mean   297.21
5           6   uostr_mean          5    uo_mean   278.72
6          11   bathy_mean          3  temp_mean   265.17
7          11   bathy_mean          1 o2_mean_0m   261.90
8          11   bathy_mean          4   sal_mean   224.97
9           9     ssh_mean          4   sal_mean   194.26
10          4     sal_mean          1 o2_mean_0m   187.63
[1] "External percent deviance explained"
[1] -4.187577

[1] "TPR"
[1] 0.2540331
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9189645 -0.9341724 0.00722521  1.002586         -4.187577 0.8400232

explore_brt(mod_file_path = brt_outputs_Ntag[9], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.1928786
Correlation        0.9567911
AUC                0.9976000
Per.Expl          86.0867365
cvDeviance         0.4815743
cvCorrelation      0.8475797
cvAUC              0.9638500
cvPer.Expl        65.2617211
[1] "Relative influence of predictor variables"

                rel.inf
o2_mean_0m   30.1719892
bathy_mean   29.0380795
o2_mean_250m 12.9073690
o2_mean_60m   7.1186837
chl_mean      3.2788350
temp_mean     3.2112162
sal_mean      2.8197533
ssh_mean      2.5431217
bathy_sd      1.6008272
vostr_mean    1.4381590
mld_mean      1.4332633
uostr_mean    1.3219324
uo_mean       1.2491223
vo_mean       1.0814297
pred_var      0.7862186
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           3    temp_mean          1 o2_mean_0m   632.37
2           2     chl_mean          1 o2_mean_0m   631.48
3          13  o2_mean_60m          3  temp_mean   326.47
4          11   bathy_mean          5    uo_mean   292.91
5          14 o2_mean_250m          1 o2_mean_0m   266.03
6          11   bathy_mean          3  temp_mean   212.56
7          11   bathy_mean          4   sal_mean   183.05
8           6   uostr_mean          5    uo_mean   176.12
9          14 o2_mean_250m         11 bathy_mean   166.45
10          4     sal_mean          1 o2_mean_0m   153.66
11          9     ssh_mean          4   sal_mean   153.50
[1] "External percent deviance explained"
[1] -4.421676

[1] "TPR"
[1] 0.2532748
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9253238 -0.9416763 0.005680135  1.003422         -4.421676 0.8608674

explore_brt(mod_file_path = brt_outputs_Ntag[10], 
            test_data = do_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862928
Residual.Deviance  0.1730551
Correlation        0.9630163
AUC                0.9983000
Per.Expl          87.5167023
cvDeviance         0.4571302
cvCorrelation      0.8567712
cvAUC              0.9671700
cvPer.Expl        67.0249860
[1] "Relative influence of predictor variables"

                rel.inf
dist_coast   51.0947600
o2_mean_0m   11.9843937
o2_mean_250m  7.5420896
lat           5.0225211
bathy_mean    3.7536001
o2_mean_60m   3.5987315
chl_mean      2.9175112
temp_mean     2.7347070
sal_mean      2.4927940
ssh_mean      1.8254724
mld_mean      1.3819650
vostr_mean    1.1186016
uo_mean       1.0635226
uostr_mean    0.9547572
bathy_sd      0.9449802
vo_mean       0.9346186
pred_var      0.6349742
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           4    temp_mean          2 o2_mean_0m  1104.53
2          12   bathy_mean          4  temp_mean   566.77
3          14   dist_coast         11   mld_mean   551.57
4          12   bathy_mean          1        lat   485.06
5          12   bathy_mean          5   sal_mean   373.98
6           5     sal_mean          2 o2_mean_0m   254.84
7          12   bathy_mean          3   chl_mean   244.77
8          16 o2_mean_250m         12 bathy_mean   192.70
9          15  o2_mean_60m          4  temp_mean   184.99
10         16 o2_mean_250m          2 o2_mean_0m   154.32
11         16 o2_mean_250m          1        lat   133.79
12          2   o2_mean_0m          1        lat   130.57
13         15  o2_mean_60m          3   chl_mean   117.28
14          3     chl_mean          2 o2_mean_0m   114.22
[1] "External percent deviance explained"
[1] -4.646858

[1] "TPR"
[1] 0.2525845
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained PseudoR2
1 0.9319747 -0.9496718 0.004143605  1.002288         -4.646858 0.875167

AGI models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not necessary to include tag ID.

explore_brt(mod_file_path = brt_outputs_Ntag[5], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.2497858
Correlation        0.9395089
AUC                0.9945000
Per.Expl          81.9816994
cvDeviance         0.5307930
cvCorrelation      0.8283713
cvAUC              0.9563800
cvPer.Expl        61.7112444
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 36.9573034
temp_mean  21.9226886
AGI_0m      9.9456826
ssh_mean    5.1190726
uostr_mean  5.0894934
sal_mean    4.6868399
chl_mean    4.6532916
vostr_mean  3.5292848
bathy_sd    2.1704713
uo_mean     1.7266251
mld_mean    1.7123633
vo_mean     1.5145409
pred_var    0.9723426
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         12     AGI_0m          2  temp_mean  6924.06
2         10 bathy_mean          4    uo_mean   486.67
3         12     AGI_0m          8   ssh_mean   471.86
4         10 bathy_mean          8   ssh_mean   421.48
5         12     AGI_0m          4    uo_mean   404.95
6         10 bathy_mean          2  temp_mean   375.23
7         10 bathy_mean          3   sal_mean   341.20
8          8   ssh_mean          5 uostr_mean   230.04
[1] "External percent deviance explained"
[1] -3.883058

[1] "TPR"
[1] 0.2539667
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained PseudoR2
1 0.9104791 -0.9295857 0.007261449 0.9609996         -3.883058 0.819817

explore_brt(mod_file_path = brt_outputs_Ntag[6], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.1967422
Correlation        0.9556098
AUC                0.9974000
Per.Expl          85.8080043
cvDeviance         0.4860996
cvCorrelation      0.8459321
cvAUC              0.9628400
cvPer.Expl        64.9352059
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 53.5244509
lat        10.5089650
AGI_0m      7.1058947
bathy_mean  6.5961745
temp_mean   5.3593100
chl_mean    3.3173597
sal_mean    2.8581691
ssh_mean    2.0268410
uo_mean     1.5245115
vostr_mean  1.4444895
mld_mean    1.3388584
bathy_sd    1.2377161
uostr_mean  1.2096878
vo_mean     1.1623617
pred_var    0.7852103
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  2714.25
2          13     AGI_0m          1        lat   626.22
3          11 bathy_mean          3  temp_mean   463.19
4          11 bathy_mean          2   chl_mean   314.46
5           3  temp_mean          1        lat   308.75
6          11 bathy_mean          5    uo_mean   282.97
7          13     AGI_0m         11 bathy_mean   248.68
8          11 bathy_mean          1        lat   234.37
9          14 dist_coast          8 vostr_mean   177.05
10         11 bathy_mean          9   ssh_mean   176.04
11         11 bathy_mean          4   sal_mean   173.67
[1] "External percent deviance explained"
[1] -4.389091

[1] "TPR"
[1] 0.2526244
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained PseudoR2
1 0.9251414 -0.9441604 0.004502158 0.9584907         -4.389091  0.85808

explore_brt(mod_file_path = brt_outputs_Ntag[4], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.1981884
Correlation        0.9571046
AUC                0.9975000
Per.Expl          85.7036795
cvDeviance         0.5045864
cvCorrelation      0.8398032
cvAUC              0.9604600
cvPer.Expl        63.6016628
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 34.034848
temp_mean  21.042165
AGI_0m      9.921024
AGI_60m     5.785260
sal_mean    5.106237
uostr_mean  4.656571
chl_mean    4.204098
ssh_mean    4.041292
vostr_mean  3.376607
bathy_sd    1.942936
uo_mean     1.777164
mld_mean    1.692922
vo_mean     1.399912
pred_var    1.018965
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  6728.90
2          10 bathy_mean          2  temp_mean   594.17
3          12     AGI_0m          8   ssh_mean   365.47
4          10 bathy_mean          3   sal_mean   336.23
5          10 bathy_mean          8   ssh_mean   334.83
6          13    AGI_60m         10 bathy_mean   328.60
7          10 bathy_mean          4    uo_mean   286.71
8          12     AGI_0m          4    uo_mean   200.22
9          13    AGI_60m          2  temp_mean   196.14
10          5 uostr_mean          2  temp_mean   190.43
[1] "External percent deviance explained"
[1] -4.229205

[1] "TPR"
[1] 0.2524369
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5100 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9233124 -0.9460456 0.004115473 0.9609968         -4.229205 0.8570368

explore_brt(mod_file_path = brt_outputs_Ntag[1], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.2223382
Correlation        0.9477111
AUC                0.9960000
Per.Expl          83.9616359
cvDeviance         0.5036610
cvCorrelation      0.8404252
cvAUC              0.9600700
cvPer.Expl        63.6684121
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 31.079609
temp_mean  20.440342
AGI_250m   12.927116
AGI_0m      8.557389
uostr_mean  5.276555
ssh_mean    4.592386
sal_mean    4.485481
chl_mean    3.402499
bathy_sd    2.123828
vostr_mean  2.027799
uo_mean     1.572834
vo_mean     1.438505
mld_mean    1.305188
pred_var    0.770471
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  4454.06
2          10 bathy_mean          3   sal_mean   445.32
3          13   AGI_250m          2  temp_mean   412.52
4          13   AGI_250m          3   sal_mean   286.51
5          13   AGI_250m         10 bathy_mean   285.61
6          12     AGI_0m         10 bathy_mean   283.01
7          10 bathy_mean          2  temp_mean   274.09
8          12     AGI_0m          4    uo_mean   267.22
9          12     AGI_0m          8   ssh_mean   234.66
10         12     AGI_0m         11   bathy_sd   234.29
[1] "External percent deviance explained"
[1] -4.130385

[1] "TPR"
[1] 0.253199
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9190934 -0.9393155 0.005542045 0.9595282         -4.130385 0.8396164

explore_brt(mod_file_path = brt_outputs_Ntag[2], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.1822323
Correlation        0.9612794
AUC                0.9981000
Per.Expl          86.8546720
cvDeviance         0.4900686
cvCorrelation      0.8457470
cvAUC              0.9625500
cvPer.Expl        64.6489000
[1] "Relative influence of predictor variables"

              rel.inf
bathy_mean 29.5789429
temp_mean  20.2111119
AGI_250m   12.8943905
AGI_0m      8.2276644
uostr_mean  5.6018303
sal_mean    4.1762023
ssh_mean    3.7524824
AGI_60m     3.4167547
chl_mean    3.2497693
vostr_mean  1.9240122
bathy_sd    1.8515668
uo_mean     1.5600765
mld_mean    1.3810041
vo_mean     1.3729043
pred_var    0.8012876
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  5794.97
2          10 bathy_mean          3   sal_mean   437.23
3          14   AGI_250m          2  temp_mean   429.14
4          12     AGI_0m         10 bathy_mean   421.21
5          13    AGI_60m         10 bathy_mean   414.15
6          12     AGI_0m          8   ssh_mean   331.82
7           4    uo_mean          2  temp_mean   322.86
8          10 bathy_mean          4    uo_mean   294.72
9          10 bathy_mean          2  temp_mean   294.26
10         14   AGI_250m          3   sal_mean   239.00
11          8   ssh_mean          3   sal_mean   167.69
[1] "External percent deviance explained"
[1] -4.403407

[1] "TPR"
[1] 0.252197
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5150 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9290584 -0.9515003 0.003382733 0.9601138         -4.403407 0.8685467

explore_brt(mod_file_path = brt_outputs_Ntag[3], 
            test_data = agi_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.1859167
Correlation        0.9590992
AUC                0.9978000
Per.Expl          86.5888987
cvDeviance         0.4697185
cvCorrelation      0.8519017
cvAUC              0.9651800
cvPer.Expl        66.1168520
[1] "Relative influence of predictor variables"

              rel.inf
dist_coast 51.8697060
lat         9.6962485
AGI_0m      6.1261413
bathy_mean  5.4406471
AGI_250m    4.8968222
temp_mean   4.8373778
chl_mean    2.9005605
sal_mean    2.7254972
AGI_60m     2.2333278
ssh_mean    1.9217436
uo_mean     1.3083809
mld_mean    1.2604230
vostr_mean  1.2375953
vo_mean     1.0230573
uostr_mean  0.9963094
bathy_sd    0.9226174
pred_var    0.6035448
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1149.59
2          13     AGI_0m          1        lat   966.63
3          13     AGI_0m         11 bathy_mean   645.51
4          15    AGI_60m         11 bathy_mean   335.90
5          11 bathy_mean          3  temp_mean   275.79
6          11 bathy_mean          5    uo_mean   265.44
7           6 uostr_mean          1        lat   221.40
8          16   AGI_250m         11 bathy_mean   194.80
9          13     AGI_0m          5    uo_mean   193.30
10         11 bathy_mean          1        lat   183.75
11          3  temp_mean          1        lat   176.25
12         15    AGI_60m          3  temp_mean   165.89
13         11 bathy_mean          2   chl_mean   152.85
14          8 vostr_mean          5    uo_mean   137.36
[1] "External percent deviance explained"
[1] -4.428819

[1] "TPR"
[1] 0.252357
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained PseudoR2
1 0.9289301 -0.9497639 0.00375298 0.9585225         -4.428819 0.865889

Summary table of results

output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)

model	percent_explained	deviance_exp	TPR_mean	TSS	AUC	RMSE	SpearmanCor	PseudoR2
base_0m_Nspat_Ntag	78.734	0.724	0.739	0.870	0.979	0.231	0.888	0.787
do_0m_Nspat_Ntag	83.930	0.785	0.744	0.906	0.987	0.199	0.919	0.839
do_0m_Yspat_Ntag	86.121	0.810	0.746	0.921	0.990	0.186	0.930	0.861
do_0m_60m_Nspat_Ntag	85.442	0.802	0.745	0.919	0.989	0.189	0.927	0.854
do_0m_250m_Nspat_Ntag	84.002	0.789	0.744	0.910	0.987	0.197	0.920	0.840
do_0m_60m_250m_Nspat_Ntag	86.087	0.809	0.746	0.917	0.990	0.187	0.929	0.861
do_0m_60m_250m_Yspat_Ntag	87.517	0.823	0.747	0.928	0.992	0.179	0.935	0.875
agi_0m_Nspat_Ntag	81.982	0.775	0.743	0.903	0.987	0.204	0.915	0.820
agi_0m_Yspat_Ntag	85.808	0.809	0.746	0.922	0.990	0.186	0.930	0.858
agi_0m_60m_Nspat_Ntag	85.704	0.805	0.745	0.922	0.990	0.187	0.929	0.857
agi_0m_250m_Nspat_Ntag	83.962	0.793	0.744	0.914	0.988	0.195	0.923	0.840
agi_0m_60m_250m_Nspat_Ntag	86.855	0.818	0.746	0.928	0.991	0.179	0.935	0.869
agi_0m_60m_250m_Yspat_Ntag	86.589	0.820	0.746	0.928	0.991	0.180	0.935	0.866

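# Keep only the models fit without spatial predictors (no "Yspat" in the name)
output_sum_Ntag_Nspat <- output_sum_Ntag %>% 
  filter(!grepl("Yspat", model))

# Plot AUC against TSS for each model, colored by deviance explained;
# the model labels are mapped in geom_label_repel() below
ggplot(output_sum_Ntag_Nspat, aes(AUC, TSS, color = deviance_exp)) +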
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50',
                  max.overlaps = 20,
                  label.size = 0.5)
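
For reference when reading the AUC vs. TSS plot, the true skill statistic is conventionally defined as sensitivity + specificity - 1. The block below is a minimal sketch of that definition, not necessarily how `explore_brt()` computes it: the function name `tss_at`, the 0.5 threshold, and the toy vectors are all assumptions made for illustration.

```r
# Hypothetical helper: TSS at a single probability threshold (threshold is an assumption).
tss_at <- function(obs, pred, threshold = 0.5) {
  pred_class <- as.integer(pred >= threshold)

  # Confusion matrix cells
  tp <- sum(pred_class == 1 & obs == 1)
  fn <- sum(pred_class == 0 & obs == 1)
  tn <- sum(pred_class == 0 & obs == 0)
  fp <- sum(pred_class == 1 & obs == 0)

  sensitivity <- tp / (tp + fn)  # true positive rate
  specificity <- tn / (tn + fp)  # true negative rate
  sensitivity + specificity - 1
}

# Toy presences (1) and pseudo-absences (0) with made-up predicted suitabilities
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.1, 0.6)

tss_toy <- tss_at(obs, pred)
print(tss_toy)
```

TSS ranges from -1 to 1: values near 1 indicate good separation of presences and pseudo-absences, values near 0 indicate no skill, and negative values indicate predictions that are systematically reversed relative to the observations.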

Conclusions from initial models w/o tag ID

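Considering only the models that did not include spatial data as predictors, the AGI models generally performed as well as or slightly better than the comparable DO models (e.g., TSS of 0.922 vs. 0.919 for the 0 m + 60 m models). The exception was the surface-only pair, where the DO model scored slightly higher (TSS of 0.906 vs. 0.903).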
The AGI model with all depth layers performed the best (TSS 0.928, percent explained 86.9) and better than the comparable DO model (TSS 0.917, percent explained 86.1).
For the DO model with all depth layers, DO_0m was the predictor variable with the highest relative influence, closely followed by bathymetry. DO_250m was the third most influential predictor, but its influence was considerably lower than that of DO_0m and bathymetry. Partial plots show drastically different relationships than the CRW PA models, with DO_250m having a positive correlation and DO_0m having an inverse sweet spot.
For the AGI model with all depth layers, bathymetry (29.6%) and temperature (20.2%) were the two predictors with the highest relative influence. AGI_250m was third (12.9%), followed by AGI_0m (8.2%). The partial plots for these two AGI variables are similar to those in the DO models, but less extreme.
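
As a quick check of the AGI vs. DO comparison, the paired rows of the summary table (models without spatial predictors) can be differenced directly. This is a minimal sketch: the values are copied by hand from the table above, and the data frame and column names (`perf`, `tss_diff`, `pct_diff`) are made up for illustration.

```r
# TSS and percent_explained copied from the summary table (Nspat models only).
perf <- data.frame(
  layers  = c("0m", "0m_60m", "0m_250m", "0m_60m_250m"),
  do_tss  = c(0.906, 0.919, 0.910, 0.917),
  agi_tss = c(0.903, 0.922, 0.914, 0.928),
  do_pct  = c(83.930, 85.442, 84.002, 86.087),
  agi_pct = c(81.982, 85.704, 83.962, 86.855)
)

# Positive differences mean the AGI model scored higher than its DO counterpart.
perf$tss_diff <- round(perf$agi_tss - perf$do_tss, 3)
perf$pct_diff <- round(perf$agi_pct - perf$do_pct, 3)

print(perf[, c("layers", "tss_diff", "pct_diff")])
```

Differencing the pairs this way makes it easier to see where the two model families diverge and where the gap is within rounding of the reported metrics.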

Base models w/o tag ID and w/ data at seasonal and annual resolutions

For these models, the environmental raster data were averaged by season and by year. Environmental values were then extracted at the observed and pseudo-absence locations, with each location matched to the averaged raster file for its season or year.
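
The matching step described above can be sketched with a toy table standing in for the rasters. This is only an illustration under stated assumptions: the data are invented, the season definition (boreal meteorological seasons by calendar month, pooled across years) is assumed rather than taken from the analysis, and the names `season_of`, `temp_seas`, and `temp_ann` are hypothetical.

```r
# Toy daily covariate values for a single grid cell (invented data).
daily <- data.frame(
  date = as.Date(c("2004-01-15", "2004-02-15", "2004-07-15",
                   "2004-08-15", "2005-01-15")),
  temp = c(14, 16, 22, 24, 15)
)

# Assumed season definition: meteorological seasons by calendar month.
season_of <- function(d) {
  m <- as.integer(format(d, "%m"))
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}

daily$season <- season_of(daily$date)
daily$year   <- as.integer(format(daily$date, "%Y"))

# Average the covariate by season and by year.
seasonal <- aggregate(temp ~ season, data = daily, FUN = mean)
names(seasonal)[2] <- "temp_seas"
annual <- aggregate(temp ~ year, data = daily, FUN = mean)
names(annual)[2] <- "temp_ann"

# Hypothetical presence / pseudo-absence locations, keyed by their own dates.
locs <- data.frame(
  id   = 1:3,
  date = as.Date(c("2004-07-20", "2005-01-03", "2004-02-28"))
)
locs$season <- season_of(locs$date)
locs$year   <- as.integer(format(locs$date, "%Y"))

# Match each location to the seasonal and annual averages for its date.
locs <- merge(locs, seasonal, by = "season", all.x = TRUE)
locs <- merge(locs, annual, by = "year", all.x = TRUE)
locs <- locs[order(locs$id), ]

print(locs[, c("id", "date", "season", "year", "temp_seas", "temp_ann")])
```

With real rasters the averaging and extraction would happen per grid cell, but the keying logic is the same: each location carries a season and a year, and those keys select which averaged layer supplies its covariate value.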

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = base_test_seasonal)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862930
Residual.Deviance  0.3718825
Correlation        0.8913160
AUC                0.9811000
Per.Expl          73.1743220
cvDeviance         0.5439165
cvCorrelation      0.8203274
cvAUC              0.9543000
cvPer.Expl        60.7646778
[1] "Relative influence of predictor variables"

              rel.inf
vo_mean    37.7484397
vostr_mean 13.0995207
bathy_mean  9.5110530
uostr_mean  8.6973249
ssh_mean    8.4917915
sal_mean    6.2551008
temp_mean   5.2891595
mld_mean    3.9670224
chl_mean    2.9443099
uo_mean     1.9103745
bathy_sd    1.2421311
pred_var    0.8437721
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          2   sal_mean          1   mld_mean  1130.27
2         10 bathy_mean          6 uostr_mean   473.01
3          8 vostr_mean          4  temp_mean   345.16
4          7    vo_mean          3   ssh_mean   238.10
5          7    vo_mean          2   sal_mean   188.96
6          8 vostr_mean          3   ssh_mean   179.53
7          4  temp_mean          2   sal_mean   164.83
[1] "External percent deviance explained"
[1] -3.262271

[1] "TPR"
[1] 0.2625646
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.8871873 -0.8752483 0.02439327  0.990689         -3.262271 0.7317432

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = base_test_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862892
Residual.Deviance  0.3485522
Correlation        0.9016247
AUC                0.9844000
Per.Expl          74.8571794
cvDeviance         0.5423354
cvCorrelation      0.8223235
cvAUC              0.9539500
cvPer.Expl        60.8786270
[1] "Relative influence of predictor variables"

              rel.inf
vo_mean    38.5581912
vostr_mean 16.9760400
uostr_mean 11.7424763
bathy_mean 10.1812331
chl_mean    5.1572250
sal_mean    4.4084308
temp_mean   3.6019413
ssh_mean    3.1522272
mld_mean    2.3610565
uo_mean     1.8300039
bathy_sd    1.2618894
pred_var    0.7692852
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          8 vostr_mean          6 uostr_mean  1088.69
2          7    vo_mean          6 uostr_mean   501.60
3         10 bathy_mean          8 vostr_mean   396.66
4          3   ssh_mean          1   mld_mean   391.80
5          8 vostr_mean          3   ssh_mean   319.72
6          8 vostr_mean          4  temp_mean   298.87
7          8 vostr_mean          1   mld_mean   259.45
[1] "External percent deviance explained"
[1] -3.253788

[1] "TPR"
[1] 0.2632895
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.8860142 -0.87391 0.0259118 0.9730752         -3.253788 0.7485718

DO models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2470047
Correlation        0.9382402
AUC                0.9942000
Per.Expl          82.1822454
cvDeviance         0.4886605
cvCorrelation      0.8432439
cvAUC              0.9622300
cvPer.Expl        64.7503346
[1] "Relative influence of predictor variables"

                     rel.inf
o2_mean_0m_seas   29.8253334
bathy_mean        28.6073228
o2_mean_250m_seas 12.8824390
o2_mean_60m_seas   7.4286303
ssh_mean           3.7142049
chl_mean           3.4442744
temp_mean          3.3428766
sal_mean           2.5604620
uostr_mean         1.4038380
mld_mean           1.3638226
bathy_sd           1.3610024
vostr_mean         1.2779076
uo_mean            1.2051556
vo_mean            0.8640793
pred_var           0.7186511
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index      var2.names int.size
1          14  o2_mean_60m_seas          3        sal_mean   327.14
2          15 o2_mean_250m_seas         10      bathy_mean   303.66
3          10        bathy_mean          1        chl_mean   255.27
4          10        bathy_mean          4         uo_mean   243.69
5          10        bathy_mean          8        ssh_mean   225.29
6          15 o2_mean_250m_seas         13 o2_mean_0m_seas   198.72
7          10        bathy_mean          2       temp_mean   198.07
8          13   o2_mean_0m_seas          8        ssh_mean   191.82
9          13   o2_mean_0m_seas          2       temp_mean   185.50
10         10        bathy_mean          3        sal_mean   165.98
11         13   o2_mean_0m_seas         10      bathy_mean   164.29
[1] "External percent deviance explained"
[1] -3.957167

[1] "TPR"
[1] 0.2572183
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6850 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9093165 -0.9106175 0.01362216 0.9920977         -3.957167 0.8218225

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2393353
Correlation        0.9405329
AUC                0.9946000
Per.Expl          82.7354798
cvDeviance         0.4837428
cvCorrelation      0.8448644
cvAUC              0.9630300
cvPer.Expl        65.1050721
[1] "Relative influence of predictor variables"

                     rel.inf
dist_coast        51.7884390
o2_mean_0m_seas   11.3114005
o2_mean_250m_seas  8.0490924
o2_mean_60m_seas   4.3433808
bathy_mean         4.0318500
lat                3.9720294
chl_mean           3.0187321
temp_mean          2.7062765
sal_mean           2.2069866
ssh_mean           1.8282680
mld_mean           1.2179747
uo_mean            1.0978100
vostr_mean         1.0915666
bathy_sd           1.0099016
uostr_mean         0.9010789
vo_mean            0.8156201
pred_var           0.6095928
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index var2.names int.size
1          11        bathy_mean          3  temp_mean   394.73
2          17 o2_mean_250m_seas         11 bathy_mean   298.69
3          11        bathy_mean          2   chl_mean   293.55
4          11        bathy_mean          5    uo_mean   198.66
5          16  o2_mean_60m_seas          4   sal_mean   174.92
6          16  o2_mean_60m_seas         11 bathy_mean   171.41
7          15   o2_mean_0m_seas          1        lat   160.69
8          11        bathy_mean          9   ssh_mean   156.06
9          13        dist_coast         10   mld_mean   151.33
10         11        bathy_mean          4   sal_mean   125.29
11          6        uostr_mean          1        lat   112.97
12         15   o2_mean_0m_seas          4   sal_mean   111.65
13         16  o2_mean_60m_seas          3  temp_mean   100.81
14          9          ssh_mean          4   sal_mean    99.29
[1] "External percent deviance explained"
[1] -4.031325

[1] "TPR"
[1] 0.2570792
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6800 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.9113597 -0.9126288 0.0133201  0.990409         -4.031325 0.8273548

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2418740
Correlation        0.9426300
AUC                0.9953000
Per.Expl          82.5523491
cvDeviance         0.5203305
cvCorrelation      0.8308027
cvAUC              0.9580000
cvPer.Expl        62.4658114
[1] "Relative influence of predictor variables"

                    rel.inf
bathy_mean       27.6559956
o2_mean_0m_ann   22.1577042
o2_mean_250m_ann 13.9331703
temp_mean         7.5838024
o2_mean_60m_ann   7.1954997
chl_mean          4.4958699
sal_mean          3.5287622
ssh_mean          2.9167102
uostr_mean        2.2307148
vostr_mean        1.6575902
mld_mean          1.5932019
bathy_sd          1.5853359
uo_mean           1.4712842
vo_mean           1.1971742
pred_var          0.7971843
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index     var2.names int.size
1          14  o2_mean_60m_ann          2      temp_mean   476.01
2          14  o2_mean_60m_ann         10     bathy_mean   249.69
3          10       bathy_mean          2      temp_mean   246.40
4          10       bathy_mean          1       chl_mean   230.23
5          10       bathy_mean          4        uo_mean   223.84
6          10       bathy_mean          3       sal_mean   190.35
7           8         ssh_mean          5     uostr_mean   157.00
8          10       bathy_mean          8       ssh_mean   137.07
9          14  o2_mean_60m_ann         13 o2_mean_0m_ann   133.08
10         13   o2_mean_0m_ann          3       sal_mean   125.77
11         15 o2_mean_250m_ann         13 o2_mean_0m_ann   123.86
[1] "External percent deviance explained"
[1] -3.831515

[1] "TPR"
[1] 0.2571994
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9064786 -0.910928 0.01359871 0.9935801         -3.831515 0.8255235

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2251524
Correlation        0.9476293
AUC                0.9962000
Per.Expl          83.7585691
cvDeviance         0.5016073
cvCorrelation      0.8374576
cvAUC              0.9609200
cvPer.Expl        63.8164123
[1] "Relative influence of predictor variables"

                    rel.inf
dist_coast       52.0391515
lat               7.5722313
o2_mean_250m_ann  5.9824841
o2_mean_0m_ann    5.6170844
bathy_mean        5.3985391
chl_mean          4.1449974
temp_mean         3.4237513
o2_mean_60m_ann   3.1501892
sal_mean          2.8294737
ssh_mean          2.2021890
vostr_mean        1.4298185
mld_mean          1.3292879
uo_mean           1.2080682
bathy_sd          1.0629375
uostr_mean        1.0065255
vo_mean           0.9086555
pred_var          0.6946158
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index var2.names int.size
1          11       bathy_mean          3  temp_mean   552.75
2          16  o2_mean_60m_ann         11 bathy_mean   437.44
3          11       bathy_mean          2   chl_mean   329.67
4          16  o2_mean_60m_ann          3  temp_mean   246.01
5           6       uostr_mean          1        lat   204.09
6          13       dist_coast         10   mld_mean   180.83
7          15   o2_mean_0m_ann          1        lat   172.78
8          11       bathy_mean          9   ssh_mean   157.39
9          11       bathy_mean          5    uo_mean   129.80
10         16  o2_mean_60m_ann          4   sal_mean   121.32
11          8       vostr_mean          5    uo_mean   119.76
12         16  o2_mean_60m_ann          1        lat   111.11
13         17 o2_mean_250m_ann          1        lat    97.43
14          3        temp_mean          1        lat    84.62
[1] "External percent deviance explained"
[1] -4.025613

[1] "TPR"
[1] 0.2564111
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8400 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9113126 -0.9164691 0.01199286 0.9927838         -4.025613 0.8375857

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.1951649
Correlation        0.9560043
AUC                0.9974000
Per.Expl          85.9217263
cvDeviance         0.4585485
cvCorrelation      0.8553534
cvAUC              0.9665500
cvPer.Expl        66.9224706
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        26.7948276
o2_mean_0m        21.5078589
o2_mean_250m_seas 10.0195711
o2_mean_0m_seas    8.3152092
o2_mean_60m_seas   5.4522658
o2_mean_250m_ann   3.7071018
o2_mean_0m_ann     3.4441561
chl_mean           2.6938371
temp_mean          2.5287651
o2_mean_60m_ann    1.9918635
sal_mean           1.9409870
ssh_mean           1.9325752
o2_mean_250m       1.8523367
o2_mean_60m        1.6689402
vostr_mean         1.0311990
bathy_sd           1.0131874
mld_mean           1.0037532
uostr_mean         0.9663831
uo_mean            0.9203555
vo_mean            0.6778727
pred_var           0.5369536
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index        var2.names int.size
1           3         temp_mean          1        o2_mean_0m   370.38
2          18 o2_mean_250m_seas         14      o2_mean_250m   306.19
3          18 o2_mean_250m_seas         11        bathy_mean   305.67
4          13       o2_mean_60m         11        bathy_mean   224.97
5          17  o2_mean_60m_seas          4          sal_mean   204.17
6          11        bathy_mean          2          chl_mean   135.89
7          20   o2_mean_60m_ann         11        bathy_mean   134.81
8          11        bathy_mean          3         temp_mean   133.33
9          11        bathy_mean          5           uo_mean   129.99
10          2          chl_mean          1        o2_mean_0m   114.45
11         11        bathy_mean          4          sal_mean   102.36
12         16   o2_mean_0m_seas          3         temp_mean    96.45
13         16   o2_mean_0m_seas          9          ssh_mean    94.23
14         20   o2_mean_60m_ann          7           vo_mean    90.49
15         11        bathy_mean          9          ssh_mean    87.48
16          4          sal_mean          3         temp_mean    83.53
17          8        vostr_mean          1        o2_mean_0m    81.07
18         21  o2_mean_250m_ann         18 o2_mean_250m_seas    80.52
19         18 o2_mean_250m_seas          1        o2_mean_0m    72.67
20          6        uostr_mean          1        o2_mean_0m    72.01
21         19    o2_mean_0m_ann         17  o2_mean_60m_seas    67.89
22         13       o2_mean_60m          3         temp_mean    66.18
[1] "External percent deviance explained"
[1] -4.334845

[1] "TPR"
[1] 0.2553172
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9223993 -0.9289301 0.009656197 0.9896762         -4.334845 0.8592173

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.1892159
Correlation        0.9579015
AUC                0.9977000
Per.Expl          86.3508545
cvDeviance         0.4502540
cvCorrelation      0.8578266
cvAUC              0.9676900
cvPer.Expl        67.5207993
[1] "Relative influence of predictor variables"

                     rel.inf
dist_coast        49.9204334
o2_mean_0m         8.5367913
o2_mean_0m_seas    5.4191441
o2_mean_250m_seas  4.6052682
lat                3.6613526
bathy_mean         3.5322270
o2_mean_60m_seas   3.1409177
chl_mean           2.4859345
o2_mean_250m_ann   2.4025619
temp_mean          2.2239870
sal_mean           1.7501465
o2_mean_60m        1.6449577
o2_mean_250m       1.5228690
o2_mean_60m_ann    1.5065172
ssh_mean           1.4200972
o2_mean_0m_ann     0.9673241
vostr_mean         0.9644953
mld_mean           0.9434420
uo_mean            0.8649698
bathy_sd           0.7097241
uostr_mean         0.6677465
vo_mean            0.6428025
pred_var           0.4662903
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index        var2.names int.size
1          20 o2_mean_250m_seas         12        bathy_mean   350.89
2          12        bathy_mean          3          chl_mean   339.39
3          12        bathy_mean          4         temp_mean   252.74
4          20 o2_mean_250m_seas         16      o2_mean_250m   233.93
5          12        bathy_mean          5          sal_mean   231.91
6          14        dist_coast          9        vostr_mean   230.46
7          22   o2_mean_60m_ann         12        bathy_mean   220.69
8           4         temp_mean          2        o2_mean_0m   210.49
9          14        dist_coast         11          mld_mean   185.40
10         15       o2_mean_60m         12        bathy_mean   177.56
11          3          chl_mean          2        o2_mean_0m   147.74
12         12        bathy_mean          6           uo_mean   144.14
13         12        bathy_mean         10          ssh_mean   142.95
14          9        vostr_mean          2        o2_mean_0m   136.67
15         15       o2_mean_60m          4         temp_mean   103.32
16         21    o2_mean_0m_ann          1               lat    99.30
17         19  o2_mean_60m_seas          5          sal_mean    97.76
18         18   o2_mean_0m_seas         13          bathy_sd    89.75
19          2        o2_mean_0m          1               lat    78.99
20         18   o2_mean_0m_seas          1               lat    77.18
21         22   o2_mean_60m_ann          2        o2_mean_0m    76.72
22         22   o2_mean_60m_ann          1               lat    68.39
23         18   o2_mean_0m_seas         10          ssh_mean    61.97
24         23  o2_mean_250m_ann         20 o2_mean_250m_seas    60.12
25         18   o2_mean_0m_seas          8           vo_mean    56.72
26         21    o2_mean_0m_ann         19  o2_mean_60m_seas    56.55
[1] "External percent deviance explained"
[1] -4.38333

[1] "TPR"
[1] 0.2550322
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9237878 -0.9312165 0.009047522 0.9890362          -4.38333 0.8635085

AGI models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2421421
Correlation        0.9418055
AUC                0.9951000
Per.Expl          82.5329751
cvDeviance         0.5074723
cvCorrelation      0.8361282
cvAUC              0.9592900
cvPer.Expl        63.3932574
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    28.8388522
temp_mean     21.3968548
AGI_250m_seas 15.4230725
uostr_mean     5.4030524
AGI_0m_seas    5.3820944
sal_mean       4.6015206
AGI_60m_seas   4.2373703
chl_mean       3.4946950
ssh_mean       2.7499598
mld_mean       1.7088384
vostr_mean     1.7086736
bathy_sd       1.6872755
uo_mean        1.3773158
vo_mean        1.2699474
pred_var       0.7204775
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          10    bathy_mean          3   sal_mean   513.24
2          15 AGI_250m_seas          2  temp_mean   439.74
3          10    bathy_mean          2  temp_mean   305.07
4          15 AGI_250m_seas         10 bathy_mean   256.15
5          14  AGI_60m_seas         10 bathy_mean   207.41
6          13   AGI_0m_seas          2  temp_mean   201.25
7          13   AGI_0m_seas          6    vo_mean   184.66
8          14  AGI_60m_seas          2  temp_mean   181.04
9          10    bathy_mean          4    uo_mean   143.19
10          2     temp_mean          1   chl_mean   132.10
11          6       vo_mean          3   sal_mean   131.10
[1] "External percent deviance explained"
[1] -3.890904

[1] "TPR"
[1] 0.257047
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9083448 -0.9128889 0.01332724 0.9877051         -3.890904 0.8253298

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2155743
Correlation        0.9500420
AUC                0.9967000
Per.Expl          84.4494519
cvDeviance         0.4841299
cvCorrelation      0.8444286
cvAUC              0.9628100
cvPer.Expl        65.0770781
[1] "Relative influence of predictor variables"

                 rel.inf
dist_coast    52.3084109
lat            7.5685093
AGI_250m_seas  6.5650645
bathy_mean     4.9318242
temp_mean      4.7807364
AGI_0m_seas    4.5184742
AGI_60m_seas   3.7734783
sal_mean       3.4432083
chl_mean       3.2167249
ssh_mean       1.8300516
mld_mean       1.4287051
vostr_mean     1.1808924
uo_mean        1.1091414
vo_mean        0.9476019
uostr_mean     0.9377008
bathy_sd       0.8703324
pred_var       0.5891434
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          11    bathy_mean          3  temp_mean   467.69
2          15   AGI_0m_seas          7    vo_mean   276.78
3          17 AGI_250m_seas         11 bathy_mean   271.41
4           3     temp_mean          1        lat   255.78
5          13    dist_coast         10   mld_mean   230.24
6          15   AGI_0m_seas          4   sal_mean   207.85
7           4      sal_mean          1        lat   151.54
8          11    bathy_mean          2   chl_mean   151.13
9           6    uostr_mean          1        lat   142.70
10          3     temp_mean          2   chl_mean   141.51
11         11    bathy_mean          9   ssh_mean   131.67
12         13    dist_coast          8 vostr_mean   107.96
13         16  AGI_60m_seas         11 bathy_mean   106.29
14         13    dist_coast          1        lat   100.32
[1] "External percent deviance explained"
[1] -4.131152

[1] "TPR"
[1] 0.2558366
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9161296 -0.9231533 0.01083895 0.9905151         -4.131152 0.8444945

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2628935
Correlation        0.9354953
AUC                0.9940000
Per.Expl          81.0360614
cvDeviance         0.5359687
cvCorrelation      0.8240669
cvAUC              0.9553400
cvPer.Expl        61.3376613
[1] "Relative influence of predictor variables"

               rel.inf
bathy_mean   29.262030
temp_mean    21.315013
AGI_250m_ann 14.991258
uostr_mean    5.778944
sal_mean      5.124400
AGI_60m_ann   4.443709
chl_mean      3.980368
ssh_mean      3.370811
AGI_0m_ann    2.546665
bathy_sd      2.059610
vostr_mean    1.952209
mld_mean      1.619756
uo_mean       1.510746
vo_mean       1.324461
pred_var      0.720020
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index  var2.names int.size
1          10   bathy_mean          3    sal_mean   726.81
2          14  AGI_60m_ann         10  bathy_mean   459.19
3          13   AGI_0m_ann          2   temp_mean   378.68
4          15 AGI_250m_ann          2   temp_mean   273.55
5          13   AGI_0m_ann          3    sal_mean   218.74
6          10   bathy_mean          2   temp_mean   216.81
7          14  AGI_60m_ann          2   temp_mean   206.70
8          15 AGI_250m_ann         14 AGI_60m_ann   187.66
9          15 AGI_250m_ann         13  AGI_0m_ann   184.90
10         14  AGI_60m_ann          3    sal_mean   136.67
11          6      vo_mean          3    sal_mean   133.79
[1] "External percent deviance explained"
[1] -3.619672

[1] "TPR"
[1] 0.2575821
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7900 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE        Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.9025154 -0.9080129 0.0144611 0.9867976         -3.619672 0.8103606

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2360291
Correlation        0.9435098
AUC                0.9956000
Per.Expl          82.9739352
cvDeviance         0.5061817
cvCorrelation      0.8352547
cvAUC              0.9599800
cvPer.Expl        63.4863619
[1] "Relative influence of predictor variables"

                rel.inf
dist_coast   51.6411632
lat           8.4674358
AGI_60m_ann   6.0984294
bathy_mean    5.2557204
AGI_250m_ann  4.8594233
temp_mean     4.7777648
chl_mean      3.8187141
sal_mean      3.5230954
AGI_0m_ann    2.2155331
ssh_mean      2.0640587
mld_mean      1.3580563
vostr_mean    1.3000387
uo_mean       1.1310191
uostr_mean    1.0961652
bathy_sd      0.9743069
vo_mean       0.8236064
pred_var      0.5954694
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index  var2.names int.size
1          16  AGI_60m_ann         11  bathy_mean   511.73
2          11   bathy_mean          3   temp_mean   409.15
3           6   uostr_mean          1         lat   217.27
4           3    temp_mean          1         lat   197.20
5          15   AGI_0m_ann          3   temp_mean   188.34
6          11   bathy_mean          1         lat   169.31
7           2     chl_mean          1         lat   168.59
8           4     sal_mean          1         lat   164.04
9          17 AGI_250m_ann         16 AGI_60m_ann   155.97
10         15   AGI_0m_ann          4    sal_mean   147.21
11         13   dist_coast         10    mld_mean   142.77
12         13   dist_coast          8  vostr_mean   121.93
13         11   bathy_mean          2    chl_mean   113.84
14         15   AGI_0m_ann         11  bathy_mean   110.48
[1] "External percent deviance explained"
[1] -3.873646

[1] "TPR"
[1] 0.2563679
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.9105987 -0.9175181 0.0118898 0.9886284         -3.873646 0.8297394

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.1865064
Correlation        0.9596064
AUC                0.9980000
Per.Expl          86.5462763
cvDeviance         0.4570929
cvCorrelation      0.8569081
cvAUC              0.9661000
cvPer.Expl        67.0274031
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    27.3057785
temp_mean     19.7302461
AGI_250m_seas 10.1427791
AGI_0m         6.7539420
uostr_mean     5.0843977
sal_mean       3.7797233
AGI_0m_seas    3.4398070
AGI_250m_ann   3.0656614
AGI_250m       2.9094675
AGI_60m_seas   2.6130226
chl_mean       2.2934217
AGI_60m_ann    2.2513231
ssh_mean       2.1152245
AGI_60m        1.3693592
bathy_sd       1.3626378
vostr_mean     1.2977303
AGI_0m_ann     1.1555551
uo_mean        1.0380480
vo_mean        0.9418376
mld_mean       0.9201980
pred_var       0.4298396
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          12        AGI_0m          2   temp_mean  3310.95
2          20   AGI_60m_ann         10  bathy_mean   357.98
3          19    AGI_0m_ann         16 AGI_0m_seas   305.39
4          16   AGI_0m_seas          6     vo_mean   284.87
5          10    bathy_mean          3    sal_mean   213.26
6          18 AGI_250m_seas          2   temp_mean   183.13
7          18 AGI_250m_seas         10  bathy_mean   172.38
8          12        AGI_0m          5  uostr_mean   170.11
9          19    AGI_0m_ann         10  bathy_mean   160.66
10          4       uo_mean          2   temp_mean   149.15
11         20   AGI_60m_ann         16 AGI_0m_seas   137.50
12         16   AGI_0m_seas         13     AGI_60m   136.91
13         12        AGI_0m         10  bathy_mean   134.92
14         10    bathy_mean          2   temp_mean   129.71
15         21  AGI_250m_ann          3    sal_mean   113.98
16         10    bathy_mean          4     uo_mean   111.18
17         19    AGI_0m_ann         11    bathy_sd   109.09
18         12        AGI_0m          3    sal_mean   106.63
19          5    uostr_mean          2   temp_mean    94.19
20         12        AGI_0m          8    ssh_mean    86.97
21         18 AGI_250m_seas         14    AGI_250m    85.14
22         13       AGI_60m         10  bathy_mean    83.26
[1] "External percent deviance explained"
[1] -4.308013

[1] "TPR"
[1] 0.254485
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8350 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained  PseudoR2
1 0.9259021 -0.9373609 0.00797838 0.9897775         -4.308013 0.8654628

explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.1794387
Correlation        0.9613569
AUC                0.9982000
Per.Expl          87.0561087
cvDeviance         0.4428947
cvCorrelation      0.8609286
cvAUC              0.9681400
cvPer.Expl        68.0515947
[1] "Relative influence of predictor variables"

                 rel.inf
dist_coast    50.6297312
lat            7.4773131
AGI_0m         5.8649772
AGI_60m_ann    4.3063983
bathy_mean     3.7610000
temp_mean      3.3346837
AGI_0m_seas    3.0194786
AGI_250m_seas  3.0002579
sal_mean       2.3172896
chl_mean       2.2989785
AGI_60m_seas   1.9360137
AGI_250m_ann   1.6666857
AGI_250m       1.5038736
ssh_mean       1.3536386
AGI_0m_ann     1.2578651
AGI_60m        1.1737055
vostr_mean     0.9598372
mld_mean       0.8927208
uo_mean        0.8547717
uostr_mean     0.7902823
vo_mean        0.6354592
bathy_sd       0.6025325
pred_var       0.3625061
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          13        AGI_0m          3   temp_mean   868.66
2          13        AGI_0m          1         lat   559.83
3          22   AGI_60m_ann         11  bathy_mean   478.77
4          21    AGI_0m_ann         18 AGI_0m_seas   388.44
5          18   AGI_0m_seas          7     vo_mean   227.09
6          13        AGI_0m         11  bathy_mean   216.84
7          14    dist_coast         10    mld_mean   212.25
8          20 AGI_250m_seas         11  bathy_mean   183.69
9          21    AGI_0m_ann         11  bathy_mean   174.45
10         18   AGI_0m_seas         15     AGI_60m   169.88
11         14    dist_coast          8  vostr_mean   150.05
12          6    uostr_mean          1         lat   144.30
13         22   AGI_60m_ann         18 AGI_0m_seas   105.52
14         15       AGI_60m         11  bathy_mean    78.87
15          3     temp_mean          1         lat    68.14
16         14    dist_coast          1         lat    67.04
17         19  AGI_60m_seas         13      AGI_0m    66.47
18         23  AGI_250m_ann          4    sal_mean    62.44
19         11    bathy_mean          2    chl_mean    62.31
20         11    bathy_mean          3   temp_mean    61.34
21         23  AGI_250m_ann          9    ssh_mean    61.03
22         22   AGI_60m_ann          4    sal_mean    57.74
23         13        AGI_0m          6  uostr_mean    57.31
24         11    bathy_mean          9    ssh_mean    56.58
25         23  AGI_250m_ann         22 AGI_60m_ann    45.98
26         13        AGI_0m          9    ssh_mean    43.15
[1] "External percent deviance explained"
[1] -4.421388

[1] "TPR"
[1] 0.2541551
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8150 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE        Cor     C-index PredRatio DevianceExplained  PseudoR2
1 0.9284584 -0.9393181 0.007349749 0.9900364         -4.421388 0.8705611

Summary table of results

output_sum_seas_ann <- read.csv(here("data/brt/mod_outputs/brt_background_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum_seas_ann)

model	percent_explained	deviance_exp	TPR_mean	TSS	AUC	RMSE	SpearmanCor	PseudoR2
base_0m_Nspat_Ntag	78.734	0.724	0.739	0.870	0.979	0.231	0.888	0.787
base_0m_Nspat_Ytag	92.976	0.876	0.761	0.961	0.994	0.141	0.960	0.930
base_0m_Yspat_Ytag	93.544	0.887	0.770	0.964	0.995	0.125	0.963	0.935
do_0m_Nspat_Ytag	94.201	0.901	0.772	0.971	0.996	0.124	0.969	0.942
do_0m_Yspat_Ytag	95.618	0.920	0.788	0.977	0.997	0.110	0.976	0.956
do_0m_60m_Nspat_Ytag	94.865	0.908	0.775	0.973	0.997	0.119	0.972	0.949
do_0m_250m_Nspat_Ytag	95.069	0.909	0.783	0.974	0.996	0.119	0.972	0.951
do_0m_60m_250m_Nspat_Ytag	95.132	0.913	0.783	0.976	0.997	0.116	0.973	0.951
do_0m_60m_250m_Yspat_Ytag	95.186	0.918	0.784	0.977	0.997	0.113	0.975	0.952
agi_0m_Nspat_Ytag	93.845	0.901	0.765	0.971	0.997	0.124	0.970	0.938
agi_0m_Yspat_Ytag	94.754	0.916	0.776	0.975	0.998	0.114	0.974	0.948
agi_0m_60m_Nspat_Ytag	94.548	0.908	0.765	0.973	0.997	0.119	0.972	0.945
agi_0m_250m_Nspat_Ytag	93.059	0.897	0.767	0.967	0.997	0.129	0.967	0.931
agi_0m_60m_250m_Nspat_Ytag	94.111	0.907	0.767	0.972	0.997	0.122	0.971	0.941
agi_0m_60m_250m_Yspat_Ytag	95.406	0.920	0.777	0.976	0.998	0.111	0.975	0.954
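
For reference, the TSS column above is conventionally defined as sensitivity + specificity - 1. The sketch below shows that calculation on toy data. The function name `calc_tss` and the 0.5 threshold are assumptions for illustration; `explore_brt()` may choose its threshold differently.

```r
# Minimal sketch: true skill statistic (TSS) from observed 0/1 labels and
# predicted probabilities. The 0.5 threshold is an assumption, not
# necessarily the one used inside explore_brt().
calc_tss <- function(obs, pred, threshold = 0.5) {
  pred_class <- as.integer(pred >= threshold)
  tp <- sum(pred_class == 1 & obs == 1)  # true positives
  fn <- sum(pred_class == 0 & obs == 1)  # false negatives
  tn <- sum(pred_class == 0 & obs == 0)  # true negatives
  fp <- sum(pred_class == 1 & obs == 0)  # false positives
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  sensitivity + specificity - 1
}

# Toy values for illustration only (not taken from the mako data)
obs  <- c(1, 1, 1, 0, 0, 0)
pred <- c(0.9, 0.8, 0.3, 0.2, 0.6, 0.1)
calc_tss(obs, pred)
```

TSS ranges from -1 to 1, with values near 0 indicating no better than random discrimination at the chosen threshold.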

output_sum_seas_ann_Nspat <- output_sum_seas_ann %>% 
  filter(!grepl("Yspat", model))

ggplot(output_sum_seas_ann_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50',
                  max.overlaps = 20,
                  label.size = 0.5)

Conclusions from initial seasonal/annual models

Seasonal and annual base models performed better than the daily resolution base models, with the annual base model performing better than the seasonal one.
The DO and AGI models that included all depth layers and temporal resolutions were by far the best performing and had nearly identical scores across evaluation metrics. Models that also included spatial predictors performed slightly better than those without, though the two sets were fairly comparable.
For the DO model with all temporal resolutions, the predictors with the highest relative influence were bathymetry and daily DO at 0 m (o2_mean_0m). The next most influential predictors, with considerably lower values, were seasonal DO at 250 m (o2_mean_250m_seas) and seasonal DO at 0 m (o2_mean_0m_seas). Partial plots follow similar trends as previously described.
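For the AGI model with all temporal resolutions (without spatial predictors), bathymetry (27.3%) and temperature (19.7%) were the two predictors with the highest relative influence. The next most influential predictors, with considerably lower values, were seasonal AGI at 250 m (AGI_250m_seas, 10.1%) and daily AGI at 0 m (AGI_0m, 6.8%); seasonal AGI at 0 m (AGI_0m_seas) ranked lower at 3.4%.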
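
As a check on how to read the performance tables above, the Per.Expl and cvPer.Expl rows are consistent with the percent reduction in deviance relative to Total.Deviance. The sketch below reproduces both values for the seasonal AGI model without spatial predictors; the helper name `pct_dev_explained` is an assumption for illustration and is not part of `explore_brt()`.

```r
# Minimal sketch: percent deviance explained relative to the total (null)
# deviance. The helper name is hypothetical.
pct_dev_explained <- function(total_dev, model_dev) {
  100 * (total_dev - model_dev) / total_dev
}

# Values copied from the seasonal AGI model output (Nspat, Ntag)
total_dev    <- 1.3862811
residual_dev <- 0.2421421
cv_dev       <- 0.5074723

pct_dev_explained(total_dev, residual_dev)  # compare with Per.Expl   (82.53)
pct_dev_explained(total_dev, cv_dev)        # compare with cvPer.Expl (63.39)
```

The gap between the two values (roughly 19 percentage points here) gives a quick sense of how much the training fit exceeds the cross-validated fit.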

Model fine-tuning and selection

Here, I take the two best-performing models from the sections above (the AGI and DO models with all depths and temporal resolutions, without tag ID or spatial variables as predictors) and use them as overfit reference models. The model options that follow exclude the wind predictors, as these consistently had lower relative importance than the random predictor variable (pred_var). I also include a combination model that uses AGI at 250 m and DO at 0 m across temporal resolutions. Lastly, the final models also drop DO/AGI at 60 m and at the seasonal resolution, as these were typically the variables with the lowest predictive performance relative to the other depth layers and resolutions.
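
The screening against the random predictor described above can be sketched as follows. This assumes a relative-influence table with columns `var` and `rel.inf`, as returned by `summary()` on a `gbm` object; the function name `flag_below_random` and the toy values are hypothetical and are not taken from the fitted models.

```r
# Minimal sketch: list predictors whose relative influence falls below that
# of the random predictor. Function name and toy data are hypothetical.
flag_below_random <- function(rel_inf, random_var = "pred_var") {
  stopifnot(random_var %in% rel_inf$var)
  cutoff <- rel_inf$rel.inf[rel_inf$var == random_var]
  as.character(rel_inf$var[rel_inf$rel.inf < cutoff])
}

# Toy values for illustration only (not from the fitted BRTs)
toy_rel_inf <- data.frame(
  var     = c("bathy_mean", "temp_mean", "pred_var", "x_noise"),
  rel.inf = c(60, 30, 6, 4),
  stringsAsFactors = FALSE
)

flag_below_random(toy_rel_inf)
```

Predictors returned by this check are candidates for removal, since they contribute less to the model than a variable known to carry no signal.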

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862910
Residual.Deviance  0.3084231
Correlation        0.9196032
AUC                0.9905000
Per.Expl          77.7519251
cvDeviance         0.6001754
cvCorrelation      0.7986359
cvAUC              0.9450200
cvPer.Expl        56.7063877
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 39.840307
temp_mean  26.480383
sal_mean    8.651181
ssh_mean    8.204863
chl_mean    6.841676
bathy_sd    4.361221
mld_mean    3.560571
pred_var    2.059798
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          6 bathy_mean          4   ssh_mean  1488.69
2          6 bathy_mean          2  temp_mean  1321.98
3          6 bathy_mean          3   sal_mean  1048.20
[1] "External percent deviance explained"
[1] 0.731709

[1] "TPR"
[1] 0.740062
[1] "TSS"
[1] 0.8763352
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4750 iterations were performed.
There were 8 predictors of which 8 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2267729 0.8935136 0.9811617 0.9938523          0.731709 0.7775193

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.1909322
Correlation        0.9569687
AUC                0.9975000
Per.Expl          86.2271541
cvDeviance         0.4463350
cvCorrelation      0.8600541
cvAUC              0.9682600
cvPer.Expl        67.8037354
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        26.2112221
o2_mean_0m        21.2023536
o2_mean_250m_seas 10.1374648
o2_mean_0m_seas    8.4989877
o2_mean_60m_seas   5.2518224
o2_mean_250m_ann   4.0182405
o2_mean_0m_ann     3.4428506
chl_mean           2.9180736
temp_mean          2.7140167
o2_mean_250m       2.6253593
ssh_mean           2.6104859
sal_mean           2.4001687
o2_mean_60m_ann    2.3289369
o2_mean_60m        2.3143331
bathy_sd           1.3897784
mld_mean           1.2730290
pred_var           0.6628767
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index       var2.names int.size
1           3        temp_mean          1       o2_mean_0m   780.03
2          16  o2_mean_60m_ann          7       bathy_mean   417.55
3          13 o2_mean_60m_seas          4         sal_mean   365.89
4           7       bathy_mean          2         chl_mean   256.48
5           4         sal_mean          1       o2_mean_0m   238.13
6          16  o2_mean_60m_ann         13 o2_mean_60m_seas   224.10
7          16  o2_mean_60m_ann          9      o2_mean_60m   214.13
8          12  o2_mean_0m_seas          3        temp_mean   202.33
9           7       bathy_mean          5         ssh_mean   191.77
10          9      o2_mean_60m          7       bathy_mean   190.11
11          2         chl_mean          1       o2_mean_0m   171.76
12          5         ssh_mean          4         sal_mean   164.64
13         16  o2_mean_60m_ann          3        temp_mean   158.76
14          7       bathy_mean          3        temp_mean   143.24
[1] "External percent deviance explained"
[1] 0.8138392

[1] "TPR"
[1] 0.7447363
[1] "TSS"
[1] 0.9235923
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.182767 0.9320937 0.9901405 0.9997562         0.8138392 0.8622715

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.1804510
Correlation        0.9611911
AUC                0.9978000
Per.Expl          86.9832054
cvDeviance         0.4441876
cvCorrelation      0.8615975
cvAUC              0.9680800
cvPer.Expl        67.9586240
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    30.0158951
temp_mean     22.6707636
AGI_250m_seas  9.1916259
AGI_0m         6.6966664
AGI_0m_seas    3.8422741
sal_mean       3.5425059
AGI_250m_ann   3.2096436
ssh_mean       3.1559174
AGI_60m_seas   2.9237078
AGI_60m_ann    2.8613663
chl_mean       2.6961486
AGI_250m       2.6213203
AGI_60m        1.6572538
AGI_0m_ann     1.6292092
bathy_sd       1.6259866
mld_mean       1.0917162
pred_var       0.5679992
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  4282.43
2          15    AGI_0m_ann          6  bathy_mean   498.47
3          16   AGI_60m_ann          6  bathy_mean   428.77
4          15    AGI_0m_ann         12 AGI_0m_seas   302.09
5           6    bathy_mean          3    sal_mean   265.89
6           3      sal_mean          2   temp_mean   254.57
7          12   AGI_0m_seas          9     AGI_60m   224.51
8          14 AGI_250m_seas          2   temp_mean   215.85
9           8        AGI_0m          4    ssh_mean   210.70
10         12   AGI_0m_seas          2   temp_mean   189.86
11         16   AGI_60m_ann         12 AGI_0m_seas   180.44
12          6    bathy_mean          2   temp_mean   138.32
13          9       AGI_60m          6  bathy_mean   130.20
14         17  AGI_250m_ann          3    sal_mean   123.95
[1] "External percent deviance explained"
[1] 0.8308918

[1] "TPR"
[1] 0.7459277
[1] "TSS"
[1] 0.9328179
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1727172 0.9399568 0.9925128 0.9983244         0.8308918 0.8698321

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_250_DO_0_dail_seas_ann.rds",
            test_data = readRDS(here("data/brt/mod_eval/back/agi_do_test_daily_seasonal_annual.rds")))

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.3060410
Correlation        0.9199450
AUC                0.9904000
Per.Expl          77.9237999
cvDeviance         0.5653657
cvCorrelation      0.8125080
cvAUC              0.9506300
cvPer.Expl        59.2174651
[1] "Relative influence of predictor variables"

                   rel.inf
bathy_mean      32.7670441
temp_mean       23.3647635
AGI_250m_seas   12.4431526
ssh_mean         5.8816844
sal_mean         5.3796841
chl_mean         4.5083168
AGI_250m_ann     4.4835903
AGI_250m         3.0054229
bathy_sd         2.2749334
mld_mean         1.9481681
pred_var         1.1300635
o2_mean_0m_seas  0.9638811
o2_mean_0m       0.9352572
o2_mean_0m_ann   0.9140382
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index    var2.names int.size
1          10 AGI_250m_seas          3      sal_mean   566.08
2           6    bathy_mean          3      sal_mean   463.43
3          10 AGI_250m_seas          8      AGI_250m   362.22
4           6    bathy_mean          2     temp_mean   352.20
5           4      ssh_mean          3      sal_mean   297.74
6           3      sal_mean          2     temp_mean   283.95
7           2     temp_mean          1      chl_mean   272.89
8          10 AGI_250m_seas          6    bathy_mean   255.92
9          10 AGI_250m_seas          2     temp_mean   254.76
10         11  AGI_250m_ann         10 AGI_250m_seas   219.10
[1] "External percent deviance explained"
[1] 0.7119036

[1] "TPR"
[1] 0.73811
[1] "TSS"
[1] 0.8507341
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2394772 0.8793008 0.9773403 0.9982518         0.7119036 0.779238

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.2038413
Correlation        0.9528997
AUC                0.9971000
Per.Expl          85.2959607
cvDeviance         0.4624112
cvCorrelation      0.8532735
cvAUC              0.9663100
cvPer.Expl        66.6440829
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        27.1431098
o2_mean_0m        21.2576581
o2_mean_250m_seas 14.6524628
o2_mean_0m_seas   10.3343085
o2_mean_250m_ann   3.7807478
chl_mean           3.5499333
temp_mean          3.4191758
o2_mean_250m       3.3145643
o2_mean_0m_ann     2.9473000
sal_mean           2.8866095
ssh_mean           2.6457934
bathy_sd           1.7336484
mld_mean           1.5223918
pred_var           0.8122965
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index var2.names int.size
1           3         temp_mean          1 o2_mean_0m   550.04
2          12 o2_mean_250m_seas          4   sal_mean   330.52
3           5          ssh_mean          4   sal_mean   277.30
4           7        bathy_mean          2   chl_mean   228.67
5          13    o2_mean_0m_ann          3  temp_mean   217.07
6           7        bathy_mean          3  temp_mean   200.66
7          11   o2_mean_0m_seas          5   ssh_mean   200.53
8           2          chl_mean          1 o2_mean_0m   197.61
9           7        bathy_mean          5   ssh_mean   180.78
10         14  o2_mean_250m_ann          7 bathy_mean   176.73
[1] "External percent deviance explained"
[1] 0.8042935

[1] "TPR"
[1] 0.7442145
[1] "TSS"
[1] 0.9193297
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1882474 0.9277648 0.9891198 0.9997909         0.8042935 0.8529596
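
The Per.Expl and cvPer.Expl values printed for each model can be reproduced from the deviance rows above them. The sketch below assumes that percent deviance explained is defined as one minus the ratio of model deviance to total (null) deviance; that definition matches the printed numbers, but the internals of `explore_brt()` are not shown here, so treat it as an illustration rather than the function's actual code. The helper names `pct_dev_explained` and `null_dev_bernoulli` are hypothetical.

```r
# Values copied from the "Model performance metrics" block of
# brt_do_0m_250m_dail_seas_ann
total_dev <- 1.3862943
resid_dev <- 0.2038413
cv_dev    <- 0.4624112

# Assumed definition: share of the null deviance removed by the model, in percent
pct_dev_explained <- function(dev, null_dev) (1 - dev / null_dev) * 100

per_expl    <- pct_dev_explained(resid_dev, total_dev)  # training fit
cv_per_expl <- pct_dev_explained(cv_dev, total_dev)     # cross-validated fit

# Mean null deviance of an intercept-only Bernoulli model with prevalence p
null_dev_bernoulli <- function(p) -2 * (p * log(p) + (1 - p) * log(1 - p))

print(c(Per.Expl = per_expl, cvPer.Expl = cv_per_expl))
print(null_dev_bernoulli(0.5))  # equals 2 * log(2)
```

The total deviance of 1.3862943 equals 2 * log(2), which is the value expected when presences and pseudo-absences are balanced 1:1. The gap between the training and cross-validated values (roughly 85% versus 67%) is the more conservative guide to how much the model generalizes.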

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.2049642
Correlation        0.9527920
AUC                0.9969000
Per.Expl          85.2149594
cvDeviance         0.4668691
cvCorrelation      0.8528284
cvAUC              0.9654000
cvPer.Expl        66.3225079
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m       29.2528201
bathy_mean       27.2139698
o2_mean_250m_ann 10.3488658
o2_mean_60m_ann   5.1891939
o2_mean_60m       4.3862218
o2_mean_250m      4.1055387
o2_mean_0m_ann    3.5061360
chl_mean          3.3512300
temp_mean         3.2633686
ssh_mean          2.7930406
sal_mean          2.7415436
bathy_sd          1.5875312
mld_mean          1.4668771
pred_var          0.7936627
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index      var1.names var2.index  var2.names int.size
1           3       temp_mean          1  o2_mean_0m   868.81
2          13 o2_mean_60m_ann          9 o2_mean_60m   616.51
3          13 o2_mean_60m_ann          7  bathy_mean   508.10
4           2        chl_mean          1  o2_mean_0m   331.31
5           7      bathy_mean          5    ssh_mean   225.81
6           7      bathy_mean          2    chl_mean   202.56
7           4        sal_mean          1  o2_mean_0m   189.29
8           9     o2_mean_60m          7  bathy_mean   174.66
9           5        ssh_mean          4    sal_mean   147.68
10          5        ssh_mean          1  o2_mean_0m   141.09
[1] "External percent deviance explained"
[1] 0.8031895

[1] "TPR"
[1] 0.7442169
[1] "TSS"
[1] 0.9181552
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8500 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1884739 0.927647 0.9890864  1.001509         0.8031895 0.8521496
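
A note on reading the TPR and TSS lines: at any single threshold, TSS = TPR + TNR - 1, so TSS can never exceed TPR. Because the printed TPR (about 0.74) is lower than the printed TSS (about 0.92), the two values cannot come from the same threshold. One plausible reading, suggested by the `TPR_mean` column name in the summary table, is that TPR is averaged across thresholds while TSS is taken at its maximum. The sketch below illustrates that reading on hypothetical toy data; `tss_at` is an illustrative helper and not part of the analysis code.

```r
# Hypothetical toy data: 1 = presence, 0 = pseudo-absence
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.05)

# True positive rate, true negative rate, and TSS at one threshold
tss_at <- function(obs, pred, threshold) {
  pred_class <- as.integer(pred >= threshold)
  tpr <- sum(pred_class == 1 & obs == 1) / sum(obs == 1)
  tnr <- sum(pred_class == 0 & obs == 0) / sum(obs == 0)
  c(threshold = threshold, TPR = tpr, TNR = tnr, TSS = tpr + tnr - 1)
}

# Evaluate a grid of thresholds and collect the results in a data frame
thresholds <- seq(0.05, 0.95, by = 0.05)
res <- as.data.frame(t(sapply(thresholds, function(th) tss_at(obs, pred, th))))

max_tss  <- max(res$TSS)   # TSS at the best threshold
mean_tpr <- mean(res$TPR)  # TPR averaged over all thresholds

print(res[which.max(res$TSS), ])
print(c(max_TSS = max_tss, mean_TPR = mean_tpr))
```

If the reported metrics were computed differently (for example at a fixed threshold), the TPR and TSS columns are not directly comparable and the definitions should be stated next to the table.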

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.2292116
Correlation        0.9440405
AUC                0.9955000
Per.Expl          83.4658799
cvDeviance         0.4740094
cvCorrelation      0.8484897
cvAUC              0.9646700
cvPer.Expl        65.8074456
[1] "Relative influence of predictor variables"

                     rel.inf
o2_mean_0m_seas   27.9505776
bathy_mean        26.9351820
o2_mean_250m_seas  9.8844047
o2_mean_60m_seas   6.3680630
o2_mean_250m_ann   5.0995438
o2_mean_0m_ann     3.8092643
ssh_mean           3.7296212
chl_mean           3.3491603
o2_mean_60m_ann    3.2545759
temp_mean          3.2487986
sal_mean           2.7231927
mld_mean           1.4744168
bathy_sd           1.4443932
pred_var           0.7288057
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index       var2.names int.size
1          10  o2_mean_60m_seas          3         sal_mean   479.91
2          13   o2_mean_60m_ann          2        temp_mean   478.20
3          13   o2_mean_60m_ann          6       bathy_mean   418.57
4          13   o2_mean_60m_ann         10 o2_mean_60m_seas   254.04
5          12    o2_mean_0m_ann          3         sal_mean   220.61
6           9   o2_mean_0m_seas          4         ssh_mean   220.06
7          12    o2_mean_0m_ann          2        temp_mean   194.82
8           6        bathy_mean          4         ssh_mean   187.13
9          11 o2_mean_250m_seas         10 o2_mean_60m_seas   179.16
10         13   o2_mean_60m_ann         12   o2_mean_0m_ann   167.38
[1] "External percent deviance explained"
[1] 0.7919195

[1] "TPR"
[1] 0.7437238
[1] "TSS"
[1] 0.9095314
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1958424 0.9215876 0.9881686 0.9957744         0.7919195 0.8346588

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.2448288
Correlation        0.9388358
AUC                0.9943000
Per.Expl          82.3393335
cvDeviance         0.4840649
cvCorrelation      0.8454748
cvAUC              0.9630700
cvPer.Expl        65.0820953
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m       29.2796312
bathy_mean       28.8141343
o2_mean_250m_ann 13.4961708
o2_mean_250m      5.7919607
o2_mean_0m_ann    4.5850995
temp_mean         3.9269822
chl_mean          3.6838678
sal_mean          3.1729942
ssh_mean          2.9109508
bathy_sd          1.9790000
mld_mean          1.4918958
pred_var          0.8673126
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index     var1.names var2.index var2.names int.size
1          3      temp_mean          1 o2_mean_0m   944.89
2          2       chl_mean          1 o2_mean_0m   382.19
3         11 o2_mean_0m_ann          3  temp_mean   371.49
4          5       ssh_mean          4   sal_mean   265.73
5          9   o2_mean_250m          4   sal_mean   251.19
6          7     bathy_mean          2   chl_mean   230.35
7          7     bathy_mean          5   ssh_mean   220.35
[1] "External percent deviance explained"
[1] 0.778455

[1] "TPR"
[1] 0.7426068
[1] "TSS"
[1] 0.9045105
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2031527 0.9150966 0.9860506  1.001674          0.778455 0.8233933
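
The random `pred_var` predictor gives a simple screening rule: a covariate whose relative influence does not exceed that of random noise is a candidate for removal. The sketch below applies that rule to the relative influence values printed above for brt_do_0m_250m_dail_ann. The rule itself (strictly greater than `pred_var`) is an assumption about how the screening might be done, not the criterion actually used to build the refined models.

```r
# Relative influence values copied from the brt_do_0m_250m_dail_ann output
rel_inf <- c(
  o2_mean_0m = 29.2796312, bathy_mean = 28.8141343, o2_mean_250m_ann = 13.4961708,
  o2_mean_250m = 5.7919607, o2_mean_0m_ann = 4.5850995, temp_mean = 3.9269822,
  chl_mean = 3.6838678, sal_mean = 3.1729942, ssh_mean = 2.9109508,
  bathy_sd = 1.9790000, mld_mean = 1.4918958, pred_var = 0.8673126
)

# Influence of the random predictor serves as the noise baseline
baseline <- rel_inf[["pred_var"]]

# Assumed rule: keep covariates that beat the random predictor
keep <- names(rel_inf)[rel_inf > baseline]
drop <- setdiff(names(rel_inf), c(keep, "pred_var"))

print(keep)
print(drop)  # empty here: every covariate outranks pred_var
```

In every model in this section `pred_var` ranks last, so this rule alone would not remove any covariate. The refined models evidently drop `o2_mean_250m` and `o2_mean_0m_ann`, which implies a stricter or different criterion than the one sketched here.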

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_ann_refined.rds",
            test_data = do_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.2398020
Correlation        0.9406621
AUC                0.9948000
Per.Expl          82.7019431
cvDeviance         0.4915797
cvCorrelation      0.8422300
cvAUC              0.9620000
cvPer.Expl        64.5400167
[1] "Relative influence of predictor variables"

                   rel.inf
o2_mean_0m       31.140330
bathy_mean       28.749425
o2_mean_250m_ann 19.411834
temp_mean         4.579720
chl_mean          4.062427
sal_mean          3.623802
ssh_mean          3.214311
bathy_sd          2.324424
mld_mean          1.777783
pred_var          1.115944
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          3  temp_mean          1 o2_mean_0m  1044.00
2          2   chl_mean          1 o2_mean_0m   519.00
3          5   ssh_mean          4   sal_mean   363.68
4          7 bathy_mean          3  temp_mean   352.14
5          7 bathy_mean          5   ssh_mean   333.50
[1] "External percent deviance explained"
[1] 0.7801465

[1] "TPR"
[1] 0.7428095
[1] "TSS"
[1] 0.9019239
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8050 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2024326 0.9157098 0.9864209  1.001757         0.7801465 0.8270194

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.2010998
Correlation        0.9542854
AUC                0.9970000
Per.Expl          85.4936623
cvDeviance         0.4550922
cvCorrelation      0.8583417
cvAUC              0.9666300
cvPer.Expl        67.1719242
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean    30.944500
temp_mean     20.937253
AGI_250m_seas 10.045099
AGI_0m         7.256943
ssh_mean       6.121208
AGI_250m_ann   4.606362
sal_mean       4.514486
AGI_0m_seas    4.190922
chl_mean       3.030145
AGI_250m       2.700179
bathy_sd       2.167797
AGI_0m_ann     2.088657
mld_mean       1.396450
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index  var2.names int.size
1          8        AGI_0m          2   temp_mean  4136.97
2          6    bathy_mean          3    sal_mean   529.40
3          3      sal_mean          2   temp_mean   284.85
4         13  AGI_250m_ann          3    sal_mean   275.46
5         11 AGI_250m_seas          6  bathy_mean   272.17
6          8        AGI_0m          4    ssh_mean   255.06
7         13  AGI_250m_ann         12  AGI_0m_ann   244.55
8         12    AGI_0m_ann         10 AGI_0m_seas   213.62
[1] "External percent deviance explained"
[1] 0.8167617

[1] "TPR"
[1] 0.7449762
[1] "TSS"
[1] 0.9286281
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1797948 0.9348405 0.9907361 0.9974215         0.8167617 0.8549366

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.2032064
Correlation        0.9540643
AUC                0.9968000
Per.Expl          85.3417499
cvDeviance         0.4663864
cvCorrelation      0.8534616
cvAUC              0.9653400
cvPer.Expl        66.3573172
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean   30.5304868
temp_mean    23.4915348
AGI_250m_ann  9.6682724
AGI_0m        8.1928263
AGI_250m      4.5223145
AGI_60m_ann   4.1493352
sal_mean      3.8868656
ssh_mean      3.5482652
chl_mean      3.2514216
AGI_60m       2.3822009
AGI_0m_ann    2.2132672
bathy_sd      2.0657467
mld_mean      1.3576929
pred_var      0.7397698
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1           8       AGI_0m          2  temp_mean  6156.07
2          12   AGI_0m_ann          6 bathy_mean   479.59
3           6   bathy_mean          3   sal_mean   425.20
4          13  AGI_60m_ann          6 bathy_mean   339.23
5           8       AGI_0m          6 bathy_mean   242.36
6           3     sal_mean          2  temp_mean   206.38
7           6   bathy_mean          2  temp_mean   198.40
8           9      AGI_60m          6 bathy_mean   162.81
9           8       AGI_0m          4   ssh_mean   153.26
10         14 AGI_250m_ann          6 bathy_mean   147.24
[1] "External percent deviance explained"
[1] 0.8166798

[1] "TPR"
[1] 0.7452172
[1] "TSS"
[1] 0.9252425
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8900 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1810393 0.9338172 0.9912345 0.9977871         0.8166798 0.8534175

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.2074970
Correlation        0.9534637
AUC                0.9970000
Per.Expl          85.0322475
cvDeviance         0.4828283
cvCorrelation      0.8458918
cvAUC              0.9632300
cvPer.Expl        65.1712825
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    30.5184692
temp_mean     23.3377420
AGI_250m_seas 10.3276529
AGI_250m_ann   5.5760228
AGI_0m_seas    5.3250063
sal_mean       4.6047463
AGI_60m_seas   4.1155872
chl_mean       3.4446687
AGI_60m_ann    3.1418888
ssh_mean       3.0303836
AGI_0m_ann     2.1425244
bathy_sd       1.9534824
mld_mean       1.6448171
pred_var       0.8370084
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index    var2.names int.size
1           6    bathy_mean          3      sal_mean   711.68
2          12    AGI_0m_ann          6    bathy_mean   407.71
3          13   AGI_60m_ann          6    bathy_mean   383.91
4           3      sal_mean          2     temp_mean   325.84
5          13   AGI_60m_ann          9   AGI_0m_seas   286.42
6          10  AGI_60m_seas          2     temp_mean   250.03
7           9   AGI_0m_seas          1      chl_mean   247.46
8          14  AGI_250m_ann         11 AGI_250m_seas   235.08
9          11 AGI_250m_seas          2     temp_mean   227.58
10         12    AGI_0m_ann          9   AGI_0m_seas   189.43
[1] "External percent deviance explained"
[1] 0.8063937

[1] "TPR"
[1] 0.7446677
[1] "TSS"
[1] 0.9211093
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9350 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1861977 0.929935 0.9901722 0.9965861         0.8063937 0.8503225

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.2335348
Correlation        0.9436237
AUC                0.9953000
Per.Expl          83.1540136
cvDeviance         0.4849331
cvCorrelation      0.8459238
cvAUC              0.9627400
cvPer.Expl        65.0194514
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean   32.5059991
temp_mean    23.2340032
AGI_250m_ann 10.6427249
AGI_0m        8.6595033
ssh_mean      5.0994310
sal_mean      4.4591895
AGI_250m      4.4112764
chl_mean      3.6254212
AGI_0m_ann    2.6455869
bathy_sd      2.4434493
mld_mean      1.3756022
pred_var      0.8978129
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index   var1.names var2.index var2.names int.size
1          8       AGI_0m          2  temp_mean  6122.61
2          6   bathy_mean          3   sal_mean   400.07
3         11   AGI_0m_ann          2  temp_mean   290.87
4         11   AGI_0m_ann          6 bathy_mean   235.60
5         12 AGI_250m_ann         11 AGI_0m_ann   218.64
6          8       AGI_0m          4   ssh_mean   217.18
7          6   bathy_mean          2  temp_mean   206.81
[1] "External percent deviance explained"
[1] 0.7947049

[1] "TPR"
[1] 0.7441577
[1] "TSS"
[1] 0.9103937
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8050 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1946114 0.9229508 0.9891775 0.9984766         0.7947049 0.8315401

explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_ann_refined.rds",
            test_data = agi_test_daily_seasonal_annual)

[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.2609379
Correlation        0.9333431
AUC                0.9931000
Per.Expl          81.1773004
cvDeviance         0.5019193
cvCorrelation      0.8391199
cvAUC              0.9603500
cvPer.Expl        63.7941598
[1] "Relative influence of predictor variables"

               rel.inf
bathy_mean   33.849922
temp_mean    22.651298
AGI_250m_ann 14.888652
AGI_0m        9.378249
ssh_mean      5.130144
sal_mean      5.086498
chl_mean      3.946693
bathy_sd      2.378719
mld_mean      1.594325
pred_var      1.095502
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index   var1.names var2.index var2.names int.size
1          8       AGI_0m          2  temp_mean  6464.78
2          6   bathy_mean          3   sal_mean   374.16
3         10 AGI_250m_ann          2  temp_mean   326.45
4         10 AGI_250m_ann          3   sal_mean   289.62
5          6   bathy_mean          2  temp_mean   284.01
[1] "External percent deviance explained"
[1] 0.7778535

[1] "TPR"
[1] 0.7429219
[1] "TSS"
[1] 0.9018982
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7500 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2038216 0.9148892 0.9868442 0.9964849         0.7778535 0.811773

Summary table of results

output_sum_refined <- read.csv(here("data/brt/mod_outputs/brt_bckg_refined_output_summary.csv"))
kableExtra::kable(output_sum_refined)

model	percent_explained	deviance_exp	TPR_mean	TSS	AUC	RMSE	SpearmanCor	PseudoR2
brt_base_0m_dail_no_wind	77.752	0.732	0.740	0.876	0.981	0.227	0.894	0.778
brt_do_0m_60m_250m_dail_seas_ann_no_wind	86.227	0.814	0.745	0.924	0.990	0.183	0.932	0.862
brt_agi_0m_60m_250m_dail_seas_ann_no_wind	86.983	0.831	0.746	0.933	0.993	0.173	0.939	0.869
brt_agi_250_do_0_dail_seas_ann	77.924	0.712	0.738	0.851	0.977	0.239	0.879	0.779
brt_do_0m_250m_dail_seas_ann	85.296	0.804	0.744	0.919	0.989	0.188	0.928	0.853
brt_do_0m_60m_250m_dail_ann	85.215	0.803	0.744	0.918	0.989	0.188	0.928	0.852
brt_do_0m_60m_250m_seas_ann	83.466	0.792	0.744	0.910	0.988	0.196	0.922	0.835
brt_do_0m_250m_dail_ann	82.339	0.778	0.743	0.905	0.986	0.203	0.915	0.823
brt_do_0m_250m_dail_ann_refined	82.702	0.780	0.743	0.902	0.986	0.202	0.916	0.827
brt_agi_0m_250m_dail_seas_ann	85.494	0.817	0.745	0.929	0.991	0.180	0.935	0.855
brt_agi_0m_60m_250m_dail_ann	85.342	0.817	0.745	0.925	0.991	0.181	0.934	0.853
brt_agi_0m_60m_250m_seas_ann	85.032	0.806	0.745	0.921	0.990	0.186	0.930	0.850
brt_agi_0m_250m_dail_ann	83.154	0.795	0.744	0.910	0.989	0.195	0.923	0.832
brt_agi_0m_250m_dail_ann_refined	81.177	0.778	0.743	0.902	0.987	0.204	0.915	0.812
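
To relate the table to H1, it can help to rank the models and to compare each AGI model with its DO counterpart directly. The sketch below does this for a subset of rows copied from the table above; the pairing by swapping `_agi_` for `_do_` in the model name is an assumption that works for these file names, and the object names (`summ`, `paired`) are illustrative.

```r
# Subset of the summary table, values copied from the rows above
summ <- data.frame(
  model = c("brt_base_0m_dail_no_wind",
            "brt_do_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_agi_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_do_0m_250m_dail_seas_ann",
            "brt_agi_0m_250m_dail_seas_ann"),
  deviance_exp = c(0.732, 0.814, 0.831, 0.804, 0.817),
  TSS          = c(0.876, 0.924, 0.933, 0.919, 0.929),
  stringsAsFactors = FALSE
)

# Rank models from highest to lowest TSS
summ <- summ[order(-summ$TSS), ]

# Pair each AGI model with the DO model that has the same covariate structure
agi <- summ[grepl("_agi_", summ$model), ]
agi$do_model <- sub("_agi_", "_do_", agi$model)
paired <- merge(agi, summ, by.x = "do_model", by.y = "model",
                suffixes = c("_agi", "_do"))

# Positive differences favor the AGI model
paired$TSS_diff <- paired$TSS_agi - paired$TSS_do
paired$dev_diff <- paired$deviance_exp_agi - paired$deviance_exp_do

print(summ)
print(paired[, c("model", "do_model", "TSS_diff", "dev_diff")])
```

For these rows the AGI models score higher than their DO counterparts, and both outperform the base model, but the AGI versus DO differences are on the order of 0.01. The table carries no uncertainty estimate, so differences that small should be read with caution.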

# Compare models on test-split AUC and TSS; point color shows external deviance explained
ggplot(output_sum_refined, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  # The label aesthetic is inherited from ggplot(); labels are repelled to limit overlap
  ggrepel::geom_label_repel(box.padding   = 0.35,
                  point.padding = 0.5,
                  segment.color = 'grey50',
                  max.overlaps = 20,
                  label.size = 0.5)
