Mako hSDM BRT preliminary results

Author

Emily Nazario

Published

August 20, 2024

Study hypotheses and objectives

H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Model summaries

explore_brt(mod_file_path = brt_outputs_crw[7], 
            test_data = base_test_daily_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.7986447
Correlation        0.7174286
AUC                0.9148000
Per.Expl          42.3894599
cvDeviance         1.0127285
cvCorrelation      0.5642493
cvAUC              0.8220800
cvPer.Expl        26.9464423
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 27.367481
temp_mean  18.477169
sal_mean   10.540214
chl_mean    8.710175
ssh_mean    6.015610
mld_mean    5.861958
vostr_mean  5.303627
bathy_sd    5.270798
vo_mean     3.496298
uo_mean     3.392592
uostr_mean  2.849782
pred_var    2.714296
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   363.44
2          6    vo_mean          3   sal_mean   193.16
3         12   pred_var          4    uo_mean   168.67
4          8   ssh_mean          2  temp_mean   162.10
5          2  temp_mean          1   chl_mean   129.68
6         10 bathy_mean          8   ssh_mean   110.34
7          8   ssh_mean          1   chl_mean    98.98
[1] "External percent deviance explained"
[1] 0.3850404

[1] "TPR"
[1] 0.6952086
[1] "TSS"
[1] 0.6133057
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4150 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3706824 0.6788681 0.8917619  1.003817         0.3850404 0.4238946
explore_brt(mod_file_path = brt_outputs_back[7], 
            test_data = base_test_daily_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.2948092
Correlation        0.9249630
AUC                0.9922000
Per.Expl          78.7336988
cvDeviance         0.5909966
cvCorrelation      0.8025147
cvAUC              0.9464300
cvPer.Expl        57.3679835
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 37.726838
temp_mean  23.806676
sal_mean    7.021355
chl_mean    5.980357
ssh_mean    5.413233
uostr_mean  5.244057
vostr_mean  3.838429
bathy_sd    2.871536
mld_mean    2.581925
uo_mean     2.392186
vo_mean     1.868975
pred_var    1.254433
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   835.60
2         10 bathy_mean          8   ssh_mean   650.18
3          8   ssh_mean          2  temp_mean   556.11
4         10 bathy_mean          3   sal_mean   496.83
5         10 bathy_mean          4    uo_mean   406.37
6          3   sal_mean          2  temp_mean   343.56
7          8   ssh_mean          1   chl_mean   337.30
[1] "External percent deviance explained"
[1] -3.437823

[1] "TPR"
[1] 0.2602644
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE        Cor    C-index PredRatio DevianceExplained PseudoR2
1 0.8898691 -0.8883681 0.01999832 0.9823639         -3.437823 0.787337
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862598
Residual.Deviance  0.1784155
Correlation        0.9620158
AUC                0.9980000
Per.Expl          87.1297181
cvDeviance         0.4443144
cvCorrelation      0.8616943
cvAUC              0.9680100
cvPer.Expl        67.9486949
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean    29.410142
temp_mean     22.090687
AGI_250m_seas  8.885923
AGI_0m         7.036726
AGI_0m_seas    4.031791
sal_mean       3.863634
AGI_250m_ann   3.717057
AGI_250m       3.185755
AGI_60m_ann    3.018782
ssh_mean       2.964206
chl_mean       2.686814
AGI_60m_seas   2.575530
AGI_0m_ann     1.639762
AGI_60m        1.581991
bathy_sd       1.553865
mld_mean       1.073780
pred_var       0.683555
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  3465.55
2          15    AGI_0m_ann         12 AGI_0m_seas   386.42
3           6    bathy_mean          3    sal_mean   359.44
4          16   AGI_60m_ann          6  bathy_mean   297.40
5          16   AGI_60m_ann         12 AGI_0m_seas   290.80
6          17  AGI_250m_ann          3    sal_mean   258.66
7          14 AGI_250m_seas          6  bathy_mean   192.39
8          14 AGI_250m_seas          2   temp_mean   163.51
9           8        AGI_0m          4    ssh_mean   162.57
10          6    bathy_mean          2   temp_mean   154.13
11         14 AGI_250m_seas         10    AGI_250m   153.38
12          3      sal_mean          2   temp_mean   150.76
13         12   AGI_0m_seas          2   temp_mean   123.54
14          8        AGI_0m          3    sal_mean   103.44
[1] "External percent deviance explained"
[1] -2.981096

[1] "TPR"
[1] 0.3431204
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.8144573 -0.5895665 0.1853505 0.6929731         -2.981096 0.8712972
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.5891149
Correlation        0.8163435
AUC                0.9606000
Per.Expl          57.5041190
cvDeviance         0.8440735
cvCorrelation      0.6692730
cvAUC              0.8827300
cvPer.Expl        39.1126493
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.287654
o2_mean_250m_ann  14.631246
o2_mean_0m_seas    8.910687
o2_mean_250m_seas  7.142539
temp_mean          6.627435
o2_mean_60m_ann    5.495030
o2_mean_60m_seas   5.325783
bathy_mean         4.964121
sal_mean           4.592187
o2_mean_250m       4.465235
chl_mean           4.270698
o2_mean_0m_ann     3.305816
o2_mean_60m        3.042965
ssh_mean           2.916029
mld_mean           2.724895
bathy_sd           2.244270
pred_var           2.053409
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   508.25
2          12   o2_mean_0m_seas          1   o2_mean_0m   301.24
3          14 o2_mean_250m_seas         10 o2_mean_250m   166.76
4          12   o2_mean_0m_seas          5     ssh_mean   150.68
5          12   o2_mean_0m_seas          2     chl_mean   122.58
6           4          sal_mean          3    temp_mean   101.22
7          15    o2_mean_0m_ann          4     sal_mean    89.73
8          16   o2_mean_60m_ann          7   bathy_mean    86.63
9           9       o2_mean_60m          7   bathy_mean    81.59
10          9       o2_mean_60m          8     bathy_sd    77.18
11         11          pred_var          7   bathy_mean    66.90
12          7        bathy_mean          3    temp_mean    65.75
13         12   o2_mean_0m_seas          4     sal_mean    63.82
14         13  o2_mean_60m_seas          4     sal_mean    63.74
[1] "External percent deviance explained"
[1] 0.5285492

[1] "TPR"
[1] 0.7203563
[1] "TSS"
[1] 0.7319403
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3166705 0.7799527 0.9421311  1.001803         0.5285492 0.5750412
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862937
Residual.Deviance  0.1804510
Correlation        0.9611911
AUC                0.9978000
Per.Expl          86.9832054
cvDeviance         0.4441876
cvCorrelation      0.8615975
cvAUC              0.9680800
cvPer.Expl        67.9586240
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    30.0158951
temp_mean     22.6707636
AGI_250m_seas  9.1916259
AGI_0m         6.6966664
AGI_0m_seas    3.8422741
sal_mean       3.5425059
AGI_250m_ann   3.2096436
ssh_mean       3.1559174
AGI_60m_seas   2.9237078
AGI_60m_ann    2.8613663
chl_mean       2.6961486
AGI_250m       2.6213203
AGI_60m        1.6572538
AGI_0m_ann     1.6292092
bathy_sd       1.6259866
mld_mean       1.0917162
pred_var       0.5679992
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  4282.43
2          15    AGI_0m_ann          6  bathy_mean   498.47
3          16   AGI_60m_ann          6  bathy_mean   428.77
4          15    AGI_0m_ann         12 AGI_0m_seas   302.09
5           6    bathy_mean          3    sal_mean   265.89
6           3      sal_mean          2   temp_mean   254.57
7          12   AGI_0m_seas          9     AGI_60m   224.51
8          14 AGI_250m_seas          2   temp_mean   215.85
9           8        AGI_0m          4    ssh_mean   210.70
10         12   AGI_0m_seas          2   temp_mean   189.86
11         16   AGI_60m_ann         12 AGI_0m_seas   180.44
12          6    bathy_mean          2   temp_mean   138.32
13          9       AGI_60m          6  bathy_mean   130.20
14         17  AGI_250m_ann          3    sal_mean   123.95
[1] "External percent deviance explained"
[1] 0.8308918

[1] "TPR"
[1] 0.7459277
[1] "TSS"
[1] 0.9328179
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9000 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1727172 0.9399568 0.9925128 0.9983244         0.8308918 0.8698321
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.1909322
Correlation        0.9569687
AUC                0.9975000
Per.Expl          86.2271541
cvDeviance         0.4463350
cvCorrelation      0.8600541
cvAUC              0.9682600
cvPer.Expl        67.8037354
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        26.2112221
o2_mean_0m        21.2023536
o2_mean_250m_seas 10.1374648
o2_mean_0m_seas    8.4989877
o2_mean_60m_seas   5.2518224
o2_mean_250m_ann   4.0182405
o2_mean_0m_ann     3.4428506
chl_mean           2.9180736
temp_mean          2.7140167
o2_mean_250m       2.6253593
ssh_mean           2.6104859
sal_mean           2.4001687
o2_mean_60m_ann    2.3289369
o2_mean_60m        2.3143331
bathy_sd           1.3897784
mld_mean           1.2730290
pred_var           0.6628767
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index       var2.names int.size
1           3        temp_mean          1       o2_mean_0m   780.03
2          16  o2_mean_60m_ann          7       bathy_mean   417.55
3          13 o2_mean_60m_seas          4         sal_mean   365.89
4           7       bathy_mean          2         chl_mean   256.48
5           4         sal_mean          1       o2_mean_0m   238.13
6          16  o2_mean_60m_ann         13 o2_mean_60m_seas   224.10
7          16  o2_mean_60m_ann          9      o2_mean_60m   214.13
8          12  o2_mean_0m_seas          3        temp_mean   202.33
9           7       bathy_mean          5         ssh_mean   191.77
10          9      o2_mean_60m          7       bathy_mean   190.11
11          2         chl_mean          1       o2_mean_0m   171.76
12          5         ssh_mean          4         sal_mean   164.64
13         16  o2_mean_60m_ann          3        temp_mean   158.76
14          7       bathy_mean          3        temp_mean   143.24
[1] "External percent deviance explained"
[1] 0.8138392

[1] "TPR"
[1] 0.7447363
[1] "TSS"
[1] 0.9235923
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.182767 0.9320937 0.9901405 0.9997562         0.8138392 0.8622715

Here, dissolved oxygen was included at 0m and at all temporal resolutions, while AGI was included at 250m and all temporal resolutions.

explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_250_DO_0_dail_seas_ann.rds",
            test_data = dat_test_do_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862875
Residual.Deviance  0.6250167
Correlation        0.7985999
AUC                0.9528000
Per.Expl          54.9143526
cvDeviance         0.8554645
cvCorrelation      0.6626543
cvAUC              0.8795600
cvPer.Expl        38.2909726
[1] "Relative influence of predictor variables"

                  rel.inf
o2_mean_0m      16.470171
AGI_250m_ann    15.811049
o2_mean_0m_seas 12.815606
AGI_250m_seas    7.984658
temp_mean        7.744521
bathy_mean       7.655246
sal_mean         6.784886
AGI_250m         5.144102
chl_mean         4.881779
o2_mean_0m_ann   4.757163
ssh_mean         4.089160
mld_mean         3.229369
bathy_sd         2.632290
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index      var1.names var2.index var2.names int.size
1          3       temp_mean          1 o2_mean_0m   461.19
2          9 o2_mean_0m_seas          1 o2_mean_0m   277.38
3         12   AGI_250m_seas          4   sal_mean   266.29
4         10  o2_mean_0m_ann          3  temp_mean   262.22
5          4        sal_mean          3  temp_mean   249.21
6          5        ssh_mean          4   sal_mean   136.76
7          5        ssh_mean          2   chl_mean   126.46
8         12   AGI_250m_seas         11   AGI_250m   120.24
[1] "External percent deviance explained"
[1] 0.5094073

[1] "TPR"
[1] 0.7172322
[1] "TSS"
[1] 0.7106185
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3247552 0.7664765 0.9358957 0.9980248         0.5094073 0.5491435
#load do final model 
agi_mod <- readRDS(here("data/brt/mod_outputs/crw/ensemble/brt_agi_only.rds"))
do_mod <- readRDS(here("data/brt/mod_outputs/crw/ensemble/brt_do_final.rds"))

# make df of predictions from each model
test_ensemble_df <- readRDS(here("data/brt/mod_eval/agi_do_test_daily_seasonal_annual.rds"))

pred_testdata <- data.frame(
  do = predict.gbm(do_mod, test_ensemble_df,
                   n.trees = do_mod$gbm.call$best.trees,
                   type = "response"),
  agi = predict.gbm(agi_mod, test_ensemble_df,
                    n.trees = agi_mod$gbm.call$best.trees,
                    type = "response")
)

# Mean of probabilities
mean_prob <- rowMeans(pred_testdata)

# performance measures for "mean of probabilities"
(perf_mean_prob <- mecofun::evalSDM(test_ensemble_df$PA, mean_prob))
        AUC       TSS     Kappa      Sens      Spec       PCC        D2 thresh
1 0.9310446 0.7052047 0.7052807 0.8729466 0.8322581 0.8526573 0.4576839   0.52
#calculate percent external deviance explained
ext.residual.deviance <- calc.deviance(obs = test_ensemble_df$PA, pred=mean_prob, family="bernoulli", calc.mean=TRUE) #get % deviance
null.dev =  calc.deviance(test_ensemble_df$PA ,rep(mean(test_ensemble_df$PA),length(test_ensemble_df$PA)), family="bernoulli", calc.mean=T)
(dev=(null.dev - ext.residual.deviance)/null.dev) 
[1] 0.4576839
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6277548
Correlation        0.7973163
AUC                0.9522000
Per.Expl          54.7169289
cvDeviance         0.8617301
cvCorrelation      0.6582930
cvAUC              0.8768100
cvPer.Expl        37.8391330
[1] "Relative influence of predictor variables"

                rel.inf
temp_mean     15.504668
AGI_250m_ann  15.163685
AGI_0m        11.740021
bathy_mean    11.710734
AGI_0m_seas    8.197830
sal_mean       6.807123
AGI_250m_seas  6.751818
AGI_0m_ann     5.161877
chl_mean       5.090897
AGI_250m       4.624616
bathy_sd       3.784137
mld_mean       3.106183
pred_var       2.356411
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index    var2.names int.size
1          7        AGI_0m          2     temp_mean  2843.30
2          7        AGI_0m          5    bathy_mean   355.65
3         12    AGI_0m_ann         10   AGI_0m_seas   197.02
4         10   AGI_0m_seas          8      AGI_250m   181.68
5         13  AGI_250m_ann         12    AGI_0m_ann   165.05
6         11 AGI_250m_seas          8      AGI_250m   149.59
7         10   AGI_0m_seas          2     temp_mean   136.02
8         13  AGI_250m_ann         11 AGI_250m_seas   122.24
[1] "External percent deviance explained"
[1] 0.4999456

[1] "TPR"
[1] 0.7155658
[1] "TSS"
[1] 0.7086853
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5450 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.328907 0.7586342 0.9325364  1.007354         0.4999456 0.5471693
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6163731
Correlation        0.8037426
AUC                0.9552000
Per.Expl          55.5378474
cvDeviance         0.8577154
cvCorrelation      0.6617449
cvAUC              0.8786300
cvPer.Expl        38.1285877
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.645183
o2_mean_250m_ann  17.206276
o2_mean_0m_seas   10.345249
temp_mean          7.771990
o2_mean_250m_seas  7.199949
bathy_mean         7.171107
sal_mean           6.275883
chl_mean           5.065733
o2_mean_0m_ann     4.840965
o2_mean_250m       4.413205
ssh_mean           4.028459
mld_mean           3.069241
bathy_sd           2.700490
pred_var           2.266270
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   510.24
2          13    o2_mean_0m_ann          3    temp_mean   283.65
3          12 o2_mean_250m_seas          4     sal_mean   254.65
4          11   o2_mean_0m_seas          1   o2_mean_0m   212.12
5           4          sal_mean          3    temp_mean   189.20
6          11   o2_mean_0m_seas          5     ssh_mean   181.69
7           7        bathy_mean          3    temp_mean   158.82
8          13    o2_mean_0m_ann          4     sal_mean   151.35
9          12 o2_mean_250m_seas          9 o2_mean_250m   148.24
10          5          ssh_mean          4     sal_mean   132.70
[1] "External percent deviance explained"
[1] 0.5114889

[1] "TPR"
[1] 0.7177333
[1] "TSS"
[1] 0.7216508
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3235065 0.7685859 0.9368822  1.002484         0.5114889 0.5553785
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862896
Residual.Deviance  0.2010998
Correlation        0.9542854
AUC                0.9970000
Per.Expl          85.4936623
cvDeviance         0.4550922
cvCorrelation      0.8583417
cvAUC              0.9666300
cvPer.Expl        67.1719242
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean    30.944500
temp_mean     20.937253
AGI_250m_seas 10.045099
AGI_0m         7.256943
ssh_mean       6.121208
AGI_250m_ann   4.606362
sal_mean       4.514486
AGI_0m_seas    4.190922
chl_mean       3.030145
AGI_250m       2.700179
bathy_sd       2.167797
AGI_0m_ann     2.088657
mld_mean       1.396450
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index  var2.names int.size
1          8        AGI_0m          2   temp_mean  4136.97
2          6    bathy_mean          3    sal_mean   529.40
3          3      sal_mean          2   temp_mean   284.85
4         13  AGI_250m_ann          3    sal_mean   275.46
5         11 AGI_250m_seas          6  bathy_mean   272.17
6          8        AGI_0m          4    ssh_mean   255.06
7         13  AGI_250m_ann         12  AGI_0m_ann   244.55
8         12    AGI_0m_ann         10 AGI_0m_seas   213.62
[1] "External percent deviance explained"
[1] 0.8167617

[1] "TPR"
[1] 0.7449762
[1] "TSS"
[1] 0.9286281
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1797948 0.9348405 0.9907361 0.9974215         0.8167617 0.8549366
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862943
Residual.Deviance  0.2038413
Correlation        0.9528997
AUC                0.9971000
Per.Expl          85.2959607
cvDeviance         0.4624112
cvCorrelation      0.8532735
cvAUC              0.9663100
cvPer.Expl        66.6440829
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        27.1431098
o2_mean_0m        21.2576581
o2_mean_250m_seas 14.6524628
o2_mean_0m_seas   10.3343085
o2_mean_250m_ann   3.7807478
chl_mean           3.5499333
temp_mean          3.4191758
o2_mean_250m       3.3145643
o2_mean_0m_ann     2.9473000
sal_mean           2.8866095
ssh_mean           2.6457934
bathy_sd           1.7336484
mld_mean           1.5223918
pred_var           0.8122965
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index var2.names int.size
1           3         temp_mean          1 o2_mean_0m   550.04
2          12 o2_mean_250m_seas          4   sal_mean   330.52
3           5          ssh_mean          4   sal_mean   277.30
4           7        bathy_mean          2   chl_mean   228.67
5          13    o2_mean_0m_ann          3  temp_mean   217.07
6           7        bathy_mean          3  temp_mean   200.66
7          11   o2_mean_0m_seas          5   ssh_mean   200.53
8           2          chl_mean          1 o2_mean_0m   197.61
9           7        bathy_mean          5   ssh_mean   180.78
10         14  o2_mean_250m_ann          7 bathy_mean   176.73
[1] "External percent deviance explained"
[1] 0.8042935

[1] "TPR"
[1] 0.7442145
[1] "TSS"
[1] 0.9193297
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1882474 0.9277648 0.9891198 0.9997909         0.8042935 0.8529596

Summary table of model performance metrics

model percent_explained deviance_exp TPR_mean TSS AUC RMSE SpearmanCor PseudoR2
base_Nwind_crw 36.455 0.349 0.683 0.556 0.867 0.385 0.640 0.365
base_Nwind_back 77.752 0.732 0.740 0.876 0.981 0.227 0.894 0.778
agi_alldep_allres_crw 57.743 0.526 0.719 0.730 0.940 0.318 0.777 0.577
do_alldep_allres_crw 57.504 0.529 0.720 0.732 0.942 0.317 0.780 0.575
agi_alldep_allres_back 86.227 0.814 0.745 0.924 0.990 0.183 0.932 0.862
do_alldep_allres_back 86.983 0.831 0.746 0.933 0.993 0.173 0.939 0.869
do_agi_comb 54.914 0.509 0.717 0.711 0.936 0.324 0.766 0.549
do_agi_ensemble NA 0.458 NA 0.705 0.931 NA NA NA
agi_0m_250m_dail_seas_ann_crw 55.077 0.503 0.716 0.703 0.932 0.328 0.760 0.551
do_0m_250m_dail_seas_ann_crw 55.538 0.511 0.718 0.722 0.937 0.324 0.769 0.555
agi_0m_250m_dail_seas_ann_back 81.177 0.778 0.743 0.902 0.987 0.204 0.915 0.812
do_0m_250m_dail_seas_ann_back 82.701 0.780 0.743 0.902 0.986 0.202 0.916 0.827

Conclusions

  • Based on the above comparisons between models using PAs from CRW and background sampling approaches, I have decided to move forward using the CRW PA models. The background sampling models consistently indicated that they were overfit, and thus the CRW models are the preferred choice.

  • Given the similar values among performance metrics between the overfit models (those with all depth levels and temporal resolutions) and the models without data at 60m, we have selected the latter to be the final study models. We explored similar models without data at 60m and at a seasonal temporal resolution, but these models performed considerably worse. Additionally, DO and AGI at 0m and 250m at a seasonal resolution were often a predictor variable with a top 5 highest relative importance, thus, keeping data at a seasonal resolution seemed valuable.

Final models w/o random predictor variables

explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.8160352
Correlation        0.7038050
AUC                0.9065000
Per.Expl          41.1349935
cvDeviance         0.9973663
cvCorrelation      0.5753416
cvAUC              0.8287700
cvPer.Expl        28.0546057
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 30.831647
temp_mean  22.940454
sal_mean   12.777564
chl_mean   10.523532
bathy_sd    7.978901
ssh_mean    7.803239
mld_mean    7.144664
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          6 bathy_mean          2  temp_mean   730.22
2          4   ssh_mean          2  temp_mean   560.59
[1] "External percent deviance explained"
[1] 0.3774799

[1] "TPR"
[1] 0.6925418
[1] "TSS"
[1] 0.5960813
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 7 predictors of which 7 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3740063 0.6699421 0.8864212  1.002138         0.3774799 0.4113499
explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6004545
Correlation        0.8099189
AUC                0.9575000
Per.Expl          56.6861326
cvDeviance         0.8423891
cvCorrelation      0.6697309
cvAUC              0.8832400
cvPer.Expl        39.2341516
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.027705
o2_mean_250m_ann  15.896634
o2_mean_0m_seas    9.384843
temp_mean          7.043452
o2_mean_250m_seas  6.239946
o2_mean_60m_ann    5.633626
bathy_mean         5.169538
o2_mean_60m_seas   5.037047
sal_mean           4.915712
o2_mean_250m       4.514210
chl_mean           4.181528
o2_mean_0m_ann     3.503057
o2_mean_60m        3.087269
ssh_mean           3.035695
mld_mean           2.872809
bathy_sd           2.456927
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   421.52
2          11   o2_mean_0m_seas          1   o2_mean_0m   256.74
3          14    o2_mean_0m_ann          4     sal_mean   210.40
4          13 o2_mean_250m_seas         10 o2_mean_250m   190.15
5          11   o2_mean_0m_seas          5     ssh_mean   144.83
6           7        bathy_mean          3    temp_mean   138.68
7           5          ssh_mean          4     sal_mean   124.56
8           4          sal_mean          3    temp_mean   115.46
9          11   o2_mean_0m_seas          4     sal_mean   107.40
10         15   o2_mean_60m_ann          7   bathy_mean   105.21
11          9       o2_mean_60m          7   bathy_mean   105.03
12         12  o2_mean_60m_seas          9  o2_mean_60m    94.94
13          9       o2_mean_60m          8     bathy_sd    94.79
[1] "External percent deviance explained"
[1] 0.5236399

[1] "TPR"
[1] 0.7194688
[1] "TSS"
[1] 0.7276229
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3188455 0.7760116 0.9403423  1.001684         0.5236399 0.5668613
explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.5858010
Correlation        0.8161710
AUC                0.9602000
Per.Expl          57.7432667
cvDeviance         0.8309178
cvCorrelation      0.6751996
cvAUC              0.8859900
cvPer.Expl        40.0617726
[1] "Relative influence of predictor variables"

                rel.inf
temp_mean     13.808659
AGI_250m_ann  13.248314
AGI_0m        11.562619
bathy_mean     8.434040
AGI_0m_seas    7.508202
sal_mean       5.742767
AGI_250m_seas  5.436860
AGI_60m_ann    4.943205
AGI_60m_seas   4.543656
AGI_0m_ann     4.409229
AGI_250m       4.252699
ssh_mean       4.180552
chl_mean       3.804112
bathy_sd       2.999297
mld_mean       2.623939
AGI_60m        2.501850
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  1760.25
2          15   AGI_60m_ann         11 AGI_0m_seas   315.74
3           8        AGI_0m          6  bathy_mean   235.87
4          11   AGI_0m_seas         10    AGI_250m   189.96
5          16  AGI_250m_ann         14  AGI_0m_ann   183.28
6           8        AGI_0m          4    ssh_mean   145.41
7           8        AGI_0m          3    sal_mean   141.05
8          13 AGI_250m_seas         10    AGI_250m   109.75
9          14    AGI_0m_ann         11 AGI_0m_seas   106.67
10          9       AGI_60m          6  bathy_mean    98.24
11         12  AGI_60m_seas          4    ssh_mean    96.58
12         14    AGI_0m_ann          7    bathy_sd    92.19
13         15   AGI_60m_ann          4    ssh_mean    73.03
[1] "External percent deviance explained"
[1] 0.5261534

[1] "TPR"
[1] 0.719393
[1] "TSS"
[1] 0.7302788
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5750 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE     Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3183118 0.77655 0.940186  1.009582         0.5261534 0.5774327
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6216714
Correlation        0.7997202
AUC                0.9530000
Per.Expl          55.1556520
cvDeviance         0.8550097
cvCorrelation      0.6633890
cvAUC              0.8795600
cvPer.Expl        38.3237656
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        18.096412
o2_mean_250m_ann  17.371934
o2_mean_0m_seas   10.555409
o2_mean_250m_seas  8.392439
temp_mean          8.008430
sal_mean           6.610190
bathy_mean         6.609148
chl_mean           5.117774
o2_mean_0m_ann     4.922417
o2_mean_250m       4.500747
ssh_mean           3.912119
mld_mean           3.194189
bathy_sd           2.708790
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index        var1.names var2.index   var2.names int.size
1          3         temp_mean          1   o2_mean_0m   666.21
2         10   o2_mean_0m_seas          1   o2_mean_0m   368.62
3         11 o2_mean_250m_seas          4     sal_mean   296.11
4         12    o2_mean_0m_ann          3    temp_mean   248.07
5         11 o2_mean_250m_seas          9 o2_mean_250m   221.72
6          4          sal_mean          3    temp_mean   220.04
7         10   o2_mean_0m_seas          5     ssh_mean   188.99
8          7        bathy_mean          3    temp_mean   180.12
[1] "External percent deviance explained"
[1] 0.5113989

[1] "TPR"
[1] 0.7174051
[1] "TSS"
[1] 0.712829
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3240668 0.7672517 0.9361937  1.003678         0.5113989 0.5515565
explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6227688
Correlation        0.7983866
AUC                0.9524000
Per.Expl          55.0765938
cvDeviance         0.8514835
cvCorrelation      0.6647368
cvAUC              0.8804000
cvPer.Expl        38.5782656
[1] "Relative influence of predictor variables"

                rel.inf
AGI_250m_ann  15.624829
temp_mean     14.313855
AGI_0m        12.578555
bathy_mean    10.022586
AGI_0m_seas    8.257366
sal_mean       7.104914
AGI_250m_seas  5.966422
ssh_mean       5.289787
AGI_0m_ann     4.800005
AGI_250m       4.606867
chl_mean       4.471600
bathy_sd       4.007598
mld_mean       2.955618
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index  var2.names int.size
1          8        AGI_0m          2   temp_mean  2100.42
2         10   AGI_0m_seas          9    AGI_250m   333.68
3          8        AGI_0m          6  bathy_mean   264.37
4         11 AGI_250m_seas          9    AGI_250m   150.53
5          8        AGI_0m          4    ssh_mean   145.22
6         12    AGI_0m_ann         10 AGI_0m_seas   128.88
7         13  AGI_250m_ann         12  AGI_0m_ann   127.32
8         11 AGI_250m_seas          3    sal_mean   106.97
[1] "External percent deviance explained"
[1] 0.5028644

[1] "TPR"
[1] 0.7155138
[1] "TSS"
[1] 0.7032217
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE      Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3280184 0.759673 0.932425  1.008251         0.5028644 0.5507659

HSI maps

These dates run from January to December of each year, but I am happy to modify this if there are better ways to categorize/plot ENSO phases.

Neutral year (2013)

Strong La Niña year (-1.5 to -1.9 temperature anomaly; 2011)

Moderate/strong El Niño year (1.5 to 1.9 temperature anomaly; 2009)

Base model

AGI model

DO model

Neutral year (2013)

Strong La Niña year (-1.5 to -1.9 temperature anomaly; 2011)

Moderate/strong El Niño year (1.5 to 1.9 temperature anomaly; 2009)

HSI difference maps

Neutral year (2013)

Strong La Niña year (-1.5 to -1.9 temperature anomaly; 2011)

Moderate/strong El Niño year (1.5 to 1.9 temperature anomaly; 2009)

AGI maps

Winter

Spring

Summer

Fall

Neutral year (2013)

Strong La Niña year (-1.5 to -1.9 temperature anomaly; 2011)

Moderate/strong El Niño year (1.5 to 1.9 temperature anomaly; 2009)

Model evaluation

Predictor variable relative importance

Base model

DO model

AGI model

DO, AGI model

Partial plots

DO model

0m

250m

AGI model

0m

250m

Performance metric comparisons

20 iterations were performed for each model type. Then an anova and post-hoc Tukey tests were run to evaluate the pairwise comparisons. Letters represent relative significance.