We will attempt to disentangle the complex allometric and biogeographic relationships between body size (volume), latitude, and various other traits in salamanders.
These relationships are illustrated below.
From this, we can come up with a series of simple linear models, excluding any interaction terms.
1) Vol~Gen
2) Vol~Area
3) Vol~Lat
4) Gen~MolRate
5) MolRate~DR
6) MolRate~Lat
7) DR~Lat
8) DR~Area
9) Area~Lat
Load the Data
setwd('~/Dropbox/Phylo_Comp_Meth/')
tree <- read.nexus('caud.ultra.nexus')
dat <- read.csv('Caudata_PhyloComp_Data.csv')
rownames(dat) <- dat$Species
dat <- dat[,c(1,8,10,13,14,15,16)]
dat <- na.omit(dat)
# Because these are raw data, we need to do a few things first. This includes converting latitude to absolute value (we are interested in the effects of distance from the equator), and normalizing non-normal data.
# Convert latitude to absolute latitude.
dat$lat <- abs(dat$lat)
# Lets look at the normality of our data.
hist(dat$cvalue.quant)
hist(dat$area)
hist(dat$DR)
hist(dat$Abs.Mol.Rate)
hist(dat$Volume)
# All but rate of molecular evolution clearly need to be log transformed. SO lets test the
# normality of Abs.Mol.Rate pre and post transformation.
shapiro.test(dat$Abs.Mol.Rate)
##
## Shapiro-Wilk normality test
##
## data: dat$Abs.Mol.Rate
## W = 0.95072, p-value = 0.0001066
shapiro.test(log(dat$Abs.Mol.Rate))
##
## Shapiro-Wilk normality test
##
## data: log(dat$Abs.Mol.Rate)
## W = 0.96702, p-value = 0.002573
# Both are non-normal, but the log transformed is less non-normal. So let's go with that.
dat$cvalue.quant <- log(dat$cvalue.quant)
dat$area <- log(dat$area)
dat$DR <- log(dat$DR)
dat$Abs.Mol.Rate <- log(dat$Abs.Mol.Rate)
dat$Volume <- log(dat$Volume)
#Compare tree to data and prute the tree such that it only include the necessary taxa.
match <- name.check(tree, dat)
tree <- drop.tip(tree, match$tree_not_data)
# Let's see what our tree looks like...
plot(tree, show.tip.label = F)
tree
##
## Phylogenetic tree with 133 tips and 132 internal nodes.
##
## Tip labels:
## Cryptobranchus_alleganiensis, Andrias_davidianus, Salamandrella_keyserlingii, Ranodon_sibiricus, Hynobius_nebulosus, Hynobius_dunni, ...
##
## Rooted; includes branch lengths.
Obviously this does not lend well to model comparison. These are explorative at best. Down the line, I intend to use structural equation modelling to get at a greater understanding of correlation versus causation in this greater model, but for the present we can think of two distinct models. For the present, we will run simple pgls for these models and retain for later model comparison.
Throughout, I will run both the ordinary linear model and the pgls so we can compare regressions. PGLS will be in blue, lm in red.
colnames(dat) <- c('species','cval','area','lat','DR', 'MR', 'vol')
# First the linear models...
Vol.Gen.lm <- lm(vol~cval, data = dat)
Vol.Area.lm <- lm(area~vol, data = dat)
Vol.Lat.lm <- lm(vol~lat, data = dat)
Gen.MR.lm <- lm(cval~MR, data = dat)
MR.DR.lm <- lm(DR~MR, data = dat)
MR.Lat.lm <- lm(MR~lat, data = dat)
DR.Lat.lm <- lm(DR~lat, data = dat)
DR.Area.lm <- lm(DR~area, data = dat)
Area.Lat.lm <- lm(area~lat, data = dat)
# Now the PGLS. We will ensure that we use ML rather than REML to facilitate model comparison.
Vol.Gen.pgls <- gls(vol~cval, correlation = corBrownian(phy=tree), data = dat, method='ML')
Vol.Area.pgls <- gls(area~vol, correlation = corBrownian(phy=tree), data = dat, method='ML')
Vol.Lat.pgls <- gls(vol~lat, correlation = corBrownian(phy=tree), data = dat, method='ML')
Gen.MR.pgls <- gls(cval~MR, correlation = corBrownian(phy=tree), data = dat, method='ML')
MR.DR.pgls <- gls(DR~MR, correlation = corBrownian(phy=tree), data = dat, method='ML')
MR.Lat.pgls <- gls(MR~lat, correlation = corBrownian(phy=tree), data = dat, method='ML')
DR.Lat.pgls <- gls(DR~lat, correlation = corBrownian(phy=tree), data = dat, method='ML')
DR.Area.pgls <- gls(DR~area, correlation = corBrownian(phy=tree), data = dat, method='ML')
Area.Lat.pgls <- gls(area~lat, correlation = corBrownian(phy=tree), data = dat, method='ML')
# Now let's see what these look like.
##### 1) Vol~Gen
##### 2) Vol~Area
##### 3) Vol~Lat
##### 4) Gen~MolRate
##### 5) MolRate~DR
##### 6) MolRate~Lat
##### 7) DR~Lat
##### 8) DR~Area
##### 9) Area~Lat
plot(vol~cval, data=dat)
abline(Vol.Gen.lm, col='red')
abline(Vol.Gen.pgls, col='blue')
summary(Vol.Gen.pgls)
## Generalized least squares fit by maximum likelihood
## Model: vol ~ cval
## Data: dat
## AIC BIC logLik
## 438.1277 446.7988 -216.0639
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.702981 1.9855909 1.864926 0.0644
## cval 1.482467 0.4768155 3.109100 0.0023
##
## Correlation:
## (Intr)
## cval -0.884
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.77118669 -0.93082853 -0.56388515 -0.09841665 1.59479353
##
## Residual standard error: 2.3909
## Degrees of freedom: 133 total; 131 residual
plot(area~vol, data=dat)
abline(Vol.Area.lm, col='red')
abline(Vol.Area.pgls, col='blue')
summary(Vol.Area.pgls)
## Generalized least squares fit by maximum likelihood
## Model: area ~ vol
## Data: dat
## AIC BIC logLik
## 668.9808 677.6519 -331.4904
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -2.8778438 2.8774037 -1.000153 0.3191
## vol 0.4976781 0.2008239 2.478182 0.0145
##
## Correlation:
## (Intr)
## vol -0.639
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.54326721 -0.47784173 -0.05422871 0.23141811 1.12375301
##
## Residual standard error: 5.694716
## Degrees of freedom: 133 total; 131 residual
plot(vol~lat, data=dat)
abline(Vol.Lat.lm, col='red')
abline(Vol.Lat.pgls, col='blue')
summary(Vol.Lat.pgls)
## Generalized least squares fit by maximum likelihood
## Model: vol ~ lat
## Data: dat
## AIC BIC logLik
## 447.3964 456.0675 -220.6982
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 9.450307 1.1642467 8.117100 0.0000
## lat -0.007869 0.0177148 -0.444194 0.6576
##
## Correlation:
## (Intr)
## lat -0.563
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.0466049 -0.9801381 -0.6424292 -0.2563396 1.6825403
##
## Residual standard error: 2.475679
## Degrees of freedom: 133 total; 131 residual
plot(cval~MR, data=dat)
abline(Gen.MR.lm, col='red')
abline(Gen.MR.pgls, col='blue')
summary(Gen.MR.pgls)
## Generalized least squares fit by maximum likelihood
## Model: cval ~ MR
## Data: dat
## AIC BIC logLik
## -14.28105 -5.610002 10.14052
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 3.534655 0.22345857 15.817945 0.0000
## MR -0.022334 0.02230589 -1.001274 0.3185
##
## Correlation:
## (Intr)
## MR 0.651
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.3885639 -0.9842833 -0.4754154 0.3081468 2.5437017
##
## Residual standard error: 0.4364354
## Degrees of freedom: 133 total; 131 residual
plot(DR~MR, data=dat)
abline(MR.DR.lm, col='red')
abline(MR.DR.pgls, col='blue')
summary(MR.DR.pgls)
## Generalized least squares fit by maximum likelihood
## Model: DR ~ MR
## Data: dat
## AIC BIC logLik
## 310.5736 319.2446 -152.2868
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -6.636173 0.7578489 -8.756592 0
## MR -0.396813 0.0756493 -5.245428 0
##
## Correlation:
## (Intr)
## MR 0.651
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.6481143 0.1336305 0.4912431 0.8537044 2.6563260
##
## Residual standard error: 1.480149
## Degrees of freedom: 133 total; 131 residual
plot(MR~lat, data=dat)
abline(MR.Lat.lm, col='red')
abline(MR.Lat.pgls, col='blue')
summary(MR.Lat.pgls)
## Generalized least squares fit by maximum likelihood
## Model: MR ~ lat
## Data: dat
## AIC BIC logLik
## 345.6096 354.2806 -169.8048
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -5.714699 0.7940706 -7.196714 0.0000
## lat -0.021854 0.0120824 -1.808750 0.0728
##
## Correlation:
## (Intr)
## lat -0.563
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.0757156 -0.1394858 0.1452571 0.2931376 0.9326615
##
## Residual standard error: 1.688528
## Degrees of freedom: 133 total; 131 residual
plot(DR~lat, data=dat)
abline(DR.Lat.lm, col='red')
abline(DR.Lat.pgls, col='blue')
summary(DR.Lat.pgls)
## Generalized least squares fit by maximum likelihood
## Model: DR ~ lat
## Data: dat
## AIC BIC logLik
## 335.3764 344.0474 -164.6882
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -3.729326 0.7641022 -4.880663 0.0000
## lat -0.008593 0.0116264 -0.739111 0.4612
##
## Correlation:
## (Intr)
## lat -0.563
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.5477286 0.1463595 0.3740209 0.6096627 2.4536630
##
## Residual standard error: 1.624803
## Degrees of freedom: 133 total; 131 residual
plot(DR~area, data=dat)
abline(DR.Area.lm, col='red')
abline(DR.Area.pgls, col='blue')
summary(DR.Area.pgls)
## Generalized least squares fit by maximum likelihood
## Model: DR ~ area
## Data: dat
## AIC BIC logLik
## 331.6317 340.3027 -162.8158
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -4.131193 0.6238188 -6.622426 0.00
## area 0.049831 0.0240232 2.074300 0.04
##
## Correlation:
## (Intr)
## area -0.065
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.5639186 0.1795366 0.4335469 0.7344387 2.7302393
##
## Residual standard error: 1.602089
## Degrees of freedom: 133 total; 131 residual
plot(area~lat, data=dat)
abline(Area.Lat.lm, col='red')
abline(Area.Lat.pgls, col='blue')
summary(Area.Lat.pgls)
## Generalized least squares fit by maximum likelihood
## Model: area ~ lat
## Data: dat
## AIC BIC logLik
## 650.2476 658.9186 -322.1238
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -5.609738 2.4959580 -2.247529 0.0263
## lat 0.196916 0.0379778 5.185030 0.0000
##
## Correlation:
## (Intr)
## lat -0.563
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.30414490 -0.39983241 -0.04779661 0.26221792 0.85026999
##
## Residual standard error: 5.307457
## Degrees of freedom: 133 total; 131 residual
Now for fun, let’s try some phylogenetic path analysis
library(phylopath)
PathMods <- list(
Mod1a = DAG(vol ~ cval, area ~ vol, vol ~ lat, cval ~ MR, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod1b = DAG(vol ~ cval, area ~ vol, cval ~ MR, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod1c = DAG(vol ~ cval, area ~ vol, vol ~ lat, cval ~ MR, DR ~ MR, DR ~ lat, DR ~ area, area ~ lat),
Mod2a = DAG(vol ~ cval, vol ~ area, vol ~ lat, cval ~ MR, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod2b = DAG(vol ~ cval, vol ~ area, vol ~ lat, cval ~ MR, MR ~ DR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod3a = DAG(vol ~ cval, vol ~ area, vol ~ lat, cval ~ MR, MR ~ DR, MR ~ lat, DR ~ lat, area ~ DR, area ~ lat),
Mod3b = DAG(vol ~ cval, area ~ vol, vol ~ lat, cval ~ MR, MR ~ DR, MR ~ lat, DR ~ lat, area ~ DR, area ~ lat),
Mod4 = DAG(vol ~ cval, area ~ vol, vol ~ lat, MR ~ cval, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat)
)
set1 <- list(
Mod1a = DAG(vol ~ cval, area ~ vol, vol ~ lat, cval ~ MR, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod1b = DAG(vol ~ cval, area ~ vol, cval ~ MR, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod1c = DAG(vol ~ cval, area ~ vol, vol ~ lat, cval ~ MR, DR ~ MR, DR ~ lat, DR ~ area, area ~ lat)
)
set2 <- list(
Mod2a = DAG(vol ~ cval, vol ~ area, vol ~ lat, cval ~ MR, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat),
Mod2b = DAG(vol ~ cval, vol ~ area, vol ~ lat, cval ~ MR, MR ~ DR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat)
)
set3 <- list(
Mod3a = DAG(vol ~ cval, vol ~ area, vol ~ lat, cval ~ MR, MR ~ DR, MR ~ lat, DR ~ lat, area ~ DR, area ~ lat),
Mod3b = DAG(vol ~ cval, area ~ vol, vol ~ lat, cval ~ MR, MR ~ DR, MR ~ lat, DR ~ lat, area ~ DR, area ~ lat),
Mod4 = DAG(vol ~ cval, area ~ vol, vol ~ lat, MR ~ cval, DR ~ MR, MR ~ lat, DR ~ lat, DR ~ area, area ~ lat)
)
plot_model_set(set1)
plot_model_set(set2)
plot_model_set(set3)
result.brown <- phylo_path(models = PathMods, data=dat, tree=tree, cor_fun = ape::corBrownian)
result.pagel <- phylo_path(models = PathMods, data=dat, tree=tree, cor_fun = ape::corPagel)
result.brown
##
## A phylogenetic path analysis
##
## Evaluated for these models: Mod1a Mod1b Mod1c Mod2a Mod2b Mod3a Mod3b Mod4
##
## Containing 50 phylogenetic regressions.
summary(result.brown)
## model k q C p CICc delta_CICc l w
## 1 Mod3a 6 15 30.844 0.002 64.947 0.000 1.000 0.306
## 2 Mod2b 6 15 30.868 0.002 64.970 0.023 0.988 0.302
## 3 Mod1b 7 14 34.831 0.002 66.390 1.443 0.486 0.149
## 4 Mod4 6 15 33.823 0.001 67.925 2.978 0.226 0.069
## 5 Mod3b 6 15 33.925 0.001 68.028 3.081 0.214 0.066
## 6 Mod1a 6 15 34.221 0.001 68.324 3.377 0.185 0.057
## 7 Mod2a 6 15 35.060 0.000 69.163 4.216 0.121 0.037
## 8 Mod1c 7 14 39.462 0.000 71.021 6.074 0.048 0.015
best_mod.brown <- best(result.brown)
best_mod.brown
## $coef
## lat DR MR cval area vol
## lat 0 -0.1410776 -0.5277020 0.00000000 0.750642 -0.2119764
## DR 0 0.0000000 -0.5622039 0.00000000 0.165660 0.0000000
## MR 0 0.0000000 0.0000000 -0.03243514 0.000000 0.0000000
## cval 0 0.0000000 0.0000000 0.00000000 0.000000 0.3621468
## area 0 0.0000000 0.0000000 0.00000000 0.000000 0.2299315
## vol 0 0.0000000 0.0000000 0.00000000 0.000000 0.0000000
##
## $se
## lat DR MR cval area vol
## lat 0 0.1908748 0.2247942 0.00000000 0.13746329 0.14012671
## DR 0 0.0000000 0.1026827 0.00000000 0.06279121 0.00000000
## MR 0 0.0000000 0.0000000 0.03239386 0.00000000 0.00000000
## cval 0 0.0000000 0.0000000 0.00000000 0.00000000 0.11942870
## area 0 0.0000000 0.0000000 0.00000000 0.00000000 0.07947117
## vol 0 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
##
## $lower
## lat DR MR cval area vol
## lat 0 -0.5186735 -0.9724304 0.00000000 0.47868729 -0.48922049
## DR 0 0.0000000 -0.7653493 0.00000000 0.04143506 0.00000000
## MR 0 0.0000000 0.0000000 -0.09651792 0.00000000 0.00000000
## cval 0 0.0000000 0.0000000 0.00000000 0.00000000 0.12585420
## area 0 0.0000000 0.0000000 0.00000000 0.00000000 0.07269584
## vol 0 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
##
## $upper
## lat DR MR cval area vol
## lat 0 0.2365183 -0.08297363 0.00000000 1.0225967 0.06526777
## DR 0 0.0000000 -0.35905855 0.00000000 0.2898848 0.00000000
## MR 0 0.0000000 0.00000000 0.03164765 0.0000000 0.00000000
## cval 0 0.0000000 0.00000000 0.00000000 0.0000000 0.59843942
## area 0 0.0000000 0.00000000 0.00000000 0.0000000 0.38716716
## vol 0 0.0000000 0.00000000 0.00000000 0.0000000 0.00000000
##
## attr(,"class")
## [1] "fitted_DAG"
average_mod.brown_full <- average(result.brown, method = "full")
plot(best_mod.brown, algorithm='mds', curvature = -0.1)
plot(average_mod.brown_full, algorithm='lgl', curvature = 0.2)
result.pagel
##
## A phylogenetic path analysis
##
## Evaluated for these models: Mod1a Mod1b Mod1c Mod2a Mod2b Mod3a Mod3b Mod4
##
## Containing 50 phylogenetic regressions.
summary(result.pagel)
## model k q C p CICc delta_CICc l w
## 1 Mod1b 7 14 12.131 0.596 43.690 0.000 1.000 0.211
## 2 Mod2a 6 15 9.678 0.644 43.781 0.091 0.956 0.202
## 3 Mod3a 6 15 10.096 0.608 44.198 0.508 0.776 0.164
## 4 Mod2b 6 15 10.162 0.602 44.264 0.574 0.751 0.158
## 5 Mod4 6 15 11.072 0.523 45.175 1.484 0.476 0.101
## 6 Mod1a 6 15 11.600 0.478 45.702 2.012 0.366 0.077
## 7 Mod3b 6 15 12.466 0.409 46.569 2.878 0.237 0.050
## 8 Mod1c 7 14 15.617 0.337 47.177 3.486 0.175 0.037
best_mod.pagel <- best(result.pagel)
best_mod.pagel
## $coef
## lat MR cval vol area DR
## lat 0 -0.206108 0.00000000 0.0000000 0.6786896 -0.18485901
## MR 0 0.000000 -0.03791029 0.0000000 0.0000000 0.07298753
## cval 0 0.000000 0.00000000 0.3784385 0.0000000 0.00000000
## vol 0 0.000000 0.00000000 0.0000000 0.2163503 0.00000000
## area 0 0.000000 0.00000000 0.0000000 0.0000000 -0.07305495
## DR 0 0.000000 0.00000000 0.0000000 0.0000000 0.00000000
##
## $se
## lat MR cval vol area DR
## lat 0 0.1367411 0.000000e+00 0.0000000 0.09978369 0.14912200
## MR 0 0.0000000 9.150696e-07 0.0000000 0.00000000 0.07260799
## cval 0 0.0000000 0.000000e+00 0.1055994 0.00000000 0.00000000
## vol 0 0.0000000 0.000000e+00 0.0000000 0.07476097 0.00000000
## area 0 0.0000000 0.000000e+00 0.0000000 0.00000000 0.10039117
## DR 0 0.0000000 0.000000e+00 0.0000000 0.00000000 0.00000000
##
## $lower
## lat MR cval vol area DR
## lat 0 -0.4766145 0.0000000 0.0000000 0.48127954 -0.47990053
## MR 0 0.0000000 -0.0379121 0.0000000 0.00000000 -0.07066915
## cval 0 0.0000000 0.0000000 0.1695377 0.00000000 0.00000000
## vol 0 0.0000000 0.0000000 0.0000000 0.06844466 0.00000000
## area 0 0.0000000 0.0000000 0.0000000 0.00000000 -0.27168134
## DR 0 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000
##
## $upper
## lat MR cval vol area DR
## lat 0 0.06439851 0.00000000 0.0000000 0.8760997 0.1101825
## MR 0 0.00000000 -0.03790848 0.0000000 0.0000000 0.2166442
## cval 0 0.00000000 0.00000000 0.5873393 0.0000000 0.0000000
## vol 0 0.00000000 0.00000000 0.0000000 0.3642559 0.0000000
## area 0 0.00000000 0.00000000 0.0000000 0.0000000 0.1255714
## DR 0 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
##
## attr(,"class")
## [1] "fitted_DAG"
average_mod.pagel_full <- average(result.pagel, method = "full")
plot(best_mod.pagel, algorithm='lgl', curvature = 0.1)
plot(average_mod.pagel_full, algorithm='lgl', curvature = 0.1)
First, the latitudinal, biogeographic model. Here, we consider the macrobiogeographical relationships observed between latitude and body size (Bergmann’s Rule) range size (Rapoport’s Rule), and diversification rate (latitudinal diversity gradient). Included in this is the body-size x range-size relationship, a well documented pattern. These are represented below.
1) Lat~Vol
2) Lat~Area
3) Lat~DR
4) Lat~Vol*Area
Our second set of analyses will be those in which we assess allometric relationships between body size and genome size
1) Vol~Gen
2) Vol~MolRate
3) Vol~Gen*MolRate