This document describes analyses and preliminary results of avian point count surveys across a successional gradient at the Monta Verde Cloud Forest in Costa Rica.
#### REQUIRED LIBRARIES ####
library(dplyr)
library(ggplot2)
library(lme4)
library(AICcmodavg)
library(ggeffects)
library(viridis)
library(tidyr)
Load a custom function that will help us combine initial migratory coding into bigger bins.
combineMigCats <- Vectorize(vectorize.args = "x",
FUN = function(x) {
switch(as.character(x),
"altitudinal" = "mig",
"long-distance" = "mig",
"short-distance" = "mig",
"nomadic" = "mig",
"resident" = "res",
"NA" = NA)})
combineGuildCats <- Vectorize(vectorize.args = "x",
FUN = function(x) {
switch(as.character(x),
"carnivore" = "specialist",
"frugivore" = "specialist",
"granivore" = "specialist",
"herbivore" = "specialist",
"insectivore" = "specialist",
"nectarivore" = "specialist",
"nectarivore (omnivore?)" = "omnivore",
"omnivore" = "omnivore",
"scavenger" = "omnivore")})
First load all the point count data and combine into a single data frame.
#load site data
nf4_2011 <- read.csv("../data_by_site/NF4-2011.csv")
nf1_2003 <- read.csv("../data_by_site/NF1-2003.csv")
nf2 <- read.csv("../data_by_site/NF2.csv")
mf1 <- read.csv("../data_by_site/MF1.csv")
mf2 <- read.csv("../data_by_site/MF2.csv")
nf6_2011 <- read.csv("../data_by_site/NF6-2011.csv")
p1 <- read.csv("../data_by_site/P1.csv")
p2 <- read.csv("../data_by_site/P2.csv")
nf3nf5_2008 <- read.csv("../data_by_site/NF3NF5-2008.csv")
#excluded Finca site from analysis
#nf7ab_2008 <- read.csv("../data_by_site/NF7AB-2008.csv")
#combine into single data object
dat <- rbind(nf4_2011, nf1_2003, nf2, mf1, mf2, nf6_2011, p1, p2, nf3nf5_2008)
Load the datasheet containing key to codes.
codes <- read.csv("../data_by_site/CODES.csv")
Load in the tree data:
trees <- read.csv("../trees_nofinca_121219.csv")
#set factor order
trees$treatment <- factor(trees$treatment, levels = c("0", "2011", "2008", "2003", "old"))
Load in densiometer data (values represent open “dots”):
dens <- read.csv("../dens_nofinca.csv")
#set factor order
dens$treatment <- factor(dens$treatment, levels = c("0", "2011", "2008", "2003", "old"))
Initial basic cleaning actions:
#convert date to R date format
dat$date <- as.Date(dat$date, format = "%d-%b-%y")
#convert year planted from numeric to factor
dat$year_planted <- as.factor(dat$year_planted)
#fix 2 NA values for year in LC NF2
dat$year_planted[is.na(dat$year_planted)] <- "2003"
#reorder the levels of the year factor variable to make plots nicer
levels(dat$year_planted) <- c("0", "2011", "2008", "2003", "old")
#fix weird transect coding
dat$transect[dat$transect == "P1(N)"] <- "P1"
dat$transect[dat$transect == "P1 (N)"] <- "P1"
dat$transect <- droplevels(dat$transect)
levels(dat$transect)
## [1] "NF4-NACI" "NF1-LC" "(LC) NF2-T" "MF1-LC" "MF2-LC"
## [6] "NF6-NACI" "P1" "P2" "NF3-NACI" "NF5-NACI"
Subset the dataset to only include those individuals that were observed within 25m of the sample point.
dat_close <- dat[dat$dist == "<25",]
Further subset to exclude those species whose behavior was coded as “pass through” since they were’nt truly using the habitat.
dat_close_use <- dat_close[dat_close$beh != "pt",]
Now we need to join the species coding information with the database so that we have feeding guild and migratory status associated with each observation.
dat_join <- dat_close_use %>%
inner_join(., codes, by = "species")
## Warning: Column `species` joining factors with different levels, coercing
## to character vector
Finally, we’ll simplify the migratory guild data to only include 3 groups: “resident”, “migratory”, or “nomadic”; and simplify feeding guilds into “omnovore” or “specialist”.
dat_join$mig_code <- unlist(combineMigCats(x = dat_join$mig_guild))
dat_join$guild_code <- unlist(combineGuildCats(x = dat_join$feed_guild))
Now we’re ready for some anlyeses. Let’s first build some really basic models and plots to see if there is an effect of treatment on the total number of birds seen (having excluded pass throughs, and those observed >25m from the observer).
This next code chunk counts up the number of observations grouped by transect and date (so for each transect-day, we get the total number of birds)
mod1_counts <- dat_join %>%
group_by(transect, year_planted) %>%
count(date)
We can fit two pretty simple generalized linear models to this. One is the simple intercept model, the other is modeling the abundance during each survey as a function of successional stage (measured as a factor variable). Each model includes date as a random effect on the intercept - basically saying that there may be more or less birds on a given day). WE don’t have enough observations to allow for day to modify how succession effects the count (e.g., cant’ do random effect on slope). We’ll first just look at the AICc-based selection table.
#intercept ("dot") model
f0 <- lmer(n ~ 1 + (1|date), data = mod1_counts, REML = F)
#succession effect model
f1 <- lmer(n ~ year_planted + (1 |date), data = mod1_counts, REML = F)
aictab(list("dot" = f0, "succession" = f1))
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## succession 7 410.90 0.00 0.89 0.89 -197.48
## dot 3 415.18 4.28 0.11 1.00 -204.39
Clear information theoretic support for the succession model. Let’s now evaluate the model. By looking at the summary and a residuals plot. First though we need to refit the models using REML instead of ML. ML is needed to use AICc but produces biased coefficient estimates.
#intercept ("dot") model
f0 <- lmer(n ~ 1 + (1|date), data = mod1_counts)
#succession effect model
f1 <- lmer(n ~ 0+year_planted + (1 |date), data = mod1_counts)
summary(f1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: n ~ 0 + year_planted + (1 | date)
## Data: mod1_counts
##
## REML criterion at convergence: 382.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2033 -0.5221 -0.2472 0.3542 3.5939
##
## Random effects:
## Groups Name Variance Std.Dev.
## date (Intercept) 7.656 2.767
## Residual 20.010 4.473
## Number of obs: 66, groups: date, 17
##
## Fixed effects:
## Estimate Std. Error t value
## year_planted0 8.243 1.436 5.741
## year_planted2011 11.773 1.436 8.198
## year_planted2008 6.262 1.523 4.111
## year_planted2003 5.617 1.388 4.047
## year_plantedold 7.803 1.434 5.440
##
## Correlation of Fixed Effects:
## yr_pl0 y_2011 y_2008 y_2003
## yr_plnt2011 0.221
## yr_plnt2008 0.217 0.221
## yr_plnt2003 0.227 0.238 0.193
## yer_plntdld 0.233 0.222 0.178 0.229
plot(f1)
qqnorm(resid(f1))
qqline(resid(f1))
The residuals don’t look great, but not horrible either - need to think more about what might be going on, but there seems like there might be some structure where we see residual bias at the highest predicted values. It might also be because we’re technically using a model that assumes normality of residuals with count data which we could also model as a Poisson. That said, think about Gary White’s talk at RRF and his paper that showed sometimes the normal model is actually more robust than the Poisson even for known-Poisson-generated data. I think the residuals look OK enough to keep going.
The actual fixed effect coefficient estimates are also a bit confusing in their trend, but at least they are larger than their standard error (except for old) - let’s make some plots to look at these graphically.
#make some model fited values
f1_pred <- ggpredict(f1, terms = c("year_planted"))
ggplot() +
geom_point(data = mod1_counts, aes(x = year_planted, y = n, fill = date,
color = date),
alpha = .6) +
geom_point(data = f1_pred, aes(x = x, y = predicted), shape = 15, size = 3) +
geom_errorbar(data = f1_pred, aes(x = x, ymax = conf.high, ymin = conf.low),
width = .2) +
ylab("Abundance") +
theme_minimal()
The above plot shows the raw count data (in the blue points) colored by date. The black squares represent the model estimate for each succession class (think of this as the mean corrected for day effects). The error bars ar 95% CIs. Hard to see a trend I’d be confident in here, other than maybe a boost in abundance at the earliest successional stage. I think the model is overfit, so the parameter estimates themselves are difficult to have much confidence in (the model, in the end, includes 7 parameters for ~65 observations, that’s not great… I trust the graph more.)
Instead of raw abundance, let’s look at the proportion of migrants versus residents. Let’s just make the simplest version of the dataset and plot.
#break down mean counts per mig guild
mig_counts <- dat_join %>%
group_by(year_planted, transect, mig_code) %>%
count(date) %>%
ungroup() %>%
complete(nesting(year_planted, transect, date), mig_code,
fill = list("n" = 0))
mig_prop <- mig_counts %>%
group_by(year_planted, mig_code) %>%
summarise(m_count = mean(n), se = sd(n)/length(n))
mig_date <- mig_counts %>%
group_by(year_planted, date, mig_code) %>%
summarise(m_count = mean(n), se = sd(n)/length(n))
ggplot(mig_prop) +
geom_bar(aes(x=year_planted, y = m_count, fill = mig_code),
position = "fill", stat = "identity") +
ylab("Proportion Birds Observed") +
xlab("Treatment") +
theme_minimal() +
scale_fill_discrete(name = "Migratory Status",
labels = c("Migrant", "Resident"))
Interesting that we see a bump in migrants for the 2011 treatment. But the above doesn’t account for date yet, which we probably should do…
First a plot of counts by date separated by mig code and succession treatment:
ggplot(mig_counts) +
geom_point(aes(x = date, y = n, group = mig_code,
color = mig_code)) +
facet_wrap(~year_planted)
## Warning: Removed 2 rows containing missing values (geom_point).
Now let’s fit some candidate model. First the dot model that includes intercept as a random effect of date; then another model that includes migratory status and treatment as an interactive effect (same random effect of date); and finally one that considers migratory status alone (again with random effect of date).
f2 <- lmer(n ~ 1 + (1|date), data = mig_counts, REML = F)
f3 <- lmer(n ~ year_planted*mig_code + (1|date), data = mig_counts,
REML = F)
f4 <- lmer(n ~ mig_code + (1|date), data = mig_counts, REML = F)
aictab(list("dot" = f2, "succesion X mig" = f3, "mig" = f4))
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## mig 4 711.01 0.00 0.7 0.7 -351.35
## succesion X mig 12 712.69 1.67 0.3 1.0 -343.03
## dot 3 782.29 71.28 0.0 1.0 -388.05
Let’s look at the summary of our best model (spoiler: it’s probably just overfitting):
f3 <- lmer(n ~ year_planted*mig_code + (1|date), data = mig_counts)
summary(f3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: n ~ year_planted * mig_code + (1 | date)
## Data: mig_counts
##
## REML criterion at convergence: 669.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.9976 -0.4700 -0.0794 0.3263 5.2766
##
## Random effects:
## Groups Name Variance Std.Dev.
## date (Intercept) 1.878 1.370
## Residual 10.247 3.201
## Number of obs: 132, groups: date, 17
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.737333 0.957430 0.770
## year_planted2011 2.613145 1.268922 2.059
## year_planted2008 -0.783694 1.290377 -0.607
## year_planted2003 -0.288127 1.244664 -0.231
## year_plantedold 0.007173 1.263448 0.006
## mig_coderes 6.769231 1.255602 5.391
## year_planted2011:mig_coderes -1.692308 1.775690 -0.953
## year_planted2008:mig_coderes -0.384615 1.775690 -0.217
## year_planted2003:mig_coderes -2.054945 1.743693 -1.179
## year_plantedold:mig_coderes -0.461538 1.775690 -0.260
##
## Correlation of Fixed Effects:
## (Intr) yr_2011 yr_2008 yr_2003 yr_pln mg_cdr y_2011: y_2008:
## yr_plnt2011 -0.663
## yr_plnt2008 -0.648 0.493
## yr_plnt2003 -0.676 0.513 0.491
## yer_plntdld -0.660 0.498 0.477 0.508
## mig_coderes -0.656 0.495 0.487 0.504 0.497
## yr_pl2011:_ 0.464 -0.700 -0.344 -0.357 -0.351 -0.707
## yr_pl2008:_ 0.464 -0.350 -0.688 -0.357 -0.351 -0.707 0.500
## yr_pl2003:_ 0.472 -0.356 -0.350 -0.700 -0.358 -0.720 0.509 0.509
## yr_plntdl:_ 0.464 -0.350 -0.344 -0.357 -0.703 -0.707 0.500 0.500
## y_2003:
## yr_plnt2011
## yr_plnt2008
## yr_plnt2003
## yer_plntdld
## mig_coderes
## yr_pl2011:_
## yr_pl2008:_
## yr_pl2003:_
## yr_plntdl:_ 0.509
Note too that the only well-estimated parameters are those having to do with 2011.
Let’s also look at mig only:
f4 <- lmer(n ~ mig_code + (1|date), data = mig_counts, REML = F)
summary(f4)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: n ~ mig_code + (1 | date)
## Data: mig_counts
##
## AIC BIC logLik deviance df.resid
## 710.7 722.2 -351.3 702.7 128
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0699 -0.5113 -0.0855 0.3775 5.1797
##
## Random effects:
## Groups Name Variance Std.Dev.
## date (Intercept) 1.503 1.226
## Residual 10.960 3.311
## Number of obs: 132, groups: date, 17
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.0503 0.5099 2.06
## mig_coderes 5.8333 0.5763 10.12
##
## Correlation of Fixed Effects:
## (Intr)
## mig_coderes -0.565
This model actually looks fine, but all it tells us really is that residents are more common than migrants in general.
Similar analysis, but for collapsed feeding guild.
#break down mean counts per mig guild
guild_counts <- dat_join %>%
group_by(year_planted, transect, guild_code) %>%
count(date) %>%
ungroup() %>%
complete(nesting(year_planted, transect, date), guild_code,
fill = list("n" = 0))
guild_prop <- guild_counts %>%
group_by(year_planted, guild_code) %>%
summarise(m_count = mean(n), se = sd(n)/length(n))
guild_date <- guild_counts %>%
group_by(year_planted, date, guild_code) %>%
summarise(m_count = mean(n), se = sd(n)/length(n))
ggplot(guild_prop) +
geom_bar(aes(x=year_planted, y = m_count, fill = guild_code),
position = "fill", stat = "identity") +
ylab("Proportion Birds Observed") +
xlab("Treatment") +
theme_minimal() +
scale_fill_discrete(name = "Feeding Guild",
labels = c("Omnivore", "Specialist"))
Interesting trend here, but probably not explanatory of the bump for 2011.
Again, given sample sizes I don’t think it makes a ton of sense to model the effect of veg parameters directly (though I’m certainly open to conversation about that…). But some visualizations to help with the “storytelling” might be useful - let’s see what’s in the veg data…
trees %>%
group_by(treatment) %>%
summarise(m = mean(ht), se = sd(ht)/length(ht), v = var(ht))
## # A tibble: 5 x 4
## treatment m se v
## <fct> <dbl> <dbl> <dbl>
## 1 0 691 162. 104882
## 2 2011 268. 2.65 5125.
## 3 2008 355. 3.98 14250.
## 4 2003 376. 0.840 15861.
## 5 old 409. 1.74 38164.
Let’s look at the distribution of tree heights by treatement.
ggplot() +
geom_violin(data = trees, aes(x=treatment, y = ht, fill = treatment)) +
geom_point(data = trees[trees$treatment == 0,], aes(x = treatment, y = ht,
color = treatment)) +
theme_minimal() +
ylab("Tree Ht. (cm)") +
xlab("Treatment") +
theme(legend.position = "none")
That’s pretty interesting, again, the only real standout is the 2011 treatment - 2003 and 2008 look more similar to old than 2011 - just missing the far end of the tail.
We can do the same thig with DBH data too :
ggplot() +
geom_violin(data = trees, aes(x=treatment, y = dbh, fill = treatment)) +
geom_point(data = trees[trees$treatment == 0,], aes(x = treatment, y = dbh,
color = treatment)) +
theme_minimal() +
ylab("Tree DBH (cm)") +
xlab("Treatment") +
theme(legend.position = "none")
## Warning: Removed 1 rows containing non-finite values (stat_ydensity).
Pretty cool looking graph but I think 2011 doesn’t jump out to me the way it does for tree height.
Pulling the densiometer data:
ggplot() +
geom_point(data = dens, aes(x = treatment, y = dens_mean,
color = treatment)) +
theme_minimal() +
ylab("Canopy Openness") +
xlab("Treatment") +
theme(legend.position = "none")
Create presence/absence dataset filling in implied “0”s.
occur <- dat_join
occur$pres <- 1
ocurr_sites <- occur %>%
complete(species, nesting(transect, year_planted), fill = list("pres" = 0))
ocurr_visits <- occur %>%
complete(species, nesting(transect, date, year_planted),
fill = list("pres" = 0))
First let’s look at raw species richness by site:
rich <- ocurr_sites %>%
group_by(transect, year_planted) %>%
distinct(species, transect, year_planted, pres) %>%
tally(wt = pres)
rich %>%
group_by(year_planted) %>%
summarise(mean(n))
## # A tibble: 5 x 2
## year_planted `mean(n)`
## <fct> <dbl>
## 1 0 20.5
## 2 2011 26.5
## 3 2008 19.5
## 4 2003 16
## 5 old 16
ggplot(rich) +
geom_point(aes(x = year_planted, y = n))
This is a work in progress, but here’s what I’m thinking for now. There’s a bump in both abundance and proportion of migrants in the 2011 treatment. We’ll need to appropriately caveat because, with the small sample size, one or a few days could be driving this (though I think the panel plot helps make the case it’s a real trend). That said, there could be a story here. We wouldn’t necessarily expect the most migrants in pasture because that’s actually a totally different habitat type. So you would expect the most migrants in the earliest successional stages, which is exactly what we see. What’s surprising, is that the trend appears to be nonlinear - it really only holds for the earliest successional habitats and after that, there is no discernable difference when compared to uncut forest.
The tree height plot helps us disentangle that a bit more - the distribution of tree heights is really different in the 2011 plot compared to the other successional stages or the mature forest.
Here’s another way of looking at it - we fit a separate loess smoother over the counts by date grouped by treatment and mig code. It makes the influx of migrants late winter into early spring more obvious than when looking at the raw data. This smoother doesn’t show what our model did exactly (we don’t have enough data to model this exactly, but it at least makes it visible to us).
ggplot(mig_counts, aes(x = date, y = n, group = mig_code,
color = mig_code)) +
# geom_line(data = cbind(mig_counts[!is.na(mig_counts$date),],
# pred = predict(f3)), aes(y = pred),
# size = 1) + # adding predicted line from mixed model
geom_smooth(method = "loess") +
ylim(c(0,20)) +
facet_wrap(~year_planted)
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 29 rows containing missing values (geom_smooth).