Introduction

This document describes analyses and preliminary results of avian point count surveys across a successional gradient at the Monta Verde Cloud Forest in Costa Rica.

Required Libraries and Custom Functions

#### REQUIRED LIBRARIES ####
library(dplyr)
library(ggplot2)
library(lme4)
library(AICcmodavg)
library(ggeffects)
library(viridis)
library(tidyr)

Load a custom function that will help us combine initial migratory coding into bigger bins.

combineMigCats <- Vectorize(vectorize.args = "x",
                               FUN = function(x) {
                                 switch(as.character(x), 
                            "altitudinal" = "mig",
                            "long-distance" = "mig",
                            "short-distance" = "mig",
                            "nomadic" = "mig",
                            "resident" = "res",
                            "NA" = NA)})

combineGuildCats <- Vectorize(vectorize.args = "x",
                               FUN = function(x) {
                                 switch(as.character(x), 
                            "carnivore" = "specialist",
                            "frugivore" = "specialist",
                            "granivore" = "specialist",
                            "herbivore" = "specialist",
                            "insectivore" = "specialist",
                            "nectarivore" = "specialist",
                            "nectarivore (omnivore?)" = "omnivore",
                            "omnivore" = "omnivore",
                            "scavenger" = "omnivore")})

Import and Process Data

Load data

Bird Data

First load all the point count data and combine into a single data frame.

#load site data
nf4_2011 <- read.csv("../data_by_site/NF4-2011.csv")
nf1_2003 <- read.csv("../data_by_site/NF1-2003.csv")
nf2 <- read.csv("../data_by_site/NF2.csv")
mf1 <- read.csv("../data_by_site/MF1.csv")
mf2 <- read.csv("../data_by_site/MF2.csv")
nf6_2011 <- read.csv("../data_by_site/NF6-2011.csv")
p1 <- read.csv("../data_by_site/P1.csv")
p2 <- read.csv("../data_by_site/P2.csv")
nf3nf5_2008 <- read.csv("../data_by_site/NF3NF5-2008.csv")

#excluded Finca site from analysis
#nf7ab_2008 <- read.csv("../data_by_site/NF7AB-2008.csv") 

#combine into single data object
dat <- rbind(nf4_2011, nf1_2003, nf2, mf1, mf2, nf6_2011, p1, p2, nf3nf5_2008)

Load the datasheet containing key to codes.

codes <- read.csv("../data_by_site/CODES.csv")

Vegetation Data

Load in the tree data:

trees <- read.csv("../trees_nofinca_121219.csv")

#set factor order
trees$treatment <- factor(trees$treatment, levels = c("0", "2011", "2008", "2003", "old"))

Load in densiometer data (values represent open “dots”):

dens <- read.csv("../dens_nofinca.csv")

#set factor order
dens$treatment <- factor(dens$treatment, levels = c("0", "2011", "2008", "2003", "old"))

Process data

Initial basic cleaning actions:

#convert date to R date format
dat$date <- as.Date(dat$date, format = "%d-%b-%y")

#convert year planted from numeric to factor
dat$year_planted <- as.factor(dat$year_planted) 

#fix 2 NA values for year in LC NF2
dat$year_planted[is.na(dat$year_planted)] <- "2003"

#reorder the levels of the year factor variable to make plots nicer
levels(dat$year_planted) <- c("0", "2011", "2008", "2003", "old")

#fix weird transect coding
dat$transect[dat$transect == "P1(N)"] <- "P1"
dat$transect[dat$transect == "P1 (N)"] <- "P1"
dat$transect <- droplevels(dat$transect)
levels(dat$transect) 
##  [1] "NF4-NACI"   "NF1-LC"     "(LC) NF2-T" "MF1-LC"     "MF2-LC"    
##  [6] "NF6-NACI"   "P1"         "P2"         "NF3-NACI"   "NF5-NACI"

Subset the dataset to only include those individuals that were observed within 25m of the sample point.

dat_close <- dat[dat$dist == "<25",]

Further subset to exclude those species whose behavior was coded as “pass through” since they were’nt truly using the habitat.

dat_close_use <- dat_close[dat_close$beh != "pt",]

Now we need to join the species coding information with the database so that we have feeding guild and migratory status associated with each observation.

dat_join <- dat_close_use %>%
  inner_join(., codes, by = "species")
## Warning: Column `species` joining factors with different levels, coercing
## to character vector

Finally, we’ll simplify the migratory guild data to only include 3 groups: “resident”, “migratory”, or “nomadic”; and simplify feeding guilds into “omnovore” or “specialist”.

dat_join$mig_code <- unlist(combineMigCats(x = dat_join$mig_guild))

dat_join$guild_code <- unlist(combineGuildCats(x = dat_join$feed_guild))

Analyisis

Abundance

Now we’re ready for some anlyeses. Let’s first build some really basic models and plots to see if there is an effect of treatment on the total number of birds seen (having excluded pass throughs, and those observed >25m from the observer).

This next code chunk counts up the number of observations grouped by transect and date (so for each transect-day, we get the total number of birds)

mod1_counts <- dat_join %>%
  group_by(transect, year_planted) %>%
  count(date)

We can fit two pretty simple generalized linear models to this. One is the simple intercept model, the other is modeling the abundance during each survey as a function of successional stage (measured as a factor variable). Each model includes date as a random effect on the intercept - basically saying that there may be more or less birds on a given day). WE don’t have enough observations to allow for day to modify how succession effects the count (e.g., cant’ do random effect on slope). We’ll first just look at the AICc-based selection table.

#intercept ("dot") model
f0 <- lmer(n ~ 1 + (1|date), data = mod1_counts, REML = F)
#succession effect model
f1 <- lmer(n ~ year_planted + (1 |date), data = mod1_counts, REML = F)

aictab(list("dot" = f0, "succession" = f1))
## 
## Model selection based on AICc:
## 
##            K   AICc Delta_AICc AICcWt Cum.Wt      LL
## succession 7 410.90       0.00   0.89   0.89 -197.48
## dot        3 415.18       4.28   0.11   1.00 -204.39

Clear information theoretic support for the succession model. Let’s now evaluate the model. By looking at the summary and a residuals plot. First though we need to refit the models using REML instead of ML. ML is needed to use AICc but produces biased coefficient estimates.

#intercept ("dot") model
f0 <- lmer(n ~ 1 + (1|date), data = mod1_counts)
#succession effect model
f1 <- lmer(n ~ 0+year_planted + (1 |date), data = mod1_counts)
summary(f1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: n ~ 0 + year_planted + (1 | date)
##    Data: mod1_counts
## 
## REML criterion at convergence: 382.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2033 -0.5221 -0.2472  0.3542  3.5939 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  date     (Intercept)  7.656   2.767   
##  Residual             20.010   4.473   
## Number of obs: 66, groups:  date, 17
## 
## Fixed effects:
##                  Estimate Std. Error t value
## year_planted0       8.243      1.436   5.741
## year_planted2011   11.773      1.436   8.198
## year_planted2008    6.262      1.523   4.111
## year_planted2003    5.617      1.388   4.047
## year_plantedold     7.803      1.434   5.440
## 
## Correlation of Fixed Effects:
##             yr_pl0 y_2011 y_2008 y_2003
## yr_plnt2011 0.221                      
## yr_plnt2008 0.217  0.221               
## yr_plnt2003 0.227  0.238  0.193        
## yer_plntdld 0.233  0.222  0.178  0.229
plot(f1)

qqnorm(resid(f1))
qqline(resid(f1))

The residuals don’t look great, but not horrible either - need to think more about what might be going on, but there seems like there might be some structure where we see residual bias at the highest predicted values. It might also be because we’re technically using a model that assumes normality of residuals with count data which we could also model as a Poisson. That said, think about Gary White’s talk at RRF and his paper that showed sometimes the normal model is actually more robust than the Poisson even for known-Poisson-generated data. I think the residuals look OK enough to keep going.

The actual fixed effect coefficient estimates are also a bit confusing in their trend, but at least they are larger than their standard error (except for old) - let’s make some plots to look at these graphically.

#make some model fited values
f1_pred <- ggpredict(f1, terms = c("year_planted"))

ggplot() +
  geom_point(data = mod1_counts, aes(x = year_planted, y = n, fill = date, 
                                     color = date),
             alpha = .6) +
  geom_point(data = f1_pred, aes(x = x, y = predicted), shape = 15, size = 3) +
  geom_errorbar(data = f1_pred, aes(x = x, ymax = conf.high, ymin = conf.low),
                width = .2) +
  ylab("Abundance") +
  theme_minimal()

The above plot shows the raw count data (in the blue points) colored by date. The black squares represent the model estimate for each succession class (think of this as the mean corrected for day effects). The error bars ar 95% CIs. Hard to see a trend I’d be confident in here, other than maybe a boost in abundance at the earliest successional stage. I think the model is overfit, so the parameter estimates themselves are difficult to have much confidence in (the model, in the end, includes 7 parameters for ~65 observations, that’s not great… I trust the graph more.)

Migratory status

Instead of raw abundance, let’s look at the proportion of migrants versus residents. Let’s just make the simplest version of the dataset and plot.

#break down mean counts per mig guild
mig_counts <- dat_join %>%
  group_by(year_planted, transect, mig_code) %>%
  count(date)  %>%
  ungroup() %>% 
  complete(nesting(year_planted, transect, date), mig_code, 
           fill = list("n" = 0)) 

mig_prop <- mig_counts %>%
  group_by(year_planted, mig_code) %>% 
  summarise(m_count = mean(n), se = sd(n)/length(n))

mig_date <- mig_counts %>%
  group_by(year_planted, date, mig_code) %>% 
  summarise(m_count = mean(n), se = sd(n)/length(n))

ggplot(mig_prop) +
  geom_bar(aes(x=year_planted, y = m_count, fill = mig_code), 
           position = "fill", stat = "identity") +
  ylab("Proportion Birds Observed") +
  xlab("Treatment") +
  theme_minimal() +
  scale_fill_discrete(name = "Migratory Status", 
                      labels = c("Migrant", "Resident"))

Interesting that we see a bump in migrants for the 2011 treatment. But the above doesn’t account for date yet, which we probably should do…

First a plot of counts by date separated by mig code and succession treatment:

ggplot(mig_counts) +
  geom_point(aes(x = date, y = n, group = mig_code, 
                color = mig_code)) +
  facet_wrap(~year_planted)
## Warning: Removed 2 rows containing missing values (geom_point).

Now let’s fit some candidate model. First the dot model that includes intercept as a random effect of date; then another model that includes migratory status and treatment as an interactive effect (same random effect of date); and finally one that considers migratory status alone (again with random effect of date).

f2 <- lmer(n ~ 1 + (1|date), data = mig_counts, REML = F)
f3 <- lmer(n ~ year_planted*mig_code  + (1|date), data = mig_counts,
           REML = F)
f4 <- lmer(n ~ mig_code  + (1|date), data = mig_counts, REML = F)



aictab(list("dot" = f2, "succesion X mig" = f3, "mig" = f4))
## 
## Model selection based on AICc:
## 
##                  K   AICc Delta_AICc AICcWt Cum.Wt      LL
## mig              4 711.01       0.00    0.7    0.7 -351.35
## succesion X mig 12 712.69       1.67    0.3    1.0 -343.03
## dot              3 782.29      71.28    0.0    1.0 -388.05

Let’s look at the summary of our best model (spoiler: it’s probably just overfitting):

f3 <- lmer(n ~ year_planted*mig_code  + (1|date), data = mig_counts)
summary(f3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: n ~ year_planted * mig_code + (1 | date)
##    Data: mig_counts
## 
## REML criterion at convergence: 669.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9976 -0.4700 -0.0794  0.3263  5.2766 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  date     (Intercept)  1.878   1.370   
##  Residual             10.247   3.201   
## Number of obs: 132, groups:  date, 17
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                   0.737333   0.957430   0.770
## year_planted2011              2.613145   1.268922   2.059
## year_planted2008             -0.783694   1.290377  -0.607
## year_planted2003             -0.288127   1.244664  -0.231
## year_plantedold               0.007173   1.263448   0.006
## mig_coderes                   6.769231   1.255602   5.391
## year_planted2011:mig_coderes -1.692308   1.775690  -0.953
## year_planted2008:mig_coderes -0.384615   1.775690  -0.217
## year_planted2003:mig_coderes -2.054945   1.743693  -1.179
## year_plantedold:mig_coderes  -0.461538   1.775690  -0.260
## 
## Correlation of Fixed Effects:
##             (Intr) yr_2011 yr_2008 yr_2003 yr_pln mg_cdr y_2011: y_2008:
## yr_plnt2011 -0.663                                                      
## yr_plnt2008 -0.648  0.493                                               
## yr_plnt2003 -0.676  0.513   0.491                                       
## yer_plntdld -0.660  0.498   0.477   0.508                               
## mig_coderes -0.656  0.495   0.487   0.504   0.497                       
## yr_pl2011:_  0.464 -0.700  -0.344  -0.357  -0.351 -0.707                
## yr_pl2008:_  0.464 -0.350  -0.688  -0.357  -0.351 -0.707  0.500         
## yr_pl2003:_  0.472 -0.356  -0.350  -0.700  -0.358 -0.720  0.509   0.509 
## yr_plntdl:_  0.464 -0.350  -0.344  -0.357  -0.703 -0.707  0.500   0.500 
##             y_2003:
## yr_plnt2011        
## yr_plnt2008        
## yr_plnt2003        
## yer_plntdld        
## mig_coderes        
## yr_pl2011:_        
## yr_pl2008:_        
## yr_pl2003:_        
## yr_plntdl:_  0.509

Note too that the only well-estimated parameters are those having to do with 2011.

Let’s also look at mig only:

f4 <- lmer(n ~ mig_code  + (1|date), data = mig_counts, REML = F)
summary(f4)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: n ~ mig_code + (1 | date)
##    Data: mig_counts
## 
##      AIC      BIC   logLik deviance df.resid 
##    710.7    722.2   -351.3    702.7      128 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0699 -0.5113 -0.0855  0.3775  5.1797 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  date     (Intercept)  1.503   1.226   
##  Residual             10.960   3.311   
## Number of obs: 132, groups:  date, 17
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   1.0503     0.5099    2.06
## mig_coderes   5.8333     0.5763   10.12
## 
## Correlation of Fixed Effects:
##             (Intr)
## mig_coderes -0.565

This model actually looks fine, but all it tells us really is that residents are more common than migrants in general.

Foraging Guild

Similar analysis, but for collapsed feeding guild.

#break down mean counts per mig guild
guild_counts <- dat_join %>%
  group_by(year_planted, transect, guild_code) %>%
  count(date)  %>%
  ungroup() %>% 
  complete(nesting(year_planted, transect, date), guild_code, 
           fill = list("n" = 0)) 

guild_prop <- guild_counts %>%
  group_by(year_planted, guild_code) %>% 
  summarise(m_count = mean(n), se = sd(n)/length(n))

guild_date <- guild_counts %>%
  group_by(year_planted, date, guild_code) %>% 
  summarise(m_count = mean(n), se = sd(n)/length(n))

ggplot(guild_prop) +
  geom_bar(aes(x=year_planted, y = m_count, fill = guild_code), 
           position = "fill", stat = "identity") +
  ylab("Proportion Birds Observed") +
  xlab("Treatment") +
  theme_minimal() +
  scale_fill_discrete(name = "Feeding Guild",
                      labels = c("Omnivore", "Specialist"))

Interesting trend here, but probably not explanatory of the bump for 2011.

Vegetation

Again, given sample sizes I don’t think it makes a ton of sense to model the effect of veg parameters directly (though I’m certainly open to conversation about that…). But some visualizations to help with the “storytelling” might be useful - let’s see what’s in the veg data…

trees %>% 
  group_by(treatment) %>% 
  summarise(m = mean(ht), se = sd(ht)/length(ht), v = var(ht))
## # A tibble: 5 x 4
##   treatment     m      se       v
##   <fct>     <dbl>   <dbl>   <dbl>
## 1 0          691  162.    104882 
## 2 2011       268.   2.65    5125.
## 3 2008       355.   3.98   14250.
## 4 2003       376.   0.840  15861.
## 5 old        409.   1.74   38164.

Let’s look at the distribution of tree heights by treatement.

ggplot() +
  geom_violin(data = trees, aes(x=treatment, y = ht, fill = treatment)) +
  geom_point(data = trees[trees$treatment == 0,], aes(x = treatment, y = ht,
                                                      color = treatment)) +
  theme_minimal() +
  ylab("Tree Ht. (cm)") +
  xlab("Treatment") + 
  theme(legend.position = "none")

That’s pretty interesting, again, the only real standout is the 2011 treatment - 2003 and 2008 look more similar to old than 2011 - just missing the far end of the tail.

We can do the same thig with DBH data too :

ggplot() +
  geom_violin(data = trees, aes(x=treatment, y = dbh, fill = treatment)) +
  geom_point(data = trees[trees$treatment == 0,], aes(x = treatment, y = dbh,
                                                      color = treatment)) +
  theme_minimal() +
  ylab("Tree DBH (cm)") +
  xlab("Treatment") + 
  theme(legend.position = "none")
## Warning: Removed 1 rows containing non-finite values (stat_ydensity).

Pretty cool looking graph but I think 2011 doesn’t jump out to me the way it does for tree height.

Pulling the densiometer data:

ggplot() +
  geom_point(data = dens, aes(x = treatment, y = dens_mean, 
                              color = treatment)) +
  theme_minimal() +
  ylab("Canopy Openness") +
  xlab("Treatment") + 
  theme(legend.position = "none")

Diversity

Create presence/absence dataset filling in implied “0”s.

occur <- dat_join
occur$pres <- 1

ocurr_sites <- occur %>%
  complete(species, nesting(transect, year_planted), fill = list("pres" = 0)) 

ocurr_visits <- occur %>%
  complete(species, nesting(transect, date, year_planted), 
           fill = list("pres" = 0)) 

First let’s look at raw species richness by site:

rich <- ocurr_sites %>% 
  group_by(transect, year_planted) %>%
  distinct(species, transect, year_planted, pres) %>% 
  tally(wt = pres)

rich %>% 
  group_by(year_planted) %>% 
  summarise(mean(n))
## # A tibble: 5 x 2
##   year_planted `mean(n)`
##   <fct>            <dbl>
## 1 0                 20.5
## 2 2011              26.5
## 3 2008              19.5
## 4 2003              16  
## 5 old               16
ggplot(rich) +
  geom_point(aes(x = year_planted, y = n))

Preliminary Conclusions

This is a work in progress, but here’s what I’m thinking for now. There’s a bump in both abundance and proportion of migrants in the 2011 treatment. We’ll need to appropriately caveat because, with the small sample size, one or a few days could be driving this (though I think the panel plot helps make the case it’s a real trend). That said, there could be a story here. We wouldn’t necessarily expect the most migrants in pasture because that’s actually a totally different habitat type. So you would expect the most migrants in the earliest successional stages, which is exactly what we see. What’s surprising, is that the trend appears to be nonlinear - it really only holds for the earliest successional habitats and after that, there is no discernable difference when compared to uncut forest.

The tree height plot helps us disentangle that a bit more - the distribution of tree heights is really different in the 2011 plot compared to the other successional stages or the mature forest.

Here’s another way of looking at it - we fit a separate loess smoother over the counts by date grouped by treatment and mig code. It makes the influx of migrants late winter into early spring more obvious than when looking at the raw data. This smoother doesn’t show what our model did exactly (we don’t have enough data to model this exactly, but it at least makes it visible to us).

ggplot(mig_counts, aes(x = date, y = n, group = mig_code, 
                color = mig_code)) +
  # geom_line(data = cbind(mig_counts[!is.na(mig_counts$date),], 
  #                        pred = predict(f3)), aes(y = pred), 
            # size = 1) +  # adding predicted line from mixed model 
  geom_smooth(method = "loess") +
  ylim(c(0,20)) +
  facet_wrap(~year_planted)
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 29 rows containing missing values (geom_smooth).

Notes