Data science with {hyenaR}:
LESSON 8

🎉hyenaR v0.9.99994🎉

Use {drat} to access the new version of {hyenaR}.

## To download packages from other sources
library(drat)

## Include 'hyenaproject' as a package source
addRepo("hyenaproject") 

## Download hyenaR
install.packages("hyenaR")


#Check you have the right version (0.9.99994)
packageVersion("hyenaR")
[1] '0.9.99994'
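
In a script, the same check can be automated. The sketch below uses base R only; `installed` is a stand-in for the value returned by packageVersion("hyenaR"), so it runs without the package. Note that version components are compared as integers, not as decimals.

```r
# Version needed for this lesson
required <- package_version("0.9.99994")

# Stand-in for packageVersion("hyenaR") so the sketch runs without hyenaR
installed <- package_version("0.9.99994")

# TRUE: the installed version is recent enough
is_recent <- installed >= required

# Components are compared as integers: 9999 < 99994, so this version is too old
old_is_enough <- package_version("0.9.9999") >= required

# In a real script we could fail early with:
# stopifnot(packageVersion("hyenaR") >= "0.9.99994")
```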

Prepare our workspace


STEP 1: Load required packages

library(hyenaR) ## For our hyena specific functions
library(dplyr) ## For most data wrangling
library(ggplot2) ## For plotting
library(lubridate) ## Working with dates
library(tidyr) ## Extra data wrangling functions


STEP 2: Load the database

load_package_database.full(
  
  # Location of our database file
  db.path = "example_git/source_data/Fisidata_2022_08_10.sqlite"
  
)

Today’s goals

GOAL 1: Introduce the new features of v0.9.99994

Check the NEWS to find out more

build_vignette_news()

Relatedness functions

👪 New and old relatedness functions

#This is now *correct* relatedness using `kinship2`
fetch_dyad_relatedness(ID.1 = "T-212", ID.2 = "T-177")
[1] 0.2519531


#This is the previous (incorrect) relatedness used in social support
fetch_dyad_relatedness.via.filiation(ID.1 = "T-212", ID.2 = "T-177", filiation = "father")
[1] 0.25
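
Why can the two values differ? Half-siblings that share only a father have a relatedness of exactly 0.25, but a pedigree-based measure adds up every path of shared ancestry, so extra shared ancestors push the value slightly higher. That may explain the small difference above (about 1/512). The sketch below is a toy illustration in base R, not the {hyenaR} or {kinship2} implementation; the pedigree, the IDs, and toy_kinship() are hypothetical.

```r
# Toy pedigree: two cubs share a father but have different mothers.
# Rows are ordered so that parents always appear before their offspring.
ped <- data.frame(
  id  = c("sire", "dam1", "dam2", "cub1", "cub2"),
  dad = c(NA, NA, NA, "sire", "sire"),
  mom = c(NA, NA, NA, "dam1", "dam2"),
  stringsAsFactors = FALSE
)

# Recursive kinship coefficients (hypothetical helper, for illustration only)
toy_kinship <- function(ped) {
  n <- nrow(ped)
  # The extra last row/column of zeros stands in for unknown parents
  K <- matrix(0, n + 1, n + 1)
  dad <- match(ped$dad, ped$id, nomatch = n + 1)
  mom <- match(ped$mom, ped$id, nomatch = n + 1)
  for (i in seq_len(n)) {
    # Kinship with each earlier individual: average over i's two parents
    for (j in seq_len(i - 1)) {
      K[i, j] <- K[j, i] <- 0.5 * (K[dad[i], j] + K[mom[i], j])
    }
    # Kinship with self: 0.5 plus half the kinship between the parents
    K[i, i] <- 0.5 * (1 + K[dad[i], mom[i]])
  }
  K <- K[seq_len(n), seq_len(n)]
  dimnames(K) <- list(ped$id, ped$id)
  K
}

# Relatedness is twice the kinship coefficient
relatedness <- 2 * toy_kinship(ped)
relatedness["cub1", "cub2"]  # 0.25 for paternal half-siblings
```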

#Find all individuals present in Triangle clan during the year 2000
Triangle_IDs <- find_clan_id(clan = "T", from = "2000-01-01", to = "2001-01-01")

#Estimate old and new relatedness for every pair of individuals
df <- expand.grid(ID.1 = Triangle_IDs,
                  ID.2 = Triangle_IDs) %>% 
  mutate(old_relatedness = fetch_dyad_relatedness.via.filiation(ID.1 = ID.1,
                                                                ID.2 = ID.2,
                                                                filiation = "mother_genetic", verbose = FALSE),
         new_relatedness = fetch_dyad_relatedness(ID.1 = ID.1,
                                                  ID.2 = ID.2)) %>% 
  #When there's no relatedness on mother genetic assume it's 0
  mutate(old_relatedness = tidyr::replace_na(old_relatedness, 0))

df %>% 
  mutate(diff = abs(old_relatedness - new_relatedness)) %>%
  {ggplot(.) +
      geom_histogram(aes(x = diff), binwidth = 0.01) +
      labs(title = "Functions will have given similar results (but not always)",
           subtitle = "Triangle clan members during the year 2000") +
      labs(x = "Difference between old and\nnew relatedness measures") +
      theme_classic() +
      theme(plot.title = element_text(face = "bold"))}
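
expand.grid() returns every ordered pair, so each dyad appears twice and every individual is also paired with itself. If we only want each dyad once, one option is to keep the rows where ID.1 sorts before ID.2. A minimal sketch in base R, with three made-up IDs:

```r
# Hypothetical IDs, for illustration only
ids <- c("T-001", "T-002", "T-003")

# All ordered pairs, kept as character rather than factor
all_pairs <- expand.grid(ID.1 = ids, ID.2 = ids, stringsAsFactors = FALSE)

# Keep each dyad once and drop self-pairs
unique_dyads <- all_pairs[all_pairs$ID.1 < all_pairs$ID.2, ]

nrow(all_pairs)     # 9 ordered pairs (including self-pairs)
nrow(unique_dyads)  # 3 unique dyads
```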

create_id_starting.table()

🐛 Bug fix

# OLD BEHAVIOUR: Dispersers are skipped incorrectly
create_id_starting.table.historic(lifestage = "!dead",
                                  from = "1998-01-01",
                                  to = "1998-12-31",
                                  lifestage.overlap = "always", verbose = FALSE)
# A tibble: 111 × 1
   ID   
   <chr>
 1 A-001
 2 A-003
 3 A-006
 4 A-009
 5 A-010
 6 A-011
 7 A-013
 8 A-015
 9 A-016
10 A-020
# … with 101 more rows

# NEW BEHAVIOUR: Dispersers are included!
create_id_starting.table(lifestage = "!dead",
                         from = "1998-01-01",
                         to = "1998-12-31",
                         lifestage.overlap = "always", verbose = FALSE)
# A tibble: 182 × 1
   ID   
   <chr>
 1 A-001
 2 A-003
 3 A-006
 4 A-008
 5 A-009
 6 A-010
 7 A-011
 8 A-013
 9 A-015
10 A-016
# … with 172 more rows
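
To see which individuals the fix adds, we can compare the two results with setdiff(). The sketch below uses only the first 10 IDs printed by each function, so it runs without the database; with the full tables we would compare the ID columns of both outputs in the same way.

```r
# First 10 IDs printed by the old (historic) function
old_ids <- c("A-001", "A-003", "A-006", "A-009", "A-010",
             "A-011", "A-013", "A-015", "A-016", "A-020")

# First 10 IDs printed by the new function
new_ids <- c("A-001", "A-003", "A-006", "A-008", "A-009",
             "A-010", "A-011", "A-013", "A-015", "A-016")

# IDs returned by the new function but missing from the old output
added <- setdiff(new_ids, old_ids)
added  # "A-008"
```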

# NEW BEHAVIOUR: 'alive' lifestage is possible
create_id_starting.table(lifestage = "alive",
                         from = "1998-01-01",
                         to = "1998-12-31",
                         lifestage.overlap = "always", verbose = FALSE)
# A tibble: 182 × 1
   ID   
   <chr>
 1 A-001
 2 A-003
 3 A-006
 4 A-008
 5 A-009
 6 A-010
 7 A-011
 8 A-013
 9 A-015
10 A-016
# … with 172 more rows

✨ New lifestages

🛑 Lifestage immigrant no longer exists 🛑

  • founder_male: sexually active adult male present in the crater at the start of the study (i.e. not cub, subadult, or natal).

  • foreigner_X: individual whose birth clan is unknown (clan X) or who was born outside the main clans (rim clans, e.g. U, C) and that has made X observed selections. The number of selection events that occurred before the first observed selection is unknown, so X is a minimum number of selections.

✨ New meta-lifestages

  • sexually_active: Individuals that are sexually active (= “philopatric” + “disperser” + “selector_X” + “foreigner_X” + “founder_male”).

  • selector: Individuals born in the main clans that are sexually active (= “philopatric” + “disperser” + “selector_X”).

  • foreigner: Individuals born outside the main clans that are sexually active (= “foreigner_X”).

  • native: Individuals that have not left their birth clan (= “cub” + “subadult” + “natal” + “philopatric”).

WARNING

The selector meta-lifestage now excludes foreigners. Use sexually_active if you want all individuals that are sexually active.
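
The definitions above can be written as a simple lookup. This is only a sketch of how the meta-lifestages relate to each other, not how {hyenaR} stores them; the list name meta_lifestages is hypothetical.

```r
# Meta-lifestages and the lifestages they combine (from the definitions above)
meta_lifestages <- list(
  selector  = c("philopatric", "disperser", "selector_X"),
  foreigner = c("foreigner_X"),
  native    = c("cub", "subadult", "natal", "philopatric")
)

# sexually_active = selector + foreigner + founder_male
meta_lifestages$sexually_active <- c(meta_lifestages$selector,
                                     meta_lifestages$foreigner,
                                     "founder_male")

# Foreigners are not part of selector, but they are part of sexually_active
"foreigner_X" %in% meta_lifestages$selector         # FALSE
"foreigner_X" %in% meta_lifestages$sexually_active  # TRUE
```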

⚠️ Change default behaviour

# OLD BEHAVIOUR: Return all individuals ever BORN in Airstrip
create_id_starting.table.historic(clan = "A")
# A tibble: 537 × 1
   ID   
   <chr>
 1 A-001
 2 A-002
 3 A-003
 4 A-004
 5 A-006
 6 A-007
 7 A-008
 8 A-009
 9 A-010
10 A-013
# … with 527 more rows

# NEW BEHAVIOUR: Return all individuals ever PRESENT in Airstrip
create_id_starting.table(clan = "A")
# A tibble: 597 × 1
   ID   
   <chr>
 1 A-001
 2 A-002
 3 A-003
 4 A-004
 5 A-006
 6 A-007
 7 A-008
 8 A-009
 9 A-010
10 A-011
# … with 587 more rows

# NEW BEHAVIOUR: Return all individuals ever BORN in Airstrip
# using the clan.birth argument
create_id_starting.table(clan.birth = "A")
# A tibble: 537 × 1
   ID   
   <chr>
 1 A-001
 2 A-002
 3 A-003
 4 A-004
 5 A-006
 6 A-007
 7 A-008
 8 A-009
 9 A-010
10 A-013
# … with 527 more rows

The ‘clan.birth’ argument

The clan.birth argument lets us combine a query on current lifestage/clan with a query on birth clan.

# Individuals that were in Airstrip in 2010
# but that were not born there
create_id_starting.table(lifestage = "!native",
                         clan = "A",
                         from = "2010-01-01",
                         to = "2010-12-31")
# A tibble: 22 × 1
   ID   
   <chr>
 1 F-112
 2 F-128
 3 F-135
 4 F-139
 5 L-095
 6 L-128
 7 L-137
 8 L-196
 9 M-051
10 M-203
# … with 12 more rows

# Individuals that were in Airstrip in 2010
# but were born in Forest
create_id_starting.table(lifestage = "!native",
                         clan = "A",
                         from = "2010-01-01",
                         to = "2010-12-31",
                         clan.birth = "F")
# A tibble: 4 × 1
  ID   
  <chr>
1 F-112
2 F-128
3 F-135
4 F-139

WARNING

‘clan.birth’ ignores from/to/at. To find individuals born in a clan during a given period, use lifestage = “cub” with lifestage.overlap = “start”.

# Individuals born in Airstrip in 2010
create_id_starting.table(lifestage = "cub",
                         lifestage.overlap = "start",
                         clan = "A",
                         from = "2010-01-01",
                         to = "2010-12-31")
# A tibble: 25 × 1
   ID   
   <chr>
 1 A-271
 2 A-272
 3 A-273
 4 A-274
 5 A-275
 6 A-276
 7 A-277
 8 A-278
 9 A-279
10 A-280
# … with 15 more rows

🌧️ Weather data ☀️

Extract weather data

create_weather_starting.table(variable = c("temp", "rain"),
                              location = "acacia")
# A tibble: 4,650 × 7
   station_name site_name date_time           latitude longitude air_temp precip
   <chr>        <chr>     <dttm>                 <dbl>     <dbl>    <dbl>  <dbl>
 1 upepo        acacia    2022-05-31 11:30:00    -3.23      35.5     16.4      0
 2 upepo        acacia    2022-05-31 12:00:00    -3.23      35.5     16.4      0
 3 upepo        acacia    2022-05-31 12:30:00    -3.23      35.5     17.4      0
 4 upepo        acacia    2022-05-31 13:00:00    -3.23      35.5     17.3      0
 5 upepo        acacia    2022-05-31 13:30:00    -3.23      35.5     18.1      0
 6 upepo        acacia    2022-05-31 14:00:00    -3.23      35.5     18.7      0
 7 upepo        acacia    2022-05-31 14:30:00    -3.23      35.5     19.8      0
 8 upepo        acacia    2022-05-31 15:00:00    -3.23      35.5     19.6      0
 9 upepo        acacia    2022-05-31 15:30:00    -3.23      35.5     20.3      0
10 upepo        acacia    2022-05-31 16:00:00    -3.23      35.5     20.5      0
# … with 4,640 more rows

raw_data <- create_weather_starting.table(variable = c("temp", "rain"),
                                          location = "acacia")

plot_data <- raw_data %>% 
  group_by(date = lubridate::as_date(date_time)) %>% 
  summarise(across(.cols = c(air_temp, precip), .fns = ~{mean(., na.rm = TRUE)}, .names = "{.col}_mean"),
            across(.cols = c(air_temp, precip), .fns = ~{min(., na.rm = TRUE)}, .names = "{.col}_min"),
            across(.cols = c(air_temp, precip), .fns = ~{max(., na.rm = TRUE)}, .names = "{.col}_max"))

ggplot(data = plot_data) +
  geom_ribbon(aes(x = date, ymin = air_temp_min, ymax = air_temp_max),
              fill = "grey70", alpha = 0.75) +
  geom_line(aes(x = date, y = air_temp_mean)) +
  labs(title = "Acacia air temperature",
       subtitle = "(Apr - Jul 2022)",
       y = "Mean air temperature (°C)") +
  scale_x_date(date_labels = "%B-%Y") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold"),
        axis.line.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        panel.grid.major = element_line(colour = "grey10", size = 0.1))

fetch_weather_temp.mean(from = "2022-05-01", to = "2022-07-01",
                        location = "acacia")
# A tibble: 1 × 1
  acacia_air_temp_mean
                 <dbl>
1                 17.1
fetch_weather_rain.max(from = "2022-05-01", to = "2022-07-01",
                       location = "acacia")
# A tibble: 1 × 1
  acacia_precip_max
              <dbl>
1               1.4
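
Conceptually, these functions summarise one weather variable over a time window. The base R sketch below mimics that idea on a few made-up readings. The values are invented for illustration, and treating both from and to as inclusive is an assumption, not documented behaviour.

```r
# Made-up half-hourly readings (not real station data)
weather <- data.frame(
  date_time = as.POSIXct(c("2022-05-31 11:30:00", "2022-05-31 12:00:00",
                           "2022-06-01 12:00:00", "2022-07-02 12:00:00"),
                         tz = "UTC"),
  air_temp  = c(16.4, 16.4, 18.2, 25.0),
  precip    = c(0, 0, 1.4, 0.2)
)

# Keep readings inside the window (assumed inclusive at both ends)
reading_date <- as.Date(weather$date_time)
in_window <- reading_date >= as.Date("2022-05-01") &
  reading_date <= as.Date("2022-07-01")

# Summaries over the window; the reading from 2022-07-02 is excluded
temp_mean <- mean(weather$air_temp[in_window], na.rm = TRUE)
rain_max  <- max(weather$precip[in_window], na.rm = TRUE)
```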

WARNING

Currently, these functions can only extract data from Acacia and Ngoitokitok.

WARNING

On Windows, you will need to use download_package_csv(download.method = 'curl') to download the weather data.

HOMEWORK: Combine what we’ve learnt!

TASK 1:

Use create_id_starting.table() to find all individuals born in the main clans in the years 1997 - 2021 (the years for which we have full-year observations).


How many individuals have been born in total during this period?

TASK 2:

Extract individual birth clan, birth date, sex, and lifespan.


How many individuals have missing data?


How would you interpret NAs in each column?

TASK 3:

Extract the year of birth for each individual.


In which year were the most cubs born?


Is the most productive year the same for all clans?


BONUS: Use filter() to return a data frame with only the best year(s) for each clan.

TASK 4:

Extract the month of birth for each individual.


What was the most productive month in the crater from 1997-2021?


How many cubs were produced in that month?