Data science with {hyenaR}:
LESSON 6

Prepare our workspace


STEP 1: Load required packages

library(hyenaR) ## For our hyena specific functions
library(dplyr) ## For most data wrangling
library(ggplot2) ## For plotting
library(lubridate) ## Working with dates
library(tidyr) ## Extra data wrangling functions


STEP 2: Load the database

load_package_database.full(
  
  # Location of our database file
  db.path = "example_git/source_data/Fisidata_2022_08_10.sqlite"
  
)

Recap: What did we cover last week?

Wide vs. long formats

WIDE FORMAT

# A tibble: 1 × 10
  date       to             A     E     F     L     M     N     S     T
  <date>     <date>     <int> <int> <int> <int> <int> <int> <int> <int>
1 1997-07-01 1997-12-31    49    18    14    48    52    20     4    12

LONG FORMAT

# A tibble: 8 × 4
  clan  date       to         clan_size
  <chr> <date>     <date>         <int>
1 A     1997-07-01 1997-12-31        49
2 E     1997-07-01 1997-12-31        18
3 F     1997-07-01 1997-12-31        14
4 L     1997-07-01 1997-12-31        48
5 M     1997-07-01 1997-12-31        52
6 N     1997-07-01 1997-12-31        20
7 S     1997-07-01 1997-12-31         4
8 T     1997-07-01 1997-12-31        12

Pivoting

Last observation v. death date

# Last observation is the last record in the sightings table
fetch_id_date.observation.last(ID = "A-001")
[1] "2007-06-07"


# Death date is (usually) the day AFTER last sighting
fetch_id_date.death(ID = "A-001")
[1] "2007-06-08"

New hyenaR functions

# Is individual > 2yo?
fetch_id_is.adult(ID = "A-011", at = "1997-01-01")
[1] TRUE


# Age at death
fetch_id_duration.lifespan(ID = "A-011")
[1] 12.00821

Today’s goals


GOAL 1: HOMEWORK


GOAL 2: LOOK BACK ON WHAT WE’VE COVERED


GOAL 3: PLAN NEXT STEPS

HOMEWORK (TASK 1):
Filter sightings of all three jackal species from the carnivores table (bj, gj, sj).

Our attempts

OLIVER & ARJUN & EVE

Code
extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>% 
  select(date_time, species, groupsize)
# A tibble: 5,320 × 3
   date_time           species groupsize
   <dttm>              <chr>       <int>
 1 1996-03-28 07:20:00 bj             10
 2 1996-03-28 08:46:00 gj              2
 3 1996-03-28 08:46:00 gj              2
 4 1996-04-12 09:30:00 gj              1
 5 1996-04-12 10:03:00 bj              2
 6 1996-04-12 10:03:00 bj              2
 7 1996-04-12 10:17:00 gj              2
 8 1996-04-12 10:17:00 gj              2
 9 1996-04-13 09:07:00 bj              3
10 1996-04-13 09:07:00 bj              3
# … with 5,310 more rows
# ℹ Use `print(n = ...)` to see more rows

HOMEWORK (TASK 2):
Determine total sightings of each species in each year.

Our attempts

OLIVER

Code
extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  summarise(n = length(species)) %>%
  ungroup()
# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   384
 2 bj       1997   402
 3 bj       1998   449
 4 bj       1999   159
 5 bj       2000   104
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   298
 9 bj       2004   215
10 bj       2005   203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

ARJUN

Code
extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(species, year) %>% 
  summarize(n = sum(species==species)) %>%
  ungroup()
# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   384
 2 bj       1997   402
 3 bj       1998   449
 4 bj       1999   159
 5 bj       2000   104
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   298
 9 bj       2004   215
10 bj       2005   203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

EVE

Code
extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(year, species) %>%
  summarize(n = sum(groupsize)) %>% 
  ungroup()
# A tibble: 50 × 3
    year species     n
   <dbl> <chr>   <int>
 1  1996 bj        683
 2  1996 gj        419
 3  1997 bj        828
 4  1997 gj        322
 5  1997 sj         12
 6  1998 bj         NA
 7  1998 gj         NA
 8  1998 sj         24
 9  1999 bj        320
10  1999 gj        115
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

OLIVER

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  ## Use lubridate to extract year
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  ## Count length of the species column in each species/year combo
  summarise(n = length(species)) %>%
  ungroup()
# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   384
 2 bj       1997   402
 3 bj       1998   449
 4 bj       1999   159
 5 bj       2000   104
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   298
 9 bj       2004   215
10 bj       2005   203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

ARJUN

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  ## Use lubridate to extract year
  mutate(year = lubridate::year(date_time)) %>%
  group_by(species, year) %>% 
  ## Count the number of times that species matches species of the species/year combo
  summarize(nsight = sum(species==species)) %>%
  ungroup()
# A tibble: 50 × 3
   species  year nsight
   <chr>   <dbl>  <int>
 1 bj       1996    384
 2 bj       1997    402
 3 bj       1998    449
 4 bj       1999    159
 5 bj       2000    104
 6 bj       2001    117
 7 bj       2002    151
 8 bj       2003    298
 9 bj       2004    215
10 bj       2005    203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

EVE

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  ## Use lubridate to extract year
  mutate(year = lubridate::year(date_time)) %>%
  group_by(year, species) %>%
  ## Sum the groupsize columns within each year/species combo
  summarize(sightings = sum(groupsize)) %>% 
  ungroup()
# A tibble: 50 × 3
    year species sightings
   <dbl> <chr>       <int>
 1  1996 bj            683
 2  1996 gj            419
 3  1997 bj            828
 4  1997 gj            322
 5  1997 sj             12
 6  1998 bj             NA
 7  1998 gj             NA
 8  1998 sj             24
 9  1999 bj            320
10  1999 gj            115
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

The na.rm argument

numbers <- c(1, 2, 3, NA, 5)

sum(numbers)
[1] NA


sum(numbers, na.rm = TRUE)
[1] 11

Our attempts

LIAM (count total group size)

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  # Deal with cases where there are repeated measurements
  # when there are duplicates (species/date_time) only use the first record
  group_by(date_time, species) %>% 
  slice(1) %>% 
  ungroup() %>% 
  ## Use lubridate to extract year 
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  ## Sum the groupsize columns within each year/species combo
  ## BUT IGNORE NAs
  summarise(n = sum(groupsize, na.rm = TRUE),
            #Alternative way to ungroup after summarise
            .groups = "drop")
# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   587
 2 bj       1997   826
 3 bj       1998   928
 4 bj       1999   319
 5 bj       2000   189
 6 bj       2001   228
 7 bj       2002   299
 8 bj       2003   577
 9 bj       2004   459
10 bj       2005   462
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

LIAM (count total records)

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  group_by(date_time, species) %>% 
  slice(1) %>% 
  ungroup() %>% 
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  ## Count number of rows in each group
  summarise(n = n(),
            #Alternative way to ungroup after summarise
            .groups = "drop")
# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   353
 2 bj       1997   401
 3 bj       1998   447
 4 bj       1999   158
 5 bj       2000   103
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   292
 9 bj       2004   215
10 bj       2005   201
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

HOMEWORK (TASK 3):
Convert table to wide format so that we have one row per year and one column per species

Our attempts

OLIVER

Code
extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  summarise(n = length(species)) %>%
  ungroup() %>% 
  pivot_wider(names_from = species,
              values_from = n) %>% 
  slice(1)
# A tibble: 1 × 4
   year    bj    gj    sj
  <dbl> <int> <int> <int>
1  1996   384   240    NA

ARJUN

Code
extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(species, year) %>% 
  summarize(nsight = sum(species==species)) %>%
  ungroup() %>% 
  tidyr::pivot_wider(names_from = species,
                     values_from = nsight) %>% 
  slice(1)
# A tibble: 1 × 4
   year    bj    gj    sj
  <dbl> <int> <int> <int>
1  1996   384   240    NA

EVE

Code
extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(year, species) %>%
  summarize(sightings = sum(groupsize)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = species,
              values_from = sightings) %>% 
  slice(1)
# A tibble: 1 × 4
   year    bj    gj    sj
  <dbl> <int> <int> <int>
1  1996   683   419    NA

RECAP

LESSON 1

  • Structure of hyenaR (GitHub, SQL Database)
  • Build and load the database
  • Types of hyenaR functions (create, fetch, find)
  • mutate()/group_by()/summarise()

LESSON 2

  • create_id_starting.table()
  • Filter rows with logical values (TRUE/FALSE; filter())
  • Filter by row number (slice())
  • Order dataframe (arrange())
  • Extract reproductive success (fetch_id_number.offspring())
  • Identify censored individuals (fetch_id_is.censored())

LESSON 3

  • Extract rank data (e.g. fetch_id_rank.sex.std())
  • Filter columns (select())
  • Generate multiple rows for the each ID, clan etc. (reshape_row_date.seq())

LESSON 4

  • lubridate package for working with dates
  • tidyr package to pivot wider/longer
  • Avoid grouped data (ungroup())

LESSON 5

  • Age of individuals (fetch_id_duration.lifespan() & fetch_id_age())
  • Last observation v. death date

NEXT STEPS

FUTURE TOPICS

  • Plotting (ggplot2)
  • ‘Advanced’ filtering (string matching, any/all functions)
  • Dataframe joins (combined records from two dataframes)
  • ‘Advanced’ dplyr methods (list columns, efficiently repeat tasks over multiple columns/rows)
  • Loops