Data science with `{hyenaR}`:
LESSON 6

Prepare our workspace

STEP 1: Load required packages

library(hyenaR) ## For our hyena specific functions
library(dplyr) ## For most data wrangling
library(ggplot2) ## For plotting
library(lubridate) ## Working with dates
library(tidyr) ## Extra data wrangling functions

STEP 2: Load the database

load_package_database.full(
  
  # Location of our database file
  db.path = "example_git/source_data/Fisidata_2022_08_10.sqlite"
  
)

Recap: What did we cover last week?

Wide vs. long formats

WIDE FORMAT

# A tibble: 1 × 10
  date       to             A     E     F     L     M     N     S     T
  <date>     <date>     <int> <int> <int> <int> <int> <int> <int> <int>
1 1997-07-01 1997-12-31    49    18    14    48    52    20     4    12

LONG FORMAT

# A tibble: 8 × 4
  clan  date       to         clan_size
  <chr> <date>     <date>         <int>
1 A     1997-07-01 1997-12-31        49
2 E     1997-07-01 1997-12-31        18
3 F     1997-07-01 1997-12-31        14
4 L     1997-07-01 1997-12-31        48
5 M     1997-07-01 1997-12-31        52
6 N     1997-07-01 1997-12-31        20
7 S     1997-07-01 1997-12-31         4
8 T     1997-07-01 1997-12-31        12

Pivoting

Last observation v. death date

# Last observation is the last record in the sightings table
fetch_id_date.observation.last(ID = "A-001")

[1] "2007-06-07"

# Death date is (usually) the day AFTER last sighting
fetch_id_date.death(ID = "A-001")

[1] "2007-06-08"

New `hyenaR` functions

# Is individual > 2yo?
fetch_id_is.adult(ID = "A-011", at = "1997-01-01")

[1] TRUE

# Age at death
fetch_id_duration.lifespan(ID = "A-011")

[1] 12.00821

Today’s goals

GOAL 1: HOMEWORK

GOAL 2: LOOK BACK ON WHAT WE’VE COVERED

GOAL 3: PLAN NEXT STEPS

HOMEWORK (TASK 1):
Filter sightings of all three jackal species from the carnivores table (bj, gj, sj).

Our attempts

OLIVER & ARJUN & EVE

Code

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>% 
  select(date_time, species, groupsize)

# A tibble: 5,320 × 3
   date_time           species groupsize
   <dttm>              <chr>       <int>
 1 1996-03-28 07:20:00 bj             10
 2 1996-03-28 08:46:00 gj              2
 3 1996-03-28 08:46:00 gj              2
 4 1996-04-12 09:30:00 gj              1
 5 1996-04-12 10:03:00 bj              2
 6 1996-04-12 10:03:00 bj              2
 7 1996-04-12 10:17:00 gj              2
 8 1996-04-12 10:17:00 gj              2
 9 1996-04-13 09:07:00 bj              3
10 1996-04-13 09:07:00 bj              3
# … with 5,310 more rows
# ℹ Use `print(n = ...)` to see more rows

HOMEWORK (TASK 2):
Determine total sightings of each species in each year.

Our attempts

OLIVER

Code

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  summarise(n = length(species)) %>%
  ungroup()

# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   384
 2 bj       1997   402
 3 bj       1998   449
 4 bj       1999   159
 5 bj       2000   104
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   298
 9 bj       2004   215
10 bj       2005   203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

ARJUN

Code

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(species, year) %>% 
  summarize(n = sum(species==species)) %>%
  ungroup()

# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   384
 2 bj       1997   402
 3 bj       1998   449
 4 bj       1999   159
 5 bj       2000   104
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   298
 9 bj       2004   215
10 bj       2005   203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

EVE

Code

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(year, species) %>%
  summarize(n = sum(groupsize)) %>% 
  ungroup()

# A tibble: 50 × 3
    year species     n
   <dbl> <chr>   <int>
 1  1996 bj        683
 2  1996 gj        419
 3  1997 bj        828
 4  1997 gj        322
 5  1997 sj         12
 6  1998 bj         NA
 7  1998 gj         NA
 8  1998 sj         24
 9  1999 bj        320
10  1999 gj        115
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

OLIVER

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  ## Use lubridate to extract year
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  ## Count length of the species column in each species/year combo
  summarise(n = length(species)) %>%
  ungroup()

# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   384
 2 bj       1997   402
 3 bj       1998   449
 4 bj       1999   159
 5 bj       2000   104
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   298
 9 bj       2004   215
10 bj       2005   203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

ARJUN

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  ## Use lubridate to extract year
  mutate(year = lubridate::year(date_time)) %>%
  group_by(species, year) %>% 
  ## Count the number of times that species matches species of the species/year combo
  summarize(nsight = sum(species==species)) %>%
  ungroup()

# A tibble: 50 × 3
   species  year nsight
   <chr>   <dbl>  <int>
 1 bj       1996    384
 2 bj       1997    402
 3 bj       1998    449
 4 bj       1999    159
 5 bj       2000    104
 6 bj       2001    117
 7 bj       2002    151
 8 bj       2003    298
 9 bj       2004    215
10 bj       2005    203
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

EVE

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  ## Use lubridate to extract year
  mutate(year = lubridate::year(date_time)) %>%
  group_by(year, species) %>%
  ## Sum the groupsize columns within each year/species combo
  summarize(sightings = sum(groupsize)) %>% 
  ungroup()

# A tibble: 50 × 3
    year species sightings
   <dbl> <chr>       <int>
 1  1996 bj            683
 2  1996 gj            419
 3  1997 bj            828
 4  1997 gj            322
 5  1997 sj             12
 6  1998 bj             NA
 7  1998 gj             NA
 8  1998 sj             24
 9  1999 bj            320
10  1999 gj            115
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

The `na.rm` argument

numbers <- c(1, 2, 3, NA, 5)

sum(numbers)

[1] NA

sum(numbers, na.rm = TRUE)

[1] 11

Our attempts

LIAM (count total group size)

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  # Deal with cases where there are repeated measurements
  # when there are duplicates (species/date_time) only use the first record
  group_by(date_time, species) %>% 
  slice(1) %>% 
  ungroup() %>% 
  ## Use lubridate to extract year 
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  ## Sum the groupsize columns within each year/species combo
  ## BUT IGNORE NAs
  summarise(n = sum(groupsize, na.rm = TRUE),
            #Alternative way to ungroup after summarise
            .groups = "drop")

# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   587
 2 bj       1997   826
 3 bj       1998   928
 4 bj       1999   319
 5 bj       2000   189
 6 bj       2001   228
 7 bj       2002   299
 8 bj       2003   577
 9 bj       2004   459
10 bj       2005   462
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

Our attempts

LIAM (count total records)

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  group_by(date_time, species) %>% 
  slice(1) %>% 
  ungroup() %>% 
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  ## Count number of rows in each group
  summarise(n = n(),
            #Alternative way to ungroup after summarise
            .groups = "drop")

# A tibble: 50 × 3
   species  year     n
   <chr>   <dbl> <int>
 1 bj       1996   353
 2 bj       1997   401
 3 bj       1998   447
 4 bj       1999   158
 5 bj       2000   103
 6 bj       2001   117
 7 bj       2002   151
 8 bj       2003   292
 9 bj       2004   215
10 bj       2005   201
# … with 40 more rows
# ℹ Use `print(n = ...)` to see more rows

HOMEWORK (TASK 3):
Convert table to wide format so that we have one row per year and one column per species

Our attempts

OLIVER

Code

extract_database_table(tbl.names = "carnivores") %>%
  filter(species %in% c("bj","gj","sj")) %>%
  mutate(year = year(date_time)) %>%
  group_by(species, year) %>%
  summarise(n = length(species)) %>%
  ungroup() %>% 
  pivot_wider(names_from = species,
              values_from = n) %>% 
  slice(1)

# A tibble: 1 × 4
   year    bj    gj    sj
  <dbl> <int> <int> <int>
1  1996   384   240    NA

ARJUN

Code

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(species, year) %>% 
  summarize(nsight = sum(species==species)) %>%
  ungroup() %>% 
  tidyr::pivot_wider(names_from = species,
                     values_from = nsight) %>% 
  slice(1)

# A tibble: 1 × 4
   year    bj    gj    sj
  <dbl> <int> <int> <int>
1  1996   384   240    NA

EVE

Code

extract_database_table("carnivores") %>% 
  filter(species %in% c("bj", "gj", "sj")) %>% 
  mutate(year = lubridate::year(date_time)) %>%
  group_by(year, species) %>%
  summarize(sightings = sum(groupsize)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = species,
              values_from = sightings) %>% 
  slice(1)

# A tibble: 1 × 4
   year    bj    gj    sj
  <dbl> <int> <int> <int>
1  1996   683   419    NA

– RECAP –

LESSON 1

Structure of hyenaR (GitHub, SQL Database)
Build and load the database
Types of hyenaR functions (create, fetch, find)
mutate()/group_by()/summarise()

LESSON 2

create_id_starting.table()
Filter rows with logical values (TRUE/FALSE; filter())
Filter by row number (slice())
Order dataframe (arrange())
Extract reproductive success (fetch_id_number.offspring())
Identify censored individuals (fetch_id_is.censored())

LESSON 3

Extract rank data (e.g. fetch_id_rank.sex.std())
Filter columns (select())
Generate multiple rows for the each ID, clan etc. (reshape_row_date.seq())

LESSON 4

lubridate package for working with dates
tidyr package to pivot wider/longer
Avoid grouped data (ungroup())

LESSON 5

Age of individuals (fetch_id_duration.lifespan() & fetch_id_age())
Last observation v. death date

– NEXT STEPS –

FUTURE TOPICS

Plotting (ggplot2)
‘Advanced’ filtering (string matching, any/all functions)
Dataframe joins (combined records from two dataframes)
‘Advanced’ dplyr methods (list columns, efficiently repeat tasks over multiple columns/rows)
Loops

Data science with {hyenaR}:LESSON 6

Prepare our workspace

STEP 1: Load required packages

STEP 2: Load the database

Recap: What did we cover last week?

Wide vs. long formats

WIDE FORMAT

LONG FORMAT

Pivoting

Last observation v. death date

New hyenaR functions

Today’s goals

GOAL 1: HOMEWORK

GOAL 2: LOOK BACK ON WHAT WE’VE COVERED

GOAL 3: PLAN NEXT STEPS

HOMEWORK (TASK 1):Filter sightings of all three jackal species from the carnivores table (bj, gj, sj).

Our attempts

OLIVER & ARJUN & EVE

HOMEWORK (TASK 2):Determine total sightings of each species in each year.

Our attempts

OLIVER

ARJUN

EVE

Our attempts

OLIVER

Our attempts

ARJUN

Our attempts

EVE

The na.rm argument

Our attempts

LIAM (count total group size)

Our attempts

LIAM (count total records)

HOMEWORK (TASK 3):Convert table to wide format so that we have one row per year and one column per species

Our attempts

OLIVER

ARJUN

EVE

– RECAP –

LESSON 1

LESSON 2

LESSON 3

LESSON 4

LESSON 5

– NEXT STEPS –

FUTURE TOPICS

Data science with `{hyenaR}`:
LESSON 6

New `hyenaR` functions

HOMEWORK (TASK 1):
Filter sightings of all three jackal species from the carnivores table (bj, gj, sj).

HOMEWORK (TASK 2):
Determine total sightings of each species in each year.

The `na.rm` argument

HOMEWORK (TASK 3):
Convert table to wide format so that we have one row per year and one column per species