Data science with {hyenaR}:
LESSON 17

🎉hyenaR v0.9.999941🎉

Use {drat} to access the new version of {hyenaR}.

## To download package from other sources
library(drat)

## Include 'hyenaproject' as a package source
addRepo("hyenaproject") 

## Download hyenaR
install.packages("hyenaR")


#Check you have the right version (0.9.999941)
packageVersion("hyenaR")
[1] '0.9.999941'

Prepare our workspace


STEP 1: Load required packages

library(hyenaR) ## For our hyena specific functions
library(dplyr) ## For most data wrangling
library(ggplot2) ## For plotting
library(lubridate) ## Working with dates
library(tidyr) ## Extra data wrangling functions
library(stringr) ## Working with text
library(waldo) ## To compare objects
library(skimr) ## Inspect data
library(purrr) ## For loops in the tidyverse


STEP 2: Load the database

load_package_database.full(
  
  # Location of our database file
  db.path = "example_git/source_data/Fisidata_2022_08_10.sqlite"
  
)

Today’s goals


GOAL 1: 🎉 Introduce the new features of v0.9.999941


GOAL 2: 🧑‍🏫 Functions and debugging in R

GOAL 1: Introduce the new features of v0.9.999941

Check the NEWS to find out more

build_vignette_news()

Much faster!

fetch_clan_number.anysex.all(clan = "A", at = "1997-01-01")


v0.9.99994

[1] "Median time: ~1.04s"

v0.9.999941

[1] "Median time: ~387ms"



~3x FASTER!

GOAL 2: 🧑‍🏫 Functions and debugging in R

Why create functions?

  • Reproducible, consistent behaviour.

  • Great for collaboration.

  • The ‘Rule of Three’

Creating a function

# Our function name...
my_function

Creating a function

# Our function name and argument names
my_function <- function(argument1, argument2){}

Creating a function

# Our function name and argument names
my_function <- function(argument1, argument2){
  
  # Code between {}
  argument1 + argument2
  
}
my_function(10, 1)
[1] 11

Using a function

You can specify the value of each argument explicitly

my_function(argument1 = 1, argument2 = 3)
[1] 4

…or use the order or arguments.

my_function(1, 3)
[1] 4

Specifying argument names is often better to prevent bugs!

my_function(argument2 = 3, argument1 = 1)
[1] 4

Using a function

Note

Not all functions need arguments!

day_of_the_week <- function(){
  format(Sys.Date(), "%A")
}
day_of_the_week()
[1] "Thursday"

Default values

Using default values can make functions more flexible but still user friendly.

This function is inflexible

add <- function(number){
  
  number + 1
  
}
add(number = 10)
[1] 11

This function is flexible but more complex

add <- function(number, addition){
  
  number + addition
  
}
add(number = 10, addition = 1)
[1] 11

Default values

Using default values can make functions easier to use, but still flexible.


This function is flexible but simple to use for the default case!

add <- function(number, addition = 1){
  
  number + addition
  
}
add(number = 10)
[1] 11
add(number = 10, addition = 2)
[1] 12

What does a function return?

By default, a function returns the last value output by the code…

add <- function(number, addition = 1){
  
  number + addition
  10
  
}

What will this return?

add(20)
[1] 10

What does a function return?

Use return() if you want to be specific.

Warning

Everything after return() is not run!

add <- function(number, addition = 1){
  
  new_number <- number + addition
  return(new_number)
  10
  
}

What will this return?

add(20)
[1] 21

What does a function return?

return() can be more efficient!

is_even <- function(object){
  ## Check if a number is 0...
  if (object == 0) {
    output <- NA
  }
  ## THEN check if a number is >0 and even
  if (object > 0 & object %% 2 == 0) {
    output <- "even"
  }
  ## THEN check if a number is >0 and odd
  if (object > 0 & object %% 2 != 0) {
    output <- "odd"
  }
  ## Return the output
  output
}

What does a function return?

return() can be more efficient!

is_even_new <- function(object){
  ## If number is 0...return NA and stop!!
  if (object == 0) {
    return(NA)
  }
  ## Otherwise...check if a number is >0 and even
  if (object > 0 & object %% 2 == 0) {
    output <- "even"
  }
  ## and check if a number is >0 and odd
  if (object > 0 & object %% 2 != 0) {
    output <- "odd"
  }
  ## Return the output
  output
}

What does a function return?

return() can be more efficient!

# A tibble: 2 × 3
  expression       median `itr/sec`
  <chr>          <bch:tm>     <dbl>
1 is_even(0)        574ns  1082427.
2 is_even_new(0)    164ns  1886342.

What if something goes wrong?

Default error messages are often hard to understand.

add <- function(number, addition = 1){
  
  number + addition
  
}
add("B")
Error in number + addition: non-numeric argument to binary operator

What if something goes wrong?

Use stop() to create your own error message!

add <- function(number, addition = 1){
  
  if (is.character(number) | is.character(addition)) {
    stop("The arguments 'number' and 'addition' cannot be characters!")
  }
  number + addition
  
}
add("B")
Error in add("B"): The arguments 'number' and 'addition' cannot be characters!

What if something goes wrong?

Use warning() to flag things that can work but might cause problems.

add <- function(number, addition = 1){
  
  if (is.character(number) | is.character(addition)) {
    stop("The arguments 'number' and 'addition' cannot be characters!")
  }
  
  if (is.logical(number) | is.logical(addition)) {
    warning("Coercing logical value to numeric.\n")
  }
  number + addition
  
}
add(number = 10, addition = TRUE)
Warning in add(number = 10, addition = TRUE): Coercing logical value to numeric.
[1] 11

Debugging!

  • browser()

  • debug()

  • trace()

Debugging!

browser() can help debug a function you are creating!

day_of_week <- function(date){
  if (is.numeric(date)) {
    stop("Function cannot accept a number")
  }
  day <- format(as.Date(date), "%A")
  paste("You were born on", day)
}
day_of_week("12-03-88")
Error in charToDate(x): character string is not in a standard unambiguous format

Debugging!

The function will stop at the point where we add browser()

day_of_week <- function(date){
  browser()
  if (is.numeric(date)) {
    stop("Function cannot accept a number")
  }
  day <- format(as.Date(date), "%A")
  paste("You were born on", day)
}
day_of_week("12-03-88")

Debugging!

Use the ‘expr’ argument in browser() to debug in certain conditions!

transform_numbers <- function(n){
  
  output <- c()
  
  for (i in runif(min = -10, max = 10, n = n)) {
    
    output <- append(output, log(i))
    
  }
  
  return(output)
  
}
set.seed(123)
transform_numbers(10)
 [1]         NaN  1.75199641         NaN  2.03605742  2.17581317         NaN
 [7] -0.57605814  2.06030725  0.02829615         NaN

Debugging!

Use the ‘expr’ argument in browser() to debug in certain conditions!

transform_numbers <- function(n){
  
  output <- c()
  
  for (i in runif(min = -10, max = 10, n = n)) {
    
    browser(expr = i <= 0)
    output <- append(output, log(i))
    
  }
  
  return(output)
  
}
set.seed(123)
transform_numbers(10)

Debugging!

debug() can be used for any function (even ones you didn’t write!)

debug(day_of_week)
day_of_week("12-03-88")

Warning

You will need to undebug() a function (or use debugonce())

Debugging!

If you identify a problem in another function, you can use trace(edit = TRUE) to tweak it.

date_time <- lubridate::ymd_hms("1999-01-01 00:00:00", tz = "Africa/Dar_es_Salaam")
as.Date(date_time)
[1] "1998-12-31"

Debugging!

Try it yourself.

average_age_firstrepro <- function(sex, birth_year, na.rm = FALSE){
  
  ## Create start and end date for the target year
  start_date <- paste('birth_year', "01/01", sep = "_")
  end_date   <- paste('birth_year', "12/31", sep = "_")
  
  ## Extract all individuals born in that year and their age at first birth
  all_births <- create_id_starting.table(from = start_date, to = end_date,
                           sex = sex,
                           lifestage = "cub", lifestage.overlap = "start") %>% 
    dplyr::mutate(firstbirth = fetch_id_date.birth.first(ID)) %>% 
    dplyr::mutate(age = fetch_id_age(ID, at = firstbirth))
  
  ## Determine an average age at first birth
  avg_age <- mean(all_births$age, na.rm = na.rm)
  
  ## Return the number
  return(avg_age)

}

Debugging!

Why doesn’t this code work?

Note

There might be more than one problem…

average_age_firstrepro(2019, "male")