2018-05-04

Go with the "Flow"

  • Any program will be designed to perform some set of computational operations, for example:
    • Calculate summary statistics
    • Plot raw data
    • Do a t-test
  • "Control Flow" describes the order in which specific operations take place
  • A key objective when writing your code is to make your program's control flow as simple as possible while minimizing redundancy.

Linear Program Flow

  • So far, the control flow of your code has been solely determined by the line number of each expression
    • "Code rolls downhill"
group1 <- sleep[sleep$group==1,] # Separate group 1 
group2 <- sleep[sleep$group==2,] # Separate group 2
x1 <- mean(group1$extra) # Calculate group 1 mean
x2 <- mean(group2$extra) # Calculate group 2 mean
sem <- sqrt(.5*(var(group1$extra) + var(group2$extra)))*sqrt(2/10) # Calculate SEM
t <- (x1-x2)/sem # Find the t score
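
As a rough check on the hand calculation above, the same statistic can be compared against R's built-in t.test(). This sketch assumes the pooled-variance version of the test (var.equal = TRUE) is what the script intends; the names t_manual and t_builtin are introduced here only for illustration.

```r
# Recompute the t score by hand, following the script above
group1 <- sleep[sleep$group == 1, ]
group2 <- sleep[sleep$group == 2, ]
x1 <- mean(group1$extra)
x2 <- mean(group2$extra)
sem <- sqrt(.5 * (var(group1$extra) + var(group2$extra))) * sqrt(2 / 10)
t_manual <- (x1 - x2) / sem

# Built-in two-sample t-test with equal variances assumed
t_builtin <- unname(t.test(group1$extra, group2$extra, var.equal = TRUE)$statistic)

# TRUE if the two calculations agree up to numerical tolerance
print(isTRUE(all.equal(t_manual, t_builtin)))
```

Every line here still runs top to bottom, so the control flow stays purely linear.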

Linear Program Flow

  • The control flow for this script is a straight line through four steps:
    • A. Separate groups
    • B. Calculate the mean
    • C. Calculate SEM
    • D. Find the t score

Altering the Flow

  • R has expressions which can alter this linear flow. Specifically, we can:
    • Execute a set of statements only if some condition is met
      • if else statements
    • Execute a set of statements zero or more times, until some condition is met
      • for and while statements
    • Stop the program, preventing any further execution
      • stop statements

if else syntax

An if else statement allows us to conditionally evaluate specific statements. The general syntax is:

expression_A
expression_B
if (condition) {
  expression_C
} else {
  expression_D 
}
  • condition is an R expression that evaluates to a single logical value (i.e., TRUE or FALSE). This expression must be in parentheses
  • expression_C and expression_D are two different blocks of R code that should be contained in curly braces
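
Because condition must be a single TRUE or FALSE, a comparison on a longer vector usually has to be collapsed to one value first. A minimal sketch using any(); the vector scores and the variable verdict are made up for illustration.

```r
scores <- c(4, 8, 15)  # hypothetical data

# scores > 5 is a logical vector of length 3, not a single value
print(scores > 5)

# any() reduces it to one logical value that if can use
if (any(scores > 5)) {
  verdict <- "at least one score is above 5"
} else {
  verdict <- "no score is above 5"
}
print(verdict)
```

all() works the same way when every element has to satisfy the comparison.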

if else control flow

expression_A
expression_B
if (condition) {
  expression_C
} else {
  expression_D 
}
  • If condition is equal to TRUE, take the path through C
  • Otherwise, take the path through D

if else

Let's take a look at some simple examples of conditional branching. Here we'll use an if else statement to print out which of two random integers is larger.

x <- sample(1:50000,1) # random value from 1 to 50,000
y <- sample(1:50000,1)
print(sprintf("X is %d, Y is %d", x, y))
## [1] "X is 13276, Y is 18607"
if (x < y) {
  print("Y Wins!")
} else {
  print("X Wins!")
}
## [1] "Y Wins!"

if else

But we forgot a corner case: what if y equals x? We can add an additional if to the same statement like so:

x <- sample(1:50000,1) # random value from 1 to 50,000
y <- sample(1:50000,1)
print(sprintf("X is %d, Y is %d", x, y))
## [1] "X is 28643, Y is 45411"
if (x < y) {
  print("Y Wins!")
} else if (y < x) {
  print("X Wins!")
} else {
  print("zomg X equals Y!!")
}
## [1] "Y Wins!"

We can have as many else if statements chained together as we want, but only one else.
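
With random draws from 1 to 50,000 the tie branch is rarely reached, so one way to check it is to fix the values by hand. A small sketch; the values 7 and 7 and the variable result are arbitrary choices for illustration.

```r
x <- 7
y <- 7
if (x < y) {
  result <- "Y Wins!"
} else if (y < x) {
  result <- "X Wins!"
} else {
  # Reached only when neither comparison above is TRUE
  result <- "zomg X equals Y!!"
}
print(result)
## [1] "zomg X equals Y!!"
```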

if else

We can also use just the if part of the statement.

Here, we sample a random integer for x, and if it comes out negative, we flip the sign with the abs() function.

x <- sample(-10:10,1)
if (x < 0) { 
  x <- abs(x)
}

Activity

Sample 2 random values from a standard normal distribution, using rnorm(). Call them x and y.

Then, write an if-else that does the following:

  • If they are both greater than the mean of that normal distribution, print "x and y are above the mean".
  • If they are both less than the mean of that normal distribution, print "x and y are below the mean".
  • If x and y fall on either side of the mean, print "x and y straddle the mean"

Loops

Loops are a class of control flow that allow you to repeat the same operations over and over again, without explicitly writing out those operations over and over again in the code.

  • for loops repeat your operations for an explicit number of repetitions
  • while loops repeat your operations an indeterminate number of times, stopping only when a pre-specified condition occurs (whenever that might be)

for loops

At the heart of a for loop is the R object which determines the number of iterations your for loop should make.

This counter variable (i in this example) is declared inside the parentheses immediately following the for keyword. It is common to refer to the object it steps through as the thing you "loop over".

Here, we loop over the vector 1:3. Inside the body of the for loop, the variable i takes on the values 1, 2, and 3, one at a time.

for (i in 1:3) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3

for loops

But the counter variable doesn't have to be sequential integers enumerating which iteration of the loop you are on. It can be any vector, and i will sequentially take on each value in the vector.

for (i in c("A","B","C",3,2,1)) {
  print(i)
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "3"
## [1] "2"
## [1] "1"
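
The quotes around "3", "2", and "1" in this output appear because c() converts everything to a single type, so the numbers become strings before the loop starts. A short sketch of that behavior, with a list shown as one possible way to keep mixed types; the names mixed and item are illustrative.

```r
mixed <- c("A", "B", "C", 3, 2, 1)
print(class(mixed))  # "character": the numbers were converted to strings

# A list keeps each element's original type
for (item in list("A", 3)) {
  print(class(item))
}
```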

for loops

for (i in c("A","B","C",3,2,1)) {
  print(i)
}

Note 2 important things that happened:

  1. The print() statement was executed exactly 6 times, the same number of times as the length of c("A","B","C",3,2,1)
  2. The value of i changed on every iteration of the loop, as evidenced by the fact that print(i) spit out a different value every time

for loops

For loops are most often used for their "side effects"

This means you loop over one object (e.g., i = 1:10), but use the value of i inside the loop as a subscript to index into other data structures

This technique can be used to modify objects in the global environment, e.g., adding new values to an existing data frame
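
A minimal sketch of this pattern, assuming a made-up input vector values: the loop runs over the positions, and i is used as a subscript into both the input and a result vector that lives outside the loop.

```r
values <- c(2, 4, 6)                # hypothetical input
squares <- numeric(length(values))  # result vector, created before the loop

for (i in seq_along(values)) {
  # i indexes into both vectors; the assignment is the "side effect"
  squares[i] <- values[i]^2
}
print(squares)
## [1]  4 16 36
```

seq_along(values) is used here rather than 1:length(values) because it yields an empty sequence when the input is empty, so the loop body is skipped instead of running with invalid indices.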

for loops

To demonstrate this indexing, here we:

  1. Create a 4 x 2 character matrix
  2. Create a 4-element vector of numeric indices
  3. Loop over those indices, and use the iterator to index into the matrix and print each row
letters <- matrix(c("A","B","C","D",
                    "E","F","G","H"), nrow = 4)
indices <- 1:4
for (x in indices) { print(letters[x,]) }
## [1] "A" "E"
## [1] "B" "F"
## [1] "C" "G"
## [1] "D" "H"

Group Means Example

A common problem to solve in R is calculating summary statistics for each group in some data set. We're going to tackle this problem in the InsectSprays data set.

head(InsectSprays,5)
##   count spray
## 1    10     A
## 2     7     A
## 3    20     A
## 4    14     A
## 5    14     A

First we'll walk through how we would accomplish this process with a purely linear control flow, and then we'll see how to approach the problem with a loop.

Group means: the hard way

First we need to find the unique sprays used, and create a "placeholder" data frame which will end up storing the mean for each spray.

sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)
print(sprayMeans)
##   spray meanCount
## 1     A        NA
## 2     B        NA
## 3     C        NA
## 4     D        NA
## 5     E        NA
## 6     F        NA

Group means: the hard way

Then we select the first spray, and use $ and == to extract the observed counts for that spray group from the InsectSprays data frame.

sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)
i = sprays[1]
print(i)
## [1] A
## Levels: A B C D E F
counts <- InsectSprays$count[InsectSprays$spray==i]
print(counts)
##  [1] 10  7 20 14 14 12 10 23 17 20 14 13

Group means: the hard way

Then we average those extracted counts, subset our placeholder data frame, and fill in the meanCount column for this specific spray with the mean value.

sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)
i = sprays[1]
counts <- InsectSprays$count[InsectSprays$spray==i]
m <- mean(counts)
sprayMeans[sprayMeans$spray==i,'meanCount'] <- m
sprayMeans
##   spray meanCount
## 1     A      14.5
## 2     B        NA
## 3     C        NA
## 4     D        NA
## 5     E        NA
## 6     F        NA

Group means: the hard way

##   spray meanCount
## 1     A      14.5
## 2     B        NA
## 3     C        NA
## 4     D        NA
## 5     E        NA
## 6     F        NA

OK, one group down, 5 to go! We will repeat the same thing over and over and over again for each group. If only there were a way to get R to repeat the same operations over and over and over again…

Oh wait, for loops do just that!

Group means: for loop

We start the same way as before, finding the unique sprays used, and creating a "placeholder" data frame to store results.

sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)
print(sprayMeans)
##   spray meanCount
## 1     A        NA
## 2     B        NA
## 3     C        NA
## 4     D        NA
## 5     E        NA
## 6     F        NA

Group means: for loop

But this time we use a for loop, looping over each unique spray. Inside the loop, we:

  1. Subset out the observed counts for a particular spray
  2. Find the mean of those counts, and fill that mean into the placeholder data frame
sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)

for (i in sprays) {
  counts <- InsectSprays$count[InsectSprays$spray==i]
  sprayMeans[sprayMeans$spray==i,'meanCount'] <- mean(counts)
}

Group means: for loop

The result is a data frame that holds the mean insect count for each unique spray used, with no code duplication.

sprayMeans
##   spray meanCount
## 1     A 14.500000
## 2     B 15.333333
## 3     C  2.083333
## 4     D  4.916667
## 5     E  3.500000
## 6     F 16.666667
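
For comparison, base R's tapply() can produce the same group means without an explicit loop. This is a different technique from the walkthrough above, shown only as a sketch; the name groupMeans is introduced here for illustration.

```r
# Loop version from the slides, repeated so this block runs on its own
sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)
for (i in sprays) {
  counts <- InsectSprays$count[InsectSprays$spray == i]
  sprayMeans[sprayMeans$spray == i, "meanCount"] <- mean(counts)
}

# tapply() splits count by spray and applies mean() to each group
groupMeans <- tapply(InsectSprays$count, InsectSprays$spray, mean)
print(groupMeans)

# TRUE if both approaches give the same six means
print(isTRUE(all.equal(as.vector(groupMeans), sprayMeans$meanCount)))
```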

Activity

Write a for loop that calculates the cumulative sum of the vector c(7,2,22,5,33,8).

If you don't know, the cumulative sum is a sequence of the partial sums of a given sequence. For example, the cumulative sum of the vector c(1,2,3) is (1,3,6).

while loops

  • while loops are like a hybrid of if else statements and for loops
    • They repeat the same code block over and over again, like a for loop
    • But they check a condition before every repetition, and stop as soon as it is no longer met
k <- 1
while(k < 5) {
  print(k)
  k <- k + 1
}
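
Because the condition is checked before each pass, a while loop can also run zero times. A small sketch with a starting value that already fails the test; the counter iterations is added only to make that visible.

```r
k <- 10
iterations <- 0
while (k < 5) {
  # Never reached: 10 < 5 is FALSE on the very first check
  iterations <- iterations + 1
  k <- k + 1
}
print(iterations)
## [1] 0
```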

while loops

  • while loops are appropriate to write when you don't know a priori how many iterations you will need
    • e.g., how many samples from the standard normal must you take before getting a value more than 3 SDs away from the mean?
r <- rnorm(1)
samples <- 1
while(r < 3 && r > -3) {
    r <- rnorm(1)
    samples <- samples+1
}
print(paste("Success only took",samples,"samples"))
## [1] "Success only took 386 samples"
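
Since a while loop has no built-in limit, one common precaution is to add a cap on the number of iterations. The sketch below is a variation on the example above, not a replacement for it; the seed value and the max_samples cap are arbitrary assumptions.

```r
set.seed(1)         # arbitrary seed so the run is repeatable
max_samples <- 1e5  # hypothetical safety cap on the number of draws

r <- rnorm(1)
samples <- 1
# Keep sampling while r is within 3 SDs and the cap has not been reached
while (abs(r) < 3 && samples < max_samples) {
  r <- rnorm(1)
  samples <- samples + 1
}

if (abs(r) >= 3) {
  print(paste("Success took", samples, "samples"))
} else {
  print("Gave up after reaching the cap")
}
```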

while loops

A good example of looping indefinitely until a condition occurs is the brute-force method of calculating how many trials you will need to achieve a specific level of experimental power.

power <- 0; alt_p <- .75; null_p <- .5; n <- 0;
while (power < .8) {
  n <- n+1
  crit_n <- qbinom(.025, size=n, prob = null_p, lower.tail=FALSE) - 1
  power <- pbinom(crit_n, size=n, prob = alt_p, lower.tail = FALSE)
}
print(paste(n, "trials needed"))
## [1] "23 trials needed"

loop pro-tips

  • Loops are often slow, so when you start writing one, carefully consider whether or not there is another solution.
  • If you must use one (and sometimes you must), ALWAYS PREALLOCATE THE FINAL DATA STRUCTURES YOU WANT
  • Remember that in our InsectSprays means example, we made a placeholder data frame of the exact size and shape we wanted, and then filled it in
sprays <- unique(InsectSprays$spray)
sprayMeans <- data.frame(spray = sprays, meanCount = NA)

for (i in sprays) {
  counts <- InsectSprays$count[InsectSprays$spray==i]
  sprayMeans[sprayMeans$spray==i,'meanCount'] <- mean(counts)
}
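
To make the preallocation advice concrete, here is a sketch contrasting a vector that grows inside the loop with one that is created at full length up front. The size n and the names grown and prealloc are illustrative; both loops produce the same values, and the difference is in how much copying R has to do along the way.

```r
n <- 1000

# Growing: the vector is extended by one element on every iteration
grown <- c()
for (i in 1:n) {
  grown <- c(grown, i^2)
}

# Preallocating: the full-length vector is created once, then filled in
prealloc <- numeric(n)
for (i in 1:n) {
  prealloc[i] <- i^2
}

# TRUE: same result either way
print(identical(grown, prealloc))
```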

stop

  • You can halt the execution of your program using a stop statement
    • You probably don't need this just yet in your programming lives, but it's good to know it exists
  • One good use for this is in combination with if statements, to detect states that are indicative of serious problems, and throw an error
x <- c(4.1, 5.3, 6.2) # example data; substitute your own numeric vector
if (sd(x) < 0) {
  stop("Standard Deviation cannot be negative!")
}
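
To see what stop() actually does without halting a whole script, the error can be caught with tryCatch(). This goes slightly beyond the slide and is only a sketch; the data, the missing-value check, and the error message are made up for illustration.

```r
x <- c(3.2, NA, 4.8)  # hypothetical data with a missing value

result <- tryCatch({
  if (any(is.na(x))) {
    stop("x contains missing values!")
  }
  mean(x)  # skipped, because stop() has already raised an error
}, error = function(e) {
  # conditionMessage() extracts the text that was passed to stop()
  paste("Caught:", conditionMessage(e))
})
print(result)
## [1] "Caught: x contains missing values!"
```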