09| For and While Loops

Miles Robertson, 12.25.23 (edited 01.16.24)

Introduction

We do repetitive tasks all the time: cleaning our teeth, cleaning a room, going for a run, studying for a class. However, the repetitiveness in these tasks can be viewed from different perspectives.

In all of these examples, we do one task over and over until some end goal is achieved. The task is repeated either (1) a certain number of times, sometimes once for every object available, or (2) until some condition is met. These two perspectives are qualitatively distinct: in the first, you know from the get-go how many times you'll be doing something, whereas in the second, you simply keep performing the task until it is complete. To flesh out this concept, the table below shows how the two perspectives apply to each of the above examples.

End Goal | Repetitive Task | Perspective (1) | Perspective (2)
clean teeth | brush back and forth | brush 500 times | brush until teeth are clean
clean room | pick up one item at a time | pick up each item on the ground | pick up items until room is clean
run | put one foot in front of the other | put one foot in front of the other 1000 times | put one foot in front of the other until you reach your destination
study | understand one topic at a time | understand each topic in the book | understand topics until you understand the whole curriculum

These two different perspectives appear clearly through loops in coding, specifically, through (1) for and (2) while loops. Practically all languages have these two types of loops, and they are used to perform repetitive tasks. The difference between the two is that a for loop repeats code once for every object in a vector or list, whereas a while loop repeats code until some condition is met.

for loops are generally preferred over while loops, and I avoid using while loops in virtually all cases. The reason is that for loops run at most a fixed number of times, are easier to debug, and are more readable. while loops, on the other hand, run until some condition is met, and if that condition is never met, the result is an infinite loop. Although infinite loops will not cause your computer to explode and can always be stopped, they can leave you waiting and wondering whether your code is working properly, only to find out that some bug caused an infinite loop.

Below, I will show you how to use both for and while loops. I will also show you how to use the break statement in these loops.

For Loops

Generally, for loops in R are structured as follows:

for (iteration.variable in iterable) {
    # do something
}

A quick definition: an iterable is an object that is made up of elements. In R, these are almost always vectors or lists. for loops are unique in that they create a variable, iteration.variable, which is used inside of the for loop. Each element of iterable is assigned to this variable in turn, once per pass through the loop. The variable can then be used inside of the for loop to perform some task.
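
As a small illustration of how the iteration variable takes on each element in turn, here is a sketch that loops over a vector and then over a list. The names fruits, fruit, mixed.list, item, and element.lengths are made up for this illustration and are not part of the examples that follow.

```r
# hypothetical example data, used only for this illustration
fruits <- c("apple", "banana", "cherry")

for (fruit in fruits) {
    # fruit holds one element of the vector on each pass
    cat("Current element:", fruit, "\n")
}

mixed.list <- list(1:3, "a string", TRUE)
element.lengths <- c()

for (item in mixed.list) {
    # item holds one whole list element on each pass
    element.lengths <- c(element.lengths, length(item))
}

element.lengths # 3 1 1
```

Note that the iteration variable still exists after the loop finishes, holding the last element it was given (here, fruit ends up as "cherry").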

The variable defined in the for loop can be named however you please. However, this is one of the rare cases where the name of the variable is often short or abbreviated, such as i or j. This is because the variable is often just representing an index of some sort. However, when you can give the iteration variable a clear name, you should. This can be particularly helpful when your script has nested loops.

Below, I will go over a few examples of how for loops can be used.

Example 1: Print Lots of Numbers

In this example, we will print the numbers 1 through 100 that are not divisible by 3 to the console. We can find the remainder of division in R by using the %% operator (called the modulo operator). For example, 5 %% 2 will return 1, because 5 divided by 2 has a remainder of 1. We can use this to check if a number is divisible by 3, because if the remainder is 0, then the number is divisible by 3. We can use this to write the following code:

for (number in 1:100) {
    if (number %% 3 != 0) {
        cat(number, "\n") # print the number, then a newline
    }
}

The variable number takes on the values 1 through 100, one at a time, and each time the code inside of the for loop is run. Then, if the number is not divisible by 3, cat(number, "\n") is run.

Example 2: Find Prime Numbers

You will likely remember that prime numbers are integers greater than 1 that are only divisible by 1 and themselves. We can use this fact to write a program that checks whether a given number is prime. To do this, we can again use the %% operator. However, this time, we will use it to check whether the number is divisible by any integer from 2 up to one less than the number itself. There are ways to make this more efficient, but this approach will work for now. A number is only shown to be prime once none of these smaller numbers divide it, so we should start by assuming the number is prime, and only change our mind if we find a number that divides it evenly.

potential.prime <- 1234321
is.prime <- TRUE # assume the number is prime

for (factor in 2:(potential.prime-1)) {
    if (potential.prime %% factor == 0) {
        is.prime <- FALSE # conclude number isn't prime
        break # exit the loop early
    }
}

if (is.prime) {
    cat(potential.prime, "is prime.")
} else {
    cat(potential.prime, "is not prime.")
}

In the for loop, we check if the potential prime number is divisible by smaller numbers. If it is divisible by any of those numbers, we set is.prime to FALSE. Since we now know that the number is not prime, we can exit the loop using the break statement. The break statement can be used when the rest of the loop is no longer needed, and it will exit the loop immediately.
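
One of the more efficient approaches hinted at earlier is to stop checking at the square root of the number, since any factor larger than the square root must pair with a factor smaller than it. The sketch below is one way this might look, not the only way. The function name isPrimeSketch and the variable names inside it are hypothetical. Because the check lives inside a function, return plays the role that break played above. seq_len is used so that small inputs such as 2 and 3 do not produce a backwards sequence like 2:1.

```r
isPrimeSketch <- function(potential.prime) {
    # numbers below 2 are not prime by definition
    if (potential.prime < 2) {
        return(FALSE)
    }
    largest.check <- floor(sqrt(potential.prime))
    # seq_len(k)[-1] gives 2, 3, ..., k, and is empty when k < 2
    for (candidate in seq_len(largest.check)[-1]) {
        if (potential.prime %% candidate == 0) {
            return(FALSE) # found a divisor, so stop early
        }
    }
    return(TRUE)
}

isPrimeSketch(1234321) # FALSE, since 1234321 is divisible by 11
isPrimeSketch(97)      # TRUE
```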

We can extend this example to find all the prime numbers between 1 and 1000. To do this, we can use a nested for loop, or in other words, a for loop inside another for loop. We can increase the efficiency of this program by using the fact that we only need to check if a number is divisible by smaller prime numbers. This is because if a number is divisible by a non-prime number, it is also divisible by the prime factors of that number. For example, if a number is divisible by 6, it is also divisible by 2 and 3. Therefore, we can use the following code:

prime.vector <- c() # create an empty vector to store primes

for (potential.prime in 2:1000) {    
    is.prime <- TRUE # assume the number is prime

    for (factor in prime.vector) {
        if (potential.prime %% factor == 0) {
            is.prime <- FALSE # conclude number isn't prime
            break # exit the loop early
        }
    }

    if (is.prime) {
        prime.vector <- c(prime.vector, potential.prime) 
        # add the confirmed prime to the vector
    }
}

cat("The primes between 1 and 1000 are:", prime.vector)

This is certainly a lot to take in, so it is worth breaking this concept into smaller pieces. Note that the inside of the first for loop is practically identical to the previous code chunk. The only difference is that we are only checking divisibility by prime numbers, and we store new prime numbers in prime.vector instead of printing a message. Otherwise, the code is the same. The outer loop only serves to check subsequent numbers for being prime.

This code may feel complicated, but it is an incredibly helpful exercise to walk through the code line by line and think about how R is executing it. Initially, prime.vector is empty. In line 3, potential.prime is defined and takes on its first value of 2. In line 4, is.prime is assigned to be TRUE. In line 6, the inner loop is started. However, prime.vector is still empty at this point, so there is nothing for the inner loop to do, and it is skipped entirely. Therefore, is.prime is still TRUE, and thus 2 is added to prime.vector in line 14. With the completion of the first pass through the outer loop, potential.prime now takes on the value of 3, and the program jumps back to line 4 and starts again. The program proceeds as last time, except that now prime.vector is no longer empty, so the inner loop is run. For your own benefit, you should think through the next few iterations of this outer loop to get the hang of what is going on.
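
One way to check your reasoning is to have R report what it is doing on each pass. The sketch below reruns the same logic on the numbers 2 through 10 only, with extra cat calls added for tracing. The smaller range and the messages are additions for illustration.

```r
prime.vector <- c()

for (potential.prime in 2:10) {
    is.prime <- TRUE
    # show which primes the inner loop is about to check against
    cat("Checking", potential.prime, "against:", prime.vector, "\n")

    for (factor in prime.vector) {
        if (potential.prime %% factor == 0) {
            cat("   ", potential.prime, "is divisible by", factor, "\n")
            is.prime <- FALSE
            break
        }
    }

    if (is.prime) {
        prime.vector <- c(prime.vector, potential.prime)
    }
}

prime.vector # 2 3 5 7
```

Comparing the printed trace against your own line-by-line predictions is a quick way to find where your mental model and R disagree.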

Example 3: Format Vectors in a List

Lists in R can also be iterated over with for loops. In this example, we will use a for loop to edit each vector in a list. Imagine that you want to make a single message out of several character vectors. You have a list of character vectors that contain the names of a bunch of doctors, with each entry corresponding to a different hospital. You want to print a message that lists all the doctors at each hospital with this particular format, with each line representing one hospital:

[Name 1], [Name 2], [Name 3]
[Name 1], [Name 2], [Name 3]
[Name 1], [Name 2], [Name 3]

However, you want to make the letters all capitalized and add "MD" to the end of each name if it isn't already there. This could be done manually, but that would be tedious. Instead, you can use a for loop to do this for you. Copy the following list definition into a file to create the list of doctors (copying is usually restricted but is allowed here):

doctors.by.hospital <- list(
    c("olivia bennett MD", "ETHAN HAYES MD", "Mia Rodriguez"), 
    c("Liam Parker MD", "Ava mitchell", "sophia ramirez MD"), 
    c("NOAH TURNER", "Isabella carter", "jackson Foster")
)

Now, we can use a for loop to edit each vector in the list. Since there are multiple elements in each vector, we will need to use a nested for loop. In order to show how user-defined functions might be used in a situation like this, I will define a function that makes the appropriate corrections to each element of the character vectors, and then I will use a for loop to apply this function to each vector in the list.

fixDoctorName <- function(doctor.name) {
    # make all letters capitalized
    doctor.name <- toupper(doctor.name)
    # if the name doesn't end with " MD"...
    if (!endsWith(doctor.name, " MD")) {
        # ...add " MD" to the end
        doctor.name <- paste(doctor.name, "MD")
    }
    return(doctor.name)
}

This function fixes each name per our specifications. Now, our for loop will look simpler than if we had not used a function.

for (hospital in doctors.by.hospital) {
    for (i in 1:length(hospital)) {
        hospital[i] <- fixDoctorName(hospital[i])
    }

    cat(paste(hospital, collapse=", "), "\n")
}

Nested for loops can be difficult to understand at first. To try to help you understand what is occurring here, I will walk through how your computer reads this code, line by line, with reference to line numbers:

Notice that we were able to iterate over the list doctors.by.hospital directly in the outer loop (i.e., we didn't use 1:length(doctors.by.hospital)). This means that the variable hospital became each character vector in the list, instead of being a number like we saw in the previous examples. However, in the inner loop, the i variable was indeed always a number value, taking on each value from 1 to length(hospital) (which was always 3 in this case).
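
One detail worth knowing: inside the loop, hospital is a copy of each list element, so the corrections above are printed but doctors.by.hospital itself is left unchanged. If you wanted to keep the corrected names, one option is to loop over positions instead, as in this sketch. It repeats the list and function definitions from above so that it runs on its own. The names fixed.doctors and h are made up for this illustration, and seq_along(x) is used in place of 1:length(x) because it gives the same positions here while also behaving sensibly if x is empty.

```r
doctors.by.hospital <- list(
    c("olivia bennett MD", "ETHAN HAYES MD", "Mia Rodriguez"),
    c("Liam Parker MD", "Ava mitchell", "sophia ramirez MD"),
    c("NOAH TURNER", "Isabella carter", "jackson Foster")
)

fixDoctorName <- function(doctor.name) {
    doctor.name <- toupper(doctor.name)
    if (!endsWith(doctor.name, " MD")) {
        doctor.name <- paste(doctor.name, "MD")
    }
    return(doctor.name)
}

fixed.doctors <- doctors.by.hospital # start from a copy of the original

for (h in seq_along(fixed.doctors)) {
    for (i in seq_along(fixed.doctors[[h]])) {
        # [[h]] picks the hospital's vector, [i] picks one name in it
        fixed.doctors[[h]][i] <- fixDoctorName(fixed.doctors[[h]][i])
    }
}

fixed.doctors[[3]]       # the corrected names for the third hospital
doctors.by.hospital[[3]] # the original list is untouched
```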

While Loops

As mentioned above, while loops are generally avoided when for loops can be used instead. However, I will introduce the structure of a while loop, and give a couple examples of how it could be used:

while (condition) {
    # do something
}

Simply put, a while loop is just an if statement that repeats itself until the condition is no longer true. This means that the code inside the while loop must change the condition in some way, or else the loop will never end. For example, the following code will run forever until you stop it:

while (TRUE) {
    cat("This will never end!\n")
    Sys.sleep(0.5) # wait 0.5 seconds
}

Even though this creates an infinite loop, you should try out this example in your console. I add a delay of half a second so that your console will not immediately fill with the message. When you get tired of it, you can press the stop sign in the top right corner of the console to stop the code from running. This is the same way you would stop an infinite loop in a script.

In the example above, the condition is literally always TRUE, so the loop will never end. However, normal applications of a while loop will have a condition that is initially TRUE, but will eventually become FALSE after some number of iterations. The examples below show a few ways that a while loop could be used.

Example 1: Divide by 2 until Less Than 1

If you want to see how many times a number can be divided by 2 while staying at or above 1, you can simply use a logarithm with base 2 and then round down to the nearest integer:

number <- 1234321
cat(
    "The number", number, "can be divided by two", 
    floor(log2(number)), "times."
)

This is because a base 2 logarithm of some number asks the question "2 to what power is equal to this number?" For example, log2(8) is 3, because 2 to the power of 3 is 8. If we round the base 2 logarithm of a number down to the nearest integer (i.e., use the floor function), we get the largest integer exponent n for which 2 to the power of n does not exceed the number of interest.
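
To make this concrete, the short check below confirms the claim for the number used above. The variable name exponent is introduced only for this illustration.

```r
number <- 1234321
exponent <- floor(log2(number))

log2(8)          # 3, because 2^3 is 8
exponent         # 20
2^exponent       # 1048576, which does not exceed number
2^(exponent + 1) # 2097152, which does exceed number
```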

However, perhaps we want to print a message for every step of the division. We can use a while loop to do this:

number <- 1234321
current.number <- number
count <- 0

while (current.number >= 1) { # keep going until the number drops below 1
    current.number <- current.number / 2
    count <- count + 1
    cat(
        number, "divided by two", count, 
        "times gives", current.number, "\n"
    )
}

Here, we use a while loop to divide current.number by 2 until it is less than 1, keeping track of how many times we have divided by 2 with count. On each pass, we print a message that shows the original number, how many times it has been divided by 2, and the current number. Once the number is less than 1, the loop stops. For 1234321, this happens on the 21st division, one more than floor(log2(number)) = 20, because the loop also performs the final division that takes the number below 1.

Example 2: Calculate the Factorial

The factorial of an integer is the product of all the integers from 1 to that number. For example, 5 factorial is 5*4*3*2*1, or 120. Although we can simply calculate this using the factorial() function, we can also use a while loop:

input.number <- 5
result <- 1
counter <- 1

while (counter <= input.number) {
    result <- result * counter
    counter <- counter + 1
}

cat("The factorial of", input.number, "is", result, "\n")

Walk through the above loop step-by-step, reading it like R would, to see how it works. The variable input.number is small enough that you can do this on paper relatively quickly.
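
Because the number of repetitions is known in advance here, the same calculation can also be sketched with a for loop, in line with the earlier advice to prefer for loops where possible. This version is an illustration rather than part of the original example. seq_len is used so that an input of 0 runs the loop zero times and leaves result at 1.

```r
input.number <- 5
result <- 1

for (counter in seq_len(input.number)) {
    # counter takes the values 1, 2, ..., input.number
    result <- result * counter
}

cat("The factorial of", input.number, "is", result, "\n")
result == factorial(input.number) # TRUE
```

Notice that the for loop handles the counting for you, so there is no counter <- counter + 1 line to forget.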

Conclusion

In this tutorial, I introduced for and while loops. I showed you how to use them, and I gave a few examples of how they could be used. I also showed you how to use the break statement to exit a for loop early. It should be noted that break statements can also be used in while loops, and have the same function.
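
As a brief sketch of break inside a while loop, the code below repeats the divide-by-two idea but adds a cap on the number of passes, which is one way to guard against the infinite loops discussed earlier. The cap max.iterations and its value of 1000 are assumptions chosen for illustration.

```r
current.number <- 1234321
count <- 0
max.iterations <- 1000 # hypothetical safety cap

while (TRUE) {
    current.number <- current.number / 2
    count <- count + 1

    if (current.number < 1) {
        break # the goal has been reached
    }
    if (count >= max.iterations) {
        cat("Stopped after", max.iterations, "passes without finishing.\n")
        break # give up instead of looping forever
    }
}

count # 21
```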


Practice

Redo Earlier Practice Problem

In a practice problem of a previous chapter, I asked you to run the following code, which included a for loop.

set.seed(123)
number.of.flips <- 100
number.of.trials <- 10000
longest.streaks <- c()

for (i in 1:number.of.trials) {
    flips <- sample(c("H", "T"), number.of.flips, replace=TRUE)
    run.lengths <- rle(flips)$lengths
    longest.streaks <- c(longest.streaks, max(run.lengths))
}

mean(longest.streaks)

Now, you should be able to understand what this code is doing. In this previous practice problem, I asked you to run the code many times with different numbers of trials, recording outcomes by hand and then plotting the results to see how the average longest streak changed as the number of trials increased. Now, you can do this automatically with a second for loop. Type the code above into an R script, and use an additional outer for loop to run the code with different numbers of trials. Refer to the initial version of this question for more details if necessary (link in the first sentence of this problem). This can seem intimidating, so I will write some pseudocode to help you get started:

make a vector of trial numbers
make a vector that will store the average longest streaks for each trial number

for each trial number:
    run the code above, but using that trial number
    store the result in the vector of average longest streaks

show results in a plot

Complete Tasks with Loops

Complete the following tasks using for loops and the cat function. Remember that new lines of output text can be started using "\n" in cat.

  1. Create a vector containing several people's names. Use a for loop to print each of their names with a new line after each name.
  2. Create two vectors: c(1, 2, 3) and c("A", "B", "C"). Use these vectors with nested for loops to print the following output:
    A1 A2 A3
    B1 B2 B3
    C1 C2 C3
  3. Create three vectors: c(1, 2), c("A", "B"), and c("+", "-"). Use these vectors with nested for loops to print the following output:
    +A1 +A2
    +B1 +B2
    
    -A1 -A2
    -B1 -B2
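
A hint for the tasks above: by default, cat puts a space between its arguments, which matters when you want output like A1 rather than A 1. The sketch below shows two options that may help, the sep argument of cat and the paste0 function. The variable name label is made up for this illustration.

```r
cat("A", 1, "\n")         # prints "A 1 " with spaces between the pieces
cat("A", 1, "\n", sep="") # prints "A1" with nothing between the pieces

label <- paste0("A", 1)   # paste0 joins its arguments with no separator
cat(label, "\n")          # prints "A1 "
```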

Generate Pascal's Triangle

You may have seen Pascal's triangle before. It is a triangle of numbers that is frequently used when calculating probabilities. The first five rows of it look like this:

        1
       1 1          
      1 2 1
     1 3 3 1
    1 4 6 4 1

This triangle can continue for as many rows as you like. To make it, follow these rules:

  1. Rows are centered horizontally with each other, and entries of a row are equally spaced
  2. The $n$th row has $n$ entries in it (e.g., the 5th row has 5 entries, the 13th has 13, etc.)
  3. The first and last number of each row is 1
  4. To find entries that are not first or last in a row, add the two numbers that are found above the entry (top left + top right)

For this problem, create a function that takes a number $n$ as input and returns the first $n$ rows of Pascal's triangle in a list. For example, if $n=4$, the function should return a list with four elements that looks like this:

[[1]]
[1] 1

[[2]]
[1] 1 1

[[3]]
[1] 1 2 1

[[4]]
[1] 1 3 3 1

This problem is tricky and will require some serious thought. I will give you some pseudocode to get started, though you should feel free to deviate from it if you have a different approach.

define function getPascalRows with input number.rows
    make an empty list that is number.rows elements long
    make a variable previous.row that is given an initial value of NULL

    for each number (call it current.row.number) from 1 to number.rows:
        make a vector (call it current.row) that is current.row.number long, full of 1's
        if current.row.number is 3 or greater:
            for each number (call it current.entry) from 2 to current.row.number-1:
                set that number to the sum of whatever is at current.entry-1 and current.entry of previous.row
        insert current.row to the list at index current.row.number
        set previous.row to be current.row
    
    return the list

Each line of pseudocode can be turned into a line of R code. I have done the break-down of the problem for you, but you will need to fill in the details. This problem is not easy, so do not be discouraged if this is a difficult process. Ask for help if you need it!

Once you complete that problem, try to make a second function that takes the output of the first function and prints it to the console in the format shown above. It should clearly show a triangle shape, and each row should be centered with the row above it. A few hints to help:

Write a Sorting Algorithm

One enlightening practice is to think about how lists of numbers are sorted. There are many ways to sort a list of numbers, and some are more efficient than others. In this practice problem, I will ask you to write a function that sorts a list of numbers using for loops. You can use any approach that works, but I will try to give you helpful guidance. To begin, consider the following list of numbers:

4 2 7 3 9 8

Begin by literally sorting this list by hand, writing down the result on a paper. Once you have your result, ask yourself how you were able to get to that answer. What steps did you take? Write down these steps you took to sort the list. How might you explain the steps you took to someone else who does not understand the concept of sorting? This is how you write a sorting algorithm.

This is not an easy problem, so I will give you some things to think about. However, you should give the problem a proper go first before reading these hints.

  1. Perhaps your approach started with you finding the smallest number in the list. If you were given a super long list of numbers, how might you find the smallest one? Alternatively, if I were to read off list numbers to you one by one, can you think of a method you could use to find the smallest one?
  2. Once you have found the smallest number, what do you do with it? How do you remove it from the list? Where should it go?
  3. Once you have removed the smallest number from the list, what do you do next? How do you find the next smallest number?
  4. Can you identify any patterns in the steps you took to sort the list? Can you write these steps down in a way that could be executed over and over again in a for loop to get the final sorted list?

Make an Interactive Game of Blackjack

This is a very fun project that can help you see what coding is capable of. The idea is that you will write a script that gives the user prompts as they play a game of blackjack. The full game is somewhat complicated, so we'll use a simpler set of rules for the game:

Since the job of the dealer requires no decision-making, you can write a script that plays the role of the dealer. The script should "deal cards" to the player and dealer, and then play the game according to the rules above. For example, the script might play out like any of the following three examples (note that the user input is only yes or no in all cases):

Welcome to blackjack! Let's play.

You are dealt a 7♠ and a 10♦. Your hand value is 17.
The dealer is dealt a 4♣.

Would you like to draw a card? (yes/no) yes
You are dealt a 5♠. Your hand value is 22.
You bust. 

The dealer wins.

Welcome to blackjack! Let's play.

You are dealt a 7♦ and a 2♠. Your hand value is 9.
The dealer is dealt a J♣.

Would you like to draw a card? (yes/no) yes
You are dealt a 4♠. Your hand value is 13.

Would you like to draw a card? (yes/no) yes
You are dealt a 6♦. Your hand value is 19.

Would you like to draw a card? (yes/no) no
You stand.

The dealer is dealt a 6♠. The dealer's hand value is 11. 
The dealer draws a card.
The dealer is dealt a 10♠. The dealer's hand value is 21.

The dealer wins.

Welcome to blackjack! Let's play.

You are dealt a 10♠ and a 8♦. Your hand value is 18.
The dealer is dealt a 9♣.

Would you like to draw a card? (yes/no) no
You stand.

The dealer is dealt a 7♠. The dealer's hand value is 16. 
The dealer draws a card.
The dealer is dealt a 10♦. The dealer's hand value is 26. 
The dealer busts.

You win!

There are a lot of moving pieces in this one, so it may be appropriate to start by thinking about what you would need to do to write this script. What would you need to keep track of? What would you need to check for?

To begin, it is helpful to decide what data structures will be useful for each component of the game. For example...

Given that there are a lot of moving pieces here, I will give you the backbone of a script that you can use to get started. Note that the code chunk below can be copied, and that every comment that starts with # TODO is something for you to fill in.

spade <- "\U2660" # unicode for the suit symbols
heart <- "\U2665"
diamond <- "\U2666"
club <- "\U2663"

card.ranks <- c("A", as.character(2:10), "J", "Q", "K")
card.suits <- c(spade, heart, diamond, club)
total.card.count <- length(card.ranks) * length(card.suits)

### FUNCTIONS

createDeck <- function () {
    # function to create a deck of cards
    deck <- data.frame(
        matrix(
            ncol=2,
            nrow=0, 
            dimnames=list(NULL, c("suit", "rank"))
        )
    ) # makes an empty data frame with suit & rank columns
    # TODO: for each card suit 1-4...
        # TODO: for each card rank 1-13...
            deck[nrow(deck)+1, ] = c(suit.number, rank.number)
    return (deck)
}

shuffleDeck <- function (deck) {
    # function to shuffle a deck of cards
    shuffled.card.order <- # TODO: shuffle the numbers 1-52
    return (deck[shuffled.card.order, ])
}

dealCard <- function (current.deck) {
    # function to deal a card from a deck. Note that this
    # function returns a list, one item being the card drawn
    # and the other being what is left of the deck. Whenever
    # using this function, remember to update the deck with 
    # the new deck.
    new.card <- # TODO: get first row of current.deck
    return (list(
        card=new.card, 
        deck=# TODO: get all but first row of current.deck
    ))
}

getCardName <- function (suit.number, rank.number) {
    # function to get a string representation of a card
    return (paste(
        card.ranks[rank.number], 
        card.suits[suit.number]
    ))
}

getCardValue <- function (rank.number) {
    # function to get the value of a card from its rank.
    # Since all cards with ranks greater than 10 have a
    # value of 10, we can use the min function to get
    # the final value of the card like this:
    return (min(rank.number, 10))
}

### GAME

# TODO: create variables to track the value of the hands

# TODO: create the deck of cards
# TODO: shuffle the deck of cards

# TODO: use `cat` to print a welcome message
# TODO: draw 3 cards, 2 for the player and 1 for the dealer
# TODO: Print messages for both, including cards & hand values

# TODO: while the player hand value is less than 21...
    # TODO: ask the player if they want to draw a card
    # TODO: if the player says no...
        # TODO: break out of the loop
    # TODO: if the player says yes...
        # TODO: draw a card
        # TODO: print message with card and the new hand value
        # TODO: if value is greater than 21...
            # TODO: print player bust message

# TODO: if the player has not busted...
    # TODO: while dealer's hand is <17...
        # TODO: draw a card for the dealer
        # TODO: print message with card and the new hand value
        # TODO: if value is greater than 21...
            # TODO: print dealer bust message

# TODO: print winner message
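
The skeleton above leaves open how to ask the player a question. One possibility in base R is readline, which waits for typed input when the script is run interactively. The sketch below wraps it in a hypothetical helper, askToDraw, that keeps asking until the answer is yes or no. The read.answer argument is an assumption added so that the helper can be tried with a canned answer instead of real typing.

```r
askToDraw <- function(read.answer = readline) {
    while (TRUE) {
        answer <- read.answer("Would you like to draw a card? (yes/no) ")
        answer <- tolower(trimws(answer)) # ignore case and stray spaces
        if (answer %in% c("yes", "no")) {
            return(answer == "yes") # TRUE means draw, FALSE means stand
        }
        cat("Please answer yes or no.\n")
    }
}

# try the helper with a canned answer in place of typed input
askToDraw(function(prompt) "no") # FALSE
```

In the game itself, you would call askToDraw() with no arguments so that it falls back to readline.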