We do repetitive tasks all the time. However, this repetitiveness can be looked at from different perspectives. Here is a list of some examples:
- brushing your teeth
- cleaning your room
- running
- studying
In all of these examples, we are doing one task over and over until some end goal is achieved. In the above examples, the task is completed either (1) a certain number of times, or sometimes once for every object available, or alternatively, (2) over and over until some condition is met. These two approaches/perspectives are qualitatively distinct: in one case, you know how many times you'll be doing something from the get-go, whereas in the other, you simply keep performing the task until completion is achieved. To flesh out this concept, I will try to explain how these two approaches apply to each of the above examples.
End Goal | Repetitive Task | Perspective (1) | Perspective (2) |
---|---|---|---|
clean teeth | brush back and forth | brush 500 times | brush until teeth are clean |
clean room | pick up one item at a time | pick up each item on the ground | pick up items until room is clean |
run | put one foot in front of the other | put one foot in front of the other 1000 times | put one foot in front of the other until you reach your destination |
study | understand one topic at a time | understand each topic in the book | understand topics until you understand the whole curriculum |
These two different perspectives appear clearly through loops in coding, specifically, through (1) `for` and (2) `while` loops. Practically all languages have these two types of loops, and they are used to perform repetitive tasks. The difference between the two is that a `for` loop repeats code once for every object in a vector or list, whereas a `while` loop repeats code until some condition is met.
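As a minimal sketch of the two perspectives, the clean-room task from the table can be written both ways. The vector `items` and the counter `picked.up` are made up for illustration.

```r
# Hypothetical example data: items lying on the floor
items <- c("sock", "book", "mug")

# Perspective (1): a for loop runs once for every element of the vector
for (item in items) {
  cat("Picked up the", item, "\n")
}

# Perspective (2): a while loop keeps running as long as its condition is TRUE
picked.up <- 0
while (picked.up < length(items)) {
  picked.up <- picked.up + 1
  cat("Picked up item number", picked.up, "\n")
}
```

Both loops do three passes here, but only the `for` loop knows that number before it starts.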
`for` loops are generally preferred over `while` loops, and I avoid using `while` loops in virtually all cases. The reason for this is that `for` loops will run a fixed number of times, are easier to debug, and are more readable. `while` loops, on the other hand, will run until some condition is met, and this condition may never be met; this can lead to infinite loops. Although infinite loops will not cause your computer to explode, and can always be exited, they can leave you waiting, wondering if your code is working properly, only to find out that some bug caused an infinite loop.
Below, I will show you how to use both `for` and `while` loops. I will also show you how to use the `break` statement in these loops.
Generally, `for` loops in R are structured as follows:
for (iteration.variable in iterable) {
  # do something
}
A quick definition: an iterable is an object that is made up of elements. In R, these are almost always vectors or lists. `for` loops are unique in that they create a variable, `iteration.variable`, which is used inside of the `for` loop. This variable is assigned each element of `iterable` in turn, once per element, and can be used inside of the `for` loop to perform some task.
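A small sketch of this assignment, using a made-up vector called `fruits`: on each pass the iteration variable holds the next element, and in R it still exists after the loop ends, holding the last element.

```r
fruits <- c("apple", "banana", "cherry")  # hypothetical iterable
seen <- c()                               # records each value the variable took

for (fruit in fruits) {
  seen <- c(seen, fruit)
  cat("fruit is now:", fruit, "\n")
}

# The iteration variable is an ordinary variable, so it survives the loop
cat("After the loop, fruit is:", fruit, "\n")
```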
The variable defined in the `for` loop can be named however you please. However, this is one of the rare cases where the name of the variable is often short or abbreviated, such as `i` or `j`. This is because the variable is often just representing an index of some sort. When you can give the iteration variable a clear name, though, you should. This can be particularly helpful when your script has nested loops.
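To illustrate why clear names help in nested loops, here is a sketch with made-up data. With `i` and `j` it would be easy to mix up which variable belongs to which loop; `row.value` and `column.value` make that harder.

```r
# Hypothetical data for a tiny multiplication table
row.values <- 1:3
column.values <- 1:2

products <- c()  # collects one product per (row, column) pair
for (row.value in row.values) {
  for (column.value in column.values) {
    products <- c(products, row.value * column.value)
  }
}
print(products)
```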
Below, I will go over a few examples of how `for` loops can be used.
In this example, we will print the numbers 1 through 10 that are not divisible by 3 to the console. We can find the remainder of division in R by using the `%%` operator (called the modulo operator). For example, `5 %% 2` will return 1, because 5 divided by 2 has a remainder of 1. If the remainder after dividing by 3 is 0, then the number is divisible by 3, so we can write the following code:
for (number in 1:10) {
  if (number %% 3 != 0) {
    cat(number, "\n") # print the number, then a newline
  }
}
In this simple example, the variable `number` takes on the values 1 through 10, one at a time, and each time the code inside of the `for` loop is run. If the number is not divisible by 3, `cat(number, "\n")` is run, so only 1, 2, 4, 5, 7, 8, and 10 are printed to the console.
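Because `%%` works element by element on a whole vector, the same selection can also be sketched without a loop. This is only an alternative view of the example, and the names `remainders` and `not.divisible` are made up here.

```r
numbers <- 1:10
remainders <- numbers %% 3                 # one remainder per number
not.divisible <- numbers[remainders != 0]  # keep numbers with a nonzero remainder
print(not.divisible)
```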
You will likely remember that prime numbers are integers greater than 1 that are only divisible by 1 and themselves. We can use this fact to write a program that finds if a given number is prime. To do this, we can again use the `%%` operator. However, this time, we will use it to check if a number is divisible by any number between 2 and the number minus one. There are ways to make this more efficient, but this approach will work for now. Of course, a number is only proved to be prime if it is not divisible by any of these smaller numbers, so we should assume the number is prime, and only change our mind if we find a number that divides it.
potential.prime <- 1234321
is.prime <- TRUE # assume the number is prime
for (factor in 2:(potential.prime-1)) {
  if (potential.prime %% factor == 0) {
    is.prime <- FALSE # conclude number isn't prime
    break # exit the loop early
  }
}
if (is.prime) {
  cat(potential.prime, "is prime.")
} else {
  cat(potential.prime, "is not prime.")
}
In the `for` loop, we check if the potential prime number is divisible by smaller numbers. If it is divisible by any of those numbers, we set `is.prime` to `FALSE`. Since we now know that the number is not prime, we can exit the loop using the `break` statement. The `break` statement can be used when the rest of the loop is no longer needed, and it will exit the loop immediately.
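The text mentions that this check can be made more efficient. One common way, sketched here rather than taken from the tutorial, is to test factors only up to the square root of the number, since any larger factor would pair with a smaller one. The helper name `isPrime` is hypothetical, and the early `return` plays the same role as `break`.

```r
# isPrime is a hypothetical helper name, not a base R function
isPrime <- function(candidate) {
  if (candidate < 2) {
    return(FALSE)  # 0 and 1 are not prime
  }
  largest.factor <- floor(sqrt(candidate))
  if (largest.factor < 2) {
    return(TRUE)  # 2 and 3 have no smaller factors to check
  }
  for (factor in 2:largest.factor) {
    if (candidate %% factor == 0) {
      return(FALSE)  # found a divisor, so stop checking
    }
  }
  return(TRUE)
}

isPrime(1234321)
isPrime(97)
```

The explicit checks for small inputs matter because `2:1` in R counts downward to give `c(2, 1)` rather than an empty sequence.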
We can extend this example to find all the prime numbers between 1 and 1000. To do this, we can use a nested `for` loop, or in other words, a `for` loop inside another `for` loop. We can increase the efficiency of this program by using the fact that we only need to check if a number is divisible by smaller prime numbers. This is because if a number is divisible by a non-prime number, it is also divisible by the prime factors of that number. For example, if a number is divisible by 6, it is also divisible by 2 and 3. Therefore, we can use the following code:
prime.vector <- c() # create an empty vector to store primes

for (potential.prime in 2:1000) {
  is.prime <- TRUE # assume the number is prime

  for (factor in prime.vector) {
    if (potential.prime %% factor == 0) {
      is.prime <- FALSE # conclude number isn't prime
      break # exit the loop early
    }
  }

  if (is.prime) {
    prime.vector <- c(prime.vector, potential.prime)
    # add the confirmed prime to the vector
  }
}

cat("The primes between 1 and 1000 are:", prime.vector)
This is certainly a lot to take in, so it is worth breaking this concept into smaller pieces. Note that the inside of the outer `for` loop is practically identical to the previous code chunk. The only differences are that we check divisibility by prime numbers only, and that we store new prime numbers in `prime.vector` instead of printing a message. Otherwise, the code is the same. The outer loop only serves to check subsequent numbers for being prime.
This code may feel complicated, but it is an incredibly helpful exercise to walk through the code line by line, and think about how R is executing this code:
Initially, `prime.vector` is empty. When the outer loop starts, `potential.prime` is defined and takes on its first value of 2. Next, `is.prime` is assigned to be `TRUE`. Then the inner loop is started. However, thus far, `prime.vector` is empty, so there is nothing for the inner loop to do, and it is skipped entirely. Therefore, `is.prime` is still `TRUE`, and thus 2 is added to `prime.vector` in the final `if` statement.
With the completion of the first pass through the outer loop, `potential.prime` takes on the value of 3, and the program jumps back to the top of the outer loop's body and starts again. The program proceeds as last time, except that now `prime.vector` is no longer empty, so the inner loop is run. For your own benefit, you should think through the next few iterations of this outer loop to get the hang of what is going on.
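To check your hand trace of the first few passes, a sketch like this one may help. It is the same loop shortened to 2 through 6, with extra `cat` calls added purely for illustration.

```r
prime.vector <- c()
for (potential.prime in 2:6) {
  # show which known primes this number will be tested against
  cat("Checking", potential.prime, "against:", prime.vector, "\n")
  is.prime <- TRUE
  for (factor in prime.vector) {
    if (potential.prime %% factor == 0) {
      cat("  divisible by", factor, "so not prime\n")
      is.prime <- FALSE
      break
    }
  }
  if (is.prime) {
    prime.vector <- c(prime.vector, potential.prime)
  }
}
print(prime.vector)
```

On the first pass nothing is printed after "against:", which matches the empty `prime.vector` described above.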
Lists in R can also be used in `for` loops. In this example, we will use a `for` loop to edit each vector in a list.
Imagine that we want to make a single message out of several character vectors. You have a list of character vectors that contain the names of a bunch of doctors, with each entry corresponding to a different hospital. You want to print a message that lists all the doctors at each hospital in this particular format, with each line representing one hospital:
[Name 1], [Name 2], [Name 3]
[Name 1], [Name 2], [Name 3]
[Name 1], [Name 2], [Name 3]
However, you want to make the letters all capitalized and add "MD" to the end of each name if it isn't already there. This could be done manually, but that would be tedious. Instead, you can use a `for` loop to do this for you. Copy the following list definition into a file to create the list of doctors (copying is usually restricted but is allowed here):
doctors.by.hospital <- list(
  c("olivia bennett MD", "ETHAN HAYES MD", "Mia Rodriguez"),
  c("Liam Parker MD", "Ava mitchell", "sophia ramirez MD"),
  c("NOAH TURNER", "Isabella carter", "jackson Foster")
)
Now, we can use a `for` loop to edit each vector in the list. Since there are multiple elements in each vector, we will need to use a nested `for` loop. In order to show how user-defined functions might be used in a situation like this, I will define a function that makes the appropriate corrections to each element of the character vectors, and then I will use a `for` loop to apply this function to each vector in the list.
fixDoctorName <- function(doctor.name) {
  # make all letters capitalized
  doctor.name <- toupper(doctor.name)
  # if the name doesn't end with " MD"...
  if (!endsWith(doctor.name, " MD")) {
    # ...add " MD" to the end
    doctor.name <- paste(doctor.name, "MD")
  }
  return(doctor.name)
}
This function fixes each name per our specifications. Now, our `for` loop will be simpler than if we had not used a function.
for (hospital in doctors.by.hospital) {
  for (i in 1:length(hospital)) {
    hospital[i] <- fixDoctorName(hospital[i])
  }
  cat(paste(hospital, collapse=", "), "\n")
}
Here, you can see that we were able to iterate over the list `doctors.by.hospital` directly in the outer loop (i.e., we didn't use `1:length(doctors.by.hospital)`). This means that the variable `hospital` became each character vector in the list, instead of being a number like we saw in the previous examples. However, in the inner loop, the `i` variable was indeed always a number, taking on each value from 1 to `length(hospital)` (which was always 3 in this case). We applied our new function to the `i`th element of the vector, and saved over the previous value. Then, at the end of each pass through the outer loop, the `cat` call printed the vector to the console. This was done for each vector in the list to achieve the final result.
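One detail worth knowing: inside the loop, `hospital` is a copy of each list element, so the edits are printed but the original list itself is left unchanged. If you wanted to keep the cleaned names, one option is to loop over positions and write each vector back. This is a sketch only; the list is shortened and the name-fixing steps are repeated inline so that it runs on its own.

```r
# Shortened, hypothetical copy of the list of doctors
doctors.by.hospital <- list(
  c("olivia bennett MD", "Mia Rodriguez"),
  c("NOAH TURNER", "Isabella carter")
)

for (hospital.index in seq_along(doctors.by.hospital)) {
  hospital <- doctors.by.hospital[[hospital.index]]
  for (i in seq_along(hospital)) {
    name <- toupper(hospital[i])
    if (!endsWith(name, " MD")) {
      name <- paste(name, "MD")
    }
    hospital[i] <- name
  }
  # write the edited vector back into the list
  doctors.by.hospital[[hospital.index]] <- hospital
}
print(doctors.by.hospital)
```

Here `seq_along(x)` is used in place of `1:length(x)` because it also behaves sensibly when `x` is empty.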
You'll see that I used the `collapse` argument of the `paste` function, which allows you to turn a character vector of many elements into a single string, with each element separated by the string you specify. In this case, I used a comma and a space. Whatever is given to the `collapse` argument is placed between each element of the vector. Look below to see the difference between when to use the `collapse` argument and when to use the `sep` argument.
paste(c("a", "b", "c"), collapse=", ")
paste("a", "b", "c", sep=", ")
# these give the same output, but the first has a vector as
# input and uses collapse while the second has
# individual elements as input and uses sep
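The two arguments can also be combined in one call. In this sketch with made-up names, `sep` joins the matching elements of two vectors, and `collapse` then joins the resulting strings into one.

```r
first.names <- c("Ada", "Grace")        # hypothetical example data
last.names <- c("Lovelace", "Hopper")

# sep works across the arguments, pairing up elements
full.names <- paste(first.names, last.names, sep=" ")

# collapse works along a single vector
one.string <- paste(full.names, collapse=", ")

# both at once: sep is applied first, then collapse
combined <- paste(first.names, last.names, sep=" ", collapse=", ")
print(combined)
```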
As mentioned above, `while` loops are generally avoided when `for` loops can be used instead. However, I will introduce the structure of a `while` loop, and give a couple of examples of how it could be used:
while (condition) {
  # do something
}
Simply put, a `while` loop is just an `if` statement that repeats itself until the condition is no longer true. This means that the code inside the `while` loop must change the condition in some way, or else the loop will never end. For example, the following code will keep running until you stop it:
while (TRUE) {
  cat("This will never end!\n")
  Sys.sleep(0.5) # wait 0.5 seconds
}
Even though this creates an infinite loop, you should try out this example in your console. I add a delay of half a second so that your console will not immediately fill with the message. When you get tired of it, you can press the stop sign in the top right corner of the console to stop the code from running. This is the same way you would stop an infinite loop in a script.
In the example above, the condition is literally always `TRUE`, so the loop will never end. However, normal applications of a `while` loop will have a condition that is initially `TRUE`, but will eventually become `FALSE` after some number of iterations.
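A sketch of such a condition, with made-up numbers: the loop body changes `balance`, so the condition eventually becomes `FALSE`. The extra `years < max.years` check is a safety cap I am adding for illustration, so that a bug in the body could not turn this into an infinite loop.

```r
balance <- 100      # hypothetical starting value
target <- 200
years <- 0
max.years <- 1000   # safety cap: the loop stops here no matter what

while (balance < target && years < max.years) {
  balance <- balance * 1.05  # grow by 5% on each pass
  years <- years + 1
}
cat("Reached", balance, "after", years, "years\n")
```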
Below are a few ways that a `while` loop could be used.
If you are interested to see how many times a number can be divided by 2 before it is less than 1, you can simply use a logarithm with base 2 and then round down to the nearest integer:
number <- 1234321
cat(
  "The number", number, "can be divided by two",
  floor(log2(number)), "times."
)
This is because a base 2 logarithm of some number asks the question "2 to what power is equal to this number?" For example, `log2(8)` is 3, because 2 to the power of 3 is 8. If we round the base 2 logarithm of a number down to the nearest integer (i.e., use the `floor` function), we get the largest integer exponent for which the power of 2 does not exceed the number of interest.
However, perhaps we want to print a message for every step of the division. We can use a `while` loop to do this:
number <- 1234321
current.number <- number
count <- 0
while (current.number > 1) {
  current.number <- current.number / 2
  count <- count + 1
  cat(
    number, "divided by two", count,
    "times gives", current.number, "\n"
  )
}
Here, we use a `while` loop to divide `current.number` by 2 until it is no longer greater than 1, keeping track of how many times we have divided by 2 with `count`. On each pass, we print a message that shows the original number, how many times it has been divided by 2, and the current number. Note that the final division is the one that takes the number below 1, so for 1234321 the loop runs 21 times, one more than the 20 given by `floor(log2(number))`.
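If the count should agree with `floor(log2(number))`, one option is to keep dividing only while the result would still be at least 1. This is a sketch of that variant, not the loop from the tutorial.

```r
number <- 1234321
current.number <- number
count <- 0

# dividing a value of 2 or more by 2 leaves a result of at least 1
while (current.number >= 2) {
  current.number <- current.number / 2
  count <- count + 1
}

cat("Loop count:", count, " floor(log2(number)):", floor(log2(number)), "\n")
```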
The factorial of a positive integer is the product of all the integers from 1 to that number. For example, 5 factorial is `5*4*3*2*1`, or 120. Although we can simply calculate this using the `factorial()` function, we can also use a `while` loop:
input.number <- 5
result <- 1
counter <- 1
while (counter <= input.number) {
  result <- result * counter
  counter <- counter + 1
}
cat("The factorial of", input.number, "is", result, "\n")
Walk through the above loop step-by-step, reading it like R would, to see how it works. The variable `input.number` is small enough that you can do this on paper relatively quickly.
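Since the number of repetitions is known before the loop starts, the same calculation also fits a `for` loop. This sketch is for comparison only; `seq_len(input.number)` gives the values 1 up to `input.number`.

```r
input.number <- 5
result <- 1

# the for loop handles the counting that the while loop did by hand
for (counter in seq_len(input.number)) {
  result <- result * counter
}

cat("The factorial of", input.number, "is", result, "\n")
```

Note that there is no `counter <- counter + 1` line to forget here, which is one reason `for` loops tend to be easier to debug.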
In this tutorial, I introduced `for` and `while` loops. I showed you how to use them, and I gave a few examples of how they could be used. I also showed you how to use the `break` statement to exit a `for` loop early. It should be noted that `break` statements can also be used in `while` loops, and have the same function.
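As a brief sketch of `break` inside a `while` loop (the task and variable names are made up), this finds the first power of 2 greater than 1000. The condition is always `TRUE`, so `break` is the only way out.

```r
power <- 1
exponent <- 0

while (TRUE) {
  if (power > 1000) {
    break  # exit the loop as soon as the target is passed
  }
  power <- power * 2
  exponent <- exponent + 1
}

cat("2 to the power of", exponent, "is", power, "\n")
```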
In a practice problem of a previous chapter, I asked you to run the following code, which included a `for` loop.
set.seed(123)
number.of.flips <- 100
number.of.trials <- 10000
longest.streaks <- c()
for (i in 1:number.of.trials) {
  flips <- sample(c("H", "T"), number.of.flips, replace=TRUE)
  run.lengths <- rle(flips)$lengths
  longest.streaks <- c(longest.streaks, max(run.lengths))
}
mean(longest.streaks)
Now, you should be able to understand what this code is doing. In this previous practice problem, I asked you to run the code many times with different numbers of trials, recording outcomes by hand and then plotting the results to see how the average longest streak changed as the number of trials increased. Now, you can do this automatically with a second `for` loop. Type the code above into an R script, and use an additional outer `for` loop to run the code with different numbers of trials. Refer to the initial version of this question for more details if necessary (link in the first sentence of this problem).
This can seem intimidating, so I will write some pseudocode to help you get started:
make a vector of trial numbers
make a vector that will store the average longest streaks for each trial number
for each trial number:
  run the code above, but using that trial number
  store the result in the vector of average longest streaks
show results in a plot
You may have seen Pascal's triangle before. It is a triangle of numbers that is frequently used when calculating probabilities. The first five rows of it look like this:
    1
   1 1
  1 2 1
 1 3 3 1
1 4 6 4 1
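This triangle can continue for as many rows as you like. To make it, follow these rules:
- Each row has one more number than the row above it.
- The first and last numbers of every row are 1.
- Every other number is the sum of the two numbers directly above it.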
For this problem, create a function that takes a number $n$ as input and returns the first $n$ rows of Pascal's triangle in a list. For example, if $n=4$, the function should return a list with four elements that looks like this:
[[1]]
[1] 1
[[2]]
[1] 1 1
[[3]]
[1] 1 2 1
[[4]]
[1] 1 3 3 1
This problem is tricky and will require some serious thought. I will give you some pseudocode to get started, though you should feel free to deviate from it if you have a different approach.
define function getPascalRows with input number.rows
  make an empty list that is number.rows elements long
  make a variable previous.row that is given an initial value of NULL
  for each number (call it current.row.number) from 1 to number.rows:
    make a vector (call it current.row) that is current.row.number long
    set the first and last elements of current.row to be 1
    if current.row.number is 3 or greater:
      for each number (call it current.entry) from 2 to current.row.number-1:
        set that number to the sum of whatever is at current.entry-1 and current.entry of previous.row
    insert current.row to the list at index current.row.number
    set previous.row to be current.row
  return the list
Each line of pseudocode can be turned into a line of R code. I have done the break-down of the problem for you, but you will need to fill in the details. This problem is not easy, so do not be discouraged if this is a difficult process. Ask for help if you need it!
Once you complete that problem, try to make a second function that takes the output of the first function and prints it to the console in the format shown above. It should clearly show a triangle shape, and each row should be centered with the row above it. A few hints to help:
- Use the `paste0` function to turn a vector of numbers into a single string, with each element separated by a space.
- To make a number take up a fixed width, use `sprintf("%*s", number.spaces, your.number)`. Don't worry about the details of that for now. This will make the number take up `number.spaces` spaces. For example, `sprintf("%*s", 3, 1)` gives `"  1"` (note the two initial spaces) and `sprintf("%*s", 3, 23)` gives `" 23"`.
- To print a row, use `cat(current.row, "\n")`, where the `"\n"` means "new line".
One enlightening practice is to think about how lists of numbers are sorted. There are many ways to sort a list of numbers, and some are more efficient than others. In this practice problem, I will ask you to write a function that sorts a list of numbers using `for` loops. You can use any approach that works, but I will try to give you helpful guidance. To begin, consider the following list of numbers:
4 2 7 3 9 8
Begin by literally sorting this list by hand, writing down the result on a paper. Once you have your result, ask yourself how you were able to get to that answer. What steps did you take? Write down these steps you took to sort the list. How might you explain the steps you took to someone else who does not understand the concept of sorting? This is how you write a sorting algorithm.
This is not an easy problem, so I will give you some things to think about. However, you should give the problem a proper go first before reading these hints.
This is a very fun project that can help you see what coding is capable of. The idea is that you will write a script that gives the user prompts as they play a game of blackjack. The full game is somewhat complicated, so we'll use a simpler set of rules for the game:
Since the job of the dealer requires no decision-making, you can write a script that plays the role of the dealer. The script should "deal cards" to the player and dealer, and then play the game according to the rules above. For example, the script might play out like any of the following three examples (note that the user input is only `yes` or `no` in all cases):
Welcome to blackjack! Let's play.
You are dealt a 7♠ and a 10♦. Your hand value is 17.
The dealer is dealt a 4♣.
Would you like to draw a card? (yes/no) yes
You are dealt a 5♠. Your hand value is 22.
You bust.
The dealer wins.
Welcome to blackjack! Let's play.
You are dealt a 7♦ and a 2♠. Your hand value is 9.
The dealer is dealt a J♣.
Would you like to draw a card? (yes/no) yes
You are dealt a 4♠. Your hand value is 13.
Would you like to draw a card? (yes/no) yes
You are dealt a 6♦. Your hand value is 19.
Would you like to draw a card? (yes/no) no
You stand.
The dealer is dealt a 6♠. The dealer's hand value is 11.
The dealer draws a card.
The dealer is dealt a 10♠. The dealer's hand value is 21.
The dealer wins.
Welcome to blackjack! Let's play.
You are dealt a 10♠ and a 8♦. Your hand value is 18.
The dealer is dealt a 9♣.
Would you like to draw a card? (yes/no) no
You stand.
The dealer is dealt a 7♠. The dealer's hand value is 16.
The dealer draws a card.
The dealer is dealt a 10♦. The dealer's hand value is 26.
The dealer busts.
You win!
There are a lot of moving pieces in this one, so it may be appropriate to start by thinking about what you would need to do to write this script. What would you need to keep track of? What would you need to check for?
To begin, it is helpful to decide what data structures will be useful for each component of the game. For example...
`for` loops to loop through the suits and ranks.
Given that there are a lot of moving pieces here, I will give you the backbone of a script that you can use to get started. Note that the code chunk below can be copied, and that every comment that starts with `# TODO` is something for you to fill in.
spade <- "\U2660" # unicode for the suit symbols
heart <- "\U2665"
diamond <- "\U2666"
club <- "\U2663"
card.ranks <- c("A", as.character(2:10), "J", "Q", "K")
card.suits <- c(spade, heart, diamond, club)
total.card.count <- length(card.ranks) * length(card.suits)
### FUNCTIONS
createDeck <- function () {
  # function to create a deck of cards
  deck <- data.frame(
    matrix(
      ncol=2,
      nrow=0,
      dimnames=list(NULL, c("suit", "rank"))
    )
  ) # makes an empty data frame with suit & rank columns
  # TODO: for each card suit 1-4...
    # TODO: for each card rank 1-13...
      deck[nrow(deck)+1, ] = c(suit.number, rank.number)
  return (deck)
}

shuffleDeck <- function (deck) {
  # function to shuffle a deck of cards
  shuffled.card.order <- # TODO: shuffle the numbers 1-52
  return (deck[shuffled.card.order, ])
}

dealCard <- function (current.deck) {
  # function to deal a card from a deck. Note that this
  # function returns a list, one item being the card drawn
  # and the other being what is left of the deck. Whenever
  # using this function, remember to update the deck with
  # the new deck.
  new.card <- # TODO: get first row of current.deck
  return (list(
    card=new.card,
    deck=# TODO: get all but first row of current.deck
  ))
}

getCardName <- function (suit.number, rank.number) {
  # function to get a string representation of a card
  return (paste(
    card.ranks[rank.number],
    card.suits[suit.number]
  ))
}

getCardValue <- function (rank.number) {
  # function to get the value of a card from its rank.
  # Since all cards with ranks greater than 10 have a
  # value of 10, we can use the min function to get
  # the final value of the card like this:
  return (min(rank.number, 10))
}
### GAME
# TODO: create variables to track the value of the hands
# TODO: create the deck of cards
# TODO: shuffle the deck of cards
# TODO: use `cat` to print a welcome message
# TODO: draw 3 cards, 2 for the player and 1 for the dealer
# TODO: Print messages for both, including cards & hand values
# TODO: while the player hand value is less than 21...
    # TODO: ask the player if they want to draw a card
    # TODO: if the player says no...
        # TODO: break out of the loop
    # TODO: if the player says yes...
        # TODO: draw a card
        # TODO: print message with card and the new hand value
        # TODO: if value is greater than 21...
            # TODO: print player bust message
# TODO: if the player has not busted...
    # TODO: while dealer's hand is <17...
        # TODO: draw a card for the dealer
        # TODO: print message with card and the new hand value
        # TODO: if value is greater than 21...
            # TODO: print dealer bust message
# TODO: print winner message