Insertion Sort
Introduction
Input: A sequence of n numbers ( a1, a2, . . . , an ).
Output: A permutation (reordering) ( a'1, a'2, . . . , a'n ) of the input sequence such
that a'1 ≤ a'2 ≤ . . . ≤ a'n.
The numbers that we wish to sort are also known as the keys. Although conceptually we are sorting a sequence, the input comes to us in the form of an array with n
elements.
We start with insertion sort, which is an efficient algorithm for sorting a small
number of elements. Insertion sort works the way many people sort a hand of
playing cards. We start with an empty left hand and the cards face down on the
table. We then remove one card at a time from the table and insert it into the
correct position in the left hand. To find the correct position for a card, we compare
it with each of the cards already in the hand, from right to left. At all times, the cards held in the left hand are sorted, and these cards
were originally the top cards of the pile on the table.
*Figure: Sorting a hand of cards using insertion sort.*
We present our pseudocode for insertion sort as a procedure called INSERTION-SORT, which takes as a parameter an array A[1 . . n] containing a sequence of length n that is to be sorted. (In the code, the number n of elements in A is denoted by A.length.) The algorithm sorts the input numbers in place: it rearranges the numbers within the array A, with at most a constant number of them stored outside the array at any time. The input array A contains the sorted output sequence when the INSERTION-SORT procedure is finished.
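The procedure itself, with the line numbering used throughout the discussion below (lines 1–8):

```
INSERTION-SORT(A)
1  for j = 2 to A.length
2      key = A[j]
3      // Insert A[j] into the sorted sequence A[1 .. j - 1].
4      i = j - 1
5      while i > 0 and A[i] > key
6          A[i + 1] = A[i]
7          i = i - 1
8      A[i + 1] = key
```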
Loop invariants and the correctness of insertion sort
The figure above shows how this algorithm works for A = (5, 2, 4, 6, 1, 3). The index j indicates the “current card” being inserted into the hand. At the beginning
of each iteration of the for loop, which is indexed by j, the subarray consisting
of elements A[1 . . j - 1] constitutes the currently sorted hand, and the remaining
subarray A[j + 1 . . n] corresponds to the pile of cards still on the table. In fact,
elements A[1 . . j - 1] are the elements originally in positions 1 through j - 1, but
now in sorted order. We state these properties of A[1 . . j - 1] formally as a loop
invariant:
At the start of each iteration of the for loop of lines 1–8, the subarray
A[1 . . j -1] consists of the elements originally in A[1 . . j - 1], but in sorted
order.
We use loop invariants to help us understand why an algorithm is correct. We
must show three things about a loop invariant:
Initialization: It is true prior to the first iteration of the loop.
Maintenance: If it is true before an iteration of the loop, it remains true before the
next iteration.
Termination: When the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct.
Loop invariant vs Mathematical induction
When the first two properties hold, the loop invariant is true prior to every iteration
of the loop. (Of course, we are free to use established facts other than the loop
invariant itself to prove that the loop invariant remains true before each iteration.)
Note the similarity to mathematical induction, where to prove that a property holds,
you prove a base case and an inductive step. Here, showing that the invariant holds
before the first iteration corresponds to the base case, and showing that the invariant
holds from iteration to iteration corresponds to the inductive step.
The third property is perhaps the most important one, since we are using the loop
invariant to show correctness. Typically, we use the loop invariant along with the
condition that caused the loop to terminate. The termination property differs from
how we usually use mathematical induction, in which we apply the inductive step
infinitely; here, we stop the “induction” when the loop terminates.
Applying loop invariant for insertion sort
- Initialization: We start by showing that the loop invariant holds before the first loop iteration, when j = 2. The subarray A[1 . . j - 1], therefore, consists of just the single element A[1], which is in fact the original element in A[1]. Moreover, this subarray is sorted (trivially, of course), which shows that the loop invariant holds prior to the first iteration of the loop.
- Maintenance: Next, we tackle the second property: showing that each iteration
maintains the loop invariant. Informally, the body of the for loop works by
moving A[j - 1], A[j - 2], A[j - 3], and so on by one position to the right
until it finds the proper position for A[j] (lines 4–7), at which point it inserts
the value of A[j] (line 8). The subarray A[1 . . j] then consists of the elements
originally in A[1 . . j] , but in sorted order. Incrementing j for the next iteration
of the for loop then preserves the loop invariant.
A more formal treatment of the second property would require us to state and show a loop invariant for the while loop of lines 5–7. At this point, however, we prefer not to get bogged down in such formalism, and so we rely on our informal analysis to show that the second property holds for the outer loop.
- Termination: Finally, we examine what happens when the loop terminates. The condition causing the for loop to terminate is that j > A.length = n. Because each loop iteration increases j by 1, we must have j = n + 1 at that time. Substituting n + 1 for j in the wording of the loop invariant, we have that the subarray A[1 . . n] consists of the elements originally in A[1 . . n], but in sorted order. Observing that the subarray A[1 . . n] is the entire array, we conclude that the entire array is sorted. Hence, the algorithm is correct.
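The argument above can be exercised directly in code. Below is a sketch in Python (0-indexed, so the invariant concerns the prefix a[0 . . j-1]); the assert is illustrative, mirroring the loop invariant, and is not part of the algorithm itself:

```python
def insertion_sort(a):
    """Sort the list a in place and return it.

    0-indexed translation of INSERTION-SORT: before each iteration of
    the outer loop, a[0:j] holds the original first j elements of a,
    in sorted order.
    """
    for j in range(1, len(a)):
        # Loop invariant (checked here only for illustration): a[0:j] is sorted.
        assert all(a[i] <= a[i + 1] for i in range(j - 1)), "invariant violated"
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]  # shift elements greater than key one slot right
            i -= 1
        a[i + 1] = key       # insert key into its proper position
    return a
```

On the running example, `insertion_sort([5, 2, 4, 6, 1, 3])` produces `[1, 2, 3, 4, 5, 6]`.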
Analysis of insertion sort
The time taken by the INSERTION-SORT procedure depends on the input: sorting a
thousand numbers takes longer than sorting three numbers. Moreover, INSERTION-SORT can take different amounts of time to sort two input sequences of the same
size depending on how nearly sorted they already are. In general, the time taken
by an algorithm grows with the size of the input, so it is traditional to describe the
running time of a program as a function of the size of its input. To do so, we need
to define the terms “running time” and “size of input” more carefully.
The best notion for input size depends on the problem being studied. For many
problems, such as sorting or computing discrete Fourier transforms, the most natural measure is the number of items in the input—for example, the array size n
for sorting. For many other problems, such as multiplying two integers, the best
measure of input size is the total number of bits needed to represent the input in
ordinary binary notation. Sometimes, it is more appropriate to describe the size of
the input with two numbers rather than one. For instance, if the input to an algorithm is a graph, the input size can be described by the numbers of vertices and
edges in the graph. We shall indicate which input size measure is being used with
each problem we study.
The running time of an algorithm on a particular input is the number of primitive
operations or “steps” executed. It is convenient to define the notion of step so
that it is as machine-independent as possible. For the moment, let us adopt the
following view. A constant amount of time is required to execute each line of our
pseudocode. One line may take a different amount of time than another line, but
we shall assume that each execution of the ith line takes time ci, where ci is a
constant.
In the following discussion, our expression for the running time of INSERTION-SORT will evolve from a messy formula that uses all the statement costs ci to a much simpler notation that is more concise and more easily manipulated. This simpler notation will also make it easy to determine whether one algorithm is more efficient than another.
We start by presenting the INSERTION-SORT procedure with the time “cost” of each statement and the number of times each statement is executed. For each j = 2, 3, . . . , n, where n = A.length, we let tj denote the number of times the while loop test in line 5 is executed for that value of j. When a for or while loop exits in the usual way (i.e., due to the test in the loop header), the test is executed one time more than the loop body. We assume that comments are not executable statements, and so they take no time.
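The cost and times columns referred to can be tabulated alongside the pseudocode (tj is the quantity just defined):

```
INSERTION-SORT(A)                                   cost   times
1  for j = 2 to A.length                            c1     n
2      key = A[j]                                   c2     n - 1
3      // Insert A[j] into the sorted
       // sequence A[1 .. j - 1].                   0      n - 1
4      i = j - 1                                    c4     n - 1
5      while i > 0 and A[i] > key                   c5     Σ_{j=2}^{n} tj
6          A[i + 1] = A[i]                          c6     Σ_{j=2}^{n} (tj - 1)
7          i = i - 1                                c7     Σ_{j=2}^{n} (tj - 1)
8      A[i + 1] = key                               c8     n - 1
```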
The running time of the algorithm is the sum of running times for each statement executed; a statement that takes ci steps to execute and executes n times will contribute cin to the total running time. To compute T (n), the running time of INSERTION-SORT on an input of n values, we sum the products of the cost and times columns, obtaining :
T(n) = c1n + c2(n − 1) + c4(n − 1) + c5 Σ_{j=2}^{n} tj + c6 Σ_{j=2}^{n} (tj − 1) + c7 Σ_{j=2}^{n} (tj − 1) + c8(n − 1)
Even for inputs of a given size, an algorithm’s running time may depend on
which input of that size is given. For example, in INSERTION-SORT, the best
case occurs if the array is already sorted. For each j = 2, 3, . . . , n, we then find
that A[i] ≤ key in line 5 when i has its initial value of j - 1. Thus tj = 1 for
j = 2, 3, . . . , n, and the best-case running time is
T (n) = c1n + c2(n - 1) + c4(n - 1) + c5(n - 1) + c8(n - 1)
= (c1 + c2 + c4 + c5 + c8)n - (c2 + c4 + c5 + c8) .
We can express this running time as an + b for constants a and b that depend on
the statement costs ci; it is thus a linear function of n.
If the array is in reverse sorted order—that is, in decreasing order—the worst case results. We must compare each element A[j] with each element in the entire sorted subarray A[1 . . j - 1], and so tj = j for j = 2, 3, . . . , n. Noting that
Σ_{j=2}^{n} j = n(n + 1)/2 − 1
and
Σ_{j=2}^{n} (j − 1) = n(n − 1)/2 ,
we find that in the worst case, the running time of INSERTION-SORT is

T(n) = c1n + c2(n − 1) + c4(n − 1) + c5 (n(n + 1)/2 − 1) + c6 (n(n − 1)/2) + c7 (n(n − 1)/2) + c8(n − 1)
     = (c5/2 + c6/2 + c7/2) n² + (c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8) n − (c2 + c4 + c5 + c8) .
We can express this worst-case running time as an² + bn + c for constants a, b, and c that again depend on the statement costs ci; it is thus a quadratic function of n. Typically, as in insertion sort, the running time of an algorithm is fixed for a given input.
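These best- and worst-case counts are easy to check empirically. The Python sketch below instruments the while-loop test (the tj totals from the analysis); for a sorted input the total number of tests is n − 1, and for a reverse-sorted input it is n(n + 1)/2 − 1:

```python
def insertion_sort_count(a):
    """Insertion sort that also counts executions of the while-loop test
    (the sum of the t_j from the analysis). Returns (sorted list, tests)."""
    tests = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while True:
            tests += 1  # one execution of the line-5 test
            if not (i >= 0 and a[i] > key):
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a, tests

n = 8
_, best = insertion_sort_count(list(range(n)))          # already sorted
_, worst = insertion_sort_count(list(range(n, 0, -1)))  # reverse sorted
# best case:  t_j = 1 for each j, so n - 1 tests in total
# worst case: t_j = j, so sum_{j=2}^{n} j = n(n + 1)/2 - 1 tests in total
```

For n = 8 this gives 7 tests in the best case and 35 in the worst case, matching the formulas above.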
Order of growth
We used some simplifying abstractions to ease our analysis of the INSERTION-SORT procedure. First, we ignored the actual cost of each statement, using the
constants ci to represent these costs. Then, we observed that even these constants
give us more detail than we really need: we expressed the worst-case running time
as an² + bn + c for some constants a, b, and c that depend on the statement
costs ci. We thus ignored not only the actual statement costs, but also the abstract
costs ci.
We shall now make one more simplifying abstraction: it is the rate of growth,
or order of growth, of the running time that really interests us. We therefore consider only the leading term of a formula (e.g., an²), since the lower-order terms are
relatively insignificant for large values of n. We also ignore the leading term’s constant coefficient, since constant factors are less significant than the rate of growth
in determining computational efficiency for large inputs. For insertion sort, when
we ignore the lower-order terms and the leading term’s constant coefficient, we are
left with the factor of n² from the leading term. We write that insertion sort has a
worst-case running time of Θ(n²) (pronounced “theta of n-squared”).
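To see numerically why the lower-order terms and constant factors fade, here is a small sketch using hypothetical constants a, b, c (any positive values exhibit the same trend):

```python
# Hypothetical constants for a quadratic running time a*n^2 + b*n + c.
a, b, c = 2, 100, 1000

def leading_fraction(n):
    """Fraction of the total a*n^2 + b*n + c contributed by the leading term."""
    total = a * n * n + b * n + c
    return (a * n * n) / total

# As n grows, the leading term accounts for almost all of the total.
ratios = [leading_fraction(n) for n in (10, 1_000, 100_000)]
```

At n = 100,000 the leading term already contributes more than 99.9% of the total, which is why discarding the rest changes nothing about the order of growth.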