Comparison of Search and Sort Techniques in Programming
A look at some search and sort techniques commonly used in programming, comparing their uses and performance on increasing data lengths.

By Tim Trott | Software Engineering | October 25, 2001

1,238 words, estimated reading time 4 minutes.

| (0) | (0)

There are many best methods depending on what is to be sorted, on what machine, and for what purpose.Donald Knuth

Since there are many different search and sort techniques (algorithms), software engineers must know their basic operation, relevant efficiencies etc to select the best one for a particular application. One efficiency criterion that gives an idea of the relationship between processing time and the size of the array is the simple big-O (O - Order of magnitude) calculation.

The three basic types of sorting (insertion, exchange and selection) will be compared with regard to the basic differences in their operation, their relative efficiencies and how they are affected by small and large array sizes and the initially ordered state.

The different search and sort techniques (linear and binary) will be compared and the basic operation outlined. This includes how the absence of the required item can be found and under what condition each would be used.

big-O Notation

The big-O Notation is also sometimes called Landau's Symbol. It is a technique used to mathematically describe the asymptotical behaviour of functions. It is a guide to how fast the output of the function grows with the increase of the input. The results are relative times.

Some common big-O functions are shown below.

O(1) - Constant
O(log(n)) - Logarithmic
O((log(n))^c) - Polylogarithmic
O(n) - Linear
O(n²) - Quadratic
O(n^c) - Polynomial
O(cⁿ) - Exponential

Sorting Techniques

Insertion Sort

The insertion sort extracts the second data element from the data. It then compares the extracted element with the next and if the next is larger, the next element is moved up an index. The extracted element then takes its place. The process is repeated until no passes are made.

If the data is almost sorted, for example, the numbers (8, 1, 2, 3, 4, 5, 6, 7, 9) then the 8 is extracted, 1 is moved to the first index, 2 into the second and so on until a number greater than 8 is encountered (or end of data). The 8 is then inserted into the data.

In general, the maximum number of passes made is given by the formula N - 1, where N is the number of data elements.

The relative time of the algorithm is O(N²) for unsorted data. As the array size increases the time the algorithm takes to execute increases.

Exchange Sort

The Exchange Sort, also known as bubble sort, is the simplest sorting algorithm. The algorithm works by taking an item and comparing it with the next adjacent item. If the first item is greater than the next item then they are swapped. If the first item is less than or equal to the next then no change is made. There are several passes through the data.

In general, the maximum number of passes made is given by the formula N - 1, where N is the number of data elements.

The relatively short time of the algorithm is O(N²) for unsorted data.

There is also a bi-directional exchange sort that repeatedly makes a pass downwards and then upwards sorting the data.

Selection Sort

Sorts data by repeatedly comparing remaining items to find the smallest and moving it to the end of the data. The effective size (area of data unsorted) is then reduced and the process is repeated on the sub-array until the effective size is one (an array of one is sorted).

In general, the maximum number of passes made is given by the formula N - 1, where N is the number of data elements.

The relatively short time of the algorithm is O(N²) for unsorted data.

Shell Sort

The shell sort is the optimised version of the insertion sort. It takes the smallest value and moves all values above it down an index. The smallest value is then inserted at the start of the array. The process is repeated, with the next smallest value inserted below the last smallest value.

The shell sort has an average sort time of O(N^1.25).

Quick sort

The Quick Sort uses the divide and conquer paradigm to sort the data. The first element is the central one. The data on the left is sorted in the same way, then the data on the right. The algorithm is recursive so the data is recursively broken down into halves until the data cannot be broken down further.

The quick sort has a sort time of O(n log n).

As the initial ordered state becomes sorted the time increase.

Heap Sort

The heap sort builds a heap of the data and repeatedly extracts the maximum item.

To build the heap the algorithm loops through all items in the array and places the value on the heap. The item is pushed into its position on the tree (highest at the top and lowest at the bottom).

When all of the items have been placed in the tree, the values are then extracted back into their indexes in the array.

The heap sort has an time of O(n log n)

Visual Analysis

The graph below shows a comparison between the O(N²) un-optimised sorts and the optimised O(n log n) and O(N^1.25) sorts. The graph shows the time taken to sort the data.

The un-optimised sorts are nearly infinitely long to execute as the data size approaches 100. This would cause the algorithm to take a very long time to sort the data. The time increases quadratically. (The value is about 500 seconds or 8 minutes).

On the other hand, the optimised sort algorithms still perform well with over 500 data items.

Summary of Sort Algorithms

For small amounts of unsorted data the basic sort types can be used.
For Large amounts of unsorted data the optimised types should be used
As the Array size increases, the time taken increases
Time increases as the data becomes more sorted

Searching Techniques

Linear Search

A linear search starts at the first data item and moves towards the last item trying to find an item. Linear searches can be done on data that is unsorted or sorted; there is no difference in performance.

The absence of the item can only be found once all of the items have been looked at.

The time taken to perform the search is O(N/2).

Binary Search

Binary search will only work on sorted data. The technique is similar to the quick sort in that it partitions the array into progressive halves until the item is found. If the search item is less than the middle item then the array is narrowed to the lower half. If the search item is greater than the middle value then the array is narrowed to the upper half.

The absence of a data item can be found easily, as the data is partitioned. The number of partitions to find the data depends on the number of data items.

The time taken to perform the search is O(Log N).

Visual Analysis of Search Techniques

The graph below compares the search times of the two search techniques described above. It can be seen that the binary search is much faster than the linear search.