Relevant are algorithms sorting how?
In a sentence, word order is essential to convey the intended message. The same applies to programming: Having sorted datasets makes it possible to organize massive amounts of data while making it quickly searchable.
That’s why you often find sorting algorithms in the first pages of programming textbooks. In this article, we describe sorting algorithms, explain how they work, and illustrate how to choose the right sorting algorithm for your program.
What Are Sorting Algorithms?
Sorting algorithms are algorithms that organize items in a sequence according to a specific condition, for example, in ascending order. Sorting algorithms are usually evaluated by how well they organize sequences of integers, but in the real world, sorting isn’t limited to just numbers. Almost anywhere you look, from simple websites to complex web applications to intricate business systems, sorting algorithms are doing the work in organizing the underlying data.
What makes sorting a cornerstone of modern technology? The answer: search. In a dataset that isn’t ordered, your best option when searching (and it’s not a very good one) is to consider each item in the dataset and see if it’s what you need. The more elements you examine, the longer it takes to search an unordered data structure.
Imagine how slow searching for a keyword on Google would be if it meant Google had to look through every web page in existence. And Google is far from the only place where we rely on search: Other examples include your browsing history, the transactions in your banking interface and the songs you recently listened to online.
Sorting the data first is what makes it possible to search for everything quickly. If you organize the data in a way that helps you find the right element faster, you significantly reduce the amount of time needed to produce accurate search results, especially over vast datasets.
How Sorting Algorithms Work
If you’re given an unordered set of elements — for example, a vector if you are programming in C++ — a sorting algorithm generally moves the elements either within the vector or into a new vector in a specific pattern until contents are fully sorted.
Some algorithms compare pairs of elements to decide which item goes first — these are called comparison algorithms. Others attain the right order without comparing elements, relying instead on hashing functions. Sorting algorithms aren’t just for numbers; you can usually extend them to work on any data structure and even provide custom functions to present the desired ordering.
You can objectively evaluate a sorting algorithm on three criteria: time complexity, memory usage and stability.
Time Complexity in Sorting Algorithms
Time complexity is an abstract way to show how long a sorting algorithm would take to sort a vector of length n. The best algorithms that make comparisons between elements usually have a complexity of O(n log n). This complexity means that the algorithm’s run time increases slightly faster than the number of items in the vector.
As a programmer, you generally want the sorting to finish quickly on large datasets. Thus we tend to find sorting algorithms with short run times in real-world programs.
Question: Are O(n log n)-complex algorithms faster than O(n)?
No, although there are no sorting algorithms with O(n) complexity in the typical case. Here’s some quick math to explain why O(n log n) isn’t faster than O(n):
n is the number of elements you are sorting, a positive integer. The logarithm of a positive integer is always greater than one. Therefore log n > 1 for all n > 1, making n * log n > n for all n > 1.
By the same logic, algorithms with O(log n) complexity are faster than those with complexity O(n).
Memory Usage in Sorting Algorithms
Given that the vector you’re sorting has n elements, some algorithms won’t need any additional memory beyond what’s required to store the n elements themselves. These are called in-place sort algorithms. In-place sorting is the best possible case for sorting algorithms from a memory usage standpoint. A few examples of in-place algorithms:
But many other algorithms require more memory than in-place algorithms. For example, those that need at least as much additional memory as the vector itself are said to have memory requirements of n.
It’s essential to be aware of your chosen algorithm’s memory usage. A memory-intensive sorting algorithm might not have enough RAM to run on a tiny IoT device, especially if you are sorting large datasets.
Stable and Unstable Sorting Algorithms
An algorithm is stable when, upon finding two elements that are equal for sorting, the algorithm preserves the original order of the items. An algorithm is unstable when there is no guarantee that the equal elements will end up in the original order. In most cases, stability isn’t as important as complexity or memory usage. However, if you are building, say, a queueing system, you might need to use a stable algorithm to guarantee first-in-first-out ordering on similar items in the queue.
Which is the Best Algorithm for Sorting?
There is no single best choice for all sorting tasks. However,, the answer very frequently lies in the algorithm that’s already available to you in your programming language’s standard library.
While it can be useful to know the differences between the many sorting algorithms out there, in most cases, you won’t need to implement your own algorithm for sorting in a production application. The standard library of the programming language you’re using likely has one, and it’s most probably implemented in a high-performance and robust way.
For example, in C++, the default algorithm used is introsort, a combination of heapsort, quicksort, and insertion sort. Introsort is an in-place algorithm and uses very little additional memory. Introsort performs at the best possible level, O(n log n), so there is rarely a reason not to use the default sorting algorithm from the C++ standard library.
Is Merge Sort Better than Quicksort?
Unless you are looking for a sorting algorithm to suit a high-volume production environment, you are generally better off choosing the sorting algorithm already available to you that has the most robust implementation; creating your own sorting algorithm is no small task.
While the core functionality is simple, things you must also consider include adding support for multiple data types and external comparison functions, optimizing for edge cases such as pre-sorted sequences and incorrect vectors, debugging memory leaks and creating documentation for other developers.
Any performance improvements you’d gain from using an algorithm you create are rarely worth all this additional effort. So, in general, choose the sorting algorithm already present in your programming language’s standard library.
If you insist on comparing two algorithms like merge sort and quicksort, it’s vital to compare them in the context of your use case. Look at the key characteristics of each algorithm: time complexity, memory usage, and stability. Here is how merge sort and quicksort compare, for example:
|Time complexity||n log n||n log n||Applies to both algorithms in the best and average cases. However, in the worst case, quicksort can have n² complexity.|
|Memory usage||n||log n||Memory usage of quicksort is considerably lower than that of merge sort when working with large datasets.|
To decide whether merge sort is better than quicksort for your use case, evaluate each difference between the algorithms and see whether one is more suitable. Also, consider the implications of running your algorithm in a production application.
Can the sorting algorithm be easily parallelized across multiple CPU cores? Across multiple machines? Are data moves time-intensive on the device the algorithm is going to run? In that case, is an algorithm that minimizes data moves preferable?
If the performance for your use case is roughly the same for both algorithms, go with the option that minimizes maintenance and/or documentation load.
What is the Slowest Sorting Algorithm?
Bogosort, which is sometimes called permutation sort, works by randomly shuffling the dataset and testing whether the data is now sorted. If the order isn’t right, the algorithm shuffles the vector and tests the order again.
Random shuffling isn’t precisely the most efficient sorting strategy: in the worst case, the algorithm will never finish running! Even in an average case, the time complexity for bogosort is n x n!, since after each shuffling it’s necessary to test whether the vector is now in order.
Thankfully, bogosort is only used for demonstration purposes and doesn’t run in production environments. At least, we hope it doesn’t.
Why is Bubble Sort So Slow?
Bubble sort is another quite inefficient algorithm that’s often used for teaching and not for production. The sorting is performed in multiple iterations. In each iteration, the algorithm compares every adjacent pair of elements, swapping them if they are not in the desired order.
The algorithm only stops running when there have been no swaps in the most recent iteration — the sign that the dataset is now in order.
Because each iteration requires many comparisons to understand which elements to swap, and because the algorithm requires multiple iterations to sort a dataset, bubble sort is quite slow on large datasets compared to other algorithms. Bubble sort’s time complexity is O(n²), and this level of performance is significantly worse than that of merge sort (O(nlog n)) or quicksort (also O(n log n).
In this article, we’ve explained the sorting algorithms and how they work, and we’ve shared some guidelines on choosing the best sorting algorithm for your use case.
Are you interested in learning more about sorting algorithms and other aspects of programming?
Sign up for our Intro to Programming Nanodegree today.