Relational databases store information in tables — columns that are analogous to elements in a data structure and rows which are one instance of that data structure. In those cases where this data set contains duplicate values (like membership location by state or province) SQL Distinct traverses these data, filters them, and returns just one of each duplicated value (making it easy to determine, for example, the number of members living in each state). SQL Distinct evaluates the span of a particular set of values.
Starting With SQL Distinct
We start with a table of some of Charles Dickens’ characters and the novels in which they appear. This table is imagined not as an exhaustive index of every character but those most worth discovering; it’s a table of the most notable and memorable characters in Dickens’ oeuvre.
The SQL required to create and populate this table appears at the end of this blog post.
SELECT * FROM dickens ;
Note that while the person column is unique (by chance; uniqueness was not enforced at the database level) the book column contains duplicates, as expected.
|Bob Cratchit||A Christmas Carol|
|Ebenezer Scrooge||A Christmas Carol|
|Jacob Marley||A Christmas Carol|
|Mr. Pumblechook||Great Expectations|
|Old Fezziwig||A Christmas Carol|
|Pip Gargery||Great Expectations|
|The Artful Dodger||Oliver Twist|
|Vincent Crummles||Nicholas Nickleby|
To extract the set of books across the dickens table — with duplicates collapsed down to one entry — the SQL Distinct keyword is added to the SELECT command:
SELECT DISTINCT(book) FROM dickens ;
|A Christmas Carol|
Ordering and Grouping Values With SQL Distinct
The above generates an unordered list as the result. To extract an ordered list of the same values one could use:
SELECT DISTINCT(book) FROM dickens ORDER BY book ;
The GROUP BY clause both collapses the results into distinct values and does the ordering, so as a synonym the following also works:
SELECT book FROM dickens GROUP BY book ;
Both of the statements above return the following results:
|A Christmas Carol|
Counting Values With SQL Distinct
Once we have the distinct values, we’d like to count them in various ways. To count the distinct values directly one uses:
SELECT COUNT(DISTINCT(book)) FROM dickens ;
This leads us into the quest to see which books are most popular (based upon the number of notable characters within its pages). As above, we use the GROUP BY clause to process counts within the distinct groups.
SELECT book, COUNT(*) FROM dickens GROUP BY book ORDER BY COUNT(*) DESC, book ;
This SQL statement selects the book title and a count of that title filters out duplicates with the GROUP BY clause and then presents the results ordered first by the count (so books in which more notable characters appear first) and then by book (so books with the same number of characters appear alphabetically).
|A Christmas Carol||4|
Enforcing Uniqueness at The Table Level
The above works well when a character appears only once in the table (assuming for this blog that Dickens wrote no sequel novels). Using the UNIQUE constraint causes the database to reject duplicate entries in the person column.
CREATE TABLE dickens ( person VARCHAR(50) UNIQUE, book VARCHAR(33) ) ;
We’ve covered extracting the distinct values from a group with duplicates, how to count those duplicates, and several related and synonym functions to display several interesting aspects of the data.
SQL Distinct Sample Table Code
The following will build and populate the sample data referred to in this blog post, data you can test interactively at SQL Fiddle.
CREATE TABLE dickens ( person VARCHAR(50), book VARCHAR(33) ) ; INSERT INTO dickens VALUES ( "Bob Cratchit", "A Christmas Carol" ), ( "Ebenezer Scrooge", "A Christmas Carol" ), ( "Fagin", "Oliver Twist" ), ( "Jacob Marley", "A Christmas Carol" ), ( "Mr Pumblechook", "Great Expectations" ), ( "Old Fezziwig", "A Christmas Carol" ), ( "Pip Gargery", "Great Expectations" ), ( "The Artful Dodger", "Oliver Twist" ), ( "Vincent Crummles", "Nicholas Nickleby" ) ;