Relational databases store information in tables — columns that are analogous to elements in a data structure and rows which are one instance of that data structure. In those cases where this data set contains duplicate values (like membership location by state or province) SQL Distinct traverses these data, filters them, and returns just one of each duplicated value (making it easy to determine, for example, the number of members living in each state). SQL Distinct evaluates the span of a particular set of values.

Starting With SQL Distinct

We start with a table of some of Charles Dickens’ characters and the novels in which they appear. This table is imagined not as an exhaustive index of every character but those most worth discovering; it’s a table of the most notable and memorable characters in Dickens’ oeuvre.

The SQL required to create and populate this table appears at the end of this blog post.

SELECT * FROM dickens ;

Note that while the person column is unique (by chance; uniqueness was not enforced at the database level) the book column contains duplicates, as expected.

personbook
Bob CratchitA Christmas Carol
Ebenezer ScroogeA Christmas Carol
FaginOliver Twist
Jacob MarleyA Christmas Carol
Mr. PumblechookGreat Expectations
Old FezziwigA Christmas Carol
Pip GargeryGreat Expectations
The Artful DodgerOliver Twist
Vincent CrummlesNicholas Nickleby

To extract the set of books across the dickens table — with duplicates collapsed down to one entry — the SQL Distinct keyword is added to the SELECT command:

SELECT DISTINCT(book) FROM dickens ;
book
A Christmas Carol
Oliver Twist
Great Expectations
Nicholas Nickleby

Ordering and Grouping Values With SQL Distinct

The above generates an unordered list as the result. To extract an ordered list of the same values one could use:

SELECT DISTINCT(book) FROM dickens ORDER BY book ;

The GROUP BY clause both collapses the results into distinct values and does the ordering, so as a synonym the following also works:

SELECT book FROM dickens GROUP BY book ;

Both of the statements above return the following results:

book
A Christmas Carol
Great Expectations
Nicholas Nickleby
Oliver Twist

Counting Values With SQL Distinct

Once we have the distinct values, we’d like to count them in various ways. To count the distinct values directly one uses:

SELECT COUNT(DISTINCT(book)) FROM dickens ;
COUNT(DISTINCT(book))
4

This leads us into the quest to see which books are most popular (based upon the number of notable characters within its pages). As above, we use the GROUP BY clause to process counts within the distinct groups.

SELECT book, COUNT(*)
FROM dickens
GROUP BY book
ORDER BY COUNT(*) DESC, book ;

This SQL statement selects the book title and a count of that title filters out duplicates with the GROUP BY clause and then presents the results ordered first by the count (so books in which more notable characters appear first) and then by book (so books with the same number of characters appear alphabetically).

bookCOUNT(*)
A Christmas Carol4
Great Expectations2
Oliver Twist2
Nicholas Nickleby1

Enforcing Uniqueness at The Table Level

The above works well when a character appears only once in the table (assuming for this blog that Dickens wrote no sequel novels). Using the UNIQUE constraint causes the database to reject duplicate entries in the person column.

CREATE TABLE dickens (
person VARCHAR(50) UNIQUE, 
book VARCHAR(33) ) ;

Conclusion

We’ve covered extracting the distinct values from a group with duplicates, how to count those duplicates, and several related and synonym functions to display several interesting aspects of the data.

To learn more about SQL, enroll in our SQL Nanodegree program, described in detail here.

Start Learning

SQL Distinct Sample Table Code

The following will build and populate the sample data referred to in this blog post, data you can test interactively at  SQL Fiddle.

CREATE TABLE dickens ( person VARCHAR(50), book VARCHAR(33) ) ;
INSERT INTO dickens
VALUES
( "Bob Cratchit", "A Christmas Carol" ),
( "Ebenezer Scrooge", "A Christmas Carol" ),
( "Fagin", "Oliver Twist" ),
( "Jacob Marley", "A Christmas Carol" ),
( "Mr Pumblechook", "Great Expectations" ),
( "Old Fezziwig", "A Christmas Carol" ),
( "Pip Gargery", "Great Expectations" ),
( "The Artful Dodger", "Oliver Twist" ),
( "Vincent Crummles", "Nicholas Nickleby" ) ;