collections - SQL In

SQL In — Picking Through Collections

Relational databases store information in tables — in columns that describe aspects of an item and rows that tie together the columns. The SQL In clause is an example of a SQL operation that addresses a specialized programming need: To iterate through collections (sets of related data). To illustrate, let’s consider the offerings of a large department store, described by the following SQL:

CREATE TABLE superstore (
dept VARCHAR(20),
kind VARCHAR(20),
item VARCHAR(50)
);

We add sample data:

INSERT INTO superstore
VALUES
( 'bedroom', 'mattresses', 'Happy Fluffy Dreams' ),
( 'bedroom', 'bedding', 'Comfy Comforter' ),
( 'bedroom', 'bedding', 'Exceedingly Firm Pillow' ),
( 'kitchen', 'cookware', 'Comically Huge Slotted Spoon' ),
( 'kitchen', 'cookware', 'Overly Heavy Cast Iron Pan' ),
( 'kitchen', 'scullery', 'Optimistic Motivational Poster' ),
( 'kitchen', 'scullery', 'Humorous Hand Towels' );

Notice the top-down hierarchy: the store is divided into departments, which in turn groups their offerings by kind, into which the products are organized.

Using or And = to Build a Collection

The canonical way of finding all products of one kind or another is to have a WHERE clause with an equality test for each possibility, each test connected with a boolean OR operator (because we want to find out if any of the options match the object kind). The SQL to evaluate whether a value is either 'cookware' or 'scullery' is:

SELECT
*
FROM
superstore
WHERE
kind = 'cookware' OR kind = 'scullery' ;
deptkinditem
kitchencookwareComically Huge Slotted Spoon
kitchencookwareOverly Heavy Cast Iron Pan
kitchensculleryOptimistic Motivational Poster
kitchensculleryHumorous Hand Towels

Using IN to Build a Collection

“Syntactic sugar” — syntax within a programming language designed to make the code easier to read, express, and maintain over the code lifetime — assists with this task, as the following SQL shows the repetitive equality tests linked by OR replaced by the IN operator:

SELECT * FROM superstore;
 
SELECT
*
FROM
superstore
WHERE
kind IN ('cookware', 'scullery');

Writing, maintaining, and reading such a list — ('a', 'b', 'c', 'd') — is much less prone to programmer error than the repeated kind = 'a' OR boolean test blocks.

Using AND and != to Build an Anti-Collection

Exactly opposite from the previous examples, now we concentrate on building an anti-collection — elements that are not members of the specified set. Using a combination of AND and != (not equals) our SQL looks like this:

SELECT
*
FROM
superstore
WHERE
kind != 'cookware' AND kind != 'scullery' ;
deptkinditem
bedroommattressesHappy Fluffy Dreams
bedroombeddingComfy Comforter
bedroombeddingExceedingly Firm Pillow

Using NOT IN to Build An Anti-Collection

Continuing to show how syntactic sugar can make code easier to read and maintain, this is how NOT IN simplifies the previous code:

SELECT
*
FROM
superstore
WHERE
kind NOT IN ('cookware', 'scullery');

Using IN With a Subquery to Dynamically Fetch Values

To extend our example, consider our store’s website allows visitors to specify their purchasing interests. The following code creates a preferences table and populates it with values for an enthusiastic home cook:

CREATE TABLE preferences (
interest VARCHAR(20)
);
INSERT INTO preferences
VALUES
( 'cookware' ),
( 'scullery' );

In the examples above, the kind IN (‘cookware’, ‘scullery’) action has hard-coded values. The preferences table allows for a dynamic value look-up, through a “subquery” (a SQL Select which returns values to its parent SQL Select.

To build a customized collection for this visitor, in order to display enticing products, we use IN with a subquery:

SELECT
  *
FROM
  superstore
WHERE
  kind IN (SELECT interest
           FROM preferences );

Using a subquery allows for a clean separation of product and preference, decreasing chances of ill-considered side-effects when either are changed in future code revisions.

deptkinditem
kitchencookwareComically Huge Slotted Spoon
kitchencookwareOverly Heavy Cast Iron Pan
kitchensculleryOptimistic Motivational Poster
kitchensculleryHumorous Hand Towels

The Difference Between IN and EXISTS

In the examples above, the IN operation iterates over a small list of elements, whether hard-coded or provided by a subquery, which are each examined in turn. When the list becomes large, performance falters. The EXISTS operator provides a higher-performing alternative.

While EXISTS is not exact syntactic sugar for IN, it can be used to a similar effect. EXISTS returns values only if the subquery comparison is true, resulting in better performance over very large datasets.

In a fashion similar to SQL table joins, a column of common values is needed. In the following example,  the connection between product and preference is the WHERE kind = interest comparison.

SELECT
  *
FROM
  superstore
WHERE EXISTS (SELECT idea
              FROM preferences
              WHERE kind = interest );

Conclusion

Covered here are ways of creating collections and anti-collections by the parallel methods of:

  • IN vs = and OR
  • NOT IN vs != and AND

Also covered are the concepts of “syntactic sugar” and the benefits of using specialized keywords (like IN and EXISTS) to enhance the reliability of code and decrease the long-term maintenance costs. Lastly, the relationship between the IN and EXISTS clauses is demonstrated.

Start Learning

To learn more about SQL, check out other SQL blog posts and enroll in our SQL Nanodegree program.

SQL Sample Table Code

The following will build and populate the sample data referred to in this blog post. Test SQL interactively at  SQL Fiddle or ExtendsClass.

CREATE TABLE superstore (
dept VARCHAR(20),
kind VARCHAR(20),
item VARCHAR(50)
);
INSERT INTO superstore
VALUES
( 'bedroom', 'mattresses', 'Happy Fluffy Dreams' ),
( 'bedroom', 'bedding', 'Comfy Comforter' ),
( 'bedroom', 'bedding', 'Exceedingly Firm Pillow' ),
( 'kitchen', 'cookware', 'Comically Huge Slotted Spoon' ),
( 'kitchen', 'cookware', 'Overly Heavy Cast Iron Pan' ),
( 'kitchen', 'scullery', 'Optimistic Motivational Poster' ),
( 'kitchen', 'scullery', 'Humorous Hand Towels' );
 
CREATE TABLE preferences (
interest VARCHAR(20)
);
INSERT INTO preferences
VALUES
( 'cookware' ),
( 'scullery' );