data science - inner join - SQL - SQL Join

SQL Inner Join — Together, Forever

Relational databases store information in tables — in columns that describe aspects of an item and rows that tie together the columns. The SQL Inner Join constructs a temporary table by stitching together tables as long as they have a common column. SQL Inner Join makes proper decomposition — the reduction of complex tables into multiple, simple, linked, space-saving tables — possible. And, it’s the natural result of good database architecture.

The syntax of the SQL Inner Join clause is:

SELECT
   table1.column,
   table2.column
FROM
   table1
INNER JOIN
   table2 ON table1.common_column = table2.other_common_column ;

This blog entry is a deeper dive into the general SQL Join – Connecting Tables. Please visit that entry for a higher-level overview of the SQL Join statement.

Buying and Selling — A Tale Of Two Databases

The relationship between buyers and sellers is one of shared information. The seller selects products (hopefully) attractive to buyers and shares the description, price, and perhaps a photo. Whether it’s a glossy, printed catalog or an online portal, the news of available wares is shared.

Buyers (hopefully) order products from the vendor’s list and then maintain a list of their own, typically  containing the item, the quantity currently on hand, and the price paid per item. As a product is used up the quantity is updated, and the buyer knows when to order more product to avoid running out.

The common, real-world database jeopardy happens when the buyer duplicates too much of the seller’s product information — when the seller changes prices or discontinues a product the buyer who is relying on copied information doesn’t get the updated information.

A solution is to divide responsibilities: the seller maintains the database of available products that’s made available to prospective buyers, and the buyer maintains a separate database of in-stock purchases, with some minimal connection between the two that propagates updated information to the buyer. SQL Inner Join solves this problem, relying on a common column with unique, distinct ID values. (There’s no security risk in the seller’s sharing; this is the same type of information that was available in a printed catalog, but now with constantly updating current information.).

For example, you’re running a Spanish tapas bar. Being a responsible business-owner requires keeping track of monies spent, so the Bar de Tapas database contains a row that looks like:

Item CodeItem NameItem PriceIn StockTotal Cost
5Jamón ibérico Pata Negra Bellota$119.993$359.97

This table contains all the information about this item and will be our reference for what’s needed in any scheme between the buyer and seller; it’ll be referred to as the “overview table” later on.

The Vendor’s Offerings

If the vendor stops selling that item, or changes the price, the data duplicated in the buyer’s database becomes worse than useless. A responsible vendor shares their own frequently-updated database so the buyer always has actionable data at hand:

CREATE TABLE vendor (
codeDECIMAL,
nameVARCHAR(50),
priceDECIMAL(19,4)
);
Item CodeItem NameItem Price
5Jamón ibérico Pata Negra Bellota$119.99

The Buyer’s Perspective

One large table, with multiple rows referring to a product via a long text string like “Jamón ibérico Pata Negra Bellota,” wastes space. Using a unique item code, or UPC, that is common to related tables uses far less space. In this way, the buyer maintains a database table containing relevant information, including the item code and how many are currently in stock, ready for use:

CREATE TABLE store (
idDECIMAL,
quantity DECIMAL
);
Item CodeIn StockTotal Cost
53$359.97

Notice that both tables contain a common column, the item code. The common column with distinct values is the key requirement for joining tables. The common element can be a simple integer scheme starting at the number “1” or a globally agreed-upon universal product code like “860247001129”..

Zipping Together Tables Via a Common Column

Computationally derive the overview table (above) by combining the individual tables based upon that common ID column via a SQL Inner Join — this will blend vendor and buyer information to the benefit of both.

SELECT
   vendor.code,
   vendor.name,
   CONCAT('$', FORMAT(vendor.price, 2, @@lc_time_names)) AS item,
   store.quantity,
   CONCAT('$', FORMAT((store.quantity * vendor.price), 2, 'en_US')) AS total
FROM
   vendor
INNER JOIN
   store ON store.id = vendor.code ;

There’s a lot to unpack here, especially as this shows several best practices, so let’s take things in pieces.

The SQL Inner Join Component

The SQL Select clause is requesting which columns will be assembled into the overview table, and the order they should appear. As shown above, they are vendor.code, vendor.name, vendor.price, store quantity, and a computed column showing the total cost (store.quantity * vendor.price).

The two tables to be joined — vendor and store — are specified in the FROM … INNER JOIN … clause. The column in common is specified by ON store.id = vendor.code

The Derived Total Cost Component

SQL is able to perform mathematical operations on column values, in this case calculating the total cost represented by the items already purchased and still unused; in stock. Multiplying store.quantity by vendor.price equals the total cost.

Locally Appropriate Number Formatting

In the United States the text string “1,028.55” is recognisable as a number with the decimal separators — the command and the period — used appropriately. For Spanish-speakers worldwide, however, “1.028,55” is more understandable.

Writing SQL with an understanding of Internationalization and localization makes for more robust code with more utility across a larger market.

Instead of hard-coding commas and periods, it’s preferable to using the built-in variable @@lc_time_names or explicit strings like en_US (English in the US), es_US (Spanish in the US), or any other combination of ISO-3166 Country Codes and ISO-639 Language Codes. (Take note of gaps in internationalization.) 

FORMAT(vendor.price, 2, 'en_US')

Locally Appropriate Currency Formatting

Unfortunately, no database standard function library exists for deriving currency formatting. Used instead is the SQL Concat function to prepend a ‘$’ before the currency value.

CONCAT('$', …)

Depending on the computing environment hosting the database there may be variables set, or the local database administrator may know of a library installed that queries ISO-4217 Currency Codes and returns $ from the country code.

One can derive currency code from the country code, but the reverse is not true — for example, the US, Canada, and Argentina use the “dollar sign” as the currency mark, but given the string “$1234” it’s impossible to figure out which country’s money is being referred to.

Renaming Columns For More Readable Tables

By default, SQL names column names with the column name or the function used to derive the value. Without intervention, one column from the above SQL will display as CONCAT('$', FORMAT((store.quantity * vendor.price), 2, 'en_US')).

Use the AS clause to present the reader with a much more human-friendly option:

SELECT
column_name AS more_readable_name
FROM
table

Conclusion

We covered the practical uses of the SQL Inner Join clause that can help you build your skills. Additionally, we noted some of the issues that come from writing robust software including:

  • mathematically computing values,
  • internationalization and localization concerns around currency symbols and decimal separators, and
  • formatting column names to better serve readers.

Start Learning

To learn more about SQL, enroll in our SQL Nanodegree program, described in detail here.

SQL Sample Table Code

The following will build and populate the sample data referred to in this blog post. Test SQL interactively at  SQL Fiddle or ExtendsClass.

CREATE TABLE vendor (
codeDECIMAL,
nameVARCHAR(50),
priceDECIMAL(19,4)
);
INSERT INTO vendor
VALUES
( 1, '’Nduja Calabrian pork sausage', 6.99 ),
( 2, 'black garlic', 1.99 ),
( 3, 'buckwheat honey', 7.99 ),
( 4, 'Casa Barone piennolo tomatoes', 2.99 ),
( 5, 'Jamón ibérico Pata Negra Bellota', 119.99 ),
( 6, 'Kokuho Rose brown rice', 1.99 ),
( 7, 'Madagascar vanilla beans', 32.99 ),
( 8, 'Mangalitsa Hungarian hog salumi', 7.99 ),
( 9, 'Umami Negra Con Camarón salsa', 9.99 );
 
CREATE TABLE store (
idDECIMAL,
quantity DECIMAL
);
INSERT INTO store
VALUES
( 1, 17),
( 2, 70),
( 3, 2),
( 4, 40),
( 5, 3),
( 6, 1),
( 7, 3),
( 8, 19),
( 9, 3);