Early in my career, I found myself learning SQL on the fly. Specifically, I was given access to a database and told, “use the data here to find the answers.” Almost every experienced analyst I know has a similar story. They are thrown into the proverbial deep end that is query structure, something that their college or graduate education did not spend a lot of time on.
SQL, like so many things in the analyst world, takes minutes to learn, but can take an entire career to master. In this post, I’ll focus on the basic structure of a SQL query and provide some techniques to help understand it. While not exhaustive, this post provides a foundation on which to build your SQL skill set.
Start at the Bottom
SQL has only two mandatory clauses: SELECT and FROM. The most common optional clause is the WHERE clause, but this last part may vary for you, since different databases require different techniques. You may already be familiar with the following basic query structure:
SELECT customer_name, segment, sales
FROM superstore_sample
WHERE state = ‘Alabama’
In SQL, arguably the two most important clauses, FROM and WHERE, are pushed toward the end of the statement. These clauses are critically important because they create context. It doesn’t make sense to look at the SELECT clause if you don’t know where those columns come FROM. Considering this, we should read the clauses in this order: FROM, WHERE, SELECT. Added bonus: this is also the order in which the machine executes your SQL.
Say It in a (Weird) Sentence
As mentioned above, a good way to test your understanding of a query is saying it in English. Let’s convert our query into a one-sentence summary:
We need superstore sample records from Alabama only, for each category and segment combination returning the sum of sales for each, sorted by segment.
Here’s how this sentence breaks down:
SQL Snippet | English Translation | What’s Actually Happening |
FROM superstore_sample | We need superstore sample records | Pull from the superstore_sample table only. |
WHERE state = ‘Alabama’ | from Alabama only | Narrow from all records to just those from Alabama. |
SELECT customer_name, segment, sales | specifically the customer name, segment, and sales from each. | Return the specified columns only. |
This might sound awkward in everyday speech, but narrowing your context from the start, and translating that process into English, is good practice for any code that you write. You can, of course, also phrase this query as follows:
We need the customer name, segment, and sales from the superstore sample, focused only on Alabama.
This feels less clunky and generally more conversational, but it has two disadvantages:
- It ignores the order in which the machine actually works.
- Important context is left dangling at the end of the sentence, requiring you to revert your thinking to context after we’ve gotten to what feels like the most important part.
A More Complex Example of Translating SQL
We can extend this technique to a more complex query such as:
SELECT category, segment, SUM(sales) AS sales
FROM superstore_sample
WHERE state = ‘Alabama’
GROUP BY category, segment
ORDER BY segment
This query is a bit more complex, but we can still transform it into a single sentence:
We need superstore sample records from Alabama only, for each category and segment combination returning the sum of sales for each, sorted by segment.
A bit of a mouthful, but we got through it. Let’s break it down below:
SQL Snippet | English translation | What’s Actually Happening |
FROM superstore_sample | We need superstore sample records | Narrow from all possible tables to one. |
WHERE state = ‘Alabama’ | from Alabama only | Narrow from all records to just those from Alabama. |
GROUP BY category, segment | for each category and segment combination | Aggregate records along category and segment. |
SELECT category, segment, SUM(sales) | returning the sum of sales for each | Display the category, segment, and sum of sales for each combination. |
ORDER BY segment | sorted by segment. | Order the results by segment in alphabetical order. |
Admittedly, this breaks the “start at the bottom” convention a little, but retains the order in which the clauses actually executes as follows:
- FROM
- WHERE
- GROUP BY
- SELECT
- ORDER BY
SELECT main_takeaway FROM blog_post
It isn’t necessary to do this exercise every time, but taking time to understand what the SQL engine is doing can help you optimize even the most complex of queries. As a bonus, it provides a sentence that can be placed directly in an email to a stakeholder to help explain in plain English how data is retrieved and processed to create a particular dataset.