Evolytics Blog

How to Read SQL like the Pros

Photo by Caspar Camille Rubin on Unsplash

Photo by Caspar Camille Rubin on Unsplash

Early in my career, I found myself learning SQL on the fly. Specifically, I was given access to a database and told, “use the data here to find the answers.” Almost every experienced analyst I know has a similar story. They are thrown into the proverbial deep end that is query structure, something that their college or graduate education did not spend a lot of time on.

SQL, like so many things in the analyst world, takes minutes to learn, but can take an entire career to master. In this post, I’ll focus on the basic structure of a SQL query and provide some techniques to help understand it. While not  exhaustive, this post provides a foundation on which to build your SQL skill set.

Start at the Bottom

SQL has only two mandatory clauses: SELECT and FROM. The most common optional clause is the WHERE clause, but this last part may vary for you, since different databases require different techniques. You may already be familiar with the following basic query structure:

SELECT customer_name, segment, sales

FROM superstore_sample

WHERE state = ‘Alabama’

In SQL, arguably the two most important clauses, FROM and WHERE, are pushed toward the end of the statement. These clauses are critically important because they create context. It doesn’t make sense to look at the SELECT clause if you don’t know where those columns come FROM. Considering this, we should read the clauses in this order: FROM, WHERE, SELECT. Added bonus: this is also the order in which the machine executes your SQL.

Say It in a (Weird) Sentence

As mentioned above, a good way to test your understanding of a query is saying it in English. Let’s  convert our query into a one-sentence summary: 

We need superstore sample records from Alabama only, for each category and segment combination returning the sum of sales for each, sorted by segment.

Here’s how this sentence breaks down:

SQL Snippet English Translation What’s Actually Happening
FROM superstore_sample We need superstore sample records Pull from the superstore_sample table only.
WHERE state = ‘Alabama’ from Alabama only Narrow from all records to just those from Alabama.
SELECT customer_name, segment, sales specifically the customer name, segment, and sales from each. Return the specified columns only.

 

This might sound awkward in everyday speech, but narrowing your context from the start, and translating that process into English, is good practice for any code that you write. You can, of course, also phrase this query as follows:

We need the customer name, segment, and sales from the superstore sample, focused only on Alabama.

This feels less clunky and generally more conversational, but it has two disadvantages:

  1. It ignores the order in which the machine actually works.
  2. Important context is left dangling at the end of the sentence, requiring you to revert your thinking to context after we’ve gotten to what feels like the most important part.

A More Complex Example of Translating SQL

We can extend this technique to a more complex query such as:

SELECT category, segment, SUM(sales) AS sales

FROM superstore_sample

WHERE state = ‘Alabama’

GROUP BY category, segment

ORDER BY segment

This query is a bit more complex, but we can still transform it into a single sentence:

We need superstore sample records from Alabama only, for each category and segment combination returning the sum of sales for each, sorted by segment.

A bit of a mouthful, but we got through it. Let’s break it down below:

SQL Snippet English translation What’s Actually Happening
FROM superstore_sample We need superstore sample records Narrow from all possible tables to one.
WHERE state = ‘Alabama’ from Alabama only Narrow from all records to just those from Alabama.
GROUP BY category, segment for each category and segment combination Aggregate records along category and segment.
SELECT category, segment, SUM(sales) returning the sum of sales for each Display the category, segment, and sum of sales for each combination.
ORDER BY segment sorted by segment. Order the results by segment in alphabetical order.

 

Admittedly, this breaks the “start at the bottom” convention a little, but retains the order in which the clauses actually executes as follows:

  1. FROM
  2. WHERE
  3. GROUP BY 
  4. SELECT
  5. ORDER BY

SELECT main_takeaway FROM blog_post

It isn’t necessary to do this exercise every time, but taking time to understand what the SQL engine is doing can help you optimize even the most complex of queries. As a bonus, it provides a sentence that can be placed directly in an email to a stakeholder to help explain in plain English how data is retrieved and processed to create a particular dataset.

John Carney

John Carney, Data Operations Analyst, guides clients through the process of accessing, analyzing, and visualizing data. His primary specialty involves automating pipelines and processes for data science, data analysis, and big data across multiple systems. With a well-rounded set a data science skills, John often acts as a bridge between the Data Operations and Decision Science teams. John recently spoke at the national Data Architecture Summit on decoding the data management/data science coin.

John CarneyHow to Read SQL like the Pros

Related Posts