Most Candidates Fail These SQL Ideas in Knowledge Interviews

By admin2010

September 5, 2025

35

Most Candidates Fail These SQL Concepts in Data Interviews

Picture by writer | Canva

An interviewer’s job is to search out essentially the most appropriate candidates for the marketed place. In doing so, they are going to gladly arrange SQL interview inquiries to see if they’ll catch you off guard. There are a number of SQL ideas at which candidates usually fail.

Hopefully, you’ll be a kind of who keep away from that future, as I’ll clarify these ideas intimately under, full with examples of methods to clear up sure issues appropriately.

Most Candidates Fail These SQL Concepts in Data Interviews

# 1. Window Features

Why It’s Onerous: Candidates memorize what every window operate does however don’t actually perceive how window frames, partitions, or ordering truly work.

Frequent Errors: A standard mistake will not be specifying ORDER BY in rating window features or worth window features, equivalent to LEAD() or LAG(), and anticipating the question to work or for the outcome to be deterministic.

Instance: In this instance, that you must discover customers who made a second buy inside 7 days of any earlier buy.

You would possibly write this question.

WITH ordered_tx AS (
  SELECT user_id,
         created_at::date AS tx_date,
         LAG(created_at::DATE) OVER (PARTITION BY user_id) AS prev_tx_date
  FROM amazon_transactions
)

SELECT DISTINCT user_id
FROM ordered_tx
WHERE prev_tx_date IS NOT NULL AND tx_date - prev_tx_date <= 7;

At first look, all the pieces might sound proper. The code even outputs one thing which may look like an accurate reply.

Window Functions

To begin with, we’re fortunate that the code works in any respect! This occurs just because I’m writing it in PostgreSQL. In another SQL flavors, you’d get an error since ORDER BY is obligatory in rating and analytical window features.

Second, the output is improper; I highlighted some rows that shouldn’t be there. Why do they seem, then?

They seem as a result of we didn’t specify an ORDER BY clause within the LAG() window operate. With out it, the row order is unfair. So, we’re evaluating the present transaction to some random earlier row for that consumer, not the one which occurred instantly earlier than it in time.

This isn’t what the query asks. We have to evaluate every transaction to the earlier one by date. In different phrases, we have to specify this explicitly within the ORDER BY clause inside the LAG() operate.

WITH ordered_tx AS (
  SELECT user_id,
         created_at::date AS tx_date,
         LAG(created_at::DATE) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_tx_date
  FROM amazon_transactions
)

SELECT DISTINCT user_id
FROM ordered_tx
WHERE prev_tx_date IS NOT NULL AND tx_date - prev_tx_date <= 7;

# 2. Filtering With Aggregates (Particularly HAVING vs. WHERE)

Why It’s Onerous: Folks usually don’t perceive the execution order in SQL, which is: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY. This order signifies that WHERE filters rows earlier than aggregation, and HAVING filters after. That additionally, logically, means that you could’t use combination features within the WHERE clause.

Frequent Mistake: Making an attempt to make use of combination features in WHERE in a grouped question and getting an error.

Instance: This interview query asks you to search out the whole income made by every vineyard. Solely wineries the place 90 is the bottom variety of factors for any of their varieties ought to be thought of.

Many will see this as a straightforward query and unexpectedly write this question.

SELECT vineyard,
       selection,
       SUM(value) AS total_revenue
FROM winemag_p1
WHERE MIN(factors) >= 90
GROUP BY vineyard, selection
ORDER BY vineyard, total_revenue DESC;

Nevertheless, that code will throw an error stating that combination features aren’t allowed within the WHERE clause. This beautiful a lot explains all the pieces. The answer? Transfer the filtering situation from WHERE to HAVING.

SELECT vineyard,
       selection,
       SUM(value) AS total_revenue
FROM winemag_p1
GROUP BY vineyard, selection
HAVING MIN(factors) >= 90
ORDER BY vineyard, total_revenue DESC;

# 3. Self-Joins for Time-Based mostly or Occasion-Based mostly Comparisons

Why It’s Onerous: The thought of becoming a member of a desk with itself is kind of unintuitive, so candidates usually neglect it’s an choice.

Frequent Mistake: Utilizing subqueries and complicating the question when becoming a member of a desk with itself can be less complicated and sooner, particularly when filtering by dates or occasions.

Instance: Right here’s a query asking you to point out the change of each foreign money’s trade charge between 1 January 2020 and 1 July 2020.

You’ll be able to clear up this by writing an outer correlated subquery that fetches the July 1 trade charges, then subtracts the January 1 trade charges, which come from the internal subquery.

SELECT jan_rates.source_currency,
  (SELECT exchange_rate 
   FROM sf_exchange_rate 
   WHERE source_currency = jan_rates.source_currency AND date="2020-07-01") - jan_rates.exchange_rate AS distinction
FROM (SELECT source_currency, exchange_rate
      FROM sf_exchange_rate
      WHERE date="2020-01-01"
) AS jan_rates;

This returns an accurate output, however such an answer is unnecessarily sophisticated. A a lot less complicated answer, with fewer strains of code, includes self-joining a desk with itself after which making use of two date filtering circumstances within the WHERE clause.

SELECT jan.source_currency,
       jul.exchange_rate - jan.exchange_rate AS distinction
FROM sf_exchange_rate jan
JOIN sf_exchange_rate jul ON jan.source_currency = jul.source_currency
WHERE jan.date="2020-01-01" AND jul.date="2020-07-01";

# 4. Subqueries vs. Frequent Desk Expressions (CTEs)

Why It’s Onerous: Folks usually get caught on subqueries as a result of they study them earlier than Frequent Desk Expressions (CTEs) and proceed utilizing them for any question with layered logic. Nevertheless, subqueries can get messy in a short time.

Frequent Mistake: Utilizing deeply nested SELECT statements when CTEs can be a lot less complicated.

Instance: Within the interview query from Google and Netflix, that you must discover the highest actors based mostly on their common film ranking inside the style during which they seem most ceaselessly.

The answer utilizing CTEs is as follows.

WITH genre_stats AS
  (SELECT actor_name,
          style,
          COUNT(*) AS movie_count,
          AVG(movie_rating) AS avg_rating
   FROM top_actors_rating
   GROUP BY actor_name,
            style),
            
max_genre_count AS
  (SELECT actor_name,
          MAX(movie_count) AS max_count
   FROM genre_stats
   GROUP BY actor_name),
     
top_genres AS
  (SELECT gs.*
   FROM genre_stats gs
   JOIN max_genre_count mgc ON gs.actor_name = mgc.actor_name
   AND gs.movie_count = mgc.max_count),
     
top_genre_avg AS
  (SELECT actor_name,
          MAX(avg_rating) AS max_avg_rating
   FROM top_genres
   GROUP BY actor_name),
   
filtered_top_genres AS
  (SELECT tg.*
   FROM top_genres tg
   JOIN top_genre_avg tga ON tg.actor_name = tga.actor_name
   AND tg.avg_rating = tga.max_avg_rating),
     ranked_actors AS
  (SELECT *,
          DENSE_RANK() OVER (
                             ORDER BY avg_rating DESC) AS rank
   FROM filtered_top_genres),
   
final_selection AS
  (SELECT MAX(rank) AS max_rank
   FROM ranked_actors
   WHERE rank <= 3)
   
SELECT actor_name,
       style,
       avg_rating
FROM ranked_actors
WHERE rank <=
    (SELECT max_rank
     FROM final_selection);

It’s comparatively sophisticated, but it surely nonetheless consists of six clear CTEs, with the code’s readability enhanced by clear aliases.

Curious what the identical answer would appear to be utilizing solely subqueries? Right here it’s.

SELECT ra.actor_name,
       ra.style,
       ra.avg_rating
FROM (
    SELECT *,
           DENSE_RANK() OVER (ORDER BY avg_rating DESC) AS rank
    FROM (
        SELECT tg.*
        FROM (
            SELECT gs.*
            FROM (
                SELECT actor_name,
                       style,
                       COUNT(*) AS movie_count,
                       AVG(movie_rating) AS avg_rating
                FROM top_actors_rating
                GROUP BY actor_name, style
            ) AS gs
            JOIN (
                SELECT actor_name,
                       MAX(movie_count) AS max_count
                FROM (
                    SELECT actor_name,
                           style,
                           COUNT(*) AS movie_count,
                           AVG(movie_rating) AS avg_rating
                    FROM top_actors_rating
                    GROUP BY actor_name, style
                ) AS genre_stats
                GROUP BY actor_name
            ) AS mgc
            ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
        ) AS tg
        JOIN (
            SELECT actor_name,
                   MAX(avg_rating) AS max_avg_rating
            FROM (
                SELECT gs.*
                FROM (
                    SELECT actor_name,
                           style,
                           COUNT(*) AS movie_count,
                           AVG(movie_rating) AS avg_rating
                    FROM top_actors_rating
                    GROUP BY actor_name, style
                ) AS gs
                JOIN (
                    SELECT actor_name,
                           MAX(movie_count) AS max_count
                    FROM (
                        SELECT actor_name,
                               style,
                               COUNT(*) AS movie_count,
                               AVG(movie_rating) AS avg_rating
                        FROM top_actors_rating
                        GROUP BY actor_name, style
                    ) AS genre_stats
                    GROUP BY actor_name
                ) AS mgc
                ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
            ) AS top_genres
            GROUP BY actor_name
        ) AS tga
        ON tg.actor_name = tga.actor_name AND tg.avg_rating = tga.max_avg_rating
    ) AS filtered_top_genres
) AS ra
WHERE ra.rank <= (
    SELECT MAX(rank)
    FROM (
        SELECT *,
               DENSE_RANK() OVER (ORDER BY avg_rating DESC) AS rank
        FROM (
            SELECT tg.*
            FROM (
                SELECT gs.*
                FROM (
                    SELECT actor_name,
                           style,
                           COUNT(*) AS movie_count,
                           AVG(movie_rating) AS avg_rating
                    FROM top_actors_rating
                    GROUP BY actor_name, style
                ) AS gs
                JOIN (
                    SELECT actor_name,
                           MAX(movie_count) AS max_count
                    FROM (
                        SELECT actor_name,
                               style,
                               COUNT(*) AS movie_count,
                               AVG(movie_rating) AS avg_rating
                        FROM top_actors_rating
                        GROUP BY actor_name, style
                    ) AS genre_stats
                    GROUP BY actor_name
                ) AS mgc
                ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
            ) AS tg
            JOIN (
                SELECT actor_name,
                       MAX(avg_rating) AS max_avg_rating
                FROM (
                    SELECT gs.*
                    FROM (
                        SELECT actor_name,
                               style,
                               COUNT(*) AS movie_count,
                               AVG(movie_rating) AS avg_rating
                        FROM top_actors_rating
                        GROUP BY actor_name, style
                    ) AS gs
                    JOIN (
                        SELECT actor_name,
                               MAX(movie_count) AS max_count
                        FROM (
                            SELECT actor_name,
                                   style,
                                   COUNT(*) AS movie_count,
                                   AVG(movie_rating) AS avg_rating
                            FROM top_actors_rating
                            GROUP BY actor_name, style
                        ) AS genre_stats
                        GROUP BY actor_name
                    ) AS mgc
                    ON gs.actor_name = mgc.actor_name AND gs.movie_count = mgc.max_count
                ) AS top_genres
                GROUP BY actor_name
            ) AS tga
            ON tg.actor_name = tga.actor_name AND tg.avg_rating = tga.max_avg_rating
        ) AS filtered_top_genres
    ) AS ranked_actors
    WHERE rank <= 3
);

There may be redundant logic repeated throughout subqueries. What number of subqueries is that? I don’t know. The code is inconceivable to take care of. Despite the fact that I simply wrote it, I’d nonetheless want half a day to know it if I needed to vary one thing tomorrow. Moreover, the fully meaningless subquery aliases don’t assist.

# 5. Dealing with NULLs in Logic

Why It’s Onerous: Candidates usually assume that NULL is the same as one thing. It’s not. NULL isn’t equal to something — not even itself. Logic involving NULLs behaves otherwise from logic involving precise values.

Frequent Mistake: Utilizing = NULL as a substitute of IS NULL in filtering or lacking output rows as a result of NULLs break the situation logic.

Instance: There’s an interview query by IBM that asks you to calculate the whole variety of interactions and the whole variety of contents created for every buyer.

It doesn’t sound too tough, so that you would possibly write this answer with two CTEs, the place one CTE counts the variety of interactions per buyer, whereas the opposite counts the variety of content material objects created by a buyer. Within the closing SELECT, you FULL OUTER JOIN the 2 CTEs, and you’ve got the answer. Proper?

WITH interactions_summary AS
  (SELECT customer_id,
          COUNT(*) AS total_interactions
   FROM customer_interactions
   GROUP BY customer_id),
   
content_summary AS
  (SELECT customer_id,
          COUNT(*) AS total_content_items
   FROM user_content
   GROUP BY customer_id)
   
SELECT i.customer_id,
  i.total_interactions,
  c.total_content_items
FROM interactions_summary AS i
FULL OUTER JOIN content_summary AS c ON i.customer_id = c.customer_id
ORDER BY customer_id;

Virtually proper. Right here’s the output. (By the way in which, you see double citation marks (“”) as a substitute of NULL. That’s how the StrataScratch UI shows it, however belief me, the engine nonetheless treats them for what they’re: NULL values).

5. Handling NULLs in Logic

The highlighted rows include NULLs. This makes the output incorrect. A NULL worth is neither the shopper ID nor the variety of interactions and contents, which the query explicitly asks you to point out.

What we’re lacking within the above answer is COALESCE() to deal with NULLs within the closing SELECT. Now, all the purchasers with out interactions will get their IDs from the content_summary CTE. Additionally, for purchasers that don’t have interactions, or content material, or each, we’ll now substitute NULL with 0, which is a legitimate quantity.

WITH interactions_summary AS
  (SELECT customer_id,
          COUNT(*) AS total_interactions
   FROM customer_interactions
   GROUP BY customer_id),
   
content_summary AS
  (SELECT customer_id,
          COUNT(*) AS total_content_items
   FROM user_content
   GROUP BY customer_id)
   
SELECT COALESCE(i.customer_id, c.customer_id) AS customer_id,
       COALESCE(i.total_interactions, 0) AS total_interactions,
       COALESCE(c.total_content_items, 0) AS total_content_items
FROM interactions_summary AS i
FULL OUTER JOIN content_summary AS c ON i.customer_id = c.customer_id
ORDER BY customer_id;

# 6. Group-Based mostly Deduplication

Why It’s Onerous: Group-based deduplication means you’re choosing one row per group, e.g., “most up-to-date”, “highest rating”, and many others. At first, it sounds such as you solely want to choose one row per consumer. However you’ll be able to’t use GROUP BY except you combination. Then again, you usually want a full row, not a single worth that aggregation and GROUP BY return.

Frequent Mistake: Utilizing GROUP BY + LIMIT 1 (or DISTINCT ON, which is PostgreSQL-specific) as a substitute of ROW_NUMBER() or RANK(), the latter in order for you ties included.

Instance: This query asks you to determine the best-selling merchandise for every month, and there’s no have to separate months by yr. The perfect-selling merchandise is calculated as unitprice * amount.

The naive method can be this. First, extract the sale month from invoicedate, choose description, and discover the whole gross sales by summing unitprice * amount. Then, to get the whole gross sales by month and product description, we merely GROUP BY these two columns. Lastly, we solely want to make use of ORDER BY to kind the output from the very best to the worst-selling product and use LIMIT 1 to output solely the primary row, i.e., the best-selling merchandise.

SELECT DATE_PART('MONTH', invoicedate) AS sale_month,
       description,
       SUM(unitprice * amount) AS total_paid
FROM online_retail
GROUP BY sale_month, description
ORDER BY total_paid DESC
LIMIT 1;

As I mentioned, that is naive; the output considerably resembles what we want, however we want this for each month, not only one.

Group-Based Deduplication

One of many right approaches is to make use of the RANK() window operate. With this method, we observe an identical technique to the earlier code. The distinction is that the question now turns into a subquery within the FROM clause. As well as, we use RANK() to partition the info by month after which rank the rows inside every partition (i.e., for every month individually) from the best-selling to the worst-selling merchandise.

Then, in the primary question, we merely choose the required columns and output solely rows the place the rank is 1 utilizing the WHERE clause.

SELECT month,
       description,
       total_paid
FROM
  (SELECT DATE_PART('month', invoicedate) AS month,
          description,
          SUM(unitprice * amount) AS total_paid,
          RANK() OVER (PARTITION BY DATE_PART('month', invoicedate) ORDER BY SUM(unitprice * amount) DESC) AS rnk
   FROM online_retail
   GROUP BY month, description) AS tmp
WHERE rnk = 1;

# Conclusion

The six ideas we’ve coated generally seem in SQL coding interview questions. Take note of them, follow interview questions that contain these ideas, study the right approaches, and also you’ll considerably enhance your possibilities in your interviews.

Nate Rosidi is a knowledge scientist and in product technique. He is additionally an adjunct professor educating analytics, and is the founding father of StrataScratch, a platform serving to information scientists put together for his or her interviews with actual interview questions from high firms. Nate writes on the newest traits within the profession market, offers interview recommendation, shares information science tasks, and covers all the pieces SQL.

Most Candidates Fail These SQL Ideas in Knowledge Interviews

# 1. Window Features

# 2. Filtering With Aggregates (Particularly HAVING vs. WHERE)

# 3. Self-Joins for Time-Based mostly or Occasion-Based mostly Comparisons

# 4. Subqueries vs. Frequent Desk Expressions (CTEs)

# 5. Dealing with NULLs in Logic

# 6. Group-Based mostly Deduplication

# Conclusion

Pandas: Superior GroupBy Strategies for Complicated Aggregations

Strolling sooner, hanging out much less

Google AI Analysis Releases DeepSomatic: A New AI Mannequin that Identifies Most cancers Cell Genetic Variants

LEAVE A REPLY Cancel reply

Most Popular

pockets – The place can I simply earn small quantities of Bitcoin?

Sharepoint ToolShell assaults focused orgs throughout 4 continents

Samsung bets on Android XR with the launch of its new Galaxy XR headset

☑ Pure USD & Euro Index – Real Buying and selling Technique – Buying and selling Methods – 22 October 2025

Recent Comments

ABOUT US

POPULAR POSTS

pockets – The place can I simply earn small quantities of Bitcoin?

Sharepoint ToolShell assaults focused orgs throughout 4 continents

Samsung bets on Android XR with the launch of its new Galaxy XR headset

POPULAR CATEGORY