SQL Joins and Aggregations Deep Dive: INNER, LEFT, CROSS, SELF, GROUP BY and HAVING
Joins and aggregations are the core of relational data querying. Most developers know the basics, but interviews and real analytical queries require a deeper understanding: when FULL OUTER JOIN is the right tool, how SELF JOIN works, what ROLLUP adds, and when to use HAVING vs WHERE.
INNER JOIN
Returns only rows that have matching values in both tables:
sql-- Orders with their customer names SELECT o.id AS order_id, c.name AS customer_name, o.total, o.created_at FROM orders o INNER JOIN customers c ON c.id = o.customer_id WHERE o.total > 100 ORDER BY o.created_at DESC; -- Customers excluded: those with no orders -- Orders excluded: those with no matching customer (orphaned rows)
LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table, and matching rows from the right. Non-matching right rows become NULL:
sql-- All customers, whether or not they have orders SELECT c.name, COUNT(o.id) AS order_count, SUM(o.total) AS total_spent, MAX(o.created_at) AS last_order_date FROM customers c LEFT JOIN orders o ON o.customer_id = c.id GROUP BY c.id, c.name ORDER BY total_spent DESC NULLS LAST; -- Find customers who have NEVER placed an order SELECT c.name, c.email FROM customers c LEFT JOIN orders o ON o.customer_id = c.id WHERE o.id IS NULL; -- NULL on the right side = no match = no orders
RIGHT JOIN
Mirror of LEFT JOIN β returns all rows from the right table, matching rows from the left:
sql-- All orders, even if the customer was deleted SELECT o.id, o.total, c.name FROM customers c RIGHT JOIN orders o ON c.id = o.customer_id; -- In practice, RIGHT JOIN is rarely used -- you can always rewrite as LEFT JOIN -- by swapping the table order. LEFT JOIN is more readable.
FULL OUTER JOIN
Returns all rows from both tables. Non-matching rows on either side get NULLs:
sql-- Find all mismatches: orders without customers AND customers without orders SELECT c.id AS customer_id, c.name AS customer_name, o.id AS order_id, o.total FROM customers c FULL OUTER JOIN orders o ON c.id = o.customer_id WHERE c.id IS NULL OR o.id IS NULL; -- c.id IS NULL: orders with no matching customer (orphaned) -- o.id IS NULL: customers with no orders -- Reconcile two lists: find items in list A, list B, or both SELECT a.product_id AS in_warehouse_a, b.product_id AS in_warehouse_b, COALESCE(a.quantity, 0) + COALESCE(b.quantity, 0) AS total_quantity FROM warehouse_a a FULL OUTER JOIN warehouse_b b ON a.product_id = b.product_id;
CROSS JOIN
Returns the Cartesian product β every row in the left table combined with every row in the right:
sql-- Generate all possible size/color combinations SELECT s.name AS size, c.name AS color, s.name || '-' || c.name AS sku_suffix FROM sizes s CROSS JOIN colors c; -- 5 sizes Γ 8 colors = 40 rows -- Generate a date range (using generate_series in PostgreSQL) SELECT d::date AS report_date, EXTRACT(DOW FROM d) AS day_of_week FROM generate_series( '2025-01-01'::date, '2025-12-31'::date, '1 day'::interval ) d; -- Fill gaps in time-series data using CROSS JOIN WITH dates AS ( SELECT generate_series( date_trunc('month', NOW()) - interval '11 months', date_trunc('month', NOW()), interval '1 month' )::date AS month ), users AS ( SELECT DISTINCT user_id FROM events ) SELECT u.user_id, d.month, COUNT(e.id) AS event_count FROM users u CROSS JOIN dates d LEFT JOIN events e ON e.user_id = u.user_id AND date_trunc('month', e.created_at) = d.month GROUP BY u.user_id, d.month ORDER BY u.user_id, d.month;
SELF JOIN
A table joined to itself. Used for hierarchical data and comparing rows within the same table:
sql-- Employee-manager hierarchy SELECT e.name AS employee, m.name AS manager, e.department FROM employees e LEFT JOIN employees m ON m.id = e.manager_id; -- e is the employee, m is their manager (same table, different alias) -- Find employees earning more than their manager SELECT e.name AS employee, e.salary AS employee_salary, m.name AS manager, m.salary AS manager_salary FROM employees e JOIN employees m ON m.id = e.manager_id WHERE e.salary > m.salary; -- Find duplicate email addresses SELECT a.email, a.id AS id_1, b.id AS id_2 FROM users a JOIN users b ON a.email = b.email AND a.id < b.id; -- a.id < b.id prevents showing (1,2) and (2,1) -- each pair once
GROUP BY Deep Dive
sql-- Basic aggregation SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary, MIN(salary) AS min_salary, MAX(salary) AS max_salary, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary, SUM(salary) AS total_payroll FROM employees GROUP BY department ORDER BY total_payroll DESC; -- GROUP BY multiple columns SELECT department, EXTRACT(YEAR FROM hire_date) AS hire_year, COUNT(*) AS hired FROM employees GROUP BY department, hire_year ORDER BY department, hire_year; -- Filter groups with HAVING (applied AFTER aggregation) SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent FROM orders WHERE created_at >= NOW() - INTERVAL '1 year' -- filter rows BEFORE grouping GROUP BY customer_id HAVING COUNT(*) >= 5 -- filter groups AFTER aggregation AND SUM(total) > 500 ORDER BY total_spent DESC;
WHERE vs HAVING
sql-- WHERE filters individual rows before grouping -- HAVING filters groups after aggregation -- This is WRONG (cannot use aggregate in WHERE): SELECT department, COUNT(*) FROM employees WHERE COUNT(*) > 10 -- ERROR: aggregate functions not allowed in WHERE GROUP BY department; -- Correct: use HAVING SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department HAVING COUNT(*) > 10; -- You CAN use both: SELECT department, COUNT(*) AS headcount FROM employees WHERE salary > 50000 -- filter rows first (only high earners) GROUP BY department HAVING COUNT(*) > 5; -- then filter groups (departments with 5+)
ROLLUP and CUBE
Generate subtotals and grand totals automatically:
sql-- ROLLUP: subtotals for each level of hierarchy SELECT COALESCE(department, 'All Departments') AS department, COALESCE(CAST(EXTRACT(YEAR FROM hire_date) AS TEXT), 'All Years') AS year, COUNT(*) AS headcount, SUM(salary) AS total_salary FROM employees GROUP BY ROLLUP(department, EXTRACT(YEAR FROM hire_date)) ORDER BY department, year; -- Output includes: -- (Engineering, 2023): subtotal for Engineering 2023 -- (Engineering, NULL): subtotal for all Engineering -- (NULL, NULL): grand total for all departments -- CUBE: all combinations of subtotals GROUP BY CUBE(department, region, role) -- Generates subtotals for every combination: all 8 combinations of 3 dimensions
Common Aggregation Patterns
Conditional aggregation (FILTER or CASE)
sql-- Count by category in one query SELECT COUNT(*) FILTER (WHERE status = 'completed') AS completed, COUNT(*) FILTER (WHERE status = 'pending') AS pending, COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled, COUNT(*) AS total FROM orders; -- Equivalent with CASE WHEN (portable across databases) SELECT COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed, COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending, SUM(CASE WHEN status = 'completed' THEN total ELSE 0 END) AS completed_revenue FROM orders; -- Pivot: rows to columns SELECT product_id, SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 1 THEN amount ELSE 0 END) AS q1, SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 2 THEN amount ELSE 0 END) AS q2, SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 3 THEN amount ELSE 0 END) AS q3, SUM(CASE WHEN EXTRACT(QUARTER FROM sale_date) = 4 THEN amount ELSE 0 END) AS q4 FROM sales GROUP BY product_id;
Running totals
sqlSELECT order_date, daily_revenue, SUM(daily_revenue) OVER (ORDER BY order_date) AS cumulative_revenue FROM daily_sales;
Common Interview Questions
Q: What is the difference between WHERE and HAVING?
WHERE filters individual rows before they are grouped β it cannot reference aggregate functions. HAVING filters groups after GROUP BY aggregation β it can reference aggregate functions. Use WHERE to reduce the rows entering aggregation (more efficient), and HAVING to filter the resulting groups.
Q: What does a CROSS JOIN produce and when would you use it?
A CROSS JOIN produces the Cartesian product β every row in the first table paired with every row in the second table. N rows Γ M rows = NΓM results. Use it to generate all combinations (product SKUs from sizes and colors), to fill time-series gaps by joining a date spine with entity lists, or to generate test data.
Q: How does a SELF JOIN work?
A SELF JOIN joins a table to itself by aliasing it with two different names. It is used for hierarchical data (employees and their managers), for finding relationships between rows in the same table (pairs of users in the same city), or for duplicate detection. The join condition typically compares different columns or adds a constraint like a.id < b.id to avoid duplicates.
Practice SQL on Froquiz
Joins and aggregations are core SQL interview topics at every level. Test your SQL knowledge on Froquiz β from basic queries to advanced window functions and CTEs.
Summary
INNER JOIN: only matching rows from both tablesLEFT JOIN: all left rows, matching right rows (NULLs for no match) β useWHERE right.id IS NULLto find unmatchedFULL OUTER JOIN: all rows from both tables β unmatched rows get NULLs on the other sideCROSS JOIN: Cartesian product β every row Γ every row β use for combinations and date spinesSELF JOIN: table joined to itself with two aliases β use for hierarchies and pair comparisonsWHEREfilters rows before grouping;HAVINGfilters groups after aggregationROLLUPgenerates hierarchical subtotals;CUBEgenerates all combinationsCOUNT(*) FILTER (WHERE condition)is the cleanest conditional aggregation syntax