Using ON Versus WHERE Clauses to Combine and Filter Data in PostgreSQL Joins
May 19, 2020 • 10 Minute Read
Introduction
In an SQL query, data can be filtered in the WHERE clause or the ON clause of a join. This guide will examine the difference between the two in PostgreSQL.
Setup
In order to examine the difference between placing a filter condition in the ON clause and the WHERE clause, two tables have to be created that can be joined together. Otherwise the ON clause cannot be used.
The following code will create and populate two tables, one with employee data and another with department data, that can be joined together based on department number:
DROP TABLE IF EXISTS emp;
DROP TABLE IF EXISTS dept;
CREATE TABLE dept(
deptno INT,
dname VARCHAR(14),
loc VARCHAR(13),
CONSTRAINT pk_dept PRIMARY KEY(deptno)
);
INSERT INTO dept
(deptno, dname, loc)
VALUES
(10, 'ACCOUNTING', 'NEW YORK'),
(20, 'RESEARCH', 'DALLAS'),
(30, 'SALES', 'CHICAGO'),
(40, 'OPERATIONS', 'BOSTON')
;
CREATE TABLE emp(
empno INT,
ename VARCHAR(10),
job VARCHAR(9),
mgr INT,
hiredate DATE,
sal NUMERIC(7,2),
comm NUMERIC(7,2),
deptno INT,
CONSTRAINT pk_emp PRIMARY KEY(empno),
CONSTRAINT fk_deptno FOREIGN KEY(deptno) REFERENCES dept(deptno)
);
ALTER TABLE emp ADD CONSTRAINT fk_mgr FOREIGN KEY(mgr) REFERENCES emp(empno);
INSERT INTO emp
(empno, ename, job, mgr, hiredate, sal, comm, deptno)
VALUES
(7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10),
(7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, NULL, 30),
(7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, NULL, 10),
(7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, NULL, 20),
(7788, 'SCOTT', 'ANALYST', 7566, '1987-07-13', 3000, NULL, 20),
(7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, NULL, 20)
;
COMMIT;
Table emp:
empno | ename | job | mgr | hiredate | sal | comm | deptno |
---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ||
7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | 30 | |
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | |
7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | 20 | |
7788 | SCOTT | ANALYST | 7566 | 0001-07-13 | 3000.00 | 20 | |
7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | 20 |
Table dept:
deptno | dname | loc |
---|---|---|
10 | ACCOUNTING | NEW YORK |
20 | RESEARCH | DALLAS |
30 | SALES | CHICAGO |
40 | OPERATIONS | BOSTON |
Using ON Versus WHERE Clauses in an Inner Join
In an inner join, adding a filter condition in the ON clause has the same affect on a query result.
Suppose there is a business requirement to return employee and department data only when there is a matching department number in both tables, and that the result should only contain rows where the department name is ACCOUNTING. The user wants to display all employee attributes along with department name and location.
The following three queries could be written to satisfy that requirement:
SELECT emp.*, dept.dname, dept.loc
FROM emp
INNER JOIN dept
ON emp.deptno = dept.deptno
AND dept.dname = 'ACCOUNTING'
;
SELECT emp.*, dept.dname, dept.loc
FROM emp
INNER JOIN dept
ON emp.deptno = dept.deptno
WHERE
dept.dname = 'ACCOUNTING'
;
SELECT emp.*, dept.dname, dept.loc
FROM emp, dept
WHERE
emp.deptno = dept.deptno
AND dept.dname = 'ACCOUNTING'
;
empno | ename | job | mgr | hiredate | sal | comm | deptno | dname | loc |
---|---|---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ACCOUNTING | NEW YORK | ||
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | ACCOUNTING | NEW YORK |
All three queries return the exact same result regardless of whether the filter and join condition are placed in the ON clause or the WHERE clause.
As far as performance goes, it makes no difference whether the filter condition is placed in the ON clause or the WHERE in PostgreSQL.
In terms of readability though, especially in complex queries that have multiple joins, it is easier to spot join conditions when they are placed in the ON clause and filter conditions when they are placed in the WHERE clause. In the example queries above, the second query is written this way.
Using ON Versus WHERE Clauses in a Left Join
In a left join, placing a filter condition in the ON clause will affect a query result differently.
Suppose there is a business requirement to return all employees from the emp table and look up department information only for employees that are in the ACCOUNTING department. The user wants to display all employee attributes along with department name and location.
This requirement could be satisfied by placing a filter condition in the ON clause so that the dept table is filtered on dept.dname = 'ACCOUNTING' before being left joined to emp:
SELECT emp.*, dept.dname, dept.loc
FROM emp
LEFT JOIN dept
ON emp.deptno = dept.deptno
AND dept.dname = 'ACCOUNTING'
;
empno | ename | job | mgr | hiredate | sal | comm | deptno | dname | loc |
---|---|---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ACCOUNTING | NEW YORK | ||
7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | 30 | |||
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | ACCOUNTING | NEW YORK | |
7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | 20 | |||
7788 | SCOTT | ANALYST | 7566 | 0001-07-13 | 3000.00 | 20 | |||
7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | 20 |
NULL entries are placed in dname and loc wherever a match could not be made to the dept table based on department number.
Suppose now that there is a business requirement to look up department information for all employees and then based on that lookup to return only rows were the employee is in the ACCOUNTING department. The user wants the same attributes displayed as before.
This time, to satisfy the requirement the filter condition in the query should come after the join since you'd want to first look up department information and then filter based on the results of that lookup:
SELECT emp.*, dept.dname, dept.loc
FROM emp
LEFT JOIN dept
ON emp.deptno = dept.deptno
WHERE dept.dname = 'ACCOUNTING'
;
empno | ename | job | mgr | hiredate | sal | comm | deptno | dname | loc |
---|---|---|---|---|---|---|---|---|---|
7839 | KING | PRESIDENT | 1981-11-17 | 5000.00 | 10 | ACCOUNTING | NEW YORK | ||
7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | 10 | ACCOUNTING | NEW YORK |
This result should look familiar. The inner join queries from the previous section returned the same result. In fact, this query is logically equivalent to doing an inner join. There first has to be a match on department number between emp and dept tables to be able to filter on dname = 'ACCOUNTING'.
It should be apparent now that the two seemingly similar left join queries have two completely different results. In the first example a filter condition was placed in the ON clause to filter the lookup table dept before the join. In the second example a filter condition was placed in the WHERE clause to filter rows after the join based on the lookup table column dname. This was effectively an inner join in disguise.
In a left join, understanding business requirements is key when deciding whether to place a filter condition in the ON clause or the WHERE clause.
Conclusion
In SQL, using the ON clause versus the WHERE clause to filter data in an outer join will cause the filter to be applied either before or after joining occurs and will lead to different results. In an inner join, whether a filter condition is placed in the ON clause or the WHERE clause does not impact a query result, although for readability placing join conditions in the ON clause and filter conditions in the WHERE is a good idea.