SQL - Selecting master records based on criteria in child records
How do you select records whose detail records meet certain criteria?
For example, how do you select all invoices which contain both product A and product B?
The short answer: Use relational division.
Common examples
Here are a few common examples:
- Find all invoices containing both milk and honey
- Find all movies starring Brad Pitt and Jennifer Aniston
- Find all football players who played for both FC Barcelona and Real Madrid
In my case, I needed to find all files and folders in Nextcloud which had two particular WebDAV properties.
All examples have two things in common. The first one is the presence of a one-to-many relationship. The second one is that records must meet criteria in their associated child records.
Solving the movies' example
Let's pick the movie example. As a basis, we use the Sakila example database.
Here are the involved tables:
film
is the master tablefilm_actor
is the relation betweenactor
andfilm
film_actor.actor_id
holds the actor id.- For the sake of simplicity, let's pretend the ids of Brad Pitt and Jennifer Aniston are
23
and43
.
Note: Unfortunately I had to make those ids up because Sakila does not feature real actor names.
Anyway, here's how a first naive approach might look:
SELECT f.film_id
FROM film f
JOIN film_actor fa ON f.film_id = fa.film_id
WHERE actor_id = 23 AND actor_id = 43;
However, this can't work because actor_id
can't be both 23
and 43
.
Luckily, there are quite a few ways to accomplished our goal.
Here's one variant:
SELECT f.film_id
FROM film f
WHERE EXISTS (SELECT * FROM film_actor WHERE film_id = f.film_id AND actor_id = 23)
AND EXISTS (SELECT * FROM film_actor WHERE film_id = f.film_id AND actor_id = 43);
I won't try to explain what's going on here. Instead, please find some pointers and resources below.
Theoretical underpinnings
If you'd like to dig deeper, this operation is called relational division. The above solution is just one out of many ways to accomplish it. To get an overview, database guru Erwin Brandstetter has summarized a lot of them (including performance measurements) in this superb StackOverflow answer.
Links
Here is a list of helpful links:
- Understanding Relational Division in SQL - YouTube
- Divided We Stand: The SQL of Relational Division - Simple Talk
- sql - Using same column multiple times in WHERE clause - Stack Overflow
- postgresql - SQL query to find all parent records where child records are a specific set - Database Administrators Stack Exchange