SQL - Selecting master records based on criteria in child records
How do you select records whose detail records meet certain criteria?
For example, how do you select all invoices which contain both product A and product B?
The short answer is: You've got to use relational division.
It's best to continue with some more examples.
Here are a few common examples:
- Find all invoices which contain both milk and honey
- Find all movies starring Brad Pitt and Jennifer Aniston
- Find all football players who played for both FC Barcelona and Real Madrid.
In my case, I needed to find all files and folders in Nextcloud, which had two particular WebDAV properties.
All examples have two things in common. The first one is the presence of a one-to-many relationship. The second one is that records must meet criteria in their associated child records.
Solving the movies' example
Let's pick the movie example. Furthermore, let's assume we're using the Sakila example database.
A few words on the database structure relevant to our query:
filmis the master table
film_actoris the relation between
film_actor.actor_idholds the actor id.
- For the sake of simplicity, let's pretend the ids of Brad Pitt and Jennifer Aniston are
Note: Unfortunately, I had to make those ids up because Sakila does not feature real actor names.
Anyways, here's how my first naive approach would have looked like:
SELECT f.film_id FROM film f JOIN film_actor fa ON f.film_id = fa.film_id WHERE actor_id = 23 AND actor_id = 43;
Pros will see the problem immediately:
actor_id can't be both
Luckily, there are quite a few ways to accomplished what we're looking for.
SELECT f.film_id FROM film f WHERE EXISTS (SELECT * FROM film_actor WHERE film_id = f.film_id AND actor_id = 23) AND EXISTS (SELECT * FROM film_actor WHERE film_id = f.film_id AND actor_id = 43);
I won't try to explain what's going on here. Instead, please find some pointers and resources worth checking out below.
If you'd like to dig deeper, database professionals call this operation relational division. The above solution is just one out of many ways to accomplish the same thing. To get an overview, database guru Erwin Brandstetter has summarized a lot of them, including performance measurements, in this superb StackOverflow answer.
Here are a few more links I found helpful in understanding relational division:
- Understanding Relational Division in SQL - YouTube
- Divided We Stand: The SQL of Relational Division - Simple Talk
- sql - Using same column multiple times in WHERE clause - Stack Overflow
- postgresql - SQL query to find all parent records where child records are a specific set - Database Administrators Stack Exchange