SQL - Selecting master records based on criteria in child records

How do you select records whose detail records meet certain criteria?

For example, how do you select all invoices which contain both product A and product B?

The short answer is: You've got to use relational division.

It's best to continue with some more examples.

Common examples

Here are a few common examples:

  • Find all invoices which contain both milk and honey
  • Find all movies starring Brad Pitt and Jennifer Aniston
  • Find all football players who played for both FC Barcelona and Real Madrid.

In my case, I needed to find all files and folders in Nextcloud, which had two particular WebDAV properties.

All examples have two things in common. The first one is the presence of a one-to-many relationship. The second one is that records must meet criteria in their associated child records.

Solving the movies' example

Let's pick the movie example. Furthermore, let's assume we're using the Sakila example database.

A few words on the database structure relevant to our query:

  • film is the master table
  • film_actor is the relation between actor and film
  • film_actor.actor_id holds the actor id.
  • For the sake of simplicity, let's pretend the ids of Brad Pitt and Jennifer Aniston are 23 and 43.

Note: Unfortunately, I had to make those ids up because Sakila does not feature real actor names.

Anyways, here's how my first naive approach would have looked like:

SELECT f.film_id
FROM film f
  JOIN film_actor fa ON f.film_id = fa.film_id
WHERE actor_id = 23 AND actor_id = 43;

Pros will see the problem immediately: actor_id can't be both 23 and 43.

Luckily, there are quite a few ways to accomplished what we're looking for.

Here's one:

SELECT f.film_id
FROM film f
WHERE EXISTS (SELECT * FROM film_actor WHERE film_id = f.film_id AND actor_id = 23)
  AND EXISTS (SELECT * FROM film_actor WHERE film_id = f.film_id AND actor_id = 43);

I won't try to explain what's going on here. Instead, please find some pointers and resources worth checking out below.

Theoretical underpinnings

If you'd like to dig deeper, database professionals call this operation relational division. The above solution is just one out of many ways to accomplish the same thing. To get an overview, database guru Erwin Brandstetter has summarized a lot of them, including performance measurements, in this superb StackOverflow answer.


Here are a few more links I found helpful in understanding relational division:

Published by Robert Möstl

« Back to Blog