Creating Basic Joins for your Queries

Setting up the FROM line of your Query

Purpose of JOINs

Books ERD

So, JOINs can be a bit confusing when you are first learning how to create the "FROM" line in SQL queries. Before you can really understand joins, it helps to understand the relationships between tables, usually created by a Foreign KEY (FK field in one table) linking to a Primary KEY (PK field in a different table).

To avoid duplication/etc, information about any one chunk of information can be split amongst many tables. So, in the example ERD (entity relationship diagram, schema design etc) above, the books table contains the book information only, but it does have a foreign key "publisher_id" that has the id number of a specific publisher.

The publishers table has only the publisher information, and its primary key "publisher_id" is what the foreign key "publisher_id" in the books table references.

Author information is kept in the authors table, so looking at just the books table tells you NOTHING about who the authors are for that book.

Because you can have more than one author per book, and authors can have more than one book, we create a table to connect those two tables together called "books_authors" which contains sets of unique combinations of book_id and author_id.

This is sometimes called a "JOIN" table, which can be confusing when doing the JOINs in a query, but they are kind of related.

If a book, say, "Mad Scientist Stories" with a book_id of 15 has 23 authors, book_id 15 will show up 23 times in the books_authors table, but each time it does, it will be matched up with a DIFFERENT author_id, once for each unique author_id:

15,2
15,34
15,255
15,13
15,15
and so on for 23 times

If an author has 10 books that they have authored, their author id (we'll say author 34 is Stan Lee), then author_id 34 shows up 10 times in books_authors, once for each different book(book_id). Notice above, apparently Stan Lee is one of the authors for "Mad Scientist Stories" in this example.

But, as we set a combined unique key on (book_id,author_id) a particular combination of author and book can only show up once. (so we will only see 15,34 once, even though book 15 shows up 23 times, and author 34 shows up ten times).

NOW, to set up JOINS (aka INNER JOINS) for your QUERY, you have to specify to the Query Optimizer (built into MySQL) how the data in one table matches up to another table. This USUALLY is just the PK to FK relationship mirrored via JOINs.

Types of Relationships and Joins

How to build your Query

Parts of a SELECT query

  1. SELECT followed by the list of fields you want to display
  2. FROM often the hardest part to set up at first, lists the tables and how they are joined. NOTE: This part is usually created first, which seems odd
  3. WHERE this is where your filtering conditions come into play
  4. GROUP BY used for aggregate queries (where you combine related data) using functions like count(), sum(), and avg() in the SELECT line
  5. ORDER BY for sorting
  6. Many other options.

Getting Started

What Tables to Use

First Table

Once you know which tables you have to use, you need to tell MySQL what tables to use, and how the information in the different tables matches up (ie how to link or join the tables together). If you only need one table, the from line is very easy

    FROM authors

    /* or */

    FROM books

    /* or */

    FROM publishers

But is there is more than one table, you have to tell the system how the other tables are connected. If not, the database will basically give you every possible combination, which is useless (basically every author is on every book and every publisher is for every book and so on :P)

So, we'll pick "A list of books and authors" which needs books, authors, and books_authors to get the info we need.

For general queries, it isn't always that crucial which table to list first, but I usually pick a edge table in the design and work my way through. As this database is about books, I'll pick that one first.
Books Table

Again, the very first table is easy, you just name it, and possibly create an alias for it ("b" in this case)

    FROM books  b

    /*
    so, search the books table, and we temporarily name it "b"  for convenience
    */

Adding Tables

Now, we need to add in the next table, and again, the order doesn't always matter.

HOWEVER, any table that you are JOINing to the FROM line HAS to connect to one of the tables that you have already put in the FROM line

As books_authors is the only one that connects to books

books_authors table

If we just listed the next table without telling SQL how they match up, we'd have a disaster like so

    FROM books b, books_authors ba

    /*
    WRONG, it matches every line in books with every line in books_authors
    */

So, instead, every table we add AFTER the first table listed, we have to JOIN it, and tell it ON what fields to check for matches. As the only possible connection between books and books_authors is "book_id" we do

    FROM books  b  JOIN books_authors ba  ON (b.book_id = ba.book_id)

This says only show me results where the book_id in the books table (b) matches up to a book_id in the books_authors table (ba).

This effectively has told your query to do this:

books JOIN books_authors

If we decide to add the publishers table, we can just do

    JOIN publishers p ON (p.publisher_id = b.publisher_id)

But that table is not needed to answer our question, so we will add in the authors table next, which if you look at our design makes visual sense, but again, usually the order of tables isn't that crucial

    FROM books b JOIN books_authors ba ON (b.book_id = ba.book_id) 
    JOIN authors a ON (ba.author_id = a.author_id)

Which represents:

books authors and join table

So, at this point, we have listed the three tables, and we have set up the connections (JOINS) between all three. Now if we do a query, we only see a result if ALL of those JOIN conditions are met.
If there is a book without a value in books_authors, it doesn't show up at all, same for authors.

All Tables Included

Even though our question does not need publishers, say we later want to know the publisher for each book as well as all the authors:

    FROM books b JOIN books_authors ba ON (b.book_id = ba.book_id)
    JOIN authors a ON (ba.author_id = a.author_id)
    JOIN publishers p  ON (b.publisher_id = p.publisher_id)

And of course, this represents:

Full books ERD

Building the Rest of your SELECT query

Page last updated on 20230913

HTML 5 Validation CSS 3 Validation Web Accessibility EValuator