#Django ORM, efficiently accessing related objects

9 messages · Page 1 of 1 (latest)

obsidian mortar
#

Hi, I am trying to understand how to use the Django ORM. It seems like it is sometimes possible to access the same data through a forward relationship and a reverse relationship.

Context

Given these models:

class Author(models.Model):
    name = models.CharField(max_length=255)

class Book(models.Model):
    title = models.CharField(max_length=255)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=6, decimal_places=2)

class Order(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    quantity = models.IntegerField()
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)

Let's say you want to find the total number of books written per author. It seems to me that this can be done two ways:

  1. Forward relationship from Book-->Author

Something like: Book.objects.values('author').annotate(book_count=Count('id')).order_by('author')

  1. Reverse relationship from Author-->Book

Author.objects.annotate((book_count=Count('book'))

Question

Assuming that both are valid, does it matter which one is used? Is one more efficient than the other? Thanks!

twin plaza
#

try both and see.

#

watch the generated SQL queries.

#

all else equal, I'd prefer the second, since it's simpler.

#
>>> Book.objects.values('author').annotate(book_count=Count('id')).order_by('author')

SELECT "app_book"."author_id",
       COUNT("app_book"."id") AS "book_count"
  FROM "app_book"
 GROUP BY "app_book"."author_id"
 ORDER BY "app_book"."author_id" ASC
 LIMIT 21
Execution time: 0.000944s [Database: default]
<QuerySet []>

>>> Author.objects.annotate(book_count=Count('book'))
SELECT "app_author"."id",
       "app_author"."name",
       COUNT("app_book"."id") AS "book_count"
  FROM "app_author"
  LEFT OUTER JOIN "app_book"
    ON ("app_author"."id" = "app_book"."author_id")
 GROUP BY "app_author"."id",
          "app_author"."name"
 LIMIT 21
Execution time: 0.000507s [Database: default]
<QuerySet []>
#

I'd have to stick some actual data in there to tell if the results are both accurate

drowsy loom
#

I have to imagine the first one would perform better because there's one less join. However, that may not be a big performance difference because these are pretty basic queries.

Using the .explain() method on the queryset to get the underlying explain analyze would be more telling. Though like offby1 pointed out, there needs to be production levels of data.

twin plaza
#

yeah I too would assume, if I knew nothing else, that fewer joins == better, but then ... my assumptions are pretty naive

#

unfortunately, the correct answer to almost all performance questions is "try both, measure, and see" 😐