#How to achieve this using django orm?

12 messages · Page 1 of 1 (latest)

brisk trellis
#

Assume we have table Discussion and a table Message. Message has columns: id, created_at, text, author, discussion_id. Discussion has columns: id, created_at, participant1, participant2. For a given user, how can you create a queryset of the discussions sorted in their recent-ness. The recent-ness of a discussion is defined by the maximum created_at of the discussion of the latest message in this discussion.

Now, is there a way to do this WITHOUT SUBQUERIES and such that I can access the latest_message as if it was a one-to-one model of discussion (which means that I could access the message text by doing: discussion.latest_message.text AND NOT discussion.latest_message_text [which wouldve been annotated to the request]).

Here is what I came up with so far:

Discussion.objects.annotate(
        latest_message_date=Max('message__created_at'),  # Using reverse relation from Discussion to Message
        latest_message_text=Max('message__text'), # Somehow retrieves the text of the latest_message
        latest_message_author=Max('message__author__username')
    ).annotate(
        recent_date=Greatest(
            'created_at',
            F('latest_message_date')
        )
    ).filter(
        Q(participant1=user) | Q(participant2=user)
    ).order_by('-recent_date')

It works well but I was wondering if there was a way to somehow assign latest_message to the model of message instead of annotating every single field. I can survive without it, its just good to know.

elder ridge
#

I would use a JSONObject https://docs.djangoproject.com/en/5.1/ref/models/database-functions/#jsonobject

So your query would look something like:

Discussion.objects.annotate(
        latest_message=JSONObject(
              date=Max('message__created_at'),  # Using reverse relation from Discussion to Message
              text=Max('message__text'), # Somehow retrieves the text of the latest_message
              author=Max('message__author__username')
        )
    ).annotate(
        recent_date=Greatest(
            'created_at',
            F('latest_message__date')
        )
    ).filter(
        Q(participant1=user) | Q(participant2=user)
    ).order_by('-recent_date')
#

(Disclaimer: not tested)

brisk trellis
#

I get what you propose but my objective is to be able to access any attribute of message even if there are 1000 of them without writing each of them in the query. Also, what if I wanted to call a managers method on the latest_message. For instance, what if I wanted to do: discussion.latest_object.objects.random_manager_method().

elder ridge
#

In that case, I would probably just add a one to one field to the one of the models and have some code update that relationship when a new message get's created for that discussion.

brisk trellis
#

Yeah I guess, the only problem with this approach is that it adds complexity and creates redundancy. I guess what I want to achieve isnt possible using django orm

elder ridge
#

Well it is possible... but possibly not without the restrictions you are putting on yourself, but I obviously don't have the full context. Also why no to subqueries?

brisk trellis
#

The reason I want dont want subqueries is that, from what I understood, in my case, it would mean running the subquery on every row which is not performant especially for very large table. For instance, I could annotate a subquery that gets the latest message of each discussion but if I have 1 million discussions, it would run this subquery a million times.

#

What I would love is to have something like this:

Discussion.objects.annotate(
        latest_message=GetFirst(related_object='message', order_by='-created_at')
    ).annotate(
        recent_date=Greatest(
            'created_at',
            F('latest_message__created_at')
        )
    ).filter(
        Q(participant1=user) | Q(participant2=user)
    ).order_by('-recent_date')

where GetFirst would be a function where you specify the related_object and the way to order it. The difference with a subquery is that the related object would have to be a foreign field. This would allow the engine to format the query as a join query instead of a subquery which could be less efficient.

pallid valley
#

If you put an index on message.created_at it's going to be a long time before you notice a performance issue. @elder ridge's suggestion is great and reflects a common pattern in NoSQL databases, but you're right it is redundant.

brisk trellis
#

Im not sure I understand how the index would help here. From what I understand, indexing could help by providing sorting order based on the key in question. However, Im not trying to find the most recent messages overall, but rather the most recent message within each conversation.

While writing this however , I read about composite index which could be useful by indexing the grouping key (discussion) and then by the value key (created_at) which would allow to quickly find the highest created_at for each discussion.

However, will indexing + subqueries ever make it better than joining (+ indexing even) purely from an performance perspective? At the end of the day with subqueries, you will always have to run it once for each discussion? (But I guess that at a lower level, joining probably leads to the same thing since you need to check the ON condition for each row).

I guess it doesnt really matter at this point. Using an index will surely be enough either way

pallid valley
#

Good call, a composite index is going to get you even better performance. I'm not sure how different a subquery and join would look without seeing the raw sql. But I think you're right, the query optimizer is likely going to evaluate them to the same plan.