#Prefetch Filter complex relationship

63 messages · Page 1 of 1 (latest)

meager vigil
#

Hello, i have a problem that i cant figure out how to filter correct relations.
Here are my models:


class Company(models.Model):
    customers = models.ManyToManyField("app_organisation.Customer", related_name="companies")
class CustomerTask(models.Model):
    company = models.ManyToManyField(Company, related_name="tasks")
class Customer(models.Model):
    tasks = models.ManyToManyField(CustomerTask, related_name="customer")

Basically the structure separates customers and tasks for each company.

In my Viewset im trying to prefetch the tables and filter out for each company customers and tasks:


def get_queryset(self):
        # Here is the wrong filter. both queries do not work as expected

        # task_subquery = CustomerTask.objects.filter(customer=OuterRef("id"), company__in=Subquery(Company.objects.filter(tasks=OuterRef("id")).values("id"))).values("id")

        task_subquery = CustomerTask.objects.filter(customer=OuterRef("id"),
                        company__in=OuterRef("company__id")).values("id")

        company_query = Company.objects.prefetch_related(
            Prefetch("customers", queryset=Customer.objects.prefetch_related(
            Prefetch("tasks", queryset=CustomerTask.objects.filter(id__in=Subquery(task_subquery)
                )))))

        return company_query

My Serializers have the following schema:


class CustomerTaskSerializer(serializers.ModelSerializer):
    class Meta:
        model = CustomerTask
        fields = ["id", "name"]


class CustomerSerializer(serializers.ModelSerializer):
    tasks = CustomerTaskSerializer(many=True, read_only=True)

    class Meta:
        model = Customer
        fields = ["id", "name", "tasks"]


class CompanySerializer(serializers.ModelSerializer):
    customers = CustomerSerializer(many=True, read_only=True)

    class Meta:
        model = Company
        fields = "__all__"
        read_only_fields = ["name"]

When i return company i want to input its customers and tasks which belong to the company.

In get_query i struggle to filter that correctly.
I have Company A and Company B
both have Customer C
and Company A + Customer C have Task D and Task E

but it returns for both companies Task D on Customer C

south cargo
#

Hi, What is the aim of the query? As I understand it, you want to get the ‘Company’ of all ‘CustomerTask’?

meager vigil
#

company2 shouldnt have Task1, and company1 should have Task2. thats what my database shows me

#
{
    "table": "app_organisation_customer_task_company",
    "rows":
    [
        {
            "id": 9,
            "customertask_id": 2,
            "company_id": 1
        },
        {
            "id": 10,
            "customertask_id": 1,
            "company_id": 1
        }
    ]
}


{
    "table": "app_organisation_customer_tasks",
    "rows":
    [
        {
            "id": 8,
            "customer_id": 1,
            "customertask_id": 2
        },
        {
            "id": 10,
            "customer_id": 1,
            "customertask_id": 1
        }
    ]
}

both companies have same customer

meager vigil
#


task_subquery = CustomerTask.objects.filter(customers=OuterRef("customers"), company=OuterRef("company__id")).values("id")

customer_task_prefetch = Prefetch("tasks", queryset=CustomerTask.objects.filter(id__in=Subquery(task_subquery)))

customer_prefetch = Prefetch("customers", queryset=Customer.objects.prefetch_related(customer_task_prefetch))

companies = Company.objects.prefetch_related(customer_prefetch)

for company in companies:
    cprint(f"Company: {company.id}")
    for customer in company.customers.all():
        cprint(f"  Customer: {customer.id}")
        for task in customer.tasks.all():
            cprint(f"    Task: {task.id}")

what i get:
Company: 1
  Customer: 1
    Task: 1
    Task: 2
Company: 2
  Customer: 1
    Task: 1
    Task: 2

what it should be:
Company: 1
  Customer: 1
    Task: 1
    Task: 2
Company: 2
  Customer: 1

#

the problem lies in customertask subquery:

task_subquery = CustomerTask.objects.filter(customers=OuterRef("customers"), company=OuterRef("company__id")).values("id")

it doesnt filter out by company

here is what really works

        for company in companies:
            cprint(f"Company: {company.id}")
            for customer in company.customers.all():
                cprint(f"  Customer: {customer.id}")
                for task in customer.tasks.filter(company=company): # !company=company
                    cprint(f"    Task: {task.id}")
#
task_subquery = CustomerTask.objects.filter(customers=OuterRef("customers"), id=OuterRef("customers__companies__tasks__id")).values("id")

i dont understand it returns all tasks foreach company

south cargo
#

You have many to many links. This means that a task and its linked customer data record can exist in several companies. I also wonder why you link CustomerTask with Company? Shouldn't it actually be linked to the Customer? After all, it is ‘his’ (Customer) task.

#

You then link Customer with CustomerTask... So if you now take the CustomerTask as the basis, it accesses the Company, the Company then accesses the Customer and the Customer then has tasks and these tasks are ALL of the Customer.

#

Maybe someone else has understood it, but I need a little more information to be able to help you:

  1. what are you going to do with the data? Are they just for display? If so, what should it look like?
  2. what do the tables and data look like? (Graphic with 2-3 data sets per table would be sufficient)
meager vigil
south cargo
#

and then you want to output a list of the tasks of a company and their customers?

wild briar
#

I think your data model is designed incorrectly. You're using many to many fields when you may be better off putting a foreign key (one to many).

For example. Should a customer have one or many company's associated with it? The company should have many customers but the reverse probably isn't true.

Similarly, a customer task should only be related to a single customer. While the customer can have many customer tasks related to it.

meager vigil
#

it is desired. i mean by that, company A and Company B has sometimes same customers and sometimes same tasks

wild briar
#

Is that what you want?

#

Oof

meager vigil
wild briar
#

Honestly, you're going to have a really hard time with that.

#

The customers bit is understandable. The tasks one is the problem

meager vigil
#

everything works fine. like adding removing. but i cant query/prefetch the data correctly

wild briar
#

Yeah, that tracks. Things are too interrelated.

meager vigil
#

ive found a way but it produces n+1 problem, ill show you

wild briar
#

Well maybe that's fine in this case. You shouldn't be looping over all customers except for internal reporting needs.

meager vigil
#
class CompanyViewSet(BaseViewSet, AllActionMixin, ModifyActionMixin):
    queryset = Company.objects.all()
    serializer_class = CompanySerializer

    def get_queryset(self):
      companies = Company.objects.prefetch_related("customers","customers__tasks__company").order_by("name")
      return companies

class CompanySerializer(serializers.ModelSerializer):
    class Meta:
        model = Company
        fields = ["customers", "name"]
        read_only_fields = ["name"]

    def to_representation(self, instance):
        ret = super().to_representation(instance)
        if ret.get("customers"):
            ret["customers"] = [
                {
                    "id": customer.id,
                    "name": customer.name,
                    "tasks": [{"id": task.id, "name": task.name} for task in (customer.tasks.filter(company=instance))],
                }
                for customer in instance.customers.all().order_by("name")
            ]
        return ret
meager vigil
wild briar
#

Yeah, I'm going to argue your company list view shouldn't show ALL your data

#

Force the client of the api to make a request per company to get the details.

south cargo
# meager vigil yes

ok then you have to design the query in such a way that you 1. select the company, then the respective tasks from the companies and from these tasks (per company) the customer.

meager vigil
meager vigil
wild briar
#

Print the queries

meager vigil
wild briar
#

I don't want to see them. I meant for you to analyze.

south cargo
#

like this:

SELECT * FROM company LEFT JOIN customertask ON company.id = customertask.companyid LEFT JOIN customer ON customer.id = customertask.customerid
wild briar
#

Regardless, I think you're going to want to change your approach for this problem. You're trying to access too much data. Its going to grow exponentially and cause performance problems down the line (1000 companies with 100k tasks each). The better approach is to simplify the problem by changing the API design.

meager vigil
wild briar
meager vigil
meager vigil
wild briar
#

The prefetch for all customers is limited by a big ID IN (...)
There's no limiting by company. So then when you try to filter on OuterRef company, it's finding all the companies for that customer which is why you're getting duplicate tasks.

wild briar
meager vigil
wild briar
#

Right because you're going across two many to many relationships. Any company that has a relationship to that task somehow, is going to match on that filter.

meager vigil
wild briar
#

Not easily.

#

Do you understand why this isn't working though?

#

You could use a Common Table Expression. Or drop to raw sql. Or maybe use the extra queryset method

meager vigil
meager vigil
# wild briar You're accessing data that hasn't been prefetched. Look at the duplicate queries...
SELECT "app_organisation_customer_task"."id", "app_organisation_customer_task"."name" FROM "app_organisation_customer_task" INNER JOIN "app_organisation_customer_tasks" ON ("app_organisation_customer_task"."id" = "app_organisation_customer_tasks"."customertask_id") INNER JOIN "app_organisation_customer_task_company" ON ("app_organisation_customer_task"."id" = "app_organisation_customer_task_company"."customertask_id") WHERE ("app_organisation_customer_tasks"."customer_id" = 131 AND "app_organisation_customer_task_company"."company_id" = 4)

query that gets duplicated :
"app_organisation_customer_tasks"."customer_id" = 131

companies = Company.objects.prefetch_related("customers","customers__tasks").order_by("name")

i dont understand what should i prefetch here

wild briar
#

Hmm, I'm not sure either

south cargo
meager vigil
wild briar
meager vigil
wild briar
#

No, the image you sent me and your models

meager vigil
#

the M2M box contains the tables that were created. one thing i notice ive mistaken the name of a table:
i wrote

app_organisation_company_customer_tasks
instead of
app_organisation_customer_task_company
#

i dont understand whats wrong with relations.
in app_organisation_customer_task_company
you can add tasks to different companies

south cargo