#Deploy on Kubernetes Cluster - Permify D...

lone yew
#

Hey @untold gull, that performance gap is pretty big. Our load tests don't show results like the ones you mentioned, so I'd like to figure out what's going on. I have a few quick questions:

  • Are you using the latest version?
  • Can you share your Permify schema?
  • Can you share your deployment YAML too?
untold gull
#

Hey @lone yew thanks for the help
Yes, we are using the latest image, ghcr.io/permify/permify:latest

The schema is a placeholder for testing and won't represent our final use case, but the one used in the load test is:

entity user {}

entity appgroup {
  relation member @user
}

entity person {
  relation user_of @user

  permission user = user_of
}

entity organization {
  relation admin_of @appgroup
  relation employee_of @person

  permission admin = admin_of.member
  permission manage = employee_of.user or admin
}

entity order {
  relation guardian @person
  relation organization @organization

  permission view = guardian.user or organization.manage
  permission edit = guardian.user or organization.manage
  permission delete = organization.admin
}

entity charge {
  relation order @order

  permission view = order.view
  permission edit = order.edit
  permission delete = order.delete
}

entity receivable {
  relation charge @charge
  relation origin @receivable

  permission view = charge.view or origin.view
  permission edit = charge.edit or origin.edit
  permission delete = charge.delete or origin.delete
}

entity invoice {
  relation receivable @receivable

  permission view = receivable.view
  permission edit = receivable.edit
  permission delete = receivable.delete
}
#

The deployment YAML is

#

The check permission we are running is like

checkPermissionRequest := &permifyTypes.PermissionCheckRequest{
  TenantId: "1",
  Metadata: &permifyTypes.PermissionCheckRequestMetadata{
    SchemaVersion: "",
    Depth:         20,
  },
  Entity:     &permifyTypes.Entity{
    Type: "order",
    Id:   "Order_1",
  },
  Permission: "edit",
  Subject: &permifyTypes.Subject{
    Type: "user",
    Id:   "User_1",
  },
}

untold gull
#

An interesting thing I noticed today is that if I set it to only 1 pod, the performance for 500 req/s improved to a 180ms p50
Still not ideal but a 10x improvement over 10 pods (which is really weird)

#

Another thing of note is that the k8s Service being used is of type ClusterIP and not LoadBalancer

celest walrus
#

Hello @untold gull, we are investigating your use case. I will let you know when we get the results.

celest walrus
#

Hey @untold gull ,

I checked your deployment and didn’t notice anything wrong. When I tested your use case in our cloud environment, I got: (med): 52.66ms, (p90): 193.88ms

Is it possible that in your test scenario, you are writing the same data repeatedly? Are you seeing any serialization errors in the application logs or any connection drops from the database?

Could you try testing checks separately to isolate the issue?

If you’d like to discuss this further, we’d be happy to jump on a call with you.

untold gull
#

Hey @celest walrus

We are indeed seeing a lot of serialization errors.
Postgres logs are

ERROR:  could not serialize access due to read/write dependencies among transactions

DETAIL:  Reason code: Canceled on commit attempt with conflict in from prepared pivot.

However, we are not trying to write the same data repeatedly - as I stated, the data is similar to

{
  Entity: &permifyTypes.Entity{
    Type: "person",
    Id:   "Person_1",
  },
  Relation: "user_of",
  Subject: &permifyTypes.Subject{
    Type: "user",
    Id:   "User_1",
  },
},

The _1 suffix is based on a shared index, but we are protecting it with a mutex, and I have checked that each request correctly gets a different number.

These errors only occur on writes, though, and I'm mostly worried about Check performance. In our load test the writes are segregated from the reads, and we only start reading when every write has completed. If we modify our code to not clear the data on startup and skip the writes, so we test only the reads with preexisting data, the performance does improve a bit but still has a mean of 800ms.

celest walrus
#

Hello @untold gull ,
Can you check whether all pods are under load while you are testing? There may be a problem with ClusterIP.

Even with 0 cache hits, the maximum response time I see for your schema is 200ms (p90). We can prepare a deployment document to help identify any unseen misconfigurations, or we can set up a cloud environment for POC purposes.

celest walrus
#

Hello @untold gull ,

We have identified the root cause of the write errors you encountered.

Could you test using version v1.3.0 and share the results with us? If possible, using the k6 tool would help us match our results.

I’m attaching the results from our test along with the script I prepared for your use case.

Thank you for your collaboration. I look forward to your response.