#Request for feedback on a caching solution

fiery basalt
#

Hi, at sennder we are thinking about overhauling our first, very simple way of using snapshot tokens in order to improve system performance.

In this thread I will describe the proposed improvement and would kindly ask for your feedback.

First let's create a simple pseudo schema.

entity user {}  # no relations nor attributes on user 

entity organization {
    relation role_1 @user
    relation role_2 @user
    relation role_3 @user
    permission role_based_permission_1 = role_1
    permission role_based_permission_2 = role_2 or role_3
}
entity chartering_office {
    relation role_1 @user
    relation role_2 @user
}
entity carrier {
    relation chartering_office @chartering_office
    relation admin @user
    attribute att_1 boolean
    action traversal_permission_1 = chartering_office.role_1 or chartering_office.role_2
    action attribute_permission_1 = admin not att_1
}

Our real schema is much larger, but these are the permission types we use.

  1. Writing permission data

In our system all writes go through a single set of Kafka consumers. They consume various topics and, based on the consumed data, write permission tuples and attributes to Permify. A single consumed message may result in more than one tuple/attribute write in Permify.

Once the data is written to Permify, we have a bunch of snap_tokens. We store them all in Redis, one token per affected entity, under the following keys:

  • permify:snap:tenant:{entity_type_1}:{entity_id_1} = {token_1}
  • permify:snap:tenant:{entity_type_2}:{entity_id_2} = {token_2}
  • permify:snap:tenant:{entity_type_3}:{entity_id_3} = {token_3}

etc.

There is no TTL.
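
To make the key layout concrete, here is a minimal Python sketch. A plain dict stands in for Redis, and the names `snap_key` and `record_write` are made up for illustration. It also assumes that the most recent token simply overwrites the previous one for the same entity, which the description above does not spell out.

```python
from typing import Iterable, MutableMapping

KEY_PREFIX = "permify:snap:tenant"  # prefix taken from the key layout above


def snap_key(entity_type: str, entity_id: str) -> str:
    """Build the key under which an entity's latest snap token is stored."""
    return f"{KEY_PREFIX}:{entity_type}:{entity_id}"


def record_write(
    store: MutableMapping[str, str],
    affected: Iterable[tuple[str, str]],
    snap_token: str,
) -> None:
    """Remember the token returned by one write for every affected entity."""
    for entity_type, entity_id in affected:
        # Assumption: last write wins, and no TTL is set on the key.
        store[snap_key(entity_type, entity_id)] = snap_token


store: dict[str, str] = {}  # stand-in for Redis in this sketch
record_write(store, [("organization", "1"), ("carrier", "42")], "token_1")
record_write(store, [("carrier", "42")], "token_2")
```

With a real Redis client the dict assignment would become a plain SET without an expiry.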

[continued in comments due to message length limit]

#
  2. Checking permissions

Can the user U do the action A on the entity E?

The first step is to check whether the action A on entity E is a traversal permission. Traversal means it depends not only on the state of entity E but also on one or more other entities higher in the hierarchy, like "traversal_permission_1" in my example schema.

To determine whether a permission is traversal, we have a simple schema parser that implements this check. Basically, if the permission definition contains a dot ".", it is traversal.
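
A rough Python sketch of such a dot-based check could look like the following. It is not our actual parser: the function names are hypothetical, and it assumes one `permission` or `action` definition per line with no trailing comments.

```python
import re

# Matches lines such as "action name = expr" or "permission name = expr".
PERMISSION_RE = re.compile(r"^\s*(?:permission|action)\s+(\w+)\s*=\s*(.+)$")
ENTITY_RE = re.compile(r"^\s*entity\s+(\w+)")


def classify_by_dot(schema: str) -> dict[tuple[str, str], bool]:
    """Map (entity, permission) to True when the definition contains a dot."""
    result: dict[tuple[str, str], bool] = {}
    entity = None
    for line in schema.splitlines():
        header = ENTITY_RE.match(line)
        if header:
            entity = header.group(1)
            continue
        definition = PERMISSION_RE.match(line)
        if definition and entity is not None:
            name, expression = definition.groups()
            # Purely syntactic: a dot means the expression walks to another entity.
            result[(entity, name)] = "." in expression
    return result


SCHEMA = """
entity carrier {
    relation chartering_office @chartering_office
    relation admin @user
    action traversal_permission_1 = chartering_office.role_1 or chartering_office.role_2
    action attribute_permission_1 = admin not att_1
}
"""

classification = classify_by_dot(SCHEMA)
```

Here `classification` marks `traversal_permission_1` as traversal and `attribute_permission_1` as local.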

If the permission is traversal, we just query Permify without attaching a snap_token at all.

If the permission is NOT traversal, we fetch the snap_token for that entity from Redis and attach it to the permission check query we send to Permify.
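
The two branches above can be sketched in Python as below. The request body shape (`metadata`, `entity`, `permission`, `subject`) and the `depth` value follow my understanding of Permify's check request and should be verified against the API reference. `build_check_body` is a hypothetical helper, a dict stands in for Redis, and falling back to a token-less check when no token is stored is an assumption.

```python
from typing import Any, Mapping


def build_check_body(
    entity_type: str,
    entity_id: str,
    permission: str,
    user_id: str,
    *,
    is_traversal: bool,
    tokens: Mapping[str, str],
) -> dict[str, Any]:
    """Build a check request, attaching a snap_token only for local permissions."""
    metadata: dict[str, Any] = {"depth": 20}  # assumed depth value
    if not is_traversal:
        token = tokens.get(f"permify:snap:tenant:{entity_type}:{entity_id}")
        if token is not None:  # assumption: no stored token means no token sent
            metadata["snap_token"] = token
    return {
        "metadata": metadata,
        "entity": {"type": entity_type, "id": entity_id},
        "permission": permission,
        "subject": {"type": "user", "id": user_id},
    }


tokens = {"permify:snap:tenant:organization:1": "token_1"}  # stand-in for Redis
local_body = build_check_body(
    "organization", "1", "role_based_permission_1", "u1",
    is_traversal=False, tokens=tokens,
)
traversal_body = build_check_body(
    "carrier", "42", "traversal_permission_1", "u1",
    is_traversal=True, tokens=tokens,
)
```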

  3. Other concerns

Schema update. Our schema gets updated a few times a day at most, usually a few times a week. I was thinking of simply invalidating the entire Redis cache each time the schema is updated. Can you think of a more efficient implementation that wouldn't be very complex?

Any and all feedback is very welcome. Are there any flaws in the proposal? BTW there is nothing really domain specific in this solution, so I think something like this could be available as a plugin for Permify. Just configure a Redis instance and let it rip 🙂

Thanks a lot in advance for any help!

south smelt
#

Hi @fiery basalt, this looks like a good idea. I have a few questions:

1- How are you currently determining local vs traversal exactly? Is that based on the full permission dependency graph, or a simpler syntactic parse of the schema?
2- On schema updates, do you recompute that classification per schema_version? A permission can change from local to traversal across schema versions.

If you always send schema_version together with the check, and keep the locality classification aligned with that same version, I don’t think you need to flush all Redis snap-token entries on every schema change. Schema changes should be handled by schema_version and by recomputing the locality classification.
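
One way to picture "classification aligned with schema_version" is the Python sketch below. `LocalityIndex` and `check_metadata` are hypothetical names, and the metadata field names reflect my understanding of the check request, so please verify them against the API reference.

```python
from typing import Any, Optional


class LocalityIndex:
    """Hypothetical lookup of local (non-traversal) permissions per schema version."""

    def __init__(self) -> None:
        self._local: dict[str, set[tuple[str, str]]] = {}

    def register(self, schema_version: str, local: set[tuple[str, str]]) -> None:
        """Store the set of (entity_type, permission) pairs that are local."""
        self._local[schema_version] = set(local)

    def is_local(self, schema_version: str, entity_type: str, permission: str) -> bool:
        # Unknown versions are treated as traversal, so no token is attached.
        return (entity_type, permission) in self._local.get(schema_version, set())


def check_metadata(
    index: LocalityIndex,
    schema_version: str,
    entity_type: str,
    permission: str,
    snap_token: Optional[str],
) -> dict[str, Any]:
    """Pin the check to one schema version and attach a token only if local there."""
    metadata: dict[str, Any] = {"schema_version": schema_version, "depth": 20}
    if snap_token and index.is_local(schema_version, entity_type, permission):
        metadata["snap_token"] = snap_token
    return metadata


index = LocalityIndex()
index.register("v1", {("organization", "role_based_permission_1")})
index.register("v2", set())  # same permission became traversal in v2
meta_v1 = check_metadata(index, "v1", "organization", "role_based_permission_1", "tok")
meta_v2 = check_metadata(index, "v2", "organization", "role_based_permission_1", "tok")
```

The same stored token is used under v1 and skipped under v2, without touching the Redis entries.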

The follow-up concern would then be Redis growth, since one token per entity with no TTL can bloat Redis over time.

fiery basalt
#

Hi @south smelt, thanks for your answer!
Now to your questions:

  1. Right now we are not really determining local vs traversal at all, but I was thinking of basing it on syntax: simply the presence of a dot "." in the permission definition.
  2. The classification would happen just once, on cold start of our lambda authenticator. The Permify schema is baked into the lambda deployment package, so it wouldn't really get out of date.

Right now our architecture has no special treatment for schema versions; the schema version is never attached to queries.
Do you see some way of improving this situation?

As for Redis bloat in a possible solution with an explicit schema version attached: not sure yet whether that would be a problem. We are currently rolling out a simple solution to check how many different entities we really have in our working set etc.

south smelt
#
  1. I’d be a bit careful with using only "." for local vs traversal. It catches obvious cases like chartering_office.role_1, but it can miss indirect traversal too. For example, can_view = office_access looks local syntactically, but if office_access = chartering_office.role_1, then can_view is traversal as well. Same thing for subject relations like group#member.
  2. On the schema_version part: if you redeploy the lambda together with every schema change, I don’t really see a mismatch risk for the local/traversal classification.
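
To illustrate point 1, here is a small Python sketch that follows references transitively instead of only looking for a dot. It is not how Permify itself classifies anything. The `classify` function and its input shape (one entity's permissions and relations as dicts) are assumptions made for the example, and it ignores rule calls and other expression forms.

```python
import re

# An identifier, optionally followed by one dotted segment (e.g. office.role_1).
REF_RE = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?")
KEYWORDS = {"and", "or", "not"}


def classify(permissions: dict[str, str], relations: dict[str, str]) -> dict[str, bool]:
    """Return {permission_name: is_traversal} for a single entity.

    permissions maps a permission name to its expression;
    relations maps a relation name to its target, e.g. "@user" or "@group#member".
    """
    cache: dict[str, bool] = {}

    def visit(name: str, stack: frozenset) -> bool:
        if name in cache:
            return cache[name]
        if name in stack:  # defensive: treat a cyclic definition as traversal
            return True
        result = False
        for ref in REF_RE.findall(permissions[name]):
            if ref in KEYWORDS:
                continue
            if "." in ref:  # direct walk to another entity
                result = True
            elif ref in permissions:  # indirect: follow the referenced permission
                result = visit(ref, stack | {name})
            elif "#" in relations.get(ref, ""):  # subject relation like group#member
                result = True
            if result:
                break
        cache[name] = result
        return result

    return {name: visit(name, frozenset()) for name in permissions}


result = classify(
    permissions={
        "office_access": "chartering_office.role_1",
        "can_view": "office_access",
        "can_admin": "admin",
        "can_join": "member",
    },
    relations={
        "chartering_office": "@chartering_office",
        "admin": "@user",
        "member": "@group#member",
    },
)
```

In this example `can_view` and `can_join` come out as traversal even though neither definition contains a dot, while `can_admin` stays local.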

If you don’t send schema_version, Permify will use the head version, so old cache entries won’t be reused after a schema change. My main point was just that because of this, I don’t think you need to flush Redis, as long as the schema in the lambda stays in sync with the schema Permify is evaluating.