#Sync Issue under E2E Test Load

wispy crown
I'm setting up some E2E tests for my application, and I sometimes get into a state where my electric collection just refuses to sync a shape stream. Optimistic updates get reverted after a few seconds. It's not exactly deterministic, but on the "happy path" where I run everything in a fresh docker compose, with minimal parallelism and no repeated tests, it happens 5-10% of the time. If I up the parallelism and repeats, the failure rate shoots up.

Is this something I should be expecting if I am creating and deleting many db objects and shape streams in a very short space of time?

cedar tulip
Definitely shouldn't be happening

What do you mean by refusing to sync a shape stream? Like you make a write to pg and then the new update doesn't come back?

We shipped a bunch of bug fixes to electric's client and DB last week so worth upgrading to see if that fixes things

wispy crown
yeah exactly. basically the write goes through (I can see it in pg) and the optimistic update fires, but then the UI reverts after a few seconds.

I'm working on trying to make a minimal reproducer. I had a lot of relatively complicated shapes with subqueries that I have simplified a lot into one big shape stream, by denormalizing, and it has helped but it's not quite fixed yet

cedar tulip
did you try upgrading everything

wispy crown
yeah I'm on the latest of it all as of a few days ago. the odd thing is it's super repeatable (in my codebase, don't have a separate MVP yet) just by bumping up the playwright number of repeated tests and parallelism. like it just completely flakes out. but I'm having a hard time deciding if it's just lag somehow from docker or smth else

wispy crown
I've mostly resolved it. Basically I was ddosing myself bc I had a shared account btwn workers in playwright, and they all subscribed to one global collection which caused a 409 loop that it could never get ahead of, even though they were all working on different subsets of that collection. Switching to isolated workers has helped a lot
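
A minimal sketch of the isolated-accounts idea, assuming each Playwright worker can be handed its own credentials. `accountForWorker`, `TestAccount`, and the email format are hypothetical names for illustration, not part of the app or of Playwright. In a real test the index could come from `testInfo.workerIndex`.

```typescript
// Hypothetical shape of a test account; the real app's fields may differ.
interface TestAccount {
  email: string;
  password: string;
}

// Derive a distinct account per worker so that no two workers
// subscribe to the same user's collection.
function accountForWorker(workerIndex: number, runId: string): TestAccount {
  return {
    email: `e2e-${runId}-worker${workerIndex}@example.test`,
    password: `pw-${runId}-${workerIndex}`,
  };
}

// Usage: two workers in the same run get different accounts.
const first = accountForWorker(0, "run42");
const second = accountForWorker(1, "run42");
console.log(first.email, second.email);
```

Including a per-run id in the account name also avoids collisions with data left over from repeated runs against the same database.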

I'm still seeing it every now and then though. The test makes a new object, edits it, and then the edit disappears. It seems like it might happen if a 409 appears during the optimistic update? What is the expected behavior there, i.e. when the collection is still only holding a temp-id object from an optimistic update that hasn't been acked by the db yet (so no txid) and a 409 arrives?

cedar tulip
If a 409 appears then we just refetch everything which means no txids coming in. So what happens is the awaitTxid times out and the optimistic update drops and then the new server state shows
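
To make that sequence concrete, here is a small self-contained simulation. It is a sketch only and not the library's implementation: `TxidTracker`, `markSeen`, and the timeout values are hypothetical stand-ins for whatever the client does internally.

```typescript
// Hypothetical tracker: remembers txids seen on the stream and lets
// callers wait for a specific one, with a timeout.
class TxidTracker {
  private seen = new Set<number>();
  private waiters = new Map<number, Array<() => void>>();

  // Called when a txid arrives on the stream.
  markSeen(txid: number): void {
    this.seen.add(txid);
    const list = this.waiters.get(txid) ?? [];
    this.waiters.delete(txid);
    for (const wake of list) wake();
  }

  // Resolves once the txid has been seen, rejects after timeoutMs.
  awaitTxid(txid: number, timeoutMs: number): Promise<void> {
    if (this.seen.has(txid)) return Promise.resolve();
    return new Promise<void>((resolve, reject) => {
      const timer = setTimeout(() => {
        reject(new Error(`timed out waiting for txid ${txid}`));
      }, timeoutMs);
      const list = this.waiters.get(txid) ?? [];
      list.push(() => {
        clearTimeout(timer);
        resolve();
      });
      this.waiters.set(txid, list);
    });
  }
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const tracker = new TxidTracker();

  // Normal case: the stream delivers the txid, so the wait resolves.
  setTimeout(() => tracker.markSeen(1), 10);
  await tracker.awaitTxid(1, 200);
  log.push("txid 1 acked");

  // Refetch case: a full refetch carries no txids, so txid 2 never
  // shows up and the wait times out.
  try {
    await tracker.awaitTxid(2, 50);
    log.push("txid 2 acked");
  } catch {
    log.push("txid 2 timed out, optimistic state dropped");
  }
  return log;
}

demo().then((log) => console.log(log.join("\n")));
```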

So the only way to fix this that comes to mind is mutationFns could subscribe to when there's refetches happening and resolve when those are done (assuming the API to mutate the DB has returned successfully)
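
A rough sketch of what that could look like, assuming the client exposed some "refetch finished" notification. `RefetchSignal`, `settleMutation`, and both callbacks are hypothetical. Nothing here is an existing Electric or TanStack DB API, and it assumes a refetch that completes after the write contains that write.

```typescript
// Hypothetical notifier for "a full refetch has completed".
class RefetchSignal {
  private waiters: Array<() => void> = [];

  // Resolves the next time a refetch completes.
  nextRefetchDone(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Called by the (hypothetical) sync layer when a refetch finishes.
  notifyRefetchDone(): void {
    const pending = this.waiters;
    this.waiters = [];
    for (const wake of pending) wake();
  }
}

type SettledBy = "txid" | "refetch";

// Write through the API first; then treat the mutation as settled when
// either its txid arrives or a refetch completes, whichever is first.
async function settleMutation(
  writeToApi: () => Promise<number>,
  waitForTxid: (txid: number) => Promise<void>,
  signal: RefetchSignal,
): Promise<SettledBy> {
  const txid = await writeToApi(); // throws if the write itself failed
  return Promise.race([
    waitForTxid(txid).then((): SettledBy => "txid"),
    signal.nextRefetchDone().then((): SettledBy => "refetch"),
  ]);
}

// Usage: the txid never arrives, but a refetch finishes after 10 ms.
const signal = new RefetchSignal();
setTimeout(() => signal.notifyRefetchDone(), 10);
settleMutation(
  async () => 7,
  () => new Promise<void>(() => {}),
  signal,
).then((by) => console.log(`settled by ${by}`));
```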

But this is an edge case, refetches aren't that common in the wild and the timeouts solve it so dunno if it's worth thinking about too much? Thoughts?

wispy crown
sorry I forgot to get back to you here. yeah I think it would be nice to have a mechanism for queuing updates more generally, maybe as part of the feature to resolve temp ids to real server ids?

I think the 409s were a bit of a red herring for my initial issue: I have mostly solved my E2E test flakiness by a) not reusing the same user accounts for different tests, bc the constant collection updates for objects unrelated to the current test led to issues, and b) enforcing a wait for the server ID before I try to make edits to newly created objects.
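
For b), a minimal polling sketch, assuming the test has some way to look up the server ID for a newly created object. `waitForValue`, the `serverIds` map, and the `"temp-1"` key are hypothetical names for illustration. In a Playwright test, `expect.poll` could play a similar role.

```typescript
// Poll `read` until it returns something other than undefined,
// or throw once timeoutMs has elapsed.
async function waitForValue<T>(
  read: () => T | undefined,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = read();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`no value after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => {
      setTimeout(resolve, intervalMs);
    });
  }
}

// Usage: a hypothetical temp-id to server-id map that the sync layer
// fills in once the insert has been acknowledged.
const serverIds = new Map<string, string>();
setTimeout(() => serverIds.set("temp-1", "srv-123"), 20);

waitForValue(() => serverIds.get("temp-1"), 1000, 5).then((id) => {
  // Only now is it safe for the test to edit the object by its real id.
  console.log(`server id is ${id}`);
});
```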

I think b) in particular would be nice to have electric/tanstack db handle, but I believe that has already been discussed/is WIP

cedar tulip
wispy crown
oh awesome! but this is not yet handling temp ids -> server ids right?

cedar tulip