# Scheduling, cancellation, and streaming AI responses
trying again:
I'm working on an app that works similarly to AI Dungeon. I want it to support streaming responses in chunk-by-chunk, and I want to support cancellation.
Here's a simplified version of how I'm doing it now. This is a lot of code. And I'd have to repeat a lot of this for whatever other parts of the app I want to support generation and cancellation. And again, this is simplified. This is missing auth checks, status guards, regeneration, and other things. I want to try to simplify this.
Some alternatives I considered:
- Have the entrypoint be an action instead of a mutation, and `await` it on the frontend. This would remove the need for a `status` variable, but only in the context of the client awaiting that function. Since generations can take upwards of minutes, I want the pending state to be independent of browser refreshes and other things. Most of the boilerplate still remains with this approach anyway.
- Scheduled action cancellation. From my understanding of the docs, cancelling a scheduled function doesn't stop an action that is already running; the stream would continue running and updating the DB. They read "cancelled actions won't run any other scheduled actions", but they don't mention that for mutations/queries run during actions. Plus, unless the action itself can get its own status (I couldn't find a way to do that), I still need some way for the action to know to abort the completion stream.
- Workflows. To allow cancellation, I would want to have several `.step()` calls per AI response chunk, but I read that workflows have to be deterministic, so to my understanding, that wouldn't work. It also doesn't look like they support `fetch()` yet anyway (?).
- Generalize this into a generic "completion" concept, where other models (like the chapter in this example) would point to a `completions` table where their related content is stored and generated. All generatable things would go through this completions interface. I can't think of a way to make this abstraction that doesn't compromise on the flexibility I need. The type of response and the manner in which it's processed differ between parts of the app in a way that can't be parameterized (easily, if at all).
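To make the cancellation requirement concrete, here is a minimal sketch of the cooperative pattern I keep rewriting, with the database swapped out for an in-memory stand-in. `Store`, `MemoryStore`, `runGeneration`, and `fakeStream` are hypothetical names for illustration only, not Convex APIs. In the real app I'd expect `appendIfPending` and `finishIfPending` to each be a mutation called from the action, and `fakeStream` to be the actual model request.

```typescript
type Status = "pending" | "done" | "cancelled";

interface Generation {
  status: Status;
  text: string;
}

// Hypothetical storage interface. Each method stands in for one transactional
// write that checks the status and updates the row in the same step.
interface Store {
  appendIfPending(id: string, chunk: string): Promise<boolean>;
  finishIfPending(id: string): Promise<boolean>;
}

// In-memory stand-in for the database (assumption: not how the real app stores data).
class MemoryStore implements Store {
  rows = new Map<string, Generation>();

  create(id: string): void {
    this.rows.set(id, { status: "pending", text: "" });
  }

  // What a "cancel" button would trigger: flip the flag, nothing else.
  cancel(id: string): void {
    const row = this.rows.get(id);
    if (row && row.status === "pending") row.status = "cancelled";
  }

  async appendIfPending(id: string, chunk: string): Promise<boolean> {
    const row = this.rows.get(id);
    if (!row || row.status !== "pending") return false;
    row.text += chunk;
    return true;
  }

  async finishIfPending(id: string): Promise<boolean> {
    const row = this.rows.get(id);
    if (!row || row.status !== "pending") return false;
    row.status = "done";
    return true;
  }
}

// Consume a chunk stream and persist each chunk. The loop stops as soon as a
// write reports that the row is no longer pending, and aborts the upstream request.
async function runGeneration(
  store: Store,
  id: string,
  makeStream: (signal: AbortSignal) => AsyncIterable<string>,
): Promise<"done" | "cancelled"> {
  const controller = new AbortController();
  for await (const chunk of makeStream(controller.signal)) {
    const stillPending = await store.appendIfPending(id, chunk);
    if (!stillPending) {
      controller.abort(); // signal the upstream request to stop
      return "cancelled";
    }
  }
  return (await store.finishIfPending(id)) ? "done" : "cancelled";
}

// Hypothetical chunk source standing in for a streaming model response.
async function* fakeStream(
  chunks: string[],
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (const chunk of chunks) {
    if (signal.aborted) return;
    yield chunk;
  }
}
```

The part that differs between features is only what happens per chunk and at the end, which is why I keep hoping those two writes are the only things I'd need to swap out. The status check still costs one database round trip per chunk, which is the overhead I'd like to avoid if there's a built-in way for the action to learn it was cancelled.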
So my question is whether I can simplify this, generalize this, or if there's something I missed / misunderstood in reading the docs. Thanks in advance!