#Random errors & logging


analog vessel
#

Hello!
I’ve started encountering random errors when sending messages to my self-hosted server.
They’re either 502 or 400 errors, and it seems to happen with every second message I send.

At the same time, I don’t see any errors in the cloud. To track the issue, I’ve added publicApiKey on the frontend and set the COPILOT_CLOUD_PUBLIC_API_KEY environment variable on the server.

I’m using the claude-sonnet-4-20250514 model.

When I run kubectl logs copilot-kit-server I don’t see any errors.

What could be causing this? I’d really appreciate your help.

analog vessel
#

An interesting detail: when I run this server locally, there are no network errors 🤔

trail forumBOT
#

Could you share the error message (log or screenshot)? Also, what version of CopilotKit are you using?
In the meantime, try switching to a different model to see if the issue persists.

analog vessel
#

UI:
"@copilotkit/react-core": "^1.10.4",
"@copilotkit/react-ui": "^1.10.4",
Server:
"@copilotkit/runtime": "^1.10.4",

For the 502 I get the typical Cloudflare error page; for the 400 there is no message.

Bad gateway Error code 502
Visit cloudflare.com for more information.
2025-10-02 06:55:37 UTC
Thanks!

trail forumBOT
#

Thanks for the version details!
So you're on 1.10.4 - could you try bumping up to the latest version (use next) and see if that makes a difference?
Also, did you get a chance to test with a different model?
Quick question about those 502 errors - are you actually using Cloudflare as your CDN/proxy? If yes, it might be worth checking if Cloudflare's timeout settings are too short for Claude Sonnet 4.
One thing that's really interesting is the "every second message" pattern you mentioned - does it literally alternate like that (message 1 works, 2 fails, 3 works, 4 fails)? Or is it more just intermittent?
And when you run it locally without issues, you're bypassing Cloudflare completely, right?
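To tell strict alternation apart from plain intermittency, it can help to note the HTTP status of a run of consecutive messages (for example from the browser's network tab) and classify the sequence. Below is a minimal sketch; `classifyPattern` and the sample status lists are hypothetical helpers for illustration, not part of CopilotKit.

```typescript
// Hypothetical helper: classify a recorded sequence of HTTP status codes.
type Pattern = "all-ok" | "alternating" | "intermittent";

function classifyPattern(statuses: number[]): Pattern {
  // Treat any 4xx/5xx status as a failure.
  const failed = statuses.map((s) => s >= 400);
  if (!failed.some(Boolean)) return "all-ok";

  // Strict alternation: every request's outcome differs from the previous one.
  const alternating =
    failed.length > 1 && failed.every((f, i) => i === 0 || f !== failed[i - 1]);
  return alternating ? "alternating" : "intermittent";
}

// Sample sequences (made up for illustration).
console.log(classifyPattern([200, 502, 200, 502])); // alternating
console.log(classifyPattern([200, 200, 502, 200])); // intermittent
console.log(classifyPattern([200, 200, 200]));      // all-ok
```

A strictly alternating result would hint at something stateful (for example a connection being reused and then dropped), while an intermittent one fits load or capacity issues better.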

analog vessel
#

Some insights during debugging:

  • locally (bypassing Cloudflare) - works well
  • test env - starts failing after a few messages (1 works, 2 fails, 3 works, 4 fails)
  • prod env - much more stable, failures happen only from time to time

Now I'm thinking it's a resource issue.
My theory is that it happens because we have only 1 replica (32 max-connections) in test and 3 replicas (32*3 = 96) in prod.
AI streaming responses hold connections open much longer than normal API calls (about 30 seconds vs 100 ms), so they use connection slots much more heavily, and requests start failing when there are no free "slots".

does it make sense? 🙂
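
A rough back-of-the-envelope check of this theory, using Little's law (connections in use ≈ request rate × how long each connection is held). The 32 slots per replica, 30 s and 100 ms come from the numbers above; the function names and the 2 requests/second load are assumptions for illustration only.

```typescript
// Sketch only: estimates connection-slot capacity, not a measurement.

// Max steady request rate (req/s) before all slots are busy.
function maxSustainableRate(replicas: number, slotsPerReplica: number, holdMs: number): number {
  return (replicas * slotsPerReplica * 1000) / holdMs;
}

// Average number of connections held open at a given request rate.
function connectionsInUse(requestsPerSecond: number, holdMs: number): number {
  return (requestsPerSecond * holdMs) / 1000;
}

const SLOTS = 32;          // max-connections per replica (from the post)
const STREAM_MS = 30_000;  // streaming AI response, ~30 s
const API_MS = 100;        // normal API call, ~100 ms

console.log(maxSustainableRate(1, SLOTS, API_MS));                // 320 req/s
console.log(maxSustainableRate(1, SLOTS, STREAM_MS).toFixed(2));  // "1.07" req/s (test, 1 replica)
console.log(maxSustainableRate(3, SLOTS, STREAM_MS).toFixed(2));  // "3.20" req/s (prod, 3 replicas)

// Assumed load of 2 streaming requests per second:
console.log(connectionsInUse(2, STREAM_MS)); // 60 (more than 32, fewer than 96)
```

Under these assumptions a single replica runs out of slots at roughly one streaming request per second, while three replicas tolerate about three times that, which would match test failing often and prod only occasionally. It does not prove that connection limits are the cause; checking the proxy or ingress logs for rejected connections would confirm it.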