#function execution timeout
It would help if you could share the code or logic of the function.
@iron saffron additionally, can you share which runtime you're using and if there are any differences in input data or any other surrounding details?
Check that you properly handle all possible error flows. If you don't call return res.[RESPONSE_TYPE]() on every path, the execution will time out.
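A minimal sketch of what "handle all error flows" can look like. The names handler and doWork are hypothetical, and res.send(body, statusCode) is assumed to accept a status code as its second argument; in a real function the handler would be the default export.

```javascript
// Hypothetical business logic, only here so the sketch is self-contained.
async function doWork(req) {
  if (req.path === '/boom') throw new Error('boom');
  return { ok: true };
}

// Every branch, including the catch block, ends in `return res.send(...)`.
async function handler({ req, res, log, error }) {
  try {
    if (req.method !== 'GET') {
      // Unsupported methods still get an explicit response.
      return res.send('Method not allowed', 405);
    }
    const result = await doWork(req);
    return res.send(JSON.stringify(result), 200);
  } catch (err) {
    // Report the failure and still answer the request.
    error(String(err));
    return res.send('Internal error', 500);
  }
}
```

With this shape a thrown error produces a 500 response instead of leaving the execution hanging until the platform timeout.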
I use nodejs 18
I can give you access to the repository if you allow, I can't reveal the whole code.
the body is wrapped in a try catch
If you can help, that would be great. We are building a startup with very good prospects. We initially chose Appwrite and wrote the functions. We now have a choice between rewriting the entire project for another backend, or using Firebase or something else.
A timeout error means that the code didn't finish for a long time. In JS this could mean some await that never finishes; I often see this with Promise.all.
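One way to keep an await from hanging forever is to race it against a timer. This is only a sketch; withTimeout is a hypothetical helper, not part of any SDK.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleaned up.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage could look like `await withTimeout(Promise.all(tasks), 5000, 'Promise.all')`, so a stuck task surfaces as an error you can catch and answer with a response.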
In context you get req and res, but also log.
I would recommend using the log() function in a few places in your code. When a function times out, you still see the logs that occurred before. Thanks to that, you can find the spot in your code that is causing the freeze.
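A small sketch of that idea: wrap each awaited step so it logs when it starts and when it ends. The helper name step is hypothetical; log is the function from the function context.

```javascript
// Logs "start" before and "end" after an async step.
// If the function freezes, the last "start" without a matching "end" marks the spot.
async function step(log, name, fn) {
  log(`start: ${name}`);
  const startedAt = Date.now();
  try {
    return await fn();
  } finally {
    log(`end: ${name} (${Date.now() - startedAt} ms)`);
  }
}
```

For example, `const branch = await step(log, 'getBranch', () => service.getBranch(id))`, where service is whatever object holds your database calls.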
You could also write a test file to execute function with some sample data and run locally with proper debugger, if that's what you would prefer
When we run it locally, everything works stably for us. There are no errors; we only get the error in production. Out of 5 requests, 1-2 will definitely fail. Sometimes it works stably.
When we get the timeout, no logs remain. I already tried this.
Thats weird 🤔
Can you do me a small favour please and add log right at the beginning of the function, and cause this timeout? Then check logs tab if they are empty or not
added at the very beginning of the function, but the logs and counters are not shown.
logs empty
error
Operation timed out after 30000 milliseconds with 0 bytes received with status code 0\nError Code: 0
Hmm 🤔 That's either a bug, the code is doing a CPU-heavy operation for over 1 second, or the timeout is outside of your code.
If thats fine by you, could you please give me access to the repository? I want to check for some possible problems with the use of await in combination with arrays, or possibly problematic libraries.
GitHub: https://github.com/meldiron
I don't think it's a bug, based on this test:
- https://github.com/open-runtimes/open-runtimes/blob/main/tests/Base.php#L392-L397
- https://github.com/open-runtimes/open-runtimes/blob/main/tests/resources/functions/node-18.0/tests.js#L85-L93
It ensures that logs caused before timeout are there, but logs after are not.
So we only have 2 more options ✨ Slowly getting there 😅
yes, I also tried to catch the error through the logs, but I couldn't because it didn't show anything.
it seems to me that the request is stuck somewhere.
Have you received an invite?
Yes, thanks. I accepted and checked out the code.
I cant spot any problem. Only thing that comes to my mind is something getting stuck in attempt to communicate to Appwrite database or your PosterService API
But that doesnt explain why logs are not present from before the timeout
I'm pretty sure if there's a timeout, there won't be logs. Especially for this 30s timeout from sync execution
Gooooood point! Sync executions have hard-stop on 30 seconds and executor is not aware of it
Can you please go to function settings and set timeout to let's say 15 seconds? Looking at your execution screenshot, usually it takes below 1 second so it shouldnt affect anything.
Sometimes it fails with a timeout even though the data received from the PosterService has already been saved to the database.
It looks almost identical to behaviour I often see on self-hosted machines with low CPU cores count 🤔 But it's on Cloud and we never got that reported there.
it used to be 15 seconds, then I changed it to 30 seconds, I thought it would solve my problem.
Let's set it back to 15 and let's again try to add some logs, one immediately after the start of the function.
If Steven's concern is right, we should see logs now that timeout is 15 seconds
after some recommendations, I even migrated my self-hosting to the appwrite cloud, but in some cases I get a timeout there.
If we manage to isolate problem, Ill setup a function to reproduce it and fix it quickly
There is a change.
error logs: Execution timed out.
.
Amazing, yeah, that is expected. Cool, I'm making myself an issue to ensure sync executions time out properly.
Now you can try to add some verbose logs to find out in which part the code freezes.
I can understand this will be annoying process and we are working on improving this experience in next version of functions.
moment
Ill be stepping away for a while, Ill be back in ~1 hour
We must have found the reason.
There may be a problem when calling getBranch:
async getBranch(id: string): Promise<Branch> {
  try {
    return await this.databases.getDocument(this.databaseId, this.branchCollectionId, id);
  } catch {
    // Any failure (not only a missing document) is reported as "not found".
    throw new Error('Branch not found');
  }
}
but it seems to me that this function is not very heavy.
Is this Cloud or self-hosted?
^ Both, seems like. Currently on Cloud
Can you please try to cause the timeout multiple times, and analyse each log? I would like to ensure its this exact function every time
my branch model
Relationships 🤔🤔🤔
Ill ping @ebon valley here, seems like we are getting function timeout on databases.getDocument with this schema ^
@ebon valley can you see possible reason why this could cause a freeze?
(Jake is in a different timezone, he will likely only answer tomorrow. Or Monday)
new logs
I suspect the problem is in the relationship. what do you think?
if we find a problem, it will help a lot. I've seen a lot of requests in the forum.))
we used to use Bun, because of the timeout we switched to nodejs 18, but even then the problem did not disappear.
receipt model
It makes the most sense to me too, yes.
But what bugs my mind is why it doesnt happen locally
You already provided a lot of useful information and I am very grateful 🙌
We will do our best to identify the problem and get it fixed.
With that said, if you would be willing to try to reproduce it on a new project with simpler collections & relations setup, it would be highly appreciated. With a simple flow we could quickly reproduce it, fix it, and even introduce a test for that specific scenario to ensure it never happens again.
Let me test the local one right now.
Are you able to share the full schema of your relationships? So far I see you have:
branches <-> organization
branches <-> receipts
branches -> pos_config
branches -> address
receipts <-> transactions
receipts -> pos_system
Yesterday I created a demo project with exact copies. I can invite you there if you give me your email.
@ebon valley
Cool thanks, [email protected]
I would be very glad if you would help
Could you please add me to the repo with the function code? https://github.com/abnegate
Or just share the full code over DM if it's easier
i send invite
@ebon valley I can invite you to join the appwrite which is self-hosting installed.
there is more error observed there
pls send me your email
you can test this endpoint
https://receipt.core.pai.kg/receipt?branchId=6592a5aacea01097e372&tablet_id=1
sometimes the timeout occurs after several requests, sometimes after a lot of requests.
I assume that the branch model is cached and the request responds quickly, but at some point getDocument(branch) freezes.
can relationship queries turn into recursion?
Two-way relationship could it be a source of recursion?
Yeah, internally the relationships are fetched recursively, but there's a max of 3 levels of recursion, which is why on the third level you get null for any further relationships.
We also detect cyclic relationships across collections and short-circuit them so they're only fetched once to avoid an infinite loop.
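To make the two rules above concrete (a depth cap of 3 and a short-circuit for cycles), here is an illustrative in-memory sketch. It is not Appwrite's actual implementation; resolve, db and relations are made-up names, and leaving the raw ID in place on a cycle is an assumption of this sketch.

```javascript
const MAX_DEPTH = 3; // the limit mentioned in the thread

// db: { collection: { id: document } }
// relations: { collection: { attribute: targetCollection } }
function resolve(db, relations, collection, id, depth = 1, visited = new Set()) {
  const doc = { ...db[collection][id] };
  const path = new Set(visited).add(collection); // collections already on this path
  for (const [attr, target] of Object.entries(relations[collection] ?? {})) {
    if (doc[attr] == null) continue;
    if (depth >= MAX_DEPTH) {
      doc[attr] = null; // depth cap: no further relationships on the third level
      continue;
    }
    if (path.has(target)) continue; // cycle: keep the raw ID, do not fetch again
    doc[attr] = resolve(db, relations, target, doc[attr], depth + 1, path);
  }
  return doc;
}
```

With a two-way relationship such as branches <-> organization, the second rule is what stops the fetch from bouncing between the two collections.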
Fetching from any of the collections via console, client and server SDKs are all working okay, so I would rule out a relationship bug.
Tried the same code with a PHP function instead and got the same issue so it's not node specific, seems to me to be function related. Back to you @vale peak 😅
did you manage to get a timeout problem too?
Yeah I did, but only in functions. Manually using console and SDKs worked okay. Tried the same code in a PHP function and got the same issue so it's not runtime specific either
do you have any suggestions or ideas to solve the problem? we are a startup and should have launched 2 weeks ago, but because of this problem we are standing still.
own project
https://business.pai.kg
Effortless Bill Payment and Rewards
I would have to defer to other team members on this, @vale peak and @heady gull may be able to help further when they're available
Morning 👋
@iron saffron Could you please share the code and credentials with me as well? Ill reproduce it on my end and pin-point when in function execution flow this happens. That will let us move forward and find the bug
How long does making the same query take outside of the function?
From memory about half a second
Sorry for taking so long to answer, I got a little sick.
We are still getting a timeout for the problem.
Apparently, the delay occurs when we access the database, according to our logs, the function freezes exactly when we make a request to the database.
The problem is definitely not the relationship.
Do any Appwrite API calls work? Can you increase the timeout and execute the function asynchronously?
The request usually takes about a second to complete. We tried increasing the execution time, without result. I also did an experiment: I took the Bun server from the open-runtimes repository, configured it, and launched it locally. Everything works perfectly. We only have the problem when the code runs directly in Appwrite Functions.
For example, we have an OTP module. Out of 10 requests, 2 failed (timeout).
the logic of this function is very simple.
we could not use the ftp services that you offer, as they are very expensive for our country.
- we create a session for the phone on behalf of the administrator.
- we send the received secret code to the user through a local service.
we get a timeout even for regular functions, especially if the function has not worked for a while.
@heady gull I was able to reproduce it (very weirdly, we need to find better way to reproduce) on production. Next best steps for us would be:
- Find easy way to reproduce, a simple function that can cause it + benchmarking script to trigger it. Should be as simple as Appwrite Function with query to listDocuments
- Both functions and on Stage, and QA server. Will help us prove if it's infrastructure issue or not
- Function on Stage (or QA) and Database on Production. If it works, it might be DNS issue
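The "benchmarking script" from the list above could be as small as the sketch below. It is an assumption about what such a script might look like, not the one the team used; benchmark and execute are hypothetical names, and execute is whatever triggers one function execution and throws on failure or timeout.

```javascript
// Runs `total` executions with at most `concurrency` in flight and counts failures.
async function benchmark(execute, total, concurrency) {
  const stats = { ok: 0, failed: 0, slowestMs: 0 };
  let started = 0;

  async function worker() {
    // The check and the increment happen without an await in between,
    // so exactly `total` executions are started.
    while (started < total) {
      started += 1;
      const startedAt = Date.now();
      try {
        await execute();
        stats.ok += 1;
      } catch {
        stats.failed += 1;
      }
      stats.slowestMs = Math.max(stats.slowestMs, Date.now() - startedAt);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return stats;
}
```

In practice execute could be a fetch against the function's domain that throws when the response is not OK; running the same script against Stage, QA and production would give comparable failure rates.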
I have started investigating but some urgent tasks popped up, so I had to step away for a few days.
https://discord.com/channels/564160730845151244/1204999093445988394 new post with timeout problems
There are lots of posts with this problem. We also have it.
Function just times out randomly but functions properly most of the time?
We also accidentally get a timeout.
Solving this bug would solve many problems.
You guys solved it?
No, we could not solve the problem, as it is a problem inside the appwrite.
It would be good if @vale peak 🙏 would help solve the problem as quickly as possible.
👀 Heads up that Matej is OOO for a week. I'll remind him when he's back and I'll keep you updated if someone else picks it up or we have any updates.
Appreciate your patience and help!
These flakey bugs are the hardest to debug
thank you
Any updates, I am still struggling with this and I cant even find my own support thread anymore 😦
@pine olive I'm also really looking forward to the release. This is a really critical bug. We are already considering Firebase or other services. But Appwrite would be a handy tool, since a large amount of our business logic is already written.
I am slated to launch my product in May, and we're about 70% done right now. I've been considering moving to firebase cause I am so stressed that I am gonna launch and run into some issue that I cant fix like this timeout situation
@pine olive @iron saffron our team is on this, we understand the importance of solving this issues, we've done some tests last week and we'll continue this week. We have a few directions and we'll keep you all posted.
In my experience from many times messing with Functions, it’s typically an uncaught error somewhere in my code tbh that causes this. Or too many promises when I was awaiting many at the same time
It seems like Appwrite’s fault because the error doesn’t tell us what’s happening, I don’t know why it has that error message, but usually I can fix it myself by debugging more and stopping execution before things using return statements to find where it happens
But once it works it won’t happen, at least it hasn’t to me in ~23,000 executions
Our project is already ready to launch, but we can't launch because of the timeout.
The 15 minute timeout?
It seems to us that the problem does not depend on promises.
I’m happy to help check it out if you’re looking for help
No, the 15-30 second timeout when executing a function.
I think you mean the build of the project.
Wait this is self hosted or cloud?
Self host and cloud
so I self host all my instances, is the issue primarily on cloud or primarily self host?
We noticed this problem with both usage scenarios.
Huh, is there a link to the reproduction of it?
or would you like me to try to help you debug tomorrow to see if we can find a workaround?
Maybe the Python SDK won’t do it, maybe it’s an issue with promises, I know it can be frustrating but there’s always an answer 🙂 worst case scenario I can help you spin up a portainer instance and host an API server or something to solve the issue so you can launch
Can I send you my code that is having the issue? I have a simple function that either completes in milliseconds, or fails after 15 seconds. We're still in development but its basically grabbing two documents from a collection (there's literally 2 in the entire collection), checking something and returning the value. With literally every single variable being equal through repeated tests it fails anywhere from 20-50% of the time.
Yeah please do
Hey community 👋
I can see multiple threads about function timeouts, Ill post update here as it seems the most active.
We managed to isolate and reproduce the problem. It affects both self-hosted and Cloud. Cloud is affected in rare scenarios, and self-hosted common-to-rare, depending on amount of CPU cores.
Solution for self-hosted will be released as part of patch versions in 1.5.x version. I would still recommend increasing amount of CPU cores of self-hosted machines to reduce this problem, or scale horizontally as Appwrite API containers are stateless.
For Cloud users, this week we will work on multiple solutions, each reducing occurrence of function timeout. First fixes will be deployed as soon as today or tomorrow. To our knowledge, combinations of proposed solutions should make a very big difference in both occurrence rate as well as recovery speed.
Those are quick short-term solutions, but we also plan to work on long-term solution to fully avoid this problem for all users, and allow even better scaling. While such solution doesn't take too much time, a lot of preparation in our libraries needs to be done, and we want to very firmly test these changes. I don't have any ETA for it, but we plan to start working on it soon after having above mentioned quick fixes deployed.
Thanks for all the reports and information you all provided, without your active feedback this wouldn't be possible ❤️ With that said, now Ill start working on the fixes so we can celebrate soon 🙌
As good as the recommendations are, 6+ cores for a single worker is insane. In my case it's just update products in a collection :/
well it’s not 6+ cores for your one function haha, it’s for Appwrite
it's not the "entire" appwrite either
Still, 6 cores for just that, since it's not a Docker setup used in production yet, is really unnecessary
so basically appwrite is doing nothing, so all cores are basically just for that one function, which fails anyway
Also, for your solution, do you mean _APP_FUNCTIONS_CPUS or _APP_WORKER_PER_CORE
Since _APP_FUNCTIONS_CPUS is 0, and last time you said worker_per_core in the other thread (which is on 8 atm), i'm guessing you mean worker_per_core
(which, to work correctly, I guess needs to be 12+ at the moment)
Depending on your database capabilities, you can safely increase variable _APP_WORKER_PER_CORE
For instance, Open Runtimes Proxy uses PER_CORE=100 and works perfectly fine. The problem is that Appwrite will make 1 DB connection for each worker thread, so you need to make sure your SQL server can accept as many connections as you increase it to, including all workers. Best to try and fail, to find the limit (with a benchmark)
✨We got some new updates ✨
I have been doing benchmarks all day today. One bottleneck was fixed and it got confirmed to work as expected, but it had only little impact on this function timeout issue (but it had impact 🎉). It's a good thing it got fixed, it would become bottleneck very soon.
Another fix was implemented today, and we expect much higher impact. I am waiting for PR reviews before I do benchmark again.
This second fix will have same impact both Cloud and self-hosted. First fix only affected instances using Proxy for functions (Cloud + a few self-hosted instances).
The database is built-in, docker by appwrite, not my own DB, so it should be fine, yeah?
I dont think we have custom configuration, so I would say its default. Chat GPT says that is 151 connections. Considering it's split between API container and all workers, its hard to guess what _APP_WORKER_PER_CORE is safe. You can try to increase it, and do benchmark. If its too high, it will start saying "pool is empty" or "connection refused" or "too many connections", or similar.
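A back-of-the-envelope check for that limit, as a sketch. It assumes each container starts cpuCores × workersPerCore threads and that every thread holds one DB connection, as described above; the function name and the container count are made up for illustration, and 151 is the default quoted in the thread.

```javascript
// Rough upper bound on DB connections opened by worker threads.
function estimateDbConnections({ cpuCores, workersPerCore, containers }) {
  return cpuCores * workersPerCore * containers;
}

const MAX_CONNECTIONS = 151; // default quoted in the thread

// Example: 4 cores, _APP_WORKER_PER_CORE=8, 5 containers sharing the database.
const needed = estimateDbConnections({ cpuCores: 4, workersPerCore: 8, containers: 5 });
const fitsBudget = needed <= MAX_CONNECTIONS;
console.log(`${needed} connections needed, fits: ${fitsBudget}`);
```

Under those assumptions the example already exceeds the default, which is why benchmarking after each increase is the safer route.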
How do we get these fixes if were self hosted?
We will add them in patch version after we safely deploy them to Cloud
Yes. On a clean, powerful dev server with only 1 user online.
Same here.
@vale peak Imagine doing a normal:
import { getUsers } from './utils/getUsers.js'

export default async ({ req, res, log, error }) => {
  if (req.method === 'GET') {
    switch (req.path) {
      case '/':
        return res.send('Hello, World!')
      case '/getUsers': {
        const users = await getUsers()
        if (!users) {
          return res.send('Error fetching users')
        }
        return res.send(users)
      }
      default:
        return res.send('Not found')
    }
  }
  // Non-GET requests need a response too, otherwise the execution hangs until it times out.
  return res.send('Method not allowed')
};
With my /getUsers endpoint being:
export async function getUsers () {
  const response = await fetch(
    `${process.env.APPWRITE_API_URL}/v1/databases/65527f2aafa5338cdb57/collections/65564fa28d1942747a72/documents`,
    {
      method: 'GET',
      headers: {
        'Content-Type': 'application/json',
        'X-Appwrite-Project': `${process.env.APPWRITE_PROJECT_ID}`,
        'X-Appwrite-Response-Format': '1.4.0',
      },
    },
  )
  if (!response.ok) {
    return false
  }
  const data = await response.json()
  return data.documents
}
It's timing out, how? It's a simple fetch 
Heyy 👋
Yes, this behaves according to our understanding. A request to Appwrite APIs from a synchronous execution randomly causes a timeout.
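Until the underlying issue is fixed, a function can at least fail fast instead of waiting for the platform timeout. A sketch using AbortController; fetchWithTimeout and its fetchImpl parameter are hypothetical and not part of the code above.

```javascript
// Aborts the request if it takes longer than `ms` milliseconds.
// `fetchImpl` defaults to the global fetch available in Node 18+.
async function fetchWithTimeout(url, options = {}, ms = 5000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    clearTimeout(timer); // do not keep the timer alive after the request settles
  }
}
```

This does not remove the freeze on the server side; it only lets the function catch the aborted request and return an error response of its own.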
--
✨ Update time ✨
We have recently finished the libraries needed for our long-term solution that will prevent this problem, and we will be starting the implementation in Appwrite. Fingers crossed for a smooth transition 🤞

So what's causing the timeout?
It's a rare scenario: all threads of one API container are occupied by synchronous executions, and a request going out from one of those executions is randomly assigned to the same container. That request stays frozen until the execution times out.
I consider it rare because it takes as many concurrent requests from executions as there are threads per container, multiplied by the number of API containers in the infrastructure.
The solution we are attempting will let us run 10-50x as many threads concurrently, which should mitigate these occurrences for many months. In the meantime we might shift to a coroutine approach, or a different solution if we find one.
Thanks for the detailed explanation
We made great progress on a long-term solution ✨ The benchmark shows much better concurrency results. Benchmark req/sec results are worse, but that's expected as the solution switches from a multi-core to a single-core approach. With 8 containers on 8 CPU cores, results are much much better 🎉
We also noticed that the function timeout issue isn't 100% solved with this, so we continued the investigation on that front. As far as I understand, there is some bug with Node 18 specifically because Node 21 and Python 3.9 don't have the same issue. I am working actively on understanding the cause and proposing a solution to that too.
Hi, thank you for your work. For what it's worth, I noticed the timeout problem on both Node and Bun.
lol, my import function is no longer "partly failing", it is now "all failing"
Update time ✨
I got very good news, we spotted and managed to reproduce a scenario when a freeze is caused by Swoole, an underlying runtime that Appwrite uses. For anyone interested in technical details, you can check out a public issue: https://github.com/swoole/swoole-src/issues/5274
No spam on the issue please, lets be respectful towards Swoole maintainers 🙏 They are doing an amazing job 🔥
While waiting to hear back from them, we are considering infrastructure changes that will prevent dependent request, and prevent timeout issues.
For Cloud users, we will handle it all.
For self-hosted, to make those changes, you will need to spin up a second instance of the Appwrite API, expose it to the runtimes network, and use it instead of your domain endpoint inside your Functions code.
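On the function side, "use it instead of your domain endpoint" could come down to choosing the endpoint from configuration. A sketch only: APPWRITE_INTERNAL_ENDPOINT is a made-up variable name for the second API instance, and APPWRITE_API_URL is the variable used in the getUsers example earlier in the thread.

```javascript
// Prefer the dedicated internal API instance when it is configured,
// otherwise fall back to the public endpoint.
function resolveEndpoint(env) {
  return env.APPWRITE_INTERNAL_ENDPOINT || env.APPWRITE_API_URL;
}
```

Inside a function this would be called as `resolveEndpoint(process.env)` wherever the API URL is built. How the second instance is deployed and exposed to the runtimes network is not covered here.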
Sounds great!
No idea what to do, someone got a step by step? 😆
also is that a permanent solution or just to temporarily fix it?
From feedback we got on the issue, it seems like we can customize Swoole for our needs on this front. I will be trying it next week and likely deploy it to Cloud. If we manage to resolve timeout issues on Cloud, we will release it to open source as well - Ill fight for patch 1.5.x version for this bugfix. If all of that goes well, simply upgrading your Appwrite version will resolve the problem. Ill keep you posted 🙌
Oh and yes, it will be permanent fix. We will still explore coroutines as they can speed-up Appwrite APIs in general, but it will no longer have urgency from this function timeout issue.
Hey, any update?
Would like to publish some sites soon and waiting on this.
Good morning 👋
We managed to customize how the underlying HTTP server distributes work, and managed to prevent the freeze - avoiding all function timeouts. As per our understanding, this may worsen performance a little, but it's perfectly fine, as we can always scale horizontally. Right now we are benchmarking the before&after, and are testing the solution on stage to ensure stability and memory usage.
Soon, we plan to deploy this on Cloud - likely next week
With a successful deployment on Cloud, and no new reports on this issue, we will release this as a patch version in 1.5.x.
For more technical details, you can check out a PR that addresses this problem: https://github.com/appwrite/appwrite/pull/7855
How much is a little performance?
Just assumptions so far, from theory the way we split workers. Once we have some benchmark results, I will let you know.
Thanks!
Closed means it's already merged?
Nope, it means it wasn't merged but closed instead 🤔
why🥹
Any update on the cloud deployment, Matej? 🙂
Good morning ☀️
We deployed an update to Cloud and are currently monitoring the effect. I am not actively working on it right now, so I sadly dont have all the details. What's very positive is that deployment was a success, as we did a huge change to an underlying logic of request processing - and everything remains stable. There are some configurations we might need to adjust before our final solution.
This specific file is causing code duplication between open source and cloud versions. We decided to close the open-source one, and focus on Cloud to allow us to move quickly. Once finalized, we can do a quick text diff on the files and move the changes to the open source version too
Hey, I think the Function timeout seems fixed because I did not encounter that error anymore. Thank you @vale peak to fix that issue
Such amazing news! 🔥 cc @sudden fossil @cosmic dew
so this issue is still present in self-hosted right?
Yes. A solution will be coming soon for self-hosting too
thanks
Heyo, any update? :)
Hi 👋 We are currently in a waiting phase. From your feedback we believe the problem is resolved, but on Sentry we are seeing an almost identical amount of timeouts as before. We haven't yet managed to draw a conclusion from that, and we are waiting to have a bit more data before we do. I'll talk to Eldad about it by the end of this week, and if we mark it as resolved for Cloud, we will add it to self-hosted.
cc @cosmic dew, if you have any more relevant info
No more information
That's a bummer! It's almost like you "suppressed" the error? haha
Any updates on this?
Try redeploying the function
It's a known issue. Which AFAIK is still not fixed on self hosted.
Yeah, it seems a fix was deployed on cloud but nothing on self hosted. Do you have an ETA for the next update of Appwrite?
@cosmic dew Is there some update on this?
@vale peak 
A few folks are OOO, once they get back we will sync all cloud changes to open-source. There are more bug fixes, so we can release them all in 1 patch version
I think all could be merged by end of this week
This mean, you have fully solved this issue?
We believe so, yes. Freezes will no longer happen. If server is slow, or there are too many requests, it will take longer. But never freeze anymore
Excellent! very good news!
@vale peak wow good job 🔥🔥🔥
When are they back?
I have been needing this fix for quite a while now.
bump !
Bump also :p
Good news. Thanks everyone for pushing on this topic. You made it clear this fix is important. We considered it temporary fix, and we still do, before we have proper concurrency support. With that said, it helped Cloud in the meantime, and it can help self-hosted instances too.
We will be releasing the fixes in 1.5.7, next patch release.
@vale peak @iron saffron I think we can mark this as solved now? 🙂
Ye
I think we need to wait for others to test🤝
Hey.
Did it get fixed in the latest update?
I don't see it in #📢│announcements.
Seems yes 🙂
Test it yourself and confirm
This fixed all timeout issues for me on self-hosted.
Hello sir, if we want to just update the function executor what version is it?
To fix the function timeout, the Appwrite version needs to be updated.
Also, I can confirm. Version 1.5.7 has fixed this bug.