So, obvs not a new post, but I'm trying to fetch large files in a cron job to update the security vulnerability data in my database, and now I get "An internal curl error has occurred within the executor! Error Msg: Http invalid protocol\nError Code: 500" every time. It worked briefly, but now it fails on every run. It could be because the files are huge (230 MB or so) and have to be unzipped, but I'm confused about what's actually going on, since the error message doesn't tell me much.
#[SOLVED] An Internal CURL Error Has Occurred Within the Executor
@quick linden this is kind of a big deal, my functions keep randomly failing and I thought this was fixed in 1.4.x?
I moved it to Bun and it runs for 12 minutes before giving up
can this be broken down in any way? 12 minutes for 1 iteration is too long
Hm
So
It's pulling NIST CVE data, which is security vulnerability data
they're clearly very large

I was thinking I could maybe spawn child function executions for each chunk of data
do you think that would work?
You can fetch a chunk at a time
You should have a function execution handle a chunk at a time
Let me know if you need help with the API. I've worked with it 😜
If you have any examples I'd love to have them 😄 Right now it seems like I just need to split the JSON into chunks, so it should be pretty doable: break it up, send off an async execution for each chunk, then continue onward, yeah?
import { Client, Databases, Functions, ID } from 'node-appwrite';

const batch_size = 1000;
const max_instances = 5;

// Split `data` into batches of `batch_size` and create one document per item.
// Note: every batch is started right away, so all createDocument calls are in flight at once.
async function createDocuments(client: Client, databaseId: string, collectionId: string, data: any[]) {
  const db = new Databases(client);
  const chunkedData = Array.from({ length: Math.ceil(data.length / batch_size) }, (_, i) => data.slice(i * batch_size, (i + 1) * batch_size));
  return Promise.all(chunkedData.map(async chunk => {
    // Log failures per document instead of rejecting the whole batch
    const promises = chunk.map(item => db.createDocument(databaseId, collectionId, ID.unique(), item).catch(e => console.log(`Error creating document: ${e}`)));
    return Promise.all(promises);
  }));
}

export default async ({ req, res, log, error }: any) => {
  const client = new Client()
    .setEndpoint('https://cloud.appwrite.io/v1')
    .setProject(Bun.env["APPWRITE_FUNCTION_PROJECT_ID"])
    .setKey(Bun.env["APPWRITE_API_KEY"]);
  const functions = new Functions(client);
  log("Spawned child function to create data");

  const databaseId = req.body.databaseId;
  const collectionId = req.body.collectionId;
  let data = req.body.data;

  if (data.length > batch_size * max_instances) {
    // Hand everything beyond this execution's share to a new async execution
    try {
      await functions.createExecution('your-appwrite-function', JSON.stringify({ databaseId: databaseId, collectionId: collectionId, data: data.slice(batch_size * max_instances) }), true);
    } catch (e) {
      log(`Error creating function execution: ${e}`);
    }
    data = data.slice(0, batch_size * max_instances);
  }

  // Insert the first batch, then the rest (createDocuments batches internally, so this is not really recursion)
  if (data.length > batch_size) {
    await createDocuments(client, databaseId, collectionId, data.slice(0, batch_size));
    await createDocuments(client, databaseId, collectionId, data.slice(batch_size));
  } else {
    await createDocuments(client, databaseId, collectionId, data);
  }

  // Always finish with a response, whichever branch ran
  return res.empty();
};
something like this?
ah they don't like spawning each other haha
where are you getting the data you're passing in?
so i suggest having a function that runs periodically.
get last run timestamp
get last start index
if there is a last start index
    fetch the next N CVEs
    insert into Appwrite M CVEs at a time
    update the last start index
else
    fetch CVEs updated since the last run timestamp
    insert into Appwrite M CVEs at a time
you can set a start index and then have this run every T minutes until it's done inserting into Appwrite. Then, clear the start index and run it every 2 hours to keep the data updated
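The schedule described above could be sketched roughly like this. It is only a sketch under assumptions: `ImportState`, `ImportDeps`, and `runImportStep` are hypothetical names that do not come from Appwrite or the NVD API, and the state store, the CVE fetchers, and the insert helper are all injected so the logic can be read (and tried) without any real network call.

```typescript
// Hypothetical bookkeeping kept between runs (for example in a small Appwrite document).
interface ImportState {
  lastRun: string | null;     // timestamp the next incremental fetch should start from
  startIndex: number | null;  // set only while the initial bulk import is in progress
}

// Everything that touches the network is injected; all of these names are assumptions.
interface ImportDeps<T> {
  loadState(): Promise<ImportState>;
  saveState(state: ImportState): Promise<void>;
  fetchPage(startIndex: number, n: number): Promise<T[]>;     // the next N CVEs
  fetchUpdatedSince(timestamp: string | null): Promise<T[]>;  // CVEs changed since last run
  insertAll(items: T[]): Promise<void>;                       // inserts M at a time internally
  now(): string;
}

// One scheduled execution: continue the bulk import if a start index is stored,
// otherwise pull only what changed since the last run. Returns how many items were handled.
async function runImportStep<T>(deps: ImportDeps<T>, n: number): Promise<number> {
  const state = await deps.loadState();
  const startedAt = deps.now();
  let items: T[];
  let next: ImportState;

  if (state.startIndex !== null) {
    items = await deps.fetchPage(state.startIndex, n);
    // A short page means the source is exhausted, so the start index is cleared.
    const nextIndex = items.length < n ? null : state.startIndex + items.length;
    // Keep the timestamp from when the bulk import began, so changes made
    // during the import are picked up by the first incremental run.
    next = { lastRun: state.lastRun ?? startedAt, startIndex: nextIndex };
  } else {
    items = await deps.fetchUpdatedSince(state.lastRun);
    next = { lastRun: startedAt, startIndex: null };
  }

  await deps.insertAll(items);
  // Progress is saved only after the inserts finish, so a failed run gets retried.
  await deps.saveState(next);
  return items.length;
}
```

Saving the state last is a deliberate choice in this sketch: if an execution dies halfway, the next one repeats the same page, which means the inserts would need to tolerate duplicates (for instance by using the CVE ID as the document ID).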
so when I ran it one by one
you would tune the N, M, and T until you find the best values
yes, it will take a while to import all the data
Do you know the speed comparison between awaiting promise.all vs one by one?
but I love that approach
definitely don't do 1 by 1.
copy lol
you know how long I was going 1 by 1? lmao
Docs should be updated with common "helper" functions IMO, stuff like that isn't that common knowledge, it makes sense, but I feel like most won't think of it
the promise.all part is the:
insert into Appwrite M CVEs at a time
part
that M is what you would tune
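For anyone reading later, "M at a time with promise.all" could look something like the sketch below. `chunk` and `runInBatches` are hypothetical helper names, not part of any SDK, and `task` stands in for whatever call you make per item (such as the `db.createDocument(...)` call from the code earlier in the thread).

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `task` for every item, at most `m` at a time: each batch is awaited
// in full before the next batch starts. Failures are collected, not thrown.
async function runInBatches<T>(
  items: T[],
  m: number,
  task: (item: T) => Promise<unknown>,
): Promise<{ ok: number; failed: unknown[] }> {
  let ok = 0;
  const failed: unknown[] = [];
  for (const batch of chunk(items, m)) {
    await Promise.all(
      batch.map((item) =>
        task(item).then(
          () => { ok++; },
          (reason) => { failed.push(reason); },
        ),
      ),
    );
  }
  return { ok, failed };
}
```

Usage would be along the lines of `await runInBatches(cves, 100, cve => db.createDocument(databaseId, collectionId, ID.unique(), cve))`, where `100` is just a starting guess for M that you then tune.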
I thought it would be cool to spawn X child functions hahaha
ya...that sounds complicated
Technically it was easy, but yeah the actual implementation with Functions is not
cause it was waiting for the execution to finish before returning resolution, and if I did allSettled then nothing would happen
which makes sense, docker container gets destroyed
stuff like what?
i was doing 150/s...so i think each execution took about 15s to finish
so since it took 15s, i had the function run every minute
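To put rough numbers on that: if 150/s and about 15s per execution are read together, each run handles roughly 2,250 documents. The snippet below is just that arithmetic as a hypothetical helper (`estimateImport` is not from any library), and the total of 200,000 documents is a made-up figure for illustration, not the real size of the CVE data set.

```typescript
// Back-of-the-envelope estimate for a scheduled import.
function estimateImport(
  totalDocs: number,
  docsPerSecond: number,
  secondsPerRun: number,
  minutesBetweenRuns: number,
): { docsPerRun: number; runs: number; minutes: number } {
  const docsPerRun = docsPerSecond * secondsPerRun;
  const runs = Math.ceil(totalDocs / docsPerRun);
  return { docsPerRun, runs, minutes: runs * minutesBetweenRuns };
}

// 150 docs/s for ~15 s, one execution per minute, hypothetical 200,000 documents:
const estimate = estimateImport(200000, 150, 15, 1);
// estimate.docsPerRun === 2250, estimate.runs === 89, estimate.minutes === 89
```

Running every minute with a 15s execution also leaves plenty of slack, so a slow run is unlikely to overlap with the next one.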
neat, smart, that makes sense
Like "tricks" sort of that aren't commonly known, such as inserting large amounts of data using promise.all because I def didn't know for a minute, mostly because I love Python and Dart and such
dart has something like promise.all too 😉
Hey I'm trying to batch delete and I made it dead simple, I'm only giving it 1000 documents at a time for each function creation and every single one is getting internal curl error
Maybe it's too much? Try lowering it?
What's your code?
import { Client, Databases } from 'node-appwrite';

const batch_size = 1000;

export default async ({ req, res, log, error }: any) => {
  const client = new Client()
    .setEndpoint('https://appwrite.pva.ai/v1')
    .setProject(Bun.env["APPWRITE_FUNCTION_PROJECT_ID"])
    .setKey(Bun.env["APPWRITE_API_KEY"]);
  const db = new Databases(client);
  log("Spawned child function to delete data");

  const databaseId = req.body.databaseId;
  const collectionId = req.body.collectionId;
  const ids = req.body.ids;

  // Split the IDs into batches of `batch_size`
  const chunkedIds = Array.from({ length: Math.ceil(ids.length / batch_size) }, (_, i) => ids.slice(i * batch_size, (i + 1) * batch_size));

  // Note: every chunk is started right away, so all deleteDocument calls are in flight at once
  const childPromises: Promise<any>[] = chunkedIds.map(chunk => {
    log(`Deleting ${chunk.length} documents`);
    return Promise.all(chunk.map(id => db.deleteDocument(databaseId, collectionId, id).catch((e: any) => log(`Error deleting document: ${e}`))));
  });
  await Promise.all(childPromises);

  return res.empty();
};
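One thing worth noting about the code above: the chunks are all started immediately, so with 1000 IDs there are 1000 delete requests in flight at the same moment. Whether that is what triggers the internal curl error is not established in this thread, but if you want to try "lowering it", a small concurrency limiter is one way. This is a sketch only; `mapWithLimit` is a hypothetical helper, not an Appwrite or Bun API.

```typescript
// Run `task` over `items` with at most `limit` calls in flight at any moment.
// Unlike fixed batches, a new call starts as soon as any earlier one finishes.
// Results come back in the same order as `items`.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker keeps pulling the next unclaimed index until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      const item = items[index] as T;
      results[index] = await task(item, index);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

In the function above that would replace the two `Promise.all` layers with something like `await mapWithLimit(ids, 25, id => db.deleteDocument(databaseId, collectionId, id).catch((e: any) => log(`Error deleting document: ${e}`)))`. The `25` is an arbitrary starting point to tune, and the per-call `.catch` matters because an uncaught rejection would reject the whole `mapWithLimit` call.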
I'm only seeing 2-3 second runtimes before internal curl
but tbh what I don't understand is what the heck is the internal curl error that's happening? Shouldn't code that works just... work?
Yeah this is still broken