#[SOLVED] An Internal CURL Error Has Occurred Within the Executor

waxen jetty
#

So, obviously not a new issue, but I'm trying to fetch large files in a cron job to update the security vulnerability data in my database, and I'm now getting "An internal curl error has occurred within the executor! Error Msg: Http invalid protocol\nError Code: 500" every time. It worked briefly, but now it fails consistently. It could be because the files are huge (230 MB or so) and it has to unzip them, but I'm confused about what's going on in general, since the error message doesn't tell me much.

waxen jetty
#

@quick linden this is kind of a big deal, my functions keep randomly failing and I thought this was fixed in 1.4.x?

#

I moved it to Bun and it runs for 12 minutes before giving up

quick linden
waxen jetty
#

Hm

#

So

#

It's pulling NIST CVE data, which is a list of security vulnerabilities

#

they're clearly very large

waxen jetty
#

do you think that would work?

quick linden
quick linden
waxen jetty
#

Okay gotcha

quick linden
waxen jetty
waxen jetty
# quick linden Let me know if you need help with the API. I've worked with it 😜
import { Client, Databases, Functions, ID } from 'node-appwrite';

const batch_size = 1000;
const max_instances = 5;

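async function createDocuments(client: Client, databaseId: string, collectionId: string, data: any[]) {
  const db = new Databases(client);
  // Split the data into chunks of at most batch_size items
  const chunkedData = Array.from({ length: Math.ceil(data.length / batch_size) }, (_, i) => data.slice(i * batch_size, (i + 1) * batch_size));

  // Run the chunks one after another so at most batch_size requests are in flight at once
  // (mapping every chunk into a single Promise.all would start all requests immediately)
  const results: any[] = [];
  for (const chunk of chunkedData) {
    const promises = chunk.map(item => db.createDocument(databaseId, collectionId, ID.unique(), item).catch(e => console.log(`Error creating document: ${e}`)));
    results.push(await Promise.all(promises));
  }
  return results;
}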

export default async ({ req, res, log, error }: any) => {
  const client = new Client()
    .setEndpoint('https://cloud.appwrite.io/v1')
    .setProject(Bun.env["APPWRITE_FUNCTION_PROJECT_ID"])
    .setKey(Bun.env["APPWRITE_API_KEY"]);
  const functions = new Functions(client);
  log("Spawned child function to create data");

  const databaseId = req.body.databaseId;
  const collectionId = req.body.collectionId;
  let data = req.body.data;

  if (data.length > batch_size * max_instances) {
    // Create a new function execution for the remaining data
    try {
      await functions.createExecution('your-appwrite-function', JSON.stringify({ databaseId: databaseId, collectionId: collectionId, data: data.slice(batch_size * max_instances) }), true);
    } catch (e) {
      log(`Error creating function execution: ${e}`);
    }
    data = data.slice(0, batch_size * max_instances);
  }

  // createDocuments already splits the data into batches, so a single call is enough.
  // (The earlier early `return` skipped res.empty() whenever data was larger than batch_size.)
  await createDocuments(client, databaseId, collectionId, data);

  return res.empty();
};

something like this?

waxen jetty
#

ah they don't like spawning each other haha

quick linden
waxen jetty
#

CVE Nist Data

#

API now, it was a ZIP file

quick linden
# waxen jetty CVE Nist Data

so i suggest having a function that runs periodically.

get last run timestamp
get last start index

if there is a last start index
    fetch the next N CVEs
    insert into Appwrite M CVEs at a time
    update the last start index
else
    fetch CVEs updated since the last run timestamp
    insert into Appwrite M CVEs at a time

#

you can set a start index and then have this run every T minutes until it's done inserting into Appwrite. Then, clear the start index and run it every 2 hours to keep the data updated
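
The steps above could be sketched roughly like this in TypeScript. This is only an illustration under assumptions: `SyncState`, `SyncDeps`, `runOnce`, and the three injected functions are hypothetical names, not part of Appwrite or the CVE API, and where the state is stored between runs is left open. It also assumes the timestamp is frozen while the backfill is in progress, so the first incremental run picks up anything that changed in the meantime.

```typescript
// Persisted between runs (for example in a small settings document).
interface SyncState {
  lastRun: string | null;    // ISO timestamp used for incremental fetches
  startIndex: number | null; // non-null while the initial backfill is running
}

// Hypothetical dependencies: none of these names come from Appwrite or the CVE API.
interface SyncDeps<T> {
  fetchPage(startIndex: number, n: number): Promise<{ items: T[]; total: number }>;
  fetchUpdatedSince(since: string | null): Promise<T[]>;
  insertBatch(items: T[]): Promise<void>;
}

async function runOnce<T>(
  state: SyncState,
  deps: SyncDeps<T>,
  n: number,
  m: number,
  now: string,
): Promise<SyncState> {
  if (n < 1 || m < 1) throw new Error("n and m must be at least 1");

  let items: T[];
  let next: SyncState;

  if (state.startIndex !== null) {
    // Backfill: fetch the next N records starting at the saved index.
    const page = await deps.fetchPage(state.startIndex, n);
    items = page.items;
    const reached = state.startIndex + items.length;
    const done = items.length === 0 || reached >= page.total;
    // Keep the timestamp from when the backfill began, so the first
    // incremental run also covers records changed during the backfill.
    next = { lastRun: state.lastRun ?? now, startIndex: done ? null : reached };
  } else {
    // Incremental: fetch only records updated since the last run.
    items = await deps.fetchUpdatedSince(state.lastRun);
    next = { lastRun: now, startIndex: null };
  }

  // Insert M records at a time; the new state is returned only after all inserts finish.
  for (let i = 0; i < items.length; i += m) {
    await deps.insertBatch(items.slice(i, i + m));
  }
  return next;
}
```

Each scheduled execution would load the saved state, call `runOnce`, and store the returned state. Because the state advances only after the inserts complete, a failed run simply retries the same page next time.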

waxen jetty
#

so when I ran it one by one

quick linden
#

you would tune the N, M, and T until you find the best values
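
As a rough aid for that tuning, the arithmetic can be written down: fetching N records per run every T minutes means a backfill needs about ceil(total / N) runs. The helper below is a hypothetical sketch (the name `estimateBackfill` and the example numbers are made up for illustration), and it assumes every run finishes within its T-minute slot.

```typescript
// Estimate how many scheduled runs, and how many minutes, a backfill needs
// when each run fetches `n` records and runs are `tMinutes` apart.
function estimateBackfill(
  totalItems: number,
  n: number,
  tMinutes: number,
): { runs: number; minutes: number } {
  if (n < 1) throw new Error("n must be at least 1");
  const runs = Math.ceil(totalItems / n);
  return { runs, minutes: runs * tMinutes };
}

// Illustrative numbers only: 100000 records, 2000 per run, one run per minute.
const estimate = estimateBackfill(100000, 2000, 1);
console.log(estimate); // { runs: 50, minutes: 50 }
```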

waxen jetty
#

it took about I think 5-6 hours, something like 1.6 million entries

#

or 160,000

quick linden
waxen jetty
#

Do you know the speed comparison between awaiting Promise.all vs. going one by one?

#

but I love that approach

quick linden
waxen jetty
#

copy lol

#

you know how long I was going 1 by 1? lmao

#

Docs should be updated with common "helper" functions IMO. Stuff like that isn't common knowledge; it makes sense, but I feel like most people won't think of it

quick linden
#

the Promise.all part is the:

insert into Appwrite M CVEs at a time

part
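
A minimal sketch of that "M at a time" step, assuming a hypothetical `insertOne` callback that wraps whatever call creates a single document (for example `db.createDocument(...)`). `chunk` and `insertInBatches` are illustrative names, not SDK functions. Each batch of up to M inserts runs concurrently via Promise.all, and the next batch starts only once the current one has settled.

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// `insertOne` is a hypothetical stand-in for a single-document create call.
async function insertInBatches<T>(
  items: T[],
  m: number,
  insertOne: (item: T) => Promise<unknown>,
): Promise<{ inserted: number; failed: number }> {
  let inserted = 0;
  let failed = 0;
  for (const batch of chunk(items, m)) {
    // Up to M requests run concurrently; a failure is recorded instead of
    // rejecting the whole batch.
    const results = await Promise.all(
      batch.map((item) => insertOne(item).then(() => true, () => false)),
    );
    for (const ok of results) {
      if (ok) inserted++;
      else failed++;
    }
  }
  return { inserted, failed };
}
```

With M set to 1 this degenerates to inserting one by one, so the same helper can be used to compare timings while tuning M.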

waxen jetty
#

though I imagine there's something in the works

#

yeah that's what I figured

quick linden
#

that M is what you would tune

waxen jetty
#

I thought it would be cool to spawn X child functions hahaha

quick linden
waxen jetty
#

Technically it was easy, but yeah the actual implementation with Functions is not

#

because it was waiting for the execution to finish before resolving, and if I used Promise.allSettled then nothing would happen

#

which makes sense, docker container gets destroyed

quick linden
#

so since it took 15s, i had the function run every minute

waxen jetty
#

neat, smart, that makes sense

waxen jetty
# quick linden stuff like what?

Like "tricks" of sorts that aren't commonly known, such as inserting large amounts of data using Promise.all. I def didn't know about that for a while, mostly because I love Python and Dart and such

quick linden
waxen jetty
quick linden
waxen jetty
# quick linden What's your code?
import { Client, Databases } from 'node-appwrite';

const batch_size = 1000;

export default async ({ req, res, log, error }: any) => {
  const client = new Client()
    .setEndpoint('https://appwrite.pva.ai/v1')
    .setProject(Bun.env["APPWRITE_FUNCTION_PROJECT_ID"])
    .setKey(Bun.env["APPWRITE_API_KEY"]);
  const db = new Databases(client);
  log("Spawned child function to delete data");
  const databaseId = req.body.databaseId;
  const collectionId = req.body.collectionId;
  const ids = req.body.ids;

  const chunkedIds = Array.from({ length: Math.ceil(ids.length / batch_size) }, (_, i) => ids.slice(i * batch_size, (i + 1) * batch_size));
  
  // Delete one chunk at a time so at most batch_size requests are in flight at once
  // (mapping every chunk into a single Promise.all would start all deletes immediately)
  for (const chunk of chunkedIds) {
    log(`Deleting ${chunk.length} documents`);
    await Promise.all(chunk.map(id => db.deleteDocument(databaseId, collectionId, id).catch((e: any) => log(`Error deleting document: ${e}`))));
  }
  return res.empty();
};

#

I'm only seeing 2-3 second runtimes before the internal curl error

#

but tbh what I don't understand is what the heck is the internal curl error that's happening? Shouldn't code that works just... work?

waxen jetty
#

Yeah this is still broken