#Stuck when run is triggered via API call but not on dashboard?

slow hound
#

I have a project that lets me upload videos to Google Cloud Storage (it is very bare, and that's the only thing it does at the moment).

If I trigger the request from the serverless dashboard, the job completes, but if it is triggered via the API it stays stuck forever.

this is what the code looks like:

const requestConfig = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${runpod.api_key}`,
  },
  body: JSON.stringify({
    input: {
      video_path,
      audio_path,
    },
  }),
};

const response = await fetch(`https://api.runpod.ai/v2/*****/run`, requestConfig);

Any ideas?

raw sail
slow hound
#

@raw sail not 100% sure but I have a hunch that it gets stuck in this function (based on the progress message)

def generate_signed_url(blob_name, expiration_minutes=60):
    log.info(f"Generating signed URL for {blob_name}")
    blob = bucket.blob(blob_name)
    url = blob.generate_signed_url(expiration=timedelta(minutes=expiration_minutes))
    log.info(f"Signed URL: {url}")
    return url

but that doesn't explain why it works if I trigger it via dashboard but not via API?

raw sail
#

that is your handler?

raw sail
#

doesn't seem like a proper handler function

#

try to debug your handler: print everything you return so you can see in the logs what is actually being returned
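
One way to follow this suggestion is to wrap the handler so that every input and return value gets printed. This is only a minimal sketch: the `logged` wrapper and the stand-in `handler` body are hypothetical and are not taken from the code attached in the thread. It assumes the handler receives a job dict with an "input" key.

```python
import functools
import json


def logged(handler):
    """Wrap a handler so its input and return value show up in the worker logs."""

    @functools.wraps(handler)
    def wrapper(job):
        # flush=True writes the line immediately instead of leaving it buffered.
        print(f"handler input: {json.dumps(job.get('input'), default=str)}", flush=True)
        try:
            result = handler(job)
        except Exception as exc:
            # Print the error, then re-raise so the failure is not swallowed.
            print(f"handler raised: {exc!r}", flush=True)
            raise
        print(f"handler output: {json.dumps(result, default=str)}", flush=True)
        return result

    return wrapper


@logged
def handler(job):
    # Hypothetical stand-in for the real download/upload logic.
    return {"message": "done", "video_path": job["input"]["video_path"]}
```

The wrapped `handler` can then be passed to the worker in the same way as the unwrapped one, so the logging can be removed later without touching the handler body.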

slow hound
#

sorry, this is what my handler looks like - it doesn't really do anything at the moment other than downloading the same file and uploading it to GCS

#

and this is what my Dockerfile looks like:

# Use Runpod base image
FROM runpod/base:0.4.0-cuda11.8.0

# Optional: Install extra Linux packages here if needed
# COPY builder/setup.sh /setup.sh
# RUN bash /setup.sh && rm /setup.sh

# Set working directory
WORKDIR /app

# Copy Python dependencies
COPY builder/requirements.txt .

# Install Python dependencies
RUN python3.11 -m pip install --upgrade pip && \
    python3.11 -m pip install --upgrade -r requirements.txt --no-cache-dir && \
    rm requirements.txt

# Copy your actual src code
COPY src/ .

# Start the worker
CMD ["python3.11", "-u", "handler.py"]
raw sail
#

thanks for sending them over

slow hound
#

I cancel it when it takes too long, but sometimes I come back after around 30 minutes and it is still processing. When I trigger it via the dashboard, it only takes a couple of seconds.

raw sail
#

processing means it's still uploading in your handler?

#

or there might be some error? that keeps retrying

#

what's happening inside your worker?

slow hound
#

I mean "In Progress" status

raw sail
slow hound
#

or there might be some error? that keeps retrying
The code I sent you was a refactored version; this was the first version, which has the same issue. At first I thought something was failing and that was why it got stuck, so I added retry events and error handlers, but that doesn't seem to be the case.

raw sail
#

or which process/func of that is still running?

raw sail
slow hound
#

so this one, I tried running it about 3 minutes ago and it is still processing. It is stuck at "Generating signed URL".

raw sail
#

what about the worker logs

slow hound
#

this is what the logs look like

raw sail
#

hmm no logs from your print?

raw sail
slow hound
#

the RunPodLogger doesn't seem to work

raw sail
#

use print() instead for now i guess

rapid remnant
#

is the handler running? why does it stop immediately

raw sail
slow hound
#

all the workers are on idle as well

#

lemme try using print and update you guys.

raw sail
#

btw

def adjust_concurrency(current_concurrency):
    max_concurrency = 5
    min_concurrency = 1
    log.info(f"Adjusting concurrency. Current: {current_concurrency}")
    if current_concurrency < max_concurrency:
        return current_concurrency + 1
    elif current_concurrency > min_concurrency:
        return current_concurrency - 1
    return current_concurrency

you can just directly return 5

#

i guess
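
To illustrate the point about returning 5 directly, here is a small sketch that traces the posted modifier next to a constant one. It assumes the function is called repeatedly with its own previous result, and it leaves out the logging and however the function is registered with the worker, since the thread does not show that part.

```python
def adjust_concurrency_original(current_concurrency):
    """The modifier as posted in the thread, with logging left out."""
    max_concurrency = 5
    min_concurrency = 1
    if current_concurrency < max_concurrency:
        return current_concurrency + 1
    elif current_concurrency > min_concurrency:
        return current_concurrency - 1
    return current_concurrency


def adjust_concurrency(current_concurrency):
    """Constant replacement: always allow up to 5 concurrent jobs."""
    return 5


# Trace the posted version from a starting value of 1 over eight calls.
level = 1
trace = []
for _ in range(8):
    level = adjust_concurrency_original(level)
    trace.append(level)
print(trace)  # [2, 3, 4, 5, 4, 5, 4, 5]
```

Under that assumption the posted version climbs to 5 and then alternates between 4 and 5, so a constant return value gives a steadier setting.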

slow hound
#

I have adjusted the concurrency to return 5 and use print() directly but it is still the same 🤔

#

I even waited for the build to complete

slow hound
slow hound
#

Hi @raw sail, OK, it looks like the logs transition to the ones above once the job is "completed", and I was able to capture the worker logs before that happened. As you can see in this log, it says "Job completed successfully", and there was even a "Finished." message after that. This is what my code looks like (file attached).

the

print("✅ Job completed successfully.")

is right before the return in my handler, yet the status of my job is still tagged as IN PROGRESS

dapper lilyBOT
raw sail
#

Hmm this is kind of weird try to open a ticket

slow hound
#

Hmm this is kind of weird try to open a ticket
Thank you so much!

rapid remnant
#

What happens if you do runsync?

#

instead of run

slow hound
#

@rapid remnant

ok I just tried it, very weird thing is happening:

If I run this from my proxy API (this is in NodeJS):

const requestConfig = {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${runpod.api_key}`,
  },
  body: JSON.stringify({
    input: {
      video_path,
      audio_path,
    },
  }),
};

const response = await fetch(`${runpod.endpoint}/runsync`, requestConfig);
const data = await response.json();

return data;

data would return:

{
    "delayTime": 5150,
    "executionTime": 2775,
    "id": <job id>,
    "output": {
        "message": "Files processed, uploaded, and cleaned up successfully",
        "signed_url": "..."
    },
    "status": "COMPLETED",
    "workerId": "83kgwesbus3cfb"
}

meaning it has COMPLETED. But if I visit the dashboard, it shows as IN PROGRESS, and calling the API to check the status of the job id also returns IN PROGRESS. Then after a couple of minutes it gets removed from my "requests" panel, leaving no trace that it ever happened.

In reality, though, I will not use runsync in my project: it is a proxy API, and I do not want timeout errors if my process takes too long to finish in the future. I would rather use run and poll the status.
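
The "run and poll" approach can be sketched roughly as below. It is written in Python to match the handler code, though the same loop carries over to a Node proxy. The `/status/{job_id}` path follows the `/status/:jobid` route mentioned in the thread. The helper names, the one-second interval, the ten-minute cap, and every terminal status other than COMPLETED are assumptions. The cap is there so that a job that never leaves IN_PROGRESS, as described in this thread, does not keep the proxy waiting forever.

```python
import json
import time
import urllib.request

# Only COMPLETED appears in the thread; the other terminal states are assumed.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def fetch_status(base_url, job_id, api_key):
    """GET {base_url}/status/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until_done(get_status, interval_s=1.0, timeout_s=600.0, sleep=time.sleep):
    """Call get_status() until the job is terminal or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = get_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {body.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

A caller would pass something like `lambda: fetch_status(base_url, job_id, api_key)` as `get_status`, where `job_id` is the id returned by the `/run` request. Taking the status fetcher as a parameter keeps the loop testable without network access.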

rapid remnant
#

the request getting removed part is expected

#

i think i saw that runsync operations have short TTL in the docs

#

anyways the API returning IN_PROGRESS is strange and must be a runpod side problem

raw sail
#

so it finished actually, but your dashboard isnt updated

#

thats why isnt it

#

the request is removed 1 min after the job finishes if /runsync
and 30 mins after the job finishes if /run

raw sail
#

do you mean it's in progress when you poll /status/:jobid (every 1 second or so)?

slow hound
slow hound
raw sail
#

So on /runsync it says it's done but in status it's not?

#

You have created a ticket for this right

slow hound
#

I clicked the "Open ZenDesk ticket" button above. Is there any other way of opening a ticket, or is that it? (I may have entered the wrong email, but I guess that's fine?)

#

So on /runsync it says it's done but in status it's not?
exactly.

rapid remnant
#

Anyways its 100% runpod's fault

raw sail
#

You can create one more I guess, from the contact button

slow hound
raw sail
#

Ya it looks cool, need zooms to view clearly in small screen devices

proud robin
#

@slow hound: I have updated your ticket with some suggestions. Can you please try them when you get a chance?

slow hound
#

Hi @proud robin I replied on the email, but will copy-paste my reply here:

It is still bugged, but now it is stuck in progress even if I trigger it directly from the dashboard:

here is a recording.
https://app.canvid.com/share/fi_01JTFC00JSQMJPP236V39N3NKW

As you can see in the code and even in the "progress output", it is not stuck: it reaches "Done", but the status is still in progress. However, it seems that jobs triggered via the API are now getting completed.

This is very weird
proud robin
#

@slow hound: Can you please have a look at the ticket when you get a chance? I need a few details to involve our engineering team.