Hi, I'm working with a transcription and diarization endpoint. The Docker image works great: I tested it locally and also inside a worker, where I SSH'd in and ran:
python handler.py --test_input '{"input": {"endpoint": "transcribe_option", "file_path": "dev/tmp/test_files/FastAPI_Introduction_-_Build_Your_First_Web_App_-_Python_Tutorial.mp4", "is_diarization": true}}'
Processing takes around 1 minute for this 11-minute video, which is great. The logs from running this same request inside the worker are in the attached Message.txt.
Once I test this endpoint using a normal request, the worker behaves completely abnormally: it takes more than 5-6 minutes just to start the transcription, and then several more minutes to transcribe. The really weird part is that the same handler ran fine when I tested it in the worker itself over SSH. I have no idea how to debug this or what might be happening:
2024-02-04T03:05:20.295007097Z --- Starting Serverless Worker | Version 1.6.0 ---
2024-02-04T03:10:30.435312352Z {"requestId": "c81dbe2d-2000-47a8-9336-3c056b9576ca-u1", "message": "Started.", "level": "INFO"}
2024-02-04T03:10:30.596425577Z credentials.py :1123 2024-02-04 03:10:30,596 Found credentials in environment variables.
2024-02-04T03:11:12.226402320Z ic| type(audio_file): <class '_io.BytesIO'>
2024-02-04T03:17:26.405880274Z transcribe.py :263 2024-02-04 03:17:26,405 Processing audio with duration 11:56.266
2024-02-04T03:21:45.191995763Z transcribe.py :317 2024-02-04 03:21:45,191 Detected language 'en' with probability 1.00
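To make the delays easier to see, here is a small sketch that computes the seconds between consecutive log lines. It assumes every line starts with a UTC timestamp with nanosecond precision, as in the worker log above; `parse_ts`, `gaps`, and `LOG_LINES` are names made up for illustration, and the message text in the sample lines is shortened.

```python
from datetime import datetime

# Timestamps copied from the worker log above; message text is shortened.
LOG_LINES = [
    "2024-02-04T03:05:20.295007097Z Starting Serverless Worker",
    "2024-02-04T03:10:30.435312352Z Started.",
    "2024-02-04T03:10:30.596425577Z Found credentials",
    "2024-02-04T03:11:12.226402320Z type(audio_file)",
    "2024-02-04T03:17:26.405880274Z Processing audio",
    "2024-02-04T03:21:45.191995763Z Detected language",
]


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a log line."""
    stamp = line.split(" ", 1)[0].rstrip("Z")
    date_part, frac = stamp.split(".")
    # strptime's %f accepts at most 6 digits, so drop the nanosecond tail.
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def gaps(lines):
    """Return the seconds elapsed between each pair of consecutive lines."""
    times = [parse_ts(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for line, gap in zip(LOG_LINES[1:], gaps(LOG_LINES)):
        print(f"{gap:8.1f}s before: {line}")
```

For these lines the gaps come out to roughly 310 s before "Started.", 374 s between loading the audio file and the start of processing, and 259 s until the language is detected.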
These steps should take a matter of seconds, just like in the logs from the manual execution inside the worker:
Manual execution
--- Starting Serverless Worker | Version 1.6.0 ---
{"requestId": null, "message": "test_input set, using test_input as job input.", "level": "INFO"}
{"requestId": "local_test", "message": "Started.", "level": "INFO"}
credentials.py :1123 2024-02-04 03:15:50,178 Found credentials in environment variables.
ic| type(audio_file): <class '_io.BytesIO'>
transcribe.py :263 2024-02-04 03:15:58,890 Processing audio with duration 11:56.266
transcribe.py :317 2024-02-04 03:16:00,622 Detected language 'en' with probability 1.00
Weirdly enough, I have another endpoint that does only transcription, and it's also fast:
2024-02-04T03:28:55.812030213Z ic| 'Initializing transcribe with files'
2024-02-04T03:28:55.865688543Z credentials.py :1123 2024-02-04 03:28:55,865 Found credentials in environment variables.
2024-02-04T03:29:00.602901119Z ic| 'transcribe_wfiles'
2024-02-04T03:29:00.612098062Z ic| type(audio_file): <class '_io.BytesIO'>
2024-02-04T03:29:33.423009670Z transcribe.py :263 2024-02-04 03:29:33,422 Processing audio with duration 11:56.266
2024-02-04T03:30:02.189243458Z transcribe.py :317 2024-02-04 03:30:02,188 Detected language 'en' with probability 1.00
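One way to narrow this down might be to time each stage inside the handler and compare the numbers between the manual run and a normal request. The following is only a minimal sketch: `timed` is a hypothetical helper, not part of any library, and the stage name and `time.sleep` call are stand-ins for the real handler steps.

```python
import time
from contextlib import contextmanager


def _emit(message: str) -> None:
    # Flush immediately so the line reaches the worker log without buffering.
    print(message, flush=True)


@contextmanager
def timed(stage: str, sink=_emit):
    """Report how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        sink(f"[timing] {stage}: {elapsed:.2f}s")


if __name__ == "__main__":
    # Stand-in for a real handler stage such as loading the audio file.
    with timed("load_audio"):
        time.sleep(0.1)
```

Wrapping the download, model loading, transcription, and diarization steps separately would show which stage accounts for the extra minutes.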