#text generation inference docker image on serverless?


quick canyon
#

Hi, I have created a template using the TGI Docker image. In the Docker command I entered --model-id <llama-3-8b> (the HF repo name) and --port 8080, chose a 24 GB GPU, and ran a serverless instance. But I am not able to connect to this worker: when I try to ask a question, the question is not sent to the worker. However, when I SSH into the worker and send this curl request:

curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'

it actually worked.

But how do I connect to this serverless endpoint from outside, probably from my codebase, and run inference on the LLM through TGI?

inland trail
#

Hmm, use an HTTP client to proxy your request from RunPod's /run.

#

So when a /run job arrives, you send a request to your localhost (127.0.0.1:xxxx/xxx) from inside the worker.
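
The proxy idea above could be sketched roughly like this. It is only a minimal sketch under several assumptions: that TGI is already listening on 127.0.0.1:8080 inside the same container, that the non-streaming /generate route is used, and that the RunPod Python SDK is installed in the image. The names `TGI_URL`, `call_tgi` and `main` are made up for illustration.

```python
import json
import os
import urllib.request

# Assumption: TGI is already running in this container on port 8080.
TGI_URL = os.environ.get("TGI_URL", "http://127.0.0.1:8080/generate")


def call_tgi(payload, url=None, timeout=300):
    """POST a JSON payload to the local TGI server and return the parsed reply."""
    request = urllib.request.Request(
        url or TGI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def handler(job):
    """Called once per job; job["input"] is the "input" object sent to /run."""
    payload = job["input"]
    # Printed to stdout so the line shows up in the worker logs.
    print(f"forwarding job {job.get('id')} to {TGI_URL}", flush=True)
    return call_tgi(payload)


def main():
    # Assumption: the RunPod Python SDK is available in the worker image.
    import runpod

    runpod.serverless.start({"handler": handler})
```

Calling `main()` at the end of the worker script would start the job loop. With this shape, the body sent to /run would look like {"input": {"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}}}, and whatever `handler` returns becomes the job output.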

quick canyon
#

I have actually tried sending a request using requests. My endpoint is something like
https://api.runpod.ai/v2/{endpoint_id}/run. A job is created, but I am not getting a response, and in the logs I see no request being sent.
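
Since /run is asynchronous, it only returns a job id, and the result has to be fetched separately from /status/{job_id}. A rough client-side sketch follows. It uses the standard library instead of requests to stay dependency-free, and the function names, the polling interval and the exact set of terminal status strings are assumptions to check against the RunPod docs.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.runpod.ai/v2"


def _request(method, url, api_key, body=None):
    """Send one authenticated JSON request and return the parsed reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_job(endpoint_id, api_key, payload, base_url=BASE_URL):
    """Queue a job on /run; the payload is wrapped in the "input" key."""
    reply = _request("POST", f"{base_url}/{endpoint_id}/run", api_key, {"input": payload})
    return reply["id"]


def wait_for_result(endpoint_id, api_key, job_id, base_url=BASE_URL,
                    poll_seconds=2.0, max_polls=150):
    """Poll /status/{job_id} until the job finishes, then return its output."""
    for _ in range(max_polls):
        reply = _request("GET", f"{base_url}/{endpoint_id}/status/{job_id}", api_key)
        status = reply.get("status")
        if status == "COMPLETED":
            return reply.get("output")
        # Assumption: these are the terminal failure states.
        if status in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"job {job_id} ended with status {status}: {reply}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Usage would be along the lines of `job_id = submit_job(endpoint_id, api_key, {"inputs": "What is Deep Learning?"})` followed by `wait_for_result(endpoint_id, api_key, job_id)`. If the job stays queued forever, the worker side is probably not picking up or answering jobs.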

inland trail
#

Did you log the request?

#

RunPod's logger doesn't magically report or log everything that's happening; I think it reads from stdout and stderr.
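
If the logs really are collected from stdout and stderr (this is the assumption stated above, not something confirmed here), then logging each incoming job explicitly from the worker code is one way to see whether requests arrive at all. A small sketch with hypothetical helper names `make_logger` and `log_job`:

```python
import logging
import sys


def make_logger(name="worker", stream=None):
    """Build a logger that writes to stdout unless another stream is given."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace old handlers to avoid duplicate lines
    logger.propagate = False
    return logger


def log_job(logger, job):
    """Log the job id and the keys of its input, not the full prompt."""
    logger.info(
        "received job id=%s input_keys=%s",
        job.get("id"),
        sorted(job.get("input", {})),
    )
```

Calling `log_job(make_logger(), job)` as the first line of the job handler would show one line per request. If nothing appears, the request never reached the handler.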