#TensorRT-LLM setup

muted tulip
#

Has anyone been able to successfully install tensorrt_llm?

I'm trying with pip, but I'm running into MPI-related errors:
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found

I've tried a few templates (runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04; nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3) on A100 and on a 4090.
CUDA 12.2

jade sierra
#

Have you tried:

apt install libopenmpi-dev
muted tulip
#

Doesn't work unfortunately

#

Tried uninstalling and reinstalling as well, but that doesn't help

jade sierra
#

apt-get install libopenmpi-dev openmpi-bin

muted tulip
#

Yeah, tried them too. I've narrowed down the problem to building mpi4py which gets built from tensorrt_llm

jade sierra
#

are you running it in venv or normal?

muted tulip
#

Normally

#

Let me try in venv

jade sierra
#

mpicc --version

#

do you get output?

muted tulip
#

Same error:
root@afabf97a0d57:/workspace# mpicc --version
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
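
The path in that error points at an HPC-X build directory, which suggests the `mpicc` being picked up may not be the one installed by apt. A minimal sketch for checking this, assuming a broken wrapper exits with a non-zero status on `--version`; the helper names `find_all_on_path` and `wrapper_works` are made up for illustration:

```python
import os
import subprocess


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in matches:
                matches.append(candidate)
    return matches


def wrapper_works(executable):
    """True if `<executable> --version` runs and exits with status 0."""
    try:
        result = subprocess.run(
            [executable, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # The first entry is the one the shell (and a pip build) would use.
    for exe in find_all_on_path("mpicc"):
        status = "ok" if wrapper_works(exe) else "broken"
        print(f"{exe} -> {os.path.realpath(exe)}: {status}")
```

If a later entry reports `ok` while the first is `broken`, the mpi4py build can likely be pointed at the working one through the `MPICC` environment variable that mpi4py documents for pip installs.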

jade sierra
#

try with venv

#
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
muted tulip
#

Same error 😅

jade sierra
#

you will probably need to ask on their repo

muted tulip
#

Okay, thank you

jade sierra
#

@muted tulip I mean, RunPod does not change files in the Docker container

muted tulip
#

#🎤|general message

#

With reference to the new error here (reached this point thanks to @odd glen)

#

Can we increase the limit? I don't have permissions to do so...

odd glen
#

you should be able to stop openmpi from trying to increase it

#

idk why the variable I posted doesn't work for you

jade sierra
#

It's not possible, as containers are not privileged

muted tulip
#

@odd glen, did you do a
apt install libopenmpi-dev

as well, if you remember? I'm not sure if we should be doing that based on the GitHub link I shared above

#

But if I don't, then I get a different set of errors like:
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
/usr/bin/ld: cannot find -lvt-hyb: No such file or directory
/usr/bin/ld: cannot find -lvt.ompi: No such file or directory
_configtest.c:2:10: fatal error: mpi.h: No such file or directory
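
The `mpi.h: No such file or directory` line means the compiler could not find the MPI development headers. A rough sketch for checking whether they are present, assuming an Ubuntu 22.04 image; the directories in `ASSUMED_INCLUDE_DIRS` are guesses and the helper name is made up for illustration:

```python
import os

# Assumed search locations for an Ubuntu 22.04 image; adjust as needed.
ASSUMED_INCLUDE_DIRS = [
    "/usr/include",
    "/usr/lib/x86_64-linux-gnu/openmpi/include",
]


def dirs_containing(header, include_dirs):
    """Return the directories from `include_dirs` that contain `header`."""
    return [d for d in include_dirs if os.path.isfile(os.path.join(d, header))]


if __name__ == "__main__":
    hits = dirs_containing("mpi.h", ASSUMED_INCLUDE_DIRS)
    if hits:
        for directory in hits:
            print(f"mpi.h found in {directory}")
    else:
        print("mpi.h not found in the assumed locations")
```

An empty result is only a hint that the headers are missing, since they may live somewhere this list does not cover.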

odd glen
#

I ended up not having any time to mess more with tensorrt-llm

#

my original goal was to run tritonserver

muted tulip
#

Worked!

muted tulip
#

Thanks a lot!! I think the apt-get command along with the exports you shared together worked out for me
I'm on the runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 template. Will have to see if it works with others too

odd glen
#

if you don't want to run triton that should work just fine

muted tulip
#

Well. Triton is the goal

#

Will go through your post

#

😄

odd glen
#

then you should run it in the nvidia container image like I did there yeah

#

but you have to install trtllm the same way to get the tools to build the engine locally

#

I didn't get to the step of actually running triton

#

realized it would be more work than I have time for rn

#

I definitely want min-p sampling for example

#

it's probably not that hard to add it

#

except if I build trtllm myself the built executable doesn't work

#

world's least stable software

muted tulip
#

Does seem that way!

#

Thanks for helping out here 😄

tame bear
#

hi guys, is anyone using Torch-TensorRT?