#Cosmos Reason 2 on full Jetson lineup, including Orin Nano 8GB

1 messages · Page 1 of 1 (latest)

oblique karma
#

Hi guys,

We wanted to share that we got Cosmos-Reason2-2B quantized running on the full Jetson lineup, including Orin Nano 8GB. Our release includes memory and latency numbers, instructions, and practical adjustments we found necessary on these constrained devices.

You can check it out here:
https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16

Best of luck to everyone participating in the Cookoff, looking forward to seeing what you build!

oblique karma
#

Quickstart (vLLM Jetson container):

-gpu-memory-utilization and --max-num-seqs should be adapted to system specifications (i.e., available RAM).

docker run --rm -it
--network host
--shm-size=8g
--ulimit memlock=-1
--ulimit stack=67108864
--runtime=nvidia
--name=vllm-serve
ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
vllm serve "embedl/Cosmos-Reason2-2B-W4A16"
--max-model-len 8192
--gpu-memory-utilization 0.75
--max-num-seqs 2

fierce island
oblique karma
fierce island
#

Sounds good. Thanks👍