Cosmos Reason 2 on full Jetson lineup, including Orin Nano 8GB | NVIDIA Omniverse | Page 1

oblique karma Feb 21, 2026, 11:43 AM

#

Hi guys,

We wanted to share that we got Cosmos-Reason2-2B quantized running on the full Jetson lineup, including Orin Nano 8GB. Our release includes memory and latency numbers, instructions, and practical adjustments we found necessary on these constrained devices.

You can check it out here:
https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16

Best of luck to everyone participating in the Cookoff, looking forward to seeing what you build!

oblique karma Feb 21, 2026, 12:19 PM

#

Quickstart (vLLM Jetson container):

-gpu-memory-utilization and --max-num-seqs should be adapted to system specifications (i.e., available RAM).

docker run --rm -it
--network host
--shm-size=8g
--ulimit memlock=-1
--ulimit stack=67108864
--runtime=nvidia
--name=vllm-serve
ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
vllm serve "embedl/Cosmos-Reason2-2B-W4A16"
--max-model-len 8192
--gpu-memory-utilization 0.75
--max-num-seqs 2

fierce island Feb 21, 2026, 2:04 PM

#

oblique karma Quickstart (vLLM Jetson container): -gpu-memory-utilization and --max-num-seqs ...

Is there an equivalent developed for x86 systems with upto 8GB VRAM?

oblique karma Feb 21, 2026, 3:39 PM

#

fierce island Is there an equivalent developed for x86 systems with upto 8GB VRAM?

Yes! if you have vllm installed this same model should work out-of-the-box also for x86 systems!

vllm serve embedl/Cosmos-Reason2-2B-W4A16
--max-model-len 8192
--gpu-memory-utilization 0.75
--max-num-seqs 2

fierce island Feb 21, 2026, 5:14 PM

#

Sounds good. Thanks👍

#Cosmos Reason 2 on full Jetson lineup, including Orin Nano 8GB