#AI21 Jamba MoE: 256K Context, 3X the speed.


slim dove
#

This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.

AI21's blog
Weights, HF
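
To put the numbers in the model card excerpt in perspective, here is a rough sketch of the arithmetic. It assumes that "active parameters" means the parameters used per token, as is usual for MoE models; the constant and function names are made up for illustration, and only the four figures come from the excerpt.

```python
# Figures quoted in the model card excerpt (billions of parameters, thousands of tokens).
ACTIVE_PARAMS_B = 12        # parameters active per token (assumed meaning of "active")
TOTAL_PARAMS_B = 52         # parameters across all experts
MAX_CONTEXT_K = 256         # supported context length
SINGLE_GPU_CONTEXT_K = 140  # tokens that fit on a single 80GB GPU


def ratio(part: float, whole: float) -> float:
    """Return part / whole as a plain fraction."""
    return part / whole


# Share of the model's weights that take part in processing each token.
active_fraction = ratio(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)

# Share of the maximum context that fits on one 80GB GPU.
single_gpu_fraction = ratio(SINGLE_GPU_CONTEXT_K, MAX_CONTEXT_K)

print(f"Active parameters per token: {active_fraction:.1%} of the total")
print(f"Context on one 80GB GPU: {single_gpu_fraction:.1%} of the 256K maximum")
```

So roughly a quarter of the weights are used for any given token, and a single 80GB GPU covers a bit more than half of the full context window.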

flint perch
#

is jamba based on mamba?

#

(duh, just opened the blog article afterwards and yeah)

slim dove
#

so yes, it's mamba

mild drift
#

Any idea if/when we will get this model to play with on OpenRouter? Looks amazing

cosmic oxide
#

looking into it

#

looks like only a base version is available (on HF) - no instruct yet, but it's coming

mild drift
#

are there any base versions on OpenRouter at all? I'd love to try them out for certain use cases
