AI21 Jamba MoE: 256K Context, 3X the speed. | OpenRouter

slim dove Mar 29, 2024, 12:18 AM

This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.

AI21's blog
Weights, HF
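
To put the model card numbers in perspective, here is a rough back-of-envelope sketch. It only uses the figures quoted above; reading "256K" and "140K" as 256,000 and 140,000 tokens is an assumption, and the function names are made up for illustration.

```python
# Figures quoted from the model card text above.
ACTIVE_PARAMS_B = 12            # parameters active per token, in billions
TOTAL_PARAMS_B = 52             # parameters across all experts, in billions
MAX_CONTEXT_TOKENS = 256_000    # assumption: "256K" read as 256,000 tokens
SINGLE_GPU_TOKENS = 140_000     # assumption: "140K" read as 140,000 tokens


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are active for a single token."""
    return active_b / total_b


def context_share(fit_tokens: int, max_tokens: int) -> float:
    """Share of the supported context that fits on one 80GB GPU."""
    return fit_tokens / max_tokens


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    share = context_share(SINGLE_GPU_TOKENS, MAX_CONTEXT_TOKENS)
    print(f"active parameters per token: {frac:.1%}")    # 23.1%
    print(f"context that fits on one GPU: {share:.1%}")  # 54.7%
```

So, going by the card's own numbers, roughly a quarter of the weights do work on any given token, and a single 80GB GPU covers a bit over half of the advertised context window.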

flint perch Mar 29, 2024, 12:20 AM

is jamba based on mamba?

(duh, just opened the blog article afterwards and yeah)

slim dove Mar 29, 2024, 12:21 AM

flint perch is jamba based on mamba?

Maenod probably a mix of AI21's "Jurassic" models and the name mamba.

so yes, it's mamba

mild drift Mar 29, 2024, 1:45 AM

Any idea if/when we will get this model to play with on OpenRouter? Looks amazing

cosmic oxide Mar 29, 2024, 4:18 AM

looking into it

looks like only a base version is available (on HF) - no instruct yet, but it's coming

mild drift Mar 29, 2024, 7:20 PM

are there any base versions on OpenRouter at all? I'd love to try them out for certain use cases

winter imp Mar 29, 2024, 7:36 PM

mild drift are there any base versions on OpenRouter at all? I'd love to try them out for c...

If you mean base versions of any model - OR has base Mixtral and Yi, for example.

glad gazelle May 2, 2024, 4:13 PM

Sorry to resurrect an old thread, but looks like AI21 is now publicly testing this: https://www.ai21.com/blog/announcing-jamba-instruct

Built for the Enterprise: Introducing AI21’s Jamba-Instruct Model

An instruction-tuned version of our hybrid SSM-Transformer Jamba model, Jamba-Instruct is built for reliable commercial use, with best-in-class quality and performance.
