Gemma 3n | OpenRouter | Page 1

sacred wave May 20, 2025, 9:35 PM

#

https://developers.googleblog.com/en/introducing-gemma-3n/

Announcing Gemma 3n preview: powerful, efficient, mobile-first AI- ...

#

will bring this live soon

kind creek May 20, 2025, 9:42 PM

#

i wonder how fast this will run

sacred wave May 20, 2025, 9:49 PM

#

https://openrouter.ai/google/gemma-3n-e4b-it:free

Gemma 3n 4B (free) - API, Providers, Stats

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Run Gemma 3n 4B (free) with API

kind creek May 20, 2025, 9:49 PM

#

kind creek i wonder how fast this will run

answer is not very fast (15t/s)

sacred wave May 20, 2025, 9:50 PM

#

this is like a preview deployment

kind creek May 20, 2025, 9:50 PM

#

yeah but its still really slow

sacred wave May 20, 2025, 9:50 PM

#

once this is properly on HF, i'm sure someone will host for a billion tps

kind creek May 20, 2025, 9:50 PM

#

especially for a 4b model

sacred wave May 20, 2025, 9:50 PM

#

(sambanova etc)

meager spoke May 20, 2025, 9:58 PM

#

This is a quite unique architecture I think; I'd imagine it will take a lot of work to get into common inference frameworks

#

Android app to inference it: https://news.ycombinator.com/item?id=44045265

nolist_policy

You can try it on Android right now:Download the Edge Gallery apk from github: https://github.com/google-ai-edge/gallery/releases/tag/1.0.0Download one of the .task files from huggingface: https://huggingface.co/collections/google/gemma-3n-preview-6...Import the .task file in Edge Gallery with the + bottom right.You can take pictures right from ...

#

https://huggingface.co/collections/google/gemma-3n-preview-682ca41097a31e5ac804d57b

Gemma 3n Preview - a google Collection

#

Gated model which requires manual approval

#

https://arxiv.org/pdf/2310.07707 describes the architecture partially

wild leaf May 20, 2025, 10:17 PM

#

If you have an Android phone you can download and run the smaller of the two models on your phone right now. The app is pretty barebones - it doesn't even save chat history - but it works, and supports images. I'm getting around 12 tok/sec on a 2 year old Oneplus 11. https://github.com/google-ai-edge/gallery/releases

GitHub

Releases · google-ai-edge/gallery

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally. - google-ai-edge/gallery

#

Hopefully a more feature-rich app will come soon that will add the audio and vision modalities. That would be kickass

#

Just install the APK, and open the app, and it will list the models you can download and install. It will ask for a Huggingface login and ask you to accept the terms and conditions, etc, and then will automatically download the model

#

I want whatever app Google is using in this demo video. It looks amazing. I hope they release it https://www.youtube.com/watch?v=eJFJRyXEHZ0

YouTube

Google for Developers

Announcing Gemma 3n Preview: Powerful, Efficient, Mobile-First AI

We’re excited to announce Gemma 3n – a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio and video, empowering developers to build live, interactive applications that understand and interact based on th...

▶ Play video

#

Real-time streaming video/audio chat, all on a smartphone

meager spoke May 20, 2025, 11:19 PM

#

On Galaxy S24, stock firmware, the E2B crashes

#

ok it turns out I needed to get other apps out of memory

#

fortunately you can do that in android settings

#

#

#

Wow 15 tok/s

#

(GPU)

warm hamlet May 21, 2025, 9:27 AM

#

I have a galaxy S23

#

It's the Google AI edge thing to run inference @meager spoke ?

meager spoke May 21, 2025, 2:59 PM

#

Hmm?

#

https://github.com/google-ai-edge/gallery is the app to run inference on Android

GitHub

GitHub - google-ai-edge/gallery: A gallery that showcases on-device...

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally. - google-ai-edge/gallery

#

It's a demo app for MediaPipe, which is a wrapper around tflite

marble bison May 22, 2025, 12:52 AM

#

How exactly does gemma 3n differ from gemma 3? Is gemma 3 higher quality at the expense of speed?

stable plover May 22, 2025, 1:35 AM

#

marble bison How exactly does gemma 3n differ from gemma 3? Is gemma 3 higher quality at the ...

3n is for a edge device that run android

#

3 is just the normal one that didn't have specific architecture and goals for edge devices

marble bison May 22, 2025, 1:45 AM

#

Gotcha, is 3n specifically optimized for android or mobile in general?

wild leaf May 22, 2025, 3:41 AM

#

Been testing the vision ability of these today. Very impressive considering they're running entirely on my phone. A year ago I wouldn't have believed it was possible to have a vision model this good on a phone. Due to their size they're not very smart though, very weak on world knowledge, and hallucinate a ton when asked direct factual questions.

#

That was all hallucinated

Screenshot_2025-05-21-23-43-13-92_d16c12617cf2e25ee29ca0060bc5ddcc.jpg

#

This is the Sunsphere, lol

231_3_22505_jpeg_aad47c57-04cd-4ffb-8a5e-af8a414a7639.png

#

The Gemma models do have a strong tendency to hallucinate when they don't have answers, that's a problem going back to the Bard days

meager spoke May 22, 2025, 3:58 AM

#

marble bison Gotcha, is 3n specifically optimized for android or mobile in general?

3n has the "matformers" architecture, where you can take a subset of the weights to make a working smaller model

#

Its also better than the Gemma 3 models in its size class

wild leaf May 22, 2025, 3:59 AM

#

In any case it'll be exciting to see the full potential of this model, with its audio and video support as well

marble bison May 22, 2025, 4:13 AM

#

meager spoke 3n has the "matformers" architecture, where you can take a subset of the weights...

Ah interesting, I'll have to do some research on matformers

sour dove May 26, 2025, 10:46 AM

#

i think 3n is a great model for mobile

#

i tried the 2b model on my mobile

#

i get a couple tokens per second

haughty marlin May 26, 2025, 11:57 AM

#

anyone found a way to inference locally on a pc yet? mediapipe in a browser runs out of memory 😦

meager spoke Jun 13, 2025, 2:56 AM

#

Maybe an Android emulator, VM, or container (Waydroid) would work

slow olive Jun 25, 2025, 10:13 AM

#

Finally I can become an inference provider. I've got plenty of Android phones that are made for inferencing 3n<\s>

drowsy dagger Jun 26, 2025, 4:35 PM

#

Google has recently added support for Gemma 3n in transformers, so no longer limited to LiteRT
https://github.com/huggingface/transformers/pull/39059
https://huggingface.co/google/gemma-3n-E2B-it
https://huggingface.co/google/gemma-3n-E4B-it

GitHub

Gemma 3n by RyanMullins · Pull Request #39059 · huggingface/trans...

initial commit of Gemma 3n scaffold

Fixing param pass through on Gemm3p5RMSNorm

Adds Einsum layer to Gemma 3n

Updating EinsumLayer API

Undoing erroneous force push

Reverting RMSNorm to wi...

google/gemma-3n-E2B-it · Hugging Face

google/gemma-3n-E4B-it · Hugging Face

#

Fun thing from the squashed commit messages, Gemma 3n used to be called Gemma 3.5 before it was renamed to Gemma 3 Nano (3n)

haughty geode Jun 28, 2025, 1:04 PM

#

I get Developer instruction is not enabled for models/gemma-3n-e4b-it. Is there a system vs developer role confusion on openrouter side? Or if gemma-3n actually does not accept a system prompt, is there a way to know from model inspection, from the /models api endpoint (supported_parameters?)

nimble garnet Jun 28, 2025, 1:11 PM

#

Hi

lapis hound Jun 28, 2025, 1:43 PM

#

Tested Gemma 3n E4B it (local, fp16):

small multimodal local model, though I tested text only (due to lacking llama.cpp implementation)
capability falls between 4B & 9B Gemma models
I saw no hard refusals, though disclaimers and nagging that is present in whole family remains
It's a nice fast, small multipurpose model that can be used for easy tasks in anything except code
Not exactly required for my use cases, but a nice alternative small model. YMMV.

#Gemma 3n