#Gemma 3n
55 messages · Page 1 of 1 (latest)
i wonder how fast this will run
Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks such as text generation, speech recognition, translation, and image analysis. Run Gemma 3n 4B (free) with API
answer is not very fast (15t/s)
this is like a preview deployment
yeah but its still really slow
once this is properly on HF, i'm sure someone will host for a billion tps
especially for a 4b model
(sambanova etc)
This is a quite unique architecture I think; I'd imagine it will take a lot of work to get into common inference frameworks
Android app to inference it: https://news.ycombinator.com/item?id=44045265
nolist_policy
You can try it on Android right now:Download the Edge Gallery apk from github: https://github.com/google-ai-edge/gallery/releases/tag/1.0.0Download one of the .task files from huggingface: https://huggingface.co/collections/google/gemma-3n-preview-6...Import the .task file in Edge Gallery with the + bottom right.You can take pictures right from ...
Gated model which requires manual approval
https://arxiv.org/pdf/2310.07707 describes the architecture partially
If you have an Android phone you can download and run the smaller of the two models on your phone right now. The app is pretty barebones - it doesn't even save chat history - but it works, and supports images. I'm getting around 12 tok/sec on a 2 year old Oneplus 11. https://github.com/google-ai-edge/gallery/releases
Hopefully a more feature-rich app will come soon that will add the audio and vision modalities. That would be kickass
Just install the APK, and open the app, and it will list the models you can download and install. It will ask for a Huggingface login and ask you to accept the terms and conditions, etc, and then will automatically download the model
I want whatever app Google is using in this demo video. It looks amazing. I hope they release it https://www.youtube.com/watch?v=eJFJRyXEHZ0
We’re excited to announce Gemma 3n – a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio and video, empowering developers to build live, interactive applications that understand and interact based on th...
Real-time streaming video/audio chat, all on a smartphone
On Galaxy S24, stock firmware, the E2B crashes
ok it turns out I needed to get other apps out of memory
fortunately you can do that in android settings
Wow 15 tok/s
(GPU)
Hmm?
https://github.com/google-ai-edge/gallery is the app to run inference on Android
It's a demo app for MediaPipe, which is a wrapper around tflite
How exactly does gemma 3n differ from gemma 3? Is gemma 3 higher quality at the expense of speed?
3n is for a edge device that run android
3 is just the normal one that didn't have specific architecture and goals for edge devices
Gotcha, is 3n specifically optimized for android or mobile in general?
Been testing the vision ability of these today. Very impressive considering they're running entirely on my phone. A year ago I wouldn't have believed it was possible to have a vision model this good on a phone. Due to their size they're not very smart though, very weak on world knowledge, and hallucinate a ton when asked direct factual questions.
That was all hallucinated
This is the Sunsphere, lol
The Gemma models do have a strong tendency to hallucinate when they don't have answers, that's a problem going back to the Bard days
3n has the "matformers" architecture, where you can take a subset of the weights to make a working smaller model
Its also better than the Gemma 3 models in its size class
In any case it'll be exciting to see the full potential of this model, with its audio and video support as well
Ah interesting, I'll have to do some research on matformers
i think 3n is a great model for mobile
i tried the 2b model on my mobile
i get a couple tokens per second
anyone found a way to inference locally on a pc yet? mediapipe in a browser runs out of memory 😦
Maybe an Android emulator, VM, or container (Waydroid) would work
Finally I can become an inference provider. I've got plenty of Android phones that are made for inferencing 3n<\s>
Google has recently added support for Gemma 3n in transformers, so no longer limited to LiteRT
https://github.com/huggingface/transformers/pull/39059
https://huggingface.co/google/gemma-3n-E2B-it
https://huggingface.co/google/gemma-3n-E4B-it
Fun thing from the squashed commit messages, Gemma 3n used to be called Gemma 3.5 before it was renamed to Gemma 3 Nano (3n)
I get Developer instruction is not enabled for models/gemma-3n-e4b-it. Is there a system vs developer role confusion on openrouter side? Or if gemma-3n actually does not accept a system prompt, is there a way to know from model inspection, from the /models api endpoint (supported_parameters?)
Hi
Tested Gemma 3n E4B it (local, fp16):
- small multimodal local model, though I tested text only (due to lacking llama.cpp implementation)
- capability falls between 4B & 9B Gemma models
- I saw no hard refusals, though disclaimers and nagging that is present in whole family remains
- It's a nice fast, small multipurpose model that can be used for easy tasks in anything except code
Not exactly required for my use cases, but a nice alternative small model. YMMV.