#Gemma 3n

55 messages · Page 1 of 1 (latest)

kind creek
#

i wonder how fast this will run

sacred wave
kind creek
sacred wave
#

this is like a preview deployment

kind creek
#

yeah but its still really slow

sacred wave
#

once this is properly on HF, i'm sure someone will host for a billion tps

kind creek
#

especially for a 4b model

sacred wave
#

(sambanova etc)

meager spoke
#

This is a quite unique architecture I think; I'd imagine it will take a lot of work to get into common inference frameworks

#

Gated model which requires manual approval

wild leaf
#

If you have an Android phone you can download and run the smaller of the two models on your phone right now. The app is pretty barebones - it doesn't even save chat history - but it works, and supports images. I'm getting around 12 tok/sec on a 2 year old Oneplus 11. https://github.com/google-ai-edge/gallery/releases

GitHub

A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally. - google-ai-edge/gallery

#

Hopefully a more feature-rich app will come soon that will add the audio and vision modalities. That would be kickass

#

Just install the APK, and open the app, and it will list the models you can download and install. It will ask for a Huggingface login and ask you to accept the terms and conditions, etc, and then will automatically download the model

#

I want whatever app Google is using in this demo video. It looks amazing. I hope they release it https://www.youtube.com/watch?v=eJFJRyXEHZ0

We’re excited to announce Gemma 3n – a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio and video, empowering developers to build live, interactive applications that understand and interact based on th...

â–¶ Play video
#

Real-time streaming video/audio chat, all on a smartphone

meager spoke
#

On Galaxy S24, stock firmware, the E2B crashes

#

ok it turns out I needed to get other apps out of memory

#

fortunately you can do that in android settings

#

Wow 15 tok/s

#

(GPU)

warm hamlet
#

I have a galaxy S23

#

It's the Google AI edge thing to run inference @meager spoke ?

meager spoke
#

Hmm?

#

It's a demo app for MediaPipe, which is a wrapper around tflite

marble bison
#

How exactly does gemma 3n differ from gemma 3? Is gemma 3 higher quality at the expense of speed?

stable plover
#

3 is just the normal one that didn't have specific architecture and goals for edge devices

marble bison
#

Gotcha, is 3n specifically optimized for android or mobile in general?

wild leaf
#

Been testing the vision ability of these today. Very impressive considering they're running entirely on my phone. A year ago I wouldn't have believed it was possible to have a vision model this good on a phone. Due to their size they're not very smart though, very weak on world knowledge, and hallucinate a ton when asked direct factual questions.

#

That was all hallucinated

#

This is the Sunsphere, lol

#

The Gemma models do have a strong tendency to hallucinate when they don't have answers, that's a problem going back to the Bard days

meager spoke
#

Its also better than the Gemma 3 models in its size class

wild leaf
#

In any case it'll be exciting to see the full potential of this model, with its audio and video support as well

marble bison
sour dove
#

i think 3n is a great model for mobile

#

i tried the 2b model on my mobile

#

i get a couple tokens per second

haughty marlin
#

anyone found a way to inference locally on a pc yet? mediapipe in a browser runs out of memory 😦

meager spoke
#

Maybe an Android emulator, VM, or container (Waydroid) would work

slow olive
#

Finally I can become an inference provider. I've got plenty of Android phones that are made for inferencing 3n<\s>

drowsy dagger
#

Google has recently added support for Gemma 3n in transformers, so no longer limited to LiteRT
https://github.com/huggingface/transformers/pull/39059
https://huggingface.co/google/gemma-3n-E2B-it
https://huggingface.co/google/gemma-3n-E4B-it

GitHub

initial commit of Gemma 3n scaffold

Fixing param pass through on Gemm3p5RMSNorm

Adds Einsum layer to Gemma 3n

Updating EinsumLayer API

Undoing erroneous force push

Reverting RMSNorm to wi...

#

Fun thing from the squashed commit messages, Gemma 3n used to be called Gemma 3.5 before it was renamed to Gemma 3 Nano (3n)

haughty geode
#

I get Developer instruction is not enabled for models/gemma-3n-e4b-it. Is there a system vs developer role confusion on openrouter side? Or if gemma-3n actually does not accept a system prompt, is there a way to know from model inspection, from the /models api endpoint (supported_parameters?)

nimble garnet
#

Hi

lapis hound
#

Tested Gemma 3n E4B it (local, fp16):

  • small multimodal local model, though I tested text only (due to lacking llama.cpp implementation)
  • capability falls between 4B & 9B Gemma models
  • I saw no hard refusals, though disclaimers and nagging that is present in whole family remains
  • It's a nice fast, small multipurpose model that can be used for easy tasks in anything except code
    Not exactly required for my use cases, but a nice alternative small model. YMMV.