# AI VTubers on Mobile Platforms


carmine lance
#

Let's try to keep this Neuro-related. I don't want this to turn negative and get removed by the mods

This may be less of a question or suggestion and more of an actual discussion. This topic was originally in #neurotic-neurons, but since that topic has died down, I think it's fitting to move it here.

A few edits: it must be native to mobile, but AR devices like the Vision Pro are in a grey area as mobile devices for a few reasons

#

Here's some things we know so far:

VTuber models can already be used publicly on Android/iOS via the VTube Studio app

ARM cannot be directly compared to x86 or x64 (x86-64), because the architectures work very differently

Phones do not have dedicated GPU VRAM, but the GPU can use system memory (both physical and virtual) for graphics (for example, via OpenGL)


Peak CPU clock speed does not apply to every core (for example, newer chips may have 2 high-performance cores while the rest are low-power cores)

Android phones usually top out at 16-20GB of total RAM: 8-12GB of physical RAM plus up to 8GB of virtual RAM allocated from storage
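To check the point about peak CPU speed on an actual device, the per-core maximum clock can be read from the Linux cpufreq sysfs interface, which Android kernels typically expose (for example, from Termux). A minimal sketch, assuming `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq` is readable on your phone; the helper name is hypothetical:

```python
import os
from collections import defaultdict


def group_cores_by_max_freq(cpu_root="/sys/devices/system/cpu"):
    """Return {max_freq_khz: [core ids]}, fastest cluster first.

    Hypothetical helper that reads the standard Linux cpufreq sysfs files,
    which Android kernels usually expose as well.
    """
    # Core directories are named cpu0, cpu1, ...; skip cpufreq/, cpuidle/, etc.
    core_ids = sorted(
        int(name[3:])
        for name in os.listdir(cpu_root)
        if name.startswith("cpu") and name[3:].isdigit()
    )
    groups = defaultdict(list)
    for cid in core_ids:
        path = os.path.join(cpu_root, f"cpu{cid}", "cpufreq", "cpuinfo_max_freq")
        try:
            with open(path) as f:
                khz = int(f.read().strip())
        except (OSError, ValueError):
            continue  # core offline or no cpufreq entry for it
        groups[khz].append(cid)
    # Highest maximum frequency (the "prime"/performance cluster) first
    return {khz: groups[khz] for khz in sorted(groups, reverse=True)}
```

The first group in the result is the fastest cluster. Its size is one reasonable starting point for an inference thread count, though that is a heuristic, not a rule.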

silver dawn
#

honestly we'll probably have to wait for better dedicated inference hardware on mobile

#

and especially better standardized hardware

#

the only phones I know of with decent accelerators are modern iPhones and the Pixel

#

and the Pixel is pretty niche

#

tbh Apple has been making incredible future-facing plays with its hardware for the past few years

#

like, we already know that all iPhones will run a weak transformer for autocomplete, which is pretty impressive

#

but if you've ever tried CPU inference for language models, yeah, it doesn't work at all; you need dedicated inference hardware

carmine lance
#

Even then, Pixels don't have high enough overall performance, so I doubt that would work. Apple, on the other hand, is too closed (it's less customizable even with a jailbreak than Android is with root)

silver dawn
#

eh, if they can develop good ML APIs like they did with Metal, ARKit, etc., it'd be great

#

iPhones are surprisingly prevalent in medical and VFX work just because they're the cheapest devices with a quality LiDAR ecosystem

#

in the future they might be the cheapest device with a quality AI accelerator API

#

simply put, iPhone sales top the charts and Apple benefits a lot from volume and integration, whether you see it or not

#

there's a reason the phones have some of the smallest batteries in the industry, yet still have some of the best battery life

#

it allows Apple to spend less on manufacturing, deliver more exotic hardware, and charge a higher profit margin

carmine lance
#

A little reminder to keep your messages fairly short, or they may be deleted by the automod

soft mural
#

You can easily run an AI VTuber on a server and just stream the output to your phone. But a fully functional AI VTuber that works offline just isn't very feasible at the moment. Not that it can't be, though: Apple is releasing the Vision Pro, which contains an M2 chip capable of running LLMs effectively. https://github.com/ggerganov/llama.cpp


carmine lance
# soft mural You can easily run an aivtuber on a server and just stream the information to yo...

Oh right, I forgot to mention that it has to be fully native to mobile.

The Vision Pro COULD count as a mobile device, but it's in a big grey area, especially because of its requirements (for example, the battery pack you have to keep in your pocket)

On the "easily run an AI VTuber" point, you don't even need servers. Just make an AI VTuber, run it on a relatively mid-range PC, then access it through something like AnyDesk, but that's pretty much cheating (and it isn't the point I was making)

soft mural
#

Any llm currently small enough to run on a phone won't make very coherent replies (I've tried some very small models) and they'll also just be very slow. Like up to 10 minutes per response.

#

That could change, and might soon as companies introduce specialized AI chips into mobile devices to act as assistants.

carmine lance
#

Maybe in the future, but that will probably have to wait

#

Or, now that I think about it, it may not become common on phones, especially because manufacturers want them to stay thin and not get too big

soft mural
#

There will be a big push to integrate AI into everything, just like everything needs to be IoT. That requires very efficient hardware, and a possible solution may be analog chips.

silver dawn
#

rn you can comfortably run a 13 billion parameter, 4-bit quantized model with 10 GB of memory for inference

#

if you do some finagling you can get them running on 8 GB
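The memory math behind the 13B figure is simple: the weights take parameters × bits per weight ÷ 8 bytes. A rough sketch; the 1.5 GB overhead for the KV cache, activations and runtime is an assumed placeholder, and real 4-bit formats such as llama.cpp's Q4_0 spend about 4.5 bits per weight once the per-block scales are included:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory taken by the weights alone, in decimal GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_memory(n_params: float, bits_per_weight: float,
                   budget_gb: float, overhead_gb: float = 1.5) -> bool:
    """Rough check: weights plus an assumed fixed overhead (KV cache,
    activations, runtime) must fit in the memory budget.
    The 1.5 GB default is a placeholder, not a measured value."""
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb <= budget_gb
```

Under these assumptions a 13B model at 4.5 bits per weight needs about 7.3 GB for its weights, so `fits_in_memory(13e9, 4.5, 10)` is `True` and `fits_in_memory(13e9, 4.5, 8)` is `False`, which is consistent with needing some finagling to squeeze it into 8 GB.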

#

research models like Phi-1 have 1.3 billion parameters and demonstrate impressive performance

#

though it breaks down outside of textbook-style material or when prompts aren't well spelled

#

but it's a good example of targeted training at very small models

#

it shows that you don't necessarily need large models to accomplish niche tasks

#

which i think can be applicable to a 9 IQ AI VTuber

#

so in short, I think local 9 IQ VTubers are closer than we think; a lot of LLM innovation will be in de-generalizing and shrinking models for the edge

#

and it would not be surprising if those models are able to run on "conventional" inference accelerators

lapis lintel
#

Looks like it's by Microsoft, not Google

#

That really proves how important the dataset is

#

Did they really only use textbooks?

silver dawn
# lapis lintel Did they really only use textbooks?

yeah, so it involves generating synthetic examples (in this case with GPT-3.5) that match the style of textbooks, filtering out sludgy "useless" training examples using another network, and then training Phi-1

#

the reason being that there just... isn't enough textbook data to work with

#

so in a way, we're "distilling" a large foundational model's understanding of the textbook material into a smaller NN

#

so like they don't take the entirety of Stack Overflow Qs and As, they only take the ones which have meaningful... meaning
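The filtering step described above can be sketched as scoring every candidate sample and keeping only those above a threshold. In the Phi-1 work the scorer was a learned classifier; the `toy_educational_score` heuristic below is a hypothetical stand-in for illustration, not the paper's method:

```python
from typing import Callable, Iterable, List


def filter_by_quality(samples: Iterable[str],
                      score: Callable[[str], float],
                      threshold: float = 0.5) -> List[str]:
    """Keep only the samples whose quality score meets the threshold."""
    return [s for s in samples if score(s) >= threshold]


def toy_educational_score(text: str) -> float:
    """Hypothetical stand-in for a learned quality classifier:
    rewards samples that contain a docstring or comment and
    samples that are not trivially short."""
    score = 0.0
    if '"""' in text or "#" in text:
        score += 0.6
    if len(text.split()) >= 8:
        score += 0.4
    return score
```

Swapping `toy_educational_score` for a real model's prediction is the whole idea: the pipeline stays the same, only the quality judgment gets better.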

past kindle
#

yes, Alpaca can run on Android using Termux, but it's so slow that your VTuber will have very delayed responses:
https://www.youtube.com/watch?v=NHJstwOyLmc
https://github.com/rupeshs/alpaca.cpp#android

Google PaLM can run better on TensorFlow Lite:
https://codelabs.developers.google.com/kerasnlp-tflite#0

➡ Detailed setup process: https://ivonblog.com/posts/alpaca-cpp-termux-android/

lapis lintel
#

It looks like you still need to request access but they emailed the weights to me straight away lol

sacred jolt
#

Am I missing something, or why don't you guys consider using a gaming laptop (possibly an old one) as the hardware?
Sure, it's bigger than a phone, but it gets around the ARM issue and it's probably more powerful too.

carmine lance
lapis lintel
#

What about an 11-inch iPad Pro with an M2 chip? ARM is an advantage, not an issue

#

The iPhone 14 Pro actually has a chip with better single-core performance than the M1, the A16 Bionic; memory is the real limiting factor

#

16GB is the iPad Pro's maximum, which is just enough to run smaller LLMs

lapis lintel
#

The iPhone 14 Pro only has 6GB

#

And yet my iPhone 12 Pro can already run some very small LLMs slowly

#

Well it’s not even that slow, 15 tokens per second

#

It’s not even that bad at generating content

#

I wonder how many parameters Neuro has; this is a 3B parameter model
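A common rule of thumb for why memory is the real limit: generating one token reads every weight from memory once, so single-stream decoding can't go faster than memory bandwidth divided by the size of the weights. A rough sketch; the quantization and bandwidth figures in the example are assumed illustrations, not measured specs for any phone:

```python
def max_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound for single-stream decoding: every generated token streams
    all weights from memory once, so rate <= bandwidth / weight size.
    Ignores caches, KV-cache reads and compute limits."""
    if weight_bytes <= 0:
        raise ValueError("weight_bytes must be positive")
    return bandwidth_bytes_per_s / weight_bytes


# Example: 3B parameters at an assumed 4.5 bits per weight and an
# assumed 34 GB/s of usable memory bandwidth (illustrative numbers only).
weights = 3e9 * 4.5 / 8
print(round(max_tokens_per_second(weights, 34e9), 1))
```

Under those assumptions the weights are about 1.7 GB and the ceiling is about 20 tokens per second, so a measured 15 tokens per second would sit within the bound.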

sacred jolt
lapis lintel
#

👀

silver dawn
#

apple is always cooking

#

(i own no apple products)

#

but i'm always jelly

carmine lance
lapis lintel
#

Yeah, it seems like they're gonna use Google Cloud or something, which is extremely weird for Apple

#

I assume the end goal is running their own LLM locally on an iPhone

carmine lance
#

https://arxiv.org/abs/2312.11514

A research paper (Apple's "LLM in a flash") has been released on making LLMs run on devices with limited DRAM

#

Here is a summary video by Techquickie: https://youtu.be/r0QiCJ04FkA?t=2m47s

If the link or embed doesn't jump to the timestamp, go to ~2:47 or click the 4th chapter

lament hinge
#

The Galaxy S24 Ultra is a big step up in raw power, to say the least.

#

It has the new Snapdragon 8 Gen 3 for Galaxy chipset iirc, and it is a speedy boi

carmine lance
#

But the Snapdragon 8 Gen 4 is expected to hit 4GHz, yes, 4GHz on at least one core

Idk if it's confirmed

carmine lance
#

I literally cried when I saw this

merry fulcrum
carmine lance
#

I got news: I found a few apps that can run LLMs on mobile

https://github.com/Mobile-Artificial-Intelligence/maid (Maid - Mobile Artificial Intelligence Distribution for Android)

Or

https://testflight.apple.com/join/sFWReS7K (CNVRS TestFlight for iOS/iPadOS/macOS)


carmine lance
#

This is an iPad though, so it has an advantage there, but I'll try Maid later

lapis lintel
#

GPT-3.5 like performance on a phone

carmine lance
#

Forgot to update this thread

I currently have a Snapdragon 8 Gen 3 on hand in the Xiaomi 14 Ultra, and it actually runs a 7B model well

lapis lintel
#

How many tokens per second?

carmine lance
#

That's with 5 threads allocated

#

2.9 t/s on 8 threads

#

(makes sense, because the other apps need resources too)

carmine lance
#

On Phi-3-mini, with a video playing in the background through ReVanced, I get 7 to 9 t/s
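To turn these tokens-per-second numbers into the delay a viewer would actually notice, divide the reply length by the generation rate. A minimal sketch; the 60-token reply length is an arbitrary example, and prompt processing time is an optional input because it wasn't measured here:

```python
def reply_seconds(reply_tokens: int, tokens_per_second: float,
                  prompt_seconds: float = 0.0) -> float:
    """Wall-clock time for one reply at a steady generation rate,
    plus an optional prompt-processing delay."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_seconds + reply_tokens / tokens_per_second


# 60 tokens is an arbitrary example reply length.
for rate in (2.9, 8.0):
    print(f"{rate} t/s -> {reply_seconds(60, rate):.1f} s")
```

At 2.9 t/s a 60-token reply takes about 21 seconds; at 8 t/s it takes 7.5 seconds, which is much closer to usable for a live VTuber.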

atomic compass
#

How much does it drain the battery juice?

carmine lance
#

FYI, the app I'm using is Layla Lite

carmine lance
#

Note: I have the global variant, whose battery is 300mAh smaller than the China variant of the 14 Ultra

carmine lance
#

I saw that too but no clue how to make it run natively on device

#

huh, it uses the llama tokenizer

#

Oh I just realized that name is similar to OpenNeuro-13B (a private LLM of mine meant for RP and stuff)