#▶|stable-video-diffusion
!!!!
hi
Yahoo
whats up
Nice to meet you, Firat!
lol noice
lmao
40vram required ? xD
12kb vram required
just download some vram
You can wait for bot
Where am I going to get 40gb of VRAM
oh what is this 40 ram
how many 4090s is that?
Like I do for hugging face spaces
404
lol i cant afford a A200 gpu
o
2
At least 2 4090s, so $4000 lmao
Welcome in!
you can make half video
You can rent GPUs on services like runpod, 40gb VRAM costs under $0.8/h on it
i dont have 2 4090s 😢
woah, i love this!
I see
i only have one 4070ti
wait you need that to run it!?
40GB vram bruh
run it on your cpu xdd
how that
yup 40gb vram
For 40 GB VRAM yes
ok now i love it slightly less
1x 4090 = 24GB VRAM
me with my 6700xt.. wonder if it is even worth trying
🤔 hm.... I WANT THIS!
the vram from 2 gpus won't add up afaik
Updated link in the announcement post!
we will only need 8gb
Only for 6090!
nvidia smiles with 12GB : well 12-16 now is not a big deal
mods bot account
40 GBS OF VRAM BRUHH
12 is sure good for an NVIDIA
12-16 now is even not enough at all, if you do ML stuff ^^'
ML?
you need rtx 6000 ada generation gpu for this video ai
Yo anyone got 10k? https://www.amazon.com/NVIDIA-Tesla-A100-Ampere-Graphics/dp/B0BGZJ27SL
The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world's toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale to thousands of GPUs or...
Machine Learning
i mean we were complaining about it while 40gb not even debatable xD
so u need rtx 4090 for machine learning?
can this run across two GPUs?
RTX 5090 is rumored to have 48 GB VRAM, but no confirmation on this yet 🙂
yes i totally agree xD i hope they forgot a " zero " by mistake xD
Depends on what you do, but for training purposes you even want more... and several of them in parallel
I think you need even more powerful GPUs
topping out at 36 from what i gather
in 2 years from now ig hearing those numbers wont be impossible
damn idk how I can get it then
Stability.AI should release their own GPU lineup with their models at this point
how to use this
We'll probably see an optimized version of the model soon, able to run on 3090s at least
I wonder how good it is
40GB of VRAM is correct for the local model - but there will be a web version for people to try out
!
Keep in mind this is targeted towards researchers & not the full commercial release so set expectations accordingly ❤️
Knowing Nvidia next gen will have the same amount as this 
ig renting gpu
the pipeline can't finish xdd
u can rent gpu now?
Yeah, just rumors ofc, it's most definitely up to change
You can already buy an A100 with 80gb or vram... for like $20k ^^'
21,000€
guess I gotta wait for the 24gb patch
damn
Or some ai videos
sadly 😭
thanx a lott for explaining, it makes sense for someone like me seeing those 40vram was a shock xD
can this run on two 24GB GPUs?
40GB VRAM for inference, rip fine tuning already
For sure! I can understand why lol!
WHY ARE YOU EVERYWHERE
12gb so it can run in Google colab
You can already do AI vids with way less than 40gb to be fair, with AnimateDiff

Result quality might vary though
You can still settle with tools that will do animations with less VRAM 
Yea
those arent AI videos
What are they if not that?
I wanna see how good this one is
14 mutual servers be like
All you gotta do is buy this bad boy: https://www.nvidia.com/en-us/data-center/a100/
maybe in the future we will get ai so good it will replace animators
yup, and so unrelated
all you need is runpod, vastai, bananadev or something like that
For real I share 7 servers with the guy
lol
lets go we're about to get some quality gifs up here
Someone mentioned the rtx 6000 ada, they're legit cheaper: https://store.nvidia.com/en-us/nvidia-rtx/store/?page=1&limit=9&locale=en-us&category=GPU
and videos of course
@ebon salmon I see you keep on top of the news
"cheaper"
sorry but i can't even afford a gumball wtf 😭
Glad to see you there 😛
Ours are pretty much all related though
rip my beta member role
There are some examples available on the website if you'd like to give them a look! https://stability.ai/news/stable-video-diffusion-open-ai-video-model
(Here's a video too!)
Yea
No, thanks!
oh so you need 40GB VRAM? rip
yes
Cheaper don't mean cheap lmao
ok ima see
I mean, even a 3090 today is nowhere near cheap >.<
And nowhere near enough either depending on what you do ^^'
this is absolutely absurd
Isn't this the old model? I thought newer A100's have 80GB VRAM
If I save money on a Ferrari is it still a deal?
thats amazing
They do, there are 40gb and 80gb models
Go now to waitlist... just have company name at hand XD https://stability.ai/contact
Gotcha
I cannot for the life of me find the 80GB models available anywhere online..
Runpod
but i can't tell what the difference is between this sd video and the animated diffusion node?
why do they look like animated vector graphics
I will compare same prompts in runway now
i might be able to convince our media production to get 80gb vram.
does anyone know how to run this and if it'll work across two GPUs?
Yoh, so how do we run this model ? Not obvious
Pls no don't make all of my workers throttled -_-
get 500gb and run the trillion param LLM
Idk if they sell those locally
apply for the waitlist
i would like to see if u can mention me with comparison
yes, it just released and i read through all documentation in like 3 minutes
where even is the documentation?
I wish they would 😦
ok I can show you the runway anims and u can check whether the runway is better or the stable diffusion is better
For local stuff, most of the time you're better off with multiple smaller cards, it would be way cheaper and not necessarily much slower
sure !
Any idea of speeds for the 14/25 frame versions on a "budget" GPU that can handle this like an a6000?
your move, openai
Yeah fair enough, especially with most general tasks. That or workstation gpus
Yeah but those gpus are mostly for data centers and enterprise workstations so b2b selling
is it part of a combined github repo?
Budget GPUs will likely not be able to run this during the initial research preview. BUT, we have a web interface coming soon that everyone should be able to play around with! More details on this later~
How do we run the model ? Can't see any code
first you need the hardware lmao
An a6000 48GB VRAM won't run that?
this is the astronaut walking on the moon
Some of the code and weights are just getting pushed through so they may take just a wee little longer to get going! Please bear w/ me!
Pizza time?
The model being released for research is an image-to-video model. The text-to-video portion will be included in the web interface to be released 
It showed it can do txt2vid
Oh ok
That's a nice magic trick
Pizza on the moon?
Thank you!
She made a slice of pizza... WTF... How did she do it?
Is that how pizza is made?
Yeah, I would love to know how to do that
yeah, how else?
Neat
Maybe pizza ovens are already running video models
two blue jays on top of a building. looks scary
I'm curious to test this and see how it differs with AnimateDiff
oh he didnt move at all, ignoring the unrealistic moon walk of sd video but as test i can feel it will be awesome later
Motion is definitely better, and from the examples I've seen there are fewer/no weird limbs suddenly appearing and disappearing
yeah runway is pretty bad for movement but maybe with the new motion thing it will be good
It's been 15 minutes and we still lack a video of will smith eating spaghetti. come on people.
im also gonna test pikalabs to see
i hope so, also thanx for sharing
yw
"keep my wife's spaghetti out your damn mouth"
soon in the future u can make a 20 min animation in 10 minutes 😆
Next stop... 3D rendering and destructive environments in games.
Apple M chip should work😁
Yeaaaah I wouldn't expect it too soon though... Currently on AnimateDiff, a much smaller model, it takes 2-3 minutes to generate 3seconds of animation on a 3090
We've already got 3d
will smith isnt eating spaghetti

This is horrifying
damn
3090 also has 24 gb of vram that a lot
It's not VRAM the bottleneck here ^^'
the bottleneck?
like cpu and gpu bottleneck
will smith is eating the spacetime continuum
Yeah, I generated animations on 3090, 4090 and A100s, and the speed difference isn't really significant
Even if A100s have 80gb
damn u got a100 bro
No I use online GPU services xD
ohhh ok
You can even rent H100s nowadays
Legendary....
At $4/h tho
Well, in DaVinci Resolve (for example) you can render on CPU (free version) or GPU (paid version) up to 32k resolution. A 4090 does it just fine.
so that is 88 dollar a day right
wait
4 x 2 = 100
96 dollars a day
Is it billed by the second
96 x 30
69$ per frame
Me and you learned very different math
2880 dollars a month
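For anyone following the math, here is a rough sketch of the rental arithmetic, assuming the $4/h H100 rate quoted above, a card running 24 hours a day, and a 30-day month. The function name is made up for illustration.

```python
def rental_cost(rate_per_hour: float, hours_per_day: float = 24, days: int = 30) -> tuple[float, float]:
    """Return (cost per day, cost over `days` days) for a rented GPU."""
    per_day = rate_per_hour * hours_per_day
    return per_day, per_day * days


# Assumed rate: the $4/h figure mentioned in the chat, running 24/7.
per_day, per_month = rental_cost(4.0)
print(f"${per_day:.0f} per day, ${per_month:.0f} per 30 days")

# A short 2-hour test session at the same rate.
session, _ = rental_cost(4.0, hours_per_day=2)
print(f"${session:.0f} for a 2h session")
```

This lines up with the 96 and 2880 figures above, and shows why short 1-2h sessions are far cheaper than renting around the clock.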
Practically yeah, not sure but it's a relatively small time frame for price calculation
rendering pixels and inferring are very different beasts
Assuming he's running 100% 24/7
I only do sessions on 1-2h when I need to do some tests tho, I'm not rich enough to rent that h24 d365 😛
yeah assuming that
5.8752e+110 USD if you ran it till the heat death of the universe
give or take
Also there is the option for serverless, GPU clusters that are idle most of the time and only wake up (and bill) when used
ohh ok
Resolve has AI infusion. It can't be that different. The node system plugs into Autodesk and UE5.
artists lost their job now animators
and soon teachers
and pretty much everyone will lose their jobs
All those jobs lost also created new jobs for AI whisperers tho 😛
there is still a major controllability and editability problem in AI
editing generative images and videos still requires expert skillsets
It's been improving all over the board with the progress
Which in turn will cause people to use their noggins to think of even more creative ways to make money which then will bring us further in our evolution. It's a cycle
Just look at SDXL, it's way better at following prompt than SD
it will come with time, but it definitely lags behind
Even if yes it still requires quite a bit of expertise to get the most out of it
for a 1024 x1024 image, that's a network of 1.048.576 "dimensions", the network has to decide what goes into each one of those, it's very different from rendering pixels, even in 3D you bounce a ray and it gives you a pixel color, it's less work
that's why in AI animation you need a lot of VRAM
isnt microsoft developing chatgpt v with sdxl
you need to fit the model for inference into ram
or developing sdxl with chatgpt v
Been there, done that 😛
So I'm reading paper for SVD and am confused, can anyone explain to me how many parameters does it have? Paper says that it's 1521M, but checkpoint size is 9 gigabytes. Did they ship optimizer state there as well?
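A quick back-of-the-envelope check for the question above: the raw weight size is just parameter count times bytes per parameter. The sketch below uses the 1521M figure quoted from the paper. It does not say what else is in the 9 GB file; whether the checkpoint bundles components outside that count is an open assumption here, not something confirmed in this chat.

```python
PARAMS = 1_521_000_000  # 1521M, the figure quoted from the paper

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weights_size_gb(n_params: int, dtype: str) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weights_size_gb(PARAMS, dtype):.2f} GB")
```

At fp32 the 1521M parameters alone come to roughly 6 GB, so they do not account for the whole 9 GB file. Note this only covers stored weights; VRAM use during inference also includes activations and frame decoding, which this sketch ignores.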
I'm glad I'll be retired before they automate my job away
It's coming sooner or later
Stop the sillyness.
has anyone tried whether this can run in a 24G vram environment?
it can't
for now
with lowering to fp8 or fp16 to reduce vram consumption
NO!
Hey! So the Stable video diffusion isnt available yet right ?
Really stoked to see what it uses, if it uses some AnimateDiff, Deforum, IP adapters, LCM or if it's fully developed by Stability 👀
40gb for local inference
someone pin that
this question is gonna come up 300 million times in this chat
it is, you just need a hefty GPU (40 GB)
it is in research preview
40Gb of VRAM ?!?!
it is in announcement
yeah guys this is at research level for now
but my focus here is whether it can run inference in fp8/16 without losing too much quality
as it is fp32 model
which is why it is costing so much vram
it will get optimized, it'll work on consumer GPUs down the road
Oh wow xD, require so much gpus ahahah
Correct
Or they just upgrade it
Can you use multigpu for inference?
Saying, "this will be on the web" isn't addressing some people's concerns, such as will this ever be runnable locally on consumer hardware.
As I said, it's not a complete loss, it's more like a job shift ^^'
Anyone who doesn't have a spare 40GB of VRAM sitting on their shelf collecting dust should keep an eye out for the web experience coming 
another way of looking into this is that animators now have TOOLS that expand their horizons, make them more efficient and they can do more and better work
👁️ 👅 👁️
let's not be pessimistic, these tools are amazing
It's like looking in a mirror
Yes, all these tools help a lot in the creative process
true
Still, it's a radical change of tools for those not willing to change their ways ^^' I think there is still space for both anyway
True, but you also have to keep in mind this is an exclusive release targeted towards researchers while still in development. This is not a commercial release targeted towards both the hardware and use cases of the general public.
It's hard to apply one side's expectations to a different situation that wasn't intended to be the proper fit.
To me, the tools that will succeed the most in production settings are those that can use input images and work on top of them without destroying the intent from the artist. E.g. someone sketches a rough tree and adds the lighting direction and temperature, the AI output resembles the original sketch but rendered out faster.
Unspinned answer sounds like no. 🙂
There is space for everyone, for those who want to continue creating as they always have and for those who are interested in testing new paths
and the ones who will mix everything too
Text to image isn't as usable in every production setting, its fine for things that are unimportant or as a starting point that artists have to work on top of afterwards
2 x 3090?
i do believe that by reducing the model to fp8 inference it would very likely run in a 24G vram environment
but the only issue is whether the characteristics of the learnt data would be lost when it's inferencing in lower precision
not as simple as that
Diffusion models don't respond that well to quantization
But what if we ask it nicely?
who wants to go in on an A100 with me?
have tested by doing fp8 to checkpoint for lower vram consumption in ad generation, effectively cutting 18GB vram to less than 12GB, from fp16
without much quality loss
I will put in $20
but idk if thats applicable here
are you talking about a motion module?
both checkpoint and motion module
If you want, you can give me money and I'll keep the RTX 6000
Alternatively, I can go on Tiktok live and start ebegging
This is an image to video model not a text to vid right? the subreddit is saying image to vid
Correct! Image-To-Video with a text-to-video interface coming out on a web platform soon
I dont think that will work for me lol
I don't know why, but on the website the example clips looks like a Robot 'post nut', now stuck in a crisis.
https://stability.ai/video doesn't work
I'm sure it's been asked, but how do I run the model locally? automatic?
arg, why did they use SVD as abbreviation 🤦♂️
512x460 fits 24GB at least...results are interesting 😄
Just buy an A6000 if you have that much money anyway.
Where are the ComfyUI nodes for svd_xt models? Models don't work in AnimateDiff loader.
what is the difference between the xt model to the non-xt model?
non-xt was trained for 14 frame prediction & xt was trained for 25 frame prediction
At how many FPS? 8 like AnimateDiff?
(just wanted to mention for those that might find it's not enough: there are very good frame interpolation tools to transform 8fps into 64fps or more if you want, like RIFE)
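To make the interpolation numbers concrete, here is a small sketch of how frame counts scale. It assumes an interpolator that inserts (factor - 1) in-between frames for each adjacent pair of source frames, and an 8 fps source; both are assumptions for illustration, and the helper names are made up. It does not implement RIFE itself, only the counting.

```python
def interpolated_frames(n_frames: int, factor: int) -> int:
    """Frame count after inserting (factor - 1) in-between frames per adjacent pair."""
    return (n_frames - 1) * factor + 1


def duration_seconds(n_frames: int, fps: float) -> float:
    """Playback length of a clip with n_frames shown at the given frame rate."""
    return n_frames / fps


source_frames = 14          # the non-xt model's frame count
source_fps = 8              # assumed source rate, as in the AnimateDiff comparison
factor = 8                  # 8 fps -> 64 fps

out_frames = interpolated_frames(source_frames, factor)
print(out_frames, "frames at", source_fps * factor, "fps")
print(duration_seconds(source_frames, source_fps), "s before interpolation")
print(duration_seconds(out_frames, source_fps * factor), "s after interpolation")
```

The clip stays about the same length; interpolation only makes the motion smoother, it does not make the video longer.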
variable rate up to 30fps
13 to 30 fps
AnimateDiff has a 16 frame context, you can combine it to any fps you like (via comfyUI, dont know about auto1111)
How to use the models?
it's mostly for research purposes right now, if your GPU can run it you can test it locally (you need at least 40GB vram)
it will be accessible via a website soon #▶|stable-video-diffusion message
The autoencoder for the video model is the same one as 2.1 isn't it? I'm going to end up testing it later to see if the temporal bits to the decoder work on other video models like AnimateDiff
it runs on 24GB with system ram fallback on, just slow as hell
Damn
go into scripts/demo/streamlit_helpers.py and enable lowvram_mode = True for model offloading and it will run on 24gb
set frame decoding to 5 in the UI as well
what card are you testing it on?
3090
UI? is there a UI already?
the streamlit demo in the repo
curious if the multi-view synthesis finetune of the video model will be available at some point as well, that's far more interesting to me than the actual image to video, might be good for creating interesting 3d models or alphas
I'm getting ModuleNotFoundError: No module named 'imwatermark' with that script, even though I have that module installed
will this get support for RTX 4090 cards?
where exactly should lowvram_mode = True be set?
where its lowvram_mode = False now
ah I see I misread which file - thank you
now I'm getting ModuleNotFoundError: No module named 'scripts.demo'
I managed to fix the invisible watermark thing
nice, can you upload a scene with a human face in it? it's a crucial weak point in most of the competitors, and it would be nice to see a raw generation from the model
can we have a room for just videos?
Setup Instructions (Python 3.10, 4090, working on Linux):
- git clone the repo
  - git clone git@github.com:Stability-AI/generative-models.git
  - cd generative-models
- pip install -r requirements/pt2.txt
  - double check that pip install actually worked. on windows you may need to comment out xformers and triton
- pip install .
- modify streamlit_helpers.py
  - lowvram_mode = True
- create a checkpoints folder in the root folder of the project
- download the weights from https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main to checkpoints folder
- streamlit run scripts/demo/video_sampling.py
- set "Decode t frames at a time" to 2 or lower
- click "Load Model"
- upload image and go
this one turned a bit 3d but an example at least
thanks! wow, looks very good!
that assumes I have Linux. is the model only for Linux systems?
trying on m1 right now. lets see
ok, I bypassed the thing that wanted Triton. now I'm getting:
C:\Users\joker\OneDrive\Desktop\A.I\generative-models\venv\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
2023-11-21 23:36:09.160 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\joker\OneDrive\Desktop\A.I\generative-models\venv\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\joker\OneDrive\Desktop\A.I\generative-models\scripts\demo\video_sampling.py", line 5, in <module>
    from scripts.demo.streamlit_helpers import *
ModuleNotFoundError: No module named 'scripts'
its hard coded for cuda. I suspect the code can be made to run on m1
File "/Users/bryce/.pyenv/versions/3.10.13/envs/gen-mdls-3.10.13/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
you need to pip install . probably
I did do that, idk what's the error I'm getting
how do you install?
pretty sure that fixed it for me though i did it with the -e flag, but not sure that should make any difference
wait why do you think this assumes linux systems? are you on mac or PC?
I'm on windows, when I followed the setup it errored out due to needing Triton, which is a Linux package
run your commands from the root of the project
I changed the requirements file, continued as normal and it still errored out, a completely different error though
i ran mine in wsl because of triton
that's what I did, I'll try in WSL I guess
you could try PYTHONPATH=. streamlit run scripts/demo/video_sampling.py
not sure if thats how env vars work on windows. apologies if its wrong 🙂
you need to install the generative-models module
pip install .
yeah, I already did that
oh weird, am on win & that fixed it for me
i can only get 8 frames on a 4090 but I see some opportunities for better memory usage
yeah, still getting ModuleNotFoundError: No module named 'scripts'
that's after the UI began loading itself in browser though
SD image to SVD, with 3x interpolation in post
it already got to a point where it automatically opened my browser, but it still spits that error
just made it to the same step as you
Congrats to Stability team for releasing a video model! GO JOE!
now we just need to force gpu manufacturers to solve the vram problem
also, maybe we can not call it SVD as that is kind of homophonic with another initialism
software drives demands. While this might seem like it, gaming cards are made for games. So convince developers to make games that need 40gb cards :X
ai cards are too expensive for consumers
i have heard you can mod gpus to double the vram by switching the ram chips
thats right. but we use gaming cards to do it because gaming cards are priced for consumers. so we need to encourage developers to make games even more unoptimized and demanding on vram
start demanding 4k specular and normal maps
generated with SVD, interpolated 3x
ram is so inexpensive in comparison, it's just sad that that's the limiting factor
yes
you can but then you have to also patch the bios. it generally is only done on models where there's an 8gb and a 16gb variant
since the bios already has 16gb code
they should use uncompressed 8k textures for everything
you're getting it!
i don't think we'll get LLMs running locally in our games. Games that use those will tie into openai's api or whatever exists since that's busy self immolating right now
another option is games that use ai features, so nvidia needs to add more vram for that
yeah. i think that stuff is going to be server side 😦
naw that would be hell
but you are probably right 😦
Microsoft just bought activision. do you think things will get better?
i hope the server costs are so extreme and the users so unpredictable that it will not work
You think GTA6 trailer is going to show off client side ai? lol
gaming cards were 4-24gb for about a decade now. at this point it probably won't cost NVIDIA more to bump that up to higher VRAMs
server costs are not free. and game companies that run llms 24/7 will notice that soon if they try to do that
true, but nvidia likes money and there's no gaming demand for more than 24gb. games can't fill the 4090 or 3090 up and won't for a few years. especially as software optimizations like nanite and lumen are brought in
even open ai seems to make a loss
yeah but server costs are also a huge piracy stopper
well even 3090/4090 struggled with cyberpunk before the last update/dlc
think open ai is a non profit that mandates their for profit arm to make as little money as they can
their server costs are basically donated by microsoft
you are not understanding how high llm server costs are. open ai makes a loss with a 20/month subscription
you sure? cyberpunk has never filled more than 12gb of my vram
you're not understanding that microsoft donates azure compute to them
Microsoft, the games publisher, who has legions of servers
yes but that shows that it's not making a profit. so other companies will not really be able to do it unless they want to pay for gamers' server credits
also NVIDIA keep acknowledging the existence of the many users of their GPUs to run AI locally. they literally made the last few drivers specifically for Stable Diffusion. there could be a good chance that they'll bump up VRAM in the future
i think corporate strategy is very much going to try to price home local AI out and consolidate it all locally. Thats why we need stability. They're the only major players releasing this stuff
good copium
Runway ML sure didn't take up the torch. Stability got an open model released before they ever put gen2 out
stability ai gpu when 👀
its true. it's a good halo feature to get people buying their brand over amd. AMD has sort of relied on pushing vram higher and higher over the years, as a way to stay brand relevant
another company making GPUs would still have to rely on the same silicon forges that all the processor makers do
microsoft making a processor design now even. everyone's getting into it
i'm hoping intel's new ML focused instruction sets will bring the kind of speed GPUs benefit from to the CPU. Then we just need to stick a new dimm into an open bank
Tried and true scaling form
what the hell those results all look insane
what directory are you supposed to put the safetensor files?
the temporal coherence here is much much better just looking at it, no?
a lot less trippy-looking glitchy visuals
tim has used this model far more than anyone else on earth
when i install it do i need to install the pt2 requirements or the pt13 requirements or something different?
those look really nice. how many frames are you getting? what are you using for interpolation?
default settings mostly, 14 frames and used topaz for this one
Is there a stable diffusion discord bot that I can add to my own server?
torch 2 or 1? I wonder why i can only get 8 frames on the 4090 with lowvram on.
2, didn't try using the streamlit
ah maybe that's it then - thanks
I also reduced the decoding_t... using only 2 now
what a time to be alive!
yeah that was the setting I was missing
how did u install it? i just need an overview
just clone the github and create a checkpoints folder and put them into it?
mostly just followed the instructions here: https://github.com/Stability-AI/generative-models
moved the default sample script to the root of the project and put the model from huggingface into the checkpoints folder
they are not really detailed
@iron topaz imagiAlrgy-Bryce posted this earlier.
setup (Python 3.10, 4090):
pip install -r requirements/pt2.txt
pip install .
-modify streamlit_helpers.py lowvram_mode = True
streamlit run scripts/demo/video_sampling.py
I set up a virtual environment first though
you also need to do pip install streamlit
that's pretty much it yeah
but I ran into an issue
ok thanks
I've updated instructions. #▶|stable-video-diffusion message
nice, i installed the wrong requirements file but i guess that's what conda is for xdD
This should get pinned
not sure what causes that
I am guessing because I can't figure out where to put the safetensors files lol
that might be the issue
i would just create a folder called checkpoints in the main folder
if you look into the script you see that it looks for it there. but idk
i'll try that
why does everything happen when i am supposed to sleep x-x
this is correct
yeah just looked at the time, last 2 hours just gone 😄
hahahah
the model is rly addictive to use
i didn't even use it yet x.x
same 😢
I feel like I am so close to getting it working lol
just installing a ton of dependencies.
are you on windows?
ah ok
but my 3090 is half broken, let's hope it works. because an evil person sold it to me. in most tasks it works but with some intense ai stuff it blackscreens my pc
darn
i can play cyberpunk very well but animate diff for more than 20 frames is too much
as 3D modeler, this is kinda crazy to me
cool
so I ran into 3 issues
idk if you will run into them
but this is what I had to fix
in video_sampling.py I changed it to streamlit_helpers import *
ok
what version of python are you using?
3.10
same but i am still downloading the weights
and you're running from the root of the project right?
had to do pip install torchvision and pip install opencv-python as well
on windows I just copied the scripts from the demo folder to the root and ran from there
Has anyone gotten this to run locally yet?
yeah, it does run on 4090 at least... dips into system ram so it's slow, but it runs
I’m guessing there will be optimizations in the future tho? I have a 3060
someone mentioned they were able to generate 8 frames on a 4090, that's insane
Hey! First time poster. I saw you guys just launched your video model. Super big step... congrats. I am interested in working with it.... I am going to need a new system though. Other than the 4090 is there a card that runs the model well?
I'm doing 14 at the default res on 4090, it's just slow
A $6000 A100 🙂 (non-official answer)
When this has matured, will the VRAM requirement be lower or greater?
i got my pc when all of this took off and i thought i would future proof myself with an rtx 3070. welp, look where we are now 💀
i get this message 🤔
What type of system would run this card well?
no idea. i've never run one locally. probably rent a cloud one first
i think online renting would be smarter
I got rid of that by just copying the scripts from the demo folder to the root, and running from there, dunno if there's a more elegant solution
show the command you're running to get this
pip install einops, pip install imwatermark, pip install invisible-watermark, pip install omegaconf ran all these other dependencies, thought it was weird I had to install so many, then it circled me back to the script issue again. I am going to make an issue post on the repo. I did get it to the main page, but as soon as I try and select a model it gives me an error
sounds like you didn't pip install the requirements file
which one is needed, the 2 or the 13?
pt2 worked for me and i haven't tried 13
which model did you use?
maybe it didn't succeed? or maybe windows is just hard. SVD
got this error when trying to pip install pt13:
ERROR: Ignored the following versions that require a different python version: 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement triton==2.0.0.post1 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0.post1
yeah probably comment out triton - might be just for linux. not sure
I am on WSL2 and I am getting the same issue: No module named 'scripts'
tried different python versions, conda environment, venv, etc...
show the exact command you are running
ok i solved it by moving the script to the main directory. strange
move the script to the main directory then it works
and start it with this command "streamlit run video_sampling.py"
interesting. moving the script apparently works. wonder why I didn't have to do that
oh, I will try, thanks
you might try PYTHONPATH=. streamlit run scripts/demo/video_sampling.py
gpt4 said it should work ahahha
what options do u use?
just svd should be fine
why is there no way to input a prompt?
yeah thats not a feature of this
image to video?
I confirm this works for me as well, thanks man
?
you input an image and get out a video. no prompt
ok it is doing something
let's hope that it does not crash
i am at about 20gb vram
looks like it uses 15 gb during generation
at what res are u generating
it used 20 while generating and then it crashed because out of memory.
how can i activate system memory fallback on linux?
make sure you did the low memory changes:
#▶|stable-video-diffusion message
damn i wish i saw this earlier, nice guide
nice it worked thanks for all the help
decoding 1 at the time actually makes this run fast on 4090, duh...
should've just done that from the start 😛
here is a 30fps version
i hope you don't mind me trying it out with your video
dont google cat ! xD
i get images like this, what are you getting? xD
how did u make the movement smooth?
images like this xD
i'm using flowframes :) in a minute i will upload an interpolated video of the interpolated video to see if it can smoothen the inbetween frames
instead of it being partly choppy
cool
Thanks @tiny cradle @iron topaz for the help. Looks like it's finally generating. Crossing fingers this time.
nice did u use the low vram setting?
stuck on ModuleNotFoundError: No module named 'imwatermark' unfortunately even though its installed
Moving video_sampling.py to main dir worked. I ran into some other errors. Said I had python version 3.10.13 and said I should use v 3.10.11 so I created a new venv and then I also had to do pip install xformers and I had to take out triton==2.0.0 from the pt2.txt file to get it to work.
yep
yeah I ran into that too
i commented out triton since i'm on a mac and i moved the video_sampling.py to the root dir. still no go though. your vids look good tho! 😄
#▶|stable-video-diffusion message follow ImaginAlry-Bryce instructions
isn't triton for unix systems?
I am on windows
some forums say its for linux, not sure if its for mac though. i wasnt able to pip install it even with python 3.10. i'll keep pluggin away.
yeah, you are probably right. i'll just sit tight for awhile and work on some other projects : P
u can do this https://www.youtube.com/watch?v=2Tv5ZfPabGM
Follow along and set up LLaVA: Large Language and Vision Assistant on your Silicon Mac and any other llama.cpp supported platforms. The performance of 4bit quantized 7B model is amazing and this can be your local ChatGPT Vision alternative and keep your data private.
Timestamps:
00:00 - Introduction
00:59 - Installation & building LLaVA
02:18 -...
it can see images
I've managed to fix stuff, now getting FileNotFoundError: [Errno 2] No such file or directory: 'outputs/demo/vid/svd\\samples\\000003_h264.mp4'
yeah, LLaVa was said to even be a possible future text/image encoder for SD3
so if you don't have a 3090, best to just wait for the website launch?
probably basic question, when I git clone git@github.com:Stability-AI/generative-models.git it says "Cloning into 'generative-models'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists."
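Side note on the clone error above: "Permission denied (publickey)" comes from the SSH-style URL (git@github.com:...), which only works if an SSH key is registered with GitHub for that machine. A public repo can be cloned over HTTPS without any key. A minimal sketch of the URL rewrite, where `ssh_to_https` is just a made-up helper name for illustration:

```python
import re


def ssh_to_https(remote: str) -> str:
    """Turn 'git@host:owner/repo.git' into 'https://host/owner/repo.git'."""
    match = re.fullmatch(r"git@([^:]+):(.+)", remote)
    if match is None:
        # Not an SSH-style remote; hand it back unchanged.
        return remote
    host, path = match.groups()
    return f"https://{host}/{path}"


# Hypothetical helper, only here to show which URL to pass to `git clone`.
print(ssh_to_https("git@github.com:Stability-AI/generative-models.git"))
```

Passing the printed URL to git clone should avoid the key check entirely.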
haha yeah, just barely able to squeeze it all into 24g vram with fp16 and decode_t=1
~2min for 25 frames with xt on my 3090
very cool
can you get the dogs to actually move and not just the camera, like if you say "dogs running"?
looks good though.
did you ever figure this out 👉👈
no you cant prompt it unfortunately
you can try changing the seed
Do you mean you can't prompt specific motion like animatediff, but general prompts work?
Yes, but it created more problems I wasn't able to solve.
me
This
you cant prompt at all(?) (in the current release. base model supports txt2vid/txt2img2vid according to the paper)
from hf page:
Limitations
The generated videos are rather short (<= 4sec), and the model does not achieve perfect photorealism.
The model may generate videos without motion, or very slow camera pans.
The model cannot be controlled through text.
The model cannot render legible text.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.
here is a character moving... it's funny because it seems like the character is trying out some camera lenses that distort him, and he is aware of that... haha
ohh, yeah my bad. I forgot txt to vid wasn't released yet.
i get a bunch of motion distortions at the default 6fps but barely any at 12+
good to know, thanks!. I would love more explanation about the rest of parameters, like for example: s_churn #1, etc...
6/12/24 same seed
have you tried to loopback it? take the last frame and feed it again at 24 fps using the same seed
yeah theres a lot to mess around with for sure
im trying this rn 
haha, me too
ooooh yeah the autoencoder is lossy indeed 
anyone got advice for running in windows with 4090 - i have to restart the streamlit app after every attempt; seems like a gpu mem leak. Yes i'm using "Decode t frames at a time (set small if you are low on VRAM) = 1"
does anyone have a published script to run this yet?
it looks like stable diffusion didn't publish a script to use it
there's no leak (unless you try to switch to the other model), just hit sample again & it'll go without reallocating
Is there a way to run this in a Colab? Would it be hard for me to figure out how to create one? I have so many questions and so little vram 😭😭😭
A lot of my generations are just doing simple translations of the image. Annoying. Any tricks to prevent that?
Dang looks pretty rough, that was my main interest
Maybe there will be a finetune or something
for 2D it most often does camera pans and doesn't really animate them at all
haha, just ported the VAE to AnimateDiff. It works really well for it.
It sure loves eating VRAM though.
is this a comparison?
Yes, first is the default decoder, second is the temporal decoder with timesteps=4 (which I am assuming is just the amount of cross-frame attention applied)
so the new vae is less noisy. nice
A small question, what's the difference between svd and svd_image_decoder?
Hello everyone
I created a Google Colab to test it, and it's working quite well with an A100. It downloads two models (svd and svd_xt) from Hugging Face. If it's useful to anyone, here's the link: https://bit.ly/stable-difussion-video
I'm also posting some results. Does anyone know which settings to adjust to make the movement smoother?
your link is not public
nice
Not very good quality with this type of images
It seems like theres some sort of a bias for looping vids or small movements...
svd uses the new temporal VAE, svd_image_decoder uses the normal vae that SD1.5/2.1 uses. The new one should generate less noisy outputs
is xt? did you change the default settings?
nope, I only used the official website to generate it (127 motion) for free and then ran it through Topaz
Hi everyone. I’m trying to get this running on either a1111 or comfy and maybe I’m just tired but I feel like I’m missing something any help is appreciated.
Unless you're doing the integration yourself I doubt those support this yet.
Ah well. I was hopeful 😅 not entirely sure how to do the GitHub pull to test this out as is. Thanks for the snappy response though.
I’m guessing it’s still unoptimized and would need a 4090 minimum?
someone was running it on a 3090. I believe it uses 15 gb when run in lowvram mode
So close to 12gb! I have a 3060 so I anticipate further optimizations
i'm trying to run it on an A100
number of frames increases memory usage so you might be able to run it just generating fewer frames
Interesting, do you think there will be a ui for it soon?
how to use the new temporal VAE for AnimateDiff?
Depends on how much effort you want to put in 😛
Just wait for diffusers to implement it
I just extracted the decoder weights from the model and hacked together something horrible in a jupyter notebook
Hopefully diffusers implements some optimizations too because... well, the VAE is requiring 24GB to run on this machine.
Assuming I didn't break something which is entirely a possibility still, this desperately needs some form of tiling
it doesnt require that if you set the number of images to decode at a time to 1
Is that just turning cross-frame attention off entirely, though?
Also I must emphasize that I have completely gutted the model, I am just using the decoder. I'm guessing it's the timesteps argument?
Yeah that's the timesteps argument I think
Setting it to 1 reduced peak memory consumption to under 20gb
I know it does but that is what it is referred to as in the code /shrug
...no, memory usage is still at 20 when using timesteps=16?
I'm just going to have to run the profiler 
yeah I'm not sure but strong hunch timesteps isn't doing what you think
U should always buy a gpu new. Used gpus are often really used
what class is video_encoder
I remember that line of code though that you mentioned, it's just been getting late at this point and i don't recall off hand where exactly it is
sgm.modules.autoencoding.temporal_ae.VideoDecoder
inherits from decoder which I think has the actual important methods in it
hmm i'm tracing how the argument gets to the decoder
actually where does the pipeline where you use this value start? probably pretty far from where i am
n_samples = default(self.en_and_decode_n_samples_a_time, z.shape[0])
if isinstance(self.first_stage_model.decoder, VideoDecoder):
    kwargs = {"timesteps": len(z[n * n_samples : (n + 1) * n_samples])}
so timesteps is related
you might be right
so I think you're missing this logic thats in DiffusionEngine
@torch.no_grad()
def decode_first_stage(self, z):
    z = 1.0 / self.scale_factor * z
    n_samples = default(self.en_and_decode_n_samples_a_time, z.shape[0])
    n_rounds = math.ceil(z.shape[0] / n_samples)
    all_out = []
    with platform_appropriate_autocast(
        enabled=not self.disable_first_stage_autocast
    ):
        for n in range(n_rounds):
            if isinstance(self.first_stage_model.decoder, VideoDecoder):
                kwargs = {"timesteps": len(z[n * n_samples : (n + 1) * n_samples])}
            else:
                kwargs = {}
            out = self.first_stage_model.decode(
                z[n * n_samples : (n + 1) * n_samples], **kwargs
            )
            all_out.append(out)
    out = torch.cat(all_out, dim=0)
    return out
yeah... just found that. that is what originally led me to believe it was timesteps that was important, since it was the only place I saw it being set
but this function is running the decoder multiple times, to decode all the frames, and then compiling them together
so basically just what vae slicing does in diffusers
yes kind of
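For anyone skimming: the chunking idea in decode_first_stage can be shown without torch at all. This is only a sketch with plain lists; `decode_in_chunks` and `fake_decode` are made-up stand-ins, not the sgm API, and the real decoder works on tensors.

```python
import math
from typing import Callable, List, Optional, Sequence


def decode_in_chunks(
    latents: Sequence[float],
    decode: Callable[..., List[float]],
    chunk_size: Optional[int] = None,
) -> List[float]:
    """Decode `latents` a few frames at a time and join the results."""
    if not latents:
        return []
    if chunk_size is not None and chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # No limit given means everything goes through in one big chunk.
    n_samples = chunk_size if chunk_size is not None else len(latents)
    n_rounds = math.ceil(len(latents) / n_samples)
    frames: List[float] = []
    for n in range(n_rounds):
        chunk = latents[n * n_samples : (n + 1) * n_samples]
        # "timesteps" is simply the number of frames in this chunk.
        frames.extend(decode(chunk, timesteps=len(chunk)))
    return frames


chunk_sizes: List[int] = []


def fake_decode(chunk: Sequence[float], timesteps: int) -> List[float]:
    """Stand-in decoder: records the chunk size and doubles each value."""
    chunk_sizes.append(timesteps)
    return [2 * x for x in chunk]


frames = decode_in_chunks(list(range(25)), fake_decode, chunk_size=14)
print(len(frames), chunk_sizes)  # 25 [14, 11]
```

With chunk_size=1 the decoder only ever sees one frame at a time, which would explain why peak memory drops: only a single chunk's activations are alive at once.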
thanks for the help
I use 3090. This was funny and fast generation.
Hello,
I'm currently experiencing a technical issue with a Python script that involves downloading a pretrained model using the open_clip module from Hugging Face Hub. However, I'm facing a LocalEntryNotFoundError as the script is unable to access the necessary files from the Hugging Face Hub due to network connectivity issues. The specific model in question is CLIP-ViT-H-14-laion2B-s32B-b79K, and the file it's trying to download is open_clip_pytorch_model.bin.
Given this situation, I'm considering manually downloading the model file and placing it in an appropriate location on my local system. However, I'm uncertain about the correct directory where the open_clip module expects to find this file. The default cache directory for huggingface_hub seems to be ~/.cache/huggingface/transformers/, but I'm not sure if this is where I should place the downloaded file.
Could you please advise on the correct procedure for manually downloading and placing the pretrained model file, so that my script can access it without needing to download it from the internet?
Thank you for your assistance.
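A partial pointer for the question above, hedged: as far as I know, current huggingface_hub versions cache under ~/.cache/huggingface/hub (not .../transformers), in one folder per repo named models--{org}--{name}. The sketch below only computes that folder; it assumes the repo id is laion/CLIP-ViT-H-14-laion2B-s32B-b79K, and `hub_repo_folder` is a made-up helper, not part of any library.

```python
import os
from pathlib import Path
from typing import Optional


def hub_repo_folder(repo_id: str, cache_root: Optional[str] = None) -> Path:
    """Return the folder where the hub cache would keep files for `repo_id`."""
    if cache_root is None:
        # HF_HOME and HF_HUB_CACHE can relocate the cache if they are set.
        hf_home = os.environ.get(
            "HF_HOME", str(Path.home() / ".cache" / "huggingface")
        )
        cache_root = os.environ.get("HF_HUB_CACHE", str(Path(hf_home) / "hub"))
    return Path(cache_root) / ("models--" + repo_id.replace("/", "--"))


# Assumed repo id; check it against the model page before relying on it.
print(hub_repo_folder("laion/CLIP-ViT-H-14-laion2B-s32B-b79K"))
```

Inside that folder the actual file sits under snapshots/<revision>/, with a refs/main file naming the revision, so dropping the .bin in by hand is fiddly. Downloading once on a machine with network access and copying the whole repo folder over is probably the least error-prone route.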
panzhong?
?
decoherence - is it done through decohereance?
Woohoo, new here
damm
how much VRAM is needed for svd_xt? I tried it on 3090 24GB and OOM was reported...
Hello all!
Does anybody know anything about motion buckets and their id's? For now i just spam some random int from 1-255 but maybe it means something 😄 😄
same 😦
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.10 GiB. GPU 0 has a total capacty of 23.99 GiB of which 0 bytes is free. Of the allocated memory 18.91 GiB is allocated by PyTorch, and 3.34 GiB is reserved by PyTorch but unallocated.
Works just fine with 3090 with 24gb vram. You can set the resolution smaller to test, like 512x512px. Takes less VRAM and renders faster.
The generated resolution is 1024*576. Pictures that are not of this ratio will be automatically deformed and compressed to this ratio.🥹
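One way around the squashing is to crop the input to the 1024x576 shape yourself before uploading. A small sketch of the arithmetic, assuming a simple center crop is acceptable; `center_crop_box` is a made-up helper name.

```python
from typing import Tuple


def center_crop_box(
    width: int, height: int, target_w: int = 1024, target_h: int = 576
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered crop with the target aspect ratio."""
    if width * target_h > height * target_w:
        # Image is too wide: trim the left and right edges.
        new_w = round(height * target_w / target_h)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already the right shape): trim top and bottom.
    new_h = round(width * target_h / target_w)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


print(center_crop_box(1024, 1024))  # (0, 224, 1024, 800)
print(center_crop_box(2000, 1000))  # (111, 0, 1889, 1000)
```

The tuple is in the same (left, upper, right, lower) order that Pillow's Image.crop takes, so the cropped image can then be resized to 1024x576 without distortion.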
Look for decoding_t: int in the code and set it lower, defaults to 14 IIRC, can go all the way down to 1, that should reduce VRAM requirements.
Doesn't work on MPS though, the Stability AI code assumes xformers
comparing motion bucket id values, 1024x640, 25 frames with SVD-XT on 4090 (lowvram mode, decoding 1 at the time)
wait what lowvram mode 🙂 i want SVD-XT on 4090 😄 😄
in the streamlit_helpers.py file:
so basically the higher i put the number the more movement im expecting 🙂
yep, didn't think it would stay so consistent, need to test more, maybe lucky seed
Thanks DUDE! For the lowvram and for the comparison!!!!!
yeah I didn't even dare to try XT at first... turns out it runs just fine...
1 min 22 seconds per gen
ok i did that but still got OOM. Btw how are you running SVD-XT? Through simple_video_sample.py script? Is there some kind of UI for this?!
Interpolated afterwards. SVD_XT, fps 6, S_noise 1.08 (gives a bit sharper image), Decode_T is 1.
It lost its face, but i think it's cool.
answering here too so others see: using the streamlit UI
Does anyone have any idea why, after the first second of animation, it hallucinates so much? I'm using a T 25 at 6 FPS, with the XT model?
Any idea?
does this new model run on a UI?
Try to raise s_noise to 1.05, 1.10 or 1.15... or something in between... with the same seed from where you like the movement. I think that will give you a sharper image. It will hallucinate for sure, but I think your video is soft because of that. It is trial and error until you get something really cool.
Note: 40GB of VRAM required -> LMAO
I made a Google Colab, it's open to clone:
https://bit.ly/stable-difussion-video
awsm thanks man
Yes
better
It only needs 24 with some specific settings
acknowledged, still way too much for my little 3060 with 12 ^^
Next gpu gen needs at least double the vram ! XD
bet hahah
I have an idea for later. You automatically grab the last frame of a video and make a new vid with it and make a long video with this 🤔
at least this worked with other text2video tools (pikalabs, runway)
Ok... am i correct that each seed is attached to a certain camera movement? Might be super wrong 🙂
or the other way... how tf the model selects the movement 🙂
I was wondering that too.
that's a good concept but works weird in practice - the model will often not know what motion was happening in the video if you feed it only one frame, and you'll get sudden changes. If you input multiple frames at a time you can theoretically build an automatic continuation system.
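The last-frame loopback idea is easy to sketch, with the caveat already given: each new segment only sees one frame, so motion can change abruptly at the joins. In this sketch `generate` is a hypothetical callable (image in, list of frames out), not a real SVD function, and the integers are stand-ins for images.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def chain_clips(
    first_image: Frame,
    generate: Callable[[Frame], List[Frame]],
    segments: int,
) -> List[Frame]:
    """Generate `segments` clips, seeding each one with the previous clip's last frame."""
    frames: List[Frame] = []
    current = first_image
    for _ in range(segments):
        clip = generate(current)
        if not clip:
            break  # nothing came back, so there is no frame to continue from
        frames.extend(clip)
        current = clip[-1]
    return frames


def fake_generate(image: int) -> List[int]:
    """Stand-in generator: 'animates' an integer by counting up from it."""
    return [image + 1, image + 2, image + 3]


print(chain_clips(0, fake_generate, segments=2))  # [1, 2, 3, 4, 5, 6]
```

Feeding several trailing frames instead of one would need a model that accepts multi-frame conditioning, which the current release does not appear to expose.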
seeds as always with diffusion strongly influence results, but there's not a 1-to-1 correlation where x seed always creates y motion in any video
i think if you edit some motion blur into an image that'll strongly influence what motion the model creates in a more controllable way
I wish I knew how to input 2 frames
tbh a lot of the early days here is gonna be stuff for developers to play with moreso than end-users
normal people get to have the most fun with AI tech only after developers have figured out how to make it work right and then build an interface around that
I am trying to find the sweet spot for generations and I have a couple of questions for everyone. I put my answers on the side:
- What OS are you using? Windows
- What graphic card are you using? RTX 4090
- What image size are you using for the image generation? 1024x1024 will be testing smaller ones now haha
- What T value are you using? 48
- What FPS are you using? 12
- What Decode t frames at a time are you using? 24 but 48 is way faster but more unstable
- How long does the generation take? 3 hours. decode t frames at 48 is good and takes about 10 mins, but crashes. Trying to find sweet spot for 48.
- What errors do you run into when generating? I only get errors at the very end. Either low vram errors, Expected all tensors to be on the same device, but found at least two devices error. Its too bad I have to wait a long time before I can tell if it errored out or not.
will SVD only run on machines with 4090 or greater? I have 3070ti and was wondering if it's even worth it to install. loving what I am seeing so far from what people are generating. thanks in advance
Anyone running SVD with a new MacBook M3 Max? I'm attempting but having issue with Pytorch not working device=mps
I read that someone said that it's not possible on a mac yet.
Ooh okay — I saw cocktailpeanut (on twitter - https://twitter.com/cocktailpeanut/status/1727068314583670969) trying and got me excited
how does one reduce this? I saw someone say look for it in the code but I dont know where
in the streamlit ui it's the very last setting, with the command line script it's very clearly documented inside the script itself (simple_video_sample.py)
Was thinking about what a version of this model designed for low vram would look like
Seems like the minimum would be one that just takes in two frames and generates a single new frame from it
1024x576 is the canonical resolution
the model itself can definitely run in lower vram than the demo code needs
just, yknow, day 1 demo code is always built for the hardware it was trained on, not end-user/consumer tier hardware
(remember: SDv1 at launch required 24GiB of VRAM!)
Gen-2 / Stable diffusion video
Ooooooh my 🙂 This year the holiday cards will be AWESOME 🙂
SVD moves me a lot!
https://x.com/o_ob/status/1727381562311012685?s=20
Original "singularity tunnel" image-to-video Demo
https://www.pixiv.net/artworks/113439831
↓
https://youtu.be/KumvEA6Wu2s
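In telling the beginning of a story,
I learned how powerful the back of a high-school girl
equipped with a backpack and a scarf can be
DCEXPO2023 talk "New markets opened up by creative AI and AIDX: beyond the metaverse, broadcasting, and media art", supplementary thoughts on the singularity | Shirai Hakase (Hacker author) @o_ob https://note.com/o_ob/n/n22d154730de2?sub_rt=share_pw #note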
#SVD #StableVideoDiffusion
thanks. it fixed my error
is the only way to run on Apple Silicon so far?
I've been trying to get apple silicon to work. Worked through a bunch of issues but finally hit one I think I can't work through:
RuntimeError: Conv3D is not supported on MPS
https://github.com/pytorch/pytorch/issues/77818
Hi, i tried your code, but it is giving error about OS. It seems like i have all your code as one cell, otherwise, it says os is undefined. Is there a way to run it in colab without offloading it to ngrok?
as for me it's a constant Connection error 502 while attempting to load a model
I'm getting another error for your colab file. Error: Invalid value: File does not exist: video_sampling.py But I have the video_sampling.py loaded. Sometimes I think colab doesn't recognize the code before.
Hello
Hi!
I just updated it, can you try again? It should be fine, and separated
There is one line, where I copy the script to the root, make sure you ran it
I ran both "!cp '/content/generative-models/scripts/demo/video_sampling.py' '/content/generative-models/'" and "!pip install -q streamlit" / "!pip install pyngrok". Still the same error that video_sampling doesn't exist
check the folder, is there a video_sampling.py in the root?
I'm getting an error there is no such a file.
check if the file exists where it is looking: /content/generative-models/HERE
if it is not there, copy it or move it there
It is not there
then the line where it is copied is not working for some reason
maybe some of the lines didnt run correctly
you have the project in the folder?
I copied /content/generative-models/HERE it is giving me error when i copy it, saying no /. so i wrote generative model, still doesn't work. Where is the actual file video sampling? /content/generative-models/HERE is just a path
Stable Video Diffusion - Good/Bad cases
https://youtu.be/v9DyHMmmxg4
Stable Video Diffusion - My first try at 23rd Nov 2023.
Article
https://note.com/aicu/n/n509bd1d01d91
Original Pictures
https://www.pixiv.net/users/1355931/illustrations
Sorry
Has it been optimized for consumer hardware yet? I have a 3060 (12gb vram)
This
?
gonna have to wait a lil more than a day for that lol
it'll probably be running on 3090s within a few days
3060s will take longer (weeks/months, not sure, definitely not soon)
its already running on 3090
i dont know an obvious way to get it down to 12gb though
maybe if you only generate 2 frames
only posts of it "running" on a 3090 have been buffering all the mem to CPU and taking 3 hours, doesn't count
no it was running quick
It does take some tries to get a good generation going, and too dynamic a pose ends up deformed. But when it works, it does movements I haven't seen any other video generator pull off before
it didn't take more than lowvram = True and decoder_frames = 1
id have to scroll up a bit to find who it was 🙂
Lowvram option seems to run it at fp16
dude, if you try to run it locally on a mac, it will take forever to generate a video. It takes like 20-25min on an m2pro to generate 120 frames using deforum in Automatic1111. I imagine it will be even slower since it requires more ram
I actually got it running in comfy too, hacky way and not a proper implementation, but it works
25 frames at the default Res, with fp16, takes just bit under 20gb
Lol it is probably a lot easier to run it using comfy rather than straight up code in colab
But I couldn't get it any lower even by reducing resolution...
reducing number of frames reduces memory needs
Well easier to implement queues and stuff in comfy, can also generate the inits etc.
so we need to see what peak memory is at 4 frames
Yeah frame count influences it greatly
Can this model be fine-tuned so that it does more specific things and works faster and with less vram?
they've already announced it will be finetuned to do a bazillion specific things
@shell plume everything runs great – downloading and copying files without any problems. but after I try to run it I see this in console:
VideoTransformerBlock is using checkpointing
^C
one time it downloaded the .bin file while attempting to load a model in the app, but that was only once and now it keeps failing
I downloaded the weights. where is the checkpoints folder?
you make it in the root of the project
so if I'm in Windows, in my generative-models folder I can create a folder called "checkpoints"?
yes. unrelated to OS
ah, good to know. thank you. so once I create the folder called "checkpoints" I can continue with your install steps? thanks so much! have a great day 🙂 Im gonna try and get this to work on a 3070ti over the next few days
put the weights in the checkpoints folder yes
I updated the instructions to include creating the checkpoints folder
awesome! Im sure that will help other noobs like myself. I tried to run the streamlit and it returned this error
(.pt2) C:\Users\xxxxxxxxx\generative-models>streamlit run scripts/demo/video_sampling.py
'streamlit' is not recognized as an internal or external command,
operable program or batch file.
i guess pip install streamlit
but streamlit is already listed in the requirements file so it makes me think that step failed for you
seems to be working now. thank you
installing streamlit that is.
Using cached smmap-5.0.1-py3-none-any.whl (24 kB)
Installing collected packages: pytz, zipp, watchdog, validators, urllib3, tzdata, typing-extensions, tornado, toolz, toml, tenacity, smmap, six, rpds-py, pygments, protobuf, pillow, packaging, numpy, mdurl, MarkupSafe, idna, colorama, charset-normalizer, certifi, cachetools, blinker, attrs, tzlocal, requests, referencing, python-dateutil, pyarrow, markdown-it-py, jinja2, importlib-metadata, gitdb, click, rich, pydeck, pandas, jsonschema-specifications, gitpython, jsonschema, altair, streamlit
makes sense... but still interesting how it would perform
might have been because earlier I was having to convert the instructions from Unix based to Windows based with GPT help lol
i just added this step:
- double check that pip install actually worked. on windows you may need to comment out xformers and triton
torch doesn't support the operations needed on apple silicon. so it doesn't perform at all
awesome. you're the man! it said it worked I think
Successfully installed MarkupSafe-2.1.3 altair-5.1.2 attrs-23.1.0 blinker-1.7.0 cachetools-5.3.2 certifi-2023.11.17 charset-normalizer-3.3.2 click-8.1.7 colorama-0.4.6 gitdb-4.0.11 gitpython-3.1.40 idna-3.4 importlib-metadata-6.8.0 jinja2-3.1.2 jsonschema-4.20.0 jsonschema-specifications-2023.11.1 markdown-it-py-3.0.0 mdurl-0.1.2 numpy-1.26.2 packaging-23.2 pandas-2.1.3 pillow-10.1.0 protobuf-4.25.1 pyarrow-14.0.1 pydeck-0.8.1b0 pygments-2.17.2 python-dateutil-2.8.2 pytz-2023.3.post1 referencing-0.31.0 requests-2.31.0 rich-13.7.0 rpds-py-0.13.1 six-1.16.0 smmap-5.0.1 streamlit-1.28.2 tenacity-8.2.3 toml-0.10.2 toolz-0.12.0 tornado-6.3.3 typing-extensions-4.8.0 tzdata-2023.3 tzlocal-5.2 urllib3-2.1.0 validators-0.22.0 watchdog-3.0.0 zipp-3.17.0
🫂
pity
Just wait for Draw Things' developer to add this. It might take a bit of time.
Hey
did u get colab to work for stable vid dif? from ur older comments, u encountered some errors
when I tried to run streamlit it returned this
Traceback (most recent call last):
File "C:\Users\xxxxxxxx\generative-models.pt2\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "C:\Users\xxxxxxxxx\generative-models\scripts\demo\video_sampling.py", line 3, in <module>
from pytorch_lightning import seed_everything
ModuleNotFoundError: No module named 'pytorch_lightning'
yeah your pip install of the requirements file must not have worked
should I go back through and do the install again?
1. Clone the repo
git clone git@github.com:Stability-AI/generative-models.git
cd generative-models
2. Setting up the virtualenv
This is assuming you have navigated to the generative-models root after cloning it.
NOTE: This is tested under python3.10. For other python versions, you might encounter version conflicts.
PyTorch 2.0
install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
3. Install sgm
pip3 install .
4. Install sdata for training
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
If not the whole thing, which parts?
pip3 install -r requirements/pt2.txt
I did that. it returned this error
ERROR: Ignored the following versions that require a different python version: 0.55.2 Requires-Python <3.5; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0
yes likely you need to comment triton and xformers out of the requirements file
it keeps disconnecting for some reason, I dunno why
ah I see now., sorry I missed this.
requirements pt2 or pt13? and in commenting out do I just delete those rows and save or is there another process?
ok so save that pt2 file after commenting out and try to install the requirements again?
yes
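On "commenting out": deleting the rows works too, but putting a # at the start of a line makes pip skip it and keeps a record of what was changed. A rough sketch that does it for a requirements file's text; `comment_out_packages` is a made-up helper, and the sample lines below are illustrative, not the real contents of pt2.txt.

```python
import re
from typing import Iterable


def comment_out_packages(text: str, names: Iterable[str]) -> str:
    """Prefix '# ' to every requirement line whose package name is in `names`."""
    wanted = {name.lower() for name in names}
    result = []
    for line in text.splitlines():
        # The package name is the leading run of name characters, before any ==, >=, etc.
        match = re.match(r"\s*([A-Za-z0-9_.\-]+)", line)
        if match and match.group(1).lower() in wanted:
            result.append("# " + line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"


# Illustrative input only; the version pins are assumptions.
sample = "torch>=2.0.1\ntriton==2.0.0\nxformers>=0.0.20\n"
print(comment_out_packages(sample, ["triton", "xformers"]), end="")
```

After saving the edited file, rerunning pip install -r requirements/pt2.txt should get past the triton line instead of aborting there.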
Hi all, I made instructions on how to install stable Video Diffusion on windows. Here is the text:
Setup Instructions (Python 3.10.11, 4090, working on Windows):
Go to user directory
right click git bash
git clone https://github.com/Stability-AI/generative-models.git
-modify streamlit_helpers.py
lowvram_mode = True
move video_sampling.py file to main dir
create a checkpoints folder in the main dir
download the SVD weights from https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main
(optional) download SVD-XT weights from https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/tree/main
-modify requirements/pt2.txt file
remove triton==2.0.0 line and save
-modify requirements/pt13.txt file
remove triton==2.0.0.post1 line and save
Open Anaconda
cd to user/generative-models
conda create -n genModelVideo python=3.10.11
conda activate genModelVideo
pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl
pip install -r requirements/pt2.txt
pip install .
pip install -r requirements/pt13.txt
streamlit run video_sampling.py
click "Load Model"
upload image and there you go.
Will get a tensor error but you can ignore it. Still seems to work
*try 48 decode t frames for faster generation
and here's a video explaining it
Setup Instructions (Python 3.10.11, 4090, working on Windows): https://pastebin.com/YpqNSHFy
Requirements:
A good GPU. RTX 3090, RTX 4090
Anaconda
Git
Generative-Models github
SVD or SVD_XT
Download Links:
Anaconda: https://www.anaconda.com/download
Git: https://git-scm.com/downloads
Generative-Models Github: https://github.com/Stabili...
dont install both pt13 and pt2
why?
makes no sense. you're supposed to pick whether you're installing pytorch 1.13 or 2.0
and svd only works with 2.0 i believe
has anyone tried to run this on CPU? it seems to be bypassing the GPU memory but I bet it will output slower
I'll test it out
I did run svd and pip install pt13 last and it generated a video
trying svd_xt right now
well who knows what version of torch you had installed since you tried to install both
ok i'll take a look which version I have installed in my venv
I tried the svd_xt and it also generated a video
okay i installed all requirements without any errors.
when I ran streamlit it returned this
C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\torchaudio\backend\utils.py:74: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
2023-11-22 12:49:08.964 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "C:\Users\xxx\generative-models\scripts\demo\video_sampling.py", line 5, in <module>
from scripts.demo.streamlit_helpers import *
ModuleNotFoundError: No module named 'scripts'
don't know if i'll run into issues later though everything seems to be working.
that is surprising. but either way, installing torch 2.0, only to uninstall it and then install 1.13 doesnt make sense
I believe when you go to install 1.13, it ignores packages that are already installed.
Therefore, any matching dependencies are just skipped
yeah I agree. I was running into lots of issues trying to install it so I was trying anything to get it to work lol
now this one, i'd love to get to the bottom of this. everyone on windows runs into this
Could cause issues with torch though, so I usually manually install this after to match my CUDA version
Move the script to the parent folder
That would be generative-models
people have fixed it by copy pasting the video_sampling script to the project root. but that shouldn't be necessary. not sure whats going on
doesn't work with cpu because some of the layers only support cuda
It has something to do with python path, though, even manually setting it to the folder, I'm not sure why it doesn't find the scripts folder
i suspect that if you use torch 2.1 then both triton and xformers are not needed
No I tried last night. I tried to just do the install with just pt2
ran into errors then had to manually do pip installs
then just got into a loop of errors
then I went back and looked at any errors while I was installing it
in my custom fork i'm not using xformers but maybe I made some tweaks to get that working
and triton gave me an error and I think it skipped the rest of the requirement installs
I think that was the issue
PYTHONPATH=<<directory to git root>> streamlit . . .
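A guess at why both fixes work: the import `from scripts.demo.streamlit_helpers import *` can only resolve if the repo root is on Python's import path. Setting PYTHONPATH adds it, and so does running the script from the root, since the script's own folder normally gets added. The sketch below demonstrates the mechanism on a throwaway folder; `fake_scripts`, `ensure_on_sys_path` and `demo` are made-up names, and nothing here touches the real repo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def ensure_on_sys_path(root: Path) -> None:
    """Put `root` at the front of sys.path if it is not already listed."""
    root_str = str(root.resolve())
    if root_str not in sys.path:
        sys.path.insert(0, root_str)


def demo() -> str:
    """Build <root>/fake_scripts/demo/helpers.py and import it via the root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        package = root / "fake_scripts" / "demo"
        package.mkdir(parents=True)
        (package / "helpers.py").write_text("GREETING = 'found it'\n")
        # Without this line the import below would fail with ModuleNotFoundError.
        ensure_on_sys_path(root)
        importlib.invalidate_caches()
        module = importlib.import_module("fake_scripts.demo.helpers")
        return module.GREETING


print(demo())  # found it
```

The same effect inside video_sampling.py would be a sys.path.insert of the repo root before the scripts import, though that is an untested suggestion rather than something anyone here confirmed.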
Ok. After i generated 2394 nsfw animations i tried architecture... THIS IS IT.... 10x10. No cherrypicking.
Looks good 👍
wonder if i should bother with it using 6700xt at 12vram
Stunningly beautiful oak tree, on the edge of a forest, in the foreground there is grass blowing in a gentle breeze, the tree is in the middle ground, summer, in the background a gently rising hill
for anyone using Comfy and feeling brave, here's my very VERY early node to run SVD in Comfy:
https://github.com/kijai/ComfyUI-SVD
ok I'll try it
still need to add rest of the settings and better memory management to allow larger workflows around it
impressive
latest comfyui?
i had it gone once but had to update comfyui and manager, dunno what fixed it really
ok
i just updated it
ok yep
that was it, just need to restart it
can I drag and drop this workflow?
For me we entered different times... I feel so empowered. 10000000000000000% power up
the video_sampling.py script?
ELI5
i clicked on this, but i'm so dumb :((( haha i don't know if there's an elevated soul that could send the final method for using the model, installing, thanks ❤️
what is Anaconda? is that something i need to install?
pip install Anaconda? sorry im such a noob
need to download it here
you also need git
What are the requirements for my pc to run this smoothly
works on my rtx 4090, I've read other people were able to run it with an rtx 3090
I have rtx 3080ti is it enough? What about ram and storage
I have Git. DLing Anaconda now. thank you
I need someone to optimize it for 3070ti and I will jump through my ceiling
how much vram do you have? You might as well install and test it out. You can decrease the decode t frames
might be, not sure, just try the install. I read that it offloads to RAM when GPU memory is all used up, but that doesn't work on my Windows machine (I got an error), so it's best to just count on the VRAM from your GPU
the workflow is just an example, it's just the one node, I could add workflow with metadata tho
nice!
hey @shell plume , curious why you !pip install what you did in your colab:
!pip install -r requirements/pt2.txt
!pip install .
!pip install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
Wasn't clear on why the . and datapipelines.git?
@tiny cradle is it theoretically possible to add a controlnet? i have heard that it's based on sd 2.1, is that true?
i doubt that existing controlnets can be applied but who knows
certainly new ones will be made that apply
yes i hope that too.
video, on low-vram AMD -- uh, yeah, no, not right now. Wait for code upgrades n stuff to come out
yeah ig... animatediff is not working for me, only deforum does
how are you getting an h265/video output in VHS?
Thanks mate! I suck at comfy, is there any way I can import your workflow in?
Doesn't it have it by default?
I'd need to make a better shareable one, this is just one node and I don't know yet the best way to use it, basically you can take any default comfy workflow and stick the node in
I will try! Thanks!
Video helper suite (VHS) has good nodes to make it into a video
So I basically stick it to the very end of my workflow?
Yeah if you are generating a single image
Just make sure the resolution of the image is compatible, both width and height need to be divisible by 64
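if it helps, here's a rough sketch of snapping a size down to the nearest multiple of 64 before feeding the image in. The function names are made up for illustration, this isn't from the node's code, and rounding down (rather than up or cropping) is just one choice:

```python
def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value down to the nearest multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def compatible_size(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Return a (width, height) pair where both sides are divisible by `multiple`."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


if __name__ == "__main__":
    # A 1000x600 image is not divisible by 64 on either side.
    print(compatible_size(1000, 600))  # (960, 576)
```

then resize or crop your image to that size before the node gets it.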
how do I set the streamlit demo to use multiple GPUs for shared VRAM?
not on mine, might be a mac/pc thing?
the streamlit version outputs two videos for me every time
Probably, I think it checks what's available, like if no ffmpeg found you don't get mp4 at all
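you can check whether ffmpeg is visible on your own machine with something like this. It only looks at your PATH, it's a guess at the kind of check involved and not the demo's actual code, and `ffmpeg_available` is a name I made up:

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if an executable with this name can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found on PATH, mp4 output may be unavailable")
```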
ive got the ui set up and running, but no clue what settings to change lol
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
I'm stuck installing requirements in Linux
I had errors too, attempting to install it on Linux.
I've got a working colab going of my own design. I believe that's Linux.

