#▶|stable-video-diffusion
We got SVD working on imagineapp.co now running on our cluster of 4090s
so when can joe average use this video stuffs?
Define joe average
me haha
I own a 3080 10gb gpu and it's pretty cool. I think it can run with 8gb tho
oh how do you use it? i thought it was like invite only or something
is this just like something to dl on automatics?
Works on comfyui. You just need to download one file and get the workflow
interesting. thanks. ill have to look into comfy. quite frankly i thought it was horrible design
first time i tried it and got rid of it
im on a 3080 too so i guess i can run it
It's hard when you need to create a workflow. This one is quite a simple one. You just need to load your picture and hit play
It looks like that
A bit less
A simple short movie? Yes.
yup, generating the 14frame videos on my 8gb 3070ti in under 2 and a half minutes at 1024x576
what format are the files?
New version, look at those lights/reflections moving. This thing is really cool
looks clean
Anyone getting this silly error "IndexError: list index out of range"
Are you loading the right img2vid model?
yes
SVD + Topaz 3X Slow Motion
What you're doing is illegal
👀

im guessing its because svd is strictly for research and not commercial apps rn?
I'm still having issues with eyes.
How could they tell they used the svd
read what he typed and also what his bio states lol
well first of all he admitted it here in his message, but second, SVD is easily identifiable.
4090
I think like 40-50 seconds, not sure
I think I've seen people generate stuff with 10 GB VRAM
thanks ill look into it
I'm going OOM with 16GB 3080 and decoding_t=1, what am I doing wrong?
What workflow are you using?
Do not post videos of Hitler on here.
Sorry
Just trying to get the default stuff working here:
https://github.com/Stability-AI/generative-models/blob/main/scripts/sampling/simple_video_sample.py
Went OOM on an L40 w/ 48Gb of ram doing the same thing... I'm definitely doing something wrong.
clone, make a venv, pip install, python -m scripts.sampling.simple_video_sample
We are doing mostly this https://comfyanonymous.github.io/ComfyUI_examples/video/ i have a 3080 10gb and works well
and download the models, before all that
I never got into comfy, UIs are cancer
I'll look at it though, since it works!
I'm done for today, it was fun.
I was running it basically like this before it was on ComfyUI
https://youtu.be/HMW9hVoQa0M?si=rHYYPW_N_DhPbJ7L
lowvram_mode
https://youtu.be/HMW9hVoQa0M?si=rJ4ZbWkVfjKYEXCx&t=152
It's a really simple UI for this job, maybe it will give hints about what is wrong/different. The original model needs a lot of VRAM
I actually just got it working on a 48Gb L40, decoding_t=4 seems to take up ~20Gb so I don't get why it's going OOM locally w/ 16Gb... 😦
activate the lowvram_mode
i'm trying not to use streamlit
also fish
if you're running the example script, it uses full precision and doesn't offload the model to cpu in between sampling and decoding... it's very badly optimized, comfy's implementation is the only proper one and uses just 8GB for 25 frames
ah, I'll just fix it then... I figured it was something like that...
I also figured they'd have tried to make the code runnable on 16Gb in their sample
I guess there's enough momentum they can just release whatever and everyone else will fix it 🙂
bit hasty release in that sense sure, but it's just research release
however the implementation in comfyUI is already excellent now
cool, I'll look at their code then, thanks!
just tried it for the first time, really cool!
@grim tangle You have already generated many grids, what do you think are the best default parameters for SVD?
too early to say, but from samplers euler with karras seemed consistently good, 12 steps seems a nice spot for speed/quality with FreeU_V2
can always increase steps if you get something you like
motion bucket/aug seems to depend on the input/seed, motion amount is consistently controllable with those in most cases
so many things to take into account though, much to learn still!
fax
I've been playing with this all day today.
- rife with a 4x multiplier helps, otherwise, it looks pretty jittery
- on a 12 GB card, I can do seemingly as many frames as I want.
- CFG 6.0 seems to yield more 'active' results. I can't help but feel that results vary though haha.
- bucket set to 50 has the best results across the board that I have seen so far
- I moved denoise to .75 instead of 1.0 and it seems to be helping. could be voodoo magic though.
This is all I seem to be able to do so far
will try this now
I have yet to try img2vid tho
Which graphics card would be better to buy for SVD: a GTX 1080 Ti 11gb or an RTX 3070 8gb?
here's my setup so far. I have had a harder time managing SVD_XT for some reason, so I just do SVD. I use facial restore as well, sometimes the facial changes look a little janky, so that normalizes it.
I have a workflow Kexo sent
I will add the face restorer to it didnt even think of it
ty
Way better
11GB one. More VRAM, more space for AI stuffs. You can also look for some old 2000 cards with 16GB VRAM
Same input as before but I upped frames to 48 and changed denoise in KSampler to .66 - behold wiggly jello planet
You want as much VRAM as you can afford to throw at it, it was trained for 40GB server hardware.....
You working with comfy, right?
I am!
running in cli may or may not have better results though 
Cool! Just now seeing your workflow you posted above too
I got it running in CLI originally but have had best results with ComfyUI so far on the latest portable build. It has a good set of Torch/CUDA built in that you may not already have setup if you run from CLI. So I recommend anyone trying this to start with ComfyUI....
its certainly easier and more customizable in comfy as well. and - you can throw in interpolation, which looks ten times better.
yeah it's pretty janky with low fps if you want to render in reasonable time
1000 series nvidia is slow so you should avoid it even if it has more vram
3000 series is going to be much faster
thats amazing. how many samples are you using? mine take a couple minutes at 25 samples
Pretty insane how easily it can do this with txt2vid tbh
try throwing some people portraits in it 😉
Eh I've already seen 25 people doing that
trying photographs now. cant share the results though haha.
next im going to try scenery shots. should have good results.
the default values in this workflow made a scenery shot
no idea why this pans right but when I changed the content it starts to zoom in
beautiful
Okay here we go
Starting to get more interesting
It's like it masks and everything for you somehow
i think it depends on the colors of the input image, stuff with colors that imply depth tend to get more of that layered look and zooming motion from my experiments
what are prompts like for this stuffs?
and so you have to use it like image 2 image??
I'm making these txt2vid
in comfy?
Yup
yeah, I will try to animate some stuff I created with the Cnet Monster Module to compare with animatediff
I'm not even using control nets or anything
It's able to do this pure txt2img generation with just my LoRA, it's actually pretty sweet
great, I don't have a Lora for that, so if I need a logo, or Text on an Image I use the monster module https://streamable.com/4tv4sq
Yeh qrmonster is great, but that's for more placing text on the canvas and then generating over it. With my model, you generate the entire image from scratch without anything else, no control nets, img2img, text from fonts, etc https://civitai.com/models/176555/harrlogos-xl-finally-custom-text-generation-in-sd
got it, thank you for sharing, I will try tomorrow.
Awesome! I'd love to see what you create.
OlivioTutorials actually just did a video on it tonight, with a pretty thorough walkthrough for using it
I just sent u a PM, I just wonder what we can create with these new models with prompt scheduling and other things in the future.
If you take a peek at the gallery on the bottom of the model page, you will see that people have come up with some insane stuff already
the best is this one.
Results may vary
oh wow, could script an entire story with this
LOL this is exactly my experience so far as well
I'll give it to you, that one is top shelf, but there's so many other good ones
trying out @icy valley lora
Rainbow + pixel art = 
Hmm I wonder if the graffiti style would have it parallax from the wall 🤔
Damn this circular motion is dope!!
was meant to have the text 404, trying again 🙂
I dunno what causes that waviness tbh but when I got rid of it, all I get now is zooming straight in and out
it's just sort of random, the training vids only had a few types of movement so changing the seed for the sampler after the SVD node has the most effect on type of camera movement
Are you guys doing txt2vid?
testing out 404 now
Although I guess it's the same, you're just making the image to send it in the same workflow
well txt to image and image to SV
the seed before this for the one above was just a straight zoom, same img input....
although I guess I've defected to GlitchGang
nailed it 
you made any loading animation loras?
lora_panrightv2 mixed with your text lora might do it, but probably a bit more technical than that 😄
Just Harrlogos and the 404ra so far. But Harrlogos is unbelievably versatile.
Oh are you guys using AD motion loras with it?
no, i tried to hook them in but couldnt work out if that works yet, probably doesnt
Cuz like if you see the first animation I did
the motion is completely different
And ever since adding in my LoRA I havent been able to achieve anything like this again
tried to make the text 555
how are you adding loras?
hahahahaha yeah the 404ra is trained to do only that, although a buddy of mine made it to 1337 before
Same as any other Comfy workflow
can this run with 8gb?
Supposedly
im running it on 8gb
then yes
Ayoooo lets go
so like a normal lora not a motion?
I need to learn comfy, yall know a good guide?
Yes, I'm using my Harrlogos LoRA to do the text
i have made so many that don't actually say CATGIRLS that I gave up, have a CATRIS instead.... lol
Is there something like a controlnet for video yet?
lol it just came out dude
Hmm, SVD works with CGI, not bad...
You don't have a preview for the img?
I'm not expecting there to be anything like controlnet, I'm just saying it doesn't hurt to check with how fast this shit is going
I can tell in the first 2 seconds if the spelling missed so I can try again
oh yeah I just turn off the SVD bit until I get a good image, i'm just not having luck getting the logo part.... :/
Hm I dunno, rarely takes me more than a couple tries, maybe it's the specific text, or your resolution
yea, futuristic pants you fit on 😄
wish i had control over it
Duuuuuude this is CLEAN!!!
Ahah that little ear wiggle
yo is there a way to increase font size in comfy ui?
like, the default size? i just zoom in and out, it kinda sucks
sadge
I dont wanna zoom in when I'm messing with my prompt, I mean i don't need to but it's just nice to look at
Yooooo that's dope
yooooooooooooooooooooooooooooooooo
that magic seed man: 425324399642679
Thats probably the coolest shit I've seen yet
That's the best part
Damn! How did you get the letters to Animate
it just does it
here's my magic numbers.....
they're really nothing special tho, just seems random how the motion will turn out
It looks awesome! Thank you for sharing I'm gonna try it when I'm back at my pc
I think it has more to do with 'chrome' being in the prompt and producing letters with that shiny effect. I bet the training sets had lots of cars and shiny vehicles with reflected motions on them
That's actually sick because chrome is an activation word for my model I trained to do just that
magic man
just like it always seems to like animating anything cloud or smoke like in an image:
I bet space stuff will work pretty well
smokey monster dudes
That's awesome, what model are you using?
Midjourney, that's just straight img2vid
Is there a way to download stable video diffusion and run it locally?
yes that what people are doing here you need to use comfyui though
Thanks. Will it work on my 6gb vram
you don't need to use comfyui but it's probably the easiest way
theres a auto version?
not sure but it works on my 8gb card
Alr thx
hmmm. i've seen people getting it to work with 8GB VRAM, no one has mentioned 6.... try it for science and report back to us all!
Ok. Btw are there any tutorials for stable video diffusion?
most of the ones I've seen are just for getting it to even run
Oooo nice
Wdym
A classic banger brought back
like how to install ComfyUI or the streamlit version and set up a workflow that has SVD in it.... there's not a lot about how to get good results yet
I got ComfyUI today I just want to be able to get SVD running lol
there might be some for running a Google Colab version but i haven't checked any of those out
comfyui standalone -> comfyui manager -> grab someone's image workflow with SVD -> install missing custom nodes -> restart comfyui 😄
wait where do I download img2vid?
Thx.
theres a tutorial here #▶|stable-video-diffusion message
For the examples on here you only need comfyui itself, no need for any custom nodes: https://comfyanonymous.github.io/ComfyUI_examples/video/
is it possible to add text conditioning?
Can you use custom 1.5 models for stable video diffusion?
so it's img2vid workflow..... how you get the img is up to you
it's possible to pass these models text conditioning but it might not work very well in practice
you can easily attach your favorite 1.5 workflow to make the initial image....
Alr thx.
you can use the conditioning concat node with "conditioning_from" as the output of a clip text encode connected to the SD2.x CLIP model
will try this
what does augmentation level do?
whats the purpose of the image decoder here if the video generation seemingly works fine without it https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/tree/main
That's cheating! 
Damn that quality is wild
no, using MJ to Photoshop to make green screens to input into txt2vid SVD and then using keying in Premiere to combine layers is cheating (my quick and dirty proof of concept from earlier this afternoon):
That's goddang double cheating
how much upscale are you using
i made the mistake of fullscreening to see what you were talking about and now i'm going to have nightmares.....
lol
anyone get this error for img2vid?
did you download the svd model?
1024x576 > 1920x1080
can you show a screen shot of your workflow
Disable FreeU
So is there a way in comfyui to make a node chain that goes from txt2img straight into SVD img2vid in one generation?
Where
Nvm found it ty
Any way to get the js
seems like a good time to plug my engulfed by Lora 😛
What is freeu?
most of my shared Peace vids above do that, the vids should have the metadata in them unless Discord stripped em out: #▶|stable-video-diffusion message
and where do i disable it
hi, regarding svd and full-body videos, i wonder if theres a way to fix the awful faces in them?
is it possible atm?
https://comfyanonymous.github.io/ComfyUI_examples/video/
second example
add a face detailer
does that work in comfyui svds?
oh cool, which one do you use?
sent you a screenshot
I have a simple txt2img workflow if anyone needs
ok i have those but im getting errors importing them
saying i dont have a module called "segment anything"
Did you disable FreeU
how do i do that?
yeah, couldnt find how to on google either
yeah im so confused
are you getting the same error I posted earlier?
no, my error says "module for custom nodes: No module named 'segment_anything'"
hmm
we are not live in production for charging using the model yet
im getting cursed images when doing t2img to img2vid
which txt2img models you using?
can you please elaborate how to disable freeu? nothing is coming up on google or reddit
just delete the node
wdym, i get the error on startup
I combined what i had for img2vid with also txt2img
maybe the lora is the thing messing things up
it happens when i add the custom nodes to the folder, i start it up and it gives me an error saying i dont have segmentanything
do the images look high quality before they hit the img2vid portion?
not much
might have to add an upscaler
im just using common sense idk what im doing
lol
sdxl or sd1.5 or.....?
1.5
yeah maybe a double ksampler upscale type workflow to add details
let me check
this is iskariot's workflow, rename this to .json, you may want to change the default resolution to 1024x576 to run on an 8gb card
the lora node thing required me to install custom nodes, though you can disable it, its not needed
put this in your SD models folder
"goddang 54th dimensional vorpal blades...."
master epic E
Could you please share your comfyui workflow? These results are fantastic!
looks like a great prompt here!
based on sdxl lora https://civitai.com/models/176367/copperuxl-heavy-metals (i did it)
can you share topaz's settings?
ok my next issue, anyone got a workflow json for facedetailer and svd?
with comfyui search for faced and install , restart comfy, add face detailer loader and face detailer node just before the final step that puts it all together as a video
if i had a clean workflow id export but i have a whole bunch of purple nodes to ignore for now
what does the fps setting do? does it increase the context?
installing facedetailer breaks my opencv, i made a custom .bat file for any nodes that break opencv (a lot of nodes seem to! ) .\python_embeded\python -s -m pip uninstall -y opencv-python opencv-contrib-python opencv-python-headless
.\python_embeded\python -s -m pip install opencv-python==4.7.0.72
idk txt2img to img2vid not worth that much
beautiful 🙂
would you upload to civit?
you can if you like, i dunno how to 😄
just like you download a lora, there's a small post button on them... it helps creator stay creative 🙂
the add post button
ok done, though it posted as image not gif
(with face restore)
is it worse :D?
face restore probably better for mid shots / sd1.5 models
it puts them all by default in comfyui/output
you just change that last node from gif/mp4
Ty brother ❤️
fuck i've been sleeping on XL and comfy. Just installed that shit and I'm jawdropping on the ease of customization
yeah it's awesome, especially with comfy, just downloaded it a few hours ago
so poppin
I'm trying to figure out where to place those video format custom nodes xD
Do you have ComfyUI-Manager installed?
I just installed it and a few other things
It'll help you get the custom nodes
ty ty
Especially the Install Missing Custom Nodes command
and models
lol
and it's so efficient I can do a bunch of shit while generating videos on an 8gb card
lets feckin go
how did you get 12s!
i asked for 100 frames
12 sec of pure bad stuff
multi verse fairy nitemare 😄
earlier i was playing with some of the settings
not everything turns out good right away
it is 1am and im just installing nodes and workflows. need to sleep
literally same
got my grubby hands on the manager and am downloading everything
Although I wish there was something like tabs you could save as a system instead of workflows, but I guess I'll get used to it
i find it hard to get a good naming convention 😄 so i just put every main node in the file description 😄
its pretty awful for qol
comfy ui is super efficient for me at least, using 8gb vram
damn so if I have comfy ui I can get svd
Yesss
come join us brother
its so ez
Today we cover the basics on how to use ComfyUI to create AI Art using stable diffusion models. This node based editor is an ideal workflow tool to learn how AI art is generated, but also how you can really mess with the internal elements much more than you can with any other AI Art interface out there today. #comfyUI #stablediffusion
Install ...
@noble wolf is this the tutorial
nvm it isnt
but is there a tutorial to how to get svd
I think someone said it was the unofficial official guide
oh for svd its ez
get this and use the manager to auto install svd on comfyui
once you do that you can grab the workflow to follow off this https://huggingface.co/stabilityai/stable-video-diffusion-img2vid
SVD comes built in comfy now, the custom nodes should not be used
anyways I only have 6gb of vram but soon im gonna buy new pc
comparison grids of fps_id values
this is super clean!
so is anyone keeping track on the seeds and directions or is that a myth?
saving raw video generation, and a second one after postprocessing (face swap, upscale)
mostly gets the job done
test of 50 frames, tried 100 but OOM
was hopium 😄 but yea suspect its the shape of the objects generated *my hypothesis^
Is there a node that will save the generated frames out as individual images?
I just found ComfyUI-VideoHelperSuite, been hunting down that VHS_VideoCombine node I've seen in some workflows
I believe some sort of ffmpeg node would do the trick
Care to share link?
Just trying to figure out how to install it
Thanks a ton. I've got a long day of comfy ui setup tmw. Was testing it tonight and very impressed
What do you mean, I used your website to generate a video which deducted 20 credits
what does the fps setting do if the final frame count is always the set amount?
https://comfyanonymous.github.io/ComfyUI_examples/video/
at the bottom is a brief explanation of each
does fps increase vram usage?
just "Save Image" does that
Awesome, thank you!
It's my first day using Comfy
as for installing custom nodes, I recommend the manager: https://github.com/ltdrdata/ComfyUI-Manager
Just found that and got it working! Just figuring out how to apply it to the cog container I'm modifying
which one of these 2 guys use?
model
Is the decoder model needed for the example workflow?
no
it's not needed for anything, it's just named awkwardly
Oh good, I can save 9gb then
Heck yes. Idk how you people keep up with this stuff and find it, but thank you.
Oh blimey, that was quick
What achieves the size compression of such models?
using half (fp16) precision over full (fp32), for inference it doesn't seem to have any negative effects (so far)
I'm not an expert on that though
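A rough sketch of why the fp16 checkpoints are about half the size: each weight takes 2 bytes instead of 4. The parameter count below is a made-up round number for illustration, not the real SVD figure, and `weights_size_gb` is just a hypothetical helper name.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_size_gb(num_params: int, precision: str) -> float:
    """Approximate size of the raw weights in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical parameter count, for illustration only.
NUM_PARAMS = 1_500_000_000

fp32_gb = weights_size_gb(NUM_PARAMS, "fp32")
fp16_gb = weights_size_gb(NUM_PARAMS, "fp16")
print(f"fp32: {fp32_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
```

This only counts the stored weights. Actual VRAM use during sampling and decoding is higher and depends on resolution and frame count.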
I'm using comfy with normal stable diffusion
Howdo I get stable video diffusion?
update comfy
it doesn't seem to have any interface like A1111
and check the example there: https://comfyanonymous.github.io/ComfyUI_examples/video/
So I updated it
and I put the json in
I keep on getting an error
Error occurred when executing SVD_img2vid_Conditioning:
'NoneType' object has no attribute 'encode_image'
that's the error
The Lumiere brothers give kudos))
cfg scaled 2-3 means that you start with 2 and up to 3?
It's the video linear node that scales it from 2 to 3 yeah, along the animation, so it's not constant
Depends on the video, sometimes it benefits from more, but it can also burn it
Still testing different values, these aren't meant to be optimal or anything
they seem to be on a video case basis as well
Slow video burns if the end cfg is high while faster can benefit from increased detail against the blurriness that occurs
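For anyone wondering what "cfg scaled 2-3" means per frame, here is a minimal sketch, assuming the guidance value is simply interpolated linearly from the first frame to the last. This is not ComfyUI's actual node code, and `linear_cfg_schedule` is a hypothetical name.

```python
def linear_cfg_schedule(min_cfg: float, max_cfg: float, num_frames: int) -> list[float]:
    """Return one cfg value per frame, ramping linearly from min_cfg to max_cfg."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        # With a single frame there is nothing to ramp across.
        return [min_cfg]
    span = max_cfg - min_cfg
    return [min_cfg + span * i / (num_frames - 1) for i in range(num_frames)]


# Example: cfg ramping from 2 to 3 across a short 5-frame clip.
schedule = linear_cfg_schedule(2.0, 3.0, 5)
print(schedule)
```

So the first frame gets the low value and the last frame the high one, which fits the observation that a high end cfg can burn slow videos.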
Workflow? 🙏
I think this happens if the SVD model isn't correct
Awesome, workflow?
not too user friendly, just made it for myself to test stuff
Cool I appreciate it 🙏
So cfg is something different in video (comfy UI) from cfg when doing stills in auto1111? Correct?
not really different, just different scale to use with SVD, I find that it should be between 1-4, usually around 2
Alright, thanks!
Can I use any model like with stills or does it have to be video models?
only the video model for the video generation part, for init image it can be anything of course
In my previous post, there's json
So far, which one do you think is better, Stable Video diffusion or Pikalabs?
is it possible to run svd on an amd gpu?
thanks
The subtle movement of hair and clothing
these are a little rough. fun new toys to play with though
Ah, so ComfyUI SVD not working for Mac silicon - right? (I get the conv3d is not supported on MPS error)
Here's updated workflow, with sound notifications.
Hi, im getting this error while running on the Mac. 'NoneType' object has no attribute 'tokenize'
File "/Users/../ComfyUI/execution.py", line 153, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/../ComfyUI/execution.py", line 83, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ComfyUI/execution.py", line 76, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/*../ComfyUI/nodes.py", line 55, in encode
tokens = clip.tokenize(text)
^^^^^^^^^^^^^
what's the best way to upscale this and improve the quality of the faces?
It's a Stable Diffusion problem. The further you get from the camera the more the quality sucks.
Take a look at my workflow.
thank you 🙏
very nice! thanks
Yo did anyone run stable video with comfy on a mac?
Don't think it's supported yet. Cocktail peanut on twitter mentioned there should be support for it next week.
Thanks
@grim tangle Do you know what nodes you can use to export each frame of the video?
"Save Image"
it will save every frame in the batch as .png
thanks!
so what do you need for this? comfy and a certain model?
how do you prompt? like do you prompt the camera movement?
you don't prompt for motion. "motion bucket id" controls amount of motion, but that's it
so just prompt like an image?
no, you put image and it is a starting point
interesting. start image and end image?
just the start image (i'm also using face restore image for face correction)
looks like algebra
more like confusingai haha
really, that couldnt be any less 'comfy'
good morning
What are you all using for frame interpolation?
Not really. It is so flexible, that I can have face reconstructed, upscaled, added interpolation frames and written as 2 separate videos. All with a single click.
This is far from the most simple workflow.
JJ has a complicated workflow, it can be as simple as this:
nice. im dl'd it now so i'll be asking some questions im sure haha
what model do i need?
there's a SVD and SVDXT model, start with the SVD one: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid
noice. cheers for the link
Is there a node that can key out green from the image?
do i need all the files?
can you layer videos on top of each other in comfyUI?
If you install ComfyUI then also highly recommend to install comfyui manager, then you can just search for models directly inside comfy, there should be install videos earlier in this chat that cover getting SVD running in comfy....
How do you get this layout? This seems amazing
in theory you could steal that type of workflow from the animatediff crowd using masks and stuff but I haven't seen anyone try that yet. I did try taking my source vids into Photoshop to isolate subjects against greenscreen backdrop and then comp it all in Premiere using keying effect. Real quick and dirty proof of concept test I made yesterday:
it's a .bat file, you just run it by double clicking
i did, then i got that error i linked
did you extract the entire .7z file into a new location before trying to run it?
maybe i screwed that up
I am looking at ComfyUI Impact Pack and it looks like they have a node for SamDetector which separates parts of the image.
Maybe that can be used
hey, this is the workflow
sounds like you're in the right ballpark
oh there it finally worked, i did change the directory first time to windows idk
yeah I'll give it a go
ok so now i have comfyui up what is my next step
should i run the update thing first?
is it a fresh install? Then no. Load in the json workflow
and img2video model just goes in checkpoints right
download these jsons by right clicking and save link as
then open in comfyui
one of them
you can also take a look at my tut and follow along https://www.youtube.com/watch?v=hoIobzZmNiM
An easier way to generate videos using stable video diffusion models.
Stable Video Diffusion ComfyUI install:
Requirements:
ComfyUI: https://github.com/comfyanonymous/ComfyUI#installing
ComfyUI-Manager: https://github.com/ltdrdata/ComfyUI-Manager
SDXL: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main
you can use any...
just skip to the end
where do i put it when i save it
wherever you want. I put mine in downloads
oh ok thanks. do i need all 3 bottom files here: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main
and put them all in checkpoint?
just this one svd.safetensors
nice
Thanks!
they go in checkpoints like reg models right?
yes
yeah XT was trained on longer videos, once you get it all running and know your rig can handle SVD you'll probably use SVDXT the most....
ive got a 3080 with 12 gig hopefully i can do something and not melt it haha
that should do fine
I'm using 3060 12GB and it is enough.
❤️ The face swapping I do with this https://github.com/C0untFloyd/roop-unleashed
It's MUCH faster than doing it with ReActor in ComfyUI
And finally I convert it to gif with this https://www.screentogif.com/
I can share the settings for that also if you're interested
yeah that would be great
you switching to auto111 to use roop or....?
using the example, so you just load an image in and hit queue prompt? nothing more to it, not actual prompting or other guidance?
why do you need to convert to gif? Can't you just use the video combine node and then select gif in comfy
that's it
so roll of the dice it looks good or get something good haha
nope I use this https://github.com/C0untFloyd/roop-unleashed
It has a lot of control including face recognition and it's very fast (pretty much realtime for me)
and i take it this json i dl'd is optimized no need to fiddle with steps, cfg etc?
well maybe not cfg or steps, but more of the frame data and motion id
oh I see it's like a standalone thing
wheres motion id? and how many fps you using?
yes
nice
here are my current value settings:
these 2 settings
nice. appreciate the example dragon
video_frames max for SVDXT is 25
these models were both trained on 1024x576 pixel videos so you don't want to stray too far from those dimensions
I got some decent outputs using 1024x1024, but I haven't done that much testing on it and the renders took a lot longer.
yeah square output with sq input seemed to do well. i tried vertical tiktok sized videos and things got weird
when i did sq i did it around 768x768
how did it look at that size?
you want to multiply the numbers and get close to 0.6M (1024x576 is about 590K pixels)
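A quick sketch of that "multiply the numbers" rule, using the 1024x576 training resolution mentioned above as the pixel budget. Snapping to multiples of 64 is an assumption made here to keep sizes tidy, and `fit_resolution` is a hypothetical helper, not part of any SVD tooling.

```python
import math

TRAIN_W, TRAIN_H = 1024, 576          # training resolution mentioned in the chat
PIXEL_BUDGET = TRAIN_W * TRAIN_H      # 589,824 pixels


def fit_resolution(aspect_w: int, aspect_h: int,
                   budget: int = PIXEL_BUDGET,
                   multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) near the pixel budget for an aspect ratio.

    Both sides are snapped to `multiple` (assumed value, not a documented
    requirement).
    """
    scale = math.sqrt(budget / (aspect_w * aspect_h))
    width = max(multiple, round(aspect_w * scale / multiple) * multiple)
    height = max(multiple, round(aspect_h * scale / multiple) * multiple)
    return width, height


for aspect in [(16, 9), (1, 1), (9, 16)]:
    w, h = fit_resolution(*aspect)
    print(f"{aspect[0]}:{aspect[1]} -> {w}x{h} ({w * h:,} pixels)")
```

Note that 768x768 lands on exactly the same pixel count as 1024x576, which may be why square outputs at that size looked fine.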
what setting sets the length?
video_frames id guess?
and what is motion bucket for/do?
video_frames / fps = length in seconds
motion_bucket_id = amount of movement (higher number means more movement)
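To put numbers on clip length, here is a tiny sketch, assuming the duration is just the frame count divided by the playback fps. `clip_seconds` is a hypothetical helper name.

```python
def clip_seconds(video_frames: int, fps: float) -> float:
    """Length of the clip in seconds when played back at the given fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return video_frames / fps


# 14 frames (SVD) and 25 frames (SVDXT) played back at 8 fps.
print(clip_seconds(14, 8))
print(clip_seconds(25, 8))
```

So at low fps the raw clips are only a couple of seconds long, and adding more frames or lowering the playback fps are the only ways to stretch them before interpolation.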
they look fine
man I wish I tested out segment anything more. I am going to have to watch some tutorials on it
is there an SDXL segment anything model out?
the parameters are covered on the official ComfyUI Video Examples page but here it is again for our discussion:
`Some explanations for the parameters:
video_frames: The number of video frames to generate.
motion_bucket_id: The higher the number the more motion will be in the video.
fps: The higher the fps the less choppy the video will be.
augmentation level: The amount of noise added to the init image, the higher it is the less the video will look like the init image. Increase it for more motion.`
I recommend searching this from:kijai on discord, they made a lot of comparisons of different settings so you can clearly see what they do
in my workflow I output at 8fps from the SVD node and then use interpolation to smooth the framerate back to 24fps:
I've been doing just 5 fps and adding Rife, is that one better?
I believe the node is also using RIFE. Before I was making vids and then using Flowframes to do the interpolation but then someone asked how to do it on linux where Flowframes isn't available and it turned out someone had made a ComfyUI node to implement the same tech.
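A back-of-the-envelope sketch of the 8fps to 24fps idea, assuming an interpolator that inserts new frames between each adjacent pair of originals. The real RIFE node may count output frames differently, and both function names are hypothetical.

```python
def interpolation_multiplier(source_fps: int, target_fps: int) -> int:
    """Whole-number multiplier needed to go from source_fps to target_fps."""
    if source_fps <= 0 or target_fps % source_fps != 0:
        raise ValueError("target_fps must be a whole multiple of source_fps")
    return target_fps // source_fps


def interpolated_frame_count(num_frames: int, multiplier: int) -> int:
    """Frame count after inserting (multiplier - 1) frames between each pair."""
    return (num_frames - 1) * multiplier + 1


mult = interpolation_multiplier(8, 24)          # 3x
frames_out = interpolated_frame_count(25, mult)
print(mult, frames_out, round(frames_out / 24, 2))
```

The clip stays roughly the same length in seconds. It just has more frames per second, which is why the motion looks smoother.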
The one you used is a different model
ahhh, that makes sense. Thanks!
The node for RIFE is RIFE VFI
They're your nodes right? Which ones do you think would work best?
Depends on the input. GMFSS for anime, RIFE for both 3D and anime and FILM for large motion (large change between frames)
OK so - for more realistic shots, upping the resolution to 1024 really helps. however, it gets super janky with artistic images.
also using upscale moved a 24MB file into a 700MB file
proooobably not viable lol.
how much are you upscaling by?
are you upscaling it to 16K or what?
good resource if anyone is looking for nodes. https://ltdrdata.github.io/
lol - I went from 512 and did the esrgan model. in comfy you cant specify - so what does it do, 4x?
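Upscale models generally have a fixed factor baked into the model file, so the node has nothing to specify. A tiny sketch of what that means for frame size, assuming a 4x model (the factor is a guess about which ESRGAN model was loaded):

```python
def upscaled_size(width: int, height: int, scale: int = 4) -> tuple[int, int]:
    """Output dimensions after a fixed-factor upscale (4x assumed here)."""
    return width * scale, height * scale


def pixel_growth(scale: int = 4) -> int:
    """How many times more pixels each frame has after upscaling."""
    return scale * scale


print(upscaled_size(512, 512))  # frame size after a 4x pass
print(pixel_growth())           # pixel count multiplier per frame
```

Sixteen times the pixels per frame goes some way to explaining a big jump in file size, though the final number also depends on the output format and compression.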
oh i see. i've seen other methods where you take the first ksampler output and then run it through a basic pixel multiplier and controlnet to increase dimensions and then another ksampler with lower denoise value to get more detail. that was for animatediff workflows. i'll probably try that on the SVD workflow today
oh yeah, ive done that before too, maybe i should give that a shot. this one is where A1111 feels like it has superiority, you just kinda run it.
at that, I could probably apply a little bit of a prompt too.
yeah
using this model
is this the compressed checkpoint?
are you changing the seed randomly or using same seed for all of them?
randomly
some seeds are just shaky
those videos are too long, what are your settings?
what's your motion bucket set at?
SVD is nice but soooo slow
try normal scheduler
MJ source?
80 frames is way too many, it was trained on 25 max
I just noticed, maybe your resolution is too low
@sharp kestrel originally yes but some of that was run through SD afterwards and then into SVD
it's beautiful. wasn't meant as a jab.
just making sure i'm still on edge 😛
ohhh what have i missed in here .... hahah
anyway, once again plugging my engulfed Lora... it's great for smoke filled scenes...
https://civitai.com/models/196290/engulfedby-xl-your-future-is-shrouded-in-smoke
Engulfed_by is a surrealistic, artistic and fun way to pour some smoke into your generations. Lora was inspired by Explosive Dust lora for 1.5 but ...
I'm looking forward to the day when we get AI that will make the gears and what not on her head move
i was trying so hard to get some gears with AnimateDiff last night.
you get the blinking from SVD?
Yeah, sometimes SVD does that
nice
did you try the same seed afterwards with different portrait?
I feel like at 512x512 there's a bit of loss, though. Trying 768 and hoping for a happy medium haha
yes, I fixed seed on 425324399642679
and it works?
I hope so 😃
lol
lol now i know why nobody responds to my statements... i was in the openai dalle discord talking about this haha
but damn this is wild, like moving lips, hands, not just pan and scans on some images
I did my son's school photo and his hair and fingers were moving, it was pretty cool.
yeah some of the motions surprise me, i love it. is there a default max length?
i wonder how far away we are from even 1 minute of video
it was set for 24 frames but I have done 64 just fine.
i'll give 64 a shot then
I was trying to generate a video but I got "ComfyUI_windows_portable>pause
Press any key to continue . . ." and when i press a key it stops comfyui what did i do wrong
this happens running both cpu and gpu
youll have to screenshot your workflow, unfortunately I am not an oracle.
dang dude. that's amazing. its crazy how a few seconds of video can make it so much more immersive.
you might have hit your VRAM limit, try lowering some settings maybe
like what i'm new to comfyui
are you trying to just run comfy?
yeah it's like instant After Effects 2.5 motion
Midjourney
Comfy works fine and generates images fine and all that. i have 6gb vram on a 1660 ti. I'll send image of my workflow in a moment
Mind sharing your workflow? I just want to compare notes. I feel like I am close, but yours feels just awesome.
give us the secret sauce dragon haha
ummm... be good at midjourney... lol
im bing/dalle until 6.0 haha
768 is significantly better, it seems. here's a side by side. only thing is though, lower res seems to move around more.
so maybe 512 with upscale would be the trick.
where do you upscale?
i mean actually though midjourney does all the heavy lifting, I didn't even spell steampunk right:
what about your comfy settings though is what i think we meant
33 for motion bucket, is that just a random number you chose or a recommended one?
for less photorealistic source images lower motion bucket tends to blow up less
Here's my workflow
this shiz is fun as hell though
just having a good source image is half the battle
like i was experimenting with text overlays in photoshop so this without text seems fine
but adding text really changes what happens
nice
I dont see anything that stands out. 14 FPS might be a little high - you can interpolate that. Also, I don't like WEBP at all, if you do video combine, you can save as MP4 or GIF. I do gif for now because it is the best for sharing online.
alr thx
if you can screenshot the error - if you have one - it might help.
where do you change the file format?
use that one instead of the webp one
we're using a different node than the one from the video examples on the official comfyui page
oh. do you have a setup for custom nodes?
load mine in... that should have the metadata in it, if discord doesnt delete it
im using a json i downloaded from somewhere, its been so much going on im not sure haha
what do i do with the pic to get it load in?
save it, then click load, and select my image
if the metadata is there, itll load. if not, i can share the json output instead
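If you want to check whether a downloaded image still carries a workflow before trying to load it, here is a rough stdlib-only sketch. It assumes the workflow is stored as JSON in an uncompressed PNG `tEXt` chunk under the key `workflow`; if the image was re-encoded on upload, the chunk will simply be missing. The function names are made up for this example.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Collect keyword/value pairs from uncompressed tEXt chunks in a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt" and b"\x00" in body:
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 8 + length + 4
    return chunks


def embedded_workflow(data: bytes):
    """Return the parsed workflow dict, or None if no workflow chunk is present."""
    raw = read_png_text_chunks(data).get("workflow")
    return json.loads(raw) if raw is not None else None
```

Usage would be something like `embedded_workflow(open("image.png", "rb").read())`; a result of `None` means you need the json file instead.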
error
yeah im not sure it changed if you could share the json thatd be tight
im sure im just doing something wrong
when i press a key to continue it just terminates comfyui
and hmm, should i have a VAE or not needed?
heres where im at right now, though ive just messed with the settings
no
awesome thanks raydestar
lol the settings just created an abomination so youll have to tweak it - but - it'll get you mostly there.
looks like i need some files maybe
When loading the graph, the following node types were not found:
RIFE VFI
FaceRestoreCFWithModel
FaceRestoreModelLoader
VHS_VideoCombine
do you have Comfy Manager installed?
not sure, i just dl'd comfy an hour or so ago so whatever is standard id have i guess
https://github.com/ltdrdata/ComfyUI-Manager install this first thing it'll make your life 100x easier when working with ComfyUI stuff
click install missing custom nodes
once you get the manager
after Manager is installed when you load someone else's workflow it'll help you find all the Missing Custom Nodes and models
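The Manager does this lookup for you, but as an illustration of what "missing node types" means, here is a minimal sketch that compares the node types a workflow uses against a set of installed ones. It assumes the saved workflow json has a top-level `nodes` list whose entries carry a `type` field; the `installed` set is hypothetical input you would have to supply yourself.

```python
def missing_node_types(workflow: dict, installed: set[str]) -> list[str]:
    """Node types referenced by the workflow that are not in the installed set."""
    used = {node["type"] for node in workflow.get("nodes", []) if "type" in node}
    return sorted(used - installed)


# Example data only; a real workflow would come from json.load on the saved file.
example_workflow = {
    "nodes": [
        {"id": 1, "type": "KSampler"},
        {"id": 2, "type": "RIFE VFI"},
        {"id": 3, "type": "VHS_VideoCombine"},
    ]
}
print(missing_node_types(example_workflow, {"KSampler"}))
```

Anything in the returned list is a custom node pack you still need to install, then restart ComfyUI.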
and run updates. lol everything updates all the time, it's a result of being on bleeding edge tech
BTW here are my workflow files (they both do the random pause and then when I continue crash. The image I sent is of the txt-img-vid)
you try updating too. it looks like it's looking for a value that doesnt exist, maybe running updates would fill that in.
after updating should i restart comfyui
when did you first install ComfyUI?
yesterday evening
thanks for all this help btw everyone
oh yea thx too
woot and the manager appeared
it sounds like you're missing some kind of library or something.
you running the nvidia.bat to start?
I had problems with my older ComfyUI running the new update after they added SVD so I grabbed a newer build:
Has anyone made video diffusion that runs locally yet?
that one has newer torch/cuda built in which is needed to run the SVD model
yeah
OK this one is using dragons setup. At first I tried at 512x512, and booyyyyy it did not like that. Also, I was using SVD before
what model is that instead?
SVD_XT
whats the image decoder version of the model for?
lol i dont have a freaking clue, i tried running it and it seemed exactly the same.
I think I am close to figuring out how to create layers in SVD. Anyone have any insight into this error? @grim tangle @severe moon
the tensors don't match. high level math stuff. something being output is not compatible with where it's going.....
btw how can i speed up loading the SVD_img2vid model while keeping it in low vram mode?
there's that fp16 version, you might try that. I dont know if it offers quicker load speeds, but it could.
once it's cached though, you don't need to wait so long.
where does it cache
lol this turned out just awesome.
lol I love it
lol no idea, honestly, i dont know those kind of tech details. just that once it loads, it doesnt take as long.
whoa! the chains are moving!
Alr. Fingers crossed its not vram
Also where do i download fp16 version
hmm someone linked to it earlier
I found this one. As always, use at your own risk, i found the link on reddit. https://huggingface.co/becausecurious/stable-video-diffusion-img2vid-fp16/tree/main
where did you get compressed checkpoint?
theyre safetensors though, which is good
alr
My first ever SVD. The potential is kinda mind boggling
dang. that one is really good.
Oh wow
When this thing gets easier UIs and better control of motion / expressions etc it will be revolutionary for sure
Just wondering, whats the lowest vram someone has been able to run SVD on so far?
One thing that would be cool is embedded alpha channels to be able to use layers in a VFX environment
Im using 3080 10gb
While I was running this piece of code, I encountered an error. Can you tell me what the issue is? How should I go about fixing it?
what does this mean? (You shouldn't move a model when it is dispatched on multiple devices.)
if you find it for comfy, i'll be happy to know about it 🙂
HOLY CRAP IT WORKED ON 6GB VRAM
awesome thanks
I was dumb and didn't set the resolution right so it is cut off but it worked
Sure, I'll try to solve this problem on my own. If I find a solution, I'll let you know. Meanwhile, I would greatly appreciate any advice or insights you might have.
only took 30 mins lol
i dont have any, i never used it and i dont use auto1111
I did it boys I figured out how to layer videos in ComfyUI. @severe moon
needs to do some cleanup
but the nodes are working
you mean 2 'videos' together? noice
yep
sweet
you're right its so much faster running it a second time
what are we seeing here? green screen?
yeah something like that
Do you have any suggestions for making an infinite-zoom video?
Shit, managed to reply to the wrong message. Sorry bout that
I am going to try an example with 3 layers and cleanup the nodes then I'll share it
no worries
i did it once in MJ outpaint and then used a video editor to stitch it together
If SVD can work on 6gb vram, could it theoretically, just very slowly, work on 4gb or even 2gb? I have some cards like that I could try if I wanna suffer lol.
heck, you can always just run it on CPU. I dont think its a question of if it can, just a question of how long you are willing to wait.
Awesome, possible to output an alpha channel you think?
My cpu is much slower than my 1660 ti
So, was this made with comfy? Or some other diffusers?
probably yeah. Use Save Image after vae decode
Cool!
SVD was added to ComfyUI in an update a couple days ago so anyone can run it locally once they have the checkpoints
they all work
i even tried LCM it looks all blurry most of the time tho
i'll try a different one but i don't think it makes a huge difference so euler seems fine for testing stuff out
I went through a lot of them after I got it - most of them sucked haha.
sticking with euler raydestarr?
https://www.youtube.com/watch?v=NN8jfMZVzZ8
runway x stable diffusion colab
#runway #mixtape #aianimation
This is the second track of my upcoming mixtape. This track is titled 'Lightyears' it's a bit of an experiment, both musically and audio-visually hehe.
I wanted to produce something a little spaced out and eclectic, fusing a few influences and lots of colours, cause we could use some colour in these grey saturate...
Can someone link to the custom nodes needed for video diffusion in the pins?
need to update comfyui
Ah, so i'm simply just outdated :P
yep
There's a fine line between really bad and really good
I hope I didnt just mess up the settings with my next iteration.
changed from euler to fancier sampler, definitely some different motion
Looks interesting. Could you share the settings?
For the Screen2Gif?
throw in facial restore, it might fix the janky eyes. that looks way good.
oh yeah how do i get facial restore again
now that i have the manager
oh i think i see
I use the model, and the loader.
sweet. nice and ez
euler_a taking a break or somethin......
lol
well. that one sucks
could just be that seed for euler_a
should i install all facerestore? theres 3 options
a lot of this is just experimentation and seeing what works best for you. you can always reload my model and then say install missing nodes. you might have to download the face restore model separately.
whoa
yeah euler_a with seed increment added motion
those are wild jj
so if i installed face restore gfpgan 1.4 once its installed and i refresh im good? nothing to click or check in the ui right
hmm, what can cause the blurriness?
A lot of things can. CFG, wrong sampler, wrong bucket ID, stuff like that.
Using the one in the pins for 25 fps
Just set res to 1024, fps to 25, 5 second clip and motion to 255
Awesome!Would love to see the workflow!
what do seeds control for img2vid, how it moves?
i think the way it moves is kind of random/rng, but i believe that's more down to motion-bucket-id, sampler, fps, steps
motion-bucket id is how much it moves, fps is just how fast it goes through images, and steps is just how detailed each image is gonna be with newly generated parts no? That's why I was assuming seed had something to do with the "how" of it moving, maybe not idk. It takes too long for me to want to test each individual setting. I need to get more vrammmm
cause this one is only 20 steps, but 100 motion-bucket id, so its moving a bit, but the steps can't keep up to make it look decent, at least I think?
oh yeah, how do i change the format for the file again?
install that through comfy ui manager
do u have the manager?
and then there should just be a custom node called "video combine vhs"
has lots of formats
oh ok i think i see
Why face upscale is important (and frame interpolation, too)...
face upscale? is that different than face restoration?
and where do I find both of those, face resto and upscale
Face restoration, of course. My mistake.
But upscale is also part of the process - makes detail better and helps face restore find faces (i think).
is it this? i installed but still dont see it
Wait.. I might have read the info wrong. when it states 25 frame gens, that means total frames? Or fps?
restart
always restart after new install
"refresh" button doesn't always work
specifically for custom nodes I think? maybe
oh ok. does it need a certain workflow json too?
question here, how do i install SVD with the github repository? i downloaded the file the repository provides, but i dont know what to do after that
got it now. thanks
you can use the images here for workflows I believe
it has a basic setup for running video gens
or wait maybe not the right link
one sec
do you recommend any other installs for video
sorry here
uhm any other installs? hmmm idk sorry honestly I downloaded it last night
haha just making sure i wasnt missing something.
just been messing with it since then xD
ahah ok
I learned comfy ui last night at 1am 😛
could anyone answer to this? i do know how to use stable diffusion image gen locally, i dont remember how i did it but i wanna know how to use this locally in my computer but i have no clue
its included by default now I believe
click this
drag one of the video onto your comfyui for the workflow
it should be self explanatory, put image in the left, set the settings fps/frames/movement and stuff
duskal did you install any face upscalers?
Ah no not yet!
oh
I've been messing with forest and shits mostly
not with good results, only face I did was up close
not bad!
its only not warping the face because I'm super up close
if anyone is finding a workflow to extend the video, please let me know, I'll light a candle for you or smth.