#▶|stable-video-diffusion
1 messages · Page 8 of 1
I purchased this.. is there somewhere an interface for or is the only way to install the repositories? Thank you for signing up for the Stability AI Non-Commercial License. We're thrilled to have you join our community of builders and creators!
As a Non-Commercial member, you’ll benefit from:
Our full suite of Core Models.
SD3 Medium non-commercial use
And that output inspired you to try it out lol?
i think ther is a bot for videos called artisan. the chat in the server has that name
Generate a video of a society based on 1984 book by George Orwell where all city is controlled by cameras
stable-video-diffusion
im getting black boxes for gifs. do i have to use specific samplers?
https://i.imgur.com/JuV7QYM.png any ideas? just using the basic workflow example with defaults. i changed the resolution
hindi song with girl
im attemptiong to get sora working on windows with zluda... doubt it will work
is xformers integral?
Ai Generated with Stable Video Diffusion
Models used: epiCPhotoGasm ultimateFidelity and Stable Video Diffusion - SVD - img2vid
#ai #aivideo #stablevideodiffusion #stablediffusion #liminalspace #rtx2060
has anyone managed to run this on Mac?
If I try to generate more than 12 frames, it crashes on startup
CPU: Apple M3 Max (16) @ 4,06 GHz
GPU: Apple M3 Max (40) [Integrated]
Memory: 104,43 GiB / 128,00 GiB (82%)
"Vibrant sunset sky, golden hour, dramatic horizon, expansive view, breathtaking colors, atmospheric glow, 8K resolution"
I've been experimenting with SD model fine-tuning for these past few weeks, and this one right here strike me as a hella-interesting one. Hopefully it does for you aswell.
This new system includes: TouchDesigner audio-reactive system ➜ SD/WP parameter configuration files ➜ Custom LORA [Electron Microscopy Style]
You can access these, plus many...
Imágenes de personas realizando diferentes tipos de terapia de frío (duchas frías, baños de hielo, crioterapia).
Images of people performing different types of cold therapy (cold showers, ice baths, cryotherapy).
/creat
🙏 Thank You for Watching
► 🔔SUBSCRIBE NOW🔔 https://www.youtube.com/c/ARTificialDreams?sub_confirmation=1
⇩ More info below ⇩
I hope this Incredible AI generated Animation has Blown your Mind!!!
Hi! Welcome to ARTificial Dream, where digital ART meets AI machine learning!
On this channel, I use cutting-edge neural networks and various techniq...
hey guys, anyone using stability_ai image to video api? but i m facing some issues in using this api
By the peaceful lake, a panda eagerly plays its guitar, making the entire environment lively. The calm water surface under a clear sky reflects this scene. Bright flowers bloom around, butterflies flutter, and birds sing. The sun sets, casting a golden glow, blending realism with the lively spirit of giant pandas.
trump and biden dressed as clowns driving off a cliff
Why Comfy scares mehttps://youtu.be/O3NzGSHjj4s
#aiart #stablediffusion #comfy #comfyui #stablevideodiffusion #stablevideo #imageai #videoai #aimusic #mistralai #udioai #blufftitler #parody #sarcastic
The Ballad of Comfy UI is a funny little video clip about the Comfy UI webui interface for Stable Diffusion, which is allegedly the superior interface !
All images and videos are AI made, gene...
Sd and Svd made
Baia was not just an ordinary city; it was a true playground for the rich and powerful. Located in the Campania region, in southern Italy, Baia offered innovative thermal baths, majestic villas, and magnificent temples. The thermal waters of the region were famous for their healing properties, attracting visitors from across the Roman Empire. The city, filled with luxurious palaces and stunning gardens, reflected the grandeur and decadence of an era of excess and ostentation.
Hey @lyric snow, quick question :
Just saw this in my logs :
Launching Web UI with arguments: --skip-torch-cuda-test --opt-sub-quad-attention --upcast-sampling --no-half-vae --use-cpu interrogate
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabledTo create a public link, set
share=Trueinlaunch().
Startup time: 8.3s (prepare environment: 0.2s, import torch: 2.7s, import gradio: 0.6s, setup paths: 0.8s, initialize shared: 0.2s, other imports: 0.8s, load scripts: 1.0s, initialize extra networks: 0.3s, scripts before_ui_callback: 0.3s, create ui: 0.8s, gradio launch: 0.4s).
Applying attention optimization: sub-quadratic... done.
Model loaded in 7.6s (load weights from disk: 1.1s, create model: 0.9s, apply weights to model: 4.9s, apply half(): 0.3s, calculate empty prompt: 0.2s).
It works but I'm wondering if maybe it couldn't go faster by fixing this (used your script to install)
xformers is for Nvidia users, so you can just ingnore those messages
you can "fix" this only by buying pc with nvidia 🙂
Sorry I just realised I was in the wrong channel, moving this discussion to #🤝|tech-support
/video to video
Turtles Love Slow - An AI Film Created with Stable Video Diffusion
https://youtu.be/HkHBpcixNNM
Music By DJ Rocswell, available now on all streaming platforms
Join Ted, a brave young turtle, on an epic adventure to reunite with his beloved Kai in our heartwarming AI animated film, "Turtles Love Slow." Set to the soulful beats of DJ Rocswell's "Love Slow" from the album "Your Attention," this enchanting tale will captivate audiences of all ages.
Ted, a determined young turtle, embarks on a thrilling j...
Amygdala is out on all platforms through NonGrid
Ai Generated with Stable Video Diffusion
Models used: epiCPhotoGasm ultimateFidelity and Stable Video Diffusion - SVD - img2vid
#ai #aivideo #stablevideodiffusion #stablediffusion #liminalspace #rtx2060 #backrooms
Psy New Face moving music background
did he just fart a shoe?
Witness the magic of AI art! In this episode, we use the power of AI to breathe life into static artwork, creating stunning animations in just a minute.
The source images for this video were generated using AI and obtained from the internet.
For any attribution or copyright inquiries, please contact us at onlyaifortomorrow@gmail.com.
#aiart...
Do the SVD lcm checkpoints only run with comfyui? I always get an error message at forge.
No, I could make them work with Forge (although the new Forge by Panchovix), but the results were horrible
I just put the Comfy-xvt1.1 checkpoint in my svd/model folder and I didn't get errors. But results were worse than the usual xtv1.1
Ok, the official checkpoint is too slow for me. Wanted to test lcm because of speed. it takes 20 seconds with my 4070 TI in 768 resolution. That's too long for me.
Create a highly realistic and dynamic image of the Indian cricket team celebrating their victorious moment after winning the Champions Trophy. The scene should capture the exhilaration and joy of the players as they celebrate on the cricket field. Use vivid colors and sharp details to portray the players in their blue uniforms, some holding the trophy high, others embracing, and some jumping in joy. Include elements like confetti raining down, fireworks in the sky, and a jubilant crowd in the background. The expressions on the players' faces should reflect pure happiness, pride, and excitement. Ensure the setting is a well-lit stadium, with bright floodlights, a lush green pitch, and the Champions Trophy prominently displayed. The image should evoke a sense of triumph and national pride, making the viewers feel the energy and emotion of this historic win.
Specific Details:
Players' Emotions: Capture various emotions like shouting with joy, tears of happiness, and players lifting each other in celebration.
Team Unity: Show the players in a close group, arms around each other, symbolizing team spirit and camaraderie.
Trophy Display: Ensure the Champions Trophy is clearly visible, being held by the team captain or a group of players, reflecting the significance of the win.
Background Elements: Include a cheering crowd, waving Indian flags, and banners with congratulatory messages, adding to the festive atmosphere.
Action Shots: Some players could be shown spraying champagne or doing victory laps around the field.
#1237459938901491852 Create a highly realistic and dynamic image of the Indian cricket team celebrating their victorious moment after winning the Champions Trophy. The scene should capture the exhilaration and joy of the players as they celebrate on the cricket field. Use vivid colors and sharp details to portray the players in their blue uniforms, some holding the trophy high, others embracing, and some jumping in joy. Include elements like confetti raining down, fireworks in the sky, and a jubilant crowd in the background. The expressions on the players' faces should reflect pure happiness, pride, and excitement. Ensure the setting is a well-lit stadium, with bright floodlights, a lush green pitch, and the Champions Trophy prominently displayed. The image should evoke a sense of triumph and national pride, making the viewers feel the energy and emotion of this historic win.
Specific Details:
Players' Emotions: Capture various emotions like shouting with joy, tears of happiness, and players lifting each other in celebration.
Team Unity: Show the players in a close group, arms around each other, symbolizing team spirit and camaraderie.
Trophy Display: Ensure the Champions Trophy is clearly visible, being held by the team captain or a group of players, reflecting the significance of the win.
Background Elements: Include a cheering crowd, waving Indian flags, and banners with congratulatory messages, adding to the festive atmosphere.
Action Shots: Some players could be shown spraying champagne or doing victory laps around the field.
https://www.stablevideo.com/generate/7b1a3c75-668b-4c7a-827b-6976ae8071c0 first attempt on Stable Video, a four seconds realistic video presents that mountain wildfire burning, hope you enjoy!
Create an image of a medieval village scene. The centerpiece is a large, grand church with tall spires and blue roofs. The church is surrounded by several traditional medieval buildings, including:
A large house with a red, pointed roof and white walls with wooden beams.
A small house with a blue roof and stone walls.
A tavern with a yellow thatched roof and brick walls.
A building with a white and brown facade and a black roof.
Multiple windmills in the background, each with white blades and brown bases.
A few market stalls with blue canopies near the houses.
The village is set in a lush, green landscape with numerous trees and a clear path connecting all the buildings. The entire scene should have a bright, vibrant, and cheerful atmosphere."
Create a video of a boy jumping off a large crocodile head
Кассир из пятерочки в депрессии, работает с утра до ночи, устала очень, мало зарабатывает
Native Americans seeing Columbus's 3 ships arriving for the first time
/search workflow
My seemingly stable Fooocus program stopped working today, it just wont open, can anyone help please, thank you
#1237459938901491852 A crown of bones and snakes
Is there a better version of animatediff?
so is video perfect yet? is it worth comfyui-ing it?
is this as good as luma ?
been 6 months since i tried messing with it and thats the equivalent of 25 years in other fields
hoi, has there been made custom motion models that you can recommend to get awesome/hilarious generations with? And either better at, or a different workflow that can blend better than the months old motion models?
can get any help to improve my img2video?
should I use specific checkpoints to generate images
feels like runnign slots or mining for crypto lol
just random chance u will get a usable result
more details
do you guys know of a node that takes latent resolution and source img to video, and has amount of frames to generate from said image?
Great stuff, loved it! Have you done that through the API endpoint or the app?
I'm trying brah xD
Hi folks, I'm kind of lost, I wanted to try the sd video models by only installing the repo, without a gui, the release description for SVD and SVD-XT leaves me confused about how to actually use the models, there is no example /:, I assume I shouldn't need another gui for sd
I created a checkpoints folder and put them in there, I also followed the intallation description for python packages
does anyone know how to make this?
Good day to all, I read the news that SD has released a new video generation, can I find out more about this? We are talking about 4D video Stable Video 4D, Is it possible to install this neural network on your computer and work locally?
Hey guys, I just wanted to share a mock movie pitch I made with SVD. Hope you guys enjoy it! 😅
https://youtu.be/Mu8TLCdiFvg?si=iatjnUvJmv9FrkA8
Alex Furlong is about to retire... and enter the year 2029. After his tumultuous experience as a freejack, Alex legally assumes the identity of Ian McCandless and marries Julie Redlund, who becomes Julie McCandless. Disinterested in the prospect of running a mega corporation, Alex hands over all corporate responsibilities to Julie and embarks on...
Mein erster mit K.I. erzeugter Song nebst Video.
Es wurden 4 künstliche Intelligenzen mit unterschiedlichen Werkzeugen verwendet.
Habt Spaß und teilt mit euren Freunden :)
Twitter: MojoYates_SL
stabilityai/stable-video-diffusion-img2vid-xt "how to work with this model"
Is there a pretrained controlnet (with canny support) for stable video diffusion.
Help
ConfigKeyError: Missing key devices full_key: devices object_type=dict
Traceback:
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\streamlit\runtime\scriptrunner\exec_code.py", line 75, in exec_func_with_error_handling
result = func()
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 574, in code_to_exec
exec(code, module.__dict__)
File "C:\Ai\stable-video-diffusion\generative-models\main.py", line 655, in <module>
gpuinfo = trainer_config["devices"]
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\dictconfig.py", line 375, in __getitem__
self._format_and_raise(key=key, value=None, cause=e)
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\base.py", line 231, in _format_and_raise
format_and_raise(
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\_utils.py", line 899, in format_and_raise
_raise(ex, cause)
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\_utils.py", line 797, in _raise
raise ex.with_traceback(sys.exc_info()[2]) # set env var OC_CAUSE=1 for full trace
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\dictconfig.py", line 369, in __getitem__
return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\dictconfig.py", line 442, in _get_impl
node = self._get_child(
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\basecontainer.py", line 73, in _get_child
child = self._get_node(
File "C:\Users\tihan\.conda\envs\genModelVideo\lib\site-packages\omegaconf\dictconfig.py", line 480, in _get_node
raise ConfigKeyError(f"Missing key {key!s}")
Did you try asking ChatGPT maybe it can help
What setting in SVD is responsible for the time ? Generated video - only 2 seconds, how can I increase it to 5-10 seconds?
Is there a pretrained controlnet (with canny support) for stable video diffusion.
Hi everyone. Noob here needs help running stable video diffusion. 😦
I tried installing ForgeUI but train, svd, and z123 tabs aren't showing in the user interface for some reason.
Can SVD be installed in automatic1111?
Try asking ChatGPT because thisa channel is quite inactive
Okay, thanks.
分析架构
I'm planning to use stable video diffusion in a while for the first time, can I dm you? I was reading the earlier comment that this channel is kind of inactive.
You can ask me anything here
Ok. I'll try to ping you when I get confused if that's alright.
Yes it can, but I like comfyUI a lot more for it
Sure thing!
There are no controlNets for it or any control at all tbh
Please use #🤝|tech-support for any tech questions!
This is animateDiff with a motion lora
What augmentation/motion numbers?
@tepid stream I'm trying to figure out how to download the requirements listed under the generative models repository so that I can use the models I'm looking for. Currently trying to download the invisible watermark thing for that. I found my cuda gencode, but the middle section in the following code given is tough for me to find:
find your GPU's gencode here, and set the PYCUDWT_CC environment variable to it
for example, for an A100, it would be "80" for SM80:
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
export PYCUDWT_CC=80
also, load your CUDA version and set any and all needed environment variables. this might
depend on your CUDA version + GPU type. I'm using CUDA 12.1 on an A100
module load cuda/12.1
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
export CUDA_PATH=/usr/local/cuda
export PYCUDWT_CC=80
export CUDAHOME=/usr/local/cuda-12.1
install !
pip install invisible-watermark-gpu --no-cache-dir
These are for the files and gencodes that were posted. As I said I found my specific gencode, but the module and files described throughout that section are not found in the files for my cuda. Is there another thing I should look for that would function to replace those things?
Should I even be downloading this one? The requirements didn't specify gpu or cpu, so I'm trying the gpu one but idk if this will be an intensive enough software to warrant gpu use? Like is it just a 1-2 second thing or will it take some time to generate if I go with cpu use? Are there other required programs that I should use gpu use with instead to optimize speed?
I dont think you need all that from glancing over it, at least I dont remember doing that
Ok. Do I need to download the repositories listed under required though? Maybe just not the gpu version of the watermark one?
I just use comfy tbh, you get the nodes, load the model and you should be good to go
No need to fight with the repo
Ok.
Hello people, does anyone know how videos like this are created: https://www.instagram.com/reel/C7LtLAnMTzJ/?igsh=MWVtbmpqdnU5dnpxZA%3D%3D
The little chrysanthemum is reaching for the light.
#flowers #magic #garden #dance #nature #love #chrysanthemum
Original dance: @ambernovella
239544
hi eveyrone, so what's the latest on SVD?
last time i checked 1.1 xt model was the thing.
anything came after that?
how about motion loras? training? prompting?
Thats with animate diffusion and an input video
Nothing new
Bummer
Id check out banodoco if youre interested in videos with SD
thanks, banodoco is my regular space 🙂
Does someone tested cogvideox ai ?
This was my prompt, "A high-definition video of a vibrant, glowing celestial body, resembling a sun or star, set against a backdrop of the night sky. The star appears large, radiating intense heat and light, with a surface texture that is fiery and turbulent, showcasing a dynamic, almost molten appearance. Surrounding the star are wisps of clouds, tinged with warm colors like orange and red. The sky behind the star is dotted with numerous smaller stars, creating a beautiful, starry background that adds depth and contrast to the scene. The overall atmosphere of the image conveys a sense of awe and wonder, highlighting the majestic and powerful nature of this celestial phenomenon. The combination of the intense brightness of the star and the serene night sky creates a striking and visually captivating composition.
I'm trying to find a node which can automatic save .mp4 to destination. (no need to right click, and save video) any suggestion?
there are probably a custom node like this
or you can make your own using the image lib used by diffusers
PIL library
Isn't I have to manual right click > Save preview myself for this one? or am I doing something wrong?
automatic, you can specify file name and path to save among other things
can you please teach me on how to continue from here, I believe I'm stuck,
this will already work, file will be saved in comfyui/output
the problem for me is it's non there 🥲 I believe I do something wrong
i dont know, that should work
Found it! Thank you!!!!!!
why my deforum cant generate a video guys?
Can someone help me? I want to create a lora using the Confetti model as a base, but I can't find a guide for pony. I tried Kohya but it gives me an error. Do you know of any way to make a lora locally using pony?
does svd does more than 14 frames of a video? and is there a way to extend the created video?
svdxt_1_1 do 25 frame
svd 2 waiting room
svd 2??
well yea... hopefully the next svd version 😦
Déjate llevar por la antigua y fascinante historia del té. Descubre cómo esta milenaria bebida ha conectado culturas, cruzado continentes y evolucionado a través de los siglos. Desde las montañas de China hasta las mesas del mundo, cada hoja de té cuenta una historia.
¡Todo esto, creado con la magia de la inteligencia artificial!
-Instagram:...
can i use stable video diffusion on Automatic1111 without comfyui?
i believe so
is there anything i can do with animatediff to keep the background elemets to stay consistent? ive tried everything i could think of
they are sepearte gui, just run one or the other
Frame interpolation
is that something SDV can do? or another program outside of it
You can do it in comfy or external software
Anyone know why SDV quality is so bad
Prompt was photo realistic man walking through streets in a city. But came out very bad compared to a lot of your videos you post @tepid stream
What's the best local txt2vid or img2img model right now
@wraith fossil Make sure you have the latest Animate Diff Evolved. (Don't use the old version, lots of updates have recently been applied) What you can do is convert your input image into a depth map, lift the floor from 0.0 to say 0.7 using mtb Color Correct offset. This will give you a mostly white image. Convert this to a mask and connect it to the optional mask input of Effect Multival. The darker the color, the faster Animate Diff moves. Objects closer will remain more still using this technique.
Yeah well. We are not quite there yet.
You can use frame interpolation and topaz or something but we are not there yet.
But we will be.
#▶|stable-video-diffusion create a businessman havning problems with shipping documents
Hua Mulan is wearing a red long gown, which is swaying in the wind. The character occupies one seventh of the picture. In the distance, there are many high mountains, which are shaped like a Chinese character, very high and vertical. There are trees on the mountains, and pink rose petals are floating in the wind. The fairyland is dreamy, in the style of Chinese ink painting. There are flowers floating around, with a light pink tone.
#▶|stable-video-diffusion Hua Mulan is wearing a red long gown, which is swaying in the wind. The character occupies one seventh of the picture. In the distance, there are many high mountains, which are shaped like a Chinese character, very high and vertical. There are trees on the mountains, and pink rose petals are floating in the wind. The fairyland is dreamy, in the style of Chinese ink painting. There are flowers floating around, with a light pink tone.
Hua Mulan is wearing a red long gown, which is swaying in the wind. The character occupies one seventh of the picture. In the distance, there are many high mountains, which are shaped like a Chinese character, very high and vertical. There are trees on the mountains, and pink rose petals are floating in the wind. The fairyland is dreamy, in the style of Chinese ink painting. There are flowers floating around, with a light pink tone.
Generate video of Timon and Pumba from Disney's Lion's King, dancing on a bridge
Generate video of Timon and Pumba from Disney's Lion's King, dancing on a bridge
Generate video of people clicking on #artisan-faq to get more informations about how to use the bot.
How do I create a video here?
Commercial photography, powerful yellow powder explosion, fried chicken, black background, bright environment, white lighting, studio lighting, OC rendering, super detail, solid color isolation platform, professional photography, color gradinging About Midjourney Parameters --ar 9:16 --v 5.2 --s 750 --c 0 --q 1
how i solve this?
ComfyUI w/Flux.1 for txt2img and Stable Video Diffusion for Img2Vid
Music by Suno.AI
Lyrics by Forest Star Walz (reallybigname)
“Loom of Love - Metal Ballad”
https://suno.com/song/515836c0-d2b3-47fd-94e3-9bbf95244612
“Beauty So Kind - 1980's Metal”
https://suno.com/song/397bc0a6-8b2f-4643-a405-10d9fbe4bf4c
#aiart #aivideo #comfyui #flux1 #sv...
Here at the Capitol, we offer 2 easy options to stay healthy!
To support me & this show: https://www.patreon.com/AzeAlter
Capitol of Conformity Series
Written, Directed & Edited By
Aze Alter
Music By
Udio & Aze Alter
Associate Producers
Nyukyung
Christopher Gerardino
AI Visual Assistants
Midjourney & Lumalabs Dream Machine
Voice Lip Sync
...
Getting there?
how can i generate a video with custom prompts?
using the API?
zebra blinds on the wall in the bathroom
How to create a.i. Images using stable diffusion on discord?
i running sd and it show this, then run normally, do it effect much on sd
hey guys
im trying to turn a real life video into a animated video
my video is a man walking uphill
and his eyes are shown
i was previously doing img2img and found out the eye part was quite messy everytime
is there any prompt i need to put for this video generation for the eyes
Where can I get a good image to video model just starting out?
what animation model do you use for SD XL models ?
Open source or closed source? Honestly, there really is no "good" image to video open source model yet that can be run on consumer hardware. You can use opensora plan which is decent but uses a very large amount of vram, I believe 60+.
CogVideoX 5b is the best open source text to video model, but doesn't support Image to video(yet). For the best closed source image to video model, its probably kling.
I'm not sure what I need, I will learn some more basics and figure out what I need to figure out next.
ello. is SVD realistically usable for video to video? i've been experimenting with processing animations made in blender with animatediff and the results are OKish, but i get the impression that SVD would give me greater consistency/stability (assuming that it's usable for video to video)
i can't seem to find much information on it one way or another... seems like people are only doing image to video with it (just a single frame)
i typically take 4-5 seconds of 24fps video made in blender, and then have the various comfyui nodes extract keyframes to produce something that's ~8fps. i then pipe those into animatediff, and then interpolate the results back to 24fps with FILM and the like
The music was generated in #udio, drawing inspiration from the iconic style of System of a Down. Powerful riffs and dark tones perfectly blend with the apocalyptic theme. The video was produced using #HaiperAI, adding an extra layer of epic cinematic scale. This fusion of technology and creativity immerses you in a world of chaos, destruction, and the end of days.
This song was entirely created using artificial intelligence! 🎶
The music was generated in #udio, drawing inspiration from the iconic style of System of a Down. Powerful riffs and dark tones perfectly blend with the apocalyptic theme. The video was produced using #HaiperAI, adding an extra layer of epic cinematic scale. This fusion of technology...
nice
Hey! How's it going? I'm sharing some work done with StableDiffusion+Runway
Les comparto mi video resumen de lo que he hecho hasta ahora 😁
✅ 𝗦𝘁𝗮𝗯𝗹𝗲 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 imágenes
✅ 𝗥𝘂𝗻𝘄𝗮𝘆 Gen-3 Alpha para videos
✅ 𝗦𝘂𝗻𝗼 para la música
✅ 𝗣𝗵𝗼𝘁𝗼𝘀𝗵𝗼𝗽 𝗔𝗜 y 𝗖𝗮𝗽𝗖𝘂𝘁 para editar
Pero ¿𝗾𝘂𝗲́ 𝘀𝗶𝗴𝗻𝗶𝗳𝗶𝗰𝗮 𝗲𝘀𝘁𝗼 𝗽𝗮𝗿𝗮 𝗲𝗹 𝗳𝘂𝘁𝘂𝗿𝗼 𝗱𝗲𝗹 𝘁𝗿𝗮𝗯𝗮𝗷𝗼? ¿𝗦𝗲𝗿𝗮́ 𝘀𝘂𝗳𝗶𝗰𝗶𝗲𝗻𝘁𝗲 𝗰𝗼𝗻 𝘀𝗮𝗯𝗲𝗿 𝘂𝘀𝗮𝗿 𝗲𝘀𝘁𝗮𝘀 𝗵𝗲𝗿𝗿𝗮𝗺𝗶𝗲𝗻𝘁𝗮𝘀? 🤔
La respuesta es un rotundo NO. 🙅♀️
Si bien la IA ...
In the darkened streets of Elm Street, nightmares are no longer bound by human imagination. AI has learned to dream, and its visions are far more terrifying than anything Freddy Krueger could conjure. As the boundary between reality and digital horror collapses, a group of survivors faces the ultimate enemy — a self-aware algorithm that twists t...
subscribe to my telegram channel t.me/neuroBeatsAI
"Dive into the world of artificial intelligence with this amazing video showcasing the process of generating images using AI. From fantastic landscapes to abstract masterpieces, these artworks inspire and amaze with their uniqueness. Discover how AI can turn your ideas into visual works of art...
Generate a photo with a giant moon in the background. A beautiful Chinese Asian woman in transparent Tang dynasty clothing descends from the sky. A Taiwanese Asian bodybuilder naked granddaddy wearing a painter's hat and standing fucking her asshole. Many winged Miffy rabbits are flying nearby.
subscribe to my telegram channel t.me/neuroBeatsAI
#AIgeneratedMusic, #AIArt, #NeuralNetworkMusic, #HaiperMusic, #HaiperAI, #AIMusic, #AIProduced, #ArtificialIntelligence, #MusicVideo, #NewMusic, #AIClips, #FutureOfMusic, #MusicTechnology, #InnovativeMusic, #AIInnovation, #ElectronicMusic, #PopMusic, #Trending, #Viral, #Music2024
#TrendingNow, #...
subscribe to my telegram channel t.me/neuroBeatsAI
#AIgeneratedMusic, #AIArt, #NeuralNetworkMusic, #HaiperMusic, #HaiperAI, #AIMusic, #AIProduced, #ArtificialIntelligence, #MusicVideo, #NewMusic, #AIClips, #FutureOfMusic, #MusicTechnology, #InnovativeMusic, #AIInnovation, #ElectronicMusic, #PopMusic, #Trending, #Viral, #Music2024
#TrendingNow, #...
The renderings generated with Flux.1 dev, and then the video generated using KlingAI's image-to-video method.
Background music by Leonardo Griego
Unfortunately most open source video gen is not near closed source. The best is cogvideox 5b but it does not support image to video and is pretty slow. It only requires 5gb vram at the least tho.
"Nirvana Forever"
https://www.youtube.com/watch?v=5-B36LrH3Do
"Nirvana Forever"
Я был поклонником Nirvana с самого детства. Их музыка не просто захватывала — она меняла жизнь. Когда я впервые их услышал, я был поражен. Энергия и страсть Курта Кобейна вдохновили меня взять в руки гитару и научиться играть.
Спустя годы я решил создать что-то особенное в память о Курте и его влиянии на меня и мир. Использ...
Hoi, is there a text to video node for the stable video diffusion models? Or is it only img to video for now?
Yeah they didn't release any text to video model. I would highly recommend CogVideoX 5b as it's the best open source text/image to video model right now and requires a pretty low amount of vram.
Aye, just found it a few minutes ago from randomly googling, but it has 2 halves of a model. No idea how to use that lol
that is done to save ram, if it was 1 file, it usually takes a very large amount of ram. you can use it in comfyui with this: https://github.com/kijai/ComfyUI-CogVideoXWrapper
Thanks :) I usually force text encoder over to cpu, that way i can make even 2k flux resolution images
As i have 64GB ram, then it's plenty for as is :P
@summer seal Also, is it text_encoder or transformer i fetch the models from? https://huggingface.co/THUDM/CogVideoX-5b/tree/main
well both, the text encoder is t5 xxl model(same one as flux, sd3, pixart) and the transformer is the actual model, similar to a unet.
Ah, gotcha. Cause hugging's speed is all over the place, and time remaining is between 2 minutes and 2 hours. So this might take a while xD
This is actually a revolutionary fuckin node!
Why haasn't anyone made one for civitai? xD If we load in a workflow from an image, have it auto download the model if not present lol
@summer sealIt states 5 min, was that during the pip install? Or is the first gen here the compilation part?
Also, do tell if i'm pinging/asking too much, and i shall stop lol
5min for the compilation part, should be faster then. and the pinging/asking is fine.
I don't know which part being the compilation one :P
As the pip install part was just mere seconds, The first generation expects 12 minutes
it should be a bit faster after the first generation.
"a bit" xD
Also, doesn't cog do third party models? As in generate with whatever base model, but animate with cog, like animatediff does?
Nope, cogvideox is a completely pretrained from scratch so it's not compatible unlike animatediff which is a finetune of sdxl and sd1.5. You can't use a base model with cogvideox but it will produce much better results then animatediff.
If you want a model that's compatible with sd1.5 models, you can try FancyVideo, it's better quality then animatediff but slightly lower quality then cogvideox imo. You can customize the sd1.5 base model though. FancyVideo comfyui node: https://github.com/AIFSH/FancyVideo-ComfyUI
1.5, SDXL or flux, as long as it's better than animatediff, as so far i've gotten somewhat results with 40-45 frames :P And wouldn't mind a T2V with better results :P
Rip, cog video combine broke
yeah cogvideox natively generates a 6 seconds video which is pretty nice.
Aye. And turns out H264 cpu and nvenc is bust with cog. Webm worked fine
check my video guys https://youtu.be/OxKc6Tsq4VU?si=F59AfyhA3c4bXNuM
Prototype: The Movie (2025) - Official Teaser Trailer
Based on the hit video game, follow Alex Mercer as he battles a deadly virus outbreak in Manhattan. With shapeshifting abilities and a quest for vengeance, the line between hero and villain blurs. Witness the chaos. Coming 2025.
Music CTTO:
Red Alert by Soundridemusic: https://www.youtube...
@summer seal Rip, fancyvideo failed to load
Cannot import D:\Stablediff\Comfyuimanual\ComfyUI\custom_nodes\FancyVideo-ComfyUI-main module for custom nodes: Not a gzipped file (b've')
Also, appears fancyvideo broke cog nodes lol. Had to reinstall a few dependencies xD
Tbh, add to the title that it's A.I, cause it's videos like those that makes us mistrust youtube for being untrustworthy when fake movie trailers is being provided with clickbait titles not informing it being fan made.
Haiiii
So I just returned to stable diffusion after quite some time
Aand I just found out there's audio and video now
Butt
Is it/will it be available on A1111? Or something similar at least, just self-hosted is the point
Dive into the exciting world of fashion and technology with the new official trailer for "Minimax vs Runway." In this video, artificial intelligence takes center stage on the fashion runway. Don’t miss the chance to see how two cutting-edge neural networks battle for the title of fashion king, featuring memes and ironic scenes. Witness how AI is...
Welcome to the Intergalactic Fashion Show, a one-of-a-kind event where the universe’s most stunning models strut the runway in cosmic couture. From futuristic designs to alien-inspired outfits, witness a fashion revolution powered by cutting-edge artificial intelligence. Every design, detail, and visual has been generated using advanced AI techn...
Stable Audio is decent but it strictly makes just sound effects, not songs with lyrics. Stable Video is outdated, the much much better alternative is CogVideoX.
However both are not as good as closed source, they are close but not as good.
There is also flux(image gen) which is open source and incredibly good. It can write sentences or 2 of text, have very good prompt following, and gives you 5 fingers and perfect human anatomy basically all the time. It's comparable and even better then some closed source models.
I'm mostly for sound effects
And well I don't wanna use closed source
I'll look into flux, but what I really like about sd is loras and all the various tensors and stuff, I'm mostly for cartoon stuff than realistic too
There are a bunch of Lora’s for flux(including cartoony ones) so you can try any one you like now.
Hello Guys
I want to study Image generation
using Stable diffusion
i learnt about the basics
is there any reference youtube channel or paper
i need to follow
Hey been away for a couple of years, is there an api for this stuff yet or still best to run it on something like google colab? Or could someone point me in the right direction for a contemporary tutorial?🙃
What do you mean by api? There are several api's for models but do you want open or closed source models? I would recommend flux, flux is open source like stable diffusion but it's extremely good(better or similar to closed source models, its better then midjourney and dalle3).
It's excellent in writing text in images, prompt following, and having basically perfect humans. It works with most prompting styles but prefers natural language.
You can run it locally, or in google colab, or use api's. It's basically right now everyone's go-to model.
If you are talking about video gen, then CogVideoX is by far the best open source model but isn't quiet closed source level(kling, gen3, minimax). The closed source api's are very expensive compared to CogvideoX though.
Oh that's nice
That's guud
I'll definetly look into it
just wanted to share something ive been working on for a while
Wow really impressive, how did you make that?
Thank you. What I did was use a video I shot and then just played with the settings in stable diffusion as well as using different loras.
I'm hoping someday we get serf like midjourney to stable diffusion.
I have yet to try stable diffusion tho.
I get lazy everytime, thinking running it locally. It requires GPU and massive storage. 🤧
According to google, serf seems to be consistent styles. That is 100% possible with sd models.
It’s possible in flux as well which is better then mid journey and it’s also open sourcd.
How 😅
I tried to put tags and artist name in the prompt, still didn't get the results 🫠
I use flux on hugging face. 🙂↔️🤷♂️
Part 2: https://youtube.com/shorts/AkQwmkoO7d0
A small kitten is on a pirate ship this time. The kitten is hungry and wants to eat something, but the evil pirate captain won't allow it. The pirate captain doesn't yet know what's in store for him.
The film was created with the help of artificial intelligence. The animals, people, and events are...
Does anyone have any idea what Video generator they might be using?
Намалюй Логотип Pavlo Ruban School Діагностика ходової частини та Встановлення кутів коліс . Елементи автомобіля: Включи силует автомобіля або його деталей (наприклад, колеса або підвіску), щоб одразу зрозуміти, що моя школа пов'язана з автомобільною діагностикою.
-
Інструменти: Додай іконки інструментів, які використовуються при діагностиці авто (наприклад, гайкові ключі або рівні), щоб підкреслити практичний аспект навчання.
-
Текст: Використовуй сучасний шрифт, щоб написати назву "Pavlo Ruban School". Можна виділити слово "School", щоб акцентувати на освіті, або "Ruban" для індивідуальності.
-
Кольори: Використовуй кольори, що асоціюються з автомобілями, такі як синій, чорний або червоний. також можеш додати металеві відтінки, щоб підкреслити технічну тематику.
-
Символи: Можеш додати символи, такі як компас або шестерня, що може символізувати точність та якість навчання.
\\
Намалюй Логотип
Намалюй Логотип Pavlo Ruban School Діагностика ходової частини та Встановлення кутів коліс . Елементи автомобіля: Включи силует автомобіля або його деталей (наприклад, колеса або підвіску), щоб одразу зрозуміти, що моя школа пов'язана з автомобільною діагностикою.
Інструменти: Додай іконки інструментів, які використовуються при діагностиці авто (наприклад, гайкові ключі або рівні), щоб підкреслити практичний аспект навчання.
Текст: Використовуй сучасний шрифт, щоб написати назву "Pavlo Ruban School". Можна виділити слово "School", щоб акцентувати на освіті, або "Ruban" для індивідуальності.
Кольори: Використовуй кольори, що асоціюються з автомобілями, такі як синій, чорний або червоний. також можеш додати металеві відтінки, щоб підкреслити технічну тематику.
Символи: Можеш додати символи, такі як компас або шестерня, що може символізувати точність та якість навчання.
how do I create images here ?
cf #artisan-faq
hi
Hi
guys please help my channel grow I made a new video using HailuoAI https://youtu.be/A3CuCoHUGM8?si=eiBYOIKmdG_R5y8C
don't forget to LIKE and SUBSCRIBE and also COMMENT XD XD XD
In this inspiring journey, we delve into the early life of Dwayne, a young boy who faced countless challenges and moments of clumsiness. From his awkward childhood antics to the hurdles that tested his resolve, Dwayne’s story is one of determination and growth. Watch as we explore how he transformed his insecurities into strengths, fueled by har...
how do i get full quality video previews in comfy ? the videos in the editor are always of much worse compressed quality compared the output video in the outputfolder
hello! I'm looking to get into making animated films using stable diffusion for video, training my own models and capturing my own video then turning into animated style.
what kind of specs should I be going for? would an rtx 4080 super be enough? Or i need the vram of 3090 or 4090?
And how much system ram- 32gb or 64?
Hi, I am trying to create a hairstyle that is very specific, (since it is hard to prompt it right using description)
I get to naming this hairstyle, and it seem work. However, one thing I notice is that I need get high rate. so like 0.7-0.9 to see the hairstyle I want, however, most of the times, it also change the face of the person to be similar face to model that I used when I train too. I already try to find most diverse face I could find for this hairstyle on internet. so I think the trouble is from the tagging. Can anyone suggest?
Liminal Low-Fidelity Dreamlike / Nightmarish AI Footage.
Model Epicphotogasm Z-Universal
#ai #aivideo #stablevideodiffusion #stablediffusion #liminalspace #rtx2060
hello
Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.
If you have any questions, feel free to ask us!
Your dashboard
Help
Support server
Other languages
en: help
ja: help Japanese
/video
Any idea what tools can generate such cool videos if fed with great ideas? https://www.instagram.com/jayprints?igsh=MThheGtiMGM0azFuaA==
What is the easiest way to create a Zoom Out Effect? I thought about creating key frames with out painting and then just interpolate but feels inefficient to me
is it better to use sd 1.5 or sdxl for animatediff? I can't find any sdxl lcms?
I think sd1.5 is considerably better then sdxl right now since sdxl is very beta in animatediff.
Oh wow i see
Trying to run txt2img.py I get this error: from imWatermark import WatermarkEncoder
I saw many people run into this from long time ago but no definite solution. Anyone runs into this? how do you fix it? I tried installing invisible-watermark but same result. Thanks in advance
Technically not SVD but mostly Liveportrait, but I didn't know where to post a musical video. It's for my brother's birthday, everyone calls him Darth and he plays orcs in AD&D in the Faerune universe : https://youtu.be/CP0XykFXzlE?feature=shared
#Darth #happybirthday #stablediffusion #aiart #parody #mmorpg
Happy birthday, Darth !!!
Some AI fun with Stable Diffusion and Liveportrait to celebrate Darth's birthday. The soundtrack has been created with the assistance of Suno.
I made this video in honor of Halloween, it's a parody of the worlds first crypto currency made just for Vampires. I made it using Fooocus for the original images, for the voices I used e2-f5-tts, I used cog studio and live portrait for the animations and MAGNeT for the music. Finally the entire show was edited using Open Shot https://www.youtube.com/embed/Lxa2BepSh4U?si=OCfuyiHkPmCm6b9F
In honor of Halloween I thought I would put a little spoof together of a crypto currency made just for Vampires. A little crypto humor using all open source AI software.
hi, I have exactly this question, about txt2vid.
I'm working with 12 GB VRAM
or img2vid (!?)
same, information is evasive
Best open source text to video model is Mochi-1 for sure. On the text-to-video leaderboard, it’s #2. it beats kling, Luma, gen3 and is behind MiniMax.
Unfortunately, it’s not fitting on a 12gb vram gpu, it requires a 24gb vram gpu.
Allegro 2.8b is the best smaller text to video model and would fit in 12gb vram but since it’s very unoptimized, it would take 30mins.
Your best bet is probably CogVideoX. You have really amazing control(img2vid, trajectory, ControlNet) so you can probably make better videos even then closed source competitors. It’s going to be fast too, taking a few minutes probably.
great answer, thanks. So animatediff or the img2vid SD model, is not recomended?
Those are very outdated, perform far worse then the modern models, and the community also doesn’t show much interest in them now.
sooooooooooooooooooooooo goooooooooooooooooooooooooooooodddddd
can i make an ai pic by just thiss img
@dreamy coyote I use Cog Studio, it works very well on my Laptop, even though I only have 8gb of VRAM
does it have img2vid? I see some videos but wasn't really thrilled. Also, what model or workflow would you recomended? I saw the site and it wasn't clear to me.
@dreamy coyote yes it has img2vid, in fact that is how I do all of my animations. I make a still image in fooocus, then put that image into Cog to animate it. It is a pretty simple workflow, but it works for me. I am pretty new to all of this, so the simpler the better lol
how about vid2vid? i realize most models are capable of that, but it's not guaranteed
sorry, answered my own question. it does do vid2vid
woah, how did you get scene consistency while zooming out + camera tilt upward? 😮
i have two gpu's one with vram 12 gb on with 16 gb. is there any possibility to run stable diffusion video using these two. its would be a great help. i am new learner .
Yes you could, but stable diffusion video is not really state of the art, its very outdated and you can't even really control it with text, you can only input a single image and svd will just predict the motion.
The current sota is Mochi-1 which outperforms closed source models like gen3/luma/kling but requires 24gb vram at the minimum.
I would recommend CogVideoX, it has lots of control(img2vid, controlnet, trajectory) which can make videos possibly better then even closed source. It will fit on a single 12gb vram gpu too.
It can run now in a rtx3060 with 12gb vram https://www.reddit.com/r/StableDiffusion/s/cqgptvwlgR
Is this possible to run cogvideoX-5b on a rtx 3060 12gb, when i try the inference take an eternity (10mn/it ) is this normal ?
For the 5b FUN Model there is a q4 gguf file available which works with 10 Gbyte VRAM with comfyui
thanks , can you do image to video with it ?
Well you would need the 5b image to video model which needs more memory. Not sure if a 4bit gguf version exist. Works with 24gbyte of ram and i am not sure i could also work with 16...
ha i see
i am using the cogvideo fun i2v workflow and sometimes i get "allocation in device" error during video decode , do you know what make this error ?
is because i am out of memory in gpu and ram ?
Yes you should try to reduce the amount of frames
give me an example
Does someone see the new cogvideox lora : DimensionX ?
Hello, I noticed that the stability-ai/stable-video-diffusion:3f0457e4 model has been removed on Replicate. Is there an updated version that we can use? or has it been removed from there permanently?
hey everyone im new to this but is this the right tool to create ai videos?
Well there are a few online paid services. If you want to create ai videos on your local hardware there are a few models out there. Deforum (one of the first, good for dream like animations...), AnimateDiff (short animations), SVD (this one by stablility) the most current should be cogxvideo and mochi-1
Yeah lord of the weed summarized the models. Svd is heavily outdated though, and doesn’t even support text prompts.
I would highly recommend using CogVideoX as that’s a far far better alternative, it supports txt2vid, img2vid, vid2vid, trajectory controlled, controlnets. It requires very little vram too and quality is great. Requires 6gb vram at the least(8gb+ is a good idea)
Mochi-1 is for sure the best text to video generator but isn’t very controllable with no img2vid or vid2vid. Requires 12gb vram at the least.
Some mochi gens
Some CogVideoX gens
i got a 4090 last year and im just looking to use it now to make a bit of extra money with ai instead of just gaming lol. hoping to make some cool ai videos with it
Yeah 4090 is well enough to run both models, mochi is slower tho since it’s larger but much better quality then CogVideoX at text to video. On the text to video leaderboard, mochi surpasses gen3, luma, kling.
You can only run mochi locally with q8/fp8 quant, and that will lower r quality a bit btw.
Tho CogVideoX has really lots of control and Lora’s and is considerably faster.
@summer seal am i able to install that stuff through the comfyui manager?
Do you think it will be possible to speed up mochi inference? I saw a new way to quantify diffusion models like flux or cogvideox ?
Hey - thanks for this. I don't really keep up on the video side of things much. Just randomly popped in here this morning and saw this post. 🙂
Yes quantization works for mochi as well but it seems much more sensitive, q8 is the best right now and fp8 seems to lower quality. Theres no good 4bit quant right now, it seems like it has some parts of weights that are very very sensitive to quantization, and people are still looking on how to quant it effectively: https://github.com/huggingface/diffusers/pull/9769
There is also a few extra optimizations like FasterCache that will massively speed up inference but uses more vram, it isn't out for mochi yet(only cogvideox) but the author said they will look into it.
I believe mochi has native support in comfyui, not cogvideox. I would reccomend kijai's nodes instead since they have more experimental options for further speed-ups and less vram usage.
cogvideox: https://github.com/kijai/ComfyUI-CogVideoXWrapper
mochi-1: https://github.com/kijai/ComfyUI-MochiWrapper
Thank you for your answer, I didn't know that quantization was a delicate process.
it's crazy to see how the quality drops quickly when you go from fp32 to fb16, I hope they find a solution for now they seem to be making good progress
yeah I think bf16 is actually fine now, you can check the banadaco discord where lots of people are experimenting with mochi/cogvideo. bf16 does seem good quality enough, and seems like even upscaling works as well,
Is realy smooth , i love it
I have seen SVDQuant by MIT research and it increase inference speed by 3 with flux and sd models , one of researchers said is also possible on mochi
It works with a new inference engine named nunchaku
30x and 40x rtx cards
yeah svdquant seemed great, hope svdquant supports mochi soon.
cogvideo seems to crash a lot for me but mochi works pretty well so far. is there a way to do img to video for mochi?
no img2vid sadly, does have vid2vid though, also cogvideox should use far less vram then mochi. Cogvideox is 5b dit while mochi is 10b dit.
Yeah im not to sure why but i keep running out of memory with cogvideo
Do you get an error when it try to decode ?
How many frames have you set ?
Yesterday i get the same error , i reduced the steps and frames count
its at 49 with 12 frame rate
Try to reduce the mount of frames
Ive tried but it requires me to change the cog video model then when i change to a different one i get a different error. I'll try it again here in a min and let you know what it says
"Given groups=1, weight of size [3072, 16, 2, 2], expected input[14, 32, 60, 90] to have 16 channels, but got 32 channels instead"
I got the same error , are you using cogvideoxFun model ?
Alibaba released a new video model https://huggingface.co/spaces/alibaba-pai/EasyAnimate
Yeah I tested it, its decent but it's definetly not worth the vram usage, its 12b params. Mochi is 10b params and far better but only supports text to video, I would even say cogvideox 5b is better.
Anybody know why I could be running out of vram using cogvideox on a rtx 4090? Whenever it happens I have to restart my pc and then I can generate about 5 videos til I get the memory error again
always happens on decode
hello
This is a short AI generated parody I made having some fun with over weight cops. I made the original images using Fooocus, I animated the images using Cog Studio, The music was made using MAGNeT and I did all the editing using Open Shot. All of these programs are 100 percent free and can be easily downloaded to your own computer using the Pinokio Browser. https://www.youtube.com/embed/tPgRP3INCZs?si=tY3vxw_NTCi-dcBr
This is a short AI generated parody I made having some fun with over weight cops. I made the original images using Fooocus, I animated the images using Cog Studio, The music was made using MAGNeT and I did all the editing using Open Shot. All of these programs are 100 percent free and can be easily downloaded to your own computer using the Pinok...
What do you think is best solution to fix the anatomy / defects in a video
My current solution is just taking frames every 0.5 seconds upscale fix with flux and interpolate at the end
#artisan-1 wolf run
What open source video models do you guys use
mochi best for text to vid
closed source is way ahead though
for img2vid theres pyramid flow and cogvideo
Full precision mochi is a different story though, that actually beats gen3, luma, kling1, pika on the text to video leaderboard except minimax.
https://huggingface.co/spaces/ArtificialAnalysis/Video-Generation-Arena-Leaderboard
whats full precision ?
I meant like fp16 with 200 steps, most people do like 50 steps and use fp8 quant which considerably lower quality but increases speed by a lot. You need 40gb vram gpu to run it with that precision.
Can cogvideo do videos longer than 49 frames
So damned good 😂
@red hill, thanks, buddy! Glad you liked it
which one is better for img2vid
No it get weird after 49 frames
does someone know a comfyui node for made frame interpolation from a video ?
cogvid for sure especially with Tora
Where do you get Tora
this?
Yeah that's the original repository but its not very optimized and doesnt support img2vid, this does though: https://github.com/kijai/ComfyUI-CogVideoXWrapper
You can input a trajectory with Tora so tell the model where to go, and the repository has workflow examples: https://github.com/kijai/ComfyUI-CogVideoXWrapper/tree/main/examples
ooo like you can have a rock in the picture for img2vid then make the rock move the way you want it to?
I have an example somewhere, one sec
installed and found workflow gonna check it out
cogvideox 5b tora trajectory example? or l2v testing
some tora examples
are you able to combine cogvideox with live portrait?
is there also an updated version of live portrait for lip syncing
ooo I see
Yeah should be able to, I personally didn't try it but I saw some other people try it in the banadaco discord.
Like to animate humans? controlnext svd2 should be probably better: https://github.com/dvlab-research/ControlNeXt
@summer seal made using Mochi with 13 frames taking 167 seconds upto 250 seconds
I'm running a new batch where I do side by side comparisons of KSampler VS ClownSharkSampler to see if there's any difference in output quality
That's really great results actually, is that q8 or fp8?
yeah I also used fp8 because q8 is much slower, didn't test q8 personally though so maybe there's an error rn.
how many steps btw?
i'm using the default everything for the given workflow so 30 steps, cfg 3.5 euler simple
here's the two samplers (still waiting on the results)
i tried to set up clownshark to be as close to ksampler as i could using the similar sampler and the same scheduler, steps, cfg
do you think i should enable ETA or leave it at 0?
(These last two are still using ksampler)
thats pretty impressive it generated that in just 30 steps, for the ETA why not try 1.
@glacial orchid if I wanted the absolute best video quality what settings would you use? I'm thinking I want to do some 'high quality' runs where I use a much better sample, more steps and a different scheduler
eta 1? wouldn't that be way too high?
i was thinking 0.5 or 0.25
yeah it might be lol, too much noise. 0.5 could be more stable but still enough noise.
you are doing low steps so 0.25 might be better but its up to you.
so give me your recommendation for a high quality configuration
let's say 0.5 eta, 40 steps, res_3s, beta57, and cfg leave it at 3.5 or change it?
I remember that cfg of 6 produced slightly better results, but I didn't test it fully. Try a cfg of 5 maybe? Also res_3s works with mochi? I thought only their custom sampler worked(the linear quadratic thingy)
@summer seal I couldn't get ClownSharkSampler to work out of the box as a drop-in replacement for KSampler I keep getting an error about the pooled_output and the positive conditioning, it would be cool if @glacial orchid could adjust it to support mochi but if he dooesn't I figured out a way to 'trick' clownshark to work. left is KSampler and right is ClownShark
oh interesting, the clownshark one honestly seems better.
i like the dolphins better on ksampler but i like the boat better on shark
oooo a shark based prompt would be nice
you should be able to save the clips and load them as workflows if you wanna try it yourself
one more, not sure what sampler generated this one
left is ksampler, right is clown
A bustling harbor filled with colorful sailboats swaying in the breeze, seagulls calling overhead, as a massive storm cloud begins to roll in from the sea.
I should start includng the prompts
Id say K did a better job with the boats, clown did a better job with the clouds
interesting viewpoint but I honestly like the 2 above ones, maybe I need to try mochi again with shark, it seems pretty great.
on the two below ones, clown wins again for me. The clouds and the atmosphere is better like you said, boats are a bit weird in both but both are not too bad.
the boats are more complex but not colorful with clown, id have to reopen it to see what sampler I used, I queued up a bunch with a mix of samplers and steps to get a feel for it
my gen times went up real high tho bc I have to use Ksampler as a proxy for clown
A runner sprinting along a cliffside trail at dawn, wind rustling through tall grasses, as flocks of birds rise dramatically from the trees below.
I might have to give this one the ksampler, birds are more stable as well as running.
yeah i didn't want to influence your decision so i was waiting for you to say somemthing but yeah i feel like it took "rise dramatically" and then just made them go crazy with the 2nd video whereas they're not adhering to the rising dramatically part but it looks more pleasing to see thhem gliding gracefully
A great white shark gliding silently through crystal-clear waters, sunlight filtering down in beams, as a school of fish scatters in all directions from its path.
shark sampler wins hands down on actually making a shark, K doesnt even know what a shark looks like, looks more like a dolphin, they should call it DolphinSampler lol
Yep for sure lol, no competition there.
just checked and that was res_3s and beta 57 at 40 steps with 0.25 eta
A hammerhead shark weaving gracefully through an underwater kelp forest, the plants swaying with the current, as tiny bioluminescent creatures illuminate the depths.
- I feel like mochi is pretty weak when it comes to underwater stuff whereas land stuff it did really well
Both are decent, I think clown is better though since it has the nice looking bubbles too.
neither one is a hammer head, k sampler got the kelp forest better, clown did swaying better, no bioluminescent. I uess the bubbles do give it bonus points, clown wins again, with a score so far of ksampler 1, clownsharksampler 4
A shark circling an isolated buoy in the open ocean at dusk, the water rippling in eerie stillness, as a helicopter hovers above, its searchlight scanning.
hands down clown wins again, like its not even a competition with this one
on a technical level, and i don't know if @glacial orchid can answer this but I was feeding the latent output of ksampler to clown so I don't now if clown shark was cheating by feeding pre-digsted latent space rather than an emptyy one so I adjusted it like so and that seems to work, that way they both start with an empty latent, again not sure if adv eff. is a pass-through or not, the main problem it seems is that clown is having a hard time accepting the conditioning which is weird bc it's just load load clp > clip text encode > clownshark
this is the first comparison video where I switched out that configuration for shark to start with an empty latent space
Prompt: A futuristic cityscape at night, with towering skyscrapers glowing in every imaginable color, holographic advertisements swirling, and flying vehicles streaking through the air.
I like the first ones atmosphere and stability more but the 2nd one looks better kinda.
i like the "streaking vehicle through the air" is more true to the prompt ini clown, but i do like K's city better that actually looks like a futuristic city tho a bit incoherent its kinda okay
maybe call this one a draw? lol
@scenic basin maybe you can act as a tie breaker?
A bustling carnival at sunset, with colorful streamers and balloons filling the air, carousel lights twinkling, and fireworks bursting in a kaleidoscope of colors overhead.
I think I'll give this one to clown it managed to pack in a lot more fireworks and detail in the 13 frames it has to showcase its power
Both are similar, but clowns fireworks are better like you said.
with K we get to see 1 firework sort of start to end, with C we get 3x fireworks from start to finish plus the 4th one sortof just linger in the sky which they also do. i also don't see any ballons for K whereas I see 2x balloons ono C. lol anyways I got 3x more and then i'm gonna go back to rendering pics 🙂
i like the one on the left
alriht so let's give it to K, score is Ksampler 2, clownshark 5
A vibrant coral reef teeming with life, neon-colored fish darting among rainbow-hued corals, as a sea turtle glides gracefully through the crystal-clear water.
clowns looks, unfortunately, fake - colorful, but fake
left
turtle has 5 legs and coral doesn' tlook like that underwater
what happens if you disable the truncate conditioning... set it to false, and be sure both pos and neg are hooked up
I tried with just positive and with both hooked up and I also tried setting it to false and true, I tried all the combinations, it’s crashing on line 461 in samplers.py specifically in the part for the positive input
I’d love for you to fix that bc it’s increasing render time by 3x to 4x to have to run both in the same workflow rather than one at a time
I think it’s holding ksampler in memory while it runs clown
@glacial orchid here’s a chat with ChatGPT where I tried to debug it
https://chatgpt.com/share/673bd8cf-a8c0-800f-a46c-6c0fa6fa9fa2
Provides some detailed error message logs and what happens when it tried to fix it, it’s above its head I knew it wasn’t gonna go anywhere so I gave up
when was your last git pull? this might be something i already fixed actually
with cogvideox are you able to use character loras?
Yes I believe so
do you know where I can find them
A serene marketplace in a coastal town, with stalls overflowing with vibrant fruits, spices, and textiles, as golden light filters through colorful awnings overhead.
i mean it's not lke i'm biased but clown is clearly the winner
i think K did a better job showing spices, C diid a better job showing fruits
yeah pretty sure i fixed that
indeed you did, updating to the latest version it works perfectly yay
A sprawling field of wildflowers in full bloom, with every shade of the rainbow stretching to the horizon under a brilliant blue sky dotted with fluffy white clouds.
that was res_2m, i'm rerunning this prompt with a bunch of different configurations and ill report back
I think we've clearly established ksampler sucks as always even in the video generation realm, now with mochi support
res_3/brownian/brownian: 506 seconds
res_3s/guassian/guassian: 435 seconds
rk_exp_5s/brownian/brownian: 702 seconds
Two posters for the Black Friday event
#artisan-1 Two posters for the Black Friday event
A lantern festival at dusk by a peaceful lake, glowing lanterns drifting into the sky, their warm light reflecting on the water, as bursts of fireworks illuminate the scene in vivid colors.
A vibrant city square on New Year’s Eve, confetti raining down from above, cheering crowds, and brilliant fireworks exploding in rapid succession against the backdrop of towering skyscrapers.
I have an rtx 3060 12 gb vram and 16gb of ram, when I use cogvideoX 2b the decoding takes a lot of memory and I often have allocation errors. Is there a way to reduce the memory usage of the decoder without necessarily reducing the number of images?
111
Yeah me too, for mochi when it hits the vae decode stage it throws that error out of memory switching to vae tiled mode. It’s more like an error than a warning, if I go over 85 frames I get actual OOM errors.
Are you saying cogvideo throws actual errors and is unable to complete when you queue it up sometimes?!if so how many frames are you trying to render ?
the following videos I'm going to post each took 40 minutes to render on my 8GB gpu here's some stats
- used ClownSharkSampler not KSampler
- used res_3s or 5s sampler for most of these
- 40 steps cfg 4.5 for all of these videos
- 49 frames @ 15 fps to try to get at least 3 seconds out of it
I am trying to render 49 images
Frames*
cool just like me
so when you say "often have allocation errors" do you mean yyou get those warnings about VAE decode or do you get errors and you lose your work? @karmic schooner
I get error from vae decode and i lost the frames from the sampler
Is there a way to correct this ? It works only if i mut 16 frames and 15 steps
i haven't tried using cogvideo yeah, i founud the comfyui extension and i was thhinking about installing it today but i wasn't impressed with the quallity of the output and i'm even more discouraged from your review
using my 8gb GPU Mochi can handle up to 85 frames from my testing, if you're saying with your 12gb GPU you can only do 16 frames then that's kind of a deal breaker for me
Is because i have 16 gb of ram
oh i see, i have 32gb of ram, that must make a big difference then
i made an update of the nodes it works
i think they made optimizations
it takes 304s to generate
304s to generate 15 frames?
fastest for me is it takes me 110 seconds to generate 13 frames, 150 - 160 s on average
49
oh 304 seconds to generate 49 frames is excellent i could never get that speed
it also depends on the sampler
for cog you HAVE to use the cogvideo samplerl, for mochi you can pick your favovirte one
can i run mochi in fp8 with my config ?
yeah in fact you dont have to install anything to try mochhi
just update to the latest comfy and its all built in
How come a lot of the videos people are talking about on here are only showing as images instead of video clips?
@livid jackal you mean you don't see them moving or they're .webp file format? if you dont see them moving it's probably an older version of discord or a browser issue. when the image is done rendering with the sampler you can choose to use the SaveWEBPFile node or find one to save it as an .mp4 and Mochi is defaulting everyone to the webp node so that's most likely the reason it's the most sed format atm
I’m m not sure because some peoples videos show as videos too
i mean a workflow is just a workflow, anyone can choose anything they want for how it saves, im just saying the example workflow most people start off with defaults to the webp node so that's why they're not 'video clips' aka mp4 or avi filies
a webp file is still a 'video clip' in the sense that its a series of pictures played really fast
here's 3 little whatever ones I've made today @livid jackal do you consider these to be 'video clips' or 'images'?
It’s only showing them to me as images
so like a static non-moving image is all you see? I just tried it on my phone and I can conirm the same thing
lets see if there's an update in the App Store for discord
Non moving, I even downloaded one and still same
there is an update for Discord.... updating now and then going to retry
are you on iphone?
For the app? Yeah
yeah sometimes the app store is misleading so type discord and then clik into it to update it
and i just finished updating it and now they're moving
You mean it doesn’t always tell you about updates? I’ve noticed that for other apps before
i think it depends how its configured, i dont know if theres a setting to auto update all apps as hey come out
anyways you're just on an older version like me, super quick and easy to update and you should be good to go 👍
I don’t auto update apps because sometimes the devs are corrupt? and change the permissions to ones that don’t make sense for the app, it would also be good if it put all the ones that need updating first instead of showing me them all in a list randomly.
afaik i dont think iOS has an option to auto update apps, especially bc stuff like permissions
on a side note i find it pretty incredible how these models are able to include so much detail in 13 frames, especially for that car racing one, feels like a lot longer than 13 frames when watching it
Devs also sometimes change an app you paid for into a subscription app and remove a bunch of features unless you subscribe which is scummy and should be illegal imo
the freemium model pays the bills i mean i think its scummy when they do that for simple basiic apps, i understand they gotta get their money back but it does feel wrong when its a low effort app
I can not really say as I only saw the image but some of the ones that did show as videos to me looked pretty damn real it’s nutz
remember it's $100 a year just to be part of the developer club or apple. that's $100 out of your own pocket you gotta make back somehow lol
Well on my iPhone at least
Not by scamming people who bought your app before it was subscription based tho, they shouldn’t be allowed to do that.
so you updated and it fixed it right?
I did update it
agreed, that's a very specific case, i was talking in general, low effort freemium feels wrong, but i don blame devs for using freemium to get back their investments
Sometimes it depends I’ve seen no end of apps that are over price and or totally don’t make sense to be pay monthly.
Weird they still show as only images ?
so Genmo.ai lets you generate videos for free at 1696x960px and its finished in like less than a minute. meanwhile I'm like "I wonder how long it'll take to make a video with 13 frames at that resolution' its been an hour and 20 minutes and I think there's 5 minutes left
I’ve not heard of that one? Is it for pc or is it an app?
https://www.genmo.ai/ they are the same people who make Mochi that I can run on my computer
so they have the full model running over there using their fancy servers and they can basically make a video that is 5x longer, 2x more frames and the same resolution in <1 minute while it takes my computer 75 minutes, that's not even accounting that they're using model that's likely 2x or 3x bigger/more complex lol
Damn that must be a powerful pc to do it in 1 minute
there it is, took me 90 minutes to render this or exactly 5397 seconds lol
oh wait that's not it:
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Processing interrupted
Prompt executed in 5397.26 seconds
there goes 90 minutes for nothing 🤷♂️
Damn that would be irritating
messing with actual SVD (stable video diffusion) just for fun, generated that little clip from the source image on the right
for my next test I wonder if @glacial orchid's fix to whatever he did to make it work with Mochi would also apply to SVD
not fixed seed comparison but I just want to check for output first
You wouldn’t happen to have any idea if I’d be able to do Ai video on my 4070ti 16gb super Ai gpu and if so would it be worth trying it or would it take forever to do stuff?
That’s well enough actually, both mochi and cogvideox can fit.
Mochi has the best text2vid for sure but not much other control(has some ok vid2vid but that’s not better then cogvideo, and some experimental img2vid but worse then cogvideo)
CogVideoX is decent text2vid but so much control(img2vid, trajectory, control net unions, interpolation, supports multiple video sizes, low as 256x256 or high as 1360x768, lots of loras, orbitx for 3d/4d, vid2vid) and is considerably faster and will use less vram.
What is the reason for comfyui's black picture, my graphics card is fine
Bad configuration
@summer seal I'm tryng cogvideo + orbit and I can't get it to work on my system. I get "Allocation Error" . I set it to 16 frames too. I'm also attaching my WF if you wanna review it and suggest any changes that would help it run on my 8GB of vram
img2vid with cogvideo-2b
looks like ai minecraft
does kijai cogvideoX nodes support quants models ? because i dont see anymore the quant workflow
Are you saying that if I download comfy, Then I could get started using Mochi right immediately right out the box, and I would not have to do anymore node installs?
exactly, you just have to download the files and put them in the right place, stuff like diffusion_models folder, clip folder and vae folder is all you need
SWEET!! Thank you, I also only have 8GB of Vram, is that going to be an issue?
i do too, you can run both the fp8 and bf16 model they both work for me
so the small and large model both work, averaging 160 seconds for 13 frames
Wow that's pretty good, I am in the process of downloaded comfy again, I uninstalled it before because it just seemed too damn complicated lol
it is complicated and that's what i like about it
Is there some easy to follow guide to get cog-video running?
new ai model ! https://huggingface.co/Lightricks/LTX-Video
damm they just released a new video ai model , is faster and keep the quality , i runed this with my rtx 3060 and got 2s/it <2mn
you can also do i2v !
the model takes 11 seconds to generate a video in a rtx 4090 gpu
Yep truly amazing model.
the inference speed/quality ratio is insane
they said it was just a preview and there will be another version with fixes, i tried with i2v but the video is often static, and you need a good prompt for it to work well
Is SVD still the best StabilityAI Video generator? SD 3.5 images or SDXL? any tips for how to make the best videos, epecially using APIs
Svd is no longer the best video model
The bests are mochi (for t2i) cogvideoX(for controll : trajectories , controlnets , orbit ) , and ltx (for speed and quality )
I mean the best from Stability AI
Thanks! Yes mochi looks great I need to try that one... I tried LTX Studio I didn't like it at all
(I am making animated feature films)
How many vram you have on your gpu
I'll DM you to chat more thanks
these 5x are from mochi using these prompts:
A lantern festival at dusk by a peaceful lake, glowing lanterns drifting into the sky, their warm light reflecting on the water, as bursts of fireworks illuminate the scene in vivid colors.
A vibrant city square on New Year’s Eve, confetti raining down from above, cheering crowds, and brilliant fireworks exploding in rapid succession against the backdrop of towering skyscrapers.
A serene beach at night, waves gently lapping the shore, as fireworks light up the horizon in shimmering reds and golds, their reflections dancing across the water.
A small-town fairground surrounded by hills, with Ferris wheels spinning and bright fireworks shooting into the starlit sky, echoing across the open landscape.
A bustling harbor at midnight, colorful fireworks erupting over anchored boats, their light casting shimmering patterns on the rippling water, as seagulls scatter into the night.
LTX demands longer prompts for quality supposedly so I rewrote them like so (only including the first 2)
A lantern festival at dusk by a peaceful lake, where hundreds of glowing paper lanterns are being released into the darkening sky. Each lantern glows softly in warm tones of amber and gold, their light casting faint, flickering reflections on the still, glassy water below. The camera focuses on one particular lantern drifting upward, its delicate paper frame visible against the fading hues of the twilight sky. In the background, faint silhouettes of trees and mountains outline the horizon, while soft chatter and laughter echo from festival-goers gathered by the lakeshore. Suddenly, bursts of colorful fireworks light up the sky, their vivid reds, greens, and blues illuminating the lake and the delighted faces of onlookers.
A vibrant city square on New Year’s Eve, alive with the energy of a cheering crowd bundled up in winter coats and scarves. Confetti flutters down from above in countless shades of pink, gold, and silver, sparkling under the glow of neon lights and massive electronic billboards. The camera pans upward to capture a display of brilliant fireworks erupting in the night sky, their vivid colors reflecting off the glassy facades of surrounding skyscrapers. The sounds of joyous shouts, distant car horns, and the rhythmic beat of celebratory music fill the air. Steam rises from street food carts, and the occasional pop of champagne corks adds to the festive atmosphere.
of the 5 videos only one of them actually has fireworks unless you count that confetti explpsion in the city square
How many time the generation takes for you ?
LTX is 15 times faster pixel for pixel on an 8GB GPU
yeah depends on your settings, stuff like resolutiton, frame count and steps are the major factors
90 seconds for how many frames and what res?
Is the base res
so 768x512
make sure to test it twice, the first time it includues model loading time, the second time should be much faster
Yes
With models loading is 130s
I can't wait to see fine tunes and loras of this model
Yes but it seems broken
There are no movements in the video
LTX is so fast
Yeah , this model is perfect for build a 4d video model
What model did you use?
When trying to pip install the requirements.txt of LTXVideo in ComfyUI, I get an error due to conflicting dependencies ; (comfyui-easynodes depends on torch) ; any pointers to fix that ?
ok thanks!
You could uninstall the easy nodes via the manager and then try again
ok will try
easy nodes is not visible in the confyui manager ; as I understand the requirements of LTXVideo include them ; I'm at a loss here. Not sure if I'm asking at the right place though
I think the ComfyUi server would be a better place but the best advice I can give you to actually figure it out is have ChatGPT help you, just explain the problem like you did here and share the logs in detail and it’ll tell you exactly what you have to run and how to get it working
that would be wild but it's worth a shot 🙂 thanks for the advise
It’s not wild at all it’s my go to solution for any conflict resolution issues, make sure you don’t hold back on the logs it’s important it sees the extent of the errors to get a proper solution
is there gguf quant models for ltxv ?
quality is on par with Mochi
Those example videos are sadly heavily cherrypicked. If you test it, it will give you worse videos then the new ltxv and cogvideo. Has i2v and t2v only, no v2v or anything else.
i'm judging the quality based on the my 'rubric' Ive been using to test different models
so this is my prompts:
A lantern festival at dusk by a peaceful lake, glowing lanterns drifting into the sky, their warm light reflecting on the water, as bursts of fireworks illuminate the scene in vivid colors.
A vibrant city square on New Year’s Eve, confetti raining down from above, cheering crowds, and brilliant fireworks exploding in rapid succession against the backdrop of towering skyscrapers.
A serene beach at night, waves gently lapping the shore, as fireworks light up the horizon in shimmering reds and golds, their reflections dancing across the water.
A small-town fairground surrounded by hills, with Ferris wheels spinning and bright fireworks shooting into the starlit sky, echoing across the open landscape.
A bustling harbor at midnight, colorful fireworks erupting over anchored boats, their light casting shimmering patterns on the rippling water, as seagulls scatter into the night.
i ran the first 2 on huggingface but i was going to run the whole set locally
here it is
i think those look way better than LTX and on par with Mochi
Oh it did a good job with that, t2v isn't too bad(still worse then cogvideo imo) but i2v gets blurry and morphs a lot. On humans as well, it does a pretty bad job.
i never got cogvideo to workk, i tried a bunuch of settings, tried a bunch of models and i just get allocation error (oom)
i got pyramidflow running already too, straightforward install
doesn't look anyywhere near as good when running locally
LTX
The Swin-DPM is crazy , tou can use it with a'y DiT video generation model , it allows you to extend video duration above 1mn and generate infinite duration videos
Ltxv can do video to video now
Well tried mochi with a bit longer prompt (obviously with water zombies...)
And used CogX for this one:
lool water zombies
i like the lighting for mochi better over cog in that example, i don't like that high contrast dark setting cog has
For the cogX one i used Img2Video so the starting image was one generated.
that's a bit of a cheat
anyways whatever works for you, if anything Id say cog isn't that far behind mochi, if i were to rank the big 3 I'd say mochi first, cog second, ltx distant third in terms of quality
i totally agree and at least cog is supporting img2video and even different resolutions if you use the fun model.
i wanna try cog i really do it does seem cool, especially the orbit lora, if i could get that to run on my machine I'd fast track it inito my system to be able to click any image and make an orbit video for it, i think that would be neat but as of right now I tried a few different models and i always get allocation error, on the Load checkpoint node too its not even getting past just being able to load the checkpoint (8gb vram gpu here)
anyways @pliant current pass the joint lets smoke of of that good stuff you have 🌲 🚬 lol
I am suprised by cogvideoX result
Is possible to run cogvideoX 1.5 5b with 12 gb vram ?
Thanks for the hint for the orbit lora, it works quite flawless...
lol dammit bro now i'm jelly i cant play with it 😦 I'll take a hit for that clip 🚬
With fp8 Text Clip and maybe some offloading it should work with less vram?
For 12 Gbyte VRAM it would need the usage of the 2B Model instead of the 5B i guess...
but I don't think the 2b model supports the orbit lora so i gotta use the 5b model and i only have 8gb of vram anyways it already got too messy for me I called it quits
Yes i just read the list, for now only 5b models are supported. Well i guess it will take the devs 1 week to get new models out...
i saw a new tool called AnimateAnything that seems better than cogvideo in that it's not llimited to just orbiting left
This? Cogvideo seems far far better but still decent control. https://animationai.github.io/AnimateAnything/
but you could probably use Tora with CogVideoX to achieve similar level of control.
Did you try cpu offloading with fp8 when loading CogVideoX? Should fit in 8gb vram them btw.
similar name, different project: https://yu-shaonian.github.io/Animate_Anything/
that looks way better right?
i love this one , is like a dream
Yeah seems pretty decent, but honestly for cameractrl, dimensionx might looks better but only part of the loras are released yet(the orbitx loras, up and left)
https://chenshuo20.github.io/DimensionX/
For trajectory control, Tora might be better: https://github.com/alibaba/Tora
DimensionX is kinda crazy, the example vids
cool ill look into Tora
is LTX-Video only for ComfyUI ?
yes , you can run directly with python inference code btw
how fast should LTX render videos?
taking me quite a while with a 4080
and the quiality isn't that great
seems like the scene completely changes
do you tried with a longer prompt ?
do you know a good vision model that describes images well? I would like to automate the improvement of prompts with a vlm in comfyui ?
is the ltx video model enough, or do i need another model like stable video diffusion img2vid for it to work ?
Is enough
You just need to update comfyui and download t5 xl encoder
how extensive is the ltx model ? what if i want a video with certain movie characters ?
for now is hard to prompt with this model because is undertrained
the dataset seems to be limited
damn
but you can enhance your prompts with an llm , it works sometimes
I have no idea what else to Add to the prompt
Santa claus smoking weed and riding a bike, camera follows the bike, camera slowly moves up and centers santa claus on the bike
It just turns to different stuff with smoke in between
It feels like it makes 3 images and blends between them
Anything im doing wrong?
You can try Llama it can describe images with some versipns, and obviously gpt 4/o/o1 prev
You can use img2vid tech, where you give it the initial image, but i couldn't make it work my way yet
I wonder if cogvideox is better
remember the fiirst run is always going to include model loading times, always measure subsequent runs where everything is already in memory, i was getting like 160 seconds for 201 frames at default resolution and steps using 8gb vram
LTX
Using WD-14 tagger + llm_party would do the trick for captioning
thanks
i am using llm party , how to ask a multimodal ai ti generate prompt for i2v , when give an image it give the description of image unleas the system prompt
Hi, everyone! who has the experience about cogvideo 1.5 fine tuning?
I want it to generate a prompt for me to generate a video
minicpm-v is realy great for generating prompts with images
not that much, maybe 20 sec 15 seconds
but after a few runs it goes down to 10
Amazing
still though, it is incoherent as all hell
It needs long prompts to work
LTX I2V
"Generate an anime-style image of a young woman with long, braided hair that is half green and half pink. She has bright green eyes and is wearing a dark school uniform with a white collar and gold buttons. The background is a blurry, green outdoor setting. The overall aesthetic should be inspired by 'Demon Slayer' with a focus on vibrant colors and detailed character features."
The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.
My Links 🔗
➡️ Subscribe: https://www.youtube.com/@WesRoth?sub_confirmation=1
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https://natural20.bee...
what are the system requirements for stable video diffusion
I think open ai will release sora-turbo soon
very low, runs on mym 8gb pretty fast
they haven't even released sora at all, much less turbo
@silent hinge check it out I made these with SVD
maybe they have a sora2 or sora 1.5 in reserve. but I think the most likely is that they moved to another one because Sora had emerging abilities for world models
the weights weren't actually leaked, that would be way cool, openai just gave unlimimted early access to 300 people and one of the 300 decided to make an API for their account so others can use it on huggingface
so openai didn't really lose anything except letting more than 300 people they wanted to use their service
its stiill cool tho seeing all this new footage, amazing how much better their stuff is than anything open source @silent hinge @summer seal would you guuys agree that Sora is next level when it comes to via gen or would you say any other company rivals them in quality?
Minimax is definitely close if not a rival. The quality of minimax is next-level.
haven't seen anything beat the museum flythrough video sora did
I don't follow video that closely cos it feels a bit pointless spending money on it this early
I spent $10 on Sellerpic.aii
it does feel silly but itt was $10 for 50x 5s videos i feel that was a good deal
Yeah we should wait for more efficient faster models, I mean we already have Mochi which is really amazing, and can be easily enhanced to sora level outputs but no one really uses it since its just really slow and for full quality, requires 40gb vram gpu.
We neec much more control as well, the dimensionX control is amazing but only for cogvideox and hasn't been fully released. like this
agreed, its just a matter of time until open source catches up to sora and minimax
Actually with all the tools, I think we are already there. But it's not really worth it when most consumer gpu's cant even run the models, and even if they can, it takes forever.
that's why i agree once they're more efficient they'll be able to run better on less hardware, its just a matter of time until I can make sora level viideos locally and quickly with my 8gb gpu
have you seen those screenshots of the reddit threads peple talking crap 3 years ago thinking image generationi qualiityy won't be photorealistic in our lifetime?
we will soon have models that generate very good quality videos with a generation speed of >1 video/s, for me the next step in open source is video control, and long videos >1 minute
lol idk about 1 video per second, even LTXV which is extremely fast and just 2b params, takes like 10sec on a 4090 but yeah long videos are important, 5sec videos are not very useful. CogVideoX already has massive amounts of control, but I want to see the same level of control with other models.
I would like to see control tools on ltxv when the training is finished, there are also control tools for closed models that I would like to see in open source
there was a mode that google had presented (light) at the same time as sora, you could modify a video like omnigen but in video
actually @silent hinge post the animation here instead of in DM so we can all see it
wow nice!
are you still going to make that train one?
@silent hinge someone actually uusing the #svd to post svd generated content
/create prompt: flowing black ink in water, dramatic lighting, dark mood, abstract patterns, tattoo machine silhouettes, brutalism style 8k
Ltxv in 8gb vram https://www.reddit.com/r/StableDiffusion/s/iwgXl10PEv
TLDR: The video shows how to run LTX through diffusers and scripting and I'm assuming making modifications or decisions to ensure it fits in limited memory
I recommend instead just use it with ComfyUI where you get a nice variety of ways to save it, customize it, and you get --lowvram mode where you don't even have to worry or think about memory limitations
right
i will stil use comfyui
ol its funny bc i just went through the trouble of dowloading that dumb qwen2vl project to run it locally, t rried the diffusers code running it locallyy, I'm getting out of memory issues, so basically i can't test the original code if i wanted to port it
it's a shame that all these interesting models take up so much memory
This is amazing , i dont know you can reach this quality with ltxv https://www.reddit.com/r/StableDiffusion/s/XfmPcGUTlE
Making a few giantess renders, I used fooocus and Cog Studio for this one
@silent hinge pls post that clip you made with Mochi, it took hm 32 minutes to render it with his 6gb GPU lol
i imean technically she doesn't look giant, she looks normal sized, bc that could be a tall mound that's close by, it's just a matter of perspective, if i didnt know she was supposed to be giant I wouuld've never seen it
Does someone tryed v2v with ltxv it looks very clean ! https://x.com/AIWarper/status/1862231262486045087?t=MgNX3GWffsKu4trGoAD-TA&s=19
what prompt did you give it?
and mind sharing the work flow?
getting just barely usable stuff from it
The prompt is made by Florence node. This is a good method for not make still videos and get better videos https://www.reddit.com/r/StableDiffusion/s/YHoknAp5vK (not used in my video but i will adopt, it's really works). That video is the result of 4 different videos combined, the workflow is not user-friendly
custom nodes or vanilla?
Where?
The video is better if you use video helper suit node to load image
https://www.youtube.com/embed/9yFwgbAlvUU?si=6xtnPkCYDmCB7Dq- This is a test I made using fooocus and live Portrait to make a AI influencer and a explainer video
This is a new AI character I created for my influencer and explainer videos. Her name is Steampunk Lucy, or better known as Bot Gurl. In this episode she will review the Allegro tet2vid creator. She will be reviewing the version that you download to your computer. I went with the Steampunk look just because I love me some Steampunk lol. I used ...
I2V LTX
Also I2V with LTX
v2v LTX
https://www.youtube.com/watch?v=n5qeh5A0Zvw&list=RDn5qeh5A0Zvw&start_radio=1
anyone recognize something about this video and can tell what tool was used to make the vid?
Starships and Star-Babes Science Fiction Future in AI generated short film "Near Space".
Its a bunch of videos generated by a image-to-video model. The images themselves, I'm not sure but flux/sd3.5 can easily generate those images.
You can probably use any of the top closed source i2v service like kling, gen3, minimax to generate such videos. Even cogvideox which is open source could probably make the videos.
was aware of kling and mini but not the other 2, will check them out thanks
damn i just tried gen3 trial, was pretty solid. bit expensive tho lol
The videos don't seem to have a lot of motion either, you could probably even use LTXV which is open source. Its super cheap and fastest video model(3sec for a 5sec video on fal).
Quality isn't like gen3 but for the videos like in that yt video, you don't need gen3/luma/kling/ quality.
Okay
those are fantastic!
Does someone tried ltx-tricks nodes ?
The second is insane
so with sora leaked, does this mean anything for local video AI ?
nope, that was just a public api access but it was not a open model or anything. Mochi-1 is still the best text-to-vid model(better then gen3/kling1/luma) but requires a large amount of vram to run locally at full quality and is pretty slow.
Ive been using cogvideox for a while now and so far so good, is there some tips for prompting to generate more specific things for img2vid? Like if I wanted a ball to bounce 5 times and turn green after 3 seconds, is there a prompt scheduler for cogvideox?
I Love me some Steampunk! I made this montage using Fooocus and Cog Studio, did the final editing in OpenShot and Audacity https://www.youtube.com/watch?v=T-r11IDDnpQ
This is the first video in my all new series featuring amazing Steampunk images that I have created. All of the images in this video were created using open source software. I used Fooocus and Cog Studio to make the image and video's. I also used Audacity to edit the music and Video Shot to do the editing of the final project.
Is there a tool I can add to stable diffusion to make it possible for me to make a video?
@distant vale Cool, glad you liked it! It took me about three days to make it, most of that of course was render time because I am using a Geforce RTX 2070, so it takes about 40 minutes to animate a 6 second image
@distant vale Really? I am only using 8GB of VRAM myself, and I have never had any issues, I have used both txt2vid and img2vid at the 2888x1920 super resolution option and never had a problem, I have a i7 cpu and 32GB of computer ram. The program that I use is actually called CogStudio, I downloaded it using the Pinokio web browser
@distant vale TBO I am pretty new to all of this, never even noticed that part lol, but yes I use the default 2B, and I also use the float16 option if that helps
I don't know video stuff sorry
No idea, I mostly use models with diffusers. I would probably recommend asking in the banadaco discord because it’s all about video gen there with comfyui.
any programms like CogStudio or similar that can be used on services like vast dot ai ?
@distant vale So I tried the 5b model in Cog Studio and it worked great, took longer then the 2b of course, but the quality is much better
Only problem is that it requires 24gb vram gpu I believe right now, even with later optimizations, idk if it can fit in 8gb vram. Maybe with 4bit quantization.
you can use fp8 version, so you don't need to quantize yourself
https://huggingface.co/Kijai/HunyuanVideo_comfy/tree/main
Lol, why did Tencent finetune a 389b param llm(which is pretty eh, mistral 123b outperforms) to just enhance prompts for HunyuanVideo which even llama 3.2 1b without finetuning can easily do.
https://huggingface.co/tencent/HunyuanVideo-PromptRewrite
800gb 😆
i love stg because the video have more motion
whats the best way to get longer videos? i'm currently using cogvideox
What is the best way for img2video in a1111?
Probably Deforum
Is it worth switching from a1111 to comfy?
For video? Sure
Here is another AI generated video I made about the Steampunk genre. I used Fooocus for the original Image and Cog Studio to animate.https://www.youtube.com/watch?v=SXIRhr2jA08
Here is another episode in my Steampunk series, where I use AI to recreate the steampunk experience. As usual I used all open source programs to make this video. I used Fooocus to create the images and Cog Studio to animate the images. Then I used Open Shot to do the video editing. All of these programs are 100 percent free and can be downloaded...
IMAGES OF STEAMPUNK (THE DARK SIDE)https://www.youtube.com/watch?v=MAnHDWfzpto
I love me some Steampunk, but my favorite kind is the dark, gloomy, gothic type of Steampunk. Where evil mechanical creations lurk under the cover of fog and darkness, with plenty of sexual deviance and gloomy undertones. So this is my first collection of renderings focusing on the dark side of Steampunk. I used an open source program called foo...
"In this powerful and visually stunning video, Moses has a divine encounter with God, who appears as a child in the desert. The video explores spiritual themes with breathtaking visuals shot in IMAX and Panavision 70mm format. Experience this sacred moment with immersive sound and music that will transport you to a world of mystery and divine re...
I wonder if Hunyuan would ever be compatible with Forge in the future
It doesn't support controlnet for flux or SD3.5 officially. I wouldn't have any hope
I gotta learn how to use comfy then...
For video it's a must
"Prepare to journey to a galaxy far, far away... in this groundbreaking trailer for a 1953-style Star Wars adventure starring Sarah Jessica Parker as Jedi Alora Vannis. Featuring stunning Panavision Technicolor 70mm visuals, this trailer transports you to a time when the Force was new and the galaxy was filled with untold mysteries. With an unbr...
prompt = (
"A beautiful Asian woman, wearing stylish and slightly sexy clothes, "
"sitting in a cozy cafe, holding a cup of coffee. "
"Photorealistic, highly detailed, natural lighting."
)
https://www.youtube.com/watch?v=l3wRDb-03Zg I usually don't make xmas videos, but I had a few xmas renderings already made up so I thought I would mix them in, hope you all emjoy, and Merry Christmas. By the way, I used Fooocus to create the images and Cog Studio for the animations!
Welcome to "Images of Steampunk Dark Christmas," a captivating 1-minute steampunk music montage that blends the whimsical charm of Victorian aesthetics with a dark holiday twist. This AI-generated video showcases stunning steampunk visuals, bringing together intricate gears, vintage machinery, and a mysterious, Christmas-inspired atmosphere. If ...
Join us in this enlightening presentation as Albert Einstein unveils the potential of artificial intelligence in filmmaking. Using a simple yet profound formula, Einstein predicts the year when AI-generated films will rival the quality of Hollywood's best. Discover how technology will reshape creativity, culture, and the economy in the film indu...
I was going through my AI folder and saw I had quite a few renders left over from this year, so I decided to put them together in this video! I used Fooocus to make the images and Cog Studio to animate them! https://www.youtube.com/watch?v=J6gWbgs2_jE
I am still very much in the experimental stages of AI art. Most of the time I make renderings based on how I feel at that moment, other times I make renderings based on ideas that I have had in my head for many years, but regardless of what motivates me, I know with AI I can make that video or artwork a reality. I use a program called Fooocus, a...
Can cogvideox only make 720 * 480 videos?
One message removed from a suspended account.
yeah
nope, the funs one are very flexible, they can go as low as 128x128 or or 1024p and the 1.5 cogvideo can go as high as 1024p too.
I probably recommend ltxv though at this point since its so much faster and honestly better/similar quality.
New video out!! This time with a futuristic theme. As usual I would love to get some feedback, and mabye also support to keep growing. I cant imagine how I reached 500 subs in 2 weeks. Much love ❤️
https://www.youtube.com/watch?v=BJAdKSqfhSQ
In the year 2170, humanity has left the surface behind. Above the endless sea of clouds, futuristic cities float like islands in the sky, gleaming in the sunlight. Below lies a forgotten world—dark, desolate, and ravaged by time. This cinematic journey takes you through breathtaking skybound metropolises and the haunting remnants of the old Eart...
any hunyan prompting tips? espeecially when trying to create silly things like chef kitties lol
just tried 1024 seems 96 frames max for my vram
for 1280 max was only 64
Guys, I have a question: what image to video method would you use to animate a boy's mouth for a few lines?
I'd like to make a video like this from an image like this
Here is my latest AI generated vid, using fooocus and cog Studio along with a few other open source tools! https://www.youtube.com/watch?v=C24eywwdj9Q
This is my latest AI Steampunk inspired music video, like all of my videos I only use free and open source programs. A lot of people have a misconception that all AI generated content is super easy and you really don't need talent to make outstanding work. Well that might be true in some cases, but in my case that is not true at all. First of a...
move slowly into center
I did just hit 1000 subs in under 2 weeks, got so many people from discord supporting me, love you all ❤️
https://youtu.be/7VzUF-Z9Mzg
Wow, I can’t believe we’ve hit 1000 subscribers in just two weeks! Thank you so much for all the love, support, likes, comments, and subscriptions. I’m beyond grateful to have you all here with me on this journey! I can’t wait to share more exciting content with you, including longer stories and new creative projects. This is just the beginning,...
The people are slowly walking from the shore to the boat, preparing to board the boat, and the birds are flying slowly in the air
i made the part 3 of liminal spaces turned to videos : here
softwares used :
- comfyui : https://github.com/comfyanonymous/ComfyUI
- ollama(prompt generation) : https://ollama.com/
ai models used :
- video generation(ltxv0.9.1) : https://huggingface.co/Lightricks/LTX-Video
- sound generation(mmaudio) : https://github.com/kijai/ComfyUI-MMAudio
- prompt generation(minicpm-v) : https://ollama.com/librar...
🌥️ New Video Out Now! 🌥️
"Unfair: Dreaming of the Sky | Emotional AI Short Film" is live on YouTube! This is one of my most emotional projects yet—a story about a cute girl dreaming of a life beyond the shadows, in a world where only the privileged get to live above the clouds.
I used AI tools like Leonardo AI, Hailuo AI, Chat GPT to create stunning visuals, and edited it all in DaVinci Resolve. If you enjoy it, please consider leaving a like, a comment, or even subscribing—it really helps me create more!
🔗 Watch here: https://youtu.be/IqTFKBCBT04
💬 Let me know what you think! Feedback is always welcome. 🌟
In a world divided by wealth and power, the privileged live in grand cities above the clouds, basking in sunlight and luxury. Meanwhile, those left behind struggle to survive below, in darkness and despair. This emotional AI short film follows a cute girl from the shadows below, dreaming of a life beyond the sky—one she will never know. Through ...
Well hi everyone, how does this work ?
🌿 New Video Out Now! 🌿
Check out my latest AI short film, an emotional fairy tale with breathtaking visuals and heartfelt storytelling. Don’t forget to like, comment, and subscribe if you enjoy it! 💫
🔗 Watch here: https://youtu.be/X5-YTBA6fxU
Would love to hear your feedback! 😊✨
In a world where magic once thrived, cute fairies girl lived in glowing forests and sang songs to the stars. But as belief in them faded, so did their light. This emotional AI short film follows a heartfelt fairy tale about loss, hope, and the magic we’ve forgotten.
For business inquiries, collaborations, or sponsorships, feel free to reach out...
👁️ New Video Out Now! 👁️
Check out my latest AI short film, "Rethinking Monsters Through a Young Girl's Eyes". A unique blend of beauty, fear, and stunning AI visuals. Would love to hear your thoughts! 💬
🔗 Watch here: https://youtu.be/oLD15N2Wh6E
In a world where beauty meets horror, a young girl faces her deepest fears as monsters loom in the shadows. This emotional AI short film blends breathtaking visuals, eerie atmospheres, and heartfelt storytelling to explore the contrast between innocence and darkness. Witness a tale where beauty stands in the face of terror, and fear becomes some...
Is the best local video model right now Cog video x1.5?
100% not, hunyuan is the best for sure, comparable to commercial video gen models but it doesn't have i2v yet. i2v release date is supposed to be in january I believe.
ltxv and cogvideo are significantly worse but both have i2v, if you want i2v though, I would highly recommend ltxv since its way way faster and similar quality, possibly better.
Hi, I was trying to run cogvideo, I got this
im downloading comfyui with hunyuan and im wondering what hunyuan model should i install? to do videos in 720p at least constistently for like tiktok and insta reels. I can choose from the bf16 model to like q3 even to q8 (q7 skipped). i have a rtx 4060 ti with 16 gb vram and 16 gb ram
all the models are from the hugging face website
how to make an ai video from image but with prompts too? so the ai doesnt do only whatever it wants, i want to give it direction
Fp8 is the fastest, for gguf I would recommend q4 or q8
Ok so is like Q8 slower but a better video?
?
Yep, q8 is the closest to default but the slowest too.
KED KED KED 8B1TCH - OVRLL BGZ - HORRORCORE EXPERIMENTAL SCRATCH HIPHOP AGRESSIVE SERIAL EXPERIMENT
]
MADE WITH SUNO AI
CHECK MY INSTAGRAM: @overll_bgzy
#suno #sunoai #aimusic #aivideo #aftereffects #aiart #ai #phonk #hiphop #lofi #beat #beats #aesthetic #genmo #capcut #rap #rapbeats #horrorcore #horrorcorehiphop #brazil #brazilmusic #cyber...
a cat playing with a bal .
The character walks through the forest and spots something surprising."
roar cub lion
While trying a cogvideo workflow
I tried manual installation, "Try fix"
Is there some simple workflow?
i downloaded hunyuan ai in comfui as a gguf. how do i add like audio synced to the video within comfyui ?
Noob question here. I'm gonna come out and just pretty much assume that a 3070 is not enough for local video diffusion, is it.
Well it can work, it will be just slow.
Ltxv is going to be pretty fast like 1min but quality isn’t going to be like closed models.
Hunyuan is really good quality(comparable to closed) but will take much longer and has no image to video.
A serene sunrise over rolling hills, but slowly, the skies darken and thunder rumbles.]
[Opening shot: A serene sunrise over rolling hills, but slowly, the skies darken
Hunyuan
that's run locally, how much VRAM?
I have yet to run some img2vid locally, I couldn't run cogvideo
the setup is so tedious and I got errors
Yes, it's local with 24GB. I use i2v with Hunyuan as well.
how long doest it take to generated? it is possible to run on 12 GB?
I had problems even geting the simples workflow running with 16GB
I have Hunyaun running on my 8GB 3050RTX. I can get 173 frames at 576x416. Feel free to try out the workflow. There's a lot of bypassed junk in this ongoing WIP.
Ever wondered what Attack on Titan would look like if AI took over the script and animation? Well, wonder no more! In this AI-powered parody, we’ve rewritten Episode 1 with brand-new scenes, unexpected twists, and dialogue that definitely wasn’t in the original.
From hilariously misplaced action sequences to dramatic moments that take a turn for the absurd, this is Attack on Titan like you've NEVER seen before!
Ever wondered what Attack on Titan would look like if AI took over the script and animation? Well, wonder no more! In this AI-powered parody, we’ve rewritten Episode 1 with brand-new scenes, unexpected twists, and dialogue that definitely wasn’t in the original.
From hilariously misplaced action sequences to dramatic moments that take a turn fo...
My first attempt at cinematic, filmic AI trailer from the 90's (all practical effects) - Star Wars IV (1994): https://youtu.be/NFuB1Y5QQ_E created using SwarmUI for stills & ComfyUI (nodes) for video; Hunyuan Video with V2V + Loras, Hailuo Minimax as my base videos and I2V, SDXL & Flux .1 Dev for stills. Also some minor Grok2 and some stills tweaked using Photoshop's generative video function. What a whirlwind this was... Happy to answer any and all questions. (PS contains some violence so I guess its considered NSFW depending on where you work)
Music by John Williams, Myles "Rain Sword" Rogers, Omega, Michael Kamen, Brad Fiedel.
Created using Hunyuan Video, Hailuo Minimax, SDXL 1.0, Flux .1 Dev, SwarmUI.
Is there any way to use hunyuan video online
without comfy
or through something like google colab?
Poe, but you'll need a subscription cause of very intense credit requirements on hunyuan
Ai Generated with Stable Video Diffusion
Models used: epiCPhotoGasm ultimateFidelity and Stable Video Diffusion - SVD - img2vid
#ai #aivideo #stablevideodiffusion #stablediffusion #liminalspace #rtx2060 #backrooms #uncanny #uncannyvalley #liminal
a cat playing with a bal .
generate Cartoon IP for voice product, it will show in TV, cute
Do you happen to have a guide you followed?
I'm pretty sure it was an example file. Try the workflow I posted above.
Way better results than even image gens with hunyuan 
@foggy lantern Compromised account right above

je souhaite trouver une personne qui pourrait m'aider : stable Diffusion et DeForum a fonctionné pdt trois semaines et maintenant ... ça marche plus; iT Doesn't work anymore. I don't know if anyone could give me some help.
Hey @foggy lantern, can you permaban the person I'm replying to? Spammed all channels.
Will do, back in front of a computer in 15
Thanks a bunch. Tired of bots trying to scam people.
@foggy lantern Compromised account above.
Steps:
- Open up mitte.ai
- Create or upload images
- Open image in editor
- Switch to Video from toolbar
- Write prompt + hit generate button
for those looking for Veo 2 image to video, mitte.ai has it right now
Hi guys👋
Please help me to solve a problem in Stable Warp Fusion.
I will detail the details below:
Need to make a picture similar to the one at number 1. That is, need the same style, clarity, drawing realistic-anime, cartoon, etc.
I get a picture number 3.
I used a lot of models such as revAnimated_v2Rebirth, realisticVisionV60B1_v60B1VAErealisticVisionV60B1_v60B1VAE, realisticFantasy_v20, juggernaut_reborn, faetastic_Version2, dreamshaper_8, Anime_style.
Also tested different settings, strength, strength in promts, tried lors, samplers, etc., but the result did not change or was worse and not at all like what I need.
The picture remains dark, fuzzy, poor quality, smoky, well you can see everything yourself.
Picture number 1(frame from the video), was made on the model Revanimated(most likely).
Also the background is not drawn (I need an abandoned building with mirrors on the walls), but in the first picture the background is drawn well, although the original video is just white empty walls and floor.
Under number 3 will be the original picture from the video (original).
Also when loading there is an error on the last picture, but despite this the service still loads and works.
@foggy lantern How is the rules regarding mmaudio? for instance a 24 sec clip from the matrix, but mmaudio applied for a.i generated audio? Or would that go under "no piracy" rule?
I'm cool with it, if someone tells me otherwise I'll let you know
Gotchu! Have my "low budget" gen of the first matrix fight 
this is pretty cool!
I'm gonna have to check out this tool
Update all the nodes first, as my videohelper suite was outdated and borked and had to reinstall it :P
But for a 24 second video, it barely had enough vram left on my 3090 :P
Where can I find "How to" to do videos with stable-diffusion? Edit: or is it just in alpha stage?
trying to find a nice workflow for slowing down video footage taken at 60/120fps -> 240+
You're not trying to slow it down. What you're actually trying to do is generate frames between frames. You want specialized stuff for this.
I recommend https://github.com/k4yt3x/video2x
So... can some one give me a tutorial on running hunyuan in comfy? I read that it needs me to install some dependencies outside of comfy? Also, I reinstalled comfy a month ago and couldn't find why the comfy manager is...
It's been a long time since I last used comfy
Sorry
Also does WAN 16B needs extra dependencies too?
So, eh, I think I got the comfy stuffs set up, but I've run into another problem:
I tried running it with the quantized gguf model, but I don't know which node I'm supposed to use to load it
Put the Wan gguf models in unet folder, but it does't show up in the comfy UI, what did I do wrong
hi. the file should be placed in /ComfyUI/models/diffusion_models instead of unet folder, and in comfyui instead of load clip, load GGUF
hi, I can't find a GGUF loader, is it an addon node I have to pull somewhere?
Thanks for replying btw
Could be, do you have ComfyUI manager installed? It could help you in the long run
Yes I do have comfy manager, do you have a gguf node installed?
Seems to be running, will let you know if I fucked it up
Hi, it did run but ended up with all green noise.
Are yor workflow from the huggingface wan repacked?
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/example workflows_Wan2.1
I couldn't find the vae gguf node
Can you perhaps share your workflow with me? I can at least try getting the manager to download the correct nodes for me
Huh, it's too lae over on my end but I suddenly thought it might be a problem with the version of comfyGGUF I chose
I picked latest instead of nightly. Probably why I don;t have CLIP type of wan
still missing VAE-GGUF tho
Yeah... I guess I wasn't supposed to use the unet loader
sadly I just can't find a "GGUF loader" whose model type name is "gguf"
I'll give this a rest for now
Best I can get is the above.
That's probably because you can right click on nodes, hit "title" and edit them to whatever you want them to say. >_>
It's working! It might have been a problem with the resolution or the model (I switched to Q4_K_M and it's outputting the fox)
Does anyone have any idea what app may be capable of doing something like this? Also, this is an automated process, after recording the 4-5 second normal video it processes and outputs what you see here.
Man WAN is great
WAN is awesome, I just wish it were faster. 6 minutes is not too bad, but it's still such a long time to wait when images generate in 15 seconds. I hope I can get the generation down to at least 3 minutes. That wouldn't be as bad.
The workflow is embedded I think (ComfyUI)
Look up the kijai workflow
There is a good tutorial on youtube
I mean it depends on the resolution, im making those pretty big
Kijai is a good base, but there are some improvements you can make outside of that
You talking about tritton/sage/teacache? I looked into some of that and it looks way too complicated
It's pretty straight forward, there is a Triton for Windows repo that talks you through
Getting that working is key
did it make a huge difference for the speed?
It's taking me about 6 minutes to generate a 440p video.
For me Yeah, I could increase my speed by almost 400% with a few other optimizations
But I also got a 4090 so idk
Tbf I prefer the setup I posted tho, some stuff has major quality losses
WOW! I have a 4090 too.....ok....sounds like I need to really set up tritton. Did you set up sage as well? I still don't understand if that's the same thing or if it's different. I'm going to save your workflow above.
any chance you remember what github you followed to install it?
I can check later if I dont forget
how long does it take you to generate a 480p or 720p video?
400% speed increase sounds amazing
Im doing a little bigger than 720p in about 4-5 mins atm but not everything enabled
Well I would use the clip and split it into half. The take the last image of the first part and use a llm on it to get a prompt. Then use the prompt as base for deforum or image 2 image generation. Align the result images or the deforum animation between part 1 and part 2.
I understand what you're saying, but i'm interested in a option to do this on-site, in a photobooth style of setup, where people don't have to wait more than 1-2 min
And? I guess the resolution of the image would be like 768 x 512 or similar. On a RTX 4090 generating a bunch SDXL Images don't take 1 minute. The LLM Part works in seconds. Simple cut and paste operations with ffmpeg works also blazing fast. Won't see any problems with a timeframe of < 2 Minutes.
Wan test
rife interpolator benefits Wanx a lot and it is fast, there is comfy integration too
Thanks. With GIMM-VFI Interpolate:
is there anybody?
no
Hi guys, can you please advise me why the face can turn out so fuzzy and poor quality?
I tried all possible settings, promts, constrolnet, but the face is still so bad.
I got triton installed, thank you so much, but the workflow is not embedded in the video. When I drop it in comfy, nothing happens. Can you point me in the direction of the workflow you are using?
animar personaje estilo pixar
A boy
Thats mine
🕷️ ⚠️
new bee
zen me le
@late dove 老乡吗?
@late dove 哇,你中文这么好
你是不是也懂得“new bee”的意思👍
@late dovefine . it's good to talk with you. I'm Chinese 🇨🇳
@distant badgerMy English is not very good,but i'd love to make new friends with English speaker😃
生成视频
Hey guys, I'm new to stable diffusion.
I want to create several images which contains the given person with the exact same hair style.
Is this possible?
any notebook for image to video generator that works?
I made this video using Hunyuan Video text to video. I swear the quality of these programs just gets better and better https://www.youtube.com/embed/EnuHXtBcOas?si=FpYYtzdR72x_4ZTX (edited)
YouTube
This is my first video using the Hunyuan text to video program. Using this program I noticed a major improvement in image quality over the program that I use to use called Cog Studio. For this video I used all free and open source software, except for the editing program. The editing program that I used is called Pinnacle Studio and even though ...
What's is the latest video generator that is available?
WAN2.1 is the latest I know of.
Can I use WAN2.1 locally on desktop?
If you have enough VRAM, yes.
Is WAN only for comfyui or can I use it with SDUI?
I have no idea if it can be used with SD Web UI or Forge. I've been using ComfyUI for the last year or so.
/animate [attach image] motion=high style=cinematic, dynamic motion,the boy is running towards the light, Frames per second (FPS):30,Resolution: 1280x720,Animation length:5 sec
You can use pinokio for Wan 2.1
Vision,This vibrant,digitally illustrated image showcases a whimsical,futuristic cityscape in a cartoon style. A central tower,resembling a giant clock,features various colorful rooms and platforms with characters in casual attire. The city is surrounded by fluffy white clouds,with a bright blue sky above. The scene is bustling with playful elements like a flying soccer ball and a cat. The overall aesthetic is cheerful and imaginative,blending elements of fantasy and modern architecture.,
sharing some results i got using sdm1:
https://youtu.be/jxRS9WhhoEY
What if an entire city could be frozen in time? This is the tragic story of Pompeii—a thriving Roman metropolis wiped out in a single day by the fury of Mount Vesuvius in 79 AD.
⚡ Witness the warning signs that were ignored.
🔥 Experience the terror of the eruption.
🕰️ Discover the haunting remains of a civilization lost for 1,700 years.
...
((cartoonish style), (Q版 fantasy)),
main elements:
smiling sun character with straw hat (拟人化太阳),
wheat fairy holding scythe (木属性精灵),
dynamic composition with wind-blown wheat waves (火性动感),
color palette:
orange sun (丙火),
emerald wheat (乙木),
light gray clouds (金属性弱化),
avoid deep blue or silver (忌水金)),
text overlay: "庚午匠心" in bold calligraphy (火属性印章)
这个怎么用啊
Do Flux and WAN work on RTX 5090?
