#๐๏ฝsd3
1 messages ยท Page 128 of 1
#๐๏ฝsd3 panorama view of a carnival, lit at night, beautiful night background with Ferris wheel, stall, roller coaster
Woah. Works amazing.
So that's an anime "drone shot", is it?
Why do the drone shots have drones in them though. Also these buildings are pretty wonky.
Probably because of the low resolution on my PC. ๐
I don't have that issue at all. Quite the contrary. I get contrasty punchy images
Sounds like an issue in your setup. Either nodes or Python.
I agree, SD 3.5 has more vibrant colours than flux in my opinion
Send her to rehab, now!
Kohya_ss does not support Lora training for SD3.5 yet, right?
SD 3.5L def is still nice, it's like flux's uneducated but creative cousin, bit dumber but really tries to make up for it ๐
i havent tried to much with 3.5 but the first SD3 you could pop out some super vibrant stuff
now with audio
What were your prompts here?
It may be only for certain types of prompts I guess
But it's for sure something I've noticed, again just versus the original SD3 on the same prompt with completely identical settings in Comfy
Show me a prompt
nice, ltxv with mmaudio?
I have some outputs saved at home that show this IIRC, I'll find them later
Any prompt. Even the seed number. I will run it and see if they match.
Easiest and best way to troubleshoot
ya
It's not a bug, it's definitely just a stylistic difference lol
I'm quite sure my Comfy setup is totally fine
The thing is, I use a very wide variety of prompts, art, design, photorealism, and so on, and have yet to see a single instance that remotely matches your description
basically -> an LLM interpreting some random shit XD
i have no idea what the actual prompts were
i just press generate at this point
Speaking of LLMs, I downloaded and tested one of the quantized versions of Llama 3.3 (30GB file) since my machine has a total of 32GB RAM, and was shocked at how good it was. Truly shocked. I fed it a favorite, asking to describe a scene as if for a role-playing game, and it was absurdly good.
Since talk is cheap, here is my query and its reply.
Write me a description for a role-playing game in which the player has just entered a small cave. In it are two goblins arguing over a small chest of treasure in the center.
As you step into the small cave, the cool darkness envelops you, a stark contrast to the warmth and brightness of
the outside world. The air inside is stale, heavy with the scent of damp earth and mold. Your eyes adjust slowly
to the dim light that filters through the entrance, illuminating the rough stone walls and the uneven floor. Ahead
of you, in the center of the cave, lies a small chest adorned with rusted locks and bands, its wood surface worn
smooth by time and handling.
Two goblins, their wiry frames tense with agitation, stand on either side of the chest, their argument filling the
space with high-pitched voices and animated gestures. They seem oblivious to your presence, too engrossed in their
dispute over the treasure that supposedly lies within the chest. One goblin, slightly taller than the other,
brandishes a crude dagger, its point glinting faintly in the dim light. The shorter goblin, undaunted by the
threat, sneers and spits back, baring yellowed teeth.
The cave around them is sparse, with jagged rocks jutting out from the walls and a scattering of pebbles and dust
covering the ground. A small, dying torch, its flame flickering wildly, casts eerie shadows on the walls, making
it seem as though the very darkness itself moves and twists in response to the goblins' quarrel.
Your entrance has gone unnoticed by the arguing pair, offering you a moment of surprise and the opportunity to
decide how to proceed.
This was quite literally my first query, not some result after multiple tries or the like.
q3? i tried q4
Q3_K_S
it was all I could run. lol
but in light of the output, there are no complaints either
ya im still in "testing" mode heh
Out of curiosity I reran the exact prompt, and while some things are the same (a few phrase bits), it made multiple and quality changes too.
I know these models have a very wide range of uses, even programming is supposed to be top notch, but I like to test the basics too. It must live up to the ANLP after all.
on ram?
i'm trying the exaone by lg, i think it's pretty damn good too, to be honest
Scene: You cautiously step into the narrow, dimly lit entrance of a hidden cave, the musty air thick with the scent of damp earth and ancient moss. The flickering light from your torch casts eerie shadows across the rough stone walls, revealing intricate, gnarled formations that hint at eons of silent passage. Ahead, the cave opens slightly, revealing a small, circular chamber where the flickering torchlight illuminates a scene of unexpected chaos.
The Encounter:
In the heart of this chamber, two goblins are locked in a heated argument, their small, green forms towering over a modest chest nestled in the center. One goblin, clad in mismatched armor with a crude sword clutched tightly, is red-faced and gesturing wildly, clearly agitated. Its counterpart, perhaps slightly more cunning, is dressed in ragged cloaks, clutching a crude dagger and trying to reason with its compatriot through frantic gestures and shrill shouts.
Treasure Chest: The chest in question, though unassuming, gleams faintly with gold and jewels peeking through its worn leather bindings. The treasure within could provide a significant boost to your adventurer's coffers, but navigating the situation requires finesse and perhaps a bit of cunning.
Choices Ahead:
Intervene: Approach the arguing goblins and attempt to mediate the dispute, perhaps offering a bribe or a clever negotiation to claim the chest without bloodshed.
Stealth Approach: Move silently around the chamber, observing the goblinsโ behavior for an opportune moment to snatch the chest when their attention is diverted.
Confrontation: Charge into the fray, using your combat skills to overpower both goblins and claim the treasure by force.
The path you choose will set the tone for your journey ahead, potentially unlocking new alliances, enemies, or hidden paths within this labyrinthine network of caves. What will you do? ```
what are you running it on?
what hardware ๐
ollama run llama3.3:70b-instruct-q3_K_S
Ohhhhh.... lol. New laptop I got for BF on a deal ($1070 after tax): i13900-HX (32 threads), 32GB, RTX 4070, Cherry KB
that looks like mostly run in ram then -> must've taken a while ot get that answer
it did, but wasn't back breaking either
Since unlike images where you wait until the end, here I got the text output in order, as it was rendered
word by word, line by line
marco-o1 is REALLY good for prompt enhancing, if you add a reasoning block (it's actually a reasoning model)
and time to say goodnight. sd3.5L one of the best models to play around with โค๏ธ
the coffee went bad
prompt:A beautiful woman, black hair, modern dress, sitting on a sofa
unfortunately, this is not an image generation channel
Ah, lol, which group is the right one?
i actually have no idea
OK, thanks all the same
there is no free image gen bots here, only local is free
Ah yes, the feared ArmEGGadon
prompt = (
"A beautiful Asian woman, wearing stylish and slightly sexy clothes, "
"sitting in a cozy cafe, holding a cup of coffee. "
"Photorealistic, highly detailed, natural lighting."
)
prompt = (
"A user who doesn't waste space asking for free images, knows how to read, wearing stylish and slightly sexy clothes, "
)
this is a realy great use of reasoning model
can GPT O1 make a nice prompt or would it be terrible
here's a good example of what I meant
here neither the original SD 3.0 or SD 3.5 Medium really came out particularly good in terms of overall composition and coherency (both would need at least probably upscaling and a second denoise pass), but the stylistic difference is quite obvious
first pic is the original SD 3.0, second one is SD 3.5 Medium, seed was the same
prompt was
a still from a movie, featuring an elderly man dressed in a traditional wizard's attire. He is sitting in a dimly lit, ornate library filled with rows of wooden bookshelves that stretch into the background. The man has a long, flowing white beard and is wearing a large, pointed, black hat that casts a shadow over his face. His robe is also black, made of a heavy, textured fabric that adds to the mystical appearance. He holds an open, large book in his lap, and his hands are resting gently on its pages, suggesting he is deeply engrossed in reading or studying. The room is illuminated by a warm, golden light, which enhances the rich, dark tones of the wood and the man's attire. Several candles in brass holders are placed on wooden tables around the room, their flames flickering softly. Behind the wizard, a large, intricate astronomical globe is mounted on the wall, adding a sense of ancient knowledge and mysticism to the scene. The overall atmosphere is one of quiet reverence and intellectual curiosity, typical of a scholar's sanctuary. The image captures a moment of quiet contemplation and magical study.
Again this definitely isn't a "bug", this is just what SD 3.5 thinks certain prompts should look like for whatever reaon, in contrast to the original SD 3.0
but I'm not sure what actual part of the prompt it is exactly making it think the whole room should appear foggy / hazy
my only guess is maybe 3.5 had more training on like classical oil paintings and similar traditional art than the original 3.0
and that's unintentionally bleeding into the image sometimes when certain keywords are hit
Just to compare, this is the same prompt on flux schnell (in 4 steps)
yeah that's what I'd expect-ish
main reason I know the prompt itself is the cause in any case is that I can also get insanely good, photographically perfect, vibrant images from SD 3.5 Medium in many cases
for not-long prompts, at that
this one was literally just beach, water, rocks, cliff, raw, photo, realistic for example
๐
It looks amazing
i dont'care about realism that much... xD
yeah, that sort of thing is the main reason I like SD3.5
Flux can't generate images that look like that unless you use Pro Ultra 1.1 in "Raw" mode (which seems to be I guess just their actual full model without any distillation, like even the original "Pro" seems to be distilled in comparison)
oh yeah I forgot, this one is with the RealVis guy's WIP finetune of 3.5 Medium, on the same seed and everything, a noticeable improvement overall in comparison to both
so something to keep an eye on IMO if you like 3.5 Medium
I love flux but I agree, textures even in the original sd 3.0 are much sharper and detailed
sd 3,5 with iphone photo lora (idk if the lora worked in this case)
it's doing the "Large when you img2img" thing around the edges lol
"spaghetti universe taking over reality"
Yeah i noticed xd, but its text2img
actually 3.0 Medium also had that problem
3.5 Medium is the only one that solves it / doesn't have it
It seems it doesnยดt support wide resolutions
high res / multires support is again another reason I prefer 3.5 Medium
1440x1440, no problem
well time to try
Large Turbo is pretty comparable to your Schnell gen, if I take it to square format on the same seed, also 4 steps
i wouldn't use the FP8
GGUF Q8 will be better quality if you can't run the FP16 for some reason
but even that is only like 5GB
FP16 T5 is the issue for most people really
just using the GGUF Q8 or FP8 T5 encoder can be enough to get you going in a lot of cases
while still running the FP16 version of the actual transformer model
YMMV. Here is what I got on first try:
from Flux?
Yes
๐ ๐ ๐
I usd no LoRA
well again I did say you "have to use Pro 1.1 Ultra in Raw mode" to get similar results to my 3.5 Gen
like if I actually try it on Ultra Raw (only gonna do one cause it costs like a million dollars per gen lol), I get this:
I did not use Flux Pro.
Any idea why comfyui doesnยดt recognize the .GGUFs? only the safetensors files are working
you need uh, the City96 GGUF loader nodes
they're in Manager if you don't have them
Aaaa thanks, i found it
right, yeah, I was saying their most expensive API only model is the only one that can reliably do photographic stuff without any hint of distillation
getting random cartoon gens from Flux Dev and such on the other hand is definitely a thing though yeah
yeah that's about the best Dev can do
it's got that noticeable CGI-like crispiness everywhere though
particularly on the moss and stuff
I wonder if dev has that problem only because it is a distilled model , and also if it could be "fixed" with finetunes
it does
that's my whole point lol, Dev looks like all distilled models do, overall
you throw a rank 16 lora trained for 100 steps at flux and it'll blast cartoons out fine
trained with 512 x 512 images in fp8 mode
yeah I know
all I've been trying to say is that Dev looks like what it is, a distilled model
Ultra Raw 1.1 isn't distilled seemingly and so can do the unfiltered photography stuff properly
best example of this is, what people jokingly refer to as "Flux Girl" should really just be called "Distillation Girl"
cause if you generate literally any image of a woman with SD 3.5 Large Turbo specifically, you get results that could easily be mistaken for a Flux Dev gen without Loras
the sameface is clearly a result of distillation in general, basically
you guys using comfyui? it doesn't make the knob fiddling very easy. on forge i dialed in a couple settings, prompted for a cartoon dog, and got a cartoon dog
flux isn't that hard to crack through the distillation
i get it. it's difficult to load an entirely new worksheet for every situation, but maybe it's just a problem with your node graph and not the model
I think you've missed my point
which was just "Dev is sometimes stylistically inconsistent for terse prompts, and won't always respect single keywords like photo"
like I've released numerous Flux Dev loras myself lol, I'm well versed in it
i must've missed that point when you said "Yup that's about all you can get outta it"
I was saying the actual visual appearance of his photographic beach Dev gen was the most photographic Dev can look without Loras that brute force realism back in, basically
so still I think misinterpreting what I said
the picture was labelled "a random cartoon" and posted in context of flux making cartoons.. so i guess there was hidden context i wasn't aware of
he was being sarcastic as far as I can tell
possibly also not quite getting what i meant
again my entire point boils down to "Dev looks like exactly what it is" lol, nothing else
Correct. He was saying the images cannot attain photorealism and random cartoons are the norm
i just wanted to clear the misconception up that flux is incapable of cutting through distillation aesthetic
no lol
see here
#๐๏ฝsd3 message
dude is a troll. i'll ignore for future reference. seems he's an AP that wants to derail discussions
?????
??
not you. you know who i mean
Anyhow, here is your reply Zoot:
ZootAllures91 โ Today at 3:17 PM
right, yeah, I was saying their most expensive API only model is the only one that can reliably do photographic stuff without any hint of distillation
getting random cartoon gens from Flux Dev and such on the other hand is definitely a thing though yeah
yeah
the last part of that what I meant was
it can be more random
than Ultra Raw ever will be
it doesn't know what "photo" means as a single keyword as reliably
basically
a separate point from the appearance of the images themselves
YMMV, but I had zero issues with the beach image it produced.
As to SD3.5 Medium (I was using SD Large ass I did not have it installed), here is the output on your prompt:
yeah it's not "bad", it's just, distilled
I like Flux for a lot of things but "hard realism" just isn't a strong suit
i even made a Lora to try and address it for people myself lol
https://civitai.com/models/790722/zoots-human-photo-realmaxxer-for-flux
yeah this one is pretty good, or better at least
could just be a seed thing
there's still definitely some varying keyword behaviour between 3.0 and 3.5 though
I would hope so since the original had so many weird issues it was bound to have an effect on non human anatomy images
But I only use 3.5L as a rule anyhow
Also here's a good example of Flux Dev's sometimes odd prompt behavior I just did real quick on CivitAI
1girl is MASSIVELY weighted towards Anime, in Flux Dev
so much so that it almost always overpowers everything else
its a booru tag
it's a general tag meaning "one female person / character / etc" basically
often used in the SD 1.5 days for tagging stuff
the whole point of T5 is NLP, not the ancient single word prompting of the past
Complaining the AI cannot understand some weird name such as 1girl is not really much of a complaint to my eyes
I'm just saying any appearance of 1girl with Flux Dev even within a complete sentence that should arguably produce a photo, as I did, almost always results in an anime gen no matter what
as far as I can also tell Flux actually IS trained on Booru tags as well as natural language
like when you actually want anime from it, prompting only with tags generally works very well
the Pony crowd like to prompt that way
A boohoo tag?
its a thing from Japan
it means image board in Japanese
and they post tagged anime images there
like if Flux Dev wasn't trained on Booru tags pretty aggressively this wouln't be a thing, it wouldn't go "anime time" in this context, I don't think
ok, a fair point but if 1girl is used almost exclusively in an anime setting and you way you want a photo of an anime girl.....
As to the Pony crowd..... cry me a river
I train all my Loras concatenating Florence 2 Large "more detailed" mode natural language captions with Booru tags from wd-eva02-large-tagger-v3 right after them, in the same file
best approach I've found
cause each captioner tends to get stuff the other doesn't
so the combination is better than eiteher alone
Well, as far as I'm concerned it does a nice job in photorealism, though to be fair I quite literally never do any photorealism with them
Here is a higher res image with that beach prompt
I always find it so funny when people argue the realism of any model
It's so pointless to argue anything if you can just generate an image with one model then do a second pass with any other model of any style
Flux Redux helps with this magnificently combined with loras
What are you talking about
if you do a workflow that first pass fluv dev and second pass SDXL
your image exits via the SDXL VAE
Then you can just re-encode it. Again.
This changes nothing lmao
If vae was an issue, you could literally just make a vae encodedecode loop
and loop hundreds of ksamplers
you can't undo the quality loss from the SDXL vae though
Quality loss how? How much quality is actually lost per-decode when you are quite literally re-inferecing the image every time in the ksampler
The main issue at hand with diffusers-based transformers is temporal consistency
The changing and/or loss of data based on denoise value and steps
I don't think vae decode is relevant in terms of loss quality
if you try img-to-img with SDXL, lots of k-samplers at 0.5 denoise
the image quite quickly gets destroyed
it isn't able to fix the damage
Well yeah, because you're denoising half of the image for the model fo inference upon.
Have you tried preventing it via other methods?
Canny, Depth controlnets
IPadapters
was still the same issue
I use a very low denoise combined with flux redux to maintain quality
original
brocolli in the clouds
its raining legumes
if you try 50 ksamplers in a row you will see what I mean
the image is so damaged its not gonna be saved
anime image boards have been ready for datasets with captions for years. those people have had taxonomical tagging down to a science for the entire 21st century. nobody tags image collections like anime enjoyers
That's what I use in combination with lower denoise
WD-14 tagger can give decently accurate image tagging results in danbooru tags for any image
But ik what you guys are talkin bout
and i aint going down that rabbit hole
danbooru is one i avoid. so much extreme smut there
I do not judge people for their interests, just actions.
there's many sites that have used image tagging and it's not specifically a smut thing
Rocket Fish.
and it was done long before any ai classifiers existed. WD tagger was trained on those datasets
I'm literally just letting flux make its own images.
Using an LLM.
I gave an LLM a sysprompt, and a constantly changing user prompt with a seed value.
you can throw WD tagger into the captioning mix if you want yeah
what Sana does is use four captioning models
and then it selects the caption with the highest clip score for the image
and uses that caption
Sadly for me I just can't use SANA properly rn.
I can inference it but it's vae outputs on intel arc aren't good
ah I don't know intel arc too well
A lot better supported than it used to be
I'm glad I bought the A770 at the time
16gb of vram and 4060-level speeds
good enough for me
for gamers it truly is a golden age of hyper graphics
The universe has decided to play an elaborate game of hide and seek with the moon, and as it's now in a never-ending cycle of falling off its axis due to an infinite loop, time itself has begun to unravel, creating ripples through the fabric of existence. The sky is painted in hues of violet and silver, with occasional patches of neon green swirling about like psychedelic storms. Meanwhile, giant mutant lobsters have taken over New York City, transforming it into a sprawling underwater metropolis teeming with bioluminescent flora
lol one of the prompts
3 manufacturers all putting out budget level cards that can smash FHD resolutions on any game on the market. its a good time to be a gamer
I should be using gemini flash 2.0 from openrouter, they got it as a free model
probably get better prompts
phi-4 is out now too
Is it? I wonder how good it is.
phi 4 14b
ok
it competes with 70b models
ok
I'd been a good frontend that supports it then
IPEX OLLAMA defo wont
hmm
i use koboldcpp for llms but i dont know if it has phi4 support or not
ah thanks I didnt know about IPEX
IPEX is how I'm running ComfyUI and LLMs locally
And now with Xess2 and XeLL
Intel's really starting to pick up the pace on competition
I've started doing JAX stuff
TPUs scale a bit better for total TFLOPs/$ than Nvidia at the moment
i trust that intel is going to create some ML instruction sets directly on their future CPUS. i was going to boost to a rocket lake or whatever. but then those power issues showed up. going to wait for the next big generation of their CPUs with some serious improvements. my alderlake works good nuff for now
would ipex use my i7's integrated gpu very well you figure?
all the cloud servers I use have been AMD
I think you want openVINO for CPU
https://github.com/rupeshs/fastsdcputhis is a good openVINO SD implementation
the integrated gpu is a gpu. separate from the cpu cores
๐ nice gen!
Thank you.
same guys who worked on the dc-ae used for sana
SVDQuant is a super-performant quantization
i just want to load an llm on something other than my nvidia, so that i can use it in tandem with image gen
i got a 4080 for images
Oh
Then use llama cpp
lmao
or
ollama cp
u
and they would both work
yeah so no benefit to using integrated gpu then. just go with cpu
Not really no. You'll get more FLOPs out of the cpu alone
Otherwise using your GPU would be more sensical
Also I do this already
You can use ollama with OLLAMA_KEEP_ALIVE="0"
in order to load a model, generate and unload after
phi-4 is nuts
good to know xD
svdquant is what I currently use yeah
I can't use it myself
its on the tensorRT model optimiser roadmap
so it will improve soon
I don't have a GPU so I just use cloud
In the mystical city of Quagmire, where skyscrapers were made entirely of licorice and pigeons sported top hats adorned with tiny monocles, the citizens lived in an uproar of chaos and calm. One day, a peculiar event unfolded: Mr. Puddingpot, the local wizard with a penchant for mismatched socks and rainbow-colored waffles, decided to invent time-traveling croquet mallets. The invention was powered by the synchronized chirping of rare golden geeseโeach bird had their own distinct jingle tune that resonated only when they saw a full moon through a kaleidoscope lens. Meanwhile, in the corner of Puddingpot's laboratory, Lady Zinniaโa dragonfly with an extraordinary collection of porcelain teacups and a slight allergy to rhinestonesโwas knitting scarves out of spider silk laced with fairy dust for her upcoming tea party with the whispering willow trees. As fate would have it, Sir Whifflebottom, a knight whose armor was perpetually sticky from jellybean juice spills, had just stumbled upon a magical map that only appeared in rainbows on Thursdays when painted by left-handed penguins. This extraordinary day promised to bring even more strange and
Phi-4 is good at prompting.
thats good yeah
I asked for a "random things" prompt.
I actually use wd-14 with underscores removed
lmao
but since im currently using flux
i actually don't need to make an image2image prompt
redux with clip vision does that for me
I haven't tried redux yet it seems good
I can test an image for you if wanted
its ok I've seen a lot of examples
๐
if you take any image, and do a straight encode-then-decode in Comfy
with nothing in between
it's pretty noticeable
for fine details
left is original, right is passed through SDXL VAE
look at her eyes at full resolution, in particular
(this is an actual photograph of Madison Beer to be clear, not an AI generated image to begin with)
so you can see how eyes that aren't necessarily even as large in the overall image would eventually become kinda weird
when training a whole model with the XL VAE
as is the case in base SDXL
There are things like latent interpolation nodes to mitigate this issue isn't there?
nope nvm
even that has differences between each gen
most people its a small issue cos they don't have many k-samplers
but I like weird long complex workflows
and I get limited by it
spamming dozens of k-samplers is what I'd like to do, will be possible one day
lol yeah stuff breaks all the time
yeah one pass isn't a huge deal
but more than one quickly degrades on XL
SD 3.5 and Flux you have a lot more leeway
cause the degradation per pass is far less
yes with flux, using a lot of colour match nodes in between, I was able to use like 20 VAE cycles once
Do anyone knows if last sd forge update broke something? Im getting 31.33 seconds per iteration in flux and I was getting 4-6s/i
forge doesn't get many updates too often. the issue might be elsewhere.
I went to update my nvidia drivers last week, and it installed this new thing to replace geforce experience. "nviida app" .. it's a complete hog and i hate it. i've had so many issues since. Overlay is turned off but it's still attaching some container process to everything. It's crap and is not ready for them to deploy. But i had to get it to get the new driver so whatever. fuck.
Damn I have not idea what it could be then, and on google only a few people had the same problem, iยดll reinstall the gpu driver and if it doesnt work iยดll have to reinstall cuda again (I hope not windows ๐ )
I think Iยดve downloaded the wrong fp8 version, im downloading the 17gb one this time
Its fixed!
why bro brought midjourney bot here
hyper realistic cinematic photo, candid, black and white, professional photo, fashion, a woman with dark hair standing leaning against a wall blowing a light transparent smoke, she is wearing a white t-shirt with a long black coat, long exposure, in a dark setting, low light, dynamic shadows, volumetric light, 8k, detailed, masterpiece
i love it
the only human hands involved in this image is an idiot creating a workflow so he doesn't have to actually prompt ...
my workflow also does well with actual prompts...
The
top
Smoking is bad

What in the vaal is this
Yeah
So how is SD 3.5 looking? And u guys use large turbo, large or medium? Iโm considering using it
Sd3.5 large is pretty great, a good alternative to flux dev.
Turbo is good too but requires 4+ steps and I think shuttle3(flux schnell finetune) surpasses it since it can do same if not better quality at 2steps and can do 2k images.
Medium is probably the sota base model for its size but I think the low-step models are better choices since you can just use quantization with them for speed-ups and lower vram usage. Shuttle3 is probably faster then medium.
These are large powerful models. Qwen is 72B and might take the throne of best image describer (for local installs)
Their Qwen 2.5 LLM is comparable to Llama 3.3 in quality
Heated race between Meta and Alibaba on that front
So any tips whenever using 3.5 large
Exllamav2 already supports both and is almost 2x faster then llama.cpp/ollama while using less vram.
Is promoting with 3.5 large sim to flux or the same?
Yep prompting is similar
It has its own quant format, not gguf but yes, it can also do decimals like 4.5bpw or 2.4bpw.
Qwen models(2.3bpw-6.0bpw): https://huggingface.co/turboderp/Qwen2-VL-72B-Instruct-exl2
The interface will download and install the models?
is comfy UI completely separate from 3.5 medium, large, and large turbo? I just installed it and it never gave me a chance to choose
here is a tutorial for flux and comfy https://comfyanonymous.github.io/ComfyUI_examples/flux/
thx
allso the images on that page are workflows. so you can downdle them and drag and drop them into comfyui to get the workflwos that are deskibed
and you shuld probably use the fp8 checkpoint it uses less vram
Thx. I think I'm finally getting it now. Glad that link actually says where to put everything
yes its very helpful i also used it xD
https://huggingface.co/crystalwizard/Surreal-abstractions sd 3.5 large lora
Amara, an anthropomorphic owl with a serene expression, prepares matcha in a traditional Japanese tea house. She wears a simple kimono with floral patterns and carefully whisks the bright green tea in a ceramic bowl. The room is minimalistic, with tatami mats, sliding shoji doors, and a small zen garden visible outside. Delicate steam rises from the bowl, and the atmosphere is tranquil, with soft sunlight filtering through bamboo trees. The scene captures a moment of calm precision and elegance.
We propose a Linear-complexity text-to-Video Generation (LinGen) framework that scales linearly in the number of pixels. For the first time, LinGen enables high-resolution minute-length video generation on a single GPU without compromising quality.
Linear complexity ๐คฏ
I not will be suprised if one day we reach log(n) complexity
@remote holly that's what one of the path of exile dev's figured how to do with screen space reflections
doing ray-casting (path tracing) with log(n) complexity
so for each sample point the next point will be cheaper. he held a talk about it. when i was watching my mind was like "poooof"
raytracing becoming CHEAPER for each increase in resoluation instead of more expensive.
yeah i see they used a manba for the attention , that's a good optimization
non-quadratic attention is what Sana does also
I didn't know this fact , they removed the DiT ?
no they made a research breakthrough
and managed to use linear attention with DiT
Ha , it keeps a good quality ?
breakthrough?
I'm sceptical. So far there are plenty of papers about howto get rid of quadratic attention and none of them works as good as quadratic attention.
All images from Sana I have seen so far look relatively bad
yeah I guess breakthrough was too strong of a term when its possible that it is worse LOL
the research on VAE speedups might be more valuable
I struggle with VAE encode/decode times when doing something like 4k SDXL with 4 steps
feels weird when VAE encode/decode time is longer than the Unet time
for non-linear attention the Mamba stuff might be more promising, the Dimba, Dim or ZigMa models
I don't like this either xD
its like Stable Cascade. Sounds good in the beginning, but comes with a lot of flaws
if you train your own custom finetune you usually don't have to finetune the vae. I bet, however, you would have to do that for Sana
their vae has a much higher compression factor that probably won't work well with custom finetunes
(same happened for StableCascade with their stage b model)
I feel like there is a middle ground between the SD 1.5, SDXL and Flux VAEs and their TAESD versions
even going a bit faster for a bit of quality loss would be welcome sometimes
the TAESD versions are just too bad I can't get them to work a lot of the time
I wasn't aware that you had to fine tune the VAE for Cascade fine tunes, that does sound pretty bad
I skipped stable Cascade cos I didn't like the aesthetics
if we could get just one photography/realism checkpoint for Sana that would be good maybe
my bigger issue with Sana is that it is not that much faster than Schnell
particularly the Schnell finetune Shuttle Diffusion
which happily does 2560x2560 in 2 steps
Yeah I think vaes can definitely be improved. This is reducio vae, sota quality and compresses 4096x times:https://huggingface.co/microsoft/Reducio-VAE
For reference, mochi-1 compresses 384x and Hunyuan does 512x. Reducio vae is similar if not better quality and does 4096x. Limited to 1sec sadly.
we're gonna need it for images if diffusion is going to be 1 step
Yeah it still does 1024x for images, in real 128x I believe. Which is still massive, stable cascade/sana are only 32x and that was the highest.
you can also compress images arbitrarily strong, but is it worth it? At some point if you compress your image too strong you end up with a GAN like model which lacks any creativity and cannot be finetuned anymore
For fast real-time generations, probably a good idea. Vae needs to be probably lightweight too though.
I don't know. You need a strong compression cause attention is quadratic. Maybe we have to overcome the transformer architecture instead
in the LLM world I think they got further with some of the non-quadratic attention stuff
they got at least a 7B or something
I'm in the "focus on scaling" camp mostly, I think in the short term we just need to keep on scaling transformers
can distill from the larger ones
There are actually hybrid models large as this: https://huggingface.co/ai21labs/AI21-Jamba-1.5-Large
not really. Yes, there is mamba and xlstm and stuff but it just does not work as good as attention
they have nice benchmarks showing they are as good as attention. But just in their benchmark. As soon as you leave their narrow benchmarks attention outperforms this stuff clearly
what works, however, are hybrid architectures where you mix global and local approaches. They show better performance while dropping only very slightly in quality. However, this approach don't solve the scaling issue, it just makes it more efficient but quadratic runtime is still quadratic runtime
hybrid architectures don't get rid of the quadratic scaling yeah
its really bad scaling if we want sequence lengths to get several orders of magnitude higher
Last 7 days <Dec 10 2024> โ <Dec 16 2024>
- Member counts
- 346039 โ 346018 โ 346017 โ 346003 โ 346018 โ 346000 โ 345993
- Action members
- 0 โ 0 โ 0 โ 0 โ 0 โ 0 โ 70
- Message members
- 0 โ 0 โ 0 โ 0 โ 0 โ 0 โ 58
- Reaction members
- 0 โ 0 โ 0 โ 0 โ 0 โ 0 โ 25
More details
Author: @pine loom
MotionPrompt: say hello
Command: </animate:1181475143340539995>
Background: from video
Model: V3-beta
Explore more features at viggle ai
Beautiful picture.
hunyuan video gguf q8 with rtx 3060
i can't wait hunyuan svd quant
is it faster than fp8?
i never tried fp8 but i have but i generated this in 400s
20 steps
is like generating 2 images with flux
2s video in 408s
i don't remember but I feel like it wasn't slower for me
also, flux takes only ~100s for me on 3060
q8 or fp8 ?
i think i could do with more frames , maybe 3s
i have not enough memory for decode
i love the consistency
fp16 for flux and fp8 for hunyan
WHat was the prompt? Was this by LLM? If so which one and what was asked?
Imagen 3 by DeepMind is updated
It's an LLM enhanced prompt. However, there's a twist to it. The prompt i did: non-euclidean dreams
the output:
"T5": "A surreal dreamscape where gravity and space warp in defiance of the ordinary. Jagged, floating islands with archways that lead nowhere and staircases spiraling infinitely upwards or downwards dominate the scene. Colors are vivid and unnatural, bleeding into one another, creating a mesmerizing yet disorienting effect. Figures float effortlessly, their forms merging and separating as if woven from light and shadow, embodying the fluidity of non-Euclidean space.",
"CLIPG": "non-Euclidean, surreal, floating islands, infinite stairs, vivid colors, ethereal figures",
"CLIPL": "Surreal floating landscapes with warped geometry and ethereal figures in vivid, merging colors.",
"ARTSTYLE": "Surrealism, digital art"}```
i noticed SD3.5L and M really like it when the clipg and clipl aren't as long. so i have a LLM output various fields ->
reasoning: to get its bearings. It also helps getting rid of stuff that should not go into a prompt
T5: obvious
clipg: shortest prompt, just keywords
clipl: short stentences, basically a summary
artstyle: i add these in front of clipg and clipl, so there is an actual art style used
the LLM model i use is ExaOne, i'm really impressed with it
what size? I was not terribly impressed with its text, but admit it had some curious cultural habits with its creative output
why do people always use different text for clipl and clipg ;_;
i use a vlm to make a specific kind of prompts to create found footages videos with ltxv , do you think reasonning can enhance the generated prompts ?
there was a bug in the first release, the repeat penalty made it really bad
if you still have that parameter in there, it's TERRIBLE for real
is possible to ask a reasonning for describe an urbex video ?
i'm not sure if vlm models are really suited to reasoning
and exaone isn't a reasoning model per se -> it's smart enough to do it when instructed, which helps
i mean use vlm output as an input for reasonning model
@lavish sparrow that's really cool!
as long as it's text, of course you can ๐
yeah , i will try with marco-o1 in ollama
marco o1 is surprisingly capable
if you want to use it in comfyui, you have to so some extra stuff
i will use for sure, can you split reasoning part and take only the final ?
yes, that's the thing i've done, actually!
json format ?
set the format to json, and add in the system prompt what should go in what fields
probably wouldn't recommend that, just using something like qwen2vl/minicpmv2.6/pixtral should work well for describing images in a specific format.
No need for reasoning honestly like marco o1, llama 3 8b might be even better if used properly.
All format issues can be super easily solved by few-shot, just give it examples and you will get identical formats.
so i let it reason in the reason tag, but i can discard it afterwards, but the intelligence of those models comes from having reasoning available to them in the first place.
I kinda dislike marco o1, its just a cot like finetune which even llama 3.2 1b/3b can do zero shot. The only open source model that actually seems to improve performance with reasoning is QwQ 32b(qwen) imo but thats more for coding.
is use minicpm-v2.6 and qwen 2 vl with this system prompt : You are an AI assistant expert in writing very precise and quality urbex video descriptions from images of places and spaces. When the user gives you an image representing a place or a space you write a description of a video following this template:
A urbex video filmed in the first person in [insert a short description of the place], the operator moves forward, [insert your description with details], the lighting is [insert a description on the lighting of the place], hyper realistic, fluid, masterpiece, amateur video, tilted camera
Instructions for generating video descriptions:
-
Do not contradict the image of the first frame.
-
Start with a concise description of the scene in a single paragraph, focusing on the key actions in order.
-
Include detailed character and object movements, focusing on specific, observable actions.
-
Provide essential background details to set the context, highlighting elements that enhance the mood without overloading the description.
-
Mention camera angles and movements that define the visual style of the scene, keeping it concise.
-
Specify lighting and colors to establish tone, ensuring they complement the action and setting.
-
Make sure the description reflects the type of source, such as live action or animation, clearly and naturally.
it works very great but in think i can do better results
Any flaws with the normal output? The best way to solve it usually is by giving it a real example that you think is very good, try that maybe?
It does well on math logic puzzles too, but its excessive neurotic behavior is a bit annoying
they're not the same as what O1 did cos it wasn't cot
it was reinforcement learning apparently
Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.
If you have any questions, feel free to ask us!
Your dashboard
Help
Support server
Other languages
en: help
ja: help Japanese
help Chinese
Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.
If you have any questions, feel free to ask us!
Your dashboard
Help
Support server
Other languages
en: help
ja: help Japanese
very nice, i saw that the distelled versions of hunyan lost was heabily degraded but this seems reeally nice, how long does it take for a vid to gen, at what fps and for how many seconds can you get it, also which ui are you using? i do allot of image stuff but never video so curious
i also have 3060 so would be good info ๐
@uneven storm i have an rtx3060 12gb. You can generate 2s(50 frames) of video with classic resolution and 20 steps , it takes me ~10mn to generate one video.
I am using hunyuan-q8 with the new native nodes in comfyui
can you send the comfy workflow pls?
https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/
https://huggingface.co/city96/HunyuanVideo-gguf/tree/main
Update your comfyui , and your comfyu-gguf node pack
tyvm
Do you mean ComfyUI implemented a GGUF loader for Hunyuan? And what about FastHunyuan? Did you try it?
yeah the comfyui-gguf nodes have a support for hunyaun video , adn fastHunyuan works only with h100 gpu i think
But why Fast so? ๐
Do you have a GGUF working workflow to give me?
i dont know, researchers test always they work on h100 gpu
And about a working workflow for GGUF Hunyuan?
just use comfyanonymous huyuan workflow and replace model loader to unet gguf loader
this one
Ok ๐
In a surreal, neon-lit dreamscape, a young woman stands surrounded by thick, swirling clouds of electric blue mist. Her sharp silhouette contrasts with the glowing haze, creating a sense of otherworldly wonder. She wears an oversized black coat adorned with glowing neon stars, which pulse with an ethereal, blue-white light, casting faint, shimmering reflections on her surroundings. The stars appear to hover slightly above the coatโs fabric, as if suspended by magic, radiating a celestial glow. Her face is illuminated softly by the ambient light, and her expression is calm yet enigmatic, as she gazes directly into the distance. The atmosphere is immersive and futuristic, as the dynamic fog continues to drift and curl through the scene, giving the impression of a cosmic, dreamlike environment. The contrast of the deep shadows and vibrant neon hues evokes a moody, cyberpunk-inspired aesthetic.
anyone tried this? Looks cool
https://github.com/lehduong/OneDiffusion?tab=readme-ov-file
its good yeah
lots of conditioning that is like control net or pullid
and they included seg control net which is one of the better but rarer ones
็ๆๅธฆ้ชๅฑฑใๆ ไพฃใ็ปๅฉ่ฏ็ๅ ็ด ็ๆผซ็ป้ฃๅพ็
Flux ?
Hello
Yes. Demoncore
i have the prompt for image version , i will send you
i dont find the prompt sorry
but is used these keywords :
a pretty woman with dark black hair , standing with a stylish pose in front of a red background with snow on the ground and smoke ,
she wears a large blue wool coat with long blue wool pants and a large blue hat ,
reflections , detailed shadows , bright and dynamic light ,
8k, detailed, masterpiece```
hyper realistic cinematic photo, artistic photo, artistic fashion style, long exposure, bokeh effect, cinematic view, fashion, close angle, bright colors, professional photo, a pretty woman with dark black bob hair, floating in the air in a photo room with a red background, she is wearing a large orange wool coat with long orange wool pants and a large orange hat, reflections, detailed shadows, bright and dynamic light, 8k, detailed, masterpiece
with sd3.5l
??? XD
3.5L colours are so nice
you can send your Flux, Cascade, SD 1.5 or SDXL latents directly to SD 3.5 with this https://github.com/city96/SD-Latent-Interposer
particularly for SD 1.5 it lets you avoid using the small VAE
Midjourney is having a special for this month, with UNLIMITED IMAGES using their smallest subscription. They said the reason is they have too many servers that are running idle, so made this announcement. They literally state that (IMHO it means the competition from free purveyors like Flux, and the sheer strength of said models, is the cause)
They call it the Relaxathon, meaning set it from Fast image generation to Relaxed, but due to huge server availability there is little to no wait time using this theoretically slower mode
Again, this is in their very own announcement
its quite slow in relaxed mode as far as I know
So you are trying it in this Relaxathon and new conditions? Ok, good to know
no I don't use closed source personally
its just what I have heard
But you heard this now, when they declared that their servers are currently running idle
Still good to know
This is the announcement for those interested:
Hey. we got a ton of servers for our models next year, and they're currently sad and idle. So for the remainder of the year we're doing a holiday RELAX-ATHON. Relax mode is now available to all members (including basic) with near-zero wait time.
More specifically this applies to
Standard jobs, variations, sref, permutation jobs
It does not apply to:
Upscales, weird jobs, editor, outpainting, retexture
We've also increased the queue sizes and maximum simultaneous jobs
You can now do up to 12 simultaneous relax mode jobs
You can now have 24 jobs in your waiting queue
(these numbers may change based on server availability)
This is a pretty huge amount of free computing power and we hope you can use it over the holidays to have a lot of fun and take some crazy aesthetic journeys.
Happy holidays! โค๏ธ
Sounds also llike MJ7 is on the way
I have not tested even the new release of Imagen 3 frankly, which is free and came out yesterday.
I still may bite for fun.
Can you use it online?
You can only use it online. It is DeepMind/Google's image service. But free.
They added different ratios, so now it is not locked into 1:1 images
its just marketing to get people to subscribe really
SUre, but unlimited images for the cheapest plan is a good deal
I meant more like is it on a website or do I have to download an app

oh... no, just website
They still don't let you generating with people
I can't generate Santa using a jetpack, 0/10 
Yeah? I never tried. Let me see what happens
Yeah they tried to "diversify" things on the older models and got backlash for it
Gemini is their text bot, it can generate stuff
But because of that it might be getting blocked like the old models
Like Copilot and Dalle3.
It's whatever, closed source AI will never be as good
Not available in my country, rip
That's not a very grounded opinion IMHO. Preference won't make it true.
Oh, and nor in my country. I use a VPN
I mean, all models, open source or closed source are made by companies big and small. There is no magic sauce in creating a model that makes the open source inherently better. You can create tools and LoRAs and so on, but the base model? The sheer cmpute alone is outside the geenral community's reach for now.
This is in contrast to open-source software projects directly developed by the community
Wait it's available in Brazil
Imagen 3?
Huh. It wasn't a couple of months ago
I got messages about availability soon blablabla
I went to the faq and saw Brazil there and was like wtf
Went back and it worked 
The UI in that site is pure suffering on mobile

@lavish sparrow
Solid artwork though
Hunyuan can gen 20sec vids now with ctx window and not using extra vram. Will use extra time obviously though.
oh I remember this yeah
was in the early Gemini days
You mean the latent image ?
Amazing
yeah the latent image
pink spaghetti noodle in ComfyUI
Yooo thatยดs some good quality
I just tested it... its looking good
Too bad its closed source but at least its free (no, im not gonna subscribe or upgrade to it XD)
looks nice
๐คฃ
Flux Schnell 1step, 1024x512
Using different latents change the output image ?
Nice
yeah the VAE makes a different image
you use the VAE of the final model
thats realy cool , i will try theses nodes
new ltx video version 0.9.1 !!! https://github.com/Lightricks/ComfyUI-LTXVideo
new vae , built in stg and image compression !
yeah
i love they listen the community and add the image compression and built in stg
i am looking forward controlnet and lora support
ltxv with controlnets have the potential to be the new animatediff with DiT
anatomy is not good but i love colors
is it possible to correct the hands with inpainting ?
Worked for sdxl, never tested on sd3.5 but probably
New ltxv looks amazing so far, left is old one, right is new one
realy amazing !
the movement and camera are better
no crazy deformations with stg
Yeah it got a nice 5 finger hand, and itโs actually faster too. Previous ltxv already took like 15sec on a 4090 with stg. This one is 15% faster I believe
what is this new option :
Additional initial latent optional input to chain latents for high res generation.
15% ? a realy good improvement
4s/it for 5s and 30 steps !
Even the apple is faking it
New Ltxv 0.9.1
blood microchips
remember to water your plants
oh wow this is really good
time for the little monster to sleep
and morning coffee, important too
how can you do this?
i tried.
abstract rabbits xD
?
"i'm tired, boss"
"T5": "Abstract swirling flames dance across a canvas, interwoven with ethereal, flowing lines representing voices and emotions. Shades of deep reds and oranges contrast with cool blues and purples, symbolizing bittersweet desire and peace. Interlaced figures, barely discernible, embody connection and eternal devotion, their forms merging into a harmonious chaos that speaks of burning passion and eternal union.",
"CLIPG": "flames, ethereal lines, red-orange, blue-purple, intertwined figures, voices, eternity",
"CLIPL": "Abstract flames and flowing lines symbolize burning passion and eternal love, merging figures in a vibrant, bittersweet palette.",
"ARTSTYLE": "Abstract Expressionism"}``` the fact that sd3.5 L even gets this close, is 100% amazing...
"The fact that..." [opinion]
the fact that i'm getting images i'm getting entertained by -> i 100% do not care other people might not like it. i like it. that's all that matters to me. why would anyone's opinion matter in the first place? but i know there's people in this channel that might like it too.
that's sd 3.5 large. It's a test for a lora i'm creating.
too cute award!
and of course no comfy wrapper 
I saw a comment once saying
instead of waiting for Comfy wrapper, just go to the tool where its at
and I've started doing that lol
Sadly, I have a skill issue
Yea, I should learn
but later 
What plans do Stability have? Will sd3.5 be the last img model?
its ok everyone has skill issue
not sure what's gonna happen with Stability
feels like video is the focus of everywhere for a while
I wish sd3.5 would hit 2mpx natively : (
Yeah when will stability cook a sota video model, same with Black Forest labs and theyโve been promising it for a very very long time.
they took a lot of VC money so the funding is there
Even flux usually gets artifacts around that res, shuttle 3 is probably the best at native 2k gen, not 100% sure. You can always just do a 2nd pass.
2nd passes still gives artifacts for me, but probably skill issues. Unless you talk about upscaling which I dont like
demoncore is good
they trained it on flux de-distilled
it can go to 2560x2560 even
I agree shuttle is probably the best model out there currently, for quality per FLOP
do they run on a 3070 ?
will try
SVDquant would be the best thing for you
its currently tricky to use
they are gonna make it easier apparently
I recognise the Shuttle look lol
The fact shuttle is written on his arm is a giveaway
lol didnt even see
Shuttle beats dev quite regularly
it can't do every style/subject but when it works its great
what are its weaknesses?
it goes painterly sometimes
I always want photographic
also it is not a soft model by any means
which is both a pro and a con
very nice green
SD3.5
So I have SD forge....so how to can use sd3?
Can we train loras on SD3/SD3.5?
Also, is it worth it go from SD2 to 3/3.5 for anime?
Yes 3.5 should be much better then sd2, but maybe not anime specific sdxl finetunes like animagine.
SD 3.5 large. prompt: anime girl yelling at an anime boy - no loras, no finetuned check points, just the base model. SD 3.5 L and SD 3.5 M are fantastic models that people refuse to use. Large doesn't NEED loras - but people refuse to even try to use it without them
I want loras for specific characters
I doubt the base model can run all of the characters I want on good quality
you might be surprised - but that's a very specific thing. in general, however, for some odd reason the mindset in this community seems to be 'can't generate without lora' - like a lora is a magic wand that'll give them wings, or an invulnerability potion, or something. and if they don't have a lora, they won't even try prompting the base model.
using only sd3.5 medium as second stage for pixart sigma. Sadly sd3.5 large is too big for my pc
and sd3.5m does shitty composition on it's own
the new nvidia gpus should be coming out after the first of the year, and that should drop the price on everything else. maybe you can upgrade soon?
have you tried the GGUFs that were put out, out of curiosity?
yes i did, but is too slow, also it does not have the quality of sd3.5m. The images above were made with pixart sigma 15 steps + sd3.5m 35-50 steps in a step-swap workflow.
will upgrade soon, but we will see if nvidia will give us more vram this time
you can always buy additional vram
no i can't, a gpu comes with a fixed vram
well, we'll wait and see what they release
SD 3.5 large
Whitfield Diffie (SD3.5 Large Turbo).. turned him into father time... a bit ๐ฅฒ๐ hf space

move
Rathorse. A sing of our times.
sd3.5 large turbo + medium
anyone finetuning sd3.5 medium ?
also have people started using more sd3.5 now, or most of them are still on flux
check Civit numbers
I don't wanna check myself cos Civit's site lags everything up
jsut saying that flux loras had exploded when it came out, and it seems not that much dor sd3.5, there is almost nothing
oh I thought you were asking a question rather than making a rhetorical point
I like 3.5 better
I have no idea why, sd3 loras are far far easier to create, especially when doing uhm, civitai themes
It's true though, barely any 3.5 loras in comparison
i do like it better for the creativity, but flux has better image coherency.
it's hard to switch when you are already happy with flux ?
There's some stuff that sd3 can do that flux cannot
seeing the ton of loras for flux, i was tempted to switch again to it lol
like ?
Anything nsfw, but also melting clocks was extremely difficult to create loras for with flux!
flux has something special though, like symmetric details. idk how they achieved that
Fortunately they are both free, and run locally ๐
probably just the size
most 20B DiTs have good structure
yeah, im thinking to try flux dev stage 1 + sd3.5 medium stage 2, to remove that plastic feel from flux gens
Anyone made a Hunyuan Video lora yet? ๐
flux is no longer a shiny new toy
theres several now actually I believe
i think i saw one mentioned a couple days ago
yeah but still, i see almost no lora or finetunes for sd3.5
I'm getting around to it lol
i'm sitting here cranking out sd 3.5 large loras, about one every two days. there are a LOT of them out there, both on hugging face and on civit
here
start there - those are all style loras
The cool thing with hunyuan is since its also like a text-to-image model, you can train it on images too and it can actually gen videos of that character/style. Image training is a lot cheaper and faster.
<-- got distracted by Hunyuan Video lol
mage just put hunyuan in, free and unlmited, for all accounts ...
just sayin'
I can't wait until that aspect of it hits mage! (My computer is too slow)
mhhm, only sd3.5 large. Is there any chance of having a really good sd3.5M in the future, or it is still unusable as a base model ?
I noticed! ๐
3.5 m is not unusuable as a base model, but i can be talked into creating loras for it if you want
there is 3.5m turbo now also
but i'm making style loras. what sort of lora are you looking for?
https://old.reddit.com/r/StableDiffusion/comments/1hieril/introducing_tensorarts_stable_diffusion_35_medium/
i have a lot of fantasy dnd like images, handpicked. If you can train a lora on them
sure - how many do you have?
mage video is hunyuan?
yes
they just released it yesterday
so it's in beta still
Generate amazing videos using artificial intelligence. Create, customize and share AI-generated videos instantly.
nice, there was no free demo of hunyuan before and this one is unlimited๐
around 2000, but they are all very high quality.
it's low res, keep that in mind. they will put in high res, but not till it's out of beta and that probably won't be free to all accounts
i can't use more than 1000 for a lora, you're wanting a fine tuned check point. that's not me, that's @sage burrow
maybe i can choose 1000 then lol
i would need some time to sort them
i'll be happy to train a lora for you if you're willing to do that, sure
Civitai.com is really easy to create loras with, up to 1000 images. $2 approx each ๐
great
yeah but i don't know the settings, i would need to pay a lot until i get something decent
i never trained before
Unfortunately I am only able to do loras so far.
(Though sometimes I'll create a huge lora, then pseudo-merge it with a checkpoint)
which would probably work for this
sent you a friend request, will be back to you once i have the 1000 sorted
why don't you see if you can crete what @frail shoal is after. and i'll do a lora from a smaller dataset if it doesn't work
yeah not too bad, speed is actually decent and the cinema aspect ratio res is 672x368 seems like.
yeah. i made several videos right after it released, they worked well
although i must warn you, there are a lot of women. not all, some dragons and a lot of fighting scenes, but a lot of women still
The preset ones work extremely well (that's what I use most often). Just remember to switch image size from 512 up to 1024
be aware that it's very picky about aspect ratio. if you want someone walking as the subject, you need (click customize) to use 9:16 AR
otherwise you might get just their stomach
there is a preset about sd3.5m ?
a preset what?
my images are not all in 1024 though, they have different aspect ratios. would i need to crop them before ?
i thik becky meant in civit ai
i make mine all 1024 - some people don't. i just want the AI to have a dataset that's consistant
3.5 in general there are. (Akso flux)
i try to have a decent group for each aspect ratio
I'm a lazy lora maker, I never do lol. I think you are supposed to though
you don't have to - there are pros and cons for each
if they are all the same AR, then you can have issues if you try to generate somethign that's not that AR
So, anyone know a good free video upscaler?
on the other hand, you might get a better training if they are all the same AR
capcuts
i cropped like a 100 and inpainted some details into things that were needed. but it takes a lot of time. i also turned a guy character into a woman, because i really liked that one lol
on their magic tools page
Thank you
Hunyuan Video is decent at sword swinging characters btw! Better than still images last o tried
Is there any fancy way of upscaling with SD 3.5 (I'd like to get 4k image), or do I just generate standard 1k image and then pass through SD upscaler?
yeah if you can split the tiles up
then run a captioning model, control net and IP adapter on each tile, you can get much better results
I've used one workflow from da internets with basically default setup (just removed inpainting) and the upscale produces artifacts.
I guess that's because it's using 512 tiles, not 1024 which are default for sd 3.5l?
SD 3.5L like 1024 yeah
Setting tile size to 1024 did get rid of the artifacts, but there's clear chessboard pattern for brigthness
was there controlnet
workflow should be included in image
Trying dropping the setting to around 0.10 in the upscale toolbox
denoise?
when denoise is low it means it only changes the image a little bit
that was a question to @finite osprey. And I did reduce denoise to 0.1 and grid is still there.
Not quite sure then maybe you should post your upscaler screenshot here so that people can help more
Are there any fancy flow samplers that work with sd3 .5 in comfyui ?
yeah the clown stuff all works https://github.com/ClownsharkBatwing/RES4LYF
in stock Comfy samplers, Euler ancestral and DPM++ 2SA also work
as far as I understand
the thing that needs to be done for an SDE/ancestral sampler to work with rectified flow models, is for the SDE to be variance preserving rather than variance exploding
What's with the bot constantly pinging alextitanov and mayz? annoying
structured prompts work with T5 xl ?
prompt ```
<subject>
fashion presentation, professional video
</subject>
<style>
fashion, long exposure, professional photo, vhs, hyper realistic, cinema
</style>
<actor>
a woman with black bobbed hair
</actor>
<clothes>
a white shirt, a very long black navy blue velvet coat, black wide pants, a blue shark-shaped cap, blue shoes
</clothes>
<action>
the woman is standing holding her hat
</action>
<background>
a bright room, a black background with a starry sky
</background>
<camera>
the camera films the woman from above
</camera>
<extra>
dynamic shadows, light reflections, fashion, professional video, filming, cinematic view, hyper realistic, 4k, masterpiece, smooth, detailed and sharp faces
</extra>
i love the colors and aesthetics but the hands ruin everything
prompt :
fashion presentation, professional video
</subject>
<style>
fashion, long exposure, professional photo, vhs, hyper realistic, cinema
</style>
<actor>
a woman with black bobbed hair
</actor>
<clothes>
a large black tee-shirt dress with blue bubble patterns, black boots with blue arrow patterns, a blue velvet hat
</clothes>
<action>
the woman is standing holding her hat over opaque pink water in space
</action>
<background>
a dark space background with stars and the planet earth in the distance
</background>
<extra>
reflections, dynamic shadows, light reflections, fashion, cinematic view, hyper realistic, 4k, masterpiece, smooth, detailed and sharp faces
</extra>```
I would also be willing to give it a try if you can share the images. Feel free to friend request if interested.

How did you get a longer video? What's "CTX window"?
they mean context window
but I am not sure how it was done
if it was just passing the last frame as a new initial frame that doesn't rly count
its still useful to do that but its different
did anyone manage to install svdquants in comfy on windows?
Is it okay to talk about the safeguard on StableDiffusion 3.5 Medium?
I have set pipe.safety_checker = None but there is still significant safeguarding
this is a diffusers thing not a StableDiffusion 3.5 Medium thing
Should I switch to another UI (great if I can get suggestions), or can I somehow switch it off from Diffusers?
not an expert, Kijai is the one who implemented it, but basically the steps are divided into segments(If you do 40steps, 20sec video with 5sec segments, then 10steps per segment) and there is an overlap factor so the next segment "remembers" what happened before.
It should be a bit better then passing last frame as next frame as neon said and the passing last frame technique won't even really work since hunyuan has no i2v yet.
Makes flux linear and makes it do 8k generation 6.3x faster at that res: https://github.com/Huage001/CLEAR
I'm surprised no one's released in image 2 vid for hunyuan yet
tencent said they are planning on releasing it on january, and looks like they have controlnets too but not sure when they will release that.
Thanks for explaining, but I still don't understand it. How can I use it in Comfy, is there a workflow/nodes to download?
Kijai's wrapper has support for it, not sure if comfyui natively supports it: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper
I use those nodes (comfy version just gives me blurry output), but I don't see anything that explains how to create a longer video. Increasing the frames just takes forever and crashes my PC...but that's another issue.
I'm using a 4090
you can switch it off in diffusers
Should be something that says HyVideoContextOptions?
wow thanks I missed this one
really important paper
PipeFusion is what I was looking at currently, it should stack with that
May I ask how?
pipe.safety_checker = None does not seem to be sufficient
wdym by safeguard? Is it giving black images or what?
By safeguard I meant censor ๐
I mean sd3.5 medium is not really trained for nsfw stuff if you mean that, ponyxl is a better choice then.
maybe try thesehttps://github.com/CompVis/stable-diffusion/issues/331#issuecomment-1562198856 https://stackoverflow.com/questions/77616823/stablediffusionpipeline-always-producing-empty-picture-probably-because-of-saf
he's getting dinged by a safety checker thing diffusers has
