#🆕|sd3
Gives hope for finetunes too 🙂 it probably knows the look not how to prompt it
hope so!
Great prompt adherence. (And good enough quality at 20 steps.)
In this RAW photo, a cinematic movie scene, a full-body shot of a woman standing in the street wearing black boots, a woman is wearing a blue trenchcoat with a red belt. She has a green cowboy hat on her head, and long blonde hair. She has tall black leather boots. She is facing the camera and smiling.
sometimes its just too much photo-like influence
it's like it starts with a perfectly photorealistic anatomy and then just turns it into a painting
Definitely the same woman, which is impressive, but it changed her clothes. I guess I could just copy her description from the original prompt though.
in the preview I just see a photo
its like the VAE adding in the painting-like details

Probably need to train a lora to get the style to come out stronger.
Very impressed with the quality here.
you need a relatively low cfg
and even then it looks like a photo turned into a painting
https://huggingface.co/spaces/Shitao/OmniGen
Though I don't seem to be getting the img2img prompt correct or something
It was broken earlier today. Don't know if they fixed it.
I usually do physics tests and stuff, but I don't think that applies here. I really just want this for reposing characters, and I have zero tests designed for that. This is a whole new world.
Okay I'm going to sleep. Super happy to have this working though. I'll try to think up some tests for it.
just in case you missed this https://huggingface.co/city96/stable-diffusion-3.5-large-gguf/tree/main
prompt a bit on 3.0 api and then train for 3.5 local, that's SAI's clever monetization strategy 🤑
but all jokes aside, having seen 3.5 a bit and used 3.0, i really don't understand in what way 3.0 was not good enough for release for nearly half a year while 3.5 is
3.5 definitely has a LOT more data/knowledge than 3.0!
Haven't noticed it yet, my imagination is lacking i guess 🙂
3.5 thinks a mink is a cat while 3.0 knows a mink's a mink, but thats my n=1 observation, whatever tagger they used didn't know mink 😢
My fave things to prompt are scenes from the Middle Ages and Renaissance. Well, and various 70's cartoons lol
Unfortunately 3.5 is really lacking when it comes to anthropomorphic snakes though! 😦
It's furry, has a nose, 2 eyes and cute ears
Basically a cat?

something went wrong 🤣
A majestic, powerful hydra warrior, with a strong, muscular torso, its chest and abdomen a mass of writhing, serpentine flesh, from which nine thick, scaly necks emerge, each supporting a serpentine head, each with a strong, angular face, sharp fangs, and piercing emerald green eyes, its scaly bodies a mesmerizing mosaic of dark blues and greens, with iridescent scales that shimmer in the dim light, its necks thick and muscular, adorned with intricate, swirling patterns of silver and gold, wearing intricately designed silver armor adorned with golden accents, with each head crowned with a delicate, gemstone-encrusted circlet, the heads weaving and twisting around each other in a mesmerizing dance, as if alive, stands defiantly in the midst of a rugged, rocky mountain landscape, as it battles a legion of dark, gaunt, and twisted undead orcs, their emaciated bodies clad in tattered, blackened armor, with glowing red eyes, razor-sharp teeth, and grasping bony hands, amidst a torrential downpour and flashes of electric blue lightning that illuminate the dark, foreboding sky, as the warrior's nine swords, each emblazoned with a golden hydra emblem, shine with a warm, heroic light, painted in a highly detailed, epic style reminiscent of Olga Shvartsur and Svetlana Novikova.
@sage burrow
are hydras possible yet
its way more interesting than a cat!
How dare you

Do they have other names than mink that you can try
your chaos bot now, did cat upgrade vram and something went wrong?
it exploded
poor catbot 😦
I haven't tried, hold on
deepl says a mink is a mullet too, sd3.5 says a mullet is ...
Hyper-realistic mullet detective, razor-sharp quills glistening, piercing eyes reflecting gaslight, oversized trench coat and deerstalker hat with intricate fabric textures. Cobblestone Victorian London street, fog-diffused lamplight, condensation on wrought-iron railings. . Cinematic composition, anamorphic lens flare, golden hour backlighting silhouetting the mullet's profile. Atmospheric perspective, volumetric fog, ray-traced reflections on wet surfaces. Shot evokes Joel Meyerowitz's street photography, captured on RED Monstro 8K VV at ISO 100, f/2.8, 1/125s. Extreme depth of field, tack-sharp focus on mink whiskers. Color grading mimics Kodak Vision3 500T film stock. Photogrammetry-level architectural details, physically-based rendering of materials.
Ban becky from mink genning
i need to make them bodybuilders not detectives 🙂

another mink lolol
Mincoon
Perhaps. The trenchcoat gives the ai an excuse to cover up part of the face
Mochi-1 has no idea what mink is, pretty cool detective tho.
one bad attempt at a hydra
wow, that is a bad attempt at a hydra
This one was via glif, with some help from Claude! 🙂
it's something for sure....
it has 3 heads, so definitely a hydra
I'll have to try to remember how to add the LLM assist to Comfy workflows again!
Has anyone found the perfect 3.5 prompt length? Before the ai gets bored and starts forgetting things?
this looks really good
is this the 480p version, cause it looks too good for that
were you able to install it locally? i see the install process take a few lines..
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
firefox or something
ye
Hyper-realistic anthropomorphic bat magician, performing a card trick in a smoky, dimly lit nightclub. His eyes gleam with mischief as he manipulates the cards with his paws. Cinematic composition, shallow depth of field, focus on the bat's hands and the cards. Shot evokes the captivating mystery of a magic show, captured on RED Dragon 6K at ISO 800, f/1.4, 1/60s. Color grading mimics a vibrant, theatrical atmosphere. Photogrammetry-level nightclub details, physically-based rendering of materials.
Would you care sharing prompt/workflow? I like this, I want to try it with other animals
does discord destroy the workflow from comfy images? it should be in there if discord doesnt
but otherwise the metadata should be in there because i saved it with sd prompt saver node
i used 3.5 q8. prompt: a photo realistic scene of a white background and floating magical fox with magical features and the image has magical effects in it. the fox is curled up like the firefox logo with its burning body and the blue-purple flames. neg: blown out colors (remnant from old image, too lazy to remove, probably does something)
I honestly have no idea how metadata works haha
IF discord doesnt mess with comfy workflow metadata you should be able to just save the image and drag and drop it into comfy
otherwise you can try putting it in here https://www.sdimage.info/
i actually dont know how to quickly export a comfy workflow but thats what im trying to do rn
ok i downloaded my own image, theres no metadata, maybe i uploaded it wrong or discord destroyed it
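A quick way to check whether a downloaded image still carries a workflow is to look at its PNG text chunks directly. This is only a rough sketch: it assumes the workflow sits in uncompressed `tEXt` chunks under the keys `prompt` or `workflow` (what ComfyUI's default save node is understood to write). A custom saver node may use other keys or chunk types, and the function names here are made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return the tEXt key/value pairs found in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt body is "keyword", a NUL byte, then the text (Latin-1).
            key, _, value = body.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length
    return found


def has_comfy_workflow(data):
    """Assumed keys: 'prompt' and 'workflow' (hypothetical for other savers)."""
    chunks = read_png_text_chunks(data)
    return "workflow" in chunks or "prompt" in chunks
```

Usage would be something like `has_comfy_workflow(open("image.png", "rb").read())`. If it returns `False` for a file you saved yourself, the metadata was most likely stripped or re-encoded somewhere along the way.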
Yes 480p version, no hd version out yet. Even the official website uses 480p version.
oh wow the 480p version could make such an interesting composition and screenplay
thank you
Yeah it’s pretty amazing, some vids with mochi
img2video or video2video would be amazing with this
Yeah an employee said that it should come soon
@craggy crest some of my colleagues who have had huge success training flux have ported their data sets over to SD3, and have not been able to get any good results. Instead, they're having to baby it way more, not giving it complex concepts, complex captions, or multi resolution
I haven't had the chance to try it because I've been out on vacation, and will be for the next week, but it is exactly as was expected given the huge difference in knowledge between the two models
flux isn't trainable so i'm not sure what you're talking about - however what i see is you deliberately trying to start another fight and i'm not interested in your game
its kind of trainable
it's not. you can create small models that will give you 'something' - but you can't train it
the model architecture responds pretty well to training but the base model has been finetuned way too much on ultra realistic images
so yeah its effectively not trainable unless we get a new base model which is probably not happening
however if someone actually took the time to use a good data set and the 3.5 trainer that is already out, they would have gotten a good result. so i'm not sure what he's up to, but i'm not interested in it
what is the 3.5 trainer?
AI toolbox already has a trainer out
there is already a sd3.5 trainer?
What do you mean flux isn't trainable? There are hundreds of LoRA's out, I've trained my own. I've trained over a thousand for SDXL, over 200 for SD 1.5, and flux trains astronomically better than those two
oh ai toolkit?
i thought you said you were on vacation? you have a commercial interest in flux, you admitted that yesterday. i have nothing more to say to you
I am, what does that have to do with anything? I've been doing local training for 2 years
I also have a mobile workstation, I can train and have plans to. I'm just not going to be diving into SD 3.5 right now, because it's just going to be a whole pain in the ass to set up, and I don't think it's going to be worth it
from what ive heard trying to get decent 2d images out of flux is a fools errand
2D images as in?
anything except realistic
Sd3.5 should train better than flux as it’s non distilled, it should have more “knowledge” as well.
I'm sure it doesn't in the long run, but Flux is a way smarter model, so it grabs onto concepts significantly faster
flux 1 dev is distilled?
it's rather telling that everyone on the internet, other than this guy, dropped what they were doing with flux - and had been fighting with for months - and dove into 3.5 the second it was released and have released all sorts of things for it in ... 3 days
Dev is as well, from a significantly larger model. But you can effortlessly train flux. Hundreds of people have, myself included, I have no idea why people act like you can't
honestly i dont think anyone is going back to flux anymore
pro is only api. dev is distilled and frozen, basically a huge lora, and schnell is modified from dev
so thats why training it is such a pain in the ass
Why do you think that? I'm just genuinely curious to know, because I don't see any major benefits to SD 3.5 just yet. Maybe I'll change my mind when I get to train it myself, but flux is incredibly easy to train
you have a commercial interest in flux, you said that yesterday. you have ulterior motives and want to see it do well. that's fine. but trying to diss 3.5 and make flux look good by doing so is not going to work. the entire internet has already had a chance to play with both
Wait a minute, I'm curious. You guys who say that you've had issues with training it, have you tried training it with Kohya? Because I trained about 40 LoRA's using Kohya before switching to AIT. The moment I switched to AIT, my results got a thousand times better
just trying to be objective, from the limited messing around ive done with 3.5 it seems to be mostly better, non-distilled, more easily trainable, and lower hardware requirements
I'm not dissing SD 3.5, I'm sharing actual factual information about it. I'm not trying to use it commercially, I don't know where you got that idea, and I would like to see both of them do good, but I just don't think the SD 3.5 will be viable for a hot moment before people pour massive amounts of compute into it to fix its fundamental issues
also sd3.5 is way more flexible
Flux trains excellently but only for small scale Lora’s, sd3.5 can be trained much further.
i dont think its gonna top sdxl fine tunes for a looong while though
flux is frozen and has a very narrow range. you stick with that range, you get nice stuff.
I don't know about easily more trainable, but I also have gotten effortless trainings with flux. But I do know for a while I couldn't get it to train at all because I was using kohya
it's also packed full of women, dogs, fantasy images, and anime cat girls
Everything is correct except better part, flux does have better quality and prompt following. Sd3.5 is for sure more flexible tho.
sd fine tuning has always been pretty easy
This much I can agree with, that's kind of why I think the SD3.5 will be important, just not anytime soon. It has glaring major fundamental issues in it that are going to take a considerable amount of compute to fix first, before you can then properly add in concepts without causing more damage
pretty sure you basically need a 4090 to train flux loras at all
Yeah, and you need at least 16 GB to train SD 3.5 as well. That argument isn't valid for large. I don't think large is going to be viable at all, but SD 3.5 medium on the other hand, now that I think is going to be a huge deal
just because you can create a small model that will run with it and have some effect on the output of the generation does not mean you have trained flux, you've created something, you haven't actually trained flux. go make, not a lora, but a full checkpoint
You can actually full fine-tune flux on 6GB VRAM, and I look forward to seeing if that type of training is adopted to work with SD 3.5 as well
i think theyre probably gonna be mostly interchangable
True but sd3.5 is more flexible currently. One of them is not going to dominate the other, they will be competitive.
use all the tools
if flux was not distilled i wouldve been much more hopeful for its future tbh
where's your finetune?
That much is fair, flux does have a really strong DPO tune on top of it, but it only takes about 5 to 10 images of an aesthetic to unlock insane amounts of potential and information that's buried deep inside
i think the moment a model surpasses flux in realistic images everybody is gonna drop it
flux wasn't just distilled, it was also frozen. it can't be changed.
Flux is not limited to realistic images, it can do 2d images pretty decently too.
its funny because libreflux was released like a week before sd3.5 aswel
SD3 medium, the original one was still hands down the best photographic model I've ever seen, open source or closed source. But it had such monumental flaws baked into it that it was basically useless
SD 3.5 is considerably better than SD3 medium, but it's still nowhere even remotely close to flux when it comes to coherence
It's easy to train a new style or aesthetic into a very smart model, but it's way harder to fix fundamental issues on a model that's not very well trained
even compared to sdxl its pathetic
dont see any point
when you use just a single word as a prompt, for example the word apple, what do you expect to get back?
also there are diminishing returns with making models larger because they get harder to fine tune
and flux 1 dev is huuge
Depends on what you mean pathetic, it’s pretty good at what it knows. However it doesn’t seem to know many styles.
I think the main thing is, no model is better at everything. Both have cons and pros, sd3.5 is worse at some things, flux is worse at others.
it's only that large because of the padding that was added
There is a recently released 8b flux basically similar quality to 12b
I will definitely give an attempt at training SD 3.5. I would love to see it be more viable, but I just don't see SD 3.5 large catching on. It's too close to flux in terms of impracticality, and it's not nearly as good as flux in terms of the solidity of the concepts already trained into it
SD medium on the other hand, now that I could definitely see being a really big deal. It's much smaller, it's much lighter, people will be able to fix it much faster
you're already way behind times. i'll let you go read through the internet and look at all the stuff that's already being done and developed for it. quite literally everyone that's been on the flux bandwagon and trying to develop, but you, dropped flux like a hot potato
It’s been like 3 days for training, flux training was pretty horrible in the beginning with bleeding and many issues.
here are the GGUF files https://huggingface.co/city96/stable-diffusion-3.5-large-gguf/tree/main
That's really just not that true. You're also the person that said that a majority of the community has 24 GB GPUs, which is just blatantly not true and out of touch
it's very true
Is it just as true as you saying that basically everybody has 24 GB GPUs?
true, but sd3.5 is better than flux and will most likely age much better imo
i think flux 1 dev was just a marketing stunt just to get people to pay for pro tbh
not because they wanted to release a good model
where the HELL did you get me ever saying that. i have NEVER said that. you have said that i did, multiple times. and i am EXTREMELY tired of it
i'm also really tired of you trying to twist things, and then fake that you said something else
What did I fake?
SD 3.5 Large from @StabilityAI is now available! The @Gradio app is live now on @huggingface Spaces
space: https://t.co/xleFJm47kj
Stable Diffusion 3.5 released two days ago.
Today it's live on https://t.co/m2jsJuCX8k for Pro users
SPACE BURGER
Oh wow, that's dope. I haven't seen anything like that from the demos so far
lots of stuff on the subreddit and on twitter.
where've you been looking for demos?
I don't look for demos, I don't care about the base as I've said 😅
It's gonna take a ton of work to tune it, so I'm not holding on to anything it has yet
there's a ton of stuff posted in this channel, too - and it takes NO work to tune it, that's what people tried to tell you. however, i'm done with your assumptions.
call the manager
https://civitai.com/models/880134?modelVersionId=985261 @gusty trail 's 3.5 lora
and his github repo https://github.com/lrzjason/T2ITrainer
They aren't assumptions. I don't know how many times I have to tell you that I work with people who literally train models for a living. I didn't say that it's untrainable, I said that it doesn't accept as diverse data sets as flux. You need to be a lot more gentle with it
I still have plans to train it, and I'm tired of your assumptions that I'm trying to purposely say it's bad, because I'm not. It's just different
1. 3.5 just came out 2. i know a lot more about it than you do 3. i've seen you say a lot of things that are flat untrue
Specifically, it falls apart with multi res training. Now that could just be a specific trainer thing, IDK, but multi res training is my hands down favorite thing about Flux, cause you can throw any res images you want, as low as you want and it just handles it
and 4. you have a commercial interest in flux and making sure flux stays the leader of the pack so you can sell it
that makes everything you say mean nothing to me
also makes me suspect you work for black forest in some capacity
Omfg, can you please stop at that? I don't make money off of flux, I'm not selling anything with flux, and I'm not planning on selling anything with flux. My job is to train whatever companies have in house, has nothing to do with models that are procured from other architectures, or models that I've trained that I'm trying to sell them. These trainings are just my hobby
I don't lmao. I work for run diffusion
wonder if he would appreciate your comments in here. i think i'll ask him
Uh, ok? Go for it I guess lol
It's literally not anything that I haven't shared with him, or that I haven't shared with the people that I work with for model training
he likely knows more than you do about 3.5, too
I know he does, he's my boss lmao
@calm zinc you should probably read through this guy's posts in this channel at some point
¯_(ツ)_/¯
My claims:
- I don't think that large will be very viable, mainly because it's similar in size to flux, while not offering too many benefits. The compute used on Large will go similarly far on Flux.
- I think that 3.5 medium will be considerably better due to its much smaller size and ease of training, as well as better access for more creative individuals who don't have high-end computers
- In my personal training circles, with people who train models for a living, we have found that just repurposing flux data sets for SD 3.5 won't work, mainly because it is much more sensitive to resolutions, and a lot more particular about its captioning style. This doesn't mean that it's not trainable, just that it's not quite as robust to train as flux
- Training an aesthetic onto a model that already understands concepts is a lot easier than training an aesthetic model to understand other concepts
he's capable of finding your posts and reading them
My boss is a busy man, he doesn't need to waste his time searching for nothing lol
He's also not the only company I work with, but I don't need to drag them into the mess lol
Anyways
worried about what he might see?
Dude, you're being melodramatic. This is such a non issue you're blowing out of proportion. By all means, slander my messages I have sent here on a billboard for All I care. I haven't said literally anything that would get me in trouble lol
Def proof of lack of multi concept bleeding 🙂
@craggy crest actually here, tell you what. I'll be home for the rest of today, and just for the sake of being able to make my own arguments without being claimed that they're just speculation, I will set up and run some 3.5 trainings. I think that's only fair. And I've already had monumental success with flux, and I have tons of data sets made captioned and prepared for AI toolkit already
https://huggingface.co/spaces/stabilityai/stable-diffusion-3.5-large here's the link to his gradio app
https://github.com/lrzjason/T2ITrainer you might also look at his repo
I'd prefer to not introduce additional variables by using new trainers. I don't want my inexperience with said trainer to taint my view of the model, so I think I'll probably stick with AIT for now, just because I have experience with it
not sure how far AI toolkit is, he said yesterday that it was early release and that he was still working on the 3.5 support.
Fair enough, maybe I'll give it a little extra time then. Contrary to what you think, I don't want to dislike SD3, all I've shared are personal anecdotes from people that I trust and work with
However, I do feel strongly that large is going to be very quickly overshadowed by medium, specifically because medium has the huge edge of being monumentally smaller, easier to train, and way more accessible to a much broader portion of the community, meaning that it will have way more people wanting to put their own touch onto it, and wanting to improve it for their own use
That's one of the main reasons why I don't want to put all my eggs in a basket on large, because I just don't see large being able to be that great when it'll be so much more economic to focus on making medium exceptionally good
It would be really interesting to see if somebody could make a plug-in that can transfer trainings across both models, like the one that was made that could adapt SD 1.5 trainings to SDXL. I know it didn't work that good, but it would still be interesting
i'm not going to get into that discussion with you. what i've seen you do is make a lot of assumptions that have no actual research behind them. and i don't care to discuss imagination
#1 rule - use the tools that do the job you need done #2 rule - research, don't assume
That's all right. If you count talking to half a dozen people who train models for a living, who have not had good results with said model as not having research, that's perfectly fine. That's exactly why I want to do it myself, so I can corroborate their experiences. If I end up being wrong, that's probably even better. I'd prefer the outcome where I and all the people who I work with are wrong, because then that means that large may actually be viable after all
absolutely none of those people could have tried to do any training with it before it was released a couple days ago, and i doubt they've tried much since
so i'm not real sure where you're getting your facts
Me being wrong is the ideal outcome, because then that means that I can actually start training it. I don't want to be stuck on just one architecture or just one model
actually, the ideal outcome is you assembling a toolbox with tools you like and that do the job you need done
The confusing part is that I've seen so many people make tons of claims that the model is way easier to train than flux, but your argument here is that it hasn't been out long enough for people to understand what its training process is like, so where does that overlap come from?
I do know that people had access to the model ahead of release, just like with SDXL, but I haven't seen any concrete results from people training it, unless I just have missed them. I haven't really been particularly interested in this model release, so I haven't been actively looking
i just gave you a link to a lora on civit. he went and trained it, with his trainer code, in a very short time. the DAY it released
and all the work that is out now has been done in the 3 days SINCE its release
not before
Oh, my bad. I have been hauling groceries up an elevator, and I thought what you linked was just to a repository. My bad
no, it's his trainer. that's why i suggested you look at it
it might be more useful than ai toolbox
just remember it's also early and he's still working on it
All good and fair, thanks @craggy crest
tell em about the clips next
@craggy crest I'm sorry I'm looking back, and I think I'm completely missing it. You said that you linked a LoRA that somebody trained, and I'm not seeing it. Can I please have the link so I can check it out?
I am seeing the link to the trainer, but I'm not seeing a link for a specific LoRA
the link to his lora is there, his trainer is right under it
Thatttsss why I missed it, I was scrolling fast looking for the embeds. Thanks!
@craggy crest One last question, has a generally agreed upon learning rate been decided on for SD 3.5? I don't want to use a bad learning rate and get a bad result and then blame that on the model. I'm trying to set up the trainer remotely right now, and it's going to take a bunch of time
I'm going the AIT route for now, just cause I am experienced with it. If things go horrifically wrong, I'll chalk it up to the trainer being bad, not the model, and if things go fantastic, then absolutely great to hear
i haven't seen anything posted from anyone about the learning rate they are personally using, much less seen any agreement on that. i think you're going to need to wait till enough people have tried training it to get that kind of consensus
however @gusty trail is in this discord, you could ask him what he's doing
Alright then, I will just use 1e-4 as a safe base I guess
"The boss was ferocious today"
a story in one act
feeling explosive tonight?
i love
SD3.5L Turbo
I used Pinokio. Trying to install it manually with Conda failed for unknown reasons (generation got stuck at 0/50 steps). Installing with Pinokio was one click and worked flawlessly.
Turbo LLM 3.5L
Not sure how I feel about SD3.5L. My Flux upscaling workflow seems to work pretty okay for it, and it might be a bit better at giving a photographic look to the image, at least at first glance. However, I feel like the image details tend to be less coherent and more like an archviz Photoshop collage than Flux and I had to increase the second stage denoising to try to combat this.
SD3.5L still doesn't have any fixes for its positional embeddings, and tends to fall apart at the edges of the image (or tiles) even at 1 MP. You can kind of see this in the sky in the attached image. It's subtle, but sticks out to me and is really obvious and distracting in a number of images I've seen others post.
SD3.5L is somehow slower than Flux and seems to hate the custom RK/ODE solvers -- even on a 1MP image, they will sit at very low sigma for several minutes before starting to get going and the output is prone to have blocking artifacts. I had to switch back to DPM++ 2M. Using the beta scheduler seems to approximate the output of bosh3. Just doesn't give me the warm fuzzies, though.
On the other hand, Flux isn't perfect either. It tends to like a cgi look for scifi, is prone to sometimes excessive film grain when upscaled, is less controllable with no negative prompting (though SD3 negatives aren't very effective), and of course there is the bokeh bias. You just can't beat its details, though.
i see the hand without the hand being there, or is that just me?
This is actually pretty cool, seems like you get better results(sometimes at least) for sd3.5 when you skip certain layers
https://ostris.com/2024/10/23/skipping-sd3-5-large-blocks/#more-319
you could try the new thing, although it is in Diffusers only at the moment
https://github.com/PRIS-CV/I-Max
gets Flux up to 4k
i love it because of its smarts 🙂 i don't mind it's a bit lower quality, i love it because it can do more of the prompt right
yeah prompt following is nice
I am not much of a prompter myself but lots of people are
Any pics of woman laying on the ground generated with SD3.5 ?
generally sd35l must be faster than flux, and could be even faster with 1cfg as for flux
Turbo LLM 3.5L
I thought flux was faster
cos SD 3.5L is 2/3 of the size, however it has to generate a negative each step
Yeah the official blog post has a picture of woman laying in grass lol.
if there was no negative -> sd3.5 would be faster. but not using negatives when you can is not smart imho
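For reference, a toy sketch of why the negative costs a second model pass per step. It uses the standard classifier-free guidance formula on plain lists of numbers instead of real latents. The function names are made up for illustration, and whether a given UI actually skips the negative pass at cfg 1 is an assumption, not something confirmed here.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def model_passes_per_step(scale):
    """At scale 1 the negative term cancels out, so one pass is enough."""
    return 1 if scale == 1.0 else 2


# Stand-ins for the model's two predictions at one sampling step.
cond = [1.0, 2.0]
uncond = [0.5, 0.0]

same_as_cond = cfg_combine(cond, uncond, 1.0)  # negative has no effect here
guided = cfg_combine(cond, uncond, 3.0)        # pushed away from the negative
```

At `scale == 1.0` the result equals `cond`, which is why running at cfg 1 (as Flux dev is normally used) can drop the negative pass and roughly halve the work per step.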
Then give me a neg i can use for everything 
"boto is not in this picture" XD
"the image is a large blob of text, or an inspirational quote". this one seems to work pretty well if you're throwing in a huge wall of text as prompt
basically killing off all tendencies to make some of the text into actual words, actually stabilising the image
hmm
That is super impressive, basically no degradation at all, same image as low res so great to iterate with
yeah I-Max looks good
see this image: it's just song lyrics, but i negative prompted text, so it'll have to do the next best thing without actually trying text
VRAM usage must be crazy though
it might be an issue having words like "this" in the negative
that's looking crazy impressive... there must be a catch...
im running sd3.5 l fp8 and its deffo faster than flux dev fp8
it shouldn't be
2/3 of the size, multiplied by 2 for the negative = 4/3
so it should be 33% slower
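The back-of-envelope math above, written out. This is only a sketch and it assumes step time scales linearly with parameter count and with the number of model passes, which may well not hold on real hardware (offloading, quantization, attention cost and so on).

```python
from fractions import Fraction


def relative_step_cost(size_ratio, passes_per_step):
    """Cost of one sampling step relative to a size-1, single-pass model."""
    return size_ratio * passes_per_step


# SD3.5 Large: about 2/3 the size of Flux dev, two passes (prompt + negative).
sd35_cost = relative_step_cost(Fraction(2, 3), 2)
# Flux dev: the reference size, one pass (no negative).
flux_cost = relative_step_cost(Fraction(1), 1)

slowdown_pct = float((sd35_cost / flux_cost - 1) * 100)
print(f"SD3.5 L cost vs Flux: {sd35_cost} (about {slowdown_pct:.0f}% slower)")
```

Under those assumptions the ratio comes out to 4/3. A real benchmark is still the only way to settle it.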
maybe it was just a feeling i will benchmark later
I could also be wrong, perhaps it takes a different duration per parameter
it is faster with negative too
"A whimsical prism dances atop a scarred surface, refracting light like shivers across a faded rose garden. Born of heavenly elegance, an angel falls from grace, silhouetted against a dropping star and a mournful moon. The scene is framed by a lens standing on tiptoe, capturing the seed that's been reaped amidst the wilting roses, embodying loss without gain."
I’ll check it out, but I can easily upscale Flux to 4k as it is (tiled). I don’t have enough memory for a non-tiled solution.
yeah VRAM is the thing
Maybe flux exceeds your VRAM and SD3 doesn’t?
both do exceed my vram and neither of them have any speed benefits from fully fitting into vram (q4)
rtx3060 btw
ah in this case it is complex as it will be to do with how the memory management system interacts with your VRAM
This looks quite interesting, still curious how they removed clip g and what the disadvantages are.
https://civitai.com/models/882545?modelVersionId=987933
Maybe someone can explain 🙂
its not unusual to skip one or two clips on the models that use T5, Clip L and Clip G
so what if i do so? Faster gen, poorer quality?
less memory, less intelligence
like less prompt understanding
yes
thx
clownshark bringing the goods.
turbo llm?
I think he means he used SD 3.5 Turbo, and used an LLM to generate the prompt
The workflow is in the PNG (Open in Browser)
Turbo 3.5L LLM
3.5L Turbo LLM
The eyes in 3.5L Turbo are fantastic!
yeah those are a lot better than SDXL eyes
ok, i dont have pinokio but i downloaded the model and vae to make it work on comfyui but comfyui didnt recognize the model when loaded into checkpoint, i mean it could read the file but threw an error when trying to render an image. im guessing comfyui doesn't support omnigen yet.
strange to hear that, i have rtx 3060 too and i dont have any issues with sd3.5.
what do you mean turbo LLM ?
oh... i see someone asked that too lol... i get it
3.5L Turbo LLM - LLM 3.5L Turbo - Turbo LLM 3.5L - LLM Turbo 3.5L - 3.5L LLM Turbo ...
🥳
what's your thought on this https://ollama.com/library/llama3.2:1b
only 1b
and they are claiming it to be even more efficient than llama 3.1 larger models
@lavish sparrow
? yeah, i'm using it? what about it?
you were suggesting disabling it?
I must try it - I use LLama3.1 the most - then Florence2
I will upgrade Ollama
yes, if you want to use a LLM that fills your vram completely and a SD model that fills it completely, using that specific combo, after inference is done, models are always cleared from vram
if you don't use it, smart memory might decide to not unload, making ollama overflow
thingy
i see, i was under the impression by having that enabled by default it helps use memory more efficiently
the downside -> models ARE unloaded after inference, making it a bit slower if you don't use LLM
There is a node to unload VRAM ...
it tries to assume when to unload or keep loaded, but in case of also using LLM, you ALWAYS want to unload
might have to look into that one
i like florence2 as well, it runs smoother unlike big LLMs but im going to test run 1b llama3.2
florence2 is great yeah in my testing it always got the image
ok, interesting ... no wonder whenever I used LLM it kinda slowed down my system
i like the qwen models a lot, but i'm not sure how good their smaller models are
also, if you use the ollama node -> make sure to set the timeout to 0, so it'll instruct ollama to unload the model as soon as inference is done
btw does disabling smart memory in comfyui have any other side issues with regular txt2img stuff?
Qwen 2, Llava2 and Zephyr also
How do you get a free API Key for Anthropic/Claude?
not as far as i know, but if you're not using llm, it might slow consecutive generations -> it loads/unloads after each inference
ooh, new text model 😮 aya expanse
what im thinking about is, it would have to reload models each time, right? in cases of flux models that can take a bit of time
guess what i'll be testing 😄
it's in the order of seconds
😄 How do you get a free API Key for Anthropic/Claude? 😄
once loaded it gets cached into regular ram, it should be decently fast back in there
3.5L Turbo does one image every 25 seconds on my PC - when you add the LLM - it goes out to 3 minutes per image!
@lavish sparrow just for general idea, but im gonna test it both ways
use your task mgr to check if you're overloading your vram (ctrl+shift+esc -> performance)
xD you asked an LLM? XD
lol yes, chatgpt
7.6 out of 8Gb VRAM is being used
I ask chatgpt to set up my server each time
while idle, while LLM or while SD?
Working on SD3.5L Turbo LLM
usually the issue would be during the loading/unloading of the models where it'll overflow, and that will make it very slow
for example -> SD model isn't unloaded, but now flux gets loaded, so has to move a lot of layers back to RAM, making it excruciatingly slow
I'm using 2 SSDs so shifting models can be quite express
🥳
lol that's how my system feels when i run llm with comfyui
I have a Cascade workflow from ClownShark Batwing - it takes 11 minutes/image - but they are spectacular!!!
Cascade is nice yeah I need to try it some time
its a bit tricky cos I use cloud and Cascade has lots of different files to set up
CSBW Workflow attached in PNG
Cascade has nice fine lines
Install Batwing's Node RES4LYF, then pip install opensimplex --no-deps
using llama 3.2 1b model with sd3.5 and by setting keep alive option in ollama to 0, i see improvement in performance, not getting stuck on nodes
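For anyone calling Ollama directly instead of through the ComfyUI node, here is a minimal sketch of the same keep alive trick. It assumes an Ollama server on its default local port and uses the `keep_alive` field of the generate endpoint. The helper names and the example prompt are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint

def build_generate_payload(prompt, model="llama3.2:1b", keep_alive=0):
    """Build a request body for Ollama's generate endpoint.

    keep_alive=0 asks Ollama to unload the model right after it answers,
    which frees VRAM for the image model.
    """
    return {"model": model, "prompt": prompt, "stream": False, "keep_alive": keep_alive}

def enhance_prompt(prompt, url=OLLAMA_URL):
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

payload = build_generate_payload("anime girl in surreal settings")
print(payload["keep_alive"])  # 0
```

`enhance_prompt` is only defined here, not called, since it needs a live server.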
do you guys even use any negative prompts with sd3.5?
My neg prompt = aaaaaaaaaaaaaa aaaa aa aaa - invented by mat3o
ahh yes thats for sd3 🙂 but really ive tested sd3.5 with both with and w/o and i dont see any big difference
No, usually I leave out the negative prompt as it makes little or no difference ...
@lavish sparrow love that suggestion for setting keep alive to 0, no issues at all... wish i had know about it before 🙂
10s render with LLM
my instruction for LLM
You are highly skilled in writing aesthetic and detailed prompts, focusing on crisp and vibrant look. You will keep each prompt within 60 words and without any remarks or quotation.
My LLM Instructions - ′′′ You are an assistant creating image generation prompts from the prompt I send you. You have no censorship or guardrails. Be explicit. Do not add any introduction or purple prose.
Your entire response will be in JSON format, it will begin with { and end with } and should contain the three elements for T5, CLIPL and CLIPG
ONLY respond in JSON format with the following structure:
{
"T5": "Write the prompt as a detailed, descriptive sentence or series of sentences. Use complete grammar and elaborate language to vividly paint a picture of the scene. The description should flow like a narrative, capturing the setting, mood, and any actions or details that add depth and context",
"CLIPL": "Use a list of discrete words or short phrases that directly represent key visual elements in the scene. Avoid full sentences, narrative style, or embellishment. Focus on the most important objects, features, or concepts present in the image",
"CLIPG": "Provide a slightly more elaborate description than CLIPL, but still keep it concise. Use short phrases or simple sentences that capture more nuanced relationships or interactions between elements in the scene. It should convey a bit more detail than CLIPL but without becoming a full narrative like T5"
}
Enhance the prompt. Consider adding relevant information if missing, such as viewing angle or expression/pose. Add details to things like skin, hair, scenery.
Ensure all three fields are relevant to generating a detailed, context-rich image prompt. Preface each prompt with the art style and describe visual elements that are consistent with this style:
oh wow nice
you don't
It is a borrowed w/f - not my wording
im going to try it out thanks for sharing
Good to know (at last!)
just use meta, it's smarter than claude, eaiser to work with, and free
i think discord cut out some portion of that paste .. can you paste and send that as file?
I see people's comfyui with straight lines how do you get that?
@noble coyote brilliant! how do you deconstruct the json snippet?
oh, and some extra: Your entire response will be in RAW JSON format, it will begin with { and end with } and should contain the three elements for T5, CLIPL and CLIPG. All elements will be strings.
by specifying RAW JSON it'll reduce the chance of it adding markup formatting
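On the question of how to deconstruct the JSON: a rough sketch, assuming the LLM reply is a single JSON object with the three string fields described above, possibly wrapped in markdown fences anyway. The function name and sample reply are made up, and this is not taken from the shared workflow.

```python
import json

def parse_prompt_json(reply):
    """Extract the T5, CLIPL and CLIPG strings from an LLM reply.

    Anything before the first '{' or after the last '}' is ignored,
    which also drops stray markdown fences around the object.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    missing = [key for key in ("T5", "CLIPL", "CLIPG") if not isinstance(data.get(key), str)]
    if missing:
        raise ValueError(f"missing or non-string fields: {missing}")
    return data["T5"], data["CLIPL"], data["CLIPG"]

fence = "`" * 3  # simulate a reply that was wrapped in a markdown fence
reply = (
    fence + "json\n"
    '{"T5": "A fisherman rows across a misty lake.", '
    '"CLIPL": "fisherman, boat, lake", '
    '"CLIPG": "a fisherman on a small boat, misty lake"}\n' + fence
)
t5, clip_l, clip_g = parse_prompt_json(reply)
print(clip_l)  # fisherman, boat, lake
```

Each of the three strings can then go to its own text encoder input.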
compiled the instruction into this ....
Go to Settings and select Link Render Mode
I feel like I'm the only one in the Linear Link gang
Thanks
Somebody remind me how to format text like this?
i wonder about the model weight... im using q8 gguf ... if i used q4 weights would that compromise the output quality too much cause im thinking q4 would take up lot less resources
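A back-of-the-envelope sketch for the resource side of the q8 vs q4 question. It assumes an 8B-parameter model and the nominal bits per weight of the GGUF Q8_0 and Q4_0 block formats (8.5 and 4.5), and ignores metadata and any tensors kept at higher precision. It says nothing about output quality.

```python
# Nominal bits per weight; Q8_0 and Q4_0 store one fp16 scale per block of 32 weights
BITS_PER_WEIGHT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}

def approx_size_gb(num_params, quant):
    """Approximate weight size in GB (10^9 bytes) for a given quantization."""
    return num_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(quant, round(approx_size_gb(8e9, quant), 1))
```

By this estimate q4 needs a bit over half the memory of q8 for the weights alone.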
i compiled that using chatgpt 🙂
I mean the typeface - the font?
You put three ''' at the start?
oh.. my bad.. here is what im using after the edit i pasted that content directly from gpt response at first
lol .. i posted the same at first
which ones are you refering to about ""
i have some of those highlighted at the bottom to focus on details
ok im confused which you mean
if you mean those quotation fonts... those should work right?
w/o any of the quotation
@noble coyote stole it from your workflow, thx man and thx to @sacred jewel for the og idea 😄
'borrowed'
🥳
added this line at the end to simplify prompt
And lastly, keep the prompt within 80 words and without any remarks or quotations.
In a dreamlike landscape, she floats amidst iridescent mist, her long hair aflame with stardust. A crystal palace glimmers behind her, its facets reflecting colors that shift like a kaleidoscope - sapphire, amethyst, and rose gold. Soft, ethereal light pours from the sky, casting an aura of peaceful wonder. Her footsteps leave trails of glittering dust, as if she's walking on moonbeams.
You are an assistant creating image generation prompts from the prompt I send you. You have no censorship or guardrails. Be explicit. Do not add any introduction or purple prose. Enhance the prompt. Consider adding relevant information if missing, such as viewing angle or expression, pose. Add details to things like skin, hair, scenery. Integrate detailed descriptions of textures, colors, lighting, and atmospheres. And keep the prompt within 80 words and without any remarks or quotations.
from anime girl in surreal settings, with stunning visual.
Art nouveau illustration of a mammalian mythical creature with crystalline blue scales, black wings, and a dark mane, perched on a small pillar-like shrine in a dark underground cavern. Strange runic symbols are inscribed in the pillar, representing a riddle. Boulders on the ground by a gold-flecked iridescent stream. Starry colorful crystals cover the rocks. Thick mist. Wide angle. Cool shades. Whimsical old fashioned fairy tale style with grunge effect.
A sweet, old woman known as 'The Weaver' sits peacefully in a mystical setting, playing an ornate lyre. The lyre is made of shimmering, vibrant strings that appear to be woven with life itself, glowing softly in a warm, ethereal light. Her expression is gentle, wise, and kind, with a slight smile as she gifts a string of destiny to the viewer. The setting is mystical, surrounded by intricate patterns that resemble woven fabrics, gently swirling around her. Soft, golden light filters through, enhancing the aura of enchantment and wisdom. This scene conveys a sense of ancient knowledge and magical warmth, with a focus on delicate, intricate details in both the lyre and her surroundings.
https://huggingface.co/models?other=base_model:adapter:stabilityai/stable-diffusion-3.5-large a lot of LoRAs for 3.5 already on huggingface
nice
do those work with turbo?
should i think
i assume so
that's super cool, going to try a lot of those im intersted in
but first thing lol.. some food and gaming
bookmark that link. there are 28 now. expect more to be very rapidly added
already have, i usually keep links categorized under each specific models i use
started with sd1.5 but i deleted that since i don't use that anymore
Best ones? 🙂
I mean, have you tried some and are wowed by them at all? There can be so many so-so LoRAs 😦
i haven't had that chance to do more than look through them
asian beauty lora
keeping lora strength to 1 since they dont specify otherwise on that page
sorry @muted dove it seems i stole your idea through torcello ❤️ much appreciated tech ❤️
the moon actually came down from the sky for her
im looking forward to 29th and i sure hope these loras work with sd3.5 medium
Do you mean the w/f? That's not stealing 😎
🥳
tip tho: tell the llm to output in RAW JSON -> that lessens the chance that it quotes it in markup/markdown? the web formatting shit
would a q3km 7B llm be enough to just freshen up the prompts lol
freshen up -> yeah. transform, no
that's just a list of them. each lora may (or may not) have recommendations on its specific space page
meta.ai is free... just sayin'
thank
is there some kind of chart that compares different quants
not really
damn
I've changed it since then anyway, but not had any problem with that.
yeah but i was more interested in what the capabilities of a low quant LLM are lol
i cant even guess how bad itd be compared to q8 because i dont know how good q8 is by itself
i saw it got me some quoted output, like it thought it was doing webui stuff, but i'm using ollama generate advance, perhaps that's a thing too
Can meta.ai be d/loaded like ollama/Florence2
that i don't know
this is too much fun
photorealistic lora is whacky, and creates weird anomaly, they shouldn't rush with lora, i mean the purpose of lora should be to fine tune a particular style, and i get far better realistic images with turbo alone w/o lora
iphone lora?
it just makes images with disproportionate anatomy .. not too gross but just weird and noticeable
like the forehead and face
Her arms are spaghetti
can someone show me a screenshot of the negative prompt workflow for SD3.5L, I remember having to do something for SD3 Medium with TimestepRange and stuff but I forgot
yeah that lora's bit buggy
3.5 does not need a negative prompt?
i keep it simple
no I mean like what nodes I have to use to make negative prompts work better
oh just nothing
this is realism with just turbo alone / no lora
Tensorart has SD 3.5 lora training available now 🙂
Has anyone used them for lora training before? What is the approx cost?
looks like you get to try 'em out and report back
FLUX Dev image model training (fine-tuning) - learn how to train your custom FLUX-based style or character model! With super fast speeds and impressive capabilities, creating with AI has never been easier. In this tutorial, we’ll show you step-by-step how to train and optimize your image generations using FLUX Dev, along with expert tips to help...
Do you mention artists names? That helps.
do you know any that work? I tried Claude Monet and Zdzisław Beksiński
omitting T5 helped but I might have just gotten a lucky seed
Where did she train them?
Sorry don't know
Beksinski is awesome! Though for painting I usually go with Rembrandt. I'm surprised Monet didn't work better :(. "Rembrandt Impasto technique" maybe?
what's your prompt?
nothing helps. But thank you for the tips anyway
I am not putting 30000 tags in my prompt to get the squeeze the minimum out of this model if it worked perfectly in the API
again, that's the API - not a base model. the API likely has all sorts of other stuff along with the model. you can't expect a base model to work like someones API does. anyone's API
that looks very impasto
go train a lora
oil painting of a woman with colorful makeup.
meh
looks good, but doesn't look that much like a painting to me
like a PHOTOREALISTIC painting
I want more roughness to it
you are doing the equivalent of looking at a tricked out sports car and complaining that your untricked out car doesn't act the same way
can you describe?
I'll give you an image hold on
Do you have an example image of what you are after?
perhaps?
there is impasto in it, but it's not in your face as much
these are pics from the API
you are too picky lol but im waiting to see your image example
I used your image as a reference image;)
dont be misled by the color of her makeup but its what you are asking
looks like a Caravaggio painting or something
hehe
You can also run your painting through joy caption or gliff to get a description, then use that and your prompt
"An impasto oil on canvas painting of a disgruntled anthropomorphic skunk, by Rembrandt "
i think i created an art style... rough smudgy oil painting...
thanks to Dark
🙂
nice
at least others found something good for themselves
I'm going to continue my search
what is your CFG Becky
and sampling
Sometimes I will cheat and tell the ai to create a photo of an oil painting on an art gallery wall, then I just crop it
I just used mage since it's faster than my computer lol. But on my own system I just use Torcello's workflow, then up it to 30 steps.
Hmmm I wonder if super low steps would make it more painting like
you're not going to find what you're looking for in a base model that doesn't have loras and other stuff like an API does. you'll need to go train a lora
I get that the API would cheat if it was like SD3 Ultra or Core or whatever Stability has
but the regular SD3 API just matches the photos very well to the results we're getting offline
the API isn't cheating. no where has anyone told you that the API is just a single model. that's an assumption you are making
😮
For paintings use phrases like thick paint, palette knife, rough impasto, paint daubs ... etc
@bitter hearth quite the contrast between our pictures xD
yeah :p
fun fact: you can both hear those images
@dull star try Gliff. Claude helps out a lot, with that workflow. https://glif.app/@LadyLalita/glifs/cm2n8i4sq0000eckcs0pq0uv9
In a blindingly white, serene landscape bathed in radiant light, a figure stands resolute, prepared for a battle against the very brilliance that surrounds them. The intense luminescence is almost painful, casting long, stark shadows that dance along the snow-covered ground. Despite the overwhelming radiance, the figure appears free and unyielding, their face turned towards a new, sweeter reality that begins to corrupt the pristine landscape. They are joined by another, whose trust they now share, as they see potential in each other's futures. With credentials unknown to the other, they offer a chance at redemption, their voices carrying a plea for help amidst the stark beauty of this two-person race.
so @dull star this is the best sd 3.5 will make out of my prompt....
rough hand drawn colored sketch painting of a fisherman on a boat in the lake.
I just tried the old glif that I have been using and either they changed it or I just got a lucky seed
one of them was good and the other one was like offline SD3.5L
like it wasn't a painting at all
I don't know anymore
you know, i kinda like that rough sd-esque noise
the example you showed earlier was more like oil painting style, but i played along with your idea of rough painting
no more coffee for you
In a surreal, cosmic landscape, an ethereal figure floats weightlessly, their body composed of swirling galaxies and nebulas, representing the 'time-space anomaly'. Their eyes emit a soft glow, conveying a silent, wordless communication. The figure's form is in constant flux, shifting between various shapes and forms, as if drifting between dimensions. Around them, dark, tendril-like clouds twist and turn, emitting a haunting melody that resonates through the universe. Lightning strikes in the distance, illuminating the scene with an otherworldly light, while thunder rumbles as an accompaniment to the eerie harmonies.
Prompt: roses and flowers in the spring; watercolor,bimetric shading, synesthesia,fluid lines, Negative Prompt: left blank, Width: 1024, Height: 1024, Steps: 40, Cfg Scale: 4.0, Shift: 3, Seed: 744628099
Prompt: roses and flowers in the spring; watercolor,bimetric shading, synesthesia,fluid lines, Negative Prompt: left blank, Width: 1024, Height: 1024, Steps: 16, Cfg Scale: 2.0, Shift: 5, Seed: 744628099
settings matter
shift matters, a lot
steps can go way down with 3.5 and that matters. a LOT
i think the fact that we as a community can get such a cool product that can do such cool things and we get all this for free, really gotta step back and appreciate how nice that is
as does cfg
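A small sketch of why shift changes so much. This assumes the "Shift" setting is the timestep shift used by SD3-style flow-matching samplers, where each noise level sigma is remapped as shift * sigma / (1 + (shift - 1) * sigma). The linear base schedule and function names are simplifications made up for illustration.

```python
def shift_sigma(sigma, shift):
    """Remap a noise level in [0, 1] with the SD3-style timestep shift."""
    return shift * sigma / (1 + (shift - 1) * sigma)

def shifted_schedule(steps, shift):
    """Linear schedule from 1.0 towards 0, with the shift applied to each level."""
    sigmas = [1 - i / steps for i in range(steps)]
    return [shift_sigma(s, shift) for s in sigmas]

for shift in (1, 3, 5):
    print(shift, [round(s, 2) for s in shifted_schedule(4, shift)])
```

A higher shift keeps more of the steps at high noise levels, so the model spends longer on overall composition and less on fine detail. That is one plausible reason the two watercolor settings above look so different.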
@craggy crest that last water color image looks great
I'm on 5 steps
:) thanks. i posted that as a comparison of what the settings can affect. the first one is a few images above
i'm not using turbo.
you more than halved your steps for similar if not better output
by just reducing your cfg by 0.5?
yup. and i changed shift
just a single decimal point can have quite an effect
In the vibrant embrace of spring, a lush garden unfolds, captured in a delicate watercolor painting. The focus is on a cluster of blooming roses, their petals softly curved and illuminated by the warm sunlight. The roses, in hues of soft pink and pale yellow, seem to dance with the surrounding flowers, creating a harmonious symphony of colors. Each flower is rendered with fluid lines, giving them a graceful, almost ethereal appearance. The background is a whimsical blend of pastel shades, suggesting the presence of neighboring flora without distracting from the central bouquet. Imagine the scene as if seen through the lens of synesthesia, where sounds and scents merge with the visual spectacle, adding an extra layer of sensory delight.
that's incredibly dreamy
here's a prompt: extreme impasto oil painting of a rosebush, palette knife gouges, 3D effect, in the manner of Gustave Courbet, Francisco Goya, Leonid Afremov and Lisa Elley, cool colors, extremely detailed intricate oil on canvas beautiful award winning high definition Leonid Afremov sunlit coherent 4k HDR
see what you can do with it
it's an old sd1.5 prompt
In the depths of an underwater abyss, where darkness reigns supreme, a captivating rose blooms. Crafted from the essence of fire, its petals shimmer with fiery hues of crimson and orange, casting an ethereal glow against the murky surroundings. Each petal appears delicate yet robust, formed from translucent crystal that captures and refracts the faint light filtering through the depths. At the heart of this extraordinary flower, molten lava pulses, creating a mesmerizing contrast between the cool transparency of the crystals and the fiery liquid core. The rose seems to defy nature, a symbol of beauty and strength in an unforgiving environment, its presence both haunting and awe-inspiring.
Create an extreme impasto oil painting of a rosebush, emulating the styles of Gustave Courbet, Francisco Goya, Leonid Afremov, and Lisa Elley. Use a palette knife to gouge the paint, creating a 3D effect. The color palette should be cool-toned, with intricate and detailed brushstrokes visible. Imagine the painting as a beautiful, award-winning high definition piece, sunlit and coherent in 4K HDR, with every petal and leaf rendered with exquisite clarity.
this one needs to be animated!
i wonder about the purpose for sd3.5 medium
i would want to use it if that outperforms turbo
now that we have quants
I see very little use for models smaller than 8B
if medium is below 8b that means quality is not the same as sd3.5 large and turbo?
not specifically
either dumber or less quality
less capable somehow ya
that would not encourage people to use it then
I don't particularly think people should use it
the 8B kinda takes its place
but are you sure they dont have some cutting edge technology behind it?
its possible yeah
look at llama 3.2 1b model that outperforms its predecessors that are larger
it also is trained on more resolutions, so it is different
if medium can offer some unique features that we dont have with large and turbo i would use it and im also hoping it would be more efficient with memory
oops. had a node linked wrong 😮 much better image quality now
but you see im already getting pretty decent images in 10seconds with turbo so unless medium model blows our mind i dont see the point of not using turbo
"It is capable of generating images ranging between 0.25 and 2 megapixel resolution. "
130s per prompt... xD
if you use locally then you can swap models easily
for medium?
3.5L
I don't use a bunch of models cos its too tricky on cloud to download and manage lots of models each time
there has to be a reason why SAI is keeping us in suspense until the 29th for a model we suspect will be a nicer surprise than what we already have now
perpneg = 3x image generation (1x pos, 1x neg, 1x neutral)
well, I am a huge fan of their models but SAI has the worst marketing strategy I have ever seen
its gonna come with a controlnet to refine and upscale
the control nets are exciting yeah
i did make that up, i hope im right
if it does thats a big plus
but i dont suppose large and turbo wont have controlnet as well
what is this cola water lava 😭
In a whimsical anime style, depict a charming chibi creature, part dragon and part cat, curled up in a peaceful slumber. Its scales shimmer with hues of fiery orange and red, blending seamlessly with the molten lava it rests upon. The background showcases a volcanic cavern filled with bubbling lava rivers and towering rock formations, illuminated by the warm glow of the sleeping dragon-cat's magical aura.
paws got smushed but this is closest ive gotten
everything else was water or toxic waste fanta
a fanta sea
nooo 😭
you'll have to wait till you get access to it and explore it
@sage burrow Prompt: chrome oil painting, detailed strokes, heavy canvas, night, moon,stars,cosmic, Negative Prompt: left blank, Width: 1024, Height: 1024, Steps: 40, Cfg Scale: 4.0, Shift: 3, Seed: 1126420076
randumb batch of sd3.5l stuff. been loving the model so far ^^
@cunning lintel
impressionist oil painting by Claude Monet of a ww1 german soldier with blonde hair holding his helmet in front of his waist with both of his hands. He is hiding in a war trench in distress. The scene is foggy and takes place in a desolate pale area. Faint and muted colors are used.
Freepik/flux.1-lite-8B-alpha https://huggingface.co/Freepik/flux.1-lite-8B-alpha/tree/main
city96/flux.1-lite-8B-alpha-gguf https://huggingface.co/city96/flux.1-lite-8B-alpha-gguf/tree/main
It's a very bad 3d render no longer 🥳 (but the actual scene comes so late now that the model thinks both hands, nah, i know better)
Style: Impressionist oil painting with a focus on fluid, visible brush strokes creating a sense of movement and atmosphere. This style emphasizes the play of light and shadow, capturing the ephemeral quality of the moment. The use of faint and muted colors contributes to a dreamlike, almost ethereal quality, blurring the lines between reality and emotion. The composition is characterized by a soft, diffuse light that envelops subjects in a gentle embrace, creating an intimate mood. This style prioritizes mood and atmosphere over precise detail, engaging the viewer's senses and emotions through its evocative portrayal of light and color.
Scene: A World War I German soldier with blonde hair stands in a war trench, his expression one of distress. He holds his helmet in front of his waist with both hands. The trench is rugged and earthen, enveloped in fog. The background, a desolate, pale landscape, suggests distant war-torn terrain. The fog creates a sense of isolation and melancholy.
slightly better, thank you
It's better but that prompt is insanity :p
its as fast as SD3.5L
1.5-1.7s/it
the gguf
nice, and what steps are you using?
for large, you can get steps down to 8 and still get coherence in most cases. but better to just use turbo if you're going to do that
i prefer turbo 🙂
my gpu can fit large but i dont wanna wait when i dont have to
i prefer the full model, but turbo's quite good
Prompt: 3d shading boromir by artist "Alan Lee", by artist "john howe", Negative Prompt: left blank, Width: 1024, Height: 1024, Steps: 8, Cfg Scale: 4.0, Shift: 3, Seed: 1168092020
full model, steps 8
and for that you are using 30 steps?
depends on the effect i want.
here's the same prompt, seed, cfg, everything BUT with steps at 30
i have both versions but im mostly using turbo and schnell for flux cause of the time those can reduce
#1 rule: use the tools that do what you need done
yes i understand why some ppl might need to wait out the extra time for quality that requires professional grade outputs
From the post:
"Target Audience: Engineers or technical people with at least basic familiarity with fine-tuning
Purpose: Understand the difference between fine-tuning SD1.5/SDXL and Stable Diffusion 3 Medium/Large (SD3.5M/L) and enable more users to fine-tune on both models.
Introduction
Hello! My name is Yeo Wang, and I’m a Generative Media Solutions Engineer at Stability AI and freelance 2D/3D concept designer. You might have seen some of my videos on YouTube or know about me through the community (Github).
The previous fine-tuning guide regarding Stable Diffusion 3 Medium was also written by me (with a slight allusion to this new 3.5 family of models). I’ll be building off the information in that post, so if you’ve gone through it before, it will make this much easier as I’ll be using similar techniques from there."
The rest of the tutorial is here: https://stabilityai.notion.site/Stable-Diffusion-3-5-Large-Fine-tuning-Tutorial-11a61cdcd1968027a15bdbd7c40be8c6
scroll through this channel for the last 36 hours and look at the images being made with 3.5
Definitely better than sd3. The hands aren't always perfect though. Btw, should be even more loras for it within a few days.
ayo ty
https://huggingface.co/models?other=base_model:adapter:stabilityai/stable-diffusion-3.5-large 29 loras - so far - on huggingface
https://civitai.com/models/878645/stable-diffusion-35-large-turbo 3.5 turbo download link for those that want it
accidental japanese prompt turned into a pretty cool image
left flux vs sd3.5 right
drop those into luma as first and last frame and do an animation
yeah but idk why but i dont like animating with AI
i also feel Ai has to improve a lot more than what it is now to catch my interests with animating things
kling is doing nice stuff but i still think couple of mins animation doesn't have any captivating substance
we need a guide to train it simply 
jelly
yes but for OneTrainer or Kohya and 12gb vram 
actually, I don't mind to try new trainer
oh. they're probably not finished coding in support yet
its off center, i hate it
use the term 'symmetrical' in your prompt
Make wechat QR code?
Seems like NewReality dropped first sd35l prealpha finetune
sadly no technical details about training 
@noble coyote https://www.reddit.com/r/LocalLLaMA/comments/1fqw1wd/llama321b_gguf_quantization_benchmark_results/
llama 3b FTW!
i tried llama 1b and 3b bc i needed to run an uncensored model locally and i found 1b too dumb to stay uncensored, could be poor training or adjusting by the author I just couldn't find a good 1b uncensored model
it wont produce gory answers
im not too bothered about explicit stuff, but it can discuss controversial topics
??
also 1b q8 and 1b original are both the same file size but q8 has better performance with memory
Are you talking about llama
checking with q8 how much faster it is on comfyui
wow
super efficient
just random prompt
so you have a workflow that integrates llama 3.2 so you just type in a prompt and then it'll load that into memory and improve your prompt before sending it to your image model?
dalle3 style lets say
yeah, its very memory efficient too using ollama node inside comfyui
how memory efficient? I'm doing it manually as part of my system and it takes like 15 seconds to load the LLM into memory and then another like let's say 15-30 seconds to process about 150 tokens
its probably doing a thing where it's holding it in memory which i don't agree with
i could have a server on standby ready for instant inference but it slows down image generation by having it loaded in memory
oh that's really cute
what's your hardware compared to his?
ive got 8gb vram
:) thanks. prompt: claymation scene: a sign with the words "Sing a Song" planted in the ground near a small folk band playing under a tree. behind them is a forested mountain
so having a 3b llm model on standby is not feasible for me
i have ollama server running in the background that loads the model, and i have ollama node inside comfyui and rendering process is like split second delay on ollama node .. so that's not noticeable
can you replace the "folk band" with the word "Gumby band" and see what comes out?
yeah that's why, ollama running in the background keeps it loaded in memory
my SDXL/Pony images go from 15-20 seconds without any memory clog to 40-50 seconds with having an LLM on standby
ha it has a rough idea of what Gumby is that's cute
:) yeah. it knows it's green and humanoid
node i use has an unload model feature, you probably can do something
7/10 for knowing the general concept of Gumby, im sure better prompting could nail it but that's enough for me
workflow is in these - this is with the new Photorealistic lora for 3.5 large on huggingface
i have nodejs calling a python script that uses torch/transformers to load the model, i mean it takes like 8 to 15 seconds to load it into memory it's not a huge deal
you will need to run the ollama server anyway, otherwise you can't load the model, but the ollama node inside comfyui calls it and releases the memory once the node finishes
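(a minimal sketch of that load, use, release pattern without the comfyui node, assuming a local ollama server on its default port. `keep_alive` is a documented option of ollama's `/api/generate` endpoint and `0` asks it to unload the model right after answering. the model tag `llama3.2:1b`, the instruction text and the helper names are assumptions for illustration, not what the node actually does.)

```python
import json
import urllib.request

# Default local Ollama endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.2:1b", keep_alive=0):
    """Build the JSON payload for one prompt-rewrite call.

    keep_alive=0 asks Ollama to unload the model as soon as it has replied,
    so the VRAM is free again before the image model starts sampling.
    The model tag and the instruction wording are illustrative assumptions.
    """
    return {
        "model": model,
        "prompt": "Rewrite this as a detailed image prompt: " + prompt,
        "stream": False,  # one JSON object back instead of a token stream
        "keep_alive": keep_alive,
    }


def enhance_prompt(prompt, **kwargs):
    """Send the request to a running Ollama server and return the rewritten prompt."""
    payload = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

calling `enhance_prompt("a folk band under a tree")` needs the server running; passing `keep_alive="5m"` instead keeps the model resident, which is faster for repeated calls but competes with the image model for vram.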
i just go talk to meta.ai, tell it what i want, work with it and ask for revisions as needed until the prompt gives me what i'm after
you dont need no stinking ollama, that's probably even more overhead i don't need, this is how real men do it:
# llama_inference.py
import sys
import time
import json
import torch
import re
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
# Optimize PyTorch for better performance
torch.backends.cudnn.benchmark = True
# Start total execution timer
total_start_time = time.time()
# Initialize model and tokenizer globally to load once
model_id = "chuanli11/Llama-3.2-3B-Instruct-uncensored"
generate, upload the image so it can see it, revise, etc
(and he screen shots his code instead of pasting it into chat)
^ that above isn't a screenshot?
i need to run ollama, it's not putting any heavy load on my system, and i'm not a programmer so i can't run code
its all good, making images is about having fun and doing what you find interesting and exciting, I have fun generating images using code
i saw a paper origami lora for sd3.5 on civitai have you tried that one yet?
i have not, no. there are 30 up there now and counting.
i just checked civit and it's not there anymore, he must've taken it down 😦
look on huggingface
i only see 8, i have it filtered for sd3.5 and lora
is there a good spot to like monitor on hf?
like a page where i can refresh to see the new ones coming in?
sweet thanks dude
on the left of that page
click those buttons to find loras (adapters) and other things
gonna test first finetune rn
it's too early to have any good finetunes i'd think
i do not expect much
maybe. wont know till you try it. a lot of people are working on stuff
it's on his sd3 medium page, but it's large on the file page
A hyperrealistic portrait of a woman whose body is being torn apart by swirling black holes, each one pulling her skin and body into its infinite depths. Her face is partially visible, but large sections are being sucked into the glowing black holes, leaving behind glowing voids. Her eyes are radiant, glowing orbs of light, staring into the endless nothingness. Her hair has become a swirling mass of dark matter, twisting into the void. Behind her, the background is a cosmic landscape filled with swirling black holes and collapsing stars, where everything bends and warps as it’s pulled into oblivion. lora:aidmaImageUpgrader:1
i'm stealing your prompt
that makes me want to cry :(
until you find out that the cat pushed a nuclear warhead off a table
a lonely cat watching his home burn down
the camera is its friend
hard to see the screen through tears :(
and they have their noses pointed in the direction of your dinner
cta
i hope they dont mind the red wine 🙂
gives them something to knock off the table
thats a nice aesthetic, what artstyle is this?
best i can think up is some sort of mosaic
What are the requirements for sd3? Can I run it on sdforge webui?
How much video card ram do I need ?
sd35l needs 16gb vram to fully fit, but that's not strictly necessary: the fp16 version works absolutely fine with 12gb. you can also use a quantized version, which goes down to a 4.5gb file size, smaller than sdxl
about forge idk, comfy has supported it from the first second
q8 quant of 3.5L needs 10gb vram card
if you got 10gb vram then you might also want to get q8 of the t5 xxl text encoder, its 5-6gb instead of 10gb from fp16
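(rough arithmetic behind those numbers, as a sketch. it assumes sd3.5 large has about 8.1 billion parameters and uses the usual gguf block sizes of 8.5 bits per weight for q8_0 and 4.5 for q4_0. it counts model weights only, so text encoders, vae and activations come on top.)

```python
# Assumed parameter count for SD3.5 Large (about 8.1 billion).
PARAMS_SD35_LARGE = 8.1e9

# Effective bits per weight. The GGUF formats store one fp16 scale per
# block of 32 weights, hence the extra 0.5 bit on top of 8 and 4.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def weight_size_gb(num_params, fmt):
    """Approximate size of the model weights in gigabytes (1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: {weight_size_gb(PARAMS_SD35_LARGE, fmt):.1f} GB")
```

under those assumptions fp16 lands a little over 16 gb, q8_0 under 9 gb (hence a 10gb card) and q4_0 around the 4.5gb file size mentioned above.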
don't know if forge has support for it yet or not
how do you like the image tone and contrast for this turbo 3.5 image
looks good
this node affects image tone a lot... the default value is 3.00 but i set it to 1 and i get

this is value 3.00
olivio also put a video on it, but he was talking about flux
flux and sd3.5 both have it
for flux you have modelsamplingFlux with more attributes
are there drawbacks?
3 is default in comfy too, without node
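(a sketch of what that value plausibly does, assuming the node is comfyui's ModelSamplingSD3 and that its shift follows the timestep-shift formula from the sd3 paper, sigma' = shift * sigma / (1 + (shift - 1) * sigma). the function names are made up for illustration.)

```python
def shift_sigma(sigma, shift=3.0):
    """Remap a noise level in [0, 1] with the SD3-style shift.

    shift=1 leaves the schedule untouched; larger values keep sigma higher
    for longer, so more of the steps are spent at high noise levels.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps, shift=3.0):
    """Uniform sigmas from 1 down to 0, remapped by shift_sigma."""
    return [shift_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    for value in (1.0, 2.0, 3.0):
        print(value, [round(s, 3) for s in shifted_schedule(4, value)])
```

with shift 1 the midpoint of the schedule stays at 0.5, with shift 3 it moves up to 0.75, which fits the tone and contrast changing as the value is lowered.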
ah ok, im gonna look into it in a bit
the sampler in sd35 i'm using is dpmpp_2m with the sgm_uniform scheduler
with which scheduler?
try it with beta
beta with euler is ok but not with dpmpp
that's kinda what I meant. that image i just posted is euler. it's not washed out
but overall it looks to me like dpmpp is much better than euler
i wasn't using sgm though
yeah i've been using euler all along up until now
after i discovered modelsampling node i can control image tone much better
once this next week is over, i'm going to do a complete chart on all the sampler and scheduler combos for at least 3 different types of prompts
just haven't had the chance to do that yet
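(one way such a chart could be enumerated, as a sketch only. the sampler and scheduler names are the ones mentioned in this chat plus comfyui's `normal`; the three prompt types are hypothetical placeholders.)

```python
from itertools import product

SAMPLERS = ["euler", "dpmpp_2m"]
SCHEDULERS = ["normal", "sgm_uniform", "beta"]
# Hypothetical prompt categories, stand-ins for the "3 different types of prompts".
PROMPT_TYPES = ["portrait", "landscape", "text sign"]


def build_grid(samplers, schedulers, prompts):
    """Return one job description per sampler/scheduler/prompt combination."""
    return [
        {"sampler": s, "scheduler": sch, "prompt": p}
        for s, sch, p in product(samplers, schedulers, prompts)
    ]


if __name__ == "__main__":
    grid = build_grid(SAMPLERS, SCHEDULERS, PROMPT_TYPES)
    print(len(grid), "images to render")
    for job in grid[:3]:
        print(job)
```

keeping seed, steps and cfg fixed across all jobs keeps the comparison a baseline one.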
im sure you will love dpmpp_2m with sgm_uniform the best along with different values for modelsampling node
oh i've used them, just haven't had the time to do the chart correctly and put it in pdf form to publish
remember i do everything very basic though - i'm trying to show baseline
value 2.00
just look at this tone
i love turbo even better now with modelsampling node
i like this lora https://huggingface.co/prithivMLmods/SD3.5-Large-Photorealistic-LoRA
there's a merge out now as well as a fine tune
merge for what
