#🆕|sd3
and there's this https://huggingface.co/argmaxinc/mlx-stable-diffusion-3.5-large
that's a finetune
is it me or the turbo 4 steps looks crispier than merge?
i'm biased
good effort 🙂
yes
btw the higher the modelsampling node value is, the smoother the image looks .. high values would even make the image look rather too smooth
i will probably never use that node, actually
you should
it gives you fine tuned artistry
and value would range from 1-5
i should check how it looks below 1
;) i can get that out of base with just the right prompting
and adjusting shift and cfg if needed
0.6
ostrich egg
i think its safe to say value should be between 1-4
deis and beta for sampler/scheduler
here's a prompt to play with: photo of <SUBJECT> riding a majestic eagle through the skies above a vast forest, with the wind rushing past him and the trees stretching out far below. He wears a leather aviator jacket with a fur collar, goggles strapped over his eyes, thick gloves to protect against the cold air, and a scarf that flaps dramatically in the wind. His expression is one of exhilaration and freedom, capturing the thrill and majesty of flight.
didn't touch it yet
btw, are you using sd35lturbo with cfg 1 or 2?
turbo with cfg 1
that's schnell
trying out the modelsampling
SD3.5 large, steps: 16, cfg: 3, shift: 2, sampler: deis, scheduler: beta
with the photorealistic lora i posted the link to earlier
so what im seeing from the image i render, trying to control character's pose via prompt is not always great
That's the reason for open pose. IPAdapter with ControlNet for 3.5 is being worked on.
you know that wasn't an issue with sd1.5
or sdxl even
Sure it is. That's why IPAdapter and open pose were created in the first place
if i leave body poses completely up to the model it can create decent images like this one, but any indication to instruct it for a specific pose can cause anomalies
Im on my phone, so this'll be slow, but prompt? Settings?
a woman in casual white shirt and shorts.
default settings on turbo
if you add a woman in casual white shirt and shorts, relaxing indoors you'll start to see some strange outputs
What, specifically, does "relaxing indoors" mean?
what do you mean
she is chilling in her home
its just broad specification about indoors - outdoors
You are talking to an artist that has lived all its life in a box, can't read your mind, and only knows about the world through images it was shown. That's a generic phrase that could mean a lot of things, and is confusing. Be more descriptive
Or it'll guess, and you won't like the results
think of it a broad generalization to let the Ai run its possible outputs
Re-read what i said. Thats exactly why you are getting results you dont like
Its a computer, not a human
you cant strictly specify every tiny detail
that would be too rigid for ai to run its scopes
and not a good approach for artistic styles
for example you can specify a blonde woman in red dress or a woman in red dress and the Ai would have the flexibility to give you diverse hair colors
You could, but you don't need to. However, you will get strange results if your prompt is generic. Use open pose if you want to be generic and use reference pose images
this is not too odd to ask a woman in casual white shirt and shorts, relaxing in her livingroom is it?
but what i found out is that any indication of body poses can render strange result
Prompt: a blond woman leaning against a tree trunk, one hand brushing hair from her eyes
because standing is easy to generate but other poses need more training or aligning because they are rarer in training data
Tell me what you think relaxing in her living room looks like. Remember, I'm a machine without any idea of what that is, as I've never done that
yeah im not complaining about it per se, im just pointing out that the common issue with poses is still prevalent
Because the prompt is too general and generic
that's the thing about prompt understanding, it could have been better trained
You did not specify a pose, though, that's the point
relaxing hint towards wide arrays of body pose
Use a good prompt, itll have no problem
even with specifying "lying on the bed" anatomy is bad
lol i know but this is not a complaint
we need more tuning and a bit overfitting
No it does not. Not even to a lot of humans. Especially not to a computer
this is my expectation for something better
No, we just need to use clear, specific prompts
You won't get what you expect, if you do not clearly communicate what you want
you can't fixate language in specific order, thats not organic, also why LLM are actively worked on to improve human interactions
can you show example of such prompt for "lying on the bed"?
its more about understanding complexity rather than making it complicated
Even a human artist would have a hard time figuring out what you were actually after with that phrase, and humans have an imagination. Computers do not
no he won't .. its a broad artistic freedom
i would be ok with any pose or any styles, but the deformity
Sorry, but please understand that you are actually programming that machine with your prompt. Make it clear, specific, and concise
thats where you have to push the limit if you want to expect a functioning AGI
Computers dont function that way
you are avoiding the objective of Ai
what i think of Ai is far more potent than human collective brain
that is expected, when i dunno so im just pointing things along that line
That isn't what AGI is, and your favorite image gen is by no stretch of the imagination AGI
This is about getting a correct result out of a tool
i admire your technical understanding but sometimes you push to defend flaws, why?
if you are saying this is the limits of sd3.5 i acknowledge that fine, but thats not good enough, and that's what im suggesting .. something better maybe something to work on
AGI = artificial general intelligence, i.e. an AI that can do stuff it wasn't specifically trained for. Not an image generator specifically trained to create images.
You're not going to magically make AGI out of stable diffusion by being unclear with your prompts
technically AGI would involve all modality in humanoid form but Ai can be digital depiction of our mind interactions
Nope
thats cause we haven't reached a phase to call any AI module developed enough to function on it's own
AGI is the endgame of AI and is more an ethical and philosophical construct than an actual usecase.
please dont paste any wiki texts
humans need to work on their social system and ethics by principles before they can dictate Ai which would have far greater intelligence capacity than collective human brain
don't get me wrong, I like sd35l and see potential, but it has obvious anatomy problems which was not so prominent for sdxl, relatively to it's size
people are killing each other over religion, economy, and politics, please dont train AI with all that nonsense ethics
Yeah. Agi will literally make the human race irrelevant in terms of cognitive progress.
Why would i paste wiki texts?
just saying
No it wont
it would
why are you saying it wont?
human organic brains dont have the bandwidth to match Ai
That’s the whole point of a fully developed AGI and also the point why it’s unlikely to ever fully happen 🙂
But we will get close. And that’s a problem for many.
Specially in a hyper capitalist world.
how Ai will evolve past human reach is that Ai will reach a point of functioning on it's own well enough to train itself
A cat is NGI = natural general intelligence. AGI is just an artificial cat. A baby is NGI. AGI is just an AI that can do stuff it wasn't specifically trained to do. Not "super smart", it could very easily be very dumb
you are hard focused on the limitations, which i understand you have to take into account but you are not doing that to push past those limitations, you seem to settle with those limitations
the problem with the agi vs human intelligence debate is the part about human intelligence
Think of the matrix. Best example for AGI.
Learn to prompt correctly, or use tools like open pose
That is for sure debatable 😂
i hope you live well and i hope to pick up this convo with you in 2030
An agi will for you 😂
i wanna be a bot when i grow up
In times when the internet became obsolete
Aahahahaa this
Guys any useful controlnets for sd3.5? For depth, canny edge, normals?
it's been like 3.5 days
Some people need to stop listening to the media and do some research
Its being developed
Lately things move fast
i totally dont 🙂 but i do read some articles that put forth some interesting concept on possibilities
Cool! You got any leads?
It has to train
and i would chime in to say avoid hollywood nonsense
any cnet that drops 3 days after the base model is bound to be trash that someone rushed out just to be "first" and get PR for their startup etc
Yes. And no i cant tell you more than that
Yes you should
A business partner studies ai ethics in Cambridge about this topic, they do some interesting research about agi and human adaptation towards this topic. We are already at a point now where the possibilities of ai need 10 years of human development to unlock all tools we can gain from them to boost our performance. So anything from now on is already future tech.

yeah .. we are already in it.. and things are moving fast
but we dont have any AI module that can function independent of human inputs yet
And obsolete by the time it hits the media
Agents are the first move towards this. Just needs the learning curves and adaptability to think rather intuitive by combining streams.
It’s all in the beginning of it. ChatGPT remembering complex conversations, Claude using human user interface, agents for crm.
We do too. Yohe has created several
there are also guardrails in place that prevents Ai to learn things on its own exponential pace
Nonsense
it'll happen one way or another eventually
multicellular life never gave up
AGI has billions of years left before our planet is toast
Sounds like the Blackwall from cyberpunk phantom liberty.
no we dont have Ai that can function w/o human inputs
if you mean turning the switch to let Ai do its things thats not what i mean
we dont have systemic grid to allow for ai to function on its own
its kinda like conceptualizing bullet trains on a muddy farm land
a blonde woman sitting on the wooden dock, dangling her feet in the fish pond. She is holding a rose in both hands
i have had sexier pics with sd1.5 with those kinda prompt
Yeah, we do. Yohe has created several autonomous agents
Ok?
and even if we didn't have it yet... it would happen eventually
such as what.. im curious
AGI has billions of years left to evolve
You can go look him up and research
only way it doesn't is we nuke ourselves
Would you guys say with the last 3 days that you prefer the outcome of sd3.5 over flux1 from a lookdev & aesthetics point of view?
the default aesthetic yes
Nano-light years?
i'm really not a fan of the default aesthetic of flux
washed out
it's DPO'd to hell
Greatly prefer 3.5
Always looks like high cfg
Okay nice
much more diverse outputs with 3.5
The license model for enterprise with stabilityai is also a lot easier to handle.
AI is gonna make new medications to "help" us
That’s great to hear from all of you. Thanks.
and it's a real base so we'll see real finetunes
You dont want a 12 billion parameter lora?
i've tried finetuning flux, it's a nightmare tbh
Actually funny
Theres already one on huggingface
just for clarity the possible outcomes of an Ai that is developed enough to function w/o human inputs would also be surpassing the collective human intelligence, and for that to happen you will need entirely different system grid but if you are talking about something like Ai beating humans in a game that's not what i mean, those are limited areas
Thats what flux is
Wrong
im beginning to think your definition of Ai is quite different from mine
Amnesty international isn’t it
it's just got a better sense of style
That’s clean
a lot of issues with coherence can be cleaned up easily with SDE sampling too
have you looked into modelsampling node, that can influence image tones a lot
You realize hes a programmer?
sure, but im asking if he looked into it
ya lol im giggling, if he only knew what his workflows and nodes were like
well that wasn't intended as you are making it look like
i just thought it was funny, that guy is a pretty advanced user by far, i know you didnt know
i wanted to know your definition of Ai and you posted that image, that doesn't explain much to me
got my own version of it that allows you to pick between SD35 timestep scaling and flux timestep scaling
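For anyone wondering what the two scalings look like, here is a minimal sketch, assuming the commonly cited formulas: the SD3-style shift `shift * sigma / (1 + (shift - 1) * sigma)` and the Flux-style shift `exp(mu) / (exp(mu) + (1/t - 1) ** power)`. The names `sd3_shift` and `flux_shift` are made up for illustration, and this is not the code from the node mentioned above.

```python
import math


def sd3_shift(sigma: float, shift: float) -> float:
    """SD3-style timestep shift (assumed form): larger shift pushes sigma toward 1."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def flux_shift(t: float, mu: float, power: float = 1.0) -> float:
    """Flux-style timestep shift (assumed form), parameterized by mu instead of shift."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0) ** power)


# Compare the two on a few noise levels, using shift = 2 and mu = ln(2).
for sigma in (0.25, 0.5, 0.75):
    a = sd3_shift(sigma, 2.0)
    b = flux_shift(sigma, math.log(2.0))
    print(f"sigma={sigma:.2f}  sd3={a:.4f}  flux={b:.4f}")
```

With `power = 1` the Flux form reduces to the SD3 form with `shift = exp(mu)`, so in this simplified picture the two differ mainly in how the value is parameterized. A shift of 1 leaves the schedule unchanged.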
nice
I told you to go look up yohe
i dont have that node yet
threw it up on my res4lyf repo tonight
ok!
We're top of the leader board, or at least we were earlier, on huggingface
what is on top?
sd35L?
awesome
glad there's as much interest as there is
SAI deserves a lot of credit for this one imo
Yes.
2b was a total lemon but this is pretty great
Definitely
it was fun training/sampling my way to convincing photos with flux, but just beyond that and the prompt adherence, just not a fan of the aesthetics
this however... loving it
Loras, a fine tune, a merge... already
Nero's got something cooking on onetrainer too that will allow finetuning SD35L on a 24gb card without the big slowdowns of block swaps
Really happy with the community response
don't know anything beyond that it's apparently going to happen soon-ish
Thatll be cool
for sure, gets some multigpu support going with something like that... get 8x4090s and you're in fn business
Matteo's got stuff cooking too
You working on anything special?
sd35l has the best look compared to any other models in my opinion
Agreed
all we need is additional anatomy tuning and it is a dream model
at a min i'll get a pair and use one for the conditioning and one for the unconditioning to double CFG speed lol
i'm sure ppl will like that... oh, yeah, just grabbed a spare 5090 here to handle my negative prompts
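The reason a second card could help here: with classifier-free guidance, each step needs two model evaluations (conditional and unconditional) that do not depend on each other, and they are only mixed afterwards. A minimal sketch of the mixing step, assuming the standard CFG formula and plain Python lists standing in for tensors (`cfg_combine` is a hypothetical helper name):

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance mix: uncond + scale * (cond - uncond).

    cond and uncond are the two independent model outputs for the same step.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy values standing in for the two predictions at one step.
cond_pred = [1.0, 2.0]
uncond_pred = [0.0, 1.0]

print(cfg_combine(cond_pred, uncond_pred, 3.0))  # guidance scale 3
print(cfg_combine(cond_pred, uncond_pred, 1.0))  # scale 1 returns cond unchanged
```

At a scale of 1 the unconditional term cancels out, which is why a cfg 1 setup does not need the second pass at all.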
Theres a really nice photorealistic lora on that huggingface link for that now
what i'd like to do then is get some training code going that shuttles the latents back and forth between staggered blocks of the model loaded alternately on different gpus
theoretically i think you could do that and suffer almost no performance hit
just run a batch size equal to the number of cards
Assuming you have more than one gpu
yeah, that's the plan
just not crazy enough to load up on cards right before the next gen drops
i've got a 25 amp circuit in the basement doing nothing
perfectly positioned to cut a hole in the wall and connect into the central air to exhaust the heat to heat the house in the winter
Made you a new avatar @dusky thistle
that is cool but I really want to train something myself, waiting for tools and methods to mature a bit
Yeah, hasnt even been a week yet
i been told that at nf4 and 512, lora can be trained with under 8gb vram already
3.5L Turbo LLM (Meta's Llama3.2 - about 30% slower than Llama3.1)
Don't know yet if it's better than 3.1?
3.2 is the same model but with vision?
How is 3B slower than Llama 3.1 8b?
they have quantized version of 3.2 which should be faster
On my PC, Llama3.1 is 37 seconds/iteration. Llama3.2 is 49 seconds/iteration.
In 5 step Turbo
Send me link? 🙂
Are you using quants?, since llama 3.1 q_8 on my computer is 80t/s
I am using Ollama.com Q4_K_M - so it is quantized
What is your computer gpu+cpu?
ok, but just wondering why did you choose q4 when you could use q8?
At Ollama.com I don't see a Q8
there is, if you expand the list
nice, and i was thinking maybe q4 had some advantage that i could replace q8 with and save up space and memory
Good - 25% faster!!! 45 seconds/iteration instead of 60
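A quick check of the percentages quoted in this thread, as a small sketch with hypothetical helper names. Note that "25% less time per iteration" and "25% more iterations per unit time" are not the same number.

```python
def time_saved_pct(old_s: float, new_s: float) -> float:
    """Percent reduction in seconds per iteration going from old_s to new_s."""
    return (old_s - new_s) / old_s * 100.0


def throughput_gain_pct(old_s: float, new_s: float) -> float:
    """Percent increase in iterations per unit time going from old_s to new_s."""
    return (old_s / new_s - 1.0) * 100.0


def extra_time_pct(base_s: float, other_s: float) -> float:
    """Percent more time per iteration that other_s takes relative to base_s."""
    return (other_s - base_s) / base_s * 100.0


# 60 s/it -> 45 s/it (the Q4 to Q8 switch reported above)
print(f"time saved: {time_saved_pct(60, 45):.1f}%")
print(f"throughput gain: {throughput_gain_pct(60, 45):.1f}%")

# 37 s/it (Llama3.1) vs 49 s/it (Llama3.2), quoted as about 30% slower
print(f"extra time: {extra_time_pct(37, 49):.1f}%")
```

So 45 instead of 60 seconds is 25% less time per iteration, or roughly a third more iterations in the same time.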
awesome, i was using the 1b q8 model and i noticed some speed increase too
3.5L Turbo LLM (Llama3.2:3b_instruct_Q8_0)
will we be able to make turbo model from finetuned one's in the future?
There is no way to get this running locally other than Pinokio. Comfy doesn't support it. The diffusers version is broken. Local install fails. Only Pinokio works (at least as of yesterday).
Omnigen altering a person's age much younger:
Slightly older:
Much older:
Omnigen can transform the material composition of an object. (I had to read the original paper in order to figure out how to prompt the model. I tried many prompts that failed before understanding how to do this.)
It understands material deformation physics (it can conform a generated material to geometry that it infers from the input).
(I am using a guidance scale of 3 and 20 inference steps. This model might give more photorealistic results with other settings. I will test that later, but you can always refine these images with Flux.)
looks good
It can integrate object / person inputs.
This is actually the most impressive use of inputs I have ever seen out of an AI model. This blows IP Adapter out of the water. Also, check out that perfect hand and grip??!!
All that said, it did not follow the prompt perfectly, because she is not taking a drink.
I will try rerolling it.
Results just as good, but still not drinking. I will try rewording the prompt.
So far, Omnigen seems great at extracting features from a person, but if you want it to extract features from an object, you need to tell it.
These results are shockingly more impressive than before. No joke, this thing understands physics and solid objects WAY better than Flux.
Still no drinking though. Moving on.
hands are right yeah
And in all honesty this photo quality is top-tier. Not necessarily the textures or aesthetics (although I haven't experimented with the settings yet), but just the solidity and anatomical details. (The eyes are messed up though.)
I don't think I've seen any model get hands of this quality, and it does this effortlessly. (Even though that's not its main selling point.)
could you try a few sci fi or fantasy prompts like a dragon or a spaceship
not sure if it only knows photos of people
Added to list.
First try. The hands are definitely not perfect this time, but better than Flux. And again, this is better than Loras and IP adapter.
I mean seriously... Those are the same people. Reposed and relit following my prompt. Can we do manga now?
all that done by 3b model???

can it generate something with that spongebob?
This is OmniGen, which is a 3.8B parameter model, yes.
I'm not sure what you mean by heavy. Right now, the only way to run it is Pinokio, and it is completely unoptimized. If Comfy supports it, this will be blazingly fast.
Effortless style from text.
and we need quants
Yes, FP8 should take up less than 4GB VRAM.
wait, then I can run full precision
the model in repo is 16gb, is it in fp32?
Full 32-bit precision is 15.2GB, but the Pinokio code is putting a bunch of encoders and context into the VRAM.
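The sizes quoted here follow from parameter count times bytes per parameter. A minimal sketch, assuming 3.8B parameters and counting weights only (no encoders, activations, or context, which is what the Pinokio setup reportedly adds on top); `weight_gb` is a hypothetical helper:

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 3.8e9  # parameter count stated in the chat for OmniGen
for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {weight_gb(n_params, dtype):.1f} GB")
```

That matches the 15.2GB figure for 32-bit and puts FP8 at about 3.8GB, consistent with the "less than 4GB" estimate above.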
can it completely change form, for example, transforming into voxel style?
Added to testing list.
These generation speeds are so inconsistent...
Failed extreme watermark removal (but a small one works fine).
@turbid grotto Yes, It can infer geometry and then simplify forms. (I did not expect this.)
It can handle extreme relighting effortlessly.
try something very weird
that doesnt even make sense
I am happy to add things to my test list, but your recommendation is a little too vague.
This model is definitely not magic. It takes careful prompting to get what you want.
Hmm
First thing that came to mind is to make them metal sheets with glows of angel halos
If I don't understand the prompt, I can't know if it failed though? (What does it mean to make them metal sheets.) And it fails very easily.
For that relighting test, first I tried "change the background to an underground volcano", and it just copy-pasted them as if it had cut them out in photoshop, without changing the lighting at all. You could even see the cut-out lines around the guy's hair.
Minimalist logo redrawing.
Did you try ?
No, it's very slow to generate one image on my PC. But I'm 100% sure it could change their texture to metal and add halos. It has already passed similar tests. (You can scroll up to see what I've done so far.)

It lacks aesthetic quality, but it can generate a fantasy subject.
Urgh. Here's why I'm definitely not going to try random nonsense. This model fails so easily. Zero change here. So much wasted time. (It doesn't fail randomly, it's just very sensitive to how you word the prompt.)
is it a very slow model to run or is your pc like mine 
Just d/loading Pinokio (again!) OmniGen looks fantastic. It can take two photographs with a prompt like "take the middle character from photo1, and the character on the left in photo2 and make a new photo with them together!"
I do not think it can repose a dragon. I suspect it could for an android. I will try with a specific cat next.
Yes, it actually aced a similar test like you won't believe. Better than loras.
It can reposition a cat while preserving features not specified in the prompt, but it is less accurate than with people.
this one is amazing
didn't even realise that would be possible this year
Same here. I still can't believe it. And that was a first try. No cherry picking.
The quality is its biggest weakness. I will experiment with settings next.
it is more low poly than voxels but very good
Passes across the board...
Yes, that's what I prompted for. I expected it to fail, which is why I threw so many terms in there.
understand, thanks!
An amateur, realistic photo of a woman taking a selfie on the porch of her house. She has freckles, and is smiling slightly. Her hair is a little messy.
20 steps, locked seed (123456789), Guidance at: 1, 3, 7, 10
For photorealism, guidance increases contrast, clarity, quantity of small-feature details, and prompt adherence.
No combination of guidance and steps seems to produce good anime results. This model would need fine-tuning for anime.
It can do ethnicity changes fine. (Although the model will prefer asian faces by default.)
On my PC, without optimization, 2048*2048 looks like it might take over 15 minutes per generation. (But I killed the generation after 1 step, so I don't know.)
(SD3.5L TURBO Q8)
OmniGen's current code does not support 8-bit floats or quants. Someone much more knowledgeable than me would have to modify it.
Gotta go. I will try leaving a 2048*2048 generation running, at least to see what kind of quality it comes up with.
Still d/loading 15.5 Gb checkpoints
Nevermind it only took 3 minutes. There are definitely artifacts here, although different from the normal diffuser artifacts I'm used to seeing. 2048x2048. In that case, I will try a 4096 if it will let me? Hmm... Nah, no point. It already can't do 2048.
I will try telling it to upscale.
can it "fix hands"?
probably too specific
my little test of first sd35l finetune!
comfy released the version of 3.5 large with the encoders built-in ... does anyone know if they did the same with Turbo?
sorry, I don't know, but what's the point of it?
takes more disk space, but keeps things a little cleaner and simpler to use.
oh yea, one node for all things
Yeah. So trying to see if there's similar for Turbo. Doesn't look like they released one.
Opened a ticket with pinokio as OmniGen errors out each time
another comparison\test
Aw, too bad. Is there a Comfy issue requesting support yet?
Pinokio is the only working OmniGen - I updated SD Next - but apart from having a snazzy new GUI - its fallen short of supporting OmniGen. I'm sure Comfy.org are on the OmniGen trail as we speak ... 😄
When you start pinokio - does it d/load 10 new files from Cuda each time (like mine does!)? 🙃
yoooo
up to your imagination if hes skydiving without gear or if hes a giant
impressionist oil painting by Claude Monet of A man in a forest whilst sitting by a tree. The man is shirtless and is barefeet. He looks frail and cold. There is a lot of mud and in front of the man there is a pond. The weather is overcast and the pond has ripples on it. The scene is dramatic and depressing. The man is looking down in sadness.
@cunning lintel Flux - Pixelwave finetuned model
the arm and knee is melting together though but aside that this one is nice
I wonder if I just got a lucky seed haha
⚠️ close to nudity but doesn't expose anything
16 steps with dpm++2m and beta scheduler
huh...
also there is a fix for euler_ancestral that makes it work with rectified flow models
SD3.5L Turbo LLM (Llama3.2)
you're not supposed to ignite the tip of it!
but this isn't very painterly like, the guy has perfect hands or it isn't hidden in his pocket 
prompt is kinda dumb but somehow mochi made a nice video still(not mine)
prompt: make an ai video about me as a future marine engineering and talking about my success
yeah wow this is nice
for an offline model, man
did it take like more than 10 minutes
I heard it took 30 minutes on a 4090 for someone
Yeah I can't wait 1hr for one vid lol(my gpu isn't that great), I got it from the genmo website which uses the same open model + an upscaler. It only takes like 3-4 min there and they also show the video each step.
if it were to take 5-6 minutes on a 3090 I'd generate some vids myself, especially if there was vid2vid
looking through a bunch of overnight generations right now
huge hats off to the folks at SAI
so much more diversity of style than flux
can't wait to train this thing... gonna start this weekend
and the watercolors are really great
Yep same, either I use fal (which also uses the same model but fewer steps and no upscale, but takes 1 min) or their official website which uses more steps but takes 3-4 min. There are a lot of optimizations left and the employees also said vid2vid, img2vid should come soon.
right now I think with lots of experimental optimizations that Kijai implemented, it takes like 5 min on a 4090.
I want vid2vid to make video games look like live action, like those GEN-3 videos, but open source
cogvideo would have been fine but I don't know how good the comfyui plugin implementation is
CogVideo is actually pretty great too, mochi is clearly much better in text-to video but CogVideo supports so much more(trajectories, img2vid, vid2vid, and more)
im in need of some assistance, all of my images come out blurry regardless of what i do
Not enough cfg or not enough steps?
cfg is 7 and steps i use between 20 and 40
turn that cfg down to 3.5 or 4 and steps to around 32. and if you're using any flavor of sd3, set your shift to 2
@winged seal list of loras already on huggingface https://huggingface.co/models?other=base_model:adapter:stabilityai/stable-diffusion-3.5-large and one finetune so far under the finetunes tab
I wonder how much work training out the dream shaper aesthetic will be in 3.5
I haven't had the time to get AIT fully set up, and I'm not even sure what dataset to use to try and do that 😅
Oh wow, you pinged me right on time lmao
i saw you typing and knew you'd get a chance to see it before it got buried
Nice lol
I do have goals to try it soon, just been dealing with more emotional/personal problems
it's a dynamic list, just bookmark the URL
@craggy crest and by the way, my boss went through the chat and didn't find me saying anything bad. I just hope we don't try and pull him back into things in the future 😅
anything we can do to help?
Oh no no no, it's not anything people here are responsible for
he's on my friends list. don't act out, i won't have to tell him to read the chat
i'd prefer you and I were friends and could work together, quite frankly
What I'm saying is he agrees I didn't do anything except provide insight as a well seasoned trainer, and relay the information of my research partners and their findings. There wasn't really any act out
I just don't want any of my future results to be lumped in with the accusation that I am financially incentivized for flux to do better than SD3.5, cause I'm not by any means 😅
he and i talked.... we'll leave it at that.
The insinuation that my boss would lie to me is not exactly appreciated, but that's not that important. Anyways
not insinuating anything, just letting you know i already talked to him. let's consider the incident closed and move forward?
I like the aesthetic of the first two. Now my general wonder is how does one fix the really plastic/fake same face and still allow for that aesthetic/style underlying? That's gonna be a real challenge
Do you have an example where it's fixed?
I'd love to see one, if anybody does, cause that's a pretty huge hold up on 3.5 for me
How do you mean fixed? Same face/different surroundings?
on that lora's page, there's a really nice photorealistic lora that works well
Ah alright, thanks. I don't currently have it running at this moment but I'll save them for later
That's not what I was asking for or wanting tho
i know, but you might try it as it helps with what you were asking about
Photorealistic implies it's a baked in style, no?
Same face/plastic face specifically. The goal is to still have the style of the image (like the overall aesthetic) but with faces that look less synthetic and manufactured
refine with SD 1.5 is what I do :shrug:
Does that fix the facial structure tho?
if you use a noisy enough sampler yeah
Fixing facial structures is easy, like with flux, but that always inherently changes the style to be more photographic
very minute changes in sigma matter a lot for img-to-img
sometimes you need to start on a very specific sigma
Ok, so it looks like 3.5 doesn't like multi res images, that's a monumental bummer
sorry what's a multi res image?
Training on multiple resolutions of the same image or a dataset of multiple resolution groups
i used it with 3d pixar style prompts, and it had a nice effect
ddim/ddim_uniform don't work!!!
oh I see you mean training
with SD3.5L
For example, I train flux on 256, 512, 768, and 1024, which makes the results much more robust and take a lot less time
yeah most models in the initial training phase spend most of their time at 256, this is best technique currently probably
Compared to just 1024x, it takes about 35% less time, and gets less stuck in the same composition
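A rough way to see where the saving comes from, as a sketch under loud assumptions: training steps are split equally across the four resolutions, and per-step cost scales linearly with pixel count. Neither is confirmed in the chat, and real cost also includes fixed overheads and attention that scales worse than linearly. `relative_pixel_cost` is a hypothetical helper.

```python
def relative_pixel_cost(resolutions, base=1024):
    """Average per-step pixel count relative to training only at base x base.

    Assumes square images and an equal share of steps per resolution.
    """
    return sum((r / base) ** 2 for r in resolutions) / len(resolutions)


buckets = [256, 512, 768, 1024]
cost = relative_pixel_cost(buckets)
print(f"relative cost: {cost:.3f}")
print(f"upper-bound saving under these assumptions: {(1 - cost) * 100:.0f}%")
```

This idealized estimate comes out larger than the roughly 35% reported above, which is what you would expect once overheads and an unequal split are taken into account.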
But it seems as though SD3.5L is too stable in its current state to take proper advantage of multi res images. Maybe later once it's patched up it will be more viable
@bitter hearth hey there, did you ever successfully get flux dev fp16 to tensor on rented hardware and get it back to a local 4090 machine where it worked? I got it working with that flux-lite-alpha thing because it was 16 gigs, but the visual difference is rather significant.
no I forgot to do that, sorry
I would have to look up how to save the tensorrt model cos I don't actually know that
ddim/ddim_uniform worked the second time in 3.5L Turbo
oh the stock nodes have save load, its easy
I've been using torch.compile a lot lately instead personally
3rd output ddim/ddim_uniform - top-half is poor
would not recommend ddim
I'm learnin' 🥳
For anybody using flux, euler makes flux look really bad too
Ipndm-v is ideal for flux in my findings. Much better colors, contrast, details and color temp
album cover or something
My Flux looks OK using Euler - so I will change up and see what gives
some people have got stochastic samplers working with Flux
I use Flux Super - mebbe 3 x KSampler compensates for Euler?
Are all loras for models older than 3 incompatible with 3.5?
3 ksamplers will help a lot yeah
cos each ksampler adds some noise
its a bit like noise injection
"I hate stochasm!!!" 
lol
I would assume to some extent, but I would recommend against it, just because both models are unstable in their own ways
issues that needed to be overcome in 3.0 will be overcome in 3.5, which means the fixes may over-express and give some very weird issues
@craggy crest did the last little push and I am trying my first run with 3.5 now
what kind of speed increase does that give you?
tensorRT is literally double speed.
Downloading the model
TensorRT speed increases are different from card to card, and model architecture
on a 4090 it's double
My 3060ti always got a good bit bigger of a boost than my 3090
Is there a TensorRT for 3.5L/3.5L Turbo?
it doesn't work yet. just outputs noise or black boxes.
I doubt it this early on
i've opened an issue on comfy's repo for tensor about it.
works with sd3m, just not the new sd3L
And I can only imagine how long the compiling will take 😅
I haven't benchmarked it, its significant but I am not sure if it matches or comes close to tensorRT
not that bad. takes about 12 minutes on other 16 gig models like auraflow / flux-lite-alpha which are the same size.
the latest pytorch added a speedup to torch.compile apparently
12 minutes per resolution per batch size 😅
Did they at least fix it to where adding LoRA's in doesn't mean you have to fully re-compile?
yeah, i'm always only doing 1 image at 1344x768, so it works out for me
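To put numbers on the compile cost being discussed, here is a rough sketch. The 12 minutes per engine and the 2x speedup are the figures quoted above; the 30 seconds per image is a made-up placeholder, and `engines_needed`, `total_compile_minutes` and `break_even_images` are hypothetical helper names, not part of TensorRT or ComfyUI.

```python
import math
from itertools import product


def engines_needed(resolutions, batch_sizes):
    """One engine per (resolution, batch size) pair, as described in the chat."""
    return list(product(resolutions, batch_sizes))


def total_compile_minutes(resolutions, batch_sizes, minutes_per_engine=12.0):
    """Total build time if every pair needs its own ~12 minute compile."""
    return len(engines_needed(resolutions, batch_sizes)) * minutes_per_engine


def break_even_images(compile_minutes, seconds_per_image, speedup=2.0):
    """Images needed before the time saved per image repays the compile time."""
    saved_per_image = seconds_per_image - seconds_per_image / speedup
    return math.ceil(compile_minutes * 60 / saved_per_image)


if __name__ == "__main__":
    # Single resolution, single batch size: the "1 image at 1344x768" case.
    minutes = total_compile_minutes([(1344, 768)], [1])
    # 30 s per image is an assumed baseline, not a measured number.
    print(minutes, break_even_images(minutes, seconds_per_image=30.0))
```

With one resolution and one batch size the cost stays at a single compile, which is why it works out in that case; every extra resolution or batch size adds another full build.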
That's fair then honestly
lora doesn't work still, have to merge those in if you want it THEN do the tensorRT treatment
I see the benefits
Yeah, that makes it basically useless for me then. That's why I never focused on it working before
if you don't use tensorRT, you could still use torch.compile: much faster compile time, but a smaller speed boost
I think its mostly good to use one or the other
well, at least with flux I've got 3-4 loras that I usually merge together in with the base model. I have to do that anyway because with full flux dev fp16, it won't fit with all those loras in 24 gigs anyway. so I have merge so it'll fit. so TensorRT'ing that thing for double speed would definitely be worth it for me.
Ok, so it looks like training SD3.5L is about the same speed as Flux. I was worried it would be slower
is there an example page for torch.compile?
this is all I saw https://github.com/comfyanonymous/ComfyUI/commit/d0b7ab88ba0f1cb4ab16e0425f5229e60c934536
I browse the new commits to find new stuff
which training tool are you using?
It should be noted, flux has a huge one up here, which is holding back SD3.5
If SD3.5 reaches a point where it can handle multi res inputs, it will likely train way faster
yeah cos you could drop down to 256
oh @noble coyote theres never really a good time to use ddim, i dont think its taken into any consideration by anyone when making models or UIs, its just a legacy thing tbqh
thanks. so you just do flux diffusion model loader, this torch compile model node, then to ksampler?
yeah exactly that
Yeah, if I could do 4x multi res, I'd expect SD3.5 to be maybe 2x as fast as flux to train
and try to get up to date CUDA to get the FP8 boost on Ada or Hopper
But I assume that will only be the case on consumer GPU's. I don't see that happening on workstation cards
in inference, SD 3.5 seems to not go beyond 1024 without funny squares appearing
not sure what is going on
am I wrong btw, re:ddim? I cannot think off my head of a single use for it demonstrated where its the best tool in the box
you're right, I don't recommend ddim
That's a shame... Flux was an astronomical leap forward with high res generations. I think that will be SD3.5's biggest weakness for sure
I'm quite excited about gguf versions of 3.5
in the I-Max paper they trained a 2048 checkpoint of Lumina which looks amazing but they did not release it
hopefully they will soon
just gguf being used in text2img in general
I think a lot of people down play flux's native capability to do more than 2048x resolution when SDXL and SD3.5 can't even do 1280x
I believe it. It does phenomenal 1440 and 1440 Ultra wide gens
but the Lumina checkpoint can do 8k
well, 1 megapixel is still ideal for flux otherwise you start getting stripes at random. that said I've got a really nice and simple upscaler for flux which makes those 1 mp images look great. lots of detail etc.
I have tested this claim, and I have found that if you're able to put any LoRA that's even moderately well trained on top of it, it completely fixes that issue
And, if you have any LoRAs that make the issue significantly worse, they are likely either improperly trained, or trained in Kohya.
Since moving to AIT, basically all of my trainings have greatly improved flux's ultra high res generations
Like here
apparently Flux works with DiffuseHigh, that might also take it to 4k, by a different method to I-Max
Native 3440x1440 I got with a realism LoRA I made in about 2 hours
whoah
No upscale, no second pass, just raw diffusion
so the answer is that 99% of lora are trained at 512/768/1024. To be able to render at higher resolutions and not get the stripes, they have to be trained at 1344 and higher. The problem, is that this is where 24 gig cards hit their limit. To do that you'd need to rent something like on runpod etc.
native 3440x1440 flux with one ksampler?
without using library like I-max or DiffuseHigh?
that's amazing
Where did I say that? I train at 1024 and below as well. The issue is just that Kohya specifically does not train properly
Here's another
this image is so amazing on 4k monitor
In fact, the one that I trained that allowed me to do this high resolution generating was actually trained at 960x
Absolutely not. I refuse to use it because of the creator
hmm... looks like i need triton installed for this torch compile thing.
oh yeah its linux only sorry
There is a triton for windows if am not mistaken
there was some very recent thing about triton in windows
maybe this https://old.reddit.com/r/StableDiffusion/comments/1g45n6n/triton_3_wheels_published_for_windows_and_working/
not sure I don't really do windows
for this stuff
@bitter hearth not sure if you saw my response, but I am vehemently against simple tuner and everything to do with it. I will not use it, condone it, or support it in any way
ok thanks. looks like I'll need to spend some time getting all that together. I'll have to try that later tonight.
why is that? I'm not aware
I've been making a fine tuning script in Pytorch and one in Jax personally
I used to be very close friends with the creator of it for over a year. We were part of the terminus research group, where I and many other very talented individuals would compile together our knowledge and share it with one another.
The creator of simple tuner is an exceptionally horrible person, one of the worst I've ever met. Exploiting individuals, abusing individuals, fleecing nudes off of individuals under the guise of training data, making a heinous accusations about me and other people in front of our employers because he was jealous that we got positions he didn't, stealing training code, stealing data sets, you name it
I defended him for a long time, but after months and months of him abusing me and other people that I enjoyed in the scene, I was finally able to get away from him, at which point he took some of my methods, my data set sources, and the resources that various of my research partners spent months and months compiling together, and he turned them into a bastardized repackaged version that's not as good as any of our individual projects
ok thanks for letting me know, wasn't aware of things
Of course, I don't expect anybody to be majorly aware, it's stuff that is quite personal. However with that said, I'm always open to sharing my experiences, because I don't believe that he deserves any form of platform built off of all of the horrible things he's done and all of the information he's blatantly stolen
I'd go into more detail if I didn't feel like I'd get banned from the server for the sheer depravity of things he's done
One of which was confiding in me about personal problems and depraved fantasies that he had that he was trying to get better with, and I shared similar experiences with him about things that I wish to get better with, and then he weaponized those things that I told him, turned them into 100 times worse accusations than what they actually were, and purposely blasted claims of genuinely illegal and deplorable nature in front of two of my potential employers
But anyways, I don't want to trauma dump in here. All I will say is that I have dealt with a lot of truly terrible people in my life, and I think he might single-handedly sucker punch every other person I have interacted with out of the way in order to claim first place as the worst person I've ever interacted with. And that hurts to say, because I defended his actions for over a year before I realized I was a victim too
okay I wasn't aware of this, seems like there was a lot of history
:( ouch. sorry you had to go through that
I assure you this is like the 1% I can say lmao. He is the most deplorable person I have ever met. Luckily, he's such a jackass that his bad attitude and misdeeds keep getting him fired over and over again so people aren't stuck being around him
Maybe some day he will learn to be a less insufferable person, but the half a dozen job firings in the last year don't bode well
(I've just been to the Chip Shop!)
You seem talented enough to walk out - and immediately into 6 new jobs!
Nice! What's the secret to suddenly get paintings? Does it work for any scene
that's cheating :p
its a dreambooth-like finetune
no lora
if SD3.5L gets a finetune that improves paintings I might use that instead
I'll have to download it 🙂 looks good
except that SD3.5 sucks with anatomy but we'll see
also I haven't tested it too much
there are Claude Monet-isms when I prompt for it, like using random colours in places
Van Gogh as you just saw there
Caravaggio didn't work iirc
have you tried the photorealistic lora on the huggingface space?
have not
i always liked to use Peter Mork Monsted / jacob van Ruisdael
I might try it later
well I'm not sure if those will work
but I'll try
expect a 1% chance of it working
Running OmniGen through Pinokio's setup, I don't have Triton. Not needed.
Zdzislaw doesn't work for example
they work just fine with 3.5. it's flux you can't train, not 3.5
oh god peter mork monsted makes photoreal paintings
I wonder if regular Flux Dev with a low guidance would get you that already
painting by jacob van Ruisdael [... rest of the prompt] @cunning lintel
🤷‍♂️
I mean close enough
It's plenty close, esp compared to other models
I use ddim_uniform scheduling with euler_ancestral sampling (update comfyui if euler_ancestral is messing it up for you)
16 steps I think
for no lora this model is impressive
I update comfy much too much, i have it 🙂
SD3.5 Turbo ...
a lookalike between Elon Musk and Donald Trump with long hippy hair and 70s baggy outfit.
when people were using SDXL they had to run additional extensions to cross mix characters, now you can do that easily with sd3.5 vanilla model
pretty amazing stuff what you guys can create with just prompting, i think personally loras have turned me into a lazy prompter, sometimes I don't even cite a certain style and just add the lora to get that effect
loras are certainly helpful for very specific tasks with great results if trained properly, but i also enjoy playing with regular prompts to bring out ideas into images
raw prompts reflect on the model capabilities, so far 3.5 gives pretty good outputs
while chatbots like gemini and the likes censor any real life character depictions
all valid points
why does it look like steven tyler 😆
a lookalike cross between Elon Musk and Donald Trump wearing 60s hippy outfit.
didnt come out right this time... i think i need to edit the prompt
do a cross like Steven Tyler and Elon Musk let's see what it gives us
the glasses ruin it, it's more fun when the glasses dont cover the face to see a better idea of the cross. im not getting any elon from this latest take
maybe the Elon goatee?
same prompt using sd3.5L on my system
a lookalike cross between Elon Musk and Donald Trump wearing 60s hippy outfit.
sd3.5_large_fp8...aled | 🌱 2854914518 | 🦶 26 | 🦮 6.0 | 🎤 ddim | 🗓 10/26, 4:39 PM | ⏱️ 108s
maybe I should do more than 26 steps or lower the cfg?
30 steps is ideal for Large
your results look way better it almost looks overexposed for me
wow interesting cfg at 1 with turbo
and for realism aspect of it i utilize modelsampling node .. its very efficient within a value of 1-3
sd3.5_large_fp8...aled | 🌱 948269089 | 🦶 20 | 🦮 7.0 | 🎤 euler | 🗓 10/26, 4:41 PM | ⏱️ 47s
supposed cross between trump and elon, looks terrible
heck yes and within 10 seconds for each image
47 seconds for that image isn't bad at all for me, id say thats a new record actually
btw when you use euler you will get softer and smoother look
sd3.5_large_fp8...aled | 🌱 2413612102 | 🦶 29 | 🦮 5.0 | 🎤 euler | 🗓 10/26, 4:43 PM | ⏱️ 78s
i changed it to portrait, increased steps to 30, and reduced cfg to 5 as suggested and results indeed look much better now
but honestly i prefer dpmpp_2m and control sharpness with modelsampler 🙂
i don't like to mess with custom nodes I like sticking to classic workflows
i dont even mess with shift
not custom really
its part of the comfyui tools
no need to download anything special
yeah i know i'm just saying I don't like to go outside of the 'Default' workflow where it's just load checkpoint > clip text encode > ksampler
i guess it's a result of i don't actually use comfyui but rather I use it through my website and i don't have support for 'custom' or basically nodes outside of the classic setup
that looks like steven tyler attended a Tesla event
"steven tyler presenting on stage at a Tesla keynote, wearing a black shirt and pants. Elon Musk looks on from the side of the stage, pleased."
ahh ok, yeah i like to keep things simple and organized
snapshot of my general workflow
so where's this "modelsampler" node you were talking about in that screenshot?
i have ollama on the far left too, i switch between that and manual prompting often
on the loaders
that's what id consider the classic setup, i'd say you're using all the standard things I am too, nothing special, like you're not even using the node to expose the shift
its more coherent for me in that way, i could add other stuff but not when i dont need them for specific purpose
where's the modelsampler though?
you were talking about controlling sharpness with it
the light blue and pink loaders
where is that in your screenshot?
at the 2nd row in those loaders
oh i see, i missed that the first time, thats what I call the "shift node" so you are using it, so you're saying you like to use dpmpp_2m and mess with the shift to control sharpness?
yeah it has big influence on the image tone
sd3.5_large_fp8...aled | 🌱 3775773810 | 🦶 32 | 🦮 4.0 | 🎤 euler | 🗓 10/26, 4:49 PM | ⏱️ 137s
and you dont need to change the value too much, you can have big results between 1 - 5
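For anyone wondering why such a small range on the model sampling (shift) node changes the image so much, here is a sketch of the timestep shift as I understand it from the SD3 paper, which is what the node appears to apply. The uniform base schedule is an assumption for illustration (real schedulers like beta or ddim_uniform space the steps differently), and `shift_sigma` / `shifted_schedule` are hypothetical names.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1]; shift == 1 leaves it unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> list:
    """Uniform schedule from 1.0 down to 0.0, remapped by the shift (assumed base)."""
    base = [1.0 - i / steps for i in range(steps + 1)]
    return [shift_sigma(s, shift) for s in base]


if __name__ == "__main__":
    # Where does the halfway point of sampling land for each shift value?
    for shift in (0.5, 1.0, 2.0, 3.0, 4.0, 5.0):
        print(shift, round(shift_sigma(0.5, shift), 3))
```

Higher shift keeps the sampler at high noise levels for more of the steps, and values below 1 do the opposite. That would fit the observation that higher values look smoother and lower ones crisper, though that link is my reading and not something the formula proves.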
that looks more like a video game jacket than a hippie jacket
yeah crystalmancer was showcasing the shift parameter the other day
nice
i also noticed when using sd3.5 with modelsampler, a value of 1 makes for a nice realistic image
this is what my form looks like, I've just been too lazy to add the shift field in the form, considering both Flux and SD3 use it I think it's about time I get around to it
is that a custom UI?
yeah it's 100% custom using ComfyUI for the backend
i even use the vae_decode preview from ksampler to dynamically show the rendering inline
neat .. looks interesting to be using your own UI
i just spent like 20-30 hours rewriting how I handle the generating of workflows for ComfyUI, right now I have it set up so it's a workflow for all stable diffusion models and one for flux models but it's really messy and it makes it hard to add nodes like the shift node or model sampling node as it's called
i can imagine
basically im generating a custom workflow per image based on what options I use
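A minimal sketch of what per-image workflow generation could look like, assuming ComfyUI's API-format JSON (node id mapped to `class_type` and `inputs`, with links written as `[node_id, output_index]`). The node class names are the stock ones as far as I know, so check them against your install; `build_workflow` and the checkpoint filename are hypothetical, and this is not the poster's actual code.

```python
import json


def build_workflow(prompt, ckpt_name, steps=30, cfg=5.0, sampler="dpmpp_2m",
                   scheduler="normal", seed=0, shift=None,
                   width=1024, height=1024, negative=""):
    """Build an API-format graph, adding the shift node only when requested."""
    wf = {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
    }
    model_ref = ["1", 0]
    if shift is not None:
        # Optional model sampling ("shift") node sits between loader and sampler.
        wf["5"] = {"class_type": "ModelSamplingSD3",
                   "inputs": {"model": model_ref, "shift": shift}}
        model_ref = ["5", 0]
    wf["6"] = {"class_type": "KSampler",
               "inputs": {"model": model_ref, "seed": seed, "steps": steps,
                          "cfg": cfg, "sampler_name": sampler,
                          "scheduler": scheduler, "positive": ["2", 0],
                          "negative": ["3", 0], "latent_image": ["4", 0],
                          "denoise": 1.0}}
    wf["7"] = {"class_type": "VAEDecode",
               "inputs": {"samples": ["6", 0], "vae": ["1", 2]}}
    wf["8"] = {"class_type": "SaveImage",
               "inputs": {"images": ["7", 0], "filename_prefix": "sd35"}}
    return wf


if __name__ == "__main__":
    # Hypothetical filename; use whatever checkpoint is actually installed.
    print(json.dumps(build_workflow("a test prompt", "sd3.5_large.safetensors",
                                    shift=3.0), indent=2))
```

Building the graph per request means a node like the shift node is just one more optional entry, instead of a second hand-maintained workflow file.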
this way also avoids overhead like loading stuff I dont need to use
i have almost 9K loras installed taking up 1.6TB of space i think
i would say my system is doing fairly ok using comfyui
and these turbo and schnell models are nicely optimized
especially with quantized versions
i use q8 for both those models
that's another thing
is Q8 just fp8 or is there more to it?
id like to use gguf for sd3.5 but i cant because im bound to the limitations of these two workflows I have set up so with this new system im building ill be able to use gguf for sd3.5 similarly id be able to use safetensors for flux images
better precision and accuracy than fp8
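To make the Q8 answer a bit more concrete: as far as I understand it, GGUF Q8_0 stores weights in blocks of 32, each block holding int8 values plus one shared scale, whereas fp8 is a tiny floating point format applied per value. A rough sketch of that block scheme follows. The real format stores the scale as fp16 and packs everything into bytes; this version keeps plain Python numbers, and the function names are hypothetical.

```python
def quantize_q8_0(values, block_size=32):
    """Split into blocks; each block gets one scale and int8-range integers."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Scale maps the largest magnitude in the block onto 127.
        scale = amax / 127.0 if amax > 0 else 1.0
        quants = [round(v / scale) for v in block]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate values: scale times integer, block by block."""
    return [scale * q for scale, quants in blocks for q in quants]


if __name__ == "__main__":
    weights = [i / 10 - 3 for i in range(64)]
    restored = dequantize_q8_0(quantize_q8_0(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Because the scale adapts to each block, the rounding error is at most half a step of that block's own range, which is the usual argument for Q8 holding up better than a straight fp8 cast.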
and we have about 2 days now, till 3.5 medium drops
there's a little share button on each image and when I click it I get this little panel at the bottom so i can then just click Post Next and it takes me to the CivitAI website where I can share it, https://civitai.com/posts/8388007
internally I'm using BLIP2 and JoyTag to generate a post title and the tags for it
i think Flux has taught me the importance of patience for quality, sd3.5 medium is kinda meh, when im getting 47 seconds on the large model that's "good enough" for me to not want to mess with something lower quality
well yes but also turbo model is not really lacking much .. you can have great results much faster w/o any bad compromise
yeah that seems to be the sentiment across the community, ive heard that multiple times, it's hard to believe really that's why i haven't even tried it yet
there is definitely use for large model but for professional reasons for those who are making images commercially but for people who are just fiddling with it for casual use it wont matter too much
i dont make money out of making these images
these are mostly for fun and play
me too
im just generating images for fun, i have a remix pipeline and then i share them on civit, my motivation for generating high quality is so that i can share high quality content on civit to farm reactions
feels good to be 'validated' for a nice generated image lol
2-8 steps is fine in my opinion I do that for most models where possible (hyper, turbo etc)
im guessing you have goal with developing UI or at least in the process of creating something that you can polish and market later?
this is the size of my image generation queue, 155K images, $73 in costs if I were to use RunPod to generate them all, and 101 days of processing to make them all locally
nope im just developing the UI for myself, for fun, as a hobbyist programmer, its a mix of programming and generating images for fun, they go hand in hand
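Quick back-of-envelope on those queue numbers, using only the three figures quoted (155K images, $73, 101 days). `queue_stats` is a hypothetical helper, and it assumes the 101 days means round-the-clock local generation.

```python
def queue_stats(images, total_cost_usd, local_days):
    """Per-image time and cost implied by the queue totals."""
    seconds = local_days * 86400  # assumes 24 hours a day of generation
    return {
        "seconds_per_image": seconds / images,
        "usd_per_image": total_cost_usd / images,
        "images_per_day": images / local_days,
    }


if __name__ == "__main__":
    stats = queue_stats(155_000, 73.0, 101)
    for name, value in stats.items():
        print(name, round(value, 5))
```

That works out to roughly 56 seconds per image locally and well under a tenth of a cent per image on rented hardware.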
ahh ok
so far I've generated 550K images locally with my system
top Pony and Flux models I have
i think Pony is still king of creativity, Flux is king of realistic and SD3.5 is the king of illustrated images
that's really good for realistic, speaking of which right? lol
it is comparable, i mean it takes a big man (SAI) to admit they've been bested by Flux when they posted that chart showing Flux still beats SD3.5 in aesthetics, there's something to be said for that
yeah but i think fine tuning is necessary in some areas of the anatomy
for flux or sd3 or both?
i think flux has weak anatomy but I think its ability to generate high quality realistic + high quality anatomy loras makes for a better model overall
flux may not be as easily trainable as sd3.5
that is true but as of right now theres like 100x more loras for flux than there is for sd3.5
plus you can even train flux on civitai for like $5
yes but most of the loras are buzz farm
i came across a handful of loras that create worse hands than the base flux model
true
it could be partly due to how black labs designed their models, which are hard to train
yeah thats the main reason a lot of people already prefer sd3.5 in the state it's in over flux despite higher quality for flux
trainability goes a long way in the long term
@bitter hearth hey i igot a prompt for you if you can try it on turbo for me real quick
modelsampling left to right.. 0.5 - 1.00 - 2.00 - 3.00 - 4.00
its very subtle but def a change
use the arrow key on your keyboard to swap image views
ok here's the prompt
a painting of a pink Jester giving a high five to a panda while in a cycling competition. The bikes are made of cheese and the ground is very muddy. They are driving in a foggy Forest. The panda is angry.
each of them are distinctly different
yeah i think it's more noticable when i toggle through them with the arrow keys, especially the eyes
my preference is 0.5 for the realistic look
no cherry picking ..
i had a fixed seed btw
let me try another
using turbo on glif https://glif.app/@rennurtsfx/glifs/cm2l63ksi0000pp15p3us3usn
cherry picked one
i like the image contrast in it
pink jester ✅
high five ✅
cycling competition ❌
bikes made of cheese ❌
muddy ground ✅
foggy forest ✅
panda angry ❌
missed 3, i can get dalle3 to nail it 100% not miss a single thing, using that prompt, and i can get flux destill to miss just the cycling competition aspect
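If you want to compare models on the same prompt a bit more systematically, the checklist above can be scored like this. The entries mirror the ticks and crosses posted for this one image; `adherence` is a hypothetical helper and the judging itself is still done by eye.

```python
# Results for the single turbo image judged above (True = element present).
checklist = {
    "pink jester": True,
    "high five": True,
    "cycling competition": False,
    "bikes made of cheese": False,
    "muddy ground": True,
    "foggy forest": True,
    "panda angry": False,
}


def adherence(checks):
    """Return (hits, total, fraction) for a prompt-element checklist."""
    hits = sum(1 for ok in checks.values() if ok)
    return hits, len(checks), hits / len(checks)


if __name__ == "__main__":
    hits, total, fraction = adherence(checklist)
    print(f"{hits}/{total} elements matched ({fraction:.0%})")
```

Keeping one dict per model for the same prompt and seed makes the "missed 3" style comparison repeatable.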
i consider the cycling competition aspect to add some distant bikers competing with them
Glif only uses T5XXL and Clip_L and doesn't use Clip_G so quality may be affected
dalle3 is openai and openai still owns it
and OmniGen is a MS product i think
openai gave microsoft access to use in their little website tool thing but it's still theirs and available through chatgpt
ahh so not sold to MS
im looking forward to when omnigen drops
it's kinda not fair bc internally dalle3 rewrites the lazy prompt to something that nails it
Omnigen is out
not usable on comfyui yet
but yeah its already out, i tried a few render on their webui .. i think the quality falls short of sd3.5
but the omnigen perk is more about integrating controlnet and bunch of other stuff
dalle3 feels like it regressed, it used to get this prompt really well
you could just prompt w/o needing any extra tools
i ran it using glif + i turned on the claude rewrite feature and this is what i got
i like dalle3's concept of bicycles made out of cheese the best
yeah ms owns like 49% of openai or something like that
btw modelsampling does more than influencing image tone, it has impact on content too such as the character
they own all the technology until such time that openAI achieves AGI - that was part of their 'exclusive partnership" deal
yeah exactly i knew that
problem is by some older definitions we already have AGI
yeah the AGI term can be broadly interpreted and misused imo, I lean on the side of the fence that we're not there yet, we've just made really fancy machines that can memorize a lot of stuff and recall it, true AGI is being able to reason in a different way
Yeah it’s obviously much worse in quality than sd3.5 and sometimes even worse than sdxl.
But the main thing about OmniGen is that it can do everything ip adapter, controlnets, pulid do, without any extra model, and sometimes better.
if its possible to fine tune we can imagine where that would lead to
Yeah it’s possible to finetune, and adding new modalities is easier than training separate controlnets too.
i hope so, and we can have model that's dynamic and easy to use
Yep, it’s a bit slow though but just because it’s unoptimized right now.
It’s a lot smaller than full sd3.5, flux and even sdxl too (around 12b, 16b, 4b).
OmniGen is just 3.8b parameters.
sounds super cool
i wanted to try it out but not supported on comfyui, any idea when they will release a working model?
castle carved from fruit, lofty rose spires, dripping water, cotton candy clouds, surreal landscape, symmetrical fractal fantasy
AGI is Artificial General Intelligence = regardless of the fantasy twists that social media has put on it the last 2 years, it ONLY means an AI that can do things it wasn't specifically trained to do. doesn't mean super-intelligent or anything else. a two-year old child is NGI - natural General Intelligence. it can do all sorts of things that 1. aren't instinctive and 2. it wasn't specifically taught how to do. an AI could be no smarter than a 2 year old child and if it could do things you didn't train it to do, it would be AGI. Social media is made up of people that have no clue and that love to grab onto a concept, weave it into all sorts of weird fantasy, mix in science fiction, and then publish it as fact. doesn't make it fact.
SD3.5 large
i have to say, that's philosophically and technologically deficient understanding of Ai and AGI, you may be sharing a practical outlook of the current infrastructure which has a long way to go, but that's not the definition of objective goal of AI
Polaroids LoRA
if you can picture this concept that a developed AI or AGI however you want to spin the intelligence would be the last frontier of man's invention that will give unprecedented results
that's the definition, from the scientific community, that has always been what AGI is
AGI is something that we've been working toward for years
not something that just came up in the last 2 years
they might be afraid of the fact AI will surpass their intelligence
that is ASI - Artificial Super Intelligence - and no, they're not
AGI != ASI
lets not keep labeling each category
that side tracks the point
im talking about intelligence in general
sigh here - https://arxiv.org/abs/2303.12712 an actual paper for you to start with
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In ...
thanks i will read it, but not atm
bookmarking
there are quite a few others on there if you feel like digging through them
ok
i will read the article when im less distracted but ty for sharing
warning, its a deep rabbit hole with a lot of branches
certainly - just block out time to read and dont' try to do it all in one chunk
Tuesday, their latest model.. Intelligence.. pffft 😄
it is a statistical machine.. still
that has nothing to do with intelligence. you'll find plenty of people that are mensa level that can't do math
and that's grammar anyway
the question isn't asking it to do math, it's asking for the correct way to write the sentence
I know, people would say this:
It is statistical
in any way
shrug. this is a discussion for #💬|general-chat
Here is an image
watson was considered intelligent bot too back in those days, but all these chatbots are not the standards of intelligence that's in progress, we havent achieved that bar yet
https://www.ibm.com/watson watson has been updated
the litmus test is that we dont have artificial intelligence that can be trusted with its own autonomy if your life depended on it
still haven't looked up yohei and the autonomous agents he's made?
no, i havent, couldnt find links for it, if you can share that would be nice
you can start with his profile here https://x.com/yoheinakajima
VC by day, builder by night: @untappedvc, @babyagi_, @pixelbeastsnft. Build-in-public log: https://t.co/UdHHGbZba5
ty
bookmarked but will go through them later
you might also look at the autonomous stock trading agents that've been in use for several years
what you are suggesting is fragmented modules for isolated tasks
nope
thats not the broad scale integration
but i'm not going to continue down this path with you. you have stuff to read and research
i know what you are saying but i will read the articles later to see what makes you think what you think
Prompt: four color illustration from a children's book about a puppy and a basketball
you realize what they are presenting at https://www.yohei.me/ are specific modules
Personal website for @yoheinakajima
i thought you were going to read and research later - that you had stuff you were in the middle of? i'm gonna stop giving you rabbit holes when you say you're busy
im looking at that site casually
you know that leads to 'oh hey, i've been sitting here for the last 9 hours'
but do you understand the point im making ... if we had global scale AI we wouldn't be struggling over energy crisis, we wouldn't be struggling over cancers and so on
those are some of the objective from materialistic pov
we dont have that yet
i think we have differences in our expectations rather than in where ai is and where it needs to go
just like sd3 and sd3.5
i also acknowledge your practical views, we can't have an overly big leap, we need stepping stones regardless how far the reach is
i'm not giving you any more rabbit holes right now ;)
i went through all the holes there is already
and im not bragging about it, there is nothing to brag
its a reality how i see
we are in self destructive mode and AI will have our last chance in steering away
SD3. 5 large: prompt: four color illustration from a children's book about a puppy and a basketball. The puppy is standing up its hind legs, bouncing the ball on its nose
those are isolated modules btw, not largely integrated to human systems
but promising nevertheless and necessary
you're gonna be sitting here at midnight at this rate
also understand you can develop Ai to build itself but that's not the same as implying whether that Ai has access to all the possible information archive or lacking, this is also one of the pressing issue of censoring Ai, you need to feed Ai all the information, including human lust just for the sake of practicality of life
im interested to go through them, good content and i have time
this is bit off from our actual topic but when you baby sit an Ai you will make a lame ai
ai needs to see everything and compile them into meaningful solutions to where mess happens and how to achieve utopia
that stuff is soooooo cool, one prompt 3 totally distinct images, i missed that, it's sooooo much simple fun to use
@craggy crest Ok, I forgot to get back, but I trained SD3.5
I have only looked at the generated sample images from AIT, but I seriously hope there is something wrong with the way it inferences, cause the results look bad lmao
now I have to deal with downloading and running SD3.5 locally
did you have a range of resolutions
cos that's currently the big unknown thing with SD 3.5
whether it can go much outside of 1024, and if so, how to do it
sigh. maybe the issue is, i dunno, the way you trained it? i think, really, the better question is does your boss like what you did or not - as he's training it and quite happy
I mean, there is no real information on how to train it, so its just the best I can try lol
talk to @calm zinc and see what he's doing then
I'm hoping we still get a paper for Flux and SD 3.5 at some point
in the absence of that, what I have been thinking is we can piece together information from third party papers
currently there are two on Flux https://arxiv.org/abs/2410.07536 https://arxiv.org/abs/2410.10792
I doubt it, seems like no companies really care about giving proper resources for others to spend their time and money realizing their projects for them lmao
almost all major models have papers though including previous ones by this same team
essentially the original SD3 paper applies to Flux and SD3.5 anyway, to a decent extent
since the two papers on Flux that I listed essentially say that it is a similar design to Peebles and Xie 2022
which is what SD3 is
I am really hoping the bad results are just AIT's sampling, cause it looks ass from the first sample as well, so I am gonna assume that
1024
ah ok so its not the positional embed issue
even the non trained one looks abysmal, so I assume its just the modified trainer not having proper sampling settings
I guess now I have to do all the BS to get 3.5 running locally lol
I hope it will work with my GGUF text encoder loader
the SD3 paper is already out. 3.5 is just a modification of sd3. you'll never get a paper on flux
alright, so I need the FP8 SD3.5 model, and hopefully I should be able to use my other workflow and the gguf text encoders
did you try what @gusty trail told you he was doing?
I agree that 3.5 is also Peebles and Xie 2022 structure
I didn't see anything, sorry
search this discord for from: XiaoZhi and read his posts. he was talking to you
are we having any progress in having less shitty websites to host models on? Or is Civit still just kinda all thats really relevant?
sorry, but that struck me as incredibly funny the way you said that
you could use the GGUFS https://huggingface.co/city96/stable-diffusion-3.5-large-gguf/tree/main
LOL yeah
also when you say we will never get a paper on flux
was this something someone there said or is this an inference?
you will never get a paper on flux. i'll be happy to DM you and we can discuss this if you like
okay sure
oh cool, so they have the ggufs already, nice! I am more interested already haha
i gave you all the links yesterday. deep breath and start not doing 20 things at the same time
dm sent
Flux was big enough to make ViT's actually usable. Kinda a smart idea for SAI to have waited for all of the flux improvements to be slapped on SD3.5 to make it much more viable at launch
none of the flux so-called improvements were 'slapped on' or used at all, on 3.5
there are no 'improvements' in flux