#✨|sdxl
i am here
well, let me see if I can share it actually
my uncle works for nintendo, they're working on ultra nintendo switch
ok, yes, I can
Nintendo 128 I presume..
My dad is mr nintendo
bros, I'm looking for a collection of ComfyUI templates to play with, where might one find such a directory o0
@bright valley there you go, linked what I was talking about. He just dropped it apparently lol
of course, how silly of meh
this is all of SDXL retrained for some reason?
Hes the leader of our research group, and is also responsible for most of the progress on my LoRA
it added in Zero-Terminal SNR, a paper that SAI themselves said they couldn't train into it, and he figured it out
with the help of hugging face
dude if y'all got a team of people to work on shit at least make a checkpoint or somethin damn 😂
he writes all his own training tools, all of his own bucketing code, all of his own everything. I don't even know how to run it, and its for testing for hugging face ATM
I mean, I figured out how to train something into SD that they couldn't too 
or migjourney
mid
an entire research paper vs a single concept is a big difference 😅
I don't even know how long the whole thing took him
several months
eh, I think text is a pretty big concept
it was like the holy grail of generative AI
besides hands
I feel like you are playing up what you did a lot... lol
A concept the AI could already intermittently do that you refined
Vs a whole entire way of handling early step diffusion and noise scheduling across an entire model not made for it
I'm just being honest about it
like the text thing is dope, something I don't know how to do, thats for sure, but SDXL could already do text
just like how SDXL can do realism already. You and I training LoRA's are nothing compared to what he did here lol
thats cool and all
if it could, neither I nor anybody I know knew about it
Sounds like a good talk for the general chat.
i thought i was in reddit tbh
I'm just sitting here trying to make a giants glove fallen in the forest
i made a forest giant yesterday
Trying to get rid of the double letters unfortunately.
Good morning. Unfortunately I haven't worked on it lately due to a shift of interest into other styles. A big problem is, that SD doesn't seem to understand what part of the prompt is actually the text. One letter words often need a repeat in the prompt to get them showing up. So, yeah, currently the Status is "on hold"
All good, just messing with it.
you may think my lora is interesting https://civitai.com/models/176555/harrlogos-finally-custom-text-generation-in-sd
sry didnt finish that thought lol. I'm working on it, I think it may be a training issue, but I'm definitely trying to fix it
there may even be some prompting technique that can help
people keep finding those every day
I saw it and it looks promising. But how many words can you chain together before it falls apart?
Ok, I was at 5-6.
but I felt like I had enough to correct with just single words first before I try and increase it
Well, you said yours was busted yeah? 
There was a high chance of a Missing letter or a double, but at least it tried to write it 😅
oops i didnt mean to send the top left
ignore that
but two words is pretty achievable
there we go
So close!
Moonster 😂
Mononster
I actually love that O too!
actually how it does that to the bottom of a few letters
thats a neat effect
my buddy ral was doing some crystal shit with it
(PCMonster:1.2) (Text Logo:1.2), (Green:0.9), ([Glass | Crystal]:1.1), (Caustics:1.1), (Refraction:1.05)
Trying different things.
make sure you at least try the prompting structure on the model page
it is built to operate that way at least
here's that sarah one i just made
Just added it as some notes.
Santa vs Aliens
If it is the one from Futurama we will be alright lol.
This year, Santa's bursting from chests to deliver the gifts
Followed the prompt instructions this time so see what we will get, also trying my new custom nodes.
oh no sry you just need the colors
not any of the rest
so its basically Text, colors, style, extra shit
Will try it that way too.
it's trained to pick up the first color as the color of the text for example
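For anyone following along, here is a minimal Python sketch of that ordering (text, colors, style, extras). The helper name `build_logo_prompt` and the `text logo` suffix are assumptions taken from prompts shared in this chat, not the model page's exact wording or any official API:

```python
def build_logo_prompt(text, colors, style=None, extras=()):
    """Assemble a prompt in the order: text, colors, style, extras.

    Hypothetical helper, for illustration only.
    """
    parts = [f"{text} text logo"]  # the word(s) to render go first
    parts.extend(colors)           # first color listed is meant to be the text color
    if style:
        parts.append(style)        # optional single style term
    parts.extend(extras)           # anything else goes last
    return ", ".join(parts)


print(build_logo_prompt("PCMonster", ["green", "purple"],
                        style="crystal", extras=["caustics"]))
```

Swapping the order of the colors list is the quickest way to test whether the first color really ends up on the text.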
"home for the holidays"
That must be hell to explain to civitai users.
With the prompt i shared.
do tiddies
I've done many dumb words, but tiddies is actually not one of them, I'll get right on that
You have always the "that's not how it is supposed to look like" image submissions. I can clearly see that with my MS Paint LoRA. Some of the Images are like masterpieces, but it's supposed to look like shit. 5/5
Tried any cutoff with it to dial in colors etc?
Tbh I've been blown away by some of the shit people have been making with it
What do you mean exactly
^ and this is a weird thing for me, as it's trained on my own artwork, so now some people are better at my style than I am 
Kind of a workflow, can use it to tailor specific aspects of images.
Oh interesting
Haven't tried it yet
I've been meaning to learn more of the comfy stuff, like conditioning and all that
Node sanity lol
I have one like that too haha
Problem is I don't know what half the shit does
It's what I made this with tho
Can see the steps basically here
I want to build a workflow that makes a bunch of base renders and shows me previews of all of them, and I can select which to pass on to the rest of the workflow
I think that would be the best way to use the lora
Can send you this, would have to download my CSV Loaders custom node though.
Is that what it does?
Drop down options for prompts.
Oh yeah I wouldn't need all that
Just 1 prompt, but like a batch of base gens, then pausing
And you can select which of the images to send to the rest for upscaling and all that
You use the reroute nodes as "switches"
As you've seen in its current state, it can sometimes take a bunch of attempts to get the spelling right
I think there's actually node(s) that provide that type of functionality
I've seen somebody with it, I just forgot what they're called lol
Can use boolean nodes, not sure after that.
It sucks because we were in voice chat on the civit server when they showed me, sharing their screen, so I can't go back and find an image or something
Probably like this, not too sure though. I use the re-routes.
I just use Custom nodes that pass the input. So, latent in -> latent out. They can be muted fine.
Is it already possible to mute reroutes?
You mean like right click bypass?
For reroute you mean?
I mean Ctrl+m
To break the chain ⛓️
Yea same thing as bypass i believe
I am unsure, but I think that doesn't stop the flow
pretty sure mute and bypass are different
one stops the workflow the other doesnt maybe?
not sure
And reroutes could not be muted. At least it had no effect
I do it more like this, if I want to use Image to Image on the next step I connect i2i to ON, if I want to use Ultimate SD Upscale i connect USDU to the ON. More of a selection.
thats pretty neato
If i want to skip that section entirely I switch it to OFF
So my flow goes txt2img, img2img, ultimate sd upscale, after detailer, inpaint. Can switch on and off and interchange models/loras along the flow
Can pass things along as needed and route them through the re routes,
yeah that gets wayyyyyyyyyyy too unruly for me
i just have workflows of specific tasks
Have to admit its a lot, can download the base off CIVITAI i think its called Ultimate ComfyUI workflow or something. I have adapted it for my need and updated some of the nodes.
Nice subway map you have there 
yeah i mean if I have a different use case, id much rather click Load than a million nodes and buttons and logic gates and whatnot
I come from Blender 3D where nodes are life lol.
But they have much better nodes for node management. ComfyUI is strong but needs some QOL updates.
man i left programming to do art full time
i thought i was done with nodes
these workflows look like goddang db schemas
If anyone wants to poke around with it.
You can check out what's possible with the nodes backend on the original repository. Pretty neat. https://github.com/jagenjo/litegraph.js
There's also a demo
i know a guy whos trying to build on top of it
Will give it a look. Thanks
Check this out for a cleaner ComfyUI front end.
https://github.com/Stability-AI/StableSwarmUI
looks kinda similar
I don't mind my noodles though, what you gain on the front end you lose in power on the backend sometimes.
sekiro vibes
Going to try yours with the Cutoff to see if I can dial in the colors etc.. last one before bed though
It did mostly, just playing around.
@bright valley so I uhhh, messed up big time with those images I was generating for comps earlier lmao
I had my LoRA ontop of realistic vision, not base lmao
that explains why it was better at some things than I remembered, and worse at others
this is what it looks like PROPERLY compared
it suddenly got better at eyes, and a lot worse at foliage, and I was so confused lmao
Looks like you've got a ways to go to be the best realism model if you couldn't tell the difference 😂
no I COULD tell the difference, which is why I was confused
I thought the foliage looked a lot worse, but the eyes looked better (which made no sense cause my model didn't finish training)
my LoRA didn't finish training which is why it has splotchy small details
altho I will say, it looks like it plays well with real vision which is nice
And now I have a new logo lol.
you can see the difference here
I was trying to figure out why my LoRA was darker than normal
Now you're getting it
base, real vis, my lora on base, my lora on real vis
You wanna see where that style came from
The text style?
my loRA does seem to fix the issues with real vis a lot tho
I could demo it on base and ontop of realism finetunes, might work even better on realism finetunes
Pretty awesome!
Thanks! Apparently it works great as training data too after seeing your gen
The dark purple and neon green, ya can't beat it
Altho- don't sleep on gold
That's the sleeper hit of the bunch
PCMonster text logo, black, green, purple, slime, skeltons
prompt for it.
I told you to try the prompt structure 😂
It's like midjourney, it takes the simple prompt and just goes to the moon 
Can say it worked lol, mostly lol. Was trying to get black text but close enough.
Black is a tough one. If you do black and white it works, but I guess I haven't made many logos in those styles with black text
Here's one I made with gold
Good enough for me.
Lol dude I love the pfp! That looks super dope
If ya have the chance I'd love it if you uploaded your favorite(s) to the model page
I'd totally want some of that on display
Will get a few up there in the morning, gave it heart so will be able to find it
Awesome I appreciate it
I guess civit is making a training video for the site and their social media's for it, and in the next couple of days I'll be hosting a contest through their bounty system
20,000 buzz on the line 
I mean, it's not hard to do it
I can't say I know all that much about getting it right
This was my first model
I think I honestly just got REALLY lucky with all the parameters I guessed on
Yeah man the gold is littttyyyy
It knows dragon and wolf dragon
My name is harder I think because it keeps wanting to put monsters in the image. And can't really put it in the negative prompt lol.
Yep I've noticed that
Things that are actual things or objects or whatever can confuse the text encoder
What if you put "monster" in the negative?
Removes the word from the logo.
bummer
Monsters not an activation word so you're at least good there 😂
Dude I'm telling ya, I'm so happy I didn't listen to anything people told me about captioning. It responds so well to the activation words and order
It's actually built to know single or plural too
Like Skeleton or Skeletons
Heart or hearts
Keeps bringing up this little dude lol.
wow, Real vis and base react very different to my Realism LoRA lol
somebody made me do an on/off comparison thing today because they thought it couldn't be my LoRA doing it
I tend to like both loRA outputs for different things
this was the picture
and when I took the LoRA off, same everything else (Seed, etc), I got this
Mine listens more to what I ask for (makes sense, it's the model it was trained for), but the realistic vision one gives interesting results for sure
What Sampler are you using for yours?
honestly I feel like Euler is the easiest
comfy made Euler and DDIM identical now, so I guess I also use Euler haha
for me it gives the best results most often, but if you look on the gallery for it on civit, people get insane results from all KINDS of samplers
yeah, skill matters a lot more TBH
this is what i mean man, people better than me at this shit
I just choose DDIM cause its the fastest
and at my own style 😢

I wonder how the other living artists feel 👀
No man like I totally get what theyre talking about now
in a visceral sense
but im doing it to myself on purpose tho
I try to avoid training on art from living artists for that reason.
there's a guy zodiac, mod on the civit server, he makes tons of cool pictures with it without any text in them lol
I only use images I made myself so I get it
I use any images that are free to use, and nothing copyrighted, so I also get it
Logos were what I got commissioned for more than anything else I'm pretty sure
so I've made a ton of damn logos over the past couple years
and my own stuff as well, but I have a pretty limited dataset cause of where I live lol
So that part, all the work was already done, when I decided to try I didnt have to get or touch a single image
I get part of the Anti-Ai arguments and stuff, but most of it is just immense fear mongering and a select few artists feeling threatened when they aren't actually being targeted. I say that personally as an artist and Ai researcher. I have also changed the consensus around AI with all of my creator friends after showing them how it actually works
The amount of false info in the internet made just to scare people about AI is insane
yeah i mean im a full time artist now 
True that.
So I guess technically "artists" includes me
Some of my artist friends used to give me major shit for using AI generated content, back when they didn't understand how this stuff really works. And now they all hit me up constantly asking for generated references, poses, styles, and environments that you just cannot possibly find online
It didn't help that nearly every "prompt" guide that got popular with 1.5 was based on including an artist's name (looking at you Frank Frazetta)
lol
lol are you being for real
so much fluff
watermark, signature, all that in the negative
you know what you're doing at that point
It does depend on what models you're using
dude these are wild
I make almost no money off of anything related to AI, and absolutely no money off of anything related to the intellectual property of individual creators
Most of my stuff is for personal use, or references for creative individuals in my life
I get really pissed at the share amount of individuals who know nothing about AI acting like they know everything about it online, utilizing their large follower counts to fear monger how the technology actually works
*sheer
Several months back there was a video by a pretty major creator on YouTube, one that I can't remember at the moment, and also one that I don't want to give any attention to, where they effectively said that once an AI sees any one image you've created, you can basically be replaced instantaneously
This was back well before SDXL was even in beta
That looks pretty sick
I think it would look cooler if the pentagram was upside down personally,
Very splatty
The problem with pentagrams in SDXL is that they associate occult pagan pentagrams and satanic pentagrams under the same term. And there are a lot more occult pentagrams out there than there are satanic pentagrams
And as somebody who tries to generate a lot of demonic / satanic imagery, I have almost never gotten a properly oriented pentagram in any of my generations ._.
Interesting, I didn't realize you trained your LoRA for tagging instead of captions
I have alternated between both for my trainings, but I found that even if I tag my training images, SDXL seems to respond better if you caption your prompts. At least for what I'm training
theres also loras for almost every japanese artist out there (for 1.5)
Yeah, and a bunch of douches in the community decided to go after specific and iconic artists to try and attack them. Like everybody who decided that it would be a great idea to attack and go after SamDoesArts
sorry to say this once they posted their works online thats when it became public property
Like sure, I don't like the guy at all, and I think he was very rude in the way that he handled the whole AI stuff, but I don't think that going out of their way to try and attack his livelihood was the way to do it
It sure as hell didn't give a good light to the community
yeah it works in a specific order, which equates to front-to-back if you think about them as layers
every term is in front of the one behind it
The most impressive model I've ever used to date is still a 1.5 model, absolutely blowing SDXL out of the water in terms of how well it listens to what you're prompting. And I just recently found that it listens the most to what's in the middle of the prompt
As weird as that sounds
I did a whole bunch of tests and found that it responds differently depending on what concept you place at what part of the model
I mean the prompt
The whole model is just the most insanely saturated and capable model I've ever seen, and it's trained on over 77,000 unique tokens just for its own content, but it's really crazy the way that you can prompt it
Like PC Monster was doing what would be quality prompting regularly at first
I don't see any SDXL models coming close to even a fraction of its capability, diversity, and consistency anytime soon.
I just seriously hope that the people who trained that model get their nose into SDXL soon, because oh my God, that will be one of the greatest things generative AI has ever seen
got stuff like this
first time (that I saw) he used the prompt structure from the model page,
One of the most interesting things that I have found about my LoRA is specifically that the original versions were trained to follow a very trigger-tag-heavy prompting style, and they yielded pretty bad results.
But when I ended up dropping the trigger tagging in favor of simplistic linguistic prompts, the whole thing performed monumentally better
So then since that moment, I've simplified my tags/captions, and found that it's performing even better, although I do have a little less granular control over it. Which leads me to believe that if I mix the simplistic tagging with a ton of images that still cover a broad range of tags, I will allow it to have the highest quality generations, while still having clear control over certain aspects
Generally speaking, when it comes to realism prompting in anything related to SDXL, I 10 out of 10 times would recommend using a linguistic prompt over a tagged prompt. I have yet to see a single model/LoRA That reacts better to tagging than captioning when it comes to photographic images
yep that's what everybody told me to do
Going off the rails again lol.
Nah man, this stuff is awesome
love how that black paint looks on the wall
hmph, looks like I still need to work on neon a bit more
the neon around the heart tho, shyyiiittt 
Harrlogos LoRA "will do great things!!!" 😄
Ayyyyyy!
... it's getting there!!!
That's cool, what is that like children's book illustration style?
Yep we're only on v1 so hopefully I have nothing but room to improve
The prompt in SDXL ComfyUI was Neon text "DAFFODILS" fibonacci psytrance
Lol! Wow would've never guessed that
Same, but "HIBISCUS"
I hate to ask you but, did you try structuring prompts the way it says on the model page?
This was the difference for PC
OK, will go back to the Model Page ...
Fliboiscico 😂
I kinda fck with that, along with HILSIBUSS
Its kinda hit or miss with anything that's not listed, most things work, some not really
Just for fun!!! Oink!!!
Trying to do some inpainting but not much luck
Dude I swear every night somebody tells me they're going to bed, or 1 more picture
And then they just spam for another hour straight
Pretties with one button prompt
Lol i got 2 min my router auto resets at 4am for updates lol
It was meant to say "Pig's Burp!"
Aww close!
Damn love this look
Gotcha!
They have 2 tails but we can deal with that
nice colors
Also that kinda stringy style, it's very cool
Zentangle is that style
Before
Pigs Will Fly!!!
after a bit of photoshop touchup
Ayyyyyyyyyyyy!!!!!!!
With the panoramic joint
Damn you understood the assignment 😂
Other interesting styles - fibonacci, cubism, vexel, prismatic, pentimento, sfumato, sgraffito, alebrijes, greebles ...
Also wtf man that photoshop job is so good
You have them back connected to eachother
Way to go on using PS - Generative Fill?
FYI PS is now available in a WebUI
Some interesting garbled text ...
Mostly brushes, clone stamp, eraser I believe. I kinda just go with it. Been using photoshop for not even sure how many years now lol
Really a great edit
I have to check them against eachother to find out what changed
I connected it to my webui at one point, but removed it as it was when it was first released and didn't work too well
The U was quickly done, most noticeable to see it wasn't super nice. But it's late, and I'm tired and getting ready for bed
symmetrical at least lol
Could've had it through the O though
queue 100
well the rocm guys just disabled the multi query attention unit tests for Flash Attention so i feel like it's gonna be a while longer before it's running smoothly lol
Nooooo!!!!! Take anything, but not my multi query attention units tests for flash attention

When I use 3 inputs on IPAdapter and have Desktop PS open, PS robs too much memory, so IPA x 3 is very slow. The Photoshop WebUI is a godsend as it takes much less memory!
using @bright valley Harrlogos
masslevel text logo, cinematic scifi film still of 3d motion graphics letters made out of metal. fire particles. blue light streaks. floating in space, dark energy, in the background stars and galaxies
if you dont be quiet ill take something else
interesting idea with the roses
directly rendering 960x1440
if you want a softer pink like I have, try Sakura rose
Amazing how SD can't handle rain in front of things. Might be worth a LoRA. But it's like the "behind window/ behind Glass" problem.
I have "blurry" in my negative which might dissuade foreground raindrops
dalle does rain really well 🥲
Nah, it just can't handle that.
So, an impossible prompt would be: a crowbar in the rain.
Good luck 🤞
it's not terrible surprisingly, other than the cane? pickaxe? shaped crowbar
a kitten in the rain shows how it's putting the rain everywhere but in foreground better I guess
bro, why does Kohya's Bucket res work so ass
I am gonna have to hand crop and downscale all of these images since kohya's built in image bucketing is so fucked
these buckets are trash, WTF Kohya
1024x640? what the fuck
yes, let me just train at almost half my target resolution
also, might I add, thats below my minimum bucket res???
like how does it get that fucked up
the only thing I can understand is if its some form of bugged cached latents from a test with bad bucketing, but it seems to be inconsistent
Euler not Euler a?
I found the issue with the bucketing, after WEEKS of issues
apparently, it only follows your settings if you do not enable "Do not upscale bucket resolution"
So I have to have bucket upscaling on my 4k-8k+ image dataset in order for it to listen to my bucket settings
Wait a sec. Is "upscale" a mistranslation and actually needs to be "extend"?
I am not sure
all I know is that after looking through all of the startup statements, I found one that said "Bucket resolution sizes ignored due to "do not upscale bucket resolution" being enabled, please disable to enable custom bucket resolutions"
I seriously think that the fucked buckets have been why my LoRA has had such trash fine details, which makes sense, cause it was training on images as low as 600k pixels, instead of the 1,040k goal
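Quick sanity check on those numbers, as a small Python sketch. It assumes the target is a 1024x1024 area (about 1,049k pixels) and uses the 1024x640 bucket mentioned earlier; `pixel_ratio` is just an illustrative helper:

```python
TARGET_AREA = 1024 * 1024  # assumed target: a square 1024 bucket


def pixel_ratio(width, height, target=TARGET_AREA):
    """Fraction of the target pixel count that a bucket actually contains."""
    return (width * height) / target


print(1024 * 640)              # pixels in the 1024x640 bucket
print(pixel_ratio(1024, 640))  # share of the 1024x1024 target
```

So a 1024x640 bucket holds 655,360 pixels, or 62.5% of the assumed target area.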
Hmmmm, I might look into OneTrainer today. See what that has to offer.
these new buckets are all 1,030k-1,110k now, which should be much better
unfortunately, the increase to proper res also means that BS8 no longer fits in my GPU's VRAM
so BS6 it is I guess
man, this should hopefully increase image quality monumentally
along with the lower LR and the better captions AND training fully, man, this should be a huge leap in all regards
Are you training with DeepSpeed? that'll probably help you if you don't
no I don't use deep speed
you should; it's a bit funky to set up, but it's basically AIT but for training
maybe some other time then
@indigo carbon how would I even go about setting it up? Is it something that can run in Kohya?
I'd assume so, it's just a training optimization
I did use Kohya a while ago, it did support DeepSpeed if I remember correctly
it would just need you to either install the pip package or build from source
I used it in Kohya_SS a while ago, it should still be compatible
That doesn't sound super confident 😅
I'll have to ask you a little more about it some other time if you don't mind
anyways; it should be as easy as pip install deepspeed if you're on Linux, if on windows- it's not as easy to do
I am on windows, of course lol
yeah, you'd need to build from source
it's so dumb that microsoft's OS isn't compatible with their own optimizations
it's looking at min_bucket_reso, resolution, max_bucket_reso, bucket_reso_steps and bucket_no_upscale. no upscale means it will only ever downscale images, never upscale ones that are smaller than resolution. resolution is the mean pixel count, max is max width/height, same for min
if your image is 4096x4096 but your resolution is 1024x1024 and max_res is 1344x1344 the output will be 1024x1024
that doesn't make sense, as its still downscaling the whole image. It should really be named something different, cause there is no upscaling involved at all
if you have bucket_no_upscale set to True, images will only be downscaled if needed, if you have it set to False they will be either downscaled or upscaled to match bucket reso
I think it makes sense
If your image is 768x768, bucket_no_upscale set to False, it will be upscaled to 1024x1024 (with res being 1024)
ok, then why would that affect images that are way higher res than the target? they are still downscaling, not upscaling
that would make sense if that decided whether or not to upscale images that have a dimension below your max size, but mine do not
bucket reso steps being 64 also matters, it's probably looking for the nearest matching w // 64 * 64, and h // 64 * 64
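A rough Python sketch of the behaviour described in the last few messages. This is not Kohya's actual code: `bucket_for` is a made-up helper, and the rounding rule (scale to the target area, then floor each side to a multiple of the step) is an assumption based on the `w // 64 * 64` guess above. It also ignores the min/max bucket limits.

```python
import math


def bucket_for(width, height, target_area=1024 * 1024, step=64, no_upscale=False):
    """Pick a (w, h) bucket for one image. Simplified, hypothetical logic."""
    area = width * height
    if no_upscale and area <= target_area:
        scale = 1.0  # small image: keep its size, never enlarge
    else:
        scale = math.sqrt(target_area / area)  # match the target pixel count
    # Floor each side to the nearest multiple of `step`.
    w = round(width * scale) // step * step
    h = round(height * scale) // step * step
    return max(w, step), max(h, step)


print(bucket_for(4096, 4096))                   # large square image
print(bucket_for(768, 768))                     # small image, upscaling allowed
print(bucket_for(768, 768, no_upscale=True))    # small image, upscaling off
print(bucket_for(6000, 4000, no_upscale=True))  # large image, upscaling off
```

In this sketch a 6000x4000 image lands in the same bucket with or without `no_upscale`, since only downscaling is involved. That is the behaviour being argued for here, and it may not match what the real trainer does.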
I am using 128, but my point is, if my images are several times higher resolution than my bucket, then why would "upscaling" be a part of it at all? that setting should either be renamed, or changed to only affect upscaling input images lower than your target res
I personally don't see how downsampling a 60MPX image to 1MPX has anything to do with upscaling lol
uhh I don't think we're on the same page, nevermind
I am not sure what there is to be confused about, but alright
I personally think that getting a 4000x6000 image to 900x1200 res doesn't involve upscaling, and thus, should not be held back by turning upscaling off
now getting a 450x600 image to 900x1200 does involve upscaling, which I could understand in that case
in any case just remove --bucket_no_upscale or disable the corresponding gui setting and it should be fine
as stupid and irrational as it is, that did in fact fix my issue.
Don't understand why you can set a max bucket res if a single setting makes the max bucket res default to the same max you pre-specified. Might as well just get rid of the button, and set the damn thing to your max square res lol
Works the exact same
oh well, its only just a few months worth of trouble shooting and training down the drain cause of a poorly worded and redundant setting :/
In Kohya's defense it's not enabled by default
if anything, it led to me getting great results with fucked up training, so I can only hope that the new training runs improve massively now that they aren't being cucked
Been down similar rabbit holes with Kohya myself. We so much need new tools but they are beyond me to make. The porn, anime, and NAI people have said they went back to 1.5 over XL. I really wish they didn't have 1.5 to go back to and we would probably have some stellar tools by now.
1.5 is still objectively better for a lot of use cases so not happening, blame XL if anything
I am just blown away by 3 picture IPAdapter!!! Just wait until I branch out to 4!!! Mebbe get another 64Gb DDR5 ... ?
Plenty of blame to be tossed around but it is all directed at SAI tbh. They have yet to not just pump and dump, and XL is far too much for the same tools that people used to train with to actually work. Amazed it works at all.
NAI has a lot of blame too, per the porn creators. They have flat out said that if it had not been for that leak none of this would be where it is now.
Yeah, pretty much, was just trying not to be too mean
I am mean when facts are obvious.
Dumping and moving on is just wow to me. Where are the tools at? We will figure it out? No, we will not; those who can just go back to 1.5
I saw this in 2.1 and it came back in spades with XL. Same old stone tablet and chisel being used for 2.1 was rough, but doable. In XL with 2 te's and lord knows what else happening the trainers know we need two sets of controls for each TE then go from there.
OT is working on that for a few months but nothing close to releases from what I gather.
An amazing new AI art tool for ComfyUI! This amazing node lets you use a single image like a LoRA without training! In this Comfy tutorial we will use it to combine multiple images as well as use controlnet to manage the results. It can merge in the contents of an image, or even multiple images, and combine them with your prompts. The IP-A...
@bright valley I Just got done talking with a representative from one of the companies I reached out to for research funding. They ended up being so interested that I have an appointment scheduled on November 3rd. I'm really hoping that this goes through.
Proper funding/support is exactly what I need to bring this to the mainstream, because I'm confident that I can create something truly incredible with the proper resources
@bright valley
Dropped you a review and post for the text lora.
I have something new coming today, new model, home-grown, home trained. CineVisionXL - Keep an eye out for it, will be dropping on Civit this afternoon/early evening 🙂 🙂
pics look good, what's the intended usecase and/or training method for it?
Cinematic output. Like my NightVision, but for when you want that movie scene feeling
rather than photographic which NV does
Nice, I may use that heavily for a project I'm working on. Been using NV a lot for it, but maybe this will work even better
as for training, it saw lots and lots and lots of cinematic themed synthetic output. put several hundred hours total into this one, lots of creating movie scene style output in MJ or in NV, working on it in LR to get the feel I want, then into the training pile.
Nice, looking forward to giving it a go 🔥
I'm also working on an animated model in the pipeline (I've been calling it AnimeVision, but that's not quite right, it's different than anime) that will be another 100% homegrown SCG model. been having fun with that one, it's getting close to a public release as well
@high skiff If you have issues with bucketing in kohya, just take care of it yourself. way better control over the dataset. I can probably share a few python scripts that do it for you if you wish.
Yes, as here is one for you.
Ran into this a few weeks back but I asked for 1536x640. Standard XL rez and it bucketed my 1024x1024 as 960x960. I had to manually go in and change everything.
don't upscale was unchecked
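For what it's worth, the 960x960 result is what you'd expect if the trainer caps every bucket at the pixel area of the requested resolution and rounds sides down to a multiple of 64. That behavior is an assumption here, not something confirmed in this chat, and `square_bucket_side` with its `step` of 64 is only an illustrative helper:

```python
import math

def square_bucket_side(max_width, max_height, step=64):
    """Largest square side (multiple of `step`) whose area fits in max_width * max_height.

    Assumption: the trainer limits bucket area to the requested resolution's area.
    """
    max_area = max_width * max_height
    # Integer square root, then round down to the nearest multiple of `step`.
    return (math.isqrt(max_area) // step) * step

print(square_bucket_side(1536, 640))   # area 983,040 is smaller than 1024*1024
print(square_bucket_side(1024, 1024))
```

Under that assumption, asking for 1536x640 leaves less area than 1024x1024, so square images land in a smaller square bucket unless the requested resolution covers at least 1024x1024 worth of pixels.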
Oh okay, could you please tell me how to properly upscale images as you mentioned using several platforms?
I'm not sure if SAI are even moving at the moment, they haven't announced anything about diffusion since the original SDXL release
All I heard was they had started on 3.0 but that was 3 weeks ago. I was amazed since I think it is far too soon to release another flop. Of course emad is on twitter talking about it so they are moving.
I don't trust emad any more
oh really? I gotta read all that, link?
I don't keep links. Just find him on X
Conclusion of that tweet: a lot of things are coming
LOL, yep
could they finally be moving on to a new text encoder?
or just a mini version of SDXL
Sounds like the former
Many new things came out after SDXL was released. The AI field changes too fast.
Honestly, does it really matter in the end if all they do is dump what they have and move on without the tools for what they dumped? Honestly, it doesn't.
last post from Emad is from January 21
Nah, maybe he has another account but it was mentioned in here. I don't keep him in my subs he isn't worthy of that as he just spouts nonsense when he tweeted anyway. I am sure someone will get ya what ya need just realize they are not idle.
There he is
and no technical details?
From emad? LOL LOL LOL good luck
damn, here I was ready to go and read more papers
ex hedge fund manager is never ex.
anyways, I am pretty sure whatever SD3 is it will probably either have image conditioning/different text encoder. I'll keep track of this
It is so limited with the current architecture. It works, but it could do better.
why do so many people hate emad so much here?
whenever i see him brought up it's just a slew of toxicity
i wouldn't be surprised if internal policy was "don't engage with the toxic community" and that's why it's so silent
nah, I just don't really know. my only complaint is the lack of papers/technical details when announcing about new models
as soon as theres SD3 news, the misinformation toxic army will start talking about how it's beyond censorship and they're infringing on human rights
its better that they don't say anything for as long as they can
toxic community members have managed to weaponize most press releases against them. it's ridiculous to watch occur
I mean, it would be nice to know what architecture/changes they've made on the new models but idk.
yellow journalism youtube would take any announcement and make a 15min clickbait video about how its the death of free AI if they even mentioned architecture changes
who cares? I'd just read the original paper
you saw how much youtube hate about how censored SDXL was going to be before it released? we all remember that don't we?
the lead up to 0.9 was nothing but toxic hype cycles
stability cares. they've gone silent since
whatever, all I'm saying it will be easier to make all the optimizations compatible with it before they even release it. AIT took a month to adapt to SDXL because they released it out of basically nowhere
Not their style, as much as we would love for it to be.
"sdxl released out of nowhere" is that kind of toxic hyperbole. Case in point. 0.9 came out and allowed people to test and build before the 1.0 release. but, that doesn't count because of reasons.
they're better off with no communications since everything gets twisted to toxic proportions anyways
honestly expected better from a developer. toxicity i can expect from the flavaflav hype men. but for a developer to claim they had no leeway? the toxic strands weave and twist
i remember when the sdxl 0.9 report got published and people were tearing it to shreds and saying it wasn't good enough. big members of this community
well, if there would be an SD2.9 and it'll have the same architecture as SD3 I probably could have modules/compilation scripts ready
there was an sdxl 0.9 and you've denied it ever happened. why should they bother to help you at this point?
i'm sure through academic circles, machine learning students and researchers have private access to the research in stability labs. Wider public releases haven't been received well, historically
to me it makes sense that they're a lot more quiet these days.
AITemplate wasn't a flexible optimization then, I never denied SDXL0.9's existence
then i misunderstand what "AIT took a month to adapt to SDXL because they released it out of basically nowhere" means, because you just contradicted it there
if sdxl released out of nowhere, that means you think there was no beta release
what could that statement be if not a denial of a soft release?
communication is a two way street and if anything said is twisted and lied about, people are going to leave that street
One thing that might be announced, stability text to video. They are testing it on bot channel. Just wait for their announcement.
they trickled a little bit about that stuff out and did you see what happened?
100% expect them to remain tight-lipped and potentially not even release that model, after the initial reception
Awesome stuff dude! Before I even saw your prompt I thought it looked straight out of a movie trailer or something. Really cool!
Yeah man if you can find a company to pay you money to train your own LoRA, without expecting anything in return, power to you! I know I'd consider you a master of business negotiation.
Thanks a ton homie! I had a great time seeing all of your creations last night, you got the hang of it real quick
well, due to using components from FX2AIT, it won't be too hard to adapt it to anything at this point. I'd say it won't take too long to have modules ready after they release it anyways
so, tooling improves regardless if stability does a soft release or not? Seems like it only benefits them to stay quiet. All pros and no cons
Don't expect anything until it's released. It's good for SAI and the community.
we've seen how the past year has unfolded. i fully expect them to be liberal with information from now on. even emad's last post, there's all sorts of conspiracies forming around it already. reddit calls it a deleted post but it's still there on his x profile
how are people even training SDXL models
very obviously it was only deleted to change which gif he used, but accurate information isn't the name of the game. conspiracy theorists gotta make up stuff
i can't even get one to train
using nvidia gear HEYOHHHHHHHH hehe. but yeah, i don't bother training models because i don't have 24gb. i focus on loras instead
you're using youtube guides aren't you?
https://rentry.co/59xed3 read this whole thing and go through it over again
It could be trained with 24GB with reasonable amount of time for lora
its for anime but it works
the only thing that would need some heavy changes is if they are using something like T5, but that's as simple as changing which pipelines the compilation script imports
i wonder how someone managed to get 2 million steps on a single card
sd3 i don't think will use t5 since that'll require 24gb gpus

don't worry about that. get it working first. those people got a lot of time and experience on their side
perhaps a quantized version of T5? I experimented the other day on DFIF, quantizing the T5 text encoder barely did anything besides lowering VRAM usage. I was able to go down to 6bit with almost no difference
i don't know how that would translate to actual models though. often is said you want higher precision for training. Pixart-alpha seems like the state of the art, they use t5, and claim it needs 23gb. I think it needs more than 23 which is why it's not released yet.
I tested that on DeepFloyd IF. barely affected anything besides memory usage
using 70gb of vram is why.
hold up, I'll send the results I got there
This is fine tuning with 2400 images with A100.
You could get a batch size of 4 using PagedAdamW8bit
and even more if you reduce the resolution.
batch size of 1 shouldn't be shooting up to 70gb of vram usage
im using the wrong optimizer then
FP16 T5
6bit
nice results. what's the vram use like?
of just the encoder?
theres your problem. you're using the dreambooth extension which is the most unoptimized training software
nobody uses that. does the author even maintain it?
doubt. it's using 70gb for batch size of 1
he updates the software but is lazy about optimizing
i think i've told you you need to use kohya , like weeks ago.
You're kind of standing in your own way
i honestly think he's just grifting his patreon donators
the text encoder on its own used 8gb on FP16, 3gb on 6bit
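Those two numbers line up with weight memory scaling linearly with bits per weight. A rough sketch, assuming the 8gb figure is almost entirely weights and ignoring quantization overhead such as scales and zero points (`scaled_weight_memory_gb` is a made-up helper, not part of any library):

```python
def scaled_weight_memory_gb(measured_gb, measured_bits, target_bits):
    """Estimate weight memory at a different precision by linear scaling.

    Assumption: memory is dominated by weights, so it scales with bits per weight.
    """
    return measured_gb * target_bits / measured_bits

# Starting from the 8gb FP16 measurement quoted above:
for bits in (16, 8, 6, 4):
    print(f"{bits}-bit: ~{scaled_weight_memory_gb(8, 16, bits):.1f} GB")
```

By the same estimate a 4bit version would sit around 2gb, before any overhead.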
https://github.com/d8ahazard/sd_dreambooth_extension the github doesn't even exist
wtf
not too bad at all
i'm not surprised
it's all just been a donation scheme. everyone saying it works i bet were just bots trying to astroturf hype
right? I feel like the degradation isn't even noticeable with T5. so maybe SD3 uses a 4bit/6bit version of T5?
maybe that kind of research is why pixart-alpha isn't out yet too. they're working on 4bit versions
cached copy
it was pretty easy and didn't need me to do any adaptations to DeepFloyd for it to use a 6bit version of T5, I doubt that's their reasoning
kohya better be good for SDXL
anyways, 6bit T5 easily fits as a text encoder on normal cards.. maybe that's what SD3 uses
theres another problem. you use steamunlocked. means your system is rife with malware and that will screw with your gpu drivers
before you don't listen to me about piracy, remember that you didn't listen to me about the dreambooth webui extension too
oh? like, using t5 to caption?
yeah i think sd3's aiming to run on more gpus than sdxl does. like it may be another 512 model with bucketing up to 768
i probably won't. you'll have another problem because of being root kitted and a virtual machine is using your gpu for mining or something.
your browser bookmarks show steamunlocked which tells me you've got horrible security policy and have multiple rootkits running
it also certainly either has image conditioning or a different text encoder, maybe not even T5- idk
we'll see eventually
pirated pc games are the #1 vector used to get botnets with powerful gpus. black hats love releasing scene cracks
if you pirate pc games, you're owned. bottom line
like if you never pirated before
grew up in the sneaker net days
know all the words to dont copy that floppy. used to know the dance
youll run into more problems, even with a properly configured kohya. you'll see
ya we'll see
its because of the piracy
Paradox SD XL 1.0: https://civitai.com/models/188001?modelVersionId=211105
Introducing Paradox, original by Design.
Paradox creates unique content while offering you more control over your creations by generating exactly what you request.
The model doesn't rely on merging; it generates purely original content and is finely tuned to follow your instructions.
It draws influence from over five thousand original images and was trained for 65,000 steps, resulting in a meticulous approach that leads to a creative model.
We invite you to load the model onto your favorite stable diffusion application, give it a spin, and share your thoughts with us.
- Special thanks to the people who helped me and who I care a lot about:
@markorez
@upbeat summit
@Kamikaze(Elon Musk)
@osiworx
@mix
@polar gale
ah hah, that's where sushi kitty came from
Downloading post haste, my dude! Can't wait to try it.
Great job! I might be using it as part of general model block equation in the future if you won't mind
🙂 ❤️
Yes sure the model is allowed for merge and anything.
very nice tune
Thanks mix love u man. 🙂
Generally speaking I would love the functionality of that, however I know basically nothing about implementing my own code and programs, and the last thing I want to do is jeopardize the training of my future models because of some faulty code I potentially write
total same face syndrome
So is SDXL worth it? I'll just stick to 1.5 for now.... I'm fairly new to all this, I just upgraded to be able to run locally...
SDXL is worth it
I haven't touched 1.5 except for like 3 images since SDXL came out
Just switch to SDXL, everything is moving over and I'd say there are decent Loras now.
ok thanks guys
Yes. Far away from 1.5 and also better than MJ
sdxl has 4x the base resolution so it's pretty nice
not only. SD 1.5 base model was trained on ~90 million images. SDXL base model was trained on 6.6 billion images
it has everything inside
i trained one lora for 1.5 haha took 20 minutes or so with like 25 images
not sure what awaits me if i want to make loras for xl
Whats the difference in base_1.0_0.9vae? Should I stick with the default 1.0.safetensors? Noob here haha
👌 thanks for the heads up
i thought they rereleased 1.0 and thats all old news
Lora for XL takes way longer
1.0 thats available now has the 0.9 vae right?
I do on runpod because with a 4070 it takes too much time
Well for the base model on the website. Standalone vae is rereleased. Not sure if they rereleased the base+vae. I just stick with manual vae anyways
yeah i use ollins half precision one
yes
but what settings tho
I train Lora not model so can't help
great
but SDXL has really everything inside
oh wow
when i tried SDXL earlier i blue screened my gpu
which gpu?
RTX 4090
it's good also for training
i can only find lora guides online not model ones
start training a Lora
i can get one trained in 4 hours
how many images for 4 hours?
700
good result
with my settings on 4090 (runpod but it's the same) 2hrs with 50 images
whats runpod
What you get after eating Taco Bell
since I have a 4070, which is good for running SDXL but not for training it, when i need a Lora I use Runpod at $0.74 per hour
16GB is really the min and 40GB is comfort
Just do Loras
you could do it in 14GB but not all that easily. Lots of quality things you have to ditch. For instance on a LoRA (forget about a dreambooth) you would need to stay at 16 dim, because with more the mem requirements shoot up
Anyone trying to train XL in 8GB is a masochist.
12GB is doable but no less. have to use all sorts of tricks
12GB is a pain for training sdxl, but good for using SDXL
remember XL is 4 times the size, datawise, than 1.5 and 3 times over 2.x
one trick people do is train the lora XL at 768
can work for styles
Hell, for DB I'd kill for 32gb, or 40GB for xl. 24 I barely eke by if I attempt prodigy.
XL should only be used with 1024x1024?
Or one of its predefines for best results, yeah
More ram slightly slower lower wattage.
My 4090 stays with me until it, or I, die.
renting another card will get expensive in the long run
look, something is wrong with you, or the card if you have a 4090 and can't train. just being 100 as I have a 4090 and train DB as well as lora and locon. I struggled with the shit but if you want a lora just use the predefineds in Kohya_ss gui as they work you just need to tweak to fit your dataset.
About how much ram do you need to train?
23.2-23.6 but I use BS10-14
much less for lora
if I have like 6 images I can BS16
I mainly DB then extract
Ahh have a 3060 12GB so out of the question for me.
Can you run 2 cards at once? Like separate PCIE lanes but still train?
Would it really have to be sli, or could you split the batch in half?
i dont think you can train ai with sli
nvlink you can share memory between both, but sli you can only use one memory bank
just started testing LoRA training SSD-1B using a modded kohya
@heavy gulch thanks for adding the category, and @drifting jackal for the quick response as well!! Expect a new version soon, but uploaded the current epoch.
also very exciting is @stone fossil 's Paradox model tonight:
https://civitai.com/models/188001/paradox-sd-xl-10
Have fun! Make art.
newest training test for my LoRA looks more promising. Hope it's enough to impress my potential sponsors
I have until Monday to come up with some new tests as well, which could be very useful. I have an idea for a new test as well
Wtf do I have a wish-version of SDXL or why my outputs are like this regardless of the level of detail in my prompts (and negative prompts)?
wrong image res. use these only
#✨|sdxl message
Huh, interesting. Thank you! I picked one, but still kinda bad. What else am I missing in my settings here?
best results are 1024 x 1024. try that
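If you want to snap an arbitrary size to one of the trained resolutions, here is a minimal sketch. The resolution list is the set commonly cited for SDXL and is an assumption here, since the linked message isn't reproduced in this log; `nearest_sdxl_resolution` is a hypothetical helper:

```python
# Assumed list of commonly cited SDXL resolutions (width, height).
SDXL_RESOLUTIONS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]

def nearest_sdxl_resolution(width, height, resolutions=SDXL_RESOLUTIONS):
    """Return the listed resolution whose aspect ratio is closest to width/height."""
    ratio = width / height
    return min(resolutions, key=lambda wh: abs(wh[0] / wh[1] - ratio))

print(nearest_sdxl_resolution(1920, 1080))  # a 16:9 request
print(nearest_sdxl_resolution(512, 768))    # a typical SD 1.5 portrait size
```

This only picks by aspect ratio; it does not resize anything, so you would still set the returned width and height in your UI.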
Got slightly better but still kinda rough
Anyone have any kind of image to share generated with the base SDXL 1.0 (31e35c80fc). I am testing this SDXL but I can't find any image online to replicate to test that I get everything working. I am looking to do the asuka test of sorts for SDXL.
I am getting weird results and I just need someone to show me a proper render
It will always come down to how good your prompts are in the end. if you want a photo, start the prompt with photograph, don't add hyperrealism, or realism.
Just have "photograph of sand dunes" or similar
All the images I find have the parameters stripped. I legit can't find a single SDXL image online with parameters so I can verify my settings
Very frustrating
you can find prompt examples on https://civitai.com/
I tried some example prompts I found with example images included and they came out nothing close 😄
in order to have the exact image you need the same resolution as well, which isn't provided on civitai. sometimes the device or software you are using will give a different result
I see. Well I keep on exploring. Thanks for your tips though!
Hi everybody, happy Friday! I'm happy to announce a release of a brand new trained SDXL model, CineVisionXL, a new trained model I have built with several of the awesome folks from my Discord channel SCG-Playground. This model has been trained on widescreen (16:9, 21:10) formatted images and can produce stunning "straight out of a movie" scenes that are coherent and beautiful.
Read the full patch notes and what went into it, along with the dozens of samples for you guys to play with in the gallery!
Nice, been looking forward to trying this out, thanks for your awesome work as always
Also Dynavision is my absolute favorite checkpoint to use with my LoRA, it's like a match made in heaven.
They have a bunch of nodes and stuff. I just want an extremely simple image rendered on automatic1111 to test my installation.
I don't have the comfyui or ability to do that node stuff.
Love the look!
many people using SDXL are using comfyui, but if you dig through a bit more you should be able to find some A1111 gens with metadata
I've got another model in the works too, started as an anime focus, but it's kinda turning into its own thing. it needs a lot more work, tho if you keep an eye on my discord, you'll see it a lot.
Hey I can only take half the credit 😉 guess we make a great team
Is there any way someone has any render. I've been literally searching for 1 and a half hours for a reference image
What do you need?
Getting quite frustrated that there is not a single "try this image to verify that you have installed correctly" type thing
On what? Usually there's default values for almost everything
I need an extremely simple image generated on base SDXL 1.0, can be anything, but I need to see the parameters, seed, etc. I then try to render the same image and verify that I have installed correctly
Few more. I still need to give it some more training, but I like where its going
I have stuff behaving very weirdly and I get bad results
What UI
Automatic
Okay, and you have an XL checkpoint loaded, and try a text to image gen?
Yes, but I am not quite sure if there is even a problem. That is why I would need a reference. Known good result
To compare
Uhh.. well does it look good or not 
Ahh, forget it
we cant know if theres a problem if u dont post the images
I can make you one real quick I have Auto1111 and ComfyUI
That was a genuine question, I'm only trying to help lol
Give me your prompts (pos/neg) and settings and we can compare
your gens dont need to be exact. just enter all the same details into A1111 and you should get a similar result
If there's something "wrong" with your installation/setup, you'd be able to tell pretty easily I imagine
@bright valley A few more I turned out today for my son's and girlfriend's online handles.
So if you create a decent looking image, you're good, just need to tweak parameters
Ayyyyyy!!! Love it man! Cat is a very cool activation word
And some for a friend's band, more did it out of practice but offered them to him to use.
Made these for a friend last night as well
Dduuuuudddeeee

Super dope!
Man I'm such a sucker for that neon green and purple
Same and the font reminds me of the pop punk bands lol.
I used to play in metal bands for a few years
So my style is heavily influenced from those days
Makes sense. Do kind of wish there was a way to pick the font a little better though.
It works on more of a style basis. I tried to get as many styles as I could in v1 (pixel art, graffiti, tattoo, anime, etc)
There will be more to choose from in future versions
How fast is everyone's s/it? I'm just trying to find out if mine is slow or something
depends on gpu,whats yours
Just my old a2000. Usually between 1-3s/it which is pretty acceptable to me but sometimes it'll have a process that's doing like 6s/it and I'm trying to figure out why it's doing that
Might be throttling or something
with that 12gb card you should get like 4it per second
O wow ok I'll have to look at the installation again
Depends on the render too
yea 4it/s at 1024x1024
Some operations I get 5it/s, some are 10s/it
if u are using auto1111 just switch to comfyui or Fooocus, it's faster for sdxl
Yeah I'm on a1111, I'll give the other ones a spin
How do I know what renderer it's using tho? Noob me I just pick model and everything is pretty default
I just mean it depends on what's involved with the render you're doing
Ah gotcha
For example if I'm generating animation frames with Deforum, it's also doing optical flow cadence, and using a depth model, etc
So those will be more resource intensive operations
Also if you're using control nets, those would affect it
Stuff like that
Post apocalyptical Sweeney Todd, time for your cut.
LOL
Pretty cool though
They're the same, it/s is iterations per second, but if iterations are taking longer than a second, it'll switch over to s/it
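To make that concrete, here is a small sketch of the flip between the two units. `format_speed` is a hypothetical helper that imitates what progress bars tend to display, not the UI's actual code:

```python
def format_speed(iterations, seconds):
    """Show it/s when at least one iteration finishes per second, otherwise s/it."""
    rate = iterations / seconds  # iterations per second
    if rate >= 1:
        return f"{rate:.2f}it/s"
    # Slower than one iteration per second: report seconds per iteration instead.
    return f"{seconds / iterations:.2f}s/it"

print(format_speed(20, 5.0))    # fast run
print(format_speed(20, 120.0))  # slow run, unit flips
```

So 6s/it is the same thing as roughly 0.17it/s, which is a long way below 4it/s.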
After your cut lol
How does she breathe lol
Just think when she gets to 70
she doesnt breathe shes made of plastic
Thinnest nostrils oxygen breather I've seen lol. Must be a robot.
Yeah, girl out in the middle of a field of shrooms with a nice plate of them. I dunno man.
Actually really like this one.
I noticed the price of SDXL 1.0 on https://platform.stability.ai/pricing has reduced to 1/10th ($0.002). There is no news about the price drop though.
Yeah, for their API, and likely their sandbox too
Ooof.. local or bust for me.
Same.
Homie got bills to pay.
Like I love to animate, and for a great while Warpfusion was the cleanest and smoothest render you could get in AI animation
And I wanted to use it so bad, I didn't care about the monthly sub cost or anything either. It's just it works through a colab notebook.
I see that XL is $0.20 now that's wild
$10 = 5000 SDXL images
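The math checks out against the $0.002 per image price mentioned above. A quick sketch using `Decimal` to avoid float rounding (`images_for_budget` is just an illustrative name):

```python
from decimal import Decimal

def images_for_budget(budget_usd, price_per_image_usd):
    """Whole number of images a budget covers at a fixed per-image price."""
    # Pass amounts as strings so Decimal keeps them exact.
    return int(Decimal(budget_usd) // Decimal(price_per_image_usd))

print(images_for_budget("10", "0.002"))
```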
How do they even turn a profit from that 😂
Either I'm a dumb idiot or their margins have to be infinitesimal on XL
Or the audience is LARGE. A lot of people cant afford GPUs i guess.
Thanks
Messing with Chibi
weeeeeeeeeeeeeeeeb
The WEEBiest
It's official, need a weeb model lol
do [32-bit pixel art | voxel art] in the same prompt. Might add more 3D pop. Interested to see the results.
once I get a full training job that doesn't fail, 3D is going to be an activation word
If you don't mind lol
same seed?



