#🏞|general-with-images
I like it due to the caustics I am seeing
Def need to go through and explore more bottles and light--a fun topic, along with terrariums.
should i make another batch of this comparison?
Esp with light photography techniques.
Yes
when you accidentally eat too much dinner and have no room left to eat your only companion in life
just sit back and watch the meteor land together
i'd go for Artius first, and Paragon next, and then the V5.
depends what you wanted though. did you want 3D objects in there? or more like the fabric of a mini universe?
Paragon does the latter better. Artius does the former.
artius is an absolutely incredible model
this is from Artius
just straight up from the model, no Controlnet or hires fix or anything
well, i think with longer prompts artius sometimes does better, but with shorter prompts it never wins.
that was just rick moranis as harry potter
if you use a short prompt with artius or most 2.1 models, it just looks noisy
idk then, that's how it is for me
for what it's worth, SD 2.1 responds really poorly to prompt weighting. i would test without that if possible
hm, interesting. Artius had zero noise, which is what impressed me so much. try adding UHD, 8K, sharp to your prompt.
there was no prompt weighting on the comparison
already was in the prompt
well, now i will do the exact same comparison but this time, with a shorter prompt
should be interesting
also should try DDIM sampler at 15-25 steps with CFG at 9.0
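for anyone wondering what the CFG number actually does: here's a rough sketch, assuming the usual classifier-free guidance formulation where the final noise prediction is the unconditional one pushed toward the prompt-conditioned one. `apply_cfg` and the toy lists are made up for illustration; a real sampler does this on tensors at every step.

```python
def apply_cfg(uncond, cond, scale):
    """Blend two noise predictions the way classifier-free guidance does.

    uncond / cond are plain lists of floats standing in for the model's
    noise prediction without and with the prompt (toy stand-ins, not real
    model output).
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0, -1.0]
    cond = [1.0, 1.0, 0.0]
    # scale 1.0 just returns the conditioned prediction; higher values
    # exaggerate whatever the prompt changed.
    for scale in (1.0, 7.0, 9.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

so at 9.0 the prompt's influence is amplified a lot, which is probably why it interacts with short vs long prompts.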
good, sorry if my suggestions feel "been there, done that"
all i can do is mention things that worked for me
it's hard to give general help for prompts etc in my opinion because everyone has slightly different ways of working. I've learned that the best thing is to see what the person wrote and had in their settings before giving tips.
Other than giving tips that are more "out there" :P
ugh, gonna have to make a new Etsy account. They ghosted me
this is interesting, this time all models made somewhat similar results.. huh
bump maps and all that data? :O
Yep!
what are their sizes?
Using a program called Materialize, you can get all these maps and customize them with just a few clicks
pretty sure its all 1024x1024
I did that manually years ago, and I hated it so much that I still have nightmares :P
Actually the other ones are 512x512 it seems
usually for noise textures it's ordinary to use 512x512
hmm. I think I am just not saving it properly.
as long as they're not pixelated or "noisy," then it's still nothing to worry about :D
Nevermind its all 1024x, I do not know why I saw 512x
can't tell how they look when "tiled" as well
I have it tiled twice in those screenshots
2x2?
I think previews might need 3x3?
and more "straight on" but that's just my thoughts, can't say it's the standard or not :P
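a minimal sketch of the 3x3 preview idea, in case it helps: it just repeats the texture in both directions so the seams land in the middle of the preview instead of at the edges. The texture here is a toy list of rows, and `tile` is a hypothetical helper, not something from Materialize.

```python
def tile(texture, n=3):
    """Repeat a 2D texture (a list of rows) n times across and n times down."""
    return [row * n for _ in range(n) for row in texture]


if __name__ == "__main__":
    # A tiny 2x2 "texture"; real textures would be 512x512 or 1024x1024.
    texture = [
        [1, 2],
        [3, 4],
    ]
    for row in tile(texture, 3):
        print(row)
```

with 3x3 the center copy is surrounded on every side, so any seam or repeating feature (like a moss patch in a corner) shows up right away.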
The texture itself isn't great for tiling because it has a weird little moss patch in a corner, it was more to show that I can create a texture
I just faked mine :P
controlnet
this time paragon definitely took the cake
what do y'all think?
i only changed some parameters in the generation
Are all of those 2.1 models?
one of them is for sure a 2.1, left unsure, middle ones are 1.5
I might suggest "fishbowl" or "terrarium" to add variety
Not a critique, but if you were looking for some dissimilar results with the vessel, that's helped me
i like this one the best. it seems overall that the model i'm not sure what it's based on is the best one in this comparison.
what is the prompt? I would like to try it out on a model
galaxy in a glass bottle, 4K, unreal engine, octane render
were the random google check I got :P
PCOS? lmao that's a health issue
(the button labels in that UI)
@hasty nova see my current breakthrough results i've had in training 2.1
training so slowly it's taken 3 days to hit 3600 steps
looks neat
I havent messed with SD in a while, I think I might try to train tonight if I have the time, get v3 back underway for DD
adding dracula's name to the vampire castle prompt
is it actually tileable? doesn't look like it 
Both textures I shared were 100% tileable, at least on a flat surface.
if you look closely here you can see that this is tiled, 2x2
I can't tell anything from this angle lol
but yea, I see it's tiled on wood
interesting
i like how the Alien Invasion test prompt has a bunch of people all mindlessly walking on the street, toward some unknown objective. it really reminds me of Close Encounters of the Third Kind
trying to get a list of prompts from GPT4 lmao
i said nothing about Beyonce
did it just want to read a random article
oh man it's googling itself
have they added a "resource list?" after all those lawsuits from different places?
their lawsuits make them less open, not more
yes, wasn't what I asked though :P
I've seen similar spill posts on Reddit where interesting things happen and suddenly you can see other users' history and results
oof
which is better?
lol
cyborg concert
so ambitious to try and do all of those faces it failed at
Kowloon Walled City 2049
this is a really cool concept
here is my rendition of it
it's not as clear that it's a universe or galaxy in a jar as it is just space dust, but cool
what model did you use here?
looks different than the ones in my comparison
What a cutie 🙂

Tried to make the Wow signal more impressive


feck Elon 
yes
recently if I generate back to back without changing any settings itll give me the exact same images instead of a new batch, anyone else?
seed?
no, but it may be that blue arrow below generate, i reloaded the ui to test
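in case it does turn out to be a reused seed: a toy sketch of why a fixed seed gives identical output. The "generator" here is just seeded random noise, and `fake_generate` / `next_seed` are made-up names. The only assumption about the UI is the common convention that a seed of -1 means "pick a new random seed each time".

```python
import random


def fake_generate(seed, n=4):
    """Stand-in for an image generator: the 'image' is just seeded noise."""
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in range(n)]


def next_seed(seed):
    """Assumed UI convention: -1 means 'use a fresh random seed'."""
    return random.randrange(2**32) if seed == -1 else seed


if __name__ == "__main__":
    # Same seed, same settings: identical "images" back to back.
    print(fake_generate(next_seed(1234)) == fake_generate(next_seed(1234)))
    # Seed left at -1: each run gets its own seed, so results differ.
    print(fake_generate(next_seed(-1)), fake_generate(next_seed(-1)))
```

so if something in the UI restored the last generation's seed into the seed box, every batch after that would come out the same until it's set back to -1.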
boy
@oak osprey I am suddenly having a very strange issue with LoRA training, and I am curious if you have any guesses
No matter what LoRA I try to do now, at any settings, I never get below 0.16 loss, and they all look terrible, regularization, no regularization, constant LR, poly LR
They start at like 0.23 loss, drop down rapidly to 0.163, and then slowly go down to 0.16, occasionally going lower before bouncing right back up
on any dataset? or always the same one
I used the same base I was getting success with before, and everything looks terrible now
you do training inside a1111 that you never update, right? so that's not the issue?
no, I don't do training in A1111
o
I do training in Kohya
does it use diffusers 0.17
I am just going back to my OG LoRA settings to see if that's screwed too, if so, then something is fundamentally wrong
good idea
I also changed my base model, cause I was trying to train Zovya Photo real, which I remember sen saying was a pain in the ass
now my loss is down to about 0.13, and not lowering
something is fundamentally broken
and now its climbing to 0.145
0.15
0.155
0.15
guess I am gonna update my kohya install, cause its clearly broken
dope, new install:
instantly closes
Welp, looks like I can't make LoRA's anymore
something happened to kohya or something, its unusable now
it just doesn't work anymore. I didn't change anything, but its just broken now
well, i fixed their faces 
lmaoo
prompt: Lion, Elephant, Giraffe, Tiger, Hippopotamus, Cheetah, Rhinoceros, Zebra, Chimpanzee, Gorilla. at 1920x1080
i'm pretty happy with this
Made this pretty little chill and vibey moment before I go back to sleep :p
experimenting with various settings while making adult oriented material and the ai outputs this absolute freaking masterpiece. obvious ai errors aside, it blew my mind. like...i probably couldn't get it to output something like that if i tried
the john travoltas
john travolta as bob ross
@smoky oak so SDXL uses UniPC

to make it as fast as possible
Interesting
They must have found a way to optimize it
I know it can be faster, but it's a pain in the ass to constantly change the sliders to get the best out of it. I just use DDIM as it's faster 90% of the time with no hassle added
they're constantly changing the values on the bot's output
Yeah, makes sense. That's why I don't like UniPC, it constantly needs to be changed to be faster
it makes really weird results sometimes
And it's not even like a small amount of parameters with a small range either
Why would it? Lol
Meh, I'm used to it at this point lol
😮
The field has worse noise
Is that your fine-tune? It's looking much better
yea
Nice work
thanks sir
I just got a new anthro model specifically tailored to
✨GAY ✨
So I'm good haha
What a rare sight it is, to click onto a model and have it say "Warning, male content focus, this model is not for women"
Hey thanks
lol... pastel paint explosions, battle of flavortown, 8k, high quality, masterpiece, buildings
This is another couple from the series
The geode ones are fun too - miniature city inside geode, crystals, purple, 8k, high quality, masterpiece
realisticVisionV20_v20
The flavortown ones are a combination of pixar models and dreamshaper 6
I sometimes use grapelikedreamfruit if I want to add a little realism
my 2.1 model's version feels lacklustre though i didn't modify the prompt much
Yeah I go through a few and test out concepts to see what works
It is really, really bad at skin color in my experience, or maybe the prompting for skin color is unintuitive. Both natural skin tones and unnatural. With the one exception of blue for some reason. I can get people with blue skin no problem.
hi everyone, been reading the documentation and can't find a way to add more than 1 control unit. My UI looks like this but I need to have 2 control units like the other image
i love how his suit looks backwards
You have to go to A1111 settings and then under controlnet you can enable more than 1
@cyan snow try a toddler, sitting on the floor, welding on metal, sparks
steel smelting plants run by toddlers is pretty great too
haha, nice
i wished we could get one better than Bing's image generator but so far nothin
those are all great, not saying they aren't. but this Bing image is next-level
i love how mesmerised he is by the pretty sparks
we lost to fukin bing
i wouldn't say that, SDXL is insane
SDXL isn't a finished model..
will try again when they push the next ckpt
well, yes. but right now, Bing is still better lmao
and that should sting
🤣
Bing uses DALL-E though, internally, and i don't remember DALL-E looking this good
i'll grab a couple more examples
well, SDXL pretty much beat MJ, and the current bot uses the 50% done model, so i'd say after it's done, then we will see
SDXL doesn't beat MJ most of the time, but if you cherry-pick, you could say that
yeah, i remember SD crushing DALL-E, that can't be DALL-E
well, 1.5 models can beat MJ, so SDXL after it's finetuned will school MJ.
I just royally fucked up my beard ._.
Anyways, what are you all up to?
i don't know what you mean when you say "beat", MJ is a collection of networks, not a single model. it is vastly more capable than 1.5 could be without putting in substantial work?
we lost to bing, apparently
What who huh
@smoky oak
please beat Bing's image output
this is embarrassing
Ah
nah, i can make images better than MJ, so there must be people that can demolish MJ.
oh hex nah
the photos they show to demonstrate 1.5's abilities i'm like, how the hell did they get them
must have made 100s of gens to get those
I can beat MJ in several ways, but obviously not all of them lol
i can in most, most of the time at least..
i don't have access to MJ but i have about 488,000 images from it that i've been sorting through and using as training data, and it's quite impressive but there's a fair bit of garbage
that said, making regularization images from any 1.5 model goes like total crap compared to just firing off the same class prompts at MJ
not necessarily SD's fault, btw
the fine-tuning tools we have access to are just really bad
MJ is already fine tuned.
i said any 1.5 model, including all of the fine-tunes
that is not true for the most part
i've been doing a lot of fine-tuning and i understand the process and its shortcomings very well. the tools that we have access to are really bad.
for the longest time, text encoder freezing just wasn't even a part of them
the TOOLS are bad, i agree. but the models available can compete with MJ.
there's still no mode of operation for sequential freeze and there's no adaptive freeze implementations available publicly for use
we get a box of crayons, and SD is using a fine set of tools, written by people who use them. they have stuff we do not
they would never =[
the whole point of SD is to be open sourced
Mmm, yes, childhood blindness
we get research like SmartFRZ but those guys did not release documentation on their training process, and they didn't release the model checkpoint they trained
so all we know is that you can train a cross-attention based prediction model that can decide when to freeze a layer, but i don't know how they got it
i thought i was missing something so i fed the whole paper to GPT4 and it was like "this is really cool! how will you train it?" and i'm like "what?"
so, are you implying that running AI locally is going to lose to capitalism?
running generative AI locally is for enthusiasts now and likely into the future except in niche cases where a game engine might benefit from low latency with a specialised model for something
That's all the effort I am gonna put into the welding lol
some things just never hit the mainstream. there are WAY too many knobs to fiddle with in sd-webui for it to become popular. i don't like that concept, but i understand it
Baby's first retina bleaching
lmfao
Has a fun ring
Also, just saw this, and my god is it painfully true lmao
that's what i was sort of talking about just now
firefly is trash, just inpaint in a1111
Oh for sure, firefly is definitely trash
when AI was inaccessible and difficult and nerds-only, it's a "ban that shit" thing
SD will have all of its features better than adobe does before it's even out of beta lol
people need a point-and-click idiot experience
ReimagineXL and Unclip XL or whatever they're called, are SAI's recognition of that
@oak osprey also, don't think I didn't see that cursed shit you wrote lmao
Hell. Even just the new update to controlnets inpainting model give adobe a run for their money
well, apparently not, since SD is going to lose to fuckin bing and midjourney
SD isn't losing to anybody anytime soon lol
only on pictures of babies welding
Once SDXL fully releases, it's gonna be a slaughter until the next generation of image gens come out
still, this is very upsetting
Bing makes odd looking adults
Bing and MJ are basically one trick ponies
oh that's so copium 
They do a lot of stuff decently, but nothing really great
i am not a fan of their closed business model but i recognise they're amazing
i'm just confused about DALL-E being so good through Bing, but not through the OpenAI API
I have yet to see MJ or Bing do anything too impressive, personally
dude, why is OpenAI even named like that? almost everything they made is closed-sourced.
this is made using ElevenLabs TTS (cloned my own voice, and it kinda i guess, sounds like me) and a script is written by GPT4 and the images are prompts created by GPT4 based on each line of the script using DALL-E 2
Like sure, they can do cool one off gens with decent quality pretty reliably, but they stick to decent, never going into excellent
it's a php script that makes all of this automatically and i don't plug it all in together. i could plug into stable diffusion, but believe it or not, it's easier to use DALL-E2 for newcomers
same, but i bet if someone like MJ releases their model, we can make FAR greater images on our own.
MJ will never do that
Besides, their stuff is gonna be massively behind when SDXL drops
yeah. there's no putting that cat back in the bag
and Midjourney is likely working on stuff using cutting edge research, the same research being used to develop SDXL. so i wouldn't necessarily put them down to pasture just yet
thats what we said about NAI, and here we are.
besides. why wouldn't you want Midjourney to beat SD? that encourages SAI to do better. look at what's happened to NVIDIA if you want to see what an echo chamber does
I'll hold my reservations on midjourney. They just have yet to do anything that inspires confidence, IMO
confidence in what, the dream of them releasing their model?
Oh no for sure, I do want competition to continue, but as of now, I don't see MJ as any form of competition
i'll agree to that but people like Ivan keep uploading millions of midjourney image datasets to Kaggle, so, i don't really need them to
Confidence in them releasing something truly impressive, like MJ v6 or something. Sure MJ v5 was a big improvement, but all it really did was close the gap to the now ancient 1.5 models of yester eon lol
@smoky oak but you're a niche user, an enthusiast. you have a 3080 GPU which places you already in a slim minority of users that are not only on a PC but also have a GPU and more than 6GB VRAM
yeah but for most of the world they are currently the best. though now that i'm playing with Bing, i'm like, why are people thinking Midjourney's so much better
There is no reason to use them when I have access to theoretically infinite expansion limited by just time
well clipdrop has SDXL i suppose
and DreamStudio has it as an option..
but those aren't for me
There are also other services I would recommend as well
Ah, those are the cut down and neutered versions
other services that likely use Stable Diffusion internally?
i don't know why i hate that so much
probably because i'm a "power user" and i'm like, if you're going to dangle a carrot over me at least make it unique
I supported Wombo Dream for a long time. Their service is fast, has no credit limits, has dozens of ever changing styles, has decent results, and costs a fraction of MJ
interesting. all of my friends are happy with MJ or BlueWillow 
Their subscription even comes with access to a discord bot version that has no NSFW filtering
Can't do it in app, cause that's against app store rules
Sure, their gens aren't as good as MJ, but you can get dozens of them in the time it takes MJ to do 1, and it costs like $7/m IIRC
And that's not exaggerating, BTW, I mean dozens lol
oh that's another dimension about MJ i don't know about. i have never sat there and watched it generate
i have no idea how long it takes
Their system uses SD. But also has some additional layers and latent upscalers built in
all of the language model layers to MJ likely add processing time though
Back when I was with MJ, a single V3 gen would take about 1-2 minutes
And that's assuming there was no wait time
that could just be queueing and capacity but still sounds horrendous
No, that's with no queue lol
well also back then pytorch2 hadn't been released/fixed, right
That's how long the gen itself would take
It would be like 30 seconds to get 4 low res previews, then you pick, and it would be like another 1-2 minutes of upscaling
And then you could upscale to max, which was like 5 minutes
Back then it was 512x512 to 1024x1024 to like 1400x1400 IIRC
And it was not good at all
Lol
yeah i've seen those datasets 😄
that's how we got OpenJourney
i wish prompt hero would do a v5
They dropped their not completely trash upscaler as I was leaving, right around when they started being super scummy and problematic
Which is why I left them
that could be so good with their expertise tbh
they seem to have fallen asleep on it
like it was a proof of concept, and a publicity stunt
not something they wanted to seriously improve or keep up with
I wonder how much longer I would have been with MJ, if they weren't so shitty
if they had a good desktop UI and fine-tuning APIs for cheap/free (because they can easily just, add your tuning to a library and profit from it)... it probably woulda saved you money on a GPU
there's absolutely a business model that MJ could do and dominate with, but they're afraid
I would have switched to SD eventually, it's just a much better platform for serious generators
poor fellows.
yeah it's like the Threadripper of image gens
you can get a lot out of it if you know what you're doing and everyone else thinks it's a scam / broken
Oh God, the amount of times I got that lmao
my mom was freaking out about AI and hated it until i started sending her obese versions of celebrities
this is the common person
🤣
Like when we held that big MJv4 vs MJv5 vs firefly, vs dalle, vs bing, vs SD comparison, and my image wiped the floor with them
Dudes response was "it took more work, so it's fundamentally worse"
Like
What
oh i get that though. from his eyes, it's true
my coworker comes along to generate images and his prompts are really bad. not just in content, eg. what he wants to see. but how he asks for it
i can't even paste any of his example prompts lmao
um, wtf, since when can OperaGX make prompts
i swear this is not a joke
it's a working prompt generator, i just noticed
crazy
these were the 3 shared at the time. first is MJv4, then bing, then firefly. prompt was like "futuristic city with art deco inspired architecture"
i asked GPT4 to generate prompts yesterday and it googled itself and found a midjourney prompting guide on reddit and used that to help me 😂 i died
and i shared this as my contribution
i think it's using a1111, it shows me the tokening process, is this a collaboration of Opera and a1111
And he said "it's higher resolution, so it's not a fair comparison"
To which I said, all of the others were max res, so I decided to share a decently high res images from SD
it's literally connecting to your A1111
People agreed, but he didn't lol
Oh wow lol
i mean, you have Automatic running already, right? or do you not use that
if you don't use it usually, then yeah it seems like Opera, uh, has a complete copy of PyTorch
yes i am, when i turn it off or leave the webpage it disappears wtf
that has to use a shitton of disk space, man
oh
so Opera has some new feature for A1111 users
that's pretty slick
i should make an extension for Chrome for my pytorch APIs and charge people for generating prompts 
OperaGX at least
god that sounds like a pain in the ass
i don't want to have to have a billing relationship with users
i should just, like, start a whole company and hire people who love dealing with stupid shit like money
Honestly, the things I would do to get hired to just mess around with AI and do whatever my boss tells me to do with it
Pay me to break things, god damn it

I need to resume my python courses
Yeah, bing and MJ win in super niche stuff, not heavy hitters
i'd ask for the latest SD1.5 model so i can run some tests but i have a feeling i'll just see the same issues i usually see, just less severe - and i'm referring to faces in groups of people. this seems to be due to the small internal representation of the image in 64x64 or 96x96 (for 1.5 and 2.1-v respectively)
i can make one, two, four subjects with great faces but once i have a row of men standing side by side i'm like what the actual fuck 
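to put rough numbers on the latent size point: a small worked example, assuming the standard 8x VAE downscale for SD 1.x/2.x (which is where 64x64 and 96x96 come from). The face sizes below are guesses for illustration, not measurements.

```python
VAE_DOWNSCALE = 8  # SD 1.x / 2.x latents are 1/8 of the pixel resolution


def latent_side(pixels):
    """Side length of the latent for an image that is `pixels` wide."""
    return pixels // VAE_DOWNSCALE


def face_latent_cells(face_px):
    """Roughly how many latent cells (per side) a face of face_px pixels spans."""
    return face_px / VAE_DOWNSCALE


if __name__ == "__main__":
    print(latent_side(512), latent_side(768))
    # Hypothetical face widths: a close-up portrait vs one face in a row of people.
    for face_px in (256, 40):
        cells = face_latent_cells(face_px)
        print(f"{face_px}px face is about {cells:g} x {cells:g} latent cells")
```

a 40px face in a group shot only gets around 5x5 latent cells to describe eyes, nose and mouth, which would explain why close-ups look fine and rows of people don't.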
Yeah, realism models can struggle hard with same face syndrome
Or distorted voids where the face is, until you upscale lol
1050 steps into fine-tuning
Linkin Park? Lol
it's much better than baseline but like, this seems to be some sort of ceiling on quality of faces in a group
if it makes them closer-up, the faces are better
i kept the prompt the same throughout training just to see if i fixed it at all. i did not. this is 5400 steps
's why i'm dying to know more about the SDXL architecture. was asking questions and got "i can't tell you that yet"
i need to know. is it still 64x64
how the fuck does it make faces work
what is its text encoder, how is it being trained, how many times have they had to restart from scratch
can you put toothpaste back into the tube, metaphorically speaking, once a model is over-trained? and bring it back?
i've never tried. once i see it, i just delete
yeahhhh i've come to realise that 'hires fix' just remains checked and enabled in a majority of people's A1111 installs, and i technically don't have that available to me (yet)
It's just img2img
@oak osprey do you have img2img? If you do, you have high res fix with like 2 more steps
High res fix is just an img2img operation
hi, does diffusers have this: https://github.com/Kahsolt/stable-diffusion-webui-hires-fix-progressive I realized that there's a lot of other features that I was not aware of when seeing this co...
love how a doge shows up
LMAO
why
thank you!
i should look into how to set up GANs in diffusers. it's probably not hard
Using this guide I was able to create some cool QR's https://www.youtube.com/watch?v=IntRn96C4l4, but is there a way to instead of passing just text, provide lets say a logo and the QR goes with that?
In this video, I explained how to make up a QR Code using Stable Diffusion and ControlNet. I hope you like it. (Creating QR Code with AI) Please don't be a shy to show your likes and follow me if you like the workflow.
Notes / Updates:
- Don't need to select "inpaint_global_harmonious" as Preprocessor in ControlNet tabs, "none" is okay for thi...
I am testing with additional ControlUnits including a sample logo , but results are kinda bad
If you get that working, then you have fully functioning high-res fix
All it is is a pixel upscale into an img2img
I don't even use pixel upscalers for my high-res fix anymore
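Here's the shape of it as a sketch, assuming you already have your own txt2img, upscale and img2img functions. The three callables and their argument order are hypothetical placeholders, not the A1111 or Diffusers API; the upscaler can be a pixel upscaler or anything else that returns a bigger image.

```python
def round_to_multiple(value, multiple=8):
    """Snap a size to a multiple of 8 so it maps cleanly onto the latent grid."""
    return max(multiple, int(round(value / multiple)) * multiple)


def hires_fix(txt2img, upscale, img2img, prompt,
              width=512, height=512, scale=2.0, denoise=0.5, steps=20):
    """Two-pass generation: low-res txt2img, upscale, then img2img on the result.

    txt2img(prompt, width, height, steps) -> image
    upscale(image, width, height) -> image
    img2img(prompt, image, denoise, steps) -> image
    All three are supplied by the caller (hypothetical signatures).
    """
    # Pass 1: compose the image at the resolution the model was trained for.
    base = txt2img(prompt, width, height, steps)

    # Enlarge it; composition is already locked in at this point.
    target_w = round_to_multiple(width * scale)
    target_h = round_to_multiple(height * scale)
    big = upscale(base, target_w, target_h)

    # Pass 2: partially re-noise and denoise to add detail at the new size.
    return img2img(prompt, big, denoise, steps)
```

The denoise value is the main knob: low keeps the upscaled image mostly as is, high lets the second pass repaint more of it.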
the Real-ESRGAN?
poor dang ol' controlnet ruining my awesome model 
https://youtu.be/pvcTmnbiv2Y?list=PLDm3Ufb_Ohkhj5nc2jMFXAqO6LIPci7GI
ahhh so this is the song from always sunny where Frank loses it on the dog tranquilizers
the comments 😁
@smoky oak so real-esrgan was trained on purely synthetic degradations applied to clean images, not on real low-quality photos at all
I want to remove photorealistic from existence
my fav is when I see people prompt an image generator like its taking requests "please make me a castle on the beach"
you can do instruction fine-tuned image text encoder. "make the balloon red" becomes a viable prompt lmao
oh yea well aware of that. I am talking about the people using it unaware into say base SD or MJ lol. Been looking at far too many prompts recently
oh, true, i forget who i'm talking to sometimes
some of the midjourney prompts in my dataset are like what the hell please ban these people 😁
in the beginning it's hard to know what helps contextualise for the model and what is just a waste of token space
hahah all good. I think instruction tuning is pretty neat, love the concept, but as with all things it ofc has the chance to "corridor" in concepts and make it hard to achieve certain outcomes. it's honestly why I was more of a fan of the original prompt2prompt which instruct used, it took more effort of course but had much more fine grained control
also I love that controlNet just distilled the dataset into a 1.1 model for the architecture, that to me is real neat
absolutely. I still have records of all the nonsense I typed back in vqgan days haha
i don't like how much of it is "just use automatic1111", i would rather it be in Diffusers so it is force-multiplied for eg. InvokeAI to benefit from
auto1111 wen
yea, I have been using comfy a whole bunch recently which is a fantastic interface. That said true integration into say diffusers is best for all to accommodate all the structures easily
and controlnet seems uninterested in 2.1
is there a way to pass as input a screenshot of a Dashboard like this one and get an output of a 3d variation of it? without affecting too much the content
I think Thibaud trained a good set of them on that model, not the 1.1s of course. We are trying XL controlnets which are a whole thing in themselves lol
they won't tell Thibaud how to train the Tiles model
it's the only one i even care about lmao
Canny is pretty cool, but Tiles helps fix a lot of fundamental issues
haha I was looking for the details on that one too at a point
tiles fundamentally is an awesome model concept
yea full cnet context, totally cool
this came out as a validation image from training and i've been unable to really reproduce it in a typical context
the prompt was testing
😄
hahaha thats a pretty great test
idk man. the same batch kicked out bluetooth ATV and autonomous companion drone
and yea training samples vs actual samples is like a whole thing. its a good indicator but I need to try stuff myself at scale for a bit for any real idea of how something feels hah
those are pretty sweet! 2.1 right?
yes, always 2.1 from me
it's taken idk 2 months to figure out how to train it properly
i have something like 100 prompts i run through to get a good surface area coverage of what it's doing at each step and i generally select a ckpt by finding the most aesthetic / detailed / non-blurry test results that are the most consistent across the board. i use the earliest ckpt i can that still has the results i want, just to avoid overtraining
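roughly what that selection rule looks like if you turn it into code. This is a simplified sketch: it assumes each test image has already been given a numeric quality score somehow (by eye or by an aesthetic scorer), it only looks at the mean and ignores consistency, and the step counts and scores below are invented.

```python
from statistics import mean


def pick_checkpoint(scores_by_step, tolerance=0.02):
    """Return the earliest step whose mean score is within `tolerance` of the best.

    scores_by_step: {training_step: [score per test prompt]}, higher is better.
    """
    averages = {step: mean(scores) for step, scores in scores_by_step.items()}
    best = max(averages.values())
    good_enough = [step for step, avg in averages.items() if avg >= best - tolerance]
    # Earliest acceptable checkpoint, to limit the risk of overtraining.
    return min(good_enough)


if __name__ == "__main__":
    # Invented scores for four checkpoints over two test prompts.
    scores = {
        450: [0.50, 0.60],
        900: [0.79, 0.80],
        1500: [0.80, 0.81],
        2550: [0.70, 0.75],
    }
    print(pick_checkpoint(scores))
```

here 1500 scores slightly higher, but 900 is within tolerance, so the earlier one wins.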
have to freeze most of the text encoder. i've never tried adding a new random layer and training that. i don't know how
you ever merge your checkpoints from the same run?
i don't merge things, i have yet to write any code to do it.
id say give it a shot if you can. it continues to surprise me what it does merging slightly undertrained ckpts all the way through concepts starting to burn in
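for reference, the simplest kind of merge is just a weighted average of the weights, key by key. A minimal sketch with plain dicts of float lists standing in for real state dicts; `merge_checkpoints` is a made-up helper, and actual merge tools may do more than this.

```python
def merge_checkpoints(state_dicts, weights=None):
    """Weighted average of several checkpoints with identical keys and shapes.

    state_dicts: list of {param_name: [float, ...]} (toy stand-ins for tensors).
    weights: optional list of merge weights; defaults to an equal blend.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight per checkpoint is required")

    keys = state_dicts[0].keys()
    if any(sd.keys() != keys for sd in state_dicts):
        raise ValueError("checkpoints must have the same parameter names")

    total = sum(weights)
    merged = {}
    for key in keys:
        length = len(state_dicts[0][key])
        merged[key] = [
            sum(w * sd[key][i] for w, sd in zip(weights, state_dicts)) / total
            for i in range(length)
        ]
    return merged


if __name__ == "__main__":
    early = {"layer.weight": [0.0, 2.0]}   # e.g. a slightly undertrained ckpt
    late = {"layer.weight": [2.0, 4.0]}    # e.g. one starting to burn in
    print(merge_checkpoints([early, late]))
    print(merge_checkpoints([early, late], weights=[1.0, 3.0]))
```

checkpoints from the same run share an architecture and starting point, so averaging them like this at least lines up key for key.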
i've managed to avoid burn-in with a learning rate of 1e-8 and a batch size of 150
also, freezing most of the TE
it's been running for 4 days now and it'd be burnt to shit otherwise ahahaha
hahaha nice and crispy
how many images do you usually do for a run? I guess I typically see people around here doing tiny batch sizes with even tinier datasets haha
150, are you running on multiple processes?
training terminal SNR into 2.1, 450 -> 900 -> 1500 -> 2550 -> 3600 steps
i am on a single A100-80G. it is a workhorse
ooo nice, terminal snr from that paper? You doing vpred then?
i use about 54,000 images currently. i had a lot more at one point but i've nuked most of the blurry/shitty ones
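quick back-of-envelope on what those numbers mean together, assuming the batch size of 150 is the effective batch per optimizer step and that the 3600-step mark was reached on roughly this same 54,000 image set (both are assumptions on my part):

```python
def passes_over_dataset(steps, batch_size, dataset_size):
    """How many times training has cycled through the dataset so far."""
    return steps * batch_size / dataset_size


if __name__ == "__main__":
    # 3600 steps * 150 images per step = 540,000 samples seen.
    print(passes_over_dataset(steps=3600, batch_size=150, dataset_size=54_000))
```

so under those assumptions 3600 steps works out to about ten passes over the data.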
i started with 2.1-v so it started as vpred
ah nice size set!
in the sdxl-feedback channel i linked to a subset of it, the cushman kodachrome slides. i've labeled them using his own metadata that he labeled them with from 1938-1969. you guys can use that. it is public domain
it helped 2.1 figure out what certain eras looked like since the captions contain the date of the image, and where it was, and what's in the image
previously if i fine-tuned i'd lose the ability to say "1950s <prompt>" and radically change it
i imagine you have better image retrieval tools internally than i do, i recommend probing the Getty.edu APIs. they use the Linked Open Data API standard
I am not the data guy, for all I know its all already there haha, we have a few people who just focus that stuff with some great tooling
oh yeah i can imagine abusing apache toolsets like NiFi for sure for this
it's neat how much data you can cram into two layers
same model that made the kodachrome images
didn't expect that 😁
oh totally, kind of blew my mind how much you can cram into loras or other peft methods too. Some things just make a very large difference on top of these large generative models looking for any form of context haha
are you just training specific unet layers there?
i'm training all of them
ah ok, so by this you meant the encoder, I assume?
yeah
the text encoder is kind of badass like that
just, like, never try to train the first idk, 13 layers of OpenCLIP
maybe it works okay at super high batch sizes but i haven't got the horsies to try that with
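a sketch of how that freeze could be expressed, assuming the text encoder's parameters are named like `text_model.encoder.layers.<N>....` (the naming a Hugging Face style CLIP text model uses; check your own model's names). `is_trainable` is a hypothetical helper, and keeping the embeddings and everything outside the layer stack frozen is my assumption, not something stated above.

```python
import re

# Matches the transformer block index in names such as
# "text_model.encoder.layers.12.self_attn.q_proj.weight".
LAYER_RE = re.compile(r"encoder\.layers\.(\d+)\.")


def is_trainable(param_name, frozen_layers=13):
    """True only for parameters in blocks at index >= frozen_layers."""
    match = LAYER_RE.search(param_name)
    if match is None:
        # Embeddings, final layer norm, etc.: kept frozen here (assumption).
        return False
    return int(match.group(1)) >= frozen_layers


if __name__ == "__main__":
    names = [
        "text_model.embeddings.token_embedding.weight",
        "text_model.encoder.layers.0.self_attn.q_proj.weight",
        "text_model.encoder.layers.12.mlp.fc1.weight",
        "text_model.encoder.layers.13.mlp.fc1.weight",
        "text_model.encoder.layers.22.layer_norm2.bias",
    ]
    for name in names:
        print(f"{'train ' if is_trainable(name) else 'freeze'}  {name}")

    # With a real PyTorch module this would be applied roughly as:
    #   for name, param in text_encoder.named_parameters():
    #       param.requires_grad_(is_trainable(name))
```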
chatGPT does that lol
there's no publicly available information on how stability AI fine-tuned openclip when 2.x were made
ah you mean tuning of openclip on the 2.0/1 model?
I cant confirm this as I was not in the research end when 2.x was created but I believe the unet was trained purely on the frozen encoder for the base training of the model
that's wild, man
that did not work for me at all
it takes on textures from the training images, like the pixelation and blurriness. so i could see putting a super high res dataset together for just unet training but i've not tried that
oh, but that was with a lower batch size than i've come to accept is required now
haha I will say the batch sizes were likely the typical 2048 at least for that model, def a ton has changed since then in how we approach making and eventually releasing models that are hopefully way way nicer to use out of the gate
are you using any attention-based loss calculations like the SmartFRZ paper described? 
looking the paper up
The core component of SmartFRZ is a lightweight attention-based predictor. This component decides which layers will be frozen and when to freeze them during the training process.
Training the Predictor: The attention-based predictor cannot be directly trained using conventional datasets for classification tasks, such as ImageNet. Instead, you propose a novel method to generate the training dataset for the predictor.
Offline Training: The predictor is trained offline. Once it's well-trained, it can be used for different datasets and networks, as it learns generic converging patterns from collected training histories.
that's GPT4's explanation to me, because i'm too dumb to figure it out on my own 😄
ahh this seems like a very neat idea from first glance. We are actively working to improve large scale training efficiencies but we are not using something like this right now
this honestly sounds like the bee's knees, for my current understanding of the backwards pass. but it doesn't save a whole lot of memory if you freeze alternating layers, so, it's not going to solve that
you still have to calculate them
what'd be interesting is to investigate this approach with a sequential growth pattern in parameter count. instead of initialising the full model's weights randomly as they do in the paper. you could do them one at a time. it might just be a big waste of time to try that. because i suppose the first layer could learn representations that don't work with a new layer, and it would maybe never move past the big mess it needs to clean up after adding the new layer..
because a predictor that could actually determine with good accuracy when a layer has converged doesn't necessarily need to work multiple layers at once
i've not reached out to the guy responsible for that paper, but he's on Github. maybe for your organisation he would be able to release the model weights to you
hold on now you have me on the search for a long lost paper here haha
the one i mentioned? or a different one?
different one that is kind of what you are describing a bit haha
but I do love the paper you mentioned using the models own internal self-attention on the layers to dynamically freeze them, that is a really cool idea. Def going to share it
Originally, a few reviewers raised concerns. Primarily, the original paper limited its experimental results to CNN models and image classification on CIFAR-10. However, during the discussion phase the authors provided a large number of additional results on transformer architectures, recurrent architectures, language models, other vision datasets, etc, which significantly strengthened the paper. The authors were also able to satisfactorily address other concerns related to conceptual benefits of the proposed approach.
so the paper itself is a bit weaker than the approach really is
just keep that in mind
shortens training time apparently
man i'm impressed with researchers' patience when i see they waited a whole 36 hours for a checkpoint to test

i make one every 10 minutes til i run out of disk space and then cry as i try and figure out which ones to delete
is that all
I am sitting here waiting on one for like 4 days now, still cooking hah
ever seen the movie Clockstoppers?
you could be French Stewart, in that film
we'll put you into a time anomaly chamber
omg this looks so 90s hahhaha
if u wer truly eleet, kid, u would have it inferencieng in reeltime inside the training loop without ne1 knowin'

@oak osprey this actually looks pretty great for 2.1 https://www.reddit.com/r/StableDiffusion/comments/146272w/freedom_is_here_the_generalist_21_768x_finetuned/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=share_button
I'll have to download and try it when I get home
"our" huggingface demo always makes me sus
who tf are they lol
but i just remember that Anna Oop thing too freshly
hmm
Lol
it focused on his chest logo lmao
it's just that it's a demo img
if they're not cherry picked that's actually incredible
if they are cherry-picked, do they have eyes?

@oak osprey when I get home I'll be testing that model I sent. Seems to be the most promising I have seen
yeah that's the one from like 3 weeks ago
i'll be glad to have another model i can use to make fat celebrities to send to my mom to make her laugh
i think the "Robin Williams" it can make is by far her favourite
ahh so real, much wow
tbf i've never seen a mushroom cloud so i can not confirm nor deny that is what they look like
@smoky oak oh boy, that model is overtrained. you have to run it at CFG 5
CFG 7
i got her good with this one this morning. she had just woken up
dinner, 1940 😁
you know. THE dinner
Little Jack Horner, Pixar edition
i had to google it for a sec to make sure it wasn't a real thing
little miss muffet apparently makes a photoreal person with my negative prompt lol
How do I deal with this appearing on pictures?
Are you using high res fix?
keanu reeves' famous 1993 hot dog ice cream stand where the hot dog and ice cream are combined

mornin
let me know when AI art can surpass this 🥱
Damn, this chat has been dead for a while
what, high res fix? Helps massively for me, and all it is is img2img
Whats the best images youve seen generated by base SD?
trick question! Nobody uses base SD :p
oo a challenge
using base SD is just shooting yourself in the foot haha
more realistic than some of the 'realism' models I've seen lol
if by realistic, you mean deformed and perspectively inconsistent... then yes 😅
it has skin texture, and that's all anyone really cares about
Fair enough I guess lol
Oh wow, Deforum just released their own new node based AI workflow program made for image generation and video generation
where?
is there a video about this anywhere
it's a bit clunky to use
this creeps me out for some reason
been playing around with block weight merges of my mix and some anime models to get a bit of a cartoony version of my model
I did not know what to do so I put some random prompt someone sent me into img2img with a random meme i found on my computer, came out very nice actually
(the meme i used)
i used kawaice with 60 steps euler a
positive prompt:
masterpiece, best quality, 1girl, solo, watercolor, birdwatching, nature reserve, 20s, green hair, pixie cut, cap, blue eyes, round eyes, curious expression, binoculars, casual clothes, field guide, trees, bushes, lake, birds, binocular strap, standing pose, looking through binoculars, observing birds. nice hands, perfect hands, close-up, lyco:add_detail:0.6, lyco:hitokomoru_locon_new:0.6 lyco:GoodHands-beta2:1.0, BREAK, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame, wakame,
negative: (low quality, worst quality:1.4), (bad anatomy), fewer digits, jpeg artifacts, (extra fingers, deformed hands, polydactyl:1.5), wakame, EasyNegativeV2 FastNegativeEmbedding
Jesus, think you used Wakame enough? Lmao
Sir please don't use our Lord's name in vain in this Christian minecraft server 
Take a look at my profile again, in the AntiChrist 😈
Lol
Oh man, speaking of, mom and I just pulled into a church on Sunday just to flip a Uturn, and I think that is the most fitting shit we have ever done in our life lol
It was the only place where we could turn around lmao
Satanism is a terrible thing. You should heed my warning and reconsider the ramifications to be suffered by your immortal soul 
Lnao
*lmao
Following that by that emote is actually the best thing ever haha
The energy in that message is immaculate
Is that Ryan Reynolds? Lmao
Emmanuel Macron
Oh wow, he looks like off brand Ryan Reynolds to me haha
um what?
Pretty sure that slash dream is blocked
Cause people always use it when this server doesn't have bots like that
Interesting, that always helps for me
it definitely upscaled the images
It should be the same, all A1111 high res fix is is an upscale before an img2img
what resolution does it go from/to
I do 2x upscaling, and I don't use pixel upscaling anymore, just raw img2img
That's not actual SDXL, that's why lol
That's SDXL untrained
it's still annoying how the bot randomizes settings and gives you total garbage
It's a beta, just a little play thing
yes, i know how betas work
When the full version comes out, it's gonna be insane. Cause they are going from 2.1s terrible data set to their new massively improved data set with all of the new technologies and such
All that's the same between the one on the bot and the real one is the amount of parameters
it's just discouraging that the latest state of the art still has all/most of the same issues
the way they train, they've already frozen their text encoder
I mean, it's using old tech and old data sets, only new part is the parameters
these hand issues won't go away
they've already chosen their dataset, lmao
why do you think they're training it for a month+ on the wrong data
it's using their new UniPC sampler etc, they wrote it internally
The version we are using isn't trained on their actual data set
i was talking to TwoDukes about it
i gave him a bunch of new data for it, but they're already past that
Even just the quality results from the 50% trained checkpoint are astronomically better than any other image generation models I have seen, and I can only imagine how good it will be at 75% or the final output
Unlikely any of us will be able to run it tho
But still
It's a good proof of concept, which makes me even more hopeful for SD 3.0, which is taking the further improved better parts of SDXL and cutting it down to a size we can all run like 1.5 and 2.1
it runs currently on a 3090 just fine
from the sounds of it, they are not having luck distilling the model
They themselves said that SDXL is kinda just a tech demo, and is being used to make SD 3.0
every time they've distilled it, they lose a massive chunk of the model's ability to generalise
Interesting
distilling it is like quantizing it down to a set of minimally required parameters
it just seems like the way model weights work, they're ALL required
Yeah
you train them all, why get rid of some
Interesting
they also told me to start merging checkpoints from a single training run
so i assume they're doing that too
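(one simple way to read "merging checkpoints from a single training run" is a plain uniform average of the weights. that's my assumption, nobody said which merge method was meant. sketch with weights as plain lists of floats so it runs without any framework; the checkpoint names and values are made up.)

```python
def average_checkpoints(checkpoints):
    """Uniformly average checkpoints that share the same keys and shapes."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints must have identical parameter names")
    count = len(checkpoints)
    merged = {}
    for key in keys:
        # Walk the same position across every checkpoint and take the mean.
        columns = zip(*(ckpt[key] for ckpt in checkpoints))
        merged[key] = [sum(values) / count for values in columns]
    return merged


# Hypothetical checkpoints saved at two points of the same run.
step_1000 = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
step_2000 = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
merged = average_checkpoints([step_1000, step_2000])
print(merged)
```

with real state dicts the same loop would run over tensors instead of lists.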
What I was told some months ago, which could very well have changed, is that SDXL is more of a tech demo to help select new tech for the more condensed, but still massively more capable 3.0 release, where they take all the good and further cut down the bad. They said it won't be as capable, but the parameters it does have should be even better reinforced and capable for most gens
they also said that internal experiments are showing that LoRA in 1.5 are roughly equal to Textual inversions in 2.1, and that LoRA are more powerful in 2.1 than in 1.5, but that in SDXL, textual inversion beats LoRA in 2.1, and LoRA in SDXL is like fine-tuning 2.1 or 1.5
Yeah, that makes sense. That's the secret to offset noise that I guessed about a while ago. Ended up being right, but we don't really need offset noise anymore haha
there's multiple "secrets" to offset noise, but yeah the first such secret is to stop using it now 😄
Real-ESRGAN is yucktastic
All of the models I have now have some level of offset noise, which serves my need for most generations
it's hard to train with, while still ensuring you absorb all the concepts you want to
Yeah, something must be fundamentally different in your pipeline to a1111's for upscaling... Hmmm
the offset noise eventually becomes splotchy
"my name is Mike Rotch. take me to your leader"
I wonder if I could train a LoRA for Alexandres extremely over the top earrings lol
hard to tell if things are improving still
Imagine being able to put these on anybody lmao
you're going to need better images of them than that
😛
i found a dataset for people in groups
I wear anything more than large stud earrings and my ears hurt
I could only imagine how damn heavy those Alexandres earrings are lmao
i doubt they're made of high-impact steel
do you have vitamin deficiencies 😛
i could probably have a toddler hanging off my ears and it would be fine
lol
Ahhhh, I googled it
They are silver plated leather, much less heavy than solid silver
They are $3000 💀
Can't find any info on their weight lol
Lmfaooo
that group is how i discovered bing not only has image gen but its incredible
lmao
i think 95 percent of people in there use bing
Bing is just the right amount of unhinged
theres a minority of kandinsky users somehow and i see very few sd posts
so @smoky oak img2img requires pixel upscale 
Never even heard of kandinsky
as far as i can tell, they're doing a Pillow resize before putting it into the img2img
PIL.Image.resize()
if you don't define an upscaler
that works for you? that's what i was doing with Controlnet Tiles
All that really does is just stretch the image to the size of the canvas you stated, it doesn't do anything in terms of generating new info or anything
true, yes
It's like taking a 512x512 image and zooming it out to fix a 1024x1024 canvas, then it does Img2img that way
*to fix
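(tiny sketch of the "it just stretches the image" point, using nearest-neighbour repetition on a grid of numbers. this is a simplification: a real resize may interpolate between pixels depending on the filter, but either way nothing new gets generated until the img2img pass.)

```python
def stretch_nearest(pixels, scale):
    """Upscale a 2D grid by an integer factor by repeating each pixel."""
    out = []
    for row in pixels:
        # Repeat every value horizontally, then repeat the whole row vertically.
        wide_row = [value for value in row for _ in range(scale)]
        out.extend(list(wide_row) for _ in range(scale))
    return out


small = [[10, 20], [30, 40]]
big = stretch_nearest(small, 2)

# Same set of pixel values before and after: bigger canvas, no new information.
print({v for row in small for v in row} == {v for row in big for v in row})
```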
yeah for controlnet tile you condition the image so that one side (the smaller) is 1024px
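(sketch of the "smaller side is 1024px" sizing, keeping the aspect ratio. the function name and the rounding choice are mine, not from any particular pipeline.)

```python
def fit_short_side(width, height, short_side=1024):
    """Scale (width, height) so the smaller side equals short_side."""
    smallest = min(width, height)
    new_width = round(width * short_side / smallest)
    new_height = round(height * short_side / smallest)
    return new_width, new_height


print(fit_short_side(768, 512))
print(fit_short_side(768, 768))
```

some pipelines also want both sides to be a multiple of 8 or 64, which this doesn't handle.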
I don't even use tile upscaler anymore
i've tried different resolution/aspect ratio for controlnet tile and it goes total trash
I just use high res fix, it's high enough resolution for what I need for my current gens
I do go higher with realism
But I'm not doing realism currently cause it's deeply disappointing compared to other things SD can do haha
yeah sd excels with weird shit
not realism
thats why the demo image is an astronaut riding a horse
in fact sd is good at the stuff 3d rendering is good at, likely for the same reasons
I use basically only the two anthro models I have now, cause they work so much better than any of the other models.
Actually give you what you ask for lol
when its not a person, our brains forgive errors more easily
Not sure what happened with my message formatting there lol
first day on the internet
man i think sdxl gave my cricket human hands
controlnet fixes faces at the cost of all image details
If I could find a realism or art model that could do stuff half as good as the anthro models I use now, I would probably be back to doing realism again haha
Got a link? I am in a car ride, but I can remote to my PC to download it
I just use SD in the car now lol
My own go anywhere generation service
no because im on a phone too
Ah damn haha
Is it a 2.1 model?
At this point, even Zovya's photo real feels kinda pathetic compared to these new anthro models. Can barely get them to do specific sinple things
*simple
Majicmix is good
I don't see myself using realism models again any time soon, just too restricted, which sucks cause I really liked them before. The curse of getting something better 😅
Restricted?
Yeah, they fucking suck at listening lmao
You have to fight to get a simple pose out of them
Skill issue
They are still pretty bad at hands, their realism isn't really "realistic", just reallyyy bad compared to the newer models I use now, but the new models aren't realism capable
I think the only way to get a realism model that's less of a pain to use would be to retrain one from the beginning
i told you lmao
my check point for 2.1 is a true base model that can be further fine tuned
i know because im doing it now, about 3k steps in and no burning
I doubt we will get any new models with considerably better realism before SDXL releases
Freedom is already burned and overtrained and the author says its a base model
idk why there is so much wakame lol i just took the prompt from someone else
evidently, they don't know how to prompt weight 😅
you cant prompt weight on burnt models
Wasn't talking about that
i mean whoever made the prompt could have been struggling with that
Yeah
I remember with Wombo Dream, spamming words was the only way to get their weight up lol
ooookay that's unique and cool
1.0 strength controlnet tile on "a steaming cup of coffee" from 768x to 1024x
it made it boulders on a beach up against the ocean with a building shaped like a coffee mug
i would so go to that coffee shop
but this is why i don't use CTU
just destroys all the image details
It's likely that you're not using it right
lmao it's possible. the way they say to use it is with the prompt "best quality", but i do what you told me and i pass the image prompt
the other thing is i'm using a 1.5 model for this
I do it with a modified prompt. Never had any issues with loss of detail
i'm not being a dick about 1.5 again i just think that's a problem
a modified prompt?
I just alter it to better suit the upscale
well the detail loss occurs independently of the prompt, that is more dependent on the strength. but the lower the strength the less 'fixing' it does to the image, and the higher the strength, the more 'replacing' it does
so i tend to use 0.3 strength with an already-high-res image and that works great
when you do an upscale from a low quality image, 1.0 strength works well but i've admittedly not tried like, 0.9 or 0.7 very often
I am confused, you're using the controlnet tile upscaler, right? That's kinda the whole point is that you can go as high denoise as you want and it references the original to change it little
I've used 0.8 denoise, and it's been fine and consistent
yep that's what i'm using
Weird, there is certainly something different
i have no opportunity to try a 2.1 controlnet model on the 2.1 outputs
so i'm under the assumption that this is why it sucks
i'm taking a 2.1 image and using a 1.5 model that's completely unrelated to try and add detail to it
and 1.5's idea of detail is fucking weird
it wants everything to be covered in grass and leaves
Ohhhh, going across models, yeah that's gonna cause issues
yeah 
i've been bothering Thibaud and the ControlNet guys but CN isn't interested in 2.1 or SDXL even
It's the same reason why using a different sampler to upscale is a bad idea
erm
the controlnet docs use UniPC
so i don't think that's 100% an issue every time, like going cross-model would be
I always use the same sampler to upscale as I generated with. Back in my research on ultimate upscale, I found that switching specifically between ancestral and non ancestral created a form of radial blur
oh. the ancestral samplers are pretty bad - but i'm mostly just repeating words from one of the Stability staff from yesterday, when i say that.
Still kinda upset that all of that data I collected on ultimate upscale is kinda moot now
Well, it still helps with CTU
yeah it's still a viable workflow idk what you sayin'
it just doesn't work for every little thing
You're supposed to use ultimate upscale with CTU, that's where you get the best stuff, but the research I did is in parts of ultimate upscale that aren't that important with CTU
steaming hot french toast ocean, blowing waves of blueberries onto the shoreline
The introduction of CTU with ultimate upscale got rid of most of the issues I was looking into, basically
oh, that
At the time, I had basically the best workflow around, but CTU made it so much more accessible and less of a pain in the ass to get good results lol
and that's upsetting? 
This music generation is getting pretty good...
Based off of some Contra music. Crazy.
Or Zelda 2 cave music...
It was at the time, cause I had so many people following the release of my guide on Reddit, and then I had to go through each person and tell them that all of those weeks of work were for nothing and all of their feedback was moot
wakam
Make a new workflow guide Sytan 😄
yeah
just in time for SDXL to obsolete it
i bet if you started working on your guide, just before you're done, SDXL will be released because you started working on the guide
but if you don't start that work, it'll never happen. the universe will have no reason
It's not worth it anymore unfortunately
I learned a tonnnn about how ultimate upscale works, and how it doesn't lol
I can still utilize some of those secrets for my own gain haha
We're having a day at Disneyland
Not how I was planning on spending my day, but my grandfather won tickets
you left Texas??
I was never in Texas
aren't we all, in a way, in Texas?
Certainly not
im in the Canadian part of Texas

