#🔧|finetune
yes, and thats why i never do it
I'll just do both and compare with prompts, I'm just testing everything anyway atm
okay, but if its just a style, i recommend 512x512 and just you making manual cropping, that way you choose what to cut
sort of like this
if i wanted to have the bleach manga style
yea and it's a lot of work to crop even with birme
I'll do that later when I'm more familiar with TE
do you use the auto cropping in automatic?
didn't even know that existed
thx gonna try that next then
no prob
I'll also have to redo the dreambooth training
what I trained was a complete mess but I used too many source images
I'll try separating backgrounds from characters and train 2 models later I think
yea I already deleted most
Yeah, don't do that
lol yep I did that too
what is the output like?
i should train another 90s model
it was good, tho it was trained on HN
let me see
add some armitage and ghost in the shell if you haven't already
extra stuff with mainly 90s and Bleach
i do yes
i think i deleted my older 90s results prob because im running out of space
lol yea sd sucks up space in no time
yup
its an NVME but for some reason it came divided into two sections each 500gb when i bought it
so its 1TB
that is pretty low too
Mainly Anime
I assumed as much with your outputs
I do backgrounds and textures mostly with SD
I hope that I can train a model that will help me with gamedev
pretty much
nice our tastes in anime are very similar
I only watched bleach season one after that it became really bad
pokemon I don't like that much
db same only season 1
Nah man....you talking to a man who loves Bleach rn. and the new season....WOW
got OST Manga, extra stuff too
So I assume you are rewatching bleach to create a model rn?
Nope
Not yet
im more interested in the manga style
but....
i currently have my hands full on a project
if you want to save time use blender in video editing mode and render in 1fps that gives about 1k pics for 20min
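For reference, the frame count in that tip is just minutes × 60 × fps. A tiny sketch of the arithmetic (the function name is made up for illustration, it is not part of Blender):

```python
def frames_extracted(video_minutes: float, render_fps: float) -> int:
    """Number of still images produced when a clip is rendered out at render_fps."""
    return int(video_minutes * 60 * render_fps)


# A 20-minute episode rendered at 1 frame per second
print(frames_extracted(20, 1))  # 1200
```

So "about 1k pics for 20min" works out to 1200 frames before any manual pruning.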
yup i do that
ahh sd project
yessir
lol you remind me of a friend
i have been working on her model for WEEKS!
yesterday i did a breakthrough
and im closer to my goal
to release the model publicly
and yes
it can do NSFW
lol
he likes to use yessir too like that
yep those pics are really good for sd
also you might notice she don't got a tail
its an old model
the new one has the tail
now that you say it yea but it's been ages since I've seen the movie so I didn't even notice before XD
do you also render out 3d models for poses for img2img?
i haven't.... maybe i should
my test a couple month back was not very good but with TE that should become a lot better
I think I still got the output
i also upscale with img2img
this is the original render from blender
I'll def test it again with a trained model
its an interesting concept, maybe i should do the same
with your skills it's prob gonna look a lot better
i doubt it, but it will probably depend on the prompt
if you don't know blender well just generate a model with makehuman
or download a free rigged one
then you just need to pose it add a camera in the right angle and render out an image
it's not too hard to learn posing a rig
i know blender, i just haven't gotten around to using it in a while
i used to do a bunch of texture editing, renders and NSFW stuff
but now im distracted by SD
ic
y I'm distracted by SD too
gotta do more gamedev soon
but if I can get SD to generate textures then I'll save a lot of work in the future
This is the new model, the previous one was trained on WD this one is on Furry model (ye18)
for sure
im still working out some stuff
still in WIP
i also noticed sometimes the neg prompt can disturb what you want so its best to take out the negative and generate, then use the negative and compare results
thx that is good to know
damn, now that all the different models are appearing my HDD is gonna explode with model files
wow that is a lot of time
they are prob. gonna be outdated in 3 month and you'll be able to delete them all XD
Maybe....
cries in GBs
i wish i had more TB's
its kinda sad people kicked Clip guided diffusion into the waste
These are old outputs i made with Clip
or in this case CLIP + VQGAN
and i love it
I bet that will come back at some point when everyone is fed up with the perfect looking images
i think this one was "Beach in a dream"
that is some pretty good stuff
i guess this is more of what dreams look like instead of the realism that SD provides
so is clip gone forever or just not used much?
no clue
prob not used much
i still have the app
so i can use it
but i never do
maybe a SD model trained with abstract images would output similar images
i tried putting it in automatic1111 but it wouldn't run it
even tho it's a ckpt file?
I'm working on a kind of expanded "prompt x/y" to compare multiple sets of diffusers on different configurations and a set of prompts. I'm still working on the UI, but the inner comparing function is mostly done, and I wanted to share 🙂
What do you mean?
isn't sd 2.0 stable diffusion v 2?
why tf is my cpu at 64 degrees
cpu isn't even used 10%
guess the gpu is radiating that heat everywhere
guess I'll have to change the cooling if I do this a lot
yeah, like those old amplifiers for electric guitars
holy shit gpu waterblocks cost 200$
excuse me, what's a waterblock?
it's a cooler used to connect watercooling to a gpu
thank you
my old pc was water cooled and I thought about watercooling my current one but if I did that it would cost like 500$
can't afford that atm just for cooling
I hope the model training works then I'll hopefully finish a game sometime in 2023
oh, you're making a game as well?
multiple
also, I don't get why you need special cooling for generating
my pc runs too hot
yeah haha but I mean, do you generate all the time? Like many batches per minute ?
here is one I've paused atm as it was too much work for a single gamedev: http://brahrah.6te.net/projects/MayIncAz/pic1_4k.png
but if I can get SD to generate nice pixelart textures I might pick it up in 2023/2024 again
I've switched to a really small project where I'm currently animating a worm:
that is what I should actually be doing but instead I'm making SD models ...
SD doesn't in that project
what are you working on?
I wonder if it would be possible to train SD to create diffuse textures for uv maps
hmmm
I don't know much about uv maps, but I've seen GANs doing that, so it should be possible with SD
unless you have some kind of hard requirement, it should be a matter of finetuning
do you see a reason it wouldn't work?
as for my project, hmmm
I'm working on my master's thesis, on a new idea that might become a paper
so I really shouldn't tell much of the details until I can publish or something
Already had to restart my research from scratch once, don't wanna do it again haha
First time didn't have to do with SD, it was a new segmentation technique that turned out to be crap in the end
ok, not crap, I'm exaggerating. But not good enough to publish
every uv map is differently shaped so I'm not sure if SD is gonna be able to fill them right
and generating an object with SD and cutting it out would only texture one side of an object
there is also a stretching effect in the diffuse of round objects: it looks stretched on the 2d image but normal on the 3d object
there is a lot of knowledge involved in creating a diffuse for a 3d object so I assume SD won't be able to do that
or maybe only for simple objects that have similar uv maps
yeah, I've seen networks trained specifically for that and they can do pretty well
in the context of clothing for fashion
but maybe SD doesn't have the specific knowledge
yet, maybe it does. Who knows 😛
guess I'll have to try it sometime
what I guess could work is img2img for alternative diffuses
yup. I've yet to see someone fine tune SD with image prompts instead of text captions
would be interesting
so you could finetune with img2img
first I need to train a good pixelart model
that is prob. gonna take a couple of weeks
if my pc didn't run so hot I could train SD and use blender at the same time
sorry for delay
quite the heavy user, huh.
so your focus is the pixelart model? You seem like the type that does a lot at once, honestly
which is always dangerous 😛
i accidentally img2img an image with no prompt for upscale....and ngl it looks better than i would have done with a prompt
original
no prompt^
for some reason no prompt just made it work
no
sometimes I have problems in img2img when I use the same seed as for the lowscale version
ok
huh interesting
i did an upscale with the same prompt as original and it looks worse
No prompt is better?
idk i just see the no prompt has no artifacts
maybe you wanna ask this in #1045349359044280360. I, for one, have no idea why this happens
im not using 2.0
ohhh
just a custom model and img2img
yeah, I thought so because the 2.0 comes with an upscaler. Got it now.
yeah but the upscaler still doesn't work i think
@weary knot how is that dangerous? Anyone that doesn't learn how to use AI now will have to learn it later. My assumption is that in 5 to 10 years digital artists that can't use AI will be out of jobs.
oh no, I mean doing a lot of things at once is dangerous
because you end up not finishing many of them
and getting distracted
not you specifically, anyone who does that
ahh yea that is true but the only distraction I have atm is SD so I'm not too worried
My latest Dreambooth model creates images of objects taped to walls: https://huggingface.co/ProGamerGov/Object-Taped-To-Wall-Diffusion-V1
Genius!
As for why? The question you should really be asking is "why not"?
Can you tape her to a wall?
lol, sadly not yet. It doesn't work so well with taping really large objects to walls
im liking the new DPM++ SDE Karras sampler
ahahahha that's awesome!
how did you do that, though?
your workflow
🍔
I acquired photos and paintings of objects like pizza, wine, bananas, cans, phones, cameras, bags, dildos, etc... taped to walls (22 images in total), and then trained the model on it
To increase the training dataset size, I'd need photos from people of them duct taping objects to walls as I've exhausted the search online for such content
that those images exist to begin with is the weirdest part
so it's like a style, in the end
why not do it yourself?
tape them
I could also try to get Redditors to do it for me: https://www.reddit.com/r/StableDiffusion/comments/z67zn8/my_latest_dreambooth_model_release/
Worried about damaging my walls with the duct tape
ahh
i have cement walls so i don't need to worry
I..... LOVE the freckles here
she looking like she has shark skin
was it also a simple dreambooth training, or did you do anything special on this model?
simple, just tags, and everything else normal, its the same model as earlier
tho im using the new sampler
DPM++ SDE Karras
i just realized, those are some perfect fingers
indeed
yea I think I should do 512x512 next
I did not expect to get abstract art when using 768x432 instead of 512x512
or was 10k steps for an embed too low?
nah man....its the res
besides SD is bad at lower res than 512
thats why people always do 512x512
how do I generate regularization images? I only downloaded 1k for persons but I have none for backgrounds.
maybe if you search discord you'll find a set you can use
I just forked a dreambooth notebook with Stable Diffusion V2 enabled and wrote a guide to go with it!! I hope this helps folks out 🙂
https://twitter.com/kaliyuga_ai/status/1596955332181655552?s=46&t=NFRtT8ATnPZRMb4dNQ8_NA
Ok! By popular demand I forked the DreamBooth notebook I use to have SD V2 enabled, and I wrote a guide to go along with it! I hope folks find this useful and fun!
Notebook: https://t.co/ymeiSpYzZk
Guide: https://t.co/HiIppaOiCj
:)
#stablediffusion2 #dreambooth #stablediffusion
what is a DreamBooth notebook?
So you don't generate them but collect them?
not sure, actually
Hmm, I have a new issue with the DB extension of a1111 since the last update, I didn't have it before: Error completing request
Arguments: ('MyModel', '', 'D:\SD\SD-learn\input512', 'D:\SD\SD-learn\Regul', 'photo of a RobertLF person', 'photo of a person', '', '', 1, 7.5, 40, 2371, 512, False, True, 1, 1, 1, 3000, 1, True, 1e-06, False, 'constant', 0, True, 0.9, 0.999, 0.01, 1e-08, 1, 5000, 5000, 'fp16', True, '', False, True, '75', True, False) {}
Traceback (most recent call last):
  File "P:\SD\stable-diffusion-webui\modules\ui.py", line 169, in f
    res = list(func(*args, **kwargs))
  File "P:\SD\stable-diffusion-webui\webui.py", line 58, in f
    res = func(*args, **kwargs)
  File "P:\SD\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\dreambooth.py", line 331, in start_training
    trained_steps = main(config)
  File "P:\SD\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 649, in main
    unet = UNet2DConditionModel.from_pretrained(
  File "P:\SD\stable-diffusion-webui\venv\lib\site-packages\diffusers\modeling_utils.py", line 471, in from_pretrained
    for param_name, param in state_dict.items():
AttributeError: 'NoneType' object has no attribute 'items'
Is there a missing argument?
Oh well, there was another error above. It seems the model directory structure and files were not created correctly... These two subdirs were missing from the model directory:
so I copied them from a previous model...
and this file was corrupted (reported as "non zip file" in an error)
So I copied it from a previous model
and now it seems to work...
At least the log says it is training
Any v2 models out yet?



Has anyone come across any textual inversion notebooks for 2.0 yet? Or can I just chuck the path to 2.0 in https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb and it'd just work?
@hot creek I had good results setting it to 0.000001 not sure how much 4e-6 is.
It's 0.000004
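If the scientific notation in learning-rate fields is confusing, Python can print it as a plain decimal. A quick sketch (the helper name is just for illustration):

```python
def as_decimal(value: str, places: int = 6) -> str:
    """Render a number written in scientific notation as a fixed-point decimal string."""
    return f"{float(value):.{places}f}"


print(as_decimal("1e-6"))  # 0.000001
print(as_decimal("4e-6"))  # 0.000004
```

So 4e-6 is four times the 0.000001 rate mentioned above.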
during fine tuning?
no, on training
train with 512x512 or any 1:1 ratio
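If manual cropping is too much work, a centered square crop is the simplest automatic fallback for getting a 1:1 ratio. A rough sketch of the box calculation in plain Python; the function name is made up, and the result is in (left, top, right, bottom) order so it could be handed to something like Pillow's `Image.crop` and then resized to 512x512:

```python
def center_square_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered square in the image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# A 768x432 frame keeps its full height and loses 168 px on each side
print(center_square_box(768, 432))  # (168, 0, 600, 432)
```

A centered crop can cut off the subject, which is the reason for the earlier advice to crop by hand when the framing matters.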
HUGE update coming to my Legend of Korra model soonTM! Featuring training on 768x768 pixel size (on 1.5 SD!), ALL characters, experimental caption method, manually pruned dataset from 30000 images....
My currently released v1.0 Legend of Korra model (https://huggingface.co/ai-characters/4elements-diffusion) has some big issues with caption overload, inflexibility, overtraining, only Korra trained, etc... but I have recently come across the EverDream repo by Freon that allows for training of up to 768x768 pixel size, up from the standard 512x512, and which I want to use for the rework of my model.
So I took this opportunity to redownload the entire original 30000 image dataset and manually prune it this time around. It will have taken 5 days now to fully prune it down to 5000-10000 images.
I will also include more fanarts and cosplay photos.
I am also going to try out a new caption method where I only caption a small set of the images but very detailed, and then the rest of the images will serve more as padding. E.g. those images could be captioned as say "screencap from token in the token aesthetic". My thinking here is that the model will be smart enough to link the captioned character images to uncaptioned character images in the wider dataset, as it will know who is who from the captioned images. If this works it will massively reduce my workload while massively increasing likeness and flexibility by providing a ton more data of the characters and style for the AI without overloading the captions.
I will also now feature all major characters and some minor characters (including Naga and Pabu!) in all outfits, and no longer just Korra!
I will still train this new version on 1.5 SD, but I may train a 2.0 SD version too sooner or later.
The fact that there will be a ton of screencaps featuring hands likely means that hand generation with this model will be very good too!
However, as this manual processing all takes a lot of my time and attention, I probably won't be able to start training for a few days. I also have final exams starting Thursday, which will slow things down further.
I will release the pruned but uncaptioned dataset later today
or tomorrow depending on how long the fanarts will take
will be in original aspect ratios and resolutions though as i intend to use this dataset for 768x768 training
but i put 5 days of pruning work into this. around 8 hours per day or so.
Hi! Trying to run Dreambooth on 2.0 using the "accelerate launch train_dreambooth.py" from Shivam Shrirao repository but I get this error. What did I miss?
transformers.utils.hub.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/stabilityai/stable-diffusion-2/resolve/main/config.json
The URL it's trying to get to doesn't exist
Yeah.. but why does it try to get that url? My only parameter is stabilityai/stable-diffusion-2 as the name of the source model, and it fills in the other parts of the url by itself... So maybe I have a wrong version of some module somewhere?
There were some issues on huggingface a bit earlier. Perhaps it's just busted.
Don't think so... there is indeed no config.json file in the root directory of the distribution https://huggingface.co/stabilityai/stable-diffusion-2/tree/main (and there doesn't seem to be a resolve/main directory either). The only json file is model_index.json
does Shivam's fork support 2.0?
with updated diffusers and transformers it should train on the 512-base model just fine.
ah
im trying to make an embedding for a subject, does anyone know a tool to download all images from google images?
is there a nice tool so that making a dataset is less of a pain?
curl
"Error completing request
Arguments: ('harrisondreambooth3', False) {}
Traceback (most recent call last):
  File "C:\AI\stable-diffusion-webui\modules\call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "C:\AI\stable-diffusion-webui\modules\call_queue.py", line 28, in f
    res = func(*args, **kwargs)
TypeError: start_training() takes 1 positional argument but 2 were given
Traceback (most recent call last):
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
    output = await app.blocks.process_api(
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 983, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 913, in postprocess_data
    if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: tuple index out of range"
Does anyone have any ideas how to solve this error while training dreambooth?
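For anyone reading the traceback: the TypeError just means the caller passed more arguments than the function accepts. One possible reading (an assumption, not confirmed here) is that the webui and the extension are on mismatched versions. The stand-in function below is hypothetical and only reproduces the same class of error; it is not the extension's real `start_training`:

```python
def start_training(model_name):
    """Hypothetical stand-in that accepts a single argument."""
    return f"training {model_name}"


try:
    # The UI in the log passed two values: ('harrisondreambooth3', False)
    start_training("harrisondreambooth3", False)
except TypeError as err:
    message = str(err)
    print(message)  # start_training() takes 1 positional argument but 2 were given
```

The second IndexError in the log looks like a follow-on from the first failure, since the UI did not get the outputs it expected.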
You’re doing something wrong. It takes me 2 days to train an embedding on CPU
how many steps do you use
6K
That sounds like overkill
i just left the default values
What are you training out of curiosity
burgonet
has anyone got dreambooth extension to automatic1111 to work with sd 2.0 768 ?
2 Days still sounds like a lot
do I need to care about the hypernetwork thing?
100k steps sounds like it should take several hours
(just to be clear I am just asking if its possible yet, not asking to be told how to do it.... i mean that would be nice...but if i know its not possible yet I will stop looking for how lol)
5k steps?
if you have the hard drive space go for 6500k but have it output a checkpoint every 300 steps
think ill run it overnight at 10k steps or smth
when its done, test each one.... sometimes i get good results at 1200 steps, sometimes at 6k
sometimes i get good results when the sample has been bad, other times the sample has looked good but the results turned out bad
test each what?
each checkpoint
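To get a feel for how many checkpoints that leaves you to test (and store), here is a small sketch. The 6000 total is only a placeholder picked from the step counts mentioned in this chat, and the function name is made up:

```python
def checkpoint_steps(max_steps: int, save_every: int) -> list:
    """Steps at which a checkpoint would be written when saving every save_every steps."""
    return list(range(save_every, max_steps + 1, save_every))


steps = checkpoint_steps(6000, 300)
print(len(steps))   # 20
print(steps[:4])    # [300, 600, 900, 1200]
print(steps[-1])    # 6000
```

Twenty full checkpoints add up fast on disk, which is why the advice starts with "if you have the hard drive space".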
no
and with the dreambooth extension on automatic1111 you can train more steps as needed
ah assumed you were doing dreambooth, didnt think embeddings got good results? but yeah my advice is dreambooth advice as i have no experience on embeddings lol
i heard you cant merge concepts with dreambooth
i use a concept.json to train it on 3 people at once.... works rather well
the latest update to the one i use means i got to re-do the concept file though
is there a tool to make annotations less of a pain?
not that i know of, but its not that difficult
its tedious
shrug sometimes you got to put in the work to get the results
once you got a template for a concept...cut and paste is a go to.... just make sure to get notepad++ so you can check the json file is valid
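If you'd rather not eyeball it in an editor, Python's built-in json module can do the same validity check. A minimal sketch; the keys in the sample strings are placeholders, not the extension's actual concept schema:

```python
import json


def check_json(text: str) -> str:
    """Report whether text parses as JSON and, if not, where the parser gave up."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return "valid JSON"


# Placeholder concept entries (hypothetical keys)
good = '[{"instance_prompt": "photo of personA", "class_prompt": "photo of a person"}]'
bad = '[{"instance_prompt": "photo of personA",}]'  # trailing comma from cut and paste

print(check_json(good))
print(check_json(bad))
```

This only checks syntax. A file can be valid JSON and still have the wrong fields for the trainer.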
you make copilot work
I think so. Embedding size is only determined by the number of vectors you give it
maybe i wait a year or two, then all the smart and hard working people would have figured everything out
as a programmer you know that sometimes to do something simple requires a significant amount of work lol ..... hell some of my "lazy" methods ended up more work than they saved lol
have i been mistold when people have said to me to use dreambooth not embedding as dreambooth is supposedly better?
if i know something is going to be tedious, i would rather waste 10 times as much work it would take to do it manually so that i can have a bot do it for me 😂
I have got great results from dreambooth after all...but never tried embeddings
you ARE a programmer 🙂
Dreambooth is more accurate at replicating something specific but it’s less flexible usually.
ah using it for my (and mates faces) so accuracy is somewhat key
the results i get tend to be either very very exact (spooky) or recognisable as me/mates..... the ai does like to pack a few pounds onto me...yes the ai.... totally the ai ......
Embeddings don’t give that kind of result unless you crank the vector count up to absurd numbers, but then it starts requiring a lot more training data than dreambooth uses
and for reference this is training image
so you can get the same results but just longer training?
I haven’t tried training a high precision embedding yet, so I couldn’t say
so far I have trained me and mates on a lot of models, the openmidjourney being one of my favs... thats the main downside I seem to have... an embedding may take longer but if the result is the same then thats a one shot deal regardless of models
Embeddings only work on the model they were trained with
I mean I am here to find out if there is a way to get dreambooth to work for 2.0, but that will mean a few hours training again lol
considering it has never seen anyone wearing a burgonet
oh?
Sometimes there’s a little cross-compatibility, but an embedding only usually works on the 1 model
ah I have been using a couple on multiple models... but dont tend to use them often (and with mixed results)
then the only plus to embeddings would appear to be the hard drive space used?
(which tbh is starting to become an issue lol)
That definitely is an advantage
I may have to actually give embeddings a try.... compare the two methods (once its all working for 2.0/when i find how to)
can you combine multiple dreambooth finetuned models?
in automatic1111 yes... merge models...but you lose something of both when you do....hard to explain (mainly as i dont understand it)
but you are essentially influencing the first model to a percentage of the second model
personally i find it better to just train the other model the same as you did the first
yeah alright, I can't get to have the model do what I want
think ill just go learn how to draw or smth 😂
its not quite that bad lol
I just use the model i like as the source model and train the same as i did the base model
With weighted average, you dilute each input model. With add difference, you keep what is unique about both models, but you lose some of the original
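Roughly, the two merge modes come down to the formulas below, applied to every weight in the checkpoints. This is a toy sketch with single numbers standing in for weight tensors, based on how these modes are commonly described; the function names are made up and this is not the webui's actual code:

```python
def weighted_sum(a: dict, b: dict, alpha: float) -> dict:
    """Blend A and B: each weight moves from A toward B by the fraction alpha."""
    return {k: (1 - alpha) * a[k] + alpha * b[k] for k in a}


def add_difference(a: dict, b: dict, c: dict, alpha: float) -> dict:
    """Add to A whatever B learned on top of its base C, scaled by alpha."""
    return {k: a[k] + alpha * (b[k] - c[k]) for k in a}


model_a = {"w": 1.0}  # model you want to keep
model_b = {"w": 4.0}  # finetuned model
model_c = {"w": 2.0}  # base model that B was finetuned from

print(weighted_sum(model_a, model_b, 0.5))             # {'w': 2.5}
print(add_difference(model_a, model_b, model_c, 1.0))  # {'w': 3.0}
```

With the weighted sum both inputs get diluted, while add difference only carries over the change B made relative to C.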
.... There may (probably is) be better ways
Emad? think thats his name, has said better faster tools are coming to finetune 2.0
but coming is an annoying timeframe.... star citizen full release is coming.....
the eventual heat death of the earth is coming
(not sure which of those examples will be first, but you get my point lol)
nice
if you try the same prompt with say the midjourney model, does it work?
(or any other model you may have)
curious how much the model an embed is trained on has an effect
I had this with MJV4
to be honest
the amount of work
to have anything REMOTELY as good as midjourney
no i meant the midjourney model for stable diffusion
is probably worth more than the amount of $$ one would give to MJ
but unfortunately, it doesn't understand what a burgonet
is
if memory serves its the model i trained my above image on, I know i have a few models based on the midjourney one
ah havent tried with the midjourney finetuned model yet
this i generated earlier on my trained version of midjourney v4
(for some reason it added a beard lol)
actually now not sure if that was midjourney one.... i have too many models lol
that dinosaur thing does NOT have good intentions
anyway, still too much work, ill learn to draw instead xd
lol, my artistic talents extends to "deformed stickman" so I am stuck with ai
i drew when i was a kiddo
right i need go do some work, nice chatting with you guys
same
Can I dreambooth the 768 model using 512 as the training res? Or will the results be bad? (I'm running out of memory at 768 res)
How are you training the 768 one and how much vram do you have? tried before work but couldnt get the extension to work
I'm not, using google colab I can do 768, on my computer (12gb) just 512.
Thats why I was asking to use 512 on the 768.
ah, i suspect that 512 on the 768 won't work right (I found that it seems to have issues generating to prompt at anything below 768x768)
was hoping the 2.0 768 model would work ok locally on dreambooth
For hypernetwork I was able to do with auto1111, 12gb vram
with: https://www.reddit.com/r/StableDiffusion/comments/yibx9b/successful_hypernetwork_training_on_a_6gb_vcard/
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:24
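The line above is Windows cmd syntax. If training is started from a Python script instead, the same allocator setting could be applied from Python. A minimal sketch, assuming it runs before torch is imported so the value is already in place when CUDA gets initialized:

```python
import os

# Same value as the cmd line above; set it before importing torch (assumption:
# the allocator reads this when CUDA is first initialized).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = (
    "garbage_collection_threshold:0.6,max_split_size_mb:24"
)

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

On Linux or macOS the shell equivalent of `set` is `export`.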
what's your regular RAM, please? Mine is 16GB
ty!
Question for y'all - what would you say is a good minimal dataset size+step count for a quick-and-dirty dreambooth run? I've had good results with around 30 images/6k steps, but I'm wondering how much (if at all) I can safely trim that down for a quicker meme-ier take.
12 images, 2k steps
very minimal quick and dirty, worked well for 2 models I did (both were 12 images, 2020 steps)
I just followed this guide, although this one is outdated now, but it has the training rate and settings that worked well:
https://www.youtube.com/watch?v=TgUrA1Nq4uE
can go as low as 500 steps with DB if you only focus on a close up and also stack the classifiers with pictures more specific than the class used
speaking of which the DB colab seems to be working for 768, just need a "premium GPU" (13 units per hour, which is like 1.3$) and it takes 20 minutes to do 500 steps
so 10k steps, which is usually where it starts getting really good if you have say 150 instance pictures, will be like 400 minutes, which is almost 7 hours
which would cost 91 credits on colab ... a bit steep for now 😄
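The estimate can be re-run for other step counts with a few lines. A sketch using only the numbers quoted above (20 minutes per 500 steps, 13 units per hour); the function name is made up, and real Colab pricing and speed may differ:

```python
import math


def colab_cost(total_steps: int, minutes_per_500_steps: float = 20, units_per_hour: float = 13):
    """Return (minutes, hours, units, units if every started hour is billed in full)."""
    minutes = total_steps / 500 * minutes_per_500_steps
    hours = minutes / 60
    return minutes, hours, hours * units_per_hour, math.ceil(hours) * units_per_hour


minutes, hours, units, units_rounded = colab_cost(10_000)
print(minutes)          # 400.0
print(round(hours, 2))  # 6.67
print(round(units, 1))  # 86.7
print(units_rounded)    # 91
```

The 91 figure corresponds to rounding up to 7 full hours. Counting the exact 400 minutes gives a bit under 87 units.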
might be ok after all, here is a run on 512 with a regular colab gpu #1045349359044280360 message
the first 2500 steps were a bit hopeless but it kinda got there after all
would say maybe try pushing it a bit further but seeing how much longer 2.0 seems to be needing compared to 1.5, and how expensive that can get ... try if you're really motivated or are running locally
I am currently training 3 concepts on the 768 model in the dreambooth extension for auto1111. The concepts are my mates and me... however the images are 512x512 so i do not expect it to work.... time will tell
currently it's using exactly 12gb vram
🤞
glad to hear it's working on local installs already
just gotta wait a bit more for colab and runpod
first samples look terrible but that would be expected at 300 steps tbh
none of these look like who they are meant to.....though i sort of see josh's beard in his...sorta
top one is meant to be my bald mate craig lol.... as i say though 300 steps...not gonna judge it on 300steps
in the grid posted above it didn't even start looking like anything before 2.5k steps
good luck 🤞
yeah i may not have even got the json file right ( a lot changed)
trying the max steps thing to cut one concept off at 1200 steps (have less images for that one) and the rest to be the max amount of steps
Awesome, thanks! Will give this a go 
900 steps and they aint getting better lol
not sure if its patience or 512 in 768 model
or a bad json
to compare that with old training samples:
hell it could be i have the sample prompt not right
and my class files were generated with 1.5 ...... i think I may have too many variables for this to be scientific in any way shape or form lol
its using 12gb of ram..... thats scientifically proven 🙂
shout loudly at it?
I'll sing it Daisy...and threaten to pull out the ram chips
😄 i'm afraid i can't do that dave
lol (was worried that reference may show my age too much and not be recognised lol)
hehehe old geezers gotta stand together
indeed!
🧓
wonder if the Discovery 1 is in the dataset..... this may need to be explored when training is done
(and yeah i admit i had to look up the name, knew it was Discovery something but couldnt remember what)
yup still lots to be experimented with instance pictures their number and diversity and same goes with the class pictures
these samples are disturbing
there's some funky funky results to be had by setting a couple variables in ways they probably weren't intended 😛
mhmm welcome to the uncanny valley, it gets better ... after a while
would recommend grabbing a drink, maybe reading a paper
tempted to load the console for a bit got an hour to go.
so far my mate josh is the only one thats consistent...it looks nothing like him but its the same imposter in each image lol
/me afk for an hour
that'll happen more often with dpmadaptive but DB doesn't train with that one, wish you luck 🤞
worse case scenario it will be a horrible failure and i will never know why lol
you'll know eventually, once more data points come populate the graph, like testing with 768 resolution, more instance pictures, different class pictures and more or less of them
lots of knobs to explore 😄
I think its my json file tbh...i should have done the first test with the webui and one concept...then expanded
doesnt help that dreambooth has had a complete ui makeover
there's a description of every variable on the github
read the manual....first? thats crazy talk
manuals are for figuring out what went horribly wrong....not for preventing it lol
😄
I think its calling me fat
My AI: one person...hmmm no, one person and a little person...hmmm no, one person he must be one person...no...OH he is a house! that explains the waist
yeah it didnt go well
I have to say that has to be the most hilariously wrong a training has gone (that was 100% the first gen test of the model)
hi friends, any advice on how to achieve style consistency when fine tuning models? i would like my output to look a lot more like my training images (side profile, no background, same angle etc) but cannot seem to figure out how to get there!
Hey guys! I've just released a new photorealistic SD 1.5 model. It's available on dreamlike.art, in diffusers, and as .ckpt.
Trained on a large dataset of high quality images. Based on SD 1.5 with the new VAE.
Model Card: https://huggingface.co/dreamlike-art/dreamlike-photoreal-1.0
Link to .ckpt: https://huggingface.co/dreamlike-art/dreamlike-photoreal-1.0/resolve/main/dreamlike-photoreal-1.0.ckpt
Diffusers model id: dreamlike-art/dreamlike-photoreal-1.0
- Use standard SD prompts, or add danbooru-like tags for characters (1girl, brown hair, etc.)
- Supports any aspect ratios! No more double heads or repeating subjects. Non-square aspect ratios work better for some prompts. If you want a portrait photo, try using a 3:4 or a 9:16 aspect ratio. If you want a landscape photo, try using a 16:9 aspect ratio.
- Trained on 640x640 images, so increase the resolution a bit
anyone getting good results with 2.0 embeddings ?
im having problems training a personal model
any help? i keep getting images like this even after long hours of training
What are you training it with ..... i got the same
might be a basic tip, but have you considered putting things like "white background" on the text prompt
not sure what you're going for but I love those
Hi #🔧|finetune! Our first forum is going live with the addition of #1047197565365538826!
This forum will be a supplement to #🔧|finetune where users can share their custom models with the community. Creators can add tags to their post indicating the category the model falls under, for example, an anime-centric model focused on a specific character would use the anime & character tag. Please read the pinned post for guidelines before submitting your model! ❤️
How does 16gb of vram compare to 24gb regarding finetuning the model? Can I run out of memory with training?
lol thanks, was going for more structure for landscape stuff
When I use your model on dreamlike-art, it's removing my credits, it isn't free?
It's not. You can use it for free locally
Oh alright, thank you, I thought so after seeing that
You get 100 free credits after signing up and 1 free credit per hour after that
go to your v2-inference-v.yaml, ctrl-c, ctrl-v, name it the same as your model with .yaml
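The copy-and-rename step above can also be scripted. This is a minimal sketch, not part of any web UI: the function name `copy_config_for_checkpoint` and the example paths are made up for illustration, and it assumes the UI looks for a `.yaml` file with the same base name as the checkpoint, as the message describes.

```python
import shutil
from pathlib import Path


def copy_config_for_checkpoint(config_path, checkpoint_path):
    """Copy the inference config next to the checkpoint, reusing the checkpoint's base name."""
    checkpoint_path = Path(checkpoint_path)
    # my-model.ckpt -> my-model.yaml, in the same folder as the checkpoint
    target = checkpoint_path.with_suffix(".yaml")
    shutil.copyfile(config_path, target)
    return target
```

Hypothetical usage: `copy_config_for_checkpoint("v2-inference-v.yaml", "models/my-model.ckpt")` would write `models/my-model.yaml`.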
yes I did. I have good quality generations now. But the resemblance is off
he put a post on reddit saying that it needs way more training steps than before
So I put no in the contains_faces section...
I am doing 1500 steps, 2000, 2500, 3000 and 3500 checkpoints
I put captions to all my images
like
token_selfie_smiling_with_a_white_shirt.jpg
100% train text encoder
fp16
768 model and 768 images
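For the file-name captions mentioned above, here is a small sketch of how such a name could be turned into caption text. It assumes the trainer derives the caption from the file name, which is how the naming above reads; `caption_from_filename` is a made-up helper, not a function from any trainer.

```python
from pathlib import Path


def caption_from_filename(filename):
    """Drop the extension and turn underscores into spaces."""
    stem = Path(filename).stem
    return stem.replace("_", " ").strip()


print(caption_from_filename("token_selfie_smiling_with_a_white_shirt.jpg"))
```

The first word stays the instance token, and the rest describes what differs between images (pose, clothing, expression).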
Guys I'm sorry if this is super basic but is there a way to set the "weight" of a negative prompt? Like.. I see what the model is doing, and it is removing too much of the features that I would like to see less of. I'm using the stable diffusion UI
I have tried it like "something:0.1" but the weight doesn't seem to matter at all. If I set it to "something:0.0000001" or "something:0.99" it seemingly results in roughly the same effect
use (something:0.1)
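To illustrate why the parentheses matter, here is a simplified sketch of how a `(text:weight)` group can be told apart from plain text. This is not the web UI's actual parser, only a toy regex under the assumption that a bare `something:0.1` is read as ordinary prompt text with the default weight.

```python
import re

# Matches "(text:number)" with no nested brackets; a deliberate simplification.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weighted_prompt(prompt):
    """Return (text, weight) pairs; text outside '(text:weight)' gets weight 1.0."""
    parts = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        before = prompt[pos:match.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    rest = prompt[pos:].strip(" ,")
    if rest:
        parts.append((rest, 1.0))
    return parts


print(parse_weighted_prompt("blurry, (something:0.1)"))
print(parse_weighted_prompt("something:0.1"))
```

In the second call nothing is recognised as a weight, which matches the observation that changing the number without brackets made no visible difference.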
Thanks!
How does 16gb of vram compare to 24gb regarding finetuning the model? Can I run out of memory with training?
16gb requires a lot of tricks to run well
it can dreambooth and textual inversion, for sure, but I'm not sure personally if it can finetune the whole model
is it normal for the loss to go up and down from .2 to .1, currently at step 500/15000?
and which one is better? higher or lower loss?
some training actually uses just above 16GB of VRAM, like dream artist will sometimes zap up just over 16GB of VRAM
and then various dreambooth options can take the full 24GB VRAM
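A rough back-of-the-envelope estimate helps show why 16GB is tight. The sketch below only counts weights, gradients and Adam optimizer state, and ignores activations, the VAE and the text encoder, so real usage is higher. The parameter count of about 860 million for the SD 1.x UNet is an approximate figure and should be treated as an assumption.

```python
def training_memory_gb(num_params, weight_bytes=4, grad_bytes=4, optimizer_bytes=8):
    """Rough memory for weights + gradients + Adam state, ignoring activations."""
    total_bytes = num_params * (weight_bytes + grad_bytes + optimizer_bytes)
    return total_bytes / 1024**3


UNET_PARAMS = 860_000_000  # approximate SD 1.x UNet size (assumption)

full_fp32 = training_memory_gb(UNET_PARAMS)
# 8-bit Adam keeps its two state tensors in 1 byte each instead of 4.
adam_8bit = training_memory_gb(UNET_PARAMS, optimizer_bytes=2)

print(f"fp32 weights + Adam:       {full_fp32:.1f} GB")
print(f"fp32 weights + 8-bit Adam: {adam_8bit:.1f} GB")
```

Under these assumptions plain Adam already needs roughly 13 GB before a single activation is stored, which is why options such as 8-bit Adam and gradient checkpointing come up for smaller cards.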
well im using Automatic1111's webui to do SD stuff
Hey guys, Does anyone have any idea on how to train Dreambooth on 8GB VRAM??
🥲
then what's the point of SD2.0?? I guess people like me need to depend on some unknown person out there to train a subject or style and upload it to the Internet.
point is that it's the same size as the old SD, but produces higher quality results and should also theoretically be a better baseline for those finetuned models you use
so in the next few weeks you'll see people that do have enough VRAM finetuning v2.0 instead of v1.5 and that ideally will produce better quality weights.
okky
With dreambooth on a local system (via automatic1111) is there any harm in just adding more and more concepts to a model?
so for example, I train 1.5 model a person (me david eastwick)... i get it to perfectly make me.
I then take that model (lets call it 1.5dave) and train it on my mate josh..
I then take that model (lets call it 1.5daveJosh) and I train it on my mate craig..
I then take that model (lets call it 1.5JoshDaveCraig) and train it on a specific art style.
Now would the result of that model (lets call it 1.5JoshDaveCraigSpeghetti-art) still know how to make images of david eastwick? would it still be able to make an image of the statue of liberty?
at what point does training/finetuning degrade a model?
can i just go on adding things to a model as i like ? i mean they are trained on billions ...but guessing not through dreambooth
anyone know? (if anyone does, then feel free to ping me)
Does dreambooth diffuse all images you give at once or does it diffuse them one after the other?
it trains your batch_size number of images at one time, though if you use dreambooth it usually batches one training image and one regularization per batch_size as well
but is there degradation the more you train it?
I believe so. Never seen anyone be successful in training more than one subject
I have managed to train 3 concepts at once with two looking perfect and one looking ok. but its such a pain to balance i was hoping one at a time would work
Interested in stylized diffusion
in training a style? there is a few guides on that
or using a style... for that huggingface will be your go to
gnollingcase is a good style to try
Training, but I’ll wait for a colab that has a simple ui. I just kept on running into issues when I tried (I think it was called accelerate the image). idk if there’s anything new tho
Please post if it worked, I'm curious now.
thanks
Will do, may not be tonight though
Hey guys, could I have someone who is really experienced with the software shoot me a message please, need some advice
my test just failed
I trained with just me until I was perfected
I then took that model and trained my mate craig until he was perfected
I then Tried to generate craig with the end result... perfect
I tried to generate me with that result I got craig... even though i was only using my token
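The failed test above is what one would expect when two subjects share the same weights and are trained one after the other. The toy below is only an illustration, not a diffusion model and not Dreambooth: a two-weight linear model is fitted to subject A, then to subject B, and compared with fitting both together. All names and numbers are made up.

```python
def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]


def train(w, examples, steps=2000, lr=0.1):
    """Plain gradient descent on squared error over the given (x, target) pairs."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, target in examples:
            err = predict(w, x) - target
            grad[0] += 2 * err * x[0]
            grad[1] += 2 * err * x[1]
        w[0] -= lr * grad[0]
        w[1] -= lr * grad[1]
    return w


def sq_error(w, example):
    x, target = example
    return (predict(w, x) - target) ** 2


subject_a = ((1.0, 0.5), 1.0)   # stand-in for "me"
subject_b = ((0.5, 1.0), -1.0)  # stand-in for "craig"

# One after the other: the second run only sees subject B.
w_seq = train([0.0, 0.0], [subject_a])
w_seq = train(w_seq, [subject_b])

# Both in the same run.
w_joint = train([0.0, 0.0], [subject_a, subject_b])

print("sequential: error on A =", round(sq_error(w_seq, subject_a), 3))
print("joint:      error on A =", round(sq_error(w_joint, subject_a), 3))
```

In this toy the sequential run fits B perfectly and drifts away from A, while the joint run fits both, because nothing in the second sequential run reminds the weights about A.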
Do you think I could finetune Stable Diffusion to do something like beard/no beard? Or are GANs still the bees knees for image-to-image translation? Like I think I want to do Img2Img with a prompt like "Man without a beard" to remove beards.
Or is finetuning better just for styles or something? Any good guides you could point me towards?
In that case why not do (((beard))) in the negative prompt?
but yes you could fine tune a model technically
as for guides....im not sure, it would depend on what you want to do and what you think is the easiest way to do it.
(Hope this is the right channel for this) This is an image2image with the prompt "girl, cat ears, Yoji Shinkawa" Everything about this is almost perfect but the hair on the left and right are supposed to be pig-tails. How do I go about fixing this with stablediff?
I thought all I had to do was just use the same seed and change/add a word but that changes the entire image still :C Like could I photoshop in some pigtails over and then run it again with the same seed or something? Sadly now whenever I try to do it with the same seed it just spits out a completely dif image idk what I did
Have you tried inpainting?
I did but it more just kinda smears hair around, I didn't exactly know what I was doing ig
Yeah that's pretty normal btw. Small changes in the prompt might result in widely different results
It's basically the butterfly effect in practice
Yeah well "it is what it is" I suppose. Your best bet is learning photomanipulation/photobashing or smth like that if you have some exact thing you're aiming for
Like, generate a couple pics and mash 'em together
either way sadly I lost the seed, for some reason it generates a completely different image everytime now so rip
Do textual inversions work the same on v2 as they did on v1.x? Like, can I just use automatic1111 and train on a small library of 768x768 images with the training tab? Or do I need another repo/use another method to 'shape' SD 2
You can put it in png info to get the seed.
Then, just use something like [ponytail:7] to make it think about ponytails after base image established.
thanks for sharing
What I mean to ask is specifically do you think it could perform well (look like the original picture with only a removal of a beard)? I'm trying to understand if this task requires fine tuning or not. For example, I tried img2img, queried CLIP for a prompt, then removed "with a beard" from the original prompt and added ((beard)) in the negative prompt. It definitely gave me a man with no beard in the same spot with similar colors (sometimes his skin was black, sometimes not), but it was a DIFFERENT man. I can see you're getting amazing results, so that's why I'm asking you specifically because you understand what finetuning can and can't accomplish.
Same method
What about merging models? Could you train a model on "Dave" and then train another model on "Steve" and merge those two models into "DaveSteve" and then create a new model for "Paul" and merge that into "DaveSteve" so you get "DaveStevePaul"?
When you merge models it does exactly that, it merges them together. So the two distinct people wouldn't feature anymore you'd get a sort of hybrid of the two.
If you want to train multiple subjects you need to do them all together really. Unless you can find something that lets you actually resume training and not just stick the new stuff on top.
What's the point of merging checkpoints/models then?
Some people like merging the styles to see what happens
Also, the model obviously already has the full set below the added subject, so what happens to all of that data when you merge two models that share that base?
You have some control over it as you can weight it either side. It's not suitable if you wanted to stick two single subject models together
It gets blended together
How so? Data from 2.3 billion images in both models. How does it merge all of that data together?
It's taking weight from both models and balancing them out between the values you pick. So maybe 50% of the weights from each model.
That sounds like a theory
It's what it's doing. The code is open source you can go look at a dreambooth trainer.
The ckpt files contain a bunch of weights that tell it how to control the noise generation for certain items. When you train a model you're telling it how to produce the right noise for your images.
When you have 2 different models and you merge them it gives a weighted average for the new model.
Why do you think the file size stays the same and doesn't double.
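The weighted average described above can be sketched in a few lines. This is a toy with plain Python lists standing in for tensors; `merge_checkpoints` and the sample weights are made up for illustration, and real merge tools work on full checkpoint files.

```python
def merge_checkpoints(weights_a, weights_b, alpha=0.5):
    """Return (1 - alpha) * A + alpha * B for every parameter the two models share."""
    merged = {}
    for name, values_a in weights_a.items():
        values_b = weights_b[name]
        merged[name] = [(1 - alpha) * a + alpha * b for a, b in zip(values_a, values_b)]
    return merged


model_dave = {"layer.weight": [1.0, 2.0, 3.0]}
model_steve = {"layer.weight": [3.0, 2.0, 1.0]}

merged = merge_checkpoints(model_dave, model_steve, alpha=0.5)
print(merged)  # same keys and same number of values as either input
```

The merged model has exactly as many numbers as each input, which is why the file size does not double, and each number is a blend, which is why two distinct subjects tend to come out as a hybrid.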
So then what is emad suggesting the community does to fix the borked 2.0 model? The implication is that we can retrain it on all of the missing subjects. What you are suggesting is that is impossible.
When I finetune a model on a single subject with token xyz, it adds just a tiny bit of data into the model, yes? And that tiny bit of data doesn't wreck the data from billions of images that was already in the model, correct?
So then why would taking that model that was trained on xyz lose any data about xyz if I did a subsequent finetuning against subject abc?
Is the data for xyz stored in a special compartment that gets shared with abc when I do a subsequent training?
And if that is indeed the case, then is it safe to assume that dreambooth training is not an exact reproduction of the process that the model underwent in its initial training?
Dreambooth isn't the only way to do training. I Suspect they'll release tools that help people train models from scratch.
Fair enough.
I got the seed back and the image back to normal I was just being dumb and didn't input the original image. So you're saying I add "ponytail:7" to the prompt or the in-paint?
what's the 7 part do
I'm talking from experience with dreambooth models too and merging them caused the blending. But training them all in 1 session worked.
Understood. Separate from that conversation, do you know the answer to my last question regarding dreambooth compared to the original training?
Don't forget the bracket. This is a very useful feature in A1111 to control the generator attention
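On what the 7 does: in A1111's prompt editing syntax, `[text:when]` adds `text` to the prompt only after a given number of sampling steps, so the base composition is settled first. The sketch below is a simplified illustration, not the web UI's code; the regex, the function name and the exact step boundary are assumptions.

```python
import re

# Matches "[text:number]" with no nesting; a deliberate simplification.
ADD_AFTER = re.compile(r"\[([^\[\]:]+):([0-9]*\.?[0-9]+)\]")


def prompt_at_step(prompt, step, total_steps):
    """Resolve '[text:when]' for one sampling step: text appears only after 'when'."""
    def resolve(match):
        text, when = match.group(1), float(match.group(2))
        # Values below 1 are read as a fraction of the total steps.
        switch_step = when * total_steps if when < 1 else when
        return text if step > switch_step else ""
    return ADD_AFTER.sub(resolve, prompt)


print(repr(prompt_at_step("girl, cat ears, [ponytail:7]", step=3, total_steps=20)))
print(repr(prompt_at_step("girl, cat ears, [ponytail:7]", step=10, total_steps=20)))
```

Early steps see the prompt without the extra word, later steps see it with the word added.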
oh sick! I'll keep that in mind
So when you train something brand new, and you give it a token name so SD understands what it is, you also include regularisation images, so it tells SD that this stuff should still be in the model and it won't overwrite it.
You'd still get different results from a base model, but it wouldn't completely wreck that data.
In theory if you had 1 model you trained and then you wanted to add something into it, if you trained a new model, but with regularisation data from the old model, you might be able to keep both that way, but I'm not sure how well that would work.
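The role of regularisation images can be written as a loss with two parts: one for the new subject and one that keeps the class images looking the way the model already draws them. This is a minimal sketch with plain lists instead of tensors; the name `prior_loss_weight` is borrowed from common Dreambooth scripts for illustration, and the numbers are made up.

```python
def mse(predicted, target):
    """Mean squared error between two equally long lists of numbers."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def dreambooth_loss(instance_pred, instance_target, class_pred, class_target,
                    prior_loss_weight=1.0):
    """Instance loss plus a weighted loss on regularisation (class) images."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# The model is off on the new subject but still matches the class images.
print(dreambooth_loss([1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [2.0, 2.0]))
# Drifting on the class images now costs something too.
print(dreambooth_loss([1.0, 1.0], [0.0, 0.0], [3.0, 3.0], [2.0, 2.0], prior_loss_weight=0.5))
```

Any update that improves the new subject while wrecking the class images raises the second term, so the optimizer is pushed to keep both.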
So Dreambooth is doing finetuning. Is that quite different from what is done when training the initial model on 2.3 billion images?
I understand that part, but beyond that, is the actual process of training fundamentally different than finetuning?
I wonder what software they use officially then, dreambooth seems to be a hacky way to do what they do but not perfect
I don't know the exact way they do training from scratch, I presume it would be fairly similar.
You've got your text encoder and you're telling the text encoder with images and captions what certain things are.
You do that both with dreambooth and from scratch
thing is when you look at some of the model cards for the official models they say "resumed from model xyz" so there has to be a way...just not publicly available
I think you can, it's just more difficult and requires a lot more VRAM than commercial cards.
The Waifu Diffusion model for example, is created by resuming from models, it's not dreambooth.
Although I don't know exactly how it's done.
There's a reason I'm asking. There's a guy on FB attempting to prove that the model does in fact contain complete copies of images. He used dreambooth to finetune on a single image with 1000 steps, so it's overfitted AF, and then he shows that now he can create a "perfect" duplicate of that image by prompting SD with the tag. I think this is an invalid experiment, manipulated to further the agenda of the AI-haters, and is obviously not reproducible with a dataset of 2.3 billion images.
hmmm hadnt thought of the vram need
I mean It doesn't contain the image that's not how it works.
It contains data for a noise pattern and then sort of instructions on how to turn that noise pattern back into an image. If you've told it that a certain word is this image 1000 times, it's going to start producing this image. But it's just noise, it's not an actual image.
https://www.youtube.com/watch?v=1CIpzeNxIhU
https://www.youtube.com/watch?v=-lz30by8-sU
These two videos are very good in explaining how it works. Although I imagine the person in your scenario won't care much about the facts.
For what it's worth, I know that. I just wanted to better understand the difference between finetuning and initial training in that sense.
I think the only real difference is that with finetuning you're modifying the weights already there, and sort of moving them around so they create what you want them to.
From scratch there's no weights to begin with so you have to teach it from scratch
Like you can fine tune a particular type of clothing and because the main model knows people, it knows clothes and it knows people wear clothes, it can apply that type of clothing to people.
But if you trained a model from scratch with clothes, it would only do clothes, just on their own. Because it has no perspective of anything else.
So if this person is training an image over and over, it's not even just his image it's learning from, it's using all the other artwork that are in the model already.
Hello, beginner here - what would be the most obvious use case for fine tuning vs other ways to make a model?
Not sure this is the right place but here is my Day of the Dead embedding https://huggingface.co/datasets/Rocinante2000/Day-of-the-Dead
If you want to put yourself into SD, for example. That's the main use case I employ. I have created models of myself and other friends and family and then create cool images of them as other things.
Such as my little brother as the Hulk...
Yep!
One sec, I'll grab one
Dreambooth local training has finally been implemented into Automatic 1111's Stable Diffusion repository, meaning that you can now use this amazing Google’s AI technology to train a stable diffusion model with your own images. You can train a character, an object, a style, or anything you want! There is also a new option that allows you to use D...
I've been using that one, although, the extension changed it's UI as of late
so it's a little outdated, but the concepts remain the same
Wow, awesomeee, thanks!!!
Did anyone manage to convert diffusers weights based on SD2.0 to a ckpt usable in auto's web-ui?
I used this script https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_stable_diffusion.py
but it gave me completely different results using the exact same parameters in the web-ui.
And is it normal that I never manage to finetune the SD 2.0 base 512 model under 0.25 or 0.24 loss value, no matter what I change (learning rate, steps, training images, class/regularization images, etc...). What am I doing wrong? I am training locally with the diffusers repo. Does the loss value actually matter for finetuning SD 2.0??
Oh, when you guys say "finetune" you mean Dreambooth?
Hey guys, if I feed an even mix of 3 different faces into a training model, can I expect the output to be a combination of all 3 faces?
Or, is there a better way to achieve that?
I'm trying to combine my face with my brothers, into one model
Would it be better to train 3 models individually, using the same token name, and then combining them with an even weight?
has someone exp. with everydream? Is it worth using to train models?
Yes
Hey guys, It is possible to fine tune Stable Diffusion using Dreambooth for Whole Category like for example Anime. Not just single Object or Style.
not with dreambooth, I think
but there are ways to finetune the whole thing
check https://github.com/victorchall/EveryDream-trainer and see if that's what you want
also there are models for anime
Not anime exactly, It was just example.. Because anime itself is not a single thing.
EveryDream won't run for me, Locally seems impossible, and not even for colab...I guess I need to stick with Dreambooth for now...
=/ yeah those things can take a lot of compute
if you have the money, you can try an instance at e.g. lambdalabs
this works with SD 2 too?
Hi guys! What do you recommend for fine tuning on interior design projects? To make something similar to interiorai.com
I kinda doubt it, haven't tried yet though, so possibly! But they aren't 1 for 1 with the older models so
I've been working on a training for a bit, I'm running into an issue where my hypernetwork is doing a great job of getting the correct face shape/type fitted onto a main subject, but any medium-to-small faces don't get the same treatment. I'm curious if it is possible to include transparent images in a meaningful way in a hypernetwork, so that it could learn smaller faces without wiping out backgrounds because it's just a tiny floating face in empty space.
damn everydream needs 24gb....
do 32 and 48gb gpus work too?
like the AMD Radeon Pro W6800 has 32gb and doesn't cost that much more compared to a nvidia rtx 3090
Is there any api that I can train my face and it returns images? I want something official stable for non-financial illegal actions
Hey guys, have you found in your testing for training that women seem to produce more "stable" models than men do? I'm not sure what training parameters are wrong but the women I've trained a model on come out looking very consistent
But then all the photos that I try to generate from men come out with weird, uncanny valley faces
does anyone have any advice on how to solve this, and get more consistent faces with male trained models?
random guess but different amounts of facial hair between pictures might be hard for it to pick up. I'm not sure what training method you're using but it might help to tag images with that information, i.e. "with a beard, clean shaven"
hey guys, does FP16 half precision training lower the quality?
Does anyone know if there are any problems with merging together, like 100 models?
My plan is to create a BUNCH of finetuned, dreambooth models, and then combine them together with merge model.
But, I am wondering whether there are any issues with merge model, at scale. Like, as long as I use a DIFFERENT keyword, PER thing that I am fine tuning the model on, that should be fine, right?
I'm doing this because a DB model, is like 5GB, but it is still the same size if I combine them together.
merging dreambooth in general doesn't work, as far as I know
I had a conversation in this chat a while back about this, hang on
Is it because the REST of the dreambooth model, gets copied/ doubled?
I didn't try too many merges, but I started to see some worse results... and wondering if other people saw this
yeah, it's expected to get worse results
because dreambooth trains the weights of the model itself
so it changes/forgets something every time
like this
Hmmm, there has got to be a solution for this.....
Like, lets say I trained ONE dreambooth model, on ALL of my data, but I associated a DIFFERENT word per image type.
Basically, I want to say "MyModelSword" and "MyModelShield", and train both of these keywords at once.
I thought you had to merge models for this. Is there another way to train multiple words at once?
Right now, in the UI, you can only put in 1 word/phrase, right? And that word is constant?
Any methods on how to not run out of vram for larger model training/finetuning?
that I know of, there are no methods... you could try combining textual inversion embeddings for different things
would probably get mixed results, but it's the best bet
I'm sure there are ppl working on that problem, from the amount of times I hear this request xD but so far no tools I know of
emphasis on "that I know of"
Actually, wait, there is a new feature in DB. Its called "optionally use [filewords] to read image captions from files?"
This seems like it is dynamic keywords
Or, it has multiple concepts now? And thats different?

that's for optimizing the dreambooth process itself, it's not multiple concepts.
Ok, but the "file words" section implies that it ACCEPTS an instance token, PER image. Which means that the instance tokens can be different. All I need to do, is have a different instance token, per image, I think.
thanks!
I think I am just going to have to test this out.... this is really a needed feature.... like why not train something on multiple instance prompts?
because it doesn't work 🙃
It's not just writing the program, it's a research effort
Why wouldn't it work? The entire point of these models are that they have multiple concepts already, right? Stable diffusion knows what a "man" and a "woman" is, and those are both different instance prompts.
Or is it something to do, with fine tuning? Fine tuning a model, on multiple instance prompts doesn't sound that different?
even though the model has seen a lot of things, it has not seen everything. Like, it has not seen my dog. If I want a picture of my dog, I need to change the model so it adapts to that new concept. That's kinda the idea behind finetuning (dreambooth, for example, is a way to achieve that). The problem is that it can only bend so far before it breaks
if you try too many things, the model will try to do them all at once and won't be precise enough in any of them
there may be ways of circumventing that, but then one needs to have a grasp on the inner mechanism to find exactly what needs to be changed
which is a big effort by itself
Hi thanks for responding! I'm using the Stable Diffusion Google Colab based on ShivamShrirao's Dreambooth implementation
Is it possible to tag images within this method? Or is there a better method that I can be using to train models?
I'm still figuring this tech out, I know I can tell it the instance and classification tokens, but I don't see anywhere to actually detail what the separate images are of
ahh yeah i'm not sure for dreambooth actually
it's an option in textual inversion / embeddings in automatic1111
Is that something you'd recommend I research? I'm not really attached to one training method right now haha, I just want to make high quality models of my friends
I'm having a loooooot of fun taking them around the world on photoshoots haha
This picture is almost impossible to distinguish from reality, and it pisses me off because I didn't write down my training settings (I am now!) and I don't know how or why the model is so damn accurate
And I can't even get my own face to look that good
I even tried training 3 separate models on a ton of images of my face and combining them into one model but it still gives me caveman chin and forehead sometimes
oh duh you're here already
Sure am!
I'm about a week in to learning all of this stuff right now, and I've only been using the link I posted above for training as my own GPU (a 1660ti) can't handle training locally
well, if you're up to read a lot, there's a big discussion thread with tips on textual inversion, it mentions things like taking care with shadows or you'll get a weird beard, etc.
Haha yes please, I'm not here for handouts 😛 I am all about learning as much as I can
also, if you want pictures of people, dreambooth is the best method, I think. Is that link enough to do dreambooth?
sure thing! that's cool. I'm just taking some time to understand what would actually help you
Ah, thank you 🙂
Yeah the link above does dreambooth, but I'm on the free version of Colab so I'm limited to 15gb of Vram
by the way, are you trying to get more than one person in the same model? That's tricky, as consecutive dreambooth trainings tend to fail
No just the one for now, I'll try two when I get comfortable with one, lol
yeah sure xD
I'm very confused about all the settings with dreambooth though. Batch size, gradient checkpointing, 8bit adam, they're all terms I'd like to understand so I know when to use them
this repo tries to do dreambooth with little memory: https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
oh so
you're already using it, I think, as you mention 8 bit adam
The Colab I linked above is using this implementation, yes
I tried to run it locally and CUDA runs out of memory every time, even in CPU mode with a 3800x, the ol' 1660 just can't handle it
yeah. You're in a good path already
Okay, that's good to know 🙂
So I guess I just need some better understanding on how to actually use this tool, and when to use which settings
so, the problem is that your face still looks bad, huh...
yeah. You just need to tweak the parameters
Women seem to come out consistently
Men seem to have slightly different looking faces every time
Like it generates a bunch of lookalikes XD
are all of those generated? They don't look that bad to me, is the problem that they're in black-and-white?
Oh yeah they're great! I love the quality, and I've been working on some scripting for random cameras and locations and stuff, that's not the issue, the issue is the consistency between faces
These are all the same face
These are uncanny valley nightmares
Okay, I'm understanding through typing this out, my issue isn't generating faces, it's getting the training to provide consistent faces, which means my issue is probably more with fine tuning than the general software
yeah
Ctrl+F for a comment from Oct 8
"My most effective way to train textual-inversion to reproduce people faces is..."
even though textual inversion is another method, it is similar to dreambooth in some aspects
so it might help
Thank you very much, that link has a ton of good data!
I thought, originally, that textual inversion was something you added on top of your already existing CKPT model file
And it kind of "duct tapes" the cracks in training through very strict descriptions of what the subject actually is
well, not quite. I could go into the details if you'd like. I like talking haha. Or you can read their project page/paper
If you don't mind, please!
Let me steal as much of your knowledge as possible hahahaha
hahaa
God what a superpower that would be
yeah xD
language is kinda that, right? The ability to transfer knowledge from one mind to the other. Anyway,
textual inversion
when you type in the words, they first get processed by something called CLIP (or OpenCLIP, for stable diffusion 2.0)
each word gets transformed into a code of sorts
which is a bunch of numbers (a vector of floating-point values)
and THOSE numbers are what the visual module uses to know what to generate
And that is what is being displayed here, correct?
Tokens that CLIP will use for generation?
yeah, so. The words get split into tokens (that's a simple substitution, almost 1-to-1) and then the tokens go into CLIP to get replaced by embeds
I was talking about the embeds, which are codes generated by the CLIP neural network
tokens are integers, like 1958, 287, etc.
embeds are a sequence of floating-point numbers
Okay, with you so far
it's simple to get a token, not that simple to get an embed. Calculating the embeds is CLIP's whole job, basically
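The token versus embed distinction can be shown with a toy lookup. This is only a sketch: the four-word vocabulary, the vector size and the random table are made up, and a real CLIP text encoder also runs the looked-up vectors through a transformer rather than stopping at the table.

```python
import random

VOCAB = {"a": 0, "photo": 1, "of": 2, "face": 3}  # made-up vocabulary
EMBED_DIM = 4

rng = random.Random(0)
# One vector of floats per token id; real values come from training, not random numbers.
EMBEDDING_TABLE = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text):
    """Words -> integer ids (a simple lookup)."""
    return [VOCAB[word] for word in text.lower().split()]


def embed(token_ids):
    """Integer ids -> vectors of floats."""
    return [EMBEDDING_TABLE[i] for i in token_ids]


ids = tokenize("a photo of face")
vectors = embed(ids)
print(ids)  # [0, 1, 2, 3]
print(len(vectors), "vectors of length", len(vectors[0]))
```

There are only as many token ids as vocabulary entries, but the vectors can take any values, which is the gap textual inversion exploits.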
and then the embeds turn into images. cool. SO,
the genius idea behind textual inversion is
there are more embeds than tokens
like, possible embed combinations
so textual inversion picks something like your face
and finds the correct embedding to generate your face, even though there is no token -- no "word" -- for your face that the model knows about
Okay, so I type words into Stable Diffusion. Those are split into tokens and then each token is fed into CLIP to get an embed, and the image is the weighted difference of those embeds.
hmmm
almost
you got it right until you said that the image is the weighted difference
Yeah I knew that was an incorrect guess
I need to understand the different programming language terms better
the actual process from embeds to image is very complex, and it's the whole work of making something like stable diffusion
Haha right okay
Okay, embed goes into magic box, magic box makes picture
yeah
so what textual inversion does is
it starts with a general embed, like "face"
and makes a random-ish picture
and then it compares to your examples
it can calculate how far it is from matching your examples
so it uses that knowledge to adjust itself and try again
after many cycles, it finds the embed that generates your face (or any object)
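The generate, compare, adjust cycle described above can be sketched with a toy where the "model" is a fixed linear map and only the embedding is allowed to change. This is an illustration under heavy simplification: a real run uses the diffusion model's noise-prediction loss, and every name and number here is made up.

```python
def generate(embedding, frozen_weights):
    """Stand-in for the frozen image generator: a fixed linear map."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in frozen_weights]


def invert(target_image, frozen_weights, start_embedding, steps=500, lr=0.05):
    """Adjust only the embedding until generate(embedding) matches the target."""
    embedding = list(start_embedding)
    for _ in range(steps):
        output = generate(embedding, frozen_weights)
        errors = [o - t for o, t in zip(output, target_image)]
        # Gradient of the squared error with respect to each embedding entry.
        for j in range(len(embedding)):
            grad = sum(2 * err * row[j] for err, row in zip(errors, frozen_weights))
            embedding[j] -= lr * grad
    return embedding


FROZEN = [[1.0, 0.0], [0.5, 1.0]]       # the model weights, never updated
target = generate([0.3, -0.7], FROZEN)  # the "photo" we want to reproduce
found = invert(target, FROZEN, start_embedding=[0.0, 0.0])
print([round(v, 3) for v in found])
```

The model weights stay untouched the whole time; only the input vector moves, which is the key difference from Dreambooth.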
Okay, so kind of like anchor points?
hmmm somewhat, yeah
As in, generate a picture, look at the mouth, mouth doesn't match input from user, re-align to match
oh
yes, but more automatic
since it works for a cup of tea, for instance
or any object
Okay, this makes a lot of sense
so that's why it's called textual inversion
Or at least it's starting to haha
because it starts from the output and makes its way back to deduce what is the (textual) embedding that would generate that output
dreambooth does something similar, but it also changes (trains) the model itself to get as close as possible
textual inversion is not well fit for reproducing a specific identity, for instance
dreambooth CAN do that, but it is not guaranteed I guess. From your results xD
though you may find ways to tweak the settings and make it better
Okay, so going back to my duct tape metaphor earlier
ok
If TI is working backwards to deduce what the model should look like, and dreambooth can generate models that are somewhat close to accurate but not quite
Can they be used in conjunction so that TI has a more stable starting position?
perhaps if you use TI first, and then dreambooth from the starting position of TI, it might be improved
but how to improve Dreambooth or TI are generally complicated matters, which is what I was trying to say to someone else earlier. It's certainly possible and certainly hard haha
this would require stitching the methods together somehow
I once spent 16 hours trying to get my 3D prints to look, at best, 5% better
Certainly possible but certainly hard is what I'm all about XD
ha I like you
And I've noticed while loading the webui, in the CLI
it says "loading embeds"
But those files are NOT ckpt files, which is my current workflow, correct?
They don't have to be selected as a style
Just "my" version of 1111 now "knows" what that TI data is, and can implement it on any CKPT
yes
So if I load a disney style
And train a TI on Redw04
And then ask for Redw04 in Disney style
That all goes in the magic box and it spits out the expected output?
though it depends if the disney style model changed the CLIP part. If it did, TI might not work as well
dreambooth does that by default
Just trying to get a better understanding of how the TI works vs. a CKPT file
Okay, well this is a TON of useful information, and I think there's really not much left to do but to try and load up a TI model of myself and see what happens!
I really appreciate you taking the time to help me out with this
It was fun haha okay
This is a person you've trained on TI?
Kinda xD
Hey guys, do you guys know how to prevent Overfitting with dreambooth training?
Hey, quick question. Just had a big discussion about this. Are you succeeding in using multiple PROMPTS together, if you train it all at once?
So, if I did "myModelSword", and "myModelShield", then it would create a new model, that works for instance prompting, with BOTH of these prompts, in separate queries? Does that make sense?
If you train everything in one go into 1 model and you give them different instance names then it could work
SD still has the general tendency to try to merge multiple objects in a scene together
Especially with people.
Yeah, this is a different problem though. This is if I wanted to do a "myModelShield", and "myModelSword" WITHIN 1 single prompt.
I was just trying to make it so I can put 100 different models, into 1 model, and call upon each of them separately
Yeah you can do that, I'm guessing you mean like if you put "A Man and a Woman" into a prompt to try to get both.
It sometimes works, but SD can be a little odd with it
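The "different instance names in one training run" idea from above could be written down roughly like this. The field names (`instance_token`, `class_word`, `image_dir`), the folder paths and the prompt template are all assumptions for illustration, not the format of any particular dreambooth tool.

```python
# Hypothetical concept list: one entry per thing you want to call separately.
concepts = [
    {"instance_token": "myModelSword", "class_word": "sword", "image_dir": "data/sword"},
    {"instance_token": "myModelShield", "class_word": "shield", "image_dir": "data/shield"},
]


def check_unique_tokens(concepts):
    """Each concept needs its own instance name, or prompts cannot tell them apart."""
    tokens = [c["instance_token"] for c in concepts]
    duplicates = sorted({t for t in tokens if tokens.count(t) > 1})
    if duplicates:
        raise ValueError(f"instance tokens reused: {duplicates}")
    return tokens


def prompts_for(concepts):
    """One prompt per concept, each used in its own separate query."""
    return [f"a photo of {c['instance_token']} {c['class_word']}" for c in concepts]


check_unique_tokens(concepts)
print(prompts_for(concepts))
```

Putting both tokens into a single prompt is the separate problem mentioned above, and this sketch does nothing to help with that.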
I haven't done much programming or development but would like to train SD 2 on an artist (from the 19th century). There is a lot of his work in the public domain, any suggestions for web crawlers or ways to download training data?
Try this Image Extension for Google Chrome
thank you!
How much of an effect does learning rate have on model training?
I just noticed the Colab I am using has a learning rate of 1e-6 whereas Auto1111's training uses a default of 5e-3
Is a higher or lower number what I want for more accurate models?
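A rough way to picture what the number does: the learning rate scales how big each correction step is. The sketch below is a toy (plain gradient descent on f(x) = x**2, with `descend` as a made-up helper), not either tool's trainer. The two defaults quoted above may also belong to different training methods, so they are not necessarily comparable.

```python
def descend(lr, steps=1000, start=1.0):
    """Plain gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = start
    for _ in range(steps):
        x -= lr * 2 * x
    return x


slow = descend(lr=1e-6)              # barely moves away from the start
fast = descend(lr=5e-3)              # gets much closer to the minimum at 0
unstable = descend(lr=1.5, steps=20) # overshoots further on every step
print(slow, fast, unstable)
```

So neither "higher" nor "lower" is more accurate on its own: too low and training barely changes anything in the steps you have, too high and it overshoots.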
@fiery frigate this channel is the training one chap
so ask here?
these guys will know way more than i will ever know (generally speaking)
ok so lemme change the question's channel
i asked in tech support
will it be possible to train using the dreambooth extension on a 3070ti 8gb?
Here you go
@fiery frigate
Hum okay a sec
from the ui itself
No wait I don't get it. The ui uses an extension to do dreambooth
yes
but
i always get an out of memory error
which fricks me out
Make sure you're using 8bitadam mode, in the advanced settings of the extension
It should run on 8gb
Did you try CPU mode?
Yeah but it'll run
i wanna run it on my gpu
I believe cpu mode just offloads some of the processing, your card is still doing work
Typically it's designed to run on like 16gb of VRAM, so you have to make some tradeoffs with less vram
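To give a feel for why the 8-bit Adam option matters on an 8gb card, here is some back-of-the-envelope arithmetic. It assumes Adam keeps two running statistics per trainable parameter and that roughly 860 million parameters are being trained; both the parameter count and the helper `adam_state_gib` are assumptions for illustration, and real 8-bit optimizers carry a little extra overhead that is ignored here.

```python
def adam_state_gib(num_params, bytes_per_state):
    """Adam keeps two running statistics per trainable parameter."""
    return num_params * 2 * bytes_per_state / 1024**3


ASSUMED_PARAMS = 860_000_000  # assumption: rough size of the part being trained

full = adam_state_gib(ASSUMED_PARAMS, 4)       # 32-bit optimizer state
eight_bit = adam_state_gib(ASSUMED_PARAMS, 1)  # 8-bit optimizer state
print(round(full, 2), round(eight_bit, 2))
```

This only counts optimizer state. Weights, gradients and activations need memory on top of it, which is where the other tradeoffs come in.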
but it'll take hours to run
oh wow that's a lot
Check the advanced options here
https://github.com/d8ahazard/sd_dreambooth_extension
If they don't work, you're probably out of luck
Yes it will, but what can you do?
Speaking of which @weary knot I can't run TI locally for the same reason
8gb minimum
will colab work better here?
Oh damn. Can you use colab?
But I'm using 10000 steps and 100 images so it still takes about 2 hours to train
Not sure I haven't tried
I also want to experiment with loading in a 2.0 CKPT
Probably, is there one I can check out? I have only found a dreambooth colab
2.0 uses a different CLIP. So its TI is not compatible with v1
That's also the reason prompts change from 1 to 2
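One concrete way the mismatch shows up is the size of the token embedding: the v1 text encoder works with 768-wide vectors, while the 2.0 one uses 1024-wide vectors. A minimal check could look like this, where `TEXT_EMBED_DIM` and `can_load_embedding` are made-up names for illustration, not part of any webui.

```python
# Token embedding width per model family.
TEXT_EMBED_DIM = {"sd-v1": 768, "sd-v2": 1024}


def can_load_embedding(embedding_vector, model_family):
    """A TI vector only fits a text encoder with the same embedding width."""
    return len(embedding_vector) == TEXT_EMBED_DIM[model_family]


v1_embedding = [0.0] * 768  # placeholder for a TI vector trained on v1
print(can_load_embedding(v1_embedding, "sd-v1"))
print(can_load_embedding(v1_embedding, "sd-v2"))
```

A matching size is necessary but not sufficient: if a fine-tuned model changed its CLIP part, the vector can still load and just mean something slightly different.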
I don't know of any by memory
It sounded smarter pre-edit 😛
which one trains better: dreambooth from the ui or dreambooth from colab?
They use openclip, which is a recent open source project. Original clip uses a closed dataset