I radically changed my dataset:
- removed full shot, half-body, closeup etc
- removed location tokens (e.g. exterior, interior, city street etc)
- removed all mentions of facial expressions but smug
- removed mentions of poses such sitting, lying, etc except fighting pose
- removed all , and ;
- removed mentions of non-show clothing
- made the syntax the same for all images and added "with" in front of the hairstyle mentions
my thinking here:
my version 1.0 is very bad at locations and landscapes. not only do they look bad but i have noticed that its very focused on my training images. my belief here is that this is due to me tagging all locations within my images, like "city street" when an image has a city street background. so instead of applying the tlok artstyle to a random city street when prompting it, it will output the city street i trained it on. and because in some shots you would barely even see any of the city street it would now always only give me very zoomed in shots of a city street.
this is likely because each caption token is trained into the model as a token. so it trained my training image version of a city street into the city street token.
so now that i left it empty again it should still be able to learn what a street in the tlok artstyle will look like, but without training the specific street into the model.
and i have done this for all things now. so almost all my style images are now tagged as just "tlok artstyle" except for those inages where i do want certain things be trained into the model, e.g. when people wear earth kingdom clothing.
as for the other stuff: i think the whole full shot stuff was unnecessary and cluttered the captions too much. i should be able to prompt a full shot just as easily if i negative prompt closeup and the like.
i think the , ; are unnecessary and may even be bad? anyway lets see what happens without them.
i also removed mentions of any facial expressions the model seems to already know; e.g. all but smug.
similarly i removed all mentions of poses it should already know.
similarly to the city street thing i removed all mentions of non-show clothing and hairstyles such as a tshirt because i believe that the model can infer this information itself from the training but if i tag it as e.g. a tshirt it will now always try and give korra that tshirt.
and last but not least i had some images where it was "wearing X outfit Y hairstyle" but then some where it woulf be "wearing X outfit fighting pose Y hairstyle" and i have no idea if it influences anything but just to be sure i made the syntax the same for all now and made it grammatically correct, e.g. "wearing X outfit with Y hairstyle"
i have also even further upped the training rate from 3e-6 to 5e-6 and i am very curious if it that will work. i am doing it for only 25 repeats right now and taking a ckpt every 5 repeats.