@regal lion if the text encoder is CLIP or T5 from the web, well, it's not like you got express permission for CLIP's training data, or for the text that goes into T5. CLIP also performs too poorly for Stable Diffusion conditioning when it's trained from scratch on fewer than 200M images, so you guys didn't train it from scratch. do you see why this matters? Adobe has the same issue.