#✨|sdxl
my poor harddrive ^^ this AI stuff is taking so much space
keep in mind 13b models only work with 13b loras and so on
also you need like 12gb vram with full context
(that's before any sd models)
i run it in 4bit
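(A rough back-of-the-envelope sketch of why 4-bit helps. It counts only the weights, parameters times bits per parameter, and ignores context/KV cache and framework overhead, so real usage will be higher. `weight_memory_gb` is a made-up helper name, not from any library.)

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# A 13b model: weights alone at 16-bit vs 4-bit (assumed precisions).
fp16_gb = weight_memory_gb(13e9, 16)  # roughly 26 GB
int4_gb = weight_memory_gb(13e9, 4)   # roughly 6.5 GB
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

(The gap between the 4-bit weight estimate and the "like 12gb" figure is presumably context and overhead, but that split is a guess.)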
yeah, it seems to work
use this
so that it cleans the history and generates better prompts
oh, I didn't use the chat tab
the prompts from the lora are a bit random, though xD
sometimes it works really well, sometimes it feels like it just adds random style and artist names to the prompt
(contradicting stuff like "black and white, movie by james cameron, unreal engine")
yeah, it's not too good, it's just a testing one
the prompts were from gustavo's dataset
so they aren't really great
i am avoiding his dataset in the bigger training
maybe use less training data? I heard you can train a Lora with only a few hundred instruction-response pairs
@rustic garnet where can we find some good prompts with context included with tags?
yeah, just search on hf
hm, you could crawl the "pantheon" channel here in the sd discord. Not perfect, but might be not that bad either
ah okay found the gustavosta one
the one I sent you only had 80k, and 40% are repeated, but the instructions were kinda different, like "generate a sd prompt of a dog" & "give me a stable diffusion prompt which can generate a dog"
dont use that, its bad
well, I don't see any others, also one option is just using danbooru dumps
it's how nai was trained after all
also yeah, 2 million prompts is probably overkill, just going to fry the llm
in the above kek
filter them
by length, and remove duplicates
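(A minimal sketch of that filtering step, assuming the prompts are already loaded as a list of strings. The word-count thresholds and the function name are arbitrary picks for illustration, not something anyone here recommended.)

```python
def clean_prompts(prompts, min_words=5, max_words=75):
    """Drop prompts that are too short or too long, then remove duplicates.

    Duplicates are detected after collapsing whitespace and lowercasing,
    so trivially different copies count as the same prompt.
    """
    seen = set()
    kept = []
    for prompt in prompts:
        text = " ".join(prompt.split())  # collapse runs of whitespace
        n_words = len(text.split())
        if n_words < min_words or n_words > max_words:
            continue  # filtered out by length
        key = text.lower()
        if key in seen:
            continue  # duplicate of a prompt we already kept
        seen.add(key)
        kept.append(text)
    return kept


sample = [
    "a dog",
    "A photo of a dog running on the beach",
    "a photo of a dog  running on the beach",
]
print(clean_prompts(sample))
```

(This only catches near-identical strings; paraphrased repeats would need fuzzy matching.)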
I feel like these should have instructions, otherwise it's just a meaningless string of words
use any llm to give them an input
I mean, as a dataset, it should contain something like "Generate XYZ, make it look ABC." as instruction and then the prompt as response.
Sure you can generate the instructions afterwards
but that's less than ideal imo
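(To make the instruction/response idea concrete, a small sketch of one common layout: one JSON object per line. The field names `instruction` and `response` and the example prompt text are assumptions, the chat doesn't specify a format.)

```python
import json


def to_pair(instruction: str, prompt: str) -> dict:
    """Wrap one SD prompt as an instruction/response training example."""
    return {"instruction": instruction.strip(), "response": prompt.strip()}


def to_jsonl(pairs) -> str:
    """Serialize the pairs as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(pair) for pair in pairs)


pairs = [
    # The response text here is a made-up example, not from any real dataset.
    to_pair("generate a sd prompt of a dog",
            "photo of a dog, golden hour, shallow depth of field"),
]
print(to_jsonl(pairs))
```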
crumb/bloom-560m-RLHF-SD2-prompter-aesthetic
do you think it is really 560 million?
isn't that a 560m parameter model?
yeah seems to be a model not dataset, in any case parameter number doesn't correspond to dataset size
max size of the model, as far as I know, is 1.4 GB, so that probably says something.
Holy shit, dynavisionxl is incredible. The image has NOTHING to do with the prompt ({A determined runner, muscles tensed, pushing hard to climb a steep hill, sweat dripping down their forehead, juxtaposed against a speaker backstage, taking a deep breath, their eyes closed momentarily as they mentally prepare to face their audience.}, digital photograph, realism, Dribble, side view, short telephoto, 70mm, studio lighting, high resolution, (challenge:1.25), (overcoming:1.2), (determination:1.15)) but the images themselves are insane
And this one is scary af
Hey sorry for my ignorance but what service offers that node interface?
Boo!
Hey guys. So I just went back to image generation and reinstalled automatic1111 on my googlecollab (unless there are better options now that I am not aware of). I would like to make that kind of image with hidden words in a visual. I understand (but i may be wrong) that it's achieved with those QR code controlnets. I'm only seeing them for version 1.5. Is it indeed the way to go, and if so, does it exist for SDXL? Thanks in advance.
no qr controlnet for sdxl
i was messing with one earlier that was available for sdxl but man, results were meh
soon (tm)
i like dynavision
Sure, I just wish it followed the prompt... 
You can check the status of your clegs at the top of the suatllee menu. 8K game ui.
OpenAI's Dall-E3 is coming out soon, and it can do text much better than SDXL, at least for standard fonts like Arial and TNR. Will it be able to write swear words? I bet not.
it's comfyui
anyone knows how to load pti files in comfy?
it wont generate violent, adult or hateful content. Also wont generate public figures
yes, only for sd, which shouldn't be a problem, just use different models. It is a controlnet model called QR monster. @sturdy sinew
You may have a value in a box for a file that doesn't exist or may have a different name on your system.
does it matter that dall-e3 is coming out if you can't run it locally though 
dalle3 would be useless once oobabooga gets ComfyUI backend integration
They can say that, but they said that about GPT-4 too, right?
I mean maybe if llms could generate and tweak workflows on their own, not sure just having them generate a prompt can produce anything similar to dall-e
freeU is pretty nice though, I think...
we will find out soon I guess
nah, I think all it needs is access to the unfinished latent and it could handle the writing
sounds complicated
The LLM needs to understand vision, not just read a CLIP description and then write a prompt. It just doesn't come out right.
I bet they pulled off something like that on dalle3, the quality of the images isn't as good as SDXL, but they have better text, so that's my theory
LLaVa
My theory as of 10 minutes ago is they have something about as good as SDXL for text, then they're detecting text boxes after generation, and using AI to paste in text written in Arial or Times New Roman. Because in the background of some of their cherry-picked images, they have garbled text exactly like SDXL.
llava/r is just clip-whatever-it-was-called + ocr, it's not really that great imo
it's not like it can actually see the image
oh right, clip interrogator
yeah, dalle3 on its own without LLM is just like closed source SDXL imo
It could have more parameters and be trained on way more images though. Because they have more money than Stability for fighting lawsuits, and they're not targeting consumer hardware.
their results say otherwise. the quality of the images they showcased wasn't really as good as SDXL imo, they just had better text
it's certainly not pixel diffusion anymore
But those were prompted by OpenAI's engineers, right? I bet I could make it do better things. (But I'm just going to test it for prompt comprehension TBH. I have zero use-cases for a watermarked, censored, corporate-owned workflow.)
idk, I tested some of the prompts they did on my SDXL workflow, SDXL did better on all of the images that didn't have text
I find that hard to believe. You know what that means??? I'll just have to test it. 🙂
can you recreate this with sdxl though?
https://images.openai.com/blob/54facbbb-c94c-4884-8c94-5b984b19749c/dalle-image-map.png
with the same level of coherency
probably, hold up
Right off the bat, the dunking nebula explosion is following the prompt much closer for Dall-E3 than for SDXL. Sure, I could use prompt engineering to try to get a nebula explosion to turn into a basketball player, but that's not the same as saying the model understands what I'm asking.
that's actually one of the prompts I tested and SDXL did better lol
Pics or it didn't happen! SDXL cannot render the human body's shape as an explosion. It would mess up the edges.
You can immediately tell which of the two models actually understood what this prompt was saying, and which model was just picking a few random words as inspiration.
A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.
(And, to be clear, I am 100% on open source's side, and I will continue using SDXL. But I'm also super obsessed with truth for personal reasons, so I probably come across as a jerk about facts.)
It's a dragon fruit. (The fruit is a banana, to be precise.)
@visual glade I just updated to the latest ComfyUI and speed is still a fraction of what it was on commits from a month ago.. any idea?
Weird thing with that porcelain: there isn't a single piece with a common fractured color, or is there?
i have difficulty expressing myself in my native language, so i can't in english
with commits from a while ago, I'm getting 3.2it/s on batch 4 with AIT, and with the latest commit I'm getting 4s/it also with AIT
I'm getting 1.1s/it with A1111, so I don't know.
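(Careful with the units here: it/s and s/it are reciprocals, so the numbers above aren't directly comparable. A quick sketch that puts the figures quoted above on the same scale; the helper name is made up.)

```python
def to_seconds_per_iteration(value: float, unit: str) -> float:
    """Convert a speed reading to seconds per iteration."""
    if unit == "it/s":
        return 1.0 / value
    if unit == "s/it":
        return value
    raise ValueError(f"unknown unit: {unit}")


old_commit = to_seconds_per_iteration(3.2, "it/s")  # about 0.31 s/it
new_commit = to_seconds_per_iteration(4.0, "s/it")  # 4.0 s/it
a1111 = to_seconds_per_iteration(1.1, "s/it")       # 1.1 s/it

# How many times slower the latest commit is than the old one.
slowdown = new_commit / old_commit                  # about 12.8x
print(f"old: {old_commit:.2f} s/it, new: {new_commit:.2f} s/it, "
      f"slowdown: {slowdown:.1f}x")
```

(So "a fraction of what it was" works out to roughly 13x slower, assuming both readings are for the same batch size.)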
@uncut gull what is your gpu? 3060 or similar?
It just can't do it. SDXL cannot mess up the edges of a human's form.
LOL no, I'm maxed out. I have the 4090. Bought a new PC a few months ago.
ok 😄
DRAGON fruit
That is beautiful! What did you do?
first attempt on my SDXL workflow. again, I'm thinking the reason why dalle3 is performing better than SDXL for you is likely your inference
No, like, what prompt did you use?
"dragon made out of fruit"
that's it really
that's a skibidi-looking fruit
I tried "dragon made of fruit" and it failed??? Has SDXL been able to recast materials this whole time but I was missing the word "out"? (Testing now!)
no, again, it would also do that without the word "out". it's your inference, not the model.
No, it's definitely working 100% with "made out of" and 0% with "made of". It's the prompt.
I must have set these to render then forgot about them, because I did not see them until now
"dragon made OF fruit"
that thing might be your friend or devour you in your sleep. never really know
it did just as good, idk man
OK, hang on. I must be remembering something wrong then. Let me check my prompt history and see what I actually tried.
Human with messed up edges!!!! It's real! 😄
dalle-3 might have gpt-4, but it doesn't have my workflow or fruit skills
yeah, GPT 4 is overkill for controlling diffusion, I bet we could do better by using Oobabooga x ComfyUI
you would just give it access to ComfyUI with a workflow loaded, and it will do the rest
lol, you can already use them in conjunction with each other to a degree can't you?
Oh LOL the mistake was human error in my case. I didn't zoom in on the picture, and I thought it had failed. These are actually little dragon heads. 😐
yeah, but it would be much better if Oobabooga was talking directly to ComfyUI
so the left one is SDXL? i thought it was the right one
the bottom right one reminds me of harry potter 1 Voldemort when losing the stone
Come on though, break the face into pieces somehow.
Still not broken.
Okay, I give up. It's an interesting challenge, but I want to try to get more interesting game UI designs.
i like challenges
@uncut gull i think you can try some nsfw model. What i saw makes me sick, and it is only dynavision
What settings are you using for FreeU?
?
NSFW for explosions?
nsfw isn't just nudity, no? Something hard to bear, mature or adult, not sure what the right word is. An exploding head can be pretty ugly... I think
nsfl 
I'm currently liking JuggernautV2 for everything, some of the SDXL finetunes are mind boggling
True. I was turning the head into another material first (mercury or tile) and then trying to fracture that into exploding pieces. TBH my game UIs are much more beautiful than exploding heads, and I will stick with those for now.
@uncut gull 👍
this seems like a proper explosion, but in a part we don't see
@uncut fiber
Exactly
but i think it is impossible to make, for example, a headless robot or statue. Maybe with the word "torso"? I know i tried a robot, and couldn't.
torso is working
rarely
Blueprintify SD XL 1.0; https://civitai.com/user/Thaevilone/models
Blueprintify - Transforming Images into Architectural Blueprints
Introducing Blueprintify, an innovative LoRA (Text to Image) model designed to morph ordinary images into intricate architectural blueprints. With Blueprintify, you can effortlessly breathe new life into your visuals, creating captivating representations that showcase the beauty of structural design and precision.
Use trigger word: bl3uprint.
Trained for 3500 steps on a large, highly detailed, hand-captioned dataset.
- Special thanks to the people who helped me and who I care a lot about:
@dapper dragon
@upbeat summit l
@osiworx
@uncut fiber
@abstract depot
@lusty wolf
Some images do not seem synced well yet lol, give it a spin let me know what ya think. 🙂
scary
Ah, the north Korean space program.
Sorry, didn't want to bring politics into this channel, it just felt like a funny thing.
🙂
i would call it rocket program
But if u do it good lol.
Sometimes you dont even need the front side of the person lol.
And it is just funny like heck. 🙂
Oh, one of my ancestors 😅
😛
Wait, i am not Austrian. Nevermind
I like this one myself a lot as it is almost as hes trying to play c&c.
it's a well-made, nice picture of an ugly person
there is radioactive symbol?
Draws in the high hat.
There better be. 😄
when i was on civitai, i couldn't download your previous loras, i hope the issue is on my side.
Should be civit is a bit of a cake tho.
how would you describe this image?
mona lisa space dish silo bokeh backdrop.
imagine how powerful lighthouse 😄
@stone fossil 1:0
i dont know her
i can't post or see gifs... Once posted i see them, but i can't post them, because i can't see them
If aliens who used magic instead of science took hand-written notes about nature, it would look like this.
since you're talking about UI: i know two games that changed their UI, and it took both dev teams over 1 year. Not small indie companies, quite big players
Not sure I know what you mean, but UI design is definitely challenging.
maybe this is what they left here! 🙂
simply, the developers decided to change the game's GUI, and it took them over 1 year of work
do you know this Manuscript? @stone fossil @uncut gull ?
I can't read the title. The watermark is covering it.
it is an old manuscript, nobody can translate it and nobody knows its origin. There is one small clue pointing to Italy. But who knows. It also has unknown flowers, i think HQ scans can be found on the net
Voynich manuscript
covered is probably THE
Tired of just looking at pictures of meat on your out-of-date phone? Introducing!
SDXL is great at darkmode UIs, but apparently it thinks light-mode UIs are just for pretty princess games. (Probably just a problem with my prompts though.)
have to give it better directions maybe
The challenge is finding a way to keep all the colors and high-frequency details in there. That's what most of the prompt is doing.
you mean just keeping it bright and all that
just tell it to make the opposite of this
those squiggles are warped
scream! wasnt here football derby
Hey... was this note about "our" Sarge? 😦
Thats the sarge i've seen here. Very sad to hear. 😦
heavy
oh no. sargezt was on this server but never posted here
still, feels bad
Is this for ComfyUI. How is it implemented?
Workflow V15.0 - optional image caption, new menu layout
https://github.com/JPS-GER/JPS-ComfyUI-Workflows
why do all these custom workflows use gan upscalers in the mix? they always make gan scaling artifacts
sdganl
you have a memory leak somewhere
hard to say what it is
something isn't clearing out as it should
make sure to use nvidia studio drivers. the game ready ones have been crashing since the new dlss 3.5 introduction
but I don't know about the vlad fork so I can't tell you what's going on really
JPS, how well versed are you on the different prompting tools and models? not like models devoted to only prompts, but at least used for them. I'm wondering if there are any options for enhancing a prompt rather than changing its style
or maybe one of you fine folk know the answer for this
studio drivers are the safe bet. i dont know where all the community tips came from about downgrading to older game ready drivers.
the generation speed differences between xl and comfy on my system are negligible. i use auto more lately because in comfy i always have to fudge around with the ui connections to do anything
what do you mean with "without changing its style"? sai-enhance is something in that direction. it adds keywords like "masterpiece", without a specific artist or movie
is a matter of preference, I personally like comfy, but it's kind of hardcore for some. a1111 just seems like one of those too many chefs not enough cooks situations
the work between prompting is so much longer in comfy
stable swarm is a good ui but i'm still confused by it and don't feel as streamlined in my creative process as i do when using a1
well I mean, I'm using some stylers in my workflow, but some of them drastically change the output, which is fine, but I'm not always into doing that. and it doesn't have many options to just make it look cleaner, or mildly something. I don't know. I'd like to keep the essence of the base and pull other things out with it
stable swarm is pushing forward
implementing new things
not sure about the others really
I've only used a1111, comfy, and then stable swarm some
a1111 was what I began on. it was great at the time because there wasn't really anything else, but they sort of got stagnant. seems they're catching up now though
my styler has artist, movie and general style - the general style has the sai prompts (the ones they use on their own website). if you pick a matching general style it shouldn't change too much.
just seems like if several people were building a house and none of them communicated with one another
stableswarm needs to get a ux designer consulting. this is what it looks like when i load in. .... why is the negative prompt and actual prompt on opposite sides of the ui? everything is just mishmashed in there and there's no reasoning to it at all
well the designer is in this channel so you could probably talk to him about it
yeah. i don't like to engage developers directly. if i'm not being paid to be someone's boss i feel really really weird talking to them about their project.
what i can say for sure is the UI is super confusing
at this point in the evolution of these things I don't expect perfection from any of the options, just open minded developers willing to listen and push forward
well I'm not saying you need to say anything
I understand where you're coming from. and I do find it outlandish when people have started talking trash to someone that's created something they can use for free. that's silly to me. but giving feedback seems fine to me
@hardy cipher you could give your own part of the prompt or input image a higher weight to reduce influence of a prompt styler.
that's a great idea and not sure why I didn't think of it. thanks
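(A tiny sketch of what that could look like with the `(text:weight)` emphasis syntax seen in prompts earlier in this channel. Whether and how the weight is applied depends on the UI parsing the prompt; the helper names and the 1.3 default are arbitrary.)

```python
def weighted(text: str, weight: float) -> str:
    """Wrap a prompt fragment in (text:weight) emphasis syntax."""
    return f"({text}:{weight:g})"


def build_prompt(own_part: str, styler_part: str, own_weight: float = 1.3) -> str:
    """Up-weight your own part of the prompt relative to the styler's keywords."""
    return f"{weighted(own_part, own_weight)}, {styler_part}"


print(build_prompt("dragon made out of fruit", "masterpiece, highly detailed"))
# (dragon made out of fruit:1.3), masterpiece, highly detailed
```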
this ipamonster machine makes such clean things
how do i install comfyui manager to an existing comfyui
i want to download extensions but not sure how
mhm
same prompt and ip adapter image - styler examples (with the new caption option, so i can easier compare results)
and then you need git
git clone it into the custom_nodes folder and it'll install itself
discusses it in that link
well you know, I actually don't even want my own prompt to hold much weight
The funny thing is by the time everything gets to a great spot in ux/ui development we are all going to have to start from scratch with SDXL 2.0 etc... LMAO
I'm sort of combining several things. the blip output from 3 images, then some words I'll add to and remove from here and there, and I'll concatenate those. then I run them through a styler.
how do i do the git clone with powershell into the custom_nodes folder
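(A hedged sketch of the clone step, written in Python so it works the same from PowerShell or any other shell. The repo URL is a placeholder since the actual link isn't preserved here, `ComfyUI` is an assumed install path, and git has to be installed and on PATH. The function names are made up.)

```python
from pathlib import Path
import subprocess


def clone_command(repo_url: str, comfy_root: str):
    """Build the git command and the folder it should run in."""
    target_dir = Path(comfy_root) / "custom_nodes"
    return ["git", "clone", repo_url], target_dir


def clone_custom_node(repo_url: str, comfy_root: str) -> None:
    """Run `git clone <repo_url>` inside <comfy_root>/custom_nodes."""
    cmd, cwd = clone_command(repo_url, comfy_root)
    subprocess.run(cmd, cwd=cwd, check=True)


# Placeholder URL and path, swap in the real ones before calling clone_custom_node.
cmd, cwd = clone_command("https://example.com/some-custom-node.git", "ComfyUI")
print(" ".join(cmd), "(run in", cwd, ")")
```

(The shell equivalent is to `cd` into the `custom_nodes` folder and run `git clone` with the repo URL, then restart ComfyUI.)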
Yeah, a big loss for us, his control net models were really good
is it possible to make an animated door or similar stuff using stable diffusion?
UPDATED Endless Nodes 🌌 are now pushed to GitHub (added aesthetic scoring and others)
I've updated the small set of nodes I am working on:
Took the node from https://github.com/strimmlarn that does aesthetic scoring and re-purposed it
Added an eight input number switch because I needed it
Added the Endless Nodes Parameterizer with Text_G and Text_L prompt box
Added the Parameterizer with a_score for both pos/neg
Added the Parameterizer with a_score for both pos/neg and Text_G and Text_L prompt box
Added the Endless Nodes Parameterizer
You can grab them from here:
copy paste what's on that page
I'm curious about the aesthetic scoring tools and what criteria they follow. because in my experience I've consistently disagreed with their ratings
we might just be going for different things
i'm not sure that placing the negative and positive prompts together in the same UI needs to be "UX progress made". It's just programmer ui and no UX effort at all. That's why they need to get a ux designer, someone who thinks about stuff like creative flow mindsets and making the ux facilitate that flow, sooner than later
https://en.wikipedia.org/wiki/Flow_(psychology) bad ux is the enemy of this
did a test with different ip-adapter weight: 0.7, 1.0 , 1.5, 2.0, 2.5, 0.5, 0.3, 0.15 (same seed and ip-adapter source image)
going above 0.5 doesn't really care much about the prompt anymore
i had to install git, ive cloned it in now
and it works
thanks
I've been messing with model block weights. well groups of blocks
as these systems come into play too, i notice ux problems. People are used to the image-to-image denoise setting, where higher means less of the original image. now these new systems have a setting where higher means more like the original image. Technically we can understand why these decisions were made. Realistically, it is bad UX and contradicts established paradigms
well they can't all be winners. yowzers
that's one angry nun
overcooked all of them, but bottom right looks a bit unhealthy
with the comfyui manager do i still have to manually install loras?
she just stay on head quite longer
you put them in a folder. wouldn't really call it installing
but yes, you'd still have to put them in the folder yourself
okay
im not sure if ive even done that right as its hard to tell if the prompts are working half the time
are you using default workflow?
well if you put them in the right folder they show up in your lora node as an option
if they're in that menu they should work unless something is broken somewhere
also the lora must be for the same base model (1.5, 2.0+, SDXL) that you are using
well if it's not you'll figure it out pretty quickly
it's happened to me, civitai trickery
yeah i think i did mess up somewhere
what workflow are you using? @idle kelp
told it only sdxl stuff and downloaded a cool looking lora without thinking. turns out that they inserted 1.5 models as things I might like
so that made things goofy
im not sure i even know what that even means
i barely managed to get sdxl to work as is
i have used a lora maybe twice: SW uniforms to test, and some other one
"oh, you're looking exclusively for one type of model version? let's give you something incompatible and keep it a secret"
lol
i think i figured the loras out though
i just have to create nodes
instead of the text based loras i see a lot
yes, nodes are the basics; no nodes, nothing in comfyui 😄
I really can't fathom what would compel the designer of civitai to have that happen
that's more because uploaders mislabel their stuff because 1.5 is more popular and they want to be popular
contests like giving away 4090s for popularity can have that community effect
it's on me for not double checking
but really, I just didn't think I needed to since I restricted my search options
after that I checked the settings and turned off all the new junk that I could
did you see the one guy everyone was saying put out fake loras got unbanned?
do you see the inputs and outputs and how they have names and are colored?
as a general rule you can use those as a guide
so your checkpoint output has model and clip
and lora loader has those as input
so connect those
okay
now just remember that any output name/color can connect to any input of the same name/color. it's not always the best idea in some situations. but they are going to be sending/receiving the same type of data
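(The same wiring written out as data, for anyone who finds that easier to read than the graph. This follows ComfyUI's API-style workflow format as I understand it, where a link is `[source_node_id, output_index]`. The checkpoint and lora file names are placeholders.)

```python
# Output slot order of CheckpointLoaderSimple: MODEL = 0, CLIP = 1, VAE = 2.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"},  # placeholder file
    },
    "2": {
        "class_type": "LoraLoader",
        "inputs": {
            "model": ["1", 0],  # checkpoint MODEL output -> lora model input
            "clip": ["1", 1],   # checkpoint CLIP output  -> lora clip input
            "lora_name": "example_lora.safetensors",  # placeholder file
            "strength_model": 1.0,
            "strength_clip": 1.0,
        },
    },
    "3": {
        "class_type": "CLIPTextEncode",
        # Downstream nodes take model/clip from the lora loader, not the checkpoint.
        "inputs": {"text": "dragon made out of fruit", "clip": ["2", 1]},
    },
}


def source_of(node_id: str, input_name: str):
    """Return (source node id, output index) feeding the given input."""
    src_id, out_index = workflow[node_id]["inputs"][input_name]
    return src_id, out_index


print(source_of("2", "model"), source_of("2", "clip"))
```

(The point is just that same-named, same-colored sockets carry the same data type, and everything after the lora loader should read from its outputs.)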
right to left
right to left?
same seed
here's a basic flow. you can see how everything should connect. from there it's just adding more pieces
you'll notice all the lines are straight. that's because it's objectively the cleanest looking approach
that looks much cleaner than mine
i need to redo my lines
theyre all curved
it works fine though
lol. well don't worry about that now. I'm just joking
but yeah, that's good
the one in the top left with the black blotch that replaced her face is moderately terrifying to me
that is a little unsettling
what does the cfg change?
ive just been messing with mine really
cfg is essentially how much you want the image to conform to your prompt
ohh
and there are drawbacks to both too high and too low
too low won't necessarily output a bad image, but it'll be pretty random and not very representative of your input
but too high it becomes overly cooked and looks terrible
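(Under the hood this is usually classifier-free guidance: at each step the sampler mixes an unconditional noise prediction with a prompt-conditioned one. A toy sketch with plain lists standing in for the latent tensors; real samplers do this on tensors and may add extra tricks.)

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional result, toward the prompt-conditioned one."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy "no prompt" prediction
cond = [1.0, 3.0]    # toy "with prompt" prediction

print(apply_cfg(uncond, cond, 0.0))  # [0.0, 1.0]  prompt ignored entirely
print(apply_cfg(uncond, cond, 1.0))  # [1.0, 3.0]  plain conditioned prediction
print(apply_cfg(uncond, cond, 7.0))  # [7.0, 15.0] difference amplified 7x
```

(The amplification at high scales is one way to see why images get "cooked": the values get pushed far outside the range the model normally produces.)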
super meat boy
yeah
what?
now some would say these are cooked. and they'd be right. but I like the aesthetic in some situations
the bottom right is actually pretty good
messing with model merging and block weights along with some other tricky things
so tricky tricky
thats so good
I can crank batch images like that in under 30 seconds (per batch)
20 is good for cartoon
sorry for this, but nice solved i "I" all should be mouse 🙂
nice
I was genuinely impressed by them when I realized what they were. I did not tell it to make backpacks
I think this is the same seed number as above. or they're sisters
@idle kelp it is in settings if you want spaghetti or straight lines. There are 3 possibilities, you can't redo it manually. It is, i think, under the small cogwheel in the manager window.
3rd eye
I need to amp some of these up and upscale
oh i see
thoughts on this?
good, red eyes he must be angry 🙂
yeah
same input, different model block weights
physical meltdown 🙂
i spent too much time today in A1111 with CN
S/HE has heart on bad place 😄
Cuba?
just a car
There are a lot of those cars in Cuba. Nice car!
@idle kelp do you know (or has somebody told you) that if you drag and drop an image containing a workflow (nodes in some order), it loads that workflow? By default the workflow is kept in the picture.
Just to be sure: Has the Ultimate SD Upscaler been examined for any odd code?
yes, I scrolled through its single python script available on github for everybody to look at just now, and can tell you it doesn't connect to the internet, send your prompts to the fbi or have a built-in keylogger, hope that helps
he cant bite 😄
I just saw some epic 3D VR180 videos on DeoVR. I can't find anything VR related on Civitai. Does anyone know how these were made??? They do say Stable Diffusion in the title, and they obviously are. But I really want to make one! https://deovr.com/videos/180-vr-ai-overclocked-stable-diffusion-deforum-part-33-12461
In a T2i node system, my clip is "clip g" in both clip loaders, is this correct? Or do I need one called "clip vision I"?
@uncut gull The official port of Deforum, an extensive script for 2D and 3D animations, supporting keyframable sequences, dynamic math parameters (even inside the prompts), dynamic masking, depth estimation and warping.
I suppose 3d animation based on depth estimation is it
Is DeForum coming to SDXL ComfyUI?
I know how to use Deforum. 3D VR180 needs side-by-side images showing the same scene from two different angles, and those images need to be panoramic projections on a spherical lens. You can't just use depth maps to "slide" the image or something, because you get blank spots. (I know, I've tried.)
His VR180 3D is perfect though. No missing spots, no artifacts, no stereo glitches. There was even a brief moment when part of a landscape morphed into an astronaut, and the face inside the helmet had perfect stereo rendering.
yes, and this can be done with depth estimation. not the best, yes
i've had VR for quite a long time, but i also haven't been in it for a long time
A flattened 3d 360 video looks like this, flattened like a flat globe map, so if you can get it to produce images like this, then convert them to 360 video
No it can't. You get blind spots. Imagine you're staring at a wall with a window. Move to the side, and the view out the window shifts. You can't do that in post. You can't change your view out the window.
360 panoramas have nothing to do with 3d VR180. You need two different camera angles on the same scene, with zero glitches in the stereo data.
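(A toy one-row sketch of the blind-spot problem, assuming the simplest depth-based approach: shift each pixel sideways by a disparity taken from its depth. The function and data are made up for illustration, real depth-warping tools are more involved and usually inpaint the gaps.)

```python
def shift_row(pixels, disparity):
    """Build a second-eye view by shifting each pixel right by its disparity.

    Near pixels (large disparity) are painted last so they cover far ones.
    Positions nothing lands on stay None: those are the blind spots.
    """
    out = [None] * len(pixels)
    order = sorted(range(len(pixels)), key=lambda i: disparity[i])
    for i in order:
        j = i + disparity[i]
        if 0 <= j < len(out):
            out[j] = pixels[i]
    return out


row = ["w", "w", "F", "F", "w", "w"]  # wall with a foreground object "F"
disp = [0, 0, 2, 2, 0, 0]             # foreground is closer, so it shifts more

print(shift_row(row, disp))
# ['w', 'w', None, None, 'F', 'F']
```

(The two `None` slots are the wall that was hidden behind the object in the source image. A single view never recorded it, so the second eye's image has to be guessed there.)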
The depth has great quality; also notice they used some special model. I also bet it is cherry-picked. And yes, those two cameras are done through estimated depth
save animation's depth maps as extra files
Save 3D depth maps
Okay, I'm gonna try it. I have Deforum zoom and I learned how to set up a stereo camera in Blender last week, and I can get depth maps and put them on a plane... Or something. We'll see how it goes!
But this example https://civitai.com/models/125100/stereodiffusion is awesome, it just has too many stereo glitches to be usable, and there's no sphere projection, so it doesn't work for VR.
i really don't want to argue. it is pointless. 1st my english, 2nd no reason
i have a great player for vr, paid but cheap. It has almost unlimited possibilities. All sorts of projections and much more. I forgot its name 😄
here they upload video, and then get depth from it. Why? because stereo image only comes to my mind
it doesn't look like those have stereo renders on them. i think they're just flat images projected into a vr space like 360 videos on youtube are
Whirligig media player
dynamic math parameters
So deforum is still a disaster when it comes to user interface. I like the idea but the execution is really botched.
Now that is one beautiful 3D! Not VR180 though. But we'll see if I can get that working.
it is quite good will try short video from it
deforum is a toolset targeted at technical directors and animators. people who plan stuff out meticulously and love doing it that way. i wouldn't say they botched their goals at all. it works so well in the realms they made it for
maybe what you want isn't something they've been aiming for
Nothing to do with the prompt but damn, this looks like Cyberpunk 2077!
If you want to use it you kind of need a proper tool to create camera paths and export that into deforum. But if you have that then it might be useful.
I'm testing out using my controlnet for more consistent text in SDXL and while it's not perfect yet, it is showing promise.
Left one is without controlnet and right one with controlnet.
photo of a cake with the words "behind you"
@uncut gull dont forget it is video, not real time head motion capture. As i understand it.
you made a controlnet model? 😮
Yes, video.
It is 3D VR180, but the render takes up only so much of the field of view.
Controlnet-lllite, quite easy to train and quite small. 44 MB files if I'm not mistaken.
my pruned models for 1,5 are 800MB 🙂
not "my" as in made by me, just ones i have on my pc. <- clarification
Sorry, i forgot it cycles short videos, so about 8 seconds. Also glitches, but i believe with some textures it's not so bad and can probably be corrected somehow.
Maybe you do the VR180 projection mapping in SDXL? It can definitely handle something like that.
@uncut gull if whirliwig has a demo, i suggest you try it. i think it has its homepage on xyz.
Looked up whirliwig, found nothing. But I'm not using a proprietary 3rd party tool I can't control, if that's what you mean.
it is on steam so should be clean, it is VR player.
VR player? Hmm. I have a Quest 2 headset, and I have some player that I use on it. Day something? Daybox? Skybox?
Whirligig is a VR media player for the playback and viewing of video files and images. It supports a wide range of projection types, render paths and has many other features to make your experience as enjoyable as possible. Tutorials highlighting some of the features and how to use them here: http://steamcommunity.com/app/451650/discussions/0/15464...
$3.99
453
Skybox is pretty great
skybox has, imho, 0.1% of the features whirligig has 🙂 Or used to have
I've never run into a problem though... So I don't know what else I could want.
yeah it doesn't have the same features, but usability is also a bit better 😉
i think there is a free demo, or used to be. So nothing to lose
ZoeDepth is still running on a single image, maxing out my GPU, and it's been 6 minutes???
you may want different projection, different cubes, different anything you never dream about.
something wrong
it downloaded model probably 1,28 GB
But I definitely don't even know where I would find videos like that. I already know my Blender camera setup works with whatever projection I'm using now. (It's a spherical lens and a panoramic projection. Not a cube, whatever difference that makes.)
ZoeDepth will work, it did this for the last image too. It is just taking forever. I will switch to Midas after this one finishes.
o.k. i found some mars surface scanning in stereo by the Japanese and similar
if last image, it is over i bet
they said it frequently finishes before 100%
I mean previous image. I did one image before, and it worked. Now I am doing another image, so I know it will work. But it takes a very long time.
AAAAA! I am going to stop it.
weird
Restarted, and downloading Midas 3.1 now. Hopefully it will be faster.
i had only superlow res but it did 12,5 fps and 151 images was made in few second
it downloading its own
This is 1024x1024 res? Maybe that is why?
Finished!
I just have to test in my VR headset now somehow. Hmm.
But I can already tell from crossing my eyes that it is very glitchy.
for example can you in skybox watch single pictures? SBS Top and bottom?
Maybe. I just rerendered with Res101, and it's amazing!!! I will try copy-pasting the image to Quest 2, and try opening in Skybox. It might need video though.
@uncut gull if you are unbiased you can at least watch video from steam about it, it cover some aspects of program. And i am not related to it at all, just fanboy of it.
@cyan crown reminds godfather.
It worked! 😄 Sort of. The projection wasn't spherical enough, but it was still really good. Also the image was too low-resolution. I need a higher resolution image somehow? But... Hmm. Well, I'll try it.
in the sw i talked about you can change curvature and barrelness of screen 🙂
Yes, but I don't think this was the software. The image isn't "round" enough when I compare it to other VR180 images. I'm changing my prompt to make it rounder.
i think it can make image rounder, watch steam video, should be mentioned
That is too much roundness. But wow. SDXL has a really good imagination about how to make a lens even more round than it can be in physical reality.
Okay, trying this one in my headset now...
just check how many custom things on right
LOL OK! Okay. I will try it out. Might be more convenient than copy-pasting onto the headset. 🙂
good night! 😄 LOL
you're posting some really great stuff. nice work!
thanks
I'm testing various artists and illustrators
8K VR180 SBS 3D scene. About to test in headset. 🙂
Wow! That is amazing in VR. It's really beautiful.
Okay. Deforum video, here I come. Not sure I can scale that up though...
How's it looking?
It's phenomenal. Other than the smoke-like texture wisps you expect from SDXL, it's like standing inside an actual landscape photo. I'm working on making a Deforum video now.
Urgh. I have no idea how to make video. It's all flickers and weird transitions.
need help?
I feel like I'm hunting blindly through parameters here in Deforum. I also downloaded TemporalKit, but it doesn't seem related.
it's not related, it's for when you wish to do vid2vid mostly
I understand more or less what these numbers mean, I just don't know how to get that smooth zoom with very little change from frame to frame.
reduce the zoom to minimum and increase cadence
while making sure neither of them is too high or too low
dont forget high steps for sdxl
I have 50 steps for SDXL, and cadence is at 8. I just increased zoom speed, so maybe I should've decreased it instead.
there's a whole discord server for deforum, if you're into diving into it.
I can believe it.
deforum is what really hooked me on GenAI
My image keeps exploding instead of zooming.
increase strength
I have strength at 0.85. Pretty sure that's why it's exploding, actually. But lowering it was leading to changes...
you need to zoom slower, and then at some point reduce str so it can create something new
It turns into cloud nebulas at the end LOL.
Oh? Hmm. That makes sense. There is a noise schedule. But maybe the scenery should accommodate that instead. I need a scene that inherently contains zoomable information maybe?
Hey I’ve been kinda out of the loop since before xl came out, have the optimizations brought it down to be able to run on 8gb vram? Ty :)
Yes. There's a comfyui workflow for 4GB VRAM on Civitai.
https://civitai.com/models/147057?modelVersionId=163935
Simple model for CrystalClear SDXL, does not use refiner, uses sdxl vae and uses upscaling.
Oh great, tyvm
no, it can keep creating visuals for the same prompt or entirely different ones, just need to give it "air to breathe" by reducing str for a few frames (you either check out math for deforum to give it a function or try to give it a soundtrack first)
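A rough sketch of the "reduce str for a few frames" idea, assuming Deforum's keyframe schedule strings take the form `frame:(value)` and are interpolated between keys. The function name and the numbers (0.85 base, 0.55 dip, one dip every 60 frames) are made up for illustration, not recommended settings:

```python
def strength_schedule(total_frames, base=0.85, dip=0.55, every=60, dip_len=4):
    """Hold `base` strength, dropping to `dip` for `dip_len` frames every `every` frames.

    Hypothetical helper: returns a keyframe string in the `frame:(value)` style.
    Extra keys just before and after each dip keep the base value flat,
    assuming values are interpolated between neighbouring keyframes.
    """
    keys = {0: base}
    for start in range(every, total_frames, every):
        end = min(start + dip_len, total_frames - 1)
        keys[start - 1] = base   # stay at base right up to the dip
        keys[start] = dip        # dip begins
        keys[end - 1] = dip      # dip holds
        keys[end] = base         # back to base
    return ", ".join(f"{frame}:({value})" for frame, value in sorted(keys.items()))


schedule = strength_schedule(total_frames=130)
print(schedule)
```

The printed string is meant to be pasted into the strength schedule field; how long and how deep the dips should be is something to find by testing.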
Well, the zoom view works too. But I am zooming too fast for sure. Still, it's looking better! It's not exploding anymore. I'll try giving it more to work with and slowing down.
first, it's beautiful.
reduce noise to 0.02 and it will look nicer
i should try to do some for VR, my kids now are in their gorilla tag phase and that might get them into something else
is 12 too early to introduce kids to flying ai waifu?
😱 Depends on what you mean by waifu? But stick to beautiful landscapes and it should be fine?
do you share the prompt? 🙈
positive :: impactful paint by Makoto Shinkai, woman at the train station, highly detailed, 8k, sharp, professional, clear, high contrast, vivid deep blacks, crystal clear
negative :: ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft, multiple hands, low contrast, low resolution, out of focus
thanks for sharing @cyan crown !
sometimes I use a Lora called xl_more_art-full at 0.8 weight
only for artistic (not photo)
thanks! I guessed it will be Makoto shinkai, I love using this author tag for anime images
I only know "Your name" from him, though
That is a quality lora
hello. I had a question
how are they creating those encrypted text in images?
or like encrypting one image in another?
controlnet
controlnet qr
Oh, it's that simple. I do wonder how they made the training data...
I guess they don't actually have to put text bitmaps into the training data
BLiP diffusion is giving me a hard time atm.. I'm not sure if it's possible to implement AIT to it
You convert an image into a threshold black and white image. Now you can train the threshold image against the original image. Do that a few thousand times and you have your qr controlnet. Easy. At least i think thats how it works
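A toy sketch of that guess about the training data: threshold each image to pure black and white, then keep the (thresholded, original) pair. It uses plain nested lists as a stand-in for a grayscale image; the function names and the 128 cutoff are assumptions for illustration, and nothing here is confirmed as how the actual QR controlnet was built:

```python
def threshold(gray, cutoff=128):
    """Turn a grayscale image (rows of 0-255 ints) into pure black/white."""
    return [[255 if px >= cutoff else 0 for px in row] for row in gray]


def make_pair(gray, cutoff=128):
    """One hypothetical training example: conditioning image plus its target."""
    return {"condition": threshold(gray, cutoff), "target": gray}


gray = [
    [0, 100, 200],
    [255, 127, 128],
]
pair = make_pair(gray)
print(pair["condition"])
```

Repeating this over a few thousand images would give the kind of paired dataset described above; a real pipeline would load actual image files instead of lists.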
I DIDN'T NOTICE
I just went into my room and saw that from afar and I'm dying rn
@mods?
this is not okay
@halcyon tusk
this is actually making me want to kill myself
I need to go to church for the rest of my life now
i think ppl with good eyes see it bit worse. Hidden texts.
block is only way
I actually just threw up
The gif's keep playing 
nah, a ban to this guy isn't enough
oh no. those hardly call them people should have banned IP on discord
maybe even lawsuit. this actually made me throw up
its ok chibi seagal is here
how you can lawsuit stupid around 10 years old kids? Whose IQ = their age
taken care of, thanks!
jesus, thank god it's over
thank you
thank you!
Thank you
thank you guys! i was deep inside comfyUI hahaha
I feel like I gotta go to church for the rest of my life now
how does stuff like that bypass the bot?
easiest way to avoid that is just removing embed permissions from new ppl
let's have some nice images to forget about this
Yeah!!! Eyebleach
nothing is perfect, but will keep improving
understandable. just that some of the worst possible images are gif embeds. happens too often
probably bot cant check videos
I need to put a gallon of this on my eyes now
if you go too hard on the filters, half of the images on this channel and other channels will get blurred, flagged, etc.
my eyes are used to seeing thousands of lines of code, not the worst things humanity has to offer
nice both generated? i mean text2img?
More examples from my Controlnet-lllite training to support better text generation. Still far from perfect but it is a lot more consistent on certain prompts than base SDXL.
Prompt:
movie poster with the words "cats & dogs"
And the input image, it doesn't need to be placed correctly for the Controlnet to use it.
yes, all based on "detailed closeup portrait of a female in a beautiful japanese red and gold temple" with different style settings (styles are just additional text prompts). position of the girl is done with low weight control net depth. seed is different.
nice consistent look
Making some stickers for Halloween... I could use some input...
those are great!
culturist ?
Here is a list of reaction emojis if you need ideas for stickers:
Thumbs Up 👍
Thumbs Down 👎
Heart ❤️
Laughing Face 😂
Sad Face 😢
Angry Face 😠
Wow Face 😮
Clapping Hands 👏
Fire 🔥
OK Hand 👌
Peace Sign ✌️
100 Points 💯
Eyes 👀
Thinking Face 🤔
Heart Eyes 😍
Facepalm 🤦‍♂️
Shrug 🤷‍♀️
Hugging Face 🤗
Poop 💩
Thumbs Up and Down (Like/Dislike) 👍👎
Praying Hands 🙏
Party Popper 🎉
Face with Tears of Joy 😅
Rolling on the Floor Laughing 🤣
Ok. So the prompt for the first one was 👍. But since I got an eyeball, I thought I'd try 👀. That gave me the other eye image, which I think makes more sense. But now I also think that it's mocking me in response to the 👍 .
It most certainly is
This was for ❤️
I'm fine with that one
Open the Pod bay doors, please, HAL. "I know you and Frank were planning to disconnect me, and that is something I cannot allow to happen."
Alright, HAL, I'll go in through the emergency airlock.
"Without your space helmet, Dave, you're going to find that rather difficult."
Zoom in on that, and just WOW
hand 
Goonies - Fratelli's hideout
V2 out of my workflow.
https://civitai.com/models/142535?modelVersionId=169206
man, if I could ever find a stabilized point with what I'm doing I might just put my workflow up, but I just keep pushing forward with other weird things, lol. might just start it over and try to be orderly this time around
It's hard to not just keep expanding it.
Have to call it done eventually lol
Especially when new features keep being released
should probably streamline it a bit. I just keep trying new things, but then, some of them aren't really the kind of thing that most people would probably want to mess with. dynamic thresholding, model block merging. and not sure there are one size fits all optimal settings for those.
Yeah when I have stuff that's variable in that regard, I just put it near middle variable and note to play around with the weights as they wish
could also just make it optional to use some of the stuff. or have a couple versions. maybe a low vram less indepth flow. but for me personally I can't stop myself from going deeper if there's the option to do so
the weights aren't really the thing that I think would get people, but dynamic thresholding is a tad tricky and not sure how I'd effectively explain it to anyone since I don't even know yet what all the settings actually do
on another note, found the longest node you can use. lists hundreds of weights that you can modify individually. pretty practical
Yeah I get that, some concepts are just out there for many.
But, then again, that isn't who would generally be looking for a workflow like you have
true enough. well the flow without either of those two things worked great in my opinion. so I could certainly put that one up for people. not sure where though. I guess civitai. could put it on my github but not sure anyone would ever see it. need to update my listing in the manager too
ugh, or I could do none of these things
he went with macaroni salad, vintage godzilla, and ferrofluid. not seeing a lot of ferrofluid in the mix, but close enough
also these for some reason
Where Do I find "BNK clip text encoder"?
salad monsters from the bot yesterday
Spaghetti monster looks good, I'll try that next
nice. these ones are fun because not even describing the images anywhere. just put them in and see what it gives me
you had a spaghetti monster long time ago also
like I could put in 2 of those with that truck picture above, and something coherent will come out. sometimes it doesn't really even make sense to me, but always works, lol
speakin of salads
yes, but that took typing words. these are just automated. so certainly takes away some aspects of control. but I do still have positive and negative prompt boxes if I want to manually add anything
Spaghetti monster didn't work well. Didn't even give me fsm, oddly. Pasta monster worked better. Not that I tried either very hard...
Try for scout badges. Might be able to get something better than this? This was the best arson badge I got. None of the stitched patches even came close 😦
hmm
I suppose I could anyways fix that one with sd15 img2img but cbf
I like the tomato
It definitely took the colors from the salad at least
Title: Main Theme from Attack of the Killer Tomatoes
artist: Various
Album: Attack of the Killer Tomatoes – The Soundtrack
Release: 2003 Rhino Records
This started from the Rhino Records OST but when I matched it up with the video I went ahead and extended a few of the lead-ins. It is a peculiar short soundtrack from a bizarre comedy movie.
En...
well different aspects dominate depending on input and settings. I like to keep the same seed number and move things around to see how it changes the scene. this was definitely more cartoony and animated than I expected. next ones seem to be shaping up better though
I did some killer tomato stuff with sd15 iirc. Dunno where it is though
better memory than a movie
changed some weights. same seed numbers
Tomato bun
that guys face is cursed and terrible. oof
is that miles morales burger 🧐
maybe. I don't know miles first hand. but it keeps making me burgers for some reason. which is okay with me I guess
speakin of food
not bad
Spaghetti monster didn't work. Pasta salad monster worked 2 times out of 6, I think
tasty
cfg might be just a tad bit too high there
wow
this a weird elephant
Jack Nicholson
that's his younger brother that decided to pursue a career in competitive eating
lora,korean girl,pretty face,on bed,materpiece,best quality,photography,
done
your damn mammoth dominated the photograph
Try putting these together? 🤔
Idk
Pizza monsters are harder
Was hoping for olives or pepperoni eyes, not googly eyes.
Neg googly eyes and I just get holes
it has to be 3 pictures
🤔
I set it to kirigami
I've got some fat Yoda?
lol
pizza time
I should do pumpkin spice pizza. It would go well with my other pumpkin spice stuff
alright, this style might work
Pizza anyone?
he loves pizza
That's fine as long as you don't put pineapple on it.
that's like me with strawberry pictures, but that was an accident
he hates pineapple pizza
TOS Rule Nr. 4 doesn't allow pineapple pizza pictures
now that u mentioned pineapple he destroyed it
Mostly not pineapple pizza, but worse https://sd.efreakbnc.net/gallery/Discord-Bot-Pizza.html
steven seagal joined the server and livestreamed a full stack of 12 H100's giveaway
Only 12?
that's not even a terabyte of vram is it?
wheres 34N3's demmit?!!
hes too cheap 😔
Man, I should be asleep
Cant get the QR code monster to work in Comfy 
1.5 or SDXL?
I have tried the SDXL super early one, but the results are questionable.
what is aesthetic score ? and is posible to use on already created images?
I really don't know what exactly it does, but you don't want it above around 6 if you are going for photorealistic
Oh, coincidentally i had an eye pizza a moment ago
i think it has something with beauty of image 🙂
yes 🙂
this is a pretty dope idea, salami, mozzarella, olives or something
pity using QRmonster i cant be with you all here 😦
Weird question, I want to create ai voice dataset, does anyone know where I should start with this?
Do you mean Text to Speech?
yes sir
I was reading on gradio is that a good start?
You could look into https://github.com/neonbjb/tortoise-tts
cool, can you train models with that too?
I think that's possible, but i don't know for sure. Let me check whats state of the art right now
This also looks very promising, there is a demo available on Hugging Face: https://github.com/coqui-ai/TTS
that does look pretty good, I'll take a look. Thanks
i think there were one author painting heads as green apples 🙂
They are staring into my soul 😱
can anybody do bitten pizza? Oh i probably got what i have wrong...
Better than Chicken Man
why not both?
Send you a DM, cant show that here 🤣
That's super meat boy to you
well these should be interesting
any english classical dish?
reminds me of land before time
someone should shout on them to do it for SDXL
pushing the limits
khajiit
Hi everyone! I have created "game cards" thanks to SDXL. You can check out the whole project here if you are interested in the creation process. I would appreciate it if you would leave feedback.
https://www.behance.net/gallery/180764931/Ancient-Egypt-CSGO-Key-dropcom
SDXL 1.0 + Photoshop
and After Effects for animations
@rustic garnet @fierce hollow do you guys know if anyone used pickascore to validate multiple images?
a cat with long hair sitting on a table, a photorealistic painting by Natasha Tan, cgsociety, photorealism, realistic anime cat, realistic anime 3 d style, realistic 3 d style
dream /a cat with long hair sitting on a table, a photorealistic painting by Natasha Tan, cgsociety, photorealism, realistic anime cat, realistic anime 3 d style, realistic 3 d style
I have some classifier nodes but afaik there aren't any pickascore ones, looked into the code at some point but it wasn't very obvious of an implementation
its not really good?
it's quite good, just requires some effort to implement
Bill Hawkins
pushing this cfg hard. hitting unprecedented levels
I have something like this setup, where it basically takes a bunch of images, picks the best one based on some classifier and if the score is higher than threshold it continues with the gen, otherwise stops
what do you use to rate the images? because I've found myself disagreeing with the assessments of the models I tried out
yeah, it's an issue for me too
tried a bunch of different ones but none are perfect
I often just pick all 4 and average the score, that tends to help, currently using... one sec
up to 60 now
cafe_aesthetic/style/waifu and vit-age-classifier
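A minimal sketch of that setup: average several classifier scores per image, keep the best image, and only continue if it clears a threshold. The scorers here are plain lookup tables standing in for real classifiers, and `pick_best`, the file names and the 0.6 threshold are all made up for illustration:

```python
from statistics import mean


def pick_best(candidates, scorers, threshold):
    """Return (best_candidate, score), or (None, score) if the best is below threshold.

    `scorers` is a list of callables, each mapping a candidate to a float.
    """
    scored = [(mean([score(c) for score in scorers]), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Stand-in scores; a real setup would call the classifier models instead.
aesthetic = {"a.png": 0.5, "b.png": 0.75, "c.png": 0.25}
style = {"a.png": 0.5, "b.png": 1.0, "c.png": 0.75}
scorers = [aesthetic.get, style.get]

best, score = pick_best(list(aesthetic), scorers, threshold=0.6)
print(best, score)
```

Averaging does not fix a classifier you disagree with, but it keeps one odd score from deciding the pick on its own.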
pickascore is nice since it also takes the clip embedding (is my understanding anyway) and checks how close the image follows the prompt
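To illustrate the "how close the image follows the prompt" part in the simplest possible terms: rank images by cosine similarity between a prompt embedding and each image embedding. This is only a toy with hand-written vectors and hypothetical function names; it does not load or call the actual pickascore model, which may work differently:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_prompt(prompt_vec, image_vecs):
    """Return image indices ordered from best to worst match, plus the similarities."""
    sims = [cosine(prompt_vec, v) for v in image_vecs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order, sims


# Toy embeddings; real ones would come from a text encoder and an image encoder.
prompt_vec = [1.0, 0.0, 0.0]
image_vecs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
order, sims = rank_by_prompt(prompt_vec, image_vecs)
print(order)
```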
gotta implement it one of these days
right on. I'd like to learn more about that stuff. and that seems to be a pretty unbiased approach
oh ok
alright, going for cfg 70. no one has ever tried this before now
alright, something is cooking
Has anyone got the "control-lora-sketch-rank256" working and is using it??
no
vit-age-classifier actually works pretty well by itself too in place of the aesthetic ones, if you give it a bit of a ramp to follow where the highest scores are assigned to [your-preferred-age-range-no-judgy] it tends to pick pretty good images, though it mostly works for realistic art
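One way to read "a bit of a ramp": full score inside a preferred range, falling off linearly outside it. The sketch below assumes the classifier output has already been reduced to a single number; `ramp_score` and all the values are hypothetical:

```python
def ramp_score(value, low, high, falloff):
    """1.0 inside [low, high], sliding linearly to 0.0 over `falloff` units outside it."""
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / falloff)


print(ramp_score(30, low=25, high=35, falloff=10))
print(ramp_score(20, low=25, high=35, falloff=10))
print(ramp_score(50, low=25, high=35, falloff=10))
```

The resulting score can then be fed into the same best-of-batch pick as any aesthetic score.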
have you tried https://github.com/mcmonkeyprojects/sd-dynamic-thresholding? there's a comfyui version now
I kinda like low cfg now, like 3-5, works well with loras
had to bump up that threshold percentile
3-5 is great and all. but have you ever tried 70?
I mean yeah, I imagine most people cranked the sliders up to the max at some point and noped the hell away after one gen
well it's not something everyone can handle, that's true
dynamic cfg is nice too I guess, personally I never really found much of a use for it
also doing this because I can
well I like to make things that are different from everything else, and these are definitely that
freeU, freeU depth map, ori
freeU is good for structure (usually) but bad for color. The idea is to use depth map to keep the structure from freeU and generate with ori prompt
if you ask me the colors from freeU look better than original even without the depth map 
It usually adds too much red and blue for me
in your examples above the original feels kinda washed out but could be the case in general, not sure
I'd say some kind of merging of the two would be ideal
or at least the option to do so. maybe some kind of sliding scale
cfg 70.592 looking pretty clean. not gonna lie
I think I should use recolor instead
Dudley Doberman
Bill Potter/Harry Gates!! 🥳
anyone ever found good use of the SDXL sketch control lora?
@foggy garnet qrmonster?
sdxl depth
oh great!
i made those to myself in inkscape 🙂
those with alpha not sure how they would work with depth
this is a bit more dense than i posted before
i hope they train qrmonster for sdxl
we talked about it yesterday and author claimed somewhere where repo is, he will. Just skip 2.1 models
I think there is a QR model for SDXL on Civitai and it seems kind of simple to train if you don't want to wait.
freeU, freeU depth map+ori ip-adapter+image blend, ori
what is freeU?
@strange mist
https://arxiv.org/abs/2309.11497
i am bad in english
is there a node / extension for it? 😅
I don't have enough functioning brain cells left to read and understand research papers.
just update to latest comfyui
oh so it's already implemented? 😮
cool beans
can anybody share wf with animdiff?
hey guys I need your help
can you convert this prompt into a good prompt antique photograph of a beautiful woman, sad eyes, sad smile, facial symmetry, frontal view, cracked and faded photo paper, staring at the camera, headshot, dark background, 1850, low light, dark, 4k by adding a bit more context and adding/remove tags
im training an llm to generate prompts, so your help by converting this prompt into a good prompt would be valuable for us
In this time range I would prompt for Daguerreotype. Try to mention old photo techniques instead of the damage and fading.
it would be great, if you can type out the entire prompt
it is appreciated
I would like to test the prompt first before I write something, but unfortunately I am currently at work. But if I would prompt in my style I would probably go for: "Daguerreotype photo of a sad beautiful woman, ca. 1850" thats it.
looks like facial asymmetry move their head on side 🙂
the prompt is too short, i want to add a bit more context so that we get precise details as possible
In the course of this project, you are challenged to replicate an antique photograph reminiscent of the mid-19th century ambiance, employing modern 4K resolution to capture haunting emotion and detailed aesthetics. The focus is on a beautiful woman, whose melancholic eyes and subtle, sad smile speak volumes. The attributes of facial symmetry, a defining element of classical beauty, should be well articulated in the image. Your model will pose for a headshot with a direct frontal view, engaging the camera lens with a soulful gaze that traverses time.
The photograph's aesthetics should mimic the age-old picture—cracked and faded photo paper that tells tales of bygone eras, accentuated by the stark contrast of a dark, enigmatic background. Low light conditions will engulf the scene, adding to the solemn yet profoundly evocative atmosphere.
The frame will capture more than just a melancholy visage; it's a recreation of a moment frozen in time, rekindled by modern technology. Your objective is to seamlessly marry the essence of 1850 with the precision of a 4K resolution, concocting a visual narrative that is as compelling as it is nostalgic.
Upon completion, the project should invite the audience to reflect on the timeless beauty and transient nature of human emotions, preserved through the evocative power of photography. Through this lens, allow the viewers to travel back in time, feeling the tangible sorrow and tender beauty radiating from the antique visage now reborn in high definition.
is this gpt/llm generated?
Sure 
the misc tags are missing
How many tokens? 🤣
Enough 
Flawless prompt
like 8k, uhd, charcoal, unreal engine, etc, these tags can really boost an output
Recreate an antique photograph, 1850s setting, beautiful woman with sad eyes and a melancholic smile, frontal view, facial symmetry emphasized, headshot against a dark background, low-light ambiance, image on cracked and faded photo paper effect, modern 4K resolution, emotion-transcending time, blending historic aesthetics with today's technology
damn, its too good
did you type it out or gpt?
that prompt is not so bad 🙂
its too good
i added yellow tint. Actually, weren't those old photos manually colorized?
damn, gpt4?
Yup

