#✨|sdxl
more f*ckton details
Ahhhhh! That's so beautiful! What a great color palette.
I have not seen anyone else use this workflow like Scott Detweiler's, where he uses Refiner (3 steps) > Base (12 Steps) > Refiner (20 Steps)
https://youtu.be/2Xe79Nl_6jA?t=791
Has anyone tried it and had better outputs with it? (I have not had a chance to try it yet, but I really want to now.)
Since we have released stable diffusion SDXL to the world, I might as well show you how to get the most from the models as this is the same workflow I use on a daily basis at stability.ai. In this video I show you some of the basics on how to get the model from the models to generate your best AI artwork from our models. You will need some of ...
I believe one of the devs said they tested a bot with it, and people preferred the normal workflow to this
He starts with refiner? 
Yeap
Interesting, gotta try
I've also seen base+refiner+base
Im running base+base+refiner with 2 upscale steps atm
at the end it's all very random, depends on the subject, on the seed-luck and personal taste
Yeah for sure
y'all got any controlnets? /scratches neck
is there a vae fix in the original 1.0 models since it was discovered?
Yes the new files are on Hugging face I believe, the 1.0 model but with the 0.9 VAE.
This workflow does surprisingly well 
Im upscaling between the base and refiner switch
what is everyone's go-to for removing the DoF blur on most photograph-like images with SDXL?
Pray and luck
((Bokeh)) in the negative prompt, there are probably better ways, I saw someone post on Reddit about how they did it, but haven't had a chance to go back and read that yet.
so i dont need the vae fix for the refiner?
use same vae as for base
I think there is a new file for that as well.
😄
that is my current goto. thanks 😄
oh, thanks im sorry
This "Noise Conditioner"/Pre Refiner Workflow produces some pretty dark and edgy/creepy stuff, and might be excellent for horror art:
Comparison to what the more standard workflow produced.
neat 👀
I'll add it to my collection, thanks 👍
I have trained a LoRA model using "pip install autotrain-advanced" in Google Colab, then added the model to my Lora folder, but sadly it doesn't show up here. Is there a specific process for this part?
is inpainting a thing in comfy?
https://www.reddit.com/r/StableDiffusion/comments/15gc941/introducing_rundiffusion_xl_free_and_open_model/ lots of models to play with already. community moves fast
It does seem to produce some pretty photo realistic waifu. I like it 🙂
Can get a lot of twinning etc. from the refiner, so have to cherry-pick more.
Yes, but it's not as smooth as A1111. You have to use a Load Image node, then pass that to a VAE Encode node. There is a mask editor in the Load Image node but not as many inpainting options as A1111.
very extensive testing and research
Sound like I'll just use a1111 for now
For inpainting it is still easier, but SDXL support isn't as good and VRAM management isn't as good either. Use --medvram if you have less than 16 GB.
Control net and inpainting model when? Lol
they've said soon a lot and that they are working a lot on it but are changing how they do controlnets so I guess might take them a while
ControlNet when it's ready (soon I expect). Inpainting model when the community makes one, I have never really needed to use one.
ya they're training their own controlnet model that's lightweight compared to the old 3rd party one that everyone used. anyone is free to train their own sdxl controlnet models and spread it themselves
I trained a few dreambooth models/loras in 1.5, but no idea how I'd go about training a controlnet model
morphing tattoo
you need one as soon as you do inpainting with high noise strength
Yeah tattoos get wild
controlnet is more useful to me than inpainting for the most part. and couldn't you technically inpaint with 1.5 or 2.1? might not look as good, but it could be used for smaller things
Or you can just use inpaint sketch or photo-bashing.
Do you guys know an easy way to get images for a LoRa training?
what kind of lora?
web crawler?
How many it/s do you get when rendering in SDXL? In my case, the results are much lower than on SD 1.5 (from ~13 down to 2).
I mostly trained models on specific things, like my less tech savvy friends, so I can troll them
1.1 it/s on RTX 3060 with --medvram and Euler sampler.
I have like 1.5
RTX 3090
actually, often dips to 1.4 or sometimes even 1.3
idk, I have GF 4080 and its like 2.20~
let me check. mine is low. 3060 with 6gb ram. it's a trooper though
its so slow
yeah it is
you know, when I got this laptop it seemed pretty high end, lol
So it turns out SDXL rendering is just going to be this slow?
What extensions do you use ?
now my images getting black at the end.
Remove --disable-nan-check from your command line. This option was not good and shouldn't have been added.
I'm getting 1.35-1.5 it/s with 6gb vram
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
does this actually reduce the quality of the output?
not great but sufficient for my hardware I think
ill try thanks
Try making a 1024*1024 image with a 1.5 model and then compare it to SDXL, everything else wouldn't be fair
Lower it/s is normal for higher resolutions
disable nan check strikes yet again
come to think of it
i dont recall anyone in #🤝|tech-support needing help with nan errors or anything on comfy, only a1
holy optimizations
I use inpainting way more than any other tool in 1.5
can some1 tell me what is comfyai? what is this and why ppls all the time talking about this? i need this for a1111 when using sdxl?
I find comfy so much more enjoyable, every day I use it a little more
ComfyUI and it's a little more advanced UI to use SD
Based on nodes
Need help, I am trying to get the exact outfit from LoRa Nami in CivitAI especially hair color but I am getting the output on the second picture, the whole outfit was not copied only the position.
I used Controlnet for the Openpose.
Are there any extensions I need to install?
And I'm on mobile too 
Yeah that's what I'm doing now
@uncut steeple @azure oxide and why is better than standard a1111?
I think this is because Comfy doesn't try to use fp16 for the VAE. A1111 tries fp16 first and then upcasts if NaN is generated. But if someone watched a bad tutorial and added --disable-nan-check they will instead get a black image.
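A rough sketch of the "try fp16, upcast on NaN" idea described above. This is not A1111's actual code: `decode_with_fallback` and `toy_decode` are hypothetical names, and the toy decoder only imitates fp16 overflow (65504 is the largest finite fp16 value) so the fallback path can be seen without a GPU.

```python
import math

def decode_with_fallback(decode, latents, nan_check=True):
    """Try the cheap half-precision path first; retry in full precision on NaN."""
    image = decode(latents, precision="fp16")
    if nan_check and any(math.isnan(v) for v in image):
        # NaN detected: redo the decode in fp32 instead of returning garbage.
        image = decode(latents, precision="fp32")
    return image

def toy_decode(latents, precision):
    # Hypothetical stand-in for a VAE decoder: values that would exceed the
    # fp16 range come back as NaN when precision is "fp16".
    limit = 65504.0 if precision == "fp16" else float("inf")
    return [v * 1000.0 if abs(v * 1000.0) <= limit else float("nan") for v in latents]

print(decode_with_fallback(toy_decode, [1.0, 100.0]))
print(decode_with_fallback(toy_decode, [1.0, 100.0], nan_check=False))
```

With the check enabled the second value is recovered by the fp32 retry; with it disabled the NaN is passed through, which is roughly how a black image ends up being saved.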
It is different, better is in the eye of the user
better is in the eye? it looks much harder to control than a1111
but idk, maybe have to test it
all those nodes looks hard
you can avoid the nodes by just loading someone else's workflow
You still need to figure out how it works then
After a while you get used to it and you have so much more possibilities
Inpainting is faster for adding something new to the scene. At least for me. It takes like 2 seconds to add anything using latent nothing.
for what?
Any custom nodes I use are detailed in the workflow in Credits & Notes
How can i solve this
I never tried that but as you can see the error is "Unauthorized". First thing is to check the URL format. Is it correct?
it literally tells him in the error the username and password he is supplying (if he is supplying any) are incorrect but it's not the channel for it
man i wish i could read
Many people are intimidated by the long error produced by Python so instead of reading them they paste them for someone else to read.
Also can be language barrier.
Its Hugging Face
Somewhere in the script it probably asks for a Hugging Face token to auth against the resource
Feels like that has been skipped
Where can you make a checkpoint model? I only find the lora option on kohya
Has anyone had any luck in the dreambooth space?
I've tried the native A1111 method and the Joe Penna repo but neither have worked yet.
It's been recommended to look into kohya_ss - I just still need to bite the bullet on learning that method.
Does there seem to be a substantial difference with Lora's vs Dreamboothing in SDXL? Or is it fairly comparable? I'm wanting to create consistent characters/art style for a project I am working on.
Side bar: does anyone know the state of controlnet? (I've been out of town for a couple of days and am trying to get caught up on all the things that have happened since the launch haha)
If you want to create consistent characters/style then you want to use LoRA.
With 6 GB checkpoints you do not want a ton of dreambooth models for every concept and character.
Kohya SS needs at least 7 GB of Vram (if it hasn't changed)
Totally fair. I had gotten pretty decent results dreamboothing from 1.5 - the project is video based so I am needing something that is going to lock in a bit more. Didn't know if there were any updates around that space.
How are the Lora's in SDXL? Are they good enough to pass as dreambooth quality?
Running a 3090 so that shouldn't be an issue
If really he can't do it at worst he can send it to me 😄
does somebody know how to train a checkpoint? i can only find LoRa
man, I remember when SDXL couldn't do this
That's neat
I'm not saying that Dreambooth won't give you good results, I'm saying that it is space inefficient.
I tweaked a few settings, first is cooked but second is a lot better.
Great.
i had a crazy idea for a add detail lora for sdxl. even though it really doesnt need it. wonder if itll work lmao
latent space magic!
Make a LoRA that removes details and then reverse it?
thats an interesting thought but i wanna keep this a secret. if it doesnt work ill tell and yall can all be like lmao your dumb
800 steps seems better
It's not 800 steps, it's 30.
30 might have cooked it a bit much too. I guess I should rerun it with fewer steps or a lower learning_rate.
Though that might be fixed by using the refiner.
Some details are lost here as you can see.
When you say 30 do you mean 30 total steps? Like after epochs? Sorry if that’s a silly question
Tried the RBR workflow. I think it gives the image more detail, and I added the upscale part of Sytan's workflow, which brings back the LoRA likeness with a detailed background.
That is cool as shit!
When I say 30 steps I mean 10 epochs with 3 images with 1 repeat.
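For anyone confused by the step count, the arithmetic above can be sketched like this. It assumes the usual convention that one step is one batch and that the batch size here was 1 (the batch size is not stated in the message); `total_steps` is just an illustrative helper, not part of any trainer.

```python
import math

def total_steps(images, repeats, epochs, batch_size=1):
    """Optimizer steps for a run: batches per epoch times number of epochs."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs

# 3 images, 1 repeat, 10 epochs, batch size 1 (assumed)
print(total_steps(images=3, repeats=1, epochs=10))  # 30
```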
3 images? Amazing
I really need to restructure how I think about lora training
any workable lora training config file for colab ?
keep getting error for the step "building text encoders"
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', './sdxl_train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=/content/kohya_ss/sd_xl_base_1.0.safetensors', '--train_data_dir=/content/drive/MyDrive/Images', '--resolution=1024,1024', '--output_dir=/content/drive/MyDrive/LoRA/output', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=0.0004', '--unet_lr=0.0004', '--network_dim=256', '--output_name=NANA', '--lr_scheduler_num_cycles=25', '--no_half_vae', '--learning_rate=0.0004', '--lr_scheduler=adafactor', '--train_batch_size=1', '--max_train_steps=12650', '--save_every_n_epochs=20', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Adafactor', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' died with <Signals.SIGKILL: 9>.
cant, then i get oom
That's the core issue, not the nan check error lol
Disabling the nan check is like sweeping the error under the rug and pretending it doesn't exist
i got an amd 5700xt 8gb i need every mem optimization i can to be able to generate 1kx1k native.
im happy that it works at all
Disabling the nan check is not an optimization, if you can't generate without it then you have other issues to tackle
but i can generate the images with it, so it somehow makes no sense what you said
i know its weird, but it is what it is.. base model with extra vae 0.9 works, base model with merged vae0.9 does not work, i have no clue why
Cry "Oink!", and let slip the hogs of war
Yes but what I mean is you don't need a dedicated inpainting model, I don't find them that much better for fixing hands than the standard models.
I use inpainting to take care of things once in a while, but honestly don't really use it that extensively. I've never been super worried about using inpainting models, but I guess I just don't understand what the benefit is
This is how I use it. It's just way faster for me.
https://www.reddit.com/r/StableDiffusion/comments/15ggzp8/15_inpainting_tutorial_ive_seen_a_lot_of_comments/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=1
Pytorch just upgraded its nightly wheels with ROCm 5.6 a week or so ago. Slightly improved memory usage for my XTX
does anyone know why "NUMBER != number"?
I tested your conditioning trick a bit, works great but there's still some stretch in the image and texture stretch/blur, more noticeable in closeup shots. Inverting the negative conditioning seems to fix the issue for me.
so the method is to change the aspect ratio?? how does that work exactly?
shouldn't the target values be at 4096 on the longest side?
probably, was quick and dirty just matching the exact settings
will play around with min maxing numbers later
so from my understanding and maybe I misunderstood ;). if your output image resolution is 1024x1336 the target values should be 3139x4096 (or the closest width in this case that is a multiple of 4)
yea you would want to get as close as possible to that aspect ratio, im sure someone will make custom nodes to do the math for us
your surreal images are pretty cool 🙂 looks like you're having fun
Yeah. Great stuff.
so should one side be 1024, or should the total pixel count be 1024^2? because I seem to be seeing conflicting info on that
Joe also said that is a work in progress. so maybe that's just the current best practice
Oooh was that base sdxl?
Oof, kohya doesn't like not-outdated python lol
please take this all with a grain of salt - it's just how I understood it, which could be totally wrong.
we talked yesterday with Joe Penna in the chat about it (not sure if you were there). you set your output resolution. you then calculate the target_values proportional to your output resolution. your longest side will be 4096.
output res / target_values
1024x1024 = 4096x4096
1344x768 = 4096x2341
also the values need to be adjusted to be a multiple of 4. so it would actually be
1344x768 = 4096x2344
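The calculation described above could be done programmatically along these lines. This is only a sketch of how it was explained in chat, not an official formula: `target_size` is a made-up helper, and rounding the shorter side up to the next multiple of 4 is an assumption chosen because it reproduces the 2341 -> 2344 example.

```python
def target_size(width, height, longest=4096, multiple=4):
    """Scale (width, height) so the longer side equals `longest`, rounding the
    shorter side up to the next multiple of `multiple`."""
    long_side = max(width, height)
    short_side = min(width, height)
    step = long_side * multiple
    # Integer ceiling division avoids float rounding surprises.
    scaled_short = -(-short_side * longest // step) * multiple
    if width >= height:
        return (longest, scaled_short)
    return (scaled_short, longest)

print(target_size(1024, 1024))  # (4096, 4096)
print(target_size(1344, 768))   # (4096, 2344)
```

For a portrait size such as 1024x1336 the same helper returns (3140, 4096), i.e. the 3139 from the earlier message bumped up to a multiple of 4.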
wow. that's interesting. I appreciate the explanation. I don't consider myself a dumb guy, but it's going to take me a minute to wrap my head around exactly what that all means, lol
Take this one for instance, native 1720x720, then upscaled 4x. From what i've gathered, it's as long as total pixel count is above 512x512, or 262k pixels on the old, or over 1 million pixels for sdxl
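A quick way to sanity-check the pixel-count numbers mentioned above. The helper name `pixel_ratio` is hypothetical; it just compares a resolution's total pixels to a square base (512 for the older models, 1024 for SDXL). Whether the model needs to be at or above that budget, or merely close to it, is still an open question in this thread.

```python
def pixel_ratio(width, height, base_side):
    """Total pixels of width x height relative to a base_side x base_side square."""
    return (width * height) / (base_side * base_side)

print(512 * 512)                      # 262144
print(1024 * 1024)                    # 1048576
print(1720 * 720)                     # 1238400
print(pixel_ratio(1720, 720, 1024))   # above 1.0, so over the 1024^2 budget
print(pixel_ratio(1344, 768, 1024))   # 0.984375, slightly under 1024^2
```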
that's just how I interpreted it. not sure if there's already a node that calculates that, but this can be solved programmatically so we don't have to do it manually
yeah, I don't have a problem with the math. but not exactly sure about the what and the why I guess. my knowledge on a lot of this stuff is still pretty limited
yea setting this manually is going to be a huge pain, im sure there will be a node solution once its all tested and figured out
joe did mention they have some custom nodes he can't share yet
@west breach was already working on something like that for his nodes yesterday 
nice!
I like this ****punk style, whichever one it is
what's the word on using loras on the refiner? is it necessary or beneficial? and has anyone used them on negative prompts?
all examples showing it only on the base model
doesn't mean it's not worth exploring :D. I got a list with so many things to try. with comfyui you have so many possibilities - which I love - choices.
good stuff
That workflow is awesome
what gpu u guys running
3070TI (8gbs of Vram)
2070 Super 8GB
4090 
4090
4090
3060 12GB
btw AttributeError: 'NoneType' object has no attribute 'bos_token_id' what does that mean?
Hows performance on SDXL?
I have the same card but it's not too great
Hows performance on it with SDXL?
How does it perform with SDXL?
I'm semi in the market for a new card atm
fine
I get about 1.3 it/s on my 3060 for SDXL
hmm thats pretty good
1344x768 46steps euler in both base and refiner around a minute per image. So 1.1-1.5 it/s depending what UI
I'm getting 13s/it 2070s 8gb 1024x1024
But you don't really need 46 steps, sounds overkill and if it takes around a minute I think there is something wrong.
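To compare the numbers being thrown around here, note that some are it/s and some are s/it. A small sketch of the conversion, counting pure sampling time only (model loading, VAE decode and refiner switching are not included, which is an assumption about where the rest of the "minute" goes):

```python
def sampling_seconds(steps, it_per_s):
    """Seconds spent on sampling steps alone at a given iterations-per-second rate."""
    return steps / it_per_s

def to_it_per_s(s_per_it):
    """Convert seconds-per-iteration to iterations-per-second."""
    return 1.0 / s_per_it

print(round(sampling_seconds(46, 1.1), 1))  # 41.8
print(round(sampling_seconds(46, 1.5), 1))  # 30.7
print(round(to_it_per_s(13), 3))            # 0.077
```

So 46 steps at 1.1 to 1.5 it/s is roughly 31 to 42 seconds of sampling, while 13 s/it is about 0.077 it/s, which is a very different ballpark.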
Yeah I probably have a bit too many steps 😅 I use SDNext and it's a bit weird how it counts them when I use 80/20 base/refiner ratio. But it's still under development
Not so bad, for example with the workflow of Sytan (with Comfy_UI), it takes ~2 min to generate (1024x1024 > 2048x2048)
Depends on how many steps and resolution you set
Do embeddings work in a1111 with SDXL? It says it has multiple items or something
I was wondering why my images were turning out a bit off. turns out I was sending both positive prompts to the pos/neg for the primary model, and both negs were being used for the refiner
very nice. top notch results
go on...?
I do not know what it is, but sounds like I should
I googled it and it thought I meant redxl
I might test it but I don't see the plastic look everyone is talking about so it might not benefit me.
I like that you are making these unconventional things. more people need to take this route
quadro p5000 16gb vs rtx 2060 12gb for sdxl?
anyone get comfybox to actually work?
more ram wins
looks like my room
🙏
I bet comfybox only works on Linux or something yet says for windows
how high is your CFG? 😄
deep fry
haha exactly
iow cfg 2
with a1111 I'd use some tricks to bump cfg into the mid-20s
need to figure out how to do that with comfy
I mostly go with cfg 4 - 5 right now. but it totally depends on the prompt build and style
I stick with 7 then XL hits and everything is so deepfried I am just not feeling XL
have you tried the script from comfy yet? https://github.com/comfyanonymous/ComfyUI_experiments/tree/master#sampler_tonemappy
I need to look into comfy cfg tools more
least crowded mario game
I've just gone with 2x latent size for all W/H and target-W/H. Seems to be close enough that there's no distortions or loss of quality that I can see compared to longest-side-at-4096. Until someone proves me wrong, that'll do 😆 4xLatent starts to distort. Guessing it caps the longer side at 4096 which craps out the ratios.
how important is it to do that? maybe I need to start implementing it
Likely not at all important as long as you have the same ratios. 😄
I don't like sub 7 because I don't want it to do whatever it wants I want it to do what I tell it.
I'm just confused on the ratios because before it seemed like it should be 1024^2 pixels, but now people seem to be making a side 1024. so not sure what's what
yea that sounds good enough for now for a quick and dirty without too much fuss
That I don't have a clue about. Maybe the model can handle things that are close enough. At least well enough that it doesn't matter much.
yeah, we don't know yet
@soft zealot https://github.com/space-nuko/ComfyBox/issues/132 Did you ever get this to work on Windows? Seems the dev is MIA judging by the tickets and last responses.
I like it. Seems to get arms and legs of people a little better, but makes a lot of very similar looking faces. Anything other than people, I'm not seeing much of an improvement.
RunDiffusion XL I assume? Haven't really played around with community finetunes yet, guess I just assume they'll be rushed/overfitted because tuners will race to be the first out the door, especially with prizes involved
there are a few finetunes that I have tried that are not too bad, but they all seem to come with a description that says it is an early version....Same with Run Diffusion XL, but it might be the best of the finetunes i have tried for SDXL
hmm I might give it a shot, even if its overfitted might get some mileage out of it by mixing it with the base to walk it back some
yeah, it has some good stuff mixed in there. Says it is focused on people and animals, photo style
On the last step it gets distorted?
wrong VAE
that dog ate something that was expired
I like the human legs coming out of the middle
I seem to get that crawl a lot too
are these images grainy?
Better... always forget to check the VAE... still learning.
might just be the resolution
Just popping in to see if anyone knows where Sarah Conor is... Just need to talk. DM please.
he looks like arnold Schwarzenegger mixed with beavis
he looks like a mix of arnold, dolph lundgren and conan o'brien
okay, well that makes sense actually
yes! thats what i was sayin in another chat
can't get the Dolph out lol he just keeps showing up in these
it just made me think of this guy for some reason
that guy is def a terminator
ugh, my clip data was bypassing the loras I setup. no wonder it looked like trash
so ran it through them, but then mainlined the clip text encode node to the checkpoint model
@hard fractal is something like the image merger in kandinsky and mj also planned for sdxl in the future?
new workflow test, multiple passes of base and refiner on multiple cond scales, seems to give pretty nice results (still looking for better ways, let me know if you find better parameters etc)
if you wanna try it out install 'comfy UI manager' and then click on 'install missing nodes'
good news, everyone, my node paths are straight lines now
I've leveled up
linear node paths are for madmen
I couldn't follow where anything was going with those
perfect. I don't want anyone stealing my secrets
So went back to the spaghetti
he looks like that one guy
does anyone know if it's possible to train textual-embeddings (not loras) with a1111 and sdxl yet? and if not, if it's planned to be added ?
pretty cool. even made a gun out of old tubing to kill the alien with.
no controlnet, no training - sdxl so far is a leap backwards
nice. pretty consistent look
For some reason that looks like it would be the Zuck. Maybe that's the metaverse he imagined?
there's training of loras @gloomy barn , those work pretty good and are easy to do
I'd go to that metaverse. seems pretty sweet
yeh honestly thats what I had hoped for.
@boreal bough can't wait to see your final one, this thing is rad
I just want hologram dinosaurs
that's got narrative built in. very nice image
I am really loving these
one of the first ai images I ever made was of bean trump
the black and white ones? yeah, thanks, I like that aesthetic
I am a HUGE Film Noir and B&W Sci-Fi fan.
nice. I have a bunch of them I haven't posted on here
various weird themes
Yeah, I was doing that in 2.0 a lot, and made models in 1.5 of that, but something is just not right with XL base and I can feel it, and on some subjects see it.
pure_fire saw it in some of his so some subjects are just so overcooked. A bit odd.
My hope is that a FT/DB comes out as who uses base since forever?
my fear is this ends up like 1.5 with a merge of a merge with another merge instead.
I bet that happens in spades since it requires more than 24GB to train it all, for now.
yeah, I noticed a bit of that
like things won't look congruent with one another. I don't know how to describe it exactly
those eyes
what's your prompt? I made such images with SDXL 0.9 with ease, but since SDXL 1.0 I'm no longer getting images like this :/
is that the new zuckerberg lora?
probably, it looks very close
nicee
likely gonna have to disassemble the whole thing to clean it, cause man this thing came in nasty
can you do 512 batch size now?!
seller lives in a very very small apartment. Likely the oils from him cooking turned the dust into more of a tar. Compressed air did nothing lmao
you may be trying to be funny, but I CAN do over BS 90 when training 1.5 LORA's lol
damn
good thing about this GPU being dirty from cooking oils is
it smells great lmao
It smells like freshly made Indian food, even though the guy says he never ate indian lol
🤢
plus the ram on 3090s gets nice and hot, fry some eggs on it
My mom and I make indian food all the time lol
nice
I am WAY happier with a food scented GPU than a cigarette black lung scented GPU lol
indian food is delicious. gpus that smell like indian food are not
3090 doubles up as a lie detection machine
gonna validate perf and stability, then disassemble the whole thing and clean it from the inside out
best thing is, the warranty covers disassembly
I love EVGA man
uh @hard fractal we should do that check it on the list
when will we have controlnet? ;D
That's amazing!! how did you do this??
thanks, man! I created the image in SDXL and the animation was generated in Gen-2. Here's a reddit post I created earlier with some info: https://www.reddit.com/r/StableDiffusion/comments/15fn4i9/sdxl_and_runway_gen2_one_of_my_images_comes_to/
any news on controlnet for sdxl?
was bored and tried training a Wes Anderson LoRa 🤔
batch size 1 run on a 2060s 😆
noice
idk who jerma is but at this point I'm afraid to ask
morning, beep boop
Are these numbers right for setting the target_width, target_height and the refiner width and height?
looks like the wrong vae
I'm using the Fixed FP16 VAE
don't think it works
thats a vae issue if i've ever seen one
which vae should I be using?
try none just to test first
Where can I find a list of existing LoRA models to use with Automatic1111?
thank you! I found a vae fix on civitai!
you can also get the 1.0 with the 0.9 vae baked in, then set the vae to none so it uses the baked in vae
Man, Witcher 1 looks kinda old nowadays.
pixel lora?
Yes
nice. I need to start figuring out the new loras. so many being put on civitai every day
I get extremely similar images when prompting sdxl vs 1.5. Like if I prompt 'beautiful fantasy landscape, moody winding trail' I get the subject in a very green forest on a trail. No variety. Where 1.5 would have had one image with a castle in the background, another in a tropical forest etc etc. Or if it's in a house, it's always the same brown wallpaper and leather couch
you should use 1.5 then
My question is, does anyone else have this issue or is this a personal problem? Do we have to prompt down to what kind of bush and how many trees are in the background? Just trying to learn the new prompting reqs
hmm
man, I just use a lot of words honestly
but did the same with 1.5
I don't know that you have to describe everything like that. but everyone has their own approach
i dont have a colab script yet but hopefully on my todo list
thanks a lot for comment
nice
dang, your youtube channel is quite active. nice work. I guess I should watch some of the tutorials. what method do you use for the loras?
well anyway, I guess I could watch the video, lol
Currently having some issues to upload images to the project page on CivitAI, so it's not up there right now. But for anyone interested, the latest version (v3.0) of my ComfyUI workflow is already here: https://github.com/SeargeDP/SeargeSDXL/releases
searge, nice work on those nodes! I use them quite often
still haven't worked out how to properly use some of them. but I'm getting there
Thanks, glad to hear that you like the project
haven't really figured out pos/neg g, l, r. I just sort of guess what to put in them. maybe not the best approach
actually updating the node pack right now
cool. the workflow now has 5 prompting modes, one of them is "simple" and is ideal to get started with the whole thing
that reminds me, im sort of surprised i havent come across a transistor node type yet for comfy for easy path switching
hmm, you mean on your end or have it done automatically?
I believe there are some logic gate type nodes, haven't tried them out
I want to put something together that would allow loopbacks. or maybe that's already a thing and I just don't know
can't believe this is not a real image
it is a real image
ooh, scary
The custom node DB is currently being updated, and updates to custom nodes are being checked for.
NOTE: Update only checks for extensions that have been fetched.
Base model is great, I don't get why people bother downloading finetunes Alphas
variety
Is there any technical information on clip_g and clip_l in SDXL and how exactly they are applied to the model? I know that several people have done "experiments" and then tried to guess at the meaning of these. But is there any actual documentation that explains it?
thought the sdxl bots just used the same prompts in both
Probably. And I think A1111 also does that. But Comfy allows different prompts for each. It would be interesting to find accurate information about how these prompts affect the output. I know many people have done "experiments" and then made guesses, but surely there is real documentation about this somewhere.
I don't think there is
That's very strange. To have a feature that no one understands.
Agreed
there's research, lol
it's not like they just put it in by magic. there's a purpose
I know. I want to understand the purpose so I can use them more effectively.
Any in here use comfybox?
Im sayin if the bots just use the same in both I think that's how they're supposed to be used. Ik some people experimented with different things but none of them were consistent from personal testing
believe me, I'm in the same boat. problem is most of the info I've found is either over my head and too technical or clickbait nonsense
I understand, but that's still based on experimentation and guesswork. I thought maybe there is some paper or even some documentation in the code about it.
there's no documentation on anything
it's discussed in there
A1111 is weird with sdxl. If I put something at the start of the prompt, it pays attention, but often changes the character. If I put it after some other things, the character goes back to what it was, but that part is ignored
not even on the width/height clip params
Different prompting actually worked but it is kinda weird too as sometimes it didn't do jack
If refining is flaky I can imagine G/L
Text encoder: SDXL = CLIP ViT-L & OpenCLIP ViT-bigG | SD 1.4/1.5 = CLIP ViT-L | SD 2.0/2.1 = OpenCLIP ViT-H
who has a workflow with G/L?
So it's simply different text encoders?
OK, I don't think I can understand all of that but I understand some of it, I will read through it.
same. but just take what you can from it. from what I understand it's essentially two distinct models for interpreting prompts? might not be wording that correctly. but it basically means you can have two distinct prompts for the base or primary model
CLIP-ViT-L-14 is a model that can map text and images to a shared vector space. It is based on the CLIP model by OpenAI, which uses natural language supervision to learn transferable visual models. It uses a Vision Transformer (ViT) architecture as an image encoder and a masked self-attention Transformer as a text encoder. It can be used for applications such as image search or image captioning.
I read it and I really don't understand a lot ;-). But it does tell you some technicalities how it works but I'm not sure you will learn how this will translate to real world usage.
DAMN - Nvidia Halts RTX 4000 Production
src?
Link?
I leak bad news for gamers hoping for lower Lovelace pricing…but also good AMD news!
0:00 GPU Pricing has Bottomed for a Few Reasons
1:51 Nvidia is already Throttling Supply...
ok. thanks! well we'll see when others are reporting it
OpenCLIP ViT-bigG is a model that can learn from text and images using a contrastive loss. It uses the ViT-bigG/14 Transformer architecture and was trained on the LAION-2B English subset of LAION-5B, a large-scale dataset of text-image pairs. It achieved an accuracy of 80.1% on ImageNet-1k zero-shot.
1.5 used vit-l only
vit-L
you better hope he is just click baiting else whip out the copium
so that prompt would behave similar to 1.5 I believe?
Nvidia has reportedly halted production multiple times already this year because of chip shortage - in march and may
Specifically, we use OpenCLIP ViT-bigG [19] in combination with CLIP ViT-L [34], where we concatenate the penultimate text encoder outputs along the channel-axis [1]
This makes me think that maybe it is good to keep the prompts the same. Like it's just using multiple data sources. But now I have to look up each of those models to see if they are intended for different purposes.
no, no chip shortages they stopped to divert to hopper
I'm glad I picked up my 3090, definitely skipping the 4000 series
screw it im just gonna get a phd in ML so that i can understand all of this SD mumbo jumbo
I just wonder how GD bad the 5k series will be and will it take both testicles to afford one?
nvidia will push prices for sure. 4090 sells well
Being a eunuch and having a 5090? I'd rather not have the 5090
They know gamers and home AI enthusiasts will pay any price for gimped hardware. So they have no reason to change.
I expect it to get worse tbh
Nvidia doesn't want to be a gamer company anymore. They want to be a datacenter AI company.
For me I am leaving Nvidia end of the month as I am jumping off this sinking ship while I can.
I think they just have a lot more sales to data centers now, so they no longer want gamers cutting into the bleeding edge fab capacity when they get higher margins on their data center cards
If that means I can't SD, or train (which it doesn't) then so be it. Time to buy a replacement I can barely afford and wait it out for a few years and really hope Celestial or Druid come through.
Train in the cloud.
Oh, yeah if 1 chip sells for 35-45k or it takes 10-20 chips of a 4090 for the same it is a wise decision for now.
No, I refuse to pay to train.
A nice compromise would be some higher vram 🙂
right now the AI industry is Nvidia's no. 1 customer
4090 = $1800 paid to train. That's about 3600 hours of cloud training time.
is civitai the only place to get sdxl finetunes or are there other popular sites?
VRAM is the thing they don't want to sell to home users because that cuts into datacenter AI.
5090 is 100% 512 bit bus which is confirmed and means 32GB min, but Micron now has a new technique where it doubles the ram for the same bus so could be 48GB (some cut)
48gb consumer card would be insane
I really hope that isn't the case, in theory it should benefit them to sell cards for people who want to do stuff locally? Unless they are expecting proprietary AI models to be the future that can't be run locally regardless of hardware
datacenters don't want GDDR they like that HBM stuff
and cost $2500 lol
for the 5060ti lol
Oh, 5090 I expect 2k-3k MSRP
It's just the profit margins. More profit selling a $30,000 48 GB card to a datacenter because no home user will pay $30,000. They will only pay about $2000.
I am not even going to pay 2k as I have been saving since Dec and still can barely afford 1k
I refuse to buy on credit or go into any form of debt.
I'll give up SD before I will
1k will get a good 3090 I believe
Last time I brought up proprietary models and AI alignment someone accused me of being Qanon, but if I were an AI company I would want to produce a closed source cloud-only model simply because I could make sure it had non-bypassable safeguards. Censorship is just safer for business whether images or text.
Only if I knew the person selling it
Ultimately data centers are more efficient for the end consumer though, like if I wanted to rent a cloud GPU for all the SD stuff I do I would be spending less money than I do buying cards from Nvidia.
Or a new 7900 XTX on amazon including tax assuming the rocm pains get ironed out...
Yeah I don't even share my finetunes lol I'm personally kind of greedy
7900XTX is mine end of this month the Nitro+
If AMD and Intel can get their AI performance on par with Nvidia on Windows it will be good for them. But right now Nvidia is the only solution that really works well for local AI.
like its in the mail?
nitro+ is badass I wish that one was available when I got mine
I mean that is when I will have the 1k
Ah. I have the merc black one from XFX if you want deets
btw, with rocm now on windows with hips I am just going for it as I saw training before that dropped on an XT and XTX is faster
have you used hips yet?
"hips"?
hips is downloadable (about 2 gigs) from AMD and allows rocm to talk to cuda
if you're talking the AMD HIP that's part of ROCm that's with it. I'm on linux so it already works
ahhh, I hate linux desktop so I stick with Windows
have you used hip though to talk to cuda?
Incorrect. HIP is AMD's version of CUDA. while it's cross-compatible mostly due to a similar api the code still needs to be manually updated for it to work.
So nothing CUDA will work out-of-the-box without someone at least recompiling it with hipify-clang
yeah, I barely understood about hip as that is new to me. Still, a good move
what pisses me off is pytorch and tensorflow dragging their heels for rocm 5.6 support
HIP is specific to AMD cards and it's similar enough to CUDA that some projects implement hip support by literally search-replacing all the cuda code with hip code, or using tools like amd's hipify-clang
it's not an automated process though. If a project doesn't do that, cuda kernels won't run on amd cards
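To make the "search-replace" idea concrete, here is a toy sketch in Python. It is not how hipify-clang works internally (that tool parses the source properly); the `toy_hipify` helper and the tiny rename table are made up for illustration, though the CUDA and HIP names in the table are real API names.

```python
import re

# Small, illustrative subset of CUDA -> HIP renames. The real hipify tools
# cover far more of the API and handle cases plain text replacement cannot.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}


def toy_hipify(source: str) -> str:
    """Rename the CUDA identifiers listed above to their HIP equivalents."""
    # Longest names first so cudaMemcpyHostToDevice wins over cudaMemcpy.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_buf, n);
cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);
cudaFree(d_buf);
"""
hip_snippet = toy_hipify(cuda_snippet)
print(hip_snippet)
```

The renamed source still has to be recompiled for the AMD card, and anything that leans on CUDA-only libraries or inline PTX needs hand porting, which is why a project has to opt in to this.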
right
pytorch has had rocm 5.6 wheels for like over a week now
Chicken and egg problem. 99% of AI is on Nvidia so why support AMD? But no one will buy AMD for AI until it has support.
btw, how does training work on your xtx?
rocm 5.6 benches
#✨|sdxl message
never tried.
it still has memory issues so I haven't bothered
I am buying AMD regardless. card is dying and I refuse Nvidia.
I can only run 1024 @ like batch 16 ish
3090 can do batch 36 or something like that
friend has a 3090 and can't do batch that high
Flash attention support is coming for rdna 3 like this fall or something and isn't coming ever for rdna1/2 afaik
so that may help
sdxl?
xl can do higher batches than 1.5 by a mile
yeah, but SDP is supposed to be agnostic and help but pytorch is dragging their heels
I don't ever talk about inference only training
ROCm literally doesn't have flash attention on consumer cards which is what SDP uses to function
afaik
RDNA 3 is getting flash attention like "sometime" which I think might be this fall
Until pytorch accepts that ROCm 5.6 exists sdp isn't going to work
Pytorch already uses 5.6...
remember 7k cards are the first to have AI accelerators and the xtx has 192 of them
no, I looked 5.4.2 was it
I'm using it right now lol
you are on linux is probably why
pytorch doesn't have any rocm support on windows yet
which is what I am talking about
they are dragging their damn heels as if paid off by Nvidia or something
ROCm as a whole doesn't support windows yet properly
only the hip sdk does
the ML libraries still dont
who is they in this criticism
AMD needs to port those first
I feel like someone in here was talking about that having changed very recently?
before torch can support rocm on windows
well, we just got rocm on windows and the hip support on windows so soon I would imagine
If you look on torch's git it's all actual AMD developers implementing the ROCm support
makes me wonder if they are not doing it for fear of hurting their MI sales as Nvidia is afraid of giving us ram due to their hopper sales.
Well "we just got rocm on windows" and "they are dragging their damn heels as if paid off by Nvidia or something" don't really make sense together...
You do realize for AMD, which is ROCm, they should have all come at the same time, so makes perfect sense.
https://www.phoronix.com/news/AMD-HIP-SDK-Windows
this Windows release is limited to the HIP SDK portion and not the numerous machine learning and AI libraries that also encompass ROCm on Linux
Whale shit
That is the dragging of heels I am talking about
So if you're completely averse to using Linux I'd hold off on the XTX until it actually works lol
their hardware is there but they are slow to implement the stuff to go with it
it's been speeding up I think
rocm 5.5 took ages to come out and get implemented in torch
Not averse, as I used Linux 98-2012, but the desktop I just despise the way it works.
5.6 was pretty fast
lot of people use KDE which has come a long ways in the last couple years. Has mixed fractional dpi and I think rudimentary HDR support.
Its what the steam deck uses and its pretty alright
Ubuntu with cinnamon is still janky
Don't like ubuntu so couldn't say where it is now
if you have the drive space and are hellbent on an XTX you could always dual boot
Like a year or so ago Linux got a new NTFS kernel driver so you can share data between the two OSes
I like the Pop!_OS interface. Its simple tiling window system is really convenient for me. I've installed the desktop on non Pop! OS ubuntu systems too.
I do dual boot and love how easy it is to see all my windows drives. Of course Windows doesn't see my linux drive.
how come i always see people hyping up how awesome linux is compared to windows but then when it comes down to things, theres always compatibility and other frustrations that linux users just learn to live with
If you make your root BTRFS it can: https://github.com/maharmstone/btrfs
At least with AMD I can have my fan curve back cause green with envy is a dead project now and no longer works so I have no fan curve
your 3.2it/s BS16 what sampler and steps did you use?
3.2 it/s was for inference, idk anything about training. FP16 Euler 30 steps
Full SDXL benches on rocm 5.6
#✨|sdxl message
Is it possible to merge SDXL and 1.5 that I trained earlier?
nope
Ah that's a shame.
with batch 16 I got more it/s per-image but I didn't write down the numbers
it was a good like 10% or so, was kinda surprised
3.2it/s for BS16 is really not bad though
I'm sorry for jumping in with questions that were probably discussed a lot of times here, but I failed to find any information on the web:
- What are the recommended steps for base + refiner? 30/10? 20/20?
- How much noise does the refiner need? Should I just generate an incomplete 10-step pic with base and let the refiner handle the rest?
"bs16"?
you said you used BS16
"fp16"
idk where you got bs16
Going back to my earlier question about clip_g and clip_l, the more I look at it the more it seems like the two prompts get combined and the difference between them is whether it's using the new SDXL CLIP (clip_g) or the older SD1 CLIP (clip_l). I don't really understand the g_pooled part but it looks like that is just taking a subset of the tokens. ```py
def encode_token_weights(self, token_weight_pairs):
    token_weight_pairs_g = token_weight_pairs["g"]
    token_weight_pairs_l = token_weight_pairs["l"]
    g_out, g_pooled = self.clip_g.encode_token_weights(token_weight_pairs_g)
    l_out, l_pooled = self.clip_l.encode_token_weights(token_weight_pairs_l)
    return torch.cat([l_out, g_out], dim=-1), g_pooled```
btw, friend is testing on his 3090
for inference mind you
for steps you can refer to this #✨|sdxl message
recommended 1/3 of the steps for refiner
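A rough sketch of that rule of thumb, in case it helps to see the numbers. It reads "1/3 of the steps" as one third of the total step budget, which is an assumption (it could also mean one third of the base steps), and `split_steps` is a made-up helper, not something from A1111 or ComfyUI.

```python
def split_steps(total_steps: int, refiner_fraction: float = 1 / 3) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a given total step budget."""
    if total_steps < 2:
        raise ValueError("need at least 2 steps to split between two models")
    refiner_steps = round(total_steps * refiner_fraction)
    # Keep at least one step for each model.
    refiner_steps = min(max(refiner_steps, 1), total_steps - 1)
    return total_steps - refiner_steps, refiner_steps


print(split_steps(30))
```

With a 30-step budget this gives 20 base steps and 10 refiner steps. As said elsewhere in the channel, the split that looks best still depends on subject, seed and taste.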
36 is max, 23700 when VAE Decode hit
So I think if you want to use your old SD 1.5 prompts, put those in clip_l and if you want to use newer prompts that are better understood by SDXL, use clip_g. And if you want to use a mix of both models, put the prompt in both. I disagree with the idea that clip_g is "content" and clip_l is "style". I see nothing in the code to indicate this. ```py
class SDXLTokenizer(sd1_clip.SD1Tokenizer):
    def __init__(self, embedding_directory=None):
        self.clip_l = sd1_clip.SD1Tokenizer(embedding_directory=embedding_directory)
        self.clip_g = SDXLClipGTokenizer(embedding_directory=embedding_directory)```
let me check the actual it/s with 16 batch inference
okies
didnt a1 come out with a feature to use a different set of prompts for the hi-res fix stage just before sdxl .9 came out? ive always viewed clip g as that if not similar
Thanks, that's useful! Any recommendations for noise levels for refiner?
On his 3090 - getting 4s/it at bs16
not sure, sorry
Refiner only supports clip_g. That's all I can say for sure: ```py
class SDXLRefinerClipModel(torch.nn.Module):
    def __init__(self, device="cpu"):
        super().__init__()
        self.clip_g = SDXLClipG(device=device)

    def encode_token_weights(self, token_weight_pairs):
        token_weight_pairs_g = token_weight_pairs["g"]
        g_out, g_pooled = self.clip_g.encode_token_weights(token_weight_pairs_g)
        return g_out, g_pooled```
it stabilized at about 4.65 s/it, which per image is (1/4.65)*16 ≈ 3.44 it/s.
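For anyone comparing batch numbers against single-image numbers, the conversion above is just batch size divided by seconds per iteration. A minimal sketch, with `per_image_its` being a made-up helper name:

```python
def per_image_its(seconds_per_iteration: float, batch_size: int) -> float:
    """Convert s/it measured on a whole batch into per-image it/s."""
    return batch_size / seconds_per_iteration


xtx = per_image_its(4.65, 16)      # the 7900 XTX figure quoted above
rtx_3090 = per_image_its(4.0, 16)  # the "4s/it at bs16" 3090 figure
print(f"XTX:  {xtx:.2f} it/s per image")
print(f"3090: {rtx_3090:.2f} it/s per image")
```

These only compare the sampling loop; VAE decode and image saving time are not part of the s/it number.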
Euler or ddim?
Everything I can see is that clip_g is the new text model, clip_l is the old model we all learned how to prompt with SD1. And if you fill in both, they get concatenated when encoding the weights so you get influence from both.
im using euler which is basically the same as ddim. 26 steps base 4 refiner total time including decoding and image processing/saving is 153.78 seconds
no loras or anything to slow it down?
thank you this confirms what I saw in the specs of the two cards
can you use sdp?
given the whole no flash attention thing
yes but without flash attention its slower and the VRAM spikes so hard I can't even gen a single 1024
oh, you are seeing how their 7900xtx runs?
god damn, none of the llms could even explain the different prompts in sdxl to me
L and G?
he wants to buy one so im showing mine off
No way would they be trained on that info. You'd get a hallucination at best.
a 3090 ran all that 40s faster but he was using xformers or sdp
sdp isn't implemented on AMD as far as I know
yes. I understand they're 2 different clip models, but do not understand how to use them in unison
not yet, no
I have my new 3090 here, I can run side by side tests
it must fall back to something cause it doesnt err it just runs like shit
Maybe comfy can explain it since it's his code I'm looking at. 🙂
My workflow explains how
I've looked at the research paper but it's over my head
Clip g is made better for natural language, clip L is made better for tags
yeah it falls back to the basic implementation
if you use my workflow, it has a breakdown inside how to use them
And that's because clip G is the new one with SDXL and clip L is the old SD 1, right? And they just get combined if you use both?
if you play any games I'd say the xtx is worth, it smokes in 4k. only AI I'd consider the 3090 instead.
other way around
G is new, L is old
just pretend theyre the same thing is my scientific approach to it
I don't know why but I can't load the yaml file I downloaded from there
the model was trained with the appended output of both CLIP models
That's what I said?
sorry, yes
it just doesn't load
Ah ha. That was the last piece of the puzzle I was looking for. Thank you!
So same text input on both is what it was trained on?
Your code is quite readable, by the way.
they work together well when they share the same prompt, but you can get way better results/quality if you know how to use them apart
yeah, I read that 1.5 uses L
yeup
when I asked bing to give me examples of what to put into each prompt it locked the conversation, lol
@vital ermine Did you have any tests you wanted me to run again Bein?
And the best way to know how to use them is to understand in detail how they work. Which I somewhat understand now.
let me copy and paste my explanation
Bing training data doesn't know anything about SDXL.
oh actually, I can't
me or him lol. his friend has a 3090 so I was comparing to that
similar results to you. 36 batch is sort of the top end
runs a bit faster @ 16 batch compared to the XTX
SAI needs an explain like im 1 for clip g + clip l usage
rocm is still missing a lot of features so I'm curious to re-test everything once they have proper SDP support
how long did it take you?
per image
so a good approach would be to put the main description into the pos g, and the tags into L?
I posted a link to the research pdf and it seemed to reference it
I think the explanation we came up with here is pretty good. G is the new text model introduced with SDXL. L is the old model we all learned how to prompt on 1.5. And then the training was done by combining both of those. So you can write G prompts in more natural language and L prompts in the SD1 style of tags.
if you load my workflow I linked to, I provide examples
Test as it will be faster if you are both going head to head. For me a 7900XTX is as fast as a 3090ti if you both use the same flash attention
I mean his 3090 did. Mine tops out @ like 16 I'm pretty sure lol. 3.44 it/s per-image for me
thats actually how i understood it when i first read about it but people made it seem way more than that so it confused me lmfao
I think that might be a good way to look at it. Write the G prompt in natural words and write the L prompt like a SD 1.5 prompt.
thats about how many it/s I get as well
and then refiner. I guess I will just have to experiment to really get it. but haven't worked out exactly what to put into refiner only. I guess things like "sharp, colorful, clear" but then should it also have the entirety of the primary prompt?
based on current comments by joe (sai community manager), it doesn't look like we'll be getting official documentation for the difference between the two clips. More like that they officially endorse not querying them separately
but I do have a big advantage in VAE decode, so I am not sure
this could obviously change - but at least based on current outreach it's highly unlikely
VAE decodes really fast @ 1024 batch 16
dont know why
I think just use the G prompt with the refiner. That's all it supports. It still needs to know what it's making.
1600x1600 is death without tiling but 1024x1024x16 is fine
yep, I compared the maths specs a couple of weeks ago they are almost identical. If AMD gets xformers or SDP it will be as good as a 3090ti with AV1 modern.
sometimes I'd do things like paste the lyrics from a doctor octagon song into 1.5. probably not the recommended approach, but it made interesting things
here, if you send me your project you are testing, I can bench on mine
Any text will produce some output, even gibberish. It's all just tokens to the encoder.
yeah, I put the random characters my cat typed on my keyboard one day
anyone know why a lora would output only b&w images?
The paper explains what the different CLIPs are. Comfy's code seems to show how they get used. And we now know that the model was trained on the combined outputs. So to me that means write natural language in G and then old-fashioned SD 1.5 prompt in L. But use both, not just one or the other. Since the model was trained on both.
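A small sketch of what "combined outputs" means shape-wise. It uses plain Python lists as a stand-in for the `torch.cat([l_out, g_out], dim=-1)` call in the code quoted earlier. The sizes (77 tokens, 768 channels for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG) are not from this chat; they are the commonly cited dimensions for those encoders and are treated here as assumptions. `concat_last_dim` is a made-up helper.

```python
def concat_last_dim(l_out, g_out):
    """Concatenate two [tokens][channels] lists along the channel axis."""
    if len(l_out) != len(g_out):
        raise ValueError("both encoders must produce the same number of tokens")
    return [l_vec + g_vec for l_vec, g_vec in zip(l_out, g_out)]


TOKENS = 77    # assumed context length
L_DIM = 768    # assumed CLIP ViT-L hidden size
G_DIM = 1280   # assumed OpenCLIP ViT-bigG hidden size

# Dummy encoder outputs: zeros for clip_l, ones for clip_g.
l_out = [[0.0] * L_DIM for _ in range(TOKENS)]
g_out = [[1.0] * G_DIM for _ in range(TOKENS)]

combined = concat_last_dim(l_out, g_out)
print(len(combined), len(combined[0]))
```

Each token ends up with one long vector: the first 768 channels come from clip_l and the remaining 1280 from clip_g, so the model always sees both encoders side by side rather than one or the other.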
gotcha. I am still trying to catch up on the technical side of things. all of these concepts are fairly new to me
I'm sure there are some aspects I don't understand. But I understand more than I did before. And I prefer that to simply "experimenting" and then guessing at the results.
Thats how I do it. G is linguistic, L is tags
and I on average can get better results than just mashing the two together
AMD is sort of weird so it's hard to calc numbers from the specs. Like their RT cores I'm pretty sure are hybrid, which means in raster games you get between 4080 and 4090 performance because the RT cores still do work, while in extremely RT-heavy games it's only like 3090 to 3090 ti because the RT cores aren't as specialized. Might not apply to ai cores though.
But the part I didn't get before is that each one should be a complete prompt. So the "tags" aren't just style tags but also describe the complete image.
I could also just be wrong so there's that
when the nvidia cards are using cuda instead of optix my 7900 XTX benches higher than a 3090 ti in blender
yes, I linguistically prompt on G, then tag the same concepts in L
You can also leave stuff different between them for some cool effects
what do you guys do for the negative then? same thing? I don't normally use much natural language in the negative prompt
I tested dual prompt for negative, saw very little difference, so I combined it together for my wiki release
AI cores I only just found out about as no article, or spec site, even mentions them.
Haven't gotten that far since I just figured this out. But maybe for negative G "too many fingers on the hands" and negative L "bad hands, extra fingers, deformed"? Just a guess.
so if it was like "analogue photograph of a kitten in the grass" on G, it'd be "kitten, analogue photograph, grassy field" or something on L?
wait what?
how did i miss that T.T
can you link the paper? ❤️
G: A cinematic photograph of a corgi in a field of colorful wildflowers at sunset with snowy mountains off in the distance and cloudy sky
L:Photograph, portrait, cinematic, bokeh, f1.8, sunset, corgi, mountains, colorful flowers, cloudy sky
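One way to keep the two prompts organized, sketched in Python. `DualPrompt` is a hypothetical container, not part of ComfyUI or A1111; the `"g"` and `"l"` keys just mirror the ones in the `encode_token_weights` code quoted earlier (the real code takes tokenized weight pairs there, not raw strings).

```python
from dataclasses import dataclass


@dataclass
class DualPrompt:
    g: str  # natural-language description for clip_g
    l: str  # SD 1.5 style tags for clip_l

    def for_base(self) -> dict:
        """The base model is conditioned on both encoders."""
        return {"g": self.g, "l": self.l}

    def for_refiner(self) -> dict:
        """The refiner only has clip_g, so only the G prompt is passed."""
        return {"g": self.g}


corgi = DualPrompt(
    g=("A cinematic photograph of a corgi in a field of colorful wildflowers "
       "at sunset with snowy mountains off in the distance and cloudy sky"),
    l=("Photograph, portrait, cinematic, bokeh, f1.8, sunset, corgi, "
       "mountains, colorful flowers, cloudy sky"),
)
print(corgi.for_base())
print(corgi.for_refiner())
```

Both prompts describe the whole image, just in different styles, which matches the point made above that the tags should not be style-only.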
amd themselves barely uses them lol. none of their software does yet except maybe in pytorch+rocm I'm not sure.
much love! ❤️
ahh, thanks. I need to check that wiki out it seems. and also figure out why that yaml won't load in comfy. I somehow managed to get all sorts of spontaneous errors
https://arxiv.org/pdf/2307.01952.pdf but it just names the models and then you have to go look them up to see how they differ.
AI accelerators is called that for a purpose and they are not like tensor cores. I hope they get used.
FSR and all the ai gaming stuff still just uses the regular shader cores afaik
Sorry, I linked the wrong one 😄 use this #✨|sdxl message
yeah, a bit of a mess
Man, i need some help with LLM's right now lol
until about a month ago most VR games had the frametiming of a pentium CPU lol
I really just have a lot I still need to learn. target width/height vs base etc. and can't expect to have personal tutors in here. so doing my best to figure it all out on my own. but some things have proven difficult to find information on
I am so burned out on SD at the moment
have you messed with the llama models at all?
take 2 days holiday XD go watch a series
wizard-vicuna
Mom and I just started ATLA haha
There is a section in the paper about that but I didn't try to understand it.
oh, and I am about 45 hours deep in a DND series lol
what are the hip new LLMs? I'm still using like OPT n shit...
time to touch grass
with my ex boyfriend turned best friend lol
I saw that now it can do VR
I used the 13b model of that on my 3080 and it worked fantastic
3090 just got here and I downloaded the 30B version from the same poster, and it sucks left ass cheek lmao
my normal approach to these sorts of things is to go back and forth between reading about it and experimenting. and then at some point it just clicks. but I can't get there just by reading
I ran Devil May Cry 5 @ like ultra 8k 80FPS or something stupid like that. Under proton/linux as well, not even native.
which linux distro did you mention as Ubuntu + Cinnamon is more like Windows but still has irksome shit in it I just can't get with.
Yes you also have to experiment. What I was talking about earlier is when people just do "experiments" with no understanding at all and then make incorrect guesses about how the system works.
ugh, I have to use the 7b models in 4-bit mode on my 6gb 3060. but pretty neat. I'm curious how they actually train loras
I would love to make a character for vicuna 30b that can auto segment prompts into the tag part for clip L
dunning kruger effect
That should be an alias for Youtube tutorials these days.
13b has been blowing me away. I use it for interactive stories, or I guess roleplay in this sense, and its shockingly consistent
nearly as good as some of my best chat GPT 3.5 roleplays
most distros have something with the KDE desktop environment. It's sorta half between windows and mac, I really like it. Ubuntu has "KUbuntu", debian has a kde version, etc so just pick your favorite.
that's actually a great idea. I believe they'll be integrated more as time goes on. seems like ai models are evolving to be modular and specialized
13b of what? vicuna?
I was always a Gnome guy in linux
its really damn solid
7b is a bit on the slow side. but it's still pretty amazing. and then once I figure out how to use deepspeed
does that run @ 8 bit or does it NaN?
actually responded fast enough to be worthwhile
I like kobold's Nerys for CYOA but it NaNs at 8 bit
it runs at 4 bit, and it runs really fast
it uses about 11.9GB VRAM max on my 3080, and it runs at 8-13t/s
it seriously runs good
like, the quality is shockingly good
4 bit uses 11.9???
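Back-of-envelope math for the weights alone, as a sketch. `weight_gigabytes` is a made-up helper and it deliberately ignores everything else the loader keeps in VRAM (context cache, activations, quantization scales, framework overhead), so it is a lower bound, not a prediction of the 11.9GB figure.

```python
def weight_gigabytes(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for bits in (16, 8, 4):
    print(f"13B at {bits}-bit: {weight_gigabytes(13, bits):.1f} GB")
```

For a 13B model at 4 bits the weights come to about 6.5 GB, so a reported 11.9GB peak would mean a good chunk of the usage is something other than the weights.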
I tried to load a 13b model and it did not like that
like I said, it does roleplay and character consistency about as good as chat GPT 3.5 does for me
I wish it could utilize my ram more. I have plenty of that
is it on HF
there are 2 bit models now, but I have not messed with them
yeah, its uncensored too
if you have anything you wanna test with it, I can run it in instruct
why is there so many
I like to ask it questions like "how do I do crimes?"
it will 100% answer
This model has not even a sliver of morality
I tried to see how far I could push it, and I ended up feeling terrible lmao
Vicuna was the only one I could ever get to generate even partially coherent output. Every other GGML model I tried just wrote total gibberish. Word salad.
I can link to the one I use
it'll sometimes give its "opinion", but it will answer virtually any question
this one has that cut out
I am not joking dude, it has 0 morality
sounds like I can finally use an LLM for writing help on genuinely evil villains
it is a 100% un-censored model
you can also use yaml file to give it personality traits
any bad thing you can think of, multiply it by 10 lmao
and tell it to be an amoral sociopath
it's dug into the darkest corners of the internet
you don't even have to tell it
this model will just do that
it is built for instruct
have you tried the pygmalion models?
thats what I started with for roleplay
6b
I think they're for dudes into waifu chats or something. but could be wrong. anyway, they didn't really do much for me
that one was solid
it'll tell you how to make drugs. in detail
GPT-4 is still king overall. It would be interesting to see what a minimally censored and minimally aligned version could do.
so if you're into that you could use it. but you shouldn't
oh for sure lol
gpt-4 was like scifi in march. now it drools
it can't even help me with simple things
The model is still the same but they have to add alignment to avoid bad publicity.
could use the api, but that can add up
Every time someone gets it to say racial slurs or uses it for inappropriate purposes like medical/legal advice, they have to put a guardrail in place to avoid getting in trouble.
well open source models are going to blow past openai
for sure, only a matter of time
each guardrail put into place literally dumbs the model down because they're all overarching
OpenAI just wants to stay out of trouble.
generated on a 3090 in 15 seconds
"as a large language model created by openai I'm unable to assist you with that"
before models were uncensored, I used to get that a lot as well haha
let me try the same question with 30b, see how it does
I told it to not tell me that my feedback was appreciated again, and then in its response it told me my feedback was appreciated
"don't apologize!!"
"..sry >_< "
vicuna 13b is also very solid at translation, which surprised me
at least between english and spanish
they are limited in their specific knowledge of some things due to their size
yo which one is it
Any time it says "as a large language model" that means you hit a guardrail that they programmed in to stay out of trouble.
I jailbreak them, lol
I gave something to bing in latin and realized it was googling the latin to respond
so I started asking it offlimits things in latin
and by the time it translated them it was past the guardrails
that one
nice, gonna try it out lol
doesn't really work now, it just translates the latin
but you can bamboozle them still
there's a llama2 uncensored one that's also pretty great
(also on TheBloke)
13b's response
30b's response
i havent messed with local models yet. but ima play with it a bit, i heard llama 2 is good
@autumn forum what strength is your chappie lora intended to be used at?
there's a chappie lora? 
1 seems to work well, so does 0.85. i just kinda sent it out there cuz it was fun to do.
i kinda plan on downloading a 3d model and making my own pictures of a full turn around to get a lot more high quality data to make it better later on
yes! i made one lmao
yeah. I feel like some really small models could effectively be used to help with prompts and things. but I'm not really knowledgeable enough to figure that stuff out
just saw it in #🔧|finetune . awesome. I used Chappie a lot in my prompts as a source for machines. I will check it out! thank you
np. probably pretty blurry lmao. i feel like that was an underrated movie. i really loved it.
why are mine blurry 😭
yeahhhhh
do you train at full resolution, or at 720p?
i trained at 1344x768 and a couple 3d renders that were portraits. i can try and retrain. there might be a really blurry one in there
yeah. if around 10% of the source material has a (similar) flaw - then that gets picked up more than anything else
yeah - it was special 😉 Blomkamp's sci-fi tech designs are just really nice. loved district 9
it's why I can't use youtube videos - cause the compression gets learned faster than anything else 🤣
yeah theres a couple of blurry ones. ill download the 3d model and make some pics real quick and retrain, ill send a link when done.
YESSS i wish there was a sequel!!
kept trying to make it work - but eventually gave up as I only created "youtube artifact" loras
hmm okay noted. no more low quality photos in datasets, must be sharp
always the highest quality you can get
low quality is fine - under the condition that they don't have the same low quality artifacts 🤣
thats difficult 
but yeah - usually if you get it from one source that's not an option
if you want to use shots from a movie you best get at least a 1920x1080 (full-hd) blu-ray source or even 4k uhd.
1920x800 most of the time
4k, with emphasis on the highest bitrate
true
1080p is fine - but you really need to look at bitrate - as low bitrate artifacts get picked up way too fast
16:9 movies 🤢
yeah that's why blu-ray makes sense. you can't use streaming stuff

