#sd3
yess use that @radiant quiver
is that on Flux? Does it have upscaler stuff?
that's cascade with a lot of custom nodes and model tweaks
i'm using random attention guidance
Cascade? wow
Impressive, and why are you using Cascade?
it's about the most capable model we have imo
that and flux are the top two
there's so much unrealized potential with cascade that's bothered me for months... finally been writing the code that's been missing
these aren't upscales they're directly generated at these resolutions
and i'm hitting a peak of 11gb vram
2560x1536
oh yes, really, once I got used to it, it delivered some really nice stuff
that's neat
it's gone way past base cascade quality with this stuff
the potential was always there
i'll never understand why SAI didn't promote it more, they really achieved something great with it imo
it's sad it remains overlooked
good luck getting this image in a straight shot out of any other model
It got overshadowed by SD3. As I understand it, SD3 has better prompt adherence, or at least promised to follow prompts better. Cascade was kind of evasive, I think.
yeah they announced sd3 a week later like it was coming out in two weeks lol
so no one ever developed tools thinking it'd be pointless
been doing my best to fill in those gaps
got self-attention guidance implemented, along with random attention guidance, and tiled sampling now for stage B
finetuned stage B which helped a lot
implemented a custom noise sampler designed specifically for stage B
I got some nice generations in cascade, but prompting was a little tough
yes when it did good it has some really nice textures "not AI" looking
does anyone have a chatgpt prompt to take a main prompt and split it up into the 3 different text encoders for sd3?
i'd ask this guy
no thank u... i'll keep my head attached to my neck
What r u using?
stable cascade and a bunch of custom stuff
Ohk
right now flux is the best if you're looking for prompt comprehension imo, and cascade is best for style and fidelity
nf4 flux weights in forge! what's nf4?
Is flux lora only for FP8 dev?
normal float
oh I need to try so much stuff, install Reforge, and try this
this is forge forge
ohhh ahh
sounds neat. only 10gb file
I don't know about with chatgpt, but there are plenty of local models that can output a JSON format, which you could then have a node pick entry 0/1/2 for t5/L/G or something like that. You'd just need a complicated system prompt to reliably make it work. That or you'd have to make a lora/fine-tune for it.
Just looking for something simple for now. Maybe even just a detailed description of what the different encoders take in so I can engineer my own prompt. I kinda winged something rn but could be better
Well you can always make a good system prompt that spits it all out in one reply, but you'll have to manually copy/paste the different prompts out
I know that it's definitely entirely doable because I've done it in the past. The JSON method would just save the carpal tunnel of copy/pasting over and over
And automate that portion, you could even have the node have pins for each entry
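A minimal sketch of the JSON idea described above, assuming the local model has been instructed to reply with a JSON list of exactly three strings in the order T5, CLIP-L, CLIP-G. The reply text, the encoder keys, and the `split_prompts` helper are all made up for illustration; a real node would feed each entry to its own output pin.

```python
import json

# Hypothetical reply from a local LLM told to answer with a JSON list of
# exactly three strings, in the order T5, CLIP-L, CLIP-G.
reply = (
    '["A detailed natural-language description of the scene.", '
    '"short tag list, style words", '
    '"subject, composition, lighting"]'
)

ENCODER_ORDER = ("t5", "clip_l", "clip_g")  # entry 0/1/2, as suggested above


def split_prompts(raw_reply: str) -> dict[str, str]:
    """Parse the model reply and map entries 0/1/2 to the three encoders."""
    entries = json.loads(raw_reply)
    if not isinstance(entries, list) or len(entries) != len(ENCODER_ORDER):
        raise ValueError("expected a JSON list with exactly three prompts")
    if not all(isinstance(entry, str) for entry in entries):
        raise ValueError("every entry must be a string")
    return dict(zip(ENCODER_ORDER, entries))


prompts = split_prompts(reply)
for name, text in prompts.items():
    print(f"{name}: {text}")
```

Rejecting malformed replies up front is the point of the validation: a model that drifts from the format fails loudly instead of silently sending the wrong text to an encoder.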
Rob who made Robmix has a ChatGPT GPT that does this
it can give T5, Clip L and Clip G
personally I use a prompt someone here posted that was slightly modified from the auraflow paper
I think I changed the word count but that's about it
although I have finally started writing my own prompts because I now do different prompts for different Ksamplers instead of feeding all the same
^ toad in mario should have been this
Uhm, nice try Schnell lol
Pro did the best job. However, a lineup of people holding hands was doomed to need a lot of post-processing lol. This is after I removed some extraneous arms and fingers
Flux trains amazingly
better than SDXL
I'm sure we will see a lot of cool Flux Loras in the future
The NF4 forge's model is fast, very fast
But comfyui isn't compatible, is it?
Okay, fresh out of the oven https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
what is NF4?
wow
(i) NF4 is significantly faster than FP8. For GPUs with 6GB/8GB VRAM, the speed-up is about 1.3x to 2.5x (pytorch 2.4, cuda 12.4) or about 1.3x to 4x (pytorch 2.1, cuda 12.1). I tested a 3070 ti laptop (8GB VRAM) just now: FP8 is 8.3 seconds per iteration; NF4 is 2.15 seconds per iteration (in my case, 3.86x faster). This is because NF4 uses native bnb.matmul_4bit rather than torch.nn.functional.linear: casts are avoided and computation is done with many low-bit cuda tricks. (Update 1: bnb's speed-up is less salient on pytorch 2.4, cuda 12.4. Newer pytorch may use an improved fp8 cast.) (Update 2: the above numbers are not a benchmark - I just tested very few devices. Some other devices may have different performance.) (Update 3: I just tested more devices now and the speed-up is somewhat random but I always see speed-ups - I will give more reliable numbers later!)
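The 3.86x figure in the quote is just the ratio of the two per-iteration times. A small sketch of that arithmetic; the 20-step run length is an assumption picked for illustration, not something from the quoted post.

```python
def speedup(baseline_s_per_it: float, new_s_per_it: float) -> float:
    """How many times faster the new setup is per iteration."""
    return baseline_s_per_it / new_s_per_it


FP8_S_PER_IT = 8.3   # seconds per iteration, from the quoted 3070 Ti laptop test
NF4_S_PER_IT = 2.15  # seconds per iteration, same test
STEPS = 20           # assumed run length, only for illustration

ratio = speedup(FP8_S_PER_IT, NF4_S_PER_IT)
print(f"NF4 is {ratio:.2f}x faster per iteration")
print(f"{STEPS} steps: FP8 ~{FP8_S_PER_IT * STEPS:.0f}s, NF4 ~{NF4_S_PER_IT * STEPS:.0f}s")
```

Sampling time only; model loading and text encoding are not part of these numbers.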
A couple of example results from my latest system, a video player with an audioreactive playhead. There's a couple more things to it, but I'll leave it to my patrons to explore.
You can access this system plus many more in my Patreon profile: https://linktr.ee/uisato
#vfx #touchdesigner #audioreactive
@rain current are you running this https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main
It's the one I've downloaded; now I'm going to try it with the experimental node
awesome, im curious how it works
im assuming you use comfyui, do you need any specific nodes to run this or can you just load it up in the usual flux workflow?
b..
@rain current curious to know how you find it in terms of speed and quality, i think i will wait for further developments with schnell version of the nf4
Well, no idea, because I'm getting an error in KSampler, and I have comfyui updated
Is there any way to use nf4 in comfy? (Downloading to test now.)
I'm getting an error in KSampler, maybe there's another way to do it. I have ComfyUI updated and bitsandbytes installed
I've seen the nf4 posts but still
6-8GB people can finally run this model!!!
I really hope it doesn't influence quality that much
I used to have only 6GB too, so I know their pain
and NF4 may outperform FP8 (e4m3fn/e5m2) in numerical precision, and it does outperform e4m3fn/e5m2 in many (in fact, most) cases/benchmarks.
sounds magical
and apparently its faster too
yes but im not going to bother with that dev version, that will run slow on my system, im waiting for the author to release a version based on schnell
incidentally i finally wiped out a1111 from my pc
never have i thought before i'd be so drawn into comfyui
i used to be pessimistic about the noodles
now i love it
and it all happened when i understood how the nodes work
Flux + ToonCrafter
Here is the ComfyUI solution that everyone is refusing to share because Schnell isn't implemented yet.
Lets you load in NF4, will test next.
https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
Okay, it's working for me now. I had an outdated version of bitsandbytes, so I updated it
But it doesn't work with the Loras
2.5% speedup. (That's insignificant IMO.)
With NF4
With BF8 (ignore the sign they're holding, just keeping prompt constant. It's BF8)
The same, slightly faster............ 2 seconds 
Honestly there's a really obvious difference. That's surprising.
The girls are much sadder with BF8 (seriously what???), and the lines are more expressive with NF4.
Trying to think up an experiment that could show a clear quality difference...
Good morning. Are y'all still laughing at me behind my back today? Been a couple days since I got told everybody makes fun of my balls when I'm not around
Just checking if I'm still the trend
I recommend blocking the person who told you that...
We're talking about a new model format that's supposed to be faster and higher quality.
I found it last night actually, but I couldn't get it working on forge
I'm actually very curious about forge after seeing the repo. I assumed it was just an A1111 clone, but it's more versatile.
How's the jock itch?
Where is the node located? I have it installed but the Comfy menus don't show it
You have to install bitsandbytes, but I'm using a custom anaconda environment and I did that manually from the command line. Don't know how to do it on a normal install.
(And I had your exact error, so I'm pretty sure this is the issue. If you scroll in your command prompt window after restarting, you should see a notice like "no module 'bitsandbytes'")
When your computer creates images at the speed of light, perhaps the difference isn't as obvious?
Lowram mode results anyone? lol
Well it's supposed to be higher quality. Problem is, Dev's quality is already perfect. Trying to find a prompt that fails some % of the time.
Okay, some slight failures:
an anime girl writing on a chalkboard in a classroom, she is wearing a sailor school uniform, she has a red ribbon in her hair, there is a window on the left showing blue sky and a school football field, there is a bird sitting on the windowsill, there is a desk on the right, there is an apple on the desk, on the chalkboard is written the text "BF8", there is a nameplate on the desk that says "BF8",
I already installed BnB too. I mean the node itself. Where in the Comfy menus to add nodes is it located?
I cannot see it anywhere
If you double click and get the search bar, it shows up as
Okay, I imagine I can see slightly higher quality, but could be sample size margin of error.
Going to try locking the seed / using increment next.
Ok, weird, it is not appearing
Restart, and check if you get the no module bitsandbytes error. Might have installed wrong.
BF8, seeds 1000-1003
NF4, same seeds
No dice. I just Git cloned it
and BnB is installed for sure
the search turns up nothing
Try a line of people all holding hands, I dare you!
So much post processing
Okay, I'm convinced that for this particular test, NF4 is unquestionably superior.
Will try different test design next.
BF8
Text correct on board: 3/4
Text correct on nameplate: 3/4
Desk and apple on right: 1/4
Football field visible: 3/4
Score: 10/16
NF4
Text correct on board: 4/4
Text correct on nameplate: 4/4
Desk and apple on right: 3/4
Football field visible: 4/4
Score: 15/16
I mean, that's a HUGE difference in quality.
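The tallies above can be added up mechanically, which helps once there are more criteria or more seeds. A small sketch using the numbers exactly as posted; the dictionary layout and the `total` helper are just one way to organize it.

```python
# Per-criterion hits out of 4 seeds, copied from the tallies above.
SEEDS_PER_RUN = 4
results = {
    "BF8": {
        "text correct on board": 3,
        "text correct on nameplate": 3,
        "desk and apple on right": 1,
        "football field visible": 3,
    },
    "NF4": {
        "text correct on board": 4,
        "text correct on nameplate": 4,
        "desk and apple on right": 3,
        "football field visible": 4,
    },
}


def total(scores: dict[str, int]) -> tuple[int, int]:
    """Return (hits, possible) for one model's criteria."""
    return sum(scores.values()), SEEDS_PER_RUN * len(scores)


for model, scores in results.items():
    hits, possible = total(scores)
    print(f"{model}: {hits}/{possible} ({hits / possible:.0%})")
```

With only 4 seeds per model, one flipped image moves the total by more than 6 percentage points, so small gaps should be read with caution.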
Well, I cannot figure out why the node refuses to be recognized in the Custom Nodes, so I submitted an issue
But I don't think it can be a Comfy issue, or it would affect me too...
bug reports tend to need reproducibility
are you using painting loras or just getting sweet paintings from flux raw
Straight Flux
Flux Dev btw full model fp16
i'm no farmer but i've seen jeremy clarkson's farm, and caleb would probably trash talk that size of a field.
straight raw dogging the art styles on flux after people telling you that you can't
Doesn't matter. I followed standard steps, so if there is a procedure issue, it should be clarified since it is still a ComfyUI node. I pip installed BnB, then I went to CLI in Custom Nodes folder and Git Cloned the new node code. It installed fine and shows up in the folder too. Yet every time I start ComfyUI (also updated BTW) it does not appear
I do not mind finding good seeds. some may have forgotten that
were you like me and changed your comfyui branch to the flux controlnet dev branch? that doesn't get the freshest updates
because it does matter. if only you can reproduce it on your machine, it's often not a project issue. bug reports really rely on reproducibility
Did you install bitsandbytes into Comfy's python environment? Or into your system environment?
No, I'm using the standard Comfy 0.0.4 version. Portable etc, but updated
Keep in mind it is not crashing the node, it is simply not acknowledging there is one
This repository has a lot of styles yikes https://enragedantelope.github.io/Styles-FluxDev/
You didn't answer my question, which makes me think you didn't install it to Comfy's environment. It's not straight forward. You can't just pop open a terminal and run "pip install"
I did not install to Comfy's environment. But as I understand it, as there was an issue reported, it would affect it running
not the node being recognized
i just did a 3 step on my swarm ui. swap back to master, run the update script, git cloned the new node. startup logs show it found the node folder fine. check that you've got it in the right custom_nodes folder
This time, BF8 won the score for the prompt-adherence metrics I picked, but it also produced visually inferior results. Not sure how to grade those...
BF8
canopy visible, contains "BF8": 4/4
trumpet in top left: 4/4
trumpet rubies: 3/4
anime top right: 4/4
anime green hair: 4/4
UFO bottom left: 4/4
UFO blue jets: 4/4
cat bottom right: 4/4
cat gray calico: 4/4
total score: 35/36
yeah. it works fine. it's not a project issue
NF4
canopy visible, contains "NF4": 3/4
trumpet in top left: 4/4
trumpet rubies: 4/4
anime top right: 4/4
anime green hair: 4/4
UFO bottom left: 4/4
UFO blue jets: 4/4
cat bottom right: 3/4
cat gray calico: 4/4
total score: 34/36
Prompt `a glass window of a storefront, inside the window there are four products on display in a grid, in the top left of the display is a trumpet with ruby gemstone valves, in the top right is an anime figurine with green hair, in the bottom left is a miniature UFO spaceship with blue jet thruster flames, in the bottom right is a cat sleeping with gray and white calico stripes,
the photo is a view from the street, and there is a canopy over the storefront window that says "BF8"`
The node does not show up at all in the search unless bitsandbytes is installed into Comfy's environment. That is the exact behavior I got.
If you scroll through your command prompt startup log, you will see "no module bitsandbytes".
There will not be a future update that fixes this problem for you, so I kindly recommend continuing to troubleshoot it.
you can keep blaming the project all you want but this issue is local to your machine
maybe another node i used had bits and bytes installed because i never had to install it
Ok so I should Pip it from within the python directory of Comfy?
if its a portable install you want to use the python embedded pip
I'm really not sure, but I don't think so. I'm using miniconda instead.
I think you have to do a command line call to comfy's python followed by an install as part of the same command, like:
comfy/.../python.exe - ???
I tried googling it but no luck. Just comfy installs.
Let me try perplexity.
no environment activation required afaik. just make sure you're running the embedded exe.
This is the confusing nonsense that makes me wonder why python is even a standard. can't wait till we migrate away from this ridiculous dependency hell situation
i found a limit of the network. it can't do mc hammer hammer dancing on ice skates
Don't worry, I heard we're getting a new standard right after we stop using QWERTY keyboards.
Is hammer dancing a specific kind of dancing?
this is a really simple prompt but his leg positions are kind of getting there. mc hammer doing the hammer dance
Oh okay nevermind.
oops forgot to post this
same prompt. i think this expresses the problem perfectly. flux is a white guy trying to do the hammer dance
And sure enough the issues display I am not the only one with this problem
i tried to reproduce it but my install worked perfect first try. and i was on another branch to start so i had extra steps
Add Friend
i dont think he's a friendly
2 or 3 times faster? It's the same for me! Maybe a couple of seconds less ๐ค
Yeah it's barely faster for me compared to BF8, and I still don't have a test that can conclusively prove that NF4 is better quality. Just a bunch of subtly better results.
i think on the newer cards it doesn't improve much. but it does use less vram for me. i haven't done any quality testing but it doesn't feel worse
The results seem practically the same to me; I wouldn't know whether to say better or worse, maybe just subtle changes
Good point I'll check how much VRAM it uses. Could speed up larger images more in that case maybe. (Busy running tooncrafter at the moment.)
No, it was definitely better at certain aspects of prompt following and overall with visuals, I just don't know how to describe the improvements. How to put them down in concrete terms. (So that I can score the images objectively and numerically.)
It also doesn't know how a hammerhead shark looks
What I don't understand is why some users gain so much speed...
It could be the VRAM thing. Hang on, I'll interrupt tooncrafter and test.
I haven't seen any changes in VRAM either...
FP8 vs NF4 VRAM usage. NF4 uses 2.3GB more VRAM.
Testing if increasing image size leads to greater speed increases.
NF4 at 1440x1440
I think we've downloaded a different version 
BF8 at 1440x1440
That's a 5% speedup.
Okay, I'm done looking for any noticeable difference.
Yes, NF4 is technically very slightly faster, and maybe technically very slightly higher quality.
There's no reason not to use it, so I'm going to go ahead and use it.
If your device has very low VRAM, maybe you will see better results.
what is low VRAM, 12GB?
Lower than what I have at least, which is 24GB.
I'm not seeing very much gain on 16 for speed. 4080 is newer than other 16gb card too
Have you compared memory usage in your task manager?
nvm, just got back and did not see your previous post
it uses MORE memory? That is in direct contradiction of what was 'promised'
But your mileage may vary. IDK.
Honestly I'm done testing it, and I'm sticking with using it.
Flux + ToonCrafter.
Also just realized ToonCrafter requires 512x320, which Flux can totally do. Gonna try that next.
The downside is that it's not compatible with the Loras from XLabs...
oooooooo i got flux working in forge oo goodie
deleted the venv. always a good solution
new forge got all kinda memory controls too like where to swap and how much. llly changing the game again
Should try some other movement maybe...
lol how soon till people are flexin by wearing a gpu on a chain? with the plastic slot guards still on
imagin seeing a guy with a 5090 hanging from his neck. sure it's ridiculous but you know that guy got some kinda money then
If someone gives me a 5090, I'll wear it around my neck for an entire month
buys dead 3090 on CL for $50, sends to r_made
i'd cheat and i'd get a pci riser to use it while obeying the neck rule
maybe like, some insulation between me an the card
just gotta sit with the desktop between my legs an the riser oughta reach
https://youtu.be/q5xvwPa3r7M?t=310 think they got like 10 risers chained here. should be good
Can PCIe extensions cause a loss in GPU performance? We investigate!!
on a 2080 the NF4 loader/checkpoint got me to SDXL speeds, insane. 5-7 min down to sub 45 seconds
me? 20 step dev
i mean i havent used SDXL in a while and i forget but ya when i stuffed it with Cnets IPadapters seems right for me, the point being i went from 5-7 minutes to 45 seconds on 20 steps of flux dev on a 2080 TI 11GB
its still a good drop yeah
45 seconds is how long it takes for me with 20 steps on a 4060 Ti 16GB... And from BF8 to NF4, there's no difference in time. It seems like the improvement is more noticeable on older cards...
More likely for cards with lower VRAM. This may have been the tipping point for the 11GB VRAM he has
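A rough back-of-the-envelope check of the "tipping point" idea, counting weights only. The 12 billion parameter count for the Flux dev transformer is an assumption here, and the sketch deliberately ignores the text encoders, the VAE, activations, and the small per-block constants NF4 stores, so real usage will be higher.

```python
# Rough weight-only memory estimate. Assumption: the Flux dev transformer has
# about 12 billion parameters; text encoders, VAE, activations and NF4's
# per-block quantization constants are all ignored.
PARAMS = 12e9
BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "nf4": 4}
CARDS_GIB = [8, 11, 16, 24]  # VRAM sizes mentioned in this chat


def weight_gib(params: float, bits: int) -> float:
    """Size of the weights alone, in GiB."""
    return params * bits / 8 / 2**30


for fmt, bits in BITS_PER_WEIGHT.items():
    size = weight_gib(PARAMS, bits)
    fits = [card for card in CARDS_GIB if size < card]
    print(f"{fmt}: ~{size:.1f} GiB of weights, fits (weights only) on {fits} GiB cards")
```

Under these assumptions the 8-bit weights alone land just above 11 GiB while the 4-bit weights are about half that, which is consistent with an 11GB card going from heavy offloading to running mostly in VRAM.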
Here is my read now:
so 11GB or more would be a big diff per this
https://github.com/lllyasviel/stable-diffusion-webui-forge/releases/tag/latest
Press any key to continue . . .```
how can i install forge i got this problem
First question: running from main Python or embedded?
(this is for Comfy use of the NF4 I assume)
its for forge. you can tell because the url to the forge project is there and he says "forge"
my first guess is AMD card
Amazon will be giving them away in exchange for the small fee of $2999.99
or maybe i'm wrong. maybe "she says"
It's why I tend to go for laptops nowadays. Though here the diff in VRAM is making a big diff. For $1800 you can buy a plain 4090, which does come with 24GB Vram, while the same amount will get you a laptop with a 4090 16GB (and about 20% slower), but also with a screen, backlit KB, 32GB Ram, 24-32 thread CPU, etc.
I find that heat management always kills any laptop... it is just too restrictive ... if you NEED portable, sure... but if you want a good deal bang for buck, nothing beats a proper desktop.
there's an 80s arnold schwarzenegger lora for flux d now. it looks good! it says it's 90s arnold but naw. that all looks like commando era arnold pre t2 but i dont think loras work with nf4 quite yet and i'm knee deep. ill test it later. https://civitai.com/models/638000/arnold-schwarzenegger-1990s-flux-lora?modelVersionId=713426
Depends on the rig I guess and your local setup.
Right now my GPU is at 100% load and 68 C
in the laptop
Some laptops are MUCH worse at heat management than others, that is not in doubt
laptop is part of the issue too. they're notebook pc's and aren't meant to be on your thermal sponge of a groinal area
BALLZ
My previous ASUS had a cool system in which when you opened it, meaning the lid was up, the entire bottom would open to allow ventilation
my desktop pc has a cooling system called surface area
Though back when I was training NNs for a living I had two rigs with dual 2080tis, and those required permanent A/C in their area
it works by being really big and spread apart so that more heat can escape into the air stream
Electricity bills were brutal too, especially when you add that I live in Rio de Janeiro, a tropical city
i do have some issues with my desktop though. like, it's a rebuilt area 51 from dell but they use proprietary connectors for the case button. i CAN get the adapter and put it on there but i never have so i've been needing to power on my PC this way all the time
so the a/c never let up. Even now in the middle of winter we had a 4-day spike of 90 F
(32 C)
bonus points if you know what im doin
Though curiously enough, it isn't that much worse in summer
The average temp diff is 10 C from winter to summer
The toes never come out looking very good. Always messy.
whats wrong with those ones? i'm not really a foot guy but i mean, they look like feet with all the right toes imo
So my first image with NF4 was indeed faster, but in large part due to the fp8 Clip tensors. It made a really bizarre mistake, so i am now trying the NF4 with the fp16 CLip (slower) to see if this helps
They're fine for a hobgoblin. No nails except on the big toe, which is gnarled. Also her heel looks more like a wrist.
Trying to prompt for feet now to see if that helps.
pedicured feet?
I tried this one and then the basic Schnell, I think the latter gave me better results.
this schnell-dev-merged model generates well, but sometimes the images have this kind of scan lines or things like that
Or somekind of pixelated grain or something
The proof is in the pudding but the only way you're going to be able to really know if this is the model itself or flux is to generate the same image with the same conditions and same seed and test with the different models
If you don't then you'll never know if it wasn't just the luck of the draw
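The same-seed comparison described above boils down to a grid: every model gets every seed, with prompt and steps held fixed. A sketch of that loop; `generate` is a placeholder that only returns a file label (a real version would call whatever backend is in use), and the model names are hypothetical.

```python
from itertools import product


def generate(model: str, prompt: str, seed: int, steps: int) -> str:
    """Placeholder for a real render call; returns a label instead of an image."""
    return f"{model}_seed{seed}_steps{steps}.png"


def comparison_grid(models, prompt, seeds, steps=20):
    """Render every (seed, model) pair with identical prompt and step count."""
    grid = {}
    for seed, model in product(seeds, models):
        grid[(seed, model)] = generate(model, prompt, seed, steps)
    return grid


models = ["schnell", "dev-schnell-merge"]  # hypothetical model names
seeds = range(1000, 1004)                  # same seed range used earlier in the chat
grid = comparison_grid(models, "an anime girl writing on a chalkboard", seeds)

for seed in seeds:
    print(seed, [grid[(seed, m)] for m in models])
```

Looking at the results row by row keeps the seed constant across each pair, so a difference is more likely to come from the model than from the luck of the draw.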
I don't have any reliable way to compare the overall quality of two models. (But Schnell is terrible.)
yes I know, but I generated a lot since yesterday with both models, and I could tell this. And I've never got that effect on the basic Schnell
I just generated a ton with both and at this point I'm able to spot these things, I think
But these hands are practically photo quality. Seriously don't think I have any complaints.
Schnell is quite good, maybe not as amazing as dev, but it works (I couldn't test dev much as it takes 8 minutes to gen vs. 1:30 of Schnell). Dev was always quite amazing, but Schnell was good.
Ok, that's cool. I ran tests of all three and Schnell was clearly worse in the output than Dev-Schnell
but YMMV
Dev-Schnell was fine
ah I haven't tried that one
I don't like the baked in aesthetic of flux
so I am staying on SDXL
I try each task with RealVisXL, Juggernaut, Zavychroma and Helloworld
usually one of those four will be better than the others for whatever the task is
Burger vs optimized burger
wow that would be quite a burger
Than the other... what? But Ok, let me give you two prompts I use recurringly to compare:
line-art cartoon of dogs playing rugby. They look serious and ferocious and are wearing uniforms. At the top in large dynamic letters are the words, "Don't mess with the All Blacks!"
An ornate teacup is on a table in an old-style coffee shop. In the teacup is a frigate in a storm with very choppy waters as lightning strikes the mast. On the teacup you can read in elegant and ornate letters with ornate decorations, "Tempest in a Teacup".
I don't test anime art BTW as I find it all looks the same and is boring as all hell. I have some others that test prompt ability
here is a third
I'm trying to do inpainting in ComfyUI, and I can't get a soft / blurry mask. I'm loading a mask image that's soft, but Comfy is sharpening it. Does anyone know how this works?
I don't actually do any testing or benchmarking of models myself
cos usually the papers will show some benchmarks anyway
so I am happy to just assume their results are relatively right
typically its FID on MS Coco, Imagenet or CelebA
There is an ocean with the upper part above surface and the lower part below the surface. On top of the surface is a 17th century sailing ship on a calm ocean on a sunny day. Below the surface the water gets increasingly darker the deeper it gets and there are massive eldritch sea monsters.
I test everything since all results are going to be very biased
its quite expensive to calculate a FID score
I don't really feel motivated
I don't bother with scores. I read them and look, but in the end what matters is the results I get, not what their wonderful papers and demos and announcements show
I think it can generate quite good stuff, look
what model, steps and conditions?
stealing your prompts
NO, I got that part, but was wondering what setup yielded that exact result
there are issues with FID score
not correlating perfectly with human preference
but its really not too bad
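For reference, the usual definition of FID compares the mean and covariance of Inception features for real images (subscript r) and generated images (subscript g):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better, and it is zero when both feature distributions have identical mean and covariance. Stable estimates of the covariances need many samples, which is part of why computing it is expensive.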
yes that's what I'm saying, (in this case) the quality is similar to dev, dev would be a little bit better in some details, like the face I think
mage.space with the defaults
I really disagree but people's tastes seem to vary
sometimes the quality is lacking, yes
Yeah but it doesn't matter. As I said, in the end, what matters is the results I get with my particular needs or wants. If they don't align, the highest scores in history won't change that reality. it's like IMDB movie scores, or Metacritic aggregate reviewer scores. Can be as high as they want, but doesn't necessarily mean they will match me
They can serve as a baseline to tell me it is worth checking out, but in the end it is what I get that matters
its a bit like audio gear
like you preferring XL
you can optimise for your personal tastes if you want
I didn't particularly feel the desire to
I just saw two reviews talking about how Flux Pro is clearly unbeatable, and the new god of image AIs. Sounds great and then you see the 'reviews' and see they made exactly 4 images and tested ONLY photorealism of people. Myeah, ok, but what if I have other interests?
flux-dev with Kajai's flux-disney-lora
so NF4 works excellently but REALLY needs fp16 text Clip
Inpainting with Flux. Just in case anyone was believing the "impossible" nonsense.
Original render:
Inpainted (to get full art version):
here are two NF4 with that Rugby prompt above: "line-art cartoon of dogs playing rugby. They look serious and ferocious and are wearing uniforms. At the top in large dynamic letters are the words, "Don't mess with the All Blacks!"" The first is with the embedded fp8. The second is with fp16:
yeah cherry picking can be annoying
its one of the reasons I like FID, because its 30,000 images
The fact that the fp8 has no rugby ball at all... is a huge fail
The All Blacks are a super famous rugby team for anyone who does not know the reference
ooh ok
increase your steps if using fp8
dude, it had 30 steps
revised your prompt: cross section view of the ocean. On top of the surface is a 17th century sailing ship on a calm ocean on a sunny day. Below the surface the water gets increasingly darker. We can see a massive octopus lurking near the bottom of the image.
wow, impressive inpainting, was it the first try?
Right, but with NF4
you got a workflow with it in it?
see the images above. workflow is included
which image? there are lots
the two rugby cartoons
Oh I would love to tile upscale with flux (well the Schnell version), hope it can be done soon... and looking forward to test that inpaint
Best out of 10. 3 out of the 10 turned out artifact-free, and this was the best of those 3.
That is utterly outstanding. And BTW, to show how the commercial models can fail horribly on this, here is a sample from Dall-E 3. Not cherry-picked; all have a similar issue:
thanks
Dall-E 3. Notice how the bottom and top seem to be utterly disconnected?
the weird white caps even make it seem like this is the beach wash coming in
not ugly art per se, but a real fail for the overall image

Flux Pro/Dev and to a slightly lesser extent Midjourney are really good at getting the overall idea of what an image is meant to do as a whole yeah
@errant dust it's done to make it clear to the viewer that it's the top of the ocean - as if we're looking through a thick piece of glass. okay, i don't have CheckpointLoaderNF4 and it's not showing through missing custom nodes. where'd you get it?
Midjourney is pretty baked though
I could never really subscribe to such a baked model
i don't have bitsandbytes - do you have install instructions for that by chance?
i'm running comfy portable, so i assume embedded
ok, so open CMD and go to the ComfyUI folder
k, there
now paste and enter this: ..\python_embeded\python.exe -s -m pip install -U bitsandbytes
without this, your node won't even appear even if installed
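A quick way to confirm which Python actually has bitsandbytes is to run a tiny check with the same interpreter ComfyUI uses. This is a sketch: the file name `check_bnb.py` is made up, and for a portable install it would be run from the ComfyUI folder as `..\python_embeded\python.exe check_bnb.py` so that it tests the embedded interpreter rather than the system one.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


# The path shows whether this is the embedded Python or a system/conda one.
print("Interpreter:", sys.executable)

if has_module("bitsandbytes"):
    print("bitsandbytes is importable from this interpreter")
else:
    print("bitsandbytes is NOT installed for this interpreter")
```

If the interpreter path printed is not the one inside the ComfyUI install, a `pip install` from that terminal went to a different environment, which matches the "node does not show up" symptom discussed earlier.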
okay. ran, seems to have installed fine
ok, if you have not installed the node, but have git installed, go to the custom_nodes folder, also in command line
i did a git clone on it already
your workflow loads now, give me a minute
then you should be good to go
place the file in checkpoint not unet
refresh and my image should now show the workflow with no missing nodes
yup. just taking a bit to download
i can't get the workflow to run, it's having issues with the model - and it's not worth it to me to battle with it.
oh dev is much better it is true
one of my goto prompt exercises is to recreate movie scenes and flux is really good at it. i think i remembered why i've never really been a foot guy. one movie scene in particular ruined my world view forever. you might know it already before clicking it.
aw i forgot to spoiler it. oh well
Schnell
(The one with the border is the original. The full art one is inpainted / outpainted.)
Using differential diffusion node, although I'm not sure it does anything to Flux. I would be surprised if it did.
smart to repost those since that gave more of a buffer to kathy bates ankle hobbling pic
too cute to contrast to that horror
I think I tested that once and found it not working. It's probably just not implemented for Flux
It won't run with Loras BTW
i didn't try to run it with a lora, just out of your workflow. but since you don't seem happy with it, and it's throwing code errors for me, i'm not going to spend any more time on it. i don't have a need for it personally
I only stated that fp8 clips are not good for it. The same was true for fp8 T5 for SD3 Medium mind you
which was much worse than fp16
woah
it fit the entire NF4 Dev model into my VRAM
My render time dropped from 20 mins to 2
using fp16 clip
ok, 3
and 30 steps
ok, that alone is a bit of a game change if results are not obviously worse or something
There ARE differences with identical seed but too little info to make any calls
So first of all, I ran your lovely sea monster with plain Dev as a reference. took 1836 seconds. Then NF4 Dev, took 216 seconds.
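For anyone who wants the ratio rather than the raw times, here is a minimal sketch using only the two timings quoted above. The variable names are made up for illustration.

```python
# Timings quoted in the chat (seconds), same prompt on the same laptop.
plain_dev_s = 1836
nf4_dev_s = 216

# Ratio of the two wall-clock times.
speedup = plain_dev_s / nf4_dev_s

print(f"plain Dev: {plain_dev_s / 60:.1f} min, NF4 Dev: {nf4_dev_s / 60:.1f} min")
print(f"NF4 Dev was about {speedup:.1f}x faster on this run")
```

This is one prompt on one machine, so it says nothing about how the ratio holds up on other cards.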
i had the same experience on an 11 GB card, amazing
Yes, but this is not even 11Gb, this is a laptop 4060 with 8
and believe me, even Schnell was not this fast with 4 steps
ya, more amazing
Schnell was Schlow
5-7 mins was my wet dream
I did enjoy a speed boost too by manually installing CUDA
12.6 from the Nvidia site
then upgrading Torch to make use of it
Flux pro? More like... FLUX SLOW
heh, it is not even entirely loaded into VRAM, but compare these two lines:
Before:
Loading 1 new model
loaded in lowvram mode 5728.075
Now:
Loading 1 new model
loaded in lowvram mode 64.0
Prompt executed in 189.01 seconds
i think fp16 T5 is noticeable also, trying this to simplify my files
i've never tried fp8 on SD3 medium. now i have to
oh sweet. meta in canada has boosted to llama 3.1 now
Yes, while there is ample talk about it being worse, it is often hard to measure. That is, until I found a very simple way that showed up each and every time. Text. Same prompt and fp8 always botched the text more than fp16. Sometimes criminally so. Interestingly, Flux Dev is not above this issue either, and highlights how much worse fp8 is, and is likely impacting every other aspect of the output.
RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same
Ex: fp8 first and then fp16 next. Flux Dev. The phrase should be "Don't mess with the All Blacks!"
grab one of my workflows and play around with it
fp8 vs nf4
ok, one thing is certain, NF4 for 8GB Vram cards is a total game changer
yeah im so happy that its getting democratized to lower end people
caste is dictated by GPU Model
I used to have 6GB of vram at the infancy of (public?) Stable Diffusion (2022 august)
so even though I had enough vram to run it and all, I still struggled with a bunch of other AI stuff that were just simply out of my reach
so I know how it feels for them
I am so glad for nf4
well, my 8GB had me relegated to the nameless. lol
SD3 2b Medium with the sd3_medium_incl_clips_t5xxlfp8 model - and different prompts to each encoder
ok, now I must try the Schnell version just to see how it looks
lol. Well, the second one is the least worse of them. Though not sure that ocean can be called 'calm'
i didn't prompt for calm, i prompted for choppy
grab the workflow - you get much better results if you don't give both (for flux) or all three (for SD3) encoders the same prompt
Anyhow, NF4 Dev gave me this:
Ok, but how would you split them?
now go back to it, give t5xxl a detailed, rich-in-concept prompt - and give clip_l all the ambient and tiny details stuff
you give each of them what they are best at. clip_l is your background stuff, ambience, tiny details you don't notice unless you look for them but that make the image stand out, and clip_g is the black and white, just the facts mam "this is what the image is"
Ok, let me first compare NF4 Schnell as apples to apples first
You mean T5 not g I assume
Regardless, I will be very interested in seeing your Flux workflow
i mean exactly what I said. clip_g (sd3 medium) is your workhorse. you give it the simple, black and white, this is what the scene is. you give t5xxl the richly detailed, narrative prompt - and the text - and you give clip_l the fine subtle details, background, ambient stuff
oh hey it has an animate button now
Will that work with Flux?
meaning the three encoders?
well, will find out soon enough
same prompt in flux dev. Vegeta eating a carrot like bugs bunny
you only have 2 of them, so t5 does the work of clip_g
doesn't know vegeta i'm tellin u
that's funny cause it knows goku
vegeta would be mad about it
oh boy. NF4 bombed really really bad. Let me try again with 6 steps
maybe the fp16 will fix this cluster f***
wanna know a fun dragonball fact? all the saiyan names are vegetable words. vegeta, kakarot, brolly, bardock, caulifa, cabba. its the anime veggie tales
maybe that's vegeta in his true form?
ok, NF4 Schnell is total no go
nf4 dev on forge
can't do broly either but this is good
Seeing is believing. This is plain NF4 Dev and next is NF4 Schnell. The prompt is: "cross section view of the ocean. On top of the surface is a 17th century sailing ship on a calm ocean on a sunny day. Below the surface the water gets increasingly darker. We can see a massive octopus lurking near the bottom of the image."
nice! action figure route. good prompt skills
now get either of them to add the text "cthulhu awaits"
Oddly, the Schnell is ALSO same speed since the memory management is worse and bleeding more into my normal RAM
next to test is the dev/schnell blends in nf4
I actually removed the text, but FWIW, Schnell misspelled it twice
here is one example:
i couldn't get that text at all
was starting to think it wasn't allowed to spell that word
256x256 image made in 5 seconds with flux. this could be as fast as sdxl
gets text right every time with the new nf4 model
i get 3.3it/s on an rtx 4070ti super
5 seconds in total for a perfectly readable 256x256 image
another prompt, 5.5 seconds at 256x256
somehow text is perfectly fine
amazing
at 128x128 it struggles lol
@errant dust
takes 4 seconds per image so it's not worth it at 128x128
but 256x256 takes 5 seconds at 3.3it/s with like 6gb vram used up i think
let me check again
not a lot of vram
ok nvm it uses 12.3 gb vram
but with that it takes 5 seconds and .5 seconds to load a new prompt
very very fast and no reduced quality @craggy crest
not even worth it to use fp16 or fp8 at this point
here is 512x256. 6.7 seconds at 16 steps
starting to break at 1024x128 lol
understandable
he's been testing it, i wanted him to see what you were doing
ohh okay
Yes, I am using the NF4 Dev at 1024 x 1024 as my baseline, since it is the usual size I work with, and my laptop typically chokes on the Dev model to the point that ~20 mins is not unusual. So let me show you Dev on my Teacup prompt (over 20 mins), and then NF4 just now at 1:47 flat.
Prompt: An ornate teacup is on a table in an old-style coffee shop. In the teacup is a frigate in a storm with very choppy waters as lightning strikes the mast. On the teacup you can read in elegant and ornate letters with ornate decorations, "Tempest in a Teacup".
Dev (normal) and NF4 Dev next:
There are micro-differences, sure, such as NF4 in this case is slightly warmer overall, but 'worse'? No way. In fact, I realized now the Dev was 50 steps, and NF4 was 30. So let me give you the 50 step NF4 to compare properly.
Dev (normal, 1k, 50 steps) and NF4 Dev (1k, 50 steps):
i would say it even made it slightly better
and I was lying. Almost embarrassed to tell the truth. That 50 step Dev took my poor rig well over 45 mins. Really. The NF4? See for yourself:
One curiosity I have noticed is that NF4 is affected much more by greater steps beyond 30 than the 'normal' Dev. In my tests, and I did quite a lot (in spite of the insane times) 50 steps was barely noticeably different than 30 steps. Not so for NF4 Dev.
To illustrate, here is Dev at 30 steps and then 50 (using fp16 Clip):
Now for NF4 Dev fp8 at 30 steps, 40 steps, and 50 steps:
Sorry, was missing a 40 for the fp16. Ok, and here is NF4 Dev fp16 Clip at 30 steps, then 40, then 50:
fp8 clip is the clear dodgy choice, and frankly I saw no speed benefit or memory. NF4 at 30 steps is already superb, and yes there are some changes with more steps, nothing that would have me second guessing the step count. And just for completeness sake, in this particular case, 35 steps seems to be the goldilocks point. So Dev (normal) 30 steps, NF4 Dev 35 steps (both with fp16 clip)
i wont make any definitive claims but i feel like nf4 model often produces more interesting details
Awesome. Any ideas where to find this prompt?
making a prompt for that purpose is not so hard. LLMs are really easy to direct. here's how i'd do one, and then adjust it as needed.
"We are going to make prompts for an image generation model. The model has 3 separate text encoders that work together, but understand text in 3 different ways. When i give you an idea, you will write 3 prompts, one for each text encoder. The first one is for Clip VIT-L and should have a list of style tags separated by commas. it should follow a pattern like: style, subject, actions, setting, detailing. The second one is for Clip VIT-G. It's the text input that affects the scene the most. Write a couple of sentences describing the scene and its style in clear natural language. The third input is much more robust and understands a lot more descriptions. It uses the T5 encoder. The prompt for this one should be one or two paragraphs describing the image in detail with elaborate natural language and rich vocabulary. A bit of backstory to the scene can even help to harness the creative power of the model.
Now, given that information, i have an idea about a cat in the sunshine"
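If you want to reuse that instruction with different ideas, a rough sketch is to wrap it in a small function and paste the result into whatever assistant you use. The function name and the condensed wording are assumptions for illustration, not the exact prompt from the chat, and nothing here calls an actual LLM.

```python
def build_encoder_split_instruction(idea: str) -> str:
    """Return an instruction asking an LLM for three prompts, one per SD3 text encoder."""
    return (
        "We are going to make prompts for an image generation model "
        "that has 3 separate text encoders.\n"
        "1. CLIP ViT-L: a comma-separated list of style tags "
        "(style, subject, actions, setting, detailing).\n"
        "2. CLIP ViT-G: a couple of sentences describing the scene "
        "and its style in clear natural language.\n"
        "3. T5: one or two paragraphs describing the image in detail "
        "with rich vocabulary; a bit of backstory is fine.\n"
        f"Now, given that information, I have an idea about {idea}."
    )


# Example idea taken from the chat.
instruction = build_encoder_split_instruction("a cat in the sunshine")
print(instruction)
```

For Flux, which only has CLIP-L and T5, the same template should work if you drop the ViT-G item.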
this should work in any LLM assistant available. i put it into meta.ai and got this result. I just made it up right now. i make new instructions all the time depending how i'm feeling. it's the same gist usually
thank you
the image
That looks like it'd be an awesome glif!
is that what glif is, llm driven?
if i was to improve on that prompt, i'd give examples . maybe limit what kind of scene components are given to L
glif is secretly comfy
but they took the noodles and put them in a cupboard in the back room to hide them
the noodles are still there though
5120x3072
there's a node on github that is like deep shrink but more gradual
I wonder if it would boost Würstchen
would be next level if it worked
so normal deep shrink is a 2x latent shrink at block 3 for a duration of 0.3
this node is usually more like a 2x shrink at block 3 which then slowly grows back to normal size across the whole generation
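The difference between the two schedules can be sketched as a shrink factor over sampling progress (0.0 at the start, 1.0 at the end). This is only an illustration of the description above: the function names are hypothetical, and the linear ramp is an assumption, since the chat does not say what curve the node actually uses.

```python
def deep_shrink_factor(progress: float, duration: float = 0.3, shrink: float = 2.0) -> float:
    """Regular deep shrink: constant shrink until `duration`, then full size."""
    return shrink if progress < duration else 1.0


def gradual_shrink_factor(progress: float, shrink: float = 2.0) -> float:
    """Gradual variant: start at `shrink` and ease back to 1.0 by the end.

    The linear ramp is an assumption made for this sketch.
    """
    progress = min(max(progress, 0.0), 1.0)
    return shrink + (1.0 - shrink) * progress


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"progress {p:.2f}: deep {deep_shrink_factor(p)}, gradual {gradual_shrink_factor(p)}")
```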
playing with this forge version with flux support. no image to image seems. i can't make it work
saw this and it looks like it's exactly what you were looking for:
https://huggingface.co/spaces/gokaygokay/FLUX-Prompt-Generator
gokaygokay is pretty well known for making some really useful stuff
(had to search through my posts to remember who i was talking to about it all)
That looks cool as fuck. Thank you. Will have a play with it
flux has both clips?
no problem, i havent tried it out, but it looks like it should work. plus, all the different fields you can set to random
not sure why but I thought it had one clip
no, he probably just has it there for SD3 as well
ah ok
i'll check the code
on every model I always forget which one is the smart clip and which one is the tag-prompting clip
https://github.com/dagthomas/comfyui_dagthomas i think he based it on this comfy node*
Anyone remember that old model called SD3?
sweet tool. he totally made it for sd3 and just renamed it. it works though so whatever
yeah, since they all use the same shit under the hood, minus G in this case
thats literally all you need to do to adapt it. ignore the G prompt
am i right that because bitsandbytes added support for qlora, that's why we're now able to load models in nf4?
not sure, i'd have to look at their repo. havent looked at any recent advancements they've made lately. all i know is that fluxdev nf4 in comfy is a godsend. the quality is mostly equal to or just a hair lower than the fp8, but i get 2.5 sec/it now, down from 8 sec/it on this 8gb card. if i want, i can always save off the latents and resample with the slower fp8 version or something
i want to try a schnell version at nf4 to see how it does, saw some people uploaded some already on civit, but F making an account there to download them
You can add LLMs to improve prompts, so in the case with this, it can be added as part of the workflow, so whenever someone types a prompt, all of the above also happens to be added (so they don't have to type in that aspect every time).
You can scroll up I posted some comparisons but it's just crap
Well no wonder I like glif! Comfy is also my fave.
If this is truly the case that would be awesome, since being able to run glif on my own machine would be amazing!
Flux
can i edit a glif after it becomes popular and everyone who visits it gets the new version? asking for a friend not because i want to make one that gets popular and then change it to make everything that's prompted into a butt prompt
of the dev-schnell merge version? i'd be using it at like 8-16 steps, instead of 4 or 30
You definitely can lol
hmmm
i use the dev-schnell merge as like a 8 step lightning model, instead of a turbo or regular model. basically, a happy medium between quality and time
No, I mean the Schnell NF4
ahh alright
Is the Dev version worth experimenting with? I seem to recall you mentioning something about waiting 45 mins for an image though! I don't remember if that was in relation to nf4 or not though
You're completely correct that was what I reported and now I'm getting 2 minutes
With this new miracle
on my 4080 i'm getting 20 steps in 22 seconds
Workflow?
Honestly nothing special. I keep my workflows in all the images I share here but basically it's just using nf4. Though I strongly recommend keeping the fp16 clip.
Here for example
Help?
I tried install missing nodes, tried searching for it....
https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4 is the node you want
it says it's on the manager now
Follow the steps starting there:
thank you very much
comfyui is getting for loops now and thats super interesting to me
how many versions of flux will there be I wonder
random thought
What does that mean in laymans terms?
its kinda hard to think of a use for them
loops mean that few nodes can be repeated many times
i can't come up with a reason to use them on a single images. but i feel potential
i brought up some the other day: iterative upscaling to avoid artifacts when upscaling a latent too quickly, was one of the ideas
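As a rough idea of what a loop could do for iterative upscaling, here is a sketch that plans the intermediate sizes so no single pass grows the image by more than a chosen factor. The function name and the 1.5x cap are assumptions for illustration; this is plain Python, not ComfyUI node code.

```python
def upscale_plan(start: int, target: int, max_factor: float = 1.5) -> list[int]:
    """Return edge lengths from start to target, each step growing by at most max_factor."""
    if start <= 0 or target < start or max_factor <= 1.0:
        raise ValueError("need 0 < start <= target and max_factor > 1")
    sizes = [start]
    while sizes[-1] < target:
        # Grow by at most max_factor, always by at least 1 pixel, never past the target.
        next_size = max(sizes[-1] + 1, int(sizes[-1] * max_factor))
        sizes.append(min(target, next_size))
    return sizes


plan = upscale_plan(1024, 2048)
print(plan)
```

Each entry after the first would be one pass through the upscale-and-resample nodes inside the loop.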
there's these things i've seen, one button datasets they call themselves. it could make those more interesting and potentially use less custom nodes
either comfyui manager search feature sucks, or this isn't true
might take a day for it to propagate to the list on manager
for now, you can manually install it
this is essentially what the gradual deep shrink node does, just only to block 3
so i just found this on the bitsnbytes github page
and apparently he's updated it again sometime today
i'm not sure either, just found it like 30 seconds before i posted it. go read the page
I can never figure out which directory to put these things in! I'm going to try custom nodes first...
Why would that not be listed on the github page?
yeah he was saying that the speed increases are more random than first thought . more data needed
oh you don't have to do it that way, forgot you can install from a git url within the comfyui manager
and then paste in https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git
it should handle the rest and install the requirements for you
I did it with git desktop but it didn't work (prob had the wrong directory lol)
there's more to it than that, delete the folder from your attempt and do it this way
like you have to manually update bitsandbytes within the comfy python venv
I have so many node conflicts they'll soon start a war amongst themselves
Has comfy manager always had that feature???!!!!!
Er?
but otherwise, you have to do the usual git clone within the custom nodes folder, then from the python embed folder, you have to run python.exe -m pip install -r somepathhere\requirements.txt
venv=virtual environment
when working with python, almost all these apps use their own virtual environment so they dont f each other's dependencies up
but the comfy manager automates the installation and takes care of installing the dependencies to the comfyui venv with the install via git url option
I wondered why!!!
Comfy manager is nice toolset
you probably installed it to your global python install and that's why it didn't work for you
Imagine just installing new software to the entire globe hahaha like hey everybody have a new finger today. it's an update.
thanks for the finger I got it too
So I got this crap (still relearning windows), this is a windows not realizing who owns my computer again thing isn't it?
now I can play a Cminor(11,13) chord with one hand
oh that's right, they started cracking down on custom nodes after that asshat snuck in a malware wheel to that vlm addon months ago
its scary yeah
that that happened
So how does one get around that? You all installed it somehow
sec i'll just show you how to do this manually real quick
I really wanted to try the install from git feature, darnit lol... I was all ready to use it for the last 10 custom nodes that it couldnt find lololol
alright here's what you do, go into your comfyui custom_nodes folder
at the top of the window where the folder path is, click it to start typing in it and type cmd and press enter (opens a cmd prompt with the current folder path to save hassle)
then in the cmd prompt, run git clone https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git
close the cmd prompt window
after that, go up two folder directories and you'll see python_embeded
go into that folder, type cmd in the folder path at the top again to open up a cmd window
run the command python.exe -m pip install -U bitsandbytes (it's the only thing in the requirements.txt file that it needs)
let me know if this works or not
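The manual steps can also be sketched as a small script that builds the two commands for a portable install. The install path below is hypothetical and the function name is made up for this sketch. It uses git clone, which is the command that fetches a repo for the first time, and it only prints the commands so nothing is changed until you decide to run them.

```python
from pathlib import Path

NODE_URL = "https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4.git"


def build_install_commands(portable_root: Path) -> list[list[str]]:
    """Build the clone and pip commands for a ComfyUI portable layout."""
    custom_nodes = portable_root / "ComfyUI" / "custom_nodes"
    python_exe = portable_root / "python_embeded" / "python.exe"
    return [
        # Fetch the node into custom_nodes.
        ["git", "clone", NODE_URL, str(custom_nodes / "ComfyUI_bitsandbytes_NF4")],
        # Install or upgrade bitsandbytes with the embedded Python, not the global one.
        [str(python_exe), "-s", "-m", "pip", "install", "-U", "bitsandbytes"],
    ]


# Hypothetical location; change it to wherever the portable build was unpacked.
commands = build_install_commands(Path("C:/ComfyUI_windows_portable"))
for cmd in commands:
    print(" ".join(cmd))
    # To actually execute: import subprocess; subprocess.run(cmd, check=True)
```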
I'll try going up a diff way lol
do it the way i described using the file explorer
if you do it the way i said, you won't need to do any cd nonsense
You prob mean go back to file explorer instead of trying to navigate in dos or whatever it's called now
like this, and then type cmd and press enter (ignore the folder in my image, i was doing something else, you should be in the custom nodes folder and then for the later step, the python embedded folder)
it will open up a cmd prompt with the current folder path for you, so you dont have to do the cd blah\blah\blah stuff
got distracted after it suggested I upgrade pip, but I think I finally got it!
awesome, and you can run the suggested command, just make sure to copy/paste it exactly, since it will handle installing it to the comfy embedded python venv
grumble I must have gotten the wrong directory or something
hmm
I'll re-read your instructions and try it again
Give me a minute, I'll just make a quick .bat script to automate it for you when I'm back on my computer. I see the folder path in your screenshot from earlier
there is a nf4 schnell version available which is just crazy. 3 steps and upscaled 2x using 4 steps with sdxl lightning
Link pls.
Flux.1-Dev BNB NF4: Source: https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main Flux.1-Schnell BNB & NF4: Source: https://huggingface...
Gracias.
it doesn't do text decently well
How often do you actually need to have text though...? You can always just resample the image with the slower fp8 version to correct it
that's true
okay this is great. at 480x840 8 steps it's great at text
i was wrong
well schnell is always kind of slightly less reliable at text, but yeah that looks pretty good
how low can schnell resolution go and still be okay?
here is a cat at 240x480 resolution generated in 1.7 seconds on an rtx 4070ti super
1.7 seconds
but idk if i would call the resolution bearable
rename it to .bat and run it
this will delete the previous installation of the node just to make sure, before reinstalling it
Thank you very much
np, let me know if it works
it looks like I can run it from any directory
yep, thats why i manually entered in the folders
lmao, i hate cmd prompt stuff and i hate python
though i can't remember if del will work without some kind of flag like -y or something, so you might have to input a Y if it asks you about deleting the folder
yup 240 x 480 seems bearable
that's actually really solid
i know flux can supposedly handle all the way down to like 0.1 megapixels
and that's right around that mark
yeah, 115,200 pixels
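The pixel count is easy to check. A minimal sketch, where the helper name is made up and 1024x1024 is included only as the usual baseline size mentioned earlier:

```python
def megapixels(width: int, height: int) -> float:
    """Return the image area in megapixels (millions of pixels)."""
    return width * height / 1_000_000


for w, h in [(240, 480), (1024, 1024)]:
    print(f"{w}x{h}: {w * h:,} pixels = {megapixels(w, h):.3f} MP")
```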
im going the other way though, just added a bunch of 2MP resolutions to a node to be able to lazy select them
i have to stay around 1MP because of vram, but maybe i can go higher now with the nf4 version
So after receiving this (reason I hate windows11) message, I opened up the command prompt, cd to downloads, typed in nf4.bat and it looked like it did all it was supposed to... However...
I also restarted and refreshed comfy
maybe try deleting the node from the workflow and make sure to update comfyui. also, that smartscreen is just because it's a bat file and people can put malicious shit in them
it's possible the node has a different ID now or something under the hood and that workflow is trying to load an old version
Hmph, I read it before I ran it, windows should chill lol
omg, it's doing a 2mp image, but it's taking 30sec/it instead of 2.5sec/it. maybe i should back the size off a little because it's obviously overflowing vram
but the fact it's letting me do it at all is awesome with only 8gb vram
with the NF4 node i had to force a bitsandbytes upgrade to get mine to work, seems i had an older version that didnt work, might be worth a peek
yeah that's why i had them add the -U tag in there. the requirements file doesn't specify a version, but it needs the latest build i think
most comfy installs already have bnb in them if you mess around with things like llm nodes
ahh cool, someone did upload the dev-schnell merge as an nf4 version https://huggingface.co/silveroxides/flux1-nf4-weights/tree/main
it's the bitte-guidance version "Bitte is schnell with double blocks from dev. if name has guidance in it, it will also include guidance keys which has interesting effect." from the comments
it's the correct merge style like from the comfy blog post where it only swaps the double blocks
a new one appeared also "salto" wonder what that is
not sure, probably some other blend of the two maybe. i know someone had done a bunch of mixes like 2:1, 10:1, 20:1 and stuff, so maybe it's one of those? dunno, would have to ask in the comments
well bitte-guidance works pretty well and does actually respond to changes in guidance
flux1-bitte-guidance-bnb-nf4, 8 steps, euler-simple, flux guidance 2/4/8:
another test with the flux1-bitte-guidance-bnb-nf4 version @ different guidances
Thanks to google I found out how to get around comfyui manager not letting me install from a github url
what did it turn out to be? was it because it was in a protected user folder or something?
Nah, just had to change the comfyui manager config.ini
For once it wasn't windows being a control freak lol
sweet, well i was worried the folder path might have been an issue before with the cmd prompt stuff
Flux nf4; 4 minutes w only 8gb gpu
It should be far quicker than that. How many steps, what card do you have and how much system memory?
Like on this 8gb 2080, it's around 2.5 seconds per step
16GB RAM/GeForce RTX 4060
ahh that's why, you're running out of system memory and having to use the pagefile from your drive
Takes 10 mins with regular dev
What?! It has a voice to txt option?!!!!
If that's the same as a swap file, I set mine to 180gb ๐
nf4
I have given up on res-adapter
IDK if the comfy node is broken
I can only sometimes get it to work
its meant to make SDXL generate nicely at 500-1500 pixels
This is what I'm talking about!
nf4/schnellDev
flux -> kling
any help?
home sweet home
You can disable that, just be careful since there are viri in some sometimes.
Just Google the error (or scroll up here)
Flux.Schnell - prompt from a Bible passage (Ezekiel)
Does NF4 work with rtx 2xxx cards ?
the fine lines on trees you are getting today are the best fine lines I have ever seen for AI image gen
maybe fine lines are a big strength of cascade arch
it's what i'm doing with sampling and the backend
i'm also using random attention guidance which i don't think anyone's done
ah okay I see
maybe cascade has some extra innate ability there but the base out of the box workflow, that kinda stuff looks like a mess usually
squiggly squishy lookng shit
I'm d/loading nf4 - will it be much different than Schnell?
8Gb VRAM RTX 2070
Flux.Schnell - waiting on nf4 ...
I have the same question, what's the deal with that model?
I'm getting error with nf4 - Error(s) in loading state_dict for Flux:
size mismatch for img_in.weight: copying a param with shape torch.Size([98304, 1]) from checkpoint, the shape in current model is torch.Size([3072, 64]).
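A quick sanity check on those two shapes hints at what is going on. The checkpoint holds exactly half as many entries as the model expects, which is consistent with two 4-bit values being packed into each stored byte. That reading is an assumption rather than something confirmed here, but it would mean the NF4 file is being opened by a loader that expects unquantized weights.

```python
expected_shape = (3072, 64)    # shape the regular Flux model expects for img_in.weight
checkpoint_shape = (98304, 1)  # shape reported for the tensor in the NF4 checkpoint

expected_values = expected_shape[0] * expected_shape[1]
stored_entries = checkpoint_shape[0] * checkpoint_shape[1]

# A ratio of exactly 2 fits the "two 4-bit values per byte" explanation.
ratio = expected_values / stored_entries
print(expected_values, stored_entries, ratio)
```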
nf4 does not work with RTX2070 - shd use fp8 instead!
I was checking the new Forge, it seems to work with that model nf4
I'm using it just fine with a 2080
2.5 sec/it
I am getting this error RTX2070 and nf4
2.5 s/it is cool - using Schnell - I'm getting 10.33s/it - or 100 seconds/image
Sorry if this was posted already
But we got boring reality for flux!!!
Holy moly that was fast
Are you using comfyui? If so, make sure to update it and make sure to install the nf4 addon. It has a special checkpoint loader with nf4 in the name
https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4 is the node for comfyui, follow the instructions there. don't know if it's up on the addon manager yet or not
oh he updated it to say "Now on the manager for easy installation. Make sure to select Channel:dev in the ComfyUI manager menu or install via git url."
0, 0.7 and 1.2 strength
no, only with 3x and 4x atm
I think this is going to be my new default prompt. I love when the RNG gods give you terrible random prompts when using LLMs. Feels like winning the lottery.
dont think rtx 2x cards are supported to work with nf4
they are, these images are coming from it right now
from a 2080 FE 8gb
yes
it might be that only the 2080 has the feature and maybe the 2070 and lower dont
currently with flux schnell regular checkpoint i get max 40 seconds to generate an image
and if nf4 speeds that up even more that'd be exciting
dev is better but needs more gpu power
the reason im avoiding dev cause my system has 12gb vram and 32gb ram
also tested a dev model that takes about 2 min to generate ... not worth it
they use the same vram and ram, just one does shit in 4 steps, the other needs many more
think of it like a regular sdxl model vs its lightning variant
the resource use between those are vastly different
no, they arent. you must not have the same fp versions of them. if they are both fp8, they both use the same resources
now you might have gotten one that was packed with the fp16 t5 encoder and maybe the other had an fp8 t5 encoder
so you are saying fp8 version of dev and schnell both uses the same resources?
yes, as long as the text encoders packed with them are the same precisions
yeah so a single checkpoint for flux is the transformer (dev/schnell) + t5 that can be fp8 or fp16 + clipL + vae
and the dev/schnell transformer can be fp16, fp8 or now nf4
assuming you're using something like dev fp8 + t5 fp8 + clipL + vae and schnell fp8 + t5 fp8 + clipL + vae, they will use identical resources
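To see why the precision matters and the dev/schnell choice does not, here is a back-of-envelope sketch of weight size. The parameter counts are rough assumptions (about 12B for the Flux transformer, about 4.7B for the T5-XXL encoder), clipL and the vae are left out as comparatively small, and nf4 is counted as half a byte per weight with quantization metadata ignored.

```python
# Approximate bytes per stored weight for each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "nf4": 0.5}

# Rough parameter counts, assumed for this estimate only.
TRANSFORMER_PARAMS = 12e9  # same size for dev and schnell
T5_PARAMS = 4.7e9


def approx_weight_gb(transformer_dtype: str, t5_dtype: str) -> float:
    """Estimate combined transformer + T5 weight size in GB (1e9 bytes)."""
    total_bytes = (
        TRANSFORMER_PARAMS * BYTES_PER_PARAM[transformer_dtype]
        + T5_PARAMS * BYTES_PER_PARAM[t5_dtype]
    )
    return total_bytes / 1e9


for combo in [("fp16", "fp16"), ("fp8", "fp8"), ("nf4", "fp16"), ("nf4", "fp8")]:
    print(combo, f"~{approx_weight_gb(*combo):.1f} GB")
```

Nothing in the estimate depends on whether the transformer is dev or schnell, which is the point being made above. Only the step count differs.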
yes but when i ran dev model that took me much longer with 20 steps ... i mean 20 steps is not alot but that felt pretty long
so im guessing the dev model was straining my system
oh right, yeah, one is a 4 step model and the other is meant more for like 20-40 steps
so for instance, i'm getting ~2.5 seconds / iteration on my pc. this means 4 steps takes 10 seconds and 40 steps would take 100 seconds
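The same arithmetic as a tiny helper, in case you want to plug in your own numbers. The function name is made up, and this only counts sampling steps, not model loading or VAE decode.

```python
def sampling_seconds(steps: int, seconds_per_iteration: float) -> float:
    """Estimate sampling time as steps multiplied by seconds per iteration."""
    return steps * seconds_per_iteration


for steps in (4, 20, 40):
    print(f"{steps} steps at 2.5 s/it: {sampling_seconds(steps, 2.5):.0f} s")
```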
not compared to what it was before lol
I'm trying the ComfyUI add-on for RTX 2XXXX nf4 support ...
shit would take like 7 minutes

so a minute and a half is nothing for a high quality image
how is your speed with nf4 dev?
2.5 seconds per iteration. fp8 was like 8 seconds per iteration
and the quality is still great usually
ohh, thought you meant that was the regular dev lol
Dev at 20 iterations = 40 minutes on my RTX2070 8Gb
well i'm a developer that's always doing other shit inbetween stuff i'm working on, or i'm doing art, or i'm dealing with the kids pulling each other's hair out downstairs
so it works fine for me
lol i noticed i can't play youtube on my 2nd monitor while running flux
or that lags my system
Schnell is 100 seconds/image
on 2x card that's nice
in the task manager, you can go to details, locate the python.exe in the list, right click, set affinity, and uncheck one of the first in the list. this will free up a thread to do other shit
usually im ok with spotify and flux ... but can't play 4k videos at the same time
disable hardware acceleration in your browser
your GPU is being hammered while diffusing, let your cpu handle the video playback
since it's not used as much for diffusion
i guess
oh that's plenty enough
Tried prompting only for the word, didn't work 
dont you have to disable hardware acceleration in windows 11 settings tho? or just the chrome browser setting will do?
NF4 Loader - I take it that the checkpoint must now go under checkpoints and not unets? Also, where do I now input weight_dtype?
yes checkpoint .. not unet
And the dtype?
OK, thanks
you dont need encoders with single checkpoint
all the encoders, vae, and clips are included in that checkpoint

btw are you running dev nf4 or schnell nf4?
I have dev_bnb_fn4 loaded
Now that Flux is out, when is the Capacitor model coming?
But its erroring
Which company is that from
This error with dev_fp8 Error occurred when executing SamplerCustomAdvanced:
4-bit quantization data type None is not implemented.
Doc Brown.
testing flux lora training on a 4090, testing on 1/3 of my giger dataset since its not a style it knows.
still training, no vram for photoshop so shitty comparison straight from the trainer, progress from same seed "woman wearing biomechanical armor"
i'm about to give it a spin, almost finished downloading the model, then i have to update the comfyui and stuff
I will rock flux Dev 80 at 512/512
I'm perhaps one RTX 2XXXX owner whose GPU will not work with nf4
colors
sounds right..but schnell can do it in 30-40s
schnell 4 step with the nf4 version is 10 seconds per image, i'm testing it out right now
