#🏞|general-with-images
It is. I’m using 1.0, not photoreal, mainly because the UI I use on Mac doesn’t support importing that model for whatever reason
But it’s ok
FP8 and then through the UI did a 8-bit conversion
ah yeah mac is very rough sometimes
when its good its the best value
but things can get a bit squiffy
Unless u have a Mac Studio then its pretty good but I dont have £5k-11k for that XD
Tho Draw Things software and stability matrix is great for AI image gen for Mac
Specifically Draw Things
That’s how I did an easy 8-Bit conversion
Just 2-3 clicks
yeah
I wish everything had 1 click quants
to like int8/fp8 and int4/fp4 as a choice
True say true say I can only do 8-bit for now but who knows LuiLui (sorry if I spelt that name wrong) might let Draw Things do more than 8-Bit conversion
4 bit gets a bit spicier anyway you can lose some quality
it doesn't matter for me cos I do like many stages of upscale but for a lot of people they will not like the quality drop
Usually I use 5-bit of Flux dev but I can only use 8-bit for custom models that aren’t in the app itself
GGUF might get good
I’m prolly gonna save up to get used or refurb 3090 or maybe if I’m lucky 4090
apparently latest pytorch nightly has +30% boost for torch.compile for GGUF diffusion model quants
Ah I see
I've been looking at weird stuff like
getting a bunch of CPU servers linked with infiniband and just not having GPU
Rn I’m chilling with a 6600 xt which I mainly got for 1080p gaming before I got into image gen
Interesting
Intel rly wants people to buy their CPU
so they kept cooking their OpenVino thing
and they cooked so hard that it kinda looks viable 👀
upscale ver of first image
Siax is so good still after all this time
I used majority of upscalers out there at this point and Siax can still compete
What other upscaler models do people use with flux?
well the big fancy networks are DAT RGT ATD
this one is common https://openmodeldb.info/models/4x-RealWebPhoto-v4-dat2
Thanks
its hard to rank upscales cos they have different looks
the fancy ones should be able to get you less errors than Siax and a softer look as well
cos ones like Siax and Ultrasharp can be a bit crunchy or harsh
doing a slight blur before and after can help
but it needs to be a rly small blur cos otherwise of course your stuff gets blurry
Actually forgot where to put the file 😭
LOL yesterday I deleted 10 copies of Flux, about 15 copies of SD 1.5 and over 10 copies of Clip L from my laptop
thats with this upscaler
okay yeah slightly softer
you can see how well Siax keeps up lol
the difference is slim
Yeah I had to use stability matrix interface for it cuz Draw Things didn’t support it
Took longer by quite a bit
ye the big new ones are slow
ass straight ass
@wispy nest left is sigma vision, right is RayFlux 1.0 (both 8-Bit conversions of FP8)
I personally like the right more than left
yeah for sure, right is so much more natural
colour is the weak area of current AI from what I have seen
someone did DDPM in LAB colour space which was nice, but it was a small network only, I think trained on the CelebA data set
Yeah I’m pretty sure sigma vision is Schnell de distilled
And I think RayFlux is De distilled of DEV
yeah something like that
Which could explain the colours on sigma
if you want image quality sometimes you want to keep schnell out
Since it reminds me of normal schnell colours
distils tend to be bolder and less soft so that makes sense
Yeah fair, I mean sigma vision's data set is males only for now according to the creator, so RayFlux is my fav, and it needs fewer steps so it’s a 4 min gen instead of a 12 min gen on sigma vision 😅
sigma vision is narrow data ye
Prompt stolen from Reddit.
"You can't get ye flask!"
there's not much new things going on with stable diffusion in the last months, right ?
or did i miss something important ?
they dropped a rly good 3D model, like actually SOTA in its area
its a multiview video model
I didn't realise before I started doing 3D that they actually cooked rly hard with this one as well https://github.com/Stability-AI/stable-point-aware-3d
its fully modular so you can swap bits out
like you can take out DDIM in the point cloud stage and put stochastic sampler
switch differential renderer in the later stage etc
maybe more importantly you can take that Stable Virtual Camera model and combine it with recent methods to generate point cloud from many multi-view videos
and cook the point cloud harder
Ultra epic manually remastered song is coming soon...
Here, have a Tifa.
RayFlux cat photography
yessss. welcome to Team Tifa.
oh okay
so its blocked because it doesnt allow external ips into the environment
but I used it before, is it not possible anymore? Is there some way to install it?
idk, you should message the vast.ai support maybe
also have you tried installing another extension?
yes, if I reboot it sometimes works, but sometimes it gives an error with the kernel
oh ok
I will make more tries, thanks
#artisan-1 Turn the image black and white
I've always used the default resolution when I work with controlnet, is that wrong? Must I match it with the img2img resolution or something else?
yeah, its best if it matches your image resolution
its not wrong, though. The aspect ratio of your image will still be used.
Any idea how these ppl are doing nearly perfect virtual try ons? All the models Ive used mess with the face and head too much and the images are never as clear as these.
Nine degrees
Ayy! Just found back my OG models and configs for the lora i loved the results of! Now to train it further :D
could someone please explain what the 0 90 180 270 settings are for and when to use them. it doesn't seem to do anything.
sorry i thought i was in tech
I think im doing something wrong, https://civitai.com/images/60079041 copied and pasted that to my sd using the correct checkpoint and im getting monster faces lol
#🏞|general-with-images a big dog
HELP I'm making a simple puzzle game. I need a few images for my menu - buildings. I already have one, but need more in the same style. Will you create them for me?
I am trying to get this result. The only light source would be from the phone screen. SD Pony keeps adding extra light sources. Do you have any idea how I can get this done?
img2img
normal map control net can help
its the only controlnet that explicitly bakes some lighting stuff
if you want to get fancier we have some diffusion models that convert back and forth between RGB and elements of principled Bidirectional Scattering Distribution Function shaders
Hidream went high, then it didn't 
HiDream really is nitpicky on how you write prompts 😬 You'd think a full blown llm as input would make it more lenient, but it's weird. "Anthropomorphic dog, muscular agile body, short glossy warm golden brown fur, floppy adorable ears. Stands on a flashing dance floor in front of a discotheque entrance. Wearing a shimmering silver jumpsuit. Ornate cat-faced masquerade mask (realistic markings, faux fur trim) held in front of its face with one paw, partially covering it." doesn't work, it just puts a mask on the face, changing to "** In one paw it holds an ornate cat-faced masquerade mask (realistic markings, faux fur trim) in front of its face, partially covering it.**" And boom from a generic mask on its face it turns into the cat mask in a paw (which HiDream insists looks like a hand, but well, can't have it all...)
Makes me wonder if it has been trained on only one specific form of caption, not a range like normal models are.
do u have chinese translation
Not this time, but i have tried it before, the thing does understand chinese (super weird, clip/t5 can't even tokenize that properly to my knowledge) didn't notice better/worse, unlike colors
(and my understanding of Chinese is nonexistent, i don't even know if the translation went right 🤡 )
lol this time chinese by deepl butchers the prompt
Bro how do you even generate an image period. I've just gotten into this- tried "/dream" at the start but nothing happens
i downloaded the model, does anyone know where i should put it?
@sullen perch
This would be an example how to use some custom nodes to use a counter to create textfiles named 1.txt, 2.txt, ....
In this example all files have the content "bla bla bla".
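For anyone who wants the same thing outside a node workflow, here is a minimal Python sketch of the counter idea: it writes 1.txt, 2.txt, ... with the same content. The function name `write_numbered_files` and the folder argument are made up for illustration, they are not part of any custom node pack.

```python
from pathlib import Path


def write_numbered_files(folder, count, content="bla bla bla"):
    """Write 1.txt .. <count>.txt into folder, each holding the same content."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, count + 1):  # the counter starts at 1, like in the example
        path = folder / f"{i}.txt"
        path.write_text(content, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_numbered_files("captions", 5)` would create five files in a `captions` folder (the folder name is just an example).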
thanks man
trying to create a 6 person family for a website, any ideas on how to get stable diffusion to not duplicate people?
yes i've used negative keyword
I've just seen that HiDream exists, is it better than Flux?
I can see that it might be heavier than Flux, but isn't it?
Q8 is 17 GB..
well this part is interesting
whilst it takes more vram it might be faster
i have not benchmarked myself yet rly
I used it a bit but did not time it
it is moe (mixture of experts) like chatgpt or mixtral
but for diffusion model
I see, how much VRAM does it require, does it need a separate text encoder like for flux?
What specs do you have, and do you run Q8?
it needs two clip models, T5 and llama
not sure how much VRAM it requires
and I don't own a GPU
I just rent random cloud ones
I'd be very surprised if it ran on my 3080ti with my poor 12GB of Vram.
you can blockswap
Hi everybody. Can I resize my square image to 16:9 wide using dream_image/ command?
It can run on intel arc's a770 LE 16gb with --reserve-vram 8.0
The GGUF Q8 quant more specifically
at 9s/it
And that card is a 3060 equivalent
interesting
does it work with comfy?
Yes.
Is this legit?
But i just wanna fix my issues
But why it want my wallet?
I don't have any
All i want is to run this ui on my pc
https://github.com/lllyasviel/stable-diffusion-webui-forge
It's because you're getting scammed
Don't join them
We have #🤝|tech-support and its free
@wispy nest ban @sudden trellis and @hasty jungle
Trd the one send that scam
Both are in that scam server
Got banned from that server
here comes Billy
@echo tapir Quick test done. I'm no expert in benchmarking and proper analyzing, but better than nothing lol
it's drooling
@wheat girder both @hasty jungle and @sudden trellis are scammers
Veo2?
image made with illustrious, animated with leoAI motion 2.0
ah leo
I always forget you're big on leo 🙂
not really, but trying out new features because why not
Veo2 🙂
Comic-style wide shot of a dark alley, with the polar bear and the little man in the black hoodie locked in a tense face-off. The bear stands on two legs, wearing a red scarf and brown newsboy cap. The man is frozen in place as the bear’s massive shadow stretches over him, cast by stark moonlight. Mood: fatal finality. Use silhouetted framing with sharp light-dark contrast, emphasizing the bear’s dominance and the gravity of the moment. Let the moonlight carve out dramatic shapes in the alley, heightening the cinematic tension.
create image Comic-style wide shot of a dark alley, with the polar bear and the little man in the black hoodie locked in a tense face-off. The bear stands on two legs, wearing a red scarf and brown newsboy cap. The man is frozen in place as the bear’s massive shadow stretches over him, cast by stark moonlight. Mood: fatal finality. Use silhouetted framing with sharp light-dark contrast, emphasizing the bear’s dominance and the gravity of the moment. Let the moonlight carve out dramatic shapes in the alley, heightening the cinematic tension.
My work is done!
show
Haha, it’s an expression meaning I have achieved my goal (if my generation made someone disgusted).
HiDream doesn’t appear to work on MPS right now. I just get a completely black square as output with the standard workflow. If I use the bosh3 scheduler, it complains about the tensor being full of NaNs.
I’ve never used any video models. Thought about it, but not a big priority for me.
what's a big priority?
I barely have time to generate images. I have a job and family.
Okay, tell me which model do you prefer then?🙂
Sorry for delay, banned
Redhead woman with orange hair, green eyes, curved body, freckles.
HiDream Q4 + Flux Q8 :
Are we allowed to put prompts here?
A few more in #dailies & one buried in #anime. Just follow the 💩s 🫡👍
i want a funny picture of ryan reynolds wearing this t-shirt but it just makes a similar picture to the original.
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image
import torch
from diffusers.utils import make_image_grid

# Load the SDXL checkpoint
ckpt_path = "E:/skit/Fooocus_win64_2-5-0/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors"
pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(
    ckpt_path,
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

# Optional performance settings
# pipe.enable_model_cpu_offload()
# pipe.enable_xformers_memory_efficient_attention()

# Load your image (the jacket image)
init_image_path = "E:/deadpool.png"
init_image = Image.open(init_image_path).convert("RGB").resize((1024, 1024))

# Enhanced prompt with more detail
prompt = "Ryan renolds wearing t-shirt"

# Generate image with fewer steps for quicker output
image = pipe(
    prompt=prompt,
    image=init_image,
    strength=0.55,  # Adjust strength for more realism
    guidance_scale=9.0,  # Adjust guidance to focus on the prompt
    num_inference_steps=15,  # Lower number of steps for faster generation
).images[0]

# Save result
make_image_grid([init_image, image], rows=1, cols=2).save("Ger.png")
The one to the left is the t-shirt he should wear and to the right is the generated from this code
img2img is not the right tool for that
for SDXL you should use ipadapter (maybe in combination with inpainting).
for flux you can use redux instead.
ok my computer is not good enough to use flux i think, thats why i use sdxl, but do i need to use inpaint cause i want to generate a pose
has anyone got an idea why it doesnt change her position to the one i added for openpose? chatgpt is as lost on this one as i am as it says everything is done as supposed to
initially i had preprocessor on openpose_full but then it said since i manually added the skeleton i need to have it on none so i did that yet nothing changed
you can combine inpainting, controlnets and ipadapter without problems
I would first make a txt2img for your character with the prompt already specifying what the shirt looks like
then inpaint the shirt with ipadapter on the shirt image
And don’t forget higher end AMD GPUs with ROCm would tend to be sort of faster than shown on screen
(As far as I’ve researched)
can anyone give me a tip on how to make the face remain the same in i2v using the wan 2.1 model
Is it cheaper now on that platform since 50 series has been on it?
Hard
Just takes long to transfer models to that platform on Jupyter
Switching from 2m trailing to SDE trailing gives me way more realistic results but takes longer to gen
Especially on Mac
But it’s ok
do you know about aria2?
if you put 16 streams then it downloads 1,600% faster than wget
aria2 is a lightweight multi-protocol & multi-source command-line
download utility. It supports HTTP/HTTPS, FTP, SFTP,
BitTorrent and Metalink. …
Oh I don’t know how to use that
if you paste this in your terminal it might work
```
# -c resumes partial downloads if interrupted
# -x16 -s16 use 16 connections and 16 segments for speed
# -o saves the download under this filename
aria2c -c -x16 -s16 \
  -o RealVisXL_V5.0_fp16.safetensors \
  'https://huggingface.co/SG161222/RealVisXL_V5.0/resolve/main/RealVisXL_V5.0_fp16.safetensors?download=true'
```
that's pretty much all I do
its ok though you don't need aria2 its just fast if it works
I’m kind of confused is this to download models to my Mac or to Vast AI?
luckily aria2 works on windows, mac and linux like on vast.ai
it works on all of those
apparently you install on mac using ```brew install aria2```
if you have brew
and on windows with ```winget install aria2```
for linux its ```sudo apt update && sudo apt install aria2```
ayo theres two barretts now?!
lol you caught that eh?
This was ChatGPT's new image gen.
I kept at it and this is as close as I got. It's... not great. 😛
hey this one has redXIII and marlene. I dunno about the faces though XD Chatgpt looks like it was tryin to faceswap willem dafoe(sp?) on everyone XD but its a really cool image either way
img2img illustrious SwarmUI
#🏞|general-with-images smart room set up HOME THEATRE
Generated using AMD SD3.5 positive/negative prompts used and then upscaled for clarity using SmoothMix Ultra 4x
that adventure really sucked... but I did it!
Now i have to hope that adventure doesn't result in... a mess!
here's the .json for it
it's a start? hah.
IMAGINE/Stylized compass integrated into a TV screen or antenna
GENERATE A stylized compass integrated into a TV screen or antenna
I have this 6GB Nvidia Tesla K20X, would it be useful for any modern AI workloads?
You better get rid of these before the mods come.
just leave it.
the mods are asleep, they can't do shit about it.
they've been pinged.
aint nothing to be done about it.
I'm saying it for Mr. Man's benefit if he values his membership here.
he does not
his first message was in #📝|prompting-help saying "You want prompt help? shoot yourself" and some other bull.
including the nword with number censoring.
Alright, I hadn't seen him around before.
Plus he joined this server today so i mean. He's just here to be a menace and nothing more.
Thanks, mods. Interesting that Absolute Cool’s messages were also deleted. Based on the other thread’s messages, it seemed like a possible sock puppet.
he was
he was antagonizing everyone else to calm down
while the other guy kept blabbing
Design Title:
Soft Grip Handle for Supporting Existing Shopping Bags
⸻
Concept Summary:
This detachable soft grip is designed to support and enhance the carrying experience of existing store-provided shopping bags.
By wrapping around the top of a bulky or heavy bag, it reduces pressure on the hand, prevents deformation of the contents, and improves stability when carrying.
I am really grasping at straws here, avoiding spending hundreds on an RTX card
Buy now. The Price is only going to go up. The 3060 I bought at the start of the year is already $100 more.
6gb vram is okay for SD1.5
Xl gonna struggle
8gb nVidia GTX cards do work, but they are slower.
I may use it instead for image tagging, voice, etc. instead of generative workloads
Hmmm could work
I thought it was 4GB originally until I looked at the RAM chips and that there are 24 of them
GTXs don't have Tensor cores, that's why.
the point of this kind of link is to RECEIVE money
: D
1st HiDream Full
2nd HiDream Dev
3rd HiDream Fast
Hopefully this will somewhat help someone quality wise (FP8)
full, dev, fast
thats the test images ive done so far
A Black man and an Asian woman had a child, and the child is in the middle of them, very exaggerated and humorous.
Flux.1-Dev + Ultramix smooth 4x upscaler. If it appears blurry on zoom in, wait a few secs. Discord takes a while to load the full image. different prompts and weights used for each image.
If you are willing to wait longer for each high-res image, I recommend doing a second pass of denoising with flux. I find that model-upscaled images have a characteristic look that I don’t really like, but Flux will clean them up really nicely. You’ll need to use tiled denoising for the second stage, though, due to memory use and flux’s ability to tolerate high resolution.
Each of my high-res images takes about 50 minutes to generate.
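To make the tiled second pass a bit more concrete, here is a rough Python sketch of how an image could be split into overlapping tiles before each tile gets its own low-strength denoise. The 1024 px tile size and 128 px overlap are assumptions picked for illustration, not settings from the workflow above, and `tile_boxes` is a hypothetical helper name. It only computes the crop boxes; the actual denoising and blending of the overlaps is left to whatever UI or pipeline you use.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the image with overlapping tiles."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")

    def starts(size):
        # A single tile is enough when the image is not larger than the tile.
        if size <= tile:
            return [0]
        step = tile - overlap
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

With these defaults a 2048x2048 image is covered by a 3x3 grid of tiles, and the overlap gives the blending step some room to hide seams.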
unfortunately currently i'm limited to amuse tool, haven't got round to using stability matrix to unlock the full monty
Okay. I use ComfyUI.
Does anyone have a fix for bad teeth when using SDXL?
thats my plan at some stage yes, though the zluda variant of it as i have an AMD 9070 XT
can't remember if SDXL allows for negative prompts, if so, i would recommend adding some phrase in there that might help with it. personally i've never had issues with teeth
I haven't either, but I'm using PulID and wonder if it is introducing problems with smiles.
I’ve found that a lot of third-party model extensions are half-baked and negatively affect generation quality. If you think that base SDXL is better, you could do a second pass at low strength to try to clean up the details.
Thx. I may try that. I haven't done a lot of second-pass work yet.
Installation Guide: https://huggingface.co/ostris/Flex.1-alpha-Redux
Nunchaku: https://youtu.be/4Crrcrs39-w
ACE Plus + Redux Inpainting: https://youtu.be/4Gtpb4faxVE
Chapters:
0:00 - Intro
0:48 - Redux Comparison with Nunchaku
6:24 - Redux Comparison with Depth LoRA
10:25 - Redux Comparison for Inpainting
@sullen perch
ty
is that geralt? or geralt inspired lol
It is not, but I do have some Yennefers somewhere...
chat quick question
how do i avoid this
i have 6 GB VRAM and using --medvram on A1111
anyone know some checkpoint to allotment with houses? To make this kind of image
It seems that Open AI Sora has fixed the pseudo-Arabic now but some errors still occur and the support is much worse than English support. It starts to break when typing more than 3 words; however, I can still fix those minor Arabic errors, and I found ways to abuse Sora, including style consistency.
Hey Galaxy, is it dev or full?
They were dev, but the last one (fairy) was full and using a mask.
ADetailer [SEP] lottery
can you manually determine the processing order for ADetailer
Detail daemon workflow is great with Flux Dev Q4 KS
Original with HiDream using cloud server
why does my inpaint model never generate anything? it only removes, it doesn't generate
Had fun making this one
He looks like an honest sea dog.
Not bad
Some of these are really nice
can someone please help me turn this sketch into an image
Whats the sketch supposed to represent?
Hello everyone, I am a technical officer at genotek, a product based company that manufactures expansion joint covers. Recently I have tried to make images for our product website using controlnet, ipadapters, chatgpt and various image to image techniques. I am giving a photo of our product. This is a single shot render of the product without any background that i did using 3ds max and arnold render.
I would like to create an image with this product as the cross section with a beautiful background. ChatGPT came close to what i want but the product details were wrong (I assume not a lot of these models are trained on what expansion joint covers are). So is there any way i could generate an environment almost as beautiful as the 2nd pic with the product in the 1st pic? Willing to pay whoever is able to do this and share the workflow.
its supposed to be circuit board tracings with coding symbol inside like #%//()}{ etc
the tracings are purple and the background is baby blue
and the ratio is 5 height to 1 width
here is another attempt at the sketch
after trying and generating 70 images with chat they all have the wrong ratio, thats why i came here, i will be very very thankful
so you're asking us to ai generate a circuitboard?
im afraid you're not gonna get better results, have you tried using something like visio or drawIO to visualize it better?
Any help would be appreciated mates
Not a functioning circuit board, just an artistic visualization. It doesn't need to follow any specific schematic; I just want it to have circuit board style, trace-like paths, with coding or programming symbols inside the circular nodes. drawio is too legit, i want this as an artistic visualization
These kind of images? Or should they be cleaner like your sketch? The Resolution (300x1024) can be a problem during render as most models are not trained on these kind of resolutions. Maybe generating the middle and then outpaint towards the needed aspect ratio could help
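A rough Python sketch of the "generate the middle, then outpaint" idea: given the size of the centre piece and the wanted aspect ratio, it works out the full canvas and how much padding has to be outpainted on each side. The 512x1024 centre size in the usage note is an assumption for illustration, and `outpaint_plan` is a hypothetical helper, not a function from any UI.

```python
def outpaint_plan(core_w, core_h, ratio_w, ratio_h):
    """Return (canvas_w, canvas_h, pad_left, pad_top, pad_right, pad_bottom).

    The canvas has the aspect ratio ratio_w:ratio_h and keeps the generated
    centre piece (core_w x core_h) in the middle.
    """
    if core_w * ratio_h > core_h * ratio_w:
        # Centre piece is too wide for the target ratio: keep width, grow height.
        canvas_w = core_w
        canvas_h = round(core_w * ratio_h / ratio_w)
    else:
        # Centre piece is too tall (or already matches): keep height, grow width.
        canvas_h = core_h
        canvas_w = round(core_h * ratio_w / ratio_h)
    pad_x = canvas_w - core_w
    pad_y = canvas_h - core_h
    left = pad_x // 2
    top = pad_y // 2
    return canvas_w, canvas_h, left, top, pad_x - left, pad_y - top
```

For a 512x1024 centre piece and a 1:5 (width:height) target, `outpaint_plan(512, 1024, 1, 5)` gives a 512x2560 canvas with 768 px to outpaint above and below. Doing the outpaint in a few smaller steps instead of one big one is probably kinder to the model.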
This is one of the closest images I got ChatGPT to generate for me. It's not exact as it doesn't have the symbols, it's not the right ratio, but it's close enough. I really like the glow around the tracing in yours though, so if it can be like add a little glow around the purple lines. But I wanted the same color schematic where it's purple and baby blue. And thank you so much for trying to generate it for me. I don't know what model you're using, if you can also drop the name of the model, I would really appreciate it.
Model was hidream. But I can try some more models now that I have seen what the result should be 🙂
Flux got even more a high glossy finish 🙂
here is another one that was close but the nodes were filled and wrong ratio again, chat really doesnt want to generate the right ratio
But from your description and the base image you would like to have a much cleaner appearance,
yeah cleaner in a way that still looks not real, so something like this. im sorry if im being misleading, i dont know how to describe it better
Well does not get much different. I will try a more comic like, flat style
this is perfect exactly what i want
how did you get that what model and do you recommend a ui i can download it with
Model was flux and I used it with comfyui.
😂 Trump appeared in two out of the five generations in this batch, but I promise you the prompt has no mention of him!
when I mark with the brush inside controlnet, does it mean that ip-adapter will only use what I mark, or is it only for controlnet inpaint?
Hey guys, I'm looking to reproduce the following image without the character. I have a lot of trouble producing convincing cars. My idea was to use controlnet with an image of the car from a game, and then use a Lora for a PS2 style effect, but I have a lot of trouble using controlnet effectively. How would you do it ?
This is the best result I got using an outline from the Internet, otherwise it's hard to generate a clean canny
@sullen perch
Trained on very few, char seems perfectly good.
Style Lora in this pic too
@sullen perch eyes look fine using hires fix
Hi guys, do you know why a lot of celebrities have been removed from Civit.ai (loras)?
Were they pressured?
celebrities can make objections and lets be real they werent used for SFW purposes anyways
anyone can reply the lady with batman!
Is this flux?
Yep, just with the Eldritch Photography lora v1.3.
Wide shot of a dusty, bomb-ruined street in Mogadishu at golden hour, with soft light filtering through smoke, cinematic and desaturated.” Portrait of a lone figure standing in a dark, graffiti-lined alley at night, digital painting, graphic novel style, cinematic noir, high contrast rim lighting, muted teal and amber tones, textured brush strokes --v 5 --ar 9:16 --q 2 --stylize 1000
It's not great, but I like how the background came out. Cool for what it is. Just a raw piece from HiDream.
damn what was the prompt for this it's fire
If you could give me some tips on how to generate images with the colors and landscapes like yours that'd be fire
Thanks! This is just Flux fp16 with Eldritch Photography V 1.3 lora. The workflow should be attached, unless Discord stripped it. Prompt is A dramatic landscape retrofuturistic photograph of a (giant technological device)1.5, detailed industrial technological habitat megastructure++ that looks like a fruit on the edge of a giant waterfall in landscape. Waterfall from wide sea into abyss. Colorful sunny sci-fi, dramatic lighting. 50mm full shot. I originally made the prompt for SDXL about a year ago. I don't think the prompt weighting does anything with Flux, but I left it in.
Alright I appreciate that Ima give it a shot but I'm using nf4 currently on forgeui
I'm not doing too much unusual. Variations of this prompt just seem to work well with Flux. The most tweaking I've done is with the upscaling workflow.
ahh gotcha
ah thankyou so much
Actually, this one. https://civitai.com/models/717449?modelVersionId=1399148
I didn't like 2.0 as much. It tends to reduce the image quality.
dont worry I made sure to download the v1.3
I saw the reviews for 2.0 as well
Im not sure how to trigger loras tho or use them with flux
Ima just put in your prompt and see what happens
I already put the lora in my loras folder
This is the output I got
Not bad so far just the colors are a little different
If you are using comfy, you need a load lora node and pass the model through it to your diffusion node. Not sure about other interfaces. There is some dice rolling involved, too, as sometimes Flux can be opinionated. For example, in my original version of this prompt (without the fruit aspect), it was decidedly biased toward orange coloring, but occasionally gives different results based on the seed.
I see a lora tab there right under the negative prompt field. You don’t NEED to give any specific text in the prompt. If you keep the seed fixed, you can see the effect of the lora before/after.
Ohh alright thankyou for that
Killer Cat
My attempt at a proper movie still. Base Image was generated with Hidream, piped through an upscale workflow by using 4X Ultrasharp model first, scaled back to 2.5 Megapixel and piped through another KSampler at 0.25 Denoise Strength with JuggernautXL XI Lightning for 10 steps... Then some Photoshop magic happened for the final touchup
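The "scaled back to 2.5 Megapixel" step can be worked out with a few lines of Python. This is a sketch under two assumptions that are not from the message above: one megapixel is counted as 1,000,000 pixels (some tools count 1024x1024 instead), and both sides are snapped to a multiple of 8 so the latent size stays whole. `fit_to_megapixels` is a hypothetical helper name.

```python
import math


def fit_to_megapixels(width, height, megapixels=2.5, multiple=8):
    """Scale (width, height) to roughly the given pixel budget, keeping the aspect ratio."""
    scale = math.sqrt(megapixels * 1_000_000 / (width * height))

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)
```

For example, a 1024x1024 base image upscaled 4x is 4096x4096, and `fit_to_megapixels(4096, 4096)` brings it back to 1584x1584 for the low-denoise pass.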
😆
Uhhh.. can someone help me with this image and how to improve it? I’m using the draw things app 😭
change size 720 * 1560 this image
Should ask draw things discord server since they are tailored to it obviously many here don’t use it or have not heard of it
I also use draw things on my MacBook and forge using Stability Matrix
question, wouldn't it be easier to achieve the result of this char doing that with text 2 image?
assuming results is what you're after ofc
if it's for learning then fair enough
So I herd u liek Mudkipz...
i don't
Trump, in a suit, squatting under a desk, fixing a computer, realistic, office

Can I create an image here?
AMD GPUs can finally run PyTorch on Windows native! After a lot of hard work and long nights, we finally have PyTorch building and running on Windows. The first model I ran was SF3D and it was running smooth on strix halo. This was only possible thanks to https://t.co/qR83kH4WW0
@dry crow thoughts on this? Should be huge news no?
Yep, it's huge for AMD on Windows!
🥹 🥲 just finished up one of my favorite pieces ever. I took over 8 months off of creating because life got crazy. It feels so good to be back and I've been researching and studying hard to learn new stuff that I missed out on! It's crazy how much advancement can happen in such a short time!!
Spaghetti Cycle?
Hey yall im trying to train a LoRa to produce images in the style on the left, but they are looking more like the right.
There are a ton of hallucinations and they're just odd. The LoRa is trained off of 1700 images in the correct style, so im thinking it could be overloaded.
Any feedback on things I could try?
Btw i'm using Flux for this through Replicate, i figured yalls SD expertise translates haha
does it run as smooth as on ubuntu?
I'm curious if many things that won't work for me now would work on Windows
I tried many things in Comfy that didn't work
Its not ready for normal users yet
But comfyui-zluda should work good on windows currently
Generate an image of 216" display in an auditorium
/create fish
Make something with this much swag
Big scam
Cookin
yo guys, can someone help me? Im using Juggernaut XL ragnarok, sdxl 1.0 and still having issues with eyes...
im trying to use img to img controlnet to generate a character datasheet with many different poses using 1 character image and 1 open pose skeletal reference image (3 poses). is it possible to combine these two to insert my reference character into the pose ? (sd 1.5)
if possible i only really need to know the controlnet type, preprocessor, model 😄
Well image to image will use the source image with the source pose as input. Therefore it does not work very well for the controlnet pose compared to the results you will achieve with text to image.
@vagrant dust ah bc another guy said he used 3 model then using only the 1 image i gave him he was able to train the lora but he did not expound how. is using 3d model different from the open pose skeletal image ?
im not sure if he used inpaint or openpose or something else
Well if you use the character and train a Lora or use face switch etc. you can get close. You could also use wan rotate Lora and create an animation for it.
@vagrant dust i see there are many creative ways ! im not familiar with all the terms but thanks for introducing me to them i will search more about this ! i really believe if i just overcome the hurdle of datasheets creating loras will be much easier haha
Don't mind me. Coming through!
<url_dem_form> <url_img1_style> a biomechanical humanoid creature with tusks and extended tongue, bust portrait, in the exact rendering style of the second image, cinematic shadows, dark metallic skin, surreal alien armor, inspired by H.R. Giger, highly detailed, photoreal 3D style, atmospheric lighting, monochrome tones
why does image generation with Realistic Vision look weird
Is this a SD 1.5 model? If so I would suggest you try a lower resolution. (512x512 base).
currently all the models installed
what model should i use
First try a lower resolution to check if the multiple items problem still remain.
now the multiple items problem got fixed by lowering the resolution
but, what would be the best settings for realistic vision
There are all kinds of different model architectures, so resolution-wise you need to be in the sweet spot of the architecture you're using.
SD 1.5's base is 512x512, but you can try non-1:1 aspect ratios like 400x600 etc. Most people upscale afterwards to get a useful resolution.
Most other architectures (SDXL, Flux, SD3.5, …) use a 1024x1024 resolution. Each architecture has some good or even great realistic models. You can find them on Civitai, where you can see examples and good settings for each individual model.
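For anyone who wants to script the "stay in the sweet spot" advice, here is a rough sketch. The 512 and 1024 base sizes come from the messages above; the `suggest_size` helper, the idea of keeping the pixel count near the base area, and snapping to multiples of 64 are my own assumptions, not a rule from any particular UI.

```python
import math

# Base side lengths as mentioned in the chat (keys are hypothetical labels).
BASE_SIDE = {"sd15": 512, "sdxl": 1024, "flux": 1024, "sd35": 1024}

def suggest_size(arch, aspect_w=1, aspect_h=1, multiple=64):
    """Pick a width/height near the architecture's base pixel area.

    Assumption: keeping roughly the same total pixel count as the base
    resolution and snapping to a multiple of 64 is a reasonable default.
    """
    side = BASE_SIDE[arch]
    area = side * side
    ratio = aspect_w / aspect_h
    width = math.sqrt(area * ratio)
    height = width / ratio

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)

print(suggest_size("sd15"))          # square base size for SD 1.5
print(suggest_size("sd15", 2, 3))    # portrait for SD 1.5
print(suggest_size("sdxl", 16, 9))   # widescreen for SDXL
```

Generate at the suggested size first, then upscale afterwards, as described above. Always check the model page for the settings its author recommends, since those override any generic helper like this one.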
Hey guys, i'm coding a tool to prepare a dataset for training, with Florence 2 caption in a batch processing 🙂
hey guys do you know what i should do if the face is not symmetrical ? the eye color is not the same :/ is there an easy way to fix this ?
Take a photo of someone else (real human faces are not symmetrical, and a person's eyes are almost never perfectly identical shades)
question: does anyone know how to transfer drawing style into an image using controlnet ? for example image A(reference image): any ghibli style drawing, e.g., ghibli house, image B: keanu reeves = image C(generated image): keanu reeves in ghibli style
what would be the best combination to generate realistic image with flux
currently running on rtx 4060, 8 gb vram
using --medvram
web ui :- forge
do you guys know where the inpaint brush size adjuster setting is ? in google, it says theres a slider, but i can't locate it
sd 1.5 automatic 1111
nvm got it !
What would be the best method to replace the warriors hammer with the sword on the right?
And maybe change the pose a little bit
Inpaint?
upload both to ChatGPT and ask it to do it 😛
Probably, either inpaint or controlnet and iterative prompting. Or abuse Chatgpt and see if it can do it for you lol
Omnia likes his shortcuts
Lol. I got promoted at my job today. Ive officially become the first rung of the corporate management structure at my job. So with that newfound wage increase....im upgrading my GPU
Lol yeah that does work well but limited uses
What ive gotten so far via chat gpt
Can further improve details with SD
there's the gemini model as well. I hear it's better with those sorts of requests
everyone kinda forgot about it when 4o came out, but it's pretty good at making edits
Gemini 2.5 Pro is good for a lot but i dont think the Imagen image model is quite up to 4o's level yet. Though i hear they got a new Imagen model coming.
Gemini 2.5 is super dogshit compared to o3 or Sonnet 3.7; I do a lot of dev, and I noticed that
OH MY GOD GUYS,
I'm Fixing Stable.art extension, it finally works !
Hi! I’m new and I'm trying to use Stable Diffusion to make manga panels. I'm looking for someone with a powerful GPU (RTX 3060/3080/3090 or better) to run 1–2 tests in AUTOMATIC1111 WebUI using custom checkpoints + LoRA. I'd like you to use a powerful setup/checkpoints.
Here’s what I use (I am looking to buy a strong video card):
Checkpoint: revAnimated_v2Rebirth.safetensors [8463ca6405]
LoRA: <lora:kr-maleface-i2:1>
Prompt: only 2 heroes black and white detailed manga style , 24 years old man wearing a hunter clothes all over his body with bones as armor, on top have black tiny braids on the side shaved head expression slightly smiling , holding by the neck 18 years old boy who is wearing tank top white and grey trousers with a belt, on top have short dark grey hair lifted up ,on the sides shaved hair expression sleepy
Resolution: 1152x768 landscape
My GPU (GTX 1050 Ti) can’t handle it well. I’ll provide:
✅ 10–20 BGN (~5–10 EUR) as a small thank-you via PayPal/Revolut
DM me if you’re up for it or just want to help 🙏 Thanks! Below is the pose (try it with ControlNet or without); further below is the style I am looking for and some of my best generations. I am really desperate, to be honest.
Hey all, I'm honored to be a judge this year for the United Nations AI for Good Film Festival and wanted to remind you all that the deadline for AI films is tomorrow (May 15th!). I do a lot of work in the UN space around arts/culture and creative economy as practitioner and an advocate. Feel free to connect with me at linkedin.com/in/lisarussellfilms if you're interested in more AI art and storytelling for the social good. Here's the link for submissions: https://aiforgood.itu.int/ai-for-good-film-festival-2025/
New technology. We are looking for partnerships to develop it and to exchange experiences and suggestions on how to improve the system.
Imagine a 21 years old man wise and fit and powerful and calm sitting down on a modern office calm and confidence
Might wanna write everything in english, otherwise you'll only get french visitors :P
README-EN is present in project
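Based on the uploaded character image, generate a three-quarter view (isometric view) of this character. It must be full body, with both hands clenched into fists, and the generated character's face must match the original image.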
HiDream, with just the first paragraph.
interesting 📝
Rewritten with Apple Intelligence to cut out the AI fluff and focus on a single image:
A T-800 Terminator drives a pearl-white sports car through a rain-soaked cityscape. The car features vivid purple and crimson red neon accents, with “PINOKIO” glowing on its side. The Terminator’s matte-black endoskeleton contrasts with the car’s smooth curves, and its metallic fingers grip the steering wheel. The car’s rear lights blaze red, and the cityscape is sharply rendered with neon signs, raindrops, and traffic lights.
nice thanks
creates a tarp for a Venezuelan food store
so good i gotta share it in every channel. for my exp 33 fans
This game was done at 50 km from my home x)
No x), but I'm pretty sure I've seen one of the musicians in real life x)
Experience the magic of our main theme, Alicia, in this stunning orchestral recording. Witness the passion and artistry that brought this piece to life during an unforgettable three-day recording session.
Clair Obscur: Expedition 33 launches on April 24, 2025! Pre-order now for PlayStation 5, Xbox Series X|S (available day one with Xbox Game Pa...
shadowheart?
I'm still a long way from the end of the game, I think.
Ahh you're talking about shadowheart from Baldur's gate 3, lol
i don't know, i play in french obviously
and you shadowheart voice is voice dubbing
(in exp 33 english version)
this is wild, the colors and detail are insane. feels like it's alive fr
Thanks!
this is incredible
anyone able to identify a model for me? I have a bunch of pictures of a certain style i can't find on civitai
these -
My caption tool is growing up 🙂
hey folks, newbie here, just got curious: is Stable Diffusion strict about content generation when it comes to producing AI images?
Adreitz is one of the kind creators who share the workflow within the image. So if you use comfy you could simply drop the image into ComfyUI and get the complete workflow with seed etc.
No, at least not if done locally. Some on site generators will have their own restrictions or limits
I use a Mac, so depending on your VRAM you may have to modify my workflow to use a quantized model and text encoder.
Hi guys, any volunteers to give me feedback?
It's fully functional in GUI thanks to Gradio. Very useful to create your LoRA dataset:
https://github.com/SeBL4RD/Florence-Caption
@dry blaze Now default README is in english 😉
Noice :)
@drifting elk Could you see if you could apply multithreading to everything that uses CPU performance? I had GPT apply it to the "wanvideomodelloader", and it loaded up a decent bit faster on that part. But as I used GPT-4o and had to "quarrel" with it to only add multithreading and not mess with anything else, it took a while just for that node. I had it attempt to apply the same to every other node that uses the CPU, but it appears to have altered other shit and broke the node on every other attempt lol.
The wanvideo nodes.py
As if it's something you could do in the future, i can create an issue on the git as a reminder :P
3k lines, lul,
Yeah, free (or not) GPT-4o is not strong enough
This script probably includes a few steps which need to be multithreaded?
in your CLI?
Line 467-477 :P
from concurrent.futures import ThreadPoolExecutor

# init_empty_weights, WanModel and TRANSFORMER_CONFIG come from the surrounding nodes.py
def load_transformer():
    with init_empty_weights():
        model = WanModel(**TRANSFORMER_CONFIG)
    model.eval()
    return model

# One task is submitted, so only one worker thread ever runs it
with ThreadPoolExecutor(max_workers=24) as executor:
    future_transformer = executor.submit(load_transformer)
    transformer = future_transformer.result()
I've set 24 as i got 12 cores
It added it for the model loader at least, but was quite annoying to not have it overwrite other shit
Hence i got curious if kijai could make the other nodes as hyperthreaded as well, of the ones that moves/loads stuff with cpu.
Are you using an IDE or a simple editor? Because if you're using VScode, for example, with Python/Pylance extension, and your project contains the venv in question, you'll have completion and linter for these parameters.
As far as I'm concerned, it's not feasible, because this function and its parameters don't allow it (don't be fooled, much of today's code is still single-threaded).
And what's more, to simply load a model, I'm not sure it's efficient.
The model has to be able to load faster with RAM/Vram and/or a faster SSD, so I'd prefer to go that way if it's really a problem for you.
This parameter is apparently useful for parallel loading or other tasks, i.e. several simultaneous tasks, not the same task. That's why it doesn't work.
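To illustrate the point about `max_workers`, here is a small stdlib-only sketch (not the WanVideo code; the task functions are stand-ins I made up). It shows that a single submitted task only ever occupies one worker thread no matter how large the pool is, while several independent tasks really do run side by side.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_single_task(max_workers):
    """Submit ONE task and return the set of threads that did any work."""
    used = set()

    def task():
        used.add(threading.get_ident())
        return "loaded"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pool.submit(task).result()
    return used

def run_independent_tasks(n_tasks):
    """Submit n independent tasks that can only finish if all run at once."""
    barrier = threading.Barrier(n_tasks)

    def task(index):
        # Every task waits here until all n tasks are running concurrently.
        barrier.wait(timeout=5)
        return threading.get_ident()

    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        futures = [pool.submit(task, i) for i in range(n_tasks)]
        return {f.result() for f in futures}

single = run_single_task(24)      # 24 workers available, one task submitted
many = run_independent_tasks(4)   # 4 workers, 4 separate tasks

print(len(single), "thread used for the single task")
print(len(many), "threads used for the independent tasks")
```

So wrapping one loader call in an executor does not split that call across cores. Any speedup would have to come from work that is already split into separate tasks, or from code inside the call that does its own threading.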
Using vscode. I barely understand code sadly due to my adhd, as regular people see entire forest, then slowly work down the details, i am opposite, and only see individual bristles of a twig, on the branch of a tree, so shit is too complicated for me atm to understand the very bits and details about code yet 
And i don't know what linter is sadly.
It's more that a faster filesystem is needed. When I ran it on Linux with BTRFS and ZSTD:3 compression, it blew my mind how outdated and slow NTFS was, as I didn't even get to blink before the RAM was filled.
And I've managed to get GPT to make code that writes model data to a .pt with mmap, and it gets faster loading speed than even DirectStorage for games can offer. A 4GB/s model load speed was the peak of what I managed to get it to do so far.
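For readers who have not met mmap before, a minimal stdlib sketch of the general idea (this is not the script mentioned above and does not involve torch; the file name and sizes are made up). Memory-mapping lets the OS page in only the bytes you touch instead of reading the whole file up front. Whether that beats other loading paths depends on the filesystem and drive, and I have not measured it.

```python
import mmap
import os
import tempfile

def read_slice_mmap(path, start, length):
    """Return `length` bytes at offset `start` via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return bytes(view[start:start + length])

# Demo with a tiny stand-in file; a real checkpoint would be gigabytes.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "weights.bin")
    with open(demo_path, "wb") as f:
        f.write(bytes(range(256)) * 16)  # 4096 bytes of repeating 0..255
    chunk = read_slice_mmap(demo_path, 256, 4)

print(chunk)
```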
Even the custom node to offload the model to RAM, I've sped up tenfold with multithreading. It went from 12%-ish CPU to 100%, and the text encoder blasted through at 5-8 sec per positive/negative vs the 30-40 sec, IIRC, it could take on the old single-threaded instance.
I was reading this article: https://beltoforion.de/en/infinite_zoom/
and I've been trying to make a similar effect, however I can't generate the proper images or figure out how to outpaint the images using Stable Diffusion. The article uses Midjourney so I'm not sure how to recreate it. Any ideas?
Turn a sequence of ai generated outpainted images into an infinite zoom.
Awwww. They're adorable 
which model did you use?
The workflow is in the image you can just drag it into comfy to see the settings.
Flux Dev with the eldritch photography lora was used
thankyou
does this mean i can generate nsfw content using this ai model?
yes, you can generate nsfw content using Stable Diffusion, when hosted locally (on your own hardware). Some sites like Civitai also allow, for the time being, nsfw content, within certain bounds (they list their restrictions and will tell you when you try to upload or gen there if it breaks those rules, but they let most stuff slide). Other sites, like SeaArt or some of the more popular mobile apps dont allow NSFW content.
Before you try to post anything you gen here, this discord DOES NOT allow nsfw content 😛 just in case you were unaware.
4?
Correct! Bye!
blowing up my gpu on wan 2.1
I sincerely hope you're using the causvid lora
O wait you're using the wan app?
yep
basically I got a bunch of old tatu images that I want to animate
Anyways with the causvid lora from wan you go from 10-20min gens to like 2-4min for the uninformed
will look into how to install it after this run
once I get good at animating manga maybe we can have low cost amateur anime
2.2 and 4.5 min respectively for 24fps with 81 frames
I want to make an anime for https://en.wikipedia.org/wiki/Red_Prowling_Devil
Red Prowling Devil (Japanese: 紅, Hepburn: Kurenai, lit. "Crimson") is a Japanese manga series created by Toshimitsu Shimizu, published in Japan by Shōnen Gahousha and spanning eight volumes (or tankōbon). An English translation of the series, covering all eight volumes in flipped format, was published in the United States by the (now appare...
Probably a long time away
Wan isn't at that level yet
I am thinking of just 30 mins ova
btw how do you install causvid lora
If you manage to get anything coherent maybe but you'd want to use the new wan
Well not in the app
Either in comfyUI or in swarmUI (recommended)
can I find it in the resource manager in comfyUI?
You have to download it separately and edit a few things in a workflow
there are 8 vols of red prowling devil we should be able to create 30mins anime
All i could tell you:
https://github.com/mcmonkeyprojects/SwarmUI/blob/master/docs/Video Model Support.md#wan-causvid---high-speed-14b
Settings apply for comfy too
Once you have content for an AI "OVA", good luck, but it's not that simple
https://www.youtube.com/watch?v=XNcn845UXdw will watch this later
Wan 2.1 is still the very best local AI video generation model and it just became even more amazing and faster with CausVid LoRA. Now with utilizing ComfyUI backend power inside SwarmUI and my automatic installers to utilize Sage Attention, you can very fast generate very high quality AI videos with Wan 2.1 and CausVid LoRA at just 8 steps.
...
that sucks
Oh yeah that's furkans
Furkan is known to add a paywall but swarmUI is free and the github tells you just as well
The video author
https://www.youtube.com/watch?v=TCHXzX6vUcA this uses a lot of ai to fill in the gap
🎵 Dive into a Melodic Journey of Love and Destiny 🌌
Embark on an emotional voyage with this original Russian rock ballad, blending raw guitar energy, haunting acoustics, and driving rhythms. Inspired by the moody aesthetics of t.A.T.u and the surreal, poetic visuals of Adolescence of Utena’s iconic dance scene, this music video weaves a...
He's sorta notorious on github and discord for advertising his paid services for free apps
https://www.youtube.com/watch?v=0icOFbnn32U I just make shorter songs so I don't have to use ai to fill in the gaps
A forbidden love anthem set in the halls of Lillian Girls' Academy.
Two Catholic schoolgirls defy societal chains, family expectations, and a gilded cage to protect their secret love. Inspired by Maria-sama ga Miteru, this glitch-pop track pulses with rebellion, whispered vows, and the haunting beauty of love that refuses to be silenced.
Ги...
@copper matrix quick question where do I place this https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
The lora folder
Swarmui > models > lora
there are like two folders called lora
Maybe make a wan folder
you can't directly place it under wan
Pinokio?
yes
? Asking ai about new ai models, risky
Causvid is a LORA
That works with WAN2.1 14b
Both img to vid and text to vid
It suggests ~/pinokio/api/WAN_2.1/models/Lora but after I showed it the image of where all the other models are placed, it agreed with me
will see in couple hours once I am done with this
And personal experience, google, reddit and the github says causvid is a lora
But you do you
Pinokio is probably weird
this is in the lora folder I think you are right
@copper matrix is ReCamMaster the same as causvid
No?
yes then the loras folder is the wrong dir
You run WAN 2.1 as normal
end product pretty nice
And apply causvid as a lora
is there a button for apply causvid as a lora
Again i dont use pinokio
In swarmUI, in comfyUI its easier
Pinokio, no idea how that specific non mainstream stuff works
Lora preset what
Weird software
I googled quickly for use in pinokio and looks like most users cant get it to work in there, looks like its not updated yet or something
fair enough
will check out swarmUI later. I have comfyUi installed using pinokio
Normally how I apply a lora: it's in the lora folder, and I simply click it to activate
What, why are you using that weird interface then? In comfy its just a case of using a lora loader
basically I used pinokio to be my one stop for ai software so I don't have to go thru tons of troubleshooting to get those software installed
I think you installed Wan straight through Pinokio, ending up with a default gradio interface or something like that. ComfyUI has built-in support for Wan, so you don’t need to install it separately (just make sure the model is in the correct location to use with Comfy).
Will look into installing ComfyUI natively
It’s fine if you want to use Pinokio — I do, too. I’m just saying that, if you already use comfy in Pinokio, you shouldn’t need or want to install a model through Pinokio (in parallel to comfy) if it is already supported by comfy.
I want to get this working https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors so I can speed up my production time
You will need to find where on your HD Pinokio keeps the folder structure for ComfyUI. You will need to install the Wan model in the correct folder and the LoRA in the loras folder. There is also a folder with a collection of premade workflows that you can load into Comfy (stored in the metadata of image files). I believe there are Wan workflows there, so you could load one and just add the necessary node to load your LoRA.
Hi all, When generating an image with a mask the image quality drops a lot (On a x4 magnification mask even), can anyone tell me why this is happening and how to fix it? (The image on the right is generated without mask)
There is only one file so I can't place it in both lora and model folder
A LoRA is a modification to a model, not the model itself. You need the original model as a base to put the LoRA on top of.
The model goes into the models folder, the LoRA goes into the loras folder.
And both need to be loaded by the workflow.
Look at the enclosing folder, which contains many models. I don’t know what this LoRA does, but it is a modification of Wan, not Wan itself.
what did you download for your causVid installation
https://huggingface.co/Kijai/WanVideo_comfy/tree/main the folder only contains models
I don’t have one. I don’t do video.
oh
As a homosexual, I agree.
The name of the file you linked before makes it clear that it is a LoRA, not a model.
?
Oh ! you saw The last of us saison 2?!
No 💔
isn't .safetensors file always = model
The worst bromance
I watched last of us season 2 ep1-2 and I gave up
Sorry, but you’re mistaken. Safetensors is just a way of packaging a neural network. It doesn’t differentiate the type. Text encoders, VAEs, models, and LoRAs can all use the format.
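A quick way to see that the extension says nothing about the contents: a .safetensors file starts with an 8-byte little-endian length followed by a JSON header listing the tensor names. The sketch below parses that header with the stdlib only. The fake file builder and the "name contains lora" check are my own illustration, and the naming heuristic is an assumption that will not hold for every file.

```python
import json
import struct

def read_safetensors_header(raw):
    """Parse the JSON header from the raw bytes of a safetensors file."""
    (header_len,) = struct.unpack("<Q", raw[:8])  # first 8 bytes: header size
    return json.loads(raw[8:8 + header_len])

def looks_like_lora(header):
    """Heuristic (assumption): LoRA files usually have 'lora' in tensor names."""
    names = [key for key in header if key != "__metadata__"]
    return any("lora" in name.lower() for name in names)

def make_fake_file(tensor_names):
    """Build a tiny in-memory stand-in with one 4-byte F32 value per tensor."""
    header = {}
    offset = 0
    for name in tensor_names:
        header[name] = {"dtype": "F32", "shape": [1],
                        "data_offsets": [offset, offset + 4]}
        offset += 4
    blob = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(blob)) + blob + b"\x00" * offset

# Hypothetical tensor names, for illustration only.
lora_file = make_fake_file(["blocks.0.attn.q.lora_A.weight",
                            "blocks.0.attn.q.lora_B.weight"])
base_file = make_fake_file(["blocks.0.attn.q.weight"])

print(looks_like_lora(read_safetensors_header(lora_file)))
print(looks_like_lora(read_safetensors_header(base_file)))
```

For a real file you would read the first 8 bytes from disk, then only the header bytes, so you never have to load the weights just to find out what kind of file it is.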
oh
coz I placed it in lora and the place for model before and it does not seem to do much
A LoRA can do nothing by itself. It only records the differences between the base model and a fine tune.
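To make "it only records the differences" concrete, here is a toy sketch of the usual LoRA formulation, where the stored delta is a product of two small matrices that gets added onto a base weight. The tiny matrices and the `apply_lora` helper are made up for illustration; real loaders work on many large tensors and may add extra scaling factors.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_lora(base, lora_b, lora_a, strength=1.0):
    """Return base + strength * (lora_b @ lora_a).

    Without `base` there is nothing to add the delta to, which is why
    a LoRA file cannot generate anything on its own.
    """
    delta = matmul(lora_b, lora_a)
    return [[w + strength * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

base = [[1.0, 0.0],
        [0.0, 1.0]]        # 2x2 base weight (toy values)
lora_b = [[1.0],
          [2.0]]           # 2x1 factor
lora_a = [[0.5, 0.25]]     # 1x2 factor, so the delta has rank 1

merged = apply_lora(base, lora_b, lora_a, strength=1.0)
untouched = apply_lora(base, lora_b, lora_a, strength=0.0)

print(merged)
print(untouched)
```

The `strength` argument plays the same role as the LoRA weight slider in a UI: at 0 you get the base model back, and lowering it reduces how far the result is pulled away from the base.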
ok
Also, using a LoRA with a model causes an increase in active memory use. It’s not going to make the model easier to run and may lead to paging issues if you’re right on the edge with the base model. There are some LoRAs for certain models to allow decent quality with a reduced number of diffusion steps, which could enable you to generate images faster. But there will be both a quality and a memory use penalty. Again, I don’t know what this LoRA does.
eurotypo suggests it to reduce the wan 2.1 flf2v run time
hey there
why with the latest version of the requests package, it doesnt print IDs of 64 characters, and only of 32 characters
I have the requests version 2.32.3
solved it by placing it in loras_i2v
safetensor's file format was created to deal with the issues pickle files have - which make them unsafe to run - thus the reason for the name
a lora IS a model
you mean it's not a checkpoint
any good ai tools for converting real life human picture into cute anime figure
Kinda looks like the next-gen skeletons we’ll leave behind like if evolution met AI and clay. Maybe they’re the future preservers of our stories… or just waiting for their firmware update
First time using VACE, it took me 1h and 10-20 mins to generate this 5s video (t2v). Any way to increase the speed? I am using this workflow https://huggingface.co/QuantStack/Wan2.1-VACE-14B-GGUF/tree/main
I am using the Wan2.1-VACE-14B-Q6_K.gguf and I have a 5060 Ti with 16GB VRAM. The workflow already includes the causvid lora, and I have my steps set to 4
wait 5060 ti 16gb is out already
damm
Solved
Now using a 5060 ti 16gb for a 720x512 5 seconds video:
494s q5s
484s q6
780s framepack f1
Shots of kids in a classroom receiving traditional gifts (certificates, sweets). Some kids look bored or uninterested.
Kids in a classroom receiving certificates and sweets, some looking bored, futuristic classroom with stars and planets visible through windows, students wearing space-themed outfits, vibrant colors, high detail
WAI-SHUFFLE-NOOB... WHY DO YOU DO THIS TO ME?? 😭
how do i fix thissss? 😭
how do i fix this massacre?
looks like satans hellspawn
wrong sampler and scheduler it looks like
sampler?
how do i fix it? 😦
what are you generating with?
FORGE
okay, in forge, what settings do you have?
what checkpoint
waiSHUFFLENOOB
where did you get that from?
Civitai
link please
no idea what sampler and scheduler that thing needs but i'm gonna guess that Euler A isn't one it can use. change that to Euler
okay, get hold of the developer on civit that created that model, and get him/her to explain what you need for settings in forge
its strange
a bunch of people are using
it
how the hell did they figure that shi out
i doubt it'll work in forge at all - everyone's probably using comfy
okay well comfy will at least allow you to set both the sampler and scheduler. forge doesn't seem to have that ability
no it's not. you're just not familiar with it. install this https://github.com/mcmonkeyprojects/SwarmUI
and that took me months to master
swarm is very user friendly, @proud dagger has a channel here and his own discord
and it will make your life easy. it'll generate AND you can also run comfyUI in it
everything is better than forge...
alr ill try it
and just let alex know if you get stuck
ok
how do i download it?
Good day! Can anyone advise what tools and settings I need to use to achieve the same result as in the second photo?
the way my smile turned into the biggest frown ever......
so uhhhh
care to explain why my image looks like an artwork gallery 😭
man what is this??? 😭
lookin like a marshmello
read through the page. all of it. it tells you how to download it and install it
Try lowering the lora weight
I also see you put it in the prompt, you don't need that there in swarm
Could be also a reason for this weirdness
