lusty quarry Sep 11, 2024, 6:59 AM

#

Hi there, I am using your pods to run ostris/ai-toolkit to train Flux on custom images. Now I want to use your serverless endpoints. Can you help me out? Do you have a template or guide on how to do it?

wintry zodiac Sep 12, 2024, 5:31 AM

#

@lusty quarry Hii!

I have the dev serverless already! I'll update schnell soon

lusty quarry Sep 12, 2024, 6:05 AM

#

wintry zodiac @lusty quarry Hii! I have the dev serverless already! I'll update schne...

Do you have some demo or can I test it out?

wintry zodiac Sep 12, 2024, 6:09 AM

#

lusty quarry Do you have some demo or can I test it out?

Give me 30min

lusty quarry Sep 12, 2024, 6:10 AM

#

Ok man, thx

lusty quarry Sep 12, 2024, 6:59 AM

#

wintry zodiac Give me 30min

What are you using to train it?

wintry zodiac Sep 12, 2024, 6:59 AM

#

📎 FLUX-LORA.zip

#

{
"input": {
"lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
"trigger_word": "geraldine viswanathan",
"gender":"woman",
"data_url": "dataset_zip url"
},
"s3Config": {
"accessId": "accessId",
"accessSecret": "accessSecret",
"bucketName": "flux-lora",
"endpointUrl": "https://minio-api.cloud.com"
}
}
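A request of this shape could be submitted from Python roughly as follows. This is a minimal sketch, not the author's client: it assumes the standard RunPod serverless REST route (`/v2/<endpoint id>/run`) with bearer-token authentication, and `build_training_request`, `submit_job`, the endpoint ID, the API key, and the example values are all placeholders introduced here for illustration.

```python
import json
import urllib.request

# Assumption: base URL of the RunPod serverless REST API.
RUNPOD_API = "https://api.runpod.ai/v2"


def build_training_request(lora_file_name, trigger_word, gender, data_url, s3_config):
    """Build a request body in the same shape as the JSON shown above."""
    return {
        "input": {
            "lora_file_name": lora_file_name,
            "trigger_word": trigger_word,
            "gender": gender,
            "data_url": data_url,
        },
        # s3Config sits next to "input", not inside it, as in the example above.
        "s3Config": dict(s3_config),
    }


def submit_job(endpoint_id, api_key, payload):
    """POST the payload to the asynchronous /run route and return the parsed reply."""
    request = urllib.request.Request(
        f"{RUNPOD_API}/{endpoint_id}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; nothing is sent until submit_job() is called.
payload = build_training_request(
    lora_file_name="my-subject-FLUX",
    trigger_word="my subject",
    gender="woman",
    data_url="https://example.com/dataset.zip",
    s3_config={
        "accessId": "accessId",
        "accessSecret": "accessSecret",
        "bucketName": "flux-lora",
        "endpointUrl": "https://minio-api.cloud.com",
    },
)
```

Because training takes a long time, the asynchronous route is the natural fit here: the reply carries a job ID that can be polled later instead of holding one HTTP connection open.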

#

@lusty quarry

lusty quarry Sep 12, 2024, 7:03 AM

#

Thanks for sharing I will check it out

lusty quarry Sep 12, 2024, 7:15 AM

#

wintry zodiac

what does this image contain?

FROM navinhariharan/flux-lora:latest

how are you handling the long-running process of training a model?

wintry zodiac Sep 12, 2024, 7:26 AM

#

Disable this for long-running processes

#

FROM navinhariharan/flux-lora:latest

These contain the flux models dev and schnell

lusty quarry Sep 12, 2024, 7:28 AM

#

Thank you for the help 🫡

wintry zodiac Sep 12, 2024, 7:28 AM

#

lusty quarry Thank you for the help 🫡

Anytime 🙂

#

So the lora is trained and sent to your s3 bucket!

lusty quarry Sep 12, 2024, 7:29 AM

#

wintry zodiac So the lora is trained and sent to your s3 bucket!

I will be hosting it in a server of mine to reduce costs

wintry zodiac Sep 12, 2024, 7:29 AM

#

lusty quarry I will be hosting it in a server of mine to reduce costs

I use minio!

lusty quarry Sep 12, 2024, 7:29 AM

#

wintry zodiac I use minio!

Never heard of

wintry zodiac Sep 12, 2024, 7:29 AM

#

open source s3

#

https://min.io/

MinIO

MinIO | S3 Compatible Storage for AI

MinIO's High Performance Object Storage is Open Source, Amazon S3 compatible, Kubernetes Native and is designed for cloud native workloads like AI.

lusty quarry Sep 12, 2024, 7:30 AM

#

I will take a look

wintry zodiac Sep 12, 2024, 7:30 AM

#

Sure! If you have issues let me know! I'll be happy to help!

lusty quarry Sep 12, 2024, 7:30 AM

#

Do you have any tips to get better results?

#

Or to make it train faster?

wintry zodiac Sep 12, 2024, 7:32 AM

#

Sample dataset with default param works!

📎 dataset.zip

wintry zodiac Sep 12, 2024, 7:32 AM

#

lusty quarry Or to make it train faster?

It takes 2hours!

The one in civit lora trainer is faster!

#

https://civitai.com/models/train

Sign in to Civitai

#

Here is a bmw I trained!

https://civitai.com/models/736216/bmw-m340i-2024-lci-flux?modelVersionId=823275

BMW M340I 2024 LCI (FLUX) - V1 | Stable Diffusion LoRA | Civitai

The BMW M340I!!

lusty quarry Sep 12, 2024, 7:37 AM

#

i was using ai-toolkit

#

what hardware are you using?

wintry zodiac Sep 12, 2024, 7:38 AM

#

lusty quarry what hardware are you using?

lusty quarry Sep 12, 2024, 7:39 AM

#

wintry zodiac It takes 2hours! The one in civit lora trainer is faster!

does it work for schnell? Is it faster than ai-toolkit?

wintry zodiac Sep 12, 2024, 7:39 AM

#

You can deploy this to get started!

wintry zodiac Sep 12, 2024, 7:40 AM

#

lusty quarry does it work for schnell? Is it faster than ai-toolkit?

Yes! Yes!

#

The lora size is small too without loss of quality!

wintry zodiac Sep 12, 2024, 7:40 AM

#

wintry zodiac You can deploy this to get started!

navinhariharan/flux-lora:latest

lusty quarry Sep 12, 2024, 7:41 AM

#

With ai-toolkit i am getting about 30-40 min for 1000 steps

wintry zodiac Sep 12, 2024, 7:41 AM

#

lusty quarry With ai-toolkit i am getting about 30-40 min for 1000 steps

I do 2000 steps!

lusty quarry Sep 12, 2024, 7:43 AM

#

ok, that makes sense

#

are you doing some kind of image selection/preprocessing?

wintry zodiac Sep 12, 2024, 8:00 AM

#

lusty quarry are you doing some kind of image selection/preprocessing?

Yep! The captions!

lusty quarry Sep 12, 2024, 8:01 AM

#

wintry zodiac Yep! The captions!

i am using florence2 for that

#

you arent excluding low quality ones, resizing, etc?

wintry zodiac Sep 12, 2024, 8:01 AM

#

lusty quarry you arent excluding low quality ones, resizing, etc?

The images you mean?

#

I mix a bit of everything!

lusty quarry Sep 12, 2024, 8:03 AM

#

wintry zodiac I mix a bit of everything!

I have noticed that low-quality ones can completely mess up your output

#

what have you put in this image navinhariharan/flux-lora:latest? I want to customize it, can you share the source?

wintry zodiac Sep 12, 2024, 8:10 AM

#

lusty quarry what have you put in this image navinhariharan/flux-lora:latest i want to costum...

black-forest-labs/FLUX.1-schnell
black-forest-labs/FLUX.1-dev

#

These are normally auto-downloaded by ai-toolkit! Instead of exporting HF_TOKEN as an env variable,

I downloaded them and made a Docker image

#

That lives here

/huggingface/

lusty quarry Sep 12, 2024, 8:14 AM

#

i want to store those models in a network volume, so it can be shared between serverless instances

wintry zodiac Sep 12, 2024, 8:15 AM

#

lusty quarry i want to store those models in a network volume, so it can be shared between se...

That's the best!

lusty quarry Sep 12, 2024, 8:21 AM

#

wintry zodiac That's the best!

the thing is I didn't understand how to choose where it's stored
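One way to choose the storage location is to point the Hugging Face cache at the network volume when the worker starts, rather than baking a path into the image. The sketch below is illustrative only. It assumes the network volume is mounted at `/runpod-volume` on serverless workers and that it is attached at runtime, not at image build time; the `huggingface-cache` directory name and both helper functions are hypothetical. The `/huggingface` fallback is the location mentioned above for the models baked into the image.

```python
import os
from pathlib import Path

# Assumption: mount point of a network volume on a serverless worker.
VOLUME_ROOT = Path("/runpod-volume")
# Fallback: the directory where the prebuilt image keeps its models.
IMAGE_CACHE = Path("/huggingface")


def resolve_hf_cache(volume_root=VOLUME_ROOT, fallback=IMAGE_CACHE):
    """Return a cache directory on the volume if it is present, else the fallback."""
    # A plain directory check is used for simplicity; it cannot tell a real
    # mount from a directory that merely exists inside the image.
    if volume_root.is_dir():
        cache = volume_root / "huggingface-cache"
        cache.mkdir(parents=True, exist_ok=True)
        return cache
    return fallback


def configure_hf_cache(cache_dir):
    """Export HF_HOME so Hugging Face libraries read and write models there."""
    # Call this before importing huggingface_hub, diffusers, or transformers,
    # because the cache location is read when those libraries are imported.
    os.environ["HF_HOME"] = str(cache_dir)
```

Calling `configure_hf_cache(resolve_hf_cache())` at the top of the handler module would let every worker attached to the same volume reuse a single download of the models.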

lusty quarry Sep 12, 2024, 10:02 AM

#

another thing:

def train_lora(job):
    if 's3Config' in job:
        s3_config = job["s3Config"]
        job_input = job["input"]
        job_input = download(job_input)
        if edityaml(job_input) == True:
            if job_input['gender'].lower() in ['woman','female','girl']:
                job = get_job('config/woman.yaml', None)
            elif job_input['gender'].lower() in ['man','male','boy']:
                job = get_job('config/man.yaml', None)
            job.run()

how are you able to run the job, where does the get_job function come from?
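The snippet does not define `get_job`. It is most likely imported from ai-toolkit, whose own `run.py` appears to use `from toolkit.job import get_job`; treat that import path as an assumption to check against the toolkit version in the image. The sketch below isolates the config selection and takes `get_job` as a parameter so it runs without ai-toolkit installed. `CONFIG_BY_GENDER`, `select_config`, and `run_training` are hypothetical names. Unlike the snippet above, it raises on an unrecognised gender instead of falling through to `job.run()` on the request dict.

```python
# Map every accepted gender label to the config file used in the snippet above.
CONFIG_BY_GENDER = {
    "woman": "config/woman.yaml",
    "female": "config/woman.yaml",
    "girl": "config/woman.yaml",
    "man": "config/man.yaml",
    "male": "config/man.yaml",
    "boy": "config/man.yaml",
}


def select_config(gender):
    """Return the config path for a gender label, ignoring case and padding."""
    key = gender.strip().lower()
    if key not in CONFIG_BY_GENDER:
        raise ValueError(f"unsupported gender: {gender!r}")
    return CONFIG_BY_GENDER[key]


def run_training(job_input, get_job):
    """Load the training job for this request and run it.

    get_job is injected; in the worker it would be the ai-toolkit loader
    (assumed to take a config path and an optional name).
    """
    training_job = get_job(select_config(job_input["gender"]), None)
    training_job.run()
    return training_job
```

Keeping the request dict and the training job in separate variables also avoids rebinding `job`, which makes the control flow easier to follow.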

wintry zodiac Sep 12, 2024, 10:48 AM

#

lusty quarry another thing: def train_lora(job): if 's3Config' in job: s3_confi...

The handler bro!

lusty quarry Sep 12, 2024, 10:49 AM

#

Yes but then you call job.run

wintry zodiac Sep 12, 2024, 10:51 AM

#

runpod.serverless.start({"handler": train_lora})

This will call the function train_lora with the input json! that is...

job = {
"input": {
"lora_file_name": "laksheya-geraldine_viswanathan-FLUX",
"trigger_word": "geraldine viswanathan",
"gender":"woman",
"data_url": "dataset_zip url"
},
"s3Config": {
"accessId": "accessId",
"accessSecret": "accessSecret",
"bucketName": "flux-lora",
"endpointUrl": "https://minio-api.cloud.com"
}
}
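The dispatch described above can be sketched as follows. This is a simplified stand-in for the real handler, not a copy of it: the training and upload steps are replaced by a comment, the returned fields are invented for illustration, and the `runpod` import is kept inside `main()` so the handler itself can be exercised without the SDK installed.

```python
def train_lora(job):
    """Handler: receives the whole request dict and returns a JSON-serialisable result."""
    job_input = job["input"]
    s3_config = job.get("s3Config")
    if s3_config is None:
        # Without storage settings there is nowhere to put the trained LoRA.
        return {"error": "s3Config is required to upload the trained LoRA"}

    # Dataset download, training, and upload would happen here.

    return {
        "lora_file_name": job_input["lora_file_name"],
        "bucket": s3_config["bucketName"],
    }


def main():
    """Worker entrypoint: hand the handler to the RunPod serverless loop."""
    import runpod  # imported lazily; only needed inside the worker container

    runpod.serverless.start({"handler": train_lora})
```

Calling `train_lora` directly with the `job` dict shown above is a quick way to test the handler locally before building the image.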

wintry zodiac Sep 12, 2024, 10:52 AM

#

lusty quarry Yes but then you call job.run

@lusty quarry

lusty quarry Sep 12, 2024, 10:53 AM

#

And where is that function?

#

The train_lora ?

wintry zodiac Sep 12, 2024, 11:11 AM

#

@lusty quarry Line 31

lusty quarry Sep 12, 2024, 11:37 AM

#

sorry man it was a pretty stupid question, thats what i get for trying to do n things at a time ahaha

wintry zodiac Sep 13, 2024, 6:50 AM

#

lusty quarry sorry man it was a pretty stupid question, thats what i get for trying to do n t...

No issues man! We are all learning 😄

lusty quarry Sep 13, 2024, 6:52 AM

#

Have you managed to successfully use network volumes in serverless?

wintry zodiac Sep 15, 2024, 1:55 AM

#

lusty quarry Have you managed to successfully use network volumes in serverless?

I've never tried them! It shouldn't be difficult though

latent steeple Sep 18, 2024, 4:46 AM

#

Is this due to the container size?

#

And may I know the inference time it takes to generate an image on an A100 or any other GPU? For me it's taking 15 seconds.

#

@wintry zodiac

wintry zodiac Sep 18, 2024, 8:19 AM

#

@latent steeple what is your input?

Please remove any credentials you have and send

#

Looks like an error while downloading dataset

latent steeple Sep 18, 2024, 8:21 AM

#

I am using flux and sdxl models in this deployment,

Whenever a user sends a Flux LoRA request, I generate with the Flux LoRA

Same applies to sdxl

#

Input is

Lora blob url
Modeltype

#

What should be the container size

wintry zodiac Sep 18, 2024, 1:04 PM

#

That's all fine!

How are you sending in the training dataset?

#

@latent steeple

latent steeple Sep 18, 2024, 1:06 PM

#

This system doesn't need datasets. It just uses the models from Hugging Face: it imports the models from Hugging Face, downloads the LoRA, and uses that LoRA for inference

wintry zodiac Sep 18, 2024, 1:38 PM

#

latent steeple This system doesn't need datasets , it just use the models from huggingface , it...

Could you please send the worker files so that I can take a look?

And also do not forget to remove sensitive info before sending!

latent steeple Sep 19, 2024, 9:23 AM

#

getting this error when I am using runpod-volume

#

# Use a more specific base image for efficiency
FROM runpod/base:0.6.2-cuda12.2.0

# Set environment variables
ENV HF_HUB_ENABLE_HF_TRANSFER=0 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    HF_HOME=/runpod-volume/huggingface-cache \
    HUGGINGFACE_HUB_CACHE=/runpod-volume/huggingface-cache/hub \
    WORKSPACE=/runpod-volume

RUN ls -a /

# Create necessary directories
RUN mkdir -p ${WORKSPACE}/app ${HF_HOME}

# Copy requirements first to leverage Docker cache for dependencies
COPY requirements.txt ${WORKSPACE}/

# Install dependencies in a single RUN statement to reduce layers
RUN python3.11 -m pip install --no-cache-dir --upgrade pip && \
    python3.11 -m pip install --no-cache-dir -r ${WORKSPACE}/requirements.txt && \
    rm ${WORKSPACE}/requirements.txt

# Copy source code to /runpod-volume/app
COPY test_input.json ${WORKSPACE}/app/
COPY src ${WORKSPACE}/app/src

# Set the working directory
WORKDIR ${WORKSPACE}/app/src

# Use the built-in handler script from the source
CMD ["python3.11", "-u", "runpod_handler.py"]

tawny wyvern Nov 6, 2024, 5:53 PM

#

@latent steeple @wintry zodiac

Did you guys ever get this working, I’m trying to do the same thing with ai-toolkit. Flux dev model.

Any code you can share? There are some things in your docker image @wintry zodiac id love to be able to edit

tawny wyvern Nov 6, 2024, 9:56 PM

#

thank you!! 😭😭

wintry zodiac Nov 6, 2024, 10:03 PM

#

📎 FLUX-LORA.zip

#

@tawny wyvern

I have lost the Dockerfile of

https://hub.docker.com/r/navinhariharan/flux-lora/tags

tawny wyvern Nov 6, 2024, 10:07 PM

#

That’s okay ! I should be able to reverse engineer 🙂

#

Thank you so much!!

wintry zodiac Nov 6, 2024, 10:11 PM

#

tawny wyvern That’s okay ! I should be able to reverse engineer 🙂

Please send it here if you have managed to do it!

tawny wyvern Nov 6, 2024, 10:12 PM

#

Deal sounds good!

wintry zodiac Nov 6, 2024, 10:31 PM

#

@tawny wyvern Are you free now?

#

Give this a test! Should work hopefully!

📎 Dockerfile

tawny wyvern Nov 6, 2024, 10:41 PM

#

@wintry zodiac amazing okay thanks!!

I uploaded the contents of the docker image to a private github, did you want me to share it with you privately?

wintry zodiac Nov 6, 2024, 10:59 PM

#

Here is everything working!

🙂

📎 FLUX-LORA.zip

wintry zodiac Nov 6, 2024, 11:00 PM

#

tawny wyvern @wintry zodiac amazing okay thanks!! I uploaded the contents of the doc...

You can make it public! No issues! Many people may benefit!

wintry zodiac Nov 6, 2024, 11:03 PM

#

wintry zodiac Here is the everything working! 🙂

Removed unnecessary code!

It's just the models that the FROM is pulling!
AI toolkit will now be downloaded in this Dockerfile!

TO-DO:
Support the schnell config

tawny wyvern Nov 7, 2024, 2:40 PM

#

https://github.com/newideas99/flux-training-docker.git !

GitHub

GitHub - newideas99/flux-training-docker

Contribute to newideas99/flux-training-docker development by creating an account on GitHub.

naive drift Jun 1, 2025, 1:31 PM

#

@tawny wyvern @wintry zodiac I built a Docker image using this repo https://github.com/newideas99/flux-training-docker and successfully trained Lora using Runpod serverless endpoints. However, when I run the trained Lora, I get this error: "Exception: Error while deserializing header: HeaderTooLarge." I am no expert, but the Lora safetensor file might be corrupted, and the reason behind the corruption is the Docker base image "navinhariharan/fluxd-model."

Any help is appreciated.
Best,
Jesse


wintry zodiac Jun 1, 2025, 1:32 PM

#

naive drift @tawny wyvern @wintry zodiac I built a Docker image using this re...

Can you please screenshot the error?

naive drift Jun 1, 2025, 2:44 PM

#

Thanks for your quick reply. I am using the lora.safetensors file (uploaded to my S3 storage by the runpod-serverless.py handler) on Replicate.

#

@wintry zodiac I have tried to train multiple LoRas, and I got the same errors.

#

i tried to run this lora in comfyUI too, and it gave me same error

wintry zodiac Jun 1, 2025, 4:25 PM

#

@naive drift Your request header is too large

naive drift Jun 2, 2025, 12:39 AM

#

@wintry zodiac what does it mean?

wintry zodiac Jun 2, 2025, 12:40 AM

#

naive drift @wintry zodiac what does it mean?

The request you sent has a huge text/Data!

Can you send me the request json you sent? Please remove credentials if you have entered any

naive drift Jun 2, 2025, 12:40 AM

#

sure

naive drift Jun 2, 2025, 12:58 AM

#

naive drift Jun 2, 2025, 1:42 AM

#

@wintry zodiac it would be great help if you could provide dockerfile of this image as well. navinhariharan/fluxd-model

#

thanks

wintry zodiac Jun 2, 2025, 3:43 AM

#

naive drift

Can I get a full ss of these logs?

wintry zodiac Jun 2, 2025, 3:43 AM

#

naive drift @wintry zodiac it would be great help if you could provide dockerfile of ...

I'll need to have a look! Idk where I have put it

wintry zodiac Jun 2, 2025, 3:47 AM

#

naive drift @wintry zodiac it would be great help if you could provide dockerfile of ...

https://github.com/navin-hariharan/FLUX-INFERENCE-LORA

GitHub

GitHub - navin-hariharan/FLUX-INFERENCE-LORA: Flux Inference with L...

Flux Inference with Lora - runpod worker. Contribute to navin-hariharan/FLUX-INFERENCE-LORA development by creating an account on GitHub.

naive drift Jun 2, 2025, 4:10 AM

#

thank you so much navin, i appreciate it. I'll provide you the logs from desktop shortly, thanks again

last mountain Jun 2, 2025, 4:16 AM

#

How did you download the lora? using what

naive drift Jun 2, 2025, 6:57 AM

#

2025-06-02 00:55:17.380 | INFO | fp8.lora_loading:restore_base_weights:600 - Unloaded 304 layers
2025-06-02 00:55:17.382 | SUCCESS | fp8.lora_loading:unload_loras:571 - LoRAs unloaded in 0.0042s
free=26730077900800
Downloading weights
downloading weights from https://lora-urls.co/xzy.safetensors
Downloaded weights in 8.33s
2025-06-02 00:55:25.713 | INFO | fp8.lora_loading:convert_lora_weights:502 - Loading LoRA weights for /src/weights-cache/f14ea1f2c70aca45
Traceback (most recent call last):
File "/root/.pyenv/versions/3.11.12/lib/python3.11/site-packages/cog/server/worker.py", line 352, in _predict
result = predict(**payload)
^^^^^^^^^^^^^^^^^^
File "/src/predict.py", line 566, in predict
model.handle_loras(
File "/src/bfl_predictor.py", line 118, in handle_loras
load_lora(model, lora_path, lora_scale, self.store_clones)
File "/root/.pyenv/versions/3.11.12/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/src/fp8/lora_loading.py", line 543, in load_lora
lora_weights = convert_lora_weights(lora_path, has_guidance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/src/fp8/lora_loading.py", line 503, in convert_lora_weights
lora_weights = load_file(lora_path, device="cuda")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.12/lib/python3.11/site-packages/safetensors/torch.py", line 311, in load_file
with safe_open(filename, framework="pt", device=device) as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
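The traceback ends inside the safetensors reader, so the header it mentions is the header of the `.safetensors` file itself. A safetensors file starts with an 8-byte little-endian unsigned integer giving the length of a JSON header, followed by that header. If the file is not really a safetensors file (for example an error page or a truncated upload), those first 8 bytes decode to an implausibly large length. The sketch below checks this by hand; the 100 MB limit is an assumption about the reader's cap, and `inspect_safetensors_header` is a hypothetical helper, not part of the safetensors library.

```python
import json
import struct

# Assumption: upper bound on the header length accepted by the safetensors reader.
MAX_HEADER_BYTES = 100_000_000


def inspect_safetensors_header(path):
    """Parse and return the JSON header of a safetensors file, or raise ValueError."""
    with open(path, "rb") as handle:
        prefix = handle.read(8)
        if len(prefix) < 8:
            raise ValueError("file is shorter than the 8-byte length prefix")
        # The first 8 bytes are the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_BYTES:
            raise ValueError(
                f"declared header length {header_len} is too large; "
                "the file is probably not a valid safetensors file"
            )
        header = handle.read(header_len)
    if len(header) != header_len:
        raise ValueError("file ends before the declared header does")
    return json.loads(header)
```

Running this against the file in the bucket, and printing its first few bytes, shows quickly whether the upload produced a real safetensors file or something else saved under that name.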

#

@wintry zodiac I have pasted logs from the replicate here.

#

@wintry zodiac I think the GitHub repo is related to ComfyUI, not with the "navinhariharan/fluxd-model", which I requested.

#

@last mountain The trained Lora was uploaded via script[worker] to my S3 bucket, and I am loading it via URL into Replicate Inference.

last mountain Jun 2, 2025, 7:05 AM

#

Oh its on replicate not runpod

#

maybe the safetensors arent valid?

#

you're downloading the wrong file, downloading a corrupted file, or the file was corrupted

naive drift Jun 2, 2025, 7:16 AM

#

@last mountain You mean the trained Lora is corrupted, right?

last mountain Jun 2, 2025, 7:21 AM

#

probably, or the download process wrecks the lora

#

you can probably check the hash

#

if the downloaded file and the one in your s3 are the same, then it's not because of the download

#

and also check another way to use the lora, who knows, maybe it's replicate that cannot load the lora

#

it may work somewhere else

#

if it doesnt then your lora is probably corrupted
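The hash comparison suggested above can be done with the standard library. This is a minimal sketch; `sha256_of` and `same_file` are names chosen here, and the file paths passed to them are up to the user.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large LoRA files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a, path_b):
    """True if both files have identical contents according to SHA-256."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Matching digests for the copy in the bucket and the copy fetched by the inference service rule out the transfer; the fault is then in the file as it was written, or in something upstream of it.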

naive drift Jun 2, 2025, 7:27 AM

#

@last mountain thanks, I will check this out.

#

@last mountain I have verified and found that the downloading process doesn't make any difference to the file.

#

Hashes match, so my Docker image is the culprit

last mountain Jun 2, 2025, 7:49 AM

#

naive drift Hashes match, so my Docker image is the culprit

Huh? What do you mean

#

Does it work somewhere else?

naive drift Jun 2, 2025, 7:51 AM

#

no, it's not working anywhere I tried, over replicate and in comfyUI as well, and both gave me the same error.

#

I used the repo and tweaked it a bit for my use case. I think the issue lies in the base image "navinhariharan/fluxd-model", since the layer image doesn't hold anything related to the training process itself.

https://github.com/newideas99/flux-training-docker

#

i also tried to build an image from scratch, but that didn't work. 😥

last mountain Jun 2, 2025, 8:48 AM

#

maybe its the lora model

naive drift Jun 3, 2025, 6:17 AM

#

last mountain maybe its the lora model

the flux files were corrupted, so I had to start from scratch, and it worked. thanks @last mountain @wintry zodiac for your help

last mountain Jun 3, 2025, 6:20 AM

#

You're welcome, glad you found it!

#Training Flux Schnell on serverless
