I set lora_rank to 128 and num_train_epochs to 5. Training finishes, but when the script then tries to save the model, it fails with a CUDA out-of-memory error.
What can I do about this error?
Training complete.
Saving the final trained model to: train_output
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 29.11 out of 62.58 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...
31%|█████████████▍ | 10/32 [00:00<00:00, 30.71it/s]
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/tracy/Desktop/finetuning/train_dataset.py", line 198, in <module>
[rank0]: main()
[rank0]: File "/home/tracy/Desktop/finetuning/train_dataset.py", line 191, in main
[rank0]: model.save_pretrained_merged("model_trained_merged", tokenizer, "merged_16bit")
[rank0]: File "/home/tracy/Desktop/finetuning/tracy/lib/python3.12/site-packages/unsloth/save.py", line 1313, in unsloth_save_pretrained_merged
[rank0]: unsloth_save_model(**arguments)
[rank0]: File "/home/tracy/Desktop/finetuning/tracy/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/tracy/Desktop/finetuning/tracy/lib/python3.12/site-packages/unsloth/save.py", line 569, in unsloth_save_model
[rank0]: W, bias = _merge_lora(proj, name)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/tracy/Desktop/finetuning/tracy/lib/python3.12/site-packages/unsloth/save.py", line 177, in _merge_lora
[rank0]: maximum_element = torch.max(W.min().abs(), W.max())
[rank0]: ^^^^^^^
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 23.53 GiB of which 201.50 MiB is free. Process 11890 has 6.04 GiB memory in use. Including non-PyTorch memory, this process has 17.24 GiB memory in use. Of the allocated memory 16.25 GiB is allocated by PyTorch, with 146.00 MiB allocated in private pools (e.g., CUDA Graphs), and 108.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[rank0]:[W306 14:40:29.024894852 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
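The numbers in the OOM message show where the GPU memory went at the moment the LoRA merge failed. Below is a minimal pure-Python sketch (no GPU or torch needed) that pulls those figures out of the message and converts them to GiB. The helper names `to_gib` and `gpu_budget` are hypothetical. The regexes assume the exact wording of the PyTorch message pasted above, so they may not match messages from other PyTorch versions.

```python
import re

# OOM message copied from the traceback above (values are taken from the log).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity "
    "of 23.53 GiB of which 201.50 MiB is free. Process 11890 has 6.04 GiB memory in use. "
    "Including non-PyTorch memory, this process has 17.24 GiB memory in use."
)

# Conversion factors to GiB for the two units this message uses.
_UNITS = {"MiB": 1 / 1024, "GiB": 1.0}

# Matches "<number> <MiB|GiB>" and captures the number and the unit.
_SIZE = r"([\d.]+) (MiB|GiB)"


def to_gib(value: str, unit: str) -> float:
    """Convert a captured (number, unit) pair to GiB."""
    return float(value) * _UNITS[unit]


def gpu_budget(message: str) -> dict:
    """Extract the memory figures from a PyTorch CUDA OOM message, in GiB."""
    requested = re.search(r"Tried to allocate " + _SIZE, message)
    capacity = re.search(
        r"total capacity of " + _SIZE + r" of which " + _SIZE + r" is free", message
    )
    other = re.search(r"Process \d+ has " + _SIZE + r" memory in use", message)
    this = re.search(r"this process has " + _SIZE + r" memory in use", message)
    return {
        "requested": to_gib(*requested.groups()),
        "total": to_gib(*capacity.groups()[:2]),
        "free": to_gib(*capacity.groups()[2:]),
        "other_process": to_gib(*other.groups()),
        "this_process": to_gib(*this.groups()),
    }


budget = gpu_budget(OOM_MESSAGE)
for name, gib in budget.items():
    print(f"{name:>14}: {gib:6.2f} GiB")
print(f"this + other process: {budget['this_process'] + budget['other_process']:.2f} GiB")
print(f"request fits in free memory: {budget['free'] >= budget['requested']}")
```

Read this way, the training process (17.24 GiB) and a second process, PID 11890 (6.04 GiB), together hold about 23.28 of the card's 23.53 GiB. That leaves 201.50 MiB free when the merge step asked for 224.00 MiB. Only 108.76 MiB is reported as reserved but unallocated, so fragmentation (the case `expandable_segments:True` targets) appears to be a minor factor next to the memory that is actually in use.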