# HF asks for WORLD_SIZE and other variables


pseudo hearth

I'm trying to fine-tune an LLM with Hugging Face Transformers, but it keeps asking for `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT`. The process runs on a PC with a single GPU, so I assume `WORLD_SIZE=1` and `RANK=0`, but what should the other variables be?

```python
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="test_trainer")
```

The error:

```
Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set
```
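The message comes from PyTorch's `env://` rendezvous, which reads its settings from environment variables. A minimal sketch for checking which of the usual distributed variables the process already sees is below. The variable list is an assumption (the `env://` names plus `LOCAL_RANK`, which launchers such as torchrun also export), not an exhaustive one, and `inspect_dist_env` is a hypothetical helper name. If something like `LOCAL_RANK` turns out to be set by a scheduler or launcher, that may be what sends the run down the distributed code path; treat that as something to check rather than a confirmed cause.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed list: variables read by torch.distributed's env:// rendezvous,
# plus LOCAL_RANK, which launchers such as torchrun also export.
DIST_ENV_VARS = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "LOCAL_RANK")


def inspect_dist_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each variable to its value, or None if unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in DIST_ENV_VARS}


if __name__ == "__main__":
    # Print what the current process inherited, marking missing variables.
    for name, value in inspect_dist_env().items():
        print(f"{name:12} {value if value is not None else '<unset>'}")
```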

prime hearth
pseudo hearth

Thanks for your reply. I solved the problem by setting `os.environ['MASTER_ADDR'] = 'localhost'` and `os.environ['MASTER_PORT'] = '29500'`, and then calling `torch.distributed.init_process_group(backend='nccl', world_size=1, rank=0)`.
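The fix described in this message can be sketched as a small reusable helper. This is a minimal sketch, not an official Transformers or PyTorch recipe: the function names are hypothetical, and it uses `os.environ.setdefault` so values already exported by a launcher are not overwritten. `localhost` and `29500` are the values from the thread; any free port should work. Passing `world_size` and `rank` explicitly makes the `WORLD_SIZE`/`RANK` variables redundant for this call, but setting them is harmless.

```python
import os
from typing import Dict

# Values for a single-process, single-GPU run (taken from the thread).
SINGLE_PROCESS_DEFAULTS = {
    "MASTER_ADDR": "localhost",
    "MASTER_PORT": "29500",
    "WORLD_SIZE": "1",
    "RANK": "0",
}


def set_single_process_dist_env() -> Dict[str, str]:
    """Hypothetical helper: fill in the env:// rendezvous variables,
    keeping any value that is already present in the environment."""
    for name, value in SINGLE_PROCESS_DEFAULTS.items():
        os.environ.setdefault(name, value)
    return {name: os.environ[name] for name in SINGLE_PROCESS_DEFAULTS}


def init_single_gpu_process_group(backend: str = "nccl") -> None:
    """Hypothetical helper mirroring the thread's fix: set the variables,
    then create a one-process group unless one already exists."""
    # Imported lazily so set_single_process_dist_env works without torch.
    import torch.distributed as dist

    set_single_process_dist_env()
    if not dist.is_initialized():
        # init_method defaults to "env://", which reads MASTER_ADDR/MASTER_PORT.
        dist.init_process_group(backend=backend, world_size=1, rank=0)
```

The sketch assumes `init_single_gpu_process_group()` is called once near the top of the training script, before `TrainingArguments` is constructed. An alternative that avoids hand-setting the variables is to start the script through torchrun, e.g. `torchrun --nproc_per_node=1 train.py` (`train.py` is a placeholder name), since torchrun exports `MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE` and `LOCAL_RANK` for each worker.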


I find it odd that this is needed on a single-machine, single-GPU setup, especially since the HF documentation doesn't mention it and assumes everything will just work.

prime hearth
pseudo hearth

I did nothing more than run the code you see above.