#Does Unsloth w/ ART work with GPT-OSS:20b?

waxen rose

Hi, I've been playing with Unsloth GRPO as well as ART.
I've managed to follow https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning/tutorial-how-to-train-gpt-oss-with-rl and get the 2048 example to work with gpt-oss:20b, and I also got the tic-tac-toe example from ART (https://github.com/OpenPipe/ART) to work with Qwen 4B:
https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb

The issue, however, is that using ART with gpt-oss:20b fails with:

AutoConfig error: The checkpoint you are trying to load has model type `gpt_oss` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

PeftConfig error: Can't find 'adapter_config.json' at 'unsloth/gpt-oss-20b-unsloth-bnb-4bit'

I think the reason is that ART uses outdated transformers and torch (2.7) instead of the latest torch 2.9.
When I naively try to update, however, I run into version conflicts. Has anyone hit this before?
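
Before changing any pins, it can help to record exactly what the environment currently contains. The following is a minimal sketch using only the standard library plus an optional Transformers import. The helper names `installed_versions` and `knows_model_type` are made up for illustration, and the package list is an assumption about what this setup installs.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def knows_model_type(model_type):
    """Return True/False if Transformers is importable, None if it is not installed."""
    try:
        # CONFIG_MAPPING_NAMES maps model_type strings to config class names.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
    except ImportError:
        return None
    return model_type in CONFIG_MAPPING_NAMES


if __name__ == "__main__":
    # Assumed package list; adjust to whatever the environment actually uses.
    wanted = ["torch", "transformers", "unsloth", "vllm", "peft"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
    print("gpt_oss recognized:", knows_model_type("gpt_oss"))
```

If the last line prints `False`, the installed Transformers predates `gpt_oss` support, which would match the AutoConfig error above.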

surreal charm (bot)

ART (Agent Reinforcement Trainer) is designed to work with Unsloth-supported models, but GPT-OSS:20b requires a recent version of transformers (≥4.56.2) for proper config recognition, while ART's default environment uses older versions (e.g., torch 2.7, transformers <4.56). This version mismatch causes the "model type gpt_oss not recognized" error you encountered. Updating transformers in ART's environment leads to dependency conflicts, as ART and vLLM may pin older versions for compatibility reasons. This is a known issue when mixing Unsloth's latest models with ART's default setup, and there is no official compatibility matrix or lockfile to resolve these conflicts automatically yet.

To use GPT-OSS:20b with ART, you must ensure that both ART and all dependencies (torch, transformers, unsloth, vllm, etc.) are updated to versions compatible with GPT-OSS:20b, but this may break ART or vLLM features due to their own version constraints. There is no out-of-the-box solution documented; users facing this must manually resolve dependency conflicts or wait for ART to update its support for newer Unsloth/transformers versions. See ART repo, Unsloth RL tutorial, and Unsloth dependency discussions for more context.

Would you like a step-by-step guide on how to attempt manual dependency resolution for this setup?

waxen rose

Specifically, if I upgrade transformers, then I run into

    File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
      return model_class.from_pretrained(
  ^^^^^^^^^^^^^^^
    File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
      return func(*args, **kwargs)
  ^^^^^^^^^^^^^^^
    File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4971, in from_pretrained
      model = cls(config, *model_args, **model_kwargs)
  ^^^^^^^^^^^^^^^
  TypeError: GptOssForCausalLM.__init__() got an unexpected keyword argument 'enable_prefix_caching' 
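
`enable_prefix_caching` is a vLLM engine option, so this traceback suggests that an inference-engine setting was forwarded all the way to the Transformers model constructor, which does not accept it. Below is a generic sketch of that failure mode and one way to spot stray keyword arguments before they reach a constructor. `StandInModel` and `split_kwargs` are hypothetical names for illustration only; they are not part of ART, Unsloth, or Transformers, and this is not how any of those libraries handle the argument.

```python
import inspect


def split_kwargs(target, kwargs):
    """Split kwargs into (accepted, rejected) based on target's named parameters."""
    params = inspect.signature(target).parameters
    # A **kwargs parameter means the target accepts any keyword argument.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


class StandInModel:
    """Hypothetical stand-in for a model class whose __init__ takes only a config."""

    def __init__(self, config):
        self.config = config


# Mixed bag of options, as if engine settings and model settings were merged.
engine_style_kwargs = {
    "config": {"model_type": "gpt_oss"},
    "enable_prefix_caching": True,
}

accepted, rejected = split_kwargs(StandInModel, engine_style_kwargs)
model = StandInModel(**accepted)  # passing engine_style_kwargs directly raises TypeError
print("dropped:", sorted(rejected))
```

Dropping the argument only hides the symptom in this toy case. In the real stack it would still be worth finding out which layer is supposed to consume the option.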

and then if I upgrade unsloth and torch, I run into

  File "/home/hackathon/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 24, in <module>
    from vllm.config import (BlockSize, CacheConfig, CacheDType, CompilationConfig,
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/config.py", line 35, in <module>
    from vllm.model_executor.layers.quantization import QuantizationMethods
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/model_executor/__init__.py", line 4, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 10, in <module>
    from vllm.distributed import get_tensor_model_parallel_rank
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/distributed/__init__.py", line 4, in <module>
    from .communication_op import *
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/distributed/communication_op.py", line 9, in <module>
    from .parallel_state import get_tp_group
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 150, in <module>
    from vllm.platforms import current_platform
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/platforms/__init__.py", line 267, in __getattr__
    _current_platform = resolve_obj_by_qualname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 2539, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hackathon/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/platforms/cuda.py", line 18, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: /home/hackathon/grpo-art-basic/.venv/lib/python3.12/site-packages/vllm/_C.abi3.so: undefined symbol: _ZNK3c106SymInt6sym_neERKS0_
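
The missing symbol demangles to `c10::SymInt::sym_ne(c10::SymInt const&) const`, which belongs to PyTorch's c10 library. An undefined symbol on `import vllm._C` typically means the prebuilt vLLM extension was compiled against a different torch release than the one now installed. The sketch below compares the installed torch with the torch requirement each package declares. It uses only the standard library. The helper names are made up for illustration, and `openpipe-art` is assumed to be ART's distribution name.

```python
import re
from importlib import metadata

_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def _normalize(name):
    """Normalize a project name so 'Typing_Extensions' equals 'typing-extensions'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the normalized project name from a requirement string."""
    match = _NAME.match(requirement.strip())
    return _normalize(match.group(0)) if match else ""


def declared_requirements(dist_name, dependency):
    """Return dist_name's requirement strings for dependency ([] if none or unknown)."""
    try:
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    wanted = _normalize(dependency)
    return [req for req in requirements if requirement_name(req) == wanted]


if __name__ == "__main__":
    try:
        print("installed torch:", metadata.version("torch"))
    except metadata.PackageNotFoundError:
        print("installed torch: not installed")
    # Assumed distribution names; 'openpipe-art' in particular is an assumption.
    for dist in ("vllm", "unsloth", "openpipe-art"):
        print(dist, "declares:", declared_requirements(dist, "torch") or "nothing found")
```

If vLLM declares an exact torch pin that differs from the installed version, the binary wheel and torch are out of step, and either torch has to match the pin or vLLM has to be rebuilt against the installed torch.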

waxen rose

I am installing vllm from source now

waxen rose

FYI, I spent about 10 hours on this and was unable to resolve it, so I decided to write my own trajectory-based training script based on Unsloth. The result is vastly better VRAM usage (although it is somewhat slower).

script