Gemma 2 issues on raw pretraining | Unsloth AI | Page 1

tiny temple Oct 1, 2024, 8:22 AM

#

Using this notebook: https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing with Gemma 2 results in the following error (only tested 2B, both instruct and base):

Exception in thread Thread-12 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1704, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py", line 919, in _CausalLM_fast_forward
    outputs = fast_forward_inference(
  File "/usr/local/lib/python3.10/dist-packages/unsloth/models/gemma2.py", line 396, in Gemma2Model_fast_forward_inference
    seq_len = past_key_values[0][0].shape[-2]
TypeError: 'HybridCache' object is not subscriptable

This error occurs when running the last step (inference). The same notebook works for Llama 3.2 3B.
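
The last frame of the traceback indexes `past_key_values` like a tuple of per-layer `(key, value)` tensors, while the object passed in here is a `HybridCache`. Below is a minimal sketch of a length lookup that tolerates both forms. It assumes the cache object exposes `get_seq_length()`, as the transformers `Cache` classes do. `cached_seq_len` is a hypothetical helper, not part of Unsloth or transformers, and what `HybridCache` returns from that method may differ between transformers versions.

```python
def cached_seq_len(past_key_values, layer_idx=0):
    """Return the number of cached positions for one layer.

    Hypothetical helper for illustration only. It accepts either a
    cache object with a get_seq_length() method or the legacy tuple
    layout: a tuple of layers, each a (key, value) pair of tensors
    shaped (batch, heads, seq_len, head_dim).
    """
    if past_key_values is None:
        # Nothing cached yet (first forward pass).
        return 0

    get_len = getattr(past_key_values, "get_seq_length", None)
    if callable(get_len):
        length = get_len(layer_idx)
        if length is None:
            # Assumption: some versions may not report a length here.
            raise ValueError("cache object did not report a sequence length")
        return int(length)

    # Legacy layout: this is the indexing that fails on a cache object.
    return past_key_values[layer_idx][0].shape[-2]
```

This only illustrates the mismatch behind the `TypeError`. It is not a patch for `gemma2.py`.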

#

Relevant discussion: https://huggingface.co/google/gemma-2-2b/discussions/24

google/gemma-2-2b · TypeError: 'HybridCache' object is not subscrip...

tiny temple Oct 1, 2024, 9:17 AM

#

Another issue: I trained Llama 3.2 3B base with this same notebook, but when I try to run inference using vLLM, I get this error:
"As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one."

Since we trained a base model with no chat template, what chat template am I supposed to use?
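
For context, this message comes up when a chat-style request reaches a tokenizer that has no chat template. One possible way around it for a base model, sketched here under assumptions, is to send a raw prompt to the OpenAI-style text-completions route (`/v1/completions`) instead of the chat route, so that no template is applied. The server URL and model name below are hypothetical placeholders, not values from this thread, and the sketch only builds the request without sending it.

```python
import json

# Hypothetical values for illustration; neither appears in the thread.
BASE_URL = "http://localhost:8000"
MODEL_NAME = "my-llama-3.2-3b-base"


def build_completion_request(prompt, max_tokens=64):
    """Return (url, JSON body) for an OpenAI-style text-completion call.

    The body carries a plain "prompt" string rather than a "messages"
    list, so the server has no chat template to apply.
    """
    url = f"{BASE_URL}/v1/completions"
    body = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    return url, json.dumps(body)


url, body = build_completion_request("Once upon a time")
print(url)
print(body)
```

If a chat-style interface is needed anyway, the alternative is to supply a template explicitly, which is what the error message asks for.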

vernal otter Oct 1, 2024, 9:28 AM

#

tiny temple Another issue is that i trained llama 3.2 3B base on this same notebook but when...

Not sure about the HybridCache problem, but this one may relate to this solution: https://discordapp.com/channels/1179035537009545276/1179777624986357780/1290255706053939243

tiny temple Oct 1, 2024, 9:29 AM

#

It's not the same issue.

split oriole Oct 1, 2024, 2:36 PM

#

tiny temple Another issue is that i trained llama 3.2 3B base on this same notebook but when...

Found this relevant issue/solution for you: https://github.com/vllm-project/vllm/issues/7978#issuecomment-2316723142

GitHub

[Usage]: run gguf model need template，how to write？ · Issue #7978 ...

Your current environment BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you ...

tiny temple Oct 1, 2024, 2:37 PM

#

split oriole Found this relevant issue/solution for you: https://github.com/vllm-project/vllm...

Base models don't have a chat template.

split oriole Oct 1, 2024, 3:10 PM

#

Did you actually read the link telling you to update transformers?

#

I am aware base models do not have a chat template.

#

Good luck.

tiny temple Oct 2, 2024, 9:30 AM

#

I did, unless I missed something.
