#Grok Code Fast 1 auto-including reasoning tokens in input?

7 messages · Page 1 of 1 (latest)

keen tendon
#

I can see on my activity page that OR is clearly charging me for input tokens as if I was including the reasoning tokens from all requests in each call. But here's the thing...I'm not.

                model=TESTEE_MODEL,
                messages=messages,
                temperature=TESTEE_TEMPERATURE,
                max_tokens=32000,
                extra_body={
                    "reasoning": {
                        "enabled": True,
#                        "max_tokens": 8000,
                        "effort": "low"
                    }
#                    "provider": {
#                        "only": ["cerebras"]
#                    }
                }
            )
question = completion.choices[0].message.content.strip()

This last line is the ONLY time I reference the returned answer from the model directly, and it is clearly getting message.content, which indeed is logged by my program as the short text answer, no reasoning. IE, nothing else can possibly reference the reasoning tokens. Like, the function ends immediately after this, returning only "question". Here is an example message history that showed as 8000+ tokens in Activity page.

https://pastebin.com/QhKDuM2n

#

This exact code is also not having this issue with GLM-4.5, reasoning on.

#

@frank mulch

#

Moved this here from the model chat so I don't clog it up.

keen tendon
#

Wait, is this some weird caching thing? Even though past reasoning tokens are never fed into inference cat_huh

ember frost
keen tendon
#

Weird...Hard to do the math on if I'm being charged for this rn because the requests are small and input is cheap.