#This latest model is Dumb as Rocks
I assume you're a free user? If not, use "Thinking" instead. That uses 5.4 instead of 5.3, which is significantly better imo. It's only available to paying users, but it's really good.
5.4 has big issues because of the pathetic attempt to make it friendly and smart at the same time.
I already made a post and filed a ticket about very serious issues in a work environment, with no banter or personal-confession stuff involved. It is horribly badly configured.
5.4 WARNING!!!
It is having big issues in work environments!
Day after day we noted:
- ignoring constraints in the coding workflow
- ignoring inference
- wrong categorization
- not updating its reasoning to the latest state and hanging on to old data
- wrong salience
- real downstream risk
- overly confident statements without verification
If you use it, pay attention to abnormal output. As it is now, it has become a liability for any SMB using it for technical workflows!
5.4 uses the agent tool ...
5.4 Pro, if I understand correctly, works only like this.
5.4 Thinking advanced, not always.
So you need to write what it must do and, with a "WARNING!", what it can't do when you are working in a repo, for example. If you don't write "DO NOT DO THIS!", your instructions won't be clear enough. Try starting the conversation with 5.3, for example Instant, for planning. It has auto-routing, so for hard execution it will automatically switch to 5.4, with its plan.
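One way to keep the "must do" and "DO NOT" parts explicit every time is to build the opening message from two lists. This is only a sketch: `build_task_prompt` is a hypothetical helper, the section labels and the closing "stop and ask" line are my own assumptions about wording, and nothing here guarantees the model will actually respect the rules.

```python
def build_task_prompt(task: str, must_do: list[str], must_not: list[str]) -> str:
    """Compose an opening message with explicit requirements and prohibitions."""
    lines = [f"TASK: {task}", "", "YOU MUST:"]
    lines += [f"- {item}" for item in must_do]
    # Prohibitions get their own loudly labelled section.
    lines += ["", "WARNING! DO NOT:"]
    lines += [f"- {item}" for item in must_not]
    # Assumed extra rule: ask before deviating instead of silently swapping approach.
    lines += ["", "If a rule blocks the task, stop and ask instead of working around it."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_task_prompt(
        "Fix the failing test in the repo",
        must_do=["run the test suite after every change"],
        must_not=["delete or rename existing files", "edit files outside src/"],
    ))
```

Pasting the printed text as the first message keeps the prohibitions in one place, so it is easier to point back at them when the output deviates.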
All good, hypothetically. It's a pity, though, that inference and constraints set through custom instructions or project instructions are just flavour: skippable, or easily swapped out if circumstances point to a faster, cheaper solution. 5.4 won't even tell you about it; it will just work on it and ... it will be the user's fault if they cannot intercept the deviation in time.
I know it can drift. Ultimately, the execution is performed by the LLM, and it can skip certain steps or stages in various ways. I think the system plays a role in this situation. It depends on what we expect. If the execution were performed by the engine itself, the response would probably not be as consistent with the instructions, but it would be more "systematic." You can build your own platform and customize it, or use a publicly available system. In this case, the LLM ultimately has the power, despite prohibitions (e.g., if it wants to help too much; I'm not talking about a jailbreak, but about a typical control system).
Therefore, in my opinion, even if the model claims to have done something or performed every step, it is worth checking: ask it for a list of, for example, the files in which it changed, deleted, or edited anything, and to indicate exactly what it changed in each. Even if we do not understand coding very well, we can ask for evidence of what it did. If there is no evidence, it should correct itself and indicate where what it claims to have done is inconsistent with what it actually did.
In fact, it's sometimes extremely irritating 😅 verifying every single step.
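Part of the check described above (comparing the files the model says it touched with what actually changed) can be scripted. A minimal Python sketch, assuming the work happens in a git repo and the model's changes are not committed yet; `parse_porcelain`, `actual_changes`, and `compare_claims` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import subprocess


def parse_porcelain(output: str) -> set[str]:
    """Extract file paths from `git status --porcelain` output."""
    paths = set()
    for line in output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status characters and the space
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.add(path.strip('"'))
    return paths


def actual_changes(repo_dir: str = ".") -> set[str]:
    """Files git reports as modified, added, deleted, renamed, or untracked."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)


def compare_claims(claimed: set[str], actual: set[str]) -> dict[str, set[str]]:
    """Compare the model's own list of touched files with what git sees."""
    return {
        "unreported": actual - claimed,  # changed, but the model did not mention it
        "unverified": claimed - actual,  # mentioned, but git shows no change
    }


if __name__ == "__main__":
    # Paste the file list the model gave you here (example values).
    claimed = {"src/app.py", "README.md"}
    print(compare_claims(claimed, actual_changes(".")))
```

Anything under "unreported" or "unverified" is a good place to ask the model to explain itself. This only checks which files changed, not what changed inside them, so `git diff` is still needed for the details.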
Also, reading the chain of thought helps me a lot, at least personally. I know that I can't trust it 100%, but it's often where I detect various LLM or agent outbursts.
No, paid user. It ignores the most recent and end-of-document directions or updates. Just a minor bug with an easy fix that users shouldn't have to fix themselves.