#GPT-4 Code Interpreter: overall feedback/bugs + suggestions

dusky nimbus

I've been using Code Interpreter mostly with Python and HTML+CSS+JS. While my experience has been mostly positive, I've noticed that it tends to "lose track of the code" more easily than the regular (non-Code Interpreter) version of GPT-4.

There could be various reasons for this. More often than not, CI either doesn't go through the attached file properly, ignores updates to the source code, or skims through it and claims to have read it(!), and then hallucinates. Would this need some kind of verification implemented on the backend?

GPT-4 CI has a nasty habit of simply ignoring the attached file and treating it as if it has "seen it already" (and hence hallucinating its contents), or of "skimming through it" (and missing a plethora of key points along the way). For this reason, I've often found it far more useful to copy-paste the relevant code directly into the text input field for higher accuracy.

Another issue that still persists in both GPT-3.5 and GPT-4 is hallucination and random dropouts in functions and variables: names get changed on the fly, parts get omitted at random, and so on. Maybe it would be worth looking into how the model's backend reads in the functions and variables of a piece of code?

I have no idea what's going on in OpenAI's backend, but I'd assume a good idea would be to run diff-style A/B checks on the code, have a validator in the pipeline, etc. If it's about the model's context memory, would it be plausible to use some sort of "shorthand" for code blocks and their functionality in the model's backend? Could the model, e.g., be trained to load all the variables and functions in a source file into arrays that could be addressed via function calls, so that nothing gets omitted?
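To illustrate the "arrays addressable via function calls" idea from the user side, here is a minimal sketch using Python's standard `ast` module. It assumes the code is Python and only indexes module-level definitions; `build_symbol_index` and `get_symbol` are hypothetical names, and this says nothing about how OpenAI's backend actually works. The point is that each function or variable is stored with its exact source text, so it can be fetched by name instead of being recalled from memory.

```python
import ast

# Hypothetical helper for illustration only; not an OpenAI API.
# Indexes module-level functions, classes and plain-name variables of Python code.
def build_symbol_index(source: str) -> dict[str, dict[str, str]]:
    tree = ast.parse(source)
    index: dict[str, dict[str, str]] = {"functions": {}, "classes": {}, "variables": {}}
    for node in tree.body:  # module-level statements only; nested defs are not indexed
        segment = ast.get_source_segment(source, node) or ""
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index["functions"][node.name] = segment
        elif isinstance(node, ast.ClassDef):
            index["classes"][node.name] = segment
        elif isinstance(node, (ast.Assign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                # Only simple names; tuple unpacking and attribute targets are skipped.
                if isinstance(target, ast.Name):
                    index["variables"][target.id] = segment
    return index


def get_symbol(index: dict[str, dict[str, str]], kind: str, name: str) -> str:
    """Return the exact source of one indexed symbol; raises KeyError if missing."""
    return index[kind][name]


code = '''
MAX_RETRIES = 3

def fetch(url, timeout=5):
    return url, timeout
'''
index = build_symbol_index(code)
print(sorted(index["functions"]))                     # ['fetch']
print(get_symbol(index, "variables", "MAX_RETRIES"))  # MAX_RETRIES = 3
```

A KeyError on lookup is deliberate in this sketch: a missing symbol fails loudly instead of being silently "filled in".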

An overview of the code's function and variable names should be double-checked against the original; it's odd how often GPT-4 CI simply drops something from a function or replaces variable names.
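Until something like that exists on the backend, a rough version of this double-check can be run locally on the model's output. The sketch below is a hypothetical validator (the names `inventory` and `compare` are made up for illustration) that assumes both versions are valid Python. It compares function names, their parameter lists and assigned variable names between the original and the revised code, and reports anything dropped, added or changed.

```python
import ast

# Hypothetical validator for illustration only; it compares two versions of
# Python source and knows nothing about how the model produced them.
def inventory(source: str) -> tuple[dict[str, tuple[str, ...]], set[str]]:
    """Collect function names with their parameter lists, plus assigned variable names."""
    functions: dict[str, tuple[str, ...]] = {}
    variables: set[str] = set()
    for node in ast.walk(ast.parse(source)):  # walks nested scopes too
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            a = node.args
            params = [p.arg for p in a.posonlyargs + a.args + a.kwonlyargs]
            if a.vararg:
                params.append("*" + a.vararg.arg)
            if a.kwarg:
                params.append("**" + a.kwarg.arg)
            # Simplification: same-named methods in different classes overwrite each other.
            functions[node.name] = tuple(params)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            variables.add(node.id)  # includes locals and loop targets
    return functions, variables


def compare(original: str, revised: str) -> list[str]:
    """Return warnings about names dropped, added or re-parameterized in the revision."""
    old_funcs, old_vars = inventory(original)
    new_funcs, new_vars = inventory(revised)
    warnings = []
    for name in sorted(old_funcs.keys() - new_funcs.keys()):
        warnings.append(f"function dropped: {name}")
    for name in sorted(new_funcs.keys() - old_funcs.keys()):
        warnings.append(f"function added: {name}")
    for name in sorted(old_funcs.keys() & new_funcs.keys()):
        if old_funcs[name] != new_funcs[name]:
            warnings.append(f"parameters changed in {name}: {old_funcs[name]} -> {new_funcs[name]}")
    for name in sorted(old_vars - new_vars):
        warnings.append(f"variable dropped: {name}")
    for name in sorted(new_vars - old_vars):
        warnings.append(f"variable added: {name}")
    return warnings


before = '''
def total(prices, tax_rate):
    subtotal = sum(prices)
    return subtotal * (1 + tax_rate)
'''
after = '''
def total(prices):
    sub_total = sum(prices)
    return sub_total
'''
for warning in compare(before, after):
    print(warning)
# parameters changed in total: ('prices', 'tax_rate') -> ('prices',)
# variable dropped: subtotal
# variable added: sub_total
```

This only catches name-level changes; a function whose body was silently shortened but kept its name and parameters would still pass, so it complements rather than replaces a line-by-line diff.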

Thanks for your time and good luck.