I checked out the scores for OpenAI's new 4.1 models and, after seeing the performance scores and pricing, quickly switched my pipeline over to gpt-4.1-mini for testing. Both 4.1 and 4.1-mini outscore 4o and 4o-mini on instruction following (yes, even 4.1-mini).
The important stuff: they have significantly improved instruction following and measured it across format following (XML, YAML and Markdown), ordered instructions, and negative instructions...
With 4.1-mini outscoring 4o on instruction following, moving to 4.1-mini brings my costs down from dollars per million tokens to 40 cents per million (input).
https://openai.com/index/gpt-4-1/
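For anyone wanting to estimate their own bill, here is a rough sketch of the input-cost arithmetic. The 40 cents per million input tokens is the figure mentioned above; the function name and the monthly token volume are made up for illustration, and output tokens (priced separately) are not counted.

```python
def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Return the USD cost of `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


# 40 cents per million input tokens, as quoted in the post for 4.1-mini.
MINI_INPUT_PRICE_USD = 0.40

# Hypothetical monthly input volume; substitute your own pipeline's number.
monthly_input_tokens = 5_000_000

cost = input_cost_usd(monthly_input_tokens, MINI_INPUT_PRICE_USD)
print(f"Estimated monthly input cost: ${cost:.2f}")
```

Swap in your current model's per-million price as the second argument to compare the two side by side.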
Subjectively, I've been using 4.1-mini for the last 6 hours and have noticed it is much faster than 4o, in addition to being more accurate. I'm now asking my agent to perform multiple tasks at once with no errors. Now all I need is a better mic on this Voice PE unit and it will be a dream!
(Anyone else willing to pay more for a premium HA Voice unit with Alexa-quality mics?)