# Why are fp4 providers allowed to be used ahead of bf16 providers only due to a better price?


icy olive

Currently, the top provider for gpt-oss-120b is an fp4 provider that is $0.01 cheaper than a bf16 provider. The bf16 model will have significantly better response quality. By not factoring quantization into your best-bid algorithm, it seems like you are penalizing the bf16 providers, or rather incentivizing open source inference providers to serve the most heavily quantized, lowest-quality versions of open source models possible.

still yacht

Our price sorting isn't so aggressive that the BF16 provider will get no traffic when the difference is only $0.01. Regarding "significantly better response quality", are there any specific evals you ran to measure that difference? I'm sure the team would be happy to take that into consideration if we can reproduce those results.
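
To illustrate why a small price gap need not starve the pricier provider, here is a minimal sketch. It assumes traffic is split at random with weights proportional to the inverse square of price. That weighting rule, the `traffic_shares` helper, the provider names, and the prices are all assumptions for illustration, not the confirmed routing logic.

```python
def traffic_shares(prices: dict[str, float], exponent: float = 2.0) -> dict[str, float]:
    """Expected share of traffic per provider under inverse-price weighting.

    Assumption: weight = 1 / price**exponent. The real router may differ.
    """
    weights = {name: 1.0 / (price ** exponent) for name, price in prices.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical prices (USD per million tokens) that differ by $0.01.
prices = {"fp4_provider": 0.09, "bf16_provider": 0.10}
shares = traffic_shares(prices)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumed numbers the cheaper provider gets somewhat more than half of the requests and the bf16 provider still receives a large share. The exact split depends on the absolute prices, which the thread does not state.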

icy olive

I do not have an eval, but anecdotal reports from open source subreddits say that fp4 shows a noticeable drop in quality and intelligence compared with 8-bit or 16-bit.

brisk kindle

I've never heard of open source subreddits

icy olive

No need to be intentionally obtuse