- If I add a conversation item with base64ed audio, and output modality "text", I never get a complete response, it just hangs.
- Text-only input works fine.
- Adding "audio" to the output modalities also works fine, but has much higher cost (especially since I just want text output).
Is this combination supported? If not, should the API produce an error instead?