Based on the knowledge sources provided, there isn't specific information about when to ask the model for recommended questions in the context of a Retriever-Augmented Generation (RAG) setup or using GPT-4o for refining queries, composing responses, and recommending the next question.
However, the sources do provide a general principle that could be applied to optimize the latency of your system: "Make fewer requests". Each time you make a request, you incur some round-trip latency. If you have sequential steps for the model to perform, consider putting them in a single prompt and getting them all in a single response. This could potentially reduce the additional round-trip latency and also reduce the complexity of processing multiple responses.
In your case, you could consider combining the tasks of refining queries, composing responses, and recommending the next question into a single request to the model, if possible. This could be done by collecting your steps in an enumerated list in the combined prompt, and then requesting the model to return the results in named fields in a JSON. This way you can easily parse out and reference each result.
However, the specifics of how to implement this would depend on the details of your setup and the capabilities of the model you are using. You may need to experiment with different approaches to find the one that works best for your needs.
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: