I have noticed that credit consumption is inconsistent. While I understand that the expanding context window as a project progresses might justify a higher cost per task, the current pricing model is unpredictable and the cost seems disproportionate.
In contrast, tools like Copilot Pro maintain a fixed cost of one credit per interaction: even if generation runs for several minutes, the price does not increase. This predictability, combined with significantly better performance, highlights a gap in the current value proposition. Although I understand the product is in a "beta" phase, I suggest reviewing the cost model so that it better aligns with the value and performance delivered.
P.S.: To provide context for my feedback, I have benchmarked the performance of initial tasks on your tool against Copilot Pro and Sonnet 3.5, and the results suggest that your value proposition should be re-evaluated. My sole intention is to offer a constructive perspective aimed at improving the product. I recognize the significant development effort behind such a complex tool, and my interest in thoroughly testing it stems from its clear potential.
Furthermore, I have identified opportunities for improvement in error handling. In the first version I tested a few months ago, I experienced errors when moving or copying files. This time, an error occurred when the tool tried to upload assets that exceeded GitHub's maximum file size limit. After three failed attempts, I had to intervene manually by instructing it to add the folder containing the large files (assets/models) to .gitignore and start over. In comparison, tools like Copilot Pro also encounter errors, but they demonstrate a greater ability to diagnose and resolve them autonomously.
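To illustrate the kind of check that could catch the large-file problem before the first failed push, here is a minimal sketch in Python. It is only an assumption about how such a check might look, not a description of how any of the tools mentioned actually work. The 100 MiB threshold reflects GitHub's per-file limit; the function names `find_oversized_files` and `suggest_gitignore_entries` are hypothetical.

```python
import os

# Assumed threshold: GitHub rejects pushes containing files larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_files(root, limit=GITHUB_FILE_LIMIT_BYTES):
    """Return sorted paths (relative to root) of files larger than limit bytes."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) > limit:
                oversized.append(os.path.relpath(path, root))
    return sorted(oversized)


def suggest_gitignore_entries(paths):
    """Map each oversized file to its parent folder, written as a .gitignore entry.

    Files located directly in the repository root are listed by name instead.
    """
    entries = set()
    for p in paths:
        folder = os.path.dirname(p).replace(os.sep, "/")
        entries.add(folder + "/" if folder else p)
    return sorted(entries)


if __name__ == "__main__":
    big_files = find_oversized_files(".")
    for entry in suggest_gitignore_entries(big_files):
        print(entry)
```

Run from the repository root, this prints candidate .gitignore entries (in my case it would have pointed at the assets/models folder), which is the diagnosis I ended up having to supply by hand.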