I like that budget idea: a "compute" budget for exploring how to do a new task. It would cover searching the local knowledge cache for existing results (fast and cheap), digesting existing data to infer new insights, scraping the external data landscape via search APIs, and perhaps even hiring third-party agents (human, AI, whatever).
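One way to picture the budget is as an escalation ladder: try the cheapest source first and only move to pricier ones while budget remains. The sketch below assumes each source has a fixed, known cost per attempt. `Source`, `BudgetedExplorer`, the cost numbers, and the lambda stand-ins for analysis, search, and hiring are all hypothetical placeholders, not any real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Source:
    """One place to look for an answer. Names and costs are illustrative assumptions."""
    name: str
    cost: float  # abstract "compute" units charged per attempt
    lookup: Callable[[str], Optional[str]]  # returns None when nothing useful is found


@dataclass
class BudgetedExplorer:
    budget: float
    sources: list[Source]
    spent: float = 0.0
    trace: list[str] = field(default_factory=list)

    def explore(self, task: str) -> Optional[str]:
        # Escalate from the cheapest source to the most expensive one.
        for source in sorted(self.sources, key=lambda s: s.cost):
            if self.spent + source.cost > self.budget:
                # Sources are sorted by cost, so every remaining one is unaffordable too.
                self.trace.append(f"stopped before {source.name}: budget exhausted")
                break
            self.spent += source.cost
            self.trace.append(f"tried {source.name}")
            result = source.lookup(task)
            if result is not None:
                return result
        return None


# Hypothetical tiers; the lambdas stand in for real analysis, search, and hiring calls.
cache = {"summarize Q3 sales": "cached summary"}
sources = [
    Source("local knowledge cache", 1, cache.get),
    Source("digest existing data", 5, lambda task: None),
    Source("external search API", 20, lambda task: f"search result for {task!r}"),
    Source("hire third-party agent", 100, lambda task: f"contractor result for {task!r}"),
]

explorer = BudgetedExplorer(budget=30, sources=sources)
print(explorer.explore("new market analysis"))
print(explorer.trace)
```

With a budget of 30, the cache and digest steps miss, the search step answers, and the expensive hiring tier is never reached. A real version would likely estimate costs up front and let an agent decide whether a partial answer is good enough to stop early.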
LLMs are the perfect no-code workers, but I've found that they struggle with executive tasks that require a lot of "hidden working memory". The LLM needs its working memory right there in front of it, kind of like showing all your work while solving a multi-step math problem: if someone messes with your notes partway through, you have to start all over again. What I think this means is that you need lots of agents, each with its own chat log that acts as its local working memory. That log should contain only the information the agent needs to succeed at its level of abstraction, and nothing else. At the top level, the agent is concerned with KPIs and maintaining a global strategy document. The strategy document then acts as a modulator for lower-level task planning, letting middle management prioritize according to its constraints. The leaves of the tree, the agents closest to where the real work gets done, would update a global project repository, and perhaps even hire real-world workers (Fiverr, etc.) as needed.
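The tree of agents with isolated logs can be sketched as below. The key property is that each agent's log only ever receives the directive addressed to it, never its parent's deliberation. `Agent`, `plan`, `run`, and the "prioritize the largest KPI gap" strategy rule are hypothetical; `plan` is a placeholder for an LLM call, and hiring outside workers is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A node in the agent tree. `log` is its private working memory (chat log)."""
    name: str
    children: list["Agent"] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def handle(self, directive: str, repository: dict[str, str]) -> None:
        # Only the directive addressed to this agent enters its log;
        # the parent's own deliberation never leaks in.
        self.log.append(f"directive: {directive}")
        if not self.children:
            # Leaf agent: do the work and record the outcome in the shared repository.
            repository[self.name] = f"done: {directive}"
            self.log.append("reported result to repository")
            return
        for child in self.children:
            child.handle(self.plan(directive, child), repository)

    def plan(self, directive: str, child: "Agent") -> str:
        # Placeholder for an LLM call that narrows a directive into a subtask for one child.
        subtask = f"{directive} -> {child.name}"
        self.log.append(f"assigned {child.name}: {subtask}")
        return subtask


def run(kpi_gaps: dict[str, float], root: Agent) -> dict[str, str]:
    # Top level: condense KPIs into a one-line "strategy document" that modulates planning below.
    # Here the strategy is simply "prioritize the KPI with the largest gap" (an assumption).
    strategy = "prioritize " + max(kpi_gaps, key=lambda k: kpi_gaps[k])
    repository: dict[str, str] = {}
    root.handle(strategy, repository)
    return repository


writer, researcher = Agent("writer"), Agent("researcher")
manager = Agent("manager", children=[writer, researcher])
ceo = Agent("ceo", children=[manager])
repo = run({"revenue": 0.2, "churn": 0.5}, ceo)
print(repo)
print(writer.log)
```

After the run, the writer's log holds just two entries: its own narrowed directive and its report. The top-level and middle-manager reasoning stays in their own logs, which is the "only what you need at your level of abstraction" idea in miniature.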
Right now I think the biggest roadblock is how expensive it can get to process lots of information. I really wish it were 100x or 1000x cheaper; it would open up so many opportunities. Not every agent needs to be an expensive generalist.