that is approximately 9 fully populated data centers. Each at 27 trillion weights, or 15 times the scale of GPT-4's original model.
One possible way to utilize this much hardware would be to try 9 separate hypotheses for a major model improvement in parallel, per 1-3 month training cycle, or approximately 36-108 hypotheses per year. A modular architecture, reusing subnetworks, might increase the number of hypotheses tried to thousands.
RSI criticality is some hidden threshold where this begins to work. (where each cycle, the league of models becomes more capable at guessing effective hypotheses, and quickly scale to AGI level capability on the test bench)