Closing the Loop

I have a hunch that a lot of important engineering in the next 5 years will be some form of closing the loop: giving an LLM with agentic capabilities a way to verify its own work, then setting it loose with all the tools it needs to accomplish its task.

In the world we are heading towards, only two things matter:

  1. A killer harness, so that the LLM can fully use the internet, code, and whatever else it needs
  2. A good, deterministic, gold evaluation set to test against
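
To make the two pieces concrete, here is a minimal sketch of such a loop in Python. Everything in it is hypothetical and only for illustration: the names `GoldCase`, `evaluate`, `close_the_loop`, and `make_toy_proposer` are my own, and the toy proposer is a stand-in for a real LLM harness (it simply memorizes one failing case per round, which a real agent would obviously not be allowed to do). The part that carries over is the shape: propose, check against a deterministic gold set, feed the failures back, repeat.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A candidate solution maps a prompt to an answer.
Candidate = Callable[[str], str]


@dataclass(frozen=True)
class GoldCase:
    """One entry in the gold evaluation set (hypothetical structure)."""
    prompt: str
    expected: str


def evaluate(candidate: Candidate, gold: list[GoldCase]) -> list[GoldCase]:
    """Return the gold cases the candidate gets wrong.

    Deterministic: the same candidate and gold set always give the same result.
    """
    return [case for case in gold if candidate(case.prompt) != case.expected]


def close_the_loop(
    propose: Callable[[list[GoldCase]], Candidate],
    gold: list[GoldCase],
    max_rounds: int = 5,
) -> tuple[Optional[Candidate], int]:
    """Ask for candidates until one passes every gold case or rounds run out."""
    failures = list(gold)  # nothing is verified before the first attempt
    for round_number in range(1, max_rounds + 1):
        candidate = propose(failures)          # the harness side (an LLM in practice)
        failures = evaluate(candidate, gold)   # the verification side
        if not failures:
            return candidate, round_number
    return None, max_rounds


def make_toy_proposer() -> Callable[[list[GoldCase]], Candidate]:
    """Stand-in for an LLM agent: memorizes one failing case per round."""
    memory: dict[str, str] = {}

    def propose(failures: list[GoldCase]) -> Candidate:
        if failures:
            memory[failures[0].prompt] = failures[0].expected
        snapshot = dict(memory)  # freeze what this candidate knows
        return lambda prompt: snapshot.get(prompt, "")

    return propose


GOLD = [GoldCase("2+2", "4"), GoldCase("3*3", "9"), GoldCase("10-7", "3")]
solution, rounds = close_the_loop(make_toy_proposer(), GOLD)
```

In this toy setup the proposer fixes one gold case per round, so the loop needs three rounds for three cases. Swapping `make_toy_proposer` for a call into a real agent harness is where the actual engineering would go; the evaluation side should stay this boring.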