The Future of Autonomous Agents

I recently did a talk in SF on the future of autonomous agents, then published the deck on X/Twitter here. We’ll be doing an hour-long livestream with Q&A on May 16th at 9am on X/Twitter, so mark your calendars (or add your email here for reminders and the recording link)!

[update: We did the livestream, watch it here.]

To summarize, agents are still new. There are 40+ agent-related GitHub projects with over 500 stars, 95% of which are less than a year old (BabyAGI launched a year ago). Eventually we’ll have fully autonomous robots and swarms of digital agents, but how do we get there?

When new technology is introduced, there is a period of rapid experimentation – followed by eventual consolidation, as a combined result of best practices emerging and opinionated approaches from market leaders taking hold.

To understand where we are today, it helps to bucket agents into three categories: (1) hand-crafted, (2) specialized, and (3) general. Hand-crafted agents are chained prompts and API calls that some wouldn’t call autonomous, since people are creating the task list. These are making money today. Specialized agents can dynamically decide what to work on within a narrow field, utilizing a subset of skills and tools (think Devin for coding). For these, we’re starting to see good demos, and they’re lining up pilots with forward-thinking customers and raising capital, but as far as I know they are not yet super reliable. General agents can do anything, and we’re far from this.
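To make the first two categories concrete, here is a minimal Python sketch. Everything in it is hypothetical: the skill names are invented for illustration, and `fake_llm` is a deterministic stand-in for a real model call. The hand-crafted agent runs a task list a person wrote ahead of time; the specialized agent lets the (stubbed) model pick the next skill from the same small set.

```python
from typing import Callable, Dict, List

# Hypothetical skills: plain functions that take and return a string.
def fetch_data(state: str) -> str:
    return state + " | data fetched"

def draft_reply(state: str) -> str:
    return state + " | reply drafted"

def send_reply(state: str) -> str:
    return state + " | reply sent"

SKILLS: Dict[str, Callable[[str], str]] = {
    "fetch_data": fetch_data,
    "draft_reply": draft_reply,
    "send_reply": send_reply,
}

# Text each skill leaves behind, used only by the stub below.
MARKERS: Dict[str, str] = {
    "fetch_data": "data fetched",
    "draft_reply": "reply drafted",
    "send_reply": "reply sent",
}

def run_hand_crafted(request: str) -> str:
    """Hand-crafted agent: a person fixed the task list in advance."""
    task_list = ["fetch_data", "draft_reply", "send_reply"]
    state = request
    for name in task_list:
        state = SKILLS[name](state)
    return state

def fake_llm(state: str, available: List[str]) -> str:
    """Stand-in for a model call (assumption): returns the first skill
    whose marker is not yet in the state, or "done" when none remain."""
    for name in available:
        if MARKERS[name] not in state:
            return name
    return "done"

def run_specialized(request: str, max_steps: int = 10) -> str:
    """Specialized agent: the model chooses each next step, within a
    narrow set of skills and a hard cap on the number of steps."""
    state = request
    for _ in range(max_steps):
        choice = fake_llm(state, list(SKILLS))
        if choice not in SKILLS:  # "done" or an unknown skill name
            break
        state = SKILLS[choice](state)
    return state
```

Both agents share the same skills, so the difference is only in who decides the order. With a real model in place of `fake_llm`, the specialized loop is where the unreliability would come from, which is why the step cap and the unknown-skill check are there.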

The practical reality is that the most valuable tasks to automate are high value and happen often, so you don’t want an LLM to dynamically generate a task list; you want a hand-crafted agent. As we move to lower-value but still common tasks, these tend to fall within the realm of a specialized agent: if the task is not core (and you might not even be doing it today), the unreliability of a specialized agent may be acceptable. Given these factors, the practical go-to-market is to start by building and selling a high-value hand-crafted agent, while building it modularly and testing its ability to dynamically tackle other tasks using the same skills and tools.

A common pattern we see amongst enterprises planning the automation of human work with AI has three steps: (1) assist, (2) AI-review, (3) human-review. Step 1 is having AI assist the human worker in specific steps of their workflow, whether that is collecting relevant data, drafting content, etc. You slowly add these in until you’re ready to have the AI review human work, which is a big jump because it requires the AI to understand the full scope of the project. But once you give the AI the ability to do each part of the workflow AND review, you can have the AI do the entire workflow and have people review its work. You can then use these reviews as data to improve the AI, until you no longer need someone to review because the output is consistent enough.
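The last step, deciding when human review can stop, can be sketched as a simple check over logged reviews. This is only an illustration under stated assumptions: the `Stage` names mirror the three steps above, while the window size and approval threshold are hypothetical numbers, not figures from any enterprise rollout.

```python
from enum import Enum
from typing import List

class Stage(Enum):
    ASSIST = "assist"              # AI helps with individual steps
    AI_REVIEW = "ai_review"        # AI reviews human work
    HUMAN_REVIEW = "human_review"  # AI does the work, people review it

def can_drop_human_review(
    reviews: List[bool],
    window: int = 50,          # assumption: judge only the most recent reviews
    threshold: float = 0.98,   # assumption: required share of approvals
) -> bool:
    """Return True when the most recent `window` human reviews of AI
    output were approved often enough to stop reviewing.

    `reviews` is a chronological log: True means the reviewer approved
    the AI's output, False means they had to correct it.
    """
    recent = reviews[-window:]
    if len(recent) < window:
        return False  # not enough evidence yet
    return sum(recent) / len(recent) >= threshold
```

For example, `can_drop_human_review([True] * 50)` is `True`, while a log with only 49 entries returns `False` no matter how clean it is. The rejected reviews are also the useful training data: they are the cases where the AI’s output needed correction.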

I recently ran a survey on X/Twitter asking whether people wanted a single agent that could do anything or multiple agents that are each really good at different tasks. Overwhelmingly, people wanted a single agent, but many included a caveat that they didn’t necessarily trust a single agent to hold all of their data or be good at everything. What people seemed to want was a single agent they communicate with, one that can communicate with and leverage agents from other people and tools.
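One way to picture that preference is a single front agent that holds no data itself and simply routes requests to specialist agents. The sketch below is hypothetical: `FrontAgent`, the topic names, and the lambda specialists are all invented for illustration, and a real system would need the front agent to work out the topic rather than be told it.

```python
from typing import Callable, Dict

# A specialist is anything that takes a request and returns a reply.
Specialist = Callable[[str], str]

class FrontAgent:
    """The one agent the user talks to. It keeps only a routing table;
    each specialist keeps its own data and expertise."""

    def __init__(self) -> None:
        self._specialists: Dict[str, Specialist] = {}

    def register(self, topic: str, agent: Specialist) -> None:
        self._specialists[topic] = agent

    def handle(self, topic: str, request: str) -> str:
        agent = self._specialists.get(topic)
        if agent is None:
            return f"No agent registered for '{topic}'"
        # Only the request text is passed along, nothing else.
        return agent(request)

front = FrontAgent()
front.register("calendar", lambda request: "calendar agent: " + request)
front.register("email", lambda request: "email agent: " + request)
```

Here `front.handle("calendar", "book 3pm")` is answered by the calendar specialist, and an unregistered topic gets an explicit "no agent" reply instead of a guess.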

We have a long way to go until autonomous agents are everywhere, and the hurdles are not just technical but also societal (resistance, switching cost), political (accountability, regulation), and ethical (job displacement, bias).

One of my recent explorations is the memory layer, specifically using graphs (I talked about this and relevant startups last month). This is just one of many AI-agent-related approaches and solutions, almost too many to track. e2b’s market map has been a favorite resource here. Again, we are in the rapid experimentation phase, so I expect only a few of the approaches to eventually win out, though it’s hard to tell which at this point.

Some example tools/approaches we found unique or interesting in the space include PayMan, a marketplace for AI agents to hire people; NPi, a unified tool-use API; FireCrawl, which turns any URL into LLM-ready data; and Browserbase, an infra layer for handling headless browsers. Arcee is a tool for managing continuous fine-tuning of models. e2b crossed 150k active sandboxes for their code interpreter SDK, and Anon announced their seed raise for their auth tool.

We’re currently exploring an autonomous-agent-specific fund (side vehicle) to cut smaller checks across a large number of companies in the space (let us know if you’re interested). Make sure to join us on May 16th at 9am on X/Twitter for our one-hour session on the Future of Autonomous Agents, where we’ll talk through this topic further.

