AI Toolkits: The New Infrastructure Stack

From static retrieval to autonomous action

Written by

Meera Oak

AI agents can think, but they rely on tools to act. These tools are evolving across three layers. Static tools retrieve information. Dynamic tools execute actions like coding or system integration. Emerging tools enable multimodal input, agent collaboration, and real-world interaction.

While static and some dynamic capabilities may become commoditized or absorbed by large platforms, the most promising opportunities lie in secure, trusted infrastructure and cross-agent communication systems. These can scale with increasingly complex, real-world agent workflows.

About The Authors:

Meera Oak
Partner

Prior to Alumni Ventures, Meera led finance and product initiatives at Yale University. She managed a $1B P&L, led M&A transactions, and secured business development relationships with corporate partners. She later led product for a cloud-based ERP implementation, giving her the fluency to connect with developers navigating today’s platform shift. Most recently, she worked with early-stage venture funds and incubators like Create Venture Studio and Polymath Capital Partners, launching and sourcing ventures in enterprise SaaS and infrastructure. Meera has a BA in Economics from Swarthmore and an MBA from the Tuck School of Business at Dartmouth.

Lucy Friedmann
Senior Associate

Before joining Alumni Ventures, Lucy built her career launching new products at Amazon, first for AWS and then for Amazon Devices. She graduated with honors from Yale University in 2019 and earned a dual MBA/MA from the University of Pennsylvania’s Wharton School and the Lauder Institute in 2025, where she focused on finance and European venture capital. A lifelong fencer, she competed all four years on Yale’s varsity team. Now happily retired from the sport, she spends her free time hitting tennis balls with friends and supporting the arts, including her volunteer work with American Ballet Theatre’s Junior Council.

Kshiteej Prasad
Alumni Ventures Scout

Kshiteej Prasad is an MBA candidate at Harvard Business School with a background in global operations at Procter & Gamble and early-stage investing at Bessemer Venture Partners. He’s passionate about backing bold ideas across industrial, climate, fintech, and cybersecurity verticals, and spends his free time chasing squash rallies and better thesis ideas.

In our last blog, we broke down the architecture of an AI agent: the reasoning engine, the memory layer, the orchestration control plane, and the guardrails that keep everything on track. We described how, at its core, an agent is just a “brain in a jar”: able to think about actions but unable to execute them.

Tools are how agents reach into the world: search the web, execute code, authenticate into third-party services, interact with browsers, and soon, operate machinery, place orders, and delegate to other agents. The model thinks. The tools act.

Just as the API economy created enormous enterprise value (Stripe, Twilio, and Plaid together are worth hundreds of billions), the tooling layer of the agentic stack will create value on a similar scale. But not all tools are created equal. Three categories are emerging, and each carries distinct risks and competitive dynamics.

Think of static tools as the library card of the agent era. They give agents access to information (the web, documents, and databases) without changing anything. These tools emerged first, are the most mature, and are already experiencing early-stage commoditization.

Take web search, for example. An LLM trained six months ago doesn’t know about last week’s earnings calls or yesterday’s regulatory rulings. Search APIs like Tavily, Parallel, and Exa bridge these gaps by providing structured, agent-optimized retrieval. Exa’s approach is particularly interesting because it indexes the web semantically rather than by keyword.

Then there is the problem of unstructured data (PDFs, Slack archives, internal wikis, scanned contracts, etc.). Morphik and Unstructured (an AV portfolio company) specialize in converting this messy reality into clean, retrievable context that agents can actually use. Firecrawl and Apify, by contrast, focus on structured web extraction, turning dynamic pages into clean data that agents can process. As agents are increasingly tasked with market research, competitive intelligence, and price monitoring, this category stands to grow dramatically.

Our Hypothesis:

This layer is under pressure. OpenAI, Anthropic, and Google are all building native retrieval into their models, which compresses the standalone value of commodity search APIs. Even domain-specific retrieval (legal documents, scientific literature, financial filings, medical records, etc.) could be at risk as these large incumbents start to verticalize. Vertical knowledge access alone may no longer be a defensible play.

Dynamic tools are how agents go from observation to action. They’re also how agents become risky. A tool that can search the web is safe. A tool that can execute code, authenticate into production systems, and interact with browsers on a user’s behalf is a different beast entirely.
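One way to picture the line between static and dynamic tools is a registry that marks each tool as read-only or side-effecting and refuses to run the latter without explicit approval. The sketch below is purely illustrative: `Tool`, `ToolRegistry`, and the two toy tools are hypothetical names we made up, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    mutates: bool  # True for dynamic tools that change state outside the agent


class ToolRegistry:
    """Hypothetical registry that gates side-effecting tools behind approval."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, arg: str, approved: bool = False) -> str:
        tool = self._tools[name]
        # Static (read-only) tools run freely; dynamic tools need a sign-off.
        if tool.mutates and not approved:
            raise PermissionError(f"{name} changes state and needs approval")
        return tool.run(arg)


# Toy stand-ins: a read-only lookup and an action that "sends" a message.
outbox: List[str] = []
registry = ToolRegistry()
registry.register(Tool("search_docs", lambda q: f"results for {q!r}", mutates=False))
registry.register(Tool("send_message", lambda m: outbox.append(m) or "sent", mutates=True))
```

In this toy setup, `registry.call("search_docs", ...)` always runs, while `registry.call("send_message", ...)` raises unless the caller passes `approved=True`. Real systems would attach the approval to a user, a policy, or an audit trail rather than a boolean.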

When it comes to investing in the AI agent stack, most VC dollars are flowing here, and for good reason. The problems are hard, the moats are real, and enterprise demand is obvious.

Code Execution & Runtime Environments

When an agent writes code, it needs somewhere safe to run it: isolated from production systems, sandboxed against security vulnerabilities, but fast enough that a multi-step coding agent doesn’t have to wait seconds per step. E2B is building secure, ephemeral sandboxes purpose-built for agents. Daytona treats its sandbox like a persistent workspace rather than something temporary. It supports saved environments and Docker-native images, includes built-in coding tools, and can spin up new sandboxes in under 90ms. This makes it particularly well-suited for long-running agentic workflows that need to pause, resume, and run tasks in parallel. Modal approaches this need from the compute angle, offering serverless GPU infrastructure to make agentic code execution economically viable at scale.
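To make the isolation idea concrete, here is a minimal sketch that runs agent-written code in a throwaway directory with a time limit, using only the Python standard library. It is an assumption-laden toy, not how E2B, Daytona, or Modal work: `run_untrusted` is a hypothetical helper, and a child process started this way can still reach the network and the wider filesystem, so it is not a real security boundary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run code in a temporary directory with a time limit.

    Limits only wall-clock time and the working directory. This is NOT a
    security sandbox; production systems isolate at the VM or container level.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": "timed out"}
    # The temporary directory and anything the snippet wrote there are gone now.
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

The ephemeral directory mirrors the "ephemeral sandbox" idea in miniature: each call starts clean and leaves nothing behind. A persistent-workspace design would instead keep the directory (or a snapshot of it) between calls.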

Browser & Computer Use

Agents that can operate a browser are, effectively, agents that can operate all legacy software (ERPs, CRMs, insurance portals, government systems, etc.). Browserbase provides scalable, cloud-hosted browser infrastructure. Stagehand, backed by Browserbase, provides a higher-level abstraction: natural language instructions that translate into reliable browser actions. The total addressable market (TAM) here is made up of software that has never released a public API, which describes most enterprise software that exists today.

Connectors, Integrations & MCP

Connectors and API integration platforms are the plumbing systems that nobody wants to build, but everybody needs. In order for an agent to create a Jira ticket, send a Slack message, or update a Salesforce record, it needs authenticated, permissioned, rate-limited access to this software. Composio has built a library of 150+ pre-built integrations purpose-designed for agentic use. Arcade focuses on the authentication layer specifically.
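The three requirements named above (authenticated, permissioned, rate-limited) can be sketched as a thin wrapper around any outbound action. Everything here is hypothetical: `Connector`, the scope strings, and the token-bucket parameters are illustrative choices of ours, not Composio's or Arcade's design.

```python
import time
from typing import Callable, Set


class Connector:
    """Hypothetical wrapper enforcing scopes and a token-bucket rate limit."""

    def __init__(
        self,
        granted_scopes: Set[str],
        max_calls: int,
        refill_per_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.granted_scopes = granted_scopes
        self.max_calls = max_calls        # bucket capacity
        self.refill_per_s = refill_per_s  # tokens restored per second
        self._clock = clock
        self._tokens = float(max_calls)
        self._last = clock()

    def _take_token(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(
            float(self.max_calls),
            self._tokens + (now - self._last) * self.refill_per_s,
        )
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def call(self, required_scope: str, action: Callable[[], str]) -> str:
        if required_scope not in self.granted_scopes:
            raise PermissionError(f"missing scope: {required_scope}")
        if not self._take_token():
            raise RuntimeError("rate limit exceeded, retry later")
        return action()
```

An agent granted only `{"tickets:write"}` could create a ticket through `call("tickets:write", ...)` but would be refused on `call("crm:write", ...)`, and a burst beyond `max_calls` would be rejected until the bucket refills. Credential storage and OAuth token refresh, the hard parts in practice, are deliberately left out.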

Holding this dynamic ecosystem together is the Model Context Protocol (MCP), Anthropic’s open standard for how agents communicate with tools. It is fast becoming the mechanism that lets any agent talk to any tool without custom integration. Companies building on MCP today have a distribution advantage that will compound as the standard proliferates.
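For a sense of what "any agent talking to any tool" looks like on the wire: MCP messages follow JSON-RPC 2.0, and a tool invocation is a request with the method `tools/call` carrying the tool's name and arguments. The sketch below only builds that message; it does not open a connection or use an MCP SDK, and the tool name `create_ticket` and its arguments are hypothetical.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)


def tools_call_request(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking a server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),  # lets the client match the response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments, for illustration only.
wire = tools_call_request(
    "create_ticket",
    {"project": "OPS", "summary": "Rotate API keys"},
)
```

Because the envelope is the same for every tool, a client written once can call a ticketing tool, a search tool, or a database tool without custom glue; only `name` and `arguments` change.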

Our Hypothesis:

Browser automation, code execution, and connectors cover a large surface area in the agent stack, but it is unclear whether they form the most durable layer. The functionality will exist, but much of it could compress into model providers and frameworks, where switching costs stay low and competition centers on latency, reliability, and cost. That dynamic tends to commoditize. The more durable opportunity may sit in auth and trust: persistent OAuth relationships, permissioned access across production systems, and compliance-grade sandboxing. This is not a benchmark problem. It is a neutrality and security problem.

Static tools let agents know. Dynamic tools let agents do. Emerging tools let agents exist across agentic networks and sensory modalities.

This tooling layer is the frontier and, by definition, early, messy, and high-risk. But it is also where we expect to see the largest VC returns in the coming decade.

Multimodal Tooling

Current agents are primarily text-in, text-out. But the real world communicates in images, video, audio, and spatial data. Gladia has built a real-time audio transcription and understanding API that turns voice into structured agent-readable data, which is critical for any agent that tackles customer service, sales, or field operations.

Agent-to-Agent (A2A)

Today, most agentic systems are single entities. The near future will involve networks of agents: research agents, coding agents, and more that will coordinate, delegate, and compensate each other. Natural has built an interoperability layer that lets agents find and invoke other agents across organizational boundaries and thereby turns siloed AI tools into a composable workforce that can be orchestrated end-to-end.
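A toy version of "find and invoke other agents" is a shared directory where each agent publishes a card describing its skills, and callers look up a skill and delegate to whoever offers it. This is our own illustrative sketch, not Natural's product or any A2A specification: `AgentCard`, `Directory`, and the skill names are hypothetical, and identity, trust, and payment are omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class AgentCard:
    """Hypothetical self-description an agent publishes so others can find it."""
    agent_id: str
    org: str
    skills: FrozenSet[str]
    handle: Callable[[str], str]  # stand-in for a network call to the agent


class Directory:
    def __init__(self) -> None:
        self._cards: Dict[str, AgentCard] = {}

    def publish(self, card: AgentCard) -> None:
        self._cards[card.agent_id] = card

    def find(self, skill: str) -> List[AgentCard]:
        # Sorted by id so results are deterministic in this toy example.
        matches = [c for c in self._cards.values() if skill in c.skills]
        return sorted(matches, key=lambda c: c.agent_id)

    def delegate(self, skill: str, task: str) -> str:
        matches = self.find(skill)
        if not matches:
            raise LookupError(f"no agent offers {skill!r}")
        return matches[0].handle(task)


directory = Directory()
directory.publish(AgentCard("research-1", "org-a", frozenset({"research"}),
                            lambda t: f"research notes on {t}"))
directory.publish(AgentCard("coder-1", "org-b", frozenset({"coding"}),
                            lambda t: f"patch for {t}"))
```

Here `directory.delegate("coding", "flaky test")` routes the task to the agent from a different organization. The open question the paragraph above points at is who runs such a directory and why every participant should trust it.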

Physical-World Integration

The most speculative subcategory is tools that direct robots, interact with IoT sensors, or instruct autonomous vehicles. This tooling layer is still nascent and largely custom per domain, but we’re seeing a pattern consistent with what occurred in digital tooling. A standardized middleware layer is coming first, and whoever wins will own a structural position in the automation stack. Archetype AI has built a universal perception model that ingests raw sensor streams from cameras, LiDAR, vibration monitors, and other sources, and outputs structured, agent-readable intelligence.

Our Hypothesis:

The A2A layer of agent identity and communication primitives is the subcategory most worth watching with near-term conviction. The problem will be solved by a startup, not a hyperscaler, for a structural reason: hyperscalers will build agent identity, but each will optimize for its own ecosystem lock-in. The real prize is a cross-cloud, cross-vendor trust layer, and a hyperscaler cannot credibly be neutral here. SWIFT wasn’t built by JPMorgan. DNS wasn’t built by AT&T.

Looking Ahead

As you can see, the tooling layer of the agentic era is still being built. Most of the companies discussed did not exist three years ago. The category did not even have a name two years ago. We are in the early innings of this buildout. The tools that win will be those closest to production-critical workflows, hardest to replicate, and best positioned to compound with agent adoption.

This communication is from Alumni Ventures, a for-profit venture capital company that is not affiliated with or endorsed by any school. It is not personalized advice, and AV only provides advice to its client funds. This communication is neither an offer to sell, nor a solicitation of an offer to purchase, any security. Such offers are made only pursuant to the formal offering documents for the fund(s) concerned, and describe significant risks and other material information that should be carefully considered before investing. For additional information, please see here. Achievement of investment objectives, including any amount of investment return, cannot be guaranteed. Co-investors are shown for illustrative purposes only, do not reflect all organizations with which AV co-invests, and do not necessarily indicate future co-investors. Example portfolio companies shown are not available to future investors, except potentially in the case of follow-on investments. Venture capital investing involves substantial risk, including risk of loss of all capital invested. Diversification cannot prevent investment loss; it is a strategy to mitigate investment risk. This communication includes forward-looking statements, generally consisting of any statement pertaining to any issue other than historical fact, including without limitation predictions, financial projections, the anticipated results of the execution of any plan or strategy, the expectation or belief of the speaker, or other events or circumstances to exist in the future. Forward-looking statements are not representations of actual fact, depend on certain assumptions that may not be realized, and are not guaranteed to occur. Any forward-looking statements included in this communication speak only as of the date of the communication. AV and its affiliates disclaim any obligation to update, amend, or alter such forward-looking statements, whether due to subsequent events, new information, or otherwise.

