AI Agents — Trends in AI: March '25

Agents are all the rage, and for good reason. While the concept of intelligent agents has existed for decades, recent breakthroughs in AI, with advanced LLMs capable of complex, dynamic behaviors that don't require explicit programming, have pushed agentic AI solutions back into the spotlight. But what exactly are agents?


Defining AI Agents

Within the context of AI, an agent typically refers to an autonomous system capable of performing tasks independently, without explicit human instructions at every step.


Agents are commonly characterized by three core abilities:

  • Perception: Monitoring the state of their environment (virtual or physical).

  • Planning: Devising effective strategies or workflows based on their observations.

  • Action: Executing proposed plans towards achieving specified goals.
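
Together, these three abilities form the classic perceive-plan-act loop. Here is a minimal, hypothetical Python sketch; the Environment class and the string-based "plan" stand in for real sensors, an LLM planner, and actuators:

```python
# A minimal, hypothetical perceive-plan-act loop. All names here are
# illustrative stand-ins, not a real framework API.

class Environment:
    def __init__(self, steps_needed=3):
        self.progress, self.steps_needed = 0, steps_needed

    def observe(self):
        return f"progress {self.progress}/{self.steps_needed}"

    def execute(self, step):
        self.progress += 1
        return self.progress >= self.steps_needed  # True when goal reached

class Agent:
    def __init__(self, goal):
        self.goal = goal

    def perceive(self, env):      # Perception
        return env.observe()

    def plan(self, observation):  # Planning (an LLM call in practice)
        return f"next step towards {self.goal!r} given {observation!r}"

    def act(self, step, env):     # Action
        return env.execute(step)

env, agent = Environment(), Agent("file the expense report")
for _ in range(10):
    if agent.act(agent.plan(agent.perceive(env)), env):
        break
```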


Agents evolved naturally from the early AI applications powered by large language models. Initially, these applications relied on predefined sequences of operations, such as chatbots querying a static database and prompting generative models with fixed templates. The introduction of "function calling" (also known as "tool use") allowed the models themselves to determine the most suitable workflow dynamically: they could now handle inputs of varying complexity, crafting tailored solutions for nuanced tasks on the fly rather than following a standard pipeline with hardcoded branching rules.
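
As a concrete illustration, here is a minimal function-calling sketch using the OpenAI Python SDK (the get_weather tool and its schema are illustrative, and an API key is assumed to be configured):

```python
import json
from openai import OpenAI

client = OpenAI()

# Advertise a tool to the model via a JSON schema; the model decides
# when (and with which arguments) to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Amsterdam?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model opted to call our tool
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args)  # e.g. get_weather {'city': 'Amsterdam'}
```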


Today's state-of-the-art developments extend this paradigm further, allowing multiple specialized agents to collaborate on challenging tasks. By distributing responsibilities among designated agent roles, these multi-agent systems collectively surpass any individual component's intelligence. Real-world applications of this approach include tasks like generating comprehensive research reports, acting as AI co-scientists assisting in research design, or even pair-programming within an existing codebase.
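
A hypothetical sketch of this role distribution, where each "agent" is just an LLM call specialized by its system prompt, and a shared message list serves as the communication channel:

```python
def call_llm(system_prompt, messages):
    # Placeholder for any chat-completion API call; returns a canned
    # string here so the sketch runs without credentials.
    return f"({system_prompt}) -> output based on {len(messages)} messages"

def multi_agent_report(topic):
    messages = [{"role": "user", "content": f"Write a report on {topic}."}]
    # Each role sees the shared history and appends its contribution.
    for role, system_prompt in [
        ("researcher", "Gather the key facts and sources for the task."),
        ("writer", "Draft a structured report from the researcher's notes."),
        ("critic", "Review the draft and fix errors or unsupported claims."),
    ]:
        output = call_llm(system_prompt, messages)
        messages.append({"role": "assistant", "name": role, "content": output})
    return messages[-1]["content"]

print(multi_agent_report("agentic AI frameworks"))
```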


Tool Use and the Model Context Protocol (MCP)

An essential element that powers AI agents is their sophisticated use of external tools. These tools allow language models to interact with external repositories, APIs, and environments, extending their knowledge and capabilities beyond the original training data.


The primary tool categories currently shaping AI agents include:


Memory & Planning:

Agents leverage memory tools (also referred to as a "scratchpad" or "blackboard") to store intermediate information and historical context, effectively extending their limited context windows. These shared memory systems also serve as communication channels among multi-agent teams and facilitate task breakdowns through planning and orchestration tools.
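
As a minimal sketch of such a scratchpad (class and method names are illustrative, not taken from any particular framework):

```python
# A shared scratchpad ("blackboard") that stores intermediate results
# outside the model's context window; only the most recent notes are
# injected back into the prompt, keeping context small on long tasks.

class Scratchpad:
    def __init__(self):
        self.notes = []

    def write(self, agent, content):
        self.notes.append({"agent": agent, "content": content})

    def read(self, last_n=5):
        return "\n".join(
            f"[{n['agent']}] {n['content']}" for n in self.notes[-last_n:]
        )

pad = Scratchpad()
pad.write("planner", "Step 1: collect quarterly sales figures.")
pad.write("analyst", "Q1 revenue: 1.2M; Q2 revenue: 1.4M.")
print(pad.read())
```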


Code Execution & Data Analysis:

As inputs grow longer, handling large-scale structured data or complex analytics purely through textual representations becomes unreliable. Tools for writing and executing code empower agents to manage structured inputs (such as tables, spreadsheets, or logs), compute aggregate statistics, and create visualizations far more dependably.
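
A minimal sketch of such a code-execution tool, using a subprocess with a timeout as a deliberately simplified stand-in for proper sandboxing:

```python
import subprocess
import sys

def run_python(code, timeout=10):
    """Execute model-generated Python in a separate process.

    A real deployment would sandbox this far more aggressively
    (containers, restricted filesystems, no network); the subprocess
    and timeout here are only a minimal safeguard.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr

# E.g., the agent emits code to aggregate a table instead of "reading" it:
print(run_python("import statistics; print(statistics.mean([3, 5, 8]))"))
```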


Data Retrieval:

Central to Retrieval-Augmented Generation (RAG) systems, retrieval tools allow agents to query structured databases or search unstructured data collections. In addition, advanced use cases often incorporate query reformulation or intent recognition tools to improve retrieval performance.
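
A deliberately simple retrieval-tool sketch that ranks documents by term overlap; production RAG systems would use vector search and rerankers instead, but the tool interface the agent sees looks much the same:

```python
def retrieve(query, documents, top_k=2):
    # Rank documents by how many query terms they share; a stand-in
    # for embedding similarity in a real retrieval pipeline.
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

docs = [
    "MCP standardizes tool access for language models.",
    "OSWorld benchmarks computer-use agents on desktop tasks.",
    "Scratchpads extend an agent's effective context window.",
]
print(retrieve("how do agents use tools via MCP", docs))
```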


External APIs:

Like traditional software, AI agents can greatly benefit from seamless integration with existing third-party APIs and services, extending their reach to perform targeted, application-specific tasks and access broader informational resources.
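
A brief sketch of wrapping a third-party REST API as an agent tool; the endpoint below is hypothetical, so substitute any real service:

```python
import json
import urllib.request

def currency_rate(base: str, target: str) -> float:
    # Hypothetical endpoint, shown only to illustrate the pattern of
    # exposing an external API as a callable tool.
    url = f"https://api.example.com/rates?base={base}&target={target}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["rate"]

# Registered with a schema (as in the function-calling example above),
# the model can now answer questions that require live exchange rates.
```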


Web Navigation:

Agents can be granted freedom beyond predefined APIs by directly browsing the web just as humans do. Web navigation and search engine integration enable dynamic acquisition of up-to-date information, overcoming traditional knowledge-cutoff constraints.
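
A bare-bones "browse" tool sketch using only the standard library: fetch a page and strip its markup so the model can read the text. Real web agents add search integration, link following, and rendering of JavaScript-heavy pages:

```python
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collect the visible text fragments of an HTML document.
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def browse(url: str, max_chars: int = 2000) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    # Truncate so the result fits comfortably in the model's context.
    return " ".join(parser.chunks)[:max_chars]
```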


Computer Use:

In advanced scenarios, AI agents can fully control sandboxed desktop environments (or even the user's device), mimicking the full spectrum of human interactions such as clicking, dragging, and typing into graphical interfaces, thus providing the ultimate flexibility and autonomy.
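
As a sketch of the action side of computer use, here is what the primitives might look like with the pyautogui library (the coordinates are hypothetical; in a full agent, a vision-language model would inspect the screenshot and choose the clicks and keystrokes itself):

```python
import pyautogui  # pip install pyautogui; requires a desktop session

# The action side of computer use: in a full agent, a vision-language
# model inspects the screenshot and picks these targets itself.
screenshot = pyautogui.screenshot()                 # perceive the screen
pyautogui.click(x=200, y=340)                       # hypothetical button
pyautogui.write("quarterly report", interval=0.05)  # type into a field
pyautogui.press("enter")                            # submit
```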


The Model Context Protocol (MCP), introduced by Anthropic, aims to standardize interactions between language models and software applications. MCP enables external data sources and services to advertise their resources, tools, and prompt templates directly to language models. Instead of manually integrating each data source over and over, developers only have to ensure that their services implement the MCP standard, much like exposing APIs for human-centric software development. Currently, popular MCP servers offer access to development tools such as Git and Docker, local databases like Postgres and SQLite, as well as popular consumer apps like Google Maps or Spotify.
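
A minimal MCP server sketch, assuming the FastMCP interface of the official Python SDK (pip install mcp); the decorated function is advertised to any connected model as a callable tool:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

@mcp.tool()
def stock_level(product_id: str) -> int:
    """Return the current stock level for a product."""
    # A real server would query the underlying database here.
    return {"A1": 42, "B7": 0}.get(product_id, -1)

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```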


Benchmarking Tool & Computer Use

As agent tasks grow in complexity, benchmarking practices evolve with them. Early evaluation frameworks merely assessed whether models could produce an accurate answer to a single-turn prompt. By contrast, newer benchmarks like τ-bench add further dimensions: multi-turn interactions, follow-up questions, open-ended dialogues, performance consistency, and handling of unpredictable or contextual events. These broader assessments expose reliability gaps more prominently and better simulate real-world usage of these language models. For instance, recent evaluations show that even state-of-the-art models (like Anthropic's Claude 3.7 Sonnet) reach only 60-80% success rates in complex agentic scenarios, indicating that this is far from a solved problem.
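
To make the consistency dimension concrete, here is a small sketch in the spirit of τ-bench's pass^k metric, which estimates the probability that k independent trials of the same task all succeed (the helper name is ours):

```python
from math import comb

def pass_hat_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of the chance that k randomly chosen trials,
    # out of n recorded trials with c successes, are all successful.
    if c < k:
        return 0.0
    return comb(c, k) / comb(n, k)

# One task, 8 trials, 6 successes: pass^1 is a healthy 75%, but
# requiring 4 consecutive successes drops the estimate sharply.
print(pass_hat_k(8, 6, 1))  # 0.75
print(pass_hat_k(8, 6, 4))  # ~0.21
```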


With the surge of both proprietary and open-source solutions for computer use, and given the inherent heterogeneity of the interfaces and devices involved, a relevant benchmark has emerged in OSWorld. It provides a unified framework for setting up evaluation tasks, assessing execution results, and training multi-modal agents across various desktop operating systems. Currently, OpenAI's premium service, Operator, tops the leaderboard, although open-source models such as UI-TARS remain competitive, placing within the top five.


Popular Agentic Frameworks

Numerous frameworks have emerged in recent years, each addressing distinct aspects of constructing and deploying agentic AI systems. Some frameworks emphasize rapid prototyping and iteration, while others focus on specialized tasks such as multi-agent organization or parallel execution. Some notable open-source examples include:

  • AutoGen: A multi-agent framework providing diverse pre-built agent routines and integrations.

  • 🤗 smolagents: A lightweight toolkit focusing on the actions-as-code-execution paradigm (see the sketch after this list).

  • LangGraph: Models agent control flow as graphs, inspired by finite state machines.

  • OpenAI's Swarm: An educational framework illustrating message-handling and orchestration.

  • CrewAI: One of the most customizable solutions featuring high-level prompt configuration alongside low-level workflow orchestration flexibility.

  • Zeta Alpha's Agents SDK: Geared towards production readiness, providing out-of-the-box tools for iterative building, performance evaluation, and deployment.
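
As an example of how lightweight getting started can be, here is a smolagents quick-start sketch (class names reflect its API at the time of writing and may change across versions; a Hugging Face inference backend is assumed to be available):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # the agent writes code that calls this
    model=HfApiModel(),              # defaults to a hosted open model
)

agent.run("Which open-source models are in the OSWorld top five?")
```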


Trending Research Papers

Here is an overview of recently trending research papers on AI agents, covering their applications, planning, optimization, and benchmarking, to give you a feel for where the field is headed. You can always find this collection of papers (and more that didn't make the cut) in Zeta Alpha, where you can easily explore and discover further related work.


Despite the excitement around them, truly reliable AI agents for complex workflows remain a challenge, even for the current state-of-the-art models. Zeta Alpha offers the practical experience and specialized expertise to navigate these challenges and identify realistic use cases. Let's discuss your business objectives and brainstorm practical agentic applications that bridge this reliability gap. Whether you're facing inconsistent agent performance or simply curious about the business impact of agents, share your challenges with us, and we'll help you refine your approach, pinpoint areas for improvement, and boost accuracy, dependability, and value. Contact us for an initial conversation about making your AI agents reliable and valuable assets. To learn more about frameworks and the latest breakthroughs in agentic AI R&D, watch our webinar recording and join our Luma community for upcoming discussions and events.



Until next time, enjoy discovery!
