Building Smarter AI: Microsoft's Alliance with Google's Agent2Agent and Meta's Push Towards Automated Code Production

May 08, 2025

How to build a better AI benchmark

Developers of coding agents are crafting models too tailored to benchmarks like SWE-Bench, primarily designed for Python, which reveals a larger issue in AI evaluation. This practice, termed as "gilded," results in models excelling at specific tests but failing in broader applications. The problem extends to industry benchmarks such as FrontierMath and Chatbot Arena, criticized for lack of transparency and drifting from evaluating actual capabilities. An "evaluation crisis," as described by OpenAI's Andrej Karpathy, underscores the need for more trusted methods. A growing group of academics suggests a shift towards validity-focused evaluations akin to social sciences, challenging the current benchmark system and emphasizing the need for models to prove their claimed capabilities. (Source)

The Two Flavors Of AI Agents In The Enterprise And Their Implications For Identity Security

Tarun Thakur, cofounder and CEO of Veza, highlights the transformative impact of AI-powered agents in enterprises, delineating how these systems autonomously integrate and operate across various applications to enhance workflows. Unlike traditional chatbots, these agents function with goal-driven autonomy, real-world connectivity, and cross-application orchestration, promising productivity gains in sectors like software development, marketing, and supply chain management. However, they also introduce identity security challenges linked to broad permissions and persistent data access. Thakur describes two types of enterprise AI agents: organization-approved, enterprise-managed systems and employee-managed solutions, each posing distinct security risks. Despite such concerns, he urges organizations to focus on identity security principles like least privilege access and suggests starting with small-scale implementations to adapt and iterate effectively. (Source)

From prompts to production: AI will soon write most code, reshape developer roles

At Meta's inaugural LlamaCon AI event, Microsoft CEO Satya Nadella disclosed that AI systems now contribute to 30% of the company's software code, while Meta CEO Mark Zuckerberg announced plans for an AI model that will autonomously create future programs for its systems. Zuckerberg predicts that within a year, AI could handle up to half of Meta's software development tasks. These AI advancements are poised to transform software development by automating source code creation and test generation, ultimately increasing developer productivity. Industry projections suggest that AI tools may enhance productivity by 30%, potentially boosting global GDP by over $1.5 trillion. (Source)

Microsoft Backs Google's Open Agent2Agent Protocol to Power Multi-Agent AI Apps

Microsoft is advancing AI interoperability by integrating Google's open AI agent protocol, Agent2Agent (A2A), into Azure AI Foundry and Copilot Studio, enabling seamless collaboration across platforms and organizations. With A2A, Microsoft enhances structured agent communication, ensuring secure and observable exchanges through safeguards like Entra and Azure AI Content Safety. Over 10,000 organizations have adopted Microsoft’s Agent Service, highlighting the protocol's role in orchestrating tasks across diverse ecosystems. Joining the A2A working group on GitHub, Microsoft aims to contribute to the protocol’s specification, moving toward what it describes as "agentic computing," a future where collaborative and adaptive software operates fluidly across various models and domains. (Source)

Empowering multi-agent apps with the open Agent2Agent (A2A) protocol

Microsoft is advancing the evolution of AI agents into vital enterprise system components through platforms like Azure AI Foundry and Copilot Studio. With over 230,000 organizations, including most Fortune 500 companies, utilizing these platforms, Microsoft aims to enhance agent interoperability across clouds and frameworks via the new Agent2Agent (A2A) protocol. This open protocol will enable agents to collaborate and execute complex multi-agent workflows across various platforms while maintaining governance and security standards. By supporting A2A, Microsoft is fostering a future where AI agents operate seamlessly across organizational and cloud boundaries, emphasizing openness, safety, and adaptability. (Source)