AI Agents Are Getting Better. Their Safety Disclosures Aren't

A study led by MIT researchers found that agentic AI developers seldom publish detailed information about how these tools were tested for safety.

AI agents are certainly having a moment. Between the recent virality of OpenClaw and Moltbook, and OpenAI planning to take its agent features to the next level, it may just be the year of the agent.

Why? Well, they can plan, write code, browse the web and execute multistep tasks with little to no supervision. Some even promise to manage your workflow. Others coordinate with tools and systems across your desktop. 

The appeal is obvious. These systems do not just respond. They act -- for you and on your behalf. But when researchers behind the MIT AI Agent Index cataloged 67 deployed agentic systems, they found something unsettling.

Developers are eager to describe what their agents can do. They are far less eager to describe whether these agents are safe.

"Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement," the researchers wrote in the paper. "However, there is currently no structured framework for documenting … safety features of agentic systems."

That gap shows up clearly in the numbers: Around 70% of the indexed agents provide documentation, and nearly half publish code. But only about 19% disclose a formal safety policy, and fewer than 10% report external safety evaluations. 

The research underscores that while developers are quick to tout the capabilities and practical applications of agentic systems, they offer far less information about safety and risk. The result is a lopsided kind of transparency.

What counts as an AI agent

The researchers were deliberate about what made the cut, and not every chatbot qualifies. To be included, a system had to operate with underspecified objectives and pursue goals over time. It also had to take actions that affect an environment with limited human mediation. These are systems that decide on intermediate steps for themselves: they can break a broad instruction into subtasks, then plan, use tools and iterate until the work is complete.
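For readers who want a concrete picture of that loop, here is a minimal, hypothetical sketch in Python: a broad goal is split into subtasks, each subtask calls a tool, and the loop runs under a hard step cap. Every name here (TOOLS, plan_subtasks, run_agent) is an illustrative stand-in, not code from the MIT index or any product mentioned in the study.

    from typing import Callable

    # Hypothetical "tools" the agent can invoke; real agents wire these to
    # email, files or browsers -- which is exactly where the safety stakes rise.
    TOOLS: dict[str, Callable[[str], str]] = {
        "search": lambda query: f"results for: {query}",
        "write": lambda text: f"saved note: {text}",
    }

    def plan_subtasks(goal: str) -> list[tuple[str, str]]:
        """Stand-in planner: split a broad goal into (tool, input) steps."""
        return [("search", goal), ("write", f"summary of {goal}")]

    def run_agent(goal: str, max_steps: int = 10) -> list[str]:
        """Iterate over planned steps, call tools, stop at the step cap."""
        history: list[str] = []
        for step, (tool, arg) in enumerate(plan_subtasks(goal)):
            if step >= max_steps:  # a hard cap stands in for human oversight
                break
            history.append(TOOLS[tool](arg))
        return history

    if __name__ == "__main__":
        print(run_agent("compare note-taking apps"))

The step cap is the only guardrail in this toy version; the study's point is that for deployed systems, the real guardrails are rarely documented in public.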

That autonomy is what makes them powerful. It's also what raises the stakes.

When a model simply generates text, its failures are usually contained to that one output. When an AI agent can access files, send emails, make purchases or modify documents, mistakes and exploits can be damaging and propagate across steps. Yet the researchers found that most developers do not publicly detail how they test for those scenarios.

Capability is public, guardrails are not

The most striking pattern in the study is not hidden deep in a table -- it is repeated throughout the paper.

Developers are comfortable sharing demos, benchmarks and the usability of these AI agents, but they are far less consistent about sharing safety evaluations, internal testing procedures or third-party risk audits.

That imbalance matters more as agents move from prototypes to digital actors integrated into real workflows. Many of the indexed systems operate in domains like software engineering and computer use -- environments that often involve sensitive data and meaningful control.

The MIT AI Agent Index does not claim that agentic AI is categorically unsafe, but it shows that as autonomy increases, structured transparency about safety has not kept pace.

The technology is accelerating. The guardrails, at least publicly, remain harder to see.
