AI Agents Are Getting More Capable, But Can We Trust Them?

AI / Security · March 20, 2026 · 7 min read

Figure: Autonomous AI agents moving from capability to trust. AI agents are becoming more capable, which makes trust and containment the real architecture question.

AI agents are getting more capable, but the real question is whether we can trust them.

That was my biggest takeaway from NVIDIA's work on OpenShell and NemoClaw.

The industry has focused heavily on what agents can do: persist across sessions, use tools, spawn subagents, interact with real systems, write code, retrieve context, and keep working after the user walks away. Impressive, yes. But from a security and operations perspective, that is exactly where the conversation gets more serious.

An agent with shell access, credentials, network connectivity, memory, and the ability to evolve over time is not just a smarter assistant. It is a very different risk model.

Capability without containment is not maturity. It is exposure.

Agents change the risk model

Traditional applications usually expose a defined set of features. Users click buttons, submit forms, call APIs, and operate inside a relatively predictable interaction model. Agents are different because they can decide which tools to use, in what order, and with what interpretation of the surrounding context.

That changes the security posture. A workflow engine executes a workflow. An agent may choose a workflow, modify it, ask another agent for help, read more context, call a tool, and then continue. That flexibility is the value. It is also the risk.

The failure modes become more subtle: the agent misunderstands a goal, overreaches on a tool call, leaks context to the wrong model, follows malicious instructions embedded in a document, or takes an action that is technically allowed but operationally unsafe.

The controls need to live outside the agent

What stood out to me about OpenShell is the emphasis on controls outside the agent itself: sandboxing, policy enforcement, privacy routing, and deny-by-default principles. To me, that is the right direction.

Figure: OpenShell's architecture for safer autonomous agents.

We should not expect an agent to be its own entire security boundary. Prompts are useful. System instructions matter. Model behavior matters. But production trust cannot depend only on the model choosing to behave.

The controls that matter

Trustworthy agents need controls outside the model, not just better prompts inside it.

Sandboxing

Limit what the agent can touch so experimentation does not automatically become access to the whole environment.

Policy enforcement

Define what actions are allowed, denied, reviewed, or routed through stronger approval paths.

Privacy routing

Control where sensitive context goes, which models can see it, and what must stay local or isolated.

Deny by default

Treat tool use as a permissioned act, not an open-ended capability granted because the agent asked nicely.

Observability

Capture prompts, tool calls, data access, decisions, and outcomes so operators can reconstruct what happened.

Lifecycle management

Version, update, monitor, suspend, and retire agents like any other production system with real blast radius.
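To make one of these controls concrete, here is a minimal sketch of privacy routing as a decision made outside the agent. It is not OpenShell's API. The sensitivity labels, the endpoint names, and the `route` function are all hypothetical, and it assumes that context has already been labeled by something upstream.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class ModelEndpoint:
    name: str
    max_sensitivity: Sensitivity  # highest label this endpoint may receive


# Hypothetical endpoints, ordered from least to most trusted.
ENDPOINTS = [
    ModelEndpoint("hosted-frontier-model", Sensitivity.PUBLIC),
    ModelEndpoint("private-cloud-model", Sensitivity.INTERNAL),
    ModelEndpoint("on-prem-local-model", Sensitivity.RESTRICTED),
]


def route(label: Sensitivity) -> ModelEndpoint:
    """Return the first endpoint cleared for this label; refuse if none is."""
    for endpoint in ENDPOINTS:
        if endpoint.max_sensitivity.value >= label.value:
            return endpoint
    raise PermissionError(f"no endpoint cleared for {label.name}")


print(route(Sensitivity.PUBLIC).name)
print(route(Sensitivity.RESTRICTED).name)
```

The point of the design is that the agent never picks the destination. The router does, based on a label the agent cannot change, and the absence of a cleared endpoint is an error rather than a fallback.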

This is the same lesson enterprise technology keeps relearning. Identity belongs outside the application. Network policy belongs outside the workload. Audit logs belong outside the user session. Backups belong outside the system they protect. Agent controls should follow that same pattern.

Deny by default is the right instinct

Deny by default matters because agents are good at finding paths. That is part of why they are useful. Give an agent enough tools, context, and permission, and it may discover a way to accomplish the task that the human did not explicitly imagine.

Sometimes that is wonderful. Sometimes it is exactly the kind of behavior security teams are paid to worry about.

A trustworthy agent environment should make the safe path easy and the unsafe path structurally difficult. Tool calls should be scoped. High-impact actions should require approval. Sensitive data should have routing rules. Network access should be explicit. Agents should not inherit broad access simply because the user running them has broad access.
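As a rough illustration of those rules, the sketch below shows a deny-by-default gate that sits between the agent and its tools. Everything in it is an assumption made for the example: the `ToolPolicy` class, the tool names, and the use of path prefixes as scopes. A real enforcement layer would also need to handle symlinks, network targets, and argument validation.

```python
import posixpath
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass
class ToolPolicy:
    # Tool name -> path prefixes (scopes) the tool may operate on.
    allowed_scopes: dict[str, tuple[str, ...]] = field(default_factory=dict)
    # Tools that always need human sign-off, even when the target is in scope.
    high_impact: frozenset[str] = frozenset()

    def evaluate(self, tool: str, target: str) -> Decision:
        scopes = self.allowed_scopes.get(tool)
        if scopes is None:
            return Decision.DENY  # unlisted tool: denied by default
        # Collapse ".." segments so a traversal cannot slip past the
        # prefix check (symlinks are not resolved here).
        normalized = posixpath.normpath(target)
        if not any(normalized.startswith(prefix) for prefix in scopes):
            return Decision.DENY  # listed tool, out-of-scope target
        if tool in self.high_impact:
            return Decision.REQUIRE_APPROVAL
        return Decision.ALLOW


# Hypothetical policy: reads are scoped, deletes are scoped and reviewed.
policy = ToolPolicy(
    allowed_scopes={
        "read_file": ("/workspace/",),
        "delete_file": ("/workspace/tmp/",),
    },
    high_impact=frozenset({"delete_file"}),
)

print(policy.evaluate("read_file", "/workspace/notes.txt"))
print(policy.evaluate("read_file", "/workspace/../etc/passwd"))
print(policy.evaluate("run_shell", "anything"))
print(policy.evaluate("delete_file", "/workspace/tmp/cache.bin"))
```

Note that the policy is built from an explicit list rather than from the invoking user's permissions, so the agent does not inherit broad access by default.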

Trust is also an operations problem

Security is only one part of this. Operations matter just as much. If an agent is always on, then someone needs to know whether it is healthy, what it changed, what it attempted, when it escalated, when it failed, and whether its behavior has drifted.

That is why observability is not optional. Logs should not be an afterthought bolted on after the first incident. They should be part of the architecture from the start.

For enterprise environments, I would want to see every meaningful action tied back to an identity, a policy decision, a tool call, source context, and an outcome. If the agent touched a file, called an API, created a ticket, queried a database, or invoked another agent, that should be visible.
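One way to picture that requirement is a structured audit record written by the enforcement layer, not by the agent. The sketch below is illustrative only. The field names and the `record_action` helper are hypothetical, and it emits one JSON line per action on the assumption that a log pipeline will collect it.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    agent_id: str         # identity the action is attributed to
    policy_decision: str  # e.g. "allow", "deny", "require_approval"
    tool: str             # tool that was called
    arguments: dict       # arguments passed to the tool
    source_context: str   # what prompted the action (ticket, document, message)
    outcome: str          # result as observed by the enforcement layer
    timestamp: str        # UTC time in ISO 8601 format


def record_action(agent_id: str, policy_decision: str, tool: str,
                  arguments: dict, source_context: str, outcome: str) -> str:
    """Build one audit record and serialize it as a single JSON line."""
    record = AuditRecord(
        agent_id=agent_id,
        policy_decision=policy_decision,
        tool=tool,
        arguments=arguments,
        source_context=source_context,
        outcome=outcome,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(record), sort_keys=True)


line = record_action(
    agent_id="ticket-triage-agent",
    policy_decision="allow",
    tool="create_ticket",
    arguments={"queue": "infra", "priority": "low"},
    source_context="email:inbound-support",
    outcome="created",
)
print(line)
```

Because each record carries the identity, the policy decision, the tool call, the source context, and the outcome together, an operator can filter on any one of them without joining separate logs.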

This is what enterprise readiness really means

Enterprise readiness is not just single sign-on and a procurement checklist. For agentic systems, it means the organization can answer basic questions with confidence:

What can this agent access?
Which tools can it call without approval?
Which actions require human review?
Where can sensitive data be sent?
How are agent versions, prompts, policies, and tool permissions changed?
Can we replay or reconstruct what happened after an incident?

If those questions are hard to answer, the agent may still be useful, but it is not yet ready for broad trust.
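A simple way to test whether those questions have answers is to require them in a manifest before an agent is deployed. The sketch below assumes a hypothetical `AgentManifest` with one field per question. A value of `None` means nobody has answered yet, while an empty tuple is a valid answer meaning "nothing".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AgentManifest:
    name: str
    version: str
    accessible_resources: Optional[tuple[str, ...]] = None
    tools_without_approval: Optional[tuple[str, ...]] = None
    tools_requiring_review: Optional[tuple[str, ...]] = None
    allowed_data_destinations: Optional[tuple[str, ...]] = None
    change_process: Optional[str] = None      # how versions, prompts, policies change
    audit_log_location: Optional[str] = None  # where records for reconstruction live


def unanswered(manifest: AgentManifest) -> list[str]:
    """Return the names of fields that are still None, in declaration order."""
    return [f.name for f in fields(manifest) if getattr(manifest, f.name) is None]


# Hypothetical agent with only some of the questions answered.
draft = AgentManifest(
    name="ticket-triage-agent",
    version="1.2.0",
    accessible_resources=("tickets-db",),
    tools_without_approval=(),  # explicit answer: no tool runs unreviewed
    tools_requiring_review=("close_ticket",),
)

print(unanswered(draft))
```

A deployment pipeline could refuse to promote any agent for which `unanswered` returns a non-empty list, which turns the checklist into a gate instead of a document.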

The future depends on trust

The future of AI agents will not be decided only by how autonomous they are. It will be decided by how securely they can operate.

The more useful agents become, the closer they move to real systems, real data, and real operational consequences. That is where capability has to be matched with containment, governance, monitoring, and accountability.

Autonomy is impressive. Trust is what makes it deployable.

Topics: AI agents, agentic AI, AI security, cybersecurity, trustworthy AI, OpenShell, NemoClaw, enterprise AI, AI governance, and NVIDIA.