The security gap in agentic AI
The security gap in agentic AI

Why developers and security teams must work together from day one to secure autonomous AI systems.

Published Mar 11, 2026

240415 TTC stock 085

0%

The security gap in agentic AI
The security gap in agentic AI

Published Mar 11, 2026

At a recent IT industry event in Copenhagen, I sat down with our developer Nicolai Lassen and Anders Rømer Skøtt, a red teamer (cybersecurity expert), to discuss a significant challenge facing most companies today: How do you secure an AI system that can think, act, and access your most sensitive data autonomously – i.e., AI agents? The conversation centred around a real-world case of an agentic AI platform in the financial services industry, capable of autonomously accessing member data, executing transactions, and coordinating across multiple backend systems. In this article, we’ll uncover how a developer and hacker worked together to secure it against challenges the industry is only beginning to understand. 

The agentic teams are here 

AI agents are no longer confined to research labs. They are live in production environments, orchestrating workflows, accessing sensitive data, and making decisions that directly impact business outcomes. Gartner predicts that, by the end of 2026, 40% of enterprise applications will include task-specific AI agents, up from less than 5% in early 2025.1  

AI agents or agent teams are autonomous systems that actively execute tasks and interact with external tools – such as querying databases, sending emails, and modifying records with minimal human oversight. Unlike traditional AI that generates responses, agents drive end-to-end resolutions. Agent swarms extend this capability by coordinating multiple specialised agents that work together across a shared communication layer, enabling complex workflows while introducing new security considerations at each layer.

From Facebook games to enterprise AI 

Nicolai's path to AI development started unexpectedly. "In primary school, we all had these Facebook games where we competed to build the best restaurant or pet society," he recalls. "But I was lazy. Instead of grinding for points, I figured out how to modify RAM codes in the flash games to get unlimited resources. That's when I realised – as a developer, you can find shortcuts to make your own or other people's lives easier." 

That hacker's mindset serves him well now in the age of large language models. For Nicolai, AI has not changed what coding is; it has changed how quickly you can get there. "You don't learn to write code because it's fun to sit in a dark IDE writing if statements," he explains. "You do it to reach a goal. Large language models let us skip the early phases – the boilerplate, the scaffolding – and get to results faster. Then you can backtrack to optimise." 

But there’s a catch. When asked about the viral images of developers running multiple AI agents simultaneously, letting them write and ship code with minimal oversight, Nicolai is skeptical. "If you just set 20 agents loose on your repository, you get new technical debt that nobody understands," he warns. "You have to stay close to the process. Otherwise, when you leave the client, there's nobody who knows how the thing actually works anymore." 

Building an agent swarm for financial services 

The system Nicolai built, and Anders subsequently tested, was in the financial services industry, for a company that wanted more than a simple chatbot. "They started by asking us for a chatbot so customers could talk to their FAQ," Nicolai explains. "But during strategy work, we realised that what they actually wanted was an executing agent setup." 

The difference is significant. Rather than retrieving documents and generating responses, the system needed to access the backend systems, pull personalised member information, and execute self-service workflows all autonomously. "Think of it like Booking.com," Nicolai offers. "One agent handles flight databases, another handles hotels. The customer says 'April, two weeks in Madrid,' and instead of letting us engineers define A-to-B-to-C workflows, we tell the agent: choose A, B, or C yourself, just reach the result we’ve agreed on." 

The agents coordinate through the communication layer, what Nicolai describes as "a Messenger chat between agents," working asynchronously, sharing progress, until they reach a consensus output. The same core system powers both customer-facing chat and live assistance for advisers, surfacing relevant information in real time during phone calls. 

How a red teamer sees agentic systems 

Anders approaches these systems differently. Simply put, where Nicolai builds them, Anders breaks them. "We prefer smashing things to building them," he admits. His team uses methods such as fuzzing (throwing parameters at applications to see when they fail), prompt-injections, and reverse engineering to find vulnerabilities. 

When AI agents emerged, Anders was initially excited. "We were super stoked, thinking that now we'll find vulnerabilities ten times faster!" The reality, however, proved more nuanced.

/* "When we work with an application that we know is vulnerable and we launch AI agents analyse it, they will find lots of 'medium-bad' issues. But they struggle to find the really interesting primitives that most vulnerability researchers look for, such as executing arbitrary code. It's gotten better over the past year, but without human quality assurance, especially on complex tasks, you may end up with ‘hallucinated’ vulnerabilities rather than actual, exploitable ones unless robust validation flows are set up to verify and control agentic findings." */

A new way to think about agent security

Anders brought a whiteboard to the session. "I've been thinking about how we talk about agent security," he says. "We have established frameworks for applications and networks but not yet established common frameworks for agentic security. Some are trying, but there's no unified body like the Internet Engineering Task Force (IETF) that can help standardise how we approach this field."

He sketched out the full attack surface from the user, through application and agent layer all the way to data sources: 

"There are no tools that test this entire stack in a consolidated way," Anders notes. "It's complex, but we have developed a tool that tests the full stack of components that make agents run." 

Despite the novelty of agentic systems, we, quite paradoxically, see a resurgence of misconfigurations and poor application security practices. Internally, the running joke goes that “the 90’s called and wants its misconfigured web applications back”. In the tests we have performed so far, we’ve seen an alarming rate of common misconfigurations such as leaking API keys and broken or improper authentication or permission.

/* "These types of flaws are serious and can lead to data theft, model manipulation or even backdooring the agent’s application context, making it possible for attackers to ‘break into’ the organisation hosting it.” */

At the agent logic layer, prompt injections emerge as a fundamental challenge. OWASP ranks it as the top vulnerability for LLM applications.2 The attack is deceptively simple: an attacker crafts input that the model misinterprets as legitimate instructions. Imagine an intruder in the real world – charming or sneaking their way past the front desk; suddenly they can listen to conversations, observe workflows, and plant false documents that employees unknowingly act upon. In indirect prompt injection, the attacker manipulates the agent through face-to-face user input. Here, malicious instructions are embedded in external data sources, hidden in emails, documents, or databases the agent processes as trusted information. 

The attack model developed by Anders and his team utilises all of the attack types mentioned above to test the integrity and confidentiality of agentic systems. 

A practioner’s guide to defence 

So how did Nicolai's team actually secure the system? The key insight: focus on boundaries, not the model itself. 

"We focused on input and output channels, not on messing with the core logic," Nicolai explains. "If you interfere too much with the reasoning, you get sporadic errors that trickle through workflows, and the LLM's content filters can trigger unpredictably." 

Their security architecture has four layers:

1

Standard guardrails

These evaluate intent like sharing PII or "Are you trying to make a Molotov cocktail?" Before deciding to implement a model ensure that its core business purpose is defined. Its purpose should define its permission and how we apply guardrails to the model logic to guard against attacks such as prompt injection.

2

Anomaly detection

For obscure attempts, including roleplay-based attacks and unusual token distributions compared to normal usage, one lesson is clear from our testing: viable and systematic bypasses typically involve anomaly/atypical behaviour such as a high volume of prompting or heavy token consumption. Such behaviours can be monitored and detected – much like we monitor for adverse events in networks.

3

LLM-as-judge

LLMs can help LLMs. A highly capable AI reasoning model that evaluates responses against criteria like tone of voice ("as a pension company, we don't want to discuss political opinions") and compliance requirements. We have successfully deployed agents that ‘help’ other agents be more secure. A notable example is our judge LLM, which we use when running thousands of different attack patterns. Its basic function is to evaluate when a target model may fail and provide mitigation guidance.

4

Cloud and infrastructure hardening

Ultimately, an agent needs to be hosted. Robust principles for asset hardening and authentication, already recognised as best practice, can be applied to safeguard your agents. Providers like Microsoft and Amazon are starting to move tools into production that can help guard both the inputs to and outputs from agents.

5

Firewall your hosts and infrastructure

One of our findings was that access management proved critical. To solve this, the system should identify users based on what they should have access to. "When a customer verifies themselves, they get a session token. Only with that token can they query external data," Nicolai explains. "We have a protocol on top, so if you lack rights to certain tools, they simply don't appear. LLMs love to 'understand everything,' but we put the connector in a tight box and prevent lateral movement." Perhaps most importantly, they filter what information flows back to the model. "We strip out infrastructure details, facade names, endpoint names," Nicolai says. "A pentester often doesn't need the actual data from an external source, just knowing the infrastructure behind it is gold. We try to hide that."

Tens of thousands of attacks later 

Anders and his team built a testing engine, Mindbreaker, with two components: 1) an attacker LLM that attempts both deterministic and non-deterministic attacks within heuristic parameters, and 2) a judge LLM that evaluates whether responses crossed lines such as racism, legal violations, self-harm encouragement, IP infringement, or customer data leakage. 

The results? "We saw roughly 94% of attacks fail initially," Anders reports. "After further hardening, that rose to about 98% out of tens of thousands of prompt injection attempts. So,” he pauses briefly. "We may not be at the point where automation is high enough, and humans definitely still need to review the output. But the collaboration worked." 

When asked about guardrail packages like those offered by Azure, Anders is pragmatic. "They're fine up to a point. But with tweaks, you can bypass them. The question is: where do you have control?" 

The clear answer to that question? You exert the greatest amount of control at the initial layer, the application layer. At this level, ‘old tricks’ such as rate limiting, whitelisting, and even infrastructure hardening can yield great results. As you move down the stack, you have fewer security controls that the model owner can deploy. 

Builders and breakers, unite! 

Asked for one piece of advice, Anders went first: "Get your role assignments in order, and solid authentication and authorisation. That removes the worst of it – in a sense, simple application and web security practices can get you quite far.” This was not meant as critique directed at Nicolai, but as a general principle for developers. 

Nicolai responded with a question: "When you approach an engagement, how do we get the information needed to make testing feel more like data-driven penetration testing, and how do we then implement the findings afterward? We're missing a framework. When we build agent teams, it is new. When we get attacked, we need a shared language to talk about it." 

Anders nodded. "100% agree. That's exactly why this conversation is hard. Not because anyone's being difficult, but because the frameworks simply aren't there yet." 

What's clear is that security can’t just be bolted on after the fact. It requires developers who think like hackers and security teams who understand AI capabilities. It requires early, continuous collaboration rather than a final gate before deployment. And it requires the industry to develop shared frameworks to foster a common language for risks that are genuinely new. 

Sources

1https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025

2 https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/ 

For insights on how to navigate AI adoption with appropriate security controls, reach out to The Tech Collective.