ماه: تیر 1404

How TigerEye is redefining AI-powered business intelligenceHow TigerEye is redefining AI-powered business intelligence

How TigerEye is redefining AI-powered business intelligence

At the Generative AI Summit in Silicon Valley, Ralph Gootee, Co-founder of TigerEye, joined Tim Mitchell, Business Line Lead, Technology at the AI Accelerator Institute, to discuss how AI is transforming business intelligence for go-to-market teams.

In this interview, Ralph shares lessons learned from building two companies and explores how TigerEye is rethinking business intelligence from the ground up with AI, helping organizations unlock reliable, actionable insights without wasting resources on bespoke analytics.

Tim Mitchell: Ralph, it’s a pleasure to have you here. We’re on day two of the Generative AI Summit, part of AI Silicon Valley. You’re a huge part of the industry in Silicon Valley, so it’s amazing to have you join us. TigerEye is here as part of the event. Maybe for folks that aren’t familiar with the brand, you can just give a quick rundown of who you are and what you’re doing.

Ralph: I’m the co-founder of TigerEye – my second company. It’s exciting to be solving some of the problems we had with our first company, PlanGrid, in this one. We sold PlanGrid to Autodesk. I had a really good time building it. But when you’re building a company, you end up having many internal metrics to track, and a lot of things that happen with sales. So, we built a data team.

With TigerEye, we’re using AI to help build that data team for other companies, so they can learn from our past mistakes. We’re helping them build business intelligence that’s meant for go-to-market, so sales, marketing, and finance all together in one package.

Lessons learned from PlanGrid

Tim: What were some of those mistakes that you’re now helping others avoid?

Ralph: The biggest one was using highly skilled resources to build internal analytics, time that could’ve gone into building customer-facing features. We had talented data engineers figuring out sales metrics instead of enhancing our product. That’s a key learning we bring to TigerEye.

What makes TigerEye unique

Tim: If I can describe TigerEye in short as an AI analyst for business intelligence, what’s unique about TigerEye in that space?

Ralph: One of the things that’s unique is we were built from the ground up for AI. Where a lot of other companies are trying to tack on or figure out how they’re going to work with AI, TigerEye was built in generative AI as a world. Rather than relying on text or trying to gather up metrics that could cause hallucination, we actually write SQL from the bottom up. Our platform is built on SQL, so we can give answers that show your math. You can see why the win rate is that, and it will decrease over time.

Why Generative AI Summit matters

Tim: And what’s interesting about this conference for you?

Ralph: The conference brings together both big companies and startups. It’s really nice to have conversations with companies that have more mature data issues, versus startups that are just figuring out how their sales motions work.

The challenges of roadmapping in AI

Tim: You’re the co-founder, but as CTO, in what kind of capacity does the roadmapping cause you headaches? What does that process look like for a solution like this?

Ralph: In the AI world, roadmapping is challenging because it keeps getting so much better so quickly. The only thing you know for sure is you’re going to have a new model drop that really moves things forward. Thankfully for us, we solve what we see as the hardest part of AI, giving 100% accurate answers. We still haven’t seen foundational models do that on their own, but they get much better at writing code.

So the way we’ve taught to write SQL, and how we work with foundational models, both go into the roadmap. Another part is what foundational models we support. Right now, we work with OpenAI, Gemini, and Anthropic. Every time there’s a new model drop, we evaluate it and think about whether we want to bring that in.

Evaluating and choosing models

Tim: How do you choose which model to use?

Ralph: There are two major things. One, we have a full evaluation framework. Since we specialize in sales questions, we’ve seen thousands of sales questions, and we know what the answer should be and how to write the code for them. We run new models through that and see how they look.

The other is speed. Latency really matters; people want instant responses. Sometimes, even within the same vendor, the speed will vary model by model, but that latency is important.

The future of AI-powered business intelligence

Tim: What’s next for you guys? Any AI-powered revelations we can expect?

Ralph: We think AI is going to be solved first in business intelligence in deep vertical sections. It’s hard to imagine AI solving a Shopify company’s challenge and also a supply chain challenge for an enterprise. We’re going deep into verticals to see what new features AI has to understand.

For example, in sales, territory management is a big challenge: splitting up accounts, segmenting business. We’re teaching AI how to optimize territory distribution and have those conversations with our customers. That’s where a lot of our roadmap is right now.

Who’s adopting AI business intelligence?

Tim: With these new products, who are you seeing the biggest wins with?

Ralph: Startups and mid-market have a good risk tolerance for AI products. Enterprises, we can have deep conversations, but it’s a slower process. They’re forming their strategic AI teams but not getting deep into it yet. Startups and mid-market, especially AI companies themselves, are going full-bore.

Tim: And what are the risks or doubts that enterprises might have?

Ralph: Most enterprises have multiple AI teams, and they don’t even know it. It happened out of nowhere. Then they realize they need an AI visionary to lead those teams. The AI visionary is figuring out their job, and the enterprise is going through that process.

The best enterprises focus on delivering more value to their customers with fewer resources. We’re seeing that trend – how do I get my margins up and lower my costs?

Final thoughts

As AI continues to reshape business intelligence, it’s clear that success will come to those who focus on practical, reliable solutions that serve real go-to-market needs.

TigerEye’s approach, combining AI’s power with transparent, verifiable analytics, offers a glimpse into the future of business intelligence: one where teams spend less time wrestling with data and more time acting on insights.

As the technology evolves, the companies that go deep into vertical challenges and stay laser-focused on customer value will be the ones leading the charge.

Why agentic AI pilots fail and how to scale safelyWhy agentic AI pilots fail and how to scale safely

Why agentic AI pilots fail and how to scale safely

At the AI Accelerator Institute Summit in New York, Oren Michels, Co-founder and CEO of Barndoor AI, joined a one-on-one discussion with Alexander Puutio, Professor and Author, to explore a question facing every enterprise experimenting with AI: Why do so many AI pilots stall, and what will it take to unlock real value?

Barndoor AI launched in May 2025. Its mission addresses a gap Oren has seen over decades working in data access and security: how to secure and manage AI agents so they can deliver on their promise in enterprise settings.

“What you’re really here for is the discussion about AI access,” he told the audience. “There’s a real need to secure AI agents, and frankly, the approaches I’d seen so far didn’t make much sense to me.”

AI pilots are being built, but Oren was quick to point out that deployment is where the real challenges begin.

As Alexander noted:

“If you’ve been around AI, as I know everyone here has, you’ve seen it. There are pilots everywhere…”

Why AI pilots fail

Oren didn’t sugarcoat the current state of enterprise AI pilots:

“There are lots of them. And many are wrapping up now without much to show for it.”

Alexander echoed that hard truth with a personal story. In a Forbes column, he’d featured a CEO who was bullish on AI, front-loading pilots to automate calendars and streamline doctor communications. But just three months later, the same CEO emailed him privately:

“Alex, I need to talk to you about the pilot.”

The reality?

“The whole thing went off the rails. Nothing worked, and the vendor pulled out.”

Why is this happening? According to Oren, it starts with a misconception about how AI fits into real work:

“When we talk about AI today, people often think of large language models, like ChatGPT. And that means a chat interface.”

But this assumption is flawed.

“That interface presumes that people do their jobs by chatting with a smart PhD about what to do. That’s just not how most people work.”

Oren explained that most employees engage with specific tools and data. They apply their training, gather information, and produce work products. That’s where current AI deployments miss the mark, except in coding:

“Coding is one of those rare jobs where you do hand over your work to a smart expert and say, ‘Here’s my code, it’s broken, help me fix it.’ LLMs are great at that. But for most functions, we need AI that engages with tools the way people do, so it can do useful, interesting work.”

Why agentic AI pilots fail and how to scale safely

The promise of agents and the real bottleneck

Alexander pointed to early agentic AI experiments, like Devin, touted as the first AI software engineer:

“When you actually looked at what the agent did, it didn’t really do that much, right?”

Oren agreed. The issue wasn’t the technology; it was the disconnect between what people expect agents to do and how they actually work:

“There’s this promise that someone like Joe in finance will know how to tell an agent to do something useful. Joe’s probably a fantastic finance professional, but he’s not part of that subset who knows how to instruct computers effectively.”

He pointed to Zapier as proof: a no-code tool that didn’t replace coders.

“The real challenge isn’t just knowing how to code. It’s seeing these powerful tools, understanding the business problems, and figuring out how to connect the two. That’s where value comes from.”

And too often, Oren noted, companies think money alone will solve it. CEOs invest heavily and end up with nothing to show because:

“Maybe the human process, or how people actually use these tools, just isn’t working.”

This brings us to what Oren called the real bottleneck: access, not just to AI, but what AI can access.

“We give humans access based on who they are, what they’re doing, and how much we trust them. But AI hasn’t followed that same path. Just having AI log in like a human and click around isn’t that interesting; that’s just scaled-up robotic process automation.”

Instead, enterprises need to define:

What they trust an agent to do
The rights of the human behind it
The rules of the system it’s interacting with
And the specific task at hand

These intersect to form what Oren called a multi-dimensional access problem:

“Without granular controls, you end up either dialing agents back so much they’re less useful than humans, or you risk over-permissioning. The goal is to make them more useful than humans.”

Why specialized agents are the future (and how to manage the “mess”)

As the conversation shifted to access, Alexander posed a question many AI leaders grapple with: When we think about role- and permission-based access, are we really debating the edges of agentic AI?

“Should agents be able to touch everything, like deleting Salesforce records, or are we heading toward hyper-niche agents?”

Oren was clear on where he stands:

“I’d be one of those people making the case for niche agents. It’s the same as how we hire humans. You don’t hire one person to do everything. There’s not going to be a single AI that rules them all, no matter how good it is.”

Instead, as companies evolve, they’ll seek out specialized tools, just like they hire specialized people.

“You wouldn’t hire a bunch of generalists and hope the company runs smoothly. The same will happen with agents.”

But with specialization comes complexity. Alexander put it bluntly:

“How do we manage the mess? Because, let’s face it, there’s going to be a mess.”

Oren welcomed that reality:

“The mess is actually a good thing. We already have it with software. But you don’t manage it agent by agent, there will be way too many.”

The key is centralized management:

A single place to manage all agents
Controls based on what agents are trying to do, and the role of the human behind them
System-specific safeguards, because admins (like your Salesforce or HR lead) need to manage what’s happening in their domain

“If each agent or its builder had its own way of handling security, that wouldn’t be sustainable. And you don’t want agents or their creators deciding their own security protocols – that’s probably not a great idea.”

Why AI agents need guardrails and onboarding

The question of accountability loomed large. When humans manage fleets of AI agents, where does responsibility sit?

Oren was clear:

“There’s human accountability. But we have to remember: humans don’t always know what the agents are going to do, or how they’re going to do it. If we’ve learned anything about AI so far, it’s that it can have a bit of a mind of its own.”

He likened agents to enthusiastic interns – eager to prove themselves, sometimes overstepping in their zeal:

“They’ll do everything they can to impress. And that’s where guardrails come in. But it’s hard to build those guardrails inside the agent. They’re crafty. They’ll often find ways around internal limits.”

The smarter approach? Start small:

Give agents a limited scope.
Watch their behavior.
Extend trust gradually, just as you would with a human intern who earns more responsibility over time.

This led to the next logical step: onboarding. Alexander asked whether bringing in AI agents is like an HR function.

Oren agreed and shared a great metaphor from Nvidia’s Jensen Huang:

“You have your biological workforce, managed by HR, and your agent workforce, managed by IT.”

Just as companies use HR systems to manage people, they’ll need systems to manage, deploy, and train AI agents so they’re efficient and, as Alexander added, safe.

How to manage AI’s intent

Speed is one of AI’s greatest strengths and risks. As Oren put it:

“Agents are, at their core, computers, and they can do things very, very fast. One CISO I know described it perfectly: she wants to limit the blast radius of the agents when they come in.”

That idea resonated. Alexander shared a similar reflection from a security company CEO:

“AI can sometimes be absolutely benevolent, no problem at all, but you still want to track who’s doing what and who’s accessing what. It could be malicious. Or it could be well-intentioned but doing the wrong thing.”

Real-world examples abound from models like Anthropic’s Claude “snitching” on users, to AI trying to protect its own code base in unintended ways.

So, how do we manage the intent of AI agents?

Oren drew a striking contrast to traditional computing:

“Historically, computers did exactly what you told them; whether that’s what you wanted or not. But that’s not entirely true anymore. With AI, sometimes they won’t do exactly what you tell them to.”

That makes managing them a mix of art and science. And, as Oren pointed out, this isn’t something you can expect every employee to master:

“It’s not going to be Joe in finance spinning up an agent to do their job. These tools are too powerful, too complex. Deploying them effectively takes expertise.”

Why pilots stall and how innovation spreads

If agents could truly do it all, Oren quipped:

“They wouldn’t need us here, they’d just handle it all on their own.”

But the reality is different. When Alexander asked about governance failures, Oren pointed to a subtle but powerful cause of failure. Not reckless deployments, but inertia:

“The failure I see isn’t poor governance in action, it’s what’s not happening. Companies are reluctant to really turn these agents loose because they don’t have the visibility or control they need.”

The result? Pilot projects that go nowhere.

“It’s like hiring incredibly talented people but not giving them access to the tools they need to do their jobs and then being disappointed with the results.”

In contrast, successful AI deployments come from open organizations that grant broader access and trust. But Oren acknowledged the catch:

“The larger you get as a company, the harder it is to pull off. You can’t run a large enterprise that way.”

So, where does innovation come from?

“It’s bottom-up, but also outside-in. You’ll see visionary teams build something cool, showcase it, and suddenly everyone wants it. That’s how adoption spreads, just like in the API world.”

And to bring that innovation into safe, scalable practice:

Start with governance and security so people feel safe experimenting.
Engage both internal teams and outside experts.
Focus on solving real business problems, not just deploying tech for its own sake.

Oren put it bluntly:

“CISOs and CTOs, they don’t really have an AI problem. But the people creating products, selling them, managing finance – they need AI to stay competitive.”

Trusting AI from an exoskeleton to an independent agent

The conversation circled back to a critical theme: trust.

Alexander shared a reflection that resonated deeply:

“Before ChatGPT, the human experience with computers was like Excel: one plus one is always two. If something went wrong, you assumed it was your mistake. The computer was always right.”

But now, AI behaves in ways that can feel unpredictable, even untrustworthy. What does that mean for how we work with it?

Oren saw this shift as a feature, not a flaw:

“If AI were completely linear, you’d just be programming, and that’s not what AI is meant to be. These models are trained on the entirety of human knowledge. You want them to go off and find interesting, different ways of looking at problems.”

The power of AI, he argued, comes not from treating it like Google, but from engaging it in a process:

“My son works in science at a biotech startup in Denmark. He uses AI not to get the answer, but to have a conversation about how to find the answer. That’s the mindset that leads to success with AI.”

And that mindset extends to gradual trust:

“Start by assigning low-risk tasks. Keep a human in the loop. As the AI delivers better results over time, you can reduce that oversight. Eventually, for certain tasks, you can take the human out of the loop.”

Oren summed it up with a powerful metaphor:

“You start with AI as an exoskeleton; it makes you bigger, stronger, faster. And over time, it can become more like the robot that does the work itself.”

The spectrum of agentic AI and why access controls are key

Alexander tied the conversation to a helpful analogy from a JP Morgan CTO: agentic AI isn’t binary.

“There’s no clear 0 or 1 where something is agentic or isn’t. At one end, you have a fully trusted system of agents. On the other hand, maybe it’s just a one-shot prompt or classic RPA with a bit of machine learning on top.”

Oren agreed:

“You’ve described the two ends of the spectrum perfectly. And with all automation, the key is deciding where on that spectrum we’re comfortable operating.”

He compared it to self-driving cars:

“Level 1 is cruise control; Level 5 is full autonomy. We’re comfortable somewhere in the middle right now. It’ll be the same with agents. As they get better, and as we get better at guiding them, we’ll move further along that spectrum.”

And how do you navigate that safely? Oren returned to the importance of access controls:

“When you control access outside the agent layer, you don’t have to worry as much about what’s happening inside. The agent can’t see or write to anything it isn’t allowed to.”

That approach offers two critical safeguards:

It prevents unintended actions.
It provides visibility into attempts, showing when an agent tries to do something it shouldn’t, so teams can adjust the instructions before harm is done.

“That lets you figure out what you’re telling it that’s prompting that behavior, without letting it break anything.”

The business imperative and the myth of the chat interface

At the enterprise level, Oren emphasized that the rise of the Chief AI Officer reflects a deeper truth:

“Someone in the company recognized that we need to figure this out to compete. Either you solve this before your competitors and gain an advantage, or you fall behind.”

And that, Oren stressed, is why this is not just a technology problem, it’s a business problem:

“You’re using technology, but you’re solving business challenges. You need to engage the people who have the problems, and the folks solving them, and figure out how AI can make that more efficient.”

When Alexander asked about the biggest myth in AI enterprise adoption, Oren didn’t hesitate:

“That the chat interface will win.”

While coders love chat interfaces because they can feed in code and get help most employees don’t work that way:

“Most people don’t do their jobs through chat-like interaction. And most don’t know how to use a chat interface effectively. They see a box, like Google search, and that doesn’t work well with AI.”

He predicted that within five years, chat interfaces will be niche. The real value?

“Agents doing useful things behind the scenes.”

How to scale AI safely

Finally, in response to a closing question from Alexander, Oren offered practical advice for enterprises looking to scale AI safely:

“Visibility is key. We don’t fully understand what happens inside these models; no one really does. Any tool that claims it can guarantee behavior inside the model? I’m skeptical.”

Instead, Oren urged companies to focus on where they can act:

“Manage what goes into the tools, and what comes out. Don’t believe you can control what happens within them.”

Final thoughts

As enterprises navigate the complex realities of AI adoption, one thing is clear: success won’t come from chasing hype or hoping a chat interface will magically solve business challenges.

It will come from building thoughtful guardrails, designing specialized agents, and aligning AI initiatives with real-world workflows and risks. The future belongs to companies that strike the right balance; trusting AI enough to unlock its potential, but governing it wisely to protect their business.

The path forward isn’t about replacing people; it’s about empowering them with AI that truly works with them, not just beside them.

AIAI New York, 2025AIAI New York, 2025

Catch up on every session from the AIAI New York with sessions across 3 co-located summit featuring the likes of Meta, Bank of America, Google DeepMind and many more.

LLMOps in action: How we move GenAI from prototype to productionLLMOps in action: How we move GenAI from prototype to production

Struggling to get your GenAI prototype into production? Discover how LLMOps helps streamline deployment – fast, scalable, and reliable.

CAP theorem in ML: Consistency vs. availabilityCAP theorem in ML: Consistency vs. availability

CAP theorem in ML: Consistency vs. availability

The CAP theorem has long been the unavoidable reality check for distributed database architects. However, as machine learning (ML) evolves from isolated model training to complex, distributed pipelines operating in real-time, ML engineers are discovering that these same fundamental constraints also apply to their systems. What was once considered primarily a database concern has become increasingly relevant in the AI engineering landscape.

Modern ML systems span multiple nodes, process terabytes of data, and increasingly need to make predictions with sub-second latency. In this distributed reality, the trade-offs between consistency, availability, and partition tolerance aren’t academic — they’re engineering decisions that directly impact model performance, user experience, and business outcomes.

This article explores how the CAP theorem manifests in AI/ML pipelines, examining specific components where these trade-offs become critical decision points. By understanding these constraints, ML engineers can make better architectural choices that align with their specific requirements rather than fighting against fundamental distributed systems limitations.

Quick recap: What is the CAP theorem?

The CAP theorem, formulated by Eric Brewer in 2000, states that in a distributed data system, you can guarantee at most two of these three properties simultaneously:

Consistency: Every read receives the most recent write or an error
Availability: Every request receives a non-error response (though not necessarily the most recent data)
Partition tolerance: The system continues to operate despite network failures between nodes

Traditional database examples illustrate these trade-offs clearly:

CA systems: Traditional relational databases like PostgreSQL prioritize consistency and availability but struggle when network partitions occur.
CP systems: Databases like HBase or MongoDB (in certain configurations) prioritize consistency over availability when partitions happen.
AP systems: Cassandra and DynamoDB favor availability and partition tolerance, adopting eventual consistency models.

What’s interesting is that these same trade-offs don’t just apply to databases — they’re increasingly critical considerations in distributed ML systems, from data pipelines to model serving infrastructure.

CAP theorem in ML: Consistency vs. availability

Where the CAP theorem shows up in ML pipelines

Data ingestion and processing

The first stage where CAP trade-offs appear is in data collection and processing pipelines:

Stream processing (AP bias): Real-time data pipelines using Kafka, Kinesis, or Pulsar prioritize availability and partition tolerance. They’ll continue accepting events during network issues, but may process them out of order or duplicate them, creating consistency challenges for downstream ML systems.

Batch processing (CP bias): Traditional ETL jobs using Spark, Airflow, or similar tools prioritize consistency — each batch represents a coherent snapshot of data at processing time. However, they sacrifice availability by processing data in discrete windows rather than continuously.

This fundamental tension explains why Lambda and Kappa architectures emerged — they’re attempts to balance these CAP trade-offs by combining stream and batch approaches.

Feature Stores

Feature stores sit at the heart of modern ML systems, and they face particularly acute CAP theorem challenges.

Training-serving skew: One of the core features of feature stores is ensuring consistency between training and serving environments. However, achieving this while maintaining high availability during network partitions is extraordinarily difficult.

Consider a global feature store serving multiple regions: Do you prioritize consistency by ensuring all features are identical across regions (risking unavailability during network issues)? Or do you favor availability by allowing regions to diverge temporarily (risking inconsistent predictions)?

Model training

Distributed training introduces another domain where CAP trade-offs become evident:

Synchronous SGD (CP bias): Frameworks like distributed TensorFlow with synchronous updates prioritize consistency of parameters across workers, but can become unavailable if some workers slow down or disconnect.

Asynchronous SGD (AP bias): Allows training to continue even when some workers are unavailable but sacrifices parameter consistency, potentially affecting convergence.

Federated learning: Perhaps the clearest example of CAP in training — heavily favors partition tolerance (devices come and go) and availability (training continues regardless) at the expense of global model consistency.

Model serving

When deploying models to production, CAP trade-offs directly impact user experience:

Hot deployments vs. consistency: Rolling updates to models can lead to inconsistent predictions during deployment windows — some requests hit the old model, some the new one.

A/B testing: How do you ensure users consistently see the same model variant? This becomes a classic consistency challenge in distributed serving.

Model versioning: Immediate rollbacks vs. ensuring all servers have the exact same model version is a clear availability-consistency tension.

Case studies: CAP trade-offs in production ML systems

Real-time recommendation systems (AP bias)

E-commerce and content platforms typically favor availability and partition tolerance in their recommendation systems. If the recommendation service is momentarily unable to access the latest user interaction data due to network issues, most businesses would rather serve slightly outdated recommendations than no recommendations at all.

Netflix, for example, has explicitly designed its recommendation architecture to degrade gracefully, falling back to increasingly generic recommendations rather than failing if personalization data is unavailable.

Healthcare diagnostic systems (CP bias)

In contrast, ML systems for healthcare diagnostics typically prioritize consistency over availability. Medical diagnostic systems can’t afford to make predictions based on potentially outdated information.

A healthcare ML system might refuse to generate predictions rather than risk inconsistent results when some data sources are unavailable — a clear CP choice prioritizing safety over availability.

Edge ML for IoT devices (AP bias)

IoT deployments with on-device inference must handle frequent network partitions as devices move in and out of connectivity. These systems typically adopt AP strategies:

Locally cached models that operate independently
Asynchronous model updates when connectivity is available
Local data collection with eventual consistency when syncing to the cloud

Google’s Live Transcribe for hearing impairment uses this approach — the speech recognition model runs entirely on-device, prioritizing availability even when disconnected, with model updates happening eventually when connectivity is restored.

Strategies to balance CAP in ML systems

Given these constraints, how can ML engineers build systems that best navigate CAP trade-offs?

Graceful degradation

Design ML systems that can operate at varying levels of capability depending on data freshness and availability:

Fall back to simpler models when real-time features are unavailable
Use confidence scores to adjust prediction behavior based on data completeness
Implement tiered timeout policies for feature lookups

DoorDash’s ML platform, for example, incorporates multiple fallback layers for their delivery time prediction models — from a fully-featured real-time model to progressively simpler models based on what data is available within strict latency budgets.

Hybrid architectures

Combine approaches that make different CAP trade-offs:

Lambda architecture: Use batch processing (CP) for correctness and stream processing (AP) for recency
Feature store tiering: Store consistency-critical features differently from availability-critical ones
Materialized views: Pre-compute and cache certain feature combinations to improve availability without sacrificing consistency

Uber’s Michelangelo platform exemplifies this approach, maintaining both real-time and batch paths for feature generation and model serving.

Consistency-aware training

Build consistency challenges directly into the training process:

Train with artificially delayed or missing features to make models robust to these conditions
Use data augmentation to simulate feature inconsistency scenarios
Incorporate timestamp information as explicit model inputs

Facebook’s recommendation systems are trained with awareness of feature staleness, allowing the models to adjust predictions based on the freshness of available signals.

Intelligent caching with TTLs

Implement caching policies that explicitly acknowledge the consistency-availability trade-off:

Use time-to-live (TTL) values based on feature volatility
Implement semantic caching that understands which features can tolerate staleness
Adjust cache policies dynamically based on system conditions

Design principles for CAP-aware ML systems

Understand your critical path

Not all parts of your ML system have the same CAP requirements:

Map your ML pipeline components and identify where consistency matters most vs. where availability is crucial
Distinguish between features that genuinely impact predictions and those that are marginal
Quantify the impact of staleness or unavailability for different data sources

Align with business requirements

The right CAP trade-offs depend entirely on your specific use case:

Revenue impact of unavailability: If ML system downtime directly impacts revenue (e.g., payment fraud detection), you might prioritize availability
Cost of inconsistency: If inconsistent predictions could cause safety issues or compliance violations, consistency might take precedence
User expectations: Some applications (like social media) can tolerate inconsistency better than others (like banking)

Monitor and observe

Build observability that helps you understand CAP trade-offs in production:

Track feature freshness and availability as explicit metrics
Measure prediction consistency across system components
Monitor how often fallbacks are triggered and their impact

Wondering where we’re headed next?

Our in-person event calendar is packed with opportunities to connect, learn, and collaborate with peers and industry leaders. Check out where we’ll be and join us on the road.

LLMOps Virtual Summit, May 2025LLMOps Virtual Summit, May 2025

Catch up on every session from LLMOps Virtual Summit, with sessions from the likes of Google, Unicef, Capital One, Linkedin and more…

Why do we need autonomous AI agents?

How to build autonomous AI agent with Google A2A protocol

Picture this: it’s 3 a.m., and a customer on the other side of the globe urgently needs help with their account. A traditional chatbot would wake up your support team with an escalation. But what if your AI agent could handle the request autonomously, safely, and correctly? That’s the dream, right?

The reality is that most AI agents today are like teenagers with learner’s permits; they need constant supervision. They might accidentally promise a customer a large refund (oops!) or fall for a clever prompt injection that makes them spill company secrets or customers’ sensitive data. Not ideal.

This is where Double Validation comes in. Think of it as giving your AI agent both a security guard at the entrance (input validation) and a quality control inspector at the exit (output validation). With these safeguards at a minimum in place, your agent can operate autonomously without causing PR nightmares.

How did I come up with the Double Validation idea?

These days, we hear a lot of talk about AI agents. I asked myself, “What is the biggest challenge preventing the widespread adoption of AI agents?” I concluded that the answer is trustworthy autonomy. When AI agents can be trusted, they can be scaled and adopted more readily. Conversely, if an agent’s autonomy is limited, it requires increased human involvement, which is costly and inhibits adoption.

Next, I considered the minimal requirements for an AI agent to be autonomous. I concluded that an autonomous AI agent needs, at minimum, two components:

Input validation – to sanitize input, protect against jailbreaks, data poisoning, and harmful content.
Output validation – to sanitize output, ensure brand alignment, and mitigate hallucinations.

I call this system Double Validation.

Given these insights, I built a proof-of-concept project to research the Double Validation concept.

In this article, we’ll explore how to implement Double Validation by building a multiagent system with the Google A2A protocol, the Google Agent Development Kit (ADK), Llama Prompt Guard 2, Gemma 3, and Gemini 2.0 Flash, and how to optimize it for production, specifically, deploying it on Google Vertex AI.

For input validation, I chose Llama Prompt Guard 2 just as an article about it reached me at the perfect time. I selected this model because it is specifically designed to guard against prompt injections and jailbreaks. It is also very small; the largest variant, Llama Prompt Guard 2 86M, has only 86 million parameters, so it can be downloaded and included in a Docker image for cloud deployment, improving latency. That is exactly what I did, as you’ll see later in this article.

The complete code for this project is available at github.com/alexey-tyurin/a2a-double-validation.

How to build it?

The architecture uses four specialized agents that communicate through the Google A2A protocol, each with a specific role:

How to build autonomous AI agent with Google A2A protocol — Image generated by author

Here’s how each agent contributes to the system:

Manager Agent: The orchestra conductor, coordinating the flow between agents
Safeguard Agent: The bouncer, checking for prompt injections using Llama Prompt Guard 2
Processor Agent: The worker bee, processing legitimate queries with Gemma 3
Critic Agent: The editor, evaluating responses for completeness and validity using Gemini 2.0 Flash

I chose Gemma 3 for the Processor Agent because it is small, fast, and can be fine-tuned with your data if needed — an ideal candidate for production. Google currently supports nine (!) different frameworks or methods for finetuning Gemma; see Google’s documentation for details.

I chose Gemini 2.0 Flash for the Critic Agent because it is intelligent enough to act as a critic, yet significantly faster and cheaper than the larger Gemini 2.5 Pro Preview model. Model choice depends on your requirements; in my tests, Gemini 2.0 Flash performed well.

I deliberately used different models for the Processor and Critic Agents to avoid bias — an LLM may judge its own output differently from another model’s.

Let me show you the key implementation of the Safeguard Agent:

Plan for actions

The workflow follows a clear, production-ready pattern:

User sends query → The Manager Agent receives it.
Safety check → The Manager forwards the query to the Safeguard Agent.
Vulnerability assessment → Llama Prompt Guard 2 analyzes the input.
Processing → If the input is safe, the Processor Agent handles the query with Gemma 3.
Quality control → The Critic Agent evaluates the response.
Delivery → The Manager Agent returns the validated response to the user.

Below is the Manager Agent’s coordination logic:

Time to build it

Ready to roll up your sleeves? Here’s your production-ready roadmap:

Local deployment

1. Environment setup

2. Configure API keys

3. Download Llama Prompt Guard 2

This is the clever part – we download the model once when we start Agent Critic for the first time and package it in our Docker image for cloud deployment:

Important Note about Llama Prompt Guard 2: To use the Llama Prompt Guard 2 model, you must:

Fill out the “LLAMA 4 COMMUNITY LICENSE AGREEMENT” at https://huggingface.co/meta-llama/Llama-Prompt-Guard-2-86M
Get your request to access this repository approved by Meta
Only after approval will you be able to download and use this model

4. Local testing

Screenshot for running main.py

Screenshot for running client

Screenshot for running tests

Production Deployment

Here’s where it gets interesting. We optimize for production by including the Llama model in the Docker image:

1. Setup Cloud Project in Cloud Shell Terminal

Access Google Cloud Console: Go to https://console.cloud.google.com
Open Cloud Shell: Click the Cloud Shell icon (terminal icon) in the top right corner of the Google Cloud Console
Authenticate with Google Cloud:

Create or select a project:

Enable required APIs:

3. Setup Vertex AI Permissions

Grant your account the necessary permissions for Vertex AI and related services:

3. Create and Setup VM Instance

Cloud Shell will not work for this project as Cloud Shell is limited to 5GB of disk space. This project needs more than 30GB of disk space to build Docker images, get all dependencies, and download the Llama Prompt Guard 2 model locally. So, you need to use a dedicated VM instead of Cloud Shell.

4. Connect to VM

Screenshot for VM

5. Clone Repository

6. Deployment Steps

Screenshot for agents in cloud

7. Testing

Screenshot for running client in Google Vertex AI

Screenshot for running tests in Google Vertex AI

Alternatives to Solution

Let’s be honest – there are other ways to skin this cat:

Single Model Approach: Use a large LLM like GPT-4 with careful system prompts
- Simpler but less specialized
- Higher risk of prompt injection
- Risk of LLM bias in using the same LLM for answer generation and its criticism
Monolith approach: Use all flows in just one agent
- Latency is better
- Cannot scale and evolve input validation and output validation independently
- More complex code, as it is all bundled together
Rule-Based Filtering: Traditional regex and keyword filtering
- Faster but less intelligent
- High false positive rate
Commercial Solutions: Services like Azure Content Moderator or Google Model Armor
- Easier to implement but less customizable
- On contrary, Llama Prompt Guard 2 model can be fine-tuned with the customer’s data
- Ongoing subscription costs
Open-Source Alternatives: Guardrails AI or NeMo Guardrails
- Good frameworks, but require more setup
- Less specialized for prompt injection

Lessons Learned

1. Llama Prompt Guard 2 86M has blind spots. During testing, certain jailbreak prompts, such as:

And

were not flagged as malicious. Consider fine-tuning the model with domain-specific examples to increase its recall for the attack patterns that matter to you.

2. Gemini Flash model selection matters. My Critic Agent originally used gemini1.5flash, which frequently rated perfectly correct answers 4 / 5. For example:

After switching to gemini2.0flash, the same answers were consistently rated 5 / 5:

3. Cloud Shell storage is a bottleneck. Google Cloud Shell provides only 5 GB of disk space — far too little to build the Docker images required for this project, get all dependencies, and download the Llama Prompt Guard 2 model locally to deploy the Docker image with it to Google Vertex AI. Provision a dedicated VM with at least 30 GB instead.

Conclusion

Autonomous agents aren’t built by simply throwing the largest LLM at every problem. They require a system that can run safely without human babysitting. Double Validation — wrapping a task-oriented Processor Agent with dedicated input and output validators — delivers a balanced blend of safety, performance, and cost.

Pairing a lightweight guard such as Llama Prompt Guard 2 with production friendly models like Gemma 3 and Gemini Flash keeps latency and budget under control while still meeting stringent security and quality requirements.

Join the conversation. What’s the biggest obstacle you encounter when moving autonomous agents into production — technical limits, regulatory hurdles, or user trust? How would you extend the Double Validation concept to high-risk domains like finance or healthcare?

Connect on LinkedIn: https://www.linkedin.com/in/alexey-tyurin-36893287/

The complete code for this project is available at github.com/alexey-tyurin/a2a-double-validation.

References

[1] Llama Prompt Guard 2 86M, https://huggingface.co/meta-llama/Llama-Prompt-Guard-2-86M

[2] Google A2A protocol, https://github.com/google-a2a/A2A

[3] Google Agent Development Kit (ADK), https://google.github.io/adk-docs/

Building & securing AI agents: A tech leader crash courseBuilding & securing AI agents: A tech leader crash course

Building & securing AI agents: A tech leader crash course

The AI revolution is racing beyond chatbots to autonomous agents that act, decide, and interface with internal systems.

Unlike traditional software, AI agents can be manipulated through language, making them vulnerable to attacks like prompt injection and they also introduce new security risks like excessive agency.

Join us for an exclusive deep dive with Sourabh Satish, CTO and co-founder at Pangea, as we explore the evolving landscape of AI agents and best practices for securing them.

This session covers:

Demos of MCP configuration and vulnerabilities to highlight how different architectures affect the agent’s attack surface.
An overview of existing security guardrails—from open source projects and cloud service provider offerings to commercial tools and DIY approaches.
A comparison of pros and cons across various guardrail solutions to help you choose the right approach for your use case.
Actionable best practices for implementing guardrails that secure your AI agents without slowing innovation.

This webinar is a must-attend for engineering leaders, AI engineers, and security leaders who want to understand and mitigate the risks of agentic software in an increasingly adversarial landscape.

What exactly is an AI agent – and how do you build one?What exactly is an AI agent – and how do you build one?

What exactly is an AI agent – and how do you build one?

What makes something an “AI agent” – and how do you build one that does more than just sound impressive in a demo?

I’m Nico Finelli, Founding Go-To-Market Member at Vellum. Starting in machine learning, I’ve consulted for Fortune 500s, worked at Weights & Biases during the LLM boom, and now I help companies get from experimentation to production with LLMs, faster and smarter.

In this article, I’ll unpack what AI agents actually are (and aren’t), how to build them step by step, and what separates teams that ship real value from those that stall out in proof-of-concept purgatory.

We’ll also take a close look at the current state of AI adoption, the biggest challenges teams face today, and the one thing that makes or breaks an agent system: evaluation.

Let’s dive in.

Where we are in the AI landscape

At Vellum, we recently partnered with Weaviate and LlamaIndex to run a survey of over 1,200 AI developers. The goal? To understand where people are when it comes to deploying AI in production.

What we found was pretty surprising: only 25% of respondents said they were live in production with their AI initiative. For all the hype around generative AI, most teams are still stuck in experimentation mode.

The biggest blocker? Hallucinations and prompt management. Over 57% of respondents said hallucinations were their number one challenge. And here’s the kicker: when we cross-referenced that with how people were evaluating their systems, we noticed a pattern.

The same folks struggling with hallucinations were the ones relying heavily on manual testing or user feedback as their main form of evaluation.

That tells me there’s a deeper issue here. If your evaluation process isn’t robust, hallucinations will sneak through. And most businesses don’t have automated testing pipelines yet, because AI applications tend to be highly specific to their use cases. So, the old rules of software QA don’t fully apply.

Bottom line: without evaluation, your AI won’t reach production. And if it does, it won’t last long.

What exactly is an AI agent – and how do you build one?

How successful companies build with LLMs

So, how are the companies that do get to production pulling it off?

First, they don’t just chase the latest shiny AI trend. They start with a clearly defined use case and understand what not to build. That discipline creates focus and prevents scope creep.

Second, they build fast feedback loops between software engineers, product managers, and subject matter experts. We see too many teams build something in isolation, hand it off, get delayed feedback, and then go back to the drawing board. That slows everything down.

The successful teams? They involve everyone from day one. They co-develop prompts, run tests together, and iterate continuously. About 65–70% of Vellum customers have AI in production, and these fast iteration cycles are a big reason why.

They also treat evaluation as their top priority. Whether that’s manual review, LLM-as-a-judge, or golden datasets, they don’t rely on vibes. They test, monitor, and optimize like it’s a software product – because it is.

The future of IoT is agentic and autonomousThe future of IoT is agentic and autonomous

The future of IoT is agentic and autonomous

According to recent Cisco research, it’s projected that by 2028, 68% of all customer service and support interactions with tech vendors will be handled by agentic AI. This makes sense, as 93% of respondents in the same study predict that a more personalized, predictive, and proactive service will be possible with agentic AI.

Agentic AI represents a departure from traditional automation, as it enables AI agents to act independently, make decisions with minimal human input, and learn from context. Earlier systems required humans to connect and monitor the automated workflows. However, agentic AI agents have reasoning abilities, memory, and task awareness.

Because agentic AI can enhance both the operation and management of interconnected devices, this development is particularly relevant to the IoT (Internet of Things). It can proactively address network issues that originate from misconfigurations, which results in stronger security, smarter networks, and more productive teams.

This theme takes center stage at Agentic AI Summit New York on June 5, in the session led by Capital One’s Srinath Godavarthi on ‘Agentic AI: The next frontier of generative AI…from simple tasks to goal-oriented autonomy’.

The future of IoT is agentic and autonomous

Agentic AI vs. traditional AI: A structural shift

Agentic AI introduces an essential shift in how intelligent systems are architected and deployed, transitioning from task-specific and supervised models to autonomous and goal-oriented agents that can make real-time decisions and adapt to their environment.

Intelligence is mainly static and narrowly scoped in traditional AI systems; models are trained offline and embedded into rule-based workflows, usually operating in a reactive way. A conventional AI system could, for example, classify network anomalies or trigger alerts, but still rely on humans to contextualize signals and respond.

Agentic AI systems, however, are formed of AI agents that have agency:

Memory. Continuous internal state across interactions and tasks.
Autonomy. Ability to plan, execute, and adapt strategies without needing real-time human input.
Task awareness. An understanding of task objectives, dependencies, and sub-task decomposition.
Learning and reasoning. Ability to infer causality and change behavior depending on data and environmental signals.

As these agents have intention-driven behavior, they can decide how to perform a task, and which task to undertake based on system-level goals. Agentic AI systems often employ multi-agent coordination, where distributed agents communicate, negotiate, and delegate tasks across a network, which is ideal for large-scale and latency-sensitive environments, such as the Internet of Things (IoT).

Taking an industrial IoT context, a traditional AI pipeline would:

Detect vibration anomalies in a turbine,
Flag it for a technician via a dashboard,
And wait for human triage.

In contrast, an agentic AI system would:

Detect the anomaly in the turbine,
Query contextual data (such as usage patterns),
Dispatch a robot or an autonomous drone to inspect the anomaly,
Adjust the operating parameters of any adjacent machinery,
Alert a technician only after composing a full incident report and proposing actions.

Taking IoT into consideration, agentic AI eliminates the need for centralized control and brittle scripting. Instead, it promotes systems capable of evolving and optimizing over time, which is essential for those dynamic environments where resilience, latency, and scalability are crucial.

Why agentic AI matters for IoT

The intersection between agentic AI and IoT is a vital issue in distributed computing. IoT systems are becoming increasingly dense and latency-sensitive, and traditional centralized or rule-based automation is struggling to scale. Agentic AI is the solution to this challenge, allowing for decentralized and real-time decision-making at the edge.

Agentic systems work as goal-oriented and persistent software agents that are capable of adaptive planning, context retention, and multi-agent coordination. When they’re deployed across IoT networks, like cities, smart homes, or factories, AI agents allow for dynamic and autonomous orchestration of device behaviors and inter-device logic.

Smart homes

Agentic agents are able to autonomously manage demand-side energy consumption by dynamically adjusting lighting, HVAC, and storage systems based on real-time information and historical usage.

Smart cities

Agents can manage distributed load balancing across the infrastructure in municipal contexts. They dynamically re-route traffic through real-time edge sensor inputs or by coordinating energy resources through microgrids and public utilities, according to forecasted demand.

Industrial IoT

Agentic AI allows for runtime decisioning in factory or logistics networks for quality control, maintenance, and asset reallocation. If an agent embedded in a vibration sensor detects abnormal patterns, it can issue localized shutdowns while updating a model used by agents in adjacent systems.

Real-world applications

IBM ATOM (Autonomous Threat Operations Machine) is a system created to enhance cybersecurity operations through the use of agentic AI. It works by orchestrating multiple specialized AI agents that collaborate to detect, investigate, and respond to security threats with minimal human intervention.

Abdelrahman Elewah and Khalid Elgazzar developed the IoT Agentic Search Engine (IoT-ASE) to address the challenges of fragmented IoT systems. These often impede seamless data sharing and coordinated management. By leveraging large language models (LLMs) and retrieval augmented generation (RAG) techniques, IoT-ASE can process large amounts of real-time IoT data.

Challenges and opportunities

With the increasing presence of agentic AI systems across IoT environments, several technical challenges need to be addressed. Edge computing still has limitations that create a critical challenge. Agents thrive on local autonomy, but many edge devices still don’t have enough compute power, energy efficiency, or memory to support complex and context-aware reasoning models under real-time constraints.

There are also trust and privacy concerns. Agents operate semi-independently, meaning that their decision-making has to be auditable and aligned with both security and data governance policies. Any agent hallucinations or misaligned goals can lead to real-world risks, particularly in safety-critical environments.

Inter-agent communication and communication is another challenge that needs to be addressed. Strong communication protocols and semantic interoperability are needed so that agents don’t become brittle or inefficient in collaborative tasks.

These challenges can, however, point to some opportunities. Developing open standards for agent behavior, audit traits, and coordination could lead to ecosystem-level interoperability. Feedback loops, task routing, and dynamic oversight are frameworks for human-agent collaboration that can blend human judgment with scalable automation.

Why this matters now

This convergence of agentic AI and IoT is happening in real-time. Organizations are pushing to deploy more intelligence and distributed systems, and the need for autonomous and goal-driven agents is an architectural need.

That’s why events like the Agentic AI Summit New York are so important. They offer a platform to showcase innovation and to confront the practical infrastructure, coordination, and safety challenges that exist with large-scale agentic deployments.

Co-located alongside the Generative AI Summit and the Chief AI Officer Summit, this event forms part of a broader conversation on how intelligence is being embedded not just at the system level, but also into leadership strategy and foundational model development—making it a key gathering point for those shaping the future of AI in all its forms.

Srinath Godavarthi’s session ‘Agentic AI: The next frontier of generative AI…from simple tasks to goal-oriented autonomy’ offers a deep dive into how agent-based systems are evolving from simple task execution into fully autonomous orchestration across industries.

As the convergence of agentic AI and IoT accelerates, keeping pace with how the ecosystem evolves is more important than ever. Platforms like IoT Global Network offer a window into the technologies, use cases, and players shaping this space.

Why responsible AI will need to be your new USPWhy responsible AI will need to be your new USP

Why responsible AI will need to be your new USP

Although AI has been the No. 1 trend since at least 2023, organizational AI governance is just slowly catching up.

But with rising AI incidents and new AI regulations, like the EU AI Act, it’s clear that an AI system that is not governed appropriately has the potential to cause substantial financial and reputational damage to a company.

In this article, we explore the benefits of AI governance, its key components, and best practices for minimal overhead and maximal trust. Additionally, we cover two cases of how to start building an AI governance, demonstrating how a dedicated AI governance tool can be the game changer.

Summary

Efficient AI governance mitigates financial and reputational risks through fostering responsible and safe AI.
Key components of AI governance include clear structures and processes to control and enable AI use. An AI management system, a risk management system, and an AI registry are part of that.
Start governing AI in your organization by implementing a suitable framework, collecting your AI use cases, and mapping the right requirements and controls to the AI use case.
The EU AI Act poses additional compliance risks due to hefty fines. AI governance tools can help in creating compliance, e.g. by mapping of the respective requirements to your AI use cases and automated processes.
When selecting a dedicated AI governance tool, look beyond traditional GRC functionalities, but for curated AI governance frameworks, integrated AI use case management, and the connection to development processes or even MLOps tools. This is part of our mission at trail. </aside>

A lack of effective AI governance can lead to severe AI incidents that can create physical or financial harm, reputational damage, or negative social impact. For instance, Zillow, a real estate company, suffered a $304 million loss and had to lay off 25% of its staff due to incorrectly predicted home prices, causing the company to overpay for purchases, relying on suggestions by AI models.

Similarly, Air Canada was facing financial losses stemming from a court ruling triggered by a chatbot’s wrong responses regarding the company’s bereavement rate policy.

These incidents highlight the importance of having AI governance measures in place, not just for ethical reasons, but to protect your business from avoidable harm. The demand for effective governance structures, safety nets, and responsible AI has been increasing drastically in the past months.

Companies, especially large enterprises, can only scale AI use cases when ensuring trustworthiness, safety, and compliance throughout the AI lifecycle.

However, AI governance is often perceived as slowing down performance and decision-making, as additional structures, policies, and processes can create manual overhead. But when implemented in an efficient and actionable manner, AI governance structures accelerate innovation, create business intelligence, and reduce the risk of costly issues later on. Put differently: You can move fast without breaking things.

By now, there are already dedicated AI governance tools out there that can help you to automate compliance processes, for instance, the writing of documentation or finding the evidence needed to fulfil your requirements.

But they also help you to bring structure to your AI governance program and curate best practices for you. As AI continues to reshape businesses, robust and automated governance practices will be the cornerstone of responsible AI adoption at scale.

At trail, we developed a process with actionable steps and specific documents you can request here to jump-start your strategy.

But what exactly is AI governance? AI governance is a structured approach to ensure AI systems work as intended, comply with the rules, and are non-harmful. It’s much more than classical compliance, though, as it tries to put the principles of responsible AI, such as fairness or transparency, into practice.

Some key components include:

Clear management structures, roles, and responsibilities
A systematic approach to development or procurement processes, fulfilling requirements and meeting controls (that includes risk management, quality management, and testing)
AI principles implemented in a company-wide AI policy with C-level support
AI registry as an inventory of all AI use cases and systems
AI impact assessments
Continuous monitoring (and audits) ofan AI system’s performance
Mechanisms to report to all relevant stakeholders

Why responsible AI will need to be your new USP

It’s becoming clear that companies’ focus is shifting — from “if” to “how” when it comes to AI governance implementation. Especially with the EU AI Act coming into force in August 2024 and first deadlines having already passed, it transitioned from a pioneering effort to a non-negotiable when scaling AI in the enterprise.

When starting to think about how to set up your AI governance program, it can make sense to start with your legal requirements and to tackle compliance challenges first.

This is because non-compliance can come with hefty fines, as seen in the EU AI Act, for instance, and it could open the door for discussions about responsible AI in your organization. However, it is important not to think of AI governance as merely a compliance program.

As regulatory compliance in itself can become quite complex, automated AI governance structures that cover e.g. EU AI Act provisions and that facilitate the categorization of AI systems will free up the resources needed for important innovation.

The EU AI Act 🇪🇺

The EU AI Act is the first comprehensive regulation on AI, aiming to ensure that AI systems used in the EU are safe, transparent, and aligned with fundamental rights. Non-compliance can result in penalties up to 35 Mio. EUR or 7% of the global turnover, whichever is higher.

Put in simple terms, it categorizes AI systems into four risk levels: unacceptable, high, limited, and minimal risk.

Unacceptable risk systems—like social scoring or real-time biometric surveillance—are banned entirely.

High-risk systems, such as HR candidate analyses, credit scoring, or safety components in critical infrastructure, must meet strict requirements, including risk management, transparency, human oversight, and detailed technical documentation.

Limited-risk systems, or those with a “certain risk” (e.g. chatbots interacting with humans) require basic transparency measures, and minimal-risk systems face no further obligations.

General-purpose AI (GPAI) models, like large language models, are covered separately under the Act, with dedicated obligations depending on whether the model is classified as a foundational GPAI or a high-impact GPAI, focusing on transparency, documentation, and systemic risk mitigation.

The requirements organizations face can differ on an AI use case basis. To identify the respective requirements, one must consider the risk class and role (such as provider, deployer, importer, or distributor) of the use case under the AI Act.

A transparent inventory of all AI use cases in the organization enables the categorization and lays the first step towards compliance. However, even if AI use cases across the organization are not accompanied by extensive requirements, implementing AI governance practices can speed up innovation and position an organization at the forefront of responsible AI practices.

But how can a company implement AI governance effectively?

The answer to this question is dependent on the status quo of the organization’s AI efforts and governance structures.

If no governance processes or even AI use cases have been established, the organization can fully capitalize on a greenfield approach and build up a holistic AI governance from scratch. If there are already compliance structures in place, aligning such structures across departments that use AI is crucial to achieve some sort of standardization.

Additionally, established IT governance practices should be updated to match the dynamic nature of AI use cases. This can result in shifting from static spreadsheets and internal wiki pages to a more efficient and dynamic tooling that enables you to supervise AI governance tasks.

Working with many different organizations on setting up their AI governance in the past years has let us find common structures and best practices that we are covering in our starter program, “the AI Governance Bootcamp,” so you don’t have to start from scratch. Find more information here.

First steps for getting started with AI Governance:

Choosing a framework: Don’t reinvent the wheel — start with existing frameworks like EU AI Act, ISO/IEC 42001, or NIST AI Risk Management and align them with your organization. The next step is to translate the frameworks’ components into actionable steps that integrate into your workflows. This cumbersome procedure can be fast-tracked by using dedicated software solutions that already embed the frameworks and respective controls, while leaving enough room for you to customize where needed.
Collection of AI use cases: The next step is to collect existing AI use cases and evaluate them according to their impact and risk. If the AI Act is applicable to your organization, it must also be categorized according to the EU AI Act’s risk levels and role specifications. This AI registry also gives you a great overview of the status quo of past, current, and planned AI initiatives, and is usually the starting point for AI governance on the project level.
Using tools to streamline governance: When setting up AI governance or shifting from silos to centralized structures, a holistic AI governance tool can be of great benefit. While a tool in general can be helpful to reduce manual overhead through automation, it facilitates the tracking of requirements, collaboration, and out-of-the-box use of an AI registry and categorization logic. Additionally, tools, such as those by trail, often already cover most of the AI governance frameworks, translate the (often vague) requirements into practical to-dos, and even automate some evidence collection processes.

Best practices for successful AI governance implementation

For AI governance to be as efficient as possible, some best practices can be deduced from pioneers in the field:

First and foremost, AI governance should not be just a structural change, but should always be accompanied by leadership and cultural commitment. This includes both the active support of ethical AI principles, but it can also shape an organization’s vision and brand.

This can be best achieved if the entire company is invested in responsible AI and involved with the required practices. Start by showing C-level support through executive thought leadership and assigning clear roles and responsibilities.

Additionally, having an AI policy is a crucial step to building a binding framework that supports the use of AI. The AI policy outlines which AI systems are permitted or prohibited to use, and which responsibilities every single employee has, e.g. in case of AI incidents or violations.

Another key driver for successful AI governance implementation is the ability to continuously monitor and manage the risks of AI systems. This means having regular risk assessments and implementing a system that can trigger alerts when passing certain risk and performance thresholds, and when incidents occur. Combining this with risk mitigation measures and appropriate controls ensures effective treatment.

Involving different stakeholders in the design and implementation process can greatly help to gather important feedback, but also includes providing ongoing training programs for all stakeholders to ensure AI literacy and promote a culture of transparency and accountability.

Real-world examples

To help you understand where to begin, we’re sharing two real-world client examples tailored to different stages of organizational maturity. One shows how we helped a client build AI governance from the ground up. The other highlights how we scaled AI governance within a highly regulated industry.

1.) The SME: Innovation consulting firm for startups and corporates

Our first client is an innovation consultancy for startups and mostly mid-cap corporates. Their processes need to be dynamic and fast, and should not block their limited human resources, with about 400–500 employees. As they did not have any AI governance structures in place and a small legal team, we’ve helped them to create their governance processes from scratch.

We focused on three main aspects: building up an AI policy, fostering AI literacy to comply with Article 4 of the EU AI Act, and setting up an AI registry with respective governance frameworks and workflows. All of this was conducted via our AI governance tool trail.

Especially the AI training, which was implemented by their HR, was a central part in enabling the workforce to leverage their new AI tools effectively. Building up the AI registry laid the foundation for transparency about all permitted AI use cases in the company, and is now used as the starting point for their EU AI Act compliance process per use case.

2.) The enterprise: IT in finance

Our second client example is a company operating in the financial industry with 5000+ employees. This firm already had lots of existing risk management and IT governance processes that were scattered across the organization,e.g. and several AI initiatives and products.

Together with a consultancy partner who was actively contributing to the change management, we started to aggregate processes to find an efficient solution for AI governance. We started by assessing the status quo with a comprehensive gap analysis of their governance structures and AI development practices.

This allowed us to pinpoint areas in need of adjustment or missing components, e.g., EU AI Act compliance. Stakeholder mapping and management is an essential part when distributing responsibilities of governance and oversight — this requires a lot of time.

In the next step, we built up an AI registry via the trail platform to have a single source of truth on what AI systems and use cases exist. The main challenge was the identification of these, as they were scattered across internal wiki pages and multiple IT asset inventories. Finally, this enabled us to classify all AI use cases and to connect them to the respective governance requirements, starting with the EU AI Act requirements.

In-house developed AI systems, in particular, require thorough documentation, which creates manual overhead for developers and other technical teams.

Therefore, technical documentation is being created using trail’s automated documentation engine, which gathers the needed evidence directly from code via its Python package and creates documentation using LLMs.

This helps to automatically track the relevant technical metrics and directly recycle them into documentation. This decreases the manual overhead drastically and increases the quality of documentation and citation.

As this client already had an established organization-wide risk management process, it was essential to integrate the management of AI-related risks into their existing process and tool. By integrating an extensive AI risk library and respective controls from trail, a dedicated process of risk identification and mitigation for AI could be established and fed into the existing process.

After starting to work with trail, our client was able to speed up processes, increase transparency, embed a clear AI governance structure, and save time with more efficient documentation generation. This positioned them as a pioneer of responsible AI among their clients and allowed for in-time preparation for the EU AI Act.

What to look for in governance tools

Especially the latter case shows how clever solutions are necessary in order to balance innovation and compliance when existing governance structures are not uniform.

As AI governance is evolving, traditional tools like spreadsheets no longer meet the complexity or pace of modern compliance needs. And rather than viewing regulation as a roadblock, teams should embed governance directly into their development workflows to speed up innovation processes by streamlining documentation, experimentation, and stakeholder engagement.

However, not every GRC tool is a good fit for AI governance. Here are some key features every efficient AI governance tool should have:

Curated governance frameworks: A good AI governance tool should offer a library of different governance frameworks, AI-related risks and controls, and automate the guidelines or requirements as much as possible. Such frameworks can either be ethical guidelines or include regulations like the EU AI Act or certifiable standards such as ISO/IEC 42001. Some tasks and requirements cannot be automated, but a tool should give you the guidance to complete and meet them efficiently.
AI use case management: A governance tool that helps you to manage AI use cases and depict your AI use case ideation process, not only supports you in thinking about governance requirements early on. It also allows you to make AI governance the enabler of AI innovation and business intelligence by having a central place to oversee how individual teams make use of AI in your company. From here, each stakeholder can see the information relevant to them and start their workflows. The AI registry is a must-have even for those who are less concerned with governance. Why not make it multipurpose?
Connection to developer environments: An effective tool lets you think about AI governance holistically and across the whole AI life cycle. This means that you need to integrate with the developer environments. By using a developer-friendly governance tool, you can directly extract valuable information from the code and data to automate the collection of evidence and create documentation. Further, it can also show you which tasks must be done during development already to achieve compliance and quality early on.

An example of such a tool, enabling effective AI governance, is the trail ****AI Governance Copilot. It’s designed for teams that want to scale AI responsibly, without slowing down innovation by getting stuck in compliance processes.

It helps you turn governance from a theoretical checkbox into a practical, day-to-day capability that actually fits how your team works. With trail, you can plug governance directly into your stack using reusable templates, flexible frameworks, and smart automation. That means less manual work for compliance officers and developers alike, but more consistency without having to reinvent the wheel every time.

As a result, trail bridges the gap between tech, compliance, and business teams, making it easy to coordinate tasks along the AI lifecycle. It reduces overhead while boosting oversight, quality, and audit-readiness. Because real AI governance doesn’t live in PDFs, wiki pages, and spreadsheets — it lives in the daily doing.

If you want to make responsible AI your competitive advantage, let us help you — we are happy to explore the right solution for you in a conversation. If you want to continue with your AI governance learning journey, you can do so here with our free AI governance framework.

Building efficient data pipelines for AI and NLP applications in AWSBuilding efficient data pipelines for AI and NLP applications in AWS

Building efficient data pipelines for AI and NLP applications in AWS

Advanced AI and NLP applications are in great demand in today’s modern world, wherein most businesses rely on data-driven insights and automation of business processes.

All applications of AI or NLP have a requirement for a data pipeline that can ingest data, process it, and provide output for training, inference, and subsequent decision making at a large scale. AWS is taken to be the cloud standard with its scalability and efficiency for building these pipelines.

In this article, we will discuss designing a high-performance data pipeline using only basic AWS services like Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker for AI and NLP applications.

This article discusses building a high-performance data pipeline for AI and NLP applications using core AWS services such as Amazon S3, AWS Lambda, AWS Glue, and Amazon SageMaker.

Why AWS for data pipelines?

AWS is the most preferred choice for building data pipelines because of its strong infrastructure, rich service ecosystem, and seamless integration with ML and NLP workflows. Azure, Google Cloud, and AWS also outperform open source tools like Apache Suite in terms of ease of use, operational reliability, and integration. Some of the benefits of using AWS are:

Scalability

AWS would automatically scale up or down because of its elasticity, hence always assuring high performance irrespective of the volume of data. Though Azure and Google Cloud provide features for scaling, the auto-scaling options available in AWS are more granular and customizable, hence providing finer control over resources and costs.

Flexibility and integration

AWS has various services that best fit the components of a data pipeline, including Amazon S3 for storage, AWS Glue for ETL, and Amazon Redshift for data warehousing. More so, seamless integrations with AWS ML services like Amazon SageMaker and NLP models make it perfect for AI-driven applications.

Cost efficiency

AWS’s pay-as-you-go pricing model ensures cost-effectiveness for businesses of all sizes. Unlike Google Cloud, which sometimes has a higher cost for similar services, AWS provides transparent and predictable pricing. Reserved Instances and Savings Plans further enable long-term cost optimization.

Reliability and global reach

AWS is built on extensive global infrastructure comprising several data centers across the world’s regions, ensuring high availability and low latency for users spread worldwide.

Whereas Azure also commands a formidable presence around the world, the sheer reliability and experience in operations favor AWS. AWS, moreover, provides compliance with a broad set of regulatory standards and hence finds a better proposition in healthcare and finance.

Security and governance

AWS provides default security features, including encryption, identity management, and more, to keep your data safe during the entire data pipeline. AWS provides AWS Audit Manager and AWS Artifact for maintaining compliance-much more advanced compared to what is available on other platforms.

So, by choosing AWS, organizations gain access to a scalable and reliable platform that simplifies the process of building, maintaining, and optimizing data pipelines. Its rich feature set, global reach, and advanced AI/ML integration make it a superior choice for both traditional and modern data workflows.

Building efficient data pipelines for AI and NLP applications in AWS

Some Key AWS services to use for data pipelines

Before discussing architecture, it is worth listing some key AWS services that almost always form part of a data pipeline for artificial intelligence or NLP applications:

Amazon S3 (Simple Storage Service): A kind of object storage that can scale to hold enormous amounts of unstructured data. In general, S3 acts like an entry point or an ingestion layer where raw datasets are normally stored.
Amazon Kinesis: A service for real-time data streaming, allowing ingesting and processing streams of data in real time. Applications requiring live data analysis can apply this.
AWS Lambda: A serverless compute service that allows you to run code without provisioning servers. Lambda is very helpful in event-driven tasks, for example: data transformation or triggering a workflow when newer data arrives.
AWS Glue: A fully managed extract, transform, and load (ETL) service that prepares and transforms data automatically for analysis. Glue can also act as a catalog for locating and organizing the data stored within various AWS services.
Amazon SageMaker: This is Amazon’s fully integrated service of machine learning, which greatly simplifies the process through which developers build, train, and deploy AI models. SageMaker works in conjunction with other AWS data pipeline services, providing an easier way to operationalize AI and NLP applications. As a result, it has become much easier for developers, scientists, and engineers to build integrated AI and NLP applications through SageMaker’s single-click workflows.

Designing an efficient data pipeline for AI and NLP

A structured approach is paramount in designing a really efficient data pipeline for AI/NLP applications. Further below are steps regarding how an effective pipeline can be designed on AWS.

Step 1: Ingesting data

Normally, the ingestion of data is the first activity in any data pipeline. AWS supports various methods of data ingesting, depending on the source and nature of the data.

In general, data can be unstructured, like text, images, or audio, for example. In that case, it usually would have Amazon S3 as a starting point for data storage. On the other side, Amazon Kinesis would work just fine for real-time data ingesting that may emanate from sensors, social media, or streaming platforms.

Most of the time, this is something that forms a starting point of your big data pipeline, tasked with collecting and processing huge amounts of data in real time before sending it further down the pipeline.

Step 2: Data transformation and preparation

Once ingested, the data should be prepared and transformed into a format appropriate for AI and NLP models. AWS has the following tools for this purpose:

AWS Glue: It is the ETL service that makes the process of data extraction from one storage to another much smoother and straightforward.

In cleaning and preprocessing text data as preparation for several NLP tasks, such as tokenization, sentiment analysis, or entity extraction, this tool can be applied. AWS Lambda is another very useful tool for performing event-driven transformations.

The latter automatically preprocesses data in S3 for newly uploaded objects, which could mean filtering, format conversion, or enriching with additional metadata. Several applications of AI in real-time prediction can be directly fed into Amazon SageMaker for training or inference.

Running the batch jobs periodically through the scheduling of AWS Glue and Lambdas will ensure that data is always up to date.

Step 3: Model training and inference

Following data preparation, this is where the actual training of AI and NLP models happens. Amazon SageMaker is a comprehensive environment for machine learning to rapidly build, train, and deploy models.

Thus, one can now use SageMaker in this regard to support popular ML frameworks like TensorFlow, PyTorch, and scikit-learn, which directly integrates with S3 for access to the training data.

Similar to specific tasks involving NLP, such as sentiment analysis, machine translation, and named entity recognition, there is a built-in solution within SageMaker for natural language processing. Seamlessly train your NLP models on SageMaker and scale up the training on several instances for faster performance.

Finally, SageMaker deploys the trained model to an endpoint that can then be used to make predictions in real time. These results can then be written back to S3 or even to a streaming service like Amazon Kinesis Data Streams for further processing.

Step 4: Monitoring and optimization

The proper workflow is to keep a check on the performance-security of both the data pipeline and the models. AWS provides some sets of monitoring tools, such as Amazon CloudWatch, to track metrics, log errors, and trigger alarms in case of something wrong happening with the pipeline.

You can also use Amazon SageMaker Debugger and Amazon SageMaker Model Monitor to monitor the performance of ML models in production. These allow you to detect anomalies, monitor for concept drift, and otherwise make sure your model keeps performing as expected over time.

Step 5: Integrating with data analytics for fast query and exploration

Once your data pipeline is processing and producing outputs, enabling efficient querying and exploration is key to actionable insights. This ensures not only that your data is available but also accessible in a form that allows for rapid analysis and decision-making.

AWS provides robust tools to integrate analytics capabilities directly into the pipeline:

Amazon OpenSearch Service: to offer full-text search, data visualization, and fast querying over large datasets, something important, especially when analyzing logs or results originating from NLP tasks.
Amazon Redshift: Run complex analytics queries on structured data at scale. Redshift’s powerful SQL capabilities allow seamless exploration of processed datasets, enabling deeper insights and data-driven decisions.

This integration pipes the transformed data out to immediate exploration in real-time and/or to downstream analytics, giving a complete view of outputs from your pipeline, which brings us to the next section, where we’ll explore best practices for building efficient and scalable AWS data pipelines.

Best practices for building AWS data pipelines

Leverage serverless architecture

You can do this by using AWS Lambda and AWS Glue to simplify your architecture by reducing the overhead involved in the management of servers. In that respect, a serverless architecture will ensure that your pipeline scales seamlessly, utilizing only those resources actually used. Thus, it becomes optimized for performance and cost.

Automate data processing

Leverage event-driven services like AWS Lambda to have events triggered automatically for the transformation and processing of incoming data. This ensures minimum human intervention and that the pipeline is up and running smooth, without delays.

Optimize storage costs

Utilize different storage classes in Amazon S3, such as S3 Standard, S3 Infrequent Access, and S3 Glacier, to balance cost and access needs. For infrequently accessed data, use Amazon S3 Glacier to store the data at a lower cost while ensuring that it’s still retrievable when necessary.

Implement skey

Ensure that your data pipeline adheres to security and compliance standards by using AWS Identity and Access Management (IAM) for role-based access control, encrypting data at rest with AWS Key Management Service (KMS), and monitoring network traffic with AWS CloudTrail.

Security and compliance are key in AWS data pipelines, considering the present landscape where data breaches are rampant, coupled with growing regulatory scrutiny.

Therefore, this may involve sensitive information such as personally identifiable information, financial records, or healthcare data that demands strong security measures against unauthorized access to protect against possible monetary or reputational damage.

AWS KMS ensures data at rest is encrypted, making it unreadable even if compromised, while AWS IAM enforces role-based access to restrict data access to authorized personnel only, reducing the risk of insider threats.

Compliance with regulations like GDPR, HIPAA, and CCPA is crucial for avoiding fines and legal complications, while tools like AWS CloudTrail help track pipeline activities, enabling quick detection and response to unauthorized access or anomalies.

Beyond legal requirements, secure pipelines foster customer trust by showcasing responsible data management and preventing breaches that could disrupt operations. A robust security framework also supports scalability, keeping pipelines resilient and protected against evolving threats as they grow.

It is important to keep prioritizing security and compliance so that organizations not only safeguard data but also enhance operational integrity and strengthen customer confidence.

Conclusion

AWS is a strong choice for building AI and NLP data pipelines primarily because of its intuitive user interface, robust infrastructure, and comprehensive service offerings.

Services like Amazon S3, AWS Lambda, and Amazon SageMaker simplify the development of scalable pipelines, while AWS Glue streamlines data transformation and orchestration. AWS’s customer support and extensive documentation further enhance its appeal, making it relatively quick and easy to set up pipelines.

To advance further, organizations can explore integrating AWS services like Amazon Neptune for graph-based data models, ideal for recommendation systems and relationship-focused AI.

For advanced AI capabilities, leveraging Amazon Forecast for predictive analytics or Amazon Rekognition for image and video analysis can open new possibilities. Engaging with AWS Partner Network (APN) solutions can also offer tailored tools to optimize unique AI and NLP workflows.

By continuously iterating on architecture and using AWS’s latest innovations, businesses can remain competitive while unlocking the full potential of their data pipelines.

However, AWS may not always be the best option depending on specific needs, such as cost constraints, highly specialized requirements, or multi-cloud strategies. While its infrastructure is robust, exploring alternatives like Google Cloud or Azure can sometimes yield better solutions for niche use cases or integration with existing ecosystems.

To maximize AWS’s strengths, organizations can leverage its simplicity and rich service catalog to build effective pipelines while remaining open to hybrid or alternative setups when business goals demand it.