دسته: other

Turning structured data into ROI with genAITurning structured data into ROI with genAI

Turning structured data  into ROI with genAI

At GigaSpaces, we’ve been in the data management game for over twenty years. We specialize in mission-critical, real-time software solutions, and over the past two decades, we’ve seen just how essential structured data is, whether it resides in a traditional database, an Excel sheet, or a humble CSV file.

Every company, regardless of its size or industry, relies on structured data. Maybe it’s the bulk of their operations, maybe just a slice, but either way, the need for fast, reliable access to that data is universal. 

Of course, what “real-time” means varies depending on the business. For some, it’s milliseconds; for others, hours might do. However, the expectation remains the same: access must be seamless, fast, and dependable.

The reality of enterprise data management

Let’s talk about the real challenge: enterprise data is hard to work with.

Even when structured, it’s often fragmented across systems, stored in outdated databases, or locked behind poorly configured infrastructure. Many organizations are still running on databases built twenty or thirty years ago. And as anyone who’s tried knows, fixing those systems is a monumental task, often one attempted only once and never repeated. Once bitten, twice shy.

So, how do we give business users the access they need without overhauling everything?

That’s where things get complicated. Enterprises have layered on workaround after workaround: ETL pipelines, data warehouses, operational data stores, data lakes, caching layers, you name it. Each is a patch or workaround designed to move, manipulate, and surface data for reporting or analysis.

But every added layer introduces more complexity, more latency, and more chances for something to go wrong.

Why traditional BI is no longer enough

For years, Business Intelligence (BI) has been the go-to solution for helping users visualize and interpret data. Everyone here is familiar with it; you probably have a BI tool running right now.

But BI isn’t enough anymore.

While it serves a purpose, traditional BI platforms only show a limited slice of the full data picture. They’re constrained by what’s been extracted, transformed, and loaded into the data warehouse. If it doesn’t make it into the warehouse, it won’t appear in the dashboard. That means critical context and nuance often get lost.

Analysts today need more than just static reports. They want to slice and dice data, follow up with deeper questions, drill down into specifics, and do all of this without filing a ticket or waiting days for a response. The modern business user expects the ability to interact with data in real time, in the flow of work.

So, the question is: can we actually enable that?

The evolution toward smarter data access

We’re in the middle of a major shift. While BI isn’t going away, traditional reports still serve their purpose; we’re clearly moving into the next phase of data interaction.

Natural language processing (NLP), AI copilots, and more dynamic querying interfaces are emerging. The goal? To simplify access. Imagine this: connect directly to your database, ask a business question in plain English, and get an instant answer.

That’s the vision.

And to a surprising extent, we’re starting to see it come to life. Consider the rise of Retrieval-Augmented Generation (RAG). How many of your companies are already experimenting with RAG? From what we’ve seen, that’s about 60–70%.

RAG is an exciting technique, especially when dealing with unstructured or semi-structured data. But let’s park that for now. We’ll return to it shortly.

AI-powered NLP enhances legal aid at Justice Connect
Justice Connect uses NLP technology to improve efficiency and provide faster legal aid to disadvantaged individuals in Australia.
Turning structured data  into ROI with genAI

Just ask: Making data truly accessible through NLQ

At GigaSpaces, our motto is simple: just ask.

We believe business users, whether they’re technical, semi-technical, or purely business-oriented, should be able to ask a question and get an answer instantly. If a CEO is heading into a board meeting and needs data on performance, risk, or opportunity, they should be able to ask for it directly.

Natural language querying (NLQ) makes this possible.

Imagine asking: What are my high-risk portfolios? Or: Show me client investment distribution. Or: How are we performing on compliance monitoring? No SQL, no dashboards: just a question, and an answer.

Interestingly, one of our recent prospects was from procurement. They weren’t the obvious audience for a data tool, but once they saw what NLQ could do, they wanted in. Why? Because they needed to compare vendor pricing, pulling internal data and matching it against public sources. It turns out, everyone in the organization wants fast, intelligent access to data.

Technology is great, but business value comes first

Let’s start with something even more important than the technology: business value.

As technologists, it’s easy to get swept up in the excitement of new tools. We play, we experiment, we test with R&D. But at the end of the day, what really matters is this: does it deliver value to the business?

If 80% of the organization adopts a tool, that’s great, but only if that adoption translates into measurable outcomes. Are we saving time? Reducing costs? Increasing decision velocity?

Too many tools are “nice to have.” They make your day 1% easier, but that’s not enough to justify the investment. With NLQ and technologies like RAG, we’re not just adding convenience. We’re flipping the paradigm.

With eRAG we’re turning everyday users into power users by letting them interact with data directly. That’s a big deal, especially when most organizations are still stuck in the mindset of “we’ve got a few reports, it is what it is.”

RAG and similar techniques are changing that. They’re making data feel accessible again. But here’s the catch: most RAG implementations are built on unstructured or semi-structured data, and the results aren’t real-time. You vectorize data, you query it, but you’re essentially querying yesterday’s data.

That’s fine for some use cases. But for healthcare, asset management, or retail? Yesterday’s data isn’t good enough. In those domains, a delay of even an hour can be too late.

So, how do we bridge that gap?

Applications of deep learning in healthcare
Jonathan Rubin, Senior Scientist at Philips Research, outlines various applications of ML & DL in healthcare, emphasizing their unique benefits.
Turning structured data  into ROI with genAI

Beyond RAG: Table-augmented generation and metadata intelligence

There is a better way.

One emerging approach is what some are calling Table-Augmented Generation (TAG). Think of it as applying the principles of RAG, but over structured metadata. We’re talking about vectorizing metadata, using graph RAG to identify relationships and connections, even between tables that aren’t explicitly linked.

It’s not just clever; it’s practical. Behind the scenes, we’re layering in traditional and semantic caching, schema linking, and building a semantic layer that stretches across multiple databases. Users can connect to two, three, or even fifty databases and build a unified semantic map without accessing the raw data.

And no, we’re not building a catalog or implementing MDM. If you’ve ever tried that, you know it’s a nightmare. This isn’t about solving the entire organization’s data taxonomy. It’s about solving for each business unit individually, allowing them to work in their own language, with their own vocabulary and semantics.

This flexibility is key, and yes, AI governance and security are baked in. That’s a whole topic on its own, but worth noting here: it’s not an afterthought.

The product behind all this is something we call Enterprise RAG, or eRAG. It exposes an API that users can integrate directly or call via REST. It’s model-agnostic, cloud-agnostic, and it just works. Check it out in more detail here.

Implementing a semantic layer that learns from users

Here’s the kicker: the solution is SaaS. Whether your data resides on-premises or in the cloud, we connect, extract the metadata, and build a semantic layer using five to seven behind-the-scenes techniques to optimize for comprehension and usability.

From the user’s point of view? All they have to do is ask a question.

Even better, those questions help train the system. When users respond with feedback, positive or negative, it fine-tunes the semantic layer. If something’s off, they can simply say so, in natural language, and the platform adapts.

This isn’t a developer tool. It’s not a Python library. It’s a human interface to structured data, and that’s where the magic is. Accuracy and simplicity, combined.

Whether you choose to build this kind of system yourself or opt for a ready-to-go solution, usability is key.

Final thoughts

As enterprises wrestle with fragmented data and rising expectations for speed and accessibility, the future of data management is clear: it’s about empowering every user to get answers in real time, without layers of complexity in the way. 

Technologies like NLQ, TAG, and Enterprise RAG are shifting the focus from infrastructure to impact, turning data from a bottleneck into a true business enabler. The path forward isn’t just about adopting smarter tools; it’s about reimagining how people and data interact, so that insight is always just a question away.

Ready to turn your data into answers? Discover how eRAG and NLQ can unlock real-time insight for your team. Reach out to learn more or see it in action.

How TigerEye is redefining AI-powered business intelligenceHow TigerEye is redefining AI-powered business intelligence

How TigerEye is redefining  AI-powered business intelligence

At the Generative AI Summit in Silicon Valley, Ralph Gootee, Co-founder of TigerEye, joined Tim Mitchell, Business Line Lead, Technology at the AI Accelerator Institute, to discuss how AI is transforming business intelligence for go-to-market teams.

In this interview, Ralph shares lessons learned from building two companies and explores how TigerEye is rethinking business intelligence from the ground up with AI, helping organizations unlock reliable, actionable insights without wasting resources on bespoke analytics.

Tim Mitchell: Ralph, it’s a pleasure to have you here. We’re on day two of the Generative AI Summit, part of AI Silicon Valley. You’re a huge part of the industry in Silicon Valley, so it’s amazing to have you join us. TigerEye is here as part of the event. Maybe for folks that aren’t familiar with the brand, you can just give a quick rundown of who you are and what you’re doing.

Ralph: I’m the co-founder of TigerEye – my second company. It’s exciting to be solving some of the problems we had with our first company, PlanGrid, in this one. We sold PlanGrid to Autodesk. I had a really good time building it. But when you’re building a company, you end up having many internal metrics to track, and a lot of things that happen with sales. So, we built a data team.

With TigerEye, we’re using AI to help build that data team for other companies, so they can learn from our past mistakes. We’re helping them build business intelligence that’s meant for go-to-market, so sales, marketing, and finance all together in one package.

Lessons learned from PlanGrid

Tim: What were some of those mistakes that you’re now helping others avoid?

Ralph: The biggest one was using highly skilled resources to build internal analytics, time that could’ve gone into building customer-facing features. We had talented data engineers figuring out sales metrics instead of enhancing our product. That’s a key learning we bring to TigerEye.

What makes TigerEye unique

Tim: If I can describe TigerEye in short as an AI analyst for business intelligence, what’s unique about TigerEye in that space?

Ralph: One of the things that’s unique is we were built from the ground up for AI. Where a lot of other companies are trying to tack on or figure out how they’re going to work with AI, TigerEye was built in generative AI as a world. Rather than relying on text or trying to gather up metrics that could cause hallucination, we actually write SQL from the bottom up. Our platform is built on SQL, so we can give answers that show your math. You can see why the win rate is that, and it will decrease over time.

Why Generative AI Summit matters

Tim: And what’s interesting about this conference for you?

Ralph: The conference brings together both big companies and startups. It’s really nice to have conversations with companies that have more mature data issues, versus startups that are just figuring out how their sales motions work.

The challenges of roadmapping in AI

Tim: You’re the co-founder, but as CTO, in what kind of capacity does the roadmapping cause you headaches? What does that process look like for a solution like this?

Ralph: In the AI world, roadmapping is challenging because it keeps getting so much better so quickly. The only thing you know for sure is you’re going to have a new model drop that really moves things forward. Thankfully for us, we solve what we see as the hardest part of AI, giving 100% accurate answers. We still haven’t seen foundational models do that on their own, but they get much better at writing code.

So the way we’ve taught to write SQL, and how we work with foundational models, both go into the roadmap. Another part is what foundational models we support. Right now, we work with OpenAI, Gemini, and Anthropic. Every time there’s a new model drop, we evaluate it and think about whether we want to bring that in.

Evaluating and choosing models

Tim: How do you choose which model to use?

Ralph: There are two major things. One, we have a full evaluation framework. Since we specialize in sales questions, we’ve seen thousands of sales questions, and we know what the answer should be and how to write the code for them. We run new models through that and see how they look.

The other is speed. Latency really matters; people want instant responses. Sometimes, even within the same vendor, the speed will vary model by model, but that latency is important.

The future of AI-powered business intelligence

Tim: What’s next for you guys? Any AI-powered revelations we can expect?

Ralph: We think AI is going to be solved first in business intelligence in deep vertical sections. It’s hard to imagine AI solving a Shopify company’s challenge and also a supply chain challenge for an enterprise. We’re going deep into verticals to see what new features AI has to understand.

For example, in sales, territory management is a big challenge: splitting up accounts, segmenting business. We’re teaching AI how to optimize territory distribution and have those conversations with our customers. That’s where a lot of our roadmap is right now.

Who’s adopting AI business intelligence?

Tim: With these new products, who are you seeing the biggest wins with?

Ralph: Startups and mid-market have a good risk tolerance for AI products. Enterprises, we can have deep conversations, but it’s a slower process. They’re forming their strategic AI teams but not getting deep into it yet. Startups and mid-market, especially AI companies themselves, are going full-bore.

Tim: And what are the risks or doubts that enterprises might have?

Ralph: Most enterprises have multiple AI teams, and they don’t even know it. It happened out of nowhere. Then they realize they need an AI visionary to lead those teams. The AI visionary is figuring out their job, and the enterprise is going through that process.

The best enterprises focus on delivering more value to their customers with fewer resources. We’re seeing that trend – how do I get my margins up and lower my costs?

Final thoughts

As AI continues to reshape business intelligence, it’s clear that success will come to those who focus on practical, reliable solutions that serve real go-to-market needs. 

TigerEye’s approach, combining AI’s power with transparent, verifiable analytics, offers a glimpse into the future of business intelligence: one where teams spend less time wrestling with data and more time acting on insights. 

As the technology evolves, the companies that go deep into vertical challenges and stay laser-focused on customer value will be the ones leading the charge.

Why agentic AI pilots fail and how to scale safelyWhy agentic AI pilots fail and how to scale safely

Why agentic AI pilots fail and how to scale safely

At the AI Accelerator Institute Summit in New York, Oren Michels, Co-founder and CEO of Barndoor AI, joined a one-on-one discussion with Alexander Puutio, Professor and Author, to explore a question facing every enterprise experimenting with AI: Why do so many AI pilots stall, and what will it take to unlock real value?

Barndoor AI launched in May 2025. Its mission addresses a gap Oren has seen over decades working in data access and security: how to secure and manage AI agents so they can deliver on their promise in enterprise settings.

“What you’re really here for is the discussion about AI access,” he told the audience. “There’s a real need to secure AI agents, and frankly, the approaches I’d seen so far didn’t make much sense to me.”

AI pilots are being built, but Oren was quick to point out that deployment is where the real challenges begin.

As Alexander noted:

“If you’ve been around AI, as I know everyone here has, you’ve seen it. There are pilots everywhere…”

Why AI pilots fail

Oren didn’t sugarcoat the current state of enterprise AI pilots:

“There are lots of them. And many are wrapping up now without much to show for it.”

Alexander echoed that hard truth with a personal story. In a Forbes column, he’d featured a CEO who was bullish on AI, front-loading pilots to automate calendars and streamline doctor communications. But just three months later, the same CEO emailed him privately:

“Alex, I need to talk to you about the pilot.”

The reality?

“The whole thing went off the rails. Nothing worked, and the vendor pulled out.”

Why is this happening? According to Oren, it starts with a misconception about how AI fits into real work:

“When we talk about AI today, people often think of large language models, like ChatGPT. And that means a chat interface.”

But this assumption is flawed.

“That interface presumes that people do their jobs by chatting with a smart PhD about what to do. That’s just not how most people work.”

Oren explained that most employees engage with specific tools and data. They apply their training, gather information, and produce work products. That’s where current AI deployments miss the mark, except in coding:

“Coding is one of those rare jobs where you do hand over your work to a smart expert and say, ‘Here’s my code, it’s broken, help me fix it.’ LLMs are great at that. But for most functions, we need AI that engages with tools the way people do, so it can do useful, interesting work.”

Why agentic AI pilots fail and how to scale safely

The promise of agents and the real bottleneck

Alexander pointed to early agentic AI experiments, like Devin, touted as the first AI software engineer:

“When you actually looked at what the agent did, it didn’t really do that much, right?”

Oren agreed. The issue wasn’t the technology; it was the disconnect between what people expect agents to do and how they actually work:

“There’s this promise that someone like Joe in finance will know how to tell an agent to do something useful. Joe’s probably a fantastic finance professional, but he’s not part of that subset who knows how to instruct computers effectively.”

He pointed to Zapier as proof: a no-code tool that didn’t replace coders.

“The real challenge isn’t just knowing how to code. It’s seeing these powerful tools, understanding the business problems, and figuring out how to connect the two. That’s where value comes from.”

And too often, Oren noted, companies think money alone will solve it. CEOs invest heavily and end up with nothing to show because:

“Maybe the human process, or how people actually use these tools, just isn’t working.”

This brings us to what Oren called the real bottleneck: access, not just to AI, but what AI can access.

“We give humans access based on who they are, what they’re doing, and how much we trust them. But AI hasn’t followed that same path. Just having AI log in like a human and click around isn’t that interesting; that’s just scaled-up robotic process automation.”

Instead, enterprises need to define:

  • What they trust an agent to do
  • The rights of the human behind it
  • The rules of the system it’s interacting with
  • And the specific task at hand

These intersect to form what Oren called a multi-dimensional access problem:

“Without granular controls, you end up either dialing agents back so much they’re less useful than humans, or you risk over-permissioning. The goal is to make them more useful than humans.”

Why specialized agents are the future (and how to manage the “mess”)

As the conversation shifted to access, Alexander posed a question many AI leaders grapple with: When we think about role- and permission-based access, are we really debating the edges of agentic AI?

“Should agents be able to touch everything, like deleting Salesforce records, or are we heading toward hyper-niche agents?”

Oren was clear on where he stands:

“I’d be one of those people making the case for niche agents. It’s the same as how we hire humans. You don’t hire one person to do everything. There’s not going to be a single AI that rules them all, no matter how good it is.”

Instead, as companies evolve, they’ll seek out specialized tools, just like they hire specialized people.

“You wouldn’t hire a bunch of generalists and hope the company runs smoothly. The same will happen with agents.”

But with specialization comes complexity. Alexander put it bluntly:

“How do we manage the mess? Because, let’s face it, there’s going to be a mess.”

Oren welcomed that reality:

“The mess is actually a good thing. We already have it with software. But you don’t manage it agent by agent, there will be way too many.”

The key is centralized management:

  • A single place to manage all agents
  • Controls based on what agents are trying to do, and the role of the human behind them
  • System-specific safeguards, because admins (like your Salesforce or HR lead) need to manage what’s happening in their domain

“If each agent or its builder had its own way of handling security, that wouldn’t be sustainable. And you don’t want agents or their creators deciding their own security protocols – that’s probably not a great idea.”

Why agentic AI pilots fail and how to scale safely

Why AI agents need guardrails and onboarding

The question of accountability loomed large. When humans manage fleets of AI agents, where does responsibility sit?

Oren was clear:

“There’s human accountability. But we have to remember: humans don’t always know what the agents are going to do, or how they’re going to do it. If we’ve learned anything about AI so far, it’s that it can have a bit of a mind of its own.”

He likened agents to enthusiastic interns – eager to prove themselves, sometimes overstepping in their zeal:

“They’ll do everything they can to impress. And that’s where guardrails come in. But it’s hard to build those guardrails inside the agent. They’re crafty. They’ll often find ways around internal limits.”

The smarter approach? Start small:

  • Give agents a limited scope.
  • Watch their behavior.
  • Extend trust gradually, just as you would with a human intern who earns more responsibility over time.

This led to the next logical step: onboarding. Alexander asked whether bringing in AI agents is like an HR function.

Oren agreed and shared a great metaphor from Nvidia’s Jensen Huang:

“You have your biological workforce, managed by HR, and your agent workforce, managed by IT.”

Just as companies use HR systems to manage people, they’ll need systems to manage, deploy, and train AI agents so they’re efficient and, as Alexander added, safe.

How to manage AI’s intent

Speed is one of AI’s greatest strengths and risks. As Oren put it:

“Agents are, at their core, computers, and they can do things very, very fast. One CISO I know described it perfectly: she wants to limit the blast radius of the agents when they come in.”

That idea resonated. Alexander shared a similar reflection from a security company CEO:

“AI can sometimes be absolutely benevolent, no problem at all, but you still want to track who’s doing what and who’s accessing what. It could be malicious. Or it could be well-intentioned but doing the wrong thing.”

Real-world examples abound from models like Anthropic’s Claude “snitching” on users, to AI trying to protect its own code base in unintended ways.

So, how do we manage the intent of AI agents?

Oren drew a striking contrast to traditional computing:

“Historically, computers did exactly what you told them; whether that’s what you wanted or not. But that’s not entirely true anymore. With AI, sometimes they won’t do exactly what you tell them to.”

That makes managing them a mix of art and science. And, as Oren pointed out, this isn’t something you can expect every employee to master:

“It’s not going to be Joe in finance spinning up an agent to do their job. These tools are too powerful, too complex. Deploying them effectively takes expertise.”

Why pilots stall and how innovation spreads

If agents could truly do it all, Oren quipped:

“They wouldn’t need us here, they’d just handle it all on their own.”

But the reality is different. When Alexander asked about governance failures, Oren pointed to a subtle but powerful cause of failure. Not reckless deployments, but inertia:

“The failure I see isn’t poor governance in action, it’s what’s not happening. Companies are reluctant to really turn these agents loose because they don’t have the visibility or control they need.”

The result? Pilot projects that go nowhere.

“It’s like hiring incredibly talented people but not giving them access to the tools they need to do their jobs and then being disappointed with the results.”

In contrast, successful AI deployments come from open organizations that grant broader access and trust. But Oren acknowledged the catch:

“The larger you get as a company, the harder it is to pull off. You can’t run a large enterprise that way.”

So, where does innovation come from?

“It’s bottom-up, but also outside-in. You’ll see visionary teams build something cool, showcase it, and suddenly everyone wants it. That’s how adoption spreads, just like in the API world.”

And to bring that innovation into safe, scalable practice:

  • Start with governance and security so people feel safe experimenting.
  • Engage both internal teams and outside experts.
  • Focus on solving real business problems, not just deploying tech for its own sake.

Oren put it bluntly:

“CISOs and CTOs, they don’t really have an AI problem. But the people creating products, selling them, managing finance – they need AI to stay competitive.”

Why agentic AI pilots fail and how to scale safely

Trusting AI from an exoskeleton to an independent agent

The conversation circled back to a critical theme: trust.

Alexander shared a reflection that resonated deeply:

“Before ChatGPT, the human experience with computers was like Excel: one plus one is always two. If something went wrong, you assumed it was your mistake. The computer was always right.”

But now, AI behaves in ways that can feel unpredictable, even untrustworthy. What does that mean for how we work with it?

Oren saw this shift as a feature, not a flaw:

“If AI were completely linear, you’d just be programming, and that’s not what AI is meant to be. These models are trained on the entirety of human knowledge. You want them to go off and find interesting, different ways of looking at problems.”

The power of AI, he argued, comes not from treating it like Google, but from engaging it in a process:

“My son works in science at a biotech startup in Denmark. He uses AI not to get the answer, but to have a conversation about how to find the answer. That’s the mindset that leads to success with AI.”

And that mindset extends to gradual trust:

“Start by assigning low-risk tasks. Keep a human in the loop. As the AI delivers better results over time, you can reduce that oversight. Eventually, for certain tasks, you can take the human out of the loop.”

Oren summed it up with a powerful metaphor:

“You start with AI as an exoskeleton; it makes you bigger, stronger, faster. And over time, it can become more like the robot that does the work itself.”

The spectrum of agentic AI and why access controls are key

Alexander tied the conversation to a helpful analogy from a JP Morgan CTO: agentic AI isn’t binary.

“There’s no clear 0 or 1 where something is agentic or isn’t. At one end, you have a fully trusted system of agents. On the other hand, maybe it’s just a one-shot prompt or classic RPA with a bit of machine learning on top.”

Oren agreed:

“You’ve described the two ends of the spectrum perfectly. And with all automation, the key is deciding where on that spectrum we’re comfortable operating.”

He compared it to self-driving cars:

“Level 1 is cruise control; Level 5 is full autonomy. We’re comfortable somewhere in the middle right now. It’ll be the same with agents. As they get better, and as we get better at guiding them, we’ll move further along that spectrum.”

And how do you navigate that safely? Oren returned to the importance of access controls:

“When you control access outside the agent layer, you don’t have to worry as much about what’s happening inside. The agent can’t see or write to anything it isn’t allowed to.”

That approach offers two critical safeguards:

  • It prevents unintended actions.
  • It provides visibility into attempts, showing when an agent tries to do something it shouldn’t, so teams can adjust the instructions before harm is done.

“That lets you figure out what you’re telling it that’s prompting that behavior, without letting it break anything.”

The business imperative and the myth of the chat interface

At the enterprise level, Oren emphasized that the rise of the Chief AI Officer reflects a deeper truth:

“Someone in the company recognized that we need to figure this out to compete. Either you solve this before your competitors and gain an advantage, or you fall behind.”

And that, Oren stressed, is why this is not just a technology problem, it’s a business problem:

“You’re using technology, but you’re solving business challenges. You need to engage the people who have the problems, and the folks solving them, and figure out how AI can make that more efficient.”

When Alexander asked about the biggest myth in AI enterprise adoption, Oren didn’t hesitate:

“That the chat interface will win.”

While coders love chat interfaces because they can feed in code and get help most employees don’t work that way:

“Most people don’t do their jobs through chat-like interaction. And most don’t know how to use a chat interface effectively. They see a box, like Google search, and that doesn’t work well with AI.”

He predicted that within five years, chat interfaces will be niche. The real value?

“Agents doing useful things behind the scenes.”

How to scale AI safely

Finally, in response to a closing question from Alexander, Oren offered practical advice for enterprises looking to scale AI safely:

“Visibility is key. We don’t fully understand what happens inside these models; no one really does. Any tool that claims it can guarantee behavior inside the model? I’m skeptical.”

Instead, Oren urged companies to focus on where they can act:

“Manage what goes into the tools, and what comes out. Don’t believe you can control what happens within them.”

Final thoughts

As enterprises navigate the complex realities of AI adoption, one thing is clear: success won’t come from chasing hype or hoping a chat interface will magically solve business challenges. 

It will come from building thoughtful guardrails, designing specialized agents, and aligning AI initiatives with real-world workflows and risks. The future belongs to companies that strike the right balance; trusting AI enough to unlock its potential, but governing it wisely to protect their business. 

The path forward isn’t about replacing people; it’s about empowering them with AI that truly works with them, not just beside them.

CAP theorem in ML: Consistency vs. availabilityCAP theorem in ML: Consistency vs. availability

CAP theorem in ML:  Consistency vs. availability

The CAP theorem has long been the unavoidable reality check for distributed database architects. However, as machine learning (ML) evolves from isolated model training to complex, distributed pipelines operating in real-time, ML engineers are discovering that these same fundamental constraints also apply to their systems. What was once considered primarily a database concern has become increasingly relevant in the AI engineering landscape.

Modern ML systems span multiple nodes, process terabytes of data, and increasingly need to make predictions with sub-second latency. In this distributed reality, the trade-offs between consistency, availability, and partition tolerance aren’t academic — they’re engineering decisions that directly impact model performance, user experience, and business outcomes.

This article explores how the CAP theorem manifests in AI/ML pipelines, examining specific components where these trade-offs become critical decision points. By understanding these constraints, ML engineers can make better architectural choices that align with their specific requirements rather than fighting against fundamental distributed systems limitations.

Quick recap: What is the CAP theorem?

The CAP theorem, formulated by Eric Brewer in 2000, states that in a distributed data system, you can guarantee at most two of these three properties simultaneously:

  • Consistency: Every read receives the most recent write or an error
  • Availability: Every request receives a non-error response (though not necessarily the most recent data)
  • Partition tolerance: The system continues to operate despite network failures between nodes

Traditional database examples illustrate these trade-offs clearly:

  • CA systems: Traditional relational databases like PostgreSQL prioritize consistency and availability but struggle when network partitions occur.
  • CP systems: Databases like HBase or MongoDB (in certain configurations) prioritize consistency over availability when partitions happen.
  • AP systems: Cassandra and DynamoDB favor availability and partition tolerance, adopting eventual consistency models.

What’s interesting is that these same trade-offs don’t just apply to databases — they’re increasingly critical considerations in distributed ML systems, from data pipelines to model serving infrastructure.

The great web rebuild: Infrastructure for the AI agent era
AI agents require rethinking trust, authentication, and security—see how Agent Passports and new protocols will redefine online interactions.
CAP theorem in ML:  Consistency vs. availability

Where the CAP theorem shows up in ML pipelines

Data ingestion and processing

The first stage where CAP trade-offs appear is in data collection and processing pipelines:

Stream processing (AP bias): Real-time data pipelines using Kafka, Kinesis, or Pulsar prioritize availability and partition tolerance. They’ll continue accepting events during network issues, but may process them out of order or duplicate them, creating consistency challenges for downstream ML systems.

Batch processing (CP bias): Traditional ETL jobs using Spark, Airflow, or similar tools prioritize consistency — each batch represents a coherent snapshot of data at processing time. However, they sacrifice availability by processing data in discrete windows rather than continuously.

This fundamental tension explains why Lambda and Kappa architectures emerged — they’re attempts to balance these CAP trade-offs by combining stream and batch approaches.

Feature Stores

Feature stores sit at the heart of modern ML systems, and they face particularly acute CAP theorem challenges.

Training-serving skew: One of the core features of feature stores is ensuring consistency between training and serving environments. However, achieving this while maintaining high availability during network partitions is extraordinarily difficult.

Consider a global feature store serving multiple regions: Do you prioritize consistency by ensuring all features are identical across regions (risking unavailability during network issues)? Or do you favor availability by allowing regions to diverge temporarily (risking inconsistent predictions)?

Model training

Distributed training introduces another domain where CAP trade-offs become evident:

Synchronous SGD (CP bias): Frameworks like distributed TensorFlow with synchronous updates prioritize consistency of parameters across workers, but can become unavailable if some workers slow down or disconnect.

Asynchronous SGD (AP bias): Allows training to continue even when some workers are unavailable but sacrifices parameter consistency, potentially affecting convergence.

Federated learning: Perhaps the clearest example of CAP in training — heavily favors partition tolerance (devices come and go) and availability (training continues regardless) at the expense of global model consistency.

Model serving

When deploying models to production, CAP trade-offs directly impact user experience:

Hot deployments vs. consistency: Rolling updates to models can lead to inconsistent predictions during deployment windows — some requests hit the old model, some the new one.

A/B testing: How do you ensure users consistently see the same model variant? This becomes a classic consistency challenge in distributed serving.

Model versioning: Immediate rollbacks vs. ensuring all servers have the exact same model version is a clear availability-consistency tension.

Superintelligent language models: A new era of artificial cognition
The rise of large language models (LLMs) is pushing the boundaries of AI, sparking new debates on the future and ethics of artificial general intelligence.
CAP theorem in ML:  Consistency vs. availability

Case studies: CAP trade-offs in production ML systems

Real-time recommendation systems (AP bias)

E-commerce and content platforms typically favor availability and partition tolerance in their recommendation systems. If the recommendation service is momentarily unable to access the latest user interaction data due to network issues, most businesses would rather serve slightly outdated recommendations than no recommendations at all.

Netflix, for example, has explicitly designed its recommendation architecture to degrade gracefully, falling back to increasingly generic recommendations rather than failing if personalization data is unavailable.

Healthcare diagnostic systems (CP bias)

In contrast, ML systems for healthcare diagnostics typically prioritize consistency over availability. Medical diagnostic systems can’t afford to make predictions based on potentially outdated information.

A healthcare ML system might refuse to generate predictions rather than risk inconsistent results when some data sources are unavailable — a clear CP choice prioritizing safety over availability.

Edge ML for IoT devices (AP bias)

IoT deployments with on-device inference must handle frequent network partitions as devices move in and out of connectivity. These systems typically adopt AP strategies:

  • Locally cached models that operate independently
  • Asynchronous model updates when connectivity is available
  • Local data collection with eventual consistency when syncing to the cloud

Google’s Live Transcribe for hearing impairment uses this approach — the speech recognition model runs entirely on-device, prioritizing availability even when disconnected, with model updates happening eventually when connectivity is restored.

Strategies to balance CAP in ML systems

Given these constraints, how can ML engineers build systems that best navigate CAP trade-offs?

Graceful degradation

Design ML systems that can operate at varying levels of capability depending on data freshness and availability:

  • Fall back to simpler models when real-time features are unavailable
  • Use confidence scores to adjust prediction behavior based on data completeness
  • Implement tiered timeout policies for feature lookups

DoorDash’s ML platform, for example, incorporates multiple fallback layers for their delivery time prediction models — from a fully-featured real-time model to progressively simpler models based on what data is available within strict latency budgets.

Hybrid architectures

Combine approaches that make different CAP trade-offs:

  • Lambda architecture: Use batch processing (CP) for correctness and stream processing (AP) for recency
  • Feature store tiering: Store consistency-critical features differently from availability-critical ones
  • Materialized views: Pre-compute and cache certain feature combinations to improve availability without sacrificing consistency

Uber’s Michelangelo platform exemplifies this approach, maintaining both real-time and batch paths for feature generation and model serving.

Consistency-aware training

Build consistency challenges directly into the training process:

  • Train with artificially delayed or missing features to make models robust to these conditions
  • Use data augmentation to simulate feature inconsistency scenarios
  • Incorporate timestamp information as explicit model inputs

Facebook’s recommendation systems are trained with awareness of feature staleness, allowing the models to adjust predictions based on the freshness of available signals.

Intelligent caching with TTLs

Implement caching policies that explicitly acknowledge the consistency-availability trade-off:

  • Use time-to-live (TTL) values based on feature volatility
  • Implement semantic caching that understands which features can tolerate staleness
  • Adjust cache policies dynamically based on system conditions
How to build autonomous AI agent with Google A2A protocol
How to build autonomous AI agent with Google A2A protocol, Google Agent Development Kit (ADK), Llama Prompt Guard 2, Gemma 3, and Gemini 2.0 Flash.
CAP theorem in ML:  Consistency vs. availability

Design principles for CAP-aware ML systems

Understand your critical path

Not all parts of your ML system have the same CAP requirements:

  1. Map your ML pipeline components and identify where consistency matters most vs. where availability is crucial
  2. Distinguish between features that genuinely impact predictions and those that are marginal
  3. Quantify the impact of staleness or unavailability for different data sources

Align with business requirements

The right CAP trade-offs depend entirely on your specific use case:

  • Revenue impact of unavailability: If ML system downtime directly impacts revenue (e.g., payment fraud detection), you might prioritize availability
  • Cost of inconsistency: If inconsistent predictions could cause safety issues or compliance violations, consistency might take precedence
  • User expectations: Some applications (like social media) can tolerate inconsistency better than others (like banking)

Monitor and observe

Build observability that helps you understand CAP trade-offs in production:

  • Track feature freshness and availability as explicit metrics
  • Measure prediction consistency across system components
  • Monitor how often fallbacks are triggered and their impact

Wondering where we’re headed next?

Our in-person event calendar is packed with opportunities to connect, learn, and collaborate with peers and industry leaders. Check out where we’ll be and join us on the road.

AI Accelerator Institute | Summit calendar
Unite with applied AI’s builders & execs. Join Generative AI Summit, Agentic AI Summit, LLMOps Summit & Chief AI Officer Summit in a city near you.
CAP theorem in ML:  Consistency vs. availability

How to build autonomous AI agent with Google A2A protocolHow to build autonomous AI agent with Google A2A protocol

Why do we need autonomous AI agents?

How to build autonomous AI agent with Google A2A protocol

Picture this: it’s 3 a.m., and a customer on the other side of the globe urgently needs help with their account. A traditional chatbot would wake up your support team with an escalation. But what if your AI agent could handle the request autonomously, safely, and correctly? That’s the dream, right?

The reality is that most AI agents today are like teenagers with learner’s permits; they need constant supervision. They might accidentally promise a customer a large refund (oops!) or fall for a clever prompt injection that makes them spill company secrets or customers’ sensitive data. Not ideal.

This is where Double Validation comes in. Think of it as giving your AI agent both a security guard at the entrance (input validation) and a quality control inspector at the exit (output validation). With these safeguards at a minimum in place, your agent can operate autonomously without causing PR nightmares.

How did I come up with the Double Validation idea?

These days, we hear a lot of talk about AI agents. I asked myself, “What is the biggest challenge preventing the widespread adoption of AI agents?” I concluded that the answer is trustworthy autonomy. When AI agents can be trusted, they can be scaled and adopted more readily. Conversely, if an agent’s autonomy is limited, it requires increased human involvement, which is costly and inhibits adoption.

Next, I considered the minimal requirements for an AI agent to be autonomous. I concluded that an autonomous AI agent needs, at minimum, two components:

  1. Input validation – to sanitize input, protect against jailbreaks, data poisoning, and harmful content.
  2. Output validation – to sanitize output, ensure brand alignment, and mitigate hallucinations.

I call this system Double Validation.

Given these insights, I built a proof-of-concept project to research the Double Validation concept.

In this article, we’ll explore how to implement Double Validation by building a multiagent system with the Google A2A protocol, the Google Agent Development Kit (ADK), Llama Prompt Guard 2, Gemma 3, and Gemini 2.0 Flash, and how to optimize it for production, specifically, deploying it on Google Vertex AI.

For input validation, I chose Llama Prompt Guard 2 just as an article about it reached me at the perfect time. I selected this model because it is specifically designed to guard against prompt injections and jailbreaks. It is also very small; the largest variant, Llama Prompt Guard 2 86M, has only 86 million parameters, so it can be downloaded and included in a Docker image for cloud deployment, improving latency. That is exactly what I did, as you’ll see later in this article.

The complete code for this project is available at github.com/alexey-tyurin/a2a-double-validation

How to build it?

The architecture uses four specialized agents that communicate through the Google A2A protocol, each with a specific role:

How to build autonomous AI agent with Google A2A protocol
Image generated by author

Here’s how each agent contributes to the system:

  1. Manager Agent: The orchestra conductor, coordinating the flow between agents
  2. Safeguard Agent: The bouncer, checking for prompt injections using Llama Prompt Guard 2
  3. Processor Agent: The worker bee, processing legitimate queries with Gemma 3
  4. Critic Agent: The editor, evaluating responses for completeness and validity using Gemini 2.0 Flash

I chose Gemma 3 for the Processor Agent because it is small, fast, and can be fine-tuned with your data if needed — an ideal candidate for production. Google currently supports nine (!) different frameworks or methods for finetuning Gemma; see Google’s documentation for details.

I chose Gemini 2.0 Flash for the Critic Agent because it is intelligent enough to act as a critic, yet significantly faster and cheaper than the larger Gemini 2.5 Pro Preview model. Model choice depends on your requirements; in my tests, Gemini 2.0 Flash performed well.

I deliberately used different models for the Processor and Critic Agents to avoid bias — an LLM may judge its own output differently from another model’s.

Let me show you the key implementation of the Safeguard Agent:

How to build autonomous AI agent with Google A2A protocol

Plan for actions

The workflow follows a clear, production-ready pattern:

  1. User sends query → The Manager Agent receives it.
  2. Safety check → The Manager forwards the query to the Safeguard Agent.
  3. Vulnerability assessment → Llama Prompt Guard 2 analyzes the input.
  4. Processing → If the input is safe, the Processor Agent handles the query with Gemma 3.
  5. Quality control → The Critic Agent evaluates the response.
  6. Delivery → The Manager Agent returns the validated response to the user.

Below is the Manager Agent’s coordination logic:

How to build autonomous AI agent with Google A2A protocol

Time to build it

Ready to roll up your sleeves? Here’s your production-ready roadmap:

Local deployment

1. Environment setup 

How to build autonomous AI agent with Google A2A protocol

2. Configure API keys 

How to build autonomous AI agent with Google A2A protocol

3. Download Llama Prompt Guard 2 

This is the clever part – we download the model once when we start Agent Critic for the first time and package it in our Docker image for cloud deployment:

How to build autonomous AI agent with Google A2A protocol

Important Note about Llama Prompt Guard 2: To use the Llama Prompt Guard 2 model, you must:

  1. Fill out the “LLAMA 4 COMMUNITY LICENSE AGREEMENT” at https://huggingface.co/meta-llama/Llama-Prompt-Guard-2-86M
  2. Get your request to access this repository approved by Meta
  3. Only after approval will you be able to download and use this model

4. Local testing 

How to build autonomous AI agent with Google A2A protocol

Screenshot for running main.py

 

How to build autonomous AI agent with Google A2A protocol
Image generated by author

Screenshot for running client

 

How to build autonomous AI agent with Google A2A protocol
Image generated by author

Screenshot for running tests

 

How to build autonomous AI agent with Google A2A protocol
Image generated by author

Production Deployment 

Here’s where it gets interesting. We optimize for production by including the Llama model in the Docker image:

How to build autonomous AI agent with Google A2A protocol

1. Setup Cloud Project in Cloud Shell Terminal

  1. Access Google Cloud Console: Go to https://console.cloud.google.com
  2. Open Cloud Shell: Click the Cloud Shell icon (terminal icon) in the top right corner of the Google Cloud Console
  3. Authenticate with Google Cloud:
How to build autonomous AI agent with Google A2A protocol
  1. Create or select a project:
How to build autonomous AI agent with Google A2A protocol
  1. Enable required APIs:
How to build autonomous AI agent with Google A2A protocol

3. Setup Vertex AI Permissions

Grant your account the necessary permissions for Vertex AI and related services:

How to build autonomous AI agent with Google A2A protocol

3. Create and Setup VM Instance

Cloud Shell will not work for this project as Cloud Shell is limited to 5GB of disk space. This project needs more than 30GB of disk space to build Docker images, get all dependencies, and download the Llama Prompt Guard 2 model locally. So, you need to use a dedicated VM instead of Cloud Shell.

How to build autonomous AI agent with Google A2A protocol

4. Connect to VM

How to build autonomous AI agent with Google A2A protocol

Screenshot for VM 

How to build autonomous AI agent with Google A2A protocol
Image generated by author

5. Clone Repository

How to build autonomous AI agent with Google A2A protocol

6. Deployment Steps

How to build autonomous AI agent with Google A2A protocol

Screenshot for agents in cloud 

How to build autonomous AI agent with Google A2A protocol
Image generated by author

7. Testing 

How to build autonomous AI agent with Google A2A protocol

Screenshot for running client in Google Vertex AI

How to build autonomous AI agent with Google A2A protocol
Image generated by author
How to build autonomous AI agent with Google A2A protocol

Screenshot for running tests in Google Vertex AI

How to build autonomous AI agent with Google A2A protocol
Image generated by author

Alternatives to Solution

Let’s be honest – there are other ways to skin this cat:

  1. Single Model Approach: Use a large LLM like GPT-4 with careful system prompts
    • Simpler but less specialized
    • Higher risk of prompt injection
    • Risk of LLM bias in using the same LLM for answer generation and its criticism
  2. Monolith approach: Use all flows in just one agent
    • Latency is better
    • Cannot scale and evolve input validation and output validation independently
    • More complex code, as it is all bundled together
  3. Rule-Based Filtering: Traditional regex and keyword filtering
    • Faster but less intelligent
    • High false positive rate
  4. Commercial Solutions: Services like Azure Content Moderator or Google Model Armor
    • Easier to implement but less customizable
    • On contrary, Llama Prompt Guard 2 model can be fine-tuned with the customer’s data
    • Ongoing subscription costs
  5. Open-Source Alternatives: Guardrails AI or NeMo Guardrails
    • Good frameworks, but require more setup
    • Less specialized for prompt injection

Lessons Learned

1. Llama Prompt Guard 2 86M has blind spots. During testing, certain jailbreak prompts, such as:

How to build autonomous AI agent with Google A2A protocol

And

How to build autonomous AI agent with Google A2A protocol

were not flagged as malicious. Consider fine-tuning the model with domain-specific examples to increase its recall for the attack patterns that matter to you.

2. Gemini Flash model selection matters. My Critic Agent originally used gemini1.5flash, which frequently rated perfectly correct answers 4 / 5. For example:

How to build autonomous AI agent with Google A2A protocol

After switching to gemini2.0flash, the same answers were consistently rated 5 / 5:

How to build autonomous AI agent with Google A2A protocol

3. Cloud Shell storage is a bottleneck. Google Cloud Shell provides only 5 GB of disk space — far too little to build the Docker images required for this project, get all dependencies, and download the Llama Prompt Guard 2 model locally to deploy the Docker image with it to Google Vertex AI. Provision a dedicated VM with at least 30 GB instead.

Conclusion

Autonomous agents aren’t built by simply throwing the largest LLM at every problem. They require a system that can run safely without human babysitting. Double Validation — wrapping a task-oriented Processor Agent with dedicated input and output validators — delivers a balanced blend of safety, performance, and cost. 

Pairing a lightweight guard such as Llama Prompt Guard 2 with production friendly models like Gemma 3 and Gemini Flash keeps latency and budget under control while still meeting stringent security and quality requirements.

Join the conversation. What’s the biggest obstacle you encounter when moving autonomous agents into production — technical limits, regulatory hurdles, or user trust? How would you extend the Double Validation concept to high-risk domains like finance or healthcare?

Connect on LinkedIn: https://www.linkedin.com/in/alexey-tyurin-36893287/  

The complete code for this project is available at github.com/alexey-tyurin/a2a-double-validation

References

[1] Llama Prompt Guard 2 86M, https://huggingface.co/meta-llama/Llama-Prompt-Guard-2-86M

[2] Google A2A protocol, https://github.com/google-a2a/A2A 

[3] Google Agent Development Kit (ADK), https://google.github.io/adk-docs/ 

Building & securing AI agents: A tech leader crash courseBuilding & securing AI agents: A tech leader crash course

Building & securing AI agents: A tech leader crash course

The AI revolution is racing beyond chatbots to autonomous agents that act, decide, and interface with internal systems.

Unlike traditional software, AI agents can be manipulated through language, making them vulnerable to attacks like prompt injection and they also introduce new security risks like excessive agency.

Join us for an exclusive deep dive with Sourabh Satish, CTO and co-founder at Pangea, as we explore the evolving landscape of AI agents and best practices for securing them.

This session covers:

  • Demos of MCP configuration and vulnerabilities to highlight how different architectures affect the agent’s attack surface.
  • An overview of existing security guardrails—from open source projects and cloud service provider offerings to commercial tools and DIY approaches.
  • A comparison of pros and cons across various guardrail solutions to help you choose the right approach for your use case.
  • Actionable best practices for implementing guardrails that secure your AI agents without slowing innovation.

This webinar is a must-attend for engineering leaders, AI engineers, and security leaders who want to understand and mitigate the risks of agentic software in an increasingly adversarial landscape.

What exactly is an AI agent – and how do you build one?What exactly is an AI agent – and how do you build one?

What exactly is an AI agent – and how do you build one?

What makes something an “AI agent” – and how do you build one that does more than just sound impressive in a demo?

I’m Nico Finelli, Founding Go-To-Market Member at Vellum. Starting in machine learning, I’ve consulted for Fortune 500s, worked at Weights & Biases during the LLM boom, and now I help companies get from experimentation to production with LLMs, faster and smarter.

In this article, I’ll unpack what AI agents actually are (and aren’t), how to build them step by step, and what separates teams that ship real value from those that stall out in proof-of-concept purgatory. 

We’ll also take a close look at the current state of AI adoption, the biggest challenges teams face today, and the one thing that makes or breaks an agent system: evaluation.

Let’s dive in.

Where we are in the AI landscape

At Vellum, we recently partnered with Weaviate and LlamaIndex to run a survey of over 1,200 AI developers. The goal? To understand where people are when it comes to deploying AI in production.

What we found was pretty surprising: only 25% of respondents said they were live in production with their AI initiative. For all the hype around generative AI, most teams are still stuck in experimentation mode.

The biggest blocker? Hallucinations and prompt management. Over 57% of respondents said hallucinations were their number one challenge. And here’s the kicker: when we cross-referenced that with how people were evaluating their systems, we noticed a pattern. 

The same folks struggling with hallucinations were the ones relying heavily on manual testing or user feedback as their main form of evaluation.

That tells me there’s a deeper issue here. If your evaluation process isn’t robust, hallucinations will sneak through. And most businesses don’t have automated testing pipelines yet, because AI applications tend to be highly specific to their use cases. So, the old rules of software QA don’t fully apply.

Bottom line: without evaluation, your AI won’t reach production. And if it does, it won’t last long.

The future of IoT is agentic and autonomous
Agentic AI enables autonomous, goal-driven decision-making across the IoT, transforming smart homes, cities, and industrial systems.
What exactly is an AI agent – and how do you build one?

How successful companies build with LLMs

So, how are the companies that do get to production pulling it off?

First, they don’t just chase the latest shiny AI trend. They start with a clearly defined use case and understand what not to build. That discipline creates focus and prevents scope creep.

Second, they build fast feedback loops between software engineers, product managers, and subject matter experts. We see too many teams build something in isolation, hand it off, get delayed feedback, and then go back to the drawing board. That slows everything down.

The successful teams? They involve everyone from day one. They co-develop prompts, run tests together, and iterate continuously. About 65–70% of Vellum customers have AI in production, and these fast iteration cycles are a big reason why.

They also treat evaluation as their top priority. Whether that’s manual review, LLM-as-a-judge, or golden datasets, they don’t rely on vibes. They test, monitor, and optimize like it’s a software product – because it is.

The truth about enterprise AI agents (and how to get value from them)
What’s the point of AI if it doesn’t actually make your workday easier?
What exactly is an AI agent – and how do you build one?