AI is emerging as a key differentiator in enterprise finance. As traditional financial models struggle to keep up with the pace of change, enterprise tech organizations are turning to AI to unlock faster, more accurate, and insight-driven decision-making.
Drawing from my experience in sales planning and forecasting in the enterprise tech sector, I’ve seen firsthand how AI is reshaping how global enterprises forecast revenue, optimize GTM strategies, and manage P&L risk.
This article explores how AI is transforming financial modeling and sales forecasting (two pillars of enterprise strategy) and helping finance teams shift from reactive to proactive operations.
1. Why traditional forecasting falls short
Traditional forecasting falls short for three main reasons:
Lack of broader business context
Sales forecasters and financial modelers frequently lack visibility into wider organizational shifts such as changes in product strategy, marketing campaigns, or operational execution that affect demand and performance. This makes it difficult to fine-tune models for niche business dynamics or rapidly changing market conditions.
Inflexibility
Traditional models often cannot account for real-time changes in demand, market shifts, economic conditions, tariffs, or sales performance.
Human bias
Over-reliance on gut-feel projections leads to inaccurate financial planning.
In many enterprise settings, these limitations create friction between planning and execution across business functions such as finance, sales, and marketing. Misaligned forecasts result in delayed strategic actions and misused resources, which are issues that AI is now well-positioned to solve.
2. What makes AI a game-changer for financial modeling
Cross-functional simulations tailored by domain experts
One of AI’s most transformative strengths lies in its ability to empower every function within the enterprise to personalize simulations using their domain-specific expertise. For example:
The pricing team can continuously adjust models based on real-time strategy updates.
The product team can simulate outcomes tied to roadmap changes or launch timing.
The marketing team can incorporate variable lead generation budgets or campaign performance assumptions.
Likewise, GTM leaders can simulate how scaling inside sales headcount could drive more transactional business and enhance margins. These deeply integrated, cross-functional simulations not only improve forecast precision but also drive strategic alignment and execution agility across the business.
Real-time forecast adjustments
Unlike static quarterly models, AI allows finance leaders to refresh forecasts dynamically, giving real-time visibility into revenue performance. This is particularly useful in fast-evolving segments like AI infrastructure, where product cycles and demand signals change rapidly.
AI can streamline pipeline visibility and improve forecast reliability through:
Collaborative pipeline reviews with finance and sales using AI-generated risk scores and close probabilities.
Analysis of competitor dynamics and market share shifts at the product and geo level to understand how winning or losing specific deals affects strategic positioning.
Enhanced understanding of how pipeline outcomes impact both profitability and long-term growth trajectories.
Improved sales productivity
AI boosts front-line efficiency by guiding sales teams to focus efforts on the right product segments expected to experience a surge in demand (such as those driven by OS refresh cycles, compliance deadlines, or emerging industry triggers), enabling them to strategically capture growth opportunities.
AI also helps to prioritize accounts while providing accurate bundling suggestions based on buyer profiles and sales history to increase deal size and win rates.
Tighter finance-sales alignment
AI serves as a bridge between strategic planning and operational execution by:
Providing shared insights to drive collaboration between FP&A, GTM, and sales teams.
Enabling joint decision-making based on real-time financial and sales data.
Improving coordination between business units through unified performance metrics.
Reducing misalignment and strategic blind spots across planning cycles.
3. Key considerations for implementation
Data readiness: Clean, structured data is critical. Integrating CRM, ERP, and planning systems improves AI effectiveness.
Human oversight: AI augments, not replaces, finance leadership. Human intuition is still key for context and judgment.
Change management: Teams need training and adoption support to fully leverage AI’s potential.
Conclusion
AI is redefining how enterprise tech companies forecast, plan, and execute. From lead targeting to revenue modeling and cross-functional scenario planning, it brings precision, agility, and alignment to financial operations.
By breaking silos and enabling real-time collaboration across finance, GTM, and sales, AI turns forecasting into a growth engine. Companies that embed AI into their processes will be better positioned to anticipate market shifts, improve profitability, and lead with confidence.
Disclaimer: The views expressed in this article are my own and do not reflect the official policy or position of any organization.
This article comes from Dr Nikolay Burlutskiy’s talk at our London 2024 Generative AI Summit. Check out his full presentation and the wealth of OnDemand resources waiting for you.
Nikolay is currently the Senior Manager of GenAI Platforms at Mars.
Have you ever wondered just how long it takes to bring a new medicine to market? For those of us working in the pharmaceutical industry, the answer is clear: it can take decades and cost billions. As a computer scientist leading the AI team at AstraZeneca, I’ve seen firsthand just how complicated the drug discovery process is.
Trust me, there’s a lot of work that goes into developing a drug. But the real challenge lies in making the process more efficient and accessible, which is where generative AI comes in.
In this article, I’ll walk you through the critical role that AI plays in accelerating drug discovery, particularly in areas like medical imaging, predicting patient outcomes, and creating synthetic data to address data scarcity. While AI has incredible potential, it’s not without its challenges. From building trust with pathologists to navigating regulatory requirements, there’s a lot to consider.
So, let’s dive into how AI is reshaping the way we approach drug discovery and development – and why it matters to the future of healthcare.
Generative AI’s role in enhancing medical imaging
One of the most powerful applications of generative AI in drug development is its ability to analyze medical images – a process that’s essential for diagnosing diseases like cancer, which can be difficult to detect early on.
In the world of pathology, we're increasingly moving away from traditional microscopes and using digital images instead. With digitized tissue biopsies, we now have access to incredibly detailed scans that show every single cell. The sheer volume of this data – sometimes containing over 100,000 x 100,000 pixels – makes it almost impossible for human pathologists to analyze every single detail, but AI can handle this level of complexity.
At AstraZeneca, we’ve been using generative AI to help analyze these vast datasets. One of the most exciting developments is in medical imaging, where AI models can quickly identify cancerous cells, segment different areas of tissue, and even predict patient outcomes.
For example, AI can help predict whether a patient will respond to a specific drug – information that’s invaluable for companies like ours, as we work to develop treatments that will provide real, tangible benefits for patients.
In my work, we leverage powerful AI techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs) to build these models. These techniques learn the underlying structure of medical images and can generate synthetic data to train AI models more effectively.
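To make this concrete, here is a minimal PyTorch sketch of a variational autoencoder of the kind described – not AstraZeneca's actual architecture; the patch size (64x64 grayscale) and latent dimension are illustrative assumptions.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    # Minimal VAE for 64x64 grayscale tissue patches (illustrative only).
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(32 * 16 * 16, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, logvar

# After training, synthetic patches are generated by decoding random latent vectors.
vae = TinyVAE()
with torch.no_grad():
    synthetic = vae.decoder(torch.randn(8, 32))  # 8 synthetic 64x64 patches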
The secret? Agent passports: cryptographic credentials that prove delegation, set spending limits, and unlock seamless agent-to-agent coordination.
We’re entering the agent-first internet, where human-era systems (CAPTCHAs, review sites, IP throttling) break down, and new infrastructure rises to support fully autonomous assistants.
What’s changing?
Identity: agents verify delegation, not humanity
Privacy: agents manage granular data permissions in real time
Trust: star ratings are out, verifiable metrics are in
Security: new attack surfaces, new protections
The takeaway? The next internet runs on agents. And whoever builds the infrastructure? Wins.
Meet Gemini 2.5 Flash: Fast, smart, and fully tunable
Google just dropped Gemini 2.5 Flash. An accelerated, cost-efficient model with a twist: you control how much it thinks.
It’s the first hybrid reasoning model:
Turn thinking on/off depending on your use case
Set a thinking budget to balance speed, quality, and cost
Keep Flash-fast responses with smarter performance
Even with reasoning disabled, 2.5 Flash outperforms its predecessor and crushes the price-to-performance curve.
Need deep logic for tough prompts? Crank up the budget.
Just want speed? Set it to zero. Either way, you’re in control.
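For developers, here's a minimal sketch using Google's google-genai Python SDK; the preview model name and token budgets below are assumptions based on the launch announcement.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Reasoning disabled: a budget of 0 keeps responses Flash-fast.
fast = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Summarize this release note in one line.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)

# Deep logic for tough prompts: raise the thinking budget (in tokens).
deep = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Plan a migration strategy for a sharded database.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192)
    ),
)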
The takeaway? Fast is table stakes. Controllable reasoning is the future.
Top AI Accelerator Institute resources
1. Today (April 24), discover how prompt injection attacks are putting generative AI at risk and the defenses you need to stay ahead in our live session, Words as Weapons.
2. How to balance helpfulness and harmlessness in AI
3. AWS, Anthropic, and Glean unpack how enterprises can scale AI smartly with agentic tech, rock-solid security, and real ROI on May 6
AIOps in action: AI & automation transforming IT operations
Traditional IT ops are slow, reactive, and overloaded.
How to 8‑bit quantize large models using bits and bytes
Massive models, massive problems: until you quantize.
8-bit quantization shrinks model size, reduces memory usage, and boosts speed, all with minimal loss of accuracy.
Here’s what it unlocks:
75 percent memory savings
Faster inference on CPUs, GPUs, edge devices
Energy-efficient deployment
No major code changes with tools like BitsAndBytes
A real-world example is IBM Granite: 2B parameters, now edge-ready with a single config flag.
The takeaway? Quantization is the quiet revolution powering real-world AI.
Added to our Pro and Pro+ membership dashboard this month:
OnDemand:
Generative AI Summit Washington, D.C.
Generative AI Summit Austin
Generative AI Summit Toronto
Computer Vision Summit London
Exclusive articles:
The truth about enterprise AI agents (and how to get value from them)
How to secure LLMs with the fastest guardrails for peak AI performance
GenAI creation: Building for cross-platform wearable AI and mobile experiences
Building advanced AI systems: Challenges and best practices
Reach 2.3 million+ AI professionals
Spread the word about your brand, acquire new customers, and grow revenue.
Engage AIAI’s core audience of engineers, builders, and executives across 25+ countries spanning North America, Asia, and EMEA.
It’s December 2028. Sarah’s AI agent encounters an unusual situation while booking her family’s holiday trip to Japan. The multi-leg journey requires coordinating with three different airlines, two hotels, and a local tour operator.
As the agent begins negotiations, it presents its “agent passport”—a cryptographic attestation of its delegation rights and transaction history. The vendors’ systems instantly verify the agent’s authorization scope, spending limits, and exposed metadata like age and passport number.
Within seconds, the agent has established secure payment channels and begun orchestrating the complex booking sequence. When one airline’s system flags the rapid sequence of international bookings as suspicious, the agent smoothly provides additional verification, demonstrating its legitimate delegation chain back to Sarah.
What would have triggered fraud alerts and CAPTCHA challenges in 2024 now flows seamlessly in an infrastructure built for autonomous AI agents.
The future, four years from now.
In my previous essay, we explored how websites and applications must evolve to accommodate AI agents. Now we turn to the deeper infrastructural shifts that make such agent interactions possible.
The systems we’ve relied on for decades (CAPTCHAs, credit card verification, review platforms, and authentication protocols) were all built with human actors in mind. As AI agents transition from experimental curiosities to fully operational assistants, the mechanisms underpinning the digital world are beginning to crack under the pressure of automation.
The transition to an agent-first internet won’t just streamline existing processes—it will unlock entirely new possibilities that were impractical in a human-centric web. Tasks that humans find too tedious or time-consuming become effortless through automation.
Instead of clicking ‘Accept All’ on cookie banners, agents can granularly optimize privacy preferences across thousands of sites. Rather than abandoning a cart due to complex shipping calculations, agents can simultaneously compare multiple courier services and customs implications.
Even seemingly simple tasks like comparing prices across multiple vendors, which humans typically limit to 2-3 sites, can be executed across hundreds of retailers in seconds. Perhaps most importantly, agents can maintain persistent relationships with services, continuously monitoring for price drops, policy changes, or relevant updates that humans would miss.
This shift from manual, limited interactions to automated, comprehensive engagement represents not just a change in speed, but a fundamental expansion of what’s possible online.
Amid these sweeping changes, a new gold rush is emerging. Just as the shift to mobile created opportunities for companies like Uber and Instagram to reinvent existing services, the transition to agent-first infrastructure opens unprecedented possibilities for founders.
From building next-generation authentication systems and trust protocols to creating agent-mediated data marketplaces, entrepreneurs have a chance to establish the foundational layers of this new paradigm. In many ways, we’re returning to the internet’s early days, where core infrastructure is being reimagined from the ground up—this time for an autonomous, agent-driven future.
In this second post of the AI Agents series, we’ll focus on the foundational infrastructure changes that underlie the agent-first internet: new authentication mechanisms, trust systems, novel security challenges, and agent-to-agent protocols, setting the stage for the more commerce-oriented transformations we’ll explore in the following post.
This article was originally published here at AI Tidbits, where you can read more of Sahar’s fascinating perspectives on AI-related topics.
Proving you’re an agent, not a human
Remember when “proving you’re not a robot” meant deciphering distorted text or selecting crosswalk images? Those mechanisms become obsolete in a world where legitimate automated actors are the norm rather than the exception.
Today’s CAPTCHAs, designed to block bots, have become increasingly complex due to advances in multimodal AI. Paradoxically, these mechanisms now hinder real humans while sophisticated bots often bypass them. As AI outpaces human problem-solving in these domains, CAPTCHAs risk becoming obsolete, reducing website conversions, and frustrating legitimate users.
The challenge shifts from proving humanity to verifying the agent has been legitimately delegated and authorized by a human user.
I recently failed a CAPTCHA three times before finally passing on the fourth attempt. Now picture an 80-year-old attempting to decipher increasingly convoluted challenges.
Today’s rate-limiting mechanisms assume human-paced interactions, relying heavily on IP-based throttling to manage access. But in a world of AI agents, what constitutes “fair use” of digital services? In an agent-driven internet, automated browsing will become not just accepted but essential. Cloudflare, Akamai, and similar services will need to pivot from simplistic IP-based throttling to sophisticated agent-aware frameworks.
As businesses grapple with these challenges, a new solution is emerging—one that shifts the paradigm from blocking automated traffic to authenticating and managing it intelligently. Enter the Agent Passport.
Imagine a digital credential that encapsulates an agent’s identity and permissions—cryptographically secured and universally recognized. Unlike simple API keys or OAuth tokens, these passports maintain a verifiable chain of trust from the agent back to its human principal. They carry rich metadata about permissions scope, spending limits, and authorized behaviors, allowing services to make nuanced decisions about agent access and capabilities.
By integrating Agent Passports, business websites like airlines can distinguish between legitimate, authorized agents and malicious actors. New metrics, such as agent reliability scores and behavioral analysis, could ensure fair access while mitigating abuse, balancing security with the need to allow agent-driven traffic.
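To make the idea concrete, here is a speculative sketch of passport verification using an Ed25519 signature; every field name, scope, and the payload format are invented for illustration, since no such standard exists yet.

import json
import time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_agent_passport(passport: dict, signature: bytes,
                          principal_key: Ed25519PublicKey) -> bool:
    # 1. Verify the principal's Ed25519 signature over the canonical payload.
    payload = json.dumps(passport, sort_keys=True).encode()
    try:
        principal_key.verify(signature, payload)
    except InvalidSignature:
        return False
    # 2. Reject expired delegations.
    if passport.get("expires_at", 0) < time.time():
        return False
    # 3. Enforce the delegated scope carried in the passport.
    return "book_travel" in passport.get("scopes", [])

# An illustrative passport payload an agent might present to a vendor.
passport = {
    "agent_id": "agent://sarah/travel-planner",
    "scopes": ["book_travel", "make_payments"],
    "spend_limit_usd": 5000,
    "expires_at": 1893456000,  # delegation expiry (Unix time)
}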
Authentication mechanisms, such as signing up and signing in, must also evolve for an agent-first internet. Websites will need to determine not just an agent’s identity but also its authorized scope—what data the agent is authorized to access (‘read’) and what actions it is permitted to execute (‘write’).
Google Login revolutionized online authentication by centralizing access with a single credential, reducing friction and enhancing security. Similarly, agent passports could create a universal standard for agent authentication, simplifying multi-platform access while maintaining robust authorization controls.
Companies like Auth0 and Okta could adapt by offering agent-specific identity frameworks, enabling seamless integration of these passports into their authentication platforms. Meanwhile, consumer companies like Google and Apple could extend their authentication and wallet services to seamlessly support agent-mediated interactions, bridging the gap between human and agent use cases.
A new protocol for Agent-to-Agent communication
In the early days of the web, protocols like HTTP emerged to standardize how browsers and servers communicated. In much the same way, the rise of agent-mediated interactions demands a new foundational layer: an Agent-to-Agent Communication Protocol (AACP). This protocol would formalize how consumer agents and business agents discover each other’s capabilities, authenticate identities, negotiate trust parameters, and exchange actionable data—all while ensuring both parties operate within well-defined boundaries.
Just as Sarah’s travel agent from the intro paragraph seamlessly coordinated with multiple airlines and hotels, AACP enables complex multi-party interactions that would be tedious or impossible for humans to manage manually.
Much like HTTPS introduced encryption and certificates to authenticate servers and protect user data, AACP would implement cryptographic attestation for agents. Trusted third-party authorities, similar to today’s certificate authorities, would issue digital “agent certificates” confirming an agent’s legitimacy, delegation chain, and operational scope. This ensures that when a consumer’s travel-planning agent communicates with an airline’s booking agent, both sides can instantly verify authenticity and adherence to agreed-upon standards.
A potential implementation of the AACP protocol. A full example of booking an airline ticket can be found here.
Without such a protocol, a rogue agent might impersonate a trusted retailer to trick consumer agents into unauthorized transactions, or a malicious consumer agent could spoof credentials to overwhelm a merchant’s infrastructure. By mandating cryptographic proof, robust authentication handshakes, and behavior logs, AACP mitigates these threats before meaningful data or funds change hands.
The handshake phase in AACP would include mutual disclosure of the agents’ technical stacks—such as which LLM or language configuration they use—and their supported capabilities. Once established, the protocol would also govern “write-like operations” (e.g., initiating a payment or updating account details) by enforcing strict sign-offs with auditable cryptographic signatures. Every action would leave a verifiable trail of authorization that can be reviewed and validated after the fact.
Finally, AACP would incorporate locale and language negotiation at the protocol level. Although agents can translate and interpret content dynamically, specifying a preferred language or locale upfront helps streamline interactions. This new protocol weaves together trust, authentication, and contextual awareness, forging a resilient substrate on which the agent-first internet can reliably function.
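Since AACP is a proposed protocol rather than an existing standard, the sketch below is purely illustrative: a Python rendering of what a handshake disclosure and a signed write-like operation might carry.

from dataclasses import dataclass, field

@dataclass
class HandshakeHello:
    # Mutual disclosure at the start of an AACP session (illustrative fields).
    agent_id: str
    certificate: str            # issued by a trusted agent-certificate authority
    model_stack: str            # e.g. which LLM/configuration the agent runs on
    capabilities: list[str] = field(default_factory=list)
    locale: str = "en-US"       # locale negotiated up front

@dataclass
class WriteOperation:
    # A "write-like" action: must carry an auditable cryptographic signature.
    operation: str              # e.g. "initiate_payment"
    params: dict
    authorization_sig: str      # signed by the delegating principal's key

hello = HandshakeHello(
    agent_id="agent://sarah/travel-planner",
    certificate="<agent-certificate>",
    model_stack="example-llm/2028-11",
    capabilities=["search_flights", "book_flights", "negotiate_refunds"],
)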
Trust and reputation reimagined
When we navigate the internet, our judgment of a website’s credibility hinges on a blend of visual and social cues. We look for secure HTTPS connections, professional design, and familiar branding to assure us that a site is trustworthy. No one wants to input their credit card information on a site that looks like it was built in the early 2000s. User reviews and star ratings on platforms like Trustpilot and G2 further influence our decisions, offering insights drawn from shared human experiences.
Perhaps no aspect of online commerce requires more fundamental reimagining than trust and reputation systems. In an agent-mediated economy, traditional cues for reliability fall short. AI agents can’t interpret visual aesthetics or branding elements–they operate on data, protocols, and cryptographic proofs.
Trust mechanisms must pivot from human perception to machine-readable verifications. For instance, an agent might verify a seller’s identity through cryptographic attestations and assess service quality via automated compliance records, ensuring decisions are based on objective, tamper-proof data. Traditional review platforms like Trustpilot and G2, built around subjective human experiences and star ratings, will also become increasingly obsolete.
The emerging alternative is a new trust infrastructure built on quantifiable, machine-readable metrics. Instead of relying on potentially AI-generated reviews, a problem that has already undermined traditional review systems, agents could assess services using benchmarks like delivery time reliability, system uptime, or refund processing speed—measurable metrics that ensure objective evaluations rather than subjective human reviews.
This could involve decentralized reputation networks where trust is established through cryptographically verified interaction histories and smart contract execution records. Such systems would offer objective assessments of service quality, enabling agents to make informed decisions without relying on potentially biased or manipulated human reviews.
Moreover, the feedback loop between consumers and businesses will evolve dramatically. Instead of sending generic emails requesting reviews—a method often resulting in low response rates—commerce websites can engage directly with your AI agent to collect timely feedback about specific topics like shipping or product quality.
They might offer incentives like future store credit to encourage participation. The human user could provide a brief impression, such as “The cordless vacuum cleaner works well, but the battery life is short.” The agent then takes this input, contextualizes it with additional product data, and generates a comprehensive review that highlights key features and areas for improvement. This process not only saves time for the user but also provides businesses with richer, more actionable insights.
Trustpilot and G2 could pivot by introducing agent-oriented verification systems, such as machine-readable trust scores derived from operational metrics like service accuracy, delivery consistency, and customer support responsiveness, enabling agents to evaluate businesses programmatically.
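As a toy illustration of such machine-readable trust metrics, an agent might fold operational measurements into a composite score; the metric names and weights below are arbitrary assumptions, not an industry standard.

def trust_score(metrics: dict) -> float:
    # Combine verifiable operational metrics into a 0-1 trust score.
    weights = {
        "on_time_delivery_rate": 0.4,  # fraction of orders delivered on time
        "uptime": 0.3,                 # service availability over 90 days
        "refund_speed": 0.3,           # 1.0 = same-day refunds, 0.0 = never
    }
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights)

vendor = {"on_time_delivery_rate": 0.97, "uptime": 0.999, "refund_speed": 0.8}
print(round(trust_score(vendor), 3))  # 0.928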
The new data-sharing economy
Information sharing in the age of AI agents demands a fundamental reinvention of the current consent and data access model. Rather than blunt instruments like cookie banners and privacy policies, websites will implement structured data requirement protocols—machine-readable manifests that explicitly declare what information is needed and why.
This granular control would operate at multiple levels of specificity. For example, an agent could share your shirt size (L) with a retailer while withholding your exact measurements. It might grant 24-hour access to your travel dates, but permanent access to your seating preferences.
When a service requests location data, your agent could share your city for shipping purposes but withhold your exact address until purchase confirmation. These permissions wouldn’t be just binary yes/no choices—they could include sophisticated rules like “share my phone number only during business hours” or “allow access to purchase history solely for personalization, not marketing.”
Such granular controls, impossible to manage manually at scale, become feasible when delegated to AI agents operating under precise constraints.
AI agents would also act as sophisticated information gatekeepers, maintaining encrypted personal data vaults and negotiating data access in real time.
These mechanisms will fundamentally shift the balance of power in data-sharing dynamics. GDPR-like frameworks may evolve to include provisions for dynamic, agent-mediated consent, allowing for more granular data-sharing agreements tailored to specific tasks.
Websites might implement real-time negotiation protocols, where agents can evaluate and respond to data requests based on their principal’s preferences, preserving privacy while optimizing functionality.
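A brief sketch of how an agent might evaluate such rules; the permission schema here is invented for illustration.

from datetime import datetime, timedelta

# Illustrative permission grants an agent could enforce on its principal's behalf.
permissions = [
    {"field": "shirt_size", "purpose": "any", "expires": None},
    {"field": "travel_dates", "purpose": "booking",
     "expires": datetime.now() + timedelta(hours=24)},  # 24-hour grant
    {"field": "purchase_history", "purpose": "personalization", "expires": None},
    # Note: purchase_history is deliberately NOT granted for "marketing".
]

def may_share(field: str, purpose: str) -> bool:
    for rule in permissions:
        if rule["field"] == field and rule["purpose"] in ("any", purpose):
            if rule["expires"] is None or rule["expires"] > datetime.now():
                return True
    return False

print(may_share("purchase_history", "personalization"))  # True
print(may_share("purchase_history", "marketing"))        # False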
New attack vectors
The shift to agent-mediated interaction introduces novel security challenges. Agent impersonation and jailbreaking agents are two examples.
Jailbreaking AI agents poses significant risks, as manipulated agents could act outside their intended scope, leading to unintended purchases or other errors. Techniques like instruction-tuning poisoning or adversarial suffix manipulation could alter an agent’s behavior during critical tasks.
For example, adversarial instructions embedded in websites’ HTML might influence an agent’s purchasing logic, bypassing its human-defined constraints. Robust safeguards and continuous monitoring will be essential to prevent these vulnerabilities.
Agent impersonation adds a complex layer to cybersecurity challenges. Malicious actors could spoof an agent’s credentials to access sensitive data or execute fraudulent transactions. Addressing this threat demands robust multi-layered verification protocols, such as cryptographic identity verification paired with continuous behavioral monitoring, to ensure authenticity and safeguard sensitive interactions.
Building the new web – opportunities for founders
The web’s agent-first future has no established playbook, and that’s exactly where founders thrive. Entirely new product categories are waiting to be defined: agent-to-agent compliance dashboards, cryptographic attestation services that replace outdated CAPTCHAs, and dynamic data-sharing frameworks that make “privacy by design” a reality.
Platforms that offer standardized “agent passports,” identity brokerages that verify delegation rights, agent-native payment gateways, and trust ecosystems driven by machine-readable performance metrics—each of these represents a greenfield opportunity to set the standards of tomorrow’s internet.
Startups anticipating these shifts can position themselves as foundational players in an agent-driven economy, opening new channels of value creation and establishing a competitive edge before the rest of the market catches up.
Some concrete areas include:
Trustpilot for agents – creating machine-readable trust metrics and reputation systems that help agents evaluate services and vendors
Okta for AI agents – building the identity and authentication layer that manages agent credentials, permissions, and delegation chains
OneTrust for agents – creating the new standard for privacy preference management, turning today’s basic cookie banners into sophisticated data-sharing frameworks where agents can negotiate and manage granular permissions across thousands of services
Cloudflare for agent traffic – developing intelligent rate-limiting and traffic management systems designed for agent-scale operations
LastPass for agent permissions – building secure vaults that manage agent credentials and access rights across services
AWS CloudFront for agent data – creating CDN-like infrastructure optimized for agent-readable formats and rapid agent-to-agent communication
McAfee security for agents – developing security platforms that protect against agent impersonation and novel attack vectors
The advancement of digital frameworks has created new hurdles for business IT operations. A company’s network, cloud infrastructure, and data streams must be monitored and secured to meet performance and availability requirements, work that directly cuts into team productivity.
These demands are nearly impossible to meet with traditional workflows, which rely on outdated approaches built around reactive monitoring and manual debugging.
Artificial intelligence for IT operations (AIOps) has emerged as a breakthrough for streamlining IT operations and supporting business growth.
AIOps applies AI and machine learning to deliver predictive IT maintenance, proactive incident detection, and scalable automation. By optimizing resource management, minimizing downtime, and streamlining IT service management (ITSM), it has become a crucial framework for modern enterprises.
Understanding AIOps and its role in IT operations
AIOps refers to the application of AI and machine learning technologies to IT operations. It enhances decision-making and automation by analyzing vast amounts of data from numerous sources, such as logs, metrics, and network traffic. Key capabilities of AIOps include:
Data ingestion and correlation: Aggregating IT data from multiple sources.
Anomaly detection: Identifying irregular patterns that indicate operational issues (such as misconfigurations), potential failures, or security threats (illustrated in the sketch after this list).
Root cause analysis: Automatically diagnosing issues to pinpoint the source of disruptions.
Automated remediation: Implementing fixes without human intervention, reducing mean time to resolution (MTTR).
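As a minimal illustration of the anomaly-detection capability, the sketch below applies scikit-learn's IsolationForest to synthetic telemetry; the metrics and contamination rate are assumptions for demonstration.

import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic telemetry: rows = minutes, columns = [cpu_pct, latency_ms].
rng = np.random.default_rng(0)
normal = rng.normal(loc=[40, 120], scale=[5, 15], size=(500, 2))
spike = np.array([[95, 900]])          # an incident-like outlier
telemetry = np.vstack([normal, spike])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)
flags = detector.predict(telemetry)    # -1 marks anomalies

print(np.where(flags == -1)[0])        # the spike's row index (500) should appear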
AIOps not only enhances IT operations with advanced analytics and automation but also represents a paradigm shift in how IT teams manage infrastructure and incidents. Traditional IT operations rely on reactive monitoring, where teams respond to alarm notifications only after a problem has already caused system downtime. This approach prolongs downtime, drives up operational costs, and, because it depends on manual intervention, introduces inefficiencies and increases the risk of errors, ultimately hindering IT teams’ ability to deliver seamless service.
AIOps, by contrast, enables proactive action: it continuously analyzes data to predict and prevent failures before they impact performance. By embedding AI into IT procedures, organizations can optimize infrastructure management, enhance security, and automate incident remediation.
Real-world use cases of AIOps in predictive maintenance and incident response
A. Predictive maintenance with AIOps
One of the primary advantages of AIOps is its ability to perform predictive maintenance. By using AI-driven analytics, organizations can detect system anomalies before they escalate into failures. Here is how AIOps enables predictive maintenance:
Pattern recognition: Machine learning models can be trained to recognize the expected behavior of a system, analyzing performance data to identify trends and patterns. By doing so, these models can predict potential failures or misconfigurations before they occur, enabling proactive maintenance and minimizing downtime.
Proactive interventions: Upon detection of potential issues, automated runbooks can be triggered to swiftly address the problem, minimizing downtime and ensuring business continuity. In cases where human intervention is unavoidable, IT teams can proactively schedule maintenance during planned downtime or off-peak hours, preventing system issues from impacting end users and reducing the risk of service disruptions.
Moreover, predictive maintenance offers a range of key benefits that help organizations optimize operations and reduce costs:
Operational efficiency: Automating maintenance reduces the workload on IT teams.
In order to illustrate the impact of predictive maintenance in action, let’s look at a case study where AIOps played a crucial role in preventing server failures.
One of the best examples of AIOps in action is Netflix’s Simian Army, a set of tools employed to make its streaming service reliable.
Among its ranks is Chaos Monkey, which randomly kills instances in Netflix’s cloud infrastructure to test the system’s ability to survive failure. This is done in advance so that Netflix can detect and fix problems before they impact users, making the system more robust and minimizing downtime.
Having seen how AIOps can avert system failures through predictive maintenance, it is equally important to understand its contribution to incident response and resolution.
B. Incident response and resolution with AIOps
While AIOps helps anticipate and avoid failures, it also assists organizations by automating the identification and resolution of unforeseen incidents, minimizing disruption and enabling quicker recovery.
AIOps enhances incident response through automated anomaly detection and resolution processes. Through continuous system monitoring, AI can detect ongoing threats in real time, such as unauthorized login attempts or performance anomalies, ensuring problems are caught promptly.
Furthermore, AIOps enables IT Service Management tools to automate the response process. It generates tickets, allocates tasks, and even applies resolutions automatically, all without human intervention, reducing the time and effort required to resolve incidents and preventing operations from becoming derailed.
It also applies to ITSM functions like root cause analysis and issue tracking, where AI accelerates diagnostics to enable quicker responses to high-priority issues. Moreover, AIOps integration with helpdesk software ensures proper case management and seamless team coordination, increasing overall IT service efficiency and reducing resolution times.
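A simplified sketch of this automated response loop; the runbooks and ticket IDs are placeholders rather than any specific ITSM product's API.

RUNBOOKS = {
    "disk_full": lambda host: f"cleaned tmp files on {host}",
    "service_down": lambda host: f"restarted service on {host}",
}

def handle_incident(alert: dict) -> str:
    # 1. Open a ticket automatically (placeholder for a real ITSM integration).
    ticket_id = f"INC-{hash(alert['host']) % 10000:04d}"
    # 2. Apply the matching runbook without human intervention, if one exists.
    runbook = RUNBOOKS.get(alert["type"])
    if runbook:
        result = runbook(alert["host"])
        return f"{ticket_id}: auto-remediated ({result})"
    # 3. Otherwise escalate to the on-call engineer.
    return f"{ticket_id}: escalated to on-call"

print(handle_incident({"type": "disk_full", "host": "web-01"}))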
A notable example of AIOps in action is a case study of a major multinational financial services organization. The organization implemented Moogsoft’s AIOps platform to automate incident management processes. By automating event correlation and noise reduction, the bank decreased mean time to detect (MTTD) by 35% and MTTR by 43%. These decreases led to greater operational efficiency and a more responsive IT environment.
Building a scalable AIOps architecture
AIOps significantly enhances incident response through automatic detection and remediation, allowing organizations to respond instantly to issues and maintain business continuity.
To leverage AIOps effectively, however, organizations need a scalable architecture that can manage growing data volumes and remain effective as the IT infrastructure expands.
The following key components are the building blocks of an AIOps solution, enabling faster incident detection, accurate predictions, and seamless automation of IT operations:
Data ingestion layer: Collecting logs, metrics, and event data from diverse IT sources.
AI and ML models: Analyzing patterns, detecting anomalies, and making predictions.
Automation and orchestration: Executing remediation actions and optimizing workflows.
After defining the right AIOps architecture, it is important to implement it effectively. The following best practices, aligned with business objectives and IT processes, help embed AI-driven operations and maximize their impact:
Selecting the right AIOps tools: Choose platforms that align with business objectives.
Ensuring seamless integration: AIOps should work with existing IT workflows and monitoring solutions.
Building a feedback loop: Continuously refine AI models to enhance accuracy and effectiveness.
By following these best practices, organizations can maximize the potential of AIOps and enhance their IT operations. However, as organizations scale their AIOps solutions, they must also confront certain challenges that can hinder their growth and effectiveness. Addressing these challenges is crucial to maintaining the value of AIOps as the IT environment continues to evolve.
Two key challenges stand out: handling vast amounts of data, where AI models must process extensive datasets efficiently, and overcoming resistance to automation, as IT teams may need training to trust AI-driven operations.
As AIOps continues to advance, several emerging trends are defining its role in IT operations. One significant trend is the rise of AI-driven self-healing systems, where automated recovery mechanisms correct faults without human involvement, allowing systems to resolve issues before they escalate.
Beyond this, integration with edge computing will enhance AIOps’ ability to manage distributed IT environments. With more devices and data sources operating at the edge, AIOps must scale to accommodate these decentralized networks.
Moreover, cloud-native AIOps solutions are gaining popularity, offering greater flexibility and scalability for hybrid and multi-cloud environments. These advances will allow firms to deploy AIOps across increasingly complex IT landscapes.
In addition to these advancements, data security and privacy concerns are coming to the fore as AIOps matures. Addressing them requires stronger encryption and compliance features so that sensitive information is effectively safeguarded. And as decision-making becomes increasingly reliant on AI models, transparency is essential to building trust.
Through explainable AI (XAI) techniques, organizations can offer greater visibility into how AI systems reach decisions, assuring stakeholders that AI is used ethically and responsibly.
By embracing these trends and addressing data privacy concerns, AIOps can lead the way to the future of IT operations: autonomous, secure, and efficient.
Conclusion
To sum up, AIOps is revolutionizing IT operations by enabling predictive maintenance, proactive incident management, and automated scalability. Organizations are able to maximize efficiency, reduce downtime, and simplify IT service management by leveraging the capabilities of AI and machine learning.
As AI and automation technologies continue to evolve, AIOps is set to become the key to orchestrating complex IT infrastructures. Organizations that adopt it will gain a competitive edge by optimizing their operations and providing users with seamless digital experiences. In time, AIOps will be seen not merely as an assisting tool but as the backbone of intelligent IT management, driving both innovation and business excellence in the digital era.
This article comes from Ryan Priem’s talk at our Washington, D.C. 2025 Generative AI Summit. Check out his full presentation and the wealth of OnDemand resources waiting for you.
What’s the point of AI if it doesn’t actually make your workday easier?
That’s the question I keep coming back to – and the one that ultimately brought me into the generative AI space.
I’m Ryan Priem, and I lead sales for Glean here in the East. After more than two decades in tech, working in data and analytics at places like Snowflake and EMC, I saw something shift. Large language models weren’t just impressive – they were starting to offer real, measurable value.
But there’s a catch: value doesn’t come from the model alone. It comes from how well you apply it.
That’s what drew me to Glean. We’re focused on using AI to solve actual workplace problems. Whether it’s helping someone find the right document, answer a critical question, or automate a tedious task, we’re building AI that works the way people do.
This article is a walk-through of what that journey looks like and what it really takes to build useful, scalable agents that people actually want to use.
Let’s dive in.
What work AI systems actually do (and why they matter now)
We classify ourselves as a “work AI” company. What that means is we’re focused on three core use cases:
Find something. Think of enterprise search – Google-like capabilities across your entire data corpus. We’ve built 120+ native connectors that index everything from Slack and Teams to Confluence, Salesforce, and SharePoint.
Answer something. This is where generative AI kicks in. It’s about providing accurate, relevant answers from within your organization’s ecosystem – like what Microsoft Copilot does, but across all your apps.
Do something. This is the really exciting part: task automation. Whether it’s preparing for a meeting, writing follow-up notes, creating a social media post, or resolving a support ticket – these are the everyday things that slow people down. We help you automate them.
The key to all of this is reducing friction. If you can find the right doc in seconds, get the right answer immediately, and offload repetitive tasks to an agent, you can spend more time doing the high-impact work that actually moves the business forward.
As models continue to grow in size and complexity, the memory and compute demands they place on hardware continue to skyrocket. Among the promising strategies for overcoming these challenges is quantization, which lowers the precision of the numbers used in a model without a noticeable loss in performance.
In this article, I will dive into the theory underlying this strategy and show a practical implementation of 8‑bit quantization on a large model: the IBM Granite model, quantized with BitsAndBytes.
Introduction
The rapid growth of deep learning has resulted in an arms race of models boasting billions of parameters, which in most cases achieve stellar performance but require enormous computational resources.
As engineers and researchers look for methods to make these large models more efficient, quantization has proven to be an incredibly effective solution. By lowering the bit width of number representations from 32‑bit floating point to lower‑bit integers, quantization decreases the overall model size, speeds up inference, and cuts energy consumption, all while keeping output accuracy high.
I will explore the concepts and techniques behind 8‑bit quantization in this article. I will explain the approach’s benefits, outline the theory behind it, and walk you through the process step by step.
I will then show you a practical application: quantizing the IBM Granite model using BitsAndBytes.
Understanding quantization
At its core, quantization is the process of mapping input values from a large set (usually continuous, high‑precision values) to a much smaller, discrete set with lower precision. In deep learning, this typically means converting 32‑bit floating‑point numbers to lower‑bit integer alternatives.
The result is a massive reduction in memory usage and computation time.
Benefits of quantization
Lower memory footprint: Lower precision means that each parameter requires much less memory.
Increased speed: Integer math is generally much faster than floating‑point operations, especially on hardware optimized for low‑bit computations.
Energy efficiency: Lower precision computations consume far less power, making them ideal for mobile and edge devices.
Types of quantization
Uniform quantization: This method maps a range of floating‑point values uniformly to integer values.
Non‑uniform quantization: Uses a more complicated mapping based on the distribution of the weights or activations of the network.
Symmetric vs. asymmetric quantization:
Symmetric: Uses a zero‑point of zero and the same scale for positive and negative values.
Asymmetric: Allows different scales and zero‑points, which is useful for distributions that are not centered around zero.
8‑bit quantization represents each weight or activation in the model with 8 bits, giving 256 discrete values.
This approach balances compression and precision, enabling:
Memory savings: Lowering the representation from 32 bits to 8 bits per parameter can cut the memory footprint by up to 75%.
Speed gains: Many hardware accelerators and CPUs are fully optimized for 8‑bit arithmetic, which massively improves inference times.
Minimal accuracy loss: With careful calibration and potentially fine‑tuning, the degradation in performance with 8-bit quantization is often minimal.
Deployment on edge devices: The reduced model size and faster computations make 8‑bit quantized models perfect for devices with limited computational resources.
Theoretical underpinnings of quantization
Quantization is firmly rooted in signal processing and numerical analysis. The objective is to reduce precision while controlling the quantization error: the difference between the original value and its quantized version.
Quantization error
The quantization error is the gap between an original value and its reconstruction after quantizing and dequantizing; effective calibration keeps this error small across the whole range of the data.
Scale and zero‑point
A linear mapping is normally used to perform quantization:
Scale (S): Sets the step size between our quantized values.
Zero‑point (Z): The integer value assigned to the real number zero.
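For reference, here is a sketch of how these quantities combine in the standard linear (asymmetric) 8‑bit mapping, writing $x$ for a real value and $q$ for its 8‑bit code:

\[
S = \frac{x_{\max} - x_{\min}}{255}, \qquad
Z = \operatorname{round}\!\left(-\frac{x_{\min}}{S}\right)
\]
\[
q = \operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{x}{S}\right) + Z,\; 0,\; 255\right), \qquad
\hat{x} = S\,(q - Z)
\]

For values inside the calibrated range, the reconstruction error $|x - \hat{x}|$ is bounded by $S/2$, which is why a tight calibration range directly improves accuracy.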
The process normally involves a calibration phase to determine the optimal scale and zero‑point values. This is then followed by the actual quantization of weights and activations.
Quantization Aware Training (QAT) vs. Post‑Training Quantization (PTQ)
Quantization Aware Training (QAT): This integrates a simulated quantization into the training process, allowing the model to adapt its weights to quantization noise.
Post‑Training Quantization (PTQ): Applies quantization to a pre‑trained model using calibration data. PTQ is simpler and faster to implement but it may incur a slightly larger accuracy drop compared to QAT.
Steps in 8‑bit quantization
Applying 8‑bit quantization includes some essential steps:
Preprocessing and calibration
Step 1: Investigate the model’s dynamic range
Before quantization, we need to know the ranges of the weights and activations:
Collect statistics: Pass a portion of the dataset through the model to collect statistics (min, max, mean, standard deviation) for all layers.
Establish ranges: Based on these statistics, define quantization ranges, possibly clipping outliers to obtain a tighter range.
Step 2: Calibration
Calibration is the process of selecting the best scale and zero-point for each tensor or layer:
Min/max calibration: Uses the observed minimum and maximum values.
Percentile calibration: Uses a percentile (e.g., the 99.9th) to exclude outliers.
Calibration must be done carefully, since poor choices result in a significant loss of accuracy.
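The sketch below contrasts the two calibration strategies on a synthetic weight tensor with one injected outlier; the 99.9th-percentile choice mirrors the example above, and the data itself is made up.

import numpy as np

def calibrate(values, percentile=None):
    # Min/max calibration uses the observed extremes; percentile calibration
    # clips outliers for a tighter range.
    if percentile is None:
        lo, hi = values.min(), values.max()
    else:
        lo, hi = np.percentile(values, [100 - percentile, percentile])
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point

weights = np.random.default_rng(0).normal(0, 1, 10_000)
weights[0] = 12.0  # one extreme outlier

for p in (None, 99.9):
    s, z = calibrate(weights, p)
    q = np.clip(np.round(weights / s) + z, 0, 255)
    err = np.abs(weights - s * (q - z)).mean()
    print(f"percentile={p}: scale={s:.4f}, mean abs error={err:.5f}")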
Quantization Aware Training vs. Post‑Training Quantization
Quantization Aware Training (QAT):
Advantages: Greater precision as the model learns how to compensate for quantization distortion.
Cons: Involves modifying the training procedure and extra computation.
Post‑Training Quantization (PTQ):
Advantages: It’s much easier to implement because the model is already pre-trained.
Disadvantages: It can sometimes result in a greater reduction in accuracy, particularly in precision-sensitive models.
For most large models, the small accuracy loss from PTQ is acceptable; mission-critical applications can use QAT.
No matter which deep learning environment—PyTorch, TensorFlow, or ONNX—the concepts of 8‑bit quantization remain the same.
Practical considerations
Before implementing quantization, consider the following:
Hardware support
Ensure that the target hardware (CPUs, GPUs, or special accelerators like TPUs) natively supports 8‑bit operations.
Libraries
PyTorch: Gives us built-in support for QAT and PTQ through its designated quantization module.
TensorFlow Lite: Offers us utilities to transform models to an 8‑bit quantized format, especially for embedded and mobile applications.
ONNX Runtime: Supports quantized models for use across different platforms.
Model structure: Not all layers in the model respond equally to quantization. Convolutional and fully connected layers are generally fine, but some activation and normalization layers may need special treatment.
Fine-tuning: Fine-tuning the quantized model on a small calibration dataset can help restore any performance loss due to quantization noise.
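As one concrete example, PyTorch's built-in post-training dynamic quantization can be applied in a few lines. This is a minimal sketch on a toy model, distinct from the BitsAndBytes path shown later; dynamic quantization targets Linear layers and runs on CPU.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])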
BitsAndBytes: A specialized library for 8‑bit quantization
BitsAndBytes is a library that streamlines the 8‑bit quantization process for very large models. Frameworks like PyTorch offer native quantization support, but BitsAndBytes provides additional optimizations designed to convert 32‑bit floating‑point weights into 8‑bit integers.
With a simple config flag (e.g., load_in_8bit=True), it enables significant reductions in memory usage and speeds up inference without requiring massive code modifications.
Integrating BitsAndBytes with your workflow
BitsAndBytes integrates seamlessly with popular frameworks like PyTorch: you simply specify the quantization configuration when loading the model.
This tells the system to convert the weights from 32‑bit floating point to 8‑bit integers on the fly, reducing the overall memory footprint by up to 75% and enhancing inference speed, which is ideal for deployment in resource-constrained environments.
With this single configuration change, you can switch to 8‑bit precision while maintaining high performance, making BitsAndBytes a valuable addition to your deep learning workflow.
Case study: Quantizing IBM Granite with 8‑bit using BitsAndBytes
IBM Granite is a 2‑billion parameter model designed for instruction‑following tasks. Given its size, quantizing IBM Granite to 8‑bit significantly reduces its memory footprint while preserving good performance.
IBM Granite quantization: Example code
The following is the code segment for configuring IBM Granite with 8‑bit quantization:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Set up the IBM Granite model using 8-bit quantization.
model_name = "ibm-granite/granite-3.1-2b-instruct"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="balanced",  # Adjust as needed based on available GPU memory.
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
Code breakdown
Model Selection:
The model_name variable sets up the IBM Granite model to be used for instruction execution.
Quantization Setup:
BitsAndBytesConfig(load_in_8bit=True) activates 8‑bit quantization: the flag tells the model loader to quantize the 32‑bit floating‑point weights to 8‑bit integers.
Model loading:
AutoModelForCausalLM.from_pretrained() loads the model using the specified configuration. The parameter device_map="balanced" helps distribute the model across available GPUs, and torch_dtype=torch.float16 ensures that any remaining computation uses half‑precision.
Tokenizer initialization:
The tokenizer is instantiated with AutoTokenizer.from_pretrained(model_name), ensuring input text is preprocessed correctly for the quantized model.
This method not only lowers the model’s memory usage by as much as 75% but also increases inference speed, making it particularly suitable for deployment in memory-limited settings, such as edge devices.
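Once loaded, the quantized model behaves like any other transformers causal LM. A short usage sketch continuing from the code above; the prompt and generation settings are arbitrary.

prompt = "List three benefits of 8-bit quantization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))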
Challenges
Even though 8-bit quantization is highly advantageous, it also presents some challenges:
Accuracy degradation
Some models can suffer from a loss of accuracy after quantization due to quantization noise.
Calibration difficulty
Determining appropriate calibration data and techniques can be difficult, especially for models with a broad dynamic range.
Hardware constraints
Ensure that your target deployment platform fully supports 8‑bit operation, or performance will be disappointing.
Best practices
Full calibration
Use a representative data set to accurately calibrate the model’s weights and activations.
Layer-by-layer analysis
Determine which layers are sensitive to quantization and evaluate the necessity to retain them at a higher precision.
Progressive evaluation
Quantization is not a one-shot fix. Iterate on your strategy, experimenting with different calibration techniques and potentially mixing PTQ with QAT.
Use framework tools
Utilize the high-level quantization utilities integrated into frameworks such as PyTorch and TensorFlow, as these utilities are always being improved and updated.
Fine‑tuning
If possible, optimize the quantized model on a subset of your data to recover any performance loss due to quantization.
Conclusion
Quantization and 8‑bit quantization are powerful techniques for reducing the memory footprint and accelerating the inference of large models. By converting 32‑bit floating‑point values to 8‑bit integers, you can achieve significant memory savings and speedups with minimal accuracy loss.
In this article, we discussed the theoretical foundations of quantization and walked through the steps involved in preprocessing, calibration, and choosing between quantization-aware training and post-training quantization.
We then gave practical examples using popular frameworks, finishing with a case study involving the quantization of the IBM Granite model using BitsAndBytes.
As deep learning models continue to grow, mastering techniques like 8‑bit quantization will be essential for deploying efficient, state‑of‑the‑art systems, from the data center down to edge devices.
Whether you’re an AI researcher or a deployment engineer, understanding how to optimize large models is an essential skill in today’s AI landscape.
Applying 8‑bit quantization through tools such as BitsAndBytes reduces the computational and memory overhead of large models like IBM Granite, enabling more scalable, efficient, and energy‑friendly deployment across diverse applications and hardware platforms.
Happy quantizing, and may every bit and byte count in your models become leaner, faster, and more efficient!
Artificial intelligence forms the heart of the digital revolution in the advent of the twenty-first century. Handling big data through fine-grained data pipelines is crucial for perfect AI training, and such a requirement is felt more strongly in computer vision applications.
AI models, especially deep learning models, need large volumes of labeled image data to train and reason effectively. A well-designed, scalable image processing pipeline feeds them quality-prepared data, minimizing errors during training and optimizing model performance.
This article discusses the essential components and strategies for implementing efficient, scalable image data pipelines for AI model training.
The need for scalable image data pipelines
Image-based AI applications are notoriously data-hungry. Whether the task is image classification, object detection, or facial recognition, these models require millions of images to learn from, and every image must be preprocessed before training: resized, normalized, and often augmented. As data scales up, these operations become increasingly complex, demanding a robust, flexible pipeline that can handle tasks such as:
Data ingestion: Rapidly ingest large volumes of image data coming from different sources.
Data preprocessing: Transform raw image data into forms usable for model training, including resizing, cropping, and augmentation.
Data storage: Store preprocessed data so it can be accessed quickly during training.
Scalability: Scale with ever-larger datasets without a drop in performance.
Automation and monitoring: Automate repetitive tasks while tracking pipeline activity to maintain peak efficiency and catch potential problems before they emerge.
1. Data ingestion
Data ingestion is the first step in an image data pipeline: collecting source images from a variety of origins, such as public image repositories, company databases, or web scraping. Since image datasets span thousands to millions of files, efficient ingestion mechanisms must be designed.
Best practices for data ingestion:
Batch processing: Ingest large datasets in batches to handle high volumes smoothly (see the sketch after this list).
Streaming ingestion: In real-time applications, stream data from cameras or IoT devices directly into the pipeline to avoid latency and keep data fresh.
Data versioning: Version datasets to track changes and ensure the integrity of the training data.
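To illustrate batch ingestion, here is a hedged sketch that pages through an S3 bucket with boto3 and downloads keys in fixed-size batches; the bucket, prefix, and directory names are hypothetical:

```python
# A hedged sketch of batched ingestion from S3 with boto3. Bucket, prefix,
# and destination directory are hypothetical placeholders.
import os
import boto3

s3 = boto3.client("s3")

def iter_image_keys(bucket: str, prefix: str):
    """Yield object keys page by page instead of listing everything at once."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith((".jpg", ".jpeg", ".png")):
                yield obj["Key"]

def ingest_batch(bucket: str, keys: list, dest_dir: str = "raw") -> None:
    """Download one batch of images into local staging storage."""
    os.makedirs(dest_dir, exist_ok=True)
    for key in keys:
        s3.download_file(bucket, key, os.path.join(dest_dir, os.path.basename(key)))

# Example: pull the first 1,000 keys in batches of 100.
keys = list(iter_image_keys("my-image-bucket", "training/"))[:1000]
for i in range(0, len(keys), 100):
    ingest_batch("my-image-bucket", keys[i : i + 100])
```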
2. Efficient data preprocessing
After ingestion, raw images undergo preprocessing: resizing to uniform dimensions, normalizing pixel values, converting formats, and augmenting data through rotation, flipping, or color modification. Augmentation synthetically increases the effective size of a dataset and improves model robustness.
Best practices for data preprocessing:
Parallel processing: Preprocessing images in parallel across multiple nodes greatly reduces the time needed to prepare large datasets.
Use of GPUs: Image preprocessing—especially augmentation—is greatly helped by the parallelism afforded by GPUs.
Pipeline automation: Automated preprocessing pipelines built with TensorFlow’s tf.data or PyTorch’s DataLoader simplify the process; a minimal tf.data sketch follows this list.
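As a concrete illustration, here is a hedged tf.data sketch of an automated preprocessing pipeline; the file pattern, image size, and batch size are assumptions:

```python
# A hedged sketch of an automated preprocessing pipeline with tf.data.
# File pattern, image size, and batch size are assumptions.
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed target resolution

def preprocess(path):
    """Decode, resize, lightly augment, and normalize one image."""
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMG_SIZE)
    img = tf.image.random_flip_left_right(img)  # simple augmentation
    return tf.cast(img, tf.float32) / 255.0

paths = tf.data.Dataset.list_files("data/train/*.jpg")  # assumed file pattern
dataset = (
    paths.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallel workers
    .shuffle(1_000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with training
)
```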
3. Data storage and management
Storing millions of images calls for an approach that allows swift retrieval during training, offers scalability, and keeps costs down.
Popular large-scale image data pipelines use distributed storage systems such as Amazon S3 or Google Cloud Storage, which provide high availability and scalability and let you store huge datasets without managing complicated infrastructure yourself.
Key considerations for image data storage:
Object storage: Use an object storage system like Amazon S3, which handles unstructured data well and can store images at scale.
Data caching: For frequently accessed images, a caching layer minimizes retrieval times, especially during model training.
Data compression: Compressing image files reduces storage costs and transfer times with little or no visible quality loss (a small recompression sketch follows).
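To illustrate the compression point, this small hedged sketch re-encodes PNGs as JPEGs with Pillow; the quality setting and directory names are assumptions to tune for your accuracy requirements:

```python
# A hedged sketch of recompressing images with Pillow. The quality value and
# directory names are assumptions.
from pathlib import Path
from PIL import Image

def compress_image(src: Path, dst: Path, quality: int = 85) -> None:
    """Re-encode an image as JPEG, trading a little fidelity for much less storage."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    Image.open(src).convert("RGB").save(dst, "JPEG", quality=quality, optimize=True)

for path in Path("raw").glob("*.png"):  # assumed source directory
    compress_image(path, Path("compressed") / (path.stem + ".jpg"))
```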
4. Distributed processing and scalability
Scalability is paramount when building an image data pipeline, since datasets keep growing. Distributed processing frameworks such as Apache Spark or Dask support this by processing huge volumes of data in parallel across several machines, ensuring scalability and shorter processing times.
Scaling strategies for image data pipelines:
Horizontal scaling: Adding nodes spreads the load across a number of servers, which is especially advantageous for large-scale image datasets (see the Dask sketch after this list).
Serverless architecture: Leverage serverless compute such as AWS Lambda or Google Cloud Functions to perform common image-processing tasks without managing the underlying servers.
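To sketch horizontal scaling in code, the example below fans a simple preprocessing function out across partitions with Dask; the paths, output directory, and partition count are assumptions:

```python
# A hedged sketch of parallel preprocessing with Dask. Paths, output
# directory, and partition count are assumptions.
import os
import dask.bag as db
from PIL import Image

def preprocess(path: str) -> str:
    """Resize one image to a uniform shape and write it to processed/."""
    out = os.path.join("processed", os.path.basename(path))
    Image.open(path).convert("RGB").resize((224, 224)).save(out)
    return out

os.makedirs("processed", exist_ok=True)
paths = [os.path.join("data", f) for f in os.listdir("data")]  # placeholder listing
results = db.from_sequence(paths, npartitions=32).map(preprocess).compute()
```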
5. Data access for model training
Once the image data is ingested, processed, and stored, it is ready for training. Training requires efficient data-access mechanisms and must scale up to large-scale distributed training across multiple machines or GPUs.
Major machine learning platforms like TensorFlow, PyTorch, and Apache MXNet support distributed training, allowing models to leverage huge datasets without bottlenecks.
Optimizing data access for training:
Prefetching: Load upcoming batches of images into memory while the model is still working on the previous batch, minimizing I/O wait times (see the loader sketch after this list).
Shuffling and batching: Shuffling prevents the model from learning spurious ordering patterns, and batching lets models train efficiently on subsets of the data.
Integration with distributed storage: Ensure your training environment is tightly integrated with the distributed storage system. This cuts down latency and ensures quick access to training data.
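A hedged PyTorch sketch ties prefetching, shuffling, and batching together; the dataset path, batch size, and worker settings are assumptions:

```python
# A hedged sketch of efficient data access in PyTorch. Dataset path, batch
# size, and worker settings are assumptions.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_ds = datasets.ImageFolder("processed/train", transform=transform)

loader = DataLoader(
    train_ds,
    batch_size=64,      # train on manageable subsets of the data
    shuffle=True,       # reshuffle every epoch to avoid ordering bias
    num_workers=4,      # background workers load batches ahead of the model
    prefetch_factor=2,  # batches each worker keeps buffered
    pin_memory=True,    # faster host-to-GPU copies
)

for images, labels in loader:
    pass  # forward/backward pass goes here
```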
6. Monitoring, automation, and maintenance
The pipeline should be continuously monitored, with recurring processes such as data ingestion, preprocessing, and error checking automated so that everything runs efficiently.
Monitoring tools such as Prometheus or Grafana can keep track of performance metrics while alerting mechanisms signal issues such as failing processes or resource bottlenecks.
Best practices for monitoring and maintenance:
Automate tasks: Use orchestration tools such as Apache Airflow or Kubeflow Pipelines to schedule recurring pipeline work (a minimal DAG sketch follows this list).
Log collection and alerts: Leverage logging frameworks and alerting systems to monitor the health of pipelines.
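As a concrete, hedged example of task automation, here is a minimal Airflow DAG that chains ingestion and preprocessing on a daily schedule; it assumes Airflow 2.4+ and uses placeholder task bodies:

```python
# A minimal Airflow DAG sketch (assumes Airflow 2.4+). Task bodies are
# placeholders for your own ingestion and preprocessing functions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    ...  # pull new images from sources into staging storage

def preprocess():
    ...  # resize, normalize, and augment the newly ingested images

with DAG(
    dag_id="image_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # run the pipeline once a day
    catchup=False,      # do not backfill missed runs
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    preprocess_task = PythonOperator(task_id="preprocess", python_callable=preprocess)
    ingest_task >> preprocess_task  # preprocess only after ingestion succeeds
```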
Best practices for scalable image data pipelines
Leverage cloud-native solutions: Cloud-native services provide much-needed flexibility, scalability, and cost optimization. AWS S3, Google Cloud Storage, and Azure Blob Storage make it easy to manage big image datasets.
Data governance: Apply versioning, labeling, and access controls to datasets to keep them secure and consistent.
Optimize for cost: Image data pipelines are costly in large-scale systems. Use storage tiers—hot and cold storage—to manage data costs optimally.
Automate and test regularly: Regularly test the pipeline for data integrity and correct preprocessing to ensure predictable performance and catch potential problems before they affect model training.
Conclusion
Designing and sustaining scalable image data processing pipelines for AI training involves careful planning of each step—from ingestion and preprocessing to storage, scalability, and monitoring. Distributed processing, cloud-native utilities, and automation create efficient and agile pipelines that cope with growing volumes of data, laying a solid foundation for robust, high-performing AI models.
It’s November 2028. Maya’s personal AI agent quietly handles her holiday shopping, easily navigating dozens of e-commerce sites. Unlike the clunky chatbots of 2024, her agent seamlessly parses product specifications, compares prices, and makes purchase decisions based on her preferences.
“The boots for your sister,” it explains, “are from that sustainable brand you both discussed last month – I found them at 20% off and confirmed they’ll arrive before your family gathering.” What would have taken Maya hours of manual searching now happens automatically, thanks to a web rebuilt for agent-first interaction.
The future, three years from now.
As we approach the end of 2024, a new paradigm shift is emerging in how we build and interact with the internet. With rapid advances in AI reasoning capabilities, tech giants and innovative startups alike are racing to define the next evolution of digital interaction: AI agents.
Google, Apple, OpenAI, and Anthropic have all declared AI agents as their primary focus for 2025. This transformation promises to be as significant as the web and mobile revolutions were and represents perhaps the most natural interface for LLM-powered technology, far more intuitive and capable than the chatbots that preceded it.
On a recent episode of the No Priors podcast, Nvidia CEO Jensen Huang stated that “there’s no question we’re gonna have AI employees of all kinds” that would “augment every single job in the company.”
Moreover, Gartner projects that by 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% today, enabling 15% of day-to-day work decisions to be made autonomously. This rapid adoption mirrors the mobile revolution of the early 2010s but with potentially more far-reaching implications for how we interact with digital services.
While there’s ongoing debate about what an AI Agent is, at its core, what sets agents apart from traditional software is their ability to autonomously plan and adapt.
Unlike rule-based systems that follow predetermined paths, agents can formulate strategies, execute them, and—most importantly—adjust their approach based on outcomes and changing circumstances. Think of them as digital assistants that don’t just follow a script, but actually reason about the best way to achieve your goals.
If a planned action fails or yields unexpected results, an agent can reassess and chart a new course, much like a human would. This flexibility and autonomous decision-making capability marks a departure from traditional software, which can only respond in pre-programmed ways.
The use of tools
Central to agents’ capabilities is their sophisticated use of tools. Much like a handyman who knows when to use a screwdriver versus a hammer, agents must determine which tools to use, when to use them, and how to use them effectively.
For instance, when helping you plan a trip, an agent might first use a calendar tool to check your availability, then a flight search API to find options, and finally a weather service to ensure you pack appropriately. The key isn’t just having access to these tools — it’s the agent’s ability to reason about their use and orchestrate them intelligently to accomplish complex tasks.
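That orchestration idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a real agent framework: in practice an LLM would generate and revise the plan, and every tool name, signature, and return value below is hypothetical:

```python
# A deliberately simplified, hypothetical sketch of tool orchestration.
# In a real agent an LLM generates and revises the plan; every tool, name,
# and return value here is invented for illustration.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Register a function so the agent can look it up by name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("calendar")
def check_availability(dates: str) -> str:
    return f"free on {dates}"  # stub: would call a calendar API

@tool("flights")
def search_flights(route: str) -> str:
    return f"3 options for {route}"  # stub: would call a flight-search API

@tool("weather")
def get_forecast(city: str) -> str:
    return f"mild in {city}"  # stub: would call a weather service

# A fixed plan stands in for the LLM's reasoning step.
plan = [("calendar", "May 3-7"), ("flights", "SFO to LIS"), ("weather", "Lisbon")]

for tool_name, arg in plan:
    print(f"{tool_name}: {TOOLS[tool_name](arg)}")
```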
This article was originally published here at AI Tidbits, where you can read more of Sahar’s fascinating perspectives on AI-related topics.
From mobile-first to agent-first
Remember when ‘www’ stood for something closer to ‘Wild Wild West’ than ‘World Wide Web’? The early 2000s internet was an untamed digital frontier, where users navigated through a maze of pop-ups, fought off malware, and relied on bookmarked URLs just to find their way around.
The early 2010s, when mobile exploded, weren’t that different as businesses scrambled to make their websites mobile-responsive. That shift wasn’t just about resizing content for smaller screens–it fundamentally changed how we approached web design, user experience, and digital strategy. It created a whole new field of website and mobile optimization: choosing the best colors and text copy to increase traffic, conversion rates, and stickiness.
The agentic AI inflection point
Today, we stand at a similar inflection point with AI agents.
Just as mobile-responsive design emerged from the need to serve smartphone users better, “agent-responsive design” is emerging as websites adapt to serve AI agents. But unlike the mobile revolution, which was about accommodating human users on different devices, the agent revolution requires us to rethink our fundamental assumptions about who – or what – is consuming our digital content.
In this agent-first era, websites will undergo a dramatic transformation. Gone are the days of flashy advertisements, elaborate typography, and resource-heavy images — elements that consume bandwidth but provide little value to AI agents.
Instead, we’re moving toward streamlined, efficient interfaces that prioritize function over form. These new websites will feature minimalist designs optimized for machine parsing, structured data layers that enable rapid information extraction, standardized interaction patterns that reduce processing overhead, and resource-efficient components that minimize token usage and computation costs.
This evolution extends beyond traditional websites. Mobile applications are already being reimagined with agent-interaction layers, as evidenced by recent novel methods like Apple’s Ferret-UI 2 and CAMPHOR, enabling seamless agent navigation of mobile interfaces while maintaining human usability.
Google and Microsoft are also investing in this space, as demonstrated by their recent papers AndroidWorld and WindowsAgentArena, respectively; both are fully functional environments for developers to build and test agents.
The incentives are becoming clear: optimize for agents, and you’ll unlock new channels of engagement and commerce. Ignore them, and you risk becoming invisible in the emerging agent-first internet.
What is Agent Responsive Design?
At its core, agent-responsive design represents a radical departure from traditional web design principles. Instead of optimizing for human visual perception and engagement, websites must provide clear, structured interfaces that agents can efficiently navigate and interact with.
This transformation will likely unfold in two phases:
Phase 1: Hybrid optimization
Initially, websites will maintain dual interfaces: one optimized for human users and a “shadow” version optimized for agents. This agent-optimized version will feature:
Enhanced semantic markup with clear structure and purpose
Unobfuscated HTML that welcomes rather than blocks automated interaction
Well-defined aria-label attributes and metadata that help agents identify and interact with the right UI components
Direct access to knowledge bases and documentation, exposing information beyond what is visible in the website interface so that a querying agent can retrieve details such as the refund policy or find answers grounded in the site’s help docs; once authenticated, agents would also gain easy access to user-related information such as past purchases or stored payment methods
Streamlined authentication and authorization protocols
Phase 2: API-first architecture
The second phase will move beyond traditional UI components, focusing on exposing clean, well-documented APIs that agents can directly interact with. Consumer websites like Amazon, TurboTax, and Chase will:
Provide clear documentation of available tools and capabilities; the agent can then combine its reasoning engine with the task the human delegated to plan which tools to use and in what sequence
Offer structured workflows with explicit input/output specifications
Enable direct access to business logic and user data
Support sophisticated authentication mechanisms for agent-based interactions
AI agents will make traditional A/B testing obsolete
In an agent-first world, the traditional approach to A/B testing becomes obsolete. Instead of testing different button colors or copy variations for human users, companies like Amazon will need to optimize for agent interaction efficiency and task completion rates.
These A/B tests will target similar metrics as today: purchases, sign-ups, etc., employing LLMs to generate and test thousands of agent personas without the need for lengthy user testing cycles.
This new paradigm of testing will require new success metrics such as:
Model compatibility across different AI providers (GPT, Claude, etc.): each language model has its own nuances, and optimizing for them can help businesses squeeze out a few more percentage points on conversion, bounce rate, and similar metrics.
Task completion rate for the human-delegated task at hand, like purchasing a product or subscribing to a newsletter
Token efficiency and latency optimization, enabling lightning-fast interactions while minimizing computational overhead and associated costs
Authentication and security protocol effectiveness, ensuring robust protection while maintaining frictionless agent operations
The competitive landscape in this new era will be shaped significantly by model providers’ unique advantages. Companies like OpenAI and Google, with their vast user interaction data, will possess an inherent edge in creating agents that deeply understand user preferences and behaviors. However, this also creates an opportunity for innovation in the form of universal memory and context layers, like what mem0 is pitching with their recently released Chrome extension—systems that can bridge different models, devices, and platforms to create a cohesive user experience.
Drawing from Sierra’s τ-bench research, we can anticipate the emergence of standardized benchmarks for measuring agent-readiness across verticals and task types, similar to how we currently measure mobile responsiveness or page load times.
New discovery protocol – Agent Engine Optimization (AEO)
Just as websites evolved from manually curated directories to sophisticated search engine optimization, the agent era demands a new discovery mechanism. The question isn’t just about findability—it’s about actionability: how do agents identify and interact with the most relevant and capable digital services?
In 2005, Google introduced the Sitemap protocol to improve search engine crawling efficiency, enable discovery of hidden content, and provide webmasters with a standardized method for communicating site structure and content updates to search engines. What is the Sitemap equivalent for AI agents?
Just as SEO emerged to help websites become discoverable in search engines with Google’s inaugural PageRank algorithm, Agent Engine Optimization (AEO) will become crucial for visibility in an agent-first web. Back in Aug 2023, I called it Language Model Ranking Optimization.
This new protocol will go beyond traditional sitemaps, providing agents with structured information about websites (a hypothetical manifest is sketched after this list):
Available services and capabilities, such as signing up, placing an order, or booking a flight seat
Authentication requirements – what actions require authentication
Data schemas and API endpoints – what data does each action/endpoint need? What is mandatory vs. optional?
Privacy and security protocols – how information is stored
Service-level agreements, such as refund and shipping guidelines and data retention policies
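No such standard exists yet, but a purely hypothetical manifest, with every field name invented for illustration, might look something like this:

```python
# A purely hypothetical agent-facing manifest; no such standard exists yet,
# and every field name below is invented for illustration.
import json

agent_manifest = {
    "services": [
        {
            "name": "place_order",
            "endpoint": "/api/orders",
            "auth_required": True,  # which actions require authentication
            "inputs": {"sku": "required", "quantity": "required", "gift_wrap": "optional"},
        },
        {
            "name": "sign_up",
            "endpoint": "/api/users",
            "auth_required": False,
            "inputs": {"email": "required"},
        },
    ],
    "privacy": {"storage": "encrypted at rest", "data_retention_days": 90},
    "sla": {"refund_window_days": 30, "shipping": "2-5 business days"},
}

print(json.dumps(agent_manifest, indent=2))
```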
Exposing such information will become a standard feature in website builders like Shopify and Wix, much like mobile responsiveness is today. These platforms will automatically generate and maintain agent-interaction layers, democratizing access to the agent-first economy for businesses of all sizes.
Companies will need to optimize not just for search engines but for an emerging ecosystem of agent directories and registries that help autonomous agents discover and interact with digital services.