دسته: other

AI-powered healthcare, with Archie Mayani [Video]AI-powered healthcare, with Archie Mayani [Video]

AI-powered healthcare, with Archie Mayani [Video]

How Agentic AI is transforming healthcare deliveryHow Agentic AI is transforming healthcare delivery

How Agentic AI is transforming healthcare delivery

In the Agents of Change podcast, host Anthony Witherspoon welcomes Archie Mayani, Chief Product Officer at GHX (Global Healthcare Exchange), to explore the vital role of artificial intelligence (AI) in healthcare.

GHX is a company that may not be visible to the average patient, but it plays a foundational role in ensuring healthcare systems operate efficiently. As Mayani describes it, GHX acts as “an invisible operating layer that helps hospitals get the right product at the right time, and most importantly, at the right cost.”

GHX’s mission is bold and clear: to enable affordable, quality healthcare for all. While the work may seem unglamorous, focused on infrastructure beneath the surface, it is, in Mayani’s words, “mission critical” to the healthcare system.

Pioneering AI in the healthcare supply chain

AI has always been integral to GHX’s operations, even before the term became a buzzword. Mayani points out that the company was one of the early adopters of technologies like Optical Character Recognition (OCR) within healthcare supply chains, long before such tools were formally labeled as AI.

This historical context underlines GHX’s longstanding commitment to innovation.

Now, with the rise of generative AI and agentic systems, the company’s use of AI has evolved significantly. These advancements are being harnessed for:

Predicting medical supply shortages
Enhancing contract negotiations for health systems
Improving communication between clinicians and supply chain teams using natural language interfaces

All of these tools are deployed in service of one goal: to provide value-based outcomes and affordable care to patients, especially where it’s needed most.

Building resilience into healthcare with “Resiliency AI”

GHX builds resilience. That’s the ethos behind their proprietary system, aptly named Resiliency AI. The technology isn’t just about automation or cost-savings; it’s about fortifying healthcare infrastructure so it can adapt and thrive in the face of change.

Mayani articulates this vision succinctly: “We are not just building tech for healthcare… we are building resilience into healthcare.”

Anthony, the podcast host, highlights a key point: AI’s impact in healthcare reaches far beyond business efficiency. It touches lives during their most vulnerable moments.

The episode highlights a refreshing narrative about AI: one not focused on threats or ethical concerns, but rather on how AI can be an instrument of positive, human-centered change.

The imperative of responsible AI in healthcare

One of the core themes explored in this episode of Agents of Change is the pressing importance of responsible AI; a topic gaining traction across industries, but particularly crucial in healthcare. Host Anthony sets the stage by highlighting how ethics and responsibility are non-negotiable in sectors where human lives are at stake.

Archie Mayani agrees wholeheartedly, emphasizing that in healthcare, the stakes for AI development are dramatically different compared to other industries. “If you’re building a dating app, a hallucination is a funny story,” Mayani quips. “But in [healthcare], it’s a lawsuit; or worse, a life lost.” His candid contrast underscores the life-critical nature of responsible AI design in the medical field.

How Agentic AI is transforming healthcare delivery

Transparency and grounding: The foundation of ethical AI

For GHX, building responsible AI begins with transparency and grounding. Mayani stresses that these principles are not abstract ideals, but operational necessities.

“Responsible AI isn’t optional in healthcare,” he states. It’s embedded in how GHX trains its AI models, especially those designed to predict the on-time delivery of surgical supplies, which are crucial for patient outcomes.

To ensure the highest level of reliability, GHX’s AI models are trained on a diverse range of data:

Trading partner data from providers
Fulfillment records
Supplier reliability statistics
Logistical delay metrics
Historical data from natural disasters

This comprehensive data approach allows GHX to build systems that not only optimize supply chain logistics but also anticipate and mitigate real-world disruptions, delivering tangible value to hospitals and, ultimately, patients.

Explainability is key: AI must justify its decisions like a clinician

One of the most compelling points Archie Mayani makes in the discussion is that AI must explain its logic with the clarity and accountability of a trained clinician. This is especially important when dealing with life-critical healthcare decisions. At GHX, every disruption prediction produced by their AI system is accompanied by a confidence score, a criticality ranking, and a clear trace of the data sources behind the insight.

“If you can’t explain it like a good clinician would, your AI model is not going to be as optimized or effective.”

This standard of explainability is what sets high-functioning healthcare AI apart. It’s not enough for a model to provide an output; it must articulate the “why” behind it in a way that builds trust and enables action from healthcare professionals.

Avoiding AI hallucinations in healthcare

Mayani also reflects on historical missteps in healthcare AI to highlight the importance of data diversity and governance. One case he references is early AI models for mammogram interpretation. These systems produced unreliable predictions because the training data lacked diversity across race, ethnicity, and socioeconomic background.

This led to models that “hallucinated”, not in the sense of whimsical errors, but with serious real-world implications. For example, differences in breast tissue density between African American and Caucasian women weren’t properly accounted for, leading to flawed diagnostic predictions.

To counteract this, GHX emphasizes:

Inclusive training datasets across demographic and physiological variables
Rigorous data governance frameworks
A learning mindset that adapts models based on real-world feedback and outcomes

This commitment helps ensure AI tools in healthcare are equitable, reliable, and aligned with patient realities, not just technical possibilities.

The conversation also touches on a universal truth in AI development: the outputs of any model are only as good as the inputs provided. As Anthony notes, AI doesn’t absolve humans of accountability. Instead, it reflects our biases and decisions.

“If an AI model has bias, often it’s reflective of our own societal bias. You can’t blame the model; it’s showing something about us.”

This reinforces a central thesis of the episode: Responsible AI begins with responsible humans; those who train, test, and deploy the models with intention, transparency, and care.

Earning confidence in AI-driven healthcare

As AI becomes more embedded in healthcare, public fear and discomfort are natural reactions, particularly when it comes to technologies that influence life-altering decisions. Anthony captures this sentiment, noting that any major innovation, especially in sensitive sectors like healthcare, inevitably raises concerns.

Archie Mayani agrees, emphasizing that fear can serve a constructive purpose. “You’re going to scale these agents and AI platforms to millions and billions of users,” he notes. “You better be sure about what you’re putting out there.” That fear, he adds, should drive greater diligence, bias mitigation, and responsibility in deployment.

The key to overcoming this fear? Transparency, communication, and a demonstrable commitment to ethical design. As Mayani and Anthony suggest, trust must be earned, not assumed. Building that trust involves both technical rigor and emotional intelligence to show stakeholders that AI can be both safe and valuable.

The challenge of scaling agentic AI in healthcare

With a strong foundation in ethical responsibility, the conversation shifts to a pressing concern: scaling agentic AI models in healthcare environments. These are AI systems capable of autonomous decision-making within predefined constraints, highly useful, but difficult to deploy consistently at scale.

Mayani draws an apt analogy: scaling agentic AI in healthcare is like introducing a new surgical technique.

“You have to prove it works, and then prove it works everywhere.”

This speaks to a fundamental truth in health tech: context matters. An AI model trained on datasets from the Mayo Clinic, for example, cannot be transplanted wholesale into a rural community hospital in Arkansas. The operational environments, patient demographics, staff workflows, and infrastructure are vastly different.

Key barriers to AI scalability in healthcare

Contextual variability. Every healthcare setting is unique in terms of needs, infrastructure, and patient populations.
Data localization. Models must be fine-tuned to reflect local realities, not just generalized benchmarks.
Performance assurance. At scale, AI must remain accurate, explainable, and effective across all points of care.

For product leaders like Mayani, scale and monetization are the twin pressures of modern AI deployment. And in healthcare, the cost of getting it wrong is too high to ignore.

GHX’s resiliency center: A scalable AI solution in action

To illustrate how agentic AI can be successfully scaled in healthcare, Archie Mayani introduces one of GHX’s flagship products: Resiliency Center. This tool exemplifies how AI can predict and respond to supply chain disruptions at scale, offering evidence-based solutions in real time.

Resiliency Center is designed to:

Accurately categorize and predict potential disruptions in the healthcare supply chain
Recommend clinical product alternatives during those disruptions
Integrate seamlessly across dozens of ERP systems, even with catalog mismatches
Provide evidence-backed substitute products, such as alternatives to specific gloves or catheters likely to be back-ordered

These “near-neighborhood” product recommendations are not only clinically valid, but context-aware. This ensures that providers always have access to the right product, at the right time, at the right cost, a guiding principle for GHX.

“The definition of ‘right’ is really rooted in quality outcomes for the patient and providing access to affordable care, everywhere.”

This operational model is a clear example of scaling with purpose. It reflects Mayani’s earlier point: you can’t scale effectively without training on the right datasets and incorporating robust feedback loops to detect and resolve model inaccuracies.

Making sense of healthcare data

As the conversation shifts to the nature of healthcare data, Anthony raises a key issue: data fragmentation. In healthcare, data often exists in disconnected silos, across hospitals, systems, devices, and patient records, making it notoriously difficult to use at scale.

Mayani affirms that overcoming this fragmentation is essential for responsible and effective AI. The foundation of scalable, bias-free, and high-performance AI models lies in two critical pillars:

Data diversity. AI systems must be trained on varied and inclusive datasets that reflect different patient populations, healthcare contexts, and operational environments.
Data governance. There must be strict protocols in place to manage, verify, and ethically handle healthcare data. This includes everything from ensuring data integrity to setting up feedback mechanisms that refine models over time.

“All of that, scaling, performance, bias mitigation, it ultimately comes down to the diversity and governance of the data.”

This framing offers a critical insight for healthcare leaders and AI practitioners alike: data is the bedrock of trustworthy AI systems in medicine.

Why local context and diverse data matter in healthcare AI

One of the most illustrative examples of data diversity’s value came when GHX’s models flagged a surgical glove shortage in small rural hospitals, a disruption that wasn’t immediately visible in larger healthcare systems. Why?

Rural hospitals often have different reorder thresholds.
They typically lack buffer stock and have fewer supplier relationships compared to large Integrated Delivery Networks (IDNs).

This nuanced insight could only emerge from a truly diverse dataset. As Archie Mayani explains, if GHX had only trained its models using data from California, it might have overlooked entirely seasonal and regional challenges, like hurricanes in the Southeast or snowstorms in Minnesota, that affect supply chains differently.

“Healthcare isn’t a monolith. It’s a mosaic.”

That mosaic requires regionally relevant, context-sensitive data inputs to train agentic AI systems capable of functioning across a broad landscape of clinical settings.

Trust and data credibility: The often overlooked ingredient

Diversity in data is only part of the solution. Trust in data sources is equally critical. Archie points out a fundamental truth: not all datasets are equally valid. Some may be outdated, siloed, or disconnected from today’s realities. And when AI systems train on these flawed sources, their predictions suffer.

This is where GHX’s role as a trusted intermediary becomes essential. For over 25 years, GHX has served as a neutral and credible bridge between providers and suppliers, earning the trust required to curate, unify, and validate critical healthcare data.

“You need a trusted entity… not only for diverse datasets, but the most accurate, most reliable, most trusted datasets in the world.”

GHX facilitates cooperation across the entire healthcare data ecosystem, including:

Hospitals and providers
Medical suppliers and manufacturers
Electronic Medical Record (EMR) systems
Enterprise Resource Planning (ERP) platforms

This integrated ecosystem approach ensures the veracity of data and enables more accurate, bias-aware AI models.

Diversity and veracity: A dual mandate for scalable AI

Anthony aptly summarizes this insight as a two-pronged strategy: it’s not enough to have diverse datasets; you also need high-veracity data that’s trusted, updated, and contextually relevant. Mayani agrees, adding that agentic AI cannot function in isolation; it depends on a unified and collaborative network of stakeholders.

“It’s beyond a network. It’s an ecosystem.”

By connecting with EMRs, ERPs, and every link in the healthcare chain, GHX ensures its AI models are both informed by real-world variability and grounded in validated data sources.

From classical AI to agentic AI: A new era in healthcare

Archie Mayani makes an important distinction between classical AI and agentic AI in healthcare. For decades, classical AI and machine learning have supported clinical decision-making, especially in diagnostics and risk stratification. These systems helped:

Identify patients with complex comorbidities
Prioritize care for those most at risk
Power early diagnostic tools such as mammography screenings

“We’ve always leveraged classical AI in healthcare… but agentic AI is different.”

Unlike classical models that deliver discrete outputs, agentic AI focuses on workflows. It has the potential to abstract, automate, and optimize full processes, making it uniquely suited to address the growing pressures in modern healthcare.

Solving systemic challenges with agentic AI

Mayani highlights the crisis of capacity in today’s healthcare systems, particularly in the U.S.:

Staff shortages across both clinical and back-office roles
Rising operational costs
Fewer trained physicians are available on the floor

In this context, agentic AI emerges as a co-pilot. It supports overburdened staff by automating routine tasks, connecting data points, and offering intelligent recommendations that extend beyond the exam room.

One of the most compelling examples Mayani shares involves a patient with recurring asthma arriving at the emergency department. Traditionally, treatment would focus on the immediate clinical issue. But agentic AI can see the bigger picture:

It identifies that the patient lives near a pollution site
Notes missed follow-ups due to lack of transportation
Recognizes socioeconomic factors contributing to the recurring condition

With this information, the healthcare team can address the root cause, not just the symptom. This turns reactive treatment into proactive, preventative care, reducing waste and improving outcomes.

“Now you’re not treating a condition. You’re addressing a root cause.”

This approach is rooted in the Whole Person Care model, which Mayani recalls from his earlier career. While that model once relied on community health workers stitching together fragmented records, today’s agentic AI can do the same work; faster, more reliably, and at scale.

Agentic AI as a member of the care team

Ultimately, Mayani envisions agentic AI as a full-fledged member of the care team, one capable of:

Intervening earlier in a patient’s health journey
Coordinating care across departments and disciplines
Understanding and integrating the social determinants of health
Delivering on the promise of Whole Person Care

This marks a paradigm shift, from episodic, condition-focused care to integrated, data-driven, human-centered healing.

One of the most transformative promises of agentic AI in healthcare is its ability to identify root causes faster, significantly reducing both costs and systemic waste. As Anthony notes, the delay in getting to a solution often drives up costs unnecessarily, and Mayani agrees.

“Prevention is better than cure… and right now, as we are fighting costs and waste, it hasn’t been truer than any other time before.”

Agentic AI enables care teams to move from reactive service delivery to proactive problem-solving, aligning healthcare with long-promised, but rarely achieved, goals like holistic and whole-person care. The way Mayani describes it, this is now a practical, scalable reality.

COVID-19: A catalyst for AI innovation in supply chain resilience

Looking back at the COVID-19 pandemic, Mayani reflects on one of the biggest shocks to modern healthcare: supply chain collapse. It wasn’t due to a lack of data; healthcare generates 4x more data than most industries. The failure was one of foresight and preparedness.

“The supply chain broke not because we didn’t have the data, but because we didn’t have the foresight.”

This crisis has become a compelling event that has accelerated innovation. GHX’s own AI-driven Resiliency Center now includes early versions of systems that can:

Detect high-risk items like ventilator filters at risk of shortage
Recommend five clinically approved alternatives, sorted by cost, delivery time, and supplier reliability
Provide real-time, evidence-based recommendations across a multi-stakeholder ecosystem

Mayani likens this transformation to going from a smoke detector to a sprinkler system; not just identifying the problem, but acting swiftly to stop it before it spreads.

Learning from crisis: Building a proactive future

COVID-19 may have been an unprecedented tragedy, but it forced healthcare organizations to centralize data, embrace cloud infrastructure, and accelerate digital transformation.

Before 2020, many health systems were still debating whether mission-critical platforms should move to the cloud. Post-crisis, the conversation shifted from adoption to acceleration, opening the door to advanced technologies like AI and GenAI.

“Necessity leads to innovation,” as Anthony puts it, and Mayani agrees.

The result is a more resilient, more responsive healthcare system, better equipped to navigate future challenges, from pandemics to geopolitical shifts to tariff policy changes. GHX now plays a pivotal role in helping suppliers and providers understand and act on these evolving variables through data visibility and decision-making intelligence.

AI hallucinations in healthcare

While agentic AI offers powerful capabilities, hallucinations remain a significant risk, particularly in healthcare, where errors can have devastating consequences. Archie Mayani openly acknowledges this challenge: even with high-quality, diverse, and rigorously governed datasets, hallucinations can still occur.

Drawing from his early work with diagnostic models for lung nodules and breast cancer detection, Mayani explains that hallucinations often stem from data density issues or incomplete contextual awareness. These can lead to outcomes like:

Recommending a nonexistent or back-ordered medical supply erodes trust
Incorrectly suggesting a serious diagnosis, such as early breast cancer, to a healthy individual

Both are catastrophic in their own way, and both highlight the need for fail-safes and human oversight.

GHX’s guardrails against AI errors

To mitigate these risks, GHX employs a multi-layered approach:

Validated data only. Models pull from active, verified medical supply catalogs.
Human-in-the-loop systems. AI makes predictions, but a human still approves the final decision.
Shadow mode training. New models run in parallel with human-led processes until they reach high reliability.
“Residency training” analogy. Mayani likens early AI models to junior doctors under supervision; they’re not allowed to operate independently until they’ve proven their accuracy.

This framework ensures that AI earns trust through performance, reliability, and responsibility, not just promises.

The future of agentic AI

When asked to predict the future of agentic AI in healthcare, Mayani presents a powerful vision: a world where AI becomes invisible.

“When AI disappears… that’s when we’ve truly won.”

He envisions a future where AI agents across systems, such as GHX’s Resiliency AI and hospital EMRs, communicate autonomously. A nurse, for instance, receives necessary supplies without ever placing an order, because the agents already anticipated the need based on scheduled procedures and clinical preferences.

Indicators of AI maturity

Care is coordinated automatically without juggling apps or administrative steps.
Agentic AI enables seamless, behind-the-scenes action, optimizing outcomes while removing friction.
Patients receive timely, affordable, personalized care, not because someone made a phone call, but because the system understood and acted.

This is the true potential of agentic AI: not to dazzle us with flashy features, but to blend so naturally into the work that it disappears.

The normalization of AI

As AI becomes more embedded in daily life, public perception is shifting from fear to discovery, and now, toward normalization. As Mayani and Anthony discuss, many people already use AI daily (in smartphones, reminders, and apps) without even realizing it.

The goal is for agentic AI to follow the same path: to support people, not replace them; to augment creativity, not suppress it; and to enable higher-order problem-solving by removing repetitive, predictable tasks.

“It’s never about the agents taking over the world. They are here so that we can do the higher-order bits.”

The path forward for Agentic AI in healthcare

The future of healthcare lies not in whether AI will be used but how. And leaders like Archie Mayani at GHX are laying the foundation for AI that is ethical, explainable, resilient, and invisible.

From predicting disruptions and recommending evidence-based alternatives to coordinating care and addressing root causes, agentic AI is already reshaping how we deliver and experience healthcare.

The next chapter is about when it quietly steps into the background, empowering humans to do what they do best: care.

How AI is redefining cyber attack and defense strategiesHow AI is redefining cyber attack and defense strategies

How AI is redefining cyber attack and defense strategies

As AI reshapes every aspect of digital infrastructure, cybersecurity has emerged as the most critical battleground where AI serves as both weapon and shield.

The cybersecurity landscape in 2025 represents an unprecedented escalation in technological warfare, where the same AI capabilities that enhance organizational defenses are simultaneously being weaponized by malicious actors to create more sophisticated, automated, and evasive attacks.

The stakes have never been higher. Recent data from the CFO reveals that 87% of global organizations faced AI-powered cyberattacks in the past year, while the AI cybersecurity market is projected to reach $82.56 billion by 2029, growing at a compound annual growth rate of 28% .

This explosive growth reflects not just market opportunity, but an urgent response to threats that are evolving faster than traditional security measures can adapt.

Part 1: Adversaries in the age of AI

Cyber adversaries have found a powerful new weapon in AI, and they’re using it to rewrite the offensive playbook. The game has changed, with attacks now defined by automated deception, hyper-realistic social engineering, and intelligent malware that thinks for itself.

The industrialization of deception

The old security advice – “spot the typo, spot the scam” – is officially dead. Generative AI now crafts flawless, hyper-personalized phishing emails, texts, and voice messages that are devastatingly effective.

The numbers tell a chilling story: AI-generated phishing emails boast a 54% click-through rate, dwarfing the 12% from human-written messages. Meanwhile, an estimated 80% of voice phishing (vishing) attacks now use AI to clone voices, making it nearly impossible to trust your own ears.

How AI is redefining cyber attack and defense strategies

This danger is not theoretical. Consider the Hong Kong finance employee who, in 2024, was tricked into transferring $25 million after a video conference where every single participant, including the company’s CFO, was an AI-generated deepfake.

In another cunning campaign, a threat group dubbed UNC6032 built fake websites mimicking popular AI video generators, luring creators into downloading malware instead of trying a new tool. The result is the democratization of sophisticated attacks. Tools once reserved for nation-states are now in the hands of common cybercriminals, who can launch convincing, scalable campaigns with minimal effort.

Malware that thinks for itself

The threat extends beyond tricking humans to the malicious code itself. Attackers are unleashing polymorphic and metamorphic malware that uses AI to constantly change its own structure, making it a moving target for traditional signature-based defenses.

The BlackMatter ransomware, for example, uses AI to perform live analysis of a victim’s security tools and then adapts its encryption strategy on the fly to bypass them.

On the horizon, things look even more concerning. Researchers have already designed a conceptual AI-powered worm, “Morris II,” that can spread autonomously from one AI system to another by hiding malicious instructions in the data they process.

At the same time, AI is automating the grunt work of hacking. AI agents, trained with Deep Reinforcement Learning (DRL), can now autonomously probe networks, find vulnerabilities, and launch exploits, effectively replacing the need for a skilled human hacker.

Part 2: Fighting fire with fire: AI on cyber defense

But the defense is not standing still. A counter-revolution is underway, with security teams turning AI into a powerful force multiplier. The strategy is shifting from reacting to breaches to proactively predicting and neutralizing threats at machine speed.

Seeing attacks before they happen

The core advantage of defensive AI is its ability to process data at a scale and speed no human team can match. Instead of just looking for known threats, AI-powered systems create a baseline of normal behavior across a network and then hunt for tiny deviations that signal a hidden compromise.

This is how modern defenses catch novel, zero-day attacks. The most advanced systems are even moving from detection to prediction. By analyzing everything from global attack trends to dark web chatter, and new vulnerabilities, AI models can forecast where the next attack wave will hit, allowing organizations to patch vulnerabilities before they’re ever targeted.

Your newest teammate is an AI

The traditional Security Operations Center (SOC) – a room full of analysts drowning in a sea of alerts is becoming obsolete. In its place, the AI-driven SOC is rising, where AI automates the noise so humans can focus on what matters.

AI now handles alert triage, enriches incident data, and filters out the false positives that cause analyst burnout. We’re now seeing AI “agents” and “copilots” from vendors like Microsoft, CrowdStrike, and SentinelOne that act as true partners to security teams.

These AI assistants can autonomously investigate a phishing email, test its attachments in a sandbox, and quarantine every copy from the enterprise in seconds, all while keeping a human in the loop for the final say. This is more than an efficiency gain; it’s a strategic answer to the massive global shortage of cybersecurity talent.

Making zero trust a reality

AI is also the key to making the “never trust, always verify” principle of the Zero Trust security model a practical reality. Instead of static rules, AI enables dynamic, context-aware access controls.

It makes real-time decisions based on user behavior, device health, and data sensitivity, granting only the minimum privilege needed for the task at hand. This is especially vital for containing the new risks from the powerful but fundamentally naive AI agents that are beginning to roam corporate networks.

Part 3: The unseen battlefield: Securing the AI itself

For all the talk about using AI for security, we’re overlooking a more fundamental front in this war: securing the AI systems themselves. For the AIAI community – the architects of this technology – understanding these novel risks is not an option, it’s an operational imperative.

How AI can be corrupted

Machine learning models have an Achilles’ heel. Adversarial attacks exploit it by making tiny, often human-imperceptible changes to input data that cause a model to make a catastrophic error.

Think of a sticker that makes a self-driving car’s vision system misread a stop sign, or a slight tweak to a malware file that renders it invisible to an AI-powered antivirus. Data poisoning is even more sinister, as it involves corrupting a model’s training data to embed backdoors or simply degrade its performance.

A tool called “Nightshade” already allows artists to “poison” their online images, causing the AI models that scrape them for training to malfunction in bizarre ways.

The danger of autonomous agents

With agentic AI, autonomous systems that can reason, remember, and use tools – the stakes get much higher. An AI agent is the perfect “overprivileged and naive” insider.

It’s handed the keys to the kingdom – credentials, API access, permissions – but has no common sense, loyalty, or understanding of malicious intent. An attacker who can influence this agent has effectively recruited a powerful insider. This opens the door to new threats like:

Memory poisoning: Subtly feeding an agent bad information over time to corrupt its future decisions.
Tool misuse: Tricking an agent into using its legitimate tools for malicious ends, like making an API call to steal customer data.
Privilege compromise: Hijacking an agent to exploit its permissions and move deeper into a network.

The need for AI red teams

Because AI vulnerabilities are so unpredictable, traditional testing methods fall short. The only way to find these flaws before an attacker does is through AI red teaming: the practice of simulating adversarial attacks to stress-test a system.

This is not a standard penetration test; it’s a specialized hunt for AI-specific weaknesses like prompt injections, data poisoning, and model theft. It’s a continuous process, essential for discovering the unknown unknowns in these complex, non-deterministic systems.

What’s next?

The AI revolution in cybersecurity is both the best thing that’s happened to security teams and the scariest development we’ve seen in decades.

With 73% of enterprises experiencing AI-related security incidents averaging $4.8 million per breach, and deepfake incidents surging 19% just in the first quarter of this year, the urgency couldn’t be clearer. This isn’t a future problem – it’s happening right now.

The organizations that will survive and thrive are those that can master the balance. They’re using AI to enhance their defenses while simultaneously protecting themselves from AI-powered attacks. They’re investing in both technology and governance, automation and human expertise.

The algorithmic arms race is here. Victory will not go to the side with the most algorithms, but to the one that wields them with superior strategy, foresight, and a deep understanding of the human element at the center of it all.

AI and the future of international student outreachAI and the future of international student outreach

AI and the future of international student outreach

My daily work in the EdTech industry consists of constant back-and-forth comparison between the United Kingdom’s admissions machine and the digital experiences offered at higher education systems elsewhere.

While UK universities debate workflow changes, universities in competing nations are plugging mass-scale AI systems directly into recruitment and immigration processes.

Unless we get similar equipment to work for us, ethically and at sector scale, the recent drop in foreign applications might be the beginning of a longer fall.

Global competition is accelerating

In March 2024, Reuters wrote that Microsoft and OpenAI are mulling a $100 billion U.S. super-computer project called Stargate to train the next generation of language models. Faster models translate to richer, more personalized student-facing services: anything from adaptive test preparation to multilingual visa counseling.

On the demand side, UNESCO’s 2024/5 Global Education Monitoring Report suggests that cross-border tertiary enrolments will rise by some two million seats by 2030, driven by South Asia and Sub-Saharan Africa most of all.

Potential students in these nations do much of their research online and respond quickly to chat-based counsel. It’s the perfect setting for today’s language models.

AI and the future of international student outreach

Home-grown headwinds

The UK, on the other hand, has done the reverse. From 1 January 2024, the majority of international students will have the right to bring dependents with them, as a Home Office press release confirmed. And on 12 May 2025, the immigration white paper of the government laid out to cut Graduate-Route work rights to 18 months from two years (White Paper). Early figures show the impact is immediate: Universities UK reports a 44 percent decrease in January 2024 postgraduate-taught enrollments compared to the previous year.

The applicant’s maze

Policy changes contribute to an already fragmented process: multiple document portals, disproportionate English-language regulations, and inconsistent turnaround times.

When there are delays in official replies, students congregate in WhatsApp or Telegram groups, where misinformation and downright fraud spread rapidly. I’ve spoken with families who paid unlicensed agents just to upload PDFs that the university should have accepted for free.

Every lost or intimidated candidate blemishes the UK’s reputation for educational transparency.

Why AI now makes practical sense

Big language models at last give the missing layer of always-present, policy-aware advice. A model trained on the UKVI rule-set, UCAS codes, CAS logic, and institutional rules can answer subtle questions in dozens of languages, urge applicants if a bank letter or tuberculosis certificate is missing, pre-screen document quality before human officers waste time, and even emit rule changes the moment the Home Office updates its guidance.

All of that can happen in seconds, at any hour, without building yet another portal.

Lessons from building an admissions companion

We tested out an AI companion as an internal pilot during the past two years with my colleagues. We grounded the model on nothing but formal reports: Home-Office sponsor advice, university policy PDFs, anonymized email templates.

There are three things that we found most interesting:

Context before horsepower. Precision was enhanced more by selecting clean source material than by changing to the latest model.
Transparency fosters trust. Showing the precise paragraph a response is from, with a live link, cuts down on follow-up questions more than any tone adjustment.
Staff want relief, not replacement. When admissions staff created lists of work they would offload first, “file-name formatting” and “duplicate-document emails” were first; no one feared for the loss of the human touch of their work.

Designing for interoperability and privacy

Any production-ready solution must read and write to installed CRMs and student-record systems using secure APIs, adhering to GDPR and local data-classification guidelines.

That integration approach avoids the creation of a standalone “AI portal” and maintains human agency: the model reports uncertainty, triages hard cases, and trains continuously from staff edits.

Expected impact

Increased industry uptake will not remake immigration policy, but it can close the service gap currently sending applicants to competing destinations.

Small efficiency boosts translate into days shaved off offer release, percentage-point decreases in visa refusals, fewer out-of-hours queries: total millions of pounds of saved fee revenue and, more importantly, into goodwill among students perceiving the UK as responsive, not bureaucratic.

A collaborative path forward

Government, sector groups, and suppliers would collectively pay for a shared knowledge base for visa regulations, credential-evaluation standards, and regulatory reporting.

Local layers would then be adapted by individual universities without replicating the regulatory core. This kind of collaboration would mirror the infrastructure-first approach behind the U.S. Stargate initiative and enable the UK to continue to be a destination of choice for global talent.

Conclusion

Global recruitment is no longer a battle of shiny pamphlets but one of latency, language coverage, and policy correctness. AI will not undo discriminatory visa policies, but it will ensure that when opportunities are available, deserving students are not lost in bureaucratic fog.

For a sector generating more than £40 billion a year in exports, the deployment of ethical, interoperable AI is less an experiment and more a prudent maintenance of the UK’s competitive edge.

How large language models are transforming pediatric healthcareHow large language models are transforming pediatric healthcare

How large language models are transforming pediatric healthcare

What if artificial intelligence could help us solve some of the most complex challenges in pediatric healthcare, especially when it comes to rare diseases?

At Great Ormond Street Hospital (GOSH), we face these challenges daily, treating children with some of the most difficult and rare conditions imaginable. But as powerful as human expertise is, we often find ourselves dealing with an overwhelming amount of data, from patient histories to diagnostic reports, making it hard to extract the insights we need quickly and efficiently.

This is where artificial intelligence and machine learning come in. These technologies have the potential to revolutionize the way we process and utilize healthcare data. At GOSH, we’re leveraging AI, particularly large language models (LLMs), to tackle the complexity of this data and improve patient outcomes.

In this article, I’ll share insights from our journey of integrating AI into pediatric healthcare at GOSH and how AI is helping us improve care, streamline operations, and make healthcare more accessible for children with rare diseases.

Let’s dive in.

The role of GOSH’s DRIVE unit

In 2018, we established the DRIVE unit, which stands for Data, Research, Innovation, and Virtual Environment. Our goal? To harness data and technology to improve outcomes for children, families, and our healthcare staff.

We want to make GOSH the global go-to center for pediatric innovation, and we aim to do this by utilizing AI and data to drive breakthroughs in treatment, diagnosis, and patient care.

Our mission goes beyond merely innovating for the sake of it; we want to use AI to make an impact not just locally but globally. The data we collect is especially valuable for research, particularly in the realm of rare diseases – conditions that often don’t receive enough attention due to their rarity.

But how do we make sense of this relatively small dataset, and how can we share this knowledge globally? That’s one of the questions we’ve been working to answer with the help of AI and ML (machine learning).

How large language models are transforming pediatric healthcare

Harnessing data for research and operational efficiencies

In terms of data management, GOSH has undergone a massive overhaul in the past few years. Before 2019, we were using over 400 different systems for collecting patient data. As you can imagine, this was both inefficient and hard to maintain.

That’s when we made the strategic decision to replace our outdated systems with a single platform – EPIC. This transition has allowed us to integrate all patient data into a unified electronic health record system.

Humans in the loop: How leading companies are building practical, trustworthy AIHumans in the loop: How leading companies are building practical, trustworthy AI

Humans in the loop: How leading companies are building practical, trustworthy AI

At the NYC Generative AI Summit, experts from Wayfair, Morgan & Morgan, and Prolific came together to explore one of AI’s most pressing questions: how do we balance the power of automation with the necessity of human judgment?

From enhancing customer service at scale to navigating the complexity of legal workflows and optimizing human data pipelines, the panelists shared real-world insights into deploying AI responsibly. In a field moving at breakneck speed, this discussion was an opportunity to examine how we can build AI systems that are effective, ethical, and enduring.

From support to infrastructure: Evolving with generative AI

Generative AI is reshaping industries at a pace few could have predicted. And at Wayfair, that pace is playing out in real time. Vaidya Chandrasekhar, who leads pricing, competitive intelligence, and catalog ML algorithms at the company, shared how their approach to generative AI has grown from practical customer support tools to foundational infrastructure transformation.

Early experiments started with agent assistance, particularly in customer service. These included summarizing issue histories and providing real-time support to customer-facing teams, the kind of use cases many companies have used as a generative AI entry point.

From there, Wayfair moved into more technical territory. One significant area has been technology transformation: shifting from traditional SQL stored procedures toward more dynamic systems.

“We’ve been asking questions like, ‘if you’re selecting specific data points and trying to understand your data model’s ontology, what would that look like as a GraphQL query?’” Vaidya explained. While not all scenarios fit the model, roughly 60-70% of use cases have proven viable.

Perhaps the most transformative application is in catalog enrichment, which is at the core of Wayfair’s operations. Generative AI is being used to enhance and accelerate how product data is organized and surfaced. And in such a fast-moving environment, agility is key.

“Just this morning, we were speaking with our CEO. What the plan was two months ago is already shifting,” Vaidya noted. “We’re constantly adapting to keep pace with what’s possible.”

The company is firmly positioned at the edge of change, continuously testing how emerging tools can bring efficiency, clarity, and value to both internal workflows and customer experiences.

Building AI inside the nation’s largest injury law firm

When most people think of personal injury law firms, they don’t picture teams of software engineers writing AI tools. But that’s exactly what’s happening at Morgan & Morgan, the largest injury law firm in the United States.

Paras Chaudhary, Software Engineering Lead at Morgan & Morgan, often gets surprised reactions when he explains what he does. “They wonder what engineers are even doing there,” he said.

The answer? Quite a lot, and increasingly, that work involves generative AI.

Law firms, by nature, have traditionally been slow to adopt new technologies. The legal profession values precedent, structure, and methodical processes, qualities that don’t always pair easily with the fast-evolving world of AI.

But Morgan & Morgan is taking a different approach. With the resources to invest in an internal engineering team, they’re working to lead the charge in legal AI adoption.

The focus isn’t on replacing lawyers, but empowering them. “I hate the narrative that AI will replace people,” Paras emphasized. “What we’re doing is building tools that make attorneys’ lives easier: tools that help them do more, and do it better.”

Of course, introducing new technology into a non-technical culture comes with its own challenges. Getting attorneys, many of whom have been doing things the same way for a decade or more, to adopt unfamiliar tools isn’t always easy.

“It’s been an uphill battle,” he admitted. “Engineering in a non-tech firm is hard enough. When your users are lawyers who love their ways of doing things, it’s even tougher.”

Despite the resistance, the team has had measurable success in deploying generative AI internally. And equally important, they’ve learned from their failures. The journey has been anything but flashy, but it’s quietly reshaping how legal work can be done at scale.

Humans in the loop: How leading companies are building practical, trustworthy AI

Human data’s evolving role in AI: From volume to precision

While much of the conversation around generative AI focuses on model architecture and compute power, Sara Saab, VP of Product at Prolific, brought a vital perspective to the panel: the role of human data in shaping AI systems. Prolific positions itself as a human data platform, providing human-in-the-loop workflows at various stages of model development, from training to post-deployment.

“This topic is really close to my heart,” Sara shared, reflecting on how drastically the human data landscape has shifted over the last few years.

Back in the early days of ChatGPT, large datasets were the core currency. “There was an arms race,” she explained, “where value was all about having access to massive amounts of training data.”

But in today’s AI development pipeline, that’s no longer the case. Many of those large datasets have been distilled down, commoditized, or replaced by open-source alternatives used for benchmarking.

The industry’s focus has since shifted. In 2023 and into 2024, efforts moved toward fine-tuning, both supervised and unsupervised, and the rise of retrieval-augmented generation (RAG) approaches. Human feedback became central through techniques like RLHF (reinforcement learning from human feedback), though even those methods have begun to evolve.

“AI is very much a white-paper-driven industry,” Sara noted. “Every time a new paper drops, everyone starts doing everything differently.” Innovations like reinforcement learning via rejection sampling or variational reward shaping (RLVR) began to reduce the need for heavy fine-tuning, at least on the surface. But peel back the layers, she argued, and humans are still deeply embedded in the loop.

Today, the emphasis is increasingly on precise, expert-curated datasets, the kind that underpin synthetic data generators, oracle solvers, and other sophisticated human-machine orchestrations. These systems are emerging as critical to the next generation of model training and evaluation.

At the same time, foundational concerns around alignment, trust, and safety are rising to the surface. Who defines the benchmarks on which models are evaluated? Who assures their quality?

“We look at leaderboards with a lot of interest,” Sara said. “But we also ask: who’s behind those benchmarks, and what are we actually optimizing for?”

It’s a timely reminder that while the tooling and terminology may shift rapidly, the human element, in all its philosophical, ethical, and practical complexity, remains central to the future of AI.

Human oversight in AI: The power of boring integration

As large language models (LLMs) become increasingly capable, the question of human oversight becomes more complex. How do you keep humans meaningfully in the loop when models are doing more of the heavy lifting, and doing it well?

For Paras, the answer isn’t flashy tools or complex interfaces. It’s simplicity, even if that means embracing the boring.

“Our workflows aren’t fancy because they didn’t need to be,” he explained. “Most of the human-in-the-loop flow at the firm is based on approval mechanisms. When the model extracts ultra-critical information, a human reviews it to confirm whether it makes sense or not.”

To drive adoption among lawyers, notoriously resistant to change, Paras applied what he calls the “radio sandwiching” approach. “Radio stations introduce new songs by sandwiching them between two tracks you already like. That way, the new stuff feels familiar and your alarms don’t go off,” he said. “That’s what we had to do. We disguised the cool AI stuff as the boring workflows people already knew.”

At Morgan & Morgan, that meant integrating AI into the firm’s existing Salesforce infrastructure, not building new tools or expecting users to learn new platforms. “All our attorney workflows are based in Salesforce,” Paras explained. “So we piped our AI outputs right into Salesforce, whether it was case data or something else. That was the only way to get meaningful adoption.”

When asked if this made Salesforce an annotation platform, Paras didn’t hesitate. “Exactly. It works. Do what you have to do. Don’t get stuck on whether it looks sexy. That’s not the point.”

Vaidya Chandrasekhar, who leads pricing and ML at Wayfair, echoed the sentiment. “I agree with a lot of what Paras said,” he noted. “I’d frame it slightly differently; it’s about understanding where machine intelligence kicks in, and where human judgment still matters. You’re always negotiating that balance. But yes, integrating into existing, familiar workflows is essential.”

As AI systems evolve, the methods for keeping humans involved might not always be elegant. But as this panel made clear, pragmatism often beats perfection when it comes to real-world deployment.

Orchestrating intelligence: How humans and AI learn to work together

As the conversation turned to orchestration, the complex collaboration between humans and machines, Paras offered a grounded view shaped by hard-earned experience.

“I did walk right into that,” he joked as the question was directed his way. But his answer made it clear he’s thought deeply about this dynamic.

For Paras, orchestration isn’t about building futuristic autonomy. It’s about defining roles and designing practical workflows. “There are definitely some tasks machines can handle on their own,” he said. “But the majority of the work we do involves figuring out which parts to automate and where humans still need to make decisions.”

He emphasized that the key is not treating the system as a black box, but instead fostering a loop in which humans improve the AI by correcting, contextualizing, and even retraining it over time. “The job of humans is to continue evolving these machines,” he said. “They don’t get better on their own.”

Paras also highlighted the importance of being able to pause and escalate AI systems when needed, especially when the model encounters something novel or ambiguous. He gave the example of defining a new item like “angular stemless glass.”

“You don’t want the model to just make it up and run with it,” he said. “You want it to scrape the internet, make its best guess, and then ask a human – Is this right?”

That ability for the system to admit uncertainty is central to how Paras thinks about orchestration. “It’s like hiring someone new,” he said. “The smartest people still need to know when to raise their hand and say, I’m not sure about this. That’s the critical skill we need to train into our AI.”

Why determinism matters in AI orchestration

As the panel discussion on orchestration continued, Paras offered a grounded counterpoint to the rising excitement around agentic systems and autonomous AI decision-making.

“If I didn’t believe in a hybrid world between humans and machines, I wouldn’t be sitting here,” he said. “But let me be clear: at our firm, we have no interest in dabbling with agent tech.”

While many startups and venture-backed companies are chasing autonomous agents that can reason, plan, and act independently, Paras argued that this kind of complexity introduces too much uncertainty. “Agent tech creates too many steps, too many potential points of failure. And when the probability of failure multiplies across those steps, the overall chance of success drops.”

Instead, Paras advocated for an orchestration model grounded in determinism, where workflows are tightly scoped, predictable, and easily governed by clear logic.

“I love it when orchestration is deterministic,” he said. “That could mean a simple if/else statement. It could mean a human approver. What matters is that the system behaves in a way that’s traceable, testable, and reliable.”

At Morgan & Morgan, where the stakes are high and the work is bound by legal and procedural constraints, this type of orchestration isn’t just a preference, it’s a necessity. “We’re not in a startup trying to sell a dream,” Paras pointed out. “We’re in a firm where outcomes matter, and we need to know the system will work as expected.”

That pragmatic approach may not sound flashy, but it’s exactly what’s enabling his team to make real, measurable progress. By prioritizing reliability over autonomy, they’re proving that impactful AI doesn’t always need to be cutting-edge; it needs to be dependable.

The broader conversation circled back to how the most valuable AI systems are the ones that know when they don’t know and are built to ask for help.

Human limits and AI benchmarks

As the discussion shifted toward AI orchestration, Sara paused to reflect on a subtler but essential thread: the limits of human understanding and how they shape the systems we build.

“Chain-of-thought reasoning and explainability in AI are fascinating,” she said. “But what’s just as fascinating is that humans aren’t always that explainable either. We often don’t know why we know something.”

That tension between human intuition and machine logic quickly leads to deeper questions. “Whenever I talk about these topics, we’re always two questions away from a philosophy lecture,” Sara joked. “What are the limits of human intelligence? Who quality-assures our own thinking? Are some of the world’s most unsolvable math problems even well formulated?”

These aren’t just abstract musings. In the context of large language models (LLMs), they expose a critical challenge: Can we ever be sure that models are doing what we expect them to do?

This thread naturally led Sara into a critique of how the industry measures performance. “Right now, we’re living in a leaderboard-driven moment,” she said. “Top-of-the-leaderboard has become a kind of default OKR, a stand-in for state-of-the-art.”

But that raises deeper concerns about accountability and meaning. What does it really mean for a model to be aligned, safe, or trustworthy? And perhaps more importantly, who decides?

“I’m always curious and a bit skeptical about who’s defining and scoring these benchmarks,” Sara added. “Who’s grounding the definitions of concepts like ‘verbosity’ or ‘alignment’? What counts as success, and who gets to say?”

These questions aren’t just philosophical; they’re foundational. As AI systems become more central to how decisions are made, the frameworks we use to evaluate them will increasingly shape what we build, what we trust, and what we ignore.

Sara’s insight served as a quiet but powerful reminder: in the rush toward smarter models and more automation, human judgment, with all its limits, still defines the boundaries of AI progress.

The moving target of AI benchmarks and human judgment

As the panel delved deeper into the topic of alignment and accountability, one question emerged front and center: Who gets to define the benchmarks that guide AI development? And perhaps more importantly, are those benchmarks grounded in human understanding, or just technical performance?

The challenge, according to Paras, lies in the fact that alignment is not static.

“It depends,” he said. “Alignment is always evolving.” From his perspective, the most important factor is recognizing where and how human input should be embedded in the process.

Paras pointed to nuanced judgment as a key domain where humans remain indispensable. “You might have alignment today, but taste changes. Priorities shift. What was acceptable last month might feel outdated next quarter,” he explained. “LLMs are like snapshots; they reflect a frozen point in time. Humans bring the real-time context that models simply can’t.”

He also emphasized the limits of what AI models can process. “You can’t pass in everything to an LLM,” he noted. “Some of the most valuable context, institutional knowledge, soft cues, and ethical boundaries live outside the prompt window. That’s where human judgment steps in.”

This makes benchmark-setting especially tricky. As use cases become more complex and cultural expectations continue to evolve, the metrics we use to measure alignment, safety, or usefulness must evolve too. And that evolution, Paras argued, has to be guided by humans, not just product teams or model architects, but people with a deep understanding of the problem domain.

“It’s not a perfect science,” he admitted. “But as long as we keep humans close to the loop, especially where the stakes are high, we can keep grounding those benchmarks in reality.”

In short, defining success in AI is a constant process of recalibration, driven by human judgment, values, and the ever-shifting landscape of what we expect machines to do.

The unsolved challenge of human representation in AI

As the panel explored the complexities of benchmarking and alignment, Sara turned the spotlight onto a fundamental and unresolved challenge: human representation in AI systems.

“Humans aren’t consistent with each other,” she began. “At Prolific, we care deeply about sourcing data from representative populations, but that creates tension. The more diverse your data sources are, the more disagreement you get on the ground truth. And that’s a really hard problem, I don’t think anyone has solved it yet.”

Most human-in-the-loop pipelines today rely on contributors from technologically advanced regions, creating a skewed perspective in what AI systems learn and reinforce. While it may be more convenient and accessible, the trade-off is systems that reflect a narrow slice of humanity and fail to generalize across cultures, languages, or values.

Paras expanded on that point by reminding the group of what LLMs really are at their core: “stochastic parrots.”

“They learn by mimicking human language,” he said. “So, if humans are biased, and we are, models will be biased too, often in the same ways or worse.” He drew a parallel to broader democratic ideals. “We all believe in democracy, but how many people actually feel represented by the people they vote for? If we haven’t figured out representation for humans, how can we expect to figure it out for language models?”

That philosophical thread, the limits of objectivity, the challenge of consensus, keeps resurfacing in AI conversations, and with good reason. As Paras put it, “Almost every problem in AI eventually becomes a philosophical question.”

Vaidya added a practical layer to the discussion, drawing on his experience with AI-generated content. Even when a model produces something that’s technically accurate or politically correct, it doesn’t mean it fits the intended use. “You have to ask: is this aligned with the tone, context, and audience we’re targeting?” he said.

Vaidya emphasized the value of multi-perspective prompting, asking the model to generate outputs as if different personas were viewing the same content. “What would this look like to a middle-aged person? What would a kid want to see? If the answers are wildly different, it’s a signal to bring in a human reviewer.”

In short, representation in AI is about surfacing variability, noticing it, and knowing when to intervene. And as all three panelists acknowledged, this challenge is still very much in progress.

Will humans always be in the loop?

As the panel drew to a close, the moderator posed a final, essential question: Will humans always be part of the AI loop? And if not, where might they be phased out?

It’s a question that sits at the heart of current debates around automation, accountability, and the future of work, and one that Paras didn’t shy away from.

“I hope we’re a part of the process and that this doesn’t turn into a Terminator situation anytime soon,” he joked. But humor aside, Paras emphasized that we’re still searching for an equilibrium between human judgment and machine autonomy. “We’re not there yet,” he said, “but we’re getting closer. As we build more of these systems, we’ll naturally find that balance.”

Paras pointed to a few specific use cases where agentic AI systems that can act autonomously without human intervention have started to show real promise.

“Research and code generation are the two strongest examples so far,” he noted. “If you pull out the human for a while, those agents still manage to perform reasonably well.”

But beyond those narrow domains, full autonomy still raises red flags.

“The truth is, even if AI can technically handle something, we still need a human in the loop, not because we can do it better, but because we need accountability,” Paras explained. “We need someone to point the finger at when things go wrong.”

This is why, despite years of development in AI and machine learning, fields like law and medicine have remained cautious adopters. “It’s not that the technology isn’t there,” Paras said. “It’s that when things go south, someone has to be responsible.”

And that need for traceability, interpretability, and yes, someone to blame, is unlikely to disappear anytime soon.

In a world that increasingly leans on AI to make decisions, keeping humans in the loop may be less about capability and more about ethics, governance, and trust. And for now, that role remains irreplaceable.

Final thoughts: Accountability and skill loss in an AI-driven future

As the conversation on human oversight neared its conclusion, Vaidya added a final and urgent perspective: we may be underestimating what we lose when we over-automate.

“One thing to keep in mind,” he said, “is that when we talk about AI performance today, we’re often comparing the entry-level output of a model with the peak performance of a human.”

That’s a flawed baseline, he argued, because while the best human output has a known ceiling, AI capability is continuing to grow rapidly. “What models can do today compared to just six months ago is mind-boggling,” he added. “And it’s only accelerating.”

But Vaidya’s deeper concern wasn’t just about the rate of improvement; it was about the risk of atrophy.

“I was just chatting with someone outside,” he shared. “They said people are going to forget how to write. And that stuck with me.”

The fear is that humans will lose foundational skills before we’ve built the safeguards to do those tasks well through automation. “That’s the danger,” Vaidya said. “We’re handing over control while our own capabilities fade, and without proper checks, we won’t notice until it’s too late.”

To avoid that future, Vaidya made a call to action for leaders and organizations: to treat human skill preservation and accountability mechanisms as part of responsible AI adoption.

“It’s on us to design for that,” he said. “We need systems, formal or informal, that ensure we retain critical human capabilities even as we scale what machines can do.”

As AI continues to evolve, this perspective added a final layer of nuance to the panel’s core message: progress doesn’t just mean doing more, it means knowing what to protect along the way.

“Agentic AI is here — are you ready?” With Ash Dhupar“Agentic AI is here — are you ready?” With Ash Dhupar

"Agentic AI is here — are you ready?" With Ash Dhupar

LLMOps in action: Streamlining the path from prototype to productionLLMOps in action: Streamlining the path from prototype to production

LLMOps in action: Streamlining the path from prototype to production

AIAInow is your chance to stream exclusive talks and presentations from our previous events, hosted by AI experts and industry leaders.

It’s a unique opportunity to watch the most sought-after AI content – ordinarily reserved for AIAI Pro members. Each stream delves deep into a key AI topic, industry trend, or case study. Simply sign up to watch any of our upcoming live sessions.

🎥 Access exclusive talks and presentations
✅ Develop your understanding of key topics and trends
🗣 Hear from experienced AI leaders
👨‍💻 Enjoy regular in-depth sessions

LLMOps is emerging as a critical enabler for organizations deploying large language models at scale – bringing data scientists, engineers, and end-users into tighter, more effective collaboration.

Join us for a deep dive into the operational backbone of successful LLM deployments. From model design to monitoring in production, you’ll learn how to unlock the full potential of LLMs with streamlined processes, smarter tooling, and cross-functional alignment.

In this session, you’ll explore:

🧠 What LLMOps is – and why it’s essential for scalable AI success
🔄 The full LLMOps lifecycle: from experimentation to deployment and iteration
🤝 How to accelerate collaboration between data teams, engineers, and business users
🧰 Practical frameworks and tools for building robust LLM pipelines
📊 Real-world case studies showcasing high-impact LLM applications
⚠️ Common challenges in LLMOps – and how to overcome them

Whether you’re an AI practitioner, developer, or team leader, this session will equip you with the insights and strategies to operationalize LLMs with confidence.

Meet the speaker:

Samin Alnajafi, AI Solutions Engineer, Weights & Biases

Samin Alnajafi is an accomplished Pre-Sales AI Solutions Engineer at Weights & Biases, specializing in AI-powered solutions for enterprise clients across EMEA. With experience at tech leaders such as Snowflake and DataRobot, he excels in guiding organizations through the technical intricacies of machine learning and data-driven innovation. His expertise spans large language model operations, sales engineering, and AI solutions, making him a valued advisor in deploying transformative technologies for a range of industries.

LLMOps in action: Streamlining the path from prototype to production

How to optimize LLM performance and output quality: A practical guideHow to optimize LLM performance and output quality: A practical guide

How to optimize LLM performance and output quality: A practical guide

Have you ever asked generative AI the same question twice – only to get two very different answers?

That inconsistency can be frustrating, especially when you’re building systems meant to serve real users in high-stakes industries like finance, healthcare, or law. It’s a reminder that while foundation models are incredibly powerful, they’re far from perfect.

The truth is, large language models (LLMs) are fundamentally probabilistic. That means even slight variations in inputs – or sometimes, no variation at all – can result in unpredictable outputs.

Combine that with the risk of hallucinations, limited domain knowledge, and changing data environments, and it becomes clear: to deliver high-quality, reliable AI experiences, we must go beyond the out-of-the-box setup.

So in this article, I’ll walk you through practical strategies I’ve seen work in the field to optimize LLM performance and output quality. From prompt engineering to retrieval-augmented generation, fine-tuning, and even building models from scratch, I’ll share real-world insights and analogies to help you choose the right approach for your use case.

Whether you’re deploying LLMs to enhance customer experiences, automate workflows, or improve internal tools, optimization is key to transforming potential into performance.

Let’s get started.

The problem with LLMs: Power, but with limitations

LLMs offer immense potential – but they’re far from perfect. One of the biggest pain points is the variability in output. As I mentioned, because these models are probabilistic, not deterministic, even the same input can lead to wildly different outputs. If you’ve ever had something work perfectly in development and then fall apart in a live demo, you know exactly what I mean.

Another well-known issue? Hallucinations. LLMs can be confidently wrong, presenting misinformation in a way that sounds convincing. This happens due to the noise and inconsistency in the training data. When models are trained on massive, general-purpose datasets, they lack the depth of understanding required for domain-specific tasks.

And that’s a key point – most foundation models have limited knowledge in specialized fields.

Let me give you a simple analogy to ground this. Think of a foundation model like a general practitioner. They’re great at handling a wide range of common issues – colds, the flu, basic checkups. But if you need brain surgery, you’re going to see a specialist. In our world, that specialist is a fine-tuned model trained on domain-specific data.

With the right optimization strategies, we can transform these generalists into specialists – or at least arm them with the right tools, prompts, and context to deliver better results.

How to optimize LLM performance and output quality: A practical guide

Four paths to performance and quality

When it comes to improving LLM performance and output quality, I group the approaches into four key categories:

Prompt engineering and in-context learning
Retrieval-augmented generation (RAG)
Fine-tuning foundation models
Building your own model from scratch

Let’s look at each one.

1. Prompt engineering and in-context learning

Prompt engineering is all about crafting specific, structured instructions to guide a model’s output. It includes zero-shot, one-shot, and few-shot prompting, as well as advanced techniques like chain-of-thought and tree-of-thought prompting.

Sticking with our healthcare analogy, think of it like giving a detailed surgical plan to a neurosurgeon. You’re not changing the surgeon’s training, but you’re making sure they know exactly what to expect in this specific operation. You might even provide examples of previous similar surgeries – what went well, what didn’t. That’s the essence of in-context learning.

This approach is often the simplest and fastest way to improve output. It doesn’t require any changes to the underlying model. And honestly, you’d be surprised how much of a difference good prompting alone can make.

2. Retrieval-augmented generation (RAG)

RAG brings in two components: a retriever (essentially a search engine) that fetches relevant context, and a generator that combines that context with your prompt to produce the output.

Let’s go back to our surgeon. Would you want them to operate without access to your medical history, recent scans, or current health trends? Of course not. RAG is about giving your model that same kind of contextual awareness – it’s pulling in the right data at the right time.

This is especially useful when the knowledge base changes frequently, such as with news, regulations, or dynamic product data. Rather than retraining your model every time something changes, you let RAG pull in the latest info.

Human + AI: Rethinking the roles and skills of knowledge workersHuman + AI: Rethinking the roles and skills of knowledge workers

Human + AI: Rethinking the roles and skills of knowledge workers

Artificial intelligence is not just another gadget; it’s already shaking up how white-collar jobs work.

McKinsey calls this shift an arrival at superagency, a space where machines think alongside people and the two groups spark new bursts of creativity and speed. Suddenly, the click-by-click chores-plowing through code, crunching spreadsheets, scrubbing datasets-are handled by bots, letting human brains leap to bigger questions.

Software developers, for instance, now spend more energy sketching big-picture road maps than wrestling syntax errors. Data scientists swap grinding model tweaks for debating which human questions an AI model really answers. In every corner of knowledge work, the quiet obey-yesterday-tasks face is evaporating.

The revamped role of the knowledge worker is equal parts translator, coach, and ethical guardian. Successful pros read the business landscape, nudge AI tools in the right direction, and steer output so it stays inside value lines. Human judgment steps in when computers run out of context, making it the real superpower of the partnership.

Some experts have taken to calling us AI strategists, a pivot away from the older task executor label. We use machines as sturdy scaffolding, letting us build fresher ideas faster while keeping accountability firmly in hand.

Skills for human-AI collaboration

Living in a world that teams up humans and machines is no longer a sci-fi plot; it’s the daily grind for millions. A recent World Economic Forum report warns that nearly 39% of the skills we brag about on our resumes will be different by 2030, and tech is doing the heavy lifting. This figure represents great disruption but is down from 44% in 2023.

Right now, big-ticket items like AI, Big Data, and cybersecurity sit at the head of the table, with cloud know-how and solid digital literacy close behind. Wages for professionals in those fields already show it. But numbers alone aren’t enough. Hiring managers keep shouting out for soft skills, too. Creative thinking, bounce-back strength, and plain old curiosity keep sneaking onto every shortlist we see.

The classic bedrock talents-leadership, talent management, and sharp-eyed analysis aren’t going anywhere either. Recruiters still want people who can steer teams and sway an audience while keeping facts straight. Long story short, the winning mix for tomorrow’s worker is hi-tech fluency slapped together with high-touch judgment.

Human + AI: Rethinking the roles and skills of knowledge workers

Key skill areas include:

AI and data literacy

Understanding how to work with AI systems is imperative, from preparing effective prompts to interpreting a model’s output.

Workers must learn to gauge an AI suggestion and realise its value or shortcomings – whether based on accuracy, bias, or security concerns – and blend those insights into the final decision. Data and statistics will always be important.

Critical and strategic thinking

When routine tasks are automated, the human side of problem framing, strategy, and design can shine through.

This means developing, along with domain expertise, long-term thinking: choosing the right technology tools, architecting resilient systems, and carving out innovative ways to do things. The ability to envision strategic applications of AI for processes, rather than simply applying it to a single task, will set leaders apart.

Creativity and innovation

The human realm will generate fresh ideas, brainstorm new physical or digital products or services, and think outside the algorithmic box. According to the WEF data, roles that require these abilities, such as engineering new fintech solutions, envisioning new educational curricula with AI, or designing novel avenues for public service, are growing rapidly.

Emotional intelligence and ethics

The human right of empathy and social judgment cannot (yet) be emulated by AI. When many things are automated, skills like communication, collaboration, negotiation, and emotional nuance increase in desirability.

For instance, knowledge workers must manage the human side of operations to interpret and present results to various stakeholder groups, ensuring that the application of AI is within an ethical framework.

For example, UNDP’s “AI for Government” program trains government officials to deal with the legal, social, and bias aspects of AI issues, emphasising that AI deployment needs to be regulated and humanized by public servants.

Adaptability and lifelong learning

Technology continues to change rapidly, making learning for each role an absolute necessity. While adaptability, curiosity, and a growth mindset are focal points for experts, this is all about workers updating their skill sets again and again with the best practices of a new awareness of AI capabilities.

Organizations should promote a culture of continuous learning because WEF has noted that investing in upskilling programs is already a crucial guarantee of future-readiness.

In summary, we can state that skills are understood as the ability to coordinate across the human-AI frontier. Applied technical knowledge entails using the data, AI platform, and software tool, whereas higher-order thinking concerns analysis, strategy, ethics, and soft skills, such as communication and leadership.

The demand will be for those who can straddle these domains; hence, a finance analyst with a working knowledge of machine learning or a government officer familiar with data policy and stakeholder engagement is a rare find.

Organizational strategies for reskilling and transformation

Bridging the gap between people and machines requires more than shiny new software; it calls for a deliberate shift in how teams operate. Leaders who tinker with job titles but stop there risk missing the moment, so they must rethink workflows, back training with dollars, and keep learning in plain view instead of hiding it in quarterly targets. Several approaches are starting to catch on:

Step back and redesign the whole operating model before you even think about flipping the automation switch. Slapping code onto a clunky process only glues the bad parts together. Grab a whiteboard, outline the steps again, and look for a cleaner route.

Process-mining software can trace every click and keystroke, exposing the stalled choke points that slow everyone down. With that map in hand, you can chop unnecessary work, slot in AI where it crunches numbers faster than a person would, and set humans loose on the judgment-heavy tasks only they can handle.

Take the story of IBM’s HR crew: they stripped the quarterly promotions grind of manual busywork by letting a custom Watsonx Orchestrate choreograph the data fetch, freeing the team to focus on tough calls about talent rather than hunting spreadsheets.

Invest as boldly in your people as you do in code, readying the workforce for the tremors AI and other waves of tech will send through the usual order.

Right now, HR is poised at a turning point, and the folks in those seats need to sketch how humans and machines will pull the organization’s heavy wagon forward.

Someone has to spot the high-value corners of the business, carve out the keystone positions, and map which skills, certainly not all of them, are going to matter most tomorrow.

That handiwork means trimming away repetitive errands that a bot can swallow, sometimes joining two titles into one, sometimes enlarging a job so it drags an AI dashboard into the daylight, and all the while cooking up quick-hit training that lets real people handle the meatier tasks.

Make skills the heart of your workforce plan, both for the challenges employees face now and for those still on the horizon. Leaders should worry less about flashy projects and focus on steadily lifting everyone’s tech know-how, because that solid base is what lets people branch out and try fresher things.

Many roles will not ever require serious coding, yet most team members will inevitably play with new-generation AI tools, so a little exposure goes a long way. When staff grasp the basics of artificial intelligence, they think critically, use the software sensibly, and even push back when something feels off.

Asking what data trained a model, how it arrived at a given output, or whether hidden bias lurks in the results stops being an academic debate and starts sounding like standard procedure.

Technology itself can play a role in personal growth. Point-and-click roadmaps that update as the market shifts show each person precisely what steps and what skills prepare them for the next rung on the ladder. Delta Airlines leaned on IBM Consulting to spin up just such a skills-first talent hub, and the IT crew there ramped up quickly on the hottest technologies.

Beyond today, every firm is staring at a yawning AI skills gap that won’t fix itself; filling that chasm demands deliberate hiring, strategic learning budgets, and a bit of patience while new talent rises through the ranks.

Let employees steer their own work, and suddenly jobs stop feeling like drudgery. When teams get to pick the tasks they hate, the routine pain melts away.

Generative software picks up the monotonous load and hands people back the hours they used to waste repeating the same clicks. New openings pop up organically, since folks now have breathing room to try odd experiments that might just turn into career paths.

Open channels matter, so project lists, quick polls, even a spare Slack room where anybody can shout, “Hey, this job could use a robot, keep the ideas flowing.” A steady stream of feedback like that also acts as a low-key boot camp for future leaders because they get to practice owning change right on the frontline.

Encourage managers, interns, pretty much anyone, to mash up tech with wild ideas in their day-to-day and watch the ownership spread.

We stand at the crossroads, holding a rare moment where policy can tip the balance toward people or toward code. The choice of landing squarely in human hands still looks daunting, but nobody gets dragged through this blindly. Rethink talent models so skill, spirit, and technology line up instead of running in separate lanes.

If those pieces fit together, the productivity spike follows, and so does the business value everyone keeps talking about. Skip that realignment, and the same tools that promise freedom end up sharpening the very collars we said were gone.

Career and management implications

The workplace is changing in ways that ripple well beyond the latest technology demo. Managers now need to rethink what authority even means when AIs pull as much weight as people do.

Old command-and-control hierarchies simply don’t fit. Collaboration, trial-and-error, and plain visibility in how algorithms make decisions matter far more. McKinsey puts it bluntly: bold AI targets must drive new structures, fresh incentives, and tougher accountability rules. Product, ops, and data leaders often end up elbow-deep together, swapping insights on the fly until a working prototype surfaces. That blend feels messy, but it works.

Careers are reshaping themselves right alongside management practices. Few professionals will climb the same straight ladder their parents did. Instead, a T-shaped profile, deep chops in finance, and wide comfort with AI tools become the norm.

New titles like AI product owner land beside more familiar ones on org charts, and folks are expected to slide from one box to another without fuss. Learning plans now stack competencies; a marketer who takes an AI analytics boot camp, then masters model auditing, suddenly qualifies for a much bigger role.

Oracle insists virtually every job will soon add the phrase using generative AI and supervision thereof to its description, and the company is probably correct.

Talent management is headed toward a sharper, skills-first focus. Where once longevity or pedigree ruled performance reviews, nimbleness, a learn-on-the-go mindset, and the knack for working in messy teams will start to tip the scales.

The World Economic Forum is already calling this shift skills intelligence, a phrase that keeps popping up in boardrooms. Some firms are trying out real-time peer checks and milestone pay jumps: show the muscle, move up. A handful of trailblazers have even hooked up AI engines that nudge people toward fresh roles or courses based on what they have just mastered.

The workplace of tomorrow, powered by ever-smarter tools, is anything but static. Most futurists agree machines won’t erase jobs so much as carve them into new shapes. To keep the workforce from feeling whipsawed, leaders must step in early and steer the transition.

That means backing learning routes, whether it’s funding an ML cert or bringing in coaches, and lavishing praise on the uniquely human spark that tech can’t mimic. One industry sage puts it bluntly: the people who win will be knowledge workers who wield AI deftly but never lose sight of crafting solutions that are durable, valuable, and, above all, humane.

What’s next?

The workplace of tomorrow will blend people and artificial intelligence in ways that feel ordinary before long.

Analysts, designers, coaches-everyone who trades in knowledge-will spend less time pushing pixels or filling sheets and more on insight, judgment, and plain old human connection.

Companies that show real leadership will retrain staff, re-architect roles, and rethink how managers ask questions and give credit. For those that pull it off, productivity will inch upward and, just maybe, the teams doing the work will feel a bit more alive in the process.