

Zero trust and AI: The next evolution in cybersecurity strategy

Traditional approaches to cybersecurity have focused on defending the digital perimeter surrounding internal networks. However, with the popularity of remote work and cloud computing technologies, conventional security strategies are no longer as effective at protecting organizations.

Zero trust has now become the go-to security approach. Its guiding concepts are built around the mindset of “never trust, always verify.” Each user, access device, and network connection is strictly evaluated and monitored regardless of where they originate from.

Artificial intelligence (AI) has become a powerful addition to zero trust security architecture. With the ability to analyze large volumes of information and automate complex security functions, AI has reshaped how modern businesses approach their security planning.

Understanding zero trust in modern organizations

Digital environments have changed the cybersecurity paradigm in many different ways as businesses have moved toward highly connected infrastructures. Zero trust security models assume every network connection within the organization is a potential threat and require a range of strategies to address those threats effectively.

Zero trust models work on several core principles that include:

  • Providing minimum access privileges: Employees should only be given access to information and systems that are absolutely essential for the job function they perform. This limits unauthorized access at all times, and in the event a security breach does occur, the damage is contained to a minimum.
  • Creation of isolated network areas: Rather than having a single company network, organizations should segment their systems and databases into smaller, isolated networks. This limits an attacker’s access to only a part of the system in the event of a successful perimeter breach.
  • Constant verification: All users and devices are checked and rechecked frequently. Trust is never assumed, and all activity is closely monitored regardless of who is gaining access or what they’re doing.
  • Assumed breaches: With zero trust, potential breaches are always viewed as a possibility. Because of this, security strategies don’t just focus on prevention, but also limiting the possible damage from a successful attack.

Identity-centric security has now become an essential element for building a strong cybersecurity posture and improved operational resilience. A big part of this process is safeguarding sensitive information and making sure that even if a breach does occur, that information is less likely to be compromised.


The role of AI in strengthening zero trust models

Bringing AI and zero trust together represents a major step forward for cybersecurity. AI’s power to analyze large datasets, spot unusual network activity, and automate security responses makes the core principles of zero trust even stronger, allowing for a more flexible and resilient defense.

Improving identity and access management

By leveraging AI, managing identities and provisioning system access within a zero trust environment can be improved. Machine learning models can scan user behaviors for anomalies indicative of compromised accounts or potentially dangerous network activity. Adaptive authentication protocols can then use these risk-based assessments to dynamically change security validation parameters.

AI technology also helps automate authentication processes when validating user identities, and can facilitate new user setups, streamlining IT processes while minimizing human error. This added efficiency reduces the strain on IT support teams and significantly reduces the possibility of accidentally granting the wrong access permissions.
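To make this concrete, here is a minimal sketch of risk-based, adaptive authentication. The signals, weights, and policy tiers below are hypothetical; a real deployment would derive the risk score from a trained anomaly-detection model rather than hand-set weights.

```python
# Minimal sketch: risk-based adaptive authentication (illustrative only).
# The feature names, weights, thresholds, and policy tiers are hypothetical.
from dataclasses import dataclass

@dataclass
class LoginAttempt:
    new_device: bool         # device not seen for this account before
    unusual_location: bool   # geo-IP far from the user's normal locations
    odd_hour: bool           # outside the user's typical working hours
    failed_attempts_1h: int  # recent failed logins for this account

def risk_score(a: LoginAttempt) -> float:
    """Combine simple behavioral signals into a 0-1 risk score.
    In practice this would come from a trained anomaly-detection model."""
    score = 0.0
    score += 0.35 if a.new_device else 0.0
    score += 0.30 if a.unusual_location else 0.0
    score += 0.15 if a.odd_hour else 0.0
    score += min(a.failed_attempts_1h, 5) * 0.04
    return min(score, 1.0)

def required_verification(score: float) -> str:
    """Map the risk score to an authentication requirement (zero trust:
    every request is evaluated, and even low risk keeps baseline checks)."""
    if score < 0.3:
        return "password + existing session token"
    if score < 0.7:
        return "step-up MFA (push notification or TOTP)"
    return "block and route to security review"

attempt = LoginAttempt(new_device=True, unusual_location=True,
                       odd_hour=False, failed_attempts_1h=0)
print(required_verification(risk_score(attempt)))  # -> step-up MFA (...)
```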

Intelligent threat detection and response

Traditional security measures can overlook subtle, yet important indicators of malicious network activity. However, machine learning algorithms can aid in detecting these threats ahead of time, resulting in a far more proactive approach to threat response.

Autonomous threat hunting and incident resolution can reduce the time necessary to identify and contain breaches while mitigating any associated damage. With AI, network monitoring processes can be done automatically, allowing security personnel to act faster if and when the time comes.

AI can also provide organizations with predictive analytics that help guard against possible attacks by anticipating them before they occur. By combining threat intelligence gathered from external vendors with continuous checks for system vulnerabilities, organizations can take the steps needed to tighten security defenses before weaknesses are exploited.
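As one illustration of the anomaly-detection idea behind this subsection, the sketch below trains an unsupervised model on baseline traffic and flags outliers. The features, synthetic data, and threshold are assumptions for demonstration, not a production detector.

```python
# Minimal sketch: flagging anomalous network events with an unsupervised model.
# Feature choice, synthetic baseline data, and contamination rate are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [bytes_sent, bytes_received, distinct_ports, login_failures]
baseline = np.random.default_rng(0).normal(
    loc=[5_000, 20_000, 3, 0], scale=[1_000, 4_000, 1, 0.5], size=(500, 4))

detector = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

new_events = np.array([
    [5_200, 21_000, 3, 0],       # looks like normal traffic
    [250_000, 1_000, 40, 12],    # large exfil-like transfer plus port scanning
])
for event, label in zip(new_events, detector.predict(new_events)):
    if label == -1:  # -1 means the model considers the event anomalous
        print("ALERT: investigate", event.tolist())
```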

Automating data security and governance processes

AI systems can help sensitive business information be protected in real time. As data is collected, it can be automatically classified into various categories. This dynamic classification allows AI systems to apply relevant security controls to certain datasets, helping to align with various compliance requirements while adhering to any of the organization’s specific data management policies.

Another important security element for modern organizations is data loss prevention (DLP). AI-driven DLP solutions can be configured to automatically supervise the way users access and relocate information within a system. This helps to identify potential data manipulation and greatly minimizes the danger of unauthorized system access and data leakage.
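A minimal sketch of this kind of dynamic classification is shown below, assuming illustrative regex patterns and a made-up control table; a real system would combine trained classifiers with the organization's own policy definitions.

```python
# Minimal sketch: rule-based classification of records at ingestion time, with a
# security control applied per class. Patterns, labels, and controls are illustrative.
import re

PATTERNS = {
    "pci":   re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # card-like numbers
    "pii":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN-like pattern
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

CONTROLS = {
    "pci":    {"encrypt": True,  "allow_export": False, "retention_days": 365},
    "pii":    {"encrypt": True,  "allow_export": False, "retention_days": 730},
    "email":  {"encrypt": True,  "allow_export": True,  "retention_days": 365},
    "public": {"encrypt": False, "allow_export": True,  "retention_days": 90},
}

def classify(text: str) -> str:
    """Return the first matching sensitivity label, defaulting to 'public'."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return label
    return "public"

record = "Customer 4111 1111 1111 1111 called about an invoice"
label = classify(record)
print(label, CONTROLS[label])   # -> pci {'encrypt': True, ...}
```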


New security challenges and considerations

Though AI drastically improves the capabilities of traditional zero trust models, it can also present additional security considerations that require organizations’ attention. Some of these include:

Data privacy and ethical concerns

When applying AI in zero trust settings, balancing security and personal privacy is critical. Organizations need to be certain that their methods of collecting and analyzing data are done within the scope of applicable privacy laws and ethical boundaries.

Bias in AI systems should be dealt with as well. Machine learning algorithms trained on outdated or skewed data can produce inaccurate results that lead to weaker security decisions. Organizations need to ensure that their AI-driven systems have supporting policies in place to prevent these biased analyses from influencing outcomes.

Integration and implementation challenges

Integrating AI into a zero trust framework isn’t always straightforward. Complications can surface – especially when it comes to system and network compatibility. Organizations need to ensure that their AI solutions can be seamlessly integrated into the existing tech stack and that there aren’t any potential barriers that will impede data flow to and from critical systems.

Another operational challenge with AI-driven security systems is finding qualified talent to operate them. Companies will likely need to allocate dedicated resources for training and staff development to keep systems functioning effectively.

The importance of regular AI model training

AI solutions, especially those that use complex learning algorithms, aren’t a “set-it-and-forget-it” implementation. With cyber threats constantly evolving, maintaining the effectiveness of AI-driven systems requires regular model training.

Without regular intervals of AI model retraining, these systems won’t function accurately and efficiently over time. An AI model must be regularly reviewed and modified to avoid false positive alerts, broken automation, or inadequate threat mitigation protocols.

The future of cybersecurity

Integrating AI with zero trust architecture has changed how businesses can approach their cybersecurity initiatives. As cyberthreats become increasingly sophisticated, the need for greater automation and identity-centric security planning will only continue to grow.

With the proper implementation strategies in place, organizations can benefit from enhanced threat management, streamlined access management, and a more proactive approach to data protection.


Have you checked our 2025 events calendar?

We’ll be all over the globe, so why not have a look and see if we’re anywhere near you?

Join us and network with like-minded AI experts in your industry.



AI assistants: Only as smart as your knowledge base

Artificial intelligence assistants are quickly becoming vital tools in modern workplaces, transforming how businesses operate by making everyday tasks simpler and faster. 

But despite their widespread adoption and advanced capabilities, even the best AI assistants today face a significant limitation: they often lack access to a company’s internal knowledge. 

AI assistants need real-time, seamless connections to your company’s databases, documents, and internal communication tools to realize their full potential. This integration ensures they’re brilliant and contextually aware, making them genuinely valuable workplace assets.

The rise of AI assistants

AI assistants are smart applications that understand commands and use a conversational AI interface to carry out tasks. They’re often embedded into dedicated hardware and integrated with multiple systems.

Unlike chatbots, AI assistants are less limited in both intelligence and functionality. They have more agency and advanced abilities, like contextual understanding and personalization. From drafting emails to summarizing reports, these assistants are everywhere.

In business, popular AI assistants built on large language models (LLMs) can also help you with data analysis, task automation, workflow streamlining, and more. They can be mostly free if you don’t need to scale up, although some users might struggle with the free versions when it comes to tasks that involve uploading or downloading data.

However, even the more advanced AI assistants are missing something that makes them truly useful in your workplace: they don’t have access to your company’s knowledge and information. Without that, these assistants are simply guessing.

And that’s a problem.


The knowledge gap: Why AI assistants struggle

Picture this: you ask your AI assistant about a specific company policy you need to quote, a conversation that’s buried in Slack, or a past project you need vital information from. You’re likely to get a vague and generic answer or, even worse, something that’s completely irrelevant or downright wrong.

That’s because these AI assistants don’t have access to the right data – your data – and rely on public information instead. As they aren’t drawing from internal knowledge that sits behind your business, you’ll often find issues with their responses:

  • Wasted time searching for answers the AI should be able to provide.
  • Frustration when employees get irrelevant or outdated responses.
  • AI that feels like more of a novelty than a real workplace tool.

If an AI assistant is to work in a business environment, it needs more than intelligence. It needs context; otherwise, it won’t be helpful for your employees.

The fix: Connecting AI assistants to your knowledge base

How do you tackle the information problem?

The answer is simple: the AI assistants have to be plugged into your company’s internal database. When they have real-time access to company documents, emails, Slack threads, and more, they can help you the way your business needs.

But how can AI assistants help your business by being connected to your company data?

When you connect an AI assistant to your institutional knowledge base with policies, documentation, manuals, and more, they’ll be able to provide you with accurate and contextual answers on a wider variety of topics.

This could change how employees share knowledge in the workplace, moving from a tedious process of manual document searching to a more conversational, self-service experience. Wait times and support costs are reduced when employees can simply ask an assistant and get an instant reply.

A custom AI assistant lets you qualify customers and offer personalized solutions by taking care of repetitive and time-consuming tasks. Your employees can then focus on improving products and strategic work.

This streamlined strategy leads to increased efficiency and productivity, which greatly reduces bottlenecks and improves output. And as AI assistants can also handle companies’ growing needs, they’ll adapt to increased workloads and offer long-term ROI and usability.
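Below is a minimal sketch of the retrieval step behind this kind of self-service experience, using TF-IDF similarity over a few made-up internal documents. A production assistant would use embeddings, a vector database, and permission checks, but the flow is the same: find the most relevant internal context and hand only that to the model.

```python
# Minimal sketch: retrieve the internal document most relevant to an employee's
# question before passing it to an assistant. Documents and the question are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

internal_docs = {
    "expenses-policy": "Employees are reimbursed for travel booked through the portal within two weeks.",
    "vpn-howto":       "To connect to the VPN, install the client and sign in with SSO.",
    "q3-project-plan": "The Q3 launch moves to October; owners are listed below.",
}

question = "How do I get reimbursed for a flight I booked?"

vectorizer = TfidfVectorizer().fit(list(internal_docs.values()) + [question])
doc_vectors = vectorizer.transform(internal_docs.values())
query_vector = vectorizer.transform([question])

scores = cosine_similarity(query_vector, doc_vectors)[0]
best = max(zip(internal_docs, scores), key=lambda pair: pair[1])
print("Context to send to the assistant:", best[0])  # -> expenses-policy
```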

How Glean makes AI smarter

That’s where Glean comes in. Glean connects AI assistants directly to your company’s knowledge, turning them into real, reliable workplace tools. It’s designed to integrate AI capabilities into your company’s data for up-to-date and context-aware answers.

Here’s what that means in practice:

Real-time data synchronization

Glean’s connectors support real-time synchronization, making sure that any updates in the source applications are immediately reflected. This means that your assistant will always work with the most current information, enhancing its responses’ accuracy and timeliness.


Comprehensive data integration

Extensive data integration ensures that your AI assistant can access a wide range of company data, allowing it to offer relevant and informed responses. Glean connects with over 100 enterprise applications like Box, Confluence, Dropbox, GitHub, Gmail, Google Drive, Jira, Microsoft Teams, OneDrive, Outlook, Salesforce, ServiceNow, SharePoint, Slack, and Zendesk.

Permissions-aware responses

Strictly enforcing the same permissions in your company’s data sources, Glean ensures that users only have access to the information they have permission to see. This keeps your data secure and in compliance with regulations while still delivering the relevant answers.

Personalized results and semantic understanding

Glean Assistant uses deep learning-based language models, meaning it understands natural language queries and can deliver intuitive interactions. Every personalized result takes into consideration ongoing projects, the user’s role, and collaborations for tailored information.

Universal knowledge access

As it combines external web information with your internal company data, Glean Assistant is ideal for researching internal projects and accessing publicly available insights in one platform. This integration makes comprehensive understanding and informed decision-making much easier.

AI-driven content generation and analysis

Glean Assistant can analyze structured and unstructured data simultaneously across your company’s applications, documents, and even the web. It offers assistance in supporting a smarter decision-making process by drafting deliverables and finding relevant insights.

Seamless integration with your company’s data ecosystem, combined with advanced AI techniques, allows Glean Assistant to enhance your productivity.

The smarter way forward

AI assistants have the potential to transform the workplace significantly, but only if they have access to accurate and relevant internal information. Connecting them directly to internal knowledge allows companies to move from nice-to-have AI to must-have AI.

Glean makes that shift seamless, turning AI from a frustrating gimmick into a powerful, reliable assistant. This enhances productivity and empowers employees to achieve more meaningful outcomes.


How to secure LLMs with the fastest guardrails for peak AI performance

This article comes from Nick Nolan’s talk at our Washington DC 2025 Generative AI Summit. Check out his full presentation and the wealth of OnDemand resources waiting for you.


What happens when a powerful AI model goes rogue? For organizations embracing AI, especially large language models (LLMs), this is a very real concern. As these technologies continue to grow and become central to business operations, the stakes are higher than ever – especially when it comes to securing and optimizing them.

I’m Nick Nolan, and as the Solutions Engineering Manager at Fiddler, I’ve had countless conversations with companies about the growing pains of adopting AI. While AI’s potential is undeniable – transforming industries and adding billions to the economy – it also introduces a new set of challenges, particularly around security, performance, and control. 

So in this article, I’ll walk you through some of the most pressing concerns organizations face when implementing AI and how securing LLMs with the right guardrails can make all the difference in ensuring they deliver value without compromising safety or quality.

Let’s dive in.

The growing role of AI and LLMs

We’re at an exciting moment in AI. Right now, research shows around 72% of large enterprises are using AI in some way, and it’s clear that generative AI is definitely on the rise – about 65% of companies are either using it or planning to. 

On top of this, AI is also expected to add an enormous amount to the global economy – around $15.7 trillion by 2030, but let’s keep in mind that these numbers are just projections. We can only guess where this journey will take us, but there’s no denying that AI is changing the game.

But here’s the thing: while the excitement is real, so are the risks. The use of AI, particularly generative AI, comes with a unique set of challenges – especially when it comes to ensuring its security and performance. This is where guardrails come into play. 

If organizations do AI wrong, the cost of failure can be astronomical – not just financially, but also in terms of reputational damage and compliance issues.



Gold-copy data & AI in the trade lifecycle process

The current end-to-end trade lifecycle is highly dependent on having accurate data at each stage. The goal of the investment book of records (IBOR) system is to ensure the trade, position, and cash data match the custodian, and for the accounting book of records (ABOR) system, for this same dataset to match the fund accountant.

There are other stakeholders in the process, including broker systems, transfer agents, central clearing parties, etc, depending on the type and location of execution. A position that reflects identically across all systems is known as having been “straight-through processed”; in other words, systems have recognized the trade, and datasets are in line, or at least, within tolerance. 

While STP itself is efficient, addressing and eventually resolving non-STP executions remains highly manual. Stakeholders typically compare data points across multiple systems, beginning as far upstream as possible, and gradually move down the lifecycle to the root cause of the break. This investigation takes time, creates noise across the value chain, and most importantly, creates uncertainty for the front office when making new decisions.

The proposal is to leverage AI to continually create and refine gold-copy data at each stage of the life cycle through comparison with sources and link downstream processes to automatically update in real-time with the accurate datasets. Guardrails should also be implemented in case of material differences. 


Introduction

Let’s analyze the current process with an example – a vanilla bond is about to undergo a payment-in-kind (PIK) corporate action (PIKs occur when an issuer decides to capitalize interest it would have paid in cash as additional security). Assume that the vendor the IBOR system uses applies an ACT/360 day count (to calculate accruals), while the custodian uses ACT/365:

  • On ex-date, the PIK will process with a higher capitalization than the custodian, and a mismatch will form between the IBOR system and the bank.
  • This mismatch will first be uncovered on ex-date, assuming the bank sends MT567 (corp. action status) and flags the positional difference between the two systems. 
  • Next, on SD+1, this will again be flagged when the bank sends MT535 (position statement), showing the mismatch during position reconciliation. 
  • Finally, if investment accounting is run on ex-date or on SD+1, there’ll be a mismatch between IBOR and the fund accountant, where the balance sheet and statement of change in net asset reports will again show an exception for the security. 

This simple example illustrates how one mismatch well upstream in the lifecycle causes three separate breaks in the downstream chain; in other words, three different user segments (a corporate action user, a reconciliation user, and an accounting user) are all investigating the same root cause.

Once the IBOR system’s data is resolved, each of these user segments needs to coordinate the waterfall logic to have each downstream system and process updated.

The problem

Unfortunately, such occurrences are common. As front-to-middle-to-back investment systems become more integrated, inaccurate data at any point in the process chain creates inefficiencies across a number of user segments and forces multiple users to analyze the same exception (or the effect of that exception) on their respective tools.

Downstream users that are reconciling to the bank or the fund accountant will notice the security mismatch but would not immediately recognize the day-count difference as the root cause. These users would typically undertake the tasks below to investigate:

  • Raise an inquiry with the bank’s MT535 statement to explain the position difference
  • Raise an inquiry with the fund accountant’s statement to explain the position difference
  • Raise inquiry with the internal data team to specify IBOR’s position calculations
  • Once aware of a recent corp. action, raise inquiry with the internal COAC team to investigate the processing of the PIK

As seen, multiple teams’ energy and capacity are expended investigating the same root cause, all of it undertaken manually.

On the other hand, an AI process that continually queries multi-source datasets could have proactively flagged the day-count discrepancy before the corporate action was processed, and automatically informed downstream teams of a potential inaccuracy in the position of the PIK security.

While any changes to user data from AI should still undergo a reviewer check, such proactive detection and communication drastically shortens resolution times and should reduce user frustration.


The proposal

Let’s look at the corporate action workflow in detail. Users typically create a “gold-copy” event once they’ve “scrubbed” data from multiple sources and created an accurate, up-to-date copy of the event that will occur. This is ideal in many ways: scrubbing multiple sources reduces the chance of an incorrect feed from a single vendor creating process gaps.

We need AI to undertake this process continuously. IBOR systems should, at minimum, be subscribed to two or more vendors from whom data should be retrieved. Any change to the dataset should be continually updated (either through a push or pull API mechanism). This would work as follows: 

  • A new public security is set up in the marketplace with public identifiers including CUSIP, ISIN, SEDOL etc. 
  • The data vendors supplying the feed to IBOR systems should feed this through automatically, once the required minimum data point details are populated. 
    • IBOR systems, at this point, would create this security within their data systems
    • Any mismatches across vendors should be reviewed by a user, and appropriate values chosen (if deemed necessary)
  • Any updates the securities undergo from that point in the market should be automatically captured and security updated in the IBOR system
    • At this point, downstream applications that rely on this security data should automatically flag a security market update and the impending event-driven update
      • This informs users that the dataset they’re seeing may be stale vs. external processes that may be receiving up-to-date data
    • To protect against the risk of inaccurate data from a single vendor, only a dataset that is consistent across all vendors should be automatically updated
    • Data updates from a single vendor only should be prompted to a user to review and approve
  • Once underlying securities are updated, this would be considered an ‘event’, which should drive updates to all downstream applications that rely on the security update (called event-driven updates)
    • Event-driven updates greatly reduce the number of manual touches downstream users need to make for inaccuracies that have been identified upstream
    • Once all applications are in line with the updated data sets, the security market update flag should be removed automatically. 

Potential concerns

While exciting, the use of AI and event-driven updates raises a few concerns worth discussing – data capacity/storage, potential timing differences with external participants, and materiality/tolerance. 

Let’s address the latter first – materiality/tolerance. Securities can undergo immaterial changes from time to time that may have little to no impact on all upstream and downstream processes in the trade lifecycle.

As a result, a set of fields and tolerances should be identified to be flagged in case of market updates (core dataset). If the updates occur on these specific fields and they’re outside of the existing tolerance, IBOR systems should consume the updates provided by vendors.

If updates occur on any other fields (or are within tolerance), the updates should be rejected. This would ensure the system leverages the efficiency of AI without the inefficiency of noise. 
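To illustrate the consensus-and-tolerance rule described above, here is a minimal sketch; the field names, tolerance values, and feed format are assumptions, and a real IBOR integration would be far richer.

```python
# Minimal sketch of the consensus rule: update the gold copy automatically only
# when all vendors agree (within tolerance on core fields); otherwise queue the
# field for user review. Field names, tolerances, and sample feeds are illustrative.
CORE_FIELDS = {"day_count": None, "coupon_rate": 0.0001, "maturity": None}
# None -> values must match exactly; float -> numeric tolerance

def within_tolerance(values, tolerance):
    if tolerance is None:
        return len(set(values)) == 1
    return max(values) - min(values) <= tolerance

def refresh_gold_copy(gold_copy: dict, vendor_feeds: list) -> list:
    """Apply automatic updates; return the fields needing manual review."""
    needs_review = []
    for field, tolerance in CORE_FIELDS.items():
        values = [feed[field] for feed in vendor_feeds if field in feed]
        if not values:
            continue
        if within_tolerance(values, tolerance):
            gold_copy[field] = values[0]      # consensus -> auto-update
        else:
            needs_review.append(field)        # disagreement -> reviewer
    return needs_review

gold = {"day_count": "ACT/360", "coupon_rate": 0.05, "maturity": "2030-06-01"}
feeds = [
    {"day_count": "ACT/365", "coupon_rate": 0.05, "maturity": "2030-06-01"},
    {"day_count": "ACT/360", "coupon_rate": 0.05, "maturity": "2030-06-01"},
]
print(refresh_gold_copy(gold, feeds))  # -> ['day_count'] flagged for review
```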

Secondly, there is potential for timing differences with external participants. While the IBOR system may have up-to-date data, external participants (e.g., banks or fund accounting systems) may continue to leverage stale or outdated datasets.


There should be an audit history available of the core dataset’s historical data; in other words, if the bank/fund accounting system refers to any of the audit datasets, an automatic note should be sent to the external participant informing them of stale data and to recheck against external market vendors. 

Finally, there is the concern about data capacity. There’s no doubt that continual querying, validation, and updates of core datasets by multiple vendors, along with maintaining audit data, will increase data consumption and storage costs.

A number of companies are required by law to keep an audit history of at least five years, and adding the above requirement would certainly expand the capacity requirements. Restricting security updates to the core datasets and allowing tolerances should help manage some of this required capacity.

Future

Despite the concerns highlighted above, AI is still valuable to design and implement across the trade lifecycle process, and its benefits would likely substantially outweigh the costs incurred. While most of the examples in this article discussed public securities, the universe is substantially wider in private securities, where high-quality data is far scarcer.

With the investing world transitioning to increased investments in private securities, leveraging AI will continue to pay dividends across both universes. 


LLM economics: How to avoid costly pitfalls

Large Language Models (LLMs) like GPT-4 are advanced AI systems designed to process and generate human-like text, transforming how businesses leverage AI.

GPT-4’s pricing model (32k context) charges $0.06 per 1,000 input tokens and $0.12 per 1,000 output tokens, which makes it a scalable option for businesses. However, it can become expensive very quickly when it comes to production environments.

Modern models cross-reference every token in the input against every other token in order to quantify and understand the context behind each pair. The result? Quadratic behavior: the computation becomes more and more expensive as the number of tokens increases.

And scaling isn’t linear; costs increase quadratically with sequence length. If you need to handle text that’s 10x longer, the cost goes up roughly 100 times, and so on.

This can be a significant setback for scaling projects; the hidden cost of AI impacts sustainability, resources, and requirements. This lack of insight can lead to businesses overspending or inefficiently allocating resources.

Where costs lie

Let’s look deeper into tokens, per-token pricing, and how everything works. 

Tokens are the smallest unit of text processed by models – something simple like an exclamation mark can be a token. Input tokens are used whenever you enter anything into the LLM query box, and output tokens are used when the LLM answers your query.

On average, 740 words are equivalent to around 1,000 tokens.
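As a rough sketch of how you might estimate token counts (and therefore cost) before sending a prompt, the snippet below uses the open-source tiktoken tokenizer; the prices are the illustrative GPT-4 32k rates quoted above, and the expected output length is a guess you would tune per use case.

```python
# Minimal sketch: estimate token counts and cost before sending a prompt.
# Assumes the open-source `tiktoken` package is installed; prices are the
# illustrative GPT-4 32k rates ($0.06 / $0.12 per 1,000 input / output tokens).
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-4")

prompt = "Summarize our Q3 sales performance in three bullet points."
prompt_tokens = len(encoder.encode(prompt))
expected_output_tokens = 150   # rough guess for a short bulleted answer

cost = prompt_tokens / 1000 * 0.06 + expected_output_tokens / 1000 * 0.12
print(f"{prompt_tokens} input tokens, estimated cost ${cost:.4f}")
```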

Inference costs

Here’s an illustrative example of how costs can exponentially grow:

Input tokens: $0.50 per million tokens

Output tokens: $1.50 per million tokens

| Month | Users / avg. prompts per user | Input/output tokens per prompt | Total input tokens | Total output tokens | Input cost | Output cost | Total monthly cost |
|---|---|---|---|---|---|---|---|
| 1 | 1,000 / 20 | 200 / 300 | 4,000,000 | 6,000,000 | $2 | $9 | $11 |
| 3 | 10,000 / 25 | 200 / 300 | 50,000,000 | 75,000,000 | $25 | $112.50 | $137.50 |
| 6 | 50,000 / 30 | 200 / 300 | 300,000,000 | 450,000,000 | $150 | $675 | $825 |
| 9 | 200,000 / 35 | 200 / 300 | 1,400,000,000 | 2,100,000,000 | $700 | $3,150 | $3,850 |
| 12 | 1,000,000 / 40 | 200 / 300 | 8,000,000,000 | 12,000,000,000 | $4,000 | $18,000 | $22,000 |

As LLM adoption expands, the user numbers grow exponentially and not linearly. Users engage more frequently with the LLM, and the number of prompts per user increases. The number of total tokens increases significantly as a result of increased users, prompts, and token usage, leading to costs multiplying monthly.

What does it mean for businesses?

Anticipating exponential cost growth becomes essential. For example, you’ll need to forecast token usage and implement techniques to minimize token consumption through prompt engineering. It’s also vital to keep monitoring usage trends closely in order to avoid unexpected cost spikes.
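A small script like the following, which mirrors the assumptions in the table above ($0.50 and $1.50 per million input and output tokens, 200/300 tokens per prompt), is one way to forecast token spend as usage grows.

```python
# Minimal sketch: forecast monthly LLM cost from users, prompts per user, and
# tokens per prompt, at the illustrative rates used in the table above.
INPUT_PRICE_PER_M = 0.50    # $ per million input tokens
OUTPUT_PRICE_PER_M = 1.50   # $ per million output tokens

def monthly_cost(users, prompts_per_user, input_tokens, output_tokens):
    prompts = users * prompts_per_user
    total_in = prompts * input_tokens
    total_out = prompts * output_tokens
    return (total_in / 1e6) * INPUT_PRICE_PER_M + (total_out / 1e6) * OUTPUT_PRICE_PER_M

growth = [(1, 1_000, 20), (3, 10_000, 25), (6, 50_000, 30),
          (9, 200_000, 35), (12, 1_000_000, 40)]
for month, users, prompts in growth:
    print(f"Month {month:>2}: ${monthly_cost(users, prompts, 200, 300):,.2f}")
# Month  1: $11.00 ... Month 12: $22,000.00
```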

Latency versus efficiency tradeoff

Let’s look into GPT-4 vs. GPT-3.5 pricing and performance comparison.

| Model | Context window (max tokens) | Input price (per 1K tokens) | Output price (per 1K tokens) |
|---|---|---|---|
| GPT-3.5 Turbo | 4,000 | $0.0015 | $0.0020 |
| GPT-3.5 Turbo | 16,000 | $0.0030 | $0.0040 |
| GPT-4 | 8,000 | $0.03 | $0.06 |
| GPT-4 | 32,000 | $0.06 | $0.12 |
| GPT-4 Turbo | 128,000 | $0.01 | $0.03 |

Latency refers to how quickly models respond; a faster response leads to better user experiences, especially when it comes to real-time applications. In this case, GPT-3.5 Turbo offers lower latency because it has simpler computational requirements. GPT-4 standard models have higher latency due to processing more data and using deeper computations, which is the tradeoff for more complex and accurate responses.

Efficiency is the cost-effectiveness and accuracy of the responses you receive from the LLMs. The higher the efficiency, the more value per dollar you get. GPT-3.5 Turbo models are extremely cost-efficient, offering quick responses at low cost, which is ideal for scaling up user interactions.

GPT-4 models deliver better accuracy, reasoning, and context awareness at much higher costs, making them less efficient when it comes to price but more efficient for complexity. GPT-4 Turbo is a more balanced offering; it’s more affordable than GPT-4, but it offers better quality responses than GPT-3.5 Turbo.

To put it simply, you have to balance latency, complexity, accuracy, and cost based on your specific business needs.

  • High-volume and simple queries: GPT-3.5 Turbo (4K or 16K). Perfect for chatbots, FAQ automation, and simple interactions.
  • Complex but high-accuracy tasks: GPT-4 (8K or 32K). Best for sensitive tasks requiring accuracy, reasoning, or high-level understanding.
  • Balanced use cases: GPT-4 Turbo (128K). Ideal where higher quality than GPT-3.5 is needed, but budgets and response times still matter.

Experimentation and iteration

Trial-and-error prompt adjustments can take multiple iterations and experiments. Each of these iterations consumes both input and output tokens, which leads to increased costs in LLMs like GPT-4. If not monitored closely, incremental experimentation will very quickly accumulate costs.

You can fine-tune models to improve the responses; this requires extensive testing and repeated training cycles. These fine-tuning iterations require significant token usage and data processing, which increases costs and overhead.

The more powerful the model, like GPT-4 and GPT-4 Turbo, the more these hidden expenses multiply because of higher token rates.

| Activity | Typical usage | GPT-3.5 Turbo cost | GPT-4 cost |
|---|---|---|---|
| Single prompt test iteration | ~2,000 tokens (input/output total) | $0.0035 | $0.18 |
| 500 iterations (trial/error) | ~1,000,000 tokens | $1.75 | $90 |
| Fine-tuning (multiple trials) | ~10M tokens | $35 | $1,800 |

(Example assuming average prompt/response token counts.)

Strategic recommendations to ensure efficient experimentation without adding overhead or wasting resources:

  • Start with cheaper models (e.g., GPT-3.5 Turbo) for experimentation and baseline prompt testing.
  • Progressively upgrade to higher-quality models (GPT-4) once basic prompts are validated.
  • Optimize experiments: Establish clear metrics and avoid redundant iterations.

Vendor pricing and lock-in risks

First, let’s have a look at some of the more popular LLM providers and their pricing:

OpenAI

| Model | Context length | Pricing |
|---|---|---|
| GPT-4 | 8K tokens | Input: $0.03 per 1,000 tokens; Output: $0.06 per 1,000 tokens |
| GPT-4 | 32K tokens | Input: $0.06 per 1,000 tokens; Output: $0.12 per 1,000 tokens |
| GPT-4 Turbo | 128K tokens | Input: $0.01 per 1,000 tokens; Output: $0.03 per 1,000 tokens |

Anthropic

| Offering | Pricing |
|---|---|
| Claude 3.7 Sonnet (API) | Input: $3 per million tokens ($0.003 per 1,000 tokens); Output: $15 per million tokens ($0.015 per 1,000 tokens) |
| Claude.ai Free | Access to basic features |
| Claude.ai Pro | $20 per month (enhanced features for individual users) |
| Claude.ai Team (minimum 5 users) | $30 per user per month (monthly billing) or $25 per user per month (annual billing) |
| Claude.ai Enterprise | Custom pricing tailored to organizational needs |

Google

| Offering | Pricing |
|---|---|
| Gemini Advanced | Included in the Google One AI Premium plan: $19.99 per month (includes 2 TB of storage for Google Photos, Drive, and Gmail) |
| Gemini Code Assist Enterprise | $45 per user per month with a 12-month commitment; promotional rate of $19 per user per month available until March 31, 2025 |

Committing to just one vendor means you have reduced negotiation leverage, which can lead to future price hikes. Limited flexibility also makes switching providers costly, given prompt, code, and workflow dependencies. Hidden overheads, like fine-tuning experiments when migrating vendors, can increase expenses even more.

When thinking strategically, businesses should keep flexibility in mind and consider a multi-vendor strategy. Make sure to keep monitoring evolving prices to avoid costly lock-ins.

How companies can save on costs

Tasks like FAQ automation, routine queries, and simple conversational interactions don’t need large-scale and expensive models. You can use cheaper and smaller models like GPT-3.5 Turbo or a fine-tuned open-source model.

LLaMA and Mistral are great smaller open-source models to fine-tune for document classification, service automation, or summarization. GPT-4, for example, should be saved for high-accuracy, high-value tasks that justify incurring higher costs.

Prompt engineering directly affects token consumption, as inefficient prompts will use more tokens and increase costs. Keep your prompts concise by removing unnecessary information; instead, structure your prompts into templates or bullet points to help models respond with clearer and shorter outputs.

You can also break up complex tasks into smaller and sequential prompts to reduce the total token usage.

Example:

Original prompt:

“Explain the importance of sustainability in manufacturing, including environmental, social, and governance factors.” (~20 tokens)

Optimized prompt:

“List ESG benefits of sustainable manufacturing.” (~8 tokens, ~60% reduction)

To further reduce costs, you can use caching and embedding-based retrieval methods (Retrieval-Augmented Generation, or RAG). Should the same prompt show up again, you can offer a cached response without needing another API call. 

For new queries, you can store data embeddings in databases. You can retrieve relevant embeddings before passing only the relevant context to the LLM, which minimizes prompt length and token usage.
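Here is a minimal sketch of the caching idea, with a stand-in `call_llm` function in place of a real provider client; repeated prompts are served from memory, so only the first occurrence consumes tokens.

```python
# Minimal sketch: a prompt-response cache. Identical prompts are answered from
# the cache instead of triggering another paid API call. `call_llm` is a
# placeholder for whatever client your provider offers.
import hashlib

_cache = {}

def call_llm(prompt: str) -> str:           # stand-in for a real API call
    return f"(model answer for: {prompt})"

def cached_completion(prompt: str) -> str:
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:                    # cache miss -> pay for one call
        _cache[key] = call_llm(prompt)
    return _cache[key]                       # cache hit -> zero token cost

cached_completion("What is our refund policy?")   # first call hits the API
cached_completion("What is our refund policy?")   # second call is free
print(len(_cache))                                # -> 1
```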

Lastly, you can actively monitor costs. It’s easy to inadvertently overspend when you don’t have the proper visibility into token usage and expenses. For example, you can implement dashboards to track real-time token usage by model. You can also set a spending threshold alert to avoid going over budget. Regular model efficiency and prompt evaluations can also present opportunities to downgrade models to cheaper versions.
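And as a bare-bones sketch of that threshold alerting, the snippet below accumulates estimated spend per call and warns when usage crosses 80% of a monthly budget; the prices, budget, and token counts are illustrative assumptions.

```python
# Minimal sketch: track estimated spend per model and raise a budget alert.
# Prices, the budget figure, and token counts are illustrative assumptions.
PRICES_PER_1K = {"gpt-3.5-turbo": (0.0015, 0.0020), "gpt-4": (0.03, 0.06)}
MONTHLY_BUDGET = 500.00

spend = 0.0

def record_usage(model: str, input_tokens: int, output_tokens: int) -> None:
    """Accumulate estimated spend and warn when 80% of the budget is reached."""
    global spend
    in_price, out_price = PRICES_PER_1K[model]
    spend += input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
    if spend > MONTHLY_BUDGET * 0.8:
        print(f"WARNING: at ${spend:,.2f}, over 80% of the ${MONTHLY_BUDGET:,.2f} budget")

record_usage("gpt-4", input_tokens=8_000_000, output_tokens=4_000_000)
```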

  • Start small: Default to GPT-3.5 or specialized fine-tuned models.
  • Engineer prompts carefully, ensuring concise and clear instructions.
  • Adopt caching and hybrid (RAG) methods early, especially for repeated or common tasks.
  • Implement active monitoring from day one to proactively control spend and avoid unexpected cost spikes.

The smart way to manage LLM costs

After implementing strategies like smaller task-specific models, prompt engineering, active monitoring, and caching, teams often find they need a systematic way to operationalize these approaches at scale.

The manual operation of model choices, prompts, real-time monitoring, and more can very easily become both complex and resource-intensive for businesses. This is where you’ll find the need for a cohesive layer to orchestrate your AI workflows.

Vellum streamlines iteration, experimentation, and deployment. As an alternative to manually optimizing each component, Vellum helps your teams choose the appropriate models, manage prompts, and fine-tune solutions in one integrated platform.

It’s a central hub that allows you to operationalize cost-saving strategies without increasing costs or complexity. 

Here’s how Vellum helps:

Prompt optimization

You’ll have a structured, test-driven environment to effectively refine prompts, including a side-by-side comparison across multiple models, providers, and parameters. This helps your teams identify the best prompt configurations quickly.

Vellum significantly reduces the cost of iterative experimentation and complexity by offering built-in version control. This ensures that your prompt improvements are efficient, continuous, and impactful.

There’s no need to keep your prompts on Notion, Google Sheets, or in your codebase; have them in a single place for seamless team collaboration.

Model comparison and selection

You can compare LLM models objectively by running side-by-side systematic tests with clearly defined metrics. Model evaluation across multiple providers and parameters is made simpler.

Businesses have transparent and measurable insights into performance and costs, which helps to accurately select the models with the best balance of quality and cost-effectiveness. Vellum allows you to:

  • Run multiple models side-by-side to clearly show the differences in quality, cost, and response speed.
  • Measure key metrics objectively, such as accuracy, relevance, latency, and token usage.
  • Quantify cost-effectiveness by identifying which models achieve similar or better outputs at lower costs.
  • Track experiment history, which leads to informed, data-driven decisions rather than subjective judgments.

Real-time cost tracking

Enjoy detailed and granular insights into LLM spending through tracking usage across the different models, projects, and teams. You’ll be able to precisely monitor the prompts and workflows that drive the highest token consumption and highlight inefficiencies.

This transparent visualization allows you to make smarter decisions; teams can adjust usage patterns proactively and optimize resource allocation to reduce overall AI-related expenses. You’ll have insights through intuitive dashboards and real-time analytics in one simple location.

Seamless model switching

Avoid vendor lock-in risks by choosing the most cost-effective models; Vellum gives you insights into the evolving market conditions and performance benchmarks. This flexible and interoperable platform allows you to keep evaluating and switching seamlessly between different LLM providers like Anthropic, OpenAI, and others.

Base your decision-making on real-time model accuracy, pricing data, overall value, and response latency. You won’t be tied to a single vendor’s pricing structure or performance limitations; you’ll quickly adapt to leverage the most efficient and capable models, optimizing costs as the market dynamics change. 

Final thoughts: Smarter AI spending with Vellum

The exponential increase in token costs that arise with the business scaling of LLMs can often become a significant challenge. For example, while GPT-3.5 Turbo offers cost-effective solutions for simpler tasks, GPT-4’s higher accuracy and context-awareness often come at higher expenses and complexity.

Experimentation also drives up costs; repeated fine-tuning and prompt adjustments are further compounded by vendor lock-in potential. This limits competitive pricing advantages and reduces flexibility.

Vellum comprehensively addresses these challenges, offering a centralized and efficient platform that allows you to operationalize strategic cost management:

  • Prompt optimization. Quickly refining prompts through structured, test-driven experimentation significantly cuts token usage and costs.
  • Objective model comparison. Evaluate multiple models side-by-side, making informed decisions based on cost-effectiveness, performance, and accuracy.
  • Real-time cost visibility. Get precise insights into your spending patterns, immediately highlighting inefficiencies and enabling proactive cost control.
  • Dynamic vendor selection. Easily compare and switch between vendors and models, ensuring flexibility and avoiding costly lock-ins.
  • Scalable management. Simplify complex AI workflows with built-in collaboration tools and version control, reducing operational overhead.

With Vellum, businesses can confidently navigate the complexities of LLM spending, turning potential cost burdens into strategic advantages for more thoughtful, sustainable, and scalable AI adoption.

EdTech meets edge AI: Scalable, privacy-first ecosystems

Numbers that speak louder

Think about the modern classroom. Each pupil receives a unique lesson plan courtesy of generative AI.

Every single plan is flawlessly customized and catered for – even in remote schools with unstable internet. Now consider this projection from MarketResearch: generative AI in the EdTech sector is anticipated to grow from $191 million in 2023 to $5.26 billion by 2033, a CAGR of 40.5%.

Or take the National Education Policy Center figure: school districts spent $41 million on adaptive learning for personalized education in just two years.

But here’s an astounding statistic – currently, cyberattacks on educational institutions have compromised the information of more than 2.5 million users (eSchool News).

Moreover, over 1,300 schools have been victims of cyberattacks, including data breaches, ransomware, and phishing email scams, since 2016, according to a January 2023 report by the Cybersecurity and Infrastructure Security Agency. In Sophos’ most recent survey, 80% of schools reported being targeted by a cyberattack in 2022, up from 56% in 2021.

In fact, schools have now become predominant targets for cybercriminals, according to The74. The education sector also has one of the highest rates of ransom payment: 47% of K-12 organizations admitted they paid, with recovery costs averaging $2.18 million per attack.

These numbers indicate a glaring problem: security and privacy have never been more important as EdTech continues transforming the learning experience. Robust security software exists, but keeping it manageable and economical poses deep financial challenges for schools.

Here is where Edge AI comes in: this advanced technology not only promises scalable, personalized learning experiences, but also delivers a privacy-first approach by keeping sensitive information protected through local, on-device processing rather than cloud systems. Let’s explore how merging EdTech and Edge AI can solve these nagging problems and reshape the future of education.


The innovative integration of edge AI with educational technologies

For years now, Education Technology (EdTech) has been ‘revolutionizing’ the world of learning by turning simple textbooks into complex adaptive systems that strive to meet the requirements of individual students.

This transformation is driven by adaptive learning algorithms infused with AI, which process student data and modify lessons in real time. However, one flaw exists: more traditional AI systems tend to rely heavily on cloud processing. This form of computing has drawbacks with regard to bandwidth, peak latency periods, lag in real-time responsiveness, and, even more concerning, leakage of sensitive student data.

Enter Edge AI, an AI system that resides within smartphones, laptops, smart gadgets…you name it. Whereas systems dependent on the cloud struggle with latency and privacy concerns, Edge AI can process data locally, significantly reducing that risk.

The crossroads where Edge AI meets EdTech is more than just a technological improvement: It serves as a scalability and privacy solution, two crucial components needed in education today. This is how education ecosystems stand to be revamped considerably.

Technical overview: The role of edge AI in adaptive learning algorithms 

What is edge AI?

In its most basic form, Edge AI is the placement of an AI “brain” on the very “edge” of a network – where data is produced.

As an example, instead of sending every byte of information to a faraway cloud server, algorithms execute on-device using available hardware such as microcontrollers or GPUs. A student’s tablet can evaluate quiz performance, adjust subsequent lessons, and provide feedback, all in real-time, without having to contact a centralized data hub.
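As a toy illustration of that on-device loop, the sketch below adjusts lesson difficulty from recent quiz scores entirely in local code; the levels and thresholds are invented for the example.

```python
# Minimal sketch: on-device adaptive difficulty. Everything below runs locally
# on the student's tablet, so no quiz data leaves the device. Levels and
# thresholds are illustrative.
LEVELS = ["basics", "intermediate", "advanced"]

def next_difficulty(current: str, recent_scores: list) -> str:
    """Move up after consistently strong scores, down after weak ones."""
    average = sum(recent_scores) / len(recent_scores)
    index = LEVELS.index(current)
    if average >= 0.85 and index < len(LEVELS) - 1:
        return LEVELS[index + 1]
    if average < 0.50 and index > 0:
        return LEVELS[index - 1]
    return current

# Scores from the last three quizzes, computed and stored on-device only.
print(next_difficulty("basics", [0.9, 0.85, 0.95]))  # -> intermediate
```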

Scalability with low latency

The benefits of edge AI are its speed and ability to scale. Adaptive learning requires real time feedback, like increasing the difficulty of math problems for a student who has already mastered the basics. In most cases, cloud-enabled systems tend to falter in this area due to latency as data is sent and received.

Edge AI, on the other hand, does not have this issue, as processing is done locally, meaning feedback is instantaneous. According to a 2023 survey in ACM Computing Surveys, edge computing can cut latency by as much as 80% compared with the cloud, making it best suited for time-sensitive EdTech applications.

Take AI-enabled tutoring platforms, for instance: they can not only analyze a learner’s mastery of algebra, but also switch to geometry mid-session without buffering. This kind of immediacy enhances engagement, as learners remain immersed in the flow of the moment rather than waiting or idling for the next task.

Energy efficiency  

Edge AI is not only swift, but also efficient. It reduces energy use by cutting down data transfers to the cloud. Edge-cloud systems as outlined in ScienceDirect demonstrate local processing can reduce energy usage by 30-50%. This is beneficial for the battery life of devices and the emissions from data centers. In EdTech, this translates to affordable and eco-friendly tools that do not burden school budgets.


Data protection and a privacy-focused strategy

Student data protection

With GDPR, FERPA, and CCPA intensely scrutinizing how student data is handled, that data has become a liability. Edge AI keeps it on-device, eliminating the need to transmit sensitive information such as a child’s reading preferences or test scores over the internet.

This, of course, dovetails with privacy regulations: Learnosity reported that GDPR fines exceeded €1 billion in 2023 alone, demonstrating the regulators’ no-tolerance policy regarding data mismanagement.  

Reduction of breach opportunities

Hackers have a field day with cloud servers. Edge AI flips the script; there is no single centralized honeypot to crack. On-device processing reduces the opportunity for exposure. According to Parachute, in Q1 2024, the education sector experienced an average of 2,507 cyber attacks per week, indicating a significant rise in targeted attacks on educational institutions.

Ethical issues

There are more issues than compliance when it comes to Edge AI in education technology: surveillance feels invasive, and data is easily exploited. With centralized AI monetizing every click, it’s understandable to feel like Big Brother is tracking you. Decentralized Edge AI gives users back control. That changes everything. Now, it’s education, not espionage.

Examples of privacy-focused EdTech

The mobile app from Duolingo incorporates some local processing for various language exercises and minimizes reliance on the cloud. Meanwhile, startups like Century Tech use Edge AI to tailor the learning experience while also branding themselves as GDPR-compliant, earning accolades from privacy-sensitive parents.

Case study: ASU’s secure federated learning platform

Together with ATTO Research, Arizona State University is building an edge device secure federated learning platform with a focus on privacy (ASU AI Edge Project).

Under the guidance of Assistant Professor Hokeun Kim, the project develops middleware for edge developers – facilitating collaborative learning without sharing raw data amongst devices. “Historically, edge devices were fairly secure,” says Kim, a faculty member in the School of Computing and Augmented Intelligence, part of the Fulton Schools.

“The devices were performing basic functions and transmitting information to data centers where most of the real work was being done. These centers are managed by experts who provide multiple layers of data protection.”

Use case scenarios range from medical education to smart campus initiatives, improving scalability and privacy. The outcomes are yet to be seen, but the emphasis on secure, on-device AI addresses a primary concern for EdTech, especially in remote learning situations.

Limitations and bias: A multi-faceted spectrum

There are some flaws with Edge AI. Devices such as inexpensive tablets have hardware limits, which pose a bottleneck for complicated models; imagine neural networks needing more power than a microcontroller can provide. As the 2025 Edge AI survey on arXiv mentions, developers have to optimize algorithms, pruning and quantizing them to fit hardware limits.

Bias is problematic regardless of the form of AI being used: if there’s a skew in the datasets used for training, all outcomes will be biased. This can exacerbate the education gap.

There’s also a need for transparency: algorithms need to be made available for examination, something EdTech companies are obligated to provide. And while Edge AI improves privacy, it increases the demand for strong security on the device itself: take over the tablet, and you take over its data.


Collaboration between AI tech giants and EdTech

Open-source edge AI frameworks  

The future is based on teamwork. AI giants like Google and NVIDIA can partner with EdTech players such as Pearson or Coursera to develop open-source Edge AI frameworks. These toolkits would allow smaller companies to develop privacy-first, scalable solutions without reinventing the wheel. There is already a glimpse of this in TensorFlow Lite’s focus on the edge; imagine a curriculum-specific version.

Lowering the barriers  

Cooperative effort lowers costs and technical barriers. Custom-tailored AI systems are financially unfeasible for rural school districts or lean EdTech startups; open frameworks level the playing field. This enables innovation, as per Forbes’ reporting on technology inclusivity.

Future-proofing education

AI tech companies have yet to focus on secure, scalable tools for EdTech – for example, plug-and-play adaptive learning systems that automatically comply with GDPR and FERPA. The suggestion? Annual AI-EdTech joint conferences or interdisciplinary laboratories that combine AI brawn and educational expertise for innovative development.

Final thoughts 

The combination of Edge AI and EdTech seems to create the perfect learning environment. By merging scalability and privacy, they’re creating systems for learning that are quick, equitable, and ready for the future.

From distant communities to expansive educational institutions, this unification aims to deliver personalized, safeguarded, and robust educational experiences. In reality, the numbers speak for themselves: With climbing adoption levels and growing concerns over cyberattacks, Edge AI is not an option – it is a necessity for future schools. Let’s embrace the change.


LLMOps in action: Streamlining the path from prototype to production

AIAInow is your chance to stream exclusive talks and presentations from our previous events, hosted by AI experts and industry leaders.

It’s a unique opportunity to watch the most sought-after AI content – ordinarily reserved for AIAI Pro members. Each stream delves deep into a key AI topic, industry trend, or case study. Simply sign up to watch any of our upcoming live sessions. 

🎥 Access exclusive talks and presentations
✅ Develop your understanding of key topics and trends
🗣 Hear from experienced AI leaders
👨‍💻 Enjoy regular in-depth sessions


Date: April 23, 2025

Time: 6pm GMT

Location: Online

We’ll explore how data scientists, engineers, and end-users can work together seamlessly to unlock the full potential of LLMs, ensuring effective, confident deployment across use cases.


Key points to be covered:

  • Understanding the LLMOps lifecycle: An overview of the LLMOps lifecycle from model design and development to deployment, monitoring, and refinement.
  • Optimising collaboration: Practical approaches to accelerate collaboration among data scientists, engineers, and users.
  • The what, why, and how of LLMOps: A foundational understanding of LLMOps, why it’s critical for organisations, and how to build and scale efficient operations.
  • Real-world scenarios: Case studies showcasing success with LLM applications.
  • Challenges in LLMOps and practical solutions: Addressing common obstacles in the LLMOps lifecycle.

This presentation is perfect for AI practitioners, developers, and team leaders looking to advance their knowledge of LLMOps.


Meet the speaker:

Dr. Dmitry Kazhdan, PhD, CTO & Co-Founder, Tenyks

Co-Founder & CTO at Tenyks. Building a best-in-class Visual Data Management & Analytics platform, powered by Generative AI.

LLMOps in action: Streamlining the path from prototype to production


GenAI creation: Building for cross-platform wearable AI and mobile experiences

Yiqi Zhao, Product Design Lead, Meta Reality Labs at Meta gave this talk at the Generative AI Summit in Washington DC, 2025.

I’m Yiqi, the design lead for Meta Reality Labs, the organization that makes many AR/VR devices, like the Ray-Ban Meta glasses and the Meta Quest series.

Today, I bring a video along with a topic that might not be something you’ve thought about deeply before. But I want you to consider this—can you be a creator?

Can you be someone who makes content and actually makes money from it? Can you create fun, engaging experiences within the new developer ecosystem that’s emerging with devices like the Meta Quest, the Meta Ray-Ban glasses, and the incredible capabilities of AI? Would this be possible?

I want to talk about how you can unlock your creative power and, more importantly, how you can leverage AI to be fully ready for this new platform and the opportunities that come with it.

The rise of immersive content and Meta Horizon

From the video, you might have noticed the rich, detailed 3D immersive content. This isn’t something that’s coming in the future—it’s happening right now on our platform.

We recently rebranded our platform under the Meta Horizon name. Essentially, everything is becoming Horizon.

Meta Horizon is more than just a name change—it represents our vision of a platform that connects people in ways that are richer, more interactive, and more immersive. We want people to socialize, engage, and find their communities in a way that feels natural, just as they do in the real world.

Unlocking your creative power

We are seeing a shift in devices from traditional screens—laptops, phones, tablets—to mixed-reality experiences. The shift is massive.

If you look at traditional devices, they have always had limitations. They are separate from us; they require us to interact with them from a distance. But mixed reality devices, like VR headsets and AR glasses, are different.

LLMOps in action: From prototype to production

If you’ve ever built a GenAI application, you know the drill—your prototype looks amazing in a demo, but when it’s time to go live? Different story.

In this exclusive video, Samin Alnajafi, Success Machine Learning Engineer at Weights & Biases, unpacks why LLMOps is the missing link between promising GenAI experiments and real-world deployment.

Here’s what you’ll learn:

  • Why so many GenAI projects stall before reaching production
  • How to measure and optimize performance using LLMOps best practices
  • Key components of a scalable retrieval-augmented generation (RAG) pipeline
  • Practical examples and a live demo of Weights & Biases tools

Don’t let your GenAI project get stuck in limbo. 

Log in to your Insider dashboard and watch now.

P.S. And if you have a few minutes to spare today, why not share your LLMOps expertise? We know how busy you are, so thank you in advance!

Share the tools you use, the challenges you have, and more, and help define the LLMOps landscape.

Whenever you’re ready, here are four ways we can help you grow your AI career:

  1. Become a Pro+ member. Want to be an expert in AI? Join Pro+ for exclusive access to insights from industry leaders at companies like Meta and Google, one complimentary ticket to an in-person Summit of your choice, experienced mentors, AI advantage workshops, and more.
  2. Become a Pro member. Want to elevate your AI expertise? Join Pro for exclusive access to expert insights from leaders at top companies like HuggingFace and Microsoft, member-only articles and frameworks, an extensive video library, networking opportunities, and more. 
  3. AI webinar. Want to unlock smarter, faster, and more scalable incident management? Join us on April 25 for a live session on how AI transforms incident management to accelerate investigations, surface relevant insights, and dynamically scale workflows. Register here.
  4. Exclusive tech leader dinner. Join us in NYC on March 19 for an insightful conversation around the trends, challenges, and opportunities related to harnessing and maximizing Generative AI for the enterprise.