If you’ve been in enterprise AI sales calls lately, you may have heard the term AI Gateway. The first “Gen AI” gateways in the enterprise emerged around late Q1 2023 and are increasingly being adopted as of Q4 2023. Here are some publicly announced ones (still early):
AI Gateway for MLflow (07/25/2023)
Cloudflare AI Gateway (09/27/2023)
In this post, I will go over:
a brief history (year to date) of Gen AI adoption at the largest enterprises (Fortune 500), in order to motivate the Generative AI Gateway (“AI Gateway”).
what AI Gateway is, its key benefits, and intended customer profiles
a potential opportunity for startups to sell AI Gateway as a managed service or solution, from both offensive and defensive strategy angles.
Setting the stage
It’s been almost a year since the release of ChatGPT, but only a few industry incumbents (across healthcare, banking, industrials, retail, etc.) have released AI-powered products, especially end-user-facing ones for B2B or consumer markets.
While this lull can be misconstrued as inaction or slowness, the largest enterprises (3,000+ employees) are taking deliberate steps to open the floodgates of AI-powered products, starting next year.
One such step for enterprises is building an internal AI gateway - an AI model serving layer with built-in enterprise-grade governance, observability, reliability, and resilience features. Think API Gateway, but upgraded for the Generative AI era.
Understanding the AI gateway is important for anyone selling to enterprises, because it will become either:
a beachhead / entry point for AI startups to distribute their AI services within enterprise
potentially a standalone product opportunity. Considering how many API-gateway-as-a-service and data governance startups were founded in the past, the AI gateway could be the Gen AI equivalent construct.
The common patterns of enterprise AI adoption (in 2023)
Before describing what an AI gateway is in full, let’s understand how enterprises have shaped their AI strategy year to date. Enterprise AI adoption came in three main stages this year, and each stage has stuck as an adoption pattern. The stages are:
Do POCs with proprietary models e.g. OpenAI
Adopt a “Portfolio approach” to LLMs / FMs - including both OSS and proprietary
(now) Build AI gateway and abstractions over full lifecycle of LLM / FM consumption
Quick and important caveat: the above stages apply to AI-enthusiast enterprises (the majority of the Fortune 1000) that are looking to build AI capabilities in-house. A sizable portion, however, are simply willing to wait, without committing resources to building any custom machinery around AI. These enterprises lean toward “buy” over “build” and are looking for solutions provided by vendors or consultancies (e.g. Deloitte).
Stage / Pattern 1 - OpenAI / Proprietary POCs
In Q1 2023, the very early adopters of AI unblocked POCs (proofs of concept) by running OpenAI APIs in sandboxed environments. Sandboxes and “clean rooms” allowed enterprises to ensure compliance with regulations.
This stage gave enterprises an idea of what’s possible with AI, but many ran into reliability, cost, technical, privacy, and/or AI governance issues that prevented the general release of AI prototypes. Note, some companies like Expedia that had dedicated support from OpenAI (perhaps due to having Sam Altman on the board) were able to productionize AI features. Also, many startups still remain in Stage 1, as they don’t feel the need for more complicated enterprise features.
But most sophisticated enterprises were unable to bring a pure OpenAI / Azure-powered app to market due to 1) the lack of private VPC support, 2) gaps in private finetuning features, 3) data privacy concerns, or 4) simply cloud vendor mismatch. Some of these concerns were eventually addressed as of October 2023, but the delay in launching Azure APIs with these privacy features perhaps gave other vendors (e.g. GCP, AWS) strong messaging with which to compete against Microsoft.
The real legacy of Stage 1 was not production AI features, but convincing enterprises of the need to prioritize their AI roadmaps, identify internal use cases, and conduct feasibility studies.
Stage / Pattern 2 - “Portfolio approach” to LLMs / FMs - both OSS and proprietary
As the narrative that OpenAI APIs may not be “enterprise-ready” spread through the industry, many companies simply backed off and started exploring the OSS model ecosystem. The release of Meta’s Llama2 solidified the narrative of needing a “mixture of LLMs / FMs” in the enterprise, as well as the benefits of finetuning models (which OSS models are better suited for).
Unfortunately, non-FAANG enterprises tend to be GPU-poor and ML-talent-constrained, which limits their ability to finetune and serve LLMs on their own infrastructure at scale. Many ended up questioning (again) whether OSS LLMs are ever needed, especially since serving Llama2 on your own GPUs costs more when there aren’t enough requests (all-in costs are especially high once personnel is factored in).
The real legacy of Stage 2 was that enterprises realized two things:
the gaps in productionizing LLM / AI apps, such as reliability and AI governance
the need for a platform to seamlessly support a mix of LLMs (not just proprietary, but also OSS) - not just at the SDK / library level (like Langchain) but at greater scale
These needs led to Stage 3 of AI adoption for large enterprises: building an AI gateway and abstractions over the full lifecycle of LLM / FM consumption.
Stage / Pattern 3 - AI gateway and the hub-and-spoke model
Enterprises foresaw early the need for abstractions and controls around serving LLMs, both for internal and external customers. The question was mainly a matter of “build vs buy”, and waiting for the best practices of LLMOps to emerge. Also, many questioned whether LLM / FM Ops could be folded into existing MLOps and serving stack, or whether a greenfield project was needed.
Over time, large enterprises are starting to agree on the following realizations and pain points, which support the notion of building a separate abstraction just for LLMs / FMs:
Insight #1: the need for abstractions over multiple model providers, so the org can use both proprietary and OSS LLMs, rotate them easily in production, and avoid “vendor lock-in” (see the sketch after this list). That’s because 1) models are getting better and the landscape is changing quickly, 2) there may be multiple winners of the LLM war, and 3) OSS models will stay relevant.
Insight #2: the need for access control over how internal users (and services) consume foundational models, to prevent “bad behaviors” and stay compliant - and the need for some team to own building these controls, instead of duplicating the effort everywhere.
Insight #3: LLM calls need a standard set of “traditional” API Gateway features like observability, metering, access control, usage limits, etc.
Insight #4: no matter the industry, reliability and governance issues need to be addressed before “going big” on Gen AI app launches. Companies are now actively working on 1) setting standards for AI safety and governance, then 2) building programmatic controls to enforce those policies.
Insight #5: there are Gen AI-specific issues like prompt security, semantic caching, and semantic response filtering that need to be incorporated into API gateways.
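To make Insights #1, #3, and #5 concrete, below is a minimal sketch of a gateway request path. Everything in it is hypothetical and invented for illustration - the provider table, the naive injection check, the exact-match stand-in for a semantic cache - and a real gateway would back each hook with production-grade machinery.

```python
# Hypothetical sketch of an AI gateway request path; all names are
# invented for illustration, not taken from any real library.
import hashlib
from dataclasses import dataclass

@dataclass
class GatewayRequest:
    team_id: str   # who is calling, for access control and metering
    model_id: str  # which FM / LLM the caller wants
    prompt: str

# Insight #1: one abstraction over many providers. Swapping the backend
# behind a model_id becomes a gateway config change, not an app change.
PROVIDERS = {
    "gpt-4": lambda p: f"[proprietary model reply to: {p}]",     # stand-in for a vendor API
    "llama2-70b": lambda p: f"[self-hosted OSS reply to: {p}]",  # stand-in for your own GPUs
}

# Insight #3: traditional gateway features - access control and usage limits.
ALLOWED = {("growth-team", "gpt-4"), ("growth-team", "llama2-70b")}
USAGE_CAP = 1000  # requests per team per window (hypothetical)
_usage: dict[str, int] = {}

# Insight #5: Gen AI-specific hooks. A real semantic cache would match on
# embedding similarity; the hash below is an exact-match stand-in.
_cache: dict[str, str] = {}

def handle(req: GatewayRequest) -> str:
    if (req.team_id, req.model_id) not in ALLOWED:
        raise PermissionError(f"{req.team_id} may not call {req.model_id}")
    _usage[req.team_id] = _usage.get(req.team_id, 0) + 1
    if _usage[req.team_id] > USAGE_CAP:
        raise RuntimeError(f"{req.team_id} exceeded its usage cap")
    if "ignore previous instructions" in req.prompt.lower():  # naive injection check
        raise ValueError("possible prompt injection")
    key = hashlib.sha256(f"{req.model_id}:{req.prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = PROVIDERS[req.model_id](req.prompt)  # route to the backing provider
        print(f"audit: team={req.team_id} model={req.model_id}")  # centralized audit trail
    return _cache[key]

print(handle(GatewayRequest("growth-team", "gpt-4", "Summarize our churn data.")))
```

The point of the sketch is the shape, not the stubs: every LLM call in the org passes through one choke point where the controls live once.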
Thus, enterprises started working on what we loosely call AI gateways, which theoretically allow them to move fast without breaking things. This post introduces the term to refer collectively to these efforts.
What is AI Gateway?
An AI gateway is the API gateway reimagined for the generative AI era. Think of the AI gateway as an API gateway enhanced with governance and control features - all in one - so that large organizations can confidently roll out LLMs across thousands of employees and millions of customers. For a more detailed discussion of AI gateway architecture, I’ll link to a blog post I wrote for AWS here.
At a high level, AI gateway combines three components:
#1: the features of traditional API gateways (observability, metering, audit, access control, etc).
This layer also routes requests to the appropriate model provider’s API or infra (such as OpenAI, VertexAI, your own GPUs, HuggingFace endpoints, etc).
The API gateway knows how to route requests based on metadata embedded in the request context (such as `model_id`), looking up that `model_id` in a model endpoint registry that tracks the IP / DNS of various model serving APIs in real time (a sketch of this registry follows after this list).
#2: (new) FM / LLM-motivated features like prompt-injection protection, AI policy enforcement, LLM-enabled evaluation, model guardrails, semantic caching, and so on.
Notably, these checks and this monitoring happen at runtime and are logged for audit trail purposes.
#3: (new) a workflow UI for AI governance people to register and approve new foundational models, as well as a private model playground.
Before a model becomes available internally, compliance people need to ensure its EULA and capabilities are compliant, etc.
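As a rough sketch of how components #1 and #3 meet, here is a hypothetical model endpoint registry (all names invented for illustration). A production version might sit on a service-discovery or config store refreshed by health checks; the point is that routing resolves a `model_id` only to endpoints that governance has approved.

```python
# Hypothetical model endpoint registry, for illustration only.
from dataclasses import dataclass

@dataclass
class Endpoint:
    model_id: str
    base_url: str   # the IP / DNS the registry tracks in (near) real time
    provider: str   # e.g. "openai", "vertexai", "self-hosted"
    approved: bool  # flipped by the governance workflow in component #3

class ModelEndpointRegistry:
    def __init__(self) -> None:
        self._endpoints: dict[str, Endpoint] = {}

    def register(self, ep: Endpoint) -> None:
        # In the component #3 workflow, this runs only after compliance
        # review (EULA, capabilities, etc.) signs off on the model.
        self._endpoints[ep.model_id] = ep

    def resolve(self, model_id: str) -> Endpoint:
        ep = self._endpoints.get(model_id)
        if ep is None or not ep.approved:
            raise LookupError(f"no approved endpoint for model_id={model_id}")
        return ep  # the gateway forwards the request to ep.base_url

registry = ModelEndpointRegistry()
registry.register(Endpoint("llama2-70b", "http://10.0.3.12:8080", "self-hosted", approved=True))
print(registry.resolve("llama2-70b").base_url)
```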
From the perspective of enterprises, there’s a long list of benefits of centralizing these functions:
Easier governance: preventing “non-compliant” behaviors is easier when there’s only one entry point to FMs / LLMs within your org, as opposed to N of them. Auditing is easier because logs are centralized.
Move fast while remaining compliant: by standardizing controls and not duplicating the effort of building them.
Ease of switching among FMs / LLMs: having a hardened abstraction layer over model providers at the service level (not at the SDK / app layer, like Langchain) allows the org to keep up with the latest LLM trends.
Avoiding putting all eggs in one basket: being able to switch among LLMs helps companies keep control, which is especially important in multi-cloud environments.
Resilience: Spreading bets over multiple LLM endpoints improves resilience in case one model goes down, etc.
Cost savings and cost control: the metering and alarm features of traditional API gateways can help with throttling requests and keeping teams under budget for LLM use (sketched below).
and more.
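On the cost point specifically, here is a minimal sketch of per-team budget metering. The dollar rates, budget figures, and names are all made up for illustration; a real gateway would meter tokens reported by each provider and emit alarms to an observability stack instead of printing.

```python
# Illustrative per-team budget meter; all rates and budgets are hypothetical.
COST_PER_1K_TOKENS = {"gpt-4": 0.06, "llama2-70b": 0.002}  # made-up $ rates
MONTHLY_BUDGET_USD = {"growth-team": 500.0}
_spend: dict[str, float] = {}

def meter(team_id: str, model_id: str, tokens_used: int) -> None:
    cost = tokens_used / 1000 * COST_PER_1K_TOKENS[model_id]
    _spend[team_id] = _spend.get(team_id, 0.0) + cost
    budget = MONTHLY_BUDGET_USD[team_id]
    if _spend[team_id] > budget:
        raise RuntimeError(f"{team_id} is over its ${budget:.0f}/month budget; throttle")
    if _spend[team_id] > 0.8 * budget:
        print(f"alarm: {team_id} at {100 * _spend[team_id] / budget:.0f}% of budget")

meter("growth-team", "gpt-4", tokens_used=250_000)  # about $15 of hypothetical spend
```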
Note, there are some gotchas when it comes to building AI gateways. One common mistake I see enterprises make is spending far too long building the internal AI gateway. This is potentially disastrous, because it prevents LOB (line of business) teams from experimenting with LLMs. In fact, building the AI gateway can become an excuse to move slowly on generative AI. I have seen companies that had not started a single POC nine months after the release of ChatGPT because they were blocked on building internal tooling.
Another gotcha is that change management now involves more stakeholders, since every consumer of LLMs / FMs within the company depends on the AI Gateway. If the team behind the AI gateway moves too slowly, or if a model takes too long to be approved, AI adoption slows across the org.
At this point, you may be wondering whether enterprises should build or buy AI gateways. The answer depends entirely on the org’s priorities and capabilities. Generally speaking, build if you want more control and have the resources to operate this service. Otherwise, lean on buying turnkey solutions.
The AI Gateway Opportunity for Startups
With enterprises trending toward adopting AI gateways and centralizing LLMOps and governance functions, there are plenty of opportunities and lessons for startups competing in the LLM / FM picks-and-shovels space.
Offering AI gateway is perhaps a strategic must for non-Big Cloud companies competing in AI and Data (MongoDB, Databricks, PlanetScale, Cloudflare, etc), because:
Offensive move: the AI Gateway is the entry point to LLM consumption within orgs, and thus an attractive beachhead from which to drive adoption of related services. In addition, given the recent multi-cloud movement, many enterprises are resistant to using a big cloud provider’s AI gateway solution.
Defensive move: on the flip side, ceding the AI gateway stack to others weakens a company’s foothold in the AI stack.
And for startups, there are mainly two ways to compete:
You can either build plugins that can be easily used by enterprise AI gateways. For example, I can imagine enterprises buying a neat solution for managing and enforcing AI governance policies.
Or you can sell an AI gateway solution directly, which is probably harder. The space is still evolving, so I imagine there are plenty of opportunities for startups aiming to be the Kong or Mulesoft of the LLM era.
In either case, startups shouldn’t assume they are operating in a vacuum; they should ask how their solution will plausibly fit with AI gateways at enterprises.
Note, it’s not yet clear whether the AI Gateway is a revenue opportunity or a pure defensibility play. We may even end up in an end state where every major vendor offers an AI gateway as a table-stakes feature, driving margins down. This is why I foresee a few winning OSS solutions in this domain as well.
But for large pre-IPO startups such as Databricks, which already control a popular OSS project like MLflow, it makes sense to offer a managed FM / LLM lifecycle management solution.