Recently, Adept AI announced they are being acquired by Amazon, which solidified a somewhat controversial opinion I’ve held for a while - that AI infra startups are a tarpit idea, especially as a “venture-scale” business.
The term “tarpit idea” refers to startup ideas that sound reasonable on the surface but fail to hold up when put to the test against reality or rigorous thought.
I believe most AI infra startups will also fall into this category, where AI infra refers to the “building blocks” companies between the cloud layer and the application layer - RAG services, finetuning infrastructure, text processing services, TTS APIs, vector databases, etc. I won’t name specific names, but just think of any AI infra startup that raised sizable seed rounds off of open source or social media momentum.
I also believe many founders agree with this viewpoint, which explains the sales of Adept (to Amazon), Rockset (to OpenAI), and InflectionAI (to Microsoft), as well as the likely upcoming acquisitions of Stability (if it happens), CharacterAI, etc. Every incumbent is looking at M&A to paint an “end-to-end AI platform” story. Only a lucky few will get bought.
So why is selling AI infrastructure as a startup a tarpit idea? On paper, it’s perfectly reasonable to sell picks and shovels amid the proliferation of AI startups and enterprises building Gen AI features. After all, there are over 30K “.ai” domains registered every month.
In a nutshell, the new AI infra startups will struggle to succeed because they lack meaningful differentiation and the capital needed to crack the enterprise segment. It’s not the startups’ fault - the real problem is competitive dynamics. There are simply too many entities offering the same table-stakes features within 1-3 months of each other, which creates a collective tarpit dynamic where only the incumbents can keep swimming.
The argument goes:
For AI infra startups to be “venture scale”, they will eventually need to win over enterprise customers. No question. That requires the startups to have some sustainable edge that separates their products from the incumbents’ (GCP, AWS, as well as the likes of Vercel, Databricks, Datadog, etc).
Unfortunately, most cutting-edge innovation comes from either the incumbents or the research / OSS community - and incumbents are better positioned to commercialize those innovations because they have more usage data than startups, as well as existing enterprise relationships.
To add insult to injury, any good idea that originates from a startup gets benchmarked and copied quickly. For example, I was quite surprised by how quickly Databricks and Datadog caught up to the leading LLMOps products from the startup world (e.g. Arize AI).
Furthermore, the OSS community can’t help but create open-source versions of other AI infra startups’ products - perhaps a testament to how easy it has become to write software.
Thus, startups struggle to maintain a sustainable lead over the incumbents to buy them time to win enterprise contracts.
And enterprise customers are incentivized to “hold off” on onboarding new vendors, because vendor products depreciate quickly as the AI landscape shifts every few months.
This ultimately lengthens sales cycles, and increases churn, which hurts startups more than the incumbents.
There are also some other dynamics at play (to be discussed in the next section) - but essentially the AI infrastructure space becomes a grind that favors players with the longest runways.
My intention here is not to doom-post, but to highlight some real challenges, which I’m happy to be wrong on (DM me if you disagree). Also, I will end by offering some advice to AI infra startups.
To clarify, by “AI infra startup”, I’m referring to “venture scale” AI infrastructure startups. I’m sure founders can create essentially system integration agencies targeting SMB or mid market, and call themselves an AI infra company. But that’s a completely different business with a much smaller upside.
Other factors that create a tarpit
Three other major forces are worsening the competitive environment:
Builders are now conditioned to “demand” composability, i.e. making it easy to swap your product out for a competitor’s. This is great for application-layer companies, but not for infrastructure companies. Developers can rip out Langchain in favor of Llamaindex, or swap OpenAI models for Claude 3.5 through AWS Bedrock. Every layer of the LLM training and inference stack has 10+ viable solutions, so it becomes difficult to create any kind of lock-in.
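A minimal sketch of why composability erodes lock-in: when application code targets a thin interface, swapping one vendor for another is a one-line change. The adapter classes and their canned responses below are illustrative placeholders, not real client calls to OpenAI or Bedrock.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Any provider that can complete a prompt."""
    def complete(self, prompt: str) -> str: ...


class OpenAIAdapter:
    # placeholder for a real OpenAI client call
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"


class BedrockClaudeAdapter:
    # placeholder for a real Claude-via-Bedrock client call
    def complete(self, prompt: str) -> str:
        return f"[claude-via-bedrock] {prompt}"


def answer(model: ChatModel, prompt: str) -> str:
    # Application code depends only on the interface, so the
    # vendor behind it can be replaced without touching this code.
    return model.complete(prompt)


print(answer(OpenAIAdapter(), "summarize this doc"))
print(answer(BedrockClaudeAdapter(), "summarize this doc"))
```

Because every serious framework now encourages this adapter pattern, an infra vendor’s product sits behind an interface its customers deliberately keep vendor-neutral.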
Plummeting inference costs also play a role. COGS are dropping fast, so AI infra players must constantly price-match the incumbents, who enjoy the biggest economies of scale. Models and code have little perceived differentiation, so consumption flows to the lowest-cost providers - the incumbents.
Incumbents all seem to share the same business strategy of building an “end-to-end AI platform”. Databricks is getting into AI model training and business intelligence, competing with AWS SageMaker and Tableau. GitHub Workspaces is getting into AI-powered security reviews, etc.
Everyone’s default product strategy is to own all workloads upstream and downstream of their core product, which unintentionally makes startups’ lives harder, since it’s difficult to compete as a point solution.
Why Pivoting to Vertical Software or Application Layer Is Not the Silver Bullet
With all these challenges, some AI infra startups have chosen to go vertical or move to the application layer. For example, I have been tracking a “Business Intelligence with Natural Language” startup since late 2022 that has already pivoted twice, from:
a general purpose “chat with data” platform, to
“chat with business intelligence data” platform, to
“chat with financial data” platform.
The AI infra darlings LlamaIndex and Langchain also took this path of focus with their enterprise-oriented products. LlamaIndex is focusing on managed document parsing / OCR, whereas Langchain is focusing on LLMOps and agent-building solutions. My guess is that both are working on narrowing their focus even further, since even a managed document parsing service is a huge scope for a seed-stage startup, given that Google and AWS already offer document text-extraction services. It’s not easy.
Narrowing the scope and going vertical is a typical response for AI infra startups - but I argue these pivots rarely work out and create a new set of problems. Most importantly, they underestimate how much deep domain expertise matters once you go vertical - expertise many AI infra founders lack, and which is time-consuming to accumulate. Also, your product may need heavy customization for the vertical’s unique needs, which means lower margins.
Not to mention, these application-layer ecosystems have even fiercer competition (e.g. VCs’ LegalTech ecosystem maps ran out of space for new logos a long time ago). You face not just other AI startups but also legacy software companies. Pivoting to a vertical does not make your competitors disappear - you simply inherit new ones who were there before you. For example, the legal tech industry has existed for ages, and many Legal AI companies now compete with legacy legal tech providers plus system integrators.
Advice for AI infra startups
So what’s the solution for AI infra startups? Should we all hope to be acquihired, or is it possible for startups to also stay independent for longer and find product market fit?
Here’s a somewhat anticlimactic answer: the solution for startups goes back to the fundamentals - think deeply about how to be different from the incumbents. Here are four ways to iterate from here:
Narrow down the scope even further: focus on a very tiny segment of enterprise customers, as opposed to serving all customers. Don’t build all the integrations. Be a managed RAG service for customers using Salesforce with on-prem VMWare, as opposed to a general purpose RAG service. Startups don’t have the resources to solve for every environment, at least initially.
Focus on just one workload: startups shouldn’t try to solve for too many workloads. Do one thing really well. Don’t try to be a platform for finetuning any LLM - there’s already too many of those. Instead, try to be the best platform for finetuning Tagalog models. The catch: the TAM might be too small.
Raise more VC money than you think you need: long runways are non-negotiable. It can take a while for enterprises to be receptive to buying startup AI infra solutions, if ever. Be prepared for the worst case scenario.
Or, don’t raise any VC money at all: raising VC money all but forces you to orient your business strategy around selling to the enterprise - which might not be something you can or want to do. Staying lean preserves the flexibility to work on more interesting and promising problems as they arise, given how quickly the AI landscape changes.
Lastly, AI startups should be open to being acquired by a larger player, even if it’s not a prestigious destination like OpenAI or Google. My view is that the M&A landscape for the AI infrastructure sector will get worse, not better, over time.
The acquisition market will become more “efficient” as winners and losers emerge and as workloads and enterprise needs become more clearly defined. So, to sell your startup at an “attractive” valuation, market it before the dust settles, while the market is still inefficient. Don’t wait another 18 months to shop your startup - by then, all the AI infra startups will be running out of runway at the same time.