Apple’s WWDC 2024 started today. In this post, I quickly recap my thoughts on what this all means for Apple’s AI strategy in simple terms, and why you should care.
In a nutshell, what I took away is the following:
Apple clearly knows its stuff when it comes to AI - contrary to what many doubters proclaimed. It just announced the best on-device model on the market, as well as its own data centers - built by Apple from the ground up - that can run Apple-scale inference on Apple's own chips. Apple is already a vertically integrated AI company, and deserves a higher valuation.
Speaking of inference, Apple says “F you” to Nvidia. As they should. Apple’s success here may be a mildly bearish case for Nvidia, especially if Apple also figures out how to use Apple silicon for training.
The partnership isn’t that great for OpenAI - disappointing, even, if you are an OpenAI bull.
Indie app publishers are screwed.
Lastly, Siri may bring another “ChatGPT moment” for AI.
But let’s quickly recap how Apple classifies its AI workloads, which fall into three buckets: on-device, private cloud compute, and 3rd-party model inference (explained below). Note that it’s Apple’s OS that automatically decides whether to run AI locally or remotely - except for ChatGPT, which requires the user to explicitly opt in. The three buckets:
on-device LLM inference: a small, low-latency model (~3B parameters) will be included in future iOS versions, and it will be able to understand user commands, see the current screen, and take actions in apps (see the sketch after this list). It can handle simple tasks like summarization, as well as power the “AI agent” features of Siri - e.g. handling commands that require opening and coordinating multiple apps, like “Siri, call an Uber to the nearest Costco.” Most importantly, the model runs on Apple silicon (A17 Pro and M-series chips).
private cloud compute: the on-device LLM may decide to offload certain complex tasks to more powerful models hosted in Apple’s data centers (what Apple calls “Private Cloud Compute”). These data centers also run entirely on Apple’s own chips, the data sent back and forth is fully encrypted, and the servers themselves are built by Apple from the ground up. In other words, Apple has vertically integrated everything required to run AI both on-device and inside its data centers.
3rd-party model inference: users will also have the option to use OpenAI’s ChatGPT directly from Siri or certain iOS apps. Note that this is not the same as using ChatGPT as a replacement for Siri - which is what many thought the OpenAI partnership meant. Rather, ChatGPT is offered as an alternative to Apple’s models in specific situations (e.g. the user is about to revise an email, and ChatGPT’s response is offered as an option).
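As a concrete (and hedged) illustration of what “take actions in apps” looks like from a developer’s perspective: apps expose their actions to the system through Apple’s App Intents framework, which is what Siri’s new app-control capabilities build on. Below is a minimal sketch; the OrderRideIntent name, the destination parameter, and the dialog are hypothetical placeholders, not anything Apple (or Uber) actually ships.

```swift
import AppIntents

// Hypothetical intent exposing a "book a ride" action to the system,
// so Siri / the on-device model can invoke it without the user opening
// the app. Names and logic are illustrative only.
struct OrderRideIntent: AppIntent {
    static var title: LocalizedStringResource = "Order a Ride"

    @Parameter(title: "Destination")
    var destination: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // A real app would call its own booking service here.
        return .result(dialog: "Requesting a ride to \(destination)…")
    }
}
```

The interesting shift is who calls this: today it is mostly the Shortcuts app or a user-configured automation, whereas the pitch at WWDC is that Apple’s on-device model decides when an intent like this should fire.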
So in a nutshell, Apple will still power most of the AI features, and “bring in” ChatGPT on an as-needed basis. This arrangement is perfect for Apple for many reasons, and gives Apple a ton of optionality going forward:
First, Apple gets to maintain its vertical integration for both on-device and server-side AI. This allows Apple, control freaks that they are, to control everything - especially GPU/semiconductor costs as well as privacy.
Unlike $META, Apple is not beholden to Nvidia or to market forces to determine how much its inference infrastructure will cost in the future.
This allows Apple to save a lot of money upfront, but also to keep iterating on Apple silicon, and maybe even sell these inference clusters as a service down the line.
Microsoft, by contrast, relies on Qualcomm for its Copilot+ PCs.
Second, by observing ChatGPT’s responses, Apple can play catch-up to the frontier-model capabilities of GPT-4o.
This is similar to the Google Maps vs. Apple Maps situation. For now, GPT-4o is the best model, so having it integrated is the consumer-friendly thing to do.
But Apple can also collect data on how users utilize GPT-4o versus Apple’s models, and perform a gap assessment. This becomes valuable training data for Apple.
In other words, it doesn’t hurt Apple to have ChatGPT around, just like having Google Maps didn’t hurt Apple despite Apple having its own maps.
If this plan works out, AI will accelerate Apple’s revenues and strengthen its market positioning and negotiating power:
AI features will certainly help sell the more recent, high-end Apple devices, and accelerate the device upgrade cycle.
Apple will determine its own destiny when it comes to AI inference. (Training is another story, since the M-series chips aren’t designed for it, but this is the first step.)
Also, Siri’s agentic features - if they work as advertised - can increase Apple’s leverage over app publishers, because now the AI - not the user - is the entity opening and clicking through apps.
In other words, Apple not only owns app discoverability (the App Store), but also mediates the user interaction itself. From the app developer’s perspective, this puts an additional layer of abstraction between them and the end user.
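To make that mediation concrete, here is a hedged sketch of the other half of the App Intents story: the app registers a Siri phrase via AppShortcutsProvider, and from then on the conversation happens with Siri rather than with the app’s own UI. The phrase and names below are made up for illustration and build on the hypothetical OrderRideIntent above.

```swift
import AppIntents

// Hypothetical registration of a Siri-invocable phrase for the intent above.
// Once registered, the whole task can be completed by voice, without the
// app's UI (or its monetization surface) ever appearing on screen.
struct RideAppShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        AppShortcut(
            intent: OrderRideIntent(),
            phrases: ["Order a ride with \(.applicationName)"],
            shortTitle: "Order a Ride",
            systemImageName: "car"
        )
    }
}
```

From the publisher’s side, Siri now sits between them and the user - which is exactly the extra abstraction layer described above.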
The WWDC also highlighted some surprising facts about Apple’s “AI capabilities”:
It’s now clear that Apple knows how to train frontier-quality models, but is simply choosing to lay low. Apple’s server-side models running in Private Cloud Compute are apparently quite close to GPT-4o in quality - and they were trained completely in-house, with “a mix of TPUs and GPUs.” This means Apple has been training LLMs at least neck and neck with what Meta is capable of; Apple just chooses not to fight a war of attrition with the other hyperscalers head-on.
Apple may already have the best on-device (“small”) model on the market, and it runs on Apple silicon. Apple’s model has extremely low latency (a time-to-first-token of roughly 0.6 milliseconds per prompt token) and outperforms the similarly sized Phi and Gemma models from Microsoft and Google. It is also trained and fine-tuned on actual user-action data collected directly on-device. This means Apple and Google are the only companies that can deliver end-to-end on the mobile-phone LLM inference workload; Microsoft, for example, relies on Qualcomm for edge inference chips.
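For a sense of scale, here is a back-of-the-envelope calculation of what that latency figure implies; the 750-token prompt size is an arbitrary assumption for illustration.

```swift
// Illustrative arithmetic only: ~0.6 ms per prompt token to first token,
// as cited for the on-device model; real latency varies by device and prompt.
let msPerPromptToken = 0.6
let promptTokens = 750.0  // hypothetical: a screenful of context plus a command
let secondsToFirstToken = promptTokens * msPerPromptToken / 1_000.0
print("≈\(secondsToFirstToken) s to first token")  // ≈ 0.45 s
```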
Some other observations:
Voice input usage will finally explode, and Siri’s agentic features will bring another “ChatGPT moment” for AI. Agentic features will let many small tasks be done completely hands-free (e.g. calling an Uber without opening the Uber app), and Siri will finally deliver on its original promise of being an “assistant.” AI usage by normal people (e.g. your parents) will skyrocket, since voice is a great UX for those with accessibility needs.
Individual apps will suffer from less discoverability and fewer in-app monetization opportunities. If most of Siri’s interactions end with the user never opening an app, publishers lose both discoverability and the chance to monetize inside the app.