How to rethink Siri

Siri is much too limited and inconsistent. The only time I ever use Siri is when driving, for responding to text messages and dictating notes. Other people have different use cases, so when they say “Siri sucks” they probably all mean different things.

There are many things that could be improved in Siri, but to me it all comes down to just two fundamental shifts:

Universal Siri that works the same across all devices.

The illusion of Siri as a personal assistant is broken when basic tasks that work from your phone don’t work from your watch or HomePod. I’ve long thought, and discussed on Core Intuition, that Apple has tied Siri too closely to specific devices and installed apps.

That’s not to say that controlling installed apps isn’t useful, in the way that Shortcuts and scripting are useful. I expect Apple to have more of that at WWDC next week. But in addition to extending Siri with installed apps, to make it truly universal there should be a way to extend Siri in the cloud, just as Alexa has offered for years.
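
To make that concrete, here’s a rough sketch of what a cloud extension point could look like, loosely modeled on how Alexa custom skills work: the assistant posts the intent it resolved from your speech to a developer-run HTTPS endpoint and speaks back whatever text is returned. The endpoint, intent name, and JSON shape here are all hypothetical.

```python
# Hypothetical sketch of a cloud-hosted Siri extension, loosely modeled on how
# Alexa custom skills work: the assistant POSTs the intent it resolved from the
# user's speech to a developer-run HTTPS endpoint and speaks the returned text.
# The intent name, slots, and JSON shape are invented for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CloudIntentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))

        # Dispatch on whichever intent the assistant resolved.
        if request.get("intent") == "CheckLastPost":
            blog = request.get("slots", {}).get("blog", "your blog")
            speech = f"The most recent post on {blog} went up an hour ago."
        else:
            speech = "Sorry, I can't help with that yet."

        body = json.dumps({"speech": speech}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Because the logic lives in the cloud, the same answer works from a phone,
    # a watch, or a HomePod, with nothing installed on the device.
    HTTPServer(("", 8080), CloudIntentHandler).serve_forever()
```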

Standalone devices like the Humane AI Pin and Rabbit R1 have been dismissed with “that should be an app”. While it’s true that the iPhone will continue to dominate any potential non-phone competition, I think there is a narrow window where a truly new device could be disruptive to the smartphone if Apple doesn’t make Siri more universal and seamless across devices. This universality might sound subtle, but I think it’s key.

Large language models.

This is obvious. Talking to ChatGPT is so much more advanced and useful than the current Siri. With ChatGPT, you can ask all sorts of questions that Siri has no clue about. Sure, LLMs are wrong sometimes, and I’d love for Siri to be able to express uncertainty about some answers. If there were a way to have some kind of weighting in the models so that Siri could answer “I’m not sure, but I think…”, that would go a long way toward dealing with hallucinations. Generative AI is less like a traditional computer and more like a human who has read all the world’s information but doesn’t really know what to do with any of it. That’s okay! But we wouldn’t blindly trust everything that human said.
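
One rough way to get that kind of weighting: most LLMs can report a probability for each token they generate, and a low average is a crude signal that the model is guessing. A toy sketch, where the threshold and the inputs are made up for illustration:

```python
# Toy sketch of confidence-weighted answers. Assumes the model reports a log
# probability for each token it generated (most LLM APIs can return these).
# The -1.0 threshold is an arbitrary illustration, not a tuned value.
def hedge_answer(answer: str, token_logprobs: list[float], threshold: float = -1.0) -> str:
    avg = sum(token_logprobs) / len(token_logprobs)
    if avg < threshold:
        # Low average probability: the model was guessing more than usual.
        return "I'm not sure, but I think " + answer[0].lower() + answer[1:]
    return answer

# A confident answer passes through; a shaky one gets hedged.
print(hedge_answer("The Eiffel Tower is 330 meters tall.", [-0.2, -0.1, -0.3]))
print(hedge_answer("The bridge opened in 1921.", [-2.4, -1.8, -3.1]))
```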

There are many other improvements that would come along with running even medium-sized LLMs on device for Siri, such as dictation. OpenAI’s Whisper model is almost 2 years old now and still way better than Siri’s built-in dictation.
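
For reference, transcribing a recording with the open-source Whisper package is only a few lines (the “base” model size and the file name below are just placeholders, and ffmpeg needs to be installed for audio decoding):

```python
# Transcribe a recording with OpenAI's open-source Whisper model
# (pip install openai-whisper; ffmpeg must be installed for audio decoding).
# "base" and the file name are placeholders; larger models are more accurate but slower.
import whisper

model = whisper.load_model("base")
result = model.transcribe("voice-memo.m4a")
print(result["text"])
```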

Apple is going to talk a lot about privacy and on-device models at WWDC. A dual strategy for LLMs is the way to go, with models on your phone that can handle a bunch of tasks, plus some kind of smarts to switch gears and use LLMs in the cloud when necessary. I’ve done a bunch of experiments with open-source LLMs on my own servers, and it takes a lot of RAM and GPU power to get reasonable performance. If we use “parameters” as a rough metric for how much horsepower an LLM needs, note that Meta’s Llama 3 (which is pretty good!) tops out at 70 billion parameters, while GPT-4 is rumored to be nearly 2 trillion. If Apple can’t get GPT-4-level quality and performance on device, they shouldn’t hesitate to use the cloud too.
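
The switching logic doesn’t have to be fancy. A minimal sketch of the dual strategy, with the two model calls stubbed out and the heuristics purely illustrative:

```python
# Minimal sketch of a dual on-device/cloud strategy. run_on_device and
# run_in_cloud stand in for a small local model and a large hosted one;
# the heuristics are purely illustrative.
NEEDS_WORLD_KNOWLEDGE = ("who", "what", "when", "why", "how", "explain")

def looks_hard(prompt: str) -> bool:
    words = prompt.lower().split()
    return len(words) > 30 or any(w in NEEDS_WORLD_KNOWLEDGE for w in words)

def run_on_device(prompt: str) -> tuple[str, bool]:
    # Placeholder: a small local model that also reports whether it was confident.
    return "Timer set for ten minutes.", "timer" in prompt.lower()

def run_in_cloud(prompt: str) -> str:
    # Placeholder: the big, slow, expensive model.
    return "Here's a longer answer from the large cloud model."

def answer(prompt: str) -> str:
    if not looks_hard(prompt):
        reply, confident = run_on_device(prompt)
        if confident:
            return reply
    # Fall back to the cloud only when the request needs more horsepower.
    return run_in_cloud(prompt)

print(answer("Set a timer for ten minutes"))   # handled on device
print(answer("Explain why the sky is blue"))   # routed to the cloud
```

Even a crude router like this keeps simple requests fast and private while reserving the cloud for the hard ones.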

Looking forward to WWDC next week! Should be a good one.

Manton Reece @manton