How to rethink Siri

Siri is much too limited and inconsistent. The only time I ever use Siri is when driving, for responding to text messages and dictating notes. Many people will have different use cases, and so when people say “Siri sucks” they probably all mean different things.

There are many things that could be improved in Siri, but to me it all comes down to just two fundamental shifts:

Universal Siri that works the same across all devices.

The illusion of Siri as a personal assistant is broken when basic tasks that work from your phone don’t work from your watch or HomePod. I’ve long thought, and discussed on Core Intuition, that Apple has tied Siri too closely to devices and installed apps.

That’s not to say that controlling installed apps isn’t useful, in the way that Shortcuts and scripting are useful. I expect Apple to have more of that at WWDC next week. But in addition to extending Siri with installed apps, to make it truly universal there should be a way to extend Siri in the cloud, just as Alexa has offered with its cloud-hosted skills for years.
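For the installed-app side, Apple’s existing App Intents framework already shows the shape of this: an app declares an action, and Siri and Shortcuts can invoke it, but only on a device where that app is installed. Here is a minimal sketch, with AddNoteIntent and NoteStore as placeholder names rather than any real app’s API:

```swift
import AppIntents

// Placeholder storage type, just so the sketch is self-contained.
final class NoteStore {
    static let shared = NoteStore()
    private(set) var notes: [String] = []
    func add(_ text: String) { notes.append(text) }
}

// A minimal App Intent: Siri and Shortcuts can run "Add Note", but only on
// a device where the app is installed; a HomePod without the app cannot.
struct AddNoteIntent: AppIntent {
    static var title: LocalizedStringResource = "Add Note"

    @Parameter(title: "Text")
    var text: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        NoteStore.shared.add(text)
        return .result(dialog: "Added your note.")
    }
}
```

A cloud-hosted equivalent, registered with your account instead of a particular device, is what would let the same action work from a HomePod or a watch that doesn’t have the app at all.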

Standalone devices like the Humane AI Pin and Rabbit R1 have been criticized as products that “should have been an app”. While it’s true that the iPhone will continue to dominate any potential non-phone competition, I think there is a narrow window where a truly new device could be disruptive to the smartphone if Apple doesn’t make Siri more universal and seamless across devices. This universality might sound subtle, but I think it’s key.

Large language models.

This is obvious. Talking to ChatGPT is so much more advanced and useful than talking to the current Siri. With ChatGPT, you can ask all sorts of questions that Siri has no clue about. Sure, LLMs are sometimes wrong, so I’d love for Siri to be able to express uncertainty about some answers. If there was a way to have some kind of weighting in the models so that Siri could answer “I’m not sure, but I think…” that would go a long way to dealing with hallucinations. Generative AI is less like a traditional computer and more like a human who has read all the world’s information but doesn’t really know what to do with any of it. That’s okay! But we wouldn’t blindly trust everything that human said.
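One naive way to approximate that kind of weighting, assuming the decoder exposes per-token log-probabilities, would be to prepend the hedge whenever the average token probability falls below some threshold. This is a crude proxy rather than a real fix for hallucinations, and the function name, numbers, and threshold below are invented for illustration:

```swift
import Foundation

// Hypothetical helper: given per-token log-probabilities from a model's
// decoder, prepend a hedge when overall confidence looks low. The 0.6
// threshold is an arbitrary illustration, not anything Apple ships.
func hedgedAnswer(_ answer: String, tokenLogProbs: [Double]) -> String {
    guard !tokenLogProbs.isEmpty else { return answer }
    // Geometric mean of token probabilities = exp(mean of log-probabilities).
    let meanLogProb = tokenLogProbs.reduce(0, +) / Double(tokenLogProbs.count)
    let confidence = exp(meanLogProb)
    return confidence < 0.6 ? "I'm not sure, but I think… \(answer)" : answer
}

// Example: a low-confidence completion gets the hedge prepended.
print(hedgedAnswer("the bridge opened in 1937.",
                   tokenLogProbs: [-0.9, -1.2, -0.4, -2.1]))
```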

There are many other improvements that would come along with using even medium-sized LLMs on device for Siri, such as dictation. OpenAI’s Whisper model is almost 2 years old now and way better than Siri’s built-in dictation.

Apple is going to talk a lot about privacy and on-device models at WWDC. A dual strategy for LLMs is the way to go: models on your phone that can do a bunch of tasks, plus some kind of smarts to switch gears to using LLMs in the cloud when necessary. I’ve done a bunch of experiments with open-source LLMs on my own servers, and they require a lot of RAM and GPU to get reasonable performance. If we use “parameters” as a rough metric for how much horsepower LLMs need, note that Meta’s Llama 3 (which is pretty good!) is a 70-billion-parameter model. GPT-4 is rumored to be nearly 2 trillion parameters. If Apple can’t get GPT-4-level quality and performance on device, they should not hesitate to use the cloud too.
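Here is a rough sketch of that “switch gears” routing, where every type name and threshold is hypothetical rather than anything Apple has announced: try the small on-device model first, and escalate to the cloud model when the request looks too heavy or the local answer is low-confidence.

```swift
import Foundation

// Hypothetical interface that both the on-device and cloud models conform to.
protocol LanguageModel {
    func respond(to prompt: String) async throws -> (text: String, confidence: Double)
}

// Routes a request between a small local model and a large cloud model.
// The length cutoff and confidence threshold are arbitrary placeholders.
struct ModelRouter {
    let onDevice: any LanguageModel
    let cloud: any LanguageModel

    func respond(to prompt: String) async throws -> String {
        // Very long or complex prompts go straight to the cloud model.
        if prompt.count > 2_000 {
            return try await cloud.respond(to: prompt).text
        }
        let local = try await onDevice.respond(to: prompt)
        if local.confidence >= 0.6 {
            return local.text
        }
        // The local model wasn't sure; escalate to the cloud (with user consent).
        return try await cloud.respond(to: prompt).text
    }
}
```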

Looking forward to WWDC next week! Should be a good one.

Paul Robert Lloyd

If they do use cloud-based models, I’m curious how they’d account for the impact these would have on their very public climate commitments (the answer can’t be offsets).

Jeff Baxendale

I mean, just try using an Apple Watch with no connectivity (like on a jog with no phone, as I do multiple times a week) to add a reminder or note of something you just thought of… it's a huge exercise in frustration that it can't do something so basic, despite newer watches having on-device neural processing for things like transcription now.

Granted, third-party apps are also bad no-phone/no-connectivity watch citizens, but I'd expect better from the first party.

Manton Reece

@paulrobertlloyd I think Apple has two choices: announce that all cloud AI is at their own data centers, which are already 100% renewable; or, announce the OpenAI partnership using Microsoft servers, which will be 100% renewable in 2025. If they let this go without saying anything, I'll be very surprised. I'm also skeptical that Apple's own data centers are ready to scale up.

Matt Huyck

That is an excellent point about universal interactions. Our mental model is that we are interacting with the same entity each time, but we are not. This is a design flaw, but a fixable one.

Matt Huyck

I keep seeing people say stuff like this:

If there was a way to have some kind of weighting in the models so that Siri could answer “I’m not sure, but I think…” that would go a long way to dealing with hallucinations

While that would be a great thing to have, it’s just not a thing an LLM can do. For this to work, the model needs to “understand” what it’s saying. It doesn’t. It’s just generating words that match a statistical model of the words in its training data. That’s it.
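To make that “statistical model of the words” point concrete, here is a toy bigram generator: it strings words together purely from co-occurrence counts in a tiny corpus, with no representation of whether the output is true. Real LLMs use vastly richer models, but they are still sampling a statistically likely next token.

```swift
import Foundation

// A toy bigram "language model": it only knows which word tends to follow
// which, and has no idea what any sentence means or whether it is true.
let corpus = "siri set a timer siri send a message siri add a note"
let words = corpus.split(separator: " ").map { String($0) }

// Count which words follow which.
var nextWords: [String: [String]] = [:]
for i in 0..<(words.count - 1) {
    nextWords[words[i], default: []].append(words[i + 1])
}

// Generate text by repeatedly sampling a plausible next word.
var current = "siri"
var output = [current]
for _ in 0..<6 {
    guard let next = nextWords[current]?.randomElement() else { break }
    output.append(next)
    current = next
}
print(output.joined(separator: " "))  // e.g. "siri add a timer siri send a"
```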
