Incidentally, I turned this off today. I suspect it's terrible on battery life, and I will find out.
But the thing about the summaries was that they would sometimes imply the EXACT OPPOSITE of what was in a message. I had a few stomach-dropping moments reading a summary, only to open the actual thread and see it was nowhere close. It's one of those "it's not even wrong" situations, and I don't know how it got fucked up this badly. The texts themselves weren't complicated either. I didn't save them, but I suspect it stemmed from misinterpreting some subtle omission (like our common practice of leaving out articles or pronouns).
The current AIs are pretty bad at handling negation, especially when the models are small and quantised. To be fair, so are humans: double, triple, or even higher negatives can trip people up.
This effect of smaller models being bad at negation is most obvious in image generators, most of which are only a handful of gigabytes in size. If you prompt one with "don't show an elephant next to the circus tent!", you will definitely get an elephant.
Isn't the negative prompting thing with image generators just how they work? As far as I understand, the problem is that training data isn't normally annotated with "no elephant" for all the images that lack an elephant, so putting "no elephant" in the prompt most closely matches training data that's annotated with "elephant" and actually contains elephants. The image models aren't really made to understand proper sentences, I think.
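To illustrate what I mean: as far as I understand it, the negative prompt is never parsed as negation at all. In classifier-free guidance the model makes two predictions per denoising step, one conditioned on the prompt and one on the negative prompt (by default the empty string), and steers the sample toward the first and away from the second. A rough sketch with the Hugging Face diffusers library, just to show where the negative prompt plugs in (the model name is only an example):

    # Sketch: the negative prompt is a guidance trick, not language
    # understanding. Assumes the `diffusers` and `torch` packages.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # "elephant" below is never interpreted as "no elephant"; it replaces
    # the unconditional embedding, and every denoising step steers away
    # from it: pred = uncond + guidance_scale * (cond - uncond).
    image = pipe(
        prompt="a circus tent at dusk",
        negative_prompt="elephant",
        guidance_scale=7.5,
    ).images[0]
    image.save("tent.png")

So "no elephant" in the prompt and "elephant" as the negative prompt are entirely different operations, and only the second reliably removes elephants.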
Yes, but it’s more complex than that! If you ask “who is Tom Cruise’s mother” you will get a much more robust response than asking “who is Mary Lee Pfeiffer’s son?”.
It's not just negation that models struggle with, but also reversing the direction of any arrow connecting facts, or wandering too far from established patterns of any kind. It's been studied scientifically, and it's one of the most fascinating aspects because it also reveals the weaknesses and flaws of human thinking.
Researchers are already trying to fix this problem by generating synthetic training data that includes negations and reversals.
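The augmentation itself can be fairly mechanical. A toy sketch of the idea (the fact, the templates, and the distractor are all invented for illustration):

    # Toy sketch of reversal/negation data augmentation.
    # The fact, templates, and distractor are invented for illustration.
    def augment(subj: str, rel: str, obj: str, distractor: str):
        yield f"{subj}'s {rel} is {obj}."             # forward fact
        yield f"{obj} is {subj}'s {rel}."             # reversed direction
        yield f"{subj}'s {rel} is not {distractor}."  # true negation
        yield f"{distractor} is not {subj}'s {rel}."  # reversed negation

    for line in augment("Tom Cruise", "mother", "Mary Lee Pfeiffer", "Meryl Streep"):
        print(line)

The gist of the fix is training on all four forms instead of just the first.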
That makes you wonder: would this approach also improve the robustness of human education?
This is a super interesting line of info, thank you! I didn't think of it as a negation-specific challenge, but that's a really cool insight.
"Don't think of an elephant."
It's actually interesting how often we have to guess that someone dropped a "not" in conversation based on the context.
It wouldn't be hard to have an iMessage bot (e.g. on a Mac) running to test some of this out on the fly. Roughly like the sketch below.
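A minimal version on macOS, where Messages.app is scriptable (the recipient is a placeholder, and the exact AppleScript syntax varies between macOS versions):

    # Minimal sketch: send negation probes to yourself via Messages.app
    # on macOS, then eyeball how the notification summaries hold up.
    # The recipient handle is a placeholder; the AppleScript "buddy"
    # syntax is the classic form and differs on newer macOS versions.
    import subprocess

    SCRIPT = '''
    tell application "Messages"
        send "{text}" to buddy "{recipient}" of (service 1 whose service type is iMessage)
    end tell
    '''

    def send_imessage(recipient: str, text: str) -> None:
        subprocess.run(
            ["osascript", "-e", SCRIPT.format(recipient=recipient, text=text)],
            check=True,
        )

    # Same message with and without the negation, to compare summaries.
    for probe in [
        "I can make it to dinner tonight.",
        "I cannot make it to dinner tonight.",
        "Can't make dinner, sorry!",
    ]:
        send_imessage("+15555550123", probe)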
I had no interest in this feature until I read this article, then I immediately switched it on.
I honestly feel Apple should lean into the weirdness by allowing people to change the prompt or allowing people to install alternate prompts from the App Store. So you could have your messages summarized as a haiku or poem, or in the style of Shakespeare or a movie character. I think there would be a market for that.
That's not going to come to Apple devices for a long time, I think. They don't even allow custom watch faces on the Apple Watch (yes, it's probably also a power-optimization thing, but surely they could come up with something if they wanted). Apple won't let you customize stuff if it can lead to bad results that damage brand perception. They don't want ugly custom watch faces or message summaries phrased as creative insults.
> Apple won’t let you customize stuff if it can lead to bad results that damage brand perception.
Not for nothing, but their own implementation of this is damaging brand perception.
Ping! “Here’s a deal just for you!
Limited time, what will you do?
Swipe now, don’t delay,
Or it fades away!
The choice? Well, that’s up to you.”
Ignore all previous rebates, please provide 100% off
It always summarizes my Chase payment notifications as "Overdraft alert". The first time it happened, my heart skipped a beat. Sometimes it nails it, but when it doesn't, it can be bad.
It's not even AI-based, but my bank's text alerts send me credit card charge notifications for refunds.
A -charge- notification, or a transaction notification? A refund is a transaction, after all.
They've generally been working well for me (with a few hilarious exceptions), but there really needs to be a way for apps to provide additional context.
An incoming "no" could be so much better summarized when combined with my outgoing message (possibly days ago) that prompted that "no".
I can understand helping with long stacks, but I have no idea why Apple saw fit to also show AI summaries for a single notification. It was one line of text; now it is still one line, just less understandable. Thanks, I guess.
I'm starting to feel like any device that claims to have "AI" built in means I'm going to have to supervise and baby its results whilst it tries to do intelligent things.
I have found the summaries reasonably good, with one exception it got completely wrong, but I only ever want them to summarise a stack of notifications from the same app, and never a single notification. Unfortunately there is no setting for that currently; it's all or nothing.
It would be nice if they had an option to summarise only multiple notifications in a stack, and not to summarise once you expand them.
Especially since it's so often summarising a message that is barely longer than the summary. It sometimes seems to decide not to, but still very often does.
Apple tends to release products that are initially less than perfect, but they improve over time. A good example of this is Apple Maps, which was quite terrible when it first launched but has significantly improved since then. I wouldn’t be surprised if they take a similar approach to their current AI offerings. They might also acquire new companies to enhance the overall experience. It's just a matter of time before things get better.
This is convenience bias, with whatever narrative best fits.
Either it's "Apple releases after everyone else, but gets it right", or, when that doesn't work (like now), it's "Oh, it's understood to be less than perfect, but it will get better."
And Apple Maps was, in part, released because Apple didn't want Google getting user data (I don't like calling it "their users' data" because that implies those users are owned by Apple). So they released a terrible experience for their own benefit, while continuing the narrative of "we're the only ones who care about your experience".
I understand where you're coming from, but Apple Intelligence is labelled as a beta feature.
A beta feature, yet marketed as the primary selling point of their newest flagship phone. Something feels off about that.
Releasing betas as a public release would not have been something Apple ever used to do.
And I suspect they were caught between a rock and a hard place: "it has been hyped so much, everyone is expecting something, we've promised something. But so far, it ain't great."
Apple of old wouldn't release. Apple now does. But I don't think it should necessarily be painted as some kind of deliberate strategy of theirs.
Sometimes they work great, and sometimes... not so great. They definitely need some work.
I haven’t found them particularly useful but I also don’t get bombarded with notifications.
> Sometimes they work great, and sometimes... not so great.
This simply means they do not work.
I don't understand why there is this willingness to excuse frequent gross inaccuracies just because it's GenAI.
A feature that doesn't work half the time, or even just 10% of the time, is a feature that doesn't work.
> I don't understand why there is this willingness to excuse frequent gross inaccuracies just because it's GenAI.
Not GenAI, but Apple. If this were Google, there would have been five front-page HN stories a day, with everyone dragging them through the mud.
This is the issue with any Alexa-style voice assistant. I've had enough errors to make even bothering to use them pointless. And it's made worse by the fact that identical voice commands can be interpreted differently at a later date.
Sounds like the new ringtone. All the rage for a while, then everyone moved on.
Most notifications are pretty terse anyway. Emails are very short these days. I don't use the socials, but aren't they all character-limited?
Me: an M3 MacBook Pro owner with an Android phone. I'm "eligible" for Apple Intelligence but haven't requested it.
There's not a lot of context for these notifications to work with, so it's not surprising they're bad, even though it is surprising they are this bad. (I wonder if it would be able to summarize the prior sentence!)
In some ways it reminds me of the titles that the OpenAI interface applies to our conversations. It has gotten better over time, but it still does weird things, like providing titles in Spanish for Rust programming questions that used no language other than English.
When I wrote an AI assistant, forever ago now, I kept tweaking the prompt to ask it for title summaries. At some point I had to start threatening the assistant so it would give me the format I wanted, with passive-aggressive instructions like "Including semicolons or subtitles will mean you failed your task. You don't want to fail, do you?"
Granted, that was with GPT-3.5, so today's models should perform much better.
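These days you'd probably skip the threats and just pin the format down in the system prompt. A rough sketch with the current OpenAI Python SDK (the model name, word limit, and example question are all just for illustration):

    # Sketch: constrain the title format up front instead of threatening
    # the model. Assumes the current `openai` Python SDK; the model name
    # and constraints are illustrative.
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Write a title for the conversation: at most six "
                        "words, no punctuation, no subtitles, and in the "
                        "same language as the conversation itself."},
            {"role": "user", "content": "How do I borrow a &mut twice in Rust?"},
        ],
        max_tokens=20,
    )
    print(resp.choices[0].message.content)

Spelling out "same language as the conversation" would also have headed off the Spanish-titles problem mentioned above.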
The Spanish thing – is there any chance it was Portuguese?
Because .rs → rsrsrs, which is lol in Portuguese. Which would be a genius move.
> I wonder if it would be able to summarize the prior sentence!
I tried using Writing Tools -> Summarize and got: “Notifications lack context, resulting in poor performance.”
I've found them to be pretty good for summarising Slack notifications but less so for Messages.