Siri can now read a web article but it lacks flair

iOS 17 brings a new feature I've been waiting a long while to enjoy: Siri can now read web articles aloud right in Safari.

I've used the feature a dozen times so far, and while it's my favorite addition to iOS 17 by far, the implementation leaves a lot to be desired. Siri isn't a particularly smart virtual assistant by any measure, so it's no wonder it struggles with all the variables of the written word.

Type: #Note
Re: #iOS #Siri #Standardization #Technology

Siri doesn't seem to consider context when choosing the pronunciation of a word with multiple potential pronunciations. Apple's voice assistant also lacks a certain cadence in its delivery: the kind of flair one might expect from, say, Stephen Fry reading The Hitchhiker's Guide to the Galaxy.

Siri's shortcomings as a voice actor got me thinking: shouldn't there be a markup standard for this sort of thing? Some kind of metadata that communicates pronunciation, emphasis, etc. to AI readers?

It turns out such a standard is in the works at the W3C.

The Specification for Spoken Presentation in HTML describes two approaches to the attributes: multi-attribute and single-attribute.
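
From my skim of the draft, both approaches hang SSML properties off HTML elements via data-* attributes. Roughly, something like this; the attribute names below follow the pattern in the draft, but treat them as illustrative, since the spec is still in flux:

    <!-- Multi-attribute approach: one data-ssml-* attribute per SSML property -->
    You say <span data-ssml-phoneme-alphabet="ipa"
                  data-ssml-phoneme-ph="pɪˈkɑːn">pecan</span>.

    <!-- Single-attribute approach: one data-ssml attribute carrying JSON -->
    You say <span data-ssml='{"phoneme": {"alphabet": "ipa", "ph": "pɪˈkɑːn"}}'>pecan</span>.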

Realizing that a standard is in the works (because of course it is, for accessibility reasons) raises more questions.

One, why hasn't Apple pushed for this, or even offered a public solution of its own for spoken presentation attributes? Or have they, and I'm missing it?

Two, is this type of attribute considered a matter of formatting? Because if so, is it possible to incorporate these attributes in Markdown? I'd love to give my articles a little voice direction without diving into the markup, and without adding to WYSIWYG editors' already clunky interfaces. But perhaps it's far too early to say.
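
For what it's worth, most Markdown flavors already pass inline HTML through untouched, so in principle this could work today without changing Markdown at all; a hypothetical sketch, again assuming those draft data-ssml-* attributes:

    I say <span data-ssml-phoneme-alphabet="ipa"
                data-ssml-phoneme-ph="təˈmɑːtoʊ">tomato</span>,
    and everything around the span stays plain Markdown.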

But damn, how great would it be to listen, on demand, to articles, essays, and books as the author intended? And imagine if, once that ability is widely adopted, Stephen Fry sold his voice for AI reading instead of technology companies screwing him out of work.

What a pleasant experience that would be.


Continuing down the rabbit hole

Stuff I learned after posting this article:

The W3C has Speech Synthesis Markup Language (SSML), an XML-based standard (like RSS).
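
For a taste, here's a minimal SSML fragment. The speak, phoneme, break, and emphasis elements are straight out of SSML 1.1; the sentence itself is just my made-up example:

    <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis">
      You say <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>,
      <break time="300ms"/>
      and I say it with <emphasis level="strong">flair</emphasis>.
    </speak>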

Amazon uses SSML for Alexa. Microsoft uses it for Azure AI services.

Interesting related links:

https://popey.com/blog/2022/10/blog-to-speech-in-my-voice/

http://library.usc.edu.ph/ACM/CHI2019/2exabs/alt08.pdf


Created: September 29, 2023
Future revisions?: Unsure