Siri Notes

Blogging about the Siri Phenomenon

Archive for the tag “Apple”

On the Comically Absurd Lawsuit Against Apple….

Great piece by Joe Wilcox on the senseless lawsuit against Apple for “deceptive advertising”.  Just like Mr. Wilcox, when I learned about the lawsuit, I was confused.  As Wilcox put it:

Why didn’t Fazio return the phone? The lawsuit reads: “Promptly after the purchase of his iPhone 4S, Plaintiff realized that Siri was not performing as advertised?” Eh, what does, Bud? Stores have return policies for a reason. Did he not keep the receipt? He hasn’t heard of eBay? Where iPhone 4S commanded top dollar last November. If Fazio knew “promptly”, why didn’t he get a refund promptly?

Yes indeedy!  The only way I can explain the motivation behind the filing is to suggest that Mr. Fazio had a genius idea for self-promotion.  And it is indeed genius: his name is going to be famous.  Let’s see whether the judge receiving this will throw the case out or be tempted to jump on the bandwagon in a quest for his or her own 15 minutes of fame.

Quaking in their Boots: Google and Amazon

Google executive chairman and former chief, Eric Schmidt, finally conceded this past week that Siri indeed could pose a “competitive threat” to the company’s core business model.  As we all know, Google generates its billions from clickable ad links returned in the context of generic searches.  You get your results in the main body of the screen, with ads at the top, at the bottom, and on the sides of that body of results.  Siri obviously threatens this arrangement, since there is no screen on which to view search results and no space to show those ad-supported links.  It is going to be fascinating to watch how Google deals with this.  Google does have quite a bit to come back with in terms of its own voice and natural language technology, but one can now see why Google would have soft-pedaled pushing the envelope on non-visual voice assistants.  I, for one, did not see this aspect of the equation, and all along thought that Google would be the one to break ground on voice technology.  And I thought that Apple, in spite of its acquisition of Siri (which I took to be merely tactical maneuvering), did not have the imaginative capacity to break out of the world of visually centric UIs, given its history from day one.  I was off on both counts!

Amazon, on the other hand, also seems to have decided to move into the space with the acquisition of Yap.  To be fair, Amazon’s move to acquire Yap started months before the launch of Siri, as the SEC filing shows.  So, I think we can safely say now that something is in the air about the key role that voice will play in the next generation of smart devices, and that we have probably reached a convergence of multiple forces — technologies, infrastructures, and market expectations — for the holy grail of interfaces to finally become real and part of our lives.  Let’s see what the Amazon folks are thinking about: perhaps speech enabling their Kindle? — i.e., an expert voice assistant that will make your reading experience much more enjoyable?

Also, let’s watch what happens next with Facebook and Twitter.  For Facebook, we will probably see a joint effort with Microsoft: Microsoft has very strong speech technologies, a healthy bench of talent, and deep pockets (as does Facebook), so a joint move would make perfect sense.  As for Twitter, I am not sure what they can do on their own (and maybe they don’t want to do anything on their own), given that they are already embedded in iOS 5, and I am pretty certain that one of the next upgrades of Siri will add the ability to issue Twitter commands and post tweets by voice.

On Walt Tetschner’s Downplaying of Siri

Longtime editor of the ASRNews monthly newsletter, Walt Tetschner, has come out with a rather curmudgeonly review of Siri in the October issue of his newsletter (published today, November 9, 2011).  Here is an excerpt from the section of the newsletter that focuses on Siri, titled “SIRI positioning is a bad mistake”:

Apple has positioned SIRI as a personal assistant. By doing so, they are setting expectations that will be a challenge for SIRI to achieve. If your assistant makes an error, it might not bother you the 1st time. You correct it, and expect the assistant to learn to do the chore correctly the next time. When the next error occurs, you probably aren’t as forgiving. By the 3rd error, you are thinking of firing the assistant. Data exists that indicates that a similar thing happens with a speech-enabled assistant. As time goes on and the speech-enabled assistant keeps making errors, users simply stop using it. As a general purpose assistant for dealing with all communications, SIRI is simply inappropriate. Aside from the lack of robustness, speaking to a mobile phone is often totally inappropriate. When other people are around, it invariably is wrong. It isn’t private and can disrupt and irritate others. Talking to a machine is perceived as socially weird behavior. Speech is totally appropriate and most effective for use in a hands-eyes busy environment. It is often the only safe way of communicating. Apple would have more appropriately positioned SIRI as a tool for hands-eyes busy communications. One of the weak spots of SIRI is that it requires the user to push a button to get it to recognize speech. SIRI needs to add the Sensory Truly Hands Free technology. The recent 2-day SIRI power outage made it clear that SIRI needs a data connection to do local tasks like play a song or schedule an appointment. This further limits its utility. Siri has gotten a lot of visibility since it was made available. The primary utility appears to be amusement, though. Most users have found it more entertaining than actually helpful. It’s amusing, but how much can it handle your day to day tasks? Two highly publicized speech-enabled personal assistants have failed in the past. Wildfire failed in the late 1990s and General Magic failed in 2002. Users claimed that they loved them. 
They had high expectations for the products. Over time, these expectations were not met and the users simply stopped using them.

First, having witnessed Mr. Tetschner’s decades-long, shrill complaints about how poorly Speech Technology has performed, I find it surprising that he did not bother to mention that Siri’s speech recognition is remarkable in its accuracy.  I have owned an iPhone 4S for almost 3 weeks now, and have been using Siri on a daily basis (to send email, text, voice dial, look up stuff, or just goof off), and my awe at the level of accuracy has yet to wear off.  It is not perfection, but it sure is close to it; so close that I wonder if its error rate is comparable to that of a human (and the error rate of humans is NOT zero).  And I love the fact that I now type and peck and swipe a lot less than I used to.

Second, Mr. Tetschner seems to miss some pretty basic aspects of Siri that put it in a unique position compared to what has come before.  First and foremost is the fact that Apple is behind it.  Why is that important?  To begin with, Apple cares about the user experience, and so it will make it a mission to do all that it can to improve it.  Next, because it is Apple, it has the resources needed to invest in such improvement.  And finally, Apple cares a lot about its brand and will not let any of its products tarnish it.  Apple will not let Siri fail, nor will the legions of Apple users who love Apple’s daring vision and understand fully why Apple is moving in the direction of Siri.

Third, Mr. Tetschner betrays the outlines of the small box within which he seems to have confined his vision of what we should ultimately strive for in a speech interface: let’s use speech only when our eyes and hands are busy.  Really?  If I can reliably dictate an email, even if my eyes and hands are NOT busy, you think I will bother with typing that email?  To be sure, I will type that email if I can’t privately dictate it, but when I can, I will dictate.  And I will voice dial in most scenarios that I can think of, except in meetings, where I shouldn’t be dialing anyone in the first place….   Apple is going for the real thing: the most natural User Interface that humans can interact with: naturally spoken language.  When Steve Jobs kept repeating in his last keynote at WWDC back in June, “It’s that simple,” and “It just works,” he really meant it, and the ultimate interface that is that simple and just works is the spoken word.

Fourth, it seems to me that Mr. Tetschner is missing (or is not aware of) the fact that Siri is a service in the cloud that continuously learns and improves.   His comparisons with Wildfire and General Magic are off the mark.  Even if those products are to be called failures, past failures are not necessarily predictors of future failure.  Was the Apple Newton (a cousin of General Magic) a failure? To some it was, but in my eyes, it was simply technology before its time, and in any case, the technology was at the very least the conceptual glint in Apple’s eye of what would later become the iPad. But more crucially, the key difference between Wildfire and General Magic on one side and Siri on the other is that Siri trains against the user’s voice, lives in the cloud, and is continually learning.   None of the previous “Assistants” can make that claim.

Siri has a long way to go: there will be outages, it will behave stupidly, and its recognition will fall short of perfection.  But for those in the field who have been dreaming of the day when we can just talk to our machines naturally, without having to peck, and swipe, and tap, Siri is a monumental step forward.   The birth of Siri is an occasion to rejoice and to cheer, rather than to pretend that it’s business as usual.  Because it is not business as usual.

Siri could be smarter

Siri seems to have some very basic gaps in its error recovery strategies.  As this video shows, Siri doesn’t have the smarts to compute the proximity of the input to even a small set of alternatives that it is offering the user.  The person speaking is trying to say “Work” when given the option between “Work” and “Home,” but Siri hears “Walk.” A reasonable enough mistake, given the accent.  But instead of comparing “Walk” with “Work” and “Home” and discovering that “Walk” sounds more like “Work” than “Home,” Siri insists that it has no clue what the user is trying to say when it hears him say “Walk.”  Yes, “Walk” as such makes zero sense in the context in question, but Siri still looks stupid. If the user had said, “Helicopter,” one could understand Siri’s inability to resolve the input, given that “Helicopter” sounds nothing like either “Work” or “Home.”  But Siri heard “Walk”…  This is a basic, Voice User Interface (VUI) design 101 kind of problem….
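
To make the point concrete, here is a minimal sketch in Python of the kind of proximity check I have in mind.  It is not how Siri works internally (I have no visibility into that); it uses spelling similarity from the standard library as a crude stand-in for how alike two words sound, and the function name and the 0.4 threshold are my own assumptions for illustration.  A real system would compare phoneme sequences or recognizer scores instead.

```python
from difflib import SequenceMatcher


def closest_option(heard, options, threshold=0.4):
    """Return the offered option most similar to what was heard, or None.

    Spelling similarity is only a rough proxy for acoustic similarity;
    the threshold value is an assumption chosen for this illustration.
    """
    if not options:
        return None
    scored = [
        (SequenceMatcher(None, heard.lower(), option.lower()).ratio(), option)
        for option in options
    ]
    best_score, best_option = max(scored)
    # Only accept the best candidate if it is reasonably close.
    return best_option if best_score >= threshold else None


print(closest_option("Walk", ["Work", "Home"]))        # Work
print(closest_option("Helicopter", ["Work", "Home"]))  # None
```

With only two options on the table, even a check this crude is enough to map “Walk” to “Work” while still rejecting “Helicopter.”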

A second failure is in repeating the exact same rejection: “I don’t know what you mean by ‘Walk.’”  Again, Siri sounds dumb.  In VUI design, a best practice in recovering from errors is to remember failed data points and remove them from the list of potential hypotheses, so that the next hypothesis gets a chance to be considered.  Siri should have remembered that the user couldn’t possibly have meant to say “Walk,” since saying “Walk” was not moving the exchange forward.  So it should have dropped “Walk” from the list of things to consider in the next iteration of the exchange, thus opening up the possibility of hearing “Work” instead.
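
Here is a rough Python sketch of that practice, assuming the recognizer hands back a ranked list of hypotheses with confidence scores on every turn.  The function names, the list format, and the scores are all hypothetical; the point is only to show a rejected hypothesis being remembered and skipped on the next try.

```python
def pick_hypothesis(n_best, rejected):
    """Return the highest-scoring hypothesis not already rejected, or None."""
    for text, _score in sorted(n_best, key=lambda pair: pair[1], reverse=True):
        if text.lower() not in rejected:
            return text
    return None


def run_dialog(turns, valid_options):
    """Walk through successive recognition attempts, remembering failures."""
    rejected = set()
    for n_best in turns:
        guess = pick_hypothesis(n_best, rejected)
        if guess in valid_options:
            return guess
        if guess is not None:
            # This guess did not move the exchange forward: never offer it again.
            rejected.add(guess.lower())
    return None


# Two attempts by the same speaker; the scores are made up for illustration.
turns = [
    [("Walk", 0.60), ("Work", 0.35)],
    [("Walk", 0.55), ("Work", 0.40)],
]
print(run_dialog(turns, {"Work", "Home"}))  # Work
```

On the second attempt “Walk” is still the top-scoring hypothesis, but because it has already failed once, the runner-up “Work” gets its chance.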

Outages today….

Actually, outages since last night, Saturday November 5th, around 11:00 pm EST (at least).  Then it failed again several times this morning and a few times midday.  What is interesting is that the speech recognition seems to be working just fine, and not just for English but for the other languages as well.  It’s the natural language processing that seems to be down: for instance, it transcribed perfectly my saying, “How is the weather like today,” but was unable to actually retrieve the weather information.  It responded with “Uh, oh, something is wrong” and variations thereof.  Which brings up an interesting point: it seems that Siri is performing the speech recognition and the natural language processing in two completely separate phases.
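
A toy Python sketch of what such a two-phase arrangement could look like follows.  It is purely my guess at the shape of things, not Apple’s implementation: every name in it is hypothetical, and the “audio” is just a list of words.  It simply shows how the first phase can succeed and return a perfect transcript while the second phase is down.

```python
class UnderstandingUnavailable(Exception):
    """Raised when the (hypothetical) language-understanding phase is down."""


def recognize(audio_words):
    # Phase 1: stand-in for speech recognition. The "audio" here is already
    # a list of words, so transcription is just joining them.
    return " ".join(audio_words)


def interpret(transcript, service_up=True):
    # Phase 2: stand-in for natural language processing.
    if not service_up:
        raise UnderstandingUnavailable()
    if "weather" in transcript.lower():
        return "get_weather"
    return "unknown"


def handle(audio_words, service_up=True):
    transcript = recognize(audio_words)
    try:
        result = interpret(transcript, service_up)
    except UnderstandingUnavailable:
        # The transcript survives even though nothing can be done with it.
        result = "Uh, oh, something is wrong"
    return transcript, result


print(handle("How is the weather like today".split(), service_up=False))
```

If the two phases were fused into one step, an outage would presumably take the transcription down with it, which is not what I observed.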

Siri: A Dream Come True!

Ever since I got into the Speech Technology and Natural Language Processing fields professionally about 15 years ago, I’ve had to grapple with the proposition that “Speech is just around the corner.” The proposition was both an admonishment of how, year after year, my professional community was collectively failing to fulfill its market promise, and a reminder that sooner or later something big (something epic, no less) was afoot, and that we, the Speech Recognition and Natural Language Processing professionals, were contributing to turning the corner to make that magical moment happen.

That moment has happened and it is called Siri.

Many in my industry are reacting — out of habit, I am sure — with caution: maybe we are not there just yet, maybe Speech has still not arrived, maybe this is yet one more bitter disappointment in the making.  I think not — and here is why I think we are dealing with the real thing:

1. Apple has come out full force behind Siri: I have yet to see a commercial about the iPhone 4S that didn’t feature Siri as the heart of the message.  Also to be noted is that the iPhone 4S looks almost exactly like the iPhone 4, highlighting the point that Apple wants to make about what is new with its newest device: it’s not how it looks but how it sounds that is revolutionary.
2. Apple is obsessed with protecting its brand: The fact that Siri has been released as a Beta product is a good indication of just how carefully Apple is treading, even as it does so boldly.  And having taken the plunge, and so publicly, Apple has crossed the Rubicon: it will do everything that it can to ensure that Siri is a success.   Failure is not an option, because damaging the brand is not an option.
3. Apple obsesses like no other company over the usability of its products: And the only way that Siri will be a success is by ensuring that it is highly usable, like any of its other products.  And Apple has understood better than any of its competitors that one cannot take the human out of language.  While Google gives the Android phone user “Voice Actions,” Apple gives its users an Assistant, with a name, a voice, and a sassy personality to boot.  It is very telling that Google saw the interaction as human speaking at a machine, while Apple has introduced the concept of a human talking to a machine, and crucially, the machine talking back to the human.
4. Apple’s user base is a loyal constituency that will jealously defend and earnestly cheer on Apple’s products: The best way to get a sense of just how ecstatic the “fanboy” base is about Siri is to listen to the iPhone Live podcasts following the release of the 4S.
5. Siri is a monumental endeavor — a mountain to conquer that is worthy of the large collective ego of Apple: Siri tackles two of the most challenging problems in the field of Artificial Intelligence: Automatic Speech Recognition and Natural Language Processing.  Only the egos of Steve Jobs and Apple by extension could have been big enough to dare take on such a massively difficult undertaking.  But then again, the only quantum leap in usability that remained to be conquered was the ultimate UI frontier: i.e., the interface that is the most natural, the most intuitive, the one that requires no new learning: naturally spoken language.
6. Siri solves the iPhone’s real-estate problem without compromising the need to keep the real-estate small: Indeed, what makes the iPhone (and for that matter any smartphone device) compelling technology is the fact that it is a computer that fits into your pocket.   But its very size compromises its usability: it is hard to read from and hard to type into.  Listening rather than reading, and speaking rather than typing are elegant solutions in those situations where it makes more sense to listen or talk rather than read or type.
7. Siri solves the iPhone’s keyboard problem without compromising on Jobs’ commandment that there shall never be a non-flat keyboard on any of his mobile devices: At least for those among us in the Speech and Natural Language fields, it was a happy turning point, now looking back, when Steve Jobs decided that we had to do away with the plastic keyboard.  People have tried to adapt to the flat screen keyboard, and some have successfully found their comfort zone, but typing on the iPhone or the iPad will never be a gratifying experience.
8. Siri is a direct threat to the business model of Apple’s nemesis, Google: If I get my answers from sources other than Google and if I can’t click on ads because I don’t get to see them, how is Google going to continue raking it in?  Is it just a simple coincidence that Apple bought Siri the company barely a month after HTC introduced an Android phone in January 2010?
9. Siri will only improve with time: Since every single interaction between you and Siri is mediated by a Cloud, Siri is continuously gathering data about how people are using it, what they are asking for, and how they are asking their questions.   This means that just like us humans, Siri will learn and improve with every passing day.  Imagine Siri a year from now, or five years from now.
10. Apple is a company with deep pockets ($86 billion in cash reserves): Apple paid a paltry $200 million for Siri the company back in early 2010, and I am sure it has spent at least that much if not more since then to get the product to market.  With its brand riding on the success of Siri and pockets that will only continue to deepen, Apple has what it takes in terms of raw resources to make the several mind-boggling quantum leaps that we can happily look forward to.

The conclusion is inevitable: Apple will continue to invest in the product.  Users will put up with its deficiencies, not only because it works well enough even in its very nascent phase but also because they see and understand the noble vision and love Apple for taking such a bold step.  And so the product will have enough time to improve and to establish not only itself but a whole industry dedicated to building intelligent solutions that people can talk to naturally. And as Steve Jobs would have put it, it will just work; it’s that simple!
