Siri Notes

Blogging about the Siri Phenomenon

Archive for the month “December, 2011”

Happy Holidays from Siri — Not!

vLingo Embraces Nuance: Sores and All….

After more than a year of a two-way litigation saga between Nuance and vLingo, vLingo has apparently fully bowed to Nuance’s relentless quest to buy it. The terms of the acquisition were not disclosed. The announcement was a complete shock to most observers of the industry. Not only was the clash between the two companies bitter and at times venomous, but one would not have expected vLingo to fold so soon after its court victory last August over bullying Nuance.

But upon reflection, the move is a completely rational one for both entities in light of Siri, and lately of Google’s Majel (although I am sure the negotiations started long before rumors about Google’s Voice Assistant project began circulating in earnest late last week).

For vLingo, the move is nothing short of a pure self-preservation play. Remember that vLingo had to make its $9.99 offering on the Android market free soon after Google’s launch of Voice Actions last summer. Then it had to do the same thing again — making its offering free — immediately after the arrival of the iPhone 4S and Siri in October. Then a month later — no longer waiting for any more shoes to drop — vLingo made its offering on the BlackBerry free as well. That is, vLingo was a technology company with a compelling offering, but no business model. Worse, the emergence of general Voice Assistants that were tightly integrated with their native OS represented a serious erosive threat that was bound to shrink its user base to the empty set: why would anyone use vLingo when they can use Voice Actions or Siri?

For Nuance, the acquisition of vLingo will enable it to have a play in the Voice Assistant game. Whether that is a wise thing to do at this point remains to be seen. To be sure, the move will trigger movement from Apple to seek an alternative Speech Engine rather than rely on what is now a vendor-competitor — a position that Nuance seems to feel is a perfectly proper way to do business in the IVR world. This is a company that makes a good chunk of its money selling ports to Voice Solution providers and at the same time merrily goes ahead and actually bids on RFP Solution projects AGAINST those very clients, whose bread and butter is building, deploying, and hosting such solutions. How tacky, Paul Ricci, how tacky!

Either that or the vLingo acquisition was triggered by movement from Apple to shed off Nuance, triggering Nuance to move away from the ASR commodity model (at least within the mobile voice sector) and towards the Assistant App model.  I for one would cheer mightily if and when Siri sheds Nuance off.

In any case, as we move forward in this Brave New era of the Voice Assistant wars, let’s pause to remember the immortally apt words of Dave Grannan, CEO of vLingo, who said back in May 2011:

Competing with Nuance is like having a venereal disease that’s in remission.  We crush them whenever we go head-to-head with them. But just when you’re thinking life is great—boom, there’s a sore on your lip.

No word yet on whether Mr. Grannan will stay with Nuance or move on to better things.

New Voice Assistant from Google?

Rumor is rampant that Google is feverishly working on releasing its answer to Siri in the next couple of months — if not indeed in the next couple of weeks!  The project, if not the product itself, is called Majel (as in Mabel), named after Majel Barrett-Roddenberry, the actress behind the voice of the Federation Computer from Star Trek.

This is exciting news, especially for a voice assistant enthusiast like myself.  Indeed, having several assistants (let’s not forget Microsoft) get into a Rock’em-Sock’em kind of match will only strengthen everyone involved, and can only spell good news for the user base.  But a few remarks are in order.

First, let’s all pause to remember — before it is erased from our collective memory — that Google’s Andy Rubin, the man in charge of Android’s development, famously said that smartphones should not be perceived as assistants, but rather simply as tools for communicating with OTHER PEOPLE (my loud caps):

Your phone is a tool for communicating. You shouldn’t be communicating with the phone; you should be communicating with somebody on the other side of the phone.

Clearly, the man was totally clueless and didn’t know what he was talking about, and Siri  completely blindsided him and Google.  And now, he and Google are eating their words and feverishly working on developing nothing less than an assistant that will help Android users to talk to their phones!

However, according to Matias Duarte, Senior Director of Android User Experience at Google, Google is pursuing a different tack from Apple:

Our approach is different. The metaphor I like to take is – if it’s Star Wars, you have these robot personalities like C-3PO who runs around and he tries to do stuff for you, messes up and makes jokes, he’s kind of a comic relief guy. Our approach is more like Star Trek, right, starship Enterprise; every piece of computing surface, everything is voice-aware. It’s not that there’s a personality, it doesn’t have a name, it’s just “Computer.” And you can talk to it and you can touch it, you can interact with it at the same time as you talk with it. It’s just another way to interface with the computer.

Interesting, but probably telling in how Google, even as it dashes to give birth to Siri’s rival, still doesn’t fully get it: you cannot take the human out of human language.  People cannot speak to a machine and listen to a machine without injecting into the interaction all of the rules of conversation that they have learned — because the only context within which they have learned those rules is the context of talking to another human being.  Apple’s breakthrough move was to give Siri a voice, so that you can hold a conversation with your assistant.  Google still thinks that it’s about discrete actions and, tellingly, thinks that voice interaction “is just another way to interface with the computer.”  It really is not.  It’s a special interaction because it borrows a medium that is at the core of what makes us human.  All this time, with typing, and clicking, and swiping, and tapping, we have been using a language that we had to learn from scratch and that was dictated to us by the machine and the people who built those machines.  In the case of human, natural language, it is radically the other way around: the computer is learning our language and how we use that language, and will have to adapt to it or fail.

Two interesting things to watch for as Majel emerges are: (1) What type of Natural Language will Majel use? and (2) What types of API enhancements will they provide to the developer base?  On the first, the recent purchase by Google of Clever Sense, the company behind Alfred, a personalized restaurant and bar recommendation app, is a strong indication that Google is retracing Apple’s exact steps, except two years later.  The engine behind Alfred will probably be broadened to provide assistance on several fronts, just as Siri’s was broadened to handle Calendar, SMS, etc.  On the second — the APIs — I expect that Google will have a leg up on Apple, since Google already has published hooks to the Speech engine and to Text-to-Speech, and will only have to figure out how to expose its Natural Language processing to empower developers.  If it does, it will be Apple’s turn to play catch-up….

Last thing — the obvious: this is not the first time that Google will be playing catch-up.  It played catch-up by releasing its Android phone a couple of years after Apple released the iPhone; it is trying to play catch-up to Facebook with its Google+ initiative; and now it is playing catch-up with Majel.  Indeed, one could say that Google is now the new Microsoft: a company so blinded by its business model that it can’t see beyond the next step, and so devoid of the capacity to generate its own strategic thinking that its “strategy” is now reduced to sitting back, letting the innovators innovate, and then, thanks to its deep pockets, stepping in and building stuff.  Which, let us all note, is something that Microsoft has gotten real good at doing very smoothly and without a hint of embarrassment.  Just watch how Microsoft will soon come out and reveal the name of its assistant, and not feel one bit sheepish about doing so.

Smart Sometimes, Dumb Other times

In my previous post, I noted how Siri seems to be informed by the interactional context to resolve the meaning of an imperfect Automated Speech Recognition output: the ASR returned “David Bully” instead of “David Bowie,” but the post-processing was smart enough to correctly guess my intent by noting that “David Bully” was “close” to “David Bowie” (I’m not sure whether the distance calculated is phonetic distance or simply character distance).
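The character-distance hypothesis is easy to sketch with plain Levenshtein (edit) distance. To be clear, this is only an illustration of one plausible mechanism — Siri’s actual metric, phonetic or otherwise, is unknown, and the artist list here is invented:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def nearest(heard: str, candidates: list[str]) -> str:
    """Map a possibly mis-recognized phrase to the closest known name."""
    return min(candidates, key=lambda c: levenshtein(heard.lower(), c.lower()))

# Hypothetical slice of a music library, standing in for the real context.
artists = ["David Bowie", "Bob Dylan", "The Beatles"]
print(nearest("David Bully", artists))  # -> David Bowie
```

Even this crude character-level measure is enough to rescue the “David Bully” transcript, which is the behavior observed above.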

Which I thought was pretty impressive and a step in the right direction compared to what I had observed last month in a video, where it was not able to guess that the mis-recognition “Walk” should be mapped to “Work,” given that the other option was “Home.”  I confirmed the behavior independently after watching the video.

Yesterday I tried the experiment again, and to my disappointment, it still behaves the same:

So, it doesn’t seem that Siri’s context leveraging is consistent across domains.  In the case of Music, it seems to have the smarts.  In the case of Voice Dialing, it doesn’t — which is strange, given that in Voice Dialing the problem seems to be easier….

What is interesting is that I tried very hard several times to say “Walk” when I was prompted to select between “Work” and “Home,” and the closest rendering Siri gave me was “Walck” (see above).  But when I am off context — i.e., I just bring Siri up — and say “Walk,” the ASR gets it perfectly every time.

So, the ASR is definitely taking context into account, but it’s puzzling why context is not constraining it in the right way (by privileging the offered options).
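Privileging the offered options is, at bottom, a one-line rescoring step: instead of trusting the raw transcript, compare it against only the choices currently on the table. A minimal sketch using Python’s standard-library difflib — purely illustrative of the idea, not of how Siri is actually built:

```python
from difflib import SequenceMatcher

def pick_option(transcript: str, options: list[str]) -> str:
    """Constrain a raw ASR transcript to the closest currently-offered option."""
    return max(options,
               key=lambda opt: SequenceMatcher(None, transcript.lower(),
                                               opt.lower()).ratio())

# "Walck" is the rendering the recognizer actually returned when "Work" was meant.
print(pick_option("Walck", ["Work", "Home"]))  # -> Work
```

With only “Work” and “Home” in play, even a mangled rendering like “Walck” lands on the right choice — which is why the Voice Dialing behavior is so puzzling.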

Siri is Getting Smarter?

I complained last month about how Siri could definitely get smarter in terms of how it interprets the speech input it receives.  My point was that it should make what seems to be very basic intelligent use of its context.  In the example I cited, Siri didn’t seem to have the smarts to detect that “walk” (the utterance it heard instead of “work”) was closer to “work” (the intended utterance) than to “home” — with “work” and “home” being the only two options under its consideration.  Pretty dumb behavior, I would say.  Today I was pleasantly surprised to discover that Siri seems to have overcome this hurdle.

Here is my interaction with it:

The thing worthy of note is how it was able to map “David Bully” (what it thought I had said) to “David Bowie” (what I actually said) by leveraging the context of my speech — i.e., Music.  “David Bully” is closer to “David Bowie” than to any other artist.  At least that’s what I think is happening.  If so, very nice indeed.

My Interview with CBR on Siri and Beyond….

I spoke yesterday for about 30 minutes on the phone with Alan Swann from CBR, and here is the resulting article.  Enjoy!

The Wake-up Phrase: Why?

When you want to wake Siri up to listen to you and interact with you by voice, you press down the home button for a couple of seconds until you hear the double beep and see the Siri purple microphone button.  The same holds for Microsoft’s mobile Tellme solution.  In the case of vLingo and Android’s Voice Actions, you initiate the interaction more or less similarly by pressing a soft button. (With Siri and Voice Actions, you can also spark the voice interaction by lifting the device to your ear.) With Microsoft’s Xbox Kinect, however, you initiate the voice interaction by speaking the word “Xbox”. In technical parlance, the phrase “Xbox” is called a “wake-up phrase” — as in, “Wake up, computer, I want to talk to you.”

Check this video out:

At first blush, using a wake-up phrase may sound pretty neat: rather than pressing a button, hard or soft, you simply say a phrase and, magically, the system is at your beck and call, listening to what you have to say.   Yes, neat, but….

First is the obvious risk of misfires.  What if I say the word “Xbox” in the context of talking about the Xbox rather than in the context of commanding the device to come to life?  What about saying something that sounds like “Xbox,” as in “Text Bob,” or “Eggs Box,” or “Dick’s Boss”?  An empirical question, to be sure, but certainly a concern to look into.  This is especially relevant given that the use case — a home entertainment setting — is one where random, rambunctious noise will be the norm.

More serious is the fact that the device is — literally — constantly listening: a current of media is in fact streaming to the cloud, into software that is continually sniffing for the magic phrase.  Think about that: an audio channel piping the domestic sounds of your private hearth out into the open ether with full abandon.  And for what?  So that the damn thing can detect its wake-up phrase?
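The alternative — the one a physical button gives you for free — is a local gate: nothing leaves the device until the wake-up phrase has been detected on-device. A toy sketch of that architecture, with text frames standing in for audio and a hypothetical string check standing in for a real keyword spotter (which would work on raw audio):

```python
def contains_wake_phrase(frame: str) -> bool:
    """Stand-in for an on-device keyword spotter; real ones process raw audio."""
    return "xbox" in frame.lower()

def frames_to_stream(frames):
    """Yield only the frames heard AFTER the wake-up phrase; everything
    before it stays on the device and never reaches the cloud."""
    awake = False
    for frame in frames:
        if awake:
            yield frame          # post-wake audio may go to the recognizer
        elif contains_wake_phrase(frame):
            awake = True         # gate opens; the wake frame itself is dropped

living_room = ["dinner chatter", "tv noise", "xbox", "play some Bowie"]
print(list(frames_to_stream(living_room)))  # -> ['play some Bowie']
```

Whether Kinect actually detects the phrase locally or in the cloud is exactly the question; the point of the sketch is that the gating can, in principle, be done without streaming the living room anywhere.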

Here’s a thought: give me a remote control and have that remote control consist of nothing more than ONE BIG button that I can press to start and stop the speech recognition.  I would be perfectly happy with that — because, you know what: I do LOVE speaking to the TV rather than having to endure the miserable experience of clicking on tiny buttons and navigating senseless graphical interfaces.  Better yet, publish apps on all the mobile marketplaces that will enable me to start and stop speech on my Kinect.  Because you know what: I am simply NOT going to bring a machine into my home so that it can stream the noises of my living room to Microsoft’s cloud. That’s just not gonna happen.

The Fake Siri Backlash….

The old saying, “Give them an inch and they will take a mile,” sure does come to mind while reading Gizmodo’s Mat Honan’s piece about Siri.

The bottom-line complaint that Mat Honan has is this: Siri is not as intelligent as a human being, is not perfect, does not understand what you say every time, cannot always fully fathom what you mean or hint at, and therefore Apple is a liar and is perpetrating a fraud against its consumer base.

I exaggerate, if at all, only slightly.

According to Honan, Apple is in the business of delivering exquisite perfection when it releases its products: it did so every time with its previous products — with the iPod, the iPhone, and the iPad — and the hefty prices it charges for its products are well worth it.  But with Siri, it has broken its Market Promise:

The iPod wasn’t the first music player, but it was the best; it was simple and wonderful. The iPhone was not the first smartphone, but it changed people’s lives in a way that hadn’t happened before; it was intuitive and powerful. The iPad was not the first of its kind, but I waited for the Cupertino Nod to buy a tablet. You know what? It was worth the wait too.

In contrast, according to Honan, Siri is a deeply flawed product:

If I wanted a half-baked voice control system, I could snag an Android phone for $49 at T-Mobile. Instead, I waited, and gladly plunked down hundreds of dollars on a new iPhone in October—because it promised to be flawless (or close enough), like everything before it.

The tagging of Siri as a “Beta” product is, for Honan, a senseless Marketing gimmick, totally beside the point, because “Beta is for Google”.  Not sure what that means exactly, but one not-so-subtle implication is that the dumb public doesn’t understand what “Beta” means and the dumb public — especially the dumb Apple public — can’t deal with an Apple product unless and until it is baked to perfection.

Let’s start first by noting that the vast majority of people who have voiced an opinion on Siri have expressed a positive reaction: Exhibit A being the overwhelmingly positive stream of tweets on Siri.  Here is what the latest search results (as of the time of writing) give me:

If Siri were the unmitigated disaster that Honan claims it to be — “a lie” — a real backlash would have ensued.  Remember, we are dealing with a consumer base that expressed raw, black outrage when the iPhone 4’s antenna didn’t function perfectly when the device was held in a certain way.  The Apple fan base is loyal, to be sure, but it can turn into a fearsome opposition if it feels crossed.

Second: the expectation that Siri must be perfect before being released to the public is just plain silly.  No software on this planet is perfect — ever — let alone software that is tackling head-on two of the most complex problems in Artificial Intelligence (Speech Recognition and Natural Language Processing).  Why should Siri be singled out and expected to perform flawlessly?  Moreover, ask yourself: was the iPhone really that perfect, that awesome, and that life-changing of a product when it came out?  The answer is: No, it was not — not by a long shot.  I remember how for a whole year, I had no more than half a dozen apps that I cared about on my home screen, and how I hated the fact that I had to deal with the awful soft keypad (which remains equally awful to this day) for typing my email. Aside from looking up the weather, the stocks, my Gmail (my Calendar being as useless to me then as it is now, my work calendar being tied to Microsoft Outlook), and playing my music — and this only when I had to, given that my iPod gave me more control over how I listened to my music than the soft iPhone Music App — I just had no real use for my iPhone.  I still sent email and checked my calendar using my BlackBerry, and I listened to my music using my iPod.  And oh, yeah — my iPhone sucked at making phone calls too….

Which brings me to my third point: Honan lists a litany of Siri’s shortcomings and describes these shortcomings in tragic tones, as if they were fatal sins that will condemn Siri to eternal damnation.  But in fact, every single one of those “fatal sins” is nothing more than a simple blemish that will vanish as Siri matures: (1) Speech Recognition is certainly not perfect, but it will — and inexorably does — get better, day after day, thanks to the fact that Apple got Siri into the hands of the dumb public so that they could submit the training samples that the speech recognition software needs to evolve its model; (2) Siri is not yet smart enough to know that asking it for the fastest way to an emergency room is more or less the same as asking it for the fastest way to a hospital — but getting it to be smarter in that way is now a trivial problem because, again, the application is in the hands of the dumb public, so that Siri’s designers and developers can learn the many ways that people ask for medical help; (3) Siri doesn’t know how to give the user directions from one location to another — something that GPS apps have been able to do for years now…. But then again, how many other “obvious” things do you want Siri to be able to do?  The answer is: you want it to do EVERYTHING — from giving you directions, to doing the dozens of things you may want to do on Facebook, or Twitter, or LinkedIn, or Hulu, or Netflix, or PayPal, or Gmail, etc.   Is it really reasonable to expect Siri to be able to do everything under the sun? The answer is that it won’t — it will never be able to do all the “obvious” things you want it to do — ever — let alone during its maiden release.  The universe is just too vast for it to do it all by itself.

In any case, one thing is for sure: thank God Apple decided to put Siri in the Cloud, so that it can learn and get better with every one of the millions of interactions it engages in every day.  To be sure, it sucks and is a major bummer that I can’t use Siri when the network is down — but you know what, aside from playing music, my iPhone is more or less useless to me anyway when the network is down….

Siri is definitely flawed and in many cases outright dumb: it does not take full advantage of context; the fact that it does not launch apps is very frustrating; and the fact that it treats complex requests rather primitively (it takes only the first request it understands and ignores the rest) is disappointing.  But the two key things to keep in mind about Siri are the following: (1) Siri is a better voice and speech assistant than anything out there — and better by a good mile; and (2) Siri is an evolving product — evolving as a child would as it learns how to interact with people.  And just as we all knew that the iPhone was a revolution in the making and that we had to put up with its frustrating shortcomings in its inchoate stages, so must we all be equally patient and tip our hat to Apple for starting yet another revolution with Siri.  Siri is — just like the iPhone was — a worthy cause.
