By Todd Bernhard on Thu, 05/12/2011
There are a lot of apps that claim to do speech recognition. In reality, what those apps typically do is record your speech and upload that recording to a server that does all the heavy lifting. Examples include the very popular Dragon Dictation, Vlingo and, most importantly, Siri, which was acquired by Apple. This method can be effective when you have a fast network connection, but the Holy Grail of speech recognition is to perform it entirely in-app, on the device itself.
Fortunately, SpeakWithMe has managed to do just that. VoiceDJ is their flagship app, and it's FREE for ten uses per day. A $1.99 in-app purchase unlocks unlimited use. Simple commands like volume up, next track, etc., do not count towards this limit. I like this approach because it shows how confident they are that the app will recognize your voice and that you will want to use it repeatedly. And it did indeed recognize my voice, out of the box, without any voice training, after indexing my iPhone's music database. The developer claims it handles a vocabulary of one million English words with 99% accuracy. What's more, their on-device technology enables a response time measured in milliseconds, compared to seconds for competing server-based apps. Dragon does offer some on-device recognition, but only for 5,000 words by comparison.
SpeakWithMe's first effort is based on music, but this technology could be used in a wide variety of apps. They offer an SDK so developers can license the technology for their own apps. So instead of saying 'Play Brown Eyed Girl, by Jimmy Buffett,' someday you could say 'Directions to Key West' directly to a navigation app! A photography app could use voice recognition to snap a picture when it hears 'cheese!' SpeakWithMe is looking at gaming, social networking, and in-vehicle opportunities.
I had the opportunity to speak with their CEO, Ajay Juneja, during lunch at CES. We discussed Apple's purchase of competitor Siri; SpeakWithMe feels that Siri's technology is limited by being server-based. Of course, the goal for Apple would be to migrate the technology into iOS, but with SpeakWithMe, it's already done and available today. It should be noted that the iPhone does have some Voice Control capabilities that overlap with VoiceDJ, but sometimes I feel like I need to be a contortionist to activate Voice Control... hold the home button for two seconds until it kicks in, but if you hold it too long or not long enough, it doesn't work!
VoiceDJ is part proof of concept, part customer-ready app. It's not perfect (it crashed when I tried to play songs that had long names), but it's a nice utility that could use some usability refinements. You still need to look at the screen, which is partly Apple's fault, as there is no hard button you can press to enable listening mode. Other apps like Vlingo provide a larger button and the option to either hold down the button or press and release it to enable listening mode. To the developer's credit, you can enable the proximity sensor to work as a trigger. Just bring the iPhone to your mouth and talk! It's not exactly hands-free, but it's close and a nice touch. I should disclose that I went to the same college, Carnegie Mellon University, as Ajay, albeit decades apart! Much of the technology used in the app owes its heritage to CMU's work in speech technology.