I'm a little biased (I used to work on the Google speech team), but it seems very hard for a startup to compete on the basis of accuracy, and for wearables like watches it's pretty clear both Google and Apple are putting third-party APIs for voice interfaces (including command-like syntaxes) front and center. A lot of earlier speech/NLP startups have struggled with this dynamic--although an aggressive, well-executing team can get a year or so ahead of the platform, if you do something too close to its core competency, Google/Apple will eventually build the same feature directly into the operating system, and then you're stuck competing with a team of 100+ PhDs that has a 1000x distribution advantage. At least, that's what would give me hesitation about building a speech/NLP API startup in 2014.
I also noticed you're running a conference on voice interfaces (http://listen.ai/). I'm not sure how well-connected you are to the speech folks at Google/Microsoft/Apple, but if you decide you want somebody from Google to speak, I'd be happy to ping some of my former colleagues on your behalf. Looking at the agenda, I think the area where they could provide the most coverage is the core technology--acoustic modeling, deep learning, hotword detection, or embedded recognition.
We differentiate ourselves from the Android Speech API in several ways:
1) As a developer, Google gives you no way to customize the speech engine by providing your own language model. Wit.ai builds a specific, customized configuration for each app. If your app is domain-specific and you cannot tell Google what kind of input to expect, accuracy will be poor, especially in noisy environments. Wit.ai builds a specific language model for each app automatically and in real time (the model is updated every time Wit.ai learns new examples from your app), and it queries several speech engines in parallel. To do this it uses not only your data, but also relevant data from the community. This is the core of our value proposition and not something Google provides today.
2) Google keeps its Natural Language Understanding layer (the part that translates text into structured, actionable data) to itself. Developers cannot access it; they're left with free text, but they often need actionable data.
3) Wit.ai is cross-platform. We have SDKs for iOS, Android, Linux, etc. [1], or you can just stream raw audio to our API. The Android Speech API is only available on Android (well, you could hack it and use it from elsewhere, but you're not supposed to, and you can be shut down at any time). More and more wearables and smart devices will run Linux. For instance, hundreds of developers use Wit.ai on the Raspberry Pi.
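For concreteness, here's a rough sketch of what "stream raw audio to our API" could look like from any platform that can make an HTTP request. The endpoint path, headers, and response shape below are my assumptions, not something confirmed in this thread, so treat it as illustrative and check the Wit.ai docs for the real contract:

```python
import requests  # assumes the third-party 'requests' library is installed

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # hypothetical placeholder, not a real token

# Post a short WAV recording and (assuming a /speech endpoint) get back
# structured data such as intent and entities rather than plain free text.
with open("command.wav", "rb") as audio:
    resp = requests.post(
        "https://api.wit.ai/speech",  # assumed endpoint
        headers={
            "Authorization": "Bearer " + WIT_TOKEN,
            "Content-Type": "audio/wav",
        },
        data=audio,
    )

resp.raise_for_status()
print(resp.json())  # assumed to contain the recognized text plus intent/entities
```

The same request works from a Raspberry Pi, a Linux wearable, or a server, which is exactly the cross-platform argument in point 3 above.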
As for the Apple doc you linked, it's Mac only (no iOS), and it only recognizes a few phrases you provide in advance. I think it's a very old API that's still around :)
Regarding listen.ai: yes, please, we would love to have Google (especially the Google Now team) there. We have the Siri founder, the top Cortana guy, the former CEO of Nuance... but nobody from Google yet.
Having had RSI for a while, I can't tell you how much I've wished for point-(or look-)and-speak interfaces. I literally haven't found a single case in which I couldn't quickly dream up a superior point-and-speak version of an existing UI.
Overall I came away with the conclusion that look-and-speak is probably the most deeply ingrained user interface there is--perhaps the only one you could argue is truly intuitive, since it seems to be genetically hard-wired.
On top of that, modelling UIs as hierarchical state machines is an astoundingly simple and elegant approach; it even allows you to leverage persistent data structures to do amazing things. I've explored that to some degree in https://speakerdeck.com/mtrimpe/graphel-the-meaning-of-an-im...
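To make the hierarchical-state-machine idea concrete, here's a minimal illustrative sketch (mine, not taken from the linked talk): each screen is a state that can nest inside a parent state, and a spoken command is an event that bubbles up the hierarchy until some ancestor handles it.

```python
class State:
    """A UI screen modelled as a state in a hierarchical state machine."""

    def __init__(self, name, parent=None, transitions=None):
        self.name = name
        self.parent = parent
        self.transitions = transitions or {}  # spoken command -> target state name

    def handle(self, command):
        """Walk up the hierarchy until some ancestor knows this command."""
        state = self
        while state is not None:
            if command in state.transitions:
                return state.transitions[command]
            state = state.parent
        return None  # nobody handled it


# "home" is the root screen; "settings" nests inside it and inherits its commands.
home = State("home", transitions={"open settings": "settings"})
settings = State("settings", parent=home, transitions={"toggle wifi": "wifi_dialog"})

print(settings.handle("toggle wifi"))    # -> "wifi_dialog" (handled locally)
print(settings.handle("open settings"))  # -> "settings" (inherited from the parent)
```

The nesting is what keeps it simple: "global" commands live on ancestor states, so every child screen inherits them for free.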
Any chance listen.ai will either be livestreamed or videos made available later (a la confreaks or similar)? I can't make it, but I'm super interested in ALL of this and really, really want to learn.
A week ago Mark Suster wrote an interesting article about the definition of a "seed round":
> If it looks like an A-round, smells like an A-round & tastes like an A-round … it’s an A-round. My personal definition? It is less about actual money and more about structure of your Cap Table. If you have raised $2-4 million from a bunch of high-net-worth individuals I simply don’t see it as an A-round. If you raised $2 million from two small seed funds I probably don’t either (although in the past I would have). But if you raised $3-5 million from well-known seed funds or from a VC and you’re asking for $8-10 million in your next round … that next round is a B-round no matter what we collectively decide to call it when we VCs fund you.
My personal definition of a seed round is a round where you don't give up any board seat or special power to investors. After a seed round you should basically work as usual (product, users, product, users, ... nothing else). By this definition, our round qualifies as seed. Managing a board takes time and energy and the more you can delay this, the better (from my experience).
That being said, this is a very subjective notion and everybody is free to have their own.
I think the piece you quoted shows quite well that the definition of a round doesn't matter to anyone but VCs. The important stuff is: an awesome product!
Congrats to the Wit.ai team! Not sure there are any other companies laser-focused like this on NLP + IoT.
@ar7hur The pricing page[1] shows that the Community (free) plan allows unlimited queries, but the Starter plan is limited to 250 queries per day.
Did you mean that queries to open instances remain unlimited, while the query limit applies only to the three private Wit instances? If so, I'd recommend another footnote on your pricing page to clarify this distinction.
Yes, open instances are free and unlimited. This is the cornerstone of our approach: we want developers to work together and share their training data. Natural language is very hard, and we need to join forces to crack it.
Thanks for the feedback, we'll try to make this easier to understand on the pricing page (yeah, natural language generation is hard for humans, too!)
Awesome, thanks! Also, the other ambiguity is whether the private-instance query limits are per instance or an aggregate total across all private instances.
Fun fact: "Wit.ai" can be pronounced just like "Witaj" in Polish, which means "Welcome/Hello". Dunno if this is intentional or even acknowledged by founders. ;-)
One of the founders is named Laurent Landowski. Could this be a Polish name? Polish also looks to be in beta for them, which is one of only a handful of languages they support.
Wow, congrats on the round! Looks like an amazing service. Will be trying it soon for a project I'm working on, Android speech APIs aren't quite cutting it.
Can this be run continuously from a Service on Android? Didn't see a mention of it in your docs, but I've yet to play with it.
I'm curious how you differentiate yourself from the built-in speech APIs on iOS and Android? https://developer.apple.com/library/mac/documentation/Cocoa/... http://developer.android.com/reference/android/speech/Speech... http://developer.android.com/reference/android/speech/Recogn...