Tech Optimist: An Alumni Ventures Podcast | Transcript: #63 - Meet the Start-Up at the Center of the Voice-Computer Revolution


October 15, 2024 / 30:59 / E63

Sam:
Do you prefer whispering over typing? Then, listen up.

Tanay Kothari:
What makes Flow unique compared to other voice dictation tools that have come up in the past is that every other tool tries to write everything you say word for word, which is not what you want. You speak very differently than you write. We built what was one of the world's first voice assistants. This was before Siri, Alexa, any of those. And to people, it just felt like magic. You can break through the standard limitations of human-computer interaction.

Naren Ramaswamy:
It's the new version of a keyboard that basically doesn't require a keyboard. So, yeah, just using your voice. That's cool.

Sam:
Hello, everyone. Welcome back to this episode of the Tech Optimist. We have another Meet the Startup episode for you today, and the startup we talked to is Wispr. Behind the mic, with an AV jersey on, is Naren Ramaswamy, Senior Principal here on the Alumni Ventures team. Our guest today is Tanay Kothari, co-founder and CEO of Wispr Flow. And you all recognize my voice. My name is Sam. I'm the guide and editor for this show.
So, honestly, when it comes to introducing this episode, I don't have much to say, because Naren did it for me. So, I'm going to cut to the chase and get us right into the episode. Sit back, relax. We so, so hope you enjoy this episode. You'll be hearing from me in a few minutes, so don't go anywhere.
As a reminder, the Tech Optimist podcast is for informational purposes only. It is not personalized advice and it's not an offer to buy or sell securities. For additional important details, please see the text description accompanying this episode.

Naren Ramaswamy:
Hi, everyone. Welcome to this episode of the Tech Optimist, a podcast hosted by Alumni Ventures. My name is Naren Ramaswamy. I'm a Senior Principal at the firm. Today, we're excited to chat with Tanay Kothari, the CEO of portfolio company Wispr AI. Before we begin, a brief background on Wispr.
Wispr is redefining how we interact with technology using our voice. The company has launched a new platform called Flow, a smart voice dictation platform using AI. The platform uses context from your screen, as well as AI algorithms, to create a much more efficient way to interface with your computer, since we speak about three times faster than we can type on a keyboard.
In this chat with Tanay, we're going to talk about how Wispr is blurring the human-machine interface, what he's seeing in the AI space, and what the future of voice technology looks like. Hope you enjoy it.

Sam:
All right. Naren set us up perfectly for this episode. I'm going to take the baton for a few seconds and get everyone on the same page about Wispr Flow: what they do as a company, their values, and everything. Before we hop into the conversation with Naren and Tanay, I think it's powerful to know what the company is doing and what their values are before we hear from them, right? Kind of let their work speak for itself.
So, this is right on their about page, right on their website. I did the dive so you don't have to, and I'm just going to read it to you. So, building voice intelligence that understands you. We are a team of designers, AI researchers and engineers who step away from the status quo to rethink the fundamental layer of computing, how humans interact with technology.
We want to craft voice interfaces that are both useful, so you trust them, and ubiquitous, so you can use them everywhere. For us, it's the only way we move from screen-first technologies to voice-first experiences and create a future where we aren't stuck looking at screens all day. Our first product, Flow, makes voice dictation delightful. We focus on the biggest use case for technology: letting people communicate their thoughts with others and with AI.
Over the last few months, Flow became the first consumer voice dictation platform that makes people want to use voice more than their keyboards. And we're just getting started. We care about designing with incredible attention to detail. We care about building intelligence that mimics humans. We care about building experiences that feel intuitive. We care about building software that seamlessly fits into your life and we care about building magic. If this excites you, we'd love to work with you. All right. Let's hop in.

Naren Ramaswamy:
So, with that introduction out of the way, let's jump in. Tanay, thank you for joining us today.

Tanay Kothari:
Naren, thanks for having me.

Naren Ramaswamy:
That's great. You've had a really interesting background in linguistics and computer science. Could you start with a little bit of that background and how it led you to founding Wispr?

Tanay Kothari:
Yeah, of course. I started building in this space about 15 years ago, back when the first Iron Man movie came out, because I wanted to build Jarvis. A buddy of mine and I built what was one of the world's first voice assistants. This was before Siri, Alexa, any of those. And to people, it just felt like magic. I think at its peak we had about two and a half million users, and then we got shut down, which is a story for another time.
But what always drove me to this space was that when I thought about how we interact with technology, it just felt very mechanical and cold. And given that technology was going to be a much bigger part of our lives, I wanted interacting with it to feel just as natural as interacting with other people, which is where all of this came from. After that, my love for languages grew. I was on India's Linguistics Olympiad team, which honestly had less to do with this and more to do with the fact that it was entertaining: they were fun problems to solve, and that's just something I enjoy doing in my free time.

Naren Ramaswamy:
That's awesome. And then, you spent some time at Stanford studying computer science. When did you have the vision for something like Wispr and how did that evolve over time?

Tanay Kothari:
It's honestly been the exact same vision for the last 15 years: how do you make technology feel like something that fades into the background, so you can focus on being more present? And that, to me, is a very unique part of voice interfaces in general. When you think about current technology interfaces, they're all screen-first, which means you're always distracted, looking down at a screen like this. If you have voice interfaces that are both useful, so you trust them, and ubiquitous, so you can use them everywhere, that's the only way I imagine we can step away from this world of screens and actually make technology feel more seamless, so it works for us rather than us working for it.

Naren Ramaswamy:
Yeah, I think that's a fascinating vision. And I know you talk a lot about efficiency as it relates to today's technology. We use our thumbs to type into a keyboard, but our brain is actually processing thoughts much faster than that. And I'd love for you to educate our listeners a little bit about the productivity lost as a result of that, and what opportunity you see there and how something like Wispr can actually become a breakthrough technology in terms of productivity.

Tanay Kothari:
Yeah, for sure. The average person types at about 40 words per minute, and less than 1% of people in the world type faster than 80 words per minute. The fastest typists cap out somewhere around 130 to 140. If you think about how fast we speak, on average it's about 120 to 140 words per minute, which is already three to four times faster than you can type. And when you're thinking in your mind, your brain runs even faster than that.
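Taking the figures quoted above at face value (about 40 words per minute for the average typist, 120 to 140 for speech), a few lines of Python show how they translate into time. The constants are the speaker's rough averages rather than new measurements, and the 300-word email is a hypothetical example.

```python
# Rough averages quoted in the conversation (words per minute), not new measurements.
TYPING_WPM = 40
SPEAKING_WPM_RANGE = (120, 140)


def minutes_to_write(words: int, wpm: float) -> float:
    """Minutes needed to produce `words` at a steady `wpm`."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return words / wpm


def speedup(fast_wpm: float, slow_wpm: float) -> float:
    """How many times faster one rate is than another."""
    return fast_wpm / slow_wpm


low, high = SPEAKING_WPM_RANGE
print(f"Speaking: {speedup(low, TYPING_WPM):.1f}x to {speedup(high, TYPING_WPM):.1f}x typing speed")
# → Speaking: 3.0x to 3.5x typing speed

email_words = 300  # hypothetical email length
print(f"Typed: {minutes_to_write(email_words, TYPING_WPM):.1f} min, "
      f"spoken: {minutes_to_write(email_words, low):.1f} min")
# → Typed: 7.5 min, spoken: 2.5 min
```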
So, overall, when you're trying to do any sort of work, you know in your mind what you want to say or write, and your biggest bottleneck is your fingers moving over the keyboard, an archaic piece of technology that has existed since the 1800s, which we still use because nothing else has worked as reliably yet.
And what we clearly see when we give Flow to our users is that you just don't realize how much of your time and mental bandwidth goes into making sure things are formatted well, that there are no typos, that you're putting the full stops in the right places...

Sam:
More on Flow's technology right after this.

Matt Caspari:
Hey, everyone, just taking a quick break so I can tell you about the Deep Tech Fund from Alumni Ventures. AV is one of the only VC firms focused on making venture capital accessible to individual investors like you. In fact, AV is one of the most active and best-performing VCs in the US, and we co-invest alongside renowned lead investors.
With our Deep Tech Fund, you'll have the opportunity to invest in innovative solutions to major technical and scientific challenges: companies that have the potential to redefine industries, create a more sustainable future, have a hugely positive effect on society, and deliver significant financial returns. So, if you're interested, visit us at av.vc/funds/deeptech. Now, back to the show.

Tanay Kothari:
... versus if you can just let your thoughts flow, you get a lot more out. It lets you be way more creative, and it removes the bottleneck of the keyboard limiting how fast you can get words out. What we eventually found is that people using Flow output about 120 words per minute on average. Some people go above 200, even 250 words per minute, which I think is infeasible with a standard keyboard. And once they start doing this for everything at work, like sending emails, Slack messages, and writing long documents, the productivity gains just compound.
And in terms of actual output, as a manager, you're way more responsive to your team. If you're doing customer support or sales, you're reaching out to a lot more people. If you're a developer using ChatGPT or Claude or Cursor to get your work done, as we all do, you can now be so much faster and produce more. At the heart of it, you can break through the standard limitations of human-computer interaction.

Naren Ramaswamy:
Yeah, absolutely. It's a new paradigm for how we communicate with technology. And so, just to back up a little bit for our listeners, I know you mentioned Flow.

Tanay Kothari:
Yup.

Naren Ramaswamy:
We haven't actually introduced Flow to the listeners yet. I know it's your new product, but could you share a little bit with the audience about what Flow is and how it works? And I know you're about to do a public launch, so maybe share more on that.

Tanay Kothari:
Oh, yeah, of course. Flow is super simple. You speak and it writes for you in every application, in your style. Our first product is a Mac app that runs in the background. You just hold down a key and speak naturally, and Flow understands where you are and what you're trying to do, and it writes the messages as you would have written them.
And the thing that makes Flow unique compared to other voice dictation tools that have come up in the past is that every other tool tries to write everything you say word for word, which is not what you want. You speak very differently than you write. The second thing is that all these systems optimize for completely the wrong metric. If you're technical, you'll know most speech recognition systems try to make the word error rate really low, but that's a technical metric; it's not what matters to the end user.
The metric we care about is the percentage of zero-edit messages: of all the messages you write with Flow, how often you don't have to go back and edit something before you send it. With Apple Dictation or Google or OpenAI's Whisper, that number is actually about 5%. Only 5% of messages are ready to send. With Flow, that shoots up to 50% to 70% depending on where you are, and that is the biggest differentiator. It's primarily driven by our rethinking what we wanted Flow to do.
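The difference between the two metrics can be made concrete. This Python sketch computes a standard word error rate (word-level edit distance divided by the number of words in the reference transcript) alongside a zero-edit rate as defined above: the share of dictated messages sent without any change. The `Message` type, its field names, and the sample messages are hypothetical illustrations, not Wispr's data or code.

```python
from dataclasses import dataclass


@dataclass
class Message:
    drafted: str  # text the dictation tool inserted
    sent: str     # text the user actually sent, after any edits


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev holds the previous DP row: distances between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a reference word
                          curr[j - 1] + 1,     # insert a hypothesis word
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def zero_edit_rate(messages: list[Message]) -> float:
    """Share of messages that were sent exactly as drafted."""
    if not messages:
        return 0.0
    return sum(m.drafted == m.sent for m in messages) / len(messages)


# A verbatim transcript can score a perfect WER against what was spoken...
spoken = "Naren let's meet at 5 actually you know what let's do 6 PM"
print(word_error_rate(spoken, spoken))  # 0.0

# ...and still not be a message anyone would send unedited.
messages = [
    Message(drafted=spoken, sent="Naren, let's meet at 6 PM."),
    Message(drafted="Sounds good, see you then.", sent="Sounds good, see you then."),
]
print(zero_edit_rate(messages))  # 0.5
```

The point of the pairing: WER scores the transcript against the spoken words, while the zero-edit rate scores it against what the user was willing to send.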

Naren Ramaswamy:
It's fascinating. Thanks for sharing. What's the core technology underlying the voice interface, and what makes Flow unique compared to existing solutions in the space?

Tanay Kothari:
I'll start with how we think about it, which will make it very obvious why it's architected this way. We just want Flow to do two things really well. The first is that we want users to trust Flow. The second is that we want users to feel understood by Flow.
So, say I'm sending you a message here: "Naren, let's meet at 5:00." Actually, you know what? Let's do 6:00 PM. A standard dictation system would write all of this word for word. Flow would write, "Naren, let's meet at 6:00 PM." When users see that, they think Flow is competent. They think, "Oh, I can make mistakes and it'll be fine, because I can go correct them with Flow." And that drives a lot of trust.
Underlying it is a set of models that we put together internally at Wispr, with components for personalization, understanding context, reducing hallucinations, and making you sound more concise, plus a reasoning engine that can tell whether something you're saying is a command, like "actually, let's make that 6:00 PM," or just part of your message.
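To make the "meet at 5:00, actually 6:00 PM" example concrete, here is a deliberately naive, rule-based sketch in Python. It looks for a correction cue such as "actually" and, if a new time follows, swaps that time in for the first one mentioned and drops the rest. The cue list and regular expressions are assumptions for illustration only; Flow, per the conversation, uses a set of models and a reasoning engine rather than hand-written rules, and a rule like this breaks as soon as "actually" appears without correcting anything.

```python
import re

# Hypothetical cue phrases signalling that the speaker is revising what they just said.
CUE = re.compile(r"\b(?:actually|you know what|scratch that|i mean)\b", re.IGNORECASE)
# Simple clock times such as "5", "5:00", "6 PM", "6:00 PM".
TIME = re.compile(r"\b\d{1,2}(?::\d{2})?(?:\s*[AP]M)?", re.IGNORECASE)


def apply_spoken_correction(utterance: str) -> str:
    """Toy heuristic: if a correction cue is followed by a new time,
    keep only the text before the cue, with the old time swapped for the new one."""
    cue = CUE.search(utterance)
    if cue is None:
        return utterance  # no revision signalled; keep what was said
    head, tail = utterance[:cue.start()], utterance[cue.end():]
    new_times = TIME.findall(tail)
    old_time = TIME.search(head)
    if not new_times or old_time is None:
        return utterance  # a cue without a usable replacement; leave it alone
    revised = head[:old_time.start()] + new_times[-1] + head[old_time.end():]
    return revised.rstrip(" .,;?!") + "."


print(apply_spoken_correction(
    "Naren, let's meet at 5:00. Actually, you know what? Let's do 6:00 PM."))
# → Naren, let's meet at 6:00 PM.
```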
And at the core of it, what we started to realize is that people think about Flow as just another person, and they have the same expectations of it. Take Slack, for example. Slack is a tool. If Slack hits a bug, people say, "Oh, Slack bugged out." If Flow makes a mistake, people say, "Oh, Flow didn't understand me," or "Oh, Flow didn't remember this." And because people have human-like expectations of Flow, everything we do under the hood to architect the system is designed to match what people would expect if Flow were a person.

Sam:
All right, y'all. In a few seconds, you're going to hear a video demo of Flow that Wispr put together. It's really cool and witty. To see the visuals, of course, hop over to our YouTube channel, but if you're hanging out with us on audio, you'll just get the audio version. So, hang tight.

Tanay Kothari:
And here's how. Hey, Sehaj, you want to meet at 5:00? Or actually, you know what? Let's do 6:00 PM. Yo, that's sick, fire emoji. Hey, listen, here are three things that make Flow unique. It gets names right, even uncommon ones like yours. It lets you edit, especially if you change your mind while speaking, and it formats your messages just like you would. And when you're around other people. Plus, you can use Flow commands to use AI wherever you're working.
Flow, let's actually make this for our Spanish-speaking audience. Most Flow users actually use Flow more than their keyboards. You don't believe me? Go to flowvoice.ai to try it for yourself and download the app to use it everywhere. Cheers.

Steve Wozniak:
For those creative types like myself with very personal speaking styles, this is a godsend. I'm glad to see the launch of Flow today. This is what computers were meant to do for people.

Naren Ramaswamy:
Yeah, yeah, absolutely. I mean, the line between humans and machines is getting blurrier and blurrier. Through these platforms, the distance between the two is going to shrink, and that's a sneak peek into what the future is going to look like. But bringing it back to the present, I know that at Wispr, all of your employees use Flow regularly. I think every email I send you, I get a response that's written with Flow. So, for our audience, could you share some of the common use cases and applications for which people use the platform?

Tanay Kothari:
Yeah, of course. One that's really taking off right now: everybody who uses Cursor is switching over to using Flow with it, primarily to write code and build products. Same thing for v0, same thing for Replit, and for people programming agents. Engineers are often early adopters of technology, and everything they have to do is now more natural-language driven. For that, Flow is their default go-to.
Beyond that, a lot of our users are highly ambitious people whose jobs involve a lot of communication. This can be with your team on Slack or other messaging platforms, or external to your company if you're messaging clients, potential customers, and so on. I use it to do all of our customer support and all of my investor communications. I use it on Slack all day to respond to my entire team.
Those two, I would say, are the biggest use cases. Another one we've seen come up is when people are interacting with AI agents like ChatGPT or Claude. If you're typing, you try to write the shortest query possible that explains your problem, like "toilet broken, how fix." It's what you'd also put into Google, right? But if you can speak, you say, "Hey, my toilet is broken, and when I pull the flush, it doesn't properly flush," and whatever else. You can go on a rant for 30 seconds, and it takes you the same amount of time. But now the system has so much more context, and it gives you better results, which is what you really want.
We're still understanding why people like using it more than ChatGPT's voice mode and the other voice interfaces these products are building in. Part of it has to do with how we think about the interface in general, but part of it is also that it builds a habit: people start using it in one application, and then slowly they're like, "Why do I use my keyboard for anything at all?" And we gradually see people using Flow more than their keyboards, in terms of how much they output during the day.

Naren Ramaswamy:
Yeah, it's the new version of a keyboard that doesn't require a keyboard basically. So...

Tanay Kothari:
Yeah.

Naren Ramaswamy:
Yeah, just using your voice. That's cool.

Tanay Kothari:
Mm-hmm.

Naren Ramaswamy:
Moving on: with AI tools like this, there's always a conversation around privacy and data security. What's your stance on that, and how are you tackling it?

Tanay Kothari:
Yeah, great question. This is something that is very front and center for the entire product. People use Flow for their most sensitive messages, whether it's to their loved ones, to their team professionally, or for all of their personal journaling notes and thoughts. And so, by default, all your data stays locally on your computer; no one can access it but you. We don't save anything on our servers. Most AI tools today make data sharing opt-out.
They opt you in by default, and you can go and turn it off later. A lot of people just don't know about it, so users aren't well informed. With Flow, data sharing is off by default. You can turn it on, and it'll help us improve our models, but by default it's off. That is one of the things that has helped us build a lot of trust with our early users and let them use the product for everything they want to do without having to think twice.
Now, this does raise a question: if we're not collecting all of this data, how do we make our models better? We have a few approaches for that, and part of it is privacy-preserving, completely anonymous federated learning. But for the most part, the question is how we use just the context we have locally on a person's computer to give them the best personalized experience.
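Federated learning is only mentioned in passing here, so as a general illustration (not Wispr's implementation, which isn't described), this Python sketch shows federated averaging (FedAvg), one common form of the technique: each device trains on its own data and sends back only model weights, and the server combines them in an average weighted by how many examples each device used. The toy two-parameter linear model, the learning rate, and all names are hypothetical. Real deployments typically add protections such as secure aggregation or differential privacy, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class ClientUpdate:
    weights: list[float]  # model weights after local training
    num_examples: int     # size of the local dataset that produced them


def local_update(weights: list[float], data: list[tuple[float, float]],
                 lr: float = 0.05, epochs: int = 5) -> ClientUpdate:
    """Train a toy model y = w0 + w1 * x with SGD on one device's own data."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y  # prediction error on this example
            w0 -= lr * err           # gradient of 0.5 * err**2 w.r.t. w0
            w1 -= lr * err * x       # gradient of 0.5 * err**2 w.r.t. w1
    return ClientUpdate([w0, w1], len(data))


def federated_average(updates: list[ClientUpdate]) -> list[float]:
    """Server step: average client weights, weighted by local example counts."""
    total = sum(u.num_examples for u in updates)
    dim = len(updates[0].weights)
    return [sum(u.weights[i] * u.num_examples for u in updates) / total
            for i in range(dim)]


# Two devices whose private data both follow y = 2x + 1; raw pairs never leave them.
device_data = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
global_weights = [0.0, 0.0]
for _ in range(100):
    updates = [local_update(global_weights, data) for data in device_data]
    global_weights = federated_average(updates)
print(global_weights)  # should move toward [1.0, 2.0]
```

Only `ClientUpdate` objects cross the device boundary; the server never sees the `(x, y)` pairs themselves.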

Naren Ramaswamy:
Yeah, that's fascinating. Thanks for going into the details on Flow. Let's zoom out a little and talk about the AI space; I'd love to get your thoughts. The jury is still out on how AI is going to transform our lives, and there are a lot of tools in the enterprise and in consumer. What are some of the challenges you're seeing as an AI company today?

Tanay Kothari:
I would say a big one coming up is that, right now, when people talk about generative AI, the standard conversation is around large language models and the way they're architected. LLMs are fundamentally non-deterministic, which is great, because it gives them a chance to be creative, to reason, and to exhibit a lot of different emergent behaviors. But their non-determinism means you can't use them for tasks that need to be 100% accurate.
They're pretty much like humans in that way. The implication is that companies often try to replace everything they do, or entire problem spaces, with an AI-only solution, instead of thinking about it as an AI-first solution. For example, if you're doing accounting, would you rather have a smart person do it on a piece of paper, or would you give them a calculator as well?
And I think this is a fundamental idea that a lot of people miss: when you're building AI-based solutions, AI doesn't have to be the only thing in them. You need to back it up with a lot of business logic, with heuristics and algorithms you develop yourself, use the LLM for what it does really well, which is reasoning, creativity, and knowledge about the world, and put those two systems together.
And I hope this is going to emerge more and more over the next couple of years, so we can build even more powerful solutions. Non-determinism means that, for a lot of business use cases, people just aren't adopting these large language models as much as they would otherwise, if there were more guarantees that the model won't randomly hallucinate one day or write an extra zero in one of your invoices.
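The calculator analogy maps onto a common pattern: let a model draft structured output, then verify anything that must be exact with ordinary code. This Python sketch assumes a hypothetical `extract_invoice_with_llm` function, stubbed with a fixed draft containing the kind of extra-zero slip mentioned above, and recomputes the total with `decimal.Decimal`, correcting and flagging any disagreement. It illustrates the general AI-first-plus-business-logic idea, not any particular company's system.

```python
from decimal import Decimal


def extract_invoice_with_llm(text: str) -> dict:
    """Hypothetical stand-in for an LLM extraction call.

    A real system would prompt a model with `text`; this stub returns a fixed
    draft whose total has an extra zero, to show what the checker catches.
    """
    return {
        "items": [("Design work", "1200.00"), ("Hosting", "80.00")],
        "total": "12800.00",
    }


def validate_total(draft: dict) -> dict:
    """Deterministic business rule: the total must equal the sum of line items."""
    computed = sum((Decimal(amount) for _, amount in draft["items"]), Decimal("0"))
    if computed != Decimal(draft["total"]):
        # Trust exact arithmetic over the model, and flag the record for review.
        return {**draft, "total": str(computed), "flagged": True}
    return {**draft, "flagged": False}


result = validate_total(extract_invoice_with_llm("sample invoice text"))
print(result["total"], result["flagged"])  # 1280.00 True
```

`Decimal` is used instead of `float` so that amounts like 1.10 + 2.20 compare exactly, which is the whole point of the deterministic layer.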

Naren Ramaswamy:
And you hit the nail on the head. For me, that's what gets me excited as a VC, because there's tremendous potential for this technology. But widespread adoption will require this determinism. In the enterprise, you can't afford to make those kinds of mistakes often. So, what new tools can help make AI systems deterministic? Because not every company is a Wispr AI, with AI engineers on the team who can build those systems internally.
So, that's definitely an exciting opportunity for us as well, as we continue to look for startups. Just to close out, tell us a little bit more about your public launch. How can users try Flow? And is there anything you'd like to leave our listeners with before we close out?

Tanay Kothari:
Yeah, of course. So far, the biggest complaint our users have had about Flow is that it's only available on Macs, and people can't feel the magic of Flow on their phones or other laptops they might have. So, one of the big things we're launching is a web version of Flow that you can go and try out. It'll be on our website, flowvoice.ai.
Once you go there, you can try it out, see how fast you are when you speak, and then download the application directly and get started. We're going to have a launch promo for all users who sign up in the first two weeks of October, giving them three months off our annual plan. If you want that, the code is phlaunchday; just add that in and you should be ready to go.

Naren Ramaswamy:
That's awesome. Cool. Thanks a lot, Tanay. As someone who's had the chance to try Flow and use it in some of my daily tasks, I do think the old technology of keyboards and the graphical user interface from several decades ago is going to be transformed in the AI era, and you guys are one of the pioneers in the space. So, I'm really excited about what you're building.

Tanay Kothari:
Thank you. Always love hearing from happy users.

Naren Ramaswamy:
That's great. Well, thanks a lot and yeah, good luck.

Tanay Kothari:
Thank you.

Naren Ramaswamy:
That was our podcast with Tanay Kothari, the CEO of Wispr AI. Hope you enjoyed it. For those of you interested in learning more about becoming an investor in our Deep Tech Fund, I encourage you to view the fund materials or book time with our team. You can learn more at av.vc/funds/deeptech. We're actively raising our fifth fund and just completed our first investment from that fund, which was into Groq, a pioneering AI chip company. By investing in the fund, you'll have exposure to that company. Thank you for tuning in, and we hope you enjoyed the discussion.

Sam:
Thanks again for tuning into the Tech Optimist. If you enjoyed this episode, we'd really appreciate it if you'd give us a rating on whichever podcast app you're using and remember to subscribe to keep up with each episode. The Tech Optimist welcomes any questions, comments, or segment suggestions. So, please email us at info@techoptimist.vc with any of those, and be sure to visit our website at av.vc. As always, keep building.