
Sesame: Reimagining Voice AI – A Human-Centric Approach to Conversational Interfaces

How many times have you yelled "Hey Siri," only to end up reaching for your phone to finish the task? Voice assistants, whether on phones, smart speakers, or future wearables, often fall short. Siri, Alexa, and Gemini can understand commands and execute actions, but they lack genuine conversational ability and remain mere tools, cold and impersonal. A team of veteran Meta engineers is attempting to change this by building a truly empathetic voice AI. They have founded Sesame, a company focused on AI that understands not just your words but also your emotions and the nuances of your tone. The effort has already secured significant investment from prominent venture capital firms such as Andreessen Horowitz (a16z), and the company is currently in talks for a $200 million Series B funding round.

The Team Behind Sesame: Experience and Vision

Leading Sesame is Brendan Iribe, co-founder of Oculus VR, the company acquired by Facebook (now Meta). This connection, coupled with Sesame's planned use of glasses as a primary interface, makes its strategic direction intriguing – a subtle challenge, perhaps, to his former employer. The company's impressive team comprises serial entrepreneurs with successful exits.

  • Brendan Iribe: Sold Oculus VR to Facebook for $2 billion, establishing himself as a key figure in the virtual reality (VR) landscape.
  • Ankit Kumar: Co-founder of augmented reality (AR) startup Ubiquity6, acquired by Discord in 2021.

Iribe's deep VR experience and direct involvement in Meta's technological strategy are complemented by Kumar's expertise in AR and his active participation in the startup ecosystem. As tech giants like Meta aggressively invest in AI devices, these two tech veterans have joined forces to capture a significant share of this burgeoning market.

Their vision transcends the limitations of current voice assistants. They believe that the future of AI interaction isn't about issuing commands to a cold machine; it's about engaging in natural conversations with an AI that feels like a genuine companion. As Iribe stated on social media, "We are currently in a valley, but we are optimistic that we can climb out of it." This statement reflects both dissatisfaction with the current limitations of voice AI and a clear expectation for the company's future achievements.

Adding further strength to the team is Ryan Brown, the former research engineering director at Meta Reality Labs, who joined Iribe and Kumar as a co-founder in 2023.

Sesame's Technological Approach: Conversational Speech Models (CSMs)

Sesame takes a different tack from Meta, concentrating on Conversational Speech Models (CSMs). Although these models are built on Meta's Llama architecture, they are aimed squarely at bridging the interaction gap between AI and humans. The core of Sesame's innovation lies in creating conversational experiences that feel empathetic and engaging.

The company has developed two flagship voice assistants:

  • Maya: A warm and energetic female voice assistant.
  • Miles: A friendly and subtly humorous male voice assistant.

These assistants stand out due to their ability to handle interruptions seamlessly and adjust their tone to match the context of the conversation. As stated on Sesame's website, "The computers of the future should be as real as life itself."

Practical demonstrations highlight the difference. When Maya is interrupted mid-sentence, she pauses, listens to the interruption, and then resumes the conversation with the context intact. This contrasts sharply with the more robotic responses of Siri and other assistants. Maya demonstrates human-like conversational skills: adjusting the pace and rhythm of the dialogue, modulating her tone appropriately, and even inferring potential user needs from vocal cues. For instance, if she detects a downcast tone, she might proactively ask, "Are you alright?" and offer words of encouragement.
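The interruption behavior described above can be pictured as a small turn-taking state machine. The following is only a minimal sketch under stated assumptions: the `TurnManager` class, its method names, and the word-by-word "speech" are hypothetical illustrations, not Sesame's actual design, which has not been described at this level of detail.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    IDLE = auto()       # nothing to say
    SPEAKING = auto()   # assistant is mid-reply
    LISTENING = auto()  # assistant paused because the user barged in


@dataclass
class TurnManager:
    """Hypothetical turn-taking controller (illustrative only)."""
    state: State = State.IDLE
    pending: List[str] = field(default_factory=list)  # words not yet spoken
    heard: List[str] = field(default_factory=list)    # user interruptions

    def start_reply(self, text: str) -> None:
        """Queue a reply and begin speaking it."""
        self.pending = text.split()
        self.state = State.SPEAKING

    def speak_next(self) -> Optional[str]:
        """Emit the next word, or None if paused or finished."""
        if self.state is not State.SPEAKING or not self.pending:
            return None
        word = self.pending.pop(0)
        if not self.pending:
            self.state = State.IDLE
        return word

    def user_interrupts(self, utterance: str) -> None:
        """Barge-in: record what the user said and stop speaking."""
        self.heard.append(utterance)
        if self.state is State.SPEAKING:
            self.state = State.LISTENING

    def resume(self) -> None:
        """Pick the reply back up where it was paused."""
        if self.state is State.LISTENING:
            self.state = State.SPEAKING if self.pending else State.IDLE
```

The point of the sketch is that the unspoken remainder of the reply is kept while the assistant listens, so resuming does not restart the sentence. A real system would also decide whether the interruption changes what should be said next.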

Currently, Maya and Miles are available in English as free demos on Sesame's website, and the company plans to expand support to more than 20 languages. It is also developing AI-powered voice glasses that integrate the speech model, with the goal of offering a companion that is always at hand. These glasses, however, are still in the prototype phase.

The Competitive Landscape and Sesame's Funding

Sesame's bold attempt to reshape the essence of voice AI has thrust the company into the spotlight from its inception. Even before launching its core product, the AI voice glasses, and with that product still requiring market validation, the company has attracted the interest of major Silicon Valley venture capitalists.

The company operates in a fiercely competitive market. Major tech giants are heavily investing in their own voice assistant products:

  • Meta: Continues developing Llama, expanding its voice capabilities, and integrating Meta AI into Ray-Ban Meta smart glasses for intuitive voice interaction.
  • Google: Has launched Gemini, partnering with Samsung to make it the default assistant on Galaxy phones, replacing Bixby.
  • Amazon: Recently upgraded Alexa, improving its conversational abilities to maintain its market position.

Despite the lack of a commercially available product, Sesame has secured significant funding from top-tier Silicon Valley investors on the strength of its voice technology and the experience of its founding team. In March 2025, the company announced it was in talks for a $200 million Series B round (approximately NT$6.5 billion) led by Sequoia Capital and Spark Capital, which would value the company at $1 billion (approximately NT$32.5 billion). This followed a Series A round of $47.5 million (approximately NT$1.5 billion), led by a16z with participation from Matrix Partners and Spark Capital.
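The New Taiwan dollar figures above are consistent with a single conversion rate of about NT$32.5 per US dollar. That rate is an assumption inferred from the article's own numbers (NT$6.5 billion for $200 million), not an official or dated exchange rate. A quick check in Python:

```python
# Assumed exchange rate, inferred from the article's figures
# (NT$6.5 billion / US$200 million = 32.5); not an official rate.
TWD_PER_USD = 32.5


def usd_to_twd_billions(usd: float) -> float:
    """Convert a US-dollar amount to NT$ billions, rounded to one decimal."""
    return round(usd * TWD_PER_USD / 1e9, 1)


amounts_usd = {
    "Series B (in talks)": 200e6,
    "Valuation": 1e9,
    "Series A": 47.5e6,
}

for label, usd in amounts_usd.items():
    print(f"{label}: US${usd / 1e6:,.1f}M is about NT${usd_to_twd_billions(usd)}B")
```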

Early investors like a16z, Spark Capital, and Matrix Partners clearly value the team's ambition to break free from the cold, command-driven interactions of traditional voice assistants. Sesame acknowledges that improvements are still needed in areas such as the naturalness of rhythm and intonation. However, the significant investment demonstrates a strong belief in Sesame’s potential to revolutionize the human-computer interaction paradigm.

The Future of Voice AI: Empathy and Natural Conversation

Sesame's focus on creating empathetic and conversational AI assistants represents a significant shift in the field. While current voice assistants excel at task completion, they often lack the emotional intelligence and natural conversational flow that characterize human interactions. Sesame aims to bridge this gap by creating AI that understands not only the literal meaning of words but also the underlying emotions and intentions.

This approach has several potential benefits:

  • Enhanced User Experience: More natural and engaging interactions lead to a more positive user experience, fostering a greater sense of connection between the user and the AI.
  • Improved Accessibility: AI assistants capable of understanding nuanced language and emotional cues can be particularly beneficial for users with communication difficulties.
  • New Application Possibilities: Empathetic AI could unlock new possibilities in fields like mental health support, education, and customer service, offering more personalized and human-centered interactions.

The development of AI voice glasses represents a further step towards creating a truly ubiquitous companion experience. By integrating the advanced conversational speech models into a wearable device, Sesame aims to provide users with access to a sophisticated and emotionally intelligent AI assistant whenever and wherever they need it. This vision goes beyond simply providing information or completing tasks; it's about creating a technology that provides genuine companionship and emotional support.

The challenges ahead are significant. Creating a truly empathetic and natural-sounding AI requires overcoming considerable technical hurdles, particularly in natural language processing, speech synthesis, and emotion recognition. Furthermore, the market is highly competitive, with established tech giants investing heavily in their own voice assistant technologies.

However, Sesame’s unique focus on emotional intelligence and its experienced team position it well to capture a significant share of the market. The substantial funding secured from top-tier venture capitalists demonstrates a strong belief in the company's vision and potential. The future of voice AI may not just be about talking to machines; it may be about connecting with them on an emotional level. Sesame's journey to create a truly human-centric voice AI is just beginning, and its success could redefine the way we interact with technology for years to come.
