Launchpad 2017

Primary Audience: Government/Adult Ed., Graduate, Undergrad, Community College, High School, Translators

Voccent is an AI-powered oracy platform for language learning. Teachers and trainers can create their own content or use built-in lessons to help students and professionals improve speaking skills, fluency, and emotional expression. Real-time feedback on voice, emotion, and delivery makes learning more engaging and measurable. Simple. Powerful. Customizable. Storytelling!

LaunchPad Questions

During the LaunchPad, the audience had an opportunity to ask questions about the products. The Tech Center shared those questions with the entrepreneurs and here are the responses. 

Can you describe what your platform does in one sentence?

Voccent is an emotionally intelligent platform that builds oracy through speaking practice.

How do you capture within-language pronunciation variability? How do you automatically decide what's "correct"?

We model pronunciation variability using acoustic embeddings from native speaker data and analyze learner speech with over 65 algorithms at higher-than-IPA precision—including emotion, tone, pitch, and prosody—to compare against native-like patterns. “Correctness” is based on proximity to these patterns and contextual intelligibility.
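The "proximity to native-like patterns" idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy three-dimensional vectors stand in for real acoustic embeddings, and the 0.85 cutoff stands in for a tuned intelligibility threshold; this is not Voccent's actual model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def correctness(learner_vec, native_refs, threshold=0.85):
    """Score learner speech by its closest native-like reference.

    Returns (best_similarity, is_acceptable). The 0.85 cutoff is an
    illustrative stand-in for a tuned intelligibility threshold.
    """
    best = max(cosine(learner_vec, ref) for ref in native_refs)
    return best, best >= threshold

# Toy 3-dimensional vectors standing in for real acoustic embeddings.
natives = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4]]
score, ok = correctness([0.85, 0.15, 0.35], natives)
```

Comparing against the *closest* of several references, rather than a single average, is what lets a range of native pronunciations all count as correct.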

What kind of training and/or guidance can you give your clients and students on how to use the platform? Do you have any testimonials from current clients?

We provide onboarding tutorials, scaffolded walkthroughs, and in-platform feedback to guide students. For educators and clients, we offer training sessions, usage analytics, and curriculum integration support.

Yes—we have testimonials from school partners highlighting improved speaking confidence, increased student engagement, and ease of adoption.

What kind of semantic analysis does your app perform?

Voccent combines lightweight semantic analysis with deep emotional analysis and speech clarity assessment to evaluate intent, tone, expressiveness, and intelligibility—ensuring learners communicate meaningfully, clearly, with emotional resonance, and build better memory through active speaking.

Any concrete data on the app's impact on learners' oral skills?

In pilot studies, learners using Voccent showed a 38% improvement in fluency scores and a 52% increase in speaking confidence over 6 weeks, with educators reporting higher classroom participation and more natural speech patterns.

Given the platform's open content, how do you filter out inappropriate material?

Voccent uses automated content moderation, including keyword filtering, speech sentiment analysis, and user behavior monitoring, to flag and block inappropriate material in real time, with human review for edge cases.
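A minimal sketch of the keyword-filtering layer described above, assuming a simple three-way outcome (block outright, route to human review, allow); the word lists and the `moderate` function are hypothetical illustrations, not Voccent's actual moderation rules.

```python
import re

# Hypothetical word lists; a real deployment maintains much larger,
# curated lists plus sentiment and behavior signals.
BLOCKLIST = {"badword1", "badword2"}
REVIEW_HINTS = {"hate", "fight"}

def moderate(text):
    """Return 'block', 'review', or 'allow' for user-created content."""
    words = set(re.findall(r"\w+", text.lower()))
    if words & BLOCKLIST:
        return "block"    # clearly inappropriate: reject outright
    if words & REVIEW_HINTS:
        return "review"   # ambiguous: route to a human reviewer
    return "allow"
```

The middle "review" tier is what makes room for the human-in-the-loop handling of edge cases mentioned above.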

If the content is created by crowdsourcing, what is the built-in mechanism to assess, check, and curate the content, and what are the criteria?

Content on Voccent can be created by professional vendors, curriculum authors, teachers, or crowdsourced. All content is peer-reviewed, automatically evaluated across 65+ analysis layers, and ranked by learner feedback. Criteria include linguistic accuracy, cultural sensitivity, clarity of speech, emotional tone, and relevance to learning objectives.

What languages does Voccent support? Does it support Arabic-based scripts?

Voccent supports a wide range of languages, including major ones like Vietnamese, Russian, English, Spanish, German, French, Italian, Polish, Portuguese, Ukrainian, Indonesian, Thai, Japanese, and Persian (Farsi), as well as rarer languages like Mandinka and Sakha. It supports Arabic-based scripts through Persian (Farsi), and Arabic itself is a potential addition.

Some content on Voccent is public, while other materials are private and organization-specific, based on the user’s account type and content availability.

How do you measure and capture emotion?

Voccent captures emotion using 65+ signal-processing algorithms that analyze vocal features such as pitch, tone, rhythm, intensity, tempo, and spectral qualities. A moving analysis window of 15–20 ms ensures fine-grained resolution, and the minimum input length for reliable emotion detection is 3 seconds. These features are mapped to emotional states like joy, frustration, or hesitation, enabling real-time emotional feedback.
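The moving-window analysis can be sketched as follows: the signal is cut into overlapping 20 ms frames, and a simple per-frame feature (short-time energy) is computed. The 10 ms hop size and the energy feature are illustrative assumptions; the real pipeline extracts many more features per window.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=20, hop_ms=10):
    """Split a mono signal into overlapping analysis windows.

    frame_ms=20 matches the 15-20 ms window mentioned above; the
    10 ms hop and the energy feature below are illustrative.
    """
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[i:i + frame]
            for i in range(0, len(samples) - frame + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one analysis window."""
    return sum(s * s for s in frame) / len(frame)

# 0.5 s of a 220 Hz tone at 16 kHz stands in for recorded speech.
sr = 16000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 2)]
frames = frame_signal(tone, sr)
energies = [short_time_energy(f) for f in frames]
```

Tracking how such per-frame features move over a multi-second utterance is what makes the 3-second minimum input meaningful: emotion shows up in trajectories, not single frames.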

How do you keep energy consumption that low?

Voccent keeps energy consumption low by avoiding large language models and not using GPUs. It relies on lightweight, optimized signal-processing algorithms running on commodity servers to deliver real-time analysis with minimal computational and energy overhead.

How does Voccent ensure the content created by users is appropriate for language learners?

Voccent ensures appropriateness through a multi-layered system: automated filtering for profanity and harmful content, deep analysis of emotional tone and clarity, peer review, and learner feedback. All content—whether from users, teachers, or vendors—is evaluated for linguistic accuracy, cultural sensitivity, and educational value before being made widely accessible.

Who do you hope will be the economic buyers, and who are the influencers that will be your champions to motivate buyers to buy?

Economic buyers include school districts, language program directors, ministries of education, the Department of Defense, government agencies, and institutional training departments. Influencers and champions are teachers, speech-language professionals, edtech coordinators, and community language advocates who see firsthand the platform’s impact on speaking confidence and engagement.

What does Classroom Acceptance mean in the context of your last slide?

Classroom Acceptance means that teachers find the platform easy to integrate, students enjoy using it, and it fits naturally into existing curricula—leading to consistent use, improved speaking outcomes, and strong endorsement from educators.

How do you account for individual variation in pronunciation and emotional variability?

Voccent accounts for individual variation by comparing each speaker’s output to a spectrum of native-like acoustic patterns rather than a fixed standard. It uses adaptive thresholds, emotion-aware modeling, and personalized baselining to evaluate clarity, intent, and expressiveness within each learner’s unique vocal and emotional range. Analysis depth is also tuned to the difficulty level—lighter for beginners and more detailed for advanced learners.
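The personalized-baselining idea can be sketched under the assumption that it amounts to scoring a new measurement against the learner's own history rather than a universal norm; the class, the z-score, and all numbers below are illustrative, not Voccent's implementation.

```python
import statistics

class PersonalBaseline:
    """Score a vocal metric against a learner's own history rather
    than a fixed universal standard. A sketch of the personalized-
    baselining idea; the z-score and all values are illustrative.
    """

    def __init__(self):
        self.history = []

    def observe(self, value):
        self.history.append(value)

    def z_score(self, value):
        """How far a new value sits from this learner's usual range."""
        if len(self.history) < 2:
            return 0.0  # not enough data yet: treat as within range
        mu = statistics.fmean(self.history)
        sd = statistics.stdev(self.history)
        return 0.0 if sd == 0 else (value - mu) / sd

# Hypothetical pitch readings (Hz) from one learner's past sessions.
baseline = PersonalBaseline()
for hz in [100, 102, 98, 101, 99]:
    baseline.observe(hz)
z = baseline.z_score(130)  # far outside this learner's usual range
```

The same reading that is extreme for one learner may be routine for another, which is the point of baselining per speaker.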

What age range and language levels is it for?

Voccent is designed for learners aged 6 and up, supporting all language proficiency levels—from absolute beginners to advanced speakers—by dynamically adjusting task difficulty, feedback style, and analysis depth to suit the learner’s age and skill level.

Can you explain your pedagogical approach?

Voccent’s pedagogical approach centers on oracy-first learning—prioritizing speaking and listening as foundational skills. It emphasizes:

– Active speaking practice with real-time feedback on clarity, emotion, and intent.
– Situated learning, where learners use language in contextually meaningful ways.
– Nonlinear progression, allowing learners to focus on speaking tasks aligned with their interests and needs.
– High-frequency, low-barrier input, enabling consistent engagement without overwhelming complexity.
– Emotional engagement, reinforcing memory and motivation through affective feedback.
– Learn by creating, where learners contribute content—questions, prompts, responses—building skills through expressive production.

This approach supports natural language acquisition, builds confidence, and improves real-world communication skills. Practice is key, and the stronger memory retention it builds is the foundation of success.

What’s your market entry strategy?

Voccent’s market entry strategy focuses on:

– Pilot programs with schools and districts to demonstrate impact on oracy and engagement
– Partnerships with language access programs, speech professionals, and community initiatives
– Direct outreach to educators via webinars, demos, and conference presentations
– Leveraging champions and early adopters to drive referrals and case studies
– Low-bandwidth deployment that enables rapid adoption in under-resourced areas
– Flexible licensing models for institutions, NGOs, and government agencies

This strategy builds credibility, drives word-of-mouth, and scales through trusted networks.

How do you make money? (And how do you sustain the business?)

Voccent generates revenue through:

– Institutional licensing for schools, districts, training centers, and government agencies
– Per-seat subscriptions for organizations with variable learner counts
– Custom content partnerships with curriculum vendors and educational publishers
– Grant-funded deployments in public sector and language-access initiatives
– Regular consumer subscriptions for language learning

Sustainability is driven by low infrastructure costs (no GPUs, no LLMs), scalable server architecture, and a growing base of reusable, user-generated content that increases platform value over time.

1) Sounding happy in one culture is not the same as in another: how does the program take that into consideration? 2) Pronunciation and emotion are highly culturally bound. How do you address that? 3) How do you take into account cultural differences while measuring emotions? E.g., a native Taishanese speaker might sound like they are screaming while saying hello.

Voccent’s Approach to Cultural Variation in Emotion and Pronunciation

1. Sounding happy in one culture is not the same as another—how does the program take that into consideration?
= Voccent is grounded in the universality of primary emotions—such as joy, sadness, fear, and anger—which are biologically hardwired and expressed through consistent micrometric features across all humans. While cultural norms influence how these emotions are outwardly expressed (e.g. tone, volume, pacing), Voccent analyzes deeper acoustic signatures—such as pitch contour, spectral balance, and voice stability—to accurately detect the emotion regardless of cultural surface variations. When the source and destination languages are known, we fine-tune interpretation even further to respect cultural nuance.

2. Pronunciation and emotion are highly culturally bound. How do you address that?
= Voccent evaluates pronunciation by comparing speech to a range of native-like acoustic patterns, not a single idealized model. Emotional interpretation similarly relies on physiological speech features, not just outward tone. This allows the platform to recognize culturally appropriate variations in both pronunciation and expressiveness while still giving precise, respectful feedback.

3. How do you take into account cultural differences while measuring emotions? E.g., a native Taishanese speaker might sound like they are screaming while saying hello.
= Even if a speech pattern sounds loud or intense, Voccent’s micro-acoustic analysis—measuring elements like jitter, shimmer, and energy distribution—detects whether there is actual physiological stress or just culturally normal expressiveness. In cases like a Taishan greeting, the system will identify the interaction as emotionally neutral or positive if the metrics indicate no tension or distress. With knowledge of both the speaker’s and listener’s language and cultural context, the system adapts to provide accurate and appropriate interpretation.
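Jitter, one of the micro-acoustic cues mentioned above, is conventionally measured as the mean absolute difference between consecutive pitch periods relative to the mean period. A minimal sketch with made-up period values: a loud but steady greeting shows low jitter, while genuine vocal tension shows irregular periods and high jitter.

```python
def jitter_percent(periods_ms):
    """Local jitter: mean absolute difference between consecutive
    pitch periods as a percentage of the mean period. The period
    values below are made up for illustration.
    """
    diffs = [abs(a - b) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_diff / mean_period

# Steady voicing (even if loud) vs. irregular, tense voicing.
calm = [5.00, 5.01, 4.99, 5.00, 5.02]
tense = [5.0, 5.6, 4.5, 5.8, 4.4]
```

Because jitter measures period-to-period irregularity rather than loudness, a high-volume but relaxed greeting scores low, which is how the Taishanese "hello" example avoids being misread as distress.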

This approach leverages universal emotional biology, acoustic precision, and contextual adaptability to deliver accurate, culturally sensitive feedback. For tributary (secondary) emotions, more research is needed.

Is the Ukrainian language available in the app?

Yes, Ukrainian is available in the app.

Is it correct to assume that your app only focuses on the mechanical aspects of speaking (pronunciation and tones, for example), but not the meaning-making aspects?

No, that’s not correct. While Voccent deeply analyzes the mechanical aspects of speech—like pronunciation, tone, pitch, and clarity—it also addresses meaning-making through lightweight semantic analysis, contextual intent recognition, and emotional expressiveness. The platform evaluates whether a learner’s speech is not just clear, but meaningful, relevant, and appropriately expressive within the conversational context.

What language varieties is your tool designed for? Can I target Glaswegian English or Quebecois French pronunciation?

Yes—Voccent is designed to support language varieties and dialects such as Glaswegian English and Québécois French by leveraging acoustic pattern ranges from native and native-adjacent speakers. The system avoids fixed standards and instead compares learner speech to a spectrum of native-like regional pronunciations, enabling dialect-specific feedback.

Even constructed languages (like Esperanto or fictional languages) are totally possible, as long as there’s sufficient spoken data to establish reference acoustic patterns.

Contact Information

TECH CENTER

1890 East West Road

Moore Hall 256

Honolulu, HI 96822

tech.center@hawaii.edu

The Language Flagship Technology Innovation Center is funded under a grant from the Institute of International Education (IIE), acting as the administrative agent of the Defense Language and National Security Education Office (DLNSEO) for The Language Flagship. One should not assume endorsement by the Federal Government. Project P.I.: Dr. Julio C. Rodriguez