
Primary Audience: Government/Adult Ed., Graduate, Undergrad, Community College, High School, Translators
Voccent is an AI-powered oracy platform for language learning. Teachers and trainers can create their own content or use built-in lessons to help students and professionals improve speaking skills, fluency, and emotional expression. Real-time feedback on voice, emotion, and delivery makes learning more engaging and measurable. Simple. Powerful. Customizable. Storytelling!

LaunchPad Questions
During the LaunchPad, the audience had an opportunity to ask questions about the products. The Tech Center shared those questions with the entrepreneurs and here are the responses.
Can you describe what your platform does in one sentence?
How do you capture within-language pronunciation variability? How do you automatically decide what's "correct"?
What kind of training and/or guidance can you give your clients and students on how to use the platform? Do you have any testimonials from current clients?
Yes—we have testimonials from school partners highlighting improved speaking confidence, increased student engagement, and ease of adoption.
What kind of semantic analysis does your app perform?
Any concrete data on the app's impact on learners' oral skills?
Given the platform's open content, how do you filter out inappropriate material?
If the content is created by crowdsourcing, what is the built-in mechanism to assess, check and curate the content and what are the criteria?
What languages does Voccent support? Does it support Arabic-based scripts?
Some content on Voccent is public, while other materials are private and organization-specific, based on the user’s account type and content availability.
How do you measure and capture emotion?
How do you keep energy consumption that low?
How does Voccent ensure the content created by users is appropriate for language learners?
Who do you hope will be the economic buyers, and who are the influencers who will become your champions and motivate buyers to buy?
What does Classroom Acceptance mean in the context of your last slide?
How do you account for individual variation in pronunciation and emotional variability?
What age range and language levels is it for?
Can you explain your pedagogical approach?
– Active speaking practice with real-time feedback on clarity, emotion, and intent.
– Situated learning, where learners use language in contextually meaningful ways.
– Nonlinear progression, allowing learners to focus on speaking tasks aligned with their interests and needs.
– High-frequency, low-barrier input, enabling consistent engagement without overwhelming complexity.
– Emotional engagement, reinforcing memory and motivation through affective feedback.
– Learn by creating, where learners contribute content—questions, prompts, responses—building skills through expressive production.
This approach supports natural language acquisition, builds confidence, and improves real-world communication skills. Consistent practice is key, and durable memory retention is what secures success.
What’s your market entry strategy?
– Pilot programs with schools and districts to demonstrate impact on oracy and engagement
– Partnerships with language access programs, speech professionals, and community initiatives
– Direct outreach to educators via webinars, demos, and conference presentations
– Leveraging champions and early adopters to drive referrals and case studies
– Low-bandwidth deployment that enables rapid adoption in under-resourced areas
– Flexible licensing models for institutions, NGOs, and government agencies
This strategy builds credibility, drives word-of-mouth, and scales through trusted networks.
How do you make money? (And how is the business sustainable?)
– Institutional licensing for schools, districts, training centers, and government agencies
– Per-seat subscriptions for organizations with variable learner counts
– Custom content partnerships with curriculum vendors and educational publishers
– Grant-funded deployments in public sector and language-access initiatives
– Regular consumer subscriptions for language learning
Sustainability is driven by low infrastructure costs (no GPUs, no LLMs), scalable server architecture, and a growing base of reusable, user-generated content that increases platform value over time.
1. Sounding happy in one culture is not the same as another—how does the program take that into consideration?
= Voccent is grounded in the universality of primary emotions—such as joy, sadness, fear, and anger—which are biologically hardwired and expressed through consistent micrometric features across all humans. While cultural norms influence how these emotions are outwardly expressed (e.g. tone, volume, pacing), Voccent analyzes deeper acoustic signatures—such as pitch contour, spectral balance, and voice stability—to accurately detect the emotion regardless of cultural surface variations. When the source and destination languages are known, we fine-tune interpretation even further to respect cultural nuance.
2. Pronunciation & emotion is highly culturally bounded. How do you address that?
= Voccent evaluates pronunciation by comparing speech to a range of native-like acoustic patterns, not a single idealized model. Emotional interpretation similarly relies on physiological speech features, not just outward tone. This allows the platform to recognize culturally appropriate variations in both pronunciation and expressiveness while still giving precise, respectful feedback.
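The idea of comparing speech to a range of native-like acoustic patterns rather than a single idealized model can be sketched in a few lines. The code below is an illustrative outline, not Voccent's implementation: it scores a learner's feature sequence (e.g. MFCC frames) against several reference recordings using dynamic time warping, and takes the *closest* reference so that any acceptable native variant counts as correct. All function names and the distance-to-score mapping are assumptions for the sketch.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two feature sequences
    (frames x dims), using Euclidean cost between frames."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

def pronunciation_score(learner: np.ndarray, references: list) -> float:
    """Score against the *closest* native-like reference, so any
    acceptable variant of the pronunciation is rewarded."""
    best = min(dtw_distance(learner, ref) for ref in references)
    return 1.0 / (1.0 + best)  # map distance into (0, 1]
```

A production system would extract real acoustic features and use far more references per phrase, but the min-over-references structure is what lets culturally legitimate variation score well.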
3. How do you take into account cultural differences while measuring emotions? Eg. A native Taishan speaker might sound like screaming while saying hello.
= Even if a speech pattern sounds loud or intense, Voccent’s micro-acoustic analysis—measuring elements like jitter, shimmer, and energy distribution—detects whether there is actual physiological stress or just culturally normal expressiveness. In cases like a Taishan greeting, the system will identify the interaction as emotionally neutral or positive if the metrics indicate no tension or distress. With knowledge of both the speaker’s and listener’s language and cultural context, the system adapts to provide accurate and appropriate interpretation.
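Jitter and shimmer, two of the micro-acoustic perturbation measures mentioned above, are standard voice-quality metrics: jitter is cycle-to-cycle variation in pitch period, shimmer is cycle-to-cycle variation in peak amplitude. The sketch below shows the common "local" definitions; the stress thresholds are illustrative values from the voice-analysis literature, not Voccent's actual parameters, and the `is_stressed` helper is hypothetical.

```python
import numpy as np

def jitter_local(periods: np.ndarray) -> float:
    """Local jitter: mean absolute difference between consecutive
    glottal cycle periods, divided by the mean period."""
    return float(np.abs(np.diff(periods)).mean() / periods.mean())

def shimmer_local(amplitudes: np.ndarray) -> float:
    """Local shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, divided by the mean amplitude."""
    return float(np.abs(np.diff(amplitudes)).mean() / amplitudes.mean())

def is_stressed(periods: np.ndarray, amplitudes: np.ndarray,
                jitter_thresh: float = 0.0104,
                shimmer_thresh: float = 0.0381) -> bool:
    """Flag physiological tension only when both perturbation
    measures exceed typical-voice thresholds (illustrative values).
    A loud but steady voice stays below both and is not flagged."""
    return (jitter_local(periods) > jitter_thresh and
            shimmer_local(amplitudes) > shimmer_thresh)
```

This is why a loud Taishan greeting need not register as distress: loudness raises overall energy, but a relaxed voice keeps its periods and amplitudes stable, so jitter and shimmer stay low.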
This approach leverages universal emotional biology, acoustic precision, and contextual adaptability to deliver accurate, culturally sensitive feedback. For tributary (secondary) emotions, more research is needed.
Is the Ukrainian language available in the app?
Is it correct to assume that your app only focuses on the mechanical aspects of speaking (pronunciation and tones, for example), and not the meaning-making aspects?
What language varieties is your tool designed for? Can I target Glaswegian English or Quebecois French pronunciation?
Even constructed languages (like Esperanto or fictional languages) are totally possible, as long as there’s sufficient spoken data to establish reference acoustic patterns.
Contact Information
TECH CENTER
1890 East West Road
Moore Hall 256
Honolulu, HI 96822
tech.center@hawaii.edu
Follow Us
The Language Flagship Technology Innovation Center is funded under a grant from the Institute of International Education (IIE), acting as the administrative agent of the Defense Language and National Security Education Office (DLNSEO) for The Language Flagship. One should not assume endorsement by the Federal Government. Project P.I.: Dr. Julio C. Rodriguez
