MeetDot: Videoconferencing with Live Translation Captions

Nancy J. Delong

The current pandemic has made videoconferencing an indispensable part of our working life.

To help people who speak different languages communicate effectively, a recent paper on arXiv.org proposes a videoconferencing solution with live translation captions.

Image credit: Mbrickn via Wikimedia (CC BY 4.0)

There, participants can see an overlaid translation of other participants’ speech in their chosen language. The incoming speech signal is processed in a streaming fashion, transcribed in the speaker’s language, and used as input to a machine translation system. The researchers add several features to improve the user experience, such as smooth pixel-wise scrolling of the captions and fading of text that is likely to change.

A comprehensive evaluation suite is used to accurately compute metrics such as latency, caption flicker, and accuracy, and to encourage rapid development according to these metrics.
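As a rough illustration of how caption flicker can be quantified, the sketch below computes erasure: the number of already-displayed tokens that must be deleted when a caption is updated, summed over all updates and divided by the length of the final caption. This follows a definition commonly used in the re-translation literature and is an assumption here; the article does not say exactly how MeetDot computes it, and the function names are hypothetical.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def erasure(prev: Sequence[str], cur: Sequence[str]) -> int:
    """Tokens of the previous caption that must be erased to show the new one."""
    return len(prev) - common_prefix_len(prev, cur)


def normalized_erasure(updates: List[str]) -> float:
    """Total erasure across successive caption updates, per final-caption token."""
    tokens = [u.split() for u in updates]
    if not tokens or not tokens[-1]:
        return 0.0
    total = sum(erasure(p, c) for p, c in zip(tokens, tokens[1:]))
    return total / len(tokens[-1])


if __name__ == "__main__":
    captions = ["I", "I am", "I was going", "I was going home"]
    print(normalized_erasure(captions))
```

In this toy sequence only the word "am" is ever replaced, so one token is erased against a four-token final caption. A value near zero means the viewer rarely sees displayed text change.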

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in four languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. In addition, our system has very strict latency requirements to have acceptable call quality. We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
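To make the re-translation idea concrete, here is a minimal sketch assuming a cascade in which every partial ASR transcript is translated again from scratch. The `retranslate` function, the toy lexicon, and the sample transcripts are all hypothetical stand-ins; the actual ASR and MT services behind MeetDot are not named here.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical word-for-word lexicon standing in for a real MT service.
TOY_LEXICON = {"hello": "hola", "world": "mundo"}


def toy_translate(text: str) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(TOY_LEXICON.get(w, w) for w in text.lower().split())


def retranslate(
    partials: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    """Yield a new caption for each partial transcript.

    Every update translates the whole transcript so far, so words that
    were already displayed can change between updates (caption flicker).
    """
    for transcript in partials:
        yield translate(transcript)


if __name__ == "__main__":
    # Simulated streaming ASR output; the second hypothesis is later revised.
    asr_partials = ["hello", "hello word", "hello world"]
    for caption in retranslate(asr_partials, toy_translate):
        print(caption)
```

When the ASR hypothesis is revised from "word" to "world", the last caption word changes after it has already been shown. That kind of rewrite is what the flicker-reduction features aim to hide from the viewer.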

Research paper: Arkhangorodsky, A., “MeetDot: Videoconferencing with Live Translation Captions”, 2021. Link: https://arxiv.org/abs/2109.09577

