Nagoya University Launches J-Moshi, the First Publicly Available Japanese Conversational AI System
2025-12-26 14:04
Source: Nagoya University

Researchers at Nagoya University in Japan have made significant progress in developing artificial intelligence that mimics human speech patterns, launching J-Moshi, the first publicly available AI system designed specifically for natural Japanese conversation.

J-Moshi captures the natural flow of Japanese conversation, in particular the short verbal responses known as "aizuchi," such as "Sou desu ne" (That's right) and "Naruhodo" (I see), which occur far more frequently in Japanese dialogue than comparable responses do in English. Traditional AI systems struggle to produce aizuchi because they cannot speak and listen at the same time; J-Moshi overcomes this limitation and has been warmly received by Japanese speakers.
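To make the contrast with turn-based systems concrete, here is a minimal, purely illustrative Python sketch of the full-duplex idea: listening and speaking run as concurrent streams, so a short backchannel can be emitted while the user is still talking. None of the names below come from J-Moshi, and the article does not describe its internals at this level of detail.

```python
import asyncio

# Illustrative simulation of full-duplex dialogue: the system "listens"
# and "speaks" concurrently, so it can inject short backchannels
# (aizuchi) mid-turn. All names here are hypothetical, not J-Moshi's API.

AIZUCHI = ["Sou desu ne", "Naruhodo", "Hai"]

async def user_speech(queue: asyncio.Queue) -> None:
    """Simulate a user speaking in a stream of short phrases."""
    for phrase in ["Kinou eiga wo mita", "totemo omoshirokatta", "mata ikitai"]:
        await asyncio.sleep(0.5)   # the user keeps talking...
        await queue.put(phrase)
    await queue.put(None)          # end of the user's turn

async def listen_and_backchannel(queue: asyncio.Queue) -> None:
    """Listen continuously and respond *while* the user is speaking."""
    i = 0
    while True:
        phrase = await queue.get()
        if phrase is None:
            break
        print(f"user:   {phrase}")
        # A turn-based system would wait for the whole turn to end;
        # a full-duplex one can overlap a backchannel immediately.
        print(f"system: {AIZUCHI[i % len(AIZUCHI)]}")
        i += 1
    print("system: (takes the floor for a full response)")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(user_speech(queue), listen_and_backchannel(queue))

asyncio.run(main())
```

A strictly turn-based system would consume the entire user turn before producing any output, which is precisely why timely aizuchi are out of its reach.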

The system was developed by researchers in the Higashinaka Laboratory at Nagoya University's Graduate School of Informatics, building on the English-language Moshi model created by the non-profit laboratory Kyutai. Development took about four months, and the model was trained on multiple Japanese speech datasets, including the J-CHAT dataset created by the University of Tokyo (approximately 67,000 hours of audio) and high-quality dialogue datasets collected by the laboratory itself. To expand the training data, the researchers also built a text-to-speech program that converts written chat dialogues into synthetic voices. The results have been published on the arXiv preprint server.
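As a rough illustration of that augmentation step, the sketch below converts written dialogue turns into paired audio and transcript files of the kind a speech model could train on. The `synthesize` function is a hypothetical stand-in: the article does not name the text-to-speech model the researchers built, so a placeholder tone generator keeps the example self-contained and runnable.

```python
import math
import struct
import wave
from pathlib import Path

# Sketch of the data-augmentation idea described above: written chat
# dialogues become synthetic speech usable as extra training audio.
# `synthesize` is a placeholder for a real Japanese TTS model.

SAMPLE_RATE = 16_000  # 16 kHz mono, a common rate for speech corpora

def synthesize(text: str) -> bytes:
    """Placeholder TTS: a 440 Hz tone whose length scales with the text."""
    n_samples = int(0.05 * SAMPLE_RATE * max(len(text), 1))
    return b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)))
        for t in range(n_samples)
    )

def augment(dialogue: list[str], out_dir: Path) -> None:
    """Turn each written utterance into a (wav, transcript) training pair."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, utterance in enumerate(dialogue):
        with wave.open(str(out_dir / f"turn_{i:03d}.wav"), "wb") as f:
            f.setnchannels(1)            # mono
            f.setsampwidth(2)            # 16-bit PCM
            f.setframerate(SAMPLE_RATE)
            f.writeframes(synthesize(utterance))
        (out_dir / f"turn_{i:03d}.txt").write_text(utterance, encoding="utf-8")

augment(["Konnichiwa", "Sou desu ne", "Naruhodo"], Path("synthetic_dialogue"))
```

Swapping the placeholder for an actual TTS system would yield the kind of synthetic dialogue audio the article describes.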

In January 2025, demonstration videos of J-Moshi attracted widespread attention on social media. Beyond its technical novelty, the system has potential applications in language learning, helping non-native speakers practice and understand natural Japanese conversation patterns. The research team is also exploring commercial applications in call centers, healthcare, and customer service, but notes that the limited availability of Japanese speech data makes deployment in specialized fields challenging.

The team is led by Professor Ryuichiro Higashinaka, who spent 19 years as a corporate researcher at NTT before joining Nagoya University five years ago and who focuses on developing user-facing dialogue systems and voice agents. His 20-member laboratory is tackling challenges that bridge theoretical research and practical application, including understanding conversational timing in Japanese dialogue and deploying AI guides in public venues such as aquariums.

Professor Higashinaka said that technologies like J-Moshi could be applied to systems that currently require human operators, such as the guide robots at Osaka's NIFREL Aquarium, which could handle routine interactions on their own and hand over to a human operator when visitors raise complex questions. He also noted that Japanese AI research faces distinctive challenges, including scarce speech resources and privacy constraints, which push researchers toward creative solutions.

Although J-Moshi is a major step toward capturing natural Japanese conversation patterns, dialogue systems still struggle with complex social context, such as accounting for interpersonal relationships and the physical environment, or recognizing visual cues like facial expressions. In most practical applications J-Moshi still needs human support systems, and the researchers are working to strengthen these, including by developing dialogue summarization and dialogue-failure detection tools.

The laboratory's research extends well beyond J-Moshi to a range of human-machine interaction methods. The team collaborates with colleagues working on realistic humanoid robots, developing systems that coordinate speech, gestures, and movement for natural communication. These robots represent the frontier of the field, demanding dialogue systems that not only grasp the nuances of conversation but also have a physical presence and spatial awareness.

The team's paper on J-Moshi has been accepted for presentation at the international conference Interspeech, and Professor Higashinaka and his colleagues look forward to presenting their results in Rotterdam, the Netherlands, in August 2025. "In the near future, we will witness the emergence of systems capable of seamlessly collaborating with humans through natural speech and gestures," Professor Higashinaka said. "I am eager to create the foundational technologies essential for such a transformative society."
