    [Pangyo Tech] Kakao Enterprise presents 4 papers at the World Voice Processing AI Conference… “Proof of global AI technology!”

    Provided by Kakao Enterprise

    Kakao Enterprise announced that it will present a total of four research results at INTERSPEECH, the world’s largest academic conference in the field of artificial intelligence (AI) voice processing.

    InterSpeech, which marks its 23rd year this year, will be held at Songdo Convensia in Incheon from the 18th to the 22nd. The event is drawing the attention of the global AI industry, as leading AI companies from around the world, including Kakao Enterprise, gather to share their latest research results.

    Kakao Enterprise is participating in this year’s InterSpeech as a platinum sponsor and is presenting AI that speaks and understands like a human through a total of four papers. One of these papers was recognized as being of a high standard among the papers submitted to InterSpeech and was selected for oral presentation.

    Kakao Enterprise focuses on ‘practical AI’ and is investing heavily in research that connects global technology to various services. Since its spin-off from Kakao in 2019, it has published papers at InterSpeech for three consecutive years, for a total of eight papers as of this year.

    This year’s research covers technologies that can be turned into actual services, such as ▲AI that speaks like a human, ▲AI that understands long speech well, and ▲AI that understands complex human emotions, as well as AI that can tell how close a speaker’s English pronunciation is to that of a native speaker. It is meaningful in that it showcases technology that people can easily access, use, and understand. These technologies are expected to lead to services that make many people’s lives more convenient, such as AI chatbots that can hold natural everyday conversations like humans, and AI contact centers that can streamline the work of human counselors.

    Researchers Lim Dan, Jeong Seong-hee, and Kim Eui-seong presented research on AI that speaks like a human. ‘JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech’ proposes a methodology for generating natural, high-quality speech that is difficult to distinguish from that of a real person.

    In an oral presentation on the 19th, researcher Dan Lim presented a method to simplify the existing ‘Neural TTS (Text to Speech)’ development process with the ‘E2E (End to End)-TTS (Text to Speech)’ technique. While the existing neural TTS method had to train two models (an acoustic feature generator and a neural vocoder) separately, the E2E-TTS technique trains the two models at once, which not only shortens the training process but can also produce higher-quality voices. This methodology is applied to all services that use Kakao Enterprise’s deep learning TTS, including Kakao Enterprise’s AI contact center, ‘Kakao i Connect Center’, and ‘Hey Kakao’.

    To converse like a human, AI must first be able to understand long speech. ‘Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers’, in which researcher Ji-hye Lee participated as co-first author, addresses speech recognition errors that occur when the characteristics of the training data and the test data do not match, especially when utterances become long. It proposes a methodology to reduce deletion errors, one type of speech recognition error. Providing stable recognition performance, without degradation, even for voice input whose characteristics differ from the environment in which the recognizer was trained is a very important task in commercializing speech recognition. Kakao Enterprise confirmed a 27.6% performance improvement over the previous method with the new methodology. It is expected to deliver a stable, high recognition rate across the various Kakao Enterprise services that use speech recognition.
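
    For readers unfamiliar with the term, a deletion error is a word that was spoken but is missing from the recognizer’s output. The short Python sketch below counts substitutions, deletions, and insertions with a standard word-level edit-distance alignment. It is only an illustration of the error type, not the paper’s method or evaluation code; the function name `count_errors` and the example sentences are made up for this example.

```python
def count_errors(reference: str, hypothesis: str) -> dict:
    """Align two transcripts word by word and count each error type.

    Illustrative helper (hypothetical name), using plain Levenshtein alignment.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # dp[i][j] = (total_cost, substitutions, deletions, insertions)
    # for aligning the first i reference words with the first j hypothesis words.
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)   # every reference word was dropped
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)   # every hypothesis word is extra

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, subs, dels, ins)          # correct word
            else:
                diagonal = (cost + 1, subs + 1, dels, ins)  # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            deletion = (cost + 1, subs, dels + 1, ins)
            cost, subs, dels, ins = dp[i][j - 1]
            insertion = (cost + 1, subs, dels, ins + 1)
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])

    cost, subs, dels, ins = dp[-1][-1]
    return {
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "wer": cost / len(ref),  # word error rate
    }


# Made-up example: the recognizer stops transcribing partway through a long utterance.
result = count_errors(
    "please call me back after the meeting ends today",
    "please call me back",
)
print(result)
```

    In this made-up case every error is a deletion, which is the pattern the article describes for long utterances.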

    Understanding complex human emotions is also a long-standing challenge for the AI industry. ‘The Emotion is Not One-hot Encoding: Learning with Grayscale Label for Emotion Recognition in Conversation’, written by researcher Lee Joo-seong, presents a methodology for learning complex human emotions. Because multiple emotions are involved in human speech, it is difficult to grasp the intention and context of a conversation with conventional one-hot encoding, which recognizes only one emotion. Researcher Joo-seong Lee proposed a new methodology that creates grayscale labels so that a model can learn the distribution of various emotions. This methodology is applied to Kakao Enterprise’s service-type AI, ‘AIaaS (AI as a Service)’, and provides technology to understand the overall conversation context, analyze the meaning contained in utterances, and give natural answers.
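
    The difference between a one-hot label and a grayscale label can be shown with a few lines of Python. This is a minimal sketch under stated assumptions: the emotion list, the label values, and the model scores are all invented for illustration, and the article does not describe how the paper actually constructs its grayscale labels.

```python
import math

# Hypothetical label set, chosen only for this illustration.
EMOTIONS = ["joy", "sadness", "anger", "surprise"]


def softmax(logits):
    """Turn raw model scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# One-hot label: the utterance is treated as expressing exactly one emotion.
one_hot = [1.0, 0.0, 0.0, 0.0]

# Grayscale label: a distribution over emotions (values invented here).
grayscale = [0.6, 0.0, 0.1, 0.3]

# Invented model scores for one utterance.
predicted = softmax([2.0, -1.0, 0.0, 1.5])

loss_one_hot = cross_entropy(one_hot, predicted)
loss_grayscale = cross_entropy(grayscale, predicted)

for name, p in zip(EMOTIONS, predicted):
    print(f"{name}: {p:.3f}")
print("loss against one-hot label:  ", loss_one_hot)
print("loss against grayscale label:", loss_grayscale)
```

    With the one-hot target, only the probability assigned to "joy" affects the loss. With the grayscale target, the model is also rewarded for placing some probability on "surprise" and "anger", which is the sense in which it learns a distribution of emotions rather than a single class.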

    Lastly, Kakao Enterprise introduced AI that tells you how close your English pronunciation is to that of a native speaker. ‘Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning’, jointly published by researchers Eui-Seong Kim, Jae-Jin Jeon, Hye-Ji Seo, and Hoon Kim, deals with a method of scoring and evaluating the English pronunciation of non-native learners. It proposes a new algorithm that uses deep learning self-supervised learning (SSL) to evaluate English pronunciation through pre-training and fine-tuning, even with a small amount of data. This methodology, which showed a 30% performance improvement over existing learning methods, has also been adopted in the mobile English learning application ‘Vivaboo English’, jointly developed with the English education company English Hunt Co., Ltd., where it is used to implement AI concentration analysis and pronunciation evaluation functions and to provide personalized AI learning reports.

    Choi Dong-jin, Chief Artificial Intelligence Officer (CAIO) and Vice President of Kakao Enterprise, said, “Kakao Enterprise is focusing on practical AI that users can directly utilize and experience, rather than difficult, hard-to-access technology. We expect that the AI powerhouses from around the world gathered at Interspeech will also take note of and relate to Kakao Enterprise’s direction.” He added, “We plan to keep actively supporting and investing heavily in upgrading services such as AI chatbots and AI contact centers.”

    Source: Pangyo Techno Valley Official Newsroom