The future of mobile communication: IVAS audio call


Voice is our primary means of communication, and telephony has enabled us to connect using our voices for over a century. The phone call as we know it has evolved from analogue to digital, from fixed to mobile, and from low speech quality to natural speech quality. One major advancement, however, was still lacking: how to enable a fully authentic, immersive sound to be transmitted, live.
The introduction of the IVAS (Immersive Voice and Audio Services) codec, standardized by 3GPP in Release 18 in June this year, represents a major advancement in audio technology. Unlike traditional monophonic voice calls, IVAS enables the transmission of immersive, three-dimensional audio, offering a richer, more lifelike communication experience. This is made possible by new audio formats optimized for conversational spatial audio. One example is the Metadata-Assisted Spatial Audio format, MASA, which describes a spatial sound scene using only two audio channels plus metadata. Spatial audio calls allow users to experience sound as though it were happening in real life, complete with features like head tracking.
Below we will explore the challenges of bringing live 3D calling to mobile phones, the requirements addressed by spatial communication and the new IVAS codec, and the game-changing impact live 3D audio will have for people, mobile operators, and businesses.
Head of Product Management, Nokia Technologies.
Bringing 3D calling to Mobile Phones
The last major innovation in voice calling was the EVS codec, introduced in 2014 and recognized by consumers as HD Voice+. While it significantly enhanced call quality, like all previous codecs, it only offered a monophonic listening experience.
With the introduction of 3D audio calling, the biggest leap in voice-calling audio technology in decades, comes the challenge of creating an authentic, immersive experience in everyday communication. Voice technology has evolved significantly (from analogue to digital, from fixed to mobile, and from low quality to natural speech quality), but spatial audio, where sounds are perceived as coming naturally from all around the listener, is far harder to reproduce in mobile environments.
Achieving this level of immersion has been easier in controlled settings like movie theaters and video games, where sound design is a core element. Reproducing it in everyday mobile calls introduces a range of technical hurdles, including real-time spatial sound processing, hardware constraints, and compatibility across devices.
The Immersive Voice and Audio Services (IVAS) voice codec is therefore the most significant step forward in voice-call audio technology for decades.
How to Tackle and Overcome Spatial Communication Challenges
There have been several challenges to overcome for Immersive Voice to become a robust spatial audio solution. A key issue is noise reduction, which is crucial for speech clarity in settings like concerts or nature. Traditional noise reduction methods often filter out only continuous sounds, such as air conditioning hums or traffic noise, and leave other background noise in place. Wind interference also poses a challenge by introducing unwanted noise and causing fluctuations in audio levels.
However, recent advancements in machine learning and intelligent noise reduction have addressed these issues. Immersive audio technology, for example, is designed to adjust how much background noise is reduced depending on the surrounding environment, while also letting users adjust the level of noise reduction manually. This ensures that the essential sounds are transmitted while unwanted background noise is minimized.
Immersive audio setups with multiple microphones and loudspeakers also face a major obstacle – acoustic echo. This happens when microphones pick up sound from nearby speakers, causing unwanted feedback. The problem is even more challenging in setups with spatial audio, where the placement and number of loudspeakers affect sound quality and the device’s ability to capture spatial audio. Traditional Acoustic Echo Cancellation (AEC) methods often do not work well in these complex environments. To solve this, a machine-learning-based spatial AEC solution was created, which removes the loudspeaker sound from the microphone input using a reference signal. This improves audio quality, especially for spatial audio in real-time voice applications.
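To make the idea of removing loudspeaker sound "using a reference signal" more concrete, here is a minimal sketch of a classical, single-channel echo canceller based on a normalized least-mean-squares (NLMS) adaptive filter. It is not the machine-learning-based spatial AEC described above, whose internals the article does not detail. The function names, filter length, step size, and the toy 3-tap echo path are all assumptions chosen for illustration.

```python
import random

def nlms_echo_cancel(mic, ref, taps=16, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    mic: microphone samples (echo, plus any near-end sound)
    ref: loudspeaker reference samples, time-aligned with mic
    """
    w = [0.0] * taps    # current estimate of the echo path (FIR coefficients)
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        predicted_echo = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - predicted_echo  # echo-reduced output sample
        norm = sum(b * b for b in buf) + eps
        # NLMS update: step toward the echo path, scaled by reference power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in signal) / len(signal)

# Toy scenario (hypothetical): the far-end signal reaches the microphone
# through a short 3-tap echo path, and nobody is talking at the near end.
random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.2]
mic = [
    sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(ref))
]

residual = nlms_echo_cancel(mic, ref)
print("echo power before:", power(mic[-500:]))
print("echo power after: ", power(residual[-500:]))
```

In this idealized case the residual echo shrinks to almost nothing once the filter has adapted. Real devices must also cope with near-end speech, nonlinear loudspeakers, and multiple channels, which is where a simple linear filter like this one tends to struggle.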
Introducing the IVAS codec
To bring spatial audio to native mobile phone calls, and not only to Over-the-Top (OTT) services, the 3rd Generation Partnership Project (3GPP) recently adopted a new voice codec standard. Developed through the collaboration of 13 companies, the IVAS codec standard was included in 3GPP Release 18 and builds on the widely used Enhanced Voice Services (EVS) codec. Importantly, the IVAS codec maintains full backwards compatibility, ensuring seamless interoperability with existing voice services.
One of the key innovations during IVAS standardization was the creation of a new parametric audio format, Metadata-Assisted Spatial Audio (MASA), designed specifically for devices with small form factors, like smartphones. The IVAS codec includes a built-in renderer that supports head-tracked binaural audio and multi-loudspeaker playback using the MASA format.
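As a rough illustration of what "head-tracked" rendering involves, the sketch below compensates a sound source's direction for the listener's head rotation, so that the source appears to stay fixed in the room as the head turns. It is a simplified geometric example and not the IVAS renderer's actual implementation. The function names, the yaw-only treatment (pitch and roll are ignored), and the sign convention (positive angles to the listener's left) are assumptions.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a source relative to the listener's nose.

    Assumed convention: 0 is straight ahead and positive angles are to the left,
    for both the source azimuth and the head yaw.
    """
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)

# A talker 30 degrees to the left; the listener turns 30 degrees left to face them.
print(relative_azimuth(30.0, 30.0))    # 0.0 (now straight ahead)

# A talker straight ahead; the listener turns 90 degrees to the left.
print(relative_azimuth(0.0, 90.0))     # -90.0 (now directly to the right)

# Wrap-around near the back of the head.
print(relative_azimuth(170.0, -20.0))  # -170.0
```

A binaural renderer would then use the compensated direction to pick the filters applied for each ear, updating it continuously as the head tracker reports new orientations.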
Additionally, an immersive voice client SDK can serve as the IVAS front-end, capturing spatial audio from device microphones and converting it into the standardized MASA format. This technology enables true 3D immersive audio experiences for various types of voice calls.
The Power of 3D Live Audio: What it Means for People, Operators, and Businesses
New immersive 3D audio revolutionizes the audio experience for consumers, enterprises, and industries. For consumers, it deepens engagement in interactions with friends and family by sharing local sounds, whether live-streamed or recorded, and offers full immersion in synchronized metaverse experiences. For enterprises, 3D audio voice calling unlocks new capabilities, from enhanced customer experience through directional audio to transforming team collaboration and decision-making. In industrial settings, audio analytics can drive automated processes like predictive maintenance, streamlining operations, and boosting efficiency.
In order to enable these experiences across diverse network conditions, service providers need scalable solutions that optimize performance regardless of bandwidth constraints. The 3GPP IVAS standard codec accommodates bitrates ranging from 13.2 to 512 kbit/s, ensuring immersive audio quality whether used in congested networks or high-quality streaming environments. This scalability empowers service providers to support more users while delivering rich audio experiences.
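To give a feel for what this bitrate range means on the wire, the short calculation below converts the two endpoint bitrates into payload sizes per audio frame. The 20 ms frame duration is an assumption, as the article does not state it, and the figures cover only the codec payload, not RTP, UDP, or IP headers. The function name is hypothetical.

```python
def payload_per_frame(bitrate_bps, frame_ms=20):
    """Return (bits, bytes) of codec payload produced per frame.

    frame_ms=20 is an assumed frame duration, not a figure from the article.
    """
    bits = bitrate_bps * frame_ms // 1000
    return bits, bits // 8

for label, bitrate_bps in [("13.2 kbit/s", 13_200), ("512 kbit/s", 512_000)]:
    bits, size_bytes = payload_per_frame(bitrate_bps)
    print(f"{label}: {bits} bits = {size_bytes} bytes per frame")

# 13.2 kbit/s: 264 bits = 33 bytes per frame
# 512 kbit/s: 10240 bits = 1280 bytes per frame
```

Under these assumptions the top bitrate carries roughly 39 times more payload per frame than the lowest one, which is the headroom a service provider can trade between audio richness and network load.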
Looking to the future, voice-based user behavior is expected to continue evolving. Beyond traditional calls, spatial audio communication will expand to include semi-synchronous messaging through popular apps, people sending voice clips to each other, and more extensive use of group calls. With the rise of extended reality devices and services across industries, the scope of voice communication is set to become even broader, with immersion as a defining feature. A key factor in this evolution will be standardization and the integration of the IVAS codec into the latest 5G-Advanced standard, which is essential to ensure the interoperability needed to bring 3D calling to every phone at the push of a button.
This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro