3D audio is the secret to HoloLens' convincing holograms


After a few minutes, the echoless chamber starts to feel uncomfortable, even unnatural. The blood pumping through the heart becomes more audible. The ebb and flow of the air in the lungs comes into focus. It’s a feeling that is often experienced inside anechoic rooms, which have been around for many decades. Dr. Leo Beranek, the director of Harvard’s electroacoustic lab, built the first one in 1943 to test broadcasting systems and loudspeakers and to improve noise control during WWII. Since then, similar spaces have been designed to test microphones and to measure HRTFs for multi-directional audio systems.

At Microsoft, Tashev’s chamber has a black leather chair at the center of the room where the HRTFs of 350 people have been measured. After a pair of small, orange microphones has been placed inside the ears of a subject, a black rig equipped with 60 speakers slowly rises from the back. As the contraption moves in an arc over the person, it stops at brief intervals to play sharp, successive, laser-like sounds. The microphones capture the sound waves as they enter the ear canals of the participant.

By playing sounds all around the listener, the team is able to capture the precise audio cues for both right and left ears in relation to 400 directions in the room. These measurements give them a pair of HRTF filters for each sound source. “If we know these filters for all possible directions, then we own your spatial hearing,” says Tashev. “We can trick your brain and make you perceive that the sound comes from any desired direction.”

To place a hologram at a particular location, the audio filter that corresponds to that direction is applied to its sound. When the HoloLens plays those filtered sounds, the HRTF cues trick the human brain into spotting the source almost instantly.
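In signal-processing terms, applying an HRTF filter pair usually means convolving the sound with one impulse response per ear. The following is a minimal Python sketch of that idea, not Microsoft's implementation. The function name `spatialize` and the toy impulse responses are assumptions made up for illustration; real filters would come from measurements like the ones described above.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right impulse-response pair.

    Returns an array of shape (samples, 2): column 0 is the left ear,
    column 1 is the right ear.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Toy impulse responses (invented values, not measured data).
# They mimic a source on the listener's right: the right ear hears the
# sound immediately, the left ear hears it three samples later and quieter.
hrir_right = np.array([1.0, 0.0, 0.0, 0.0])
hrir_left = np.array([0.0, 0.0, 0.0, 0.5])

click = np.array([1.0, 0.0])  # a single short click
stereo = spatialize(click, hrir_left, hrir_right)
```

The delay and level difference between the two output channels are the kind of cues the brain uses to judge direction; measured filters also encode how the outer ear and head color the sound.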

Despite the realism, the paraphernalia required to generate spatial sound has kept it from replacing stereo and surround systems for the masses. Apart from the precise acoustic measurements, it also requires constant head-tracking. The orientation of the head has a direct impact on the way sounds reach the ears. If you’re looking away from the bus on the street, for instance, it will sound different than if you’re looking straight at it.

For HoloLens, however, the team did not need to tackle the head-tracking problem from scratch. The holographic visuals work in part because one of the six cameras in the device monitors the user’s head movements at all times. The audio system simply taps into that information.
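To see why head orientation matters, consider only the horizontal angle. The sketch below, in Python, assumes a simplified model in which the filter is chosen by the source's angle relative to where the head is facing. The function names, the angle convention and the list of measured directions are all hypothetical, and a real system would track full 3D orientation rather than a single yaw angle.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Angle of the source relative to the facing direction, in [-180, 180)."""
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def nearest_direction(azimuth_deg, measured_azimuths_deg):
    """Pick the measured direction closest to the requested angle."""
    return min(
        measured_azimuths_deg,
        key=lambda m: abs(relative_azimuth(m, azimuth_deg)),
    )

# Hypothetical set of directions for which filters were measured.
measured = [-180.0, -135.0, -90.0, -45.0, 0.0, 45.0, 90.0, 135.0]

# A bus sits at 90 degrees in the room.
facing_forward = relative_azimuth(90.0, 0.0)   # bus is off to one side
facing_the_bus = relative_azimuth(90.0, 90.0)  # bus is straight ahead

filter_angle = nearest_direction(facing_forward, measured)
```

As the head turns, the relative angle changes, so a different filter pair has to be selected even though the bus has not moved.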

Microsoft is not the first or only company with the ability to create personalized audio. For most 3D audio experiences in VR, creators have been relying on publicly available HRTF databases or turning to research labs where audio personalization has been possible for a number of years. At Princeton University, Edgar Choueiri, a professor of mechanical and aerospace engineering, has been using the microphone-in-ears technique for the past few years. And VisiSonics, a company based at the University of Maryland’s research lab, has been measuring HRTFs to build its own library.

But Microsoft’s audio system stands apart for its engineering, which makes the audio calibration invisible to the HoloLens user. While the personalization isn’t as perfect as it tends to be inside a controlled lab, it is a lot less tedious.

The first time you wear the device, you start with a wizard that guides you through a calibration for the eyes. For the holographic effect to work, the computer around your head needs to measure the distance between your pupils. It asks you to close one eye, hold your finger up and tap down on a projected image in front of you. You repeat the process with the other eye so the system can calculate the interpupillary distance. But that’s not all the system is doing. Baked into this process is an algorithm that correlates the eye measurements with data from Tashev’s research, which scanned and measured the eyes and ears of hundreds of subjects to build a generic average. Essentially, the distance between the eyes becomes an indicator of the distance between the two ear canals of the person using the device.

The idea is to make the information-gathering process as inconspicuous as possible. “I think we have succeeded,” says Tashev. “Today the final user doesn’t even know when or how the personalization of the HRTFs happens.”
