A first look at Facebook's codec avatars: eerily realistic virtual reality representations of individuals, created by applying machine learning to data collected in a studio.


"There was this big, ugly sucker at the door," said the young woman, eyes bright, "and he said, 'Who do you think you are, Lena Horne?' I said no, but I knew Miss Horne as a sister."

This is the beginning of a short monologue from Walton Jones's play The 1940's Radio Hour, and as she works through it, it's easy to see that the young woman knows exactly what she's doing. Her smile widens as she recounts the doorman's change in tone, as if she's letting you in on the joke. Her lips purse as she lands on just the right words, playing with their cadence. Her expressions are so finely calibrated, and her delivery so assured, that with the dark background behind her, you'd swear you were watching a black-box revival of the late-'70s Broadway show.

There's just one problem: her body disappears below the neck.

Yaser Sheikh reaches over and pauses the video. The woman is a strikingly realistic virtual reality avatar, her performance generated from data collected earlier. But Sheikh, who runs Facebook Reality Labs' Pittsburgh office, has another video he considers more impressive. In it, the same woman appears wearing a VR headset, as does a young man. On the left side of the screen, the two chat in real life; on the right, simultaneously, their avatars carry on in perfect concert. As mundane as the conversation is (they're talking about hot yoga), it's also an unprecedented glimpse of the future.

For years, people have been interacting in virtual reality through avatars, computer-generated characters who represent us. Because VR headsets and hand controllers are trackable, our real head and hand movements carry into these virtual conversations, those unconscious mannerisms adding crucial texture. But even as our virtual interactions have become more naturalistic, technical constraints have forced them to remain visually simple. Social VR apps like Rec Room and Altspace reduce us to caricatures, with expressions that rarely (if ever) match what we're actually doing with our faces. Facebook's Spaces can generate a reasonable cartoon approximation of you from your social media photos, but it depends on buttons and joysticks to trigger certain expressions. Even a more technically demanding platform like High Fidelity, which lets you import a scanned 3D model of yourself, is a long way from being able to create an avatar that feels like you.

That's why I'm in Pittsburgh on a ridiculously cold morning in early March, in a building that very few outsiders have ever set foot in. Yaser Sheikh and his team are finally ready to share their work with me. When the lab got its start, it rented a small office in the East Liberty neighborhood. (They have since moved to a larger space on the Carnegie Mellon campus, with plans to expand again in a year or two.) The codec avatars, as Facebook Reality Labs calls them, are the result of a process that uses machine learning to collect, learn, and re-create human social expression. They're also far from ready for the public. At best, they're years away, if Facebook deploys them at all. But the FRL team is ready to start the conversation. "It's going to be big if we can finish this," Sheikh says, with the barely contained smile of a man who has no doubt they'll finish it. "We want to get it out there. We want to talk about it."

In a 1927 essay titled "The Unconscious Patterning of Behavior in Society," anthropologist Edward Sapir wrote that humans respond to gestures "in accordance with an elaborate and secret code that is written nowhere, known by none, and understood by all." Nearly a century later, replicating that elaborate code has become Sheikh's primary mission.

Before coming to Facebook, Yaser Sheikh was a Carnegie Mellon professor investigating the intersection of computer vision and social perception. When Oculus chief scientist Michael Abrash contacted him in 2015 to discuss where AR and VR might be headed, Sheikh didn't hesitate to share his own vision. "The real promise of VR," he says now, both hands wrapped around an ever-present mug of coffee, "is that instead of flying to meet me in person, you can put on a headset and have this exact conversation we're having now: not some avatar version of you or an ogre version of me, but you looking the way you look, moving the way you move, sounding the way you sound."

(In the lab's founding document, Sheikh described it as a "social presence laboratory," a reference to the phenomenon in which your brain reacts to your virtual surroundings and interactions as though they were real. He also wrote that he believed photorealistic avatars could be achieved within five years, with the help of seven or eight people. The expectations have changed, as has the name: Oculus Research was rebranded as Facebook Reality Labs last year.)

The basic theory of codec avatars is simple and twofold, what Sheikh calls the "ego test" and the "mother test": you should love your avatar, and so should the people who love you. The process of actually making one is far more complicated, as I discover over two different capture procedures. The first takes place in an enclosure called Mugsy, its walls and ceiling studded with 132 off-the-shelf Canon lenses and 350 lights, all pointed at a chair. Sitting in the center feels like being inside a black hole made of paparazzi. "I mistakenly named it Mugshooter," Sheikh admits. "Then we realized that was a horrible, unfriendly name." That was a few versions ago; Mugsy has steadily grown in cameras and capabilities, sending early kludges (like hanging a ping-pong ball on a string to help participants position themselves correctly, the way you'd park a car in a garage) into well-deserved obsolescence.

Inside Mugsy, research participants spend about an hour in the chair, making a series of exaggerated facial expressions and reading lines aloud while an employee in another room coaches them via webcam. Clench your jaw. Relax. Show all your teeth. Relax. Scrunch your face. Relax. "Suck in like a fish," says Danielle Belko, the program's technical lead, as I try not to succumb to paralyzing self-consciousness. "Puff out your cheeks."

If the word panopticon springs to mind, it's apt, though it applies even better to the second capture space, a larger dome known internally as the Sociopticon. (Before joining Oculus/Facebook, Sheikh built its predecessor, the Panoptic Studio, at Carnegie Mellon.) The Sociopticon looks a lot like Microsoft's mixed-reality capture studio, though it has more cameras (180 versus 106), at higher resolution (2.5K by 4K versus 2K by 2K), capturing at a higher frame rate (90 Hz versus 30 or 60). Where Mugsy focuses on your face, the Sociopticon helps the codec avatar system understand how our bodies and clothing move. So my time in there is less about facial expressions than about what I can only call lazy calisthenics: shaking my limbs, jumping around, playing charades with Belko via webcam.

The goal is to capture as much information as possible (Mugsy and the Sociopticon collect 180 gigabytes every second) so that a neural network can learn to associate expressions and movements with sounds and muscle deformations, from every possible angle. The more information captured, the more lifelike the result, and the more robustly the system can encode that information as data and then decode it at the other end, in someone else's headset, as an avatar. As anyone who suffered through video compression problems in the early days of the internet knows, that's where the "codec" in Codec Avatars comes from: encoder/decoder.
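The encoder/decoder pairing behind that name can be illustrated with a deliberately simple sketch. To be clear, this is not FRL's pipeline (their encoder and decoder are learned neural networks, and every name below is invented for illustration); it just shows the shape of the idea: squeeze a large signal into a small code, transmit the code, and reconstruct an approximation on the far side.

```python
# Toy sketch of the encode/decode idea behind the "codec" in Codec
# Avatars. The real system uses learned neural networks; this version
# just keeps the k highest-magnitude coefficients of a signal and drops
# the rest, the way a crude compressor might. All names are illustrative.

def encode(samples, k):
    """Compress: keep only the k coefficients with the largest magnitude."""
    ranked = sorted(enumerate(samples), key=lambda pair: abs(pair[1]), reverse=True)
    return dict(ranked[:k])  # compact code: {index: value}

def decode(code, length):
    """Reconstruct the full-length signal, filling dropped slots with 0.0."""
    return [code.get(i, 0.0) for i in range(length)]

# A captured "expression" vector, the compact code that would actually
# cross the network, and the reconstruction rendered in the other
# person's headset.
raw = [0.9, 0.02, -0.8, 0.01, 0.0, 0.7]
code = encode(raw, k=3)            # only 3 of 6 values are transmitted
restored = decode(code, len(raw))
print(code)      # {0: 0.9, 2: -0.8, 5: 0.7}
print(restored)  # [0.9, 0.0, -0.8, 0.0, 0.0, 0.7]
```

The trade-off is the same one the lab faces at vastly greater scale: the smaller the code, the less bandwidth the conversation needs, and the more the decoder has to be trusted to fill in what was thrown away.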

These aren't just raw measurements, either. As researcher Jason Saragih tells me, the data all has to be interpreted. Everyday users won't have a Mugsy and a Sociopticon in their living rooms; they'll only have their VR and AR headsets. And while today's wearable VR systems are known as head-mounted displays, FRL researchers have built a line of head-mounted capture systems (HMCs). Known internally as Silver, these HMCs train LEDs and infrared cameras on different regions of the face, allowing the software to map what they see back onto the wearer's full likeness.

Someday, Sheikh and his team want to extend this facial capture to the whole body. The software will have to work around what Saragih calls "extrinsics," oddities that would otherwise make virtual interaction less realistic. For example, if it's dark where you are, the system needs to compensate. If you move your hand behind your back, the system needs to account for it, so that if your friend walks behind you (in VR), they can see what your hand is doing. There are others, like predicting how you'll move in order to keep your avatar's motion as smooth as possible, but they all aim at the same thing: removing the variables and letting your avatar be an unhindered, undiluted representation of you.


Animating humans is hard. That's just the truth. Even big-budget videogames struggle with things like hair, eyes, and the inside of the mouth, and missteps lead straight into the uncanny valley, that visceral discomfort caused by seeing something that looks almost, but not quite, human. After my experience with the capture process, when I put on a headset to chat live with Sheikh and researcher Steve Lombardi, I fully expect the virtual encounter to fall into the same trap.

Nope. Sheikh's avatar doesn't have the beard or round glasses he wears in real life (facial hair is apparently harder to get right, so he did his capture without them), but it's him. It's so much him that when he urges me to lean in and look closely at the stubble on his face, it feels unnervingly intimate to do so. It's so much Steve Lombardi that when he walks into the room later, I feel like I already know him, even though I've never met him in the flesh. The results aren't perfect. When people speak animatedly, their avatars' mouths don't move quite as much as their tone suggests; hair is visible down to the individual strand but has a fuzzy aura around it; tongues are a little blurry. But the overall effect is staggering. It shouldn't be possible.

It's a wondrous thing to experience. A troubling one, too. Although codec avatars are still just a research project, we're learning about them at an uncertain time. Deepfakes, AI powerful enough to create faces from scratch, data privacy, disinformation campaigns, and toxic behavior have all become very real problems on a very real internet. And as VR and AR begin their march toward becoming humanity's dominant communication platforms, funded by a social media company that has been at the epicenter of some of those problems, they will only become more urgent. You thought harassment was bad online? You thought VR, which adds embodiment and personal space to the mix, made it even more viscerally disturbing? You haven't seen anything yet.

Sheikh understands the concern. "Authenticity is essential not just for this to succeed, but to protect the people using it," he says. "If you get a call from your mother and you hear her voice, you shouldn't have to doubt that what you're hearing is what she's saying. We need to create that trust, and keep it, from the very beginning." He cites the sensors on the HMCs as a crucial means of authentication: our eyes, our voices, even our mannerisms are all biometrics. (Which, of course, solves one problem but intensifies another.) Conversations about data privacy and VR have intensified in recent years, but an advance like this may well turn them up to 11.

For all the progress VR has made over the past decade, something like codec avatars represents a transition to an entirely new phase of the experience, and the people at the company who have seen them know it. Every year at the Oculus Connect developer conference, Michael Abrash takes the stage and delivers a state-of-the-union address on the pace of research and innovation inside the company's labs. Over the years, he has been bullish on some VR advances, bearish on others. Last October, though, one of his usual bearish positions began to sprout horns. "I wouldn't bet on us having convincing human avatars in the next four years," he said, "but I wouldn't bet against it, either."

Now, sitting next to Yaser Sheikh, I ask him how he felt about Abrash's proclamation at the time.

"He's right," he says, smiling as he sips his coffee.
