Voice technology has led to voice analysis research that offers a better understanding of human behavior, but it raises concerns about privacy and accuracy.


Voicesense makes an intriguing promise to its customers: give us someone's voice and we'll tell you what they will do. The Israeli company uses real-time voice analysis during calls to determine whether a person is likely to default on a bank loan, buy a more expensive product or be the best candidate for a job.

It is one of many companies looking for the personal insights contained in our speech. In recent years, researchers and startups have become aware of the wealth of information that can be mined from the voice, especially as the popularity of home assistants like Amazon's Alexa makes consumers more and more comfortable talking to their devices. The voice technology market is growing and is expected to reach $15.5 billion by 2029, according to a report by the business analytics firm IDTechEx. "Almost everyone is talking, and there are a multitude of devices that capture the voice, whether it's your phone or things like Alexa and Google Home," says Satrajit Ghosh, a researcher at MIT's McGovern Institute for Brain Research who is interested in developing voice analysis for mental health purposes. "Voice has become a fairly ubiquitous stream in life."

The voice is not only ubiquitous; it is also deeply personal and hard to fake (think of the disbelief surrounding the put-on deep voice of former Theranos CEO Elizabeth Holmes), and it is present in some of our most intimate settings. People talk to Alexa (which has mistakenly recorded conversations) at home, and digital voice assistants are increasingly used in hospitals. Voice journal apps such as Maslo rely on users speaking frankly about their problems. Many people now know that their tweets and Instagram posts are being monitored, but they think less about their voice as another form of data that can reveal things about them, both to themselves and to others. All of this has led to exciting research on how this information could enrich our lives, as well as concerns about privacy, about how accurate this information is and about how it will be used.

The key to voice analysis research is not what someone says, but how they say it: tone, speed, emphasis, pauses. The trick is machine learning. Take labeled samples from two groups (for example, anxious people and non-anxious people) and feed that data into an algorithm. The algorithm then learns to detect subtle speech signals that can indicate whether a person belongs to Group A or Group B, and it can do the same on new samples in the future.
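To make that labeled-samples-in, classifier-out loop concrete, here is a minimal Python sketch. It does not reflect how any company in this article works: the three features (average pitch, speaking rate, share of silence), the sample values and the group labels are all hypothetical, and a simple nearest-centroid classifier stands in for the much larger models real systems use. It learns the average feature profile of each group and assigns a new recording to whichever profile it sits closest to.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


# Hypothetical per-recording features; real systems measure far more.
@dataclass
class VoiceSample:
    pitch_hz: float       # average pitch of the speaker
    words_per_sec: float  # speaking rate
    pause_ratio: float    # fraction of the recording that is silence

    def vector(self) -> list[float]:
        return [self.pitch_hz, self.words_per_sec, self.pause_ratio]


class NearestCentroid:
    """Learns one average feature vector per labeled group, then assigns
    new samples to the group whose average is closest."""

    def fit(self, samples: list[VoiceSample], labels: list[str]) -> "NearestCentroid":
        vectors = [s.vector() for s in samples]
        columns = list(zip(*vectors))
        # Standardize each feature so pitch (in Hz) doesn't swamp the ratios.
        self.means = [mean(c) for c in columns]
        self.stds = [pstdev(c) or 1.0 for c in columns]
        self.centroids = {}
        for label in sorted(set(labels)):
            group = [self._scale(v) for v, l in zip(vectors, labels) if l == label]
            self.centroids[label] = [mean(c) for c in zip(*group)]
        return self

    def _scale(self, v: list[float]) -> list[float]:
        return [(x - m) / s for x, m, s in zip(v, self.means, self.stds)]

    def predict(self, sample: VoiceSample) -> str:
        z = self._scale(sample.vector())

        def squared_distance(centroid: list[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(z, centroid))

        return min(self.centroids, key=lambda lbl: squared_distance(self.centroids[lbl]))


# Made-up training data: two labeled groups of recordings.
training = [
    (VoiceSample(210, 3.4, 0.10), "group_a"),
    (VoiceSample(205, 3.6, 0.12), "group_a"),
    (VoiceSample(215, 3.5, 0.08), "group_a"),
    (VoiceSample(180, 2.2, 0.30), "group_b"),
    (VoiceSample(175, 2.0, 0.35), "group_b"),
    (VoiceSample(185, 2.4, 0.28), "group_b"),
]
model = NearestCentroid().fit([s for s, _ in training], [l for _, l in training])

# Classify new, unlabeled recordings.
print(model.predict(VoiceSample(208, 3.3, 0.11)))  # group_a
print(model.predict(VoiceSample(178, 2.1, 0.33)))  # group_b
```

Standardizing the features before measuring distance is the one design choice worth noting: without it, a few hertz of pitch difference would outweigh every other signal simply because of the units.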

The results can sometimes be counterintuitive, explains Louis-Philippe Morency, a computer scientist at Carnegie Mellon University who developed a project called SimSensei that can help detect depression through the voice. In earlier research that tried to link vocal characteristics to the likelihood of a repeat suicide attempt, Morency's team found that people with a soft, breathy voice, rather than those with a loud or angry voice, were more likely to try again, he says. This research is preliminary, however, and the links are not usually so simple. In general, the giveaway is a complex set of features and speech patterns that only algorithms can detect.

"We can provide predictions about health, work, entertainment behaviors"

Nevertheless, researchers have already developed algorithms that use the voice to identify everything from Parkinson's disease to post-traumatic stress disorder. For many, the most promising use of this technology lies at the intersection of voice analysis and mental health, where they hope to create a simple way to monitor and help people at risk of relapse.

People with mental health problems are closely monitored when they are in the hospital, but "a lot of what happens with mental health problems happens day to day," says David Ahern, who directs the Digital Behavioral Health program at Brigham and Women's Hospital. He says that outside a supervised setting, everyday life can wear people down slowly and subtly. In that kind of situation, someone who has previously been diagnosed with depression may not even realize that they have become depressed again. "These events happen when people aren't connected to any kind of health system. And if a situation gets worse to the point that someone seeks care in an emergency room, to use a Midwestern phrase, the horse is already out of the barn," says Ahern. "The idea of having a sensor in your pocket that monitors relevant behavioral activity is conceptually quite powerful. It could be an early warning system."

Ahern is the principal investigator of a clinical trial of CompanionMx, a mental health monitoring system launched in December. (Currently, CompanionMx is available only to physicians and their patients. Other startups, such as Sonde Health and Ellipsis Health, have similar goals.) Patients record audio diaries using the app. The program analyzes these diaries, along with metadata such as call logs and location, to determine how the patient scores on four factors (depressed mood, diminished interest, avoidance and fatigue) and tracks changes over time. This information, which is protected under the federal privacy law HIPAA, is shared with the patient and also presented in a dashboard to a clinician who wants to monitor how their patient is doing.

The company has been testing the product for seven years, with more than 1,500 patients, according to CompanionMx CEO Sub Datta. The product, which comes from another voice analysis company called Cogito, has received funding from DARPA and the National Institute of Mental Health. Results published in the Journal of Medical Internet Research suggest that the technology can predict symptoms of depression and PTSD, although further validation is needed.

In pilot studies, 95% of patients recorded audio diaries at least once a week, and clinicians viewed the dashboard at least once a day, according to Datta. These numbers are promising, although Ahern points out that there are still many questions about which component is most useful. Is it the app itself? The feedback? The dashboard? Some combination? Research is ongoing, and further results have not yet been made public. CompanionMx plans to partner with health care organizations and explore opportunities with the Department of Veterans Affairs.

Meanwhile, services such as Voicesense, CallMiner, RankMiner and Cogito, the parent company of CompanionMx, promise to bring voice analytics to the business world. Most of the time, that means improving customer service engagement in call centers, but Voicesense has bigger ambitions. "Today we are able to generate a complete personality profile," says CEO Yoav Degani. His plans go beyond placating disgruntled customers. His company is interested in everything from predicting loan defaults and forecasting insurance claims to identifying customers' management styles, evaluating job candidates for human resources and assessing how likely employees are to leave. "We are not right 100% of the time, but we are right an impressive percentage of the time," says Degani. "We can provide predictions about health behavior, work behavior, hobbies, etc., etc."

In a case study shared by Degani, Voicesense tested its technology with a large European bank. The bank provided voice samples from a few thousand debtors. (The bank already knew who had and had not defaulted on their loans.) Voicesense ran its algorithm on these samples and ranked the recordings as low, medium or high risk. In one of these analyses, 6% of the people predicted to be "low risk" defaulted, compared with 27% of the group Voicesense considered high risk. In another assessment of how likely temporary employees were to leave their jobs, 13% of those the algorithm rated "low risk" left, compared with 39% of the high-risk group.
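One quick way to read the two results side by side is as a ratio: how many times more often the outcome occurred in the high-risk tier than in the low-risk tier. The short Python sketch below only re-expresses the percentages Degani shared; the function name is illustrative, and the calculation says nothing about group sizes or the medium tier, which the case study as described here does not report.

```python
# Outcome rates from the Voicesense case study (percent of each risk tier).
case_study = {
    "loan default": {"low": 6.0, "high": 27.0},
    "temp-worker attrition": {"low": 13.0, "high": 39.0},
}


def risk_ratio(low_rate: float, high_rate: float) -> float:
    """How many times more often the outcome occurred in the high-risk tier."""
    return high_rate / low_rate


for outcome, rates in case_study.items():
    ratio = risk_ratio(rates["low"], rates["high"])
    print(f"{outcome}: {ratio:.1f}x")
# loan default: 4.5x
# temp-worker attrition: 3.0x
```

By this measure, people flagged as high risk defaulted about 4.5 times as often as those flagged as low risk, and high-risk temporary workers left three times as often.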

"What happens when the algorithms are wrong?"

These are all plausible applications, says Ghosh, the MIT scientist. None of them strikes him as far-fetched. But as with any predictive technology, it is easy to overgeneralize if the analysis is not done well. "In general, until I see evidence that something has been validated on X number of people and this diversity of population, I would have a hard time taking someone's claim at face value," he says. "Voice characteristics can vary considerably unless you have sampled enough, which is why we avoid making very strong claims."

For his part, Degani says the Voicesense speech-processing algorithm measures more than 200 parameters per second and can be accurate in many languages, including tonal languages such as Mandarin. The program is still in the pilot stage, but the company is in contact with major banks, he says, as well as other investors. "Everyone is fascinated by the potential of this technology."

Customer service is one thing, but Robert D'Ovidio, a professor of criminology at Drexel University, worries that some of the applications Voicesense is considering are discriminatory. Imagine calling a mortgage company, he says, that uses your voice to determine that you have a higher risk of heart disease, and then treats you as a higher lending risk because you might not live as long. "I really think we're going to see consumer protection laws created to protect us from this kind of data collection," D'Ovidio adds.

Ryan Calo, a professor at the University of Washington School of Law, believes that some consumer protections already exist. The voice counts as a biometric, and some states, such as Illinois, already have biometric privacy laws. Calo adds that the problem of biases correlated with sensitive categories such as race or gender is endemic to machine learning techniques, whether they are used in voice analysis or in screening résumés. But people feel viscerally upset when these machine learning methods are applied to faces or voices, in part because those features are so personal. And while anti-discrimination laws exist, many of the problems with voice analysis raise broader questions, namely when it is acceptable to use information and what constitutes discrimination, questions our society has not sufficiently debated.

"I hope that in the future, we will understand that it is only data, regardless of their form, such as a series of numbers entered in a spreadsheet or captured voiceprint, "says D'Ovidio. At a minimum, he adds, we should demand that we be informed when something like this is used. "And I would like to see a regulatory movement in consumer protection," he said. "What happens when the algorithms are wrong?"

Correction March 14, 2019, 2:20 pm EDT: Satrajit Ghosh is a researcher at MIT's McGovern Institute for Brain Research. A previous version of this article referred to him as an MIT "professor" on second reference.