In late October ProPublica released a scathing investigation showing how Facebook allows digital advertisers to narrow their target audience based on ethnic affinities like “African-American” or “Hispanic.” The report suggested that Facebook may be in violation of federal civil rights statutes and drew parallels to Jim Crow Era “whites only” housing ads.
Facebook’s privacy and public policy manager, Steve Satterfield, told ProPublica that these ethnic filters exist to allow advertisers to test how different advertisements perform with different sections of the population. While A/B testing is standard practice at large tech companies, his comment did not address whether it is appropriate to segment these tests by ethnicity.
This type of story is increasingly common, as concern grows that automation in hiring, housing, advertising, and even criminal sentencing can lead to discriminatory outcomes. ProPublica’s report isn’t the first scandal over Facebook’s algorithms encoding human biases (recall the firing of human editors on the company’s “trending” feature), and it may not be the last. But there are also good reasons why this type of targeting might not always be racist, and could even be necessary to prevent discrimination.
In fair machine learning, the academic field that studies the design of fair algorithms, it’s understood that rather than ignoring ethnic information, fair algorithms should explicitly use it. An elucidating example comes from a New York Times interview with Cynthia Dwork, a computer scientist at Microsoft Research. She imagines being tasked with selecting bright students, drawn from two ethnic groups, for an internship. In the minority group, cultural norms encourage bright students to major in finance, whereas in the majority group they are steered toward computer science.
A fair algorithm for selecting the best students would then select minority students who majored in finance, and majority group students who majored in computer science. However, without ethnic information to identify students, an algorithm would likely only select for students who majored in computer science, since most of the qualified candidates in the aggregate population will have majored in computer science (as there are numerically more students in the majority group). This scheme would be both less fair and less accurate than the one that incorporates ethnic information.
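Dwork’s thought experiment can be made concrete with a small simulation. This is a minimal sketch, with invented group sizes, talent rates, and major choices (none of these numbers come from the interview itself):

```python
import random

random.seed(0)

# Hypothetical population (numbers and majors invented for illustration).
# Talent is equally common in both groups, but bright minority students
# major in finance while bright majority students major in CS.
def make_student(group):
    bright = random.random() < 0.3
    if bright:
        major = "cs" if group == "majority" else "finance"
    else:
        major = random.choice(["cs", "finance"])
    return {"group": group, "bright": bright, "major": major}

students = ([make_student("majority") for _ in range(900)]
            + [make_student("minority") for _ in range(100)])

# Group-blind rule: pick the major that predicts talent in the aggregate.
blind = lambda s: s["major"] == "cs"

# Group-aware rule: use the signal that is predictive within each group.
aware = lambda s: s["major"] == ("cs" if s["group"] == "majority" else "finance")

def recall(rule, group):
    """Fraction of a group's bright students the rule actually selects."""
    bright = [s for s in students if s["group"] == group and s["bright"]]
    return sum(rule(s) for s in bright) / len(bright)

print(recall(blind, "minority"))  # 0.0: the blind rule misses every bright minority student
print(recall(aware, "minority"))  # 1.0: the aware rule finds them all
```

The group-blind rule isn’t neutral at all: because the majority group is larger, its signal (a CS major) dominates, and the rule systematically excludes every talented minority student.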
Likewise, a Facebook platform that didn’t filter by ethnicity is not a priori guaranteed to be fair; stripping the advertisers’ inputs of racial data doesn’t prohibit discrimination in the algorithm itself. It’s tempting to think that because algorithms make decisions based on data, absent any skewed inputs they don’t exhibit the same biases a human arbiter would. But recent findings have shown this is not the case. For example, “Man is to Computer Programmer as Woman is to Homemaker?”, published this summer, illustrates how web searches could be more likely to show potential employers a male computer science student’s web page rather than a female’s. This was not due to malicious intent, but to the way Google’s neural net algorithm had learned to represent words. It had decided that the word “programmer” linked closer to the word “male” than “female.”
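The geometry behind that finding can be sketched with toy two-dimensional vectors (invented for illustration; real embeddings have hundreds of dimensions learned from text, but the bias, and one debiasing idea from the paper, look the same):

```python
import math

# Toy 2-D "embeddings", invented for illustration: the first axis loosely
# tracks gender, the second tracks being an occupation word.
vecs = {
    "male":       (1.0, 0.0),
    "female":     (-1.0, 0.0),
    "programmer": (0.6, 0.8),   # biased: leans toward the "male" direction
    "homemaker":  (-0.6, 0.8),  # biased: leans toward the "female" direction
}

def cos(a, b):
    """Cosine similarity: how closely two word vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cos(vecs["programmer"], vecs["male"]))    # ≈ 0.6
print(cos(vecs["programmer"], vecs["female"]))  # ≈ -0.6

# One debiasing idea from the paper: remove the component of an occupation
# word that lies along the gender direction, leaving it gender-neutral.
gender = (1.0, 0.0)  # normalized "male minus female" direction in this toy space

def debias(v):
    proj = sum(x * g for x, g in zip(v, gender))
    return tuple(x - proj * g for x, g in zip(v, gender))

p = debias(vecs["programmer"])
print(cos(p, vecs["male"]), cos(p, vecs["female"]))  # both ≈ 0 after debiasing
```

Nothing “decided” to be sexist here: the bias lives in the geometry the training text induces, which is exactly why it can slip into downstream systems unnoticed.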
So how do we design a fair algorithm? Before an engineer writes a line of code, she or he should determine what’s meant by “fair.” One approach aims to formalize John Rawls’ notion of “fair equality of opportunity,” essentially dictating that a procedure is fair if it favors person A over person B only if person A has more innate merit. This frames fairness in terms of how we treat individuals rather than groups. Individual fairness would stipulate, for example, that a qualified black applicant have the same probability of receiving a loan as a qualified white applicant; group fairness, by contrast, would require the percentage of blacks receiving loans to be the same as the percentage of whites receiving loans. Although both group and individual fairness seem to encode important elements of a common-sense definition of fairness, they can actually be at odds with each other in many situations: enforcing group fairness can force unfair decisions at the individual level, and vice-versa.
For example, if in the minority population there is actually a lower proportion of qualified applicants, a group-fair algorithm would necessarily have to either award loans to unqualified members of the minority group, or deny qualified applicants in the majority group. But this violates individual fairness; qualified individuals in the majority group who were denied loans were clearly treated unfairly relative to unqualified individuals in the minority group who received them.
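A small back-of-the-envelope calculation makes the tension explicit (the qualification rates here are invented purely for illustration):

```python
# Hypothetical applicant pools: 60% of majority applicants are qualified,
# but only 40% of minority applicants are. (Numbers invented for illustration.)
majority = [True] * 60 + [False] * 40   # True = qualified
minority = [True] * 40 + [False] * 60

# The individually fair rule approves exactly the qualified applicants,
# which yields unequal approval rates across the groups.
maj_rate = sum(majority) / len(majority)   # 0.6
min_rate = sum(minority) / len(minority)   # 0.4

# Group fairness (demographic parity) demands equal approval rates.
# Equalizing upward to 0.6 means approving 60 minority applicants,
# 20 of whom are unqualified; equalizing downward to 0.4 means denying
# 20 qualified majority applicants. Either way, some individuals are
# treated unfairly relative to others.
unqualified_approved = int(maj_rate * len(minority)) - sum(minority)
qualified_denied = sum(majority) - int(min_rate * len(majority))
print(unqualified_approved, qualified_denied)  # 20 20
```

Under these assumptions there is no decision rule that satisfies both definitions at once, which is why the choice of fairness criterion has to be made deliberately rather than left implicit in the code.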
While it is easy to sound the alarm when ethnic information seems to play a role in an automated system, it’s an artifact of our society’s systemic prejudices that to truly be fair we often must use such information. By the same token, the absence of an ethnic affinity filter or the like does not mean everything is fine and dandy; statistical discrimination may be lurking under the surface. Rather than stopgap measures like removing a filter when it creates a media snafu, companies like Facebook should build fairness into all of their relevant systems and invest in research on algorithmic fairness. Without algorithms with strong fairness properties, as well as studies examining the effects of Facebook’s advertising platform on different ethnic groups, not only can we not truly tell whether those algorithms are discriminatory, Facebook probably can’t, either.
A first step seems to have come in September, when Amazon, Google, Facebook, IBM, and Microsoft announced the formation of the Partnership on AI, a coalition designed to support best practices and promote public understanding of AI and its potential impacts. Interdisciplinary thinking will be vital to ensuring that the tremendous benefits some in society reap from machine learning do not come at the expense of subtle but significant discrimination against others. It only seems fair.