Good coverage, or ‘dangerous science’?

Marin County employees have begun testing to gauge their hidden prejudices, but the method is being called into question by some who say it is ineffective.

In March, the county agreed to pay BiasSync, a Los Angeles company, $150,000 over two years to conduct a series of unconscious, or “implicit,” bias tests and provide related online educational videos. More than 2,300 employees are undergoing the testing, which is designed to determine whether they harbor unconscious bias against African Americans, women and members of the LGBTQ community.

Angela Nicholson, an assistant Marin County administrator, said 86% of county employees have now completed their first BiasSync training session, which included both a racial and gender bias assessment conducted via the internet.

Next year, employees will be tested to determine their bias towards people in the LGBTQ community.

“This is an important effort in creating and sustaining a culture of belonging where each of us can thrive,” said the email asking employees to take the bias test.

“The BiasSync program will help you understand potential unconscious biases you may have,” the email stated, “part of a professional development path that will help you as an individual and us as a county.”

Whether the BiasSync test accurately measures unconscious bias, however, is not a settled question, nor is whether the assessment results the company provides can be put to any useful purpose.

In fact, some psychologists question whether the Implicit Association Test (IAT), the original test for unconscious bias and the one considered to be the gold standard, is scientifically valid.

Nicholson declined to disclose the results of the bias assessments recently completed by Marin employees. She wrote that “the results could be misinterpreted and used in a manner inconsistent with the benefits of the program and could chill participation moving forward.”

BiasSync’s managers declined to be interviewed. Instead, they supplied several promotional videos.

The IAT was co-developed in 1998 by Mahzarin Banaji, now the chair of Harvard University’s psychology department, and Anthony Greenwald, a recently retired social psychology researcher at the University of Washington.

“A powerful new psychological tool that shows a shocking number of people — as many as 90 to 95 percent — display the unconscious roots of prejudice will be demonstrated,” said the University of Washington announcement rolling out the test.

“The test measures people’s implicit or unconscious evaluations and beliefs about groups that spring from strong, automatic associations of which a person may be unaware,” the announcement said.

The IAT, to which BiasSync’s test bears a resemblance, uses a very simple approach.

Test takers sit in front of a computer screen and are shown a series of faces and words. They are told to hit a key on the right side of their computer keyboard if the word or image is good and a key on the left side of their keyboard if the word or image is bad.

Then a layer of complexity is added. Test takers are instructed to hit the left key when they see either a Black face or a good word and to hit the right key when they see either a White face or a bad word.

Finally, the associations are flipped to Black face-bad and White face-good.

The test’s efficacy is based on the assumption that people who are quicker to associate good words with White faces and bad words with Black faces harbor unconscious bias in favor of White people and against Black people.
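The timing comparison behind that assumption can be sketched in a few lines of Python. This is a simplified illustration only, not BiasSync's proprietary scoring and not the exact published IAT algorithm: the response times are invented, and the function name `latency_score` is hypothetical. It takes the difference between the average response times in the two pairings and divides it by the spread of all the response times, so a positive number means the test taker was slower when Black faces shared a key with good words.

```python
from statistics import mean, stdev

def latency_score(pairing_a_ms, pairing_b_ms):
    """Difference in mean response time (B minus A), scaled by the
    standard deviation of all responses pooled together."""
    pooled_sd = stdev(pairing_a_ms + pairing_b_ms)
    return (mean(pairing_b_ms) - mean(pairing_a_ms)) / pooled_sd

# Hypothetical response times in milliseconds (invented for illustration).
white_good_black_bad = [620, 580, 640, 600, 610]  # pairing A
black_good_white_bad = [700, 660, 720, 690, 730]  # pairing B

score = latency_score(white_good_black_bad, black_good_white_bad)
print(round(score, 2))
```

A score near zero would mean the two pairings took about the same time. Where to draw the line between a "low" and a "high" score is a separate choice that the arithmetic itself does not settle.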

“The measures found in the company’s cognitive assessments are based on pre-existing scientific research that have a proven track record of realizing actionable data on individual and group rates of unconscious bias,” BiasSync’s website says.

Dan Gould, BiasSync’s co-founder, president and chief technology officer, is quoted on the company’s website as saying, “We have developed a proprietary tool—built from multiple scientific methodologies — that assesses every individual in an organization.” Gould is former vice president of technology at the online dating app Tinder.

Greenwald, the IAT co-developer, said he is highly skeptical of companies offering the type of services BiasSync provides because often they’re not evidence-based.

“There is no evidence that treatment interventions that are offered to reduce implicit bias have any durable effects,” Greenwald said. “If BiasSync did not provide evidence that they can do what they say they can do, those (Marin) executives should not be purchasing this.”

“The reviews of literature on the general class of procedures that I refer to as group-administered training,” Greenwald said, “is that there is no evidence that they succeed in reducing biases or in improving workforce diversity. The scientific evidence is that this kind of training is not established as effective.”

In a promotional video, Gould says BiasSync’s assessments are based on a “whole class of assessments” that were developed at the University of Washington.

“What we did was we both took those original ideas for assessments and modernized them,” Gould says.

BiasSync did not respond to queries regarding how it developed its version of the implicit bias test, but there appear to be few consequential differences between the BiasSync test and the IAT, which is available for anyone to take free online.

“Marin County could have sent its employees to the Project Implicit website to take that test,” Greenwald said.

Project Implicit is a nonprofit founded in 1998 by Greenwald, Banaji and Brian Nosek, a psychology professor at the University of Virginia, to educate the public about bias and to collect data via the internet. Project Implicit also provides educational services to corporate clients that can include an Implicit Association Test demonstration.

Greenwald said BiasSync could have based its bias test on the IAT’s design.

“It is in the public domain,” he said. “I and Project Implicit freely want others to use it for educational purposes.”

Greenwald said BiasSync “really shouldn’t be using it for commercial purposes without giving some credit and probably payment to Project Implicit.”

Amy Jin Johnson, executive director of Project Implicit, said, “BiasSync has not purchased or licensed any intellectual property from Project Implicit.”

One feature that BiasSync offers that Project Implicit doesn’t offer is the ability for clients, such as Marin County, to get aggregated data on the test results of their employees.

In the BiasSync video that Marin County employees watched, company co-founder Michele Ruiz, a former Los Angeles news anchor, said, “Your individual answers will not be provided to your company in any way that personally identifies you. Only anonymized aggregate data is being used.”

Greenwald said, “Yeah, but they’re not going to do anything with those aggregated results. They’re not going to find out anything different from what is published.”

Since the IAT was created, millions of people have taken the test on the Project Implicit website. Greenwald says the data show that about 70% of people — 80% of White people and a smaller percentage of African American people — produce results that demonstrate an “automatic White preference to a non-trivial degree.”

“Marin County won’t find anything different from that,” he said.

Hart Blanton, a social psychologist and professor of communication at Texas A&M University, also sees problems with companies such as BiasSync employing implicit bias tests, but for a different reason.

“I’m not aware of any technology that can accurately diagnose someone’s unconscious biases,” Blanton said. “I’m not aware of any serious researcher who is pursuing that goal anymore.”

Blanton is one of several psychologists who have challenged the scientific validity of the IAT in recent years.

BiasSync bills itself as “a science-based solution designed to help organizations more effectively assess and manage unconscious bias in the work environment.”

BiasSync admits that its trainings don’t reduce unconscious bias in people, but it says the assessments are important.

In another promotional video, Ruiz and Gould discuss the efficacy of the bias assessments. They say the data from the tests are less impactful than the effect the tests have on those who take them.

“What we learned was giving individuals their own results had this real value of shocking them into caring,” Gould said, “into understanding that, whoa, I have these biases. It really led to a big interest from a ton of employees in working to overcome them. It can have super, super powerful life impacts.”

Ruiz said, “It is a significant aha moment. It creates this opportunity or desire to do better through various methods.”

But what if the assessments don’t accurately measure unconscious bias? Is it ethical to lead people to believe they harbor unconscious bias against Black people or women or members of the LGBTQ community if they actually don’t?

Blanton doesn’t think so.

“Even the designers of the IAT now discourage applications that involve interpreting a person’s score,” Blanton said. “They argue against its use for juries or other sorts of critical decisions you’d make about an individual.”

Blanton and other psychologists have criticized the IAT, asserting that both its retest reliability and its predictive validity are too low for its results to be taken seriously.

Retest reliability measures how likely test takers are to get the same results if they take the test multiple times.

“There is far more random noise in the score that someone has in a test like this than there is actual signal,” Blanton said. “The problem is a person’s test doesn’t even predict their test score a few hours later. If it is changing that much, it doesn’t seem to be measuring some stable quality in a person.”

As detailed in a New York magazine article by Jesse Singal, most estimates of retest reliability have found that race IATs achieve the same results no more than half the time. Test-retest reliability is measured using a correlation coefficient known as r, which for this purpose ranges from 0 to 1.

“The most recent meta-analysis of test retest reliability shows that it is about 0.5,” Greenwald said. “This means that it is not a strong enough reliability figure to take a single IAT measure as diagnostic of a person.”
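For readers who want to see how such a figure is computed: r is the correlation between each person's score at one sitting and the same person's score at a later sitting. The sketch below uses invented scores for six hypothetical test takers. The function name and the data are assumptions for illustration, not Project Implicit's data or code.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six people on two occasions.
first_sitting = [0.10, 0.45, 0.30, 0.70, 0.55, 0.20]
second_sitting = [0.40, 0.25, 0.50, 0.60, 0.15, 0.35]

r = pearson_r(first_sitting, second_sitting)
print(round(r, 2))
```

A value of 1 would mean everyone kept the same standing relative to the others from one sitting to the next. A value near 0 would mean the first score says almost nothing about the second.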

Singal cites an email from Banaji that said, “We have always argued that the IAT should NOT be used as a diagnostic tool. It is not, as I’ve said, a DNA test.”

Banaji declined to be interviewed but wrote in an email: “The race IAT has good reliability and validity, but it should never be used in assessing individuals in any selection context. It serves a good function for research and education.”

In its training video, BiasSync seems to convey a mixed message on this score.

“So what do the results of this test mean?” Ruiz asks in the video that Marin County employees watched. “They may reveal that you could be biased against a certain race,” she says, and “it tells you how strong that bias is.”

But Ruiz then adds that even if the test indicates the test taker has a strong unconscious bias that “does not mean you’re racist, nor should the results of one test make you draw any definitive conclusions.”

Ruiz provides no information about why test takers should not consider the results conclusive.

In an appearance on the BiasSync video, Dolly Chugh, a social psychologist at New York University, says, “What we know is that if we think we don’t have any biases, that’s our bias.”

Singal, in a Wall Street Journal article, notes: “Only about half of all published experimental psychological findings are successfully replicated by other researchers. The subfield of social psychology tends to fare even worse.”

Regarding the IAT, Blanton said, “The behavioral predictions of these scores is also abysmal. There is really no evidence they are measuring qualities in people that affect how they act or the decisions that they make.”

Estimates of the predictive validity for the race IAT have ranged from less than 1% to 5.5%. The higher estimate came from a meta-analysis, which combines the results of numerous studies, performed by a team that included Greenwald and Banaji.
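To relate these percentages to the correlation figures quoted elsewhere in the debate, the short sketch below converts between the two. It rests on an assumption the article does not spell out: that the percentages describe the share of variation in behavior explained by the test score, which is the square of the correlation r. The function name is hypothetical.

```python
from math import sqrt

def variance_share_to_r(share):
    """Convert a share of variance explained (0 to 1) to a correlation r,
    assuming share = r squared."""
    return sqrt(share)

# The two ends of the range quoted above, written as fractions.
for share in (0.01, 0.055):
    print(f"{share:.1%} of variance -> r of about {variance_share_to_r(share):.2f}")
```

Under that assumption, even the higher estimate corresponds to a correlation well below the retest figure of about 0.5.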

Blanton says in that meta-analysis the researchers counted both favorable behavior and negative behavior toward Black people as demonstrating predictive validity, claiming that the implicitly biased people who acted favorably towards Black people were overcompensating.

“That is simply incorrect,” Greenwald said.

Greenwald concedes that both the repeatability and the predictive validity of the race IAT are low to moderate. He argues, however, that the scores are meaningful if someone takes the test many times or if the test is administered to a large group of people.

“These magnitudes of correlation count for discriminatory effects when aggregated across large populations or when applied repeatedly to the same person,” Greenwald said.

The Project Implicit website states: “The link between implicit bias and behavior is fairly small on average but can vary quite greatly. … However, even small effects can be important. Small effects can build into big differences at both the societal level (across lots of different people making decisions) and at the individual level (across the many decisions that one person makes).”

Greenwald uses the example of blood pressure readings: Individual readings might be inaccurate but multiple readings provide a fairly accurate measure.
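The statistical idea behind the blood pressure analogy can be illustrated with the Spearman-Brown formula, a standard textbook result for how the reliability of an average improves as more measurements are combined. The sketch below applies it to a single-test reliability of 0.5. It is an illustration under a strong assumption, namely that every sitting measures the same stable quality plus independent random error; it is not a calculation attributed to Greenwald or Project Implicit, and the function name is hypothetical.

```python
def reliability_of_average(r_single, k):
    """Spearman-Brown formula: reliability of the average of k
    measurements that each have reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)

# Single-test reliability of 0.5, averaged over 1, 2 and 6 sittings.
for k in (1, 2, 6):
    print(k, round(reliability_of_average(0.5, k), 2))
```

If that assumption fails, for example because the score tracks something that shifts from hour to hour, averaging more sittings does not deliver the improvement the formula predicts.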

So how many times do people have to take the IAT before they can be assured of an accurate measure of their unconscious racial bias?

“I’d say probably half a dozen is not necessary,” Greenwald said, “but I wouldn’t discourage a person who was skeptical about it to repeat it half a dozen times.”

Blanton says the blood pressure example isn’t applicable, and he says it illustrates another problem with the IAT — that the boundaries for determining which scores indicate bias and which don’t are arbitrary.

“With blood pressure, you know you’re capturing something in the moment that you know is real and physical,” Blanton said. “The reason we call something high blood pressure is that we have data that says that if you hit these marks your chance of a heart attack or stroke goes from here to there.”

“High blood pressure is high because it is dangerous,” Blanton said. “You’re high on the IAT because the researchers who designed it decided to call it high.”

Critics of the IAT have also questioned whether factors besides bias — such as reaction times — could be causing high scores.

Greenwald acknowledged that there have been “some criticisms” of the IAT by scientists but said, “They are not by any means fatal criticisms. They have been mostly addressed empirically.”

“This is somewhat political,” Greenwald said. “Most of the critique comes from people who are associated with conservative media.”

Blanton said, “I guess they really love believing that. I find it hilarious. I’m pretty far left of center.”

“If people are looking for the motive behind the critique and there is a motive,” he said, “I’m trying to defend science. This is bad science.”

The contract with BiasSync is just the latest chapter in the county’s quest to achieve racial equity.

In 2017, the county adopted a “racial equity action plan” that among other initiatives required all county employees to attend four to six hours of “cultural intelligence” training.

“We have acknowledged that inequity has been baked into government structure through the years, and this gives us a tool to address it in our community,” Nicholson, the assistant county administrator, said at the time.

In 2019, the director of the Marin County Department of Health and Human Services, Grant Colfax, unveiled a race-based plan to achieve health equity in the county.

“This strategy really recognizes and acknowledges our country’s racist history and acknowledges that we need to address and redress these racial dynamics in our system in order to address health equity in our communities,” said Colfax.

Since then, the county has expanded its emphasis on equity to require county department heads to view their mission through an “equity lens.” Every staff report that comes to county supervisors for action must now include analysis of the “equity impact.”

“BiasSync is a science based tool designed to help individual employees identify and manage their own personal biases,” Supervisor Katie Rice wrote in an email. “Recognizing our biases is key to ensuring that bias does not influence how we treat each other in the workplace, our policy making, even program design and delivery of services.”

The racial breakdown of Marin County’s workforce is 60% White, 20% Latino, 10% Asian, 6.7% African American, 0.2% American Indian/Alaskan Native, 0.2% Native Hawaiian/Pacific Islander, and 2.5% two or more races.
