The IAT: Questions of Reliability and Validity
The Implicit Association Test (IAT) is a very popular method for measuring implicit (implied though not plainly expressed) biases. Greenwald (2010), one of the primary test developers, writes that “It has been self-administered online by millions, many of whom have been surprised—sometimes unpleasantly—by evidence of their own unconscious attitudes and stereotypes regarding race, age, gender, ethnicity, religion, or sexual orientation.” The IAT purports to tap into our unconscious or intuitive attitudes at a deeper level than those that we are able to rationally express. The best way to get an idea of what the IAT is, is to take it. If you haven’t done so already, go to the Implicit Association Test website and participate in a demonstration of the Race Test. It takes about ten minutes.
I tend to have a skeptical inclination. This stems in part from the training I received while earning my PhD in psychology. But it is also just part of who I am. Psychology is, in itself, a rather soft science, full of constructs and variables that are inherently difficult to measure with any degree of certainty. I learned early in my training that there are dangers associated with inference and with measuring intangibles. In fact, my training in personality and projective measures essentially focused on why not to use them, especially when tasked with helping to make important life decisions. Why is this? All psychological measures contain some unavoidable error, but those based on constructs and inference are particularly untenable.
This is relevant because when we look at thinking processes, we are dealing with intangibles. This is especially true when we are talking about implicit measures. Any discussion of implicit thought necessitates indirect or inferential measures and the application of theoretical constructs. So, with regard to the IAT, one needs to be careful.
A growing body of evidence suggests that our intuition has a powerful influence over our behavior and moment-to-moment decision making. Books like Blink by Malcolm Gladwell and How We Decide by Jonah Lehrer point out the power of intuition and emotion in this regard. Chabris and Simons, in their book The Invisible Gorilla, make a strong argument that intuition itself sets us up for errors. Gladwell perhaps glorifies intuition, but the reality is that intuition is a powerful force. Gladwell uses the story of the IAT as evidence of such power. Essentially, if the IAT is a valid and reliable measure, it provides strong evidence of the problems of intuition.
I am motivated to shed some light on the IAT – not because of my personal IAT results, which were disappointing, but because the IAT risks gaining widespread application without sufficient technical adequacy. Just think of the ubiquitous Myers-Briggs Type Indicator and the breadth and depth of popular use and appeal that it has garnered (without a shred of legitimate science to back it up). Real decisions are made based on the results of this instrument, and frankly that is dangerous. The Myers-Briggs is based on unsubstantiated and long out-of-date Jungian constructs and was built by individuals with little to no training in psychology or psychometrics. This is certainly not the case for the IAT, but the risks of broad and perhaps erroneous application are similar.
The authors of the IAT have worked diligently over the years to publish studies and facilitate others’ research in order to establish the technical adequacy of the measure. This is a tough task because the IAT is not one test, but rather, it is a method of measurement that can be applied to measure a number of implicit attitudes. At the very foundation of this approach there is a construct, or belief, that necessitates a leap of faith.
So what is the IAT? Gladwell (2005) summarizes it in the following way:
The Implicit Association Test (IAT)…. measures a person’s attitude on an unconscious level, or the immediate and automatic associations that occur even before a person has time to think. According to the test results, unconscious attitudes may be totally different or incompatible with conscious values. This means that attitudes towards things like race or gender operate on two levels:
1. Conscious level- attitudes which are our stated values and which are used to direct behavior deliberately.
2. Unconscious level- the immediate, automatic associations that tumble out before you have time to think.
Clearly, this shows that aside from being a measurement of attitudes, the IAT can be a powerful predictor of how one [may] act in certain kinds of spontaneous situations.
So here is one of the difficulties I have with the measure. Take this statement: “The IAT measures a person’s attitude on an unconscious level, or the immediate and automatic associations that occur even before a person has time to think.” Tell me how one would directly and reliably measure “unconscious attitude” without using inference or indirect measures that are completely dependent on constructs? I am not alone in this concern. Texas A&M University psychologist Hart Blanton, PhD, worries that the IAT has been used prematurely in research without sufficient technical adequacy. Blanton has published several articles (Blanton et al., 2007; Blanton et al., 2009) detailing the IAT’s multiple psychometric failings. He suggests that perhaps the greatest problem with this measure concerns the way that the test is scored.
First you have to understand how it all works. The IAT purports to measure the fluency of people’s associations between concepts. On the Race IAT, a comparison is made between how fluently the respondent pairs pictures of European-Americans with words carrying a connotation of “good” and pictures of African-Americans with words connoting “bad.” The task measures the latency of such pairings and compares it to the latency of responding when the associations are reversed (i.e., how quickly the respondent pairs European-Americans with words carrying a “bad” connotation and African-Americans with words connoting “good”). If one is quicker at pairing European-Americans with “good” and African-Americans with “bad,” then it is inferred that the respondent has a European-American preference. The degree of preference is determined by the size of the difference in fluency between the two sets of pairings: bigger differences in pairing times result in stronger ratings of one’s bias. Blanton questions where the cutoffs for mild, moderate, and strong preferences are set, given that there is no research showing where they should be. The bottom line, Blanton argues, is that the cutoffs are arbitrary. This is a common problem in social psychology.
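To make the scoring logic concrete, here is a minimal sketch in Python of the comparison described above. It is not the IAT's actual scoring algorithm. The response times are made-up numbers, and the function names (`latency_difference`, `label_preference`) and the cutoff values are hypothetical, chosen only to show how a raw difference in pairing times gets turned into a verbal label.

```python
from statistics import mean

def latency_difference(first_ms, reversed_ms):
    """Mean latency of the reversed pairings minus mean latency of the first pairings (ms)."""
    return mean(reversed_ms) - mean(first_ms)

def label_preference(diff_ms, cutoffs=(50, 150, 300)):
    """Map a latency difference to a verbal label.

    The cutoffs are hypothetical thresholds (in ms) for mild, moderate,
    and strong preference; they are not taken from the IAT.
    """
    mild, moderate, strong = cutoffs
    size = abs(diff_ms)
    if size < mild:
        return "little or no preference"
    if size < moderate:
        return "mild preference"
    if size < strong:
        return "moderate preference"
    return "strong preference"

# Made-up response times in milliseconds for one respondent.
first_pairing = [620, 580, 640, 600, 610]
reversed_pairing = [780, 760, 820, 790, 800]

diff = latency_difference(first_pairing, reversed_pairing)

# Same respondent, same data, two different sets of cutoffs.
print(diff, label_preference(diff))
print(diff, label_preference(diff, cutoffs=(100, 200, 400)))
```

With these invented numbers, the same 180 ms difference is labeled "moderate" under one set of cutoffs and "mild" under the other. Nothing about the respondent changed; only the thresholds did. That, in miniature, is the concern about arbitrary cutoffs.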
Another issue of concern is the stability of the construct being measured. One has to question whether one’s bias, or racial preference, is a trait (a stable attribute over time) or a state (a temporary attitude based on acute environmental influences). The test-retest reliability of the IAT is itself relatively unstable. Regardless, according to Greenwald: “The IAT has also shown reasonably good reliability over multiple assessments of the task. …. in 20 studies that have included more than one administration of the IAT, test–retest reliability ranged from .25 to .69, with mean and median test–retest reliability of .50.” Satisfactory test-retest reliability values are in the .70 to .80 range. To me, there is a fair amount of variance unaccounted for and a wide range of values (suggesting weak consistency). My IATs have bounced all over the map. And boy did I feel bad when my score suggested a level of preference that diverges significantly from my deeply held values. Thank goodness I have some level of understanding of the limitations of such metrics. Not everyone has such luxury.
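For readers who want to see what a test-retest coefficient actually is, here is a small Python sketch. The scores are invented for illustration, and `pearson_r` is a hand-written Pearson correlation, not code from the IAT authors. The squared coefficient is the proportion of variance in the second set of scores that is linearly predictable from the first.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Invented scores for five respondents who took the same test twice.
first_administration = [1, 2, 3, 4, 5]
second_administration = [3, 1, 4, 2, 5]

r = pearson_r(first_administration, second_administration)
print("test-retest r:", round(r, 2))
print("variance predictable from the first sitting:", round(r ** 2, 2))

# The values quoted above, plus the conventional .70 to .80 benchmark.
for value in (0.25, 0.50, 0.69, 0.70, 0.80):
    print(value, round(value ** 2, 2))
```

In this made-up example the coefficient comes out at .50, the mean value Greenwald reports, which means only a quarter of the variance at the second sitting is predictable from the first.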
As I noted previously, the IAT authors have worked diligently to establish the technical adequacy of this measure, and they report statistics attesting to its internal consistency, test-retest reliability, predictive validity, convergent validity, and discriminant validity, almost always suggesting that results are robust (Greenwald, 2010; Greenwald et al., 2009; Lane et al., 2007). There are other studies, including those carried out by Blanton and colleagues, that suggest otherwise. To me, these analyses are important and worthwhile – however, at the foundation, there is the inescapable problem of measuring unconscious thought.
Another core problem is that the validity analyses employ other equally problematic measures of intangibles in order to establish credibility. I can’t be explicit enough: when one enters the realm of the implicit, one enters a realm of intangibles, and like it or not, until minds can be read explicitly, the implicit is essentially immeasurable with any degree of certainty. The IAT may indeed measure what it purports to measure, but the data on this are unconvincing. Substantial questions of reliability and validity persist. I would suggest that you do not take your IAT scores to heart.
Azar, B. (2008). IAT: Fad or fabulous? Monitor on Psychology. July. Vol 39, No. 7, page 44.
Blanton, H., Jaccard, J., Christie, C., and Gonzales, P. M. (2007). Plausible assumptions, questionable assumptions and post hoc rationalizations: Will the real IAT, please stand up? Journal of Experimental Social Psychology. Volume 43, Issue 3, Pages 399-409.
Blanton, H., Klick, J., Mitchell, G., Jaccard, J., Mellers, B., & Tetlock, P. E. (2009). Strong Claims and Weak Evidence: Reassessing the Predictive Validity of the IAT. Journal of Applied Psychology. Vol. 94, No. 3, 567–582.
Chabris, C. F., & Simons, D. J. (2010). The Invisible Gorilla. Random House: New York.
Gladwell, M. (2005). Blink: The Power of Thinking Without Thinking. Little, Brown and Company: New York.
Greenwald, A. G. (2010). I Love Him, I Love Him Not: Researchers adapt a test for unconscious bias to tap secrets of the heart. Scientific American.com: Mind Matters. http://www.scientificamerican.com/article.cfm?id=i-love-him-i-love-him-not
Greenwald, A. G. (2009). Implicit Association Test: Validity Debates. http://faculty.washington.edu/agg/iat_validity.htm
Greenwald, A. G., Poehlman, T. A., Uhlmann, E., & Banaji, M. R. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology. 97, 17–41.
Lane, K. A., Banaji, M. R., Nosek, B. A., & Greenwald, A. G. (2007). Understanding and using the Implicit Association Test: IV. What we know (so far) (Pp. 59–102). In B. Wittenbrink & N. S. Schwarz (Eds.). Implicit measures of attitudes: Procedures and controversies. New York: Guilford Press.
Lehrer, J. (2009). How We Decide. Houghton Mifflin Harcourt: New York.