Seeing is believing, but first impressions do not tell the whole truth.

Well, who you gonna believe, me or your own eyes?

Chico Marx in Duck Soup (1933)

The sirens of observed behavior do not seduce us to a watery grave; they sing of truths so satisfying that we cease to sail. At port, we tell tales of whole oceans after having seen a single cove just outside the harbor.

It might seem that direct observation would be the final authority that trumps all other forms of evidence. However, there are reliability and validity concerns about direct observation that are every bit as serious as those associated with ability tests, rating scales, and interviews (Meier, 1994). It is not that observed behavior gives false information; rather, the true information it provides is so vivid that other truths are ignored, leaving our interpretation incomplete.

Even though we know that behavior can vary considerably from day to day, it is rare for examiners to observe examinees for more than an hour or two in naturalistic settings (e.g., classrooms, playgrounds, and group homes). Worse, most direct observation occurs in the unnaturalistic setting of the testing environment. The testing environment pulls for particular sets of temporary behaviors that are easily mistaken for persistent personality traits. Even those of us who intellectually appreciate the allure of the fundamental attribution error (Ross, 1977) find it hard to resist the urge to overgeneralize that which we have observed with our own eyes.

We have reason to reserve judgment when an examinee does something unusual in the testing environment because the testing environment is itself unusual. The testing environment differs from most other environments, in part because the interaction is most often one-to-one and thus more personal and focused than group interactions. The intense, unfailing attention of the typical examiner is a rather unusual experience for most people. Being assessed is a break from the examinee’s normal routine, and most examinees find that break quite interesting until the novelty wears off. In addition, the environment is carefully controlled to maximize the examinee’s attention and performance. In other words, testing is designed to elicit the person’s optimal performance. Therefore, the observed behaviors may not be representative of a person’s typical behaviors in another setting, such as a chaotic home, a noisy classroom, or a competitive work environment.

If you believe that the observed test behaviors are indeed similar to those in the home, school, or workplace, you must confirm that this is the case with supplementary evidence. Direct observation is indispensable, but our best hope for accuracy is in a disciplined, systematic integration of all the available evidence.

Excerpt from pp. 103–104 of Schneider, W. J., Lichtenberger, E. O., Mather, N., & Kaufman, N. L. (2018). Essentials of Assessment Report Writing (2nd ed.). Hoboken, NJ: Wiley.

