Today I am reviewing a listening comprehension assessment for students 5 to 21 years of age, the Oral Passage Understanding Scale (OPUS), created by Elizabeth Carrow-Woolfolk, PhD, and Amber M. Klein, PhD, and available from WPS.

The OPUS is a test of listening comprehension that assesses the following forms of knowledge: lexical/semantic (knowledge and use of words and word combinations), syntactic (knowledge and use of grammar), and supralinguistic (knowledge and use of indirect/complex language).

It is composed of 6 item sets, labeled A through F. Each set consists of 5 passages, each accompanied by 7 to 10 questions, and each item set is designated for use with a particular age group. This test, published in 2016, takes approximately 20 or more minutes to administer and yields the following scores: raw scores, ability scores, standard scores, percentile ranks, and age equivalents.

The administration is relatively straightforward. Find the item set designated for the student's age group (listed on the front page of the test form), turn to that page in both the test form and the easel, and begin. Don't forget to flip the easel pages to view the allowable responses so that you can score the student's answers.

The OPUS comes with FREE audio files for all of the passages (included with purchase), as well as FREE scoring software. On the back of the Record Form, clinicians will also find an Item Analysis Worksheet for each of the 6 item sets. It allows the clinician to identify the areas of deficit across the following categories:

  • Inference
    • Lexical/semantic
    • Syntax
    • Inference from background knowledge
    • Inference from context
    • Inference from figurative language
  • Memory
    • Memory for non-meaningful information
    • Recall of text details
    • Passage synthesis

The OPUS is based on the Integrative Language Theory, first discussed by Carrow-Woolfolk in “An Integrative Approach to Language Disorders in Children” (Carrow-Woolfolk & Lynch, 1981). The theory describes language in terms of separate linguistic categories, including Lexical/Semantic, Syntactic, and Supralinguistic (e.g., inference; see above). “The Integrative Language Theory posits that language reflects two dimensions: knowledge and performance. Language knowledge is defined by four categories: (a) Lexical/Semantic, (b) Syntactic, (c) Supralinguistic, and (d) Pragmatic [not covered by the OPUS].” (OPUS Manual, pg. 33)

The Lexical/Semantic category of the OPUS measures basic vocabulary, namely nouns, verbs, adjectives, adverbs, and lexical morphemes, all of which carry the basic meaning of language. The Syntactic category of the OPUS deals with grammatical rules and measures grammatical morphemes (function words such as pronouns and prepositions), inflections (verb tense and pluralization), and features of sentence structure (word order, negation, and active/passive voice), although the targeted features differ between earlier and later passages (see pg. 34 of the manual for further details). The Supralinguistic category of the OPUS measures the student’s nonliteral processing of language by asking students to make inferences about select aspects of the presented passages.

Standardization and Psychometric Properties:

Standardization for the OPUS was based on a sample of 1,517 individuals ages 5 to 21 years. Of these, 204 individuals in the normative sample had been diagnosed with a variety of disabilities, including intellectual disability, learning disability, and social communication disorder, among others.

What does that mean? “According to Peña, Spaulding and Plante (2006), the inclusion of children with disabilities in the normative sample can negatively impact the test’s discriminant accuracy, or ability to differentiate between typically developing and disordered children. Specifically, the inclusion of individuals with disabilities in the normative sample lowers the mean score, which limits the test’s ability to diagnose children with mild disabilities.” (Leader’s Project, 2014)  As such, when the purpose of a test is to identify children with language impairment, the inclusion of children with language impairment in the normative sample can reduce the accuracy of identification (Peña, Spaulding and Plante, 2006).

At this point in the review, we need to discuss two hugely important test properties: sensitivity and specificity. Sensitivity measures the degree to which the test can accurately identify the students who truly have a language disorder (Dollaghan, 2007). Specificity, in turn, is the ability of the test to correctly identify those children without the disorder (i.e., the test does not mistakenly diagnose typically developing children with a language disorder) (Dollaghan, 2007). Together, sensitivity and specificity determine the test’s discriminant accuracy, or its ability to distinguish the presence of a disorder from its absence.

To continue, the OPUS and the Comprehensive Assessment of Spoken Language, Second Edition (CASL-2) were co-normed, meaning that both tests were standardized on the same sample of test-takers. Psychometrically, both tests have limitations. To illustrate, the sensitivity of the CASL-2 at cutoffs below a standard score (SS) of 90 is quite low, far less than 80%.

Here’s the problem. Vance and Plante (1994) put forth the following criteria for the accurate identification of a disorder (discriminant accuracy): “90% should be considered good discriminant accuracy; 80% to 89% should be considered fair. Below 80%, misidentifications occur at unacceptably high rates,” leading to “serious social consequences” for misidentified children (p. 21).
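To make the relationship between cut scores, sensitivity, and specificity more concrete, here is a minimal Python sketch. It computes sensitivity and specificity as defined above (Dollaghan, 2007) and rates each against Vance and Plante's (1994) bands. The score lists are invented purely for illustration and are NOT OPUS or CASL-2 data, and the rule that a child is flagged when the standard score falls strictly below the cut is an assumption (a test manual may define its cutoff differently).

```python
from typing import Sequence, Tuple

# Hypothetical illustration: how moving a cut score trades sensitivity
# against specificity. All scores below are invented, NOT real test data.


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of children WITH a disorder whom the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of children WITHOUT a disorder whom the test correctly passes."""
    return true_neg / (true_neg + false_pos)


def vance_plante_rating(accuracy: float) -> str:
    """Rate discriminant accuracy using Vance and Plante's (1994) bands."""
    if accuracy >= 0.90:
        return "good"
    if accuracy >= 0.80:
        return "fair"
    return "unacceptable"


def evaluate_cut_score(
    disordered: Sequence[int], typical: Sequence[int], cut_score: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a given cut score."""
    # Assumption: a child is flagged when the standard score is strictly below the cut.
    tp = sum(1 for s in disordered if s < cut_score)
    fn = len(disordered) - tp
    fp = sum(1 for s in typical if s < cut_score)
    tn = len(typical) - fp
    return sensitivity(tp, fn), specificity(tn, fp)


# Invented standard scores for 10 children with, and 10 without, a disorder.
DISORDERED = [70, 78, 80, 82, 84, 86, 88, 89, 92, 95]
TYPICAL = [84, 87, 89, 91, 95, 98, 100, 102, 105, 110]

for cut in (85, 90):
    sens, spec = evaluate_cut_score(DISORDERED, TYPICAL, cut)
    print(f"cut {cut}: sensitivity {sens:.0%} ({vance_plante_rating(sens)}), "
          f"specificity {spec:.0%} ({vance_plante_rating(spec)})")
```

With these invented numbers, raising the cut score from 85 to 90 lifts sensitivity from 50% (unacceptable) to 80% (fair), while specificity drops from 90% to 70%: the same kind of trade-off this review weighs for the CASL-2 and the OPUS.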

Thus, as it stands, a CASL-2 General Language Ability Index (GLAI) cutoff of 90 will produce acceptable sensitivity, albeit with lowered specificity. As with the CASL-2, the OPUS’s psychometric properties are not without limitations. The sensitivity of the OPUS at cutoffs below a standard score (SS) of 90 falls short of 80%: at SS 85, sensitivity is 79%, which is still 1 percentage point below the minimum acceptable discriminant accuracy. Thus, based on the psychometric properties of the OPUS, the optimal cut score for detecting a disorder is actually a standard score of 90, not 85.

Impressions: To date, I have administered the OPUS to 12 students of varying ages and abilities, and the following patterns have emerged. The test is excellent for teasing out the language deficits of students with significant psychiatric impairments (with and without ASD). It also does a great job of highlighting the language limitations of students with intellectual disability (IQ below 70), as well as those with below-average cognitive ability (IQ below 85). It’s quite useful for children with significant weaknesses in listening comprehension as related to problem-solving and verbal reasoning.

Strengths:

  • Test administration takes a relatively short period of time (approximately the length of a 30-minute therapy session or less, depending on the student)
  • Test administration begins at 5 years of age, which is a significant advantage, since one common test of listening comprehension, the Listening Comprehension Test-2 (LCT-2), begins at 6 years of age, takes longer to administer, and has poorer psychometric properties
  • The availability of audio files ensures consistent passage administration and is highly convenient
  • Free online scoring reduces the potential for scoring mistakes
  • An analysis of deficit areas is provided on the back of the record forms, allowing the clinician to pinpoint the student’s areas of struggle
  • “The OPUS was standardized on individuals who demonstrated proficiency in English, including bilingual individuals who were judged by the examiner to be fully proficient in English.” Thus, the OPUS can be administered to simultaneous bilinguals.

Limitations:

  • This is NOT a test suitable for children with more subtle (mild-moderate) language deficits and average IQ
  • Despite the cut score of 85 recommended in the manual, a cut score of 90 appears more suitable for identifying listening comprehension deficits, based on the psychometric data outlined in Chapter 5 of the test manual.
  • While the OPUS passages possess adequate complexity, the passage questions do not always reflect that complexity. Many of them merely probe the student’s surface knowledge and can easily be guessed or answered from background knowledge.
    • To illustrate, for older students (above 7 to 8 years of age), a large number of syntax-related questions simply ask the student to recall whether the main character was male or female.
    • Similarly, many questions target the factual recall of information rather than the comprehension of less transparent, nonliteral language.
    • Several of my students received credit, per the scoring instructions, for very vague responses.
    • Still others received relatively high scores despite the small number of questions they were actually able to answer.
  • There seems to be no good reason for the excessively heavy and cumbersome easel, which contains the passages as well as the allowable responses. Since the OPUS is a test of listening comprehension and does not require the student to look at any pictures or text, all of the information needed for test administration could easily have fit in the record forms!

There you have it! These are my impressions of using the OPUS in my settings. I would readily administer this test to very severely language-impaired, psychiatrically impaired, and intellectually impaired clients. However, for language-impaired clients who are functioning relatively higher (a subjective judgment), I would probably select a different test, such as the Test of Integrated Language and Literacy Skills (TILLS), especially because, beginning at 8 years of age, semantic flexibility as well as reading and writing abilities (as measured by tasks such as nonword reading and spelling) are far more sensitive to language and literacy deficits than listening comprehension tests are (pgs. 10-11).

What about you? Have you used this test with any of your students to date? If yes, what are some strengths and limitations you are noticing? Let us know by posting your comments below.


