Summary: This review of George Hillocks's 2002 book, The Testing Trap: How State Writing Assessments Control Learning, is still a relevant read, providing history and research connected to the issues involved in high-stakes state writing tests. The review details the validity and reliability of such tests, the scoring processes, the variety of tests from state to state, and the range of knowledge about writing held by teachers who score. Worthy of a teacher book group or a school-wide reading, this review could be used as a gateway to this book written by a distinguished researcher in the field of composition.
Arguably, no current trend in educational policy generates more controversy than the use of mandatory assessments. As states implement standards for schools, high-stakes testing has become a powerful index of accountability and performance in teaching and learning. In some states, performance is linked to teacher bonuses and students’ graduation; in others, poor test scores can mean the takeover or dismantling of an entire district. While opinion polls show that teachers and parents generally favor setting higher standards for schools, the grades for current mandated assessments are far less favorable. As recently reported in Education Week’s Quality Counts 2001, nearly 70 percent of public school teachers surveyed feel that standards have led to an overemphasis on assessments. In some states (Massachusetts and New York, for example), parents and students have even boycotted them. High-stakes testing looms as public education’s next battle royale.
Among those who believe in these tests are President Bush, for whom “testing is the cornerstone of reform,” and prominent business executives such as IBM chairman Louis V. Gerstner, Jr., who, in a March 14 New York Times editorial (“The Tests We Know We Need”), called standardized assessments “an essential part of the drive toward world-class standards in our public schools.” Gerstner, who also serves as cochairman of Achieve, Inc., a nonprofit school reform group created by governors and business leaders, cites a recent Public Agenda (www.publicagenda.org) poll of middle and high school students, 80 percent of whom “believed that the standardized tests used in their schools are generally fair.” For many advocates, assessments are the bottom line and sine qua non of education reform. The need for them, argues Gerstner, could not be clearer: “Last fall, college professors and employers were asked to assess the writing proficiency of high school graduates. Three-quarters of both groups rated graduates as poor or fair on writing. Two-thirds rated graduates as poor or fair in math.”
But advocates often speak as if the tests themselves, rather than teaching or standards, are the heavy artillery of reform. High standards may promise rich curriculum, but if assessments are only loosely tied to learning objectives, they can undermine and weaken effective classroom practices. As Warren Simmons, executive director of the Annenberg Institute for School Reform, has warned, “unfortunately what we have in too many districts is test-driven reform masquerading as standards-based reform” (Education Week, April 5, 2000). Test scores may offer a quantifiable fix on a school or district’s progress and an expedient yardstick for politicians and administrators, but what if they don’t measure the standards the curriculum advances? What if they aim too low in what they assess? How standards, teaching, and testing are aligned is a crucial element in the story, and the implications for the teaching of writing are far reaching, argues George Hillocks, Jr., in his new book, The Testing Trap: How State Writing Assessments Control Learning.
Hillocks, a distinguished composition researcher who currently directs the University of Chicago master’s program in teaching, presents a trenchant and provocative analysis of a diverse and representative group of state writing assessments. And, as his title suggests, the findings are disturbing: most of the mandatory assessments he examines do more harm than good. They undermine the standards they are meant to support, he argues, and, worse, encourage bad writing and bad teaching practices. His study, the most thorough to date, focuses on five states (New York, Illinois, Texas, Kentucky, and Oregon) and is supplemented by data gleaned from interviews with more than 300 teachers and administrators. In Hillocks’s view, only Kentucky and Oregon have developed assessments congruent with their standards and with what research has shown to be effective classroom practices. They are anomalies in the otherwise bleak assessment landscape the book painstakingly details.
Hillocks succinctly defines the problem of testing. Does a writing test assess what its proponents claim? Does it indicate how well a student may be able to write in any given situation? Are these tests meaningful measures of individuals and schools? When he began the study, thirty-seven states had some sort of writing assessment, and only a handful of them allowed for developing a piece of writing over more than one session. But the key threads of Hillocks’s argument go far beyond the validity of state-mandated, on-demand, single-prompt tests. Much of his study focuses on disparities between testing rubrics or criteria and the standards they purportedly reflect. He argues that, in most states, the rubrics are vague and the instruction they promote is of low quality and little use to students. He cites, for example, assessment prompts that ask students to write quickly on random topics without offering data or information; these in turn foster classroom instruction that relies on formulaic writing such as the five-paragraph essay. Such practices “engender vacuous writing” (114), setting low standards for teaching and eliminating “the need for critical thought. It [the five-paragraph theme] teaches students that any reasons they propose in support of a proposition need not be examined for consistency, evidentiary force, or even relevance” (136). The tests and the preparatory practices they engender in classrooms invite writing that Hillocks sardonically dismisses as “blether,” a word his Scottish forebears used “to indicate talk that was unfocused, rambling, and . . . thoughtless” (77).
In a careful deconstruction of rubrics and benchmark papers used in assessments of persuasive writing, Hillocks observes that while many states highlight “organization” and “elaboration of evidence” as the main criteria, their standard for elaboration is purely quantitative, i.e., the number of supports for a claim is considered but not their quality or accuracy. But, Hillocks argues, more is not necessarily better. In Illinois and Texas, rubrics fail to measure how well ideas are developed or the relevance of the evidence used in supporting claims. Hence, there is no way to evaluate the quality of support made for an assertion. Such criteria send a message to students and teachers alike: what matters is the form not the content, and students needn’t be responsible for what they write, only how it conforms to the formula.
In Illinois, for example, he finds that the vague rubrics go hand-in-glove with the poor quality of benchmark papers used as scoring guides. And in Texas, 58 percent of the teachers Hillocks interviewed believed that “elaboration” was the most important criterion and taught writing accordingly. As the district supervisor of a large urban school told him, “just do whatever you can to get them to write more.” And, says Hillocks, “a majority of Texas teachers see organization through the five-paragraph theme as one way of assuring basic elaboration so that students have at least three reasons to make a persuasive point” (87). How did such dubious pedagogy come to be the norm in Texas classrooms even when the state’s own department of education warns against it? Hillocks sees a connection between what is taught and much of the professional development Texas teachers experience. While nearly 80 percent of the teachers interviewed had attended workshops related to the state’s mandatory tests (most of them devoted to learning the rubrics and scoring writing), less than a third had attended summer institutes or workshops at one or more of the state’s writing project sites; and in this group, two-thirds taught in economically advantaged suburban schools.
“While teachers have approximately equal experience in test-scoring,” he notes, “they do not have equal experience with more advanced thinking about writing.” Indeed, in every state where teachers have little knowledge about composition pedagogy, “the testing system tends to become the knowledge base for teaching writing.” Although the intent of high-stakes testing sounds noble (“leave no child behind”), its impact on teaching has, according to Hillocks, “a powerful effect on increasing the [achievement] gap by restricting what students are allowed to learn in many poorer districts” (102).
Then there is the question of the actual scoring process. One of the book’s more distressing revelations is Hillocks’s description of the assembly-line approach used to score Illinois’ writing tests. Shipped to an out-of-state commercial firm, the tests are evaluated (by graduate students and English teachers) in the basement of a shopping mall where scorers are expected to review tests at the rate of sixty compositions per hour. Scorers know, an informant tells Hillocks, that if they can’t keep up or “maintain reliability,” they will be removed from their scoring team or be fired. Reliability means “minimizing disagreement” and not using “their own judgment about a paper.” To reinforce the herd mindset, a subsample of papers is circulated “to many raters in a given session. The goal is for all of them to agree. When they do not, the process may be halted for retraining” (120).
In 1986, Hillocks authored a comprehensive research review, surveying more than two decades’ worth of studies covering modes of writing instruction, composing processes, student writing apprehension and repertoires, etc. His 1995 book, Teaching as a Reflective Practice, won the National Council of Teachers of English David H. Russell Award for Distinguished Research in the Teaching of English. Anyone familiar with Hillocks’s previous contributions to the field may recognize a familiar theme in The Testing Trap: the dangers of neglecting content, inquiry, and critical thinking in writing pedagogy and assessments. “Blether” and the bogey of formulaic writing are, in part, the result of a “venerable history [in writing pedagogy] of assuming that content demands will be taken care of elsewhere” (Hillocks 1995, 149).
Hillocks has been an ardent proponent of the view that writing classrooms must do more than focus on processes and genres. The teaching of writing and thinking must go hand in hand. Teachers, he argues, need to use writing to help students learn inquiry strategies for discovering, elaborating, and testing ideas. Such strategies are indispensable tools for developing critical literacy, the kinds of academic discourse used in content areas, as well as for the kinds of job skills currently in demand in the global economy. For example, a frequent complaint about student writing in secondary and postsecondary classes focuses on argument and persuasive modes of writing. Common flaws are the use of “chains of unsupported claims”; unclear support for assertions; feelings masquerading as reasons and evidence; and argument by assertion, i.e., something is true because it’s my opinion, I strongly believe it, and that’s all there is to it. Arguably, such “blether” has much in common with advertising and political spots where the wider, louder, and sexier the saturation, the better a brand or campaign may fare. But for those who believe writing is a means to develop independent thinking, Hillocks has another message.
He cites research showing that, compared with other pedagogic approaches, writing curricula that incorporate inquiry strategies have the most substantive and powerful impacts on student performance. The current writing assessments by and large ignore this, and, more perniciously, their high stakes for teachers and schools translate into low stakes for students and classroom writing instruction. In Texas, for example, only 10.9 percent of the teachers Hillocks interviewed showed any evidence of using inquiry in their writing instruction, while 98 percent focused on organization and elaboration. The second major focus of attention for Texas teachers (85 percent) was grammar, mechanics, and usage. The majority of them rely on multiple choice or other “objective,” skill-and-drill methods as a means of rehearsing students for the tests. Mechanics are not taught in the context of student writing in Texas classrooms. Yet decades of studies have shown that however expedient and economical skills drills may be, they fail over the long term to improve student writing, much less authentically assess a student’s developmental needs as a writer.
Writing assessments vary widely from state to state. Those that receive the highest marks in Hillocks’s study are Kentucky and Oregon, which test and encourage diverse writing tasks. They require both on-demand and portfolio assessments. Other states call for on-demand writing only. In Kentucky, students have a school year to produce their writing for assessment; Illinois gives students forty minutes; and in Texas, students have a school day. New York emphasizes expository writing. Oregon tests imaginative, expository, narrative, and persuasive writing. In Kentucky and New York, teachers in the building score the writing. In Illinois and Texas, tests go to a private company. In New York and Texas, students must pass the tests to graduate, but in Illinois, how students fare has no impact on them personally. In Texas, low scores may result in school districts being placed under supervision or in some cases being dismantled. The merits or liabilities of these testing differences are open to question, but the stakes, as Hillocks shows, have profound impacts on how teachers teach writing.
“Clearly,” he writes, “the spectrum of writing used in the Kentucky assessment is far broader than in any other state examined in detail in this study and broader than that in any of the other states with writing assessments” (54). Beyond the rich array of writing samples, “it provides…time for students to develop pieces of writing adequately so that they do not have to revert to the formulaic. . .” (205). Indeed, the state’s portfolio development guidelines discourage formulaic writing such as the five-paragraph theme. They encourage writing across the curriculum and require exemplars from subjects other than English. Kentucky’s assessment guidelines also have the highest percentages of support from teachers in any state surveyed. “Over three quarters (76.6 percent),” writes Hillocks, “believe that the state assessment supports the kind of writing programs they would want in their schools. Over two thirds (67.2 percent) are positive about the scoring rubric….But perhaps the most impressive fact is that nearly 80 percent of the teachers interviewed feel that the portfolio assessment has helped improve writing in the state. No other state writing assessment studied comes close to having such a strong endorsement for its appropriateness and its power in bringing about change” (197). This high support from teachers stands in stark contrast to the attitudes toward testing cited earlier in Education Week’s Quality Counts study. Kentucky ranked third in the nation in that publication’s 2002 evaluations of the quality of each state’s performance on standards and accountability in all core subjects.
Hillocks also studied how the quality of Kentucky’s assessment system impacts teaching practices. “The percentages of Kentucky are consistently high in using all elements of the writing process. In other states, teachers tend to use some parts of the process but not others….Once again, the success of the assessment in promoting better teaching of writing is dependent on the character of the assessment” (196).
Kentucky is the shining exception to assessment trends in most states. Hillocks’s book is a damning and meticulously researched indictment of how regressive testing systems elsewhere are lowering the bar for learning and high standards. Given the “millions of dollars and thousands of teachers’ hours and hundreds of thousands of student classroom hours” (17) devoted to mandatory assessments, his book is a wake-up call to policymakers, educators, parents, and politicians concerned with improving teaching and learning in our schools.