State Assessments: A Crisis of Evidence?

George Hillocks, Jr., in his essay "How State Assessments Lead to Vacuous Thinking and Writing," argues that in many cases, high-stakes testing provides students with an impoverished context for writing, and therefore rewards superficial, mostly evidence-free prose. Students are indirectly encouraged to supply claims and warrants instead of evidence. In Aristotelian terms, we might say that students are relying on intrinsic proofs (invented through language) rather than extrinsic proofs (existing outside of the writer's mind).

The solution, Hillocks claims, is to enrich the writing contexts for high-stakes testing, and this idea is reinforced by Arthur Applebee in his discussion of the framework for the 2011 National Assessment of Educational Progress in writing. The section "The Issue: What Should Students Write About?" references Hillocks, and the outcomes section notes:

Although many use items very similar to those in NAEP, others, such as New York State, base writing on extended reading passages, or include at least some classroom-based writing as part of the assessment (Kentucky, Vermont). A more general issue for assessment developers is whether it would be useful to increase the content load of student writing prompts, and if so, how this could be done within current assessment frameworks or through extensions of them (pp. 91-92, emphasis mine).

It is difficult for me, quite honestly, to imagine an objection to this practice, at least in terms of validity. I assume we can all agree that if we are requesting writing samples from students, we want thoughtful, well-reasoned work–work that would stand up in contexts outside of a writing assessment. More context, it seems to me, would strengthen student writing. Real data could be provided, and multiple viewpoints on an issue could be presented. Students, in this way, would be able to work with real, detailed data instead of generalizations.

So what’s the problem? Applebee writes, “One possibility, particularly if writing and other assessments become computerized, would be through the adoption of some common metrics for assessing quality of writing across assessments in different content areas” (p. 92). This seems to address the problem of students writing from different areas of prior knowledge–for instance, if one student is writing about the Emancipation Proclamation, and another is writing about tree frogs.

But the idea that Hillocks advances, as I read it, is not that students be tested on what they already know, but that we enrich prompts, providing more information about the writing context. This would not seem to be a problem from a reliability standpoint, because we could give all students of a given level the same prompt, and we could assess it with the same constructs with which we already assess writing samples. Perhaps a bit more time would need to be provided, but otherwise, the task could stay more or less the same.

Les Perelman, in “Information Illiteracy and Mass Market Writing Assessments,” however, takes the practice of enriching writing prompts firmly to task, claiming that this sort of condensed data does not constitute “real arguments,” and that students are encouraged to use information improperly (133). The student must accept the authority of the information without being able to check it.

Perhaps this rather damning summary is fair, and perhaps Perelman is right in suggesting that summative assessments should resemble literature reviews more than arguments. Nevertheless, while high-stakes testing remains more or less in its traditional form, I can see few reasons not to enrich writing prompts.


Teaching Without Grading? Placing Without Judging?

Like most humanities teachers, I have, shall we say, a bit of a bleeding heart. I worry about damaging my students with outdated and ineffective pedagogies, because I don’t want to make students more resistant to writing instruction than many of them already are. That’s why, of all the things we read in this unit, grading contracts and directed self-placement appeal to me the most. I want to consider those ideas here.

Teaching Without Grading? Grading Contracts
Grades tend to occlude feedback, and sometimes block a process approach to writing instruction. Therefore, some scholars have turned to grading contracts, where a certain score in a course is guaranteed as long as the student meets certain criteria, which are usually tied to fairly objective measures like output, revision, and attendance. The advantage, of course, is that (in theory, anyway) the student can stop focusing on what kind of score a composition is likely to receive, and start focusing on how to polish a piece of writing to the greatest degree possible. As long as they do the assigned work for a certain grade, they will receive that grade; the staircase perception of grading, we hope, disappears.

Asao Inoue, in his essay “Grading Contracts: Assessing Their Effectiveness on Different Racial Formations,” explored the ways white, black, and Pacific Islander students responded to a form of contract grading at Fresno State. He found that grading contracts were highly effective for Pacific Islanders, somewhat effective for black students, and less effective for white students. Black students had trouble meeting the quantity standards, and white students seemed frustrated by the process approach, revising the least between drafts.

It’s clear that contract grading is not one-size-fits-all, but it’s notable that the students who were challenged by the contract system tended to be the more privileged students. So perhaps it would be good for white students to have to write based on process, output, and sustained effort–qualities that can be evenly distributed across racial lines–rather than trading almost entirely on their literacy backgrounds. This is something that probably will depend, to a large degree, on context.

Placing Without Judging? Directed Self-Placement
Another unusual form of assessment is directed self-placement, which was proposed by Daniel J. Royer and Roger Gilles in 1998. The idea is that when making decisions about where a student will succeed–in standard, accelerated, or basic writing courses–it will not do to simply assign them a spot based on a writing sample (this latter approach, Royer and Gilles point out, often engenders resentment among students who do not see themselves as basic writers). Instead, a good deal of information about the courses and expectations is provided to the students, and the students are invited, with the input of an advisor, to place themselves in the proper course. The approach works, Royer and Gilles claim, because a healthy percentage of students actually choose the basic (non-credit) writing course without coercion.

And again, the idealist in me likes this approach. If I were to take a job as a WPA, I would be tempted to try it. The objection that one of my classmates raised–that consigning students to their own choices seemed irresponsible–was not very persuasive to me. After all, I think college is about making choices, and students are given choices (with attendant consequences) to make all of the time. Do they use their paychecks to pay rent, or to throw a keg party? Do they get up for their class, or sleep in and play Xbox? When they make poor choices, certain things traditionally happen, up to and including failing out of college, having to work for a while in a job they dislike, and then deciding, when older and hopefully more mature, to try again. This sort of choice, then, doesn’t strike me as irresponsible, but as another opportunity for students to practice wise decision making. Besides, an advisor can hopefully help students see whether they are being unrealistic about their abilities.

What Does Blackness Have to Do with Writing Assessment?

I have to confess that before I got to FSU, my thinking about race had been shallow and rare. I’m not sure I had ever heard the term “white privilege,” or if I had, I had managed to ignore it; in fact, in “White Privilege: Unpacking the Invisible Knapsack” by Peggy McIntosh, this is item #32. I wanted to see people as individuals, subject to the same privileges and standards as myself, and I resisted seeing race as an important factor in most areas of inquiry. So, when we turned to issues of “access, accessibility, and diversity” in our writing assessment class, I was at least skeptical if not resistant.

But I’m sensing a change in my thinking about race in general, in the form of a greater acknowledgement of institutional racism. And if institutional racism exists, it follows that systems of racism would be re-inscribed in writing assessments, since we’ve already discussed the ways that assessments are ideological.

Diane Kelly-Riley, in “Getting Off the Boat and Onto the Bank: Exploring the Validity of Shared Evaluation Methods for Students of Color in College Writing Assessment,” urges us to get “off the boat” of white normative assessments (bounded by white privilege) and “onto the bank” with those who are affected by these assessments. She suggests that we attend to ways that assessments marginalize students of color, and reify institutional racism.

Indeed, Johnson and Van Brackle’s study “Linguistic Discrimination in Writing Assessment: How Raters React to African American ‘Errors,’ ESL Errors, and Standard English Errors on a State-Mandated Writing Exam” clearly shows a systematic bias in the way error is assessed on a standardized test. They found that AAVE features were penalized more heavily than “white” errors like comma splices and “ESL” errors like definite and indefinite article misuse.

It is a bit surprising, then, that Arnetha Ball’s “Expanding the Dialogue on Culture as a Critical Component When Assessing Writing,” which compared African-American teachers’ responses to student compositions with those of white teachers, found that African-American teachers were reliably harder on students than white teachers were. Ball suggests that these teachers’ stringent emphasis on correctness reflects a desire to help students enter “professional” (read: white) discourse without incriminating racial markers.

But to be honest, as a middle-class, highly educated white male, I’m kind of at a loss. Some black compositionists, like Geneva Smitherman, argue convincingly for the inclusion and acceptance of AAVE. Others, like Charles Nash, seem to think this is suicide for black students, living as they are in a white-dominated culture: “It would be more racist not to do that [remediate black error] and just perpetuate the mess” (5). To some degree, it feels like a very interesting conversation to which I was not invited, yet as a developing assessment specialist, it seems important to form an opinion about it. I suppose as a rhetorician, I’m drawn to the idea of audience: a skilled writer directs his or her composition toward a specific audience, and that might include the ability to code-switch as the situation demands. I’m aware, however, that this continues to assume a public discursive space keyed to white American norms. It’s a problem. I’m just not sure, at this point, how to solve it.


Is Writing Assessment a Technology?

In our Digital Revolution and Convergence Culture class with Kathleen Yancey, we have discussed, among other things, what counts as a technology. This is a critical question for me especially, because my interests lie at the intersection of technology and assessment. Are they the same kind of thing: is writing assessment a technology?

I have to confess, first of all, that my reading on what counts as a technology is fairly impoverished. I do like Ong’s contention, in “Writing is a Technology that Restructures Thought,” that writing is a technology, so if that is true, it should follow that writing assessment is a technology. But what is a technology? We need a definition.

Brian Huot points to George Madaus (1993) as an early proponent of viewing writing assessment as a technology. Huot himself, however, uses Andrew Feenberg’s distinction between instrumental and substantive views of technology to develop his own theory. The substantive view sees technology as “a new type of social system that restructures the entire social world as an object of control.” The instrumental view, which is far more common, sees technologies as neutral tools (141). I crudely (and unknowingly) activated this distinction, by the way, in an early blog post for Digital Revolution and Convergence Culture. Technologies are not merely tools to be utilized from ideological positions; the technologies themselves embody ideologies.

Fine: so a technology is ideological, and it helps us to achieve certain goals. But that still falls well short of a definition. Michael Neal, in Writing Assessment and the Revolution in Digital Texts and Technologies, points to another Huot essay (“Computers and Assessment: Understanding Two Technologies”), along with Green, Haas, and Madaus, to argue that a technology is a means of accomplishing purposes. He suggests, with Haas, that materiality is involved in technologies; they are “materially embodied” (18). For Neal, there also seems to be an element of intentionality involved in technologies, though they may be used in unintended ways.

So, a definition would be something like this: a technology is an intellectual stance, intentionally directed toward solving a certain kind of problem, materially embodied, and carrying a specific ideology. Writing assessments, then, would seem to fit these criteria, though their materiality is harder to locate. Of course, we can say that assessments exist on computer screens or on paper, but I find that pretty weak. The fact that even thoughts can be said to be material, in the form of synapses, might render the category moot: if nothing is immaterial, then what does it mean to be material?

The other criteria, however, fit fairly well, so I am satisfied that writing assessment, like writing, is a technology.

Addendum: I just had a conversation with Bruce, Joe, and Jeff about materiality. Jeff thinks that an element of materiality is retrievability. So, a synapse would not be material unless it could be recovered; neither could sound unless it could be recorded. This seems like a good enough distinction for my purposes.

Reliability and Validity: Concepts in Tension

Kathleen Yancey, in her oft-cited essay “Looking Back as We Look Forward: Historicizing Writing Assessment,” argues that “writing assessment is commonly understood as an exercise in balancing the twin concepts of validity and reliability” (135). And it certainly seems to be true that these concepts are, if not exactly opposed to each other, certainly in tension with each other. Validity, as Yancey understands it, is “measuring what you intend to measure,” while reliability means that “you can measure it consistently” (135). Not surprisingly, rhetoricians tend to favor the former, which relies more on arguments, while psychometricians tend to favor the latter, which relies more on numbers.

But the picture is perhaps more complex than Professor Yancey’s essay would suggest. Roberta Camp, in “Changing the Model for the Direct Assessment of Writing,” agrees that validity and reliability perform a balancing act in assessment, pointing to the popularity of multiple-choice tests (high reliability) with psychometricians and of direct writing samples (high validity) with writing teachers, and to the common compromise of merging the two in assessments (103-106). But she also argues that validity, as a construct, can only be fully realized when grounded in a rich theory of writing.

Lorrie Shepard further complicates the traditional view of validity in her essay “The Centrality of Test Use and Consequences for Test Validity,” arguing that validity cannot and should not be considered apart from the consequences of the use of an assessment, assuming the assessment is used as intended (5). Michael Neal agrees, seeing validity as a combination of accuracy (which the Yancey definition addresses, to some degree) and appropriateness (is this the right assessment for this situation?).

Reliability, too, is a multifaceted thing, as is made clear in Cherry and Meyer’s “Reliability Issues in Holistic Assessment.” The authors remind us that reliability involves the concepts of measurement error, analysis of variance, and context. It is certainly not simply a matter of inter-rater reliability, as is often assumed in the rhet/comp community. Ultimately, the authors argue that “strictly speaking, ‘reliability’ refers not to a characteristic (or set of characteristics) of a particular test but to the confidence that test users can place in scores yielded by the test as a basis for making certain kinds of decisions” (53). So here, interestingly, reliability seems to bleed a little into Shepard’s concept of consequential validity, and everything points back to my previous post: assessment is never neutral. There is always something at stake besides measuring writing, so we must measure with care.

What History Teaches Us: Writing Assessments Are Not Neutral

As I studied these six readings, something hit home for me. Granted, it was something I already “knew” in the sense that it was not new information, but I think I began to believe it on a deeper level. What hit home for me is that when writing assessments are designed and deployed, historically, there is always something at stake besides discovering the quality of student writing. Sometimes these exigencies have been noble, and sometimes suspect.

Even the Elliot chapter, which focuses on individual hero narratives, detached from context and virtually ignoring race and class, notes that writing assessments were always developed in response to fairly specific exigencies. The pushes for accountability that occurred in the late 1960s and the early 2000s, for instance, came as responses to attempts to standardize education and make it available to poor and racially marginalized students (196-198).

More shockingly, we learn from O’Neill, Moore, and Huot that the field of intelligence testing began as a way to support military efforts during World War I, and the SAT was mandated right after the bombing of Pearl Harbor (18-20). Even closer to home, writing assessments at Harvard in the late 1800s were established, according to Penrod, “to respond to specific social conditions…that demanded hierarchies exist in the workplace, in education, and in societal relations” (xxii).

So writing assessments not only respond to exigencies; they are always after something more than simply learning how well students can write. The goal is to classify students by perceived ability, and then any number of things can be done with that information. The information can be used to divide students into a class needing “remediation.” It can be used to prove that teachers are not performing. It can even be used to place an artificial value on student “worth” to the military-industrial complex (such as deciding, based on IQ, who goes to the front lines). On the other hand, assessment can be used locally, in the service of teaching, to discover how best to help students improve.

And, as an ancillary point, it follows from this that no matter how “objective” an assessment is designed to be, it not only serves a rhetorical purpose but, as a technology, is inescapably ideological (Huot and Neal 421-422). Writing assessments are never neutral.