
Testing Vet Reveals How to Fix Standardized Tests

Todd Farley has a new book: "Making the Grades: My Misadventures in the Standardized Testing Industry." It was an intriguing read, but I told him it didn't go far enough. He had dramatized the weaknesses in the many tests he graded, but did not explain to us poor realists what we should put in their places. At first he resisted my suggestion, but I told him I was sure, if he thought about it, he would come up with something. He did:

1) The reason I wrote my book is that I think no one has any idea how totally ridiculous large-scale assessment is (especially the open-ended items). That's what I hope my book reveals: a system that is just staggeringly, laughably ineffective. I think the efforts to make that process "standardized" or "objective" have taken all meaning away from the work, and the end result is that now all the testing industry produces is numbers--random numbers. I really do think that information is the most important thing I bring to the debate about testing. Having said that, I do believe there are some things that can be done to make that assessment much more effective.

2) This is simply a logistical issue, but I think from now on student tests need to be scored by one person at one time. Currently, most large-scale assessments (including the vaunted NAEP) are chopped into bits, with a student's multiple-choice answers going one place, their short answers going another, and their long answers somewhere else. That means if a student answers ten questions about "Charlotte's Web," for example, question 1 might be read and scored by Bob on Monday, question 2 by Mary on Wednesday, question 3 by George on the NEXT Thursday, and so on. Sometimes weeks pass between the scoring of questions 1 and 2 or 3 and 4, which seems to me to take so much away from what a student might be trying to say. While this is done for various reasons in the testing industry (training, money, deadlines, etc.), it also means a student's test is scored almost entirely without context.

Surely this is done so that each answer is given an "objective" read by some dispassionate employee (not to mention the fact that you can then train unqualified people to do the simple task of scoring by having them search for random words), but it also means we are reading an answer to question 2 without knowing how a student answered question 1. In my opinion, this totally takes away from a broad understanding of a student's knowledge. Decisions end up being made based more on picayune things like what words show up on the paper, not so much what those words might mean (i.e., we accept "bubbles" but not "sizzles" in a question about the definition of "boiling," etc.). In the current system that means 5, 10, or 15 people might all end up doing some of the scoring/grading on each kid's test. That is not being "objective." It is an unrealistic assessment of a child's understanding.

In the scoring centers, it also takes away from the sense of responsibility that we feel about kids: if I were scoring one student's entire test, I'd become invested in it, but in the current set-up I'd just be scoring Question 2 (i.e., "What is the theme of this story?") for about three straight days and would completely lose any feel that I was assessing actual children. It just becomes a muddled mess of words at that point, not students. Ergo, I think what has to happen is that someone completely qualified in a subject area (such as an English teacher reading English tests, a math teacher reading math tests, etc.) should read and score each test in its entirety, not chop them all up into bits. If it costs more to hire actual educators instead of random people off the street, that's still what I think makes more sense.

3) And so, what I think SHOULD happen is what happened on the best assessment I ever worked on: the state of Washington's Goal 2 classroom-based assessments. Interestingly enough, Washington has a state test (the Washington Assessment of Student Learning) for reading and math, the usual high-pressure, mandatory tests that many teachers/parents argue against AND that also happen to be part of the horrible system my book and I impugn (in fact, a lot of my early scoring career was spent working on WASL reading and writing). Of less importance to the state are the Goal 2 tests (for History, Civics, Health/PE, and the Arts--Music, Theatre, Visual Arts, Dance), but those tests were classroom-based assessments that were written and scored by the state's teachers in conjunction with Riverside Publishing--they weren't just handed off to the test company with no idea what was really happening next. For those tests, scoring systems were established at the state level, and then local teachers in those subject areas were entrusted to read the tests, view the performances and assess the results. It seemed to me that this way you had some sort of central management (state gov't providing standards on what should be learned and what constituted acceptable and unacceptable results), plus teacher participation in the scoring process, which to me means qualified people would give serious reviews of student work--a massive improvement on the current state of bored, unqualified temps making snap judgments based on the fleeting glances they give student work. Even if we don't think it's a good idea for teachers to assess their own students' work, teachers can cross-grade within a district (which happens at the college level).

Jay, I don't know that my suggested system is perfect, but it is a massive improvement on the foolishness that now occurs.

By Jay Mathews  | September 22, 2009; 6:43 PM ET
Categories:  Jay on the Web  


Jay, teachers have been saying the same things for years. How come you think we're all lazy or irresponsible when we say the testing is virtually useless, but when someone in the testing business makes the claim, you take it seriously? How about an apology?

Posted by: aed3 | September 22, 2009 8:56 PM | Report abuse

I disagree with the comments in Number 2. There is no need to grade one part with an attitude carried over from a previous section. That would make the scoring of the test less objective and more partial. Reviewers would be more likely to give a higher score if previous answers were wonderful, giving students the benefit of the doubt on a succeeding question.

There is also no valid reason to know that a student fared poorly on a previous answer. This would result in the grader giving the student a boost or downgrading another question to reflect the quality of the preceding one. Just because different teachers grade different sections, that doesn't mean that the scores are not valid.

Posted by: ericpollock | September 22, 2009 8:57 PM | Report abuse

for aed3---I didn't say I agreed with Todd's view. I just put it out there for readers to see and reflect upon. We are all about diversity of opinion at the Struggle.

Posted by: jaymathews | September 23, 2009 8:32 AM | Report abuse

Yeah, Mr. Farley was the picture of professionalism during his many years in the testing industry (note sarcasm, please).
The behaviors he describes are those that he displayed himself, and I find it incredibly ironic that he is so quick to judge the scorers, most of whom take their work very seriously. He fails to acknowledge that while working in the industry that he now "impugns," he was responsible for the training of readers and the resultant quality of the scores for the assessments in which he was involved--if readers behaved less than appropriately, it was because he let them. Amazing that he has suddenly seen the light and now feels compelled to share his revelation with the masses. (Classroom grading...what a novel idea!) What is more likely is that he is looking to make a quick buck and can't think of anything else to write about. Take what he says with a grain of salt, folks. He was himself an "unqualified temp," albeit one who can put together a semi-coherent sentence and whose pitiful squawks for attention are, unfortunately, among the loudest. But someone who is concerned for the well-being of the kiddies? As he, himself, has said (when describing a student's test response): "Gibberish."

Posted by: IronyofsuchBS | September 23, 2009 3:22 PM | Report abuse

© 2010 The Washington Post Company