Posted at 10:30 AM ET, 12/18/2009

Author: 'My Misadventures in the Standardized Testing Industry'

By Valerie Strauss

Today my guest is Todd S. Farley, who worked for years in the standardized testing industry and authored the new book “Making the Grades: My Misadventures in the Standardized Testing Industry.” I asked him to write about the biggest problems he encountered, and here is his account.

By Todd S. Farley
For 15 years I was employed by the K-12 testing business, working for many of the biggest players (Pearson Education, Educational Testing Service, American Institutes for Research, etc.) on many of the biggest tests (National Assessment of Educational Progress, California High School Exit Exam, Florida Comprehensive Assessment Test, Virginia Standards of Learning, etc.).

While I did enjoy the career (good money, nice people, fun trips), it also left me completely convinced of the utter folly of entrusting decisions about American students, teachers, and schools to the for-profit industry that long employed me. I don’t know how anyone who’s seen what I’ve seen could feel any differently.

I’m not even talking about the well-publicized disasters that have occurred in recent years, when mis-scored tests kept students from getting into their preferred colleges (SAT, 2006); kept teachers from earning their deserved certifications (Praxis, 2004); and kept teenagers from graduating high school when they should have (Minnesota state assessment, 2000).

Any Google search will turn up many similar testing tragedies, but I’d say the scandals that make the news are only the tip of the iceberg. In fact, I’d say there aren’t scoring problems on some standardized tests; my experience suggests there are scoring problems on all of them.

Multiple-choice tests can be scored easily by machines, but constructed-response items (short-answer questions, essay questions) that students answer in their own words need to be read and scored by humans, which is where I think it all goes to hell. From my experience, that human scoring of tens of millions of student tests every year goes to hell for five main reasons:

The tests get scored each year by a motley crew of temporary employees earning low hourly wages, and while many of those people are earnest and conscientious employees, many others are not.

Many end up working in test-scoring centers only because they can’t get jobs elsewhere, and over the years I worked with every kind of drunk (a fellow in Iowa City who started every day wan and shaky but ended it—after tippling his way through break time—ruddy and rambunctious); and dingbat (one scorer who gave every student response the score of 2 one day, every single one of them!); and dilettante (a scorer in Phoenix who told me his real job was as an ultimate fighter and who after three weeks on the job thought he was being tested, not that the students were). Are these some of the people who should be making decisions about American education?

The only way to get large groups of temporary employees to score the tests in a standardized way is to establish stringent scoring rules and not waver from them.

This regularly results in rules that are rigid to the point of absurdity. I remember scoring one item on bike safety, for which I was instructed to give a point to any fourth-grade student whose drawing showed a good example of a bike safety rule. This made sense when a student (for example) drew a rider wearing a helmet or stopped at a stop sign.

However, it also meant we were instructed to credit any bike at a stop sign, and soon I was being instructed to give full credit to a poster showing a bike flying through the air in front of a stop sign; a bike in the back of a pick-up truck in front of a stop sign; and a bike busted into pieces in front of a stop sign.

In the cases where there aren’t rigid scoring rules to establish, the scoring that gets done is largely subjective, with the scores earned resulting as much from which temporary employee reads a response as from the quality of the student’s work. When assessing essays/writing, the scorers are supposed to differentiate between essays that are “skillful” and essays that are “sufficient,” for example, which is largely a matter of opinion. Since when are people all supposed to come to the same conclusion about a piece of writing?

The number of tests that need to be read and scored each year is so massive that every conceivable shortcut is taken to get that job done. The testing industry works exceedingly hard to meet deadlines and get scores put onto tests, while I saw much less interest in getting the correct scores put on them.

When I was a supervisor and trainer in charge of 10, 20, 100 people, the last thing I needed was for each scorer to give a meticulous and earnest review to every student response. All I really needed was for them to quickly slap down a score and move on to the next answer. How else do people imagine those tens of millions of student responses get scored?

Finally, and perhaps most importantly, the test-scoring industry cheats.

It cheats on qualification tests to make sure there are enough personnel to meet deadlines/get tests scored; it cheats on reliability scores to give off the appearance of standardization even when that doesn’t exist; it cheats on validity scores and calibration scores and anything else that might be needed. I don’t want to just point fingers here, because I am guilty too, and over the years I fudged the numbers like everyone else.

Statistical tomfoolery and corporate chicanery were the hallmarks of my test-scoring career, and while I’m not proud of that, it is a fact. Remember, I was never in the testing business for any reason other than to earn a paycheck, just like many of the testing companies are in it solely to make a buck.

I’ve spoken to educators in Florida lately about the hard times they face: teaching jobs being cut, extracurricular activities being cut, book shortages in schools and workbooks that have to be recycled from year to year. Meanwhile, the state just signed a $250 million contract for its Florida Comprehensive Assessment Test program.

I don’t claim to have all the answers about the problems in American education, but neither can I imagine how giving a quarter-billion dollars to a for-profit testing company is better for students than hiring teachers and buying books—especially not after I’ve seen what the testing companies do for that money.


Todd Farley is the author of Making the Grades: My Misadventures in the Standardized Testing Industry.


I definitely want to read the book!

Are people aware that the publisher of a textbook or a standardized test is not necessarily the group that prepares the material? I worked for a textbook preparation firm that prepared the pages for printing for publishers. Frequently, questions and even paragraphs of text were rearranged by our editors and composers to fit onto the page. For example, with multiple-choice questions, all possible answers had to be directly under the question, with none running over to the top of the next column. Sounds sensible, but frequently the only way to accomplish this was to interchange #3 with #6 or to reword one of the questions or some of the answers. Yes, the editors at the publisher saw the material and had final approval, but when the deadline approached, they tended to let things go by. (I once discovered a mathematical error in a problem at the last minute and was told there was no time to fix it.) One publisher assigned two different editors to work on a project--they didn't like each other and gave us conflicting instructions! It's a marvel that the material is still intelligible when it reaches the students, let alone significant.

Posted by: opinionatedreader | December 19, 2009 2:17 PM | Report abuse


© 2010 The Washington Post Company