
Casting Doubt on My Pro-Testing Bias

The scholarly Rothstein family, father Richard and son Jesse, are taking turns removing the intellectual underpinnings of people like me who want to judge schools based on standardized tests. Richard, of the Economic Policy Institute, recently published a fine book, “Grading Education,” investigating viable alternatives to assessment through testing, such as the English system of visiting inspectors. Now Jesse, just moved from Princeton to UC Berkeley, has published a paper in the Quarterly Journal of Economics showing that value-added testing measures, a favorite of mine, may not be so good at identifying the most effective teachers after all.

I learned about the younger Rothstein’s study in the latest Education Week (bias alert: I am on their board). Debra Viadero, the nation’s best reporter on education research, says Rothstein found that differences between teachers, when measured by how much their students’ scores improve, may be heavily influenced by the way those students were assigned to their classes. Viadero discusses two other studies, however, that indicate value-added measures may work if several years of data are used and if the measures are tested against studies using random assignment. She also reveals that Mathematica Policy Research Inc., a giant in the education research field, is gearing up a massive study of value-added measures, but I will probably be completely senile (the study has about four years to go) before the results come out.

By Washington Post Editors  | July 21, 2009; 6:13 PM ET


Dan Willingham has written and spoken (on YouTube) on this topic. Although his exact focus was merit pay, he was talking about using tests to identify the most capable teachers. It's a short step to using these measures on schools.

He ends by saying: "Merit pay can't work until there's a way to measure teacher performance that's fair."

Posted by: MathCurmudgeon | July 21, 2009 11:42 PM | Report abuse

Thank you, thank you, thank you.

And the previous commenter makes a great point about cognitive science and its lessons on how and why data-driven accountability will backfire.

Notice that the (scientific) opponents of Rothstein always issue so many caveats. If you are aware of the infinite diversity within public education in this huge, diverse, and segregated nation, you'll see why those caveats mean that top-down testing models can't work. It's not just the unpredictable variety of classroom realities that can't be compared statistically. It's also the infinite variety of ways that principals are assigned, which, in the real world, can make or break the chances of effective teachers to produce gains. And the infinite ways that central offices impose policies that can make or break principals.

"Reformers" say "so what if the models are imperfect?"

But the question is: imperfect for what? Would you commit yourself to the urban classroom if statistical glitches meant that you had a 1 in 10 chance per year of having your career destroyed? Or 1 in 20? Or a 1 in 5 chance that a fellow teacher would be destroyed unfairly?

Notice that Fenty and Rhee don't want their accountability schemes to be held accountable and want to kill independent evaluations of their "reforms." For an explanation of why that human dynamic will always be with us, read Rothstein's father.

Posted by: johnt4853 | July 22, 2009 8:30 AM | Report abuse

So, are you going to apologize to all of us, classroom teachers like me and researchers like Bracey, who have been making these points to you for most of the past decade?

Posted by: teacherken | July 22, 2009 11:34 AM | Report abuse

My friend teacherken, if I had been flaming you and your thoughtful allies all that time, an apology might be in order. But ours has been reasoned discourse, with no need for apologies on either side. And I haven't crossed over yet. All I am doing is pointing out when your side has some good points. As you know, I make up my mind based on what I see working for kids in schools, so I am going to have to see alternatives to testing actually operating, with student achievement soaring, particularly for impoverished kids, before I wave the white flag.

Posted by: jaymathews | July 22, 2009 4:54 PM | Report abuse

As a Family and Consumer Science teacher, the core strand of my curriculum is personal resource management. Who would purchase a product whose consumer reviews included phrases such as "may not be so good," "may be heavily influenced by...," and "may work if..."? Those are not indicators of product reliability. They trumpet "Caveat emptor!"

Because teacher quality matters so much, value-added teacher assessment might still be considered as one of multiple factors in determining teacher effectiveness if it were a cost-neutral and "somewhat" reliable indicator. However, "gearing up a massive study of value-added measures" is likely to carry a massive price tag. You don't need to be a Harvard economist to understand that it's unwise to invest our finite resources in unreliable products, and it's ethically questionable to divert money intended for, and desperately needed in, our classrooms to fund research that "may not be so good."

I continue to be amazed that teachers and children are held to higher standards of accountability than researchers and policymakers.

Posted by: susangraham | July 22, 2009 11:40 PM | Report abuse

I don't want to speak for teacherken - who does an amazingly prolific job speaking for himself - but I think I recognize his frustration. I don't see him asking for an apology for any inflammatory writing, but rather, an apology of sorts for failures by the media, policymakers, and the research community - for not listening to the voice of classroom teachers. We have been saying this for years, and pointing out all of these flaws, and citing the research and analyses that do support our view. Many researchers are constrained by time, money, or access from collecting stronger data, and yet they publish results with stronger conclusions than are warranted. Then we endure the salesmen who use mediocre research to push their agenda by trying to submerge the collective voice of experienced practitioners as they trumpet the supremacy of a large data set.

For what it's worth, I wrote an article for Teacher Magazine a few months back, in which I tried to illustrate the folly of linking test scores to teachers, especially at the secondary level.

Posted by: DavidBCohen | July 22, 2009 11:42 PM | Report abuse

Good point, DavidB. But keep in mind that not all teachers agree with you. I have spent a lot of time with extraordinary classroom educators, like Jaime Escalante, Dan Coast, Mike Grill, Harriett Ball, David Levin and Mike Feinberg, who think standardized testing has an important place in making schools work, and who think the annoyances and excesses of these testing regimes are outweighed by their benefits.

Posted by: Jay Mathews | July 23, 2009 3:25 PM | Report abuse

Jesse Rothstein is not the first person to find significant problems and errors with VAM, which was once seen as the ultimate answer to teacher evaluation. This was not a discussion about the value of standardized tests, however; it was an incisive scholarly analysis of a statistical tool that few people fully understand. That's a different issue. VAM is not as reliable or useful as researchers would like, but it's seductive, especially if making your living depends on getting huge grants.

Good teachers accept standardized testing if the data it yields are used productively. The money quote in the EdWeek article came from UWM's Doug Harris, who said that arguing about VAM was the wrong debate: "We need to know how to use the measure in practice to improve school performance."

Posted by: nflanagan2 | July 23, 2009 4:26 PM | Report abuse

Hello again Jay,

I respect those other viewpoints, though I'd be curious to know which tests specifically, and which grade levels they're talking about.

Notice also how there's a slight shift in the focus of the argument now. Your initial post calls into question the idea of using standardized tests as a measure of teacher quality, while in your response to me above, you point out some teachers say those tests hold "an important place in making schools work." Though still with some reservations, I would say that testing is more likely to yield helpful data regarding a school than an individual teacher.

Posted by: DavidBCohen | July 23, 2009 4:32 PM | Report abuse

to DavidB: I agree completely with your point. I have written columns against individual merit pay, and much prefer that schools be judged in toto, as a team, since the best schools I know work as teams. As for grade levels, Ball, Levin and Feinberg taught in elementary and middle school grades, and used the annual state criterion-referenced tests, plus off-the-shelf norm-referenced tests like the Stanford 10, to judge their work. Escalante and Grill focused on AP, and Coast on IB.

Posted by: Jay Mathews | July 23, 2009 6:10 PM | Report abuse

Jay, I keep thinking you are making a breakthrough, then you seem to retreat. So there are some fine teachers and schools who either ignore the tests yet do well or see some benefits in them. But the evidence of damage has become overwhelming, especially to most people in classrooms and clearly to growing numbers of parents, as many surveys indicate.

But I want to focus on one point in particular: "I am going to have to see alternatives to testing actually operating with student achievement soaring, particularly for impoverished kids, before I wave the white flag."

You can go to the NY Performance Standards Consortium to see such schools. The Big Picture schools and the High Tech High schools rely on all sorts of performance assessments, rooted in teacher evaluation, as their primary, dominant forms of assessment. They use assessment to improve teaching and learning, and their results show good assessment is valuable. Soaring? Well, that is a standard almost no school that focuses on boosting test scores can claim to meet. But how do you define soaring achievement: circularly, by test scores, or by some other criteria (which we tend to lack, in part because we have put so many eggs in the testing basket)?

Of course these school networks are not whole systems (the NY Consortium is some 30 schools). Since NCLB functionally precludes building a different sort of system, we cannot see large-scale uses of very different approaches to assessment at work in the US. But as Linda Darling-Hammond and colleagues have shown in various articles, the US tests far more than any other developed nation or any nation doing well on international assessments; many of those nations mix national tests with local assessments (and do better than the US); and some use no standardized testing at all (and do better).

I think you continue to ignore too much evidence. And here I think you are more or less saying, "I won't support a new approach until the new approach proves itself," while ignoring that the system you defend precludes the possibility of ever building such a system. So we have to combine the evidence from schools and networks in the US with the international evidence to say that there are almost certainly far better approaches. You should be encouraging the federal government to support the careful development of such new systems, such as through the use of stimulus funds. (The guidelines for those come out tomorrow, I hear; we shall see what they encourage, allow or preclude.)

BTW, do you know of any schools that assign teachers and students randomly, the apparent prerequisite for making "growth models" fair for evaluating teachers? And that is apart from the fact that we'd still just be talking about growth on standardized test scores. It's like Secretary Duncan, who said we should not ignore achievement in evaluating teachers, but who functionally defines achievement as test scores. You, and he, can do better.

Lastly, thanks for your willingness to engage in dialog.

Posted by: montyneill | July 23, 2009 7:37 PM | Report abuse

Jay, why do you think any of these tests are valid? What evidence can you cite that any of these standardized tests actually test what they claim to be testing?

Posted by: Nemessis | July 26, 2009 10:01 PM | Report abuse


© 2010 The Washington Post Company