The hype of 'value-added' in teacher evaluation
The issue of evaluating teacher performance has been in the news a lot lately, including yesterday, when a school in Rhode Island fired every one of the educators in the building. Here today to discuss teacher evaluation methods is Lisa Guisbond. She is a policy analyst for the National Center for Fair and Open Testing, known as FairTest, a Boston-based organization that aims to improve standardized testing practices and evaluations of students, teachers and schools.
By Lisa Guisbond
As a rookie mom, I used to be shocked when another parent expressed horror about a teacher I thought was a superstar. No more. The fact is that your kids’ results will vary with teachers, just as they do with pills, diets and exercise regimens.
Nonetheless, we all want our kids to have at least a few excellent teachers along the way, so it’s tempting to buy into hype about value-added measures (VAM) as a way to separate the excellent from the horrifying, or at least the better from the worse.
It’s so tempting that VAM is likely to be part of a reauthorized No Child Left Behind. The problem is, researchers urge caution because of the same kinds of varied results featured in playground conversations.
Value-added measures use test scores to track the growth of individual students as they progress through the grades and see how much “value” a teacher has added.
Policymakers want to use this data to evaluate teachers and make decisions about pay, tenure or termination.
Tracking an individual child’s progress is clearly better than what we have now: a kind of apples-to-oranges comparison of the average scores of this year’s fourth graders to last year’s. It’s easy to see how such comparisons could get muddled by, say, a large influx of kids with autism. That’s why value-added measures initially seem so attractive.
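For readers who like to see the arithmetic, here is a bare-bones sketch of the basic idea: subtract each student's score last year from this year's score, then average those gains by teacher. The student records, teacher names and the function `average_gain_by_teacher` are all invented for illustration. Real value-added models are far more elaborate statistical models, and this is not any district's or researcher's actual formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student, teacher, last year's score, this year's score).
# Every name and number here is made up for illustration.
records = [
    ("s1", "Teacher A", 60, 70),
    ("s2", "Teacher A", 80, 85),
    ("s3", "Teacher B", 50, 52),
    ("s4", "Teacher B", 90, 94),
]


def average_gain_by_teacher(rows):
    """Average the year-over-year score gains of each teacher's students."""
    gains = defaultdict(list)
    for _student, teacher, prior, current in rows:
        # A student's "growth" is simply this year's score minus last year's.
        gains[teacher].append(current - prior)
    return {teacher: mean(student_gains) for teacher, student_gains in gains.items()}


print(average_gain_by_teacher(records))
```

Notice that the sketch credits the entire gain to this year's teacher and treats the two test scores as the whole story. It has no way of knowing how students ended up in each classroom or what else was going on in their lives.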
So why do I think we need a BS detector when considering this idea? Well, here’s what the National Academy of Sciences Board on Testing and Assessment (BOTA) says:
“A great deal is unknown about the potential and limitations of alternative statistical models for evaluating teachers’ value added contributions to student learning. BOTA agrees with other experts who have urged the need for caution and for further research prior to any large-scale, high-stakes reliance on [value-added approaches].”
Here are four cautions, among many:
*First, value-added rests on the shaky assumption that math and English test scores tell us what we need to know about student progress. No matter how good a test may be, it can’t measure all of what parents want their kids to be learning and doing in school. In short, value added would intensify the existing unhealthy pressure on teachers to teach to the test.
*Second, as I pointed out in my last post, it’s impossible to tease out the effect of one teacher from those who came before, or from a music teacher, for example, who is the linchpin in a musical student’s school week (but is not measured by any test). It’s also difficult to separate a teacher’s influence from the influence of a chaotic home, poor nutrition, lack of sleep or a host of other factors.
*Third, the validity of this approach rests on the false assumption that students and teachers are assigned randomly. In reality, senior teachers can and do choose better schools and classes, while parents in affluent towns fight to get their kids into classrooms of teachers with good reputations. Think this might skew test results a bit?
*Fourth, value added doesn’t give us any information about what practices distinguish good teachers from bad. All we know is good teachers get better test scores, not what they did to achieve this.
Oh, and here’s an interesting twist: Researchers looking at math test results saw more variation within one teacher’s “effectiveness” than from one teacher to another. Turns out “good” teachers aren’t consistently good, and “bad” teachers aren’t consistently bad. As I was saying, your results will vary.
For more in-depth analysis of value-added measures and why growth should be assessed using multiple measures, see one of FairTest’s analyses of VAM.