Posted at 9:00 AM ET, 03/3/2011

Value-added assessment: Theory vs practice

By Valerie Strauss

The following was written by Matthew Di Carlo, senior fellow at the non-profit Albert Shanker Institute, located in Washington, D.C. This post originally appeared on the institute’s blog.

By Matthew Di Carlo
A few weeks ago, the National Education Policy Center (NEPC) released a review of last year’s Los Angeles Times (LAT) value-added analysis – with a specific focus on the technical report (done by RAND’s Richard Buddin) upon which the paper’s articles were based. In line with prior research, the critique’s authors – Derek Briggs and Ben Domingue – redid the LAT analysis and found that teachers’ scores vary widely, but that the LAT estimates would differ under alternative model specifications, are error-prone, and conceal systematic bias from non-random classroom assignments. They were also, for reasons yet unknown, unable to replicate the results.

Since then, the Times has issued two responses. The first was a quickly published article, which claimed (including in the headline) that the LAT results were confirmed by Briggs/Domingue – even though the review reached the opposite conclusions. The basis for this claim, according to the piece, was that both analyses showed wide variation in teachers’ effects on test scores (see NEPC’s reply to this article).

Then, there was another response, this time on the Times’ ombudsman-style blog. This piece quotes the paper’s Assistant Managing Editor, David Lauter, who stands by the paper’s findings and the earlier article, arguing that the biggest question is:

"…whether teachers have a significant impact on what their students learn or whether student achievement is all about … factors outside of teachers’ control. … The Colorado study comes down on our side of that debate. … For parents and others concerned about this issue, that’s the most significant finding: the quality of teachers matters."

Saying “teachers matter” is roughly equivalent to saying that teacher effects vary widely – the more teachers vary in their effectiveness, controlling for other relevant factors, the more they can be said to “matter” as a factor explaining student outcomes. Since both analyses found such variation, the Times claims that the NEPC review confirms their “most significant finding.”
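
To make the distinction concrete, here is a minimal, purely illustrative sketch in Python. The model and every number in it are my own assumptions – not the specifications used by Buddin or by Briggs and Domingue – and it shows only that estimated teacher effects can vary widely in the aggregate even when each individual estimate is very noisy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 teachers with 25 students each. "True" teacher effects get
# a modest spread; student-level noise is large, as it is in real test data.
# (All numbers are invented for illustration, not taken from the LAT analysis.)
n_teachers, n_students = 200, 25
true_effect = rng.normal(0.0, 0.15, n_teachers)           # in SD units of the test
noise = rng.normal(0.0, 0.80, (n_teachers, n_students))   # student-level error
gains = true_effect[:, None] + noise                      # observed score gains

# A bare-bones value-added estimate: each teacher's mean classroom gain.
vam = gains.mean(axis=1)

# "Teachers matter" in the aggregate: the estimated effects spread widely...
print(f"SD of estimated teacher effects: {vam.std():.3f}")
# ...but each individual estimate carries a standard error of comparable size.
print(f"Standard error per teacher:      {0.80 / np.sqrt(n_students):.3f}")
```

With these toy numbers, the noise in any single teacher’s estimate is about as large as the true spread between teachers – which is why “wide variation” by itself settles so little.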

The review’s authors had a very different interpretation (see their second reply). All the back and forth may seem frustrating; it has mostly focused on somewhat technical issues, such as model selection, sample comparability, and research protocol (with some ethical charges thrown in for good measure). These are essential matters, but there is also an even simpler reason for the divergent interpretations, one that is critically important and arises constantly in our debates about value-added.

Here’s the first key point: The finding that teachers matter – that there is a significant difference overall between the most and least effective teachers – is not in dispute.

Indeed, the fact that there is wide variation in teacher “quality” has been acknowledged by students, parents, and pretty much everyone else for centuries – and has been studied empirically for decades (see here and here for older examples). The more recent line of value-added research has made enormous (and fascinating) contributions to this knowledge, using increasingly sophisticated methods (see here, here, here, and here for just a few influential examples).

Therefore, the Times’ claim that the NEPC analysis confirmed their findings because they too found wide variation in teacher effects is kind of missing the point. Teacher effects will vary overall with virtually any model specification that’s even remotely complex. The real issue, both in this case and in the larger debate over value-added, is whether we can measure the effectiveness of individual teachers.

Now, if the Times had simply published a few articles reporting their overall findings – for example, the size of the aggregate difference between the most and least effective teachers, and how it varies by school, student, and teacher characteristics – I suspect there would have been relatively little controversy. The core criticisms by Briggs and Domingue would still have been relevant and worth presenting, of course – their review is focused on the analysis, not how the Times used it. But the LAT technical paper (and articles based on it) would really have just been one of dozens reaching the same conclusion – albeit one presented more accessibly (in the articles), using a large new database in the newspaper’s home town.

Of course, the Times did not stop there. They published the value-added scores for individual teachers in an online database. Just as the academic literature on value-added is different from the use of the estimates in high-stakes employment decisions, the paper’s publication of the database is very different from its presentation of overall results.

Let’s say I was working for a private company, and I told my boss that I had an analysis showing that there was wide variation in productivity among the company’s employees. She probably already knew that, or at least suspected as much, but she might be interested to see the size of the differences between the most and least productive workers. The results might even lead her to implement particular policies – in hiring, mentoring, supervision, and the like. But this is still quite different from saying that I could use this information to accurately identify which specific employees are the most and least productive, both now and in the future.

The same goes for teachers, and that is the context in which the criticisms by Briggs and Domingue are most consequential. They address a set of important questions: How many teachers’ estimates change under a different model with different variables (and what does that mean if they do)? Did the model omit important variables that influenced individual teachers’ estimates? Were the estimates biased by school-based decisions such as classroom assignment? How many teachers were misclassified due to random error?
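
The misclassification question, in particular, is easy to illustrate with a hypothetical sketch. Using the same invented setup as above (again, not the LAT model), we can estimate each teacher from two independent classrooms and count how often the quintile label changes on nothing but sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)

n_teachers, n_students = 200, 25
true_effect = rng.normal(0.0, 0.15, n_teachers)  # same invented spread as above

def noisy_estimate():
    """One independent classroom's worth of data per teacher."""
    noise = rng.normal(0.0, 0.80, (n_teachers, n_students))
    return true_effect + noise.mean(axis=1)

sample1, sample2 = noisy_estimate(), noisy_estimate()

# Quintile labels (0 = "least effective" ... 4 = "most effective").
q1 = np.digitize(sample1, np.quantile(sample1, [0.2, 0.4, 0.6, 0.8]))
q2 = np.digitize(sample2, np.quantile(sample2, [0.2, 0.4, 0.6, 0.8]))

print(f"Teachers whose quintile label changed: {(q1 != q2).mean():.0%}")
```

In runs of this toy model, a majority of the labels typically change between the two samples, even though nothing about the teachers themselves has changed.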

From this perspective, with an eye toward individual-level accuracy, the Times might have proceeded differently. They might have accounted for error margins when assigning effectiveness ratings to teachers (as I have discussed before).
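
One way of doing so – sketched below under the same invented assumptions as the earlier snippets – is to attach a confidence interval to each teacher’s estimate and decline to rate anyone whose interval straddles the district average.

```python
import numpy as np

rng = np.random.default_rng(2)

n_teachers, n_students = 200, 25
true_effect = rng.normal(0.0, 0.15, n_teachers)  # same invented spread as above
gains = true_effect[:, None] + rng.normal(0.0, 0.80, (n_teachers, n_students))

vam = gains.mean(axis=1)                               # point estimate per teacher
se = gains.std(axis=1, ddof=1) / np.sqrt(n_students)   # its standard error

# Rate a teacher only when the 95% interval clears the (zero) district average.
rating = np.where(vam - 1.96 * se > 0, "above average",
         np.where(vam + 1.96 * se < 0, "below average", "indistinguishable"))

for label in ("above average", "below average", "indistinguishable"):
    print(f"{label:>17}: {(rating == label).mean():.0%} of teachers")
```

In data this noisy, the “indistinguishable” group is by far the largest: the estimates are informative in the aggregate but rarely precise enough to label an individual teacher.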

When confronted with the failure to replicate their results, they might have shown real concern and taken steps to figure it out. And they might have responded to the fact that their results varied by model specification and were likely biased by non-random classroom assignment (a problem the publication of the database will probably make worse) by, at the very least, agreeing to make their sensitivity analyses public and defending their choices.

Instead, they persisted in defending a conclusion that was never in question. They argued – twice – that the NEPC review also found variation in teacher effects, and therefore supported their “most significant” conclusion, even if it disagreed with their other findings.

On this basis, they downplayed the other issues raised by Briggs/Domingue (who are, by the way, reputable researchers pointing out inherent, universally accepted flaws in these methods). In other words, the Times seems to have conflated the importance of teacher quality with the ability to measure it at the individual level.

(Incidentally, they made a similar mistake in their article about the Gates MET report.)

And, unfortunately, they are not alone. I hear people – including policymakers – advocate constantly for the use of value-added in teacher evaluations or other high-stakes decisions by saying that “research shows” that there are huge differences between “good” and “bad” teachers.

This overall variation is a very important finding, but for policy purposes, it doesn’t necessarily mean that we can differentiate between the good, the bad, and the average at the level of individual teachers. How we should do so is an open question.

Conflating the importance of teacher quality with the ability to measure it carries the risk of underemphasizing all the methodological and implementation details – such as random error, model selection, and data verification – that will determine whether value-added plays a productive role in education policy.

These details are critical, and way too many states and districts, like the Los Angeles Times, actually seem to be missing the trees for the forest.

-0-

Follow my blog every day by bookmarking washingtonpost.com/answersheet. And for admissions advice, college news and links to campus papers, please check out our Higher Education page at washingtonpost.com/higher-ed. Bookmark it!

By Valerie Strauss  | March 3, 2011; 9:00 AM ET
Categories:  Guest Bloggers, Matthew Di Carlo, Teachers  | Tags:  la times, la times teacher, los angeles times, los angeles times teachers, los angeles times teachers series, teacher assessment, teacher evaluation, teachers, teachers and value-added, teachers database, value added  

Comments

This--"This overall variation is a very important finding, but for policy purposes, it doesn’t necessarily mean that we can differentiate between the good, the bad, and the average at the level of individual teachers"--is one the most important sentences in the debate.

To ISOLATE direct and singular causation between ONE teacher and ONE student is impossible given the time and money required to control for all factors in the student outcome mix. This is complicated and counter-intuitive, but it is simply a fact.

This does NOT discount that we should be addressing teacher quality and student outcomes--just not in the simplistic and mechanical ways we are hearing and have been doing for a century.

Posted by: plthomas3 | March 3, 2011 9:27 AM | Report abuse

Just a note to thank you and the Post for The Answer Sheet. I appreciate the posts and your analysis of many things that are accepted at face value by too many reporters.

Posted by: Kate15 | March 3, 2011 11:36 AM | Report abuse

One reason why the LA Times, and the Los Angeles community, might have dismissed Briggs' critique without a second thought: the Los Angeles school district has spent $500,000 per teacher ($3.5 million total) trying to fire just SEVEN of the district's 33,000 teachers for poor classroom performance.

The LA community knows full well the extreme lengths the teachers union, and their allies, will go to protect incompetent and ineffective teachers.

Posted by: frankb1 | March 3, 2011 1:33 PM | Report abuse

LA Mayor Antonio Villaraigosa:

"When we fought to change the seniority-based layoff system that was disproportionately hurting our neediest students, the teachers union fought back.

When we fought to empower parents to turn around failing schools and bring in outside school operators with proven records of success, the teachers union fought back.

And now, while we try to measure teacher effectiveness in order to reward the best teachers and replace the tiny portion who aren't helping our kids learn, the teachers union fights back.

It's not easy for me to say this. I started out as an organizer for UTLA (United Teachers Los Angeles), and I don't have an anti-union bone in my body. The teachers unions aren't the biggest or the only problem facing our schools, but for many years now, they have been the most consistent, most powerful defenders of the unacceptable status quo."

Posted by: frankb1 | March 3, 2011 1:35 PM | Report abuse

Title: Frank Findings

To ignore a fact,
No matter how big or small,
Shows a lack of tact.

Posted by: DHume1 | March 3, 2011 2:11 PM | Report abuse

From the LA Times’ ombudsman:

"It also is worth noting that the policy center is partly funded by the Great Lakes Center for Education Research and Practice, which is run by the top officials of several Midwestern teachers unions and supported by the National Education Assn., the largest teachers union in the country and a vociferous critic of value-added analysis."

Posted by: frankb1 | March 3, 2011 3:17 PM | Report abuse

And MOST importantly from the LA Times:

"In the next few weeks, The Times plans to publish version two of the database, updated with a new year of data and incorporating a number of changes based on that feedback."

Posted by: frankb1 | March 3, 2011 3:22 PM | Report abuse

Spinning one's wheels in an effort to discredit someone is not the same thing as successfully refuting an argument. Objecting to a flawed system of measuring employees' effectiveness with the attendant consequences is not the same as defending the status quo.

Posted by: mcnyc | March 3, 2011 5:40 PM | Report abuse

Seems like a certain FrankB is monopolizing the comment section by defending a study that has been proven statistically flawed not only by those mentioned in this article, but by others, including the people who warned NYC against using such scores to evaluate teachers.

Are there some bad apples? Yes, but as this study proves, a very effective teacher can falsely be labeled ineffective. The LA Times seems to be covering their butts rather than admit the truth.

And Frank, effectiveness should never be based on passing scores but on the amount of progress a student makes in that year. A student making a year or more's growth can still fall below the passing grade of a standardized test. That's significant growth. That is why other methods of assessing students should also be taken into account.

Progress should always be rewarded and that student should be made to feel proud of that achievement and work towards reaching or exceeding grade level. People have different rates of acquiring learning, and their journey is different.
To judge a child or teacher by that one test is really an injustice.

Posted by: Schoolgal | March 3, 2011 5:44 PM | Report abuse

Schoolgal: "The LA Times seems to be covering their butts rather than admit the truth."

No, they are moving forward. And winning accolades & awards for it (see link below).

In case you missed it:

"In the next few weeks, The Times plans to publish version two of the database, updated with a new year of data and incorporating a number of changes based on that feedback."

And in the next few years, newspapers (and other media) across the country will be following their lead.

http://www.mediabistro.com/fishbowlla/la-times-wins-philip-meyer-award-for-controversial-teacher-story_b22001

Posted by: frankb1 | March 3, 2011 6:05 PM | Report abuse

And they are on the move in New Jersey:

N.J. Proposes Way to Measure Teachers

"A Christie administration task force is recommending New Jersey teachers be judged half on student test scores and half on observations of teachers and other methods.

The evaluation system laid out in the group’s report, if approved by the state Legislature, would affect teachers’ pay and tenure. In what his administration is pitching as the “year of education reform,” Gov. Chris Christie is looking to make it easier to fire teachers, create charter schools and pay teachers based on job performance rather than seniority."


http://blogs.wsj.com/metropolis/2011/03/03/christie-proposes-new-way-to-measure-teachers/

Posted by: frankb1 | March 3, 2011 6:21 PM | Report abuse

What about the fact that many students couldn't care less about their performance on state standardized tests used for value-added measures of teacher effectiveness? Sure, elementary school students will toe the line, and some middle and high school students will accept the rhetoric of performing well for their school's sake, but many will get bored of reading long passages and doing monotonous calculations that don't figure into their grades or their chances of graduation or college acceptance, and will reduce their effort or, worse yet, start guessing randomly so they can sleep for an extra 35 min. I have to admit, if I were a student, I wouldn't take these tests very seriously, especially for subjects that I am disinterested in or perform poorly in. Never mind the fact that these scores aren't accurate reflections of teacher effectiveness; in many cases they're not even accurate reflections of students' mastery of the tested standards.

Posted by: stevendphoto | March 3, 2011 7:22 PM | Report abuse

Just because state governments are signing onto a flawed system doesn't make it less flawed. Repeating a half-truth does not make it more true.

Posted by: mcnyc | March 3, 2011 7:28 PM | Report abuse

After reading a NYTimes article on how Bloomberg is buying off reporters, I wasn't surprised the LA Times won the award.

But I really don't think you understand that you cannot evaluate teachers based on one test. First, understand that no teacher wants an ineffective teacher in the mix. But the principals, not the union, do the evaluations. I wonder how many excellent teachers made the list as ineffective?? And what other profession has a list published in the papers?? All we want is a fair and balanced evaluation process. The LA Times changing their methods only proves they were wrong the first time. And will probably be wrong again.

Posted by: Schoolgal | March 3, 2011 10:14 PM | Report abuse

What boggles the mind is that people still seem to think kids are vessels that information can just be poured into. Kids are kids. Some care about their work, some don't. Some try, some don't. Others are unable to focus for more than a minute or two. Oftentimes the teacher is great, but disruptive students make for an impossible learning environment.

Posted by: chicogal | March 4, 2011 1:09 AM | Report abuse

Bill Gates must be an idiot. One wonders why he would accept this kind of LA nonsense as sound. Again, public education is not a business.

Posted by: zebra22 | March 4, 2011 7:29 AM | Report abuse

"I hear people – including policymakers – advocate constantly for the use of value-added in teacher evaluations or other high-stakes decisions." Another partial truth in the ed reform debate in an attempt to substantiate one's POV.

Most people in any discussion regarding VAMs used in evaluating teachers, including Diane Ravitch, have advocated using them only as part of a "mixed measures" approach combined, of course, with subjective administrative evaluations. The degree to which VAM is used in a system is determined by the local collective bargaining agreement, not dictated by administration. As well, many "experts" also advocate examining these results over time (at least 3-5 years) for more validity; one or two years' data has little validity. Did Mr. DiCarlo mention this in his piece?

It's also worth noting that random student placement is critical for any degree of effectiveness using a VAM. The "problem" students should not be placed with the same teacher year after year under the pretense they're more capable of handling these students. Does that teacher get more pay for any of this? Almost never.

Another aspect of VA measures almost never discussed is whether raw scores or percentage of growth from year to year is the accepted practice. For obvious reasons, percentage growth of a student is a more realistic view of a student's progress as opposed to their raw score, especially when dealing with youngsters from the lower learning cohort.

If you're going to discuss an issue, an attempt needs to be made to give both sides of the question and then allow the reader to judge for themselves.

Posted by: paulhoss | March 4, 2011 8:29 AM | Report abuse

The bottom line is this:

There is no test that can accurately measure the progress of each child in the class while evaluating the teacher at the same time. Yes, "value added" attempts to do this, but it is far from accurate at this time. But don't take my word for it: ask any testing expert.

It's not that difficult to evaluate a teacher, but the task requires time and expertise. That should be obvious to everyone.

Posted by: Linda/RetiredTeacher | March 4, 2011 1:44 PM | Report abuse
