
Posted at 2:30 PM ET, 09/01/2010

Should test scores be used AT ALL for teacher evaluation?

By Valerie Strauss

Earlier this week a major report was released saying that “value-added” formulas, which use standardized test scores to evaluate teachers, are unreliable and should not be used as a major factor in teacher assessment.

“Value-added modeling” has become the new big phrase in the education world. Essentially, it means measures that use test scores to track the growth of individual students as they progress through the grades to see how much “value” a teacher has added.

The value-added movement is supported by the Obama administration, which encouraged states to change laws to allow teachers to be evaluated primarily by such measures. And the Los Angeles Times recently used such a formula to grade more than 6,000 California teachers in a project that is highly controversial.

This would all be fine if assessment experts hadn’t repeatedly warned that standardized tests designed for students should not be used to evaluate teachers. But they have. In addition, value-added formulas do not include other factors that affect students, and can skew results by giving better scores to teachers who “teach to the test” and lesser scores to teachers who are assigned students with the greatest educational needs.

In this climate, the Economic Policy Institute, a nonpartisan, nonprofit think tank based in Washington, released the report, which concludes that VAM methods should not dominate high-stakes decisions about teacher evaluation and pay.

The report, written by 10 prominent educators and researchers, says:

There is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed.

For a variety of reasons, analyses of VAM results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers. VAM estimates have proven to be unstable across statistical models, years, and classes that teachers teach.

And it warns of negative consequences if "value added" is a key component in evaluation -- including more “teaching to the test” and narrowed curriculum. Further, the study says, teachers may try to avoid being assigned particularly needy students because they do worse on standardized tests.

With all of that said, I wondered why the report did not say that these measures should not be used at all in evaluation.

The executive summary says:

Legislatures should not mandate a test-based approach to teacher evaluation that is unproven and likely to harm not only teachers, but also the children they instruct.

But it also says:

Adopting an invalid teacher evaluation system and tying it to rewards and sanctions is likely to lead to inaccurate personnel decisions and to demoralize teachers, causing talented teachers to avoid high-needs students and schools, or to leave the profession entirely, and discouraging potentially effective teachers from entering it.

So, what gives? Why should VAM measures be used if there is no consensus that they are reliable assessment tools? Why should they be given any weight? There are better, albeit more time-consuming, ways to weed out bad teachers.

I asked EPI to query the authors about this, and received a response from Helen F. Ladd, professor of public policy and economics at Duke University and president-elect of the Association for Public Policy Analysis and Management.

The full list of authors, which includes Diane Ravitch and Linda Darling-Hammond, accompanies the report's executive summary.

I asked: If student standardized test scores are unreliable as stated in the study, why should they be used at all in teacher evaluation? Why doesn’t the study say they should not be used, period, for this purpose? Was the study bending to political reality?

Ladd: "There is no perfect way to evaluate teachers. Test scores are unreliable; so are principal observations, or peer evaluations, or analysis of videotapes, and so on. The only way to evaluate teachers fairly is to gather information from a variety of imperfect sources, each of which may contribute some information. If a teacher seemed to be ineffective in all of these measures, I’d be pretty confident that the teacher was ineffective. But if a teacher were ineffective only on one of them, I would be reluctant to make that conclusion.

"Test scores are unreliable, but they are still more often right than wrong, but not sufficiently more often to justify making high-stakes decisions on the basis of test scores alone. But giving test scores too much weight in a balanced evaluation system runs the additional danger of creating incentives to narrow the curriculum, as we described in the paper. If they are not given too much weight, this danger is lessened. How much weight they should be given should be a matter of local experimentation and judgment. All we say in the paper is that giving them 50 percent of the weight is too much."




Teachers should not be evaluated based on standardized test scores. That's plain wrong. How about the kids who don't "try" to do well? Kids who do well on classwork and then "freeze" on standardized tests? How about the kids who are exempted? I always did that with my special ed kids, because they never did well and my regular ed kids would fill in the bubbles to make a design. I believe that these tests are an insult overall. They exist just to keep psychometrists employed.

Posted by: kodonivan | September 1, 2010 3:00 PM | Report abuse

If student test scores are used to evaluate teachers, they should also be used to evaluate school district superintendents and school boards. After all, who made the decision to hire the teachers in the first place? Who determines the conditions under which they teach?

Posted by: rlguenther | September 1, 2010 3:30 PM | Report abuse

I wish the general public knew this.

Posted by: educationlover54 | September 1, 2010 3:49 PM | Report abuse

Food for thought.

Let us pretend that this "method" is valid.

This means that, as the supporters of the "method" claim, previous test results can be used to accurately indicate the future scores of a student. Remember the "method" is claimed to be totally accurate since the "method" according to the supporters can be used to fire teachers simply based upon test results of students.

The supporters claim that the "method" will be accurate in indicating teachers that are below average, average, or superior.

Let us take teacher A who is superior based upon previous test results. Three students of teacher A fail the test at the end of the year of the class of teacher A.

Based upon the "method" these three students will continuously fail in the future. They had a proven superior teacher and they failed. Remember we have accepted that the "method" is accurate in indicating future scores of students for below average, average, and superior teachers. The "method" accurately indicates that future scores of the three students that failed with superior teacher A will be failure no matter the teacher these students have for the next year.

So if the "method" is valid it can not only be used to get rid of below average teachers, but could also be used to stop wasting public funds on students that the accurate "method" indicates will always fail in the future.

If we know students that will fail in the next year will fail no matter what teacher they have, then they always must fail in following years.

Instead of expensive teachers these students can all be identified and placed in rooms that are overseen by minimum wage workers. No need to do any further expensive testing on these students and it is far better to simply cheaply warehouse them until they no longer have to attend the public schools.

In fact all children entering the 1st grade can be given proven superior teachers. Those children that fail the test after a year with these superior teachers can simply be designated for cheap warehousing.

Time for Americans that are always looking for quick fixes to understand the nonsense of a "method" and supporters that pretend that present test scores can accurately indicate future test scores. A test score only indicates the test score of a student for a given test. It is as simple as that.

Posted by: bsallamack | September 1, 2010 4:01 PM | Report abuse

A better way to evaluate teachers would be to have highly experienced teachers -- those near retirement or IN retirement -- visit and observe/mentor working teachers. These observers would be limited to observing teachers in their DISCIPLINE AND AGE GROUP. One of the most absurd aspects of the current 'system' is that you can have a former middle school English teacher observing high school physics teachers. It's like a tugboat captain observing an airline pilot. They're both called captains! And they're both experienced in guiding vehicles!

That said, I would much rather have been judged by test scores than by observation by the ignorant. The current system encourages 'engagio-tainment' by teachers, and the results speak for themselves.

Posted by: physicsteacher | September 1, 2010 4:28 PM | Report abuse

Value-added can be used as one of a group of measures to assess effectiveness of a teacher. The fact is that other measures are also subjective and can be invalid as well.

Posted by: Nikki1231 | September 1, 2010 7:15 PM | Report abuse

>"This means that, as the supporters of the "method" claim, previous test results can be used to accurately indicate the future scores of a student." - written by bsallamack

Sorry, no one can predict the future. "Validity" means that a test measures what it claims to measure. The fact is, we have to be clear what we mean by a teacher's effectiveness, and then we have to try and find a method that measures effectiveness according to our definition.

>"Remember the "method" is claimed to be totally accurate" - written by bsallamack

No one claims this method is totally accurate. It does, however, seem to be at least as accurate as any other method we have, and more accurate than credentials or experience in predicting a teacher's effectiveness. (In this case, effectiveness is defined as the ability to increase their students' test scores.)

Since there are so many misconceptions about how VA modeling works, let me see if I can try to explain parts of it.

The model assumes that a student's performance this year is related to their performance in prior years, and related to their teacher's performance over all of their students, and related to their school's performance over all the students who attend that school. The VAM then attempts to tease out the different contributions to the student's performance: the capabilities of the student, the effectiveness of the teacher, and the social milieu of the school.
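
One way to write that decomposition down is as a regression-style equation. The version below is only an illustrative sketch: the symbols, the single prior-year score, and the additive form are all assumptions made here for clarity, not the specification used by the EPI report, the Los Angeles Times, or any school district.

```latex
% Illustrative value-added sketch; every symbol is an assumption for exposition.
% y_{i,t}           : test score of student i in year t
% y_{i,t-1}         : the same student's score in the prior year
% \tau_{j(i,t)}     : effect attributed to the teacher j who taught student i in year t
% \sigma_{s(i,t)}   : effect attributed to the school s that student i attended in year t
% \varepsilon_{i,t} : everything the model does not capture
y_{i,t} = \beta \, y_{i,t-1} + \tau_{j(i,t)} + \sigma_{s(i,t)} + \varepsilon_{i,t}
```

In this sketch a teacher's "value added" is the estimate of \tau_j. Anything that affects scores but is left out of the equation either lands in \varepsilon or gets absorbed into \tau_j, which is one reason estimates can shift from one year, class, or statistical model to the next.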

I know that no one in this blog really wants to know this. Everyone has already dug in. You can at least be happy you are not in health care, where the government may forbid you from unionizing, the government will dictate how much you are paid, and the government will intrude on your day-to-day operations without concern about how well your patients are cared for, but rather with concern about how well you fill out your paperwork. I would not feel too sorry for myself - it can always become worse.

Posted by: cypherp | September 1, 2010 8:38 PM | Report abuse

Starting tomorrow (9/2/10) I am going to post a series of comments related to teacher evaluation on my blog at

I have worked with this issue for the past 50 years and I think some things are missing from the current debate.

Check out my comments and let me know what you think.

Posted by: clarkd1 | September 1, 2010 9:31 PM | Report abuse

I don't think the method is accurate at all; however, people like it because of the (false) simplistic idea that the students will do well on a test if they have a good teacher. Your average American is not going to get the idea about teaching to the test and narrowing the curriculum being wrong. To your average non-teacher it is just "common sense" that students who score well have good teachers and students who don't score well have bad teachers.
Also, as my sister, a medical doctor, recently pointed out to me, teachers are sort of the last group of real middle class Americans with benefits. She thinks that is why they are under attack, because unemployed people are envious of the benefits and because people don't want to pay taxes.
I agree with physicsteacher. I would rather be evaluated by test scores than by a principal who was a P.E. teacher. I teach foreign language.

Posted by: celestun100 | September 1, 2010 9:36 PM | Report abuse

The test scores are being used for way more than they were ever intended for as far as evaluations of students also. Kids are actually placed in groups based on the scores from one test that is sometimes isolated from the curriculum they are studying.

I think the tests are fine, if expensive, ways to assess student achievement if they are good tests. However, they don't indicate how well students will do in subjects and are not supposed to be "diagnostic" tools, are they?

Now we are using standardized tests to judge teachers, so the whole thing has just taken on a life of its own.

MCPS has a very good foreign language county exam. When I read my students' test essays, I can tell who understands what I taught and who doesn't. I don't usually get big surprises because I read essays the students write during the semester all the time.
There is also a speaking portion to the test. These are real tasks that the students have to perform in order to pass the test.
If a test is just multiple choice, I only know that the student knows some material, has some good reading comprehension and is a good test taker. I don't know if the student can write a good paragraph. It takes more time and effort on a teacher's part to teach paragraph writing than it does to eliminate wrong answers from a multiple choice test.

That is my perspective. When I taught elementary, I always thought that the good readers scored very high on standardized tests without any special test prep sessions. Poor readers would score average or a bit below with a lot of coaching, but perhaps would have been better off with more time to practice reading and writing.

And that is why I don't like all the emphasis on testing. I would rather the students were given direct instruction and plenty of time for reading and writing itself. I think it is silly to teach non readers reading skills without teaching them how to read. I also think it is silly to teach non English speakers isolated skills without word comprehension.

Posted by: celestun100 | September 1, 2010 9:52 PM | Report abuse

There you go again Valerie! Making rational arguments supported by research is such an attractive quality but otherwise pretty much useless as our capitalist economy speeds headlong toward its death.

The banks now have grabbed almost all the wealth of this nation.


But even gorged on the people's money, the banks are ironically walking dead, like Nobel Laureate Paul Krugman used to say, they're zombies. More like vampires to my mind. Because there are a few pockets of wealth and working class influence they simply must suck up if they are to survive awhile longer. There's the nation's public school system, there's public and private pension funds, there's the Social Security system, Medicare and Medicaid, there's all the public services delivered by state and local governments.

Then there's the largest unionized work force left in this country, the teachers. What, you thought this sudden flood of disparagement and denigration of teachers, including this value added assessment nonsense, was "for the kids"? What started with Reagan and PATCO is supposed to end with Obama and NEA-AFT. Can't pay teachers those exorbitant $50,000 a year salaries and a pension in their old age. We've got multi-million dollar bonuses for Lloyd Blankfein and Jamie Dimon to think of after all.

Posted by: natturner | September 2, 2010 12:28 AM | Report abuse

Barack Obama needs to start listening to the critics of Arne Duncan. Obama is half African American yet he is listening to the policies that indirectly discriminate against African American teachers and students. Our first president with minority blood is participating in discrimination against minorities. To me, this is racism.

Posted by: educationlover54 | September 2, 2010 8:57 AM | Report abuse


I agree the commercial and investment banks are largely badly managed organizations that pay ridiculously high salaries, while their ethics and social responsibility are marginal, if not in the toilet. Their mortgage businesses alone should make several of them criminal enterprises. The government we elected and sustain largely supports their existence as you know. Thus the bailouts.

Also recognize: "we" own most of the banks through mutual funds, public employee pension funds, union trust funds, individual stock ownership, etc. If you have a 401K or the equivalent, you probably own a piece of Citi, Chase, B of A, Wells Fargo and all the rest of them.

Posted by: axolotl | September 2, 2010 12:10 PM | Report abuse

>"This means that, as the supporters of the "method" claim, previous test results can be used to accurately indicate the future scores of a student." - written by bsallamack

Sorry, no one can predict the future. "Validity" means that a test measures what it claims to measure. The fact is, we have to be clear what we mean by a teacher's effectiveness, and then we have to try and find a method that measures effectiveness according to our definition.

>"Remember the "method" is claimed to be totally accurate" - written by bsallamack
Posted by: cypherp
cypherp needs to keep up with the news.

Teachers in D.C. have already been fired totally based upon test results.

The "method" has been fully included in the evaluation of teachers in D.C. and can account for 55 percent of a teacher's evaluation.

This indicates that the "method" has been accepted as totally valid and totally accurate.

Might as well simply logically follow this in regard to students and cheaply warehouse the students where test results indicate these students can not learn.

cypherp writes "no one can predict the future" but simply ignores that the "method" is totally based upon the supposed ability to predict the future of test results based upon previous test results.

The reality is that if an individual believes that the foundations of a "method" are false then it is illogical for that individual to believe in the use of that "method".

I really wish that logic would be once again part of public education.

Posted by: bsallamack | September 2, 2010 12:47 PM | Report abuse

If student test scores are used to evaluate teachers, they should also be used to evaluate school district superintendents and school boards. After all, who made the decision to hire the teachers in the first place? Who determines the conditions under which they teach?

Posted by: rlguenther
This is heresy. Accept this and you might as well accept the idea that political leaders should actually do something about the problems instead of simply blaming teachers.

Posted by: bsallamack | September 2, 2010 12:53 PM | Report abuse

> Teachers in D.C. have already been fired totally based upon test results. - written by bsallamack

Sorry, bsallamack, but "totally" does not equal 55%. Maybe you graduated from a Washington, DC high school before Michelle Rhee became chancellor? Otherwise, I'd expect you'd have learned your fractions and percentages.

I didn't hear teachers complaining when they were given a 21% pay raise, including retroactive raises, and increases in benefits. Why isn't the union complaining about the teachers who received the $20,000-$30,000 bonuses?

Why doesn't the union agree to open the personnel files of the fired teachers? Then maybe we'd see the terrible injustice that was done. Or maybe we'd see what a bunch of losers the fired teachers really were.

DC teachers are adults, aren't they? They signed their new contract which spelled out very clearly how they would be evaluated. Now, they are being asked to fulfil their end of the contract, and listen to all the complaints they're making. In the meantime, the DC school system is no longer the laughing-stock of the nation.

Posted by: cypherp | September 3, 2010 7:42 AM | Report abuse

Why don't we start using some sort of evaluation on the president and Arne Duncan?

We need an evaluation system to judge them. Right now only the press is evaluating them. We need an objective (choke, choke) system like IMPACT in order to judge them.

Posted by: educationlover54 | September 4, 2010 8:52 AM | Report abuse

A better question might be, "Should standardized test scores be used for anything?" How many people realize the number of layers between the original question and the printing? How many know that the order of the questions depends on the space on the page, so that a manuscript that starts out with reading comprehension questions following the order in which the ideas appear in the selection may be shuffled by the production people to fit the questions and answers into the same column? How many times are answer choices changed in the production level to fit? There have been cases--one in my personal experience--in which a reading selection taken from a published source was rewritten to make it easier for the students to understand, and a few years ago an author's daughter discovered her mother's work was used on the NY Regents exam but had been altered. How many parents are aware that our company worked on many materials intended as classroom texts that followed the multiple-choice format of the standardized tests? Or that students are sometimes given practice tests and instructed on how to eliminate the impossible answer choices to increase their chance of guessing correctly, or told that if four answers in a row are "C," for instance, they should go back and double-check, because standardized test practice is to have no more than three of the same letter in a row? Or that the people creating the tests seem to assume no student knows anything that is NOT tested; in one case, when our company pointed out two correct answers to a question, we were told the second one was fourth-grade vocabulary level so no third-grader taking the test would know it!

Standardized tests should be abandoned entirely.

Posted by: sideswiththekids | September 4, 2010 8:59 AM | Report abuse

Why would any teacher be reluctant to be evaluated on their students' performances? Only the marginal and/or the dregs need to worry about how their kids test. Teachers of high caliber and great quality would have nothing to cause concern.

Grow up all yous wusses. Stand up and be accountable for what you're supposed to be doing and stop making excuses against being objectively evaluated. The days of the subjective administrative teacher evaluation have been an embarrassment to the teaching profession and are over. Clean up your collective acts and deal with it.

Posted by: phoss1 | September 6, 2010 6:26 AM | Report abuse

Evaluating teachers by students' standardized tests assumes all the students tried to answer the test questions correctly. Some students just mark anything to get through, some who have major problems fill in the circles in a little design, one rebellious student was caught filling in the circles to form a certain word, and in Massachusetts several years ago there was a movement by students who resented so many standardized tests to turn in blank sheets as a protest. (Not to mention the questions that are so poorly worded no one can figure them out, let alone answer correctly.)

phoss1, that's why teachers are reluctant to be evaluated on their students' performances.

Posted by: sideswiththekids | September 6, 2010 12:01 PM | Report abuse

The comments to this entry are closed.


© 2010 The Washington Post Company