Posted at 3:27 PM ET, 09/19/2010

Teacher: What my evaluation must include

By Valerie Strauss

David B. Cohen has been a teacher for 16 years, and is now in his 13th year of teaching in California public high schools. He earned a Master’s degree in Education at Stanford University in 1995 and achieved National Board Certification in 2004. He is a founding member of Accomplished California Teachers (ACT) and co-authored the group’s first policy report, which proposes a multiple-measure teacher evaluation system. Cohen is also a member of the Teacher Leaders Network and blogs at InterACT.

Here is a letter that Cohen sent to California Gov. Arnold Schwarzenegger; members of the California Board of Education; California Education Secretary Bonnie Reiss; state Sen. Gloria Romero, chair of the California Senate Education Committee; and Assemblymember Julia Brownley, chair of the California Assembly Education Committee.

Dear California Education Policy Leader:
As the new school year begins, students and their families always deal with changes: New schools, new classes and teachers, and a whole array of questions about what the new year holds in store. More than in most years, teachers find themselves uncertain about what to expect, in part due to the swelling general and political interest in teacher evaluation. Most of us would welcome improved teacher evaluations that actually help us do our jobs better; our fear is that politics and expediency will lead us towards the misuse of state test scores in teacher evaluation.

Even the advocates of so-called "value-added" measurements concede the existence of variables known and unknown, and offer up various attempts to control for those factors.

They concede that sample sizes present a challenge. Then, they typically offer all sorts of mathematical formulas and ignore the longstanding warning from the three leading educational research bodies that advise against using test scores for teacher evaluation.

I'd like to take this opportunity to inquire about a more personal matter, however: My own teaching evaluation. You see, this year will bring with it considerable changes at my school, many of which will impact my teaching and my students. If you are willing to consider using student test scores in my future evaluations, I believe that I deserve to know how you view the relevance of these factors as they affect those test scores.

As you read on, please realize that I am not talking about hypothetical situations. I have not exaggerated the descriptions at all, nor are my questions rhetorical. Each listed item is a real change occurring at my school this year, and each question deserves an answer if you believe there is validity in value-added measures for teacher evaluation.

1. New principal -- According to an Urban Institute study (Damon et al., 2009), the experience level of a principal has an effect on student performance. "An important finding to emerge from our analysis is the positive impact of principal experience, particularly over the first few years of principals’ careers. ... [T]his implies that new, inexperienced principals will, on average, hurt school performance." In fact, authors of the study refer to that particular information as their "clearest finding." How do you propose to control for the effects of a rookie principal before evaluating me with test scores? What formula should be used for each successive year of our principal's time at our school?

2. One additional administrator in the school - According to this review of relevant research (Krueathep, 2008), the level of administrative support does seem to have effects on student performance, but the reviewed studies have varying conclusions, suggesting both improved and diminished student performance. How do you propose that my school make a determination about the effects of adding an administrative position, when the research is contradictory? After we make that determination, what statistical model do you propose we use in value-added measurements that will be used in my evaluation?

3. New class scheduling - My school will, for the first time, use block schedules for four days per week (compared to prior use of block schedules two days per week). According to C.W. Lewis et al. (NASSP Bulletin, Dec. 2005), block scheduling has a positive effect on student performance on standardized tests. How will this change be factored into my score-based evaluation? If some teachers at my school have more training and experience than others in relation to teaching in block schedules, would you propose that the test-score portion of teacher evaluations include different expectations for different teachers? Would you expect this effect to occur in the first year, or, if gradually, over how many years?

4. Later start time - This year, our school day will begin at 8:15 a.m. Last year, about 75% of our school days began at 7:50 a.m. According to many studies, including this one (Owens et al., 2010) reported recently in the Wall Street Journal, a later start time has a positive effect on adolescent learners. How much of an effect will be expected in our test scores?

5. My own teaching schedule -- Last year I taught two morning classes and two afternoon classes. This year, all my classes are in the morning. So, if you plan to use my students' test scores for evaluation, please consider this report (NYSUT, 1998) summarizing relevant studies: "Research indicates that many high school students do their best learning in the afternoon. One study found that afternoon reading instruction produced the greatest increase in reading scores as compared to morning instruction." Since half of my classes are now in the morning instead of the afternoon, please suggest a formula for the expected change in my effectiveness as measured on student tests. Also, I am teaching one new course this year compared to last, and it's a course that I haven't taught in the past five years. Am I expected to produce the same gains when teaching different courses? Will my "value-added" be compared to other teachers of this grade level, or other teachers of this course?

6. New tutorial period -- Our new school schedule will include 65 minutes per week for students to receive additional support. Some schools and districts in California (Whittier, Irvine, and many others) are finding that tutorial periods provide valuable academic support that has a positive impact on student performance. There is also some research support (Balfanz et al., 2002) for that idea. However, our students will be allowed to make their own choices from a wide variety of tutorial offerings each week.

Will the state fund any additional data analysis in order to see how my students used that extra tutorial time? If they tend to spend that time with less effective teachers, or teachers of other subjects, will test scores be adjusted prior to their inclusion in my evaluation? If students usually spend that time with me, will I need to produce even higher test scores, and if so, how much higher? What if students need to use tutorial time in ways that won't produce higher test scores? Do you agree that the pairing of tutorial choices for students and VAM-based evaluations for teachers creates a conflict of interest between me and my students?

7. New school data management system -- We'll be switching over to a new web-based program that should improve communication between school and home, regarding attendance and grades. That changeover will take up many hours of teachers' time, due in part to training, as well as a lag time as we adjust to the new system and transfer information from other systems. Many studies indicate that adequate planning time affects teacher performance, so this change will have a potentially negative effect on our staff. However, we should end up with improved communication with parents and caregivers, and school-home communication has a positive effect (Shirvani, 2007) on student performance. So, this new system will potentially lower and raise test scores. Will the state help us measure which effect is larger, and determine how that effect should be factored into value-added measurements?

8. New colleagues - According to a study (Kirabo and Bruegmann, 2009) published in American Economic Journal: Applied Economics, "a teacher’s students have larger achievement gains in math and reading when she has more effective colleagues (based on estimated value-added from an out-of-sample pre-period). Spillovers are strongest for less experienced teachers and persist over time, and historical peer quality explains away about 20 percent of the own-teacher effect, results that suggest peer learning." This year, I will be teaching with a different combination of teachers in my department and school. Prior to my next evaluation, will the state help fund the data collection and analysis necessary to determine which teachers are having a "spillover effect" on the others? It seems only fair.

Looking over that list of changes for this school year, I have many, many questions about what the next 10 months will be like on my campus. I can say from past experience that this is an unprecedented amount of change. (And I'm not even asking you to address changes for which I can't cite relevant studies, though it seems relevant that my classroom will be within one hundred feet of a new building under construction later this year.)

As you have certainly surmised, I am adamantly opposed to the idea of using state test scores in teacher evaluations. I have argued that point repeatedly in various ways, in various publications, but in this letter, I have focused on the entirely real situation in which I find myself this year. So, here's my final question for you: do you honestly believe that the combined effects of this much change can be measured?

We may disagree about the issues, and I expect that if you do disagree, you can support your position by answering my questions. If you cannot, then I hope you intend to come up with the answers and air them for public debate before enacting relevant policies. Any policies which fail to address these questions and which do not receive a proper hearing are sure to fail.

You have it in your power to exert great influence on students' lives, through your influence on my work, and the work of hundreds of thousands of my fellow California teachers.

You are also occupying an office entrusted to you to serve the common good for our state. You assumed that office more recently than I began teaching, and I will continue teaching when you have left that office. As a professional with the utmost commitment to my students and community, and the utmost desire to teach well, I request a reply from you that will be of some practical guidance in our shared mission to serve students. I trust that your reply will reflect your commitment to crafting and implementing wise policies that will actually work within complex realities of our schools.

David B. Cohen

NOTE: To date, none of the addressees has responded, and only one has acknowledged receiving this communication.


Thank you, Valerie, for posting this letter on The Answer Sheet. Looking back, I wish I'd added one other point. Some people might counter with the argument that VAM could be used to distinguish among teachers at the same school, making school-level changes less relevant in teacher comparisons; I would respond that you have to assume then that each change affects each teacher equally - a tough assumption to prove, and common sense suggests that it would be a faulty assumption.

Posted by: DavidBCohen | September 19, 2010 4:59 PM

Mr. Cohen,

Thank you for doing this. Although I don't live in California, the more people educated about this issue, the better.

Posted by: educationlover54 | September 19, 2010 5:15 PM

Great letter, David. I hope you get the responses it deserves, now that Valerie has published it.

Posted by: PLMichaelsArtist-at-Large | September 19, 2010 6:56 PM

Hi David- Glad you made it here to Washington! A few comments.

Because there are so many variables in teaching, and because we cannot randomly assign students to groups, academically speaking, we don't really know what makes a good teacher. We just have a public data base of knowledge that is widely accepted as "things good teachers do." Likewise, from a research perspective, we'll never be able to determine using tests the impact a teacher has on student learning.

The question, however, is whether or not testing can provide us any usable information about whether a teacher is effective, and whether or not it can be used to corroborate other evidence about what we believe to be quality teaching.

David, do you mean to suggest that Value Added Measurement gives us as much information about teaching quality as picking balls out of a hat (honest question)? Or do you mean to suggest because VAM's use as a measurement tool is in its infancy it shouldn't play a very large factor in teacher evaluation?

Posted by: Mccrabster | September 19, 2010 9:56 PM


I think you miss a major point.

Your letter seems to say "Using test scores to evaluate me is wrong because of (insert change at your school)".

What you should have said is that using test scores to evaluate you is wrong not because of changes, but because test scores do not measure teacher effectiveness.

There are dozens, if not hundreds of variables that affect a child's performance on state exams. The teacher is only one such variable, and it is statistically invalid to use such data as an evaluation tool.

The people who advocate for evaluations based upon test scores do so not because it makes schools better, but because it's politically expedient at the moment.

Teachers like you should point out the absurdity of such methods not by saying they don't work because of changes at your school, but because they aren't valid methods to begin with.

Posted by: william85 | September 19, 2010 10:13 PM

Mccrabster - I disagree that we can't figure out what makes an effective teacher. It's just that we must rely on a much more complete picture, far more than test scores, and we must be comfortable with the idea of not being able to "prove" a teacher's quality in the most scientific sense of that word. When you're dealing with non-quantifiable qualities in non-quantifiable contexts, that's the way it goes, and that's what too many people fail to realize. I just read somewhere (but didn't note where) that James Popham, one of the most established experts on testing, assessment and measurement, said that ultimately the most reliable form of teacher evaluation is probably the professional judgment of others in the field (presumably veteran educators working with some training and some standards by which to make a judgment). As for the question about whether test scores are better than random chance, I'm the wrong person to ask. I've tried very hard to argue above that it's not necessary to rely on my feelings when there's so much research backing up my position. It bothers me whenever I read reporting that says, "Well, some parents and administrators want to use VAM, while teachers and unions oppose the idea." It's not just that I oppose it, or teachers more broadly; if you look at the consensus of AERA, NCME, and APA, plus look at the error rates reported in a recent DOE study, plus look at the warnings sounded by the National Academies last year (re: Race to the Top), it seems like the research and science communities mostly agree it's a bad idea. I have no idea why VAM supporters keep getting a free pass from so many in government and media when they have nothing with which to answer that.

William85, I absolutely agree with you, and if you go over to my blog (using the links at the top of this post), you'll find that I've taken every possible angle criticizing VAM (based on state tests) for teacher evaluation. If you look at my list of blog posts you'll find a series of posts with titles asking "Do You Understand My Job?" "...My Students?" "...My School?" (all posted in April 2010). I've also taken the tack that you suggest in a couple of articles in Teacher Magazine. This just happens to be the strategy I took this time around.

Posted by: DavidBCohen | September 20, 2010 12:59 AM

David- The Economic Policy Institute concluded "A review of the technical evidence leads us to conclude that, although standardized test scores of students are one piece of information for school leaders to use to make judgments about teacher effectiveness, such scores should be only a part of an overall comprehensive evaluation."

I'll look more closely at the others you mention on your blog. But I thought that was the gist. It's not that it can't or shouldn't be used. It's that it's not reliable so shouldn't be used solely.

I agree with your measurement expert: because we cannot isolate variables in the classroom, the best thing to do is to use master teachers to observe. We use what's widely thought of as best practice despite not knowing that a certain methodology is statistically significantly different from another methodology, and that's ok.

Posted by: Mccrabster | September 20, 2010 5:49 AM

Brilliant. Thank you for this post/letter.

Posted by: vprecht | September 20, 2010 10:59 AM

I just think that it is a waste of time.

The letter is good, but the implication is that this is a useful way for educators, administrators or whoever to use their time and it isn't. Teachers have a classroom full of kids to teach.

I think this value-added thing is probably something that people who haven't taught in a classroom would agree with. Teachers know that each class is different.

All this emphasis on removing good teachers is counterproductive.

Posted by: celestun100 | September 20, 2010 11:45 AM


It is too bad that California is in such bad shape economically and they are trying to save money on teachers.

But why does California insist on this charade of school "reform" when we all know they just need to cut the budget?

Ridiculous. It would be cheaper to just lay people off honestly.

Posted by: celestun100 | September 20, 2010 11:51 AM

I read parts of the reports you cite, David, and it's funny but I think we read the same things and come to vastly different conclusions. These researchers do not appear to be diametrically opposed to the use of VAM. They simply say VAM should not be used to make high stakes decisions due to concerns about reliability. The researchers caution against overuse, not use. However, when used in conjunction with some of the ideas you discuss in your joint report over on ACT found at it seems to me we can corroborate our evidence to make more valid claims about teacher quality. Where observation helps capture whether a teacher "can" do it, VAM can help us to determine whether a teacher "does" do it. And that to me is an exciting combination.

Posted by: mmccabe4724 | September 20, 2010 3:47 PM

Just today I was thinking of the effects of the first two variables mentioned: principal and administrative support.

What can even an excellent teacher do without the permission of the principal and central administration to be autonomous? This was true before NCLB and it is even more true now.

My daughter suffered from educational neglect (a euphemism!) in 1998 because none of her teachers felt they were allowed to differentiate for her needs! Only if dictated from above were they willing to do something different for her.

Now, under NCLB the situation is the same, worse even because superintendents (condoned by their Boards of Education) now force teachers to uphold rigid schedules and use scripted Reading First literacy lessons.

How about allowing teachers flexibility to meet their students' educational needs?

Posted by: gpadvocate | September 20, 2010 7:07 PM

gpadvocate wrote: How about allowing teachers flexibility to meet their students' educational needs?

Now there's a thought! Why aren't we trusted to do what's best for the kids?

Posted by: musiclady | September 20, 2010 7:45 PM

I read the letter with amazement. The author has included all the excuses he can think of for not being able to improve the result of his teaching, and has offered no other method of evaluation. Until he, as a teacher, comes up with a method of evaluating teachers, he has to accept the method which is currently easily enforceable. In my opinion, the students should be quizzed every day on the subject which was taught the last time, and the teacher should use a computer to conduct the quiz and evaluate the comprehension of the students in the first ten minutes of the class. The student should have confidence in the teacher, and the teacher should not make fun of the students not doing well in evaluation. But he should take time out to praise the students who come back with good results. I have been a student of teachers of many cultural backgrounds; the best teachers I found were the British teachers, who had no other interest except in teaching their students and making them feel comfortable in repeatedly going over the concepts they were teaching. A British teacher is not a member of a board of any kind, does not have any private business, and loves and gets paid for teaching only.

Posted by: mahmood7438 | September 20, 2010 9:19 PM

Mahmood- the author presents detailed recommendations for elements of a comprehensive evaluation here:

Posted by: mmccabe4724 | September 20, 2010 10:52 PM

Did anyone notice that the SBE just adopted school library standards -- for the first time in CA history?

Now you have a chance to approach the 85% of CA schools without credentialed librarians and ask them what they are doing to meet state standards.

Maybe we have a chance to help kids learn -- by giving them the skills of lifelong learning -- and a place to practice those skills.

Posted by: richardguy1 | September 21, 2010 12:00 PM

Mahmood - please don't try to read too much into the letter. You are inferring motives that don't exist, and seem to be making assumptions about my situation that aren't true. I am not making excuses; I am citing research that, viewed in combination, thoroughly challenges assumptions about teachers, students, and test scores. Also, as another comment points out, I am engaged in efforts to provide solutions that will work. This letter aims to prevent the state from rushing into ill-conceived policies when they can't answer questions about the fatal flaws in their approach.

Mccrabster - I appreciate the dialogue! On the particular quote you pulled from EPI, I wrote in my blog that I think they should have come down harder; I disagree with the way they put this, but it's a matter of degree, not a matter of which side we come down on generally. Given the degree of uncertainty that they cite, I question why we should use VAM at all for individual teachers. If it's not reliable information, how do we set some arbitrary percentage that unreliable information is "worth" in a teacher evaluation? I acknowledge other measures of effectiveness will have reliability issues, and would oppose putting arbitrary percentages on any of them. The reason I favor other evaluation methods is that I think their challenges are more likely to be overcome. The issues with testing run too deep, which is why I've written so many blog posts and articles taking a piece at a time.

You're right to point out that the research community is not opposed to VAM overall, because researchers remain hopeful that the issues can be worked out, the tests can be improved, etc. I hardly expect researchers to close the door on research! However, until they come back with results and modify their consensus on the issue, I'm willing to shut VAM out from active consideration in evaluations, for all the reasons listed above, and in my other blog posts and articles in Teacher Magazine. Where I am more interested in test results is to make broad, systemic analyses - and even then, cautiously, non-punitively (since that doesn't work anyways), and with multiple measures for cross-reference. After all, that's what most state tests are designed for. You don't ask a kid nine multiple choice vocabulary questions drawn at random from the entire language in order to get a precise picture of a kid's vocabulary. Depending on your sampling, the same kid could score 100% one day and 55% the next day. Cizek (2007) reports that subtests on state tests are no better than a coin toss if used to diagnose a single student's skills. However, test 1,000 or 10,000 or 100,000 students, combine the results, and you might be getting a sense of what a student population knows. So, if a school or district introduces a new program and you see broad changes in tests, I think that's worth looking at. (Of course, in some years you see noticeable score fluctuations in the absence of any likely cause).

Posted by: DavidBCohen | September 22, 2010 4:58 PM

David, there are reliability concerns with all measurement tools, including, if not especially, teacher observations. It's why we should use multiple measures, including classroom observation and standardized testing. If multiple measures come to the same conclusion about effective teaching, we are more likely coming to the correct conclusion. Consider the administrator or master teacher who can say, "I am concerned about teacher x's methodology. As proof of that concern I not only have 4 formal observations, but I noticed that over the course of 3 years, teacher x's test scores in math put their effectiveness, after accounting for a number of variables (though not all), in the 9th percentile." Even if that 9th percentile is plus or minus 15, we have a piece of information that tells us something more about that teacher's effectiveness.

I am frustrated that on the one hand you have called on your readers to "consult the research" but on the other hand you call on them to come to different conclusions than those who conduct it.
I'm not convinced VAM and merit pay are the cure-all for our evils. I just don't buy the suggestion that because there are questions about its ability to improve teaching, it therefore shouldn't be used.

I like your ideas about using sub-tests to evaluate the strengths and weaknesses of areas of instruction at the district and school level.

Posted by: mmccabe4724 | September 25, 2010 1:12 PM

The comments to this entry are closed.


© 2011 The Washington Post Company