Valerie Strauss v. me on tests

I didn't intend to pick a fight with my blogging wonder of a colleague Valerie Strauss, but she wouldn't let go of the issue. (At least that is what I would tell my mom if this were the playground and not the world's finest Web site.) Valerie says the standardized tests we use now are too unreliable to tolerate. I don't like them that much myself, but I still think they are useful, and don't see Valerie providing any evidence on her side.

The standardized test results I have seen over the last 30 years seem to conform with what I would expect from what I know of the quality of the teaching and the socio-economic level of the students being tested. McLean High in that wealthy community has higher test scores than Annandale High, which is in a less expensive part of Fairfax County with fewer parents who have graduated from college. Banneker High in D.C., which has a selective student body, does better on tests than Ballou, which does not.

Schools that have taken unusual measures to deepen and invigorate the learning of impoverished children, such as Achievement First, Uncommon Schools and KIPP, show significantly better scores than schools that have not.

Those examples, and hundreds more, convince me that the tests we are using are much better than nothing, and shouldn't be dismissed as not worth using any more.

I would prefer more tests that required significant writing, like the AP and IB exams. I think they encourage better teaching. I like Richard Rothstein's suggestion that we experiment with school inspectors as they have in England, because I think well trained educators could give us a much deeper measure of different schools. But I have no idea if they would produce results any more useful than the ones we have now, if applied to all schools, because we haven't done much research on them yet. Using AP and IB as measures of test participation is a very good way to see which schools are trying hardest to challenge students, but that is not the same as looking at the test results to determine the quality of teaching.

In Vermont, they looked closely at results from measuring schools by the quality of student portfolios and found little difference from the results obtained through standardized tests. So I think it would be great if Valerie told us more of what she had in mind. What assessments does she want to use to replace the ones we have, and why does she think they would be better?

Also, if they are going to cost a lot more, might it be better to use that extra money to pay teachers more, and wait until we find assessments that don't break the budget?

By Jay Mathews  | March 15, 2010; 6:05 PM ET
Categories:  Jay on the Web  | Tags:  AP, IB, Valerie Strauss v. Jay Mathews, Vermont research, are standardized tests useless  


Disclaimer: I am a big fan of Rothstein as I think he makes a compelling case for the danger of believing that low level standardized tests 'get' us much in the way of meaningful data. So, here are a few questions...

1. Why do we need national, high stakes tests?

2. What is the value added/detriment to a student's learning when teachers are pushed to focus heavily on tested areas and ignore other curricular, non-tested subjects?

3. Who should schools be accountable to? I, personally, am a fan of schools being directly responsible to their local communities and the state. Being held to one national standard is flawed on so many levels due to variable funding structures and regional differences.

4. If we want innovators, the creative class, and a nation of big thinkers... how do these low level, multiple choice tests move us in that direction?

5. Why do we talk about education in the US as if it is one entity? I have taught in four states, in different settings and I can tell you that education in America is not one entity. Urban, suburban, rural, west, midwest, east coast, bible belt... these regions are not the same. at all. Talking about them as if there is some magical umbrella term that captures them all is not productive.

We need to move past this fascination with the blame the teacher, test the kids, carrot, stick, carrot, stick... game that is being played. Learning is a complicated process; let's be humble enough to realize that the measurement of that learning is complicated as well. 'We' need 'better' education, that I can agree on, but the manner in which testing companies are measuring that 'better' makes me seriously concerned.

We need to move past what is easily measurable and onto what is meaningful. That is the learning I want to be a part of...

Posted by: dlaufenberg | March 15, 2010 8:18 PM | Report abuse

"I would prefer more tests that required significant writing, like the AP and IB exams. I think they encourage better teaching"

I think you'll find many test developers would too; the major issue is cost. A multi-choice test can be machine graded for pennies. An essay needs to be graded at least twice by two raters, who need to be trained; then you need procedures for dealing with strict or lenient raters, not to mention inconsistent raters, etc. It's too expensive for most purposes. On top of that, high-stakes tests have to stand up to legal scrutiny, so test developers are reluctant to take any risks with new ideas.

Many people assume that tests have "washback" on classroom teaching, but the amount of research is pretty limited. Washback seems to be as much an assumption based on anecdotes and "common sense" as something demonstrated by empirical evidence.

One key thing seems to be the perceived stakes of the test: washback is unlikely if stakeholders don't believe the test matters. If standardized multi-choice tests remain as high-stakes assessments of teacher and school performance, then it's not obvious that token writing assessments will make a huge difference.

A viable solution is to use a mixture of teacher developed and administered classroom assessments and standardized measures. Equating of teacher assessments with standardized tests is conceptually fairly simple (using basically the same procedures employed to equate AP writing raters), but would require a substantial amount of resources and time to implement. As the old saying goes, "Quick, cheap, or good, pick any two."

Posted by: Trev1 | March 15, 2010 8:34 PM | Report abuse

One more thing Jay, Valerie Strauss doesn't talk about test reliability.

"Reliable" in psychometric speak has a clearly defined meaning, basically that if you administer the same test to the same persons, they will rank-order the same. Improving test reliability is easy- you make the test longer. Standardized multiple-choice tests are extremely reliable in this technical sense because large numbers of questions can be administered and graded very cheaply. Whether they are valid measures of whatever they are claimed to measure is a different matter.

Posted by: Trev1 | March 15, 2010 8:47 PM | Report abuse
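
Trev1's point that a longer test is a more reliable one can be put in numbers. A minimal sketch, assuming the Spearman-Brown prediction formula (a standard psychometric result that the comment does not name) and a hypothetical starting reliability of 0.70 chosen only for illustration:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test by `length_factor`.

    Assumes the added questions are comparable in quality to the existing
    ones. A `length_factor` of 2 means the test is doubled in length.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical example: a test with reliability 0.70, lengthened 1x, 2x and 4x.
for factor in (1, 2, 4):
    print(factor, round(spearman_brown(0.70, factor), 3))
```

Under these assumptions, doubling the test takes the predicted reliability from 0.70 to roughly 0.82. The formula says nothing about validity, which is the separate question Trev1 raises.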

I haven't seen any reaction to the common core standards released for comment last week.

Posted by: caxtontype1 | March 15, 2010 10:15 PM | Report abuse

I was all set to get on my high horse and rant. Then I read dlaufenberg's comment. There is nothing left to say.

Posted by: Jenny04 | March 16, 2010 6:25 AM | Report abuse

Tests are great, if the data is actually used properly. Are there any studies on KIPP et al. that look at the scores of the kids they graduate and compare them to non-charter graduates CONTROLLING for non-teaching-related test score predictors such as test scores on entry, sex, race, family income, etc.?

Posted by: qaz1231 | March 16, 2010 11:23 AM | Report abuse

Good question, qaz1231: I reported on the most recent of such studies of one KIPP school on this blog not too long ago. Here is the link:

It uses the randomized comparison approach--which social scientists consider one of their most effective tools--that is also at the heart of a much bigger study of several KIPP schools by Mathematica Policy Research Inc. Its final report is due in a few years, but a preliminary report may appear this year. Instead of painstakingly trying to find a control group of non-KIPP kids by matching them to KIPP kids in as many characteristics as possible, this approach uses random assignment--you let chance divide a large group into two smaller groups and see how they perform in different conditions. In the case of charter schools like KIPP, which select students randomly when they have more applicants than spaces, this supposedly gives you two well-matched groups, the kids that won the lottery and are in KIPP and the kids that didn't and aren't in KIPP. This is particularly good for capturing in both groups the parental motivation for a better education that some experts have suggested explains why KIPP kids do so well. You look at them a few years later and see who has higher achievement, and can presume that the difference had to be caused by their different schooling experiences since the two groups are otherwise so much alike. Of course there are critics of this approach too, but it seems to be the best we can come up with so far.

Posted by: Jay Mathews | March 16, 2010 11:50 AM | Report abuse

dlaufenberg's excellent question can be best answered, I think, by looking back at what caused us to have standardized tests rating schools in the first place. State education departments have been fiddling with these things for a long time. Tom Loveless found such tests in California as long ago as 1962. The instinct then, as now, was to try to quantify how schools were doing in teaching things the state's customers--voters and taxpayers--wanted taught, like reading, math and writing, so they would have confidence in the schools and keep voting for school budgets or legislators who favored strong school budgets. It was also useful in finding schools that were doing very poorly compared to similar schools, and helping them fix what was causing them to fall behind.

But this didn't become a big deal until the 1980s, when the Nation at Risk report said public education was not doing so well, a reflection of the first big publicity given to a decline in SAT scores in a New York Times story in 1975. The people who were most concerned about this were southern governors, particularly Jim Hunt in NC, Dick Riley in SC and Bill Clinton in AR. They were desperate to attract more big companies to their states to get more jobs and more revenues that would allow them to have better economies. They asked the companies about this and were told they didn't want to come because southern schools were so much worse than in the rest of the country, meaning that it would be harder to recruit good workers and to persuade their executives to move to places where their kids would not be well prepared for college. So those governors, and many others, adopted basic skills tests that were used to rate schools and to motivate everyone to make those schools better. That led to the standards movement which led to No Child Left Behind and here we are. As I said in this piece, I think they were responding to a very strong view among voters that schools should be accountable this way. Many people subscribe to the view you are advancing that we really don't need this, but if you ever try to run for governor on that platform, you are going to lose badly. I wouldn't vote for you. I think we all need motivators, and this does, on balance, despite all the problems, push us toward having better schools for our kids, in my view.

Posted by: Jay Mathews | March 16, 2010 12:04 PM | Report abuse

"Valerie says the standardized tests we use now are too unreliable to tolerate. I don't like them that much myself,"

Could someone explain what is going on, since standardized tests in this nation are as scarce as an animal that is becoming extinct?

There are the SAT type tests and the national tests for the 4th and 8th grade that are given every two years.

NCLB stated that states would have to create their own standardized tests. There was no recognition that the only really valid tests would be national tests, which would avoid the problem of state tests being either too simple or too hard.

Imagine that you walk into a hospital in Tennessee and the standard test for TB is different from the standard test for TB in Virginia.

Well currently without national standard tests this is the situation. Apparently reading and math are different in each state. A cat may have 9 lives but has 50 different meanings in the United States.

Valerie must win this argument since the absence of national tests makes standardized tests too unreliable.

"NCES Finds States Lowered 'Proficiency' Bar
Their results suggest that between 2005 and 2007, various states made their standards less rigorous in one or more grade levels or subjects in at least 26 instances. In 12 instances, particular states appeared to make their standards more stringent in one or more..."

It is over a year since the only real standardized tests of public schools were given in February 2009 by the government and the results for the 4th grade and 8th grade tests in Reading are still not available.

Posted by: bsallamack | March 16, 2010 1:40 PM | Report abuse

"The standardized test results I have seen over the last 30 years..."
The only standardized tests available for the last 30 years has been the SAT exam. This is the standardized test for students applying to college.

These tests reveal absolutely nothing in regard to public school education for students who do not take this test. Many students that take these exams went to private schools.

State and local tests are meaningless since many Americans move every year. Only national tests can provide a true picture of public education in the United States.

Posted by: bsallamack | March 16, 2010 2:02 PM | Report abuse

Jay: thanks for the link. Now, having read your earlier post and the paper it links, I'm impressed but still not entirely enamoured of this approach. First off, I can't figure out exactly how many students were in each group: you say "about 200", the authors say "54% of 457 applicants enrolled" and Table 1 says 285 enrolled. It's important because the demographic comparisons should be 1) Lottery winners who enrolled vs 2) Lottery losers. And if the effect sizes are weighted by cohort, so too should be the demographic comparisons. Table 1 should also include sex of applicant/enrollee.

Perhaps even more problematic is the dropout issue. The authors say and you quote: "the results show no difference in switching [schools] between winners and losers. . .This weighs against the view that exit from KIPP matters for the achievement gains reported here."

But this is horribly insufficient, because it assumes that the reasons that KIPP and non-KIPP students have for switching schools are the same, whereas common sense would suggest that KIPP students switch out mostly when they can't hack the demanding program, whereas your typical public school kid switches schools for more logistical reasons.

Posted by: qaz1231 | March 16, 2010 2:45 PM | Report abuse


The next question then follows: if we have been chasing this thing since the 80s and still find ourselves trying to 'save' American education, maybe we are going about it incorrectly. The standards/testing craze has been in fast forward for the better part of a decade and we are still having the same conversation.

More of the same, i.e., national testing, isn't going to all of a sudden make things better. I propose that perhaps... we aren't asking the right questions or framing the problem correctly. School should not be what it was, measured the way it was, conducted the same... because we are in a completely different educational landscape. Trying to hearken back to some golden age of easily measured learning doesn't make it so.

Here's a thought... although I know as of late (Newsweek, and the like) this will come off as ridiculous since the 'teachers' are the problem... but, maybe we should trust the teachers in the schools to assess student work and when the community feels that the teachers aren't doing their jobs... the parents bring their concerns to the school board. This is the process that we originally trusted and that pulled us through the most prosperous years in history. Removing the accountability for student progress from the community is a mistake. Distancing the community from the responsibility of educating their children is troubling.

Even if we move in the direction of national testing and standards... the antics, gaming, cheating will continue. Let's stop playing games with numbers, pushing useless data around like broccoli on the plate, and get real about the types of schools we need for our children. I suggest that these schools are not about the multiple choice test and are not measured (well) by the cheapest testing structure available.

Posted by: dlaufenberg | March 16, 2010 6:37 PM | Report abuse

"The only standardized tests available for the last 30 years has been the SAT exam. This is the standardized test for students applying to college."

Results from different states or countries can be compared through a variety of "test equating" processes. Basically, if there are enough students who took both tests, statistical comparisons of the entire populations of students who took either test can be made. If not enough persons took both tests, a third "equating" test can be administered to samples of people who took either test and this can be used to link the two operational tests. SAT results could easily be used to equate two local tests. This is routine stuff for psychometricians and researchers.

Although conceptually simple, there are many technical difficulties with equating, and a very extensive technical literature on it. This doesn't mean the process isn't valid for some purposes, but it does mean that you need to be very cautious about interpreting the results. The PISA data is an excellent example of this.

Posted by: Trev1 | March 17, 2010 12:04 AM | Report abuse
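
The equating idea Trev1 describes can be shown with a toy case. A minimal sketch, assuming the simplest variant (linear equating on one group of students who took both tests), which is only one of several methods and is not specified in the comment; the scores below are hypothetical and invented purely for illustration:

```python
from statistics import mean, pstdev


def linear_equate(score_x, scores_x, scores_y):
    """Map a score on test X onto the scale of test Y.

    `scores_x` and `scores_y` are results from the same group of students
    on both tests. The mapped score sits the same number of standard
    deviations from the mean on Y as `score_x` does on X.
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("scores on test X have no spread; cannot equate")
    return mu_y + (sd_y / sd_x) * (score_x - mu_x)


# Hypothetical scores for three students who took both state tests.
state_a = [10, 20, 30]
state_b = [40, 60, 80]
print(linear_equate(25, state_a, state_b))
```

With these made-up numbers, a 25 on the first test maps to about a 70 on the second. Real equating work involves far larger samples and the technical cautions Trev1 mentions.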

"The standards/testing craze has been in fast forward for the better part of a decade and we are still having the same conversation."
Posted by: dlaufenberg | March 16, 2010 6:37 PM

Yes it is a craze. Testing will not improve learning.

As for national standardized testing, this will not improve education. The only purpose of this testing is that it may provide insight into problems and it can be an effective method of evaluating changes.

The national tests of 2009 show some improvement in teaching math to 4th and 8th graders. When released, the national tests of 2009 in reading will probably show that reading scores have decreased. Perhaps with this evidence the nation will understand that if you spend so much time teaching to the test in math, this will have an effect on reading, where it is impossible to teach to the test.

Contrary to what the politicians say, testing by itself does not improve education.

Public education in this nation has degraded greatly. How else to explain the absurd ideas that are firmly believed regarding education?

Posted by: bsallamack | March 17, 2010 1:38 PM | Report abuse

For general information.

In Britain:
"In our drive to serve our children well and raise standards, it is essential that teachers and parents know what their children have already mastered and what their future learning needs are."

All five-year-olds will be tested during their first half-term at school from September 1998, under plans announced by the Government yesterday.

Now imagine if the United States adopted this policy and tested all children entering public schools.

Instead of talking about imaginary achievement gaps because of teachers not doing their jobs, the nation would understand that there are differences in children and the blame cannot be simply imposed upon teachers.

But the politicians would never allow this since it is far easier to pretend that the teachers are the problem.

Posted by: bsallamack | March 17, 2010 2:02 PM | Report abuse

The comments to this entry are closed.

© 2010 The Washington Post Company