Too many curricular aims create assessment problems
My guest is Kenneth J. Bernstein, an award-winning veteran teacher in the Washington D.C. area. He blogs at http://teacherken.dailykos.com/.
By Kenneth Bernstein
One characteristic of the curriculum for many courses of study in American public schools is that we try to cover far too much.
Let me offer some words from W. James Popham, now emeritus at UCLA and one of the nation’s acknowledged experts on assessment and evaluation.
In his 2009 book, Instruction That Measures Up: Successful Teaching in the Age of Accountability, Popham offers some concerns about the number of curricular aims we seek to assess. He found one case of 35 separate performance expectations in one 7th grade math program, and a total of 299 potentially assessable curricular aims in a 4th grade program. As he notes (on p. 56):
"It is simply not possible for teachers to teach all of these curricular aims during a single school year -- or rather, it’s not possible to teach them with the kind of rigor likely to foster deep and lasting learning."
He also cautions (on p. 61):
"It is far better for students to master a modest number of truly potent, large-grain curricular aims than it is for them to superficially touch on a galaxy of smaller-grain curricular aims."
Unfortunately, too much of our current approach to assessment presumes the galaxy of smaller-grain curricular aims. This creates some real problems.
Presume that, in total, there are 1,000 questions that could be asked about a subject, but the test can ask only 100. We are sampling the possible universe of what a student can demonstrate knowing about that subject, and like any sample, it carries measurement error.
Let’s consider two hypothetical students, one who knows very little, and one who knows the vast majority of the material. The first gets all 100 questions right, but those are the only questions the student knows. The second misses all 100, which are the only ones not part of the student’s knowledge.
I recognize that neither of these scenarios is likely. But bear with me.
Let’s suppose we are justifiably surprised by these results. So we later give another test, with completely different questions. This time the first student gets them all wrong and the second gets them all right. Neither pair of scores is an accurate reflection of the underlying knowledge, but we now have two data points for each student.
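The sampling problem behind this thought experiment can be sketched in a few lines of Python. The numbers here are pure assumptions for illustration: suppose a student truly knows 600 of the 1,000 possible questions (a "true score" of 60 percent), and each test samples 100 of them at random. Repeating the test shows how much the observed score drifts around the underlying knowledge.

```python
import random

random.seed(42)

# Hypothetical setup: 1,000 possible questions, the student truly
# knows 600 of them, and each test samples 100 questions at random.
UNIVERSE = 1000
KNOWN = 600          # assumed size of this student's knowledge
TEST_LENGTH = 100

questions = list(range(UNIVERSE))
known = set(range(KNOWN))   # the student knows questions 0..599

def observed_score(rng=random):
    """Percent correct on one randomly sampled 100-question test."""
    sample = rng.sample(questions, TEST_LENGTH)
    return 100 * sum(q in known for q in sample) / TEST_LENGTH

# Simulate many independent administrations of the test.
scores = [observed_score() for _ in range(1000)]
print(min(scores), max(scores))  # observed scores spread well around the true 60
```

No single administration is guaranteed to land on 60; some land well above it, some well below, purely from which questions happened to be asked. That spread is the measurement error the extreme two-student example exaggerates.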
There’s the first problem: Too often we use a single set of scores in a fashion that ignores the possible measurement error. And if we combine two sets taken at different times, what appears to be increased or decreased performance may reflect nothing more than regression to the mean -- a performance closer to the student’s underlying knowledge -- rather than any real measurement of learning, or, for that matter, of the “value added” by the teacher’s instruction.
The problem gets worse the larger the number of indicators we seek to assess.
Let’s consider the more benign of the two examples I cited from Popham: 35 separate indicators to be assessed for 7th grade math. Let’s presume that we are prepared to give a test with 70 questions, two for each indicator. We still risk distorted results because of the sampling error for each indicator based on the particular questions used.
We hope and expect that with 35 indicators, that error will even out and give us an overall measure that is accurate. We hope it, but we cannot know for certain. A 70-question test for 7th graders probably requires at least 90 minutes, which is already an extended time for a student of that age to sit still at one task -- something else that can affect the score.
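A short, hypothetical simulation shows why per-indicator results are so coarse even if the overall average behaves. Assume (purely for illustration) that a student has truly mastered 70 percent of the content under every indicator, so each question is answered correctly with probability 0.7. With only two questions per indicator, each indicator can only be scored 0, 50, or 100 percent:

```python
import random

random.seed(1)

N_INDICATORS = 35
QUESTIONS_PER = 2     # two test questions per indicator, 70 total

# Assumed true mastery rate, identical across indicators, for illustration.
TRUE_MASTERY = 0.7

per_indicator = []
for _ in range(N_INDICATORS):
    correct = sum(random.random() < TRUE_MASTERY for _ in range(QUESTIONS_PER))
    per_indicator.append(100 * correct / QUESTIONS_PER)  # can only be 0, 50, or 100

overall = sum(per_indicator) / N_INDICATORS
print(per_indicator[:5])   # individual indicator "scores" are crude
print(round(overall, 1))   # the 35-indicator average sits nearer the true 70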
Now imagine that there are almost 300 indicators for that fourth grader. Even with only one question per indicator, you cannot assess all of them in a single test sitting. The results obtained may therefore be distorted in the same way we saw skewed and inaccurate results for my two hypothetical students.
We have all taken tests. They give us hard numbers. We have a preference in our society for hard numbers, because we like to compare. Data, if it is accurate, if we understand what it represents, can inform us. This is true in school as well as elsewhere.
But what about when the data has limitations? How much weight should we be putting on it? How often do we remember that all sampling carries with it some rate of error?
In a recent piece on D.C. Schools Chancellor Michelle Rhee, Valerie Strauss wrote about the D.C. teacher evaluation system called IMPACT: “Twenty-two measures have to be displayed in 30 minutes? It’s idiotic.” If that is idiotic, what word do we use to describe a single test that attempts to measure 35 or more indicators, with two or three questions per indicator, in a single sitting? How is that an accurate representation of anything beyond what that student does on that test in that sitting?
Each additional statistical manipulation we do with data potentially increases the margin of error. Value-added methodologies do not yet solve these problems, which is why professionals urge caution about using them for high-stakes purposes.
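One concrete way error compounds: a “growth” or value-added number is typically built by subtracting one noisy score from another, and independent errors add in variance. A minimal sketch, with an entirely assumed true score and standard error of measurement:

```python
import random
import statistics

random.seed(7)

# Two independent test administrations, each with measurement error.
# TRUE_SCORE and ERROR_SD are assumptions chosen only for illustration.
TRUE_SCORE = 60.0
ERROR_SD = 5.0       # assumed standard error of a single test score

def noisy_score(rng=random):
    """One observed score: the true score plus random measurement error."""
    return TRUE_SCORE + rng.gauss(0, ERROR_SD)

pre = [noisy_score() for _ in range(10_000)]
post = [noisy_score() for _ in range(10_000)]
gains = [b - a for a, b in zip(pre, post)]

print(round(statistics.stdev(pre), 1))    # ~5: error of one test
print(round(statistics.stdev(gains), 1))  # ~7: sqrt(5**2 + 5**2), larger than either test alone
```

The gain score inherits the error of both tests, so it is noisier than either score on its own -- before any further adjustments a value-added model layers on top.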
The recent report from the Economic Policy Institute titled, “Problems with the use of student test scores to evaluate teachers,” says: “A review of the technical evidence leads us to conclude that, although standardized test scores of students are one piece of information for school leaders to use to make judgments about teacher effectiveness, such scores should be only a part of an overall comprehensive evaluation.”
I would argue that is true for all data obtained from tests. It provides some information. We should examine it and use it to inform our decision making, including about instruction. But we should remain aware of its limitations. We should not make decisions that cannot be justified by test results alone, nor give those results weight out of proportion to their accuracy. That should apply to how we apply test scores to students as well as how we use that information to evaluate teachers.
Those who insist on ignoring the appropriate cautions? Well, perhaps Valerie’s words are appropriate: It’s idiotic.
| September 21, 2010; 11:00 AM ET