By SAM DILLON
How good is one teacher compared with another?
A growing number of school districts have adopted a system called value-added modeling to answer that question, provoking battles from Washington to Los Angeles — with some saying it is an effective method for increasing teacher accountability, and others arguing that it can give an inaccurate picture of teachers’ work.
The system calculates the value teachers add to their students’ achievement, based on changes in test scores from year to year and how the students perform compared with others in their grade.
Analysts who apply a few statistical assumptions to the data can produce a list ranking teachers from best to worst.
Use of value-added modeling is exploding nationwide. Hundreds of school systems, including those in Chicago, New York and Washington, are already using it to measure the performance of schools or teachers. Many more are expected to join them, partly because the Obama administration has prodded states and districts to develop more effective teacher-evaluation systems than traditional classroom observation by administrators.
Though the value-added method is often used to help educators improve their classroom teaching, it has also been a factor in deciding who receives bonuses, how much they are and even who gets fired.
Michelle A. Rhee, the schools chancellor in Washington, fired about 25 teachers this summer after they rated poorly in evaluations based in part on a value-added analysis of scores.
And 6,000 elementary school teachers in Los Angeles have found themselves under scrutiny this summer after The Los Angeles Times published a series of articles about their performance, including a searchable database on its Web site that rates them from least effective to most effective. The teachers’ union has protested, urging a boycott of the paper.
Education Secretary Arne Duncan weighed in to support the newspaper’s work, calling it an exercise in healthy transparency. In a speech last week, though, he qualified that support, noting that he had never released to news media similar information on teachers when he was the Chicago schools superintendent.
“There are real issues and competing priorities and values that we must work through together — balancing transparency, privacy, fairness and respect for teachers,” Mr. Duncan said. On The Los Angeles Times’s publication of the teacher data, he added, “I don’t advocate that approach for other districts.”
A report released this month by several education researchers warned that the value-added methodology can be unreliable.
“If these teachers were measured in a different year, or a different model were used, the rankings might bounce around quite a bit,” said Edward Haertel, a Stanford professor who was a co-author of the report. “People are going to treat these scores as if they were reflections on the effectiveness of the teachers without any appreciation of how unstable they are.”
Other experts disagree.
William L. Sanders, a senior research manager at SAS, a North Carolina company that produces value-added estimates for districts in North Carolina, Tennessee and other states, said that “if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers.”
Dr. Sanders helped develop value-added methods to evaluate teachers in Tennessee in the 1990s. Their use spread after the 2002 No Child Left Behind law required states to test students in grades three through eight every year, giving school districts the mountains of test data that are the raw material for value-added analysis.
In value-added modeling, researchers use students’ scores on state tests administered at the end of third grade, for instance, to predict how they are likely to score on state tests at the end of fourth grade.
A student whose third-grade scores were higher than 60 percent of peers statewide is predicted to score higher than 60 percent of fourth graders a year later.
If, when actually taking the state tests at the end of fourth grade, the student scores higher than 70 percent of fourth graders, the leap in achievement represents the value the fourth-grade teacher added.
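The arithmetic behind that example can be reduced to a simple percentile comparison. The sketch below is a minimal, hypothetical illustration of the idea; all the numbers are invented, and real value-added models are regression-based and control for many additional factors:

```python
# Minimal sketch of the percentile arithmetic described above.
# All numbers are hypothetical; production value-added models are
# regression-based and far more elaborate.

def percentile_rank(score, cohort_scores):
    """Fraction of the cohort scoring strictly below the given score."""
    below = sum(1 for s in cohort_scores if s < score)
    return below / len(cohort_scores)

# Made-up statewide score distributions for two successive years.
third_grade_cohort = [55, 60, 62, 70, 71, 75, 80, 85, 90, 95]
fourth_grade_cohort = [50, 58, 63, 69, 72, 74, 79, 84, 91, 94]

student_third_grade_score = 80   # beats 60 percent of third graders
student_fourth_grade_score = 84  # beats 70 percent of fourth graders

predicted = percentile_rank(student_third_grade_score, third_grade_cohort)
actual = percentile_rank(student_fourth_grade_score, fourth_grade_cohort)

# The jump from predicted to actual standing is, in this simplified
# view, the value attributed to the fourth-grade teacher.
print(f"predicted: {predicted:.0%}, actual: {actual:.0%}, "
      f"value added: {actual - predicted:+.0%}")
```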
Even critics acknowledge that the method can be more accurate for rating schools than the system now required by federal law, which compares test scores of succeeding classes, for instance this year’s fifth graders with last year’s fifth graders.
But when the method is used to evaluate individual teachers, many factors can lead to inaccuracies. Different people crunching the numbers can get different results, said Douglas N. Harris, an education professor at the University of Wisconsin, Madison. For example, two analysts might rank teachers in a district differently if one analyst took into account certain student characteristics, like which students were eligible for free lunch, and the other did not.
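A toy example makes Professor Harris’s point concrete. In the sketch below, every record is invented: one analyst ranks two hypothetical teachers by raw average score gains, while a second first adjusts each gain for the average gain of the student’s free-lunch group, and the two rankings come out in opposite order:

```python
# Toy illustration: the same data, two model specifications, two rankings.
# Every number here is invented for the example.
records = [
    # (teacher, eligible_for_free_lunch, score_gain)
    ("Teacher A", True, 4), ("Teacher A", True, 5), ("Teacher A", False, 9),
    ("Teacher B", False, 7), ("Teacher B", False, 8), ("Teacher B", True, 4),
]
teachers = sorted({t for t, _, _ in records})

def mean(xs):
    return sum(xs) / len(xs)

# Analyst 1 ranks teachers by raw average score gain.
def raw_effect(teacher):
    return mean([g for t, _, g in records if t == teacher])

# Analyst 2 first subtracts each free-lunch group's average gain,
# then averages the adjusted gains by teacher.
group_mean = {flag: mean([g for _, f, g in records if f == flag])
              for flag in (True, False)}

def adjusted_effect(teacher):
    return mean([g - group_mean[f] for t, f, g in records if t == teacher])

for label, effect in [("raw", raw_effect), ("adjusted", adjusted_effect)]:
    ranking = sorted(teachers, key=effect, reverse=True)
    print(label, ranking)  # the two rankings come out in opposite order
```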
Millions of students change classes or schools each year, so teachers can end up being evaluated on the performance of students they taught only briefly after the students’ records were linked to them in the fall.
In many schools, students receive instruction from multiple teachers, or from after-school tutors, making it difficult to attribute learning gains to a specific instructor. Another problem is known as the ceiling effect. Advanced students can score so highly one year that standardized state tests are not sensitive enough to measure their learning gains a year later.
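The ceiling effect is easy to see in a small sketch. The one below uses a hypothetical test capped at 100 points; a high achiever’s real growth mostly disappears from the measured data:

```python
# Sketch of the ceiling effect: observed scores are capped at the test
# maximum, so a high achiever's real growth can vanish from the data.
TEST_MAX = 100  # hypothetical top score on the state test

def observed(true_achievement):
    return min(true_achievement, TEST_MAX)

year1_true, year2_true = 98, 115  # the student actually kept learning
measured_gain = observed(year2_true) - observed(year1_true)
print(measured_gain)  # 2 -- most of the 17-point true gain is invisible
```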
In Houston, a district that uses value-added methods to allocate teacher bonuses, Darilyn Krieger said she had seen the ceiling effect as a physics teacher at Carnegie Vanguard High School.
“My kids come in at a very high level of competence,” Ms. Krieger said.
After she teaches them for a year, most score highly on a state science test but show little measured gain, so her bonus is often small compared with those of other teachers, she said.
The Houston Chronicle reports teacher bonuses each year in a database, and readers view the size of the bonus as an indicator of teacher effectiveness, Ms. Krieger said.
“I have students in class ask me why I didn’t earn a higher bonus,” Ms. Krieger said. “I say: ‘Because the system decided I wasn’t doing a good enough job. But the system is flawed.’ ”
This year, the federal Department of Education’s own research arm warned in a study that value-added estimates “are subject to a considerable degree of random error.”
And last October, the Board on Testing and Assessments of the National Academies, a panel of 13 researchers led by Dr. Haertel, wrote to Mr. Duncan warning of “significant concerns” that the Race to the Top grant competition was placing “too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals.”
“Value-added methodologies should be used only after careful consideration of their appropriateness for the data that are available, and if used, should be subjected to rigorous evaluation,” the panel wrote. “At present, the best use of VAM techniques is in closely studied pilot projects.”
Despite those warnings, the Department of Education made states with laws prohibiting linkages between student data and teachers ineligible to compete in Race to the Top, and it designed its scoring system to reward states that use value-added calculations in teacher evaluations.
“I’m uncomfortable with how fast a number of states are moving to develop teacher-evaluation systems that will make important decisions about teachers based on value-added results,” said Robert L. Linn, a testing expert who is an emeritus professor at the University of Colorado, Boulder.
“They haven’t taken caution into account as much as they need to,” Professor Linn said.