Unspinning Data on New Jersey Charter Schools

Today’s (okay…yesterday… I got caught up in a few other things) New Jersey headlines once again touted the supposed successes of New Jersey Charter Schools:

http://www.nj.com/news/index.ssf/2011/01/gov_christie_releases_study_sh.html

The Star Ledger reporters, among others, were essentially reiterating the information provided them by the New Jersey Department of Education. Here’s their story.

http://www.state.nj.us/education/news/2011/0118chart.htm

And here’s a choice quote from the press release:

“These charter schools are living proof that a firm dedication to students and a commitment to best education practices will result in high student achievement in some of New Jersey’s lowest-income areas,” said Carlos Perez, chief executive officer of the New Jersey Charter School Association. He pointed to NJASK data for third grade Language Arts, where more than half the charters outperformed the schools in their home districts, and of those, more than 75 percent were located in former Abbott districts.

No spin there. Right? Just a balanced summary of achievement data, with thoughtful interpretation of what they might actually mean. Not really.

There are many, many reasons why the comparisons released yesterday are deeply problematic, and well, quite honestly, pretty darn meaningless. I could not have said it better than Matt DiCarlo of Shanker Blog did here:

“Unfortunately, however, the analysis could barely pass muster if submitted by a student in one of the state’s high school math classes (charter or regular public).”

Here are some guidelines I have posted in the past, regarding appropriate ways to compare New Jersey Charter Schools to their host districts on various measures including outcome measures:

  1. When comparing across schools within a poor urban setting, compare on the basis of free lunch, not free or reduced lunch, so as to pick up variation across schools. The reduced lunch income threshold is too high to pick up that variation.
  2. When comparing free lunch rates across schools, either a) compare against individual schools and nearest schools, or b) compare against district averages by GRADE LEVEL. Subsidized lunch rates decline in higher grade levels (for many reasons, to be discussed later). Most charter schools serve elementary and/or middle grades. As such they should be compared to traditional public schools of the same grade level. High school students bring district averages down.
  3. When comparing test score outcomes using NJ report card data, be sure to compare General Test Takers, not Total Test Takers. Total Test Takers include scores/pass rates for children with disabilities. But, as we have seen time and time again, in charts above, Charters tend not to serve these students. Therefore, it is best to exclude scores of these students from both the Charter Schools and Traditional Public Schools.

Today’s (okay, yesterday – publication lag) primary violation involves #3 above, but also relates to the first two basic rules. Let’s do a quick walk through, using the 2009 data, because the 2010 school level school reports data are not yet posted on the NJDOE web site. The bottom line is that it is relatively meaningless to simply compare raw scores or proficiency rates of charter schools to host district schools – as done by NJDOE and the Star Ledger. That is, it is meaningless unless they actually serve similar student populations, which they do not.

Below, I walk through a few quick examples of student population differences in Newark, home to the state’s high-flying charter schools (North Star Academy and Robert Treat Academy). Next, I construct a statistical model of school performance including New Jersey Charter schools and traditional public schools in their host district, controlling for student demographics and location. I first used this same model here: Searching for Superguy in New Jersey. I use that model to show adjusted performance comparisons on a few of the tests, and then I use a variation of that model to test the proficiency rate difference – on average statewide – between charter schools and schools in the host district. Finally, I address one additional factor which I am unable to fully control for in the model – the fact that some New Jersey Charter Schools – high performing ones – seem to have unusually high rates of cohort attrition between grade 6 and 8, concurrent with rising test scores. I raise this point because pushing out of students is not an option available to traditional public schools. In fact, it is the traditional public schools that must take back those students pushed out.

Demographic Examples from Newark

Here are a few slides from previous posts on the demography of Newark Charter Schools in particular, compared to other Newark Public Schools. Here are the shares of kids who qualify for free lunch by school in Newark (city boundaries). Clearly, most of the charters fall toward the left hand side of the graph with far fewer of the lowest low-income children.

The shares of English Language Learners look similar if not more dramatic. Many NPS schools have very high rates of English Language Learners while few charters have even a modest share.

Finally, here’s a 4 year run of the most recent available special education classification rate data (more recent years of data have a dead link on the classification rates). This graph compares Essex County charter schools with Essex County public school districts. Charter schools have invariably low special education rates, except for those focused on children with disabilities.

 

One cannot reasonably ignore these differences when comparing performance outcomes of kids across schools. It’s just silly and not particularly useful.

The Outcomes Corrected for the Demographics

So then, what happens if we actually use some statistical adjustments to evaluate whether the charter schools outperform (on average proficiency rate) other schools in the same city on the same test? Well, I’ve done this for charter data from 2009 and previous years and will do it again for the 2010 data when available. I use variables available in the Fall Enrollment Files and from the School Report Card, and information on school location from the NCES Common Core of Data, in order to create a model of the expected scores for each charter school and each other school in the same city. In the model, I use only the performance of GENERAL TEST TAKERS, so as to exclude those scores of special education students (who, for the most part, don’t attend the charter schools). The model:

Outcome = f(Poverty, Race, Homelessness, City, Tested Grade, Subject)

I use the model to create a predicted performance level (proficiency rate) for each school, considering which grade level test we are looking at, in which subject, the race/ethnicity of the students (where Hispanic concentration is highly correlated with available ELL data, and Hispanic concentration data are more consistently reported), the share of students qualifying for free lunch, the percent identified as homeless and the city of location for the school. That is, each charter school is effectively compared against only other schools in the same geographic context (city).

This is a CRUDE model, which can’t really account for other factors, such as the possibility that some charter schools actually shed, or push out, lower performing students over time.  More on that below. So, for each school, I get a predicted performance level – what that school is expected to achieve given the children it serves and the location. I can then compare the actual performance to the predicted performance to determine whether the school beats expectations or falls below expectations.
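The "actual minus predicted" comparison can be sketched in a few lines. This is a minimal illustration, not the model used for the graphs: the five schools and their numbers are invented, and a single predictor (free lunch share) stands in for the full set of controls listed above.

```python
# Hypothetical illustration only: the schools, free-lunch shares, and
# proficiency rates below are invented, and one predictor stands in for
# the fuller model (poverty, race, homelessness, city, grade, subject).

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def beat_the_odds(schools):
    """Return {school name: actual minus predicted proficiency rate}."""
    xs = [s["free_lunch"] for s in schools]
    ys = [s["proficient"] for s in schools]
    intercept, slope = fit_line(xs, ys)
    return {
        s["name"]: s["proficient"] - (intercept + slope * s["free_lunch"])
        for s in schools
    }


# Made-up schools: free lunch share and share of test takers proficient.
schools = [
    {"name": "School A", "free_lunch": 0.90, "proficient": 0.40},
    {"name": "School B", "free_lunch": 0.80, "proficient": 0.55},
    {"name": "School C", "free_lunch": 0.60, "proficient": 0.58},
    {"name": "School D", "free_lunch": 0.50, "proficient": 0.75},
    {"name": "School E", "free_lunch": 0.70, "proficient": 0.60},
]

residuals = beat_the_odds(schools)
for name, gap in sorted(residuals.items(), key=lambda kv: kv[1], reverse=True):
    label = "beats expectations" if gap > 0 else "falls short"
    print(f"{name}: {gap:+.3f} ({label})")
```

A positive gap means the school scores above what its student population would predict; a negative gap means it scores below. By construction the gaps average out to zero across the schools in the fit, which is why "about half above, half below" is the unremarkable baseline.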

The next two graphs provide a visual representation of schools beating the odds and schools under-performing with respect to expectations. Charters are identified in red and named. Blue circles are traditional public schools in the same district. Note that there are about the same number of charters beating expectations as there are falling short. The same is true for non-charters. On average, both groups appear to be about average.

8th Grade Math performance looks much like 4th grade. Charters are evenly split between “good” and “bad,” as are the traditional public schools in their host districts.

The Overall Charter Difference (Or Not?)

Now, the above graphs don’t directly test whether the average charter performance is better or worse than the average non-charter performance on the same test, same grade and in the same location. But, conducting that test (for these purposes) is as simple as adding into the statistical model an indicator of whether a school is a charter school. Doing so creates a simple (oversimplified, in fact) comparison of the average performance of charters to the average performance of non-charters in the same city (on the same test, in the same grade level), while “correcting” statistically for differences in the student population. I SHOULD POINT OUT THAT ONE CAN NEVER REALLY FULLY CORRECT FOR THOSE DIFFERENCES!
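Written out, one plausible version of this test looks like the following. The linear form and every symbol here are assumptions for illustration; the post specifies only the list of controls, not the functional form.

```latex
\text{Prof}_{st} = \beta_0
  + \beta_1\,\text{FreeLunch}_s
  + \boldsymbol{\beta}_2^{\top}\,\text{Race}_s
  + \beta_3\,\text{Homeless}_s
  + \gamma_{\text{city}(s)}
  + \theta_{\text{grade}(t)}
  + \lambda_{\text{subject}(t)}
  + \delta\,\text{Charter}_s
  + \varepsilon_{st}
```

Here $s$ indexes schools and $t$ indexes tests, $\text{Charter}_s$ equals 1 for a charter school and 0 otherwise, and the $\gamma$, $\theta$ and $\lambda$ terms are indicators for city, tested grade and subject. The coefficient $\delta$ is then the adjusted average charter difference, and the question is whether it can be distinguished from zero.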

Using this oversimplified method, the analysis (statistical output) below shows that the charter average proficiency rate is about 3% higher than the non-charter average – BUT THAT DIFFERENCE IS NOT STATISTICALLY SIGNIFICANT. That is, there really isn’t any difference. THAT IS, THERE REALLY ISN’T ANY DIFFERENCE.


Some Other Intervening Factors: Cohort Attrition, or Pushing Out

As I mentioned above, even the “tricky statistics” I used cannot sort out such things as a school that systematically dumps, or pushes out lower performing students, where those lower performing students end up back in the host district. Such an effect would simultaneously boost the charter performance and depress the host district performance (if enough kids were pushed back). I’ve written on this topic previously. So, I’ll reuse some of the older stuff – which isn’t really that old (last Fall).

In this figure, we can see that for the 2009 8th graders, North Star began with 122 5th graders and ended with 101 in 8th. The subsequent cohort also began with 122, and ended with 104. These are sizable attrition rates. Robert Treat, on the other hand, maintains cohorts of about 50 students – non-representative cohorts indeed – but without the same degree of attrition as North Star. Now, a school could maintain cohort size even with attrition if that school were to fill vacant slots with newly lotteried-in students. This, however, is risky to the performance status of the school, if performance status is the main selling point.
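For a sense of scale, the cohort sizes quoted above translate into attrition rates as follows. This is a quick check of the arithmetic only; the function name is an illustrative choice, and the calculation ignores any students who entered the cohort along the way.

```python
def attrition_rate(start, end):
    """Share of the starting cohort no longer enrolled at the end."""
    return (start - end) / start


# North Star cohort sizes quoted in the text (5th grade -> 8th grade).
cohorts = {
    "2009 8th graders": (122, 101),
    "subsequent cohort": (122, 104),
}

rates = {label: attrition_rate(s, e) for label, (s, e) in cohorts.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.1%} net attrition")
```

That works out to roughly 17% and 15% of each entering cohort. These are net figures, so true departures could be higher if any seats were backfilled.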

Here, I take two 8th grade cohorts and trace them backwards. I focus on General Test Takers only, and use the ASK Math assessment data in this case. A quick note about those data: scores across all schools tend to drop in 7th grade due to cut-score placement (not because kids get dumber in 7th grade and wise up again in 8th). The top section of the table looks at the failure rates and number of test takers for the 6th grade in 2005-06, 7th in 2006-07 and 8th in 2007-08. Over this time period, North Star drops 38% of its general test takers and cuts its already low failure rate from nearly 12% to 0%. Greater Newark also drops over 30% of test takers in the cohort, and reaps significant reductions in failures (partially proficient) in the process.

The bottom half of the table shows the next cohort in sequence. For this cohort, North Star sheds 21% of test takers between grade 6 and 8, and cuts failure rates nearly in half  – starting low to begin with (starting low in the previous grade level, 5th grade, the entry year for the school). Gray and Greater Newark also shed significant numbers of students and Greater Newark in particular sees significant reductions in share of non(uh… partially)proficient students.

My point here is not that these are bad schools, or that they are necessarily engaging in any particular immoral or unethical activity. But rather, that a significant portion of the apparent success of schools like North Star is a) attributable to the demographically different population they serve to begin with and b) attributable to the patterns of student attrition that occur within cohorts over time.

Understanding Differing Perspectives

Some will say, why should I care if charters are producing higher outcomes with similar kids? What matters to me is that they are producing higher outcomes! Anyone who produces higher outcomes in Newark or Trenton should be applauded, no matter how they do it. It’s one more high performing school where there wasn’t one previously.

It is important to understand that comparisons of student outcomes that ignore differences in student populations reward – in the public eye – those schools that manage to find a way to serve more advantaged populations, either by achieving non-representative initial lottery pool or by selective attrition. As a result, there is a disincentive for charter operators to actually make greater effort to serve higher need populations – the ones who really need it! And there are many out there who see this as their real mission.  Those charter operators who do try to serve more ELL children, more children in severe multi-generational poverty, and children with disabilities often find themselves answering tough questions from their boards of directors and the media regarding why they can’t produce the same test scores as the high-flying charter on the other side of town. These are not good incentives from a public policy perspective. They are good for the few, not the whole.

Further, one’s perspective on this point varies depending on whether one is a parent looking for options for his/her own child, or a policymaker looking for “scalable” policy options for improving educational opportunities for children statewide. From a parent (or child) perspective, one is relatively unconcerned whether the positive school effect is a function of selectivity of peer group and attrition, so long as there is a positive effect. But, from a public policy perspective, the “charter model” is only useful if the majority of positive effects are not due to peer group selectivity and attrition, but rather to the efficacy and transferability of the educational models, programs and strategies. Given the uncommon student populations served by many Newark charters and even more uncommon attrition patterns among some… not to mention the grossly insufficient data… we simply have no way of knowing whether these schools can provide insights for scalable reforms.

As they presently operate, however, many of the standout schools do not represent scalable reforms. And on average, New Jersey charters are still… just… average.

Understanding Education Costs versus “Inflation”

We often see pundits arguing that education spending has doubled over a 30 year period, when adjusted for inflation, and we’ve gotten nothing for it. We’ve got modest growth in NAEP scores and huge growth in spending. And those international comparisons… wow!

The assertion is therefore that our public education system is less cost-effective now than it was 30 years ago. But this assumption is based on layers of flawed reasoning, on both sides of the equation.

Here’s a bit of School Finance 101 on this topic:

First, what are the two sides of the equation, or at least the two parts of the fraction? The numerator here is education spending and how we measure it now compared to previously. The major flaw in the usual reasoning is that we are making our comparison of the education dollar now to then by simply adjusting the value of that dollar for the average changes in the prices of goods purchased by a typical consumer (food, fuel, etc.), or the Consumer Price Index.

Unfortunately, the consumer price index is relatively unhelpful (okay, useless) for comparing current education spending to past education spending, unless we are considering how many loaves of bread or gallons of gas can be purchased with the education dollar.

If we wanted to maintain constant quality education over time, the main thing we’d have to do is maintain a constant quality workforce in schools – mainly a teacher workforce, but also administrators, etc. At the very least, if quality lagged behind we’d have to be able to offset the quality losses with additional workers, but the trade-offs are hard to estimate.

The quality of the teacher workforce is influenced much more by the competitiveness of the wages for teachers, compared to other professions, than to changes in the price of a loaf of bread or gallon of gas. If we want to get good teachers, teaching must be perceived as a desirable profession with a competitive wage. That is, to maintain teacher quality we must maintain the competitiveness of teacher wages (which we have not over time) and to improve teacher quality, we must make teacher wages (or working conditions) more competitive. On average, non-teacher wage growth has far outpaced the CPI over time and on average, teacher wages have lagged behind non-teacher wages, even in New Jersey!
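The choice of deflator can be stated compactly. The notation below is introduced here for illustration and is not from the original post: $S$ is nominal education spending, $\text{CPI}$ is the consumer price index, $W$ is an index of competitive (non-teacher) wages, and subscripts $0$ and $t$ mark the start and end of the period.

```latex
g_{\text{CPI}} = \frac{S_t / S_0}{\text{CPI}_t / \text{CPI}_0},
\qquad
g_{W} = \frac{S_t / S_0}{W_t / W_0},
\qquad
\frac{W_t}{W_0} > \frac{\text{CPI}_t}{\text{CPI}_0}
\;\Longrightarrow\;
g_{W} < g_{\text{CPI}}
```

Because competitive wages have grown faster than consumer prices, the same nominal spending history shows less "real" growth when measured against what it costs to hire comparable workers than when measured against the price of bread and gas.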

Now to the denominator or the outcomes of our education system. First of all, if we allow for a decline in the quality of the key input – teachers – we can expect a decline in the outcomes however we choose to measure them. But, it is also important to understand that if we wish to achieve either higher outcomes, or to achieve a broader array of outcomes, or to achieve higher outcomes in key areas without sacrificing the broader array of outcomes, costs will rise. In really simple terms, the cost of doing more is more, not less. And yes, a substantial body of rigorous peer-reviewed empirical literature supports this contention (a few examples below).

So, as we ask our schools to accomplish more we can expect the costs of those accomplishments to be greater. If we expect our children to compete in a 21st century economy, develop technology skills and still have access to physical education and arts, it will likely cost more, not less, than achieving the skills of 1970. But, we must also make sure we are adequately measuring the full range of outcomes we expect schools to accomplish. If we are expecting schools to produce engaged civic participants, we may or may not see the measured effects in elementary reading and math test scores.

An additional factor that affects the costs of achieving educational outcomes is the student inputs – or who is showing up at the schoolhouse door (or logging in to the virtual school). A substantial body of research (see chapter by Duncombe and Yinger, here) explains how child poverty, limited English proficiency, unplanned mobility and even school racial composition may influence the costs of achieving any given level of student outcomes. Differences in the ways children are sorted across districts and schools create large differences in the costs of achieving comparable outcomes and so too do changes in the overall demography of the student population over time. Escalating poverty, mobility induced by housing disruptions, and increased numbers of children not speaking English proficiently all lead to increases in the cost of achieving even the same level of outcomes achieved in prior years. This is not an excuse. It’s reality. It costs more to achieve the same outcomes with some students than with others.

In short, the “cost” of education rises as a function of at least 3 major factors:

  1. Changes in the incoming student populations over time
  2. Changes in the desired outcomes for those students, including more rigorous core content area goals or increased breadth of outcome goals
  3. Changes in the competitive wage of the desired quality of school personnel

And the interaction of all three of these! For example, changing student populations make teaching more difficult (a working condition), meaning that a higher wage might be required simply to offset this change. Increasing the complexity of outcome goals might require a more skilled teaching workforce, requiring higher wages.

The combination of these forces often leads to an increase in education spending that far outpaces the consumer price index, and it should. Costs rise as we ask more of our schools, as we ask them to produce a citizenry that can compete in the future rather than the past. Costs rise as the student population inputs to our public schooling system change over time. Increased poverty, language barriers and other factors make even the current outcomes more costly to achieve. And costs of maintaining the quality of the teacher workforce change as competitive wages in other occupations and industries change, which they have.

Typically, state school finance systems have not kept up with the true increased costs of maintaining teacher quality, increased outcome demands or changing student demography. Nor have states sufficiently targeted resources to districts facing the highest costs of achieving desired outcomes. See www.schoolfundingfairness.org. And many states with significantly changing demography, including Arizona, California and Colorado, have merely maintained or even cut current spending levels for decades (despite what would be increased costs of even maintaining current outcome levels).

Evaluating education spending solely on the basis of changes in the price of a loaf of bread and/or gallon of gasoline is, well, silly.

Notably, we may identify new “efficiencies” that allow us to produce comparable outcomes, with comparable kids at lower cost. We may find some of those efficiencies through existing variation across schools and districts, or through new experimentation. But it is downright foolish to pretend that those efficiencies are simply out there (even if we can’t see them, or don’t know them) and we can simply squeeze the current system into achieving comparable or better outcomes at lower cost.

Readings

Baker, B.D., Taylor, L., Vedlitz, A. (2008) Adequacy Estimates and the Implications of Common Standards for the Cost of Instruction. National Research Council.  http://www7.nationalacademies.org/CFE/Taylor%20Paper.pdf

Duncombe, W., Lukemeyer, A., Yinger, J. (2006) The No Child Left Behind Act: Have Federal Funds been Left Behind? http://cpr.maxwell.syr.edu/efap/Publications/costing_out.pdf

This second one is a really fun article showing the vast differences in the costs of achieving NCLB proficiency targets in two neighboring states which happen to have very different testing standards. In really simple terms, Missouri has a hard test with low proficiency rates and Kansas an easy test with high proficiency rates. The authors show the cost implications of achieving the lower, versus higher tested achievement standards.

Thinking through cost-benefit analysis and layoff policies


If you’re running a school district or a private school and you are deciding on what to keep in your budget and what to discard, you are making trade-offs. You are making trade-offs as to whether you want to spend money on X or on Y, or perhaps a more complicated mix of many options. How you come to your decision depends on a number of factors:

  1. The cost – the total costs of the various ingredients that go into providing X and providing Y. That is, how many people, at what salary and benefits, how much space at what overhead cost (per time used) and how much stuff (materials, supplies and equipment) and at what market prices?
  2. The benefits – the potential dollar return to doing X versus doing Y. For example, how much dollar savings might be generated in operating cost savings from reorganizing our staffing and use of space, if we spend up front (capital expenses) to reorganize and consolidate our elementary schools where they have become significantly imbalanced over time?
  3. The effects – the relative effectiveness of doing X versus doing Y. For example, in the simplest case, if we are choosing between two reading programs, what are the reading achievement gains, or effects, from each program? Or, more pertinent to the current conversation (but far more complex to estimate), what are the relative effects of reducing class size by 2 students when compared to keeping a “high quality” teacher?
  4. The utility – The utility of each option refers to the extent that the option in question addresses a preferred outcome goal. Utility is about preferences, or tastes. For example, in the current accountability context, one might be pressured to place greater “utility” on improving math or reading outcomes in grades 3 through 8. If the costs of a preferred program are comparable to the costs of a less preferred program… well… the preferred program wins. There are many ways to determine what’s “preferred,” and more often than not, public input plays a key role especially in smaller, more affluent suburban school districts. As noted above, federal and state policy have played a significant role in defining utility in the past decade (and arguably, distorting resource allocation to a point of significant imbalance in resource-constrained districts)

This basic cost analysis framework, laid out by Henry Levin back in 1983 and revisited by Levin and McEwan since, should provide the basis for important trade-off decisions in school budgeting and should provide the conceptual basis for arguments like those made by Petrilli and Roza in their recent policy brief. But such a framework is noticeably absent, likely for two reasons:

  1. most of the proposals made by Petrilli and Roza are not sufficiently precise to apply such a framework, largely because little is known about the likely outcomes (which may in fact be quite harmful); and
  2. the authors have failed entirely to consider in detail the related costs of proposed options, especially up-front costs of many of the options (like school reorganization or developing teacher evaluation systems). Note that the full length book (from which the brief comes) is no more thoughtful or rigorous.

Back of the Napkin Application to Layoff Options

Allow me to provide a back-of-the-napkin example of some of the pieces that might go into determining the savings and/or benefits from the BIG suggestion made by Petrilli and Roza – which is to use quality based layoffs in place of seniority based layoffs when cutting budgets. This one would seem to be a no-brainer. Clearly, if we lay off based on quality, we’ll have better teachers left (greater effectiveness) and we’ll have saved a ton of money or a ton of teachers. That is, if we are determined to lay off X teachers, it will save more money to lay off more senior, more expensive teachers than to lay off novice teachers. However, that’s not the likely what-if scenario. More likely is that we are faced with cutting X% of our staffing budget, so the difference will be in the number of teachers we need to lay off in order to achieve that X%, and the benefit difference might be measured in terms of the change in average class size resulting from laying off teachers by “quality” measures versus laying off teachers by seniority.

Let’s lay out some of the pieces of this cost benefit analysis to show its complexity.

First of all, let’s consider how to evaluate the distribution of the different layoff policies.

Option 1 – Layoffs based on seniority

This one is relatively easy and involves starting from the bottom in terms of experience and laying off as many junior teachers as necessary to achieve 5% savings to our staffing budget.

Option 2 – Layoffs based on quality

Here’s the tricky part. Budget cuts and layoffs are here and now. Most districts do not have in place rigorous teacher evaluation systems that would allow them to make high stakes decisions based on teacher quality metrics. AND, existing teacher quality metrics where they do exist (NY, DC, LA) are very problematic. So, on the one hand, if districts rush to immediately implement “quality” based layoffs, districts will likely revert to relying heavily on some form of student test score driven teacher effectiveness rating, modeled crudely (like the LA Times model).  Recall that even in better models of this type, we are looking at a 35% chance of identifying an average teacher as “bad” and 20% chance of identifying a good teacher as “bad.”

In general, the good and bad value-added ratings fall somewhat randomly across the experience distribution. So, for simplicity in this example, I will assume that quality based firings are essentially random. That is, they would result in dismissals randomly distributed across the experience range. Arguably, value-added based layoffs are little more than random, given that a) there is huge year to year error even when comparing on the same test and b) there are huge differences when rating teachers using one test, versus using another.

Testing this out with Newark Public Schools – Elementary Classroom Teachers 2009-10

At the very least, one would think that randomly firing our way to a 5% personnel budget cut would create a huge difference when compared to firing our way to a 5% personnel budget cut by eliminating the newest and cheapest teachers. I’m going to run these numbers using salaries only, for illustrative purposes (one can make many fun arguments about how to parse out fixed vs. variable benefits costs, or deferred benefits vs. short run cost differences for pensions and deferred sick pay, etc.).

We start with just over 1,000 elementary classroom teachers in Newark Public Schools, and assume an average class size of 25 for simplicity. The number of teachers is real (at least according to state data) but the class sizes are artificially simplified. We are also assuming all students and classroom space to be interchangeable.  A 5% cut is about $3.7 million. Let’s assume we’ve already done our best to cut elsewhere in the district budget, perhaps more than 5% across other areas, but we are left with the painful reality of cutting 5% from core classroom teachers in grades K-8. In any case, we’re hoping for some dramatic saving here – or at least benefits revealed in terms of keeping class sizes in check.

Figure 1: Staffing Cut Scenarios for Newark Public Schools using 2009-10 Data

If we lay off only the least experienced teachers to achieve the 5% cut, we lay off only teachers with 3 or fewer years of experience when using the Newark data.  The average experience of those laid off is 1.8 years. And we end up laying off 72 teachers (a sucky reality no matter how you cut it).

If we use a random number generator to determine layoffs (really, a small difference from using Value-added modeling), we end up laying off only 54 teachers instead of 72. We save 18 teachers, or 1.7% of our elementary classroom teacher workforce.

What’s the class size effect of saving these 18 teachers? Well, under the seniority based layoff policy, class size rises from 25 to 26.86. Under the random layoff policy, class size rises from 25 to 26.37. That is, class size is affected by about half a student per class. This may be important, but it still seems like a relatively small effect for a BIG policy change. This option necessarily assumes no downside to the random loss of experienced teachers. Of course, the argument is that more of those classes now have a good teacher in front of them. But again, doing this here and now with the type of information available means relying not even on the “best” of teacher effectiveness models, but relying on expedited, particularly sloppy, not thoroughly vetted models. I would have continued concerns even with richer models, like those explored in the recent Gates/Kane report, which still prove insufficient.
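The class size arithmetic can be checked in a few lines. This is a minimal sketch under one loud assumption: the teacher count of 1,040 is back-solved so that it reproduces the class sizes reported above. The text says only "just over 1,000", so treat that figure as hypothetical rather than as the number in the state data.

```python
def class_size_after_layoffs(teachers, class_size, laid_off):
    """Average class size when the same students are spread over fewer teachers."""
    students = teachers * class_size
    return students / (teachers - laid_off)


# ASSUMPTION: 1,040 is back-solved from the reported class sizes; the
# text gives only "just over 1,000" elementary classroom teachers.
TEACHERS = 1040
CLASS_SIZE = 25  # simplified starting class size used in the text

seniority = class_size_after_layoffs(TEACHERS, CLASS_SIZE, laid_off=72)
random_layoff = class_size_after_layoffs(TEACHERS, CLASS_SIZE, laid_off=54)

print(f"Seniority-based layoffs: {seniority:.2f} students per class")
print(f"Random layoffs:          {random_layoff:.2f} students per class")
print(f"Difference:              {seniority - random_layoff:.2f} students per class")
print(f"Teachers saved:          {(72 - 54) / TEACHERS:.1%} of the workforce")
```

Under that assumption the two policies land at about 26.86 and 26.37 students per class, a gap of roughly half a student, with the 18 saved teachers amounting to about 1.7% of the workforce.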

Perhaps most importantly, how does this new policy affect the future teacher workforce in Newark – the desirability for up-and-coming teachers to pursue a teaching career in Newark, where their career might be cut off at any point, by random statistical error? And how does that tradeoff balance with a net difference of about half a student per classroom?

What about other costs?

Petrilli and Roza, among others, ignore entirely any potential downside to the teacher workforce – those who might choose to enter that workforce if school districts or states all of a sudden decide to rely heavily on error prone and biased measures of teacher effectiveness to implement layoff policies. This downside might be counterbalanced by increased salaries, on average and especially on the front end. That is, achieving equal incoming teacher quality over time, given the new uncertainty, might require higher front end salaries. This cost is ignored entirely (or simply assumed to come from somewhere else, like cutting benefits… simply negating step increments, or supplements for master’s degrees, each of which has other unmeasured consequences).

I have assumed above that districts would rely heavily on available student testing data, creating error-prone, largely random layoffs, while ignoring the cost of applying the evaluation system to achieve the layoffs. Arguably, even contracting an outside statistician to run the models and identify the teachers to be laid off would cost another $50,000 to $75,000, leading to reduction of at least one more teacher position under the “quality based” layoff model.

And then there are the legal costs of fighting the due process claims that the dismissals were arbitrary and the potential legal claims over racially disparate firings. Forthcoming law review article to be posted soon.

Alternatively, developing a more rigorous teacher evaluation system that might more legitimately guide layoff policies requires significant up-front costs, ignored entirely in the current overly simplistic, misguided rhetoric.

How can we implement quality based layoffs when we’re supposed to be laying off teachers NOT teaching math and reading in elementary grades?

Here’s another issue that Petrilli, Roza and others seem to totally ignore. They argue that we must a) dismiss teachers based on quality and b) must make sure we don’t compromise class sizes in core instructional areas, like reading and math in the elementary grades.

Let’s ponder this for a moment. The only teachers to whom we can readily assign (albeit deeply flawed) effectiveness ratings are those teaching math and reading between grades 3 and 8. So, the only teachers who we could conceivably lay off based on preferred “reformy” quality metrics are teachers who are directly responsible for teaching math and reading between grades 3 and 8.

That is, in order to implement quality based layoffs, as reformers suggest, we must be laying off math and reading teachers between grades 3 and 8, except that we are supposed to be laying off other teachers, not those teachers. WOW… didn’t think that one through very well… did they?

Am I saying seniority layoffs are great?

No. Clearly seniority layoffs are imperfect and arguably there is no perfect answer to layoff policies. Layoffs suck and sometimes that sucky option has to be implemented. Sometimes that sucky option has to be implemented with a blunt and convenient instrument, one that is easily defined, such as years of service. It is foolish to argue that teaching is the only profession where those who’ve been around for a while – those who’ve done their time – have greater protection when the axe comes down. Might I suggest that paying one’s dues even plays a significant role in many private sector jobs? Really? And it is equally foolish to argue that every other profession EXCEPT TEACHING necessarily makes precise quality decisions regarding employees when that axe comes down.

The tradeoff being made in this case is a tradeoff  NOT between “keeping quality teachers” versus “keeping old, dead wood” as Petrilli, Roza and others would argue, but rather the tradeoff between laying off teachers on the unfortunately crude basis of seniority only, versus laying off teachers on a marginally-better-than-random, roll-of-the-dice basis. I would argue the latter may actually be more problematic for the future quality of the teaching workforce!  Yes, pundits seem to think that destabilizing the teaching workforce can only make it better. How could it possibly get worse, they argue? Substantially increasing the uncertainty of career earnings for teachers can certainly make it worse.

Bad Teachers Hurt Kids, but Salary Cuts Have no Down Side?

The assumption constantly thrown around in these policy briefs is that putting a bad teacher in front of the kids is the worst possible thing you could do. We have to fire those teachers. They are bad for kids. They hurt kids.

But, the same pundits argue that we should cut pay for the teachers in any number of ways (including paying for benefits) and subject teachers to layoff policies that are little more than random. Since so many teachers are bad teachers – and simply bad people – these policies are, of course, not offensive. Right? Kids good. Teachers bad. Treat kids well. Take it out on teachers. No harm to kids. Easy!

I’m having a hard time swallowing that. That’s just not a reasonable way to treat a workforce (if you want a good workforce), much less a reasonable way to treat a workforce charged with educating children. In fact, it’s bad for the kids, and just plain ignorant to assert that one can treat the teachers badly, lower their pay, morale and ultimately the quality of the teacher workforce and expect there to be no downside for the kids.

Petrilli and Roza make the assumption that there are big savings to be found from cutting teacher salaries directly and also indirectly by passing along benefits costs to teachers. That’s a salary cut! Or at least a cut to the total compensation package, and it’s a package deal! This argument seems to be coupled with an assumption that there is absolutely no loss of benefit or effectiveness from pursuing this cost-cutting approach (because we’ll be firing all of the sucky teachers anyway). That is, teacher quality will remain constant even if teacher salaries are cut substantially. A substantial body of research questions that assumption:

  • Murnane and Olsen (1989) find that salaries affect the decision to enter teaching and the duration of the teaching career;
  • Figlio (1997, 2002) and Ferguson (1991) find that higher salaries are associated with better qualified teachers;
  • Figlio and Rueben (2001) “find that tax limits systematically reduce the average quality of education majors, as well as new public school teachers in states that have passed these limits;”
  • Ondrich, Pas and Yinger (2008) “find that teachers in districts with higher salaries relative to non-teaching salaries in the same county are less likely to leave teaching and that a teacher is less likely to change districts when he or she teaches in a district near the top of the teacher salary distribution in that county.”

To mention a few.

That is, in the aggregate, higher salaries (and better working conditions) can attract a stronger teacher workforce, and at a local level, having more competitive teaching salaries compared either to non-teaching jobs in the same labor market or compared to teaching jobs in other districts in the same labor market can help attract and especially retain teachers.

Allegretto, Corcoran and Mishel, among others, have shown that teacher wages have lagged over time – fallen behind non-teaching professions. AND, they have shown that the benefits differences are smaller than many others argue and certainly do not make up the difference in the wage deficit over time. I have shown previously on my blog that teacher wages in New Jersey have similarly lagged behind!

So, let’s assume we believe that teacher quality necessarily trumps reduced class size, for the same dollar spent. Sadly, this has been a really difficult trade-off to untangle in empirical research and while reformers boldly assume this, the evidence is not clear. But let’s accept that assumption. But let’s also accept the evidence that overall wages and local wage advantages lead to a stronger teacher workforce.

If that’s the case, then the appropriate decision to make at the district level would be to lay off teachers and marginally increase class sizes, while making sure to keep salaries competitive. After all, the aggregate data seem to suggest that over the past few decades we’ve increased the number of personnel more than we’ve increased the salaries of those personnel. That is, cut numbers of staff before cutting or freezing salaries. In fact, one might even choose to cut more staff and pay even higher salaries to gain competitive advantage in tough economic times. Some have suggested as much.  I’m not sold on that either, especially when we start talking about increasing class sizes to 30, 35 or even 50.  Note that class size may also affect the competitive wage that must be paid to a teacher in order to recruit and retain teachers of constant quality. Nonetheless, it is important to understand the role of teacher compensation in ensuring the overall quality of the teacher workforce and it is absurd to assume no negative consequences of slashing teacher pay across-the-board.

Take home point!

In summary, we should be providing thoughtful decision frameworks for local public school administrators to make cost-effective decisions regarding resource allocation rather than spewing laundry lists of reformy strategies for which no thoughtful cost-effectiveness analysis has ever been conducted.

Further, now is not the time to act in panic and haste to adopt these unfounded strategies without appropriate consideration of the up-front costs of making truly effective reforms.

A few references

Richard J. Murnane and Randall Olsen (1989) The effects of salaries and opportunity costs on length of stay in teaching: Evidence from Michigan. Review of Economics and Statistics 71 (2) 347-352

David N. Figlio (1997) Teacher Salaries and Teacher Quality. Economics Letters 55 267-271.

David N. Figlio (2002) Can Public Schools Buy Better-Qualified Teachers? Industrial and Labor Relations Review 55, 686-699.

Ronald Ferguson (1991) Paying for Public Education: New Evidence on How and Why Money Matters. Harvard Journal on Legislation. 28 (2) 465-498.

Figlio, D.N., Rueben, K. (2001) Tax limits and the qualifications of new teachers. Journal of Public Economics 80 (1) 49-71

Ondrich, J., Pas, E., Yinger, J. (2008) The Determinants of Teacher Attrition in Upstate New York.  Public Finance Review 36 (1) 112-144

Stretching Truth, Not Dollars?

This week, Mike Petrilli (TB Fordham Institute) and Marguerite Roza (Gates Foundation) released a “policy brief” identifying 15 ways to “stretch” the school dollar. Presumably, what Petrilli and Roza mean by stretching the school dollar is to find ways to cut spending while either not harming educational outcomes or actually improving them. With that goal in mind, it’s pretty darn hard to see how any of the 15 proposals would lead to progress toward it.

The new policy brief reads like School Finance Reform in a Can. I’ve written previously about what I called Off-the-Shelf school finance reforms, which are quick and easy – generally ineffective and meaningless, or potentially damaging – revenue-neutral school finance fixes. In this new brief, Petrilli and Roza have pulled out all the stops. They’ve generated a list, which could easily have been generated by a random search engine scouring “reformy” think tank websites, excluding any ideas actually supported by research literature.

The policy brief includes some introductory ramblings about district level practices for “stretching” the school dollar, but the policy brief focuses on state policies that can assist in stretching the school dollar at the state level and provide local districts greater options to stretch the school dollar. I will focus my efforts on the state policy list.

Here’s the state policy recommendation list:

1. End “last hired, first fired” practices.

2. Remove class-size mandates.

3. Eliminate mandatory salary schedules.

4. Eliminate state mandates regarding work rules and terms of employment.

5. Remove “seat time” requirements.

6. Merge categorical programs and ease onerous reporting requirements.

7. Create a rigorous teacher evaluation system.

8. Pool health-care benefits.

9. Tackle the fiscal viability of teacher pensions.

10. Move toward weighted student funding.

11. Eliminate excess spending on small schools and small districts.

12. Allocate spending for learning-disabled students as a percent of population.

13. Limit the length of time that students can be identified as English Language Learners.

14. Offer waivers of non-productive state requirements.

15. Create bankruptcy-like loan provisions.

This list can be lumped into four basic categories:

A) Regurgitation of “reformy” ideology for which there exists absolutely no evidence that the “reforms” in question lead to any improvement in schooling efficiency. That is, no evidence that these reforms either “cut costs” (meaning reduce spending without reducing outcomes) or improve benefits (or outcome effects).

  1. Creating a rigorous evaluation system
  2. Ending “last hired, first fired” practices
  3. Move toward weighted student funding

B) Relatively common “money saving” ideas, backed by little or no actual cost-benefit analysis – the kind of stuff you’d be likely to read in a personal finance column in a magazine in a dentist’s office.

  1. Pool health-care benefits.
  2. Create bankruptcy-like loan provisions. (???)
  3. Tackle pensions
  4. Cut spending on small districts and schools (consolidate?)

C) Reducing expenditures on children with special needs by pretending they don’t exist.

  1. Allocate spending for learning-disabled students as a percent of population.
  2. Limit the length of time that students can be identified as English Language Learners.

D) Un-regulation

  1. eliminate class-size limits
  2. provide waivers for ineffective mandates
  3. eliminate seat time requirements
  4. merge categorical programs
  5. eliminate work rules
  6. eliminate mandatory salary schedules

So, let’s walk through a few of these in greater detail. Let’s address whether there is any evidence whatsoever that these policies a) would actually lead to reduced short run costs while not harming, or even improving outcomes, or b) are for any other reason a good idea.

Creating an Evaluation System

This likely requires significant up-front spending: heavy front-end investment to design the system and put the system into place. Yes, increased, not decreased spending. And in the short-term, while money is tight. AND, there is little or no evidence that what is being recommended – a Tennessee or Colorado-style teacher evaluation model (50% on value-added scores) – would actually reduce spending and/or improve outcomes. Rather, I could make a strong case that such a model will lead to exorbitant legal fees for the foreseeable future (I have a forthcoming law review article on this topic). The likelihood of achieving long run benefits from these short run expenses is questionable at best. In fact, the likelihood of significant harm seems equal if not greater (see my previous post on this topic: value-added teacher evaluation).

Ending “Last Hired, First Fired” layoff policies

In very crude terms, this approach might simply allow a district – or entire state – to lay off senior, higher salary teachers. Yeah… that could reduce the payroll. Good policy? Really questionable! Of course, Petrilli and Roza also argue that we simply shouldn’t be paying teachers for experience or degrees anyway. So I guess if we did that, we wouldn’t generate savings from this recommendation. Silly me. One or the other, I guess.

Now, we could generate performance increases (at lower spending, if we keep seniority pay, or at constant spending if we don’t) if, and only if, the future actually plays out as simulated in the various performance-based layoff simulations which I and others have recently discussed. The assumptions in these simulations are bold (unrealistic), and much of the logic is circular.

And then there are those short-term legal costs of defending the racially disparate firings, and random error firings.

Eliminating Class Size Limits

Yes, larger classes require less spending – on a per pupil basis. Smaller classes have greater benefit (greater “bang for the buck” shall we so boldly say) in higher poverty settings. A labor market dynamic problem realized in the late 1990s, when CA implemented statewide class size reduction, was that the policy stretched the pool of highly qualified teachers and ultimately made it even harder for high poverty schools to get high quality teachers (a dreadfully oversimplified and disputable version of the story).

Removing class size limits might be reasonable if only affluent districts agreed to increase their class sizes, putting more “high quality” teachers into the available labor pool… who might then be recruited into high poverty districts (another dreadfully oversimplified, if not absurd scenario).  But who really thinks it will play out this way? We already know that affluent school districts a) have strong preferences for very small class sizes and b) have the resources to retain those small class sizes or reduce them further. See Money and the Market for High Quality Schooling.

Eliminating mandatory salary schedules

It seems that in this recommendation, Petrilli and Roza are arguing against state policies that mandate the adoption by local public school districts of specific step and lane salary schedules. They really only provide one brief paragraph with little or no explanation regarding what the heck they are talking about.

I’ve personally never been much of a fan of state rigidity regarding local negotiated agreements – at least in terms of steps and lanes. Many problems can occur where states enact policies as rigid as those of Washington State, where teachers statewide are on a single salary schedule.

The best work on this topic (and I’ve worked on the same topic with Washington data) is by Lori Taylor of Texas A&M who shows that the Washington single salary schedule leads to non-competitive wages for teachers in metro areas, and also leads to non-competitive wages for teachers in math and science relative to other career opportunities in metro areas. The statewide salary schedule in Washington is arguably too rigid. Here’s a link to Taylor’s study:

Taylor, L. (2008) Washington Wages: An Analysis of Educator and Comparable Non-educator Wages in the State of Washington. Washington State Institute for Public Policy.

But this does not mean, by any stretch of the imagination, that removing this requirement would save money, or “stretch” the education dollar. It might allow bargaining units in metro areas in Washington to scale up salaries over time as the economy improves. And it might lead to some creative differentiation across negotiated agreements, with districts trying to leverage different competitive advantages over one another for teacher recruitment.

But, these competitive behaviors among districts may also lead to ratcheting of teacher salaries across neighboring bargaining units, and may lead to increased salary expense with small marginal returns (as clusters of districts compete to pay more for an unchanging labor pool). For an analysis of this effect, see Mike Slagle’s work on spatial relationships in teacher salaries in Missouri. In short, Slagle finds that changes to neighboring district salary schedules are among the strongest predictors of an individual district’s salary schedule. Ratcheting upward of salaries in neighboring districts is likely to lead to adjustment by each neighboring district (to the extent resources are available). Ratcheting downward does not tend to occur (not reported in this article).

Slagle, M. (2010) A Comparison of Spatial Statistical Methods in a School Finance Policy Context. Journal of Education Finance 35 (3)

[note: this article is a shortened version of Mike’s dissertation. The article addresses only the ratcheting of per pupil spending, but the full dissertation also addresses teacher salaries]

In any case, we certainly have no evidence that removing state level requirements for mandatory salary schedules would save money while holding outcomes harmless – hence improving efficiency. Like I said, I’m not a big fan of such restrictions either, but I have no delusion that removing them will save any district a ton of money – or any for that matter.

This recommendation seems to also be tied up in the notion that we shouldn’t be paying teachers for experience or degree levels anyway. Therefore, mandating as much would clearly be foolish. I’ve addressed this idea previously in The Research Question that Wasn’t Asked.

In addition, this recommendation seems to adopt the absurd assumption that we could immediately just pay every teacher in the current system the bachelor’s degree base salary (Okay, the salary of a teacher with 3 years and a bachelor’s degree, where marginal test-score returns to experience fade). It assumes that we could immediately recapture all of that salary money dumped into differentiation by experience or differentiation by degree, and that we could have massive savings with absolutely no harm to the quality of schooling – or quality of teacher labor force in the short-run or in the long-term. Again, that’s the research question that was never asked. Previous estimates of all of the money wasted on the master’s degree salary “bump” are actually this crude.

For similarly absurd analysis by Marguerite Roza regarding teacher pay, see my previous post on “inventing research findings.”

Move toward Weighted Student Funding

Petrilli and Roza also advocate moving to Weighted Student Funding. They seem to argue that the “big” savings here will come from the ability of states and school districts to immediately take back funding as student enrollments decline. That is, a district in a state, or school in a district gets a certain amount per kid. If they lose the kid, they lose the money. This keeps us from wasting a whole lot of money on kids who aren’t there anymore.

Okay… Now… most state aid is allocated on a per pupil basis to begin with. And, in general, as enrollments fluctuate, state aid fluctuates. Lose a kid. Lose the state aid that is driven by that kid. Some states have recognized that the costs of providing education don’t actually decline linearly (or increase linearly) with changes in enrollment and have included safety valves to slow the rate of aid loss as enrollments decline. Such policies are reasonable.

Petrilli and Roza seem to be belligerently and ignorantly declaring that there is simply never a legitimate reason for a funding formula to include small school district or declining enrollment provisions. I have testified in court as an expert against such provisions when those provisions are completely “out of whack”, but would never say they are entirely unwarranted. That’s just foolish, and ignorant.

Local revenues in many states (and in many districts within states) still make up a large share of public school funding, and local revenues are typically derived from property taxes applied to the total taxable property wealth of the school district. As kids come and go, local revenues do not come and go. If a tax levy of X% on the district’s assessed property values raises $8,000 per pupil – and if enrollment declines, but the total assessed value stays constant, the same tax raises more per pupil, perhaps $8,100. The district would lose state funding because it has fewer pupils (and perhaps also because it can generate larger local share per pupil).  But that’s really nothing new.
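The per-pupil arithmetic here is simple enough to sketch. The enrollment numbers below are hypothetical, picked only so that the starting point matches the $8,000 per pupil in the example; the mechanism, not the particular district, is the point.

```python
# Hypothetical district: a fixed property tax levy spread over fewer pupils.
# ASSUMPTION: the enrollment figures are invented for illustration only.
def local_revenue_per_pupil(total_levy, enrollment):
    """Local property tax revenue per pupil at a given enrollment."""
    return total_levy / enrollment


# A levy that yields $8,000 per pupil when 4,050 pupils are enrolled.
total_levy = 8_000 * 4_050

before = local_revenue_per_pupil(total_levy, 4_050)
after = local_revenue_per_pupil(total_levy, 4_000)  # 50 pupils leave, levy unchanged

print(f"before: ${before:,.0f} per pupil")
print(f"after:  ${after:,.0f} per pupil")
```

Nothing about the tax rate or the tax base changed; the per-pupil figure rises only because the denominator shrank.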

There’s really no new “huge” savings to be had here.

UNLESS:

a) we are talking about kids moving to charter schools from the traditional public schools, and for each kid who moves to a charter school, we either require the district to pass along the local property tax share of funding associated with that child (Many states), or reduce state aid by the equivalent amount (Missouri).

b) there exists a property tax revenue limit tied specifically to the number of pupils served in the district (as in Wisconsin and other states) which then means that the district would have to reduce its local property taxes to generate only the per pupil revenue allowed. That’s not savings. It’s a state enforced local tax cut.

So then, why do Petrilli and Roza care about Weighted Student Funding as an option? The above two “Unless” scenarios are possible suspects. Blind reformy punditry regardless of logic is equally possible (WSF is cool… reformy… who cares what it does?).

It’s not really about “saving” money at all. Rather, it’s about creating mechanisms to enable local property tax revenues to be diverted in support of charter schools (even if the local taxpayers did not approve the charter), or to have local budgets forcibly reduced/capped when students opt-in to voucher programs (Milwaukee).

And this isn’t really a “weighted student funding” issue at all. In many states, it already works this way (WSF or not). Big savings? Perhaps an opportunity to reduce the state subsidy to charter schools by requiring greater local pass through – in those states where this doesn’t already occur. But these provisions face significant legal battles in some states. If a state is not already doing this, this policy change would also likely lead to significant up front legal expenses.

In fact, I can’t imagine a circumstance where adopting weighted student funding can be expected to either save money or improve outcomes for the same money. There’s simply no proof to this effect. Sadly, while it would seem at the very least, that adopting weighted funding might improve transparency and equity of funding across schools or districts, that’s not necessarily the case either.

My own research finds that districts adopting weighted funding formulas have not necessarily done any better than districts using other budgeting methods when it comes to targeting financial resources on the basis of student needs. See: http://epaa.asu.edu/ojs/index.php/epaa/article/view/5

Petrilli and Roza’s Weighted Funding recommendation for “stretching” the dollar is strange at best. As a recommendation to state policymakers, adoption of weighted funding provides few options for “stretching” the dollar, but may provide a mechanism for diverting districts’ local revenues to support choice programs (potentially reducing state support for those programs).

As a recommendation to local school district officials, adoption of weighted funding really provides no options for “stretching” the dollar, and may, in fact, increase centralized bureaucracy required to develop and manage the complex system of decentralized budgeting that accompanies WSF (see: http://epx.sagepub.com/content/23/1/66.short).

So,

No savings?

No improvements to equity?

No evidence of improved efficiency?

What then, does WSF have to do with “stretching” the school dollar?

Baker, B.D., Elmer, D.R. (2009) The Politics of Off‐the‐Shelf School Finance Reform. Educational Policy 23 (1) 66‐105

Baker, B.D. (2009) Evaluating Marginal Costs with School Level Data: Implications for the Design of Weighted Student Allocation Formulas. Education Policy Analysis Archives 17 (3)

Savings from Small Districts and Schools

I am one who believes in creating savings through consolidation of unnecessarily small schools and school districts. And, at the school or district level, some sizeable savings can be achieved by reorganizing schools into more optimal size configurations (elementary schools of 300 to 500 students and high schools of 600 to 900, for example; see Andrews, Duncombe and Yinger).

For other research on the extent to which consolidation can help cut costs, see Does School District Consolidation Cut Costs, also by Bill Duncombe and John Yinger (the leading experts on this stuff).

Now, Petrilli and Roza, however, seem to imply that the savings from these consolidations or simply from starving the small schools and districts can perhaps help states to sustain the big districts – STRETCHING that small school dollar. Note that Petrilli and Roza ignore entirely the possibility that some of these small schools and districts (in states like Wyoming, western Kansas, Nebraska) might actually have no legitimate consolidation options. Kill them all! Get rid of those useless small schools and districts, I say!

Here’s the thing about de-funding small schools and districts to save big ones. The total amount of money often is not much… BECAUSE THEY ARE SMALL SCHOOLS!!!!! I learned this while working in Kansas, a state which arguably substantially oversubsidizes small rural school districts, creating significant inequities between those districts and some of the state’s large towns and cities with high concentrations of needy students. While the inequity can (and should) be reduced, the savings don’t go very far.

So, let’s say we have 6 school districts serving 100 kids each, and spending $16,000 per pupil to do so. Let’s say we can lump them all together and make them produce equal outcomes for only $10,000 per pupil. A bold, bold assumption. We just saved $6,000 per pupil (really unlikely), across 600 pupils. That’s not chump change… it’s $3,600,000 (okay… in most state budgets that is chump change).

So, now let’s take this savings, and give it to the rest of the kids in the state – oh – about 400,000. Well, we just got ourselves about $9 per pupil. Even if we try to save the mid-sized city district of 50,000 students down the road, it’s about $72 per pupil. That is something. And if we can achieve that, then fine. But slashing small districts and schools to save big, or even average ones, usually doesn’t get us very far. BECAUSE THEY ARE SMALL! GET IT! SMALL DISTRICTS WITH SMALL BUDGETS!
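Here is the same arithmetic as a short sketch, using only the numbers from the example above (six districts, 100 kids each, and the wildly optimistic $6,000 per pupil savings). The function name is just an illustrative label.

```python
# Consolidation savings from the example above, spread over larger populations.
small_districts = 6
pupils_per_district = 100
spending_before = 16_000  # dollars per pupil
spending_after = 10_000   # dollars per pupil (the "bold, bold assumption")

pupils = small_districts * pupils_per_district
total_savings = (spending_before - spending_after) * pupils


def per_pupil_gain(savings, recipients):
    """Dollars per pupil if the savings are redistributed across `recipients` pupils."""
    return savings / recipients


print(f"total savings: ${total_savings:,}")
print(f"statewide (400,000 pupils): ${per_pupil_gain(total_savings, 400_000):,.2f} per pupil")
print(f"city district (50,000 pupils): ${per_pupil_gain(total_savings, 50_000):,.2f} per pupil")
```

Swap in any plausible number of small districts and the per-pupil gain for everyone else stays small, because the numerator is small.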

Similar issues apply to elimination of very small schools in large urban districts. It’s an appropriate strategy – balancing and optimizing enrollment (reorganizing those too-small high schools created as a previous Gates-funded reform?). It should be done. But unless a district is a complete mess of tiny, poorly organized schools, the savings aren’t likely to go that far.

Let’s also remember that major reconfiguration of school level enrollments will require significant up front capital expense! Yep, here we are again with a significant increased expense in the short-term. Duncombe and Yinger discuss this in their work. Strangely, this slips right past Petrilli and Roza.

Use Census Based Funding for Special Education

So, what Petrilli and Roza are arguing here is that states could somehow save money by allocating their special education funding to school districts on an assumption that every school district has a constant share of its enrollment that qualifies for special education programs. Those districts that presently have more? Well, they’ve just been classifying every kid they can find so they can get that special education money. This flat-funding policy will bring them into line… and somehow “stretch” that dollar.

Let’s say we assume that every district has 16% (Pennsylvania) or 14.69% (New Jersey) of its children qualifying for special education. Let’s say we pick some number, like these, that is about the current average special education population. Our goal is really to reduce the money flowing to those districts that have higher than average rates. Of course, if we pick the average, we’ll be reducing money to the districts with higher rates and increasing money to the districts with lower rates and you know what – WE’LL SPEND ABOUT THE SAME IN SPECIAL EDUCATION AID! “Stretching?” How?

And will we have accomplished anything close to logical? Let’s see, we will have slammed those districts that have been supposedly over-identifying kids for decades just to get more special ed aid. That, of course, must be good.

BUT, we will also be providing aid for 14.69% of kids to districts that have only 7% or 8% children with disabilities. Funding on a census basis or flat basis requires that we provide excess special education aid to many districts – unless we fund all districts as if they have the same proportion of special education kids as the district with the fewest special education kids. That is, simply cut special education aid to all districts except the one that currently receives the least.

How is that smart “stretching?”

The only way to “save” money with this recommendation is simply to “cut funding” and “cut services.” And, unless cut to the bare minimum, the “flat allocation” strategy requires choosing to “overfund” some districts while “underfunding” others. One might try to argue that this policy change would at least reduce further growth in special ed populations. But the article below suggests that this is not likely the case either. The resulting inequities significantly offset any potential benefits.
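To make the arithmetic concrete, here is a minimal sketch with three made-up districts. The district names, enrollments, classified counts and the per-pupil aid amount are all hypothetical, chosen only so the rates (8%, 14%, 20%) resemble the range discussed above. It compares aid driven by actual counts with census-style aid at the average rate and at the lowest rate.

```python
# Hypothetical districts: (name, enrollment, students classified for special education).
DISTRICTS = [
    ("Low", 1000, 80),
    ("Middle", 1000, 140),
    ("High", 1000, 200),
]
AID_PER_CLASSIFIED_PUPIL = 10_000  # hypothetical dollar amount


def aid_by_actual_count(districts):
    """Aid driven by each district's actual classified count."""
    return {name: count * AID_PER_CLASSIFIED_PUPIL for name, _, count in districts}


def aid_by_census(districts, flat_rate):
    """Aid driven by enrollment times one assumed rate for every district."""
    return {name: enroll * flat_rate * AID_PER_CLASSIFIED_PUPIL
            for name, enroll, _ in districts}


total_enrollment = sum(enroll for _, enroll, _ in DISTRICTS)
total_classified = sum(count for _, _, count in DISTRICTS)
average_rate = total_classified / total_enrollment
lowest_rate = min(count / enroll for _, enroll, count in DISTRICTS)

actual = aid_by_actual_count(DISTRICTS)
census_at_average = aid_by_census(DISTRICTS, average_rate)
census_at_lowest = aid_by_census(DISTRICTS, lowest_rate)

# Columns: district, aid on actual counts, census aid at average rate, census aid at lowest rate.
for name in actual:
    print(name, actual[name], round(census_at_average[name]), round(census_at_lowest[name]))
print("totals", sum(actual.values()),
      round(sum(census_at_average.values())),
      round(sum(census_at_lowest.values())))
```

In this toy setup, funding at the average rate leaves the statewide total unchanged and simply moves money from the high-rate district to the low-rate one. The total only drops when the flat rate is set below the average, which is a cut, not a stretch.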

There exist a multitude of problems with flat, or census-based special education funding, which have led to declining numbers of states moving in this direction in recent years, New Jersey being an exception. I discuss this with co-authors Matt Ramsey and Preston Green in our forthcoming chapter on special education finance in the Handbook on Special Education Policy Research.

Of course, there also exists the demographic reality that children with disabilities are simply not distributed evenly across cities, towns and rural areas within states, leading to significant inequities when using Census Based funding. CB Funding is, in fact, the antithesis of Weighted Student Funding. How does one reconcile that?

For a recent article on the problems with the underlying assumptions of Census Based special education funding, see:

Baker, B.D., Ramsey, M.J. (2010) What we don’t know can’t hurt us? Evaluating the equity consequences of the assumption of uniform distribution of needs in Census Based special education funding. Journal of Education Finance 35 (3) 245‐275

Here’s a draft copy of our forthcoming book chapter on special education finance: SEF.Baker.Green.Ramsey.Final

Limit Time for ELL/LEP

This one is both absurd and obnoxious. Essentially, Petrilli and Roza argue that kids should be given a time limit to become English proficient and should not be provided supplemental programs or services – or at least the money for them – beyond that time frame. For example, a child might be funded for supplemental services for 2 years, and 2 years only. Some states have done this. Again, there is no clear basis for such cutoffs, nor is it clear how one would even establish the “right” time limit, or whether that time limit would somehow vary based on the level of language proficiency at the starting time.

Yes, this approach, like cutting special education funding, can be used to cut spending and reduce the quality of services. But that’s all it is. It’s not “stretching” any dollar.

Other Stuff

Now, the brief does list other state policy options as well as other district practices. Some of these are rather mundane, typical ideas for “cost saving.” But, of course, no evidence or citation of actual cost effectiveness, cost benefit or cost utility analysis is presented. Petrilli and Roza toss around ideas like a) pooling health care costs, b) redesigning sick leave policies or c) shifting health care costs to employees. These are the kind of things that are often on the table anyway.

I fail to see how this new policy brief provides any useful insights in this regard. Some actual cost-benefit analysis would be the way to go. As a guide for such analyses, I recommend Henry Levin and Patrick McEwan’s book on Cost Effectiveness Analysis in Education.

There are a handful of articles available on the topic of incentives associated with varied sick leave policies, including THIS ONE, School District Leave Policies, Teacher Absenteeism, and Student Achievement, by Ron Ehrenberg of Cornell (back in 1991).

One category I might have included above is that at least two of the recommendations embedded in the report argue for stretching the school dollar, so-to-speak, by effectively taxing school employees. That is, setting up a pension system that requires greater contribution from teacher salaries, and doing the same for health care costs. This is a tax – revenue generating (or at least a give back). This is not stretching an existing dollar. This is requiring the public employees, rather than the broader pool of taxpayers (state and/or local), to pay the additional share. One could also classify it as a salary cut. But Petrilli and Roza have already proposed salary cuts in half of the other recommendations. Just say it. Hey… why not just take the “master’s bump” money and use that to pay for pensions and health care? No-one will notice it’s even gone? We all know it was wasted and un-noticed to begin with.

I was particularly intrigued by the entirely reasonable point that school districts should NOT make the harmful cuts by narrowing their curriculum. I was intrigued by this point because this is precisely what Marguerite Roza has been arguing that poor districts MUST do in order to achieve minimum standards within their existing budgets. I wrote about this issue previously HERE. It is an interesting, but welcome about-face to see Roza no longer arguing that poor, resource constrained school districts should dump all but the basics (while other districts, with more advantaged student populations and more adequate resources need not do the same).

Utter lack of sources/evidence for any/all of this junk

Finally, I encourage you to explore the utter lack of support (or analysis) that the policy brief provides for any/all of its recommendations. It won’t take much time or effort. Read the footnotes. They are downright embarrassing, and in some cases infuriating. At the very least, they border on THINK TANKY MALPRACTICE.

There is a reference to the paper by Dan Goldhaber simulating seniority based layoffs, but that paper provides no analysis of cost/benefit, the central premise of the dollar stretching brief. The Petrilli/Roza (not Goldhaber) assumption is simply that the results will be good, and because we are firing more expensive teachers, it will cost less to get those good results.

The policy brief makes a reference to “typical teacher contracts” (FN2) regarding sick leave, with no citation… no supporting evidence, and phrased rather offensively (18 weeks a year off? For all teachers? Everywhere! OMG???)

FN2: Typical U.S. teacher contracts are for 36.5 weeks per year and include 2.5 weeks sick and personal days for a total work year of 34 weeks, or 18 weeks time off.

The brief refers to work by NCTQ (not the strongest “research” organization) for how to restructure teacher pay.

The report self-cites The Promise of Cafeteria Style Pay (by Roza, non-peer reviewed… schlock), and makes a bizarre generalized attack in footnote 5 that school districts uniformly defend the use of non-teaching staff as substitutes (no evidence/source provided).

FN5: Districts requiring non-teaching staff to serve as substitutes argue that it is good practice to have all staff in classrooms at least a few days a year.

The brief cites policy reports (and punditry) on pension gaps (including the Pew Center report), and those reports refer to alternative plans for closing gaps over time. These are important issues, but the question of how this “stretches” the school dollar is noticeably absent.

And that’s it. That’s the entire extent of “research” and “evidence” used to support this policy brief.

Introducing the Reform-Inator!

Introducing the Coolest New Gadget of the Year – just in time for last-day shopping! The Reform-inator!

  1. Can be used to instantly fire and/or de-tenurize teachers. However, in order to use the reform-inator for these purposes you must line up 100 teachers including all of the good, bad and average ones. The reforminator is a bit touchy… and misfires quite frequently … hitting an average teacher instead of a truly bad one about 35% of the time, and hitting a good teacher instead of a truly bad one about 20% of the time. But what the heck… go for it. Thin the herd. Probabilities are in your favor, if only marginally. And besides, there will be plenty more teachers willing to step up and face the firing line next year.
  2. Can be used to instantly replicate (or new reformy term: scalify, or scalification) only the upper half of charter schools, because we all know that the upper half of charter schools are … well… better than average ones, and well… good charters are good… and bad ones bad (but no need to talk about those, just as there’s no need to talk about the good traditional public schools)… so we really want to replicate and expand only those good charters (primarily by reduced regulation, increased numbers of authorizers and reduced oversight requirements, even though the track record to date hasn’t really shown that to be easily accomplished).
  3. Can be used to take anything that is presently about 7% smaller than it was in the past, and make it disappear entirely – GONE… ALL GONE… just like all of the money for public schools. It’s not just recessed – temporarily diminished – It’s just gone. Vanished. Time to shut it all down! No more sweetheart deals (especially in those really crazy overspending states like Arizona and Utah)!
  4. Can instantly make value-added estimates of teacher effectiveness the “true” measure of teacher effectiveness, and further, can make value-added estimates of teacher effectiveness a stronger predictor of themselves… which of course, are the true measure of effectiveness (stronger than a weak to moderate correlation, that is). Use the special self-validation trigger for this particular effect. Also works for low self-esteem.
  5. Can be used to locate Superman (‘cuz I sure can’t find him in these scatterplots of NYC charter school performance compared to traditional public schools, or these from Jersey either).
  6. Will eliminate entirely anything that might be labeled as Status Quo! Because we all know that if it’s status quo – it’s got to go (or at the very least, the first reformy rule of logic: “anything is better than the status quo”)
  7. Most importantly, like any good REFORMY tool, it’s got a Trigger!

Other ideas?

Is it the “New Normal” or the “New Stupid?”

I’ll admit from the start that I’m recycling some arguments here (okay… all of the arguments) … but this stuff needs to be reinforced, over and over again. Quite honestly, to me, from a school finance perspective, this is the most important issue that has surfaced in the past year, and potentially the most dangerous and damaging for the future of American public education.

Robert Reich of Berkeley recently wrote of the Attack on American Education:

http://wallstreetpit.com/54502-the-attack-on-american-education

Specifically, Reich pointed to substantial budget cuts across states as evidence of our de-investment in public schooling. Here are the first three states (by alphabetical order), and the education spending cuts mentioned by Reich in his blog post:

  • Arizona has eliminated preschool for 4,328 children, funding for schools to provide additional support to disadvantaged children from preschool to third grade, aid to charter schools, and funding for books, computers, and other classroom supplies. The state also halved funding for kindergarten, leaving school districts and parents to shoulder the cost of keeping their children in school beyond a half-day schedule.
  • California has reduced K-12 aid to local school districts by billions of dollars and is cutting a variety of programs, including adult literacy instruction and help for high-needs students.
  • Colorado has reduced public school spending in FY 2011 by $260 million, nearly a 5 percent decline from the previous year. The cut amounts to more than $400 per student.

As I have mentioned on numerous previous occasions, even the assumption that these cuts represent “de-investment” (suggesting cutting back on something that has been scaled up over time) is flawed, because it accepts that these states actually invested to begin with. Reich points out that the current attack is a seemingly unprecedented attack on public education budgets across states, in both K-12 and higher education and arguably an attack on promoting an educated society more generally:

Have we gone collectively out of our minds? Our young people — their capacities to think, understand, investigate, and innovate — are America’s future. In the name of fiscal prudence we’re endangering that future.

But even Reich’s arguments fail to point out that in many of these states, the attack on education and de-investment (if there ever was significant investment, or scale up) has been occurring for decades. In good times, and in bad… Bad economic times just provide a more convenient excuse. Couple that with all of the new rhetoric about the “New Normal” and the excuses to slash-and-burn public school funding are at an all time high.

Let’s review:

First, here’s where the above three states fit into comparisons of state and local education revenue per pupil. Yes, some of the higher spending states are cutting back as well, if you read down Reich’s list of education spending cuts, but these three states have a particularly rich history of low spending and education cutbacks (including year after year mid-year funding rescissions, even in good economic times in Colorado).

Figure 1

Okay, so who cares if they aren’t spending that much. Maybe it’s because they’ve been taxing themselves to death… like we all have, obviously… we all know that… and that education spending is simply eating away at their economies. It’s just not sustainable!

So, here are direct expenditures on education (k-12 and higher ed) as a percent of aggregate personal income for each state. California has been flat, and low for over 30 years and Colorado and Arizona which were once relatively high, have decreased their effort consistently for about 30 years, in a race to the bottom.

Figure 2

Total Direct Education Spending as a Percent of Personal Income

Yeah but… yeah but….yeah but… it’s because their total taxes are so darn high. This is just education. Well then:

Figure 3

Yes, even on these, California is perhaps somewhat above average, whereas Colorado in recent years has been sitting near the bottom. Arizona jumped up in recent years, but is by no means high, compared with other states or trended, over time, out of control.

But even then, we know they’ve all gone wild on teacher hiring… bloating that teacher workforce, reducing class sizes and pupil teacher ratios to inefficiently low levels:

Figure 4

Pupil to Teacher Ratios over Time

Okay, well maybe not California, Arizona or Colorado (or Utah… in Gray at the top of the figure). California did increase teacher numbers in the late 1990s with class size reduction, but that flattened out, and pupil to teacher ratios have increased since, with lack of financial support.

But we all know that none of this matters anyway, right?

In fact, REFORMY logic dictates that it’s those states which have been spending like crazy, wasting their effort and paying for way too many teachers that are a real drag on our national test scores AND our economy.

The problem is not states like California, Arizona or reformy standouts like Colorado (or Tennessee or Louisiana), but rather, those over-educated curmudgeonly high spending non-reformy, low pupil teacher ratio states like Vermont, Massachusetts and New Jersey.

They – yes they – with their gold-plated schools are the shame of our nation (and why we can’t be Finland, right?)!  Our national education emergency (if there is one) is certainly not the fault of those states exercising consistent and appropriate fiscal austerity in good times or in bad.

Well:

Figure 5

Relationship Between State & Local Revenue per Pupil (for high poverty districts) & NAEP Mean Scale Scores

www.schoolfundingfairness.org

On average, states like Arizona and California which have high need student populations, but have thrown their public schools under the bus, are a significant drag on our national performance.

And this is due to lack of effort as much as it is lack of capacity. Higher effort states also tend to be the higher spending states which also tend to have the higher outcomes. And, when taken as a separate group, these states compare quite favorably on international performance comparisons.

Figure 6

Relationship between Fiscal Effort and Level of Financial Resources

www.schoolfundingfairness.org

Finally, these differences in outcomes, effort and pupil to teacher ratios are not all about differences in poverty. Again, I’ve already pointed out that these states have high pupil-to-teacher ratios and low spending not because they are poor but rather because they don’t put up the effort.

And now we are boldly (and belligerently) encouraging them to “do more with less” by which we actually mean “do even less with less?”

To clarify how poverty rates fit within this picture, Figure 7 provides adjusted state poverty estimates (see citation below figure) and pupil to teacher ratios. At their respective poverty levels, each of these states has higher – if not much higher – than average pupil to teacher ratios. They also have much lower than average per pupil spending.

Figure 7

State Cost Adjusted Poverty Estimates and Pupil to Teacher Ratios

Renwick, Trudi. Alternative Geographic Adjustments of U.S. Poverty Thresholds: Impact on State Poverty Rates. U.S. Census Bureau, August 2009

Further, while these states have higher pupil to teacher ratios than other states with similar poverty rates, they also have very low outcomes even compared to other states with similar corrected poverty rates. Colorado remains somewhat in the middle of the pack on outcomes, having a lower poverty population than either Arizona or California and also having more recently slashed and burned its public education system. Colorado pupil to teacher ratios have also remained closer to those of other states, and much lower than California or Arizona.

Figure 8

State Cost Adjusted Poverty Estimates and NAEP Mean Outcomes

 

How does this all fit into the long-run picture of investment in public schooling? Yes, we’ve had the most significant economic downturn in several decades. State budgets took a hit, and good information on that budget hit can be found at www.rockinst.org, where, among other things, data show that the most recent quarterly estimates of state revenue are still about 7% off their peak in 2008. That’s 7% – not 100%, not 20% (even more important is the variation across states). It’s a hole. But it’s not ALL GONE (and only a complete fool would argue as much)! Note that there have been in the past few decades at least two other significant economic slowdowns/downturns that affected state revenues and education spending – from about 1989 to 1992 – with lagged effects in some regions, and from 2001 to 2002 (post 9/11 shock). In some states, education spending rebounded in the wake of these downturns, but in others, state legislatures continued to constrain if not outright slash-and-burn state education budgets (while expanding tax cuts) throughout the economic good times that followed each downturn (1996ish to 2001 and 2002 to 2008).

What’s different now? Why are we sitting at the edge of a much more dangerous policy agenda? Well, the recent economic downturn was greater. But again, recent data shows the beginnings of a rebound. What is most different is that we are now faced with this completely absurd argument of The New Normal – as a national agenda to scale back education spending EVEN IN STATES WHERE IT HAD ALREADY BEEN SCALED BACK FOR DECADES. But who knew? Didn’t every state just spend out of its freakin’ mind for …oh… the past hundred years or so?

The New Normal argument that we must cut back our bloated education budgets and increase class sizes and pupil to teacher ratios back to reasonable levels is, at best, based on the shallowest understanding of (hyper-aggregated & overstated) national “trends” in education spending and pupil to teacher ratios, coupled with complete obliviousness to the variations in effort and spending and pupil to teacher ratios that exist across states, and for that matter, the demographic trends in some states which make it appear as if education spending has spiraled out of control (Vermont). That is, if we assume that those pitching-tweeting-blogging The New Normal have even the first clue about trends in education spending, state school finance systems, and the quality of public schooling across states to begin with. Personally, I’m not sure they do. In fact, I’m increasingly convinced they don’t.

A few comments on the Gates/Kane value-added study

(My apologies in advance for an excessively technical, research geeky post, but I felt it necessary in this case)

Take home points

1) As I read it, the new Gates/Kane value-added findings are NOT by any stretch of the imagination an endorsement of using value-added measures of teacher effectiveness for rating individual teachers as effective or not or for making high-stakes employment decisions. In this regard, the Gates/Kane findings are consistent with previous findings regarding stability, precision and accuracy of rating individual teachers.

2) Even in the best of cases, measures used in value-added models remain insufficiently precise or accurate to account for the differences in children served by different teachers in different classrooms (see discussion of poverty measure in first section, point #2 below).

3) Too many of these studies, including this one, adopt the logic that value-added outcomes can be treated both as a measure of effectiveness to be investigated (independent variable) and as the true measure of effectiveness (the dependent measure). That is, this study like others evaluates the usefulness of both value added measures and other measures of teacher quality by their ability to predict future (or different group) value-added measures. Certainly, the deck is stacked in favor of value added measures under such a model. See value-added as a predictor of itself below.

4) Value-added measures can be useful for exploring variations in student achievement gains across classroom settings and teachers, but I would argue that they remain of very limited use for identifying more precisely or accurately, the quality of individual teachers.  Among other things, the most useful findings in the new Gates/Kane study apply to very few teachers in the system (see final point below).

Detailed discussion

Much has been made of the preliminary findings of the Gates Foundation study on teacher effectiveness. Jason Felch of the LA Times has characterized the study as an outright endorsement of the use of Value-added measures as the primary basis for determining teacher effectiveness. Mike Johnston, the Colorado State Senator behind that state’s new teacher tenure law, which requires that 50% of teacher evaluation be based on student growth (and tenure and removal of tenure based on the evaluation scheme), also seemed thrilled – via twitter – that the Gates study found that value-added scores in one year predict value-added scores in another – seemingly assuming this finding unproblematically endorses his policies (?) (via Twitter: SenJohnston Mike Johnston New Gates foundation report on effective teaching: value added on state test strongest predictor of future performance).

But, as I read it, the new Gates study is – even setting aside its preliminary nature – NOT AN OUTRIGHT ENDORSEMENT OF USING VALUE-ADDED MEASURES AS A SIGNIFICANT BASIS FOR MAKING HIGH STAKES DECISIONS ABOUT TEACHER DISMISSAL/RETENTION, AS IS MANDATED VIA STATE POLICIES LIKE THOSE ADOPTED IN COLORADO – OR AS SUGGESTED BY THE ABSURDLY NARROW APPROACH FOR “OUTING” TEACHERS TAKEN BY MR. FELCH AND THE LA TIMES.

Rather, the new Gates study tells us that we can use value-added analysis to learn about variations in student learning (or at least in test score growth) across classrooms and schools and that we can assume that some of this variation is related to variations in teacher quality. But, there remains substantial uncertainty in the capacity to estimate whether any one teacher is a good teacher or a bad one.

Perhaps the most important and interesting aspects of the study are its current and proposed explorations of the relationship between value-added measures and other measures, including student perceptions, principal perceptions and external evaluator ratings.

Gates Report vs. LA Times Analysis

In short, data quality and modeling matter, but you can only do so much.

For starters, let’s compare some of the features of the Gates study value added models to the LAT models. These are some important differences to look for when you see value-added models being applied to study student performance differences across classrooms – especially where the goal is to assign outcome effects to teachers.

  1. The LA Times model, like many others, uses annual achievement data (as far as I can tell) to determine teacher effectiveness, whereas the Gates study at least explores the seasonality of learning – or more specifically, how much achievement change occurs over the summer (which is certainly outside of a teacher’s control AND differs across students by their socioeconomic status). One of the more interesting findings of the Gates study is that from 4th grade on: “The norm sample results imply that students improve their reading comprehension scores just as much (or more) between April and October as between October and April in the following grade. Scores may be rising as kids mature and get more practice outside of school.” This means that if there exist substantial differences in summer learning by students’ family income level and/or other factors, as has been found in other studies, then using annual data could significantly and inappropriately disadvantage teachers who are assigned students whose reading skills lagged over the summer. The existing blunt indicator of low income status is unlikely to be sufficiently precise to correct for summer learning differences.
  2. The LA Times model did include such blunt measures for poverty status and language proficiency, as well as disability status (single indicator), but later found shares of gifted children to be associated with differences in teacher ratings, along with student race. The Gates study includes similarly crude indicators of socioeconomic status, but does include in their value-added model whether individual children are classified as gifted. It also includes student race and the average characteristics of students in each classroom (peer group effect). This is a much richer and more appropriate model, but still likely insufficient to fully account for the non-random distribution of students. That is, the Gates study models at least attempt to correct for the influence of peers in the classroom in addition to individual characteristics of students, but even this may be insufficient. One particular concern of mine is the use of a single dichotomous measure of child poverty – whether the child qualifies for free or reduced price lunch – and the share of children in each class who do. The reality is that in many urban public schooling settings like those involved in the Gates study, several elementary/middle schools have over 80% children qualifying for free or reduced lunch, but this apparent similarity is no guarantee of similar poverty conditions among the children in one school or classroom compared to another. One classroom might be filled 80% with children whose family income is at or below 100% of the income threshold for poverty, whereas another classroom might be filled with 80% children whose income is 85% higher (at the threshold for “reduced” price lunch). This is a big difference that is not captured with this crude measure.
  3. The LAT analysis uses a single set of achievement measures. Other studies like the work of Sean Corcoran (see below) using data from Houston, TX have shown us the relatively weak relationship between value-added ratings of teachers produced by one test and value added ratings of teachers produced by another test. Thankfully, the Gates foundation analysis takes steps to explore this question further, but I would argue, overstates the relationship they found between tests or states that relationship in a way that might be misinterpreted by pundits seeking to advance the use of value-added for high stakes decisions (more later).
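The poverty-measure concern in point 2 can be illustrated with a small sketch. The two classrooms below are invented: each has 20 children, 16 of whom fall under the reduced price lunch cutoff (185% of the poverty threshold), but in one classroom those 16 sit at the poverty line and in the other they sit right at the cutoff. The 3.0 ratio for the remaining children is an arbitrary placeholder.

```python
from statistics import fmean

REDUCED_LUNCH_CUTOFF = 1.85  # family income as a multiple of the poverty threshold


def qualifies_for_free_or_reduced(income_to_poverty_ratio):
    """The single yes/no flag a value-added model would see."""
    return income_to_poverty_ratio <= REDUCED_LUNCH_CUTOFF


def share_flagged(classroom):
    """Share of the classroom carrying the free or reduced lunch flag."""
    return fmean(1.0 if qualifies_for_free_or_reduced(r) else 0.0 for r in classroom)


def mean_ratio_among_flagged(classroom):
    """Average income-to-poverty ratio of the flagged children (hidden from the model)."""
    return fmean(r for r in classroom if qualifies_for_free_or_reduced(r))


# Hypothetical classrooms, expressed as income-to-poverty ratios.
classroom_at_poverty_line = [1.0] * 16 + [3.0] * 4
classroom_at_reduced_cutoff = [1.85] * 16 + [3.0] * 4

for label, room in [("at poverty line", classroom_at_poverty_line),
                    ("at reduced cutoff", classroom_at_reduced_cutoff)]:
    print(label, share_flagged(room), mean_ratio_among_flagged(room))
```

Both classrooms show up as 80% free or reduced lunch, so a model using only the flag and the classroom share treats them as equivalent, even though family income among the flagged children differs by 85%.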

Learning about Variance vs. Rating Individual Teachers with Precision and Accuracy

If we are talking about using the value-added method to classify individual teachers as effective or ineffective and to use this information as the basis for dismissing teachers or for compensation, then we should be very concerned with the precision and accuracy of the measures as they apply to each individual teacher. In this context, one can characterize precision and accuracy as follows.

  • Precision – That there exists little error in our estimate that a teacher is responsible for producing good or bad student value-added on the test instrument used.  That is, we have little chance of classifying a good teacher as bad, an average teacher as bad, or vice versa.
  • Accuracy – That the test instrument and our use of it to measure teacher effectiveness is really measuring “true” effectiveness of the teacher – or truly how good that teacher is at doing all of the things we expect that teacher to do.

If, instead of classifying individual teachers as good or bad (and firing them, or shaming them in the newspaper or on milk cartons), we are actually interested in learning about variations in “effectiveness” across many teachers and many sections of students over many years, and whether student perceptions, supervisor evaluations, classroom conditions and teaching practices are associated with differences in effectiveness, we are less concerned about precise and accurate classification of individuals and more concerned about the relationships between measures, across many individuals (measured with error).  That is, do groups of teachers who do more of “X” seem to produce better value-added gains? Do groups of teachers prepared in this way seem to produce better outcomes? We are not concerned about whether a given teacher is accurately “scored.” Instead, we are concerned about general trends and averages.

The Gates study, like most previous studies, finds what I would call relatively weak correlations between the value-added score an individual teacher receives for one section of students in math or reading compared to another, and from one year to the next. The Gates research report noted:

“When the between-section or between-year correlation in teacher value-added is below .5, the implication is that more than half of the observed variation is due to transitory effects rather than stable differences between teachers. That is the case for all of the measures of value-added we calculated.”

Below is a table of those correlations – taken from their Table #5.

Unfortunately, summaries of the Gates study seem to obsess on how relatively high the correlation is from year to year for teachers rated by student performance on the state math test (.404) and largely ignore how much lower many of the other correlations are. Why is the correlation for the ELA test under .20 and what does that say about the high-stakes usefulness of the approach? Like other studies evaluating the stability of value-added ratings, the correlations seem to run between .20 and .40, with some falling below .20. That’s not a very high correlation – which then suggests not a very high degree of precision in figuring out which individual teacher is a good teacher versus which one is bad. BUT THAT’S NOT THE POINT EITHER!
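For readers who want to see why a correlation below .5 implies that more than half of the variation is transitory, here is a rough simulation. It rests on a simplifying assumption of mine, not the report's actual model: each teacher's score in a given year is a stable effect plus independent noise, with the same noise variance each year. Under that assumption the year-to-year correlation is approximately the stable share of total variance.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific library calls."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulated_year_to_year_correlation(stable_share, n_teachers=20000, seed=0):
    """Simulate two years of scores where `stable_share` of the variance is stable."""
    rng = random.Random(seed)
    stable_sd = stable_share ** 0.5
    noise_sd = (1 - stable_share) ** 0.5
    year1, year2 = [], []
    for _ in range(n_teachers):
        stable = rng.gauss(0, stable_sd)          # persists across years
        year1.append(stable + rng.gauss(0, noise_sd))  # transitory part, year 1
        year2.append(stable + rng.gauss(0, noise_sd))  # transitory part, year 2
    return pearson(year1, year2)


for share in (0.2, 0.4, 0.5):
    print(share, round(simulated_year_to_year_correlation(share), 3))
```

Read in reverse, a correlation near .40 corresponds to roughly 40% stable and 60% transitory variance in this toy model, and a correlation under .20 corresponds to more than 80% transitory.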

Now, the Gates study rightly points out that lower correlations do not mean that the information is entirely unimportant. The study focuses on what it calls “persistent” effects or “stable” effects, arguing that if there’s a ton of variation across classrooms and teachers, being able to explain even a portion of that variation is important – A portion of a lot is still something. A small slice of a huge pie may still provide some sustenance. The report notes:

“Assuming that the distribution of teacher effects is “bell-shaped” (that is, a normal distribution), this means that if one could accurately identify the subset of teachers with value-added in the top quartile, they would raise achievement for the average student in their class by .18 standard deviations relative to those assigned to the median teacher. Similarly, the worst quarter of teachers would lower achievement by .18 standard deviations. So the difference in average student achievement between having a top or bottom quartile teacher would be .36 standard deviations.” (p.19)

The language here is really, really important, because it speaks to a theoretical and/or hypothetical difference between high and low performing teachers drawn from a very large analysis of teacher effects (across many teachers, classrooms, and multiple years). THIS DOES NOT SPEAK TO THE POSSIBILITY THAT WE CAN PRECISELY AND ACCURATELY IDENTIFY WHETHER ANY SINGLE TEACHER FALLS IN THE TOP OR BOTTOM GROUP! It’s a finding that makes sense when understood correctly but one that is ripe for misuse and misunderstanding.

Yes, in probabilistic terms, this does suggest that if we implement mass layoffs in a system as large as NYC and base those layoffs on value-added measures, we have a pretty good chance of increasing value-added in later years – assuming our layoff policy does not change other conditions (class size, average quality of those in the system – replacement quality). But any improvements can be expected to be far, far, far less than the .18 figure used in the passage above. Even assuming no measurement error – that the district is laying off the “right” teachers (a silly assumption) – the newly hired teachers can be expected to fall, at best, across the same normal curve. But I’ve discussed my taste for this approach to collateral damage in previous posts. In short, I believe it’s unnecessary and not that likely to play out as we might assume. (See discussion of reform engineers at bottom.)
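A back-of-the-envelope sketch of why the gain would fall short of .18 follows. It rests on assumptions that go beyond the report: teacher effects are normally distributed (as in the quoted passage), a measured score is the true effect plus independent noise, the reliability of the score equals the between-year correlation (.404 for the state math test), and replacement teachers have an average true effect of zero. The function names are hypothetical.

```python
from statistics import NormalDist


def top_quartile_mean_in_sd_units():
    """Mean of a standard normal variable, given that it falls in the top quartile."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(0.75)
    return std_normal.pdf(cutoff) / 0.25


def flagged_quartile_true_effect(true_quartile_effect, reliability):
    """Expected size of the TRUE effect for the quartile picked by a noisy score.

    With score = true effect + independent noise, the expected true effect of
    a group selected on the score shrinks by the square root of the reliability.
    """
    return true_quartile_effect * reliability ** 0.5


def system_wide_gain(true_quartile_effect, reliability):
    """Average gain over all classrooms when the flagged bottom quartile is
    replaced by teachers with an average true effect of zero (an assumption)."""
    return 0.25 * flagged_quartile_true_effect(true_quartile_effect, reliability)


# Teacher-effect standard deviation implied by the report's .18 quartile figure.
print(round(0.18 / top_quartile_mean_in_sd_units(), 3))

for reliability in (1.0, 0.404, 0.20):
    print(reliability,
          round(flagged_quartile_true_effect(0.18, reliability), 3),
          round(system_wide_gain(0.18, reliability), 3))
```

Under these assumptions, a reliability of .404 puts the expected true effect of the flagged quartile near .11 rather than .18, and the gain averaged over all classrooms comes in under .03 standard deviations. With a reliability of .20 the numbers are smaller still. Even with perfect measurement, the system-wide figure is .045, because only a quarter of classrooms change teachers.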

A Few more Technical Notes

Persistent or Stable Effects: The Gates report focuses on what it terms “persistent” effects of teachers on student value-added – assuming that these persistent effects represent the consistent, over time or across sections influence of a specific teacher on his/her students’ achievement gains. The report focuses on such “persistent” effects for a few reasons. First, the report uses this discussion to, I would argue, overplay the persistent influence teachers have on student outcomes – as in the quote above which is later used in the report to explain the share of the black-white achievement gap that could be closed by highly effective teachers. The assertion is that even if teacher effects explain a small portion of variations in student achievement gains, if variations in those gains are huge, then explaining a portion is important. Nonetheless, the persistent effects remain a relatively small portion (as high as a “modest” portion in some cases) – which dramatically reduces the precision with which we can identify the effectiveness of any one teacher (taking as given that the tests are the true measure of effectiveness – the validity concern).

AND, I would argue that it is a stretch to assume that the persistent effects within teachers are entirely a function of teacher effectiveness. The persistent effect of teachers may also include the persistent characteristics of students assigned to that teacher – that the teacher, year after year, and across sections is more likely to be assigned the more difficult students (or the more expert students). Persistent pattern yes. Persistent teacher effect? Perhaps partially (How much? Who knows?).

Like other studies, the identification of persistent effects from year to year, or across sections in the new Gates study merely reinforces that with more sections and/or more years of data (more students passing through) for any given teacher, we can gain a more stable value-added estimate and more precise indication of the value-added associated with the individual teacher. Again, the persistent effect may be a measure of the persistence of something other than the teacher’s actual effectiveness (teacher X always has the most disruptive kids, larger classes, noisiest/hottest/coldest – generally worst classroom).  The Gates study does not (BECAUSE IT WASN’T MEANT TO) assess how the error rate of identifying a teacher as “good” or “bad” changes with each additional year of data, but given that other findings are so consistent with other studies, I would suspect the error rate to be similar as well.
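How much more stable an estimate gets with more years can be sketched with the Spearman-Brown formula from classical test theory. This is a swapped-in textbook formula, not something the Gates report computes in this form. It assumes each year's score is the same stable component plus independent noise of equal size, which is exactly the assumption questioned above when persistent classroom conditions are in play. The function name is hypothetical.

```python
def stable_share_of_average(single_year_correlation, n_years):
    """Spearman-Brown: share of variance in an n-year average that is stable,
    given the correlation between any two single-year scores."""
    r = single_year_correlation
    return n_years * r / (1.0 + (n_years - 1) * r)


# Starting points taken from the text: .404 (state math) and .20 (low end).
for r in (0.404, 0.20):
    for years in (1, 2, 3, 5):
        print(r, years, round(stable_share_of_average(r, years), 3))
```

Starting from .404, the stable share of a two-year average is about .58 and of a three-year average about .67. Starting from .20, three years of data still leave well under half of the variation stable. None of this says whether the stable part reflects the teacher or something else that persists.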

Differences Between Tests: The Gates study provides some useful comparisons of value-added ratings of teachers on one test, compared with ratings of the same teachers on another test – a) for kids in the same section in the same year, and b) for kids in different sections of classes with the same teacher.

Note that in a similar analysis, Corcoran, Jennings and Beveridge found:

“among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

That is, analysis of teacher value-added ratings on two separate tests called into question the extent to which individual teachers might accurately be classified as effective when using a single testing instrument. If we assume both tests measure how effective a teacher is at teaching “math,” or a specific subject within “math,” then both tests should tell us the same thing about each teacher – which ones are truly effective math teachers and which ones are not. Corcoran’s findings raise serious questions about accuracy in this regard.

The Gates study argues that comparing teacher value-added across two math tests – where one is more conceptual – allows them to validate that doing well on one test, the state test, did not compromise conceptual learning, as long as the results are correlated with the other, more conceptual test. That seems reasonable enough, to the extent that the testing instruments are being appropriately described (and to the extent they are valid instruments). In terms of value-added ratings, the Gates study, like the Corcoran study, finds only a modest relationship between ratings of teachers based on one test and ratings based on the other:

“the correlation between a teacher’s value-added on the state test and their value-added on the Balanced Assessment in Math was .377 in the same section and .161 between sections.”
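To get a feel for how much reshuffling a correlation of this size implies, here is a rough simulation. It assumes ratings on the two tests follow a bivariate normal distribution and that teachers are sorted into five equal-sized categories on each test. Both are illustrative assumptions, and the setup is not a reconstruction of either the Gates or the Corcoran analysis. The function name, sample size and seed are hypothetical.

```python
import random


def share_top_on_a_low_on_b(rho, n_teachers=50000, seed=1):
    """Among teachers in the top fifth on test A, the share who land in the
    bottom two fifths on test B, when the two ratings correlate at rho."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_teachers):
        a = rng.gauss(0.0, 1.0)
        # Build b so that corr(a, b) is rho in expectation.
        b = rho * a + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        ratings.append((a, b))
    a_cut = sorted(a for a, _ in ratings)[int(0.8 * n_teachers)]  # 80th percentile on A
    b_cut = sorted(b for _, b in ratings)[int(0.4 * n_teachers)]  # 40th percentile on B
    top_a = [(a, b) for a, b in ratings if a >= a_cut]
    low_b = [pair for pair in top_a if pair[1] < b_cut]
    return len(low_b) / len(top_a)


# Correlations quoted above: .377 in the same section, .161 between sections.
for rho in (0.377, 0.161, 0.0, 1.0):
    print(rho, round(share_top_on_a_low_on_b(rho), 3))
```

The two reference points are easy to reason about: with no correlation, about 40 percent of the top group on one test falls in the bottom two categories on the other by chance alone, and with perfect correlation none do. The quoted correlations sit between those extremes.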

But the Gates study also explores the relationships between “persistent” components across tests – which must be done across sections taking the test in the same year (until subsequent years become available). They find:

“we estimate the correlation between the persistent component of teacher impacts on the state test and on BAM is moderately large, .54.”

“The correlation in the stable teacher component of ELA value-added and the Stanford 9 OE was lower, .37.”

I’m uncomfortable with the phrasing here that says – “persistent component of teacher impacts” – in part because there exist a number of other persistent conditions or factors that may be embedded in the persistent effect, as I discuss above. Setting that aside, however, what the authors are exploring is whether the correlated component – the portions of student performance on any given test that are assumed to represent teacher effectiveness – is similar between tests.

In any case, however, these correlations like the others in the Gates analysis are telling us how highly associated – or not – the assumed persistent component is across tests across many teachers teaching many sections of the same class.  This allows the authors to assert that across all of these teachers and the various sections they teach, there is a “moderately” large relationship between student performance on the two different tests, supporting the authors’ argument that one test somewhat validates the other. But again, this analysis, like the others in the report, does not suggest by any stretch of the imagination that either one test or the other will allow us to precisely identify the good teacher versus the bad one. There is still a significant amount of reshuffling going on in teacher ratings from one test to the next, even with the same students in the same class sections in the same year. And, of course, good teaching is not synonymous with raising a student’s test scores.

This analysis does suggest that we might – by using several tests – get a more accurate picture of student performance and how it varies across teachers, and does at least suggest that across multiple tests – if the persistent component is correlated – just like across multiple years – we might get a more stable picture of which teachers are doing better/worse.  Precise enough for high stakes decisions (and besides, how much more testing can we/they handle?)? I’m still not confident that’s the case.

Value-added is the best predictor of itself

This seems to be one of the findings that gets the most media-play (and was the basis of Senator Johnston’s proud tweets). Of course value-added is a better predictor of future value-added (on the same test and with the same model) than other factors are of future value-added – even if value-added is only a weak predictor of future (or different section) value-added. Amazingly, however, many of the student survey responses on factors related to things like “Challenge” seem almost as related to value-added as value-added to itself. That is a surprising finding, and I’m not sure yet what to make of it. [Note that the correlations between student ratings and VAM were for the same class and year, whereas VAM predicting VAM is a) across sections and b) across years.]

Again, the main problem with this VAM predicts VAM argument is that it assumes value-added ratings in the subsequent year to be THE valid measure of the desired outcome. But that’s the part we just don’t yet know. Perhaps the student perceptions are actually a more valid representation of good teaching than the value-added measure? Perhaps we should flip the question around? It does seem reasonable enough to assume that we want to see students improve their knowledge and skills in measurable ways on high quality assessments. Whether our current batch of assessments, as we are currently using them and as they are being used in this analysis accomplishes that goal remains questionable.

What is perhaps most useful about the Gates study and future research questions is that it begins to explore with greater depth and breadth the other factors that are – and are not – associated with student achievement gains.

Findings apply to a relatively small share of teachers

I have noted in other blog posts on this topic that in the best of cases (or perhaps worst if we actually followed through with it), we might apply value added ratings to somewhat less than 20% of teachers – those directly responsible and solely responsible for teaching reading or math to insulated clusters of children in grades 3 to 8 – well… 4-8, actually … since many VA models use annual data and the testing starts with grade 3. Even for the elementary school teachers who could be rated, the content of the ratings would exclude a great deal of what they teach. Note that most of the interesting findings in the new Gates study are those which allow us to evaluate the correlations of teachers across different sections of the same course in addition to subsequent years. These comparisons can only be made at the middle school level (and/or upper elementary, if taught by section). Further, many of the language arts correlations were very low, limiting the more interesting discussions to math alone. That is, we need to keep in mind that in this particular study, most of the interesting findings apply to no more than 5% to 10% of teachers – those involved in teaching math in the upper elementary and middle grades – specifically those teaching multiple sections of the same math content each year.


Still searching for that pot of gold

The rhetoric about our decades-long drunken spending spree just won’t stop, nor will the rhetoric that the money is all gone. All of it. Nothin’ left. We spent it all. We taxed ourselves to the limit and those damn teachers unions and public schools just took it all and left us with the bill. It’s gone! all gone!

Here are some recent quotes/comments from pundits who’ve done little analytically but to offer a few absurd back of the napkin explanations for why they believe that a) we’ve been on a drunken spending spree and b) it’s all gone!

Andy Rotherham in Time:

the golden age of school spending is likely coming to an end.

http://www.time.com/time/nation/article/0,8599,2035999,00.html

There’s so much more in this article, including statements about how it’s plainly obvious that for each worker added to a private firm, there is an immediate incremental return in production output (each additional worker adds $x worth of output to any private firm) whereas in education we continue to add workers and see nothing in return. Both parts of this assumption are… well… just nutty.

So, Rotherham has given us the argument that our “golden age” of school spending is coming to an end. And Mike Petrilli, in a twitter-battle with Diane Ravitch has laid down the Petrillian Truth (roll with that one Mike…it’s got a nice ring) that “The Money is Gone!”

MichaelPetrilli: That’s a great line, Diane, but it doesn’t solve the problem. The money is gone. We have to help schools cut smart.
http://educationnext.org/in-which-i-debate-diane-ravitch-in-140-characters-or-less/

That’s right. It’s all gone. It’s freakin’ gone. Cut, cut, cut. Cut it all. Zero out public education. It doesn’t matter what state you live in, what part of the country, your state has taxed you to the limit and has spent it all on the edu-bureaucracy. Every state… the whole nation has simply been pouring money into schools and they have to stop because the money is gone.

Okay, really, how much is gone? And has any of it come back yet? Is it really all gone forever? Is 20% gone, 50%, or perhaps even 70%? Must we reset the system to an average cost that is, say, 20% below where it was in 2008? 10?

You know, there are actually legitimate researchers and organizations out there tracking the condition of state and local revenues. And while these have been some tough times, their findings are somewhat less apocalyptic than the comments of Rotherham and Petrilli above… who don’t actually look at state budget data when making these claims. Here are the findings from the most recent quarterly report from the Rockefeller Institute:

The Rockefeller Institute’s compilation of data from 48 early reporting states shows collections from major tax sources increased by 3.9 percent in nominal terms compared to the third quarter of 2009, but was 7.0 percent below the same period two years ago. Gains were widespread, with 42 states showing an increase in revenues compared to a year earlier. After adjusting for inflation, tax revenues increased by 2.6 percent in the third quarter of 2010 compared to the same quarter of 2009. States’ personal income taxes represented a $2.5 billion gain and sales taxes a $2.0 billion gain for the period.
www.rockinst.org

Yes, revenues are down. State revenues are still rolling in about 7% below where they were in 2008, but in most states have begun to rebound… in order to reach that level. We took a hit. States took a hit. Some took a bigger hit than others and some are rebounding more quickly and others more slowly.
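The two Rockefeller figures quoted above can be combined to see how deep the trough was and how much of it has been recovered. This is simple arithmetic, assuming both percentages are on the same nominal basis and cover the same set of states. The function names are hypothetical.

```python
def implied_2009_level(growth_since_2009, shortfall_vs_2008):
    """Q3 2009 collections as a fraction of Q3 2008, implied by the two figures."""
    level_2010 = 1.0 - shortfall_vs_2008
    return level_2010 / (1.0 + growth_since_2009)


def share_of_drop_recovered(growth_since_2009, shortfall_vs_2008):
    """Fraction of the 2008-to-2009 drop that had been regained by Q3 2010."""
    level_2010 = 1.0 - shortfall_vs_2008
    level_2009 = implied_2009_level(growth_since_2009, shortfall_vs_2008)
    return (level_2010 - level_2009) / (1.0 - level_2009)


# 3.9 percent nominal growth over Q3 2009, 7.0 percent below Q3 2008.
print(round(implied_2009_level(0.039, 0.070), 4))
print(round(share_of_drop_recovered(0.039, 0.070), 3))
```

On these figures, collections in the third quarter of 2009 were roughly 10.5 percent below the same quarter of 2008, and about a third of that drop had been recovered a year later. That is a hit, not a disappearance.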

But, I must also reiterate that not every state really put their heart into public schools or the combination of their elementary and secondary and higher education systems to begin with. Many have already been systematically reducing their spending effort for years.

A few national graphs first. Here’s total state and local government expenditure as a share of personal income over time. Yes, on average, it has climbed slightly over 30 years. And, it has oscillated in between, with government expenditure (state and local) declining as a share of personal income during those periods when personal income grew quickly.

Elementary, secondary and higher education do make up a sizable share of this spending – albeit not clearly a drunken spree. Here’s education direct expenditures as a share of state and local general expenditures over the same time period.

So, the reality is that education spending first declined as a share of general spending and has since leveled off. So actually, it may be some of that other stuff that’s creating pressure on the system, a point duly acknowledged by Rotherham. But, the current argument seems to be that public schools are discretionary – negotiable – and all of that other stuff is not. Either way, even the total growth in the previous figure is not that disconcerting.  A whole other discussion for a later point in time is the issue of how many states have kicked non-current expenditures (pension obligations and other debt) down the road for someone else to deal with.

Most importantly, however, here are the differences in direct education spending as a share of personal income across states. When it comes to public K-12 and higher education systems, states vary widely. Some have provided high levels of support for schools, allocated that support fairly and maintained appropriate levels of effort to finance their education systems. Others have thrown their education systems under the bus. They don’t need some data-proof ideologue to tell them that the money is gone and now’s the time to cut.

This figure, like the ones in my previous “bubble” post, shows the variation in “effort” across states – measured somewhat differently – but same conclusion. That’s the thing – I keep taking different angles on these data and they keep telling me similar stories – that many states have actually systematically reduced their “effort” to finance public education systems over time, and yes, some have increased effort. And, there’s an interesting story behind each trend. Again, Vermont has systematically scaled up education spending relative to personal income over time. New Jersey has increased over time as well, but New Jersey has only risen to a relatively below average position over time. By contrast, Colorado and Arizona both provide LESS DIRECT SPENDING ON EDUCATION AS A SHARE OF PERSONAL INCOME IN 2008 THAN THEY DID IN 1977!!!!!!!!!! And they are not the only ones. Perhaps those states need a correction in the other direction?

It will indeed be interesting to see how these “effort” measures shift as income takes a temporary hit, and a bigger one than it has in the past. Most of the differences in the level of “effort” in the above figure are a function of income. States with higher personal income are able to raise what they need in education spending with a much smaller share of income. Even New Jersey, which is a relatively high spending state, has relatively low effort. Other lower effort states include Connecticut and Massachusetts.

But, back to the point – These national aggregate claims that we’re tapped out – all of us – and every state – are entirely inappropriate and irresponsible. Let’s take a hard look and a more precise look at what’s really going on. Let’s focus our attention on useful quarterly reports like those from Rockefeller Institute on the condition of state revenue and let’s provide appropriately differentiated instruction to states based on the widely varied conditions they face and the widely varied levels of effort they’ve applied thus far toward improving their education systems. The current rhetoric is unhelpful, and sadly, I think that’s the point!

The problem? Cheerleading and Ceramics, of course!

David Reber with the Topeka Examiner had a great post a while back (April, 2010) addressing the deceptive logic that we should be outraged by supposedly exorbitant spending on things like cheerleading and ceramics, and not worry so much about the little things, like disparities between wealthy and poor school districts. I finally saw this post today, from a tweet, and realized I had not yet blogged on this topic.

This logic/argument comes from the “research” of Marguerite Roza, who, well, has a track record of making such absurd arguments in an effort to place blame on poor urban districts and take attention away from disparities between poor urban districts and their more affluent suburban neighbors.

This new argument is really just more of the same ol’ flimsy logic from this crew. For the past several years, Roza and colleagues have attempted to argue that states have largely done their part to fix inequities in funding between school districts, and that now, the burden falls on local public school districts to clean up their act. Here’s an excerpt from one of my recent articles on this topic:

On other occasions, Roza and Hill have argued that persistent between-district disparities may exist but are relatively unimportant. Following a state high court decision in New York mandating increased funding to New York City schools, Roza and Hill (2005) opined: “So, the real problem is not that New York City spends some $4,000 less per pupil than Westchester County, but that some schools in New York [City] spend $10,000 more per pupil than others in the same city.” That is, the state has fixed its end of the system enough.

This statement by Roza and Hill is even more problematic when one dissects it more carefully. What they are saying is that the average of per pupil spending in suburban districts is only $4,000 greater than spending per pupil in New York City but that the difference between maximum and minimum spending across schools in New York City is about $10,000 per pupil. Note the rather misleading apples-and-oranges issue. They are comparing the average in one case to the extremes in another.

In fact, among downstate suburban[1] New York State districts, the range of between-district differences in 2005 was an astounding $50,000 per pupil (between the small, wealthy Bridgehampton district at $69,772 and Franklin Square at $13,979). In that same year, New York City as a district spent $16,616 per pupil, while nine downstate suburban districts spent more than $26,616 (that is, more than $10,000 beyond the average for New York City). Pocantico Hills and Greenburgh, both in Westchester County (the comparison County used by Roza and Hill), spent over $30,000 per pupil in 2005.[2] These numbers dwarf even the purported $10,000 range within New York City (a range that we agree is presumptively problematic); our conclusion based on this cursory analysis is that the bigger problem likely remains the between-district disparity in funding.

http://epaa.asu.edu/ojs/article/viewFile/718/831

My article (with Kevin Welner) goes on to show how states have far from resolved between district disparities and that New York State in particular has among the most substantial persistent disparities between wealthy and poor school districts. For more information on persistent between district disparities that really do exist, see: Is School Funding Fair?.
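The apples-and-oranges problem in the Roza and Hill comparison is easy to lay out with the per pupil figures quoted in the excerpt above. The sketch below only restates those numbers side by side, range against range and average gap against average gap. The variable names are hypothetical.

```python
# Total expenditures per pupil, 2005, as quoted in the excerpt above.
bridgehampton = 69772
franklin_square = 13979
new_york_city = 16616

# Figures attributed to Roza and Hill in the excerpt.
purported_range_within_nyc = 10000
purported_average_gap_to_westchester = 4000

# Like-for-like: a range between districts set against a range within the city.
between_district_range = bridgehampton - franklin_square
print("between-district range:", between_district_range)
print("purported within-city range:", purported_range_within_nyc)
print("ratio:", round(between_district_range / purported_range_within_nyc, 1))

# Threshold used in the excerpt: districts more than $10,000 above the NYC figure.
print("threshold:", new_york_city + 10000)
```

Set range against range, and the between-district spread among downstate suburban districts is more than five times the purported spread within New York City. The $4,000 figure is a gap between averages and cannot be compared to either range.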

I have a forthcoming paper this spring where I begin to untangle the new argument about poor urban districts really having plenty of money but simply wasting it on cheerleading and ceramics. Here’s a draft of a section of the introduction to that paper:

A handful of authors, primarily in non-peer reviewed and think tank reports posit that poor urban school districts have more than enough money to achieve adequate student outcomes and simply need to reallocate what they have toward improving achievement on tested subject areas. These authors, including Marguerite Roza and colleagues of the Center for Reinventing Public Education encourage public outrage that any school district not presently meeting state outcome standards would dare to allocate resources to courses like ceramics or activities like cheerleading. To support their argument, the authors provide anecdotes of per pupil expense on cheerleading being far greater than per pupil expense on core academic subjects like math or English.

Imagine a high school that spends $328 per student for math courses and $1,348 per cheerleader for cheerleading activities. Or a school where the average per-student cost of offering ceramics was $1,608; cosmetology, $1,997; and such core subjects as science, $739.[1]

These shocking anecdotes, however, are unhelpful for truly understanding resource allocation differences and reallocation options. For example, the major reason why cheerleading or ceramics expenses per pupil are highest is the relatively small class sizes, compared to those in English or Math. In total, the funds allocated to either cheerleading or ceramics are unlikely to have much if any effect if redistributed to reading or math.
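A quick sketch shows why per-participant figures mislead here. The per pupil costs come from the anecdote quoted above ($1,348 per cheerleader, $328 per math student). The enrollment counts are invented purely for illustration, since the anecdote gives none, and the function name is hypothetical.

```python
def extra_dollars_per_math_student(cost_per_participant, participants, math_students):
    """Dollars per math student gained if an activity's whole budget moved to math."""
    total_freed = cost_per_participant * participants
    return total_freed / math_students


# From the quoted anecdote.
cheer_cost_per_participant = 1348
math_cost_per_student = 328

# Hypothetical enrollments, NOT from the source: a 20-member squad, 1,200 math students.
cheerleaders = 20
math_students = 1200

extra = extra_dollars_per_math_student(cheer_cost_per_participant, cheerleaders, math_students)
print(round(extra, 2))
print(round(100 * extra / math_cost_per_student, 1), "percent of the math figure")
```

Under these made-up enrollments, eliminating cheerleading entirely would add about $22 per math student, under 7 percent of the $328 figure. Different enrollments change the number, but the mechanism is the same: a high cost per participant spread over few participants is a small total.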

Further, the requirement that poor urban (or other) districts currently falling below state outcome standards must re-allocate any and all resources from co-curricular and extracurricular activities toward improving achievement on tested outcomes may increase inequities in the depth and breadth of curricular offerings between higher and lower poverty schools – inequities that may be already quite substantial. That is, it may already be the case that higher poverty districts and those facing greater resource constraints are reallocating resources toward core, tested areas of curriculum and away from more advanced course offerings which extend beyond the tested curriculum and enriched opportunities including both elective courses and extracurricular activities.  Some evidence on this point already exists.

The perspective that low performing districts merely need to reallocate what they already have is particularly appealing in the current fiscal context, where state budgets and aid allocations to local public school districts are being slashed. Accepting Roza’s logic, states under court mandates or in the shadows of recent rulings regarding educational adequacy, but facing tight budgets may simply argue that high poverty and/or low performing districts should shift all available resources into the teaching of core, tested subjects. Lower poverty districts with ample resources that exceed minimum outcome standards face no such reallocation obligations, leading to substantial differences in depth and breadth of curriculum. Arguably a system that is both adequate and fair would protect the availability of deep and broad curriculum while simultaneously attempting to improve narrowly measured outcomes.

More later as this research progresses.


[1] “Downstate Suburban” refers to areas such as Westchester County and Long Island and is an official regional classification in the New York State Education Department Fiscal Analysis and Research Unit Annual Financial Reports data, which can be found here: http://www.oms.nysed.gov/faru/PDFDocuments/2008_Analysis.pdf and http://www.oms.nysed.gov/faru/Profiles/profiles_cover.html

[2] Interestingly, however, Bridgehampton and New York City have relatively similar “costs” due to Bridgehampton’s small size and New York City’s high student needs (see Duncombe and Yinger, 2009). The figures offered in this paragraph are based on Total Expenditures per Pupil from State Fiscal Profiles 2005. http://www.oms.nysed.gov/faru/Profiles/profiles_cover.html. Results are similar when comparing current operating expenditures per pupil.

Potential abuses of the Parent Trigger???

This article in the LA Times has been getting a lot of buzz today – http://www.latimes.com/news/local/la-me-compton-parents-20101207,0,1116485.story

The article discusses the use of what is called a “parent trigger” policy.  Here’s the synopsis:

On Tuesday, they intend to present a petition signed by 61% of McKinley parents that would require the Compton Unified School District to bring in a charter company to run the school. Charter schools are independently operated public schools.

“I know it’s never been done before, but I want to step up because I’m a parent who cares about my children and their education,” Murphy said Monday. She and other parents were meeting with organizers from Parent Revolution, a nonprofit that lobbied successfully last year for the so-called parent-trigger law.

So, what you’ve got is 61% of parents in a community pushing for a school to be converted to a charter school and potentially pushing for that school to be a specific type of charter school. This presents all sorts of interesting – and twisted possibilities.

I wrote about a week ago on how some charter schools, like North Star Academy in Newark have established themselves as the equivalent of elite magnet schools – potentially engaging in activities such as pushing out lower performing kids over time.

So, my question for the day is whether these “parent trigger” policies might allow a simple majority of parents – or some defined majority share – to force a reorganization of their neighborhood school into a charter – that would subsequently weed out those other “less desirable kids?”

That is, does this new policy of simple majority (mob) rule allow parents in a specific community to redefine their neighborhood school so that the school no longer serves lower performing kids or kids whose parents are less able or for that matter less interested in engaging in a level of parent involvement that might be required by a specific charter operator? In short, can the majority of parents effectively kick out a minority of parents that they don’t like – including parents of kids with disabilities or non-English speaking parents?

Sure, you say – charters can’t discriminate in this way because they must rely on lotteries for admissions and must take children with disabilities and those unable to speak English. They would have to accept those kids in the neighborhood. Yes, by law this might be true. But experience with many charters proves otherwise. Many do rely on attrition to boost scores, and somehow avoid serving kids with disabilities and non-English speaking kids. But the neighborhood school couldn’t do the same.

Taking this a step further, envision a neighborhood split along language, ethnic or even religious lines. Can the parents of the majority group force their neighborhood school to be reconstituted as a cultural, language or for that matter religion (argued as culture) specific school that is effectively hostile to the minority?

Hey education law friends – help me out with the possibilities here?