Blog

School Funding Deception Alert! (in a CAN)

I’ve noticed a pattern in a few recent school funding proposals: shoddy, haphazard plans developed on behalf of the CANs (ConnCAN & its close relatives), often with “technical support” from Bryan Hassel of Public Impact. Let’s call it school finance reform in a CAN.

These new simplified school funding formula proposals, framed under the “money follows the child” ideology, are intended to make state school funding formulas more “transparent” and to allow for a more equitable and predictable flow of funding to charter schools or other non-district schools.

In each proposal (ConnCAN’s Spend Smart and The Tab, or Rhode Island’s new formula, which is laced with other problems unique to RI; see my earlier post), amid a variety of other major overlooked factors and arbitrary, unfounded recommendations, sits a seemingly innocuous proposal regarding how to target aid for variations in student needs across districts.

As the authors of ConnCAN’s recent Spend Smart brief explain, deep in a footnote, you really only need a single factor to target state aid to the right schools, and that factor is the share of children qualifying for FREE OR REDUCED PRICE LUNCH. There’s no need for a special factor for limited English proficient/English language learner populations, or anything else. It’s all pretty much correlated with free and reduced lunch. (Hassel’s previous report for ConnCAN, The Tab, included a trivially small LEP/ELL weight instead of none at all.)

First, this assumption is patently wrong to begin with and is never actually validated by the authors of these proposals. But let’s set that aside for the moment. I’ll have a future post where I use actual data to show just how freakin’ wrong the assumption is.

But why would they propose this anyway? Well, it turns out to be really simple. If a state has a fixed sum of money to distribute (generally how it works), the CAN game is to figure out on what basis charter schools might get the maximum share of that money – regardless of who really needs it most. That is, what measures CAN they choose for weightings that will drive money to charters. Charter schools do tend to operate in poorer communities (relative to state averages), but a) serve the less poor among the poor, b) serve few or no LEP/ELL children, and c) incidentally, also serve few or no children with disabilities (as has been addressed on my blog regarding NY and NJ charter schools, and will be addressed soon regarding CT charters – numbers already run, charts forthcoming). I’ll set aside c) for now.

So, the way to maximize charter funding is to give a single weight for children qualifying for free OR REDUCED PRICE LUNCH, and to negate any weight for LEP/ELL children (or make it as small as possible). That way, charters will get the same weight for kids whose family income falls between 130% and 185% of the poverty level as neighborhood schools get for children below the 130% level (this distinction is NOT TRIVIAL), even though neighborhood schools have far more of the lower-income children. Any money that would have gone to LEP/ELL children can be rolled into a bigger weight for free/reduced lunch children, channeling a larger share of the total funding available to charter schools.
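The arithmetic here is easy to sketch. Below is a minimal, purely illustrative Python example (all school figures and weights are hypothetical, not actual Connecticut data) showing how rolling the LEP/ELL money into a single free/reduced lunch weight shifts a fixed pot of supplemental aid toward a school with the same subsidized-lunch rate but no ELL students:

```python
# Illustrative only: two schools with identical free/reduced lunch (FRL)
# rates, but only the district school serves LEP/ELL children.
# All figures are hypothetical, not actual Connecticut data.
schools = {
    "district_school": {"enroll": 1000, "frl": 0.80, "ell": 0.25},
    "charter_school":  {"enroll": 1000, "frl": 0.80, "ell": 0.00},
}

BASE = 10_000  # hypothetical per-pupil base amount

def supplemental_aid(s, frl_weight, ell_weight):
    """Supplemental dollars under a weighted-pupil formula."""
    return s["enroll"] * BASE * (s["frl"] * frl_weight + s["ell"] * ell_weight)

# Two-factor formula: separate FRL (0.30) and ELL (0.20) weights (hypothetical)
two_factor = {k: supplemental_aid(s, 0.30, 0.20) for k, s in schools.items()}

# Single-factor formula: drop the ELL weight and inflate the FRL weight
# just enough to hold the total pot constant
pot = sum(two_factor.values())
total_frl_pupils = sum(s["enroll"] * s["frl"] for s in schools.values())
frl_only_weight = pot / (BASE * total_frl_pupils)
single_factor = {k: supplemental_aid(s, frl_only_weight, 0.0)
                 for k, s in schools.items()}

for k in schools:
    print(k, round(two_factor[k]), "->", round(single_factor[k]))
```

Holding the pot constant, the charter gains exactly what the district school loses: in this toy example, $250,000 that had been attached to ELL students gets spread across all free/reduced lunch students, wherever they sit.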

While not specifically addressed in these proposals, one would imagine that the same pundits would also favor flat, lump-sum, or census-based funding for special education, not differentiated by disability type, such that every school or district gets a specific dollar amount for special education based on a fixed share of its enrollment – a) whether it serves any special education students at all, or b) whether it serves only students with mild specific learning disabilities, and none more severe. Watch out for this one as well!

===

Note: I’m sure that many will respond to this post by arguing that charters get severely shortchanged on state aid anyway, and that even if they make out okay on these adjustments, the lack of funding for such things as capital outlay and facilities more than offsets the difference. That’s a topic for another day. But, suffice it to say, existing comparisons like those made in the recent Ball State/Public Impact (imagine that) study are grossly oversimplified (as I explain regarding NYC schools, here: http://nepc.colorado.edu/publication/NYC-charter-disparities (page 23)). For example, typical crude comparisons never address whether having few or no special education children (relative to averages for district schools) results in per-pupil cost reductions that might actually be greater than average annual facilities expenses per pupil.

===

Follow up figure for those who asked:

Note that using only a weight on free or reduced lunch would drive the same amount of supplemental funding to Torrington as to Norwalk or Danbury, despite large differences in LEP/ELL populations. So too would any charter school with a low-income population comparable to Danbury, Stamford, or Norwalk receive the same supplemental aid, even if that school had no LEP/ELL children. There may be other valid differences that require additional attention; even this graph is too crude to give us the full story. The bottom line is that one must at least evaluate the distributions of children by need category across districts and settings before making such bold but oversimplified policy recommendations.

Here’s Rhode Island:

The issue here is similar in that Central Falls in particular (imagine that) gets shafted by failure to independently address differences in ELL/LEP concentration. While RI has few districts, and has a specific cluster of high poverty districts, the rates of LEP/ELL children across those districts vary from 5% to 20%. But, as I’ve explained previously, the RI formula and logic behind it have numerous other empirical and logical gaps. see: https://schoolfinance101.wordpress.com/2010/07/01/the-gist-twists-rhode-island-school-finance/

Distilling Rhetoric & Research on NY State Education Spending

This is another one of those mundane school finance formula posts. This one is focused on media and political spin in New York State around the recently adopted state budget and proposed school aid cuts.

Yesterday, I had the displeasure of reading a New York Post piece in which New York Governor Cuomo and the Post were explaining how and why the proposed budget cuts would not and should not compromise the quality of New York State public schools. But the article – both the Post’s explanations and especially those of the Governor’s spokesperson – presents a massive distortion of how the proposed cuts actually affect different types of districts across New York State.

Let’s break it down:

Political Spin

Here’s the public appeal, political spin on why cutting state aid to schools in New York really causes no harm:

The state’s student population dropped to 2.7 million from 2.8 million — or 4.6 percent — during that period.

And during that same span, the number of rank-and-file teachers grew to 214,000 from 194,957 — a 9.8 percent increase.

As a result, overall public-school expenditures more than doubled, from $26 billion to $58 billion statewide.

And:

“The huge growth in school bureaucracy and overhead is disturbing, especially since many schools are threatening to fire teachers,” Cuomo spokesman Josh Vlasto said. “School districts clearly have more than enough to do more with less.”

Read more: http://www.nypost.com/p/news/local/supervisor_bloat_hikes_overhead_gnbt3xbRu6hnqPRqrCTZvO#ixzz1IkEkATFa

Very simple: New York State school districts have added a whole bunch of administrators they don’t need – administrators who are obscenely highly paid and really just a massive waste. That is, if we accept the numbers reported above. But I won’t go after those in this post, because the argument is flawed on so many other levels. I will say that it is a foolish stretch to argue that administrative bloat has caused a doubling of per-pupil spending across New York school districts.

Essentially, the argument here is that since there is so much bloat and waste – REGARDLESS OF WHERE THAT BLOAT AND WASTE EXISTS, if we cut aid to districts, they can simply cut that bloat. Of course that logic doesn’t work so well if the proposal is to cut aid from districts which are not the ones with the reported bloat.

Academic Analysis on Relative Efficiency and State Aid in New York

It is indeed interesting that the NY Post and Governor’s office have chosen to focus on spending increases since 1997. Spending, teacher salaries, and administrative salaries in many New York school districts did escalate over this period. But why? What’s going on there? In what districts and what parts of the state is spending increasing, and does state aid play any role in those increases? Perhaps most directly on point: are some of those increases in spending actually leading to inefficiency, and is there any component of state aid that might be encouraging inefficient spending in school districts? If so, we’d probably want to look first at those state aid programs as a place to cut.

Here are some summaries of findings from studies on New York State’s STAR tax relief program, which provides a sizeable chunk of financial support in systematically larger amounts to affluent communities:

Eom:

We test this hypothesis by examining the introduction of New York State’s large state-subsidized property tax exemption program, which began in 1999. We find evidence that, all else constant, the exemptions have reduced efficiency in districts with larger exemptions, but the effects appear to diminish as taxpayers become accustomed to the exemptions.

http://bk21gspa.snu.ac.kr/datafile/downfile/%EC%97%84%ED%83%9C%ED%98%B8%28GSPA-SD%2907_1.6.8.pdf

Public Budgeting & Finance / Spring 2006

Eom & Killeen:

Similar to many property tax relief programs, New York State’s School Tax Relief (STAR) program has been shown to exacerbate school resource inequities across urban, suburban, and rural schools. STAR’s inherent conflict with the wealth equalization policies of New York State’s school finance system are highlighted in a manner that effectively penalizes large, urban school districts by not adjusting for factors likely to contribute to high property taxation. As a policy solution, this article presents results of a simulation that distributes property tax relief using an econometrically based cost index. The results substantially favor high-need urban and rural school districts.

http://eus.sagepub.com/content/40/1/36.full.pdf+html

Education and Urban Society November 2007

Rockoff:

I examine how a property tax relief program in New York State affected local educational spending. This program, which lowered the marginal cost of school expenditure to homeowners, had statistically and economically significant effects on local government behavior. A typical school district, which received 20% of its revenue through the program in the school year 2001- 2002, raised expenditure by 4.1% and local property taxes by 6.8% in response. I then examine how the preferences of various groups of local taxpayers affect educational spending by identifying systematic variation across districts in the response to fiscal incentives. These results support the hypothesis that homeowners are more influential on local expenditure decisions than renters, owners of second homes, or owners of non-residential property.

http://www0.gsb.columbia.edu/faculty/jrockoff/papers/local_response_draft_january_10.pdf

Recap of the Research

So, let’s recap. What do we know about NY state aid and the potential link to the supposed inefficiencies to which the NY Post article and governor’s spokesman refer:

  1. That STAR aid in particular is allocated disproportionately to more affluent downstate school districts;
  2. That STAR aid, by reducing the price to local homeowners of raising an additional dollar in taxes to their schools, encouraged increased local spending on schools;
  3. That when the relative efficiency of school districts is measured in terms of increases in measured test scores, given additional dollars spent, STAR aid appears to have encouraged less efficient spending. STAR aid enabled affluent suburban districts to spend on other things not directly associated with measured outcomes, but things those communities still desired for their schools.
  4. That STAR aid contributes to inequities across districts in a system that is already highly inequitable.

What’s happening now?

As I have shown here, in recent years, STAR aid continues to be allocated inequitably, benefitting systematically wealthier districts.

https://schoolfinance101.wordpress.com/2011/02/04/where%E2%80%99s-the-pork-mitigating-the-damage-of-state-aid-cuts/

https://schoolfinance101.com/wp-content/uploads/2011/02/figure3.jpg

Funding inequities persist across New York state districts, with affluent suburban districts far outspending their poorer urban neighbors.

See www.schoolfundingfairness.org

But, the proposed funding cuts are not targeted at the districts which are most likely contributing to “inefficient” spending growth (if it is really inefficient).

The state aid cuts are not targeted to the state aid which seems to be stimulating less efficient spending and exacerbating inequity.

Rather, the proposed state aid cuts fall disproportionately on general foundation formula aid for those districts which have already been left in the dust by their more affluent neighbors. https://schoolfinance101.wordpress.com/2011/02/04/where%E2%80%99s-the-pork-mitigating-the-damage-of-state-aid-cuts/

https://schoolfinance101.com/wp-content/uploads/2011/02/figure5.jpg

How does that make sense?

Quite honestly, the argument made in the Post, and by the governor’s spokesperson is really obnoxious and misguided, given the distribution of the planned cuts.

Analogy for the day

Let’s say we have a state aid program for personal transportation and we have some really rich communities and some really poor communities.

Let’s assume no mass transportation exists.

Let’s say we (the state) decide to give individuals in the poor communities $200 per month to help them purchase, insure and maintain a personal vehicle – a freakin’ car… and a pretty damn cheap car at that, minimally functional and questionably reliable. The $200 per month is pretty much all they’ve got. They’ve got few or no personal resources to contribute to an upgrade, and pretty much live month to month on maintenance and insurance.

We use another pot of aid to give $100 per month to residents of the rich community, who’ve already gone out and purchased Bentleys and Ferraris, and who mostly use that money for occasional detailing of their vehicles, which they might otherwise forgo (perhaps not), or for an enhanced satellite radio subscription they might not have otherwise chosen, one that includes channels they never really expect to use. (Typically, they would have gotten the most expensive subscription anyway. As the truly rich like to point out, no one who’s truly rich would ever dare ask how much it costs to maintain a yacht.)

All of a sudden, the state budget is tight, and a new report from some think tank comes out showing that in the past 10 years more and more NY residents are driving Ferraris and Bentleys, and more and more of them get their cars detailed on a monthly basis and have the most expensive satellite radio subscription. It’s an abomination. Therefore, cutting aid certainly causes no harm.

So policymakers pass their first on-time budget in years, cutting 10% of that $200 per month that currently supports basic car purchases in the poor communities! They ignore entirely that the $100 per month to the rich communities even exists.

Of course, once we’ve cut that money and ignored the other, what we now have is a set of poor communities that is less able to insure and maintain their economy vehicles. And about those Ferraris and Bentleys? We haven’t even touched their detailing subsidy.

Public Impact’s Persistent Pattern of Shoddy Analysis

Alternative title: Why Hassel with research, data and facts?

I was called upon this past week to review a new policy brief on reforming Connecticut’s education funding system – or Education Cost Sharing formula. The brief, titled Spend Smart: Fix Connecticut’s Broken School Funding System, seemed simple enough on its face but, as I looked deeper, turned out to be among the most offensively shallow and poorly documented reports I have ever seen.

Further, some elements of the report that were stated as fact but entirely unsubstantiated would actually lead to funding policies that significantly disadvantage some of the state’s highest-need children. Even worse, this brief was accompanied by submitted legislation that included these ill-conceived policies.

But this post is only partly about this new brief produced by ConnCan, with an eclectic mix of authors put forth in reformy manifesto style. Nearly every attempt to ground “facts” in the brief was tied to previous ConnCan briefs, which themselves included little or no substantiation.

The common denominator in this brief and those on which it relies, as well as the accompanying legislation, appears to be Bryan Hassel of Public Impact. Hassel has also played a role on previous haphazard manifesto-like school funding reports including Fund the Child. Bryan Hassel has also been mentioned as the outside expert to advocate on behalf of ConnCAN for school funding reform in Connecticut, including testifying in favor of the proposed legislation. See: http://blog.ctnews.com/kantrowitz/2009/12/03/1208/, or the ConnCan tweet:

Brian Hassel, co-dir. Public Impact: SB 1195 would “catapult Connecticut into a national model for schools” #edreform #getsmartct

http://twitter.com/#!/conncan/status/51061576467361792

Tangentially, Bryan Hassel and Public Impact were also involved in the production of the deeply problematic analysis of charter school funding disparities released last year, which I critique in part, in my recent work on New York City charter schools.

There comes a point where I have encountered enough different reports, all linked to a single organization and author and all so shockingly bad, that I simply can’t hold back anymore.

The following three examples, all connected back to Public Impact and Bryan Hassel, provide evidence of the utter methodological incompetence of this organization and their/his complete disregard for a) existing rigorous research, b) legitimate analytical methods and data, and perhaps most disturbingly, c) significant adverse consequences of performing shoddy analysis and making bold but haphazard policy recommendations.

Below are three of my related critiques of policy “research” (used as loosely as I can imagine) with ties to Public Impact and Bryan Hassel. I offer these critiques in particular to any policy makers who might believe it reasonable to rely on this junk, or the organization that produces it.

Example 1: Public Impact and ConnCan’s Funding Reform Proposals

http://nepc.colorado.edu/files/TTR-ConnCan-Baker-FINAL.pdf

Here are just a few examples from my review of Spend Smart. The Spend Smart brief essentially argues that the Connecticut finance system is broken (it may well be, and I think it is), and that it should be fixed with a simple school funding formula with a single weight on children qualified for free or reduced price lunch.

This particular brief stated a number of supposed “facts” about the status of the current system, few or none of which could be substantiated with information provided, and some which were clearly unchecked and simply wrong, with significant consequences.

Here are some quoted claims from the brief and a tracing of the factual basis for those claims:

Claim 2: “Moreover, our current system was designed to direct 33 percent more dollars to students in towns with high poverty, but actually provides only 11.5 percent more funding for these students.” (Page 2)

Claim 2 posits that the current ECS formula leads to an average of 11.5% additional funding per low-income child across Connecticut school districts. That claim is cited to a previous ConnCAN report, The Tab, authored by Bryan Hassel of Public Impact (specifically Page 18 of The Tab). Page 18 of The Tab cites this claim in Footnote 18 as “Authors’ analysis using 2007-08 data from the State Department of Education. See Appendix for Details.” However, the appendix of the report provides no such justification and no further reference to the 11.5% figure. Rather, the appendix provides only listings of the data sources supposedly used, with no explanation of how those sources might have been used.[i]

Claim 5: “For example, students at Connecticut’s charter schools are funded at only 75 cents on the dollar compared with traditional public schools.” (page 3)

Claim 5 is perhaps most perplexing, and like Claim 1, an example of the evidentiary black hole. The claim that Connecticut charter schools receive, on average, about 75% of state average funding is cited to a previous ConnCan report [not a Hassel/Public Impact product] titled Connecticut’s Charter School Law and Race to the Top. [ii] This ConnCan report was previously reviewed by Robert Bifulco for NEPC, who explained:[iii]

“The brief provides no indication of how it was determined that charter schools end up with only 75% of per-pupil funding that districts receive, or how, if at all, this comparison accounts for in-kind services or differences in service responsibilities.” [p. 3, Bifulco Critique]

And finally, for now:

Claim 6: “The formula could also hypothetically provide weights for other student needs, such as English Language Learner status. However, data shared by Connecticut State Department of Education with the State’s Ad Hoc Committee to Study Education Cost Sharing and School Choice show that the measure for free/reduced price lunch also captures most English language learners. In other words, there is a very strong correlation between English language learner concentration and poverty concentration in Connecticut. In addition, keeping the formula simple allows a more generous weight for students in poverty.” (p. 7, FN 12)

Claim 6 is particularly disconcerting, both because it includes a statistical finding which is never validated and because it is used to inform a policy solution which would produce substantial inequities harmful to a specific student population – children with limited English language skills. The authors claim outright that there is no need for additional adjustment for districts serving large shares of limited English proficient children because:

“there is a very strong correlation between English language learner concentration and poverty concentration in Connecticut.” (p. 7, FN 12)

This finding is cited only ambiguously in a footnote to data shared by CTDOE.  In some states, a strong relationship between the two measures might warrant collapsing supplemental aid for LEP and low-income children into one student need factor – with sufficient additional support to meet the combination and concentration of needs. However, a quick check of the data in Connecticut shown in Figure 1 (below) reveals that several districts have disproportionately high LEP concentrations relative to their low-income concentrations – specifically Norwalk, Danbury, New London, Windham and New Britain. These districts would be substantially disadvantaged by a formula with no additional weighting for LEP children, coupled with an arbitrary, small weighting for low-income status. In fact, the proposal to include only a relatively small weight for free or reduced price lunch and ignore the concentrated needs of these districts is most likely a back-door way to reduce the overall cost of the formula, and limit the extent that the formula truly redistributes funding where it is needed.

Figure 1

Relationship between Subsidized Lunch Rates and ELL Concentrations 2009


Data source: CTDOE 2009, Student need (free or reduced lunch: http://sdeportal.ct.gov/Cedar/WEB/ct_report/StudentNeedDT.aspx) and LEP data files (http://sdeportal.ct.gov/Cedar/WEB/ct_report/EllDT.aspx)

Note: From 2005 to 2009, the r-squared for this relationship ranges from .25 to .62, and is generally around .5.
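To make the point concrete, here is a small Python sketch (with entirely hypothetical district shares, not the CTDOE data plotted above) showing that an r-squared near .6, the high end of the range just noted, can coexist with districts whose ELL concentrations sit far above the fitted line, which is exactly the pattern that dooms a single-factor formula:

```python
# Hypothetical district shares (NOT the CTDOE data in the figure):
frl = [0.20, 0.35, 0.50, 0.65, 0.80, 0.55, 0.60]  # free/reduced lunch share
ell = [0.02, 0.05, 0.10, 0.15, 0.22, 0.25, 0.28]  # LEP/ELL share

n = len(frl)
mx, my = sum(frl) / n, sum(ell) / n
sxx = sum((x - mx) ** 2 for x in frl)
syy = sum((y - my) ** 2 for y in ell)
sxy = sum((x - mx) * (y - my) for x, y in zip(frl, ell))

r_squared = sxy ** 2 / (sxx * syy)  # strength of the overall relationship
slope = sxy / sxx
intercept = my - slope * mx

# Residuals: districts far ABOVE the fitted line have many more ELL
# students than their FRL share predicts -- a single FRL weight
# shortchanges them no matter how "strong" the correlation looks.
residuals = [y - (intercept + slope * x) for x, y in zip(frl, ell)]
print(f"r-squared = {r_squared:.2f}")
print(f"largest positive residual = {max(residuals):.3f}")
```

In this toy data the overall fit looks respectable, yet the last two districts carry ELL shares roughly ten percentage points above prediction, which is the Norwalk/Danbury situation in miniature.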

The bottom line – The authors clearly never checked. The authors clearly don’t know what they are talking about, even at the most basic level. Yet they are willing – all who signed on to this brief, including Hassel, Hawley-Miles and Paul Hill – to go out on a limb and make these proclamations – proclamations and policy proposals which are simply bad, wrong, misguided – and irresponsible.

Example #2: Public Impact and ConnCAN’s The Tab

Much of the content of the Spend Smart brief seems to be grounded in, and some of it directly cited to, the previous ConnCan finance report titled The Tab, on which Bryan Hassel was listed as lead author.

I have written previously about The Tab, which is of equal quality to Spend Smart. Here’s a copy and paste of my previous post on The Tab.

https://schoolfinance101.wordpress.com/2009/11/23/why-is-it-ok-for-think-tanks-to-just-make-stuff-up/

==========Original Blog Post

This topic comes to mind today because ConnCan has just released a report (http://www.conncan.org/matriarch/documents/TheTab.pdf) on how to fix Connecticut school funding which provides classic examples of just makin’ stuff up (page 25). The report begins with a few random charts and graphs showing the differences in funding between wealthy and poor Connecticut school districts and their state and local shares of funding. These analyses, while reasonably descriptive, are relatively meaningless because they are not anchored to any well-conceived or articulated explanation of “what should be.” Such a conception might be located here or even here (Chapters 13, 14 & 15 are particularly on target)!

The height of making stuff up in the report is the recommended policy solution to the problem which is never clearly articulated. There are problems in CT, but The Tab, certainly doesn’t identify them!

The supposed ideal policy solution involves a pupil-based funding formula in which each pupil receives at least $11,000 (made up), each child in poverty (no definition provided – just a few random ideas in a footnote) receives an additional $3,000 (also made up), and each child with limited English proficiency receives an additional $400 (yep… totally made up). There is minimal attempt in the report (http://www.conncan.org/matriarch/documents/TheTab.pdf) to explain why these figures are reasonable. They’re simply made up.
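For concreteness, here is what The Tab’s proposed formula amounts to in code. Only the $11,000 base, $3,000 poverty weight, and $400 LEP weight come from the report; the example district is hypothetical:

```python
# The Tab's proposed weighted-pupil formula, using its stated (made-up)
# parameters; the example district below is hypothetical.
BASE = 11_000        # per-pupil base amount (The Tab)
POVERTY_ADD = 3_000  # additional dollars per child in poverty (The Tab)
LEP_ADD = 400        # additional dollars per LEP child (The Tab)

def district_funding(enrollment, n_poverty, n_lep):
    return enrollment * BASE + n_poverty * POVERTY_ADD + n_lep * LEP_ADD

# Hypothetical high-need district: 5,000 pupils, 70% in poverty, 20% LEP
total = district_funding(5000, 3500, 1000)
print(total, total / 5000)  # total dollars and per-pupil average
```

Note how little work the $400 LEP weight does: for this hypothetical 20%-LEP district it adds only $80 per pupil on average, against $2,100 per pupil from the poverty weight.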

The authors do provide some back-of-the-napkin explanations for the numbers they made up – based on those numbers being larger than the amounts typically allocated (not necessarily true). They write off the possibility that better numbers might be derived by way of a general footnote reference to a chapter in the Handbook of Research on Education Finance and Policy by Bill Duncombe and John Yinger which actually explains methods for deriving such estimates.

The authors of The Tab conclude: “Combined with federal funding that flows on the basis of poverty and (in some cases) the English Language Learner weight of an additional $400, the $3,000 poverty weight would enable districts and schools to devote considerable resources to meeting the needs of disadvantaged students.” I’m glad they are so confident in their “made up” numbers! I, however, am less so!

It would be one thing if there was no conceptual or methodological basis for figuring out which children require more resources or how much more they might actually need. Then, I guess, you might have to make stuff up. Even then, it might be reasonable to make at least some thoughtful attempt to explain why you made up the numbers you… well… made up. But alas, such thinking seems beyond the grasp of at least some “think tanks.” Guess what? There actually are some pretty good articles out there which attempt to distill additional costs associated with specific poverty measures… like this one, by Bill Duncombe and John Yinger: How much more does a disadvantaged student cost?

It’s not as if the title of this article somehow conceals its contents, is it? Nor is the journal in which it was published (Economics of Education Review) somehow tangential to the point at hand. This paper, prepared for the National Research Council, provides some additional insights into the additional costs associated with poverty and methods for estimating those costs.

Rather than even attempt to argue that these figures are somehow founded in something, the authors of The Tab seem to push the point that it really doesn’t matter what these numbers are as long as the state allocates pupil-based funding.  That’s the fix! That’s what matters… not how much funding or whether the right kids get the right amounts. In fact, the reverse is true. The potential effectiveness, equity and adequacy of any decentralized weighted funding system is highly contingent upon driving appropriate levels of funding and funding differentials across schools and districts!

Example #3: Public Impact Charter Disparity Analysis

Finally, there’s the report done by Public Impact with Ball State University on charter school funding disparities, which remains fresh in my mind because it keeps coming back up again and again. And it is because of the connection between the shoddy methods of that report and the absurdly shoddy analysis in The Tab and Spend Smart that this post focuses on Bryan Hassel and Public Impact.

When digging deeper on financial differences among charter and non-charter schools in New York City, and looking at what the Public Impact/Ball State study had said about New York charter schools, my coauthor and I were shocked at how poorly the Public Impact/Ball State study had been conducted. Here’s a short section of our critique:

From: Baker, B.D. & Ferris, R. (2011). Adding Up the Spending: Fiscal Disparities and Philanthropy among New York City Charter Schools. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/NYC-charter-disparities.

This section returns to the issue of disparities in funding between non-charter and charter schools. As already noted, the Ball State/Public Impact study identified New York State as having large financial disparities between traditional public schools and charter schools. In contrast, the NYC Independent Budget Office concluded that charters with department of education facilities had only negligibly fewer resources than non-charter public schools. One of these accounts is incorrect.

The Ball State/Public Impact study claims that NYC traditional public school per-pupil expenditures were $20,021 in 2006-07, and that charter school expenditures were $13,468, for a 32.7% difference.[iv] However, the first figure appears to be inflated; the only figure that closely resembles $20,021 is the total expenditure, including capital outlay expense. This amounts to $19,198,[v] according to the 2006-07 NCES fiscal survey.[vi] This amount includes spending that is clearly not for traditional public schools—it includes not only transportation and textbooks allocated to charter schools, but also the city expenditures on buildings used by some charter schools.[vii] In essence, this approach attributes spending on charters to the publics they are being compared with—clearly a problematic measurement.

After offering these figures and the crude comparisons, the Ball State/Public Impact study argues that the purportedly severe funding differential is not explained by differences in need, because on average 43.5% of the students in public schools in New York State qualify for free or reduced-price lunch, while on average 73.3% of those in charter schools in New York State do. But, as was demonstrated earlier, there are three problems: (a) the focus on state rates, rather than NYC rates; (b) the inclusion of reduced-price lunch rates rather than just free-lunch rates as a measure of poverty (when focused on comparisons within NYC); and (c) the failure to compare only schools serving the same grade-levels. When these details are addressed, a different picture emerges. At the elementary level in NYC, for example, charter school free lunch rates were 57% and non-charter public school rates were 68%.

The NYC IBO report offers figures that are more in line with the data. For 2008-09, traditional public schools are found to have per-pupil expenditures of $16,678, while charters that are provided with facilities are at nearly the same level ($16,373). Public expenditures on charters not provided facilities are about $2,700 per pupil lower ($13,661). But even this comparison is not necessarily the most precise or accurate that might be made, because it does not attempt to compare schools that are (a) similar in grade level and grade range and (b) similar in student needs. The IBO analysis provides a useful, albeit limited, comparison of charter schools in the aggregate to district schools in the aggregate. Importantly, the IBO charter school funding figures do not include funds raised through private giving to schools or monies provided by their management organizations.

Once the cost differences associated with student populations are factored in, the IBO analysis changes significantly. In fact, the cost associated with student population differences is roughly the same as the per-pupil cost associated with lack of a facility: about $2,500. After adding the $2,500 low-need-population adjustment to charters, those not in DOE facilities can be seen to have funding nearly equal to that of non-charters ($16,171 vs $16,678) while those in DOE facilities have significantly more funding than non-charters (see Table 3).[viii]
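The adjustment described here is simple arithmetic on the IBO figures quoted above. A sketch (note that the post’s adjusted figure of $16,171 implies an unrounded needs adjustment closer to $2,510, which the text rounds to $2,500):

```python
# IBO 2008-09 per-pupil figures quoted above
non_charter = 16678       # traditional public schools
charter_with_fac = 16373  # charters in DOE facilities
charter_no_fac = 13661    # charters without DOE facilities

# Facility gap: the "about $2,700 per pupil lower" figure
print(charter_with_fac - charter_no_fac)  # 2712

# Adding the ~$2,500 low-need-population adjustment to charters without
# facilities brings them nearly level with non-charters; the exact
# unrounded adjustment would land at the post's $16,171.
adjusted = charter_no_fac + 2500
print(adjusted, "vs", non_charter)  # 16161 vs 16678
```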

One might try to argue that these problems we identify with the NY estimates, which render them entirely meaningless, are specific to New York, while the rest of the states are reasonably estimated. The reality is that when it comes to estimating these types of funding differentials, each state and each local district, depending on its charter funding formula, has its own peculiarities. If the crude method used by Hassel and colleagues completely missed the boat on New York, it is highly likely that comparable problems exist across many other settings. Without further, more detailed and appropriate analysis, it would be unwise to base any conclusions on the existing Ball State/Public Impact study.


[i] In the recent report Is School Funding Fair, 2007-08 update (http://www.schoolfundingfairness.org/SFF_2008_Update.pdf), Baker, Farrie and Sciarra show that the differential between very high and very low poverty districts in Connecticut is about 15% (Table 1); however, it is important to understand that in Connecticut, these patterns are not systematic. Rather, as I show in Figure A3 of the appendix herein, there exist substantial irregularities in current spending per pupil with respect to poverty. Among high need districts in particular, funding levels vary widely. Arguably, in this regard the system is indeed broken. But the ConnCan reports fail to provide any legitimate evidence to this effect.

[ii] http://www.conncan.org/sites/default/files/research/CTCharterLaw-RTTT2010-Web-2.pdf. Interestingly, the authors of the current brief, including Bryan Hassel, choose not to anchor this conclusion to other recent work co-authored by Hassel, which describes funding disparities between host districts – New Haven and Bridgeport – and charters in those cities as “severe.” However, Baker and Ferris (2011) explain substantial methodological flaws in the characterization of charter funding gaps by Hassel and colleagues, pertaining to their analysis of New York State and New York City charter schools. There is little reason to believe that Hassel and colleagues’ analyses of Connecticut are any more valid than those for New York. For the state and district summaries of charter disparities, see: Batdorff, M., Maloney, L., May, J., Doyle, D., & Hassel, B. (2010). Charter School Funding: Inequity Persists. Muncie, IN: Ball State University, p. 10-11, Table 5. For a thorough critique of Hassel and colleagues’ missteps in this report when characterizing charter disparities in New York, see: Baker, B.D. & Ferris, R. (2011). Adding Up the Spending: Fiscal Disparities and Philanthropy among New York City Charter Schools. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/NYC-charter-disparities.

[iii] Bifulco, R. (2010). Review of “Connecticut’s Charter School Law & Race to the Top!” Boulder and Tempe: Education and the Public Interest Center & Education Policy Research Unit. Retrieved [date] from http://nepc.colorado.edu/files/TTR-ConnCan-Bifulco.pdf

[iv] See: Batdorff, M., Maloney, L., May, J., Doyle, D., & Hassel, B. (2010). Charter School Funding: Inequity Persists. Muncie, IN: Ball State University, bottom of Table 5.

[v] Depending on how one chooses to calculate this figure, the range is from $19,199 to about $20,162. The reported total expenditures for the district are $20,144,661,000, and enrollment figures range from 999,150 (as reported in the fiscal survey) to 1,049,273 (the enrollment implied by the current expenditure per pupil calculation in the fiscal survey).

[vi] From the Census Bureau’s Fiscal Survey of Local Governments, Elementary and Secondary Education, F-33.  http://www.census.gov/govs/www/school.html

[vii] The New York State Education Department reports several versions of expenditure figures. Total expenditures per pupil for NYC in 2007-08 were $18,977—much lower than the total reported by Batdorff and colleagues. But the IBO correctly points out that some expenses would be appropriately excluded from this number. For instance, the NYC Department of Education provides facilities for about half the city’s charter schools as well as many other forms of support for some charter schools, including authorizer services, food service, transportation services, textbooks, and management services:

Pass-through Support for Charter Schools. Charter schools are eligible to receive goods such as textbooks and software, as well as services such as special education evaluations, health services, and student transportation, if needed and requested from the district. In NYC there is a long-established process for non-public schools to access these services, and charter schools have access to similar support from DOE. For these items, charter schools receive the goods or services rather than dollars to pay for them. Most of these non-cash allocations are managed centrally through DOE.

IBO report, 2010: Retrieved December 13, 2010, from
http://schools.nyc.gov/community/planning/charters/ResourcesforSchools/default.htm.

It is simply wrong to compare the city aggregate spending per pupil to the school-site allotment for charters, as was done by Batdorff and colleagues (who also use the most inflated available figure for the city aggregate spending). In 2007-08 (a year earlier than the IBO comparison figure, but likely a reasonable substitute), NYSED estimates for the instructional/operating expenditures per pupil in NYC were $15,065 (this uses the instructional expenditure share, including expenditures on employee benefits [IE2%, Col. AP], times the total expenditures. Retrieved December 13, 2010, from http://www.oms.nysed.gov/faru/Profiles/datacolumns1.htm). This figure may be far more relevant than that chosen by Batdorff and colleagues, but is still potentially problematic.

[viii] Again, we are unable to adjust precisely for differences in special education populations, due to lack of sufficiently detailed data.

Measuring poverty in education policy research

My goal in this post is to explain why it is vitally important in the current policy debate that we pay careful attention to how child poverty is measured and what is gained and lost by choosing different versions of poverty measures as we evaluate education systems, schools and policy alternatives.

This post is inspired by a recent exceptional column on a similar topic by Gordon MacInnes, on NJ Spotlight. See: http://www.njspotlight.com/stories/11/0323/1843/

There is a great deal of ignorance and, in some cases, belligerent denial about persistent problems with using excessively crude measures to characterize the family backgrounds of children, specifically measures of degrees of economic disadvantage.

As an example of the belligerent denial side of the conversation, the following statements come from a recent slide show from officials at the New Jersey Department of Education regarding their comparisons of charter school performance, in response to my frequently expressed concern that New Jersey charter schools tend to serve larger shares of the “less poor among the poor” children. Here’s the graph for Newark schools.

That is, New Jersey charter schools, which generally operate in high-poverty settings, tend to serve somewhat comparable shares of children qualifying for free AND REDUCED price lunch when compared to neighborhood schools, but serve far fewer children who qualify for FREE LUNCH ONLY.

The NJDOE officials’ recent response to this claim is as follows:

  • The state aid formula does not distinguish between “free” and “reduced”-price lunch count.
  • New Jersey combines free and reduced for federal AYP determination purposes
  • All students in both these categories are generally used by researchers throughout the country as a good enough proxy for “economically disadvantaged”
  • And most important, research shows that concentration of poverty in schools creates unique challenges, and most charters in NJ cross a threshold of concentrated poverty that makes these distinctions meaningless

Whether New Jersey uses this crude indicator in other areas of policy does not make it a good measure. In some cases, it may be the only available measure. But that also doesn’t make it a good one. And whether researchers use the measure when it’s one of the only measures available also does not make it a good measure.

Any thoughtful and reasonably informed researcher should readily recognize and acknowledge the substantial shortcomings of such crude income classification, and the potential detrimental effects of using such a measure within an analysis or statistical model.

The final bullet point is just silly. It claims that since charters and non-charters in New Jersey cities are all “poor enough,” there’s really no difference. This claim relies on selecting a threshold for identifying poverty that is simply too high to capture the true differences in depth of poverty – real, legitimate and important differences – with significant consequences for student outcomes.

To put it quite simply, the distinction between various levels of poverty and measures for capturing those distinctions are not trivial and not meaningless. Rather, they are quite meaningful and important, especially in the current policy context.

Here’s a run-down on why these differences are not trivial:

What are the “official” differences between those who qualify for free versus reduced-price lunch?

Figure 1 provides the income definitions for families to qualify for free versus reduced-price lunch. The information is relatively self-explanatory. Families qualifying for reduced-price lunch have income at or below 185% of the poverty level (but above 130%). Families qualifying for free lunch have income at or below 130% of the poverty level.

Figure 1: Income cut-offs for families qualifying for the National School Lunch Program


Unfortunately, a secondary problem with these cut-offs (a topic for another day) is that the thresholds do not vary appropriately across regions or between rural and urban areas. The same income might go further toward providing a reasonable lifestyle in Texas than in the New York metropolitan area. Trudi Renwick has done some preliminary work providing state-level adjusted poverty estimates to correct for this problem: http://www.census.gov/hhes/povmeas/methodology/supplemental/research.html
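The cut-offs in Figure 1 are just multiples of the federal poverty guideline, so classification is straightforward. A minimal sketch (the poverty-line figure here is hypothetical, for illustration only; actual guidelines vary by household size and year):

```python
def lunch_status(household_income: float, poverty_line: float) -> str:
    """Classify NSLP eligibility from income relative to the poverty line.

    Free lunch: income at or below 130% of the poverty level.
    Reduced-price lunch: above 130% but at or below 185%.
    """
    ratio = household_income / poverty_line
    if ratio <= 1.30:
        return "free"
    elif ratio <= 1.85:
        return "reduced-price"
    return "not eligible"

# Hypothetical poverty line, purely for illustration
POVERTY_LINE = 22_000
print(lunch_status(25_000, POVERTY_LINE))  # free (ratio ~1.14)
print(lunch_status(38_000, POVERTY_LINE))  # reduced-price (ratio ~1.73)
print(lunch_status(45_000, POVERTY_LINE))  # not eligible (ratio ~2.05)
```

Note that the ratio used here, times 100, is exactly the “poverty index” discussed later for Cleveland: 100 means income at the poverty line, 185 roughly the reduced-price cut-off.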

If these distinctions are trivial and meaningless, why are there such large differences in NAEP performance?

Now, the fact that the income levels qualifying a family for free versus reduced-price lunch differ does not necessarily mean that the distinction is important to education policy analysis. In fact, one thing that we do know is that because the income thresholds fit differently in different settings, different measures work better in different regions (lower-income thresholds in southern and southwestern states, for example).

But why do we consider these measures in education policy research to begin with? The main reason is that it is generally well understood that children’s economic well-being is strongly associated with their educational outcomes, with our ability to improve those outcomes, and with the costs of improving those outcomes. In the most thorough social science analyses of these relationships, extensive measures of family educational background, actual income (rather than simple categories), numbers of books in the household, and other measures are used. But such measures aren’t always readily available. It is more common to find, in a state data system, a simple indicator of whether a child qualifies for free or reduced-price lunch. That doesn’t make it a good measure, though. It’s just there.

But if, for example, we could look at achievement outcomes of kids who qualified for free lunch only, and for kids who qualified for reduced price lunch, and if we saw significant differences in their achievement, then it would be important to consider both… or consider specifically the indicator more strongly associated with lower student outcomes. The goal is to identify the measure, or version of the measure that is sensitive to the variations in family backgrounds in the setting under investigation and is associated with outcomes.

Figure 2 piggybacks on Gordon MacInnes’s examples comparing NAEP achievement gaps between non-low-income students (anything but a homogeneous group) and students who qualify for free or for reduced-price lunch. In Figure 2, I graph NAEP 8th grade math outcomes for 2003 to 2009. What we see is that the average outcomes for students who qualify for free lunch are much lower than those for students who qualify for reduced-price lunch. In fact, the gap between free and reduced is nearly as big in some cases as the gap between reduced and not qualified!

Figure 2: Differences in 8th grade Math Achievement by Income Status 2003-2009


Can every school in Cleveland be equally poor?

Another issue is that when we use the free or reduced-price lunch indicator, and apply that indicator as a blunt, dummy variable to kids in high-poverty settings – like poor urban core areas – we are likely to find that 100% of children qualify. Just because 100% of children receive the “qualified for free or reduced lunch” label does not by any stretch of the imagination mean that they are all on equal “economic disadvantage” footing – that they are all “poor enough” to be equally disadvantaged.

Let’s take a look at Cleveland Municipal School District and the distribution of schools by their rate of free and reduced lunch. There it is in Figure 3 – Nearly every school in Cleveland is 100% free or reduced price lunch. So, I guess they are all about the same. All equally poor. No need to consider any differential treatment, funding, policies or programs? Right?

Figure 3: Distribution of Cleveland Municipal School District % Free or Reduced Price Lunch Rates


Well, not really! That would be a truly stupid assertion, and I expect anyone working within Cleveland Municipal School District can readily point to those neighborhoods and schools that serve far more substantively economically disadvantaged students than others. The data I have for this analysis are not quite fine-grained enough to go to the neighborhood level, but in Figure 4 I can break the city into 4 areas and show the average poverty index level for families with public-school-enrolled children between the ages of 6 and 16. The poverty index is income relative to the poverty level, where 100 represents income at 100% of the poverty level and 185 would be roughly the level that qualifies for reduced-price lunch. Figure 4 shows the average differences across 4 areas of the city – classified in the American Community Survey as Public Use Microdata Areas, or PUMAs.

Figure 4: Average “Poverty Index” by Public Use Microdata Area within Cleveland


Figure 5 shows the distributions for each area, and they are clearly different. Not all Cleveland neighborhoods are comparably economically disadvantaged, even if 100% of the schools are 100% free or reduced-price lunch!

Figure 5: Poverty Index distribution by Public Use Microdata Area within Cleveland


Why is this so important in the current policy context?

So then, who really cares? Why does any of this matter? And why now? Well, it has always mattered, and responsible researchers have typically sought more fine-grained indicators of economic status, where available. But we are now in an era where policy researchers are engaged in fast-paced, fast-tracked use of available state administrative data in order to immediately inform policy decision-making. This is a dangerous data environment, and crude poverty measurement has potentially dire consequences.  Here are a few reasons why:

  • Many if not most models rating teacher or school effectiveness rely on a single dummy variable indicating that a child does or does not come from a family that falls below the 185% income level for poverty.

I’ve actually been shocked by this. Reviewing numerous pretty good and even very high-quality studies estimating teacher effects on student outcomes, I’ve found an incredible degree of laziness in the specification of student characteristics – specifically, student poverty.

Figure 6 shows the poverty components of the New York City Teacher Effectiveness Model. Yep – there it is, a simple dichotomous indicator of qualifying for free or reduced price lunch. No way at all to differentiate between teachers of marginally poor, and very poor children.

Figure 6: Measures included in New York City Teacher Effectiveness Model


In a value-added model of teacher effects, if we use only a crude yes-or-no indicator for whether a child is in a family that falls below the 185% income level for poverty, the child who is marginally below that income level is considered no different from the child who is well below that income level – homeless, destitute, in multi-generational poverty. Further, in many large urban centers, nearly all children fall below the 185% income level (imagine doing this in Cleveland). But they are not all the same! The variations in economic circumstances faced by children across schools and classrooms are huge. But the crude measurement ignores that variation entirely. And the lack of sensitivity of these measures to real differences in economic disadvantage likely adversely affects teachers of much poorer children – a model bias that goes unchecked for lack of a more precise indicator to check for the bias!
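The bias mechanism described above is easy to demonstrate with a toy simulation (entirely made-up numbers, not any actual VAM): two teachers are equally effective, but one teaches marginally poor students and the other deeply poor students, and because every student in both rooms carries the same FRL dummy, a dummy-only adjustment attributes the whole achievement gap to the teachers.

```python
import random
import statistics

random.seed(0)

def gain(income_ratio):
    """True learning gain depends on depth of poverty, plus noise.
    (Purely illustrative numbers; the teacher contributes equally.)"""
    return 10 * income_ratio + random.gauss(0, 2)

# Both classes are 100% below the 185% line, so the FRL dummy is 1
# for every student in both rooms.
marginally_poor_class = [gain(random.uniform(1.3, 1.85)) for _ in range(30)]
deeply_poor_class = [gain(random.uniform(0.5, 1.0)) for _ in range(30)]

# A model that adjusts only for the dummy sees identical "poverty" in
# both classes and attributes the entire gap to the teachers.
gap = statistics.mean(marginally_poor_class) - statistics.mean(deeply_poor_class)
print(f"apparent 'teacher effect' gap: {gap:.1f} points")
```

The teacher of the deeply poor class looks several points worse despite being, by construction, identical; with a continuous poverty measure in the model, that gap would vanish.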

  • This problem is multiplied by the fact that when these models evaluate the influence of peers on individual student performance, the peer group is also characterized in terms of whether the peers fall below this single income threshold.

In a teacher effectiveness model, the poverty measurement problem operates at two levels. First, at the individual student level mentioned above, where one cannot delineate between the student from a low-income family and the student from a very low-income family. Second, “better” value-added teacher effectiveness models also attempt to account for the characteristics of the classroom peer group. But we are stuck with the same crude measure, which prohibits us from evaluating the effect on any one student’s achievement gains of being in a class of marginally low-income peers versus being in a class of very low-income peers.

Okay, you say, the “best” value added models – especially those used in high stakes teacher evaluation would not be so foolish as to use such a crude indicator. BUT THEY DO, JUST LIKE THE NYC MODEL ABOVE. AND THEY DO SO QUITE CALLOUSLY AND IGNORANTLY.  Why? Because it’s the data they have. The LA Times model uses a single dummy variable for poverty, and does not even include a classroom peer effect aggregation of that variable.

  • Many comparisons of charter and traditional public schools that seek to evaluate whether charters are serving representative populations only compare the total of children qualifying for free or reduced price lunch, or similarly apply simple indicators of free or reduced price lunch status to individual students.

Yet, charter schools seem invariably to serve much more similar rates of children qualifying for free or reduced-price lunch when compared to nearby traditional public schools, but serve far fewer children in the lower-income group who qualify for free lunch. Charters seem to be serving the less poor among the poor in poor neighborhoods, whether in Newark, NJ, or in New York City. Given that the performance differences among these subgroups tend to be quite large, using only the broader classification masks these substantial differences.

In conclusion

Yes, in some cases, we continue to be stuck with these less than precise indicators of child poverty. In some cases, it’s all we’ve got in the data system. But it is our responsibility to seek out better measures where we can, and use the better measures when we have them. We should, whenever possible:

  1. Use the measure that picks up the variation across children and educational settings.
  2. Use the measure that serves as the strongest predictor of educational outcomes – the strongest indicator of potential educational disadvantage.
  3. And most importantly, when you don’t have a better measure, and when the stakes are particularly high, and when the crude measure might significantly influence (bias) the results, JUST DON’T DO IT!

Don’t attempt to draw major conclusions about whether charter schools (or any schools or programs for that matter) can do “as well” with low-income children when the indicator for “low income” encompasses equally every child (or nearly every child) in the city in both traditional public and charter schools.

Don’t attempt to label a teacher as effective or ineffective at teaching low-income kids, relative to his or her peers, when your measure of low-income is telling you that nearly all kids in all classrooms are equally low-income, when they clearly are not.

And most importantly, don’t make ridiculous excuses for using inadequate measures!

Student Test Score Based Measures of Teacher Effectiveness Won’t Improve NJ Schools

Op-Ed from: http://www.northjersey.com

The recent Teacher Effectiveness Task Force report recommended basing teacher evaluation significantly on student test scores. A few weeks earlier, Education Commissioner Cerf recommended that teacher tenure and dismissal decisions, as well as compensation decisions, be based largely on student assessment data.

Implicit in these recommendations is that the state and local districts would design a system for linking student assessment data to teachers for purposes of estimating teacher effectiveness. The goal of statistical “teacher effectiveness” measurement systems, including the most common approach called value-added modeling (VAM), is to estimate the extent to which a specific teacher contributes to the learning gains of a group (or groups) of students assigned to that teacher in a given year.

Unfortunately, while this all sounds good, it just doesn’t work, at least not well enough to even begin considering using it for making high stakes decisions about teacher tenure, dismissal or compensation. Here’s a short list (my full list is much longer) of reasons why:

  1. It is not possible to equate the difficulty of moving a group of children 5 points (or rank and percentile positions) at one end of a test scale to moving children 5 points at the other end. Yet that is precisely what the proposed evaluations endeavor to accomplish. In such a system, the only fair way to compare one teacher to another would be to ensure that each has a randomly assigned group of children whose initial achievement is spread similarly across the testing scale. Real schools and districts don’t work that way.  It is also not possible to compare a 5 point gain in reading to a 5 point gain in math. These limitations undermine the entire proposed system.
  2. Even with the best models and data, teacher ratings are highly inconsistent from year to year, and have very high rates of misclassification. According to one recent major study, there is a 35% chance of identifying an average teacher as poor given one year of data, and a 25% chance given three years. Getting a good rating is a statistical crapshoot.
  3. If we rate the same teacher with the same students, but with two different tests in the same subject, we get very different results. UC Berkeley economist Jesse Rothstein, re-evaluating the findings of the much-touted Gates Foundation Measuring Effective Teaching (MET) study, noted that more than 40% of teachers who placed in the bottom quarter on one test (the state test) were in the top half when using the other test (an alternative assessment). That is, teacher ratings based on the state assessment were only slightly better than a coin toss for identifying which teachers did well on the alternative assessment.
  4. No matter how hard statisticians try, and no matter how good the data and statistical model, it is very difficult to separate a teacher’s effect on student learning gains from other classroom effects, like peer effects (the race and poverty of the peer group). New Jersey schools are highly segregated, hampering our ability to make valid comparisons across teachers who work in vastly different settings. Statistical models attempt to adjust away these differences, but usually come up short.
  5. Kids learn over the summer too and higher income kids learn more than their lower income peers over the summer. As a result, annual testing data aren’t very useful for measuring teacher effectiveness. Annual (rather than fall-spring) testing data significantly disadvantage teachers serving children whose summer learning lags. Setting aside all of the un-resolvable problems above, this one can be fixed with fall-spring assessments. But it cannot be resolved in any fast-tracked plan involving current New Jersey assessments, which are annual. The task force report irresponsibly ignores this HUGE AND OBVIOUS concern, recommending fast-tracked use of current assessment data.
  6. As noted by the task force, only those teachers responsible for reading and math in grades 3 to 8 could readily be assigned ratings (less than 20% of teachers). Testing everything else is a foolish and expensive endeavor. This means school districts will need separate contracts for separate classes of teachers and will have limited ability to move teachers from one contract type to another (from second to fourth grade). Further, pundits have been arguing that a) we should be using effectiveness measures instead of experience to implement layoffs due to budget cuts, and b) we shouldn’t be laying off core, classroom teachers in grades 3 to 8. But those are the only teachers for whom “effectiveness” measures would be available?
  7. Basing teacher evaluations, tenure decisions and dismissal decisions on scores that may be influenced by which students a teacher serves provides a substantial disincentive for teachers to serve kids with the greatest needs, disruptive kids, or kids with disruptive family lives. Many of these factors are not, and cannot be, captured by variables in even the best models. Some have argued that including value-added metrics in teacher evaluation reduces the ability of school administrators to arbitrarily dismiss a teacher. Rather, use of these metrics provides new opportunities to sabotage a teacher’s career through creative student assignment practices.

In short, we may be able to estimate a statistical model that suggests that teacher effects vary widely across the education system – that teachers matter. But we would be hard pressed to use that model to identify with any degree of certainty which individual teachers are good teachers and which are bad.

Contrary to education reform wisdom, adopting such problematic measures will not make the teaching profession a more desirable career option for America’s best and brightest college graduates. In fact, it will likely make things much worse. Establishing a system where achieving tenure or getting a raise becomes a roll of the dice and where a teacher’s career can be ended by a roll of the dice is no way to improve the teacher work force.

Contrary to education reform wisdom, using these metrics as a basis for dismissing teachers will NOT reduce the legal hassles associated with removal of tenured teachers.  As the first rounds of teachers are dismissed by random error of statistical models alone, by manipulation of student assignments, or when larger shares of minority teachers are dismissed largely as a function of the students they serve, there will likely be a new flood of lawsuits like none ever previously experienced. Employment lawyers, sharpen your pencils and round up your statistics experts.

Authors of the task force report might argue that they are putting only 45% of the weight of evaluations on these measures. The rest will include a mix of other objective and subjective measures. The reality of an evaluation that places a single large, or even significant, weight on one quantified factor is that the factor necessarily becomes the tipping point, or trigger mechanism. It may be 45% of the evaluation weight, but it becomes 100% of the decision, because it’s a fixed, clearly defined (though poorly estimated) metric.

Self-proclaimed “reformers” make the argument that the present system of teacher evaluation is so bad as to be non-existent. Reformers argue that the current system has a 100% error rate (assuming current evaluations label all teachers as good, when all are actually bad)!

From the “reformer” viewpoint, something is always better than nothing.

Value added is something.

We must do something.

Therefore, we must do value-added.

Reformers also point to studies showing that teachers’ value-added scores are the best predictor (albeit a weak and error-prone predictor) of teachers’ future value-added scores – a self-fulfilling prophecy. These arguments are incredibly flimsy.

In response, I often explain that if we lived in a society that walked everywhere, and a new automotive invention came along, but had the tendency to burst into a ball of flames on every third start, I think I’d walk. Now is a time to walk! Some innovations just aren’t ready for broad public adoption – and some may never be. Some, like this one, may not be a very good idea to begin with. That said, improving teacher evaluation is not a simple either/or and now may be a good time to step back from this false dichotomy and discuss more productive alternatives.

Expanded gambling okay in NJ, but only if it involves gambling on teachers’ jobs!

I may be the only one in New Jersey who had a twisted enough view of today’s news stories to pick up on this connection. Seemingly irrelevant to my blog, today, the Governor of New Jersey vetoed a bill that would have approved online gambling. At the same time, the Governor’s teacher effectiveness task force released its long-awaited report. And it did not disappoint. Well, I guess that’s a matter of expectations. I had very low expectations to begin with – fully expecting a poorly written, ill-conceived rant about how to connect teacher evaluations to test scores – growth scores – and how it is imperative that a large share of teacher evaluation be based on growth scores. And I got all of that and more!!!!!

I have written about this topic on multiple occasions.

For the full series on this topic, see: https://schoolfinance101.wordpress.com/category/race-to-the-top/value-added-teacher-evaluation/

And for my presentation slides on this topic, including summaries of the relevant research, see: https://schoolfinance101.com/wp-content/uploads/2010/10/teacher-evaluation_general.pdf

When it comes to critiquing the Task Force Report, I’m not even sure where to begin. In short, the report proposes the most ill-informed toxic brew of policy recommendations that one can imagine. The centerpiece, of course, is heavy… very heavy reliance on statewide student testing measures yet to be developed… yet to be evaluated for their statistical reliability … or their meaningfulness of any sort (including predictive validity of future student success). As Howard Wainer explains here, even the best available testing measures are not up to the task of identifying more and less effective teachers: http://www.njspotlight.com/ets_video2/

But who cares what the testing and measurement experts think anyway. This is about the kids… and we must fix our dreadful system and do it now… we can’t wait! The children can’t wait!

So then, what does this have to do with the online gambling veto? Well, it struck me as interesting that, on the one hand, the Governor vetoes a bill that would approve online gambling, but the Governor’s Task Force proposes a teacher evaluation plan that would make teachers’ year to year job security and teacher evaluations largely a game of chance. Yes, a roll of the dice. Roll a 6 and you’re fired! Damn hard to get 3 in a row (positive evaluations) to get tenure. Exponentially easier to get 2 in a row (bad evals) and get fired. No online gambling for sure, but gambling on the livelihood of teachers? That’s absolutely fine!

Interestingly, one of the only external sources even cited (outside of citing the comparably problematic Washington DC IMPACT contract, and think tanky schlock like the New Teacher Project’s “Teacher Evaluation 2.0“), was the Gates Foundation’s Measuring Effective Teaching Project (MET).  Of course, the task force report fails to mention that the Gates Foundation MET project report does not make a very compelling statistical case that using test scores as a major factor for evaluating teachers is a good idea. Actually, they fail to mention anything substantive about the MET reports. I wrote about the MET report here.  And economist Jesse Rothstein took a closer look at the Gates MET findings here! Rothstein concluded:

In particular, the correlations between value-added scores on state and alternative assessments are so small that they cast serious doubt on the entire value-added enterprise. The data suggest that more than 20% of teachers in the bottom quarter of the state test math distribution (and more than 30% of those in the bottom quarter for ELA) are in the top half of the alternative assessment distribution. Furthermore, these are “disattenuated” estimates that assume away the impact of measurement error. More than 40% of those whose actually available state exam scores place them in the bottom quarter are in the top half on the alternative assessment.
In other words, teacher evaluations based on observed state test outcomes are only slightly better than coin tosses at identifying teachers whose students perform unusually well or badly on assessments of conceptual understanding.

Yep that’s right. It’s little more than a coin toss or a roll of the dice! Online gambling (personally, I don’t care one way or the other about it), not okay. Gambling on teachers’ livelihoods with statistical error? Absolutely fine. After all, it’s those damn teachers that have sucked the economy dry with their high salaries and gold-plated benefits packages! And after all, it is the only profession in the world where you can do a really crappy job year after year after year… and you’re totally protected, right? Of course it’s that way. Say it loud enough and enough times, over and over again, and it must be true.

Here are a few random thoughts I have about the report:

  • So… as I understand it, they want to base 45% of a teacher’s evaluation on measures that have a 35% chance of misclassifying an average teacher as ineffective – and these are measures that only apply to about 15 to 20% of the teacher workforce? That doesn’t sound very well thought out to me.
  • Forcing reading and math teachers to be evaluated by measures over which they have limited control, and measures that jump around significantly from year to year and disadvantage teachers in more difficult settings isn’t likely to make New Jersey’s best and brightest jump at the chance to teach in Newark, Camden or Jersey City.
  • Even if the current system of teacher evaluation is less than ideal, it doesn’t mean that we should jump to adopt metrics that are as problematic as these. Promoters of these options would have the public believe that it’s either the status quo – which is necessarily bad – or test-score based evaluation – which is obviously good. This is untrue at many levels. First, New Jersey’s status quo is pretty good. Second, New Jersey’s best public and private schools don’t use test scores as a primary or major source of teacher evaluation. Yet somehow, they are still pretty darn good. So, using or not using test scores to hire and fire teachers is not likely the problem nor is it the solution. It’s an absurd false dichotomy.
  • Authors of the report might argue that they are putting only 45% of the weight of evaluations on these measures. The rest will include a mix of other objective and subjective measures. The reality of an evaluation that includes a single large, or even significant weight, placed on a single quantified factor is that that specific factor necessarily becomes the tipping point, or trigger mechanism. It may be 45% of the evaluation weight, but it becomes 100% of the decision, because it’s a fixed, clearly defined (though poorly estimated) metric.
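That last bullet – 45% of the weight, 100% of the decision – is easy to demonstrate with a toy simulation. All distributions below are illustrative assumptions, not actual New Jersey data: when the other 55% of the evaluation clusters tightly (most observers rate most teachers similarly) and the test-based 45% is noisy and spread out, the test-based component effectively decides who gets flagged.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000  # simulated teachers

# Hypothetical distributions: observation ratings cluster tightly, while
# the test-based score is noisy and spread out. Weights mirror the 45%
# proposal; everything else is an invented assumption.
observation = rng.normal(0.75, 0.03, n)   # 55% of the weight, little spread
test_based = rng.normal(0.50, 0.15, n)    # 45% of the weight, wide spread
composite = 0.55 * observation + 0.45 * test_based

flagged = composite < np.quantile(composite, 0.10)     # bottom decile overall
low_vam = test_based < np.quantile(test_based, 0.10)   # bottom decile on tests alone
overlap = float(np.mean(low_vam[flagged]))

print(f"corr(composite, test score): {np.corrcoef(composite, test_based)[0, 1]:.2f}")
print(f"share of flagged teachers who are simply the lowest test scorers: {overlap:.0%}")
```

Under these assumptions, the composite tracks the test-based score almost perfectly, so the “45%” factor becomes the trigger mechanism.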

Here’s a quick run-down on some of the issues associated with using student test scores to evaluate teachers:

[from a forthcoming article on legal issues associated with using test scores to evaluate, and dismiss teachers]

Most VAM teacher ratings attempt to predict the influence of the teacher on the student’s end-of-year test score, given the student’s prior test score and descriptive characteristics – for example, whether the student is poor, has a disability, or is limited in her English language proficiency.[1] These statistical controls are designed to account for the differences that teachers face in serving different student populations.  However, there are many problems associated with using VAM to determine whether teachers are effective.  The remainder of this section details many of those problems.

Instability of Teacher Ratings

The assumption in value-added modeling for estimating teacher “effectiveness” is that if one uses data on enough students passing through a given teacher each year, one can generate a stable estimate of the contribution of that teacher to those children’s achievement gains.[2] However, this assumption is problematic because of the concept of inter-temporal instability: that is, the same teacher is highly likely to get a very different value-added rating from one year to the next.  Tim Sass notes that the year-to-year correlation for a teacher’s value-added rating is only about 0.2 or 0.3 – at best a very modest correlation.  Sass also notes that:

About one quarter to one third of the teachers in the bottom and top quintiles stay in the same quintile from one year to the next while roughly 10 to 15 percent of teachers move all the way from the bottom quintile to the top and an equal proportion fall from the top quintile to the lowest quintile in the next year.[3]

Further, most of the change or difference in a teacher’s value-added rating from one year to the next is unexplainable – not attributable to differences in observed student characteristics, peer characteristics or school characteristics.[4]

Similarly, preliminary analyses from the Measures of Effective Teaching Project, funded by the Bill and Melinda Gates Foundation found:

When the between-section or between-year correlation in teacher value-added is below .5, the implication is that more than half of the observed variation is due to transitory effects rather than stable differences between teachers. That is the case for all of the measures of value-added we calculated.[5]

While some statistical corrections and multi-year analysis might help, it is hard to guarantee or even be reasonably sure that a teacher would not be dismissed simply as a function of unexplainable low performance for two or three years in a row.
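A quick simulation makes the instability concrete. Here I assume a stable “true” teacher effect plus transitory noise, with the noise variance chosen so the year-to-year correlation lands at about 0.3 – the upper end of the range Sass reports. The parameters are illustrative, not estimates from any real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # simulated teachers

# A teacher's measured value-added = stable "true" effect + transitory noise.
# The noise variance is chosen so corr(year1, year2) ≈ 0.3 (illustrative).
true_effect = rng.normal(0, 1, n)
noise_sd = np.sqrt(1 / 0.3 - 1)
year1 = true_effect + rng.normal(0, noise_sd, n)
year2 = true_effect + rng.normal(0, noise_sd, n)

corr = np.corrcoef(year1, year2)[0, 1]

# Quintile transitions: how often does a bottom-quintile teacher in year 1
# stay in the bottom, or jump all the way to the top, in year 2?
q1 = np.searchsorted(np.quantile(year1, [0.2, 0.4, 0.6, 0.8]), year1)
q2 = np.searchsorted(np.quantile(year2, [0.2, 0.4, 0.6, 0.8]), year2)
bottom = q1 == 0
stay_bottom = float(np.mean(q2[bottom] == 0))
bottom_to_top = float(np.mean(q2[bottom] == 4))

print(f"year-to-year correlation: {corr:.2f}")
print(f"bottom quintile stays bottom: {stay_bottom:.0%}")
print(f"bottom quintile jumps to top: {bottom_to_top:.0%}")
```

With the correlation at roughly 0.3, about a third of bottom-quintile teachers stay in the bottom the following year and roughly one in ten jumps all the way to the top – close to the transition rates Sass describes.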

Classification & Model Prediction Error

Another technical problem of VAM teacher evaluation systems is classification and/or model prediction error.  Researchers at Mathematica Policy Research Institute in a study funded by the U.S. Department of Education carried out a series of statistical tests and reviews of existing studies to determine the identification “error” rates for ineffective teachers when using typical value-added modeling methods.[6] The report found:

Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding error rates for overall false positive and negative errors are 10 and 20 percent, respectively.[7]

Type I error refers to the probability that, based on a certain number of years of data, the model will find that a truly average teacher performed significantly worse than average.[8] So, that means there is about a 25% chance (using three years of data) or a 35% chance (using one year of data) that a teacher who is “average” would be identified as “significantly worse than average” and potentially be fired.  Of particular concern is the likelihood that a “good teacher” is falsely identified as a “bad” teacher, in this case a “false positive” identification. According to the study, this occurs one in ten times (given three years of data) and two in ten times (given only one year of data).
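The intuition behind those error rates can be sketched in a few lines. This is not the Mathematica model – just a stylized simulation with made-up noise levels – but it shows why averaging three years of data lowers, without eliminating, the chance that a truly average teacher lands in the flagged group:

```python
import numpy as np

def flag_rate(years, noise_sd_one_year=1.5, n=200_000, seed=1):
    """Share of truly average teachers whose estimate lands in the flagged
    bottom quarter, given `years` of data. All parameters are illustrative."""
    rng = np.random.default_rng(seed)
    true = rng.normal(0, 1, n)                      # stable true effects
    noise_sd = noise_sd_one_year / np.sqrt(years)   # averaging years shrinks noise
    est = true + rng.normal(0, noise_sd, n)
    cutoff = np.quantile(est, 0.25)                 # flag bottom quarter of estimates
    average = np.abs(true) < 0.1                    # essentially average teachers
    return float(np.mean(est[average] < cutoff))

print(f"1 year of data:  {flag_rate(1):.0%} of average teachers flagged")
print(f"3 years of data: {flag_rate(3):.0%} of average teachers flagged")
```

The exact percentages depend entirely on the invented noise level, but the pattern matches the study: more years of data reduce, and do not come close to eliminating, misclassification of average teachers.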

Same Teachers, Different Tests, Different Results

Whether a teacher is rated effective may depend on which assessment is used for a given subject area, not on whether that teacher is generally effective in teaching that subject.  For example, Houston uses two standardized tests each year to measure student achievement: the state Texas Assessment of Knowledge and Skills (TAKS) and the nationally-normed Stanford Achievement Test.[9] Corcoran and colleagues used Houston Independent School District (HISD) data from each test to calculate separate value-added measures for fourth and fifth grade teachers.[10] The authors found that a teacher’s value-added can vary considerably depending on which test is used.[11] Specifically:

among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test.  Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.[12]

Similar issues apply to tests on different scales – tests with different possible ranges of scores, or different statistical treatments of raw scores: for example, whether student test scores are first converted into standardized scores relative to an average score, or expressed on some other scale such as percentile rank (which is done in some cases but would generally be considered inappropriate).  For instance, if a teacher is typically assigned higher performing students and the scaling of a test is such that it becomes very difficult for students with high starting scores to improve over time, that teacher will be at a disadvantage. But another test of the same content, or simply with different scaling of scores (so that smaller gains are adjusted to reflect the relative difficulty of achieving those gains), may produce an entirely different rating for that teacher.

Difficulty in Isolating Any One Teacher’s Influence on Student Achievement

It is difficult if not entirely infeasible to isolate one specific teacher’s contribution to a student’s learning, leading to situations where a teacher might be identified as a bad teacher simply because her colleagues are ineffective. This is called a spillover effect.[13] For students who have more than one teacher across subjects (and/or teaching aides/assistants), each teacher’s value-added measures may be influenced by the other teachers serving the same students. Kirabo Jackson and Elias Bruegmann, for example, found in a study of North Carolina teachers that students perform better, on average, when their teachers have more effective colleagues.[14] Cory Koedel found that reading achievement in high school is influenced by both English and math teachers.[15] These spillover effects mean that teachers assigned to weaker teams of teachers might be disadvantaged, through no fault of their own.

Non-Random Assignment of Students Across Teachers, Schools And Districts

The fact that teacher value-added ratings cannot be disentangled from patterns of student assignment across schools and districts leads to the likelihood that teachers serving larger shares of one population versus another are more likely to be identified as effective or ineffective, through no fault of their own.  Non-random assignment, like inter-temporal instability, is a seemingly complicated statistical issue. The non-random assignment problem relates not to error in the measurement (test scores) but to the complications of applying a statistical model to real world conditions. The fairest comparisons between teachers would occur where teachers could be randomly assigned to comparable classrooms with comparable resources, and where exactly the same number of students could be randomly assigned to those teachers, so that each teacher would have the same number of children, and children of similar family backgrounds, prior performance, personal motivation and other characteristics. Obviously, this does not happen in reality.

Students are not sorted randomly across schools, across districts, or across teachers within schools. And teachers are not randomly assigned across school settings, with equal resources. It is certainly likely that one fourth grade teacher in a school is assigned more difficult students year after year than another. This may occur by choice of that teacher – a desire to try to help out these students – or other factors including the desire of a principal to make a teacher’s work more difficult.  While most value-added models contain some crude indicators of poverty status, language proficiency and disability classification, few if any sufficiently mitigate the bias that occurs from non-random student assignment. That bias occurs from such apparently subtle forces as the influence of peers on one another, and the inability of value-added models to sufficiently isolate the teacher effect from the peer effect, both of which occur at the same level of the system – the classroom.[16]

Jesse Rothstein notes that “[r]esults indicate that even the best feasible value-added models may be substantially biased, with the magnitude of the bias depending on the amount of information available for use in classroom assignments.”[17]
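A minimal sketch of the non-random assignment problem, under invented parameters: two teachers with identical true effects, but one is assigned a class with an unobserved disadvantage the model cannot control for. A naive average-gain “value-added” rating attributes the class difference to the teacher:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two teachers with IDENTICAL true effects, but non-randomly assigned classes:
# teacher "T2" gets a class with an unobserved disadvantage (peer mix,
# motivation) that the model cannot see. All parameters are invented.
n_students = 25
true_teacher_effect = 0.3                          # same for both teachers
unobserved_class_factor = {"T1": 0.2, "T2": -0.2}  # invisible to the model

ratings = {}
for teacher, factor in unobserved_class_factor.items():
    # Each student's gain = teacher effect + unobserved class factor + noise
    student_gain = true_teacher_effect + factor + rng.normal(0, 0.3, n_students)
    ratings[teacher] = float(np.mean(student_gain))  # naive "value-added"

print(ratings)  # T1 looks strong, T2 looks weak – identical true effects
```

Real value-added models adjust for observed characteristics, but as Rothstein’s work suggests, the unobserved part of the assignment process is exactly what they cannot adjust for.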

Value-added modeling has more recently been at the center of public debate after the Los Angeles Times contracted RAND Corporation economist Richard Buddin to estimate value-added scores for Los Angeles teachers, and the Times reporters then posted the names of individual teachers classified as effective or ineffective on their web site.[18] The model used by the Los Angeles Times, estimated by Buddin, was a fairly typical one, and the technical documentation proved rich with evidence of the types of model bias described by Rothstein and others. For example:

  • 97% of children in the lowest performing schools are poor, and 55% in higher performing schools are poor;
  • The number of gifted children a teacher has affects their value-added estimate positively – the more gifted children the teacher has, the higher the effectiveness rating;
  • Black teachers have lower value-added scores for both English Language Arts and Math than white teachers, and these are some of the largest negative correlates with effectiveness ratings provided in the report – especially for MATH;
  • Having more black students in one’s class is negatively associated with a teacher’s value-added scores, though this effect is relatively small;
  • Asian teachers have higher value-added scores than white teachers for Math, with the positive association between being Asian and math teaching effectiveness being as strong as the negative association for black teachers.

Some of these associations above are explained by related research by Hanushek and Rivkin, which shows measurable effects of the racial composition of peer groups on individual students’ outcomes and explains the difficulty in distilling these effects from teacher effects.[19] Note that it is also likely that the associations with teacher race above are entangled with student race, where black teachers are more likely to be in classrooms with larger shares of black students.[20]

All value-added comparisons are relative. They can be used for comparing one teacher to another in a school, teachers in one school to teachers in another school, or teachers in one district to those in other districts. The reference group becomes critically important when determining the potential for disparate impact of negative teacher ratings resulting from model bias. For example, if one were to employ a district-wide performance-based dismissal (or retention) policy in Los Angeles using the Los Angeles Times model, one would likely lay off disproportionate numbers of teachers in poor schools and black teachers of black students, while disproportionately retaining Asian teachers.  But, if one adopted the layoff policy relative to within-school rather than district-wide norms, because children are largely segregated by neighborhoods and schools, the disparate effect might be lessened. The policy would be neither fairer nor better in terms of educational improvement, but racially disparate dismissals might be reduced.

Finally, because teacher value-added ratings cannot be disentangled entirely from patterns of student assignment across teachers within schools, principals may manipulate assignment of difficult and/or unmotivated students in order to compromise a teacher’s value-added ratings, increasing the principal’s ability to dismiss that teacher. This concern might be mitigated by requirements for lottery-based student assignment and teacher assignments. However, such requirements could create cumbersome student assignment processes and processes that interfere with achieving the best teacher match for each child.

Whereas the stability and error-rate problems above are issues of “statistical error,” the problem of non-random assignment is one of “model bias.” Many value-added ratings of “teacher effectiveness” suffer from both large degrees of error and severe levels of model bias.  The two are cumulative problems, not overlapping. In fact, the extent of error in the measures may partially mask the full extent of bias. In other words, we might not even know how prodigious the bias is.

In The Best Possible Case, About 20% of Contracted Certified Teachers in a District Might Have Value-Added Scores

Setting aside the substantial concerns above over “measurement error” and “model bias,” which severely compromise the reliability and validity of value-added ratings of teachers, in most public school districts fewer than 20% of certified teaching staff could be assigned any type of value-added assessment score.  Existing standardized assessments typically focus on reading or language arts and math performance between grades three and eight.  Because baseline scores are required – and ideally multiple prior scores, to limit model bias – it becomes difficult to fairly rate third grade teachers.  By middle school or junior high, students interact with many more teachers, and it becomes more difficult to attribute value-added scores to any one teacher. When one also considers support staff roles, specialist teachers, and teachers of elective and/or advanced secondary courses, value-added measures are generally applicable to only a small minority of teachers in any school district (<20%). Thus, in order to make value-added measures a defined element of teacher evaluation in teacher contracts, one must have separately negotiated contracts for those teachers to whom these measures apply, and this is administratively cumbersome and potentially expensive for districts in these difficult economic times.
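The arithmetic behind the “fewer than 20%” figure is easy to reproduce with a hypothetical staffing breakdown. These counts are invented for illustration – actual district compositions vary:

```python
# Back-of-envelope for a hypothetical 1,000-teacher district. Value-added
# requires a tested subject (math or reading/ELA), a grade with a prior-year
# baseline score (roughly grades 4-8), and one identifiable teacher of record.
staff = {
    "grades_4_5_self_contained": 150,       # tested subjects, usable baseline
    "middle_school_math_ela": 40,           # tested, departmentalized
    "grade_3_and_k_2": 210,                 # no prior-year baseline score
    "other_subjects_and_high_school": 420,  # untested subjects/grades
    "specialists_and_support": 180,         # art, PE, special ed, ESL, etc.
}

ratable = staff["grades_4_5_self_contained"] + staff["middle_school_math_ela"]
share = ratable / sum(staff.values())
print(f"{share:.0%} of teachers could receive a value-added score")
```

However the invented counts are shuffled, only the teachers of record in tested grades and subjects can be scored, and they are a small minority of the certified staff.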

Washington DC’s IMPACT teacher evaluation system is one example that differentiates classes of teachers by having, or not, value-added measures.[21] While contractually feasible, this approach creates separate classes of teachers in schools and may have unintended consequences for educational practices, including increasing tensions between non-value-added-rated teachers wishing to pull students of value-added-rated teachers out of class for special projects or activities.


[1] Value-added ratings of teachers are generally not based on a simple subtraction of each student’s spring test score and previous fall test score for a specific subject. Such an approach would clearly disadvantage teachers who happen to serve less motivated groups of students, or students with more difficult home lives and/or fewer family resources to support their academic progress through the year. It would be even more problematic to simply use the spring test score from the prior year as the baseline score, and the spring of the current year to evaluate the current year teacher, because the teacher had little control over any learning gain or loss that may have occurred during the prior summer. And these gains and losses tend to be different for students from higher and lower socio-economic status.  See Karl L. Alexander et al., Schools, Achievement, and Inequality: A Seasonal Perspective, 23 Educ. Eval. and Pol’y Analysis 171 (2001). Recent findings from a study funded by the Bill and Melinda Gates Foundation confirm these “seasonal” effects: “The norm sample results imply that students improve their reading comprehension scores just as much (or more) between April and October as between October and April in the following grade. Scores may be rising as kids mature and get more practice outside of school.” Bill & Melinda Gates Foundation, Learning about Teaching: Initial Findings from the Measures of Effective Teaching Project 8, available at http://www.metproject.org/downloads/Preliminary_Findings-Research_Paper.pdf.

[2] Tim R. Sass, The Stability of Value-Added Measures of Teacher Quality and Implications for Teacher Compensation Policy, Urban Institute (2008), available at http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf. See also Daniel F. McCaffrey et al., The Intertemporal Variability of Teacher Effect Estimates, 4 Educ. Fin. & Pol’y, 572 (2009).

[3] Sass, supra note 2.

[4] Id.

[5] Bill & Melinda Gates Foundation, supra note 1.

[6] Peter Z. Schochet & Hanley S. Chiang, Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains (NCEE 2010-4004). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education (2010).

[7] Id.

[8] Id. at 12.

[9] Sean P. Corcoran, Jennifer L. Jennings & Andrew A. Beveridge, Teacher Effectiveness on High- and Low-Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

[10] Id.

[11] Id.

[12] Id.

[13] Cory Koedel, An Empirical Analysis of Teacher Spillover Effects in Secondary School, 28 Econ. of Educ. Rev. 682 (2009).

[14] C. Kirabo Jackson & Elias Bruegmann, Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers, 1 Am. Econ. J.: Applied Econ. 85 (2009).

[15] Koedel, supra note 13.

[16] There exist at least two different approaches to control for peer group composition. One approach, used by Caroline Hoxby and Gretchen Weingarth, involves constructing measures of the average entry level of performance for all other students in the class.  C. Hoxby & G. Weingarth, Taking Race Out of the Equation: School Reassignment and the Structure of Peer Effects, available at http://www.hks.harvard.edu/inequality/Seminar/Papers/Hoxby06.pdf. Another involves constructing measures of the average racial and socioeconomic characteristics of classmates, as done by Eric Hanushek and Steven Rivkin. E. Hanushek & S. Rivkin, School Quality and the Black-White Achievement Gap, available at http://www.nber.org/papers/w12651.pdf?new_window=1.

[17] Jesse Rothstein, Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement, 25 Q. J. Econ. (2008). See also Jesse Rothstein, Student Sorting and Bias in Value Added Estimation: Selection on Observables and Unobservables, available at http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf. Many advocates of value-added approaches point to a piece by Thomas Kane and Douglas Staiger as downplaying Rothstein’s concerns. Thomas Kane & Douglas Staiger, Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation, available at http://www.nber.org/papers/w14607.pdf?new_window=1.   However, Eric Hanushek and Steven Rivkin explain, regarding the Kane and Staiger analysis: “the possible uniqueness of the sample and the limitations of the specification test suggest care in interpretation of the results.” Eric A. Hanushek & Steven G. Rivkin, Generalizations about Using Value-Added Measures of Teacher Quality 8, available at http://www.utdallas.edu/research/tsp-erc/pdf/jrnl_hanushek_rivkin_2010_teacher_quality.pdf.

[18] Richard Buddin, How Effective Are Los Angeles Elementary Teachers and Schools?, available at http://www.latimes.com/media/acrobat/2010-08/55538493.pdf.

[19] Eric Hanushek & Steve Rivkin, School Quality and the Black-White Achievement Gap, Educ. Working Paper Archive, Univ. of Ark., Dep’t of Educ. Reform (2007).

[20] Charles T. Clotfelter et al., Who Teaches Whom? Race and the Distribution of Novice Teachers, 24 Econ. of Educ. Rev. 377 (2005).

Smart Guy (Gates) makes my list of “Dumbest Stuff I’ve Ever Read!”

Bill Gates (clearly a very smart guy) has just topped my list of Dumbest Stuff I’ve Ever Read for the first few months of 2011. He did it with this post in the Huffington Post and with his talk to State Governors (in which he also naively handed out copies of the book Stretching the School Dollar, which is complete junk):

http://www.huffingtonpost.com/bill-gates/bill-gates-school-performance_b_829771.html

Let’s dissect two bold premises of Gates’ argument about US spending and student outcomes – how we’ve spent ourselves crazy for decades and how we’ve gotten nothing for it – how we spend so much more than other countries, but they kick our butts – his reasons for arguing that now is the time to flip the curve.

Gates opines:

Compared to other countries, America has spent more and achieved less.

To be able to make such a comparison, one would have to be able to accurately and precisely measure education spending levels in the United States relative to education spending levels in other countries, and achievement outcomes of children in the United States compared to otherwise similar children in other countries. We’ve already heard much blog talk about how poverty rates among US children and children in Finland are, well, not really so comparable – Finland having much lower poverty. Clearly, that makes at least some difference.

But let’s focus on the expenditure side of this puzzle for a moment.

We don’t hear enough about how those expenditure figures are, well, not so comparable either.

International education spending comparisons like those presented by the Organization for Economic Cooperation and Development (OECD) and often reported by organizations like McKinsey are, well, bogus…meaningless… uh…not particularly useful. Why? Because they are not comparable. Plain and simple.

Government or public education expenditures in different countries contain different components. A number of my colleagues and I are in the process of better understanding and delineating the components included in public education expenditures across nations. For example, in a country with a national health care system, public education expenditures may not include health care expenses for all employees. That’s not a trivial expense. The same may be true of pension contributions and obligations, where they exist, in other countries. The same is also true for arts and athletic programs in countries where it is more common for those activities to be embedded in community services. But, we’ve yet to fully identify the extent of these differences across nations or how these differences affect the spending comparisons. What we do know is that they do affect the spending comparisons – and likely quite significantly.

So, that in mind, what can we say about how much the US spends with respect to how well our children do, compared to other countries’ spending and outcomes when neither the spending figure nor the children in the system are even remotely comparable? Not much!

Gates opines:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead.

[from a previous post]

We often see pundits arguing that education spending has doubled over a 30 year period, when adjusted for inflation, and we’ve gotten nothing for it. We’ve got modest growth in NAEP scores and huge growth in spending. And those international comparisons… wow!

The assertion is therefore that our public education system is less cost-effective now than it was 30 years ago. But this assumption is based on layers of flawed reasoning, on both sides of the equation.

Here’s a bit of School Finance 101 on this topic:

First, what are the two sides of the equation, or at least the two parts of the fraction? The numerator here is education spending and how we measure it now compared to previously. The major flaw in the usual reasoning is that we are making our comparison of the education dollar now to then by simply adjusting the value of that dollar for the average changes in the prices of goods purchased by a typical consumer (food, fuel, etc.), or the Consumer Price Index.

Unfortunately, the consumer price index is relatively unhelpful (okay, useless) for comparing current education spending to past education spending, unless we are considering how many loaves of bread or gallons of gas can be purchased with the education dollar.

If we wanted to maintain constant quality education over time, the main thing we’d have to do is maintain a constant quality workforce in schools – mainly a teacher workforce, but also administrators, etc. At the very least, if quality lagged behind we’d have to be able to offset the quality losses with additional workers, but the trade-offs are hard to estimate.

The quality of the teacher workforce is influenced much more by the competitiveness of teachers’ wages, compared to other professions, than by changes in the price of a loaf of bread or a gallon of gas. If we want to get good teachers, teaching must be perceived as a desirable profession with a competitive wage. That is, to maintain teacher quality we must maintain the competitiveness of teacher wages (which we have not over time), and to improve teacher quality, we must make teacher wages (or working conditions) more competitive. On average, non-teacher wage growth has far outpaced the CPI over time, and on average, teacher wages have lagged behind non-teacher wages, even in New Jersey!
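A toy calculation shows why the choice of deflator matters so much. All growth factors below are invented for illustration, not actual spending or wage data: deflating nominal spending by a wage index rather than the CPI can cut the apparent “real” spending growth roughly in half.

```python
# Invented growth factors over a 30-year span: nominal per-pupil spending
# quadruples, consumer prices double, and competitive professional wages
# roughly triple. The deflator you pick changes the story substantially.
nominal_growth = 4.0       # nominal spending: 4x (hypothetical)
cpi_growth = 2.0           # consumer price index: 2x (hypothetical)
wage_index_growth = 3.2    # competitive professional wages: 3.2x (hypothetical)

real_growth_cpi = nominal_growth / cpi_growth           # the "spending doubled!" claim
real_growth_wages = nominal_growth / wage_index_growth  # far more modest

print(f"CPI-deflated growth:  {real_growth_cpi:.2f}x")
print(f"Wage-deflated growth: {real_growth_wages:.2f}x")
```

Since roughly 80% of school budgets buy labor, not bread and gas, the wage-deflated figure is the more meaningful one for the “doubled spending” argument.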

Now to the denominator or the outcomes of our education system. First of all, if we allow for a decline in the quality of the key input – teachers – we can expect a decline in the outcomes however we choose to measure them. But, it is also important to understand that if we wish to achieve either higher outcomes, or to achieve a broader array of outcomes, or to achieve higher outcomes in key areas without sacrificing the broader array of outcomes, costs will rise. In really simple terms, the cost of doing more is more, not less. And yes, a substantial body of rigorous peer-reviewed empirical literature supports this contention (a few examples below).

So, as we ask our schools to accomplish more we can expect the costs of those accomplishments to be greater. If we expect our children to compete in a 21st century economy, develop technology skills and still have access to physical education and arts, it will likely cost more, not less, than achieving the skills of 1970. But, we must also make sure we are adequately measuring the full range of outcomes we expect schools to accomplish. If we are expecting schools to produce engaged civic participants, we may or may not see the measured effects in elementary reading and math test scores.

An additional factor that affects the costs of achieving educational outcomes is the student inputs – or who is showing up at the schoolhouse door (or logging in to the virtual school). A substantial body of research (see chapter by Duncombe and Yinger, here) explains how child poverty, limited English proficiency, unplanned mobility and even school racial composition may influence the costs of achieving any given level of student outcomes. Differences in the ways children are sorted across districts and schools create large differences in the costs of achieving comparable outcomes, and so too do changes in the overall demography of the student population over time. Escalating poverty, mobility induced by housing disruptions, and increased numbers of children not yet proficient in English all increase the cost of achieving even the same level of outcomes achieved in prior years. This is not an excuse. It’s reality. It costs more to achieve the same outcomes with some students than with others.

In short, the “cost” of education rises as a function of at least 3 major factors:

  1. Changes in the incoming student populations over time
  2. Changes in the desired outcomes for those students, including more rigorous core content area goals or increased breadth of outcome goals
  3. Changes in the competitive wage of the desired quality of school personnel

And the interaction of all three of these! For example, changing student populations may make teaching more difficult (a working condition), meaning that a higher wage might be required simply to offset this change. Increasing the complexity of outcome goals might require a more skilled teaching workforce, requiring higher wages.

The combination of these forces often leads to an increase in education spending that far outpaces the consumer price index, and it should. Costs rise as we ask more of our schools, as we ask them to produce a citizenry that can compete in the future rather than the past. Costs rise as the student population inputs to our public schooling system change over time. Increased poverty, language barriers and other factors make even the current outcomes more costly to achieve. And the costs of maintaining the quality of the teacher workforce change as competitive wages in other occupations and industries change, which they have.

Typically, state school finance systems have not kept up with the true increased costs of maintaining teacher quality, increased outcome demands or changing student demography. Nor have states sufficiently targeted resources to districts facing the highest costs of achieving desired outcomes. See www.schoolfundingfairness.org. And many states with significantly changing demography, including Arizona, California and Colorado, have merely maintained or even cut current spending levels for decades (despite what would be increased costs of even maintaining current outcome levels).

Evaluating education spending solely on the basis of changes in the price of a loaf of bread and/or gallon of gasoline is, well, silly.

Notably, we may identify new “efficiencies” that allow us to produce comparable outcomes, with comparable kids at lower cost. We may find some of those efficiencies through existing variation across schools and districts, or through new experimentation. But it is downright foolish to pretend that those efficiencies are simply out there (even if we can’t see them, or don’t know them) and we can simply squeeze the current system into achieving comparable or better outcomes at lower cost.

Closing thought

So, Mr. Gates… neither of your two main premises rests on solid footing. Not only that, but these arguments are so commonplace and so intellectually flimsy and lazy as to be outright embarrassing.

I know you’ve got other things to think about and likely rely heavily on advisers to help you shape these arguments, much like politicians rely heavily on their staffers. Here’s a tip Mr. Gates. YOU ARE GETTING REALLY BAD, DEEPLY FLAWED ADVICE AND INFORMATION WHEN IT COMES TO SCHOOL FUNDING ARGUMENTS.

There are many, many credible school finance and economics of education scholars out there. Those who you have chosen to rely on in many instances – authors of Stretching the School Dollar and others are not credible scholars of school finance or education policy more generally. I tackle some of the other myths driving the current debate in these two recent posts:

School Funding Equity Smokescreens

School Funding Myths & Stepping Outside the “New Normal”

I don’t pretend by any stretch to be the only credible source, or the best one (or even one of the top 20, 50 or 100). And we in the field certainly don’t all agree on all, or perhaps even most topics. I’d try listing the many exceptional school finance and economics of education scholars here, but I’d likely end up leaving some really important ones out. I’ll gladly inform you directly regarding which scholars may provide the most useful information regarding specific topics and issues.

Cheers!

Related Readings

Baker, B.D., Taylor, L., Vedlitz, A. (2008) Adequacy Estimates and the Implications of Common Standards for the Cost of Instruction. National Research Council.  http://www7.nationalacademies.org/CFE/Taylor%20Paper.pdf

Duncombe, W., Lukemeyer, A., Yinger, J. (2006) The No Child Left Behind Act: Have Federal Funds been Left Behind? http://cpr.maxwell.syr.edu/efap/Publications/costing_out.pdf

This second one is a really fun article showing the vast differences in the costs of achieving NCLB proficiency targets in two neighboring states which happen to have very different testing standards. In really simple terms, Missouri has a hard test with low proficiency rates and Kansas an easy test with high proficiency rates. The authors show the cost implications of achieving the lower, versus higher, tested achievement standards.

School Funding Equity Smokescreens: A note to the Equity Commission

In this blog post, I summarize a number of issues I’ve addressed in the past. In my previous post, I discussed general reformy myths about school spending. In this post, I address smokescreens commonly occurring in DC beltway rhetoric about school funding equity and adequacy. School funding is largely a state and local issue, where even that “local” component is governed under state policies. So I guess that makes it a state issue, really. Occasionally, the federal government will dabble in the debate over how or whether to intervene more extensively in state and local public school finance. Now is one of those times where the federal government is again at least paying lip service to the question of equity – with some implication that they may even be talking about school funding equity. The federal government has created an equity commission!

One of my fears is that this current discussion of funding equity will be typical of recent beltway discussions of school funding, and be trapped in the constant fog of School Funding Smokescreens and insulated entirely from more legitimate representations and analyses of the critical issues that should be addressed.

So, for you – the equity commission – here’s a quick run down on School Funding Smokescreens:

1. On average, nationally, we now put more funding into higher poverty school districts than lower poverty ones (to no avail)

This argument seems to be popping up more and more of late, and often with the table below attached. This table is from the National Center for Education Statistics and shows the average current operating expenditure per pupil of school districts nationally over time. The table would appear to show that in 1994-95, low poverty school districts had between $300 and $400 less in per pupil spending than higher poverty ones. By 2006-07, the highest poverty quintile of school districts had about $100 per pupil more than the lowest poverty quintile. That’s it. We’re done. Equity problems fixed. No more savage inequalities. And after all of this fixing of school finance equity, we really got nothing for it. Achievement gaps are still unacceptably large and NAEP scores stagnant? Right? All of this after dumping a whole extra $100 per pupil into high poverty districts. I guess we should be rethinking this crazy strategy of systematically pouring so much into high poverty districts.

Table 1

NCES Oversimplification of Funding Differences by Poverty


Well, to begin with, a $100 difference really wouldn’t be that much anyway, given that the costs of actually meeting the needs of children from economically disadvantaged backgrounds are much greater than this. Setting that (really important) question aside, this table provides a less than compelling argument that we as a nation have accomplished improved funding equity for kids in high poverty districts.

Here’s the underlying scatter of school districts that leads to the neatly packed aggregations above. In the graph below, districts are plotted by current expenditures per pupil with respect to census poverty rates, using 2007-08 data. Clearly there is substantial variation in current spending. In fact, the underlying relationship isn’t even a relationship at all. It’s all over the place. And yes, if you fit a trend line – if you take out a huge magnifying glass – you can see that the trend line is ever so slightly higher in the higher poverty districts than in the lower poverty ones (perhaps about $100?). It’s not systematic. It’s not statistically significant. It’s pretty darn meaningless.
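The kind of check described here can be sketched in a few lines: fit a trend line of spending on poverty and ask whether the slope is statistically distinguishable from zero. The data below are simulated (a tiny underlying slope buried in large district-level noise), not the actual NCES district data behind Figure 1.

```python
# Simulated sketch: a ~$100 spending gap across the full poverty range,
# swamped by idiosyncratic district-level variation. None of these numbers
# come from the actual data; they are chosen to mimic the pattern described.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
poverty = rng.uniform(0.0, 0.4, n)  # hypothetical district poverty rates (0-40%)

# Slope of 250 per unit poverty means a $100 gap from 0% to 40% poverty,
# buried in $3,000 of noise per district:
spending = 10000 + 250 * poverty + rng.normal(0, 3000, n)

# Ordinary least squares slope and its standard error, computed by hand
X = np.column_stack([np.ones(n), poverty])
beta, *_ = np.linalg.lstsq(X, spending, rcond=None)
resid = spending - X @ beta
sigma2 = resid @ resid / (n - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
t_slope = beta[1] / np.sqrt(cov[1, 1])

print(f"estimated slope = {beta[1]:.0f} per unit poverty, t = {t_slope:.2f}")
```

With noise this large relative to the slope, the t-statistic falls well short of conventional significance thresholds — which is the sense in which a $100 trend-line difference is “pretty darn meaningless.”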

Figure 1

Pattern of school districts underlying Table 1


In our recent report Is school funding fair? we conducted a far more rigorous analysis of state and local revenue per pupil with respect to poverty, for each state. What we showed was that there exists huge variation across states both in the overall level of resources available to local public school districts and in the differences in state and local revenue in higher and lower poverty districts.  In that report, we showed that 9 states have statistically significantly lower state and local revenue per pupil in higher poverty districts (after controlling for economies of scale and competitive wage variation). Overall, half of states had lower funding per pupil in higher poverty districts (with many of those approximately the same).

Among the worst states were New York, Illinois and Pennsylvania. Let’s pull Illinois forward in Figure 1 – and also look at state and local revenues (excluding federal support, to focus on state policies) in place of current expenditures.

Figure 2

State and local revenues with respect to poverty, with Illinois highlighted


Now, when we exclude federal revenues the overall line tips slightly downward. The federal effect is slight, but there. More strikingly, when we pull Illinois forward in the picture, we see that funding by poverty across Illinois districts is highly regressive, and is systematic and statistically significant. Funding inequities across Illinois districts are far from being resolved. AND ILLINOIS IS NOT ALONE. I could go on and on with this.

2. The remaining (because of #1) most egregious disparities in funding and teacher quality occur across schools within districts (because of politically motivated and corrupt local administrators), and these disparities are what cause the persistent racial achievement gaps (the reason those gaps haven’t improved since we’ve fixed between-district inequity)

To many, this argument seems absurd (and is) on its face. Who really says that? Does anyone? Am I just makin’ this stuff up? No. And in fact, because this argument has become so pervasive of late, I even had to take the time to write a fairly extensive research article on the topic. See: http://epaa.asu.edu/ojs/article/view/718

I have written about this topic on my blog on several occasions and much of my writing on this topic can be found by reading my critiques of reports from the Center for American Progress and from the Education Trust. Here are some choice quotes where CAP and Ed Trust frame this argument – or blow this smoke!

Center for American Progress

State funding formulas tend to exert an equalizing effect on per pupil revenues between districts, on average, and not by accident. These formulas were sculpted by two generations of litigation and legislation seeking equitable or adequate funding for property-poor school districts.

Scandalous inequity in the distribution of resources within school districts has plagued U.S. education for more than a hundred years.

empirical literature documenting the extent of within-district inequity is astonishingly thin. [my reply: well, not if you actually read the research on the topic]

Center for American Progress

The outcome of such practices is predictable: A further widening of the dangerous achievement gap that has become endemic in American schools today.

Education Trust

Many states have made progress in closing the funding gaps between affluent school districts and those serving the highest concentrations of low-income children. But a hidden funding gap between high-poverty and low-poverty schools persists between schools within the same district.

These gaps occur partly because teachers in wealthier schools tend to earn more than their peers in high-poverty schools and because of pressure to “equalize” other resources across schools.

All of these claims that within district inequities are the major source of persistent inequity and that our failure to close within district funding and teacher quality gaps (having already fixed between district ones) are the reason for persistent black-white and poor-non-poor achievement gaps might be reasonable if poor children and non-poor children and black children and white children actually lived in the same school districts. BUT, IN GENERAL,* THEY DO NOT! As a result this argument is patently absurd, ridiculous, irresponsible and ignorant. It’s one massive distraction. A smokescreen of monumental proportion!

Here’s a quick visual of the reality that any informed analyst (or anyone who simply lives in the real world) understands. Below are two maps of the Chicago metropolitan area. On the left is a map which shows school districts and the level of state and local revenue per pupil in each of those districts. We know from above that Illinois maintains a very regressive state school finance formula. That is, higher poverty districts have less funding than lower poverty ones. Note that the diagonal shading indicates the location of districts that have majority minority (black and Hispanic) enrollment. As it turns out, most of those districts are in the orange – lower funding levels.

Now, CAP and Ed Trust would have you believe otherwise to begin with (that poor minority districts already have enough money), but would then go further to say that the real problem is that these Illinois districts are putting money into their white, rich schools at the expense of their poor black and Hispanic ones. How is that even possible?

Okay, so let’s look at the right hand panel, in which I have indicated the locations of individual schools, with majority black schools in red and majority Hispanic schools in purple. Majority white schools are in white. NOTE THAT THE WHITE DOTS TEND TO BE IN DISTRICTS ENTIRELY SEPARATE FROM THE PURPLE OR RED ONES, AND ONLY CHICAGO PUBLIC SCHOOLS HAS MUCH OF A MIX OF PURPLE AND RED. Only a handful of districts have both white and majority minority schools. Also, only a handful of districts have both low(er) poverty and high poverty schools. Districts are highly segregated.

Figure 3

State and Local Revenue and the Location of Majority Minority Schools in Illinois


FEW IF ANY SCHOOL DISTRICTS IN THIS MAP HAVE THE OPPORTUNITY TO REDISTRIBUTE RESOURCES ACROSS THEIR “RICH” AND “POOR” OR “BLACK” AND “WHITE” SCHOOLS – BECAUSE THEY DON’T HAVE BOTH!!!!!  Yes, Chicago Public Schools and a few other districts can re-allocate between poor black and poor Hispanic schools. But such re-allocation accomplishes little toward improving educational equity in the Chicago metro area or State of Illinois.

Now… to those at Ed Trust and CAP – if you really don’t mean this, I dare you to actually say it. Say that between district differences in demographics and funding are THE BIG ISSUE. At least as big if not much bigger than within district differences. Say it. Acknowledge it. I challenge you. Release another hastily crafted report and press release – but this time – having conclusions that are at least reasonably grounded in reality.  The data are unambiguous in this regard. Yes, within district disparities exist and it is important to address them. I will certainly admit that, and I’ve never said otherwise.  But solving within district resource variation alone will accomplish very little.

*Clarification – In states with county-wide districts and large, diverse populations, like Florida, between-school (within-district) segregation is more likely to be the greater problem.

3. High need, poor urban districts (in addition to misallocating all of their resources to the schools serving rich white kids in their district???) are simply wasting massive sums of money on things like cheerleading and ceramics.

This is another absurd and empirically unfounded argument. Again, you ask, is anyone really saying that high need, low performing school districts are actually wasting money on cheerleading and ceramics that could easily be translated into sufficient resources for improving reading and math performance (can we really fire the cheerleading coach and hire 6 more math specialists)? Surely no one is advancing an argument – SMOKESCREEN – that utterly absurd. But again, these quotes can be found all over the Beltway talk-circuit regarding the best fixes for school funding inequities and inefficiencies (and nifty ways to stretch that school dollar).

Here’s the advertisement headline from a recent beltway discussion at the Urban Institute:

Urban Institute Event Headline (based on content from Marguerite Roza)

Imagine a high school that spends $328 per student for math courses and $1,348 per cheerleader for cheerleading activities. Or a school where the average per-student cost of offering ceramics was $1,608; cosmetology, $1,997; and such core subjects as science, $739.

I’ve only recently begun exploring more deeply the resource differences across school districts that fall into different performance and efficiency categories. I’ve been specifically looking at Illinois and Missouri school districts, and estimating statistical models to determine which districts are:

a) resource constrained and low performing (low-low)

b) resource constrained and high performing (low-high)

c) resource rich and high performing (high-high)

d) resource rich and low performing (high-low)

These categories are based on thoroughly cost adjusted analysis. As such, a district identified as having low or constrained “resources” may actually spend more per pupil in nominal dollars than a district identified as having high resource levels. The resource levels are adjusted for various cost pressures including differences in student needs. I should be posting the forthcoming paper on my research page some time in the next month. But here’s a preview.

In both states, most districts fall into categories a) and c), where you would expect. There’s somewhat more “scatter” in Missouri, either because Missouri has some better funded high need districts (less regressive than Illinois) or because my statistical model just isn’t working quite right. I picked these neighboring states because Missouri is less regressive than Illinois and because I had similar data on both. So, the big question here is – if I compare the dominant categories of resource constrained low performing schools to resource rich high performing ones what do we actually see in the organization of their staffing and course delivery?

In Missouri, I tabulate each individual course to which teachers are assigned. In Illinois my tabulation is by the main assignment of each teacher. To begin with, in both states, the high spending high performing schools have more course offerings per pupil and more teachers per pupil (and smaller class sizes). These differences are far greater under the more regressive Illinois policies.

Here are a few fun visuals of what I’m finding so far, expressed in “shares of staff” allocation and relating staffing allocations in low-low districts to those of high-high districts.

The first two graphs compare the main assignments of teachers in high resource high performing Illinois schools (high school assignments only) to those in low resource low performing ones. The diagonal line represents “comparable” allocation to high resource high performing schools. Assignments falling below the line represent “deficits” (relative) in low resource low performing schools.

Across all assignment areas, Figure 4 shows that kids in low resource low performing schools tend to have reduced access to physical education, biology, chemistry and foreign language. Sadly, no indicator for ceramics in these data.

Figure 4

Allocation of Main Teaching Assignments in Illinois Districts


Focusing on less frequent assignment areas – lower budget share & staff allocation areas – Figure 5 shows that in Illinois, kids in low resource low performing schools tend to have reduced access to advanced math and science courses and drivers education, but have greater access to basic courses. That is, these districts are already channeling their resources to the basics, to the detriment of potentially important advanced coursework in math and science, and even basic coursework in biology and chemistry.

Figure 5

Allocation of Main Teaching Assignments in Illinois Districts (less frequent assignments)


Missouri – despite having somewhat higher relative resource levels in higher poverty settings (than Illinois, but still regressive), shows very similar patterns. Figure 6 shows reduced access to physical education for kids in low resource low outcome schools and elevated access to “general” math and language arts courses.

Figure 6

Allocation of Assigned Courses in Missouri Districts


Kids in low resource low outcome schools have reduced access to advanced math courses including calculus and trigonometry, and reduced access to chemistry. They have higher shares of teachers in special education, basic life skills, earth and physical (basic/introductory) science and in JROTC. Again, significant reallocation to “basics” is already occurring and within significant resource constraints.

Figure 7

Allocation of Assigned Courses in Missouri Districts (less frequent courses)


Also, LET IT BE KNOWN THAT HIGH SPENDING HIGH PERFORMING SCHOOLS IN MISSOURI HAVE TWICE AS MANY CERAMICS COURSE OFFERINGS PER PUPIL AS LOW SPENDING LOW PERFORMING MISSOURI SCHOOLS!!!!!

4. None of this school funding equity – between district stuff – matters anyway!

Rigorous peer reviewed studies do show that state school finance reforms matter. Shifting the level of funding can improve the quality of teacher workforce and ultimately the level of student outcomes and shifting the distribution of resources can shift the distribution of outcomes.

We conclude that there is arbitrariness in how research in this area appears to have shaped the perceptions and discourse of policymakers and the public. Methodological complexities and design problems plague finance impact studies. Advocacy research that has received considerable attention in the press and elsewhere has taken shortcuts toward desired conclusions, and this is troubling. As demonstrated by our own second look at the states discussed in Hanushek and Lindseth’s book, the methods used for such relatively superficial analyses are easily manipulable and do not necessarily lead to the book’s conclusions. Higher quality research, in contrast, shows that states that implemented significant reforms to the level and/or distribution of funding tend to have significant gains in student outcomes. Moreover, we stress the importance of the specific nature of any given reform: positive outcomes are likely to arise only if the reform is both significant and sustained. Court orders alone do not ensure improved outcomes, nor do short-term responses.

Dice Rolling Activity for New Jersey Teachers

Yesterday, New Jersey’s Education Commissioner announced his plans for how teachers should be evaluated, what teachers should have to do to achieve tenure, and on what basis a teacher could be relieved of tenure. In short, Commissioner Cerf borrowed from the Colorado teacher tenure and evaluation plan which includes a few key elements (Colorado version outlined at end of post):

1. Evaluations based 50% on teacher effectiveness ratings generated with student assessment data – or value-added modeling (though not stated in those specific terms)

2. Teachers must receive 3 positive evaluations in a row in order to achieve tenure.

3. Teachers can lose tenure status or be placed at risk of losing tenure status if they receive 2 negative evaluations in a row.

This post is intended to illustrate just how ill-conceived – how poorly thought out – the above parameters are. This all seems logical on its face, to anyone who knows little or nothing about the fallibility of measuring teacher effectiveness, or about probability and statistics more generally. Of course we only want to tenure “good” teachers, and we want a simple mechanism to get rid of bad ones. If only it were that easy to set up simple parameters of goodness and badness and put such a system into place. Well, it’s not.

Here’s an activity for teachers to try today. It may take more than a day to get it done.

MATERIALS: DICE (well, really just one Die)! That’s all you need!

STEP 1: Roll Die. Record result. Roll again. Record result. Keep rolling until you get the same number 3 times in a row. STOP. Write down the total number of rolls.

STEP 2: Roll Die. Record result. Roll again. Record result. Keep rolling until you get the same number 2 times in a row. STOP. Write down the total number of rolls.

Post your results in the comments section below.
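If rolling by hand gets tedious, the activity above is easy to simulate. Here’s a short sketch (the helper function and trial count are mine, not part of the activity itself):

```python
# Simulate the dice activity: average number of rolls until the same face
# comes up 3 times in a row (Step 1) vs. 2 times in a row (Step 2).
import random

def rolls_until_run(run_length, rng):
    """Roll a fair die until some face repeats run_length times in a row."""
    rolls = 0
    current_run = 0
    last = None
    while current_run < run_length:
        face = rng.randint(1, 6)
        rolls += 1
        current_run = current_run + 1 if face == last else 1
        last = face
    return rolls

rng = random.Random(42)
trials = 20000
avg3 = sum(rolls_until_run(3, rng) for _ in range(trials)) / trials
avg2 = sum(rolls_until_run(2, rng) for _ in range(trials)) / trials
print(f"average rolls for 3 in a row: {avg3:.1f}")  # theoretical expectation: 43
print(f"average rolls for 2 in a row: {avg2:.1f}")  # theoretical expectation: 7
```

The theoretical expectations (43 rolls for a run of three, 7 for a run of two) follow from standard waiting-time calculations for runs: with a 1/6 chance of matching the previous roll, the expected count is 1 + 1/p + 1/p² for a run of three and 1 + 1/p for a run of two.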

Now, what the heck does this all mean? Well, as I’ve written on multiple occasions, the year-to-year instability of teacher ratings based on student assessment scores is huge. Alternatively stated, the relationship between a teacher’s rating in one year and the next is pretty weak. The likelihood of getting the same rating two straight years is pretty low, and three straight years is very low. The year-to-year correlation, whether we are talking about the recent Gates/Kane studies or previous work, is about .2 to .3. There’s about a 35% chance that an average teacher in any year is misidentified as poor, given one year of data, and a 25% chance given two years of data. That’s a very high error rate and a very low year-to-year relationship. This is noise. Error. Teachers – this is not something over which you have control! Teachers have little control over whether they can get 3 good years in a row. AND IN THIS CASE, I’M TALKING ONLY ABOUT THE NOISE IN THE DATA, NOT THE BIAS RESULTING FROM WHICH STUDENTS YOU HAVE!

What does this mean for teachers being tenured and de-tenured under the above parameters? Given the random error – instability alone – it could take quite a long time, a damn long time, for any teacher to actually string together 3 good years of value-added ratings. And even if one does, we can’t be that confident that he/she is really a good teacher. The dice rolling activity above may actually provide a reasonable estimate of how long it would take a teacher to get tenure (depending on how high or low teacher ratings have to be to achieve or lose tenure). In that case, you’ve got a 1/6 chance with each roll that you get the same number you got on the previous roll. Of course, getting the same number as your first roll two more times is a much lower probability than getting that number only one more time. You can play it more conservatively by just seeing how long it takes to get 3 rolls in a row where you get a 4, 5 or 6 (above average rating), and then how long it takes to get only two in a row of a 1, 2, or 3.

What does that mean? That means that it could take a damn long time to string together the ratings to get tenure, and not very long to be on the chopping block for losing it. Try the activity. Report your results below.

Each roll above is one year of experience. How many rolls did it take you to get tenure? And how long to lose it?

Now, I’ve actually given you a break here, because I’ve assumed that when you got the first of three in a row, that the number you got was equivalent to a “good” teacher rating. It might have been a bad, or just average rating. So, when you got three in a row, those three in a row might get you fired instead of tenured. So, let’s assume a 5 or a 6 represent a good rating. Try the exercise again and see how long it takes to get three 5s or three 6s in a row. (or increase your odds of either success or failure by lumping together any 5 or 6 as successful and any 1 or 2 as unsuccessful, or counting any roll of 1-3 as unsuccessful and any roll of 4 -6 as successful)

Of course, this change has to work both ways too. See how long it takes to get two 1s or two 2s in a row, assuming those represent bad ratings.
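That variant – counting a 5 or 6 as a good rating (probability 1/3 per year) and a 1 or 2 as a bad one (also 1/3) – can be simulated rather than rolled by hand. A sketch, with my own helper function and trial counts:

```python
# Simulate the "5 or 6 = good year, 1 or 2 = bad year" variant: average years
# until 3 good ratings in a row (tenure) vs. 2 bad ratings in a row (at risk).
# Each outcome occurs with probability 1/3 in any given year.
import random

def years_until_streak(hit_prob, streak, rng):
    """Count years until `streak` consecutive hits, each with prob hit_prob."""
    years, run = 0, 0
    while run < streak:
        years += 1
        run = run + 1 if rng.random() < hit_prob else 0
    return years

rng = random.Random(1)
trials = 20000
to_tenure = sum(years_until_streak(1/3, 3, rng) for _ in range(trials)) / trials
to_risk = sum(years_until_streak(1/3, 2, rng) for _ in range(trials)) / trials
print(f"average years to 3 good ratings in a row: {to_tenure:.1f}")  # theory: 39
print(f"average years to 2 bad ratings in a row:  {to_risk:.1f}")    # theory: 12
```

The expected waiting times (3 + 9 + 27 = 39 years for the good streak, 3 + 9 = 12 for the bad one) make the asymmetry concrete: under pure chance, losing tenure protection is expected far sooner than earning it.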

Now, defenders of this approach will likely argue that they are putting only 50% of the weight of evaluations on these measures. The rest will include a mix of other objective and subjective measures. The reality of an evaluation that includes a single large, or even significant weight, placed on a single quantified factor is that that specific factor necessarily becomes the tipping point, or trigger mechanism. It may be 50% of the evaluation weight, but it becomes 100% of the decision, because it’s a fixed, clearly defined (though poorly estimated) metric.

In short, based on the instability of measures alone, the average time to tenure will be quite long, and highly unpredictable. And those who actually get tenure may not be much more effective – or any more effective at all – than those who don’t. It’s a crap shoot. Literally!

Then, losing tenure will be pretty easy… also a crap shoot… but your odds of losing are much greater than your odds were of winning.

And who’s going to be lining up for these jobs?

Summary of research on “intertemporal instability” and “error rates”

The assumption in value-added modeling for estimating teacher “effectiveness” is that if one uses data on enough students passing through a given teacher each year, one can generate a stable estimate of the contribution of that teacher to those children’s achievement gains.[1] However, this assumption is problematic because of the concept of inter-temporal instability: that is, the same teacher is highly likely to get a very different value-added rating from one year to the next.  Tim Sass notes that the year-to-year correlation for a teacher’s value-added rating is only about 0.2 or 0.3 – at best a very modest correlation.  Sass also notes that:

About one quarter to one third of the teachers in the bottom and top quintiles stay in the same quintile from one year to the next while roughly 10 to 15 percent of teachers move all the way from the bottom quintile to the top and an equal proportion fall from the top quintile to the lowest quintile in the next year.[2]

Further, most of the change or difference in a teacher’s value-added rating from one year to the next is unexplainable: it is not attributable to differences in observed student characteristics, peer characteristics, or school characteristics.[3]
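To see why a year-to-year correlation of only 0.3 produces quintile churn of roughly the size Sass describes, here is a minimal simulation. It treats each teacher’s observed rating as a stable component plus year-specific noise, scaled so ratings correlate at 0.3 across years; the sample size and the correlation value are my illustrative assumptions, not Sass’s data:

```python
import random

rng = random.Random(0)
n = 100_000
r = 0.3  # assumed year-to-year correlation (upper end of Sass's range)

# Stable component has variance r, yearly noise has variance (1 - r),
# so total variance is 1 and the across-year correlation is r.
ratings = []
for _ in range(n):
    stable = rng.gauss(0, r ** 0.5)
    y1 = stable + rng.gauss(0, (1 - r) ** 0.5)
    y2 = stable + rng.gauss(0, (1 - r) ** 0.5)
    ratings.append((y1, y2))

def quintiles(values):
    """Assign each value a quintile 0 (bottom) through 4 (top) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order):
        q[i] = rank * 5 // len(values)
    return q

q1 = quintiles([p[0] for p in ratings])
q2 = quintiles([p[1] for p in ratings])

top = [i for i in range(n) if q1[i] == 4]
stay = sum(q2[i] == 4 for i in top) / len(top)
fall = sum(q2[i] == 0 for i in top) / len(top)
print(f"top-quintile teachers who stay on top next year: {stay:.0%}")
print(f"top-quintile teachers who fall to the bottom:    {fall:.0%}")
```

With these assumptions, roughly a third of top-quintile teachers stay on top and around a tenth crash all the way to the bottom the next year, consistent in magnitude with the quintile movement Sass reports.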

Similarly, preliminary analyses from the Measures of Effective Teaching Project, funded by the Bill and Melinda Gates Foundation, found:

When the between-section or between-year correlation in teacher value-added is below .5, the implication is that more than half of the observed variation is due to transitory effects rather than stable differences between teachers. That is the case for all of the measures of value-added we calculated.[4]

While some statistical corrections and multi-year analysis might help, it is hard to guarantee, or even be reasonably sure, that a teacher would not be dismissed simply as a function of unexplainably low performance for two or three years in a row.

Classification & Model Prediction Error

Another technical problem with VAM-based teacher evaluation systems is classification and/or model prediction error. Researchers at Mathematica Policy Research, in a study funded by the U.S. Department of Education, carried out a series of statistical tests and reviews of existing studies to determine the identification “error” rates for ineffective teachers when using typical value-added modeling methods.[5] The report found:

Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding error rates for overall false positive and negative errors are 10 and 20 percent, respectively.[6]

Type I error refers to the probability that, based on a certain number of years of data, the model will find that a truly average teacher performed significantly worse than average.[7] That means there is about a 25% chance (using three years of data) or a 35% chance (using only one year of data) that an “average” teacher would be identified as “significantly worse than average” and potentially be fired. Of particular concern is the likelihood that a “good” teacher is falsely identified as a “bad” teacher, in this case a “false positive” identification. According to the study, this occurs one in ten times given three years of data and two in ten times given only one year of data.
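As a toy illustration of why more years of data reduce, but do not eliminate, these error rates: averaging over three years shrinks the noise in a teacher’s estimate by a factor of √3, so fewer truly average teachers land below any fixed cutoff. The noise level and cutoff below are arbitrary assumptions chosen for illustration; this is not a reproduction of the Mathematica study’s model:

```python
import random

rng = random.Random(1)
n_teachers = 20_000
noise_sd = 1.0   # assumed: single-year estimation noise (arbitrary units)
cutoff = -0.5    # assumed: estimates below this are flagged "worse than average"

def flagged_share(years):
    """Share of truly average teachers (true effect = 0) whose
    multi-year average estimate falls below the cutoff."""
    flagged = 0
    for _ in range(n_teachers):
        estimate = sum(rng.gauss(0.0, noise_sd) for _ in range(years)) / years
        flagged += estimate < cutoff
    return flagged / n_teachers

one_year = flagged_share(1)
three_year = flagged_share(3)
print(f"1 year of data:  {one_year:.0%} of average teachers flagged")
print(f"3 years of data: {three_year:.0%} of average teachers flagged")
```

The exact shares depend entirely on the assumed noise and cutoff; the mechanism is the point: noise shrinks only with the square root of the number of years, so even three years of data leaves a substantial false-identification rate.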

Same Teachers, Different Tests, Different Results

A teacher’s measured effectiveness may vary depending on the assessment used in a specific subject area, and not simply on whether that teacher is generally effective in that subject area. For example, Houston uses two standardized tests each year to measure student achievement: the state Texas Assessment of Knowledge and Skills (TAKS) and the nationally normed Stanford Achievement Test.[8] Corcoran and colleagues used Houston Independent School District (HISD) data from each test to calculate separate value-added measures for fourth and fifth grade teachers.[9] The authors found that a teacher’s value-added can vary considerably depending on which test is used.[10] Specifically:

among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test.  Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.[11]

Similar issues apply to tests on different scales: different possible ranges of scores, or different statistical treatments of raw scores, such as whether student test scores are first converted into standardized scores relative to an average score, or expressed on some other scale such as percentile ranks (which is done in some cases but would generally be considered inappropriate). For instance, if a teacher is typically assigned higher-performing students and the scaling of a test makes it very difficult for students with high starting scores to improve over time, that teacher will be at a disadvantage. But another test of the same content, or simply one with different scaling of scores (so that smaller gains are adjusted to reflect the relative difficulty of achieving those gains), may produce an entirely different rating for that teacher.

Brief Description of Colorado Model

Colorado, Louisiana, and Tennessee have proposed teacher evaluation systems that will require 50% or more of each evaluation to be based on students’ academic growth. This section summarizes the evaluation systems in these states, as well as the procedural protections provided for teachers.

Colorado’s statute creates a state council for educator effectiveness that advises the state board of education.[12] A major goal of this council is to aid in the creation of teacher evaluation systems ensuring that “every teacher is evaluated using multiple fair, transparent, timely, rigorous, and valid methods.”[13] Considerations of student academic growth must comprise at least 50% of each evaluation.[14] Quality measures for teachers must include “measures of student longitudinal academic growth,” such as “interim assessments results or evidence of student work, provided that all are rigorous and comparable across classrooms and aligned with state model content standards and performance standards.”[15] These quality standards must take diverse factors into account, including “special education, student mobility, and classrooms with a student population in which ninety-five percent meet the definition of high-risk student.”[16]

Colorado’s statute also calls for school districts to develop appeals procedures. A teacher or principal who is deemed ineffective must receive written notice, the documentation used in making this determination, and an identification of the deficiencies.[17] Further, the school district must ensure that a tenured teacher who disagrees with this designation has “an opportunity to appeal that rating, in accordance with a fair and transparent process, where applicable, through collective bargaining.”[18] If no collective bargaining agreement is in place, then the teacher may request a review “by a mutually agreed-upon third party.”[19] The school district or board for cooperative services must develop a remediation plan to correct these deficiencies, which will include professional development opportunities intended to help the teacher achieve an effective rating in her next evaluation.[20] The teacher or principal must receive a reasonable amount of time to correct such deficiencies.[21]


[1] Tim R. Sass, The Stability of Value-Added Measures of Teacher Quality and Implications for Teacher Compensation Policy, Urban Institute (2008), available at http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf. See also Daniel F. McCaffrey et al., The Intertemporal Variability of Teacher Effect Estimates, 4 Educ. Fin. & Pol’y, 572 (2009).

[2] Sass, supra note 27.

[3] Id.

[4] Bill & Melinda Gates Foundation, supra note 26.

[5] Peter Z. Schochet & Hanley S. Chiang, Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains (NCEE 2010-4004). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education (2010).

[6] Id.

[7] Id. at 12.

[8] Sean P. Corcoran, Jennifer L. Jennings & Andrew A. Beveridge, Teacher Effectiveness on High- and Low-Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

[9] Id.

[10] Id.

[11] Id.

[12] Co. Rev. Stat. § 22-9-105.5(2)(a) (2010).

[13] Id. § 22-9-105.5(3)(a).

[14] Id.

[15] Id.

[16] Id.  The statute also calls for the creation of performance evaluation councils that advise school districts.  Id. § 22-9-107(1).  The performance evaluation councils also help school districts develop teacher evaluation systems that must be based on the same measures as that developed by the state council for educator effectiveness.  Id. § 22-9-106(1)(e)(II).  However, the performance evaluation councils lose their authority to set standards once the state board has promulgated rules and the initial phase of statewide implementation has been completed.  Id. § 22-9-106(1)(e)(I).

[17] Id. § 22-9-106(3.5)(b)(II).

[18] Id.

[19] Id.

[20] Id.

[21] Id.

School Funding Myths & Stepping Outside the “New Normal”

I’ve been writing quite a bit lately about rather complex state school finance formula issues. That is much the point of this blog. But it may be hard for some to see how my recent posts on school finance relate back to the broader reform agenda, and to understand the implications of these posts for state policies. Let me try to summarize these posts – posts on spending bubbles, the “New Normal” and school finance PORK. My overarching goal in these posts is to explain that much of the reformy rhetoric about budget cuts and the “New Normal” is based on myths about how school funding works and what we should be doing in these catastrophic economic times.

Here are the myths and some of the realities.

Reformy myth #1: That every state has done its part and more, to pour money into high need, especially poor urban districts. It hasn’t worked, mainly because teachers are lazy and overpaid and not judged on effectiveness, measured by value-added scores. So, now is the time to slash the budgets of those high need districts, where all of the state aid is flowing, and fire the worst teachers. And, it will only help, not hurt.

Reality: Only a handful of states have actually targeted substantial additional resources to high need districts. See www.schoolfundingfairness.org. And the effort of states to finance their elementary and secondary education systems varies widely. Some states have in fact systematically reduced their effort to finance local public schools for decades. That is, the tax burden to finance public schools in some states is much lower now than it was decades ago. Very few states apply much higher effort than in the past.  See: https://schoolfinance101.wordpress.com/2010/12/23/is-it-the-new-normal-or-the-new-stupid/

Reformy myth #2: The only aid to be cut, the aid that should be cut, and the aid that must be cut in the name of the public good, is aid to high need, large urban districts in particular. The argument appears to be that handing down state aid cuts as a flat percent of state aid is the definition of “shared sacrifice.” And the garbage analysis of district Return on Investment by the Center for American Progress, of course, validates that high need urban districts tend to be least efficient anyway. Therefore, levying the largest cuts on those districts is entirely appropriate.

Reality: As I have discussed in my series of recent posts, if there are going to be cuts – if states really believe that cuts to state aid are absolutely necessary – many state aid formulas include aid that is more appropriate to cut. That is, aid to districts that really don’t need it. Aid to districts that can already spend well above all others with less local effort. Aid to districts that will readily replace their losses in state aid with additional local revenues (or even private contributions). That’s the pork, and that’s where cuts, if necessary, should occur.

Reformy myth #3: The general public is fed up and doesn’t want to waste any more of its hard-earned tax dollars on public schools. People are fed up with greedy teachers with gold-plated benefits and fed up with high-paid administrators. They don’t care about small class sizes and… well… are just fed up with all of this taxing and spending on public schools that stink. As a result, the only answer is to cut that spending and simultaneously make schools better.

Reality: The reality is that local voters in a multitude of surveys rate their own local public schools quite highly, and that local voters, when given the opportunity, even during the recent economic downturn, show very high rates of support for school budgets – including budgets with significant property tax increases (the most hated tax). As I noted in a previous post, when New Jersey handed down state aid cuts to 2010-2011 school budgets and when, for the first time in a long time, the majority of local district budgets statewide failed to achieve approval from local voters, it was still the case that the vast majority (72%) of local budgets passed in affluent communities – in most cases raising sufficient local property tax resources to cover the state aid cuts. In another case, local residents in an affluent suburban community privately raised $420,000 to save full-day kindergarten programs. Meghan Murphy’s analysis of Hudson Valley school districts shows that New York State districts also have attempted to counterbalance state aid cuts with property tax increases, but that the districts have widely varied capacity to pull this off. Parents in a Kansas district are suing in federal court requesting injunctive relief to allow them to raise their taxes for their schools (they use faulty logic and legal arguments, but their desire for better schools should be acknowledged!).

Reformy myth #4: None of this school funding stuff matters anyway. It doesn’t matter what the overall level of funding is and it doesn’t matter how that funding is distributed. As evidence of this truthiness, reformers point to 30+ years of huge spending growth coupled with massive class size reduction and they argue… flat NAEP scores, low international performance and flat SAT scores. Therefore, if we simply cut funding back to 1980 levels (adjusted only for the CPI) and fire bad teachers, we can achieve the same level of outcomes for one heck of a lot less money.

Reality: First of all, these comparisons of spending now to spending then are bogus. I address the various factors that influence the changing costs of achieving desired educational outcomes in this post: https://schoolfinance101.wordpress.com/2011/01/12/understanding-education-costs-versus-inflation/. Second, rigorous peer reviewed studies do show that state school finance reforms matter. Shifting the level of funding can improve the quality of teacher workforce and ultimately the level of student outcomes and shifting the distribution of resources can shift the distribution of outcomes. http://www.tcrecord.org/content.asp?contentid=16106 Similarly, constraining education spending growth over the long term can significantly harm the quality of public schools. See: https://schoolfinance101.wordpress.com/2010/04/22/a-few-quick-notes-on-tax-and-expenditure-limits-tels/

An opportunity for states?

I would argue that now… right now… represents a real opportunity for those states that actually want to step up, really invest in the quality of their education systems, and use that quality to drive their economic futures.

I stumbled across this article http://www.foxnews.com/us/2011/02/13/states-offer-tax-breaks-guarantee-jobs/ on the Fox News website the other day, and it presents some useful insights for state policy makers regarding tax policy decisions and economic growth. I wrote about the same point very early in my blogging (that the Small Business Survival Index, in particular, misses some big points about location selection). Here’s a short section of the Fox News piece:

But there’s a catch to the anti-tax, pro-business rhetoric: Businesses consider a range of factors when deciding where to locate, including the quality of schools, roads and programs that rely on a certain level of public spending and regulation. And evidence suggests there is little correlation between a state’s tax rate and its overall economic health.

“Concerns about taxes are overstated,” said Matt Murray, a professor of economics at the University of Tennessee who studies state finance. “Labor costs, K-12 education and infrastructure availability are all part of a good business climate. And you can’t have those without some degree of taxation.”

States’ tax rates also do not predict their resilience during an economic downturn.

Arguably, no time is better than now. Other states are jumping on board with the “New Normal” reformy logic that slashing education budgets, increasing class sizes and narrowing curriculum around tested content areas is the only way to go. Yet educated parents invariably want small class sizes (often topping the list of preferences for private or public schools) and want their children to have a broad, enriched, intellectually stimulating curriculum. The current environment presents a great opportunity for some states to step outside the “New Normal” and truly race to the top with real investment in their public schooling systems.