Blog

Dumbest “real” reformy graphs!

So in my previous post I created a set of hypothetical research studies that might be presented at the Reformy Education Research Association annual meeting. In creating those hypotheticals I actually tried to stay pretty close to reality, setting up reasonable tables with information that is actually quite plausible. Now, when we get down to the real reformy stuff that’s out there, it’s a whole lot worse. In fact, had I presented the “real” stuff in my previous post, I’d have been criticized for fabricating examples that are just too stupid to be true. Let’s take a look at some real “reformy” examples here:

1. From Democrats for Education Reform of Indiana

According to the DFER web site post which includes this graph:

True, there are some great, traditional public schools in Indiana and throughout the nation.  We’re also fortunate that a vast majority of our educators excel at their jobs and are dedicated to doing whatever it takes to help students succeed.  However, that doesn’t mean we should turn a blind eye to what ISN’T working.  Case in point?  The following diagram displays how all 5th grade classes in the span of a year in one central Indiana school district are doing on a set of state Language Arts student academic standards.  Because 5th grade classes in Indiana are only taught by one teacher, the dots can be translated to display how well the students of individual teachers are doing.

Now, ask yourself this:  In which dot or class would you want your child?  And, imagine if your child were in the bottom performing classroom for not one but MULTIPLE years.  In spite of lofty claims made by those who defend the current system, refusal to offer constructive alternatives to rectify charts such as the one above represents the sad state of education dialogue in America today.

So, here we have a graph… a line graph of all things, across classrooms (3rd grade graphing note – a bar graph would be better, but still stupid). This graph shows the average pass rates on state assessments for kids in each class. Nothin’ else. Not gains. Just average scores. Gains wouldn’t necessarily tell us that much either. But this is truly absurd.  The author of the DFER post makes the bold leap that the only conclusion one can draw from differences in average pass rates across a set of Indiana classrooms is that some teachers are great and others suck! Had I used this “real” example to criticize reformers, most would have argued that I had gone overboard.

2. Bill Gates’ brilliant exposition on turning that curve upside down – and making money matter

Now I’ve already written about this graph, or at least the post in which it occurs, but I didn’t include the graph itself.

Gates uses this chart to advance the argument:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries… For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.

Among other things, the chart includes no international comparison, which becomes the centerpiece of the policy argument. Beyond that, the chart provides no real evidence of a lack of connection between spending and outcomes across districts within U.S. states. Instead, the chart juxtaposes completely different measures on completely different scales to make it look like one number is rising dramatically while the others stay flat. This tells us NOTHING. It’s just embarrassing. Simply from a graphing standpoint, a blogger at Junk Charts noted:

Using double axes earns justified heckles but using two gridlines is a scandal!  A scatter plot is the default for this type of data. (See next section for why this particular set of data is not informative anyway.)

Not much else to say about that one. Again, had I used an example this absurd to represent reformy research and thinking, I’d have likely faced stern criticism for mis-characterizing the rigor of reformy research!

Hat tip to Bob Calder on Twitter for finding an even more absurd representation of pretty much the same graph used by Gates above. This one comes to us from none other than Andrew Coulson of the Cato Institute. Coulson has a stellar record of this kind of stuff. So, what would you do to the Gates graph above if you really wanted to make your case that spending has risen dramatically and we’ve gotten no outcome improvement? First, use total rather than per pupil spending (and call it “cost”), and stretch the scale on the vertical axis for the spending data to make it look even steeper. Then express the achievement data in percent change terms: NAEP scale scores are in the 215 to 220 range for 4th grade reading, for example, and are scaled such that even small point gains may be important/relevant, but those gains won’t even show as a blip if expressed as a percent over the base year.
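To see just how stark the percent-change trick is, here’s a quick back-of-envelope sketch. The 215-to-220 scale scores are the ballpark figures mentioned above; the spending numbers are purely hypothetical, chosen only to mimic a doubling:

```python
# Back-of-envelope: why percent change over a base year hides NAEP gains.
# Scale scores of 215 -> 220 are the ballpark cited above; the spending
# figures are hypothetical, chosen only to mimic a doubling.

naep_base, naep_now = 215, 220        # 4th grade reading scale scores
spend_base, spend_now = 4000, 8000    # hypothetical per-pupil dollars

naep_pct = 100 * (naep_now - naep_base) / naep_base
spend_pct = 100 * (spend_now - spend_base) / spend_base

print(f"NAEP gain:     {naep_now - naep_base} points = {naep_pct:.1f}% over base")
print(f"Spending gain: {spend_pct:.0f}% over base")
# A 5-point gain (which can be educationally meaningful) shows up as ~2.3%,
# a flat line next to a 100% spending increase plotted on the same axis.
```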

And here’s the StudentsFirst version of the same old story:

3. Original promotional materials from the reformy documentary, The Cartel (a manifesto on New Jersey public schools)

The Cartel is essentially the ugly step-cousin of Waiting for Superman and The Lottery. I wrote extensively about The Cartel when it was originally released and then when it made its Jersey tour. Thankfully, it didn’t get much beyond that. Back when it was merely a small time, low budget, ill-conceived, and even more poorly researched pile of reformy drivel, The Cartel had a promotional web site (different from the current one) which included a page of documented facts explaining why reform was necessary in New Jersey. The central message was much the same as the Gates message above. The graphs that follow are no longer there, but the message is – for example – here:

With spending as high as $483,000 per classroom (confirmed by NJ Education Department records), New Jersey students fare only slightly better than the national average in reading and math, and rank 37th in average SAT scores.

Here are the truly brilliant graphs that support this irrefutable conclusion:

I have discussed these graphs at length previously! I’m not sure it’s even worth reiterating my previous comments. But, just to clarify: might it be that participation rates for the SAT differ somewhat across states, and might that actually be an important intervening factor? Nah… couldn’t be.

A trip to the Reformy Education Research Association?

So, as I head off to AERA in New Orleans, I’ve been pondering what it would be like if there were an education research conference just for reformy types. What would we find at the Reformy Education Research Association, RERA? How would the research be conducted or presented? What kinds of research thinking might we see?

Well, here are a few examples.

Reformy Study #1

First, here’s a table from the widely distributed paper by a team of renowned authors at the Forum on Understanding Core Knowledge in EDucation.

As you can see, the study endeavors to identify the determinants of school failure, in part, to identify those specific policies that must be changed in order to eliminate failing schools from our society. Failing schools are, after all, an abomination. The researchers ranked New Jersey schools from highest to lowest proficiency rates and took the top and bottom 10%. They then mined the content of the negotiated contractual agreements for each district, looking to key elements of those contracts to explain why some districts fail while others perform quite well (as good as Finland!). They also gathered basic demographic data on students, having been dinged by reviewer #3 (an outsider) on their original proposal, which had not included such data. The authors note, however, that including this data did not alter their original conclusions or policy implications.

Conclusion: The cause of some schools failing and others succeeding is clearly the absence of regular use of clear metrics for teacher evaluation and the absence of mutual consent school assignment policies. It is also likely that basing salaries on experience or degree level adds to the dysfunction of low performing schools.

Policy recommendation: Immediately implement a new teacher evaluation system based 50% on student assessment data. Prohibit the use of experience or degree level as a basis for compensation.

Reformy Study #2

In this next study, authors from the Belltower Institute for Technology Education and Modern Enterprise explore the scalability of a nationally recognized model for charter schooling. Specifically, the goal of the study is to determine whether the model, which has received accolades in major newspapers and on network television (Reformy Nation) over the past year, might be a useful model for replacing entire urban school systems.  Table 2 below shows the characteristics of one successful charter school (sufficient data unavailable on the 3 less successful charters in the same network) operating the model, and the characteristics of the urban host district of that charter school. Deliberations are under way in that district to grant the charter operators full control of all schools in the district. Data in the table focus specifically on children in Grades 6 to 8, the only grades served by the charter.

Clearly, the charter not only outperforms the host district schools in grade 6, but by an even larger margin in grade 8, which can only be interpreted (emphasis in original manuscript) as the charter school adding more value to students with each year that they stay (setting aside the possibility that large shares of those students who are no longer in attendance by 8th grade may have been lower performers).
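For what it’s worth, here’s a minimal sketch, with entirely made-up scores, of how cohort attrition alone can manufacture that grade 6 to grade 8 “gain”:

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical grade 6 cohort at the charter: 100 students, made-up scores.
grade6 = [random.gauss(650, 50) for _ in range(100)]

# Suppose the students who leave before grade 8 are disproportionately the
# lower performers: drop the bottom 30% and leave everyone else unchanged.
survivors = sorted(grade6)[30:]

print(f"Grade 6 cohort mean:   {mean(grade6):.0f}")
print(f"Grade 8 survivor mean: {mean(survivors):.0f}")
# The survivor mean rises even though no individual student's score changed.
# Comparing grade 8 averages to grade 6 averages then credits the school
# with "adding value" that is really just selective attrition.
```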

Again, the original analyses included only student assessment scores, and no further information on student population characteristics. Amazingly, the original proposal got dinged by the same reviewer #3 as the study above, but reviewers #1 and #2 found the proposal to represent the highest standards of reformy rigor.

The authors continue to maintain that this information is unimportant, on the grounds that the charter populations are necessarily representative of the host district because a lottery is used for admission to the charter. The authors further contend that the reported differences in student populations and cohort attrition are “trivial.”

Conclusion: Clearly, the charter school has proven that it is able to produce far better results than host district schools while serving the very same children (emphasis in original manuscript) as those served by host district schools, and by using its “no excuses” approach.  Further, children’s performance improves the longer they attend the charter school.

Policy recommendation:  Set in place a strategy to turn over all host district schools, across all grade levels to the charter operator.

Reformy Study #3

In the third and final paper, economists from the Measuring Yearly Advancements in Social Science project released preliminary findings from a massive privately funded study on teacher effectiveness. Specifically, the study endeavors to determine the correlates of effective teaching, in order to guide public school district personnel policies – specifically hiring, retention and compensation decisions. The study involved 22,543 teachers (326 of whom had complete data on all observations) across 6 cities (4 of which failed to provide sufficient data in time for this preliminary release). Using two years of data on students assigned to each teacher (using only the 4th grade math assessment data, because correlations on language arts assessments were too unreliable to report), the study investigated which factors are most highly related to a TRUE measure of teaching effectiveness – where true “effectiveness” was defined as the contribution of Teacher X to achievement growth in 4th grade math on the STATE assessment for students S1 – Sy linked to that teacher in the given year (equation expressed in Appendix A, pages 69-74). The same students were also given a second math assessment. School principals conducted observations 5 times during the year and filled out an extensive evaluation matrix based on teacher practices and student–teacher interactions. Students were also administered surveys, as were parents of those students, requesting extensive feedback regarding their perceptions of teacher quality. The correlations are shown in Table 3.
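The satirical “Appendix A” equation isn’t reproduced here, but a bare-bones version of the kind of value-added regression being parodied – current score on prior score plus teacher dummies, all data simulated – looks something like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 teachers x 20 students, with prior and current math scores.
n_teachers, n_students = 3, 20
teacher = np.repeat(np.arange(n_teachers), n_students)
prior = rng.normal(500, 50, n_teachers * n_students)
true_effect = np.array([-5.0, 0.0, 5.0])  # assumed "true" teacher effects
current = (50 + 0.9 * prior + true_effect[teacher]
           + rng.normal(0, 20, prior.size))  # noisy current-year scores

# Design matrix: intercept, prior score, teacher dummies (teacher 0 omitted).
X = np.column_stack([np.ones_like(prior), prior] +
                    [(teacher == t).astype(float) for t in range(1, n_teachers)])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)

print("Estimated effects of teachers 1 and 2, relative to teacher 0:", beta[2:])
# With ~20 students per teacher and noisy tests, these estimates bounce
# around from year to year -- which is exactly why a .30 year-over-year
# correlation, as in the satirical Table 3, is entirely plausible.
```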

Conclusions & Implications: The strongest correlate of true teaching effectiveness was the estimate of teacher contribution to student achievement on the same test a year later. However, this correlation was only modest (.30). All other measures, including effectiveness measures based on alternative tests and student, parent and administrator perceptions of teacher effectiveness, were less correlated with the original value-added estimate, thus raising questions about the usefulness of any of these other measures. Because the value-added measure turns out to be the best predictor of itself in a subsequent year, this estimate alone trumps all others in terms of usefulness for making decisions regarding teacher retention (especially in times of staffing reduction) and should also be considered a primary factor in compensation decisions. Note that while it may appear that school administrators, students and their parents have highly consistent views regarding which teachers are more and less effective (note the higher correlations across administrator ratings of teachers, and student and parent ratings), we consider these findings unimportant because none of these perception-based ratings were as correlated with the original value-added estimate as the value-added estimate was with itself (which, of course, is the TRUE measure of effectiveness).

School Funding Deception Alert! (in a CAN)

I’ve noticed a pattern in a few recent school funding proposals, mostly emanating from shoddy, haphazard proposals developed on behalf of the CANs (ConnCAN & its close relatives) and often with “technical support” of Bryan Hassel of Public Impact. Let’s call it school finance reform in a CAN.

These new simplified school funding formula proposals, framed under the “money follows the child” ideology, are intended to make state school funding formulas more “transparent” and to allow for a more equitable and predictable flow of funding to charter schools or other non-district schools.

In each proposal (ConnCan’s Spend Smart & The Tab, or Rhode Island’s new formula [albeit laced with other problems unique to RI – see post]), among a variety of other major overlooked factors and arbitrary, unfounded recommendations, exists a seemingly innocuous proposal regarding how to target aid for variations in student needs across districts.

As the authors of ConnCan’s recent Spend Smart brief explain, deeply embedded in a footnote… you really only need to use a single factor to get state aid targeted to the right schools, and that factor is the share of children qualifying for FREE OR REDUCED PRICE LUNCH. There’s no need for a special factor for limited English proficient/English language learner populations, or anything else. It’s all pretty much correlated with free and reduced lunch. (Hassel’s previous report for ConnCan, The Tab, included a trivially small LEP/ELL weight instead of none at all.)

First, this assumption is patently wrong to begin with and is never actually validated by the authors of these proposals. But let’s set that aside for the moment. I’ll have a future post where I use actual data to show just how freakin’ wrong the assumption is.

But why would they propose this anyway? Well, it turns out to be really simple. If a state has a fixed sum of money to distribute (generally how it works), the CAN game is to figure out on what basis charter schools might get the maximum share of that money – regardless of who really needs it most. That is, what measures CAN they choose for weightings which will drive money to charters? Charter schools do tend to operate in poorer communities (relative to state averages), but a) serve the less poor among the poor, b) serve few or no LEP/ELL children, and c) incidentally, also serve few or no children with disabilities (as has been addressed on my blog regarding NY and NJ charter schools, and will be addressed soon regarding CT charters – numbers already run, charts forthcoming). I’ll set aside c) for now.

So, the way to maximize charter funding is to give a single weight for children qualified for free OR REDUCED PRICE LUNCH, and to negate any weight for LEP/ELL children (or make it as small as possible). That way, charters will get the same weight for kids whose family income falls between the 130% and 185% poverty levels as neighborhood schools get for children below the 130% poverty level (this distinction is NOT TRIVIAL), even though neighborhood schools have far more of the lower-income children. Any money that would have gone to LEP/ELL children can be rolled into a bigger weight for free/reduced lunch children, channeling a larger share of the total funding available to charter schools.
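Here’s a hedged, back-of-envelope sketch of that weighting game. All of the enrollments and weights are hypothetical; the point is only the mechanics:

```python
# Hypothetical schools: same size, same combined free-or-reduced (FRL) share,
# but very different mixes of deep poverty (free lunch) and ELL students.
BASE = 10_000  # hypothetical base funding per pupil, in dollars

neighborhood = {"free": 70, "reduced": 10, "ell": 25}
charter      = {"free": 40, "reduced": 40, "ell": 0}

def supplemental_aid(school, frl_w, free_w, ell_w):
    """Supplemental aid generated under a given (hypothetical) weighting scheme."""
    frl = school["free"] + school["reduced"]
    return BASE * (frl_w * frl + free_w * school["free"] + ell_w * school["ell"])

for name, school in [("neighborhood", neighborhood), ("charter", charter)]:
    can_style = supplemental_aid(school, frl_w=0.30, free_w=0.0, ell_w=0.0)
    separate  = supplemental_aid(school, frl_w=0.0, free_w=0.40, ell_w=0.20)
    print(f"{name:>12}: single FRL weight ${can_style:,.0f} "
          f"vs separate free/ELL weights ${separate:,.0f}")
# Under the single FRL weight the two schools draw identical aid ($240,000),
# even though the neighborhood school serves far more deeply poor and ELL
# children ($330,000 vs $160,000 under the differentiated weights).
```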

While not specifically addressed in these proposals, one would imagine that the same pundits would also favor flat, lump sum, or census based funding for special education, not differentiated by disability type, such that every school or district gets a specific dollar amount for special education based on a fixed share of their enrollment – a) whether they serve any special education students at all, or b) whether they only serve mild specific learning disability students, and none more severe. Watch out for this one as well!

===

Note: I’m sure that many will respond to this post by arguing that charters get severely shortchanged on state aid anyway, and that even if they make out okay on these adjustments, the lack of funding for such things as capital outlay and facilities more than offsets the difference. That’s a topic for another day. But, suffice it to say, existing comparisons like those made in the recent Ball State/Public Impact (imagine that) study are grossly oversimplified (as I explain regarding NYC schools, here: http://nepc.colorado.edu/publication/NYC-charter-disparities (page 23)). For example, typical crude comparisons never address whether having few or no special education children (relative to averages for district schools) results in per-pupil cost reductions that might actually be greater than average annual facilities expenses per pupil.

===

Follow up figure for those who asked:

Note that using only a weight on free or reduced lunch would drive the same amount of supplemental funding to Torrington as to Norwalk or Danbury, despite large differences in LEP/ELL populations. The same would hold for any charter school with a low-income population comparable to Danbury, Stamford or Norwalk, even if that school had no LEP/ELL children. There may be other valid differences that require additional attention. Even this graph is too crude to give us the full story. The bottom line is that one must at least evaluate the distributions of children by need categories across districts and settings before making such bold, but oversimplified policy recommendations.

Here’s Rhode Island:

The issue here is similar in that Central Falls in particular (imagine that) gets shafted by the failure to independently address differences in ELL/LEP concentration. While RI has few districts, and has a specific cluster of high poverty districts, the rates of LEP/ELL children across those districts vary from 5% to 20%. But, as I’ve explained previously, the RI formula and the logic behind it have numerous other empirical and logical gaps. See: https://schoolfinance101.wordpress.com/2010/07/01/the-gist-twists-rhode-island-school-finance/

Distilling Rhetoric & Research on NY State Education Spending

This is another one of those mundane school finance formula posts. This one is focused on media and political spin in New York State around the recently adopted state budget and proposed school aid cuts.

Yesterday, I had the displeasure of reading a New York Post piece in which New York Governor Cuomo and the Post were validating how and why the proposed budget cuts would not and should not compromise the quality of New York State public schools. But this article – both the Post’s explanations and especially the Governor’s spokesperson’s explanations – presents a massive distortion of how the proposed cuts actually affect different types of districts across New York State.

Let’s break it down:

Political Spin

Here’s the public appeal, political spin on why cutting state aid to schools in New York really causes no harm:

The state’s student population dropped to 2.7 million from 2.8 million — or 4.6 percent — during that period.

And during that same span, the number of rank-and-file teachers grew to 214,000 from 194,957 — a 9.8 percent increase.

As a result, overall public-school expenditures more than doubled, from $26 billion to $58 billion statewide.

And:

“The huge growth in school bureaucracy and overhead is disturbing, especially since many schools are threatening to fire teachers,” Cuomo spokesman Josh Vlasto said. “School districts clearly have more than enough to do more with less.”

Read more: http://www.nypost.com/p/news/local/supervisor_bloat_hikes_overhead_gnbt3xbRu6hnqPRqrCTZvO#ixzz1IkEkATFa

Very simple: New York State school districts have added a whole bunch of administrators they don’t need – administrators who are obscenely high paid, and really just a massive waste. That is, if we accept the numbers reported above. But I won’t go after those in this post, because the argument is flawed on so many other levels. I will say that it is a foolish stretch to argue that administrative bloat has caused a doubling of per pupil spending across New York school districts.

Essentially, the argument here is that since there is so much bloat and waste – REGARDLESS OF WHERE THAT BLOAT AND WASTE EXISTS, if we cut aid to districts, they can simply cut that bloat. Of course that logic doesn’t work so well if the proposal is to cut aid from districts which are not the ones with the reported bloat.

Academic Analysis on Relative Efficiency and State Aid in New York

It is indeed interesting that the NY Post and Governor’s office have chosen to focus on spending increases since 1997. Spending, teacher salaries and administrative salaries in many New York school districts did escalate over this period. But why? What’s going on there? In what districts and what parts of the state is spending increasing, and does state aid play any role in those increases? Perhaps most directly on the question above, are some of those increases in spending actually leading to inefficiency, and is there any component of state aid that might be encouraging inefficient spending in school districts? If that were the case, we’d probably want to look first at those state aid programs as a place to cut.

Here are some summaries of findings from studies on New York State’s STAR tax relief program, which provides a sizeable chunk of financial support in systematically larger amounts to affluent communities:

Eom:

We test this hypothesis by examining the introduction of New York State’s large state-subsidized property tax exemption program, which began in 1999. We find evidence that, all else constant, the exemptions have reduced efficiency in districts with larger exemptions, but the effects appear to diminish as taxpayers become accustomed to the exemptions.

http://bk21gspa.snu.ac.kr/datafile/downfile/%EC%97%84%ED%83%9C%ED%98%B8%28GSPA-SD%2907_1.6.8.pdf

Public Budgeting & Finance / Spring 2006

Eom & Killeen:

Similar to many property tax relief programs, New York State’s School Tax Relief (STAR) program has been shown to exacerbate school resource inequities across urban, suburban, and rural schools. STAR’s inherent conflict with the wealth equalization policies of New York State’s school finance system are highlighted in a manner that effectively penalizes large, urban school districts by not adjusting for factors likely to contribute to high property taxation. As a policy solution, this article presents results of a simulation that distributes property tax relief using an econometrically based cost index. The results substantially favor high-need urban and rural school districts.

http://eus.sagepub.com/content/40/1/36.full.pdf+html

Education and Urban Society November 2007

Rockoff:

I examine how a property tax relief program in New York State affected local educational spending. This program, which lowered the marginal cost of school expenditure to homeowners, had statistically and economically significant effects on local government behavior. A typical school district, which received 20% of its revenue through the program in the school year 2001- 2002, raised expenditure by 4.1% and local property taxes by 6.8% in response. I then examine how the preferences of various groups of local taxpayers affect educational spending by identifying systematic variation across districts in the response to fiscal incentives. These results support the hypothesis that homeowners are more influential on local expenditure decisions than renters, owners of second homes, or owners of non-residential property.

http://www0.gsb.columbia.edu/faculty/jrockoff/papers/local_response_draft_january_10.pdf

Recap of the Research

So, let’s recap. What do we know about NY state aid and the potential link to the supposed inefficiencies to which the NY Post article and governor’s spokesman refer:

  1. That STAR aid in particular is allocated disproportionately to more affluent downstate school districts;
  2. That STAR aid, by reducing the price to local homeowners of raising an additional dollar in taxes for their schools, encouraged increased local spending on schools (a “tax price” effect – see the sketch after this list);
  3. That when the relative efficiency of school districts is measured in terms of increases in measured test scores, given additional dollars spent, STAR aid appears to have encouraged less efficient spending – enabling affluent suburban districts to spend on things not directly associated with measured outcomes, but which those communities still desired for their schools;
  4. That STAR aid contributes to inequities across districts in a system that is already highly inequitable.
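For those who want the “tax price” mechanics from point 2 made concrete, here’s a minimal sketch with hypothetical numbers (STAR’s actual exemption amounts and reimbursement rules vary):

```python
# Hypothetical homeowner; STAR's actual exemption amounts and rules vary.
home_value = 300_000
star_exemption = 30_000  # portion of value exempt from school taxes,
                         # with the state reimbursing the district

# When the district raises its levy to fund $1.00 more of spending from this
# home, the owner is billed only on the non-exempt value; the state quietly
# picks up the rest. The owner's "price" of a marginal school dollar:
tax_price = (home_value - star_exemption) / home_value
print(f"Homeowner's cost of $1.00 in new school spending: ${tax_price:.2f}")
# At 90 cents on the dollar, voters rationally approve more school spending
# than they would at full price -- the response Rockoff estimates above.
```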

What’s happening now?

As I have shown here, in recent years, STAR aid continues to be allocated inequitably, benefitting systematically wealthier districts.

https://schoolfinance101.wordpress.com/2011/02/04/where%E2%80%99s-the-pork-mitigating-the-damage-of-state-aid-cuts/

https://schoolfinance101.com/wp-content/uploads/2011/02/figure3.jpg

Funding inequities persist across New York state districts, with affluent suburban districts far outspending their poorer urban neighbors.

See www.schoolfundingfairness.org

But, the proposed funding cuts are not targeted at the districts which are most likely contributing to “inefficient” spending growth (if it is really inefficient).

The state aid cuts are not targeted to the state aid which seems to be stimulating less efficient spending and exacerbating inequity.

Rather, the proposed state aid cuts fall disproportionately on general foundation formula aid for those districts which have already been left in the dust by their more affluent neighbors. https://schoolfinance101.wordpress.com/2011/02/04/where%E2%80%99s-the-pork-mitigating-the-damage-of-state-aid-cuts/

https://schoolfinance101.com/wp-content/uploads/2011/02/figure5.jpg

How does that make sense?

Quite honestly, the argument made in the Post, and by the governor’s spokesperson, is really obnoxious and misguided, given the distribution of the planned cuts.

Analogy for the day

Let’s say we have a state aid program for personal transportation and we have some really rich communities and some really poor communities.

Let’s assume no mass transportation exists.

Let’s say we (the state) decide to give individuals in the poor communities $200 per month to help them purchase, insure and maintain a personal vehicle – a freakin’ car… and a pretty damn cheap car at that, minimally functional and questionably reliable. The $200 per month is pretty much all they’ve got. They’ve got few or no personal resources to contribute to an upgrade, and pretty much live month to month on maintenance and insurance.

We use another pot of aid to give $100 per month to residents of the rich community, who’ve already gone out and purchased Bentleys and Ferraris, and who mostly use that money for occasional detailing of their vehicles, which they might otherwise forgo (perhaps not), or for an enhanced satellite radio subscription they might not otherwise have chosen – one that includes channels they never really expect to use (typically, they would have gotten the most expensive subscription anyway; as the truly rich like to point out, no one who’s truly rich would ever dare ask how much it costs to maintain a yacht).

All of a sudden, the state budget is tight and a new report from some think tank comes out showing that in the past 10 years, more and more NY residents are driving Ferraris and Bentleys, and more and more of them get their cars detailed on a monthly basis and have the most expensive satellite radio subscription. It’s an abomination. Therefore, cutting aid certainly causes no harm.

So policymakers pass their first on-time budget in years, cutting 10% of that $200 per month that currently supports basic car purchases in the poor communities! They ignore entirely that the $100 per month to the rich communities even exists.

Of course, once we’ve cut that money and ignored the other, what we now have is a set of poor communities less able to insure and maintain their economy vehicles. And about those Ferraris and Bentleys? We haven’t even touched their detailing subsidy.

Public Impact’s Persistent Pattern of Shoddy Analysis

Alternative title: Why Hassel with research, data and facts?

I was called upon this past week to review a new policy brief on reforming Connecticut’s education funding system – or Education Cost Sharing formula. The brief, titled Spend Smart: Fix Connecticut’s Broken School Funding System, seemed simple enough on its face but, as I looked deeper, turned out to be among the most offensively shallow and poorly documented reports I have ever seen.

Further, some elements of the report, stated as fact but entirely unsubstantiated, would actually lead to funding policies that significantly disadvantage some of the state’s highest need children. Even worse, this brief was accompanied by submitted legislation that included these ill-conceived policies.

But this post is only partly about this new brief produced by ConnCan, with an eclectic mix of authors put forth in reformy manifesto style. Nearly every attempt to ground “facts” in the brief was tied to previous ConnCan briefs, which themselves included little or no substantiation.

The common denominator in this brief and those on which it relies, as well as the accompanying legislation, appears to be Bryan Hassel of Public Impact. Hassel has also played a role in previous haphazard manifesto-like school funding reports, including Fund the Child. Bryan Hassel has also been mentioned as the outside expert to advocate on behalf of ConnCan for school funding reform in Connecticut, including testifying in favor of the proposed legislation. See: http://blog.ctnews.com/kantrowitz/2009/12/03/1208/, or the ConnCan tweet:

Brian Hassel, co-dir. Public Impact: SB 1195 would “catapult Connecticut into a national model for schools” #edreform #getsmartct

http://twitter.com/#!/conncan/status/51061576467361792

Tangentially, Bryan Hassel and Public Impact were also involved in the production of the deeply problematic analysis of charter school funding disparities released last year, which I critique in part, in my recent work on New York City charter schools.

There comes a point where I encounter enough different reports linked to a single organization and author – reports so shockingly bad – that I simply can’t hold back anymore.

The following three examples, all connected back to Public Impact and Bryan Hassel, provide evidence of the utter methodological incompetence of this organization and their/his complete disregard for a) existing rigorous research, b) legitimate analytical methods and data, and perhaps most disturbingly, c) significant adverse consequences of performing shoddy analysis and making bold but haphazard policy recommendations.

Below are three of my related critiques of policy “research” (used as loosely as I can imagine) with ties to Public Impact and Bryan Hassel. I offer these critiques in particular to any policy makers who might believe it reasonable to rely on this junk, or the organization that produces it.

Example 1: Public Impact and ConnCan’s Funding Reform Proposals

http://nepc.colorado.edu/files/TTR-ConnCan-Baker-FINAL.pdf

Here are just a few examples from my review of Spend Smart. The Spend Smart brief essentially argues that the Connecticut finance system is broken (it may well be, and I think it is), and that it should be fixed with a simple school funding formula with a single weight on children qualified for free or reduced price lunch.

This particular brief stated a number of supposed “facts” about the status of the current system, few or none of which could be substantiated with information provided, and some which were clearly unchecked and simply wrong, with significant consequences.

Here are some quoted claims from the brief and a tracing of the factual basis for those claims:

Claim 2: “Moreover, our current system was designed to direct 33 percent more dollars to students in towns with high poverty, but actually provides only 11.5 percent more funding for these students.” (Page 2)

Claim 2 posits that the current ECS formula leads to an average of 11.5% additional funding per low-income child across Connecticut school districts. That claim is cited to a previous ConnCan report, The Tab, authored by Bryan Hassel of Public Impact (specifically Page 18 of The Tab). Page 18 of The Tab cites this claim in Footnote 18 as “Authors’ analysis using 2007-08 data from the State Department of Education. See Appendix for Details.” However, the appendix of the report provides no such justification and no further reference to the 11.5 figure. Rather, the appendix provides only listings of data sources supposedly used, and no explanation of how those sources might have been used.[i]

Claim 5: “For example, students at Connecticut’s charter schools are funded at only 75 cents on the dollar compared with traditional public schools.” (page 3)

Claim 5 is perhaps most perplexing, and like Claim 1, an example of the evidentiary black hole. The claim that Connecticut charter schools receive, on average, about 75% of state average funding is cited to a previous ConnCan report [not a Hassel/Public Impact product] titled Connecticut’s Charter School Law and Race to the Top. [ii] This ConnCan report was previously reviewed by Robert Bifulco for NEPC, who explained:[iii]

“The brief provides no indication of how it was determined that charter schools end up with only 75% of per-pupil funding that districts receive, or how, if at all, this comparison accounts for in-kind services or differences in service responsibilities.” [p. 3, Bifulco Critique]

And finally, for now:

Claim 6: “The formula could also hypothetically provide weights for other student needs, such as English Language Learner status. However, data shared by Connecticut State Department of Education with the State’s Ad Hoc Committee to Study Education Cost Sharing and School Choice show that the measure for free/reduced price lunch also captures most English language learners. In other words, there is a very strong correlation between English language learner concentration and poverty concentration in Connecticut. In addition, keeping the formula simple allows a more generous weight for students in poverty.” (p. 7, FN 12)

Claim 6 is particularly disconcerting, both because it includes a statistical finding which is never validated and because it is used to inform a policy solution which would produce substantial inequities harmful to a specific student population – children with limited English language skills. The authors claim outright that there is no need for additional adjustment for districts serving large shares of limited English proficient children because:

“there is a very strong correlation between English language learner concentration and poverty concentration in Connecticut.” (p. 7, FN 12)

This finding is cited only ambiguously in a footnote to data shared by CTDOE.  In some states, a strong relationship between the two measures might warrant collapsing supplemental aid for LEP and low-income children into one student need factor – with sufficient additional support to meet the combination and concentration of needs. However, a quick check of the data in Connecticut shown in Figure 1 (below) reveals that several districts have disproportionately high LEP concentrations relative to their low-income concentrations – specifically Norwalk, Danbury, New London, Windham and New Britain. These districts would be substantially disadvantaged by a formula with no additional weighting for LEP children, coupled with an arbitrary, small weighting for low-income status. In fact, the proposal to include only a relatively small weight for free or reduced price lunch and ignore the concentrated needs of these districts is most likely a back-door way to reduce the overall cost of the formula, and limit the extent that the formula truly redistributes funding where it is needed.

Figure 1

Relationship between Subsidized Lunch Rates and ELL Concentrations 2009


Data source: CTDOE 2009, Student need (free or reduced lunch: http://sdeportal.ct.gov/Cedar/WEB/ct_report/StudentNeedDT.aspx) and LEP data files (http://sdeportal.ct.gov/Cedar/WEB/ct_report/EllDT.aspx)

Note: From 2005 to 2009, the r-squared for this relationship ranges from .25 to .62, and is generally around .5.
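Checking this sort of claim takes about five lines of code. Here’s a minimal sketch with made-up district shares (the real data are in the CTDOE files cited above):

```python
import numpy as np

# Made-up district figures; the real data are in the CTDOE files cited above.
districts = {
    #           FRL share, ELL share
    "Town A": (0.85, 0.22),
    "Town B": (0.80, 0.04),
    "Town C": (0.35, 0.02),
    "Town D": (0.70, 0.18),
    "Town E": (0.20, 0.01),
}
frl, ell = (np.array(col) for col in zip(*districts.values()))

r = np.corrcoef(frl, ell)[0, 1]
print(f"r = {r:.2f}, r-squared = {r**2:.2f}")
# An r-squared around .5 means roughly half of the variation in ELL shares
# is NOT captured by the poverty measure -- exactly how districts like
# Danbury and Norwalk end up shortchanged by a poverty-only weight.
```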

The bottom line – The authors clearly never checked. The authors clearly don’t know what they are talking about, even at the most basic level. Yet they are willing – all who signed on to this brief, including Hassel, Hawley-Miles and Paul Hill – to go out on a limb and make these proclamations – proclamations and policy proposals which are simply bad, wrong, misguided – and irresponsible.

Example #2: Public Impact and ConnCan’s The Tab

Much of the content of the Spend Smart brief seems to be grounded in, and some of it directly cited to, the previous ConnCan finance report titled The Tab, on which Bryan Hassel was listed as lead author.

I have written previously about The Tab, which is of equal quality to Spend Smart. Here’s a copy and paste of my previous post on The Tab.

https://schoolfinance101.wordpress.com/2009/11/23/why-is-it-ok-for-think-tanks-to-just-make-stuff-up/

==========Original Blog Post

This topic comes to mind today because ConnCan has just released a report (http://www.conncan.org/matriarch/documents/TheTab.pdf) on how to fix Connecticut school funding, which provides classic examples of just makin’ stuff up (page 25). The report begins with a few random charts and graphs showing the differences in funding between wealthy and poor Connecticut school districts and their state and local shares of funding. These analyses, while reasonably descriptive, are relatively meaningless because they are not anchored to any well conceived or articulated explanation of “what should be.” Such a conception might be located here or even here (Chapters 13, 14 & 15 are particularly on target)!

The height of making stuff up in the report is the recommended policy solution to a problem that is never clearly articulated. There are problems in CT, but The Tab certainly doesn’t identify them!

The supposed ideal policy solution involves a pupil-based funding formula in which each pupil should receive at least $11,000 (made up), each child in poverty (no definition provided – just a few random ideas in a footnote) should receive an additional $3,000 (also made up), and each child with limited English language proficiency should receive an additional $400 (yep… totally made up). There is minimal attempt in the report (http://www.conncan.org/matriarch/documents/TheTab.pdf) to explain why these figures are reasonable. They’re simply made up.
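To be clear about just how simple this made-up formula is, here it is in its entirety, applied to a hypothetical district:

```python
# The Tab's entire (made-up) formula, applied to a hypothetical district.
BASE, POVERTY_ADD, LEP_ADD = 11_000, 3_000, 400  # the report's figures

enrollment, in_poverty, lep = 1_000, 600, 150    # hypothetical counts

total = enrollment * BASE + in_poverty * POVERTY_ADD + lep * LEP_ADD
print(f"Total: ${total:,} (${total / enrollment:,.0f} per pupil)")
# The $400 LEP add-on moves the total by $60,000 out of $12.86 million --
# the trivially small ELL weight noted elsewhere on this blog.
```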

The authors do provide some back-of-the-napkin explanations for the numbers they made up – based on those numbers being larger than the amounts typically allocated (not necessarily true). They write off the possibility that better numbers might be derived by way of a general footnote reference to a chapter in the Handbook of Research on Education Finance and Policy by Bill Duncombe and John Yinger which actually explains methods for deriving such estimates.

The authors of The Tab conclude: “Combined with federal funding that flows on the basis of poverty and (in some cases) the English Language Learner weight of an additional $400, the $3,000 poverty weight would enable districts and schools to devote considerable resources to meeting the needs of disadvantaged students.” I’m glad they are so confident in their “made up” numbers! I, however, am less so!

It would be one thing if there was no conceptual or methodological basis for figuring out which children require more resources or how much more they might actually need. Then, I guess, you might have to make stuff up. Even then, it might be reasonable to make at least some thoughtful attempt to explain why you made up the numbers you… well… made up. But alas, such thinking seems beyond the grasp of at least some “think tanks.” Guess what? There actually are some pretty good articles out there which attempt to distill additional costs associated with specific poverty measures… like this one, by Bill Duncombe and John Yinger: How much more does a disadvantaged student cost?

It’s not as if the title of this article somehow conceals its contents, is it? Nor is the journal in which it was published (Economics of Education Review) somehow tangential to the point at hand. This paper, prepared for the National Research Council, provides some additional insights into additional costs associated with poverty and methods for estimating those costs.

Rather than even attempt to argue that these figures are somehow founded in something, the authors of The Tab seem to push the point that it really doesn’t matter what these numbers are as long as the state allocates pupil-based funding.  That’s the fix! That’s what matters… not how much funding or whether the right kids get the right amounts. In fact, the reverse is true. The potential effectiveness, equity and adequacy of any decentralized weighted funding system is highly contingent upon driving appropriate levels of funding and funding differentials across schools and districts!

Example #3: Public Impact Charter Disparity Analysis

Finally, there’s the report done by Public Impact with Ball State University on charter school funding disparities, which remains fresh in my mind because it keeps coming back up again and again. And it is because of the connection between the shoddy methods of that report, and the absurdly shoddy analysis in The Tab and Spend Smart, that this post focuses on Bryan Hassel and Public Impact.

When digging deeper on financial differences among charter and non-charter schools in New York City, and looking at what the Public Impact/Ball State study had said about New York charter schools, my coauthor and I were shocked at how poorly the Public Impact/Ball State study had been conducted. Here’s a short section of our critique:

From: Baker, B.D. & Ferris, R. (2011). Adding Up the Spending: Fiscal Disparities and Philanthropy among New York City Charter Schools. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/NYC-charter-disparities.

This section returns to the issue of disparities in funding between non-charter and charter schools. As already noted, the Ball State/Public Impact study identified New York State as having large financial disparities between traditional public schools and charter schools. In contrast, the NYC Independent Budget Office concluded that charters with department of education facilities had only negligibly fewer resources than non-charter public schools. One of these accounts is incorrect.

The Ball State/Public Impact study claims that NYC traditional public school per-pupil expenditures were $20,021 in 2006-07, and that charter school expenditures were $13,468, for a 32.7% difference.[iv] However, the first figure appears to be inflated; the only figure that closely resembles $20,021 is the total expenditure, including capital outlay expense. This amounts to $19,198,[v] according to the 2006-07 NCES fiscal survey.[vi] This amount includes spending that is clearly not for traditional public schools—it includes not only transportation and textbooks allocated to charter schools, but also the city expenditures on buildings used by some charter schools.[vii] In essence, this approach attributes spending on charters to the publics they are being compared with—clearly a problematic measurement.

After offering these figures and the crude comparisons, the Ball State/Public Impact study argues that the purportedly severe funding differential is not explained by differences in need, because on average 43.5% of the students in public schools in New York State qualify for free or reduced-price lunch, while on average 73.3% of those in charter schools in New York State do. But, as was demonstrated earlier, there are three problems: (a) the focus on state rates, rather than NYC rates; (b) the inclusion of reduced-price lunch rates rather than just free-lunch rates as a measure of poverty (when focused on comparisons within NYC); and (c) the failure to compare only schools serving the same grade-levels. When these details are addressed, a different picture emerges. At the elementary level in NYC, for example, charter school free lunch rates were 57% and non-charter public school rates were 68%.

The NYC IBO report offers figures that are more in line with the data. For 2008-09, traditional public schools are found to have expenditures of $16,678, while charters that are provided with facilities are at nearly the same level ($16,373). Public expenditures on charters not provided facilities are found to be about $2,700 per pupil lower ($13,661). But even this comparison is not necessarily the most precise or accurate that might be made, because it does not attempt to compare schools that are (a) similar in grade level and grade range and (b) similar in student needs. The IBO analysis provides a useful, albeit limited, comparison of charter schools in their aggregate to district schools in their aggregate. Importantly, the IBO charter school funding figures do not include funds raised through private giving to schools or monies provided by their management organizations.

Once the cost differences associated with student populations are factored in, the IBO analysis changes significantly. In fact, the cost associated with student population differences is the same as the per-pupil cost associated with lack of a facility: $2,500. After adding the $2,500 low-need-population adjustment to charters, those not in BOE facilities can be seen to have funding nearly equal to that of non-charters ($16,171 vs $16,678) while those in BOE facilities have significantly more funding than non-charters (see Table 3).[viii]

One might try to argue that these problems we identify with the NY estimates, which render them entirely meaningless, are specific to New York, but that the rest of the states are reasonably estimated. The reality is that when it comes to estimating these types of funding differentials, each state and each local district, depending on the charter funding formula, has its own peculiarities. If the crude method used by Hassel and colleagues completely missed the boat on New York, it is highly likely that comparable problems exist across many other settings. Without further, more detailed and appropriate analysis, it would be unwise to base any conclusions on the existing Ball State/Public Impact study.


[i] In the recent report Is School Funding Fair, 2007-08 update (http://www.schoolfundingfairness.org/SFF_2008_Update.pdf) , Baker, Farrie and Sciarra show that the differential between very high and very low poverty districts in Connecticut is about 15% (Table 1), however, it is important to understand that in Connecticut, these patterns are not systematic. Rather, as I show in Figure A3 of the appendix herein, there exist substantial irregularities in current spending per pupil with respect to poverty. Among high need districts in particular, funding levels vary widely. Arguably, in this regard the system is indeed broken. But the ConnCan reports fail to provide any legitimate evidence to this effect.

[ii] http://www.conncan.org/sites/default/files/research/CTCharterLaw-RTTT2010-Web-2.pdf. Interestingly, the authors of the current brief, including Bryan Hassel, choose not to anchor this conclusion to other recent work co-authored by Hassel, which describes funding disparities between host districts – New Haven and Bridgeport – and charters in those cities as “severe.” However, Baker and Ferris (2011) explain substantial methodological flaws in the characterization of charter funding gaps by Hassel and colleagues, pertaining to their analysis of New York State and New York City charter schools. There is little reason to believe that Hassel and colleagues’ analyses of Connecticut are any more valid than those for New York. For the state and district summaries of charter disparities, see: Batdorff, M., Maloney, L., May, J., Doyle, D., & Hassel, B. (2010). Charter School Funding: Inequity Persists. Muncie, IN: Ball State University. pp. 10-11, Table 5. For a thorough critique of Hassel and colleagues’ mis-steps in this report when characterizing charter disparities in New York, see: Baker, B.D. & Ferris, R. (2011). Adding Up the Spending: Fiscal Disparities and Philanthropy among New York City Charter Schools. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/NYC-charter-disparities.

[iii] Bifulco, R. (2010). Review of “Connecticut’s Charter School Law & Race to the Top!” Boulder and Tempe: Education and the Public Interest Center & Education Policy Research Unit. Retrieved [date] from http://nepc.colorado.edu/files/TTR-ConnCan-Bifulco.pdf

[iv] See: Batdorff, M., Maloney, L., May, J., Doyle, D., & Hassel, B. (2010). Charter School Funding: Inequity Persists. Muncie, IN: Ball State University, bottom of Table 5

[v] Depending on how one chooses to calculate this figure, the range is from 19,199 to about 20,162. The reported total expenditures for the district are $20,144,661,000 and enrollment figures range from 999,150 (as reported in the fiscal survey) to 1,049,273 (implied enrollment from current expenditure per pupil calculation in fiscal survey).

[vi] From the Census Bureau’s Fiscal Survey of Local Governments, Elementary and Secondary Education, F-33.  http://www.census.gov/govs/www/school.html

[vii] The New York State Education Department reports several versions of expenditure figures. Total expenditures per pupil for NYC in 2007-08 were $18,977—much lower than the total reported by Batdorff and colleagues. But the IBO correctly points out some expenses would be appropriately excluded from this number. For instance, the NYC Department of Education provides facilities for about half the city’s charter schools as well as many other forms of support for some charter schools, including authorizer services, food service, transportation services, textbooks, and management services:

Pass-through Support for Charter Schools. Charter schools are eligible to receive goods such as textbooks and software, as well as services such as special education evaluations, health services, and student transportation, if needed and requested from the district. In NYC there is a long-established process for non-public schools to access these services, and charter schools have access to similar support from DOE. For these items, charter schools receive the goods or services rather than dollars to pay for them. Most of these non-cash allocations are managed centrally through DOE.

IBO report, 2010: Retrieved December 13, 2010, from
http://schools.nyc.gov/community/planning/charters/ResourcesforSchools/default.htm.

It is simply wrong to compare the city aggregate spending per pupil to the school-site allotment for charters, as was done by Batdorff and colleagues (who also use the most inflated available figure for the city aggregate spending). In 2007-08 (a year earlier than the IBO comparison figure, but likely a reasonable substitute), NYSED estimates for the instructional/operating expenditures per pupil in NYC were $15,065 (this uses the instructional expenditure share, including expenditures on employee benefits [IE2%, Col. AP], times the total expenditures. Retrieved December 13, 2010, from http://www.oms.nysed.gov/faru/Profiles/datacolumns1.htm). This figure may be far more relevant than that chosen by Batdorff and colleagues, but is still potentially problematic.

[viii] Again, we are unable to adjust precisely for differences in special education populations, due to lack of sufficiently detailed data.

Measuring poverty in education policy research

My goal in this post is to explain why it is vitally important in the current policy debate that we pay careful attention to how child poverty is measured and what is gained and lost by choosing different versions of poverty measures as we evaluate education systems, schools and policy alternatives.

This post is inspired by a recent exceptional column on a similar topic by Gordon MacInnes, on NJ Spotlight. See: http://www.njspotlight.com/stories/11/0323/1843/

There is a great deal of ignorance and in some cases belligerent denial about persistent problems with using excessively crude measures to characterize the family backgrounds of children, specifically measuring degrees of economic disadvantage.

As an example of the belligerent denial side of the conversation, the following statements come from a recent slide show from officials at the New Jersey Department of Education, regarding their comparisons of charter school performance, and in response to my frequently expressed concern that New Jersey Charter schools tend to serve larger shares of the “less poor among the poor” children. Here’s the graph for Newark schools.

That is, New Jersey charter schools, which operate generally in high poverty settings, tend to serve somewhat comparable shares of children qualifying for free AND REDUCED price lunch when compared to neighborhood schools, but serve far fewer children who qualify for FREE LUNCH ONLY.

NJDOE officials’ recent response to this claim is as follows:

  • The state aid formula does not distinguish between “free” and “reduced”-price lunch count.
  • New Jersey combines free and reduced for federal AYP determination purposes
  • All students in both these categories are generally used by researchers throughout the country as a good enough proxy for “economically disadvantaged”
  • And most important, research shows that concentration of poverty in schools creates unique challenges, and most charters in NJ cross a threshold of concentrated poverty that makes these distinctions meaningless

Whether New Jersey uses this crude indicator in other areas of policy does not make it a good measure. In some cases, it may be the only available measure. But that also doesn’t make it a good one. And whether researchers use the measure when it’s one of the only measures available also does not make it a good measure.

Any thoughtful and reasonably informed researcher should readily recognize and acknowledge the substantial shortcomings of such crude income classification, and the potential detrimental effects of using such a measure within an analysis or statistical model.

The final bullet point is just silly. It claims that since charters and non-charters in New Jersey cities are all “poor enough,” there’s really no difference. This claim relies on selecting a threshold for identifying poverty that is simply too high to capture the true differences in poorness – real, legitimate and important differences – with significant consequences for student outcomes.

To put it quite simply, the distinctions between various levels of poverty, and the measures for capturing those distinctions, are not trivial and not meaningless. Rather, they are quite meaningful and important, especially in the current policy context.

Here’s a run-down on why these differences are not trivial:

What are the “official” differences between those who qualify for free versus reduced price lunch?

Figure 1 provides the income definitions for families to qualify for free versus reduced price lunch. This information is relatively self-explanatory. Families qualify for reduced price lunch with income between 130% and 185% of the poverty level. Families qualify for free lunch with income below 130% of the poverty level.

Figure 1: Income cut-offs for families qualifying for the National School Lunch Program


Unfortunately, a secondary problem with these cut-offs – one for discussion another day – is that these thresholds do not vary appropriately across regions or between rural and urban areas. The same income might go further in providing a reasonable lifestyle in Texas than in the New York metropolitan area. Trudi Renwick has done some preliminary work providing state-level adjusted poverty estimates to correct for this problem: http://www.census.gov/hhes/povmeas/methodology/supplemental/research.html
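
For readers who like to see the arithmetic, here is a minimal sketch of the cut-offs in Figure 1. The poverty guideline used below (roughly $22,050 for a family of four, circa 2009-10) is illustrative only; actual USDA eligibility tables vary by household size and year.

```python
# Minimal sketch of the NSLP cut-offs in Figure 1. The poverty guideline
# below (family of four, circa 2009-10) is illustrative; real USDA
# eligibility tables vary by household size and year.
POVERTY_GUIDELINE = 22050  # annual income, USD, family of four (illustrative)

def lunch_status(family_income: float, guideline: float = POVERTY_GUIDELINE) -> str:
    """Classify a family under the free/reduced price lunch income cut-offs."""
    ratio = family_income / guideline       # income as a share of the poverty level
    if ratio <= 1.30:                       # at or below 130% of poverty
        return "free"
    if ratio <= 1.85:                       # between 130% and 185% of poverty
        return "reduced"
    return "not eligible"

print(lunch_status(10_000))   # deep poverty -> 'free'
print(lunch_status(40_000))   # just under 185% of poverty -> 'reduced'
```

Note what the last two lines show: a family in deep poverty and a family at nearly twice the poverty line both land in the combined “free or reduced” category.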

If these distinctions are trivial and meaningless, why are there such large differences in NAEP performance?

Now, the fact that the income levels which qualify a family for free or reduced price lunch are different does not necessarily mean that these differences are important to education policy analysis. In fact, one thing we do know is that because the income thresholds fit differently across settings and regions, different measures work better in different places (lower income thresholds in southern and southwestern states, for example).

But why do we consider these measures in education policy research to begin with? The main reason we consider poverty measures in education policy research is that it is generally well understood that children’s economic well-being is strongly associated with their educational outcomes, with our ability to improve those outcomes, and with the costs of improving those outcomes. In most thorough social science analyses of these relationships, extensive measures of family educational background, actual income (rather than simple categories), numbers of books in the household, and other measures are used. But such measures aren’t always readily available. It is more common to find, in a state data system, a simple indicator of whether a child qualifies for free or reduced price lunch. That doesn’t make it a good measure, though. It’s just there.

But if, for example, we could look at achievement outcomes of kids who qualified for free lunch only, and for kids who qualified for reduced price lunch, and if we saw significant differences in their achievement, then it would be important to consider both… or consider specifically the indicator more strongly associated with lower student outcomes. The goal is to identify the measure, or version of the measure that is sensitive to the variations in family backgrounds in the setting under investigation and is associated with outcomes.

Figure 2 piggybacks on Gordon MacInnes’s examples comparing NAEP achievement gaps between non-low-income students (anything but a homogeneous group) and students who qualify for free or for reduced price lunch. In Figure 2, I graph NAEP 8th grade math outcomes for 2003 to 2009. What we see is that the average outcomes for students who qualify for free lunch are much lower than those for students who qualify for reduced price lunch. In fact, the gap between free and reduced is nearly as big in some cases as the gap between reduced and not qualified!

Figure 2: Differences in 8th grade Math Achievement by Income Status 2003-2009


Can every school in Cleveland be equally poor?

Another issue is that when we use the free or reduced price lunch indicator, and apply that indicator as a blunt dummy variable to kids in high poverty settings – like poor urban core areas – we are likely to find that 100% of children qualify. But just because 100% of children receive the “qualified for free or reduced lunch” label does not by any stretch of the imagination mean that they are all on equal “economic disadvantage” footing – that they are all “poor enough” to be equally disadvantaged.

Let’s take a look at the Cleveland Municipal School District and the distribution of its schools by their rates of free and reduced price lunch. There it is in Figure 3 – nearly every school in Cleveland is 100% free or reduced price lunch. So, I guess they are all about the same. All equally poor. No need to consider any differential treatment, funding, policies or programs? Right?

Figure 3: Distribution of Cleveland Municipal School District % Free or Reduced Price Lunch Rates


Well, not really! That would be a truly stupid assertion, and I expect anyone working within Cleveland Municipal School District can readily point to those neighborhoods and schools that serve far more substantively economically disadvantaged students than others. The data I have for this analysis are not quite that fine-grained – they don’t go to the neighborhood level – but in Figure 4 I can break the city into 4 areas and show the average poverty index level for families with public-school-enrolled children between the ages of 6 and 16. The poverty index is income relative to the poverty level, where 100 means income at exactly the poverty level, and 185 is roughly the level that qualifies for reduced price lunch. Figure 4 shows the average differences across 4 areas of the city – classified in the American Community Survey as Public Use Microdata Areas, or PUMAs.

Figure 4: Average “Poverty Index” by Public Use Microdata Area within Cleveland


Figure 5 shows the distributions for each area, and they are clearly different. Not all Cleveland neighborhoods are comparably economically disadvantaged, even if 100% of the schools are 100% free or reduced price lunch!

Figure 5: Poverty Index distribution by Public Use Microdata Area within Cleveland
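
For concreteness, here is a hedged sketch of the computations behind Figures 4 and 5. The area labels and income spreads below are simulated stand-ins for the actual Cleveland ACS microdata, chosen only to show the mechanics.

```python
# Hedged sketch of the computations behind Figures 4 and 5. The poverty
# index is income as a percentage of the poverty threshold (100 = at the
# line; 185 ~ the reduced price lunch cut-off). Area labels and spreads
# are simulated stand-ins, not the actual Cleveland ACS extract.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = []
for puma, center in {"Area 1": 90, "Area 2": 110, "Area 3": 140, "Area 4": 170}.items():
    rows += [{"puma": puma, "pov_index": max(0.0, rng.normal(center, 60))}
             for _ in range(1000)]
df = pd.DataFrame(rows)

print(df.groupby("puma")["pov_index"].mean())                      # Figure 4 analogue
print(df.groupby("puma")["pov_index"].quantile([0.25, 0.5, 0.75])) # Figure 5 analogue
```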


Why is this so important in the current policy context?

So then, who really cares? Why does any of this matter? And why now? Well, it has always mattered, and responsible researchers have typically sought more fine-grained indicators of economic status, where available. But we are now in an era where policy researchers are engaged in fast-paced, fast-tracked use of available state administrative data in order to immediately inform policy decision-making. This is a dangerous data environment, and crude poverty measurement has potentially dire consequences.  Here are a few reasons why:

  • Many if not most models rating teacher or school effectiveness rely on a single dummy variable indicating that a child does or does not come from a family that falls below the 185% income level for poverty.

I’ve actually been shocked by this. Reviewing numerous pretty good and even very high quality studies estimating teacher effects on student outcomes, I’ve found an incredible degree of laziness in the specification of student characteristics – specifically student poverty.

Figure 6 shows the poverty components of the New York City Teacher Effectiveness Model. Yep – there it is, a simple dichotomous indicator of qualifying for free or reduced price lunch. No way at all to differentiate between teachers of marginally poor and very poor children.

Figure 6: Measures included in New York City Teacher Effectiveness Model


In a value-added model of teacher effects, if we use only a crude Yes or No indicator for whether a child is in a family that falls below the 185% income level for poverty, the child who is marginally below that income level is considered no different from the child who is well below that income level – homeless, destitute, in multi-generational poverty. Further, in many large urban centers, nearly all children fall below the 185% income level (imagine doing this in Cleveland?). But they are not all the same! The variations in economic circumstances faced by children across schools and classrooms are huge. But the crude measurement ignores that variation entirely. And the lack of sensitivity of these measures to real differences in economic disadvantage likely adversely affects teachers of much poorer children – a model bias that goes unchecked precisely because no more precise indicator exists to check for it!
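
To make the point concrete, here is a small simulation of my own – not the NYC model itself, and with hypothetical variable names – showing what the crude “free or reduced” dummy throws away when true achievement depends on a continuous income-to-poverty ratio.

```python
# A small simulation (mine, not the NYC model) of what the crude
# "free OR reduced" dummy throws away. True achievement here depends on a
# continuous income-to-poverty ratio; the crude model sees only one dummy.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
pov_ratio = rng.uniform(0.3, 3.0, n)       # income as a multiple of the poverty line
prior = rng.normal(0, 1, n)                # prior-year test score
score = prior + 0.5 * np.log(pov_ratio) + rng.normal(0, 1, n)

df = pd.DataFrame({
    "score": score,
    "prior": prior,
    "frl_any": (pov_ratio <= 1.85).astype(int),   # the usual single dummy
    "free": (pov_ratio <= 1.30).astype(int),
    "reduced": ((pov_ratio > 1.30) & (pov_ratio <= 1.85)).astype(int),
})

crude = smf.ols("score ~ prior + frl_any", df).fit()
finer = smf.ols("score ~ prior + free + reduced", df).fit()
print(crude.rsquared, finer.rsquared)  # the finer split recovers more of the variation
```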

  • This problem is multiplied by the fact that when these models evaluate the influence of peers on individual student performance, the peer group is also characterized in terms of whether the peers fall below this single income threshold.

In a teacher effectiveness model, the poverty measurement problem operates at two levels. First, it operates at the individual student level mentioned above, where one cannot delineate between the student from a low-income family and the student from a very low-income family. Second, “better” value-added teacher effectiveness models also attempt to account for the characteristics of the classroom peer group. But we are stuck with the same crude measure, which prohibits us from evaluating the effect, on any one student’s achievement gains, of being in a class of marginally low-income peers versus a class of very low-income peers.

Okay, you say, the “best” value added models – especially those used in high stakes teacher evaluation would not be so foolish as to use such a crude indicator. BUT THEY DO, JUST LIKE THE NYC MODEL ABOVE. AND THEY DO SO QUITE CALLOUSLY AND IGNORANTLY.  Why? Because it’s the data they have. The LA Times model uses a single dummy variable for poverty, and does not even include a classroom peer effect aggregation of that variable.

  • Many comparisons of charter and traditional public schools that seek to evaluate whether charters are serving representative populations only compare the total of children qualifying for free or reduced price lunch, or similarly apply simple indicators of free or reduced price lunch status to individual students.

Yet charter schools seem invariably to serve similar rates of children qualifying for free or reduced price lunch when compared to nearby traditional public schools, but far fewer children in the lower income group who qualify for free lunch. Charters seem to be serving the less poor among the poor, in poor neighborhoods, whether in Newark, NJ or in New York City. Given that the performance differences among these subgroups tend to be quite large, using only the broader classification masks substantial differences.

In conclusion

Yes, in some cases, we continue to be stuck with these less than precise indicators of child poverty. In some cases, it’s all we’ve got in the data system. But it is our responsibility to seek out better measures where we can, and use the better measures when we have them. We should, whenever possible:

  1. Use the measure that picks up the variation across children and educational settings.
  2. Use the measure that serves as the strongest predictor of educational outcomes – the strongest indicator of potential educational disadvantage.
  3. And most importantly, when you don’t have a better measure, and when the stakes are particularly high, and when the crude measure might significantly influence (bias) the results, JUST DON’T DO IT!

Don’t attempt to draw major conclusions about whether charter schools (or any schools or programs for that matter) can do “as well” with low-income children when the indicator for “low income” encompasses equally every child (or nearly every child) in the city in both traditional public and charter schools.

Don’t attempt to label a teacher as effective or ineffective at teaching low-income kids, relative to his or her peers, when your measure of low-income is telling you that nearly all kids in all classrooms are equally low-income, when they clearly are not.

And most importantly, don’t make ridiculous excuses for using inadequate measures!

Student Test Score Based Measures of Teacher Effectiveness Won’t Improve NJ Schools

Op-Ed from: http://www.northjersey.com

The recent Teacher Effectiveness Task Force report recommended basing teacher evaluation significantly on student test scores. A few weeks earlier, Education Commissioner Cerf recommended that teacher tenure and dismissal decisions, as well as compensation decisions, be based largely on student assessment data.

Implicit in these recommendations is the assumption that the state and local districts would design a system for linking student assessment data to teachers for purposes of estimating teacher effectiveness. The goal of statistical “teacher effectiveness” measurement systems, including the most common approach, called value-added modeling (VAM), is to estimate the extent to which a specific teacher contributes to the learning gains of a group (or groups) of students assigned to that teacher in a given year.

Unfortunately, while this all sounds good, it just doesn’t work, at least not well enough to even begin considering using it for making high stakes decisions about teacher tenure, dismissal or compensation. Here’s a short list (my full list is much longer) of reasons why:

  1. It is not possible to equate the difficulty of moving a group of children 5 points (or rank and percentile positions) at one end of a test scale to moving children 5 points at the other end. Yet that is precisely what the proposed evaluations endeavor to accomplish. In such a system, the only fair way to compare one teacher to another would be to ensure that each has a randomly assigned group of children whose initial achievement is spread similarly across the testing scale. Real schools and districts don’t work that way.  It is also not possible to compare a 5 point gain in reading to a 5 point gain in math. These limitations undermine the entire proposed system.
  2. Even with the best models and data, teacher ratings are highly inconsistent from year to year and have very high rates of misclassification. According to one recent major study, there is a 35% chance of identifying an average teacher as poor given one year of data, and a 25% chance given three years. Getting a good rating is a statistical crapshoot.
  3. If we rate the same teacher with the same students, but with two different tests in the same subject, we get very different results. UC Berkeley economist Jesse Rothstein, re-evaluating the findings of the much-touted Gates Foundation Measures of Effective Teaching (MET) study, noted that more than 40% of teachers who placed in the bottom quarter on one test (the state test) were in the top half when using the other test (the alternative). That is, teacher ratings based on the state assessment were only slightly better than a coin toss for identifying which teachers did well on the alternative assessment.
  4. No matter how hard statisticians try, and no matter how good the data and statistical model, it is very difficult to separate a teacher’s effect on student learning gains from other classroom effects, like peer effects (the race and poverty of the peer group). New Jersey schools are highly segregated, hampering our ability to make valid comparisons across teachers who work in vastly different settings. Statistical models attempt to adjust away these differences, but usually come up short.
  5. Kids learn over the summer too, and higher income kids learn more than their lower income peers over the summer. As a result, annual testing data aren’t very useful for measuring teacher effectiveness: annual (rather than fall-spring) testing data significantly disadvantage teachers serving children whose summer learning lags. Setting aside all of the unresolvable problems above, this one can be fixed with fall-spring assessments. But it cannot be resolved in any fast-tracked plan involving current New Jersey assessments, which are annual. The task force report irresponsibly ignores this HUGE AND OBVIOUS concern, recommending fast-tracked use of current assessment data.
  6. As noted by the task force, only those teachers responsible for reading and math in grades 3 to 8 could readily be assigned ratings (less than 20% of teachers). Testing everything else would be a foolish and expensive endeavor. This means school districts will need separate contracts for separate classes of teachers and will have limited ability to move teachers from one contract type to another (from second to fourth grade, for example). Further, pundits have been arguing that a) we should be using effectiveness measures instead of experience to implement layoffs due to budget cuts, and b) we shouldn’t be laying off core classroom teachers in grades 3 to 8. But those are the only teachers for whom “effectiveness” measures would be available?
  7. Basing teacher evaluations, tenure decisions and dismissal decisions on scores that may be influenced by which students a teacher serves provides a substantial disincentive for teachers to serve kids with the greatest needs, disruptive kids, or kids with disruptive family lives. Many of these factors are not, and cannot be, captured by variables in even the best models. Some have argued that including value-added metrics in teacher evaluation reduces the ability of school administrators to arbitrarily dismiss a teacher. On the contrary, use of these metrics provides new opportunities to sabotage a teacher’s career through creative student assignment practices.

In short, we may be able to estimate a statistical model that suggests that teacher effects vary widely across the education system – that teachers matter. But we would be hard pressed to use that model to identify with any degree of certainty which individual teachers are good teachers and which are bad.

Contrary to education reform wisdom, adopting such problematic measures will not make the teaching profession a more desirable career option for America’s best and brightest college graduates. In fact, it will likely make things much worse. Establishing a system where achieving tenure or getting a raise becomes a roll of the dice – and where a teacher’s career can be ended by that same roll – is no way to improve the teacher workforce.

Contrary to education reform wisdom, using these metrics as a basis for dismissing teachers will NOT reduce the legal hassles associated with removal of tenured teachers.  As the first rounds of teachers are dismissed by random error of statistical models alone, by manipulation of student assignments, or when larger shares of minority teachers are dismissed largely as a function of the students they serve, there will likely be a new flood of lawsuits like none ever previously experienced. Employment lawyers, sharpen your pencils and round up your statistics experts.

Authors of the task force report might argue that they are putting only 45% of the weight of evaluations on these measures. The rest will include a mix of other objective and subjective measures. The reality of an evaluation that includes a single large, or even significant weight, placed on a single quantified factor is that that specific factor necessarily becomes the tipping point, or trigger mechanism. It may be 45% of the evaluation weight, but it becomes 100% of the decision, because it’s a fixed, clearly defined (though poorly estimated) metric.
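
A toy simulation makes the tipping-point logic visible. All the numbers below – the weights, the cutoff, and the compressed range of subjective ratings – are my own assumptions for illustration, not the task force's actual scheme.

```python
# Toy simulation of the tipping-point claim. The weights, the cutoff, and
# the compressed range of subjective ratings are assumptions for
# illustration, not the task force's actual scheme.
import numpy as np

rng = np.random.default_rng(1)
vam = rng.uniform(0.0, 1.0, 10_000)   # noisy test-based component
best = 0.45 * vam + 0.55 * 1.0        # composite with the best possible subjective rating
worst = 0.45 * vam + 0.55 * 0.8       # composite with the worst plausible subjective rating

# The decision is fixed by the VAM score alone whenever no subjective rating
# in its plausible range can move the teacher across a fixed cutoff of 0.75.
determined = (best < 0.75) | (worst >= 0.75)
print(determined.mean())              # ~0.76: the "45%" factor settles ~3 in 4 cases
```

Under these (assumed) numbers, the quantified factor alone settles roughly three quarters of the decisions outright, and it tips most of the rest.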

Self-proclaimed “reformers” make the argument that the present system of teacher evaluation is so bad as to be non-existent. Reformers argue, in effect, that the current system has a 100% error rate (assuming current evaluations label all teachers as good, when all are actually bad)!

From the “reformer” viewpoint, something is always better than nothing.

Value added is something.

We must do something.

Therefore, we must do value-added.

Reformers also point to studies showing that teachers’ value-added scores are the best predictor (albeit a weak and error-prone predictor) of teachers’ future value-added scores – a self-fulfilling prophecy. These arguments are incredibly flimsy.

In response, I often explain that if we lived in a society that walked everywhere, and a new automotive invention came along, but had the tendency to burst into a ball of flames on every third start, I think I’d walk. Now is a time to walk! Some innovations just aren’t ready for broad public adoption – and some may never be. Some, like this one, may not be a very good idea to begin with. That said, improving teacher evaluation is not a simple either/or and now may be a good time to step back from this false dichotomy and discuss more productive alternatives.

Expanded gambling okay in NJ, but only if it involves gambling on teachers’ jobs!

I may be the only one in New Jersey with a twisted enough view of today’s news stories to pick up on this connection. In news seemingly irrelevant to my blog, the Governor of New Jersey today vetoed a bill that would have approved online gambling. At the same time, the Governor’s teacher effectiveness task force released its long-awaited report. And it did not disappoint. Well, I guess that’s a matter of expectations. I had very low expectations to begin with – fully expecting a poorly written, ill-conceived rant about how to connect teacher evaluations to test scores – growth scores – and how it is imperative that a large share of teacher evaluation be based on growth scores. And I got all of that and more!!!!!

I have written about this topic on multiple occasions.

For the full series on this topic, see: https://schoolfinance101.wordpress.com/category/race-to-the-top/value-added-teacher-evaluation/

And for my presentation slides on this topic, including summaries of the relevant research, see: https://schoolfinance101.com/wp-content/uploads/2010/10/teacher-evaluation_general.pdf

When it comes to critiquing the Task Force Report, I’m not even sure where to begin. In short, the report proposes the most ill-informed toxic brew of policy recommendations that one can imagine. The centerpiece, of course, is heavy… very heavy reliance on statewide student testing measures yet to be developed… yet to be evaluated for their statistical reliability … or their meaningfulness of any sort (including predictive validity of future student success). As Howard Wainer explains here, even the best available testing measures are not up to the task of identifying more and less effective teachers: http://www.njspotlight.com/ets_video2/

But who cares what the testing and measurement experts think anyway. This is about the kids… and we must fix our dreadful system and do it now… we can’t wait! The children can’t wait!

So then, what does this have to do with the online gambling veto? Well, it struck me as interesting that, on the one hand, the Governor vetoes a bill that would approve online gambling, while on the other, the Governor’s Task Force proposes a teacher evaluation plan that would make teachers’ year-to-year job security and evaluations largely a game of chance. Yes, a roll of the dice. Roll a 6 and you’re fired! Damn hard to get 3 in a row (positive evaluations) to get tenure. Exponentially easier to get 2 in a row (bad evals) and get fired. No online gambling for sure, but gambling on the livelihood of teachers? That’s absolutely fine!

Interestingly, one of the only external sources even cited (outside of the comparably problematic Washington DC IMPACT contract, and think-tanky schlock like the New Teacher Project’s “Teacher Evaluation 2.0“) was the Gates Foundation’s Measures of Effective Teaching (MET) project. Of course, the task force report fails to mention that the Gates Foundation MET report does not make a very compelling statistical case that using test scores as a major factor for evaluating teachers is a good idea. Actually, it fails to mention anything substantive about the MET reports at all. I wrote about the MET report here. And economist Jesse Rothstein took a closer look at the Gates MET findings here! Rothstein concluded:

In particular, the correlations between value-added scores on state and alternative assessments are so small that they cast serious doubt on the entire value-added enterprise. The data suggest that more than 20% of teachers in the bottom quarter of the state test math distribution (and more than 30% of those in the bottom quarter for ELA) are in the top half of the alternative assessment distribution. Furthermore, these are “disattenuated” estimates that assume away the impact of measurement error. More than 40% of those whose actually available state exam scores place them in the bottom quarter are in the top half on the alternative assessment.
In other words, teacher evaluations based on observed state test outcomes are only slightly better than coin tosses at identifying teachers whose students perform unusually well or badly on assessments of conceptual understanding.

Yep that’s right. It’s little more than a coin toss or a roll of the dice! Online gambling (personally, I don’t care one way or the other about it), not okay. Gambling on teachers’ livelihoods with statistical error? Absolutely fine. After all, it’s those damn teachers that have sucked the economy dry with their high salaries and gold-plated benefits packages! And after all, it is the only profession in the world where you can do a really crappy job year after year after year… and you’re totally protected, right? Of course it’s that way. Say it loud enough and enough times, over and over again, and it must be true.

Here are a few random thoughts I have about the report:

  • So… as I understand it, they want to base 45% of a teacher’s evaluation on measures that have a 35% chance of misclassifying an average teacher as ineffective – and these are measures that only apply to about 15 to 20% of the teacher workforce? That doesn’t sound very well thought out to me.
  • Forcing reading and math teachers to be evaluated by measures over which they have limited control, and measures that jump around significantly from year to year and disadvantage teachers in more difficult settings isn’t likely to make New Jersey’s best and brightest jump at the chance to teach in Newark, Camden or Jersey City.
  • Even if the current system of teacher evaluation is less than ideal, it doesn’t mean that we should jump to adopt metrics that are as problematic as these. Promoters of these options would have the public believe that it’s either the status quo – which is necessarily bad – or test-score based evaluation – which is obviously good. This is untrue at many levels. First, New Jersey’s status quo is pretty good. Second, New Jersey’s best public and private schools don’t use test scores as a primary or major source of teacher evaluation. Yet somehow, they are still pretty darn good. So, using or not using test scores to hire and fire teachers is not likely the problem nor is it the solution. It’s an absurd false dichotomy.
  • Authors of the report might argue that they are putting only 45% of the weight of evaluations on these measures. The rest will include a mix of other objective and subjective measures. The reality of an evaluation that includes a single large, or even significant weight, placed on a single quantified factor is that that specific factor necessarily becomes the tipping point, or trigger mechanism. It may be 45% of the evaluation weight, but it becomes 100% of the decision, because it’s a fixed, clearly defined (though poorly estimated) metric.

Here’s a quick run-down on some of the issues associated with using student test scores to evaluate teachers:

[from a forthcoming article on legal issues associated with using test scores to evaluate, and dismiss teachers]

Most VAM teacher ratings attempt to predict the influence of the teacher on the student’s end-of-year test score, given the student’s prior test score and descriptive characteristics – for example, whether the student is poor, has a disability, or is limited in her English language proficiency.[1] These statistical controls are designed to account for the differences that teachers face in serving different student populations.  However, there are many problems associated with using VAM to determine whether teachers are effective.  The remainder of this section details many of those problems.

Instability of Teacher Ratings

The assumption in value-added modeling for estimating teacher “effectiveness” is that if one uses data on enough students passing through a given teacher each year, one can generate a stable estimate of the contribution of that teacher to those children’s achievement gains.[2] However, this assumption is problematic because of the concept of inter-temporal instability: that is, the same teacher is highly likely to get a very different value-added rating from one year to the next.  Tim Sass notes that the year-to-year correlation for a teacher’s value-added rating is only about 0.2 or 0.3 – at best a very modest correlation.  Sass also notes that:

About one quarter to one third of the teachers in the bottom and top quintiles stay in the same quintile from one year to the next while roughly 10 to 15 percent of teachers move all the way from the bottom quintile to the top and an equal proportion fall from the top quintile to the lowest quintile in the next year.[3]

Further, most of the change or difference in a teacher’s value-added rating from one year to the next is unexplainable – not accounted for by differences in observed student characteristics, peer characteristics, or school characteristics.[4]

Similarly, preliminary analyses from the Measures of Effective Teaching (MET) project, funded by the Bill and Melinda Gates Foundation, found:

When the between-section or between-year correlation in teacher value-added is below .5, the implication is that more than half of the observed variation is due to transitory effects rather than stable differences between teachers. That is the case for all of the measures of value-added we calculated.[5]

While some statistical corrections and multi-year analysis might help, it is hard to guarantee or even be reasonably sure that a teacher would not be dismissed simply as a function of unexplainable low performance for two or three years in a row.
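
The logic behind these instability findings is easy to demonstrate. If a rating equals a stable true effect plus independent yearly noise, the between-year correlation equals the stable share of the variance, so a correlation of 0.25 implies that three quarters of what we observe is transitory. Here is a toy simulation of mine (not Sass’s or the MET models), with the noise variance set to produce that correlation:

```python
# Toy signal-plus-noise simulation of inter-temporal instability (mine, not
# Sass's or MET's models). With rating = effect + independent yearly noise,
# the between-year correlation is var(effect)/[var(effect) + var(noise)].
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
effect = rng.normal(0, 1, n)                   # stable "true" teacher effect
year1 = effect + rng.normal(0, np.sqrt(3), n)  # noise variance 3 -> correlation 0.25
year2 = effect + rng.normal(0, np.sqrt(3), n)

print(np.corrcoef(year1, year2)[0, 1])         # ~0.25

# Quintile churn: share of bottom-quintile teachers in year 1 who land in
# the top quintile in year 2 (Sass reports roughly 10 to 15 percent).
bottom = year1 <= np.quantile(year1, 0.2)
top = year2 >= np.quantile(year2, 0.8)
print((bottom & top).sum() / bottom.sum())     # ~0.11
```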

Classification & Model Prediction Error

Another technical problem of VAM teacher evaluation systems is classification and/or model prediction error. Researchers at Mathematica Policy Research, in a study funded by the U.S. Department of Education, carried out a series of statistical tests and reviews of existing studies to determine the identification “error” rates for ineffective teachers when using typical value-added modeling methods.[6] The report found:

Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding error rates for overall false positive and negative errors are 10 and 20 percent, respectively.[7]

Type I error refers to the probability that, based on a certain number of years of data, the model will find that a truly average teacher performed significantly worse than average.[8] That means there is about a 25% chance (using three years of data) or a 35% chance (using one year of data) that a teacher who is “average” would be identified as “significantly worse than average” and potentially be fired. Of particular concern is the likelihood that a “good” teacher is falsely identified as a “bad” teacher – a “false positive” identification. According to the study, this occurs one in ten times (given three years of data) and two in ten times (given only one year of data).
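
To see the flavor of these error rates – and this is emphatically not the Mathematica test procedure; the bottom-fifth flagging rule and the noise variance below are my own assumptions – consider a truly average teacher whose noisy estimate is compared against a population-wide cutoff:

```python
# The flavor of the error-rate finding in toy form. This is NOT the
# Mathematica test procedure: the bottom-fifth flagging rule and the noise
# variance below are assumptions chosen only for illustration.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
noise_sd = 1.5   # estimation noise, large relative to the spread of true effects

def flag_rate(n_years: int) -> float:
    """Chance a truly average teacher's estimate falls below the cutoff."""
    # Estimates for the whole population set the bottom-fifth cutoff...
    pop = rng.normal(0, 1, n) + rng.normal(0, noise_sd / np.sqrt(n_years), n)
    cutoff = np.quantile(pop, 0.2)
    # ...then an exactly-average teacher (true effect = 0) is rated repeatedly.
    avg_teacher = rng.normal(0, noise_sd / np.sqrt(n_years), n)
    return float((avg_teacher < cutoff).mean())

print(flag_rate(1))   # one year of data: well over 1 in 10 flagged
print(flag_rate(3))   # three years: better, but errors remain common
```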

Same Teachers, Different Tests, Different Results

Determining whether a teacher is effective may depend on the assessment used for a specific subject area, and not on whether that teacher is a generally effective teacher in that subject area. For example, Houston uses two standardized tests each year to measure student achievement: the state Texas Assessment of Knowledge and Skills (TAKS) and the nationally normed Stanford Achievement Test.[9] Corcoran and colleagues used Houston Independent School District (HISD) data from each test to calculate separate value-added measures for fourth and fifth grade teachers.[10] The authors found that a teacher’s value-added can vary considerably depending on which test is used.[11] Specifically:

among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test.  Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.[12]

Similar issues apply to tests on different scales – different possible ranges of scores, or different statistical modification or treatment of raw scores; for example, whether student test scores are first converted into standardized scores relative to an average score, or expressed on some other scale such as percentile rank (which is done in some cases but would generally be considered inappropriate). For instance, if a teacher is typically assigned higher performing students and the scaling of a test is such that it becomes very difficult for students with high starting scores to improve over time, that teacher will be at a disadvantage. But another test of the same content, or simply a test with different scaling of scores (so that smaller gains are adjusted to reflect the relative difficulty of achieving those gains), may produce an entirely different rating for that teacher.
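
A quick simulation illustrates the “same teacher, different test” problem (an illustration of the general phenomenon, not the Corcoran et al. analysis): two ratings that share a true effect but carry test-specific noise correlate only modestly, and they produce exactly the kind of cross-classification churn quoted above.

```python
# Toy illustration (not the Corcoran et al. analysis) of "same teacher,
# different test": two ratings share a true effect but carry test-specific
# noise, so they correlate only modestly (~0.4 here).
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
effect = rng.normal(0, 1, n)
test_a = effect + rng.normal(0, 1.2, n)   # value-added rating from one test
test_b = effect + rng.normal(0, 1.2, n)   # rating from the other test

print(np.corrcoef(test_a, test_b)[0, 1])  # ~0.4

bottom_q = test_a <= np.quantile(test_a, 0.25)
top_half = test_b >= np.median(test_b)
# Share of bottom-quarter teachers on test A who land in the top half on B:
print((bottom_q & top_half).sum() / bottom_q.sum())  # roughly 0.25-0.30
```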

Difficulty in Isolating Any One Teacher’s Influence on Student Achievement

It is difficult, if not entirely infeasible, to isolate one specific teacher’s contribution to a student’s learning, leading to situations where a teacher might be identified as a bad teacher simply because her colleagues are ineffective. This is called a spillover effect.[13] For students who have more than one teacher across subjects (and/or teaching aides/assistants), each teacher’s value-added measures may be influenced by the other teachers serving the same students. Kirabo Jackson and Elias Bruegmann, for example, found in a study of North Carolina teachers that students perform better, on average, when their teachers have more effective colleagues.[14] Cory Koedel found that reading achievement in high school is influenced by both English and math teachers.[15] These spillover effects mean that teachers assigned to weaker teams of teachers might be disadvantaged, through no fault of their own.

Non-Random Assignment of Students Across Teachers, Schools And Districts

The fact that teacher value-added ratings cannot be disentangled from patterns of student assignment across schools and districts leads to the likelihood that teachers serving larger shares of one population versus another are more likely to be identified as effective or ineffective, through no fault of their own. Non-random assignment, like inter-temporal instability, is a seemingly complicated statistical issue. The non-random assignment problem relates not to error in the measurement (test scores) but to the complications of applying a statistical model to real world conditions. The fairest comparisons between teachers would occur where teachers could be randomly assigned to comparable classrooms with comparable resources, and where exactly the same number of students could be randomly assigned to those teachers, so that each teacher would have the same number of children, and children of similar family backgrounds, prior performance, personal motivation and other characteristics. Obviously, this does not happen in reality.

Students are not sorted randomly across schools, across districts, or across teachers within schools. And teachers are not randomly assigned across school settings with equal resources. It is certainly likely that one fourth grade teacher in a school is assigned more difficult students year after year than another. This may occur by choice of that teacher – a desire to try to help out these students – or by other factors, including the desire of a principal to make a teacher’s work more difficult. While most value-added models contain some crude indicators of poverty status, language proficiency and disability classification, few if any sufficiently mitigate the bias that occurs from non-random student assignment. That bias arises from such apparently subtle forces as the influence of peers on one another, and the inability of value-added models to sufficiently isolate the teacher effect from the peer effect, both of which operate at the same level of the system – the classroom.[16]

Jesse Rothstein notes that “[r]esults indicate that even the best feasible value-added models may be substantially biased, with the magnitude of the bias depending on the amount of information available for use in classroom assignments.”[17]
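
Here is a toy version of the sorting-bias problem Rothstein describes (an illustration of the general point, not his models): an unmeasured student trait affects gains and is correlated with which teacher a student gets, so a naive value-added estimate hands that trait’s effect to the teacher.

```python
# Toy version of sorting bias (an illustration of the general point, not
# Rothstein's models): an unmeasured trait ("home support") raises gains and
# is correlated with teacher assignment, so a naive average-gain estimate
# absorbs it as if it were the teacher's doing.
import numpy as np

rng = np.random.default_rng(11)
n_teachers, class_size = 500, 25
true_effect = rng.normal(0, 0.1, n_teachers)

est = np.empty(n_teachers)
for t in range(n_teachers):
    # Each teacher's class draws home support around a teacher-specific
    # level that has nothing to do with the teacher's true effect.
    support = rng.normal(rng.normal(0, 0.2), 0.5, class_size)
    gains = true_effect[t] + support + rng.normal(0, 0.3, class_size)
    est[t] = gains.mean()   # naive "value-added": average gain, no control

print(np.corrcoef(true_effect, est)[0, 1])  # ~0.4: estimates badly contaminated
```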

Value-added modeling has more recently been at the center of public debate after the Los Angeles Times contracted RAND Corporation economist Richard Buddin to estimate value-added scores for Los Angeles teachers, and the Times reporters then posted the names of individual teachers classified as effective or ineffective on their web site.[18] The model used by the Los Angeles Times, estimated by Buddin, was a fairly typical one, and the technical documentation proved rich with evidence of the types of model bias described by Rothstein and others. For example:

  • 97% of children in the lowest performing schools are poor, and 55% in higher performing schools are poor;
  • The number of gifted children a teacher has affects their value-added estimate positively – the more gifted children the teacher has, the higher the effectiveness rating;
  • Black teachers have lower value-added scores for both English Language Arts and Math than white teachers, and these are some of the largest negative correlates with effectiveness ratings provided in the report – especially for MATH;
  • Having more black students in your class is negatively associated with teacher’s value-added scores, though this effect is relatively small;
  • Asian teachers have higher value-added scores than white teachers for Math, with the positive association between being Asian and math teaching effectiveness being as strong as the negative association for black teachers.

Some of the associations above are explained by related research by Hanushek and Rivkin, which shows measurable effects of the racial composition of peer groups on individual students’ outcomes and explains the difficulty of distilling these effects from teacher effects.[19] Note that the associations with teacher race above are also likely entangled with student race, since black teachers are more likely to be in classrooms with larger shares of black students.[20]

All value-added comparisons are relative. They can be used for comparing one teacher to another within a school, teachers in one school to teachers in another school, or teachers in one district to those in other districts. The reference group becomes critically important when determining the potential for disparate impact of negative teacher ratings resulting from model bias. For example, if one were to employ a district-wide performance-based dismissal (or retention) policy in Los Angeles using the Los Angeles Times model, one would likely lay off disproportionate numbers of teachers in poor schools and black teachers of black students, while disproportionately retaining Asian teachers. But if one adopted the layoff policy relative to within-school rather than district-wide norms, because children are largely segregated by neighborhood and school, the disparate effect might be lessened. The policy would be neither fairer nor better in terms of educational improvement, but racially disparate dismissals might be reduced.

Finally, because teacher value-added ratings cannot be disentangled entirely from patterns of student assignment across teachers within schools, principals may manipulate assignment of difficult and/or unmotivated students in order to compromise a teacher’s value-added ratings, increasing the principal’s ability to dismiss that teacher. This concern might be mitigated by requirements for lottery-based student assignment and teacher assignments. However, such requirements could create cumbersome student assignment processes and processes that interfere with achieving the best teacher match for each child.

Whereas the problems of rating instability and error rates above are issues of “statistical error,” the problem of non-random assignment is one of “model bias.” Many value-added ratings of “teacher effectiveness” suffer from both large degrees of error and severe levels of model bias. The two problems are cumulative, not overlapping. In fact, the extent of error in the measures may partially mask the full extent of the bias. In other words, we might not even know how prodigious the bias is.

In The Best Possible Case, About 20% of Contracted Certified Teachers in a District Might Have Value-Added Scores

Setting aside the substantial concerns above over “measurement error” and “model bias,” which severely compromise the reliability and validity of value-added ratings of teachers, in most public school districts fewer than 20% of certified teaching staff could be assigned any type of value-added assessment score. Existing standardized assessments typically focus on reading or language arts and math performance between grades three and eight. Because baseline scores are required – and ideally multiple prior scores, to limit model bias – it becomes difficult to fairly rate third grade teachers. By middle school or junior high, students are interacting with many more teachers, and it becomes more difficult to assign value-added scores to any one teacher. When considering the various support staff roles, specialist teachers, and teachers of elective and/or advanced secondary courses, value-added measures are generally applicable to only a small minority of teachers in any school district (<20%). Thus, in order to make value-added measures a defined element of teacher evaluation in teacher contracts, one must have separately negotiated contracts for those teachers to whom these measures apply, and this is administratively cumbersome and potentially expensive for districts in these difficult economic times.

Washington DC’s IMPACT teacher evaluation system is one example that differentiates classes of teachers by whether or not they have value-added measures.[21] While contractually feasible, this approach creates separate classes of teachers within schools and may have unintended consequences for educational practice – for example, tensions arising when teachers who are not rated by value-added measures wish to pull students of value-added-rated teachers out of class for special projects or activities.


[1] Value-added ratings of teachers are generally not based on a simple subtraction of each student’s spring test score and previous fall test score for a specific subject. Such an approach would clearly disadvantage teachers who happen to serve less motivated groups of students, or students with more difficult home lives and/or fewer family resources to support their academic progress through the year. It would be even more problematic to simply use the spring test score from the prior year as the baseline score, and the spring of the current year to evaluate the current year teacher, because the teacher had little control over any learning gain or loss that may have occurred during the prior summer. And these gains and losses tend to be different for students from higher and lower socio-economic status.  See Karl L. Alexander et al., Schools, Achievement, and Inequality: A Seasonal Perspective, 23 Educ. Eval. and Pol’y Analysis 171 (2001). Recent findings from a study funded by the Bill and Melinda Gates Foundation confirm these “seasonal” effects: “The norm sample results imply that students improve their reading comprehension scores just as much (or more) between April and October as between October and April in the following grade. Scores may be rising as kids mature and get more practice outside of school.” Bill & Melinda Gates Foundation, Learning about Teaching: Initial Findings from the Measures of Effective Teaching Project 8, available at http://www.metproject.org/downloads/Preliminary_Findings-Research_Paper.pdf.

[2] Tim R. Sass, The Stability of Value-Added Measures of Teacher Quality and Implications for Teacher Compensation Policy, Urban Institute (2008), available at http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf. See also Daniel F. McCaffrey et al., The Intertemporal Variability of Teacher Effect Estimates, 4 Educ. Fin. & Pol’y, 572 (2009).

[3] Sass, supra note 2.

[4] Id.

[5] Bill & Melinda Gates Foundation, supra note 1.

[6] Peter Z. Schochet & Hanley S. Chiang, Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains (NCEE 2010-4004). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education (2010).

[7] Id.

[8] Id. at 12.

[9] Sean P. Corcoran, Jennifer L. Jennings & Andrew A. Beveridge, Teacher Effectiveness on High- and Low-Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

[10] Id.

[11] Id.

[12] Id.

[13] Cory Koedel, An Empirical Analysis of Teacher Spillover Effects in Secondary School, 28 Econ. of Educ. Rev. 682 (2009).

[14] C. Kirabo Jackson & Elias Bruegmann, Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers, 1 Am. Econ. J.: Applied Econ. 85 (2009).

[15] Koedel, supra note 13.

[16] There exist at least two different approaches to controlling for peer group composition. One approach, used by Caroline Hoxby and Gretchen Weingarth, involves constructing measures of the average entry level of performance for all other students in the class. C. Hoxby & G. Weingarth, Taking Race Out of the Equation: School Reassignment and the Structure of Peer Effects, available at http://www.hks.harvard.edu/inequality/Seminar/Papers/Hoxby06.pdf. Another involves constructing measures of the average racial and socioeconomic characteristics of classmates, as done by Eric Hanushek and Steven Rivkin. E. Hanushek & S. Rivkin, School Quality and the Black-White Achievement Gap, available at http://www.nber.org/papers/w12651.pdf?new_window=1.

[17] Jesse Rothstein, Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement, 125 Q. J. Econ. 175 (2010). See also Jesse Rothstein, Student Sorting and Bias in Value Added Estimation: Selection on Observables and Unobservables, available at http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf. Many advocates of value-added approaches point to a piece by Thomas Kane and Douglas Staiger as downplaying Rothstein’s concerns. Thomas Kane & Douglas Staiger, Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation, available at http://www.nber.org/papers/w14607.pdf?new_window=1. However, Eric Hanushek and Steven Rivkin explain, regarding the Kane and Staiger analysis: “the possible uniqueness of the sample and the limitations of the specification test suggest care in interpretation of the results.” Eric A. Hanushek & Steven G. Rivkin, Generalizations about Using Value-Added Measures of Teacher Quality 8, available at http://www.utdallas.edu/research/tsp-erc/pdf/jrnl_hanushek_rivkin_2010_teacher_quality.pdf.

[18] Richard Buddin, How Effective Are Los Angeles Elementary Teachers and Schools?, available at http://www.latimes.com/media/acrobat/2010-08/55538493.pdf.

[19] Eric Hanushek & Steve Rivkin, School Quality and the Black-White Achievement Gap, Educ. Working Paper Archive, Univ. of Ark., Dep’t of Educ. Reform (2007).

[20] Charles T. Clotfelter et al., Who Teaches Whom? Race and the Distribution of Novice Teachers, 24 Econ. of Educ. Rev. 377 (2005).

Smart Guy (Gates) makes my list of “Dumbest Stuff I’ve Ever Read!”

Bill Gates (clearly a very smart guy) has just topped my list of Dumbest Stuff I’ve Ever Read for the first few months of 2011. He did it with this post in the Huffington Post and with his talk to State Governors (in which he also naively handed out copies of the book Stretching the School Dollar, which is complete junk):

http://www.huffingtonpost.com/bill-gates/bill-gates-school-performance_b_829771.html

Let’s dissect two bold premises of Gates’ argument about US spending and student outcomes – how we’ve spent ourselves crazy for decades and how we’ve gotten nothing for it – how we spend so much more than other countries, but they kick our butts – his reasons for arguing that now is the time to flip the curve.

Gates opines:

Compared to other countries, America has spent more and achieved less.

To be able to make such a comparison, one would have to be able to accurately and precisely measure education spending levels in the United States relative to education spending levels in other countries, and achievement outcomes of children in the United States compared to otherwise similar children in other countries. We’ve already heard much blog talk about how poverty rates among US children and children in Finland are, well, not really so comparable – Finland having much lower poverty. Clearly, that makes at least some difference.

But let’s focus on the expenditure side of this puzzle for a moment.

We don’t hear enough about how those expenditure figures are, well, not so comparable either.

International education spending comparisons like those presented by the Organization for Economic Cooperation and Development (OECD) and often reported by organizations like McKinsey are, well, bogus… meaningless… uh… not particularly useful. Why? Because they are not comparable. Plain and simple.

Government or public education expenditures in different countries contain different components. A number of my colleagues and I are in the process of better understanding and delineating the components included in public education expenditures across nations. For example, in a country with a national health care system, public education expenditures may not include health care expenses for all employees. That’s not a trivial expense. The same may be true of pension contributions and obligations, where they exist, in other countries. The same is also true for arts and athletic programs in countries where it is more common for those activities to be embedded in community services. But, we’ve yet to fully identify the extent of these differences across nations or how these differences affect the spending comparisons. What we do know is that they do affect the spending comparisons – and likely quite significantly.

So, that in mind, what can we say about how much the US spends with respect to how well our children do, compared to other countries’ spending and outcomes when neither the spending figure nor the children in the system are even remotely comparable? Not much!

Gates opines:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead.

[from a previous post]

We often see pundits arguing that education spending has doubled over a 30 year period, when adjusted for inflation, and we’ve gotten nothing for it. We’ve got modest growth in NAEP scores and huge growth in spending. And those international comparisons… wow!

The assertion is therefore that our public education system is less cost-effective now than it was 30 years ago. But this assertion rests on layers of flawed reasoning, on both sides of the equation.

Here’s a bit of School Finance 101 on this topic:

First, what are the two sides of the equation, or at least the two parts of the fraction? The numerator here is education spending and how we measure it now compared to previously. The major flaw in the usual reasoning is that we are making our comparison of the education dollar now to then by simply adjusting the value of that dollar for the average changes in the prices of goods purchased by a typical consumer (food, fuel, etc.), or the Consumer Price Index.

Unfortunately, the consumer price index is relatively unhelpful (okay, useless) for comparing current education spending to past education spending, unless we are considering how many loaves of bread or gallons of gas can be purchased with the education dollar.

If we wanted to maintain constant quality education over time, the main thing we’d have to do is maintain a constant quality workforce in schools – mainly a teacher workforce, but also administrators, etc. At the very least, if quality lagged behind we’d have to be able to offset the quality losses with additional workers, but the trade-offs are hard to estimate.

The quality of the teacher workforce is influenced much more by the competitiveness of teacher wages, compared to other professions, than by changes in the price of a loaf of bread or a gallon of gas. If we want to get good teachers, teaching must be perceived as a desirable profession with a competitive wage. That is, to maintain teacher quality we must maintain the competitiveness of teacher wages (which we have not done over time), and to improve teacher quality, we must make teacher wages (or working conditions) more competitive. On average, non-teacher wage growth has far outpaced the CPI over time, and on average, teacher wages have lagged behind non-teacher wages – even in New Jersey!
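
A back-of-envelope example shows why the choice of deflator matters. The growth factors below are made up for illustration; they are not actual CPI or wage series.

```python
# Back-of-envelope deflator comparison. All three growth factors are made
# up for illustration; they are not actual CPI or wage series.
nominal_growth = 4.5   # hypothetical: per-pupil spending now vs. 30 years ago
cpi_growth = 2.2       # hypothetical cumulative consumer price inflation
wage_growth = 3.4      # hypothetical growth in competing professional wages

print(nominal_growth / cpi_growth)   # ~2.0x: "spending has doubled!" (CPI-adjusted)
print(nominal_growth / wage_growth)  # ~1.3x: far flatter against competitive wages
```

Deflated by the CPI, spending “doubled”; deflated by the wages schools must compete against to hold teacher quality constant, the increase looks far more modest.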

Now to the denominator – the outcomes of our education system. First of all, if we allow a decline in the quality of the key input – teachers – we can expect a decline in outcomes, however we choose to measure them. But it is also important to understand that if we wish to achieve higher outcomes, or to achieve a broader array of outcomes, or to achieve higher outcomes in key areas without sacrificing the broader array, costs will rise. In really simple terms, the cost of doing more is more, not less. And yes, a substantial body of rigorous, peer-reviewed empirical literature supports this contention (a few examples below).

So, as we ask our schools to accomplish more we can expect the costs of those accomplishments to be greater. If we expect our children to compete in a 21st century economy, develop technology skills and still have access to physical education and arts, it will likely cost more, not less, than achieving the skills of 1970. But, we must also make sure we are adequately measuring the full range of outcomes we expect schools to accomplish. If we are expecting schools to produce engaged civic participants, we may or may not see the measured effects in elementary reading and math test scores.

An additional factor that affects the costs of achieving educational outcomes is the student inputs – who is showing up at the schoolhouse door (or logging in to the virtual school). A substantial body of research (see the chapter by Duncombe and Yinger, here) explains how child poverty, limited English proficiency, unplanned mobility and even school racial composition may influence the costs of achieving any given level of student outcomes. Differences in the ways children are sorted across districts and schools create large differences in the costs of achieving comparable outcomes, and so too do changes in the overall demography of the student population over time. Escalating poverty, mobility induced by housing disruptions, and increased numbers of children not yet proficient in English all lead to increases in the cost of achieving even the same level of outcomes achieved in prior years. This is not an excuse. It’s reality. It costs more to achieve the same outcomes with some students than with others.

In short, the “cost” of education rises as a function of at least 3 major factors:

  1. Changes in the incoming student populations over time
  2. Changes in the desired outcomes for those students, including more rigorous core content area goals or increased breadth of outcome goals
  3. Changes in the competitive wage of the desired quality of school personnel

And the interaction of all three of these! For example, changing student populations may make teaching more difficult (a working condition), meaning that a higher wage might be required simply to offset this change. Increasing the complexity of outcome goals might require a more skilled teaching workforce, requiring higher wages.

The combination of these forces often leads to an increase in education spending that far outpaces the consumer price index – and it should. Costs rise as we ask more of our schools, as we ask them to produce a citizenry that can compete in the future rather than the past. Costs rise as the student population inputs to our public schooling system change over time. Increased poverty, language barriers and other factors make even the current outcomes more costly to achieve. And the costs of maintaining the quality of the teacher workforce change as competitive wages in other occupations and industries change, which they have.

Typically, state school finance systems have not kept up with the true increased costs of maintaining teacher quality, meeting increased outcome demands or serving changing student demography. Nor have states sufficiently targeted resources to districts facing the highest costs of achieving desired outcomes. See www.schoolfundingfairness.org. And many states with significantly changing demography, including Arizona, California and Colorado, have merely maintained or even cut spending levels for decades (despite the increased costs of even maintaining current outcome levels).

Evaluating education spending solely on the basis of changes in the price of a loaf of bread and/or gallon of gasoline is, well, silly.

Notably, we may identify new “efficiencies” that allow us to produce comparable outcomes with comparable kids, at lower cost. We may find some of those efficiencies through existing variation across schools and districts, or through new experimentation. But it is downright foolish to pretend that those efficiencies are simply out there (even if we can’t see them, or don’t know what they are) and that we can simply squeeze the current system into achieving comparable or better outcomes at lower cost.

Closing thought

So, Mr. Gates… neither of your two main premises rests on solid footing. Not only that, but these arguments are so commonplace and so intellectually flimsy and lazy as to be outright embarrassing.

I know you’ve got other things to think about and likely rely heavily on advisers to help you shape these arguments, much like politicians rely heavily on their staffers. Here’s a tip Mr. Gates. YOU ARE GETTING REALLY BAD, DEEPLY FLAWED ADVICE AND INFORMATION WHEN IT COMES TO SCHOOL FUNDING ARGUMENTS.

There are many, many credible school finance and economics of education scholars out there. Many of those you have chosen to rely on – the authors of Stretching the School Dollar among others – are not credible scholars of school finance or education policy more generally. I tackle some of the other myths driving the current debate in these two recent posts:

School Funding Equity Smokescreens

School Funding Myths & Stepping Outside the “New Normal”

I don’t pretend by any stretch to be the only credible source, or the best one (or even one of the top 20, 50 or 100). And we in the field certainly don’t all agree on all, or perhaps even most topics. I’d try listing the many exceptional school finance and economics of education scholars here, but I’d likely end up leaving some really important ones out. I’ll gladly inform you directly regarding which scholars may provide the most useful information regarding specific topics and issues.

Cheers!

Related Readings

Baker, B.D., Taylor, L., Vedlitz, A. (2008) Adequacy Estimates and the Implications of Common Standards for the Cost of Instruction. National Research Council.  http://www7.nationalacademies.org/CFE/Taylor%20Paper.pdf

Duncombe, W., Lukemeyer, A., Yinger, J. (2006) The No Child Left Behind Act: Have Federal Funds been Left Behind? http://cpr.maxwell.syr.edu/efap/Publications/costing_out.pdf

This second one is a really fun article showing the vast differences in the costs of achieving NCLB proficiency targets in two neighboring states which happen to have very different testing standards. In really simple terms, Missouri has a hard test with low proficiency rates and Kansas an easy test with high proficiency rates. The authors show the cost implications of achieving the lower, versus higher, tested achievement standards.

School Funding Equity Smokescreens: A note to the Equity Commission

In this blog post, I summarize a number of issues I’ve addressed in the past. In my previous post, I discussed general reformy myths about school spending. In this post, I address smokescreens commonly occurring in DC beltway rhetoric about school funding equity and adequacy. School funding is largely a state and local issue, where even that “local” component is governed under state policies. So I guess that makes it a state issue, really. Occasionally, the federal government will dabble in the debate over how or whether to intervene more extensively in state and local public school finance. Now is one of those times when the federal government is again at least paying lip service to the question of equity – with some implication that they may even be talking about school funding equity. The federal government has created an equity commission!

One of my fears is that this current discussion of funding equity will be typical of recent beltway discussions of school funding, and be trapped in the constant fog of School Funding Smokescreens and insulated entirely from more legitimate representations and analyses of the critical issues that should be addressed.

So, for you – the equity commission – here’s a quick rundown of School Funding Smokescreens:

1. On average, nationally, we now put more funding into higher poverty school districts than lower poverty ones (to no avail)

This argument seems to be popping up more and more of late, and often with the table below attached. This table is from the National Center for Education Statistics and shows the average current operating expenditure per pupil of school districts nationally over time. The table would appear to show that in 1994-95, low poverty school districts had between $300 and $400 less in per pupil spending than higher poverty ones. By 2006-07, the highest poverty quintile of school districts had about $100 per pupil more than the lowest poverty quintile. That’s it. We’re done. Equity problems fixed. No more savage inequalities. And after all of this fixing of school finance equity, we really got nothing for it. Achievement gaps are still unacceptably large and NAEP scores stagnant? Right? All of this after dumping a whole extra $100 per pupil into high poverty districts. I guess we should be rethinking this crazy strategy of systematically pouring so much into high poverty districts.

Table 1

NCES Oversimplification of Funding Differences by Poverty


Well, to begin with, a $100 difference really wouldn’t be that much anyway, given that the costs of actually meeting the needs of children from economically disadvantaged backgrounds are much greater than this. Setting that (really important) question aside, this table provides a less than compelling argument that we as a nation have accomplished improved funding equity for kids in high poverty districts.

Here’s the underlying scatter of school districts that leads to the neatly packed aggregations above. In the graph below, districts are plotted by current expenditures per pupil with respect to census poverty rates, using 2007-08 data. Clearly there is substantial variation in current spending. In fact, the underlying relationship isn’t even a relationship at all. It’s all over the place. And yes, if you fit a trend line – if you take out a huge magnifying glass – you can see that the trendline is ever so slightly higher in the higher poverty districts than in the lower poverty ones (perhaps about $100?). It’s not systematic. It’s not statistically significant. It’s pretty darn meaningless. (A sketch of how to run this check yourself follows Figure 1.)

Figure 1

Pattern of school districts underlying Table 1
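For readers who want to replicate this kind of check, here is a minimal sketch of the trend-line exercise in Python. The file and column names are hypothetical stand-ins for a district-level dataset with per pupil spending and census poverty rates:

```python
# Minimal sketch: fit the trend line behind a Figure 1-style scatter.
# Hypothetical columns: 'poverty_rate' (0 to 1) and 'spending_pp'
# (current expenditures per pupil).
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("districts_2007_08.csv")  # hypothetical file

# Simple bivariate regression: how does spending move with poverty?
model = smf.ols("spending_pp ~ poverty_rate", data=districts).fit()
print(model.params)   # a slope near zero means the "relationship" is barely there
print(model.pvalues)  # a large p-value means it is not statistically significant

# Implied spending gap between a 30% poverty and a 5% poverty district
slope = model.params["poverty_rate"]
print(f"Implied gap, 30% vs 5% poverty: ${slope * 0.25:,.0f} per pupil")
```

If the slope is tiny and insignificant, the “higher funding for higher poverty districts” story evaporates on contact with the underlying data.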


In our recent report Is School Funding Fair? we conducted a far more rigorous analysis of state and local revenue per pupil with respect to poverty, for each state. What we showed is that there exists huge variation across states, both in the overall level of resources available to local public school districts and in the differences in state and local revenue between higher and lower poverty districts. In that report, we showed that 9 states have statistically significantly lower state and local revenue per pupil in higher poverty districts (after controlling for economies of scale and competitive wage variation). Overall, half of states had lower funding per pupil in higher poverty districts (and many of the rest were approximately flat).
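The report’s actual models are more involved, but the basic logic looks something like the following sketch: regress revenue on poverty, state by state, while controlling for economies of scale and competitive wages. Column names here are my own hypothetical stand-ins, not the report’s variables:

```python
# Sketch of an "Is School Funding Fair?"-style fairness check (a
# simplification, not the report's exact specification). Hypothetical
# columns: 'state', 'slrev_pp' (state & local revenue per pupil),
# 'poverty_rate', 'enrollment', 'wage_index'.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("districts.csv")  # hypothetical file

for state, df in districts.groupby("state"):
    m = smf.ols(
        "np.log(slrev_pp) ~ poverty_rate + np.log(enrollment) + np.log(wage_index)",
        data=df,
    ).fit()
    b, p = m.params["poverty_rate"], m.pvalues["poverty_rate"]
    # b < 0 with p < .05: significantly regressive funding --
    # less revenue where poverty is higher, after cost controls
    print(f"{state}: poverty coefficient = {b:+.3f} (p = {p:.3f})")
```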

Among the worst states were New York, Illinois and Pennsylvania. Let’s pull Illinois forward in Figure 1 – and also look at state and local revenues (excluding federal support, to focus on state policies) in place of current expenditures.

Figure 2

State and local revenues with respect to poverty, with Illinois highlighted


Now, when we exclude federal revenues the overall line tips slightly downward. The federal effect is slight, but there. More strikingly, when we pull Illinois forward in the picture, we see that funding by poverty across Illinois districts is highly regressive, and is systematic and statistically significant. Funding inequities across Illinois districts are far from being resolved. AND ILLINOIS IS NOT ALONE. I could go on and on with this.

2. The remaining most egregious disparities in funding and teacher quality (remaining because of #1) occur across schools within districts (because of politically motivated and corrupt local administrators), and these disparities are what cause the persistent racial achievement gaps (which is why those gaps haven’t improved since we’ve fixed between district inequity)

To many, this argument seems absurd (and is) on its face. Who really says that? Does anyone? Am I just makin’ this stuff up? No. And in fact, because this argument has become so pervasive of late, I even had to take the time to write a fairly extensive research article on the topic. See: http://epaa.asu.edu/ojs/article/view/718

I have written about this topic on my blog on several occasions and much of my writing on this topic can be found by reading my critiques of reports from the Center for American Progress and from the Education Trust. Here are some choice quotes where CAP and Ed Trust frame this argument – or blow this smoke!

Center for American Progress

State funding formulas tend to exert an equalizing effect on per pupil revenues between districts, on average, and not by accident. These formulas were sculpted by two generations of litigation and legislation seeking equitable or adequate funding for property-poor school districts.

Scandalous inequity in the distribution of resources within school districts has plagued U.S. education for more than a hundred years.

empirical literature documenting the extent of within-district inequity is astonishingly thin. [my reply: well, not if you actually read the research on the topic]

Center for American Progress

The outcome of such practices is predictable: A further widening of the dangerous achievement gap that has become endemic in American schools today.

Education Trust

Many states have made progress in closing the funding gaps between affluent school districts and those serving the highest concentrations of low-income children. But a hidden funding gap between high-poverty and low-poverty schools persists between schools within the same district.

These gaps occur partly because teachers in wealthier schools tend to earn more than their peers in high-poverty schools and because of pressure to “equalize” other resources across schools.

All of these claims – that within district inequities are the major source of persistent inequity, and that our failure to close within district funding and teacher quality gaps (having already fixed between district ones) is the reason for persistent black-white and poor-non-poor achievement gaps – might be reasonable if poor children and non-poor children, and black children and white children, actually lived in the same school districts. BUT, IN GENERAL,* THEY DO NOT! As a result this argument is patently absurd, ridiculous, irresponsible and ignorant. It’s one massive distraction. A smokescreen of monumental proportion!

Here’s a quick visual of the reality that any informed analyst (or anyone who simply lives in the real world) understands. Below are two maps of the Chicago metropolitan area. On the left is a map which shows school districts and the level of state and local revenue per pupil in each of those districts. We know from above that Illinois maintains a very regressive state school finance formula. That is, higher poverty districts have less funding than lower poverty ones. Note that the diagonal shading indicates the location of districts that have majority minority (black and Hispanic) enrollment. As it turns out, most of those districts are in the orange – lower funding levels.

Now, CAP and Ed Trust would have you believe otherwise to begin with (that poor minority districts already have enough money), but would then go further to say that the real problem is that these Illinois districts are putting money into their white, rich schools at the expense of their poor black and Hispanic ones. How is that even possible?

Okay, so let’s look at the right hand panel, in which I have indicated the locations of individual schools, with majority black schools in red and majority Hispanic schools in purple. Majority white schools are in white. NOTE THAT THE WHITE DOTS TEND TO BE IN DISTRICTS ENTIRELY SEPARATE FROM THE PURPLE OR RED ONES. AND ONLY CHICAGO PUBLIC SCHOOLS HAS MUCH OF A MIX OF PURPLE AND RED. Only a handful of districts have both white and majority minority schools. Also, only a handful of districts have both low(er) poverty and high poverty schools. Districts are highly segregated.

Figure 3

State and Local Revenue and the Location of Majority Minority Schools in Illinois


FEW IF ANY SCHOOL DISTRICTS IN THIS MAP HAVE THE OPPORTUNITY TO REDISTRIBUTE RESOURCES ACROSS THEIR “RICH” AND “POOR” OR “BLACK” AND “WHITE” SCHOOLS – BECAUSE THEY DON’T HAVE BOTH!!!!!  Yes, Chicago Public Schools and a few other districts can re-allocate between poor black and poor Hispanic schools. But such re-allocation accomplishes little toward improving educational equity in the Chicago metro area or State of Illinois.
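If you doubt the maps, the arithmetic check is simple: decompose the variation in school poverty into its between district and within district parts. Here is a rough sketch, with hypothetical column names, assuming a school-level file:

```python
# Rough check of where poverty variation lives: between districts, or
# between schools within the same district? Hypothetical columns:
# 'district_id' and 'pct_free_lunch' (a common school poverty proxy).
import pandas as pd

schools = pd.read_csv("schools.csv")  # hypothetical file

grand_mean = schools["pct_free_lunch"].mean()
district_means = schools.groupby("district_id")["pct_free_lunch"].transform("mean")

# Total sum of squares = between-district SS + within-district SS
between_ss = ((district_means - grand_mean) ** 2).sum()
within_ss = ((schools["pct_free_lunch"] - district_means) ** 2).sum()
total_ss = between_ss + within_ss

print(f"Between-district share of variation: {between_ss / total_ss:.1%}")
print(f"Within-district share of variation:  {within_ss / total_ss:.1%}")
# If the between-district share dominates, within district reallocation
# cannot do much about overall inequity -- the point of the maps above.
```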

Now… to those at Ed Trust and CAP – if you really don’t mean this, I dare you to actually say it. Say that between district differences in demographics and funding are THE BIG ISSUE. At least as big, if not much bigger, than within district differences. Say it. Acknowledge it. I challenge you. Release another hastily crafted report and press release – but this time with conclusions that are at least reasonably grounded in reality. The data are unambiguous in this regard. Yes, within district disparities exist and it is important to address them. I will certainly admit that, and I’ve never said otherwise. But solving within district resource variation alone will accomplish very little.

*Clarification – In states with county-wide districts and large, diverse populations, like Florida, between school, within district segregation is more likely to be the greater problem.

3. High need, poor urban districts (in addition to misallocating all of their resources to the schools serving rich white kids in their district???) are simply wasting massive sums of money on things like cheerleading and ceramics.

This is another absurd and empirically unfounded argument. Again, you ask, is anyone really saying that high need, low performing school districts are actually wasting money on cheerleading and ceramics that could easily be translated into sufficient resources for improving reading and math performance (can we really fire the cheerleading coach and hire 6 more math specialists)? Surely no one is advancing an argument – SMOKESCREEN – that utterly absurd. But again, these claims can be found all over the Beltway talk-circuit regarding the best fixes for school funding inequities and inefficiencies (and nifty ways to stretch that school dollar).

Here’s the advertisement headline from a recent beltway discussion at the Urban Institute:

Urban Institute Event Headline (based on content from Marguerite Roza)

Imagine a high school that spends $328 per student for math courses and $1,348 per cheerleader for cheerleading activities. Or a school where the average per-student cost of offering ceramics was $1,608; cosmetology, $1,997; and such core subjects as science, $739.

I’ve only recently begun exploring more deeply the resource differences across school districts that fall into different performance and efficiency categories. I’ve been specifically looking at Illinois and Missouri school districts, and estimating statistical models to determine which districts are:

a) resource constrained and low performing (low-low)

b) resource constrained and high performing (low-high)

c) resource rich and high performing (high-high)

d) resource rich and low performing (high-low)

These categories are based on a thoroughly cost-adjusted analysis. As such, a district identified as having low or constrained “resources” may actually spend more per pupil in nominal dollars than a district identified as having high resource levels. The resource levels are adjusted for various cost pressures, including differences in student needs. I should be posting the forthcoming paper on my research page some time in the next month. But here’s a preview (and a rough sketch of the classification logic follows).
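Roughly speaking, and simplifying considerably, the classification works like the sketch below: residualize spending against cost pressures, then cross the cost-adjusted resource position with an outcome measure. This is my own illustrative simplification with hypothetical column names, not the actual forthcoming model:

```python
# Sketch of the four-way classification (low-low, low-high, high-high,
# high-low). Hypothetical columns: 'spending_pp', 'poverty_rate',
# 'enrollment', 'wage_index', 'outcome_index' (composite outcome measure).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("districts.csv")  # hypothetical file

# Step 1: residualize spending against cost pressures. A district's residual
# is its spending relative to districts facing similar needs, scale and wages.
cost_model = smf.ols(
    "np.log(spending_pp) ~ poverty_rate + np.log(enrollment) + np.log(wage_index)",
    data=districts,
).fit()
districts["resource_position"] = cost_model.resid

# Step 2: cross cost-adjusted resources with outcomes, splitting at the median.
hi_res = districts["resource_position"] > districts["resource_position"].median()
hi_out = districts["outcome_index"] > districts["outcome_index"].median()
districts["category"] = np.select(
    [~hi_res & ~hi_out, ~hi_res & hi_out, hi_res & hi_out, hi_res & ~hi_out],
    ["low-low", "low-high", "high-high", "high-low"],
)
print(districts["category"].value_counts())
```

Note how, under this construction, a high nominal spender facing severe cost pressures can still land in the “low resource” half.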

In both states, most districts fall into categories a) and c), where you would expect. There’s somewhat more “scatter” in Missouri, either because Missouri has some better funded high need districts (it is less regressive than Illinois) or because my statistical model just isn’t working quite right. I picked these neighboring states because Missouri is less regressive than Illinois and because I had similar data on both. So, the big question here is – if I compare the dominant categories of resource constrained low performing schools to resource rich high performing ones, what do we actually see in the organization of their staffing and course delivery?

In Missouri, I tabulate each individual course to which teachers are assigned. In Illinois my tabulation is by the main assignment of each teacher. To begin with, in both states, the high spending high performing schools have more course offerings per pupil and more teachers per pupil (and smaller class sizes). These differences are far greater under the more regressive Illinois policies.

Here are a few fun visuals of what I’m finding so far, expressed in “shares of staff” allocation and relating staffing allocations in low-low districts to those of high-high districts.

The first two graphs compare the main assignments of teachers in high resource, high performing Illinois schools (high school assignments only) to those in low resource, low performing ones. The diagonal line represents allocation “comparable” to that of the high resource, high performing schools. Assignments falling below the line represent (relative) deficits in the low resource, low performing schools.
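Mechanically, each point in these figures is just a pair of staffing shares – one for each group – plotted against a 45-degree line. A minimal sketch of that construction, assuming a hypothetical record-per-assignment layout:

```python
# Sketch of the "shares of staff" comparison behind Figures 4 and 5.
# Hypothetical layout: one row per teacher assignment record, with columns
# 'group' ('low-low' or 'high-high') and 'assignment' (e.g., 'Chemistry').
import pandas as pd
import matplotlib.pyplot as plt

assignments = pd.read_csv("teacher_assignments.csv")  # hypothetical file

# Share of each group's teaching staff devoted to each assignment area
shares = (
    assignments.groupby("group")["assignment"]
    .value_counts(normalize=True)
    .unstack("group")
    .fillna(0)
)

fig, ax = plt.subplots()
ax.scatter(shares["high-high"], shares["low-low"])
lim = shares.to_numpy().max() * 1.1
ax.plot([0, lim], [0, lim], linestyle="--")  # diagonal = comparable allocation
ax.set_xlabel("Share of staff, high resource / high performing")
ax.set_ylabel("Share of staff, low resource / low performing")
# Points below the diagonal indicate relative deficits in low-low schools.
plt.show()
```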

Across all assignment areas, Figure 4 shows that kids in low resource low performing schools tend to have reduced access to physical education, biology, chemistry and foreign language. Sadly, no indicator for ceramics in these data.

Figure 4

Allocation of Main Teaching Assignments in Illinois Districts


Focusing on less frequent assignment areas – lower budget share and staff allocation areas – Figure 5 shows that in Illinois, kids in low resource low performing schools tend to have reduced access to advanced math and science courses and drivers education, but greater access to basic courses. That is, these districts are already channeling their resources to the basics, to the detriment of potentially important advanced coursework in math and science, and even basic coursework in biology and chemistry.

Figure 5

Allocation of Main Teaching Assignments in Illinois Districts (less frequent assignments)


Missouri, despite having somewhat higher relative resource levels in higher poverty settings (than Illinois, though still regressive), shows very similar patterns. Figure 6 shows reduced access to physical education for kids in low resource low outcome schools and elevated access to “general” math and language arts courses.

Figure 6

Allocation of Assigned Courses in Missouri Districts


Kids in low resource low outcome schools have reduced access to advanced math courses, including calculus and trigonometry, and reduced access to chemistry. They have higher shares of teachers in special education, basic life skills, earth and physical (basic/introductory) science, and in JROTC. Again, significant reallocation to “basics” is already occurring, and within significant resource constraints.

Figure 7

Allocation of Assigned Courses in Missouri Districts (less frequent courses)


Also, LET IT BE KNOWN THAT HIGH SPENDING HIGH PERFORMING SCHOOLS IN MISSOURI HAVE TWICE AS MANY CERAMICS COURSE OFFERINGS PER PUPIL AS LOW SPENDING LOW PERFORMING MISSOURI SCHOOLS!!!!!

4. None of this school funding equity – between district stuff – matters anyway!

Rigorous peer reviewed studies do show that state school finance reforms matter. Shifting the level of funding can improve the quality of the teacher workforce and ultimately the level of student outcomes, and shifting the distribution of resources can shift the distribution of outcomes. To wit:

We conclude that there is arbitrariness in how research in this area appears to have shaped the perceptions and discourse of policymakers and the public. Methodological complexities and design problems plague finance impact studies. Advocacy research that has received considerable attention in the press and elsewhere has taken shortcuts toward desired conclusions, and this is troubling. As demonstrated by our own second look at the states discussed in Hanushek and Lindseth’s book, the methods used for such relatively superficial analyses are easily manipulable and do not necessarily lead to the book’s conclusions. Higher quality research, in contrast, shows that states that implemented significant reforms to the level and/or distribution of funding tend to have significant gains in student outcomes. Moreover, we stress the importance of the specific nature of any given reform: positive outcomes are likely to arise only if the reform is both significant and sustained. Court orders alone do not ensure improved outcomes, nor do short-term responses.