Author: schoolfinance101

Bruce Baker is a Professor in the Graduate School of Education at Rutgers, The State University of New Jersey. From 1997 to 2008 he was a professor at the University of Kansas in Lawrence, KS. He is lead author, with Preston Green (Penn State University) and Craig Richards (Teachers College, Columbia University), of Financing Education Systems, a graduate-level textbook on school finance policy published by Merrill/Prentice-Hall. Professor Baker has written a multitude of peer-reviewed research articles on state school finance policy, teacher labor markets, school leadership labor markets and higher education finance and policy. His recent work has focused on measuring cost variations associated with schooling contexts and student population characteristics, including ways to better design state school finance policies and local district allocation formulas (including Weighted Student Funding) to better meet the needs of students. Baker and Preston Green of Penn State University are co-authors of the chapter on Conceptions of Equity in the recently released Handbook of Research in Education Finance and Policy, of the chapter on the Politics of Education Finance in the Handbook of Education Politics and Policy, and of the chapter on School Finance in the Handbook of Education Policy of the American Educational Research Association. Professor Baker has also consulted for state legislatures, boards of education and other organizations on education policy and school finance issues, and has testified in state school finance litigation in Kansas, Missouri and Arizona. He is a member of the Think Tank Review Panel, a group of academic researchers who conduct technical reviews of publicly released think tank reports on education policy issues.

Data, Portfolios & the Path Forward for NYC (& Elsewhere)

As the new year begins, I’ve been pondering what I might recommend as guiding principles for the path forward for education policy in New York City under its new Mayor, Bill de Blasio, who is often referred to on Twitter as BDB. So here are my thoughts for the way forward, from one BDB (Bruce D. Baker) to another.

Note that I had drafted much of this content last spring when convening with a group of scholars to discuss the path forward for NYC education policies. Because I am not especially well versed in the specifics of NYC education policies, though I have written academically about some of them, I kept my ideas broad and applicable to many educational settings across the U.S.

My recommendations fall into two broad categories:

Develop a robust, balanced, least intrusive system of indicators for evaluating New York City Schools and then use that information appropriately

NYC BOE policies of the past ten years have been rife with data abuse (though at times, merely in an effort to comply with state required data abuse). School closures have been based on ill-conceived measures of “school failure” which do little more than target the city’s neediest student populations, imposing on them repeated disruptions.

New York City’s teacher performance reports, albeit better than many, apply the worst form of statistical reductionism to quantify teacher “quality,” taking noisy statistical estimates of the association between teachers-of-record and their assigned students’ test score gains (applying only the most convenient statistical corrections) in limited curricular areas and grades, and assuming levels of precision and accuracy that are completely unwarranted.

Such data abuse – on both counts [school closures and teacher ratings] – is reprehensible.

Right-sized (NOT BIG) data can indeed be useful for guiding decision-making in large, complex urban education systems. But data should never be the exclusive determinant of policies or other high stakes decisions.

Human judgment matters, including human interpretation of the meaning and usefulness of data as it informs decisions which ultimately affect other human beings.

New York City should give serious consideration to how data are collected, maintained and ultimately used for informing policy and decision making. Four guiding principles are:

Emphasis should be on understanding what the data can and cannot tell us about schools, their climate, students and their achievement and the role of teachers, leaders, programs and services. Policies should emphasize how various constituents can make sense of data, coupled with their knowledge and experience, to inform the path forward.
Data should NEVER dictate decisions. Rather, data may inform them. Along these same lines, despite ill-conceived requirements of state policies, imprecise information (which includes nearly all social science measures) should NEVER be treated as determinative, attaching specific consequences to specific scores or estimates (splitting hairs that cannot or shouldn’t be split).
Data systems must better capture the scope of public service that is public schooling in a modern era. This means collecting more than just that which is most easily quantifiable, and more than just achievement test scores on mandated core curriculum.
Data should be collected in the least intrusive manner necessary to draw valid inference, or provide valid descriptive profiles.

School principals and leadership teams should have available to them sufficient and appropriate data to guide – Not Dictate – building level management decisions. Information might include typical measures of student achievement as well as measures of gain, but also include measures on longer term outcomes of students who attended any given school (graduation, college attendance/persistence) – linked longitudinally to both outcome data while they were in attendance at the school as well as data on programs and services in which they participated. Data on students should similarly be traceable backwards. Data might also include indicators of parent and student perceptions of school environment, etc.

Data should attempt to capture not only limited, easily measurable “outcomes” but also more accurately measure inputs and resources as well as characterize ongoing educational processes. After all, a central objective of data collection and maintenance is to be able to make connections between inputs, process and outcomes. And the central objective of city leaders should be to ensure equitable and adequate inputs and processes, to support the achievement of more equitable, more adequate outcomes. Not the other way around.

An important consideration is that data should be collected more strategically so as to be far less intrusive than current practices on the actual educational processes being monitored with the data. That is, the goal of actors within the system should not be to improve the measured data elements, but rather to more substantively improve their practices in ways that lead to shifts in the measured data elements, assuming we’re measuring the right things (often a bold assumption).

Lengthy performance assessments, achievement tests or survey instruments need not be given every year to every child. Appropriate sampling can achieve robust data with far less intrusion (or expense). Such is the design of assessment systems like the National Assessment of Educational Progress. Providing samples of items to samples of students across schools can reduce cost, reduce intrusion, reduce the likelihood of teaching to the test, item familiarity and other threats to validity, and thus provide more useful information. This approach also reduces the digital record maintained on any one student.
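The sampling idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not NAEP's procedure (NAEP's actual design is considerably more elaborate): each student answers only a random subset of a hypothetical 40-item bank, responses are simulated from hypothetical item difficulties, and the school-level mean is estimated from those partial responses. The function names and every parameter value are hypothetical.

```python
import random
from collections import defaultdict


def assign_forms(num_students, num_items, items_per_student, seed=0):
    """Matrix sampling: each student gets a random subset of the item bank."""
    rng = random.Random(seed)
    return [rng.sample(range(num_items), items_per_student) for _ in range(num_students)]


def simulate_responses(forms, true_p_correct, seed=1):
    """Hypothetical responses: item i is answered correctly with probability true_p_correct[i]."""
    rng = random.Random(seed)
    return [[(item, rng.random() < true_p_correct[item]) for item in form] for form in forms]


def estimate_school_mean(responses, num_items):
    """Estimate mean proportion correct over the whole bank from partial responses."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for student in responses:
        for item, is_correct in student:
            seen[item] += 1
            correct[item] += is_correct
    per_item = [correct[i] / seen[i] for i in range(num_items) if seen[i] > 0]
    return sum(per_item) / len(per_item)


# Hypothetical item difficulties ranging from 0.40 to 0.90 proportion correct
true_p = [0.4 + 0.5 * i / 39 for i in range(40)]
forms = assign_forms(num_students=600, num_items=40, items_per_student=8)
responses = simulate_responses(forms, true_p)
print(round(estimate_school_mean(responses, 40), 3), round(sum(true_p) / 40, 3))
```

Each student here sees only 8 of 40 items (20% of the bank), yet the school-level estimate lands close to the true mean, which is the trade-off the paragraph describes: less testing time per child and a smaller record per student, in exchange for information that is only valid at the school level.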

As a basic rule of thumb, high stakes decisions should never be made with low quality information. One should never conclude with certainty based on uncertain information.

The city should avoid the urge to apply categories to otherwise continuous and noisy data, such as applying specific cut points and imposing quality/value judgments based on those cut points. Few if any measures collected in social science, including test scores from multiple choice assessments given to nine- and ten-year-old children, are sufficiently precise for making high stakes determinations by splitting hairs between getting 20 versus 21 (or even 20 vs 25) correct responses. Most of the types of data collected in such an environment are simply not sufficiently precise for such determinations.
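To see how little a one-point (or even five-point) difference can mean, here is a minimal sketch assuming normally distributed measurement error with a standard error of measurement (SEM) of 3 raw-score points. The SEM value, the treatment of raw scores as continuous, and the function names are all hypothetical; the calculation uses Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

SEM = 3.0  # hypothetical standard error of measurement, in raw-score points


def prob_below_cut(true_score, cut, sem=SEM):
    """P(observed score < cut) if observed = true score + Normal(0, sem) error."""
    return NormalDist(mu=true_score, sigma=sem).cdf(cut)


def prob_order_reversed(true_low, true_high, sem=SEM):
    """P(the student with the lower true score gets the higher observed score),
    assuming independent Normal(0, sem) errors for the two students."""
    # observed_high - observed_low ~ Normal(true_high - true_low, sem * sqrt(2))
    diff = NormalDist(mu=true_high - true_low, sigma=sem * 2 ** 0.5)
    return diff.cdf(0.0)


print(round(prob_below_cut(21, 21), 2))        # true score sitting exactly at the cut
print(round(prob_order_reversed(20, 21), 2))   # 20 vs 21 correct
print(round(prob_order_reversed(20, 25), 2))   # 20 vs 25 correct
```

Under these assumptions, a child whose true score sits at the cut point falls below it half the time, a 20-point and a 21-point child swap places roughly 40% of the time, and even a 20 vs 25 pair swaps roughly 12% of the time. A larger or smaller SEM changes the numbers, but not the point that a fixed cut converts this noise into categorical consequences.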

Complex decisions like school closures and reorganization require multiple perspectives and varied forms of information/data. A school should not be closed simply on the basis of one or a handful of bad performance indicators – typical school report card elements. The role of schools in communities should not be completely ignored. Nor should the extent to which the same children’s lives/educations are repeatedly disrupted. More mundane considerations including transportation efficiency & facilities quality/efficiency/fit are likely more relevant than student outcomes when considering school closings. In fact, rarely if ever are “low tested outcomes” a legitimate reason for school closure. Rather, they are usually an indicator of other underlying processes – some non-school and perhaps some school processes – requiring far more thoughtful intervention than the current slash-and-burn approach to “failing schools.”

Similarly, data may inform but should never dictate human resource decisions. Such is the core problem with recently adopted statewide teacher and principal evaluation models that prescribe percentages of evaluations that must be dictated by X, Y or Z, and that require specific personnel actions be taken when the numbers fall into preset categories.

As my colleagues and I explain in a recent article,

Arguably, a more reasonable and efficient use of these quantifiable metrics in human resource management might be to use them as a knowingly noisy pre-screening tool to identify where problems might exist across hundreds of classrooms in a large district. Value-added estimates might serve as a first step toward planning which classrooms to observe more frequently. Under such a model, when observations are completed, one might decide that the initial signal provided by the value-added estimate was simply wrong. One might also find that it produced useful insights regarding a teacher’s (or group of teachers’) effectiveness at helping students develop certain tested skills.

School leaders or leadership teams should clearly have the authority to make the case that a teacher is ineffective and that the teacher, even if tenured, should be dismissed on that basis. It may also be the case that the evidence would actually include data on student outcomes (growth, etc.). The key, in our view, is that the leaders making the decision, as indicated by their presentation of the evidence, would show that they have reasonably used information to make an informed management decision. Their reasonable interpretation of relevant information would constitute due process, as would their attempts to guide the teacher’s improvement on measures over which the teacher actually had control.
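The pre-screening use described in the passage above can be sketched as a scheduling rule rather than a rating rule. This is a hypothetical illustration, not a model from the cited article: the `Classroom` record, the one-standard-error threshold (`z=1.0`) and the visit counts are all assumptions. The only output is how often observers visit; the observers then decide whether the statistical signal meant anything.

```python
from dataclasses import dataclass


@dataclass
class Classroom:
    teacher_id: str
    va_estimate: float  # hypothetical value-added estimate (test-score SD units)
    std_error: float    # standard error of that estimate


def observation_plan(classrooms, base_visits=2, extra_visits=2, z=1.0):
    """Use noisy value-added estimates only to decide where to look more often.

    A classroom gets extra observation visits when its estimate lies more than
    `z` standard errors below zero. The result is a visit count, never a rating
    or a personnel action; observers may conclude the signal was simply wrong.
    """
    plan = {}
    for c in classrooms:
        flagged = c.va_estimate + z * c.std_error < 0
        plan[c.teacher_id] = base_visits + (extra_visits if flagged else 0)
    return plan


rooms = [
    Classroom("A", va_estimate=-0.20, std_error=0.08),  # low and fairly precise
    Classroom("B", va_estimate=-0.20, std_error=0.25),  # equally low but very noisy
    Classroom("C", va_estimate=0.10, std_error=0.05),
]
print(observation_plan(rooms))
```

Classrooms A and B have the same point estimate, but only A is scheduled for extra visits, because B's estimate is too imprecise to distinguish from zero. Either way, nothing in the sketch attaches a consequence to the number itself.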

Put simply, mindless reliance on prescribed metrics is not effective human resource management whether in the private sector or in public schools.

Additional Readings:

Friday Thoughts on Data, Assessment & Informed Decision Making in Schools. https://schoolfinance101.wordpress.com/2012/12/07/friday-thoughts-on-data-assessment-informed-decision-making-in-schools/
Closing schools: Good Reasons and Bad Reasons https://schoolfinance101.wordpress.com/2012/02/08/closing-schools-good-reasons-and-bad-reasons/
“Corporate Reform” or Failed, Desperate Corporate Management? https://schoolfinance101.wordpress.com/2013/07/26/corporate-reform-or-failed-desperate-corporate-management/
Baker, B.D., Green, P.C., Oluwole, J. (2013) The legal consequences of mandating high stakes decisions based on low quality information: Teacher Evaluation in the Race-to-the-Top Era. Education Policy Analysis Archives http://epaa.asu.edu/ojs/article/view/1298/1043
Follow up on Ed Waivers, Junk Rating Systems & Misplaced Blame – New York City’s “Failing” Schools https://schoolfinance101.wordpress.com/2012/09/07/follow-up-on-ed-waivers-junk-rating-systems-and-misplaced-blame-new-york-citys-failing-schools/

Balance choice with support for equity & access

The theme of individual (parent/child) liberty via parental choice that has dominated the past decade of education policy in New York City (and elsewhere) must be counterbalanced with a greater emphasis on equity and equality of opportunity and access. This requires considering carefully the geographic and socioeconomic distribution of educational opportunities at all grade levels across all children.

Choice in and of itself does not ensure equity. This false premise, promoted by many “education reformers,” runs counter to centuries of political theory, which explains that liberty (a core tenet of choice) and equality are in constant tension with one another (only at some extreme point might they “meet and be confounded together”; Tocqueville, Alexis de. Democracy in America, volume 2, part II, chapter 1).

Implicit in policy preferences for choice program expansion is the notion that more children should have the choice to attend “higher quality” schooling options, and that such options will emerge as a function of the competitive marketplace for quality schooling, with little attention to the level of resources provided or other prerequisite conditions for sustaining an equitable distribution of quality schooling.

The notion that one would provide, via public subsidy, “higher quality” alternatives also means consciously providing lower quality ones. That is, it means consciously endorsing a policy of such inequity that the parents of children presently attending “low(er) quality” schools will endure the transaction costs (family/child disruption, geographic inconvenience) to move their child from their neighborhood schools. This is simply wrongheaded.

Even more wrongheaded are policies yielding outright deprivation by labeling neighborhood schools as failing and shuttering them on false pretenses (low test scores as a method of placing blame), leaving parents to scramble to find an acceptable alternative (one that is merely better than nothing). Such policies create a false sense of demand for those alternatives (typically charters), further advancing current policy preferences. That is, the argument that charter waiting lists provide validation for further charter expansion, even when those waiting lists have been induced by school closures.

Recent policy preferences are built on the assumption that liberty achieved by choice programs serves as a substitute for the provision of broad based, equitable and adequate financing. Studies purporting to show significant advantages for students attending charter schools have invariably neglected to evaluate those schools’ access to financial resources (while selectively evaluating outcomes of children attending those schools with access to resources), frequently downplaying the importance of money or the relevance of equity as traditionally conceived.

Advocates suggest that if some children are made better off by the presence of higher quality options, all are better off and certainly no one is worse off. This too is false. In forthcoming work, I explain that:

Baker and Green (2008) as well as Koski and Reich (2006) explain that to a large extent education operates as a positional good, whereby the advantages obtained by some necessarily translate to disadvantages for others. For example, Baker and Green (2008) explain that “In a system where children are guaranteed only minimally adequate K–12 education, but where many receive far superior opportunities, those with only minimally adequate education will have limited opportunities in higher education or the workplace.” (p. 210) This concern is particularly pronounced in a city like New York where children and families are constantly jockeying for position to gain access to selective admissions public middle and secondary schools, and where the majority of charter schools serve elementary and middle grades. The competitive position of children in otherwise similar district or charter schools with fewer resources is compromised by the presence of better resourced district or charter schools. Though surely, all would be less well off if all were substantially though equally deprived.

Variation in resources across private providers, as well as across charter schools, tends to be even greater than variation across traditional public schools (Baker, 2009; Baker, Libby & Wiley, 2012). Further, higher and lower quality private and charter schools are not equitably distributed geographically or broadly available to all. In the most extreme case, in New Orleans following Hurricane Katrina, where traditional district schools were largely wiped out and where choice based solutions were imposed during the recovery, entire sections of the city were left without secondary level options and provided a sparse few elementary and middle level options (Buras, 2011).

Baker, Libby and Wiley show that in New York City, charter expansion has yielded vastly inequitable choices. Most New York City charter school networks serve far fewer children qualifying for free lunch (<130% poverty level), far fewer English language learners and far fewer children with disabilities than same grade level schools in the same borough of the city. These patterns of student sorting induce inequities across schools. But these schools also have widely varied access to financial resources despite being equitably funded by the city. Some charter networks are able to outspend demographically similar district schools by over $5,000 per pupil, and to provide class sizes that are 4 to 6 (or more) students smaller.

Put simply, one cannot assume that providing a “system of great schools” will necessarily yield an equitable system of high quality, operationally efficient schools. It hasn’t and it won’t, in New Orleans, in New York, in Sweden or anywhere. City leaders must actively manage the provision of an equitable, high quality and operationally efficient school system rather than simply assuming that a system of great schools will necessarily accomplish that goal.

Moving forward in the short term:

The city should develop more transparent, comparable reporting of district and charter school site-based revenues and expenditures, inclusive of a) private contributions by source and b) in-kind expenditures from parent organizations, including salaries and benefits of centrally employed staff. More detailed reporting of soft money and in-kind contributions may provide insights regarding policy efforts to improve resource equity between charter and district schools, and among charter schools.
All schools operating within the city should be brought under the same policy umbrella to ensure more equitable distributions of students and the resources to serve them. This means financing charter schools in accordance with the student populations they serve, via weighted student funding. This also means considering policy alternatives for balancing resource access across schools, given their widely varied access to private resources.
City leaders should push state leaders for the billions in resources still owed to city school children in the years since the ruling in Campaign for Fiscal Equity.
Finally, given the increased organizational complexities of privately governed and managed charter schools, the city should take steps to ensure that children’s and employees’ rights remain equally protected (when compared with their peers in “government operated” district schools). The choice between private management and public provision of schooling is not benign and should not be taken lightly. Increasingly, federal and state case law is revealing that children’s and employees’ rights are substantively lessened in schools managed and operated by private entities. In addition, courts have interpreted taxpayers’ rights of access to the records and finances of private providers as more limited than their rights of access to similar information from government entities.
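The weighted student funding recommendation can be sketched as a simple allocation formula: a base amount per pupil, multiplied by a weighted pupil count that adds extra weight for each student in a need category. This is a hypothetical sketch, not New York City's actual formula: the base amount, the weights, the additive treatment of overlapping categories, and the two 500-student schools are all made up for illustration.

```python
def weighted_student_funding(base_per_pupil, enrollment, need_counts, weights):
    """Hypothetical weighted student funding allocation for one school.

    allocation = base_per_pupil * (enrollment + sum_k weights[k] * need_counts[k])

    Weights are additive: a student counted in several need categories
    draws each applicable weight (an assumption; real formulas differ).
    """
    weighted_pupils = enrollment + sum(
        weight * need_counts.get(category, 0) for category, weight in weights.items()
    )
    return base_per_pupil * weighted_pupils


# Hypothetical weights and base amount, for illustration only
WEIGHTS = {"low_income": 0.4, "ell": 0.5, "disability": 0.9}
BASE = 5000

# Two hypothetical 500-student schools serving different populations
district_school = {"low_income": 400, "ell": 100, "disability": 90}
charter_school = {"low_income": 300, "ell": 20, "disability": 40}

for name, counts in [("district", district_school), ("charter", charter_school)]:
    total = weighted_student_funding(BASE, 500, counts, WEIGHTS)
    print(name, round(total), round(total / 500))
```

With these made-up numbers the district school receives about $7,910 per pupil and the charter school about $6,660, so funding follows the needs each school actually enrolls rather than a flat per-pupil amount.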

Additional Readings

Baker, B.D., Libby, K., & Wiley, K. (2012) Spending by the Major Charter Management Organizations: Comparing charter school and local public district financial resources in New York, Ohio, and Texas. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/files/rb-charterspending_0.pdf
Baker, B.D., Libby, K., Wiley, K. Charter School Expansion & Within District Equity: Confluence or Conflict? Education Finance and Policy
Baker, B.D. (2012). Review of “New York State Special Education Enrollment Analysis.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-ny-special-ed.
Buras, K. L. (2011). Race, charter schools, and conscious capitalism: On the spatial politics of whiteness as property (and the unconscionable assault on black New Orleans). Harvard Educational Review, 81(2), 296-331.

The Post-Equity Era in School Finance

I’ve written a few posts in recent months where I’ve raised concern about the apparent complete disregard for (and outright ignorance of) the role of equitable and adequate financing of our public schools. The bottom line is that providing for a high quality, equitably distributed system of public schooling in the United States requires equitable, adequate, stable and sustainable public financing. There’s no way around that. It’s a necessary underlying condition.

I too often hear pundits spew the vacuous mantra: it doesn’t matter how much money you have, it matters more how you spend it. But if you don’t have it you can’t spend it. And, if everyone around you has far more than you, their spending behavior may just price you out of the market for the goods and services you need to provide (quality teachers being critically important, and locally competitive wages being necessary to recruit and retain quality teachers). How much money you have matters. How much money you have relative to others matters in the fluid, dynamic and very much relative world of school finance (and economics more broadly). Equitable and adequate funding matters.

But alas, it seems that one of the first things to go when the economy tanked a few years back was any sense that equity could ever be important. Take, for example, NY Governor Cuomo’s recent response to a challenge to racial disparities in funding shortfalls in his state.

Asked in a radio interview this morning about Schenectady Schools Superintendent Larry Spring joining those filing a federal civil rights complaint against the state alleging its school funding mechanism shortchanges minority students and those with disabilities, Gov. Cuomo didn’t so much answer the question as elaborately re-phrase it.

That’s called democracy, and that’s what the Legislature debates every year and what is the fair amount of funding for each district. And should a rich district get no money because they’re a richer district, or should they get more money because they put in more money? Should the needier districts get all the money because they’re needier even though they put in less? And that is the annual debate of the state budget and the education funding formula.

Over the past several weeks, as I embark on a new project evaluating the past 20 years of funding equity and funding level shifts across states, I’ve begun playing around with alternative ways to characterize changes to funding over time, and evaluate causes of those changes. I’ve explained in previous posts how the amount of total state aid, for example, is only part of the puzzle. The extent to which state aid is targeted according to local fiscal capacity and need is most important for determining whether increased state aid will improve overall equity. For example, here’s New Jersey state aid per pupil and total state and local revenue per pupil in 1997 and again in 2007.

In 1997, districts with higher poverty rates were already receiving higher levels of state aid than their less needy counterparts. But the differences in aid were not sufficient to create an overall upward tilt (a progressive pattern).

Figure 1 – NJ in 1997

By 2007, the infusion of state aid into high need districts had pushed those districts to a point where they were better positioned to provide smaller class sizes and to pay more competitive wages. The state aid had been sufficiently targeted to achieve an overall progressive distribution of state and local revenue.

Figure 2 – NJ in 2007

What I’ve been working on lately is tracking the relative progressiveness/regressiveness of state and local revenues and of state aid, over time from 1993 to 2011 for all states, using the same “fairness ratio” we use in our annual report on school funding fairness.

Imagine, for example, having a state and local revenue fairness indicator for every year, for each state, from 1993 to 2011.

Where the index is 1.0, a district with 30% children in poverty (census poverty) is expected to have about the same state and local revenue per pupil as a district with 0% poverty.
Where the index is 1.2, a district with 30% children in poverty is expected to have about 120% of the revenue per pupil (20% more) of a district with 0% poverty.
And where the index is 0.8, a district with 30% poverty is expected to have about 80% of the revenue per pupil of a district with 0% poverty.
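A ratio like this can be computed from a fitted relationship between revenue and poverty. The sketch below is a deliberately simplified, hypothetical version: it regresses the log of revenue per pupil on the poverty rate alone and returns predicted revenue at 30% poverty divided by predicted revenue at 0% poverty. The model used in the annual fairness report includes additional controls (for example, for regional wage variation and district size) that this sketch omits, and the district data below are made up.

```python
import math


def fairness_ratio(poverty_rates, revenue_per_pupil, high=0.30, low=0.0):
    """Simplified, hypothetical fairness ratio.

    Fits ln(revenue per pupil) = a + b * poverty_rate by ordinary least squares,
    then returns predicted revenue at `high` poverty divided by predicted revenue
    at `low` poverty, which for this log-linear model equals exp(b * (high - low)).
    Poverty rates are fractions (0.30 means 30%).
    """
    n = len(poverty_rates)
    logs = [math.log(r) for r in revenue_per_pupil]
    mean_x = sum(poverty_rates) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(poverty_rates, logs))
    sxx = sum((x - mean_x) ** 2 for x in poverty_rates)
    slope = sxy / sxx
    return math.exp(slope * (high - low))


def funding_zone(ratio):
    """Label a ratio relative to 1.0, the point of flat funding."""
    if ratio > 1.0:
        return "progressive"
    if ratio < 1.0:
        return "regressive"
    return "flat"


poverty = [0.0, 0.1, 0.2, 0.3, 0.4]
# Hypothetical districts whose revenue rises 20% between 0% and 30% poverty
revenue = [10000 * 1.2 ** (p / 0.3) for p in poverty]
r = fairness_ratio(poverty, revenue)
print(round(r, 3), funding_zone(r))  # prints 1.2 progressive
```

Real district data scatter around the fitted line rather than sitting on it, which is one reason the published model adds controls and why year-to-year movements in the index deserve some caution.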

As in this hypothetical, one can track the changes in targeting of state aid along with the changes in overall state and local revenue fairness. Note that even if state aid fairness stays constant from one year to the next, changes in local revenue raising patterns can alter equity. Further, state aid might be allocated relatively “fairly” but at too low a level to improve overall state and local revenue equity.
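The last two points can be checked with simple arithmetic on two hypothetical districts, one with 0% poverty and one with 30% poverty. All dollar amounts below are made up, and the ratio is a plain two-district comparison rather than the regression-based index.

```python
def ratio_high_to_low(amounts):
    """High-poverty district amount divided by low-poverty district amount."""
    return amounts["high_poverty"] / amounts["low_poverty"]


def combine(local, state):
    """Per-pupil state + local revenue for each district."""
    return {k: local[k] + state[k] for k in local}


def scale(amounts, factor):
    """Multiply every district's amount by the same factor (targeting unchanged)."""
    return {k: v * factor for k, v in amounts.items()}


# Hypothetical per-pupil amounts for a 0%-poverty and a 30%-poverty district
local = {"low_poverty": 12000, "high_poverty": 6000}
state = {"low_poverty": 1000, "high_poverty": 3000}

print(ratio_high_to_low(state))                            # aid alone: strongly targeted
print(ratio_high_to_low(combine(local, state)))            # state + local: still regressive
print(ratio_high_to_low(combine(local, scale(state, 4))))  # same targeting, four times the aid

# Same state aid, but the low-poverty district raises more locally
local_shift = {"low_poverty": 14000, "high_poverty": 6000}
print(ratio_high_to_low(combine(local_shift, state)))
```

State aid here gives the high-poverty district three times as much per pupil, yet combined revenue comes out at only about 0.69 of the low-poverty district's because the aid is too small to offset the local revenue gap. Quadrupling the same well-targeted aid pushes the combined ratio to 1.125, while holding aid constant and letting the wealthier district raise more locally drops it to 0.6.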

In my forthcoming academic papers on this topic, models include multiple moving parts.

In this hypothetical, in 1993, state aid is poorly targeted according to needs, but that targeting improves over time (moving to the right) and, as a result, state and local revenue fairness improves (moving up vertically). By 2007, the system reaches its peak of aid targeting and overall progressiveness. But then the system falls back as state aid targeting declines, perhaps as a function of disproportionate aid cuts to the neediest districts while holding harmless less needy districts.

Figure 3 – Hypothetical over Time

Well, that’s what it looks like in hypothetical land. Now how does this look for actual states? Let’s start with a few that have spent much of the period in the progressive funding zone, above the red horizontal line.

Each of these states reaches its peak around 2008 or 2009, then declines. New Jersey’s decline in aid targeting and overall progressiveness between 2009 and 2011 is particularly striking (at this point I’m waiting for the next year of data to see what’s going on here. Yes, it’s declined, but this seems more than expected).

Figure 4 – New Jersey

Massachusetts is a messier picture. School finance reforms in the early 1990s substantially shifted responsibility to the state (counterbalancing local revenue losses and emergent inequities from the 1980s in response to constitutional tax limits). Since that time, overall progressiveness has floated between 1.2 and 1.4 and state aid fairness between 2.5 and 4.0. Overall progressiveness appears to peak twice: in 2001 and again in 2008.

Notably, Massachusetts was among those states that took a pretty hard economic hit in the 2001-2003 economic slowdown (where states that had the largest share of income derived from investments/non-wage income seemed to suffer most). By 2011, however, Massachusetts is about as low on overall progressiveness as it has been since implementing school finance reforms in the 1990s.

Figure 5 – Massachusetts

Ohio’s story is similar. Ohio was under judicial pressure throughout the 1990s (DeRolph cases) and is often characterized as being relatively non-responsive to that pressure. But Ohio did improve its overall progressiveness and its state aid targeting throughout that period, reaching a peak around 2007/08. But then, like others, declining state aid targeting set the state back quite significantly (though 2010 and 2011 stay about the same).

Figure 6 – Ohio

Connecticut really never implemented any systematic statewide school finance reform (maintaining variations on the Education Cost Sharing Formula since the mid 1990s). But Connecticut did over time allocate substantial lumps of aid to Hartford and New Haven for their magnet school programs, creating an appearance of a progressive statewide system. That system reaches its peak of progressiveness (leaving out many other high need districts) at a few points in the 2000s (2000, 2001, 2008, 2002) and its peak of targeting in 2007, then, like the others, declines quite substantially, ending at flat funding.

Figure 7 – Connecticut

Figure 8 – Kansas

Kansas is a fun case that begins with adoption of a new weighted funding formula coupled with spending/revenue limits at the outset of the data. Changes to local property taxation and shifting state aid lead to a rather jumbled mess of persistent regressiveness through the early to mid 1990s. In the later 1990s, legislators imposed cuts to local revenue requirements, and then in 2001, state aid freezes and cuts led to a system that became more and more regressive, despite marginally better aid targeting, reaching its low in 2004. At this point, judicial pressure (2003) kicks in, followed by a high court ruling (2006) accepting reform legislation, which temporarily drives Kansas school funding into the progressive zone. But that doesn’t last long, and by 2011, Kansas finds itself back in regressive territory, similar to year 2000 levels.

And now for the states that have never even come close to cracking the progressive threshold even with reforms (Pennsylvania) and judicial pressure (New York).

Figure 9 – New York

New York did make progress over time, and implementing a new funding formula after the Campaign for Fiscal Equity ruling did make some difference on state aid targeting. But that targeting has declined with steep aid cuts to needy districts.

Figure 10 – Pennsylvania

Pennsylvania had a fleeting moment in the late 2000s where targeting of aid, and overall equity, improved, but it has now reverted to regressiveness levels comparable to 1993.

Finally, Illinois never has, and perhaps never will really give a damn.

Figure 11 – Illinois

Aid became marginally more targeted around 2006-2009, but by 2011, Illinois aid is about as targeted as it was in the early 1990s and the system remains even more regressive than it was during that time period.

And yet we wonder why our lower income children’s educational outcomes continue to suffer? We pretend that if only our higher poverty districts would fire the bottom 5% of teachers who produce bad test scores (gains), they’d do better (because of course, they can hire a new crop of better teachers even if they can’t pay a competitive wage?). We pretend that by expanding charter schooling, siphoning off the less needy among the needy into privately subsidized (soft money) schools (with diminished legal protections), we’ll somehow achieve a desirable systemwide effect?

We continue to place risky bets on not only revenue neutral, but revenue negative “solutions.” But hey, those are other people’s children anyway, right?

Meanwhile, the damage that’s been done to our public education systems by outright and at times belligerent neglect of state school finance systems has, in the past 3 years alone, set us back in many cases 20 years.

Now is the time to turn that corner and attempt to repair that damage as quickly as it was inflicted.

Ignorati Honor Roll 2013: Pundit Version

As 2013 comes to an end, it’s time to review some of the more ridiculous claims and arguments made by pundits and politicians over the course of the past year.

A definition of “Ignorati” is important here:

Elites who, despite their power, wealth, or influence, are prone to making serious errors when discussing science and other technical matters. They resort to magical thinking and scapegoating with alarming ease and can usually be found furiously adding fuel to moral panics and information cascades. [ http://www.urbandictionary.com/define.php?term=Ignorati ]

I’m sure I’ve missed many good ones (please do send them), and I’ve definitely given more weight in my selection to stuff I’ve come across recently than to stuff that appeared at the beginning of the year. I’ve tried to select statements and representations of data that are so foolish that, in my view, they severely undermine the credibility of their source. At least a few of these statements, made by pundits (this post) and politicians (next post) and echoed by the media, are so patently false and/or foolish that it’s rather surprising anyone could swallow them whole.

So, without further ado…

Petrilli on PISA and Poverty

Let’s start with two claims made by Mike Petrilli in a recent post at Ed Excellence, in which he opined that bad teachers (or at least bad teaching), not poverty, must be causing low PISA math scores for U.S. 15-year-olds. Mike was perplexed a) that poverty might affect math outcomes as much as (if not more than) reading, so something else (bad teachers/teaching) must really be affecting math, and b) that poverty was affecting our 15-year-olds’ outcomes, when we all know poverty affects younger kids more!? (really?). Mike’s goal was to show that one must accept unreasonably complicated assumptions to believe that poverty might comparably influence math, or that poverty affects the outcomes of older children as well as younger ones. Here it is in Mike’s own words (setting up the supposedly “bad” assumptions used by others).

First, one must assume that math is somehow more related to students’ family backgrounds than are reading and science, since we do worse in the former. That’s quite a stretch, especially because of much other evidence showing that reading is more strongly linked to socioeconomic class. It’s well known that affluent toddlers hear millions more words from their parents than do their low-income peers. Initial reading gaps in Kindergarten are enormous. And in the absence of a coherent, content-rich curriculum, schools have struggled to boost reading scores for kids coming from low-income families.

…the second assumption must be that “poverty” has a bigger impact on math performance for fifteen-year-olds than for younger students. But I can’t imagine why. If anything, it should have less of an impact, because our school system has had more time to erase the initial disadvantages that students bring with them into Kindergarten.

Here’s a link to my post with the complete rebuttal and explanation!

Petrilli’s conclusion in the face of these inexplicable assumptions?

Maybe we’re just not very good at teaching math, especially in high school.

Here’s a shortened version of my earlier critique. First, there’s no evidence that poverty affects measured reading outcomes more than measured math outcomes, especially for highly aggregated student populations. The key word here is “measured.” Math achievement often simply seems to be measured more precisely, accurately or consistently, producing more predictable variation and thus clearer, more predictable gaps.
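One way to see why the word “measured” matters: under classical test theory, the correlation we can observe between two measures equals the true correlation shrunk by the square root of the product of their reliabilities (Spearman’s attenuation formula). The sketch below is only illustrative; the reliabilities are hypothetical inputs, not estimates for NAEP or PISA math and reading.

```python
import math


def observed_correlation(true_r, reliability_outcome, reliability_poverty=1.0):
    """Spearman attenuation: r_observed = r_true * sqrt(rel_x * rel_y).

    A less reliable outcome measure shows a weaker observed
    poverty-outcome correlation even when the true relationship is identical.
    """
    for rel in (reliability_outcome, reliability_poverty):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * math.sqrt(reliability_outcome * reliability_poverty)


# Hypothetical: the same true correlation with poverty for both subjects,
# but one subject is assumed to be measured less reliably than the other.
TRUE_R = -0.6
for subject, rel in [("math (assumed 0.90)", 0.90), ("reading (assumed 0.75)", 0.75)]:
    print(subject, round(observed_correlation(TRUE_R, rel), 3))
```

Under these assumed numbers, the more reliably measured subject displays the stronger observed poverty relationship, with no difference at all in how much poverty actually matters.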

Yes, we have evidence of disparate outcomes by poverty for reading. But we have ample evidence of disparate outcomes by poverty for math. Even though we’ve been subjected lately to new reports (of old news) that higher income kids get exposed to more words earlier, that doesn’t mean that higher income kids don’t also get exposed to mathematical thinking/basic numeracy early on.

Here are the state aggregate math and reading outcomes by poverty for NAEP, and not-so-surprisingly (for anyone with an ounce of background in this stuff), the math scores are marginally more disparate than the reading scores.

Figure 1

The second assertion is actually even sillier – that poverty affects only early learning, and thus only bad teaching can explain what happens after that (say, between the 4th and 8th grade tests). Put simply, the effects of poverty are cumulative over time, most often leading to larger gaps in later grades than in earlier grades (especially, I should note, if we do not put sufficient support into resolving those gaps).
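A toy illustration of the cumulative argument, assuming (purely for illustration) that each year of growing up in poverty adds a fixed number of points of disadvantage and that schools may offset some of it. The function name and every number are hypothetical; this is not an estimate of real poverty effects.

```python
def cumulative_gap(initial_gap, annual_shortfall, years, school_offset=0):
    """Toy model: return the gap at each year from 0 through `years`.

    Each year adds `annual_shortfall` points of out-of-school disadvantage,
    and schools offset `school_offset` points of it.
    """
    gap = initial_gap
    history = [gap]
    for _ in range(years):
        gap += annual_shortfall - school_offset
        history.append(gap)
    return history


# Hypothetical: a 10-point gap at grade 4, followed through grade 8.
print(cumulative_gap(10, 2, 4))                   # no offset: the gap widens
print(cumulative_gap(10, 2, 4, school_offset=2))  # full offset: the gap holds
```

Even in this crude model, a larger 8th grade gap than 4th grade gap says nothing by itself about bad teaching; it is what accumulation looks like when the added resources needed to offset it are not there.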

Here’s the empirical snapshot.

Figure 2

Now, this type of thinking isn’t novel for Mike. He’s made up lots of stuff before that simply doesn’t pass the most basic smell test, presenting it as some form of clever insightful revelation that makes perfect sense if you have little or no background on the issue (to his credit, it’s always done with a grin/smirk and ability to dance).

Among the most egregious examples was his policy brief a few years back with Marguerite Roza on Stretching the School Dollar, which included many examples of policies and spending practices he’d like to see changed in schools, many of which actually had little or nothing to do with stretching dollars at all. For more on this topic, see this post, this policy report, and this peer reviewed article. (no, I can’t believe I wasted so much time rebutting utterly foolish schlock!)

And with the utmost class coupled with their usual depth of substance, TB Fordham responds by tweeting:

Baker’s a quack http://t.co/DbG09vw6Uk @SchlFinance101

— Fordham Institute (@educationgadfly) December 28, 2013

Good stuff. Deep.

Smarick on Propping up Philadelphia

For the most compelling evidence that U.S. schools are dreadfully failing in mathematics preparation, one might point to the quantitative wizardry of Andrew Smarick. Like Petrilli, Smarick can’t be confused with really basic facts and numbers.

Over the past year, Smarick has gone on at least a few Twitter rants about how the City of Philadelphia has been propped up with so much additional funding over the years and has still proven itself to be a complete failure. His Twitter rants about wasted state aid and Philly’s egregious, inefficient, under-productive overspending support his agenda to simply eliminate public urban school districts and replace them with collections of charter schools (despite evidence that PA charters haven’t done a very good job).

@kombiz Philly’s district = terrible for decades, families left, as a result it’s bankrupt. Gotten huge state funding for yrs to prop it up.

— Andy Smarick (@smarick) August 11, 2013

What’s so ridiculous about Smarick’s claims here is that Philadelphia is, and has been for some time, among the least well-funded major urban school districts in the nation. One can find evidence of this in many, many places and public data sources. Smarick’s angle is simply to assert that Philly receives more state aid than other PA districts. Yes… and Philly has a lot more students.

While Smarick’s entire argument for ending the urban district is suspect and full of holes (laced with historical, policy, legal and empirical ignorance) there are many other cities that might serve as better examples than Philly.

Here are two representations of Philly school funding in context. First, here’s Philly’s state and local revenue per pupil, relative to the average for its metropolitan area (PA districts only), where 1.0 is average, with districts arranged by poverty. Put simply, Philly has much greater need – higher poverty – than surrounding districts, and lower than average funding. Philly is that big one… actually, those four big shapes, below the average line and with high poverty – getting higher from year to year. Way up in the upper left is the adjacent leafy suburb – Lower Merion!
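Why “more state aid” in total dollars says little on its own: the comparison has to be per pupil, and then relative to the surrounding districts. A minimal sketch of a relative revenue index, assuming the metro average is pupil-weighted (total metro revenue divided by total metro pupils); the post does not say how its average was computed, and the districts and dollar figures below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class District:
    name: str
    state_local_revenue: float  # total state + local revenue, dollars
    enrollment: int


def per_pupil(d: District) -> float:
    return d.state_local_revenue / d.enrollment


def relative_revenue(districts):
    """Each district's per-pupil revenue divided by the pupil-weighted
    metro average (total revenue / total pupils); 1.0 = metro average."""
    total_rev = sum(d.state_local_revenue for d in districts)
    total_pupils = sum(d.enrollment for d in districts)
    metro_avg = total_rev / total_pupils
    return {d.name: per_pupil(d) / metro_avg for d in districts}


# Hypothetical metro: the big city collects by far the most total dollars...
metro = [
    District("Big City", 2_400_000_000, 200_000),
    District("Suburb A", 300_000_000, 15_000),
    District("Suburb B", 250_000_000, 14_000),
]
# ...yet sits below 1.0 once revenue is put on a per-pupil basis.
for name, index in relative_revenue(metro).items():
    print(f"{name}: {index:.2f}")
```

With a pupil-weighted average, a very large district pulls the metro average toward itself, which if anything makes its shortfall look smaller; an unweighted mean of districts would be a reasonable alternative choice.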

Figure 3

Let’s make this even simpler by comparing Philly, Allentown and Reading – three of the most fiscally screwed districts in the nation – to other more affluent Pennsylvania districts, and by breaking out their state and local revenues. Here, we can look specifically for that “propping up” with state aid effect. And guess what? It’s not there.

C’mon dude! This is ridiculous. Download some freakin’ data, either from PA dept of ed, or use the Census Fiscal Survey. It’s really not that hard. Make a graph. Philly has not been “propped up!” Okay… yeah… propped up more than if it was left entirely to raise school funds on local property taxes and propped up slightly more than Reading or Allentown. But to suggest that the state has, time and time again, bailed out Philly, given it more than would be necessary to achieve desired outcomes, is utterly ridiculous, reckless, irresponsible or downright incompetent.

Figure 4

Jeanne Allen’s Bizarre Interpretation of the U.S. & Louisiana Constitutions

This one is a bit different, but I would be remiss if I didn’t revisit it. No, it’s not one of those things that can be simply rebutted with a graph or two. Rather, this is one to be rebutted with a basic understanding of civics. If I go back further in my posts over time, I can find at least a handful of blustery reformy posts which are illustrative of our failures of civics education in the U.S.

There’s this one from the ever insightful Bob Bowdon of Choice Media, in which Bowdon decries supposed union supported legal attacks on Georgia’s charter authorizing authority (totally neglecting the actual phrasing of the Georgia constitution and the role of the courts in interpreting that constitution).

There’s this one, in which a Kansas attorney wishes to argue that there exists an individual liberty interest to impose unlimited local property taxation (i.e., that state-imposed tax and expenditure limits violate that individual liberty).

But earlier this year, when the Louisiana courts struck down that state’s voucher program redirecting tax dollars to private schools, Jeanne Allen of the Center for Education Reform penned a response of unprecedented civic ignorance, her core argument being that in her view, the U.S. Constitution (as interpreted in the Cleveland voucher case of Zelman v. Simmons-Harris) protects an individual liberty to taxpayer funded private schooling.

In her own words:

“If indeed the Louisiana constitution, as suggested by the majority court opinion, prohibits parents from directing the course of the funds allocated to educate their child, then the Louisiana constitution needs to be reviewed by the nation’s highest court,” said Center for Education Reform President Jeanne Allen.

Allen added: “I urge Governor Jindal to file an appeal to the US Supreme Court, and ask for the justices’ immediate review of the decision. The Louisiana justices actions today violate the civil rights of parents and children who above all are entitled to an education that our Founders repeated time and time again is the key to a free, productive democracy.”

This is a bizarre interpretation indeed, of a ruling (Zelman) that permits the public financing of private schooling, inclusive of religious alternatives (that is, the specific model used in Cleveland was found NOT to violate the establishment clause in its use of public dollars for vouchers to private religious schools). That is not to say, by any stretch of the imagination, that this case by extension establishes a right for children everywhere to access public dollars for their private education. You see, “permit” and “require” are two very different things.

For more explanation, please see this post.

Lerum/Students First (and many others) on DC and Tennessee NAEP Miracles

I conclude with perhaps my favorite of reformy echo-chamber claims of 2013, one which we were all graced with not just once, but twice in recent months with the release of 2013 State NAEP scores and the later release of 2013 large urban district NAEP scores.

Most of the mis-NAEP-ery centered on claims of great gains in achievement (between one cohort of kids two years ago, and another cohort this year) in reformy favorites Tennessee and Washington DC. The central assertion of the reformy echo-chamber was that these great gains experienced in Tennessee and DC were proof positive that teacher evaluation reforms are working! Take, for example, Eric Lerum’s blog post on the Students First web site, which starts with:

The 2013 National Assessment of Educational Progress (NAEP) results provide some of the strongest evidence yet that investment in student-centered education reforms improves student achievement.

Further down in the post, Lerum explains just what that compelling evidence is and what it means! Lerum provides us 3 “truthys”:

First, that investment in teacher quality matters. Tennessee and D.C. have both implemented comprehensive teacher evaluation systems paired with targeted professional development, and (along with Florida) they were out ahead of all other states in doing so. This has established them as national leaders in policies related to teacher quality.

…

Second, we learned that rigorous academic standards make a difference. D.C. and Tennessee were early adopters of the Common Core State Standards and have been dedicated to good-faith implementation. They gave teachers and schools the resources and training necessary to put the standards in action, and students responded.

Third, it is clear that education reform isn’t about partisan politics. D.C. is one of the most liberal jurisdictions in the country; Tennessee is one of the most conservative. But when policymakers and education stakeholders withstand political pressure and make the changes needed to improve schools, kids win.

Well, that third one’s a bit of an aside, but let’s take a look at the first two. That recently adopted teacher evaluation policies and early adoption of common core standards have lifted DC and TN to new heights on NAEP!

Now, for these latest findings to actually validate that teacher evaluation and/or other favored policies are “working” to improve student outcomes, two empirically supportable conditions would have to exist.

First, that the gains in NAEP scores have actually occurred – changed their trajectory substantively – SINCE implementation of these reforms.
Second, that the gains achieved by states implementing these policies are substantively different from the gains of states not implementing similar policies, all else equal.

And neither claim is true, as I explain more thoroughly here! But here’s a quick graphic run down.

First, major gains in DC actually started long before recent evaluation reforms, whether we are talking about common core adoption or DC IMPACT. In fact, the growth trajectory really doesn’t change much in recent years. But hey, assertions of retroactive causation are actually more common than one might expect!

Figure 11

Note also that DC has experienced demographic change, including an actual decline in cohort poverty rates, and that these supposed score changes over time are really just score differences from one cohort to the next. This is not to downplay the gains, but rather to suggest that it’s rather foolish to assert that policies of the past few years caused them.

Second, comparing cohort achievement gains (adjusted for initial scores… since lower scoring states have higher average gains on NAEP) with STUDENTS FIRST’s own measures of “reformyness” we see first that DC and TN really aren’t standouts, that other reformy states actually did quite poorly (states on the right hand side of the graphs that fall below the red line), and many non-reformy states like Maryland, New Jersey, New Hampshire and Massachusetts do quite well (states toward the middle or left that appear well above the line).
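A minimal sketch of that adjustment, assuming a simple linear relationship between starting score and gain: regress gains on initial scores, keep the residuals (gain beyond what the starting point predicts), then correlate those residuals with a numeric version of a reform rating. The function names and the idea of converting letter grades to numbers are hypothetical illustrations, not Students First’s or NAEP’s methods.

```python
import numpy as np


def adjusted_gains(initial_scores, gains):
    """Residual gains after removing the linear relation between starting
    score and gain (lower-scoring jurisdictions tend to gain more).

    Positive = gained more than expected given the starting point.
    """
    x = np.asarray(initial_scores, dtype=float)
    y = np.asarray(gains, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)


def reform_correlation(adjusted, reform_grades):
    """Pearson correlation between adjusted gains and a numeric reform rating."""
    return float(np.corrcoef(adjusted, reform_grades)[0, 1])
```

A correlation near zero would mean that higher-graded states are not systematically gaining more once starting point is accounted for, whatever two or three favored jurisdictions happen to show.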

Needless to say, if we were simply to start with these graphs and ask ourselves who’s kickin’ butt on NAEP gains, and whether states with higher grades on Students First policy preferences are systematically more likely to be kickin’ butt, the answers might not be so obvious. But if we start with the assumption that DC and TN are kicking butt and have the preferred policies, and then just ignore all of the others, we can construct a pretty neat – but completely invalid – story line.

Figure 12

Closing thoughts…

I hear you say… hey… this is a bit personal isn’t it? Well, yes, I’ve called out individuals for their arguments and I too prefer sticking to the substance of those arguments. But let’s be clear here – I’m calling out the arguments – their substance – or blatant lack thereof. These ridiculous arguments happen to have emanated from these individuals.

Certainly, Lerum was far from the only one to have made the absurd DC and Tennessee claims. I could pick many others for that one… but Lerum expressed it all so eloquently wrong in his Students First post, and I had already done the comparisons with Students First’s own ratings of reformyness. So this was low hanging fruit. I used a ridiculous Nick Kristof quote in my original post on this topic.

Smarick’s belligerent and repeated wrongness is simply inexcusable. Smarick simply can’t be bothered with facts. He would tweet this silliness, I’d write about how wrong it was… and a week or two later, he’d be back to Philly bashing, all again with his argument that the state has already done all it can to fiscally prop up Philly.

As for Petrilli, these completely wacky, ill-conceived and under-informed arguments are his own, and to his credit, his response to my earlier post was gracious. But that doesn’t keep him off the Ignorati honor roll. It just allows him to wear that badge with honor.

And now, we sit and wait to see what the new year will bring.

Cheers.

On Short-Term Memory & Statistical Ineptitude: A few reminders regarding NAEP TUDA results

Nothin’ brings out good ol’ American statistical ineptitude like the release of NAEP or PISA data. Even more disturbing, the short window between the release of state level NAEP results and city level results for large urban districts lets the same mathematically and statistically inept pundits reveal their complete lack of short term memory: memory of the relevant caveats and critiques regarding the meaning of NAEP data, and NAEP gains in particular, that were addressed extensively only a few weeks back. A few weeks back, pundit after pundit offered wacky interpretations of how recently implemented policy changes affected previously occurring achievement gains on NAEP, and of how these policies were particularly effective in DC and Tennessee (as evidenced by 2 year gains on NAEP). They ignored that states implementing similar policies did not experience such gains, and that some states not implementing similar policies experienced even greater gains after adjusting for starting point.

Now that we have our NAEP TUDA results, and now that pundits can opine about how DC made greater gains than NYC because it allowed charter schools to grow faster, or teachers to be fired more readily by test scores… let’s take a look at where our big cities fit into the pictures I presented previously regarding NAEP gains and NAEP starting points.

The first huge caveat here is that any/all of these “gains” aren’t gains at all. They are cohort average score differences which reflect differences in the composition of the cohort as much as anything else. Two year gains are suspect for other reasons, perhaps relating to quirks in sampling, etc. Certainly anyone making a big deal about which districts did or did not show statistically significant differences in mean scale scores from 2011 to 2013, without considering longer term shifts is exhibiting the extremes of Mis-NAEP-ery!
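A small worked example of why a cohort difference is not a gain, assuming two hypothetical subgroups whose average scores are identical in both years while only the cohort’s mix shifts. All numbers and names are made up for illustration.

```python
def cohort_mean(group_means, group_shares):
    """Enrollment-share-weighted average score for a cohort."""
    if abs(sum(group_shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(group_means[g] * group_shares[g] for g in group_means)


# Hypothetical subgroup means, held fixed across both cohorts.
means = {"low_income": 250, "not_low_income": 280}
cohort_2011 = {"low_income": 0.7, "not_low_income": 0.3}
cohort_2013 = {"low_income": 0.6, "not_low_income": 0.4}

change = cohort_mean(means, cohort_2013) - cohort_mean(means, cohort_2011)
print(f"apparent 'gain': {change:.1f} points, with no subgroup improving at all")
```

Here the entire 3 point “gain” comes from a 10 percentage point shift in cohort poverty; no student group scored any higher.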

So, here are the figures… starting with NAEP 8th grade math gains over 10 years, against the initial average score in 2003.

The relationship between 10 year gains on 8th grade math and initial average score is relatively strong. DC and LA, which appear to be getting the early applause for their reformy amazingness, pretty much fall right in line with expectations. Boston is a standout here… and Cleveland? Well… that’s a bit perplexing, but Cleveland reveals perplexing data on many levels in ed policy (including some of the consistently highest school level low income concentrations in the nation).

The relationship for reading is not quite as strong:

LA is lookin’ pretty good here, but starting pretty darn low – lower than DC… which, by the way, really isn’t a standout here on 10 year gains. Cleveland? well… not a pretty sight… Other cities fall pretty much in line with expectations given their initial 2003 mean scores.

Here are the 4 year gains for math grade 8:

DC looks a little better here… but as before, cities fall among the states in roughly their expected locations, except for Cleveland and Detroit, which seem to lag. San Diego, a relative standout on 10 year gains, lags on 4 year gains, but that’s hardly a condemnation of a city that a) has made longer term gains and b) as of 2009 sits among the higher performing jurisdictions.

Finally, here’s the 4 year gain for reading grade 8:

This relationship is certainly less consistent. DC falls more or less in line. Cleveland and Milwaukee aren’t lookin’ so good. San Diego is back above the line, though it started, and remains, lower in the pack than it did on math.

Again, the big caveat here is that these aren’t “gains” but rather cohort differences. And one might suspect population change to occur more quickly in cities than in states, especially in those cases where cities have smaller overall student populations than states (setting aside those pesky low population states like VT, WY, etc.).

What to make of all this? Not much, really. Does NAEP TUDA provide a broad condemnation of urban education in the U.S.? Well, only to the extent that NAEP generally provides such condemnation, since cities and states tend to fall in line with one another (but for some notable standouts). Do these data present us with obvious pictures about current policy preferences or directions? Well, that would be hard to assert, given that these data don’t really present us with consistent pictures, except for the fact that starting point matters (and that demography matters, as my previous post illustrated).

This is by no means to suggest that policies and practices don’t matter, but rather that frequent, egregious misinterpretation of NAEP data provides no value-added to the policy conversation. (yeah… I said value-added!?)

SUPPLEMENTAL FIGURES

Here are a few additional figures from a few years back… it took a while to find them (they are from a project I did on poverty measurement), but they establish the rather obvious fact that these NAEP TUDA scale scores (level scores) are also associated with economic context – specifically, poverty concentration.

Given that many of these cities are high poverty settings, the relationship is actually tighter when I use the more stringent census poverty threshold (rather than free lunch, which is 130% of poverty level), even though these city level poverty data are not necessarily completely overlapping with school district enrollments. What these data do show is that Cleveland and Detroit are simply much higher poverty settings than the other cities in the sample (for 5 to 17 year old children). And that is certainly relevant to both score levels and potential changes in cohort level scores over time.

NAEP scores are from 2009

Racial Disparities in NY State Aid Shortfalls

Yesterday, Ed Law Prof Blog posted an update about the Office of Civil Rights complaint to be filed by Schenectady School District claiming that shortfalls in New York State aid fall disparately by student race.

I’ve reported on numerous occasions on this blog the patterns of disparity in New York State funding. I actually hadn’t recently checked the strength of the relationship between funding shortfalls and school district racial composition. As the Ed Law blog explains, litigation around this question (that of the racially disparate impact of school funding policy) was largely headed off by the Sandoval case, which held that no private right of action exists for challenging policies that violate disparate impact regulations promulgated under Title VI of the Civil Rights Act. “Disparate impact” occurs where a policy ends up having different effects on one group versus another, by race, ethnicity or national origin, but not necessarily because the policy is written explicitly to treat individuals differently by race. That is, it’s a statistical association with race that may not have to do directly with race. But then again, it might. That’s the hard part to prove when race isn’t written right into the policy as it used to be, say, in the pre-Brown era. For those interested in some additional school finance reading on this topic, see:

Baker, B. D., & Green III, P. C. (2005). Tricks of the Trade: State Legislative Actions in School Finance Policy That Perpetuate Racial Disparities in the Post‐Brown Era. American Journal of Education, 111(3), 372-413.

In the post-Sandoval era, complaints regarding policies that yield racially disparate impact are to be brought as administrative claims, through the relevant federal agencies/departments, just as Schenectady has done here (as elaborated in Ed Law Prof Blog).

So today’s big question is just how bad are the racial disparities in state aid shortfalls in New York State?

Is Schenectady right?

First, let’s define state aid shortfall. As I’ve explained in previous posts, New York operates a foundation aid formula which defines the per pupil amount of funding that is required for each district, given its location (labor market) and students (needs), in order to achieve adequate outcomes (this formula being the state’s own proposed remedy to previous state litigation over the adequacy of funding). So, in step one, the state calculates adequate target funding:

1) Sound Basic Funding Target = base funding figure x pupil need index x regional cost index x aidable pupil count

Where that “aidable pupil count” figure includes some additional adjustments.

Step two determines the amount the local district should contribute to the sound basic target funding and thus, the remaining amount to be contributed as state aid.

2) State Aid = Sound Basic Funding Target – Local Contribution
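The two steps above can be sketched as a pair of functions. This is a minimal sketch, assuming the aidable pupil count (with its additional adjustments) and the expected local contribution are supplied as inputs; the real formula’s rules for computing the local share, minimums and phase-ins are not modeled. Function names and all input values are hypothetical, not actual New York figures.

```python
def sound_basic_target(base_per_pupil, pupil_need_index,
                       regional_cost_index, aidable_pupils):
    """Step 1: base x pupil need index x regional cost index x aidable pupils."""
    return base_per_pupil * pupil_need_index * regional_cost_index * aidable_pupils


def formula_state_aid(target, local_contribution):
    """Step 2: state aid = sound basic funding target - local contribution."""
    return target - local_contribution


# Hypothetical district; none of these inputs are actual New York values.
target = sound_basic_target(6_000, 1.5, 1.2, 1_000)
aid = formula_state_aid(target, 4_000_000)
print(f"target ${target:,.0f}, formula aid ${aid:,.0f}")
```

Because the indices multiply the base, a district with higher need and higher regional costs gets a proportionally higher target, and, holding the local contribution fixed, every extra dollar of target is a dollar of state aid owed.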

But the problem is that New York has, in nearly every year since proposing this remedy to past litigation, added a few more steps to the calculation, which include:

freezing foundation funding to levels from several years prior
invoking the deceptively named “Gap Elimination Adjustment” to inflict disproportionate cuts on needier districts
enforcing local property tax limits that effectively prohibit districts from making up their losses in state aid – and effectively prohibit districts from even coming close to achieving the level of funding the state itself has declared as constitutionally adequate. Notably, the aid shortfalls are so extreme that low wealth districts really couldn’t ever tax themselves locally enough to make up the losses even if they tried.

Point #3 above is the subject of a separate lawsuit challenging the absurdity of a policy that would prohibit districts, even if they could, from raising the level of funding the state itself declares adequate but refuses to provide.

So, after the additional freezes and cuts are invoked, we can determine the state aid gap as follows:

State Aid Shortfall = State Aid to Achieve Sound Basic Funding Target – Actual State Aid after Freeze and Gap Elimination Adjustment
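The shortfall calculation above, as a minimal sketch. It assumes, as a simplification, that actual aid is just the frozen foundation aid minus the Gap Elimination Adjustment; the actual state aid runs contain other aid categories and adjustments that are not modeled here. The function names and the dollar figures are hypothetical.

```python
def aid_shortfall(formula_aid, frozen_foundation_aid, gap_elimination_adjustment):
    """Shortfall = aid needed to reach the target - (frozen aid - GEA cut)."""
    actual_aid = frozen_foundation_aid - gap_elimination_adjustment
    return formula_aid - actual_aid


def shortfall_per_pupil(shortfall, pupils):
    return shortfall / pupils


# Hypothetical district: the formula says $6.8M, aid is frozen at $5.0M,
# and the Gap Elimination Adjustment then takes another $0.6M.
gap = aid_shortfall(6_800_000, 5_000_000, 600_000)
print(f"shortfall ${gap:,.0f} total, ${shortfall_per_pupil(gap, 1_000):,.0f} per pupil")
```

Note that the freeze and the GEA compound: the shortfall is the freeze gap plus the GEA cut, so a district hit by both falls further behind than either policy alone would suggest.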

And just how related to race are those aid shortfalls? Well, here it is, based on the 2013-14 State Aid Runs merged with demographic data from the 2012 NYSED School Report Cards:

Previously, I’ve shown that these aid shortfalls are pretty strongly associated with the state’s own Pupil Need Index with higher need districts facing larger shortfalls. And racial composition is associated with the pupil need index, if we focus on traditionally disadvantaged racial aggregate classifications (which is a whole separate can of worms).

To summarize the graph above, which visually displays only those districts with greater than 2,000 pupils, but includes all (weighted for enrollment) in statistical estimates, it is certainly the case that New York State districts with higher concentrations of black or Hispanic children have greater state aid shortfalls.

There is indeed a racially disparate impact.

Moreover, that impact is pretty darn big. Moving from a district with 0% black or Hispanic children to one with 100% black or Hispanic children yields a difference in funding gap of over $2,000 per pupil.
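Reading that $2,000 figure: when racial composition is expressed as a share from 0 to 1, the slope of an enrollment-weighted line of per-pupil shortfall on that share is exactly the predicted difference between a 0% and a 100% black or Hispanic district. A minimal sketch of that weighted slope, assuming a simple weighted least-squares line; the exact estimation behind the graph may differ, and the function name and test data are hypothetical.

```python
import numpy as np


def weighted_slope(share_black_hispanic, shortfall_pp, enrollment):
    """Enrollment-weighted least-squares slope of per-pupil shortfall on share.

    With share measured from 0 to 1, the slope equals the predicted shortfall
    difference between a 0% and a 100% black or Hispanic district.
    """
    x = np.asarray(share_black_hispanic, dtype=float)
    y = np.asarray(shortfall_pp, dtype=float)
    w = np.asarray(enrollment, dtype=float)
    x_bar = np.average(x, weights=w)
    y_bar = np.average(y, weights=w)
    return float(np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2))
```

Weighting by enrollment keeps the many small, mostly white districts from dominating the estimate simply because they are numerous.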

Many of the state’s highest minority concentration districts have state aid shortfalls between $5,000 and $10,000 per pupil whereas NONE of the lowest minority concentration districts has an aid shortfall over $5,000 per pupil!

And these state aid shortfalls are shortfalls against the State’s own (paltry, low-ball) estimates of what it might have taken to achieve the now dated outcome standards of 2007 (under previous litigation)!

UPDATE:

Here’s a quick multivariate run of the data to determine whether otherwise similar districts with more minority children have bigger funding gaps, where otherwise similar is determined with respect to components of the formula itself – the Regional Cost Index, Pupil Need Index and the additional weights included in the Total Aidable Foundation Pupil Unit count.

Somewhat surprisingly, in this regression, the racially disparate impact is actually larger than when previously represented only as a bivariate relationship between funding gaps and race. I’d have expected the Pupil Need Index to substantially moderate the relationship between race and funding gap. But it is also likely that within any region, the funding gaps are more disparate by race than they appear statewide, because many of the high minority districts are in higher cost regions.
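The post doesn’t list the exact model specification, so here is only a minimal sketch of the kind of enrollment-weighted multiple regression described: per-pupil shortfall on minority share plus the formula components (Regional Cost Index, Pupil Need Index, and a hypothetical extra-weights measure such as the ratio of aidable pupil units to enrollment). The coefficient on minority share is then the 0% to 100% difference among districts otherwise similar on those components. The function name and column choices are assumptions.

```python
import numpy as np


def weighted_ols(y, regressors, weights):
    """Weighted least squares with an intercept.

    `regressors` is an (n, k) array; returns k + 1 coefficients, intercept
    first. Rows are scaled by sqrt(weight), so ordinary least squares on the
    scaled data minimizes the weighted sum of squared residuals.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(regressors, dtype=float)])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef
```

A usage note: with regressor columns ordered as [minority share, RCI, PNI, extra-weights ratio], `coef[1]` is the controlled race coefficient. Comparing it with the bivariate slope shows whether the formula components absorb the race relationship; per the result described above, they apparently do not.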

Petrilli’s Hammer & the poverty has nothing to do with PISA argument

Mike Petrilli over at TB Fordham has made his case for why differences in national economic context do little to substantively explain variations in PISA scores.

He frames his argument in terms of Occam’s Razor, as if to sound well informed and deeply intellectual, and to set the stage for a profound logical argument. Occam’s Razor is summarized as follows:

“among competing hypotheses, the hypothesis with the fewest assumptions should be selected.”

Petrilli asserts that while some might perceive a modest association (actually, it’s pretty strong) between national economic context and average tested outcomes in math, for example… like this…

…that it is entirely illogical to assert that child poverty has anything to do with national aggregate differences in math performance at age 15.

That is, the various assumptions that must be made to accept this crazy assertion – that economic context matters in math performance – simply don’t hold water in Petrilli’s mind. Rather, the answer must be much simpler and lie in the classroom, with our good ol’ American ineptitude at teaching math.

As Petrilli concludes in his post:

So what’s an alternative hypothesis for the lackluster math performance of our fifteen-year-olds? One in line with Occam’s Razor?

Maybe we’re just not very good at teaching math, especially in high school.

Accepting the bad math teaching conclusion simply requires fewer tricky assumptions than asserting any role for economic context in determining national aggregate outcomes.

Let’s call this Petrilli’s Hammer!: an illogical, blunt & necessarily under-informed alternative to Occam’s Razor. When in doubt, when too lazy to develop a disciplined understanding of the field on which you choose to opine, and when data are just too hard to handle, get that hammer, and everything can look like a nail! (e.g., the bad teacher conclusion)

These two quotes frame Petrilli’s argument:

First, one must assume that math is somehow more related to students’ family backgrounds than are reading and science, since we do worse in the former. That’s quite a stretch, especially because of much other evidence showing that reading is more strongly linked to socioeconomic class. It’s well known that affluent toddlers hear millions more words from their parents than do their low-income peers. Initial reading gaps in Kindergarten are enormous. And in the absence of a coherent, content-rich curriculum, schools have struggled to boost reading scores for kids coming from low-income families.

AND

So the second assumption must be that “poverty” has a bigger impact on math performance for fifteen-year-olds than for younger students. But I can’t imagine why. If anything, it should have less of an impact, because our school system has had more time to erase the initial disadvantages that students bring with them into Kindergarten.

The problem is that both of these statements are a) conceptually foolish and b) statistically ignorant.

Let’s tackle the second point first, conceptually. These scores for 15 year olds are performance level – or status – scores. Status scores reflect the cumulative effects of schooling and family background. Most notably in this case, status scores – math performance at age 15 – reflect the cumulative influences of poverty: living in poverty, growing up in poverty, lacking resources over long periods of one’s early life.

Here’s some more reading on poverty timing and cumulative effects.

And then there’s this report which I prepared last summer with ETS.

So… setting measurement issues aside here, we can logically expect gaps between lower and higher income kids to grow between earlier grade assessments and later grade assessments – if we choose to do little or nothing in policy terms about the circumstances under which these children live. Yes, we can and should leverage resources in schools to offset these gaps. But we’re not necessarily applying those resources either.

Accepting Petrilli’s second point above requires that we ignore entirely that our school system remains vastly disparate in many states and locations between rich and poor communities and reinforces (rather than erasing) the initial disadvantages that students bring with them to Kindergarten.

Now, backing up to his first point, where Petrilli argues that if higher poverty settings/contexts do worse relative to lower poverty settings on math than on reading assessments, there must be a simple answer for the math problem/disparity – like bad math teaching of course. There can be no logical explanation for why math scores might be more sensitive than reading scores to poverty variation. Assuming bad math teaching to be the reason for greater disparity in math than in reading is much simpler than exploring why it might appear that math test scores are more sensitive to context/poverty, etc. than reading scores. This is true because we all know that poverty affects reading more than math – or so Mike says without citation to any legitimate source validating his point.

This one is pretty simple. First, it may simply be the case that Mike Petrilli is wrong on all levels here – that, conceptually and statistically, economic deprivation seems to have a stronger effect on numeracy than on literacy. But even if we accept the idea that poverty affects literacy more in a substantive way, that doesn’t mean we’d find a stronger statistical relationship between a) variations in poverty across settings and b) variations in measured outcomes across settings. The fact is that variations in math assessments are often simply more predictable. Math scores may be both more stable/consistent and have more variation to predict.
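To make the statistical half of this point concrete, here is a small Python simulation. It is a sketch under stated assumptions, not NAEP or PISA data: two hypothetical outcome measures share exactly the same underlying poverty effect and differ only in how much measurement noise they carry. The function names, the noise levels and the score scale are all made up for illustration.

```python
import random

def r_squared(x, y):
    """Share of the variance in y explained by a straight-line fit on x (r squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def simulate_scores(poverty, slope, noise_sd, rng):
    """Hypothetical scores: a fixed poverty effect plus random measurement noise."""
    return [250 + slope * p + rng.gauss(0, noise_sd) for p in poverty]

rng = random.Random(2013)
poverty = [rng.uniform(5, 35) for _ in range(1000)]  # hypothetical poverty rates (%)

# Identical true effect: -1 scale point per percentage point of poverty...
precise = simulate_scores(poverty, slope=-1.0, noise_sd=4.0, rng=rng)
noisy = simulate_scores(poverty, slope=-1.0, noise_sd=12.0, rng=rng)

# ...yet poverty "explains" far more of the variation in the precise measure.
print(f"precise measure r2 = {r_squared(poverty, precise):.2f}")
print(f"noisy measure   r2 = {r_squared(poverty, noisy):.2f}")
```

With these assumed noise levels, the first r-squared lands in the neighborhood of .8 and the second near .35, even though poverty "matters" identically for both measures. A lower r-squared for one subject, by itself, says nothing about whether poverty affects that subject less.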

Empirical Illustrations

I’m going to use state level NAEP data within the US here to provide statistical illustrations for the rather simple flat-out-wrongness of Mike Petrilli’s Hammer.

The following illustrations simply reveal how data of this type tend to play out, something anyone reasonably well versed in using assessment data alongside economic data, at various levels of aggregation, would understand. Some of these patterns reveal conceptually sound underlying hypotheses, and some may simply be artifacts of typical issues in measuring student outcomes at different ages and in different subjects.

So, for our first question we ask whether it can possibly be the case that there exists greater disparity in math outcomes in 8th grade than in 4th grade across US states of varying degrees of poverty (setting aside the substantive explanations for why such gaps increase).

Now, careful here, this one requires using a little algebra – slope/intercept analysis. The first figure here shows NAEP math outcomes for 8th graders and for 4th graders, both in 2013, in relation to state poverty rates.

This figure shows us first of all, that 8th grade math scores are more predictably disparate as a function of poverty than are 4th grade math scores. For 8th grade, poverty alone explains 63% of the cross state variation in math scores, but marginally less (59%) for 4th grade.

The figure also shows us that by 8th grade, each additional percentage point of poverty is associated with a 1.13 point lower state average scale score, whereas in 4th grade, a 1 point higher poverty rate is associated with only a .83 point lower state average scale score. That is, the negative slope is steeper for 8th grade than for 4th grade.
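The slope and r-squared reported in these figures come from a simple one-predictor linear fit on state-level pairs. A minimal Python sketch of that calculation follows. The function name `fit_line` and the six data points are hypothetical illustrations, not the actual NAEP or poverty figures.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical state-level pairs: (child poverty rate in %, average scale score).
poverty = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0]
scores = [295.0, 290.5, 288.0, 281.0, 279.5, 272.0]

slope, intercept, r2 = fit_line(poverty, scores)
# slope: scale score points per additional percentage point of poverty
# r2: share of the cross-state variation in scores "explained" by poverty alone
print(f"slope={slope:.2f}, intercept={intercept:.1f}, r2={r2:.2f}")
```

In a one-predictor fit, r-squared is just the square of the correlation coefficient r, so an r-squared of .63 corresponds to a correlation of roughly -.79 when the slope is negative.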

There can be many, many reasons for this. Among these reasons might be that as time goes on, cumulative poverty related deficits do increase. Persistent disadvantage makes gaps grow. It may also be a measurement issue, pertaining to the precision of measurement of mathematics knowledge and skill, or it may even be an issue of the stability and predictability of tests on early grade math content given to 9 year olds versus tests on stuff like algebra and pre-algebra given to older, hopefully more mature kids (who’ve also taken far more tests by that time).

But, instead of gettin’ all thoughtful about these possibilities and arming ourselves with well-conceived arguments grounded in data and knowledge of the literature, we could simply use Petrilli’s Hammer to assert that the one and only logical answer is that math teachers in high poverty states like Alabama and Mississippi suck and math teachers in low poverty states like New Jersey and Massachusetts rock! It’s bad math teaching that is making this negative slope get worse between grade 4 and grade 8 – bad math teaching exclusively in high poverty states!

Is there greater disparity in Grade 8 Math than in Grade 4 Math by Contextual Poverty?

The next question then is how can it ever be that math scores might be more disparate as a function of poverty when we all know that poverty affects reading more?

The next figure shows the relationship between poverty by state, and math and reading scores in grade 4. Rather amazingly, math scores are more predictable as a function of poverty than are reading scores – note the difference in variance explained (r-squared). Now, (almost) anyone who has ever plotted reading and math “level” (status) scores, or even estimated value added scores for reading and math in relation to poverty or nearly any other covariate knows that this is common. Variation in math scores – level or value added – is often much more predictable than is variation in reading scores. As above, this may be for many, many reasons. Maybe we’re just not as good on the measurement side at teasing out differences in underlying skill on reading, with either 9 or 14 year olds?

That math scores are more predictably a function of poverty than reading scores – across states – doesn’t mean that our math teaching is better or worse than our reading teaching. Even though the math scores at 4th grade are more predictable than the reading scores, the reading slope appears slightly more disparate (steeper negative). And that doesn’t mean either that our reading teaching is more disparate, or that the 4th grade scores are picking up some differential on the baggage kids bring to school with them. It’s a statistical artifact of the data – based on how math and reading are being measured. It may mean something, but who knows what? It may mean absolutely nothing.

Are Grade 4 Math Scores more predictably a function of poverty than Grade 4 Reading Scores across contexts?

Finally, here’s the 8th grade math and reading. Here, math is marginally more predictable as a function of poverty and math outcomes are more disparate as a function of poverty.

At least by these measures – NAEP math and reading scores – aggregated to the state level – which is similar to making national comparisons – reading is NOT as Petrilli so confidently argues above “more strongly linked to socioeconomic class” than math.

International comparisons work much the same.

What about Grade 8 Math and Reading?

Indeed, Petrilli is attempting to assert that there exists an incongruity between the data and the underlying reality – that yes, reading scores are affected by poverty, but math not so much. Thus, if the data show that math scores are more affected by poverty than are reading scores, then something much more nefarious must be going on – Yes – the bad teacher/teaching problem!

It couldn’t possibly have anything to do with measurement issues, or with the significant possibility that the full range of measured student outcomes is similarly affected by economic deprivation. That would just be way too much to swallow.

But, if we want to go there… if we want to accept Petrilli’s argument that there’s simply no excuse for U.S. students to fall where they do on international math comparisons, because poverty doesn’t affect 15 year olds or math, only younger kids and reading, then we must apply Petrilli’s hammer to state-by-state comparisons as well.

And thus we logically conclude that math teaching in DC, MS, AL and LA stinks and math teaching in NJ, MA, VT and NH is great! And that poverty really has nothing to do with it?

Graph of the Day: My contribution to PISA Palooza

With today’s release of PISA data it is once again time for wild punditry, mass condemnation of U.S. public schools and a renewed sense of urgency to ram through ill-conceived, destructive policies that will make our school system even more different from those breaking the curve on PISA.

With that out of the way, here’s my little graphic contribution to what has become affectionately known to the edu-pundit class as PISA-Palooza. Yep… it’s the ol’ poverty as an excuse graph – well, really it’s just the ol’ poverty in the aggregate just so happens to be pretty strongly associated with test scores in the aggregate – graph… but that’s nowhere near as catchy.

PISA Data: http://nces.ed.gov/pubs2014/2014024_tables.pdf

(table M4)

OECD Relative Poverty: Source: Provisional data from OECD Income distribution and poverty database (www.oecd.org/els/social/inequality).

Yep – that’s right… relative poverty – or the share of children in families below 50% of median income – is reasonably strongly associated with Math Literacy PISA scores. And this isn’t even a particularly good measure of actual economic deprivation. Rather, it’s the measure commonly used by OECD and readily available. Nonetheless, at the national aggregate, it serves as a pretty strong correlate of national average performance on PISA.
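The relative poverty measure is easy to state in code: the share of children in families below 50% of median income. Below is a rough Python sketch. It assumes person-level records of equivalised household income (income already adjusted for household size) with the median taken over all persons. That is my understanding of the OECD construction, so treat those details as assumptions. The function name and the five-person sample are hypothetical.

```python
from statistics import median

def relative_child_poverty(people, threshold_share=0.5):
    """Share of children whose equivalised household income falls below
    threshold_share times the median income of all persons.

    people: list of (equivalised_income, is_child) tuples, one per person
    (a hypothetical input layout, not an OECD file format)."""
    poverty_line = threshold_share * median([income for income, _ in people])
    child_incomes = [income for income, is_child in people if is_child]
    if not child_incomes:
        raise ValueError("no children in the sample")
    poor = sum(1 for income in child_incomes if income < poverty_line)
    return poor / len(child_incomes)

# Hypothetical five-person sample: the median income is 30, so the line is 15.
sample = [(10, True), (20, False), (30, False), (40, True), (50, False)]
print(relative_child_poverty(sample))  # 0.5: one of the two children is below 15
```

Because the line moves with each country's median, this measures relative deprivation rather than absolute hardship, which is one reason it is an imperfect proxy for actual economic deprivation.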

What our little graph tells us – albeit not really that meaningful – is that if we account (albeit poorly) for child poverty, the U.S. is actually beating the odds. Way to go? (but for that really high poverty rate).

Bottom line – economic conditions matter and simple rankings of countries by their PISA scores aren’t particularly insightful (and the above graph is only marginally more insightful). Further, comparing cities in China to entire nations is a particularly silly approach.

Additional Readings:

Coley, R., Baker, B.D. (2013). Poverty and Education: Finding the Way Forward. ETS Center for Research on Human Capital and Education. Princeton, NJ: Educational Testing Service.
http://www.ets.org/s/research/pdf/poverty_and_education_report.pdf

Baker, B.D., Welner, K.G. (2011). Productivity Research, the U.S. Department of Education, and High-Quality Evidence. Boulder, CO: National Education Policy Center.
http://nepc.colorado.edu/publication/productivity-research

Where are the most economically disproportionate charter schools? (& why does it matter?) UPDATED

Updated: It seems that Mike Petrilli, on Twitter, takes issue with my reference to these schools below as “segregated.” In his view, if a city includes some charter schools that have more of a 50/50 balance of low income and non-low income kids, those are the integrated schools, even if they achieve their balance by creaming off the non-low income kids in a district that is 80% low income. Petrilli seems to suggest that it is necessarily a good thing if charters can create a balanced population for themselves, even if they create an imbalanced population (an even more intense concentration of poverty) for the system as a whole. Notably, one question the data below cannot answer is the extent to which the creation of economically non-representative charters in a city can help to retain some middle class families that might not otherwise have sent their children to the district schools. Certainly, there exists at least some evidence that Catholic school enrollments have suffered from charter expansion. It seems far less likely that these charters are recruiting higher income children into the city from neighboring districts. To suggest that a majority, or even a large share, of non-low-income students in charters were retained (but would otherwise have left the public system), were brought in from lower poverty neighboring suburbs, or were siphoned from private schools and would not otherwise have attended the public system is a huge stretch – a smokescreen. It remains most likely that the vast majority of sorting displayed herein is internal to the public-charter system and unlikely to be crossing school district or city boundaries. [more below]

In this first of several posts, I explore economic variation in charter enrollments in the states of Massachusetts, New Jersey and Connecticut.

I’m taking a fairly simple, easily replicable approach here and encourage any data savvy readers to take their own shot at it. For this analysis I’m using the most recent three years of non-preliminary school level enrollment data from the National Center for Education Statistics Common Core of Data, Public School Universe Survey.

http://nces.ed.gov/ccd/pubschuniv.asp

I’m only using a handful of variables here. I’m using:

City of location (lcity)
Total school enrollment (member)
Total number of free lunch qualified children (frelch)
Charter school indicator (chartr)

For each year of the data, I sum the enrollment of all schools in the city of location, including charters and district schools and magnets or other special schools. That gives me the total number of all kids enrolled in a city (yeah… it’s a little messy in that some cities include schools that also enroll kids from outside the city – I limit the final lists to large enough enrollment areas where such cases should not substantively distort final numbers). I do the same for kids qualified for free lunch. So, I have:

City Total Enrollment
City Free Lunch Enrollment

Note that this is by city, not host district, but city is a relevant geographic unit for many reasons, including the fact that many US cities are actually carved into multiple segregated public school districts. Part of the point here is to run a quick-and-dirty summary with the publicly available, readily usable data.
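For data savvy readers who want to try this, the city aggregation step can be sketched in plain Python. This sketch assumes the CCD file has already been read into a list of dicts keyed by the variable names listed above. The function name `city_totals` and the sample records are hypothetical. Skipping negative values assumes CCD-style missing/suppressed codes and is not necessarily how the original analysis handled them.

```python
from collections import defaultdict

def city_totals(schools):
    """Sum enrollment (member) and free lunch counts (frelch) over every
    school in each city of location (lcity): district schools, magnets
    and charters alike.

    Returns (city_enrollment, city_free_lunch) as plain dicts."""
    enroll = defaultdict(int)
    free_lunch = defaultdict(int)
    for s in schools:
        if s["member"] < 0 or s["frelch"] < 0:
            continue  # assumption: negative values are missing/suppressed codes
        enroll[s["lcity"]] += s["member"]
        free_lunch[s["lcity"]] += s["frelch"]
    return dict(enroll), dict(free_lunch)

# Hypothetical records, not real CCD rows.
schools = [
    {"lcity": "CITY A", "member": 500, "frelch": 420},
    {"lcity": "CITY A", "member": 400, "frelch": 100},
    {"lcity": "CITY B", "member": 300, "frelch": 250},
    {"lcity": "CITY B", "member": 200, "frelch": -1},  # suppressed, skipped
]
print(city_totals(schools))
```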

Next, I determine each charter school’s market share:

School market share = school enrollment/city enrollment

And then each school’s share of low income kids served:

School free lunch share = school free lunch / city free lunch

If a school was serving a representative population by low income status, then the free lunch share for the school would equal the market share for the school. That is, the school would be serving both X% of total enrollment and X% of low income kids. I use a simple disparity ratio here:

School free lunch share / school market share

If the disparity ratio is, say, .50, then the charter school is serving only half as many low income kids as would be proportional for that school.
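Here are the three formulas above as a small Python function, with a worked example. The function name and the numbers are hypothetical. Algebraically, the ratio also equals the school's free lunch rate divided by the city's free lunch rate, which is a handy sanity check.

```python
def disparity_ratio(school_enroll, school_frl, city_enroll, city_frl):
    """School free lunch share divided by school market share.

    1.0 means the school serves a proportionate share of low income kids;
    below 1.0 means it underserves them relative to its size."""
    market_share = school_enroll / city_enroll  # school's share of all kids
    frl_share = school_frl / city_frl           # school's share of low income kids
    return frl_share / market_share

# Hypothetical charter: 400 of a city's 20,000 students (2% market share),
# but only 160 of its 16,000 free lunch students (1% free lunch share).
print(round(disparity_ratio(400, 160, 20_000, 16_000), 2))  # 0.5
# Same answer as comparing rates directly: 40% free lunch vs. 80% citywide.
```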

To make the final data set manageable, I focus on charter schools in cities where the aggregate enrollment is greater than 10,000. And to have more stable numbers, 1) I use only those charters with at least a 1% market share and 2) I use a three year average (2009 to 2011).
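The two stability rules can be sketched as follows. This is a hypothetical Python sketch. The row layout and function name are made up, and it assumes that the three year average is taken over each school's counts before the ratio is computed. Averaging the yearly ratios instead would give slightly different numbers.

```python
from collections import defaultdict

FIELDS = ("enroll", "frl", "city_enroll", "city_frl")

def three_year_ratios(rows, min_city_enroll=10_000, min_market_share=0.01):
    """Average each charter's counts over the years supplied, drop small
    cities and small schools, then compute the disparity ratio.

    rows: one dict per charter per year with a "school" key plus FIELDS
    (a hypothetical layout, not the CCD file format)."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    years = defaultdict(int)
    for r in rows:
        for f in FIELDS:
            totals[r["school"]][f] += r[f]
        years[r["school"]] += 1

    ratios = {}
    for school, t in totals.items():
        avg = {f: t[f] / years[school] for f in FIELDS}
        if avg["city_enroll"] <= min_city_enroll:
            continue  # city enrollment not greater than 10,000
        market_share = avg["enroll"] / avg["city_enroll"]
        if market_share < min_market_share:
            continue  # less than a 1% market share
        frl_share = avg["frl"] / avg["city_frl"]
        ratios[school] = frl_share / market_share
    return ratios

# Hypothetical rows: A qualifies, B sits in a small city, C is too small.
rows = []
for year in (2009, 2010, 2011):
    rows.append({"school": "A", "year": year, "enroll": 400, "frl": 160,
                 "city_enroll": 20_000, "city_frl": 16_000})
    rows.append({"school": "B", "year": year, "enroll": 400, "frl": 100,
                 "city_enroll": 8_000, "city_frl": 6_000})
    rows.append({"school": "C", "year": year, "enroll": 100, "frl": 20,
                 "city_enroll": 20_000, "city_frl": 16_000})
print(three_year_ratios(rows))  # only school A survives, with a ratio of about 0.5
```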

So, let’s have at it. Here are the ratios for Connecticut schools:

All but two CT charters underserve low income students in these data. Four are under 70%. Park City, Jumoke and AF Bridgeport are particularly egregious examples!

Here’s Massachusetts:

Many Boston area schools are excluded from the above table on the basis that what outsiders generally think of as “Boston” is actually carved into many smaller city areas, many of which fell under my 10,000 aggregate enrollment threshold. I will report additional data on these areas at a later date.

And finally, New Jersey:

Unfortunately, in this last figure, we actually lose some of New Jersey’s most economically disproportionate charter schools, which are in Hoboken, a city that fell under the aggregate enrollment threshold.

Why does this matter?

There exist at least two reasons why it matters to pay close attention to just how different charter schools are from their surroundings – that is, if and when they are. First, better understanding demographic differences of charter schools – or any school for that matter – provides useful backdrop for claims of chartery miracles. Second, the demography of charters in their local contexts, and demographic shifts induced by choice programs, or attendance boundary reconfiguration for that matter, have implications for schools on both ends – sending and receiving.

1. Claims of reformy miracles

I don’t know how many times I’ve come across tweets and blog posts, for example, talking about how BASIS charter schools in Arizona are better than Singapore or Shanghai, or even Finland. And that, since we all know Arizona is a high poverty state, BASIS must be serving low income kids, and thus achieving some transferable miracle.

If we place BASIS in a scatterplot of Arizona schools, plotting each school’s math performance (expressed as a national percentile ranking) against its % free or reduced lunch share, we get this picture:

Here, BASIS looks rather not-so-miraculous. In fact, it’s right about where one would expect given the students it serves.

Likewise, schools like Robert Treat Academy and North Star Academy often receive praise for their outcomes in New Jersey. Here’s where they lie when we take into account free lunch shares alone (and use general test taker outcomes to reduce special ed and ELL effects).

Both are near where one would expect them to be given their students. In fact, many more Newark Public Schools district schools deviate positively – and more positively – from expectations than either of these “miracle” schools.

2. Effects on the system as a whole

As I’ve shown in several previous posts (like this one), when charter schools (or districts’ own magnet schools) siphon off lower need students, they leave behind higher need students. Just as the concentration of lower need students in charter or magnet schools may provide advantageous peer group influence on those involved, the concentration of higher need students left behind in district or other charter schools has adverse peer group effects. Similar concerns arise with neighborhood level sorting of children and families. The policy goal is to figure out how best to manage student sorting so as not to exacerbate these problems via under-regulated choice programs (with incentives to cream-skim).

Regulation need not take the form of requiring all charter (or district magnet) schools to serve proportionate shares of specific populations (by race, economic status or disability). The reality is that some charter schools, like districts’ own magnet schools, may work better with some populations than others, and thus forcing them to serve a population they are ill equipped to serve is productive for neither the school nor the child.

However, where charter (or magnet) success depends on ability to serve a select population, alternative policy constraints like growth caps may be in order, to restrain otherwise parasitic tendencies.

Thus far, however, unfettered, largely parasitic charter growth continues to have the potential to do much more harm than good in the long run.

UPDATE

Some have pointed out that the charter sector in these states appears relatively “balanced” overall. Thus, what’s the harm? Charters merely introduce heterogeneity based on the preferences of individual parents on behalf of their children. The problem is that charter enrollment patterns seem to vary substantially by city. So, statewide averages or statewide distributions can mask real local level problems. For example, in New Jersey, most of the charter schools in Trenton over-enroll low income kids, while on average in Newark, they under-enroll. That charters in Trenton over-enroll low income kids does not help the Newark situation, though it does raise different questions for Trenton. Notably, when CREDO conducted its study of charter school effects in New Jersey, the identified positive effect came entirely from Newark, whereas charters elsewhere in the state underperformed.

Here are a few additional slides showing the city level aggregate disproportionality for the states above. Note that there may be a few cases where charter operators submitted the WRONG information about their “city of location” to their state, for the national data. In which case, a charter may show up in a city where it keeps its management office rather than where it runs its school. Don’t blame me for wrong addresses in the data. Blame those who submitted their information WRONG!

Here’s NJ, where the greatest aggregate disproportionality is in Princeton. And to those arguing that charters are merely creating more balance than the district can – that is NOT the case in Princeton NJ. Note that the net disproportionality in Newark is about 84%. Thus, while there is heterogeneity, with some schools overserving low income kids, there are enough schools underserving low income kids, and by a large enough margin, that the net effect is that charters in Newark are underserving. Some other smaller towns with single charters stand out… Camden is approximately balanced between charter and district schools, and Trenton has a higher concentration of low income kids in charters. Statewide, NJ charters are relatively balanced on average.
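One way to read a net figure like Newark’s is as a pooled ratio for the city’s whole charter sector: combined charter free lunch share divided by combined charter market share. That construction is an assumption about how such a city-level figure can be computed, not a description of the calculation behind the slides. The hypothetical Python sketch below shows that the pooled ratio equals an enrollment-weighted average of the individual school ratios. This is why one overserving school can be outweighed by larger underserving ones.

```python
def sector_ratio(charters, city_enroll, city_frl):
    """Pool all charters in a city: combined free lunch share divided by
    combined market share.

    charters: list of (enrollment, free_lunch_count) pairs (hypothetical layout)."""
    total_enroll = sum(e for e, _ in charters)
    total_frl = sum(f for _, f in charters)
    return (total_frl / city_frl) / (total_enroll / city_enroll)

# Hypothetical city: 50,000 students, 40,000 (80%) qualify for free lunch.
charters = [(1000, 900), (2000, 1200), (1000, 700)]  # 90%, 60% and 70% free lunch
school_ratios = [(f / 40_000) / (e / 50_000) for e, f in charters]
print([round(r, 3) for r in school_ratios])              # [1.125, 0.75, 0.875]
print(round(sector_ratio(charters, 50_000, 40_000), 3))  # 0.875

# The pooled ratio equals the enrollment-weighted mean of the school ratios.
weighted = sum(e * r for (e, _), r in zip(charters, school_ratios)) / sum(
    e for e, _ in charters)
print(round(weighted, 3))  # 0.875
```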

Here’s Massachusetts, which on average is imbalanced, with significant disproportionality in locations like Dorchester, which is home to many charters. Charters within the city of Boston itself are more balanced.

Here’s Connecticut, which on average is also imbalanced.

Another point that has been raised, related to the issue of charters attracting suburbanites and retaining “wealthier” families who might not otherwise stay in the cities and send their kids to the schools, is the argument that these most disproportionate charters likely represent their neighborhoods within the cities, and the schools around them. First, as I explain in the comments below, this apparent skimming pattern isn’t so much a function of some charters serving wealthy populations (not so much a Princeton problem), but rather a function of charters in otherwise poor neighborhoods skimming off the less poor from surrounding neighborhoods and schools. Indeed, the other scenario likely exists in a few select cases. But having reviewed numerous maps of charter locations and demography, I don’t suspect that’s the norm. Here are a few maps for illustration.

Here are Newark charters:

Note, for example, that Robert Treat Academy stands out like a sore thumb. And even TEAM, which is more representative than other Newark charters, sticks out in its context (a yellow circle surrounded by red ones). So too does Greater Newark, which is surrounded both by higher poverty district schools and by higher poverty charters.

Here’s Hartford, CT, where nearly every other district school – except for the magnet schools – is a red circle – serving very high poverty concentrations.

But Hartford is wonderfully illustrative of the fact that some districts also impose on themselves a significant degree of economic segregation. Hartford’s Capital Prep is as disproportionate in low income enrollment as Jumoke and Achievement First. But none – none – of the district’s regular public schools, including those right next door, serve such low shares of kids qualified for free lunch.

Comments on NJ’s Teacher Evaluation Report & Gross Statistical Malfeasance

A while back, in a report from the NJDOE, we learned that outliers are all that matters. They are where life’s important lessons lie! Outliers can provide proof that poverty doesn’t matter. Proof that high poverty schools – with a little grit and determination – can kick the butts of low poverty schools. We were presented with what I, until just the other day might have considered the most disingenuous, dishonest, outright corrupt graphic representation I’ve seen… (with this possible exception)!

Yes, this one:

This graph was originally presented by NJ Commissioner Cerf in 2012 as part of his state of the schools address. I blogged about this graph and several other absurd misrepresentations of data in the same presentation here & here.

Specifically, I showed before that the absurd selective presentation of data in this graph completely misrepresents the actual underlying relationship, which looks like this:

Yep, that’s right, % free or reduced priced lunch alone explains 68% of the variation in proficiency rates between 2009 and 2012 (okay, that’s one more year than in the misleading graph above, but the pattern is relatively consistent over time).

But hey, it’s those outliers that matter right? It’s those points that buck the trend that really define where we want to look…what we want to emulate? right?

Actually, the supposed outliers above are predictably different, as a function of various additional measures that aren’t included here. But that’s a post for another day. [and discussed previously here]

THEN came the recent report on progress being made on teacher evaluation pilot programs, and with it, this gem of a scatterplot:

This scatterplot is intended to represent a validation test of the teacher practice ratings generated by observations. As reformy logic tells us, an observed rating of a teacher’s actual classroom practice is only ever valid if those ratings are correlated with some measure of test score gains.

In this case, the scatterplot is pretty darn messy looking. Amazingly, the report doesn’t actually present either the correlation coefficient (r) or coefficient of determination (r-squared) for this graph, but I gotta figure in the best case it’s less than a .2 correlation.

Now, state officials could just use that weak correlation to argue that “observations BAD, SGP good!” which they do, to an extent. But before they even go there, they make one of the most ridiculous statistical arguments I’ve seen, well… since I last wrote about one of their statistical arguments.

They argue – in picture and in words above – that if we cut off points from opposite corners – lower right and upper left – of a nearly random distribution – there otherwise exists a pattern. They explain that “the bulk of the ratings show a positive correlation” but that some pesky outliers buck the trend.

Here’s a fun illustration. I generated 100 random numbers and another 100 random numbers, normally distributed and then graphed the relationship between the two:

And this is what I got! The overall correlation between the first set of random numbers and second set was .03.

Now, applying NJDOE Cerfian outlier exclusion, I exclude those points where X (first set of numbers) > .5 and Y (second set) < -.5 [lower right], and similarly for the upper left. Ya’ know what happens when I cut off those pesky supposed outliers in the upper left and lower right? The remaining “random” numbers now have a positive correlation of .414! Yeah… when we chisel a pattern out of randomness, it creates… well… sort of… a pattern.

Mind you, if we cut off the upper right and lower left, the bulk of the remaining points show a negative correlation. [in my random graph, or in theirs!]
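The random-number exercise above is easy to replicate. Here is a Python sketch. The function names are hypothetical, and because the draws are random, it will not reproduce the exact .03 and .414 reported here. It does reproduce the pattern: trimming the lower-right and upper-left corners of independent normal data leaves a clearly positive correlation.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def drop_off_diagonal_corners(x, y, cut=0.5):
    """Remove lower-right (x > cut, y < -cut) and upper-left (x < -cut, y > cut) points."""
    kept = [(a, b) for a, b in zip(x, y)
            if not (a > cut and b < -cut) and not (a < -cut and b > cut)]
    return [a for a, _ in kept], [b for _, b in kept]

def demo(n=100, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x by construction
    before = pearson(x, y)
    after = pearson(*drop_off_diagonal_corners(x, y))
    return before, after

before, after = demo()
print(f"all points: r = {before:.3f}; after trimming corners: r = {after:.3f}")
```

Negating y before trimming is equivalent to cutting the upper right and lower left instead, which flips the sign of the manufactured correlation, just as described above.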

But alas, the absurdity really doesn’t even end there… because the report goes on to explain how school leaders should interpret this lack of a pattern that after reshaping is really kind of a pattern, that isn’t.

Based on these data, the district may want to look more closely at its evaluation findings in general. Administrators might examine who performed the observations and whether the observation scores were consistently high or low for a particular observer or teacher. They might look for patterns in particular schools, noting the ones where many points fell outside the general pattern of data. These data can be used for future professional development or extra training for certain administrators. (page 32)

That is, it seems that state officials would really like local administrators to get those outliers in line – to create a pattern where there previously was none – to presume that the reason outliers exist is because the observers were wrong, or at least inconsistent in some way. Put simply, that the SGPs are necessarily right and the observations wrong, and that the way to fix the whole thing is to make sure that the observations in the future better correlate with the necessarily valid SGP measures.

Which would be all fine and dandy… perhaps… if those SGP measures weren’t so severely biased as to be meaningless junk.

Yep, that’s right – SGPs, at least at the school level, and thus by extension at the underlying teacher level, are:

higher in schools with higher average performance to begin with in both reading and math
lower in schools with higher concentrations of low income children
lower in schools with higher concentrations of non-proficient special education children
lower in schools with higher concentrations of black and Hispanic children

So then, what would it take to bring observation ratings in line with SGPs? It would take extra care to ensure that ratings based on observations of classroom practice, regardless of actual quality of classroom practice, were similarly lower in higher poverty, higher minority schools, and higher in higher performing schools. That is, let’s just make sure our observation ratings are similarly biased – similarly wrong – to make sure that they correlate. Then all of the wrong measures can be treated as if they are consistently right???????

Actually, I take some comfort in the fact that the observation ratings weren’t correlated with the SGPs. The observation ratings may be meaningless and unreliable… but at least they’re not highly correlated with the SGPs which are otherwise correlated with a lot of things they shouldn’t be.

When will this madness end?

A few quick thoughts and graphs on Mis-NAEP-ery

Update: Here are a bunch of additional graphs relating Students First Report Card grades with unadjusted and adjusted NAEP Gains (hint – it’s the adjusted gains that matter since low performing states are able to post bigger gains, and also generally received higher grades from Students First).

Yesterday gave us the release of the 2013 NAEP results, which of course brings with it a bunch of ridiculous attempts to cast those results as supporting the reform-du-jour. Most specifically yesterday, the big media buzz was around the gains from 2011 to 2013 which were argued to show that Tennessee and Washington DC are huge outliers – modern miracles – and that because these two settings have placed significant emphasis on teacher evaluation policy – that current trends in teacher evaluation policy are working – that tougher evaluations are the answer to improving student outcomes – not money… not class size… none of that other stuff.

I won’t even get into all of the different things that might be picked up in a supposed swing of state-level test scores over a two-year period. Whether two-year swings are substantive and important can certainly be debated (not really), but whether policy implementation can shift state average test scores within two years is perhaps even more suspect.
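One way to see why a short-run swing is hard to read: a state’s gain is the difference between two sampled averages, and each of those averages carries its own sampling error. The Python sketch below computes a gain and its standard error, assuming the two years’ estimates come from independent samples (so their variances add) and that the standard errors are read from the published NAEP results; no actual values are supplied, and the function names are hypothetical. It is a rough screen, not NAEP’s own significance-testing procedure.

```python
import math


def gain_with_se(score_start, se_start, score_end, se_end):
    """Return (gain, standard error of the gain) for two state averages.

    Assumes the two estimates come from independent samples, so their
    sampling variances simply add.
    """
    gain = score_end - score_start
    se_gain = math.sqrt(se_start ** 2 + se_end ** 2)
    return gain, se_gain


def gain_is_distinguishable(score_start, se_start, score_end, se_end, z=1.96):
    """True if |gain| exceeds z standard errors (roughly a 95% two-sided screen)."""
    gain, se_gain = gain_with_se(score_start, se_start, score_end, se_end)
    return abs(gain) > z * se_gain
```

This speaks only to sampling error. Even a gain that clears the screen says nothing about whether a particular policy caused it.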

Setting all that aside, let’s just take a step back and look at the NAEP data: changes in scores from 2003 to 2013, 2009 to 2013 and 2011 to 2013 for 4th grade reading and 8th grade math. BUT, as I’ve shown before, gains on NAEP appear correlated with starting point (lower-performing states show higher gains), so let’s condition those gains on starting point by plotting them in scatterplots against starting scores.
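The conditioning step amounts to fitting a trendline of gain on starting score across states and then reading each state’s distance from that line: above the line means beating expectations given where the state started. A minimal Python sketch, assuming the trendline is a simple ordinary least squares fit (the post does not say exactly how its trendlines were drawn) and that scores arrive as dictionaries keyed by state; the function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b).

    Assumes the x values are not all identical (otherwise sxx is zero).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def gains_relative_to_expectation(baseline, later):
    """Map each state to its gain minus the gain predicted from its baseline.

    baseline, later: dicts of state -> average scale score in the two years
    (every state in baseline must also appear in later).
    Positive values mean the state sits above the trendline.
    """
    states = sorted(baseline)
    x = [baseline[s] for s in states]
    gains = [later[s] - baseline[s] for s in states]
    a, b = fit_line(x, gains)
    return {s: g - (a + b * xi) for s, xi, g in zip(states, x, gains)}
```

A negative slope from `fit_line` is the pattern described above: lower starting scores go with bigger gains, so raw gains alone would flatter low starters.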

Here are the figures. In some of the figures below, I’ve cut out Washington, DC because it is such a low-performing outlier. It does creep into the picture as its scores rise, but that rise came over the longer haul, well before teacher evaluation reforms.

If teacher evaluation reform (or expanded choice, etc.) has caused great NAEP gains, then the graphs below should show states adopting RTTT-style teacher evaluation policies rising above the trendline, especially from the pre-Race to the Top (RTTT) baseline year of 2009 to 2013, while those curmudgeonly states that have lagged in such reform efforts should not.

Grade 4 Reading 2003-2013

Over the 10-year period, Maryland is the miracle state in 4th grade reading, as Matt Di Carlo has pointed out in the past. Florida does okay, and Alabama is also a standout. New Jersey and Massachusetts, both initially high performers, also exceed expectations given their starting points. Louisiana falls right on the line.

Grade 4 Reading 2009-2013

From 2009 to 2013, Maryland remains the standout. Georgia, Washington, Utah and Minnesota also do pretty darn well, and yes… Tennessee is in the next batch, but even Wyoming and New Hampshire beat expectations by more, despite having started higher. Louisiana beats expectations. In any case, it’s hard to argue that the states that moved most aggressively on teacher evaluation from 2009 to 2013 are the ones that showed the greatest gains.

Grade 4 Reading 2011-2013

On the recent two-year bump, Tennessee and DC do quite well, but so does Minnesota. Colorado (another teacher eval state) also does pretty well on this one. This graph may provide the “best” (albeit painfully weak, suspect and short-term) “evidence” for teacher eval states, except for Minnesota, which I don’t believe was leading the reformy pack on that issue. Of course, some also wish to point to choice policies as the driver, noting Indiana’s presence in the mix, but similar inconsistencies undermine this argument: states with larger and smaller charter and voucher shares fall, well, all over the place in this figure (though that does warrant some additional figures at a later point).

Let’s move to 8th grade math.

Grade 8 Math 2003-2013

New Jersey and Massachusetts lead the way on 10-year gains, even though they started high, with Vermont and New Hampshire doing okay as well. Hawaii also isn’t looking bad here. But Louisiana, despite starting low, posted lackluster gains. Tennessee is right below Nevada, falling pretty much in line with expectations.

Grade 8 Math 2009-2013

From 2009 to 2013, New Jersey and Massachusetts, along with Rhode Island, Hawaii, Ohio, California and Mississippi, do pretty well. Not your most reformy mix of states regarding teacher evaluation or choice programs (except for Ohio’s charter expansion). Louisiana is still sucking it up, and Tennessee falls more or less in line with expectations.

Grade 8 Math 2011-2013

Finally, in the two-year bump on math from 2011 to 2013, the states spread out a bit more, because a two-year bump is noisier, less certain, less decisive in any way, and also less related to initial level. Here, New Jersey and Massachusetts are still about as far above expected growth as Tennessee, which for the first time jumps above expectations for grade 8 math growth. DC does creep into the picture here and posts some pretty nice gains. BUT… the issue with DC is that its average starting point is so low that it’s hard to predict accurately what its gain would likely be.
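The DC problem can be made concrete with the textbook formula for the uncertainty of a fitted regression line: the standard error of the expected gain at a starting score grows with the squared distance of that score from the average starting score. A minimal Python sketch, assuming a simple OLS trendline fit to the other states; the function name is hypothetical and no actual NAEP values are supplied.

```python
import math
from statistics import mean


def se_of_expected_gain(baseline, gains, x0):
    """Standard error of the trendline's expected gain at starting score x0.

    Uses the OLS result se = s * sqrt(1/n + (x0 - x_bar)**2 / Sxx), where s is
    the residual standard deviation with n - 2 degrees of freedom. The
    (x0 - x_bar)**2 term is why a state far below everyone else's starting
    point gets a much less certain "expected" gain.
    """
    n = len(baseline)
    x_bar, y_bar = mean(baseline), mean(gains)
    sxx = sum((x - x_bar) ** 2 for x in baseline)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(baseline, gains)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(baseline, gains))
    s = math.sqrt(ssr / (n - 2))
    return s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
```

If anything, this understates the problem for a point outside the range of the other states, since it assumes the straight line still holds out there.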

Is Tennessee’s two-year growth an anomaly? We’ll have to wait at least another two years to find out. Was it caused by teacher evaluation policies? That’s really unlikely, given that the states equally far or even further above their expectations have approached teacher evaluation in very mixed ways, while other states that took the reformy lead on teacher policies, Louisiana and Colorado, fall well below expectations.

UPDATE: Classic example of Mis-NAEP-ery

Latest NAEP school test scores suggest that school reform helps. Big improvements in DC & Tennessee, both centers of reform.

— Nicholas Kristof (@NickKristof) November 8, 2013

Here are some additional versions of the figures above, in which I have identified the states that received passing grades from Students First for “teaching”-related policies.

Clarification: The graphs below separate states whose Students First “teaching” grade point average was above 2.0 from those whose average was below 2.0.
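The comparison these graphs invite can be sketched as a simple split: average each group’s distance from the trendline (gain minus expected gain) and see whether the higher-graded states actually sit further above it. A minimal Python sketch, assuming each state’s distance from the trendline and its “teaching” GPA are already in hand as dictionaries; the function name is hypothetical, and grouping a state exactly at 2.0 with “below” is an assumption, since the clarification does not say how ties were handled.

```python
from statistics import mean


def compare_by_grade(residuals, teaching_gpa, cutoff=2.0):
    """Mean distance from the trendline for states above vs. at/below a GPA cutoff.

    residuals: state -> gain minus expected gain (positive = above trendline).
    teaching_gpa: state -> "teaching" grade point average (every state needs one).
    Assumption: a state exactly at the cutoff is grouped with "below".
    Returns (mean_above, mean_below); either is None if its group is empty.
    """
    above = [r for s, r in residuals.items() if teaching_gpa[s] > cutoff]
    below = [r for s, r in residuals.items() if teaching_gpa[s] <= cutoff]
    return (mean(above) if above else None, mean(below) if below else None)
```

A gap between the two means would only be a starting point; it would not by itself show that the graded policies caused the difference.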

Another UPDATE: Here are the trends on DC score improvements… So, in other words, are you really telling me that teacher contractual changes adopted in the last few years affected student gains starting back in 1996?