
On Short-Term Memory & Statistical Ineptitude: A few reminders regarding NAEP TUDA results

Nothin’ brings out good ol’ American statistical ineptitude like the release of NAEP or PISA data. Even more disturbing, the short window between the release of state-level NAEP results and city-level results for large urban districts lets the same mathematically and statistically inept pundits reveal their complete lack of short-term memory – memory regarding the relevant caveats and critiques of the meaning of NAEP data, and NAEP gains in particular, that were addressed extensively only a few weeks back. That was when pundit after pundit offered wacky interpretations of how recently implemented policy changes affected previously occurring achievement gains on NAEP, and of how the policies implemented in DC and Tennessee were particularly effective (as evidenced by two-year gains on NAEP) – ignoring that states implementing similar policies did not experience such gains, and that states not implementing similar policies in some cases experienced even greater gains after adjusting for starting point.

Now that we have our NAEP TUDA results, and now that pundits can opine about how DC made greater gains than NYC because it allowed charter schools to grow faster, or teachers to be more readily fired based on test scores… let’s take a look at where our big cities fit into the pictures I presented previously regarding NAEP gains and NAEP starting points.

The first huge caveat here is that any/all of these “gains” aren’t gains at all. They are cohort average score differences, which reflect differences in the composition of the cohort as much as anything else. Two-year gains are suspect for other reasons as well, perhaps relating to quirks in sampling, etc. Certainly anyone making a big deal about which districts did or did not show statistically significant differences in mean scale scores from 2011 to 2013, without considering longer-term shifts, is exhibiting the extremes of Mis-NAEP-ery!

So, here are the figures… starting with NAEP 8th grade math gains over 10 years, plotted against the initial average score in 2003.

Slide1

The relationship between 10-year gains on 8th grade math and initial average score is relatively strong. DC and LA, which appear to be getting the early applause for their reformy amazingness, pretty much fall right in line with expectations. Boston is a standout here… and Cleveland? Well… that’s a bit perplexing, but Cleveland presents perplexing data on many levels in ed policy (including some of the consistently highest school-level low-income concentrations in the nation).

The relationship for reading is not quite as strong:

Slide2

LA is lookin’ pretty good here, but starting pretty darn low – lower than DC… which, by the way, really isn’t a standout here on 10-year gains. Cleveland? Well… not a pretty sight. Other cities fall pretty much in line with expectations given their initial 2003 mean scores.

Here are the 4 year gains for math grade 8:

Slide3

DC looks a little better here… but as before, cities fall among the states in roughly their expected locations – except for Cleveland and Detroit, which seem to lag. San Diego, a relative standout on 10-year gains, lags on 4-year gains, but that’s hardly a condemnation of a city that a) has made longer-term gains and b) as of 2009 sits among the higher performing jurisdictions.

Finally, here’s the 4 year gain for reading grade 8:

Slide4

This relationship is certainly less consistent. DC falls more or less in line. Cleveland and Milwaukee aren’t lookin’ so good. San Diego is back above the line, but started and remains lower in the pack than it was on math.

Again, the big caveat here is that these aren’t “gains” but rather cohort differences. And one might suspect population change to occur more quickly in cities than in states, especially in those cases where cities have smaller overall student populations than states (setting aside those pesky low population states like VT, WY, etc.).

What to make of all this? Not much, really. Does NAEP TUDA provide broad condemnation of urban education in the U.S.? Well, only to the extent that NAEP generally provides such condemnation, since cities and states tend to fall in line with one another (but for some notable standouts). Do these data present us with obvious pictures about current policy preferences or directions? That would be hard to assert, given that these data don’t really present consistent pictures – beyond the fact that starting point matters and, as my previous post illustrated, demography matters.

This is by no means to suggest that policies and practices don’t matter, but rather that frequent, egregious misinterpretation of NAEP data provides no value-added to the policy conversation. (yeah… I said value-added!?)

SUPPLEMENTAL FIGURES

Here are a few additional figures from a few years back… it took a while to find them (they are from a project I did on poverty measurement), but they establish the rather obvious fact that these NAEP TUDA scale scores (level scores) are also associated with economic context – specifically, poverty concentration.

Slide1

Given that many of these cities are high-poverty settings, the relationship is actually tighter when I use the more stringent census poverty threshold (rather than free lunch, whose income cutoff is 130% of the poverty level), even though these city-level poverty data do not overlap completely with school district enrollments. What these data do show is that Cleveland and Detroit are simply much higher-poverty settings than the other cities in the sample (for 5- to 17-year-old children). And that is certainly relevant to both score levels and potential changes in cohort-level scores over time.

NAEP scores are from 2009

Slide2

Racial Disparities in NY State Aid Shortfalls

Yesterday, Ed Law Prof Blog posted an update about the Office for Civil Rights complaint to be filed by the Schenectady School District, claiming that shortfalls in New York State aid fall disparately by student race.

I’ve reported on numerous occasions on this blog about the patterns of disparity in New York State funding. I actually hadn’t recently checked the strength of the relationship between funding shortfalls and school district racial composition. As the Ed Law blog explains, litigation around this question (that of racially disparate impact of school funding policy) was largely headed off by the Sandoval case, which held that no private right of action exists for challenging policies that violate disparate impact regulations promulgated under Title VI of the Civil Rights Act. “Disparate impact” occurs where a policy ends up having different effects on one group versus another – by race, ethnicity, or national origin – but not necessarily because the policy is written explicitly to treat individuals differently by race. That is, it’s a statistical association with race that may not have to do directly with race. But then again, it might. That’s the hard part to prove when race isn’t written right into the policy, as it used to be, say, in the pre-Brown era. For those interested in some additional school finance reading on this topic, see:

  • Baker, B. D., & Green III, P. C. (2005). Tricks of the Trade: State Legislative Actions in School Finance Policy That Perpetuate Racial Disparities in the Post-Brown Era. American Journal of Education, 111(3), 372-413.

In the post-Sandoval era, complaints regarding policies that yield racially disparate impact are to be brought as administrative claims, through the relevant federal agencies/departments, just as Schenectady has done here (as elaborated in Ed Law Prof Blog).

So today’s big question is just how bad are the racial disparities in state aid shortfalls in New York State?

Is Schenectady right?

First, let’s define state aid shortfall. As I’ve explained in previous posts, New York operates a foundation aid formula, which defines the per pupil amount of funding required for each district – given its location (labor market) and students (needs) – in order to achieve adequate outcomes (this formula being the state’s own proposed remedy to previous state litigation over the adequacy of funding). So, in step one, the state calculates the adequate funding target:

1) Sound Basic Funding Target = base funding figure x pupil need index x regional cost index x aidable pupil count

Where that “aidable pupil count” figure includes some additional adjustments.

Step two determines the amount the local district should contribute to the sound basic target funding and thus, the remaining amount to be contributed as state aid.

2) State Aid = Sound Basic Funding Target – Local Contribution

But the problem is that New York has, in nearly every year since proposing this remedy to past litigation, added a few more steps to the calculation, which include:

  1. freezing foundation funding to levels from several years prior
  2. invoking the deceptively named “Gap Elimination Adjustment” to inflict disproportionate cuts on needier districts 
  3. enforcing local property tax limits that effectively prohibit districts from making up their losses in state aid – and effectively prohibit districts from even coming close to achieving the level of funding the state itself has declared as constitutionally adequate. Notably, the aid shortfalls are so extreme that low wealth districts really couldn’t ever tax themselves locally enough to make up the losses even if they tried.

Point #3 above is the subject of a separate lawsuit challenging the absurdity of invoking a policy that would prohibit, even if possible, districts from raising the level of funding the state itself declares as adequate but refuses to provide.

So, after the additional freezes and cuts are invoked, we can determine the state aid gap as follows:

State Aid Shortfall = State Aid to Achieve Sound Basic Funding Target – Actual State Aid after Freeze and Gap Elimination Adjustment
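For the spreadsheet-inclined, here’s a minimal sketch of that arithmetic in Python. Every dollar figure, index, and pupil count below is a hypothetical illustration (not an actual aid-run value), and the function names are mine, not the state’s:

```python
# Minimal sketch of the aid calculation and shortfall described above.
# All dollar figures, indices, and pupil counts are hypothetical.

def sound_basic_target(base_funding, pupil_need_index, regional_cost_index,
                       aidable_pupils):
    """Step 1: the formula's adequacy target for a district."""
    return base_funding * pupil_need_index * regional_cost_index * aidable_pupils

def formula_state_aid(target, local_contribution):
    """Step 2: the state share is the target minus the expected local share."""
    return target - local_contribution

# A hypothetical high-need district in an average-cost region.
target = sound_basic_target(base_funding=6_500, pupil_need_index=1.6,
                            regional_cost_index=1.0, aidable_pupils=9_500)
aid_owed = formula_state_aid(target, local_contribution=40_000_000)

# The freeze and "Gap Elimination Adjustment" reduce what is actually paid.
aid_paid = aid_owed - 12_000_000  # hypothetical freeze + GEA reduction

shortfall_per_pupil = (aid_owed - aid_paid) / 9_500
print(f"Shortfall: ${shortfall_per_pupil:,.0f} per pupil")  # roughly $1,263 here
```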

And just how related to race are those aid shortfalls? Well, here it is, based on the 2013-14 State Aid Runs merged with demographic data from the 2012 NYSED School Report Cards:

Race & NY State Funding

Previously, I’ve shown that these aid shortfalls are pretty strongly associated with the state’s own Pupil Need Index with higher need districts facing larger shortfalls. And racial composition is associated with the pupil need index, if we focus on traditionally disadvantaged racial aggregate classifications (which is a whole separate can of worms).

To summarize the graph above – which visually displays only those districts with more than 2,000 pupils, but includes all districts (weighted for enrollment) in the statistical estimates – it is certainly the case that New York State districts with higher concentrations of black or Hispanic children face greater state aid shortfalls.

There is indeed a racially disparate impact.

Moreover, that impact is pretty darn big. Moving from a district with 0% black or Hispanic children to one with 100% black or Hispanic children yields a difference in funding gap of over $2,000 per pupil.

Many of the state’s highest minority concentration districts have state aid shortfalls between $5,000 and $10,000 per pupil whereas NONE of the lowest minority concentration districts has an aid shortfall over $5,000 per pupil!

And these state aid shortfalls are shortfalls against the State’s own (paltry, low-ball) estimates of what it might have taken to achieve the now dated outcome standards of 2007 (under previous litigation)!

UPDATE:

Here’s a quick multivariate run of the data to determine whether otherwise similar districts with more minority children have bigger funding gaps, where “otherwise similar” is defined with respect to components of the formula itself – the Regional Cost Index, the Pupil Need Index, and the additional weights included in the Total Aidable Foundation Pupil Unit count.

funding disparate impact

Somewhat surprisingly, in this regression, the racially disparate impact is actually larger than when represented only as a bivariate relationship between funding gaps and race. I’d have expected the Pupil Need Index to substantially moderate the relationship between race and funding gap. But it is also likely that within any region, the funding gaps are more disparate by race than they appear statewide, because many of the high-minority districts are in higher-cost regions.
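For anyone who wants to replicate this kind of check, here’s a rough sketch of such a regression, assuming a pandas dataframe `df` with one row per district; the column names are hypothetical stand-ins for the actual aid-run and Report Card fields:

```python
# Sketch of the multivariate check described above. Column names are
# hypothetical; districts are weighted by enrollment, as in the text.
import statsmodels.formula.api as smf

model = smf.wls(
    "aid_shortfall_pp ~ pct_black_hisp + pupil_need_index"
    " + regional_cost_index + tafpu_weight",
    data=df,
    weights=df["enrollment"],
).fit()
print(model.summary())
# If pct_black_hisp remains large and significant with the formula's own
# components held constant, the gap is not merely proxying measured need.
```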

Petrilli’s Hammer & the “poverty has nothing to do with PISA” argument

Mike Petrilli over at the Thomas B. Fordham Institute has made his case for why differences in national economic context do little to substantively explain variations in PISA scores.

He frames his argument in terms of Occam’s Razor – as if to sound well informed and deeply intellectual – setting the stage to share a profound logical argument, summarized as follows:

“among competing hypotheses, the hypothesis with the fewest assumptions should be selected.”

Petrilli asserts that while some might perceive a modest association (actually, it’s pretty strong) between national economic context and average tested outcomes in math, for example… like this…

Slide1

…that it is entirely illogical to assert that child poverty has anything to do with national aggregate differences in math performance at age 15.

That is, the various assumptions that must be made to accept this crazy assertion – that economic context matters in math performance – simply don’t hold water in Petrilli’s mind. Rather, the answer must be much simpler and lie in the classroom, with our good ol’ American ineptitude at teaching math.

As Petrilli concludes in his post:

So what’s an alternative hypothesis for the lackluster math performance of our fifteen-year-olds? One in line with Occam’s Razor?

Maybe we’re just not very good at teaching math, especially in high school.

Accepting the bad math teaching conclusion simply requires fewer tricky assumptions than asserting any role for economic context in determining national aggregate outcomes.

Let’s call this Petrilli’s Hammer! – an illogical, blunt & necessarily under-informed alternative to Occam’s Razor. When in doubt – when too lazy to develop a disciplined understanding of the field on which you choose to opine, and when data are just too hard to handle – grab that hammer and everything can look like a nail! (e.g., the bad teacher conclusion)

These two quotes frame Petrilli’s argument:

First, one must assume that math is somehow more related to students’ family backgrounds than are reading and science, since we do worse in the former. That’s quite a stretch, especially because of much other evidence showing that reading is more strongly linked to socioeconomic class. It’s well known that affluent toddlers hear millions more words from their parents than do their low-income peers. Initial reading gaps in Kindergarten are enormous. And in the absence of a coherent, content-rich curriculum, schools have struggled to boost reading scores for kids coming from low-income families.

AND

So the second assumption must be that “poverty” has a bigger impact on math performance for fifteen-year-olds than for younger students. But I can’t imagine why. If anything, it should have less of an impact, because our school system has had more time to erase the initial disadvantages that students bring with them into Kindergarten.

The problem is that both of these statements are a) conceptually foolish and b) statistically ignorant.

Let’s tackle the second issue conceptually first. These scores for 15-year-olds are performance-level – or status – scores. Status scores reflect the cumulative effects of schooling and family background. Most notably in this case, status scores – math performance at age 15 – reflect the cumulative influences of poverty: living in poverty, growing up in poverty, lacking resources over long periods of one’s early life.

Here’s some more reading on poverty timing and cumulative effects.

And then there’s this report which I prepared last summer with ETS.

So… setting measurement issues aside here, we can logically expect gaps between lower- and higher-income kids to grow between earlier-grade assessments and later-grade assessments – if we choose to do little or nothing in policy terms about the circumstances under which these children live. Yes, we can and should leverage resources in schools to offset these gaps. But we’re not necessarily applying those resources either.

Accepting Petrilli’s second point above requires that we ignore entirely that our school system remains vastly disparate in many states and locations between rich and poor communities and reinforces (rather than erasing) the initial disadvantages that students bring with them to Kindergarten.

Now, backing up to his first point: Petrilli argues that if higher-poverty settings do worse relative to lower-poverty settings on math than on reading assessments, there must be a simple answer for the math disparity – like bad math teaching, of course. There can be no logical explanation for why math scores might be more sensitive than reading scores to poverty variation. Assuming bad math teaching to be the reason for greater disparity in math than in reading is much simpler than exploring why math test scores might appear more sensitive to context and poverty than reading scores. This is true because we all know that poverty affects reading more than math – or so Mike says, without citation to any legitimate source validating his point.

This one is pretty simple. First, it may simply be the case that Mike Petrilli is wrong on all levels here – that, conceptually and statistically, economic deprivation has a stronger effect on numeracy than on literacy. But even accepting the idea that poverty substantively affects literacy more doesn’t mean that we’d find a stronger statistical relationship between a) variations in poverty across settings and b) variations in measured outcomes across settings. The fact is that variation in math assessments is often simply more predictable. Math scores may be both more stable/consistent and may actually have more variation to predict.

Empirical Illustrations

I’m going to use state level NAEP data within the US here to provide statistical illustrations for the rather simple flat-out-wrongness of Mike Petrilli’s Hammer.

The following illustrations simply reveal how data of this type tend to play out – something anyone reasonably well versed in using assessment data alongside economic data, at various levels of aggregation, would understand. Some of these patterns reflect conceptually sound underlying hypotheses, and some may simply be artifacts of typical issues in the measurement of student outcomes at different ages and in different subjects.

So, for our first question we ask whether it can possibly be the case that there exists greater disparity in math outcomes in 8th grade than in 4th grade across US states of varying degrees of poverty (setting aside the substantive explanations for why such gaps increase).

Now, careful here, this one requires using a little algebra – slope/intercept analysis. The first figure here shows the variation in NAEP math outcomes for 8th graders and for 4th graders, both in 2013.

This figure shows us, first of all, that 8th grade math scores are more predictably disparate as a function of poverty than are 4th grade math scores. For 8th grade, poverty alone explains 63% of the cross-state variation in math scores, but marginally less (59%) for 4th grade.

The figure also shows us that by 8th grade, each additional 1% poverty is associated with a 1.13-point lower state average scale score, whereas in 4th grade, a 1% higher poverty rate is associated with only a .83-point lower state average scale score. That is, the negative slope is steeper for 8th grade than for 4th.
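For readers who want to reproduce this sort of comparison, here’s a minimal sketch, where `poverty`, `naep_g4`, and `naep_g8` stand in for state-level arrays of poverty rates and mean scale scores:

```python
# Sketch of the slope/intercept comparison: regress state NAEP means on
# state child-poverty rates for each grade, then compare slopes and R².
from scipy import stats

def poverty_gradient(poverty, scores):
    fit = stats.linregress(poverty, scores)
    return fit.slope, fit.rvalue ** 2

slope4, r2_4 = poverty_gradient(poverty, naep_g4)
slope8, r2_8 = poverty_gradient(poverty, naep_g8)

# Per the figures above: roughly slope -0.83, R² = .59 in grade 4,
# versus slope -1.13, R² = .63 in grade 8 — a steeper, more predictable
# poverty gradient by 8th grade.
print(f"Grade 4: slope={slope4:.2f}, R²={r2_4:.2f}")
print(f"Grade 8: slope={slope8:.2f}, R²={r2_8:.2f}")
```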

There can be many, many reasons for this. Among those reasons might be that as time goes on, cumulative poverty-related deficits do increase: persistent disadvantage makes gaps grow. It may also be a measurement issue, pertaining to the precision with which mathematics knowledge and skill are measured, or it may even be an issue of the stability and predictability of tests on early-grade math content given to 9-year-olds versus tests on material like algebra and pre-algebra given to older, hopefully more mature kids (who’ve also taken far more tests by that time).

But, instead of gettin’ all thoughtful about these possibilities and arming ourselves with well-conceived arguments grounded in data and knowledge of the literature, we could simply use Petrilli’s Hammer to assert that the one and only logical answer is that math teachers in high poverty states like Alabama and Mississippi suck and math teachers in low poverty states like New Jersey and Massachusetts rock!  It’s bad math teaching that is making this negative slope get worse between grade 4 and grade 8 – bad math teaching exclusively in high poverty states!

Is there greater disparity in Grade 8 Math than in Grade 4 Math by Contextual Poverty?

Slide2

The next question then is how can it ever be that math scores might be more disparate as a function of poverty when we all know that poverty affects reading more?

The next figure shows the relationship between poverty by state and math and reading scores in grade 4. Rather amazingly, math scores are more predictable as a function of poverty than are reading scores – note the difference in variance explained (r-squared). Now, (almost) anyone who has ever plotted reading and math “level” (status) scores, or even estimated value-added scores for reading and math in relation to poverty or nearly any other covariate, knows that this is common. Variation in math scores – level or value-added – is often much more predictable than variation in reading scores. As above, this may be for many, many reasons. Maybe we’re just not as good on the measurement side at teasing out differences in underlying reading skill, with either 9- or 14-year-olds?

That math scores are more predictably a function of poverty than reading scores – across states – doesn’t mean that our math teaching is better or worse than our reading teaching. Even though the math scores at 4th grade are more predictable than the reading scores, the reading slope appears slightly more disparate (steeper negative). And that doesn’t mean either that our reading teaching is more disparate, or that the 4th grade scores are picking up some differential on the baggage kids bring to school with them. It’s a statistical artifact of the data – based on how math and reading are being measured. It may mean something, but who knows what? It may mean absolutely nothing.

Are Grade 4 Math Scores more predictably a function of poverty than Grade 4 Reading Scores across contexts?

Slide3

Finally, here’s the 8th grade math and reading. Here, math is marginally more predictable as a function of poverty and math outcomes are more disparate as a function of poverty.

At least by these measures – NAEP math and reading scores, aggregated to the state level, which is similar to making national comparisons – reading is NOT, as Petrilli so confidently argues above, “more strongly linked to socioeconomic class” than math.

International comparisons work much the same.

What about Grade 8 Math and Reading?

Slide4

Indeed, Petrilli is attempting to assert that there exists an incongruity between the data and the underlying reality – that yes, reading scores are affected by poverty, but math not so much.  Thus, if the data show that math scores are more affected by poverty than are reading scores, then something much more nefarious must be going on – Yes – the bad teacher/teaching problem!

It couldn’t possibly have anything to do with measurement issues or the significant possibility that the full range of student outcomes measured are similarly affected by economic deprivation.  That would just be way too much to swallow.

But, if we want to go there… if we want to accept Petrilli’s argument that there’s simply no excuse for U.S. students to fall where they do on international math comparisons, because poverty doesn’t affect 15 year olds or math, only younger kids and reading, then we must apply Petrilli’s hammer to state-by-state comparisons as well.

And thus we logically conclude that math teaching in DC, MS, AL, and LA stinks, and math teaching in NJ, MA, VT, and NH is great! And that poverty really has nothing to do with it?

Ignoratis Paradox

Graph of the Day: My contribution to PISA Palooza

With today’s release of PISA data it is once again time for wild punditry, mass condemnation of U.S. public schools and a renewed sense of urgency to ram through ill-conceived, destructive policies that will make our school system even more different from those breaking the curve on PISA.

With that out of the way, here’s my little graphic contribution to what has become affectionately known to the edu-pundit class as PISA-Palooza. Yep… it’s the ol’ “poverty as an excuse” graph – well, really it’s the ol’ “poverty in the aggregate just so happens to be pretty strongly associated with test scores in the aggregate” graph… but that’s nowhere near as catchy.

pisa_palooza

PISA Data: http://nces.ed.gov/pubs2014/2014024_tables.pdf

(table M4)

OECD Relative Poverty: Provisional data from the OECD Income Distribution and Poverty Database (www.oecd.org/els/social/inequality).

Yep – that’s right… relative poverty – or the share of children in families below 50% of median income – is reasonably strongly associated with Math Literacy PISA scores. And this isn’t even a particularly good measure of actual economic deprivation. Rather, it’s the measure commonly used by OECD and readily available. Nonetheless, at the national aggregate, it serves as a pretty strong correlate of national average performance on PISA.

What our little graph tells us – not that it’s really that meaningful – is that if we account (albeit poorly) for child poverty, the U.S. is actually beating the odds. Way to go? (but for that really high poverty rate)

Bottom line – economic conditions matter, and simple rankings of countries by their PISA scores aren’t particularly insightful (and the above graph is only marginally more insightful). Further, comparing cities in China to entire nations is a particularly silly approach.

Additional Readings:

Coley, R., & Baker, B.D. (2013). Poverty and Education: Finding the Way Forward. ETS Center for Research on Human Capital and Education. Princeton, NJ: Educational Testing Service.

http://www.ets.org/s/research/pdf/poverty_and_education_report.pdf

Baker, B.D., & Welner, K.G. (2011). Productivity Research, the U.S. Department of Education, and High-Quality Evidence. Boulder, CO: National Education Policy Center. Retrieved [date] from
http://nepc.colorado.edu/publication/productivity-research


Where are the most economically disproportionate charter schools? (& why does it matter?) UPDATED

UPDATED: It seems that Mike Petrilli on Twitter takes issue with my reference to the schools below as “segregated.” In his view, if a city includes some charter schools that have more of a 50/50 balance of low-income and non-low-income kids, those are the integrated schools, even if they achieve their balance by creaming off the non-low-income kids in a district that is 80% low income. Petrilli seems to suggest that it is necessarily a good thing if charters can create a balanced population for themselves, even if they create an imbalanced population (an even more intense concentration of poverty) for the system as a whole. Notably, one question the data below leave unanswered is the extent to which the creation of economically non-representative charters in a city can help retain some middle-class families that might not otherwise have sent their children to the district schools. Certainly, there exists at least some evidence that Catholic school enrollments have suffered from charter expansion. It seems far less likely that these charters are recruiting higher-income children into the city from neighboring districts. To suggest that a majority, or even a large share, of non-low-income students in charters were retained (but would otherwise have left the public system), brought in from lower-poverty neighboring suburbs, or siphoned from private schools, is a huge stretch – a smokescreen. It remains most likely that the vast majority of the sorting displayed herein is internal to the public-charter system and unlikely to cross school district or city boundaries. [more below]

In this first of several posts, I explore economic variation in charter enrollments in the states of Massachusetts, New Jersey and Connecticut.

I’m taking a fairly simple, easily replicable approach here and encourage any data savvy readers to take their own shot at it. For this analysis I’m using the most recent three years of non-preliminary school level enrollment data from the National Center for Education Statistics Common Core of Data, Public School Universe Survey.

http://nces.ed.gov/ccd/pubschuniv.asp

I’m using only a handful of variables here:

  • City of location (lcity)
  • Total school enrollment (member)
  • Total number of free lunch qualified children (frelch)
  • Charter school indicator (chartr)

For each year of the data, I sum the enrollment of all schools in the city of location, including charters and district schools and magnets or other special schools. That gives me the total number of all kids enrolled in a city (yeah… it’s a little messy in that some cities include schools that also enroll kids from outside the city – I limit the final lists to large enough enrollment areas where such cases should not substantively distort final numbers). I do the same for kids qualified for free lunch. So, I have:

  • City Total Enrollment
  • City Free Lunch Enrollment

Note that this is by city, not host district, but city is a relevant geographic unit for many reasons, including the fact that many US cities are actually carved into multiple segregated public school districts. Part of the point here is to run a quick-and-dirty summary with the publicly available, readily useable data.

Next, I determine each charter school’s market share:

  • School market share = school enrollment/city enrollment

And then each school’s share of low income kids served:

  • School free lunch share = school free lunch / city free lunch

If a school was serving a representative population by low income status, then the free lunch share for the school would equal the market share for the school. That is, the school would be serving both X% of total enrollment and X% of low income kids. I use a simple disparity ratio here:

  • School free lunch share / school market share

If the disparity ratio is say, .50, then the charter school is serving only half as many low income kids as would be proportional for that school.

To make the final data set manageable, I focus on charter schools in cities where the aggregate enrollment is greater than 10,000. And to have more stable numbers, 1) I use only those charters with at least a 1% market share, and 2) I use a three-year average (2009 to 2011).
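For the data-savvy readers mentioned above, here’s a minimal sketch of the computation, assuming the CCD rows have been loaded into a pandas dataframe. The column names match the variables listed earlier, but the charter-flag coding and file layout are simplified assumptions (actual CCD layouts vary by year):

```python
# Sketch of the disparity-ratio computation described above.
import pandas as pd

def charter_disparity(df: pd.DataFrame) -> pd.DataFrame:
    # City totals across ALL schools: district, charter, magnet, etc.
    city = (df.groupby("lcity")[["member", "frelch"]].sum()
              .rename(columns={"member": "city_enroll",
                               "frelch": "city_frelch"}))
    df = df.join(city, on="lcity")

    # Each school's share of the city's enrollment and of its free-lunch kids.
    df["market_share"] = df["member"] / df["city_enroll"]
    df["frelch_share"] = df["frelch"] / df["city_frelch"]

    # Disparity ratio: .50 means the school serves half its proportional
    # share of low-income kids.
    df["disparity"] = df["frelch_share"] / df["market_share"]

    # Filters used in the post: charters only, big cities, non-trivial share.
    keep = ((df["chartr"] == "1")          # assumed coding for "charter"
            & (df["city_enroll"] > 10_000)
            & (df["market_share"] >= 0.01))
    return df.loc[keep, ["lcity", "member", "disparity"]]
```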

So, let’s have at it. Here are the ratios for Connecticut schools:

Slide4

All but two CT charters underserve low-income students in these data. Four are under 70%. Park City, Jumoke, and AF Bridgeport are particularly egregious examples!

Here’s Massachusetts:

Slide5

Many Boston area schools are excluded from the above table on the basis that what outsiders generally think of as “Boston” is actually carved into many smaller city areas, many of which fell under my 10,000 aggregate enrollment threshold. I will report additional data on these areas at a later date.

And finally, New Jersey:

Slide6

Unfortunately, in this last figure we actually lose some of New Jersey’s most economically disproportionate charter schools – those in Hoboken – because the city fell under the aggregate enrollment threshold.

Why does this matter?

There exist at least two reasons why it matters to pay close attention to just how different charter schools are from their surroundings – that is, if and when they are. First, better understanding the demographic differences of charter schools – or any school, for that matter – provides a useful backdrop for claims of chartery miracles. Second, the demography of charters in their local contexts, and demographic shifts induced by choice programs (or attendance boundary reconfiguration, for that matter), have implications for schools on both ends – sending and receiving.

1. Claims of reformy miracles

I don’t know how many times I’ve come across tweets and blog posts, for example, talking about how BASIS charter schools in Arizona are better than Singapore or Shanghai, or even Finland. And that, since we all know Arizona is a high poverty state, BASIS must be serving low income kids, and thus achieving some transferable miracle.

If we put BASIS into a scatterplot among Arizona schools – plotting its % free or reduced-price lunch share against math performance expressed as a national percentile rank – we get this picture:

Slide2

Here, BASIS looks rather not-so-miraculous. In fact, it’s right about where one would expect given the students it serves.

Likewise, schools like Robert Treat Academy and North Star Academy often receive praise for their outcomes in New Jersey. Here’s where they lie when we take into account free lunch shares alone (and use general test-taker outcomes to reduce special ed and ELL effects).

Slide1

Both are near where one would expect them to be given their students. In fact, many more Newark Public Schools district schools deviate positively – and more positively – from expectations than either of these “miracle” schools.

2. Effects on the system as a whole

As I’ve shown in several previous posts (like this one), when charter schools (or districts’ own magnet schools) siphon off lower-need students, they leave behind higher-need students. Just as the concentration of lower-need students in charter or magnet schools may provide an advantageous peer group influence for those involved, the concentration of higher-need students left behind in district or other charter schools has adverse peer group effects. Similar concerns arise with neighborhood-level sorting of children and families. The policy goal is to figure out how best to manage student sorting so as not to exacerbate these problems via under-regulated choice programs (with incentives to cream-skim).

Regulation need not take the form of requiring all charter (or district magnet) schools to serve proportionate shares of specific populations (by race, economic status, or disability). The reality is that some charter schools, like districts’ own magnet schools, may work better with some populations than others, and thus forcing them to serve a population they are ill-equipped to serve is productive for neither the school nor the child.

However, where charter (or magnet) success depends on ability to serve a select population, alternative policy constraints like growth caps may be in order, to restrain otherwise parasitic tendencies.

Thus far, however, unfettered, largely parasitic charter growth continues to have the potential to do much more harm than good in the long run.

UPDATE

Some have pointed out that the charter sector in these states appears relatively “balanced” overall. Thus, what’s the harm? They merely introduce heterogeneity based on the preferences of individual parents on behalf of their children. The problem is that charter enrollment behavior seems to vary substantially by city. So statewide averages, or statewide distributions, can mask real local-level problems. For example, in New Jersey, most of the charter schools in Trenton over-enroll low-income kids, while on average in Newark, they under-enroll. That charters in Trenton over-enroll low-income kids does not help the Newark situation, though it does raise different questions for Trenton. Notably, when CREDO conducted its study of charter school effects in New Jersey, the identified positive effect came entirely from Newark, whereas charters elsewhere in the state underperformed.

Here are a few additional slides showing the city level aggregate disproportionality for the states above. Note that there may be a few cases where charter operators submitted the WRONG information about their “city of location” to their state, for the national data. In which case, a charter may show up in a city where it keeps its management office rather than where it runs its school. Don’t blame me for wrong addresses in the data. Blame those who submitted their information WRONG!

Here’s NJ, where the greatest aggregate disproportionality is in Princeton. And to those arguing that charters merely create more balance than the district can – that is NOT the case in Princeton, NJ. Note that the net disproportionality in Newark is about 84%. Thus, while there is heterogeneity, with some schools overserving low-income kids, there are enough schools underserving low-income kids, and by a large enough margin, that the net effect is that charters in Newark are underserving. Some other smaller towns with single charters stand out… Camden is approximately balanced between charter and district schools, and Trenton has a higher concentration of low-income kids in charters. Statewide, NJ is relatively balanced.

Slide7

Here’s Massachusetts, which on average is imbalanced, with significant disproportionality in locations like Dorchester, which is home to many charters. Charters within the city of Boston itself are more balanced.

Slide8

Here’s Connecticut, which on average is also imbalanced.

Slide9

Another point that has been raised – related to the issue of charters attracting suburbanites and retaining “wealthier” families who might not otherwise stay in the cities and send their kids to the schools – is the argument that these most disproportionate charters likely represent their neighborhoods within the cities, and the schools around them. First, as I explain in the comments below, this apparent skimming pattern isn’t so much a function of some charters serving wealthy populations (not so much a Princeton problem), but rather a function of charters in otherwise poor neighborhoods skimming off the less poor from surrounding neighborhoods and schools. Indeed, the other scenario likely exists in a few select cases. But having reviewed numerous maps of charter locations and demography, I don’t suspect that’s the norm. Here are a few maps for illustration.

Here are Newark charters:

Slide11

Note, for example, that Robert Treat Academy stands out like a sore thumb. And even TEAM, which is more representative than other Newark charters, sticks out in its context (a yellow circle surrounded by red ones). So too does Greater Newark, which is surrounded both by higher-poverty district schools and higher-poverty other charters.

Here’s Hartford, CT, where nearly every other district school – except for the magnet schools – is a red circle, serving very high poverty concentrations.

Hartford Charters

But Hartford is wonderfully illustrative of the fact that some districts also impose on themselves a significant degree of economic segregation. Hartford’s Capital Prep is as disproportionate in low-income enrollment as Jumoke and Achievement First. But none – none – of the district’s regular public schools, including those right next door, serve such low shares of kids qualified for free lunch.

Comments on NJ’s Teacher Evaluation Report & Gross Statistical Malfeasance

A while back, in a report from the NJDOE, we learned that outliers are all that matters. They are where life’s important lessons lie! Outliers can provide proof that poverty doesn’t matter – proof that high-poverty schools, with a little grit and determination, can kick the butts of low-poverty schools. We were presented with what I, until just the other day, might have considered the most disingenuous, dishonest, outright corrupt graphic representation I’ve seen… (with this possible exception)!

Yes, this one:

Slide5

This graph was originally presented by NJ Commissioner Cerf in 2012 as part of his State of the Schools address. I blogged about this graph and several other absurd misrepresentations of data in the same presentation here & here.

Specifically, I showed before that the absurdly selective presentation of data in this graph completely misrepresents the actual underlying relationship, which looks like this:

Slide6

Yep, that’s right: % free or reduced-price lunch alone explains 68% of the variation in proficiency rates between 2009 and 2012 (okay, that’s one more year than in the misleading graph above, but the pattern is relatively consistent over time).

But hey, it’s those outliers that matter, right? It’s those points that buck the trend that really define where we want to look… what we want to emulate? Right?

Actually, the supposed outliers above are predictably different, as a function of various additional measures that aren’t included here. But that’s a post for another day. [and discussed previously here]

THEN came the recent report on progress being made on teacher evaluation pilot programs, and with it, this gem of a scatterplot:

Slide1

This scatterplot is intended to represent a validation test of the teacher practice ratings generated by observations. As reformy logic tells us, an observed rating of a teacher’s actual classroom practice is only ever valid if those ratings are correlated with some measure of test score gains.

In this case, the scatterplot is pretty darn messy looking. Amazingly, the report doesn’t actually present either the correlation coefficient (r) or coefficient of determination (r-squared) for this graph, but I gotta figure in the best case it’s less than a .2 correlation.

Now, state officials could just use that weak correlation to argue that “observations BAD, SGP good!” which they do, to an extent. But before they even go there, they make one of the most ridiculous statistical arguments I’ve seen, well… since I last wrote about one of their statistical arguments.

They argue – in picture and in words above – that if we cut off points from opposite corners of a nearly random distribution – lower right and upper left – there otherwise exists a pattern. They explain that “the bulk of the ratings show a positive correlation” but that some pesky outliers buck the trend.

Here’s a fun illustration. I generated 100 random numbers and another 100 random numbers, both normally distributed, and then graphed the relationship between the two:

Slide2

And this is what I got! The overall correlation between the first set of random numbers and the second set was .03.

Now, applying NJDOE Cerfian outlier exclusion, I exclude those points where X (first set of numbers) > .5 and Y (second set) < -.5 [lower right], and similarly for the upper left. Ya’ know what happens when I cut off those pesky supposed outliers in the upper left and lower right? The remaining “random” numbers now have a positive correlation of .414! Yeah… when we chisel a pattern out of randomness, it creates… well… sort of… a pattern.

Mind you, if we cut off the upper right and lower left, the bulk of the remaining points show a negative correlation. [in my random graph, or in theirs!]
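Anyone can replicate this little demonstration in a few lines. Exact correlations will vary with the random draw, but the qualitative jump – from roughly zero to clearly positive – is the point:

```python
# Two independent random normal series, then "Cerfian" corner-trimming.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
print("raw correlation:", np.corrcoef(x, y)[0, 1])  # near zero

# Drop the lower-right (x > .5, y < -.5) and upper-left (x < -.5, y > .5).
keep = ~(((x > 0.5) & (y < -0.5)) | ((x < -0.5) & (y > 0.5)))
print("after corner-trimming:", np.corrcoef(x[keep], y[keep])[0, 1])  # clearly positive
```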

But alas, the absurdity really doesn’t even end there… because the report goes on to explain how school leaders should interpret this lack of a pattern – which, after reshaping, is really kind of a pattern – that isn’t.

Based on these data, the district may want to look more closely at its evaluation findings in general. Administrators might examine who performed the observations and whether the observation scores were consistently high or low for a particular observer or teacher. They might look for patterns in particular schools, noting the ones where many points fell outside the general pattern of data. These data can be used for future professional development or extra training for certain administrators. (page 32)

That is, it seems that state officials would really like local administrators to get those outliers in line – to create a pattern where there previously was none – to presume that the reason outliers exist is that the observers were wrong, or at least inconsistent in some way. Put simply: the SGPs are necessarily right and the observations wrong, and the way to fix the whole thing is to make sure that future observations better correlate with the necessarily valid SGP measures.

Which would be all fine and dandy… perhaps… if those SGP measures weren’t so severely biased as to be meaningless junk. 

Slide4

Yep, that’s right – SGPs, at least at the school level, and thus by extension at the underlying teacher level, are:

  1. higher in schools with higher average performance to begin with in both reading and math
  2. lower in schools with higher concentrations of low income children
  3. lower in schools with higher concentrations of non-proficient special education children
  4. lower in schools with higher concentrations of black and Hispanic children

So then, what would it take to bring observation ratings in line with SGPs? It would take extra care to ensure that ratings based on observations of classroom practice, regardless of the actual quality of that practice, were similarly lower in higher-poverty, higher-minority schools, and higher in higher-performing schools. That is, let’s just make sure our observation ratings are similarly biased – similarly wrong – so that they correlate. Then all of the wrong measures can be treated as if they are consistently right???

Actually, I take some comfort in the fact that the observation ratings weren’t correlated with the SGPs. The observation ratings may be meaningless and unreliable… but at least they’re not highly correlated with the SGPs which are otherwise correlated with a lot of things they shouldn’t be.

When will this madness end?


A few quick thoughts and graphs on Mis-NAEP-ery

Update: Here are a bunch of additional graphs relating Students First Report Card grades to unadjusted and adjusted NAEP gains (hint – it’s the adjusted gains that matter, since low-performing states are able to post bigger gains, and those states also generally received higher grades from Students First). Mis_naep_ery9

Yesterday brought the release of the 2013 NAEP results, and with it a bunch of ridiculous attempts to cast those results as supporting the reform-du-jour. Most specifically, yesterday’s big media buzz was around the gains from 2011 to 2013, which were argued to show that Tennessee and Washington, DC are huge outliers – modern miracles – and that because these two settings have placed significant emphasis on teacher evaluation policy, current trends in teacher evaluation policy must be working – that tougher evaluations are the answer to improving student outcomes. Not money… not class size… none of that other stuff.

I won’t even get into all of the different things that might be picked up in a supposed swing of test scores at the state level over a 2-year period. Whether 2-year swings are substantive and important can certainly be debated (not really), but whether policy implementation can yield a shift in state average test scores in a 2-year period is perhaps even more suspect.

Setting all that aside, let’s just take a step back and look at the NAEP data: changes in scores from 2003-13, 2009-13, and 2011-13 for 4th grade reading and 8th grade math. BUT, as I’ve shown before, gains on NAEP appear correlated with starting point – lower-performing states show higher gains – so let’s condition those gains on starting point by representing them in scatterplots against starting points.
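To be concrete about what “conditioning on starting point” means here: regress each state’s gain on its starting score, and treat the residual – the distance above or below the trendline – as the adjusted gain. A minimal sketch, with `start` and `end` standing in for numpy arrays of state mean scale scores:

```python
# Sketch of "adjusted gains": regress each state's gain on its starting
# score and keep the residual (distance above/below the trendline).
from scipy import stats

gain = end - start
fit = stats.linregress(start, gain)
expected = fit.intercept + fit.slope * start
adjusted_gain = gain - expected  # > 0 means beating expectations given start
```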

Here are the figures. In some of the figures below, I’ve cut out Washington, DC because it is such a low-performing outlier. It does creep into the picture as its scores rise. But this is a rise over the longer haul, beginning well before teacher evaluation reforms.

If teacher evaluation reform (or expanded choice, etc.) has caused great NAEP gains, then the graphs below should show that, especially from the pre-RTTT baseline year of 2009 to 2013, states adopting RTTT-style teacher eval policies rise above the trendline – but not those curmudgeonly states that have lagged in such reform efforts.

Grade 4 Reading, 2003-2013

Slide1

Over the 10 year period, Maryland is the miracle state in 4th grade reading. Matt Di Carlo has pointed this out in the past. Florida does okay, and Alabama is also a standout. New Jersey and Massachusetts – both initial high performers also exceed expectations given starting point. Louisiana falls right on the line.

Grade 4 Reading, 2009-2013

Slide2

From 2009-2013, Maryland remains the standout. Georgia, Washington, Utah and Minnesota also do pretty darn well, and yes… Tennessee is in that next batch, but even Wyoming and New Hampshire beat expectations by more, having started higher. Louisiana beats expectations. In any case, it’s hard to make a case that from 2009 to 2013, states that moved most aggressively on teacher evaluation are those that showed greatest gains.

Grade 4 Reading, 2011-2013

Slide3

On the recent 2-year bump, Tennessee and DC do quite well, but so too does Minnesota. Colorado (another teacher eval state) does pretty well on this one. This graph may provide the “best” (albeit painfully weak, suspect and short term) “evidence” for teacher eval states – well – except for Minnesota, which I don’t believe was leading the reformy pack on that issue. Of course there are also those who wish to point to choice policies as the driver – noting Indiana’s presence in the mix – but similar inconsistencies undermine this argument (with larger and smaller charter and voucher share states falling, well, all over the place in this figure – but that does warrant some additional figures at a later point.)

Let’s move to 8th grade math.

Grade 8 Math, 2003-2013

Slide4

New Jersey and Massachusetts lead the way on 10-year gains – even though they started high – with Vermont and New Hampshire doing okay as well. Hawaii also isn’t looking bad here. But Louisiana, despite starting low, posted lackluster gains. Tennessee is right below Nevada – falling pretty much in line with expectations.

Grade 8 Math, 2009-2013

Slide5

From 2009 to 2013, New Jersey and Massachusetts, along with Rhode Island, Hawaii, Ohio, California, and Mississippi, do pretty well. Not your most reformy mix of states – regarding teacher evaluation or choice programs (but for Ohio’s charter expansion). Louisiana is still sucking it up, and Tennessee falls more or less in line with expectations.

Grade 8 Math, 2011-2013

Slide6

Finally, in the much noisier two-year bump on math from 2011 to 2013, we get a little more spreading out – because a two-year bump is noisier, less certain, less decisive in any way, and also less related to initial level. Here, New Jersey and Massachusetts are still about as far above expected growth as Tennessee, which for the first time jumps above expectations for grade 8 math growth. DC does creep into the picture here and posts some pretty nice gains. BUT… the issue with DC is that its average starting point is so low that it’s hard to predict accurately what its gain would likely be.

Is Tennessee’s 2-year growth an anomaly? We’ll have to wait at least another two years to figure that out. Was it caused by teacher evaluation policies? That’s really unlikely, given that the states equally or even further above their expectations have approached teacher evaluation in very mixed ways, and other states that had taken the reformy lead on teacher policies – Louisiana and Colorado – fall well below expectations.

UPDATE: Classic example of Mis-NAEP-ery

Here are some additional versions of the figures above, in which I have identified the states that received passing grades from Students First for “teaching” related policies.

Clarification: The graphs below separate states that received above/below a “teaching” grade point average of 2.0 from Students First.

Slide1 Slide2 Slide3 Slide4

Another UPDATE: Here are the trends on DC score improvements. So, in other words, are you really telling me that teacher contractual changes adopted in the last few years affected student gains starting back in 1996?

Slide1 Slide2 Slide3 Slide4

Failure is in the Eye of the Political Hack: Thoughts & Data on NJ Failure Factories & NOLA Miracles

We all know – from the persistent blather emanating from reformy-land – that some common truths exist in education policy.

Among those truths are that New Jersey’s urban public school districts are absolute, undeniable Failure Factories, while New Orleans’ Post-Katrina charter invasion is the future of greatness in public (well, not really public) education – the ultimate example of how reformyness taken to its logical extreme saves children from failure factories.

Thus, we must take New Jersey down that New Orleans path toward greatness. It’s really that simple. Dump this union-protectionist favor-my-failure-factory mindset… throw all caution (and public tax dollars) to the wind – jump on that sector agnostic train and relinquish all adult self interest.

But like most reformy truths, this one is a bit fact challenged, even when mining reformy preferred data sources.

Now – as I’ve explained previously, I do have my concerns with the Global Report Card method for bridging state, NAEP, and international assessments. But why should a little statistical validity concern keep us from having some fun with it?

Wouldn’t it be fun, for example, if we could make some direct comparisons between NOLA’s miracle relinquished, sector agnostic charter schools and New Jersey’s union-protectin’ public bureaucracy failure factories? Wouldn’t it just?

And of course, if we can compare our individual school districts with Finland and Singapore using the Global Report Card, then why the heck can’t we compare NJ Failure Factories and NOLA Miracle Charters? I guess we can.

Let’s start with a global look at NJ’s massively failing public schooling system when compared with Finland and all other U.S. public school districts. In this graph, we have NJ districts in orange, compared against the Finnish average (50% on the vertical axis) and in the context of all other U.S. districts, arrayed from lower to higher percent free/reduced lunch.

Slide4

In fact, a whole bunch of NJ districts look like they’re doin’ pretty darn well – above that Finnish median (the Finnish line?). But hey… there are those high-poverty NJ districts over to the right… those failure factories… those where children are being dreadfully failed by their unionized teachers (yeah… you!)… they do indeed fall well below the Finnish median… and that’s just not acceptable!

Certainly, Louisiana as a state must look at least as good as NJ when compared to Finland… especially given the massive gains of NOLA children after that wonderfully beneficial weather event some years back (or so it’s been characterized by many a reformy public official in the past 5 years or so).

Slide5

Well, that’s not a very good start, is it? But hey, Louisiana is a very high-poverty state with many issues to overcome. And it is well understood that the best way to overcome poverty is to put very little fiscal effort into public education, to rate teachers by their students’ test scores, and to evaluate teacher preparation similarly.

NOLA Miracles and NJ Failure Factories

Let’s dig deeper into that lower right-hand corner for a bit. Let’s specifically isolate those NJ public districts and those NOLA Recovery School District charters with over 80% of children qualified for free or reduced-price lunch, and see how they stack up against each other by their percentile rank among U.S. districts in 2009 reading and math.

Slide1

Here, it certainly appears that NJ failure factories are actually doing about as well as NOLA miracle schools. Heck, Union City, West New York, and East Newark beat them all, including NOLA KIPP. Newark – one of the most failure-factory-ish of all – is ahead of most NOLA charter organizations and not far behind KIPP.

The picture is pretty similar for reading, but with two NOLA charter operators rising higher in the picture. The others, not-so-much!

Slide2

Now – you say – the NOLA charters are much higher in poverty; this isn’t really a fair comparison, even though I’ve isolated only the highest-poverty districts and schools. But to say that, you’d have to be ignorant of a key problem with poverty measurement – one about which I’ve written on numerous occasions in recent years, on this blog, in peer-reviewed articles, and in recent reports.

Put simply, because the same income thresholds are used across the whole country for determining those free/reduced lunch rates above, poverty in NOLA schools is significantly overstated, and poverty in NJ schools is significantly understated.

So, I can use adjustments that we generated for our research on poverty measures to correct the free/reduced lunch rates for our NOLA charters and NJ districts. For the most part, the NJ districts move up to 100%; here, I allow them to go above 100% just to spread them out. By contrast, and as expected, the NOLA charters go down in poverty.
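(To make the logic concrete, here’s a toy illustration of that kind of adjustment. The scaling factors below are made up purely for illustration; the actual adjustments come from the research referenced above.)

```python
# Toy illustration of a regional adjustment to free/reduced lunch
# rates. The factors are invented for illustration only: NJ's high
# cost of living means the national income threshold understates
# poverty there (factor > 1), while NOLA's lower cost of living
# means poverty is overstated (factor < 1).
def adjust_frl(frl_rate: float, regional_factor: float) -> float:
    """Return a regionally adjusted FRL rate (may exceed 100%)."""
    return frl_rate * regional_factor

print(adjust_frl(85.0, 1.3))  # NJ-style district: 110.5
print(adjust_frl(90.0, 0.8))  # NOLA-style charter: 72.0
```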

And the pictures look like this:

[Figures: Slide3 and Slide4]

And what do we see?

We see that ON AVERAGE, NJ PUBLIC SCHOOL DISTRICT FAILURE FACTORIES ARE BOTH HIGHER IN POVERTY AND HIGHER IN AVERAGE PERFORMANCE OUTCOMES THAN THE VAST MAJORITY OF NOLA MIRACLE CHARTERS.

Heck, even Camden City Schools, slated to be dismantled NOLA-style, already performs about as well (middle of the pack) as most post-Katrina NOLA miracle charters. In reading, Union City, West New York, East Newark, Elizabeth, East Orange, New Brunswick and Newark all beat NOLA KIPP – and all have higher adjusted low-income rates!

Yes… the statistical bridging method here from state assessments to national percentiles is, well, imperfect at best.

But that never stopped reformy-pundits from arguing that all U.S. schools suck when compared to Finland or Singapore.

Thus, by empirical and logical extension, the NOLA reformy miracle is a cesspool when compared to New Jersey’s failure factories.

Either that, or New Jersey’s failure factories really aren’t as bad as we’ve been led to believe (except maybe this one).

Well that doesn’t fit the reformy narrative very well does it?

Charter Schools & the Public Good: Jersey City Version

As I’ve discussed in several recent posts, I’m increasingly concerned with how charter school expansion has played out both in our cities and in our suburbs.

My one post that perhaps best captures my overarching concerns is here.

It seems that increasingly, no matter where I look, my worst fears are realized. As I’ve explained numerous times, I began my work on charter school policy with positive expectations. Not so much anymore. Here’s how it’s all playing out in Jersey City, NJ.

First… the map… where we have our two highly skimmed schools – Soaring Heights and Learning Community Charter.

[Figure: Slide1]

And yes, we do have some charters for the commoners, at least in terms of income status.

NOTE: While LCCS has not updated its latitude/longitude data for its new location – the enrollment data characterizing their actual student enrollments are from 2012-13.

[Figure: Slide2]

The skimming behavior of the elite charters not only disadvantages other district schools, but also those charters for the commoners.

Perhaps more problematic than the number of lower-income children left behind in district schools and non-elite charters is the number, share, and type of children with disabilities. Here are the aggregate shares, which are disparate enough.

[Figure: Slide3]

More problematic, however, is the fact that the big red bar representing district schools includes much larger shares of children with far more severe and more costly disabilities. Charters are serving only those children with the least severe disabilities: a) mild specific learning disabilities, b) speech/language needs, and c) in some cases “other” health impairments.

[Figure: Slide4]

And under New Jersey’s persistently biased growth measures, these strong patterns of student sorting have consequences not only for the average level of student performance, but also for average gains. Clustering more disadvantaged peers together – which necessarily happens when you cluster more advantaged peers together – has consequences.

Higher poverty settings have lower gains, and vice versa. Does this mean, as NJDOE would have us believe [by arguing that their growth measures fully account for student background and that teachers are the most important in-school determinant of growth], that teachers in Liberty Academy and Jersey City Comm Charter suck and teachers in Learning Community and Soaring Heights are awesome? This is a highly suspect (read: totally ridiculous, offensive and asinine) conclusion to draw.
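(Anyone who wants to check this kind of pattern can do so with a few lines, assuming a table of school poverty rates and median growth percentiles; the file and column names here are hypothetical stand-ins for NJDOE’s actual files.)

```python
# Sketch: does school poverty predict the growth measure? If growth
# truly "fully accounted" for student background, the correlation
# should be near zero; a sizable negative correlation signals bias.
import pandas as pd

schools = pd.read_csv("jc_schools.csv")  # hypothetical file
r = schools["pct_frl"].corr(schools["median_sgp"])
print(f"Poverty vs. growth correlation: r = {r:.2f}")
```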

[Figure: Slide5]

And lower performing settings have lower gains, though this picture is somewhat less clear, because Soaring Heights fails to soar to its expected heights.

[Figure: Slide6]

The sorting induced by some, though not all, charter schools in Jersey City raises concerns about how New Jersey charter policy should move forward. This is not to suggest that any and all sorting is bad and should never occur – or should be immediately stopped. But we cannot ignore it… nor should we let the system run wild on its current path.

NJ Education Spending & the Collapse of Equity [Update]

A while back, I wrote this post on the collapse of educational equity in New Jersey.

A few years back, I wrote this post to try to clear up the multitude of falsehoods I kept hearing about New Jersey taxes and spending.

Well… not much time to write a great deal of explanatory text today… but here are a few updated figures. Tax figures are from the state and local tax query system of Taxpolicycenter.org. Note that these figures only go through FY 2011, as do the Census data on local public school district spending used in the retreat-from-equity post above.

But before I go to my updated slides, note that the Center on Budget and Policy Priorities also recently produced a report on education spending since 2008, finding that New Jersey was among those states that, either in percentage terms or on a per-pupil basis, had seen reductions in inflation-adjusted elementary and secondary education spending.

[Figure: Slide5]

So, here’s New Jersey spending on education since 1990, beginning with STATE DIRECT EXPENDITURES on k-12 and higher education. Note that the peak of state direct spending on k-12 was in 2006, following the largest scale-up of “Abbott” funding (from 1998 to 2005-ish). Since that time, first with the adoption of the School Funding Reform Act (for comments on problems with SFRA, see this post) and then with recession-era cuts, state support has declined.

[Figure: Slide1]

Here’s combined state and local direct spending on education PER CAPITA (NOT per pupil, but per population) – which is the lion’s share of spending on education (federal spending being relatively small, but for a temporary “stabilization” boost).

[Figure: Slide2]

Now, one argument for the per-capita drop is/was that earnings, incomes, etc. were dropping, and thus the burden on the taxpayer was simply too high and climbing. But here’s what direct state and local spending on education looks like as a share of personal income.

[Figure: Slide3]

Yes, even as a share of income, education spending declined.
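(The arithmetic behind these last two figures is nothing exotic: divide education spending by population for the per-capita series, and by aggregate personal income for the income-share series. A toy sketch, with placeholder numbers rather than actual NJ data:)

```python
# Toy sketch of the two denominators used above. All values are
# placeholders, not actual New Jersey figures.
import pandas as pd

data = pd.DataFrame({
    "year": [2009, 2010, 2011],
    "educ_spending": [25.0e9, 24.5e9, 24.0e9],  # hypothetical dollars
    "population": [8.70e6, 8.75e6, 8.80e6],
    "personal_income": [470e9, 475e9, 480e9],   # hypothetical dollars
})
data["per_capita"] = data["educ_spending"] / data["population"]
data["income_share"] = data["educ_spending"] / data["personal_income"]
print(data[["year", "per_capita", "income_share"]])
```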

And this decline comes largely as a function of declining state aid. And when state aid declines, the natural tendency is to use local property taxes, to the extent possible, to offset that decline.

So, as we can see, property taxes spiked.

[Figure: Slide4]

And, of course, some local public districts have far more capacity than others to offset their losses with property tax increases. See this post.

And for more information on persistent property tax disparities by wealth in NJ see this post!

So… during this period, as the post mentioned at the outset explains, the progressiveness of New Jersey’s state school finance system began to decline.

[Figure: Slide6]

[Figure: progressiveness]

That previously progressive system had actually made some substantial strides for low-income children.