More Inexcusable Inequalities: New York State in the Post-Funding Equity Era

I did a post a short while back about the fact that there are persistent inequities in state school finance formulas and that those persistent inequities have real consequences for students’ access to key resources in schools – specifically their access to a rich array of programs, services, courses and other opportunities. In that post I referred to the “post-funding equity era” as this perceived time in which we live. Been there, done that. Funding equity? No problem. We all know funding doesn’t matter anyway. Funding can’t buy a better education. It’s all about reform. Not funding. And we all know that the really good reformy strategies can, in fact, achieve greater output with even less funding. Hey, just look at all of those high flying, no excuses charter schools. Wait… aw crap… it seems that many of them actually do spend quite a bit. But, back to my point. Alexander Russo put up a good post today about those pesky school funding gaps, asking whatever happened to them. And he nailed it when he pointed out:

 If funding didn’t matter, then rich districts wouldn’t bother taxing themselves to provide resources to local kids.  If funding didn’t matter, high-performing charter schools wouldn’t cost so much.  Until and unless funding matters again in the public debate over education, I fear that we’ll largely be left fiddling at the margins (which is what it feels like we’re doing now).

I will have much more to say in the near future about the mythology about whether, why and how money matters in education. In this post, I’d just like to illustrate some of the extremes in access to resources that persist across school districts in New York State, which along with Illinois (the topic of Russo’s post) remains among the most inequitable states in the nation. (see: http://www.schoolfundingfairness.org)

Let’s start here.

This is a snapshot of the total expenditures per pupil and the need and cost adjusted expenditures per pupil of some of the MOST and LEAST advantaged school districts in New York State (in terms of a mix of need & spending measures). Without any adjustment for needs and costs, the high poverty, high need districts in many cases are spending below $16,000 per pupil, and the Top 30 districts nearly double that. When adjusted for needs/costs, the disparities widen dramatically.

Even worse, as I’ve explained a few times on this blog, New York State actually uses state aid to help support these disparities, by giving unnecessarily large sums of aid to the top group while continuing to cut aid from the bottom. Here is the distribution of some of that aid:

And here is the distribution of the most recent per pupil cuts in aid:

This all results in a rather ugly pattern of disparities that look rather like this, when we compare current need and cost adjusted funding levels with current district outcomes, as I did in a recent post on Illinois and Connecticut schools:

Because NY has so many districts, I’ve included only the relatively large ones here. This graph shows that districts with more need and cost adjusted funding tend to have higher outcomes and those with less need and cost adjusted funding tend to have lower outcomes. But, this graph is not intended to be a causal representation of that relationship. Rather, it’s intended to display the patterns of disparity across these districts. In the Lower Left are districts that are very high need, very low resources and very low outcomes. Among the standouts in this group are Utica and Poughkeepsie (in red in the first table above).  In the upper right hand corner of the picture are the lower need, high resource and high outcome districts.

What I’ve been finding most interesting though hardly surprising in my research is just how stark the consequences of these disparities are in terms of the actual programs and services provided within these districts. Reformy logic has told us in the past (see: https://schoolfinance101.wordpress.com/2011/05/05/resource-deprivation-in-high-need-districts-caps-goofy-roi/) that really, these districts in the lower left have more than enough money but they insist on wasting it all on junk like cheerleading and ceramics when they should be putting it into basic math/reading coursework.  Alternatively, related reformy logic is that these districts are really just wasting it all on paying additional salaries for experience and degree levels when they could just pay teachers the base salary and do just as well (I’m sure Utica would have great luck in recruiting and retaining teachers with that kind of salary structure. Actually, one of the better articles on relative salaries and teacher job choices uses data on upstate NY cities: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.5636&rep=rep1&type=pdf)

Setting aside these, well, completely stupid and unfounded claims (which are so pervasive in today’s education policy debate, especially in NY State), these next few slides take a look at the types of disparities in access to specific courses and opportunities faced by students in New York State’s schools.

First, here are a few slides using data from the Office for Civil Rights data collection on AP participation rates and participation in other key milestone courses. These data are shown with respect to district poverty rates, and poor small city districts (and some less poor, but still not advantaged ones) are highlighted.

This first slide shows the ratio of students in 7th grade (early) algebra to those taking algebra in high school. As poverty rates increase, rates of participation in early algebra decline. Clearly, to a large extent, this pattern occurs because fewer students in these districts are prepared for early algebra.

This slide shows overall participation in advanced placement courses. Overall, AP participation declines as poverty increases. Again, this is likely partly due to differences in readiness for these courses among higher poverty populations.

But, it’s also likely due to differences in access to/availability of resources.   For a high need district to both a) provide the advanced opportunities for kids in middle and secondary school and b) make sure kids are prepared to take advantage of those opportunities, those districts would need additional resources on the front end – to make sure kids are prepared for early algebra and on the back end to be able to provide the advanced courses once kids are prepared.

The contrast between the top 30 and bottom 30 (and small city) districts in New York State, as evidenced by the allocation of teaching assignments, is striking and disturbing. Let’s start with allocation of teaching assignments to advanced and college credit courses (not all are included). I’ve tallied teaching assignments per 1,000 students (in the group of schools, excluding NYC) based on statewide staffing data from 2010-11. This is very preliminary stuff, from a large data set on all teacher assignments in NY State.

What this first tally shows is that in the high performing, high spending, affluent school districts, there are .5 teacher assignments per 1,000 pupils allocated to AP Physics B. In low performing, low spending, high poverty districts, there are only .05 teacher assignments per 1,000 pupils. That adds up to a disparity ratio of 8.61. In other words, pupils in advantaged districts have nearly 9 times the access to teachers assigned to AP Physics as do pupils in disadvantaged districts. For nearly every college credit or AP course, disparity ratios run from about 2 to 9. The same is true for disparities specifically between the top districts and poor small city districts, which largely fall in the lower left of the Quadrant figure above.
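For anyone who wants to check the arithmetic, a disparity ratio here is just one group's teaching assignments per 1,000 pupils divided by the other group's. The sketch below is only an illustration: the function name is hypothetical, and the inputs are the rounded figures quoted above rather than the underlying staffing file. The rounded inputs give a ratio of about 10, so the 8.61 reported above presumably comes from the unrounded tallies (that is an assumption, not something the tally itself states).

```python
def disparity_ratio(high_group_per_1000, low_group_per_1000):
    """Assignments per 1,000 pupils in the advantaged group divided by
    the same rate in the disadvantaged group (hypothetical helper)."""
    if low_group_per_1000 <= 0:
        raise ValueError("low-group rate must be positive to form a ratio")
    return high_group_per_1000 / low_group_per_1000


# Rounded AP Physics B rates quoted in the text (per 1,000 pupils).
rounded_ratio = disparity_ratio(0.5, 0.05)
print(f"Disparity ratio from rounded rates: {rounded_ratio:.2f}")
```

A ratio of 1 would mean parity, which is roughly what a handful of arts assignments show further down.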

Now, you might be saying…well… they don’t have these programs because of all of their frivolous spending on music and arts. Not so much.

On average, most middle and secondary music and arts staffing assignments also run at about a 2 fold or greater disparity between high and low need/resource districts in New York State.  Kids in high need, low resource, low outcome districts have substantially less access to band, chorus, orchestra, private instrumental or vocal lessons…. and JAZZ BAND! This is not an exhaustive list. And a handful of arts opportunities are allocated roughly with parity (1:1), but high need, low resource districts do not have substantially greater resources allocated to any of these areas and generally have much less.

The one area where the resource balance shifts systematically is in the allocation of remedial and special education related staffing assignments. Here are some examples. Even in special education, in some cases high resource districts retain their advantage. But on average, the higher need, lower resource districts are driving additional resources into special education related teaching assignments. And just to clarify, no, these districts are not way ahead on class size reduction. A few are. Others clearly are not!

In general in NY State, high need districts are, well, screwed. And as I’ve shown in recent posts, the current leadership in New York State has done little to really help – and arguably much to hurt.

Inequity still matters.

Funding inequity has real consequences for the programs, services and educational opportunities that can be provided to kids.

Anyone who suggests otherwise – that funding is somehow irrelevant to any and all of this – is, well, full of crap. These things cost money. Providing both/and costs more than providing either/or.

To reiterate, this is not the post-funding era!

In fact, quite depressingly, we may be sitting at the edge of a new era of dramatic educational inequalities unlike any we’ve experienced in recent decades.

 

When VAMs Fail: Evaluating Ohio’s School Performance Measures

Any reader of my blog knows already that I’m a skeptic of the usefulness of value-added models for guiding high stakes decisions regarding personnel in schools. As I’ve explained on previous occasions, while statistical models of large numbers of data points – like lots of teachers or lots of schools – might provide us with some useful information on the extent of variation in student outcomes across schools or teachers and might reveal for us some useful patterns, it’s generally not a useful exercise to try to say anything about any one single point within the data set. Yes, teacher “effectiveness” estimates tend to be based on the many student points across students taught by that teacher, but they are still highly unstable. Unstable to the point where, even as a researcher hoping to find value in this information, I’ve become skeptical.

However, I had still been holding out more hope that school level aggregate information on student growth – value added estimates – might still be more useful mainly because it represents a higher level of aggregation. That is, each school is indeed a single point in a school level analysis, but that point represents an aggregation of student points and more student points than would be aggregated to any one teacher in a school. Generally, school level value-added measures BECAUSE of this aggregation are somewhat more reliable.

I’m in the process of compiling data as part of a project which includes data on Ohio public schools. Ohio makes available school level value added ratings as well as traditional school performance level ratings. For that, I am grateful to them. Ohio also makes school site financial data available. Thanks again Ohio!

At the outset of any project, I like to explore the properties of various measures provided by the state. For example, to what extent are current accountability measures a) related to the same measures in the previous years, and b) related to factors such as student population characteristics?

Matt Di Carlo over at http://www.shankerblog.org (see: http://shankerblog.org/?p=3870) has already addressed many/most of these issues with regard to the Ohio data. But, I figured I’d just reiterate these points with a few additional figures, focusing especially on the school level value added ratings.

As Matt Di Carlo has already explained, Ohio’s performance index, which is based on percent passing data, is highly sensitive to concentrations of low income students.

Ohio performance index and % free lunch:

Nothing out of the ordinary here (except perhaps the large number of 0 values, which I didn’t bother to exclude – and which really compromise my r-squared… will fix if I get a chance). On this type of measure, this is pretty much expected and common across state systems. This is precisely why many state accountability system measures systematically penalize higher poverty schools and districts. Because they depend on performance level comparisons and because performance levels are highly sensitive to student/family backgrounds.

As a result, these heavily poverty biased measures are also pretty stable over time. Here’s the year to year correlation of the performance Index.

I’ve pointed out previously that one good way to get more stable performance measures over time – for schools, districts or for teachers – is to leave the bias in there. That is, keeping the measure heavily biased by student population characteristics keeps the measure more stable over time – if the student populations across schools and districts remain stable. More reliable yes. More useful, absolutely NOT.

It’s pretty much the case that the performance index received by a school this year will be in line with the index received the previous year.

Therein lies part of the argument for moving toward gain or value-added ratings. Note however that an exclusive emphasis on value-added without consideration for performance level means that we can ignore persistent achievement gaps between groups and the overall level of performance of lower performing groups.  That’s at least a bit problematic from a policy perspective! But I’ll set that aside for now.

Let’s take a look at what we can resolve and can’t resolve in Ohio school ratings by moving toward their value-added model (technical documentation here: http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?Page=3&TopicRelationID=117&Content=113068)

As I noted above, I’d love to believe that the school level value-added estimates would provide at least some useful information to either policymakers or school officials. But, I’m now pretty damn skeptical, and here’s more evidence regarding why. Here is the relationship between 2008-09 and 2009-10 school value added ratings using the overall “value added index.”

Note that any school in the lower right quadrant is a school that had positive growth in 2009 but negative in 2010. Any school in the upper left had negative growth in 2009 and positive in 2010. It’s pretty much a random scatter. There is little relationship at all between what a school received in 2009 and in 2010 (or in 2008 or earlier for that matter).
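To make the quadrant reading concrete, here is a rough Python sketch of the two checks described above: the year to year correlation of a rating, and the share of dots that flip sign between years (the lower right and upper left quadrants). The function names are hypothetical and the six pairs of values are made-up toy numbers for illustration only, not Ohio data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_switching_sign(xs, ys):
    """Share of points that are positive in one year and negative in the other."""
    switched = sum(1 for x, y in zip(xs, ys) if x * y < 0)
    return switched / len(xs)


# Hypothetical value-added indices for six schools (NOT real data).
va_2009 = [1.2, -0.8, 0.5, -1.5, 2.0, -0.3]
va_2010 = [-0.6, 0.9, 0.4, -1.1, -1.8, 0.7]

r = pearson_r(va_2009, va_2010)
flipped = share_switching_sign(va_2009, va_2010)
print(f"year-to-year r = {r:.2f}, share flipping sign = {flipped:.2f}")
```

With a stable measure, r sits near 1 and few dots flip sign. A random scatter puts r near 0 and roughly half the dots in the off-diagonal quadrants.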

So, imagine you are a school principal and year after year your little dot in this scatter plot shows up in a completely different place – odds are quite in favor of that! What are you to do with this information? Imagine trying to attach state accountability to these measures? I’ve long expressed concern about attaching any immediate policy actions to this type of measure. But in this case, I’m even concerned as to whether I have any reasonable research use for these measures. They are pretty much noise.

Here’s a little fishing into the rather small predictable shares of variation in those measures:

As it turns out, the prior year index is a stronger (though still weak) predictor of the current year index. But, it’s also the case that districts that had higher overall performance levels in the prior year tended to have lower value added the following year, and districts with higher % free lunch and higher % special ed population also had lower value added (among those starting at the same performance index level). That is, some of the predictable stuff here is bias… indicative of model-related (if not test related) ceiling effects as well as demographic bias. That’s really unhelpful, and likely overlooked by most playing around with these data.

I get a little further if I use the math gains (the reading gains are particularly noisy).

These are ever so slightly more predictable than the aggregate index. But not a whole lot. But, they too are also a predictable function of stuff they shouldn’t be:

Again, districts that started with higher performance index have lower gain, and districts with higher free lunch and special ed populations have lower gain… and yes… these biases cut in opposite directions. But that doesn’t provide any comfort that they are counterbalancing in any way that makes these data at all useful.

If anything, the properties of the Ohio value-added data are particularly disheartening.  There’s little if anything there to begin with and what appears to be there might be compromised by underlying biases.

Further, even if the estimates were both more reliable and potentially less biased, I’m not quite sure how local district administrators would derive meaning from them – meaning that would lead to actions that could be taken to improve – or turn around their school in future years.

At this point and given these data, the best way to achieve a statistical turn around is probably to simply do nothing and sit and wait until the next year of data. Odds are pretty good your little dot (school) on the chart will end up in a completely different location the next time around!

 

A Look at State Aid Cuts in New York State 2011-12

Following is another in my school finance geeky series of straight-up analyses of state school finance formulas. I wrote about New Jersey’s funding formula a few days ago. This analysis focuses specifically on the cuts levied across NY school districts for 2011-12 and the underfunding of the foundation formula for select districts.

In 2007, New York State adopted the new Foundation Aid Program.

A full critique of that state aid program can be found here: NY Aid Policy Brief_Fall2011_DRAFT6

That school funding formula was argued by the state to represent the state’s constitutional obligation to provide for a sound basic education. That argument was built on the assumption that the underlying base aid for the formula would be calculated by estimating the average instructional spending per pupil of districts statewide that were performing well, or achieving 80% proficiency on state assessments.[1] By 2011-12, the foundation level was to be set to $6,535.[2] For each district, the sound basic level of funding would be determined by multiplying the foundation funding level times that district’s Pupil Need Index to account for variations in student populations to be served, and Regional Cost Index to account for variations in regional labor costs.

Target “Sound Basic” Funding per Pupil = Foundation x PNI x RCI

Next, to determine each district’s total sound basic, or foundation formula funding target, this per pupil funding figure was to be multiplied times the Total Aidable Foundation Pupil Units, or TAFPU. TAFPU is based on district enrollments, but includes additional weightings to account for student needs, such as students with disabilities and summer school pupils.

Total Sound Basic Funding Target = Sound Basic Funding per Pupil x TAFPU

Next, for each district, the state determines the share of the total to be raised locally and the share to be distributed in state foundation aid. A district receives the greater of aid levels based on two different calculations:

State Foundation Aid = Total Sound Basic Funding Target – Expected Minimum Local Contribution

OR

State Foundation Aid = Total Sound Basic Funding Target x State Aid Sharing Ratio

Applying the Formula to Small Cities and New York City

We can apply these calculations to determine the aid that should have been received in 2011-12 by several of the state’s small cities and by New York City, based on data and parameters from state aid runs as provided on April 1, 2011. (again… this is how it hypothetically works).

Table 1 shows the first portion of the calculations

Note that these are all high need districts, though Tonawanda and North Tonawanda are certainly lower need than Utica or New York City. Among the districts, Utica has by far the highest pupil need index. New York City and other downstate Hudson Valley districts have the highest labor market cost estimates. All but Tonawanda and North Tonawanda receive target per pupil funding levels over $10,000.

In the next step, we determine the total foundation funding and the state share of that funding target.

Table 2. Calculation of Promised State Aid

For example, for Albany, the target per pupil funding is $12,179. The expected minimum local contribution is $4,749 and the difference between the two is $7,430 per pupil. In the case of Albany, that difference becomes the state aid per pupil amount. Multiply that amount times the aidable pupils, and you’ve got a total state aid of about $93.5 million. For New York City, it turns out that the higher aid amount is allotted by using the State Aid Sharing Ratio instead of the difference between target funding and estimated local contribution. By the final calculation, New York City would receive about $8.6 billion in aid.
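The steps above can be sketched in a few lines of Python. This is only an illustration of the formula as described in this post, not the state's aid run code. The function names are hypothetical, Albany's TAFPU of 12,584 is back-calculated from the roughly $93.5 million total (an assumption), and the 0.5 State Aid Sharing Ratio is a made-up placeholder used only to show the "greater of" rule.

```python
def target_per_pupil(foundation, pni, rci):
    """Target "Sound Basic" Funding per Pupil = Foundation x PNI x RCI."""
    return foundation * pni * rci


def state_foundation_aid(target_pp, tafpu, min_local_pp, sharing_ratio):
    """Greater of the two aid calculations described in the text."""
    total_target = target_pp * tafpu
    # Option 1: total target minus the expected minimum local contribution.
    aid_by_difference = total_target - min_local_pp * tafpu
    # Option 2: total target times the State Aid Sharing Ratio.
    aid_by_ratio = total_target * sharing_ratio
    return max(aid_by_difference, aid_by_ratio)


# Albany figures quoted in the text.
albany_target_pp = 12_179
albany_min_local_pp = 4_749
# ASSUMPTION: TAFPU back-calculated from the ~$93.5 million total.
albany_tafpu = 12_584
# HYPOTHETICAL sharing ratio, for illustration only.
albany_sharing_ratio = 0.5

albany_aid = state_foundation_aid(
    albany_target_pp, albany_tafpu, albany_min_local_pp, albany_sharing_ratio
)
print(f"Albany promised aid: about ${albany_aid / 1e6:.1f} million")
```

For New York City the second branch would win, which is why the text says its aid is set by the State Aid Sharing Ratio instead.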

Broken Promises: Aid Freezes and Gap Elimination

But, this is all hypothetical. This is all entirely based on the promised foundation aid formula. This is all based on the foundation aid formula that the state has argued is by its design the manifestation of the state’s own constitutional obligation to provide a sound basic and meaningful high school education to children across New York State.  Note that I have provided an entirely separate report which explains the insufficiency of these targets and the rationale behind them. But let’s accept these targets for the moment and explore the extent to which even these modest promises have been ignored. Because we are dealing with really big numbers here, Table 3 reports those numbers in millions.

Table 3. Foundation Freezes and Gap Reductions (or are they just aid cuts?)

For Albany, the sound basic level of aid calculated by the legislature’s own formula is about $93.5 million. But, from the start, foundation aid was frozen at prior year levels, which were actually frozen at the levels of the year prior to that. For Albany, the aid freeze brings them down to $56.7 million, or a $37 million shortfall from their sound basic aid calculation. For New York City, the freeze alone pulls out $2.4 billion in aid. For small cities, the total reduction from the freeze, the total underfunding of sound basic aid, is about $271 million.

But it doesn’t end there. The state budget for 2011-12 does not promise to fund even that frozen level of aid. Rather, an additional “Gap Elimination Adjustment” was applied to cut aid further. At the last minute of the legislative session, there was partial reduction of this adjustment, but not full reduction. The adopted Gap Elimination Adjustment removes another $12.5 million from Albany, bringing their actual state aid level for 2011-12 to rest at $44.2 million, or less than half of their sound basic aid target. The total funding gap for small cities is $370 million. And the total funding gap for New York City after the Gap Elimination Adjustment is $3.2 billion.
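The Albany numbers in the last two paragraphs chain together as simple subtraction. A minimal sketch using only the figures quoted above (in millions of dollars) follows; the variable names are just for illustration.

```python
# Albany figures from the text, in millions of dollars.
promised_aid = 93.5          # sound basic aid under the foundation formula
frozen_aid = 56.7            # foundation aid after the freeze
gap_elimination_cut = 12.5   # adopted Gap Elimination Adjustment

freeze_shortfall = promised_aid - frozen_aid
actual_aid = frozen_aid - gap_elimination_cut
total_gap = promised_aid - actual_aid
share_of_promise = actual_aid / promised_aid

print(f"Shortfall from freeze:   ${freeze_shortfall:.1f} million")
print(f"Actual 2011-12 aid:      ${actual_aid:.1f} million")
print(f"Total gap vs. promise:   ${total_gap:.1f} million")
print(f"Share of promise funded: {share_of_promise:.1%}")
```

The funded share comes out a bit over 47 percent, consistent with "less than half" of the sound basic aid target.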

In summary, even if we pretend that the current foundation formula does provide for a sound basic education, even if we ignore that the current foundation formula is set to relatively low success rates on an assessment where scores had become inflated over time, the New York State Legislature has fallen 30% to 50% or more below these funding promises for many high need, large districts. Statewide, the foundation formula shortfall before the Gap Elimination Adjustment is approximately $5.5 billion, and after the Gap Elimination Adjustment is $8.1 billion. While the current formula itself falls short in many ways, the New York Legislature faces a serious uphill climb simply to keep its own promises.

Spreadsheet of Calculations: Funding Gap NY Calculations

Note: Analysis above focuses on the Foundation Aid Program. Other aids outside this formula include:

F(FA0013) 00 2011-12 CHARTER SCHOOL TRANSITIONAL

G(FA0029) 00 2011-12 HIGH TAX AID

H(FA0065) 00 2011-12 SUMMER TRANSPORTATION AID

I(FA0069) 00 2011-12 TRANSPORTATION AID W/O SUMMER

J(FA0073) 00 2011-12 BUILDING AID

K(FA0077) 00 2011-12 BUILDING  REORG INCENTIVE AID

L(FA0081) 00 2011-12 OPERATING REORG INCENTIVE AID

M(FA0085) 00 2011-12 NON-CMPNT COMPUTER ADMIN AID

N(FA0089) 00 2011-12 NON-CMPNT CAREER EDN AID

O(FA0021) 00 2011-12 NON-CMPNT ACADEMIC IMPROVMT AID

P(FA0093) 00 2011-12 BOCES AID

Q(FA0097) 00 2011-12 PUBLIC EC HIGH COST AID

R(FA0101) 00 2011-12 PRIVATE EXCESS COST AID

S(FA0105) 00 2011-12 SOFTWARE AID

T(FA0109) 00 2011-12 LIBRARY MATERIALS AID

U(FA0113) 00 2011-12 TEXTBOOK AID

V(FA0117) 00 2011-12 HARDWARE & TECHNOLOGY AID

W(FA0121) 00 2011-12 FULL DAY K CONVERSION

X(FA0125) 00 2011-12 UNIV PREKINDERGARTEN AID

Y(FA0033) 00 2011-12 SUPPLEMENTAL PUB EXCESS COST

Z(FA0185) 00 2011-12 ACADEMIC ENHANCEMENT AID

More with Less or More with More & Why it Matters!

I did a piece a short while back on TEAM Academy, a charter school in Newark, NJ, which I thus far admire. I admire the school because, while the data I’ve been able to gather from official sources still indicates that TEAM is far from a statistical match with its surroundings, and appears to have greater cohort attrition than I might like to see, I am, at this point, comfortable stating that TEAM Academy is more comparable to its surroundings than other Newark charters.

Allow me to restate why I care about the comparability piece of the puzzle. First, let me say that I do believe that there is (or at least may be) an important role in urban school systems or any school systems for that matter, for schools that aren’t entirely comparable. That’s the case for Magnet schools for example, which have in some rigorous studies been shown to produce positive outcomes for kids who attend. (see: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.152.385&rep=rep1&type=pdf)

But, when schools like magnet schools show positive outcomes we must recognize them for what they are and not make bold assumptions that those schools can easily be replicated districtwide or nationwide for “all kids” otherwise “trapped” in “failing schools.” Magnet or other selective schools’ success is likely significantly contingent on the student population served. The same goes for some charter schools, a key point of which is that it is foolish to ever lump all charters into one basket as if they represent a single reform strategy. They are a diverse mix of schools. Some serve more comparable populations to surrounding district schools and operate more similarly to open enrollment public schools while others are far more similar to magnet schools in terms of population served and in terms of the curriculum that can then be delivered to that population. When charters are effectively magnet schools (like North Star in Newark) scalability must be viewed differently (in part because the “success” of the school is as likely dependent on the selective student body as it is on any program/services/curriculum provided).

But the debate on scalability of “successful” charters goes beyond just the student population comparability issue. Far too often the rhetoric around successful charters involves the following three part claim:

Claim: Successful charter schools serve the same students, for less money and get better outcomes than traditional public schools.

Rarely if ever are these three components sufficiently validated.  This is especially true of the same students and less money prongs of the argument.  If policymakers accept on faith that pundits are truthful in these claims, policymakers may develop a false confidence as to how easily and how cheaply charter expansion can lead to improved outcomes.  It would behoove policymakers to take a much closer look at all three prongs of the issue, and consider each of these possibilities in Table 1.

Table 1. Framework of Possibilities

Note that this table can be expanded to include those cases of charters that serve non-comparable populations that are more needy than nearby traditional public schools (a focus of many specialized charters).

As I noted in my post regarding TEAM Academy, while the expenditure comparisons (particularly in New Jersey) are complicated they are critically important. And, perhaps my most important statement in that post is that there is no shame in spending more to provide a good education. Charter supporters (or anyone for that matter) should not understate the costs of their additional efforts. Charter supporters should not downplay the importance of class size reduction, teacher salaries, extended learning time in an effort to fit themselves into a category in Table 1 into which they don’t really belong.

Policymakers need to know what works and why it works. If a charter school is really freakin’ successful by spending more money on certain things and/or spending it differently, that’s important to know, even if their overall success is partly contingent on serving a selective population. Simply adopting the rhetoric of serving the same students, for less money and getting better outcomes than traditional public schools is unhelpful when it’s simply not true. Even worse, it’s potentially harmful to promote expansion on such a false premise.

So, here are a few more examples which come from preliminary explorations which are part of a much bigger project to get a handle on charter spending. Note that I began this project over a year ago and released a detailed report on New York City charter spending last year: http://nepc.colorado.edu/publication/NYC-charter-disparities. That report provides important supporting detail for this post regarding making sound comparisons of spending in Charters and traditional public schools in NYC.

Let’s start with a look at Amistad Academy, a well-known high performing charter school in New Haven Connecticut and part of the Achievement First network (www.achievementfirst.org). By usual accounts, Amistad is a high flying charter school. Let me be absolutely clear about this – I’m not crapping on Amistad. To the best of my data-driven understanding, it’s a very good school providing strong academic opportunities for kids in New Haven. But, from a policy standpoint, it’s worth at least cursory exploration of data on the three prongs above.

The following analyses use a mix of data from the National Center for Education Statistics Common Core of Data, from the CTDOE data system (http://sdeportal.ct.gov/Cedar/WEB/ct_report/DTHome.aspx) and from Guidestar (www.guidestar.org). In order to have all data elements lined up to a common fiscal and enrollment year, I’ve focused on school year 2008-09 here.

Figure 1. Amistad % Free Lunch compared to New Haven Schools

Figure 2. Map of Amistad % Free Lunch compared to Surrounding Schools


NOTE: I’m informed (see comment below) that the school location for Amistad is not correct. Note that the school location is based on the latitude and longitude as provided in the NCES Common Core of Data (www.nces.ed.gov/ccd/bat). As I suspected might be the case, the CCD Lat/Lon indicates the location of the Central Office of Achievement First (403 James Street). Amistad is located over in the area indicated, near many high poverty traditional public schools. (130 Edgewood Avenue New Haven, CT 06511)

So, Figure 1 and Figure 2 show quite decisively that Amistad is not serving a population which is comparable to surroundings in terms of % qualified for free lunch. Amistad also reports 0% LEP/ELL [no data] while the district reports 12.6% (http://sdeportal.ct.gov/Cedar/WEB/ct_report/EllDTViewer.aspx).

Now, let’s take a look at Amistad’s per pupil spending compared to New Haven public schools. Note that it’s generally not a great idea to try to compare against the district as a whole. If we are comparing Amistad’s performance to elementary and middle schools in New Haven, we should be comparing Amistad’s spending to elementary and middle schools. I’ll provide examples for KIPP charter schools in NYC and Houston at the end of this post.

One must also figure out what components of spending are “in” and what components are “out” when making a host district comparison to charter spending. For example, host districts are responsible for transportation of children to charters in CT. So, that spending should be removed from host district spending. Note that Amistad logically reports no expenditures on transportation in CTDOE spending reports. http://sdeportal.ct.gov/Cedar/WEB/ct_report/FinanceDTViewer.aspx

Further, host districts are responsible for costs of all resident children with disabilities, and it is difficult to discern whether any of these costs (other than the regular education costs of those students) show up in the charter expenditures. Amistad reports no percentage of spending on special education in CTDOE reports (reporting its total general expenditure figure instead). It is most likely that a large share if not all of the district special education spending should be excluded from the district spending figure. http://sdeportal.ct.gov/Cedar/WEB/ct_report/FinanceDTViewer.aspx

Finally, Amistad is part of a national network which might be considered analogous to its “district,” and expenditures by that national organization should be included. I’ve played it very conservative by only prorating the “administrative” expenses (www.guidestar.org) of Achievement First Inc. across all students in the network, for an additional $218 per pupil. (www.achievementfirst.org)

Figure 3. Per pupil Spending in Amistad Academy and New Haven

Data Sources: New Haven City & Amistad CTDOE http://sdeportal.ct.gov/Cedar/WEB/ct_report/FinanceDTViewer.aspx Amistad Academy IRS 990 from www.guidestar.org (Total expenditures =  $9,575,340, enrollment = 641 in 2008-09)

[1] Host districts are responsible for transportation costs for in district students enrolled in charters.

[2] 18.08% of New Haven Public Schools total expense is on special education.  Amistad reported total expenditures as special educ. expenditures & 0% to special education in 2008-09. See: http://sdeportal.ct.gov/Cedar/WEB/ct_report/FinanceDTViewer.aspx

[3] Achievement First Administrative Expense in subsequent year (guidestar.org) $1.125 million with cumulative enrollment 2010-11 of approximately 5,150 (tallied from achievementfirst.org)

So, Figure 3 shows that Amistad Academy at the very least spends comparably to New Haven district wide spending after excluding transportation, and spends quite a bit more than New Haven district per pupil if we exclude all of New Haven’s special education spending. Even if we excluded only a portion of New Haven’s special education spending (likely more appropriate), Amistad’s spending would be quite a bit higher than New Haven Public Schools. Again, there should be no shame in trying to spend more to provide a good school. Rather, it’s arguably quite noble.
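For readers who want to retrace the arithmetic behind this kind of comparison, here is a minimal sketch. The IRS 990 total, the enrollment counts, the network administrative expense and the 18.08% special education share come from the data sources and footnotes above. The function names and the `sped_excluded_fraction` parameter are illustrative assumptions, and any district dollar amounts passed to `district_comparison_base` would have to be pulled from the CTDOE reports (none are hard-coded here).

```python
def per_pupil(total_dollars, enrollment):
    """Simple per-pupil figure: total dollars divided by students."""
    return total_dollars / enrollment


# Figures reported in the data sources and footnotes above.
amistad_990_per_pupil = per_pupil(9_575_340, 641)      # IRS 990 total, 2008-09 enrollment
network_admin_per_pupil = per_pupil(1_125_000, 5_150)  # Achievement First admin, footnote [3]


def district_comparison_base(district_per_pupil, transport_per_pupil,
                             sped_share=0.1808, sped_excluded_fraction=1.0):
    """District per-pupil spending after removing transportation and a chosen
    fraction of special education spending (hypothetical helper).

    sped_share is the share of total district expense spent on special
    education (18.08% for New Haven, footnote [2]). sped_excluded_fraction
    is an assumption: 1.0 removes all of it, 0.0 removes none.
    """
    sped_per_pupil = district_per_pupil * sped_share
    return (district_per_pupil - transport_per_pupil
            - sped_excluded_fraction * sped_per_pupil)


print(round(amistad_990_per_pupil))    # about 14,938 per pupil
print(round(network_admin_per_pupil))  # about 218 per pupil
```

Varying `sped_excluded_fraction` between 0 and 1 gives the range between the "exclude transportation only" and "exclude all special education" comparisons discussed above.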

I’m not a big fan of relying exclusively on aggregate spending figures. Rather, I prefer to dig under the hood a bit to see how those dollars are leveraged. This is especially important if we really want to figure out how to replicate the successes of a school like Amistad, albeit with a very different population.

Figure 4 shows the class sizes by grade level in Amistad and New Haven public schools based on CTDOE data from 2008-09. Amistad appears to have leveraged money for smaller class sizes in the lower grades, a choice which arguably makes sense given the existing research on the effects of class size reduction. Overall, Amistad has lower class sizes than the district at the same grade level. And that costs money.

Figure 4. Class Size by Grade Level

Now, on to teacher salaries. In my previous post on TEAM Academy in Newark, NJ, I found that TEAM had scaled up teacher salaries on the front end of experience and paid much higher salaries than Newark Public Schools (no easy accomplishment), for new to mid-career teachers, putting TEAM in a pretty good position for local recruitment and retention. Figure 5 shows that Amistad has done much the same. To construct Figure 5 I used 6 years of data on individual classroom teachers in Connecticut and estimated a teacher salary model as a function of experience, degree level and year of the data. I estimated separate models for New Haven schools and for Amistad, and used those models to impute the implicit teacher salary schedule.
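To make the modeling step concrete, here is a rough sketch of one way such a salary model could be fit and then used to impute a schedule. It is not the exact specification behind Figure 5: the quadratic in experience, the single master's degree indicator, the linear year trend and all of the synthetic numbers are assumptions made only for illustration. In practice one would fit the same model separately to the New Haven and Amistad teacher records.

```python
import numpy as np


def design_matrix(experience, has_masters, year_offset):
    """Columns: intercept, experience, experience squared, MA indicator, year."""
    experience = np.asarray(experience, dtype=float)
    return np.column_stack([
        np.ones_like(experience),
        experience,
        experience ** 2,
        np.asarray(has_masters, dtype=float),
        np.asarray(year_offset, dtype=float),
    ])


def fit_salary_model(experience, has_masters, year_offset, salary):
    """Ordinary least squares fit of salary on the design matrix."""
    X = design_matrix(experience, has_masters, year_offset)
    coef, *_ = np.linalg.lstsq(X, np.asarray(salary, dtype=float), rcond=None)
    return coef


def implied_schedule(coef, max_experience, has_masters, year_offset):
    """Predicted salary at each experience step for one degree level and year."""
    exp = np.arange(max_experience + 1, dtype=float)
    X = design_matrix(exp, np.full(exp.shape, has_masters),
                      np.full(exp.shape, year_offset))
    return X @ coef


# Synthetic, noise-free teacher records (hypothetical values, 6 years of data).
exp_g, ma_g, yr_g = np.meshgrid(np.arange(10), [0, 1], np.arange(6), indexing="ij")
experience, has_masters, year_offset = exp_g.ravel(), ma_g.ravel(), yr_g.ravel()
salary = (40000 + 2000 * experience - 50 * experience ** 2
          + 3000 * has_masters + 500 * year_offset)

coef = fit_salary_model(experience, has_masters, year_offset, salary)
schedule = implied_schedule(coef, max_experience=9, has_masters=1, year_offset=5)
print(np.round(schedule))
```

With real data the later steps of the schedule rest on very few teachers, which is why modeled figures can jump around at higher experience levels.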

Amistad is paying more on the front end, and far outpacing the district across the first several years of the salary schedule (figures jump around for later years in Amistad due to very few teachers in those categories). And perhaps this allows Amistad to recruit and retain the teachers it wants. More exploration is warranted.

Figure 5. Modeled Teacher Salaries by Degree and Experience Level

So, in summary, what we have here is a high performing school that does not serve the same population, spends more than the local district and chooses to leverage spending toward class size reduction in the early grades and toward competitive early to mid-career teacher salaries. That’s a realistic look at a school that by many accounts is a darn good one.

[but a look I suspect some will still take offense to]

The population differences of the school create serious limitations for determining its scalability. That is, is the performance a function of the students or of the school? That’s hard to tell (even in a rigorous lottery based analysis). Further, the expense of the Amistad model of reduced class size and higher wages on the front end may cause some policymakers to balk. But that expense may be indicative of what’s actually needed, even with a more selective student population.

Perhaps more importantly, even with publicly available macro level data we can gain some insights into how the additional money is leveraged. And it would appear that Amistad is doing things I would consider quite logical, such as early grade class size reduction and paying competitive teacher wages. Those aren’t necessarily the sexy things the “cool kids” might be expecting. And those are both things that cost money. It would be hard to run a school with both reduced class sizes AND competitive wages while spending substantially less. And it is critically important that we recognize this!

Addendum: Making school level spending comparisons in New York City and Houston

Note that a major shortcoming of the Connecticut data above is that they don’t allow for comparison of New Haven schools spending by grade level or individual comparable schools. I have begun large scale analysis of school site expenditure in numerous other contexts. Below are two examples of school site comparison against same grade level schools – including comparable budget components (as well as spelling out in the fine print those aspects which aren’t directly comparable – see FN about KIPP Academy financial reporting – much more detail in my NEPC report).

A1. KIPP Schools in New York City (preliminary analysis)

Like Amistad, KIPP middle schools in NYC appear to be spending more than NYC public middle schools in the same parts of the city. They are a) not serving comparable populations and b) spending more (even if we spread KIPP Academy spending across all schools and if we exclude KIPP to College spending).

Making the appropriate corrections for facilities access is complicated in Connecticut because facilities expenses are not broken out for the Charters. The CTDOE figures for Amistad and New Haven above contain the same reported components (when transportation & special education are excluded for New Haven), but facilities lease payments may be (are likely) embedded in operating expenses of Amistad (& tend to run around $1,500 per pupil in NJ cities, and over $2,000 per pupil in Manhattan). However, New Haven remains responsible for upkeep and renovation for its facilities as well as any payments on debt that may exist. That is, district facilities are not, as some might argue “free.” So, for example, Amistad spends about $828 per pupil on plant operations and maintenance, while New Haven spends $1,735 per pupil in 2008-09 (a difference of $907). But, on administrative & support services, New Haven spends $1,863 per pupil and Amistad spends $3,585 per pupil (a difference of $1,722). This latter figure likely includes a significant lease payment (or some other peculiar overhead expense), but is partially offset by the differences in operations and maintenance (net difference of $1,722 – $907 = $815, which is smaller than the total expenditure differences reported above, but does close some of the gap). But these back of the napkin approaches only get you so far.
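The back of the napkin offset in the paragraph above can be written out in a few lines. The per-pupil figures are the 2008-09 values quoted in the text; the variable names are just labels chosen for this sketch.

```python
# Per-pupil spending figures quoted above (2008-09).
amistad = {"plant_ops": 828, "admin_support": 3585}
new_haven = {"plant_ops": 1735, "admin_support": 1863}

# New Haven spends more on plant operations and maintenance...
plant_gap = new_haven["plant_ops"] - amistad["plant_ops"]

# ...while Amistad spends more on administrative and support services,
# a category that likely includes a lease payment.
admin_gap = amistad["admin_support"] - new_haven["admin_support"]

# Net amount by which the two differences offset each other.
net_gap = admin_gap - plant_gap

print(plant_gap, admin_gap, net_gap)  # 907 1722 815
```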

I have greater capacity to correct for these differences in my more detailed NYC data used previously in my NEPC report and used above.

A2: KIPP (and all other charters) in Houston (preliminary analysis)

http://ritter.tea.state.tx.us/perfreport/aeis/2010/DownloadData.html

One can see in the figure above that many of the KIPP schools in Houston are spending well above a) most other charters b) most Houston public schools and c) the Houston district average expenditure. Yes, charters on the whole are a mixed bag. Many are quite low spending. These data likely need much more cleaning and cross-checking. But they are generally accessible through the TEA web site.

========================

NOTE: All data used in these posts come from official state, federal and IRS documents, in a few cases through respected aggregators of data (guidestar.org).  In a few cases above, I rely on total enrollment counts from the organization web site (Achievement First). Generally, I rely on official data and provide URLs to data sources so that any and all analyses can be checked, replicated, etc.  If you are a representative of a school and believe your data to be “wrong,” I will typically respond by at least checking that I have not made an error in reporting the data. But, if the data are what they are, then I suggest that you go to the source for any corrections. Most of these data are reported by the schools themselves to the state and federal agencies in question. I just report them as they are, and do certainly attempt to reconcile anything that appears out of line – and will make corrections when the correction can be validated.

 

 

 

Thoughts on Improving the School Funding Reform Act (SFRA) in NJ

I’ve seen a number of tweets and vague media references of late about the fact that NJ Education Commissioner Cerf will at some point in the near future be providing recommendations for how to change the School Funding Reform Act of 2008.

I also have it on good authority that NJDOE has convened a working group to discuss how to alter SFRA and is bringing in outside consultants for ideas. To no surprise, I’ve been left out of these conversations, despite my narrowly focused expertise on these very topics.

SFRA is subject to review by the department. Most of SFRA is laid out in statute, or laws passed by the legislature. But, as I understand it, the department of education does have some latitude to “tweak” parameters within SFRA. For example, adjusting/changing various weights and other factors which drive more money to some districts and less to others.

Now, I hate to stick my nose in on this process with my own preemptive recommendations, but you see, this happens to be a topic I know something about. After all, if within my broad areas of expertise on education policy/finance there is one area in which I really specialize it’s the design of state school finance formulas to meet student needs. And, I happen to have a little background on NJ’s SFRA. So, here’s my free advice. A little pro-bono technical advisement.

First, keep in mind that I have in the past testified on problems with SFRA, specifically focusing on what I consider to be technical errors made in the original design of the formula which fall under the umbrella of “tweakable” stuff. I have also given research conference presentations and published peer reviewed research related to some of the problematic features of SFRA – specifically the way the state chose to adjust for competitive wage variation across settings and the way the state chose to fund special education.

My apologies to all the non-Jersey and non-finance geeks out there for whom this analysis is going to quickly go technical. Can’t avoid it. Would take far too much space to provide full background on each issue. But I do have complete related documentation linked throughout. My reason for this post is simply to get this stuff out there. To make it known what the actual, technical issues are and what should be addressed when talking about “tweaking” SFRA. Some background is in order though, if for no other reason than to explain how I’ve narrowed my scope here.

First, state school funding formulas like SFRA start out by calculating an “adequacy budget” target for each school district:

Adequacy Budget = (Base Funding + Student Need Funding) x Geographic Cost Variation

Typically, the student need category includes additional funding for a) low income children, b) children with limited English language proficiency, and c) children with disabilities. Under geographic cost variation, states generally adjust for geographic variation in competitive wages (how much more does it cost to pay teachers competitively in one labor market versus another) and for small, remote and sparsely populated districts (economies of scale & sparsity). The latter issue is less relevant in NJ.

Typically the second step in a state school finance formula is the parsing of state versus local responsibility to pay for the adequacy budget:

Foundation Formula State Aid = Adequacy Budget – Local Fair Share

This part is important too, especially for balancing tax equity concerns. But, in this post and in most of my analyses of SFRA, I’m focused on getting those adequacy targets correct.  And with SFRA, there is plenty to talk about.
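The two steps above can be sketched in a few lines of code. This is a generic illustration of the structure, not SFRA’s actual parameters: the weights, dollar amounts, enrollment counts and the practice of expressing need funding as a weight times the base amount are all assumptions for this example.

```python
def adequacy_budget(enrollment, base_per_pupil, need_counts, need_weights,
                    geographic_cost_index):
    """(Base Funding + Student Need Funding) x Geographic Cost Variation."""
    base_funding = enrollment * base_per_pupil
    # Each need category adds (weight x base amount) per qualifying child.
    need_funding = sum(need_counts[k] * need_weights[k] * base_per_pupil
                       for k in need_counts)
    return (base_funding + need_funding) * geographic_cost_index


def foundation_state_aid(adequacy, local_fair_share):
    """Adequacy Budget minus Local Fair Share (plain difference, no floor)."""
    return adequacy - local_fair_share


# Hypothetical district: every number below is made up for illustration.
counts = {"low_income": 400, "lep": 100, "disability": 150}
weights = {"low_income": 0.5, "lep": 0.5, "disability": 1.0}

target = adequacy_budget(enrollment=1000, base_per_pupil=10_000,
                         need_counts=counts, need_weights=weights,
                         geographic_cost_index=1.02)
aid = foundation_state_aid(target, local_fair_share=6_000_000)
print(round(target), round(aid))
```

Every "tweak" discussed below changes one of the inputs to the first function, which is why getting the adequacy target right matters before any state versus local split is applied.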

SFRA emerged in part from an analysis prepared for the department of education on the costs of providing an adequate education. That report, by John Augenblick and Associates, was delivered to the department around 2003, but was not released by the department until 2006. Elements of that report were used to guide a new school funding formula adopted in 2008 – SFRA.

It’s really important to understand that the adoption of state school funding formulas is necessarily a political process. That’s just reality. One can ponder a world in which we substitute technical expertise for political deliberation as somehow being the perfect substitute, but even I understand that’s not realistic.

And quite honestly the quality of technical advisement varies widely. I would go so far as to say that some technical advisement is clearly better than other technical advisement, and some is not worth a damn. For examples of the latter, see: https://schoolfinance101.wordpress.com/2011/06/06/roza-tinted-reality/   and:  https://schoolfinance101.wordpress.com/2011/04/01/publicincompetence/

So, the reality is that legislatures adopt something, perhaps with technical advisement and state courts are available to hear any legally relevant grievances (and consider technical advisement) to evaluate whether those concerns rise to the level of constitutional violation.

I often assist in identifying what those grievances are. Here, I’m pointing mainly to technical quibbles over what came out of the legislative process in New Jersey. These are technical quibbles for which I would argue the research suggests there is a “right way” to do things and the New Jersey legislature and department of education chose the “wrong way.” These are technical quibbles which result in relatively modest, though important corrections to the setting of district “adequacy budgets.” And these are technical quibbles which the court appointed special master decided did not rise to a level of constitutional violation. That is, SFRA was “good enough” to meet constitutional muster.

So then, I suggest that the departmental (regulatory) review process is the right time to address these technical problems.

Table 1 provides my short list of relatively easy fixes.

First, when adopting SFRA someone, somewhere along the line suggested that the formula provide substantially greater money for each high school student than for each elementary student and marginally more money for each middle school student than for each elementary student. But, there is no clear evidence – no firm research basis for such differentiation. No evidence, for example, that it costs more to provide equal educational opportunity in districts that have a larger share of secondary than elementary students. Rather, differences that do exist in spending on high school versus elementary students are merely artifacts of the ways in which districts have typically spent regardless of which children would benefit more from additional expenditure. The most problematic feature of this adjustment is that higher poverty districts tend to have smaller shares of their total enrollment in high school, meaning that this adjustment drives more money to lower poverty and less to higher poverty districts. And it does so without any real justification. This pattern occurs for a variety of reasons, including dropout rates but also family migration patterns and family economic status shifts with maturation.
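A small sketch shows the mechanics. The grade level weights are the ones listed in Table 1; the enrollment shares for the two districts are hypothetical and chosen only to show the direction of the effect.

```python
# Grade level weights from Table 1.
GRADE_WEIGHTS = {"elementary": 1.0, "middle": 1.04, "secondary": 1.17}


def average_weight(shares, weights=GRADE_WEIGHTS):
    """Enrollment-weighted average grade level weight for a set of shares."""
    return sum(shares[level] * weights[level] for level in weights)


# Hypothetical enrollment shares (each set sums to 1.0).
statewide = {"elementary": 0.45, "middle": 0.23, "secondary": 0.32}
high_poverty = {"elementary": 0.52, "middle": 0.23, "secondary": 0.25}

status_quo = average_weight(high_poverty)   # what the district gets now
revenue_neutral = average_weight(statewide)  # single weight applied to everyone

# Percent change in base funding for the high poverty district if the
# grade level weights are replaced by the statewide average.
gain_pct = 100 * (revenue_neutral / status_quo - 1)
print(round(status_quo, 4), round(revenue_neutral, 4), round(gain_pct, 2))
```

A district with a smaller share of secondary students than the statewide mix sits below the statewide average weight, so replacing the three weights with that average moves money toward it at no change in total cost.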

Second, when determining how to include an adjustment for differences in competitive wages across areas of New Jersey, department officials decided to rely conceptually on a new approach proposed by the National Center for Education Statistics – the Comparable Wage Index (see link below). But then they abandoned the actual index and the actual methods behind it to come up with their own. In their own method, NJDOE looked not at labor market level wages but at county level wages of non-teachers (controlling for age, occupation, industry and education level). By using county level data, NJDOE officials came up with a “geographic cost adjustment” that gives the biggest adjustments to the highest income counties (Bergen, Morris, Essex) rather than broadly applying the adjustment to regions of the state. Most problematically, this GCA gives a bigger funding boost to affluent Ridgewood (Bergen) than to nearby Paterson (Passaic) and to Franklin Township than to New Brunswick. That’s just wrong!

Third, and this is a big one, when adopting SFRA the choice was made to fund special education by a method called Census Based funding. That is, assuming that every district really has or should have the same share of population in need of services. They set the rate to 14.69% of students. The argument is that districts with more than that have simply been identifying more to chase additional funding and not that they actually have greater need. I address the flaws of this logic extensively in the linked research article below. Of course, the most absurd aspect of financing every district as if they have 14.69% children with disabilities is the assumption that it is somehow appropriate to fund many districts at that level who actually have far fewer children in need. Fiscal prudence this is not! But again, it does tend to reduce funding in higher poverty urban districts as well as larger, poor remote southern NJ towns (see my research article).
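To see why this matters per child actually served, consider a minimal sketch. The 14.69% census rate is from the text; the excess cost amount and the two districts’ actual classification rates are hypothetical.

```python
CENSUS_RATE = 0.1469  # assumed share of students with disabilities under SFRA


def funding_per_identified_child(actual_rate, excess_cost_per_child,
                                 census_rate=CENSUS_RATE):
    """Dollars available per child actually identified when a district is
    funded as if census_rate of its students qualified.

    The district receives census_rate x enrollment x excess cost, and spreads
    it over actual_rate x enrollment children, so enrollment cancels out.
    """
    return excess_cost_per_child * census_rate / actual_rate


# Hypothetical excess cost and classification rates.
low_rate_district = funding_per_identified_child(0.10, 10_000)
high_rate_district = funding_per_identified_child(0.20, 10_000)
print(round(low_rate_district), round(high_rate_district))
```

A district with fewer children in need than the census rate is funded above the nominal amount per child served, while a district with more is funded below it.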

Fourth, in another seemingly back of the napkin exercise, someone decided that a child who is both from a low income background and with limited English language proficiency clearly doesn’t need the additional funding tied to both characteristics, and instead should be provided something in between. So, they instituted a “combination weight” which was a marginal increase over the low income weight, instead of the sum of the low income weight and LEP/ELL weight. I could probably make a stronger case that increased concentrations of both needs in districts serving very high concentrations of children who are both low income and non-English speaking leads to escalating not diminishing costs. Clearly, use of this weight instead of using the sum of the two reduces funding to the districts with the highest concentrations of students who are both poor and non-English speaking. Further, if a district is majority low income, each marginal child who is non-English speaking is more likely to be both and receive the lower combination weight.
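The difference between the two approaches is easy to put in code. The weights, the combination increment, the base amount and the student count below are placeholders chosen for illustration, not the statutory SFRA values.

```python
def summed_weight(at_risk_weight, lep_weight):
    """Weight for a child who is both low income and LEP/ELL if both apply."""
    return at_risk_weight + lep_weight


def combination_weight(at_risk_weight, combo_increment):
    """Weight for the same child under a combination weight: the low income
    weight plus a marginal increment."""
    return at_risk_weight + combo_increment


# Hypothetical parameters.
base_per_pupil = 10_000
at_risk_w, lep_w, combo_inc = 0.5, 0.5, 0.125

loss_per_child = (summed_weight(at_risk_w, lep_w)
                  - combination_weight(at_risk_w, combo_inc)) * base_per_pupil

# A district with 2,000 children who are both low income and LEP/ELL.
district_loss = 2_000 * loss_per_child
print(loss_per_child, district_loss)
```

Because the loss scales with the count of children who carry both needs, it concentrates in exactly the districts where the two populations overlap most.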

Table 1. Summary of Current Errors and Proposed Fixes

| Errors in Original SFRA 2008-09 | How it Works | Why it’s Wrong | Alternative |
|---|---|---|---|
| Grade Level Weight | 1.0 Elementary; 1.04 Middle; 1.17 Secondary | Based on back of the napkin analysis. No real basis in true cost differential. Disadvantages higher poverty districts with lower share of children in upper grades. | Eliminate (Revenue neutral, set to average) |
| Geographic Cost Adjustment | Based on non-teacher wages in county | County is the wrong unit for this analysis. Should be labor market (clusters of counties). Current approach rewards affluent counties (Bergen, Morris, Somerset). | Labor Market Based Comparable Wage Index |
| Census Based Funding of Special Education | Special education funding is allocated in flat amount assuming each district has 14.69% children qualified for special education. | This assumption is wrong and it leads to significant inequities in special education funding per child with actual needs. | Allocate on need basis |
| Combination Weight | Children who are both ELL and Low Income do not receive weighted funding for both, but rather receive an adjustment between the two. | Reduction was based on back of the napkin estimate, and significantly draws funding away from most needy districts. | Reinstate full weighting for both |

Here is a link to my full report in which I first identify these issues:

Baker.PJP-SFRA.Report.WEB (My complete report explaining the above problems)

Figure 1 shows what happens if we run a formula simulation based on the original 2008 SFRA parameters, and if we incrementally fix each one of these errors.

First, I remove the Combination weight and replace it with an option where each child can receive the sum of the at risk weight and the LEP/ELL weight if they qualify for both.  Table 2 below shows that taking this approach raises the combo weight cost for TYPE 3 districts from $212 million to $330 million. And, looking at the second set of bars in Figure 1, it increases funding in lower income, higher need districts. Note that these are shifts in the total adequacy targets, for which costs will be shared between the state and local districts (albeit increasing targets more in districts heavily reliant on state aid).

Second, I allocate special education funding according to actual concentrations of children with disabilities. This does come at an increased total cost as well, raising total target funding for special education from $991 million to just over $1 billion. Again, this is the total target, to be funded by state and local sources, with a stronger effect on districts more dependent on state aid.

Third, I get rid of that pesky grade level adjustment and replace it with the revenue neutral average foundation funding level. This does drive some more money into lower income districts.

Fourth, I replace the county level geographic cost adjustment with the National Center for Education Statistics adjustment, set to a statewide average of 1.0 (to make it more revenue neutral). This ain’t perfect. The NCES index has some “rough edges” (see my linked paper). But it’s still more justifiable in general, even if it does hurt some districts which actually need more help. This issue really requires a complete redo!

Figure 1. Simulation based on Operating Type 3 Districts

Table 2 provides some fiscal implications, as noted above, but it’s important to understand that these fiscal implications are based on a simulation of only Type 3 districts (which does include most of the kids). Table 2 is intended to show the patterns of reshuffling that would occur with these corrections.

Table 2. Simulation based on Operating Type 3 Districts

| Formula Component ($ millions) | Status Quo | Remove Combo | Fix Special Ed | Remove Grade Level | Fix GCA | Fix All |
|---|---|---|---|---|---|---|
| Total Base Cost | $9,547 | $9,547 | $9,547 | $9,547 | $9,547 | $9,547 |
| Total Cost of At Risk | $1,610 | $1,610 | $1,610 | $1,611 | $1,610 | $1,610 |
| Total Cost of LEP/ELL | $70 | $70 | $70 | $70 | $70 | $70 |
| Total Cost of Combo | $212 | $330 | $212 | $212 | $212 | $330 |
| Total Cost of Special Ed Base | $991 | $991 | $1,018 | $991 | $991 | $1,018 |
| Full State Funding | | | | | | |
| Total Cost of Special Ed Categorical | $496 | $496 | $509 | $496 | $496 | $509 |
| Bottom Line Before Regional Wage Index | $12,926 | $13,044 | $12,966 | $12,927 | $12,926 | $13,084 |
| Bottom Line After Regional Wage Index | $13,007 | $13,126 | $13,043 | $13,008 | $13,041 | $13,198 |
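As a quick consistency check, the component rows of Table 2 can be summed and compared with the reported bottom lines. The numbers below are copied from the table (in millions of dollars); the variable names and the "implied index" ratio are just conveniences of this sketch.

```python
scenarios = ["Status Quo", "Remove Combo", "Fix Special Ed",
             "Remove Grade Level", "Fix GCA", "Fix All"]

# Component rows from Table 2, one value per scenario ($ millions).
components = {
    "Base Cost":              [9547, 9547, 9547, 9547, 9547, 9547],
    "At Risk":                [1610, 1610, 1610, 1611, 1610, 1610],
    "LEP/ELL":                [70, 70, 70, 70, 70, 70],
    "Combo":                  [212, 330, 212, 212, 212, 330],
    "Special Ed Base":        [991, 991, 1018, 991, 991, 1018],
    "Special Ed Categorical": [496, 496, 509, 496, 496, 509],
}
reported_before = [12926, 13044, 12966, 12927, 12926, 13084]
reported_after = [13007, 13126, 13043, 13008, 13041, 13198]

# Sum the components within each scenario column.
computed_before = [sum(row[i] for row in components.values())
                   for i in range(len(scenarios))]

# Ratio of the after-index to before-index bottom line in each scenario.
implied_index = [after / before
                 for after, before in zip(reported_after, reported_before)]

for name, total, ratio in zip(scenarios, computed_before, implied_index):
    print(f"{name:20s} {total:7d} {ratio:.4f}")
```

The component sums reproduce the "Before Regional Wage Index" row in every column, and the "Fix All" total exceeds the status quo by the combined effect of the combination weight and special education changes.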

Figure 2. Distribution of Need-based Adjustments before Adjustment

(excludes special education)

Figure 3. Distribution of Need-based Adjustments after Adjustment (Fix All)

(excludes special education)

The bottom line here is that the reason each and every one of these corrections is important is that each of the original errors of logic and analysis that found their way into the SFRA formula shifts funding away from higher need and toward lower need districts. These aren’t huge shifts, but they’re not trivial either.

For those who wish to play around, here’s the simulation:

Aid Simulation (MS Excel File with Macros)

And for those wishing some additional technical reading to explain my arguments above, here are links to some of my related writing.

AERA.WageIndexPaper.March2008 (Conference Paper on Problems with NJ Wage Index)

Link to Published Article on Problems with Census Based Special Education Funding

Cheers!

More Detail on the Problems of Rating Ed Schools by Teachers’ Students’ Outcomes

In my previous post, I explained that the new push to rate schools of education by the student outcome gains of teachers who graduated from certain education schools is a problematic endeavor… one unlikely to yield particularly useful information, and one that may potentially create the wrong incentives for education schools.  To reiterate, I laid out 3 reasons (and there are likely many more) why this approach is so problematic. Here, I divide them out a bit more – 4 ways.

  1. parsing out individual teachers’ academic backgrounds – that is if teachers hold credentials and degrees from many institutions, which institution is primarily responsible for their effectiveness?
  2. the teacher workforce in most states includes a mix of teachers from a multitude of within and out-of-state institutions, public and private, with many of those institutions having only a handful of teachers in some states. States will not be able to evaluate all pipelines reliably. Does this mean that states should just cut off teachers from other states, or from institutions that don’t produce enough of their teachers to generate an estimate of the effectiveness of those teachers?
  3. because of the vast differences in state testing systems, and differences in the biases in those testing systems toward either higher or lower ability student populations (floor and ceiling effects), graduates of a given teaching college who might for example flock to affluent suburban districts on each side of a state line might find themselves falling systematically at opposite ends of the effectiveness ratings. The differences may have little or nothing to do with actually being better or worse at delivering one state’s curriculum versus another, and may instead have everything to do with the ways in which the underlying scales of the tests lead to bias in teacher effectiveness ratings. We already know from research on Value Added estimates that the same teacher may receive very different ratings on different tests, even on the same basic content area (math).
  4. and to me, this is still the big one, that graduates of teaching programs are simply not distributed randomly across workplaces. This problem would be less severe perhaps if they were distributed in sufficient numbers across various labor markets in a state, where local sample sizes would be sufficient for within labor market analysis across all institutions. But teacher labor markets tend to be highly local, or regional within large states.

I showed previously how the rates of children qualifying for free or reduced price lunch vary significantly across the schools where graduates of Kansas teacher preparation programs work:

Racial composition varies as well:

But perhaps most importantly, the above two charts are merely indicative of the fact that the overall geographic distribution of teacher prep program graduates varies widely. Some are in low-income remote rural settings, with very small class sizes, while others are near the urban core of Kansas City, either in sprawling low poverty suburbs or in the very poor, relatively population dense inner urban fringe. Making legitimate comparisons of the relative effectiveness of teachers across these widely varied settings is a formidable task for even the most refined value-added model and even that may be too optimistic.

Here’s the geographic distribution of teacher graduates of the major public teacher preparation institutions in Kansas:

The Kansas City suburbs in this figure are covered in Red (KU), Purple (K-State) and Orange (Emporia State) dots, and a significant number of blue ones (Pitt State). Western Kansas is dominated by Green Dots (Hays State) and southeast Kansas by blue ones (Pitt State). Wichita is dominated by black dots (Wichita State). Nearly all of these clusters are local/regional, around the locations of the universities. Certainly, much of the distribution is also dependent upon demand for teachers, where the greatest growth has been in the Kansas City suburbs to the south and west (out toward Lawrence, home to KU).

Here it is peeled back. First KU:

Next K-State:

Wichita State:

Fort Hays State:

Pittsburg State:

Emporia State:

Even if we assume that value added models could be an effective tool for a) rating teacher effectiveness and b) aggregating that teacher effectiveness to their preparation institutions, it is a stretch to assume that we could find any reasonable way to reliably and validly compare the effectiveness of the graduates of these public institutions, given that they are clustered in such vastly different educational settings – with widely varied resource levels, widely varied class sizes, kids who sit on buses for widely varied amounts of time, widely varied poverty levels, immigration patterns and numerous other factors (it’s that other “unobservable” stuff that really complicates things!). The only reasonable statistical solution would be to have  graduates of Kansas teacher preparation programs randomly assigned to Kansas schools upon graduation.

As I noted in my previous post, I’m not entirely opposed to exploring our ability to generate useful information by testing statistical models of teacher effectiveness aggregated in this way (to preparation institutions or pipelines). It is certainly more reasonable to use this information in the aggregate for “program evaluation” purposes than for rating individual teachers. But, even then, I remain skeptical that these data will be of any particular use either for state agencies in determining which institutions should or should not be producing teachers, or for the institutions themselves. It is a massive leap, for example, to assume that a teacher preparation institution might be able to look at the value-added ratings based on the performance of students of their graduates, and infer anything from those ratings about the programs and courses their graduates took as they pursued their undergraduate (or graduate) degrees. Though again, I’m not opposed to seeing what, if anything, one can learn in this regard.

What would be particularly irresponsible – and what is actually being recommended – is to accept this information as necessarily valid and reliable (which it is highly unlikely to be) and to mandate the use of this information as a substantial component of high stakes decisions about institutional accreditation.

Ed Next’s triple-normative leap! Does the “Global Report Card” tell us anything?

Imagine trying to determine international rankings for tennis players or soccer teams entirely by a) determining how they rank relative to the average team or player in their country, then b) having only the average team or player from each country play each other in a tournament, then c) estimating how the top teams would rank when compared with each other based only on how their country’s average teams did when they played each other and how much better we think the individual teams or players are when compared to the average team or player in their country. Probably not that precise or even accurate, ya’ think?

Jay Greene and Josh McGee have produced a nifty new report and search tool that allows the average American Joe and Jane to see how their child’s local public school districts would stack up if one were to magically transport their district to Singapore or Finland.

 http://globalreportcard.org/

Even better, this nifty tool can be used by local newspapers to spread outrage throughout suburban communities everywhere across this mediocre land of ours.

To accomplish this mystical transportation, Greene and McGee rely on wizardry not often employed in credible empirical analysis: The Triple Normative Leap. Technically, it’s two leaps, across three norms. That is, the researcher-acrobat jumps from one normalized measure based on one underlying test, to another, and then to yet another (okay, actually to 50 others!). This is impressive, since the double-normative leap is tricky enough and has often resulted in severe injury.

To their credit, the authors provide pretty clear explanations of the triple-normative leap and how it is used to compare the performance of schools in Scarsdale, NY to kids in Finland without ever making those kids sit down and take an assessment that is comparable in any regard.

For example, the average student in Scarsdale School District in Westchester County, New York scored nearly one standard deviation above the mean for New York on the state’s math exam. The average student in New York scored six hundredths of a standard deviation above the national average of the NAEP exam given in the same year, and the average student in the United States scored about as far in the negative direction (-.055) from the international average on PISA. Our final index score for Scarsdale in 2007 is equal to the sum of the district, state, and national estimates (1+.06+ -.055 = 1.055). Since the final index score is expired in standard deviation units, it can easily be converted to a percentile for easy interpretation. In our example, Scarsdale would rank at the seventy seventh percentile internationally in math.

Note: Addition and spelling errors in Jay Greene’s original web-based materials: http://globalreportcard.org/about.html
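For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation the quoted passage describes. It uses only the rounded inputs from the quote, and it assumes (as the report's normal-curve mapping suggests) that the index is read as a z-score on a standard normal curve. The variable names are mine, not the authors'.

```python
from statistics import NormalDist

# Rounded inputs from the quoted Scarsdale example, in standard deviation units.
district_vs_state = 1.0    # "nearly one standard deviation" above the New York mean
state_vs_nation = 0.06     # New York relative to the national NAEP mean
nation_vs_world = -0.055   # United States relative to the international PISA mean

# The index is simply the sum of the three normalized distances.
index = district_vs_state + state_vs_nation + nation_vs_world

# Assumption: convert the index to a percentile with the standard normal CDF.
percentile = NormalDist().cdf(index) * 100

print(f"index = {index:.3f}")
print(f"percentile = {percentile:.1f}")
```

With these rounded terms the sum comes to 1.005, not 1.055. Because the quote rounds its inputs ("nearly one standard deviation"), the percentile from this sketch should not be expected to reproduce the figure reported in the original materials.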

Now, Greene and McGee do recognize the potential limitations of making this leap across non-comparable assessments, with potentially non-comparable distributions. In their technical appendix, which few other than geeky stat guys like me will ever read, they explain:

In order to construct the Global Report Card we combine testing information at three separate levels of aggregation: state, national, and international. At each level we use the available testing information to estimate the distribution of student achievement. To allow for direct comparisons across state and national borders, and thus testing instruments, we map all testing data to the standard normal curve.

We must make two assumptions for our methodology to yield valid results. First, mapping to the standard normal requires us to make the assumption that the distribution of student achievement on each of the testing instruments is approximately normal at each level of aggregation (i.e. district, state, national). Second, to compare the distribution of student achievement across testing instruments we assume that standard deviation units are relatively similar across the 2 testing instruments and across time. In other words we assume that being a certain distance from mean student performance in Arkansas is similar to being the same distance from mean student performance in Massachusetts.

http://globalreportcard.org/docs/AboutTheIndex/Global-Report-Card-Technical-Appendix-8-30-11.pdf

So, they appropriately lay out the important assumptions that to actually rate individual districts in the U.S. against international standards, based on relative position to a) other districts in their state, b) their state to the entire U.S., and then c) the entire U.S. relative to other countries, one must have a reasonable expectation that the distributions at each level are a) normal and b) have similar ranges. The range piece is key here because the spread of scores at any level dictates how many points a district can gain or lose when making each leap.  Again, they appropriately lay out these potential concerns. And then, true-to-form, they ignore them entirely. They don’t even test whether these assumptions hold.

The way I see it, if you’re going to point out a limitation and completely ignore it, you should at least point it out in the body of the report, not the appendix.
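Testing those two assumptions is not hard. What follows is a rough sketch, not the authors' method and not a formal normality test, of the kind of summary one could run on each state's district-level scores: skewness, excess kurtosis, and how far the lowest and highest districts sit from the state mean in standard deviation units. The function name and the two toy lists are made up for illustration.

```python
from statistics import mean, pstdev

def shape_summary(scores):
    """Summarize the shape and spread of a list of district mean scores."""
    m = mean(scores)
    sd = pstdev(scores)
    z = [(s - m) / sd for s in scores]          # standardized scores
    return {
        "skew": mean([v ** 3 for v in z]),       # 0 for a symmetric distribution
        "excess_kurtosis": mean([v ** 4 for v in z]) - 3.0,  # 0 for a normal curve
        "min_z": min(z),                          # how far the lowest district sits
        "max_z": max(z),                          # how far the highest district sits
    }

# Hypothetical toy data, not real state assessment results.
symmetric_state = [-2, -1, 0, 1, 2]
skewed_state = [0, 0, 0, 0, 10]

print(shape_summary(symmetric_state))
print(shape_summary(skewed_state))
```

If the skew and the min/max values differ much from state to state, then a given distance from the state mean does not mean the same thing everywhere, which is exactly what the method assumes.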

Setting aside that little concern for now, here’s how it all works. Walking backwards through their analysis, each US district starts with a penalty based on the U.S. mean on PISA compared to the international mean. That is, every district in the US is given a penalty (-.055) partly because of the legitimately low performance of large numbers of US students in states that have thrown their public education systems under the bus, including Arizona, Colorado… but more strikingly, Louisiana and the deep south.

Now, a high performing state might then be able to offset their national penalty by outperforming U.S. norms… but only to the extent that NAEP has a wide enough distribution to allow a high performer to gain enough points back to make up that ground. If NAEP has a narrower range than the PISA distribution, even if you rock on NAEP, you can’t gain back the ground lost. In theory, this might even make some sense, but it would depend on the truth of the report’s key assumptions, which (as noted) are never tested.

The next move in the triple-normative leap is the move to the wacky collection of state assessments and their widely varied scale score distributions. High performing districts in a state like California, where the mean NAEP score of California gives everyone another layer of penalty to start, and a big one at that, are screwed. California high performers get a NAEP based penalty on top of their US average penalty and have to make up that entire deficit with standard deviations on state assessments. They’ve got a lot of ground to make up in standard deviations from their own state mean on their state assessment (if it’s even possible).

Let’s take a look at some of the actual district level distributions of standardized mean scale scores on state assessments. Remember, Greene and McGee’s triple normative leap only works well to the extent that state assessments are a) normally distributed, b) have similar range and c) are not particularly skewed in one direction or the other.

Note that these graphs are of the normalized distributions of scale scores.

Here’s California

Here’s Ohio

And Here’s Indiana

Oh well, so much for that little assumption. Perhaps most importantly, these distributions show that it depends quite a bit on what state your district is in whether your district has reasonable likelihood of making up 1, 2 or 3 points in the last normative leap.

Remember, every district loses about .055 standard deviations from the start based on U.S. PISA performance. California districts actually appear to have greater opportunity to make up more ground on the last leap, because the spread of California normed scores on state assessments is wider. But, they’ll need it, since their state average performance on NAEP gets all districts in the state a large penalty.
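To make the stacking of penalties concrete, here is a small sketch with made-up numbers. Only the -.055 national term comes from the quoted example. The state offsets and the maximum district z-scores are hypothetical, chosen to show how the ceiling on a district's final index depends on both the state's NAEP position and the spread of its own state assessment.

```python
US_VS_WORLD = -0.055  # national term from the quoted Scarsdale example

def best_attainable_index(state_vs_nation, max_district_z):
    """Highest final index any district in a state could reach."""
    return max_district_z + state_vs_nation + US_VS_WORLD

# Hypothetical states: (state offset on NAEP, highest district z-score in the state).
wide_spread_low_state = best_attainable_index(-0.30, 3.0)
narrow_spread_high_state = best_attainable_index(0.10, 1.5)

print(f"wide spread, low-scoring state:    {wide_spread_low_state:.3f}")
print(f"narrow spread, high-scoring state: {narrow_spread_high_state:.3f}")
```

In this toy case the top district in the low-scoring state ends up with the higher ceiling purely because its state test spreads districts further apart, not because of anything about the district itself.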

Anyway, while it may be fun to play with Greene and McGee’s nifty web-based search tool, it really doesn’t give us much of a picture as to how individual local public school districts in the U.S. stack up against foreign nations. It’s just too much of a stretch to assume that a district’s normative position on quirky state assessments, with non-normal distributions, can actually be translated with any precision to represent that district’s position within the performance distribution of schools in Finland or Singapore.

So, while it may be fun to play with the tool and see how different local public school districts compare, more or less to one another as they relate to other countries, it is totally inappropriate to make bold claims that any of these findings speak to the supposed “mediocrity” of the best public schools in the U.S. Many may appear mediocre when transported internationally for no reason other than the penalty points assessed to them in the first two normative leaps (national and state mean), neither of which has much to do with their own performance.

And these concerns ignore the fact that we are dealing with substantively different assessment content. See: http://nepc.colorado.edu/thinktank/review-us-math

Addendum:

McGee was kind enough to open a discussion on the topic below, and clarified… which is what I was assuming already… that:

“We assume that being a certain distance from mean student performance in Arkansas is relatively similar to being the same distance from mean student performance in Massachusetts.”

My response is that the spread or variance issue is critically important here, even, and especially when making this kind of assumption. It comes down to the reasons for the differences in spread (like the differences seen in the above histograms).

The variance in each state’s assessments across districts contains some variance that truly indicates differences in performance and some that indicates differences in tests. The problem is that we can’t tell which portion of the spread is “real” variation in performance across districts (driven largely by demographic differences) and which is a function of the different assessments – especially the different assessments across states. Some of the variance is clearly constrained by the underlying testing differences, and may also be upper or lower limit constrained.

Third Way’s “Revisionist Analysis” [Bold-faced lie!]

I know I said I’d stop addressing the Third Way report on Middle Class Schools, but I do have one more thing to point out. Third Way issued a memo in which it aggressively attacked my assertion that they had used district level data to characterize middle class schools. Again, this assertion was relevant to showing the absurdity of their classification scheme, but there were numerous other problems with the report.

My NEPC Review

My NEPC Response to Third Way Memo regarding Methods

Third Way claims my analyses to be “fatally flawed” because, as they claim in their follow-up memo, their analyses were actually at the school level and did not, as I show in tables in my review, contain all schools in poor cities including Detroit, Philadelphia or Chicago. Allow me to point out that what I actually said in my review was:

That is, these large urban districts are counted in any Third Way district-level analyses as middle-class districts.

I was very clear in my review that the table of large cities pertained specifically to “district-level” analyses in the Third Way report. I further explained extensively the problems with their continued mixing of school, individual family and district units.

But here’s the kicker based on one last check of their original report and the follow-up memo. In the follow up memo, the authors include this footnote to explain their methods – focusing on how they collected school level data from the NCES Common Core (school level data that never actually show up in any form, any table, in their original report). Note the part in this footnote where they explain selecting “school” as the unit of analysis:

Footnote in Memo

http://content.thirdway.org/publications/446/Third_Way_Memo_-_A_Response_to_the_National_Education_Policy_Center_.pdf

Footnote #8 Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed September 22, 2011. Available at: http://nces.ed.gov/ccd/bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey,” “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the U.S. Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “School.” Then select next. On the following page, in the “select columns” drop down box, choose the “Students in Special Programs” option. Select the box next to “Total Free and Reduced Lunch Students.” Then in the drop down box, select “Contact Information” option. Then select the box next to “Location City.” Then go back to the “select columns” drop down box and select the “Enrollment by Grade” option.  Then select the box next to “11th Grade enrollment.”  Then go more time to the “select columns” drop down box, choose “Total enrollment.” Then select the box next to “Total students.” Then select next. On the next page, choose “Illinois.” Then click the “view table” option. Once the table is compiled, download the table into Excel.csv by clicking that option at the top of the page. To calculate the number of high schools in Chicago with a student population of between 26-75% eligible for NSLP, we performed the following steps: 1) We first sorted by schools based on % NSLP (number of students eligible for free or reduced lunch divided by total number of students enrolled). 2) We then pulled out the schools that had enrollment in 11th grade. 3) We then sorted the schools based on location city, and pulled out the schools located in the City of Chicago.

Now, check out the two related (copied and pasted) footnotes from their original report. Each indicates using DISTRICT level data.

In short, the follow up memo was simply a lie – a flat out lie – and included revisionist analysis completely unrelated to any information actually presented in the original report.

I have retained copies of the originals, if the authors should choose to now go back and edit/change these footnotes.

Doing crappy analysis is one thing. Trying to cover it up by lying and revising while leaving the trail behind really doesn’t help.

Original Report

http://content.thirdway.org/publications/435/Third_Way_Report_-_Incomplete_How_Middle_Class_Schools_Aren_t_Making_the_Grade_-_PRINT.pdf

Footnote #40 Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed July 25, 2011. Available at: http://nces.ed.gov/ccd/ bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey,” “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the U.S. Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “District.” Then select next. On the following page, in the “select columns” drop down box, choose the “Census 2000 – Household Income, Occupancy and Size” option. Then check the box next to “Median Family Income.” Then go back to the “select columns” drop down box, choose the “Students in Special Programs” option. Select the box next to “Total Free and Reduced Lunch Students.” Then go back one more time to the “select columns” drop down box, choose “total enrollment.” Then select the box next to “total students.” Then select next. On the next page, choose the “Select 50 States + DC” filter from the drop down box. Then click the “view table” option. Once the table is compiled, download the table into Excel.csv by clicking that option at the top of the page. To calculate average household income by school district, we performed the following steps: 1) We first sorted school districts based on % NSLP (number of students eligible for free or reduced lunch divided by total number of students enrolled). 2) Using CPI for 2009, we adjusted the incomes for inflation. 3) We then found the median household income, based on the following groupings: 0-25.44%, 25.45-75.44%, 75.45-100% NSLP.

Footnote #88 Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed July 25, 2011. Available at: http://nces.ed.gov/ccd/ bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey”, “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “District.” Then select next. On the following page, in the “select columns” drop down box, choose the “Census 2000 – Household Income, Occupancy and Size” option. Then check the box next to “Median Family Income.” Then go back to the “select columns” drop down box, choose the “Students in Special Programs” option. Select the box next to “Total Free and Reduced Lunch Students.” Then go back one more time to the “select columns” drop down box, choose “total enrollment.” Then select the box next to “total students.” Then select next. On the next page, choose the “Select 50 States + DC” filter from the drop down box. Then click the “view table” option. Once the table is compiled, download the table into Excel.csv by clicking that option at the top of the page. To calculate average household income by school district, we performed the following steps: 1) We first sorted school districts based on % NSLP (number of students eligible for free or reduced lunch divided by total number of students enrolled). 2) Using CPI for 2009, we adjusted the incomes for inflation. 3) We then found the median household income, based on the following groupings: 0-25.44%, 25.45-50.44%, 50.45-75.44%, 75.45-100% NSLP.
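For readers who want to see what the district-level steps in these footnotes amount to, here is a minimal sketch using the four NSLP groupings listed in footnote #88. It is my illustration, not Third Way's code. It assumes percentages are rounded to two decimals before grouping (the footnote does not say), it leaves out the CPI adjustment, and the five rows are invented, not Common Core of Data records.

```python
from statistics import median

# Upper bounds (percent NSLP) of the four groupings named in footnote #88.
GROUP_UPPER_BOUNDS = [25.44, 50.44, 75.44, 100.0]

def nslp_group(frl_students, total_students):
    """Return the grouping (1-4) for a district's share of free/reduced lunch students."""
    # Assumption: round to two decimals so values fall cleanly into the stated ranges.
    pct = round(100.0 * frl_students / total_students, 2)
    for group_number, upper in enumerate(GROUP_UPPER_BOUNDS, start=1):
        if pct <= upper:
            return group_number
    raise ValueError("percent NSLP above 100")

# Hypothetical districts: (free/reduced lunch students, total students, median family income).
districts = [
    (100, 1000, 90000),
    (300, 1000, 60000),
    (400, 1000, 55000),
    (600, 1000, 45000),
    (900, 1000, 30000),
]

# Collect incomes by grouping, then take the median within each grouping.
incomes_by_group = {}
for frl, total, income in districts:
    incomes_by_group.setdefault(nslp_group(frl, total), []).append(income)

median_income_by_group = {g: median(v) for g, v in sorted(incomes_by_group.items())}
print(median_income_by_group)
```

Note that every row here is a district. A large urban district lands in whichever grouping its district-wide percentage puts it in, regardless of how its individual schools vary.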

Insult of insults from Third Way – Baker, You… You… Status Quo…er!

I gotta admit that my favorite part of the Third Way memo responding to my critique of their “Middle Class” report is the end of the memo.

Here are the two concluding paragraphs from the Third Way memo in reply to my rather harsh critique of their report:

 There are 52,860 public and charter schools that fall within our definition of middle-class schools, and they educate 25.7 million students. The message from Dr. Baker and the NEPC seems to be—let’s ignore them. In fact, let’s not even define them. Our view is that there is immense potential out there. These schools are failing in their basic mission—to become college factories.

From our perspective, college graduation rates of 31% and 23% in the second and third NSLP groupings, respectively—as our report presents—are unacceptable for America’s economic future. Clearly, the NEPC and Dr. Baker disagree and are satisfied with the status quo. We are not.

Yes, there it is. The insult of insults in reformyland! I am, as a result of critiquing their near criminal abuse of data, a… a… Status Quo-er!

Obviously, anyone (like me) who might take offense at such egregious representation of data must be a defender of the status quo. That is the worst offense in today’s reform debate. Especially if the egregious abuse of data was done with good intentions? Right? Done with the good intentions of letting the American public understand just how awful their schools are!  They need to know. America needs to know! And now! This can’t wait! Even if we have to classify information illogically or draw conclusions that don’t even match our data?

Look, bad data analyses and bombastic conclusions about our supposed education apocalypse do little or nothing to start a genuine conversation about either the true current conditions of our schools or whether we should be considering systemic changes.

Often, such crisis mode reporting has as its central objective, encouraging the public and policymakers to act in haste and adopt ill-conceived (often self-serving) policy before they know what’s really going on. That is, let’s get in a panic and adopt something really stupid and fast.  Any reader should be wary of and evaluate critically crisis-mode reports like the Third Way middle class report. Some such reports may ultimately reveal important issues and some even with a degree of immediacy. Third Way’s report reveals neither.

On ignorance & impartiality: A comment on the Monmouth U. Poll on Ed. Policy

Some Twitter followers may have noticed the ongoing back and forth regarding the validity of the recent Monmouth University Poll on education reform. I’d certainly rather spend my time on more substantive discussion.

As I’ve noted on many occasions, polls are what they are. They ask what they ask. And the responses to the questions must always be evaluated only with respect to what was asked. Questions about specific policies in particular require that the policies in question be described correctly. This is a point raised the other day by Matt Di Carlo about the Monmouth Poll here.

Yesterday, Patrick Murray, director of the polling institute posted a response to some of the criticisms levied against the recent Monmouth poll. Unfortunately, I found his response to be much less fulfilling and in many ways far more disturbing than the poll itself. Quite honestly, I’d have left this issue alone if not for some particularly troublesome assertions made by the polling institute director Patrick Murray.

First, here is my response regarding the substantive issue raised by Matt Di Carlo:

Mr. Murray points out that he, as many pollsters do, chose to use colloquial language to describe “tenure.” The problem, as explained by Matt Di Carlo here http://shankerblog.org/?p=3695, is that the colloquial characterization was factually incorrect, and that it would be possible to achieve a colloquial characterization that is not factually incorrect. The factual error in the characterization of tenure leads to a clear bias in the question. This is the most obvious example, but there are numerous more subtle cases where questions do not accurately represent existing or proposed legislation or regulations.

Here are a few additional points regarding content in Mr. Murray’s response:

Specifically, Mr. Murray contends that critics were simply unhappy with the results, and offered no substantive criticism of the methods.

On Twitter, I have criticized the title of the press release for the poll, which claims that the poll results indicate broad support for New Jersey reforms, implying that responses to the specific questions regarding policies can be taken as supporting the specific policies being proposed. That is, it implies a close relationship between the policies framed in the questions and actual policy proposals on the table. Usually, it is the media who make such misguided leaps. In this case, the polling institute provided them with the misleading headline.

Mr. Murray’s response not only defends the headline, but he actually makes even less justified statements (slightly more specific) to the same effect. Mr. Murray claims that the poll results provide “broad, general support” for the “Governor’s proposals”, which happen to be rather specific proposals (many of which are not actually the governor’s proposals, but proposals for which he has offered support).  But, very few (if any) of the questions in the poll accurately represent the specific proposals (like mischaracterizing what tenure is).  The questions are broad, and imprecise (if intended to discern support for existing proposals). They are general. Some are outright incorrect. As a researcher, I can assure you that a response to one question, referring to one type of policy (a hypothetical policy that is substantively different from the actual proposals) should not be interpreted as relating to another (without careful statistical validation, which would involve asking the other question).  That is a methodological concern. Not a concern with the findings. It is a concern largely over the representation of findings (press release titles matter), as opposed to the usual quibbling over sampling issues.

After defending the wording of the tenure question, Mr. Murray goes on to discuss the follow up questions to the tenure question – specifically those about how the general public would like to see tenure changed. The problem is that each of these questions about how to “change” tenure is invalid because “change” in the mind of the respondent (at least the uninformed respondent) is measured against an incorrectly defined baseline of what tenure is. That is, Mr. Murray has provided a prompt in the first tenure question that incorrectly describes tenure, asserting that tenure means that a teacher can only be fired for “serious misconduct.” Then he asks in a series of questions whether that should be changed and how. If the baseline condition – existing policy – is described incorrectly, arguably biased – then responses to subsequent questions are influenced by this. That is either biased, or simply sloppy.

Which brings up a related issue. Mr. Murray notes that many if not most poll respondents were unaware of policies, or details of reforms. Because of that, the phrasing of the questions, the colloquial explanations of the policies are of even greater importance, having even greater potential to shape the response. That phrasing can be the basis of grossly misinforming the otherwise uninformed respondent. And it just may have been.

The most significant and most disturbing point:

Setting aside this methodological quibbling, I take issue with Mr. Murray’s point that academic researchers might come at these issues with normative values – as I admittedly do – and that having normative values (based on years of extensive research on these topics) somehow invalidates someone’s ability to critique the poll. Mr. Murray explains:

 To start, most of the criticism has come from people without expertise in the field of survey research.  Some has, which I will treat more seriously.  But it’s important to note that all of these critics, including some who are academic researchers, have taken very public normative positions on education policy.  Normative is one of those great social science words.  It simply means they already have a clear opinion about how things ought to be.  When normative values get applied in a research setting, they lead to bias.

So, in other words, if you don’t have expertise in opinion research, your criticisms should not be taken seriously. And, if you have far too much knowledge and expertise in the substance of the poll (education law, policy and reform), you are too biased for your opinion to carry any weight. This argument is patently absurd.

As Mr. Murray frames it, only through blissful ignorance  on issues of substance can anyone be sufficiently impartial to be involved in, or make claims or arguments regarding either substance or method.  Those with knowledge and opinions derived from that knowledge are necessarily too biased to have valid concerns. I’ll admit that I have biases for rigorous research methodologies.

Like Dr. Di Carlo (who holds a Ph.D. in Sociology from Cornell), I’m not a pollster. I’m a researcher and perhaps that alters my view on how research is conducted and what kinds of conclusions can be reasonably drawn from survey responses to questions with specific wording.  I generally don’t care much for polls or polling results, but I am a stickler for methods.

This poll was about policies, not politicians. And as someone who studies policies I am particularly sensitive to the details of policy design & implementation. This poll was clearly not sensitive to those details and was exceptionally sloppy in its characterization of policies and policy design. And that’s a methodological problem, and one that is so glaringly apparent because of my academic expertise in this area – not because of some normative bias – but, because of actual details, including statutes and regulations.

Perhaps I’m being too picky, and that’s just how the polling industry works. Perhaps the normative values of pollsters allow for imprecise colloquial descriptions and drawing broad unsubstantiated conclusions. That seems to be the gist of Patrick Murray’s argument, and one I find distasteful enough to require a response.