Blog

Uncommon Denominators: Understanding “Per Pupil” Spending

This post is another in my series on data issues in education policy. The point of this post is to encourage readers of education policy research to pay closer attention to the fact that any measure of “per pupil spending” contains two parts – a measure of “spending” in the numerator and a measure of “pupils” in the denominator.

Put simply, both measures matter, and matching the right numerator to the right denominator matters.

Below are a few illustrations of why it's important to pay attention to both the numerator and the denominator when considering variation in education spending, whether across settings or over time.

Declining Enrollment and Exploding Spending!

First, it is important to understand that when the ratio of spending to pupils grows over time, that growth may be a function of increasing expenditures in the numerator, declining pupils in the denominator, or both. Usually, both parts are moving simultaneously, making interpretation more difficult. The State of Vermont over the past 20 years makes a fun example.

Vermont is among the highest per pupil spending states in the nation, and Vermont’s per pupil spending has continued to grow at a relatively fast pace over the past 20 years. Figure 1 shows Vermont’s per pupil spending growth (not adjusted for inflation, because choice of an inflator adds another level of complexity) in the upper half of the figure.

But, the lower half of the figure shows Vermont’s enrollment over the same period.

Figure 1. Vermont per Pupil Spending and Enrollments


Clearly, given the dramatic statewide enrollment decline, even if total revenue and spending remained constant, or declined at a significant lag, per pupil spending would continue to grow.

Figure 2 breaks out the year over year growth rates of a) total revenue, b) enrollments and c) revenue per pupil. The math is pretty simple here, and the issue almost too obvious to bother with on this blog… but the point here is that if enrollment is declining by 2% annually, and total revenue (or spending) increasing by 4% to 6%, then per pupil revenue will increase by 6% to 8%.

Figure 2. Vermont % Change in Spending and Enrollment

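This arithmetic can be sketched in a few lines of Python. The growth rates below are round hypothetical numbers chosen to match the example in the text, not Vermont's actual figures:

```python
# Exact per pupil growth implied by growth in the numerator and denominator.
def per_pupil_growth(revenue_growth, enrollment_growth):
    """Growth rate of revenue per pupil, given the growth rate of each part."""
    return (1 + revenue_growth) / (1 + enrollment_growth) - 1

# Enrollment declining 2% annually while total revenue grows 4% to 6%:
print(f"{per_pupil_growth(0.04, -0.02):.1%}")  # about 6% per pupil growth
print(f"{per_pupil_growth(0.06, -0.02):.1%}")  # about 8% per pupil growth
```

The additive shorthand (4% revenue growth plus 2% enrollment decline gives roughly 6% per pupil growth) is a close approximation; the ratio form above is exact.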

Yes, that’s all pretty simple and seemingly obvious. But, that doesn’t stop many from simply looking at per pupil spending growth as if it all represents spending growth. 8% annual growth likely plays differently to a political audience than 4% or 6% growth. Both parts are moving and we can’t forget that. Further, because the provision of education involves a mix of fixed, step and variable costs, we can’t expect spending changes to track perfectly with enrollment changes over time. But yes, we can and should expect appropriate adjustments down the line to accommodate the pupils that need to be served.

Equity Implications of Alternative Denominators: The ADA Game

I’ve written previously on this blog about different measures of student enrollment used in state school finance formulas, which are also used in presenting per pupil spending. A handful of states rely on “Average Daily Attendance” as a basis for providing state aid, and in turn as the method by which they report per pupil spending. As I’ve explained in previous posts, Average Daily Attendance measures vary systematically with respect to poverty, compared with enrollment measures. That is, on average, among those enrolled in a school, attendance rates tend to be lower on a daily basis in schools serving more low income and minority students. So, if one uses these measures to drive state aid to local districts, the result is systematically lower state aid in higher poverty, higher minority districts. But, if one uses these same measures to report per pupil spending, then no harm no foul… or so it seems.

As an aside, when pushed to rationalize financing schools on the basis of attendance, state policymakers often suggest that the purpose of the policy is to create an incentive for school officials to increase attendance rates.[i] The problems with this argument are many-fold. First, local public school districts remain responsible for providing the resources to educate all eligible enrolled children. While 90% may be in attendance on any given day, and while some children may be absent more than others, the same 90% are not in attendance every day. In all likelihood, 100% of eligible enrolled children attend at some point in the year. Second, depriving local public school districts of state aid lessens their capacity to provide interventions that might lead to improved attendance rates. Third, many school absences are simply beyond the control of local public school officials. This is particularly the case for poverty-induced, chronic health related absences. Finally, there exists little or no sound empirical evidence that this approach provides an effective incentive.[ii]

Figure 3 provides an illustration of how different per pupil spending figures look across Texas districts when reported by a) enrollment and b) average daily attendance, with respect to shares of low income children. First, because few if any districts have perfect average daily attendance, the green dots – spending per enrolled pupil – sit lower than the orange dots – spending per pupil in average daily attendance. Further, while it would appear that spending per pupil in average daily attendance is higher in higher poverty districts than in lower poverty ones, that is not necessarily the case for spending per enrolled pupil, where the difference is much smaller.

Figure 3. Per Pupil Spending and Low Income Concentrations in Texas

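The denominator effect behind Figure 3 can be sketched with two hypothetical districts that spend identical totals but differ in daily attendance rates. All numbers here are invented for illustration, not actual Texas data:

```python
# Two hypothetical districts with identical spending and enrollment,
# differing only in average daily attendance rates.
districts = {
    "low poverty":  {"spending": 12_000_000, "enrolled": 1_000, "attendance": 0.96},
    "high poverty": {"spending": 12_000_000, "enrolled": 1_000, "attendance": 0.90},
}

results = {}
for name, d in districts.items():
    per_enrolled = d["spending"] / d["enrolled"]
    per_ada = d["spending"] / (d["enrolled"] * d["attendance"])  # smaller denominator
    results[name] = (round(per_enrolled), round(per_ada))

print(results)
# Spending per enrolled pupil is identical, but the ADA-based figure is
# inflated more in the district with the lower attendance rate.
```

The same total dollars look like more "per pupil" in the lower-attendance district, simply because ADA shrinks the denominator.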

Figure 4 provides an alternative view, collapsing the data into low income quintiles.

Figure 4. Per Pupil Spending by Low Income Quintile in Texas


And it is certainly relevant that the districts in question here are obligated not merely to serve those who show up on a given day, but to have resources available for all of the students for whom they are held responsible. That is, those enrolled.

Matching the Numerator and Denominator: My expenditures on your pupils?

Finally, I’d like to address the somewhat more convoluted issue of matching the right numerator to the right denominator, especially when making spending comparisons across schools or districts.

I wrote extensively here about making comparisons between brick and mortar vs. online schools.

And, I wrote extensively here about making comparisons between charter schools and district schools in New York City.

The increasing complexities of the interdependency relationships between district hosts and charter schools create significant confusion when comparing per pupil spending between host district and charter schools. In a recent report, I provide explanations of common (though likely intentional, after the 3rd or 4th iteration) mistakes. Here is one version of my critique of the Ball State study, which appears in Footnote 22, page 49 of this study: http://nepc.colorado.edu/files/rb-charterspending_0.pdf

For example, under many state charter laws, host districts or sending districts retain responsibility for providing transportation services, subsidizing food services, or providing funding for special education services. Revenues provided to host districts to provide these services may show up on host district financial reports, and if the service is financed directly by the host district, the expenditure will also be incurred by the host, not the charter, even though the services are received by charter students.

Drawing simple direct comparisons thus can result in a compounded error: Host districts are credited with an expense on children attending charter schools, but children attending charter schools are not credited to the district enrollment. In a per-pupil spending calculation for the host districts, this may lead to inflating the numerator (district expenditures) while deflating the denominator (pupils served), thus significantly inflating the district’s per pupil spending. Concurrently, the charter expenditure is deflated.

Correct budgeting would reverse those two entries, essentially subtracting the expense from the budget calculated for the district, while adding the in-kind funding to the charter school calculation. Further, in districts like New York City, the city Department of Education incurs the expense for providing facilities to several charters. That is, the City’s budget, not the charter budgets, incur another expense that serves only charter students. The Ball State/Public Impact study errs egregiously on all fronts, assuming in each and every case that the revenue reported by charter schools versus traditional public schools provides the same range of services and provides those services exclusively for the students in that sector (district or charter).
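A stylized example may help. The dollar figures and pupil counts below are invented for illustration, not drawn from any actual district or from the Ball State report:

```python
# Host district carries $2M of expenses (transportation, food service,
# special education) for services received by charter students.
district_spending = 110_000_000   # includes the $2M spent on charter students
charter_spending = 13_000_000     # charter's own reported spending
services_for_charter = 2_000_000
district_pupils = 10_000          # charter pupils already excluded from this count
charter_pupils = 1_000

# Naive comparison: host keeps the expense but loses the pupils.
naive_district = district_spending / district_pupils    # 11,000 per pupil
naive_charter = charter_spending / charter_pupils       # 13,000 per pupil

# Corrected: subtract the expense from the district's numerator and credit
# it as in-kind funding to the charter's numerator.
corrected_district = (district_spending - services_for_charter) / district_pupils  # 10,800
corrected_charter = (charter_spending + services_for_charter) / charter_pupils     # 15,000
```

The naive calculation inflates the district figure and deflates the charter figure in one stroke, which is precisely the compounded error described above.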

Here’s a relatively straightforward, albeit incomplete illustration. Figure 5 shows that in many states, like New York, Connecticut or New Jersey, the relationship between host district and charter spending creates significant problems in matching numerators and denominators. In many states, as explained above, host districts retain responsibility for spending on such things as charter student transportation or special education. Districts within states may opt for different approaches to transportation financing, and some districts may opt to provide funding for centralized enrollment management or for facilities co-locations. The costs of providing these services typically remain on the ledger of the district. That is, they are in the district’s numerator, even when the pupils are removed from the denominator. This makes the resulting per pupil spending comparisons, well, simply wrong.

Figure 5. The Conceptual Problem with Matching Numerators and Denominators – Charter Spending Comparisons


Connecticut is one state where responsibility for transportation and special education expense is retained by the district (while many CT charters serve very few children with disabilities to begin with). Figure 6 below provides an illustration of how charter to host district spending comparisons differ when one removes special education and transportation expenses from the districts’ numerator. When these expenses are included in the district’s expenditures, district spending is somewhat higher than charter spending; when they are removed, district spending is lower in both cases.

Figure 6. Matching spending responsibilities for more accurate comparisons


Notably, this is far from a complete analysis. It is merely illustrative. Similar problems exist with reported data on charter school revenues and spending in New Jersey.

In New York City, the Independent Budget Office has produced a handful of useful reports on making relevant comparisons there.

============

Note: Charter advocates often argue that charters are most disadvantaged in financial comparisons because charters must often cover, out of their annual operating expenses, the costs associated with leasing facilities space. Indeed it is true that charters are not afforded the ability to levy taxes to carry public debt to finance construction of facilities. But it is incorrect to assume when comparing expenditures that for traditional public schools, facilities are already paid for and have no associated costs, while charter schools must bear the burden of leasing at market rates–essentially an “all versus nothing” comparison. First, public districts do have ongoing maintenance and operations costs of facilities, as well as payments on debt incurred for capital investment, including new construction and renovation. Second, charter schools finance their facilities through a variety of mechanisms, with many in New York City operating in space provided by the city, many charters nationwide operating in space fully financed with private philanthropy, and many holding lease agreements for privately or publicly owned facilities. (for more, see: http://nepc.colorado.edu/files/rb-charterspending_0.pdf, pp. 49-50)

==============

[i]Recently, when New Jersey slipped the attendance factor into the determination of state aid, Education Commissioner Chris Cerf argued:

“When you look at the (difference) between the number of children on the rolls and the number of children in some of these schools, it can be very distressing,” Cerf said. “Pushing these districts to do everything in their power to get kids to attend class is good.” http://blogs.app.com/capitolquickies/2012/04/24/cerf-said-push-districts-to-get-kids-in-school/

[ii] A study published in the Spring 2013 issue of the Journal of Education Finance purports to find positive effects on attendance and graduation rates in states with a “strong incentive” enrollment basis for funding, with particular emphasis on states relying on average daily attendance, but lumps in with them many (most) states using an average daily membership figure. Most problematically, the study draws its main conclusion from state aggregate cross sectional analyses, applies unsatisfyingly ambiguous classifications of state school finance count methods, and uses an approach which cannot separate finance policy effects from other contextual differences across states.

The final study is published here: Ely, Todd L., and Mark L. Fermanich. “Learning to count: school finance formula count methods and attendance-related student outcomes.” Journal of Education Finance 38.4 (2013): 343+

An earlier draft is available here: http://www.aefpweb.org/sites/default/files/webform/Fermanich_Ely_AEFP_2012.pdf

On “Dropout Factories” & (Fraudulent) Graduation Rates in NJ

This NJ Star Ledger piece the other day reminded me of an issue I’ve been wanting to check out for some time now. I’m skeptical of graduation rates as a measure of student outcomes to begin with, because, of course, graduation can be strongly influenced by local norms and practices. As such, it’s really hard to validly compare graduation rates from one place to another or even over time, as graduation standards may change. Notably, arbitrary assignment of “passing” cut scores on high stakes assessment isn’t particularly helpful and can be quite harmful. But I digress.

What piqued my interest a while back was the apparent disconnect between cohort attrition measures from 9th to 12th grade, or 10th to 12th grade, and reported graduation rates. Indeed, these are two different things. BUT, it seems strange, for example, that North Star Academy in Newark could report a 100% graduation rate! and a .3% dropout rate! while having an approximately 50% attrition rate between grades 5 and 12! How can you lose half your kids over time and still have 100% graduation and effectively no dropouts? Of course the answer is that none of these students are counted as dropouts; rather, they are voluntary transfers (with no follow up to determine where they’ve gone or what happened to them).
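The arithmetic behind that disconnect is simple to sketch. The counts below are purely illustrative, not North Star's actual enrollment data:

```python
# A cohort of 100 students entering in grade 5, half of whom leave before
# grade 12, with every departure coded as a "voluntary transfer."
entering_cohort = 100
seniors = 50          # same cohort by senior year
graduates = 50        # every remaining senior graduates
coded_dropouts = 0    # no departures were ever coded as dropouts

graduation_rate = graduates / seniors        # 1.0 -> "100% graduation!"
dropout_rate = coded_dropouts / seniors      # 0.0 -> "no dropouts!"
cohort_ratio = seniors / entering_cohort     # 0.5 -> half the cohort is gone
```

Both headline rates are computed over the surviving denominator, so heavy attrition never shows up in either one; only the cohort ratio reveals it.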

In any case, it seemed at best a bit disingenuous, and at worst outright fraudulent, for North Star to present itself as near perfect, when a deeper dive into the data (something North Star’s own data driven leaders fail to ever report) suggests otherwise.

Here, I quickly explore the significance of this issue across charter and district schools statewide.

First, let’s look at 2013 graduation rates and the 2012-13 fall enrollment cohorts as seniors relative to themselves as freshman.

As the key indicates, orange dots are district cohort ratios – the senior class of 2013 as a percentage of the same cohort’s freshman class of 2009-10. Green dots are graduation rates for all of the same district schools. Blue circles are reported graduation rates for charters and red squares are cohort ratios. The trendline is fit to charter and district school cohort ratios. In most cases, the cohort ratios are lower than the reported grad rates, but not by a whole lot. For TEAM Academy the two are close enough to overlap. For Central Jersey College Prep and University Academy, there appears to be a differential of about 5 to 7%.

But for North Star, the gap is huge. If we evaluate North Star on its reported graduation rate, the school looks great. Nearly perfect! But even compared to other schools statewide, on the same measure of cohort loss, North Star is no leader. Rather, it’s a laggard. (not a Paterson Sci/Tech or Hoboken laggard, but a laggard nonetheless).

Let’s take a look now at the 2012 and 2013 graduation rates, averaged, and the last 3 cohorts of sophomores to seniors, averaged just to see if the above single year estimates are anomalous.

Taking the two years of grad rates and three cohorts actually reveals that North Star a) falls even further below the trendline (worse than the average district school) AND b) still has a massive gap between reported grad rate and cohort loss. TEAM now also has a gap, but that gap is smaller than for North Star. Indeed, it is possible that TEAM is back filling enrollments (adding kids in high school to fill empty seats), but I’ll leave Ryan Hill, chief exec of TEAM to let me know if that’s the case.

Now, it’s certainly also possible that district schools in Newark are adding kids in upper grades, as they exit from North Star, or other charter or magnet schools. It is far less likely that many of these students are shifting to selective private schools (and upward, outward transfer) after the 10th grade.

Finally, let’s take a look at the gaps between reported graduation rates and cohort ratios, again using the last three sophomore to senior cohorts and last two years of graduation rates.

Consider this a test of the legitimacy of using the graduation rate to characterize the extent to which schools actually help students persist toward high school completion. The above graph suggests that North Star’s graduation rate is overstated by 10 to 15%, averaged over time, and that their graduation rate is far more inflated than nearly anyone else’s except American History High’s. TEAM is actually quite low in average difference between cohort attrition and reported graduation rate. The other high outlier here is Central Jersey College Prep.

Again, there can be a number of enrollment flow/transfer reasons for the gaps between cohort attrition and reported graduation rate. But, at the very least these figures should be regularly reported and used as a basis for evaluating the validity of reported graduation rates.


Welcome to Relay Medical College & North Star Community Hospital

Arne Duncan has one whopper of an interview available here: http://www.msnbc.com/andrea-mitchell-reports/watch/better-preparing-our-nations-educators-237066307522

Related to his new push to evaluate teacher preparation programs using student outcome data: https://schoolfinance101.wordpress.com/2014/04/25/arne-ology-the-bad-incentives-of-evaluating-teacher-prep-with-student-outcome-data/

And his White House press release can be found here: http://www.whitehouse.gov/the-press-office/2014/04/25/fact-sheet-taking-action-improve-teacher-preparation

Now, there’s a whole lot to chew on here, but let me focus on one of the more absurd hypocrisies in all of this.

First, Duncan seems to think the world of medical education without apparently having the first clue about how any of it actually works. In his view, it’s really just a matter of intensive clinical training (no academic prerequisites required) and competitive wages (a reasonable, though shallowly articulated argument).

Second, Duncan also seems to think that a major part of the solution for Ed Schools can be found in entrepreneurial startups like Relay Graduate School of Education. The White House press release proclaims:

Relay Graduate School of Education, founded by three charter management organizations, measures and holds itself accountable for both program graduate and employer satisfaction, and requires that teachers meet high goals for student learning growth before they can complete their degrees. There is promise that this approach translates into classroom results as K-12 students of Relay teachers grew 1.3 years in reading performance in one year.

Now, I’ll set aside for the moment that the student outcome metrics proposed for use in evaluating ed schools create the same bad incentives (and unproven benefits) that the feds have imposed for evaluating physicians and hospitals.

Let’s instead consider the model of the future – one which blends Arne Duncan’s otherwise entirely inconsistent models of training. I give you:

The Relay Medical College and North Star Community Hospital

Here’s how it all works. Deep in the heart of some depressed urban core where families and their children suffer disproportionate obesity, asthma and other chronic health conditions, where few healthy neighborhood groceries exist, but plenty of fast food joints are available, sits the newly minted North Star Community Hospital.

It all starts here. NCSH is a new kind of hospital that does not require any of its staff to actually hold medical degrees, any form of board certification or nursing credential, or even special technician degrees to operate medical equipment or handle medications. Rather, NCSH recruits bright young undergrads from top liberal arts colleges, with liberal arts majors, and puts them through an intense 5 week training program where they learn to berate and belittle minority families and children and shame them into eating more greens and fiber. Where they learn to demean them into working out – walking the treadmill, etc. It’s rather like an episode of the Biggest Loser. And the Hospital is modeled on the premise that if it can simply engage enough of the community members in its bootcamp style wellness program, delivered by these fresh young faces, they can substantively alter the quality of life in the urban core.

There is indeed some truth to the argument. Getting more community members to eat healthier and exercise will improve their health stats, including morbidity and mortality measures commonly used in hospital rating systems. In fact, over time, this Hospital, which provides no actual medical diagnosis and treatment, does produce annual reports that show astoundingly good outcome measures for community members who complete their program.

These great outcome measures generate headlines from the local news writers who fail to explore more deeply what they mean (Yes Star Ledger editorial board, that’s you!). NCSH becomes such a darling of the media and politicians that they are granted authority to start their own medical school to replicate their “successes.” And they are granted the authority to run a medical school where medical training need not even be provided by individuals with medical training!

Rather, they will grant medical degrees to their own incoming staff based on their own experiences with healthcare awesomeness. That’s right, individuals who themselves had little or no basic science or actual supervised clinical training in actual medicine, but have 3 to 5 years of experience in medical awesomeness in this start-up (pseudo) Hospital will grant medical degrees – to their own incoming peers!

Acknowledging the brilliance of this new model, US Dept of Health officials established a new rating system for all medical colleges whereby they must show that graduates of their programs reduce patient morbidity and mortality. RMC and NCSH continue to lead the nation, despite providing no actual medical interventions, but sticking to their plan of tough love, no excuses wellness training.

But, one day, it comes to light that while approximately 50 community members per year succeeded in the NCSH program and did in fact experience improved quality of life, there had been over 150 entrants to the program each year (like this). In fact, most failed. Some simply weren’t up for the daily berating inflicted on them by NCSH staff. Some had other chronic health ailments and were told by NCSH staff to suck it up, get in line (literally, in line, step left only when told) or leave.

It became clear that patients with diabetes and heart conditions need not apply. None of the staff employed at NCSH had training in cardiology or for that matter any CPR or basic life support skills. That stuff really didn’t matter to them and they sure as heck weren’t going to stand for someone keeling over on the treadmill and ruining the NCSH mortality stats.

Sadly, by this point in time RMC and NCSH had become such a touted model that the real urban hospitals had all been closed. Further, there were few if any incentives for real medical colleges to train physicians to work in the urban core, where the traditional medical model had now been fully replaced by the RMC/NCSH model. They certainly couldn’t match the stats that NCSH was posting if they chose to serve patients who actually had chronic health conditions, or were non-compliant patients.

And those 100 dropouts of the NCSH program from each cohort, those with diabetes, heart disease and other health conditions not so easily mediated with a good shout down, were simply out of luck. Actual community morbidity and mortality stats skyrocketed. But alas, no one was left to care.

Note:

Indeed, wellness is key to the provision of high quality healthcare in the urban core and elsewhere. But it is not a replacement. And yes, one can make an argument that the bootcamp program described above as NCSH legitimately helped to improve the health outcomes and perhaps even the overall quality of life for the 50 program completers, as does the reality TV show Biggest Loser.

One can certainly make the comparison to the benefits obtained by the 50% or so actual completers of the most no excusiness charter schools like North Star Academy in Newark, NJ. Those few students who do succeed and complete are likely better off academically than they might have otherwise been. But this by no means indicates that North Star Academy and Relay Graduate School of Education, or my hypothetical North Star Community Hospital and Relay Medical College, are model programs for serving the public good. In fact, as pointed out here, assuming so, and applying bogus, easily manipulated and simply wrongheaded metrics to proclaim success, may in the end cause far more harm than good.


Arne-Ology & the Bad Incentives of Evaluating Teacher Prep with Student Outcome Data

As I understand it, USDOE is going to go ahead with the push to have teacher preparation programs rated in part based on the student growth outcomes of children taught by individuals receiving credentials from those programs. Now, the layers of problems associated with this method are many and I’ve addressed them previously here and in professional presentations.

  1. This post summarizes my earlier concerns about how the concept fails both statistically and practically.
  2. This post explains what happens at the ridiculous extremes of this approach (a warped, endogenous cycle of reformy awesomeness)
  3. These slides present a more research based, and somewhat less snarky critique

Now, back to the snark.

This post builds on my most recent post in which I challenged the naive assertion that current teacher ratings really tell us where the good teachers are. Specifically, I pointed out that in Massachusetts, if we accept the teacher ratings at face value, then we must accept that good teachers are a) less likely to teach in middle schools, b) less likely to teach in high poverty schools and c) more likely to teach in schools that have more girls than boys.


Extending these findings to the policy of rating teacher preparation programs by the ratings their teachers receive, and working on the assumption that these ratings are quite strongly biased by school context, it would make sense for Massachusetts teacher preparation institutions to try to get their teachers placed in low poverty elementary schools that have fewer boys.

Given that New Jersey growth percentile data reveal even more egregious patterns of bias, I now offer insights for New Jersey colleges of education as to where they should try to place their graduates – that is, if they want to win at the median growth percentile game.


It’s pretty simple – New Jersey colleges of education would be wise to get their graduates placements in schools that are:

  • 20% or fewer free lunch (to achieve good math gains)
  • 5% or lower black (to achieve good math gains)
  • 11% or lower free lunch (to achieve good LAL gains)
  • 2% or lower black (to achieve good LAL gains)

Now, the schools NJ colleges of ed should avoid (for placing their grads) are those that are:

  • over 50% free lunch
  • over 30% black

That is, if colleges of education want to play this absurd game of chasing invalid metrics.

Let’s take a look at some of the specific districts that might be of interest.

Here are the districts with the highest and lowest growth producing teachers (uh… assuming this measure has any attribution to teacher quality).


Now, my New Jersey readers can readily identify the differences between these groups, with a few exceptions. Ed schools in NJ would be wisest to maximize their placements in locations like Bernards Twp, Essex Fells, Princeton, Mendham and Ridgewood. After all, what young grads wouldn’t want to work in these districts? And of course, Ed schools would be advised to avoid placing any grads in districts like East Orange, Irvington or Newark.

Let me be absolutely clear here. I AM NOT ACTUALLY ADVOCATING SUCH DETRIMENTAL UNETHICAL BEHAVIOR.

Rather, I am pointing out that newly adopted USDOE regulations in fact endorse this model by requiring that this type of data actually be used to consequentially evaluate teacher preparation programs.

It’s simply wrong. It’s bad policy. And it must stop!

And yes… quite simply… this is WORSE THAN THE STATUS QUO!

For further discussion on this point, I refer you to this post!


The Endogeneity of the Equitable Distribution of Teachers: Or, why do the girls get all the good teachers?

Recently, the Center for American Progress (disclosure: I have a report coming out through them soon) released a report in which they boldly concluded, based on data on teacher ratings from Massachusetts and Louisiana, that teacher quality is woefully inequitably distributed across children by the income status of those children. As evidence of these inequities, the report’s authors included a few simple graphs, like this one, showing the distribution of teachers by their performance categories:

Figure 1. CAP evidence of teacher quality inequity in Massachusetts


Based on this graph, the authors conclude:

In Massachusetts, the percentage of teachers rated Unsatisfactory is small overall, but students in high-poverty schools are three times more likely to be taught by one of them. The distribution of Exemplary teachers favors students in high-poverty schools, who are about 30 percent more likely to be taught by an exemplary teacher than are students in low-poverty schools. However, students in high-poverty schools are less likely to be taught by a Proficient teacher and more likely to be taught by a teacher who has received a Needs Improvement rating. (p. 4)

But, there exists (at least) one huge problem with the assertion that teacher ratings, built significantly on measures such as Student Growth Percentiles, provide evidence of inequitable distribution of teaching quality. It is very well understood that many value added estimates used in state policy and practice, and most if not all student growth percentile measures used in state policy and practice, are substantially influenced by student population characteristics including income status, prior performance and even the gender balance of classrooms.

Let me make this absolutely clear one more time: simply because student growth percentile measures are built on expected current scores of individual students given their prior scores does not mean, by any stretch of the statistical imagination, that SGPs "fully account for student background," much less for classroom context factors, including the other students in the room and the student group in the aggregate. Further, value-added models (VAMs), which may take additional steps to account for these potential sources of bias, are typically not fully successful at removing it.

Figure 2 here shows the problem. As I’ve explained numerous previous times, growth percentile and value added measures contain 3 basic types of variation:

  1. Variation that might actually be linked to practices of the teacher in the classroom;
  2. Variation that is caused by other factors not fully accounted for among the students, classroom setting, school and beyond;
  3. Variation that is, well, complete freakin statistical noise (in many cases, generated by the persistent rescaling and stretching, cutting and compressing, then stretching again, changes in test scores over time which may be built on underlying shifts in 1 to 3 additional items answered right or wrong by 9 year olds filling in bubbles with #2 pencils).

Our interest is in #1 above, but to the extent that there is predictable variation, it combines #1 and #2, and we are generally unable to determine what share of the variation is #1 and what share is #2.
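To make the decomposition concrete, here is a minimal simulation sketch (all numbers invented for illustration, not drawn from any state's data): true teacher effects are generated independently of school poverty, but a context effect tied to student composition (#2) and pure noise (#3) also feed the measured "growth" score. The naive measure then correlates with poverty even though true teacher quality does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools = 2000

# Hypothetical simulation: true teacher effects are assigned independently
# of school poverty, but a context effect (driven by student mix) and
# noise also feed into the measured "growth" score.
poverty = rng.uniform(0, 1, n_schools)           # share of low-income students
teacher_effect = rng.normal(0, 1, n_schools)     # source #1, independent of poverty
context_effect = -2.0 * poverty                  # source #2, driven by student mix
noise = rng.normal(0, 1, n_schools)              # source #3, statistical noise

measured_growth = teacher_effect + context_effect + noise

# The naive growth measure correlates strongly with poverty even though
# the true teacher effect, by construction, does not.
print(np.corrcoef(poverty, teacher_effect)[0, 1])   # ~0
print(np.corrcoef(poverty, measured_growth)[0, 1])  # clearly negative
```

An observer handed only `measured_growth` and `poverty` has no way to tell whether the negative correlation reflects sorting of bad teachers into poor schools or the context effect baked into the measure.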

Figure 2. The Endogeneity of Teacher Quality Sorting and Ratings Bias

Slide2

A really important point here is that many if not most models I’ve seen actually adopted by states for evaluating teachers do a particularly poor job at parsing 1 & 2. This is partly due to the prevalence of growth percentile measures in state policy.

This issue becomes particularly thorny when we try to make assertions about the equitable distribution of teaching quality. Yes, as per the figure above, teachers do sort across schools, and we have much reason to believe they sort inequitably with respect to student population characteristics. The problem is that those same student population characteristics, in many cases, also strongly influence teacher ratings.

As such, those teacher ratings themselves aren't very useful for evaluating the equitable distribution of teaching. In fact, in most cases it's a pretty darn useless exercise, ESPECIALLY with the measures commonly adopted across states to characterize teacher quality. Determining the inequity of teacher quality sorting requires that we be able to separate #1 and #2 above – that we know the extent to which the uneven distribution of students affected the teacher rating, versus the extent to which teachers with higher ratings sorted into more advantaged school settings.

Now, let’s take a stroll through just how difficult it is to sort out whether the inequity CAP sees in Massachusetts teacher ratings is real, or more likely just a bad, biased ratings system.

Figure 3 relates the % of teachers in the bottom two ratings categories to the share of children qualified for free lunch, by grade level, across Massachusetts schools. As we can see, low poverty schools tend to have very few of those least effective teachers, whereas many, though not all, higher poverty schools have larger shares, consistent with the CAP findings.

Figure 3. Relating Shares of Low Rated Teachers and School Low Income Share in Massachusetts

Slide3

Figure 4 presents the cross school correlations between student demographic indicators and teacher ratings. Again, we see that there are more low rated teachers in higher poverty, higher minority concentration schools.

But, as a little smell-test here, I’ve also included % female students, which is often a predictor of not just student test score levels but also rates of gain. What we see here is that at the middle and secondary level, there are fewer “bad” teachers in schools that have higher proportions of female students.

Does that make sense? Is it really the case that the “good” teachers are taking the jobs in the schools with more girls?

Figure 4. Relating Shares of Low Rated Teachers and School Demographics in Massachusetts

Slide4
Okay, let’s do this as a multiple regression model and, for visual clarity, graph the coefficients in Figure 5. Here, I’ve regressed the % of low-performing teachers on each of the demographic measures. I find a negative (though only significant at p<.10) coefficient on the % female measure. That is, schools with more girls have fewer “bad” teachers. Yes, schools with more low income kids seem to have more “bad” teachers, but in my view, the whole darn thing is suspect.

Figure 5. Regression Based Estimates of Teacher Rating Variation by Demography in Massachusetts

Slide5

So, the Massachusetts ratings seem hardly useful for sorting out bias versus actual quality and thus determining which kids are being subjected to better or worse teachers.

But what about other states? Well, I’ve written much about the ridiculous levels of bias in the New Jersey Growth Percentile measures. But, here they are again.

Figure 6. New Jersey School Growth Percentiles by Low Income Concentration and Grade 3 Mean Scale Scores

 Slide6

Figure 6 shows that New Jersey school median growth percentiles are associated with both low income concentration and average scale scores of the first tested grade level. The official mantra of the state department of education is that these patterns obviously reflect that low income, low performing children are simply getting the bad teachers. But that, like the CAP finding above, is an absurd stretch given the complete lack of evidence as to what share of these measures, if any, can actually be associated with teacher effect and what share is driven by context and students.

So, let’s throw in that percent female effect just for fun. Table 1 provides estimates from a few alternative regression models of the school level SGP data. As with the Massachusetts ratings, the regressions show that the share of student population that is female is positively associated with school level median growth percentile, and quite consistently and strongly so.

Extending CAP’s logic to these findings, we must now assume that the girls get the best teachers! Or at least that schools with more girls are getting the better teachers. Could it really have nothing to do with classrooms and schools with more girls being, for whatever reason, more likely to generate test score gains, even with the same teachers? But then again, this is all circular.

Table 1. Regressions of New Jersey School Level Growth Percentiles on Student Characteristics

Slide7

Note here that these models explain, in the case of LAL, nearly 40% of the variation in growth percentiles. That’s one heck of a lot of potential bias. Well, either that, or teacher sorting in NJ is particularly inequitable. But knowing what’s what here is impossible. My bet is on some pretty severe bias.
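For readers who want to see the mechanics of this kind of smell test, here is a hedged sketch (simulated data with invented coefficients – not the actual NJ SGP file): school-level growth scores are generated with no teacher-quality variation at all, only student-composition effects, yet a simple OLS regression on demographics "explains" a large share of the variance and recovers a positive % female coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1500  # hypothetical number of schools

# Hypothetical data-generating process: growth scores depend on poverty
# and gender balance of the student body, but contain NO teacher-quality
# differences whatsoever.
frl = rng.uniform(0, 1, n)                # % free/reduced lunch
female = rng.normal(0.5, 0.05, n)         # % female students
sgp = 50 - 20 * frl + 40 * (female - 0.5) + rng.normal(0, 5, n)

# OLS of the growth measure on the demographics (intercept included)
X = np.column_stack([np.ones(n), frl, female])
beta, *_ = np.linalg.lstsq(X, sgp, rcond=None)

resid = sgp - X @ beta
r2 = 1 - resid.var() / sgp.var()
print(beta)  # negative FRL coefficient, positive % female coefficient
print(r2)    # a large share of variance "explained" by demographics alone
```

A naive reader of this regression would conclude that poor schools (and schools with fewer girls) get the worse teachers – when, by construction, teacher quality plays no role at all.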

Now for one final shot, with a slightly different twist. New York City uses a much richer value-added model which accounts much more fully for student characteristics. The model also accounts for some classroom and school characteristics. But the New York City model, which also produces much noisier estimates as a result (the more you parse the bias, the more you’re left with noise), doesn’t seem to fully capture some other potential contributors to value added gains. The regressions in Table 2 below summarize resource measures that predict variation in school aggregated teacher value added estimates for NYC middle schools.

Table 2. How resource variation across MIDDLE schools influences aggregate teacher value-added in NYC

Slide8

Schools with smaller classes or higher per pupil budgets have higher average teacher value added! It’s also the case that schools with higher average scale scores have higher average teacher value added. That poses a potential bias problem. Student characteristics must be evaluated in light of the inclusion of the average scale score measure.

Indeed, more rigorous analyses can be done to sort out the extent to which “better” (higher test-score-gain-producing) teachers migrate to more advantaged schools, but only with very limited samples of teachers who have prior ratings in one setting, then sort into another (and maintain some stable component of their prior rating). Evaluating at large scale without tracking individual moves, even while trying to include a richer set of background variables, is likely to mislead.

Another alternative is to reconcile teacher sorting by outcome measures with teacher sorting by other characteristics that are exogenous (not trapped in this cycle of cause and effect). Dan Goldhaber and colleagues provide one recent example applied to data on teachers in Washington State. Goldhaber and colleagues compared the distributions of a) novice teachers, b) teachers with low VAM estimates and c) teachers by their own test scores on a certification exam, across classrooms, schools and districts by 1) minority concentration, 2) low income concentration and 3) prior performance. That is, they reconciled the distribution of their potentially endogenous measure (VAM) with two exogenous measures (teacher attributes). And they did find disparities.

Notably, in contrast with much of the bluster about teacher quality distribution being primarily a function of corrupt, rigid, contract-driven within-district and within-school assignment of teachers, Goldhaber and colleagues found the between-district distribution of teacher measures to be most consistently disparate:

For example, the teacher quality gap for FRL students appears to be driven equally by teacher sorting across districts and teacher sorting across schools within a district. On the other hand, the teacher quality gap for URM (underrepresented minority) students appears to be driven primarily by teacher sorting across districts; i.e., URM students are much more likely to attend a district with a high percentage of novice teachers than non-URM students. In none of the three cases do we see evidence that student sorting across classrooms within schools contributes significantly to the teacher quality gap.

These findings, of course, raise issues regarding the logic that district contractual policies are the primary driver of teacher quality inequity (the BIG equity problem, that is). Separately, while the FRL results are not entirely consistent with the URM (Underrepresented Minority) findings, this may be due to the use of a constant income threshold for comparing districts in rural Eastern Washington to districts in the Seattle metro. Perhaps more on this at a later point.

Policy implications of misinformed conclusions from bad measures

The implications of ratings bias vary substantially by the policy preferences supported to resolve the supposed inequitable distribution of teaching. One policy preference is the “fire the bad teachers” preference, assuming that a whole bunch of better teachers will line up to take their jobs. If we impose this policy alternative using such severely biased measures as the Massachusetts or New Jersey measures, we will likely find ourselves disproportionately firing and detenuring, year after year, teachers in the same high need schools, having little or nothing to do with the quality of the teachers themselves. As each new batch of teachers enters these schools, and subsequently faces the same fate due to the bogus, biased measures it seems highly unlikely that high quality candidates will continue to line up. This is a disaster in the making. Further, applying the “fire the bad teachers” approach in the presence of such systematically biased measures is likely a very costly option – both in terms of the district costs of recruiting and training new batches of teachers year after year, and the costs of litigation associated with dismissing their predecessors based on junk measures of their effectiveness.

Alternatively, if one provides compensation incentives to draw teachers into “lower performing” schools, and perhaps take efforts to improve working conditions (facilities, class size, total instructional load), fewer negative consequences – even in the presence of bad, biased measurement, are likely to occur. One can hope, based on recent studies of transfer incentive policies, that some truly “better” teachers would be more likely to opt to work in schools serving high need populations, even where their own rating might be at greater risk (assuming policy does not assign high stakes to that rating). This latter approach certainly seems more reasonable, more likely to do good, and at the very least far less likely to do serious harm.

Why you can’t compare simple achievement gaps across states! So don’t!

Consider this post the second in my series of basic data issues in education policy analysis.

This is a topic on which I’ve written numerous previous posts. In most previous posts I’ve focused specifically on the issue of problems with poverty measurement across contexts and how those problems lead to common misinterpretations of achievement gaps. For example, if we simply determine achievement gaps by taking the average test scores of children above and below some arbitrary income threshold, like those qualifying or not for the federally subsidized school lunch program, any comparisons we make across states will be severely compromised by the fact that a) the income threshold we use may provide very different quality of life from Texas to New Jersey and b) the average incomes and quality of life of those above that threshold versus those below it may be totally different in New Jersey than in Texas.

For example, the histogram below presents the New Jersey and Texas poverty income distributions for families of children between the ages of 5 and 17. The poverty index is the ratio of family income to the poverty income level (which is fixed nationally). The histograms are generated using 2011 American Community Survey data extracted from http://www.ipums.org (one of my favorite sites!). The vertical line is set at 185% of poverty, the federal “reduced price lunch” threshold, a common cutoff used in comparing low income to non-low income student achievement gaps.

As we can see, the income distribution in New Jersey is simply higher than that of Texas. It’s also more dispersed. And, as it turns out, the ratio of income for those above versus those below the 185% threshold is much greater in New Jersey.

In New Jersey, the income ratio is about 6:1 for those above versus those below the 185% threshold. In Texas, the ratio is about 4.5:1. And that matters when comparing achievement gaps!
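The above-versus-below income ratio is easy to compute from microdata. The sketch below uses made-up lognormal draws for two hypothetical states (not the actual ACS extracts) just to show the calculation, and how a higher, more dispersed income distribution yields a larger ratio at the same fixed threshold.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical income-to-poverty ratios (in % of the poverty line) for two
# made-up states; parameters are invented to mimic a richer, more dispersed
# state versus a poorer, tighter one.
state_hi = rng.lognormal(mean=1.6, sigma=0.9, size=50_000) * 100
state_lo = rng.lognormal(mean=1.2, sigma=0.7, size=50_000) * 100

def above_below_ratio(poverty_index, threshold=185):
    """Mean income-to-poverty ratio above the threshold / mean below it."""
    above = poverty_index[poverty_index >= threshold]
    below = poverty_index[poverty_index < threshold]
    return above.mean() / below.mean()

print(above_below_ratio(state_hi))  # larger gap between haves and have-nots
print(above_below_ratio(state_lo))
```

Same 185% line, very different "non-low-income vs. low-income" contrast – which is exactly why the resulting achievement gaps aren't comparable across states.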

Figure 1. Poverty Income Distributions for Texas and New Jersey

Slide1

Figure 2 illustrates the relationship between the income ratios for non-low-income to low-income children’s families and outcome gaps for NAEP grade 4 math in 2011. Put simply, states with larger income gaps also have larger outcome gaps. States with the largest income gaps, like Connecticut, have particularly large outcome gaps. Clearly, it would be inappropriate to directly compare the income achievement gap of Idaho, for example, to that of New Jersey. New Jersey does have a larger outcome gap. But New Jersey also has a larger income gap. Both states fall below the trendline, indicating (if we assume the relationship to be linear) that their outcome gaps are both smaller than expected and, in fact, quite comparable.
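The "smaller than expected given income disparity" claim is just a residual from a fitted trendline. A minimal sketch, with entirely made-up state values (NOT the actual NAEP/ACS numbers):

```python
import numpy as np

# Hypothetical illustration: income-gap ratios (non-low/low income) and
# NAEP-style outcome gaps for a handful of invented states.
income_gap  = np.array([3.8, 4.2, 4.5, 5.0, 5.6, 6.0, 6.6])
outcome_gap = np.array([18., 20., 24., 25., 27., 26., 33.])

# Fit the linear trend, then compare each state's actual gap to the gap
# "expected" given its income disparity.
slope, intercept = np.polyfit(income_gap, outcome_gap, 1)
expected = intercept + slope * income_gap
residual = outcome_gap - expected

# A state below the trendline (negative residual) has a SMALLER outcome
# gap than its income gap predicts, even if its raw gap is large; here the
# state with income gap 6.0 has a big raw gap but a negative residual.
print(np.round(residual, 1))
```

In this toy data, the state with the second-largest raw outcome gap is actually well below the trendline – the direct comparison of raw gaps would rank it unfairly.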

Figure 2: Relating Income Gaps to Outcome Gaps (NAEP Grade 4 Math)

Slide2
Table 1 summarizes the correlations between income and outcome gaps for each NAEP math and reading assessment, over several years.

Table 1. Correlations between Income and Outcome Gaps

Slide3

It stands to reason that if the income differences between low income and non-low income families affect the income achievement gaps, then so too would the income differences between racial groups affect the outcome differences between racial groups. Therefore, it is equally illogical to compare directly racial achievement gaps across states.

Figure 3a shows the black and white family income distributions in Texas and 3b shows the income distributions in Connecticut.

In Texas, the ratio of family income for white families to black families is about 1.5 to 1.

In Connecticut, that ratio is over 2.3:1.

Figure 3a. Black and White Income Distributions in Texas

Slide5

Figure 3b. Black and White Income Distributions in Connecticut

 Slide4

Thus, as expected, Figure 4 shows that the income gaps between black and white families are quite strongly correlated with the outcome gaps of their children in Math (Grade 4).

Figure 4. Income Gaps between Black and White Students and Outcome Gaps

Slide6

Table 2 shows the correlations between black-white family income gaps and black-white child outcome gaps for NAEP assessments since 2000.

Table 2. Correlations between Black-White Income Gaps and Black-White Test Score Gaps

Slide7

Why is this important? Well, it’s important because state officials and data-naïve education reporters love to make a big deal about which states have the biggest achievement gaps, and by extension assert that the primary reason for these gaps is those states’ lack of policy attention to them.

Connecticut reporters and politicos love to point to that state’s “biggest in the nation” achievement gap, with absolutely no cognizance of the fact that their achievement gaps, both income and race related, are driven substantially by the vast income disparity of the state. That said, Connecticut consistently shows larger gaps than would even be expected for its level of income disparity.

Black-white achievement gaps are similarly a hot topic in Wisconsin, but with little acknowledgment that Wisconsin also has the largest income gap (other than DC) between the families of black and white children.

New Jersey officials love to downplay the state’s high average performance by lambasting defenders of public schools with evidence of largest in the nation achievement gaps.

“The dissonance in that is if you get beneath the numbers, beneath the aggregates, you’ll see that we have one of the largest achievement gaps in the nation.” (former Commissioner Christopher Cerf)

Years ago, politicos and education writers might have argued that these gaps persist because of funding and related resource gaps. Nowadays, the same groups might argue that these gaps persist because of employment protections for “bad teachers” in high poverty, high minority concentration schools, and that where the gaps are bigger, those protections must somehow be most responsible.

But these assertions – both the old and the new – presume that comparisons of achievement gaps, either by race or income, between states are valid. That is, they validly reflect policy/practice differences across states and not some other factor.

Quite simply, as most commonly measured, they do not. They largely reflect differences in income distributions across states, a nuance I suspect will continue to be overlooked in public discourse and the media. But one can hope.
Understand your data & use it wisely! Tips for avoiding stupid mistakes with publicly available NJ data

My next few blog posts will return to a common theme on this blog – appropriate use of publicly available data sources. I figure it’s time to put some positive, instructive stuff out there: some guidance for more casual users (and more reckless ones) of public data sources, and for those just making their way into the game. In this post, I provide a few tips on using publicly available New Jersey schools data. The guidance provided herein is largely in response to repeated errors I’ve seen over time in the use and reporting of New Jersey school data, where some of those errors are simple oversights reflecting a lack of deep understanding of the data, and others seem a bit more suspect. Most of these recommendations apply to using other states’ data as well. Notably, most of these are tips that a thoughtful data analyst would arrive at on his/her own by engaging in appropriate preliminary evaluations of the data. But sadly, these days, it doesn’t always seem to work that way.

So, here are a few NJ state data tips.

NJ ASK scale score data vary by grade level, so aggregating across grade levels produces biased comparisons if schools have different numbers of kids in different grade levels

NJ, like other states, has adopted math and reading assessments in grades 3 to 8 and, like other states, has made numerous rather arbitrary decisions over time about how to establish the cut scores determining proficiency on those assessments, and about the methods for converting raw scores (numbers of items correct on a 50-point test) into scale scores (with a proficiency cut score of 200 and a maximum score of 300).[1] The presumption behind this method is that “proficiency” has some common meaning across grade levels: that a child who is proficient in grade 3 math, for example, will again be proficient at the end of 4th grade if he or she learns what 4th graders are supposed to learn (and only that). But that doesn’t mean that the distributions of testing data actually support this assumption. Alternatively, the state could have scaled the scores year over year such that the average student remained the average student – a purely normative approach rather than the pseudo-standards-based (mostly normative) approach currently in use.

A few fun artifacts of the current approach are that a) proficiency rates vary from one grade to the next, giving a false impression that, for example, 5th graders simply aren’t doing as well as 4th graders in language arts, and that b) scale score averages vary similarly. Many a 5th or 6th grade teacher or grade level coordinator across the state has come under fire from district officials for their apparent NJASK underperformance compared to lower grades. But this underperformance is merely an artifact of arbitrary decisions in the design of the tests, the difficulty of the items, the conversion to scale scores, and the arbitrary assignment of cut points.

Here’s a picture of the average scale scores, drawn from school level data weighted by relevant test takers, for NJASK math and NJASK language arts. Of course, the simplest implication here is that “kids get dumber at LAL as they progress through grades” (and/or their teachers simply suck more) and that “kids get smarter in math as they progress through grades.” Alternatively – as stated above – this is really just an artifact of those layers of arbitrary decisions.

Figure 1. Scale Scores by Grade Level Statewide

Slide1

Why then, do we care? How does this affect our common uses of the data? Well, on several occasions I’ve encountered presentations of schoolwide average scale scores as somehow representing school average test performance. The problem is that if you aggregate across grades, but have more kids in some grades than others, your average will be biased by the imbalance of kids. If you are seeking to play this bias to your advantage:

  1. If your school has more kids in grades 6 to 8 than in 3 to 5, you’d want to look at LAL scores. That’s because kids statewide simply score higher on LAL in grades 6 to 8. It would be completely unfair to compare schoolwide LAL scores for a school with mostly grades 6 to 8 students to schoolwide LAL scores for a school with mostly grades 3 to 5 students. Yet it is done far too often!
  2. Interestingly, the reverse appears true for math.

So, consumers of reports of school performance data in New Jersey should certainly be suspicious any time someone chooses to make comparisons solely on the basis of schoolwide LAL scores, or math scores for that matter. While it makes for many more graphs and tables, grade level disaggregation is the only way to go with these data.

Let’s take a look.

Here are Newark charter and district schools by schoolwide LAL and by low income concentration. Here, we see that Robert Treat Academy, North Star Academy and TEAM Academy fall above the line. That is, relative to their low income concentrations (and setting aside very low rates of children with disabilities or ELL children, and 50+% attrition at North Star), they have average schoolwide scale scores that appear to exceed expectations.

Figure 2. Newark Charter vs. District School Schoolwide LAL by Low Income Concentration

Slide3
But it may not be a good idea (unless of course you are gaming the data) to use schoolwide aggregate LAL scale scores to represent the comparative performance of these schools against NPS schools. As Figure 3 shows, both North Star and TEAM Academy – especially TEAM – have larger shares of kids in the grades where average scores tend to be higher as a function of the tests and their rescaling.

Figure 3. Charter and District Grade Level Distributions

Slide2

Enrollment data source: http://www.nj.gov/education/data/enr/enr13/enr.zip

Figures 4a through 4c break out the comparisons by grade level and provide both the math and LAL assessments for a more complete and more accurate picture, though still ignoring many other variables that may influence these scores (attrition, special education, ELL and gender balance). These figures also identify schools slated for takeover by charters. Whereas TEAM Academy appeared on schoolwide aggregate to “beat the odds” on LAL, TEAM falls roughly on the trendline for LAL 6, 7 and 8 and falls below it for LAL 5. That is, disaggregation paints a different picture of TEAM Academy in particular – one of a school that, by grade level, more or less meets the average expectation. Similarly for North Star: while its small groups of 3rd and 4th graders appear to substantially beat the odds, differences are much smaller for its 5th through 8th grade students when compared only to students in those same grades in other schools. Some similar patterns appear for math, except that TEAM in particular falls more consistently below the line.

Figure 4a. Charter and District Scale Scores vs. Low Income by Grade

Slide4

Figure 4b. Charter and District Scale Scores vs. Low Income by Grade

Slide5

Figure 4c. Charter and District Scale Scores vs. Low Income by Grade

Slide6

A few related notes are in order:

  • Math assessments in grades 5-7 have very strong ceiling effects which are particularly noticeable in more affluent districts and schools where significant shares of children score 300.
  • As a result of the scale score fluctuations, there are also by grade proficiency rate fluctuations.

Not all measures are created equal: Measures, thresholds and cutpoints matter!

I’ve pointed this one out on a number of occasions – that finding the measure that best captures the variations in needs across schools is really important when you are trying to tease out how those variations relate to test scores. I’ve also explained over and over again how measures of low income concentration commonly used in education policy conversations are crude and often fail to capture variation across settings. But, among the not-so-great options for characterizing differences in student needs across schools, there are better and worse methods and measures. Two simple and highly related rules of thumb apply when evaluating factors that may affect or be strongly associated with student outcomes:

  1. The measure that picks up more variation across settings is usually the better measure, assuming that variation is not noise (simply a greater amount of random error in reporting of the measure).
  2. Typically, the measure that picks up more “real” variation across settings will also be more strongly correlated with the measure of interest – in many cases variations in student outcome levels.

A classic case of how different thresholds or cutpoints affect the meaningful variation captured across school settings is the choice of shares of free lunch (below the 130% threshold for poverty) versus free or reduced priced lunch (below the much higher 185% threshold) when comparing schools in a relatively high poverty setting. In many relatively high poverty settings, the share of children in families below the 185% threshold exceeds 80 to 90% across all schools. Yes, there may appear to be variation across those schools, but that variation within such a narrow, truncated range may be particularly noisy, and thus not very helpful in determining the extent to which low income shares compromise student outcomes. It is really important to understand that two schools with 80% of children below the 185% income threshold for poverty can be hugely different in terms of actual income and poverty distribution.
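A quick sketch of the point, with made-up income-to-poverty draws (expressed as a percent of the poverty line; these are invented lognormal distributions, not actual school data): two schools can show nearly identical shares below the 185% free/reduced threshold while differing enormously in the share below the 130% free lunch threshold.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical income-to-poverty distributions for two schools: School A's
# families cluster just under the 185% line; School B's families are much
# poorer on average. Parameters are invented for illustration.
school_a = rng.lognormal(mean=np.log(135), sigma=0.30, size=5000)
school_b = rng.lognormal(mean=np.log(95), sigma=0.62, size=5000)

frl_a, free_a = (school_a < 185).mean(), (school_a < 130).mean()
frl_b, free_b = (school_b < 185).mean(), (school_b < 130).mean()

print(f"School A - FRL: {frl_a:.0%}, free lunch: {free_a:.0%}")
print(f"School B - FRL: {frl_b:.0%}, free lunch: {free_b:.0%}")
```

On the usual free/reduced measure the two schools look interchangeable; the 130% measure reveals that one serves a far poorer population than the other.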

Here, for example, is the distribution of schools by concentration of children in families below the 185% income threshold in Newark, NJ. The mean is around 90%!

Figure 5.

Slide8

Now here is the distribution of schools by concentration of children in families below the 130% threshold. The bell curve looks similar in shape, but now the mean is around 80% and the spread is much greater. But even this, by itself, is not proof of the meaningfulness of this variation.

Figure 6.

Slide9
But first a little aside. If, in figure 5, nearly all kids are below the free/reduced threshold and fewer below the free lunch threshold, we basically have a city where “if not free lunch, then reduced lunch.” Plotted, it looks like this:

Figure 7.

Slide10

The correlation here is -.65 across Newark schools. What this actually means is that the percent of reduced price lunch children is, in fact, a measure of the percent of lower-need children in any school – because there are so few children who don’t qualify for either. Children qualifying for reduced price lunch in Newark are among the upper income children in Newark schools. If a school has fewer reduced lunch children, it typically means it has more free lunch children, and vice versa.

As such, comparing charter schools to district schools on the basis of % free or reduced price lunch is completely bogus, because charters serve very low shares of the lower income (free lunch) children but do serve the rest.

Second, it is statistically very problematic to put both of these measures – the inverse of one another because they account for nearly the entire population – in a single regression model!
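The statistical problem here is near-perfect collinearity, which a standard condition-number check on the design matrix exposes immediately. A sketch with simulated shares (invented numbers, not actual Newark data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300  # hypothetical number of schools

# Hypothetical district where nearly everyone qualifies for free OR
# reduced lunch: % reduced is roughly (constant - % free), so the two
# predictors are nearly linear combinations of each other.
pct_free = rng.uniform(0.60, 0.95, n)
pct_reduced = 0.97 - pct_free + rng.normal(0, 0.01, n)  # "if not free, then reduced"

# Design matrix with an intercept plus BOTH measures.
X = np.column_stack([np.ones(n), pct_free, pct_reduced])

# The matrix is close to singular: its condition number explodes, so
# coefficient estimates on the two measures become wildly unstable.
print(np.linalg.cond(X))
```

A condition number in the hundreds or thousands (versus single digits for well-separated predictors) is the regression diagnostic telling you the two measures carry essentially the same information.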

Further validation of the importance of using the measures of actual lower income children is provided in the table below, which shows the correlations between outcome measures across schools and student population characteristics.

Figure 8. Correlations between low income concentrations and outcome measures

Slide11

With respect to every outcome measure, % free lunch is the more strongly (negatively) associated measure. Of course, one striking problem here is that the growth percentile scores, while displaying weaker relationships to low income than the level scores, do show a modest relationship, indicating their persistent bias, even across schools within a relatively narrow range of poverty (Newark). But that’s a side story for now!

To add to the assertion that % reduced lunch in a district like Newark (where % reduced means, % not free lunch), is in fact a measure of relative advantage, take a look at the final column. % reduced lunch alone is strongly positively correlated with the outcome level measures. Statistically, this is a given since it is more or less the inverse of a measure that is strongly negatively correlated with the outcome measures.

Know your data context!

Finally, and this is somewhat of an extension of a previous point, it’s really important if you intend to engage in any kind of comparisons across school settings, to get to know your context. Get to know your data and how they vary across schools. For example, know that nearly all kids in Newark fall below the 185% income threshold and that this means that if a child is not below the 130% income threshold, then they are likely still below the 185% threshold. This creates a whole different meaning from the “usual” assumptions about children qualified for reduced price lunch, how their shares vary across schools, and what it likely means.

Many urban districts have racial population shares that are similarly in inverse proportion to one another. That is, in a city like Newark, schools that are not predominantly black tend to be predominantly Hispanic. Similar patterns exist in Chicago and Philadelphia at much larger scale. Here is the scatterplot for Newark, where the relationship between % black and % Hispanic is almost perfectly inverse!

Figure 9. % Black versus % Hispanic for Newark Schools

As Mark Weber and I pointed out in our One Newark briefs, just as it would be illogical (as well as profoundly embarrassing) to try to include both % free and % reduced lunch in a model comparing Newark schools, it is hugely problematic to include both % Hispanic and % black in any model comparing Newark schools. Quite simply, for the most part, if a school is not one, it’s the other.

Catching these problems is a matter of familiarity with context and familiarity with data. These are common issues. And I encourage budding grad students, think tankers and data analysts to pay closer attention to these issues.

How can we catch this stuff?

Know your context.

Run descriptive analyses first to get to know your data.

Make a whole bunch of scatterplots to get to know how variables relate to one another.

Don’t assume that the relationships and meanings of the measures in one context necessarily translate to another. The best example here is the meaning of % reduced lunch. It might just be a measure of relative advantage in a very high poverty urban setting.

And think… think… think twice… and think again about just what the measures mean… and perhaps more importantly, what they don’t and cannot!
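The advice above can be sketched as a short routine. The column names and numbers below are invented for illustration; the point is simply to run descriptives and correlations before any modeling:

```python
import numpy as np
import pandas as pd

# Invented school-level data mimicking the patterns described above.
rng = np.random.default_rng(2)
n = 25
pct_black = rng.uniform(0.10, 0.90, n)
df = pd.DataFrame({
    "pct_black": pct_black,
    # Near-inverse of pct_black, as in the Newark scatterplot above.
    "pct_hispanic": np.clip(0.95 - pct_black + rng.normal(0, 0.02, n), 0, 1),
    "pct_free": rng.uniform(0.60, 0.95, n),
})

# 1. Descriptives first: the ranges show how much variation actually exists.
print(df.describe().loc[["min", "mean", "max"]].round(2))

# 2. Pairwise correlations flag near-inverse pairs before they wreck a model.
print(df.corr().round(2))

# 3. Scatterplots make the relationships visible at a glance, e.g.:
#    pd.plotting.scatter_matrix(df)
```

Five minutes of this kind of inspection catches the inverse-share traps described above before they ever reach a regression.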

Cheers!

[1] New Jersey Assessment of Skills and Knowledge 2012 TECHNICAL REPORT Grades 3-8. February 2013. NJ Department of Education. http://www.nj.gov/education/assessment/es/njask_tech_report12.pdf

A Response to “Correcting the Facts about the One Newark Plan: A Strategic Approach To 100 Excellent Schools”

New Jersey Education Policy Forum

Full report here: Weber.Baker.OneNewarkResponseFINALREVIEW

Mark Weber & Bruce Baker

On March 11, 2014, the Newark Public Schools (NPS) released a response to our policy brief of January 24, 2014: “An Empirical Critique of One Newark.”[1] Our brief examined the One Newark plan, a proposal by NPS to close, “renew,” or turn over to charter management organizations (CMOs) many of the district’s schools. Our brief reached the following conclusions:

  • Measures of academic performance are not significant predictors of the classifications assigned to NPS schools by the district, when controlling for student population characteristics.
  • Schools assigned the consequential classifications have substantively and statistically significantly greater shares of low income and black students.
  • Further, facilities utilization is also not a predictor of assigned classifications, though utilization rates are somewhat lower for those schools slated for charter takeover.
  • Proposed charter takeovers cannot be justified on the assumption that charters will yield better outcomes…


DFER Idiocy on New York School Finance

This may just be among the most ludicrous proclamations I’ve read in quite some time, and it’s brought to us by none other than Dimwits? Doofuses (doofi?)… well, something with a "D" For Education Reform:

“Contrary to what you may hear from certain special interest groups, the best way to fix our schools is not just to pour more money into the education bureaucracy. New York already spends $75 billion in education annually—from public schools to state funded universities—more than the total annual budget of 47 other states. What we need is smarter investments that actually deliver results, like statewide universal full-day pre-k, scholarships for students in critically-needed STEM courses and funding to reward our hardest working teachers. Governor Cuomo is taking a stand for our students by pushing for these programs, and we should all join him in putting our students first.” – See more at: http://www.dfer.org/blog/2014/03/dfer-ny_release_1.php#sthash.GfFKgHng.dpuf
I’ve spoken on this point on many previous occasions – simply throwing out “big a-contextual numbers” is pointless. It says nothing. It’s bafflingly ignorant – mathematically inept and simply stupid. That NY “already spends $75 billion in education annually” is neither here nor there without context. It’s just dumb blather – pointless. Saying that it’s “more than the total budget of 47 other states” is equally stupid.
Let’s review NY state school finance issues for a moment. Here’s a recap of previous posts:

  1. On how New York State crafted a low-ball estimate of what districts needed to achieve adequate outcomes and then still completely failed to fund it.
  2. On how New York State maintains one of the least equitable state school finance systems in the nation.
  3. On how New York State’s systemic, persistent underfunding of high need districts has led to significant increases of numbers of children attending school with excessively large class sizes.
  4. On how New York State officials crafted a completely bogus, racially and economically disparate school classification scheme in order to justify intervening in the very schools they have most deprived over time.

But sticking specifically to the current issue, take a look here at how the Governor’s current budget proposal provides state aid that falls severely short of WHAT SCHOOL DISTRICTS NEEDED BACK IN 2007 UNDER LOWER STANDARDS, based on the state’s own formula estimates.

Below are the more comprehensive briefs on this topic. Yeah… I know this is way too difficult reading for them DFER folk… and it includes math and numbers… but those seeking a deeper understanding of school funding in New York State, please do read.

Here’s a post in which I explain these issues more concisely! To summarize without copying the whole post: New York State has continued raising outcome standards while falling more than 30% short of its own school finance formula funding targets – targets which were a) low-balled in the first place and b) have been lowered since.

Funding shortfalls for many districts are around 50% of the aid they should receive. And those shortfalls are greatest for the neediest districts. The state continues to underfund NYC’s foundation aid by about $3 billion per year.

Top 50 2014-15 Budgeted Shortfalls

[data run as of 1/17/14]


Further, our annual report Is School Funding Fair? has repeatedly identified New York as among the most regressively financed states in the nation – that is, a state where state and local revenue is systematically lower in higher poverty districts.

My more recent longitudinal analyses show negligible progress for NY state from 1993 to 2011. This graph (below) of mid-Atlantic states shows that despite court orders in the past, NY state continues to operate a regressively funded system.

On the vertical axis is the school funding fairness ratio, based on the model we use in our annual report but extended over 19 years: the ratio of expected spending or revenue at 30% census poverty to expected spending or revenue at 0% poverty. When the ratio is above 1.0, the system is progressive; when it’s below 1.0, it’s regressive.
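As a rough illustration of how such a ratio is computed, the sketch below (invented district data; the actual fairness model includes additional controls) fits a simple regression of log per-pupil revenue on census poverty and compares predictions at 30% and 0% poverty:

```python
import numpy as np

# Invented district-level data for a regressive system: per-pupil revenue
# drifts downward as census poverty rises. (The real fairness model controls
# for district characteristics; this is a bare-bones sketch.)
rng = np.random.default_rng(3)
poverty = rng.uniform(0.00, 0.35, size=200)
log_revenue = np.log(12000) - 0.5 * poverty + rng.normal(0, 0.05, size=200)

# Fit log(revenue) on poverty, then compare predictions at 30% vs. 0% poverty.
slope, intercept = np.polyfit(poverty, log_revenue, 1)
fairness_ratio = np.exp(intercept + slope * 0.30) / np.exp(intercept)
print(round(fairness_ratio, 2))  # below 1.0 => regressive
```

In a progressive system the slope on poverty would be positive and the ratio would land above 1.0.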

NY has been regressive since 1993 and shows little sign of improving (backsliding for now). Also noted are timings of judicial rulings – this stuff is from a forthcoming paper I’m working on for an academic conference. As is well understood, Pennsylvania is also a particularly regressive state.

Figure: School funding fairness ratios, mid-Atlantic states, 1993–2011

And now to reiterate one more thing – which I wrote about just the other day – uh… yesterday! 

School finance reform does matter! On balance, it is safe to say that a significant and growing body of rigorous empirical literature validates that state school finance reforms can have substantive, positive effects on student outcomes, including reductions in outcome disparities or increases in overall outcome levels. Further, it stands to reason that if positive changes to school funding have positive effects on short and long run outcomes both in terms of level and distribution, then negative changes to school funding likely have negative effects on student outcomes.

I’ve also addressed this here: http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

And here! http://www.tcrecord.org/content.asp?contentid=16106

It is completely ignorant to assert that because NY spends $75 billion, that’s enough – that schools just need to spend it better – with absolutely no understanding of where that money is in the state (which districts have more and which have less) or how much may really be needed to provide an equitable and adequate system of school funding in New York.

So… am I forgoing civil discourse here a bit? Hell yeah! This ignorant drivel by DFER is pathetic political pandering. And it’s just dumb. Dangerously dumb.

All toward the noble cause of advocating against equitable and adequate school funding for the state’s neediest schoolchildren.

Civil discourse ended as soon as they decided that facts and context simply don’t matter. I simply have no tolerance for this level of stupid!

What really matters? Equitable & Adequate Funding!

Below is a section of a paper I’ve been working on over the past few weeks (to be presented in Philadelphia in April).

Some of the content below is also drawn from: http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

===============

Over the past several decades, many states have pursued substantive changes to their state school finance systems, while others have not. Some reforms have come and gone. Some reforms have been stimulated by judicial pressure resulting from state constitutional challenges and others have been initiated by legislatures. In an evaluation of judicial involvement in school finance and resulting reforms from 1971 to 1996, Murray, Evans and Schwab (1998) found that “court ordered finance reform reduced within-state inequality in spending by 19 to 34 percent. Successful litigation reduced inequality by raising spending in the poorest districts while leaving spending in the richest districts unchanged, thereby increasing aggregate spending on education. Reform led states to fund additional spending through higher state taxes.” (p. 789)

There exists an increasing body of evidence that substantive and sustained state school finance reforms matter for improving both the level and distribution of short term and long run student outcomes. A few studies have attempted to tackle school finance reforms broadly, applying multi-state analyses over time. Card and Payne (2002) found “evidence that equalization of spending levels leads to a narrowing of test score outcomes across family background groups.” (p. 49) Jackson, Johnson and Persico (2014) use data from the Panel Study of Income Dynamics (PSID) to evaluate long term outcomes of children exposed to court-ordered school finance reforms, matching PSID records to childhood school districts for individuals born between 1955 and 1985 and followed through 2011. They find that the “Effects of a 20% increase in school spending are large enough to reduce disparities in outcomes between children born to poor and non‐poor families by at least two‐thirds,” and further that “A 1% increase in per‐pupil spending increases adult wages by 1% for children from poor families.” (p. 42)

Figlio (2004) explains that the influence of state school finance reforms on student outcomes is perhaps better measured within states over time, because national studies of the type attempted by Card and Payne confront problems of a) the enormous diversity in the nature of state aid reform plans, and b) the paucity of national-level student performance data. Most recent peer-reviewed studies of state school finance reforms have applied longitudinal analyses within specific states, and several such studies provide compelling evidence of the potential positive effects of school finance reforms. Roy (2011) published an analysis of the effects of Michigan’s 1990s school finance reforms, which led to a significant leveling up for previously low-spending districts. Roy, whose analyses measure both whether the policy resulted in changes in funding and who was affected, found that “Proposal A was quite successful in reducing interdistrict spending disparities. There was also a significant positive effect on student performance in the lowest-spending districts as measured in state tests.” (p. 137) Similarly, Papke (2005), also evaluating Michigan school finance reforms from the 1990s, found that “increases in spending have nontrivial, statistically significant effects on math test pass rates, and the effects are largest for schools with initially poor performance.” (Papke, 2005, p. 821)[1] Deke (2003) evaluated “leveling up” of funding for very-low-spending districts in Kansas, following a 1992 lower court threat to overturn the funding formula (without a formal ruling to that effect). Deke found that a 20 percent increase in spending was associated with a 5 percent increase in the likelihood of students going on to postsecondary education. (p. 275)

Two studies of Massachusetts school finance reforms from the 1990s find similar results. The first, a non-peer-reviewed report by Downes, Zabel, and Ansel (2009) explored, in combination, the influence on student outcomes of accountability reforms and changes to school spending. It found that “Specifically, some of the research findings show how education reform has been successful in raising the achievement of students in the previously low-spending districts.” (p. 5) The second study, an NBER working paper by Guryan (2001), focused more specifically on the redistribution of spending resulting from changes to the state school finance formula. It found that “increases in per-pupil spending led to significant increases in math, reading, science, and social studies test scores for 4th- and 8th-grade students. The magnitudes imply a $1,000 increase in per-pupil spending leads to about a third to a half of a standard-deviation increase in average test scores. It is noted that the state aid driving the estimates is targeted to under-funded school districts, which may have atypical returns to additional expenditures.” (p. 1)[2] Downes had conducted earlier studies of Vermont school finance reforms in the late 1990s (Act 60). In a 2004 book chapter, Downes noted “All of the evidence cited in this paper supports the conclusion that Act 60 has dramatically reduced dispersion in education spending and has done this by weakening the link between spending and property wealth. Further, the regressions presented in this paper offer some evidence that student performance has become more equal in the post-Act 60 period. And no results support the conclusion that Act 60 has contributed to increased dispersion in performance.” (p. 312)[3] Most recently, Hyman (2013) also found positive effects of Michigan school finance reforms in the 1990s, but raised some concerns regarding the distribution of those effects. 
Hyman found that much of the increase was targeted to schools serving fewer low income children. But the study did find that students exposed to an additional “12% more spending per year during grades four through seven experienced a 3.9 percentage point increase in the probability of enrolling in college, and a 2.5 percentage point increase in the probability of earning a degree.” (p. 1)

Indeed, this point is not without some controversy, much of which is easily discarded. Second-hand references to dreadful failures following massive infusions of new funding can often be traced to methodologically inept, anecdotal tales of desegregation litigation in Kansas City, Missouri, or court-ordered financing of urban districts in New Jersey (see Baker & Welner, 2011).[4] Hanushek and Lindseth (2009) use a similar anecdote-driven approach in which they dedicate a chapter of a book to proving that court-ordered school funding reforms in New Jersey, Wyoming, Kentucky, and Massachusetts resulted in few or no measurable improvements. However, these conclusions are based on little more than a series of graphs of student achievement on the National Assessment of Educational Progress in 1992 and 2007 and an untested assertion that, during that period, each of the four states infused substantial additional funds into public education in response to judicial orders.[5] Greene and Trivitt (2008) present a study in which they claim to show that court-ordered school finance reforms led to no substantive improvements in student outcomes. However, the authors test only whether the presence of a court order is associated with changes in outcomes; they never once measure whether substantive school finance reforms followed the court order, yet still conclude that court-ordered funding increases had no effect. In an equally problematic analysis, Neymotin (2010) set out to show that massive court-ordered infusions of funding in Kansas following Montoy v. Kansas led to no substantive improvements in student outcomes. However, Neymotin evaluated changes in school funding from 1997 to 2006, while the first additional funding infused following the January 2005 Supreme Court decision occurred in the 2005-06 school year – the end point of Neymotin’s outcome data.

On balance, it is safe to say that a significant and growing body of rigorous empirical literature validates that state school finance reforms can have substantive, positive effects on student outcomes, including reductions in outcome disparities or increases in overall outcome levels. Further, it stands to reason that if positive changes to school funding have positive effects on short and long run outcomes both in terms of level and distribution, then negative changes to school funding likely have negative effects on student outcomes. Thus it is critically important to understand the impact of the recent recession on state school finance systems, the effects on long term student outcomes being several years down the line.

References

Ajwad, Mohamad I. 2006. Is intra-jurisdictional resource allocation equitable? An analysis of campus level spending data from Texas elementary schools. The Quarterly Review of Economics and Finance 46 (2006) 552-564

Baker, Bruce D. 2012. Re-arranging deck chairs in Dallas: Contextual constraints on within district resource allocation in large urban Texas school districts. Journal of Education Finance 37 (3) 287-315

Baker, B. D., & Corcoran, S. P. (2012). The Stealth Inequities of School Funding: How State and Local School Finance Systems Perpetuate Inequitable Student Spending. Center for American Progress.

Baker, B., & Green, P. (2008). Conceptions of equity and adequacy in school finance. Handbook of research in education finance and policy, 203-221.

Baker, B. D., Sciarra, D. G., & Farrie, D. (2012). Is School Funding Fair?: A National Report Card. Education Law Center. http://schoolfundingfairness.org/National_Report_Card_2012.pdf

Baker, B. D., Taylor, L. L., & Vedlitz, A. (2008). Adequacy estimates and the implications of common standards for the cost of instruction. National Research Council.

Baker, B. D., & Welner, K. G. (2011). School finance and courts: Does reform matter, and how can we tell. Teachers College Record, 113(11), 2374-2414.

Baker, B.D., Welner, K.G. (2010) Premature celebrations: The persistence of inter-district funding disparities. Education Policy Analysis Archives. http://epaa.asu.edu/ojs/article/viewFile/718/831

Card, D., and Payne, A. A. (2002). School Finance Reform, the Distribution of School Spending, and the Distribution of Student Test Scores. Journal of Public Economics, 83(1), 49-82.

Chambers, Jay G., Jesse D. Levin, and Larisa Shambaugh. 2010. Exploring weighted student formulas as a policy for improving equity for distributing resources to schools: A case study of two California school districts. Economics of Education Review, 29(2), 283-300.

Chambers, Jay, Larisa Shambaugh, Jesse Levin, Mari Muraki, and Lindsay Poland. 2008. A Tale of Two Districts: A Comparative Study of Student-Based Funding and School-Based Decision Making in San Francisco and Oakland Unified School Districts. American Institutes for Research. Palo Alto, CA.

Ciotti, P. (1998). Money and School Performance: Lessons from the Kansas City Desegregation Experience. Cato Policy Analysis #298.

Coate, D. & VanDerHoff, J. (1999). Public School Spending and Student Achievement: The Case of New Jersey. Cato Journal, 19(1), 85-99.

Corcoran, S., & Evans, W. N. (2010). Income inequality, the median voter, and the support for public education (No. w16097). National Bureau of Economic Research.

Dadayan, L. (2012) The Impact of the Great Recession on Local Property Taxes. Albany, NY: Rockefeller Institute. http://www.rockinst.org/pdf/government_finance/2012-07-16-Recession_Local_%20Property_Tax.pdf

Deke, J. (2003). A study of the impact of public school spending on postsecondary educational attainment using statewide school district refinancing in Kansas, Economics of Education Review, 22(3), 275-284.

Downes, T. A., Zabel, J., and Ansel, D. (2009). Incomplete Grade: Massachusetts Education Reform at 15. Boston, MA. MassINC.

Downes, T. A. (2004). School Finance Reform and School Quality: Lessons from Vermont. In Yinger, J. (ed), Helping Children Left Behind: State Aid and the Pursuit of Educational Equity. Cambridge, MA: MIT Press

Duncombe, W., Yinger, J. (2008) Measurement of Cost Differentials In H.F. Ladd & E. Fiske (eds) pp. 203-221. Handbook of Research in Education Finance and Policy. New York: Routledge.

Duncombe, W., & Yinger, J. (1998). School finance reform: Aid formulas and equity objectives. National Tax Journal, 239-262.

Edspresso (2006, October 31). New Jersey learns Kansas City’s lessons the hard way. Retrieved October 23, 2009, from http://www.edspresso.com/index.php/2006/10/new-jersey-learns-kansas-citys-lessons-the-hard-way-2/

Evers, W. M., and Clopton, P. (2006). “High-Spending, Low-Performing School Districts,” in Courting Failure: How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm our Children (Eric A. Hanushek, ed.) (pp. 103-194). Palo Alto, CA: Hoover Press.

Figlio, D.N. (2004) Funding and Accountability: Some Conceptual and Technical Issues in State Aid Reform. In Yinger, J. (ed) p. 87-111 Helping Children Left Behind: State Aid and the Pursuit of Educational Equity. MIT Press.

Goertz, M., and Weiss, M. (2009). Assessing Success in School Finance Litigation: The Case of New Jersey. New York City: The Campaign for Educational Equity, Teachers College, Columbia University.

Greene, J. P., & Trivitt, J. R. (2008). Can Judges Improve Academic Achievement? Peabody Journal of Education, 83(2), 224-237.

Guryan, J. (2001). Does Money Matter? Estimates from Education Finance Reform in Massachusetts. Working Paper No. 8269. Cambridge, MA: National Bureau of Economic Research.

Hanushek, E. A., and Lindseth, A. (2009). Schoolhouses, Courthouses and Statehouses. Princeton, N.J.: Princeton University Press., See also: http://edpro.stanford.edu/Hanushek/admin/pages/files/uploads/06_EduO_Hanushek_g.pdf

Hanushek, E. A. (Ed.). (2006). Courting failure: How school finance lawsuits exploit judges’ good intentions and harm our children (No. 551). Hoover Press.

Imazeki, J., & Reschovsky, A. (2004). School finance reform in Texas: A never ending story. Helping children left behind: State aid and the pursuit of educational equity, 251-281.

Jaggia, S., Vachharajani, V. (2004) Money for Nothing: The Failures of Education Reform in Massachusetts http://www.beaconhill.org/BHIStudies/EdStudy5_2004/BHIEdStudy52004.pdf

Leuven, E., Lindahl, M., Oosterbeek, H., and Webbink, D. (2007). The Effect of Extra Funding for Disadvantaged Pupils on Achievement. The Review of Economics and Statistics, 89(4), 721-736.

Murray, S. E., Evans, W. N., & Schwab, R. M. (1998). Education-finance reform and the distribution of education resources. American Economic Review, 789-812.

Neymotin, F. (2010) The Relationship between School Funding and Student Achievement in Kansas Public Schools. Journal of Education Finance 36 (1) 88-108

Papke, L. (2005). The effects of spending on test pass rates: evidence from Michigan. Journal of Public Economics, 89(5-6), 821-839.

Resch, A. M. (2008). Three Essays on Resources in Education (dissertation). Ann Arbor: University of Michigan, Department of Economics. Retrieved October 28, 2009, from http://deepblue.lib.umich.edu/bitstream/2027.42/61592/1/aresch_1.pdf

Roy, J. (2011). Impact of school finance reform on resource equalization and academic performance: Evidence from Michigan. Education Finance and Policy, 6(2), 137-167.

Sciarra, D., Farrie, D., Baker, B.D. (2010) Filling Budget Holes: Evaluating the Impact of ARRA Fiscal Stabilization Funds on State Funding Formulas. New York, Campaign for Educational Equity. http://www.nyssba.org/clientuploads/nyssba_pdf/133_FILLINGBUDGETHOLES.pdf

Taylor, L. L., & Fowler Jr, W. J. (2006). A Comparable Wage Approach to Geographic Cost Adjustment. Research and Development Report. NCES-2006-321. National Center for Education Statistics.

Walberg, H. (2006) High Poverty, High Performance Schools, Districts and States. in Courting Failure: How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm our Children (Eric A. Hanushek, ed.) (pp. 79-102). Palo Alto, CA: Hoover Press.

Notes

[1] In a separate study, Leuven and colleagues (2007) attempted to isolate specific effects of increases to at-risk funding on at risk pupil outcomes, but did not find any positive effects.

[2] While this paper remains an unpublished working paper, the advantage of Guryan’s analysis is that he models the expected changes in funding at the local level as a function of changes to the school finance formula itself, through what is called an instrumental variables or two-stage least squares approach. Then, Guryan evaluates the extent to which these policy-induced variations in local funding are associated with changes in student outcomes. Across several model specifications, Guryan finds increased outcomes for students at grade 4 but not grade 8. A counter study by the Beacon Hill Institute suggests that reduced class size and/or increased instructional spending either has no effect on or actually worsens student outcomes (Jaggia & Vachharajani, 2004).

[3] Two additional studies of school finance reforms in New Jersey also merit some attention in part because they directly refute findings of Hanushek and Lindseth and of the earlier Cato study and do so with more rigorous and detailed methods. The first, by Alex Resch (2008) of the University of Michigan (doctoral dissertation in economics), explored in detail the resource allocation changes during the scaling up period of school finance reform in New Jersey. Resch found evidence suggesting that New Jersey Abbott districts “directed the added resources largely to instructional personnel” (p. 1) such as additional teachers and support staff. She also concluded that this increase in funding and spending improved the achievement of students in the affected school districts. Looking at the statewide 11th grade assessment (“the only test that spans the policy change”), she found: “that the policy improves test scores for minority students in the affected districts by one-fifth to one-quarter of a standard deviation” (p. 1). Goertz and Weiss (2009) also evaluated the effects of New Jersey school finance reforms, but did not attempt a specific empirical test of the relationship between funding level and distributional changes and outcome changes. Thus, their findings are primarily descriptive. Goertz and Weiss explain that on state assessments achievement gaps closed substantially between 1999 and 2007, the period over which Abbott funding was most significantly scaled up. Goertz & Weiss further explain: “State Assessments: In 1999 the gap between the Abbott districts and all other districts in the state was over 30 points. By 2007 the gap was down to 19 points, a reduction of 11 points or 0.39 standard deviation units. The gap between the Abbott districts and the high-wealth districts fell from 35 to 22 points. Meanwhile performance in the low-, middle-, and high-wealth districts essentially remained parallel during this eight-year period” (Figure 3, p. 23).

[4] Two reports from Cato Institute are illustrative (Ciotti, 1998, Coate & VanDerHoff, 1999).

[5] That is, the authors merely assert that these states experienced large infusions of funding, focused on low income and minority students, within the time period identified. They necessarily assume that, in all other states which serve as a comparison basis, similar changes did not occur. Yet they validate neither assertion. Baker and Welner (2011) explain that Hanushek and Lindseth failed to even measure whether substantive changes had occurred to the level or distribution of school funding, as well as when and for how long. In New Jersey, for example, the infusion of funding occurred from 1998 to 2003 (or 2005); thus Hanushek and Lindseth’s window includes six years on the front end where little change occurred (When?). Kentucky reforms had largely faded by the mid to late 1990s, yet Hanushek and Lindseth measure post-reform effects in 2007 (When?). Further, in New Jersey, funding was infused into approximately 30 specific districts, but Hanushek and Lindseth explore overall changes to outcomes among low-income children and minorities using NAEP data, where some of these children attended the districts receiving additional support but many did not (Who?). In short, the slipshod comparisons made by Hanushek and Lindseth provide no reasonable basis for asserting either the success or failure of state school finance reforms. Hanushek (2006) goes so far as to title the book “How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm Our Children.” The premise that additional funding for schools – often leveraged toward class size reduction, additional course offerings or increased teacher salaries – causes harm to children is, on its face, absurd. And the book which implies as much in its title never once validates that such reforms ever do cause harm. Rather, the title is little more than a manipulative attempt to convince the non-critical spectator who never gets past the book’s cover to fear that school finance reforms might somehow harm children.
The book also includes two examples of a type of analysis that occurred with some frequency in the mid-2000s, which also had the intent of showing that school funding doesn’t matter. These studies would cherry-pick anecdotal information on either or both of a) poorly funded schools that have high outcomes or b) well-funded schools that have low outcomes (see Evers & Clopton, 2006; Walberg, 2006).