Arne-Ology & the Bad Incentives of Evaluating Teacher Prep with Student Outcome Data

As I understand it, USDOE is going to go ahead with the push to have teacher preparation programs rated in part based on the student growth outcomes of children taught by individuals receiving credentials from those programs. Now, the layers of problems associated with this method are many and I’ve addressed them previously here and in professional presentations.

  1. This post summarizes my earlier concerns about how the concept fails both statistically and practically.
  2. This post explains what happens at the ridiculous extremes of this approach (a warped, endogenous cycle of reformy awesomeness).
  3. These slides present a more research-based, and somewhat less snarky, critique.

Now, back to the snark.

This post builds on my most recent post in which I challenged the naive assertion that current teacher ratings really tell us where the good teachers are. Specifically, I pointed out that in Massachusetts, if we accept the teacher ratings at face value, then we must accept that good teachers are a) less likely to teach in middle schools, b) less likely to teach in high poverty schools and c) more likely to teach in schools that have more girls than boys.

Slide4

Extending these findings to the policy of rating teacher preparation programs by the ratings their teachers receive, and working on the assumption that these ratings are quite strongly biased by school context, it would make sense for Massachusetts teacher preparation institutions to try to get their graduates placed in low poverty elementary schools that have fewer boys.

Given that New Jersey growth percentile data reveal even more egregious patterns of bias, I now offer insights for New Jersey colleges of education as to where they should try to place their graduates – that is, if they want to win at the median growth percentile game.

Slide2

It’s pretty simple – New Jersey colleges of education would be wise to get their graduates placements in schools that are:

  • 20% or fewer free lunch (to achieve good math gains)
  • 5% or lower black (to achieve good math gains)
  • 11% or lower free lunch (to achieve good LAL gains)
  • 2% or lower black (to achieve good LAL gains)

Now, the schools NJ colleges of ed should avoid (for placing their grads) are those that are:

  • over 50% free lunch
  • over 30% black

That is, if colleges of education want to play this absurd game of chasing invalid metrics.

Let’s take a look at some of the specific districts that might be of interest.

Here are the districts with the highest and lowest growth producing teachers (uh… assuming this measure can validly be attributed to teacher quality).

Slide3

Now, my New Jersey readers can readily identify the differences between these groups, with a few exceptions. Ed schools in NJ would be wisest to maximize their placements in locations like Bernards Twp, Essex Fells, Princeton, Mendham and Ridgewood. After all, what young grads wouldn’t want to work in these districts? And of course, Ed schools would be advised to avoid placing any grads in districts like East Orange, Irvington or Newark.

Let me be absolutely clear here. I AM NOT ACTUALLY ADVOCATING SUCH DETRIMENTAL UNETHICAL BEHAVIOR.

Rather, I am pointing out that newly adopted USDOE regulations in fact endorse this model by requiring that this type of data actually be used to consequentially evaluate teacher preparation programs.

It’s simply wrong. It’s bad policy. And it must stop!

And yes… quite simply… this is WORSE THAN THE STATUS QUO!

For further discussion on this point, I refer you to this post!


The Endogeneity of the Equitable Distribution of Teachers: Or, why do the girls get all the good teachers?

Recently, the Center for American Progress (disclosure: I have a report coming out through them soon) released a report in which they boldly concluded, based on data on teacher ratings from Massachusetts and Louisiana, that teacher quality is woefully inequitably distributed across children by the income status of those children. As evidence of these inequities, the report’s authors included a few simple graphs, like this one, showing the distribution of teachers by their performance categories:

Figure 1. CAP evidence of teacher quality inequity in Massachusetts

Slide1

Based on this graph, the authors conclude:

In Massachusetts, the percentage of teachers rated Unsatisfactory is small overall, but students in high-poverty schools are three times more likely to be taught by one of them. The distribution of Exemplary teachers favors students in high-poverty schools, who are about 30 percent more likely to be taught by an exemplary teacher than are students in low-poverty schools. However, students in high-poverty schools are less likely to be taught by a Proficient teacher and more likely to be taught by a teacher who has received a Needs Improvement rating. (p. 4)

But there exists (at least) one huge problem with asserting that teacher ratings built significantly on measures such as Student Growth Percentiles provide evidence of the inequitable distribution of teaching quality. It is very well understood that many value-added estimates used in state policy and practice, and most if not all student growth percentile measures so used, are substantially influenced by student population characteristics, including income status, prior performance and even the gender balance of classrooms.

Let me make this absolutely clear one more time – simply because student growth percentile measures are built on expected current scores of individual students based on prior scores does not mean, by any stretch of the statistical imagination, that SGPs “fully account for student background,” much less for classroom context factors, including the other students and the student group in the aggregate. Further, Value Added Models (VAMs), which may take additional steps to account for these potential sources of bias, are typically not successful at removing all such bias.

Figure 2 here shows the problem. As I’ve explained on numerous previous occasions, growth percentile and value-added measures contain three basic types of variation:

  1. Variation that might actually be linked to practices of the teacher in the classroom;
  2. Variation that is caused by other factors not fully accounted for among the students, classroom setting, school and beyond;
  3. Variation that is, well, complete freakin statistical noise (in many cases, generated by the persistent rescaling and stretching, cutting and compressing, then stretching again, changes in test scores over time which may be built on underlying shifts in 1 to 3 additional items answered right or wrong by 9 year olds filling in bubbles with #2 pencils).

Our interest is in #1 above, but to the extent that there is predictable variation, which combines #1 and #2, we are generally unable to determine what share of that variation is #1 and what share is #2.

Figure 2. The Endogeneity of Teacher Quality Sorting and Ratings Bias

Slide2

A really important point here is that many if not most models I’ve seen actually adopted by states for evaluating teachers do a particularly poor job at parsing 1 & 2. This is partly due to the prevalence of growth percentile measures in state policy.
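
To see how that plays out, here is a minimal simulation sketch (with entirely hypothetical effect sizes, not any state’s actual model) of how unmodeled classroom and school context leaks into a growth-based “teacher” rating even when the model conditions on prior scores:

```python
# A minimal simulation (hypothetical effect sizes, not any state's actual model)
# of how unmodeled classroom/school context leaks into a growth-based "teacher"
# rating even when the model conditions on prior scores.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_teachers, n_students = 200, 25

rows = []
for t in range(n_teachers):
    school_poverty = rng.uniform(0, 1)       # share low-income; varies across schools
    teacher_effect = rng.normal(0, 1)        # source 1: true teacher practice
    for _ in range(n_students):
        prior = rng.normal(-0.5 * school_poverty, 1)
        context = -0.5 * school_poverty      # source 2: unmodeled context
        noise = rng.normal(0, 1)             # source 3: plain statistical noise
        current = 0.7 * prior + teacher_effect + context + noise
        rows.append((t, school_poverty, prior, current, teacher_effect))

df = pd.DataFrame(rows, columns=["teacher", "poverty", "prior", "current", "true_effect"])

# "Growth" model in the SGP spirit: condition on prior score only
naive = smf.ols("current ~ prior", data=df).fit()
df["resid"] = naive.resid

# Aggregate residuals into a teacher "rating" and check what it picks up
ratings = df.groupby("teacher").agg(rating=("resid", "mean"),
                                    poverty=("poverty", "first"),
                                    true_effect=("true_effect", "first"))
print(ratings[["rating", "poverty"]].corr())      # nonzero: context bias (source 2)
print(ratings[["rating", "true_effect"]].corr())  # only partial recovery of source 1
```

In a setup like this, the aggregated “rating” correlates with school poverty even though the true teacher effects were drawn independently of poverty, which is exactly the entanglement of #1 and #2 described above.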

This issue becomes particularly thorny when we try to make assertions about the equitable distribution of teaching quality. Yes, as per the figure above, teachers do sort across schools, and we have much reason to believe that they sort inequitably with respect to student population characteristics. The problem is that those same student population characteristics in many cases also strongly influence teacher ratings.

As such, those teacher ratings themselves aren’t very useful for evaluating the equitable distribution of teaching. In fact, in most cases it’s a pretty darn useless exercise, ESPECIALLY with the measures commonly adopted across states to characterize teacher quality. Determining the inequity of teacher quality sorting requires that we can separate #1 and #2 above – that we know the extent to which the uneven distribution of students affected the teacher rating versus the extent to which teachers with higher ratings sorted into more advantaged school settings.

Now, let’s take a stroll through just how difficult it is to sort out whether the inequity CAP sees in Massachusetts teacher ratings is real, or more likely just a bad, biased ratings system.

Figure 3 relates the % of teachers in the bottom two ratings categories to the share of children qualified for free lunch, by grade level, across Massachusetts schools. As we can see, low poverty schools tend to have very few of those least effective teachers, whereas many, though not all, higher poverty schools have larger shares, consistent with the CAP findings.

Figure 3. Relating Shares of Low Rated Teachers and School Low Income Share in Massachusetts

Slide3

Figure 4 presents the cross school correlations between student demographic indicators and teacher ratings. Again, we see that there are more low rated teachers in higher poverty, higher minority concentration schools.

But, as a little smell-test here, I’ve also included % female students, which is often a predictor of not just student test score levels but also rates of gain. What we see here is that at the middle and secondary level, there are fewer “bad” teachers in schools that have higher proportions of female students.

Does that make sense? Is it really the case that the “good” teachers are taking the jobs in the schools with more girls?

Figure 4. Relating Shares of Low Rated Teachers and School Demographics in Massachusetts

 Slide4

 

Okay, let’s do this as a multiple regression model and, for visual clarity, graph the coefficients in Figure 5. Here, I’ve regressed the % of low performing teachers on each of the demographic measures. I find a negative (though only significant at p<.10) coefficient on the % female measure. That is, schools with more girls have fewer “bad” teachers. Yes, schools with more low income kids seem to have more “bad” teachers, but in my view, the whole darn thing is suspect.

Figure 5. Regression Based Estimates of Teacher Rating Variation by Demography in Massachusetts

Slide5
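
For readers who want to poke at this themselves, a rough sketch of that kind of school-level regression (not my actual code; the file and column names are hypothetical) might look like:

```python
# A rough sketch (not the post's actual code) of the school-level model described
# above, assuming a hypothetical extract of the MA ratings data with illustrative
# column names.
import pandas as pd
import statsmodels.formula.api as smf

schools = pd.read_csv("ma_school_ratings.csv")  # hypothetical file: one row per school

model = smf.ols(
    "pct_low_rated ~ pct_free_lunch + pct_black + pct_hispanic + pct_female",
    data=schools,
).fit()
print(model.summary())  # check the sign and significance of pct_female in particular
```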

So, the Massachusetts ratings seem hardly useful for sorting out bias versus actual quality and thus determining which kids are being subjected to better or worse teachers.

But what about other states? Well, I’ve written much about the ridiculous levels of bias in the New Jersey Growth Percentile measures. But, here they are again.

Figure 6. New Jersey School Growth Percentiles by Low Income Concentration and Grade 3 Mean Scale Scores

 Slide6

Figure 6 shows that New Jersey school median growth percentiles are associated with both low income concentration and average scale scores of the first tested grade level. The official mantra of the state department of education is that these patterns obviously reflect that low income, low performing children are simply getting the bad teachers. But that, like the CAP finding above, is an absurd stretch given the complete lack of evidence as to what share of these measures, if any, can actually be associated with teacher effect and what share is driven by context and students.

So, let’s throw in that percent female effect just for fun. Table 1 provides estimates from a few alternative regression models of the school level SGP data. As with the Massachusetts ratings, the regressions show that the share of student population that is female is positively associated with school level median growth percentile, and quite consistently and strongly so.

Now, extending CAP’s logic to these findings, we must assume that the girls get the best teachers! Or at least that schools with more girls are getting the better teachers. Surely it could not have anything to do with classrooms and schools with more girls being, for whatever reason, more likely to generate test score gains, even with the same teachers? But then again, this is all circular.

Table 1. Regressions of New Jersey School Level Growth Percentiles on Student Characteristics

Slide7

Note here that, in the case of LAL, these models explain nearly 40% of the variation in growth percentiles. That’s one heck of a lot of potential bias. Well, either that, or teacher sorting in NJ is particularly inequitable. But knowing what’s what here is impossible. My bet is on some pretty severe bias.

Now for one final shot, with a slightly different twist. New York City uses a much richer value-added model which accounts much more fully for student characteristics. The model also accounts for some classroom and school characteristics. But the New York City model, which also produces much noisier estimates as a result (the more you parse the bias, the more you’re left with noise), doesn’t seem to fully capture some other potential contributors to value added gains. The regressions in Table 2 below summarize resource measures that predict variation in school aggregated teacher value added estimates for NYC middle schools.

Table 2. How resource variation across MIDDLE schools influences aggregate teacher value-added in NYC

Slide8

Schools with smaller classes or higher per pupil budgets have higher average teacher value added! It’s also the case that schools with higher average scale scores have higher average teacher value added. That poses a potential bias problem. Student characteristics must be evaluated in light of the inclusion of the average scale score measure.

Indeed, more rigorous analyses can be done to sort out the extent to which “better” (higher test score gain producing) teachers migrate to more advantaged schools, but only with very limited samples of teachers who have prior ratings in one setting, then sort to another (and maintain some stable component of their prior rating). Evaluating at large scale, without tracking individual moves, even when trying to include a richer set of background variables, is likely to mislead.

Another alternative is to reconcile teacher sorting by outcome measures with teacher sorting by other characteristics that are exogenous (not trapped in this cycle of cause and effect). Dan Goldhaber and colleagues provide one recent example applied to data on teachers in Washington State. Goldhaber and colleagues compared the distribution of a) novice teachers, b) teachers with low VAM estimates and c) teachers by their own test scores on a certification exam, across classrooms, schools and districts by 1) minority concentration, 2) low income concentration and 3) prior performance. That is, they reconciled the distribution of their potentially endogenous measure (VAM) with two exogenous measures (teacher attributes). And they did find disparities.

Notably, in contrast with much of the bluster about teacher quality distribution being primarily a function of corrupt, rigid contracts driving within-district and within-school assignment of teachers, Goldhaber and colleagues found the between-district distribution of teacher measures to be most consistently disparate:

For example, the teacher quality gap for FRL students appears to be driven equally by teacher sorting across districts and teacher sorting across schools within a district. On the other hand, the teacher quality gap for URM (underrepresented minority) students appears to be driven primarily by teacher sorting across districts; i.e., URM students are much more likely to attend a district with a high percentage of novice teachers than non-URM students. In none of the three cases do we see evidence that student sorting across classrooms within schools contributes significantly to the teacher quality gap.

These findings, of course, raise issues regarding the logic that district contractual policies are the primary driver of teacher quality inequity (the BIG equity problem, that is). Separately, while the FRL results are not entirely consistent with the URM (Underrepresented Minority) findings, this may be due to the use of a constant income threshold for comparing districts in rural Eastern Washington to districts in the Seattle metro. Perhaps more on this at a later point.

Policy implications of misinformed conclusions from bad measures

The implications of ratings bias vary substantially by the policy preferences advanced to resolve the supposed inequitable distribution of teaching. One policy preference is the “fire the bad teachers” approach, which assumes that a whole bunch of better teachers will line up to take their jobs. If we impose this policy alternative using such severely biased measures as the Massachusetts or New Jersey measures, we will likely find ourselves disproportionately firing and detenuring, year after year, teachers in the same high need schools, for reasons having little or nothing to do with the quality of the teachers themselves. As each new batch of teachers enters these schools and subsequently faces the same fate due to the bogus, biased measures, it seems highly unlikely that high quality candidates will continue to line up. This is a disaster in the making. Further, applying the “fire the bad teachers” approach in the presence of such systematically biased measures is likely a very costly option – both in terms of the district costs of recruiting and training new batches of teachers year after year, and the costs of litigation associated with dismissing their predecessors based on junk measures of their effectiveness.

Alternatively, if one provides compensation incentives to draw teachers into “lower performing” schools, and perhaps takes steps to improve working conditions (facilities, class size, total instructional load), fewer negative consequences are likely to occur, even in the presence of bad, biased measurement. One can hope, based on recent studies of transfer incentive policies, that some truly “better” teachers would be more likely to opt to work in schools serving high need populations, even where their own rating might be at greater risk (assuming policy does not assign high stakes to that rating). This latter approach certainly seems more reasonable, more likely to do good, and at the very least far less likely to do serious harm.

Why you can’t compare simple achievement gaps across states! So don’t!

Consider this post the second in my series of basic data issues in education policy analysis.

This is a topic on which I’ve written numerous previous posts. In most previous posts I’ve focused specifically on the issue of problems with poverty measurement across contexts and how those problems lead to common misinterpretations of achievement gaps. For example, if we simply determine achievement gaps by taking the average test scores of children above and below some arbitrary income threshold, like those qualifying or not for the federally subsidized school lunch program, any comparisons we make across states will be severely compromised by the fact that a) the income threshold we use may provide very different quality of life from Texas to New Jersey and b) the average incomes and quality of life of those above that threshold versus those below it may be totally different in New Jersey than in Texas.

For example, the histogram below presents the New Jersey and Texas poverty income distributions for families of children between the ages of 5 and 17. The Poverty Index is the ratio of family income to the poverty income level (which is fixed nationally). The histograms are generated using 2011 American Community Survey data extracted from http://www.ipums.org (one of my favorite sites!). The vertical line is set at 185% poverty, or the federal “reduced price lunch” threshold, a common threshold used in comparing low income to non-low income student achievement gaps.

As we can see, the income distribution in New Jersey is simply higher than that of Texas. It’s also more dispersed. And, as it turns out, the ratio of income for those above versus those below the 185% threshold is much greater in New Jersey.

In New Jersey, the income ratio is about 6:1 for those above versus those below the 185% threshold. In Texas, the ratio is about 4.5:1. And that matters when comparing achievement gaps!

Figure 1. Poverty Income Distributions for Texas and New Jersey

Slide1
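
For those who want to replicate the ratio calculation, here is a simplified sketch assuming a 2011 ACS extract from ipums.org with the standard AGE, STATEFIP, POVERTY and FTOTINC variables (survey weights and missing-income codes are ignored here for brevity):

```python
# A simplified sketch of the income-ratio calculation, assuming a 2011 ACS
# extract from ipums.org with the standard AGE, STATEFIP, POVERTY (family
# income as a percent of the poverty line) and FTOTINC variables. Survey
# weights and missing-income codes are ignored here for brevity.
import pandas as pd

acs = pd.read_csv("acs_2011_extract.csv")

# Children ages 5-17 in New Jersey (FIPS 34) and Texas (FIPS 48)
kids = acs[acs["AGE"].between(5, 17) & acs["STATEFIP"].isin([34, 48])]
kids = kids.assign(below_185=kids["POVERTY"] < 185)

# Mean family income above vs. below the 185% poverty (reduced price lunch) line
mean_income = (kids.groupby(["STATEFIP", "below_185"])["FTOTINC"].mean()
                   .unstack("below_185"))
mean_income["income_ratio"] = mean_income[False] / mean_income[True]
print(mean_income)  # roughly 6:1 in NJ vs. roughly 4.5:1 in TX, per the post
```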

Figure 2 illustrates the relationship between the income ratios for non-low income to low income children’s families and outcome gaps for NAEP grade 4 math in 2011. Put simply, states with larger income gaps also have larger outcome gaps. States with the largest income gaps, like Connecticut, have particularly large outcome gaps. Clearly, it would be inappropriate to directly compare the income achievement gap of, for example, Idaho to New Jersey. New Jersey does have a larger outcome gap. But New Jersey also has a larger income gap. Both states fall below the trendline, indicating (if we assume the relationship to be linear) that their outcome gaps are both smaller than expected and, in fact, quite comparable.

Figure 2: Relating Income Gaps to Outcome Gaps (NAEP Grade 4 Math)

Slide2
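
The trendline logic in Figure 2 can be sketched roughly as follows, assuming a hypothetical state-level file with one row per state (the column names are illustrative):

```python
# A sketch of the Figure 2 logic: regress each state's outcome gap on its
# income ratio and flag who sits above or below the trendline. The file and
# column names here are illustrative, not the original data.
import pandas as pd
import statsmodels.formula.api as smf

states = pd.read_csv("state_gaps_2011.csv")  # one row per state: state, income_ratio, naep_math4_gap

fit = smf.ols("naep_math4_gap ~ income_ratio", data=states).fit()
states["expected_gap"] = fit.fittedvalues
states["above_trend"] = states["naep_math4_gap"] > states["expected_gap"]

# States below the trendline have smaller outcome gaps than their income gaps predict
print(states.sort_values("income_ratio")[
    ["state", "income_ratio", "naep_math4_gap", "expected_gap", "above_trend"]])
```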


Table 1 summarizes the correlations between income and outcome gaps for each NAEP math and reading assessment, over several years.

Table 1. Correlations between Income and Outcome Gaps

Slide3

It stands to reason that if the income differences between low income and non-low income families affect the income achievement gaps, then so too would the income differences between racial groups affect the outcome differences between racial groups. Therefore, it is equally illogical to compare directly racial achievement gaps across states.

Figure 3a shows the black and white family income distributions in Texas and 3b shows the income distributions in Connecticut.

In Texas, the ratio of family income for white families to black families is about 1.5 to 1.

In Connecticut, that ratio is over 2.3:1.

Figure 3a. Black and White Income Distributions in Texas

Slide5

Figure 3b. Black and White Income Distributions in Connecticut

 Slide4

Thus, as expected, Figure 4 shows that the income gaps between black and white families are quite strongly correlated with the outcome gaps of their children in Math (Grade 4).

Figure 4. Income Gaps between Black and White Students and Outcome Gaps

Slide6

Table 2 shows the correlations between black-white family income gaps and black-white child outcome gaps for NAEP assessments since 2000.

Table 2. Correlations between Black-White Income Gaps and Black-White Test Score Gaps

Slide7

Why is this important? Well, it’s important because state officials and data-naïve education reporters love to make a big deal about which states have the biggest achievement gaps and, by extension, to assert that the primary reason for these gaps is a lack of state policy attention to them.

Connecticut reporters and politicos love to point to that state’s “biggest in the nation” achievement gap, with absolutely no cognizance of the fact that their achievement gaps, both income and race related, are driven substantially by the vast income disparity of the state. That said, Connecticut consistently shows larger gaps than would even be expected for its level of income disparity.

Black-white achievement gaps are similarly a hot topic in Wisconsin, but with little acknowledgment that Wisconsin also has the largest income gap (other than DC) between the families of black and white children.

New Jersey officials love to downplay the state’s high average performance by lambasting defenders of public schools with evidence of largest-in-the-nation achievement gaps.

“The dissonance in that is if you get beneath the numbers, beneath the aggregates, you’ll see that we have one of the largest achievement gaps in the nation.” (former Commissioner Christopher Cerf)

Years ago, politicos and education writers might have argued that these gaps persist because of funding and related resource gaps. Nowadays, the same groups might argue that these gaps persist because of employment protections for “bad teachers” in high poverty, high minority concentration schools, and that where the gaps are bigger, those protections must somehow be most responsible.

But these assertions – both the old and the new – presume that comparisons of achievement gaps, either by race or income, between states are valid. That is, they validly reflect policy/practice differences across states and not some other factor.

Quite simply, as most commonly measured, they do not. They largely reflect differences in income distributions across states, a nuance I suspect will continue to be overlooked in public discourse and the media. But one can hope.


Understand your data & use it wisely! Tips for avoiding stupid mistakes with publicly available NJ data

My next few blog posts will return to a common theme on this blog – appropriate use of publicly available data sources. I figure it’s time to put some positive, instructive stuff out there: some guidance for more casual users (and more reckless ones) of public data sources and for those just making their way into the game. In this post, I provide a few tips on using publicly available New Jersey schools data. The guidance provided herein is largely in response to repeated errors I’ve seen over time in the use and reporting of New Jersey school data, where some of those errors are simple oversights reflecting a lack of deep understanding of the data, and others seem a bit more suspect. Most of these recommendations apply to using other states’ data as well. Notably, most of these are tips that a thoughtful data analyst would arrive at on his/her own by engaging in the appropriate preliminary evaluations of the data. But sadly, these days it doesn’t seem to work that way.

So, here are a few NJ state data tips.

NJ ASK scale score data vary by grade level, so aggregating across grade levels produces biased comparisons if schools have different numbers of kids in different grade levels

NJ, like other states, has adopted math and reading assessments in grades 3 to 8 and, like other states, has made numerous rather arbitrary decisions over time as to how to establish the cut scores determining proficiency on those assessments and the methods for converting raw scores (numbers of items on a 50 point test) into scale scores (with a proficiency cut-score of 200 and a max score of 300). [1] The presumption behind this method is that “proficiency” has some common meaning across grade levels – that a child who is proficient in grade 3 math, for example, and who learns what he or she is supposed to in 4th grade (and only what he or she is supposed to), will again be proficient at the end of the year. But that doesn’t mean that the distributions of testing data actually support this assumption. Alternatively, the state could have scaled the scores year over year such that the average student remained the average student, a purely normative approach rather than the pseudo-standards-based (mostly normative) approach currently in use.

A few fun artifacts of the current approach are that a) proficiency rates vary from one grade to the next, giving a false impression that, for example, 5th graders simply aren’t doing as well as 4th graders in language arts, and that b) scale score averages vary similarly. Many a 5th or 6th grade teacher or grade level coordinator across the state has come under fire from district officials for their apparent NJASK underperformance compared to lower grades. But this underperformance is merely an artifact of arbitrary decisions in the design of the tests, the difficulty of the items, the conversion to scale scores and the arbitrary assignment of cut points.

Here’s a picture of the average scale scores drawn from school level data weighted by relevant test takers for NJASK math and NJASK language arts. Of course, the simplest implication here is that “kids get dumber at LAL as they progress through grades” and/or their teachers simply suck more, and that “kids get smarter in math as they progress through grades.” Alternatively, as stated above, this is really just an artifact of those layers of arbitrary decisions.

Figure 1. Scale Scores by Grade Level Statewide

Slide1

Why then, do we care? How does this affect our common uses of the data? Well, on several occasions I’ve encountered presentations of schoolwide average scale scores as somehow representing school average test performance. The problem is that if you aggregate across grades, but have more kids in some grades than others, your average will be biased by the imbalance of kids. If you are seeking to play this bias to your advantage:

  1. If your school has more kids in grades 6 to 8 than in 3 to 5, you’d want to look at LAL scores. That’s because kids statewide simply score higher on LAL in grades 6 to 8. It would be completely unfair to compare schoolwide LAL scores for a school with mostly grades 6 to 8 students to schoolwide LAL scores for a school with mostly grades 3 to 5 students. Yet it is done far too often!
  2. Interestingly, the reverse appears true for math.

So, consumers of reports of school performance data in New Jersey should certainly be suspicious any time someone chooses to make comparisons solely on the basis of schoolwide LAL scores, or math scores for that matter. While it makes for many more graphs and tables, grade level disaggregation is the only way to go with these data.
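
Here’s a toy illustration of the aggregation problem, using made-up (but directionally consistent with Figure 1) statewide grade means. Two schools that perform exactly at the statewide average in every grade end up with very different “schoolwide” averages purely because of their grade mix:

```python
# A toy illustration with made-up statewide grade means (directionally similar
# to Figure 1). Two schools that score exactly at the statewide average in
# every grade end up with different "schoolwide" averages purely because of
# their grade mix.
statewide_lal = {3: 215, 4: 218, 5: 220, 6: 224, 7: 226, 8: 228}  # illustrative only

school_a = {3: 100, 4: 100, 5: 100, 6: 20, 7: 20, 8: 20}   # enrollment: mostly grades 3-5
school_b = {3: 20, 4: 20, 5: 20, 6: 100, 7: 100, 8: 100}   # enrollment: mostly grades 6-8

def schoolwide_mean(enrollment, grade_means):
    # Enrollment-weighted average across grades: exactly the aggregation at issue
    total = sum(enrollment.values())
    return sum(enrollment[g] * grade_means[g] for g in enrollment) / total

print(schoolwide_mean(school_a, statewide_lal))  # ~219.1
print(schoolwide_mean(school_b, statewide_lal))  # ~224.6
```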

Let’s take a look.

Here are Newark charter and district schools by schoolwide LAL and by low income concentration. We see that Robert Treat Academy, North Star Academy and TEAM Academy fall above the line. That is, relative to their low income concentrations (setting aside very low rates of children with disabilities or ELL children, and 50+% attrition at North Star), they have average schoolwide scale scores that appear to exceed expectations.

Figure 2. Newark Charter vs. District School Schoolwide LAL by Low Income Concentration

Slide3

 

But it may not be a good idea (unless of course you are gaming the data) to use schoolwide aggregate LAL scale scores to represent the comparative performance of these schools against NPS schools. As Figure 3 shows, both North Star and TEAM Academy (especially TEAM) have larger shares of kids in the grades where average scores tend to be higher as a function of the tests and their rescaling.

Figure 3. Charter and District Grade Level Distributions

Slide2

http://www.nj.gov/education/data/enr/enr13/enr.zip

Figures 4a and 4b break out the comparisons by grade level and provide both the math and LAL assessments for a more complete and more accurate picture, though still ignoring many other variables that may influence these scores (attrition, special education, ELL and gender balance). These figures also identify schools slated for takeover by charters. Whereas TEAM Academy appeared on schoolwide aggregate to “beat the odds” on LAL, TEAM falls roughly on the trendline for LAL 6, 7 and 8 and falls below it for LAL 5. That is, disaggregation paints a different picture of TEAM Academy in particular – one of a school that by grade level more or less meets the average expectation. Similarly for North Star, while their small groups of 3rd and 4th graders appear to substantially beat the odds, differences are much smaller for their 5th through 8th grade students when compared only to students in those same grades in other schools. Some similar patterns appear for math, except that TEAM in particular falls more consistently below the line.

Figure 4a. Charter and District Scale Scores vs. Low Income by Grade

Slide4

Figure 4b. Charter and District Scale Scores vs. Low Income by Grade

Slide5

Figure 4c. Charter and District Scale Scores vs. Low Income by Grade

Slide6

A few related notes are in order:

  • Math assessments in grades 5-7 have very strong ceiling effects which are particularly noticeable in more affluent districts and schools where significant shares of children score 300.
  • As a result of the scale score fluctuations, there are also by grade proficiency rate fluctuations.

Not all measures are created equal: Measures, thresholds and cutpoints matter!

I’ve pointed this one out on a number of occasions – that finding the measure that best captures the variations in needs across schools is really important when you are trying to tease out how those variations relate to test scores. I’ve also explained over and over again how measures of low income concentration commonly used in education policy conversations are crude and often fail to capture variation across settings. But, among the not-so-great options for characterizing differences in student needs across schools, there are better and worse methods and measures. Two simple and highly related rules of thumb apply when evaluating factors that may affect or be strongly associated with student outcomes:

  1. The measure that picks up more variation across settings is usually the better measure, assuming that variation is not noise (simply a greater amount of random error in reporting of the measure).
  2. Typically, the measure that picks up more “real” variation across settings will also be more strongly correlated with the measure of interest – in many cases variations in student outcome levels.

A classic case of how different thresholds or cutpoints affect the meaningful variation captured across school settings is the choice of shares of free lunch (below the 130% threshold for poverty) versus free or reduced price lunch (below the much higher 185% threshold) when comparing schools in a relatively high poverty setting. In many relatively high poverty settings, the share of children in families below the 185% threshold exceeds 80 to 90% across all schools. Yes, there may appear to be variation across those schools, but that variation within such a narrow, truncated range may be particularly noisy, and thus not very helpful in determining the extent to which low income shares compromise student outcomes. It is really important to understand that two schools with 80% of children below the 185% income threshold for poverty can be hugely different in terms of actual income and poverty distribution.
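
Both rules of thumb above are easy to check empirically. A minimal sketch, assuming a school-level file with both poverty shares and an outcome measure (column names are illustrative):

```python
# A minimal sketch of both rules of thumb, assuming a school-level file with
# both poverty shares and an outcome measure (column names are illustrative).
import pandas as pd

newark = pd.read_csv("newark_schools.csv")  # pct_free, pct_frl, mean_scale_lal, ...

# Rule 1: which measure actually varies across these schools?
print(newark[["pct_free", "pct_frl"]].describe())  # compare the spread (std, min/max)

# Rule 2: which measure relates more strongly to the outcome of interest?
print(newark[["pct_free", "pct_frl", "mean_scale_lal"]].corr()["mean_scale_lal"])
```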

Here, for example, is the distribution of schools by concentration of children in families below the 185% income threshold in Newark, NJ. The mean is around 90%!

Figure 5.

Slide8

Now here is the distribution of schools by concentration of children in families below the 130% threshold. The bell curve looks similar in shape, but now the mean is around 80% and the spread is much greater. But even this, by itself, isn’t really proof of the meaningfulness of this variation.

Figure 6.

Slide9

 

But first a little aside. If, in figure 5, nearly all kids are below the free/reduced threshold and fewer below the free lunch threshold, we basically have a city where “if not free lunch, then reduced lunch.” Plotted, it looks like this:

Figure 7.

Slide10

The correlation here is -.65 across Newark schools. What this actually means is that the percent of reduced lunch children is, in fact, a measure of the percent of lower need children in any school – because there are so few children who don’t qualify for either. Children qualifying for reduced price lunch in Newark are among the relatively higher income children in Newark schools. If a school has fewer reduced lunch children, it typically has more free lunch children, and vice versa.

As such, comparing charter schools to district schools on the basis of % free or reduced lunch is completely bogus, because charters serve very low shares of the lower income (free lunch) children but do serve the rest.

Second, it is statistically very problematic to put both of these measures – the inverse of one another because they account for nearly the entire population – in a single regression model!
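
A quick sketch of that diagnosis (again with illustrative file and column names):

```python
# A sketch of the complementarity problem, with illustrative file and column
# names: where nearly every child qualifies for one program or the other, the
# two shares carry nearly the same information and should not be entered
# together as predictors.
import pandas as pd

newark = pd.read_csv("newark_schools.csv")

# Pairwise correlation (about -.65 across Newark schools, per the figure above)
print(newark[["pct_free", "pct_reduced"]].corr())

# The sum is nearly constant across schools; if it were exactly constant, the
# two predictors would be perfectly collinear in any regression model.
newark["pct_either"] = newark["pct_free"] + newark["pct_reduced"]
print(newark[["pct_free", "pct_reduced", "pct_either"]].std())
```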

Further validation of the importance of using the measures of actual lower income children is provided in the table below, which shows the correlations between outcome measures across schools and student population characteristics.

Figure 8. Correlations between low income concentrations and outcome measures

Slide11

With respect to every outcome measure, % free lunch is more strongly negatively associated with the outcome. Of course, one striking problem here is that the growth percentile scores, while displaying weaker relationships to low income than level scores, do show a modest relationship, indicating their persistent bias, even across schools within a relatively narrow range of poverty (Newark). But that’s a side story for now!

To add to the assertion that % reduced lunch in a district like Newark (where % reduced effectively means % not free lunch) is in fact a measure of relative advantage, take a look at the final column. % reduced lunch alone is strongly positively correlated with the outcome level measures. Statistically, this is a given, since it is more or less the inverse of a measure that is strongly negatively correlated with those outcome measures.

Know your data context!

Finally, and this is somewhat of an extension of a previous point, it’s really important if you intend to engage in any kind of comparisons across school settings, to get to know your context. Get to know your data and how they vary across schools. For example, know that nearly all kids in Newark fall below the 185% income threshold and that this means that if a child is not below the 130% income threshold, then they are likely still below the 185% threshold. This creates a whole different meaning from the “usual” assumptions about children qualified for reduced price lunch, how their shares vary across schools, and what it likely means.

Many urban districts have population distributions by race that are similarly in inverse proportion to one another. That is, in a city like Newark, schools that are not predominantly black tend to be predominantly Hispanic. Similar patterns exist in Chicago and Philadelphia at much larger scale. Here is the scatterplot for Newark. In Newark, the relationship between % black and % Hispanic is almost perfectly inverse!

Figure 9. % Black versus % Hispanic for Newark Schools

Slide7

As Mark Weber and I pointed out in our One Newark briefs, just as it would be illogical (as well as profoundly embarrassing) to try to consider both % free and % reduced lunch in a model comparing Newark schools, it is hugely problematic to try to include both % Hispanic and % black in any model comparing Newark schools. Quite simply, for the most part, if not one then the other.

Catching these problems is a matter of familiarity with context and familiarity with data. These are common issues. And I encourage budding grad students, think tankers and data analysts to pay closer attention to these issues.

How can we catch this stuff?

Know your context.

Run descriptive analyses first to get to know your data.

Make a whole bunch of scatterplots to get to know how variables relate to one another.
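
A minimal version of that “get to know your data” pass, assuming a generic school-level file with illustrative column names, might be:

```python
# A minimal "get to know your data" pass, assuming a generic school-level file
# (column names are illustrative).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("school_level_data.csv")

print(df.describe())                       # ranges, means, and oddities at a glance
print(df.select_dtypes("number").corr())   # which measures move together (or are complements)

# Scatterplots of key pairs before any modeling
pd.plotting.scatter_matrix(df[["pct_free", "pct_reduced", "pct_black",
                               "pct_hispanic", "mean_scale_lal"]], figsize=(10, 10))
plt.show()
```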

Don’t assume that the relationships and meanings of the measures in one context necessarily translate to another. The best example here is the meaning of % reduced lunch. It might just be a measure of relative advantage in a very high poverty urban setting.

And think… think… think twice… and think again about just what the measures mean… and perhaps more importantly, what they don’t and cannot!


Cheers!


[1] New Jersey Assessment of Skills and Knowledge 2012 TECHNICAL REPORT Grades 3-8. February 2013. NJ Department of Education. http://www.nj.gov/education/assessment/es/njask_tech_report12.pdf

A Response to “Correcting the Facts about the One Newark Plan: A Strategic Approach To 100 Excellent Schools”

New Jersey Education Policy Forum

Full report here: Weber.Baker.OneNewarkResponseFINALREVIEW

Mark Weber & Bruce Baker

On March 11, 2014, the Newark Public Schools (NPS) released a response to our policy brief of January 24, 2014: “An Empirical Critique of One Newark.”[1] Our brief examined the One Newark plan, a proposal by NPS to close, “renew,” or turn over to charter management organizations (CMOs) many of the district’s schools. Our brief reached the following conclusions:

  •  Measures of academic performance are not significant predictors of the classifications assigned to NPS schools by the district, when controlling for student population characteristics.
  • Schools assigned the consequential classifications have substantively and statistically significantly greater shares of low income and black students.
  • Further, facilities utilization is also not a predictor of assigned classifications, though utilization rates are somewhat lower for those schools slated for charter takeover.
  • Proposed charter takeovers cannot be justified on the assumption that charters will yield better outcomes…


DFER Idiocy on New York School Finance

This may just be among the most ludicrous proclamations I’ve read in quite some time, and it’s brought to us by none other than Dimwits doofuses/doofi? …well something with a “D” For Education Reform:

“Contrary to what you may hear from certain special interest groups, the best way to fix our schools is not just to pour more money into the education bureaucracy. New York already spends $75 billion in education annually—from public schools to state funded universities—more than the total annual budget of 47 other states. What we need is smarter investments that actually deliver results, like statewide universal full-day pre-k, scholarships for students in critically-needed STEM courses and funding to reward our hardest working teachers. Governor Cuomo is taking a stand for our students by pushing for these programs, and we should all join him in putting our students first.” – See more at: http://www.dfer.org/blog/2014/03/dfer-ny_release_1.php

I’ve spoken on this point on many previous occasions – that simply throwing out “big a-contextual numbers” is pointless. It says nothing. It’s bafflingly ignorant – mathematically inept and simply stupid. That NY “already spends $75 billion in education annually” is neither here nor there without context. It’s just dumb blather – pointless. Saying that it’s “more than the total budget of 47 other states” is equally stupid.
Let’s review NY state school finance issues for a moment. Here’s a recap of previous posts:

  1. On how New York State crafted a low-ball estimate of what districts needed to achieve adequate outcomes and then still completely failed to fund it.
  2. On how New York State maintains one of the least equitable state school finance systems in the nation.
  3. On how New York State’s systemic, persistent underfunding of high need districts has led to significant increases of numbers of children attending school with excessively large class sizes.
  4. On how New York State officials crafted a completely bogus, racially and economically disparate school classification scheme in order to justify intervening in the very schools they have most deprived over time.

But sticking specifically to the current issue, take a look here at how the Governor’s current budget proposal severely undercuts, in state aid, WHAT SCHOOL DISTRICTS NEEDED BACK IN 2007 UNDER LOWER STANDARDS, based on the state’s own formula estimates.

Below are the more comprehensive briefs on this topic. Yeah… I know this is way too difficult reading for them DFER folk… and it includes math and numbers… but those seeking a deeper understanding of school funding in New York State please do read.

Here’s a post in which I explain more concisely these issues! To summarize without copying the whole post, New York State has continued raising outcome standards and continues to fall more than 30% short of their own school finance formula funding targets. Funding targets which were a) low-balled in the first place and b) have been lowered since.

Funding shortfalls for many districts are around 50% of the aid they should receive. And those shortfalls are greatest for the neediest districts. The state continues to underfund NYC’s foundation aid by about $3 billion per year.

Top 50 2014-15 Budgeted Shortfalls

[data run as of 1/17/14]

Slide3

Further, our annual report Is School Funding Fair has repeatedly identified NY state as among the most regressively financed states in the nation – that is, states where state and local revenue is systematically lower in higher poverty districts.

My more recent longitudinal analyses show negligible progress for NY state from 1993 to 2011. This graph (below) of mid-Atlantic states shows that despite court orders in the past, NY state continues to operate a regressively funded system.

On the vertical axis we have the school funding fairness ratio, based on the model we use in our annual report, but extended over 19 years. This ratio shows the expected spending or revenue at 30% census poverty over the expected revenue at 0% poverty. When the ratio is over 1.0, the system is progressive and when it’s under 1.0 it’s regressive.
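
As a rough sketch of the idea (stripped of the additional controls in the actual report model, and with illustrative file and column names), the ratio can be computed from a simple district-level regression:

```python
# A rough sketch of the fairness-ratio idea, stripped of the additional
# controls in the actual report model; the file and column names are
# illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("ny_districts.csv")  # per-pupil state & local revenue, census poverty (proportion)

fit = smf.ols("np.log(state_local_rev_pp) ~ census_poverty", data=districts).fit()

# Predicted revenue at 30% vs. 0% census poverty
pred = fit.predict(pd.DataFrame({"census_poverty": [0.0, 0.30]}))
fairness_ratio = np.exp(pred[1]) / np.exp(pred[0])
print(fairness_ratio)  # > 1.0 progressive, < 1.0 regressive
```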

NY has been regressive since 1993 and shows little sign of improving (backsliding for now). Also noted are timings of judicial rulings – this stuff is from a forthcoming paper I’m working on for an academic conference. As is well understood, Pennsylvania is also a particularly regressive state.

MidAtlantic

And now to reiterate one more thing – which I wrote about just the other day – uh… yesterday! 

School finance reform does matter! On balance, it is safe to say that a significant and growing body of rigorous empirical literature validates that state school finance reforms can have substantive, positive effects on student outcomes, including reductions in outcome disparities or increases in overall outcome levels. Further, it stands to reason that if positive changes to school funding have positive effects on short and long run outcomes both in terms of level and distribution, then negative changes to school funding likely have negative effects on student outcomes.

I’ve also addressed this here: http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

And here! http://www.tcrecord.org/content.asp?contentid=16106

It is completely ignorant to assert that because NY spends $75 billion, that’s enough and schools just need to spend it better, with absolutely no understanding of where that money is in the state – which districts have more and which have less – or how much may really be needed to provide for an equitable and adequate system of school funding in New York.

So… am I forgoing civil discourse here a bit? Hell yeah! This ignorant drivel by DFER is pathetic political pandering. And it’s just dumb. Dangerously dumb.

All toward the noble cause of advocating against equitable and adequate school funding for the state’s neediest schoolchildren. ???

Civil discourse ended as soon as they decided that facts and context simply don’t matter. I simply have no tolerance for this level of stupid!

What really matters? Equitable & Adequate Funding!

Below is a section of a paper I’ve been working on for the past few weeks (which will be presented in Philadelphia in April).

Some of the content below is also drawn from: http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

===============

Over the past several decades, many states have pursued substantive changes to their state school finance systems, while others have not. Some reforms have come and gone. Some reforms have been stimulated by judicial pressure resulting from state constitutional challenges and others have been initiated by legislatures. In an evaluation of judicial involvement in school finance and resulting reforms from 1971 to 1996, Murray, Evans and Schwab (1998) found that “court ordered finance reform reduced within-state inequality in spending by 19 to 34 percent. Successful litigation reduced inequality by raising spending in the poorest districts while leaving spending in the richest districts unchanged, thereby increasing aggregate spending on education. Reform led states to fund additional spending through higher state taxes.” (p. 789)

There exists an increasing body of evidence that substantive and sustained state school finance reforms matter for improving both the level and distribution of short term and long run student outcomes. A few studies have attempted to tackle school finance reforms broadly applying multi-state analyses over time. Card and Payne (2002) found “evidence that equalization of spending levels leads to a narrowing of test score outcomes across family background groups.” (p. 49) Jackson, Johnson and Persico (2014) use data from the Panel Study of Income Dynamics (PSID) to evaluate long term outcomes of children exposed to court-ordered school finance reforms, based on matching PSID records to childhood school districts for individuals born between 1955 and 1985 and followed up through 2011. They find that the “Effects of a 20% increase in school spending are large enough to reduce disparities in outcomes between children born to poor and non‐poor families by at least two‐thirds,” and further that “A 1% increase in per‐pupil spending increases adult wages by 1% for children from poor families.”(p. 42)

Figlio (2004) explains that the influence of state school finance reforms on student outcomes is perhaps better measured within states over time, explaining that national studies of the type attempted by Card and Payne confront problems of a) the enormous diversity in the nature of state aid reform plans, and b) the paucity of national level student performance data. Most recent peer reviewed studies of state school finance reforms have applied longitudinal analyses within specific states. And several such studies provide compelling evidence of the potential positive effects of school finance reforms. Roy (2011) published an analysis of the effects of Michigan’s 1990s school finance reforms which led to a significant leveling up for previously low-spending districts. Roy, whose analyses measure both whether the policy resulted in changes in funding and who was affected, found that “Proposal A was quite successful in reducing interdistrict spending disparities. There was also a significant positive effect on student performance in the lowest-spending districts as measured in state tests.” (p. 137) Similarly, Papke (2001), also evaluating Michigan school finance reforms from the 1990s, found that “increases in spending have nontrivial, statistically significant effects on math test pass rates, and the effects are largest for schools with initially poor performance.” (Papke, 2001, p. 821)[1] Deke (2003) evaluated “leveling up” of funding for very-low-spending districts in Kansas, following a 1992 lower court threat to overturn the funding formula (without formal ruling to that effect). The Deke article found that a 20 percent increase in spending was associated with a 5 percent increase in the likelihood of students going on to postsecondary education. (p. 275)

Two studies of Massachusetts school finance reforms from the 1990s find similar results. The first, a non-peer-reviewed report by Downes, Zabel, and Ansel (2009) explored, in combination, the influence on student outcomes of accountability reforms and changes to school spending. It found that “Specifically, some of the research findings show how education reform has been successful in raising the achievement of students in the previously low-spending districts.” (p. 5) The second study, an NBER working paper by Guryan (2001), focused more specifically on the redistribution of spending resulting from changes to the state school finance formula. It found that “increases in per-pupil spending led to significant increases in math, reading, science, and social studies test scores for 4th- and 8th-grade students. The magnitudes imply a $1,000 increase in per-pupil spending leads to about a third to a half of a standard-deviation increase in average test scores. It is noted that the state aid driving the estimates is targeted to under-funded school districts, which may have atypical returns to additional expenditures.” (p. 1)[2] Downes had conducted earlier studies of Vermont school finance reforms in the late 1990s (Act 60). In a 2004 book chapter, Downes noted “All of the evidence cited in this paper supports the conclusion that Act 60 has dramatically reduced dispersion in education spending and has done this by weakening the link between spending and property wealth. Further, the regressions presented in this paper offer some evidence that student performance has become more equal in the post-Act 60 period. And no results support the conclusion that Act 60 has contributed to increased dispersion in performance.” (p. 312)[3] Most recently, Hyman (2013) also found positive effects of Michigan school finance reforms in the 1990s, but raised some concerns regarding the distribution of those effects. Hyman found that much of the increase was targeted to schools serving fewer low income children. But, the study did find that students exposed to an additional “12%, more spending per year during grades four through seven experienced a 3.9 percentage point increase in the probability of enrolling in college, and a 2.5 percentage point increase in the probability of earning a degree.” (p. 1)

Indeed, this point is not without some controversy, much of which is easily discarded. Second-hand references to dreadful failures following massive infusions of new funding can often be traced to methodologically inept, anecdotal tales of desegregation litigation in Kansas City, Missouri, or court-ordered financing of urban districts in New Jersey (see Baker & Welner, 2011).[4] Hanushek and Lindseth (2009) use a similar anecdote-driven approach in which they dedicate a chapter of a book to proving that court-ordered school funding reforms in New Jersey, Wyoming, Kentucky, and Massachusetts resulted in few or no measurable improvements. However, these conclusions are based on little more than a series of graphs of student achievement on the National Assessment of Educational Progress in 1992 and 2007 and an untested assertion that, during that period, each of the four states infused substantial additional funds into public education in response to judicial orders.[5] Greene and Trivitt (2008) present a study in which they claim to show that court-ordered school finance reforms led to no substantive improvements in student outcomes. However, the authors test only whether the presence of a court order is associated with changes in outcomes, and never once measure whether substantive school finance reforms followed the court order, yet still express the conclusion that court-ordered funding increases had no effect. In an equally problematic analysis, Neymotin (2010) set out to show that massive court-ordered infusions of funding in Kansas following Montoy v. Kansas led to no substantive improvements in student outcomes. Neymotin, however, evaluated changes in school funding from 1997 to 2006, while the first additional funding infused following the January 2005 supreme court decision occurred in the 2005-06 school year, the end point of Neymotin’s outcome data.

On balance, it is safe to say that a significant and growing body of rigorous empirical literature validates that state school finance reforms can have substantive, positive effects on student outcomes, including reductions in outcome disparities or increases in overall outcome levels. Further, it stands to reason that if positive changes to school funding have positive effects on short and long run outcomes both in terms of level and distribution, then negative changes to school funding likely have negative effects on student outcomes. Thus it is critically important to understand the impact of the recent recession on state school finance systems, the effects on long term student outcomes being several years down the line.

References

Ajwad, M. I. (2006). Is intra-jurisdictional resource allocation equitable? An analysis of campus level spending data from Texas elementary schools. The Quarterly Review of Economics and Finance, 46, 552-564.

Baker, B. D. (2012). Re-arranging deck chairs in Dallas: Contextual constraints on within district resource allocation in large urban Texas school districts. Journal of Education Finance, 37(3), 287-315.

Baker, B. D., & Corcoran, S. P. (2012). The Stealth Inequities of School Funding: How State and Local School Finance Systems Perpetuate Inequitable Student Spending. Center for American Progress.

Baker, B., & Green, P. (2008). Conceptions of equity and adequacy in school finance. Handbook of research in education finance and policy, 203-221.

Baker, B. D., Sciarra, D. G., & Farrie, D. (2012). Is School Funding Fair?: A National Report Card. Education Law Center. http://schoolfundingfairness.org/National_Report_Card_2012.pdf

Baker, B. D., Taylor, L. L., & Vedlitz, A. (2008). Adequacy estimates and the implications of common standards for the cost of instruction. National Research Council.

Baker, B. D., & Welner, K. G. (2011). School finance and courts: Does reform matter, and how can we tell. Teachers College Record, 113(11), 2374-2414.

Baker, B. D., & Welner, K. G. (2010). Premature celebrations: The persistence of inter-district funding disparities. Education Policy Analysis Archives. http://epaa.asu.edu/ojs/article/viewFile/718/831

Card, D., and Payne, A. A. (2002). School Finance Reform, the Distribution of School Spending, and the Distribution of Student Test Scores. Journal of Public Economics, 83(1), 49-82.

Chambers, J. G., Levin, J. D., & Shambaugh, L. (2010). Exploring weighted student formulas as a policy for improving equity for distributing resources to schools: A case study of two California school districts. Economics of Education Review, 29(2), 283-300.

Chambers, J., Shambaugh, L., Levin, J., Muraki, M., & Poland, L. (2008). A Tale of Two Districts: A Comparative Study of Student-Based Funding and School-Based Decision Making in San Francisco and Oakland Unified School Districts. Palo Alto, CA: American Institutes for Research.

Ciotti, P. (1998). Money and School Performance: Lessons from the Kansas City Desegregation Experiment. Cato Policy Analysis No. 298.

Coate, D. & VanDerHoff, J. (1999). Public School Spending and Student Achievement: The Case of New Jersey. Cato Journal, 19(1), 85-99.

Corcoran, S., & Evans, W. N. (2010). Income inequality, the median voter, and the support for public education (No. w16097). National Bureau of Economic Research.

Dadayan, L. (2012) The Impact of the Great Recession on Local Property Taxes. Albany, NY: Rockefeller Institute. http://www.rockinst.org/pdf/government_finance/2012-07-16-Recession_Local_%20Property_Tax.pdf

Deke, J. (2003). A study of the impact of public school spending on postsecondary educational attainment using statewide school district refinancing in Kansas, Economics of Education Review, 22(3), 275-284.

Downes, T. A., Zabel, J., & Ansel, D. (2009). Incomplete Grade: Massachusetts Education Reform at 15. Boston, MA: MassINC.

Downes, T. A. (2004). School finance reform and school quality: Lessons from Vermont. In J. Yinger (Ed.), Helping Children Left Behind: State Aid and the Pursuit of Educational Equity. Cambridge, MA: MIT Press.

Duncombe, W., & Yinger, J. (2008). Measurement of cost differentials. In H. F. Ladd & E. Fiske (Eds.), Handbook of Research in Education Finance and Policy (pp. 203-221). New York: Routledge.

Duncombe, W., & Yinger, J. (1998). School finance reform: Aid formulas and equity objectives. National Tax Journal, 239-262.

Edspresso (2006, October 31). New Jersey learns Kansas City’s lessons the hard way. Retrieved October 23, 2009, from http://www.edspresso.com/index.php/2006/10/new-jersey-learns-kansas-citys-lessons-the-hard-way-2/

Evers, W. M., & Clopton, P. (2006). High-spending, low-performing school districts. In E. A. Hanushek (Ed.), Courting Failure: How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm Our Children (pp. 103-194). Palo Alto, CA: Hoover Press.

Figlio, D. N. (2004). Funding and accountability: Some conceptual and technical issues in state aid reform. In J. Yinger (Ed.), Helping Children Left Behind: State Aid and the Pursuit of Educational Equity (pp. 87-111). Cambridge, MA: MIT Press.

Goertz, M., and Weiss, M. (2009). Assessing Success in School Finance Litigation: The Case of New Jersey. New York City: The Campaign for Educational Equity, Teachers College, Columbia University.

Greene, J. P., & Trivitt, J. R. (2008). Can judges improve academic achievement? Peabody Journal of Education, 83(2), 224-237.

Guryan, J. (2001). Does Money Matter? Estimates from Education Finance Reform in Massachusetts. Working Paper No. 8269. Cambridge, MA: National Bureau of Economic Research.

Hanushek, E. A., & Lindseth, A. (2009). Schoolhouses, Courthouses and Statehouses. Princeton, NJ: Princeton University Press. See also: http://edpro.stanford.edu/Hanushek/admin/pages/files/uploads/06_EduO_Hanushek_g.pdf

Hanushek, E. A. (Ed.). (2006). Courting failure: How school finance lawsuits exploit judges’ good intentions and harm our children (No. 551). Hoover Press.

Imazeki, J., & Reschovsky, A. (2004). School finance reform in Texas: A never ending story. In J. Yinger (Ed.), Helping Children Left Behind: State Aid and the Pursuit of Educational Equity (pp. 251-281). Cambridge, MA: MIT Press.

Jaggia, S., & Vachharajani, V. (2004). Money for Nothing: The Failures of Education Reform in Massachusetts. Boston, MA: Beacon Hill Institute. http://www.beaconhill.org/BHIStudies/EdStudy5_2004/BHIEdStudy52004.pdf

Leuven, E., Lindahl, M., Oosterbeek, H., and Webbink, D. (2007). The Effect of Extra Funding for Disadvantaged Pupils on Achievement. The Review of Economics and Statistics, 89(4), 721-736.

Murray, S. E., Evans, W. N., & Schwab, R. M. (1998). Education-finance reform and the distribution of education resources. American Economic Review, 789-812.

Neymotin, F. (2010). The relationship between school funding and student achievement in Kansas public schools. Journal of Education Finance, 36(1), 88-108.

Papke, L. (2005). The effects of spending on test pass rates: Evidence from Michigan. Journal of Public Economics, 89(5-6), 821-839.

Resch, A. M. (2008). Three Essays on Resources in Education (dissertation). Ann Arbor: University of Michigan, Department of Economics. Retrieved October 28, 2009, from http://deepblue.lib.umich.edu/bitstream/2027.42/61592/1/aresch_1.pdf

Roy, J. (2011). Impact of school finance reform on resource equalization and academic performance: Evidence from Michigan. Education Finance and Policy, 6(2), 137-167.

Sciarra, D., Farrie, D., & Baker, B. D. (2010). Filling Budget Holes: Evaluating the Impact of ARRA Fiscal Stabilization Funds on State Funding Formulas. New York: Campaign for Educational Equity. http://www.nyssba.org/clientuploads/nyssba_pdf/133_FILLINGBUDGETHOLES.pdf

Taylor, L. L., & Fowler Jr, W. J. (2006). A Comparable Wage Approach to Geographic Cost Adjustment. Research and Development Report. NCES-2006-321. National Center for Education Statistics.

Walberg, H. (2006). High poverty, high performance schools, districts and states. In E. A. Hanushek (Ed.), Courting Failure: How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm Our Children (pp. 79-102). Palo Alto, CA: Hoover Press.

Notes

[1] In a separate study, Leuven and colleagues (2007) attempted to isolate specific effects of increases in at-risk funding on at-risk pupil outcomes, but did not find any positive effects.

[2] While this paper remains an unpublished working paper, the advantage of Guryan’s analysis is that he models the expected changes in funding at the local level as a function of changes to the school finance formula itself, through what is called an instrumental variables, or two-stage least squares, approach. Guryan then evaluates the extent to which these policy-induced variations in local funding are associated with changes in student outcomes. Across several model specifications, Guryan finds increased outcomes for students at grade 4 but not grade 8. A counter-study by the Beacon Hill Institute suggests that reduced class size and/or increased instructional spending either has no effect on or actually worsens student outcomes (Jaggia & Vachharajani, 2004).
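For readers unfamiliar with the approach, the following is a minimal sketch of a two-stage least squares design of the general kind described in note [2], written in Python on simulated data. The variable names (formula_aid, spending, score) and all coefficients are hypothetical illustrations, not Guryan’s actual data or specification: the first stage predicts spending from the formula-driven instrument; the second stage regresses outcomes on the policy-induced (predicted) spending.

```python
# Minimal, hypothetical 2SLS sketch on simulated data (not Guryan's model).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500  # simulated districts

# Instrument: change in aid implied by the funding formula itself
formula_aid = rng.normal(0.0, 1.0, n)
# Unobserved district factor that raises both spending and scores (confounder)
district_effect = rng.normal(0.0, 1.0, n)

# Endogenous regressor: actual change in per-pupil spending
spending = 0.8 * formula_aid + 0.5 * district_effect + rng.normal(0.0, 1.0, n)
# Outcome: test scores, with a true causal spending effect of 0.4
score = 0.4 * spending + 0.7 * district_effect + rng.normal(0.0, 1.0, n)

# Stage 1: predict spending from the formula-driven instrument
stage1 = sm.OLS(spending, sm.add_constant(formula_aid)).fit()
spending_hat = stage1.fittedvalues

# Stage 2: regress scores on the policy-induced (fitted) spending changes
stage2 = sm.OLS(score, sm.add_constant(spending_hat)).fit()

# Naive OLS is biased upward by the shared district factor;
# the 2SLS point estimate should land near the true 0.4
naive = sm.OLS(score, sm.add_constant(spending)).fit()
print("naive OLS estimate:", round(naive.params[1], 2))
print("2SLS estimate:     ", round(stage2.params[1], 2))
```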

[3] Two additional studies of school finance reforms in New Jersey also merit some attention, in part because they directly refute the findings of Hanushek and Lindseth and of the earlier Cato study, and do so with more rigorous and detailed methods. The first, a doctoral dissertation in economics by Alex Resch (2008) of the University of Michigan, explored in detail the resource allocation changes during the scaling-up period of school finance reform in New Jersey. Resch found evidence suggesting that New Jersey Abbott districts “directed the added resources largely to instructional personnel” (p. 1), such as additional teachers and support staff. She also concluded that this increase in funding and spending improved the achievement of students in the affected school districts. Looking at the statewide 11th grade assessment (“the only test that spans the policy change”), she found “that the policy improves test scores for minority students in the affected districts by one-fifth to one-quarter of a standard deviation” (p. 1). Goertz and Weiss (2009) also evaluated the effects of New Jersey school finance reforms, but did not attempt a specific empirical test of the relationship between funding level and distributional changes and outcome changes. Thus, their findings are primarily descriptive. Goertz and Weiss explain that, on state assessments, achievement gaps closed substantially between 1999 and 2007, the period over which Abbott funding was most significantly scaled up. Goertz and Weiss further explain: “State Assessments: In 1999 the gap between the Abbott districts and all other districts in the state was over 30 points. By 2007 the gap was down to 19 points, a reduction of 11 points or 0.39 standard deviation units. The gap between the Abbott districts and the high-wealth districts fell from 35 to 22 points. Meanwhile performance in the low-, middle-, and high-wealth districts essentially remained parallel during this eight-year period” (Figure 3, p. 23).

[4] Two reports from Cato Institute are illustrative (Ciotti, 1998, Coate & VanDerHoff, 1999).

[5] That is, the authors merely assert that these states experienced large infusions of funding, focused on low-income and minority students, within the time period identified. They necessarily assume that, in all other states which serve as a comparison basis, similar changes did not occur. Yet they validate neither assertion. Baker and Welner (2011) explain that Hanushek and Lindseth failed even to measure whether substantive changes had occurred to the level or distribution of school funding, or when and for how long. In New Jersey, for example, the infusion of funding occurred from 1998 to 2003 (or 2005); thus Hanushek and Lindseth’s window includes six years on the front end during which little change occurred (When?). Kentucky reforms had largely faded by the mid to late 1990s, yet Hanushek and Lindseth measure post-reform effects in 2007 (When?). Further, in New Jersey, funding was infused into approximately 30 specific districts, but Hanushek and Lindseth explore overall changes to outcomes among low-income and minority children using NAEP data, where some of these children attended the districts receiving additional support but many did not (Who?). In short, the slipshod comparisons made by Hanushek and Lindseth provide no reasonable basis for asserting either the success or failure of state school finance reforms. Hanushek (2006) goes so far as to title the book “How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm Our Children.” The premise that additional funding for schools, often leveraged toward class size reduction, additional course offerings, or increased teacher salaries, causes harm to children is, on its face, absurd. And the book which implies as much in its title never once validates that such reforms ever do cause harm. Rather, the title is little more than a manipulative attempt to convince the non-critical spectator who never gets past the book’s cover to fear that school finance reforms might somehow harm children. The book also includes two examples of a type of analysis that occurred with some frequency in the mid-2000s, which also had the intent of showing that school funding doesn’t matter. These studies would cherry-pick anecdotal information on either or both of a) poorly funded schools that have high outcomes and b) well-funded schools that have low outcomes (see Evers & Clopton, 2006; Walberg, 2006).

“One Newark’s” Racially Disparate Impact on Teachers

New Jersey Education Policy Forum

PDF of Policy Brief: Weber.Baker.Oluwole.Staffing.Report_3_10_2014_FINAL

As with our previous One Newark policy brief, this one is too long and complex to post in full as a blog. Below are the executive summary and conclusions and policy recommendations. We encourage you to read the full report at the link above.

Executive Summary

In December of 2013, State Superintendent Cami Anderson introduced a district-wide restructuring plan for the Newark Public Schools (NPS). In our last brief on “One Newark,” we analyzed the consequences for students; we found that, when controlling for student population characteristics, academic performance was not a significant predictor of the classifications assigned to schools by NPS. This results in consequences for schools and their students that are arbitrary and capricious; in addition, we found those consequences disproportionately affected black and low-income students. We also found little evidence that the interventions planned under One Newark – including takeovers of schools…


What do our One Newark Reports tell us?

My doctoral student Mark Weber and I have just completed our second report evaluating the impact of the proposed One Newark plan on Newark schools, teachers and the students they serve. In this post, I will try to provide a condensed summary of our findings and the connections between the two reports.

In our first report, we evaluated the statistical bases for the placement of Newark district schools into various categories of consequences. Those categories are as follows:

  1. No Major Change: Neither the staff nor students will experience a major restructuring. While some schools may be re-sited, there will otherwise be little impact on the school. We have classified many of the schools slated for redesign in this category, as there appears to be no substantial change in the student body, the staff, or the mission of the school in NPS documents; however, we recognize that this may change as One Newark is implemented, and that some of these schools may eventually belong in different categories.
  2. Renew: As staff will have to reapply for their positions, students may see a large change in personnel. The governance of the school may change in other ways.
  3. Charter Takeover: While students are given “priority” if they choose to apply to the charter, there appears to be no guarantee they will be accepted.
  4. Close: We consider a school “closed” when it ceases to function in its current form, its building is being divested or repurposed, and it is not being taken over by a charter operator.
  5. Unknown: The “One Newark” documents published by NPS are ambiguous about the fate of the school.

We evaluated the extent to which schools, by these classifications, differed in terms of a) performance measures, b) student population characteristics, and c) facilities indicators. We also tested whether these factors might be used to predict the group to which schools were assigned (a rough sketch of this type of test appears below, after the list of findings). What we found was:

  • Measures of academic performance are not significant predictors of the classifications assigned to NPS schools by the district, when controlling for student population characteristics.
  • Schools assigned the consequential classifications have substantively and statistically significantly greater shares of low income and black students.
  • Further, facilities utilization is also not a predictor of assigned classifications, though utilization rates are somewhat lower for those schools slated for charter takeover.
  • Proposed charter takeovers cannot be justified on the assumption that charters will yield better outcomes with those same children. This is because the charters in question do not currently serve similar children. Rather, they serve less needy children, and when school aggregate performance measures are adjusted for the children they serve, the charters achieve no better current outcomes on average than the schools they are slated to take over.
  • Schools slated for charter takeover or closure specifically serve higher shares of black children than do schools facing no consequential classification. Schools classified under “renew” status serve higher shares of low-income children.
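As referenced above, here is a minimal sketch of the kind of prediction test just described: do performance measures, demographics, and facilities indicators predict a school’s classification? The sketch uses simulated data in Python, a single binary “consequential vs. not” outcome, and hypothetical variable names; the report’s actual models, categories, and data differ.

```python
# Hypothetical sketch: does anything other than demographics predict a
# "consequential" classification? Simulated data, not the report's.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 70  # simulated schools

schools = pd.DataFrame({
    "proficiency": rng.uniform(20, 90, n),      # % of students proficient
    "pct_free_lunch": rng.uniform(30, 100, n),  # % free-lunch eligible
    "pct_black": rng.uniform(10, 100, n),       # % black students
    "utilization": rng.uniform(50, 110, n),     # facilities utilization %
})

# Simulate a classification that is driven by demographics, not performance
index = -4.0 + 0.04 * schools["pct_free_lunch"] + 0.02 * schools["pct_black"]
schools["consequential"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-index))).astype(int)

# Logistic regression of the classification on performance, demographics, facilities
X = sm.add_constant(schools[["proficiency", "pct_free_lunch", "pct_black", "utilization"]])
model = sm.Logit(schools["consequential"], X).fit(disp=0)

# If the proficiency coefficient is indistinguishable from zero while the
# demographic coefficients are not, the classification is not performance-driven.
print(model.summary())
```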

Figure 1

Figure 2

Figure 3

Figure 4

These last two figures are particularly important because they show that, after we adjust for student population characteristics, schools slated for takeover by charters in some cases actually outperform the charters assigned to take them over. While North Star Academy is a relatively high performer, we are unable here to account for the fact that North Star loses about half of its students between grades 5 and 12.

These findings raise serious concerns at two levels. First, they raise serious questions about the district’s own purported methodology for classifying schools. Our analyses suggest the district’s classifications are arbitrary and capricious, yielding racially and economically disparate effects. Second, the choice, based on arbitrary and capricious classification, to subject disproportionate shares of low-income and minority children to substantial disruption to their schooling, shifting many to schools under private governance, may substantially alter the rights of these children, their parents, and local taxpayers.

One Newark is a program that appears to place sanctions on schools – including closure, charter takeover, and “renewal” – on the basis of student test outcomes, without regard for student background. The schools under sanction may have lower proficiency rates, but they also serve more challenging student populations: economically disadvantaged students, students with special educational needs, and students who are Limited English Proficient.

There is a statistically significant difference in the student populations of schools that face One Newark sanctions and those that do not. “Renew” schools serve more free lunch-eligible students, which undoubtedly affects their proficiency rates. Schools slated for charter takeover and closure serve larger proportions of students who are black; those students and their families may have their rights abrogated if they choose to stay at a school that will now be run by a private entity.

There is a clear correlation between student characteristics and proficiency rates on state tests. When we control for student characteristics, we find that many of the schools slated for sanction under One Newark actually have higher proficiency rates than we would predict. Further, the Newark charter schools that may take over those NPS schools perform worse than predicted.
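A minimal sketch of the kind of adjustment described above: regress school proficiency rates on student population characteristics, then compare each school’s actual rate to its predicted rate. Schools with positive residuals do better than expected given whom they serve. The data and variable names below are simulated and hypothetical, not the report’s actual model.

```python
# Hypothetical sketch of regression-adjusted school performance (residuals).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80  # simulated schools

schools = pd.DataFrame({
    "pct_free_lunch": rng.uniform(30, 100, n),  # % free-lunch eligible
    "pct_lep": rng.uniform(0, 30, n),           # % limited English proficient
    "pct_special_ed": rng.uniform(5, 25, n),    # % with special educational needs
})
schools["proficiency"] = (
    95
    - 0.5 * schools["pct_free_lunch"]
    - 0.4 * schools["pct_lep"]
    - 0.3 * schools["pct_special_ed"]
    + rng.normal(0, 5, n)
)

# Predict proficiency from student characteristics alone
X = sm.add_constant(schools[["pct_free_lunch", "pct_lep", "pct_special_ed"]])
fit = sm.OLS(schools["proficiency"], X).fit()

# Residual = actual minus predicted proficiency, given the students served;
# a school beating its prediction has a positive adjusted score
schools["adjusted_performance"] = fit.resid
print(schools.sort_values("adjusted_performance", ascending=False).head())
```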

There is, therefore, no empirical justification for assuming that charter takeovers will work when, after adjusting for student populations, the schools to be taken over actually outperform the charters assigned to take them over. Further, these charters have no track record of serving populations like those attending the schools identified for takeover.

Our analysis calls into question NPS’s methodology for classifying schools under One Newark. Without statistical justification that takes into account student characteristics, the school classifications appear to be arbitrary and capricious.

Further, our analyses herein find that the assumption that charter takeover can solve the ills of certain district schools is specious at best. The charters in question, including TEAM Academy, have never served populations like those in schools slated for takeover and have produced only comparable current outcome levels relative to the populations they actually serve.

Finally, as with other similar proposals sweeping the nation that argue for shifting larger and larger shares of low-income and minority children into schools under private and quasi-private governance, we have significant concerns regarding the protection of the rights of these children and of taxpayers in these communities.

In our second report, we evaluated the distribution of staffing consequences under the One Newark proposal. For us, the One Newark plan raised immediate concerns of possible racial disparity both for students and for their teachers. As such, we decided to evaluate the racially disparate impact of the plan on teachers, in relation to the students they serve, and also to explore the parallels between the One Newark proposal and past practices that disadvantaged minority teachers. We found:

  • There is a historical context of racial discrimination against black teachers in the United States, and “choice” systems of education have previously been found to disproportionately affect the employment of these teachers. One Newark appears to continue this tradition.
  • There are significant differences in race, gender, and experience in the characteristics of NPS staff and the staff of Newark’s charter schools.
  • NPS’s black teachers are far more likely to teach black students; consequently, these black teachers are more likely to face an employment consequence as black students are more likely to attend schools sanctioned under One Newark.
  • Black and Hispanic teachers are more likely to teach at schools targeted by NJDOE for interventions – the “tougher” school assignments.
  • The schools NPS’s black and Hispanic teachers are assigned to lag behind white teachers’ schools in proficiency measures on average; however, these schools show more comparable results in “growth,” the state’s preferred measure for school and teacher accountability.
  • Because the demographics of teachers in Newark’s charter sector differ from NPS teacher demographics, turning over schools to charter management operators may result in an overall Newark teacher corps that is more white and less experienced.

Figure 5

This figure is the real kicker here. This figure, based on two separate logistic regression models, characterizes a) the likelihood that an NPS teacher is in a school that faces consequences and b) the likelihood that a teacher is a charter school teacher. That is, we estimate the odds, by race, experience, and other factors, that a teacher is in a school where they are likely to face job consequences and the odds that a teacher works in the favored subset of charter schools (a rough sketch of this kind of odds-ratio calculation appears after the list of findings below). We find that:

  • NPS teachers who face employment consequences as a function of One Newark are 2.11 times as likely to be black as to be white, and 1.766 times as likely to be Hispanic as white.
  • By contrast, charter school teachers in Newark who are not only protected by the plan, but given the opportunity in some cases to take over the schools and thus the jobs of those NPS teachers, are only 74% as likely to be black as to be white, 47% as likely to be Hispanic as white, and 3.6 times more likely to be Asian than white.
  • Both charter teachers and NPS teachers facing employment consequences tend to be female.
  • NPS teachers who face employment consequences as a function of One Newark are about 50% more likely to have 10 to 14 years of experience compared to their peers with 0 to 4 years, and 37% more likely to have 15 to 19 years of experience compared to their peers with 0 to 4 years.
  • Charter teachers, who again may be given the opportunity to take over schools of these NPS teachers, are highly unlikely to have more than 0 to 4 years of experience. Charter teachers are more than 3 times as likely to have 0 to 4 years as opposed to 6 to 9 years, 10 times as likely to have 0 to 4 years as opposed to 10 to 14 years, 20 times as likely to have 0 to 4 years as opposed to 15 to 19 years, and nearly 100 times as likely to have 0 to 4 years of experience as to have more than 19 years of experience.
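As referenced above, here is a minimal sketch of the kind of odds-ratio calculation behind figures like these: fit a logistic regression of “faces an employment consequence” on teacher race and experience indicators, then exponentiate the coefficients to obtain odds ratios relative to the reference categories (white; 0 to 4 years). The data, category labels, and coefficients below are simulated and hypothetical, not the report’s actual models or estimates.

```python
# Hypothetical sketch: odds ratios from a logistic regression on simulated teachers.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2000  # simulated teachers

teachers = pd.DataFrame({
    "race": rng.choice(["white", "black", "hispanic"], size=n, p=[0.4, 0.4, 0.2]),
    "experience": rng.choice(["0-4", "5-9", "10-14", "15plus"], size=n),
})

# Simulated outcome: probability of facing a consequence is higher for
# black and Hispanic teachers (mirroring the pattern described above)
index = (
    -1.0
    + 0.7 * (teachers["race"] == "black")
    + 0.5 * (teachers["race"] == "hispanic")
)
teachers["consequence"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-index))).astype(int)

# Logistic regression with white and "0-4 years" as the reference categories
fit = smf.logit(
    "consequence ~ C(race, Treatment(reference='white'))"
    " + C(experience, Treatment(reference='0-4'))",
    data=teachers,
).fit(disp=0)

# Exponentiated coefficients are odds ratios relative to the reference groups
print(np.exp(fit.params).round(2))
```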

The overall effect of One Newark is likely to be a total Newark teaching corps that is more white and less experienced than it is currently.

We find patterns of racial bias in the consequences to staff similar to those we found in the consequences to students, largely because the racial profiles of students and staff within the NPS schools are correlated. In other words: Newark’s black teachers tend to teach the district’s black students; therefore, because One Newark disproportionately affects those black students, black teachers are more likely to face an employment consequence.

NPS’s black teachers are also more likely to have positions in the schools that are designated by the state as needing interventions – the more challenging school assignments. The schools of NPS black teachers consequently lag in proficiency rates, but not in student growth. We do not know the dynamics that lead to more black teachers being assigned to these schools; qualitative research on this question is likely needed to understand this phenomenon.

One Newark will turn management of more NPS schools over to charter management organizations. In our previous brief, we questioned the logic of this strategy, as these CMOs currently run schools that do not teach students with similar characteristics to NPS’s neighborhood schools. Evidence suggests these charters would not achieve any better outcomes with this different student population.

This brief adds a new consideration to the shift from traditional public schools to charters: if the CMOs maintain their current teaching corps’ profile as they expand, Newark’s teachers are likely to become more white and less experienced overall. Given the importance of teacher experience, particularly in the first few years of work, Newark’s students would likely face a decline in teacher quality as more students enroll in charters.

The potential change in the racial composition of the Newark teaching corps under One Newark – to a staff that has a smaller proportion of teachers of color – would occur within a historical context of established patterns of discrimination against black teachers. “Choice” plans in education have previously been found to disproportionately impact the employment of black teachers; One Newark continues in this tradition. NPS may be vulnerable to a disparate impact legal challenge on the grounds that black teachers will disproportionately face employment consequences under a plan that arbitrarily targets their schools.

And now to editorialize in no uncertain terms…

One Newark is an ill-conceived plan. It is simply wrong, statistically, conceptually and quite possibly legally. That it can be so wrong on so many levels displays an astounding combination of ignorance and arrogance among its designers, promoters and supporters.

First, the justifications for school closures are, well, unjustified. The data said to support the plan simply don’t. Even if closing schools based on poor performance could be justified, the data do not indicate a valid performance-based reason for the selections. This is either a sign of gross statistical incompetence on the part of district (and, by extension, state) officials or evidence that they have made their decisions on some other basis entirely.

Second, the fact that, in many cases, lower-performing charters are slated to take over higher-performing district schools (when accounting for students served) is utterly ridiculous. Again, this is either evidence of gross statistical malfeasance and complete ignorance on the part of district officials or evidence that their choices are based on something else entirely. Certainly it is a clever strategy for making charters look good to assign them to take over schools that already outperform them. But I suspect I’m giving district officials too much credit if I assume this to be their rationale.

Third and finally, if I had heard someone suggest 10 years ago [in a time when data-free ideological punditry was at least somewhat more moderated, and history marginally better understood and respected] that we should start reforming Newark or any racially mixed urban district by closing the black schools, firing the black teachers, selling their buildings, and turning over their management to private companies (that may ignore many student and employee rights), I’d have thought they were either kidding or members of some crazy extremist organization [Note that this plan is substantively different in many ways from the Philadelphia privatization plan that was adopted over ten years ago, in which private companies held contracts with the district, and thus remained under district governance].

It would be one thing if there were valid facilities utilization, safety, or health concerns, or other legitimate reorganization considerations, that just so happened to affect a larger share of black students and teachers than others. It is difficult if not impossible to protect against any and all racially disparate effects, even when making well-reasoned, empirically justifiable policy decisions.

But this proposed plan, as shown in our analyses, is based on nothing, nor has there been any real, thoughtful or statistically reasonable attempt to justify that it is actually based on something. No legitimate data analyses have been provided to support the plan (much like the flimsy parallel proposal in Kansas City).

It is truly a sad commentary on the state of the education reform conversation that we would even entertain the One Newark proposal, and even more so that we would entertain such a proposal with no valid justification and an increasing body of evidence to the contrary.

 


The False Markets of Market Based Reforms

I’ve not spent a great deal of time talking about “corporate reform,” “privatization” or “market based reform,” mainly because I find these labels unproductive and oversimplified. Most of what occurs in our public and private sectors (itself a simplification) in the provision of public and private goods and services is far more nuanced – well beyond simple classification. Further, as I have noted on a few occasions, the rhetoric which emanates from one side or the other of these ideological debates is often entirely inconsistent. See, for example, my explanation that strategies being promoted for public schools as derivative of successful private sector industry are anything but. That which is pitched as “corporate reform” is little more than failed private sector management.

A similar hypocrisy has been nagging at me lately, and I’ve touched on it, in part, in a few previous posts (here’s one). That is, there are sweeping claims that many of the policies being advanced these days are about capitalizing on the virtues of free markets – that we are trying to use the rationality that emerges from less regulated private sector markets to achieve more efficient production of high quality schools.

Enter charter schools and alternative pathways to the teaching profession.

Two interconnected strategies that are often discussed under this umbrella are a) the expansion of charter schooling and b) reduction of barriers to entry to the teaching profession through programs like Teach for America.

The argument is that by providing a fair public subsidy to charter schools (since their competition is subsidized and already established, giving them an unfair head start), we can induce competition, improving both charter schools (via attrition of the weak) and the public schools with which they compete for students. Indeed, there is some empirical evidence to support this claim.

On the teacher labor market issue, the argument for reducing barriers to entry is that the market can decide whether it prefers the traditionally prepared teachers, or the alternative, creating competitive pressures for traditional preparation programs to improve and for alternative pathways to produce qualified candidates… those who can compete on a level, but less regulated playing field.

Casual, anecdotal evidence that both of these strategies are “working” to introduce positive market-based effects includes evidence of long waiting lists – that is, high consumer demand – for charter schools in major urban centers. Others point to the fact that candidates from programs like Teach for America tend to get hired, and quickly, suggesting high demand for their skills on the open market for teachers.

Ten to fifteen years ago, either of these arguments might have been supportable – back when charter schools were still mainly upstarts with a few emerging networks, and back when alternative, temporary teacher pipelines, while seeking exclusive arrangements with districts and some charters, still supplied teachers to a balance of the two. But that was then and this is now.

Charter waiting lists in the current era are as likely to be policy-induced, by deprivation and mass closure of true public options, as they are demand-driven; those closures are often based on bogus metrics used for declaring district schools “failure factories.” Worse, these declarations of failure and disruptive intervention, and the bogus metrics upon which they are based, are now codified in state policies promulgated in response to federal pressures (close the worst 5% you must, and tweak your accountability measures you may). Today’s charter waiting lists are as much a function of induced under-supply of public options as of local community demand for more charter options, if not far more so.

Surely, if the government forcibly shut down Amtrak, choosing some bogus measure to declare it an abject failure, there would be increased “demand” for other means of transportation along the Northeast corridor; and if the government forcibly shut down the U.S. Postal Service, other package delivery services would see a spike in their business. But I find it doubtful anyone would suggest that this spike resulted from a true market-driven preference for their products or services. Charter waiting lists in the wake of forced shutdowns of district schools based on low average test scores, or even biased growth metrics, are no different.

Which brings me to my second point. I’ve not written much on this blog about TFA, nor have I had cause to. But the recent back-and-forth speculation on the potential role of TFA in Newark under the proposed reorganization and layoff plan (connected, or not?) led to an exchange between local writer Bob Braun and TFA leadership, which brought out one blogger’s attempt to find a middle ground in the debate. This blogger, responding to the Twitter trending of #ResistTFA, brought up an argument on behalf of TFA that I’ve not heard in a while. That is, that education schools should learn from the market-based successes of TFA, specifically: “why do principals and schools still line up to hire TFA corps members when they have the chance?” I must admit, I’ve been complicit in making similar arguments in my own research in the past (here & here).

The implication in this blog post is that TFA has, by virtue of producing high quality candidates, outpaced the competition (traditional preparation programs) on the free market – the open labor market for teachers. This would be all well and good if the speedy placement of TFA candidates had anything to do with an open, competitive labor market, but it doesn’t: TFA candidates are placed largely through pre-arranged agreements with districts and charter networks, not by competing head-to-head with other applicants. And I’m not entirely sure I fault TFA for that; I only fault its advocates for not acknowledging it. And I would note that many traditional ed schools also operate in relatively close relationships with local public school districts.

My problem with the current false market scenario regarding TFA is its intersection with the false market for charter schools. Just as charter school expansion – demand – has become heavily dependent on manipulation of markets by policy makers, TFA expansion – demand – has become dependent on those major charter network operators who are dependent on charter market manipulation (forced closure of district schools).[1]

Put simply, this is not market based reform, nor should anyone pretend that market mechanisms (rather than policy preferences and market manipulation) are driving any of this.

Yes, it is reasonable that we might experiment with public subsidies to private providers, be it through direct private management under district contract, or via upstarts like charters (by their original intent). And yes, it is reasonable to test out alternative pathways to teaching.  But when we start forcibly shuttering the public system, under the facade of federally promulgated state policies, and replacing the only true public option with private providers who then establish exclusive arrangements for alternatively prepared short-term staff, we’ve gone too far.

When we start claiming that these shifts are happening due to free market forces and public demand, well… then we’re just full of crap.

It’s time to put a stop to this and rethink where we’re headed before even more damage is done!


[1] It may be that the long-term financial viability of major charter networks depends on both incredibly high employee churn and the placement of alums in future positions of political power (to continue rigging markets in favor of the institutions that hire them). Churn is required because, as I’ve explained in previous posts, many well-established charter operators actually pay pretty good salaries over the first several years, outpacing local district schools by 20 to 30%, while also offering small class sizes. It is hard to conceive of how these schools would balance their operating budgets were they to retain these teachers much longer than the usual 2 to 5 years. Further, charter advocates in NYC like to point to the exorbitant retirement costs of the city district as one reason why charter spending is actually lower (even though it’s not) than district spending. I’ve noted several times that it is rather absurd to compare a fully matured district with thousands of retirees to an institution less than 10 years old that likely has no retirees as of yet. Presumably they would, eventually, unless of course they can churn, churn, churn. TFA helps them accomplish this goal.