Blog

Amazing Graph Proves Poverty Doesn’t Matter!(?)

I just couldn’t pass this one up. This is a graph for the ages, and it comes from a presentation by the New Jersey Commissioner of Education given at the NJASA Commissioner’s Convocation in Jackson, NJ on Feb 29: State of NJ Schools presentation 2-29-2012

Please turn to Slide #24:

The title conveys the intended point of the graph – that if you look hard enough across New Jersey – you can find not only some, but MANY higher poverty schools that perform better than lower poverty schools.

This is a bizarre graph to say the least. It’s set up as a scatter plot of proficiency rates with respect to free/reduced lunch rates, but then it only includes those schools/dots that fall in these otherwise unlikely positions. At least put the others there faintly in the background, so we can see where these fit into the overall pattern. The suggestion here is that there is no pattern.

The apparent inference here? Either poverty itself really isn’t that important a factor in determining student success rates on state assessments, or, alternatively, free and reduced lunch simply isn’t a very good measure of poverty even if poverty is a good predictor. Either way, something’s clearly amiss if we have so many higher poverty schools outperforming lower poverty ones. In fact, the only dots included in the graph are high poverty schools outperforming lower poverty ones. There can’t be much of a pattern between these two variables at all, can there? If anything, the trendline must be sloped uphill? (that is, higher poverty leads to higher outcomes!)

Note that the graph doesn’t even tell us which or how many dots/schools are in each group and/or what percent of all schools these represent. Are they the norm? or the outliers?

So, here’s the actual pattern:

Hmmm… looks a little different when you put it that way. Yeah, it’s a scatter, not a perfectly straight line of dots. And yes, there are some dots to the right hand side that land above the 65 line and some dots to the left that land below it.

BUT THE REALITY IS THAT FREE/REDUCED LUNCH ALONE EXPLAINS ABOUT 2/3 OF THE VARIATION IN PROFICIENCY RATES ACROSS SCHOOLS!

Do free/reduced lunch rates explain all of the variance? Of course not. Nothing really does, in part because the testing data themselves include noise, and reducing the testing data to percentages of kids over and above arbitrary thresholds introduces other noise. So all of the variance can’t be explained no matter how many variables we throw at it. We can, however, take some additional easily accessible variables from the school report cards and explain a little more of the variation:
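
For readers who want to kick the tires on this themselves, here’s a minimal sketch of the kind of school-level regression I’m describing. The file and column names below are hypothetical placeholders, not my actual data or code:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical merged file of NJ school report card proficiency and enrollment data
    schools = pd.read_csv("nj_schools.csv")

    # Bivariate model: proficiency rate as a function of % free/reduced price lunch
    m1 = smf.ols("proficiency_rate ~ pct_free_reduced_lunch", data=schools).fit()
    print(m1.rsquared)  # on the order of 2/3 in the analysis described above

    # Add a few more easily accessible report card measures
    m2 = smf.ols("proficiency_rate ~ pct_free_lunch + pct_black + pct_female", data=schools).fit()
    print(m2.rsquared)  # explains a little more of the variation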

But, % free lunch remains the dominant factor, along with % black and % female. Combining free/reduced produces a somewhat weaker effect than using % free alone.

Lengthy, somewhat related tangent

Back in 2007-2008, while I was still at the University of Kansas, I was involved in a study of factors associated with production of outcomes and relative efficiency of New Jersey schools. Most of the data were generally insufficient for academic publication, but we did have some fun playing and figuring out what was there.

The study was designed to figure out a) which background factors really accounted for differences in NJ school performance, and b) what were the differences in characteristics of schools that appeared to do better or worse than expected.

Here are a few snapshots of what I found back then, constructing models of school level outcomes for New Jersey schools using data from 2004 to 2006 (all publicly accessible data).

First, using a combination of background demographic factors, school characteristics and other school resource measures we were able to explain as much as 82% of the variation in 8th grade (then GEPA) outcomes. Still, % free and reduced lunch played a (the) dominant role, along with other related factors including special education shares, racial composition, % of female adults living in the surrounding area holding a Graduate degree, and an indicator that the school was in an affluent suburban district (DFG I or J).

We played around with multiple options and this is where we ended up. One of the more interesting revelations was that poverty seemed to have stronger effects on outcomes in population dense urban centers (our Urban x Free Lunch interaction term). This finding is common and can be explained in multiple ways (I’ll have to get to that another time).

We also found that certain resource measures were associated with higher (or lower) outcome schools. Schools where teachers had higher salaries than other similar teachers (by degree and experience) in the surrounding labor market tended to have higher outcomes. And schools with larger shares of teachers in their first three years with only a BA had lower outcomes.

We (I) actually took the analyses a step further and estimated preliminary models of the costs of producing desired outcome targets (models which I subsequently improved upon). The key element of these models was to figure out if there were, in fact, alternative or additional demographic measures for districts that might help to better capture which districts have legitimately higher costs of achieving desired student outcomes. That is, what kind of stuff should be weighted, and/or weighted more heavily in the state school finance formula.  Specifically, what alternatives do we have for addressing poverty?

This was the first attempt:

And this was the second attempt (in a published article):

  • Baker, B.D., Green, P.C. (2009) Equal Educational Opportunity and the Distribution of State Aid to Schools: Can or should racial composition be a factor? Journal of Education Finance 34 (3) 289-323

What we found was that poverty (measured by % free lunch) indeed strongly affects the costs of improving student outcomes, specifically applied to New Jersey districts, in one case focusing only on K-12 unified districts and in the second case all NJ districts. This finding is not a revelation.

We also found that one might capture additional “costs” by including measures of school district racial composition, and we discuss the legal implications of this finding in several related articles (here, here & here). But, we also point out that there are alternatives for capturing some of the same effect, including the Urban x Poverty interaction.

So yes, we can make our statistical models and analyses ever more nuanced to more thoroughly explain the links between student backgrounds and student outcomes, and the costs of improving those outcomes. And, to the extent we can, we should.  But the fact is that poverty still matters, and it seems to matter statistically even when we measure it with the imperfect, crude proxy of children qualified for free or reduced price lunch.

In summary, despite the apparent brilliant wisdom conveyed in the graph at the outset of this post:

  1. Poverty as measured by free and reduced lunch status remains a very strong predictor of variations in proficiency rates across New Jersey schools; and
  2. Various measures of poverty, including free lunch status, and census poverty rates interacted with urban population density strongly influence the costs of improving outcomes across New Jersey school districts (and to an extent that far exceeds the weights in the current school finance formula).

But it’s still a really fun graph!

Here’s a link to a related article on schools supposedly “beating the odds” (like those in the above graph).

And here’s a link to my preliminary analyses which never saw the light of day (rough and unedited, in its original draft form): BAKER.DRAFT.JUNE_08

About those Dice… Ready, Set, Roll! On the VAM-ification of Tenure

A while back I wrote a post (and here) in which I explained that the relatively high error rates in Value-added modeling might make it quite difficult for teachers to get tenure under some newly adopted and other proposed guidelines and much easier to lose it, even after waiting years to get lucky [& yes I do mean LUCKY] enough to obtain it.

The standard reformy template is that teachers should only be able to get tenure after 3 years of good ratings in a row and that teachers should be subject to losing tenure if they get 2 bad years in a row.  Further, it is possible that the evaluations might actually stipulate that you can only get a good rating if you achieve a certain rating on the quantitative portion of the evaluation – or the VAM score. Likewise for bad ratings (that is, the quantitative measure overrides all else in the system).

The premise of the dice rolling activity from my previous post was that it is necessarily much less likely to roll the same number (or subset of numbers) three times in a row than twice (exponentially, in fact). That is, it is much harder to overcome the odds based on error rates to achieve tenure, and much easier to lose it. Again, this is largely due to the noisiness of the data, and less due to the difficulty of actually being “good” year after year. The ratings simply jump around a lot. See my previous post.
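
Here’s the back-of-the-envelope version of that dice arithmetic, under the extreme (purely illustrative) assumption that a teacher’s rating in any given year is pure chance:

    # If a rating in any given year were pure chance, the odds of stringing
    # together consecutive "good" (or "bad") years shrink with each added year.
    p_above_median = 0.5
    p_bottom_third = 1 / 3

    print(p_above_median ** 2)  # 0.25  -> two above-median years in a row
    print(p_above_median ** 3)  # 0.125 -> three in a row (the tenure hurdle)
    print(p_bottom_third ** 2)  # ~0.11 -> two bottom-third years in a row (losing tenure)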

So, for those of you energetic young reformy wanna-be teachers out there thinkin’ – hey, I can cut it – I’ll take my chances and my “good” teaching will overcome those odds – generating year-after-year top quartile rankings? A lot of that is totally out of your control! [Look, I would have been right there with you when I graduated college]

But my first post on this topic was all in hypothetical-land. Now, with the newly released NYC teacher data we can see just how many teachers actually got three-in-a-row in the past three years [among those actually teaching the same subject and grade level in the same school], applying different ranges of “acceptableness” or not.

So, here, I give the benefit of the doubt, and set a reasonably low bar for getting a good rating – the median or higher [ignoring error ranges and sticking with the type of firm cut-points that current state policies and local contracts seem to be adopting]. Any teacher who gets the median or higher 3 years in a row can get tenure! Otherwise, keep trying until you get your three in a row? How many teachers is that? How many overcome the odds of the randomness and noise in the data? Well, here it is:

As percentiles dictate (by definition) about half of the teachers in the data are in the upper half in the most recent year. But, only about 20% of teachers in any grade or subject are above the median two years in a row. Further, only about 6 to 7% actually were lucky enough to land in the upper half for three years running!  Assuming stability remains relatively similar over time, we could expect that in any three year period, about 7% of teachers might string together three above-the-medians in a row. At that pace, tenure will be awarded rather judiciously. (but actually, stability in the most recent year over prior is unusually high)
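
For the curious, the tally itself is simple enough once the teacher data reports are merged across years. A rough sketch (with made-up file and column names, not my actual code) looks something like this:

    import pandas as pd

    # Hypothetical wide file: one row per teacher, percentile rank by year
    df = pd.read_csv("nyc_tdr_wide.csv")

    above = df[["pctile_2008", "pctile_2009", "pctile_2010"]] >= 50  # at/above the median

    print(above["pctile_2010"].mean())                            # ~0.50 by definition
    print((above["pctile_2009"] & above["pctile_2010"]).mean())   # ~0.20 two in a row
    print((above["pctile_2008"] & above["pctile_2009"] & above["pctile_2010"]).mean())
                                                                  # ~0.06-0.07 three in a row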

Let’s say I cut teachers a break and only take tenure away if they get two in a row not in the bottom half, but rather all the way down into the bottom third!  What are the odds? How many teachers actually get two years in a row in the bottom third?

Well, here it is:

That’s rather depressing, isn’t it? The chances of ending up in the bottom third two years in a row are about the same as the chances of ending up in the top half three years in a row!

Now, perhaps you’re thinkin’ Big Deal. So you jump into and out of the edges of these categories. That just means you’re not really solidly in the “good” or the “bad” and it should take you longer to get tenure. That’s fair? After all, it’s not like any substantial portion of teachers are actually jumping back and forth between the top half and the bottom third?

  • In ELA,  14% of those in the top half in 2010 were in the bottom third in 2009
  • In ELA, 23.9% in the top half in 2009 were in the bottom third in 2010
  • In Math (where the scores are more stable in part because they appear to retain some biases), 9% of those in the top half in 2010 were in the bottom third in 2009
  • In Math, 26% of those in the bottom third in 2009 were in the top half in 2010 and nearly 16% of those in the top half in 2009 ended up in the bottom third in 2010.

[corrected]

Most of these shifts, if not nearly all of them, occur not because the teacher actually became a good teacher or became a bad teacher from one year to the next.

The big issue here is the human side of this puzzle. None of the existing simulations of the supposed positive effects of de-selection or tightened tenure requirements – of leveraging VAM estimates to improve student outcomes – makes even a halfhearted attempt to account for human behavioral responses to a system driven by these imprecise and potentially inaccurate metrics. All adopt the oversimplified “all else equal” assumption of an unending supply of new teacher candidates who are equal in quality to the current average teacher and with comparable standard deviation.

Reformy arguments ratchet these assumptions up a notch. The most reformy arguments in favor of moving toward these types of tenure and de-tenuring provisions posit that making tenure empirically performance based and de-selecting the “bad” teachers will strengthen the teaching profession. That better applicants – the top third of college graduates – will suddenly flock to teaching instead of other currently higher paying professions.

But, with so little control over one’s destiny, is that really likely to be the case? It certainly stands to be a frustrating endeavor to achieve any level of job stability. And it doesn’t look like average compensation will be rising in the near future to compensate for this dramatic increase in risk. Further, if we tie compensation to these ratings either as one-time bonuses or as salary adjustments, many teachers who, by chance, get good ratings in one year will, by chance again, get bad ratings the next year. Teachers will have a difficult time even guessing at what their compensation might look like the following year. And since the ratings are necessarily relative (based on percentiles) the distribution of additional compensation must involve winners and losers. The luckier one or a handful of teachers get in a given year, the larger the share of the merit pot they receive and the less others receive. Once again, I do mean LUCK.

Who will really be standing in line to take these jobs? In the best case (depending on one’s point of view), perhaps a few additional energetic grads of highly selective colleges will jump into the mix for a couple of years. But as these numbers and frustrations play out over time, the pendulum is certainly likely to swing the other direction.

More risk and more uncertainty without any sign of significantly increased reward is highly unlikely to improve the teaching profession and far more likely to make things much worse, especially in already hard to staff schools and districts!

These numbers are fun to play with. I just can’t stop myself. And they have endless geeky academic potential. But I’m increasingly convinced that they have little practical value for improving school quality. And I’m increasingly disturbed by how policy makers  have adopted absurd, rigid requirements around these anything but precise and questionably accurate metrics.


Seeking Practical Uses of the NYC VAM Data???

A short while back, in a follow up post regarding the Chetty/Friedman/Rockoff study I wrote about how and when I might use VAM results, if I happened to be in a decision making role in a school or district:

I would want to be able to generate a report of the VA estimates for teachers in the district. Ideally, I’d like to be able to generate a report based on alternative model specifications (option to leave in and take out potential biases) and on alternative assessments (or mixes of them). I’d like the sensitivity analysis option in order to evaluate the robustness of the ratings, and to see how changes to model specification affect certain teachers (to gain insights, for example, regarding things like peer effect vs. teacher effect).

If I felt, when poring through the data, that they were telling me something about some of my teachers (good or bad), I might then use these data to suggest to principals how to distribute their observation efforts through the year. Which classes should they focus on? Which teachers? It would be a noisy pre-screening tool, and would not dictate any final decision. It might start the evaluation process, but would certainly not end it.

Further, even if I did decide that I have a systematically underperforming middle school math teacher (for example), I would only be likely to try to remove that teacher if I was pretty sure that I could replace him or her with someone better. It is utterly foolish from a human resource perspective to automatically assume that I will necessarily be able to replace this “bad” teacher with an “average” one.  Fire now, and then wait to see what the applicant pool looks like and hope for the best?

Since the most vocal VAM advocates love to make the baseball analogies… pointing out the supposed connection between VAM teacher deselection arguments and Moneyball, consider that statistical advantage in Baseball is achieved by trading for players with better statistics – trading up (based on which statistics a team prefers/needs).  You don’t just unload your bottom 5%  or 15% players in on-base-percentage and hope that players with on-base-percentage equal to your team average will show up on your doorstep. (acknowledging that the baseball statistics analogies to using VAM for teacher evaluation to begin with are completely stupid)

With the recently released NYC data in hand, I now have the opportunity to ponder the possibilities. How, for example, if I was the principal of a given, average sized school in NYC, might I use the VA data on my teachers to counsel them? to suggest personnel changes? assignment changes, or so on? Would these data, as they are, provide me any useful information about my staff and how to better my school?

For this exercise, I’ve decided to look at the year to year ratings of teachers in a relatively average school. Now, why would I bother looking at the year to year ratings when we know that the multi-year averages are supposed to be more accurate – more representative of the teacher’s contributions over time? Well, you’ll see in the graphs below that those multi-year averages also may not be that useful. In many cases, given how much teacher ratings bounce around from year to year, it’s rather like assigning a grade of “C” to the kid who got Fs on the first two tests of the semester, and As on the next two, or even a mix of Fs and As in some random sequence. Averages, or aggregations, aren’t always that insightful. So I’ve decided to peel it back a bit, as I likely would if I was the principal of this school seeking insights about how to better use my teachers and/or how to work with them to improve their art.

Here are the year to year Math VA estimates for my teachers who actually continue in my building from one year to the next:

Focusing on the upper left graph first, in 2008-09, Rachel, Elizabeth and Sabina were somewhat below average. In 2009-10 they were slightly above average. In fact, going to the prior year (07-08), Elizabeth and Sabina were slightly above average, and Rachel below. They reshuffle again, each somewhat below average in 2006-07, but only Rachel has a score for the earliest year. Needless to say, it’s a little tricky figuring out how to interpret differences among these teachers from this very limited view of very noisy data. Julie is an interesting case here. She starts above average in 05-06, moves below average, then well above average, then back to below. She’s never in the same place twice. There could be any number of reasons for this that are legitimate (different class composition, different life circumstances for Julie, etc.). But, more likely it’s just the noise talkin’! Then there’s Ingrid, who held her own in the upper right quadrant for a few years, then disappears. Was she good? or lucky? Glen also appears to be a two-in-a-row Math teaching superstar, but we’ll have to see how the next cycle works out for him.

Now, here are the ELA results:

If we accept these results as valid (a huge stretch), one might make the argument that Glen spent a bit too much of his time in 2008-09 trying to be a Math teaching superstar, and really shortchanged ELA. But he got it together and became a double threat in 2009-10? Then again, I think I’d have to wait and see if Glen’s dot in the picture actually persists in any one quadrant for more than a year or two, since most of the others continue to bounce all over the place. Perhaps Julie, Rachel, Elizabeth and Sabina really are just truly average teachers in the aggregate – if we choose to reduce their teaching to little blue dots on a scatterplot. Or perhaps these data are telling me little or nothing about their teaching. Rachel and Julie were both above average in 05-06, along with former(?) colleague Ingrid (who has since dropped out of the VAM mix). Rachel drops below average and is joined by Sabina the next year. Jennifer shows up as a two-year very low performer, then disappears from the VAM mix. But Julie, Rachel, Sabina and Elizabeth persist, and good for them!

So, now that I’ve spent all of my time trying to figure out if Glen is a legitimate double-threat superstar and what, if anything, I can make of the results for Julie, Rachel, Elizabeth and Sabina, it’s time to put this back into context, and take a look at my complete staffing roster for this school (based on 2009-10 NYSED Personnel Master File). Here it is by assignment code, where “frequency” refers to the total number of assigned positions in a particular area:

So, wait a second, my school has a total of 28 elementary classroom teachers. I do have a total of 11 ELA and 10 Math ratings in 2009-10, but apparently fewer than that (as indicated above) for teachers teaching the same subject and grade level in sequential years (the way in which I merged my data). Ratings start in 4th grade, so that knocks out a big chunk of even my core classroom teachers.

I’ve got a total of 108 certified positions in my school, and I’m spending my time trying to read these tea leaves which pertain to, oh… about 5% of my staff (who are actually  there, and rated, on multiple content areas, for more than a few years).

By the way, by the time I’m looking at these data, it’s 2011-12, two years after the most recent value-added estimates and not too many of my teachers are posting value-added estimates more than a few years in a row. How many more are gone now? Sabina, Rachel, Elizabeth, Julie? Are you still even there? Further, even if they are there, I probably should have been trying to make important decisions in the interim and not waiting for this stuff. I suspect the reports can/will be produced more likely on a 1 year lag, but even then I have to wait to see how year-to-year ratings stack up for specific teachers.

From a practical standpoint, as someone who would probably try to make sense of this type of data if I was in the role of school principal (‘cuz data is what I know, and real “principalling” is not!), I’m really struggling to see the usefulness of it.

See also my previous post on Inkblots and Opportunity Costs.

Note for New Jersey readers: It is important to understand that there are substantive differences between the value-added estimates produced in NYC and the Student Growth Percentiles being produced in NJ. The bottom line – while the value-added estimates above fail to provide me with any meaningful insights, they are conceptually far superior (for this purpose) to SGP reports.

These value-added estimates actually are intended to sort out the teacher effect on student growth. They try to correct for a number of factors, as I discuss in my previous post.

Student Growth Percentiles do not even attempt to isolate the teacher effect on student growth, and therefore it is entirely inappropriate to try to interpret SGP’s in this same way. SGPs could conceivably be used in a VAM, but by no means should ever stand alone.

They are NOT A TEACHER EFFECTIVENESS EVALUATION TOOL. THEY SHOULD NOT BE USED AS SUCH.  An extensive discussion of this point can be found here:

https://schoolfinance101.wordpress.com/2011/09/02/take-your-sgp-and-vamit-damn-it/

https://schoolfinance101.wordpress.com/2011/09/13/more-on-the-sgp-debate-a-reply/

You’ve Been VAM-IFIED! Thoughts (& Graphs) on the NYC Teacher Data

Readers of my blog know I’m both a data geek and a skeptic of the usefulness of Value-added data specifically as a human resource management tool for schools and districts. There’s been much talk this week about the release of the New York City teacher ratings to the media, and subsequent publication of those data by various news outlets. Most of the talk about the ratings has focused on the error rates in the ratings, and reporters from each news outlet have spent a great deal of time hiding behind their supposed ultra-responsibleness of being sure to inform the public that these ratings are not absolute, that they have significant error ranges, etc.  Matt Di Carlo over at Shanker Blog has already provided a very solid explanatory piece on the error ranges and how those ranges affect classification of teachers as either good or bad.

But, the imprecision – as represented by error ranges – of each teacher’s effectiveness estimate is but one small piece of this puzzle. And in my view, the various other issues involved go much further in undermining the usefulness of the value added measures which have been presented by the media as necessarily accurate albeit lacking in precision.

Remember, what we are talking about here are statistical estimates generated on tests of two different areas of student content knowledge – math and English language arts.  What is being estimated is the extent of change in score (for each student, from one year to the next) on these particular forms of these particular tests of this particular content, and only for this particular subset of teachers who work in these particular schools.

We know from other research (from Corcoran and Jennings, and from the first Gates MET report) that value added estimates might be quite different for teachers of the same subject area if a different test of that subject is used.

We know that summer learning may affect student annual value added, yet in this case, NYC is estimating teacher effectiveness on student outcomes from year to year. That is, the difference in a student’s score on one day in the spring of 2009 to another in the spring of 2010 is being attributed to a teacher who has contact, for a few hours a day, with that child from September to June (but not July and August).

The NYC value-added model does indeed include a number of factors which attempt to make fairer comparisons between teachers of similar grade levels, similar class sizes, etc. But we also know that those attempts work only so well.

Focusing on error rate alone presumes that we’ve got the model and the estimates right – that we are making valid assertions about the measures and their attribution to teaching effectiveness.

That is, that we really are estimating the teacher’s influence on a legitimate measure of student learning in the given content area.

Then error rates are thrown into the discussion (and onto the estimates) to provide the relevant statistical caveats about their precision.

That is, accepting that we are measuring the right thing and rightly attributing it to the teacher, there might be some noise – some error – in our estimates.

If the estimates lack validity, or are biased, the rate of noise, or error around the invalid or biased estimate is really a moot point.

In fact, as I’ve pointed out before on this blog, value added estimates that retain bias by failing to fully control for outside influences are actually likely to be more stable over time (to the extent that the outside influences themselves remain stable over time). And that’s not a good thing.

So, to the news reporters out there, be careful about hiding behind the disclaimer that you’ve responsibly provided the error rates to the public. There’s a lot more to it than that.

Playing with the Data

So, now for a little playing with the data, which can be found here:

http://www.ny1.com/content/top_stories/156599/now-available–nyc-teacher-performance-data-released-friday#doereports

I personally wanted to check out a few things, starting with assessing the year to year stability of the ratings. So, let’s start with some year to year correlations achieved by merging the teacher data reports across years for teachers who stayed in the same school teaching the same subject area to the same grade level. Note that teacher IDs are removed from the data. But teachers can be matched within school, subject and grade level, by name over time (by concatenating the dbn [school code], teacher name, grade level and subject area [changing subject area and grade level naming to match between older and newer files]). First, here’s how the year to year correlations play out for teachers teaching the same grade, subject area and in the same school each year.
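
For those who want to replicate the matching, a rough sketch of the merge-and-correlate step looks something like the following (hypothetical file and column names, not my actual code):

    import pandas as pd

    def add_key(df):
        # Build a match key from school code (dbn), teacher name, grade and subject,
        # after harmonizing grade/subject labels between the older and newer files
        df = df.copy()
        df["key"] = (df["dbn"].astype(str) + "_" + df["teacher_name"].str.lower()
                     + "_" + df["grade"].astype(str) + "_" + df["subject"])
        return df

    # Each file is assumed to have a "va_pctile" column with the teacher's percentile rank
    tdr_09 = add_key(pd.read_csv("tdr_2008_09.csv"))
    tdr_10 = add_key(pd.read_csv("tdr_2009_10.csv"))

    # Keep only teachers in the same school, grade and subject in both years
    both = tdr_09.merge(tdr_10, on="key", suffixes=("_09", "_10"))

    # Year-to-year stability of the value-added percentiles
    print(both[["va_pctile_09", "va_pctile_10"]].corr())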

Sifting through the Noise

As with other value-added studies, the correlations in teachers’ ratings from one year to the next seem to range from about .10 to about .50. Note that between 2009-10 and 2008-09 Math value-added estimates were relatively highly correlated, compared to previous years (with little clear evidence as to why, but for possible changes to assessments, etc.). Year to year correlations for ELA are pretty darn low, especially prior to the most recent two years.

Visually, here’s what the relationship between the most recent two years of ELA VAM ratings looks like:

I’ve done a little color coding here for fun. Dots coded in orange are those that stayed in the “average” category from one year to the next. Dots in bright red are those that stayed “high” or “above average” from one year to the next and dots in pale blue were “low” or “below average” from one year to the next. But there are also significant numbers of dots that were above average or high in one year, and below average or low in the next.  9 to 15% (of those who were “good” or were “bad” in the previous year) move all the way from good to bad or bad to good. 20 to 35% who were “bad” stayed “bad” & 20 to 35% who were “good” stayed “good.” And this is between the two years that show the highest correlation for ELA.
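
The transition shares above come from a simple cross-tabulation of each teacher’s category in one year against the next. A sketch, assuming a merged two-year file like the one above and using illustrative (not official NYC) percentile cut-points:

    import pandas as pd

    both = pd.read_csv("tdr_merged_09_10.csv")  # hypothetical merged two-year file

    def bucket(pctile):
        # Collapse the ratings into good / average / bad; cut-points here are
        # illustrative, not NYC's official category definitions
        if pctile >= 75:
            return "good"
        if pctile <= 25:
            return "bad"
        return "average"

    both["cat_09"] = both["va_pctile_09"].apply(bucket)
    both["cat_10"] = both["va_pctile_10"].apply(bucket)

    # Row-normalized: of teachers in each 2008-09 bucket, where did they land in 2009-10?
    print(pd.crosstab(both["cat_09"], both["cat_10"], normalize="index"))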

Here’s what the math estimates look like:

There’s actually a visually identifiable positive relationship here. Again, this is the relationship between the two most recent years, which by comparison to previous years, showed a higher correlation.

For math, only about 7% of teachers jump all the way from being bad to good or good to bad (of those who were “good” or “bad” the previous year), and about 30 to 50% who were good remain good, or who were bad, remain bad.

But, that still means that even in the more consistently estimated models, half or more of teachers move into or out of the good or bad categories from year to year, between the two years that show the highest correlation in recent years.

And this finding still ignores whether other factors may be at play in keeping teachers in certain categories. For example, whether teachers stay labeled as ‘good’ because they continue to work with better students or in better environments.

Searching for Potential Sources of Bias

My next fun little exercise in playing with the VA data involved merging the data by school dbn to my data set on NYC school characteristics. I limited my sample for now to teachers in schools serving all grade levels 4 to 8 and w/complete data in my NYC schools data, which include a combination of measures from the NCES Common Core and NY State School Report Cards. I did a whole lot of fishing around to determine whether there were any particular characteristics of schools that appeared associated either or both with individual teacher value added estimates or with the likelihood that a teacher ended up being rated “good” or “bad” by my aggregations used here. I will present my preliminary findings with respect to those likelihoods here.

Here are a few logistic regression models of the odds that a teacher was rated “good” or rated “bad” based on a) the multi-year value-added categorical rating for the teacher and b) based on school year 2009 characteristics of their school across grades 4 to 8.

After fishing through a plethora of measures on school characteristics (because I don’t have classroom characteristics for each teacher), I found that with relative consistency, using the Math ratings, teachers in schools with higher math proficiency rates tended to get better value added estimates for math and were more likely to be rated “good.” This result was consistent across multiple attempts, models, and subsamples (note that I’ve only got 1300 of the total math teachers rated here… but it’s still a pretty good and well distributed sample). Also, teachers in schools with larger average class size tended to have lower likelihood of being classified as “above average” or “high” performers. These findings make some sense, in that peer group effect may be influencing teacher ratings and class size effects (perhaps as spillover?) may not be fully captured in the model. The attendance rate factor is somewhat more perplexing.
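
A rough sketch of these logit models (variable names are placeholders for my merged fields, and the specification shown is only indicative):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical file of rated teachers merged with 2009 school characteristics
    teachers = pd.read_csv("nyc_va_school_chars.csv")

    # Odds that a teacher carries a "good" multi-year rating, by school characteristics
    logit_good = smf.logit(
        "rated_good ~ school_math_proficiency + avg_class_size + attendance_rate",
        data=teachers,
    ).fit()

    print(logit_good.summary())
    print(np.exp(logit_good.params))  # odds ratios are easier to read than raw coefficients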

Again, these models were run with the multi-year value added classification.

Next, I checked to see if there were differences in the likelihood of getting back to back good or back to back bad ratings by school characteristics. Here are the models:

As it turns out, the likelihood of achieving back to back good or back to back bad ratings is also influenced by school characteristics. Here, as class size increases by 1 student, the likelihood that a teacher in that school gets back to back bad ratings goes up by nearly 8%. The likelihood of getting back to back good ratings declines by 6%. The likelihood of getting back to back good ratings increases by nearly 8% in a school with 1% higher math proficiency rate in grades 4 to 8.
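
Just to show where those percentages come from: a logit coefficient converts to a percent change in the odds via the exponential. The coefficient below is illustrative, not my estimated value:

    import numpy as np

    b_class_size = 0.075               # hypothetical logit coefficient on average class size
    odds_ratio = np.exp(b_class_size)  # ~1.078
    print((odds_ratio - 1) * 100)      # ~7.8% higher odds of back-to-back bad ratings
                                       # per additional student in average class size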

These are admittedly preliminary checks on the data, but these findings in my view do warrant further investigation into school level correlates with the math value added estimates and classifications in particular. These findings are certainly suggestive of possible estimate bias.

Who Gets VAM-ED?

Finally, while there’s been much talk about these ratings being released for such a seemingly large number of teachers – 18,000 – it’s important to put those numbers in context in order to evaluate their relevance. First of all, it’s 18,000 ratings, not teachers. Several teachers are rated for both math and ELA, bringing the total number of individuals down significantly from 18,000.  In still generous terms, the 18,000 or so are more like “positions” within schools, but even then, the elementary classroom teacher covers both areas even within the same assignment or position.

Based on the NY State Personnel Master File for 2009-10, there were about 150,000 certified staffing assignments in New York City in 2009-10 (linkable to individual schools, including those in the VA reports), where individual teachers may cover more than one assignment. In that light, 18,000 is not that big a share.

But let’s look at it at the school level using two sample schools. For these comparisons I picked two schools which had among the largest numbers of VA math estimates (with many of the same teachers in those schools having VA ELA estimates).  The actual listing of teacher assignments is provided for two schools below, along with the number of teachers for whom there were Math VA estimates.  Again, these are schools with among the highest reported number (and share) of teachers who were assigned math effectiveness ratings.

In each case, we are Math VAM-ing around 30% of total teacher assignments [not teachers, but assignments] (with substantial overlap for ELA). Clearly, several of the teacher assignments in the mix for each school are completely un-VAM-able. States such as Tennessee have adopted the absurd strategy that these other staff should be evaluated on the basis of the scores for those who can be VAM-ed.

A couple of issues are important to consider here. First, these listings more than anything convey the complexity of what goes on in schools – the type of people who need to come together and work together collectively on behalf of the interests of kids. VAM-ing some subset of those teachers and putting their faces in the NY Post is unhelpful in many regards. Certainly there exist significant incentives for teachers to migrate to un-vammed assignments to the extent possible. And please don’t tell me that the answer to this dilemma is to VAM the Orchestra conductor or Art teacher. That’s just freakin’ stupid!

As Preston Green, Joseph Oluwole and I discuss in our forthcoming article in the BYU Education and Law Journal, coupling the complexities of staffing real schools and evaluating the diverse array of professionals that exist in those schools with VAM-based rating schemes necessarily means adopting differentiated contractual agreements, leading to numerous possible perverse incentives and illogical management decisions (as we’ve already seen in Tennessee as well as in the structure of the DC IMPACT contract).

Student Enrollments & State School Finance Policies

Most readers of the NJDOE report on reforming the state’s school finance formula likely glided right past the seemingly innocuous recommendation to shift the enrollment count method for funding from a fall enrollment count to an average daily attendance figure. After all, on its face, the argument provided seems to make sense. Let’s fund on this basis so that we can incentivize increased attendance in our most impoverished and low performing districts. (Another argument I’ve heard in other states is “why would we fund kids who aren’t there?”). The data were even presented to validate that attendance rates are lower in these districts (Figure 3.1).

I, however, could not let this pass, because Average Daily Attendance as a basis for funding is actually a well understood trick of the trade for reducing aid to districts and schools with higher poverty and minority concentrations.  I have both blogged about this topic in the past, and written published research directly and indirectly related to the topic.[1]

The intent of this blog post is to provide a (very limited, oversimplified) primer on the common methods of counting general student populations for purposes of determining state aid to schools (charter and district) and to provide some commentary on the pros and cons of each.

This blog post doesn’t touch upon the layers of additional factors associated with counting all of the various special student categories that may drive additional aid to local public school districts and charter schools.  I have, however, written numerous articles and reports on that topic as well. I’m writing about the underlying, basic count methods in this post because they are so often overlooked. But, they tend to have multiplicative effects throughout state school finance formulas.

So, here’s the primer (in somewhat oversimplified terms since there are multiple permutations on each):

Definitions

Fall Enrollment Count

A fall enrollment or fall attendance count is often based on the count of students either enrolled or specifically in attendance on a single date early in the fall of the school year (Oct 1, Oct 15, etc.). That figure may be based on students who have enrolled in a district or on students who actually attended on the given day. These single day counts in the fall are sometimes reconciled with a spring/January re-calculation leading to either upward or downward adjustments in remaining aid payments.

Average Daily Attendance

Average daily attendance counts are based on the numbers of children actually in attendance in a school or district each day, then, typically averaged on a bimonthly or quarterly basis in order to determine mid-year adjustments to state aid.

Average Daily Membership

Average Daily Membership or Average Daily Enrollment measures the numbers of children enrolled to attend a specific district throughout the year, and may also be periodically reconciled, as students enter and leave the district or school mid-year.

Comments on Each

Fall Enrollment Count

Fall enrollment counts allow for rational annual budget planning.  Note that there is a difference between enrollment and attendance.  Conceptually, attendance can’t exceed enrollment, if enrollment represents all those eligible to attend and enrolled to attend a particular school or district.   To some degree, it makes sense to base funding on the students enrolled rather than those that can be tracked down to attend on a single day in the fall.

Single point in time enrollment counts do not allow for mid-term adjustments to aid when students come or go during the school year. One might argue that this means that districts with significant mid-year attrition will be overpaid throughout the year. But these districts have had to plan their budgets and staffing based on the numbers they expected at the beginning of the year (though usually state aid estimates for budgeting purposes are based on prior year fall enrollments), and cannot easily make mid-year adjustments to accommodate losses in aid resulting from losses in students.

Average Daily Attendance (ADA)

One major problem with ADA is that districts must plan their budgets and staffing on an annual basis, and mid-year adjustments based on attendance counts result in reductions in aid that are difficult to absorb mid-stream in the school year. The bottom line is that districts and charter schools are obligated to have services available for all who might attend, not just all who do on a given day.

In addition, districts with higher poverty concentrations and high minority concentrations tend to have lower attendance rates for a variety of reasons beyond their control. Students from disrupted, low income households are more likely, for example, to have illnesses that go untreated, be malnourished or be exposed to other factors (second hand smoke & other environmental hazards) that compromise their health. They have less access to transportation, and often come from single parent households, limiting parental supports to get them out the door to school. One cannot fix these factors by reducing aid to school districts facing these dilemmas.

It is well understood that financing schools on the basis of average daily attendance systematically reduces aid to higher poverty districts.  The NJDOE report acknowledges that funding on this basis would lead to a reduction in aid of over 3% for districts in DFG A versus average districts (see figure 3.1).  Further, there is no substantive evidence that funding formulas based on ADA have ever improved or better balanced student attendance rates by district poverty and race over time.

Using ADA as the basis for determining funding can have other unintended consequences, such as increased numbers of school closure days in order to reduce the risk of low attendance.[2] School districts might, for example, choose to close for increased numbers of days during flu season, as attendance drops off. Closures typically do not reduce average daily attendance. In fact, closures are used by schools/districts operating under this model as a way to avoid low attendance days.  And some districts may be more significantly affected than others in this regard. Weather related decisions may also be affected.

Average Daily Membership (ADM)

ADM requires the State, in collaboration with school districts, to accurately manage enrollment information. It is unclear if NJDOE has the present ability to implement ADM in New Jersey.

As with average daily attendance, districts plan their budgets and staffing on an annual basis, and mid-year adjustments to enrollment, leading to reductions in aid, may not easily be absorbed mid-stream.

Within year moves tend to more often affect higher poverty, urban districts,[3] potentially causing greater fluctuations in the budgets of these districts and complicating their financial planning.

A Few Examples from States

States in the Northeast do not tend to use Average Daily Attendance as their method for determining school aid, though New York State had used attendance as a factor in a prior school funding formula.[4] Presently among Northeastern states, Connecticut uses Resident Pupils within its Education Cost Sharing Formula,[5] New York uses ADM toward the estimation of Total Aidable Foundation Pupil Units*,[6] Pennsylvania uses ADM,[7] Massachusetts uses a Fall Enrollment figure,[8] and Rhode Island uses ADM.[9] Other states around the country, including Kansas[10] and Colorado[11] use a fall enrollment count date. Many others around the country use variations on either ADM or FTE, including Florida and Tennessee. A few states — e.g., Missouri,[12] Texas and Illinois — still use ADA. But published literature and legal analyses have, in fact, criticized the racially disparate effects of Missouri’s school funding formula (prior to recent reforms).[13]

Application to New Jersey Data

So, just how disparate are attendance rates across New Jersey school districts, by race and low income status, as well as by district factor grouping? Here are a few quick graphs based on the 2010-11 school level data on enrollments (enr file from NJDOE) and attendance rates (school report card d-base).

In short, what these graphs show is that if aid were allocated by average daily attendance as opposed to by enrollment or membership, districts with higher percent black population or higher percent low income, would receive systematic reductions to their state aid. These reductions would be non-trivial.  High school attendance in a school that is 100% black is, on average, nearly 7% lower than in a school that is 0% black. In elementary schools, the differential is between 2% and 3%.   These differentials would translate directly to percent reductions in aid.
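
A stylized example (made-up numbers, chosen only to mirror the roughly 3% differential noted above) of how that translation works under an ADA-based formula:

    enrollment = 1000           # students enrolled in each of two hypothetical districts
    aid_per_pupil = 15000       # hypothetical formula aid per counted pupil

    attendance_low_poverty = 0.96
    attendance_high_poverty = 0.93

    aid_enrollment_basis = enrollment * aid_per_pupil  # same for both districts

    aid_ada_low = enrollment * attendance_low_poverty * aid_per_pupil
    aid_ada_high = enrollment * attendance_high_poverty * aid_per_pupil

    # The higher-poverty district loses roughly the full attendance differential in aid
    print(1 - aid_ada_high / aid_ada_low)       # ~0.031, about a 3% reduction
    print(aid_enrollment_basis - aid_ada_high)  # dollars lost relative to an enrollment count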

Enrollment Data: http://www.nj.gov/education/data/enr/

Attendance Data: http://education.state.nj.us/rc/rc10/index.html

*Note: In some parts of the NY Aid formulas, the local wealth measure for taxable assessed value per pupil uses a variant of ADA in the denominator.  This use is generally much less significant to the overall calculation of aid than using ADA directly in the calculation of the foundation allotment.


[1] Green, P.C., Baker, B.D. (2006) Urban Legends, Desegregation and School Finance: Did Kansas City Really Prove that Money Doesn’t Matter? Michigan Journal of Race and Law. 12 (1)

Baker, B.D., Green, P.C. (2005) Tricks of the Trade: Legislative Actions in School Finance that Disadvantage Minorities in the Post-Brown Era American Journal of Education 111 (May) 372-413

[3] Killeen, K., Baker, B.D. Addressing the Moving Target: Should measures of student mobility be included in education cost studies? (Available on request)

[5] http://www.sde.ct.gov/sde/lib/sde/PDF/dgm/report1/merecsgd.pdf  “Resident Students are those regular education and special education pupils enrolled at the expense of the town on October 1 of each school year.”

[6] https://stateaid.nysed.gov/budget/combaidsa_0910.htm  For calculating Foundation Aid, which has been frozen since this point in time.

[8] http://finance1.doe.mass.edu/chapter70/enrollment_desc.pdf. “In order to be included, a student must be officially enrolled on October 1st. Those who leave in September or arrive after October 1st are not counted. A student who happens to be absent on October 1st is included nonetheless; this is a measure of enrollment, not attendance.”

[13] Green, P.C., Baker, B.D. (2006) Urban Legends, Desegregation and School Finance: Did Kansas City Really Prove that Money Doesn’t Matter? Michigan Journal of Race and Law. 12 (1). Baker and Green (2006) explain: “Missouri is among a handful of states that continues to provide aid to local public school districts on the basis of their average daily attendance (ADA) rather than enrolled pupil count or membership. From 2000 to 2004, poverty rates and black student population share alone explain 59% of variations in attendance rates across Missouri school districts enrolling over 2,000 students. Both black population share and poverty rate are strongly associated with lower attendance rates, leading to systematically lower funding per eligible or enrolled pupil in districts with higher shares of either population.”

How NOT to fix the New Jersey Achievement Gap

Late yesterday, the New Jersey Department of Education released its long-awaited report on the state school finance formula. For a little context, the formula was adopted in 2008 and upheld by the court as meeting the state constitutional standard for providing a thorough and efficient system of public schooling. But, court acceptance of the plan came with a requirement of a review of the formula after three years of implementation. After a change in administration, with additional legal battles over cuts in aid in the interim, we now have that report. The idea was that the report would suggest any adjustments that may need to be made to the formula to make the distributions of aid across districts more appropriate/more adequate (more constitutional?). I laid out my series of proposed minor adjustments in a previous post.

Reduced to its simplest form, the current report argues that New Jersey’s biggest problem in public education is its achievement gap – the gap between poor and minority students on the one hand, and non-poor, non-minority students on the other. And the obvious proposed fix? To reduce funding to high poverty, predominantly minority school districts and increase funding to less poor districts with fewer minorities.

Why? Because money and class size simply don’t matter. Instead, teacher quality and strategies like those used in the Harlem Children’s Zone do!

Here’s my quick, day-after, critique:

The Obvious Problem? New Jersey’s Huge & Unchanging Achievement Gap

The front end of the report provides lots of nifty graphs based on cohort proficiency rates on tests which change substantially in some years. The graphs are neatly laid out to validate the argument that New Jersey’s achievement gap is large and hasn’t changed much. First, on the point of the largeness of the gap in national context: I’ve explained here how the NJ poor-non-poor gap is actually relatively average nationally. That’s not to say that it’s acceptable; we ought to work on this by whatever reasonable means we can.

Thankfully (so I don’t have to revisit all of the problems here), the remainder of the achievement gap analysis presented by NJDOE is thoroughly critiqued in a recent post by Matt Di Carlo at Shanker Blog. Di Carlo summarizes some of the NJ achievement gap and trend data to point out:

The results for eighth grade math and fourth grade reading are more noteworthy – on both tests, eligible students in NJ scored 12 points higher in 2011 than in 2005, while the 2011 cohorts of non-eligible students were higher by roughly similar margins.

In other words, achievement gaps in NJ didn’t narrow during these years because both the eligible and non-eligible cohorts scored higher in 2011 versus 2005. Viewed in isolation, the persistence of the resulting gaps might seem like a policy failure. But, while nobody can be satisfied with these differences and addressing them must be a focus going forward, the stability of the gaps actually masks notable success among both groups of students (at least to the degree that these changes reflect “real” progress rather than compositional changes).

http://shankerblog.org/?p=5102

Revelation? Gaps are a function of the height of the highs as much as the depth of the lows. If both get better, gaps don’t close as much. Gaps are still a problem, and must be addressed even if the highs get higher, because opportunity for access to college and on the labor market is relative. But, the framing of the NJ achievement gap by NJDOE is unhelpful in this regard, and the proposed solutions harmful. How does it make sense then, to provide greater increases in state aid to those students in districts at the highs and less to the lows?

Supporting Claims for Solutions?

Of course, to support the eventual pre-determined (utterly absurd) conclusion that the way to close this achievement gap is to cut aid to the poor and give it to the less poor requires that the report validate that money really has nothing to do with it. That, arguably, all of that money and increased staffing actually made things worse. Further, that cutting money from poor districts is what will make them better. I guess it also then stands to reason that giving larger aid increases to less poor districts might also make them worse, and voilà – the achievement gap shrinks!

  • Claim 1: Money Has Nothing to do with It

The claims that money doesn’t matter are built on some graphs which could easily make my list of dumbest graphs (or at least most pointless, deceptive, meaningless ones). Here’s one which is intended to convince the reader that all of that money sent to Abbott districts was for naught:

The report uses the graph to conclude:

While the above analysis is not sufficient to say whether new spending has had a positive impact on student achievement, it makes clear that financial resources are not the only – and perhaps not even the most important – driver of achievement.

If the graph isn’t sufficient to make this point, then why use the graph to try to make this point? Clearly, looking only at two variables – percent change in revenue and percent change in proficiency rates – is not even sufficient to make the softened claim “perhaps not even the most important” factor in improving student achievement.  These assertions can’t be supported in any way by this graph.

But even more suspect is the assertion, embedded in the policy recommendations, that cutting aid from high poverty districts will therefore cause no harm.

Better research on whether and to what extent school finance reforms improve student outcomes &/or equity of outcomes shows that in fact, school finance reforms can and do improve both the level and distribution of student outcomes: http://www.tcrecord.org/content.asp?contentid=16106

Higher quality research, in contrast, shows that states that implemented significant reforms to the level and/or distribution of funding tend to have significant gains in student outcomes.

Further, research on the broader question (based on real analysis) of whether and how class size and money matter indicates that, in simple terms, money does matter, and that things that cost money, like class size reduction and improving teacher quality (which does cost money) matter:  http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

Perhaps most importantly, even the research that has cast doubt on the strength of the positive influence of money on student outcomes has never validated that cuts to funding are not harmful and may be helpful. This is an absurd and unfounded claim.

Richard Murnane of Harvard said it well enough back in the early 1990s:

“In my view, it is simply indefensible to use the results of quantitative studies of the relationship between school resources and student achievement as a basis for concluding that additional funds cannot help public school districts. Equally disturbing is the claim that the removal of funds… typically does no harm.” (p. 457)

Murnane, R. (1991) Interpreting the Evidence on Does Money Matter? Harvard Journal of Legislation. 28 p. 457-464

Though not directly stated in the NJDOE report, it is implicit in the recommendations.

  • Claim 2: Teacher Quality & Harlem Children’s Zone-Style Strategies Can Close the Gap

Deeply embedded in the NJDOE report, making the transition from claims of dire achievement gaps toward how to fix them, is a discussion of how the obvious solutions based on current research must have to do with improving teacher quality and doing stuff like the Harlem Children’s Zone does. The NJDOE report includes two particularly bold statements that these two strategies alone – but certainly not money – can close the black-white achievement gap:

Having a highly effective teacher for three to five years can erase the deficits that the typical disadvantaged student brings to school.xxiii

Evidence from the Harlem Children’s Zone provides a similar demonstration of the power of schools to close the black-white achievement gap existing in New York.xxiv

Needless to say, these interpretations of the existing research are a massive, unwarranted stretch. Matt Di Carlo addresses the question of just how many teachers it would take to close the achievement gap.

Even then, the implicit assertion of the report in general, that money has nothing to do with teacher quality or the distribution of teacher quality, is ridiculous. As I explain here:

A substantial body of literature has accumulated to validate the conclusion that both teachers’ overall wages and relative wages affect the quality of those who choose to enter the teaching profession, and whether they stay once they get in. For example, Murnane and Olson (1989) found that salaries affect the decision to enter teaching and the duration of the teaching career, while Figlio (1997, 2002) and Ferguson (1991) concluded that higher salaries are associated with more qualified teachers.

http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

And further, on the flip side, cuts to funding and severe constraints on spending growth can reduce teacher quality:

Research on the flip side of this issue – evaluating spending constraints or reductions – reveals the potential harm to teaching quality that flows from leveling down or reducing spending. For example, David Figlio and Kim Rueben (2001) note that, “Using data from the National Center for Education Statistics we find that tax limits systematically reduce the average quality of education majors, as well as new public school teachers in states that have passed these limits.”

And, if we are interested in achievement gaps, and better distributing the quality of teachers across richer and poorer districts and children:

Salaries also play a potentially important role in improving the equity of student outcomes. While several studies show that higher salaries relative to labor market norms can draw higher quality candidates into teaching, the evidence also indicates that relative teacher salaries across schools and districts may influence the distribution of teaching quality. For example, Ondrich, Pas and Yinger (2008) “find that teachers in districts with higher salaries relative to non-teaching salaries in the same county are less likely to leave teaching and that a teacher is less likely to change districts when he or she teaches in a district near the top of the teacher salary distribution in that county.”

But even more strikingly, these interpretations ignore entirely that what the Harlem Children’s Zone does, above and beyond anything else, is spend a ton of money (raising as much as $60,000 per pupil in private giving in some years; for additional information, see this post), and spend much of that money on providing smaller class sizes than surrounding NYC district schools. So, in effect, what the Harlem Children’s Zone shows us (in its best light) is that we can make modest progress toward closing achievement gaps by leveraging substantial additional financial resources to provide comprehensive wrap-around community services coupled with small class sizes.

The Proposal: Cut Aid to the Poor and Give More to the Non-Poor (& Less Poor)

After the rather predictable preamble about New Jersey’s achievement gap – coupled with the classic claims that money clearly isn’t the answer, and that things which actually cost money (but which we’ll pretend don’t) are the answer – the obvious recommendations for changes to the school finance formula are to reduce aid to the poor and give it to the less poor.

Here are the distributions of the percent change in state aid for 2012-13 across K-12 districts, and of the per pupil aid change (preliminary estimates in need of updated enrollment figures), with districts arranged from lower to higher concentrations of low income children:

K-12 Unified Districts Only

K-12 Unified Districts Only

The report argues specifically that the adjustments in the aid formula for low income children should be reduced. That they should be reduced because they were increased without basis, over original recommendations provided to the state board back in 2003 (but hidden until 2006). In short, that those low income kids really don’t need that much and will be better off without it.

I critique those original recommendations in this report. Essentially, the argument is that there is simply no basis for providing as much as an additional 57% per low income child in high poverty concentration districts, and therefore we should reduce it. The icing on the cake in this argument is a table in which the report points out that Texas, Vermont and Maine provide less than this. How in the heck they chose Texas, Vermont and Maine is beyond me. These states are at least a little different from NJ… and… from each other.

Beyond that, it should go without saying that the decisions of policymakers in three completely different states that aren’t New Jersey really have little or nothing to do with the cost of providing equal educational opportunity to low income kids in New Jersey. Are we going to base all of our policies on Vermont… and Texas simultaneously? That would be a real trick. Consider the possibilities.

As my report linked above points out, the weights in the original analysis were too low, and were thus adjusted upwards – though not necessarily far enough. On what basis do I say that? Well, the actual research on the costs of providing equal educational opportunities for low income children points to weights nearer to double, not 40% or 50% higher than average. Here’s the most directly relevant article, from the Economics of Education Review, and here’s a link to a National Research Council report on the subject.
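For readers unfamiliar with how such weights work mechanically, here is a quick, purely illustrative sketch. The base amount, enrollment and weights below are made-up numbers, not the actual SFRA parameters; the point is only to show how the size of an at-risk weight translates into formula aid for a high poverty district.

```python
# Illustrative sketch of how an "at-risk" pupil weight drives formula aid.
# Base amount, enrollment and weights are made up for illustration; they are
# NOT the actual SFRA parameters.

def weighted_enrollment(enrollment, pct_low_income, at_risk_weight):
    """Each low-income pupil counts as (1 + weight) pupils in the formula."""
    low_income_pupils = enrollment * pct_low_income
    return enrollment + at_risk_weight * low_income_pupils

base_per_pupil = 10_000                                  # hypothetical foundation amount
district = dict(enrollment=5_000, pct_low_income=0.80)   # hypothetical high-poverty district

for w in (0.40, 0.57, 1.00):   # lower weight, current-style weight, roughly double
    need = base_per_pupil * weighted_enrollment(
        district["enrollment"], district["pct_low_income"], w)
    print(f"weight={w:.2f}: formula need = ${need:,.0f} "
          f"(${need / district['enrollment']:,.0f} per pupil)")
```

Whether the weight is 40%, 57% or 100% is a big-dollar question for a high poverty district, which is exactly why the empirical basis for the weight matters.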

In a further effort to reduce aid to poorer districts (in a way that will have multiplicative effects throughout the formula), NJDOE proposes to base the allocation of aid on Average Daily Attendance. This is actually a classic, well understood Trick of the Trade for shifting aid away from poorer districts, which for a variety of reasons outside their control have lower attendance rates. Way back when I started this blog, one of the topics I wrote about was these seemingly innocuous tricks (a subject of my research). Other states do continue to use these policies, but precisely because their effects are so well understood, recommending such a change is shameless.
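Here is a stripped-down sketch of the mechanics, using hypothetical districts and attendance rates, showing why funding on Average Daily Attendance rather than enrollment systematically shifts aid away from districts with lower attendance:

```python
# Minimal sketch of the ADA-versus-enrollment funding mechanics.
# Districts, aid levels and attendance rates are hypothetical.

aid_per_pupil = 12_000

districts = [
    # (name, enrollment, attendance_rate)
    ("Lower-poverty district",  5_000, 0.96),
    ("Higher-poverty district", 5_000, 0.90),
]

for name, enrollment, rate in districts:
    aid_enrollment = aid_per_pupil * enrollment
    aid_ada = aid_per_pupil * enrollment * rate   # ADA = enrollment x attendance rate
    print(f"{name}: enrollment-based ${aid_enrollment:,.0f} "
          f"vs ADA-based ${aid_ada:,.0f} "
          f"(loss: ${aid_enrollment - aid_ada:,.0f})")
```

Same enrollment, same formula – but the district with the lower attendance rate loses substantially more aid under ADA-based allocation.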

But even setting aside the empirical evidence on “costs,” how can it possibly make sense that achievement gaps between richer and poorer districts will be moderated by taking money from poorer districts and redistributing it back to less poor ones?

That’s the report in its essence.

We’ve got big achievement gaps.

Money doesn’t matter – in fact – it must be making things worse not better.

Therefore, to close the gaps, we need to give less of that harmful money to the poor, and more to the non-poor.

Go figure?

Reformy Platitudes & Fact-Challenged Placards won’t Get Connecticut Schools what they Really Need!

For a short while yesterday – longer than I would have liked – I followed the circus of testimony and tweets about proposed education reform legislation in Connecticut. The reform legislation – SB 24 – includes the usual reformy elements of teacher tenure reform, ending seniority preferences, expanding and promoting charter schooling, etc. etc. etc. And the reformy circus had twitpics of eager undergrads (SFER) & charter school students (as young as kindergarten?) shipped in and carrying signs saying CHARTER=PUBLIC (despite a body of case law to the contrary, and repeated arguments, some lost in state courts [oh], by charter operators that they need not comply with open records/meetings laws or disclose employee contracts), and tweets of reformy platitudes and links to stuff they called research supporting the reformy platform (much of it tweeted as “fact checking” by the ever-so-credible ConnCAN).

Ignored in all of this theatre-of-the-absurd was any actual substantive, knowledgeable conversation about the state of public education in Connecticut, the nature of the CT achievement gap and the more likely causes of it, and other problems/failures of Connecticut education policy.

First, that achievement gap:

Yes, Connecticut has a large achievement gap… among the largest. But, I encourage you to read my previous post in which I explain that poverty achievement gaps in states tend to be mostly a function of income disparity in states. The bigger the income difference between rich and poor, the bigger the achievement gaps between them. But, even then, the CT achievement gap is a problem. CT’s income gaps between poor and  non-poor are most similar to those of MA and NJ, but both MA and NJ do better than CT on achievement gap measures. Here’s a graph relating income gap and achievement gap:

Connecticut has a higher gap than its income disparity would otherwise predict, while MA, NJ and RI have lower ones.
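For those wondering what “higher than otherwise expected” means in practice: the comparison comes from fitting a simple line of achievement gaps against income gaps across states and then looking at each state’s residual. Here is a sketch of those mechanics, with placeholder numbers rather than the actual gap measures behind the graph:

```python
# Sketch of the "otherwise expected" comparison: regress state achievement
# gaps on state income gaps, then inspect each state's residual.
# All numbers below are placeholders, NOT the actual figures from the post.
import numpy as np

states      = np.array(["CT", "MA", "NJ", "RI", "OH", "TX"])
income_gap  = np.array([2.1, 2.0, 2.0, 1.8, 1.6, 1.7])      # hypothetical rich/poor income ratios
achieve_gap = np.array([0.95, 0.80, 0.78, 0.70, 0.62, 0.68])  # hypothetical score gaps (SD units)

slope, intercept = np.polyfit(income_gap, achieve_gap, 1)    # simple OLS line
expected = intercept + slope * income_gap
residual = achieve_gap - expected     # positive = larger gap than income disparity predicts

for s, r in zip(states, residual):
    print(f"{s}: {r:+.3f} relative to what its income gap predicts")
```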

But, is this because of teacher tenure? Is it because teachers aren’t regularly fired because of bad student test scores? Is it because there aren’t enough charter or magnet schools in CT? That’s highly unlikely for several reasons.

First, teachers have tenure status in both higher and lower performing, higher and lower income districts in CT. As I show below, teacher salaries are lower and class sizes larger in disadvantaged districts. SB24 does NOTHING to fix that.

As for highly recognized charter and magnet schools in CT, these schools are actually serving far fewer of the lower income kids within the lower income neighborhoods. So, while they might be doing okay, on average, for the kids they are serving, it is just as likely that they are contributing to the achievement gap as helping to close it. That’s not to say they aren’t helping the students they are serving, but rather that the segregated nature of their services capitalizes on the peer effect of concentrating more advantaged children. Either way, these schools are unlikely to serve as a broad based solution for CT education quality in general or for resolving achievement gaps.

During this same time period, teachers in NJ and MA also had similar tenure protections and weren’t being tenured or fired based on student test scores. Still, somehow, those states had smaller gaps. Further, while both of those states do have charter schools, New Jersey – which has a much smaller achievement gap than CT – has thus far maintained a relatively small charter sector. What Massachusetts and New Jersey have done is to more thoroughly and systematically address school funding disparities.

The Real Disparities:

In a previous series of posts, I discussed what I called Inexcusable Inequalities. I actually used CT as the main example, not because CT is among the worst states on funding inequality, but because I happened to have good data on CT. CT is not among the worst. That special space is reserved for NY, IL, PA and a few others. But CT has its problems. Let’s do a quick walk through of my previous analysis.

I started my previous post by comparing per pupil spending adjusted for needs and costs across all CT school districts with the actual outcomes of those districts, in order to categorize CT districts into more and less advantaged groups. The differences, starting with the figure below, were pretty darn striking. Districts like New Canaan, Westport and Weston have rather high need and cost adjusted spending, certainly by comparison with Bridgeport, New London or New Britain.

For illustrative purposes, I then picked a few of the most disadvantaged CT districts and compared them to the most advantaged on a handful of measures – shown below. In this table, I report their nominal spending per pupil – not adjusted for the various needs and additional costs. Even without those adjustments, districts like Bridgeport and New Britain start well behind their more advantaged peers. And among other differences, they pay their teachers less a) on average and b) at any given level of experience or education. It’s pretty darn hard to recruit and retain quality teachers into these settings given the combination of working conditions and lower pay.

AND MAKING TENURE CONTINGENT ON STUDENT TEST SCORES, OR FIRING TEACHERS BASED ON STUDENT TEST SCORES WON’T FIX THAT! IT WILL FAR MORE LIKELY MAKE IT MUCH, MUCH WORSE!

Salary disparity patterns hold when comparing a) all districts in the upper right of the first figure with b) all districts in the lower left, and c) the districts furthest into the lower left (severe disparity):

On top of that, class sizes are also larger in the higher need districts, despite the need for smaller class sizes to aid in closing the achievement gaps for these children (more here).

Further, as I showed in my previous post, the funding disparities have significant consequences for the depth and breadth of curricular offerings available to high school students in these districts:

For this analysis, I used individual teacher level data on individual course assignments to determine the distribution of teacher assignments per child, thus characterizing each district’s and group of districts’ offerings (for related research, see: https://schoolfinance101.com/wp-content/uploads/2010/01/b-baker-mo_il-resourcealloc-aera2011.pdf)
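For the curious, the tabulation itself is straightforward. Here is a rough sketch of the kind of calculation involved, assuming a teacher-level file with one row per course assignment; the file name and columns below are hypothetical, not the actual CT data:

```python
# Rough sketch of the assignments-per-child tabulation described above,
# assuming a teacher-level file with one row per course assignment.
# File name and column names are hypothetical.
import pandas as pd

assignments = pd.read_csv("ct_teacher_assignments.csv")
# expected columns: district, subject_area, fte_share, district_enrollment

per_child = (
    assignments
    .groupby(["district", "subject_area"])
    .agg(fte=("fte_share", "sum"),
         enrollment=("district_enrollment", "first"))
    .assign(fte_per_1000_pupils=lambda d: 1000 * d["fte"] / d["enrollment"])
    .reset_index()
)

# Compare, say, art or math staffing per 1,000 pupils across district groups
print(per_child.sort_values(["subject_area", "fte_per_1000_pupils"]).head())
```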

Disadvantaged districts have far fewer total positions per child, and if we click and blow up the graph, we can see some striking discrepancies! Those high need districts have far more special education and bilingual education teachers (squeezing out other options, from their smaller pot!). Those high need districts have only about half the access to teachers in physical education assignments or art, much less access to Band (little or none to Orchestra), and significantly less access to math teachers!

IN REALLY SIMPLE TERMS, UNDER CT POLICIES, HIGH NEED DISTRICTS SUCH AS BRIDGEPORT AND NEW BRITAIN HAVE FAR FEWER RESOURCES AND FAR GREATER NEEDS. THEIR TEACHERS HAVE LOWER SALARIES AND, ON AVERAGE, LARGER CLASSES.

Messing with teacher evaluation, especially in ways as likely to do harm as to do good, is an unfortunate distraction at best. Doing so on the basis that those are the policy changes needed to close Connecticut’s achievement gap reflects an astounding degree of utter obliviousness!

What about those amazing CT charter and magnet schools? Aren’t they the ultimate scalable solution?

I’ve written much more detail here, about the issue of whether renowned CT charter schools actually “do more, with less while serving the same students.” Here are a few quick graphs. First, Amistad Academy of New Haven in context, by % free lunch:

Next, Capital Prep in Hartford in context. Now, I typically wouldn’t (shouldn’t) have to point out that a small selective magnet program drawing students across district lines is simply NOT REPRESENTATIVE and not likely a scalable solution for all kids. It’s a potentially good option for those with access, and much of the benefit of the option likely rests in the selective peer group effect (as noted above). I feel compelled, however, to point out how Capital Prep is (obviously) not a typical school only because the head of the school seems to be trying to argue that it is a model, scalable reform (Really? Really? I mean… REALLY?):

But what about Governor Malloy’s funding plan? That’ll fix it! Won’t it?

Amidst all of the reformy platitudes, misguided and fact-challenged placards and the like, there were occasional references to Governor Malloy’s changes to the state school finance formula – seemingly implying that the Governor has taken major steps toward making the (supposedly already overfunded) system fairer. There was certainly no outrage expressed at the types of disparities I note above – just all the warm fuzzy feeling anyone could possibly conjure that a finance package tied to this vast batch of reformyness on steroids would be sufficient to get the job done.

After all, new aid would be progressively distributed. Those poor districts would get, on average, about… oh… a whopping new $250 per pupil while richer districts would get only about $50 per pupil. And with this astounding outlay of fiscal effort, the most important thing is to make sure it doesn’t just go straight into the pockets of those union-lackey-lazy-self-interested teachers, of course – or at least certainly not the “ineffective” ones.

Here are the effects of the Malloy funding increases, on a per pupil basis, if added on to Net Current Expenditures per Pupil (pulling out magnet school aid which creates a distorted representation for New Haven and Hartford):

What we have in this picture is each district as a dot (circle or triangle). Districts are sorted from low to high percent free/reduced lunch along the horizontal axis. Net Current Expenditures are on the vertical axis. Blue Circles represent current (okay, last year) levels of current expenditures per pupil. RED TRIANGLES REPRESENT THE ADDITION OF MALLOY AID. Wow… that’s one heck of a difference. That should certainly fix the disparities I laid out above! NOT!
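For anyone who wants to reproduce this kind of figure with their own state’s data, here is a sketch of how it is assembled. The file and column names are hypothetical placeholders:

```python
# Sketch of how a figure like this is assembled: districts sorted by
# % free/reduced lunch, current spending as circles, spending plus proposed
# aid as triangles. Data file and columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("ct_districts.csv").sort_values("pct_frl")
# expected columns: district, pct_frl, nce_per_pupil, new_aid_per_pupil

plt.scatter(df["pct_frl"], df["nce_per_pupil"],
            marker="o", label="Current net current expenditures per pupil")
plt.scatter(df["pct_frl"], df["nce_per_pupil"] + df["new_aid_per_pupil"],
            marker="^", label="With proposed aid added")
plt.xlabel("% free/reduced price lunch")
plt.ylabel("Net current expenditures per pupil")
plt.legend()
plt.show()
```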

Here it is with district names added, so you can see where some of our more disadvantaged districts start and end up:

Not that helpful for Bridgeport or New Britain, is it?

To summarize:

The fact is that EQUITABLE AND ADEQUATE FUNDING IS THE NECESSARY UNDERLYING CONDITION FOR IMPROVING EDUCATION QUALITY IN CONNECTICUT AND REDUCING ACHIEVEMENT GAPS!!!!!! (related research: http://www.tcrecord.org/library/content.asp?contentid=16106)

Equitable and adequate funding is a necessary underlying condition for running any quality school, be it a traditional public school, charter school or private school. Money matters and it matters regardless of the type of school we’re talking about.

Equitable and adequate funding is required for recruiting and retaining teachers in Connecticut’s high need, currently under-resourced schools (something charter operators realize). Recruiting and retaining teachers to work in these communities will take more, not less money.

Reformy platitudes (and fact-challenged placards) about tenure reform won’t change that.  And altering the job security landscape to move toward ill-conceived evaluation frameworks and flawed metrics will likely hurt far more than it will help.

It’s time to pack up the reformy circus, load up the buses and shred the placards and have some real, substantive conversations about improving the quality and equality of public schooling in Connecticut.

Borrowing wise words from those truly market-based, Private Independent schools…

Lately it seems that public policy, and the reformy rhetoric that drives it, are hardly influenced by the vast body of empirical work and insight from leading academic scholars suggesting that such practices as using value-added metrics to rate teacher quality, dramatically increasing test-based accountability, and pushing for common core standards and the tests to go with them are unlikely to lead to substantial improvements in education quality or equity.

Rather than review relevant empirical evidence or provide new empirical illustrations in this post, I’ll do as I’ve done before on this blog and refer to the wisdom and practices of private independent schools – perhaps the most market driven and most elite segment of elementary and secondary schooling in the United States.

Really… if running a school like a ‘business’ (or more precisely running a school as we like to pretend that ‘businesses’ are run… even though ‘most’ businesses aren’t really run the way we pretend they are) was such an awesome idea for elementary and secondary schools, wouldn’t we expect to see that our most elite, market oriented schools would be the ones pushing the envelope on such strategies?

If rating teachers based on standardized test scores was such a brilliant revelation for improving the quality of the teacher workforce, if getting rid of tenure and firing more teachers was clearly the road to excellence, and if standardizing our curriculum and designing tests for each and every component of it were really the way forward, we’d expect to see these strategies all over the home pages of web sites of leading private independent schools, and we’d certainly expect to see these issues addressed throughout the pages of journals geared toward innovative school leaders, like Independent School Magazine.  In fact, they must have been talking about this kind of stuff for at least a decade. You know, how and why merit pay for teachers is the obvious answer for enhancing teacher productivity, and why we need more standardization… more tests… in order to improve curricular rigor? 

So, I went back and did a little browsing through recent, and less recent issues of Independent School Magazine and collected the following few words of wisdom:

From Winter 2003, when the school where I used to teach decided to drop Advanced Placement courses:

A little philosophy, first. Independent schools are privileged. We do not have to respond to the whims of the state, nor to every or any educational trend. We can maximize our time attuned to students and how they learn, and to the development of curriculum that enriches them and encourages the skills and attitudes of independent thinkers. Our founding charters and missions established independence for a range of reasons, but they now give all of us relative curricular autonomy, the ability to bring together a faculty of scholars and thinkers who are equipped to develop rich, developmentally sound programs of study. As Fred Calder, the executive director of New York State Association of Independent Schools, wrote in a letter to member schools a few years ago: “If we cannot design our programs according to our best lights and the needs of our communities, then let the monolith prevail and give up the enterprise. Standardized testing in subject areas essentially smothers original thought, more fatally, because of the irresistible pressure on teachers to teach to the tests.”

http://www.nais.org/publications/ismagazinearticle.cfm?ItemNumber=144300

Blasphemy? Or simply good education!

And from way, way back in 2000, in a particularly thoughtful piece on “business” strategies applied to schools:

Educators do not respond to the same incentives as businesspeople and school heads have much less clout than their corporate counterparts to foster improvement. Most teachers want higher salaries but react badly to offers of money for performance. Merit pay, so routine in the corporate world, has a miserable track record in education. It almost never improves outcomes and almost always damages morale, sowing dissension and distrust, for three excellent reasons, among others: (1) teachers are driven to help their own students, not to outperform other teachers, which violates the ethic of service and the norms of collegiality; (2) as artisans engaged in idiosyncratic work with students whose performance can vary due to factors beyond school control, teachers often feel that there is no rational, fair basis for comparison; and (3) in schools where all faculty feel underpaid, offering a special sum to a few sparks intense resentment. At the same time, school leaders have limited leverage over poor performers. Although few independent schools have unionized staff and formal tenure, all are increasingly vulnerable to legal action for wrongful dismissal; it can take a long time and a large expense to dismiss a teacher. Moreover, the cost of firing is often prohibitive in terms of its damage to morale. Given teachers’ desire for security, the personal nature of their work, and their comparative lack of worldliness, the dismissal of a colleague sends shock waves through a faculty, raising anxiety even among the most talented.

http://nais.org/publications/ismagazinearticle.cfm?ItemNumber=144267

Unheard of! Isn’t firing the bad teacher supposed to make all of those (statistically) great teachers feel better about themselves? Improve the profession? [that said, we have little evidence one way or the other]

How can we allow our leading private, independent, market-based schools to promote such gobbledygook? Why do they do it? Are they a threat to our national security or our global economic competitiveness because they were not then, nor are they now (see recent issues: http://www.nais.org/) fast-tracking the latest reformy fads? Testing out the latest and greatest educational improvement strategies on their own students, before those strategies get tested on low income children in overcrowded urban classrooms? Why aren’t the boards of directors of these schools – many of whom are leaders in “business” – demanding that they change their outmoded ways? Why? Why? Why? Because what they are doing works! At least in terms of their success in continuing to attract students and produce successful graduates.

Now, that’s not to say that these schools are completely stagnant, never adopting new strategies or reforms. They do new stuff all the time (technology integration, etc.) – just not the absurd reformy stuff being dumped upon public schools by policymakers who in many cases choose to send their own children to private independent schools.

In my repeated pleas to private school leaders to provide insights into current movements in teacher evaluation and compensation, I’ve actually found little change from these core principles of nearly a decade ago. Private independent schools don’t simply fire at will and fire often, and teacher compensation remains very predictable and traditionally structured. I’d love to know, from my private school readers, how many of their schools have adopted state mandated tests.

Private independent schools pride themselves on offering small class sizes   (see also here) and a diverse array of curricular opportunities, as well as arts, sports and other enrichment – the full package.  And, as I’ve shown in my previous research, private independent schools charge tuition and spend on a per pupil basis at levels much higher than traditional public school districts operating in the same labor market. They also pay their headmasters well! More blasphemy indeed.

In fact, aside from “no excuses” charter schools whose innovative programs consist primarily of rigid discipline coupled with longer hours and small group tutoring (not rocket science), and higher teacher salaries (here, here & here) to compensate the additional work, private independent schools may just be among the least reformy elementary and secondary education options out there.

That’s not to say they are anything like “no excuses” charter schools – in many ways they are not. But they are equally non-reformy. In fact, the average school year in private independent schools is shorter, not longer, than in traditional public schools – about 165 days. And the average student load of teachers (course sections x class size) is much lower in the typical private independent school than in traditional public schools. But that ain’t reformy stuff at all, any more than trying to improve outcomes of low income kids by adding hours and providing tutoring.

Nonetheless, for some reason, well educated people with the available resources keep choosing these non-reformy and expensive schools. Some of these schools have been around for a while, too! Maybe, just maybe, it’s because they are doing the right things – providing good, well rounded educational opportunities as many of them have for centuries, adapting along the way (see: http://www.exeter.edu/admissions/109_1220_11688.aspx). Perhaps they’ve not gone down the road of substantially increased testing and curriculum standardization, or test-based teacher evaluation – firing their way to Finland – because they understand that these policy initiatives offer little to improve school quality, and much potential damage.

Perhaps there are some lessons to be learned from market based systems. But perhaps we should be looking to those market based systems that have successfully provided high quality schooling for centuries to our nation’s most demanding, affluent and well educated leaders, rather than basing our policy proposals on some make-believe highly productive private sector industry where new technologies reduce production costs to near $0 and where complex statistical models are used to annually deselect non-productive employees.

Just pondering the possibilities, and still waiting for Zuck (an Exeter alum) to invest in Harkness Tables for Newark Public Schools and class sizes of 12 across the board!

Productivity continued…updated…

Update

Mark Dynarski has added some additional useful recommendations regarding productivity research. His comments come in response to our suggestions for improving the rigor of such research – suggestions based on the rigorous application of the relevant methods we would expect to see applied.

We agree with Mark Dynarski that using relevant methods alone doesn’t guarantee that they are used well. We were starting from the position that the work of Roza and Hill doesn’t apply the relevant methods at all, much less apply them well. That in mind, we concur with Dynarski’s argument that it is not only important to use the right methods, but to use them well, and that reasonable standards may be applied. Here are Mark Dynarski’s suggestions:

Here are some examples of what I had in mind for research standards: the analysis has been replicated by another researcher working independently (replication being a lynchpin of scientific method). Predictions from the analysis have explanatory power outside the sample. The modeling framework is mathematically consistent. The research team has no conflicts of interest.

Applying these standards might result in excluding a lot of current research (even peer-reviewed research), but I think that would be the point Welner and Baker are making.

Readers interested in assessing research might take a look at the National Academy of Sciences’ Reference Manual on Scientific Evidence, now in its third edition, especially the chapter by Kaye and Freedman on statistics. It’s highly readable and available for free download from the academy’s website.

Below is my original reply to Mark Dynarski’s comment:

Over at Sara Mead’s Ed Week blog, Mark Dynarski checks in with a few relevant questions and observations. Actually, as it turns out, we agree ALMOST entirely with Dynarski when he says:

And focusing on peer-reviewed research as a form of quality assurance, as Baker and Welner suggest, seems problematic. Peer-reviewed research journals have highly variable degrees of editorial control, and peer review itself can vary from cursory reading to exhaustive and detailed comments. My own observation is that focusing on research with rigorous designs probably is a superior contributor to quality on average. There never seem to be enough of these when difficult debates on education policy issues arise, though.

Our only disagreement here is with his characterization of what we said. We did not uphold peer review as the gold standard. Though we probably used the phrase – peer review – too often in the brief itself. Rather, we believe just as Dynarski stated, that research with rigorous designs is a superior contributor to quality, on average! Hell yes. Absolutely. That’s our point. At the very least, the issues and questions at hand should be framed, or frame-able in relevant terms for rigorous evaluation.

That is precisely our concern with the materials by Roza and Hill, and by Roza and other colleagues, that we address in our report (see pages 9 to 14). Further, a large section of our report summarizes the relevant methods – those rigorous and appropriate designs that should be applied to the questions at hand, but are noticeably absent, even at the most cursory level, in Roza and Hill’s materials.

To save you all the trouble of actually reading our entire brief, I’ve copied and pasted below the section of our brief where we address relevant methods:

Summary of Available Methods

Discussions of educational productivity can and should be grounded in the research knowledge base. Therefore, prior to discussing the Department of Education’s improving productivity project website and recommended resources, we think it important to explain the different approaches that researchers use to examine productivity and efficiency questions. Two general bodies of research methods have been widely used for addressing questions of improving educational efficiency. One broad area includes “cost effectiveness analysis” and “cost-benefit analysis.” The other includes two efficiency approaches: “production efficiency” and “cost efficiency.” Each of these is explained below.

 Cost-Effectiveness Analysis and Cost-Benefit Analysis

In the early 1980s Hank Levin produced the seminal resource on applying cost effectiveness analysis in education (with a second edition in 2001, co-written with Patrick McEwan),[i] helpfully titled “Cost-Effectiveness Analysis: Methods and Applications.” The main value of this resource is as a methodological guide for determining which, among a set of options, are more and less cost effective, which produce greater cost-benefit, or which have greater cost-utility.

The two main types of analyses laid out in Levin and McEwan’s book are cost-effectiveness analysis and cost-benefit analysis, the latter of which can focus on either short-term cost savings or longer term economic benefits. All these approaches require an initial determination of the policy alternatives to be compared. Typically, the baseline alternative is the status quo. The status quo is not necessarily a bad choice. One embarks on cost-effectiveness or cost-benefit analysis to determine whether one might be able to do better than the status quo, but it is not simply a given that anything one might do is better than what is currently being done. It is indeed almost always possible to spend more and get less with new strategies than with maintaining the current course.

Cost-effectiveness analysis compares policy options on the basis of total costs. More specifically, this approach compares the spending required under specific circumstances to fully implement and maintain each option, while also considering the effects of each option on a common set of measures. In short:

Cost of implementation and maintenance of option A ÷ Estimated outcome effect of implementing and maintaining option A

compared to

Cost of implementation and maintenance of option B ÷ Estimated outcome effect of implementing and maintaining option B

Multiple options may (and arguably should) be compared, but there must be at least two. Ultimately, the goal is to arrive at a cost-effectiveness index or ratio for each alternative in order to determine which provides the greatest effect for a constant level of spending.
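In code, the core of such a comparison is nothing more than a cost-per-unit-of-effect calculation for each alternative. Here is a minimal sketch with hypothetical costs and effects:

```python
# Minimal sketch of a cost-effectiveness comparison between two hypothetical
# program options, following the cost-per-unit-of-effect logic described above.

options = {
    # total cost to implement & maintain, estimated effect (e.g., SD units of achievement)
    "Option A": {"cost": 400_000, "effect": 0.20},
    "Option B": {"cost": 600_000, "effect": 0.25},
}

for name, o in options.items():
    ratio = o["cost"] / o["effect"]   # dollars per unit of effect; lower is better
    print(f"{name}: ${ratio:,.0f} per unit of effect")
```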

The accuracy of cost-effectiveness analyses is contingent, in part, upon carefully considering all direct and indirect expenditures required for the implementation and maintenance of each option. Imagine, for example, program A, where the school incurs the expenses for all materials and supplies. Parents in program B, in contrast, are expected to incur those expenses. It would be inappropriate to compare the two programs without counting those materials and supplies as expenses for Program B. Yes, it is “cheaper” for the district to implement program B, but the effects of program B are contingent upon the parent expenditure.

Similarly, consider an attempt to examine the cost effectiveness of vouchers set at half the amount allotted to public schools per pupil. Assume, as is generally the case, that the measured outcomes are not significantly different for those students who are given the voucher. Finally, assume that the private school expenditures are the same as those for the comparison public schools, with the difference between the voucher amount and those expenditures being picked up through donations and through supplemental tuition charged to the voucher parents. One cannot claim greater “cost effectiveness” for voucher subsidies in this case, since another party is picking up the difference. One can still argue that this voucher policy is wise, but the argument cannot be one of cost effectiveness.

Note also that the expenditure required to implement program alternatives may vary widely depending on setting or location. Labor costs may vary widely, and availability of appropriately trained staff may also vary, as would the cost of building space and materials. If space requirements are much greater for one alternative, while personnel requirements are greater for the second, it is conceivable that the relative cost effectiveness of the two alternatives could flip when evaluated in urban versus rural settings. There are few one-size-fits-all answers.

Cost-effectiveness analysis also requires having common outcome measures across alternative programs. This is relatively straightforward when comparing educational programs geared toward specific reading or math skills. But policy alternatives rarely focus on precisely the same outcomes. As such, cost-effectiveness analysis may require additional consideration of which outcomes have greater value, which are more preferred than others. Levin and McEwan (2001) discuss these issues in terms of “cost-utility” analyses. For example, assume a cost-effectiveness analysis of two math programs, each of which focuses on two goals: conceptual understanding and more basic skills. Assume also that both require comparable levels of expenditure to implement and maintain and that both yield the same average combined scores of conceptual and basic-skills assessments. Program A, however, produces higher conceptual-understanding scores, while program B produces higher basic-skills scores. If school officials or state policy makers believe conceptual understanding to be more important, a weight might be assigned that favors the program that led to greater conceptual understanding.
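A sketch of that cost-utility logic, with hypothetical program effects and an explicitly judgment-based set of outcome weights (assuming, as in the example above, that both programs cost the same to implement and maintain):

```python
# Sketch of the cost-utility variant: two hypothetical math programs with the
# same cost, weighted by how much policymakers value each outcome.

programs = {
    # conceptual-understanding and basic-skills effects (hypothetical)
    "Program A": {"conceptual": 0.30, "basic": 0.10},
    "Program B": {"conceptual": 0.10, "basic": 0.30},
}
weights = {"conceptual": 0.7, "basic": 0.3}   # a value judgment, not an empirical estimate

for name, effects in programs.items():
    utility = sum(weights[k] * effects[k] for k in weights)
    print(f"{name}: weighted outcome = {utility:.2f}")
```

With equal costs, the program with the higher weighted outcome is preferred; change the weights and the preferred program can change.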

In contrast to cost-effectiveness analysis, cost-benefit analysis involves dollar-to-dollar comparisons, both short-term and long-term. That is, instead of examining the estimated educational outcome effect of implementing and maintaining a given option, cost-benefit analysis examines the economic effects. But like cost-effectiveness analysis, cost-benefit analysis requires comparing alternatives:

Cost of implementation and maintenance of option A relative to the estimated economic benefit (or dollar savings) of option A

compared to

Cost of implementation and maintenance of option B relative to the estimated economic benefit (or dollar savings) of option B

Again, the baseline option is generally the status quo, which is not assumed automatically to be the worst possible alternative. Cost-benefit analysis can be used to search for immediate, or short-term, cost savings. A school in need of computers might, for example, use this approach in deciding whether to buy or lease them or it may use the approach to decide whether to purchase buses or contract out busing services. For a legitimate comparison, one must assume that the quality of service remains constant. Using these examples, the assumption would be that the quality of busing or computers is equal if purchased, leased or contracted, including service, maintenance and all related issues. All else being equal, if the expenses incurred under one option are lower than under another, that option produces cost savings. As we will demonstrate later, this sort of example applies to a handful of recommendations presented on the Department of Education’s website.
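The buy-versus-lease example reduces to comparing the discounted stream of expenditures under each option, holding service quality constant. Here is a sketch with made-up dollar figures and discount rate:

```python
# Sketch of a buy-vs-lease cost comparison, holding service quality constant.
# All dollar figures and the discount rate are hypothetical.

def present_value(annual_cost, years, rate):
    """Discounted sum of a constant annual cost."""
    return sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))

years, rate = 4, 0.03
buy = 500_000 + present_value(40_000, years, rate)    # purchase price + annual upkeep
lease = present_value(160_000, years, rate)           # annual lease incl. service

print(f"Buy:   ${buy:,.0f} over {years} years (present value)")
print(f"Lease: ${lease:,.0f} over {years} years (present value)")
```

All else being equal, whichever option carries the lower present value of expenditures produces the cost savings.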

Cost-benefit analysis can also be applied to big-picture education policy questions, such as comparing the costs of implementing major reform strategies such as class-size reduction or early childhood programs versus raising existing teachers’ salaries or measuring the long-term economic benefits of those different programmatic options. This is also referred to as return-on-investment analysis.

While cost-effectiveness and cost-benefit analyses are arguably under-used in education policy research, there are a handful of particularly useful examples:

  1. Determining whether certain comprehensive school reform models are more cost-effective than others.[ii]
  2. Determining whether computer-assisted instruction is more cost-effective than alternatives such as peer tutoring.[iii]
  3. Comparing National Board Certification for teachers to alternatives in terms of estimated effects and costs.[iv]
  4. Evaluating, via cost-benefit analysis, the long-term benefits, and associated costs, of participation in certain early-childhood programs.[v]

Another useful example is provided by a recent policy brief prepared by economists Brian Jacob and Jonah Rockoff, which provides insights regarding the potential costs and benefits of seemingly mundane organizational changes to the delivery of public education, including (a) changes to school start times for older students, based on research on learning outcomes by time of day; (b) changes in school-grade configurations, based on an increased body of evidence relating grade configurations, location transitions and student outcomes; and (c) more effective management of teacher assignments.[vi] While the authors do not conduct full-blown cost effectiveness or cost-benefit analyses, they do provide guidance on how pilot studies might be conducted.

Efficiency Framework

As explained above, cost-benefit and cost-effectiveness analyses require analysts to isolate specific reform strategies in order to correspondingly isolate and cost the strategies’ components and estimate their effects. In contrast, relative-efficiency analyses focus on the production efficiency or cost efficiency of organizational units (such as schools or districts) as a whole. In the U.S. public education system, there are approximately 100,000 traditional public schools in roughly 15,000 traditional public school districts, plus 5,000 or so charter schools. Accordingly, there is significant and important variation in the ways these schools get things done. The educational status quo thus entails considerable variation in approaches and in quality, as well as in the level and distribution of funding and the population served.

Each organizational unit, be it a public school district, a neighborhood school, a charter school, a private school, or a virtual school, organizes its human resources, material resources, capital resources, programs, and services at least marginally differently from all others. The basic premise of using relative efficiency analyses to evaluate education reform alternatives is that we can learn from these variations. This premise may seem obvious, but it has been largely ignored in recent policymaking. Too often, it seems that policymakers gravitate toward a policy idea without any empirical basis, assuming that it offers a better approach despite having never been tested. It is far more reasonable, however, to assume that we can learn how to do better by (a) identifying those schools or districts that do excel, and (b) evaluating how they do it. Put another way, not all schools in their current forms are woefully inefficient, and any new reform strategy will not necessarily be more efficient. It is sensible for researchers and policymakers to make use of the variation in those 100,000 schools by studying them to see what works and what does not. These are empirical questions, and they can and should be investigated.

Efficiency analysis can be viewed from either of two perspectives: production efficiency or cost efficiency. Production efficiency (also known as “technical efficiency of production”) measures the outcomes of organizational units such as schools or districts given their inputs and given the circumstances under which production occurs. That is, which schools or districts get the most bang for the buck? Cost efficiency is essentially the flip side of production efficiency. In cost efficiency analyses, the goal is to determine the minimum “cost” at which a given level of outcomes can be produced under given circumstances. That is, what’s the minimum amount of bucks we need to spend to get the bang we desire?

In either case, three moving parts are involved. First, there are measured outcomes, such as student assessment outcomes. Second, there are existing expenditures by those organizational units. Third, there are the conditions, such as the varied student populations,  and the size and location of the school or district, including differences in competitive wages for teachers, health care costs, heating and cooling costs, and transportation costs.

It is important to understand that all efficiency analyses, whether cost efficiency or production efficiency, are relative. Efficiency analysis is about evaluating how some organizational units achieve better or worse outcomes than others (given comparable spending), or how or why the “cost” of achieving specific outcomes using certain approaches and under some circumstances is more or less in some cases than others. Comparisons can be made to the efficiency of average districts or schools, or to those that appear to maximize output at given expense or minimize the cost of a given output. Efficiency analysis in education is useful because there are significant variations in key aspects of schools: what they spend, who they serve and under what conditions, and what they accomplish.

Efficiency analyses involve fitting statistical models to data on large numbers of schools or districts, typically over multiple years. While debate persists on the best statistical approaches for estimating cost efficiency or technical efficiency of production, the common goal across the available approaches is to determine which organizational units are more and less efficient producers of educational outcomes. Or, more precisely, the goal is to determine which units achieve specific educational outcomes at a lower cost.
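To make the idea concrete, here is a deliberately crude sketch of one relative-efficiency logic: predict outcomes from spending and conditions, and treat the residual as a rough “better or worse than expected” indicator. Serious analyses rely on more defensible tools (data envelopment analysis, stochastic frontier and panel models); the data file and columns here are hypothetical:

```python
# Deliberately crude illustration of relative production efficiency:
# regress outcomes on spending and conditions, then treat the residual as a
# rough indicator of producing more (or less) than predicted. Real analyses
# use more defensible methods; file and columns are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("district_panel.csv")
# expected columns: proficiency, spending_per_pupil, pct_frl, pct_ell, enrollment

model = smf.ols(
    "proficiency ~ spending_per_pupil + pct_frl + pct_ell + enrollment",
    data=districts,
).fit()

districts["relative_efficiency"] = model.resid   # + = more output than predicted
print(districts.nlargest(5, "relative_efficiency")[["proficiency", "relative_efficiency"]])
```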

Once schools or districts are identified as more (or less) efficient, the next step is to figure out why. Accordingly, researchers explore what variables across these institutions might make some more efficient than others, or what changes have been implemented that might have led to improvements in efficiency. Questions typically take one of two forms:

  1. Do districts or schools that do X tend to be more cost efficient than those doing Y?
  2. Did the schools or districts that changed their practices from X to Y improve in their relative efficiency compared to districts that did not make similar changes?

That is, the researchers identify and evaluate variations across institutions, looking for insights in those estimated to be more efficient, or alternatively, evaluating changes to efficiency in districts that have altered practices or resource allocation in some way. The latter approach is generally considered more relevant, since it speaks directly to changing practices and resulting changes in efficiency.[vii]

While statistically complex, efficiency analyses have been used to address a variety of practical issues, with implications for state policy, regarding the management and organization of local public school districts:

  1. Investigating whether school district consolidation can cut costs and identifying the most cost-efficient school district size.[viii]
  2. Investigating whether allocating state aid to subsidize property tax exemptions to affluent suburban school districts compromises relative efficiency.[ix]
  3. Investigating whether the allocation of larger shares of school district spending to instructional categories is a more efficient way to produce better educational outcomes.[x]
  4. Investigating whether decentralized governance of high schools improves efficiency.[xi]

These analyses have not always produced the results that policymakers would like to hear. Further, like many studies using rigorous scholarly methods, these analyses have limitations. They are necessarily constrained by the availability of data, they are sensitive to the quality of data, and they can produce different results when applied in different settings.[xii] But the results ultimately produced are based on rigorous and relevant analyses, and the U.S. Department of Education should be more concerned with rigor and relevance than convenience or popularity.

 


[i] Levin, H. M. (1983). Cost-Effectiveness. Thousand Oaks, CA: Sage.

Levin, H. M., & McEwan, P. J. (2001). Cost effectiveness analysis: Methods and applications. 2nd ed. Thousand Oaks, CA: Sage.

[ii] Borman, G., & Hewes, G. (2002). The long-term effects and cost-effectiveness of Success for All. Educational Evaluation and Policy Analysis, 24, 243-266.

[iii] Levin, H. M., Glass, G., & Meister, G. (1987). A cost-effectiveness analysis of computer assisted instruction. Evaluation Review, 11, 50-72.

[iv] Rice, J. K., & Hall, L. J. (2008). National Board Certification for teachers: What does it cost and how does it compare? Education Finance and Policy, 3, 339-373.

[v] Barnett, W. S., & Masse, L. N. (2007). Comparative Benefit Cost Analysis of the Abecedarian Program and its Policy Implications. Economics of Education Review, 26, 113-125.

[vi] See Jacob, B., & Rockoff, J. (2011). Organizing Schools to Improve Student Achievement: Start Times, Grade Configurations and Teacher Assignments. The Hamilton Project. Retrieved November 6, 2011 from http://www.hamiltonproject.org/files/downloads_and_links/092011_organize_jacob_rockoff_paper.pdf

See also Patrick McEwan’s review of this report:

McEwan, P. (2011). Review of Organizing Schools to Improve Student Achievement. Boulder, CO: National Education Policy Center. Retrieved December 2, 2011 from http://nepc.colorado.edu/thinktank/review-organizing-schools

[vii] Numerous authors have addressed the conceptual basis and empirical methods for evaluating technical efficiency of production and cost efficiency in education or government services more generally. See, for example:

Bessent, A. M., & Bessent, E. W. (1980). Determining the Comparative Efficiency of Schools through Data Envelopment Analysis, Education Administration Quarterly, 16(2), 57-75.

Duncombe, W., Miner, J., & Ruggiero, J. (1997). Empirical Evaluation of Bureaucratic Models of Inefficiency, Public Choice, 93(1), 1-18.

Duncombe, W., & Bifulco, R. (2002). Evaluating School Performance: Are we ready for prime time? In William J. Fowler, Jr. (Ed.), Developments in School Finance, 1999–2000, NCES 2002–316. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Grosskopf, S., Hayes, K. J., Taylor, L. L., & Weber, W. (2001). On the Determinants of School District Efficiency: Competition and Monitoring. Journal of Urban Economics, 49, 453-478.

[viii] Duncombe, W. & Yinger, J. (2007). Does School District Consolidation Cut Costs? Education Finance and Policy, 2(4), 341-375.

[ix] Eom, T. H., & Rubenstein, R. (2006). Do State-Funded Property Tax Exemptions Increase Local Government Inefficiency? An Analysis of New York State’s STAR Program. Public Budgeting and Finance, Spring, 66-87.

[x] Taylor, L. L., Grosskopf, S., & Hayes, K. J. (2007). Is a Low Instructional Share an Indicator of School Inefficiency? Exploring the 65-Percent Solution. Working Paper.

[xi] Grosskopf, S., & Moutray, C. (2001). Evaluating Performance in Chicago Public High Schools in the Wake of Decentralization. Economics of Education Review, 20, 1-14.

[xii] See, for example, Duncombe, W., & Bifulco, R. (2002). “Evaluating School Performance: Are we ready for prime time?” In William J. Fowler, Jr. (Ed.), Developments in School Finance, 1999–2000, NCES 2002–316. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Closing schools: Good Reasons and Bad Reasons

Current reformy rhetoric dictates that we MUST CLOSE FAILING SCHOOLS! That we must close those schools that are dropout factories or have persistently low achievement levels on state assessments. And, that we must, in the process, fire all of the staff in those schools that have caused these dismal conditions year after year, by thinking only of themselves, their tenure, their pensions and their wages – which are clearly too high for workers of their meager cognitive ability.

Take these simple bold steps and things will get better! Surely they will.

But, the bottom line is that you can’t just close down the poorest schools in any city school system and simply replace them with less poor ones – problem solved! That is, unless the larger strategy is actually about closing down entire neighborhoods, allowing them to become blighted, then seeking investors to step in and gentrify the area, replacing the old population with a new, less poor one! Problem solved. Or alternatively, if one relies on the off chance of a large scale natural disaster disproportionately displacing the poorest families to a large urban district in a neighboring state. But I digress.

A major unintended consequence of this ill-conceived reform movement is that it is distracting local school administrators and boards of education from closing and/or reorganizing schools for the right reasons by focusing all of the attention on closing schools for the wrong ones. In fact, even when school officials might wish to consider closing schools for logical reasons, they now seem compelled to say instead that they are proposing specific actions because the schools are “failing!” Not because they are too small to operate at efficient scale, because local demographic shifts warrant reconsidering attendance boundaries, or because a facility is simply unsafe or an unhealthy environment.

In really blunt terms, the current reformy rhetoric is forcing leaders to make stupid arguments for school closures where otherwise legitimate ones might actually exist!

There are legitimate reasons – cost-saving and otherwise – to close schools and reorganize the delivery of educational services across organizational units and geographic locations within a district. Often, when I’m pushed to suggest the types of steps districts might take to achieve cost savings, the first issue I turn to is school organization/optimization. Closing schools is not necessarily a bad thing. Closing schools for the wrong reasons and under the wrong pretexts is a bad thing. Reorganizing schools may lead to staffing reductions. These are cost cutting realities in a labor intensive industry. The fact is that you can’t really cut much from costs without cutting labor costs. When enrollments decline significantly over time, fewer teachers are needed to get the job done and the staff may need to be reorganized.

But closing schools based on test scores, and pretending that we are somehow appropriately dismissing the staff that “caused” those low test scores is – well – just dumb.

Now let’s talk about some of the more legitimate reasons that a district might choose to close/reorganize schools.

First, let’s define “cost” and “cost cutting.” Cost is the minimum amount that needs to be spent to achieve any given level of outcomes. It’s certainly possible to spend more than the minimum hypothetical – perfect world – cost of achieving any given level of outcomes. In fact, it’s pretty much a given that spending on outcomes occurs in less than perfect conditions, including unevenly growing and declining enrollments and unevenly distributed facilities capacity, quality and efficiency. Ultimately, the goal is to figure out how to reduce those barriers – those less than perfect conditions – in order to get closer to that hypothetical minimum cost of achieving a given level of outcomes. In other words, the goal in times of budget cuts is to figure out how to spend less without compromising outcomes.

Here’s a short list of legitimate reasons a district might choose to close schools.

Economies of Scale

Operating unnecessarily small schools within a district creates inappropriate inequities. Providing more resources per pupil in one school necessarily means less in others. If those differences are based on legitimate differences in costs and student needs, that’s fine. It’s a difference that advances rather than erodes equity. But sustaining inefficiently small schools at the expense of others within a large, population dense school district doesn’t meet those criteria. So, it’s in the best interest of the district as a whole to find ways to optimize the distribution of enrollments across schools within the district. To make sure, for example, that there aren’t elementary schools in one part of town with only 100 or so students while others across town have 1,200, or high schools with 300 to 400 students drawing resources from high schools with 1,500. This can be really tricky to accomplish. But even moving toward optimal, without fully reaching it, is better than nothing. The literature on economies of scale suggests that elementary schools of 300 to 500 students and high schools of 600 to 900 students seem to produce optimal outcomes, and these sizes are consistent with literature suggesting that districts with 2,000 to 4,000 pupils seem to minimize the costs of producing outcomes.
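The arithmetic behind the scale argument is simple enough: fixed overhead gets spread across however many pupils are enrolled. A toy example follows (made-up figures, and capturing only the cost-spreading side, not the outcome effects of very large or very small schools):

```python
# Toy illustration of scale economies: a school's fixed administrative and
# facilities overhead spread across enrollment. All figures are made up.

fixed_overhead = 600_000        # principal, office, plant operations, etc. (hypothetical)
variable_per_pupil = 9_000      # teachers, supplies, etc. (hypothetical)

for enrollment in (100, 300, 500, 1_200):
    per_pupil = variable_per_pupil + fixed_overhead / enrollment
    print(f"{enrollment:>5} pupils: ${per_pupil:,.0f} per pupil")
```

The very small school consumes far more per pupil just to keep the lights on, and that money necessarily comes from somewhere else in the district.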

Facility efficiency

Some school facilities are simply more efficient to operate than others. They have more efficient mechanical/HVAC systems, are better insulated, have fewer deferred maintenance issues, and potentially a longer projected useful life. Some facilities simply have more efficient space for accommodating the kinds of programs and services that need to be delivered. Evaluating the costs and benefits of maintaining and upgrading the current stock of facilities, and whether children can be more efficiently distributed across “better” spaces with lower operations and maintenance costs, is something any and all school districts should be doing on an ongoing basis.

Transportation efficiency

As population distribution shifts within a district, and while considering other reasons for reorganizing and redistributing students across schools (usually via changes to school attendance zones, but potentially with choice programs as well), transportation efficiency should also be on the table. In a district with dramatically declining or geographically shifting enrollment, school closings may be inevitable. In fact, a district may find itself closing some schools and selling off land while opening others in different locations (less likely in densely populated urban centers, but common in sprawling exurbia).

Health & Safety Concerns

This one is (or at least should be) a no-brainer. Kids shouldn’t be housed in unsafe or unhealthy facilities. With that in mind, districts should engage in cost-benefit analyses comparing the costs of fixing the problem facilities/spaces against other reorganization options. Closing unsafe, unhealthy schools and appropriately distributing students among “better” spaces is obviously a legitimate reason for school closing.

Socioeconomic integration/balancing

A final reason a district might close and/or reorganize schools to improve performance while maintaining (or cutting) spending is to achieve better peer group balance across schools. Of course, this only works when the district is a) heterogeneous enough to be able to create better-balanced peer groups and b) geographically small enough not to incur substantial transportation costs when implementing such a policy. A substantial body of research indicates that concentrated poverty and, for that matter, racial composition (racial isolation) in schools can affect the costs of achieving a given outcome target. Optimizing peer group composition across schools, while considering the interaction with other cost drivers (transportation), makes sense. Of course, the U.S. Supreme Court has placed some constraints on the role of race in re-assignment policies (http://www.oyez.org/cases/2000-2009/2006/2006_05_908), but options remain available.
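For readers who want to put a number on “balance,” one common measure (my own illustration, not anything from the presentation discussed above) is the dissimilarity index, which runs from 0 when every school mirrors the district’s overall composition to 1 when groups are completely separated. Here’s a minimal sketch using hypothetical free/reduced lunch counts.

```python
# One common measure of how evenly low-income (free/reduced lunch) students
# are spread across a district's schools: the dissimilarity index,
#   D = 0.5 * sum_i | frl_i / FRL_total - other_i / Other_total |,
# which ranges from 0 (perfectly even) to 1 (complete separation).
# Counts below are hypothetical.

schools = [
    # (free/reduced-lunch students, all other students) per school
    (250, 50),
    (180, 120),
    (40, 260),
    (30, 270),
]

frl_total = sum(f for f, _ in schools)
other_total = sum(o for _, o in schools)

D = 0.5 * sum(abs(f / frl_total - o / other_total) for f, o in schools)
print(f"Dissimilarity index: {D:.2f}")  # closer to 0 means better balanced
```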

Improving peer group balance, optimizing school sizes, optimizing bus routes, and making the best use of the most operationally and educationally efficient learning spaces can all help districts both reduce costs and improve outcomes.

AND ABSOLUTELY NONE OF THIS HAS ANYTHING TO DO WITH CLOSING FAILING SCHOOLS. Why? Because there’s little or no evidence that closing “failing” schools improves either productivity or efficiency.

It’s not that sexy. It’s not reformy. It’s just good management decision-making to get the most bang for the buck. And it’s all stuff that districts can and should be working on constantly.

Closing schools is never easy. Someone will always be irked, no matter what the reason for the closure. A neighborhood will feel that it has lost its identity. Alums will feel that a piece of their childhood has been taken away.  So if we’re going to go down this road, and fight the difficult political fights that school closing plans create, then we ought to be closing the schools for the right reasons, and not the wrong ones!