Blog

Revisiting NJOSA & the Lakewood Effect

The current version of the New Jersey Opportunity Scholarship Act would pilot the tuition tax credits for private schooling in the following locations:

  • Asbury Park City School District
  • Camden City School District
  • Elizabeth City School District
  • Lakewood City School District
  • Newark City School District
  • City of Orange School District
  • Passaic City School District, and
  • City of Perth Amboy School District

http://www.njleg.state.nj.us/2012/Bills/S2000/1779_I1.PDF

http://www.njspotlight.com/stories/12/0316/0145/

NJOSA is often pitched publicly as a scholarship program that would allow students trapped in failing urban districts to exercise the choice to select a better alternative – implicit in this argument is the assumption that any private school option a student might choose would necessarily be a better alternative. Also implicit in the rhetoric around NJOSA is the notion that this program is mainly focused on kids in places like Camden and Newark – the stereotypical New Jersey urban centers.

NJOSA would provide scholarships to children in families with incomes below 250% of the poverty threshold. The text of the bill indicates that eligible children are those either attending a chronically failing school in one of the districts above or eligible to enroll in such a school in the following year (which would seem to include any child within the attendance boundaries of these districts, even those presently enrolled in private schools).

“children either attending a chronically failing school or eligible to enroll in a chronically failing school in the next school year.”

I have discussed NJOSA numerous times on this blog, specifically focusing on the Lakewood effect here & here.

Many in New Jersey probably already understand that the above list contains some intriguing outliers, but I suspect few understand just how big these outlier effects are. One would naturally assume that Newark, for example, would be the major source of NJOSA scholarship recipients, right? That's our stereotypical urban core with failing schools from which kids need to escape.

Here’s what the Newark private school market looks like.

This map uses data on individual private schools, their locations, and enrollments from the 2007-08 National Center for Education Statistics Private School Universe Survey, which also includes classifications of religious affiliation/status. Purple circles are religious private schools and green circles are those whose primary affiliation is listed as non-religious (independent of a specific church/religion). Circle size indicates enrollment: bigger circles are bigger schools.

I also use U.S. Census Bureau American Community Survey data to identify the number of total children and children in families below the 250% income threshold attending private school within each Public Use Micro Data Area (PUMA). Blue numbers indicate total private enrollments, and red numbers indicate low income private school enrollments.

Currently, there are about 3,400 privately schooled students residing in Newark, and about 2,000 of them actually fall below the 250% poverty-income threshold. So, that's a sizable number of Newark children who might qualify for NJOSA scholarships, in addition to others who might apply who are presently enrolled in public schools.

It would seem by the language in the bill that a current privately schooled student would merely have to be eligible to attend their local public school, but not actually do so.

Here’s what the Passaic/Clifton private school market looks like (neither one is big enough to be its own PUMA):

The Passaic/Clifton PUMA has nearly as many low income private school enrolled children as Newark – 1,619 – despite a much smaller total population. And by far the largest private school in the area is Yeshiva Ktana.

But the most striking example is that of Lakewood, as I have discussed in the past. Since Lakewood remains in this bill, even though there’s nothing really new I’m presenting here, I felt the need to reiterate just how big a deal this is.

Here’s the Lakewood private school marketplace & current enrollments:

Based on the Census ACS data from a few years back, there were over 17,000 privately schooled students in Lakewood, and OVER 10,400 OF THOSE STUDENTS WERE IN FAMILIES THAT REPORTED THEMSELVES AS BEING BELOW THE 250% POVERTY-INCOME THRESHOLD!

Recall that Newark had about 2,000 low income private school enrolled children.

Orange/East Orange combined have under 900.

All of the cities around Asbury Park combined have about 400 (meaning that Asbury Park alone likely has far fewer).

Camden has about 1,300.

Elizabeth has about 1,000.

The entire area (several towns/districts) around Perth Amboy has about 1,000 (meaning that Perth Amboy itself is likely only a fraction of that amount).

And again, Lakewood, over 10,000! (and Passaic, another significant amount)

In other words, all of the other locations combined do not have as many low income private school enrolled children as Lakewood has by itself. Lakewood would likely be the epicenter of NJOSA scholarship distribution. I noted in my first post on this topic that, at the proposed average scholarship amounts, the Lakewood Yeshiva schools would stand to take in as much as $67 million per year in these indirect taxpayer subsidies.
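For what it's worth, that $67 million figure is simple arithmetic. A quick sketch (the per-pupil award below is a hypothetical stand-in consistent with the figure cited above, not an amount taken from the bill):

```python
# Rough reconstruction of the Lakewood estimate. The ~$6,400 average
# scholarship is a hypothetical stand-in chosen for illustration; it is
# not a figure from the bill itself.
low_income_private_enrollment = 10_400  # Lakewood, from the ACS data above
avg_scholarship = 6_400                 # hypothetical average award ($)

total_subsidy = low_income_private_enrollment * avg_scholarship
print(f"${total_subsidy:,}")  # → $66,560,000, i.e. roughly $67 million/year
```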

The clever subversion of taxpayer rights

I have a secondary, related concern when it comes to Tuition Tax Credits, these days often framed as "Opportunity Scholarship Acts."

Tuition Tax Credit programs create an indirect subsidy of private schooling, whereas Vouchers provide a direct subsidy. The latter is a more honest approach and one that at least allows for legal recourse by concerned taxpayers – even if they eventually lose. It is currently the case that voucher programs which provide direct subsidies to families, even where the majority of those families choose to use their subsidy for religious schooling, are constitutional under the U.S. Constitution (but not under some state constitutions which expressly prohibit use of public funding for religious education). Specifically, the U.S. Supreme Court has determined that these subsidies do not violate the establishment clause of the U.S. Constitution, because the distribution of the subsidy is mediated through individual/family choices and the subsidy/voucher program (at least as designed in Cleveland) is neutral to religion (see: http://www.oyez.org/cases/2000-2009/2001/2001_00_1751 – the dissent is worth listening to).

This is not to say, however, that a state might not be vulnerable to legal challenge over a voucher system if it could be shown that the state had actually made policy decisions with the intent of guiding students and resources toward specific religious schools/institutions, but rather that the Cleveland model did pass muster. One might certainly scrutinize the NJ legislature's choice to include Lakewood in NJOSA, with the Lakewood Yeshiva schools essentially the primary beneficiary of the program. This would seem somewhat analogous to a 1990s scenario where NY State redrew one district's boundaries so as to encompass a single homogeneous religious community (see: http://www.oyez.org/cases/1990-1999/1993/1993_93_517). Could NY State now go back and pilot a voucher program in Kiryas Joel instead? Would the choice of a homogeneous religious community to pilot a voucher program violate the establishment clause? Would it be substantively different from the more "neutral" Cleveland voucher program? Maybe.

But, here's the kicker with Tuition Tax Credit programs. They are indirect subsidies, generated by providing full tax credits to corporations that gift money to a state approved, independently governed entity (the scholarship/voucher granting body). Thus, a hole of X is created in the state budget. That hole is paid for by the fact that the state no longer has to allocate state aid (equal to or greater than X) to the local public districts whose students accept the scholarships to attend private schools instead. It's the mathematical equivalent of simply allocating the same sum in state revenue directly to private schools, but it's achieved indirectly through a third party entity.
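The fiscal equivalence is easy to see with stylized numbers (all figures below are hypothetical, for illustration only):

```python
# Stylized illustration of why a tuition tax credit program is the
# fiscal equivalent of a direct allocation. All numbers hypothetical.
credits_claimed = 50_000_000  # corporate gifts, fully credited against taxes
budget_hole = credits_claimed  # the "X" shortfall in state revenue

# The state fills the hole by withholding state aid (>= X) from districts
# whose students take the scholarships.
aid_withheld = 50_000_000

net_state_cost = budget_hole - aid_withheld
print(net_state_cost)  # → 0: same bottom line as sending the $50M directly
```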

Who cares? Why is that important? If the state has gamed this system to favor and disproportionately subsidize a specific religion, can't we still do something about it? The answer is: probably not, at least via legal action! The U.S. Supreme Court has recently determined that taxpayers do not have legal standing to challenge the distribution of these indirect subsidies. As far as we can tell, no one really seems to have a right to challenge these policies for potentially violating the establishment clause. If it were a voucher program – a direct subsidy – taxpayers would most likely at least have the right to challenge the policy in court, even if the policy were eventually determined to be constitutional (sufficiently similar to the Cleveland model). But the indirect tuition tax credit approach cleverly permits diversion of tax revenues while entirely negating taxpayers' right to challenge that diversion. See: http://www.oyez.org/cases/2010-2019/2010/2010_09_987

In other words, the court never even gets to address the substantive question of whether the legislature has intentionally gone out of its way to favor and subsidize a specific religion.

(Real) Graph vs (Fake) Graph Friday

This post provides a quick follow up to yesterday's (late last night) post, in which I critiqued a questionable graph from an NJDOE presentation: State of NJ Schools presentation 2-29-2012

It turns out that the slide presentation had many comparable graphs that deserve at least some attention. First, there’s this graph which attempts to argue that early reading proficiency is a statewide issue, and not just a problem of low income urban neighborhoods:

Rather impressive, eh? It certainly gives the impression that early reading deficits are concentrated not in the poorest districts but in the least poor ones.

Why would someone make such an argument? Well, one reason would be if it were being coupled with arguments to redistribute funding to those less poor districts to help them out – to argue that educational "risk" is not concentrated in poor districts, but rather distributed across all districts.

The problem here is that it's completely absurd to compare total counts of non-proficient students across groups without any regard for the total counts of all students – that is, without asking what percent of kids are proficient in each poverty group. Well, here's what that picture ends up looking like:

Pretty much as we might expect. Lack of reading proficiency in 3rd grade, as measured on state assessments, is a much bigger problem in higher poverty districts – with poverty here measured as % Free Lunch and reading proficiency tabulated for general test takers.
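The counts-versus-rates problem is easy to demonstrate with made-up numbers. In this hypothetical (not the NJDOE data), the lower poverty group has more non-proficient students in total simply because it has far more students, period:

```python
# Hypothetical example: raw counts of non-proficient students can point
# the opposite direction from rates, because low-poverty districts
# enroll many more students overall.
groups = {
    # group: (non_proficient_count, total_students)
    "low poverty":  (30_000, 300_000),
    "high poverty": (20_000, 50_000),
}

for name, (nonprof, total) in groups.items():
    print(f"{name}: count = {nonprof:,}, rate = {nonprof / total:.0%}")
# → low poverty: count = 30,000, rate = 10%
# → high poverty: count = 20,000, rate = 40%
```

The group with the larger count of struggling readers is nonetheless the group with one quarter the rate.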

Here’s the next graph, which compares charter school reading and math proficiency rates in Newark to Newark Public Schools:

In this case, the title is somewhat appropriate in that charter school performance does indeed vary in Newark. But the graph is pretty much meaningless and deceptive.

The graph relates average Language Arts and Math proficiency across schools, showing basically that schools which are higher on one are also higher on the other. That's really no big surprise. But the graph ignores entirely the substantive differences in student populations that explain a large portion of the difference in these proficiency rates. The graph appears to be not-so-subtly constructed to reinforce the central point of this section of the presentation slides – that charters outperform district schools. That point continues to be built on analyses that were already thoroughly debunked many times over. This graph goes a step further by cherry picking a few charters to name – all of which appear superior to the "District."

So, what does it look like if we take all of these schools, separate the district into its schools, and plot the combined proficiency rates with respect to % Free Lunch? Well, here it is (includes NJASK3 to NJASK8; no HSPA):

Yes, this graph reinforces the title of the NJDOE graph, but in a much more reasonable light. That said, there are a number of other student population factors that would need to be accounted for in a more thorough analysis. 

Among other things, while the first graph appears to suggest that TEAM Academy is a relative laggard compared to schools like North Star or Robert Treat, my representation here shows that TEAM is actually further above its expected performance than either of the other two. TEAM simply serves a lower income population. Further, district schools serving similar populations do similarly well, and several charter schools do as poorly as (or worse than) comparable district schools.


Amazing Graph Proves Poverty Doesn’t Matter!(?)

I just couldn’t pass this one up. This is a graph for the ages, and it comes from a presentation by the New Jersey Commissioner of Education given at the NJASA Commissioner’s Convocation in Jackson, NJ on Feb 29. State of NJ Schools presentation 2-29-2012

Please turn to Slide #24:

The title conveys the intended point of the graph – that if you look hard enough across New Jersey – you can find not only some, but MANY higher poverty schools that perform better than lower poverty schools.

This is a bizarre graph, to say the least. It's set up as a scatter plot of proficiency rates with respect to free/reduced lunch rates, but it only includes those schools/dots that fall in these otherwise unlikely positions. At least put the others there faintly in the background, so we can see where these fit into the overall pattern. The suggestion here is that there is no pattern.

The apparent inference here? Either poverty itself really isn't that important a factor in determining student success rates on state assessments, or, alternatively, free and reduced lunch simply isn't a very good measure of poverty even if poverty is a good predictor. Either way, something's clearly amiss if we have so many higher poverty schools outperforming lower poverty ones. In fact, the only dots included in the graph are high poverty schools outperforming lower poverty ones. There can't be much of a pattern between these two variables at all, can there? If anything, the trendline must slope uphill (that is, higher poverty leads to higher outcomes!).

Note that the graph doesn't even tell us which, or how many, dots/schools are in each group, or what percent of all schools they represent. Are they the norm, or the outliers?

So, here’s the actual pattern:

Hmmm… it looks a little different when you put it that way. Yes, it's a scatter, not a perfectly straight line of dots. And yes, there are some dots on the right hand side that land above the 65 line and some dots on the left that land below it.

BUT THE REALITY IS THAT FREE/REDUCED LUNCH ALONE EXPLAINS ABOUT 2/3 OF THE VARIATION IN PROFICIENCY RATES ACROSS SCHOOLS!

Do free/reduced lunch rates explain all of the variance? Of course not. Nothing really does, in part because the testing data themselves include noise, and reducing the testing data to percentages of kids above arbitrary thresholds introduces more noise. So all of the variance can't be explained no matter how many variables we throw at it. We can, however, take some additional easily accessible variables from the school report cards and explain a little more of the variation:

But % free lunch remains the dominant factor, along with % black and % female. Combining free and reduced lunch produces a somewhat weaker effect than using % free alone.
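For readers who want to check a variance-explained figure like this against their own data, here's a minimal sketch. The data are synthetic (standing in for the school report card file), so the R² it prints is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for school-level data: % free/reduced lunch and a
# proficiency rate with a strong negative relationship plus noise.
n = 500
frl = rng.uniform(0, 100, n)                  # % free/reduced lunch
prof = 90 - 0.5 * frl + rng.normal(0, 10, n)  # % proficient

# Fit proficiency on FRL and compute R^2, the share of variance explained.
slope, intercept = np.polyfit(frl, prof, 1)
pred = slope * frl + intercept
r2 = 1 - ((prof - pred) ** 2).sum() / ((prof - prof.mean()) ** 2).sum()
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```

With these (invented) parameters the single predictor explains roughly two thirds of the variance, which is the kind of result the real free lunch data produce.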

Lengthy, somewhat related tangent

Back in 2007-2008, while I was still at the University of Kansas, I was involved in a study of factors associated with production of outcomes and relative efficiency of New Jersey schools. Most of the data were generally insufficient for academic publication, but we did have some fun playing and figuring out what was there.

The study was designed to figure out (a) which background factors really accounted for differences in NJ school performance, and (b) how the characteristics of schools that appeared to do better or worse than expected differed.

Here are a few snapshots of what I found back then, constructing models of school level outcomes for New Jersey schools using data from 2004 to 2006 (all publicly accessible data).

First, using a combination of background demographic factors, school characteristics and other school resource measures we were able to explain as much as 82% of the variation in 8th grade (then GEPA) outcomes. Still, % free and reduced lunch played a (the) dominant role, along with other related factors including special education shares, racial composition, % of female adults living in the surrounding area holding a Graduate degree, and an indicator that the school was in an affluent suburban district (DFG I or J).

We played around with multiple options and this is where we ended up. One of the more interesting revelations was that poverty seemed to have stronger effects on outcomes in population dense urban centers (our Urban x Free Lunch interaction term). This finding is common and can be explained in multiple ways (I’ll have to get to that another time).
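For the statistically inclined, the interaction term enters the model as a simple product of the two variables. A sketch with synthetic data (the coefficients below are invented for illustration; the real models used GEPA outcomes and many more controls):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Synthetic school data: poverty depresses outcomes everywhere, but more
# steeply in population-dense urban settings (the interaction effect).
free_lunch = rng.uniform(0, 100, n)    # % free/reduced lunch
urban = rng.integers(0, 2, n)          # 1 = population-dense urban
outcome = (80 - 0.3 * free_lunch       # baseline poverty effect
           - 0.2 * urban * free_lunch  # extra penalty in urban schools
           + rng.normal(0, 5, n))

# OLS with main effects plus the Urban x Free Lunch interaction term.
X = np.column_stack([np.ones(n), free_lunch, urban, urban * free_lunch])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(dict(zip(["const", "free_lunch", "urban", "urban_x_fl"], beta.round(2))))
# The urban_x_fl coefficient recovers the extra (steeper) urban slope.
```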

We also found that certain resource measures were associated with higher (or lower) outcome schools. Schools where teachers had higher salaries than other similar teachers (by degree and experience) in the surrounding labor market tended to have higher outcomes. And schools with larger shares of teachers in their first three years with only a BA had lower outcomes.

We (I) actually took the analyses a step further and estimated preliminary models of the costs of producing desired outcome targets (models I have since improved upon). The key element of these models was to figure out whether there were alternative or additional demographic measures that might better capture which districts have legitimately higher costs of achieving desired student outcomes. That is, what kinds of things should be weighted, or weighted more heavily, in the state school finance formula – specifically, what alternatives do we have for addressing poverty?

This was the first attempt:

And this was the second attempt (in a published article):

  • Baker, B.D., Green, P.C. (2009) Equal Educational Opportunity and the Distribution to State Aid to Schools: Can or should racial composition be a factor? Journal of Education Finance 34 (3) 289-323

What we found was that poverty (measured by % free lunch) indeed strongly affects the costs of improving student outcomes in New Jersey districts – in one case focusing only on K-12 unified districts and in the other all NJ districts. This finding is not a revelation.

We also found that one might capture additional “costs” by including measures of school district racial composition, and we discuss the legal implications of this finding in several related articles (here, here & here). But, we also point out that there are alternatives for capturing some of the same effect, including the Urban x Poverty interaction.

So yes, we can make our statistical models and analyses ever more nuanced to more thoroughly explain the links between student backgrounds and student outcomes, and the costs of improving those outcomes. And, to the extent we can, we should.  But the fact is that poverty still matters, and it seems to matter statistically even when we measure it with the imperfect, crude proxy of children qualified for free or reduced price lunch.

In summary, despite the apparent brilliant wisdom conveyed in the graph at the outset of this post:

  1. Poverty as measured by free and reduced lunch status remains a very strong predictor of variations in proficiency rates across New Jersey schools; and
  2. Various measures of poverty, including free lunch status, and census poverty rates interacted with urban population density strongly influence the costs of improving outcomes across New Jersey school districts (and to an extent that far exceeds the weights in the current school finance formula).

But it’s still a really fun graph!

Here’s a link to a related article on schools supposedly “beating the odds” (like those in the above graph)

And here’s a link to my preliminary analyses which never saw the light of day (rough and unedited, in its original draft form): BAKER.DRAFT.JUNE_08

About those Dice… Ready, Set, Roll! On the VAM-ification of Tenure

A while back I wrote a post (and here) in which I explained that the relatively high error rates in value-added modeling might make it quite difficult for teachers to get tenure under some newly adopted and other proposed guidelines – and much easier to lose it, even after waiting years to get lucky [& yes, I do mean LUCKY] enough to obtain it.

The standard reformy template is that teachers should only be able to get tenure after 3 good ratings in a row, and that teachers should be subject to losing tenure if they get 2 bad years in a row. Further, it is possible that the evaluations might actually stipulate that you can only get a good rating if you achieve a certain rating on the quantitative portion of the evaluation – the VAM score – and likewise for bad ratings (that is, the quantitative measure overrides all else in the system).

The premise of the dice rolling activity from my previous post was that it is necessarily much less likely to roll the same number (or subset of numbers) three times in a row than twice (exponentially so, in fact). That is, it is much harder to overcome the odds created by error rates to achieve tenure, and much easier to lose it. Again, this is due largely to the noisiness of the data, and less to the difficulty of actually being "good" year after year. The ratings simply jump around a lot. See my previous post.

So, for those of you energetic young reformy wannabe teachers out there thinkin' – hey, I can cut it – I'll take my chances and my "good" teaching will overcome those odds, generating year-after-year top quartile rankings? A lot of that is totally out of your control! [Look, I would have been right there with you when I graduated college.]

But my first post on this topic was all in hypothetical-land. Now, with the newly released NYC teacher data, we can see just how many teachers actually got three-in-a-row over the past three years [among those teaching the same subject and grade level in the same school], applying different ranges of "acceptableness" or not.

So, here I give the benefit of the doubt and set a reasonably low bar for a good rating – the median or higher [ignoring error ranges and sticking with the type of firm cut-points that current state policies and local contracts seem to be adopting]. Any teacher who gets the median or higher 3 years in a row can get tenure! Otherwise, keep trying until you get your three in a row. How many teachers is that? How many overcome the randomness and noise in the data? Well, here it is:

As percentiles dictate (by definition), about half of the teachers in the data are in the upper half in the most recent year. But only about 20% of teachers in any grade or subject are above the median two years in a row. Further, only about 6 to 7% were actually lucky enough to land in the upper half for three years running! Assuming stability remains relatively similar over time, we could expect that in any three year period, about 7% of teachers might string together three above-the-medians in a row. At that pace, tenure would be awarded rather judiciously (and actually, stability in the most recent year over the prior one is unusually high).
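The attrition mechanism is easy to reproduce in simulation. This is a sketch, not a replication of the NYC figures: the share of rating variance that is stable "signal" is an assumed parameter, set roughly in line with published year-to-year VAM correlations:

```python
import numpy as np

rng = np.random.default_rng(42)
n_teachers, n_years = 100_000, 3

# Each year's rating = stable "true" effect + year-specific noise. The
# signal share (0.35) is an assumption, not estimated from the NYC data.
signal_share = 0.35
true_effect = rng.normal(0, np.sqrt(signal_share), n_teachers)
noise = rng.normal(0, np.sqrt(1 - signal_share), (n_teachers, n_years))
ratings = true_effect[:, None] + noise

# "Good" = at or above that year's median rating.
above_median = ratings >= np.median(ratings, axis=0)
two_in_a_row = above_median[:, :2].all(axis=1).mean()
three_in_a_row = above_median.all(axis=1).mean()
print(f"2 in a row: {two_in_a_row:.0%}, 3 in a row: {three_in_a_row:.0%}")
```

Each additional required year thins the pool further, and the noisier the ratings (the lower the signal share), the closer the three-in-a-row club shrinks toward the 12.5% you'd expect from pure coin flips.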

Now let's say I cut teachers a break and only take tenure away if they land two years in a row not merely in the bottom half, but all the way down in the bottom third! What are the odds? How many teachers actually get two years in a row in the bottom third?

Well, here it is:

That's rather depressing, isn't it? The chances of ending up in the bottom third two years in a row are about the same as the chances of ending up in the top half three years in a row!

Now, perhaps you're thinkin' Big Deal. So you jump into and out of the edges of these categories. That just means you're not really solidly "good" or "bad," and it should take you longer to get tenure. That's fair, right? After all, it's not like any substantial portion of teachers are actually jumping back and forth between the top half and the bottom third?

  • In ELA,  14% of those in the top half in 2010 were in the bottom third in 2009
  • In ELA, 23.9% in the top half in 2009 were in the bottom third in 2010
  • In Math (where the scores are more stable in part because they appear to retain some biases), 9% of those in the top half in 2010 were in the bottom third in 2009
  • In Math, 26% of those in the bottom third in 2009 were in the top half in 2010 and nearly 16% of those in the top half in 2009 ended up in the bottom third in 2010.

[corrected]

Most of these shifts, if not nearly all of them, are not because the teacher actually became a good teacher or a bad teacher from one year to the next.

The big issue here is the human side of this puzzle. None of the existing simulations of the supposed positive effects of deselection or tightened tenure requirements – of leveraging VAM estimates to improve student outcomes – makes even a halfhearted attempt to account for human behavioral responses to a system driven by these imprecise and potentially inaccurate metrics. All adopt the oversimplified "all else equal" assumption of an unending supply of new teacher candidates equal in quality to the current average teacher, with comparable standard deviation.

Reformy arguments ratchet these assumptions up a notch. The most reformy arguments in favor of moving toward these types of tenure and de-tenuring provisions posit that making tenure empirically performance based and de-selecting the "bad" teachers will strengthen the teaching profession – that better applicants, the top third of college graduates, will suddenly flock to teaching instead of other, currently higher paying professions.

But with so little control over one's destiny, is that really likely to be the case? It certainly stands to be a frustrating endeavor to achieve any level of job stability. And it doesn't look like average compensation will be rising in the near future to compensate for this dramatic increase in risk. Further, if we tie compensation to these ratings, either as one-time bonuses or as salary adjustments, many teachers who by chance get good ratings in one year will, by chance again, get bad ratings the next. Teachers will have a difficult time even guessing what their compensation might look like the following year. And since the ratings are necessarily relative (based on percentiles), the distribution of additional compensation must involve winners and losers: the luckier one or a handful of teachers get in a given year, the larger the share of the merit pot they receive and the less others receive. Once again, I do mean LUCK.

Who will really be standing in line to take these jobs? In the best case (depending on one’s point of view), perhaps a few additional energetic grads of highly selective colleges will jump into the mix for a couple of years. But as these numbers and frustrations play out over time, the pendulum is certainly likely to swing the other direction.

More risk and more uncertainty without any sign of significantly increased reward is highly unlikely to improve the teaching profession and far more likely to make things much worse, especially in already hard to staff schools and districts!

These numbers are fun to play with. I just can't stop myself. And they have endless geeky academic potential. But I'm increasingly convinced that they have little practical value for improving school quality. And I'm increasingly disturbed by how policy makers have adopted absurd, rigid requirements around these anything-but-precise and questionably accurate metrics.


Seeking Practical Uses of the NYC VAM Data???

A short while back, in a follow up post regarding the Chetty/Friedman/Rockoff study, I wrote about how and when I might use VAM results if I happened to be in a decision making role in a school or district:

I would want to be able to generate a report of the VA estimates for teachers in the district. Ideally, I’d like to be able to generate a report based on alternative model specifications (option to leave in and take out potential biases) and on alternative assessments (or mixes of them). I’d like the sensitivity analysis option in order to evaluate the robustness of the ratings, and to see how changes to model specification affect certain teachers (to gain insights, for example, regarding things like peer effect vs. teacher effect).

If I felt, when poring through the data, that they were telling me something about some of my teachers (good or bad), I might then use these data to suggest to principals how to distribute their observation efforts through the year. Which classes should they focus on? Which teachers? It would be a noisy pre-screening tool, and would not dictate any final decision. It might start the evaluation process, but would certainly not end it.

Further, even if I did decide that I have a systematically underperforming middle school math teacher (for example), I would only be likely to try to remove that teacher if I was pretty sure that I could replace him or her with someone better. It is utterly foolish from a human resource perspective to automatically assume that I will necessarily be able to replace this “bad” teacher with an “average” one.  Fire now, and then wait to see what the applicant pool looks like and hope for the best?

Since the most vocal VAM advocates love to make baseball analogies – pointing out the supposed connection between VAM teacher deselection arguments and Moneyball – consider that statistical advantage in baseball is achieved by trading for players with better statistics – trading up (based on which statistics a team prefers/needs). You don't just unload your bottom 5% or 15% of players in on-base percentage and hope that players with on-base percentage equal to your team average will show up on your doorstep (acknowledging that the baseball statistics analogies for using VAM in teacher evaluation are completely stupid to begin with).

With the recently released NYC data in hand, I now have the opportunity to ponder the possibilities. How, for example, if I were the principal of a given, average sized school in NYC, might I use the VA data on my teachers to counsel them? To suggest personnel changes? Assignment changes, and so on? Would these data, as they are, provide me any useful information about my staff and how to better my school?

For this exercise, I've decided to look at the year to year ratings of teachers in a relatively average school. Now, why would I bother looking at the year to year ratings when we know that the multi-year averages are supposed to be more accurate – more representative of the teacher's contributions over time? Well, you'll see in the graphs below that those multi-year averages also may not be that useful. In many cases, given how much teacher ratings bounce around from year to year, it's rather like assigning a grade of "C" to the kid who got Fs on the first two tests of the semester and As on the next two – or even some random sequence of Fs and As. Averages, or aggregations, aren't always that insightful. So I've decided to peel it back a bit, as I likely would if I were the principal of this school seeking insights about how to better use my teachers and/or how to work with them to improve their art.
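The grade analogy is easy to make concrete. With made-up percentile histories:

```python
# Made-up example of how a multi-year average hides volatility: two very
# different rating histories produce the identical average percentile.
steady = [50, 50, 50, 50]   # consistently average, year after year
volatile = [10, 90, 5, 95]  # bounces between the extremes


def avg(xs):
    return sum(xs) / len(xs)


print(avg(steady), avg(volatile))  # → 50.0 50.0
```

Both teachers would report out as dead average, but only one of those histories actually looks like average teaching.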

Here are the year to year Math VA estimates for my teachers who actually continue in my building from one year to the next:

Focusing on the upper left graph first, in 2008-09, Rachel, Elizabeth and Sabina were somewhat below average. In 2009-10 they were slightly above average. In fact, going to the prior year (07-08), Elizabeth and Sabina were slightly above average, and Rachel below. They reshuffle again, each somewhat below average in 2006-07, but only Rachel has a score for the earliest year. Needless to say, it's a little tricky figuring out how to interpret differences among these teachers from this very limited view of very noisy data. Julie is an interesting case here. She starts above average in 05-06, moves below average, then well above average, then back to below. She's never in the same place twice. There could be any number of reasons for this that are legitimate (different class composition, different life circumstances for Julie, etc.). But, more likely it's just the noise talkin'! Then there's Ingrid, who held her own in the upper right quadrant for a few years, then disappears. Was she good? Or lucky? Glen also appears to be a two-in-a-row Math teaching superstar, but we'll have to see how the next cycle works out for him.

Now, here are the ELA results:

If we accept these results as valid (a huge stretch), one might make the argument that Glen spent a bit too much of his time in 2008-09 trying to be a Math teaching superstar, and really shortchanged ELA. But he got it together and became a double threat in 2009-10? Then again, I think I'd have to wait and see if Glen's dot in the picture actually persists in any one quadrant for more than a year or two, since most of the others continue to bounce all over the place. Perhaps Julie, Rachel, Elizabeth and Sabina really are just truly average teachers in the aggregate – if we choose to reduce their teaching to little blue dots on a scatterplot. Or perhaps these data are telling me little or nothing about their teaching. Rachel and Julie were both above average in 05-06, along with Ingrid, who has since left the school (or at least the VAM mix). Rachel drops below average and is joined by Sabina the next year. Jennifer shows up as a two-year very low performer, then disappears from the VAM mix. But Julie, Rachel, Sabina and Elizabeth persist, and good for them!

So, now that I’ve spent all of my time trying to figure out if Glen is a legitimate double-threat superstar and what, if anything, I can make of the results for Julie, Rachel, Elizabeth and Sabina, it’s time to put this back into context and take a look at my complete staffing roster for this school (based on the 2009-10 NYSED Personnel Master File). Here it is by assignment code, where “frequency” refers to the total number of assigned positions in a particular area:

So, wait a second, my school has a total of 28 elementary classroom teachers. I do have a total of 11 ELA and 10 Math ratings in 2009-10, but apparently fewer than that (as indicated above) for teachers teaching the same subject and grade level in sequential years (the way in which I merged my data). Ratings start in 4th grade, so that knocks out a big chunk of even my core classroom teachers.

I’ve got a total of 108 certified positions in my school, and I’m spending my time trying to read these tea leaves, which pertain to, oh… about 5% of my staff (those who are actually there, and rated on multiple content areas, for more than a few years).

By the way, by the time I’m looking at these data, it’s 2011-12, two years after the most recent value-added estimates, and not too many of my teachers are posting value-added estimates more than a few years in a row. How many more are gone now? Sabina, Rachel, Elizabeth, Julie? Are you still even there? Further, even if they are there, I probably should have been trying to make important decisions in the interim, not waiting for this stuff. I suspect the reports will more likely be produced on a one-year lag going forward, but even then I have to wait to see how year-to-year ratings stack up for specific teachers.

From a practical standpoint, as someone who would probably try to make sense of this type of data if I was in the role of school principal (‘cuz data is what I know, and real “principalling” is not!), I’m really struggling to see the usefulness of it.

See also my previous post on Inkblots and Opportunity Costs.

Note for New Jersey readers: It is important to understand that there are substantive differences between the value-added estimates produced in NYC and the Student Growth Percentiles being produced in NJ. The bottom line – while the value-added estimates above fail to provide me with any meaningful insights, they are conceptually far superior (for this purpose) to SGP reports.

These value-added estimates actually are intended to sort out the teacher effect on student growth. They try to correct for a number of factors, as I discuss in my previous post.

Student Growth Percentiles do not even attempt to isolate the teacher effect on student growth, and therefore it is entirely inappropriate to try to interpret SGP’s in this same way. SGPs could conceivably be used in a VAM, but by no means should ever stand alone.

They are NOT A TEACHER EFFECTIVENESS EVALUATION TOOL. THEY SHOULD NOT BE USED AS SUCH.  An extensive discussion of this point can be found here:

https://schoolfinance101.wordpress.com/2011/09/02/take-your-sgp-and-vamit-damn-it/

https://schoolfinance101.wordpress.com/2011/09/13/more-on-the-sgp-debate-a-reply/

You’ve Been VAM-IFIED! Thoughts (& Graphs) on the NYC Teacher Data

Readers of my blog know I’m both a data geek and a skeptic of the usefulness of Value-added data specifically as a human resource management tool for schools and districts. There’s been much talk this week about the release of the New York City teacher ratings to the media, and subsequent publication of those data by various news outlets. Most of the talk about the ratings has focused on the error rates in the ratings, and reporters from each news outlet have spent a great deal of time hiding behind their supposed ultra-responsibleness of being sure to inform the public that these ratings are not absolute, that they have significant error ranges, etc.  Matt Di Carlo over at Shanker Blog has already provided a very solid explanatory piece on the error ranges and how those ranges affect classification of teachers as either good or bad.

But, the imprecision – as represented by error ranges – of each teacher’s effectiveness estimate is but one small piece of this puzzle. And in my view, the various other issues involved go much further in undermining the usefulness of the value added measures which have been presented by the media as necessarily accurate albeit lacking in precision.

Remember, what we are talking about here are statistical estimates generated on tests of two different areas of student content knowledge – math and English language arts.  What is being estimated is the extent of change in score (for each student, from one year to the next) on these particular forms of these particular tests of this particular content, and only for this particular subset of teachers who work in these particular schools.

We know from other research (from Corcoran and Jennings, and from the first Gates MET report) that value added estimates might be quite different for teachers of the same subject area if a different test of that subject is used.

We know that summer learning may affect student annual value added, yet in this case, NYC is estimating teacher effectiveness on student outcomes from year to year. That is, the difference in a student’s score from one day in the spring of 2009 to another in the spring of 2010 is being attributed to a teacher who has contact with that child for a few hours a day from September to June (but not July and August).

The NYC value-added model does indeed include a number of factors which attempt to make fairer comparisons between teachers of similar grade levels, similar class sizes, etc. But we also know that those attempts work only so well.

Focusing on error rate alone presumes that we’ve got the model and the estimates right – that we are making valid assertions about the measures and their attribution to teaching effectiveness.

That is, that we really are estimating the teacher’s influence on a legitimate measure of student learning in the given content area.

Then error rates are thrown into the discussion (and onto the estimates) to provide the relevant statistical caveats about their precision.

That is, accepting that we are measuring the right thing and rightly attributing it to the teacher, there might be some noise – some error – in our estimates.

If the estimates lack validity, or are biased, the rate of noise, or error around the invalid or biased estimate is really a moot point.

In fact, as I’ve pointed out before on this blog, it is quite likely that value added estimates that retain bias by failing to fully control for outside influences are actually likely to be more stable over time (to the extent that the outside influences remain more stable over time). And that’s not a good thing.

So, to the news reporters out there, be careful about hiding behind the disclaimer that you’ve responsibly provided the error rates to the public. There’s a lot more to it than that.

Playing with the Data

So, now for a little playing with the data, which can be found here:

http://www.ny1.com/content/top_stories/156599/now-available–nyc-teacher-performance-data-released-friday#doereports

I personally wanted to check out a few things, starting with assessing the year to year stability of the ratings. So, let’s start with some year to year correlations achieved by merging the teacher data reports across years for teachers who stayed in the same school teaching the same subject area to the same grade level. Note that teacher IDs are removed from the data. But teachers can be matched within school, subject and grade level, by name over time (by concatenating the dbn [school code], teacher name, grade level and subject area [changing subject area and grade level naming to match between older and newer files]). First, here’s how the year to year correlations play out for teachers teaching the same grade, subject area and in the same school each year.
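For those curious, the merge logic can be sketched in a few lines of pandas (the column names and values here are hypothetical; the actual data reports use different headers that would first need to be harmonized across years):

```python
import pandas as pd

# Two toy "years" of teacher data reports (made-up names and scores).
y1 = pd.DataFrame({
    "dbn": ["01M015", "01M015", "02M042"],
    "teacher": ["RACHEL", "GLEN", "JULIE"],
    "grade": ["5", "5", "4"],
    "subject": ["Math", "Math", "Math"],
    "va_2009": [-0.3, 1.2, 0.4],
})
y2 = pd.DataFrame({
    "dbn": ["01M015", "01M015", "02M042"],
    "teacher": ["RACHEL", "GLEN", "JULIE"],
    "grade": ["5", "5", "4"],
    "subject": ["Math", "Math", "Math"],
    "va_2010": [0.1, 0.9, -0.2],
})

# The concatenated match key described above: school code + teacher
# name + grade + subject (since teacher IDs are stripped from the files).
for df in (y1, y2):
    df["key"] = df["dbn"] + "|" + df["teacher"] + "|" + df["grade"] + "|" + df["subject"]

# An inner merge keeps only teachers in the same school, grade and
# subject in both years -- the restriction used for the correlations.
merged = y1.merge(y2[["key", "va_2010"]], on="key")
r = merged["va_2009"].corr(merged["va_2010"])
print(len(merged), round(r, 2))
```

Nothing fancy; the real work is in standardizing names, grade levels and subject labels between the older and newer files.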

Sifting through the Noise

As with other value-added studies, the correlations across teachers in their ratings from one year to the next seem to range from about .10 to about .50. Note that between 2009-10 and 2008-09, Math value-added estimates were relatively highly correlated compared to previous years (with little clear evidence as to why, but for possible changes to assessments, etc.). Year to year correlations for ELA are pretty darn low, especially prior to the most recent two years.

Visually, here’s what the relationship between the most recent two years of ELA VAM ratings looks like:

I’ve done a little color coding here for fun. Dots coded in orange are those that stayed in the “average” category from one year to the next. Dots in bright red are those that stayed “high” or “above average” from one year to the next and dots in pale blue were “low” or “below average” from one year to the next. But there are also significant numbers of dots that were above average or high in one year, and below average or low in the next.  9 to 15% (of those who were “good” or were “bad” in the previous year) move all the way from good to bad or bad to good. 20 to 35% who were “bad” stayed “bad” & 20 to 35% who were “good” stayed “good.” And this is between the two years that show the highest correlation for ELA.
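The year-to-year category churn in these plots amounts to a transition table, which is easy to tabulate with pandas (the labels below are toy data, not the actual NYC counts):

```python
import pandas as pd

# Toy two-year rating categories for eight hypothetical teachers.
df = pd.DataFrame({
    "cat_y1": ["good", "good", "bad", "bad", "average", "good", "bad", "average"],
    "cat_y2": ["good", "bad", "bad", "good", "average", "good", "average", "good"],
})

# Row-normalized transition table: of the teachers in each year-1
# category, what share landed in each year-2 category?
trans = pd.crosstab(df["cat_y1"], df["cat_y2"], normalize="index")
print(trans.round(2))
```

Reading across a row gives the kind of figures quoted above: the share of "good" teachers who stayed "good," flipped to "bad," and so on.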

Here’s what the math estimates look like:

There’s actually a visually identifiable positive relationship here. Again, this is the relationship between the two most recent years, which by comparison to previous years, showed a higher correlation.

For math, only about 7% of teachers jump all the way from being bad to good or good to bad (of those who were “good” or “bad” the previous year), and about 30 to 50% who were good remain good, or who were bad, remain bad.

But, that still means that even in the more consistently estimated models, half or more of teachers move into or out of the good or bad categories from year to year, between the two years that show the highest correlation in recent years.

And this finding still ignores whether other factors may be at play in keeping teachers in certain categories. For example, whether teachers stay labeled as ‘good’ because they continue to work with better students or in better environments.

Searching for Potential Sources of Bias

My next fun little exercise in playing with the VA data involved merging the data by school dbn to my data set on NYC school characteristics. I limited my sample for now to teachers in schools serving all grade levels 4 to 8 and with complete data in my NYC schools data, which include a combination of measures from the NCES Common Core and NY State School Report Cards. I did a whole lot of fishing around to determine whether there were any particular characteristics of schools that appeared to be associated with individual teacher value-added estimates, with the likelihood that a teacher ended up being rated “good” or “bad” by my aggregations used here, or both. I will present my preliminary findings with respect to those likelihoods here.

Here are a few logistic regression models of the odds that a teacher was rated “good” or rated “bad” based on a) the multi-year value-added categorical rating for the teacher and b) based on school year 2009 characteristics of their school across grades 4 to 8.

After fishing through a plethora of measures on school characteristics (because I don’t have classroom characteristics for each teacher), I found that, with relative consistency, using the Math ratings, teachers in schools with higher math proficiency rates tended to get better value-added estimates for math and were more likely to be rated “good.” This result was consistent across multiple attempts, models and subsamples (note that I’ve only got 1,300 of the total math teachers rated here… but it’s still a pretty good and well distributed sample). Also, teachers in schools with larger average class size tended to have a lower likelihood of being classified as “above average” or “high” performers. These findings make some sense, in that peer group effect may be influencing teacher ratings, and class size effects (spillover, perhaps?) may not be fully captured in the model. The attendance rate factor is somewhat more perplexing.
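To illustrate the shape of this kind of check (not the actual models or estimates), here's a minimal logistic regression sketch on simulated data, where higher school math proficiency raises, and larger class size lowers, the odds of a "good" rating:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated school covariates (hypothetical scales and coefficients).
prof = rng.uniform(0.2, 0.9, n)   # school math proficiency rate
size = rng.uniform(18, 30, n)     # average class size
true_logit = -1.0 + 3.0 * prof - 0.10 * (size - 24)
good = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Fit by Newton-Raphson on the log-likelihood -- a bare-bones stand-in
# for a stats package's logistic regression routine.
X = np.column_stack([np.ones(n), prof, size - 24])
beta = np.zeros(3)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (good - mu))

# Sign pattern: positive on proficiency, negative on class size.
print(np.round(beta, 2))
```

With real data the right-hand side would be the school characteristics file merged in by dbn, and the outcome the "good"/"bad" classification.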

Again, these models were run with the multi-year value added classification.

Next, I checked to see if there were differences in the likelihood of getting back to back good or back to back bad ratings by school characteristics. Here are the models:

As it turns out, the likelihood of achieving back to back good or back to back bad ratings is also influenced by school characteristics. Here, as class size increases by 1 student, the likelihood that a teacher in that school gets back to back bad ratings goes up by nearly 8%. The likelihood of getting back to back good ratings declines by 6%. The likelihood of getting back to back good ratings increases by nearly 8% in a school with 1% higher math proficiency rate in grades 4 to 8.
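For readers wondering how a logit coefficient becomes a statement like "nearly 8%": a coefficient b on a predictor implies a (e^b − 1) × 100 percent change in the odds per one-unit increase. The value below is an illustrative stand-in, not the actual estimate from these models:

```python
import math

# Hypothetical logit coefficient on class size (one additional student).
b = 0.075

pct_change_in_odds = (math.exp(b) - 1) * 100
print(round(pct_change_in_odds, 1))  # 7.8 -- i.e., "nearly 8%"
```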

These are admittedly preliminary checks on the data, but these findings in my view do warrant further investigation into school level correlates with the math value added estimates and classifications in particular. These findings are certainly suggestive of possible estimate bias.

Who Gets VAM-ED?

Finally, while there’s been much talk about these ratings being released for such a seemingly large number of teachers – 18,000 – it’s important to put those numbers in context in order to evaluate their relevance. First of all, it’s 18,000 ratings, not teachers. Several teachers are rated for both math and ELA, bringing the total number of individuals down significantly from 18,000.  In still generous terms, the 18,000 or so are more like “positions” within schools, but even then, the elementary classroom teacher covers both areas even within the same assignment or position.

Based on the NY State Personnel Master File for 2009-10, there were about 150,000 certified staffing assignments in New York City in 2009-10 that are linkable to individual schools, including those in the VA reports (where individual teachers may cover more than one assignment). In that light, 18,000 is not that big a share.

But let’s look at it at the school level using two sample schools. For these comparisons I picked two schools which had among the largest numbers of VA math estimates (with many of the same teachers in those schools having VA ELA estimates).  The actual listing of teacher assignments is provided for two schools below, along with the number of teachers for whom there were Math VA estimates.  Again, these are schools with among the highest reported number (and share) of teachers who were assigned math effectiveness ratings.

In each case, we are Math VAM-ing around 30% of total teacher assignments [not teachers, but assignments] (with substantial overlap for ELA). Clearly, several of the teacher assignments in the mix for each school are completely un-VAM-able. States such as Tennessee have adopted the absurd strategy that these other staff should be evaluated on the basis of the scores for those who can be VAM-ed.

A couple of issues are important to consider here. First, these listings more than anything convey the complexity of what goes on in schools – the type of people who need to come together and work together collectively on behalf of the interests of kids. VAM-ing some subset of those teachers and putting their faces in the NY Post is unhelpful in many regards. Certainly there exist significant incentives for teachers to migrate to un-VAMed assignments to the extent possible. And please don’t tell me that the answer to this dilemma is to VAM the Orchestra conductor or Art teacher. That’s just freakin’ stupid!

As Preston Green, Joseph Oluwole and I discuss in our forthcoming article in the BYU Education and Law Journal, coupling the complexities of staffing real schools and evaluating the diverse array of professionals that exist in those schools with VAM-based rating schemes necessarily means adopting differentiated contractual agreements, leading to numerous possible perverse incentives and illogical management decisions (as we’ve already seen in Tennessee as well as in the structure of the DC IMPACT contract).

Student Enrollments & State School Finance Policies

Most readers of the NJDOE report on reforming the state’s school finance formula likely glided right past the seemingly innocuous recommendation to shift the enrollment count method for funding from a fall enrollment count to an average daily attendance figure. After all, on its face, the argument provided seems to make sense. Let’s fund on this basis so that we can incentivize increased attendance in our most impoverished and low performing districts. (Another argument I’ve heard in other states is “why would we fund kids who aren’t there?”). The data were even presented to validate that attendance rates are lower in these districts (Figure 3.1).

I, however, could not let this pass, because Average Daily Attendance as a basis for funding is actually a well understood trick of the trade for reducing aid to districts and schools with higher poverty and minority concentrations.  I have both blogged about this topic in the past, and written published research directly and indirectly related to the topic.[1]

The intent of this blog post is to provide a (very limited, oversimplified) primer on the common methods of counting general student populations for purposes of determining state aid to schools (charter and district) and to provide some commentary on the pros and cons of each.

This blog post doesn’t touch upon the layers of additional factors associated with counting all of the various special student categories that may drive additional aid to local public school districts and charter schools.  I have, however, written numerous articles and reports on that topic as well. I’m writing about the underlying, basic count methods in this post because they are so often overlooked. But, they tend to have multiplicative effects throughout state school finance formulas.

So, here’s the primer (in somewhat oversimplified terms since there are multiple permutations on each):

Definitions

Fall Enrollment Count

A fall enrollment or fall attendance count is often based on the count of students either enrolled or specifically in attendance on a single date early in the fall of the school year (Oct 1, Oct 15, etc.). That figure may be based on students who have enrolled in a district or on students who actually attended on the given day. These single day counts in the fall are sometimes reconciled with a spring/January re-calculation leading to either upward or downward adjustments in remaining aid payments.

Average Daily Attendance

Average daily attendance counts are based on the numbers of children actually in attendance in a school or district each day, then, typically averaged on a bimonthly or quarterly basis in order to determine mid-year adjustments to state aid.

Average Daily Membership

Average Daily Membership or Average Daily Enrollment measures the numbers of children enrolled to attend a specific district throughout the year, and may also be periodically reconciled, as students enter and leave the district or school mid-year.
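The three definitions can be made concrete with a toy attendance table (the roster, dates and enrollment spells below are entirely made up):

```python
import pandas as pd

# Daily records for a tiny "district": one row per (student, day).
# Student A leaves after day 2; student C enrolls on day 3.
records = pd.DataFrame([
    ("A", 1, 1), ("A", 2, 1),
    ("B", 1, 1), ("B", 2, 0), ("B", 3, 1), ("B", 4, 1),
    ("C", 3, 1), ("C", 4, 0),
], columns=["student", "day", "present"])

# Fall enrollment count: students on the roll on the count date (day 1).
fall_count = records.loc[records["day"] == 1, "student"].nunique()

# ADA: average number of students actually present per school day.
ada = records.groupby("day")["present"].sum().mean()

# ADM: average number of students on the roll per day.
adm = records.groupby("day")["student"].nunique().mean()

print(fall_count, ada, adm)
```

Here the fall count and ADM both come out to 2 students, while ADA comes out to 1.5 – every absence pulls the funded count down, which is precisely the mechanism at issue in the comments that follow.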

Comments on Each

Fall Enrollment Count

Fall enrollment counts allow for rational annual budget planning.  Note that there is a difference between enrollment and attendance.  Conceptually, attendance can’t exceed enrollment, if enrollment represents all those eligible to attend and enrolled to attend a particular school or district.   To some degree, it makes sense to base funding on the students enrolled rather than those that can be tracked down to attend on a single day in the fall.

Single point in time enrollment counts do not allow for mid-term adjustments to aid when students come or go during the school year. One might argue that this means that districts with significant mid-year attrition will be overpaid throughout the year. But these districts have had to plan their budgets and staffing based on the numbers they expected at the beginning of the year (though usually state aid estimates for budgeting purposes are based on prior year fall enrollments), and cannot easily make mid-year adjustments to accommodate losses in aid resulting from losses in students.

Average Daily Attendance (ADA)

One major problem with ADA is that districts must plan their budgets and staffing on an annual basis, and mid-year adjustments based on attendance counts result in reductions in aid that are difficult to absorb mid-stream in the school year. The bottom line is that districts and charter schools are obligated to have services available for all who might attend, not just all who do on a given day.

In addition, districts with higher poverty concentrations and high minority concentrations tend to have lower attendance rates for a variety of reasons beyond their control. Students from disrupted, low-income households are more likely, for example, to have illnesses that go untreated, to be malnourished, or to be exposed to other factors (second-hand smoke and other environmental hazards) that compromise their health. They have less access to transportation, and often come from single-parent households, limiting parental supports to get them out the door to school. One cannot fix these factors by reducing aid to school districts facing these dilemmas.

It is well understood that financing schools on the basis of average daily attendance systematically reduces aid to higher poverty districts.  The NJDOE report acknowledges that funding on this basis would lead to a reduction in aid of over 3% for districts in DFG A versus average districts (see figure 3.1).  Further, there is no substantive evidence that funding formulas based on ADA have ever improved or better balanced student attendance rates by district poverty and race over time.
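The arithmetic is straightforward. Here is a stylized sketch (the foundation amount and attendance rates are illustrative, not New Jersey's actual figures) of how funding on ADA converts attendance gaps directly into aid gaps:

```python
# Two stylized districts, identical in enrollment and per-pupil amount.
foundation = 12000   # illustrative aid per counted pupil
enrollment = 1000

aid_enroll = foundation * enrollment  # aid under an enrollment count

# Under ADA, aid is effectively scaled by the attendance rate.
results = {}
for name, attendance_rate in [("low-poverty", 0.96), ("high-poverty", 0.92)]:
    aid_ada = foundation * enrollment * attendance_rate
    loss_pct = 100 * (aid_enroll - aid_ada) / aid_enroll
    results[name] = loss_pct
    print(name, round(aid_ada), round(loss_pct, 1))
```

Both districts lose aid relative to an enrollment count, but the high-poverty district loses twice as much, despite being obligated to staff and serve every enrolled child.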

Using ADA as the basis for determining funding can have other unintended consequences, such as increased numbers of school closure days in order to reduce the risk of low attendance.[2] School districts might, for example, choose to close for increased numbers of days during flu season, as attendance drops off. Closures typically do not reduce average daily attendance. In fact, closures are used by schools/districts operating under this model as a way to avoid low attendance days.  And some districts may be more significantly affected than others in this regard. Weather related decisions may also be affected.

Average Daily Membership (ADM)

ADM requires the state, in collaboration with school districts, to accurately manage enrollment information. It is unclear whether NJDOE presently has the capacity to implement ADM in New Jersey.

As with average daily attendance, districts plan their budgets and staffing on an annual basis, and mid-year adjustments to enrollment, leading to reductions in aid, may not easily be absorbed mid-stream.

Within year moves tend to more often affect higher poverty, urban districts,[3] potentially causing greater fluctuations in the budgets of these districts and complicating their financial planning.

A Few Examples from States

States in the Northeast do not tend to use Average Daily Attendance as their method for determining school aid, though New York State had used attendance as a factor in a prior school funding formula.[4]  Presently among Northeastern states, Connecticut uses Resident Pupils within its Education Cost Sharing Formula,[5] New York uses ADM toward the estimation of Total Aidable Foundation Pupil Units*,[6] Pennsylvania uses ADM,[7] Massachusetts uses a Fall Enrollment figure,[8] and Rhode Island uses ADM.[9] Other states around the country, including Kansas[10] and Colorado,[11] use a fall enrollment count date.  Many others around the country use variations on either ADM or FTE, including Florida and Tennessee.  A few states (e.g., Missouri,[12] Texas and Illinois) still use ADA.  But published literature and legal analyses have, in fact, criticized the racially disparate effects of Missouri’s school funding formula (prior to recent reforms).[13]

Application to New Jersey Data

So, just how disparate are attendance rates across New Jersey school districts, by race and low income status, as well as by district factor grouping? Here are a few quick graphs based on the 2010-11 school level data on enrollments (enr file from NJDOE) and attendance rates (school report card d-base).

In short, what these graphs show is that if aid were allocated by average daily attendance as opposed to by enrollment or membership, districts with higher percent black population or higher percent low income, would receive systematic reductions to their state aid. These reductions would be non-trivial.  High school attendance in a school that is 100% black is, on average, nearly 7% lower than in a school that is 0% black. In elementary schools, the differential is between 2% and 3%.   These differentials would translate directly to percent reductions in aid.
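A minimal sketch of that translation (the 95% base rate and the linear form are assumptions for illustration; the roughly 7-point high school differential is from the graphs described above):

```python
# Assumed: 95% attendance at 0% black enrollment, declining linearly by
# 7 percentage points moving to 100% black enrollment (illustrative).
def predicted_attendance(pct_black, base=0.95, gap=0.07):
    return base - gap * pct_black

base_att = predicted_attendance(0.0)
for pct in (0.0, 0.5, 1.0):
    att = predicted_attendance(pct)
    aid_cut_pct = 100 * (1 - att / base_att)
    print(pct, round(att, 3), round(aid_cut_pct, 1))
```

Under ADA funding, the school at 100% black enrollment would see roughly a 7.4% relative cut in aid compared with the school at 0%, purely as a function of the attendance differential.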

Enrollment Data: http://www.nj.gov/education/data/enr/

Attendance Data: http://education.state.nj.us/rc/rc10/index.html

*Note: In some parts of the NY Aid formulas, the local wealth measure for taxable assessed value per pupil uses a variant of ADA in the denominator.  This use is generally much less significant to the overall calculation of aid than using ADA directly in the calculation of the foundation allotment.


[1] Green, P.C., Baker, B.D. (2006) Urban Legends, Desegregation and School Finance: Did Kansas City Really Prove that Money Doesn’t Matter? Michigan Journal of Race and Law. 12 (1)

Baker, B.D., Green, P.C. (2005) Tricks of the Trade: Legislative Actions in School Finance that Disadvantage Minorities in the Post-Brown Era American Journal of Education 111 (May) 372-413

[3] Killeen, K., Baker, B.D. Addressing the Moving Target: Should measures of student mobility be included in education cost studies? (Available on request)

[5]http://www.sde.ct.gov/sde/lib/sde/PDF/dgm/report1/merecsgd.pdf  “Resident Students are those regular education and special education pupils enrolled at the expense of the town on October 1 of each school year.”

[6]https://stateaid.nysed.gov/budget/combaidsa_0910.htm  For calculating Foundation Aid, which has been frozen since this point in time.

[8]http://finance1.doe.mass.edu/chapter70/enrollment_desc.pdf. “In order to be included, a student must be officially enrolled on October 1st. Those who leave in September or arrive after October 1st are not counted. A student who happens to be absent on October 1st is included nonetheless; this is a measure of enrollment, not attendance.”

[13]Green, P.C., Baker, B.D. (2006) Urban Legends, Desegregation and School Finance: Did Kansas City Really Prove that Money Doesn’t Matter? Michigan Journal of Race and Law. 12 (1)   Baker and Green (2006) explain: “Missouri is among a handful of states that continues to provide aid to local public school districts on the basis of their average daily attendance (ADA) rather than enrolled pupil count or membership. From 2000 to 2004, poverty rates and black student population share alone explain 59% of variations in attendance rates across Missouri school districts enrolling over 2,000 students. Both black population share and poverty rate are strongly associated with lower attendance rates, leading to systematically lower funding per eligible or enrolled pupil in districts with higher shares of either population.”

How NOT to fix the New Jersey Achievement Gap

Late yesterday, the New Jersey Department of Education released its long-awaited report on the state school finance formula. For a little context, the formula was adopted in 2008 and upheld by the court as meeting the state constitutional standard for providing a thorough and efficient system of public schooling. But court acceptance of the plan came with a requirement of a review of the formula after three years of implementation. After a change in administration, with additional legal battles over cuts in aid in the interim, we now have that report.  The idea was that the report would suggest any adjustments that may need to be made to the formula to make the distributions of aid across districts more appropriate/more adequate (more constitutional?). I laid out my series of proposed minor adjustments in a previous post.

Reduced to its simplest form, the current report argues that New Jersey’s biggest problem in public education is its achievement gap – the gap between poor and minority students and their non-poor, non-minority peers.  And the obvious proposed fix? To reduce funding to high-poverty, predominantly minority school districts and increase funding to less poor districts with fewer minorities.

Why? Because money and class size simply don’t matter. Instead, teacher quality and strategies like those used in the Harlem Children’s Zone do!

Here’s my quick, day-after, critique:

The Obvious Problem? New Jersey’s Huge & Unchanging Achievement Gap

The front end of the report provides lots of nifty graphs based on cohort proficiency rates on tests which change substantially in some years. The graphs are neatly laid out to validate the argument that New Jersey’s achievement gap is large and hasn’t changed much.  First, on the point of the size of the gap in national context: I’ve explained here how the NJ poor/non-poor gap is actually relatively average nationally. That’s not to say that it’s acceptable; we ought to work on this, by whatever reasonable means we can.

Thankfully (so I don’t have to revisit all of the problems here), the remainder of the achievement gap analysis presented by NJDOE is thoroughly critiqued in a recent post by Matt Di Carlo at Shanker Blog. Di Carlo summarizes some of the NJ achievement gap and trend data to point out:

The results for eighth grade math and fourth grade reading are more noteworthy – on both tests, eligible students in NJ scored 12 points higher in 2011 than in 2005, while the 2011 cohorts of non-eligible students were higher by roughly similar margins.

In other words, achievement gaps in NJ didn’t narrow during these years because both the eligible and non-eligible cohorts scored higher in 2011 versus 2005. Viewed in isolation, the persistence of the resulting gaps might seem like a policy failure. But, while nobody can be satisfied with these differences and addressing them must be a focus going forward, the stability of the gaps actually masks notable success among both groups of students (at least to the degree that these changes reflect “real” progress rather than compositional changes).

http://shankerblog.org/?p=5102

Revelation? Gaps are a function of the height of the highs as much as the depth of the lows. If both get better, gaps don’t close as much. Gaps are still a problem, and must be addressed even if the highs get higher, because opportunity for access to college and in the labor market is relative. But the framing of the NJ achievement gap by NJDOE is unhelpful in this regard, and the proposed solutions harmful. How does it make sense, then, to provide greater increases in state aid to students in districts at the highs and less to those at the lows?
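Di Carlo’s point about parallel gains reduces to simple arithmetic. A minimal sketch (the scale scores below are hypothetical, merely in the spirit of the 12-point gains he cites):

```python
# Hypothetical scale scores: both groups gain 12 points from 2005 to 2011,
# so the gap is unchanged even though both groups improved.
eligible_2005, eligible_2011 = 262, 274          # low-income (eligible) students
non_eligible_2005, non_eligible_2011 = 292, 304  # non-eligible students

gap_2005 = non_eligible_2005 - eligible_2005
gap_2011 = non_eligible_2011 - eligible_2011

print(gap_2005, gap_2011)             # the gap persists: 30 and 30
print(eligible_2011 - eligible_2005)  # yet eligible students gained 12 points
```

The stability of the gap, in other words, says nothing by itself about whether the lows failed to improve.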

Supporting Claims for Solutions?

Of course, supporting the pre-determined (utterly absurd) conclusion that the way to close this achievement gap is to cut aid to the poor and give it to the less poor requires that the report validate that money really has nothing to do with it – that, arguably, all of that money and increased staffing actually made things worse, and that cutting money from poor districts is what will make them better. I guess it also stands to reason that giving larger aid increases to less poor districts might make them worse, and voilà – the achievement gap shrinks!

  • Claim 1: Money Has Nothing to do with It

The claims that money doesn’t matter are built on some graphs which could easily make my list of dumbest graphs (or at least most pointless, deceptive, meaningless ones). Here’s one which is intended to convince the reader that all of that money sent to Abbott districts was for naught:

The report uses the graph to conclude:

While the above analysis is not sufficient to say whether new spending has had a positive impact on student achievement, it makes clear that financial resources are not the only – and perhaps not even the most important – driver of achievement.

If the graph isn’t sufficient to make this point, then why use the graph to try to make this point? Clearly, looking only at two variables – percent change in revenue and percent change in proficiency rates – is not even sufficient to make the softened claim “perhaps not even the most important” factor in improving student achievement.  These assertions can’t be supported in any way by this graph.

But even more suspect is the assertion embedded in the policy recommendations that cutting aid from high-poverty districts will therefore cause no harm.

Better research on whether and to what extent school finance reforms improve student outcomes &/or equity of outcomes shows that in fact, school finance reforms can and do improve both the level and distribution of student outcomes: http://www.tcrecord.org/content.asp?contentid=16106

Higher quality research, in contrast, shows that states that implemented significant reforms to the level and/or distribution of funding tend to have significant gains in student outcomes.

Further, research on the broader question (based on real analysis) of whether and how class size and money matter indicates that, in simple terms, money does matter, and that things that cost money, like class size reduction and improving teacher quality (which does cost money) matter:  http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

Perhaps most importantly, even the research that has cast doubt on the strength of the positive influence of money on student outcomes has never validated that cuts to funding are not harmful and may be helpful. This is an absurd and unfounded claim.

Richard Murnane of Harvard said it well enough back in the early 1990s:

“In my view, it is simply indefensible to use the results of quantitative studies of the relationship between school resources and student achievement as a basis for concluding that additional funds cannot help public school districts. Equally disturbing is the claim that the removal of funds… typically does no harm.” (p. 457)

Murnane, R. (1991). Interpreting the Evidence on “Does Money Matter?” Harvard Journal on Legislation, 28, 457-464.

Though this claim is not directly stated in the NJDOE report, it is implicit in the recommendations.

  • Claim 2: Teacher Quality & Harlem Children’s Zone-Style Strategies Can Close the Gap

Deeply embedded in the NJDOE report, making the transition from claims of dire achievement gaps toward how to fix them, is a discussion of how the obvious solutions based on current research must have to do with improving teacher quality and doing stuff like the Harlem Children’s Zone does. The NJDOE report includes two particularly bold statements that these two strategies alone – but certainly not money – can close the black-white achievement gap:

Having a highly effective teacher for three to five years can erase the deficits that the typical disadvantaged student brings to school.xxiii

Evidence from the Harlem Children’s Zone provides a similar demonstration of the power of schools to close the black-white achievement gap existing in New York.xxiv

Needless to say, these interpretations of the existing research are a massive, unwarranted stretch. Matt Di Carlo addresses the question of just how many teachers it would take to close the achievement gap.

Even then, the implicit assertion of the report in general, that money has nothing to do with teacher quality or the distribution of teacher quality, is ridiculous. As I explain here:

A substantial body of literature has accumulated to validate the conclusion that both teachers’ overall wages and relative wages affect the quality of those who choose to enter the teaching profession, and whether they stay once they get in. For example, Murnane and Olson (1989) found that salaries affect the decision to enter teaching and the duration of the teaching career, while Figlio (1997, 2002) and Ferguson (1991) concluded that higher salaries are associated with more qualified teachers.

http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

And further, on the flip side, cuts to funding and severe constraints on spending growth can reduce teacher quality:

Research on the flip side of this issue – evaluating spending constraints or reductions – reveals the potential harm to teaching quality that flows from leveling down or reducing spending. For example, David Figlio and Kim Rueben (2001) note that, “Using data from the National Center for Education Statistics we find that tax limits systematically reduce the average quality of education majors, as well as new public school teachers in states that have passed these limits.”

And, if we are interested in achievement gaps, and better distributing the quality of teachers across richer and poorer districts and children:

Salaries also play a potentially important role in improving the equity of student outcomes. While several studies show that higher salaries relative to labor market norms can draw higher quality candidates into teaching, the evidence also indicates that relative teacher salaries across schools and districts may influence the distribution of teaching quality. For example, Ondrich, Pas and Yinger (2008) “find that teachers in districts with higher salaries relative to non-teaching salaries in the same county are less likely to leave teaching and that a teacher is less likely to change districts when he or she teaches in a district near the top of the teacher salary distribution in that county.”

But even more strikingly, these interpretations ignore entirely that what the Harlem Children’s Zone does, above and beyond anything else, is spend a ton of money (raising as much as $60,000 per pupil in private giving in some years; for additional information, see this post), much of it on providing smaller class sizes than surrounding NYC district schools. So, in effect, what the Harlem Children’s Zone shows us (in its best light) is that we can make modest progress toward closing achievement gaps by leveraging substantial additional financial resources to provide comprehensive wrap-around community resources coupled with small class sizes.

The Proposal: Cut Aid to the Poor and Give More to the Non-Poor (& Less Poor)

After the rather predictable preamble about New Jersey’s achievement gap – coupled with classic claims that money clearly isn’t the answer, while things that actually cost money (though we’ll pretend they don’t) are – the obvious recommendations for changes to the school finance formula are to reduce aid to the poor and give it to the less poor.

Here are the distributions of the percent change in state aid for 2012-13 across K-12 districts, and the change per pupil (preliminary estimates in need of updated enrollment figures), by districts arranged from lower to higher concentrations of low-income children:

K-12 Unified Districts Only

K-12 Unified Districts Only

The report argues specifically that the adjustments in the aid formula for low-income children should be reduced – reduced, the argument goes, because they were increased without basis over the original recommendations provided to the state board back in 2003 (but hidden until 2006). In short, those low-income kids really don’t need that much and will be better off without it.

I critique those original recommendations in this report. Essentially, the argument is that there is simply no basis for providing as much as an additional 57% per low-income child in high-poverty-concentration districts, therefore we should reduce it. The icing on the cake in this argument is a table in which the report points out that Texas, Vermont and Maine provide less than this. How in the heck they chose Texas, Vermont and Maine is beyond me. These states are at least a little different from NJ… and… from each other.

Beyond that, it should go without saying that the decisions of policymakers in three completely different states that aren’t New Jersey really have little or nothing to do with the cost of providing equal educational opportunity to low income kids in New Jersey.  Are we going to base all of our policies on Vermont… and Texas simultaneously? That would be a real trick? Consider the possibilities?

As my report linked above points out, the weights in the original analysis were too low, and were thus adjusted upwards, though not necessarily far enough. On what basis? Well, the actual research on the costs of providing equal educational opportunities for low-income children points to weights nearer to double, not 40% or 50% higher than average.  Here’s the most directly relevant article, from the Economics of Education Review, and here’s a link to a National Research Council report on the subject.
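To see what’s at stake in the weight itself, here’s a minimal sketch of a weighted, foundation-style calculation. The base amount and enrollment figures are hypothetical; only the weights come from the discussion above (0.57 as the current formula’s top-end at-risk weight, 1.0 as the “nearer to double” figure from the cost research):

```python
# Hypothetical weighted foundation formula sketch. Base amount and enrollment
# are made up; the weights (0.57 vs 1.0) track the figures discussed above.
def formula_aid(base, enrollment, low_income_share, at_risk_weight):
    """Target funding = base x (enrollment + weight x low-income pupils)."""
    low_income_pupils = low_income_share * enrollment
    return base * (enrollment + at_risk_weight * low_income_pupils)

base = 10_000            # hypothetical per-pupil base cost
enrollment = 5_000       # hypothetical high-poverty district
low_income_share = 0.80

current = formula_aid(base, enrollment, low_income_share, 0.57)
research = formula_aid(base, enrollment, low_income_share, 1.00)
print(research - current)  # roughly $17.2 million at stake for this one district
```

The point of the sketch is simply that the choice of weight is a very large dollar difference for a high-poverty district, which is why the empirical basis for the weight matters.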

In a further effort to reduce aid to poorer districts (in a way that will have multiplicative effects throughout the formula), NJDOE proposes to base the allocation of aid on Average Daily Attendance. This is actually a classic, well-understood trick of the trade for shifting aid away from poorer districts, which for a variety of reasons outside their control have lower attendance rates. Way back when I started this blog, one of the topics I wrote about was these seemingly innocuous tricks (a subject of my research). Other states do continue to use these policies, but since their effects are well understood, to recommend such a change is shameless.
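The ADA trick, too, is simple arithmetic. A hypothetical illustration (aid levels, enrollments and attendance rates are all made up; the mechanism is the point):

```python
# Hypothetical illustration of why an Average Daily Attendance (ADA) basis
# shifts aid away from poorer districts relative to an enrollment basis.
aid_per_pupil = 8_000
districts = {
    # name: (enrollment, attendance_rate)
    "lower-poverty district": (5_000, 0.96),
    "higher-poverty district": (5_000, 0.90),  # lower attendance, for reasons outside its control
}

losses = {}
for name, (enrollment, rate) in districts.items():
    enrollment_basis = aid_per_pupil * enrollment
    ada_basis = aid_per_pupil * enrollment * rate  # aid now follows attendance, not enrollment
    losses[name] = enrollment_basis - ada_basis    # aid lost by switching to ADA

print(losses)
```

Both districts enroll (and must staff, house and serve) the same number of children, but the higher-poverty district loses two and a half times as much aid under the ADA basis.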

But even setting aside the empirical evidence on “costs,” how can it possibly make sense that achievement gaps between richer and poorer districts will be moderated by taking money from poorer districts and redistributing it back to less poor ones?

That’s the report in its essence.

We’ve got big achievement gaps.

Money doesn’t matter – in fact – it must be making things worse not better.

Therefore, to close the gaps, we need to give less of that harmful money to the poor, and more to the non-poor.

Go figure?

Reformy Platitudes & Fact-Challenged Placards won’t Get Connecticut Schools what they Really Need!

For a short while yesterday – longer than I would have liked – I followed the circus of testimony and tweets about proposed education reform legislation in Connecticut. The reform legislation – SB 24 – includes the usual reformy elements of teacher tenure reform, ending seniority preferences, expanding and promoting charter schooling, etc. etc. etc. And the reformy circus had twitpics of eager undergrads (SFER) & charter school students (some as young as kindergarten?) shipped in and carrying signs saying CHARTER=PUBLIC (despite a body of case law to the contrary, and repeated arguments – some lost in state courts – by charter operators that they need not comply with open records/meetings laws or disclose employee contracts), and tweeting reformy platitudes and links to stuff they called research supporting the reformy platform (much of it tweeted as “fact checking” by the ever-so-credible ConnCAN).

Ignored in all of this theatre-of-the-absurd was any actual substantive, knowledgeable conversation about the state of public education in Connecticut, the nature of the CT achievement gap and the more likely causes of it, and other problems/failures of Connecticut education policy.

First, that achievement gap:

Yes, Connecticut has a large achievement gap… among the largest. But I encourage you to read my previous post, in which I explain that poverty achievement gaps across states tend to be mostly a function of income disparity within states. The bigger the income difference between rich and poor, the bigger the achievement gap between them. Even so, the CT achievement gap is a problem. CT’s income gaps between poor and non-poor are most similar to those of MA and NJ, but both MA and NJ do better than CT on achievement gap measures. Here’s a graph relating income gap and achievement gap:

Connecticut has a higher than otherwise expected gap and MA, NJ and RI have lower.

But, is this because of teacher tenure? Is it because teachers aren’t regularly fired because of bad student test scores? Is it because there aren’t enough charter or magnet schools in CT? That’s highly unlikely for several reasons.

First, teachers have tenure status in both higher and lower performing, higher and lower income districts in CT. As I show below, teacher salaries are lower and class sizes larger in disadvantaged districts. SB24 does NOTHING to fix that.

As for the highly recognized charter and magnet schools in CT, these schools are actually serving far fewer of the lowest-income kids within the lower-income neighborhoods. So, while they might be doing okay, on average, for the kids they are serving, it is just as likely that they are contributing to the achievement gap as helping to close it. That’s not to say they aren’t helping the students they serve, but rather that the segregated nature of their services capitalizes on the peer effect of concentrating more advantaged children. Either way, these schools are unlikely to serve as a broad-based solution for CT education quality in general or for resolving achievement gaps.

During this same time period, teachers in NJ and MA also had similar tenure protections and weren’t being tenured or fired based on student test scores. Still somehow, those states had smaller gaps. Further, while both other states do have charter schools, New Jersey which has a much smaller achievement gap than CT has thus far maintained a relatively small charter sector. What Massachusetts and New Jersey have done is to more thoroughly and systematically address school funding disparities.

The Real Disparities:

In a previous series of posts, I discussed what I called Inexcusable Inequalities. I actually used CT as the main example, not because CT is among the worst states on funding inequality, but because I happened to have good data on CT. CT is not among the worst – that special space is reserved for NY, IL, PA and a few others – but CT has its problems. Let’s do a quick walk-through of my previous analysis.

I started my previous post by comparing per-pupil spending adjusted for needs and costs across all CT school districts with actual outcomes of those districts, in order to categorize CT districts into more and less advantaged groups. The differences, starting with the figure below, were pretty darn striking. Districts like New Canaan, Westport and Weston have rather high need- and cost-adjusted spending, certainly by comparison with Bridgeport, New London or New Britain.

For illustrative purposes, I then picked a few of the most disadvantaged CT districts and compared them to the most advantaged on a handful of measures – shown below. In this table, I report their nominal spending per pupil – not adjusted for the various needs and additional costs. Even without those adjustments, districts like Bridgeport and New Britain start well behind their more advantaged peers. And among other differences, they pay their teachers less (a) on average and (b) at any given level of experience or education. Pretty darn hard to recruit and retain quality teachers into these settings given the combination of working conditions and lower pay.

AND MAKING TENURE CONTINGENT ON STUDENT TEST SCORES, OR FIRING TEACHERS BASED ON STUDENT TEST SCORES WON’T FIX THAT! IT WILL FAR MORE LIKELY MAKE IT MUCH, MUCH WORSE!

Salary disparity patterns hold when comparing (a) all districts in the upper right of the first figure with (b) all districts in the lower left, and (c) districts furthest in the lower left (severe disparity):

On top of that, class sizes are also larger in the higher need districts, despite the need for smaller class sizes to aid in closing the achievement gaps for these children (more here).

Further, as I showed in my previous post, the funding disparities have significant consequences for the depth and breadth of curricular offerings available to high school students in these districts:

For this analysis, I used individual teacher level data on individual course assignments to determine the distribution of teacher assignments per child, thus characterizing each district’s and group of districts’ offerings (for related research, see: https://schoolfinance101.com/wp-content/uploads/2010/01/b-baker-mo_il-resourcealloc-aera2011.pdf)

Disadvantaged districts have far fewer total positions per child, and if we click and blow up the graph, we can see some striking discrepancies! Those high need districts have far more special education and bilingual education teachers (squeezing out other options, from their smaller pot!). Those high need districts have only about half the access to teachers in physical education assignments or art, much less access to Band (little or none to Orchestra), and significantly less access to math teachers!

IN REALLY SIMPLE TERMS, UNDER CT POLICIES, HIGH NEED DISTRICTS SUCH AS BRIDGEPORT AND NEW BRITAIN HAVE FAR FEWER RESOURCES AND FAR GREATER NEEDS. THEIR TEACHERS HAVE LOWER SALARIES AND, ON AVERAGE, LARGER CLASSES.

Messing with teacher evaluation, especially in ways as likely to do harm as to do good, is an unfortunate distraction at best. Doing so on the basis that those are the policy changes needed to close Connecticut’s achievement gap reflects an astounding degree of utter obliviousness!

What about those amazing CT charter and magnet schools? Aren’t they the ultimate scalable solution?

I’ve written much more detail here, about the issue of whether renowned CT charter schools actually “do more, with less while serving the same students.” Here are a few quick graphs. First, Amistad Academy of New Haven in context, by % free lunch:

Next, Capital Prep in Hartford in context. Now, I typically wouldn’t (shouldn’t) have to point out that a small selective magnet program drawing students across district lines is simply NOT REPRESENTATIVE and not likely a scalable solution for all kids.  It’s a potentially good option for those with access, and much of the benefit of the option likely rests in a selective peer group effect (as noted above). I feel compelled, however, to point out how Capital Prep is (obviously) not a typical school, only because the head of the school seems to be trying to argue that it is a model, scalable reform. (Really? Really? I mean… REALLY?):

But what about Governor Malloy’s funding plan? That’ll fix it! Won’t it?

Amidst all of the reformy platitudes, misguided and fact-challenged placards and the like, there were occasional references to Governor Malloy’s changes to the state school finance formula – seemingly implying that the Governor has taken major steps toward making the (supposedly already overfunded) system fairer. There was certainly no outrage expressed at the types of disparities I note above – just all the warm fuzzy feelings anyone could possibly conjure that any finance package tied to the vast batch of reformyness-on-steroids would be sufficient to get the job done.

After all, new aid would be progressively distributed. Those poor districts would get, on average, about… oh… a whopping $250 per pupil in new aid, while richer districts would get only about $50 per pupil. And with this astounding outlay of fiscal effort, the most important thing is to make sure it doesn’t just go straight into the pockets of those union-lackey-lazy-self-interested teachers, of course – or at least certainly not the “ineffective” ones.
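Back-of-envelope, using the per-pupil aid figures above and hypothetical starting spending levels roughly in the range of the disparities described earlier:

```python
# The $250 vs $50 per-pupil aid figures come from the discussion above;
# the starting per-pupil spending levels are hypothetical, for illustration.
poor_spending, rich_spending = 12_000, 18_000  # hypothetical per-pupil spending
poor_aid, rich_aid = 250, 50                   # Malloy plan, approximate

gap_before = rich_spending - poor_spending
gap_after = (rich_spending + rich_aid) - (poor_spending + poor_aid)
print(gap_before, gap_after)  # a $6,000 gap narrows by all of $200
```

Progressive on paper, in other words, but trivially small against the scale of the underlying disparity.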

Here are the effects of the Malloy funding increases, on a per pupil basis, if added on to Net Current Expenditures per Pupil (pulling out magnet school aid which creates a distorted representation for New Haven and Hartford):

What we have in this picture is each district as a dot (circle or triangle). Districts are sorted from low to high percent free/reduced lunch along the horizontal axis. Net Current Expenditures are on the vertical axis. Blue Circles represent current (okay, last year) levels of current expenditures per pupil. RED TRIANGLES REPRESENT THE ADDITION OF MALLOY AID. Wow… that’s one heck of a difference. That should certainly fix the disparities I laid out above! NOT!

Here it is with district names added, so you can see where some of our more disadvantaged districts start and end up:

Not that helpful for Bridgeport or New Britain, is it?

To summarize:

The fact is that EQUITABLE AND ADEQUATE FUNDING IS THE NECESSARY UNDERLYING CONDITION FOR IMPROVING EDUCATION QUALITY IN CONNECTICUT AND REDUCING ACHIEVEMENT GAPS!!!!!! (related research: http://www.tcrecord.org/library/content.asp?contentid=16106)

Equitable and adequate funding is a necessary underlying condition for running any quality school, be it a traditional public school, charter school or private school. Money matters and it matters regardless of the type of school we’re talking about.

Equitable and adequate funding is required for recruiting and retaining teachers in Connecticut’s high need, currently under-resourced schools (something charter operators realize). Recruiting and retaining teachers to work in these communities will take more, not less money.

Reformy platitudes (and fact-challenged placards) about tenure reform won’t change that.  And altering the job security landscape to move toward ill-conceived evaluation frameworks and flawed metrics will likely hurt far more than it will help.

It’s time to pack up the reformy circus, load up the buses and shred the placards and have some real, substantive conversations about improving the quality and equality of public schooling in Connecticut.

Borrowing wise words from those truly market-based, Private Independent schools…

Lately it seems that public policy and the reformy rhetoric that drives it are hardly influenced by the vast body of empirical work and insights from leading academic scholars which suggests that such practices as using value-added metrics to rate teacher quality, or dramatically increasing test-based accountability and pushing for common core standards and tests to go with them are unlikely to lead to substantial improvements in education quality, or equity.

Rather than review relevant empirical evidence or provide new empirical illustrations in this post, I’ll do as I’ve done before on this blog and refer to the wisdom and practices of private independent schools – perhaps the most market driven segment and most elite segment of elementary and secondary schooling in the United States.

Really… if running a school like a ‘business’ (or more precisely running a school as we like to pretend that ‘businesses’ are run… even though ‘most’ businesses aren’t really run the way we pretend they are) was such an awesome idea for elementary and secondary schools, wouldn’t we expect to see that our most elite, market oriented schools would be the ones pushing the envelope on such strategies?

If rating teachers based on standardized test scores was such a brilliant revelation for improving the quality of the teacher workforce, if getting rid of tenure and firing more teachers was clearly the road to excellence, and if standardizing our curriculum and designing tests for each and every component of it were really the way forward, we’d expect to see these strategies all over the home pages of web sites of leading private independent schools, and we’d certainly expect to see these issues addressed throughout the pages of journals geared toward innovative school leaders, like Independent School Magazine.  In fact, they must have been talking about this kind of stuff for at least a decade. You know, how and why merit pay for teachers is the obvious answer for enhancing teacher productivity, and why we need more standardization… more tests… in order to improve curricular rigor? 

So, I went back and did a little browsing through recent, and less recent issues of Independent School Magazine and collected the following few words of wisdom:

From Winter 2003, when the school where I used to teach decided to drop Advanced Placement courses:

A little philosophy, first. Independent schools are privileged. We do not have to respond to the whims of the state, nor to every or any educational trend. We can maximize our time attuned to students and how they learn, and to the development of curriculum that enriches them and encourages the skills and attitudes of independent thinkers. Our founding charters and missions established independence for a range of reasons, but they now give all of us relative curricular autonomy, the ability to bring together a faculty of scholars and thinkers who are equipped to develop rich, developmentally sound programs of study. As Fred Calder, the executive director of New York State Association of Independent Schools, wrote in a letter to member schools a few years ago: “If we cannot design our programs according to our best lights and the needs of our communities, then let the monolith prevail and give up the enterprise. Standardized testing in subject areas essentially smothers original thought, more fatally, because of the irresistible pressure on teachers to teach to the tests.”

http://www.nais.org/publications/ismagazinearticle.cfm?ItemNumber=144300

Blasphemy? Or simply good education!

And from way, way back in 2000, in a particularly thoughtful piece on “business” strategies applied to schools:

Educators do not respond to the same incentives as businesspeople and school heads have much less clout than their corporate counterparts to foster improvement. Most teachers want higher salaries but react badly to offers of money for performance. Merit pay, so routine in the corporate world, has a miserable track record in education. It almost never improves outcomes and almost always damages morale, sowing dissension and distrust, for three excellent reasons, among others: (1) teachers are driven to help their own students, not to outperform other teachers, which violates the ethic of service and the norms of collegiality; (2) as artisans engaged in idiosyncratic work with students whose performance can vary due to factors beyond school control, teachers often feel that there is no rational, fair basis for comparison; and (3) in schools where all faculty feel underpaid, offering a special sum to a few sparks intense resentment. At the same time, school leaders have limited leverage over poor performers. Although few independent schools have unionized staff and formal tenure, all are increasingly vulnerable to legal action for wrongful dismissal; it can take a long time and a large expense to dismiss a teacher. Moreover, the cost of firing is often prohibitive in terms of its damage to morale. Given teachers’ desire for security, the personal nature of their work, and their comparative lack of worldliness, the dismissal of a colleague sends shock waves through a faculty, raising anxiety even among the most talented.

http://nais.org/publications/ismagazinearticle.cfm?ItemNumber=144267

Unheard of! Isn’t firing the bad teacher supposed to make all of those (statistically) great teachers feel better about themselves? Improve the profession? [that said, we have little evidence one way or the other]

How can we allow our leading private, independent, market-based schools to promote such gobbledygook? Why do they do it? Are they a threat to our national security or our global economic competitiveness because they were not then, nor are they now (see recent issues: http://www.nais.org/) fast-tracking the latest reformy fads? Testing out the latest and greatest educational improvement strategies on their own students, before those strategies get tested on low income children in overcrowded urban classrooms? Why aren’t the boards of directors of these schools – many of whom are leaders in “business” – demanding that they change their outmoded ways? Why? Why? Why? Because what they are doing works! At least in terms of their success in continuing to attract students and produce successful graduates.

Now, that’s not to say that these schools are completely stagnant, never adopting new strategies or reforms. They do new stuff all the time (technology integration, etc.) – just not the absurd reformy stuff being dumped upon public schools by policymakers who in many cases choose to send their own children to private independent schools.

In my repeated pleas to private school leaders to provide insights into current movements in teacher evaluation and compensation, I’ve actually found little change from these core principles of nearly a decade ago. Private independent schools don’t fire at will or fire often, and teacher compensation remains very predictable and traditionally structured. I’d love to know, from my private school readers, how many of their schools have adopted state-mandated tests.

Private independent schools pride themselves on offering small class sizes (see also here) and a diverse array of curricular opportunities, as well as arts, sports and other enrichment – the full package.  And, as I’ve shown in my previous research, private independent schools charge tuition and spend on a per-pupil basis at levels much higher than traditional public school districts operating in the same labor market. They also pay their headmasters well! More blasphemy indeed.

In fact, aside from "no excuses" charter schools – whose innovative programs consist primarily of rigid discipline coupled with longer hours, small group tutoring (not rocket science), and higher teacher salaries (here, here & here) to compensate for the additional work – private independent schools may just be among the least reformy elementary and secondary education options out there.

That's not to say they are anything like "no excuses" charter schools – in many ways they are not. But they are equally non-reformy. In fact, the average school year in private independent schools is shorter, not longer, than in traditional public schools – about 165 days. And the average student load of teachers (course sections x class size) is much lower in the typical private independent school than in traditional public schools. But that ain't reformy stuff at all – no more than trying to improve outcomes of low-income kids by adding hours and providing tutoring.

Nonetheless, for some reason, well-educated people with the available resources keep choosing these non-reformy and expensive schools. Some of these schools have been around for a while too! Maybe, just maybe, it's because they are doing the right things – providing good, well-rounded educational opportunities as many of them have for centuries, adapting along the way (see: http://www.exeter.edu/admissions/109_1220_11688.aspx). Perhaps they've not gone down the road of substantially increased testing, curriculum standardization, and test-based teacher evaluation – firing their way to Finland – because they understand that these policy initiatives offer little to improve school quality, and much potential for damage.

Perhaps there are some lessons to be learned from market-based systems. But perhaps we should be looking to those market-based systems that have successfully provided high-quality schooling for centuries to our nation's most demanding, affluent, and well-educated leaders, rather than basing our policy proposals on some make-believe, highly productive private-sector industry where new technologies reduce production costs to near $0 and complex statistical models are used to annually deselect non-productive employees.

Just pondering the possibilities, and still waiting for Zuck (an Exeter alum) to invest in Harkness Tables for Newark Public Schools and class sizes of 12 across the board!