Blog

The Circular Logic of Quality-Based Layoff Arguments

Many pundits are responding enthusiastically to the new LA Times article on quality-based layoffs – or how dismissing teachers based on value-added scores rather than on seniority would have saved LAUSD many of its better teachers, rather than simply saving its older ones.

Some are pointing out that this new LA Times report is the “right” way to use value-added as compared with the “wrong” way that the LA Times had used the information earlier this year.

Recently, I explained the problematic circular logic being used to support these “quality-based layoff” arguments. Obviously, if we dismiss teachers based on “true” quality measures, rather than experience which is, of course, not correlated with “true” quality measures, then we save the jobs of good teachers and get rid of bad ones. Simple enough? Not so. Here’s my explanation, once again.

This argument draws on an interesting thought piece and simulation posted at http://www.caldercenter.org (Teacher Layoffs: An Empirical Illustration of Seniority vs. Measures of Effectiveness), which was later summarized in a (less thoughtful) recent Brookings report (http://www.brookings.edu/~/media/Files/rc/reports/2010/1117_evaluating_teachers/1117_evaluating_teachers.pdf).

That paper demonstrated that if one dismisses teachers based on VAM, future predicted student gains are higher than if one dismisses teachers based on experience (or seniority). The authors point out that less experienced teachers are scattered across the full range of effectiveness – based on VAM – and therefore, dismissing teachers on the basis of experience leads to dismissal of both good and bad teachers – as measured by VAM. By contrast, teachers with low value-added are invariably – low value-added – BY DEFINITION. Therefore, dismissing on the basis of low value-added leaves more high value-added teachers in the system – including more teachers who show high value-added in later years (current value added is more correlated with future value added than is experience).

It is assumed in this simulation that VAM (based on a specific set of assessments and model specification) produces the true measure of teacher quality both as basis for current teacher dismissals and as basis for evaluating the effectiveness of choosing to dismiss based on VAM versus dismissing based on experience.

The authors similarly dismiss principal evaluations of teachers as ineffective because they too are less correlated with value-added measures than value-added measures with themselves.

Might I argue the opposite? – Value-added measures are flawed because they only weakly predict which teachers we know – by observation – are good and which ones we know are bad? A specious argument – but no more specious than its inverse.

The circular logic here is, well, problematic. Of course if we measure the effectiveness of the policy decision in terms of VAM, making the policy decision based on VAM (using the same model and assessments) will produce the more highly correlated outcome – correlated with VAM, that is.

However, it is quite likely that if we simply used different assessment data or a different VAM model specification to evaluate the results of the alternative dismissal policies, we might find neither VAM-based dismissal nor experience-based dismissal better or worse than the other.

For example, Corcoran and Jennings conducted an analysis of the same teachers on two different tests in Houston, Texas, finding:

…among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.

  • Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

So, what would happen if we did a simulation of “quality based” layoffs versus experience-based layoffs using the Houston data, where the quality-based layoffs were based on a VAM model using the Texas Assessments (TAKS), but then we evaluate the effectiveness of the layoff alternatives using a value-added model of Stanford achievement test data? Arguably the odds would still be stacked in favor of VAM predicting VAM – even if different VAM measures (and perhaps different model specifications). But, I suspect the results would be much less compelling than the original simulation.

The results under this alternative approach may, however, be reduced entirely to noise – meaning that the VAM-based layoffs would be the equivalent of random firings – drawn from a hat and poorly if at all correlated with the outcome measure estimated by a different VAM – as opposed to experience-based firings. Neither would be a much better predictor of future value-added. But for all their flaws, I’d take the experience-based dismissal policy over the roll-of-the-dice, randomized firing policy any day.
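To make the circularity concrete, here is a minimal Python sketch of that kind of thought experiment. It is not the Calder Center simulation and it uses no real data: the number of teachers, the variance of each component, and names such as `fit_a` and `simulate` are all assumptions chosen for illustration. Each hypothetical teacher has a general quality component plus a component specific to each test. Layoffs by current test A value-added are compared with layoffs by experience, judged first on future test A value-added and then on a second test.

```python
import random
from statistics import mean


def simulate(n_teachers=20000, layoff_share=0.10, seed=0):
    """Compare VAM-based and seniority-based layoffs under two yardsticks.

    All distributions and weights here are illustrative assumptions.
    """
    rng = random.Random(seed)
    teachers = []
    for _ in range(n_teachers):
        quality = rng.gauss(0, 0.5)  # general quality, shared by both tests
        fit_a = rng.gauss(0, 1.0)    # persistent component specific to test A
        fit_b = rng.gauss(0, 1.0)    # persistent component specific to test B
        teachers.append({
            # Assumed unrelated to quality, to keep the sketch simple.
            "experience": rng.random(),
            "vam_a_now": quality + fit_a + rng.gauss(0, 1.0),
            "vam_a_future": quality + fit_a + rng.gauss(0, 1.0),
            "vam_b_future": quality + fit_b + rng.gauss(0, 1.0),
        })

    n_cut = int(n_teachers * layoff_share)

    def retained(key):
        # Dismiss the n_cut teachers with the lowest value on `key`.
        return sorted(teachers, key=lambda t: t[key])[n_cut:]

    kept_by_vam = retained("vam_a_now")
    kept_by_seniority = retained("experience")

    def advantage(outcome):
        # How much better the VAM-based policy looks on a given outcome.
        return (mean(t[outcome] for t in kept_by_vam)
                - mean(t[outcome] for t in kept_by_seniority))

    return {
        "same_test": advantage("vam_a_future"),
        "different_test": advantage("vam_b_future"),
    }


if __name__ == "__main__":
    for yardstick, value in simulate().items():
        print(f"{yardstick}: {value:.3f}")
```

Under these assumed variances, the VAM-based policy looks several times better when judged by the same test than when judged by the other test. How large that gap is depends entirely on the assumed weights, which is the point: the choice of yardstick drives the answer.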

In the case of the LA Times analysis, the situation is particularly disturbing if we look back on some of the findings in their own technical report.

I explained in a previous post that the LA Times value-added model had potentially significant bias in its estimates of teacher quality. For example, in my earlier post, I explain that:

Buddin finds that black teachers have lower value-added scores for both ELA and MATH. Further, these are some of the largest negative effects in the second level analysis – especially for MATH. The interpretation here (for parent readers of the LA Times web site) is that having a black teacher for math is worse than having a novice teacher. In fact, it’s the worst possible thing! Having a black teacher for ELA is comparable to having a novice teacher.

Buddin also finds that having more black students in your class is negatively associated with teacher’s value-added scores, but writes off the effect as small. Teachers of black students in LA are simply worse? There is NO discussion of the potentially significant overlap between black teachers, novice teachers and serving black students, concentrated in black schools (as addressed by Hanushek and Rivkin in link above).

By contrast, Buddin finds that having an Asian teacher is much, much better for MATH. In fact, Asian teachers are as much better (than white teachers) for math as black teachers are worse! Parents – go find yourself an Asian math teacher in LA? Also, having more Asian students in your class is associated with higher teacher ratings for Math. That is, you’re a better math teacher if you’ve got more Asian students, and you’re a really good math teacher if you’re Asian and have more Asian students?????

One of the more intriguing arguments in the new LA Times article is that under the seniority based layoff policy:

Schools in some of the city’s poorest areas were disproportionately hurt by the layoffs. Nearly one in 10 teachers in South Los Angeles schools was laid off, nearly twice the rate in other areas. Sixteen schools lost at least a fourth of their teachers, all but one of them in South or Central Los Angeles.

http://articles.latimes.com/2010/dec/04/local/la-me-1205-teachers-seniority-20101204/2

That is, new teachers who were laid off based on seniority preferences were concentrated in high need schools. But so too were teachers with low value-added ratings?

While arguing that “far fewer” teachers would be laid off in high need schools under a quality-based layoff policy, the LA Times does not, however, report how many teachers would have been dismissed from these schools had their biased value-added measures been used instead. Recall that from the original LA Times analysis:

97% of children in the lowest performing schools are poor, and 55% in higher performing schools are poor.

Combine this finding with the findings above regarding the relationship between race and value-added ratings and it is difficult to conceive how VAM based layoffs of teachers in LA would not also fall disparately on high poverty and high minority schools. The disparate effect may be partially offset by statistical noise, but that simply means that some teachers in lower poverty schools will be dismissed on the basis of random statistical error, instead of race-correlated statistical bias (which leads to a higher rate of dismissals in higher poverty, higher minority schools).

Further, the seniority based layoff policy leads to more teachers being dismissed in high poverty schools because the district placed more novice teachers in high poverty schools, whereas the value-added based layoff policy would likely lead to more teachers being dismissed from high poverty, high minority schools, experienced or not, because they were placed in high poverty, high minority schools.

So, even though we might make a rational case that seniority based layoffs are not the best possible option, because they may not be highly correlated with true (not “true”) teaching quality, I fail to see how the current proposed alternatives are much if any better.  They only appear to be better when we measure them against themselves as the “true” measure of success.

The Curious Duplicity of NCTQ

NCTQ fashions itself as a leading think tank on promoting teacher quality in K-12 education. NCTQ adopts a relatively extreme position that teacher quality is the one and only thing that matters! Teacher quality is THE determining factor of school quality.

I also believe that teacher quality is very important. I also agree with NCTQ on the point that content knowledge, at the middle and secondary levels especially, is particularly important and that simply being listed as “qualified” to teach specific content is no guarantee.

As part of their effort to improve teacher quality, NCTQ has been going around doing “studies” and applying ratings to the quality of teacher preparation institutions. Now, I noted on my previous post that NCTQ and others may actually be missing the boat on who is actually preparing teachers. But let’s set that aside for a moment. One would think that if NCTQ is so interested in teacher quality as the primary determinant of school quality and student success, and teacher expertise as an important part of that equation at higher grade levels, that any analysis of the quality of undergraduate or graduate programs to train teachers would have to place significant emphasis on faculty quality and expertise? Right? It would make little sense to simply review which textbooks are used or what the course descriptions say, or what the curricular sequence happens to be? Right?

Out of a multitude of indicators on teacher preparation institutions, NCTQ includes only 1 – yes 1 – regarding faculty quality, which is described as follows:

In our evaluation of programs, we examined teaching responsibilities for all faculty members, as indicated by course assignments in course schedules, excluding all clinical coursework. We looked for two specific examples of inappropriate assignments: 1) an instructor teaching across the areas of foundations of education, methods and educational psychology; and/or 2) an instructor who teaches both reading and mathematics methods courses. Other inappropriate assignments may well be made but were not included in our review.

http://www.nctq.org/edschoolreports/illinois/standards/26Methodology.jsp

Yep, that’s it. All that they address is whether a faculty member appears to teach across two areas that no faculty member, in their view, could be sufficiently prepared to teach. The rest is based largely on textbooks chosen, syllabi and course descriptions, regardless of faculty expertise. Clearly this was a matter of data convenience. It’s hard to figure out whether individual faculty members truly possess expertise in their fields, short of evaluating their individual academic backgrounds, research and writing on the topic.

But it is absurd for an organization that believes teacher quality in K-12 education to be paramount, and content expertise critical, to ignore faculty expertise outright in its evaluations of teacher preparation institutions.

Here’s their FAQ on the long-term project of evaluating teacher preparation programs: http://www.nctq.org/p/response/evaluation_faq.jsp

Related reading (actual research):

Wolf-Wendel, L., Baker, B.D., Twombly, S., Tollefson, N., & Mahlios, M. (2006). Who’s Teaching the Teachers? Evidence from the National Survey of Postsecondary Faculty and Survey of Earned Doctorates. American Journal of Education, 112(2), 273-300.

Ed Schools

Ed schools seem to make an easy target in public policy debates over the quality of American public schooling and the American teacher workforce.

In many recent lopsided “ed school as the root of all evil” presentations, “Ed Schools,” are treated as some easily defined, static entity over time. In the book of reformyness (chapter 7, verse 2), “Ed Schools” necessarily consist of some static set of traditional higher education institutions – 4 year teachers colleges including regional state colleges and flagship universities – where a bunch of crusty old education professors spew meaningless theory at wide-eyed undergrads (who graduated at the bottom of their high school class) seeking that golden ticket to a job for life – with summers off.

In order to craft a clearly understandable (albeit entirely false) dichotomy of policy alternatives, pundits then present teachers who have obtained alternative certification as a group of individuals, nearly all of whom necessarily attended highly selective colleges and majored in something really, really rigorous and then received their certification through some more expeditious and clearly much more practical and useful fast-tracked option.

This was certainly the theme of a discussion (hashtag #edschools) at the Thomas B. Fordham Institute actively tweeted the other day by Mike Petrilli and a few others. What I found most interesting was that no one really challenged the assumptions that “ed schools” are some easily definable group of traditional higher education institutions – that this has been unchanged over decades – and that teacher training is some consistent, exclusive domain of traditional public higher education institutions – specifically as an undergraduate degree granting enterprise? That there are and have always been, oh… about a thousand or so ed schools… that well… keep on doing the same damn thing over and over again (for the past 50 years, one participant tweeted) … and well… no one ever shuts down the bad Ed Schools… and that’s why we’re in such bad shape! It’s really that simple.

Because this characterization is simply assumed to be true, the obvious way to crack this broken and declining system is to expand alt. certification and allow more non-traditional, for profit and entrepreneurial organizations – especially non-university organizations to grant teaching credentials – heck – let’s let them actually grant degrees. Who needs brick-and-mortar colleges anyway? Given the assumed static nature of the declining and antiquated system of “Ed Schools” that has brought us to our knees, this is the only answer!!!!!

One of my favorite tweets from the event was from Mike Petrilli, relaying a comment by Kate Walsh:

Walsh: There are 1410 Ed schools in the country. NCTQ spent 5 years determining that number.

You know what Kate, by the time you were done figuring that out (however you did), the number had already changed. Also, FYI, there are actually some data sources out there that might have been helpful for tabulating the existing degree granting programs and the numbers of degrees conferred by those programs.

So, let’s take a look at some of the data on degrees conferred across all education fields in 1990, 2000 and 2010.

Let’s start with a quick look at the total degrees conferred in “education” as defined by degree classification codes (CIP Codes), across all institutions granting such degrees nationally. The interesting twist here is that bachelor’s degree production of education degrees has been relatively constant over time for about 20 years and perhaps longer. Doctoral degree production increased from 1990 to 2000, but stagnated after that. On the other hand, Master’s degree production has skyrocketed.

Now, one might try to argue that what that’s really about is all of those currently practicing teachers who are just accumulating those worthless master’s degrees to get that salary bump. I will write more on this topic at a later point, but that’s not likely the dominant scenario. Yes, many of the master’s degrees are obtained to broaden fields of certification in order to give current teachers more options – either assignment options in their current districts, or other job opportunities. AND, many of the master’s degrees these days are initial credentials granted to individuals who did not receive their teaching credential as an undergraduate. Many initial teaching credentials are granted at the master’s, not bachelor’s level. A substantial amount of teacher training goes on at the master’s, not undergraduate level. No matter the case, the master’s degrees – of which there are so many – and so many more being granted than bachelor’s degrees – are the interesting story here.

Is it really that the same old traditional higher education institutions with crusty old, out of date professors, are now just spewing out master’s degrees? Or is something else at work here?

Well, here are the top 25 MA producers in education back in 1990. Even at that time, the largest master’s degree granting institutions were not the top universities – or even the top teachers colleges. But, some of those schools were at least in the mix. Teachers College of Columbia University, Ohio State, Michigan State and Harvard all appear in the top 25 in 1990.

Here are the top 25 master’s producers in 2000. Here, the tide begins to shift a bit. Schools like NOVA Southeastern with their online programs, and National-Louis grow even bigger than they had been a decade earlier. Teachers College retains a top 25 spot, as does Ohio State, and University of Minnesota makes the list. Harvard is gone.

By 2009, “Ed Schools” are a substantially different mix. Not only that, but look at the volume of degree production. Back in 1990, Ed Schools at respectable major universities were putting out about 600 master’s degrees in education related fields per year. They held on to similar rates in 2000 and still in 2009. But by 2009, Walden University and U. of Phoenix were each cranking out 4,500+ master’s degrees per year. Grand Canyon U. comes in next in line. These are the entrepreneurial up-starts that are the product of minimized regulation of teaching credentials.

If there truly has been a decline in the quality of the teacher workforce, and if pundits truly believe that this supposed decline is related somehow to “Ed Schools,” then it might behoove those same pundits to explore the dramatic changes that have, in fact, already occurred in the “Ed School” marketplace.

If there has been a dramatic decline in teacher preparation, and in specialized training, it may be worth taking a look at those institutions that have emerged to dominate the production of education degrees and credentials in recent years. After all, Walden and Phoenix each produce 5 to 10 times the master’s degree credentials in education of major public universities. And, production of education master’s degrees is now nearly double the level of production of education bachelor’s degrees. And many of these entrepreneurial start-ups specifically frame their master’s programs as an option for individuals with a bachelor’s degree in “something else” to obtain a teaching credential.

Is even more deregulation and entrepreneurial teacher preparation what we really need? Can one really blame the traditional higher education institutions, whose share of production has declined steadily for decades, for declining teacher quality? Only if you ignore these trends, which I expect these pundits will continue to do.

 

Truly Uncommon in Newark…

A while back I wrote a post explaining why I felt that while Robert Treat Academy Charter School in Newark is a fine school, it’s hardly a replicable model for large scale reform in Newark, or elsewhere.  I have continued over time to write about the extent to which Newark Charter schools in particular have engaged in a relatively extreme pattern of cream skimming.  The same is true in Jersey City and Hoboken, but not so in Trenton. But, Trenton also offers us fewer examples of those high-flying charters that we are supposed to view as models for the future of NJ education. When I wrote my earlier post on Treat, I somehow completely bypassed North Star Academy, which I would now argue is even that much less scalable than Robert Treat. That’s not to say that North Star Academy is not a highly successful school for the students that it serves… or at least for those who actually stay there over time.  But rather that Star of the North is yet another example of why the “best” New Jersey charter schools provide a very limited path forward for New Jersey urban school reform. Let’s take a look:

So, here’s where North Star fits in my 8th grade performance comparisons of beating the odds, based on the statistical model I explain in previous posts:

In this figure (above), we see that North Star certainly beats the odds at 8th grade. Now, we can also already see that North Star has a much lower % free lunch than nearly any other school in Newark, limiting scalability right off the bat. There just aren’t enough non-poor kids in Newark to create many more schools with demography like North Star. Not to mention the complete lack of children with disabilities or limited English language proficiency.

Here’s North Star on the map, in context. Smaller lighter circles are lower % free lunch schools. Most of the charters in this map are… well… smaller lighter circles (with charters identified with a red asterisk). Not all, however, are as non-representative as North Star.

Now, here’s the part that sets North Star and a few others apart – at first in a seemingly good way…

If we take the 2009 assessments for each grade level, one interesting finding is that the charter schools serving lower grade levels in Newark are generally doing less well than the NPS average (red line). But, those schools that start at grade 5 seem to be picking up a population that right away is doing comparable or better than the NPS average. See, for example, TEAM and Greater Newark (comparable to NPS in their first grade – 5th – served) and, of course, North Star whose students perform well above NPS in their first year – likely not fully a North Star effect, but rather at least partly a selection effect (Lottery or not, it’s a different population than those served in the district).  More strikingly, with each increase in grade level, proficiency rates climb dramatically toward 100% by 8th grade. Either they are simply doing an amazing job of bringing these kids to standards over a 3 year period… or … well… something else.

The figure above looks at 6th, 7th, and 8th graders in the same year. That is, they aren’t the same kids over time doing  better and better. But, even if we looked at 6th graders in one year, 7th graders the next year and 8th graders the following year, we wouldn’t necessarily be looking at the same kids. In fact, one really easy way to make cohort test scores rise is to systematically shed – push out – those students who perform less well each year. Sadly, NJDOE does not provide the individual student data necessary for such tracking. But there are a few other ways to explore this possibility.

First, here are the cohort “attrition rates” based on 3 sequential cohorts for Newark Charter schools:

In this figure, we can see that for the 2009 8th graders, North Star began with 122 5th graders and ended with 101 in 8th. The subsequent cohort also began with 122, and ended with 104. These are sizable attrition rates. Robert Treat, on the other hand, maintains cohorts of about 50 students – non-representative cohorts indeed – but without the same degree of attrition as North Star. Now, a school could maintain cohort size even with attrition if that school were to fill vacant slots with newly lotteried-in students. This, however, is risky to the performance status of the school, if performance status is the main selling point.
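As a quick check on how sizable those rates are, the cohort counts quoted above can be turned into percentages. This is a small Python sketch using only the North Star figures from the text; the helper name `attrition_rate` is just an illustrative label.

```python
def attrition_rate(start, end):
    """Share of the starting cohort no longer enrolled at the end."""
    return (start - end) / start


# (5th grade count, 8th grade count) for the two North Star cohorts above.
cohorts = {
    "2009 8th graders": (122, 101),
    "subsequent cohort": (122, 104),
}

for label, (start, end) in cohorts.items():
    print(f"{label}: {attrition_rate(start, end):.1%}")
```

That works out to roughly 17% and 15% of each entering cohort gone by 8th grade, before accounting for any students who may have been added along the way.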

Here’s what the cohort attrition looks like when tracked with the state assessment data.

Here, I take two 8th grade cohorts and trace them backwards. I focus on General Test Takers only, and use the ASK Math assessment data in this case. Quick note about those data – Scores across all schools tend to drop in 7th grade due to cut-score placement (not because kids get dumber in 7th grade and wise up again in 8th). The top section of the table looks at the failure rates and number of test takers for the 6th grade in 2005-06, 7th in 2006-07 and 8th in 2007-08. Over this time period, North Star drops 38% of its general test takers. And, cuts the already low failure rate from nearly 12% to 0%. Greater Newark also drops over 30% of test takers in the cohort, and reaps significant reductions in failures (partially proficient) in the process.
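The arithmetic behind this concern can be sketched in a few lines of Python. The counts below are hypothetical round numbers chosen to mirror the percentages above, not North Star’s actual enrollment, and `failure_rate_after_attrition` is an illustrative helper. The sketch shows that if the students who leave are disproportionately those who failed, the failure rate falls even when no remaining student improves.

```python
def failure_rate_after_attrition(takers, failing, leavers, failing_leavers):
    """Failure rate among students who remain, assuming no score changes."""
    remaining = takers - leavers
    remaining_failing = failing - failing_leavers
    return remaining_failing / remaining


# Hypothetical cohort: 100 test takers, 12 of them failing (12%).
# Scenario 1: 38 students leave, including all 12 who were failing.
selective = failure_rate_after_attrition(100, 12, 38, 12)

# Scenario 2: 38 students leave, with failing students leaving at the same
# 12% rate as everyone else (38 * 0.12 = 4.56 on average).
proportional = failure_rate_after_attrition(100, 12, 38, 38 * 0.12)

print(f"selective attrition:    {selective:.1%}")
print(f"proportional attrition: {proportional:.1%}")
```

In the first scenario the failure rate drops to zero. In the second it stays at 12%. Without student-level data we cannot tell which scenario is closer to what actually happened, only that attrition alone is capable of producing the observed pattern.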

The bottom half of the table shows the next cohort in sequence. For this cohort, North Star sheds 21% of test takers between grade 6 and 8, and cuts failure rates nearly in half  – starting low to begin with (starting low in the previous grade level, 5th grade, the entry year for the school). Gray and Greater Newark also shed significant numbers of students and Greater Newark in particular sees significant reductions in share of non(uh… partially)proficient students.

My point here is not that these are bad schools, or that they are necessarily engaging in any particular immoral or unethical activity. But rather, that a significant portion of the apparent success of schools like North Star is a) attributable to the demographically different population they serve to begin with and b) attributable to the patterns of student attrition that occur within cohorts over time.

Again, the parent perspective and public policy perspective are entirely different. From a parent (or child) perspective, one is relatively unconcerned whether the positive school effect is a function of selectivity of peer group and attrition, so long as there is a positive effect. But, from a public policy perspective, the model is only useful if the majority of positive effects are not due to peer group selectivity and attrition, but rather to the efficacy and transferability of the educational models, programs and strategies. Given the uncommon student populations served by many Newark charters and even more uncommon attrition patterns among some… not to mention the grossly insufficient data… we simply have no way of knowing whether these schools can provide insights for scalable reforms.

As they presently operate, however, many of the standout schools – with North Star as a shining example – do not represent scalable reforms.


New Jersey Superintendent Salaries in Context

These two figures provide some updated context to an earlier post  on Arbitrary Pay Limits for NJ Administrators. The bombastic rhetoric on this topic refuses to die down. So, here are a few more figures to put NJ public school district administrator salaries into context. Note that these two figures compare THE TOP 20 SUPERINTENDENT SALARIES to salaries of a) the majority of private independent school headmasters statewide, and b) the average of a large number of NJ Hospital administrators (non-physician chief executives).  Just a little more fodder for the conversation.

 

Figure 1

Mean and Median Compensation by Group

Figure 2

Top 20 Public School Superintendents (2009-10) Compared with Private School Headmasters (2008)

 

Enough said.

3 very weak arguments for using weak indicators

This post is partly in response to the Brookings Institution report released this week which urged that value-added measures be considered in teacher evaluation:

http://www.brookings.edu/~/media/Files/rc/reports/2010/1117_evaluating_teachers/1117_evaluating_teachers.pdf

However, this post is more targeted at the punditry that has followed the Brookings report – the punditry that now latches onto this report as a significant endorsement of using value-added ratings as a major component in high-stakes personnel decisions. Personally, I didn’t read it that way. Nowhere did I see this report arguing strongly for a substantial emphasis on value-added measures. That said, I actually felt that the report based its rather modest conclusions on 3 deeply flawed arguments.

Argument 1 – Other methods of teacher evaluation are ineffective at determining good versus bad teachers because those methods are only weakly correlated with value-added measures.

Or, in other words, current value added measures, while only weak predictors of future value-added, are still stronger predictors of future value-added (using the same measures and models) than other indirect measures of teacher quality such as experience or principal evaluations.

This logic undergirds the quality-based dismissal example in the recent Brookings report which is based on an earlier Calder Center paper (www.caldercenter.org). That paper showed that if one dismisses teachers based on VAM, future predicted student gains are higher than if one dismisses teachers based on experience (or seniority). The authors point out that less experienced teachers are scattered across the full range of effectiveness – based on VAM – and therefore, dismissing teachers on the basis of experience leads to dismissal of both good and bad teachers – as measured by VAM. By contrast, teachers with low value-added are invariably – low value-added – BY DEFINITION. Therefore, dismissing on the basis of low value-added leaves more high value-added teachers in the system – including more teachers who show high value-added in later years (current value added is more correlated with future value added than is experience).

It is assumed in this simulation that VAM (based on a specific set of assessments and model specification) produces the true measure of teacher quality both as basis for current teacher dismissals and as basis for evaluating the effectiveness of choosing to dismiss based on VAM versus dismissing based on experience.

The authors similarly write off principal evaluations of teachers as ineffective because they too are less correlated with value-added measures than value-added measures with themselves.

Might I argue the opposite? – Value-added measures are flawed because they only weakly predict which teachers we know – by observation – are good and which ones we know are bad? A specious argument – but no more specious than its inverse.

The circular logic here is, well, problematic. Of course if we measure the effectiveness of the policy decision in terms of VAM, making the policy decision based on VAM (using the same model and assessments) will produce the more highly correlated outcome – correlated with VAM, that is.

However, it is quite likely that if we simply used different assessment data or a different VAM model specification to evaluate the results of the alternative dismissal policies, we might find neither VAM-based dismissal nor experience-based dismissal better or worse than the other.

For example, Corcoran and Jennings conducted an analysis of the same teachers on two different tests in Houston, Texas, finding:

…among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.

  • Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.
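A rough way to see how this kind of mismatch can arise is to rank the same simulated teachers on two noisy estimates and cross-tabulate the categories. The sketch below is not the Corcoran, Jennings and Beveridge analysis and uses no Houston data. The noise level, the sample size and the function names are all assumptions.

```python
import random

def quintile(rank, n):
    """Map a 0-based rank among n teachers to a category from 1 (lowest) to 5 (highest)."""
    return rank * 5 // n + 1

def cross_test_mismatch(n_teachers=10000, noise_sd=1.0, seed=1):
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n_teachers)]
    # Two value-added estimates for the same teachers from two different tests
    # (hypothetical stand-ins for a high-stakes and a low-stakes assessment).
    test_a = [q + rng.gauss(0, noise_sd) for q in quality]
    test_b = [q + rng.gauss(0, noise_sd) for q in quality]

    def categories(scores):
        order = sorted(range(n_teachers), key=lambda i: scores[i])
        cats = [0] * n_teachers
        for rank, i in enumerate(order):
            cats[i] = quintile(rank, n_teachers)
        return cats

    cat_a, cat_b = categories(test_a), categories(test_b)
    top_a = [i for i in range(n_teachers) if cat_a[i] == 5]
    # Share of top-category teachers on test A who land in the bottom two categories on test B.
    return sum(cat_b[i] <= 2 for i in top_a) / len(top_a)

share = cross_test_mismatch()
print(f"Top category on A, bottom two on B: {share:.1%}")
```

Even though both simulated estimates track the same underlying quality, a noticeable share of "top" teachers on one test fall into the bottom two categories on the other. Raising noise_sd pushes that share higher.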

So, what would happen if we did a simulation of “quality based” layoffs versus experience-based layoffs using the Houston data, where the quality-based layoffs were based on a VAM model using the Texas Assessments (TAKS), but then we evaluate the effectiveness of the layoff alternatives using a value-added model of Stanford achievement test data? Arguably the odds would still be stacked in favor of VAM predicting VAM – even if different VAM measures (and perhaps different model specifications). But, I suspect the results would be much less compelling than the original simulation.

The results under this alternative approach may, however, be reduced entirely to noise – meaning that the VAM-based layoffs would be the equivalent of random firings – drawn from a hat and poorly if at all correlated with the outcome measure estimated by a different VAM – as opposed to experience-based firings. Neither would be a much better predictor of future value-added.  But for all their flaws, I’d take the experience-based dismissal policy over the roll of the dice, randomized firing policy any day.
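Here is a sketch of that alternative simulation, with layoffs chosen on one test's VAM and the policy judged on another test's VAM. It uses no real TAKS or Stanford data. The parameter shared_weight (how much of each estimate reflects quality common to both tests) is purely an assumption, and the whole point is that the answer depends on it.

```python
import random
import statistics

def retained_gain(shared_weight, n_teachers=5000, layoff_share=0.10, seed=2):
    """Mean outcome-test VAM of retained teachers: VAM-based layoffs minus seniority-based layoffs.

    shared_weight is the assumed share of each VAM estimate's variance that
    reflects quality common to both tests. The rest is test-specific or noise.
    """
    rng = random.Random(seed)
    common_sd = shared_weight ** 0.5
    specific_sd = (1 - shared_weight) ** 0.5
    teachers = []
    for _ in range(n_teachers):
        common = rng.gauss(0, common_sd)
        vam_layoff_test = common + rng.gauss(0, specific_sd)   # used to choose layoffs
        vam_outcome_test = common + rng.gauss(0, specific_sd)  # used to judge the policy
        experience = rng.uniform(0, 30)                        # assumed unrelated to quality
        teachers.append((vam_layoff_test, vam_outcome_test, experience))

    n_cut = int(n_teachers * layoff_share)
    keep_by_vam = sorted(teachers, key=lambda t: t[0])[n_cut:]
    keep_by_seniority = sorted(teachers, key=lambda t: t[2])[n_cut:]

    def outcome(group):
        return statistics.mean(t[1] for t in group)

    return outcome(keep_by_vam) - outcome(keep_by_seniority)

gain_strong = retained_gain(shared_weight=0.6)
gain_weak = retained_gain(shared_weight=0.05)
print(gain_strong, gain_weak)
```

When the two tests share little, the advantage of VAM-based layoffs shrinks toward zero, which is the "random firings" scenario described above. How much the tests share in practice is an empirical question this sketch cannot answer.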

Argument 2 – We should be unconcerned about high misclassification errors – falsely identifying good teachers as bad, therefore resulting in random harm to teachers – Rather, we should be concerned that current methods falsely identify bad teachers as good, doing lifelong harm to kids.

The Brookings report argues:

Much of the concern and cautions about the use of value-added have focused on the frequency of occurrence of false negatives, i.e., effective teachers who are identified as ineffective.  But framing the problem in terms of false negatives places the focus almost entirely on the interests of the individual who is being evaluated rather than the students who are being served.  It is easy to identify with the good teacher who wants to avoid dismissal for being incorrectly labeled a bad teacher.  From that individual’s perspective, no rate of misclassification is acceptable.  However, an evaluation system that results in tenure and advancement for almost every teacher and thus has a very low rate of false negatives generates a high rate of false positives, i.e., teachers identified as effective who are not.  These teachers drag down the performance of schools and do not serve students as well as more effective teachers.

Again, the false identification assumption regarding current evaluations is based on the assumption that the value-added measure is a true measure of teacher quality. That is, we know current evaluations are bad because many teachers get tenure but have bad value-added ratings. But, value-added ratings are good because some teachers who had good value-added ratings at one point, under one type of value-added model applied to one type of assessments, also have good value-added ratings later using the same model specification applied to similar or same testing data.

Setting that circular logic issue aside, we are faced with the moral dilemma I posed in an earlier post. This argument is all about the “adults vs. kids” issue, and the assumption that if it’s really about the kids, it can’t be at all about the adults in the system, and vice versa. The reality however is that a system that is a great workplace for adults can translate to a better educational setting for children and a system that creates a divisive, negative workplace setting for the adults is unlikely to translate to a better educational setting for the kids. It’s more likely to be a both/and, not either/or situation.

I explained previously:

I guess that one could try to dismiss those moral, ethical and legal concerns regarding wrongly dismissing teachers by arguing that if it’s better for the kids in the end, then wrongly firing 1 in 4 average teachers along the way is the price we have to pay. I suspect that’s what the pundits would argue – since it’s about fairness to the kids, not fairness to the teachers, right? Still, this seems like a heavy toll to pay, an unnecessary toll, and quite honestly, one that’s not even that likely to work even in the best of engineered circumstances.

Too often overlooked in these analyses is the question of who will really want to teach in an education system where the chance of having one’s career cut short by random statistical error is quite high????? Who will be waiting in line? What kind of workplace will that create? And can we really expect average teaching quality to improve as a result?

Argument 3 – Other professions use “weak” indicators or signals of performance, like using the SAT for college admission or using patient mortality rates to evaluate hospital quality.

The Brookings report argues:

It is instructive to look at other sectors of the economy as a gauge for judging the stability of value-added measures.  The use of imprecise measures to make high stakes decisions that place societal or institutional interests above those of individuals is wide-spread and accepted in fields outside of teaching.

Examples from Brookings:

In health care, patient volume and patient mortality rates for surgeons and hospitals are publicly reported on an annual basis by private organizations and federal agencies and have been formally approved as quality measures by national organizations.

The correlation of the college admission test scores of college applicants with measures of college success is modest (r = .35 for SAT combined verbal + math and freshman GPA[9]).  Nevertheless nearly all selective colleges use SAT or ACT scores as a heavily weighted component of their admission decisions even though that produces substantial false negative rates (students who could have succeeded but are denied entry).

On its face, the argument that public education should use weak indicators because other professions do is absurd.  And this argument presents as a given, with very weak justification and a handful of cherry-picked citations, that these weak signals play a significant role in high stakes decision-making. A more thorough review of health-care policy literature in particular raises many of the same concerns we hear in the education debate over institutional and individual performance measures – precision, accuracy and incentives. There also exists a similar divide in perspectives between healthcare policy wonks and management organizations versus physicians with regard to the accuracy and usefulness of the indicators and the incentives created by specific measures.

Many pundits out there tweeting and blogging about this new Brookings report are the same pundits who continue to argue that value-added ratings should constitute as much as 50% of teacher evaluation – and that somehow this new Brookings report validates their claim. I don’t see where the Brookings report goes anywhere near that far.

To those viewing the Brookings report in that light, implicit in the “other sectors do it” argument is that the SAT and mortality rates are considered major factors for evaluating students for admission or for evaluating hospital quality. Are they really? In an era where more and more colleges are making the SAT optional, how many are using it as 50% of admissions criteria? Yes, most highly selective colleges do still require the SAT, and it no doubt serves as a tipping factor on admissions decisions (largely out of convenience when taking the first cut at a large applicant pool).  But, several have abandoned use of SAT altogether (http://www.fairtest.org/university/optional), perhaps because it is perceived to be such a weak signal – or because of all of the perverse incentives and inequities associated with the SAT.  Would anyone seriously consider using patient mortality rates alone as 50% of the value for rating hospital quality – determining hospital closures?

And even the batting average comparison – the authors argue that past batting average is only a weak predictor of future batting average – but is clearly still important in player personnel decisions. But what percentage does batting average count for? Does a baseball GM consider which pitchers the batter went up against that season? Would batting average count for, say, 50% of the decision – absolutely, a fixed share, deal breaker? It may be important, but it’s one of many, many statistics most of which are also likely considered in context – in a very flexible decision framework (more art than science?).

And then there’s the issue of the incentives created by emphasizing a specific measure. What may be good in baseball may not in healthcare or education!

Let’s say we put this much emphasis on batting average. What is a player to do? How can the player improve his worth? Getting more hits – getting traded to a team in a division with poor pitching… to get more hits? Neither is a deeply problematic incentive. There’s not much downside to improving one’s batting average (setting aside the role of performance enhancing drugs and all that).  Besides, it’s a freakin’ game!!!!!… a game that involves gaming the system. A game that is based largely on obsession with statistics and trying to figure out which ones are and aren’t meaningful.

The SAT is different. Much has been made of the perverse incentives, including increased classification among students from higher income families to take an un-timed SAT (http://www.slate.com/id/2141820/) and entire industries that have emerged around the emphasis on SAT scores for college admission, reinforcing socio-economic disparities in SAT performance. Even if the SAT could be a reasonable indicator, its usefulness has been distorted and significantly compromised by the incentives and resulting behaviors. Hence the decision by many colleges to consider the test optional.

What about hospitals trying to reduce mortality rates? Turning away the sickest, highest risk patients and most complex cases is one option. Unlike the batting average measure, there is a significant downside to this one. Those with the greatest needs don’t get served!

To argue that these supposedly analogous measures are widely accepted in healthcare as a basis for high-stakes decisions is a foolish stretch. The citations in the Brookings report to sources that report healthcare indicators (like: http://www.hospitalcompare.hhs.gov/)  are more analogous to the school report cards that already exist on state department of education web sites (but with additional survey responses added) – and not analogous to the publication of individual teacher value-added ratings as done earlier this year by the LA Times.  Comparable information – and comparably useless information – is already widely available on public schools through both government sources and a plethora of private vendors. For that matter, consumer ratings of individual teachers are also available through sources like www.ratemyteacher.com.

To wrap this up…

Do I think value-added measures are useless? No! Do I think they should be used to evaluate and compare individual teachers and should be used as a significant factor in employment decisions? No! But, there is likely still much that can be learned from studying different approaches to value-added modeling, developing more useful assessment instruments, using these instruments and models to better understand what works in schools and what doesn’t.  Clearly we need to learn how to use data more thoughtfully to inform school improvement efforts. Quite honestly, I believe many already are.

 

Follow up Response to Chad Aldeman’s critique of my argument:

You (Chad) seem to have missed the point, which I perhaps did not make clear enough. Indeed, if we were able to measure precisely what we wanted to measure and if we measured it, and it predicted itself, we’d be in great shape. The problem is that we can’t measure precisely what we think we want to measure.

Given the same assessment instrument and value added model parameters, we get a .2 to .3 correlation from year to year… (possibly partly a function of non-random assignment).

Given different assessments in the same year, Sean Corcoran found:

…among those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.

* Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

So… what this calls into question is whether we really are measuring what we want to measure. Are we able to precisely determine that a math teacher is good at teaching math – regardless of the students they have, or the year they have them, or the test we use to measure it? Or even regardless of the scaling of the test.

When we measure points per game, we know what we are measuring. I’m curious now about the correlation between points per game and win/loss, or even a championship season… or annual revenue, since that’s the end goal… but I digress.

 

Searching for Superguy in Jersey…

A short while back I did a post called Searching for Superguy in Gotham.  In that post, I tackled the assumption that Superguy was easily identifiable as a hero leader of charter schools – or at least that was one distorted portrayal of Superguy in Waiting for Superman. Now, I should point out here that I really don’t know of anyone actually out there running charter schools who wishes to portray him/herself in this way. So, to be absolutely clear, this post is in no way an attack on those who are out there just trying to do the best job they can for kids in need.

This post IS a criticism of the punditry around charter schools- the notion that charter schools are easy to pick out from the crowd of urban (or other) schools- because they are necessarily, plainly and obviously better. That classic argument that the upper half is better than average!

This was the basis of my Searching for Superguy in Gotham activity. In that activity, I estimated a relatively simple statistical model to determine which schools performed better than expected, given their students and location and which schools performed less well than expected, given their students and location. I had been planning all along to do something similar with New Jersey Charter Schools. Now is that time!!!!!

As I did with New York City charter schools, I have estimated a statistical model of the proficiency rates of each charter school and each other school in the same New Jersey city. In the model, I correct for a) free lunch rates, b) homelessness rates, c) student racial composition (Hispanic and black). AND, I compare each test – grade level and subject – to the same test across all schools. AND, I compare each school to other schools in the same city (by using a “city” dummy variable). I obtained all necessary variables from a) NJ school report cards (outcome measures) and b) NJ enrollment data file (free lunch, race, homelessness) and c) NCES Common Core of data for “city” location of school.
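For readers who want to see the mechanics, here is a minimal sketch of this kind of "beating the odds" model using NumPy. It runs on made-up data, and the ordinary least squares specification is an assumption, not necessarily the exact model behind the graphs. Variable names such as free_lunch and over_under are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                    # hypothetical school-by-test observations
city = rng.integers(0, 3, n)               # 3 made-up cities
test = rng.integers(0, 2, n)               # 2 made-up grade/subject tests
free_lunch = rng.uniform(30, 95, n)        # % free lunch
homeless = rng.uniform(0, 5, n)            # % homeless
black = rng.uniform(0, 90, n)              # % black
hispanic = rng.uniform(0, 100 - black)     # % Hispanic
# Made-up outcome: % proficient falls with poverty, plus noise.
proficient = 90 - 0.5 * free_lunch - 1.0 * homeless + rng.normal(0, 8, n)

def dummies(codes):
    """One indicator column per category, dropping the first as the reference group."""
    levels = np.unique(codes)[1:]
    return (codes[:, None] == levels[None, :]).astype(float)

# Intercept, student characteristics, test dummies and city dummies.
X = np.column_stack([np.ones(n), free_lunch, homeless, black, hispanic,
                     dummies(test), dummies(city)])
coef, *_ = np.linalg.lstsq(X, proficient, rcond=None)

expected = X @ coef
over_under = proficient - expected   # positive: "beat the odds"; negative: below expectations
print(over_under[:5])
```

Each school's position relative to the red line in the scatterplots corresponds to a value like over_under here: the actual proficiency rate minus the rate expected for its students, test and city.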

So now, the search for Jersey Superguy begins! Let’s start with 4th Grade Math performance in 2009. This scatterplot includes all schools with ASK4 Math scores in cities where charters existed in 2009. Schools above the red horizontal line are schools that “beat the odds.” That is, they are schools that had proficiency rates that were above the expected proficiency rates for that school, given its students, the test, and the location (city). Schools below the red line are schools that did not meet expectations. So, is superman (mythical super charter school leader) hiding in one of those dots way at the top of the scatter? Is he in a high-flying, high poverty school? Is he in a high-flying low poverty school? Certainly, he could not be down in the lower half of the graph.


CLICK HERE TO SEE WHICH SCHOOLS ARE CHARTERS AND WHICH ARE DISTRICT SCHOOLS

CLICK HERE FOR A CLOSE UP ON NEWARK SCHOOLS OVER AND UNDER THE LINE

NOTE: I’m in the process of fixing a data error that occurs on a few charter schools (affecting merging of data).  These figures still include the merge error, but the overall distributions are not affected. Schools affected include: Environment Community School, Liberty Academy, Hope Academy, International CS of Trenton, Jersey City Community CS and Jersey City Golden Door. I HAVE  NOW EXCLUDED MISMATCHED SCHOOLS.

The source of the error is the NJDOE enrollment file, which, for example identifies Environment Community School as both 80_6232_920 (county, district, school) and as 80_6235_900.  The first of these codes is correct. The second is for Liberty Academy CS (according to the School Report Card and according to NCES data).

Now, let’s take a look at the 8th Grade Math outcomes. Here’s the statewide scatterplot:


Surely superguy must be hangin’ out in one of those high flyin’ dots way at the top of the scatter?

CLICK HERE TO SEE WHICH SCHOOLS ARE CHARTERS AND WHICH ARE DISTRICT SCHOOLS

CLICK HERE FOR A CLOSE UP ON NEWARK SCHOOLS OVER AND UNDER THE LINE

As you can see, there are plenty of charters and traditional public schools above the line, and below the line. The point here is by no means to bash charters. Rather, this is about being realistic about charters and more importantly realistic about the difficulty of truly overcoming the odds. It’s not easy and any respectable charter school leader or teacher and any respectable traditional public school leader or teacher will likely confirm that. It’s not about superguy. It’s about hard work and sustained support – be it for charters or for traditional public schools.

As I noted in my previous searching for superguy post:

Yeah… I’d like to be a believer. I don’t mean to be that much of a curmudgeon. I’d like to sit and wait for Superguy – perhaps watch a movie while waiting (gee… what to watch?). But I think it would be a really long wait and we might be better off spending this time, effort and our resources investing in the improvement of the quality of the system as a whole. Yeah, we can still give Superguy a chance to show himself (or herself), but let’s not hold our breath, and let’s do our part on behalf of the masses (not just the few) in the meantime.

TECHNICAL APPENDIX

Here is a link to the model used for generating the over/under performing graphs above

And here is a separate model  in which I test whether Charter schools on average outperform traditional public  schools in the same city. This model shows that they don’t, or at least that their 1 to 3 percentage point edge on proficiency is not statistically significant. But whether charters on average outperform – or don’t – traditional public schools is not the point. The point is that like traditional public schools – they vary – and it’s important for us to get a handle on how and why all schools vary in their successes and failures – charter or not.

Complete slide set here: New Charter Figures Nov 12

BONUS MAPS

Here are some updated maps of the demographics and adjusted performance measures of charter and district schools in Newark.

First, % Free Lunch 2009-10:

Next, a new one, % LEP/ELL – note that the % LEP/ELL for NWK charters is so low, and their dots therefore so small, that the star indicating “charter” covers them entirely:

Finally, here are the Beating the Odds figures converted into color coded circles – with large purple circles being high performers – better than expectations – medium size pale dots being relatively average performers – and large yellow dots performing below expectations:

Jersey City % LEP/ELL

Jersey City % Free Lunch

Jersey City Performance Index

Getting all “bubbly” over that spending “bubble?”

Let me just say that I hate this graph!

FIGURE 1 – NATIONAL TRENDS IN PUPIL TO TEACHER RATIOS

Why? Well, this is my own version of the graph… but it is a graph that has been used many times over, of late, to make the argument that American public schools have simply been drowning in an excess of public funding for decades and that public school districts nationally have leveraged all of that additional money over time to flood classrooms with additional staff. Of course, this framing is being used to set up the argument that it’s time for a logical correction – a return to sanity – a return to reasonable class sizes, etc. etc. etc.

First of all, even this national graph – which isn’t particularly meaningful – shows a reduction of slightly over 2 students per teacher over the past 25 years. Chop off the period prior to 1985 and the most dramatic shifts in pupil to teacher ratio are wiped away.

Second, national averages just aren’t that meaningful.  Claims like this one, from Mike Petrilli of Fordham Institute, misunderstand entirely the variation in resources across states and variation in effort across states:

The tough-love message to superintendents and school boards nationwide should be clear: The day of reckoning has arrived; let the de-leveraging begin. The spending bubble is over. No more adding staff at a pace that outstrips student enrollment; no more sweetheart deals on pensions or health insurance; no more whining about “large” class sizes of twenty-five. It’s time to live within our means.

http://www.edexcellence.net/flypaper/index.php/2010/11/welcome-to-a-new-era-of-restraint/

The new “reform” story line is that all states have been spending out of control, that they have taxed their citizenry to death, and that they have spent most of that money hiring more and more teachers even while student populations have stagnated. Further, we have done all of this out of control spending and class size reduction while seeing absolutely no return for our dollar! Pretty simple!  Very compelling, eh? NO.

When one takes a look at individual states – in terms of relative tax burden, changes in tax burden over time – in terms of changes in pupil to teacher ratios over time – and in terms of student outcomes in relation to pupil to teacher ratios and tax burdens – it’s pretty damn hard to find states which actually fit that story line. And public K-12 education is largely a state and local endeavor, not a federal one.

So, let’s take a state by state look. Let’s start here – with the total revenues per pupil – state aggregate – including state, local and federal sources. I’ve adjusted these for inflation (Employment Cost Index – Gov’t Workers), but not for regional variation. For more information on comparing across states, see THIS POST. (<–See this link if you want to know how states rank compared to one another on state and local revenues).
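As a quick illustration of what the inflation adjustment does, here is a small sketch. The index values and revenue figures are invented, and the function name to_base_year_dollars is hypothetical. The real adjustment uses the Employment Cost Index series, which is not reproduced here.

```python
def to_base_year_dollars(nominal, index_year, index_base):
    """Convert a nominal amount to base-year dollars using a price or cost index."""
    return nominal * index_base / index_year

# Hypothetical index values and revenues, for illustration only.
cost_index = {1995: 75.0, 2005: 100.0}
nominal_revenue_per_pupil = {1995: 6000.0, 2005: 10000.0}

real_1995 = to_base_year_dollars(nominal_revenue_per_pupil[1995],
                                 cost_index[1995], cost_index[2005])
real_2005 = to_base_year_dollars(nominal_revenue_per_pupil[2005],
                                 cost_index[2005], cost_index[2005])

nominal_growth = nominal_revenue_per_pupil[2005] / nominal_revenue_per_pupil[1995] - 1
real_growth = real_2005 / real_1995 - 1
print(real_1995, real_2005, nominal_growth, real_growth)
```

In this made-up case, $6,000 in 1995 is worth $8,000 in 2005 dollars, so what looks like 67% nominal growth is 25% growth after adjustment.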

FIGURE 2 – TOTAL REVENUES PER PUPIL – INFLATION ADJUSTED, NOT REGIONALLY ADJUSTED (2005 Dollars)

What this shows us is that there’s quite a bit of variation in revenues for local public school districts across states, with some states investing a lot – like New Jersey, Vermont and Wyoming and some not so much – like Utah, Arizona and California. This graph also tells us that many states really didn’t see consistent spending/district revenue growth throughout the period. No massive… long term… uniform… spending bubble… from which they all benefited.

The patterns of revenue increases mirror patterns of pupil to teacher ratio change over time. Yes, some states like Vermont, Wyoming and New Jersey did see pupil to teacher ratios decline. But, not so for Utah, California (after 1998) or Arizona. In fact, in Arizona pupil to teacher ratios increased over time!

FIGURE 3 – PUPIL TO TEACHER RATIOS ACROSS STATES AND OVER TIME

And, it is similarly foolish to assert that all states put themselves over the brink in terms of taxes to support all of this supposedly lavish spending. Here are the state and local direct expenditures as a percent of personal income across states:

FIGURE 4 – STATE AND LOCAL DIRECT EXPENDITURES (All, not just Educ.) AS A PERCENT OF PERSONAL INCOME

Look at New Jersey, which has provided relatively high levels of funding, with increases in the early 1990s and again from 1998 to 2003 (though lagging in the middle). New Jersey is not among the highest at all on state and local direct expenditures as a percent of personal income – because incomes in New Jersey are quite high. Arizona is also relatively low and Utah and California only average – not high. The “high” state and local direct spending burden states in this mix are Wyoming and Vermont.

So then, what about that story line we’re being told applies across our nation’s schools –

  1. that they’ve been swimming in money for decades – a huge ongoing spending bubble – and
  2. that they’ve spent it all on pupil to teacher ratios – increasing teacher quantity not quality – and
  3. that they’ve taxed their state residents to death in the process – and
  4. got nothing for it in improved student outcomes.

Well, the only state that seems to come close here is Vermont (and perhaps Wyoming) – which does have high and growing public expenditure burden (and the highest “effort” index at www.schoolfundingfairness.org), increased education spending per pupil over time and continued decline in pupil to teacher ratios.

Of course, that last part of the story line about outcome failure doesn’t fit so well, because Vermont does very well on outcomes (See this article for discussion of empirical research on results of Vermont school finance reforms: https://schoolfinance101.com/wp-content/uploads/2010/01/doreformsmatter-baker-welner.pdf)

It also turns out that even Vermont’s “bubble” story isn’t so simple. Vermont really hasn’t been adding significant numbers of additional staff and spending a lot more in recent years. Rather, Vermont is experiencing a significant decline in student population – but has not adjusted its public education system accordingly – reorganizing into more efficiently organized schools and districts. You see, if you spend the same amount and retain similar numbers of teachers, when enrollments decline, pupil to teacher ratios decline and per pupil spending goes up. Yeah… same effect as a spending bubble – but very different cause. And yes, this is something that should be addressed. But again – somewhat different issue! Except for an infusion of funding in the late 1990s in response to school funding equity litigation – Vermont has not necessarily been on a wild education spending binge to reduce class sizes – Rather, the state is a victim of declining school-aged population, resulting in declining enrollment and declining pupil to teacher ratios.
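The arithmetic behind that point is simple enough to sketch. The district below is entirely hypothetical: staffing and total spending are held constant while enrollment falls.

```python
def ratios(enrollment, teachers, total_spending):
    """Return (pupil-to-teacher ratio, spending per pupil)."""
    return enrollment / teachers, total_spending / enrollment

# Hypothetical district: same teachers and same budget, fewer students.
teachers = 100
total_spending = 12_000_000

before = ratios(1500, teachers, total_spending)
after = ratios(1200, teachers, total_spending)
print(before, after)
```

With no new hires and no new money, the pupil to teacher ratio drops from 15 to 12 and per pupil spending rises from $8,000 to $10,000, which looks just like a "bubble" in the graphs.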

Here’s Vermont enrollment over this same period.

FIGURE 5 – VERMONT ENROLLMENTS

This map shows the locations and enrollment of Vermont schools. Small red dots are schools with 50 or fewer students and small orange/brown dots have 50 to 100.  In some areas of the state, you can find small red dots within a few miles of each other (many of which are small town schools – in separate towns – actually serving the same grade levels/ranges).

FIGURE 6 – LOCATIONS AND ENROLLMENT SIZES OF VERMONT SCHOOLS

Okay, so if this story line doesn’t even fit for Vermont, then who? Perhaps no-one! Arguments built on assumptions of “national trends” regarding financing, class size and/or most features of our public education systems – our 50 and then some systems – are generally unhelpful (if not entirely misguided).

Arguments that tell state legislators across the nation – including those in states like Utah, Arizona and California – that now is the time to cut, cut, cut, because we just went through decades of spend, spend, spend, are downright ignorant and irresponsible.

And please check out this post: https://schoolfinance101.wordpress.com/2010/10/27/when-schools-have-money/ where I explain that within states – especially inequitable states like NY or IL – if and when anyone did benefit from reduced pupil to teacher ratios and more importantly smaller class sizes, it was often those in the most affluent communities.

A few updated NJ charter figures

New, updated slides in PPT format (for clarity on labels): CHARTER SCHOOLS_NOV2010

I expect people will be asking why some of my figures previously posted don’t match up exactly with figures presented by others on New Jersey Charter Schools – including those produced by ACNJ in a new report.  In short, the answer is that at least with regard to “poverty” measurement and comparisons across charters and Newark Public Schools, they are different measures. In my previous slides I show a bar graph of Free Lunch rates and later show scatterplots of performance by Free or Reduced Price Lunch rates. ACNJ and many others use only Free or Reduced lunch rates, never exploring the distinction between the two. Seems like a subtle difference for the lay reader and one that might not sink in right away. But, it can actually be an important distinction in this type of comparison.

Here’s a link to the differences in eligibility guidelines: http://www.fns.usda.gov/cnd/governance/notices/iegs/iegs.htm

For children to qualify for Free Lunch, their family income must be at or below 130% of the Poverty Income Level (that is, no more than 30% above the poverty line).

The income threshold for Reduced Price Lunch is 185% of the Poverty Income Level.

The fact is that most school aged children in Newark fall under the 185% income level with respect to the poverty income level. As such, most schools in Newark have over 80% children in this category. Therefore, it is hard to use this relatively “generous” income threshold in order to distinguish differences in populations across Newark Schools- NPS or charter. The lower income threshold serves as a better way to distinguish the differences.
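Here is a small sketch of why the choice of threshold matters. The two schools and their family incomes are invented, and the function names are hypothetical. Incomes are expressed as multiples of the poverty income level.

```python
def lunch_category(family_income, poverty_level):
    """Classify a child using the 130% and 185% income thresholds."""
    ratio = family_income / poverty_level
    if ratio <= 1.30:
        return "free"
    if ratio <= 1.85:
        return "reduced"
    return "paid"

def school_rates(income_ratios):
    """Return (share free lunch, share free or reduced lunch) for one school."""
    cats = [lunch_category(r, 1.0) for r in income_ratios]
    n = len(cats)
    free = cats.count("free") / n
    free_or_reduced = (cats.count("free") + cats.count("reduced")) / n
    return free, free_or_reduced

# Two hypothetical schools, ten students each.
school_a = [0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.25, 1.5, 1.7, 2.5]
school_b = [1.0, 1.2, 1.4, 1.5, 1.6, 1.6, 1.7, 1.8, 1.8, 2.5]

print(school_rates(school_a))
print(school_rates(school_b))
```

Both made-up schools come out at 90% free or reduced lunch, yet one is 70% free lunch and the other only 20%. The higher threshold hides exactly the difference we are trying to see.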

Here is the % Free Lunch using NJDOE 2009-10 data: http://www.state.nj.us/education/data/enr/enr10/

These data are highly consistent (except for Lady Liberty) with my 2008-09 data from the National Center for Education Statistics Common Core of Data. Most Newark Charter Schools, especially the frequently touted high performers, have very low relative rates of children below the 130% poverty threshold.

Here is the % Free or Reduced Lunch using NJDOE 2009-10 data:

Here, the charter schools scatter themselves more widely among the NPS schools. They appear more comparable and their average is only marginally different by some accounts. BUT, the reality is that most kids in Newark fall under this threshold and nearly every school in the above figure exceeds 70% free or  reduced lunch and the vast majority exceed 80%. This higher income threshold limits our ability to distinguish real differences in student populations across Newark schools.

Another angle would be to say that the difference in the position of charter schools in the second graph versus the first is an indication that CHARTERS ARE SERVING THE LESS POOR AMONG THE POOR.  Not all, but many are doing this. Most surprising perhaps is that Robert Treat in particular remains a standout even with regard to the less poor.

Additional Figures:

Here are the special education classification rate data for 2004 through 2007:

NJDOE has not posted the more recent classification rate data in the same format. Enrollment files used in the first part of this post have disaggregated classification data, but report mostly “0” values for charters because counts were too small to report. NJDOE does report placement data, but again, these data are spotty at best for NJ Charter schools.

Here are the frequency distributions by school, for Newark Schools, by Free Lunch and by Free or Reduced Lunch. As you can see, the distribution for Free or Reduced Lunch is all crunched in the range above 80% making it more difficult to distinguish true poverty differences among schools serving Newark children.

GUIDELINES FOR USING/COMPARING NJ CHARTER DATA

  1. When comparing across schools within a poor urban setting, compare on the basis of free lunch, not free or reduced lunch, so as to pick up variation across schools. The reduced lunch income threshold is too high to pick up variation.
  2. When comparing free lunch rates across schools, either a) compare against individual schools and nearest schools, OR b) compare against district averages by GRADE LEVEL. Subsidized lunch rates decline in higher grade levels (for many reasons, to be discussed later). Most charter schools serve elementary and/or middle grades. As such they should be compared to traditional public schools of the same grade level. High school students bring district averages down.
  3. When comparing test score outcomes using NJ report card data, be sure to compare General Test Takers, not Total Test Takers. Total Test Takers include scores/pass rates for children with disabilities. But, as we have seen time and time again in the charts above, Charters tend not to serve these students. Therefore, it is best to exclude scores of these students from both the Charter Schools and Traditional Public Schools.
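As a rough illustration of guidelines 1 and 2, the sketch below compares a charter's free lunch rate against a district average computed two ways: across schools of the same grade span, and across all schools including high schools. Everything here is an assumption for illustration: the `School` record, the grade-span labels, and all the counts are made up and are not NJDOE or NCES figures.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    grades: str       # hypothetical grade-span label, e.g. "K-8" or "9-12"
    enrollment: int
    free_lunch: int   # students qualifying for free lunch only (not reduced)


def rate(schools):
    """Enrollment-weighted free lunch rate across a list of schools."""
    return sum(s.free_lunch for s in schools) / sum(s.enrollment for s in schools)


# Made-up district schools and a made-up charter, for illustration only.
district = [
    School("District Elementary A", "K-8", 500, 400),
    School("District Elementary B", "K-8", 300, 240),
    School("District High School", "9-12", 1000, 500),
]
charter = School("Example Charter", "K-8", 200, 100)

charter_rate = rate([charter])
same_span_rate = rate([s for s in district if s.grades == charter.grades])
all_grades_rate = rate(district)

print("Charter free lunch rate:        ", round(charter_rate, 3))
print("District rate, same grade span: ", round(same_span_rate, 3))
print("District rate, all grade levels:", round(all_grades_rate, 3))
```

In this made-up case the gap between the charter and the district is much larger against same-grade schools than against the all-grades average, because the high school pulls the district average down.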

ACNJ’s Newark Kids Count 2010 report appears to fail on all 3 guidelines above.

ACNJ’s Newark Kids Count: http://acnj.org/admin.asp?uri=2081&action=15&di=1841&ext=pdf&view=yes

ADDITIONAL STUFF

One question raised by the ACNJ Kids Count yearbook is why the NPS schools hold ground with Newark Charters through 4th grade, but appear to lose ground in 8th grade. The charter advocate explanation is that charters are simply doing better, cumulatively, with students through 8th grade and preparing them for college. However, there are two other equally if not more likely explanations.

First, the mix of schools that are charter schools serving 8th grade students is different from the mix serving 4th grade students. Heavy “cream-skimmers” like North Star Academy start at 5th grade. And some lower performing charters, actually serving more representative populations, end at 4th grade. The different mix of charters having students taking the 8th grade test versus those taking the 4th grade test may explain a substantial portion of the difference. It’s also important to understand that at this break, where low performers end and high performers start up, many low performing students may be pushed back into NPS schools while higher performing ones are creamed off. Here are the charter school proficiency rates (general test takers only) from 2009 state report cards, alongside NPS proficiency rates (averaged across tests).

Second, charter schools have the ability to use cohort attrition to their advantage, over time, shedding the students who perform less well on assessments, perhaps due to the extent of parental obligation involved in keeping students in the school or even due to the message that the child “just can’t cut it here.” NJDOE data don’t allow for precise student level tracking to see whether individual students stay on in particular charter schools or which students do. But, one can do a relatively simple back of the napkin approach using the grade level enrollment files to determine whether or not cohort attrition may be an issue. Note from the performance graph above, North Star in particular shows incrementally higher proficiency rates at each higher grade level. While this is not a cohort comparison, it is possible that this pattern arises due to attrition of weaker students in higher grades.

Here’s a quick look at 3 cohorts of 5th graders across these schools:

This tabulation shows significant cohort attrition for North Star in particular.
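For anyone who wants to repeat this kind of back of the napkin check, here is a minimal sketch. It assumes grade level enrollment counts keyed by (fall year, grade) for a single school; the function name and the counts are made up for illustration and are not the NJDOE numbers behind the tabulation above.

```python
def cohort_retention(enrollment, start_year, start_grade, end_grade):
    """Follow one cohort diagonally through grade-level enrollment counts.

    enrollment maps (year, grade) -> head count for a single school.
    Returns grade -> share of the starting cohort size still enrolled.
    This is a NET ratio: aggregate counts cannot tell students who left
    apart from new students who entered in later grades.
    """
    base = enrollment[(start_year, start_grade)]
    shares = {}
    for step in range(end_grade - start_grade + 1):
        key = (start_year + step, start_grade + step)
        if key in enrollment:
            shares[start_grade + step] = enrollment[key] / base
    return shares


# Made-up counts for one hypothetical school, for illustration only.
example_school = {
    (2006, 5): 100,
    (2007, 6): 90,
    (2008, 7): 80,
    (2009, 8): 70,
}

for grade, share in cohort_retention(example_school, 2006, 5, 8).items():
    print("Grade", grade, "share of original 5th grade cohort:", round(share, 2))
```

Because this only follows head counts, a falling share suggests attrition but says nothing about which students left. That would take student level tracking, which the public files don't allow.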

Now, there is nothing particularly conclusive about the above slides, but they do raise questions as to whether the difference in 8th grade scores between NPS and Newark Charters is at least partly, if not substantially, a function of a) the different mix of schools serving 8th grade and b) the significant cohort attrition of at least one of the larger schools. Note that these attrition patterns, if they involve shedding lower performing students, have the effect of both raising the charter 8th grade average and depressing the NPS 8th grade average.

New Jersey Charter Schools Association gets angry over… data?

For some time now, I’ve been pulling together data from the National Center for Education Statistics and from the New Jersey Department of Education on New Jersey Charter Schools. Why do I do it? Mainly out of frustration that no-one else seems to be playing a monitoring role. I’ve not seen any good compilations or presentations of the various types of data that exist on New Jersey Charter Schools. That said, the data aren’t great. They aren’t worthy of high level academic research. But they are what we’ve got, and they are from the primary government sources charged with collecting these data. So, here are a series of my slides compiled from the data:

Link to PDF slides: CHARTER SCHOOLS_OCT2010

CHARTER SCHOOLS_NOV2010 (Includes updated slides)

Updated Figures: https://schoolfinance101.wordpress.com/2010/11/10/a-few-updated-nj-charter-figures/


CHARTER SCHOOL DEMOGRAPHICS

Data: LINK TO UPDATED SPREADSHEET OF FREE LUNCH AND SPECIAL ED DATA

On second look, it appears that this first graph matches the 2008-09 data from the spreadsheet linked above (not the 2007-08 as originally labeled).


CHARTER SCHOOL PERFORMANCE WITH RESPECT TO DEMOGRAPHICS (NEWARK)


CHARTER SCHOOLS IN SPATIAL CONTEXT (CLICK FOR READABILITY)

Previous posts and additional figures on NJ charters can be found throughout my blog at:

1. Math Trends over Time by District Factor Group: https://schoolfinance101.wordpress.com/2009/12/14/nj-charter-update-math-trends-over-time/

2. Playing with Charter Numbers: https://schoolfinance101.wordpress.com/2009/11/13/playing-with-charter-numbers-in-nj/

3. Replicating Robert Treat Academy: https://schoolfinance101.wordpress.com/2009/11/05/replicating-robert-treat-academy/

My general conclusions from these previous posts and the above graphs?

  1. New Jersey Charter Schools generally serve smaller shares of children qualifying for free lunch than schools in their host district and schools in their immediate surroundings.
  2. New Jersey Charter Schools serve very few children with disabilities.
  3. New Jersey Charter School performance, like charter school performance elsewhere, is a mixed bag. Some of the highest performers are simply not comparable to traditional public schools in their districts because they serve such different student populations (far fewer low income children and few or no special education students). So, even if we found that these schools produced greater gains for their students than similar students would have achieved in the traditional public schools, we could not sort out whether that effect came from school quality differences or from peer group differences (which doesn’t matter from the parent perspective, but does from the policy perspective).

A colleague of mine shared these data with an interested reporter. I spoke with the reporter. And the reporter requested a response from a representative of the New Jersey Charter Schools Association.

The New Jersey Charter Schools Association responded:

The New Jersey Charter Schools Association seriously questions the credibility of this biased data. Rutgers University Professor Bruce Baker is closely aligned with teachers unions, which have been vocal opponents of charter schools and have a vested financial interest in their ultimate failure.

Baker is a member of the Think Tank Review Panel, which is bankrolled by the Great Lakes Center for Education Research and Practice. Great Lakes Center members include the National Education Association and the State Education Affiliate Associations in Illinois, Indiana, Michigan, Minnesota, Ohio and Wisconsin. Its chairman is: Lu Battaglieri, the executive director of the Michigan Education Association.

There are now thousands of children on waiting lists for charters schools in New Jersey. This demand shows parents want the option of sending their children to these innovative schools and are satisfied with the results.

Wow. That’s quite interesting. These data can’t be credible simply because I sit on the Think Tank Review Panel and I am – ACCORDING TO THEM (news to me) – closely aligned with teachers’ unions. According to this statement, these data are necessarily “biased,” even though the statement provides no evidence whatsoever to that effect. Heck, I’ve merely graphed and mapped NCES and NJDOE data. Did my mapping software introduce some devious union bias? Damn that ArcView!

By the way, I don’t get any kind of ongoing pay for doing this Think Tank Review stuff. I do get contracted to write a policy brief or critique on occasion, and it’s a relatively small sum of money for each brief or critique.  I consult for a lot of groups around the country and a long list can be found on my vitae, here: B.Baker.Vitae.October5_2010

I don’t take any money for this blog or reprints/re-posts of it, and quite honestly, when I do take contract money to write a policy brief or report – whoever it’s for – I go to extra lengths to make sure that the data and analysis are defensible, typically opting for the most conservative representation of the data, knowing full well that the instinct of any opposing critic will be to pounce.

Hey… these data are what they are. I’m just making graphs of them. This official statement of the New Jersey Charter Schools Association is a childish personal attack from an organization that apparently has little else to stand on.

SOURCE LINKS:

For free lunch data and enrollments: http://nces.ed.gov/ccd/

Use the “build a table” function (under CCD Data tools)

For special education count data:

General NJDOE site: http://www.state.nj.us/education/specialed/data/

For 2007 classification rates: http://www.state.nj.us/education/specialed/data/2007.htm

First link: http://www.state.nj.us/education/specialed/data/ADR/2007/classification/distclassification.xls

Note that the same link is dead for 2008 and 2009: http://www.state.nj.us/education/specialed/data/2008.htm

For test score data: http://education.state.nj.us/rc/rc09/database.htm