Why I’m not crying for Louisiana and Colorado

Many of the “reformers” out there are whining and fist-thumping about the surprise omission of Louisiana and Colorado as Race to the Top winners. After all, Louisiana has been a heavy favorite from the outset of RttT, and Colorado… well, Colorado took the amazingly bold leap of adopting legislation to mandate that a majority of teacher evaluation be based on value-added test scores. That’s got to count for something. Heck, these two states should have gotten the whole thing! Here’s Tom Vander Ark’s take on this huge surprise loss: http://edreformer.com/2010/08/co-la-surprise-losers/

Now here’s why I find it somewhat of a relief that these two states did not find themselves in the winners’ circle (not that I can identify a great deal of logic to support the picks that did win… but…).

I’ve written numerous times about Louisiana’s public education system, and that state’s support, or lack thereof, for providing a decent quality education for the children of Louisiana.

https://schoolfinance101.wordpress.com/2009/12/18/disg-race-to-the-top/

Here’s an excerpt from that previous post:

Let’s take a look at Louisiana’s education system. Yes, their system needs help, but the reality is that Louisiana politicians have never attempted to help their own system. In fact, they’ve thrown it under the bus, and now they want an award? Here’s the rundown:

  • 3rd lowest (behind Delaware & South Dakota) % of gross state product spent on elementary and secondary schools (American Community Survey of 2005, 2006, 2007)
  • 2nd lowest percent of 6- to 16-year-old children attending the public system, at about 80% (tied with Hawaii, behind Delaware) (American Community Survey of 2005, 2006, 2007). The national average is about 87%.
  • 2nd largest (behind Mississippi) racial gap between % white in private schools (82%) and % white in public schools (52%) (American Community Survey of 2005, 2006, 2007). The national average is a 13 percentage-point difference in whiteness, compared to 30 points in Louisiana.
  • 3rd largest income gap between publicly and privately schooled children at about a 2 to 1 ratio. (American Community Survey of 2005, 2006, 2007)
  • 4th highest percent of teachers who attended non-competitive or less competitive (bottom 2 categories) undergraduate colleges, based on Barron’s ratings (NCES Schools and Staffing Survey of 2003-04). Almost half of Louisiana teachers attended less or non-competitive colleges, compared to 24% nationally.
  • Negative relationship between per pupil state and local revenues and district poverty rates, after controlling for regional wage variation, economies of scale, and population density (poor get less).
  • 46th (of 52) on NAEP 8th Grade Math in 2009. 38th of 41 in 2000. http://nces.ed.gov/nationsreportcard/statecomparisons/
  • 49th (of 52) on NAEP 4th Grade Math in 2009. 35th of 42 in 2000.

So, this is a state where 20% abandon the public system, and 82% of those who leave are white and have incomes twice those of the children left in the public system, half of whom are non-white. While the racial gap is large in Mississippi, a much smaller share of Mississippi children abandon the public system, and Mississippi is average on the percent of GSP allocated to public education. Mississippi simply lacks the capacity to do better. Louisiana doesn’t even try. And they deserve an award?

Quite honestly, I hadn’t really thought much about Colorado’s chances until today. I was certainly aware of their finalist status and aware of the reform crowd support for their new teacher evaluation legislation. But I hadn’t really reviewed their “indicators.” Here’s my summary of Colorado from earlier today:

Using 2007-08 data, Colorado ranks:

  • 45th in effort (% gross state product spent on schools)
  • 39th in funding level overall
  • 32nd in funding fairness (has a system whereby higher poverty districts have systematically less state and local revenue per pupil than lower poverty districts)
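
For readers wondering what sits behind that third indicator: the “fairness” measure comes from a regression along the following lines. This is only a rough sketch with made-up file and column names, not my actual analysis, but it shows where the “regressive” label comes from – the sign of the poverty coefficient.

```python
# Rough sketch of a funding "fairness" regression (file and column names are
# hypothetical stand-ins): predict state + local revenue per pupil from
# district poverty, controlling for regional wages, scale, and density.
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("state_districts.csv")  # hypothetical district-level file

m = smf.ols(
    "revenue_pp ~ poverty_rate + regional_wage_index + enrollment + pop_density",
    data=districts,
).fit()

# A negative poverty coefficient = "regressive": higher poverty districts
# systematically receive less state and local revenue per pupil.
print(m.params["poverty_rate"])
```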

Yes, better than Louisiana, but nothin’ to brag about. And yes, both are marginally better than Round 1 winner Tennessee… but nearly every other state in the nation is.

So, where do these two states fit into those scatterplots I posted earlier today, which identified Round 1 and Round 2 winners? Here they are. First, fiscal effort and overall spending level: both states are very low effort states, and both are relatively low spending states.

And next, effort and funding fairness – or the extent to which funding is allocated in greater amounts to districts with greater needs.

In both cases, Louisiana and Colorado fall toward the lower left hand corner of the plot. Both are very low fiscal effort states. They have the capacity to provide more support for public education – BUT DON’T! Both states are also “regressive” – allocating systematically less funding per pupil to higher need districts, with Louisiana close to a flat distribution. And both are generally low spending despite their capacity to do better.

Improving state data systems – linking those data to teacher preparation institutions in order to impose sanctions on those institutions – banning teachers from obtaining tenure until they can achieve 3 consecutive years of positive value-added scores (error rates and year-to-year fluctuations alone may make this a low-probability event) – and expanding charter schools are not likely to dig these states out of their current position. Doing so will require far greater investment than RttT could ever provide, especially in the case of Louisiana. In fact, dramatically increasing job risk and career instability for individuals interested in entering teaching, without also increasing the reward, is likely to have significant negative effects. Unfortunately, it is about as likely that losing RttT will cause these states to reconsider their shortsighted reform agendas as it is that reading this blog post will get them to reconsider the persistent deprivation of their public education systems.
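
On that tenure provision, a quick back-of-the-envelope simulation makes the point about error rates. Assume (my numbers, purely for illustration) a teacher whose true effect is exactly average, and whose measured score each year is just that true effect plus independent noise. Each year’s score comes up positive about half the time, so three positives in a row is roughly a coin flip cubed:

```python
# Toy simulation (invented parameters): chance that an exactly-average teacher
# posts 3 consecutive positive value-added scores when scores are noisy.
import random

random.seed(1)

def positive_run(true_effect=0.0, noise_sd=1.0, n_years=3):
    """One 3-year window: are all measured scores positive?"""
    return all(true_effect + random.gauss(0, noise_sd) > 0 for _ in range(n_years))

trials = 100_000
hits = sum(positive_run() for _ in range(trials))
print(f"P(3 straight positive years for an average teacher): {hits / trials:.3f}")
# prints ~0.125, i.e. 0.5 ** 3: most genuinely average teachers would fail the bar
```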

RttT Round 2 – Stuff that Doesn’t Matter!

Unlike many RttT enthusiasts, I have to say that I was pleased to see that Louisiana and Colorado were not among the winners. I’ve written extensively about Louisiana public schools in the past:

https://schoolfinance101.wordpress.com/2009/12/18/disg-race-to-the-top/

Although Colorado doesn’t look as bad as Louisiana on the indicators I often use on this blog, it ain’t pretty.

Using 2007-08 data, Colorado ranks:

  • 45th in effort (% gross state product spent on schools)
  • 39th in funding level overall
  • 32nd in funding fairness (has a system whereby higher poverty districts have systematically less state and local revenue per pupil than lower poverty districts)

Of course, these indicators – which I believe tell us a lot about state education systems – don’t really matter much when it comes to the big race, as I pointed out here:

https://schoolfinance101.wordpress.com/2010/03/29/and-the-rttt-winners-are/

Thankfully, while these indicators of actual effort to finance state school systems, and of participation rates in those systems, didn’t matter in Round 2 either, the picks for Round 2 winners are somewhat – though not entirely – less offensive. I’ve highlighted in yellow with red type any cases where a Round 1 or 2 winner comes in 40th or lower on an indicator – Bottom 10. I’ve indicated in green with blue type cases where states are in the Top 10. Sadly, there are far more Bottom 10 cases than Top 10 cases.

I would consider EFFORT and FAIRNESS to be the two key indicators here over which states have greatest control. A poor state could put up significant effort and still not raise significant funding (Mississippi). The only Round 2 winner state with no “bad” marks and many good ones is Massachusetts. Massachusetts scores well on fairness and overall funding level. Tennessee, from Round 1, is simply a disgrace! North Carolina is perhaps the weakest link in Round 2, along with Florida, which ranks poorly but avoids the bottom ten on any measure, and Hawaii, which makes the Bottom 10 on a measure less within the control of the state – coverage. But Hawaii has inflicted significant damage on its already struggling public schooling system in recent years.

And here are a few interesting two-dimensional views of RttT Round 1 and Round 2 states. First, here’s a two-dimensional view of educational effort and spending level – specifically, spending for high poverty districts. The two are reasonably related: effort explains about half of the variation in spending levels. States like North Carolina and Tennessee are low on effort and low on spending. States like Massachusetts are relatively high on spending, but average on effort. Rhode Island, Maryland, New York, and Ohio are above average on spending and effort. But spending level doesn’t guarantee that it’s spent – or distributed – fairly across wealthier and poorer districts.
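
(For the curious, “explains about half of the variation” is just the r-squared of a simple regression of spending on effort. A toy version, with invented numbers standing in for the real state data:)

```python
# Toy computation (state values invented): the r-squared behind "effort
# explains about half of the variation in spending levels."
import numpy as np

effort = np.array([3.2, 4.1, 3.6, 4.8, 3.0, 4.4, 3.9, 5.0])        # % of GSP, made up
spending = np.array([9.5, 10.0, 8.0, 12.0, 9.0, 9.8, 12.5, 11.0])  # $k per pupil, made up

r = np.corrcoef(effort, spending)[0, 1]
print(f"r = {r:.2f}, r-squared = {r ** 2:.2f}")  # ~0.5 in the actual state data
```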

Here’s a look at “fairness” and spending level. New York is high on spending level, but not so good on fairness. In New York State, wealthy districts in Westchester County and on Long Island outspend much of the rest of the nation. But poorer districts, including New York City, are largely left out, spending significantly less than the affluent suburbs. Then there are those wonderful states where higher poverty districts have slightly higher revenue per pupil than lower poverty ones, but for the most part – everyone is similarly deprived. These are the “you get nothing!” (reference to Willy Wonka in a previous post) states, and Tennessee tops that list! Even more depressing are the states where “you get nothing” in general, and you get less if you are poor. Those states include RttT Round 2 standout North Carolina… and Florida sits on the margin of this group. Massachusetts is the “good” standout in this figure.


And here’s effort and coverage – or the share of 6- to 16-year-olds attending the public school system. New York, Maryland and Ohio (on the margin) do poorly on coverage, but have reasonable overall effort. Delaware is the real outlier here… with very low effort and very, very low coverage. Thankfully, none in Round 2 can match Delaware!

Finally, here is the predicted state and local revenue level for high poverty districts, plotted against mean 2009 NAEP grade 4 reading and math scores (combined). It’s always fun to throw the outcome data in there. And in this case, the RttT Round 1 and Round 2 winners are distributed across the range. Again, Tennessee from Round 1 is the biggest “bad” outlier, but one could say that Massachusetts from Round 2 is a positive counterbalance. Clearly, the demography and economy of these two states differ significantly. My complaint with Tennessee is not that it performs poorly partly because it has a large, low-income population. Rather, my problem with Tennessee, as I’ve noted many times previously, is that TN puts up little effort and spends little, and barely spends even that paltry amount equitably. In addition, as I’ve discussed previously, TN has consistently had the lowest outcome standards.

For more on Rhode Island school funding, see: https://schoolfinance101.wordpress.com/2010/07/01/the-gist-twists-rhode-island-school-finance/

For more on Hawaii, see: https://schoolfinance101.wordpress.com/2009/11/06/hawaiis-funding-mess-my-thoughts-on-why/

As I noted on my previous post regarding Round 1 winners:

So then, who cares? Or why should we? Many have criticized me for raising these issues, arguing “that’s not the point of RttT. It’s (RttT) not about equity or adequacy of funding, or how many kids get that funding. That’s old school – stuff of the past – get over it! This… This is about INNOVATION! And RttT is based on the ‘best’ measures of states’ effort to innovate… to make change… to reach the top!”

My response is that the above indicators measure Essential Pre-Conditions! One cannot expect successful innovation without first meeting these essential preconditions. If you want to buy the “business-minded” rhetoric of innovation, which I wrote about here, you also need to buy into the reality that the way in which businesses achieve innovation also involves investment in both R&D and production (coupled with monitoring production quality). You can have all of the R&D and quality monitoring systems in the world, but if you go cheap on production and make a crappy product – you haven’t gotten very far. On average, it does cost more to produce higher quality products.

This also relates to my post on common standards and the capacity to achieve them. It’s great to set high standards, but if you don’t allocate the resources to achieve those standards, you haven’t gotten very far! It costs more to achieve high standards than low ones. Tennessee provides a striking example in the maps from this post! (Their low spending seems generally sufficient to achieve their even lower outcome standards!)

With that in mind, should states automatically be disqualified from RttT for doing so poorly on these Essential Preconditions? Perhaps not. After all, these are states which may need to race to the top more than others (assuming the proposed RttT strategies actually have anything to do with improving schools). But states doing so poorly on key indicators like effort and overall resources, or even the share of kids using the public school system, should at least have to explain themselves – and show how they will do their part to rectify these concerns.

For a video version of my comments on the big race, see:


New from the Center on Inventing Research Findings

The other day, the Center on Reinventing Public Education (CRPE) at the University of Washington released a bold new study claiming that Washington school districts underpay math and science teachers relative to other teachers – which is clearly an abomination in a state that is home to high-tech companies like Boeing and Microsoft.

The study consisted of looking at the average salaries of math and science teachers and of other teachers in several large Washington State school districts, and showing that in most, the average for math and science teachers is lower than for other teachers. As it turns out, the average experience of math and science teachers is lower, and far more of them are in their first five years. So, it’s mainly about the experience differential. The authors infer from this that turnover of math and science teachers must be higher, but never actually test this assumption. They next infer that this turnover must be a function of less competitive salaries – relative to what those teachers could earn outside of teaching.

The study never calculates relative turnover of math and science versus other teachers. Rather, the study implies that lower average experience levels must be indicative of higher turnover. The only follow-up analysis on this point is to show that math and science teachers, in addition to being less experienced, are also younger. Wow! That doesn’t validate the turnover claim – which may well be true… but there’s no validation here.
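
To see why the inference doesn’t follow, here’s a small simulation of my own (all numbers invented). Two fields share exactly the same annual attrition rate; one field simply ramped up hiring in recent years. The expanding field ends up less experienced on average and, under a single experience-based salary schedule, lower paid on average – with no turnover difference whatsoever:

```python
# Hypothetical demonstration: lower average experience (and salary) without
# any turnover difference. Both fields face identical annual attrition; the
# math/science field just hired more people recently. All numbers invented.
def avg_experience_and_salary(cohorts, attrition=0.08, base=35000, step=1500):
    """cohorts[t] = number of teachers hired t years ago (t=0 is this year)."""
    survivors = [n * (1 - attrition) ** t for t, n in enumerate(cohorts)]
    total = sum(survivors)
    avg_exp = sum(t * s for t, s in enumerate(survivors)) / total
    avg_sal = sum((base + step * t) * s for t, s in enumerate(survivors)) / total
    return avg_exp, avg_sal

others = [100] * 25               # steady hiring for 25 years
mathsci = [160] * 5 + [100] * 20  # same attrition, but a recent hiring ramp-up

for name, cohorts in [("other fields", others), ("math/science", mathsci)]:
    exp, sal = avg_experience_and_salary(cohorts)
    print(f"{name}: avg experience {exp:.1f} yrs, avg salary ${sal:,.0f}")
```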

This is a silly study to begin with, but check out the not-so-subtle difference between the press release and the study itself.

The Press Release
http://www.crpe.org/cs/crpe/view/news/111

The analysis finds that in twenty-five of the thirty largest districts, math and science teachers had fewer years of teaching experience due to higher turnover—an indication that labor market forces do indeed vary with subject matter expertise. The subject-neutral salary schedule works to ignore these differences.

The Study
http://www.crpe.org/cs/crpe/download/csr_files/rr_crpe_STEM_Aug10.pdf

That said, the lower teacher experience levels are indicative of greater turnover among the math and science teaching ranks, lending support to the hypothesis that math and science teachers may have access to more compelling non-teaching opportunities than do their peers. (p. 5)

Both are a stretch, given the thin analysis, but the press release declares outright that turnover is the issue, while the study merely infers without ever testing or validating.

The study goes on to be an indictment of paying teachers more for years of experience (because we all know that experience doesn’t matter?) – and argues that differential pay by teaching field is the answer. This is an absurd false dichotomy. Even if it is reasonable to differentiate pay by teaching field, that does not mean that it is unreasonable to differentiate by experience, or that taking dollars away from experience-based pay is the only way to differentiate by field.

I happen to agree that there exist significant problems with Washington’s statewide teacher salary schedule, and that among other things, math and science teachers in Washington State are disadvantaged on the broader labor market. But the CRPE study does nothing to advance this argument.

Previous work by Lori Taylor of Texas A&M does:

Report on Taylor Study:

http://www.wsipp.wa.gov/rptfiles/08-12-2201.pdf

Taylor Study:

http://www.leg.wa.gov/JointCommittees/BEF/Documents/Mtg11-10_11-08/WAWagesDraftRpt.pdf

The CRPE study goes further to say that the findings indicate that school districts haven’t taken seriously a state policy initiative to increase investment in math and science teaching. So let’s say that the bill to which the CRPE press release refers – House Bill 2621 – really did stimulate districts to step up their efforts to hire more math and science teachers. What would likely happen to math and science teachers’ average salaries? Well, many new math and science teachers would enter the system. That would alter the experience distribution of math and science teachers – they would likely become less experienced on average – and hence their average salaries would decline and be lower than average salaries in fields not stimulated by similar initiatives.
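
The arithmetic is trivial (numbers invented for illustration):

```python
# If the initiative "works" and districts hire a wave of novice math/science
# teachers, the field's average salary mechanically falls under an
# experience-based schedule. All figures below are invented.
veterans, veteran_avg_salary = 100, 55000
new_hires, starting_salary = 25, 38000

after = (veterans * veteran_avg_salary + new_hires * starting_salary) / (veterans + new_hires)
print(f"average salary: ${veteran_avg_salary:,} before, ${after:,.0f} after the hiring push")
```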

When I get a chance, I’ll try to play around with my Washington teacher data set and post some follow-up analyses.

Kevin Welner and I point to similar misrepresentations of findings from several reports from this same center in this article on within and between-district financial disparities:

Baker, B. D., & Welner, K. G. (2010). “Premature celebrations: The persistence of interdistrict funding disparities” Educational Policy Analysis Archives, 18(9). Retrieved [date] from http://epaa.asu.edu/ojs/article/view/718

And now, for some fun follow-up figures:

These figures use individual teacher-level data from the State of Washington. I include all teachers holding “secondary” assignments and identify teachers certified to teach biology, chemistry, physics, general science and math (and all subcategories) using the certification record files on the same teachers. Note that some teachers in the data set hold multiple assignments, so the total number of cases in these graphs is not an exact match for the total number of individual teachers. I haven’t asked for Washington teacher data for a few years, so these only go up to 2006-07. Unlike the CRPE report, which cherry-picks 30 districts, I use the whole state. If I get a chance, I’ll play with some other cuts at the data. These data don’t coincide at all with the CRPE “findings.”

Here are the experience differences:

Here are the salary differences, on average, which coincide with the experience differences:

Now, here are the total numbers of teachers, and apparent decline in share that are math/science certified over this time period. Math/science teachers were relatively flat, while others grew.

Finally, here’s a portion of the regression model of certified base salaries, where I control for degree level, experience, year, hours per day, and days per year, all of which influence salaries. Interestingly, this regression shows that math and science teachers, holding all that other stuff constant, made about $380 more than non-math/science teachers, even under the fixed salary schedule.
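
For those who want to replicate this kind of thing, the model is a garden-variety salary regression. Here’s a minimal sketch – the file and column names are my stand-ins, not the actual Washington data layout:

```python
# Minimal sketch of the base-salary regression described above
# (hypothetical column names, not the actual state file).
import pandas as pd
import statsmodels.formula.api as smf

teachers = pd.read_csv("wa_teachers.csv")  # hypothetical extract of the WA data

model = smf.ols(
    "base_salary ~ mathsci + experience + C(degree_level) + C(year)"
    " + hours_per_day + days_per_year",  # mathsci is a 0/1 certification indicator
    data=teachers,
).fit()

# The coefficient on the math/science indicator is the adjusted differential;
# in my data it comes out to roughly +$380.
print(model.params["mathsci"])
```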


LA Times Study: Asian math teachers better than Black ones

The big news over the weekend involved the LA Times posting of value-added ratings of LA public school teachers.

Here’s how the Times spun their methodology:

Seeking to shed light on the problem, The Times obtained seven years of math and English test scores from the Los Angeles Unified School District and used the information to estimate the effectiveness of L.A. teachers — something the district could do but has not.

The Times used a statistical approach known as value-added analysis, which rates teachers based on their students’ progress on standardized tests from year to year. Each student’s performance is compared with his or her own in past years, which largely controls for outside influences often blamed for academic failure: poverty, prior learning and other factors.

This spin immediately concerned me, because it appears to assume that simply using a student’s prior score erases, or controls for, any and all differences among students in family background, as well as classroom-level differences – who attends school with whom.

Thankfully (thanks to the immediate investigative work of Sherman Dorn), the analysis was at least marginally better than that, and was conducted by a very technically proficient researcher at RAND named Richard Buddin. Here’s his technical report:

The problem is that even someone as good as Buddin can only work with the data he has. And there are at least 3 major shortcomings of the data that Buddin appeared to have available for his value-added models. (I’m setting aside here the potential quality of the achievement measures themselves.) Calculating (estimating) a teacher’s effect on his or her students’ learning – and specifically, identifying the differences across teachers where students are not randomly assigned (with the same class size, comparable peer group, same air quality, lighting, materials, supplies, etc.) – requires that we do a pretty damn good job of accounting for the measurable differences across the children assigned to teachers. This is especially true if our plan is to post names on the wall (or web)!

Here’s my quick-read short list of shortcomings in Buddin’s data that, I suspect, lead to significant problems in precisely determining differences in teacher quality across teachers (a bare-bones sketch of the kind of model at issue follows the list):

  1. While Buddin’s analysis includes student characteristics that may (and in fact appear to) influence student gains, Buddin – likely due to data limitations – includes only a simple classification variable for whether a student is a Title I student or not, and a simple classification variable for whether a student is limited in their English proficiency. These measures are woefully insufficient for a model being used to label teachers on a website as good or bad. Buddin notes that 97% of children in the lowest performing schools are poor, and 55% in higher performing schools are poor. Identifying children simply as poor or not poor misses entirely the variation from poor to very poor among children in LA public schools – which is most of the variation in family background in LA public schools. That is, the estimated model does not distinguish at all between one teacher teaching a class of children who barely qualify for Title I programs, and a teacher with a classroom of children from destitute homeless families or multigenerational poverty. I suspect Buddin himself would have liked to have had more detailed information. But you can only use what you’ve got. When you do, however, you need to be very clear about the shortcomings. Again, most kids in LA public schools are poor, and the gradients of poverty are substantial. Those gradients are neglected entirely. Further, the model includes no “classroom” factors such as class size or student peer group composition (either by a Hoxby approach of using the average ability level of the peer group, or by considering the racial composition of the peer group as done by Hanushek and Rivkin – then again, it’s nearly if not entirely impossible to fully correct for classroom-level factors in these models).
  2. It would appear that Buddin’s analysis uses annual testing data, not fall-spring assessments. This means that the year-to-year gains interpreted as “teacher effects” include summer learning and/or summer learning loss. That is, we are assigning blame or praise to teachers based on what kids learned, or lost, over the summer. If this is true of the models, it is deeply problematic. Okay, you say, but Buddin accounted for whether a student was a Title I student, and summer opportunities are highly associated with poverty status. But, as I note above, this very crude indicator is far from sufficient to differentiate among most LA public school students.
  3. Finally, researchers like Jesse Rothstein have suggested that having multiple years of prior scores on students can significantly reduce the influence of non-random assignment of students to teachers on the ratings of teachers. Rothstein speaks of using 3 years of lagged scores (http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf) so as to sufficiently characterize the learning trajectories of students entering any given teacher’s class. It does not appear that Buddin’s analysis includes multiple lagged scores.
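
To make the three critiques concrete, here’s a bare-bones sketch of the general kind of model at issue. The data layout and variable names are my assumptions for illustration, not Buddin’s actual specification; the point is to show where each shortcoming bites – the crude yes/no poverty and LEP flags (#1), annual scores whose gains span the summer (#2), and a single lagged score rather than several (#3):

```python
# Bare-bones sketch of a value-added model of the kind being critiqued.
# Everything about the data layout is assumed for illustration.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("la_student_panel.csv")  # hypothetical student-year panel

vam = smf.ols(
    "score ~ score_lag1"   # one prior score; Rothstein suggests ~3 lags (#3)
    " + title1 + lep"      # crude yes/no flags for poverty and English status (#1)
    " + C(grade) + C(year)"
    " + C(teacher_id)",    # teacher fixed effects = the published "ratings"
    data=students,
).fit()

# Each teacher's estimated fixed effect becomes his or her value-added score;
# with annual tests, that "effect" also absorbs summer learning or loss (#2).
teacher_effects = vam.params.filter(like="teacher_id")
```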

So then what are some possible effects of these problems, where might we notice them, and why might they be problematic?

One important effect, which I’ve blogged about previously, is that the value-added teacher ratings could be substantially biased by the non-random sorting of students – or, in more human terms – teachers of children having characteristics not addressed by the models could be unfairly penalized or, for that matter, unfairly benefited.
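
Here’s a small simulation of that mechanism, with every parameter invented. All twenty “teachers” below are constructed to be identical; the only thing that varies is how disadvantaged their assigned students are, and disadvantage is omitted from the model – just as the deeper gradients of poverty are omitted from the LA models:

```python
# Simulation (all parameters invented): identical teachers, non-random sorting
# on an unmeasured disadvantage, and a model that omits it. The estimated
# "teacher effects" spread out anyway.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_teachers, n_per_class = 20, 50
rows = []
for t in range(n_teachers):
    # disadvantage is concentrated in some classrooms (non-random assignment)
    disadvantage = rng.normal(loc=t / n_teachers, scale=0.2, size=n_per_class)
    prior = rng.normal(0, 1, n_per_class) - disadvantage
    score = prior - 0.5 * disadvantage + rng.normal(0, 0.3, n_per_class)
    # note: every teacher's true effect above is exactly zero
    rows.append(pd.DataFrame({"score": score, "prior": prior, "teacher": t}))

df = pd.concat(rows, ignore_index=True)
fx = smf.ols("score ~ prior + C(teacher)", data=df).fit().params.filter(like="teacher")
print(fx.round(2))  # spurious spread in "effects" despite identical teachers
```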

Buddin is kind enough in his technical paper to provide for us various teacher characteristics and student characteristics that are associated with the teacher value-added effects – that is, what kinds of teachers are good, and which ones are more likely to suck? Buddin shows some of the usual suspects, like the fact that novice (first 3 years) teachers tended to have lower average value-added scores. Now, this might be reasonable if we also knew that novice teachers weren’t disproportionately clustered with the poorest students in the district. But we don’t know that.

Strangely, Buddin also shows us that the number of gifted children a teacher has affects his or her value-added estimate – the more gifted children you have, the better teacher you are??? That seems a bit problematic, and raises the question as to why “gifted” was not used as a control measure in the value-added ratings. Statistically, doing so could be problematic if giftedness were defined by the outcome measure – test scores (making it endogenous). Nonetheless, the finding that having more gifted children is associated with the teacher effectiveness rating raises at least some concern over that pesky little non-random assignment issue.

Now here’s the fun, and most problematic part:

Buddin finds that black teachers have lower value-added scores for both ELA and MATH. Further, these are some of the largest negative effects in the second level analysis – especially for MATH. The interpretation here (for parent readers of the LA Times web site) is that having a black teacher for math is worse than having a novice teacher. In fact, it’s the worst possible thing! Having a black teacher for ELA is comparable to having a novice teacher.

Buddin also finds that having more black students in your class is negatively associated with a teacher’s value-added scores, but writes off the effect as small. Teachers of black students in LA are simply worse? There is NO discussion of the potentially significant overlap between black teachers, novice teachers, and serving black students concentrated in majority-black schools (as addressed by Hanushek and Rivkin in the link above).

By contrast, Buddin finds that having an Asian teacher is much, much better for MATH. In fact, Asian teachers are as much better (than white teachers) for math as black teachers are worse! Parents – go find yourself an Asian math teacher in LA? Also, having more Asian students in your class is associated with higher teacher ratings for Math. That is, you’re a better math teacher if you’ve got more Asian students, and you’re a really good math teacher if you’re Asian and have more Asian students?????

Talk about some nifty statistical stereotyping.

It makes me wonder if there might also be some racial disparity in the “gifted” classification variable, with more Asian students and fewer black students district-wide being classified as “gifted.”

IS ANYONE SEEING THE PROBLEM HERE? Should we really be considering using this information to either guide parent selection of teachers or to decide which teachers get fired?

I discussed the link between non-random assignment and racially disparate effects previously here:

https://schoolfinance101.wordpress.com/2010/06/02/pondering-legal-implications-of-value-added-teacher-evaluation/

Indeed, there may be some substantive differences in the average academic (undergraduate & high school) preparation in math of black and Asian teachers in LA. And these differences may translate into real differences in the effectiveness of math teaching. But sadly, we’re not having that conversation here. Rather, the LA Times is putting out a database, built on an insufficiently specified underlying model, that produces these potentially seriously biased results.

While some of these statistically significant effects might be “small” across the entire population of teachers in LA, the likelihood that these “biases” significantly affect specific individual teachers’ value-added ratings is much greater – and that’s what’s so offensive about the use of this information by the LA Times. The “best possible” – still questionable – models estimated are not being used to draw simple, aggregate conclusions about the degree of variance across schools and classrooms; rather, they are being used to label individual cases from a large data set as “good” or “bad.” That is entirely inappropriate!

Note: On Kane and Staiger versus Rothstein and non-random assignment

Finally, a comment on references to two different studies on the influence of non-random assignment. Those wishing to write off the problems of non-random assignment typically refer to Kane and Staiger’s analysis using a relatively small, randomized sample. Those wishing to raise concerns over non-random assignment typically refer to Jesse Rothstein’s work. Eric Hanushek, in an exceptional overview article on value-added assessment, summarizes these two articles, and his own work, as follows:

An alternative approach of Kane and Staiger (2008) of using estimates from a random assignment of teachers to classrooms finds little bias in traditional estimation, although the possible uniqueness of the sample and the limitations of the specification test suggest care in interpretation of the results.

A compelling part of the analysis in Rothstein (2010) is the development of falsification tests, where future teachers are shown to have significant effects on current achievement. Although this could be driven in part by subsequent year classroom placement based on current achievement, the analysis suggests the presence of additional unobserved differences.

In related work, Hanushek and Rivkin (2010) use alternative, albeit imperfect, methods for judging which schools systematically sort students in a large Texas district. In the “sorted” samples, where random classroom assignment is rejected, this falsification test performs like that in North Carolina, but this is not the case in the remaining “unsorted” sample where random assignment is not rejected.

http://edpro.stanford.edu/hanushek/admin/pages/files/uploads/HanushekRivkin%20AEA2010.CALDER.pdf

Newsflash: The upper half is better than average!

I’ve seen many versions of this argument in the past year, but this one comes from Kevin Carey in response to the Civil Rights Framework, which criticized the current administration’s overemphasis on charter schools as lacking evidentiary support. Carey responds that the Civil Rights Framework selectively interprets the research on charter schools, noting:

Here’s the problem: the contention that charters have “little or no evidentiary support” rests on studies finding that the average performance of all charters is generally indistinguishable from the average regular public school. At the same time, reasonable people acknowledge that the best charter schools–let’s call them “high-quality” charter schools–are really good, and there’s plenty of research to support this.

http://www.quickanded.com/2010/08/evidence-and-the-civil-rights-group-framework.html

I recall a similar comment in the media a few months back, by a researcher, regarding a national charter schools study – something to the effect of: charter schools on average performed similarly to traditional public schools, but if we look at the upper half of the charter schools in the sample, they substantially outperformed the average public school serving similar students.

These statements have been driving me crazy for months now. Here’s why –

To put it in really simple terms:

THE UPPER HALF OF ALL SCHOOLS OUTPERFORM THE AVERAGE OF ALL SCHOOLS!!!!!

or … Good schools outperform average ones. Really?

Why should that be any different for charter schools (assuming a similar distribution) that have a similar average performance to all schools?
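
If you don’t believe me, here’s the whole “finding” reproduced with purely synthetic numbers – “charter” and “traditional” scores drawn from the very same distribution:

```python
# Trivial demonstration: draw "charter" and "traditional" school scores from
# the SAME distribution; the top half of charters still beats the overall
# traditional average. Purely synthetic numbers.
import numpy as np

rng = np.random.default_rng(42)
charters = rng.normal(650, 30, 200)      # same mean, same spread...
traditional = rng.normal(650, 30, 1000)  # ...no charter advantage built in

top_half = np.sort(charters)[len(charters) // 2:]
print(f"traditional average:    {traditional.mean():.1f}")
print(f"charter average:        {charters.mean():.1f}")
print(f"upper-half charter avg: {top_half.mean():.1f}")  # "proof" charters work?
```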

This is absurd logic for promoting charter schools as some sort of unified reform strategy – saying… we want to replicate the best charter schools (not that other half that don’t do so well).

Yes, one can point to specific analyses of specific charter models adopted in specific locations and identify them as particularly successful. And, we might learn something from these models which might be used in new charter schools or might even be used in traditional public schools.

But the idea that “successful charters” (the upper half) are evidence that charters are “successful” is just plain silly.

=======

Let’s throw a few visuals and numbers on my whining session above. Below are some snapshots of New York City charter schools. First, let’s take a quick look at the mismatched demographics of New York City charters compared to same grade level traditional public schools. Here are the free lunch rates. I’ve tended to focus on free lunch rates rather than free and reduced, because free lunch falls under a lower poverty threshold, and, as my previous analyses have shown, while charters often serve similar shares of combined free and reduced lunch children, they tend to serve the less poor among the poor (larger reduced shares, smaller free shares). This graph confirms my previous findings, and is based on data corroborated from both the NCES Common Core Public School Universe data from 2007-08 and the New York State Education Department School Report Cards. Note also that the biggest differences are at the elementary level, which covers most of the charter schools.

Second, let’s look at the rates of children who are limited in their English Language proficiency. Here, the differences at the elementary level are huge! Charters in NYC simply don’t serve limited English proficient children!

Now for a few oversimplified scatterplots comparing charter school performance outcomes to those of traditional public schools – all “Regular Schools” by the school type classification in the NCES Common Core – and compared against those in the same borough. I’ve focused on Brooklyn and the Bronx here because of the wide variations in student population composition across Manhattan schools.

First, note that none of the charters in the Bronx that had 8th grade 2009 test scores available had a free lunch rate over 80%, while several traditional public schools in the Bronx did. This chart shows the relationship between % scoring level 4 (the top level) and % qualifying for free lunch. Charters are named and shown in red. Traditional publics are hollow circles. Both groups scatter! In fact, there are a few traditional publics at the top (which may be classified as “regular schools” but may be far from regular). Among charters, Bronx Prep, KIPP Academy and Icahn 1 do rather well. Hyde Leadership (higher poverty than the other charters) and Harriet Tubman – not so well. But there are plenty of traditional public schools in the Bronx that appear to do well, and others not so well.

Here are the Brooklyn charters and traditional public schools on the same outcome measure – percent scoring level 4 or higher on 8th grade math. Here, all but Brooklyn Excelsior Charter have much lower poverty rates and simply aren’t comparable to most Brooklyn traditional public schools. And don’t forget, there are also likely very large differences in rates of children with other needs – like limited English proficiency. Williamsburg Collegiate and Brooklyn Excelsior appear to be doing quite well. But then again, Williamsburg Collegiate starts at 5th grade, so its success is likely at least partly a function of feeder schools. There are plenty of “high flying” traditional public schools in this picture as well… and likely a few unique explanations as to why they fly so high. There are also plenty of low-flying charters.

Here are the Bronx charters in 2009, on 5th grade math. Again, the charters generally have much lower free lunch rates than the traditional public schools. In this figure, most of the traditional public schools have free lunch rates over 80%, while none of the charters do. And again, charter performance, like traditional public school performance, is scattered – some low, some high.

And finally, those Brooklyn charters on 5th grade math performance. Low poverty and scattered (except Brooklyn Excelsior, which is higher poverty, and seemingly doin’ pretty well).

A few new ones – here are the Bronx and Brooklyn charter 5th grade performance levels based on a regression model controlling for stability rates, free lunch, ELL concentrations, and year of data (using 2008 and 2009), comparing specifically against other schools in the same borough. The performance levels are represented by the residuals of the regression model. Above “0” on the vertical axis is better than predicted – or better than average at given characteristics – and below “0” is below expected at given characteristics.
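
For anyone wanting to reproduce this approach, it’s straightforward – run a school-level regression and keep the residuals. A minimal sketch (the file and column names are hypothetical stand-ins for the NYSED/NCES data):

```python
# Minimal sketch of the residual approach described above (hypothetical
# file and column names, not the actual data layout).
import pandas as pd
import statsmodels.formula.api as smf

schools = pd.read_csv("nyc_grade5_math.csv")  # hypothetical school-level file

m = smf.ols(
    "mean_scale_score ~ pct_free_lunch + pct_ell + stability_rate"
    " + C(year) + C(borough)",
    data=schools,
).fit()

schools["residual"] = m.resid  # >0: beats prediction given characteristics; <0: below
print(schools.sort_values("residual", ascending=False)[["school_name", "residual"]].head())
```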

In these graphs, most of the highest high flyers are non-charters. Charters are split above and below the “0” line, as one might expect.

Anyway, on this cursory walk-through of the relative demographics and relative position of charters in the performance mix, it continues to elude me why we should be considering “charters” as a specific reform strategy, and one that can raise urban school districts from their dreadful depths of failure. Had I not indicated which schools were charters in these graphs, I wonder how many “reformy” types could have picked out the dots that were charters. I suspect, given a blind sample, they would select the dots that fall furthest out of line in the upper right-hand corner of each graph – the highest performing high poverty schools. In three of the four graphs above, they’d have picked non-charters first, and would have done so on the misguided perceptions that a) charters are the high flyers in any mix of schools and b) charters serve very high poverty populations. The reality is that charters are as scattered as traditional schools, and in general in NYC, they are serving lower need populations.

=======

A little more fun here. Here are schools in the area around the Harlem Children’s Zone. First, here are the maps of free lunch shares and LEP shares for charter and traditional public schools. Green dots have lower rates of LEP or free lunch. Stars indicate charters. Names are adjacent to schools. Note that most of the charters are lower poverty and much lower LEP than surrounding schools.

And here are the residuals of the same regression model used above, applied in this case to Grade 5 Math Mean Scale Scores. Red dots are schools that perform less well than expected and green dots are those that perform much better than expected. Note that charters are a mixed bag, and the HCZ charter performs particularly poorly – which caught me off guard.