Blog

New from the Center on Inventing Research Findings

The other day, the Center on Reinventing Public Education (CRPE) at the University of Washington released a bold new study claiming that Washington school districts underpay math and science teachers relative to other teachers – which is clearly an abomination in a state that is home to high-tech industries like Boeing and Microsoft.

The study consisted of comparing the average salaries of math and science teachers against those of other teachers in several large Washington State school districts, and showing that in most districts the math and science average is lower. As it turns out, the average experience of math and science teachers is lower, and far more of them are in their first five years. So, it's mainly about the experience differential. The authors infer from this that turnover of math and science teachers must be higher, but never actually test this assumption. They next infer that this turnover must be a function of less competitive salaries – relative to what these teachers could earn outside of teaching.

The study never calculates relative turnover of math and science versus other teachers. Rather, it implies that lower average experience levels must be indicative of higher turnover. The only follow-up analysis on this point is to show that math and science teachers, in addition to being less experienced, are also younger. Wow! That doesn't validate the turnover claim, which may well be true… but there's no validation here.

This is a silly study to begin with, but check out the not-so-subtle difference between the press release and the study itself.

The Press Release
http://www.crpe.org/cs/crpe/view/news/111

The analysis finds that in twenty-five of the thirty largest districts, math and science teachers had fewer years of teaching experience due to higher turnover—an indication that labor market forces do indeed vary with subject matter expertise. The subject-neutral salary schedule works to ignore these differences.

The Study
http://www.crpe.org/cs/crpe/download/csr_files/rr_crpe_STEM_Aug10.pdf

That said, the lower teacher experience levels are indicative of greater turnover among the math and science teaching ranks, lending support to the hypothesis that math and science teachers may have access to more compelling non-teaching opportunities than do their peers. (p. 5)

Both are a stretch, given the thin analysis, but the press release declares outright that turnover is the issue, while the study merely infers it, without ever testing or validating.

The study goes on to be an indictment of paying teachers more for years of experience (because we all know that experience doesn't matter?) and argues that differential pay by teaching field is the answer. This is an absurd false dichotomy. Even if it is reasonable to differentiate pay by teaching field, that does not mean that it is unreasonable to differentiate by experience, or that taking dollars away from experience-based pay is the only way to differentiate by field.

I happen to agree that there exist significant problems with Washington’s statewide teacher salary schedule, and that among other things, math and science teachers in Washington State are disadvantaged on the broader labor market. But the CRPE study does nothing to advance this argument.

Previous work by Lori Taylor of Texas A&M does:

Report on Taylor Study:

http://www.wsipp.wa.gov/rptfiles/08-12-2201.pdf

Taylor Study:

http://www.leg.wa.gov/JointCommittees/BEF/Documents/Mtg11-10_11-08/WAWagesDraftRpt.pdf

The CRPE study goes further to say that the findings indicate that school districts haven’t taken seriously a state policy initiative to increase investment in math and science teaching. So let’s say that the bill to which the CRPE press release refers – House Bill 2621 – really did stimulate districts to step up their efforts to hire more math and science teachers. What would likely happen to math and science teacher average salaries? Well, many new math and science teachers would enter the system. That would alter the experience distribution of math and science teachers – they would likely become less experienced on average – and hence their average salaries would decline and be lower than average salaries in other fields not stimulated by similar initiatives.
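Just to make that mechanism concrete, here's a tiny simulation. The salary schedule and hiring counts below are made-up numbers for illustration, not Washington data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single salary schedule: pay rises with experience only (made-up numbers).
def salary(experience):
    return 35_000 + 1_200 * experience

# A stable field: experience roughly uniform across a 30-year career.
other_exp = rng.integers(0, 30, size=1000)

# A field stimulated to hire: the same veterans, plus a wave of new entrants.
math_sci_exp = np.concatenate([rng.integers(0, 30, size=300),
                               np.zeros(150, dtype=np.int64)])

print(f"Other fields: mean exp {other_exp.mean():4.1f}, "
      f"mean salary ${salary(other_exp).mean():,.0f}")
print(f"Math/science: mean exp {math_sci_exp.mean():4.1f}, "
      f"mean salary ${salary(math_sci_exp).mean():,.0f}")
```

Identical schedule, identical pay for identical experience – and the "underpaid" math/science average appears anyway, purely from the hiring wave.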

When I get a chance, I’ll try to play around with my Washington teacher data set and post some follow-up analyses.

Kevin Welner and I point to similar misrepresentations of findings from several reports from this same center in this article on within and between-district financial disparities:

Baker, B. D., & Welner, K. G. (2010). Premature celebrations: The persistence of interdistrict funding disparities. Education Policy Analysis Archives, 18(9). Retrieved [date] from http://epaa.asu.edu/ojs/article/view/718

And now, for some fun follow-up figures:

These figures use individual teacher-level data from the State of Washington. I include all teachers holding "secondary" assignments and identify teachers certified to teach biology, chemistry, physics, general science and math (and all subcategories) using the certification record files on the same teachers. Note that some teachers in the data set hold multiple assignments, so the total numbers of cases in these graphs are not an exact match for the total number of individual teachers. I haven't asked for Washington teacher data for a few years, so these only go up to 2006-07. Unlike the CRPE report, which cherry-picks 30 districts, I use the whole state. If I get a chance, I'll play with some other cuts at the data. These data don't coincide at all with the CRPE "findings."

Here are the experience differences:

Here are the salary differences, on average, which coincide with the experience differences:

Now, here are the total numbers of teachers, and the apparent decline in the share that are math/science certified over this period. Math/science teacher counts were relatively flat, while others grew.

Finally, here’s a portion of the regression model of certified base salaries, where I control for degree level, experience, year, hours per day and days per year, all of which influence salaries. Interestingly, this regression shows that math and science teachers, holding all that other stuff constant, made about $380 more than non-math/science teachers, even under the fixed salary schedule.


LA Times Study: Asian math teachers better than Black ones

The big news over the weekend involved the LA Times posting of value-added ratings of LA public school teachers.

Here’s how the Times spun their methodology:

Seeking to shed light on the problem, The Times obtained seven years of math and English test scores from the Los Angeles Unified School District and used the information to estimate the effectiveness of L.A. teachers — something the district could do but has not.

The Times used a statistical approach known as value-added analysis, which rates teachers based on their students’ progress on standardized tests from year to year. Each student’s performance is compared with his or her own in past years, which largely controls for outside influences often blamed for academic failure: poverty, prior learning and other factors.

This spin immediately concerned me, because it appears to assume that simply using a student’s prior score erases, or controls for, any and all differences among students by family backgrounds as well as classroom level differences – who attends school with whom.

Thankfully (thanks to the immediate investigative work of Sherman Dorn), the analysis was at least marginally better than that and conducted by a very technically proficient researcher at RAND named Richard Buddin. Here’s his technical report:

The problem is that even someone as good as Buddin can only work with the data he has. And there are at least three major shortcomings of the data that Buddin appeared to have available for his value-added models. I'm setting aside here the potential quality of the achievement measures themselves. Calculating (estimating) a teacher's effect on their students' learning – and, specifically, identifying the differences across teachers where those students are not randomly assigned (with the same class size, comparable peer group, same air quality, lighting, materials, supplies, etc.) – requires that we do a pretty damn good job of accounting for the measurable differences across the children assigned to teachers. This is especially true if our plan is to post names on the wall (or web)!

Here’s my quick read, short list of shortcomings to Buddin’s data, that I would suspect, lead to significant problems in precisely determining differences in teacher quality across students:

  1. While Buddin’s analysis includes student characteristics that may (and in fact appear to) influence student gains, Buddin – likely due to data limitations – includes only a simple classification variable for whether a student is a Title I student or not, and a simple classification variable for whether a student is limited in their English proficiency. These measures are woefully insufficient for a model being used to label teachers on a website as good or bad. Buddin notes that 97% of children in the lowest performing schools are poor, and 55% in higher performing schools are poor. Identifying children simply as poor or not poor misses entirely the variation among the poor to very poor children in LA public schools – which is most of the variation in family background in LA public schools. That is, the estimated model does not control at all for one teacher teaching a class of children who barely qualify for Title I programs, versus a teacher with a classroom of children of destitute homeless families, or multigenerational poverty. I suspect Buddin, himself, would have liked to have had more detailed information. But, you can only use what you’ve got. When you do, however, you need to be very clear about the shortcomings. Again, most kids in LA public schools are poor and the gradients of poverty are substantial. Those gradients are neglected entirely.  Further, the model includes no “classroom” related factors such as class size, student peer group composition (either by a Hoxby approach of average ability level of peer group, or considering racial composition of peer group as done by Hanushek and Rivkin. Then again, it’s nearly if not entirely impossible to fully correct for classroom level factors in these models.).
  2. It would appear that Buddin's analysis uses annual testing data, not fall-spring assessments. This means that the year-to-year gains interpreted as "teacher effects" include summer learning and/or summer learning lag. That is, we are assigning blame or praise to teachers based in part on what kids learned, or lost, over the summer. If this is true of the models, it is deeply problematic. Okay, you say, but Buddin accounted for whether a student was a Title I student, and summer learning opportunities are highly associated with poverty status. But, as I note above, this very crude indicator is far from sufficient to differentiate across most LA public school students.
  3. Finally, researchers like Jesse Rothstein, among others, have suggested that having multiple years of prior scores on students can significantly reduce the influence of non-random assignment of students to teachers on the ratings of teachers. Rothstein speaks of using 3 years of lagged scores (http://gsppi.berkeley.edu/faculty/jrothstein/published/rothstein_vam2.pdf) so as to sufficiently characterize the learning trajectories of students entering any given teacher's class. It does not appear that Buddin's analysis includes multiple lagged scores.
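To make the modeling point concrete, here's a bare-bones sketch of a value-added specification with one versus three lagged scores. The data file, variable names, and controls are hypothetical; this illustrates the general form Rothstein suggests, not Buddin's actual specification:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("student_year.csv")  # hypothetical: one row per student-year

# One prior score -- roughly the structure criticized above.
vam_1lag = smf.ols(
    "score ~ score_lag1 + title1 + lep + C(teacher_id)", data=df).fit()

# Three prior scores, per Rothstein, to better capture each student's
# incoming trajectory before crediting gains to the current teacher.
vam_3lag = smf.ols(
    "score ~ score_lag1 + score_lag2 + score_lag3"
    " + title1 + lep + C(teacher_id)", data=df).fit()

# The C(teacher_id) coefficients are the estimated "teacher effects."
```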

So then what are some possible effects of these problems, where might we notice them, and why might they be problematic?

One important effect, which I’ve blogged about previously, is that the value-added teacher ratings could be substantially biased by the non-random sorting of students – or in more human terms – teachers of children having characteristics not addressed by the models could be unfairly penalized, or for that matter, unfairly benefited.

Buddin is kind enough in his technical paper to provide various teacher and student characteristics that are associated with the teacher value-added effects – that is, what kinds of teachers are good, and which are more likely to suck. Buddin shows some of the usual suspects, like the fact that novice (first 3 years) teachers tended to have lower average value-added scores. Now, this might be reasonable if we also knew that novice teachers weren't necessarily clustered with the poorest of students in the district. But we don't know that.

Strangely, Buddin also shows us that the number of gifted children a teacher has affects their value-added estimate – the more gifted children you have, the better teacher you are??? That seems a bit problematic, and raises the question of why "gifted" was not used as a control measure in the value-added ratings. Statistically, this could be problematic if giftedness was defined by the outcome measure – test scores (making it endogenous). Nonetheless, the finding that having more gifted children is associated with the teacher effectiveness rating raises at least some concern over that pesky little non-random assignment issue.

Now here’s the fun, and most problematic part:

Buddin finds that black teachers have lower value-added scores for both ELA and MATH. Further, these are some of the largest negative effects in the second level analysis – especially for MATH. The interpretation here (for parent readers of the LA Times web site) is that having a black teacher for math is worse than having a novice teacher. In fact, it’s the worst possible thing! Having a black teacher for ELA is comparable to having a novice teacher.

Buddin also finds that having more black students in your class is negatively associated with a teacher's value-added scores, but writes off the effect as small. Teachers of black students in LA are simply worse? There is NO discussion of the potentially significant overlap between black teachers, novice teachers and serving black students, concentrated in black schools (as addressed by Hanushek and Rivkin in the link above).

By contrast, Buddin finds that having an Asian teacher is much, much better for MATH. In fact, Asian teachers are as much better (than white teachers) for math as black teachers are worse! Parents – go find yourself an Asian math teacher in LA? Also, having more Asian students in your class is associated with higher teacher ratings for Math. That is, you’re a better math teacher if you’ve got more Asian students, and you’re a really good math teacher if you’re Asian and have more Asian students?????

Talk about some nifty statistical stereotyping.

It makes me wonder if there might also be some racial disparity in the “gifted” classification variable, with more Asian students and fewer black students district-wide being classified as “gifted.”

IS ANYONE SEEING THE PROBLEM HERE? Should we really be considering using this information to either guide parent selection of teachers or to decide which teachers get fired?

I discussed the link between non-random assignment and racially disparate effects previously here:

https://schoolfinance101.wordpress.com/2010/06/02/pondering-legal-implications-of-value-added-teacher-evaluation/

Indeed there may be some substantive differences in the average academic (undergraduate & high school) preparation in math of black and Asian teachers in LA. And these differences may translate into real differences in the effectiveness of math teaching. But sadly, we're not having that conversation here. Rather, the LA Times is putting out a database, built on insufficient underlying model parameters, that produces these potentially seriously biased results.

While some of these statistically significant effects might be "small" across the entire population of teachers in LA, the likelihood that these "biases" significantly affect specific individual teachers' value-added ratings is much greater – and that's what's so offensive about the use of this information by the LA Times. The "best possible," still questionable, models estimated are not being used to draw simple, aggregate conclusions about the degree of variance across schools and classrooms; rather, they are being used to label individual cases from a large data set as "good" or "bad." That is entirely inappropriate!

Note: On Kane and Staiger versus Rothstein and non-random assignment

Finally, a comment on references to two different studies on the influence of non-random assignment. Those wishing to write off the problems of non-random assignment typically refer to Kane and Staiger's analysis using a relatively small, randomized sample. Those wishing to raise concerns over non-random assignment typically refer to Jesse Rothstein's work. Eric Hanushek, in an exceptional overview article on value-added assessment, summarizes these two articles, and his own work, as follows:

An alternative approach of Kane and Staiger (2008) of using estimates from a random assignment of teachers to classrooms finds little bias in traditional estimation, although the possible uniqueness of the sample and the limitations of the specification test suggest care in interpretation of the results.

A compelling part of the analysis in Rothstein (2010) is the development of falsification tests, where future teachers are shown to have significant effects on current achievement. Although this could be driven in part by subsequent year classroom placement based on current achievement, the analysis suggests the presence of additional unobserved differences.

In related work, Hanushek and Rivkin (2010) use alternative, albeit imperfect, methods for judging which schools systematically sort students in a large Texas district. In the “sorted” samples, where random classroom assignment is rejected, this falsification test performs like that in North Carolina, but this is not the case in the remaining “unsorted” sample where random assignment is not rejected.

http://edpro.stanford.edu/hanushek/admin/pages/files/uploads/HanushekRivkin%20AEA2010.CALDER.pdf

Newsflash: The upper half is better than average!

I’ve seen many versions of this argument in the past year, but this one comes from Kevin Carey in response to the Civil Rights Framework which criticized the current administration’s overemphasis on Charter Schools as lacking evidentiary support. Carey responds that the Civil Rights Framework selectively interprets the research on Charter schools, noting:

Here’s the problem: the contention that charters have “little or no evidentiary support” rests on studies finding that the average performance of all charters is generally indistinguishable from the average regular public school. At the same time, reasonable people acknowledge that the best charter schools–let’s call them “high-quality” charter schools–are really good, and there’s plenty of research to support this.

http://www.quickanded.com/2010/08/evidence-and-the-civil-rights-group-framework.html

I recall a similar comment in the media a few months back, by a researcher, regarding a national charter schools study – something to the effect of: charter schools on average performed similarly to traditional public schools, but if we look at the upper half of the charter schools in the sample, they substantially outperformed the average public school serving similar students.

These statements have been driving me crazy for months now. Here’s why –

To put it in really simple terms:

THE UPPER HALF OF ALL SCHOOLS OUTPERFORM THE AVERAGE OF ALL SCHOOLS!!!!!

or … Good schools outperform average ones. Really?

Why should that be any different for charter schools (assuming a similar distribution) that have a similar average performance to all schools?

This is absurd logic for promoting charter schools as some sort of unified reform strategy – saying, in effect, that we want to replicate the best charter schools (not that other half that don't do so well).

Yes, one can point to specific analyses of specific charter models adopted in specific locations and identify them as particularly successful. And, we might learn something from these models which might be used in new charter schools or might even be used in traditional public schools.

But the idea that “successful charters” (the upper half) are evidence that charters are “successful” is just plain silly.
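If the tautology isn't obvious, a three-minute simulation makes it so. Draw charter and traditional "performance" from the very same distribution, and the upper half of the charter group beats the overall average every time, by construction:

```python
import numpy as np

rng = np.random.default_rng(42)

# Same "performance" distribution for both sectors -- no charter effect at all.
charters = rng.normal(loc=50, scale=10, size=5000)
traditionals = rng.normal(loc=50, scale=10, size=5000)

overall_mean = np.concatenate([charters, traditionals]).mean()
top_half_charters = np.sort(charters)[len(charters) // 2:]

print(f"Average of all schools:         {overall_mean:.1f}")
print(f"Average of upper-half charters: {top_half_charters.mean():.1f}")
```

The upper half "substantially outperforms" the average every single time, even though both groups were drawn from the same hat.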

=======

Let’s throw a few visuals and numbers on my whining session above.  Below are some snapshots of New York City Charter schools. First, lets take a quick look at the mismatched demographics of New York City charters compared to same grade level traditional public schools. Here are the Free Lunch rates. I’ve tended to focus on Free Lunch rates rather than Free and Reduced, because Free Lunch falls under a lower poverty threshold, and, as my previous analyses have shown, while charters often serve similar numbers of combined free and reduced lunch children, they tend to serve the less poor among the poor (larger reduced shares, smaller free shares). This graph confirms my previous findings, and is based on data corroborated from both the NCES Common Core, Public School Universe Data from 2007-08 and the New York State Education Department School Report Cards.  Note also that the biggest differences are at the elementary level, which covers most of the charter schools.

Second, let’s look at the rates of children who are limited in their English Language proficiency. Here, the differences at the elementary level are huge! Charters in NYC simply don’t serve limited English proficient children!

Now for a few oversimplified scatterplots comparing charter school performance outcomes to traditional public schools – all "Regular Schools" by the school type classification in the NCES Common Core – and compared against those in the same borough. I've focused on Brooklyn and the Bronx here because of the wide variations in student population composition across Manhattan schools.

First note that none of the charters in the Bronx which had 8th grade 2009 test scores available had a free lunch rate over 80%, while several traditional public schools in the Bronx did. This chart shows the relationship between % scoring level 4 (top level) and % qualifying for free lunch. Charters are named and shown in red. Traditional publics are hollow circles. Both groups scatter! In fact, there are a few traditional publics at the top (which may be classified as “regular schools” but may be far from regular). Among Charters, Bronx Prep, KIPP Academy and Icahn 1 do rather well. Hyde Leadership (higher poverty than the other charters) and Harriet Tubman – not so well. But there are plenty of traditional public schools in the Bronx that appear to do well, and others not so well.

Here are the Brooklyn charters and traditional public schools on the same outcome measure – percent scoring level 4 or higher on 8th grade math. Here, all but Brooklyn Excelsior Charter have much lower poverty rates and simply aren't comparable to most Brooklyn traditional public schools. And don't forget, there are also likely very large differences in rates of children with other needs – like limited English proficiency. Williamsburg Collegiate and Brooklyn Excelsior appear to be doing quite well. But then again, Williamsburg Collegiate starts at 5th grade, so their success is likely at least partly a function of feeder schools. There are plenty of "high flying" traditional public schools in this picture as well… and likely a few unique explanations as to why they fly so high. There are also plenty of low-flying charters.

Here are the Bronx charters in 2009, on 5th grade math. Again, the charters generally have much lower free lunch rates than the traditional public schools. In this figure, most of the traditional public schools have free lunch rates over 80% while none of the charters do. And again, charter performance, like traditional public school performance, is scattered – some low, some high.

And finally, those Brooklyn charters on 5th grade math performance. Low poverty and scattered (except Brooklyn Excelsior, which is higher poverty, and seemingly doin' pretty well).

A few new ones – here are the Bronx and Brooklyn charter 5th grade performance levels based on a regression model controlling for stability rates, free lunch, ELL concentrations, and year of data (using 2008 and 2009), comparing specifically against other schools in the same borough. The performance levels are represented by the residuals of the regression model. Above "0" on the vertical axis is "better than predicted" – better than average at given characteristics – and below "0" is below expected at given characteristics.
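For the statistically curious, the residual approach I'm describing looks roughly like this; the file and column names are illustrative stand-ins, not the actual New York State file layout:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("nyc_schools.csv")  # hypothetical school-by-year file

# Predict performance from need and stability measures, within borough.
model = smf.ols(
    "pct_level4 ~ pct_free_lunch + pct_ell + stability_rate"
    " + C(year) + C(borough)",
    data=df,
).fit()

# Residual > 0: better than predicted at given characteristics;
# residual < 0: worse than predicted.
df["residual"] = model.resid
print(df.groupby("charter")["residual"].describe())
```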

In these graphs, most of the highest high flyers are non-charters. Charters are split above and below the “0” line, as one might expect.

Anyway, on this cursory walk-through of the relative demographics and relative position of charters in the performance mix, it continues to elude me why we should be considering "charters" as a specific reform strategy, and one that can raise urban school districts from their dreadful depths of failure. Had I not indicated which schools were charters in these graphs, I wonder how many "reformy" types could have picked out the dots that were charters. I suspect, given a blind sample, they would select the dots that fall furthest out of line in the upper right hand corner of each graph – the highest performing high poverty schools. In three of the above four graphs, they'd have picked non-charters first, and would have done so on the misguided perceptions that a) charters are the high flyers in any mix of schools and b) charters serve very high poverty populations. The reality is that charters are as scattered as traditional schools, and in general in NYC, they are serving lower need populations.

=======

A little more fun here. Here are schools in the area around the Harlem Children's Zone. First, here are the maps of free lunch shares and LEP shares for charter and traditional public schools. Green dots have lower rates of LEP or free lunch. Stars indicate charters. Names are adjacent to schools. Note that most of the charters are lower poverty and much lower LEP than surrounding schools.

And here are the residuals of the same regression model used above, applied in this case to Grade 5 Math Mean Scale Scores. Red dots are schools that perform less well than expected and green dots are those that perform much better than expected. Note that charters are a mixed bag, and the HCZ charter performs particularly poorly – which caught me off guard.

What does the education level of 25 to 34 year olds really mean?

About a week ago, The College Board released their latest status report in their college completion series.

http://completionagenda.collegeboard.org/sites/default/files/reports_pdf/Progress_Executive_Summary.pdf

The parts of the report that seemed to grab the most media attention were those related to a) comparing the US to other countries on the percent of 25 to 34 year olds who hold an associates degree or higher and b) comparing US states to one another on the same measure.

Newspapers across the country ran with this stuff and Twitter was buzzing with punditry on what these indicators meant about the quality of K-12 public schools in each state. "Our public schools must be failing us if we're only 24th on the education level of our younger adults," one Missouri pundit tweeted (related news story here).

The first thing that caught my eye was that Washington, DC was first in the rankings of percent of 25 to 34 year olds with an associate's degree or higher. Of course it is. Washington, DC is a magnet for recent college graduates. Clearly, this particular indicator says as much about the employment options for a young, college educated workforce as it does about a state's own education system. This indicator also tells us something about the education level and expectations of the previous generation – parents of these 25 to 34 year olds, whether in the same state or elsewhere. And, this indicator may also tell us something about the extent to which a state imports or exports college students.

So, I decided to play with some data…’cuz that’s what I like to do… just to see how these rankings might change if I tweaked them a bit.

I decided it might be fun to look at the differences in the rates of college educated adults – % of 25 to 34 year olds with a bachelor's degree or higher – across states in three different ways:

  1. percent of 25 to 34 year old current adult residents who hold a BA or higher
  2. percent of 25 to 34 year old current adult residents who were born in the state who hold a BA or higher
  3. percent of 25 to 34 year old adults who were born in the state, whether they continue to reside there or not, who hold a BA or higher

It would seem to me that the second of these measures is most on target – the percent of the native population that holds a certain level of education. Needless to say, when I focus on the second measure, the rankings change somewhat. Here it is:

Table 1

Education Level (% BA or Higher) of the 25 to 34 Year Old Population by State

U.S. Census – American Community Survey 2006 to 2008

Data Source: Steven Ruggles, J. Trent Alexander, Katie Genadek, Ronald Goeken, Matthew B. Schroeder, and Matthew Sobek. Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database]. Minneapolis: University of Minnesota, 2010.

Washington, DC, which ranks 1st on resident college graduates, drops to 24th on native college graduates. MA, NY and NJ, which were 2, 5 and 4, are now 1, 2, 3. Virginia goes from 9th to 26th and Maryland goes from 6th to 15th when only natives are considered. This is likely a DC effect as well. NH also drops quite a bit. Wisconsin rises quite a bit. Overall, there are some pretty big changes here.
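Here, roughly, is how the three measures can be constructed from an IPUMS extract. The variable names (STATEFIP, BPL, AGE, EDUCD) follow IPUMS conventions, but treat the recodes as assumptions to verify against the codebook (and note I'm skipping person weights for brevity):

```python
import pandas as pd

df = pd.read_csv("acs_2006_2008.csv")  # hypothetical IPUMS person-level extract

adults = df[(df.AGE >= 25) & (df.AGE <= 34)].copy()
# EDUCD >= 101 is bachelor's degree or higher in the detailed coding
# (verify against the current IPUMS codebook).
adults["ba"] = adults["EDUCD"] >= 101

# 1. BA+ share among current residents of each state
residents = adults.groupby("STATEFIP")["ba"].mean()

# 2. BA+ share among residents who were also born in that state
#    (IPUMS state birthplace codes line up with STATEFIP for US states)
native_residents = (adults[adults.BPL == adults.STATEFIP]
                    .groupby("STATEFIP")["ba"].mean())

# 3. BA+ share among everyone born in the state, wherever they live now
born_in_state = adults.groupby("BPL")["ba"].mean()

print(pd.DataFrame({"residents": residents,
                    "native_residents": native_residents,
                    "born_in_state": born_in_state}))
```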

Here are a few scatterplots – ‘cuz nothin’ is more fun than a good scatterplot.

This one shows, on the horizontal axis, the share of 25 to 34 year old residents who are natives (born there). On the vertical axis is the % BA or higher for all current residents. There's DC, way above the rest on the vertical axis and pretty far to the left on the horizontal – that is, not too many 25 to 34 year olds who live there were born there. The native share is only lower in Nevada. But Nevada doesn't seem to be importing college grads!

This one shows the relationship between the % BA or higher among all current residents (horizontal axis) and % BA or higher among native residents (born and live there). Clearly there's a pretty strong relationship between the two. But there is enough variation to really change some rankings. Mass is high either way. The big movers are those identified above, like Maryland, Virginia and New Hampshire, which have much more educated resident young adult populations than native resident young adult populations.

This one puts the "native share" again on the horizontal axis. On the vertical axis is a measure of the difference in the education level of all current residents (25 to 34) and native current residents. It's somewhat of a net "import" effect measure. How much more educated is the current resident population than the born-and-raised population? Now, this is a net difference, including the fact that some individuals who were born and raised in a state might have left and become more educated. Big net importers here appear to be Maryland, Virginia, Vermont and New Hampshire (Vermont surprised me a bit here… since there isn't a whole lot of industry to attract college grads, but Burlington does always make those "great places to live" lists). It might also be a small sample size issue with the Vermont data. At the other end of the picture are Nebraska and Nevada, which don't appear to be importing a more educated adult population. Strangely, all but Nebraska are in the positive zone on this measure (note that this measure does not have to be net-zero across states, because between-state migration is not the only type of migration occurring. International migration may also affect these differences. This may also reflect the fact that more educated individuals tend to be more mobile. Just pondering).

In this one, we have the "native share" again on the horizontal axis, and the difference between the education level of those born in the state – whether they stayed or not – and those who reside in the state. This is somewhat of a net "export" measure. In this case, it would appear that Wyoming is the big loser. So too are Nebraska and Wisconsin. This is the one interesting piece about Wyoming: in the rankings above, Wyoming doesn't move much. It's 47th in % BA for current residents and 48th for native residents. But Wyoming does much better on the education level of those born in the state, whether they stay or not – which apparently they don't if they have a BA or higher.

So what does all of this mean? Probably not much. These figures and additional analyses certainly tell a more nuanced story than the media buzz of last week. But it's hard to really link much of this back to the quality of states' underlying elementary and secondary education systems. Far too many factors are in play here, and even tweaking this one factor – whether residents are native residents or not – has significant consequences for state rankings.

So much for attaching any simple, bold statement about [YOUR STATE HERE] to that huge, pull-out multi-color map in the College Board Report!


Rolling Dice: If I roll a “6” you’re fired!

Okay… Picture this… I'm rolling dice… and each time I roll a "6," some loud-mouthed, tweet-happy pundit who just loves value-added assessment for teachers gets fired. Sound fair? It might happen to someone who sucks at their job… or it might just be someone who is rather average. Doesn't matter. They lost on the roll of the dice. A 1 in 6 chance. Not that bad. A 5 in 6 chance of keeping their job. Can't you live with that?

This report was just released the other day from the National Center for Education Statistics:

http://ies.ed.gov/ncee/pubs/20104004/pdf/20104004.pdf

The report carries out a series of statistical tests to determine the identification "error" rates for "bad teachers" when using typical value-added statistical methods. Here's a synopsis of the findings from the report itself:

Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding error rates for overall false positive and negative errors are 10 and 20 percent, respectively.

Where:

Type I error rate (α) is the probability that based on c years of data, the hypothesis test will find that a truly average teacher (such as Teacher 4) performed significantly worse than average. (p. 12)

So, that means there is about a 25% chance (using three years of data) or a 35% chance (using one year of data) that a teacher who is "average" would be identified as "significantly worse than average" and potentially be fired. So, what I really need are some four-sided dice. I gave the pundits odds that are too good! Admittedly, this is the likelihood of identifying an "average" teacher as well below average. The likelihood of identifying an above average teacher as below average would be lower. Here's the relevant definition of a "false positive" error rate from the study:

the false positive error rate, FPR(q), is the probability that a teacher (such as Teacher 5) whose true performance level is q SDs above average is falsely identified for special assistance. (p. 12)

From the first quote above, even this occurs 1 in 10 times (given three years of data, and 2 in 10 given only one year). And here's the definition of the "false negative" error rate:

false negative error rate is the probability that the hypothesis test will fail to identify teachers (such as Teachers 1 and 2 in Figure 2.1) whose true performance is at least T SDs below average.

…which also occurs 1 in 10 times (given three years of data, and 2 in 10 given only one year).
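The dice metaphor is easy to check by simulation. The sketch below doesn't reproduce the report's exact procedure – the noise-to-signal ratio is an illustrative assumption – but it shows the basic mechanic: flag the bottom of the measured distribution and count how many truly average-or-better teachers get swept in:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True teacher effects (SD = 1) plus estimation noise; the noise-to-signal
# ratio is an illustrative assumption, not the report's calibration.
true_effect = rng.normal(0, 1, n)

for years in (1, 3):
    noise_sd = 2.0 / np.sqrt(years)          # more years of data, less noise
    estimate = true_effect + rng.normal(0, noise_sd, n)
    flagged = estimate < np.quantile(estimate, 0.25)   # "fire" bottom quarter
    wrongly = (flagged & (true_effect >= 0)).sum() / flagged.sum()
    print(f"{years} year(s) of data: {wrongly:.0%} of flagged teachers "
          f"are truly average or better")
```

With these made-up parameters, a sizable share of flagged teachers are actually average or better on one year of data; the share shrinks with three years, but it doesn't vanish – the same flavor of result as the report.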

These concerns are not new. In a previous post, I discuss various problems with using value-added measures for identifying good and bad teachers, such as temporal instability: http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf.

The introduction of this new report notes:

Existing research has consistently found that teacher- and school-level averages of student test score gains can be unstable over time. Studies have found only moderate year-to-year correlations—ranging from 0.2 to 0.6—in the value-added estimates of individual teachers (McCaffrey et al. 2009; Goldhaber and Hansen 2008) or small to medium-sized school grade-level teams (Kane and Staiger 2002b). As a result, there are significant annual changes in teacher rankings based on value-added estimates.

In my first post on this topic (and subsequent ones), I point out that the National Academies have already cautioned that:

“A student’s scores may be affected by many factors other than a teacher — his or her motivation, for example, or the amount of parental support — and value-added techniques have not yet found a good way to account for these other elements.”

http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=1278

And again, this new report provides a laundry list of factors that affect value-added assessment beyond the scope of the analysis itself:

However, several other features of value-added estimators that have been analyzed in the literature also have important implications for the appropriate use of value-added modeling in performance measurement. These features include the extent of estimator bias (Kane and Staiger 2008; Rothstein 2010; Koedel and Betts 2009), the scaling of test scores used in the estimates (Ballou 2009; Briggs and Weeks 2009), the degree to which the estimates reflect students’ future benefits from their current teachers’ instruction (Jacob et al. 2008), the appropriate reference point from which to compare the magnitude of estimation errors (Rogosa 2005), the association between value-added estimates and other measures of teacher quality (Rockoff et al. 2008; Jacob and Lefgren 2008), and the presence of spillover effects between teachers (Jackson and Bruegmann 2009).

In my opinion, the most significant problem here is the non-random assignment problem. The noise problem is significant and important, but much less significant than the non-random assignment problem. It just happens to be the topic of the day.

But alas, we continue to move forward… full steam ahead.

As I see it there are two groups of characters pitching fast-track adoption of value-added teacher evaluation policies.

Statistically Inept Pundits (who really don't care anyway): The statistically inept pundits are those we see on Twitter every day, applauding the mass firing of DC teachers, praising the Colorado teacher evaluation bill and thinking that RttT is just AWESOME, regardless of the mixed (at best) evidence behind the reforms promoted by RttT (like value-added teacher assessment). My take is that they have no idea what any of this means… have little capacity to understand it anyway… and probably don't much care. To them, I'm just a curmudgeonly academic throwing a wet blanket on their teacher bashing party. After all, who but a wet blanket could really be against making sure all kids have good teachers… making sure that we fire and/or lay off the bad teachers, not just the inexperienced ones. These teachers are dangerous after all. They are hurting kids. We must stop them! Can't argue that. Or can we? The problem is, we just don't have ideal, or even reasonably good, methods for distinguishing between those good and bad teachers. And school districts that are all of a sudden facing huge budget deficits and laying off hundreds of teachers don't retroactively have in place an evaluation system with sufficient precision to weed out the bad – nor could they. Implementing "quality-based layoffs" here and now is among the most problematic suggestions currently out there. The value-added assessment systems yet to be implemented aren't even up to the task. I'm really confused why these pundits who have so little knowledge about this stuff are so convinced that it is just so AWESOME.

Reform Engineers: Reform engineers view this issue in purely statistical and probabilistic terms – setting legal, moral and ethical concerns aside. I can empathize with that somewhat, until I try to make it actually work in schools and until I let those moral, ethical and legal concerns creep into my head. Perhaps I've gone soft. I'd have been all for this no more than 5 years ago. The reform engineer assumes first that it is the test scores that we want to improve as our central objective – and only the test scores. Test scores are the be-all and end-all measure. The reform engineer is okay with the odds above because more than 50% of the time they will fire the right person. That may be good enough – statistically. And, as long as they have decent odds of replacing the low performing teacher with at least an average teacher – each time – then the system should move gradually in a positive direction. All that matters is that we have the potential for a net positive quality effect on replacing the 3/4 of fired teachers who were correctly identified, and at least breaking even on the 1/4 who were falsely fired. That's a pretty loaded set of assumptions though. Are we really going to get the best applicants to a school district where they know they might be fired for no reason on a 25% chance (using 3 years of data) or a 35% chance (using one year)? Of course, I didn't even factor into this the number of bad teachers identified as good.
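Here's the reform engineer's bet as a toy simulation: fire the bottom of the measured distribution each round, replace from the applicant pool, and watch whether average true quality drifts upward. Every parameter here is an assumption for illustration, including the generous one that the firing lottery doesn't scare off applicants:

```python
import numpy as np

rng = np.random.default_rng(7)

quality = rng.normal(0, 1, 10_000)   # true teacher quality
noise_sd = 2.0                        # measurement noise (assumption)

for rnd in range(1, 11):
    measured = quality + rng.normal(0, noise_sd, quality.size)
    fired = measured < np.quantile(measured, 0.10)   # dismiss bottom 10%
    # Replacements drawn from the same applicant pool -- the generous
    # assumption that no one is deterred by the firing lottery.
    quality[fired] = rng.normal(0, 1, fired.sum())
    print(f"Round {rnd:2d}: mean true quality {quality.mean():+.3f}")
```

Run it and mean true quality does creep up – which is exactly the engineer's point – but slowly, and only under those loaded assumptions.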

I guess that one could try to dismiss those moral, ethical and legal concerns regarding wrongly dismissing teachers by arguing that if it's better for the kids in the end, then wrongly firing 1 in 4 average teachers along the way is the price we have to pay. I suspect that's what the pundits would argue – since it's about fairness to the kids, not fairness to the teachers, right? Still, this seems like a heavy toll to pay, an unnecessary toll, and quite honestly, one that's not even likely to work in the best of engineered circumstances.

========

Follow up notes: A few comments I have received have argued from a reform engineering perspective that if we a) use the maximum number of years of data possible, and b) focus on identifying the bottom 10% or fewer of teachers, then based on the analysis in the NCES/Mathematica report, we might significantly reduce our error rate – down to, say, 10% of teachers being incorrectly fired. Further, it is more likely that those incorrectly identified as failing are closer to failing anyway. That is not, however, true in all cases. This raises the interesting ethical question: what is the tolerable threshold for randomly firing the wrong teacher? Or keeping the wrong teacher?

Further, I’d like to emphasize again that there are many problems that seriously undermine the application of value-added assessment for teacher hiring/firing decisions. This issue probably ranks about 3rd among the major problem categories. And this issue has many dimensions. First there is the statistical and measurement issue of having statistical noise result in wrongful teacher dismissal. There are also the litigation consequences that follow. There are also the questions over how the use of such methods will influence individuals thinking about pursuing teaching as a career, if pay is not substantially increased to counterbalance these new job risks. It’s not just about tweaking the statistical model and cut-points to bring the false positives into a tolerable zone. This type of shortsightedness is all too common in the types of technocratic solutions I, myself, used to favor.

Here’s a quick synopsis of the two other  major issues undermining the usefulness of value-added assessment for teacher evaluation & dismissal (on the assumption that majority weight is placed on value-added assessment):

1) That students are not randomly assigned across teachers and that this non-random assignment may severely bias estimates of teacher quality. The fact that non-random assignment of students may bias estimates of teacher quality will also likely have adverse labor market effects, making it harder to get the teachers we need in the classrooms where we need them most – at least without a substantial increase to their salaries to offset the risk.

2) That only a fraction of teachers can even be evaluated this way in the best of possible cases (generally less than 20%), and even their “teacher effects” are tainted – or enhanced – by one another. As I discussed previously, this means establishing different contracts for those who will versus those who will not be evaluated by test scores, creating at least two classes of teachers in schools and likely leading to even greater tensions between them. Further, there will likely be labor market effects with certain types of teachers either jockeying for position as a VAM evaluated teacher, or avoiding those positions.

More can be found on my entire blog thread on this topic: https://schoolfinance101.wordpress.com/category/race-to-the-top/value-added-teacher-evaluation/

Private Schools & Public Education Policy in New Jersey

The commission on private schools established by former Governor Corzine has just released its report:

http://nj.gov/governor/news/reports/pdf/20100720_np_schools.pdf

This report is more fun than many recent reports in New Jersey because it actually has some data and citations. Nonetheless, I have at least a few concerns regarding the presentation of the data and implications drawn from it. I was particularly intrigued by the graph on page 7 – which I replicate below:


This graph shows an apparent catastrophic collapse of the private schooling sector in New Jersey… or does it? Look at the Y (vertical) axis. The range is from 160,000 to 192,000. Yeah… that makes for a really steep apparent drop off. Note also that this data is from a state department of education source and is not reconciled against any other source. So, a stretched Y axis to make it look really, really, really dramatic. No second look, no second opinion. And, only a single aggregate count of private school kids to show a major across-the-board collapse.
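You can replicate the trick with any plotting library and some made-up numbers in the same range – same data, two very different stories:

```python
import matplotlib.pyplot as plt

years = list(range(1998, 2009))
enrollment = [192_000 - 2_900 * i for i in range(len(years))]  # toy decline

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.plot(years, enrollment)
ax1.set_ylim(160_000, 192_000)   # truncated axis: looks like a collapse
ax1.set_title("Y axis starts at 160,000")
ax2.plot(years, enrollment)
ax2.set_ylim(0, 200_000)         # full axis: a modest decline
ax2.set_title("Y axis starts at zero")
plt.tight_layout()
plt.show()
```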

Here’s a more detailed exploration, using two data sources: 1) The National Center for Education Statistics Private School Universe Survey and 2) the U.S. Census Bureau American Community Survey, via the Integrated Public Use Microdata System.

First, here are the number of private schools by type in New Jersey over time:

This graph shows that the only significant decline in numbers of schools occurs for Catholic Parochial schools. Other private school types hold their ground in total numbers of schools.

Next, here is the enrollment, and in the second graph, enrollment adjusted for missing data.

As with numbers of schools, the most substantive decline is for Catholic Parochial schools. There is a smaller drop for Catholic Diocesan schools. Other schools stay relatively constant, with some reclassification occurring between Other Religious – Not Affiliated and Other Affiliated. Note that the corrected, weighted version in the second graph above shows a somewhat smaller decline in Catholic Parochial enrollment than the unadjusted version.

Next, I address private school enrollment by grade level and as a share of the total population of students in public and private school. A drop in private school enrollment would only be significant if it occurred in a context of a stable or growing overall student population.

Here’s the total school population by grade level:

And the private school population by grade level:

What we see in this second graph is that the Grades 1 to 4 population appears to be declining most.

Here’s the private school enrollment by grade level as a percent of total enrollment. Kindergarten private school enrollment as a share of kindergarten students has declined. But, other grade level private school populations have declined only very slightly as a share of all children statewide in the same grade level.

This much more refined picture, across two additional data sets, casts some doubt on the significance of the first graph above. Is there really a massive collapse of private schooling in New Jersey? It doesn't look that way to me.

Explanations and Policy Implications for Catholic Schooling in New Jersey

Indeed, there may be some cause for concern for Catholic Parochial schools, which appear to be closing and losing enrollment. But this phenomenon is not unique to New Jersey. Others have attempted to shed light on why Catholic schools are struggling in many urban centers. Catholic schools have tried to remain accessible to the middle class by holding tuition down. At the same time, costs have risen. Decades ago, Catholic schools relied heavily on unpaid, church affiliated staff. Now, nearly all staff are salaried. My own recent analyses suggest that the costs of operating many Catholic schools are quite similar to those of traditional public school districts. The gap between tuition and cost has grown substantially over time for these schools. That's not sustainable.

Two recent reports provide additional insights regarding public policy forces that may be compromising the stability of Catholic schooling in particular:

1) This Pew Trust report on parental choices in Philadelphia suggests that the expansion of charter schools has potentially cut into the non-Catholic enrollment in urban Catholic schools.

http://www.pewtrusts.org/uploadedFiles/wwwpewtrustsorg/Reports/Philadelphia_Research_Initiative/PRI_education_report.pdf

Notably, New Jersey has not expanded charter schools as quickly as other states. But, it remains possible that existing New Jersey charter schools have drawn some students away from urban Catholic schools. As such, if the state is truly concerned with the sustainability of Catholic schools, the state should evaluate the effect of charter expansion on Catholic school enrollment (and on teacher recruitment/retention).

2) This Thomas B. Fordham Institute report suggests that vouchers in other locations such as Milwaukee have been a double-edged sword for Catholic schools. Vouchers do not provide a full-cost subsidy, and schools are restricted from charging tuition above the subsidy to cover the gap. As such, schools are required to take a loss on each voucher student accepted. Further, as Catholic schools take on more non-Catholic vouchered students, parishioner contributions tend to decline – because it is perceived that the Catholic mission of the school has been compromised.

http://www.edexcellence.net/doc/catholic_schools_08.pdf

This situation does not apply in New Jersey, but findings from other cities raise concern that an under-subsidized voucher or tuition tax credit like the proposed Opportunity Scholarship Act (NJOSA) could actually do more harm than good for many private schools.

Vouchers differ from other subsidies (like the transportation and textbook subsidies) because of the restriction on charging tuition to cover the margin between the subsidy level and actual cost. Some schools may subvert this requirement with strongly implied requirements for "tithing" as a substitute for tuition – including for voucher-receiving families. In fact, families could be obligated to tithe sufficient income to the private schools (or the religious institution that governs those schools) such that the family then qualifies for the tax credit program. The state should attempt to guard against this possibility in the design of any related policy.

Follow-up information:

A reader was kind enough to send me this link: http://www.avi-chai.org/census.pdf

Page 23 of this census report on Jewish school enrollment explains:

The other side of the geographic distribution picture is the concentration of schools in New York and New Jersey, as well as the overwhelming Orthodox domination in these two states. New York has 132,500 students, up from 104,000 ten years ago, while New Jersey has nearly 29,000 students, up from 18,000 in 1998. New Jersey’s gain is nearly all attributable to Lakewood, although there has been meaningful growth in Bergen County and the Passaic area. At the same time, Solomon Schechter enrollment in New Jersey has declined precipitously.

Clearly, the Orthodox schools in New Jersey are not in a free fall, as implied by the aggregation of all private schools in the private school commission report.

Another reader sent me this link: http://www.njpsa.org/userfiles/File/EO161.pdf

This link explains the charge of the commission. It would seem to me that the final report has strayed somewhat from this charge.


Another “You Cannot be Serious!” The demise of private sector preschool in New Jersey?

There is little I find more enjoyable than boldly stated claims that are entirely unsubstantiated… but where data are relatively accessible for testing those claims.

This week, the Governor’s Task Force on Privatization in New Jersey released their final report on the virtues of privatization for specific services. I took particular interest in the claims made about preschool in New Jersey. Preschool programs were expanded significantly with public support for both public and private programs for 3 and 4 year olds following the 1998 NJ Supreme Court ruling in Abbott v. Burke. For more information on the rulings and Abbott pre-school programs, see: http://www.edlawcenter.org/ELCPublic/AbbottPreschool/AbbottPreschoolProgram.htm

Here are the claims made in the privatization report:

•At the program’s inception, nearly 100 percent of students were served by providers in the private sector, many of which are women‐and minority‐owned businesses. Now, approximately 60 percent are served by private providers, as traditional districts have built preschools at great public expense and unfairly regulated their private‐sector competitors out of business.

•There are currently two sets of state regulations governing pre‐k. The majority of private pre‐k providers are subject to Dept. of Children and Families (DCF) regulations, but private pre‐k providers working in the former Abbott districts and serving low‐income children in some other districts are subject to the regulation of the DOE and the respective districts themselves, effectively crowding out the private sector and driving up costs to the taxpayer without any documented benefit to the children they serve.

To summarize: the over-subsidized public option of Abbott preschool has decimated the private preschool market in New Jersey, adding numerous women and minority business owners to the unemployment rolls since the program was implemented (okay… a bit extreme… but I suspect you'll hear it spun this way… since the above language isn't far off from this).

The last time I read something this silly was in a research report from The Reason Foundation regarding "weighted student funding." Not surprisingly, the Reason Foundation is among the only sources cited for… anything… in this report on the virtues of privatization (see page 4).

In this post, I’ll address two issues:

First, I address whether the claim that private preschool enrollment has dropped is true. Has private preschool in New Jersey actually been decimated since the 1998 Abbott decision? Are there that many fewer slots in private versus public preschools than before that time? Have public programs continued to grow while private programs have been eliminated? Has private preschool enrollment declined at any greater rate than private school enrollment generally, if at all?

Second, I revisit some of my previous findings about private versus public school markets, cost and quality. The recommendation that follows from the above claims is that the state, instead of continuing to subsidize expensive Abbott preschool programs, should allow any private provider to participate without Abbott regulation. This, it is assumed, would dramatically reduce costs. Rather, it might reduce expenditures… and the quality of service along with them. Lower-spending private providers simply don't and can't offer what higher-spending providers do. "Cost" assumes a specific level of quality, and a claim of lower "cost" assumes that less can be spent for the same quality. In this case, quality is being ignored entirely (or assumed entirely unimportant). That is, the proposed plan of allowing any private provider to house "preschool" students would likely be the equivalent of subsidized "daycare" (minimally compliant with Dept. of Children and Families (DCF) regulations) and not actual "pre-school."

Issue 1

For these first four figures, I use data from the U.S. Census Bureau’s Integrated Public Use Microdata Series (IPUMS). One of my favorites. Specifically, I evaluate the school enrollment patterns of 3- and 4-year-olds in New Jersey from 1990 to 2008, by school type. Note that Census IPUMS data are actually not great for evaluating parent responses to the “school” enrollment question for 3- and 4-year-olds, because in many cases a parent will identify their child as being in “school” even if the child is merely in daycare – home-based, non-instructional, or any type of daycare. This is not hugely problematic here, because the report on privatization assumes that home-based daycare, or anything registered with DCF to supervise children during the day, qualifies as a pre-school. If anything, there may be under-reporting of private enrollment in these data by parents who don’t consider their private daycare to be “school.”
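For anyone who wants to replicate this kind of tabulation, here’s a minimal sketch – not my actual code – assuming a person-level IPUMS extract with YEAR, STATEFIP, AGE, SCHLTYPE, and PERWT. The file name is hypothetical, and the SCHLTYPE coding should be checked against your own extract’s codebook.

```python
# Minimal sketch of the enrollment tabulation, assuming a hypothetical
# person-level IPUMS extract ("ipums_extract.csv"). Check SCHLTYPE codes
# against your extract's codebook before trusting the labels below.
import pandas as pd

ipums = pd.read_csv("ipums_extract.csv")

# New Jersey (STATEFIP 34), children aged 3 and 4
nj_kids = ipums[(ipums["STATEFIP"] == 34) & (ipums["AGE"].isin([3, 4]))]

# Assumed SCHLTYPE coding: 1 = not enrolled, 2 = public, 3 = private
labels = {1: "not enrolled", 2: "public", 3: "private"}
nj_kids = nj_kids.assign(setting=nj_kids["SCHLTYPE"].map(labels))

# Person-weighted enrollment shares by year, age, and setting
shares = (
    nj_kids.groupby(["YEAR", "AGE", "setting"])["PERWT"].sum()
    .groupby(level=["YEAR", "AGE"])
    .transform(lambda s: s / s.sum())
    .unstack("setting")
)
print(shares.round(3))
```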

For 3-year-olds, from 1990 to 2000, both public and private enrollment increase, while non-enrollment decreases. Public and private enrollment then stay relatively steady, except for an apparent increase in private enrollment in 2008 (I’m not confident in this bump, having seen other odd jumps between 2007 and 2008 IPUMS data). In any case, it would not appear that public enrollment has continued to squeeze out the private marketplace, unless we were to assume that the private market would have absorbed the entirety of the reduction in non-enrollment. The lack of any substantive shift from 2000 to 2008 – with privates, if anything, increasing their share – suggests that publicly subsidized programs have not led to the collapse of the private preschool market.

The next two figures show the enrollment patterns for 4-year-olds. In general, 4-year-olds are more likely to be enrolled in school, public or private, and less likely to be non-enrolled. As with 3-year-olds, there really aren’t any substantive changes in the relative enrollment of 4-year-olds in public and private settings between 2000 and 2008. No collapse of the private market here.


As an alternative, I explore the enrollment of private schools that offer pre-kindergarten programs statewide, using the National Center for Education Statistics Private School Universe Survey. With this data set, we can determine whether the number of preschool-level enrollment slots among private providers has declined, and whether any decline in private preschool enrollment has been greater than the decline in private school enrollment more generally. Note that much has been made of the “collapse” of private schooling in New Jersey in the context of the New Jersey Opportunity Scholarship Act.
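Here’s a hedged sketch of that comparison. The file and column names are illustrative placeholders, not actual PSS variable names.

```python
# Sketch of the PSS comparison, assuming a hypothetical file of NJ private
# schools by year with pre-K and total enrollment columns (illustrative
# names, not actual PSS variable names).
import pandas as pd

pss = pd.read_csv("pss_nj_by_year.csv")  # hypothetical: one row per school-year

trend = pss.groupby("year").agg(
    prek_enrollment=("prek_enrollment", "sum"),
    total_enrollment=("total_enrollment", "sum"),
)

# Index each series to its earliest year to compare rates of decline
indexed = trend / trend.iloc[0]
print(indexed.round(3))
```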

This figure shows that private school enrollment generally has declined more than private preschool enrollment since 2000. Private preschool enrollment has remained relatively stagnant statewide from 2002 to 2008. No real collapse of private preschools evident here.

Issue 2

As I noted above, preschool might be defined in many different ways. On the one hand, we might wish to consider preschool to be any place that meets minimum health and safety guidelines for caring for children between the ages of 3 and 4. To me, that sounds more like daycare. Alternatively, preschool might actually involve a specific curriculum and activities, as well as training for personnel, etc. Obviously, these differences in definition can and likely do significantly influence the cost per child of offering the service. If I can hire high school graduates, rely heavily on parent volunteers, and use only minimally compliant physical space to supervise children at play – mix in story time – I can likely do things relatively cheaply. On the other hand, if I actually have to hire teachers who hold college degrees, provide a specific curriculum, and have appropriate physical spaces in which to do those things, it’s likely going to get more expensive – publicly or privately provided. It’s not so much about whether it’s publicly or privately provided, but whether there are minimum expectations for what defines “preschool.”

The elementary and secondary private school market is highly stratified by price and quality, as I have discussed on many previous occasions. YOU GET WHAT YOU PAY FOR. Yeah… I know that clashes with the appealing logic that private providers always do more with less… thwarting the “you get what you pay for” assumption… or even reversing it… ‘cuz private providers do so much more with so much less. But let’s look again at one of my favorite summaries – with a new presentation – of the private school market. Here’s the earlier version.

This figure lines up the national averages (regionally cost-adjusted for each regional cluster) of a) per-pupil spending, b) pupil-to-teacher ratios, and c) the percentage of teachers who attended competitive undergraduate colleges, for private schools by private school type. Public school expenditures sit right near the middle. The small group of Catholic schools in the national sample sits right alongside public schools (the system of Catholic schools has evolved to look much like its public school counterparts over time). Independent schools spend nearly twice what public schools spend, have much smaller class sizes, and have very high percentages of teachers who attended competitive undergraduate colleges. Hebrew and Jewish day schools lie about halfway between the elite privates and the public and Catholic schools. At the other end of the private school market are conservative Christian schools, which spend much less per pupil than public or Catholic schools. They do have somewhat smaller class sizes, but they have very poorly paid teachers and few if any teachers who attended competitive colleges. For more on these comparisons, see: https://schoolfinance101.wordpress.com/2010/02/20/stossel-coulson-misinformation-on-private-vs-public-school-costs/. In short, this figure shows that even in the K-12 marketplace, private providers are very diverse, some offering small class sizes and highly qualified teachers at a much higher price than public schools, and others offering much less.

We can certainly expect at least as much variation in the private preschool marketplace, if not one-heck-of-a-lot more, since many private daycare facilities require little or no formal training and no college degree for their employees.

As an aside, I was driving down Route 202 the other day west of Somerville Circle and noticed that they are putting in a Creme-de-la-Creme “daycare/preschool.” We had one around the corner from our house in Leawood, KS. I suspect that few of the Abbott preschool facilities built at such great expense compare favorably to a “Creme” facility – with waterpark (we’re talking slides, fountains), mini tennis court, indoor fish pond, TV studio, etc. (at least that’s what the one in Leawood had; I expect nothing less here). I expect that many parents, having toured other “less desirable” daycares and preschools, will decide that their child deserves the “Creme” lifestyle (I suspect that there are actually other options with better curriculum and perhaps better teachers in the area, but I have not had occasion to research it). It’s just an extreme example of the diversity of the private preschool marketplace. I suspect the cost per pupil will far exceed that of the Abbott preschools (heck… it already exceeded $12k per year in Kansas several years ago).

To summarize, the Task Force report on privatization makes bold claims about Abbott preschool programs crowding out and decimating private preschool programs, many run by women and minority business owners. But the Task Force report does not bother to substantiate a) that private preschools have actually suffered, or b) that any that had suffered were actually owned and operated by women or minorities. The only “evidence” the report has to offer is the undocumented claim that 100% of kids were in private programs and now only 60% are. Where does that come from? What the heck is that? 100% of whom? 60% of what?

Further, the Task Force report is willing to assume that warehousing 3- and 4-year-olds under the supervision of high school graduates, in physical spaces and with supervision ratios compliant with DCF regulations, is sufficient for low-income and minority children… or rather… that it is the lower-cost option with quality equivalent to Abbott pre-school programs (public or publicly regulated private). It is critically important that we acknowledge the difference in the quality, or even type, of service received at different price points. Like the private K-12 market, the private preschool market varies widely, and spending much less generally means getting much less.

=====

See also, the Abbott 5th year report: http://edlawcenter.org/ELCPublic/Publications/PDF/PreschoolFifthYearReport.pdf

Manual for Child Care Centers from DCF in NJ: http://www.nj.gov/dcf/divisions/licensing/CCCmanual.pdf

Can’t forget this:

The Gist Twist(s) & Rhode Island School Finance

So, I’ve tried not to… but I’ve been following the relatively uninformed debate over Rhode Island’s nifty new Foundation Aid formula on the National Journal “Experts” Blog.

http://education.nationaljournal.com/2010/06/a-funding-formula-for-success.php#comments

Yep, Rhode Island has invented the… wheel… or perhaps bread… one or the other. Pretty much a run-of-the-mill foundation aid formula here. And that’s not necessarily a bad thing. But there are a number of “wait and see” issues here… like how well the crafty state-local matching aid formula will work and to what extent the single relatively small and completely arbitrary poverty weight will actually drive additional funding to higher poverty districts.

One thing really caught my eye in Deborah Gist’s response to David Sciarra. Mr. Sciarra criticized the inclusion of New Hampshire in the calculation of the foundation aid level for the 2010-11 – adoption-year – incarnation of the nifty new bread/wheel. Here’s how Gist responds:

1. Our core instructional amount was based on national research, using data from the NCES, is sufficient to fund the requirements of the Rhode Island Basic Education Program, and it in no way focused on states with low per-pupil expenditures. In fact, we looked particularly carefully at our neighboring states, which have some of the highest per-pupil expenditures in the nation, and we included only those states that have an organizational structure and staffing patterns similar to ours.

First, I must say that it is a strange use of the term “national research” to refer to simply taking averages of spending data from states, collected from a national survey conducted jointly by the National Center for Education Statistics and the Census Bureau. It’s an annual survey. A collection of data. Not national research. It could be used for research. Heck, I love those data and know them all too well. Which brings me to the Gist Twist here. And it’s a three-part twist.

You see, the goal is to identify an underlying “foundation” level of funding for school districts in Rhode Island.

Twist Part I: The first part of the twist, which I will not dig through in great detail here, is the pruning back to core instructional expenditures – a category in the NCES data intended to be reported uniformly across states, albeit imperfectly. The choice of core instruction rather than all current operating expenditures clearly drops the foundation value, and quite significantly. What remains unknown is the extent to which other aid beyond the foundation formula will actually address those other cost areas. In 2007-08, Rhode Island instructional spending per pupil was about $8,500, while current operating expenditures per pupil were over $14,000. That’s a big difference to cover with other aid. Let’s hope they do.

Twist Part II: I was also quite intrigued by Gist’s explanation of how national data were used, and her defense against the accusation that they picked low-spending states and took their average. Gist responds by saying they took “neighbors” of Rhode Island, which are, of course, high-spending states.

Here’s how the actual legislation describes the process:

(1) The core instruction amount shall be an amount equal to a statewide per pupil core instruction amount as established by the department of elementary and secondary education, derived from the average of northeast regional expenditure data for the states of Rhode Island, Massachusetts, Connecticut, and New Hampshire from the National Center for Education Statistics (NCES) that will adequately fund the student instructional needs as described in the basic education program and multiplied by the district average daily membership as defined in section 16-7-22.

http://www.ride.ri.gov/Finance/Funding/FundingFormula/Docs/H8094Aaa_FINAL_6_10_10.pdf

Even though I love maps, I won’t post one here. Maybe it’s because I used to teach in New Hampshire and once lived in eastern Connecticut that I realize that one of these two is actually a neighbor of Rhode Island and one is not. Okay… for those of you pulling out your maps to figure out how all of those tiny New England states line up… yeah… New Hampshire does not border Rhode Island. So then, why include New Hampshire in the calculation of the average instructional expenditures used to set the Rhode Island foundation? Okay… let’s set aside the fact that this whole approach is not a reasonable way to identify the costs of meeting Rhode Island’s education standards in Rhode Island districts and charter schools. But if you’re going to go down this road, the decisions should be somewhat justifiable.

Here’s the average core instructional spending per pupil for the states used:

Hmmmm… which one of these is not like the others? Yeah… New Hampshire’s per-pupil spending is somewhat lower. But it is a smaller state than the other two, and thus has a lesser effect on the average. Oh… by the way… “similar organizational structure,” as noted by Gist above, was her/their way of cutting Vermont out of the average – because Vermont has too many non-unified districts – or, actually, because Vermont is the highest spending of these states.

Here’s the effect on the average. Including New Hampshire brings the average down by just under $200 per pupil. While this doesn’t seem like a lot, it’s about 1/3 of the difference between Rhode Island’s current spending per pupil and the target spending. That is, including New Hampshire cuts the aggregate increase in funding required (the difference between RI current and target) by about 1/3… but that’s before we get to Part III of the twist.
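To make the arithmetic concrete, here’s a trivial sketch of the with/without-New Hampshire comparison. The dollar figures are hypothetical placeholders chosen for illustration only – not the actual NCES amounts – and, consistent with the follow-up note below, I leave Rhode Island itself out of the average.

```python
# Mechanics of the foundation average only -- the figures below are
# hypothetical placeholders, NOT actual NCES per-pupil amounts.
core = {"MA": 8900, "CT": 8800, "NH": 7600}  # hypothetical values

def foundation(states):
    """Unweighted mean of per-pupil core instruction spending."""
    return sum(core[s] for s in states) / len(states)

with_nh = foundation(["MA", "CT", "NH"])
without_nh = foundation(["MA", "CT"])
print(f"with NH: ${with_nh:,.0f}  without NH: ${without_nh:,.0f}  "
      f"NH pulls the average down by ${without_nh - with_nh:,.0f}")
```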

Twist Part III: As far as I can tell, the proposed foundation level for FY2010-11, or even FY2011-12, is to be set at $8,295. Please correct me if this is not true. That’s the amount cited on slide #8 here:

http://www.ride.ri.gov/Finance/Funding/FundingFormula/Docs/Formula_PPT.pdf

And in other documentation in which a foundation number is cited. These documents generally date from this past winter/spring, leading up to passage of the legislation. So what’s wrong with that? Well, the average spending of CT, MA, and NH, which comes out to about $8,295 (actually, mine comes out to $8,259), is based on data from fiscal year 2006-07. Are they really basing the 2010-11 or 2011-12 foundation level on 2006-07 data? Take a look at my second graph above. The 2007-08 data came out the other day, and as it turns out, the 2007-08 Rhode Island average core instructional spending per pupil was over $8,500. That’s actually more than the new foundation level.

That’s not to say that it can’t be reasonable to set a foundation level below current average spending. After all, the average spending is the average of all districts, including their varied needs. It is conceivable that the current average is more than sufficient… to achieve current average performance in districts with less-than-average needs. But that’s not how this is being spun at all. Rather, it’s being spun as a breakthrough based on thorough and thoughtful empirical analysis. That’s hardly the case.

Quite honestly, Ms. Gist and the RI legislature may have been better off simply saying that the foundation level will be set at $8,295 because that’s how much they are willing to pay – not offering this silly back-of-the-napkin justification of the amount they were willing to pay. That in mind, this foundation formula and its arbitrary weights – excuse me – weight – actually bring us backwards, not forwards, in the school finance debate, making a mockery of “research” and its potential use for informing state school finance policy.

Sorry… got a little edgy at the end there.

And here’s a little extra credit reading which actually covers national research on estimating the cost of achieving state standards. It’s from the National Research Council of all places: http://www7.nationalacademies.org/CFE/Taylor%20Paper.pdf

Follow up note:

As the statute reads, RI itself would also be included in the average calculation, lowering the value further. It makes little sense to include the current average (or even three-year-old average) spending of the state you are trying to “fix” in the average used to inform the foundation level, if the assumption is that the state has, for lack of any real formula, fallen behind in regional competitiveness. Of course, it hasn’t fallen behind New Hampshire. So… my averages above do not include Rhode Island itself and are intended only to illustrate the arbitrary (well… not really arbitrary… intentional) choice of including New Hampshire in the calculation.

By the way… I wonder if Deborah Gist can see New Hampshire from her window, or does Massachusetts actually get in the way?

An Alternative Look at the Census Financial Data

The spin is on. As soon as the annual school-district-level U.S. Census fiscal survey data are released, news outlets across the country take their shot at spinning the data to show just where their state stands. New York #1! Utah… dead last! Hawaii “above average.” Spending just really high (totally out of context)! Typically, news outlets point out that spending is high when they wish to argue that it’s too high… and that we should do something to curb it. No mention is made of outcomes achieved with that spending, or of which districts in the state are responsible for the high average. When spending is reported as low, the spin is generally that it is too low, and that state policymakers should do something about it.

Allow me to briefly present a slightly more nuanced picture. For the past few years, and in a number of publications, I have used a statistical model of the national school finance data to correct for such issues as a) economies of scale and population density, b) regional variation in competitive wages, and c) variation in student needs. I use this model to project what a school district with comparable characteristics would have in state and local revenue per pupil in each state. The methods behind this madness were used in this study: http://epaa.asu.edu/ojs/article/viewFile/718/831
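For the curious, here’s a stylized sketch of the kind of adjustment model described above. It is not the published specification (see the linked study for that); the file, variable names, and reference values are all illustrative assumptions.

```python
# Stylized sketch of a cost/need adjustment model -- NOT the published
# specification. File, column names, and reference values are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

districts = pd.read_csv("district_finance_panel.csv")  # hypothetical input

model = smf.ols(
    "np.log(state_local_rev_pp) ~ np.log(enrollment)"
    " + I(np.log(enrollment) ** 2)"       # economies of scale
    " + np.log(pop_density)"              # population density
    " + np.log(wage_index)"               # regional competitive wages
    " + pct_poverty + pct_ell + pct_swd"  # student needs
    " + C(state) + C(year)",
    data=districts,
).fit()

# Predict revenue for a 'comparable' district in each state: hold scale,
# density, wages, and needs at common reference values and let the state
# effects carry the comparison.
reference = pd.DataFrame({
    "state": sorted(districts["state"].unique()),
    "year": districts["year"].max(),
    "enrollment": 2000,
    "pop_density": districts["pop_density"].median(),
    "wage_index": 1.0,
    "pct_poverty": 0.15,
    "pct_ell": 0.05,
    "pct_swd": 0.14,
})
reference["predicted_rev_pp"] = np.exp(model.predict(reference))
print(reference.sort_values("predicted_rev_pp", ascending=False).head(10))
```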

Here are some of the results with the 2007-08 Census Fiscal Survey data (with the model built on data from 2005-06, 2006-07 & 2007-08).

Before getting to the modeled estimates of comparable state and local revenue, let’s take a quick look at the relative educational effort of each state: combined state and local revenues for K-12 education as a share of gross state product. Vermont and New Jersey lead the pack on this one, with other states including Maryland and New York in the mix. Note, however, that this effort can be quite unevenly distributed. In fact, a significant amount of effort may be going into local property tax revenues raised by the richest communities in a state. Yeah… it’s still a lot of effort, but it is selectively distributed among those who can put up that effort – and who choose to, as long as it benefits (or is perceived to benefit) their own children. Total effort provides a limited window, but an important one nonetheless.

Fun Fact about this first table – TAKE A LOOK AT OUR RACE TO THE TOP, ROUND 1 WINNERS! (47TH & 50TH ON EFFORT!!!!)
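Computing effort itself is simple division. Here’s a minimal sketch, assuming hypothetical files of state and local K-12 revenue totals and gross state product; the file and column names are placeholders.

```python
# Effort = K-12 state and local revenue / gross state product.
# File and column names are hypothetical placeholders.
import pandas as pd

rev = pd.read_csv("state_local_k12_revenue.csv")  # e.g., from Census F-33
gsp = pd.read_csv("gross_state_product.csv")      # e.g., from BEA

effort = rev.merge(gsp, on="state")
effort["effort"] = effort["k12_state_local_revenue"] / effort["gsp"]
print(effort.sort_values("effort", ascending=False)[["state", "effort"]].head(10))
```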

Now to the model-based estimates of who’s really in the top and bottom ten on state and local revenue per pupil for elementary and secondary education. Let’s begin by looking at those states where the lowest poverty districts have the highest and lowest resources.

Yep, New York is #1 in per pupil state and local revenues for very low poverty districts! Indeed, very affluent Long Island and Westchester County school districts in New York State spend about as much as any districts in the nation, largely because they have the financial capacity to do so (and partly because the state has enabled them to!)

Next in line in funding for very low poverty districts are Wyoming and Vermont, which really don’t have many children attending incredibly high-poverty districts. Notably, New Jersey falls well behind New York State for low poverty districts, and many of New Jersey’s affluent suburbs lie in the same labor market as the higher-spending affluent New York suburbs. And then there’s Tennessee – one of our great RttT winners. Of course, as I have shown in a previous post, this works fine for TN, which has the lowest state assessment cut scores – so most of the kids pass the tests anyway (low standards & low funding – a winning combination indeed)!

The next table ranks per-pupil funding for high-poverty districts. Notably, New York is NOT in first place on this one. New York drops to 6th, and the situation is somewhat more complicated. While this might appear okay, it can be particularly difficult for high-poverty New York State school districts to recruit and retain high-quality teachers when they are surrounded by so many affluent districts that already hold the recruitment and retention advantage and have substantially more resources. For high-poverty districts, New Jersey and Wyoming come in first. Wyoming is simply high across the board. And yep… there’s Tennessee again – our RttT winner – in 47th place!

This next table ranks the within-state FAIRNESS of each state’s school funding distribution – where fairness is determined by taking the ratio of high-poverty funding to low-poverty funding, with the implicit assumption that state school finance systems should provide additional support to districts serving children with greater needs. Now, this table must be taken in the context of the previous two. For example, Utah comes in first on “fairness.” But in this case, that merely means that low-poverty districts in Utah get nothing, and high-poverty districts in Utah get next to nothing! In a twisted sense, that’s “fair”?????
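In code, the fairness ratio is a one-liner. Here’s a sketch, assuming a hypothetical file of model predictions by state at simulated low and high poverty levels.

```python
# 'Fairness' ratio: predicted high-poverty funding / predicted low-poverty
# funding, by state. 'predictions.csv' is a hypothetical file of model output.
import pandas as pd

predictions = pd.read_csv("predictions.csv")

fairness = (
    predictions.pivot(index="state", columns="poverty_level",
                      values="predicted_rev_pp")
    .assign(fairness=lambda d: d["high"] / d["low"])
    .sort_values("fairness", ascending=False)
)
print(fairness.round(2).head(10))
```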

Among states not at the bottom in overall resources, New Jersey, Ohio, Minnesota and Massachusetts seem to be driving additional resources into higher need, higher poverty districts.

States at the other end of the spectrum include New York, Pennsylvania and Illinois. These are among the historically least equitable large, diverse states in the country. Now, to Pennsylvania’s credit, these calculations precede the phase-in of its new funding formula, which the governor has continued to support even during the recession. New York and Illinois are another story. Yeah… New York also implemented – okay – kind of planned to implement – a new formula. That didn’t get very far, and it is highly unlikely (okay, almost entirely unlikely, based on other analysis I’ve conducted on more recent NY data) that NY has actually improved since 2007-08. Illinois hasn’t even tried – in fact, Illinois just keeps getting worse and worse!

Now for an obligatory point: many argue that the overall funding level in a state is simply a function of its wealth. Wealthier states, like wealthier school districts within states, simply have the ability to spend more. That is indeed partly true. But effort also matters – remember that first slide above? This scatterplot shows the relationship between state effort and funding levels in a hypothetical average-poverty school district. There’s actually a reasonably strong relationship here, but for a few quirky outliers. In fact, based on additional analyses, a state’s effort explains about as much of its funding level as does its wealth.

So, Mississippi is a very poor state that puts up relatively average effort, but simply can’t get very far with that effort. By contrast, Tennessee and Louisiana both have much higher fiscal capacity (measured by gross state product per capita) than Mississippi, but they simply don’t use it. Tennessee has little excuse for its spending level! Nor does Louisiana!

Finally, here’s a snapshot of the association between 8th grade reading and math NAEP performance and funding levels across states. As it turns out, funding levels for high-poverty settings were most strongly associated with NAEP performance for all students. As one can see, there exists a reasonable correlation between funding levels and NAEP mean scale scores. That said, as I have noted in previous posts regarding such relationships, there’s a lot of circular stuff all tangled up in here: wealthier states with more educated adult populations support higher spending on education – and support and encourage their children to do well in school, etc. But it is difficult to conceive how a state in the bottom left corner of this picture (very low funding in high-poverty districts – and, most likely, low funding across the board) can begin to lift itself out of that corner – or Race to the Top. Financial resources are a necessary underlying condition, albeit one easier to achieve in some states than in others.
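For what it’s worth, the correlation itself is easy to compute. A quick sketch, assuming a hypothetical merged file of state funding estimates and NAEP means:

```python
# Simple correlation between predicted high-poverty funding and mean NAEP
# scores across states. File and column names are hypothetical.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("state_funding_naep.csv")
r, p = pearsonr(df["high_poverty_rev_pp"], df["naep_math8_mean"])
print(f"r = {r:.2f} (p = {p:.3f})")
```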

Note: Difficulties arise when trying to make simple comparisons of funding levels and funding gaps with achievement gaps between poor and non-poor children in each state, a) because the income thresholds used for subsidized lunch status characterize very different populations from one region of the country to another, and from rural to urban settings within states, and b) because gaps between non-poor and poor children in states depend significantly on how wealthy the non-poor are and how poor the poor are. Sadly, these complexities make it very difficult, if not impossible, to use NAEP data to untangle the relationship between funding differences across lower- and higher-poverty districts and outcome differences between the children attending those districts in different states.

I discuss the poverty measurement problems here:

https://schoolfinance101.wordpress.com/2009/11/27/title-i-does-not-make-rich-states-richer/

Kevin Welner and I discuss evaluating the relationship between state school funding distribution and student outcomes here:

https://schoolfinance101.com/wp-content/uploads/2010/05/doreformsmatter_formatted.pdf