Blog

The “Ed Schools are the Problem” Fallacy

I had the displeasure of waking up to this drivel in my in-box this morning:

“Those who can, do. Those who can’t, teach. And those who can’t teach, teach teaching.”

http://www.nytimes.com/2013/10/21/opinion/keller-an-industry-of-mediocrity.html?_r=0

Yeah… and those completely lacking in critical thinking, basic research and data interpretation skills write op-eds for the Times.

I don’t really teach teachers myself, so I guess I shouldn’t take offense. But I do, mainly because the core argument advanced here is so ill-informed and poorly conceived.

Allow me to start by pointing out that I have actually written detailed, quantitative research in peer reviewed journals on the very topic of who’s teaching the teachers. In fact, the article we wrote was done partly in response to the Arthur Levine report cited in the Times op-ed piece. And it’s not as if the article title really conceals its contents:

  • Wolf‐Wendel, L., Baker, B. D., Twombly, S., Tollefson, N., & Mahlios, M. (2006). Who’s teaching the teachers? Evidence from the National Survey of Postsecondary Faculty and the Survey of Earned Doctorates. American Journal of Education, 112(2), 273-300.

My apologies for the fact that this article is fire-walled. I really don’t expect all of my blog readers to go through the trouble of paying for it or finding an academic library that carries it. But any responsible journalist, pundit or author proclaiming a strong policy position on this issue ought to at least do some reading on the topic first. The above article is certainly not uncritical of teacher preparation. [UPDATE: Full version here, courtesy of the kind folks at AJE Baker2006]

And the issues of complexity and variation in teacher preparation I explore in the above research article are not the only massive omission or conflation put forth in the New York Times piece, which operates on the overly crude assumption of a uniform system of content-free instruction across any and all ed schools.

Let’s tackle the bigger and much simpler issue here – the broad notion advanced in this op-ed that Ed Schools are the problem!  Ed Schools are the primary threat to the quality of our public schooling system as a whole and by extension Ed Schools are a threat to our national security. [yeah… he didn’t really say that… but somehow it often goes there] And further, that if we can just replace ed schools – with some other unknown thing – we’ll all be better off.

A kinder, gentler variant on this argument is that it’s just the bad ed schools that are a threat and that we can weed out those bad ed schools by looking at how the students of their graduates perform. I’ve addressed this issue in a few previous blog posts. First, I’ve addressed the question of whether “ed school” is really some static, monolithic entity. Second, I’ve addressed the feasibility of rating ed schools by twice removed outcome measures.

But there’s actually a simpler logical fallacy at play here, one which lies at the root of many reformy arguments regarding causes and consequences: a failure to acknowledge that the U.S. has a wide range of elementary and secondary schools, both high performing and low performing, and that the defining features differentiating higher and lower performing schools are not found primarily in their teachers or the preparation programs they attended – or whether they attended any at all – but rather in the communities they serve, the resources available to them, and the backgrounds, health and economic well-being of the children and families they serve.

This is not about the poverty as excuse argument. This is about the simple point that our highest performing public schools also employ teachers from traditional public college and university preparation programs and in many cases, teachers from the same – or substantively overlapping – college and university preparation programs as teachers in our lowest performing schools in the same region.

If that’s the case, then how is it possible that teacher preparation programs are the problem?

I know… the good reformer at this point is thinking – but there are no good U.S. public schools or districts. They all suck, and that’s precisely why teacher preparation is the problem. Of course, if that were the case – that all K-12 public schools suck – it would be hard, by research design, to attribute that sucky-ness to a single cause, because the dependent variable wouldn’t vary. But the dependent variable does vary… even when we rely on reformy resources like the Global Report Card I wrote about here.

First, here’s a location where you, yourself can actually download the reformy report card, which in large part was designed to shake the confidence of America’s suburban parents by taking a few statistical leaps to show them their leafy suburban schools wouldn’t stack up so well if we transported them to Finland or Singapore.

http://globalreportcard.org/docs/Global-Report-Card-Data-11.14.12.xlsx

I’ll save that argument for another day, and just select two sets of districts from this report card, from Illinois and Kansas, because I have the data readily available. Let’s look at local public school districts that are

1) Better than the Average Finn and those that are…

2) Worse than 80% of beer-swillin’ Hockey Lovin’ Canadians.

That’s quite a contrast (even though both are high performing countries – on average – setting aside demographics, etc.).

Here are the lists:

Slide1

Slide2

So, we’ve got some school districts in each state that are better than the average Finnish school and some that get trampled by those syrup-swillin’ hosers from the Great White North.

The only plausible explanation is that the teachers in the Better than Finland category are either from completely non-traditional ed schools or not ed schools at all, while the teachers in the not-so-great schools all come from your typical state ed school.

Certainly, we know from large bodies of teacher labor market research that graduates of various preparation programs, colleges and universities and alternative route programs more broadly,  sort themselves on the labor market, with those who possess stronger academic credentials often sorting into the “more desirable” jobs.

But that’s somewhat of an aside here. For the basic reformy premise of massive uniform ed school failure to be true – we would have to see little or no commonality in the ed school preparation of teachers across these settings – across totally awesome U.S. schools and totally sucky ones.

So, here’s the recent distribution of graduates of Kansas teacher preparation programs in the Kansas City metropolitan area, which includes the Blue Valley School district – better than the average Finn – and Kansas City, Kansas, which, well, gets its butt kicked by Canada!

Slide3

Hmmm… you can’t possibly be telling me that both KCK and BVSD have teachers who graduated from the major state teacher preparation colleges can you? If that’s the case, then their relative international rankings might not be determined by teacher preparation?

[ignore the poverty shading in the background…’cuz payin attention to poverty… well… just isn’t cool with the reformy crowd!]

There are some notable features to this map. One is that BVSD and Olathe to its west were still significantly growing districts during this period, so it makes sense that they hired a lot of new teachers during that time. It makes less sense that KCK, more stagnant (and declining) in population, hired so many new teachers – but for the relatively high turnover rate more common in such high poverty settings! There are also some distributional differences in the dots – which universities produce more teachers for which districts (or provide more credentials). Pittsburg State (blue dots), more prevalent in KCK, provides a local program that feeds into KCK. I’d be hard pressed, however, to lay blame on Pitt State for KCK’s Canadian butt-whoopin’, and I’d be equally hard pressed to credit K-State’s production of more teachers for Blue Valley as the cause of Blue Valley’s competitive matchup with Finland! The fact is that all of these Kansas districts draw heavily on teachers produced by the public teachers colleges of that state – and some do as well as Finland while others struggle.

As such, it’s pretty darn hard to lay blame on traditional teacher preparation in Kansas for these differences in outcomes.

Now, let’s take a look at a few high performing and lower performing districts in Illinois.

First, here are the top 15 undergraduate degree producers for Chicago and Aurora East and for Naperville and Lake Forest. Rather than from the degree producers perspective, these data simply include all instructional staff in these districts, downloadable here: http://www.isbe.state.il.us/research/xls/2012-tsr-public-dataset-instr.xlsx

The data include where teachers got their undergraduate and advanced degrees.

Slide5

Wow… there’s actually quite a bit of overlap in the institutions. Sure, there are differences. Where a state name is listed, the teacher received his/her undergraduate degree from an unnamed institution in that state (such are the shortcomings of state administrative data). The City of Chicago does have larger shares from some Chicago-based programs. But there’s also overlap, and there’s significant overlap for the state’s major public teacher preparation institutions, like Illinois State University, Northern Illinois University and the University of Illinois main campus (Champaign). How can that be? How can there possibly be school districts that compete favorably with Finland while employing graduates of traditional teachers colleges?

While the percentages of teachers in these districts who attended any one preparation institution tend to be small, the shares who attended major public preparation institutions for their bachelor’s degrees appear marginally larger in the high performers (over 10% for both IL State and Northern).

That’s impossible! But… But… But… graduates of those same colleges are teaching in districts that got whooped by the Canadians? So how can we possibly place blame for systemic failure of American schools on teacher prep programs? I’m struggling with the logic here.

One more look… here are the advanced degree granting institutions for teachers in higher versus lower performing Chicago area districts. Note that “NULL” refers to those not holding (or reporting) advanced degrees, and that the share holding only a bachelor’s degree is higher in the lower performing districts (poorer, minority districts).

Slide6

Again, these degrees – which include both initial and additional certifications – are dominated by traditional credential granting institutions, with substantial overlap across teachers between higher and lower performing schools.

This is an issue separable from, but related to, the evaluation of ed schools by student outcome measures. I’ll continue digging into that issue in future posts.

It is certainly hard to make a compelling case that traditional teacher preparation institutions are the primary cause of our supposed lagging national education system when our highest performing schools – those that compete favorably with Finland – also employ in large number, graduates of those preparation programs and in many cases employ significant numbers of graduates of the same programs that provide teachers for our supposed failing schools.

$500 million? No! $3 BILLION! That’s $3 BILLION! New York State’s Underfunding of NYC Schools

New York’s Governor Cuomo has been big on words promising NOT TO FUND New York State schools and squeeze them to the maximum extent possible with layers of cuts and caps. After all – NOT FUNDING SCHOOLS is the most noble of endeavors – that along with declaring death penalties for those underfunded, high need schools that post low average test scores.

The Governor’s most recent anti-school-funding attack comes in response to NYC Mayoral Candidate Bill de Blasio’s campaign promise to push for universal preschool for city school children. At least as characterized in a New York Daily News editorial, the noble Governor is again set to dig in his heels against any additional spending on schools:

Laying claim to big ideas, Bill de Blasio has promised to deliver universal all-day pre-kindergarten, paid for by raising taxes on wealthy New Yorkers.

The attractive concept helped boost de Blasio to a commanding lead over Republican Joe Lhota, and is key to his education program. He calls universal pre-k “how we will start to close the achievement gap” between minority and white children.

Now is the time for voters to consider whether de Blasio has a prayer of fulfilling his pledge to provide services to 70,000 4-year-olds, along with after-school programs for middle-schoolers.

He’d need a half-billion dollars a year and has staked all on convincing Albany to okay a city income tax hike on high earners.

Bill, meet Andrew:

Gov. Cuomo says no.

In an interview with the Daily News Editorial Board, Cuomo made clear that he has no intention of pressing the Legislature to give de Blasio the $500 million in tax money he’s counting on.

Read more: http://www.nydailynews.com/opinion/bill-rude-awakening-article-1.1487839#ixzz2i5NJcOJT

So, as the Daily News would characterize it, the Governor is incensed that de Blasio would even consider the absurd possibility of needing an additional half a billion – that’s HALF A BILLION DOLLARS! – to finance the frivolity of universal preschool.

Yeah… HALF A BILLION DOLLARS sure sounds like an obscene number. But let’s reflect for a moment on just how much money the state of New York, under the leadership of Governor Cuomo, continues to come up short in financing the state school finance formula that was adopted back in 2007 in order to comply with a NY State high court ruling that funding levels at that time for New York City were inadequate.

In 2012-13 and again in 2013-14 – New York State continues to short NYC on general state aid for schools to the tune of around 3 BILLION DOLLARS! Yeah… that’s 6 times the seemingly obscenely huge funding request for de Blasio’s pre-k proposal.

Let’s take a look. First, here’s a graph of the relationship between state aid shortfalls per pupil and the state’s own pupil need index. Larger circles represent larger (enrollment) districts. As can be seen here, in 2013-14, based on March/April 2013 adopted budget figures (state aid run worksheets), New York City – the bowling ball in the picture – is shorted by just under $3,000 per pupil in State Aid! That’s $3,000 PER PUPIL IN STATE AID!

Slide1

Now, here’s how the funding formula that got the state out from under litigation is supposed to work.

A district’s target funding level per pupil is supposed to be a function of a) the foundation level of funding per pupil [appropriately inflated to represent current year costs], times b) the pupil need index for each district times c) the regional cost index for each district times d) the number of “aidable foundation pupil units” (which is an enrollment count including some adjustments for special education and other factors).

The adjusted foundation amount per pupil in NYC is $16,562 in 2013-14.

Bear in mind that even this figure is based on rigged analyses that severely underestimate actual needs and costs.

Then, the state determines the share of that figure to be covered by the district and the balance to be covered by the state. The state share for NYC is supposed to be $7,006.

Take that figure times the total aidable foundation pupil units (TAFPU) and you’ve got… $8.8 BILLION DOLLARS!

THAT’S $8.8 BILLION DOLLARS!!!!!

Slide2

But alas, that’s far more than the state actually allocates to the City of New York through the foundation aid program it adopted to get out from under litigation brought by the city of New York – because funding was (and still is) inadequate!

I’m not even going to quibble (here and now) over the broader conception of “adequacy” involved, or the fact that the state concocted ways to reduce their estimated targets. The fact is that even though they set a low bar and further lowballed their funding targets, they’ve (meaning the Governor and Legislature) chosen to not even come close to funding those targets!

Instead, the state begins by freezing the underlying foundation aid level to past levels, setting NYC’s foundation aid level to just under $6.4 billion.

Yeah… sounds like a lot… but that’s already well short of what the city is supposed to get. And that’s just the first CUT.

Next, the state applies what it calls the Gap Elimination Adjustment (and then partially restores that cut, to make it seem like a gift), further reducing state aid to NYC down to just under $5.9 billion.

That’s a total state aid shortfall of nearly $3 billion! THAT’S $3 BILLION!!!!!!!!!!!  [with the figure fluctuating around $3 billion from year to year – the figure was higher for 2012-13]
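The arithmetic above is simple enough to check directly. Here’s a minimal sketch using only the rounded figures quoted in this post; note that the implied pupil count is my back-of-envelope inference from two of those figures, not a number reported anywhere:

```python
# Figures taken from the post's discussion of NYC's 2013-14 state aid run.
state_share_per_pupil = 7_006      # state share of the adjusted foundation amount
target_total_aid = 8.8e9           # state share x total aidable foundation pupil units

# Implied pupil count (an inference from the two figures above, not a reported number).
implied_tafpu = target_total_aid / state_share_per_pupil
print(f"implied TAFPU: {implied_tafpu:,.0f}")

frozen_foundation_aid = 6.4e9      # foundation aid frozen to past levels (the first cut)
after_gap_elimination = 5.9e9      # after the Gap Elimination Adjustment

shortfall = target_total_aid - after_gap_elimination
print(f"total state aid shortfall: ${shortfall / 1e9:.1f} billion")

# de Blasio's universal pre-k request, for scale
prek_request = 0.5e9
print(f"shortfall is roughly {shortfall / prek_request:.0f}x the pre-k request")
```

Even with these rounded inputs, the shortfall lands at about $2.9 billion – the “nearly $3 billion” figure – and at roughly six times the supposedly obscene pre-k request.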

So before the good Governor Cuomo decries the obscene half a billion dollar request presented by future mayor(?) de Blasio, it might be wise for him to reflect on the state’s own past and still relevant promises to New York City… promises that would rightfully (constitutionally… as per the high court decision in Campaign for Fiscal Equity v. State) drive an additional $3 Billion to NYC.

THAT’S $3 BILLION!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

 

More Thoughts on Interpreting Educational/Economic Research: DC Impact Study

Today brings us yet another opportunity to apply common sense interpretation to an otherwise seemingly complex research study – this time on the “effectiveness” of the DC Impact teacher evaluation system on improving teaching quality in the district. The study, by some of my favorite researchers (no sarcasm here, these are good, thoughtful individuals who do high quality work) is nicely described in the New York Times Economix Blog section:

To study the program’s effect, the researchers compared teachers whose evaluation scores were very close to the threshold for being considered a high performer or a low performer. This general method is common in social science. It assumes that little actual difference exists between a teacher at, say, the 16th percentile and the 15.9th percentile, even if they fall on either side of the threshold. Holding all else equal, the researchers can then assume that differences between teachers on either side of the threshold stem from the threshold itself.

The results suggest that the program had perhaps its largest effect on the rate at which low-performing teachers left the school system. About 20 percent of teachers just above the threshold for low performance left the school system at the end of a year; the probability that a teacher just below the threshold would quit was instead above 30 percent.

In addition, low-performing teachers who remained lifted their performance, according to the system’s criteria. To give a sense of scale, the researchers noted that the effect was about half as large as the substantial gains that teachers typically make in their first years of teaching combined.

http://economix.blogs.nytimes.com/2013/10/17/a-new-look-at-teacher-evaluations-and-learning/?_r=1&

The study: http://cepa.stanford.edu/sites/default/files/w19529.pdf

So, for research and stats geeks, this description speaks to a design referred to as regression discontinuity analysis. It sounds complicated, but it’s really not. The idea is that whenever we stick cut-points through some distribution of ratings or scores – through messy/noisy data – some people fall just below those cut-points and others just above. But the cut-points are really arbitrary, and those who fall just above the line really aren’t substantively, or even statistically significantly, different from those who fall just below the line. It’s almost equivalent (assumed equivalent for research purposes) to taking a group of otherwise similar individuals and randomly assigning one score to some (below the line) and another score to others (above the line).
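The cut-point logic is easy to see in a toy simulation (all numbers here are made up for illustration, not drawn from any actual evaluation data): generate noisy ratings around a true underlying ability, slice an arbitrary cut-point through them, and compare the individuals who land just above versus just below the line.

```python
import random
import statistics

random.seed(42)

# Toy data: 100,000 "teachers" with a true underlying ability, observed
# only through a noisy rating (ability plus measurement error).
true_ability = [random.gauss(0, 1) for _ in range(100_000)]
observed = [a + random.gauss(0, 1) for a in true_ability]

# Slice an arbitrary cut-point through the noisy ratings.
cutoff = statistics.median(observed)

# Compare those who land just below vs. just above the line.
band = 0.05
just_below = [t for t, o in zip(true_ability, observed) if cutoff - band <= o < cutoff]
just_above = [t for t, o in zip(true_ability, observed) if cutoff <= o < cutoff + band]

gap = statistics.mean(just_above) - statistics.mean(just_below)
print(f"{len(just_below)} just below, {len(just_above)} just above")
print(f"difference in mean true ability across the line: {gap:.3f}")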

In one application of this approach, researchers from Harvard studied the effect of high stakes high school exit exams on students in Massachusetts. Students who barely passed the test were compared with those who barely failed the test on the first try. In reality, missing or making one or two additional questions does not validly indicate that one child knows their math better than the other. The kids were otherwise comparable, but some were labeled failures and others successes. Those labeled failures were more likely to drop out of high school and less likely to attend college.

The conclusion – that these arbitrary, non-meaningful distinctions adopted in policy are harmful.

This brings us to the present study on the DC Impact teacher evaluation system. Here, the researchers identified teachers who were really no different from one another statistically on their DC Impact ratings, but some were just a few fractions of a point low enough to be labeled as Ineffective and face threat of dismissal, and others just high enough to be out of the woods for now. That is, there really aren’t any substantive observed quality differences between these two groups. Note that the researchers studied this at the high end of the ratings distribution as well, but didn’t really find as much going on there.

Put simply, what this study says is that if we take a group of otherwise similar teachers, and randomly label some as “ok” and tell others they suck and their jobs are on the line, the latter group is more likely to seek employment elsewhere. No big revelation there and certainly no evidence that DC Impact “works.”

Rather, arbitrary, non-meaningful distinctions are still consequential. This is largely what was found in the Massachusetts high stakes testing studies.

Actually, one implication for supervisors is that if you want to get a teacher you don’t like to leave your school, find a way to give them a bad rating. But I think most supervisors and principals could already figure that one out.

Here’s an alternative experiment to try – take a group of otherwise similar teachers and randomly assign them to group 1 and group 2. We’ll treat Group 1 okay… just okay… no real pats on the back or accolades. Group 2, on the other hand, will be berated and treated like crap by the principal on a daily basis, and each day in passing, the principal will scowl at them and say… “your job’s on the line!”

My thinking is that group 2 teachers will be more likely to seek employment elsewhere. That’s not hugely different from the DC Impact research framework, nor are the policy implications. Does this mean that teacher evaluation works – has appropriate labor market consequences? No… not at all. It means that arbitrary differential treatment matters.

Of course, this would be an unethical experiment unlikely to make it through IRB approval. But heck, screwing with people’s lives via actual arbitrary and capricious employee rating schemes adopted as policy is totally okay.

As for the second conclusion… that those who do stay appear to improve their game…it certainly makes sense that individuals would try not to continue being the whipping boy… even if they perceive their prior selection as whipping boy to be arbitrary and capricious. Notably, the bulk of the evaluations in this study were based on observed behaviors not test-based metrics, and with observations, teachers have more direct control over what their supervisors observe and can therefore respond accordingly. Whether these behavior changes have anything to do with better actual on-the-job performance – “good teaching” – is at least questionable.

 

 

The Value Added & Growth Score Train Wreck is Here

In case you hadn’t noticed, evidence is mounting of a massive value-added and growth score train wreck. I’ve pointed out previously on this blog that there exist some pretty substantial differences between the models and estimates of teacher and school effectiveness being developed in practice across states for actual use in rating, ranking, tenuring and firing teachers – and rating teacher prep programs – and the models and data that have been used in high profile research studies. This is not to suggest that the models and data used in high profile research studies are ready for prime time in high stakes personnel decisions. They are not. They reveal numerous problems of their own.

But many if not most well-estimated, carefully vetted value-added models used in research studies a) test alternative specifications, including the use of additional covariates at the classroom and school level, or include various “fixed effects” to better wash away potential bias, and b) through this process, end up using substantially reduced samples of teachers – those for whom data on substantial samples of students across multiple sections/classes within year and across years are available (see, for example: http://nepc.colorado.edu/files/NEPC-RB-LAT-VAM_0.pdf ). Constraints imposed in research to achieve higher quality analyses often result in the loss of large numbers of cases, and potentially in clearer findings. That makes similar approaches infeasible where the goal is not to produce the most valid research but instead to evaluate the largest possible number of teachers or principals (where, seemingly, validity should be an even greater concern).

Notably, even where these far cleaner data and  far richer models are applied, critical evaluators of the research on the usefulness of these value-added models suggest that… well… there’s just not much there.

Haertel:

My first conclusion should come as no surprise: Teacher VAM scores should emphatically not be includ­ed as a substantial factor with a fixed weight in conse­quential teacher personnel decisions. The information they provide is simply not good enough to use in that way. It is not just that the information is noisy. Much more serious is the fact that the scores may be system­atically biased for some teachers and against others… (p. 23)

https://www.ets.org/Media/Research/pdf/PICANG14.pdf

Rothstein on Gates MET:

Hence, while the report’s conclusion that teachers who perform well on one measure “tend to” do well on the other is technically correct, the tendency is shockingly weak. As discussed below (and in contrast to many media summaries of the MET study), this important result casts substantial doubt on the utility of student test score gains as a measure of teacher effectiveness.

http://nepc.colorado.edu/files/TTR-MET-Rothstein.pdf

A really, really important point to realize is that the models actually being developed, estimated and potentially used by states and local public school districts for such purposes as determining which teachers get tenure, determining teacher bonuses or salaries, deciding who gets fired… or even which teacher preparation institutions get to keep their accreditation… those models increasingly appear to be complete junk!

Let’s review what we now know about a handful of them:

New York City

I looked at New York City value-added findings when the teacher data were released a few years back.  I would argue that the New York City model is probably better than most I’ve seen thus far and its technical documentation reveals more thorough attempts to resolve common concerns about bias. Yet, the model, by my cursory analysis still fails to produce sufficiently high quality information for confidently judging teacher effectiveness.

Among other things, I found that only in the most recent year were the year over year correlations even modest, and the numbers of teachers in the top 20% for multiple years running were astoundingly low. Here’s a quick summary of a few previous posts:

Math – Likelihood of being labeled “good”

  • 15% less likely to be good in school with higher attendance rate
  • 7.3% less likely to be good for each 1 student increase in school average class size
  • 6.5% more likely to be good for each additional 1% proficient in Math

Math – Likelihood of being repeatedly labeled “good”

  • 19% less likely to be sequentially good in school with higher attendance rate (gr 4 to 8)
  • 6% less likely to be sequentially good in school with 1 additional student per class (gr 4 to 8)
  • 7.9% more likely to be sequentially good in school with 1% higher math proficiency rate.

Math [flipping the outcome measure] – Likelihood of being labeled “bad”

  • 14% more likely to be bad in school with higher attendance rate.
  • 7.9% more likely to be sequentially bad for each additional student in average class size (gr 4 to 8)

https://schoolfinance101.wordpress.com/2012/02/28/youve-been-vam-ified-thoughts-graphs-on-the-nyc-teacher-data/

New York State

Then there are the New York State conditional Growth Percentile Scores.  First, here’s what the state’s own technical report found:

 Despite the model conditioning on prior year test scores, schools and teachers with students who had higher prior year test scores, on average, had higher MGPs. Teachers of classes with higher percentages of economically disadvantaged students had lower MGPs. (p. 1) http://schoolfinance101.files.wordpress.com/2012/11/growth-model-11-12-air-technical-report.pdf

And in an astounding ethical lapse, only a few paragraphs later, the authors concluded:

The model selected to estimate growth scores for New York State provides a fair and accurate method for estimating individual teacher and principal effectiveness based on specific regulatory requirements for a “growth model” in the 2011-2012 school year. p. 40 http://engageny.org/wp-content/uploads/2012/06/growth-model-11-12-air-technical-report.pdf

Concerned about what they were seeing, Lower Hudson Valley superintendents commissioned an outside analysis of data on their teachers and schools provided by the state.  Here is a recent Lower Hudson Valley news summary of the findings of that report:

But the study found that New York did not adequately weigh factors like poverty when measuring students’ progress.

“We find it more common for teachers of higher-achieving students to be classified as ‘Effective’ than other teachers,” the study said. “Similarly, teachers with a greater number of students in poverty tend to be classified as ‘Ineffective’ or ‘Developing’ more frequently than other teachers.”

Andrew Rice, a researcher who worked on the study, said New York was dealing with common challenges that arise when trying to measure teacher impact amid political pressures.

“We have seen other states do lower-quality work,” he said.

http://www.lohud.com/article/20131015/NEWS/310150042/Study-faults-NY-s-teacher-evaluations

That’s one heck of an endorsement, eh? We’ve seen others do worse?

Perhaps most offensive is that New York State a) requires that if the teacher receives a bad growth measure rating, the teacher cannot be given a good overall rating and b) the New York State Commissioner has warned local school officials that the state will intervene “if there are unacceptably low correlation results between the student growth sub-component and any other measure of teacher and principal effectiveness.”  In other words, districts must ensure that all other measures are sufficiently correlated with the state’s own junk measure.

Ohio (school level)

In brief, in my post on Ohio Value Added scores, at the school level, I found that year over year correlations were nearly 0 – the year to year ratings of schools were barely correlated with themselves and on top of that, were actually correlated with things with which they should not be correlated.  https://schoolfinance101.wordpress.com/2011/11/06/when-vams-fail-evaluating-ohios-school-performance-measures/

New Jersey (school level)

And then there’s New Jersey, which, while taking a somewhat more measured approach to adoption and use of their measures than in New York, has adopted measures which appear to be among the most problematic I’ve seen.

Here are a few figures:

Slide9

Slide10

Slide11

Slide12

Slide13

And here is a link to a comprehensive analysis of these measures and the political rhetoric around them: http://njedpolicy.files.wordpress.com/2013/05/sgp_disinformation_bakeroluwole1.pdf

Conclusions & Implications?

At this point, I'm increasingly of the opinion that even if there were reasonable uses of value-added and growth data for better understanding variations in schooling and classroom effects on measured learning, I no longer have any confidence that those reasonable uses can occur in the current policy environment.

What are some of those reasonable uses and strategies?

First, understanding the fallibility of any one model of school or teacher effects is critically important, and we should NEVER, NEVER, NEVER rely on a single set of estimates from one model specification to make determinations about teacher, school… or teacher preparation program effectiveness. Numerous analyses using better data and richer models than those adopted by states have shown that teacher, school or other rankings and ratings sometimes vary wildly under different model specifications. It is by estimating multiple different models and seeing how the rank orders and estimates change that we can get a better feel for what's going on (knowing what we've changed in our models), and whether, or the extent to which, our models are telling us anything useful. The political requirement of adopting a single model forces bad decision making and bad statistical interpretation.
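That "estimate multiple models and compare" habit can be sketched as follows. The data and both specifications here are invented purely to illustrate the diagnostic, not to mirror any state's actual system:

```python
# Hypothetical sketch: estimate teacher "effects" under two made-up model
# specifications and see how the resulting rank orders move. The point is
# the diagnostic habit, not these particular (invented) models.
import numpy as np

rng = np.random.default_rng(42)
n_teachers = 200

true_effect = rng.normal(size=n_teachers)
# Two specifications recover the true effect plus different noise/bias,
# e.g. one controlling for student poverty and one not.
spec_a = true_effect + rng.normal(scale=0.8, size=n_teachers)
spec_b = true_effect + rng.normal(scale=0.8, size=n_teachers)

# Spearman rank correlation, computed from the ranks directly.
rank_a = np.argsort(np.argsort(spec_a))
rank_b = np.argsort(np.argsort(spec_b))
rho = np.corrcoef(rank_a, rank_b)[0, 1]

# How many teachers move more than a fifth of the distribution?
moved = np.mean(np.abs(rank_a - rank_b) > 0.2 * n_teachers)
print(f"rank correlation across specs: {rho:.2f}")
print(f"share of teachers shifting >20% of the ranking: {moved:.0%}")
```

Even in a toy setup like this, with both specifications built around the same true effect, the rank correlation falls well short of 1 and a substantial share of teachers move a long way up or down the ranking – exactly the instability the studies cited above document with real data.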

Second, at best, the data revealed by multiple alternative models might be used as exploratory tools in large systems – to see where things appear to be working better or worse at producing incremental changes in test scores, where test scores exist and are perceived to be meaningful. That's a pretty limited scope to begin with, but informed statistical analysis may provide guidance on where to look more closely – which classrooms or schools to observe more frequently. These data will never provide definitive information that can or should be used as a determinative factor in high-stakes personnel decisions.

But that’s precisely the opposite of current policy prescriptions.

A few years back, I was speculating that such problems might lead to a flood of litigation regarding the fairness of using these measures for rating, ranking and dismissing teachers; we now have substantial information that these problems are real.

Even more so from a litigation perspective, we have substantial information that policymakers have been made aware of these problems – especially problems of bias in rating systems – and that some policymakers, most notably New York's John King, have responded with complete disregard.

Can we just make it all stop?!

Notes on the Seniority Smokescreen

Seniority, in the modern reformy lexicon, is among the dirtiest words. Senior teachers are not only ineffective and greedy and never put the interests of children over their own, but they are in fact downright evil, a persistent drain on state and local economies and a threat to our national security! By contrast, "effectiveness" is good, and since seniority and effectiveness are presumed entirely unassociated, the simple solution is to replace any reference to seniority in current education policies with measures of "effectiveness."

If only it were so simple. This modern reformy mantra grossly misinterprets the relationship between seniority and effectiveness, presumes currently available measures of effectiveness to be more useful than they really are at sorting "good" from "bad" teachers, ignores that the proposed solutions have in many cases been found NOT to solve the supposed problem, and is oblivious to the broader literature on teacher labor markets, compensation and the quality of the teaching workforce.

Seniority and Effectiveness

Numerous studies over time have shown that as teachers reach somewhere around their 5th year, student achievement gains under those teachers begin to grow more slowly and, to an extent, level off.[1] These findings – to the extent we believe that these metrics of test score gains adequately represent teaching effectiveness – do not by any stretch of the imagination mean that more experienced teachers are less effective. Rather, their year-to-year gains in effectiveness level off. If they have indeed reached their optimal performance, then it makes sense to continue to compensate senior teachers in order to retain them. A constant cycle of replacement costs money, and costs in terms of lost effectiveness during the start-up years.[2]

Seniority and Fairness to Children

One argument is that seniority preferences in teaching assignments permit senior teachers to hold on to jobs in schools against their principals' preferences, and that these seniority privileges often lead to the neediest children having the least experienced teachers – as the more experienced teachers get the cushiest jobs in the district. On the one hand, this assertion acknowledges that new teachers, not senior ones, may in fact be the least effective. On the other hand, it's simply not supported by research. Two separate studies – one on Seattle schools after implementing "mutual consent" hiring, and the other exploring Florida seniority contractual provisions – have found that a) implementation of mutual consent initially exacerbated inequities across schools and ultimately led to no change[3] and b) seniority provisions in contracts had no statistical relationship to inequitable distributions of teachers across schools and children.[4] Further, mutual consent hiring policies a) assume that central office decisions are necessarily bad and principals' decisions necessarily good, b) ignore that principal quality itself may be inequitably distributed, and c) ignore that central office is responsible for assigning principals.

Seniority and Layoffs

Another argument is that removing seniority preferences in cases of reduction in force (RIF) will necessarily lead to an improved teaching workforce as measured by student achievement outcomes. The argument is that seniority preferences must be replaced by "effectiveness" metrics that are predictably related to future effectiveness. First, such "effectiveness" measures are typically available only for core content classroom teachers between grades 3 and 8. In those relatively rare cases where reduction in force is actually implemented, the 20% of teachers for whom such metrics are available are the least likely to be reduced, and there typically exists significant latitude in deciding which programs and positions might be reduced first. Most reductions in force, which happen infrequently to begin with, chip away at other programs and services before ever approaching core subject area teachers. Second, these effectiveness measures are wildly erratic and often substantially biased by who the teacher is teaching.[5] In reality, these policies propose to replace seniority with a roll of the dice, or even a roll of rigged dice. Third, when implementing a 5% across-the-board cut, for example, eliminating teachers effectively at random by experience, rather than eliminating only the newest teachers, yields only nickel-and-dime savings – certainly not enough to make even the smallest dent in the degree of persistent underfunding of Philadelphia schools, where debate on these policies is ongoing.[6]
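A back-of-the-envelope sketch makes that third point concrete. The salary schedule and workforce shape below are invented for illustration – none of these figures are Philadelphia's actual numbers:

```python
# Hypothetical RIF arithmetic: compare payroll savings from cutting the
# newest (cheapest) 5% of teachers vs. cutting 5% at random across the
# experience distribution. All figures are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
experience = rng.integers(0, 30, size=n)
salary = 45_000 + 1_500 * np.minimum(experience, 20)  # toy step schedule

cut = int(0.05 * n)

# Seniority-based RIF releases the newest, cheapest teachers.
seniority_savings = np.sort(salary)[:cut].sum()
# If "effectiveness" ratings are roughly random with respect to pay,
# rating-based cuts amount to cutting at random across the pay scale.
random_savings = salary[rng.choice(n, size=cut, replace=False)].sum()

extra = random_savings - seniority_savings
print(f"extra savings from random cuts: ${extra:,}")
print(f"as a share of total payroll: {extra / salary.sum():.1%}")
```

In toy setups like this one, the extra savings from abandoning seniority-based layoffs come out to a percent or two of total payroll – real money, but nowhere near the scale of the funding gaps at issue.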

While seniority is a seemingly arbitrary and imperfect measure for retaining teachers, replacing it with a roll of the dice is likely to have serious negative consequences for retaining high quality senior teachers and recruiting teachers into high need districts.

Bigger Picture Wage Issues

Most important to recruiting and retaining a quality teacher workforce in any given school district are a) the relative compensation and working conditions a teacher can expect throughout their career when compared with other career options in the same labor market and b) the relative compensation and working conditions a teacher can expect in one school or district versus another in the same labor market.[7]

Paying teachers competitively, offering good working conditions, including smaller class sizes and other resources, and providing job security are far more likely to produce the teaching workforce our children need.


[1] Rice, J. K. (2010). The Impact of Teacher Experience: Examining the Evidence and Policy Implications. Brief No. 11. National center for analysis of longitudinal data in education research.

[2] Ronfeldt, M., Loeb, S., & Wyckoff, J. (2013). How teacher turnover harms student achievement. American Educational Research Journal, 50(1), 4-36.

[4] Cohen-Vogel, L., & Feng, L. (2013). Seniority Provisions in Collective Bargaining Agreements and the “Teacher Quality Gap”. Educational Evaluation and Policy Analysis.

[5] see for example, Haertel, E. H. (2013). Reliability and Validity of Inferences about Teachers Based on Student Test Scores.  http://atlanticresearchpartners.org/wp-content/uploads/2013/07/PICANG14.pdf

[7] Richard J. Murnane and Randall Olsen (1989) The effects of salaries and opportunity costs on length of stay in teaching: Evidence from Michigan. Review of Economics and Statistics 71(2), 347-352.

David N. Figlio (1997) Teacher Salaries and Teacher Quality. Economics Letters 55, 267-271.

David N. Figlio (2002) Can Public Schools Buy Better-Qualified Teachers? Industrial and Labor Relations Review 55, 686-699.

Ronald Ferguson (1991) Paying for Public Education: New Evidence on How and Why Money Matters. Harvard Journal on Legislation.  28 (2) 465-498.

Susanna Loeb and Marianne Page (2000) Examining the link between teacher wages and student outcomes: the importance of alternative labor market opportunities and non-pecuniary variation. Review of Economics and Statistics  82, 393-408.

Susanna Loeb and Marianne Page (1998) Examining the link between wages and quality in the teacher workforce. Department of Economics, University of California, Davis.

Figlio, D.N., Rueben, K. (2001) Tax Limits and the Qualifications of New Teachers. Journal of Public Economics. April, 49-71

Ondrich, J., Pas, E., Yinger, J. (2008) The Determinants of Teacher Attrition in Upstate New York. Public Finance Review 36 (1) 112-144

Pauvre, Pauvre NYC Charter Schools?

There’s nothing really new in this post. I’m just revisiting data and figures that I’ve addressed over and over in this blog – drawn from this report and this conference paper. I’m reposting this information because many seem to quickly forget or totally ignore what we already know and the current debate over whether the city of New York should charge charter schools rent is clouded by the usual mix of non-information, lack of information, disinformation and catchy (though false) statements on t-shirts.

These data are from 2008-2010 and at some point I will update these analyses. But, while downloading, parsing and analyzing NYC district school data is relatively straightforward, it  remains a more burdensome task to get a complete picture of charter school financing in NYC and most other locations for that matter (searching through non-profit filings, etc.). That, in and of itself, raises serious accountability concerns [see the extent of footnotes needed in the above report to clarify our various concerns over clarity, completeness, accuracy and precision of charter school financial reporting].

Another important note is that conditions in district schools in New York City have continued to decline… with larger and larger class sizes each year… and persistent underfunding of the state school finance system. Thus, it is quite possible that the class size and other advantages charters held over district schools between 2008 and 2010 are much greater now.

So, what do we know about NYC charter schools? [and to be clear, this is an NYC specific issue… which, if you read the above report and paper… plays out differently, for example, in Ohio and Texas]

First, NYC charter schools have historically served much less needy student populations than their same grade level district school counterparts in the same borough of the city:

Slide1

Second, New York City charter schools in many cases spend far more on a per pupil basis than do district schools serving similar student populations, at the same grade level in the same borough.

Slide2

Third, class sizes at the elementary and middle school level tend, on average, to be smaller – and in many cases much smaller (5 to 10 students per class smaller) – in charter schools than in district schools.

Slide3

Fourth, even with these resource advantages, New York City charter schools show very mixed performance outcomes compared to same grade level district schools serving similar student populations in the same borough.

Slide4

My intent here is not to argue whether the city should or should not charge $2,700 per pupil rent to charters. Clearly, the effect of such a policy would fall disparately across charter school operators, where some are far more advantaged than others. But it is important that we not simply accept the rhetoric of the pauvre, pauvre charter school that faces such awful mistreatment under possible city policy changes.

The big issue here – the overarching issue – regards the extent of inequities in access to resources that persist across the city system. Inequities exist across district schools by neighborhood and students served.

An important finding in the figures above is that huge inequities persist within the charter sector – a sector that has been selectively advantaged by the current administration’s policies over the past decade.

Whoever becomes the next Mayor of NYC must consider how the whole system fits together and how that system can generate the best distribution of opportunities for all children.

Equity is a necessary concern and one that is not resolved by providing, endorsing or expanding choices among inequitable alternatives.

Maps of NYC Charter and District School % Free Lunch 2010-11 (NCES Common Core of Data)

Mapping NYC Charter Schools

harlem charters

Bronx Charters

Bkln charters

Paying Economists by Hair Color? Thoughts on Masters Degrees & Teacher Compensation

In previous posts, I've conveyed my distaste for the often obsessively narrow thinking of the traditional labor economist when engaged in education policy research. I've picked on the assumption that greed and personal interest are necessarily the sole driving forces of all rational human decision making. And I've picked on the obsession with narrow and circular validity tests. Yet still, I sometimes see quotes from researchers I otherwise generally respect that completely blow my mind.

I gotta say, this quote from Tom Kane of Harvard regarding compensation for teachers holding masters degrees is right up there with the worst of them – most notably because it conveys such an obscenely narrow perspective of compensation policies (public or private sector) and broader complexities of labor market dynamics.

The quote comes to us from the Wall Street Journal the other day:

“Paying teachers on the basis of master’s degrees is equivalent to paying them based on hair color,” said Thomas J. Kane, an economist at the Harvard Graduate School of Education and director for the Center for Education Policy Research. Mr. Kane said decades of research has shown that teachers holding master’s degrees are no more effective at raising student achievement than those with only bachelor’s, except in math. Researchers have also shown that teachers with advanced degrees in science benefit students. (Wall St. Journal)

http://online.wsj.com/article/SB10001424052702304795804579101723505111670.html

The broad implication of Kane's quote is that immediate, measurable differentials in teacher compensation should only ever be directly associated with characteristics, indicators or behaviors of teachers that can be directly associated with differences in measured student test scores from time 1 to time 2. Here… and NOW! That's it. Any and all pay differentials MUST be associated with estimated test score gains in reading and math! The extension of this logic is that if there exists no statistically estimated relationship between a teacher holding a masters degree in X, Y or Z and their students' test score gains in that year or the next (perhaps), then no compensation should exist for this characteristic. Apparently, the same would apply to the teacher with an additional year of experience who could not show a marginal gain in their students' test scores over what they had previously achieved with less experience.

There are two gaping holes in this logic (setting aside the huge questions of the validity of the test score outcome metric as most important in defining student success and attributing that success to the teacher).

First, the research on “masters” degrees in education has pretty much addressed questions related to whether holding something ambiguously classified as a masters degree is positively associated with test score gains. In fact, these studies have found that holding content area masters degrees in math is associated with gains in math achievement.[1]

BUT, that research has not to my knowledge asked the broader labor market question regarding whether school districts offering additional pay for holding or pursuing masters degrees achieve a recruitment advantage on the labor market for teachers, or any other benefits to their workforce and children they serve. That’s a totally different question and one that requires being able to think beyond the laughably narrow mindset that the ONLY benefit that can ever matter and should ever warrant additional public expenditure is that which contributes directly and immediately to test score gains (of the students with the specific teacher being compensated for their masters degree).

Quite honestly, it’s this same utterly ridiculous thinking that plagues Kane’s Measures of Effective Teaching studies for the Gates Foundation – the assumption pervasive at every step of the project that the one and only valid indicator of teacher quality is value-added itself and all other measures should be evaluated by their ability to predict value added. Because a teacher’s prior year value added is the best predictor of current year value added, and a better predictor than observations, student surveys, etc., therefore value-added is the best measure of teacher effectiveness!

Second, both from a practical perspective and with potential broader labor market implications, there are many, many reasons why a local public school district or private school… or other business entity for that matter might wish to provide additional compensation for their employees who chose to advance their education, either related specifically to their current job title and responsibilities, or not!

It may be entirely reasonable for local public school districts to provide additional compensation for teachers seeking graduate credentials that expand their possible involvement in district or school activities, such as achieving additional training to work with special needs populations, or additional content certifications, or for that matter additional training to engage in all of the new teacher observations Kane and others now seem to think are necessary for getting rid of bad teachers (even though his own work on MET did not support his own conclusion in this regard). That is, you might want to have the salary differential available for the utility player.

It may also be an entirely reasonable approach for school districts to view providing additional compensation for furthering one’s education as a useful tool for retaining teachers – especially those who themselves show interest in expanding their own knowledge/learning.

In both this, and the previous case, the additional degrees or credentials obtained may actually have no direct relationship to the current primary responsibilities of the teacher. Does that mean they are entirely useless? That they should not be in any way associated with differentiated compensation not only until they are used, but until they are used in such a way that we can estimate that the additional credential has led to test score gains?

That’s just freakin’ asinine.

And this reductionist thinking really needs to stop.

A few other points are in order here. As I've shown in previous posts, the pursuit of the education masters degree takes many forms and has drifted over the years. See the following figures.

Slide4

Slide5

Indeed, more creative thinking about how and when we choose to compensate graduate degrees through salary differentials is important to consider. But it would be utterly foolish to consider only immediate contribution to student test score gain as the single valid metric for making this decision.

Also, there already exists some variation in the ways in which masters degrees tend to be compensated across local public school districts. On average, I have found in studying teacher wage data in large diverse metropolitan areas that it is in fact the more affluent suburban districts that a) tend to have larger shares of teachers holding masters degrees and b) tend to provide a bigger bump in salary associated with masters degrees. Here are the New Jersey figures from a few years back.

Slide2

Slide1

And in the Chicago metro area, based on prior work, a teacher in a majority minority (student population) district is only 69% as likely as a teacher in a non-majority minority district (in the same labor market and holding other teacher characteristics constant) to hold a master’s degree and a teacher in a district that is 100% minority is only 60% as likely to hold a master’s degree as a teacher in a district that is 0% minority.

This figure displays the salary differentials by degree level and experience:

Slide3

Teachers in majority minority districts are much less likely to hold a master's degree than teachers in other districts in the same labor market. On average, a teacher in a majority minority district – at the same degree level and experience as a teacher in a non-majority minority district – is making about $2,000 less in annual salary. Because a master's degree "bump" in salary is worth an average of about $8,500 ($8,481) in annual salary, large shares of teachers in majority minority schools are earning over $10,000 less than teachers at comparable experience levels in non-majority minority districts in the same labor market.
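The arithmetic behind that "over $10,000" figure can be laid out explicitly, using the numbers cited in the paragraph above:

```python
# The compounding described above, using the figures from the text.
base_gap = 2_000      # salary gap at the same degree level and experience
masters_bump = 8_481  # average master's degree salary differential

# A teacher without the degree in a majority minority district, compared
# with a comparable-experience degree-holder in a non-majority minority
# district in the same labor market:
combined_gap = base_gap + masters_bump
print(f"combined gap: ${combined_gap:,}")  # $10,481
```

That is, the base gap and the foregone degree bump stack, which is how "about $2,000 less" becomes "over $10,000 less" for the large share of teachers without the degree.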

So, in other words, if masters degrees are such an obscene inefficiency in our public education system, they are an inefficiency found most among the affluent suburban districts and less so in the major urban districts (it is actually common to find that less financially constrained districts spend less efficiently – at least in terms of the direct relationship between spending and measured outcomes).

Meanwhile, the policies endorsed by Kane (via the MET project) are at least implicitly far more focused on fixing the supposed inefficiencies of our major urban education systems.

Perhaps, just perhaps, if those suburban districts next door did not pay such large differentials for and recruit so aggressively the teachers with masters degrees, their urban neighbors might have a chance to recruit some of those same teachers. But in the current environment, pushing urban districts to remove any and all compensating differentials related to anything not tied directly and immediately to test score gains will undoubtedly do far more harm than good.


[1] Goldhaber, D. D., & Brewer, D. J. (1998). When should we reward degrees for teachers?. The Phi Delta Kappan, 80(2), 134-138.

 

Other articles related to teacher wages and quality

Richard J. Murnane and Randall Olsen (1989) The effects of salaries and opportunity costs on length of stay in teaching: Evidence from Michigan. Review of Economics and Statistics 71(2), 347-352.

David N. Figlio (1997) Teacher Salaries and Teacher Quality. Economics Letters 55, 267-271.

David N. Figlio (2002) Can Public Schools Buy Better-Qualified Teachers? Industrial and Labor Relations Review 55, 686-699.

Ronald Ferguson (1991) Paying for Public Education: New Evidence on How and Why Money Matters. Harvard Journal on Legislation 28(2), 465-498.

Susanna Loeb and Marianne Page (2000) Examining the link between teacher wages and student outcomes: the importance of alternative labor market opportunities and non-pecuniary variation. Review of Economics and Statistics 82, 393-408.

Susanna Loeb and Marianne Page (1998) Examining the link between wages and quality in the teacher workforce. Department of Economics, University of California, Davis.

Figlio, D.N., Rueben, K. (2001) Tax Limits and the Qualifications of New Teachers. Journal of Public Economics. April, 49-71

Ondrich, J., Pas, E., Yinger, J. (2008) The Determinants of Teacher Attrition in Upstate New York. Public Finance Review 36 (1) 112-144

Philadelphia Graph of the Day

I just can’t drop the Philly issue, because of the complete absurdity of the reformy rhetoric about Philly schools and persistent willful ignorance regarding the role of equitable and adequate funding for Philly schools and the Commonwealth’s failure to provide any reasonable level of support.

For what it’s worth – and I’ve spent a great deal of time critiquing this and similar studies – the Commonwealth in the mid-2000s took on the task of determining the “costs” per pupil of what Pennsylvania school districts needed to get the job done. This cost analysis was then used to guide development of a new formula intended to drive appropriate levels of state aid to districts facing substantive gaps between current spending (2006-07) at the time, and cost estimates developed under state supervision, by independent consultants.

 [critique of these & related methods can be found here]

At the time, state officials found that districts including Philadelphia, Allentown and Reading faced funding (relative to cost) gaps between $4,000 and $6,000+ per pupil.  So, in rather bold style, they adopted a new school finance formula with the intent to phase districts toward their adequacy targets.  Then the economy tanked, and a new era of political attacks on state school finance formulas followed (as much a Cuomo/NY issue as a Corbett one!).

So, where are Pennsylvania school districts now, with 2013-14 funding (July estimates, holding local effort constant), when compared to the 2006-07 funding gaps? That is, have Pennsylvania districts come any closer in the seven years since to the targets that were estimated for them before it all came crumbling down?

Simple answer? No!

Philly Adequacy Gap

That is, Philly remains more than $4,000 per pupil (by this quick-and-dirty analysis) below the funding target that was estimated for it nearly a decade ago. [BEF = Basic Education Funding]

BEF 2013-14: http://www.portal.state.pa.us/portal/server.pt/community/education_budget/8699/SAEBG/539259

Endangering Intelligent Conversation: Comments on the Latest Hanushekian Crisis Manifesto

I had the displeasure of coming across this completely ridiculous and deceitful video the other day:

http://www.youtube.com/watch?v=e8aeEr2qk9s

Which was created to promote Eric Hanushek's latest U.S. education crisis manifesto. Around the 4:12 mark, the video jumps from crisis mode to policy solution mode, telling us that, among U.S. states, Florida is a model for the way forward, and that states like Wyoming and New York provide proof positive that money really has nothing to do with helping schools. That money doesn't matter is a critical underpinning of nearly every reformy rant.

Here’s the complete story to the contrary.

Now, most of what's here has been summarized previously by Hanushek, and I have discussed this material before on this blog.

This bizarre video got me thinking about a series of previous posts in which I've looked across numerous indicators to try to tease out the relationships among them across states. I've selectively scoured scatterplots of relationships between various state-level indicators and outcome measures, but have not, for a while now, simply stepped back and evaluated the correlations across all of them and then tried to tease out which states, if any, really do stand out.

Let’s start with the indicators and their sources.

School Funding Fairness: First up in my state level data set are a series of indicators from http://schoolfundingfairness.org/

These indicators characterize the level of funding and effort in state school finance systems.

  • Funding Level (predicted at 10% Poverty)
  • Funding Effort (state and local revenue as a share of gross state product)
  • Coverage (% of 6 to 16 year olds in public school system)
  • Early Childhood Enrollment (% of 3 & 4yr olds enrolled in some form of school)

Union Strength: Second are the Thomas B. Fordham Institute rankings of state level union strength. Here, a rank of 1 means a state with strong unions, and a rank of 50 would be a state with a weak union role.  Thus, from a measurement standpoint, one might describe it as “union weakness” – that is, the higher the assigned value (thus lower ranking) the weaker the unions in that state. http://www.edexcellencemedia.net/publications/2012/20121029-How-Strong-Are-US-Teacher-Unions/20121029-Union-Strength-Full-Report.pdf

Policy Context Reformyness: Third, I have the grade point averages assigned to states by Students First in their state report cards. I use their overall GPA, their GPA for teacher policies, and their GPA for "parent power." http://reportcard.studentsfirst.org/ In brief, Students First supports removal of seniority privileges, test-based teacher evaluation, and mutual consent teacher assignment policies, folding their preferences for these policies into their teacher GPA. Regarding parent power, they support such cockamamie schemes as the parent trigger, along with more common school choice alternatives such as charter school expansion and tuition tax credits.

Teacher Wage Competitiveness: Here, I rely on the Economic Policy Institute’s measure of the Teaching Penalty, which is the average weekly wage of teachers compared to non-teachers for each state. http://www.epi.org/publication/the_teaching_penalty_an_update_through_2010/

Harvard PEPG/Hanushek Catching Up Outcome Measures: Finally, along with mean scale scores of the National Assessment of Educational Progress, I also use some of Eric Hanushek’s own measures of student outcomes – and corrected versions of those measures – in order to track which of the above policies seem most correlated with various outcome measures – in the appropriate direction that is! http://www.hks.harvard.edu/pepg/PDF/Papers/PEPG12-03_CatchingUp.pdf

Now, in the book of “reformy”, there are some well understood truths.

First, that school choice programs, no matter what, no matter how structured, necessarily lead to an improved system for all. Choice lifts all boats. It necessarily induces innovation, thus quality, and the pressures of innovative quality force the stagnant public system to step up or collapse (that is, unless political leaders have already crafted a scheme to forcibly close traditional public schools, creating a false demand for alternatives, and then publicize that false demand as real… and well… you know).

Second, teacher wages are completely unimportant, largely because teachers are already paid way too freakin' much, and as a result they are all just complacent, lazy and greedy, waiting on those fat pensions they stand to collect after they ride out their time. In fact, pure reformy ideology declares that the best way to improve teacher quality is to cut those wages for most teachers and perhaps, based on the luck of the roll of the test score dice, grant a few bonuses here and there to the truly "great" teachers.

Spending more money on stuff like expanding early childhood education is just wasteful expansion of the existing bureaucracy, having no persistent positive gains for children down the line.

States that spend a lot on schools are really just wasting their money, and getting nothing for it (see the above video… which pretty much says this straight up! Re: Wyoming and New York).

Thus… we must look to the models… like Florida… or Louisiana… or perhaps even Arizona?

So I was wondering….

So, I was wondering, if I took all of the above indicators, and first evaluated the correlations among all of them, and then evaluated a few scatterplots of what appear to be among the more consistent correlations, what would I conclude? Clearly, by my snark above, along with a lot of the other content you’ve probably read on this blog, I already have an opinion in this regard… but let’s start with a look at the correlations. Do they really tell us how totally freakin’ awesome reformyness is? And how totally freakin’ pointless it is to consider silly stuff… like money… and paying teachers well? And how completely freakin’ destructive unions are to quality education systems?
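For anyone who wants to replicate this sort of exercise, a correlation matrix across state-level indicators takes only a few lines. Here’s a minimal sketch in Python using numpy – the indicator names and values are my own illustrative placeholders, not the actual dataset:

```python
import numpy as np

# Hypothetical state-level indicators (one row per state, illustrative values only)
rng = np.random.default_rng(0)
n_states = 50
revenue = rng.normal(12000, 3000, n_states)                # state & local revenue per pupil
union_rank = rng.uniform(1, 51, n_states)                  # higher rank number = weaker union
naep = 240 + 0.002 * revenue + rng.normal(0, 5, n_states)  # mean NAEP scale score (toy relation)

# Stack the indicators and compute the pairwise Pearson correlation matrix
indicators = np.vstack([revenue, union_rank, naep])
corr = np.corrcoef(indicators)
print(np.round(corr, 2))  # 3x3 symmetric matrix with 1s on the diagonal
```

The real exercise simply does this with many more columns, then conditionally formats the result in Excel.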

Here’s the correlation matrix… highlighted using a standard Excel conditional formatting feature.

Table 1. Correlations Across Indicators


And here’s a bullet point summary of the correlations.

  • States with weaker unions (a higher number in the ranking means lower union strength) have systematically lower state and local revenue per pupil and less competitive teacher wages.
  • States with weaker unions have systematically lower average NAEP scores.
  • States with higher reformy grade point averages according to Students First have lower shares of children in the public school system, and have lower average NAEP scores.
  • Average NAEP scores are most positively associated with state and local revenue and teacher wage competitiveness.
  • Standardized NAEP gains over time are most positively associated with shares of 3 and 4 year olds enrolled in school programs/pre-school.
  • Standardized NAEP gains are also positively associated with Students First grade point averages. But standardized NAEP gains are pretty strongly related to starting point. That is, states showing greater gains are generally those that started lower.

Figure 1. Gains Depend on Starting Point


  • Standardized NAEP gains, adjusted for starting point, are positively associated with enrollment of 3 and 4 year olds and with state and local revenue per pupil.
  • Adjusted standardized NAEP gains are only very weakly associated with Students First grade point averages.
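For clarity, “adjusted for starting point” here means, roughly, taking the residual from a regression of each state’s standardized gain on its starting score – what’s left of the gain after accounting for where the state began. A rough sketch of that adjustment, with made-up numbers rather than the actual NAEP data:

```python
import numpy as np

rng = np.random.default_rng(1)
start_1990 = rng.normal(255, 8, 50)                       # starting NAEP scores (toy data)
gain = -0.5 * (start_1990 - 255) + rng.normal(0, 2, 50)   # lower starters tend to gain more

# Standardize the raw gains (z-scores)
z_gain = (gain - gain.mean()) / gain.std()

# Regress standardized gain on starting point; the residual is the "adjusted" gain
slope, intercept = np.polyfit(start_1990, z_gain, 1)
adjusted_gain = z_gain - (intercept + slope * start_1990)

# By construction, adjusted gains are uncorrelated with the starting point
print(round(float(np.corrcoef(start_1990, adjusted_gain)[0, 1]), 6))
```

The point of the adjustment is simply to stop rewarding states for having started in the basement.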

So, who are the real standouts?

Okay… okay… but those correlations just suggest that states with higher spending and more kids in early childhood education seem to be doing better and gaining more over time.  The correlations also, in the most generous case, suggest that the most reformy policies have been adopted in states that do and have historically done very poorly on outcome measures and that states with reformier policies aren’t necessarily outpacing those without (when the performance measures are adjusted for starting point).

Now… one might say… we must give these policies time… we’ve only just begun. To which I say: many of the underlying policy conditions in these states, but for more recent changes to teacher evaluation policies under Race to the Top, have actually been in place for a decade or so. Reformyness IS THE STATUS QUO in many of these low-performing and gain-lagging states!

Figure 2. State & Local Revenue and NAEP Mean Scale Score


State and local revenues remain positively associated with NAEP mean scale scores, but this is indeed a case, in part, of those who have versus those who don’t. Here, Massachusetts is lookin’ pretty good, with New Jersey squashed below it, just above New Hampshire.  Minnesota’s not lookin’ too bad. Florida is squashed in the middle of the pack among the low spenders.

Figure 3. State and Local Revenue and Adjusted Gains

Interestingly, state and local revenues are also loosely (r = .228) associated with adjusted (by starting point) standardized gains on NAEP from 1990 to present (whereas reformier policy preferences were less correlated, or not correlated at all, with adjusted standardized gains).

Now, a point not to be overlooked here is that New Jersey is actually further above the “expected” value, given its starting point, than Florida. But reformy types HATE New Jersey because it doesn’t conform to their preferences, just like they hate Maryland and tolerate, at best, Massachusetts. New Jersey spends a lot, has a very low percentage of kids in charter schools and has relatively strong unions.

Thus, the emphasis on Florida as the obvious (really?) standout – the model for all! Some pretty massive, freakin’ deceptive cherry-picking there if you ask me. Missing from this graph is Massachusetts. Amazingly, there’s no mention of New Jersey or Massachusetts in the goofy video above.

What I found most intriguing in this whole exercise was the relative strength of early childhood enrollments both with respect to NAEP mean scale score levels and with respect to the change in the percent of children scoring below basic.

Indeed, the first graph below also reflects some of the have/have not relationship. States like New Jersey and Massachusetts have higher income, more educated families that even without publicly financed pre-k programs would likely enroll their youngsters at a higher rate than parents in much lower income states.  These are the strongest relationships in the matrix above… and early childhood enrollments are also most positively correlated with changes in shares of children scoring low on NAEP and most positively correlated with corrected standardized gains on NAEP.

Figure 4. 3 & 4 Year Old Enrollment and NAEP Mean Scale Score


Figure 5. 3 & 4 Year Old Enrollment and Reduction in % Below Basic

So, maybe it’s just me, but if anything, the correlation matrix above suggests that states that are spending on schools seem both to be doing okay, and to be improving over time. Standouts on gains include Maryland and New Jersey (and Delaware). Florida doesn’t strike me as the big standout here, though they do have high NAEP change over time. But others, including New Jersey, have higher adjusted NAEP gain and a much higher reduction in the percent below basic. And while New York and Wyoming raise some questions… some of these are easily disposed of, with Wyoming being among the most sparsely populated states in the nation, for example, and New York being home to the largest city in the nation, embedded in the highest-cost labor market in the nation.

As I’ve explained in previous posts… there’s a whole lot going on behind any simple scatterplot like these. They don’t tell of complex underlying causal relationships. They don’t really point us to those perfect models to follow. But they sure can be illustrative, and they raise some important questions about the BS constantly hurled at us, increasingly in cleverly produced YouTube format.

=====

Addendum: This paper was recently tweeted as providing proof that the presence of strong teachers unions in states creates a substantial drag on student performance gains. I’m actually quite shocked that such a methodologically goofy paper was published in this journal, which tends to be quite reasonable. First and foremost, the outcome measure – achievement growth over time – is created using states’ own assessments, taking the difference in proficiency rates between 8th and 4th graders [with an unsatisfying “correction” for differences in test difficulty] in the same year [not even real cohort change]. This is problematic on two levels. First, differencing proficiency rates is a junk analysis to begin with, given policy shifts and other changes in state assessments and cut scores over time, not to mention the massive information loss that occurs when we look only at the numbers of kids shifting over a particular bar (yes… my Figure 5 above suffers this same problem). No, these differences cannot be corrected by the simple regression used in the study. Second, state assessments, rigor of items and cut scores differ so vastly that the idea of comparing proficiency rate changes across states is utterly ridiculous. No, these differences cannot be corrected by the simple regression used in the study either. Finally, the explanatory variables/covariates in the models are a relatively simple collection of measures for which entirely unsatisfying justification is provided. But that doesn’t matter so much when the dependent measure is complete crap.
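To see concretely why differencing proficiency rates is so sensitive to where the bar sits, consider a toy simulation (assumptions entirely mine, not the study’s data): one score distribution, every student growing by exactly the same amount, evaluated against two different cut scores – and the apparent “gain” in percent proficient differs substantially depending on the cut.

```python
import numpy as np

rng = np.random.default_rng(2)

# One population of students with identical, uniform true growth
grade4 = rng.normal(250, 10, 100_000)
grade8 = grade4 + 5          # every single student grows exactly 5 points

def pct_proficient(scores, cut):
    """Percent of students at or above the proficiency cut score."""
    return (scores >= cut).mean() * 100

# A cut near the middle of the distribution vs. a cut out in the tail
for cut in (250, 265):
    gain = pct_proficient(grade8, cut) - pct_proficient(grade4, cut)
    print(f"cut={cut}: apparent gain in % proficient = {gain:.1f} points")
```

Same students, same growth; the only thing that changed is the cut score. That is precisely why comparing proficiency-rate changes across states that set different bars tells you almost nothing about relative growth.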