More Detail on the Problems of Rating Ed Schools by Teachers’ Students’ Outcomes

In my previous post, I explained that the new push to rate schools of education by the student outcome gains of teachers who graduated from those schools is a problematic endeavor… one unlikely to yield particularly useful information, and one that may create the wrong incentives for education schools. To reiterate, I laid out three reasons (and there are likely many more) why this approach is so problematic. Here, I break them out a bit further, into four:

  1. parsing out individual teachers’ academic backgrounds – that is, if teachers hold credentials and degrees from many institutions, which institution is primarily responsible for their effectiveness?
  2. the teacher workforce in most states includes a mix of teachers from a multitude of in-state and out-of-state institutions, public and private, with many of those institutions having only a handful of teachers in some states. States will not be able to evaluate all pipelines reliably. Does this mean that states should just cut off teachers from other states, or from institutions that don’t produce enough of their teachers to generate a reliable estimate of the effectiveness of those teachers?
  3. because of the vast differences in state testing systems, and differences in the biases in those testing systems toward either higher or lower ability student populations (floor and ceiling effects), graduates of a given teaching college who might, for example, flock to affluent suburban districts on either side of a state line might find themselves falling systematically at opposite ends of the effectiveness ratings. The differences may have little or nothing to do with actually being better or worse at delivering one state’s curriculum versus another, and may instead have everything to do with the ways in which the underlying scales of the tests lead to bias in teacher effectiveness ratings. We already know from research on value-added estimates that the same teacher may receive very different ratings on different tests, even in the same basic content area (math).
  4. and to me, this is still the big one, that graduates of teaching programs are simply not distributed randomly across workplaces. This problem would be less severe perhaps if they were distributed in sufficient numbers across various labor markets in a state, where local sample sizes would be sufficient for within labor market analysis across all institutions. But teacher labor markets tend to be highly local, or regional within large states.
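The sample-size problem in point 2 can be made concrete with a toy simulation (all numbers invented, not any state’s actual model): if we repeatedly estimate the same truly-average program’s effect from the noisy ratings of its teachers, the spread of those estimates shrinks roughly with the square root of the number of teachers observed, so a program with a handful of in-state teachers simply cannot be rated reliably.

```python
# Toy illustration of why small per-institution samples yield unreliable
# estimates. TRUE_PROGRAM_EFFECT and TEACHER_SD are invented for illustration.
import random
import statistics

random.seed(1)

TRUE_PROGRAM_EFFECT = 0.0  # assume the program is exactly average
TEACHER_SD = 0.25          # spread of individual teacher-effect estimates

def estimated_program_effect(n_teachers: int) -> float:
    """Average of n noisy teacher-effect estimates from one institution."""
    return statistics.mean(
        random.gauss(TRUE_PROGRAM_EFFECT, TEACHER_SD) for _ in range(n_teachers)
    )

# Re-estimate the same truly-average program many times at two pipeline sizes.
small = [estimated_program_effect(5) for _ in range(2000)]    # a "handful" of teachers
large = [estimated_program_effect(200) for _ in range(2000)]  # a big in-state pipeline

print(f"spread of estimates with n=5:   {statistics.stdev(small):.3f}")
print(f"spread of estimates with n=200: {statistics.stdev(large):.3f}")
```

With only five teachers, a truly average program will routinely look notably good or notably bad by chance alone.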

I showed previously how the rates of children qualifying for free or reduced price lunch vary significantly across the schools employing graduates of Kansas teacher preparation programs:

Racial composition varies as well:

But perhaps most importantly, the above two charts are merely indicative of the fact that the overall geographic distribution of teacher prep program graduates varies widely. Some are in low-income remote rural settings, with very small class sizes, while others are near the urban core of Kansas City, either in sprawling low poverty suburbs or in the very poor, relatively population dense inner urban fringe. Making legitimate comparisons of the relative effectiveness of teachers across these widely varied settings is a formidable task for even the most refined value-added model, and even that may be too optimistic.

Here’s the geographic distribution of teacher graduates of the major public teacher preparation institutions in Kansas:

The Kansas City suburbs in this figure are covered in red (KU), purple (K-State) and orange (Emporia State) dots, along with a significant number of blue ones (Pitt State). Western Kansas is dominated by green dots (Fort Hays State) and southeast Kansas by blue ones (Pitt State). Wichita is dominated by black dots (Wichita State). Nearly all of these clusters are local/regional, surrounding the locations of the universities. Certainly, much of the distribution is also driven by demand for teachers, where the greatest growth has been in the Kansas City suburbs to the south and west (out toward Lawrence, home to KU).

Here it is peeled back. First KU:

Next K-State:

Wichita State:

Fort Hays State:

Pittsburg State:

Emporia State:

Even if we assume that value added models could be an effective tool for a) rating teacher effectiveness and b) aggregating that teacher effectiveness to their preparation institutions, it is a stretch to assume that we could find any reasonable way to reliably and validly compare the effectiveness of the graduates of these public institutions, given that they are clustered in such vastly different educational settings – with widely varied resource levels, widely varied class sizes, kids who sit on buses for widely varied amounts of time, widely varied poverty levels, immigration patterns and numerous other factors (it’s that other “unobservable” stuff that really complicates things!). The only reasonable statistical solution would be to have  graduates of Kansas teacher preparation programs randomly assigned to Kansas schools upon graduation.

As I noted in my previous post, I’m not entirely opposed to exploring our ability to generate useful information by testing statistical models of teacher effectiveness aggregated in this way (to preparation institutions or pipelines). It is certainly more reasonable to use this information in the aggregate for “program evaluation” purposes than for rating individual teachers. But even then, I remain skeptical that these data will be of any particular use either for state agencies in determining which institutions should or should not be producing teachers, or for the institutions themselves. It is a massive leap, for example, to assume that a teacher preparation institution might be able to look at the value-added ratings based on the performance of students of their graduates, and infer anything from those ratings about the programs and courses their graduates took as they pursued their undergraduate (or graduate) degrees. Though again, I’m not opposed to seeing what, if anything, one can learn in this regard.

What would be particularly irresponsible – and what is actually being recommended – is to accept this information as necessarily valid and reliable (which it is highly unlikely to be) and to mandate the use of this information as a substantial component of high stakes decisions about institutional accreditation.

Ed Next’s triple-normative leap! Does the “Global Report Card” tell us anything?

Imagine trying to determine international rankings for tennis players or soccer teams entirely by a) determining how they rank relative to the average team or player in their country, then b) having only the average team or player from each country play each other in a tournament, then c) estimating how the top teams would rank when compared with each other based only on how their country’s average teams did when they played each other and how much better we think the individual teams or players are when compared to the average team or player in their country? Probably not that precise or even accurate, ya’ think?

Jay Greene and Josh McGee have produced a nifty new report and search tool that allows the average American Joe and Jane to see how their child’s local public school districts would stack up if one were to magically transport their district to Singapore or Finland.

 http://globalreportcard.org/

Even better, this nifty tool can be used by local newspapers to spread outrage throughout suburban communities everywhere across this mediocre land of ours.

To accomplish this mystical transportation, Greene and McGee rely on wizardry not often employed in credible empirical analysis: The Triple Normative Leap. Technically, it’s two leaps, across three norms. That is, the researcher-acrobat jumps from one normalized measure based on one underlying test, to another, and then to yet another (okay, actually to 50 others!). This is impressive, since the double-normative leap is tricky enough and has often resulted in severe injury.

To their credit, the authors provide pretty clear explanations of the triple-normative leap and how it is used to compare the performance of schools in Scarsdale, NY to kids in Finland without ever making those kids sit down and take an assessment that is comparable in any regard.

For example, the average student in Scarsdale School District in Westchester County, New York scored nearly one standard deviation above the mean for New York on the state’s math exam. The average student in New York scored six hundredths of a standard deviation above the national average of the NAEP exam given in the same year, and the average student in the United States scored about as far in the negative direction (-.055) from the international average on PISA. Our final index score for Scarsdale in 2007 is equal to the sum of the district, state, and national estimates (1+.06+ -.055 = 1.055). Since the final index score is expired in standard deviation units, it can easily be converted to a percentile for easy interpretation. In our example, Scarsdale would rank at the seventy seventh percentile internationally in math.

Note: Addition and spelling errors in Jay Greene’s original web-based materials: http://globalreportcard.org/about.html
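To make the mechanics concrete, here is a minimal Python sketch of the index arithmetic as the quoted passage describes it: add the three z-scores (district within state, state within nation, nation within world), then convert the total to a percentile via the standard normal CDF. The inputs are the values quoted above; note that they actually sum to 1.005, consistent with the addition error flagged in the note.

```python
# Sketch of the Global Report Card index arithmetic as described in the
# quoted passage. Input values are those quoted for Scarsdale (2007, math).
from statistics import NormalDist

district_vs_state = 1.0    # Scarsdale vs. New York mean (SD units)
state_vs_nation = 0.06     # New York vs. U.S. mean on NAEP
nation_vs_world = -0.055   # U.S. vs. international mean on PISA

index = district_vs_state + state_vs_nation + nation_vs_world
percentile = NormalDist().cdf(index) * 100  # standard normal CDF -> percentile

print(f"index = {index:.3f}")  # the quoted text reports 1.055
print(f"implied international percentile = {percentile:.1f}")
```

Running this makes the quoted arithmetic slip visible: the three terms sum to 1.005, not 1.055.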

Now, Greene and McGee do recognize the potential limitations of making this leap across non-comparable assessments, with potentially non-comparable distributions. In their technical appendix, which few other than geeky stat guys like me will ever read, they explain:

In order to construct the Global Report Card we combine testing information at three separate levels of aggregation: state, national, and international. At each level we use the available testing information to estimate the distribution of student achievement. To allow for direct comparisons across state and national borders, and thus testing instruments, we map all testing data to the standard normal curve.

We must make two assumptions for our methodology to yield valid results. First, mapping to the standard normal requires us to make the assumption that the distribution of student achievement on each of the testing instruments is approximately normal at each level of aggregation (i.e. district, state, national). Second, to compare the distribution of student achievement across testing instruments we assume that standard deviation units are relatively similar across the 2 testing instruments and across time. In other words we assume that being a certain distance from mean student performance in Arkansas is similar to being the same distance from mean student performance in Massachusetts.

http://globalreportcard.org/docs/AboutTheIndex/Global-Report-Card-Technical-Appendix-8-30-11.pdf

So, they appropriately lay out the important assumptions: that to actually rate individual districts in the U.S. against international standards, based on relative position to a) other districts in their state, b) their state relative to the entire U.S., and then c) the entire U.S. relative to other countries, one must have a reasonable expectation that the distributions at each level a) are normal and b) have similar ranges. The range piece is key here because the spread of scores at any level dictates how many points a district can gain or lose when making each leap. Again, they appropriately lay out these potential concerns. And then, true to form, they ignore them entirely. They don’t even test whether these assumptions hold.

The way I see it, if you’re going to point out a limitation and completely ignore it, you should at least point it out in the body of the report, not the appendix.

Setting aside that little concern for now, here’s how it all works. Walking backwards through their analysis, each U.S. district starts with penalty points based on the U.S. mean on PISA compared to the international mean. That is, every district in the U.S. is given a penalty (-.055) partly because of the legitimately low performance of large numbers of U.S. students in states that have thrown their public education systems under the bus, including Arizona, Colorado… but more strikingly, Louisiana and the deep south.

Now, a high performing state might then be able to offset their national penalty by outperforming U.S. norms… but only to the extent that NAEP has a wide enough distribution to allow a high performer to gain enough points back to make up that ground. If NAEP has a narrower range than the PISA distribution, even if you rock on NAEP, you can’t gain back the ground lost. In theory, this might even make some sense, but it would depend on the truth of the report’s key assumptions, which (as noted) are never tested.
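The role of spread can be illustrated with a toy calculation (all numbers invented): the same district, with the same raw-point advantage over its state mean, lands at very different z-scores, and thus very different “international” percentiles, depending only on how dispersed its state’s test scores happen to be.

```python
# Toy illustration of why the "similar spread" assumption matters.
# TRUE_ADVANTAGE and the two test SDs are invented; the NAEP term is
# ignored for simplicity, keeping only the quoted U.S.-vs-PISA penalty.
from statistics import NormalDist

TRUE_ADVANTAGE = 10.0    # district mean minus state mean, in raw score points
US_PISA_PENALTY = -0.055 # the national penalty quoted in the report

results = {}
for label, test_sd in [("wide-spread test", 20.0), ("narrow-spread test", 5.0)]:
    z = TRUE_ADVANTAGE / test_sd                       # within-state z-score
    pct = NormalDist().cdf(z + US_PISA_PENALTY) * 100  # crude index percentile
    results[label] = pct
    print(f"{label}: z = {z:.2f}, implied international percentile ~ {pct:.0f}")
```

The identical raw advantage is worth four times as many standard deviations on the compressed test, so the district’s apparent “international” standing swings by roughly thirty percentile points for reasons that have nothing to do with its performance.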

The next move in the triple-normative leap is the move to the wacky collection of state assessments and their widely varied scale score distributions. High-performing districts in a state like California (where the state’s mean NAEP score hands every district another sizable penalty from the start) are screwed. California high performers get a NAEP-based penalty on top of their U.S. average penalty, and have to make up that entire deficit with standard deviations on state assessments. They’ve got a lot of ground to make up in standard deviations from their own state mean on their state assessment (if it’s even possible).

Let’s take a look at some of the actual district level distributions of standardized mean scale scores on state assessments. Remember, Greene and McGee’s triple normative leap only works well to the extent that state assessments are a) normally distributed, b) of similar range, and c) not particularly skewed in one direction or the other.

Note that these graphs are of the normalized distributions of scale scores.

Here’s California

Here’s Ohio

And Here’s Indiana

Oh well, so much for that little assumption. Perhaps most importantly, these distributions show that whether your district has a reasonable likelihood of making up 1, 2 or 3 points in the last normative leap depends quite a bit on what state it is in.

Remember, every district loses over half a point from the start based on U.S. PISA performance. California districts actually appear to have greater opportunity to make up more ground on the last leap, because the spread of California normed scores on state assessments is wider. But, they’ll need it, since their state average performance on NAEP gets all districts in the state a large penalty.

Anyway, while it may be fun to play with Greene and McGee’s nifty web-based search tool, it really doesn’t give us much of a picture of how individual local public school districts in the U.S. stack up against foreign nations. It’s just too much of a stretch to assume that a district’s normative position on quirky state assessments, with non-normal distributions, can actually be translated with any precision to represent that district’s position within the performance distribution of schools in Finland or Singapore.

So, while it may be fun to play with the tool and see how different local public school districts compare, more or less to one another as they relate to other countries, it is totally inappropriate to make bold claims that any of these findings speak to the supposed “mediocrity” of the best public schools in the U.S. Many may appear mediocre when transported internationally for no reason other than the penalty points assessed to them in the first two normative leaps (national and state mean), neither of which has much to do with their own performance.

And these concerns ignore the fact that we are dealing with substantively different assessment content. See: http://nepc.colorado.edu/thinktank/review-us-math

Addendum:

McGee was kind enough to open a discussion on the topic below, and clarified (which is what I was assuming already) that:

“We assume that being a certain distance from mean student performance in Arkansas is relatively similar to being the same distance from mean student performance in Massachusetts.”

My response is that the spread or variance issue is critically important here, even, and especially when making this kind of assumption. It comes down to the reasons for the differences in spread (like the differences seen in the above histograms).

The variance in each state’s assessments across districts contains some variance that truly indicates differences in performance and some that indicates differences in tests. The problem is that we can’t tell which portion of the spread is “real” variation in performance across districts (driven largely by demographic differences) and which is a function of the different assessments – especially the different assessments across states. Some of the variance is clearly constrained by the underlying testing differences, and may also be upper or lower limit constrained.
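A quick simulation (with invented score scales) illustrates this identification problem: two states with identical true district-performance distributions, but different test score maps, produce very different observed spreads, and nothing in the observed scores alone tells you which spread reflects real differences and which reflects the test.

```python
# Sketch of the confounding described above. The score maps are invented:
# state A's test stretches district differences out; state B's test
# compresses the top (a crude ceiling effect). True quality is identical.
import random
import statistics

random.seed(7)

true_quality = [random.gauss(0, 1) for _ in range(500)]  # district "truth"

scores_a = [50 + 15 * q + random.gauss(0, 2) for q in true_quality]
scores_b = [50 + 5 * min(q, 1.0) + random.gauss(0, 2) for q in true_quality]

print(f"observed SD of district means, state A: {statistics.stdev(scores_a):.1f}")
print(f"observed SD of district means, state B: {statistics.stdev(scores_b):.1f}")
```

Both states contain exactly the same districts, yet one test makes them look far more dispersed than the other; standardizing each state to its own mean and SD then treats those unequal spreads as if they meant the same thing.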

Third Way’s “Revisionist Analysis” [Bold-faced lie!]

I know I said I’d stop addressing the Third Way report on Middle Class Schools, but I do have one more thing to point out. Third Way issued a memo in which it aggressively attacked my assertion that they had used district level data to characterize middle class schools. Again, this assertion was relevant to showing the absurdity of their classification scheme, but there were numerous other problems with the report.

My NEPC Review

My NEPC Response to Third Way Memo regarding Methods

Third Way claims my analyses to be “fatally flawed” because, as they assert in their follow-up memo, their analyses were actually at the school level and therefore did not, as the tables in my review suggest, include all schools in poor cities including Detroit, Philadelphia or Chicago. Allow me to point out that what I actually said in my review was:

That is, these large urban districts are counted in any Third Way district-level analyses as middle-class districts.

I was very clear in my review that the table of large cities pertained specifically to “district-level” analyses in the Third Way report. I further explained extensively the problems with their continued mixing of school, individual family and district units.

But here’s the kicker based on one last check of their original report and the follow-up memo. In the follow up memo, the authors include this footnote to explain their methods – focusing on how they collected school level data from the NCES Common Core (school level data that never actually show up in any form, any table, in their original report). Note the part in this footnote where they explain selecting “school” as the unit of analysis:

Footnote in Memo

http://content.thirdway.org/publications/446/Third_Way_Memo_-_A_Response_to_the_National_Education_Policy_Center_.pdf

Footnote #8 Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed September 22, 2011. Available at: http://nces.ed.gov/ccd/bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey,” “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the U.S. Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “School.” Then select next. On the following page, in the “select columns” drop down box, choose the “Students in Special Programs” option. Select the box next to “Total Free and Reduced Lunch Students.” Then in the drop down box, select “Contact Information” option. Then select the box next to “Location City.” Then go back to the “select columns” drop down box and select the “Enrollment by Grade” option.  Then select the box next to “11th Grade enrollment.”  Then go more time to the “select columns” drop down box, choose “Total enrollment.” Then select the box next to “Total students.” Then select next. On the next page, choose “Illinois.” Then click the “view table” option. Once the table is compiled, download the table into Excel.csv by clicking that option at the top of the page. To calculate the number of high schools in Chicago with a student population of between 26-75% eligible for NSLP, we performed the following steps: 1) We first sorted by schools based on % NSLP (number of students eligible for free or reduced lunch divided by total number of students enrolled). 2) We then pulled out the schools that had enrollment in 11th grade. 3) We then sorted the schools based on location city, and pulled out the schools located in the City of Chicago.

Now, check out the two related (copied and pasted) footnotes from their original report. Each indicates using DISTRICT level data.

In short, the follow up memo was simply a lie – a flat out lie – and included revisionist analysis completely unrelated to any information actually presented in the original report.

I have retained copies of the originals, if the authors should choose to now go back and edit/change these footnotes.

Doing crappy analysis is one thing. Trying to cover it up by lying and revising while leaving the trail behind really doesn’t help.

Original Report

http://content.thirdway.org/publications/435/Third_Way_Report_-_Incomplete_How_Middle_Class_Schools_Aren_t_Making_the_Grade_-_PRINT.pdf

Footnote #40 Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed July 25, 2011. Available at: http://nces.ed.gov/ccd/ bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey,” “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the U.S. Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “District.” Then select next. On the following page, in the “select columns” drop down box, choose the “Census 2000 – Household Income, Occupancy and Size” option. Then check the box next to “Median Family Income.” Then go back to the “select columns” drop down box, choose the “Students in Special Programs” option. Select the box next to “Total Free and Reduced Lunch Students.” Then go back one more time to the “select columns” drop down box, choose “total enrollment.” Then select the box next to “total students.” Then select next. On the next page, choose the “Select 50 States + DC” filter from the drop down box. Then click the “view table” option. Once the table is compiled, download the table into Excel.csv by clicking that option at the top of the page. To calculate average household income by school district, we performed the following steps: 1) We first sorted school districts based on % NSLP (number of students eligible for free or reduced lunch divided by total number of students enrolled). 2) Using CPI for 2009, we adjusted the incomes for inflation. 3) We then found the median household income, based on the following groupings: 0-25.44%, 25.45-75.44%, 75.45-100% NSLP.

Footnote #88 Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed July 25, 2011. Available at: http://nces.ed.gov/ccd/ bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey”, “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “District.” Then select next. On the following page, in the “select columns” drop down box, choose the “Census 2000 – Household Income, Occupancy and Size” option. Then check the box next to “Median Family Income.” Then go back to the “select columns” drop down box, choose the “Students in Special Programs” option. Select the box next to “Total Free and Reduced Lunch Students.” Then go back one more time to the “select columns” drop down box, choose “total enrollment.” Then select the box next to “total students.” Then select next. On the next page, choose the “Select 50 States + DC” filter from the drop down box. Then click the “view table” option. Once the table is compiled, download the table into Excel.csv by clicking that option at the top of the page. To calculate average household income by school district, we performed the following steps: 1) We first sorted school districts based on % NSLP (number of students eligible for free or reduced lunch divided by total number of students enrolled). 2) Using CPI for 2009, we adjusted the incomes for inflation. 3) We then found the median household income, based on the following groupings: 0-25.44%, 25.45-50.44%, 50.45-75.44%, 75.45-100% NSLP.

Insult of insults from Third Way – Baker, You… You… Status Quo…er!

I gotta admit that my favorite part of the Third Way memo responding to my critique of their “Middle Class” report is the end of the memo.

Here are the two concluding paragraphs from the Third Way memo in reply to my rather harsh critique of their report:

 There are 52,860 public and charter schools that fall within our definition of middle-class schools, and they educate 25.7 million students. The message from Dr. Baker and the NEPC seems to be—let’s ignore them. In fact, let’s not even define them. Our view is that there is immense potential out there. These schools are failing in their basic mission—to become college factories.

From our perspective, college graduation rates of 31% and 23% in the second and third NSLP groupings, respectively—as our report presents—are unacceptable for America’s economic future. Clearly, the NEPC and Dr. Baker disagree and are satisfied with the status quo. We are not.

Yes, there it is. The insult of insults in reformyland! I am, as a result of critiquing their near criminal abuse of data, a… a… Status Quo-er!

Obviously, anyone (like me) who might take offense at such egregious misrepresentation of data must be a defender of the status quo. That is the worst offense in today’s reform debate. Especially if the egregious abuse of data was done with good intentions, right? Done with the good intentions of letting the American public understand just how awful their schools are! They need to know. America needs to know! And now! This can’t wait! Even if we have to classify information illogically or draw conclusions that don’t even match our data?

Look, bad data analyses and bombastic conclusions about our supposed education apocalypse do little or nothing to start a genuine conversation about either the true current conditions of our schools or whether we should be considering systemic changes.

Often, such crisis-mode reporting has as its central objective encouraging the public and policymakers to act in haste and adopt ill-conceived (often self-serving) policy before they know what’s really going on. That is, let’s get in a panic and adopt something really stupid, and fast. Any reader should be wary of crisis-mode reports like the Third Way middle class report and evaluate them critically. Some such reports may ultimately reveal important issues, some even with a degree of immediacy. Third Way’s report reveals neither.

On ignorance & impartiality: A comment on the Monmouth U. Poll on Ed. Policy

Some Twitter followers may have noticed the ongoing back and forth regarding the validity of the recent Monmouth University Poll on education reform. I’d certainly rather spend my time on more substantive discussion.

As I’ve noted on many occasions, polls are what they are. They ask what they ask. And the responses to the questions must always be evaluated only with respect to what was asked. Questions about specific policies in particular require that the policies in question be described correctly. This is a point raised the other day by Matt Di Carlo about the Monmouth Poll here.

Yesterday, Patrick Murray, director of the polling institute, posted a response to some of the criticisms levied against the recent Monmouth poll. Unfortunately, I found his response far less satisfying, and in many ways far more disturbing, than the poll itself. Quite honestly, I’d have left this issue alone if not for some particularly troublesome assertions in that response.

First, here is my response regarding the substantive issue raised by Matt Di Carlo:

Mr. Murray points out that he, as many pollsters do, chose to use colloquial language to describe “tenure.” The problem, as explained by Matt Di Carlo here http://shankerblog.org/?p=3695, is that the colloquial characterization was factually incorrect, and that it would be possible to achieve a colloquial characterization that is not factually incorrect. The factual error in the characterization of tenure leads to a clear bias in the question. This is the most obvious example, but there are numerous more subtle cases where questions do not accurately represent existing or proposed legislation or regulations.

Here are a few additional points regarding content in Mr. Murray’s response:

Specifically, Mr. Murray contends that critics were simply unhappy with the results, and offered no substantive criticism of the methods.

On Twitter, I have criticized the title of the press release for the poll, which claims that the poll results indicate broad support for New Jersey reforms, implying that responses to the specific questions regarding policies can be taken as supporting the specific policies being proposed. That is, it implies a close relationship between the policies framed in the questions and actual policy proposals on the table. Usually, it is the media who make such misguided leaps. In this case, the polling institute provided them with the misleading headline.

Mr. Murray’s response not only defends the headline, but actually makes even less justified (if slightly more specific) statements to the same effect. Mr. Murray claims that the poll results provide “broad, general support” for the “Governor’s proposals,” which happen to be rather specific proposals (many of which are not actually the governor’s proposals, but proposals for which he has offered support). But very few (if any) of the questions in the poll accurately represent the specific proposals (like mischaracterizing what tenure is). The questions are broad and imprecise (if intended to discern support for existing proposals). They are general. Some are outright incorrect. As a researcher, I can assure you that a response to one question, referring to one type of policy (a hypothetical policy that is substantively different from the actual proposals), should not be interpreted as relating to another (without careful statistical validation, which would involve asking the other question). That is a methodological concern, not a concern with the findings. It is a concern largely over the representation of findings (press release titles matter), as opposed to the usual quibbling over sampling issues.

After defending the wording of the tenure question, Mr. Murray goes on to discuss the follow-up questions to the tenure question – specifically those about how the general public would like to see tenure changed. The problem is that each of these questions about how to “change” tenure is invalid, because “change” in the mind of the respondent (at least the uninformed respondent) is measured against an incorrectly defined baseline of what tenure is. That is, Mr. Murray provided a prompt in the first tenure question that incorrectly describes tenure, asserting that tenure means a teacher can only be fired for “serious misconduct.” Then he asks, in a series of questions, whether that should be changed and how. If the baseline condition – existing policy – is described incorrectly, arguably in a biased way, then responses to subsequent questions are influenced by this. That is either bias, or simply sloppiness.

Which brings up a related issue. Mr. Murray notes that many if not most poll respondents were unaware of the policies, or of the details of the reforms. Because of that, the phrasing of the questions and the colloquial explanations of the policies are of even greater importance, having even greater potential to shape the responses. That phrasing can grossly misinform the otherwise uninformed respondent. And it just may have done so.

The most significant and most disturbing point:

Setting aside this methodological quibbling, I take issue with Mr. Murray’s point that academic researchers might come at these issues with normative values – as I admittedly do – and that having normative values (based on years of extensive research on these topics) somehow invalidates someone’s ability to critique the poll. Mr. Murray explains:

 To start, most of the criticism has come from people without expertise in the field of survey research.  Some has, which I will treat more seriously.  But it’s important to note that all of these critics, including some who are academic researchers, have taken very public normative positions on education policy.  Normative is one of those great social science words.  It simply means they already have a clear opinion about how things ought to be.  When normative values get applied in a research setting, they lead to bias.

So, in other words: if you don’t have expertise in opinion research, your criticisms should not be taken seriously. And if you have too much knowledge and expertise in the substance of the poll (education law, policy and reform), you are too biased for your opinion to carry any weight. This argument is patently absurd.

As Mr. Murray frames it, only through blissful ignorance of the issues of substance can anyone be sufficiently impartial to be involved in, or make claims or arguments regarding, either substance or method.  Those with knowledge, and with opinions derived from that knowledge, are necessarily too biased to have valid concerns. I’ll admit that I have a bias for rigorous research methodologies.

Like Dr. Di Carlo (who holds a Ph.D. in Sociology from Cornell), I’m not a pollster. I’m a researcher and perhaps that alters my view on how research is conducted and what kinds of conclusions can be reasonably drawn from survey responses to questions with specific wording.  I generally don’t care much for polls or polling results, but I am a stickler for methods.

This poll was about policies, not politicians. And as someone who studies policies I am particularly sensitive to the details of policy design & implementation. This poll was clearly not sensitive to those details and was exceptionally sloppy in its characterization of policies and policy design. And that’s a methodological problem, and one that is so glaringly apparent because of my academic expertise in this area – not because of some normative bias – but, because of actual details, including statutes and regulations.

Perhaps I’m being too picky, and that’s just how the polling industry works. Perhaps the normative values of pollsters allow for imprecise colloquial descriptions and drawing broad unsubstantiated conclusions. That seems to be the gist of Patrick Murray’s argument, and one I find distasteful enough to require a response.

Inkblots and Opportunity Costs: Pondering the Usefulness of VAM and SGP Ratings

I spent some time the other day, while out running, pondering the usefulness of student growth percentile estimates and value added estimates of teacher effectiveness for the average school or district level practitioner. How would they use them? What would they see in them? How might these performance snapshots inform practice?

Let’s just say I am skeptical that either VAMs (Value Added Models) or SGPs (Student Growth Percentiles) can provide useful insights to anyone who doesn’t have a pretty good understanding of the nuances of these kinds of data/estimates and the underlying properties of the tests. If I were a principal, would I rather have the information than not? Perhaps. But I’m someone whose primary collecting hobby is, well, collecting data. That doesn’t mean it all has meaning, or more specifically, that it has sufficient meaning to influence my thinking or actions. Some does. Some doesn’t. Keeping some of the data that doesn’t have much meaning actually helps me to delineate. But I digress.

It seems like we are spending a great deal of time and money on these things for questionable return. We are investing substantial resources in simply maximizing the links in our data systems between individual students’ records and their classroom teachers of record, hopefully increasing our coverage to, oh, somewhere between 10% and 20% of teachers (those with intact, single teacher classrooms, serving children who already have a track record of prior tests – e.g. upper elementary classroom teachers).

At the outset of this whole “statistical rating of teachers” endeavor, it was perhaps assumed by some economists that we would just ram these things through as large scale evaluation tools (statewide and in large urban districts) and use them to prune the teacher workforce, and that would make the system better. We’d shoot first… ask questions later (if at all). We’d make some wrong decisions, hopefully statistically more “right” than wrong, and we’d develop a massive model and data set for large enough numbers of teachers that the cost per unit (cost per bad teacher correctly fired, counterbalanced by the cost per good teacher wrongly fired) would be relatively low. We’d bring it all to scale, and scale would mean efficiency.
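That “cost per unit” logic can be made concrete with a bit of base-rate arithmetic. Here is a minimal sketch, with entirely invented numbers – the teacher counts, accuracy rates, and decision rule below are my assumptions, not drawn from any actual evaluation system:

```python
def misclassification_counts(n_teachers, truly_low, sensitivity_pct, specificity_pct):
    """Counts of correct and mistaken dismissals if every teacher rated
    'low' is dismissed. Integer percents keep the arithmetic exact."""
    truly_fine = n_teachers - truly_low
    correctly_fired = truly_low * sensitivity_pct // 100         # true positives
    wrongly_fired = truly_fine * (100 - specificity_pct) // 100  # false positives
    return correctly_fired, wrongly_fired

# 10,000 teachers, 500 truly weak, a rating that flags 70% of the weak
# teachers and clears 90% of everyone else.
print(misclassification_counts(10_000, 500, 70, 90))  # (350, 950)
```

With these made-up accuracy rates, the wrongly fired (950) outnumber the rightly fired (350), simply because most teachers are not in the truly weak group – a base-rate problem that the “scale means efficiency” story glosses over.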

Now, I find this whole version of the story to be too offensive to really dig into here and now. I’ve written previously about “smart selection” versus “dumb selection” regarding personnel decisions in schools. And this would be what I called “dumb selection.”

But, it also hasn’t necessarily played out this way… thankfully… except perhaps for some large city systems like Washington, DC, and a few more rigidly mandated state systems (though we’re mostly in wait-and-see mode there as well). Instead, we are now attempting to be more “thoughtful” about how we use this stuff and asking teachers to ponder their statistical ratings for insights into how they interact with children? How they teach? And we are asking administrators to ponder teachers’ statistical estimates for any meaning they might find.

In my current role, as a researcher of education policy, I love equations like this: http://graphics8.nytimes.com/images/2011/03/07/education/07winerip_graphic/07winerip_graphic-articleLarge-v2.jpg

I like to see the long lists of coefficients (estimates of how some measure in the model relates to the dependent variable) spit out in my Stata logs and ponder what they might mean, with full consideration of what I’ve chosen to include or exclude in the model, and whether I’m comfortable that the measures on both sides of the equation are of sufficient quality to really tell me anything… or at least something.

The other evening, I thought back to my teaching days (considered a liability for an education policy researcher) and asked whether it would have been useful to me simply to have some rating of my aggregate effectiveness – simply relative to other teachers. Nothing specific about the performance of my students on specific content/concepts. Just some abstract number… like the relative rarity that my students scored X at the end of my class given that they scored X-Y at the end of last year’s class? Or some generalized “effectiveness” rating category based on whether my coefficient in the model surpassed a specific cut score to call me “exceptional” or merely “adequate”? Something like this.
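That “relative rarity” notion is, at its core, a conditional percentile. Here is a toy sketch with invented scores, conditioning on a single prior score – actual SGP models condition on several prior years using quantile regression, so this is only the skeleton of the idea:

```python
from collections import defaultdict

def growth_percentiles(records):
    """records: list of (student_id, prior_score, current_score).
    Rank each student's current score only against peers who had
    the same prior score."""
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior].append(current)
    sgps = {}
    for sid, prior, current in records:
        group = peers[prior]
        below = sum(1 for score in group if score < current)
        sgps[sid] = round(100 * below / len(group))  # percentile of growth
    return sgps

# Invented example: three students started at 40, one at 50.
records = [("a", 40, 55), ("b", 40, 48), ("c", 40, 60), ("d", 50, 52)]
print(growth_percentiles(records))  # {'a': 33, 'b': 0, 'c': 67, 'd': 0}
```

Note that student “d,” alone at her prior score, gets a percentile that means almost nothing – a miniature version of the small-group reliability problems discussed throughout this post.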

Would that be useful to me? To the principal? If I were the principal?

Given that I typically taught 2 sections of 7th grade life science and 2 of 8th grade physical science (yeah… cushy private school job), with class sizes of about 18 students each, which rotated through different times of day, I might also find it fun to compare growth of my various classes. Did the disruptive distraction kid really cause my ratings in one life science section to crash (you know who you are!)? Was the same kid able to bring her 8th grade teacher down the next year (hopefully not me again!)?

I asked myself… would those ratings actually tell me anything about what I should do next year (accepting that the data would come on a yearly cycle)? Should I go watch teachers who got better ratings? Could I? Would they protect their turf? Would that even tell me a damn thing? Besides, knowing what I do now, I also know that large shares of the teachers who got a better rating likely got that rating either because of a) random error/noise in the data or b) some unmeasured attribute of the students they serve (bias). Of course, I didn’t know that then, so what would I think?

My gut instinct is that any of these aggregate indicators of a teacher’s relative effectiveness, generated from complex statistical models, with or without corrections for other factors, are little more than ink blots to most teachers and administrators. And I’m not convinced they’ll ever be anything more than that. They possess many of the same attributes of randomness or fuzziness of an ink blot. And while the most staunch advocate might wish them to appear as an impressionist painting, I expect they are still most often seen as ink blots – not even a Jackson Pollock. More random than pattern. And even if/when there is a pattern, the average viewer may never pick it up.

I anxiously (though skeptically) await well-crafted qualitative studies exploring stakeholders’ interpretations of these inkblots.

But these aren’t just any ink blots. They are rather expensive ink blots if and when we start trying to use them in more comprehensive and human resource intensive ways through local public schools and districts and if we weigh on them the burden that we MUST use them not merely to inform, but rather to DRIVE our decisions – and must find significant meaning in them to justify doing so.  That is, if we really expect teachers and principals to log significant hours trying to derive meaning from them, after consultants, researchers, central office administrators and state department officials have labored over data system design, linking teachers to students, and deciding on the most aesthetically pleasing representation of teacher performance classifications for the individual reporting system. Using these tools as quick screening, blunt instruments is certainly a bad idea. But is this – staring at them for endless hours in search of meaning that may not be there – much better?

It strikes me that there are a lot more useful things we could/should/might be spending our time looking at in order to inform and improve educational practice or evaluate teachers. And that the cumulative expenditure on these ink blots, including the cost of time spent musing over them, might be better applied elsewhere.

More on the SGP debate: A reply

This new post from Ed News Colorado is in response to my critique of Student Growth Percentiles here: https://schoolfinance101.wordpress.com/2011/09/02/take-your-sgp-and-vamit-damn-it/

I must say that I agree with almost everything in this response to my post, except for a few points. First, they argue:

Unfortunately Professor Baker conflates the data (i.e. the measure) with the use. A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: To separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

No, I do not conflate the data and measures with their proposed use. Policymakers are doing that, and doing it on poor advice from others who don’t see the important point – the primary purpose – as Betebenner, Briggs and colleagues explain it.  This is precisely why I used their work in my previous post – because it explains their intent and provides their caveats.

Policymakers, by contrast, are pitching the direct use of SGPs in teacher evaluation. Whether the developers intended this or not, that’s what’s happening. Perhaps this is because they have not been explaining, as bluntly as they do here, what the actual intent/design was.

Further, I should point out that while I have marginally more faith that a VAM could, in theory, be used to parse out teacher effects than an SGP (which isn’t even intended to), I do not have any more faith than they do that a VAM actually can accomplish this objective. They interpret my post as follows:

Despite Professor Baker’s criticism of VAM/SGP models for teacher evaluation, he appears to hold out more hope than we do that statistical models can precisely parse the contribution of an individual teacher or school from the myriad of other factors that contribute to students’ achievement.

I’m not, as they would characterize, a VAM supporter over SGP, and any reader of this blog certainly realizes that. However, it is critically important that state policymakers be informed that SGP is not even intended to be used in this way. I’m very pleased they have chosen to make this the central point of their response!

And while SGP information might reasonably be used in another way, if used as a tool for ranking and sorting teacher or school effectiveness, SGP results would likely be more biased even than VAM results… and we may not even know or be able to figure out to what extent.

I agree entirely with their statement (but for the removal of “freakin”):

We would add that it is a similar “massive … leap” to assume a causal relationship between any VAM quantity and a causal effect for a teacher or school, not just SGPs. We concur with Rubin et al (2004) who assert that quantities derived from these models are descriptive, not causal, measures. However, just because measures are descriptive does NOT imply that the quantities cannot and should not be used as part of a larger investigation of root causes.

The authors of the response make one more point, that I find objectionable (because it’s a cop out!):

To be clear about our own opinions on the subject: The results of large-scale assessments should never be used as the sole determinant of education/educator quality.

What the authors accomplish with this point is permitting policymakers to still assume (pointing to this quote as their basis) that they can actually use this kind of information for, say, a fixed 90% share of high stakes decision making regarding school or teacher performance – and certainly that a fixed 40% or 50% weight would be reasonable. Just not 100%. Sure, they didn’t mean that. But it’s an easy stretch for a policymaker.

If the measures aren’t meant to isolate system, school or teacher effectiveness, or if they were meant to but simply can’t, they should NOT be used for any fixed, defined, inflexible share of any high stakes decision making.  In fact, even better, more useful measures shouldn’t be used so rigidly.

[Also, as I’ve pointed out in the past, when a rigid indicator is included as a large share (even 40% or more) in a system of otherwise subjective judgments, the rigid indicator might constitute 40% of the weight but drive 100% of the decision.]
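A hypothetical three-teacher example illustrates the bracketed point: if the “subjective” 60% of the composite barely varies across teachers while the rigid indicator varies a lot, the final ordering is simply the indicator’s ordering. All scores below are invented:

```python
# Invented scores: observation ratings cluster (87-90), VAM scores spread widely.
teachers = {
    "A": {"vam": 10, "obs": 88},
    "B": {"vam": 55, "obs": 90},
    "C": {"vam": 90, "obs": 87},
}

# Composite: 40% rigid indicator, 60% subjective observation score.
composite = {t: 0.4 * s["vam"] + 0.6 * s["obs"] for t, s in teachers.items()}

rank_by_composite = sorted(composite, key=composite.get)
rank_by_vam = sorted(teachers, key=lambda t: teachers[t]["vam"])
print(rank_by_composite == rank_by_vam)  # True: the 40% component drives the order
```

The VAM component carries 40% of the nominal weight but, because it supplies nearly all of the variance, it determines 100% of the ranking.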

So, to summarize, I’m glad we are, for the most part, on the same page. I’m frustrated that I’m the one who had to raise this issue, in part because it was pretty clear to me from reading the existing work on SGPs that many were conflating the measure with its use. I’m still concerned about the use, and especially concerned in the current policy context. I hope in the future that the designers and promoters of SGP will proclaim more loudly and clearly their own caveats – their own cautions – and their own guidelines for appropriate use.

Simply handing off the tool to the end user and then walking away in the face of misuse and abuse would be irresponsible.

Addendum: By the way, I do hope the authors will happily testify on behalf of the first teacher who is wrongfully dismissed or “de-tenured” on the basis of 3 bad SGPs in a row. That they will testify that SGPs were never intended to assume a causal relationship to teacher effectiveness, nor can they be reasonably interpreted as such.

Friday Afternoon Maps: New Orleans, Race & School Locations

A few weeks back, I noticed several tweets about this recent article in the Harvard Educational Review, which takes a look at racial politics and the rebuilding of New Orleans in the post-Katrina era.

Here’s the dropbox link tweeted by Diane Ravitch:

http://dl.dropbox.com/u/11116752/Buras_2011-Race_Charter_Schools_Conscious_Capitalism.pdf.pdf

The article is by Kristen Buras of Georgia State University. Buras, like at least a few others, points out that Hurricane Katrina forced the greatest housing displacement in poor black neighborhoods of New Orleans. But, perhaps more disturbing was that in the post Katrina period, redevelopment… and especially redevelopment of the new, mixed delivery schooling system largely ignored those same areas, leading to a system where access to schooling is very disparately distributed geographically.

In her article Buras went to the painstaking steps of hand plotting the locations of post-Katrina schools (See her Figure 3, page 321) to make her point about school locations, and that map certainly does so, though a good before-after might be even clearer.

I’ve been meaning to do some pre-post Katrina school mapping for some time now, but wasn’t quite sure what I wanted to look at, or how I might organize the information. Well, here’s what a little Friday afternoon play has yielded.

First, I used US Census 2000 and American Community Survey 2005 data to set up my background. The background carves New Orleans into Public Use Microdata Areas (PUMAs, from http://www.ipums.org, boundary files from http://www.census.gov). For the background shading, I used IPUMS data to estimate the percent of resident 5 to 17 year olds in each PUMA who were Black in 2000 and 2005 – pre-Katrina conditions. Those red areas to the right-hand side, over toward the Lower 9th Ward and to the northeast, are almost entirely black in school-aged population. While the entire city has relatively high shares of black population, as Buras notes, Uptown and the Garden District are certainly somewhat less black than other parts of the city.
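For the curious, the tabulation described here boils down to a person-weighted share. A minimal sketch follows, with invented field names and records standing in for an actual IPUMS extract:

```python
from collections import defaultdict

def pct_black_school_age(persons):
    """persons: dicts with 'puma', 'age', 'race', and person weight 'perwt'.
    Returns the weighted percent Black among 5-to-17-year-olds per PUMA."""
    wt_total = defaultdict(float)
    wt_black = defaultdict(float)
    for p in persons:
        if 5 <= p["age"] <= 17:                 # school-aged only
            wt_total[p["puma"]] += p["perwt"]
            if p["race"] == "Black":
                wt_black[p["puma"]] += p["perwt"]
    return {puma: 100 * wt_black[puma] / wt_total[puma] for puma in wt_total}

# Invented stand-in records (real extracts use coded values, not labels).
sample = [
    {"puma": "01905", "age": 10, "race": "Black", "perwt": 120},
    {"puma": "01905", "age": 14, "race": "White", "perwt": 40},
    {"puma": "01906", "age": 8,  "race": "Black", "perwt": 90},
    {"puma": "01906", "age": 30, "race": "Black", "perwt": 90},  # excluded: not school age
]
print(pct_black_school_age(sample))  # {'01905': 75.0, '01906': 100.0}
```

The person weights matter: IPUMS microdata are a sample, so each record stands in for some number of actual residents.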

In the first map here, I show the locations and total enrollments of schools (an indication of available slots) for the year 2000. I use yellow triangles to indicate charter schools. There were a few, even in 2000. School locations are based on latitude and longitude data from the National Center for Education Statistics Common Core of Data (www.nces.ed.gov/ccd).

Map 1. Year 2000 distribution of traditional public and charter schools in New Orleans


In the first figure, there are a significant number of decent-sized schools in the deeper red (higher % black) areas of the city. Citywide, there are a handful of charters scattered around.

Now, here’s the distribution of charters and traditional public schools in 2010. Yes, the city as a whole lost a lot of population (but did rebound somewhat between 2006 and 2010, hence the interest in 2010). Quite strikingly, there are simply very few schools of any size now available in those deep red zones (shading still based on pre-Katrina population). And while there are charters scattered throughout the city, even the highest concentration of those schools is in areas with marginally lower pre-Katrina black populations. Those neighborhoods generally have more schools, and more large schools.

Again, circle size indicates enrollment size, and if the circle has a yellow triangle over it, the school is a charter school.  Further, I’ve kept the size scaling of circles on the same scale in this map as in the previous one. So, if a circle is smaller, its enrollment is smaller.

Map 2. Year 2010 distribution of traditional public and charter schools in New Orleans

Now, it is indeed hard to untangle supply from demand here. One can make the argument that the population didn’t return, therefore there is no demand for schools in those areas previously inhabited by the city’s lowest income black populations. Alternatively, one can as reasonably (and more so after reading Buras) argue that the dearth of available public services may provide some explanation for why families have not returned, or have not been able to return.

One might argue that because there exist so many “schools of choice” throughout the city, geographic location doesn’t really matter. Ya’ just got to travel a bit. Sign up for one of those great schools over there! But research has consistently shown that even in “choice” models, geographic location/proximity is central to enrollment decisions.  Location matters. And having quality options nearby is important. In fact, parents will often favor location over publicly available “quality” measures, continuing enrollment in schools identified as persistently failing if/when other options are simply not geographically accessible. Then again, those “quality” measures aren’t always particularly meaningful.

This population density map for individuals 18 and under suggests comparable population densities in those areas where school density (especially charter school density) has remained much lower: http://www.gnocdc.org/LossOfChildrenInNewOrleansNeighborhoods/Map3.html

Authors such as Henry Levin have explained on numerous occasions that for a choice model to yield equitable distribution of opportunity, consumers must have equitable access to information on schools and equitable mobility among options. Clearly, equitable geographic access is out the window in Post-Katrina New Orleans. Yeah, I think we already knew this from various media reports. But sometimes I have to play with the data and map them myself for it to really sink in. Whether driven by geographic assignment or by choice enrollment, the distribution of educational opportunities in Map 2 above is troublesome.

Far more troublesome is that so many have publicly pitched this New Orleans mixed delivery model as the key to the future of urban education.

Like Buras, I’m pretty damn skeptical that an education system that has redistributed educational opportunity in the ways seen between Map 1 and Map 2 above is all that.  Just pondering and mapping on a Friday afternoon as the sun finally emerges in the rain-soaked Northeast.

Related maps on school aged population loss here: http://www.gnocdc.org/LossOfChildrenInNewOrleansNeighborhoods/index.html


Should there be a Constitutional Right to Unlimited Property Taxation?

A Reply to Dunn and Derthick in Education Next

Anyone who has read my previous work knows I’m not generally a fan of tax and expenditure limits. A significant body of empirical research shows that strict tax and expenditure limits can cause significant damage to state school finance systems over the long haul. For example, David Figlio, in a study of Oregon’s Measure 5 (National Tax Journal, Vol. 51, No. 1, March 1998, pp. 55-70), finds that Oregon student-teacher ratios increased significantly as a result of the state’s tax limitation. David Figlio and Kim Rueben, in the Journal of Public Economics (April 2001, pp. 49-71), find: “Using data from the National Center for Education Statistics we find that tax limits systematically reduce the average quality of education majors, as well as new public school teachers in states that have passed these limits.” In a non-peer-reviewed but high quality working paper, Thomas Downes and David Figlio “find compelling evidence that the imposition of tax or expenditure limits on local governments in a state results in a significant reduction in mean student performance on standardized tests of mathematics skills.” (http://ase.tufts.edu/econ/papers/9805.pdf)

Despite my general concerns over tax and expenditure limits, I have even greater concern over legal arguments like those posed by an affluent suburban school district in Kansas, summarized by Joshua Dunn and Martha Derthick in the Fall 2011 issue of Education Next. As Dunn and Derthick explain, beginning in the 1990s Kansas imposed limits on the amount of revenue local public school districts can raise above and beyond the revenue they are guaranteed through the state general fund aid formula. One affluent suburban district outside of Kansas City recently filed a legal challenge to those limits in Federal District Court, and that legal challenge was the subject of Dunn and Derthick’s recent column. Dunn and Derthick explain the legal arguments as follows:

Citing Supreme Court decisions in Meyer v. Nebraska (1923) and Pierce v. Society of Sisters (1925), which held that the liberty guaranteed in the Fourteenth Amendment’s Due Process Clause includes a right of parents to control the education of their children, the plaintiffs charged that the local cap infringes on that right. As well, by forbidding additional taxes it limits their right to use their property as they wish. Still more inventive, they invoked the First Amendment right of assembly, saying that the cap prevents voters from expressing their collective wishes at the ballot box. These violations together, they contended, constitute a denial of equal protection of the law.

http://educationnext.org/trouble-in-kansas/

So then, what’s wrong with asserting an individual liberty to unlimited property taxation? If such liberties apply to campaign contributions or other forms of assembly, then why not to the choice to levy whatever property tax one sees fit? And what’s wrong with linking the notion of complete “local” control over property taxation to the notion of parental control over the education of one’s own children? Ah, if only it were so simple. But it’s not, and here’s a primer on why.

A Little Background on Tax and Expenditure Limits (TELs)

State imposed limitations on the taxing behavior of state recognized intermediate and local jurisdictions fall into a broad category of state fiscal management policies known as Tax and Expenditure Limits, or TELs. Tax and expenditure limits have been around for decades and exist in one form or another across nearly every state.

Arguably, the modern era of Tax and Expenditure Limits began with the adoption, by statewide referendum, of California’s Proposition 13 in 1978, which included a series of limits on the taxable assessed values of properties and on changes in those assessed values, as well as an overall tax rate cap.  Daniel R. Mullins and Bruce A. Wallin (2004) note that “Within two years of the passage of Proposition 13 (a California initiative), 43 states had implemented some kind of property tax limitation or relief.” [1]  By 2004, Mullins and Wallin indicate, forty-six states had some form of constitutional or statutory statewide limitation on the fiscal behavior of their units of local government.

Statewide limitations on local property taxes exist in multiple forms across states.

Overall Property Tax Rate Limits: Mullins and Wallin note that limits on property tax rates are the most common form of Tax and Expenditure Limit. Overall property tax rate limits restrict the total (municipal, school and other) property tax rate which can be adopted by local jurisdictions. Overall property tax rate limits may but do not necessarily include an option for local override votes. That is, property tax rates are limited but may be exceeded by local voter approval, often including such restrictions as requiring a super-majority vote to achieve override. Mullins and Wallin note that 33 states have imposed property tax rate limits, with 31 limiting municipalities, 28 counties, 26 school districts and 23 all three types (p. 7).

Specific Property Tax Rate Limits: Specific property tax rate limits apply to tax rates for one component of local public goods or services – for example, a rate limit on municipal taxes only, or a rate limit on property taxes for operating revenues for local public schools, or for capital outlay revenues for local public schools. Again, override options may or may not be included.

Property Tax Revenue Limit: Property tax revenue limits place limits on the revenue that may be derived from property taxes in a given year, regardless of the rate applied. Revenue limits may either be applied to the total revenue allowable (revenue level) or, more commonly to the rate of increase in revenue allowable.

Assessment Increase Limit: Because property tax revenues collected, and tax bills paid by property owners are a function of both the tax rate applied and the assessed value of properties, constraints placed on the allowable growth in assessed value also operate as property tax limitations.

General Revenue or Expenditure Limit: States also place caps on the total amount of revenue that can be raised from property taxes for specific purposes, or alternatively on the amount of property tax revenue that can be raised and expended in a given year. Like other limits, these may be placed either on the total level of revenue or expenditures or on the annual growth in revenue or expenditures, and may or may not be coupled with override options (where those override options are also specified in state laws).

Finally, many states include complex combinations of the above property tax and expenditure limits, such as including both a limit on the rate at which assessed property values may grow and a limit on the property tax levy.
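One way to see how such layered limits interact: the binding levy is simply the minimum allowed under each cap taken separately. Here is a sketch with invented parameters – the caps, base levy, and valuation below are illustrative, not any particular state’s:

```python
def allowable_levy(prior_levy, prior_assessed, assessed_growth_cap,
                   levy_growth_cap, rate_cap_mills):
    """Maximum property tax levy under three simultaneous limits:
    a cap on assessed-value growth, a cap on levy growth, and a
    rate cap expressed in mills ($1 per $1,000 of assessed value)."""
    assessed = prior_assessed * (1 + assessed_growth_cap)  # capped valuation growth
    under_levy_cap = prior_levy * (1 + levy_growth_cap)    # capped levy growth
    under_rate_cap = assessed * rate_cap_mills / 1000      # capped tax rate
    return min(under_levy_cap, under_rate_cap)             # most restrictive binds

# Hypothetical district: $1M prior levy, $60M prior assessed value,
# 2% assessment growth cap, 5% levy growth cap, 20-mill rate cap.
print(allowable_levy(1_000_000, 60_000_000, 0.02, 0.05, 20))
```

In this invented case the levy-growth cap binds (about $1,050,000) even though the rate cap would permit roughly $1,224,000 – which is exactly why combined limits are more restrictive than any single one.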

Property Taxation and TELs in Kansas

The above descriptions of tax and expenditure limits reveal some of the complexity of how these limits work. For example, state imposed limits on growth in property value assessments are a property tax limit to the same extent as limits on the tax rate that can be applied to those properties.  Property taxes include multiple moving parts, or multiple policy levers, the vast majority of which in most states are creations of and controlled within state constitutions and statutes. Below is a non-exhaustive list of the moving parts of the property tax revenue equation:

  1. The boundaries of taxing jurisdictions:  Taxing jurisdictions are government subdivisions within states, defined in state statutes and/or constitutions. They are creations of the state, even if granted home rule or limited home rule. Taxing jurisdictions may or may not be as simple as “cities and towns” or “municipalities.” In some states, municipal taxing jurisdictions are reasonably aligned with local school taxing jurisdictions, but in others, like Kansas, they are not. The lack of alignment between local public school district boundaries and municipal boundaries in Kansas is largely a result of school district consolidations that occurred under state statutes adopted in the 1960s, concurrent with (shortly before) the rewriting of the education article of the state constitution. In many states, the geographic spaces defined as taxing jurisdictions and enrollment areas for local public school districts continue to be redrawn, as in the case of the northeastern section of the Kansas City, Missouri School District, which was recently annexed to the Independence School District through a procedure created (specifically for that circumstance) under a recent Missouri statute.  Further, school district boundary determinations (under state laws) are often linked to a long history (including recent history) of institutionalized and state sanctioned racial discrimination in housing markets.[2] The defined geographic boundaries of a taxing jurisdiction determine the properties that are included in or excluded from that jurisdiction. Those boundaries ultimately determine the total values of property within the bounded space, and in turn the amount of revenue that can or cannot be generated by applying any given tax rate to those properties.


Figure 1

School District (green) Boundaries and Cities and Towns in the Kansas City Metro



  2. Definitions of Property Types: Different types of property exist within any taxing jurisdiction, including residential properties, residential properties owned by non-residents (second homes), commercial properties, industrial properties, utilities and farm properties. In Kansas, as elsewhere, property types are defined in the State Constitution (Article 11). The definition of property types substantially influences the application of “local” property taxes because each defined jurisdiction contains a different mix of property types – some with more commercial property than others, some more residential, and others more farm property. The different values applied to different types of properties become a significant factor influencing the local revenue-raising capacity of communities. Note that in Kansas, as elsewhere, the highest aggregate property values per child enrolled in school are found not in the school districts with the highest-valued houses, but in communities like Burlington, Moscow and Rolla, each of which includes non-residential properties of significant value.
  3. Valuation Procedures: Procedures for determining the taxable value of properties are also defined in state statutes and constitutions – in Kansas, in Article 11 of the constitution. Those valuation procedures operate as a form of tax and expenditure limitation. Residential properties are defined to have a taxable value of 11.5% of fair market value, agricultural land 30%, and vacant lots 12%. States adopt such structures out of state policy interest in creating certain types of incentives or controls, including incentives to either preserve or develop farm property or vacant lots, or to buffer commercial interests from escalating taxes. These differential assessment ratios are effectively limits on the revenue-raising capacity of any applied tax rate.
  4. Property Tax Exemptions: States also control, typically via statute, the extent to which intermediate or local jurisdictions may grant exemptions from property taxes, including the duration over which an exemption may be granted or the types of properties that may be granted exemptions. States may also impose exemptions themselves, such as exempting from property taxes a proportion of the value of residential properties owned by senior citizens, in the policy interest of protecting seniors on fixed incomes from escalating property taxes. As a tax equity measure, Kansas in the late 1990s adopted an exemption for the first $20,000 in taxable value of a residential property, as applied to property taxes for school General Fund revenues (a statutory provision).
  5. Tax Rate Setting & Referendum Procedures: States also regulate the procedures by which local school district budgets are determined and/or tax rates are set. In some states with constitutional property tax limits that include override provisions, the referendum procedure for override is in the constitution, and may include a requirement of a super-majority vote to achieve an override. A super-majority requirement is a limit. In other states, statutory provisions permit local authorities to raise taxes (or resulting revenues) to specific levels without voter approval and above those levels with voter approval. In some cases, those limits are absolute and cannot be exceeded.
  6. Debt Ratio Ceilings on Bonded Indebtedness: States also impose various limitations on the amount of debt “local” jurisdictions may accumulate toward the financing of capital projects. Kansas, like other states, imposes a limit – measured as a percentage of total taxable assessed valuation – on the amount of debt that can be accumulated through issuance of general obligation municipal bonds for the financing of new school construction or major renovations.

Each and every provision above and each and every element of the property tax system is controlled by and exists only as a function of state constitutional provisions and statutes. Further, each piece of the property tax puzzle imposes limitations – state controlled limitations – on the ability of state sanctioned local jurisdictions to raise revenues with property taxes.
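Taken together, these moving parts form a simple multiplicative revenue equation. The sketch below (a minimal illustration with hypothetical parcel values; the 11.5% residential ratio and $20,000 school General Fund exemption are the Kansas figures cited above, while the 25% commercial ratio is an added assumption not stated in this post) shows how each parameter caps the revenue any given levy can raise:

```python
# Simplified sketch of the property tax revenue equation.
# taxable value = market value x assessment ratio - exemption (floored at 0)
# revenue = taxable value x mill rate / 1000  (1 mill = $1 per $1,000 of taxable value)

def taxable_value(market_value, assessment_ratio, exemption=0.0):
    return max(market_value * assessment_ratio - exemption, 0.0)

def levy_revenue(parcels, mill_rate):
    """parcels: list of (market_value, assessment_ratio, exemption) tuples."""
    total_taxable = sum(taxable_value(*p) for p in parcels)
    return total_taxable * mill_rate / 1000.0

# Hypothetical jurisdiction: one home, one commercial parcel.
parcels = [
    (200_000, 0.115, 20_000),  # residential: 11.5% ratio, $20,000 exemption
    (500_000, 0.25, 0.0),      # commercial: 25% ratio (an assumed figure)
]
print(levy_revenue(parcels, mill_rate=20))  # 20 mills applied to total taxable value
```

Set any factor to zero – the assessment ratio, the mill rate, or the parcels included within the boundary – and revenue goes to zero, which is the sense in which every one of these state-controlled parameters operates as a limit.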

Extreme implications of a constitutional protection for complete, unregulated local citizen control over property taxation

Taken at its extreme, the assumption that local residents of any geographic space in the State of Kansas possess a constitutional right to unlimited control over property taxation for “their” local public schools means that those local residents would have control over each and every parameter above, as each parameter is a critical determinant of the revenue generated for local public schools by adoption of a specific property tax levy (a rate multiplier). Set any parameter – or multiplier – to “0” and the whole equation shuts down. No one piece is more important than another in determining the amount of money that can be raised for “local” public schools.

Taken at its extreme, any group of citizen residents of the state of Kansas should be able to organize themselves, and define a geographic area that they consider to be their taxing jurisdiction. They would then have the authority to define the types of properties in their jurisdiction and the method for determining the taxable value of those properties. Further, they would have the right to decide whether a mere majority or super majority vote is required in order to adopt any particular tax rate to apply to those properties.

If local citizens control only the single parameter of tax rate setting (the “mill levy”), the state could simply alter the rules for adopting rate increases, such as requiring a super-majority vote. Or the state could adopt legislation which effectively reduces the taxable value of properties, or exempts certain types of properties from taxation, for raising additional school revenue above current local option budget limits. For example, the state could exempt all commercial and industrial properties from additional taxation (much like the $20,000 exemption on residential properties for General Fund budgets). Such state controls, while not limiting the levies adopted, would limit the revenue that could be generated by those levies. Each of these rules exists at present only as a function of prior state, not local, actions.

Assuming that there exists only a constitutional right to adopt higher tax levies, but those levies are to be adopted within an otherwise completely state controlled policy framework, is illogical. If such constitutional freedoms do exist, then they must apply to each and every relevant parameter limiting revenue.

Clearly, however, assuming that local groups of citizens have unlimited rights to determine each and every parameter in the property tax revenue generating equation is absurd; it would moot numerous Kansas statutes, Article 11 of the Kansas Constitution, and similar constitutional and statutory provisions across nearly every other state.

The state interest in regulating taxes imposed on non-resident property owners

As school district boundaries are presently organized, especially in the Kansas City metropolitan area, school districts each consist of many types of properties. Implicit in the assumption that there exists a constitutionally protected individual right to raise additional funds, through property taxation, for the education of one’s own children is an overly simplistic one-to-one-to-one correspondence among the children to be educated, the parents of those children, and the homeowner taxpayers of the jurisdiction. That is, each taxpayer homeowner is also a parent with an interest in the quality of education provided to his or her child at the collective expense. Such would be true if a group of parents organized to start a private school and used their private resources to finance the operations of that school to a level suitable to their own tastes.

This assumption crumbles when applied to local property taxation for public schools and when we consider the mix of property types, property owners and taxpayers that fall within any school taxing jurisdiction in Kansas. For example, owners of commercial and industrial properties within the jurisdiction may not be residents of the jurisdiction. Taxes paid by these individuals may be affected significantly by the decisions of a simple majority share of local residents of the district. The state has a legitimate interest in and may see fit to limit such impact. And one method for doing so is the maintenance of existing tax and expenditure limits.

It seems absurd to assume that a group of resident citizens of a jurisdiction have a constitutional right to unlimited taxation of someone else’s property without the option of state intervention.

The state interest in regulating taxes imposed on vulnerable minority voting blocs

Senior citizens who no longer have children attending local public schools and who live on fixed incomes may be outnumbered at the polls in some jurisdictions when school budget (levy referenda) votes are held. Many states have policies exempting portions of the value of properties owned by senior citizens in order to provide some protection against escalating taxes. Those exemptions are a state-imposed limit on property taxation.

As noted above, if we accept the assumption of a constitutional right for a group of local residents in a taxing jurisdiction to levy unlimited taxes on the rest of the jurisdiction, we must also accept that those same residents have control over each and every parameter in the property tax revenue generating equation that might limit their revenue raising capacity. A simple majority of residents could then negate exemptions. The state has a legitimate interest in protecting the rights of local minority voter populations, such as senior citizens, through such policy mechanisms as property tax exemptions.

The state interest in maintaining school funding fairness

Finally, the state also has an interest in maintaining equity in the provision of public education and access to equal educational opportunity, and one mechanism the state has adopted to maintain equity is the limitation on supplemental local spending through property taxation.

Why is it problematic from an equal educational opportunity perspective for local public school districts to have unlimited ability to raise their property taxes and spend as they see fit on their local public schools? How, for example, does it harm the children of Kansas City, Kansas if the parents in Shawnee Mission or Blue Valley School Districts choose to substantially outspend Kansas City over the next several years and provide far higher quality local public schools?

Given the vast student population differences across school districts in the Kansas City area – and specifically between Kansas City and Shawnee Mission, which are immediate neighbors – there exist very large differences in the actual cost of providing children with equal educational opportunity. Professor William Duncombe (Syracuse University), on behalf of the Kansas Legislative Division of Post Audit in 2006, estimated that if the cost of a specific quality of education for the state average district was set to 100 (100%), the cost of achieving equal opportunity for students in Kansas City would be about 35% higher than that average, and in Shawnee Mission about 12% lower. Presently, the state school finance formula provides for much less difference in funding than would actually be needed to achieve more equal educational opportunity (see Table 1). In fact, when all state and local revenues are considered, Kansas is rated in a recent national report (as of 2007-08) as having a regressive-to-flat state school finance system – one where higher poverty districts have systematically lower (or, at best, nearly comparable) resources per pupil than lower poverty districts.[3]
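As a rough illustration of the cost-index arithmetic (the index values paraphrase the Duncombe estimates described above; the $10,000 base figure is purely hypothetical):

```python
# Cost-adjusted funding targets from a cost index (state average district = 100).
# Index values approximate the Duncombe estimates described in the text.
cost_index = {
    "State average": 100,
    "Kansas City": 135,     # ~35% above the average cost
    "Shawnee Mission": 88,  # ~12% below the average cost
}

base_per_pupil = 10_000  # hypothetical spending level for the average-cost district

for district, index in cost_index.items():
    target = base_per_pupil * index / 100  # scale the base by the cost index
    print(f"{district}: ${target:,.0f} per pupil")
```

Under these illustrative numbers, achieving comparable opportunity would require roughly $13,500 per pupil in Kansas City against roughly $8,800 in Shawnee Mission – a gap far wider than the actual formula provides.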

The differences in cost of equal educational opportunity estimated by William Duncombe are a function of many factors, most notably vast differences in the backgrounds and needs of children attending local public school districts (See Table 2). More needy students require a wider array of services, including more specialized personnel, smaller class sizes and specific educational and support programs. The state has both an interest and a constitutional obligation to provide equal educational opportunity.

There are at least two major reasons why states have an interest in the maintenance of equity and equal educational opportunity across local public school districts.

First, education is a positional good. Access to economic opportunity, including access to higher education for children in Kansas City, Kansas depends not only on the absolute level of educational expenditure in their own public schools but on the relative quality of education they receive compared to that of other children competing for the same slots in local public and private colleges and universities.

Second, the quality of schooling in any given location depends largely on the quality of the teacher workforce that can be attracted to teach there. It is well understood that in any given labor market, working conditions – most notably student population characteristics – substantially influence teacher job choice, most often to the disadvantage of the neediest students. It would take not merely equal, but significantly higher wages to recruit and retain teachers of comparable qualifications to teach in Kansas City than to recruit and retain similar teachers in Shawnee Mission, Blue Valley or Olathe. The competitive wage for teachers of specific qualifications in any given area is driven by the wages paid by each district’s nearest neighboring competitors and by the differences in working conditions across districts.

At present, teacher salaries in Kansas City, Kansas are already much lower than those in Shawnee Mission and other Johnson County districts (Table 3). They are lower partly because the state already allows Johnson County districts to levy a special “cost of living” tax (see Table 4) which falsely assumes that teachers in districts with more expensive houses are therefore more expensive to hire. Providing further opportunity for Johnson County districts to widen the salary gap, by removing state imposed tax limits, would likely lead to even greater disparities in teacher qualifications across wealthy and poor districts serving lower and higher need student populations in the Kansas City metropolitan area.

If Shawnee Mission and other Johnson County parents have the right to raise their property taxes in order to recruit and retain better teachers, don’t Kansas City parents have the same right? While they might have a similar right, they do not have similar capacity. Granting this right does not require that the state adopt any measures to equalize the capacity to compete.

For every additional mill on the local tax levy, Shawnee Mission can raise an additional $117 per pupil, whereas Kansas City can raise only $38, a greater than 3X difference (see Table 5). Even under present circumstances, with imposed limitations to the local option budget, Kansas City salaries lag behind Johnson County districts, and Johnson County districts have already been provided a local taxing opportunity to widen the gap, an option some have used.
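The arithmetic behind that gap can be sketched directly (per-mill yields from Table 5 as cited above; the $500-per-pupil target is a hypothetical figure chosen only for illustration):

```python
# Mills required to raise the same supplemental revenue per pupil,
# given the per-mill yields cited in the text (Table 5 figures).
yield_per_mill = {"Shawnee Mission": 117, "Kansas City": 38}  # $ per pupil per mill

target = 500  # hypothetical supplemental revenue goal, $ per pupil

for district, per_mill in yield_per_mill.items():
    mills = target / per_mill  # levy needed to hit the target
    print(f"{district}: {mills:.1f} mills to raise ${target} per pupil")

# The capacity gap: the same levy yields very different revenue.
print(117 / 38)  # roughly a 3x difference in yield per mill
```

Kansas City must levy roughly three times as many mills as Shawnee Mission to raise the same supplemental dollars per pupil, which is why equal tax "rights" do not imply equal capacity.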

TABLES AVAILABLE IN PDF VERSION: Fast Response Brief on Individual Liberty and Tax Limits


[1] Daniel R. Mullins and Bruce A. Wallin (2004) Tax and Expenditure Limitations: Introduction and Overview Public Budgeting and Finance (Winter) 2 – 15

[2] See Kevin Fox Gotham (2000) Urban Space, Restrictive Covenants and the Origins of Racial Residential Segregation in a U.S. City, 1900 to 1950. International Journal of Urban and Regional Research 24 (3) 616-633

The When, Whether & Who of Worthless Wonky Studies: School Finance Reform Edition

I’ve previously written about the growing number of rigorous peer reviewed and other studies which tend to show positive effects of state school finance reforms. But what about all of those accounts to the contrary? The accounts that seem so dominant in the policy conversations on the topic. What is that vast body of research that suggests that school finance reforms don’t matter? That it’s all money down the rat-hole. That in fact, judicial orders to increase funding for schools actually hurt children?

Beyond utterly absurd graphs and tables like Bill Gates’ “turn the curve upside down” graph, and Dropout Nation’s even more absurd graph, there have been a handful of recent studies and entire books dedicated to proving that court ordered school finance reforms simply have no positive effect on children. Some do appear in peer reviewed journals, despite egregious (and really obvious) methodological flaws. And yes, some really do go so far as to claim that court ordered school finance reforms “harm our children.”[1]

The premise that additional funding for schools – often leveraged toward class size reduction, additional course offerings or increased teacher salaries – causes harm to children is, on its face, absurd. Further, no rigorous empirical study of which I am aware actually validates the claim that increased funding for schools in general, or targeted to specific populations, has led to any substantive, measured reduction in student outcomes or other “harm.”

But questions regarding measurement and validation of positive effects versus non-effects are complex. That said, while designing good research analyses can be quite complex, the flaws of bad analyses are often absurdly simple – as simple as asking three questions: a) whether the reform in question actually happened; b) when it happened and for how long; and c) who was to be affected by the reform.

  • Whether: Many analyses purport to show that school funding reforms had no positive effects on outcomes, but fail to measure whether substantive school funding reforms were ever implemented or whether they were sustained. Studies of this type often simply look at student outcome data in the years following a school-funding-related ruling, creating crude classifications of who won or lost the ruling. Yet the question at hand is not whether a ruling in and of itself leads to changes in outcomes, but whether reforms implemented in response to a ruling do. One must, at the very least, measure whether reform actually happened!
  • When: Many analyses simply pick two end points, or a handful of points of student achievement, to cast as a window, or envelope, around a supposed occurrence of school finance reform or court order – often combining this strategy with the first (never measuring the reform itself). For example, one might take NAEP scores from 1992 and 2007 for a handful of states, and indicate that sometime in that window each state implemented a reform or had a court order. Then one might compare the changes in outcomes from 1992 to 2007 for those states to other states that supposedly did not implement reforms or have court orders. This, of course, provides no guarantee that states in the non-reform group (a non-controlled control group?) didn’t actually do something more substantive than the reform group. But, that aside, casting a large time window – and the same time window across states – ignores the fact that reforms may come and go within that window, or may be sufficiently scaled up only during the latter portion of the window. It makes little sense, for example, to evaluate the effects of New Jersey’s school finance reforms, which experienced their most significant scaling up between 1998 and 2003, by also including the 6 years prior to any scaling up of reform. Similarly, states which may have aggressively implemented reforms at the beginning of the window may have seen those reforms fade within the first few years. When matters!
  • Who: Many analyses also address imprecisely the question of “who” is expected to benefit from the reforms. Returning to the “whether” question: if there was no reform, then the answer to this question is no one. No one is expected to benefit from a reform that never happened. Further, no one is expected to benefit today from a reform that may happen tomorrow, nor is it likely that individuals will benefit twenty years from now from a reform that is implemented this year and gone within the next three. Beyond these concerns, it is also relevant to consider whether the school finance reform in question, if and when it did happen, benefited specific school districts or specific children. Reforms that benefit poorly funded school districts may not also uniformly benefit low-income children, who may be distributed, albeit unevenly, across well-funded and poorly funded districts. Not all achievement data are organized for appropriate alignment with funding reform data. And if they are not, we cannot know whether we are measuring the outcomes of those we would actually expect to benefit.
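The "when" problem above can be made concrete with a toy simulation (all numbers are synthetic, chosen only to mirror the 1992-2007 window and a 1998 reform start discussed in the text):

```python
# Toy illustration of the "when" problem (all numbers synthetic).
# A reform lifts annual score growth only from 1998 onward, yet the study
# judges it by total gains over a 1992-2007 window.

baseline_growth = 1.0   # synthetic points gained per year absent reform
reform_boost = 1.0      # synthetic extra points per year once reform is in place
reform_start = 1998

def score(year, reformed):
    """Cumulative synthetic score at the start of `year`."""
    pts = 200.0  # synthetic 1992 starting score
    for y in range(1992, year):
        pts += baseline_growth
        if reformed and y >= reform_start:
            pts += reform_boost
    return pts

extra = score(2007, True) - score(2007, False)  # all of it earned 1998-2006
wide_avg = extra / (2007 - 1992)          # per-year effect, diluted by 6 pre-reform years
true_avg = extra / (2007 - reform_start)  # actual per-year effect once reform operates

print(wide_avg, true_avg)  # the wide window understates the annual effect
```

Dividing the same total gain over fifteen years instead of nine makes the per-year effect look roughly 40% smaller, even though in this toy example the reform works exactly as intended once it begins.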

In 2011, Kevin G. Welner of the University of Colorado and I published an extensive review of the good, the bad and the ugly of research on the effectiveness of state school finance reforms.[2] In our article we identify several specific examples of empirical studies claiming to find (not just “find” but prove outright) that school funding reforms and judicial orders simply don’t matter. That is, they don’t have any positive effects on measured student outcomes. But, as noted above, many of those studies suffer from basic flaws of logic in their research design, which center on questions of whether, when and who.

As one example of a whether problem, consider an article published by Greene and Trivitt (2008). Greene and Trivitt claim to have found “no evidence that court ordered school spending improves student achievement” (p. 224). The problem is that the authors never actually measured “spending” and instead only measured whether there had been a court order. Kevin Welner and I explain:

The Greene and Trivitt article, published in a special issue of the Peabody Journal of Education, proclaimed that the authors had empirically estimated “the effect of judicial intervention on student achievement using standardized test scores and graduation rates in 48 states from 1992 to 2005” and had found “no evidence that court ordered school spending improves student achievement” (p. 224, emphasis added). The authors claim to have tested for a direct link between judicial orders regarding state school funding systems and any changes in the level or distribution of student outcomes that are statistically associated with those orders. That is, the authors asked whether a declaration of unconstitutionality (nominally on either equity or adequacy grounds) alone is sufficient to induce change in student outcomes. The study simply offers a rough indication of whether the court order itself, not “court-ordered school spending,” affects outcomes. It certainly includes no direct test of the effects of any spending reforms that might have been implemented in response to one or more of the court orders.

Kevin Welner and I also raise questions regarding “who” would have benefited from specific reforms and “when” specific reforms were implemented and/or faded out. In our article, much of our attention to the who and when questions focused on Chapter 6, “The Effectiveness of Judicial Remedies,” of Eric Hanushek and Alfred Lindseth’s book Schoolhouses, Courthouses, and Statehouses.[3] A downloadable version of the same graphs and arguments can be found here: http://edpro.stanford.edu/Hanushek/admin/pages/files/uploads/06_EduO_Hanushek_g.pdf. Specifically, Hanushek and Lindseth identify four states – Kentucky, Massachusetts, New Jersey and Wyoming – as states which have, by order of their court systems, (supposedly) infused large sums of money into school finance reforms over the past 20 years. Given this simple classification, Hanushek and Lindseth take National Assessment of Educational Progress (NAEP) scores for these states, including scores for low-income children and racial subgroups, and plot those scores against national averages from 1992 to 2007.

No statistical tests are performed, but graphs are presented to suggest that there is no difference in the growth of scores in these states relative to national averages. Of course, there is also no measure of whether and how funding changed in these states compared to others. Additionally, there is no consideration of the fact that in Wyoming, for example, per pupil spending increased largely as a function of enrollment decline and less as a function of infused resources (the denominator shrank more than the numerator grew).

Setting aside these other major concerns – which alone entirely undermine the thesis of Hanushek and Lindseth’s chapter – Kevin Welner and I explain the problem of using a wide time window to evaluate school finance reforms that may ebb and flow throughout that window:

As noted earlier, the appropriate outcome measure also depends on identifying the appropriate time frame for linking reforms to outcomes. For example, a researcher would be careless if he or she merely analyzed average gains for a group of states that implemented reforms over an arbitrary set of years. If a state included in a study looking at years 1992 and 2007 had implemented its most substantial reforms from 1998 to 2003, the overall average gains would be watered down by the six pre-reform years – even assuming that the reforms had immediate effects (showing up in 1998, in this example). And, as noted earlier, such an “open window” approach may be particularly problematic for evaluating litigation-induced reforms, given the inequitable and inadequate pre-reform conditions that likely led to the litigation and judicial decree.

There also exist logical, identifiable, time-lagged effects for specific reforms. For example, the post-1998 reforms in New Jersey included implementation of universal pre-school in plaintiff districts. Assuming the first relatively large cohorts of preschoolers passed through in the first few years of those reforms, a researcher could not expect to see resulting differences in 3rd or 4th grade assessment scores until four to five years later.

Further, as noted previously, simply disaggregating NAEP scores by race or low income status does not guarantee by any stretch that one has identified the population expected to benefit from specific reforms. That is, race and poverty subgroups in the NAEP sample are woefully imprecise proxies for students attending districts most likely to have received additional resources. Kevin Welner and I explain:

This need to disaggregate outcomes according to distributional effects of school funding reforms deserves particular emphasis since it severely limits the use of the National Assessment of Educational Progress – the approach used in the recent book by Hanushek and Lindseth. The limitation arises as a result of the matrix sampling design used for NAEP. While accurate when aggregated for all students across states or even large districts, NAEP scores can only be disaggregated by a constrained set of student characteristics, and those characteristics may not be well-aligned to the district-level distribution of the students of interest in a given study.

Consider, for example, New Jersey – one of the four states analyzed in the recent book. It might initially seem logical to use NAEP scores to evaluate the effectiveness of New Jersey’s Abbott litigation, to examine the average performance trends of economically disadvantaged children. However, only about half (54%) of New Jersey children who receive free or reduced-price lunch – a cutoff set at 185% of the poverty threshold – attend the Abbott districts. The other half do not, meaning that they were not direct beneficiaries of the Abbott remedies. While effects of the Abbott reforms might, and likely should, be seen for economically disadvantaged children given that sizeable shares are served in Abbott districts, the limited overlap between economic disadvantage and Abbott districts makes NAEP an exceptionally crude measurement instrument for the effects of the court-ordered reform.16

Hanushek and Lindseth are not alone in making bold assertions based on insufficient analyses, though Chapter 6 of their recent book goes to new lengths in this regard. Kevin Welner and I address numerous comparably problematic studies with more subtle whether, who and when problems, including the Greene and Trivitt study noted above. Another example is a study by Florence Neymotin of Kansas State University, which purports to find that the substantial infusion of funding into Kansas school districts that supposedly occurred between 1997 and 2006 as a function of the Montoy rulings never led to substantive changes in student outcomes. I blogged about this study when it was first reported. But the most relevant court orders in Montoy did not come until January 2005, June 2005 and, finally, July 2006. Remedy legislation may be argued to have begun as early as 2005-06, but took hold primarily from 2006-07 on, before its dismantling from 2008 on. Regarding the Neymotin study, Kevin Welner and I explain:

A comparable weakness undermines a 2009 report written by a Kansas State University economics professor, which contends that judicially mandated school finance reform in Kansas failed to improve student outcomes from 1997 to 2006 (Neymotin, 2009).13 This report was particularly egregious in that it did not acknowledge that the key judicial mandate was issued in 2005 and thus had little or no effect on the level or distribution of resources across Kansas schools until 2007-08. In fact, funding for Kansas schools had fallen behind and become less equitable from 1997 through 2005.14 Consequently, an article purporting to measure the effects of a mandate for increased and more equitable spending was actually, in a very real way, measuring the opposite.[4]

Kevin Welner and I also review several studies applying more rigorous and appropriate methods for evaluating the influence of state school finance reforms. I have discussed those studies previously here. On balance, it is safe to say that a significant body of rigorous empirical literature, conscious of whether, who and when concerns, validates that state school finance reforms can have substantive positive effects on student outcomes including reduction of outcome disparities or increased overall outcome level.

Further, it is even safer to say that analyses such as those provided in the book chapter by Hanushek and Lindseth (2009), or the research articles by Neymotin (2009) and Greene and Trivitt (2008), provide no credible evidence to the contrary, due to significant methodological omissions. Finally, even the boldest, most negative publications regarding state school finance reforms provide no support for the contention that school finance reforms actually “harm our children,” as indicated in the title of a 2006 volume by Eric Hanushek.

Sometimes, even when a research report or article seems really complicated, relatively simple questions of when, whether and who allow the less geeky reader to quickly evaluate and possibly debunk the study entirely. Sometimes the errors of reasoning regarding when, whether and who are so basic that it’s hard to believe anyone would actually present the analysis at all. But these days, I’m rarely shocked. My personal favorite “when” error remains the Reason Foundation’s claim that numerous current reforms positively affected past results! http://nepc.colorado.edu/bunkum/2010/time-machine-award. It just never ends!

Further reading:

B. Baker, K.G. Welner (2011) Do School Finance Reforms Matter and How Can We Tell. Teachers College Record. http://www.tcrecord.org/content.asp?contentid=16106

Card, D., and Payne, A. A. (2002). School Finance Reform, the Distribution of School Spending, and the Distribution of Student Test Scores. Journal of Public Economics, 83(1), 49-82.

Roy, J. (2003). Impact of School Finance Reform on Resource Equalization and Academic Performance: Evidence from Michigan. Princeton University, Education Research Section Working Paper No. 8. Retrieved October 23, 2009 from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=630121 (Forthcoming in Education Finance and Policy.)

Papke, L. (2005). The Effects of Spending on Test Pass Rates: Evidence from Michigan. Journal of Public Economics, 89(5-6), 821-839.

Downes, T. A., Zabel, J., and Ansel, D. (2009). Incomplete Grade: Massachusetts Education Reform at 15. Boston, MA: MassINC.

Guryan, J. (2003). Does Money Matter? Estimates from Education Finance Reform in Massachusetts. Working Paper No. 8269. Cambridge, MA: National Bureau of Economic Research.

Deke, J. (2003). A study of the impact of public school spending on postsecondary educational attainment using statewide school district refinancing in Kansas, Economics of Education Review, 22(3), 275-284.

Downes, T. A. (2004). School Finance Reform and School Quality: Lessons from Vermont. In Yinger, J. (ed), Helping Children Left Behind: State Aid and the Pursuit of Educational Equity. Cambridge, MA: MIT Press.

Resch, A. M. (2008). Three Essays on Resources in Education (dissertation). Ann Arbor: University of Michigan, Department of Economics. Retrieved October 28, 2009, from http://deepblue.lib.umich.edu/bitstream/2027.42/61592/1/aresch_1.pdf

Goertz, M., and Weiss, M. (2009). Assessing Success in School Finance Litigation: The Case of New Jersey. New York City: The Campaign for Educational Equity, Teachers College, Columbia University.


[1] See, for example: E.A. Hanushek (2006) Courting Failure: How School Finance Lawsuits Exploit Judges’ Good Intentions and Harm Our Children. Hoover Institution Press.  Reviewed here: http://www.tcrecord.org/Content.asp?ContentId=13382

[2] Baker, B.D., Welner, K. (2011) School Finance and Courts: Does Reform Matter, and How Can We Tell? Teachers College Record 113 (11) p. –

[3] Hanushek, E. A., and Lindseth, A. (2009). Schoolhouses, Courthouses and Statehouses. Princeton, N.J.: Princeton University Press.

[4] Baker, B., and Welner, K. G. (2011). Do School Finance Reforms Matter and How Can We Tell? Teachers College Record. http://www.tcrecord.org/content.asp?contentid=16106