Blog

Third Way Responds but Still Doesn’t Get It!

Third Way has posted a response to my critique in which they argue that their analyses do not suffer the egregious flaws my review indicates. Specifically, they bring up my reference to the fact that whenever they are using a “district” level of analysis, they include the Detroit City Schools in their entirety in their sample of “middle class.” They argue that they did not do this, but rather only included the middle class schools in Detroit.

The problems with this explanation are many. First, several of their methodological explanations specifically refer to doing computations based on selecting “district” not school level data. For example, Footnote #8 in their report explains:

Third Way calculation based on the following source: New America Foundation, “Federal Education Budget Project,” Accessed on April 22, 2011. Available at: http://febp.newamerica.net/k12

The New America data set provides data at either the state or DISTRICT level (see the lower right hand section of the page linked in the footnote), not the school level. And financial data of this type are not available nationally at the school level; one could not select some schools and exclude others when tabulating financial data. My tabulations of who is in or out of the sample are based on the district level data from the link in that web site.

Further, the authors later explain to their readers, in Footnote #40, in great detail, how to construct a data set to identify the middle class schools, using the NCES Common Core of Data Build a Table Function. Specifically, the instructions refer to selecting “district” to construct the data set. That selection creates a file of district level, not school level data. As such, a district is in or out in its entirety.

Third Way calculations based on data from the following source: United States, Department of Education, Institute of Education Statistics, National Center for Education Statistics, Common Core of Data. Accessed July 25, 2011. Available at: http://nces.ed.gov/ccd/bat/. The Common Core of Data includes data from the “2008-09 Public Elementary/Secondary School Universe Survey,” “2008-09 Local Education Agency Universe Survey,” and “2000 School District Demographics” from the U.S. Census Bureau. To generate data from the Common Core of Data, in the “select rows” drop down box, select “District.”

In my review, I explain thoroughly that Third Way mixes units of analysis throughout their report, sometimes referring to district level data from the New America Foundation data set, sometimes referring to NCES tabulations of data based on the Schools and Staffing Survey (not even their own original analyses of SASS data), and in some cases referring to data on individual children from the high school graduating class of 1992. In fact, the title of a section of the review is “mixing and matching data sources.” I explained in my review:

The authors seem to have overlooked the fact that NCES tables based on Schools and Staffing Survey data typically report characteristics based on school-level subsidized lunch rates. As such, within a large, relatively diverse district like New York City, several schools would fall into the authors’ middle-class grouping, while others would be considered high-poverty, or low-income, schools. But, many other of the authors’ calculations are based on district-level data, such as the financial data from New America Foundation. When using district-level data, a whole district would be included or excluded from the group based on the district-wide percentage of children qualifying for free or reduced-price lunch. What this means is that the Third Way report is actually comparing different groups of schools and districts from one analysis to another, and within individual analyses.

When referring to district level data, the district of Detroit would be included in its entirety. When referring to aggregations from tables based on the Schools and Staffing Survey, as I explain, some would be in and some would be out.

Further, the authors refer throughout to the groupings by subsidized lunch rates as quartiles. They are not. Quartiles would divide the distribution into four equal groups (quarters) of children, schools or districts. The selected cutoffs of 25% and 75% of students qualifying for free or reduced-price lunch do not yield quartiles, as shown by their own data.
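The distinction is easy to demonstrate. Here is a minimal sketch in Python; the subsidized lunch (FRL) rates below are invented for illustration and are not drawn from the report:

```python
# Illustrative sketch with hypothetical free/reduced-price lunch (FRL)
# rates -- NOT the report's actual data. The point: fixed cutoffs at 25%
# and 75% do not produce quartiles, which by definition split a
# distribution into four equal-sized groups.
import statistics

frl_rates = [5, 12, 18, 22, 30, 35, 41, 47, 52, 58, 63, 69, 74, 81, 88, 95]

# Third Way-style fixed cutoffs: everything from 25% to 75% is "middle class."
low = [r for r in frl_rates if r < 25]
middle = [r for r in frl_rates if 25 <= r <= 75]
high = [r for r in frl_rates if r > 75]
print(len(low), len(middle), len(high))  # 4 9 3 -- clearly not quarters

# True quartiles: cut at the 25th, 50th and 75th percentiles of the data,
# yielding four groups of (roughly) equal size whatever the cutpoint values.
q1, q2, q3 = statistics.quantiles(frl_rates, n=4)
quartile_sizes = [
    sum(1 for r in frl_rates if r <= q1),
    sum(1 for r in frl_rates if q1 < r <= q2),
    sum(1 for r in frl_rates if q2 < r <= q3),
    sum(1 for r in frl_rates if r > q3),
]
print(quartile_sizes)  # [4, 4, 4, 4]
```

With fixed cutpoints, group sizes depend entirely on how the underlying distribution happens to fall; only percentile-based cutpoints produce actual quartiles.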

The bottom line, however, is that the arbitrary, broad and imbalanced subsidized lunch cutoffs chosen by the authors work well for neither district- nor school-level analysis, much less an inconsistent mix of the two. And the authors fail to understand that applying the same income thresholds across states and regions of the U.S. yields vastly different populations. Having income below 185% of the poverty level provides for a very different quality of life in New York versus New Mexico (for some discussion, see: https://schoolfinance101.wordpress.com/2011/09/13/revisiting-why-comparing-naep-gaps-by-low-income-status-doesnt-work/).

But, in their response, the Third Way authors also downplay the importance of any analyses that might have been done with district level data, stating that their most significant conclusions were not drawn from these data.

As I explain in my review, it would appear that their boldest conclusions were actually drawn from data on a completely different measure, at a completely different unit of analysis, and for a completely different generation. Most of their conclusions about college graduation rates appear to be based on individuals who graduated from high school in 1992 (by my tracking of their Footnote #90). Further, when evaluating data based on individual family income, the measure of middle class is entirely different, and we don’t know whether those children attended “middle class” schools or districts at all. That is, students are identified by a family income measure and placed into quartiles, regardless of the income levels of their schools. We don’t know which of them attended “middle class” schools and which did not. But we do know that they graduated about 20 years ago, reducing their relevance for the analysis quite substantially.

For these reasons, the reply by the authors does little to help explain or redeem the report. Readers should also note that these (the issues discussed above) were only a subset of the problems with the report, which included, among other things, claims about middle class under-performance refuted by their own tables on the same page.

These are severe methodological flaws of a type one does not see regularly in “high profile” reports making bold claims about the state of American public education. In my view, the Third Way’s bold proclamation about the dreadful failures of our middle class schools, supported only by severely flawed analyses, was worthy of a bold response.

A few additional comments & data clarifications:

In their reply memo, the authors list the total numbers of schools in Detroit and other cities that fall above and below their subsidized lunch cutoff points, arguing that these are the actual numbers of schools in each city included in their “middle class” group, and that this clarification entirely negates my concern as to which districts are and are not included. Again, whether the illogical and unfounded cut points were applied to school or district level data doesn’t actually matter that much. It’s bad analysis either way.

But the tabulation they provide in the memo, which is likely drawn from school level data from the NCES Common Core Public School Universe Survey, does not actually relate to the vast majority of tables and analyses reported in their original document. Either the authors simply don’t understand this, or the memo is a knowingly false representation of their analyses. Here’s a quick rundown:

  1. Financial data used in the report for per-pupil expenditure calculations are not available at the school level.
  2. Teacher salary and all teacher characteristics comparisons were based on pre-made tables drawn from Schools and Staffing Survey data, which is a SAMPLE of roughly 8,000 schools out of roughly 100,000 nationally. I point out in my review that these pre-made NCES tables reporting on SASS data would have schools within districts falling on either side of the cutoff lines. The authors do not appear to have actually used SASS data themselves, which would have provided much more flexibility in the analysis. Rather, the authors performed calculations based on tables in NCES reports using SASS data.
  3. NAEP (National Assessment of Educational Progress) data simply can’t be parsed by school within district in any way that would represent all schools within each district falling above and/or below the cut points used (as implied in their memo). NAEP data could be reported (or drawn from reports) based on average school characteristics, or based on child characteristics. Third Way appears to have used the NAEP table creator tool (see their FN#52). So, yes, the NAEP tabulations would split schools within large districts. But, to be clear, these would not match the school counts reported in their memo, because NAEP is based on sample data. Further, the problem here is that their report infers a relationship between students’ NAEP scores and the financial data when there is only partial overlap between the two, because different units are used for each. Nonetheless, the BIG takeaway from the tables of NAEP data is that students who attend the middle brackets of schools score… in the middle! Suggesting that these data reveal dreadful failures of middle class schools is delusional (in a purely statistical sense, that is)!
  4. The data on college matriculation and on graduation by age 26 (their boldest conclusions) are cited to reports done by others, most significantly to the Bowen book Crossing the Finish Line, which in its early sections (Chapter 2) includes family income quartile data based on the National Education Longitudinal Study of the 8th grade class of 1988 (NELS:88); other data in the Bowen book (as I explain in the review) are on select states only. It is entirely inappropriate to extrapolate either the NELS:88 findings or the select state findings to the national population in “middle class” schools. We may know individual students’ family income quartile, but we do not know their schools’ characteristics. Arguably, it is entirely inappropriate for Third Way, on page 5 of their reply memo, to claim regarding the completion rates of 26 year olds that “This is the major finding of our paper,” when it is, in fact, not their finding at all, but rather a citation to a finding in a book by someone else!

While the authors seem to wish to argue that my criticism of the poverty classification applied to district level data does not undermine their major conclusions, that is clearly not the case. Given the concerns that exist across a) financial input data, b) teacher characteristics data, c) achievement outcome measures and d) college completion data, and the misalignment of units across all measures, not a single conclusion of the Third Way report remains intact.

One difference between Playin’ Jazz and Policy Research: Comments on the Third Way “Middle Class” Reply

Occasionally on this blog, I slip in some jazz references. I often see commonalities between jazz improvisation and policy analysis. But I think I’ve finally found one thing that is very different.

A lot of jazz teachers will joke around with students about what to do when you’re improvising a solo over chord changes, perhaps to a standard tune, and you happen to land unintentionally on a dissonant note.  Somethin’ with a really sour sound!  The usual advice is if you hit such a note, play it even louder a few more times! Make it sound intentional. Of course, you eventually want to resolve the dissonance, not end on it. But work it until then.

Well, I’m not sure that this principle applies well to policy research. Here’s why. I just completed a review of a report by Third Way, a think tank I’d never heard of previously. Third Way released a report on what it called “Middle Class” schools, and argued that these schools aren’t making the grade. Methodologically, this report was about the most god-awful thing I’ve ever had to read.  Here is the abstract of my review:

Incomplete: How Middle Class Schools Aren’t Making the Grade is a new report from Third Way, a Washington, D.C.-based policy think tank. The report aims to convince parents, taxpayers and policymakers that they should be as concerned about middle-class schools not making the grade as they are about the failures of the nation’s large, poor, urban school districts. But, the report suffers from egregious methodological flaws invalidating nearly every bold conclusion drawn by its authors. First, the report classifies as middle class any school or district where the share of children qualifying for free or reduced-price lunch falls between 25% and 75%. Seemingly unknown to the authors, this classification includes as middle class some of the poorest urban centers in the country, such as Detroit and Philadelphia. But, even setting aside the crude classification of middle class, none of the report’s major conclusions are actually supported by the data tables provided. The report concludes, for instance, that middle-class schools perform much less well than the general public, parents and taxpayers believe they do. But, the tables throughout the report invariably show that the schools they classify as “middle class” fall precisely where one would expect them to—in the middle—between higher- and lower-income schools.

http://nepc.colorado.edu/thinktank/review-middle-class

In short, the layers of problems with the report were baffling. Among those layers was a truly absurd definition of “middle class” schools: when I went to some of the cited data sources to evaluate which schools and districts counted as “middle class,” I found districts including Detroit, Philadelphia and numerous other large, poor urban centers. Yet, throughout, the authors suggested that they were characterizing stereotypical “middle class” schools.

So, here’s the fun part. In response to my critique, did the Third Way authors consider at all the possibility that they had not done a very methodologically strong report? That their definition of “middle class” districts might have a few problems? Hell no. What did they do with that dissonant note! They took the advice of jazz instructors, and decided to defend that note, and play it loudly a few more times!

In their own words:

Let us be clear: Our decision to use this criteria was a deliberate choice, grounded in established procedures and data. http://perspectives.thirdway.org/?p=1173

But really. Let’s be more clear. While you might claim to have played this sour note deliberately, or at least be trying to convince us as much, that just doesn’t cut it in policy research. Maybe sometimes it doesn’t really work in jazz that well either. I don’t really like to see people in the front row cringe while I’m playin’, or encourage them to cringe a few more times before I provide them relief.

Please, don’t make me cringe anymore by defending indefensible criteria and shoddy analyses. It’s time to go back to the woodshed. Go home. Do some practicing. Learn the tunes. Learn the changes. It takes time and discipline, and we all play those dissonant notes some time. I’ve certainly played my share over time. Sometimes we make ’em work. A lot of the time it can’t be done. Perhaps in this way, the discipline of good policy analysis and the discipline of solid jazz improv are quite similar.

A related parable from Jazz history: http://www.guardian.co.uk/music/2011/jun/17/charlie-parker-cymbal-thrown

Oh, and a few more comments. The “middle class” definition issue is but one of many egregious flaws in the report. Among other things, the authors repeatedly refer to quartiles which are not in fact quartiles. The authors make repeated claims inferring that today’s middle class schools are only getting ¼ of their students through college by age 26, but a little detective work shows that this claim is actually cited back to a source using data on the high school class of 1992 (20 freakin’ years ago). The report confuses individuals from middle class families with students who attended schools that, on average, are middle class (not the same). Finally, the report constantly notes that middle class schools do not meet expectations, while providing tables showing that the middle class students, on average, perform where? In the middle. Right where expected!

Piloting the Plane on Musical Instruments & using SGPs to Evaluate Teachers

I’ve posted a few blogs recently on the topic of Student Growth Percentile Scores, or SGPs and how many state policymakers have moved to adopt these measures and integrate them into new evaluation systems for teachers. In my first post, I argued that SGPs are simply not designed to make inferences about teacher effectiveness.

The designers of SGPs replied to my first post, suggesting that I was conflating the measures with their use by arguing that these measures can’t and shouldn’t be used to infer teacher effectiveness. And in their response (more below), they explained in greater detail what was essentially my main point: that SGPs are not designed or intended to infer teacher effectiveness from student achievement growth. They also argued that the policymakers they have advised on adopting SGPs understood that.

Well, let’s review what’s going on in New Jersey. In New Jersey, a handful of districts have signed on to the department of education’s Pilot teacher evaluation program, explained here: http://www.state.nj.us/education/EE4NJ/faq/

Specifically, here’s how NJDOE responds to the question over how standardized testing data, and SGPs based on those data would be used within the pilot evaluations:

From NJDOE

Q:  How much weight do standardized test scores get in the evaluations?

A:  Standardized test scores are not available for every subject or grade. For those that exist (Math and English Language Arts teachers of grades 4-8), Student Growth Percentages (SGPs), which require pre- and post-assessments, will be used. The SGPs should account for 35%-45% of evaluations.  The NJDOE will work with pilot districts to determine how student achievement will be measured in non-tested subjects and grades.

Now, here is a quote from Betebenner and colleagues’ response to my criticism of policymakers proposed uses of SGPs in teacher evaluation.

From Damian Betebenner & colleagues

A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: To separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

(emphasis added)
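To make concrete what a purely descriptive growth measure looks like, here is a deliberately oversimplified sketch. To be clear, this is not the actual Colorado Growth Model, which uses quantile regression over multiple prior scores; the function name, scores and peer-grouping below are invented for illustration only:

```python
from collections import defaultdict

def simple_growth_percentiles(records):
    """Toy sketch of the growth-percentile idea: each student's current
    score is ranked only against students with the same prior score
    ("academic peers"). The real SGP model uses quantile regression over
    several prior years of scores; this captures only the intuition.
    records: list of (student_id, prior_score, current_score)."""
    peers = defaultdict(list)
    for _sid, prior, current in records:
        peers[prior].append(current)
    sgps = {}
    for sid, prior, current in records:
        group = peers[prior]
        below = sum(1 for c in group if c < current)
        sgps[sid] = round(100 * below / len(group))
    return sgps

# Invented scores. Note that students "b" and "d" earn the same current
# score (340) but get very different growth percentiles because their
# peer groups differ -- and nothing in the calculation refers to teachers.
records = [
    ("a", 300, 320), ("b", 300, 340), ("c", 300, 360),  # prior score 300
    ("d", 350, 340), ("e", 350, 355), ("f", 350, 370),  # prior score 350
]
print(simple_growth_percentiles(records))
```

The sketch illustrates the designers’ point: the measure describes relative progress. Attributing that progress to anyone requires assumptions that sit entirely outside the calculation.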

But, you see, using these data to “evaluate teachers” necessarily infers “attribution of responsibility for that progress.” Attribution of responsibility to the teacher!  If one cannot use these measures to attribute responsibility to the teacher, then how can one possibly use these measures to “evaluate” the teacher? One can’t. You can’t. No-one can. No-one should!

Perhaps in an effort to preserve proprietary interests, Betebenner and colleagues in their reply to my original criticism also note:

To be clear about our own opinions on the subject: The results of large-scale assessments should never be used as the sole determinant of education/educator quality.

No state or district that we work with intends them to be used in such a fashion. That, however, does not mean that these data cannot be part of a larger body of evidence collected to examine education/educator quality.

But this statement stands in direct conflict with the first above. If the tool is insufficient for – simply not even designed to – ATTRIBUTE RESPONSIBILITY FOR PROGRESS to either teachers or schools, then it simply can’t and SHOULDN’T BE USED THAT WAY! Be it for 10% or 90%.

The reality is that even though Betebenner and colleagues explain that they believe that the policymakers with whom they have consulted “get it” and would never consider misusing the measures in the ways I explained on my original post, that is precisely what is going on.

Also, I noted previously that this paragraph from their response is a complete cop out. I explained:

What the authors accomplish with this point, is permitting policymakers to still assume (pointing to this quote as their basis) that they can actually use this kind of information, for example, for a fixed 90% share of high stakes decision making, regarding school or teacher performance, and  certainly that a fixed 40% or 50% weight would be reasonable. Just not 100%. Sure, they didn’t mean that. But it’s an easy stretch for a policymaker.

If the measures aren’t meant to isolate system, school or teacher effectiveness, or if they were meant to but simply can’t, they should NOT be used for any fixed, defined, inflexible share of any high stakes decision making.  In fact, even better, more useful measures shouldn’t be used so rigidly.

[Also, as I’ve pointed out in the past, when a rigid indicator is included as a large share (even 40% or more) in a system of otherwise subjective judgments, the rigid indicator might constitute 40% of the weight but drive 100% of the decision.]
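A quick hypothetical shows how this happens. All names, weights and scores below are invented; the scenario simply assumes observation ratings cluster tightly (as they often do) while the test-based component varies widely:

```python
# Invented illustration: a rigid indicator weighted at 40% can fix 100%
# of the rank order when the subjective 60% barely varies across teachers.
teachers = {
    "T1": {"observation": 3.6, "sgp": 25},
    "T2": {"observation": 3.5, "sgp": 55},
    "T3": {"observation": 3.7, "sgp": 80},
    "T4": {"observation": 3.6, "sgp": 40},
}

def composite(t, w_rigid=0.40):
    # Rescale the observation (1-4 scale) and the SGP (1-99 scale)
    # to 0-100 before applying the 60/40 weights.
    obs_pct = (t["observation"] - 1) / 3 * 100
    return (1 - w_rigid) * obs_pct + w_rigid * t["sgp"]

by_composite = sorted(teachers, key=lambda k: composite(teachers[k]))
by_sgp = sorted(teachers, key=lambda k: teachers[k]["sgp"])
print(by_composite)                 # ['T1', 'T4', 'T2', 'T3']
print(by_composite == by_sgp)       # True: the 40% component alone
                                    # determines the final ranking
```

With observation ratings compressed into a 0.2-point band, the composite ordering is identical to the ordering by SGP alone; the "60%" contributes to the level of each score but nothing to the ranking on which decisions would turn.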

Look. It’s pretty simple. If you want to pilot an airplane effectively, the plane needs to have the right instruments – flight instruments. If you’re coming in for a landing in dense fog in mountainous terrain, you look down to where your flight instruments should be, http://www.b737.org.uk/images/fltinsts_panel_nonefis.jpg, and there sits an alto saxophone instead (albeit a fine, Selmer Mark VI w/serial # in the 180s), you’re screwed. You might have a few minutes left to blow through the changes to Foggy Day, but your chances of successfully piloting the plane to a safe landing are severely diminished.

Okay, this analogy is a bit of a stretch. But it is not a stretch to acknowledge that SGPs were simply not designed to attribute responsibility for student progress to teachers. Meanwhile, VAM models try, but are unable to effectively, accurately or precisely attribute student progress to teachers. So, we have a choice of piloting the plane with either a) the wrong instruments (SGP) or b) instruments that don’t work very well (have high error rates & comparable problems of inference).  When faced with choices this bad, it may be wise to take another course entirely. Don’t pilot the damn plane! It would be a shame to crash it with such a beautiful saxophone on board!

On ignorance & impartiality: A comment on the Monmouth U. Poll on Ed. Policy

Some Twitter followers may have noticed the ongoing back and forth regarding the validity of the recent Monmouth University Poll on education reform. I’d certainly rather spend my time on more substantive discussion.

As I’ve noted on many occasions, polls are what they are. They ask what they ask. And the responses to the questions must always be evaluated only with respect to what was asked. Questions about specific policies in particular require that the policies in question be described correctly. This is a point raised the other day by Matt Di Carlo about the Monmouth Poll here.

Yesterday, Patrick Murray, director of the polling institute, posted a response to some of the criticisms levied against the recent Monmouth poll. Unfortunately, I found his response to be much less fulfilling and in many ways far more disturbing than the poll itself. Quite honestly, I’d have left this issue alone if not for some of his particularly troublesome assertions.

First, here is my response regarding the substantive issue raised by Matt Di Carlo:

Mr. Murray points out that he, as many pollsters do, chose to use colloquial language to describe “tenure.” The problem, as explained by Matt Di Carlo here http://shankerblog.org/?p=3695, is that the colloquial characterization was factually incorrect, and that it would be possible to achieve a colloquial characterization that is not factually incorrect. The factual error in the characterization of tenure leads to a clear bias in the question. This is the most obvious example, but there are numerous more subtle cases where questions do not accurately represent existing or proposed legislation or regulations.

Here are a few additional points regarding content in Mr. Murray’s response:

Specifically, Mr. Murray contends that critics were simply unhappy with the results, and offered no substantive criticism of the methods.

On Twitter, I have criticized the title of the press release for the poll, which claims that the poll results indicate broad support for New Jersey reforms, implying that responses to the specific questions regarding policies can be taken as supporting the specific policies being proposed. That is, it implies a close relationship between the policies framed in the questions and actual policy proposals on the table. Usually, it is the media that makes such misguided leaps. In this case, the polling institute provided them with the misleading headline.

Mr. Murray’s response not only defends the headline, but he actually makes even less justified statements (slightly more specific) to the same effect. Mr. Murray claims that the poll results provide “broad, general support” for the “Governor’s proposals”, which happen to be rather specific proposals (many of which are not actually the governor’s proposals, but proposals for which he has offered support).  But, very few (if any) of the questions in the poll accurately represent the specific proposals (like mischaracterizing what tenure is).  The questions are broad, and imprecise (if intended to discern support for existing proposals). They are general. Some are outright incorrect. As a researcher, I can assure you that a response to one question, referring to one type of policy (a hypothetical policy that is substantively different from the actual proposals) should not be interpreted as relating to another (without careful statistical validation, which would involve asking the other question).  That is a methodological concern. Not a concern with the findings. It is a concern largely over the representation of findings (press release titles matter), as opposed to the usual quibbling over sampling issues.

After defending the wording of the tenure question, Mr. Murray goes on to discuss the follow up questions to the tenure question – specifically those about how the general public would like to see tenure changed. The problem is that each of these questions about how to “change” tenure is invalid because “change” in the mind of the respondent (at least the uninformed respondent) is measured against an incorrectly defined baseline of what tenure is. That is, Mr. Murray has provided a prompt in the first tenure question that incorrectly describes tenure, asserting that tenure means that a teacher can only be fired for “serious misconduct.” Then he asks in a series of questions whether that should be changed and how. If the baseline condition – existing policy – is described incorrectly, arguably biased – then responses to subsequent questions are influenced by this. That is either biased, or simply sloppy.

Which brings up a related issue. Mr. Murray notes that many if not most poll respondents were unaware of policies, or details of reforms. Because of that, the phrasing of the questions, the colloquial explanations of the policies are of even greater importance, having even greater potential to shape the response. That phrasing can be the basis of grossly misinforming the otherwise uninformed respondent. And it just may have been.

The most significant and most disturbing point:

Setting aside this methodological quibbling, I take issue with Mr. Murray’s point that academic researchers might come at these issues with normative values – as I admittedly do – and that having normative values (based on years of extensive research on these topics) somehow invalidates someone’s ability to critique the poll. Mr. Murray explains:

 To start, most of the criticism has come from people without expertise in the field of survey research.  Some has, which I will treat more seriously.  But it’s important to note that all of these critics, including some who are academic researchers, have taken very public normative positions on education policy.  Normative is one of those great social science words.  It simply means they already have a clear opinion about how things ought to be.  When normative values get applied in a research setting, they lead to bias.

So, in other words: if you don’t have expertise in opinion research, your criticisms should not be taken seriously. And if you have far too much knowledge and expertise in the substance of the poll (education law, policy and reform), you are too biased for your opinion to carry any weight. This argument is patently absurd.

As Mr. Murray frames it, only through blissful ignorance of issues of substance can anyone be sufficiently impartial to be involved in, or make claims or arguments regarding, either substance or method. Those with knowledge, and opinions derived from that knowledge, are necessarily too biased to have valid concerns. I’ll admit that I have a bias for rigorous research methodologies.

Like Dr. Di Carlo (who holds a Ph.D. in Sociology from Cornell), I’m not a pollster. I’m a researcher and perhaps that alters my view on how research is conducted and what kinds of conclusions can be reasonably drawn from survey responses to questions with specific wording.  I generally don’t care much for polls or polling results, but I am a stickler for methods.

This poll was about policies, not politicians. And as someone who studies policies, I am particularly sensitive to the details of policy design and implementation. This poll was clearly not sensitive to those details and was exceptionally sloppy in its characterization of policies and policy design. That is a methodological problem, and one that is glaringly apparent to me not because of some normative bias, but because of my academic expertise in this area and attention to actual details, including statutes and regulations.

Perhaps I’m being too picky, and that’s just how the polling industry works. Perhaps the normative values of pollsters allow for imprecise colloquial descriptions and drawing broad unsubstantiated conclusions. That seems to be the gist of Patrick Murray’s argument, and one I find distasteful enough to require a response.

Inkblots and Opportunity Costs: Pondering the Usefulness of VAM and SGP Ratings

I spent some time the other day, while out running, pondering the usefulness of student growth percentile estimates and value added estimates of teacher effectiveness for the average school or district level practitioner. How would they use them? What would they see in them? How might these performance snapshots inform practice?

Let’s just say I am skeptical that either VAMs (Value Added Models) or SGPs (Student Growth Percentiles) can provide useful insights to anyone who doesn’t have a pretty good understanding of the nuances of these kinds of data/estimates and the underlying properties of the tests. If I were a principal, would I rather have the information than not? Perhaps. But I’m someone whose primary collecting hobby is, well, collecting data. That doesn’t mean it all has meaning, or more specifically, that it has sufficient meaning to influence my thinking or actions. Some does. Some doesn’t. Keeping some of the data that doesn’t have much meaning actually helps me to delineate. But I digress.

It seems like we are spending a great deal of time and money on these things for questionable return. We are investing substantial resources in simply maximizing the links in our data systems between individual students’ records and their classroom teachers of record, hopefully increasing our coverage to, oh, somewhere between 10% and 20% of teachers (those with intact, single-teacher classrooms, serving children who already have a track record of prior tests – e.g., upper elementary classroom teachers).

At the outset of this whole “statistical rating of teachers” endeavor, it was perhaps assumed by some economists that we would just ram these things through as large scale evaluation tools (statewide and in large urban districts) and use them to prune the teacher workforce and that would make the system better. We’d shoot first… ask questions later (if at all). We’d make some wrong decisions, hopefully statistically more “right” than wrong, and we’d develop a massive model and data set for large enough numbers of teachers that the cost per unit (cost per bad teacher correctly fired, counterbalanced by the cost per good teacher wrongly fired) would be relatively low. We’d bring it all to scale, and scale would mean efficiency.

Now, I find this whole version of the story to be too offensive to really dig into here and now. I’ve written previously about “smart selection” versus “dumb selection” regarding personnel decisions in schools. And this would be what I called “dumb selection.”

But, it also hasn’t necessarily played out this way… thankfully… except perhaps for some large city systems like Washington, DC, and a few more rigidly mandated state systems (though we’re mostly in wait-and-see mode there as well). Instead, we are now attempting to be more “thoughtful” about how we use this stuff and asking teachers to ponder their statistical ratings for insights into how they interact with children? How they teach? And we are asking administrators to ponder teachers’ statistical estimates for any meaning they might find.

In my current role, as a researcher of education policy, I love equations like this: http://graphics8.nytimes.com/images/2011/03/07/education/07winerip_graphic/07winerip_graphic-articleLarge-v2.jpg

I like to see the long lists of coefficients (estimates of how some measure in the model relates to the dependent variable) spit out in my Stata logs and ponder what they might mean, with full consideration of what I’ve chosen to include or exclude in the model, and whether I’m comfortable that the measures on both sides of the equation are of sufficient quality to really tell me anything… or at least something.

The other evening, I thought back to my teaching days (considered a liability as an education policy researcher), and asked whether it would have been useful to me to simply have some rating of my aggregate effectiveness – simply relative to other teachers. Nothing specific about the performance of my students on specific content/concepts. Just some abstract number… like the relative rarity that my students scored X at the end of my class given that they scored X-Y at the end of last year’s class? Or, some generalized “effectiveness” rating category based on whether my coefficient in the model surpassed a specific cut score to call me “exceptional” or merely “adequate?” Something like this.

Would that be useful to me? To the principal? If I was the principal?

Given that I typically taught 2 sections of 7th grade life science and 2 of 8th grade physical science (yeah… cushy private school job), with class sizes of about 18 students each, which rotated through different times of day, I might also find it fun to compare growth of my various classes. Did the disruptive distraction kid really cause my ratings in one life science section to crash (you know who you are!)? Was the same kid able to bring her 8th grade teacher down the next year (hopefully not me again!)?

I asked myself… would those ratings actually tell me anything about what I should do next year (accepting that the data would come on a yearly cycle)? Should I go watch teachers who got better ratings? Could I? Would they protect their turf? Would that even tell me a damn thing? Besides, knowing what I do now, I also know that large shares of the teachers who got a better rating likely got that rating either because of a) random error/noise in the data or b) some unmeasured attribute of the students they serve (bias). Of course, I didn’t know that then, so what would I think?
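For what it’s worth, point (a) is easy to illustrate with a quick simulation (all parameters here are hypothetical, chosen only to mimic the reliability range reported in the research literature on single-year value-added estimates): give each teacher a stable “true” effect, add single-year noise, and watch how unstable the resulting ratings are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: suppose a teacher's "true" effect accounts for
# only ~35% of the variance in a single-year rating (an assumed reliability,
# roughly in the range reported for single-year estimates).
n_teachers = 1000
reliability = 0.35
true_effect = rng.normal(0, 1, n_teachers)
noise_sd = np.sqrt((1 - reliability) / reliability)

year1 = true_effect + rng.normal(0, noise_sd, n_teachers)
year2 = true_effect + rng.normal(0, noise_sd, n_teachers)

# Year-to-year correlation of the observed ratings hovers near the reliability.
r = np.corrcoef(year1, year2)[0, 1]

# Of teachers rated "bottom quintile" in year 1, how many stay there in year 2?
bottom1 = year1 <= np.quantile(year1, 0.2)
bottom2 = year2 <= np.quantile(year2, 0.2)
stay_rate = (bottom1 & bottom2).sum() / bottom1.sum()
```

Under these assumed numbers, well under half of the “bottom quintile” teachers stay in the bottom quintile the following year – which is exactly why a single year of these ratings reads like an ink blot.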

My gut instinct is that any of these aggregate indicators of a teacher’s relative effectiveness, generated from complex statistical models, with, or without corrections for other factors, are little more than ink blots to most teachers and administrators. And I’m not convinced they’ll ever be anything more than that. They possess many of the same attributes of randomness or fuzziness of an ink blot. And while the most staunch advocate might wish them to appear as an impressionist painting, I expect they are still most often seen as ink blots – not even a Jackson Pollock. More random than pattern. And even if/when there is a pattern, the average viewer may never pick it up.

I anxiously (though skeptically) await well crafted qualitative studies exploring stakeholders’ interpretations of these inkblots.

But these aren’t just any ink blots. They are rather expensive ink blots if and when we start trying to use them in more comprehensive and human-resource-intensive ways through local public schools and districts, and if we place on them the burden that we MUST use them not merely to inform, but rather to DRIVE our decisions – and must find significant meaning in them to justify doing so. That is, if we really expect teachers and principals to log significant hours trying to derive meaning from them, after consultants, researchers, central office administrators and state department officials have labored over data system design, linking teachers to students, and deciding on the most aesthetically pleasing representation of teacher performance classifications for the individual reporting system. Using these tools as quick screening, blunt instruments is certainly a bad idea. But is this – staring at them for endless hours in search of meaning that may not be there – much better?

It strikes me that there are a lot more useful things we could/should/might be spending our time looking at in order to inform and improve educational practice or evaluate teachers. And that the cumulative expenditure on these ink blots, including the cost of time spent musing over them, might be better applied elsewhere.

More on the SGP debate: A reply

This new post from Ed News Colorado is in response to my critique of Student Growth Percentiles here: https://schoolfinance101.wordpress.com/2011/09/02/take-your-sgp-and-vamit-damn-it/

I must say that I agree with almost everything in this response to my post, except for a few points. First, they argue:

Unfortunately Professor Baker conflates the data (i.e. the measure) with the use. A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: To separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

No, I do not conflate the data and measures with their proposed use. Policy makers are doing that, and doing so on the ill-advised counsel of other policymakers who don’t see the important point – the primary purpose – as Betebenner, Briggs and colleagues explain. This is precisely why I use their work in my previous post – because it explains their intent and provides their caveats.

Policymakers, by contrast, are pitching the direct use of SGPs in teacher evaluation. Whether the designers intended this or not, that’s what’s happening. Perhaps this is because they are not explaining, as bluntly as they do here, what the actual intent/design was.

Further, I should point out that while I have marginally more faith that a VAM could, in theory, be used to parse out teacher effects than an SGP (which isn’t even intended to do so), I do not have any more faith than they do that a VAM actually can accomplish this objective. They interpret my post as follows:

Despite Professor Baker’s criticism of VAM/SGP models for teacher evaluation, he appears to hold out more hope than we do that statistical models can precisely parse the contribution of an individual teacher or school from the myriad of other factors that contribute to students’ achievement.

I’m not, as they would characterize, a VAM supporter over SGP, and any reader of this blog certainly realizes that. However, it is critically important that state policymakers be informed that SGP is not even intended to be used in this way. I’m very pleased they have chosen to make this the central point of their response!

And while SGP information might reasonably be used in another way, if used as a tool for ranking and sorting teacher or school effectiveness, SGP results would likely be more biased even than VAM results… and we may not even know or be able to figure out to what extent.

I agree entirely with their statement (but for the removal of “freakin”):

We would add that it is a similar “massive … leap” to assume a causal relationship between any VAM quantity and a causal effect for a teacher or school, not just SGPs. We concur with Rubin et al (2004) who assert that quantities derived from these models are descriptive, not causal, measures. However, just because measures are descriptive does NOT imply that the quantities cannot and should not be used as part of a larger investigation of root causes.

The authors of the response make one more point, that I find objectionable (because it’s a cop out!):

To be clear about our own opinions on the subject: The results of large-scale assessments should never be used as the sole determinant of education/educator quality.

What the authors accomplish with this point is permitting policymakers to still assume (pointing to this quote as their basis) that they can actually use this kind of information for, say, a fixed 90% share of high stakes decision making regarding school or teacher performance – and certainly that a fixed 40% or 50% weight would be reasonable. Just not 100%. Sure, they didn’t mean that. But it’s an easy stretch for a policymaker.

If the measures aren’t meant to isolate system, school or teacher effectiveness, or if they were meant to but simply can’t, they should NOT be used for any fixed, defined, inflexible share of any high stakes decision making.  In fact, even better, more useful measures shouldn’t be used so rigidly.

[Also, as I’ve pointed out in the past, when a rigid indicator is included as a large share (even 40% or more) in a system of otherwise subjective judgments, the rigid indicator might constitute 40% of the weight but drive 100% of the decision.]
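A toy simulation makes that bracketed point concrete (the numbers are invented purely for illustration): when the subjective components of an evaluation are compressed near the top of their scale, as observation ratings often are, the widely varying “rigid” component dominates the composite ranking despite its nominal 40% weight.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Invented illustration: a rigid test-based score with a wide spread, and
# subjective observation ratings compressed near the top of their scale.
rigid = rng.normal(50, 15, n)        # wide spread
subjective = rng.normal(90, 2, n)    # compressed spread

# Nominal weights: 40% rigid indicator, 60% subjective judgment.
composite = 0.4 * rigid + 0.6 * subjective

def ranks(x):
    """Return the 0-based rank of each element."""
    return np.argsort(np.argsort(x))

# Rank correlation between the composite and the rigid component alone:
# it comes out near 1, i.e., the 40% component drives nearly 100% of the
# resulting ordering of teachers.
rank_corr = np.corrcoef(ranks(composite), ranks(rigid))[0, 1]
```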

So, to summarize, I’m glad we are, for the most part, on the same page. I’m frustrated that I’m the one who had to raise this issue, in part because it was pretty clear to me from reading the existing work on SGPs that many were conflating the measure with its use. I’m still concerned about the use, and especially concerned in the current policy context. I hope in the future that the designers and promoters of SGP will proclaim more loudly and clearly their own caveats – their own cautions – and their own guidelines for appropriate use.

Simply handing off the tool to the end user and then walking away in the face of misuse and abuse would be irresponsible.

Addendum: By the way, I do hope the authors will happily testify on behalf of the first teacher who is wrongfully dismissed or “de-tenured” on the basis of 3 bad SGPs in a row. That they will testify that SGPs were never intended to assume a causal relationship to teacher effectiveness, nor can they be reasonably interpreted as such.

Revisiting why comparing NAEP gaps by low income status doesn’t work

This is a compilation of previous posts, in response to the egregious abuse of data presented on Page 3, here: http://www.scribd.com/fullscreen/64717249

Pundits love to make cross-state comparisons and rank states on a variety of indicators, something I’m guilty of as well.[1] A favorite activity is comparing NAEP test scores across subjects, including comparing which states have the biggest test score gaps between children who qualify for subsidized lunch and children who don’t. The simple conclusion – states with big gaps are bad – inequitable – and states with smaller gaps must be doing something right!

It is generally assumed by those who report these gaps and rank states on achievement gaps that these gaps are appropriately measured – comparably measured – across states. That a low-income child in one state is similar to a low-income child in another. That the average low-income child or the average of low-income children in one state is comparable to the average of low-income children in another, and that the average of non-low income children in one state is comparable to the average of non-low income children in another.  Unfortunately, however, this is a deeply flawed assumption.

Let’s review the assumption. Here’s the basic framing adopted by most who report on this stuff:

Non-Poor Child Test Score – Poor Child Test Score = Poverty Achievement Gap

Non-Poor Child in State A = Non-Poor Child in State B

Poor Child in State A = Poor Child in State B

These conditions have to be met for there to be any validity to rankings of achievement gaps.

Now, here’s the problem.

Poor = child from a family with income below 185% of the federal poverty income threshold (the reduced price lunch cut-off)

Therefore, the measurement of an achievement gap between “poor” and “non-poor” is:

Average NAEP of children above 185% poverty threshold – Average NAEP of children below 185% poverty threshold = “Poverty” achievement Gap

But, the income level for poverty is not varied by state or region.[2]

As a result, the distribution of children and their families above and below the specified threshold varies widely from state to state, and comparing the average performance of the groups of children above that threshold and below it is not particularly meaningful.  Comparing those gaps across states is really problematic.
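Here’s a toy simulation of the problem (every parameter is invented; it’s the logic that matters): two “states” with the exact same underlying relationship between income and test scores, but different income distributions, produce different measured gaps from the very same national cut point.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented illustration: both "states" share an identical score-vs-income
# relationship; only the income distribution differs relative to a single
# national low-income cut point.
def measured_gap(median_income, cut=40_000, n=200_000):
    income = rng.lognormal(np.log(median_income), 0.6, n)
    # Assume score rises with log income, identically in both states.
    score = 250 + 25 * np.log(income / cut) + rng.normal(0, 20, n)
    below = income < cut
    return score[~below].mean() - score[below].mean()

gap_poorer_state = measured_gap(median_income=35_000)  # cut sits near the median
gap_richer_state = measured_gap(median_income=80_000)  # cut sits in the left tail

# The measured "poverty gaps" differ even though the true income-score
# gradient is identical in both states.
```

In this contrived setup the richer state shows the larger gap simply because the cut point slices its distribution further into the tail – no difference in schooling required.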

Here are graphs of the poverty distributions (using a poverty index where 100 = 100%, or income at the poverty level) for families of 5 to 17 year olds in New Jersey and in Texas. These graphs are based on data from the 2008 American Community Survey (from http://www.ipums.org). They include children attending either/both public and private school.

Figure 1

Poverty Distribution (Poverty Index) and Reduced Price Lunch Cut-Point

 

Figure 2

Poverty Distribution (Poverty Index) and Reduced Price Lunch Cut-Point

 

To put it really simply, comparing the above the line and below the line groups in New Jersey means something quite different from comparing the above the line and below the line groups in Texas, where the majority are actually below the line… but where being below the line may not by any stretch of the imagination be associated with comparable economic deprivation. Further, in New Jersey, much larger shares of the population are distributed toward the right hand end of the distribution – the distribution is overall “flatter.” These distributional differences undoubtedly have significant influence on the estimation of achievement gaps. As I often point out, the size of an achievement gap is as much a function of the height of the highs as it is a function of the depth of the lows.[3]

How does this matter when comparing poverty achievement gaps?

In the above charts, while I show how different the poverty and income distributions were in Texas and New Jersey as an example, those charts don’t explain how/why these distribution differences thwart comparisons of low-income vs. non-low income achievement gaps. Yes, it should be clear enough that the above the line and below the line groups just aren’t similar across these two states and/or nearly every other.

A logical extension of the analysis in that previous post would be to look at the relationship between:

Gap in average family total income between those above and below the free or reduced price lunch cut-off

AND

Gap in average NAEP scores between children from families above and below the free or reduced price lunch cut-off

If there is much (or any) of a relationship between the income gaps and the NAEP gaps – that is, states with larger income gaps between the poor and non-poor groups also have larger achievement gaps – such a finding would call into question the usefulness of state comparisons of these gaps.

So, let’s walk through this step by step.

First, Figure 3 shows the relationship across states between the NAEP Math Grade 8 scores and family total income levels for children in families ABOVE the free or reduced cutoff:

Figure 3

There is a modest relationship between income levels of non-low income children and NAEP scores. Higher income states generally have higher NAEP scores. No adjustments are applied in this analysis to the value of income from one location to another, mainly because no adjustments are applied in the setting of the poverty thresholds. Therein lies at least some of the problem. The rest lies in using a simple ABOVE vs. BELOW a single cut point approach.

Second, Figure 4 shows the relationship between the average income of families below the free or reduced lunch cut point and the average NAEP scores on 8th Grade Math (2009).

Figure 4

 

This relationship is somewhat looser than the previous relationship and for logical reasons – mainly that we have applied a single low-income threshold to every state, and the average income of individuals below that single income threshold does not vary as widely across states as the average income of individuals above that threshold. Further, the income threshold is arbitrary and not sensitive to the differences in the value of any given income level across states. But still, there is some variation, with some states having much larger clusters of very low-income families below the free or reduced price lunch threshold (Mississippi, for example).

But, here’s the most important part. Figure 5 shows the relationship between income gaps estimated using the American Community Survey data (www.ipums.org) from 2005 to 2009 and NAEP Gaps. This graph addresses directly the question posed above – whether states with larger gaps in income between families above and below the arbitrary low-income threshold also have larger gaps in NAEP scores between children from families above and below the arbitrary threshold.

Figure 5

In fact, they do. And this relationship is stronger than either of the two previous relationships. As a result, it is somewhat foolish to try to make any comparisons between achievement gaps in states like Connecticut, New Jersey and Massachusetts versus states like South Dakota, Idaho or Wyoming. It is, for example, more reasonable to compare New Jersey and Massachusetts to Connecticut, but even then, other factors may complicate the analysis.

How does this affect state ranking gaps? Re-ranking New Jersey

New Jersey’s current commissioner of education seems to stake much of his argument for the urgency of implementing reform strategies on the claim that while New Jersey ranks high on average performance, New Jersey ranks 47th in the achievement gap between low-income and non-low income children (video here: http://livestre.am/M3YZ).

And just yesterday, a New Jersey Governor’s Task Force report used New Jersey’s egregious poverty achievement gap as the primary impetus for the immediate need for reform: http://www.scribd.com/fullscreen/64717249 (In my view, all that follows in this report is severely undermined by the fact that those who drafted the report clearly do not have even the most basic understanding of data on poverty and achievement!)

To be fair, this is classic political rhetoric with few or no partisan boundaries.

To review, comparisons of achievement gaps across states between children in families above the arbitrary 185% income level and below that income level are very problematic. In my last post on this topic, I showed that in states where there is a larger gap in income between these two groups (the above and below the line groups), there is also a larger gap in achievement. That is, the size of the achievement gap is largely a function of the income distribution in each state.

Let’s take this all one more, last step and ask – If we correct for the differences in income between low and higher income families – how do the achievement gap rankings change? And, let’s do this with an average achievement gap for 2009 across NAEP Reading and Math for Grades 4 and 8.

Figure 6 shows the differences in income for lower and higher income children, with states ranked by the income gap between these groups:

Figure 6

 

Massachusetts, Connecticut and New Jersey have the largest income gaps between families above and below the arbitrary Free or Reduced Price Lunch income cut off.

Now, let’s take a look (Figure 7) at the raw achievement gaps averaged across the four tests:

Figure 7

 

New Jersey has a pretty large raw gap, coming in 5th among the lower 48 states (note there are other difficulties in comparing the income distributions in Alaska and Hawaii, in relation to free/reduced lunch cut points). Connecticut and Massachusetts also have very large achievement gaps.

One can see here, anecdotally, that states with larger income gaps in the first figure are generally those with larger achievement gaps.

Here’s the relationship between the two (Figure 8):

Figure 8

In this graph, a state that falls on the diagonal line is a state where the achievement gap is right on target for the expected achievement gap, given the difference in income for those above and below the arbitrary free or reduced price lunch cut-off. New Jersey falls right on that line. States falling on the line have relatively “average” (or expected) achievement gaps.

One can take this the next step to rank the “adjusted” achievement gaps based on how far above or below the line a state falls. States below the line have achievement gaps smaller than expected and above the line have achievement gaps larger than expected. At this point, I’m not totally convinced that this adjustment is capturing enough about the differences in income distributions and their effects on achievement gaps. But it makes for some fun adjustments/comparisons nonetheless. In any case, the raw achievement gap comparisons typically used in political debate are pretty meaningless.
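For readers who want the flavor of that adjustment in code, here’s a sketch using made-up placeholder values (NOT the actual ACS income gaps or NAEP gaps): fit the line, take each state’s residual, re-rank by the residual.

```python
import numpy as np

# Made-up placeholder values for five hypothetical states, purely to show
# the mechanics of the adjustment (fit a line, rank by residual).
states = np.array(["A", "B", "C", "D", "E"])
income_gap = np.array([55_000.0, 48_000.0, 30_000.0, 42_000.0, 25_000.0])
raw_gap = np.array([30.0, 29.0, 24.0, 22.0, 26.0])

# OLS fit: the "expected" achievement gap given each state's income gap.
slope, intercept = np.polyfit(income_gap, raw_gap, 1)
expected_gap = slope * income_gap + intercept

# Residual: above the line = gap larger than expected; below = smaller.
adjusted_gap = raw_gap - expected_gap

# Re-rank states from largest adjusted gap to smallest.
ranking = states[np.argsort(adjusted_gap)[::-1]]
```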

Here are adjusted achievement gap rankings (Figure 9):

Figure 9

Here, NJ comes in 27th in achievement gap. That is 27th from largest. That is, New Jersey’s adjusted achievement gap between higher and lower-income students, when correcting for the size of the income gap between those students, is smaller than the gap in the average state.


[3] For further explanation of the problems with poverty measurement across states, using constant thresholds, and proposed solutions see: Renwick, Trudi. Alternative Geographic Adjustments of U.S. Poverty Thresholds: Impact on State Poverty Rates. U.S. Census Bureau, August 2009. https://xteam.brookings.edu/ipm/Documents/Trudi_Renwick_Alternative_Geographic_Adjustments.pdf

Friday Afternoon Maps: New Orleans, Race & School Locations

A few weeks back, I noticed several tweets about this recent article in Harvard Education Review which takes a look at racial politics and the rebuilding of New Orleans in the Post-Katrina era.

Here’s the dropbox link tweeted by Diane Ravitch:

http://dl.dropbox.com/u/11116752/Buras_2011-Race_Charter_Schools_Conscious_Capitalism.pdf.pdf

The article is by Kristen Buras of Georgia State University. Buras, like at least a few others, points out that Hurricane Katrina forced the greatest housing displacement in poor black neighborhoods of New Orleans. But, perhaps more disturbing was that in the post Katrina period, redevelopment… and especially redevelopment of the new, mixed delivery schooling system largely ignored those same areas, leading to a system where access to schooling is very disparately distributed geographically.

In her article, Buras went to the painstaking lengths of hand-plotting the locations of post-Katrina schools (see her Figure 3, page 321) to make her point about school locations, and that map certainly does so, though a good before-after comparison might be even clearer.

I’ve been meaning to do some pre-post Katrina school mapping for some time now, but wasn’t quite sure what I wanted to look at, or how I might organize the information. Well, here’s what a little Friday afternoon play has yielded.

First, I used US Census 2000 and American Community Survey 2005 data to set up my background. The background carves New Orleans into Public Use Micro Data Areas (PUMAS, from http://www.ipums.org, boundary files from http://www.census.gov). For the background shading, I used IPUMS data to estimate the percent of resident 5 to 17 year olds in each PUMA that were Black in 2000 and 2005 – pre-Katrina conditions. Those red areas to the right hand side, over toward the lower 9th ward and to the Northeast are almost entirely black, for school aged population. While the entire city has relatively high shares of black population, as Buras notes, uptown and the Garden District are certainly somewhat less black than other parts of the city.

In the first map here, I show the locations and total enrollments of schools (indication of available slots) for the year 2000. I use yellow triangles to indicate if a school is a charter school. There were a few, even in 2000. School locations are based on latitude and longitude data from the National Center for Education Statistics Common Core of Data (www.nces.ed.gov/ccd).

Map 1. Year 2000 distribution of traditional public and charter schools in New Orleans


In the first figure, there are a significant number of decent size schools in the deeper red (higher % black) areas of the city. Citywide, there are a handful of charters scattered around.

Now, here’s the distribution of charters and traditional public schools in 2010. Yes, the city as a whole lost a lot of population (but did rebound somewhat between 2006 and 2010, hence the interest in 2010). Quite strikingly, there are simply very few schools of any size now available in those deep red zones (shading still based on pre-Katrina population). And while there are charters scattered throughout the city, even the highest concentration of those schools is in areas with marginally lower pre-Katrina black populations. There are generally more, and larger, schools in those neighborhoods.

Again, circle size indicates enrollment size, and if the circle has a yellow triangle over it, the school is a charter school. Further, I’ve kept the size scaling of circles on the same scale in this map as in the previous one. So, if a circle is smaller, its enrollment is smaller.

Map 2. Year 2010 distribution of traditional public and charter schools in New Orleans

Now, it is indeed hard to untangle supply from demand here. One can make the argument that the population didn’t return, therefore there is no demand for schools in those areas previously inhabited by the city’s lowest income black populations. Alternatively, one can as reasonably (and more so after reading Buras) argue that the dearth of available public services may provide some explanation for why families have not returned, or have not been able to return.

One might argue that because there exist so many “schools of choice” throughout the city, geographic location doesn’t really matter. Ya’ just got to travel a bit. Sign up for one of those great schools over there! But research has consistently shown that even in “choice” models, geographic location/proximity is central to enrollment decisions. Location matters. And having quality options nearby is important. In fact, parents will often favor location over publicly available “quality” measures, continuing enrollment in schools identified as persistently failing if/when other options are simply not geographically accessible. Then again, those “quality” measures aren’t always particularly meaningful.

This population density map for individuals 18 and under suggests comparable population densities in those areas where school density (especially charter school density) has remained much lower: http://www.gnocdc.org/LossOfChildrenInNewOrleansNeighborhoods/Map3.html

Authors such as Henry Levin have explained on numerous occasions that for a choice model to yield equitable distribution of opportunity, consumers must have equitable access to information on schools and equitable mobility among options. Clearly, equitable geographic access is out the window in Post-Katrina New Orleans. Yeah, I think we already knew this from various media reports. But sometimes I have to play with the data and map them myself for it to really sink in. Whether driven by geographic assignment or by choice enrollment, the distribution of educational opportunities in Map 2 above is troublesome.

Far more troublesome is that so many have publicly pitched this New Orleans mixed delivery model as the key to the future of urban education.

Like Buras, I’m pretty damn skeptical that an education system that has redistributed educational opportunity in the ways seen between Map 1 and Map 2 above is all that.  Just pondering and mapping on a Friday afternoon as the sun finally emerges in the rain-soaked Northeast.

Related maps on school aged population loss here: http://www.gnocdc.org/LossOfChildrenInNewOrleansNeighborhoods/index.html


Take your SGP and VAMit, Damn it!

In the face of all of the public criticism over the imprecision of value-added estimates of teacher effectiveness, and debates over whether newspapers or school districts should publish VAM estimates of teacher effectiveness, policymakers in several states have come up with a clever shell game. Their argument?

We don’t use VAM… ‘cuz we know it has lots of problems, we use Student Growth Percentiles instead. They don’t have those problems.

WRONG! WRONG! WRONG! Put really simply, as a tool for inferring which teacher is “better” than another, or which school outperforms another, SGP is worse, not better than VAM. This is largely because SGP is simply not designed for this purpose. And those who are now suggesting that it is are simply wrong. Further, those who actually support using tools like VAM to infer differences in teacher quality or school quality should be most nervous about the newly found popularity of SGP as an evaluation tool.

To a large extent, the confusion over these issues was created by Mike Johnston, a Colorado State Senator who went on a road tour last year pitching the Colorado teacher evaluation bill and explaining that the bill was based on the Colorado Student Growth Percentile Model, not that problematic VAM stuff. Johnston naively pitched to legislators and policymakers throughout the country that SGP is simply not like VAM (True) and that, therefore, SGP is not susceptible to all of the concerns that have been raised based on rigorous statistical research on VAM (Patently FALSE!). Since that time, Johnston’s rhetoric that SGP gets around the perils of VAM has been widely adopted by state policymakers in states including New Jersey, and these state policymakers’ understanding of SGP and VAM is hardly any stronger than Johnston’s.

This brings me back to my exploding car analogy. I’ve pointed out previously that if we lived in a society where pretty much everyone still walked everywhere, and then someone came along with this new automotive invention that was really fast and convenient, but had the tendency to explode on every third start, I think I’d walk. I use this analogy to explain why I’m unwilling to jump on the VAM bandwagon, given the very high likelihood of falsely classifying a good teacher as bad and putting their job on the line – a likelihood of misfire that has been validated by research.  Well, if some other slick talking salesperson (who I refer to as slick Mikey J.) then showed up at my door with something that looked a lot like that automobile and had simply never been tested for similar failures, leading the salesperson to claim that this one doesn’t explode (for lack of evidence either way), I’d still freakin’ walk! I’d probably laugh in his face first. Then I’d walk.

Origins of the misinformation aside, let’s walk through how and why, when it comes to estimating teacher effectiveness, SGP is NOT immune to the various concerns that plague value-added modeling. In fact, it is potentially far more susceptible to specific concerns such as the non-random assignment of students and the influence of various student, peer and school level factors that may ultimately bias ratings of teacher effectiveness.

What is a value-added estimate?

A value-added estimate uses assessment data in the context of a statistical model, where the objective is quite specifically to estimate the extent to which a student’s having a specific teacher, or attending a specific school, influences that student’s change in score from the beginning of the year to the end of the year – or period of treatment (in school or with teacher). The best VAMs attempt to account for several prior year test scores (to capture the extent to which having a certain teacher alters a child’s trajectory), the classroom-level mix of students, individual student background characteristics, and possibly school characteristics. The goal is to identify as accurately as possible the share of the student’s value-added that should be attributed to the teacher as opposed to all that other stuff (a nearly impossible task).
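To make the logic concrete, here is a toy sketch of a value-added regression. This is NOT any state’s actual VAM: all numbers are simulated, and a real model would add student, peer and school covariates. The point is only that the teacher coefficients, after conditioning on prior scores, are the “value-added” estimates.

```python
# Toy illustration of a value-added model (VAM), simulated data only.
# We regress end-of-year scores on prior-year scores plus teacher
# indicators; the teacher coefficients are the value-added estimates.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_teachers = 300, 3
teacher = rng.integers(0, n_teachers, n_students)        # assignment
true_effect = np.array([-5.0, 0.0, 5.0])                 # hypothetical effects
prior = rng.normal(500, 50, n_students)                  # prior-year score
current = 0.8 * prior + true_effect[teacher] + rng.normal(0, 10, n_students)

# Design matrix: prior score plus one dummy per teacher (no intercept,
# so each teacher coefficient is that teacher's estimated effect).
X = np.column_stack([prior] + [(teacher == t).astype(float)
                               for t in range(n_teachers)])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)
slope, teacher_effects = coef[0], coef[1:]
print(np.round(teacher_effects - teacher_effects.mean(), 1))
```

With simulated data the ordering of teachers is recovered; the post’s point is that with real data, anything omitted from `X` (peer mix, summer learning, non-random sorting) contaminates those same coefficients.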

What is a Student Growth Percentile?

To oversimplify a bit, a student growth percentile is a measure of the relative change of a student’s performance compared to that of all students taking a given underlying test or set of tests. That is, the individual scores obtained on these underlying tests are used to construct an index of student growth, where the median student, for example, may serve as a baseline for comparison. Some students show achievement growth on the underlying tests that is greater than the median student’s, while others show less growth from one test to the next. The measure captures not how much the underlying scores changed, but how much the student moved within the mix of other students taking the same assessments, using a method called quantile regression to estimate how unusual it is for a child to fall at her current position in the distribution, given her past position in the distribution. For more precise explanations, see: http://dirwww.colorado.edu/education/faculty/derekbriggs/Docs/Briggs_Weeks_Is%20Growth%20in%20Student%20Achievement%20Scale%20Dependent.pdf
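As a rough sketch of the idea only (not Betebenner’s actual methodology, which fits quantile regressions on multiple prior scores), one can approximate a growth percentile by ranking each student’s current score among peers with similar prior scores. The band width and all scores below are invented for illustration:

```python
# Crude stand-in for a student growth percentile (SGP): rank each
# student's current score among peers with similar prior scores.
# (The real method uses quantile regression; this is only a sketch.)
import numpy as np

def simple_sgp(prior, current, band=10.0):
    """Percentile of each student's current score among peers whose
    prior score falls within +/- band points of that student's."""
    prior, current = np.asarray(prior, float), np.asarray(current, float)
    sgps = []
    for p, c in zip(prior, current):
        peers = current[np.abs(prior - p) <= band]
        sgps.append(100.0 * np.mean(peers <= c))
    return np.array(sgps)

prior   = np.array([480, 485, 490, 500, 505, 510, 515, 520])
current = np.array([500, 470, 515, 505, 530, 500, 540, 525])
print(np.round(simple_sgp(prior, current)))
```

Note what the sketch does NOT contain: no teacher term, no peer or school covariates, nothing causal. It is purely a normative description of where a child landed relative to similar-scoring peers, which is exactly the post’s point.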

So, on the one hand, we’ve got Value-Added Models, or VAMs, which attempt to construct a model of student achievement, and to estimate specific factors that may affect student achievement growth, including teachers, schools, and ideally controlling for prior scores of the same students, characteristics of other students in the same classroom and school characteristics. The richness of these various additional controls plays a significant role in limiting the extent to which one incorrectly assigns either positive or negative effects to teachers. Briggs and Domingue run various alternative scenarios to this effect here: http://nepc.colorado.edu/publication/due-diligence

On the other hand, we have a seemingly creative alternative for descriptively evaluating how one student’s performance over time compares to the larger group of students taking the same assessments. These growth measures can be aggregated to the classroom or school level to provide descriptive information on how the group of students grew in performance over time, on average, as a subset of a larger group. But these measures include no attempt at all to attribute that growth, or a portion of that growth, to individual teachers or schools. That is, they make no attempt to sort out the extent to which that growth is a function of the teacher, as opposed to being a function of the mix of peers in the classroom.

What do we know about Value-added Estimates?

  • They are susceptible to non-random student sorting, even though they attempt to control for it by including a variety of measures of student level characteristics, classroom level and peer characteristics, and school characteristics. That is, teachers who persistently serve more difficult students, students who are more difficult in unmeasured ways, may be systematically disadvantaged.
  • They produce different results with different tests or different scaling of different tests. That is, a teacher’s rating based on her students’ performance on one test is likely to be very different from that same teacher’s rating based on her students’ performance on a different test, even one covering the same subject.
  • The resulting ratings have high rates of error for classifying teacher effectiveness, likely in large part due to error or noise in underlying assessment data and conditions under which students take those tests.
  • They are particularly problematic if based on annual assessment data, because these data fail to account for differences in summer learning, which vary widely by student backgrounds (where those students are non-randomly assigned across teachers).

What do we know, and what don’t we know, about SGPs?

  • They rely on the same underlying assessment data as VAMs, but simply re-express performance in terms of changes in relative growth rather than the underlying scores (or rescaled scores).
    • They are therefore susceptible to at least equal classification-error concerns
    • Therefore, it is reasonable to assume that using different underlying tests may result in different normative comparisons of one student to another
    • Therefore, they are equally problematic if based on annual assessment data
  • They do not even attempt (because it’s not their purpose) to address non-random sorting concerns or other student and peer level factors that may affect “growth.”
    • Therefore, we don’t even know how badly these measures are biased by these omissions. Researchers have not tested this because it is presumed that these measures don’t attempt such causal inference.

Unfortunately, while SGPs are becoming quite popular across states including Massachusetts, Colorado and New Jersey, and are quickly becoming the basis for teacher effectiveness ratings, there doesn’t appear to be a whole lot of specific research addressing these potential shortcomings of SGPs. Actually, there’s little or none! This dearth of information may exist because researchers exploring these issues assume it to be a no-brainer that if VAMs suffer classification problems due to random error, then so too would SGPs based on the same data. If VAMs suffer from omitted variables bias, then SGPs would be even more problematic, since they include no other variables. Complete omission is certainly more problematic than partial omission, so why even bother testing it?

In fact, Derek Briggs, in a recent analysis in which he compares the attributes of VAMs and SGPs, explains:

We do not refer to school-level SGPs as value-added estimates for two reasons. First, no residual has been computed (though this could be done easily enough by subtracting the 50th percentile), and second, we wish to avoid the causal inference that high or low SGPs can be explained by high or low school quality (for details, see Betebenner, 2008).

As Briggs explains and as Betebenner originally proposed, SGP is essentially a descriptive tool for evaluating and comparing student growth, including descriptively evaluating growth in the aggregate. But, it is not by any stretch of the imagination designed to estimate the effect of the school or the teacher on that growth.

Again, in the conclusion section of his analysis of relative and absolute measures of student growth, Briggs explains:

However, there is an important philosophical difference between the two modeling approaches in that Betebenner (2008) has focused upon the use of SGPs as a descriptive tool to characterize growth at the student-level, while the LM (layered model) is typically the engine behind the teacher or school effects that get produced for inferential purposes in the EVAAS. (value-added assessment system) http://dirwww.colorado.edu/education/faculty/derekbriggs/Docs/Briggs_Weeks_Is%20Growth%20in%20Student%20Achievement%20Scale%20Dependent.pdf

To clarify for the non-researchers and non-statisticians: what Briggs means by “inferential purposes” is that SGPs, unlike VAMs, are not even intended to “infer” that the growth was caused by differences in teacher or school quality. Briggs goes further to explain that overall, SGPs tend to be higher in schools with higher average achievement, based on Colorado data. Briggs explains:

These result suggest that schools that higher achieving students tend to, on average, show higher normative rates of growth than schools serving lower achieving students. Making the inferential leap that student growth is solely caused by the school and sources of influence therein, the results translate to saying that schools serving higher achieving students tend to, on average, be more effective than schools serving lower achieving students. The correlations between median SGP and current achievement are (tautologically) higher reflecting the fact that students growing faster show higher rates of achievement that is reflected in higher average rates of achievement at the school level.

Again, the whole point here is that it would be a leap, a massive freakin’ unwarranted leap, to assume a causal relationship between SGP and school quality without building the SGP into a model that more precisely attempts to distill that causal relationship (if any).

It’s a fun and interesting paper, and one of the few that addresses SGP and VAM together, but it intentionally does not explore the questions and concerns I pose herein regarding how the descriptive results of SGP would compare to a complete value-added model at the teacher level, where the model was intended for estimating teacher effects. Rather, Briggs compares the SGP findings only to a simple value-added model of school effects with no background covariates,[1] and finds the two to be highly correlated. Even then, Briggs finds that the school-level VAM is less correlated with initial performance level than is the SGP (where that correlation is discussed above).

So then, where does all of this techno-babble bring us? It brings us to three key points.

  1. First, there appears to be no analysis of whether SGP is susceptible to the various problems faced by value-added models largely because credible researchers (those not directly involved in selling SGP to state agencies or districts) consider it to be a non-issue. SGPs weren’t ever meant to nor are they designed to actually measure the causal effect of teachers or schools on student achievement growth. They are merely descriptive measures of relative growth and include no attempt to control for the plethora of factors one would need to control for when inferring causal effects.
  2. Second, and following from the first, it is certainly likely that if one did conduct these analyses, one would find that SGPs produce results that are much more severely biased than more comprehensive VAMs, and that SGPs are at least equally susceptible to problems of random error and other issues associated with test administration (summer learning, etc.).
  3. Third, and most importantly, policymakers are far too easily duped into making really bad decisions with serious consequences when it comes to complex matters of statistics and measurement. While SGPs are, in some ways, substantively different from VAMs, they sure as heck aren’t better or more appropriate for determining teacher effectiveness. That’s just wrong!

And this is only an abbreviated list of the problems that bridge both VAM and SGP and more severely compromise SGP. Others include spillover effects (the fact that one teacher’s scores are potentially affected by other teachers on his/her team serving the same students in the same year), and the fact that only a handful of teachers (10 to 20%) could be assigned SGP scores, requiring differential contracts for those teachers and creating a disincentive to teach core content in elementary and middle grades. Bad policy is bad policy. And this shift in the conversation from VAM to SGP is little more than a smokescreen intended to substitute a potentially worse, but entirely untested, method for one whose serious flaws are now well known.

 

Note: To those vendors of SGP (selling this stuff to state agencies and districts) who might claim my above critique to be unfair: I ask you to show me the technical analyses, conducted by a qualified, fully independent third party, showing that SGPs are not susceptible to non-random assignment problems; that they miraculously negate bias resulting from differences in summer learning even when using annual test data; that they have much lower classification error rates when assigning teacher effectiveness ratings; that teachers receive the same ratings regardless of which underlying tests are used; and that one teacher’s ratings are not influenced by the other teachers of the same students. Until you can show me a vast body of literature on these issues specifically applied to SGP (or even using SGP as a measure within a VAM), comparable to that already in existence on more complete VAM models, don’t waste my time.


[1] Noting: “while the model above can be easily extended to allow for multivariate test outcomes (typical of applications of the EVAAS by Sanders), background covariates, and a term that links school effects to specific students in the event that students attend more than one school in a given year (c.f., Lockwood et al., 2007, p. 127-128), we have chosen this simpler specification in order to focus attention on the relationship between differences in our choice of the underlying scale and the resulting schools effect estimates.”

Should there be a Constitutional Right to Unlimited Property Taxation?

A Reply to Dunn and Derthick in Education Next

Anyone who has read my previous work knows I’m not generally a fan of tax and expenditure limits. A significant body of empirical research shows that strict tax and expenditure limits can cause significant damage to state school finance systems over the long haul. For example, David Figlio, in a study of Oregon’s Measure 5 (National Tax Journal, Vol. 51, No. 1 (March 1998), pp. 55-70), finds that Oregon student-teacher ratios have increased significantly as a result of the state’s tax limitation. David Figlio and Kim Rueben, in the Journal of Public Economics (April 2001, pp. 49-71), find, using data from the National Center for Education Statistics, that tax limits systematically reduce the average quality of education majors, as well as of new public school teachers, in states that have passed these limits. In a non-peer-reviewed but high quality working paper, Thomas Downes and David Figlio “find compelling evidence that the imposition of tax or expenditure limits on local governments in a state results in a significant reduction in mean student performance on standardized tests of mathematics skills.” (http://ase.tufts.edu/econ/papers/9805.pdf)

Despite my general concerns over tax and expenditure limits, I have even greater concern over legal arguments like those posed by an affluent suburban school district in Kansas, summarized by Joshua Dunn and Martha Derthick in the Fall 2011 issue of Education Next. As Dunn and Derthick explain, beginning in the 1990s Kansas imposed limits on the amount of revenue local public school districts can raise above and beyond the revenue they are guaranteed through the state general fund aid formula. One affluent suburban district outside of Kansas City recently filed a legal challenge to those limits in Federal District Court, and that legal challenge was the subject of Dunn and Derthick’s recent column. Dunn and Derthick explain the legal arguments as follows:

Citing Supreme Court decisions in Meyer v. Nebraska (1923) and Pierce v. Society of Sisters (1925), which held that the liberty guaranteed in the Fourteenth Amendment’s Due Process Clause includes a right of parents to control the education of their children, the plaintiffs charged that the local cap infringes on that right. As well, by forbidding additional taxes it limits their right to use their property as they wish. Still more inventive, they invoked the First Amendment right of assembly, saying that the cap prevents voters from expressing their collective wishes at the ballot box. These violations together, they contended, constitute a denial of equal protection of the law.

http://educationnext.org/trouble-in-kansas/

So then, what’s wrong with considering the individual liberty to unlimited property taxation? If such liberties apply to campaign contributions or other forms of assembly, then why not to the choice to levy whatever property tax one sees fit? And what’s wrong with linking the notion of complete “local” control over property taxation to the notion of parental control over the education of one’s own children? Ah, if it were only so simple. But it’s not, and here’s a primer on why.

A Little Background on Tax and Expenditure Limits (TELs)

State imposed limitations on the taxing behavior of state recognized intermediate and local jurisdictions fall into a broad category of state fiscal management policies known as Tax and Expenditure Limits, or TELs. Tax and expenditure limits have been around for decades and exist in one form or another across nearly every state.

Arguably, the modern era of Tax and Expenditure Limits began with the adoption by statewide referendum of California’s Proposition 13 in 1978, which included a series of limits to the taxable assessed values of properties and changes in those assessed values, along with an overall tax rate cap. Daniel R. Mullins and Bruce A. Wallin (2004) note that “Within two years of the passage of Proposition 13 (a California initiative), 43 states had implemented some kind of property tax limitation or relief.” [1] By 2004, Mullins and Wallin indicate, forty-six states had some form of constitutional or statutory statewide limitation on the fiscal behavior of their units of local government.

Statewide limitations on local property taxes exist in multiple forms across states.

Overall Property Tax Rate Limits: Mullins and Wallin note that limits on property tax rates are the most common form of Tax and Expenditure Limit. Overall property tax rate limits restrict the total (municipal, school and other) property tax rate which can be adopted by local jurisdictions. Overall property tax rate limits may but do not necessarily include an option for local override votes. That is, property tax rates are limited but may be exceeded by local voter approval, often including such restrictions as requiring a super-majority vote to achieve override. Mullins and Wallin note that 33 states have imposed property tax rate limits, with 31 limiting municipalities, 28 counties, 26 school districts and 23 all three types (p. 7).

Specific Property Tax Rate Limits: Specific property tax rate limits apply limitations to tax rates for one component of local public goods or services – for example, a rate limit on municipal taxes only, or a rate limit on property taxes for operating revenues for local public schools, or for capital outlay revenues for local public schools. Again, override options may or may not be included.

Property Tax Revenue Limit: Property tax revenue limits place limits on the revenue that may be derived from property taxes in a given year, regardless of the rate applied. Revenue limits may either be applied to the total revenue allowable (revenue level) or, more commonly to the rate of increase in revenue allowable.

Assessment Increase Limit: Because property tax revenues collected, and tax bills paid by property owners are a function of both the tax rate applied and the assessed value of properties, constraints placed on the allowable growth in assessed value also operate as property tax limitations.

General Revenue or Expenditure Limit: States also place caps on the total amount of revenue that can be raised from property taxes for specific purposes, or alternatively on the amount of property tax revenue that can be raised and expended in a given year. Like other limits, these may be placed on either the total level of revenue or expenditures or on the annual growth in revenue or expenditures, and may or may not be coupled with override options (where those override options are also specified in state laws).

Finally, many states include complex combinations of the above property tax and expenditure limits, such as including both a limit on the rate at which assessed property values may grow and a limit on the property tax levy.

Property Taxation and TELs in Kansas

The above descriptions of tax and expenditure limits reveal some of the complexity of how these limits work. For example, state imposed limits on growth in property value assessments are a property tax limit to the same extent as limits on the tax rate that can be applied to those properties. Property taxes include multiple moving parts, or multiple policy levers, the vast majority of which in most states are creations of and controlled within state constitutions and statutes. Below is a non-exhaustive list of the moving parts of the property tax revenue equation:

  1. The boundaries of taxing jurisdictions: Taxing jurisdictions are government subdivisions within states, defined in state statutes and/or constitutions. They are creations of the state, even if granted home rule or limited home rule. Taxing jurisdictions may or may not be as simple as “cities and towns” or “municipalities.” In some states, municipal taxing jurisdictions are reasonably aligned with local school taxing jurisdictions, but in others, like Kansas, they are not. The lack of alignment between local public school district boundaries and municipal boundaries in Kansas is largely a result of school district consolidations that occurred under state statutes adopted in the 1960s, concurrent with (shortly before) the rewriting of the education article of the state constitution. In many states, the geographic spaces defined as taxing jurisdictions and enrollment areas for local public school districts continue to be redrawn, as in the case of the northeastern section of the Kansas City, Missouri School District, which was recently annexed to the Independence School District through a procedure created (specifically for that circumstance) under a recent Missouri statute. Further, school district boundary determinations (under state laws) are often linked to a long history (including recent history) of institutionalized and state-sanctioned racial discrimination in housing markets.[2] The defined geographic boundaries of a taxing jurisdiction determine the properties that are included in or excluded from that jurisdiction. Those boundaries ultimately determine the total values of property within the bounded space, and in turn the amount of revenue that can or cannot be generated by applying any given tax rate to those properties.


Figure 1

School District (green) Boundaries and Cities and Towns in the Kansas City Metro



  2. Definitions of Property Types: Different types of property exist within any taxing jurisdiction, including residential properties, residential properties owned by non-residents (second homes), commercial properties, industrial properties, utilities and farm properties. In Kansas and elsewhere, property types are defined in the State Constitution (Article 11). The definition of property types substantially influences the application of “local” property taxes because each defined jurisdiction contains a different mix of property types – some with more commercial property than others, some more residential, and others more farm property. And the different values applied to different types of properties become a significant factor influencing the local revenue raising capacity of communities. Note that in Kansas, as elsewhere, the highest aggregate property values per child enrolled in school are not those in school districts with the highest valued houses, but are those in communities like Burlington, Moscow and Rolla, which each include non-residential properties of significant value.
  3. Valuation Procedures: Procedures for determining the taxable value of properties are also defined in state statutes and constitutions – in Kansas, in Article 11 of the constitution. Those valuation procedures operate as a form of tax and expenditure limitation. Residential properties are defined to have a taxable value of 11.5% of fair market value, agricultural land 30%, vacant lots 12%. States adopt such structures out of state policy interest in creating certain types of incentives or controls, including incentives to either preserve or develop farm property or vacant lots, or to buffer commercial interests from escalating taxes. These differential assessment ratios are effectively limits to the revenue raising capacity from any applied tax rate.
  4. Property Tax Exemptions: States also control, typically via statute, the extent to which intermediate or local jurisdictions may grant exemptions to property taxes, including the duration over which an exemption may be granted or the types of properties that may be granted exemptions. States may also impose exemptions, such as exempting from property taxes a proportion of the value of residential properties owned by senior citizens, in the policy interest of protecting seniors on fixed incomes from escalating property taxes. As a tax equity measure, Kansas in the late 1990s adopted an exemption for the first $20,000 in taxable value of a residential property from property taxes applied to General Fund revenues for schools (a statutory provision).
  5. Tax Rate Setting & Referendum Procedures: States also regulate the procedures by which local school district budgets are determined and/or tax rates are set. In some states with constitutional property tax limits that include override provisions, the referendum procedure for override is in the constitution, and may include a requirement of a super-majority vote to achieve an override. Requirement of a super-majority is a limit. In other states, statutory provisions permit local authorities to raise taxes (or resulting revenues) to specific levels without voter approval, and above those levels with voter approval. In some cases, those limits are absolute and cannot be exceeded.
  6. Debt Ratio Ceilings on Bonded Indebtedness: States also impose various limitations on the amount of debt “local” jurisdictions may accumulate toward the financing of capital projects. Kansas, like other states, imposes a limit – measured as a percentage of total taxable assessed valuation – on the amount of debt that can be accumulated through issuance of general obligation municipal bonds for the financing of new school construction or major renovations.

Each and every provision above and each and every element of the property tax system is controlled by and exists only as a function of state constitutional provisions and statutes. Further, each piece of the property tax puzzle imposes limitations – state controlled limitations – on the ability of state sanctioned local jurisdictions to raise revenues with property taxes.

Extreme Implications of a constitutional protection for complete, unregulated local citizen control over property taxation

Taken at its extreme, the assumption that local residents of any geographic space in the State of Kansas possess a Constitutional right to unlimited control over property taxation for “their” local public schools means that those local residents would have control over each and every parameter above, as each parameter above is a critical determinant of the revenue generated for local public schools by adoption of a specific property tax levy (a rate multiplier). Set any parameter – or multiplier – to “0” and the whole equation shuts down. No one piece is more important than another in determining the amount of money that can be raised for “local” public schools.
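The multiplier logic described above can be sketched in a few lines. The 11.5% residential assessment ratio and the $20,000 general-fund exemption come from the earlier discussion; the home value and the 20-mill levy are hypothetical numbers chosen for illustration:

```python
# Each state-controlled parameter enters the revenue equation as a
# multiplier; set any one of them to zero and revenue goes to zero.
# The 11.5% assessment ratio and $20,000 exemption are from the post;
# the $200,000 market value and 20-mill levy are hypothetical.

def school_tax(market_value, assessment_ratio, exemption, mills):
    """Revenue = (taxable assessed value - exemption) * mills / 1000."""
    assessed = market_value * assessment_ratio
    taxable = max(assessed - exemption, 0.0)
    return taxable * mills / 1000.0

# A $200,000 home: assessed at 11.5% -> $23,000; minus the $20,000
# general-fund exemption -> $3,000 taxable; at 20 mills -> $60.
print(school_tax(200_000, 0.115, 20_000, 20))
```

Notice that a local vote controls, at most, the `mills` argument; the state controls every other parameter, which is precisely why "unlimited local control" over the levy alone is an empty guarantee.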

Taken at its extreme, any group of citizen residents of the state of Kansas should be able to organize themselves, and define a geographic area that they consider to be their taxing jurisdiction. They would then have the authority to define the types of properties in their jurisdiction and the method for determining the taxable value of those properties. Further, they would have the right to decide whether a mere majority or super majority vote is required in order to adopt any particular tax rate to apply to those properties.

If local citizens control only the single parameter of tax rate setting (the “mill levy”), the state could simply alter the rules for adopting rate increases, such as requiring a super-majority vote. Or the state could adopt legislation that effectively reduces the taxable value of properties, or exempts certain types of properties, for purposes of raising additional school revenue above current local option budget limits. For example, the state could exempt all commercial and industrial properties from additional taxation (much like the $20,000 exemption on residential properties for General Fund budgets). Such state controls, while not limiting the levies adopted, would limit the revenue that could be generated by those levies. Each of these rules presently exists only as a function of prior state, not local, actions.

Assuming that there exists only a constitutional right to adopt higher tax levies, but those levies are to be adopted within an otherwise completely state controlled policy framework, is illogical. If such constitutional freedoms do exist, then they must apply to each and every relevant parameter limiting revenue.

Clearly, however, assuming that local groups of citizens have unlimited rights to determine each and every parameter in the property tax revenue generating equation is absurd; it would moot numerous Kansas statutes, Article 11 of the Kansas Constitution, and similar constitutional and statutory provisions across nearly every other state.

The state interest in regulating taxes imposed on non-resident property owners

As school district boundaries are presently organized, especially in the Kansas City metropolitan area, school districts each consist of many types of properties. Implicit in the assumption that there exists a constitutionally protected individual right to raise additional funds, through property taxation, for the education of one’s own children is the notion that there exists an overly simplistic 1-to-1-to-1 ratio among the children to be educated, the parents of those children, and the homeowner-taxpayers of the jurisdiction. That is, each taxpayer homeowner is also a parent with an interest in the quality of education provided to his or her child at the collective expense. Such would be true if the group of parents organized to start a private school and used their private resources to finance the operations of that school to a level suitable to their own tastes.

This assumption crumbles when applied to local property taxation for public schools and when we consider the mix of property types, property owners and taxpayers that fall within any school taxing jurisdiction in Kansas. For example, owners of commercial and industrial properties within the jurisdiction may not be residents of the jurisdiction. Taxes paid by these individuals may be affected significantly by the decisions of a simple majority share of local residents of the district. The state has a legitimate interest in and may see fit to limit such impact. And one method for doing so is the maintenance of existing tax and expenditure limits.

It seems absurd to assume that a group of resident citizens of a jurisdiction have a constitutional right to unlimited taxation of someone else’s property without the option of state intervention.

The state interest in regulating taxes imposed on vulnerable minority voting blocs

Senior citizens who no longer have children attending local public schools and who are living on fixed incomes may be outnumbered at the polls in some jurisdictions when school budget (levy referenda) votes are held. Many states exempt portions of the value of properties owned by senior citizens in order to provide some protection against escalating taxes. Those exemptions are themselves state-imposed limits on property taxation.

As noted above, if we accept the assumption of a constitutional right for a group of local residents in a taxing jurisdiction to levy unlimited taxes on the rest of the jurisdiction, we must also accept that those same residents have control over each and every parameter in the property tax revenue generating equation that might limit their revenue raising capacity. A simple majority of residents could then negate exemptions. The state has a legitimate interest in protecting the rights of local minority voter populations, such as senior citizens, through such policy mechanisms as property tax exemptions.
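To make the argument concrete, the "property tax revenue generating equation" invoked above can be sketched in simplified form. The function and all figures below are purely illustrative, not actual Kansas parameters, but they show how an exemption is simply one more state-set parameter in the computation:

```python
def property_tax_revenue(market_value, assessment_ratio, exemption, mills):
    """Simplified property tax computation. Every argument is a policy
    parameter that some level of government sets: assessment ratios,
    exemptions, and the levy itself. Figures below are illustrative only."""
    assessed_value = market_value * assessment_ratio
    taxable_value = max(assessed_value - exemption, 0.0)
    return taxable_value * mills / 1000.0  # 1 mill = $1 per $1,000 taxable value

# An illustrative $200,000 home assessed at 11.5%, taxed at 20 mills:
base = property_tax_revenue(200_000, 0.115, 0, 20)                  # $460
# The same home with a $10,000 exemption against assessed value:
with_exemption = property_tax_revenue(200_000, 0.115, 10_000, 20)   # $260
```

If a simple majority of local voters could negate the exemption parameter, the tax on that homeowner would rise from $260 back to $460 with no state check in place, which is precisely the scenario such exemptions exist to prevent.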

The state interest in maintaining school funding fairness

Finally, the state also has an interest in the maintenance of equity in the provision of public education and in access to equal educational opportunity, and one mechanism the state has adopted in order to maintain equity is the limitation to supplemental local spending through property taxation.

Why is it problematic from an equal educational opportunity perspective for local public school districts to have unlimited ability to raise their property taxes and spend as they see fit on their local public schools? How, for example, does it harm the children of Kansas City, Kansas if the parents in Shawnee Mission or Blue Valley School Districts choose to substantially outspend Kansas City over the next several years and provide far higher quality local public schools?

Given the vast differences in student populations across school districts in the Kansas City area, and specifically between Kansas City and Shawnee Mission, which are immediate neighbors, there exist very large differences in the actual cost of providing children with equal educational opportunity. Professor William Duncombe (Syracuse University), on behalf of the Kansas Legislative Division of Post Audit in 2006, estimated that if the cost of a specific quality of education in the state average district were indexed to 100 (100%), the cost of achieving equal opportunity for students in Kansas City would be about 35% higher than that average, and in Shawnee Mission about 12% lower. Presently, the state school finance formula provides for much less difference in funding than would actually be needed to achieve more equal educational opportunity (See Table 1). In fact, when all state and local revenues are considered, a recent national report (using 2007-08 data) rates Kansas as having a regressive-to-flat state school finance system – one in which higher poverty districts have systematically lower (or, at best, nearly comparable) resources per pupil than lower poverty districts.[3]
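Taking the Duncombe index values cited above at face value (state average indexed to 100), the implied funding gap between the two neighboring districts is straightforward arithmetic. This is a sketch of the comparison, not an official calculation:

```python
# Cost indices from the Duncombe estimates cited above (state average = 100):
STATE_AVG = 100.0
kck_cost_index = STATE_AVG * 1.35   # Kansas City: ~35% above average
sm_cost_index = STATE_AVG * 0.88    # Shawnee Mission: ~12% below average

# Implied ratio of the cost of equal opportunity in the two districts:
ratio = kck_cost_index / sm_cost_index  # ~1.53
```

That is, achieving equal educational opportunity would require Kansas City to spend roughly half again as much per pupil as its immediate neighbor.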

The differences in the cost of equal educational opportunity estimated by William Duncombe are a function of many factors, most notably vast differences in the backgrounds and needs of children attending local public school districts (See Table 2). Needier students require a wider array of services, including more specialized personnel, smaller class sizes and specific educational and support programs. The state has both an interest in and a constitutional obligation to provide equal educational opportunity.

There are at least two major reasons why states have an interest in the maintenance of equity and equal educational opportunity across local public school districts.

First, education is a positional good. Access to economic opportunity, including access to higher education, for children in Kansas City, Kansas depends not only on the absolute level of educational expenditure in their own public schools but also on the relative quality of education they receive compared to that of other children competing for the same slots in local public and private colleges and universities.

Second, the quality of schooling in any given location depends largely on the quality of the teacher workforce that can be attracted to teach there. It is well understood that in any given labor market, working conditions – most notably student population characteristics – substantially influence teacher job choice, most often to the disadvantage of the neediest students. It would take not merely equal, but significantly higher wages to recruit and retain teachers of comparable qualifications in Kansas City than to recruit and retain similar teachers in Shawnee Mission, Blue Valley or Olathe. The competitive wage for teachers of specific qualifications in any given area is driven by the wages paid by each district’s nearest neighboring competitors and by the differences in working conditions across districts.

At present, teacher salaries in Kansas City, Kansas are already much lower than those in Shawnee Mission and other Johnson County districts (Table 3). They are lower partly because the state already allows Johnson County districts to levy a special “cost of living” tax (see Table 4) which falsely assumes that teachers in districts with more expensive houses are therefore more expensive to hire. Providing further opportunity for Johnson County districts to widen the salary gap, by removing state imposed tax limits, would likely lead to even greater disparities in teacher qualifications across wealthy and poor districts serving lower and higher need student populations in the Kansas City metropolitan area.

If Shawnee Mission and other Johnson County parents have the right to raise their property taxes in order to recruit and retain better teachers, don’t Kansas City parents have the same right? They may have a similar right, but they do not have similar capacity, and granting the right does not require that the state adopt any measures to equalize the capacity to compete.

For every additional mill on the local tax levy, Shawnee Mission can raise an additional $117 per pupil, whereas Kansas City can raise only $38, a greater than 3X difference (see Table 5). Even under present circumstances, with imposed limitations to the local option budget, Kansas City salaries lag behind Johnson County districts, and Johnson County districts have already been provided a local taxing opportunity to widen the gap, an option some have used.
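The disparity in per-mill yield quoted above translates directly into unequal taxing capacity. A quick sketch of the arithmetic, using the Table 5 figures cited in the text:

```python
# Dollars raised per pupil per mill levied (Table 5 figures cited in the text):
SM_PER_MILL = 117.0    # Shawnee Mission
KCK_PER_MILL = 38.0    # Kansas City, Kansas

def mills_needed(target_dollars_per_pupil, yield_per_mill):
    """Mills a district must levy to raise a target supplement per pupil."""
    return target_dollars_per_pupil / yield_per_mill

# To raise an extra $1,000 per pupil:
sm_mills = mills_needed(1_000, SM_PER_MILL)    # ~8.5 mills
kck_mills = mills_needed(1_000, KCK_PER_MILL)  # ~26.3 mills
ratio = SM_PER_MILL / KCK_PER_MILL             # ~3.1x
```

In other words, Kansas City residents would have to tax themselves at more than three times the rate to match the same per-pupil supplement – the similar-right, dissimilar-capacity problem described above.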

TABLES AVAILABLE IN PDF VERSION: Fast Response Brief on Individual Liberty and Tax Limits


[1] Daniel R. Mullins and Bruce A. Wallin (2004) “Tax and Expenditure Limitations: Introduction and Overview.” Public Budgeting and Finance (Winter): 2–15.

[2] See Kevin Fox Gotham (2000) “Urban Space, Restrictive Covenants and the Origins of Racial Residential Segregation in a U.S. City, 1900 to 1950.” International Journal of Urban and Regional Research 24(3): 616–633.