If it’s not valid, reliability doesn’t matter so much! More on VAM-ing & SGP-ing Teacher Dismissal

This post includes a few more preliminary musings regarding the use of value-added measures and student growth percentiles for teacher evaluation, specifically for making high-stakes decisions, and especially in those cases where new statutes and regulations mandate rigid use/heavy emphasis on these measures, as I discussed in the previous post.

========

The recent release of New York City teacher value-added estimates to several media outlets stimulated much discussion about the standard errors and statistical noise found in estimates of teacher effectiveness derived from the city’s value-added model. But lost in that discussion was any emphasis on whether the predicted value-added measures were valid estimates of teacher effects to begin with. That is, did they actually represent what they were intended to represent – the teacher’s influence on a true measure of student achievement, or learning growth, while under that teacher’s tutelage? As framed in teacher evaluation legislation, that measure is typically characterized as “student achievement growth,” and it is assumed that one can measure the influence of the teacher on “student achievement growth” in a particular content domain.

A brief note on the semantics versus the statistics and measurement in evaluation and accountability is in order.

At issue are policies involving teacher “evaluation” and more specifically evaluation of teacher effectiveness, where in cases of dismissal the evaluation objective is to identify particularly ineffective teachers.

In order to “evaluate” (assess, appraise, estimate) a teacher’s effectiveness with respect to student growth, one must be able to “infer” (deduce, conjecture, surmise…) that the teacher affected or could have affected that student growth. That is, for example, one must be able to assume that, given one year’s bad rating, the teacher had sufficient information and control to improve her rating in the following year. Further, one must choose measures that provide some basis for such inference.

Inference and attribution (ascription, credit, designation) are not separable when evaluating teacher effectiveness. To make an inference about teacher effectiveness based on student achievement growth, one must attribute responsibility for that growth to the teacher.

In some cases, proponents of student growth percentiles alter their wording [in a truly annoying & dreadfully superficial way] for general public appeal to argue that:

  1. SGPs are a measure of student achievement growth.
  2. Student achievement growth is a primary objective of schooling.
  3. Therefore, teachers and schools should obviously be held accountable for student achievement growth.

Accountable is a synonym for responsible. To the extent that SGPs were designed to separate the measurement of student growth from the attribution of responsibility for that growth, SGPs are invalid on their face as a basis for holding teachers accountable. For a teacher to be held accountable for that growth, the growth must be attributable to her, and one must be using a method that permits such an inference.

Allow me to reiterate this quote from the authors of SGP:

“The development of the Student Growth Percentile methodology was guided by Rubin et al’s (2004) admonition that VAM quantities are, at best, descriptive measures.” (Betebenner, Wenning & Briggs, 2011)

I will save for another day a discussion of the nuanced differences between statistical causation and inference, and causation and inference as they might be evaluated more broadly in the context of litigation over determinations of teacher effectiveness. The big problem in the current context, as I explained in my previous post, is created by legislative attempts to attach strict timelines, absolute weights and precise classifications to data that simply cannot be applied in this way.

Major Validity Concerns

We identify [at least] three categories of significant compromises to inference and attribution, and therefore to accountability for student achievement growth:

  1. The value-added estimate (or SGP) was influenced by something other than the teacher alone.
  2. The value-added (or SGP) estimate based on one assessment of the teacher’s content domain produces a different rating than an estimate based on a different assessment of the same domain.
  3. The value-added estimate (or SGP) is compromised by missing data and/or student mobility, disrupting the link between teacher and students [the actual data link required for attribution].

The first major issue compromising attribution of responsibility for, or inference regarding, teacher effectiveness based on student growth is that some other factor or set of factors actually caused the student achievement growth or lack thereof. A particularly bothersome feature of many value-added models is that they rely on annual testing data. That is, student achievement growth is measured from April or May in one year to April or May in the next, while the school year runs from September to mid or late June. As such, for example, the 4th grade teacher is assigned a rating based on children who attended her class from September to April (testing time), or about 7 months of a 12-month measurement window; roughly 2.5 months of that window were spent doing any variety of other things over the summer, and another 2.5 months were spent with the prior-grade teacher. That is to say nothing of the differences in access to resources each child has after school and on weekends during the 7 months of contact with the teacher of record.

Students with different access to summer and out-of-school-time resources may not be randomly assigned across teachers within a given school or across schools within a district. And students whose prior-year teachers checked out after testing, versus those whose teachers delved into the subsequent year’s curriculum during the post-testing months, may also not be randomly distributed. All of these factors go unobserved and unmeasured in the calculation of a teacher’s effectiveness, potentially severely compromising the validity of the estimate. Summer learning varies widely across students by economic background (Alexander, Entwisle & Olsen, 2001). Further, in the recent Gates MET studies (2010), the authors found: “The norm sample results imply that students improve their reading comprehension scores just as much (or more) between April and October as between October and April in the following grade. Scores may be rising as kids mature and get more practice outside of school.” (p. )

Numerous authors have conducted analyses revealing the problems of omitted variables bias and the non-random sorting of students across classrooms (Rothstein, 2009, 2010, 2011; Briggs & Domingue, 2011; Ballou et al., 2012). Omitted variables bias occurs when a given teacher’s predicted value-added is influenced partly by factors other than the teacher herself; that is, the estimate is higher or lower than it should be because some other factor has influenced it. In short, some value-added models are better than others, in that by including additional explanatory measures they seem to correct for at least some biases. Unfortunately, one can never really know whether there remain additional factors that might be used to correct for that bias. Many such factors are simply unobservable. Others may be measurable and observable but are simply unavailable, or are poorly measured, in the data. While there are some methods that can substantially reduce the influence of unobservables on teacher effect estimates, those methods can typically be applied only to a very small subset of teachers within very large data sets.[2] In a recent conference paper, Ballou and colleagues evaluated the role of omitted variables bias in value-added models and the potential effects on personnel decisions. They concluded:

“In this paper, we consider the impact of omitted variables on teachers’ value-added estimates, and whether commonly used single-equation or two-stage estimates are preferable when possibly important covariates are not available for inclusion in the value-added model. The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.” (Ballou et al., 2012)

A related problem is the extent to which such biases may appear to be a wash, on the whole, across large data sets, even as specific circumstances or omitted variables have rather severe effects on predicted values for specific teachers. To reiterate, these are not merely issues of instability or error. These are issues of whether the models are estimating the teacher’s effect on student outcomes, or the effect of something else on student outcomes. Teachers should not be dismissed for factors beyond their control. Further, statutes and regulations should not require that principals dismiss teachers or revoke their tenure in those cases where the principal understands intuitively that the teacher’s rating was compromised by some other cause [as would be the case under the TEACHNJ Act].
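To make the omitted-variables problem concrete, here is a minimal simulation sketch (in Python, with entirely invented parameter values; this is not any state’s actual model). An unmeasured classroom-level factor, such as out-of-school resources, leaks directly into a naive classroom-average “value-added” estimate, and some genuinely above-average teachers land in the bottom quartile.

```python
# Illustrative sketch only: how a factor omitted from a value-added calculation
# leaks into the "teacher effect" estimate. All parameter values are invented.
import numpy as np

rng = np.random.default_rng(42)
n_teachers, class_size = 100, 25

true_effect = rng.normal(0.0, 3.0, n_teachers)   # genuine teacher contribution to gains
omitted = rng.normal(0.0, 3.0, n_teachers)       # unmeasured classroom-level factor
                                                 # (e.g., out-of-school resources)

est_effect = np.empty(n_teachers)
for t in range(n_teachers):
    # Each student's gain = teacher effect + omitted classroom factor + noise.
    gains = true_effect[t] + omitted[t] + rng.normal(0.0, 10.0, class_size)
    # A naive "value-added" estimate: the classroom's average gain.
    est_effect[t] = gains.mean()

est_effect -= est_effect.mean()

# How often does the naive estimate put a genuinely above-average teacher
# in the bottom quartile?
cutoff = np.percentile(est_effect, 25)
false_low = np.mean((true_effect > 0) & (est_effect < cutoff))
print(f"corr(true, estimated): {np.corrcoef(true_effect, est_effect)[0, 1]:.2f}")
print(f"above-average teachers rated bottom-quartile: {false_low:.1%}")
```

The estimate absorbs whatever travels with the classroom, whether or not the teacher had anything to do with it; richer models can soak up some of this, but only for factors that are actually measured.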

Other factors that severely compromise inference and attribution, and thus validity, include the fact that the measured value-added gains of a teacher’s peers – or team members working with the same students – may be correlated, either because of unmeasured attributes of the students or because of spillover effects of working alongside more effective colleagues; one may never know which (Koedel, 2009; Jackson & Bruegmann, 2009). Further, there may simply be differences across classrooms or school settings that remain correlated with effectiveness ratings and that were not fully captured by the statistical models.

Significant evidence of bias plagued the value-added model estimated for the Los Angeles Times in 2010, including significant patterns of racial disparities in teacher ratings both by the race of the students served and by the race of the teachers (see Green, Baker and Oluwole, 2012). These model biases raise the possibility that Title VII disparate impact claims might also be filed by teachers dismissed on the basis of their value-added estimates. Additional analyses of the data, using richer models with additional variables, mitigated substantial portions of the bias in the LA Times models (Briggs & Domingue, 2011).

A handful of studies have also found that teacher ratings vary significantly, even for the same subject area, if different assessments of that subject are used. If a teacher is broadly responsible for effectively teaching in their subject area, and not the specific content of any one test, different results from different tests raise additional validity concerns. Which test better represents the teacher’s responsibilities? [must we specify which test counts/matters/represents those responsibilities in teacher contracts?] If more than one, in what proportions? If results from different tests completely counterbalance, how is one to determine the teacher’s true effectiveness in their subject area? Using data on two different assessments administered in the Houston Independent School District, Corcoran, Jennings and Beveridge (2010) find:

[A]mong those who ranked in the top category (5) on the TAKS reading test, more than 17 percent ranked among the lowest two categories on the Stanford test. Similarly, more than 15 percent of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.

The Gates Foundation Measures of Effective Teaching Project also evaluated the consistency of teacher ratings produced on different assessments of mathematics achievement. In a review of the Gates findings, Rothstein (2011) explained:

The data suggest that more than 20% of teachers in the bottom quarter of the state test math distribution (and more than 30% of those in the bottom quarter for ELA) are in the top half of the alternative assessment distribution. (p. 5)

And:

In other words, teacher evaluations based on observed state test outcomes are only slightly better than coin tosses at identifying teachers whose students perform unusually well or badly on assessments of conceptual understanding. (p. 5)
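A small sketch with invented correlations illustrates how this kind of disagreement arises: when two tests capture overlapping but noisy signals of the same underlying effectiveness, quintile classifications on the two tests diverge for a substantial share of teachers, much as the findings above report.

```python
# Sketch with invented numbers: two noisy ratings of the same underlying
# effectiveness produce large disagreement in quintile classifications.
import numpy as np

rng = np.random.default_rng(9)
n = 20_000
underlying = rng.normal(size=n)                  # a teacher's "true" effectiveness
rating_a = underlying + rng.normal(0, 1.0, n)    # rating derived from test A
rating_b = underlying + rng.normal(0, 1.0, n)    # rating derived from test B

def quintile(x):
    """Assign 0 (bottom) through 4 (top) based on within-measure quintiles."""
    return np.digitize(x, np.percentile(x, [20, 40, 60, 80]))

qa, qb = quintile(rating_a), quintile(rating_b)
top_a = qa == 4
print("Among teachers in the top quintile on test A:")
print(f"  share in the bottom two quintiles on test B: {np.mean(qb[top_a] <= 1):.1%}")
```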

Finally, student mobility, missing data, and algorithms for accounting for that missing data can severely compromise inferences regarding teacher effectiveness.  Corcoran (2010) explains that the extent of missing data can be quite large and can vary by student type:

Because of high rates of student mobility in this [Houston] population (in addition to test exemption and absenteeism), the percentage of students who have both a current and prior year test score – a prerequisite for value-added – is even lower (see Figure 6). Among all grade four to six students in HISD, only 66 percent had both of these scores, a fraction that falls to 62 percent for Black students, 47 percent for ESL students, and 41 percent for recent immigrants. (Corcoran, 2010, pp. 20-21)

Thus, many teacher effectiveness ratings would be based on significantly incomplete information, and further, the extent to which that information is incomplete would be highly dependent on the types of students served by the teacher.

One statistical resolution to this problem is imputation. In effect, imputation creates pre-test or post-test scores for those students who weren’t there. One approach is to use the average score for students who were there, or more precisely for otherwise similar students who were there. On its face, imputation is problematic when it comes to attributing responsibility for student outcomes to the teacher, if some of those outcomes are statistically generated for students who were not even there. But not using imputation may lead to estimates of effectiveness that are severely biased, especially when there is so much missing data. Howard Wainer (2011), an esteemed statistician and measurement expert formerly with the Educational Testing Service (ETS), explains somewhat mockingly how teachers might game imputation of missing data by sending all of their best students on a field trip during fall testing days, and then, in the name of fairness, sending the weakest students on a field trip during spring testing days.[3] Clearly, in such a case of gaming, the predicted value-added assigned to the teacher – a function of the average scores of low-performing students at the beginning of the year (while their high-performing classmates were on their trip), and of high-performing ones at the end of the year (while their low-performing classmates were on their trip) – would not be correctly attributed to the teacher’s actual teaching effectiveness, though it might be attributable to the teacher’s ability to game the system.
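A back-of-the-envelope sketch of Wainer’s field-trip scenario (hypothetical scores, and a deliberately simple mean-imputation rule rather than any operational algorithm) shows how imputation can inflate a classroom’s apparent gain:

```python
# Sketch of the field-trip scenario with invented scores: mean-imputing missing
# tests can inflate a classroom's apparent average gain.
import numpy as np

rng = np.random.default_rng(7)
n = 30
fall = rng.normal(500, 30, n)            # full classroom, fall scores
spring = fall + rng.normal(5, 10, n)     # modest real growth for everyone

# The strongest third "miss" the fall test; the weakest third "miss" the spring test.
order = np.argsort(fall)
fall_obs = fall.copy()
fall_obs[order[-10:]] = np.nan
spring_obs = spring.copy()
spring_obs[order[:10]] = np.nan

def mean_impute(x):
    """Replace missing scores with the mean of the observed scores."""
    x = x.copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x

true_gain = (spring - fall).mean()
imputed_gain = (mean_impute(spring_obs) - mean_impute(fall_obs)).mean()
print(f"true average gain:                   {true_gain:5.1f}")
print(f"apparent gain after mean imputation: {imputed_gain:5.1f}")
```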

In short, validity concerns are at least as great as reliability concerns, if not greater. If a measure is simply not valid, it really doesn’t matter whether it is reliable or not.

If a measure cannot be used to validly infer teacher effectiveness, and cannot be used to attribute responsibility for student achievement growth to the teacher, then that measure is highly suspect as a basis for high-stakes decision making when evaluating teacher (or teaching) effectiveness, or for teacher and school accountability systems more generally.

References & Additional Readings

Alexander, K.L, Entwisle, D.R., Olsen, L.S. (2001) Schools, Achievement and Inequality: A Seasonal Perspective. Educational Evaluation and Policy Analysis 23 (2) 171-191

Ballou, D., Mokher, C.G., Cavaluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes. Annual Meeting of the Association for Education Finance and Policy. Boston, MA.  http://aefpweb.org/sites/default/files/webform/AEFP-Using%20VAM%20for%20personnel%20decisions_02-29-12.docx

Ballou, D. (2012). Review of “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-long-term-impacts

Baker, E.L., Barton, P.E., Darling-Hammond, L., Haertel, E., Ladd, H.F., Linn, R.L., Ravitch, D., Rothstein, R., Shavelson, R.J., Shepard, L.A. (2010) Problems with the Use of Student Test Scores to Evaluate Teachers. Washington, DC: Economic Policy Institute.  http://epi.3cdn.net/724cd9a1eb91c40ff0_hwm6iij90.pdf

Betebenner, D., Wenning, R.J., Briggs, D.C. (2011) Student Growth Percentiles and Shoe Leather. http://www.ednewscolorado.org/2011/09/13/24400-student-growth-percentiles-and-shoe-leather

Boyd, D.J., Lankford, H., Loeb, S., & Wyckoff, J.H. (July, 2010). Teacher layoffs: An empirical illustration of seniority vs. measures of effectiveness. Brief 12. National Center for Evaluation of Longitudinal Data in Education Research. Washington, DC: The Urban Institute.

Briggs, D., Betebenner, D., (2009) Is student achievement scale dependent? Paper  presented at the invited symposium Measuring and Evaluating Changes in Student Achievement: A Conversation about Technical and Conceptual Issues at the annual meeting of the National Council for Measurement in Education, San Diego, CA, April 14, 2009. http://dirwww.colorado.edu/education/faculty/derekbriggs/Docs/Briggs_Weeks_Is%20Growth%20in%20Student%20Achievement%20Scale%20Dependent.pdf

Briggs, D. & Domingue, B. (2011). Due Diligence and the Evaluation of Teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District teachers by the Los Angeles Times. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/due-diligence.

Buddin, R. (2010) How Effective Are Los Angeles Elementary Teachers and Schools?, Aug. 2010, available at http://www.latimes.com/media/acrobat/2010-08/55538493.pdf.

Braun, H, Chudowsky, N, & Koenig, J (eds). (2010) Getting value out of value-added. Report of a Workshop. Washington, DC: National Research Council, National Academies Press.

Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models. Princeton, NJ: Educational Testing Service. Retrieved February, 27, 2008.

Chetty, R., Friedman, J., Rockoff, J. (2011) The Long Term Impacts of Teachers: Teacher Value Added and Student outcomes in Adulthood. NBER Working Paper # 17699 http://www.nber.org/papers/w17699

Clotfelter, C., Ladd, H.F., Vigdor, J. (2005)  Who Teaches Whom? Race and the distribution of Novice Teachers. Economics of Education Review 24 (4) 377-392

Clotfelter, C., Glennie, E. Ladd, H., & Vigdor, J. (2008). Would higher salaries keep teachers in high-poverty schools? Evidence from a policy intervention in North Carolina. Journal of Public Economics 92, 1352-70.

Corcoran, S.P. (2010) Can Teachers Be Evaluated by their Students’ Test Scores? Should they Be? The Use of Value Added Measures of Teacher Effectiveness in Policy and Practice. Annenberg Institute for School Reform. http://annenberginstitute.org/pdf/valueaddedreport.pdf

Corcoran, S.P. (2011) Presentation at the Institute for Research on Poverty Summer Workshop: Teacher Effectiveness on High- and Low-Stakes Tests (Apr. 10, 2011), available at https://files.nyu.edu/sc129/public/papers/corcoran_jennings_beveridge_2011_wkg_teacher_effects.pdf.

Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

D.C. Pub. Sch., IMPACT Guidebooks (2011), available at http://dcps.dc.gov/portal/site/DCPS/menuitem.06de50edb2b17a932c69621014f62010/?vgnextoid=b00b64505ddc3210VgnVCM1000007e6f0201RCRD.

Education Trust (2011) Fact Sheet- Teacher Quality. Washington, DC. http://www.edtrust.org/sites/edtrust.org/files/Ed%20Trust%20Facts%20on%20Teacher%20Equity_0.pdf

Hanushek, E.A., Rivkin, S.G., (2010) Presentation for the American Economic Association: Generalizations about Using Value-Added Measures of Teacher Quality 8 (Jan. 3-5, 2010), available at http://www.utdallas.edu/research/tsp-erc/pdf/jrnl_hanushek_rivkin_2010_teacher_quality.pdf

Working with Teachers to Develop Fair and Reliable Measures of Effective Teaching. MET Project White Paper. Seattle, Washington: Bill & Melinda Gates Foundation, 1. Retrieved December 16, 2010, from http://www.metproject.org/downloads/met-framing-paper.pdf.

Learning about Teaching: Initial Findings from the Measures of Effective Teaching Project. MET Project Research Paper. Seattle, Washington: Bill & Melinda Gates Foundation. Retrieved December 16, 2010, from http://www.metproject.org/downloads/Preliminary_Findings-Research_Paper.pdf.

Jackson, C.K., Bruegmann, E. (2009) Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers. American Economic Journal: Applied Economics 1(4): 85–108

Kane, T., Staiger, D., (2008) Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation. NBER Working Paper #14607 http://www.nber.org/papers/w14607

Koedel, C. (2009) An Empirical Analysis of Teacher Spillover Effects in Secondary School. Economics of Education Review 28 (6) 682-692

Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Working Paper.

Jacob, B. & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics. 26(1), 101-36.

Sass, T.R., (2008) The Stability of Value-Added Measures of Teacher Quality and Implications for Teacher Compensation Policy. National Center for Analysis of Longitudinal Data in Educational Research. Policy Brief #4. http://eric.ed.gov/PDFS/ED508273.pdf

McCaffrey, D. F., Lockwood, J. R, Koretz, & Hamilton, L. (2003). Evaluating value-added models for teacher accountability. RAND Research Report prepared for the Carnegie Corporation.

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67.

Rothstein, J. (2011). Review of “Learning About Teaching: Initial Findings from the Measures of Effective Teaching Project.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-learning-about-teaching.

Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.

Rothstein, J. (2010). Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement. Quarterly Journal of Economics, 125(1), 175–214.

Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997). The Tennessee Value-Added Assessment System: A quantitative outcomes-based approach to educational assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid measure? (pp. 137-162). Thousand Oaks, CA: Corwin Press.

Sanders, William L., Rivers, June C., 1996. Cumulative and residual effects of teachers on future student academic  achievement. Knoxville: University of Tennessee Value- Added Research and Assessment Center.

Sass, T.R. (2008) The Stability of Value-Added Measures of Teacher Quality and Implications for Teacher Compensation Policy. Urban Institute http://www.urban.org/UploadedPDF/1001266_stabilityofvalue.pdf

McCaffrey, D.F., Sass, T.R., Lockwood, J.R., Mihaly, K. (2009) The Intertemporal Variability of Teacher Effect Estimates. Education Finance and Policy 4 (4) 572-606

McCaffrey, D.F., Lockwood, J.R. (2011) Missing Data in Value Added Modeling of Teacher Effects. Annals of Applied Statistics 5 (2A) 773-797

Reardon, S. F. & Raudenbush, S. W. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492–519.

Rubin, D. B., Stuart, E. A., and Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1):103–116.

Schochet, P.Z., Chiang, H.S. (2010) Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains. Institute for Education Sciences, U.S. Department of Education. http://ies.ed.gov/ncee/pubs/20104004/pdf/20104004.pdf.


The Toxic Trifecta, Bad Measurement & Evolving Teacher Evaluation Policies

This post contains my preliminary thoughts in development for a forthcoming article dealing with the intersection between statistical and measurement issues in teacher evaluation and teachers’ constitutional rights where those measures are used for making high stakes decisions.

The Toxic Trifecta in Current Legislative Models for Teacher Evaluation

A relatively consistent legislative framework for teacher evaluation has evolved across states in the past few years. Many of the legal concerns that arise do so because of inflexible, arbitrary and often ill-conceived yet standard components of this legislative template. The standard model has three basic features, each of which is problematic in its own right, and those problems are multiplied when the features are used in combination.

First, the standard evaluation model proposed in legislation requires that objective measures of student achievement growth be considered in a weighting system of parallel components. Student achievement growth measures are assigned, for example, a 40 or 50% weight alongside observation and other evaluation measures. Placing the measures alongside one another in a weighting scheme assumes all measures in the scheme to be of equal validity and reliability but of varied importance (utility) – varied weight. Each measure must be included, and must be assigned the prescribed weight – with no opportunity to question the validity of any measure. [1] Such a system also assumes that the various measures included in the system are each scaled such that they can vary to similar degrees: that the observational evaluations will be scaled to produce variation similar to that of the student growth measures, and that the variance in both measures is equally valid – not compromised by random error or bias. In fact, however, it remains highly likely that some components of the teacher evaluation model will vary far more than others, if for no other reason than that some measures contain more random noise than others, or that some of the variation is attributable to factors beyond the teachers’ control. Regardless of the assigned weights, and regardless of the cause of the variation (true or false measure), the measure that varies more will carry more weight in the final classification of the teacher as effective or not. In a system that places differential weight but assumes equal validity across measures, even if the student achievement growth component carries only a minority share of the weight, it may easily become the primary tipping point in most high-stakes personnel decisions.
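A simple simulation (with invented scales and weights, not any state’s actual formula) illustrates the point: even when observations are nominally weighted 60% and growth scores 40%, the composite tracks the more variable growth component more closely.

```python
# Sketch with hypothetical scales: the component with more variance dominates a
# fixed-weight composite even when its nominal weight is smaller.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
observation = rng.normal(3.0, 0.3, n)   # observation ratings bunch tightly (1-4 scale)
growth = rng.normal(50.0, 20.0, n)      # growth percentiles spread widely (0-100 scale)

# Rescale observations to 0-100 so the statutory-style weights are nominally
# comparable, then apply a 60/40 weighting that favors observations on paper.
obs_100 = (observation - 1.0) / 3.0 * 100.0
composite = 0.6 * obs_100 + 0.4 * growth

print("corr(composite, observation component):", round(np.corrcoef(composite, obs_100)[0, 1], 2))
print("corr(composite, growth component):     ", round(np.corrcoef(composite, growth)[0, 1], 2))
```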

Second, the standard evaluation model proposed in legislation requires that teachers be placed into effectiveness categories by assigning arbitrary numerical cutoffs to the aggregated, weighted evaluation components. That is, a teacher at or below the 25%ile when combining all evaluation components might be assigned a rating of “ineffective,” whereas the teacher at the 26%ile might be labeled effective. Further, the teacher’s placement into these groupings may largely if not entirely hinge on her rating in the student achievement growth component of the evaluation. Teachers on either side of the arbitrary cutoff are undoubtedly statistically no different from one another. In many cases, as with the recently released effectiveness estimates for New York City teachers, the error ranges for teacher percentile ranks have been on the order of 35 percentile points on average, and up to 50 with one year of data. Assuming that there is any real difference between the teacher at the 25%ile and the one at the 26%ile (as their point estimates) is a huge unwarranted stretch. Placing an arbitrary, rigid cut-off score on such noisy measures makes distinctions that simply cannot be justified, especially when making high-stakes employment decisions.
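A rough sketch (with invented numbers, treating the reported error range loosely as a 95% interval, which is an assumption made only for illustration) shows how frequently a rigid percentile cutoff mislabels teachers whose true standing lies above the line:

```python
# Sketch with invented numbers: a rigid percentile cutoff applied to noisy
# percentile estimates frequently flags teachers who are truly above the line.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
true_pct = rng.uniform(0, 100, n)            # teachers' "true" percentile ranks
# Treat a ~35-point average error range loosely as a 95% interval,
# i.e., a standard error of roughly 9 percentile points (an assumption).
measured = np.clip(true_pct + rng.normal(0, 9, n), 0, 100)

cutoff = 25
flagged = measured < cutoff                   # labeled "ineffective"
false_flags = np.mean(true_pct[flagged] >= cutoff)
print(f"share of 'ineffective' labels given to teachers truly above the cutoff: {false_flags:.1%}")
```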

Third, the standard evaluation model proposed in legislation places exact timelines on the conditions for removal of tenure. Typical legislation dictates that teacher tenure either can or must be revoked and the teacher dismissed after two consecutive years of being rated ineffective (where tenure can only be achieved after three consecutive years of being rated effective).[2] As such, whether a teacher rightly or wrongly falls just below or just above the arbitrary cut-offs that define performance categories may have relatively inflexible consequences.

The Forced Choice between “Bad” Measures and “Wrong” Ones

Two separate camps have recently emerged in state policy regarding development and application of measures of student achievement growth to be used in newly adopted teacher evaluation systems. The first general category of methods is known as value-added models and the second as student growth percentiles. Among researchers it is well understood that these are substantively different measures by their design, one being a possible component of the other. But these measures and their potential uses have been conflated by policymakers wishing to expedite implementation of new teacher evaluation policies and pilot programs.

Arguably, one reason for the increasing popularity of the student growth percentile (SGP) approach across states is the extent of highly publicized scrutiny, and the large and growing body of empirical research, concerning problems with using value-added measures to determine teacher effectiveness (see Green, Baker and Oluwole, 2012). Yet there has been little such research on the usefulness of student growth percentiles for determining teacher effectiveness. The reason for this vacuum is not that student growth percentiles are immune to the problems that plague value-added models, but that researchers have chosen not to evaluate their validity for this purpose – estimating teacher effectiveness – because they are not designed to support inferences about teacher effectiveness.

A value-added estimate uses assessment data in the context of a statistical model (regression analysis), where the objective is to estimate the extent to which having a specific teacher or attending a specific school influences a student’s difference in score from the beginning of the year to the end of the year – or period of treatment (in school or with the teacher). The most thorough VAMs attempt to adjust for several prior years of test scores (to capture the extent to which having a certain teacher alters a child’s trajectory), the classroom-level mix of students, individual student background characteristics, and possibly school characteristics. The goal is to identify as accurately as possible the share of the student’s, or group of students’, value-added that should be attributed to the teacher as opposed to other factors outside the teacher’s control.
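Here is a minimal sketch of that general idea (not any vendor’s operational model; the data, covariate, and coefficients are invented): regress the current score on the prior score, a student covariate, and teacher indicators, and read the teacher coefficients as the value-added estimates.

```python
# Minimal sketch of the general VAM idea with synthetic data: the teacher-dummy
# coefficients from a covariate-adjusted regression serve as "value-added" estimates.
import numpy as np

rng = np.random.default_rng(5)
n_teachers, class_size = 20, 30
n = n_teachers * class_size

teacher = np.repeat(np.arange(n_teachers), class_size)
prior = rng.normal(500, 30, n)                    # prior-year score
frl = rng.binomial(1, 0.4, n)                     # e.g., a free/reduced-lunch indicator
true_effect = rng.normal(0, 5, n_teachers)
score = 110 + 0.8 * prior - 8 * frl + true_effect[teacher] + rng.normal(0, 15, n)

# Design matrix: intercept, prior score, covariate, teacher dummies (one omitted).
dummies = (teacher[:, None] == np.arange(1, n_teachers)).astype(float)
X = np.column_stack([np.ones(n), prior, frl, dummies])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

est_effect = np.concatenate([[0.0], beta[3:]])    # effects relative to the omitted teacher
est_effect -= est_effect.mean()
centered_true = true_effect - true_effect.mean()
print("corr(true, estimated teacher effects):",
      round(np.corrcoef(centered_true, est_effect)[0, 1], 2))
```

The recovery looks good here only because the synthetic data were generated to match the model; the validity problems discussed above arise precisely when the real data do not.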

By contrast, a student growth percentile is a descriptive measure of the relative change of a student’s performance compared to that of all students, based on a given underlying test or set of tests. That is, the individual scores obtained on these underlying tests are used to construct an index of student growth, where the median student, for example, may serve as a baseline for comparison. Some students have achievement growth on the underlying tests that is greater than that of the median student, while others have growth from one test to the next that is less. The approach estimates not how much the underlying scores changed, but how much the student moved within the mix of other students taking the same assessments, using a method called quantile regression to estimate how rare it is for a child to fall at her current position in the distribution, given her past position in the distribution.[3] Student growth percentile measures may be used to characterize each individual student’s growth, or may be aggregated to the classroom or school level, and/or across children who started at similar points in the distribution, to characterize the collective growth of groups of students.
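The following is a deliberately simplified stand-in for the SGP idea (operational models use quantile regression; here students are simply grouped by prior-score decile): each student’s growth percentile is her current-score percentile rank among students who started from a similar point. Note that nothing in the calculation refers to a teacher.

```python
# Simplified stand-in for an SGP calculation with synthetic data. Purely
# descriptive: no attempt to attribute growth to a teacher or school.
import numpy as np

rng = np.random.default_rng(11)
n = 50_000
prior = rng.normal(500, 30, n)                      # last year's score
current = 100 + 0.8 * prior + rng.normal(0, 15, n)  # this year's score

# Group students into prior-score deciles as a crude "academic peer group."
cuts = np.percentile(prior, np.arange(10, 100, 10))
peer_group = np.digitize(prior, cuts)

def growth_percentile(i):
    """Percentile rank of student i's current score among prior-score peers."""
    peers = current[peer_group == peer_group[i]]
    return 100.0 * np.mean(peers < current[i])

print("example growth percentiles:", [round(growth_percentile(i), 1) for i in range(5)])
```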

Many, if not most value-added models also involve normative rescaling of student achievement data, measuring in relative terms how much individual students or groups of students have moved within the large mix of students. The key difference is that the value-added models include other factors in an attempt to identify the extent to which having a specific teacher contributed to that growth, whereas student growth percentiles are simply a descriptive measure of the growth itself. A student growth percentile measure could be used in a value-added model.

As described by the authors of the Colorado Growth Model:

“A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: To separate the description of student progress (the SGP) from the attribution of responsibility for that progress.” (Betebenner, Wenning & Briggs, 2011)

Unlike value-added teacher effect estimates, student growth percentiles are not intended for attributing responsibility for student progress to either the teacher or the school. But if that is so clearly the case (as stated as recently as Fall 2011), is it plausible that states or local school districts will actually choose to use the measures to make such inferences anyway? Below is a brief explanation from a Q&A section of the New Jersey Department of Education web site regarding implementation of pilot teacher evaluation programs:

Standardized test scores are not available for every subject or grade. For those that exist (Math and English Language Arts teachers of grades 4-8), Student Growth Percentages (SGPs), which require pre- and post-assessments, will be used. The SGPs should account for 35%-45% of evaluations.  The NJDOE will work with pilot districts to determine how student achievement will be measured in non-tested subjects and grades.[4]

This explanation clearly indicates that student growth percentile data are to be used for “evaluation” of teacher effectiveness. In fact, the SGPs alone, as they stand, as descriptive measures, “should account for 35%-45% of evaluations.” Other states, including Colorado, have already adopted (indeed pioneered) the use of Student Growth Percentiles as a statewide accountability measure and have concurrently passed high-stakes teacher evaluation legislation. But it remains to be seen how the SGP data will be used in district-specific contexts in guiding high-stakes decisions.

While value-added models are intended to estimate teacher effects on student achievement growth, they fail to do so in any accurate or precise way (see Green, Baker & Oluwole, 2012). By contrast, student growth percentiles make no such attempt.[5] Specifically, value-added measures tend to be highly unstable from year to year, and have very wide error ranges when applied to individual teachers, making confident distinctions between “good” and “bad” teachers difficult if not impossible. Further, while value-added models attempt to isolate the portion of student achievement growth that is caused by having a specific teacher, they often fail to do so, and it is difficult if not impossible to discern a) how much they have failed and b) in which direction for which teachers. That is, the individual teacher estimates may be biased by factors not fully addressed in the models, and we may not know by how much. We also know that when different tests of the same content are used, teachers receive widely varied ratings, raising additional questions about the validity of the measures.

While we do not have similar information from existing research on student growth percentiles, it stands to reason that since they are based on the same types of testing data, they will be similarly susceptible to error and noise. More problematically, since student growth percentiles make no attempt (by design) to consider other factors that contribute to student achievement growth, the measures have significant potential for omitted variables bias. SGPs leave the interpreter of the data to naively infer (by omission) that all growth among students in the classroom of a given teacher must be associated with that teacher. Even subtle changes to the explanatory variables in value-added models substantively change the ratings of individual teachers (Ballou et al., 2012; Briggs & Domingue, 2011). Excluding all potential explanatory variables, as SGPs do, takes this problem to the extreme. As a result, it may turn out that SGP measures at the teacher level appear more stable from year to year than value-added estimates, but that stability may be entirely a function of teachers serving similar populations of students from year to year. That is, the measures may contain stable omitted variables bias, and thus may be stable in their invalidity.

In defense of Student Growth Percentiles as accountability measures but with no mention of their use for teacher evaluation, Betebenner, Wenning and Briggs (2011) explain that one school of thought is that value-added estimates are also most reasonably interpreted as descriptive measures, and should not be used to infer teacher or school effectiveness:

“The development of the Student Growth Percentile methodology was guided by Rubin et al’s (2004) admonition that VAM quantities are, at best, descriptive measures.” (Betebenner, Wenning & Briggs, 2011)

Rubin et al explain:

“Value-added assessment is a complex issue, and we appreciate the efforts of Ballou et al. (2004), McCaffrey et al. (2004) and Tekwe et al. (2004). However, we do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions. We argue that models such as these should not be seen as estimating causal effects of teachers or schools, but rather as providing descriptive measures.” (Rubin et al., 2004)

Arguably, these explanations do less to validate the usefulness of Student Growth Percentiles as accountability measures (inferring attribution and/or responsibility to schools and teachers) and far more to invalidate the usefulness of both Student Growth Percentiles and Value-Added Models for these purposes.

New Jersey’s TEACHNJ: At The Intersection of the Toxic Trifecta and “Wrong” Measures

A short while back, John Mooney over at NJ Spotlight provided an overview of a pending bill in the New Jersey legislature, which just so happens to contain explicitly at least two of the three elements of the Toxic Trifecta, and contains the third implicitly by granting deference to the NJ Department of Education to approve the quantitative measures used in evaluation systems.

Text of the Bill: http://www.njleg.state.nj.us/2012/Bills/S0500/407_I1.PDF

First, the bill throughout refers to the creation of performance categories as discussed above, implicitly if not explicitly declaring those categories to be absolute, clearly defined and fully differentiable from one another.

Second, while the bill is not explicit in its requirement of specific quantified performance metrics, it grants latitude on this matter to the NJ Department of Education (to approve local plans), which a) is developing a student growth percentile model to be used for these purposes, and b) under its pilot plan is suggesting (if not requiring) that districts use the student growth percentile data for 35% to 45% of evaluations, as noted above.

Third, the bill places an absolute and inflexible timeline on dismissal:

Notwithstanding any provision of law to the contrary, the principal, in consultation with the panel, shall revoke the tenure granted to an employee in the position of teacher, assistant principal, or vice-principal if the employee is evaluated as ineffective in two consecutive annual evaluations. (p. 10)

The key word here is “shall,” which indicates a statutory obligation to revoke tenure. It does not say “may,” or “at the principal’s discretion.” It says shall.

The principal shall revoke tenure if a teacher is unlucky enough to land below an arbitrary cut-point, using a measure not designed for such purposes, for two years in a row (even if the teacher was lucky enough to achieve an “awesome” rating every other year of her career!).

The kicker is that the bill goes one step further to attempt to eliminate any due process right a teacher might have to challenge the basis for the dismissal:

The revocation of the tenure status of a teacher, assistant principal, or vice-principal shall not be subject to grievance or appeal except where the ground for the grievance or appeal is that the principal failed to adhere substantially to the evaluation process. (p. 10)

In other words, the bill attempts to establish that teachers shall have no basis (no procedural due process claim) for grievance as long as the principal has followed their evaluation plan, ignoring the possibility – the fact – that these evaluation plans themselves, approved or not, will create scenarios and cause personnel decisions which violate due process rights. Further, the attempt at restricting due process rights laid out in the bill itself is a threat to due process and would likely be challenged.

Declaring any old process to constitute due process does not make it so! Especially where the process is built on not only “bad” but “wrong” measures used in a framework that forces dismissal decisions on at least 3 completely arbitrary and capricious bases (2 consecutive years in isolation, fixed weight on wrong measure, arbitrary cut-points for performance categories).

So this raises the big question of what’s behind all of this. Clearly, one thing that’s behind all of this is an astonishing ignorance of statistics and measurement among state legislators favoring the toxic trifecta – either that or a willful neglect of their legislative duty to respect constitutional protections including due process (or both!).


[1] A more reasonable alternative would be to use the statistical information as a preliminary screening tool for identifying potential problem areas, and then to use more intensive observations and additional evaluation tools as follow-up. This approach acknowledges that the signals provided by the statistical information may in fact be false, either as a function of reliability problems or of lacking validity (other conditions contributed to the rating), and therefore in some if not many cases should be discarded. The parallel consideration more commonly used requires that the student growth metric be considered and weighted as prescribed, reliable and valid or not.

[2] For example, at the time of writing this draft, the bill introduced in New Jersey read: “Notwithstanding any provision of law to the contrary, the principal shall revoke the tenure granted to an employee in the position of teacher, assistant principal, or vice-principal, regardless of when the employee acquired tenure, if the employee is evaluated as ineffective or partially effective in one year’s annual summative evaluation and in the next year’s annual summative evaluation the employee does not show improvement by being evaluated in a higher rating category. The only evaluations which may be used by the principal for tenure revocation are those evaluations conducted in the 2013-2014 school year and thereafter which use the rubric adopted by the board and approved by the commissioner. The school improvement panel may make recommendations to the principal on a teacher’s tenure revocation.” http://www.njspotlight.com/assets/12/0203/0158

[5] Briggs and Betebenner (2009) explain: “However, there is an important philosophical difference between the two modeling approaches in that Betebenner (2008) has focused upon the use of SGPs as a descriptive tool to characterize growth at the student-level, while the LM (layered model) is typically the engine behind the teacher or school effects that get produced for inferential purposes in the EVAAS.” (Briggs & Betebenner, 2009, p. )

References

Alexander, K.L, Entwisle, D.R., Olsen, L.S. (2001) Schools, Achievement and Inequality: A Seasonal Perspective. Educational Evaluation and Policy Analysis 23 (2) 171-191

Ballou, D., Mokher, C.G., Cavaluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes. Annual Meeting of the Association for Education Finance and Policy. Boston, MA.  http://aefpweb.org/sites/default/files/webform/AEFP-Using%20VAM%20for%20personnel%20decisions_02-29-12.docx

Ballou, D. (2012). Review of “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-long-term-impacts

Baker, E.L., Barton, P.E., Darling-Hammond, L., Haertel, E., Ladd, H.F., Linn, R.L., Ravitch, D., Rothstein, R., Shavelson, R.J., Shepard, L.A. (2010) Problems with the Use of Student Test Scores to Evaluate Teachers. Washington, DC: Economic Policy Institute.  http://epi.3cdn.net/724cd9a1eb91c40ff0_hwm6iij90.pdf

Betebenner, D., Wenning, R.J., Briggs, D.C. (2011) Student Growth Percentiles and Shoe Leather. http://www.ednewscolorado.org/2011/09/13/24400-student-growth-percentiles-and-shoe-leather

Boyd, D.J., Lankford, H., Loeb, S., & Wyckoff, J.H. (July, 2010). Teacher layoffs: An empirical illustration of seniority vs. measures of effectiveness. Brief 12. National Center for Evaluation of Longitudinal Data in Education Research. Washington, DC: The Urban Institute.

Briggs, D., Betebenner, D., (2009) Is student achievement scale dependent? Paper  presented at the invited symposium Measuring and Evaluating Changes in Student Achievement: A Conversation about Technical and Conceptual Issues at the annual meeting of the National Council for Measurement in Education, San Diego, CA, April 14, 2009. http://dirwww.colorado.edu/education/faculty/derekbriggs/Docs/Briggs_Weeks_Is%20Growth%20in%20Student%20Achievement%20Scale%20Dependent.pdf

Briggs, D. & Domingue, B. (2011). Due Diligence and the Evaluation of Teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District teachers by the Los Angeles Times. Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/publication/due-diligence.

Buddin, R. (2010) How Effective Are Los Angeles Elementary Teachers and Schools?, Aug. 2010, available at http://www.latimes.com/media/acrobat/2010-08/55538493.pdf.

Braun, H, Chudowsky, N, & Koenig, J (eds). (2010) Getting value out of value-added. Report of a Workshop. Washington, DC: National Research Council, National Academies Press.

Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models. Princeton, NJ: Educational Testing Service. Retrieved February, 27, 2008.

Chetty, R., Friedman, J., Rockoff, J. (2011) The Long Term Impacts of Teachers: Teacher Value Added and Student outcomes in Adulthood. NBER Working Paper # 17699 http://www.nber.org/papers/w17699

Clotfelter, C., Ladd, H.F., Vigdor, J. (2005)  Who Teaches Whom? Race and the distribution of Novice Teachers. Economics of Education Review 24 (4) 377-392

Clotfelter, C., Glennie, E. Ladd, H., & Vigdor, J. (2008). Would higher salaries keep teachers in high-poverty schools? Evidence from a policy intervention in North Carolina. Journal of Public Economics 92, 1352-70.

Corcoran, S.P. (2010) Can Teachers Be Evaluated by their Students’ Test Scores? Should they Be? The Use of Value Added Measures of Teacher Effectiveness in Policy and Practice. Annenberg Institute for School Reform. http://annenberginstitute.org/pdf/valueaddedreport.pdf

Corcoran, S.P. (2011) Presentation at the Institute for Research on Poverty Summer Workshop: Teacher Effectiveness on High- and Low-Stakes Tests (Apr. 10, 2011), available at https://files.nyu.edu/sc129/public/papers/corcoran_jennings_beveridge_2011_wkg_teacher_effects.pdf.

Corcoran, Sean P., Jennifer L. Jennings, and Andrew A. Beveridge. 2010. “Teacher Effectiveness on High- and Low-Stakes Tests.” Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

D.C. Pub. Sch., IMPACT Guidebooks (2011), available at http://dcps.dc.gov/portal/site/DCPS/menuitem.06de50edb2b17a932c69621014f62010/?vgnextoid=b00b64505ddc3210VgnVCM1000007e6f0201RCRD.

Education Trust (2011) Fact Sheet- Teacher Quality. Washington, DC. http://www.edtrust.org/sites/edtrust.org/files/Ed%20Trust%20Facts%20on%20Teacher%20Equity_0.pdf

Hanushek, E.A., Rivkin, S.G., (2010) Presentation for the American Economic Association: Generalizations about Using Value-Added Measures of Teacher Quality 8 (Jan. 3-5, 2010), available at http://www.utdallas.edu/research/tsp-erc/pdf/jrnl_hanushek_rivkin_2010_teacher_quality.pdf

Working with Teachers to Develop Fair and Reliable Measures of Effective Teaching. MET Project White Paper. Seattle, Washington: Bill & Melinda Gates Foundation, 1. Retrieved December 16, 2010, from http://www.metproject.org/downloads/met-framing-paper.pdf.

Learning about Teaching: Initial Findings from the Measures of Effective Teaching Project. MET Project Research Paper. Seattle, Washington: Bill & Melinda Gates Foundation. Retrieved December 16, 2010, from http://www.metproject.org/downloads/Preliminary_Findings-Research_Paper.pdf.

Jackson, C.K., Bruegmann, E. (2009) Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers. American Economic Journal: Applied Economics 1(4): 85–108

Kane, T., Staiger, D., (2008) Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation. NBER Working Paper #14607 http://www.nber.org/papers/w14607

Koedel, C. (2009) An Empirical Analysis of Teacher Spillover Effects in Secondary School. Economics of Education Review 28 (6) 682-692

Koedel, C., & Betts, J. R. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique. Working Paper.

Jacob, B. & Lefgren, L. (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics. 26(1), 101-36.

Sass, T.R., (2008) The Stability of Value-Added Measures of Teacher Quality and Implications for Teacher Compensation Policy. National Center for Analysis of Longitudinal Data in Educational Research. Policy Brief #4. http://eric.ed.gov/PDFS/ED508273.pdf

McCaffrey, D. F., Lockwood, J. R, Koretz, & Hamilton, L. (2003). Evaluating value-added models for teacher accountability. RAND Research Report prepared for the Carnegie Corporation.

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67.

Rothstein, J. (2011). Review of “Learning About Teaching: Initial Findings from the Measures of Effective Teaching Project.” Boulder, CO: National Education Policy Center. Retrieved [date] from http://nepc.colorado.edu/thinktank/review-learning-about-teaching.

Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.

Rothstein, J. (2010). Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement. Quarterly Journal of Economics, 125(1), 175–214.

Sanders, W. L., Saxton, A. M., & Horn, S. P. (1997). The Tennessee Value-Added Assessment System: A quantitative outcomes-based approach to educational assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid measure? (pp. 137-162). Thousand Oaks, CA: Corwin Press.

Sanders, William L., Rivers, June C., 1996. Cumulative and residual effects of teachers on future student academic  achievement. Knoxville: University of Tennessee Value- Added Research and Assessment Center.

McCaffrey, D.F., Sass, T.R., Lockwood, J.R., Mihaly, K. (2009) The Intertemporal Variability of Teacher Effect Estimates. Education Finance and Policy 4 (4) 572-606

McCaffrey, D.F., Lockwood, J.R. (2011) Missing Data in Value Added Modeling of Teacher Effects. Annals of Applied Statistics 5 (2A) 773-797

Reardon, S. F. & Raudenbush, S. W. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492–519.

Rubin, D. B., Stuart, E. A., and Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1):103–116.

Schochet, P.Z., Chiang, H.S. (2010) Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains. Institute for Education Sciences, U.S. Department of Education. http://ies.ed.gov/ncee/pubs/20104004/pdf/20104004.pdf.

Real Reform versus Fake Reformy Distractions: More Implications from NJ & MA for CT!

Recently, I responded to an absurd and downright disturbing op-ed by a Connecticut education reform organization that claimed Connecticut needed to move quickly to adopt teacher evaluation/tenure reforms and expand charter schooling because a) Connecticut has a larger achievement gap and lower outcomes for low-income students than Massachusetts or New Jersey, and b) New Jersey and Massachusetts were somehow outpacing Connecticut in adopting new reformy policies regarding teacher evaluation. Now, the latter assertion is questionable enough to begin with, but the most questionable assertion was that any recent policy changes that may have occurred in New Jersey or Massachusetts explain why low-income children in those states do better, and have improved faster, than low-income kids in Connecticut. Put simply, bills presently on the table, or legislation and regulations adopted but not yet phased in, do not explain the gains in student outcomes of the past 20 years.

Note that I stick to comparisons among these states because income related achievement gaps are most comparable among them (that is, the characteristics of the populations that fall above and below the income thresholds for free/reduced lunch are relatively comparable among these states, but not so much to states in other regions of the country).

I’m not really providing much new information in this post, but I am elaborating on my previous point about the potential relevance of funding equity (school finance) reforms, and providing additional illustrations.

First, let’s take a look at the relationship between state and local revenues per pupil and median household income over time in Mass, CT, RI and NJ. These data are drawn from an article I published in 2010, in which I estimated, for all states, a model of the relationship between median household income and state and local revenue per pupil over time. In that article, I plotted the relationship over time for several states. Here, I have re-plotted that relationship for New Jersey, Massachusetts, Connecticut and Rhode Island.

This graph shows the slope of the statistical relationship between state and local revenues per pupil and median household income, controlling for differences in regional labor costs and economies of scale of districts. All 4 states managed to reduce or eliminate what had been positive relationships between household income and district revenues. That is, in 1990, in each state higher income districts had more revenue. By the late 1990s, Mass and NJ had reversed that pattern, and in fact had redistributed revenue such that districts with lower median household income actually had more revenue. The finance systems had become “progressive,” so-to-speak.

By contrast, the relationship between income and revenue remained more random – and on average flat – in RI and CT.  How Connecticut made so little consistent progress toward fiscal equity over time is another story for another day.

By 2009, things were pretty much the same, as the next figure shows. The next figure compares current operating expenditures per pupil and district poverty rates. Current operating expenditures include expenditure of federal funds, including Title I funds. Those federal funds tend to add marginally to the progressiveness of the system. But, what’s also important is not just whether the trendline tilts upward, but whether the pattern is systematic or predictable. In New Jersey, among districts enrolling 2,000 or more students (scale efficient), census poverty rates alone explain 47% of the variation in current spending. It’s a predictable, upward slope whereby higher poverty districts actually have and spend more per pupil. In Massachusetts, poverty explains nearly 40% of the variation in spending per pupil.

But, by contrast, in Connecticut, poverty explains only about 15% of the variation in spending across districts. Notably, a handful of high poverty districts including Bridgeport, Waterbury and New Britain have been left well behind… well below the curve. Further, even though Hartford and New Haven have more funding, much of that funding has been channeled through a magnet school aid program, so the seemingly high position of these two districts is somewhat deceptive, and not part of any systematic effort to improve equity.
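To make the r-squared comparisons above concrete, here is an illustrative sketch of the calculation: for each state, among districts enrolling 2,000 or more students, regress current spending per pupil on the census poverty rate and report the share of variation explained. The file and column names (curr_exp_pp, pov_rate, enroll, state) are placeholders, not the actual data layout.

```python
# Illustrative only: the kind of calculation behind the R-squared figures above.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("district_fiscal_2009.csv")  # hypothetical district-level file
big = df[df["enroll"] >= 2000]                # restrict to scale-efficient districts

for state, grp in big.groupby("state"):
    # Simple bivariate regression of spending per pupil on poverty rate.
    fit = smf.ols("curr_exp_pp ~ pov_rate", data=grp).fit()
    print(f"{state}: R^2 = {fit.rsquared:.2f}, poverty slope = {fit.params['pov_rate']:.0f}")
```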

I’m doing some picking on CT here in particular, because CT has hardly provided systematic targeted support to high need districts. That said, CT is hardly the worst offender among states. Several states do far worse than CT, including PA, NY and IL in particular.  But that’s a post for another day. Those states are not sufficiently comparable in other ways to include in these comparisons. Our report on School Funding Fairness provides a full breakdown across states (www.schoolfundingfairness.org).

In the best possible case, CT has selectively targeted funding to some high need districts and has left out others. Rhode Island funding is somewhat more predictable (though deceptive because of the small number of districts) than CT but clearly less systematic than NJ or Mass.

Now, here are the longitudinal trends in student outcomes in these states, cut a few different ways and focusing on disadvantaged children. First, because I’m not a fan of comparing “low income” kids to “low income” kids even across these relatively similar economies, I take a look at children of mothers who were high school dropouts, and 8th grade math performance:

From 2000 forward (where the scores are relatively comparable over time), children of maternal high school dropouts in Mass and NJ far outpace those in CT or RI.

Next, here are the trends for children who qualify for free lunch, also on 8th grade math:

Finally, here are the trends for 4th grade reading:

In each case, low income kids in the two states that have more aggressively pursued funding equity reforms outperform and have improved their performance faster than low income kids in states that have not.

While these trends are hardly conclusive evidence [more extensive analysis is underway] of a link between the funding shifts and outcome gains, these trends are consistent with a significant body of research on the effects of school finance reforms, which I summarize in these articles/papers:

  1. http://www.tcrecord.org/library/abstract.asp?ContentId=16106
  2. http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

These illustrations, coupled with the larger body of related research, present a more compelling case for improving funding equity and adequacy than the case that has been presented thus far for other reforms – reforms which lack both a significant research base and other compelling evidence.

Simply arguing that teacher quality varies – good teachers matter – and that we have to “fix” teacher quality says nothing of how to actually improve teacher quality or the resources required to get the job done. “Teacher quality” is not a policy.

Notably, the CT Education reform group that posted the original absurd claims later softened their arguments regarding the causes of Connecticut’s failures and Mass and NJ’s successes. But somehow, they still found it in them to pitch the same policy solutions. Yes, solutions that had nothing to do with why Mass and NJ were outperforming CT. If the original argument doesn’t hold, then neither do the proposed solutions! That’s how dumb this debate has gotten!

Let’s be absolutely clear here – During the years over which Mass and NJ saw these substantial improvements in outcomes, both states had teacher tenure systems – in both successful and less successful schools! Both states had rather traditional teacher evaluation policies in place. Both states had relatively small overall shares of children attending charter schools. Massachusetts has received some accolades for its accountability system adopted concurrent with funding reforms.

Even if the path to improved productivity and efficiency did involve reforms with any resemblance to the teacher evaluation or charter school reforms on the table, equitable and adequate financing would be a prerequisite condition for doing any of it well. Real innovation requires real investment.

Notably, the types of innovations adopted by those few charter chains which show relatively consistent records of success are rather mundane strategies which involve providing more time and intensive supplemental tutoring, typically at a substantially greater per pupil cost – much of it reflected in higher teacher salaries to support the additional time and responsibilities.   Yes, even in Connecticut those higher flying charters, while serving far less needy student populations than other schools in their host districts, are also outspending them! (and pay teachers more!)

We also have pretty good evidence from research that salaries matter (see page 7-)– that the average level of teacher salaries affects the quality of entrants to the profession – and that the relative salary paid in one district versus another may affect teacher job choice – leading to sorting of teacher credentials across districts.

By contrast, we have little evidence that mass deselection of teachers on the basis of noisy and potentially biased measures of student achievement growth – without counterbalancing the risk with substantial additional reward – would have any benefit whatsoever [outside of self-fulfilling simulated correlations], and it might, in fact, do significant harm! We also have a pretty solid track record of studies suggesting few or no benefits of attaching compensation to similar performance metrics (e.g. merit pay).[1] Yet the reformy rhetoric advancing these policies through copy-and-paste template legislation persists! Is anyone taking even a brief time out to think about and evaluate these proposals?

Even if we didn’t have all of this evidence for how and why money matters, wouldn’t it make sense on its face to provide schools and districts a level playing field? What possible argument is there for deciding to completely overlook substantial financial inequities in favor of policy options which, to implement properly (if that’s even possible), require equitable financial resources? Yes, real reform costs money. Good schools cost money!

If reformy advocates really don’t believe that money has anything to do with improving schooling quality, then why do charter advocates in Connecticut (and elsewhere) push so hard for substantial increases in funding ($2,600 per pupil in CT)? Yet the underfunded high need districts stand to gain far, far less under current proposals (only about $250 per pupil). Funding those districts appropriately comes with a higher price tag than increasing charter funding because they serve a lot more students and far more needy students than CT charters. The persistent inequities faced by these districts are glaring… right down to the delivery of basic curricular options, especially in the neediest, least well resourced districts:

  1. Larger class sizes
  2. Advanced and enriched curricular options
  3. Teacher salaries
  4. Concentration of novice teachers

These are the real differences between well resourced, high performing and poorly resourced, low performing districts in Connecticut. Both groups of districts pay teachers based on degrees and experience. Both sets of districts have tenure systems. I suspect that both sets of districts do little systematically different regarding teacher evaluation (which might be improved, but not by some ill-conceived state legislation). So clearly none of that stuff has much of anything to do with which districts are succeeding and which are not! The districts are substantively different in terms of who they serve, and in terms of the resources (by the measures noted above) they have to serve them!

I’ve long argued that it seems rather hypocritical to hear staunch charter advocates argue that more money wouldn’t help traditional public schools, and then those same charter advocates set out to help their favored charters substantially outspend the nearby public schools (meanwhile, invariably serving less needy, less costly children to educate).

If money has nothing to do with improving schooling quality – or providing high quality schooling – then why is the average tuition of private independent day schools in Connecticut typically well above (around $25k elementary, up to $35k high school) current spending levels of nearby public districts (where tuition covers only a portion of current spending)?[2] Money – and what it buys – clearly means something to the consumers of what I might call luxury schooling. Nationally, private independent schools spend on average 1.96 times the average spending of public districts in their same labor market.[3] Meanwhile, these schools DO NOT generally fire teachers at will. They do typically pay based largely on seniority and degree level. What do they do? Well, for one thing, they provide small class sizes [whether that’s the most efficient allocation or not is a separate question, but that is what they choose to do… and what they often advertise]!

It is about the money. Even/especially those disingenuous arguments that it’s NOT about money? Those arguments, perhaps, are the ones that most clearly indicate that it IS about money – and a complete unwillingness to acknowledge the importance of money to anyone but oneself. Strangely, we’ve reached a point in the discourse where it’s all about finding any rationale we can to NOT give money to the schools and districts that need it most, while continuing to blame them, their teachers and the children they serve for the persistent cycle of poor performance – and where we define their performance as poor by comparison to their more advantaged and better resourced peers! Isn’t anyone else seeing the absurdity?[4] [starve them… shift money away from them… then blame them for doing even worse?]

Clearly the media isn’t catching it. I’m less frustrated by the absurdity of these arguments (I rather expect it in the political sphere) than the fact that the media is either complicit in advancing the silliness, or intellectually incapable of seeing through it.

Those who continue to ignore outright the role of money in schooling in favor of reformy window dressing – meanwhile leveraging every opportunity to access a greater share of the federal, state and local tax dollar – present a laughable case for their preferred reforms. But for some reason, I’m finding it really hard to laugh.

=======

[1]Glazerman, S., Seifullah, A. (2010) An Evaluation of the Teacher Advancement Program in Chicago: Year Two Impact Report. Mathematica Policy Research Institute. 6319-520
Springer, M.G., Ballou, D., Hamilton, L., Le, V., Lockwood, J.R., McCaffrey, D., Pepper, M., and Stecher, B. (2010). Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching. Nashville, TN: National Center on Performance Incentives at Vanderbilt University.
Marsh, J. A., Springer, M. G., McCaffrey, D. F., Yuan, K., Epstein, S., Koppich, J., Kalra, N., DiMartino, C., & Peng, A. (2011). A Big Apple for Educators: New York City’s Experiment with Schoolwide Performance Bonuses. Final Evaluation Report. RAND Corporation & Vanderbilt University.

[2] http://www.brunswickschool.org/admissions/fees-financial-aid/, http://www.chasecollegiate.org/page.cfm?p=208, http://www.fairfieldprep.org/page.cfm?p=24, http://www.greenwichacademy.org/podium/default.aspx?t=32259, http://www.klht.org/podium/default.aspx?t=2170, http://www.stanwichschool.org/admissions/tuition.asp, http://woosterschool.org/admissions/tuition-fees

[3] Baker, B. (2009). Private schooling in the U.S.: Expenditures, supply, and policy implications. Boulder and Tempe: Education and the Public Interest Center & Education Policy Research Unit. Retrieved [date] from http://epicpolicy.org/publication/private-schooling-US

[4] An argument taken to its extremes in a recent NJDOE report: https://schoolfinance101.wordpress.com/2012/02/24/how-not-to-fix-the-new-jersey-achievement-gap/

Follow up on Reformy Logic in Connecticut

A few days ago, I responded to an utterly silly CT Ed Reform op-ed which argued that poverty doesn’t really matter so much, nor (by omission) does funding, and that Massachusetts and New Jersey do better than Connecticut on behalf of low income kids because they’ve adopted accountability and teacher evaluation reforms in the past few years. Thus, the answer is for Connecticut to follow suit by adopting SB 24 in its original form. To be clear, NJ has absolutely not adopted anything like SB 24. Here’s a key section of that op-ed:

We think folks would be hard-pressed to argue that low-income students right over the border in Massachusetts or New Jersey face very different circumstances at home than the low-income students in Connecticut.  So, what actions have our neighboring states taken to address their achievement gaps that Connecticut hasn’t?  Put bluntly, they have adopted education reform policies very similar to the ones proposed in Governor Malloy’s original education reform bill.  They have adopted or implemented policies that evaluate teachers on the basis of student performance, that rank schools and districts within a tiered intervention framework, and that provide the Commissioner with the authority to intervene in the lowest performing schools and districts.

In my previous post, I already pointed out a few simple realities… like the fact that both Massachusetts and New Jersey have systematically tackled school funding equity over time, where Connecticut has not. I also suggested that it might be rather foolish to argue that policies considered and/or adopted but not even really implemented yet were the cause of improvements to New Jersey and Massachusetts low income student progress.

So, here’s just one more graph to drive home that point:

http://nces.ed.gov/nationsreportcard/naepdata/dataset.aspx

Yes – Mass and NJ do better than CT, across all student groups, including lower income students. And yes, Mass and NJ have shown higher rates of growth in student outcomes, across all student groups. And you know what, much if not most of that growth has occurred prior to considering, piloting and partially implementing new policies.

So… to put it really simply… it’s pretty darn unlikely that Mass and NJ do better than CT because of recent policy developments.

Further, across all three states, poverty continues to matter.

Children who do not qualify for either free or reduced lunch outperform those who do.

Children who qualify for reduced lunch (185% poverty income level) outperform those who qualify for free lunch (130% poverty income level).

Notably, children qualifying for free lunch in Massachusetts have surpassed those qualified for reduced lunch in CT and those qualified for free lunch in NJ are catching up with those qualified for reduced lunch in CT.

In fact, gaps in the other two states remain relatively large as well because of the growth in outcomes of higher income students which has largely paralleled the growth among lower income students.

Friday Thoughts: Is there really a point to advocating both standardization and choice?

I’ve long been perplexed that the Thomas B. Fordham Institute frames as its top two policy priorities:

  1. Implementing the Common Core
  2. Advancing Choice

Their new web site layout makes this more obvious.

More recently, a report released by the Council on Foreign Relations (referred to largely as the Rice-Klein report in the media and on twitter) argued that our “failing” education system is  a national security concern, and that the road to addressing that concern involves:

  1. expanding the Common Core State Standard initiative to include subjects beyond math and English Language Arts;
  2. an expansion of charter schools and vouchers

Now, as I understand it, there’s at least a subtle difference between these two sources on the point regarding vouchers and charter schools in that Fordham does not appear these days to be out front on promoting vouchers and instead seems to be favoring charter expansion (avoiding the word “voucher” but welcoming “other approaches that provide parents and children solid options and the capacity to make maximum use of them”).

Let me be clear that this post isn’t about favoring or slamming either vouchers or the common core, but rather pointing out that favoring both is entirely inconsistent, unless there’s some weird, warped agenda behind it all. This post IS about slamming the two, when used in combination. It just doesn’t make sense.  Let’s throw into this mix other policies promoting standardization of the operations of traditional public schools like forcing those schools to make personnel decisions based largely on student assessment data.

Collectively what we have here is a massive effort on the one hand, to require traditional public school districts to adopt a common curriculum and ultimately to adopt common assessments for evaluating student success on that curriculum and then force those districts to evaluate, retain and/or dismiss their teachers based on student assessment data, while on the other hand, expanding publicly financed subsidies for more children to attend schools that would not be required to do these things (in many cases, for example, relieving charter schools from teacher  evaluation requirements).

For example, if we believe that improving understanding of core scientific concepts is important for our national security or economic competitiveness, why would we be trying to increase the number of students who opt out of those standards, opting instead to attend fundamentalist religious institutions which may be decidedly anti-science? It seems like it would be one or the other? Certainly, TB Fordham Institute appears concerned with the importance of teaching science, and evolution specifically. When they simultaneously promote “other” choice alternatives, are they suggesting the regulation of science curriculum in those alternatives?

Also, if one believes that competitive pressures create improvement across schools (by stimulating innovation), why set up totally different rules – absurd constraints, in fact – for the largest set of schools in the mix? That seems rather counterproductive and certainly limits any potential for real innovation. My critique all along of Race to the Top as a stimulus for innovation was that RTTT was anything but a stimulus for innovation, and was instead a bribe to get states to fast-track a handful of preferred and completely unfounded reformy template policies – effectively squelching any real innovation that might have otherwise occurred.

One might instead argue for forcing all schools – public, private (if voucher receiving) and charter – to adopt the common core and evaluate teachers with student test data – and to simultaneously promote a broad based choice program. Yeah… let’s try really hard to make all schools the same and then let individuals choose among them? What we would have is a program that allows parents to choose which school adopts the common core better, and uses testing data better when firing teachers. That doesn’t seem to make a whole lot of sense, either.

No matter how you cut it, combining these two broad preferences leads to a ridiculous mix of policies, whichever side you’re coming from (unless, of course, you’re trying to come from both at once).

So, this all has me wondering if the real objective here – among advocates of these seemingly contradictory policies – is actually to make traditional public schooling so utterly unbearable for both teachers and students by expanding the testing and standards driven culture, expanding curricular standards across areas previously untouched, sucking any remaining creativity out of teaching, and mechanizing the teaching workforce in traditional public schools, making even the worst of the less-regulated alternatives seem more desirable for future generations of both teachers and students?

 

The Principal’s Dilemma

This is a bit of a tangential post for this blog, but it’s a topic a few of us have been tweeting about and discussing for the past day or so.

In a series of recent blog posts and in a forthcoming article I have discussed the potential problems with using bad, versus entirely inappropriate measures for determining teacher effectiveness.  I have pointed out, for example, that using value-added measures to estimate teacher effectiveness and then determine whether a teacher should be denied tenure, or have their tenure removed might raise due process concerns which arise from the imprecision and potential outright inaccuracy of teacher effectiveness estimates derived from such methods.

I have also explained that in some states like New Jersey, which have adopted Student Growth Percentile measures as an evaluation tool, where those measures are used as a basis for dismissing teachers, teachers (or their attorneys) might simply rely on the language of the authors of those methods to point out that they are not designed to, nor were they intended to, attribute responsibility for the measured student growth to the teacher. Where attribution of responsibility is off the table, dismissing a teacher on an assumption of ineffectiveness based on these measures is entirely inappropriate, and a potential violation of the teacher’s due process rights.

But, the problem is that state legislatures are increasingly mandating that these measures absolutely be used when making high stakes personnel decisions – that, for example, such measures count for a significant percentage of the final decision (see notes here) to tenure or remove tenure from a teacher, and in some cases (like NY) that these measures be the absolute determinant (that a teacher cannot be rated as good if they have bad value-added ratings). Some state statutes and regulations provide more flexibility, but essentially require that principals and/or district officials develop their own systems and measures which generally conform to value-added or SGP methods or include them as measures within the evaluation process.

Enter the principal’s dilemma. I would argue that state policymakers in many regards have quickly passed along from one state to another, ill-conceived copy-and-paste legislation with little substantive input from the constituents who actually have to implement this stuff. And, as is clear by the groundswell of opposition in states like New York by principals in particular, many charged with the on-the-ground implementation of these policies are, shall we say, a bit concerned. But what to do?

A principal might be concerned, for example, that if she actually follows through with implementation of these ill-conceived fast-tracked policies, and uses the recommended or required measures or follows the preferred methods for developing her own measures, that she might end up being backed into violating the due process rights of teachers.  That is, the principal might, in effect, be required to dismiss a teacher based on measures that the principal understands full well are neither reliable nor valid for determining that teacher’s effectiveness.

So, can the principal simply refuse to implement state policy? My guess is that even if the district board of education agreed in principle with the principal, that the state would threaten some action against the local school district – applying sufficient pressure (perhaps financially) – such that the local board of education would take action against the principal. And, because the principal would be failing to fulfill her official duties as defined in state statutes and regulations, the principal would have no legal leg to stand on – though might at least have a clear conscience to carry with her in search of a more reasonable state that has avoided such foolish, restrictive policies.

The principal might instead halfheartedly comply with the letter of the state statutes, but still vocally oppose the statutes and regulations in blogs, on twitter and in local op-ed columns. This is where we might think that the principal would be on safer ground. Unfortunately, recent legal precedents suggest that even in this case, the principal might be at a loss for a winning legal defense if the local school board is pressured into action against her. To the extent that the principal’s public airing of concerns with the newly adopted policies relates to her own official duties as a principal, the principal may not even be able to make a first amendment argument in her own defense regarding her concerns with the current direction of public policy on teacher evaluation – even though the principal might actually be a pretty good source of opinion on the matter. In Garcetti, “the Supreme Court held that speech by a public official is only protected if it is engaged in as a private citizen, not if it is expressed as part of the official’s public duties.”

An awkward situation indeed. It would seem that the only choice the principal has, if she is not to jeopardize her own career, is to suck it up, be quiet and do what she knows is wrong, violating the due process rights of one teacher after another by being the hand that implements the ill-conceived policies drawn up by those with little or no comprehension of what they’ve actually done.

Is this really how we want our schools to be run?

Note: Reformy policy is particularly schizophrenic regarding deference to principals and respect for their decision making capacity. Consider that two key elements of the reformy teacher effectiveness policy template are a) highly restrictive guidelines/matrices/rating systems for teacher evaluation and b) mutual consent hiring and placement policies. Mutual consent policies coupled with anti-seniority preference policies (part of the same package) require that when a teacher is to be hired into or placed in a specific school within a district, district officials must have the consent of the school principal in order to make such a placement. These policies presume that principals make only good personnel decisions but district officials are far more likely to make bad ones. These policies also ignore that districts retain latitude to place principals, and further, that there might actually be a case where the district office wishes to place a top notch teacher in a school that currently has weak leadership – but where that weak leader might be inclined to deny the high quality teacher. It’s just a silly policy with no basis in practicality or in research. But at its core, the mutual consent policy asserts that the principal is all-knowing and the best person to make personnel decisions. However, these mutual consent policies are often included in the very same packages which then require the principal to a) rate teacher effectiveness in accordance with a prescriptive rubric and b) tenure and/or de-tenure teachers in accordance with that rubric on highly restrictive timelines (3 good years to tenure, 2 bad and you’re out). Put really simply… it’s one or the other. Either principals’ expertise should be respected or not. Simultaneously advocating both perspectives seems little more than an effort to confuse and undermine the efficient operation of public school systems!

Baseless Reformy Thoughts from Connecticut (& How this year’s reforms improved decades of past performance!?)

This utterly absurd post appeared yesterday on the CT Ed Reform blog:

http://ctedreform.org/blog/2012/04/poverty-is-not-to-blame-ct%E2%80%99s-low-income-students-rank-48th-in-the-nation-while-ma%E2%80%99s-rank-2nd/

Essentially, the argument goes:

  1. CT’s achievement gap is worse than achievement gaps in states like Massachusetts and New Jersey and in particular, CT’s low income students perform less well than low income students in those states.
  2. Massachusetts has recently adopted reforms to teacher evaluation, which is obviously why Massachusetts has a smaller achievement gap and better performance among low income students. (the post cites New Jersey as well)
  3. THEREFORE, WE KNOW THAT POVERTY IS NOT THE ISSUE…. IT’S TEACHER EVALUATION (and Charter schools, and other reformy stuff)!
  4. Therefore, the solution is to pass SB24 in its original form, which includes such fun things as student test based evaluation of teachers (3 good years to tenure, 2 bad and you’re out), additional funding for and expansion of charter schools – which we know can overcome this pesky poverty distraction!

Now, before I even begin here, I found it most absurd that this particular Ed Reform posting attributed progress made in Massachusetts over the past two decades to policies adopted in the past two years! As they put it:

We think folks would be hard-pressed to argue that low-income students right over the border in Massachusetts or New Jersey face very different circumstances at home than the low-income students in Connecticut.  So, what actions have our neighboring states taken to address their achievement gaps that Connecticut hasn’t?  Put bluntly, they have adopted education reform policies very similar to the ones proposed in Governor Malloy’s original education reform bill.  They have adopted or implemented policies that evaluate teachers on the basis of student performance, that rank schools and districts within a tiered intervention framework, and that provide the Commissioner with the authority to intervene in the lowest performing schools and districts.

That’s just funny! Ridiculous in fact, making me wonder if this really was just an April fool’s joke.   Let’s make this really plain and simple.

Policies adopted in Massachusetts in the past two years and not yet even fully implemented did not cause low income children in Massachusetts to outperform low income children in Connecticut between 1992 and 2009!

And about New Jersey, I’m not quite sure what policies (legislation) they are talking about, since legislation regarding teacher evaluation is still in its early incubation stages. Pilot studies have begun in a handful of districts, but to suggest that pilot studies being implemented and evaluated now somehow explain past performance of low income students in New Jersey is, well, just dumb.

Now, Massachusetts and New Jersey have in fact implemented reforms that Connecticut has not – SCHOOL FINANCE REFORMS (in Mass, coupled with accountability reforms in the 1990s). Both states have more systematically targeted funding to higher need districts. And for a review of the literature on the effects of such school finance reforms, with specific references to New Jersey and Massachusetts, see:

  1. http://www.tcrecord.org/library/abstract.asp?ContentId=16106
  2. http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

While Connecticut has selectively driven magnet aid to Hartford and New Haven, Connecticut has left other high need districts out entirely. Further, in recent years, as shown in a previous post, Connecticut charter schools have substantively segregated students by income within Hartford, New Haven and Bridgeport.

Here’s a quick snapshot, using 2009 data, of the relationship between current spending per pupil and U.S. Census poverty rates for districts enrolling over 2,000 pupils within New Jersey, Massachusetts and Connecticut. The notable feature of these graphs, indicated by the r-squared value, is that in both Massachusetts and New Jersey, current spending per pupil is far more predictably a function of differences in local district poverty rates (adjusted for regional cost variation). In New Jersey and Massachusetts, poverty variation explains more than a third to nearly half of the variation in per pupil spending; in Connecticut, less than 1/6!

Financial data: http://www.census.gov/govs/school/

Poverty data: http://www.census.gov/did/www/saipe/data/schools/data/index.html

Now, on to other issues: That achievement gap!

Yes, Connecticut does have a relatively large achievement gap, but that gap has to be put in context with similar states, as I explain here. The short version of this story is that the achievement gaps between low income and non-low income students across states are largely a function of the income gaps between the two groups. Here’s my graph of that relationship:

In the upper right hand corner of this graph are the states with both large income gaps between poor and non-poor kids and with large achievement gaps between them. Yes, Connecticut’s gaps are larger than those of Mass or New Jersey, and those are perhaps the most relevant comparison states (the post got that right – but that’s about all).

We know the correlates of student achievement across CT schools: Poverty!

As it turns out, across these three states, in each case, lower income students perform less well on NAEP. Children qualified for reduced lunch perform less well than those not qualified for subsidized meals at all, and children qualified for free lunch perform less well than those who qualify for reduced price lunch. That’s why, in some cases, I choose to parse these populations in comparisons where most schools serve children below the upper threshold.

Data source: http://nces.ed.gov/nationsreportcard/naepdata/dataset.aspx

Data source: http://nces.ed.gov/nationsreportcard/naepdata/dataset.aspx

It also turns out that if we just go nutty with lots of different measures across Connecticut districts to identify those factors that are most highly correlated with student achievement measures (all data can be found here: http://sdeportal.ct.gov/Cedar/WEB/ct_report/DTHome.aspx & http://www.nces.ed.gov/ccd/bat) we find that various measures of poverty or household income are pretty darn highly associated with student outcome measures! This would seem to suggest that perhaps poverty and measures related to it do likely matter in some way. We’re talking about correlations near and above .80 here between % free/reduced and CMT scores across districts!
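For anyone who wants to replicate this kind of check, a minimal sketch of the correlation exercise follows. The variable names are placeholders for whatever district-level measures one pulls from the CT and NCES portals linked above.

```python
# Minimal sketch of the "lots of measures" correlation exercise described above:
# a simple correlation matrix between district need measures and outcome measures.
# Column names are placeholders, not the actual fields in the linked data sets.
import pandas as pd

df = pd.read_csv("ct_district_measures.csv")  # hypothetical merged district file
need_cols = ["pct_free_reduced", "census_pov_rate", "median_income", "pct_ell"]
outcome_cols = ["cmt_math_scale", "cmt_reading_scale"]

# Correlations near/above .80 between % free/reduced lunch and CMT scores are
# the kind of pattern reported in the post.
print(df[need_cols + outcome_cols].corr().loc[need_cols, outcome_cols].round(2))
```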

We know which districts have greater needs, by various measures!

The reality is that by any number of measures of income or poverty, we know which Connecticut districts have greater student needs – more low income children and, in many cases, more children with limited English proficiency. Here are the scatterplots of the relationships between the poverty, income and ELL measures used in the above correlation analysis.

Yep. It’s all pretty straightforward. Various measures of income and poverty are pretty highly related across Connecticut. It’s a highly socioeconomically and racially segregated state. And student outcome measures remain highly correlated with socio-economic measures and with racial composition of school districts!

Malloy’s plan does little or nothing to help them financially

We also know from my previous posts that the Malloy plan does little or nothing to infuse additional resources into the highest need districts. Here it is again!

Additional data sources suggest that CT charters serve fewer needy kids and spend more per pupil than surrounding district schools

But, the Malloy plan does include a substantial boost in funding for charter schools, which I have shown in previous posts tend to serve the less needy kids within the highest need settings in the state. Further, I have shown in those earlier posts that Connecticut charter schools don’t appear to be systematically financially disadvantaged when compared to traditional public schools.

I recalled the other day that the new U.S. Department of Education school site data set released a short while ago includes per pupil spending figures for charter schools in some states. Among those states is Connecticut. The report and data also include information on shares of children by school who are low income.

Here are a few quick snapshots of how Connecticut charter schools in Hartford, New Haven and Bridgeport compare in terms of spending and poverty to “regular” public schools in each of those host districts.

First, without charter names:

Data source: http://www2.ed.gov/about/offices/list/opepd/ppss/reports.html#comparability-state-local-expenditures

[Note that the data set has the variable labeled as “school poverty rate” when in fact it is, I believe, a % free or reduced lunch measure. I’ve left the data label as it is in the original data set]

Now, with the names:

Data source: http://www2.ed.gov/about/offices/list/opepd/ppss/reports.html#comparability-state-local-expenditures

[Note that the data set has the variable labeled as “school poverty rate” when in fact it is, I believe, a % free or reduced lunch measure. I’ve left the data label as it is in the original data set]

A few things are notable here.

First, overall, there’s simply no upward tilt in per pupil spending by school poverty rate. That said, I’m only looking here within higher poverty settings (with the trendline determined by the regular public schools only).  In other words, higher poverty schools don’t generally have more resources per pupil than lower poverty ones within these cities, but my experience with similar data in other settings indicates that these variations are most often explained by the distribution of children with disabilities (also scarce in CT charters).
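Here is a hypothetical sketch of that within-city comparison – fitting the spending-to-poverty trendline on the regular district schools only, and then asking where the charters fall relative to that line. The column names and the Bridgeport filter are illustrative assumptions, not the actual data layout.

```python
# Hypothetical sketch: fit the spending-vs-"poverty" trendline using regular
# district schools only, then compare charter schools against that line.
import pandas as pd
import statsmodels.formula.api as smf

schools = pd.read_csv("ct_school_spending.csv")  # hypothetical school-level file
city = schools[schools["host_district"] == "Bridgeport"]  # or Hartford, New Haven

regular = city[city["charter"] == 0]
fit = smf.ols("spend_pp ~ school_poverty_rate", data=regular).fit()

charters = city[city["charter"] == 1].copy()
# Positive residuals mean a charter spends more than a regular public school at
# the same reported "poverty" (free/reduced lunch) rate would predict.
charters["resid_vs_district_line"] = charters["spend_pp"] - fit.predict(charters)
print(charters[["school_name", "school_poverty_rate", "spend_pp",
                "resid_vs_district_line"]])
```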

Second, as I illustrated in my previous post, the charter schools in these cities stand out in terms of the populations they serve (by low income status). New Haven schools have somewhat of a spread of low income rates, but Hartford and Bridgeport are all crunched up against the 100% mark (not all at 100% though, and with some more spread when I use free lunch only). In each case, charter % low income is lower than most regular schools in the host district.

Third, consistent with my previous analysis (but likely because the state reported the data to USDOE), many (more than not & all Achievement 1st schools) charter schools appear to be spending not only more than schools serving similar student populations (by income status), of which there are very few in these settings, but also more than district schools serving much lower income student populations.

These data certainly  raise questions about the validity of the current policy push in Connecticut and raise even more questions about the stated reasons for that push. That is, to the extent that anyone truly believes the absurd rhetoric that test-based teacher evaluation policies and expanding higher spending lower poverty charter schools are the solution to Connecticut’s achievement gap.

Firing teachers based on bad (VAM) versus wrong (SGP) measures of effectiveness: Legal note

In the near future my article with Preston Green and Joseph Oluwole on legal concerns regarding the use of Value-added modeling for making high stakes decisions will come out in the BYU Education and Law Journal. In that article, we expand on various arguments I first laid out in this blog post about how use of these noisy and potentially biased metrics is likely to lead to a flood of litigation challenging teacher dismissals.

In short, as I have discussed on numerous occasions on this blog, value-added models attempt to estimate the effect of the individual teacher on growth in measured student outcomes. But, these models tend to produce very imprecise estimates with very large error ranges, jumping around a lot from year to year. Further, individual teacher effectiveness estimates are highly susceptible to even subtle changes to model variables. And failure to address key omitted variables can lead to systemic model biases which may even lead to racially disparate teacher dismissals (see here & for follow up, here).
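A toy simulation – not any state’s actual model – helps illustrate why such noisy estimates jump around from year to year: if each teacher’s measured effect is a stable true effect plus substantial estimation error, year-over-year estimates correlate only weakly and many teachers cross rating thresholds purely by chance. The effect and noise magnitudes below are made up for illustration.

```python
# Toy simulation of noisy teacher effect estimates: stable true effect + large
# estimation error each year. All magnitudes are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 1000
true_effect = rng.normal(0, 1, n_teachers)   # stable "real" differences
noise_sd = 1.5                               # assumed error larger than the signal

year1 = true_effect + rng.normal(0, noise_sd, n_teachers)
year2 = true_effect + rng.normal(0, noise_sd, n_teachers)

print("year-to-year correlation of estimates:",
      round(np.corrcoef(year1, year2)[0, 1], 2))

# Share of teachers flagged in the bottom 20% in year 1 who are NOT flagged in
# year 2 -- unstable classification despite a constant true effect.
cut1, cut2 = np.quantile(year1, 0.2), np.quantile(year2, 0.2)
flagged1 = year1 <= cut1
print("of those in the bottom 20% in year 1, share not there in year 2:",
      round(np.mean(year2[flagged1] > cut2), 2))
```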

Value added modeling as a basis for high stakes decision making is fraught with problems likely to be vetted in the courts.  These problems are most likely to come to light in the context of overly rigid state policy requirements requiring that teachers be rated poorly if they receive low scores on the quantitative component of evaluations, and where state policies dictate that teachers must be put on watch and/or de-tenured after two years of bad evaluations (see my post with NYC data on problems with this approach).

Significant effort has been applied toward determining the reliability, validity and usefulness of value-added modeling for inferring school, teacher, principal and teacher preparation institution effectiveness. Just see the program from this recent conference.

As implied above, it is most likely that when cases challenging dismissal based on VAM make it to court, deliberations will center on whether these models are sufficiently reliable or valid for making such judgments – whether teachers are able to understand the basis for which they have been dismissed and whether it is assumed that they have had any control over their fate.  Further, there exist questions about how the methods/models may have been manipulated in order to disadvantage certain teachers.

But what about those STUDENT GROWTH PERCENTILES being pitched for similar use in states like New Jersey?  While on the one hand the arguments might take a similar approach of questioning the reliability or validity of the method for determining teacher effectiveness (the supposed basis for dismissal), the arguments regarding SGPs might take a much simpler approach. In really simple terms SGPs aren’t even designed to identify the teacher’s effect on student growth. VAMs are designed to do this, but fail.

When VAMs are challenged in court, one must show that they have failed in their intended objective. But it’s much, much easier to explain in court that SGPs make no attempt whatsoever to estimate that portion of student growth that is under the control of, therefore attributable to, the teacher (see here for more explanation of this).  As such, it is, on its face, inappropriate to dismiss the teacher on the basis of a low classroom (or teacher) aggregate student growth metric like SGP. Note also that even if integrated into a “multiple measures” evaluation model, if the SGP data becomes the tipping point or significant basis for such decisions, the entire system becomes vulnerable to challenge.*

The authors (& vendor) of SGP, in very recent reply to my original critique of SGPs, noted:

Unfortunately Professor Baker conflates the data (i.e. the measure) with the use. A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: To separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

http://www.ednewscolorado.org/2011/09/13/24400-student-growth-percentiles-and-shoe-leather

That is, the authors and purveyors clearly state that SGPs make no ATTRIBUTION OF RESPONSIBILITY for progress to either the teacher or the school. The measure itself – the SGP – is entirely separable from attribution to the teacher (or school) of responsibility for that measure!

As I explain in my response, here, this point is key. It’s all about “attribution” and “inference.” This is not splitting hairs. This is a/the central point! It is my experience from expert testimony that judges are more likely to be philosophers than statisticians (empirical question if someone knows?).  Thus quibbling over the meaning of these words is likely to go further than quibbling over the statistical precision and reliability of VAMs. And the quibbling here is relatively straightforward, and far more than mere quibbling I would argue.

A due process standard for teacher dismissal would at the very least require that the measure upon which dismissal was based, where the basis was teaching “ineffectiveness”, was a measure that was intended to INFER a teacher’s effect on student learning growth – a measure which would allow ATTRIBUTION OF [TEACHER] RESPONSIBILITY for that student growth or lack thereof.  This is a very straightforward, non-statistical point.**

Put very simply, on its face, SGP is entirely inappropriate as a basis for determining teacher “ineffectiveness” leading to teacher dismissal.*** By contrast, VAM is, on its face appropriate, but in application, fails to provide sufficient protections against wrongful dismissal.

There are important implications for pending state policies and current and future pilot programs regarding teacher evaluation in New Jersey and other SGP states like Colorado. First, regarding legislation, it would be entirely inappropriate and a recipe for disaster to mandate that soon-to-be available SGP data be used in any way tied to high stakes personnel decisions like de-tenuring or dismissal. That is, SGPs should be neither explicitly nor implicitly suggested as a basis for determining teacher effectiveness. Second, local school administrators would be wise to consider carefully how they choose to use these measures, if they choose to use them at all.

Notes:

*I have noted on numerous occasions on this blog that in teacher effectiveness rating systems that a) use arbitrary performance categories, slicing decisive arbitrary categories through noisy metrics and b) use a weighted structure of percentages putting all factors alongside one another (rather than sequential application), the quantified metric can easily drive the majority of decisions, even if weighted at a seemingly small share (20% or so). If the quantified metric is the component of the evaluation system that varies most, and if we assume that variation to be “real” (valid), the quantified metric is likely to be 100% of the tipping point in many evaluations, despite being only 20% of the weighting.
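A toy example makes the “20% of the weight, 100% of the tipping point” claim concrete: if the test-based score is spread across the full range (because it is norm-referenced) while the other components cluster tightly near the top, the test-based score drives who falls below an arbitrary cut even at a small nominal weight. All numbers below are invented for illustration.

```python
# Toy illustration: a component with 20% of the weight but most of the variance
# dominates who lands in an arbitrary "ineffective" category.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
# Observation/other measures: most teachers clustered near the top (little spread).
other = rng.normal(0.85, 0.05, n).clip(0, 1)
# Norm-referenced test-based metric: spread across the full range by construction.
test = rng.uniform(0, 1, n)

composite = 0.8 * other + 0.2 * test
cut = np.quantile(composite, 0.1)   # arbitrary cut defining the bottom category
flagged = composite <= cut

# Which component explains being flagged? The 20%-weighted test metric
# contributes roughly two-thirds of the composite's variance here.
print("corr(flagged, test score):", round(np.corrcoef(flagged, test)[0, 1], 2))
print("corr(flagged, other measures):", round(np.corrcoef(flagged, other)[0, 1], 2))
```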

A critical flaw with many legislative frameworks for teacher evaluation and district adopted policies is that they place the quantitative metrics alongside other measures, including observations, in a weighted calculation of teacher effectiveness. It is this parallel treatment of the measures that permits the test driven component to override all other “measures” when it comes to the ultimate determination of teacher effectiveness and, in some cases, whether the teacher is tenured or dismissed. A simple logical resolution to this problem is to use the quantitative measures as a first step – a noisy pre-screening – in which administrators – perhaps central office human resources – might review the data to determine whether the data are indicating potential problem areas across schools & teachers – knowing full well that these might be false signals due to data error and bias. But, the data used in this way at this step might then guide district administration on where to allocate additional effort in classroom observations in a given year. In this case, the quantified measures might ideally improve the efficiency of time allocation in a comprehensive evaluation model, but would not serve as the tipping point for decision making. I suspect, however, that even used in this more reasonable way, administrators will realize over time that the initial signals tend not to be particularly useful.
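A minimal sketch of the screening alternative described above – using the noisy quantitative metric only to prioritize where to spend additional observation time, never as a direct input to the final rating. The data frame columns and the flagging threshold are hypothetical.

```python
# Hypothetical "screening, not tipping point" use of a noisy growth metric:
# flag possible problem areas to prioritize observation time, nothing more.
import pandas as pd

def prioritize_observations(df: pd.DataFrame, flag_quantile: float = 0.2) -> pd.DataFrame:
    """Return teachers whose noisy growth metric falls in the bottom quantile.

    These flags are treated as possible false signals; they only determine who
    receives additional classroom observations this year, not who is rated
    ineffective.
    """
    cutoff = df["growth_metric"].quantile(flag_quantile)
    flagged = df[df["growth_metric"] <= cutoff].copy()
    flagged["action"] = "schedule additional observations"
    return flagged

# Final effectiveness ratings would then rest on the richer evidence gathered in
# those observations, not on the screening metric itself.
```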

**Indeed, one can also argue that a VAM regression merely describes the relationship between having X teacher, and achieving Y growth, controlling for A, B, C and so on (where A, B, C include various student characteristics, classroom level characteristics and school characteristics). To the extent that one can effectively argue that a VAM model is merely descriptive and also does not provide a basis for valid inference, similar arguments can be made. BUT, in my view, this is still more subtle than the OUTRIGHT FAILURE OF SGP to even consider A, B & C – factors that affect student outcomes but are clearly outside of teachers’ control.

***A non-trivial point is that if you review the conference program from the AEFP conference I mentioned above, or existing literature on this point, you will find numerous articles and papers critiquing the use of VAM for determining teacher effectiveness. But, there are none critiquing SGP. Is this because it is well understood that SGPs are an iron-clad method overcoming the problems of VAM? Absolutely not. Academics will evaluate and critique anything which claims to have a specific purpose. Scholars have not critiqued the usefulness of SGPs for inferring teacher effectiveness, have not evaluated their reliability or validity for this purpose, BECAUSE SCHOLARS UNDERSTAND FULL WELL THAT THEY ARE NEITHER DESIGNED NOR INTENDED FOR THIS PURPOSE.

A Few Additional CT Charter Figures

I was admittedly in a bit of a rush the other day to pull together some figures on CT charter schools based largely on data I had previously compiled, some of which only included Achievement First charter schools.  Here, I include all charter schools in Hartford, New Haven and Bridgeport, and address only the % Free Lunch numbers using the most recent available data from the NCES Common Core of Data, which are from 2009-10.  A few quick points are in order.

First, this is not “old” data per se. It is one year lagged from the  most recent official state data (2010-11). Current year (2011-12) data would not be appropriate for use until the close of the year. Thus 2010-11 would be the most recent complete data, if available. Also, these types of data tend to be relatively stable over time. They don’t shift much over a 2 year period, but I’ll keep updating as complete end of year data become available. The burden of reporting accuracy falls on the schools and districts.

Second, this is not a “study.” A study, so to speak, in my view, requires far more extensive analysis than this. And yes, this is a topic on which I have conducted those more extensive analyses (though not specifically involving CT charter schools). This is a blog, and in this blog post and in the previous blog post on CT charter schools I have merely rendered graphs of the existing data as reported by the schools. There’s no data editing involved, and no tricky statistical analyses (like the regression model of wages in my CT teacher post – which comes from previous work). It’s just graphs. Then why bother? Well, I bother because much of what I see in the ongoing debate over CT charter schools (and charters in some other locations) is guided by misinformation, or at least misconceptions (of charters beating the odds with the “same” students – proving poverty doesn’t matter! nor does money?). Misinformation that is easily enough correctable with a simple graph or two, or map, or even a table of the numbers. Hey… all of these numbers are available to each and every one of you. I’ve provided posts in the past where I explain how to get them and how to summarize and graph them. I wish someone else would save me the time, and go make their own graphs, or at least present and discuss the existing data to provide relevant context for current policy discussions. But alas, I’ve not seen that happening (though a few individuals have jumped into the game). Thus, I stick my nose, uninvited, into another state’s business once again.

All of that said, here are a few more graphs:

The upshot of these graphs is that it would certainly be unfair to criticize Achievement First specifically for serving fewer low income children than district schools in these major cities. In fact, in both New Haven and Hartford, the Achievement First charters have the higher low income concentrations among the charters, and in Bridgeport they are not the lowest.

It is also important to understand that districts have to a large extent self-induced economic segregation through their own magnet school programs. I’ve addressed the same issue regarding Newark, NJ in the past. So, economic segregation within these cities is not entirely driven by the presence of charters but rather by the complex mix of district traditional and magnet schools coupled with the introduction and expansion of charters.

 

SB24 won’t solve CT’s real Teacher Equity Problems

Connecticut’s SB 24 appears to be little more than boilerplate reformy legislation which, like similar legislation in other states, creates a massive smokescreen concealing the very real problems facing Connecticut school districts. I addressed in a previous post my concern that SB24’s emphasis on charter expansion as a solution for high poverty districts is misguided, mainly because most of those successful charter schools in CT are currently achieving their successes at least in part by NOT serving high poverty populations. And another part may be the additional resources of these schools, used for such things as increased school time, supported by increased teacher salaries.  But SB24 comes with few resources attached. The other major elements of SB24 involve teacher “effectiveness” with significant emphasis on use of student performance measures for teacher evaluation. For numerous posts on this topic, see: https://schoolfinance101.wordpress.com/category/race-to-the-top/value-added-teacher-evaluation/

A few points are in order before I move on.

First, even if we make value-added measures about 20% of an evaluation system, and observations and other measures cover the rest, if the value-added measures vary most (which they are likely to, simply because they will be reported as statistically norm-referenced), then the value-added measures likely become the tipping point more often than not. This is hugely problematic, given our inability to fully remove bias from these measures, and the fact that they remain so damn noisy as to hardly be useful at all (and are easily manipulated to yield different results for individual teachers).

Second, arguing that somehow using these noisy, potentially biased measures for personnel management or even mass deselection of teachers will somehow improve the equity of the distribution of teachers across advantaged and disadvantaged schools is simply absurd! This is especially the case if absolutely no attention is paid to existing underlying disparities in working conditions and teacher compensation.

The Real Connecticut Problem(s)

So, let’s take a look at what’s really going on in Connecticut regarding the distribution and compensation of teachers. But, let me begin with a bit of background literature on the relationship between funding and teacher quality. Rather than reinvent the wheel here, allow me to rely on a section of a policy brief I wrote last fall for Shanker Institute:

The Coleman report looked at a variety of specific schooling resource measures, most notably teacher characteristics, finding positive relationships between these traits and student outcomes. A multitude of studies on the relationship between teacher characteristics and student outcomes have followed, producing mixed messages as to which matter most and by how much.[i] Inconsistent findings on the relationship between teacher “effectiveness” and how teachers get paid – by experience and education – added fuel to the “money doesn’t matter” fire. Since a large proportion of school spending necessarily goes to teacher compensation, and (according to this argument) since we’re not paying teachers in a manner that reflects or incentivizes their productivity, then spending more money won’t help.[ii] In other words, the assertion is that money spent on the current system doesn’t matter, but it could if the system were to change.

Of course, in a sense, this is an argument that money does matter. But it also misses the important point about the role of experience and education in determining teachers’ salaries, and what that means for student outcomes.

While teacher salary schedules may determine pay differentials across teachers within districts, the simple fact is that where one teaches is also very important in determining how much he or she makes.[iii] Arguing over attributes that drive the raises in salary schedules also ignores the bigger question of whether paying teachers more in general might improve the quality of the workforce and, ultimately, student outcomes. Teacher pay is increasingly uncompetitive with that offered by other professions, and  the “penalty” teachers pay increases the longer they stay on the job.[iv]

A substantial body of literature has accumulated to validate the conclusion that both teachers’ overall wages and relative wages affect the quality of those who choose to enter the teaching profession, and whether they stay once they get in. For example, Murnane and Olson (1989) found that salaries affect the decision to enter teaching and the duration of the teaching career,[v] while Figlio (1997, 2002) and Ferguson (1991) concluded that higher salaries are associated with more qualified teachers.[vi] In addition, more recent studies have tackled the specific issues of relative pay noted above. Loeb and Page showed that:

“Once we adjust for labor market factors, we estimate that raising teacher wages by 10 percent reduces high school dropout rates by 3 percent to 4 percent. Our findings suggest that previous studies have failed to produce robust estimates because they lack adequate controls for non-wage aspects of teaching and market differences in alternative occupational opportunities.”[vii]

In short, while salaries are not the only factor involved, they do affect the quality of the teaching workforce, which in turn affects student outcomes.

Research on the flip side of this issue – evaluating spending constraints or reductions – reveals the potential harm to teaching quality that flows from leveling down or reducing spending. For example, David Figlio and Kim Rueben (2001) note that, “Using data from the National Center for Education Statistics we find that tax limits systematically reduce the average quality of education majors, as well as new public school teachers in states that have passed these limits.”[viii]

Salaries also play a potentially important role in improving the equity of student outcomes. While several studies show that higher salaries relative to labor market norms can draw higher quality candidates into teaching, the evidence also indicates that relative teacher salaries across schools and districts may influence the distribution of teaching quality. For example, Ondrich, Pas and Yinger (2008) “find that teachers in districts with higher salaries relative to non-teaching salaries in the same county are less likely to leave teaching and that a teacher is less likely to change districts when he or she teaches in a district near the top of the teacher salary distribution in that county.”[ix]

With regard to teacher quality and school racial composition, Hanushek, Kain, and Rivkin (2004) note: “A school with 10 percent more black students would require about 10 percent higher salaries in order to neutralize the increased probability of leaving.”[x] Others, however, point to the limited capacity of salary differentials to counteract attrition by compensating for working conditions.[xi]

Finally, it bears noting that those who criticize the use of experience and education in determining teachers’ salaries must of course produce a better alternative, and there is even less evidence behind increasingly popular ways to do so than there is to support the policies they intend to replace. In a perfect world, we could tie teacher pay directly to productivity, but contemporary efforts to do so, including performance bonuses based on student test results,[xii] have thus far failed to produce concrete results in the U.S. More promising efforts to measure productivity, such as new teacher evaluations that incorporate heavily-weighted teacher productivity measures based on their students’ test scores, are still a work in progress, and there is not yet evidence that they will be any more effective (or cost-effective) in attracting, developing or retaining high-quality teachers.

To summarize, despite all the uproar about paying teachers based on experience and education, and its misinterpretations in the context of the “Does money matter?” debate, this line of argument misses the point. To whatever degree teacher pay matters in attracting good people into the profession and keeping them around, it’s less about how they are paid than how much. Furthermore, the average salaries of the teaching profession, with respect to other labor market opportunities, can substantively affect the quality of entrants to the teaching profession, applicants to preparation programs, and student outcomes. Diminishing resources for schools can constrain salaries and reduce the quality of the labor supply. Further, salary differentials between schools and districts might help to recruit or retain teachers in high need settings. In other words, resources used for teacher quality matter.

So then, how does this all play out in Connecticut? Judging by the reformy rhetoric being so casually tossed about, one would think that urban CT teachers are already being paid lavishly, and certainly more than enough to get the best and brightest CT college grads to want to teach in Bridgeport or New Britain.

Let’s start with wages relative to non-teaching professions. Allegretto, Corcoran and Mishel identify Connecticut as having a relatively average teaching penalty.

In Connecticut, the average weekly wages of teachers are about 77.6% of the average weekly wages of similarly educated non-teachers.

But we also know that Connecticut’s good schools and districts are pretty darn good. So perhaps the issue isn’t so much about the average, but about the disparities. Besides, the core rhetoric around the proposed reforms seems to be mostly about the achievement gap in CT, which is indeed large even when corrected for the income gap.

Let’s quickly revisit my representation of which districts in CT are most disadvantaged, when we look at the relationship between cost- and need-adjusted per pupil expenditures and average current outcomes:

Expressing the funding as a difference from the average, and throwing some cutpoints into the picture to create some fun groups for comparison, I get:

So, I’ve got the advantaged districts which have high adjusted spending and high outcomes. I’ve got my overall group of disadvantaged districts which have low adjusted spending and low outcomes, and a particularly screwed subset of these districts which I call severely disadvantaged (including Bridgeport and New Britain).
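For those who want the mechanics, here is a minimal sketch of that grouping logic in code. The district spending and outcome values, and the cutpoints, are made up purely for illustration; they are not the actual figures behind my charts.

```python
import pandas as pd

# Hypothetical sketch of the grouping logic; every number and cutpoint
# below is made up purely for illustration.
districts = pd.DataFrame({
    "district":        ["Westport", "Bridgeport", "New Britain", "Greenwich", "Meriden"],
    "adj_spending_pp": [18000, 11800, 11700, 19000, 13800],   # cost/need-adjusted $ per pupil (made up)
    "avg_outcome_z":   [0.9, -1.1, -1.3, 1.0, -0.6],          # average outcome index (made up)
})

# Express adjusted spending as a difference from the average district.
districts["spend_diff"] = districts["adj_spending_pp"] - districts["adj_spending_pp"].mean()

# Illustrative cutpoints defining the comparison groups.
def classify(row):
    if row["spend_diff"] > 0 and row["avg_outcome_z"] > 0:
        return "advantaged"
    if row["spend_diff"] < -3000 and row["avg_outcome_z"] < -1.0:
        return "severely disadvantaged"
    if row["spend_diff"] < 0 and row["avg_outcome_z"] < 0:
        return "disadvantaged"
    return "other"

districts["group"] = districts.apply(classify, axis=1)
print(districts[["district", "spend_diff", "avg_outcome_z", "group"]])
```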

Now, recall that the funding side of the Malloy plan isn’t going to do a whole lot to help out these districts.

Notice that huge infusion of funding represented by the red triangles relative to prior year Net Current Expenditures. Oh wait. There really isn’t any. But what’s funding got to do with it anyway? (go back and read the above section!)

So, I’ve taken the teacher-level salary and characteristics data and estimated two different models of the differences between the advantaged and disadvantaged districts, and between the advantaged and severely disadvantaged districts. First, let’s look simply at salary parity at constant teacher characteristics. NOTE THAT IT WOULD TAKE NOT MERELY AN EQUAL, BUT A HIGHER, SALARY TO GET TEACHERS WITH SPECIFIC QUALIFICATIONS TO WORK IN BRIDGEPORT AS OPPOSED TO WESTPORT, FOR EXAMPLE.

Here’s the salary model, with comments in the margins:
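For the statistically inclined, a minimal sketch of this kind of salary-parity specification might look something like the following. The file name and variable names are hypothetical placeholders, not the actual Connecticut data or the precise model estimated here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical teacher-level file; the file name and columns below are
# placeholders, not the actual Connecticut data.
teachers = pd.read_csv("ct_teachers.csv")

# Log salary regressed on experience, degree level, and assignment, with
# labor-market fixed effects, plus indicators for the disadvantaged and
# severely disadvantaged district groups (advantaged districts are the
# omitted reference group). The group coefficients approximate the salary
# gap at constant teacher characteristics within the same labor market.
salary_model = smf.ols(
    "np.log(salary) ~ experience + I(experience**2) + C(degree_level)"
    " + C(assignment) + C(labor_market)"
    " + disadvantaged + severely_disadvantaged",
    data=teachers,
).fit(cov_type="cluster", cov_kwds={"groups": teachers["district"]})

print(salary_model.summary())
```

The coefficients on the district-group indicators are the quantities of interest: the salary difference, at constant teacher characteristics and within the same labor market, between disadvantaged (or severely disadvantaged) districts and advantaged ones.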

All else equal, teachers in disadvantaged districts are still behind their peers in advantaged districts within the same labor market. Reformy platitudes and test-based evaluation will not fix that!

Now, here’s a logistic regression of the likelihood that a teacher is a novice, that is, in his or her first 3 years of teaching. This is a commonly used marker of teacher quality inequity, because a substantial body of literature has found that concentrations of novice teachers (i.e., teachers with less than 3 or 4 years of experience) can have significant negative effects on student outcomes.[1] Rivkin, Hanushek, and Kain (2005) find that teacher experience is important in the first two years of a teaching career (but not thereafter).[2] Hanushek and Rivkin note that: “we find that identifiable school factors – the rate of student turnover, the proportion of teachers with little or no experience, and student racial composition – explain much of the growth in the achievement gap between grades 3 and 8 in Texas schools.”[3] Notably, evidence from a variety of state and local contexts provides a consistent picture that higher concentrations of novice teachers are associated with negative effects on student outcomes.

Here are the models:
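Again, as a rough sketch only (hypothetical file and variable names, not the actual data or the precise specification), the novice-teacher model is a logistic regression along these lines:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical sketch of the novice-teacher model; variable names are
# placeholders, not the actual data or model estimated here.
teachers = pd.read_csv("ct_teachers.csv")
teachers["novice"] = (teachers["experience"] <= 3).astype(int)  # first 3 years of teaching

novice_model = smf.logit(
    "novice ~ C(labor_market) + C(assignment)"
    " + disadvantaged + severely_disadvantaged",
    data=teachers,
).fit()

# Odds ratios on the district-group indicators, the rough basis for
# statements like "X% more likely to face a novice teacher."
print(np.exp(novice_model.params[["disadvantaged", "severely_disadvantaged"]]))
```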

So, kids in classrooms in severely disadvantaged or generally disadvantaged districts are each about 20% more likely to face novice teachers. Note that they are also more likely to be in larger classes, especially if they are in the severe-disparity group!

Again, SB24’s reformy platitudes will do nothing to remedy this disparity.

Put simply, the SB24 teacher effectiveness provisions are a massive smokescreen that does little or nothing to address persistent underlying disparities across CT districts. Worse, the misguided emphasis on reducing job security and on problematic performance metrics will likely do more harm than good for children in the most disadvantaged districts.

 

REFERENCES

[i] Hanushek, E.A. (1971) Teacher Characteristics and Gains in Student Achievement: Estimation Using Micro Data. American Economic Review 61 (2) 280-288

Clotfelter, C.T., Ladd, H.F., Vigdor, J.L. (2007) Teacher credentials and student achievement: Longitudinal analysis with student fixed effects. Economics of Education Review 26 (2007) 673–682

Goldhaber, D., Brewer, D. (1997) Why Don’t Schools and Teachers Seem to Matter? Assessing the Impact of Unobservables on Educational Productivity. The Journal of Human Resources 32 (3) 505-523

Ehrenberg, R. G., & Brewer, D. J. (1994). Do school and teacher characteristics matter? Evidence from High School and Beyond. Economics of Education Review, 13(1), 1-17.

Ehrenberg, R. G., & Brewer, D. J. (1995). Did teachers’ verbal ability and race matter in the 1960s? Economics of Education Review, 14(1), 1-21.

Jepsen, C. (2005). Teacher characteristics and student achievement: Evidence from teacher surveys. Journal of Urban Economics, 57(2), 302-319.

Jacob, B. A., & Lefgren, L. (2004). The impact of teacher training on student achievement: Quasi-experimental evidence from school reform. Journal of Human Resources, 39(1),50-79.

Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417-458.

Wayne, A. J., & Youngs, P. (2003). Teacher characteristics and student achievement gains. Review of Educational Research, 73(1), 89-122.

For a recent review of studies on the returns to teacher experience, see:

Rice, J.K. (2010) The Impact of Teacher Experience: Examining the Evidence and Policy Implications. National Center for Analysis of Longitudinal Data in Educational Research.

[ii] Some go so far as to argue that half or more of teacher pay is allocated to “non-productive” teacher attributes, and that it follows that this entire amount of funding could be reallocated toward making schools more productive.

See, for example, a recent presentation to the NY State Board of Regents from September 13, 2011 (page 32), slides by Stephen Frank of Education Resource Strategies: http://www.p12.nysed.gov/mgtserv/docs/SchoolFinanceForHighAchievement.pdf

[iii] Lankford, H., Loeb, S., Wyckoff, J. (2002) Teacher Sorting and the Plight of Urban Schools. Educational Evaluation and Policy Analysis 24 (1) 37-62

[iv] Allegretto, S.A., Corcoran, S.P., Mishel, L.R. (2008) The Teaching Penalty: Teacher Pay Losing Ground. Washington, D.C.: Economic Policy Institute.

[v] Richard J. Murnane and Randall J. Olsen (1989) The effects of salaries and opportunity costs on length of stay in teaching: Evidence from Michigan. Review of Economics and Statistics 71 (2) 347-352

[vi] David N. Figlio (2002) “Can Public Schools Buy Better-Qualified Teachers?” Industrial and Labor Relations Review 55, 686-699. David N. Figlio (1997) Teacher Salaries and Teacher Quality. Economics Letters 55, 267-271. Ronald Ferguson (1991) Paying for Public Education: New Evidence on How and Why Money Matters. Harvard Journal on Legislation 28 (2) 465-498.

[vii] Loeb, S., Page, M. (2000) Examining the Link Between Teacher Wages and Student Outcomes: The Importance of Alternative Labor Market Opportunities and Non-Pecuniary Variation. Review of Economics and Statistics 82 (3) 393-408

[viii] Figlio, D.N., Rueben, K. (2001) Tax Limits and the Qualifications of New Teachers. Journal of Public Economics. April, 49-71

See also:

Downes, T. A., Figlio, D. N. (1999) Do Tax and Expenditure Limits Provide a Free Lunch? Evidence on the Link Between Limits and Public Sector Service Quality. National Tax Journal 52 (1) 113-128

[ix] Ondrich, J., Pas, E., Yinger, J. (2008) The Determinants of Teacher Attrition in Upstate New York. Public Finance Review 36 (1) 112-144

[x] Hanushek, Kain, Rivkin, “Why Public Schools Lose Teachers,” Journal of Human Resources 39 (2) p. 350

[xi] Clotfelter, C., Ladd, H.F., Vigdor, J. (2011) Teacher Mobility, School Segregation and Pay-Based Policies to Level the Playing Field. Education Finance and Policy 6 (3) 399–438

Clotfelter, Charles T., Elizabeth Glennie, Helen F. Ladd, and Jacob L. Vigdor. 2008. Would higher salaries keep teachers in high-poverty schools? Evidence from a policy intervention in North Carolina. Journal of Public Economics 92: 1352–70.

[xii] For recent studies specifically on the topic of “merit pay,” each of which generally finds no positive effects of merit pay on student outcomes, see:

Glazerman, S., Seifullah, A. (2010) An Evaluation of the Teacher Advancement Program in Chicago: Year Two Impact Report. Mathematica Policy Research. 6319-520

Springer, M.G., Ballou, D., Hamilton, L., Le, V., Lockwood, J.R., McCaffrey, D., Pepper, M., and Stecher, B. (2010). Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching. Nashville, TN: National Center on Performance Incentives at Vanderbilt University.

Marsh, J. A., Springer, M. G., McCaffrey, D. F., Yuan, K., Epstein, S., Koppich, J., Kalra, N., DiMartino, C., & Peng, A. (2011). A Big Apple for Educators: New York City’s Experiment with Schoolwide Performance Bonuses. Final Evaluation Report. RAND Corporation & Vanderbilt University.


[1] See Charles T. Clotfelter, Helen F. Ladd and Jacob L. Vigdor, “Who Teaches Whom? Race and the distribution of novice teachers,” Economics of Education Review 24, no. 4 (August, 2005): 377-392; Charles T. Clotfelter, Helen F. Ladd and Jacob L. Vigdor, “Teacher sorting, teacher shopping, and the assessment of teacher effectiveness,” Sanford Institute of Public Policy, Duke University, 2004; and Hanushek, Kain, and Rivkin, “Teachers, schools, and academic achievement.”

[2] Hanushek, Kain, and Rivkin, “Teachers, schools, and academic achievement.”