When Disinformation is Fueled by Misinformation! CHANCELLOR TISCH, YOU ARE WRONG!

Very recently, I posted a critique of the recent technical report on New York State median growth percentiles to be used in that state’s teacher and principal evaluation system.

Today, I read this piece in the NY Post – an editorial by NY State Board of Regents Chancellor Merryl Tisch, and well, MY HEAD ALMOST EXPLODED!

The point of the editorial is to encourage NY City’s teachers and DOE to agree to a teacher evaluation system based on supposedly objective measures – where “objective measures” seems largely to be code language for estimates of teacher effectiveness derived from student assessment data.

First, I have written several previous posts on the usefulness of NYC’s value-added model for determining teacher effectiveness.

  1. the NYC VAM model retains some persistent biases
  2. the NYC VAM model is highly unstable from year to year
  3. the NYC VAM results capture only a handful of teachers per school and their results tend to jump all over the place
  4. adopting the NCTQ irreplaceables logic, the NYC VAM data are so noisy that few if any teachers are persistently irreplaceable
  5. for various reasons, it is unlikely that these are just early glitches in the system that will get better with time

Setting aside this long list of concerns about the NYC VAM results, I now turn to the NYSED – state median growth percentile data (which actually seem inferior to the NYC VAM model/estimates). In her editorial, Chancellor Tisch proclaims:

The student-growth scores provided by the state for teacher evaluations are adjusted for factors such as students who are English Language Learners, students with disabilities and students living in poverty. When used right, growth data from student assessments provide an objective measurement of student achievement and, by extension, teacher performance.

Let me be blunt here. CHANCELLOR TISCH – YOU ARE WRONG! FLAT OUT WRONG! IRRESPONSIBLY & PERHAPS NEGLIGENTLY WRONG!

[now, one might quibble that Chancellor Tisch has merely stated that the measures are “adjusted for” certain factors and she has not claimed that those adjustments actually work to eliminate bias. Further, she has merely declared that the measures are “objective” and not that they are accurate or precise. Personally, I don’t find this deceptive language at all comforting!]

Indeed, the measures attempt – but fail – to sufficiently adjust for key factors. They retain substantial biases, as identified in the state’s own technical report. And they are subject to many of the same error concerns as the NYC VAM model. Given the findings of the state’s own technical report, it is irresponsible to suggest that these measures can and should be immediately considered for making personnel and compensation decisions.

Finally, as I laid out in my previous blog post, to suggest that “growth data from student assessments provide an objective measure of student achievement, and, by extension, teacher performance” IS A HUGE UNWARRANTED STRETCH!

While I might concur with the follow-up statement from Chancellor Tisch that “We should never judge an educator solely by test scores, but we shouldn’t completely disregard student performance and growth either,” I would argue that school leaders/peer teachers/personnel managers should absolutely have the option to completely disregard data that have high potential to be sending false signals, whether as a function of persistent bias or error. Requiring action based on biased and error-prone data (rather than permitting those data to be reasonably mined to the extent they may, OR MAY NOT, be useful) is a toxic formula for public schooling quality.

The one thing I can’t quite figure out here is which is the misinformation and which is the disinformation. In any case, both are wrong!

The rest of what I have to say, I’ve already said. But, so readers don’t have to click the link below to access the previous post, I’ve pasted  the entire thing below. Enjoy!

COMPLETE PREVIOUS POST!

I was immediately intrigued the other day when a friend passed along a link to the recent technical report on the New York State growth model, the results of which are expected/required to be integrated into district level teacher and principal evaluation systems under that state’s new teacher evaluation regulations. I did as I often do and went straight for the pictures – in this case, the scatterplots of the relationships between various “other” measures and the teacher and principal “effect” measures. There was plenty of interesting stuff there, some of which I’ll discuss below.

But then I went to the written language of the report – specifically the report’s (albeit in DRAFT form)  conclusions. The conclusions were only two short paragraphs long, despite much to ponder being provided in the body of the report. The authors’ main conclusion was as follows:

The model selected to estimate growth scores for New York State provides a fair and accurate method for estimating individual teacher and principal effectiveness based on specific regulatory requirements for a “growth model” in the 2011-2012 school year. p. 40

http://engageny.org/wp-content/uploads/2012/06/growth-model-11-12-air-technical-report.pdf


Updated Final Report: http://engageny.org/sites/default/files/resource/attachments/growth-model-11-12-air-technical-report_0.pdf

Local copy of original DRAFT report: growth-model-11-12-air-technical-report

Local copy of FINAL report: growth-model-11-12-air-technical-report_FINAL

Unfortunately, the multitude of graphs that immediately precede this conclusion undermine it entirely. But first, allow me to address the egregious conceptual problems with the framing of this conclusion.

First Conceptually

Let’s start with the low hanging fruit here. First and foremost, nowhere in the technical report, nowhere in their data analyses, do the authors actually measure “individual teacher and principal effectiveness.” And quite honestly, I don’t give a crap if the “specific regulatory requirements” refer to such measures in these terms. If that’s what the authors are referring to with this language, that’s a pathetic copout. Indeed, it may have been their charge to “measure individual teacher and principal effectiveness based on requirements stated in XYZ.” That’s how contracts for such work are often stated. But that does not obligate the authors to conclude that this is actually what has been statistically accomplished. And I’m just getting started.

So, what is being measured and reported?  At best, what we have are:

  • An estimate of student relative test score change on one assessment each for ELA and Math (scaled to growth percentile) for students who happen to be clustered in certain classrooms.

THIS IS NOT TO BE CONFLATED WITH “TEACHER EFFECTIVENESS”

Rather, it is merely a classroom aggregate statistical association based on data points pertaining to two subjects being addressed by teachers in those classrooms, for a group of children who happen to spend a minority share of their day and year in those classrooms.

  • An estimate of student relative test score change on one assessment each for ELA and Math (scaled to growth percentile) for students who happen to be clustered in certain schools.

THIS IS NOT TO BE CONFLATED WITH “PRINCIPAL EFFECTIVENESS”

Rather, it is merely a school aggregate statistical association based on data points pertaining to two subjects being addressed by teachers in classrooms that are housed in a given school under the leadership of perhaps one or more principals, vps, etc., for a group of children who happen to spend a minority share of their day and year in those classrooms.

Now Statistically

Following are a series of charts presented in the technical report, immediately preceding the above conclusion.

Classroom Level Rating Bias

School Level Rating Bias

And there are many more figures displaying more subtle biases, but biases that for clusters of teachers may be quite significant and consequential.

Based on the figures above, there certainly appears to be, both at the teacher, excuse me – classroom, and principal – I mean school level, substantial bias in the Mean Growth Percentile ratings with respect to initial performance levels on both math and reading. Teachers with students who had higher starting scores and principals in schools with higher starting scores tended to have higher Mean Growth Percentiles.

This might occur for several reasons. First, it might just be that the tests used to generate the MGPs are scaled such that it’s just easier to achieve growth in the upper ranges of scores. I came to a similar finding of bias in the NYC value added model, where schools having higher starting math scores showed higher value added. So perhaps something is going on here. It might also be that students clustered among higher performing peers tend to do better. And, it’s at least conceivable that students who previously had strong teachers and remain clustered together from year to year, continue to show strong growth. What is less likely is that many of the actual “better” teachers just so happen to be teaching the kids who had better scores to begin with.
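To make that bias concern concrete, here is a minimal simulation sketch – my own illustration, not the state’s model or AIR’s code. I fabricate a peer-context effect tied to classroom mean prior achievement, build crude growth percentiles by ranking gains within prior-score deciles (a rough stand-in for the real SGP machinery), and then run the same diagnostic the report’s scatterplots imply: do classroom mean growth percentiles track classroom mean prior scores? Every simulated “teacher” is identical, yet the diagnostic lights up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_class, n_per = 400, 25

# classrooms differ in mean prior achievement (non-random sorting)
class_mean = rng.normal(0, 0.5, n_class)
cls = np.repeat(np.arange(n_class), n_per)
prior = class_mean[cls] + rng.normal(0, 1, n_class * n_per)

# every "teacher" is identical here; gains carry only a peer-context term
# tied to classroom mean prior score, plus noise
gain = 0.2 * class_mean[cls] + rng.normal(0, 1, n_class * n_per)

df = pd.DataFrame({"cls": cls, "prior": prior, "gain": gain})

# crude stand-in for SGPs: percentile-rank gains within prior-score deciles
df["decile"] = pd.qcut(df["prior"], 10, labels=False)
df["sgp"] = df.groupby("decile")["gain"].rank(pct=True) * 100

mgp = df.groupby("cls").agg(mgp=("sgp", "mean"), mean_prior=("prior", "mean"))
print(mgp["mgp"].corr(mgp["mean_prior"]))  # clearly positive, despite zero
                                           # true teacher effect differences
```

The point is not that this is the mechanism at work in New York – it is that conditioning on individual prior scores simply cannot rule it out.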

That the systemic bias appears greater in the school level estimates than in the teacher level estimates suggests that the teacher level estimates may actually be even more biased than they appear. The aggregation of otherwise less biased estimates should not reveal more bias.

Further, as I’ve mentioned several times on this blog previously, even if there weren’t such glaringly apparent overall patterns of bias, there still might be underlying biased clusters. That is, groups of teachers serving certain types of students might have ratings that are substantially WRONG, either in relation to observed characteristics of the students they serve or their settings, or of unobserved characteristics.

Closing Thoughts

To be blunt – the measures are neither conceptually nor statistically accurate. They suffer significant bias, as shown and then completely ignored by the authors. And inaccurate measures can’t be fair. Characterizing them as such is irresponsible.

I’ve now written 2 articles and numerous blog posts in which I have raised concerns about the likely overly rigid use of these very types of metrics when making high stakes personnel decisions. I have pointed out that misuse of this information may raise significant legal concerns. That is, when district administrators do start making teacher or principal dismissal decisions based on these data, there will likely follow some very interesting litigation over whether this information really is sufficient for upholding due process (depending largely on how it is applied in the process).

I have pointed out that the originators of the SGP approach have stated in numerous technical documents and academic papers that SGPs are intended to be a descriptive tool and are not for making causal assertions (they are not for “attribution of responsibility”) regarding teacher effects on student outcomes. Yet, the authors persist in encouraging states and local districts to do just that. I certainly expect to see them called to the witness stand the first time SGP information is misused to attribute student failure to a teacher.

But the case of the NY-AIR technical report is somewhat more disconcerting. Here, we have a technically proficient author working for a highly respected organization – American Institutes for Research – ignoring all of the statistical red flags (after waving them), and seemingly oblivious to gaping conceptual holes (commonly understood limitations) between the actual statistical analyses presented and the concluding statements made (and language used throughout).

The conclusions are WRONG, statistically and conceptually. And the author needs to recognize that being so damn bluntly wrong may be consequential for the livelihoods of thousands of individual teachers and principals! Yes, it is indeed another leap for a local school administrator to use their state-approved evaluation framework, coupled with these measures, to actually decide to adversely affect the livelihood and potential career of some wrongly classified teacher or principal – but the author of this report has given them the tool and provided his blessing. And that’s inexcusable.

The Secrets to Charter School Success in Newark: Comments on the NJ CREDO Report

Today, with much fanfare, we finally got our New Jersey Charter School Report. The unsurprising findings of that report are that charter schools in Newark in particular seem to be providing students with greater average annual achievement gains than those of similar (matched) students attending district schools. Elsewhere around the state charter schools are pretty much average.

Link to report: http://credo.stanford.edu/pdfs/nj_state_report_2012_FINAL11272012.pdf

So then, the big question is, what exactly is behind the apparent success of Newark charter schools – or at least of enough of them to influence the analysis as a whole? Further, and perhaps more importantly, is there something about these schools that makes them successful that can be replicated?

The General Model

Allow me to start by pointing out that the CREDO study uses its usual approach – a reasonable one given data and system constraints – of identifying matched sets of students from feeder schools (or areas) who end up in district schools and in charter schools. CREDO then compares (estimates) the year to year test score gains of students in the charter and district schools.

The CREDO approach, while reasonable, simply can’t sort out which component of student achievement gain is created by “school factors” (such as teacher quality, length of day/year, etc.) and which factors are largely a function of concentrating non-low-income, non-ELL, non-disabled females in charter schools while concentrating the “others” in district schools.

School Effect = Controllable School Factors + Peer Group & Other Factors

In other words, we simply don’t know what component of the effect has to do with school quality issues that might be replicated and what component has to do with clustering kids together in a more advantaged peer group. Yes, the study controls for the students’ individual characteristics, but no, it cannot sort out whether the clustering of students with more or less advantaged peers affects their outcomes (which it certainly does). Lottery-based studies suffer the same problem, when lotteried-in and lotteried-out students end up in very different peer contexts. Yes, the sorting mechanism is random, but the placement is not. The peer selection effect may be exacerbated by selective attrition (shedding weaker and/or disruptive students over time). And Newark’s highest flying charter schools certainly have some issues with attrition.
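Here is a stripped-down sketch of that problem – the parameter values are hypothetical, invented by me, not estimates from the CREDO data. Matched “virtual twins” are identical on individual characteristics, the charter and district schools are identical in quality, and the only difference is peer context; the matched comparison still spits out a positive “charter effect”:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# matched "virtual twins": identical individual characteristics x
x = rng.normal(0, 1, n)

# assumed peer-context advantage in charter classrooms (district = baseline)
peer_charter, peer_district = 0.5, 0.0
peer_effect = 0.25  # assumed strength of the peer term

# gains contain NO school-quality difference at all
gain_charter = 0.3 * x + peer_effect * peer_charter + rng.normal(0, 1, n)
gain_district = 0.3 * x + peer_effect * peer_district + rng.normal(0, 1, n)

# the matched comparison dutifully reports a "charter effect" anyway,
# and it is entirely the peer term: ~0.125 here
print(gain_charter.mean() - gain_district.mean())
```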

Given my numerous previous posts, I would suggest Figure 1 as the general model of the secrets of Newark Charter School success.

Figure 1. The General Model

Put simply, while resource use – additional time, compensation, etc. – may be part of the puzzle – the scalable part – the strong sorting patterns of students into charter and district schools clearly play some role – a substantial role – and one that constrains our ability to use “chartering” as a broad-based public policy solution.

One Part Segregation

Let’s start by taking a look at the most recent available data on the segregation of students by disability status, free lunch status, gender and language proficiency. Now, the CREDO report is careful to point out that charter school enrollments match the demographics of their feeder schools – and uses this finding as an indication that charter schools therefore aren’t cream-skimming. That’s all well and good…. EXCEPT that for some reason (actually many), charter schools themselves end up having far fewer of the lowest income students. See Figure 2.

Figure 2. % Free Lunch

Now, one technical quibble I have with the CREDO report is that it relies on the free/reduced-price lunch indicator to identify economic disadvantage (and then sloppily refers to this throughout as “poverty”). I have shown on numerous previous occasions that Newark charters tend to serve larger shares of the less poor children and smaller shares of the poorer children. So, it is quite likely that the CREDO matched groups of students actually include disproportionate shares of “reduced lunch” children for charters and “free lunch” children sorted into district schools. This is a non-trivial difference! [Gaps between free lunch and reduced lunch students tend to be comparable to gaps between reduced lunch and non-qualified students.]

Here are the other sorting issues:

Figure 3. % ELL/LEP

Figure 4. % Female


Figure 5 shows that not only do charter schools in Newark tend to serve far fewer children with disabilities, they especially serve few or no students with more severe disabilities. In fact, they serve mainly students with Specific Learning Disabilities and Speech Language Impairment. Given the data in Figure 5, it is actually quite humorous – if not strangely disturbing – that the CREDO study attempted to parse the relative effectiveness of district and charter schools at producing outcomes for children with disabilities using only a single broad classification [Student matching was based on a single classification, creating the possibility that children with speech language impairment in charters were being compared with children with mental retardation and autistic children in district schools. It is likely that most students who took the assessments were those with less severe disabilities in both cases.].

Figure 5. Special Education Distributions

Here are some related findings from (and links to) previous posts:

Newark Charter Effects on NPS School Enrollments

New Jersey Charter School Special Education

Newark Charter School Attrition Rates

Here are just a few visuals of how the free lunch shares and female student test-taker shares relate to general education proficiency rates on 8th grade math. Both are relatively strong determinants of cross-school proficiency. And with respect to both gender balance and free lunch balance, Newark charter schools are substantively different from their district school counterparts.

Figure 6: 8th Grade Math & % Free Lunch

Figure 7: 8th Grade Math & % Female


Now, these are performance level differences, which are not the same as the gain measures estimated in the CREDO study. But, I’ve chosen the 8th grade scores because that is when the charter scores tend to pull away from the district school scores (that is, these are the score levels at the tail end of achieving greater gains). But, the contexts of the gains for charter students are so substantially different from the contexts of achievement gains for district school students that scalability is highly questionable.

As I’ve said before – There just aren’t enough non-disabled, non-poor, fluent English speaking females in Newark to fully replicate district-wide the successes of the city’s highest flying charters.

One Part Compensation

Now, I’ve also written many posts which address the resource advantages and some resource allocation issues for high flying New York City charter schools, which a) also promote substantial student population segregation and b) have been shown in numerous studies to yield positive achievement gains.

I do not intend to imply by my above critique that the peer group effect is necessarily the ONLY effect driving Newark charters’ supposed success. The problem is that because high flying Newark charters in particular serve such uncommon student populations, we can never really sort out the peer group versus school quality effects.

It is certainly reasonable to assume that the additional time and effort spent with these students in some schools – even though they are a more advantaged (less disadvantaged) group – makes a difference.  No excuses charters in Newark like those in New York City tend to provide longer school days and longer school years, and importantly, they compensate their teachers for the additional time & effort. Here’s a simple chart of the average teacher compensation for early career teachers in NPS and Newark Charters. NPS teachers catch back up in later years, but as I’ve pointed out in numerous previous posts, a handful of Newark charters have adopted the reasonable (smart) competitive strategy of leveraging higher salaries and salary growth at the front end to improve teacher retention and recruitment.

Figure 9: Newark Teacher Compensation

Below is a more precise comparison that teases out the differences that aren’t so apparent in Figure 9. For Figure 10, I have used 3 years of data on teachers to estimate a regression model of teacher salaries as a function of experience, degree level and data year.
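For those curious, a regression of that general form can be sketched roughly as follows – the file name and column names are placeholders I have made up, not the actual New Jersey staffing file layout:

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical staffing file: one row per teacher per year
staff = pd.read_csv("newark_teachers_2009_2011.csv")

# salary as a function of experience (with a quadratic term), degree level,
# and data year, estimated separately for each employer group
for employer, grp in staff.groupby("employer"):
    m = smf.ols(
        "salary ~ experience + I(experience**2) + C(degree) + C(year)",
        data=grp,
    ).fit()
    print(employer, m.params["experience"])
```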

Some of Newark’s “high flying charters” [North Star, Gray, TEAM] tend to substantially outpace salaries of NPS teachers over the first ten years of a teacher’s career. Few of these schools have any teachers with more than ten years of experience. Other Newark charter schools maintain at least relatively competitive salaries with NPS.

Now, a critical point here is that, as I’ve shown above, teaching in many of these schools comes with the perk of working with a much more advantaged student population. As such, it is conceivable that even a comparable wage provides a recruitment advantage – given the student population difference. Clearly, a higher wage provides a significant recruitment advantage – though in the case of the highest paying school(s), the elevated salary comes with substantial additional obligations.

Figure 10. Modeled Teacher Salary Variation by Experience

Closing Thoughts

So, when all is said and done, this new “charter school” report, like many that have come before it, leaves us sadly unfulfilled, at least with respect to its potential to provide important policy insights. Most cynically, one might argue the main finding of the report is simply that cream-skimming works – it generates a solid peer effect that provides important academic advantages to a few – and serving a few is better than serving none at all (assuming the latter is really the alternative?). Keep it up! Don’t worry ’bout the rest of those kids who get shuffled off into district schools. Quite honestly, given the huge, persistent differences in student populations between high flying Newark charters and district schools, and given the relative consistency of research on peer group effects, it would be shocking if the CREDO report had not found that Newark charters outperform district schools.

While it is likely that there exist some strategies employed by some charters (as well as some strategies employed by some district schools) that are working quite well – THE CREDO REPORT PROVIDES ABSOLUTELY NO INSIGHTS IN THIS REGARD. It’s a classic “charter v. district” comparison – where it is assumed that “chartering” represents one set of educational/programmatic strategies and “districting” represents another – when in fact, neither is true (see the scatter of dots in my plots above to see the variations in each group!).

AIR Pollution in NY State? Comments on the NY State Teacher/Principal Rating Models/Report

I was immediately intrigued the other day when a friend passed along a link to the recent technical report on the New York State growth model, the results of which are expected/required to be integrated into district level teacher and principal evaluation systems under that state’s new teacher evaluation regulations. I did as I often do and went straight for the pictures – in this case, the scatterplots of the relationships between various “other” measures and the teacher and principal “effect” measures. There was plenty of interesting stuff there, some of which I’ll discuss below.

But then I went to the written language of the report – specifically the report’s (albeit in DRAFT form)  conclusions. The conclusions were only two short paragraphs long, despite much to ponder being provided in the body of the report. The authors’ main conclusion was as follows:

The model selected to estimate growth scores for New York State provides a fair and accurate method for estimating individual teacher and principal effectiveness based on specific regulatory requirements for a “growth model” in the 2011-2012 school year. p. 40

http://engageny.org/wp-content/uploads/2012/06/growth-model-11-12-air-technical-report.pdf


Updated Final Report: http://engageny.org/sites/default/files/resource/attachments/growth-model-11-12-air-technical-report_0.pdf

Local copy of original DRAFT report: growth-model-11-12-air-technical-report

Local copy of FINAL report: growth-model-11-12-air-technical-report_FINAL

Unfortunately, the multitude of graphs that immediately precede this conclusion undermine it entirely. But first, allow me to address the egregious conceptual problems with the framing of this conclusion.

First Conceptually

Let’s start with the low hanging fruit here. First and foremost, nowhere in the technical report, nowhere in their data analyses, do the authors actually measure “individual teacher and principal effectiveness.” And quite honestly, I don’t give a crap if the “specific regulatory requirements” refer to such measures in these terms. If that’s what the authors are referring to with this language, that’s a pathetic copout. Indeed, it may have been their charge to “measure individual teacher and principal effectiveness based on requirements stated in XYZ.” That’s how contracts for such work are often stated. But that does not obligate the authors to conclude that this is actually what has been statistically accomplished. And I’m just getting started.

So, what is being measured and reported?  At best, what we have are:

  • An estimate of student relative test score change on one assessment each for ELA and Math (scaled to growth percentile) for students who happen to be clustered in certain classrooms.

THIS IS NOT TO BE CONFLATED WITH “TEACHER EFFECTIVENESS”

Rather, it is merely a classroom aggregate statistical association based on data points pertaining to two subjects being addressed by teachers in those classrooms, for a group of children who happen to spend a minority share of their day and year in those classrooms.

  • An estimate of student relative test score change on one assessment each for ELA and Math (scaled to growth percentile) for students who happen to be clustered in certain schools.

THIS IS NOT TO BE CONFLATED WITH “PRINCIPAL EFFECTIVENESS”

Rather, it is merely a school aggregate statistical association based on data points pertaining to two subjects being addressed by teachers in classrooms that are housed in a given school under the leadership of perhaps one or more principals, vps, etc., for a group of children who happen to spend a minority share of their day and year in those classrooms.

Now Statistically

Following are a series of charts presented in the technical report, immediately preceding the above conclusion.

Classroom Level Rating Bias

School Level Rating Bias

And there are many more figures displaying more subtle biases, but biases that for clusters of teachers may be quite significant and consequential.

Based on the figures above, there certainly appears to be, both at the teacher, excuse me – classroom, and principal – I mean school level, substantial bias in the Mean Growth Percentile ratings with respect to initial performance levels on both math and reading. Teachers with students who had higher starting scores and principals in schools with higher starting scores tended to have higher Mean Growth Percentiles.

This might occur for several reasons. First, it might just be that the tests used to generate the MGPs are scaled such that it’s just easier to achieve growth in the upper ranges of scores. I came to a similar finding of bias in the NYC value added model, where schools having higher starting math scores showed higher value added. So perhaps something is going on here. It might also be that students clustered among higher performing peers tend to do better. And, it’s at least conceivable that students who previously had strong teachers and remain clustered together from year to year, continue to show strong growth. What is less likely is that many of the actual “better” teachers just so happen to be teaching the kids who had better scores to begin with.

That the systemic bias appears greater in the school level estimates than in the teacher level estimates suggests that the teacher level estimates may actually be even more biased than they appear. The aggregation of otherwise less biased estimates should not reveal more bias.

Further, as I’ve mentioned several times on this blog previously, even if there weren’t such glaringly apparent overall patterns of bias, there still might be underlying biased clusters. That is, groups of teachers serving certain types of students might have ratings that are substantially WRONG, either in relation to observed characteristics of the students they serve or their settings, or of unobserved characteristics.

Closing Thoughts

To be blunt – the measures are neither conceptually nor statistically accurate. They suffer significant bias, as shown and then completely ignored by the authors. And inaccurate measures can’t be fair. Characterizing them as such is irresponsible.

I’ve now written 2 articles and numerous blog posts in which I have raised concerns about the likely overly rigid use of these very types of metrics when making high stakes personnel decisions. I have pointed out that misuse of this information may raise significant legal concerns. That is, when district administrators do start making teacher or principal dismissal decisions based on these data, there will likely follow some very interesting litigation over whether this information really is sufficient for upholding due process (depending largely on how it is applied in the process).

I have pointed out that the originators of the SGP approach have stated in numerous technical documents and academic papers that SGPs are intended to be a descriptive tool and are not for making causal assertions (they are not for “attribution of responsibility”) regarding teacher effects on student outcomes. Yet, the authors persist in encouraging states and local districts to do just that. I certainly expect to see them called to the witness stand the first time SGP information is misused to attribute student failure to a teacher.

But the case of the NY-AIR technical report is somewhat more disconcerting. Here, we have a technically proficient author working for a highly respected organization – American Institutes for Research – ignoring all of the statistical red flags (after waving them), and seemingly oblivious to gaping conceptual holes (commonly understood limitations) between the actual statistical analyses presented and the concluding statements made (and language used throughout).

The conclusions are WRONG, statistically and conceptually. And the author needs to recognize that being so damn bluntly wrong may be consequential for the livelihoods of thousands of individual teachers and principals! Yes, it is indeed another leap for a local school administrator to use their state-approved evaluation framework, coupled with these measures, to actually decide to adversely affect the livelihood and potential career of some wrongly classified teacher or principal – but the author of this report has given them the tool and provided his blessing. And that’s inexcusable.

And a video with song!

==================

Note:   In the executive summary, the report acknowledges these biases:

Despite the model conditioning on prior year test scores, schools and teachers with students who had higher prior year test scores, on average, had higher MGPs. Teachers of classes with higher percentages of economically disadvantaged students had lower MGPs.

But it then blows them off throughout the remainder of the report, never mentioning that this might be important.

Local copy of report: growth-model-11-12-air-technical-report

On the Stability (or not) of Being Irreplaceable

This is just a quick note with a few pictures in response to the TNTP “Irreplaceables” report that came out a few weeks back – a report that is utterly ridiculous at many levels (especially this graph!)… but due to the storm I just didn’t get a chance to address it. But let’s just entertain for the moment the premise that teachers who achieve a value-added rating in the top 20% in a given year are… just plain freakin’ awesome…. and that districts should take whatever steps they can to focus on retaining this specific momentary slice of teachers. At the same time, districts might not want to concern themselves with all of those other teachers that range only from okay… all the way down to those that simply stink!

The TNTP report focuses on teachers who were in the top 14% in Washington DC based on aggregate IMPACT ratings, which do include more than value-added alone, but are certainly driven by the value-added metric. TNTP compares DC to other districts, and explains that the top 20% by value-added are assumed to be higher performers.

For the other four districts we studied, we used teacher value-added scores or student academic growth measures to identify high- and low-performing teachers—those whose students made much more or much less academic progress than expected. These data provided us with a common yardstick for teacher performance. Teachers scoring in approximately the top 20 percent were identified as Irreplaceables. While teachers of this caliber earn high ratings in student surveys and have been shown to have a positive impact that extends far beyond test scores, we acknowledge that such measures are limited to certain grades and subjects and should not be the only ones used in real-world teacher evaluations. http://tntp.org/assets/documents/TNTP_DCIrreplaceables_2012.pdf

Let’s take a stab at this with the NYC teacher value-added percentiles, which I played around with in some previous posts.

The following graphs play out the premise of “irreplaceables” with NYC value-added percentile data. I start by identifying those teachers that are in the top 20% in 2005-06 and then see where they land in each subsequent year through 2009-10.

NOTE: IT’S REALLY NOT A GREAT IDEA TO MAKE SCATTERPLOTS OF THE RELATIONSHIP BETWEEN PERCENTILE RANKS – BETTER TO USE THE ACTUAL VAM SCORES. BUT THIS IS ILLUSTRATIVE… THE POINT BEING TO SEE WHERE ALL OF THOSE DOTS THAT ARE “IRREPLACEABLE” IN YEAR 1 (2005-06) STAY THAT WAY YEAR AFTER YEAR!

I’ve chosen to focus on the MATHEMATICS ratings here… which were actually the more stable ratings from year to year (but were stable potentially because they were biased!)

See: https://schoolfinance101.wordpress.com/2012/02/28/youve-been-vam-ified-thoughts-graphs-on-the-nyc-teacher-data/

Figure 1 – Who is irreplaceable in 2006-07 after being irreplaceable in 2005-06?

Figure 1 shows that there are certainly more “irreplaceables” (awesome teachers) that remain above the median the following year than fall below it… but there sure are one heck of a lot of those irreplaceables that are below the median the next year… and a few that are near the 0%ile! This is not, by any stretch, to condemn those individuals for being falsely rated as irreplaceable while actually sucking. Rather, it is to point out that there is a comparable likelihood that these teachers were wrongly classified each year (potentially like nearly every other teacher in the mix).

Figure 2 – Among those 2005-06 Irreplaceables,  how do they reshuffle between 2006-07 & 2007-08?

Hmm… now they’re moving all over the place. A small cluster does appear to stay in the upper right. But we are dealing with a dramatically diminishing pool of the persistently awesome here. And I’m not even pointing out the number of cases in the data set that simply disappear from year to year. Another post – another day.

I provide an analysis along these lines here: https://schoolfinance101.wordpress.com/2012/03/01/about-those-dice-ready-set-roll-on-the-vam-ification-of-tenure/

Figure 3 – How many of those teachers who were totally awesome in 2005-06 were still totally awesome in 2009-10?

The relationship between ratings from year to year is even weaker when one looks at the endpoints of the data set, comparing 2005-06 ratings to 2009-10 ones. Again, we’ve got teachers who were supposedly “irreplaceable” in 2005-06 who are at the bottom of the heap in 2009-10.

Yes, there is still a cluster of teachers who had a top 20% rating in 2005-06 and have one again in 2009-10. BUT… many… uh… most of these had a much lower rating for at least one of the in-between years!

Of the thousands of teachers for whom ratings exist for each year, there are 14 in math and 5 in ELA that stay in the top 20% for each year! Sure hope they don’t leave!
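A back-of-the-envelope simulation shows why persistent “irreplaceability” is so rare. Assume each year’s rating is a stable teacher component plus noise, with a year-to-year correlation in the 0.2 to 0.4 range typically reported for estimates like NYC’s – the specific value below is my assumption, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_teachers, n_years = 10_000, 5

# assumed year-to-year correlation of value-added scores (illustrative)
r = 0.35

true = rng.normal(0, np.sqrt(r), (n_teachers, 1))        # stable component
noise = rng.normal(0, np.sqrt(1 - r), (n_teachers, n_years))
scores = true + noise                                     # one rating per year

top20 = scores >= np.quantile(scores, 0.8, axis=0)        # "irreplaceable" flag
print(f"{top20.all(axis=1).mean():.2%} are top-20% in all {n_years} years")
```

Even granting a real, stable component, only a tiny fraction of teachers clears the top-20% bar five years running – roughly the needle-in-a-haystack counts observed above.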

====

Note: Because the NYC teacher data release did not provide unique identifiers for matching teachers from year to year, for my previous analyses I had constructed a matching identifier based on teacher name, subject and grade level within school. So, my year to year comparisons include only those teachers who are teaching the same subject and grade level in the same school from one year to the next. Arguably, this matching approach might lead to greater stability than might be expected if I included teachers who moved to different schools serving different students and/or changed subject areas or levels.
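For readers who want to replicate this, the matching step looks roughly like the following – the file and column names are my guesses at the release’s layout, not its actual field names:

```python
import pandas as pd

# hypothetical column names for the public NYC teacher data release
vam = pd.read_csv("nyc_teacher_data_reports.csv")

# no unique teacher ID was released, so construct a proxy key: a teacher is
# tracked across years only if name, school, subject, and grade all repeat
vam["match_id"] = (
    vam["teacher_name"].str.lower().str.strip()
    + "|" + vam["school_dbn"]
    + "|" + vam["subject"]
    + "|" + vam["grade"].astype(str)
)

# wide table of percentile ranks by year for matched teachers
wide = vam.pivot_table(index="match_id", columns="year", values="percentile")
print(wide.corr())  # year-to-year correlations among matched teachers
```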

Teachers Unions: Scourge of the Nation?

UPDATED: 1/29/2015

Let me start by stating that I, myself, am somewhat agnostic when it comes to the questions around whether I believe teachers unions are generally good or bad for the overall quality of our education system and for educational equity. In my personal experiences as a young teacher in the early 1990s, I had my issues with my local teachers unions (in New York State in particular), resulting in some pretty heated battles with local and regional union officials [and some pretty nasty internal politics in my own school]. As a young teacher, I was anything but a fan of the teachers union. But unlike many of my TFA pals [I was a few years too early for TFA, but had friends & later colleagues in the first few waves] who only stuck it out in teaching for a year or two and may have developed similar negative feelings toward their local union, I did outgrow that initial reaction – which, in my view, was somewhat isolated – and partly a function of my own youthful ignorance. I didn’t stick it out in public school teaching much longer than that [the local union actually ran me out!], but I did have the unique experience of working in an elite private school that had a union, and I worked in that school during a contract renegotiation.

The idea for this post first came about when I read the following quote in an article in the Economist. This has to be among the most utterly stupid statements I think I’ve ever read in my life:

…no Wall Street financier has done as much damage to American social mobility as the teachers’ unions have. http://www.economist.com/node/21564556

And then there’s this more recent quote:

Many schools are in the grip of one of the most anti-meritocratic forces in America: the teachers’ unions, which resist any hint that good teaching should be rewarded or bad teachers fired. http://www.economist.com/news/leaders/21640331-importance-intellectual-capital-grows-privilege-has-become-increasingly

Now… these quotes are ridiculous at many levels. Most notably, the first quote is stupid simply because one could never possibly contrive a reasonable, quantifiable comparison of the supposed negative effects of either the individual hedge fund manager or the supposed monolithic “teachers union.” It’s the empirical equivalent of arguing whether Superman could beat up the Hulk. It’s just asinine.

UPDATE: The second quote above comes from a piece that subsequently implies that teachers’ unions are a major, if not the primary cause of educational inequality across children- specifically between rich and poor children. Here’s a little more on the topic of “teacher equity” in particular. (Post 1 | Post 2)

On the heels of this quote came the Thomas B. Fordham Institute report rating the strength of teachers unions – or unionization more generally – across states.  Perhaps the most useful aspect of this report is that it provides us with insights regarding the heterogeneity of unionization across American states.  Unions and unionization are not monolithic.

As recognized by the Fordham report, we really don’t have an American education system. We have 51 systems. They are all somewhat different, with different standards, different funding systems, different union rules and protections and different student outcomes. The existing variations across our state systems of education alone render the Economist statement utterly stupid and misguided. Those variations also provide for some fun opportunities to explore the relationship between TB Fordham’s characterization of teachers’ union strength across states and other features of state education systems.

In this post, I use data from several reports that attempt to characterize state education systems to probe two main questions – whether there exists any association between general indicators of education quality across states and union strength, and whether there exists any association between indicators of educational equality across states and union strength.

How is union strength related to funding levels and funding fairness?

Along with colleagues at the Education Law Center of New Jersey, I have been preparing, for the past few years, annual reports on education funding fairness. In the Funding Fairness report, we use a statistical model on three years of national data on all school districts to project the cost-adjusted per-pupil state and local revenues for all districts and state averages nationally, and we characterize the overall fairness – progressiveness or regressiveness – of state school finance systems. Below, I evaluate the relationship between “union strength rank” from the TB Fordham report and funding “levels” (an indicator of adequacy) and funding “fairness” (whether higher poverty districts receive systematically more, or less, funding per pupil than lower poverty districts in that state).
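A heavily simplified sketch of that kind of fairness model appears below – the data file, column names and specification are stand-ins of mine; the actual report’s model is considerably richer (multi-year pooling, cost adjustments, state-specific poverty slopes):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical district-level panel; columns are my stand-ins for the
# federal fiscal and Census data the Funding Fairness report draws on
fin = pd.read_csv("district_finance_panel.csv")
fin["log_rev"] = np.log(fin["state_local_rev_pp"])

# within-state relationship between child poverty and per-pupil revenue,
# with a crude scale control
m = smf.ols(
    "log_rev ~ poverty_rate + np.log(enrollment) + C(state)", data=fin
).fit()

# fairness ratio: predicted revenue at 30% vs. 0% poverty
print(np.exp(m.params["poverty_rate"] * 0.30))  # >1 progressive, <1 regressive
```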

An important caveat here since I like to pick on inappropriate graphs myself is that I really should not be making scatterplots where the x-axis variable is a “rank” measure. Rank is not an interval measure. But this is purely for illustrative purposes, so please forgive my misuse of rank data in this way! [or at least if you slam me for it, acknowledge that I pointed this out!]

Figure 1

In Figure 1 we can see that states with stronger teachers unions [left hand end] tend to have more adequate overall funding levels. It is, however, more clearly the case that states with weak teachers unions (ranked 45th to 50th) tend to have particularly low adjusted funding levels. This is certainly not to suggest any direction of causation. That’s the whole trick here. Most of this is probably quite circular – endogenous. [The union cynic might argue that this merely shows that teachers’ unions have extorted funds from the taxpayer.] It may simply be that states which tend to be more educated and progressive happen both to have stronger teachers unions and to spend more on education – except for those states, like California, that by historical artifact of referendum have systematically deprived their education systems for decades.

Figure 2

Perhaps more to the point of the Economist assertion, we see that states with weaker teachers unions also tend to have less fair funding distributions – or are systems where it is more likely that high poverty districts have systematically fewer resources per pupil than lower poverty ones.  Again, this result is likely a function of the endogenous relationships mentioned previously.

See: http://www.schoolfundingfairness.org/

UPDATE: So, wait a second, if stronger union states tend to have fairer funding distributions, might that actually enhance equity? In a really big, important and substantive way? Hmmm….

How is union strength related to competitiveness of teacher pay?

Here, I look at the relationship between union strength and the relative wage of teachers compared to non-teachers in the same state.  This is a particularly important comparison for two reasons. First of all, the relative competitiveness of teacher wages likely has significant effects on the quality of individuals who choose to enter the teacher workforce versus other employment opportunities (selecting from HS into College).  Overall wage competitiveness can have long run effects on overall teacher workforce quality.  Further, this is the one comparison I make in this post where we might hypothesize a direct, easily interpreted relationship. That is, we might expect stronger unions to lead to more competitive wages.  Here, I compare the weekly wage % (teacher percent of non-teacher) from the Economic Policy Institute with the TBF union strength rank.

Figure 3

Somewhat to my own surprise, this relationship is actually quite strong!… with states having stronger teachers unions also having generally more competitive teacher wages.

See: http://www.epi.org/publication/the_teaching_penalty_an_update_through_2010/

Is union strength associated with NAEP achievement levels?

Now, the usual retort to teacher union bashing is to point out that states like New Jersey and Massachusetts have strong unions and also have high NAEP scores, and states like Alabama and Mississippi have weak unions and low NAEP scores.  Yeah… okay… but clearly there’s a lot goin’ on there that has little or nothing to do with unions.  But let’s indulge this premise a little further with some additional graphs just to see the patterns.

In these first few figures I present the relationship between NAEP scores for children in families above the 185% income level for poverty (not on free or reduced lunch) and union strength. Note that the patterns are similar for scores for children qualified for reduced lunch or for free lunch, but I’ve not included them here… ‘cuz there are already enough graphs in this post. I’d be happy to share them though.  In general, what we see in Figure 4 and Figure 5 is that NAEP scores for non-low income kids tend to be slightly lower – with little clear pattern – in weak union states.

Figure 4

Figure 5

Figure 6, however, clarifies that NAEP scores tend to be higher for non-low income children in states where incomes are higher for non-low income children.

Figure 6 (but income dictates NAEP)

We can use the information in Figure 6 to adjust the NAEP scores (are they higher or lower than would be expected, given the income levels) for household income differences.  When we make that adjustment, we get Figures 7 and 8.
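The adjustment itself is nothing fancy – essentially a residual from a simple regression. A sketch, with made-up file and column names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical state-level table: NAEP score and household income for the
# non-low-income group; column names are my assumptions
naep = pd.read_csv("state_naep_income.csv")

# regress NAEP scores on income, then keep the residual: points
# above/below what would be expected given each state's income level
m = smf.ols("naep_score ~ median_income", data=naep).fit()
naep["naep_income_adj"] = m.resid

print(naep.sort_values("naep_income_adj", ascending=False).head())
```

The same residualizing trick – this time regressing achievement gaps on income gaps – produces the adjusted gap measures used in Figures 10 and 11 further below.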

Figure 7 (income adjusted NAEP)

Figure 8 (income adjusted NAEP)

Still we see that adjusted NAEP scores are somewhat, though hardly systematically, lower in states with weaker unions. What we certainly do not see here is that NAEP scores are systematically lower in states with stronger unions. That is, unions certainly aren’t driving NAEP scores into the ground!

But, while the second set of graphs is more appropriate than the first, both are dreadfully oversimplified characterizations of complex relationships.

Is union strength associated with NAEP achievement gaps?

This question is perhaps most on target with the Economist claim. Following the Economist logic, one might assert that teachers unions likely lead to larger achievement gaps, thus limiting social mobility. Measuring poverty-related achievement gaps and comparing them across states is tricky, as I’ve discussed in numerous previous posts. Specifically, the size of the achievement gap between kids not qualified for free or reduced lunch and those qualified for either free or reduced lunch tends to be highly related to the size of the income gap between the two groups – as shown in Figure 9! That is, we can’t just do straight-up achievement gap comparisons – we must adjust for the income gap.

Figure 9 (Income Gaps and NAEP Gaps)

Figure 10 and Figure 11 present the income-gap-adjusted achievement gaps in relation to union strength rank. What we see is little or no relationship between union strength and achievement gaps. While this does not illustrate that stronger unions lead to smaller achievement gaps, it also does not, by any stretch, illustrate that stronger unions lead to larger achievement gaps… an expectation that might reasonably be derived from the claim made in the Economist.

Figure 10

Figure 11

Then again… these are still cursory… descriptive analyses – using only two variables at a time to characterize education systems that are far more complex than can be legitimately characterized with only two variables at a time. It’s exploratory. It’s a start… and there’s certainly more to be explored here… but likely questions that can never be satisfactorily untangled with available data.

See: https://schoolfinance101.wordpress.com/2011/09/13/revisiting-why-comparing-naep-gaps-by-low-income-status-doesnt-work/

Is union strength associated with NAEP achievement growth?

Finally, I suspect that some curmudgeonly reactors to this post will attempt to argue that weak union states have seen more growth in NAEP achievement over time. Well, Figure 12 kind of thwarts that notion as well. Not much relationship there either, though it is certainly the only graph in this post that shows even the slightest upward tilt.

Figure 12

But alas, even that tiny upward tilt is a function of the fact that states that saw the greatest growth on NAEP were simply the states that had and still have the lowest overall performance levels – as shown in Figure 13. And, states with lower average performance levels – now and then – tend to have weaker unions.

Figure 13

For a more thorough discussion on this point, see: https://schoolfinance101.wordpress.com/2012/07/27/learning-from-really-bad-graphs-ill-informed-conclusions-thoughts-on-the-new-pepg-catching-up-report/

Conclusions

So what does this all mean then? Are unions good, or are they bad? Do they increase inequality and lower quality? It’s certainly difficult given the data provided above to swallow the bold assertion in the Economist that teachers’ unions are the scourge of the nation and primary cause of declining social mobility.  That’s just a load of unsubstantiated crap!

But then, what can we learn here? Well, it is perhaps important that there appears to be at least some likely indirect and certainly endogenous relationship between unionization and funding fairness and funding levels. As I’ve discussed in related research, funding fairness and funding levels – and school finance reforms that improve equity and adequacy – do matter! To summarize:

Do state school finance reforms matter? Yes. Sustained improvements to the level and distribution of funding across local public school districts can lead to improvements in the level and distribution of student outcomes. While money alone may not be the answer, more equitable and adequate allocation of financial inputs to schooling provide a necessary underlying condition for improving the equity and adequacy of outcomes. The available evidence suggests that appropriate combinations of more  adequate funding with more accountability for its use may be most promising.

http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

See also this post in which I probe more specifically the changes in achievement gaps over time in Massachusetts and New Jersey.

Further, the potentially more direct relationship between unionization and relative competitiveness of teacher wages compared to other labor market opportunities may be important in the long run.  In a related policy brief from last winter, I noted:

To summarize, despite all the uproar about paying teachers based on experience and education, and its misinterpretations in the context of the “Does money matter?” debate, this line of argument misses the point. To whatever degree teacher pay matters in attracting good people into the profession and keeping them around, it’s less about how they are paid than how much. Furthermore, the average salaries of the teaching profession, with respect to other labor market opportunities, can substantively affect the quality of entrants to the teaching profession, applicants to preparation programs, and student outcomes. Diminishing resources for schools can constrain salaries and reduce the quality of the labor supply. Further, salary differentials between schools and districts might help to recruit or retain teachers in high need settings. In other words, resources used for teacher quality matter.

http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

So, while nothing in this post puts to rest the big – unanswerable – questions of the overall equity and quality effects of teachers unions on our supposedly monolithic American public education system, these analyses do at least raise serious questions about the notion that teachers unions are the scourge of the nation and the cause of all of the supposed – also unfounded – ills of American public schooling.

Cheers! It’s good to be back!

Friday Afternoon Graphs: Graduate Degree Production in Educational Administration 1992 to 2011

I’ll let the pictures tell the story this time. [UPDATED – Errors in original]

Data source: http://nces.ed.gov/ipeds/datacenter/DataFiles.aspx

School Labels & Housing Values: Potential consequences of NJDOE’s new arbitrary & capricious school ratings

There exists relatively broad agreement in the empirical literature that the perceived quality of local public goods and services – including local public schools – significantly influences the value – as represented in demand/sales prices – of residential property. In other words, perceived school quality affects housing prices and housing values. All else equal, one pays a premium to live in a school district, or an attendance zone within a district, that is associated with a “good” school.

Indeed this “capitalization” of school quality (perceived or real) in home values is at the root of much of the disparity underlying highly residentially segregated state education systems. It’s a long run, complex chicken-egg cycle sort of thing. Some communities have more which allows them to spend more… to improve perceived quality… and capitalize that value into their homes/property values, increasing the town’s ability to raise revenue further, and increasing barriers to entry for families with lower income.

Realtors, the real estate industry, and state and local publications like New Jersey Monthly and national publications like Newsweek and U.S. News drool over oversimplified characterizations of good and bad schools. As trivial as this stuff may seem to many of us, it is consequential, or at least can be.

Beyond magazine ratings, state school rating schemes have been shown to be consequential for home values. The key is that summary-type ratings, broad classifications or grades – ACCURATELY REFLECTING QUALITY OR NOT – seem to have the most significant impact. For example, in one recent study specifically evaluating post-NCLB classification schemes & other metrics, the authors found that “Results show that while all school quality measures tested have some explanatory power, school district ratings and performance index, which are comprehensive measures of school quality, are the most appropriate measures and are readily capitalized into housing prices.”[1] In one of the better known studies on this topic, David Figlio evaluated the influence of Florida’s letter grading system on home values, finding:

This paper provides the first evidence of the effects of school grade assignment on the housing market. Our results suggest that the housing market responds significantly to the new information about schools provided by these “school report cards,” even when taking into consideration the test scores or other variables used to construct these same grades. These results suggest that innocuous-seeming school classifications may have large distributional implications, and that policy-makers should exercise caution when classifying schools.

http://bear.warrington.ufl.edu/figlio/house0502.pdf

Now, the caveat to Figlio’s findings is that the initial shock to housing prices from revealed grades may fade with time.
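To make the design concrete, here is a minimal sketch – in Python, with entirely hypothetical file and column names – of the kind of test Figlio describes: sale prices regressed on grade labels while controlling for the scores used to construct those labels.

```python
# Sketch of a Figlio-style question: do grade labels move sale prices even
# after controlling for the underlying test scores? All names are placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

sales = pd.read_csv("home_sales_with_school_grades.csv")  # hypothetical file

model = smf.ols(
    "np.log(sale_price) ~ C(school_grade) + test_score"
    " + sqft + bedrooms + C(attendance_zone) + C(sale_year)",
    data=sales,
).fit()
# A significant grade coefficient, net of test_score, suggests the label
# itself is being capitalized into prices.
print(model.params.filter(like="school_grade"))
```

Figlio’s actual research design is more careful than this sketch; the point is simply that the label enters the model separately from the underlying scores.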

These findings raise significant questions about the potential impact on the values of homes located within the attendance boundaries of schools granted these new labels by state agencies in accordance with their NCLB waiver applications. In those applications, state agencies were (seemingly) under the gun to find ways to ensure that the schools they classified as problem and/or failing schools were not exclusively poor minority schools in the inner city. Indeed, many set out to make poor, minority schools their primary target. As I’ve shown in recent posts on New York and New Jersey, states did indeed classify as failing largely those schools that are predominantly poor, predominantly minority and in the inner city.

But, in their effort to marginally diversify their “bad” schools lists, states also proposed achievement gap metrics and subgroup metrics for identifying “other,” more diverse and less poor schools for disruptive state intervention. In New Jersey, most of these schools ended up being classified as “focus” schools – the “we’re watching you!” schools – on which the state will push interventions through its regional achievement centers. Here’s the list of “focus” schools in generally non-low-income communities (middle and upper income) in New Jersey:

Table 1. Focus Schools in Non-Low-Income Districts

http://www.state.nj.us/education/reform/PFRschools/Priority-Focus-RewardSchools.pdf

A number of “focus” schools occur along the Northeast Corridor around Middlesex County. This is particularly true of “focus” schools in non-low-income (lighter blue) districts.

Figure 1. Locations of Focus, Priority and Reward Schools

All of these schools achieved their “focus” status by having large achievement gaps between two groups either by race, language proficiency or poverty (or disability?… no detail is provided!), rather than by low average or overall performance. Many are middle schools, in part because middle schools serve as a funneling point within mid-sized suburban districts, where children from neighborhood schools first come together in a single location (or perhaps two locations), creating sufficient subgroup sample sizes for calculating gaps.

Notably, a school can only have a measurable achievement gap between ethnic groups if it has at least 30 tested students in each group!  So really, most of the “focus” schools in middle and upper middle class New Jersey districts are middle schools in more diverse districts.

Far fewer of the more affluent schools in the state even have at least 30 members of disadvantaged minority groups taking state assessments in a given year! As such, racial achievement gaps cannot even be calculated for these districts.

Yes, gaps are a problem… but these measures… and resultant classifications are a twisted combination of ignorant and arbitrary.

Ignorant, arbitrary or otherwise, these classifications may have significant consequences for home values. And homeowners in these districts (and those in poor urban “priority” school zones) should be rightfully outraged at this potentially highly consequential abuse of data. [and of course those in “reward” school zones can quietly bask in the glory of their unearned accolades]

After all, it is the broad labeling that matters more than precise and nuanced characterizations of actual schooling quality!

Figure 3 shows the average proficiency rates of the “reward” schools and “focus” schools in Middlesex County – focusing only on those schools with fewer than 20% of children qualifying for free lunch; that is, lower-poverty schools. In terms of overall proficiency, the “focus” schools fit reasonably into the broader mix of schools in Middlesex County.

My intent here is certainly not to downplay the gaps that may persist in these schools, though it’s really important to acknowledge that you can only even measure a gap if diversity exists to begin with. My point in this graph, and in this post in general, is that the state has created a labeling system that misuses measures that weren’t very good to begin with, producing arbitrary and capricious school labels that may have real and substantial consequences for home values. In many cases here, districts that are home to a focus school are immediately adjacent to districts that are home to “reward” schools (an equally unearned label!).

Figure 3.

The kicker here is that even if the public were to become wise to the questionable veracity of these labels, the state has used this labeling system in the context of granting itself near unilateral authority to exercise substantial control over the operations of these schools [an authority which may not actually exist!].

So, it’s not just about the labels – which may be entirely meaningless – but it’s also about – much more about – a substantial threat to local governance of those schools. Now, I’ll admit that I have mixed feelings about “local governance,” because it is often local governance that reinforces disparities across children and schools.

But, that said, the state’s choice to use these labels quite explicitly as a threat to local governance – rather than merely as a “label” to increase awareness and encourage increased local accountability – may increase the consequences for local home values. That is, prospective home buyers may be more likely to avoid purchasing homes in neighborhoods or districts where they perceive that they may lose control of their schools to the state, and this effect may be much greater than the effect of a negative label alone. Further, it’s entirely possible that in these middle class communities, otherwise perceived as having pretty good schools, the public perception would be that proposed state interventions are more likely to make the schools worse than better (in addition to the threat of intervention itself).

Indeed, these are empirical questions and ones I hope to explore over the next few years as annual housing sales data are released.

Gap measurement in NJ: Largest Within-School Gaps: schools with the largest in-school proficiency gap between the highest-performing subgroup and the combined proficiency of the two lowest-performing subgroups. Schools in this category have a proficiency gap between these subgroups of 43.5 percentage points or higher. see: http://www.state.nj.us/education/reform/PFRschools/TechnicalGuidance.pdf
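For the curious, here is a minimal sketch of that gap rule in Python, with illustrative numbers. Exactly how the two lowest subgroups are “combined” is my assumption here (pooled by number tested), as is requiring three reportable subgroups.

```python
# Minimal sketch of NJ's "largest within-school gaps" rule as I read the
# technical guidance. Subgroup names and numbers are illustrative only.
def within_school_gap(proficiency, counts, min_n=30):
    """proficiency: {subgroup: % proficient}; counts: {subgroup: n tested}."""
    # Only subgroups with at least 30 tested students are reportable
    eligible = {g: p for g, p in proficiency.items() if counts[g] >= min_n}
    if len(eligible) < 3:
        return None  # too few reportable subgroups to apply the rule
    ranked = sorted(eligible, key=eligible.get)  # lowest to highest
    lo1, lo2 = ranked[0], ranked[1]
    # "Combined" proficiency of the two lowest subgroups, pooled by n tested
    combined = (
        proficiency[lo1] * counts[lo1] + proficiency[lo2] * counts[lo2]
    ) / (counts[lo1] + counts[lo2])
    return eligible[ranked[-1]] - combined

gap = within_school_gap(
    {"white": 85.0, "black": 38.0, "hispanic": 45.0},
    {"white": 120, "black": 42, "hispanic": 55},
)
print(gap, gap is not None and gap >= 43.5)  # flagged at 43.5 points or more
```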


Data, Data, Data? Dissecting & Debunking NJDOE’s State of the Schools Message

Time again for an NJ State of the Schools Address, as reported HERE in NJ Spotlight (with absolutely no critical questioning/reporting whatsoever! More or less spoon-fed regurgitation).

As I’ve written a number of times on this blog, state officials in New Jersey have decided on a specific marketing/messaging plan to support current policy initiatives. Those policy initiatives involve:

  1. expanding NJDOE authority to impose desired “reforms” (charter/management takeover, staff replacement, etc.) on specific schools otherwise not under their direct authority.
  2. cutting funding from higher poverty, higher need districts and shifting it toward lower poverty, lower need ones.
  3. expanding charter schooling and promoting other “innovations” in high poverty concentration schools.

The supposed impetus for these reforms is that New Jersey faces a very large achievement gap between low income and non-low income children (one that is largely mis-measured). While it would seem inconsistent to suggest reducing funding in low income districts and shifting it to others, the creative messaging has been that the additional resources are quite possibly the source of the harm… or at the very least those resources are doing no good. Thus, the path to improvement for low income kids is to transfer their resources to others.  What I have found most disturbing about this messaging – other than the ridiculous message itself! – is the flimsy logic and disingenuous presentations of DATA that have been used to advance the argument.

Look, if the message is going to be about Data, Data, Data – then now is the time to take a more thorough, context-sensitive look at the data and try to better understand what’s really going on.

Let’s do a walk through of some of the information presented in the most recent state of the schools presentation.

Here’s a link to the slides from the recent presentation:

http://www.state.nj.us/education/news/2012/0919con.pdf

NJDOE Message

The most recent state of the schools presentation arrives in the post-NCLB-waiver era, where we are presented with the template classifications of schools as Priority, Focus and Reward schools. The presentation revolves to a large extent around these categories, because it is the Priority schools that are the target of the most immediate and disruptive interventions.

Below are the slides that were presented to characterize schools by their performance category. The message to be conveyed by these slides was:

  1. Priority Schools are overspenders (or at least very well resourced)
  2. Priority Schools have very well paid teachers who have slightly higher than average experience
  3. Yet still, priority schools have really crummy outcomes!

Therefore, we must have wide latitude to intervene!

EXHIBIT A – PRIORITY SCHOOLS SPEND MORE(?)

EXHIBIT B – PRIORITY SCHOOLS HAVE HIGH PAID TEACHERS & LOW OUTCOMES!

EXHIBIT C – GAPS REMAIN LARGE

Omitted Information: What about demographic differences?

Clearly, a few things are being overlooked in the first two slides, which claim to characterize Priority schools as schools with plenty of resources that simply don’t get the job done. Now, there’s a little more to the story than that!

Most notably, as I show below, priority schools have about 80% of children qualified for free lunch, while reward schools have less than 10%! Yet as the NJDOE slide above shows, at the high end these school districts spend slightly under 30% more than the state average. Notably, this shoddy comparison does not compare these districts to others in their own labor market.

Indeed, New Jersey, more than other states, has put some money into these districts. See “Is school funding fair?” But, let’s be clear, these margins of funding difference, while helpful, hardly make these districts – given their needs – flush with excess resources!

In fact, the strongest empirical research on this topic suggests that it would take roughly an additional 100% in per pupil funding for a district that is 100% low income versus a district that is 0% low income. Here, we are looking at nearly that extreme a low income differential, and nowhere near that extreme a funding differential! So while these districts are better off than similar districts in other states, implying that they’ve got more than enough to close achievement gaps is a huge stretch.
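Some back-of-envelope arithmetic – with hypothetical numbers, not actual NJ figures – shows just how far “slightly under 30% above average” falls from the roughly 100% differential suggested by that research.

```python
# Illustrative arithmetic only; the $12,000 base and the linear weight are
# hypothetical placeholders, not actual New Jersey figures.
base = 12_000          # stand-in for average per pupil spending
weight = 1.0           # ~+100% at 100% low income, per the research cited

def needed(pct_low_income):
    # Linear need adjustment: base spending scaled up with poverty share
    return base * (1 + weight * pct_low_income)

print(needed(0.80))    # ~$21,600 implied for an 80% low-income district
print(base * 1.30)     # ~$15,600: what "under 30% above average" buys
```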

But do those demographic differences matter?

This figure shows just how much the demographic differences represented above matter with respect to student achievement, and specifically how much school demography continues to dictate the performance classification of schools under the NJDOE waiver plan.

As I pointed out in a recent post, NJDOE has basically flagged schools in low income neighborhoods for experimentation and substantial disruption (closure, etc.), with an option to override any/all local input.

Notably, this pattern is likely better than it would otherwise be because of New Jersey’s past efforts to target additional resources to high need settings, including pre-kindergarten programs, smaller class sizes and more competitive teacher salaries than might otherwise exist in these settings.

What about the teacher pay and teacher characteristics claim?

But what about those salaries? The NJDOE slides present a picture of teachers who – by their argument – are certainly paid enough. In fact, setting aside (ignoring entirely) the demography of the schools, the implication of the NJDOE slides is: hey… we’re paying these teachers a few thousand more than the average teacher in the state, but clearly they just aren’t very good – or at least a bunch of them aren’t and need to be fired! Further, they have slightly more experience than teachers in other schools… yet they still stink… indicating that experience clearly doesn’t matter. Notice that they didn’t present degree levels.

Okay… now let’s do a legitimate walkthrough of the most recent available data on NJ teachers with respect to the performance categories of schools. I use the 2011-12 Fall Staffing Reports, and I fit a regression model of teacher salaries for all elementary and middle level classroom teachers (secondary later if I get a chance). In that model, my goal is to compare the salary a teacher would make (a rough sketch of such a model appears after the list):

  • at the same experience level
  • with the same degree level
  • having the same job code
  • working full time
  • in the same labor market (and type of district in that market)
  • in the same year
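For those who want the mechanics, here is a minimal sketch of this kind of comparison model in Python (statsmodels), assuming a hypothetical teacher-level file with made-up column names – not the actual Fall Staffing Report layout.

```python
# Sketch of an apples-to-apples salary comparison. The file name and columns
# (salary, experience, degree, job_code, fte, level, labor_market, year,
# school_class) are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

teachers = pd.read_csv("nj_fall_staffing_2011_12.csv")  # hypothetical file

# Keep full-time elementary and middle level classroom teachers only
ft = teachers[(teachers.fte == 1.0) & teachers.level.isin(["elem", "middle"])]

# Log salary on experience and degree, with fixed effects for job code,
# labor market, and year; the school classification dummies then capture
# the salary difference for otherwise comparable teachers.
model = smf.ols(
    "np.log(salary) ~ experience + I(experience**2) + C(degree)"
    " + C(job_code) + C(labor_market) + C(year) + C(school_class)",
    data=ft,
).fit()
print(model.params.filter(like="school_class"))
```

The coefficients on the school classification dummies approximate the percent salary difference for teachers at the same experience, degree, job code, labor market and year.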

That is, I’m comparing apples with apples. This first graph shows the average difference in salary on the above comparison bases, statewide. Statewide, teachers in priority schools are earning a lower salary and teachers in reward schools a higher salary than teachers in “all other schools.” But these averages do mask some important differences across labor markets.

Here are the North Jersey/NY projected teacher salaries by experience level, where Newark carries significant weight in the model. Priority school salaries by experience are in blue, reward in red. On average, the differences are rather subtle: reward school salaries jump ahead in the mid-range of experience, while priority school salaries fall behind there and rise again later. But it’s really important to understand that simply having roughly the same salary does not mean that salary is actually competitive for recruiting and retaining teachers of comparable qualifications! In fact, getting teachers to work in a high need setting is likely to require a substantively higher wage!

As I explain in a recent review of the literature on this topic: With regard to teacher quality and school racial composition, Hanushek, Kain, and Rivkin (2004) note: “A school with 10 percent more black students would require about 10 percent higher salaries in order to neutralize the increased probability of leaving.”[33] Others, however, point to the limited capacity of salary differentials to counteract attrition by compensating for working conditions.[34] See: http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

  • Hanushek, E.A., Kain, J.F., & Rivkin, S.G. (2004). Why Public Schools Lose Teachers. Journal of Human Resources 39(2), p. 350.
  • Clotfelter, C., Ladd, H.F., & Vigdor, J. (2011). Teacher Mobility, School Segregation, and Pay-Based Policies to Level the Playing Field. Education Finance and Policy 6(3), 399–438.
  • Clotfelter, C.T., Glennie, E., Ladd, H.F., & Vigdor, J.L. (2008). Would Higher Salaries Keep Teachers in High-Poverty Schools? Evidence from a Policy Intervention in North Carolina. Journal of Public Economics 92, 1352–1370.
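Taken literally – a linear extrapolation the authors may well not endorse – the quoted estimate implies an easily computed compensating differential. The numbers here are hypothetical.

```python
# Back-of-envelope use of the ~10% salary per 10-percentage-point estimate
# quoted above. The linear extrapolation and all figures are assumptions.
base_salary = 60_000
black_share_gap = 0.40  # school A enrolls 40 pp more black students than B

premium = (black_share_gap / 0.10) * 0.10   # 10% per 10 pp
print(base_salary * (1 + premium))          # ~$84,000 to neutralize attrition
```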

Now let’s look at South Jersey, which appears to be the source of most of the deficit that shows up statewide. In the South Jersey/Philly metro area, teachers in priority schools are making a much lower wage, especially in the mid-range of experience. Non-classified and reward schools lead the way on salaries across most of the experience range. Hey… is this chicken or egg? Do salaries matter – or are more advantaged schools simply able to pay higher salaries?

One issue that NJDOE appears to be ignoring entirely is that the classification of these schools may actually lead to additional teacher sorting – making it even harder to staff priority schools with high quality teachers down the line.

Here are the degree levels of classroom teachers in these schools – something notably absent in the NJDOE presentation. The differences between priority and reward schools are quite striking.

PRIORITY SCHOOLS HAVE FAR MORE TEACHERS WITH ONLY A BA AND FEWER WITH AN MA THAN REWARD SCHOOLS!

Finally, here are the concentrations of novice teachers, where a sizable body of research literature points to the problem of teacher churn in high need schools and the relationship between high novice teacher concentrations and lower student outcomes.

What about the performance of low income children in New Jersey?

Again, part of the message being presented in the state of the schools address is that New Jersey in particular has failed its low income children – as indicated by the suspect over-time proficiency rate graphs presented above. These graphs are coupled with the funding/resource graphs to imply that funding is clearly unhelpful at best, and harmful at worst, when it comes to closing the achievement gap.

As I’ve written on this blog before, New Jersey has made substantive gains in recent decades for low income children. Further, to make comparisons of achievement gaps, one must focus on the most comparable measures and most comparable settings. In one recent blog post, I compared Massachusetts, Connecticut and New Jersey – which, in terms of income distributions and the characteristics of those above and below the free/reduced lunch income thresholds, are most similar. The following graphs show that children of HS dropouts and low income children in NJ and MA have both higher levels of performance and have outpaced the gains in performance of similar children in Connecticut and Rhode Island (but especially CT!).

What has New Jersey done to improve performance of low income children?

I also elaborated in that previous post that one key difference between these states is that NJ and MA, more than the others, have shifted resources toward higher need districts. The first graph shows the disruption over time in the relationship between district income and district resources. MA and NJ have most significantly disrupted this relationship, providing systematically more resources per pupil in lower income districts.

This second graph shows the pattern across districts by poverty in each state. Note that in CT, while a few high poverty districts (Hartford and New Haven) have higher current spending, the CT pattern is less systematic. Further, in those few districts, much of the additional spending is granted through magnet school aid, and thus may have limited positive impact on the districts’ neediest students.

To the best of my understanding, teacher tenure laws are/were strong in each of these states. Few if any districts in these states base teacher evaluation heavily on student test scores – especially during the periods represented in the graphs above – which predate Race to the Top. That is, clearly the differences in low income achievement growth between these states have little/nothing to do with state teacher evaluation policy. To go even further, NJ and CT have relatively small charter school market share, so charter school market share likely is not a major factor either.

Further, as explained in this report, and in this article, substantive and sustained school finance reforms do matter! And the evidence on the effectiveness of these reforms far outweighs the more speculative reforms being suggested as replacements for funding in New Jersey.

What does NJDOE & the current administration propose to do about future funding?

Finally, as I noted previously, the current direction of policy initiatives is to attempt to reshuffle funding away from higher poverty/need districts and toward lower poverty/need ones. Here’s the graph from the previous post.

The Strange Logic of it All?

Coupling this DOOHNIBOR (uh… “Robin Hood” spelled backwards) strategy with arguments for disruptive reforms in high poverty settings is illogical at best, and reckless and irresponsible at worst.

Children in high poverty settings in New Jersey have made substantive gains over time.

It is quite likely that New Jersey’s investments in the schools and communities of these children have played a significant role in those gains.

Yet, even in New Jersey, where the state has made those efforts, poverty-related disparities do persist and require attention.

There is little or no evidence that expanded charter schooling is substantively improving the outcomes of our lowest income children, largely because those “successful charter schools” of which we most often speak are not serving our lowest income children in any significant numbers, and in some cases are increasing concentrations of disadvantaged children left behind in district schools.

And there’s little evidence that either New Jersey’s failures or gains are a function of an oversimplified good teacher/bad teacher dichotomy, suggesting a need for oversimplified reformy solutions like teacher deselection and/or pay-for-test scores.

Despite the state’s efforts to provide support to high poverty settings/schools, teacher wages still are not where they necessarily need to be in those districts to recruit and retain a high quality applicant pool year after year. There remain disparities in teacher qualifications, including novice teacher concentrations. Teacher quality disparities may be/are an issue – but not in the way they are presently being framed!

These are the basic issues that need to be addressed. They aren’t sexy. They aren’t reformy. They aren’t consistent with the current marketing/messaging of NJDOE.

But they are based on data, data, data, DATA, DATA and more freakin’ Data!

And there’s a lot more where that came from!


Teacher Salaries, Demographics & Financial Disparities in the Chicago Metro

No time to really write much here today, but I do have a few figures to share. I’m posting these mainly because I keep seeing so many ridiculous, acontextual… and in many cases simply wrong… statements about Chicago teachers’ salaries. As I understand it, salaries are not really the main issue in the contract dispute, but rather the teacher evaluation system. I’ve already written extensively about the types of teacher evaluation frameworks that I believe are being deliberated here, but I’m not following the issue minute by minute.

This post may be most relevant! 

Someone has to just say no to ill-conceived teacher evaluation policies. Perhaps this is the time.

That aside, there are typically two ways one might choose to compare teacher salaries to determine how they fit into their competitive context. One is to compare teacher salaries to non-teachers of similar age and education level. The overall competitiveness of teacher salaries tends to influence the quality of entrants to the profession. The other is to compare teacher salaries – for similar teachers – across districts within the same labor market.

When taking the latter approach, it is also important to consider the demographic differences across settings. All else equal, teachers will gravitate toward jobs with more desirable working conditions. So, in high need urban settings, equal compensation alone would be insufficient.
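A toy illustration of the two comparisons, with placeholder numbers rather than actual Chicago figures:

```python
# Toy illustration of the two comparisons; all figures are placeholders,
# not actual Chicago data.

# (1) Competitiveness relative to similarly aged/educated non-teachers
teacher_salary = 70_000
similar_nonteacher_salary = 85_000
print(teacher_salary / similar_nonteacher_salary)  # ~0.82 on the dollar

# (2) Comparison across districts in the same labor market at the same
# experience level, plus an assumed compensating premium for the less
# desirable working conditions of a high need urban setting
suburban_salary_at_10yrs = 72_000
assumed_premium = 0.15  # assumption, not an estimate from the literature
print(suburban_salary_at_10yrs * (1 + assumed_premium))  # ~$82,800
```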

Bear in mind that I’ve explained in numerous previous posts how Chicago is among the least well funded large urban districts in the nation!

So, here’s a quick run-down on salaries and student populations – and funding equity (or lack thereof) – in pictures and tables.

Figure 1. Concentration of Predominantly Black and Hispanic Schools and Low Income Districts (and resource inequity)

[this paper explains the model behind the funding disparity analysis]

Figure 2. Demographics of Selected School Districts

Figure 3. Salary by Experience Generated from Model of Teacher Level Data (publicly available here)

So, in the mix, Chicago salaries for the first several years of experience are relatively average – or even slightly above. But, they do trail off at higher levels of experience and eventually fall behind. Remember though that comparable salaries would be generally insufficient for recruiting/retaining comparable teachers in a higher need setting.

Other even higher poverty, higher minority concentration districts like Harvey and Dolton are even more disadvantaged in terms of teacher salary competitiveness.

For more on the importance of teacher salaries, see: http://www.shankerinstitute.org/images/doesmoneymatter_final.pdf

Cheers!

ADDENDUM

I’ve been fielding a few random comments along the lines of “so what… Chicago’s outcomes still stink and they clearly spend more than they should, and pay their teachers more than they should for those stinky outcomes!”  Some of these comments point to higher graduation rates in Springfield, coupled with lower spending. Of course, this comparison assumes that it would cost the same in Springfield and Chicago to accomplish similar outcomes. So, I ran a check based on models I’ve run for recent academic papers. The models are fully elaborated here:

Baker.AEFP.NY_IL.Unpacking.Jan_2012

Specifically, I estimate models to adjust for the various costs faced by districts toward achieving common outcome goals. Those models account for differences in the student population served, differences in regional labor costs and differences in economies of scale (really only affects small districts).

These graphs show the relationship between need and cost adjusted operating expenditure per pupil and student outcome measures. The first uses the state assessment scores, centered around the average district – and averaging these centered scores across all grades and tests. It’s like a combined outcome index of all test scores.
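As a rough sketch of how such an index might be assembled – assuming long-format district means with hypothetical column names:

```python
# Rough sketch of the combined outcome index described above. The file name
# and columns (district, year, grade, test, score) are hypothetical.
import pandas as pd

scores = pd.read_csv("il_district_scores_long.csv")  # hypothetical file

# Center each district mean score on the average district, within each
# year / grade / test cell...
scores["centered"] = scores.groupby(["year", "grade", "test"])["score"].transform(
    lambda s: s - s.mean()
)

# ...then average the centered scores across all grades and tests to get a
# single combined index per district.
index = scores.groupby("district")["centered"].mean().rename("outcome_index")
print(index.sort_values().head())
```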

Chicago falls pretty much in line here. It has very low need/cost adjusted spending… and, well… low outcomes. But they certainly don’t appear to have lower outcomes than expected given their resources!

The second uses graduation rates.

It’s a little harder to judge what’s going on here… but Chicago still does not appear to be substantially out of line. Graduation rates can vary for a multitude of reasons… including having lower standards for graduation.

In other words, I’m not buying the argument that “yeah… but… Chicago still ain’t cuttin’ it… even with what it has.” Is it possible to have what you need and still not cut it? Yes. It is certainly possible that a district would have far more adequate resources and still do a crummy job. But, Chicago does not appear to be such a case.


Ed Waivers, Junk Ratings & Misplaced Blame: Jersey Edition

I’ve been writing over the past few weeks about NCLB waivers and the schools that are being targeted by states under the waiver program as targets for federally endorsed state intervention. [all of which is built on highly suspect legal/governance assumptions]

My concerns here operate at a number of levels. First, the current Federal Administration has again used an “incentive” application process to coerce states to adopt really, really ill-conceived policy frameworks. These policy frameworks consist of two major parts:

  1. school and district performance classification schemes that are largely if not entirely built on misinterpretation and misrepresentation of generally low quality data; and
  2. poorly vetted, ill-conceived, aggressive/abrupt (closure, turnaround) intervention strategies as likely (if not more so) to do harm as they are to do any good.

So… yeah… it boils down to ramming bad, disruptive restructuring plans down the throats of schools/districts/communities that have been classified by biased, and unjustifiable measures. Further, much of this is being proposed without carefully evaluating whether there exists legal authority to do any of it.

Junk Classifications 101

So, let’s take a look at how the school classifications have played out in New Jersey. New Jersey, like other states, proposed to classify its worst schools as Priority schools – subject to immediate disruptive intervention; the next lowest set as Focus schools – the you’re-next/we’re-watching-you schools; and another set as “reward” schools – the you-kick-ass-so-we’re-gonna-give-you-a-prize schools!

Matt Di Carlo over at Shanker Blog has given considerable attention to the issue of state school grading systems and the extent to which they measure, or even attempt to measure, school effects on student test scores (not to be conflated with actual school “effectiveness”), or instead simply capture the compounded influence of a variety of student background factors on various accountability measures. In other words, are school ratings simply classifying poor minority schools as bad schools – and thus branding their teachers and administrators as necessarily ineffective – while not even attempting to actually discern their effectiveness?

Further, in my last post on New York City schools I showed that while there were subtle differences in mean teacher percentile rank across schools rated as the worst (priority) versus those rated best (good standing), a) there were still many “best” schools where teacher average test score effect was much lower than in “worst” schools and b) schools that had lower income students and more minority students were still much more likely to be rated as among the “worst” even if their teacher “effects” were similar.

New Jersey Classifications

This first figure shows the demographic composition of schools by their classification. Perhaps the most astounding feature of this graph is that priority schools are nearly 100% black and Hispanic, while reward schools have very low levels of low income, black or Hispanic students.

Here are a few maps to illustrate the geographic distribution of priority, focus and reward schools, for those who know Jersey. We can see that the priority schools are concentrated in the larger, poorest urban centers and focus schools in and around other poor cities/towns.

Not surprisingly, the reward schools for the most part are scattered through the more affluent suburbs of northern Bergen County and out through the most affluent areas of north central NJ (Morris/Somerset/Hunterdon). Okay… I was actually surprised that they had concocted a rating system that was so absurdly biased. The second set of maps shows that there are some reward schools in the northern half of the city of Newark (the area with lower black population share).

Underlying Measures for Classification

It was assumed that states would be proposing ratings based on a mix of status and improvement measures… and that doing so would somehow mitigate the extent of demographic bias in the classifications. States could also use subgroup and achievement gap measures. States wouldn’t, for example, simply be proposing to step in and close down all of the majority low income and minority schools and turn them over to private management, or otherwise displace their entire teaching and administrative staffs.

Of course, the measures available in most states aren’t always that useful for sifting through the demographic biases. New Jersey’s are particularly bad. The following figure shows the racial and low income composition of schools by the types of measures that determined their status. Both the progress ratings and the performance level ratings are hugely biased! As it turns out, so are the achievement gap and subgroup measures. Notably, many affluent New Jersey districts (where the reward schools are) likely have too few low income or minority students to even report gaps.

Remedying Poverty by Deprivation?

In my analysis of New York State, I also showed that priority schools are far more likely to appear in school districts that have been most underfunded by the state of New York relative to its own promised school funding formula (the one the state adopted/proposed as a remedy to court order several years back).

Now, New York state has one of the worst state school finance systems in the nation. One in which districts with more needy students have systematically fewer resources. New Jersey is a far cry from New York in this regard. New Jersey has done better than most states with respect to funding equity and adequacy. 

And compared to demographically similar states, New Jersey has some positive results to show for its overall funding effort and for its targeting to high poverty districts.

But lately, New Jersey has started down a different road in state school finance policy. The state has chosen, in recent years, and proposes in coming years, to significantly underfund its own legislatively adopted state school finance formula.

That in mind, the following slides present an analysis somewhat similar to the one presented for New York State, but looking forward instead of back. I’m not proposing some lofty “what should be” funding levels based on academic analysis here. Rather, I’m simply looking at the extent to which New Jersey currently funds, and proposes to fund, districts under its own formula, SFRA. This is the formula that was adopted by the legislature under the previous administration and subsequently upheld by the state court. More on these issues in a later post.

I’ve not had time to reconstruct my own simulations of SFRA projected out over the next several years, so I’ve used data pulled together by the Education Law Center and SOS NJ, in which SOS NJ projected the SFRA funding shortfalls for each district for the next 5 years. The figure below shows that in the current year, funding shortfalls from the current legislated formula are smaller in districts that are home to priority and focus schools (note that the formula itself significantly reduced targeted effort to these districts when it was implemented).

But, over the next few years, it is expected that, as these schools – priority and focus – are subjected to takeover/overhaul/closure, their districts will be increasingly shorted in their funding relative to what the formula estimates. That is, the overall strategy here appears to be to identify high need schools for takeover/closure, and then to systematically and substantially reduce their financial support over time.

Cumulatively, over the next five years, districts of priority schools stand to lose much more on a per pupil basis (relative to what the formula dictates they should receive) than districts of reward schools.

Put bluntly, the goal is to “reform”(?) priority and focus schools and close achievement gaps by taking all of that harmful money away from them and giving it to others who are far less needy! Yeah… that’ll learn-’em!

This is all strangely consistent with the framing of the Commissioner’s report, that was not a report, on school funding and achievement gaps in New Jersey. In that report, Commissioner Cerf essentially proposed (via a series of bad and worse graphs) that the road toward closing New Jersey’s achievement gap should be paved by reducing funding to high need minority districts and shifting it to lower need, lower minority concentration districts. Strange logic indeed.

And the reductions presented above don’t fully account for the plethora of other alterations proposed to the state school funding formula that might further reduce funding to higher need districts – districts that are home to priority and focus schools.

The following posts critique some of the proposed changes, and address other related issues:

Closing Thoughts

As I noted in my previous post, I can hear the reformy outcry now: this is all warranted because we’ve provided poor and minority kids the worst schools and worst teachers for so many years, and this is merely an attempt to remedy that persistent, intractable disparity. The problem with this logic is the placement of blame (in addition to the questionable legal authority and ill-conceived remedies).

We’re not measuring school performance here. There’s no basis in these classification schemes for implying that the teachers and administration are the ones who failed the children. These are junk, gerrymandered classification schemes. They are based on arbitrary distinctions being made with inadequate data/information.