
Debunking Myths: Characteristics of Stayers & Leavers in New Jersey

For this one, the graphs pretty much tell the story. I’ve had these data sitting around for a while and just never got around to making the graphs. I’ve used data on migration patterns across cities and states from the American Community Survey (ACS) in the past. The ACS is an annual survey which, among other things, includes information on employment status, place of residence, place of work, wage income, household income and a bunch of other useful stuff. The ACS has collected data annually since 2000 and increased its sample sizes between 2005 and 2009, expanding the range of questions that can be addressed with the data. The ACS also asks whether the respondent lived in a different location the previous year, and if so, where. Since you have the current-year location, whether an individual lived elsewhere the previous year, and where they lived, it’s relatively easy to tabulate characteristics of individuals who a) live in New Jersey in the current survey year but lived elsewhere the previous year (Moved In), b) live in another state in the current year but lived in New Jersey the previous year (Moved Out), or c) lived in New Jersey in both the current and previous year (Stayer).

The idea for this post had come about a long time ago, when I kept hearing over and over again how New Jersey’s taxes (which I wrote about here)  are driving out the state’s highest income (and most productive) residents. As usual, this statement was spun in a number of ways referring loosely to wealth, or income, or “rich” versus poor, but always with the implication that those who otherwise would contribute most to state tax revenues by virtue of their income are the ones headed for the exit. These claims were typically based loosely on a highly questionable secondary report of an earlier study, using data from the earlier part of the decade.

Here’s my quick run at the ACS data on individuals between the ages of 25 and 65 – the group that includes the majority of wage earners.

Note that the ACS doesn’t survey absolutely everyone. It’s based on a sample. A pretty big sample for these years, but a sample nonetheless. As a result, to project the findings to the total population, one has to use weightings provided in the data (person weight, in this case).
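To make the classification and weighting concrete, here’s a minimal sketch of the tabulation logic. The record layout and field names below are simplified stand-ins for illustration – they are not the actual ACS variable names – and the numbers are made up.

```python
# Sketch of tabulating movers vs. stayers from ACS-style microdata.
# Field names (state, state_last_year, person_weight) are illustrative
# stand-ins, not real ACS variable names.

records = [
    {"state": "NJ", "state_last_year": "NJ", "person_weight": 120},
    {"state": "NJ", "state_last_year": "PA", "person_weight": 95},
    {"state": "FL", "state_last_year": "NJ", "person_weight": 110},
    {"state": "NY", "state_last_year": "NY", "person_weight": 80},
]

def classify(rec):
    """Assign each respondent to Stayer / Moved In / Moved Out."""
    if rec["state"] == "NJ" and rec["state_last_year"] == "NJ":
        return "Stayer"
    if rec["state"] == "NJ":
        return "Moved In"
    if rec["state_last_year"] == "NJ":
        return "Moved Out"
    return "Not NJ"

# Weighted population estimates: sum person weights, don't count rows.
totals = {}
for rec in records:
    group = classify(rec)
    totals[group] = totals.get(group, 0) + rec["person_weight"]

print(totals)
# {'Stayer': 120, 'Moved In': 95, 'Moved Out': 110, 'Not NJ': 80}
```

The key point is the last loop: because the ACS is a sample, each respondent stands in for some number of people in the population, so group totals are sums of person weights rather than raw counts of survey rows.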

Figure 1. Total numbers of 25 to 65 year olds coming and going

The first figure shows roughly similar numbers of 25 to 65 year olds coming and going. If anything, a few more are coming each year than going.

Figure 2. Income from Wages for those Coming and Going

Figure 2 shows that the income from wages for those coming in is slightly higher than for those leaving.

Figure 3. Household income for those Coming and Going

Household income is also marginally higher for those coming than for those leaving over time.

Figure 4. Education Level of those Coming and Going

This figure shows that, on average, those coming into New Jersey have higher levels of education than those leaving. The blue bars, from associate’s degree upward through every higher level of education, are taller than the red bars. That is, those coming into New Jersey are more likely to have a BA or higher, an MA or higher, or a professional or doctoral degree than those leaving New Jersey.

Figure 5. Household Income by State Moved To

Part of the rhetoric – mostly radio talk blather from NJ 101.5 – is that all of those high income earners headed for the exits are headed straight toward lower tax burden and lower cost of living states like the Carolinas and Florida. Well, as it turns out, the higher income earners who are leaving NJ – those with higher household income than those who stay – happen to be moving to Massachusetts, California or New York – not states that one would typically call tax havens – but then again, most of the rhetoric regarding high and low tax states is misguided anyway. Those headed to Southern states and to Pennsylvania tend to have lower household income than those who stay in NJ, and lower income than the average leaver.

TOTAL “MOVED TO” States

Note – The difficulty here is that the ACS reports income only for the current year, not previous income. So, income levels in this graph are income levels after the move – in the state moved to – and not the income level of the household when in NJ the previous year. That said, it is certainly not the case that a household earning $60k to $80k in those southern states would have been earning over $100k in NJ but for the supposed difference in total taxes. Yes, that lower income may provide comparable housing, etc., but that difference is largely a function of housing prices and assessed values, not effective tax rates.

NPR Story: http://www.npr.org/blogs/money/2011/04/29/135813061/studies-rich-dont-flee-high-tax-states

The Perils of Favoring Consistency over Validity: Are “bad” VAMS more “consistent” than better ones?

This is another stat-geeky researcher post, but I’ll try to tease out the practical implications. This post comes about partly, though not directly, in response to a new Brown Center/Brookings report on evaluating teacher evaluation systems. From that report, by an impressive team of authors, one can tease out two apparent preferences for evaluation systems – or more specifically, for any statistical component of those systems based on student assessment scores:

  1. A preference to isolate as precisely as statistically feasible, the influence of the teacher on student test score gains;
  2. A preference to have a statistical rating of teacher effectiveness that is relatively consistent from year to year (where the more consistent models still aren’t particularly consistent).

While there shouldn’t necessarily be a conflict between identifying the best model of teacher effects and having a model that is reliable over time, I would argue that the pressure to achieve the second objective above may lead researchers – especially those developing models for direct application in school districts – to make inappropriate decisions regarding the first objective. After all, one of the most common critiques leveled at those using value-added models to rate teacher effectiveness is the lack of consistency of the year-to-year ratings.

Further, even the Brown Center/Brookings report took a completely agnostic stance regarding the possibility that better and worse models exist, but played up the relative importance of consistency, or reliability, of the teacher’s persistent effect over time.

There are “better” and “worse” models

The reality is that there are better and worse value-added models (though even better ones remain problematic). Specifically, there are better and worse ways to handle certain problems that emerge from using value-added modeling to determine teacher effectiveness. One of the biggest issues is how well the model corrects for the non-random assignment of students to teachers across classrooms and schools. It is incredibly difficult to untangle teacher effects from peer group effects and/or any other factor operating within schooling at the classroom level (mix of students, lighting, heating, noise, class size). We can only better isolate the teacher effect from these other effects if each teacher is given the opportunity to work across varied settings and with varied students over time.

A fine example of taking an insufficient model (the LA Times/Buddin model) and raising it to a higher level with the same data is the alternative modeling exercise prepared by Derek Briggs & Ben Domingue of the University of Colorado. Among other things, Briggs and Domingue show that including classroom-level peer characteristics, in addition to student-level dummy variables for economic status and race, significantly reduces the extent to which teacher effectiveness ratings remain influenced by the non-random sorting of students across classrooms.

In our first stage we looked for empirical evidence that students and teachers are sorted into classrooms non-randomly on the basis of variables that are not being controlled for in Buddin’s value-added model. To do this, we investigated whether a student’s teacher in the future could have an effect on a student’s test performance in the past—something that is logically impossible and a sign that the model is flawed (has been misspecified). We found strong evidence that this is the case, especially for reading outcomes. If students are non-randomly assigned to teachers in ways that systemically advantage some teachers and disadvantage others (e.g., stronger students tending to be in certain teachers’ classrooms), then these advantages and disadvantages will show up whether one looks at past teachers, present teachers, or future teachers. That is, the model’s outputs result, at least in part, from this bias, in addition to the teacher effectiveness the model is hoping to capture.

Later:

The second stage of the sensitivity analysis was designed to illustrate the magnitude of this bias. To do this, we specified an alternate value-added model that, in addition to the variables Buddin used in his approach, controlled for (1) a longer history of a student’s test performance, (2) peer influence, and (3) school-level factors.

Clearly, it is important to include classroom-level and peer group covariates to attempt to identify more precisely the “teacher effect,” and to remove the bias in teacher estimates that results from the non-random ways in which kids are sorted across schools and classrooms.
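The falsification logic Briggs and Domingue describe can be illustrated with a toy simulation – everything below is fabricated for illustration, and is not their model or their data. If assignment is non-random, a student’s *future* teacher “predicts” the student’s *past* score, which is logically impossible under random assignment:

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical prior-year scores for 1,000 students.
students = [{"past_score": random.gauss(50, 10)} for _ in range(1000)]

# Non-random sorting: higher-scoring students steered to teacher "A".
for s in students:
    s["future_teacher"] = "A" if s["past_score"] > 50 else "B"

# Falsification check: does the future teacher assignment "explain"
# past scores? Under random assignment this gap should be near zero.
by_teacher = {"A": [], "B": []}
for s in students:
    by_teacher[s["future_teacher"]].append(s["past_score"])

gap = mean(by_teacher["A"]) - mean(by_teacher["B"])
print(round(gap, 1))  # a large gap flags non-random sorting
```

A large gap here doesn’t mean teacher “A” caused anything – the scores predate the assignment. It means the sorting variable is missing from the model, and any “effect” attributed to these teachers will absorb that sorting.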

Two levels of the non-random assignment problem

To clarify, there may be at least two levels to the non-random assignment problem, and both may be persistent problems over time for any given teacher or group of teachers under a single evaluation system. In other words: Persistent non-random assignment!

As I mentioned above, we can only untangle the classroom level effects, which include different mixes of students, class sizes and classroom settings, or even time of day a specific course is taught, if each teacher to be evaluated has the opportunity to teach different mixes of kids, in different classroom settings and at different times of day and so on. Otherwise, some teachers are subjected to persistently different teaching conditions.

Focusing specifically on the importance of students and peer effect, it is more likely than not, that rather than having totally different groups and types of kids year after year, some teachers:

  • persistently work with children coming from the most disadvantaged family/household background environments;
  • persistently take on the role of trying to serve the most disruptive children.

At the very least, statistical modeling efforts must attempt to correct for the first of these peer effects with comprehensive classroom-level measures of peer composition (and a longer trail of lagged test scores for each student). Briggs and Domingue showed that doing so made significant improvements to the LAT model, and that the original model contained substantial biases, failing specific falsification tests used to identify those biases. Specifically, the effectiveness of a student’s subsequent teacher could be used to predict the effectiveness of their previous teacher. Briggs and Domingue note:

These results provide strong evidence that students are being sorted into grade 4 and grade 5 classrooms on the basis of variables that have not been included in the LAVAM (p. 11)

That is, a persistent pattern of non-random sorting which affects teachers’ effectiveness ratings. And, a persistent pattern of bias in those ratings that was significantly reduced by Briggs’ improved models.

At this point, you’re probably wondering why I keep harping on this term “persistent.”

Persistent Teacher Effect vs Persistent Model Bias?

So, back to the original point, and the conflict between those two objectives, reframed:

  1. Getting a model consistent enough to shut up those VAM naysayers;
  2. Estimating a statistically more valid VAM, by including appropriate levels of complexity (and accepting the reduced numbers of teachers who can be evaluated as data demands are increased).

Put this way, it’s a battle between REFORMY and RESEARCHY. Obviously, I favor the RESEARCHY perspective, mainly because it favors a BETTER MODEL! And a BETTER MODEL IS A FAIRER MODEL!  But sadly, I think that REFORMY will too often win this epic battle.

Now, about that word “persistent.” Ever since the Gates/Kane teaching effectiveness report, there has been new interest in identifying the “persistent effect of teachers” on student test score gains. That is, an obsession with focusing public attention on that tiny sapling of explained variation in test scores that persists from year to year, while making great effort to divert public attention away from the forest of variance explained by other factors. “Persistent” is also the term du jour for the Brown/Brookings report.

A huge leap in those reports referring to “persistent effect” is to expand that phrase from the persistent classroom level variance explained to: “persistent year to year contribution of teachers to student achievement.” (p. 16, Brown/Brookings) It is assumed that any “persistent effect” estimated from any value added model – regardless of the features of that model – represents a persistent “teacher effect.”

But the persistent effect likely contains two components – persistent teacher effect & persistent bias – and the balance of weight of those components depends largely on how well the model deals with non-random assignment. The “persistent teacher effect” may easily be dwarfed by the “persistent non-random assignment bias” in an insufficiently specified model (or one dependent on crappy data).

AND, the persistently crappy model – by failing to reduce the persistent bias – is actually quite likely to be much more stable over time.  In other words, if the model fails miserably at correcting for non-random assignment, a teacher who gets stuck with the most difficult kids year after year is much more likely to get a consistently bad rating. More effectively correct for non-random sorting, and the teacher’s rating likely jumps around at least a bit more from year to year.
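To see why a model that fails to remove persistent sorting bias can look *more* consistent year to year, consider a toy simulation (fabricated numbers, not any real VAM). Each teacher has a true effect plus, in the biased model, a sorting bias that persists across years because the same teachers keep getting the same kinds of classrooms:

```python
import random

random.seed(1)

def corr(x, y):
    """Pearson correlation, no external libraries needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

n = 500
true_effect = [random.gauss(0, 1) for _ in range(n)]
# Persistent bias: the same teachers get the same sorted classrooms
# every year, so this term does NOT get redrawn year to year.
sorting_bias = [random.gauss(0, 2) for _ in range(n)]

def ratings(include_bias):
    """One year of estimated ratings: effect + fresh noise (+ bias)."""
    out = []
    for t in range(n):
        r = true_effect[t] + random.gauss(0, 1)
        if include_bias:
            r += sorting_bias[t]
        out.append(r)
    return out

biased_y1, biased_y2 = ratings(True), ratings(True)
clean_y1, clean_y2 = ratings(False), ratings(False)

# The biased model is MORE stable year to year, because the persistent
# bias is "reliably" re-measured; the cleaner model bounces more.
print(round(corr(biased_y1, biased_y2), 2))
print(round(corr(clean_y1, clean_y2), 2))
```

The biased model wins the consistency contest precisely because the bias is persistent: it gets faithfully re-measured every year. Stability alone tells you nothing about whether you are measuring the teacher or the sorting.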

And we all know that in the current conversations, model consistency trumps model validity. That must change! Above and beyond all of the MAJOR TECHNICAL AND PRACTICAL CONCERNS I’ve raised repeatedly in this blog, there exists little or no incentive, and little or no pressure from researchers (who should know better), for state policy makers or local public school districts to actually try to produce more valid measures of effectiveness. In fact, too many incentives and pressures exist to use bad measures rather than better ones.

NOTE:

The Brookings method for assessing the validity of comprehensive evaluations works best (arguably, only works) with a more stable VAM. This means that their system provides an incentive for using a more stable model at the expense of accuracy. As a result, they’ve built into their system – which is supposed to measure the accuracy of evaluations – an incentive for less accurate VAM models. It’s a vicious circle.


Research Warning Label: Analysis contains inadequate measurement of student poverty

I’ll likely regret writing this post at some point. But this is a really, really important issue and one that undermines a very large number of prominent research studies on the effectiveness of various school reforms, especially when evaluated in high poverty contexts.

I blogged about this a few weeks back – the problems of poverty measurement in educational research. But this issue continues to come up in e-mails and other conversations. And it’s a critically important issue that so many researchers callously overlook. My sensitivity to this issue is heightened by the potential problems emergent from using bad poverty measurement in models to be used for rating and comparing teacher effectiveness.

Here, I pose a challenge to my research colleagues out there.

3 Reporting Rules for Studies/Models Using Crude Poverty Measures

Rule 1: Descriptive/Distribution Reporting of Poverty Measure

If using a single dummy variable to identify kids as qualifying for free or reduced price lunch, include sufficient descriptive statistics to show just how much or how little variance you are actually picking up with this measure. For example, if using this single “low income” indicator, report how many students qualify, and how many students within each nested group (classroom, school).
If, for example, you’ve got 70% or more of your sample identified with this single “low income” dummy variable, then you are assuming that 70% to be statistically equally poor. If 60% of the classrooms in your sample have 80% or more students who qualify, you are essentially classifying all of those classrooms as statistically similar. HAVE THE INTEGRITY TO POINT THAT OUT.
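This descriptive check takes only a few lines. The classroom-level shares below are hypothetical, just to show the shape of the report:

```python
# Hypothetical classroom-level shares of free/reduced-lunch students.
frl_shares = [0.95, 0.88, 0.91, 0.82, 0.60, 0.99, 0.85, 0.45, 0.93, 0.87]

# Rule 1 report: how many classrooms does the single dummy variable
# treat as statistically identical in poverty?
saturated = [s for s in frl_shares if s >= 0.80]
share = len(saturated) / len(frl_shares)

print(f"{share:.0%} of classrooms are 80%+ FRL and are treated as "
      f"statistically equivalent by a single low-income dummy")
```

If that printed share is high, the reader should know that most of your classrooms are being treated as interchangeable on economic status, whatever else your model does.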

Remember, here’s the variance in % free or reduced lunch across Cleveland schools. Not very useful, is it?

Is Cleveland just a huge outlier?

Well, in Texas, in 2007:

93% of Dallas elementary schools had over 80% free + reduced lunch

84% of Houston elementary schools had over 80% free + reduced lunch

100% of San Antonio elementary schools had over 80% free + reduced lunch

As such, any analysis which uses only this measure to capture variations in economic status of students across schools within these districts should be interpreted with caution.

Rule 2: Reporting of Relationships between Variance in Poverty and Outcome Measures

If using a single dummy variable to identify kids as qualifying for free or reduced lunch, report the relationship between that variable and student outcome measures. We know from various studies that gradients of poverty and household resources do have strong relationships with student outcome measures. If, at the classroom or school level, the percent of children who qualify for free or reduced lunch has only a modest to weak relationship with classroom or school level outcomes, chances are your poverty measure is junk (That is, there is a greater likelihood that this finding represents a flaw in the poverty measure – lack of variance – than in the likelihood that you are evaluating a system where the poverty-outcome relationship has been completely disrupted. Further, to be confident of the latter, we have to fix the former).

In high poverty settings, your measure may be junk because the share of kids who qualify for free or reduced lunch only varies from about 70% or 80% up to 100%. That is, across nearly all classrooms, nearly all students are from families that fall below the 185% income threshold for poverty. Much of the remaining variation between 80% and 100% is just reporting noise or error.

Any legitimate measure of child poverty or family income status, when aggregated to the classroom or school level will likely be significantly, systematically related to differences in student outcomes. Report it! If it’s not, the measure is likely insufficient. HAVE THE INTEGRITY TO POINT THAT OUT.

EXAMPLE

The following two graphs show us how important it can be to explore using alternative poverty thresholds, such as looking at numbers of children falling below the 130% income threshold versus the 185% threshold in a high poverty setting. The goal is to find the measure that a) works better for picking up variation across school settings or classrooms and b) as a result, picks up poverty variation that may explain differences in student outcomes.

Figure 1 shows the relationship between school level % free OR reduced lunch and 8th grade math proficiency in Newark in 2009

While there appears to be a relationship, most schools fall above 80% free or reduced lunch and the relationship between this poverty measure and student outcomes seems surprisingly weak. On the one hand, we could draw the conclusion that this means that all NPS schools are just so high in poverty that it really doesn’t matter (a ridiculous assertion, to say the least). That all of the kids are poor, and these high poverty levels affect their outcomes similarly, and those remaining variations are all about good and bad teaching, and charter versus traditional public schools.

Figure 2 shows the relationship between school level % free lunch only and 8th grade math proficiency in Newark in 2009

When we use a more sensitive measure, we nearly double the amount of variation we explain in student outcomes, and we severely undermine those conclusions above. From 40% to 80% free lunch there exists a pretty darn strong relationship with student outcomes. Above 80%, the relationship erodes somewhat, but this too might be clarified by using an even stricter poverty threshold or a continuous measure of family income. CLEARLY, IT WOULD BE INSUFFICIENT TO USE THE FIRST MEASURE OF POVERTY – FREE + REDUCED – AS A CONTROL VARIABLE IN AN ANALYSIS OF NEWARK SCHOOLS, OR FOR THAT MATTER AN EVALUATION OF NEWARK TEACHERS.

Rule 3: Reporting of Numbers/Shares of Cases Potentially Affected by Omitted Variables Bias (extent to which crude poverty measure compromises validity of model results)

Let’s say you or I have taken each of these first steps, but we decide to go ahead and conduct our analysis of charter school effectiveness, or ratings of individual teacher value-added, anyway, using the single student-level dummy variable for “poorness” (based on free or reduced price lunch). After all, we’ve got to publish something, right? Now it is incumbent upon you (or me), the researcher, to appropriately represent the extent to which these data shortcomings may bias the analyses.

For example, in an analysis of teacher effects, it would be relevant to report the number and share of teachers with classrooms having 80% or more children who qualify. Why? Because you’ve chosen statistically to assume that every one of those classrooms full of children is statistically the same in terms of economic disadvantage – EVEN WHEN THEY ARE NOT! Those teachers with the lowest income children may be significantly disadvantaged by this “omitted variables” bias in the model.

Why not just report the overall correlation between effectiveness ratings and classroom-level % free or reduced lunch? Yeah… you’re banking on getting that low correlation between teacher effectiveness ratings and % low-income, so you can say your ratings aren’t biased by poverty. Not so fast. You’re likely wrong in making that assertion, given the data. Instead, what you’re showing is that your really crappy poverty measure simply failed to pick up real differences in economic status across classrooms, and thus failed to correct for differences in true economic status of students when determining teacher ratings. And then, your crappy poverty measure remained uncorrelated with the biased estimates it helped produce. Really helpful, eh?

Fess up to reality, and report the numbers of teachers across which your model does not effectively control for economic status differences among students – all teachers with classrooms that are, say, 80% or more free or reduced price lunch. HAVE THE INTEGRITY TO POINT THAT OUT.

Here are the factors in the NYC Value-added model. How many teachers have classrooms that are treated as statistically equivalent when they are not? Any teacher effectiveness model applied in a high poverty setting – like a large urban district – that relies solely on the single “low-income” dummy variable – is likely entirely invalid for making comparisons across very large shares of teachers included in the model.

EXAMPLE

So, could we really draw wrongheaded conclusions by using insensitive poverty measurement, and by not checking and fully reporting on distributions? Here’s one example of how we might make stupid assertions, using data from 2007 on schools in the Cleveland metro area and in the City of Cleveland.

Figure 3 shows the relationship across all elementary schools in the metro and in Cleveland city between % free or reduced lunch and percent passing state assessments

Now, let’s assume that we are trying to figure out whether, for some reason, Cleveland has been unusually successful at disrupting the relationship between % free or reduced lunch and student outcomes, and we wish to compare the relationship within Cleveland to the relationship across all schools surrounding Cleveland. If we didn’t do the visual above, we might miss something huge (actually, given the Cleveland quirk – 100% of schools at 100% free or reduced – we likely wouldn’t miss this, but in other less extreme cases we might). Here the pattern shows a very strong relationship between % free or reduced lunch and student outcomes across all schools, and absolutely no relationship between free or reduced lunch and outcomes within Cleveland – A freakin’ miracle! BUT IT’S ENTIRELY BECAUSE THERE’S NO FREAKIN’ VARIATION IN THE POVERTY MEASURE WITHIN CLEVELAND!

We can easily use this same pattern to our advantage to show that the state of Ohio has made progress on the distribution of funding by poverty across schools, but that Cleveland and other cities have not followed through, and are the real problem. That is, that funding per pupil is more tightly related to poverty between districts than across schools within districts. States have fixed the between district problem, but cities have not fixed the within district problem. This is a common Center for American Progress and Ed Trust claim (which is completely unfounded).

Figure 4 shows the estimation of the within and between district funding-poverty relationships for the Cleveland area, in a (completely bogus) way that supports the CAP and Ed Trust claim.

Yes, Cleveland provides an absurd extreme. But this same problem occurs when comparing any city where variation in the poverty measure across schools ranges from 80% to 100%, while variation in the poverty measure across districts ranges from 0% to 100% (see the Newark example above).

No more excuses

The problem for researchers and evaluators is that states maintain multiple data systems that don’t always include the same gradients of data precision. We can find in STATE SCHOOL REPORTS – SCHOOL AGGREGATE DATA systems, information on the numbers and shares of school enrollment that are free lunch, reduced lunch, and sometimes other indicators such as homelessness. But, these data are not included in the STUDENT LEVEL DATA SYSTEM LINKED TO ASSESSMENT OUTCOMES. Instead, those data systems which must be used for value-added modeling or for measuring effectiveness of specific reforms, such as enrollment in charters, include only a handful of simple indicator variables about each student.

Therein lies the typical research excuse – one that I use as well. “It’s what we have! You can’t expect us to use something better if we don’t have it!” No, I can’t. No, we can’t expect you (or me) to use something better if we don’t have it. BUT WE CAN EXPECT AN HONEST REPRESENTATION OF THE SHORTCOMINGS OF THESE DATA. And those shortcomings are HUGE, and the stakes are HIGH, especially when we are using these data to compare teacher effectiveness and determine who should be fired, or when we are asserting that charter schools are more effective with low income students (when they aren’t actually serving the lower income students).

Readers: Please send along examples of recent prominent studies where the reported statistical model uses only a single indicator for free or reduced lunch to control for either or both a) differences across individual students and b) differences in peer groups, or classroom level effects.

WHAT ABOUT LOS ANGELES, WHERE THE LA TIMES MODEL USED ONLY A SINGLE DUMMY VARIABLE ON FREE+REDUCED LUNCH (actually, the technical report refers ambiguously to students qualified for Title I, with no definition of the variable at all! http://www.latimes.com/media/acrobat/2010-08/55538493.pdf)?

Well, the vast majority of Los Angeles elementary schools have over 80% of children qualifying for free + reduced lunch, suggesting that this measure simply won’t capture relevant variation across settings. The majority of LA schools (and classrooms within them) will be treated as statistically equivalent in terms of poverty in a model that identifies poverty only by free + reduced lunch. (Data are from the NCES Common Core for 2008-09.)

Simply adjusting the poverty threshold downward to the free lunch cut off, spreads the distribution – capturing considerably more variation across schools:

Still, the majority of LAUSD elementary schools are over 80% free lunch, indicating that even this measure is likely not sufficiently sensitive to underlying differences in poverty/economic status. Again, it is simply an irresponsible assertion to claim that these schools – which have over 80% of children falling below the 130% or 185% income level for poverty – are pretty much the same. A statistical model that claims to correct for economic status, but uses only this measure to do so, depends on that irresponsible assertion! At the very least, this is an assertion that requires considerably more investigation.
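The threshold effect above can be sketched with simulated data. Nothing here is real LAUSD data: income-to-poverty ratios are drawn from made-up distributions simply to show why lowering the cutoff from 185% to 130% spreads the measured shares back out in a high-poverty district.

```python
import random

random.seed(2)

def share_below(incomes, cutoff):
    """Share of students whose income-to-poverty ratio is below cutoff."""
    return sum(1 for i in incomes if i < cutoff) / len(incomes)

# 30 hypothetical schools; schools differ in typical family income
# (income expressed as a percent of the poverty line).
schools = []
for _ in range(30):
    mu = random.uniform(80, 160)
    schools.append([random.gauss(mu, 40) for _ in range(400)])

frl_shares  = [share_below(s, 185) for s in schools]  # free OR reduced proxy
free_shares = [share_below(s, 130) for s in schools]  # free-only proxy

def spread(xs):
    return max(xs) - min(xs)

# The 185% measure saturates near 100% everywhere; the 130% measure
# still distinguishes poorer schools from less-poor ones.
print("range of free+reduced shares:", round(spread(frl_shares), 2))
print("range of free-only shares:   ", round(spread(free_shares), 2))
```

The same underlying incomes produce a compressed distribution under the loose threshold and a usable one under the strict threshold – which is the whole argument for checking the distribution before trusting the control variable.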

Blank Slate: Private School Leaders Step Up!

I’ve noted on several occasions on Twitter (@schlfinance101) and on my blog that I am actually a supporter of high quality private independent schools. In the 1990s, I was a middle school science teacher at The Fieldston School in Riverdale, NY.  That experience sticks with me to this day as I write about public education policy issues. In fact, Fieldston helped provide the financial support for the pursuit of my doctoral studies at Columbia University, and for that I thank them. Yes, they helped pay for the meaningless advanced degree that eventually led me to leave (so perhaps it was worth it for them?).  Being at a school that supported my own academic/intellectual endeavors was important for me, and I expect I’m not alone in that regard. In high school, I summered at Phillips Exeter Academy after my sophomore year. I attended an expensive, competitive small liberal arts college (Lafayette, in PA – more on that at a later point in time).  I’ve spent much of my time around private education, in particular, the more elite tiers of private education. I have no shame in those affiliations (generally speaking), and on some occasions, I’m actually proud of it.

I have a genuine appreciation for what these institutions can offer. I am by no stretch of the imagination a private-school basher (as some would characterize anyone who dares point out that good private schools often spend much more per child than nearby public schools). Anything but. I am a realist. I am an analyst. I have written extensively about private school spending and characteristics in this report: http://nepc.colorado.edu/publication/private-schooling-US

That said, this blog post is intended to START a conversation. This blog post is an invitation and is specifically an invitation to headmasters, deans and other administrators and board members at leading private independent schools around the country.  You can e-mail me officially, by name and school affiliation, or you can, if you choose to, remain anonymous, as long as you are willing to allow me to at least list your “title” and a brief descriptor of the school you represent (for example: Head of Upper School, Highly Selective Independent Day School in Northeastern City).  There are two issues I invite you to address:

  1. What is your perspective on the importance of class size, either from the perspective of “effectiveness” (on student outcomes) or marketing? Do you feel that class size is important? Why? What drives your decisions about class size in your school? Feel free to stray outside these narrow questions.
  2. What are your thoughts on the recruitment, selection, retention, evaluation and compensation of teachers? (yeah… that’s a lot, but feel free to focus on one or two). What is your ideal approach to teacher evaluation?  What is the current approach in your school, and what are the strengths/weaknesses? Have you changed that approach over time? Who are the key players in the evaluation process and what are their roles? How are evaluations used (dismissal?).   How is compensation structured? Is it performance based and if so, by what types of measures?  Feel free to elaborate on other related issues not listed here.

You may use the comment section below, or you may e-mail me at educpolicy@gmail.com. If you post in the comments below, you must provide me with a valid e-mail address so I can verify that you are, in fact, who you claim to be. Comments are held for approval. If you wish to remain anonymous, send an e-mail to the above address and provide me with the relevant Title and School Descriptor for how you wish to be identified (that is, not identified). Identify specifically which information in your e-mail you wish me to post (and, more importantly, anything you want to say but don’t want posted).

Thanks!

Bruce D. Baker

============================

Ron Reynolds of The California Association of Private School Organizations Responds, with a focus on CLASS SIZE:

============================

Dr. Baker,

Sorry this took me so long, but work beckons…

In the interest of full disclosure, I am neither a headmaster, dean, administrator, nor board member of “a leading private independent school,” nor do my views necessarily reflect or represent those of any such persons. I am the executive director of the California Association of Private School Organizations, a statewide association of private school administrative units and service agencies affiliated with the Council for American Private Education. The views that follow are my own.

Setting aside the question of what criteria designate a “leading” private independent school, independent schools, whether “leading” or otherwise, comprise a relatively small, if remarkable, segment of the nation’s private school universe. Such schools (which I regard as schools affiliated with the National Association of Independent Schools) account for roughly 5 percent of all private schools in the United States, and 11 percent of all private school students enrolled in any of grades K-12.

Journalists frequently write of independent schools as if they were representative of the entire private school universe. While misleading, the tendency is, to some extent, understandable, given that the National Association of Independent Schools collects and maintains an impressive array of data of a kind that is largely inaccessible, or simply nonexistent, for the broader U.S. private school universe.

You, Professor Baker, are no stranger to this problem.  As Willie Sutton did with banks, so did you resort to using private school tax returns as your principal source of information for the paper referenced in your invitation (“Private Schooling in the U.S.: Expenditures, Supply, and Policy Implications”) because that’s where the data are.  While the use of figures contained in IRS Form 990 reflects an admirable degree of ingenuity, such creativity comes at the cost of generalizability – a lacuna which you, admirably, observed.  One piece of information I believe you failed to mention, however, is that private schools operating on a for-profit basis appear to be completely excluded from your analysis.  Such schools do not comprise an insignificant sub-group.  In the state of California, for example, for every independent school there are more than five private schools operating on a for-profit basis (though independent schools, in the aggregate, enroll a greater number of students).

You, of course, are in no way responsible for the absence of such data, and I, to the extent that I am a representative of the broader private school community, owe you and others a mea culpa. Regrettably, the lack of such data also complicates determinations of class size. In order to address the issue of class size from a broadly inclusive private school perspective, it is necessary to use student-to-teacher ratios as a proxy measure. I am cognizant that such a proxy presents certain problems, just as you recognized that the use of IRS Form 990 data was less than ideal.

With the preceding caveat in mind, NCES data place the student-teacher ratio for all U.S. public schools in 2007-08 at 15.7. Some readers will reflexively respond to this datum by saying, or thinking, that such a figure is misleading. After all, not all teachers included in the computation of the ratio are assigned to (regular) classrooms. A great many, for example, work with children presenting various types of special needs.

While such a qualification undoubtedly possesses merit, it must also be noted that the reduction in class size attributable to the inclusion of special education teachers comes at a considerable cost.  The federal government, for example, currently allocates $11.3 billion (the vast majority of which flows to public schools) through the Individuals with Disabilities Education Act, to support the provision of special education and related services.  I’m not sure whether you, Professor Baker, included such funding in your computation of public school expenditures, but $11.3 billion would provide roughly enough tuition to nearly double the current national Catholic school enrollment.

All of the above is offered by way of cautionary preface to the qualification that any comparative discussion of class size is subject to various contextual and methodological considerations that can prove problematic.  That having been said…

While smaller class sizes, relative to public schools, have long been a hallmark of American private education, significant variability can be inferred to exist within the private school universe. For example, the (FTE) student-teacher ratio for all California private schools in 2009-10 was 12.5. Among independent schools, the figure was 9.4. In for-profit private schools the ratio was 7.8, while among the state’s Catholic schools it was 18.7 – eclipsing the national public school ratio cited above, but falling short of California’s public school student-teacher ratio of 20.8. (These ratios have been computed using California Department of Education data for 2009-10.)

Magnitude of enrollment appears to be positively correlated with student-teacher ratios. (Yeah, I know. D’uh! But independent schools may present an exception to this observation which, if true, invites comment.) Among all California private schools with a total enrollment in excess of 100 students, the student-teacher ratio was 13.9, while the ratio was 14.5 for schools with enrollments of more than 250, and 14.8 for schools with enrollments exceeding 500.

Religious orientation would also appear to be a significant factor. Fully fifty percent of California’s total private school enrollment is located in religious schools with total enrollments exceeding 250. In these schools, the student-teacher ratio is 16.1 – a figure that is higher than the NCES national public school figure cited above. Obviously, the generally lower levels of tuition charged by schools whose religious mission includes making their educational program available to every family seeking access tend to reduce financial barriers to enrollment and contribute to presumptively larger class sizes. This points to the expectation that an inverse relationship exists between tuition and class size. (Alas! If only comprehensive tuition data were available.)

Independent schools, in which tuition is generally higher than that associated with most religious schools (though it must be noted that some religious schools are also classified as independent schools), serve to underscore the preceding observation. Among California independent schools with enrollments in excess of 500 students, the overall student-teacher ratio is 9.9, fully a third lower than the figure for the remainder of all California private schools with similarly robust enrollments.

At this point, I envision you, Professor Baker, muttering: Now you know why I focused on independent schools in the first place!  So, allow me to tell you, at long last, what I think it is that is being offered/purchased for the money.

The premium paid by private school parents is part of a complex value proposition in which inducements to participation must outweigh associated sacrifices.  Several components of this value proposition involve class size.  As I see it, these components include the provision of an augmented curricular program, and increased access to instructional staff by both students and parents.

You, Professor Baker, taught at The Fieldston School.  A check of that school’s website reveals that Fieldston offers its lower school students a wood shop program, provides dance and visual arts classes to its middle school students, and affords its high school students the opportunity to study Greek, Latin, and/or Mandarin Chinese, in addition to French and Spanish.  Along with its additional courses, the school offers a specialized, pervasive approach to instruction that is expressed as follows:  “At every grade we teach common beliefs such as understanding multiple perspectives, seeing the world beyond the self, creativity and imagination, developing habits of justice, fairness, and empathy, respect for all people and points of view, and a critical approach to decision-making.”

I don’t think it’s much of a stretch to assume that most independent schools offer a more robust variety of classes across the curriculum, and particularly in the arts and humanities, than is generally the case in both public schools and other private schools.  The provision of an augmented curriculum is driven by a combination of factors that include various visions of what is entailed by a robust humanistic education that endeavors to shape the whole person, market demand, and resources.

Your research found that Jewish day schools tended to spend more, per pupil, than other categories of private schools.  These schools generally offer not only a full complement of secular studies courses, but instruction in Hebrew language, bible, rabbinic literature, Jewish history, customs and holidays, and Israel studies.

To some extent, then, I would argue that smaller class size in private schools is a by-product of a more robust prescribed curriculum, coupled with a great number of elective offerings.

I also believe that in exchange for the tuition premium paid by parents there exists an expectation of greater access – both by students and parents – to the instructional staff.  While many in the private school community often view teachers in religiously-oriented schools as members of a family – a voluntary community that coalesces around a shared faith and common core of values, the same is often true of independent school faculty members who identify deeply with the culture and vision associated with their particular school.

In both independent and religious private schools, the expectation of enhanced teacher availability, responsiveness, and commitment is often deeply ingrained in the culture of the institution. Private school parents often possess teachers’ phone numbers and e-mail addresses, and frequently contact them after school hours. Teachers are expected to provide more robust and time-consuming forms of student evaluation, ranging from extended homework assignment feedback, to participation in more frequent student and parent conferences, to in-depth written assessments of student portfolios and/or journals, participation in child study meetings, and extensive written documentation of student growth and academic progress. Smaller class size is thus, to some extent, a by-product of enhanced labor-intensive expectations held by parents, those involved in school governance, and teachers themselves.

Best,

Ron

Dr. Ron Reynolds

Executive Director

California Association of Private School Organizations

15500 Erwin St., #303

Van Nuys, CA 91411-1017

==================

My personal response on a few points above:

==================

Yes, a major issue in making comparisons between public and private schools is that private schools – because they are less regulated – are simply more varied. This point is, as Ron Reynolds notes, missed by most. As my report discusses, some private schools significantly outspend public schools and some spend much less. Some have much smaller classes, and some much larger. Some pay their teachers much less, and some comparably (few private schools pay their teachers much more – the additional money is more often leveraged for a broader/deeper curriculum).

Also, it is often the case that the biggest differences in private school class size are not so much a function of smaller elementary grade classes (they are smaller, but not usually half the size), but rather a function of private schools offering a diverse array of elective courses at the secondary level.

Now, from a personal perspective, I agree that the expectation of parental involvement/interaction is greater in the private school setting, especially in a private independent school like Fieldston. However, I would argue that there are some significant counterbalancing factors. For example, at Fieldston, my teaching time consisted of sixteen 45-minute periods per week – 4 sections, each meeting 4 times weekly. Each section typically had fewer than 20 students. On top of that I had 10 to 12 advisees – from among my total student load of fewer than 80. Maintaining active contact with the parents of 12 students, and being responsive to the needier parents from among the 80, is much less of a task than most public (or Catholic school) teachers would face if expectations were similar. It was far fewer students than I would have had if I were teaching middle school science in a public school with 6 classes, meeting every day of the week, and 25 kids per class. And there was a lot more time in my day to make contacts. These may be important structural issues to explore. But they all come back to pupil-to-teacher ratio.
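To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The schedules and class sizes are the approximate figures described above, not exact records:

```python
# Rough comparison of total student load and weekly in-class minutes,
# using the approximate figures described in the paragraph above.
# All numbers are illustrative, not exact schedules.

def teaching_load(sections, students_per_section, meetings_per_week, minutes_per_period):
    """Return total student load and weekly in-class minutes."""
    return {
        "total_students": sections * students_per_section,
        "weekly_minutes": sections * meetings_per_week * minutes_per_period,
    }

# Private independent school load described above:
# 4 sections of ~20 students, each meeting 4 times per week, 45-minute periods
independent = teaching_load(4, 20, 4, 45)

# A typical public middle school science load: 6 classes of 25, meeting daily
public = teaching_load(6, 25, 5, 45)

print(independent)  # {'total_students': 80, 'weekly_minutes': 720}
print(public)       # {'total_students': 150, 'weekly_minutes': 1350}
```

Nearly double the students and nearly double the in-class minutes, before any parent contact even begins.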

Graph of pupil to teacher ratios over time: https://schoolfinance101.com/wp-content/uploads/2010/10/slide23.jpg

Bruce

==================

Dr. Baker,
“What is your perspective on the importance of class size, either from the perspective of “effectiveness” (on student outcomes) or marketing? Do you feel that class size is important? Why? What drives your decisions about class size in your school? Feel free to stray outside these narrow questions.”
STAR Prep Academy is a small school by design, and we cap all classes at ten students. We do this for several reasons. First, we believe quite strongly in differentiation: with a smaller class we can use information about student interests and abilities to differentiate instruction within the class, whereas with a larger group, teachers have difficulty working on diverse projects that match student needs. Furthermore, smaller class sizes reduce paperwork, total student load, time spent passing out papers, etc. Many schools use small class size as a marketing tool, but if they do not actually utilize those small class sizes, it is just a number.
“What are your thoughts on the recruitment, selection, retention, evaluation and compensation of teachers? (yeah… that’s a lot, but feel free to focus on one or two). What is your ideal approach to teacher evaluation?  What is the current approach in your school, and what are the strengths/weaknesses? Have you changed that approach over time? Who are the key players in the evaluation process and what are their roles? How are evaluations used (dismissal?).   How is compensation structured? Is it performance based and if so, by what types of measures?  Feel free to elaborate on other related issues not listed here.”
Teacher evaluation should be done in a developmental manner, allowing veteran teachers an opportunity to share their knowledge base and guide their own development, while new teachers receive more formative guidelines. While we do not currently use evaluations for compensation, we do consider this a key component for the future. We would also add adjunct duties, participation in outside events, and other criteria to the compensation model. Annual raises, outside of COLA, do not seem to be appropriate within our environment.
Regards,
Zahir Robb
Head of School
STAR Prep Academy
10101 Jefferson Blvd.
Culver City, CA 90232
(310) 842-8808


ConnCan Cluelessness

Or is it just a school finance Conn-job, in a CAN?

In their response to my Think Tank Review of Spend Smart: Fix Our Broken School Funding System,  ConnCan asserts that I claim that Connecticut’s school finance formula is not broken.  (see: http://ht.ly/4BknI)

As I state in my report, it’s not that the formula isn’t problematic, but that ConnCan fails to make any reasonable case that it is – even though it is. Their analysis is simply too shoddy, weak, and incompetent to establish that the system is broken, or how it is broken. I explain:

There may in fact be legitimate concerns over the equity and adequacy of funding to Connecticut schools as a result of significant problems with the Education Cost Sharing Formula. However, the ConnCAN Spend Smart report provides little or no supporting evidence for their claim that the system is broken or how their proposals would be an effective solution if it indeed is in need of repair.

I actually show some of the problems in my brief, and have shown these problems in the past. My point in the critique is that ConnCan’s shoddy brief does little to help one understand the problems with the CT school finance system, and in fact provides multiple distractions and significant misinformation.

ConnCan also asserts that I claim that their proposal would harm low income children. Rather, I assert that ConnCan recommends only a relatively low weight for children qualifying for free or reduced price lunch and that they ignore entirely districts with high concentrations of LEP/ELL children.

ConnCan argues that I unfairly suggest that they oppose weighting for LEP/ELL children. While they do hold open the possibility that those children might receive supplemental funding in the future, they also suggest that they have done analysis already, or know of analysis, that indicates that it probably isn’t necessary. This suggestion is not backed by anything, and is completely irresponsible.

Here’s their footnote on this point:

“The formula could also hypothetically provide weights for other student needs, such as English Language Learner status. However, data shared by Connecticut State Department of Education with the State’s Ad Hoc Committee to Study Education Cost Sharing and School Choice show that the measure for free/reduced price lunch also captures most English language learners. In other words, there is a very strong correlation between English language learner concentration and poverty concentration in Connecticut. In addition, keeping the formula simple allows a more generous weight for students in poverty” (p. 7, FN 12).

And here’s my response to their footnote:

This finding is cited only ambiguously in a footnote to data shared by CTDOE. In some states, a strong relationship between the two measures might warrant collapsing supplemental aid for LEP and low-income children into one student-need factor—with sufficient additional support to meet the combination and concentration of needs. However, a quick check of the data in Connecticut shown in Figure 1 (below) reveals that several districts have disproportionately high LEP concentrations relative to their low-income concentrations—specifically Norwalk, Danbury, New London, Windham, Stamford and New Britain. (figure in review)

And:

The overall correlations between ELL concentrations and subsidized lunch rates are not sufficiently strong (only a 0.50 correlation in 2008-2009) to select a single factor for addressing both needs. Nor does the report offer any actual analysis in drawing this conclusion (see Table A1, Appendix). Table A1 in the Appendix to this review provides a quick check of the correlations between wealth measures, income measures and student populations for 2005 and 2009.

That nitpicking aside, my big concern with the ConnCan report in this regard is that they provide absolutely no support for any of their recommendations, and in some cases state as fact conclusions that turn out to be FLAT OUT WRONG.

I explain:

Further, some of the statements and recommendations made in the report, such as those pertaining to LEP/ELL children, are simply wrong. And these factual mistakes have significant consequences for the validity of the report’s recommendations. By combining the ELL mistake with the proposal that “money follow the child” (the weighted student funding formula), the report’s recommendations would apparently be a boon to advocates for charter expansion. However, the weighted funding formula is a tangential argument at best, not supported by any of the claims in the report, and one that seeks to divert significant resources from schools with the highest demonstrated needs.

Finally, regarding the issue of poverty and driving money to charters, ConnCAN seems to not fully understand how their own proposal works – which I guess doesn’t really surprise me. Let’s break it down:

  1. CT charters serve fewer free lunch kids (<130% poverty level)  than their host districts, but serve relatively more free or reduced price lunch (<185% poverty level) kids (they have more of the less poor among the poor)
  2. Take any given sum of money and distribute it by free + reduced lunch counts, and charters make out better. In a zero-sum re-allocation, applying a smaller weight to the broader free-or-reduced-lunch category, rather than a larger weight to free lunch only, shifts some of that money to charters.
  3. CT charters have very few ELL/LEP kids, so they wouldn’t benefit from a weight on these kids.
  4. Arguing against a weight on ELL/LEP kids, and for reallocating that sum of money to the free or reduced price lunch weight, drives that money into charters – as well as into other districts with higher free and reduced price lunch shares but fewer ELL/LEP kids. THIS IS EXACTLY WHAT THEY ARGUE FOR! (see their footnote, quoted above)

If we assume that state finance formulas work within fixed budget constraints (which they do), this strategy, based on the lie that no ELL/LEP weight is needed, effectively robs the ELL/LEP populations to subsidize the less poor among the poor. This is a classic weight-shifting game.
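That weight-shifting game can be illustrated with a toy fixed-budget formula. Everything below – enrollments, weights, budget – is invented for illustration; this is not ConnCAN’s actual formula or real Connecticut data:

```python
# Toy weighted student funding formula under a fixed budget, showing how
# dropping an ELL/LEP weight and raising the free/reduced-lunch weight
# shifts money toward schools with many F/R kids but few ELL/LEP kids.
# All enrollments, weights, and the budget are invented.

def allocate(schools, frl_weight, ell_weight, budget=1_000_000):
    # Each school's share of the fixed budget is proportional to its
    # weighted enrollment: pupils + frl_weight*FRL + ell_weight*ELL.
    weighted = {
        name: s["pupils"] + frl_weight * s["frl"] + ell_weight * s["ell"]
        for name, s in schools.items()
    }
    total = sum(weighted.values())
    return {name: budget * w / total for name, w in weighted.items()}

schools = {
    "high_ell_district": {"pupils": 1000, "frl": 700, "ell": 300},
    "low_ell_charter":   {"pupils": 1000, "frl": 700, "ell": 20},
}

before = allocate(schools, frl_weight=0.3, ell_weight=0.5)
after = allocate(schools, frl_weight=0.5, ell_weight=0.0)  # ELL weight folded into F/R

for name in schools:
    print(f"{name}: {after[name] - before[name]:+,.0f}")
# high_ell_district: -27,132
# low_ell_charter: +27,132
```

Under the fixed budget, the low-ELL school gains exactly what the high-ELL district loses.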

For the complete review, see: http://nepc.colorado.edu/files/TTR-ConnCan-Baker-FINAL.pdf

Previous policy brief on CT School Finance & Money Follows the Child: CT and Money Follows the Child

Graph of the Day: Private School Day Tuition vs. Public School Expenditures (Boston Metro 2009)

I’ve written extensively in the past about private school tuition and expenditures.

Here is a link to a report on private school expenditures I produced in 2009. http://nepc.colorado.edu/publication/private-schooling-US

The graph below is actually stacked heavily in favor of showing that public schools have higher spending than private schools. Why? Because I am comparing private school tuition to public school total expenditures per pupil.

TUITION DOES NOT COVER TOTAL COSTS AND DOES NOT REPRESENT TOTAL SPENDING. The tuition figures included below include only DAY SCHOOL TUITION (or day component for boarding schools) which is only a share of current operating expenditures.

That out of the way, let’s take a look at the distribution of day tuition for the 57 private schools identified in 2009 by Boston Magazine as the “best” in the Boston Metro – broadly defining the Boston Metro (extending pretty far out). These schools collectively serve over 24,000 students. Let’s put their tuition into context by comparing it with the distribution of total per pupil expenditures as reported for public districts in the Boston Metro by the Massachusetts Department of Education.

Note that the vast majority of private independent schools had tuition in 2009 exceeding $30,000, yet few if any public districts spent anywhere near that much. A handful of Catholic private schools charged tuition under $10,000, whereas the majority of public districts in the Boston metro spent closer to $10,000 than to $20,000.

The pupil-weighted average total expenditure per pupil for public districts in the figure is $12,966 (with Boston as a standout, but under $20k).

The pupil-weighted average day tuition for private schools in the figure is $22,337 (but clearly bimodal).
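For readers unfamiliar with the term: a pupil-weighted average weights each school’s figure by its enrollment, rather than counting each school once. A minimal sketch, with invented tuition and enrollment figures (but bimodal in the way described above):

```python
# Pupil-weighted mean: weight each school's tuition (or per-pupil spending)
# by its enrollment, so a large school counts proportionally more than a
# small one. Figures below are invented for illustration.

def pupil_weighted_mean(schools):
    total_pupils = sum(n for _, n in schools)
    return sum(value * n for value, n in schools) / total_pupils

# (tuition, enrollment) pairs: low-tuition Catholic schools alongside
# high-tuition independent schools, i.e. a bimodal distribution.
private = [(8_000, 600), (9_000, 500), (32_000, 400), (35_000, 300)]

print(round(pupil_weighted_mean(private)))               # 18111
print(round(sum(t for t, _ in private) / len(private)))  # 21000 (unweighted, for contrast)
```

Because the low-tuition schools here enroll more students, the pupil-weighted figure falls below the simple school-by-school mean.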

Logic Gaps in the NJ Ed Reform Debate

Not much time for another full length post today. There are numbers to be crunched. But, I did feel it necessary to clear up a few issues regarding NJ Education Reform proposals, including those laid out yesterday focused on a) reforming teacher evaluation to focus on student assessment data, b) tying evaluation to compensation, tenure and dismissal policies, c) ending last in first out, and d) requiring mutual consent in placement/hiring of teachers to specific school locations.

And of course, these policy proposals are framed with the usual urgency.

Here are four overarching claims (and a few other things) based on reformy logic being applied in the New Jersey policy debate:

1. We must act now!

The argument goes that we must act now, before it’s too late, because things are so awful. First, it’s rather hard to argue with a straight face – and certainly not with any data – that NJ’s public education system is so awful. NJ performs at or near the top among states on national assessments, and NJ low-income students (those qualifying for free lunch) also do quite well nationally and have improved over the years (one example here). Typically, the great-urgency argument is a ruse to get policymakers to act in haste, and to adopt policies that they, and especially those who voted for them, will regret later.

2. We couldn’t possibly do worse!

Then there’s the argument that we couldn’t possibly do worse. Clearly, New Jersey could do worse, since New Jersey does quite well. That’s not to say that we shouldn’t keep trying to do better, or that we shouldn’t be trying to do better specifically in those areas where we aren’t doing as well as we should. But we could surely do worse, as the vast majority of states do! See: http://nces.ed.gov/nationsreportcard/naepdata/dataset.aspx

3. Teacher evaluation, compensation and tenure reform are the key variables!

All of the current proposals center on what are argued to be necessary changes to teacher evaluation, compensation, tenure and dismissal. That is, the assumption is that we can improve all schools by making these changes, and specifically that we can improve the 200 failing schools which serve over 100,000 students. For these changes to be reasonable, one would have to have some idea – some empirical basis, perhaps – for why these policy changes might have any positive effect on either our highly successful districts or those supposedly dreadfully failing ones. Since the existing research literature provides no real substantive support for merit pay (as a way to stimulate either immediate or long-term improvements), or for using student test scores for teacher evaluation, one might logically look at the differences between NJ’s highest performing schools and NJ’s lowest performing ones. What we find there, of course, is that the teacher contractual agreements are quite similar in higher and lower performing schools in NJ. What does differ, most notably, is the demographics of those schools.

Let’s make this really simple – IT’S PLAINLY ILLOGICAL TO BLAME SUCCESS OR FAILURE ON A FACTOR THAT DOESN’T VARY ACROSS SUCCESSFUL AND FAILING SCHOOLS. That’s just middle school science logic. Perhaps we should fire the middle school science teachers who taught the current crop of ed reformers?

4. No business in its right mind would retain “ineffective” employees, so why should we let this happen in schools?

There’s also that fun argument that no business in its right mind would or should retain ineffective, low-quality employees. Why would they? Why do they? Well, it’s all relative. Surely, anyone reading this has encountered at least a few employees of private companies, or perhaps even colleagues, who, well, just aren’t that good at what they do. Some people do better than others in any field, and there’s always a bottom rung. We ask ourselves: why do we retain these people? Why would a school retain an ineffective teacher? Why would a school grant tenure to such a large share of teachers, some of whom might not be that great? Sometimes the answer is pretty simple: those waiting in line to take those jobs at present salaries might not be any better, and in fact might be worse! You don’t let your bottom rung go unless you are pretty sure you can replace them with something better. Applied to the current NJ school reform debate: one cannot simply assume that if we force poor urban districts to lay off large numbers of teachers we would consider “ineffective,” there will be a long line of better teachers waiting to take those jobs. In fact, the alternatives might be worse in many cases, unless we significantly step up teacher pay and maintain quality benefits, including job stability and the potential for consistent income growth over time (which potentially allows a lower wage than would otherwise be required).

OTHER STUFF…

We must fix LIFO now!

That is, clearly, the most offensive policies that exist today across states and in district contractual agreements are those that protect old, crusty, ineffective, uncaring curmudgeons while discarding – throwing out onto the streets – young, energetic, caring teachers.

This one is really a smokescreen issue, especially when coupled with the immediacy claim. It makes for good sound bites and has a catchy acronym – LIFO – which must be bad, because it sounds so bad! But when you dig deeper, even though it seems to make sense that quality should trump seniority in layoff decisions, it’s not that simple – nor is it the huge money saver and job saver that some assert.

  • First, layoffs are here and now – in very tight budget times – and the supposed evaluations to be used don’t yet exist. So suggesting that this is a necessary immediate change is foolish.
  • Second, if we are relying heavily on test scores to decide quality – the only teachers who would have scores attached to them would be those in core content teaching in grades 3 to 8. But, layoffs are likely to occur in other areas first – and unlikely to reach core teaching in K-8 in many cases. In fact, schools and districts already have significant latitude to restructure programs and offerings leading to layoffs that may not all fall entirely on the basis of seniority (programmatic & position cuts).
  • Third, there is more research out there than is acknowledged in the present debate – research that actually does speak to the value of experience.
  • Fourth, replacing a not-so-great, convenience-based (and perhaps turf-protecting) measure like seniority with a potentially politically charged, manipulable, and/or random-error-prone alternative (like test-score-based evaluations) CAN ACTUALLY MAKE THINGS WORSE. While LIFO may not be great, the alternatives could be worse, and could be an even greater deterrent to the recruitment of a talented teacher workforce.

A few other notes

Regarding what we know about mutual consent teacher hiring/placement policies: https://schoolfinance101.wordpress.com/2010/10/08/nctq-were-sure-it-will-work-even-if-research-says-it-doesnt/

Oh, and by the way, just to be absolutely clear, NEW JERSEY IS NOT THE HIGHEST SPENDING STATE IN THE NATION! https://schoolfinance101.wordpress.com/2010/10/04/state-ranking-madness-who-spends-mostleast/

More expensive than what? A quick comment on CAP’s CSR report

The Center for American Progress today released a report on class size reduction authored by Matthew Chingos, who has conducted a handful of interesting recent studies on the topic.

http://www.americanprogress.org/issues/2011/04/pdf/class_size.pdf

This report reads more or less like a manifesto against class size reduction as a strategy for improving school quality and student outcomes. I’ll admit that I’m also probably not the biggest advocate for class size reduction as a single, core strategy for education reform, and that I do favor some balanced emphasis on teacher quality issues. I’m also not the naysayer that I once was regarding class size reduction and its relative costs.  There still exists too little decisive information regarding the cost-benefit tradeoffs between the two – teacher quantity and teacher quality.

I only had a chance to view this report briefly, and one specific section caught my eye – the section titled: CSR, The Most Expensive School Reform.

I found this interesting, because it included a bunch of back-of-the-napkin estimates of the potential costs of CSR (based on reasonable assumptions), BUT PROVIDED NOT ONE SINGLE COMPARISON OF THE COSTS AND BENEFITS OF CSR TO ANY OTHER ALTERNATIVE.

You see – You can’t say something is the most expensive without actually comparing it to, uh, something else. That’s how cost comparisons work. Cost benefit analysis works this way too. You compare the costs of option A, and outcomes achieved under option A, to the costs of option B, and outcomes achieved under option B.
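The comparison the report skips can be written down in a few lines. Here's a minimal sketch, with entirely hypothetical per-pupil costs and outcome gains (none of these figures come from the report):

```python
# Cost-effectiveness comparison: dollars spent per unit of outcome gained.
# All numbers here are hypothetical placeholders, purely for illustration.

def dollars_per_unit_gain(cost_per_pupil: float, outcome_gain: float) -> float:
    """Cost-effectiveness ratio: lower means more outcome per dollar."""
    return cost_per_pupil / outcome_gain

# Option A: class size reduction (hypothetical: $833/pupil for a 0.20 SD gain)
csr_ratio = dollars_per_unit_gain(833, 0.20)
# Option B: a teacher-quality strategy (hypothetical: $833/pupil for a 0.25 SD gain)
quality_ratio = dollars_per_unit_gain(833, 0.25)

# Calling either option "most expensive" only makes sense relative to the other.
print(f"CSR: ${csr_ratio:,.0f} per SD; quality: ${quality_ratio:,.0f} per SD")
```

The point isn't the made-up numbers; it's that the label "most expensive" requires both ratios, and the report computes only one side.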

Implicit in this section of the report is the claim that reducing class size for any given improvement in student outcomes is necessarily more expensive than achieving the same improvement by improving teacher quality. In fact, explicit in the title of this section is the claim that pretty much any alternative that might get the same outcome is cheaper than CSR. That's one freakin' amazing stretch!

Here are a few quotes provided by Matt Chingos on this point:

A school that pays teachers $50,000 per year (roughly the national average) would save $833 per student in teacher salary costs alone by increasing class size from 15 to 20. The true savings, including facilities costs and teacher benefits, would be significantly larger. These resources could be used for other purposes. If all of the savings were used to raise teacher salaries, for example, the average teacher salary in this example would increase by $17,000 to $67,000.

And:

The emerging consensus that teacher effectiveness is the single most important in-school determinant of student achievement suggests that teacher recruitment, retention, and compensation policies ought to rank high on the list.
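For what it's worth, the arithmetic in the first quote checks out. A quick sketch using only the figures from the quote:

```python
# Verifying the back-of-the-napkin savings arithmetic from the Chingos quote.
salary = 50_000          # roughly the national average teacher salary
small_class, large_class = 15, 20

# Per-student salary cost falls as class size rises.
savings_per_student = salary / small_class - salary / large_class   # ~$833

# Redirecting all of those savings to the one teacher of a 20-student class:
raise_per_teacher = savings_per_student * large_class               # ~$16,667
new_salary = salary + raise_per_teacher                             # ~$66,667, i.e. roughly $67,000

print(round(savings_per_student), round(new_salary))
```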

Chingos goes on to address the various teacher effect and effectiveness based layoff simulations by authors including Eric Hanushek and how those simulations project larger gains than would be achieved by class size reduction. Chingos does acknowledge in the next paragraph that:

Teachers would need to be paid more to compensate them for the loss of job security. Providing bonuses to teachers in high-need subjects and schools would also consume resources. If these policies are more cost-effective than reducing class size, then increasing class size in order to pursue them would increase student achievement.

However, it would seem from the title and the rest of the content of this section that Chingos has jumped to a conclusion on this point. No actual cost comparison is made between improving student outcomes by improving teacher effectiveness and improving student outcomes by class size reduction.

The relevant research question based on the hypothetical here is:

…in a given labor market, with a given supply of teacher quantities and qualities, does a teacher who will teach for a salary of $67,000 with a class of 20 children get a better result than a teacher who will teach for a salary of $50,000 with a class of 15?

I’m not sure we know the answer to that, in part because the teacher labor market research also suggests that, while teacher labor markets are sensitive to salaries, it may take quite substantial salary increases to achieve gains comparable to class size reduction. Further, given class size and total student load as a working condition, the same teacher might accept a marginally lower salary to teach a class of 15 than a class of 20 (which, at 6 sections per day, could be the difference between a total load of 90 vs. 120 students – a pretty big difference).
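The working-conditions side of that tradeoff is easy to quantify:

```python
# Total daily student load at 6 sections per day, for the two class sizes
# discussed above. The 6-sections figure comes from the paragraph itself.
sections_per_day = 6

load_at_15 = sections_per_day * 15    # 90 students per day
load_at_20 = sections_per_day * 20    # 120 students per day

print(load_at_15, load_at_20, load_at_20 - load_at_15)   # 90 120 30
```

A 30-student difference in daily load is not a trivial change in working conditions, which is why salary alone may not capture the tradeoff.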

I’ve been waiting for years for good answers to this tradeoff, and hoping for data that will provide better opportunities to address this question. Unfortunately, the wait continues.

Dumbest “real” reformy graphs!

So in my previous post I created a set of hypothetical research studies that might be presented at the Reformy Education Research Association annual meeting. In creating the hypotheticals I actually tried to stay pretty close to reality, setting up reasonable tables with information that is actually quite probable. Now, when we get down to the real reformy stuff that’s out there, it’s a whole lot worse. In fact, had I presented the “real” stuff in my previous post, I’d have been criticized for fabricating examples that are just too stupid to be true. Let’s take a look at some real “reformy” examples here:

1. From Democrats for Education Reform of Indiana

According to the DFER web site post which includes this graph:

True, there are some great, traditional public schools in Indiana and throughout the nation.  We’re also fortunate that a vast majority of our educators excel at their jobs and are dedicated to doing whatever it takes to help students succeed.  However, that doesn’t mean we should turn a blind eye to what ISN’T working.  Case in point?  The following diagram displays how all 5th grade classes in the span of a year in one central Indiana school district are doing on a set of state Language Arts student academic standards.  Because 5th grade classes in Indiana are only taught by one teacher, the dots can be translated to display how well the students of individual teachers are doing.

Now, ask yourself this:  In which dot or class would you want your child?  And, imagine if your child were in the bottom performing classroom for not one but MULTIPLE years.  In spite of lofty claims made by those who defend the current system, refusal to offer constructive alternatives to rectify charts such as the one above represents the sad state of education dialogue in America today.

So, here we have a graph… a line graph of all things, across classrooms (3rd grade graphing note – a bar graph would be better, but still stupid). This graph shows the average pass rates on state assessments for kids in each class. Nothin’ else. Not gains. Just average scores. Gains wouldn’t necessarily tell us that much either. But this is truly absurd.  The author of the DFER post makes the bold leap that the only conclusion one can draw from differences in average pass rates across a set of Indiana classrooms is that some teachers are great and others suck! Had I used this “real” example to criticize reformers, most would have argued that I had gone overboard.

2. Bill Gates’ brilliant exposition on turning that curve upside down – and making money matter

Now I’ve already written about this graph, or at least the post in which it occurs, but I didn’t include the graph itself.

Gates uses this chart to advance the argument:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries… For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.

Among other things, the chart includes no international comparison, which becomes the centerpiece of the policy argument. Beyond that, the chart provides no real evidence of a lack of connection between spending and outcomes across districts within U.S. states. Instead, the chart juxtaposes completely different measures on completely different scales to make it look like one number is rising dramatically while the others stay flat. This tells us NOTHING. It’s just embarrassing. Simply from a graphing standpoint, a blogger at Junk Charts noted:

Using double axes earns justified heckles but using two gridlines is a scandal!  A scatter plot is the default for this type of data. (See next section for why this particular set of data is not informative anyway.)

Not much else to say about that one. Again, had I used an example this absurd to represent reformy research and thinking, I’d have likely faced stern criticism for mischaracterizing the rigor of reformy research!

Hat tip to Bob Calder on Twitter, for finding an even more absurd representation of pretty much the same graph used by Gates above. This one comes to us from none other than Andrew Coulson of the Cato Institute. Coulson has a stellar record of this kind of stuff. So, what would you do to the Gates graph above if you really wanted to make your case that spending has risen dramatically and we’ve gotten no outcome improvement? First, use total rather than per-pupil spending (and call it “cost”), and stretch the scale on the vertical axis for the spending data to make it look even steeper. Then express the achievement data in percent-change terms: NAEP scale scores for 4th grade reading, for example, sit in the 215 to 220 range, and are scaled such that even small point gains may be important and relevant, yet those gains won’t even show as a blip when expressed as a percent over the base year.
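To see how the percent-change trick buries real gains, take the NAEP figures mentioned above (treating a 5-point gain as meaningful is my illustrative assumption, not a claim from the graphs):

```python
# A 5-point NAEP scale-score gain, expressed as percent change over the base year.
base_score, current_score = 215, 220    # 4th grade reading range cited above

gain_points = current_score - base_score           # 5 scale-score points
gain_pct = 100 * gain_points / base_score          # ~2.3% -- barely a blip on a chart

print(f"{gain_points}-point gain = {gain_pct:.1f}% over the base year")
```

On a chart whose vertical axis has been stretched to track a doubling of spending, a 2.3% line is visually indistinguishable from flat.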

And here’s the StudentsFirst version of the same old story:

3. Original promotional materials from the reformy documentary, The Cartel (a manifesto on New Jersey public schools)

The Cartel is essentially the ugly step-cousin of Waiting for Superman and The Lottery. I wrote extensively about The Cartel when it was originally released and again when it made its Jersey tour. Thankfully, it didn’t get much beyond that. Back when it was merely a small-time, low-budget, ill-conceived, and even more poorly researched pile of reformy drivel, The Cartel had a promotional web site (different from the current one) which included a page of documented facts explaining why reform was necessary in New Jersey. The central message was much the same as the Gates message above. The graphs that follow are no longer there, but the message is – for example – here:

With spending as high as $483,000 per classroom (confirmed by NJ Education Department records), New Jersey students fare only slightly better than the national average in reading and math, and rank 37th in average SAT scores.

Here are the truly brilliant graphs that support this irrefutable conclusion:

I have discussed these graphs at length previously! I’m not sure it’s even worth reiterating my previous comments. But, just to clarify: could it be that SAT participation rates differ somewhat across states and may actually be an important intervening factor? Nah… couldn’t be.

A trip to the Reformy Education Research Association?

So, as I head off to AERA in New Orleans, I’ve been pondering what it would be like if there was a special education research conference for reformy types.  What would we find at the Reformy Education Research Association, RERA? How would the research be conducted or presented? What kinds of research thinking might we see?

Well, here are a few examples.

Reformy Study #1

First, here’s a table from a widely distributed paper by a team of renowned authors at the Forum on Understanding Core Knowledge in EDucation.

As you can see, the study endeavors to identify the determinants of school failure, in part to identify those specific policies that must be changed in order to eliminate failing schools from our society. Failing schools are, after all, an abomination. The researchers ranked New Jersey schools from highest to lowest proficiency rates and took the top and bottom 10%. They then mined the content of the negotiated contractual agreements for each district, looking to key elements of those contracts to explain why some districts fail while others perform quite well (as good as Finland!). They also gathered basic demographic data on students, having been dinged by reviewer #3 (an outsider) for not including such data in their proposal. The authors note, however, that including these data did not alter their original conclusions or policy implications.

Conclusion: The cause of some schools failing and others succeeding is clearly the absence of regular use of clear metrics for teacher evaluation and the absence of mutual consent school assignment policies. It is also likely that basing salaries on experience or degree level adds to the dysfunction of low performing schools.

Policy recommendation: Immediately implement a new teacher evaluation system based 50% on student assessment data. Prohibit the use of experience or degree level as a basis for compensation.

Reformy Study #2

In this next study, authors from the Belltower Institute for Technology Education and Modern Enterprise explore the scalability of a nationally recognized model for charter schooling. Specifically, the goal of the study is to determine whether the model, which has received accolades in major newspapers and on network television (Reformy Nation) over the past year, might be a useful model for replacing entire urban school systems.  Table 2 below shows the characteristics of one successful charter school (sufficient data unavailable on the 3 less successful charters in the same network) operating the model, and the characteristics of the urban host district of that charter school. Deliberations are under way in that district to grant the charter operators full control of all schools in the district. Data in the table focus specifically on children in Grades 6 to 8, the only grades served by the charter.

Clearly, the charter not only outperforms the host district schools in grade 6, but by an even larger margin in grade 8, which can only be interpreted (emphasis in original manuscript) as the charter school adding more value to students with each year that they stay (setting aside the possibility that large shares of those students who are no longer in attendance by 8th grade may have been lower performers).

Again, original analyses included only student assessment scores, and no further information on student population characteristics. Amazingly, the original proposal got dinged by the same reviewer #3 as the study above, but reviewers #1 and #2 found the proposal to represent the highest standards of reformy rigor.

The authors maintain that this information is unimportant because the charter populations are necessarily representative of the host district, since a lottery is used for admission to the charter. In any event, the authors contend that the reported differences in student populations and cohort attrition are “trivial.”

Conclusion: Clearly, the charter school has proven that it is able to produce far better results than host district schools while serving the very same children (emphasis in original manuscript) as those served by host district schools, and by using its “no excuses” approach.  Further, children’s performance improves the longer they attend the charter school.

Policy recommendation:  Set in place a strategy to turn over all host district schools, across all grade levels to the charter operator.

Reformy Study #3

In the third and final paper, economists from the Measuring Yearly Advancements in Social Science project released preliminary findings from a massive privately funded study on teacher effectiveness. Specifically, the study endeavors to determine the correlates of effective teaching, in order to guide public school district personnel policies – specifically hiring, retention and compensation decisions. The study involved 22,543 teachers (326 of whom had complete data on all observations) across 6 cities (4 of which failed to provide sufficient data in time for this preliminary release). Using two years of data on students assigned to each teacher (using only the 4th grade math assessment data, because correlations on language arts assessments were too unreliable to report), the study investigated which factors are most highly related to a TRUE measure of teaching effectiveness – where true “effectiveness” was defined as the contribution of Teacher X to achievement growth in 4th grade math on the STATE assessment for students S1 – Sy, linked to that teacher in the given year (Equation expressed in Appendix A, pages 69-74). The same students were also given a second math assessment. School principals conducted observations 5 times during the year and filled out an extensive evaluation matrix based on teacher practices and student-teacher interactions. Students were also administered surveys, as were parents of those students, requesting extensive feedback regarding their perceptions of teacher quality. The correlations are shown in Table 3.

Conclusions & Implications: The strongest correlate of true teaching effectiveness was the estimate of teacher contribution to student achievement on the same test a year later. However, this correlation was only modest (.30). All other measures including effectiveness measures based on alternative tests and student, parent and administrator perceptions of teacher effectiveness were less correlated with the original value-added estimate, thus raising questions about the usefulness of any of these other measures. Because the value-added measure turns out to be the best predictor of itself in a subsequent year, this estimate alone trumps all others in terms of usefulness for making decisions regarding teacher retention (especially in times of staffing reduction) and should also be considered a primary factor in compensation decisions. Note that while it may appear that school administrators, students and their parents have highly consistent views regarding which teachers are more and less effective (note the higher correlations across administrator ratings of teachers, and student and parent ratings), we consider these findings unimportant because none of these perception-based ratings were as correlated with the original value-added estimate as the value-added estimate was with itself (which of course, is the TRUE measure of effectiveness).