Saturday, April 23, 2016

Prep Schools and Test Scores
How do we rate teachers and schools?  One way, which has the advantage of being objective but is still controversial, is to consider their students’ test scores, and how much they go up while the students are being taught by a given teacher or at a given school.  The No Child Left Behind Act mandated standardized tests for most public school students at a few points in their schooling.  Private schools have no such mandated tests, but many private high schools (“prep schools”) require the Secondary School Admission Test (SSAT) of all their applicants, and have graduates who almost uniformly apply to colleges and take the SAT.

This scatter graph shows information on the students’ testing at several prep schools which might be of interest to a student living in Massachusetts or New Hampshire, and which have released testing information to Boarding School Review (n.d.).

For each school, the horizontal location of the bubble marking the school shows how well the school’s students tested when they applied to the school (the average SSAT score, measured as a percentile, of incoming students).  The vertical location of the bubble shows how well the school’s students test in their senior year at the school (the average SAT score of graduates, on a scale which goes to 2400).

Since the SSAT and SAT test essentially the same thing (preparation for academic schoolwork, a combination of intelligence and learning measured mostly with regard to mathematical understanding, vocabulary and reading skill), we should not be surprised to see a fairly strong correlation between the two statistics.  This correlation can be expressed by an r-squared of 78%, or by noting how nearly the schools’ bubbles fall along the best-fit diagonal trend line, as shown:



To the extent each school deviates above the trend line, we can say that the school is doing an above-average job of educating its students (at least, to the extent that SAT scores reflect education), and schools below the trend line appear to not be doing so well on this measure.  (The size of each circle indicates the number of students at the school; the color green indicates a girls-only school, blue a coed school, orange a day school.)
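As a rough illustration of what "above the trend line" means, here is a minimal Python sketch.  The five schools and their scores are made up for the example (they are not the Boarding School Review figures), and the function name fit_line is my own.  It fits a least-squares line and reports each school's residual, that is, its actual average SAT minus the SAT predicted from its average SSAT percentile.

```python
# Hypothetical (made-up) figures: average incoming SSAT percentile and
# average graduate SAT score (2400 scale) for five imaginary schools.
schools = {
    "School A": (60, 1750),
    "School B": (70, 1900),
    "School C": (80, 1950),
    "School D": (85, 2080),
    "School E": (92, 2100),
}


def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept and r-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


xs = [ssat for ssat, _ in schools.values()]
ys = [sat for _, sat in schools.values()]
slope, intercept, r_squared = fit_line(xs, ys)

# Residual = actual SAT minus the SAT predicted from the SSAT percentile.
# Positive means above the trend line, negative means below it.
residuals = {
    name: sat - (intercept + slope * ssat)
    for name, (ssat, sat) in schools.items()
}

for name, resid in sorted(residuals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {resid:+.0f} points relative to the trend line")
print(f"r-squared = {r_squared:.2f}")
```

With real data, one would replace the schools dictionary with the published averages; the residuals always sum to zero, so "above average" here only means above the average of the schools in the chart.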

However, the information in this graphic should be a rather small part of judging a school’s academics, and academic strength may not be the most important factor when judging how happy or successful a particular child will be at a particular school.

Further, the height of the dots above the diagonal line may not be as important in choosing a school as is where the school falls along the diagonal line.  Although most people want to go to the most selective prep school (or college) that they get into, it is unclear whether this is the ideal choice in terms of academic progress, let alone social life.  Imagine a student who scores at the 90th percentile on the SSAT.  At a school near the center or on the left of the chart she will be among the stronger students; this may lead to increased self-confidence and self-identification as a scholar; on the other hand, it may instead lead to laziness as a moderate effort may be all that is necessary to have average or even above-average grades.  At an academically tougher school on the right of the chart, the 90th-percentile scorer will likely not be among the stronger students; this may lead to harder work to keep up with peers, or it may lead to frustration and burnout.  So in choosing a school, one might have to guess whether one’s child is more likely to suffer from laziness or from low self-esteem.

A final caveat: With any system of measurement, the accuracy of the measurement goes down to the extent that the measured entity or the measurer has a stake in the outcome.  And as with public schools, colleges and law schools, prep schools have in the past sometimes fudged input and outcomes statistics.  One wouldn’t want to penalize a school for being honest.

References

Boarding School Review. (n.d.). Retrieved from http://www.boardingschoolreview.com/

SSAT.org. (2013). Scores: How to read your score report. Retrieved from www.ssat.org/scores/read-score-report



Tuesday, February 09, 2016

Brookings and the Uses and Abuses of Economic Statistics

A pet peeve of mine is the use of slipshod social science and statistics as a mantle to conceal a weakly supported claim.   I sometimes see this with the output of ideological think tanks, organizations whose dissemination model usually involves getting mainstream publishers to credulously disseminate their “reports,” or press releases.  Sometimes I feel compelled to debunk such reports (e.g., Pittelli, 2016).

This week I noticed an article in the Washington Post Wonkblog (Badger, 2016) reporting on an economic analysis from the Brookings Institution.  The article claims that a look at the changes in some economic statistics for America’s 100 largest cities over the past five years shows that economic growth does not do much to help the poor and working classes.

To test the validity of this claim, I downloaded the statistics used by Brookings and did some work on them in Excel and Tableau, an excellent visualization program I am currently learning for a course in the Data Analytics program at Southern New Hampshire University (SNHU.edu).

The Brookings article pointed out that “On average, the faster a metro economy grew, the more likely it was to experience improvements in inclusion [Brookings’ term for how well the poor are doing]” but then went on to refute a ridiculous strawman: “Yet growth in metro economies did not reliably improve all residents’ economic fortunes.”  Perhaps more important, the article’s headline was “In metro areas, growth isn’t reliably trickling down.” (Berube, 2016)

The Washington Post Wonkblog picked up the story with an even drearier headline (“All the people being left behind in America’s booming cities”).  The article disapprovingly quoted various business and Republican sources claiming that economic growth is the best way to help the poor and working class, told us that the Brookings report shows they are all mistaken, and ended by quoting an author of the Brookings report telling us that the report shows that the key to improving “inclusion” is increased government spending on the poor.

The Data
So what was all this based on?  Brookings took nine economic statistics, grouped them together in three groups of three, and gave the three statistics groups names which sound meaningful and important, namely: “growth,” “prosperity” and “inclusion.”  As these coinages are idiosyncratic, I will continue to put quotes around them.  In addition, Brookings’ and Wonkblog’s pessimistic reading of the Brookings statistics (primarily, that “growth” is not reliably leading to “inclusion”) is overblown, both for statistical reasons and because Brookings’ coinages are not meaningful or well-constructed.  I have three major issues with their conclusions:

First Issue
The Brookings “growth” measure covers the size of each city’s economy, whereas the “prosperity” and “inclusion” measures cover the per capita economy.  Naturally, growing cities attract workers from other cities with slower growth rates, and these workers – failed by their previous cities of residence – also benefit from a successful city’s growing economy.  But the positives of a city attracting new workers are overlooked by most of these statistics.  Indeed, to the extent a city is attracting new workers, its “prosperity” and “inclusion” measures will lag its “growth” measure, but these discrepancies are not a measure of urban failure, but rather of urban attractiveness.

Second Issue
Brookings claims of its “inclusion” statistic that:
Inclusion indicators measure how the benefits of growth and prosperity in a metropolitan economy are distributed among people. Inclusive growth enables more people to invest in their skills and to purchase more goods and services.

But these claims are not reasonably supported by the three statistics in question. 

Two of the parts of “inclusion” are Median wage and the Employment-to-population ratio (the share of all individuals aged 18 to 65 who are employed).  Neither of these measures tells us much about the bottom tier or working class or non-college graduates.  Median wage is a useful statistic, showing how the middle is doing.  Employment-to-population ratio is also meaningful, and it is perhaps troubling that today the level nationally is close to a 30-year low.  But people also can be unemployed due to prosperity, in the case of couples who can afford to have a stay-at-home parent, or people retiring before age 65.

The last of the statistics making up “inclusion” is the “Relative income poverty rate” (RIPR), which is “The share of people in a metropolitan economy who earn less than half of the local median wage.”  If a city saw everyone’s wages double, with no other changes, then RIPR would be unchanged.  But the low-earning people would certainly benefit from a doubling of real earnings, and they would be better able “to invest in their skills and to purchase more goods and services.”  Like other inequality measures, this one shows negative numbers when better off people see growth in their incomes, even when the people at the bottom are seeing the same or somewhat better incomes.  But this measure of inequality is worse than some others because the “well-off” whose income growth definitionally becomes a bad thing are merely those at the 50th percentile, not some category of rich which is divorced from “the people.”  In essence, as Median Wage is a denominator here, it will tend to cancel out much of the effect of the Median Wage statistic which is ostensibly one of the three parts of "inclusion."  Also, the RIPR statistic only looks at people with earnings, which means that someone going from no earnings (e.g., unemployed, on welfare, or in prison) to low earnings makes their city look worse off.  Further, a low-income person forced to move because he is priced out of, say, San Jose, California, makes that city look better off.
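The behavior of RIPR described above can be checked with a few lines of Python.  This is only a sketch: the ten wages are made up, and the function name relative_income_poverty_rate is mine, implementing the quoted definition (share of earners below half the median wage) rather than Brookings' actual calculation.

```python
from statistics import median


def relative_income_poverty_rate(wages):
    """Share of earners whose wage is below half the median wage."""
    half_median = median(wages) / 2
    return sum(1 for w in wages if w < half_median) / len(wages)


# Hypothetical annual wages for a ten-earner "city" (made-up numbers).
wages = [12_000, 15_000, 18_000, 30_000, 38_000,
         42_000, 50_000, 60_000, 80_000, 150_000]

base = relative_income_poverty_rate(wages)

# Everyone's wage doubles: the median doubles too, so nothing changes.
doubled = relative_income_poverty_rate([2 * w for w in wages])

# A previously unemployed resident takes a low-wage job.
with_new_earner = relative_income_poverty_rate(wages + [10_000])

# The lowest earner is priced out and leaves the city.
after_departure = relative_income_poverty_rate(wages[1:])

print(f"baseline:            {base:.1%}")
print(f"all wages doubled:   {doubled:.1%}")
print(f"new low-wage earner: {with_new_earner:.1%}")
print(f"lowest earner left:  {after_departure:.1%}")
```

In this toy city the rate is unchanged by the doubling, rises when someone moves from no earnings to low earnings, and falls when the poorest earner leaves.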

A more meaningful measure of inclusion, or how “benefits… are distributed” to the poor or working class, would look at a group such as the bottom quintile, and would measure whether this lowest-earning portion of the people saw increases or decreases in income (or consumption).  In the absence of such data, the median wage tells us more about the average person’s economic benefits and ability “to invest in their skills and to purchase more goods and services” than does Brookings’ “Relative income poverty rate.”

Third Issue
One cannot say flatly that a rising tide lifts all boats, or that it doesn’t; such a reality falls along a continuum.  I downloaded the three Brookings ranks for each of the 100 cities, used Excel to semi-automatically put the tabular data into rows, and made scatter graphs in Excel and then Tableau.  I found that there is indeed a positive correlation between Brookings’ ranks of 5-year “growth” and “inclusion” measures, with a slope of 0.33 and an r-squared of 11%.  The slope means that if one city has a "growth" that is 30 ranks better than a second city, then one would expect that first city to have an "inclusion" that is about 10 ranks better than the second city.  Further, 11% of all of the variation in the cities’ change of rank in “inclusion” may be explained by the variation in the cities’ change of rank in Brookings’ “growth” measure (P < 0.001).

Below is a scatter graph I constructed in Tableau using the Brookings rank data.  It shows the same dots as the scatter graph shown in the Wonkblog and Brookings articles, but with the addition of city names, where Tableau found room for them (note that the scales are reversed, as 100 is the worst score, and 1 is the best):





A quick glance at the scatter graph does not show any obvious pattern of correlation, and Wonkblog describes it as a “weak relationship.”   The article goes on to say that “This non-pattern is notable precisely because the rising-tide theory remains so alluring, particularly among Republicans.”  Those foolish Republicans!  It may be reasonable to describe a slope of 0.33 and an r-squared of 11% as a weak relationship, but it is certainly not a “non-pattern,” not evidence with which to refute people who discern the pattern, and in particular not evidence that some other policies would work better than policies aimed at improving economic growth.

Note on Nonparametric Statistics
The cities are listed and graphed above by rank, not by the actual underlying statistics.  A list of ranks by definition has rank or ordinal scale, but not interval scale (i.e., adjacent cities always have a rank difference of one, but are not equally far apart from each other in terms of the underlying statistics.)  For normal statistical measures, such as those underlying the ranks, one would expect something close to a normal distribution, and that the interval between two adjacent cities which are ranked very high or very low would generally be greater than the interval between two adjacent cities near the middle of the distribution.  (Imagine that we have 100 people chosen at random, arranged by height; we are almost certain to see a greater height difference between the tallest person and the second-tallest person than between two adjacent people near the middle of the line.)
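The height example can be tried out with a short simulation.  This is a sketch under assumptions of my own: heights are drawn from a normal distribution with a mean of 170 cm and a standard deviation of 8 cm, and the function name gap_comparison is hypothetical.  It lines up 100 simulated people by height many times over, and compares the gap between the two tallest with the gap between the two people in the middle of the line.

```python
import random


def gap_comparison(trials=2000, n=100, mean=170.0, sd=8.0, seed=1):
    """Simulate sorted samples and compare top-two vs. middle-two gaps.

    Returns (average top gap, average middle gap,
             share of trials in which the top gap was the wider one).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    top_total = 0.0
    mid_total = 0.0
    top_wider = 0
    for _ in range(trials):
        heights = sorted(rng.gauss(mean, sd) for _ in range(n))
        top_gap = heights[-1] - heights[-2]             # rank 1 vs. rank 2
        mid_gap = heights[n // 2] - heights[n // 2 - 1]  # two middle ranks
        top_total += top_gap
        mid_total += mid_gap
        if top_gap > mid_gap:
            top_wider += 1
    return top_total / trials, mid_total / trials, top_wider / trials


avg_top, avg_mid, share = gap_comparison()
print(f"average gap between the two tallest: {avg_top:.2f} cm")
print(f"average gap between the two middle:  {avg_mid:.2f} cm")
print(f"share of trials where the top gap is wider: {share:.0%}")
```

In both cases the rank difference is exactly one, which is the point: equal steps in rank can hide very unequal steps in the underlying statistic.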

Because a rank difference of one does not rigorously translate to any particular difference in the data underlying the ranking process, a correlation of ranks is not as rigorous a measure as is the correlation of the underlying data.  Likewise, with ranked data, the associated scatter graph will have the appearance of a square which is relatively full of data points, right up to the edges of the square.  For these reasons, a correlation based on ranks could show significantly different values than a correlation based on the underlying statistics.

So given only rank data, a mathematical purist should prefer to use nonparametric statistics.  A person coming at the problem from the opposite standpoint – that is, with little knowledge of statistics – might also prefer the simplest or crudest of these nonparametric methods to determine correlation.  Looking again at the preceding scatter graph, it is visually divided into 4 quadrants, with 50 cities on either side of center, and 50 cities each above and below the center.  Each quadrant will have 25 cities if there is zero correlation by this measure.  But in fact, the quadrant counts are:


UL = 18          UR = 32
LL = 32          LR = 18


With 64 cities at bottom left and upper right versus 36 at upper left and bottom right (a ratio of 64/36), there is clearly a positive correlation between the two variables.  The (simple and crude) quadrant count ratio is [n(LL) + n(UR) - n(UL) - n(LR)] / N, and gives a number similar to r (the Pearson product-moment correlation coefficient), ranging from -1 to 1.  In this case: (64 – 36) / 100 = 0.28, which is, at the least, on the stronger side of “weak relationships.”
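For anyone who wants to repeat the quadrant count on other data, the calculation is simple enough to sketch in Python.  The function name quadrant_count_ratio is my own; the counts are the ones listed above.

```python
def quadrant_count_ratio(ul, ur, ll, lr):
    """(n(LL) + n(UR) - n(UL) - n(LR)) / N, ranging from -1 to 1."""
    total = ul + ur + ll + lr
    return (ll + ur - ul - lr) / total


# Quadrant counts read off the scatter graph of "growth" vs. "inclusion" ranks.
qcr = quadrant_count_ratio(ul=18, ur=32, ll=32, lr=18)
print(round(qcr, 2))  # 0.28

# With 25 cities in every quadrant there is zero correlation by this measure.
no_pattern = quadrant_count_ratio(ul=25, ur=25, ll=25, lr=25)
print(no_pattern)  # 0.0
```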

Further, one can see in the scatter graph that there are no cities very near to the upper left and bottom right extreme corners of the whole graph, while there are a few cities very near the bottom left and upper right corners.  In other words, cities with a poor ranking for “growth” also have a poor ranking for “inclusion,” while cities with an excellent ranking for “growth” also have an excellent ranking for “inclusion.”

Alternative Methods
All of the preceding correlation analysis is based on the assumption that the Brookings “growth” and “inclusion” statistics are meaningful and well-named constructs.  But as I noted under the First and Second Issues above, this is not the case.  So how would I show the relationship between economic growth and benefits to the people?

From the Brookings report, I obtained the nine separate statistics for each of the 100 largest metropolitan areas in the United States.  These are all rates of increase/decrease for the last 5 years, the period emphasized in the Wonkblog article.  Note that I will use the Greek delta symbol Δ to denote change in a statistic, in this case change over the last 5 years expressed as a percentage (e.g., if a statistic increased by 10%, then it was multiplied by 1.10).

After cleaning up the data and putting it in row format in Excel, I noted the Pearson coefficients and r-squared figures for the pairings of these statistics.

So how much does economic growth in a city help the poor?  Just looking at the cities’ Δ Gross Metropolitan Product (GMP) as the proper measure of economic growth, we see:
  •          an r-squared of 0.67 with Δ Aggregate Wages
  •          an r-squared of 0.56 with Δ Jobs
  •          an r-squared of 0.32 with Δ Average Wage
  •          an r-squared of 0.25 with Δ Median Wage

I switched to Tableau at this point because it makes it easy to create a calculated field combining statistics and then to check a correlation with the calculated field.

Aggregate Wages is by definition equal to Average Wage * Jobs, which means that (1 + Δ Aggregate Wages) = (1 + Δ Average Wage) * (1 + Δ Jobs).

In other words, if a city’s Average Wage increases by 10% while its number of Jobs increases by 5%, then its Aggregate Wages increase by 15.5%.  (To combine a 10% increase in one statistic with a 5% increase in a second, we multiply 1.10 by 1.05 to get 1.155, an increase of 15.5%.)
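The same arithmetic can be written as a small Python helper.  This is just a sketch of the compounding rule; the function name compound is mine, and it mirrors the kind of calculated field one might build in Tableau without claiming to be the formula I used there.

```python
def compound(*changes):
    """Combine fractional changes multiplicatively: (1+a)(1+b)... - 1."""
    total = 1.0
    for change in changes:
        total *= 1 + change
    return total - 1


# Average Wage up 10%, Jobs up 5%.
delta_aggregate_wages = compound(0.10, 0.05)
print(f"{delta_aggregate_wages:.1%}")  # 15.5%
```

Note that simply adding the two changes (10% + 5% = 15%) understates the result slightly; the difference grows as the changes get larger.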

It is no surprise that again looking at the cities’ Δ Gross Metropolitan Product (GMP), we see:

  •          an r-squared of 0.67 with Δ Average Wage * Δ Jobs – exactly the same as for Δ Aggregate Wages, as we should expect
  •          an almost-as-high r-squared of 0.61 with Δ Median Wage * Δ Jobs

Furthermore, the beta or slope of the regression line is 0.82 for Δ Average Wage * Δ Jobs and 0.85 for Δ Median Wage * Δ Jobs.

In other words, if one city has a GMP increase that is 10% higher than a second city, that first city will have Aggregate Wages growing on average 8.2% higher than the second city, and 67% of the variation in the growth in the 100 cities’ Aggregate Wages will be explained by growth in GMP alone.

Now, Average Wage can go up substantially just because the top 1% saw huge gains, so to see the improvement in wages of the typical person, we use the Median Wage.  And when one city has a GMP increase that is 10% higher than a second city, that first city will have (Median Wage * Jobs) growing on average 8.5% higher than the second city, and 61% of the variation in the growth in the 100 cities’ (Median Wage * Jobs) will be explained by growth in GMP alone.  Note this scatter graph of change in (Median Wage * Jobs) vs. change in GMP:




So it seems to me that the best way to describe the way in which relative growth in a city’s GMP correlates to relative growth in Wages and Jobs is “quite reliably.”  Sometimes there is more growth in Wages and sometimes more growth in Jobs.  But as noted above, when the number of Jobs has increased in a city, the people in that city have also benefited, either because the Employment ratio is higher than it would otherwise be (a factor which Brookings attempts to add separately), or because some of the people in that city are migrants who came to the city for a job and situation which is generally better than they could have gotten in the city they left behind (a factor which Brookings ignores).

Conclusion
The Brookings authors used questionable methods to combine and create statistics when there are singular statistics which are more meaningful and give more meaningful correlations.

The more complicated a statistical measure, the easier it is to fool oneself (or others) about the meaning of the statistic.  More complicated statistics also allow for more options for comparing the statistics, and more chances of finding what you want to find in a correlation or other comparison.  In this case, Brookings was looking at correlations of the statistics which they termed “growth” and “inclusion.”  Each of these statistics was formed by:

  •          Ranks,
  •          of Sums,
  •          of differences from the three different Means,
  •          divided by three different Standard Deviations,
  •          of Rates of change in,
  •          underlying statistics which were themselves, in some cases, the quotient of two statistics (i.e., one statistic divided by another).

These manipulations of the statistics could be defensible if the statistics were used for other purposes, but the manipulations (particularly the use of Ranks) made Brookings’ use of a scatter graph and related claims about correlation untenable. 

Why did Brookings convert absolute values to measures of standard deviation from the mean?  If one merely multiplies together different statistics (or 1+ Δ statistics), then the statistic which has more volatility has more effect on the compounded measure than does a statistic which has less volatility.  For some manipulations it may be helpful to eliminate these differences between the measures so they are equally weighted in effect.  But for the purposes of making up a “growth,” “prosperity” and “inclusion” measure, it would have been preferable to combine the three statistics without resort to the standard-deviations-from-the-mean manipulation.  If one statistic is close to the same for all the cities, then that statistic is indeed a less important contributor to any meaningful measure of a city.  If all cities are within 1% of each other in some measure, then that is a reason to note that the measure in question does not vary much and so is not an important way to distinguish the cities.  It is not a reason to multiply that measure’s contribution ten-fold because we are combining it with some other measure which varies by 10%.
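To make the weighting point concrete, here is a minimal Python sketch with made-up numbers for five imaginary cities (not Brookings data).  The names measure_a, measure_b and z_scores are mine.  One measure barely varies between cities and the other varies a great deal, yet once each is converted to standard deviations from its mean, both carry the same spread into any sum.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Express each value as standard deviations from the mean."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical 5-year changes for five imaginary cities (made-up numbers).
measure_a = [0.010, 0.012, 0.008, 0.011, 0.009]  # cities nearly identical
measure_b = [0.02, 0.15, -0.05, 0.10, 0.03]      # cities differ widely

raw_spread_a = max(measure_a) - min(measure_a)
raw_spread_b = max(measure_b) - min(measure_b)

z_a = z_scores(measure_a)
z_b = z_scores(measure_b)
z_spread_a = max(z_a) - min(z_a)
z_spread_b = max(z_b) - min(z_b)

print(f"raw spread:          A = {raw_spread_a:.3f}, B = {raw_spread_b:.3f}")
print(f"standardized spread: A = {z_spread_a:.2f}, B = {z_spread_b:.2f}")
print(f"standardized std dev: A = {pstdev(z_a):.2f}, B = {pstdev(z_b):.2f}")
```

Before standardizing, measure B's spread is many times that of measure A; afterward, both have a standard deviation of one, so the nearly constant measure has as much say in the combined score as the one that actually distinguishes the cities.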

But all in all, the biggest problem with the Brookings “inclusion” measure is that it has little to do with “how the benefits of growth and prosperity in a metropolitan economy are distributed among people” and even less to do with how able the cities’ people are to “invest in their skills and to purchase more goods and services.” A correlation of more straightforward statistics shows that a city’s Growth (ΔGMP) in fact reliably drives (slope = 0.85; r-squared = 0.61; P < 0.0001) its ΔMedian Wage and ΔJobs.


References

Badger, E. (2016, February 2). All the people being left behind in America’s booming cities. Washington Post Wonkblog. Retrieved from www.washingtonpost.com/news/wonk/wp/2016/02/02/all-the-people-being-left-behind-in-americas-booming-cities/

Berube, A. (2016, January 29). In metro areas, growth isn't reliably trickling down. Retrieved from www.brookings.edu/blogs/the-avenue/posts/2016/01/29-growth-isnt-reliably-trickling-down-in-metro-areas-aberube

Brookings. (2016). Metro monitor. Retrieved from www.brookings.edu/research/reports2/2016/01/metro-monitor#V0G37980

Pittelli, D. (2016, January-February). Cambridge 02138 – Letters to the Editor. Harvard Magazine. Retrieved from http://harvardmagazine.com/2015/12/cambridge-02138

Sullins, B. (n.d.). Enterprise business intelligence with Tableau Server. Pluralsight. Retrieved from https://app.pluralsight.com/library/courses/enterprise-business-intelligencetableau-server/table-of-contents