A pet peeve of mine is the use of slipshod social science and statistics as a mantle to conceal a weakly supported claim. I sometimes see this in the output of ideological think tanks, organizations whose dissemination model usually involves getting mainstream publishers to credulously repeat their “reports” (or press releases). Sometimes I feel compelled to debunk such reports (e.g., Pittelli, 2016).
This week I noticed an article in the Washington Post Wonkblog (Badger, 2016) that reported on an economic analysis from the Brookings Institution. The analysis claims that a look at the changes in several economic statistics for America’s 100 largest cities over the past five years shows that economic growth does not do much to help the poor and working classes.
To test the validity of this claim, I downloaded the statistics Brookings used and analyzed them in Excel and in Tableau, an excellent visualization program I am currently learning for a course in the Data Analytics program at Southern New Hampshire University (SNHU.edu).
The Brookings article pointed out that “On average, the faster a metro economy grew, the more likely it was to experience improvements in inclusion [Brookings’ term for how well the poor are doing],” but then went on to refute a ridiculous strawman: “Yet growth in metro economies did not reliably improve all residents’ economic fortunes.” Perhaps more important, the article’s headline was “In metro areas, growth isn’t reliably trickling down” (Berube, 2016).
The Washington Post Wonkblog picked up the story with an even drearier headline (“All the people being left behind in America’s booming cities”). The article disapprovingly quoted various business and Republican sources claiming that economic growth is the best way to help the poor and working class, told us that the Brookings report shows they are all mistaken, and ended by quoting an author of the Brookings report telling us that the report shows that the key to improving “inclusion” is increased government spending on the poor.
The Data
So what was all this based on? Brookings took nine economic statistics, sorted them into three groups of three, and gave the groups names that sound meaningful and important, namely: “growth,” “prosperity” and “inclusion.”
As these coinages are idiosyncratic, I will continue to put them in quotes. Brookings’ and Wonkblog’s pessimistic reading of the Brookings statistics (primarily, that “growth” is not reliably leading to “inclusion”) is overblown, both for statistical reasons and because Brookings’ coinages are not meaningful or well-constructed. I have three major issues with their conclusions:
First Issue
The Brookings “growth” measure covers the total size of each city’s economy, whereas the “prosperity” and “inclusion” measures cover the per capita economy. Naturally, growing cities attract workers from cities with slower growth rates, and these workers, failed by their previous cities of residence, also benefit from a successful city’s growing economy. But most of these statistics overlook the positives of a city attracting new workers. Indeed, to the extent a city is attracting new workers, its “prosperity” and “inclusion” measures will lag its “growth” measure; these discrepancies are a measure not of urban failure but of urban attractiveness.
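To make the First Issue concrete, here is a minimal Python sketch with invented numbers (not Brookings data). It assumes a per capita figure behaves like a total divided by population, so that (1 + per capita change) = (1 + total change) / (1 + population change). The function name `per_capita_change` is hypothetical.

```python
def per_capita_change(total_change: float, population_change: float) -> float:
    """Change in a per capita figure, given the change in the total and in population.

    Assumes per capita = total / population, so
    (1 + per capita change) = (1 + total change) / (1 + population change).
    """
    return (1 + total_change) / (1 + population_change) - 1


# Hypothetical city: its economy grows 10% over five years, but because it
# attracts newcomers, its population also grows 8%.
print(f"total growth: {0.10:.1%}, per capita growth: {per_capita_change(0.10, 0.08):.1%}")
```

With these made-up numbers, per capita growth comes out under 2% even though the city's economy grew 10%.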
Second Issue
Brookings claims of its “inclusion” statistic that:
Inclusion indicators measure how the benefits of growth and prosperity in a metropolitan economy are distributed among people. Inclusive growth enables more people to invest in their skills and to purchase more goods and services.
But these claims are not reasonably supported by the three statistics in question.
Two of the parts of “inclusion” are Median wage and the Employment-to-population ratio (the share of all individuals aged 18 to 65 who are employed). Neither of these measures tells us much about the bottom tier, the working class, or non-college graduates. Median wage is a useful statistic, showing how the middle is doing. The Employment-to-population ratio is also meaningful, and it is perhaps troubling that today the national level is close to a 30-year low. But people can also be out of work because of prosperity, as with couples who can afford to have a stay-at-home parent, or people retiring before age 65.
The last of the statistics making up “inclusion” is the “Relative income poverty rate” (RIPR), which is “The share of people in a metropolitan economy who earn less than half of the local median wage.” If a city saw everyone’s wages double, with no other changes, then RIPR would be unchanged. But the low-earning people would certainly benefit from a doubling of real earnings, and they would be better able “to invest in their skills and to purchase more goods and services.” Like other inequality measures, this one registers a deterioration when better-off people see their incomes grow, even when the people at the bottom see their incomes hold steady or rise somewhat. But this measure of inequality is worse than some others, because the “well-off” whose income growth definitionally becomes a bad thing are merely those at the 50th percentile, not some category of rich people divorced from “the people.”
In essence, because each person’s wage is measured relative to the Median Wage here, RIPR will tend to cancel out much of the effect of the Median Wage statistic, which is ostensibly one of the three parts of “inclusion.” Also, the RIPR statistic only looks at people with earnings, which means that someone going from no earnings (e.g., unemployed, on welfare, or in prison) to low earnings makes their city look worse off. Further, a low-income person forced to move because he is priced out of, say, San Jose, California, makes that city look better off.
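The three behaviors described above can be checked with a small Python sketch. This is a simplified reading of the quoted definition, not Brookings' actual calculation (which would rest on survey data and weights). The function name and the earnings figures are invented for illustration.

```python
from statistics import median


def relative_income_poverty_rate(earnings: list[float]) -> float:
    """Share of earners who earn less than half of the median earner's pay.

    People with zero earnings are left out, since the statistic only looks at
    people with earnings.
    """
    earners = [e for e in earnings if e > 0]
    threshold = median(earners) / 2
    return sum(e < threshold for e in earners) / len(earners)


# Hypothetical annual earnings for a ten-person "city"; the two zeros are people
# with no earnings (e.g., unemployed, on welfare, or in prison).
city = [0, 0, 12_000, 15_000, 30_000, 35_000, 40_000, 45_000, 60_000, 90_000]

base = relative_income_poverty_rate(city)
doubled = relative_income_poverty_rate([2 * e for e in city])                # everyone's pay doubles
new_job = relative_income_poverty_rate([10_000] + city[1:])                  # one person finds a low-paid job
priced_out = relative_income_poverty_rate([e for e in city if e != 12_000])  # lowest earner leaves
print(f"base {base:.3f}, doubled {doubled:.3f}, new job {new_job:.3f}, priced out {priced_out:.3f}")
# base 0.250, doubled 0.250, new job 0.333, priced out 0.143
```

In this toy example, doubling everyone's pay leaves RIPR unchanged, a new low-paid job raises it, and the departure of the lowest earner lowers it.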
A more meaningful measure of inclusion, or of how “benefits… are distributed” to the poor or working class, would look at a group such as the bottom quintile and would measure whether this lowest-earning portion of the people saw increases or decreases in income (or consumption). In the absence of such data, the median wage tells us more about the average person’s economic benefits and ability “to invest in their skills and to purchase more goods and services” than does Brookings’ “Relative income poverty rate.”
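Here is a minimal Python sketch of the kind of measure suggested above. It assumes we have incomes for the same city at two points in time, and the function names and data are hypothetical. Note that it compares whoever is in the bottom quintile in each period, not the same individuals.

```python
def bottom_quintile_mean(incomes: list[float]) -> float:
    """Mean income of the lowest-earning 20% of people, counting zero incomes."""
    ranked = sorted(incomes)
    k = max(1, len(ranked) // 5)  # number of people in the bottom quintile
    return sum(ranked[:k]) / k


def bottom_quintile_change(before: list[float], after: list[float]) -> float:
    """Percentage change in the bottom quintile's mean income between two periods.

    Raises ZeroDivisionError if the bottom quintile had no income in the first period.
    """
    return bottom_quintile_mean(after) / bottom_quintile_mean(before) - 1


# Hypothetical incomes for ten people, before and after everyone's income doubles.
before = [5_000, 7_000, 12_000, 15_000, 30_000, 35_000, 40_000, 45_000, 60_000, 90_000]
after = [2 * x for x in before]
print(f"bottom-quintile income change: {bottom_quintile_change(before, after):.0%}")
```

Unlike RIPR, this measure registers a doubling of everyone's income as a 100% gain for the lowest-earning fifth.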
Third Issue
One cannot say flatly that a rising tide lifts all boats, or that it doesn’t; the reality falls along a continuum. I downloaded the three Brookings ranks for each of the 100 cities, used Excel to semi-automatically put the tabular data into rows, and made scatter graphs in Excel and then Tableau. I found that there is indeed a positive correlation between Brookings’ ranks of 5-year “growth” and “inclusion,” with a slope of 0.33 and an r-squared of 11%. This means that if one city’s “growth” rank is 30 places better than a second city’s, we would expect the first city’s “inclusion” rank to be, on average, about 10 places better than the second city’s. Further, 11% of the variation in the cities’ “inclusion” ranks may be explained by the variation in their “growth” ranks (P < 0.001).
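The slope and r-squared above come from an ordinary least-squares fit of one set of ranks on the other. The Python sketch below is my own illustration of that arithmetic, not the Excel or Tableau computation itself, and the function name is hypothetical. It also computes the t statistic for testing whether the correlation differs from zero. Because both variables are the ranks 1 to 100 and so have the same spread, the slope equals the correlation coefficient r, which is why 0.33 is about the square root of 0.11.

```python
import math


def least_squares_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (slope, r_squared, t_statistic) for an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    # t statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom).
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r) if abs(r) < 1 else math.copysign(math.inf, r)
    return slope, r * r, t


# Working back from the reported summary figures (slope 0.33, r-squared 0.11, 100 cities):
slope, r_squared, n = 0.33, 0.11, 100
print(f"expected inclusion-rank advantage for a 30-rank growth advantage: {slope * 30:.1f}")
t = math.sqrt(r_squared) * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
print(f"t = {t:.2f} with {n - 2} degrees of freedom")
```

A t of roughly 3.5 on 98 degrees of freedom corresponds to a two-tailed p-value below 0.001, consistent with the P < 0.001 reported above.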
Below is a scatter graph I constructed in Tableau using the Brookings rank data. It shows the same dots as the scatter graph in the Wonkblog and Brookings articles, but with city names added wherever Tableau found room for them (note that the scales are reversed, as 100 is the worst rank and 1 is the best):
A quick glance at the scatter graph does not show any obvious pattern of correlation, and Wonkblog describes it as a “weak relationship.” The article goes on to say that “This non-pattern is notable precisely because the rising-tide theory remains so alluring, particularly among Republicans.” Those foolish Republicans! It may be reasonable to describe a slope of 0.33 and an r-squared of 11% as a weak relationship, but it is certainly not a “non-pattern.” It is not evidence with which to refute people who discern the pattern, and in particular not evidence that some other policies would work better than policies aimed at improving economic growth.
Note on Nonparametric Statistics
The cities are listed and graphed above by rank, not by the actual underlying statistics. A list of ranks by definition has an ordinal scale, but not an interval scale (i.e., adjacent cities always have a rank difference of one, but are not equally far apart from each other in terms of the underlying statistics). For typical statistical measures, such as those underlying the ranks, one would expect something close to a normal distribution, and would expect the interval between two adjacent cities ranked very high or very low to be generally greater than the interval between two adjacent cities near the middle of the distribution. (Imagine that we have 100 people chosen at random, arranged by height; we are almost certain to see a greater height difference between the tallest person and the second-tallest person than between two adjacent people near the middle of the line.)
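The height example can be checked numerically without random sampling. The Python sketch below uses Blom's approximation for the expected positions of sorted values drawn from a normal distribution. The mean of 170 cm and the standard deviation of 10 cm are assumed for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def expected_sorted_values(n: int, mean: float, sd: float) -> list[float]:
    """Approximate expected values of n sorted draws from a normal distribution.

    Blom's approximation places the i-th smallest value at the
    (i - 0.375) / (n + 0.25) quantile.
    """
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


heights = expected_sorted_values(100, mean=170, sd=10)  # hypothetical heights in cm
top_gap = heights[-1] - heights[-2]       # tallest minus second-tallest
middle_gap = heights[50] - heights[49]    # the two people at the middle of the line
print(f"gap at the top: {top_gap:.2f} cm, gap in the middle: {middle_gap:.2f} cm")
```

The gap at the top comes out more than ten times the gap in the middle, even though both are a single rank step.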
Because a rank difference of one does not rigorously translate to any particular difference in the data underlying the ranking process, a correlation of ranks is not as rigorous a measure as the correlation of the underlying data. Likewise, with ranked data, the associated scatter graph will look like a square that is fairly evenly filled with data points, right up to its edges. For these reasons, a correlation based on ranks could show significantly different values than a correlation based on the underlying statistics. So given only rank data, a mathematical purist should prefer to use nonparametric statistics.
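One standard nonparametric option is Kendall's tau, which counts how many pairs of cities are ordered the same way on both measures. The Python sketch below computes tau-a, which assumes no ties (true of a list of distinct ranks). It illustrates the idea and is not a calculation the Brookings report performed. Note also that Pearson's r computed on ranks is the same quantity as Spearman's rank correlation, another common nonparametric measure.

```python
from itertools import combinations


def kendall_tau(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.

    Assumes no tied values, which holds for a list of distinct ranks.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # the pair is ordered the same way on both measures
            concordant += 1
        elif product < 0:    # the pair is ordered oppositely
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

For 100 cities this examines 4,950 pairs, which is trivial to compute.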
A person coming at the problem from the opposite standpoint – that is, with little knowledge of statistics – might also prefer the simplest or crudest of these nonparametric methods to determine correlation. Looking again at the preceding scatter graph, we can divide it into four quadrants, with 50 cities on either side of center, and 50 cities each above and below center. If there were zero correlation by this measure, each quadrant would be expected to hold about 25 cities. But in fact, the quadrant counts are:
Upper left (UL) = 18 | Upper right (UR) = 32
Lower left (LL) = 32 | Lower right (LR) = 18
With 64 cities at lower left and upper right versus 36 at upper left and lower right (64/36, or about 1.8 times as many), there is clearly a positive correlation between the two variables. The (simple and crude) quadrant count ratio is [n(LL) + n(UR) - n(UL) - n(LR)] / N, which gives a number similar to r (the Pearson product-moment correlation coefficient), also ranging from -1 to 1. In this case, (64 - 36) / 100 = 0.28, which is, at the least, on the stronger side of “weak relationships.”
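The quadrant count ratio is simple enough to compute by hand. The short Python sketch below shows both the general calculation from raw ranks and the arithmetic from the counts reported above. It splits the plot at the medians, which for ranks 1 to 100 is 50.5, so no city sits on a dividing line. The function name is hypothetical.

```python
from statistics import median


def quadrant_count_ratio(x: list[float], y: list[float]) -> float:
    """[n(LL) + n(UR) - n(UL) - n(LR)] / N, with quadrants split at the medians.

    Points lying exactly on a median line are left out of both the counts and N.
    """
    mx, my = median(x), median(y)
    score = counted = 0
    for a, b in zip(x, y):
        if a == mx or b == my:
            continue
        counted += 1
        # Same side of both medians (LL or UR) scores +1; opposite sides (UL or LR) score -1.
        score += 1 if (a > mx) == (b > my) else -1
    return score / counted


# From the quadrant counts reported above:
counts = {"UL": 18, "UR": 32, "LL": 32, "LR": 18}
qcr = (counts["LL"] + counts["UR"] - counts["UL"] - counts["LR"]) / sum(counts.values())
print(qcr)  # 0.28
```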
Further, one can see in the scatter graph that there are no cities very near the extreme upper left and lower right corners of the graph, while there are a few cities very near the lower left and upper right corners. In other words, some cities with a very poor ranking for “growth” also have a very poor ranking for “inclusion,” and some cities with an excellent ranking for “growth” also have an excellent ranking for “inclusion,” but no city combines an extreme ranking on one measure with the opposite extreme on the other.
Alternative Methods
All of the preceding correlation analysis is based on the assumption that the Brookings “growth” and “inclusion” statistics are meaningful and well-named constructs. But as I noted in my First and Second Issues above, this is not the case. So how would I show the relationship between economic growth and benefits to the people?
From the Brookings report, I obtained the nine separate statistics for each of the 100 largest metropolitan areas in the United States. These are all rates of increase or decrease over the last 5 years, the period emphasized in the Wonkblog article. I will use the Greek delta symbol Δ to denote change in a statistic, in this case change over the last 5 years expressed as a percentage (e.g., if a statistic increased by 10%, then it was multiplied by 1.10).
After cleaning up the data and putting it in row format in Excel, I noted the Pearson coefficients and r-squared figures for the pairings of these statistics.
So how much does economic growth in a city help the poor? Taking the cities’ Δ Gross Metropolitan Product (GMP) as the proper measure of economic growth, we see:
· an r-squared of 0.67 with Δ Aggregate Wages
· an r-squared of 0.56 with Δ Jobs
· an r-squared of 0.32 with Δ Average Wage
· an r-squared of 0.25 with Δ Median Wage
I switched to Tableau at this point because it makes it easy to create a calculated field combining statistics and then to check a correlation with the calculated field.
Aggregate Wages is by definition equal to Average Wage * Jobs, which means that (1 + Δ Aggregate Wages) = (1 + Δ Average Wage) * (1 + Δ Jobs). In other words, if a city’s Average Wage increases by 10% while its number of Jobs increases by 5%, then its Aggregate Wages increase by 15.5%. (To combine a 10% increase in one statistic with a 5% increase in another, we multiply 1.10 by 1.05 to get 1.155, an increase of 15.5%.)
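The compounding rule above generalizes to any product of factors. Here is a minimal Python sketch; the function name is hypothetical.

```python
def compound_change(*changes: float) -> float:
    """Combine percentage changes in the factors of a product.

    (1 + change in product) = (1 + change in factor 1) * (1 + change in factor 2) * ...
    """
    total = 1.0
    for change in changes:
        total *= 1 + change
    return total - 1


# Average Wage up 10%, Jobs up 5%: Aggregate Wages (= Average Wage * Jobs) rise 15.5%.
print(f"{compound_change(0.10, 0.05):.1%}")
```

Simply adding the two changes would give 15%; the extra half point is the cross term, 10% of 5%.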
It is no surprise that, again looking at the cities’ Δ Gross Metropolitan Product (GMP), we see:
- an r-squared of 0.67 with Δ (Average Wage * Jobs), exactly the same as for Δ Aggregate Wages, as we should expect
- an almost-as-high r-squared of 0.61 with Δ (Median Wage * Jobs)
Furthermore, the beta, or slope of the regression line, is 0.82 for Δ (Average Wage * Jobs) and 0.85 for Δ (Median Wage * Jobs).
In other words, if one city’s GMP growth is 10 percentage points higher than a second city’s, the first city will on average see its Aggregate Wages grow 8.2 percentage points faster than the second city’s, and 67% of the variation in Aggregate Wage growth across the 100 cities is explained by GMP growth alone.
Now, Average Wage can go up substantially just because the top 1% saw huge gains, so to see the improvement in wages of the typical person, we use the Median Wage. When one city’s GMP growth is 10 percentage points higher than a second city’s, the first city will on average see its (Median Wage * Jobs) grow 8.5 percentage points faster than the second city’s, and 61% of the variation in (Median Wage * Jobs) growth across the 100 cities is explained by GMP growth alone. Note this scatter graph of change in (Median Wage * Jobs) vs. change in GMP:
So it seems to me that the best way to describe how relative growth in a city’s GMP correlates with relative growth in Wages and Jobs is “quite reliably.” Sometimes there is more growth in Wages and sometimes more growth in Jobs. But as noted above, when the number of Jobs has increased in a city, the people in that city have also benefited, either because the Employment ratio is higher than it would otherwise be (a factor which Brookings attempts to add separately), or because some of the people in that city are migrants who came to the city for a job and situation generally better than what they could have gotten in the city they left behind (a factor which Brookings ignores).
Conclusion
The Brookings authors used questionable methods to combine and create statistics when there are single statistics that are more meaningful and yield more meaningful correlations.
The more complicated a statistical measure, the easier it is to fool oneself (or others) about its meaning. More complicated statistics also offer more ways to compare the statistics, and more chances of finding what you want to find in a correlation or other comparison. In this case, Brookings was looking at correlations of the statistics which they termed “growth” and “inclusion.” Each of these statistics was formed by:
- Ranks,
- of Sums,
- of differences from the three different Means,
- divided by three different Standard Deviations,
- of Rates of change in,
- underlying statistics which were themselves, in some cases, the quotient of two statistics (i.e., one statistic divided by another).
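The chain of manipulations in the list above can be written out as a short Python sketch. It is my reconstruction from that description, not Brookings' code. It assumes population standard deviations, ignores ties, and uses invented rates of change for four cities; the function name is hypothetical.

```python
from statistics import mean, pstdev


def rank_of_zscore_sums(indicators: dict[str, list[float]]) -> list[int]:
    """Rank cities (1 = best) by their summed z-scores across several indicators.

    `indicators` maps each indicator's name to one rate of change per city.
    """
    n_cities = len(next(iter(indicators.values())))
    sums = [0.0] * n_cities
    for values in indicators.values():
        mu, sigma = mean(values), pstdev(values)
        for i, v in enumerate(values):
            sums[i] += (v - mu) / sigma  # distance from the mean in standard deviations
    order = sorted(range(n_cities), key=lambda i: sums[i], reverse=True)
    ranks = [0] * n_cities
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical five-year rates of change for four cities.
growth_inputs = {
    "gmp": [0.05, 0.12, 0.08, 0.02],
    "jobs": [0.03, 0.06, 0.05, 0.01],
    "aggregate_wages": [0.04, 0.15, 0.07, 0.03],
}
print(rank_of_zscore_sums(growth_inputs))  # [3, 1, 2, 4]
```

The final step keeps only the order of the summed scores and discards how far apart the cities are.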
These manipulations could be defensible if the statistics were used for other purposes, but the manipulations (particularly the use of Ranks) made Brookings’ use of a scatter graph, and its related claims about correlation, untenable.
Why did Brookings convert absolute values to measures of standard deviation from the mean? If one merely multiplies together different statistics (or 1 + Δ statistics), then a statistic with more volatility has more effect on the compounded measure than does a statistic with less volatility. For some manipulations it may be helpful to eliminate these differences so that the measures are equally weighted in effect. But for the purpose of making up a “growth,” “prosperity” or “inclusion” measure, it would have been preferable to combine the three statistics without resorting to the standard-deviations-from-the-mean manipulation. If one statistic is close to the same for all the cities, then that statistic is indeed a less important contributor to any meaningful measure of a city. If all cities are within 1% of each other in some measure, then that is a reason to note that the measure in question does not vary much and so is not an important way to distinguish the cities. It is not a reason to multiply that measure’s contribution ten-fold because we are combining it with some other measure which varies by 10%.
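A small Python sketch with invented numbers shows the effect described above. Once each measure is divided by its standard deviation, a measure whose values barely differ across cities carries as much weight as one that varies ten times as much.

```python
from statistics import mean, pstdev

# Hypothetical rates of change for five cities (not Brookings data).
narrow = [0.010, 0.012, 0.014, 0.016, 0.018]  # cities differ by under one point
wide = [0.00, 0.02, 0.04, 0.06, 0.08]         # spread ten times as wide

sd_narrow, sd_wide = pstdev(narrow), pstdev(wide)
z_narrow = [(v - mean(narrow)) / sd_narrow for v in narrow]
z_wide = [(v - mean(wide)) / sd_wide for v in wide]

# After standardizing, both measures have the same z-scores, so each contributes
# equally to a summed index; the narrow one has effectively been scaled up tenfold.
print([round(z, 2) for z in z_narrow])
print([round(z, 2) for z in z_wide])
print(f"relative up-weighting of the narrow measure: {sd_wide / sd_narrow:.1f}x")
```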
But all in all, the biggest problem with the Brookings “inclusion” measure is that it has little to do with “how the benefits of growth and prosperity in a metropolitan economy are distributed among people” and even less to do with how able the cities’ people are to “invest in their skills and to purchase more goods and services.” A correlation of more straightforward statistics shows that a city’s growth (ΔGMP) is in fact reliably associated (slope = 0.85; r-squared = 0.61; P < 0.0001) with the combined growth in its Median Wage and Jobs, Δ (Median Wage * Jobs).
References
Badger, E. (2016, February 2). All the people being left behind in America’s booming cities. Washington Post Wonkblog. Retrieved from www.washingtonpost.com/news/wonk/wp/2016/02/02/all-the-people-being-left-behind-in-americas-booming-cities/
Berube, A. (2016, January 29). In metro areas, growth isn’t reliably trickling down. Retrieved from www.brookings.edu/blogs/the-avenue/posts/2016/01/29-growth-isnt-reliably-trickling-down-in-metro-areas-aberube
Brookings. (2016). Metro monitor. Retrieved from www.brookings.edu/research/reports2/2016/01/metro-monitor#V0G37980
Pittelli, D. (2016, January-February). Cambridge 02138 – Letters to the Editor. Harvard Magazine. Retrieved from http://harvardmagazine.com/2015/12/cambridge-02138
Sullins, B. (n.d.). Enterprise business intelligence with Tableau Server. Pluralsight. Retrieved from https://app.pluralsight.com/library/courses/enterprise-business-intelligencetableau-server/table-of-contents