Date: 2020-05-27

In Part I, I described the unique position of New York City in the pandemic. Now I turn to a dozen large cities other than New York. The dozen have in common that they are among America’s largest cities (they include the rest of the top nine besides New York) and they reported coronavirus cases by zip codes or — in Los Angeles and Washington — reported cases by such detailed neighborhoods that zip codes could be accurately inferred.

In all 12 cases, I included contiguous zip codes that had population densities of at least 1,000 per square mile, meaning that all the conclusions apply not just to the people within the city limits, but also to the surrounding metropolitan area (Los Angeles is a partial exception, limited to Los Angeles County). In all, I obtained complete data for 1,363 zip codes in and around the dozen cities, a 99-percent sample of the geographically-defined zip codes.

These metropolitan areas give us a wide range of city types: standalone cities such as Houston and connected urban areas such as Dallas/Fort Worth; compact cities such as Philadelphia and sprawling ones such as Los Angeles. Geographically, I have examples from the Northeast, Mid-Atlantic, Deep South, Midwest, Southwest, and West Coast.

Overall, these cities had experienced the pandemic very differently from each other as of early May, when my zip code database was assembled:

How did the varying coronavirus rates match up with these varied settings? I will explore two possibilities for explaining a zip code’s coronavirus rate: its socioeconomic status and the ethnic mix of its population.

Socioeconomic status and coronavirus rates

Why should coronavirus rates differ within the same city? Thinking about how the socioeconomic status (SES) of a zip code might affect coronavirus rates illuminates the two broad explanations.

One set of answers involves economics. People with enough money to have a vacation home also have an easy option for fleeing the city — something that happened on a large scale in elite Manhattan zip codes. For those who stay in their urban homes, avoiding contact with other people is much easier in a townhouse with a garden than in a one-bedroom apartment in a high-rise housing project. With enough money, people don’t need to use public transportation. They are more likely to be in occupations that allow them to work at home. People working at low-income jobs that weren’t closed down are more likely to be in service jobs that require contact with people. The list of reasons why wealth could be negatively associated with coronavirus rates is long.

Another set of answers involves personal characteristics. For example, the Big Five personality trait known as conscientiousness is statistically associated with qualities such as self-discipline and organization, both of which are useful for realizing good intentions about maintaining social distance and wearing masks. Independently of conscientiousness, some people are highly risk-averse and will behave in ways that reduce their probability of contracting COVID-19. People who are highly extroverted have the opposite problem — quarantine is at war with their natural instincts. And, of course, there is that controversial trait known as general cognitive ability — that thing best measured by IQ tests — which is statistically associated with the ability to think ahead, foresee consequences, and calculate risks, all of which are associated with behaviors that reduce the risk of contracting COVID-19.

Many groupings of people exhibit different means and distributions on these traits — ethnic groups, men and women, people in different demographic settings, people in different occupations. The extent to which the sources of those differences are genes, family environment, or culture isn’t the point. Documented differences exist, and they are likely to be associated with the likelihood that an individual will contract COVID-19. If the differences in probabilities for individuals are big enough, they may also be associated with statistics for zip codes.

In the case of a zip code’s SES, the potential for such associations comes from the significant associations of high conscientiousness with achievement in general and high IQ with career success and financial success.

Because of those associations, the populations of high-SES zip codes are disproportionately on the high end of both conscientiousness and IQ. Note the word disproportionately. Many affluent people are irresponsible and not particularly smart. That doesn’t negate the reality of the statistical profile for large groups of affluent people.

To measure the socioeconomic status of zip codes, I used the same SES index that I used to identify elite “Superzips” in my book, Coming Apart. It combines two indicators: median family income and the percentage of adults with at least a college degree.
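As a rough illustration of how a two-indicator index of this kind can be built, here is a minimal sketch. It assumes the two indicators are standardized and averaged with equal weight; the text above says only that the index combines them, so the weighting, the function names, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def ses_index(median_incomes, pct_college):
    """Average the z-scores of income and college share for each zip code.

    Assumption: equal weighting of the two standardized indicators.
    The article does not spell out the exact formula.
    """
    z_income = z_scores(median_incomes)
    z_college = z_scores(pct_college)
    return [(a + b) / 2 for a, b in zip(z_income, z_college)]

# Hypothetical values for three zip codes (not taken from the article's data).
incomes = [40_000, 70_000, 130_000]   # median family income, dollars
college = [15.0, 35.0, 70.0]          # percent of adults with a college degree
scores = ses_index(incomes, college)  # one index score per zip code
```

Standardizing first keeps income, which is measured in tens of thousands of dollars, from swamping the college percentage when the two are combined.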

I found that higher SES was associated with a lower coronavirus rate in all 12 cities. The size of those negative correlations between coronavirus rate and the SES index varied widely. The correlation was strikingly large (–.77) in Chicago, followed by Washington and San Diego, with correlations of –.58 and –.56 respectively. At the other extreme were Phoenix and Baltimore, with correlations that were effectively zero (–.01 and –.03 respectively). The correlation for Los Angeles was also notably low (–.17). The other cities were spread out along the range from –.25 to –.37.

Does this mean that you can’t generalize about the association of a zip code’s socioeconomic status and its coronavirus rate in America’s large cities? It depends on how you ask the question. If you ignore where the zip codes are and lump them all together, SES explains virtually nothing — the bivariate correlation for all 1,363 zip codes is a small –.10.

But it’s myopic to ignore where the zip code is located. We have to assume that a zip code’s coronavirus rate is not independent of the characteristics of the surrounding city. The statistical method I’m using (see the box) lets me get a more accurate sense of the role of SES by taking the unique characteristics of each city into account. When I do that, the regression coefficient associated with the SES index score is not only statistically highly significant but substantively meaningful, equivalent to a gap of more than a fifth of a percentage point in the coronavirus rate between zip codes at the 10th and 90th SES percentiles.
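The contrast between lumping all zip codes together and taking the city into account can be sketched with made-up numbers. The example below assumes a single predictor and compares a pooled slope with a within-city slope, obtained by subtracting each city's mean before fitting. For one predictor, that gives the same slope as adding a dummy variable per city. The data, city labels, and function names are hypothetical, and the sketch omits the other control variables used in the actual analysis.

```python
from collections import defaultdict
from statistics import mean

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean_by_group(groups, values):
    """Subtract each group's mean from its members' values."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for g, v in zip(groups, values)]

# Made-up data: city B is both higher-SES and harder hit than city A.
cities = ["A", "A", "A", "B", "B", "B"]
ses = [-1.0, 0.0, 1.0, 0.0, 1.0, 2.0]   # SES index score
rate = [0.3, 0.2, 0.1, 0.5, 0.4, 0.3]   # coronavirus rate, percent

# Pooled: ignores which city each zip code belongs to.
pooled_slope = ols_slope(ses, rate)

# Within-city: compares zip codes only to others in the same city.
within_slope = ols_slope(
    demean_by_group(cities, ses),
    demean_by_group(cities, rate),
)
```

In this toy data the within-city slope is about –0.1 while the pooled slope is much closer to zero, because the difference between the two cities masks the pattern inside each one.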

Some technical background

For readers who want details about the analyses that I omit from the main text: The multivariate method is regression analysis. The dependent variable is always the coronavirus rate expressed as a percentage of the population. “White” and “black” always refer to non-Latinos. Persons who self-identify as Latinos can be of any race. In addition to the SES index score and the variables involving ethnicity, the unreported independent variables for all the multivariate analyses are the logged value of the population, the logged value of population density, and a vector of dummy variables for the cities. Bivariate analyses are weighted by the zip code’s population.
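For readers unfamiliar with population weighting, a weighted correlation can be sketched as follows. This is a generic weighted Pearson correlation, not necessarily the exact routine used for the figures reported here; the function name and sample values are hypothetical.

```python
from math import sqrt

def weighted_corr(xs, ys, weights):
    """Pearson correlation in which each observation counts in
    proportion to its weight (here, a zip code's population)."""
    total = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    var_y = sum(w * (y - my) ** 2 for w, y in zip(weights, ys))
    return cov / sqrt(var_x * var_y)

# Made-up example: the third zip code is tiny, so it barely counts.
ses = [0.0, 1.0, 2.0]
rate = [0.0, 1.0, 0.0]
unweighted = weighted_corr(ses, rate, [1, 1, 1])
weighted = weighted_corr(ses, rate, [10_000, 10_000, 1_000])
```

With equal weights the three points show no linear relationship. Once the small zip code is down-weighted, the two large ones dominate and the correlation turns positive.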

The remaining task is to try to understand why taking the city into account makes so much difference. My best guess is that the explanation comes down to a predictable relationship between the coronavirus rate and the size of the gap between low-SES and high-SES zip codes. Think of it this way: The first few cases in a city are likely to have a high degree of randomness. As the number of cases rises, whatever underlying forces are at work will have more of a chance to influence the coronavirus rate. Suppose, for example, that riding public transportation every day is a major risk factor. People in poor neighborhoods use public transportation more than people in rich neighborhoods. With just a handful of cases citywide, there’s no way that the causal effect is measurable. The larger the number of coronavirus cases, the more likely it is that the causal effect will surface.

The implication is that the cities where SES plays a small role now are likely to see the role of SES increase as their coronavirus rates increase. If places with low coronavirus rates such as Phoenix, Baltimore, and Los Angeles continue to show low correlations between coronavirus rates and the SES index despite mounting coronavirus cases, then we can start to look into the possibility that local cultural, economic, or policy factors are working against the link. But as of now, the data are more consistent with an expectation that the correlations will rise in tandem with the coronavirus rates.

Ethnicity and coronavirus rates

Ethnically, none of the 12 cities has a white majority. Based on the zip codes in their metropolitan areas, Miami-Fort Lauderdale and San Antonio are majority Latino, Los Angeles is half Latino, and four others are at least a third Latino. The percentages of African Americans span a range from just 5 percent in San Diego and Phoenix to more than 40 percent in Baltimore and Atlanta. Asian Americans amounted to 15 percent in Los Angeles and 12 percent in San Diego and Washington.

As coronavirus cases have grown, so has evidence that African Americans and other minorities have suffered disproportionately. Understandably, those statistics have been accompanied by allegations that the direct and indirect effects of racism are to blame.

The relationship of ethnicity to coronavirus rates is indisputable, but so is the relationship of ethnicity to SES. For the zip codes in the dozen cities, the correlation between the SES index and the percentage of whites in the population was an extremely large +.76. The comparable correlations of the SES index with the percentages of blacks and Latinos were also sizable but negative: –.30 and –.64 respectively. It’s a decades-old question for all things associated with ethnic disparities: Are we looking at the effects of ethnicity or the effects of socioeconomic status?

The statistical way of dealing with that question is to enter both the SES index and measures of population ethnicity into the same equation. The problem is that the sample is made of zip codes instead of individuals, and we are not going to see any change in group means because the percentage of a given ethnic group goes from, say, 5 percent of the population to 10 percent. Whatever the effects of ethnicity on coronavirus rates may be, we can be sure they are not decisively large and therefore will not be detected except when the group in question is close to ethnically homogeneous. Added to that is another consideration: Insofar as an ethnicity has a distinctive culture that in turn might be related to coronavirus rates, it is most likely to play an important local role when the population is made up overwhelmingly of people of that culture — a truism about culture and ethnic homogeneity that applies to all ethnicities, including non-Latino whites.

I tried alternative operational definitions of “overwhelmingly of the same ethnicity,” but they all told effectively the same story. I report the results when the operational definition of “overwhelmingly” is 80 percent of the population. Of the 1,363 zip codes in the dozen cities, 96 were overwhelmingly white by this definition, 57 were overwhelmingly black, 80 were overwhelmingly Latino, and 1,130 did not meet the “overwhelming” criterion for any ethnic group.
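The classification rule can be sketched in a few lines. The sketch assumes "80 percent" means at least 80 percent, and the function name, group labels, and sample shares are hypothetical.

```python
def dominant_group(shares, threshold=80.0):
    """Return the ethnic group whose population share meets the
    threshold, or "mixed" if no group does.

    Assumption: the cutoff is inclusive (>= 80 percent).
    """
    for group, pct in shares.items():
        if pct >= threshold:
            return group
    return "mixed"

# Hypothetical zip codes, shares in percent of the population.
zip_a = {"white": 85.0, "black": 5.0, "latino": 10.0}
zip_b = {"white": 50.0, "black": 30.0, "latino": 20.0}

label_a = dominant_group(zip_a)  # "white"
label_b = dominant_group(zip_b)  # "mixed"
```

Because the threshold is well above 50 percent, at most one group can qualify in any zip code, so the order in which groups are checked does not matter.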

Here are the results of the multivariate analysis:

A standardized regression coefficient is comparable to an “effect size” in the analysis of experiments or social programs. In this case, the coefficient of –.18 for the SES index score means that an increase of a standard deviation in a zip code’s SES index score is associated with a reduction of 0.18 standard deviations in the zip code’s coronavirus rate even after taking the ethnic composition of the zip code into account along with other control variables (see the box). For the ethnic variables, the coefficients are relative to a reference group — in this case, the 1,130 zip codes that did not meet the “overwhelming” criterion for any ethnic group.
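The link between a raw regression coefficient and a standardized one is a simple rescaling: multiply the raw coefficient by the standard deviation of the predictor and divide by the standard deviation of the outcome. A minimal sketch with made-up numbers follows; the function name and data are hypothetical.

```python
from statistics import pstdev

def standardized_coefficient(raw_coef, xs, ys):
    """Rescale a raw slope so it reads as 'standard deviations of y
    per one standard deviation of x'."""
    return raw_coef * pstdev(xs) / pstdev(ys)

# Made-up data lying exactly on a line with slope -0.1.
ses = [0.0, 1.0, 2.0]    # SES index score
rate = [0.3, 0.2, 0.1]   # coronavirus rate, percent

beta = standardized_coefficient(-0.1, ses, rate)
```

In this toy case the points fall exactly on a downward line, so the standardized coefficient is –1, the strongest possible negative value for a single predictor. A coefficient of –.18 describes a far looser relationship.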

Translating these results into English: SES maintains its significant association with lower coronavirus rates even after ethnicity has been taken into account. The magnitude of SES’s independent effect is small by the customary interpretation of effect sizes. White zip codes are also associated with modestly lower coronavirus rates independently of SES. These data don’t support a significant effect involving Latino zip codes. For African American zip codes, a very small effect does reach statistical significance.

If you’re looking for clarity, it is an unsatisfying set of results. It is correct that disturbingly disproportionate numbers of blacks and Latinos have contracted COVID-19, but you cannot use the zip code analyses to support claims that discrimination and racism are the culprit. And yet it remains true that overwhelmingly white zip codes do have systematically lower coronavirus rates than black or Latino zip codes — not by a lot, but it is an advantage for white zip codes that persists when you look at the data from multiple perspectives that I have not reported here.

The bottom line is a completely unoriginal finding. Social scientists first got the capability to analyze multivariate statistical relationships for large samples in the mid-1960s. In the subsequent 55 years, they have explored the relative effects of self-identified ethnicity and SES for a broad array of indicators involving health, social behavior, labor market behavior, educational success, and economic success. With few exceptions, these analyses have found that different ethnic groups have different outcomes that are partially but not fully explained by SES. And so it is with coronavirus rates. So far, the independent role of ethnicity in the pandemic is unusual only in being smaller than its role in many other persisting ethnic differences.

