Wednesday, May 20, 2020

Graphing COVID-19


There's been a lot of debate online about how various countries have fared in the face of the COVID-19 pandemic.  Often the countries chosen are cherry-picked, or the measure of deaths (i.e., totals, not rates) says more about the size of the countries' populations than their success or failure in any meaningful sense.

There are just 8 major countries with higher reported death rates than the US.  But the fact that all of them are well-functioning economies and democracies in Western Europe would tend to refute the notion that economic or political backwardness is the key to their high death rates.

In my opinion, the key disadvantages that the US and Western Europe have, relative to the Third World and islands like New Zealand, is that the former are so large and interconnected (in terms of travel).  A Western reluctance to wear masks is also a major disadvantage relative to East Asian countries.

To illustrate such comparisons, I've written an R script to download the latest daily data from the European Centre for Disease Prevention and Control (ECDC), and turn daily death rates for selected countries into a line graph, such as here (click on graph for higher resolution; the graph is followed by my R script):




# R Code to download and graph ECDC historical daily data
# David Pittelli -- woodedpaths.blogspot.com
# https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

# Load packages************
library(ggplot2)
library(RColorBrewer)

# Download data from web************
df = read.csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", na.strings = "", fileEncoding = "UTF-8-BOM", stringsAsFactors=FALSE)

# Subset rows and add columns ***********************

# eliminate NAs in countryterritoryCode
df = df[!is.na(df$countryterritoryCode),]

# Add deaths per 100 Million column
df$DPC = (df$deaths * 100000000) / df$popData2018

# Add 7-day moving average for deaths per 100 Million
# vectors are reversed since MA can only look upwards (or in both directions)
df$DPCMA = rev(filter(rev(df$DPC), rep(1/7,7), sides = 1))

# Add date column in format 'mm/dd/yyyy'
df$date2 = as.Date(df$dateRep, "%d/%m/%Y")

# Select Countries and dates to graph***********************
df2 = df[df$geoId == "ES" | df$geoId == "IT" | df$geoId == "UK" | df$geoId == "SE" | df$geoId == "US" | df$geoId == "DE" |  df$geoId == "KR"|  df$geoId == "NZ",]

# Just look since March 1
df2 = df2[df2$date2 > as.Date("2020-02-29"), ]


# Today's Date to Label Graph****************
today = paste(substr(date(), 5, 10), substr(date(), 21, 24), sep = ", ")

# Manual color selection****************
ggplotColours <- 360="" function="" h="c(0," n="6," o:p="">
  if ((diff(h)%%360) < 1) h[2] <- -="" 360="" h="" n="" o:p="">
    hcl(h = (seq(h[1], h[2], length = n)), c = 100, l = 65)
}

# Build plot****************
p = ggplot(data = df2, aes(x = date2, y = DPCMA, group = geoId, color = geoId)) +
     geom_line(size = 2) +
      theme_bw() +
      ggtitle(paste("Daily Death Rate from COVID-19, 7-Day smoothing,", today)) +
      theme(plot.title = element_text(size = 24)) +
      theme(axis.title.x = element_text(size = 14)) +
      theme(axis.title.y = element_text(size = 14)) +
      theme(legend.text = element_text(size = 14)) +
      xlab("Date") + ylab("Daily Deaths per 100 Million People") +
      theme(legend.title=element_blank()) +
    annotate("text", x = as.Date(18340, origin="1970-01-01"), y = 1800, label="David Pittelli -- woodedpaths.blogspot.com\nData from ecdc.europa.eu", col = "black")

# Assign colors and legend labels (or just type "p" for standard colors & 2-letter abbreviations)
p +  scale_colour_manual(breaks=c("ES", "IT", "UK", "SE", "US", "DE", "KR", "NZ"),
        labels=c("Spain", "Italy", "UK", "Sweden", "US", "Germany", "S. Korea", "New Zealand"),
        values =c(ggplotColours(7), "black"))