Most U.S. school systems group children into one-year cohorts based on a cutoff date, usually August 31 or September 1. A recent study of 407,846 children published in the New England Journal of Medicine (NEJM) showed that the oldest children within each class are about 30% less likely than the youngest to be labeled as having attention deficit–hyperactivity disorder (ADHD).
The NEJM article primarily compared each month to neighboring months, but I believed that a more holistic look at the data could show stronger evidence. Using a table of data from the article, I made this graph in R. Blue columns show the rate of ADHD diagnosis by birth month. The oldest students, at left, have birthdays in September. The graph also shows a red regression line and orange 95% error bars for each month, based on a binomial distribution with each month's sample size.
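For readers who want to see how binomial error bars of this kind can be computed, here is a minimal sketch in base R. The counts below are hypothetical placeholders, not the values from the article's table, and the normal approximation to the binomial is my assumption about a reasonable way to get 95% intervals; it is not necessarily how the graph above was produced.

```r
# Hypothetical placeholder data (NOT the article's table):
# children and ADHD diagnoses per birth month, oldest (Sep) to youngest (Aug).
months <- c("Sep", "Oct", "Nov", "Dec", "Jan", "Feb",
            "Mar", "Apr", "May", "Jun", "Jul", "Aug")
n      <- rep(34000, 12)
cases  <- c(215, 225, 232, 240, 248, 255, 262, 268, 272, 275, 277, 278)

rate <- cases / n
se   <- sqrt(rate * (1 - rate) / n)   # binomial standard error of each rate
z    <- qnorm(0.975)                  # about 1.96 for a 95% interval
lower <- rate - z * se
upper <- rate + z * se

# Blue columns with orange error bars, scaled to diagnoses per 10,000 children.
scale <- 1e4
mids <- barplot(rate * scale, names.arg = months, col = "steelblue",
                ylim = c(0, max(upper) * scale * 1.1),
                ylab = "ADHD diagnoses per 10,000 children")
arrows(mids, lower * scale, mids, upper * scale,
       angle = 90, code = 3, length = 0.04, col = "orange")
```

With samples this large, the intervals are narrow relative to the spread between months, which is the point the error bars are meant to make.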
To put this in narrative form, it is not so much that the youngest (August birthday) children have elevated ADHD rates as that the older half of the class, on the left, has progressively lower ADHD rates. It appears that about a third of the oldest have matured out of the level of behavior that would result in an ADHD diagnosis. Teachers and pediatricians might wish to take this into account, especially before concluding that a child in the younger half of the class has ADHD, at least in borderline cases.
The younger half of the class, at right, shows a less clear trend. The red regression line captures this nonlinearity: it slopes upward but curves downward. Of course, humans are quick to notice patterns, and random variation can look like one. To test whether these patterns are statistically significant, I ran a regression with both a linear and a squared term. Both were strongly significant, with p < .001 for the linear term (the upward slope) and p = .001 for the squared term (the downward curve). Further analysis, taking into account that the sampling error of each month's rate is smaller than the apparent month-to-month scatter, brought p << .001.
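As an illustration of a regression with linear and squared terms, here is a short sketch in R. The data are the same hypothetical placeholders as before, not the article's figures, and `month_index` is a name I introduce for the position of each birth month in the school year. The second model, a binomial `glm` on the raw counts, is one possible way to let the known sample sizes determine the uncertainty; I am offering it as an assumption about how such an analysis could be done, not as the method actually used here.

```r
# Hypothetical placeholder data (NOT the article's table).
n     <- rep(34000, 12)
cases <- c(215, 225, 232, 240, 248, 255, 262, 268, 272, 275, 277, 278)
rate  <- cases / n
month_index <- 1:12   # 1 = September (oldest), 12 = August (youngest)

# Ordinary regression on the twelve monthly rates, with a squared term.
# Here the uncertainty is estimated from the scatter of the points themselves.
fit_lm <- lm(rate ~ month_index + I(month_index^2), weights = n)
print(summary(fit_lm)$coefficients)

# Binomial regression on the counts: the uncertainty comes from the
# sample sizes rather than from the scatter of the twelve points.
fit_glm <- glm(cbind(cases, n - cases) ~ month_index + I(month_index^2),
               family = binomial)
print(summary(fit_glm)$coefficients)
```

In both summaries, the row for `month_index` corresponds to the upward slope and the row for `I(month_index^2)` to the curvature.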
Recent Twitter correspondence with coauthor Timothy Layton provided a plausible explanation for the flattening or downturn on the right side of the graph: Children born in the summer (the youngest children in the cohort) are more likely to be held back a year, and thus to become the oldest children in a new cohort – especially, one would think, if they exhibit ADHD or other behavior considered less than mature. This holding back may replace an ADHD diagnosis as a solution to behavioral issues, and/or may reduce later ADHD diagnoses as the child is now compared to a younger, less mature cohort.
Apart from the subject matter in this case, this analysis offered some interesting exercises in working with data:
- that is summarized;
- that is categorized or grouped by range;
- where the sampling error of the measured samples is smaller than their apparent deviation when compared to each other; and
- where Monte Carlo simulations may prove helpful.
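To give a flavor of the last item, here is a rough Monte Carlo sketch in R. It simulates many sets of monthly counts under the assumption that birth month has no effect, then asks how often chance alone produces a linear trend as steep as the observed one. The counts are hypothetical placeholders rather than the article's data, and the helper `slope_of` and the choice of 2,000 simulations are mine, for illustration only.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical placeholder data (NOT the article's table).
n     <- rep(34000, 12)
cases <- c(215, 225, 232, 240, 248, 255, 262, 268, 272, 275, 277, 278)
month_index <- 1:12

# Slope of a straight-line fit to the monthly rates.
slope_of <- function(k) unname(coef(lm(k / n ~ month_index))[2])

observed <- slope_of(cases)
pooled   <- sum(cases) / sum(n)   # single rate under "no month effect"

# Simulate counts under the null and record the fitted slope each time.
sims <- replicate(2000, slope_of(rbinom(12, size = n, prob = pooled)))

# Two-sided Monte Carlo p-value, with +1 so it can never be exactly zero.
p_mc <- (1 + sum(abs(sims) >= abs(observed))) / (1 + length(sims))
print(p_mc)
```

The smallest p-value this can report is 1 divided by the number of simulations plus one, so showing a very small p-value this way requires a correspondingly large number of runs.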
In future posts I will cover these subjects and the R code I've used to analyze them.