Most U.S. school systems group children together in one-year cohorts based on a cutoff date, usually August 31 / September 1. For those school systems, the NEJM article looked at rates of ADHD diagnosis for all of the children, grouped by month of birth. The analysis primarily compared ADHD rates for adjacent months, as here:
The graphic above shows that the rate difference between August-born children and September-born children is statistically significant (p < .05; note the 95% error bar clearing the dotted “zero” line), but that no other adjacent months show a statistically significant difference.
I believed that a more holistic look at the data could provide stronger evidence. Using the table of data from above, I made a graph using R. In the graph below, blue columns show the rate of ADHD diagnosis by birth month. The oldest students, at left, have birthdays in September. The graph also shows a red curved regression line and orange 95% error bars for each month, computed from a binomial distribution using each month's sample size.
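For readers who want to see how a per-month error bar of this kind can be computed, here is a minimal sketch in Python (not the R code used for the graph). It assumes the common normal approximation to the binomial; the function name `binomial_error_bar` and the example counts are hypothetical and are not taken from the study.

```python
import math

def binomial_error_bar(cases, n, z=1.96):
    """Return (rate, lower, upper) for a proportion.

    Uses the normal approximation to the binomial distribution;
    z=1.96 corresponds to an approximate 95% interval.
    """
    rate = cases / n
    std_err = math.sqrt(rate * (1 - rate) / n)
    return rate, rate - z * std_err, rate + z * std_err

# Hypothetical month: 500 diagnoses among 10,000 children (made-up numbers).
rate, lower, upper = binomial_error_bar(500, 10_000)
print(f"rate={rate:.4f}, 95% interval=({lower:.4f}, {upper:.4f})")
```

Because the standard error shrinks with the square root of the sample size, months with many children get narrow bars even when the rates of neighboring months differ visibly.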
To put this in narrative form, it is not so much that the youngest (August birthday) children have elevated ADHD rates, as that the older half of the class, at left, has progressively lower ADHD rates with age. It appears that about a third of the oldest children have matured out of the level of behavior that would result in an ADHD diagnosis. Teachers and pediatricians might wish to take this into account, at least in borderline cases, before concluding that a child in the younger half of his class has ADHD.
The younger half of the class, at right, shows a less clear trend. This nonlinearity is captured by the curved regression line, which slopes upward and curves downward. Of course, humans notice patterns, and random effects may look like a pattern. To test whether these patterns are statistically significant, I ran a regression with both a linear and a squared term. It showed strong significance, with p < .001 for the linear term (the upward slope) and p = .001 for the squared term (the downward curve). Further analysis, taking into account that the sampling error of each month's measured rate is smaller than the apparent deviation of the months from one another, brought the p-value far below .001.
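The shape described here (upward slope, downward curve) corresponds to fitting rate = a + b·month + c·month² and finding b > 0 and c < 0. The sketch below, in plain Python rather than R, fits such a curve by ordinary least squares. It only recovers the coefficients and does not compute p-values; the function name `fit_quadratic`, the month indexing (0 = September through 11 = August), and the example rates are all assumptions made for illustration, not the study's data.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for the columns [1, x, x^2].
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - rest) / m[i][i]
    return tuple(beta)

# Made-up rates (percent) that rise and then flatten, for illustration only.
months = list(range(12))  # 0 = September (oldest) ... 11 = August (youngest)
rates = [4.0, 4.3, 4.6, 4.9, 5.1, 5.3, 5.4, 5.5, 5.6, 5.6, 5.7, 5.7]
a, b, c = fit_quadratic(months, rates)
print(f"intercept={a:.3f}, linear={b:.3f}, squared={c:.4f}")
```

A statistics package would additionally report standard errors and p-values for the linear and squared terms, which is what the significance figures above refer to.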
Recent Twitter correspondence with coauthor Timothy Layton provided a plausible explanation for the flattening on the right side of the graph: Children born in the summer are more likely to be held back a year, and thus to become the oldest children in a new cohort – especially if they exhibit less mature behavior. This holding back may replace an ADHD diagnosis as a solution to behavioral issues, and/or may reduce later ADHD diagnoses as the child is now compared to a younger, less mature cohort.
Apart from its subject matter, this analysis presented some interesting exercises for understanding the use of data:
- that is categorized or grouped by range;
- where the sampling error of the measured samples is smaller than their apparent deviation when compared to each other; and
- where Monte Carlo simulations may prove helpful.
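As one illustration of how a Monte Carlo simulation can help with grouped data like this, the sketch below asks how often month-to-month differences as large as the observed ones would arise if every month shared the same underlying rate. It is an assumed approach written in Python, not a reconstruction of the analysis behind this post: the test statistic (highest minus lowest monthly rate), the normal approximation to binomial sampling, the function names, and the example counts are all hypothetical choices.

```python
import math
import random

def simulate_null_spreads(sizes, pooled_rate, n_sims=2000, seed=0):
    """Simulate max-minus-min monthly rates when all months share one rate."""
    rng = random.Random(seed)
    spreads = []
    for _ in range(n_sims):
        rates = []
        for n in sizes:
            # Normal approximation to a binomial draw (assumes large n).
            sd = math.sqrt(n * pooled_rate * (1 - pooled_rate))
            rates.append(rng.gauss(n * pooled_rate, sd) / n)
        spreads.append(max(rates) - min(rates))
    return spreads

def monte_carlo_p_value(cases, sizes, n_sims=2000, seed=0):
    """Share of simulated spreads at least as large as the observed spread."""
    pooled_rate = sum(cases) / sum(sizes)
    observed_rates = [c / n for c, n in zip(cases, sizes)]
    observed = max(observed_rates) - min(observed_rates)
    spreads = simulate_null_spreads(sizes, pooled_rate, n_sims, seed)
    exceed = sum(1 for s in spreads if s >= observed)
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_sims + 1)

# Made-up counts: 12 months of 100,000 children with steadily rising diagnoses.
sizes = [100_000] * 12
cases = [4000 + 200 * i for i in range(12)]
print(monte_carlo_p_value(cases, sizes))
```

With samples this large, the simulated spreads are tiny compared with the made-up observed spread, which is the situation described in the second bullet: the sampling error of each month is much smaller than the months' apparent deviation from one another.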