Most U.S. school systems group children into one-year cohorts
based on a cutoff date, usually August 31 / September 1. A recent study of 407,846 children published
in the New England Journal of Medicine (NEJM) showed that the older
children within each class are about 30% less likely than their younger classmates to be labeled as having attention
deficit–hyperactivity disorder (ADHD).

The NEJM article primarily compared each month to neighboring months,
but I believed that a more holistic look at the data could show stronger
evidence. Using a table of data from
the article, I made this graph using R.
Blue columns show the rate of ADHD diagnosis by birth month. The oldest students, at left, have birthdays
in September. The graph also shows a red regression line and orange 95% error
bars for each month, based on a binomial distribution given each month's sample
size.
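
As a rough illustration of how such error bars can be computed, here is a minimal sketch in Python (not the R used for the graph). It uses the normal approximation to the binomial, which is an assumption made here for simplicity; the counts are hypothetical and are not taken from the article's table.

```python
import math

def binomial_ci(cases, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation to the binomial)."""
    p = cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical month: 250 diagnoses among 34,000 children (NOT the article's figures).
low, high = binomial_ci(250, 34_000)
print(f"rate per 10,000: {250 / 34_000 * 1e4:.1f} "
      f"(95% interval {low * 1e4:.1f} to {high * 1e4:.1f})")
```

The interval narrows as the month's sample size grows, which is why large monthly samples give short error bars.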

To put this in narrative form, it is not so much that the youngest
(August birthday) children have elevated ADHD rates, as that the older half of
the class, on the left, has progressively lower ADHD rates the older the children are. It appears that about a third of the oldest children
have matured out of the level of behavior which would result in an ADHD
diagnosis. Teachers and pediatricians
might wish to take this into account, especially before concluding that a child
in the younger half of their class has ADHD, at least in borderline cases.

The younger half of the class, at right, shows a less clear trend. This nonlinearity is shown by the red
regression line, which slopes upward but curves downward, flattening toward the right. Of course, humans are quick to notice patterns, and
random effects may look like a pattern.
To check whether these patterns are statistically significant, I ran a
regression on both the linear and squared terms. It showed strong
significance, with *p* < .001 for the upward-sloping linear term and *p* = .001 for the squared term (the downward curve). Further analysis, taking into account that the sampling error of each month's measurement is smaller than the months' apparent deviation from one another, brought *p* << .001.

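
For readers who want to see what a fit with linear and squared terms looks like, here is a minimal sketch in Python rather than R. It solves the least-squares normal equations directly and does not compute p-values. The function name `fit_quadratic` and the monthly rates are hypothetical, chosen only to resemble a rising, flattening curve.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    # Design-matrix rows: [1, x, x^2].
    design = [[1.0, x, x * x] for x in xs]
    # Augmented 3x4 system (X^T X | X^T y).
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(3)]
        + [sum(d[i] * y for d, y in zip(design, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(3):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Month index 0 = September (oldest) through 11 = August (youngest).
months = list(range(12))
# Hypothetical diagnosis rates per 10,000 (NOT the article's figures).
rates = [64, 66, 69, 72, 74, 77, 79, 80, 82, 82, 83, 83]

a, b, c = fit_quadratic(months, rates)
print(f"intercept={a:.2f}, linear={b:.3f}, squared={c:.4f}")
```

A positive linear coefficient together with a negative squared coefficient corresponds to a line that slopes upward and curves downward.
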
Recent Twitter correspondence with coauthor Timothy Layton provided a
plausible explanation for the flattening or downturn on the right side of the
graph: Children born in the summer (the youngest children in the cohort) are
more likely to be held back a year, and thus to become the oldest children in a
new cohort – especially, one would think, if they exhibit ADHD or other behavior
considered less than mature. This
holding back may replace an ADHD diagnosis as a solution to behavioral issues,
and/or may reduce later ADHD diagnoses as the child is now compared to a younger,
less mature cohort.

Apart from the subject matter in this case, this analysis presented
some interesting exercises in understanding the uses of data:

- that is summarized;
- that is categorized or grouped by range;
- where the sampling error of the measured samples is smaller than their apparent deviation when compared to each other; and
- where Monte Carlo simulations may prove helpful.
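
As one hedged example of the last point, the Python sketch below (again, not the R code behind this post) simulates monthly rates under the assumption that every month shares one true rate, then counts how often chance alone produces a slope as steep as the observed one. The rates, the per-month sample size, and the helper names are all hypothetical. Each simulated rate is drawn from a normal approximation to the binomial, which is a simplification.

```python
import math
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def simulated_p_value(rates, n_per_month, trials=2000, seed=1):
    """Share of flat-rate simulations whose slope is at least as steep as the observed one."""
    rng = random.Random(seed)
    months = list(range(len(rates)))
    observed = abs(slope(months, rates))
    pooled = sum(rates) / len(rates)  # null hypothesis: one rate for every month
    sd = math.sqrt(pooled * (1 - pooled) / n_per_month)  # sampling error of one month's rate
    hits = 0
    for _ in range(trials):
        fake = [rng.gauss(pooled, sd) for _ in months]  # normal approximation to binomial
        if abs(slope(months, fake)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical rates per 10,000 converted to proportions, and a hypothetical sample size.
rates = [r / 10_000 for r in (64, 66, 69, 72, 74, 77, 79, 80, 82, 82, 83, 83)]
p_sim = simulated_p_value(rates, n_per_month=34_000)
print(f"simulated two-sided p-value: {p_sim}")
```

The resolution of such an estimate is limited by the number of trials: with 2,000 trials, anything rarer than about 1 in 2,000 simply shows up as zero.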

In future posts I will cover these subjects and the R code I've used to analyze
them.