A statistical estimate of infant and toddler vocabulary size from CDI analysis
For the last 20 years, developmental psychologists have measured variability in the lexical development of infants and toddlers using the MacArthur-Bates Communicative Development Inventories (CDIs), the most widely used parental report forms for assessing language and communication skills at these ages. We show that CDI reports can do more than assess a child's language development relative to other infants and toddlers: they can also serve as a basis for estimating infants’ and toddlers’ total vocabulary sizes. We investigate the link between estimated total vocabulary size and raw CDI scores from a mathematical perspective, using both single developmental trajectories and population data. The method capitalizes on robust regularities, such as the overlap of individual vocabularies observed across infants and toddlers, and takes into account both shared and idiosyncratic knowledge. This statistical approach enables researchers to approximate the total vocabulary size of an infant or toddler from her raw MacArthur-Bates CDI score. Using the model, we propose new normative data for productive and receptive vocabulary in early childhood, as well as a table that relates individual CDI measures to realistic lexical estimates. The correction required to estimate total vocabulary is non-linear, with a far greater impact at older ages and higher CDI scores. We therefore suggest that studies correlating developmental indices with language skills use vocabulary size as estimated by the model rather than raw CDI scores.