Visual and auditory information jointly contribute to face categorization processes in humans, and gender is a socially relevant multisensory category specified by faces and voices that is detected early in infancy. We used an eye tracker to study how gender coherence across the auditory and visual modalities influences face scanning in 9‐ to 12‐month‐old infants and in adults. While viewing dynamic faces, infants attended to a speaker's mouth region to a greater extent than adults did, regardless of speech, an effect driven mostly by longer mean fixation durations. However, the time course of attention to the eye and mouth regions was similar in adults and infants. Face–voice congruence for gender had little effect on measures of face scanning. Overall, the results suggest that 9‐ to 12‐month‐old infants give more weight than adults to processing a speaker's mouth, but that infants already show an adult‐like face‐scanning strategy.