Previous studies have shown that visual information is a crucial input in early language learning. In the present study we examine what type of visual input helps preschoolers acquire nonnative phonological contrasts. Catalan/Spanish-speaking children (4–5 years, N = 47) participated in a task assessing their phonological discrimination abilities before and after training. Three training conditions were presented: one with clear oral/visual speech information, one with an ostensive object–sound mapping, and one with a rich social interaction. Children’s looking patterns were tracked to examine their focus of interest during training. Results revealed that preschoolers’ discrimination abilities increased in all training conditions, but the condition in which the speaker created an ostensive object–sound mapping led to higher long-term gains (especially for younger children). Eye-tracking results further showed that children looked at the object of reference while being exposed to the novel phonological input, which may explain the higher learning gains in this condition. Our results indicate that preschoolers’ learning of nonnative phonological contrasts is particularly boosted when the speech input is accompanied by an object of reference that is signaled ostensively and contingently in the visual space, compared to when the visual space contains only clear oral/visual speech information or social interactivity cues.