Although learning second language phonology is a difficult task, orthographic input may support the learning of difficult sound contrasts through a process known as orthographic facilitation. We extended this research by examining the effects of orthographic input together with individual differences in three phonological learning processes, namely, the production, perception, and memorization of words containing three Marathi phonemic contrasts (i.e., [k-kh], [], and []) by native English speakers. Moreover, because the [] and [] contrasts were particularly challenging in previous auditory training studies (e.g., Polka, 1991), we used cross-modal training to enhance learning by pairing auditory perception tasks with visual orthographic information, amplification of relevant acoustic cues, and proprioceptive descriptions of the articulation of target phonemes. Results showed significant learning from pretest to posttest across tasks and contrasts, supporting the effectiveness of cross-modal training. Furthermore, incongruent orthographic input could inhibit perception, whereas orthographic input generally supported memory for word pronunciations. In addition, individual differences in phonological skills and nonspeech auditory discrimination predicted participants’ success in different phonological learning processes. These results provide a detailed picture of the complex relationships between different aspects of second language phonological learning and cross-modal training.