We argue that a multimodal approach is essential for defining, and so for grasping the meanings of, the depictive class of words that linguists call ‘ideophones’. Our argument rests on the formal properties of Pastaza Quichua ideophones, which set them apart from the non-ideophonic lexicon, and on the cultural assumptions that speakers bring to their use. We analyze deficiencies in past attempts to define this language’s ideophones, which relied on audio data alone. We offer instead an audiovisual corpus, which we call an ‘antidictionary’ because it defines words not with other words but with clips featuring actual contexts of use. The major discovery to emerge from studying these clips is that ideophones’ meanings can be clarified by a distinction found in studies of modality and American Sign Language. This distinction, between speaker-internal and speaker-external perspective, is evident in the intonational and gestural details of ideophones’ use.