This online study investigates how first (L1) and foreign language (LX) users of Mandarin, as well as naïve (L0) listeners, perceive the valence and arousal level of a Chinese interlocutor across communication modalities. In total, 1,485 participants (651 L1, 292 LX, and 542 L0 Mandarin users) were presented with 12 recordings of a Chinese actor conveying emotional events in the visual-vocal-verbal, vocal-verbal, visual-only, or vocal-only modality. Valence and arousal perceptions were collected via the 2DAFS (Lorette, 2021). Setting aside the vocal-only modality, which led to neutral perceptions, bootstrapped regression models suggest that modality does not affect L1 users' valence perceptions. By contrast, LX and L0 users perceive markedly more neutral valence levels in the absence of visual cues and, for positive stimuli, slightly lower arousal levels. These findings call for a more nuanced conceptualisation of valence and arousal as universal features of emotions and stress the significance of modality for intercultural communication.