The primary aim of this study was to evaluate the effect of auditory feedback in a VR system planned for clinical use and to address the factors that should be taken into account when building a bimodal virtual environment (VE). We conducted an experiment in which we assessed spatial performance in agoraphobic patients and normal subjects, comparing two kinds of VE, visual alone (Vis) and auditory–visual (AVis), in separate sessions. Subjects were equipped with a head-mounted display coupled with an electromagnetic sensor system and immersed in a virtual town. Their task was to locate different landmarks and become familiar with the town. In the AVis condition, subjects wore the head-mounted display together with headphones, which delivered a soundscape updated in real time according to their movement in the virtual town. While overall performance remained comparable across the two conditions, the reported feeling of immersion was more compelling in the AVis environment. However, patients exhibited more cybersickness symptoms in this condition. The results of this study point to the multisensory integration deficit of agoraphobic patients and underline the need for further research on multimodal VR systems for clinical use.