Auditory verbal hallucinations are a cardinal feature of schizophrenia. Their pathophysiology is unclear; one model proposes that they occur because self-generated inner speech is misperceived as externally generated speech (Frith & Done, 1989) as a result of a failure to recognise the internal nature of the former. Another theory suggests that a primary generator of activity within the auditory cortex (similar to an epileptiform focus) gives rise to these hallucinations (David, 1994). Recent neuroimaging studies suggest that both speech generation and perception areas are activated during auditory verbal hallucinations (Dierks et al, 1999; Lennox et al, 1999; Shergill et al, 2000), but the sequence in which these areas are activated remains unclear. One case study suggested that activation of the temporal cortex was evident 3 s before the reporting of auditory verbal hallucinations (Lennox et al, 1999), and patients with Charles Bonnet syndrome demonstrated visual cortical activation preceding the perception of visual hallucinations by 12 s (Ffytche et al, 1998).
METHOD
We successfully studied two male dextral patients with DSM–IV schizophrenia. Both were experiencing frequent and intermittent auditory verbal hallucinations. We screened six other patients, but three failed to hallucinate in the scanner, and the pattern of the reported hallucinations did not permit examination of the time course in the other three (as their epochs of hallucinations were not separated by the required minimum of 9 s). The first participant was 47 years old with a 22-year history of illness, and was being treated with clozapine, amisulpride and sodium valproate. The second was 26 years old, had a 6-year history of illness and was being treated with olanzapine. In both cases the hallucinations involved people making derogatory remarks to the patient, the majority expressed in the second person. All eight patients gave informed consent to participate in the study, which was approved by the local ethics committee.
Image acquisition and analysis
Participants were scanned at rest (while they were intermittently hallucinating). They were asked to press a button with their left index finger at the onset of a hallucination and to release the button when it stopped. This was repeated for every hallucination they experienced during the 5 min session. Gradient-echo echoplanar magnetic resonance (MR) images were acquired using a 1.5 tesla GE Signa System (General Electric, Milwaukee, WI, USA) fitted with Advanced NMR hardware and software (ANMR, Woburn, MA, USA) at the Maudsley Hospital, London. In each of 14 non-contiguous planes parallel to the intercommissural (anterior–posterior) plane, 100 T2*-weighted MR images depicting blood oxygen level-dependent (BOLD) contrast were acquired, with time to echo 40 ms, time to repetition 3000 ms, in-plane resolution 3.1 mm, slice thickness 7 mm and slice skip 0.7 mm in a 5 min run. At the same session a 43-slice, high-resolution inversion recovery echoplanar image of the whole brain was acquired in the intercommissural plane (time to echo 73 ms, inversion time 180 ms, time to repetition 16 000 ms, in-plane resolution 1.5 mm, slice thickness 3 mm).
The data were first realigned to minimise motion-related artefacts (Bullmore et al, 1999), corrected for slice timing and smoothed using a Gaussian filter (full-width half-maximum 7.2 mm). Responses to the experimental paradigms were then detected by time-series analysis using gamma variate functions (peak responses at 4 s and 8 s) to model the BOLD response. The analysis was implemented as follows (Brammer et al, 1997). First, each experimental condition was convolved separately with the 4 s and 8 s Poisson functions to yield two models of the expected haemodynamic response to that condition. The weighted sum of these two convolutions that gave the best fit (least squares) to the time series at each voxel was then computed. Following this fitting operation, a goodness-of-fit statistic was computed at each voxel. This was the ratio of the sum of squares of deviations from the mean intensity value due to the model (fitted time series) divided by the sum of squares due to the residuals (original time series minus model time series). This statistic is called the sum of squares (SSQ) ratio. In order to sample the distribution of SSQ ratio under the null hypothesis that observed values of SSQ ratio were not determined by experimental design (with minimal assumptions), the time series at each voxel was permuted using a wavelet-based resampling method described in detail by Bullmore et al (2001). This process was repeated ten times at each voxel, resulting in ten permuted parametric maps of SSQ ratio at each plane for each participant. Combining the randomised data over all voxels yields the distribution of SSQ ratio under the null hypothesis. Voxels activated at any desired level of type I error can then be determined by obtaining the appropriate critical value of the SSQ ratio from the null distribution.
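The fitting step for a single voxel can be illustrated with a short Python sketch. This is not the authors' software: the function names, the kernel length, the sampling of the Poisson functions at the 3 s repetition time and the inclusion of an intercept term are all assumptions made for illustration. In particular, the null distribution here is drawn by plain random permutation of the time series, which is a simplified stand-in for the wavelet-based resampling of Bullmore et al (2001) and does not preserve temporal autocorrelation.

```python
import math
import numpy as np

TR = 3  # seconds per volume (time to repetition 3000 ms)

def poisson_kernel(lam, tr=TR, n=10):
    """Poisson function with parameter `lam` (seconds), sampled every `tr`
    seconds and normalised to unit sum. Kernel length `n` is an assumption."""
    t = np.arange(n) * tr
    k = np.array([math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1)) for x in t])
    return k / k.sum()

def ssq_ratio(series, design):
    """Least-squares fit of the 4 s and 8 s convolved regressors (plus an
    intercept) to one voxel time series; returns model SSQ / residual SSQ."""
    n = len(series)
    regs = [np.convolve(design, poisson_kernel(lam))[:n] for lam in (4, 8)]
    X = np.column_stack([np.ones(n)] + regs)
    beta, *_ = np.linalg.lstsq(X, series, rcond=None)
    fitted = X @ beta
    model_ss = np.sum((fitted - fitted.mean()) ** 2)
    resid_ss = np.sum((series - fitted) ** 2)
    return model_ss / resid_ss

def null_ssq(series, design, n_perm=10, seed=None):
    """SSQ ratios after randomising the series (plain permutation: a stand-in
    for wavelet resampling, used here only to keep the sketch short)."""
    rng = np.random.default_rng(seed)
    return np.array([ssq_ratio(rng.permutation(series), design) for _ in range(n_perm)])

def critical_value(null_values, alpha=0.005):
    """SSQ ratio exceeded by a fraction `alpha` of the pooled null values."""
    return np.quantile(null_values, 1 - alpha)
```

In use, `null_ssq` would be run at every voxel and the results pooled before calling `critical_value`, so that the threshold reflects the distribution over the whole image rather than over ten values from one voxel.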
The observed and randomised SSQ ratio statistic maps were then transformed into standard space. Median SSQ ratio maps for the two participants were constructed at the P<0.005 level of significance. The early and late phases of auditory verbal hallucinations were examined by repeating the above analysis after shifting the hallucination log (indicated by the button-press) with respect to the functional MR time series in steps of one scan (shifts of -9 s, -6 s, -3 s, +6 s and +9 s), following the method described by Ffytche et al (1998).
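The shifting of the hallucination log can be sketched as follows. The representation of the log as one 0/1 value per scan, the function name `shift_log`, the zero-padding of vacated scans and the sign convention (negative shifts move the log earlier, so the model probes activity preceding the button-press) are assumptions for illustration, not details given in the text.

```python
import numpy as np

TR = 3  # seconds per volume, so one scan corresponds to a 3 s shift

def shift_log(log, shift_s, tr=TR):
    """Shift a per-scan 0/1 hallucination log by `shift_s` seconds.
    Negative values move the log earlier; vacated scans are set to 0."""
    steps = int(round(shift_s / tr))
    log = np.asarray(log)
    out = np.zeros_like(log)
    if steps > 0:
        out[steps:] = log[:-steps]
    elif steps < 0:
        out[:steps] = log[-steps:]
    else:
        out[:] = log
    return out
```

Each shifted log would then replace the original design in the time-series analysis, giving one activation map per shift.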
RESULTS
Each auditory verbal hallucination lasted an average of 16 s (range 3–42) with an average silent (inter-hallucination) interval of 34 s (range 9–75); each participant made six button-presses during the 5 min investigation. The main areas activated before the reporting of an hallucination (relative to non-hallucinating events) were the left inferior frontal gyrus and the right middle temporal gyrus (Fig. 1). As the individual became aware of the hallucination, this activation extended to the left insula as well as the left inferior frontal gyrus, and to the middle and superior temporal gyri bilaterally. There was also activation in the right middle frontal gyrus and the right sensorimotor cortex (probably related to the action of button-pressing). After the hallucination had subsided the activation in the insula persisted and there was additional involvement of the orbitofrontal cortex (Fig. 1). Activation within most of the above regions was evident in both the individual activation maps.
DISCUSSION
These results demonstrate activation of the left inferior frontal gyrus prior to the perception of auditory verbal hallucinations, with activation in the temporal cortex mainly occurring when the participant subsequently perceived auditory speech. As the left inferior frontal region is normally activated during the generation of inner speech (Shergill et al, 2001), this is consistent with the notion that these hallucinations result from the misidentification of self-generated verbal material (Frith & Done, 1989). The timing of the activation in the temporal cortex suggests that these regions are more involved in the actual perception of auditory hallucinations. The hallucination may thus begin with the generation of auditory verbal material in the left inferior frontal cortex, followed by conscious awareness of external speech coincident with the subsequent engagement of temporal cortical areas (Dierks et al, 1999; Lennox et al, 1999; Shergill et al, 2000), perhaps reflecting direct communication through frontotemporal connections (Shergill et al, 2002).