1. INTRODUCTION
There are over 39 million totally blind people in the world (WBU, 2012): 5·9 million live in Africa, 3·2 million in America and 2 million in Europe. Blind people face significant constraints in their everyday life, mainly with regard to mobility. Though they are often able to learn specific routes (e.g., how to get to the nearest shop or station), this ability is far from the desired independence in navigation. Mobility has been defined by Foulke as: “The ability to travel safely, comfortably, gracefully and independently through the environment” (Foulke, 1997). Applied to blind travellers, this concept implies that they must be able to detect the obstacles located on their walking path, to avoid them and to succeed in following their route. All these goals can be achieved by relying on accessory devices which facilitate navigation, known as Electronic Travel Aids (ETAs). ETAs are electronic intelligent devices whose main objective is to overcome human constraints by perceiving the surrounding environment and presenting it to the blind user through tactile, vibratory, speech or acoustic feedback. Since the Second World War, with the progressive development of sensors, more than 40 ETAs have been created (Dunai et al., 2011; Dakopoulos and Bourbakis, 2010). Most of them are still at the prototype level and only a few (13 devices have been reported) are commercial products (Dakopoulos and Bourbakis, 2010; Technologies, 2012). Also, 20 different wayfinding and orientation technologies have been reported (Technologies, 2012).
Nowadays, there are three main groups of ETAs, according to their working principle: radar, global positioning and stereovision. The most widely known are the ETAs based on the radar principle. These devices emit laser or ultrasonic beams. When a beam strikes an object's surface, it is reflected, and the distance between the user and the object can then be calculated from the time difference between the emitted and the received beam.
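As a simple numerical illustration of this principle (a generic example, not drawn from any particular device described here), the obstacle range d follows from the measured round-trip time Δt and the propagation speed v of the beam as d = v·Δt/2. For an ultrasonic beam travelling at about 343 m/s, an echo received 23·3 ms after emission corresponds to an obstacle roughly 4 m away; for a laser beam, the same distance gives a round-trip time of only about 27 ns.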
The Lindsay Russel Pathsound® (Russell, 1965; Mann, 1970), considered to be the first ultrasonic ETA, belongs to this first group. The Pathsound® delivers three types of acoustic sounds for three different distances. The device uses ultrasonic transducers mounted in a box hanging around the user's neck. Another ultrasonic ETA is the Mowat Sonar Sensor® (Morrissette et al., 1981); it is a hand-held device which, by means of touch and vibrations, informs the user about the presence of an obstacle. The Sonicguide® (or Binaural Sonic Aid®), designed by Kay in 1959 (Kay, 1964), was another revolutionary ETA in the 1960s. The working range of the Sonicguide® is up to 55° in azimuth and up to 4 m in distance. The ultrasonic wide-beam transmitter is mounted between the lenses of a pair of glasses. A secondary channel is added to the output, so that acoustical signals with low-frequency tones are sent separately to the left and right ear. This procedure is named the binaural technique, or stereophony. The perceived distance is strongly dependent on the frequency, while object direction is conveyed by the interaural amplitude differences. Due to these binaural cues, the Sonicguide® is able to represent the environment with great precision both in distance and direction.
A second type of ETA includes devices based on the Global Positioning System (GPS), also known as Global Navigation Aids. These devices aim to guide the blind user through a previously selected route; they also provide location information such as street numbers, street crossings, etc. Within this group, the most well-known devices are the Talking Signs® and the Sonic Orientation Navigation Aid® (SONA®) (Brabyn, 1982; Kuc, 2002). Their effective range is up to 30 m in outdoor environments, and both have a similar working principle. An interesting device is the Personal Guidance System®, developed at the University of California at Santa Barbara (Loomis and Golledge, 2003; Loomis et al., 2001). Using radio signals provided by satellites, the device can determine the user's position at any point on Earth, informing the user in real time about their location in the environment.
With the introduction of the webcam, many researchers proposed the application of stereovision to develop new techniques for representing the surrounding environment. Nowadays, there are only a few prototypes in the world using stereovision: among them, the Voice® prototype (Meijer, 1992; Meijer, 2005), the Real-Time Acoustic Prototype® (Dunai et al., 2010), the Eye2021® (Dunai et al., 2011), SWAN® (Wilson et al., 2007) and Tyflos® (Dakopoulos and Bourbakis, 2008). All these devices aim to represent the surrounding environment through acoustic signals.
Nowadays, real-time 3-Dimensional (3-D) imaging has become an important factor in many applications such as pattern recognition, robotics, pedestrian safety and object tracking. 3-D imaging is essential for measuring the distance and shape of objects. The application of 3-D imaging in ETAs for blind people provides additional benefits regarding distance and direction estimation, and object surface and texture identification. Over the last decades, the use of multiple sensors has made it possible to obtain additional information about the surrounding environment by simultaneously scanning a wide portion of it. Compared with existing methods, this approach does not require manual scanning by orienting the torso, hand or head.
Based on the idea of using multiple sensors, a novel ETA for blind people is presented in this paper. The device measures distance using a 3-D CMOS image sensor based on the Time-Of-Flight (TOF) measurement principle.
The paper is structured as follows: Section 2 describes the developed system architecture; details of the 3-Dimensional Complementary Metal Oxide Semiconductor (3-D CMOS) image sensor circuit and of the distance measurement and sound generation methods are provided there. Section 3 describes and analyses the results obtained when testing the prototype with real users. Finally, in Section 4, conclusions from the work are summarized.
2. SYSTEM ARCHITECTURE
The ‘Acoustic Prototype’ principle is based on human cognition; the electronic device scans the environment while the human brain interprets the collected information.
The Acoustic Prototype is based on ‘smart’ sunglasses with laser photodiodes and a 3-D CMOS sensor with a high-speed shutter, together with a Field-Programmable Gate Array (FPGA) and headphones carried in a small bag (Figure 1). The FPGA processes the signals arriving from the 3-D CMOS sensor into the Correlated Double Sampling (CDS) memory and measures the distance between the detected objects and the sensor. It then passes this information to the acoustic module, which represents the distances as sounds delivered to the user through stereophonic headphones. The idea of using binaural sounds in Electronic Travel Aids (ETAs) for blind people was introduced by Kay in the Sonicguide® device in 1959 (Kay, 1974). He added a secondary auditory channel to his earlier development, the Sonic Torch®, in order to obtain a more realistic interpretation of the real environment.
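The overall data flow just described can be summarized in a short sketch. The following Python fragment is an illustrative reconstruction written for this description only; the function names, the invented sample data and the 0·5–5 m filtering range stand in for the actual FPGA firmware and its interfaces, which are not specified at this level of detail in the paper.

    # Illustrative sketch of the Acoustic Prototype data flow (hypothetical names).

    def read_cds_memory():
        """Stand-in for the 3-D CMOS sensor read-out: returns 64 distances in metres
        (None where no echo was detected). Real data would come from the CDS memory."""
        return [None] * 30 + [1.2, 1.1, 1.0] + [None] * 31

    def render_sounds(events):
        """Stand-in for the acoustic module: one binaural sound per detected pixel."""
        for pixel, distance in events:
            print(f"play sound for pixel {pixel} at {distance:.2f} m")

    def processing_cycle():
        distances = read_cds_memory()                    # sensor module (Section 2.2)
        events = [(i, d) for i, d in enumerate(distances)
                  if d is not None and 0.5 <= d <= 5.0]  # keep the 0.5-5 m working range
        render_sounds(events)                            # acoustic module (Section 2.3)

    processing_cycle()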
In addition, the Acoustic Prototype uses virtual acoustic sounds built from Head-Related Transfer Functions measured with a KEMAR manikin. In 1995, within the Espacio Acustico Virtual® project, a navigation device for blind people based on stereovision and acoustic sounds was developed which implemented this method (Gonzales-Mora et al., 2004). Tachi also used an auditory display in mobility aids for blind people to represent the surrounding environment (Tachi et al., 1983).
In order to obtain a wide enough range of information about the environment, the Acoustic Prototype uses multiple laser sensors. A similar approach was used in the NavBelt® device (Shoval et al., 1998). NavBelt® uses eight ultrasonic sensors, each one covering an area of 15°, so that the whole scanned sector amounts to a 120° arc. In the developed Acoustic Prototype, sixty-four near-infrared laser sensors, mounted in a pair of sunglasses, are responsible for scanning the environment. The covered sector is 60°, scanned in steps of 0·94°. The distance measurement method is based on the Time-Of-Flight (TOF) measuring principle for pedestrians (Mengel et al., 2001). The distance is calculated by the 3-D CMOS sensor from the time difference between the emitted laser pulse and the pulse received back at the diode, using the known propagation speed of the laser pulse. This technique enables fast environment scanning and information processing by the FPGA. Finally, the device delivers, through stereophonic earphones, the acoustic sounds representing the detected objects.
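Assuming the 60° sector is centred on the user's line of sight and divided evenly among the 64 beams (an assumption consistent with the 0·94° step quoted above, although the exact alignment is not stated in the paper), the azimuth assigned to each pixel could be computed as in the following sketch:

    SECTOR_DEG = 60.0                     # total scanned sector
    NUM_PIXELS = 64                       # one near-infrared laser beam per pixel
    STEP_DEG = SECTOR_DEG / NUM_PIXELS    # about 0.94 degrees per pixel

    def pixel_to_azimuth_deg(pixel_index):
        """Map a pixel index (0..63) to an azimuth angle in degrees,
        negative to the user's left, positive to the right (assumed convention)."""
        return -SECTOR_DEG / 2.0 + (pixel_index + 0.5) * STEP_DEG

    # Example: the outermost and the central beams.
    print(round(pixel_to_azimuth_deg(0), 2))    # about -29.53 degrees
    print(round(pixel_to_azimuth_deg(31), 2))   # about -0.47 degrees
    print(round(pixel_to_azimuth_deg(63), 2))   # about +29.53 degrees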
2.1. 3D CMOS Sensor Circuit Description
The 3-D CMOS sensor chip is based on a 0·5 μm n-well CMOS process. It includes 1 × 64 = 64 photodiode pixels, imaging optics, electronic boards and a power supply. The main sensor specifications are given in Table 1 and Figure 2. The pixel pitch is 130 μm in the horizontal and 300 μm in the vertical plane; the resulting sensitive area is 1 × 300 μm × 64 × 130 μm = 2·5 mm². Each pixel consists of an n-well/p-substrate photodiode PD, an inherent capacitance C_D, a sense capacitor C_sense0, a hold capacitor C_HD, a reset switch Φ1, a shutter switch Φ3, a buffer SF1_out, a select switch Φ4 and a binning switch Φ2 (Figure 3). The amplification factor of the buffer is 0·85. The circuit operates by periodically resetting the photodiode capacitance C_D and the sense capacitance C_sense0 to the voltage U_ddpix and then integrating the photo-induced discharge. The integration time of this discharge is controlled by the shutter switch Φ3. The hold capacitor C_HD then samples the remaining voltage stored on C_sense0. When the select switch Φ4 is closed, the stored voltage is read out using the CDS stage; while C_HD is being read out, the next value is already being integrated on C_sense0. In this way, the chip operates almost in real time and keeps the dead time to a minimum. By using CDS and analogue averaging, the device reduces power consumption, chip temperature, circuit noise, etc. The main processing unit of the 3-D CMOS sensor is implemented on the FPGA board. The FPGA controls the system and makes possible its configuration as well as the control of the 3-D CMOS sensor, the camera, the shutter and the memory.
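The essential benefit of the CDS stage can be shown with a highly simplified numerical illustration. The following sketch is not a model of the actual pixel circuit (it ignores the capacitances, the buffer gain and the timing, and the voltages and offsets are invented); it only shows why subtracting the integrated sample from the reset sample removes a pixel's fixed offset.

    # Simplified illustration of correlated double sampling (CDS), not a model of the chip.

    def integrate_pixel(irradiance, shutter_time, offset, reset_voltage=3.3):
        """Voltage left on the sense node after the shutter window: the photo-generated
        current discharges the node linearly (toy model) and a fixed offset is added."""
        return reset_voltage - irradiance * shutter_time + offset

    def cds_output(irradiance, shutter_time, offset, reset_voltage=3.3):
        """CDS subtracts the signal sample from the reset sample, so the fixed
        per-pixel offset cancels and only the photo-induced discharge remains."""
        reset_sample = reset_voltage + offset
        signal_sample = integrate_pixel(irradiance, shutter_time, offset, reset_voltage)
        return reset_sample - signal_sample

    # The same illumination gives the same CDS output for two pixels with different offsets.
    print(round(cds_output(irradiance=2.0, shutter_time=0.5, offset=0.10), 6))   # 1.0
    print(round(cds_output(irradiance=2.0, shutter_time=0.5, offset=-0.05), 6))  # 1.0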
2.2. Distance Measurement Method
In order to calculate the distance to an object, it is important to understand the distance measurement method used by the 3-D CMOS sensor (Figure 4). The measurement principle is based on TOF distance measurement using Multiple Double Short Time Integration (MDSI) and an analogue switched-capacitor amplifier with Correlated Double Sampling (CDS) operation (Elkhalili et al., 2004). The main feature of the MDSI method is that several laser pulses can be averaged on-chip, reducing the required laser power; in this way, the signal-to-noise ratio and the range resolution are increased. MDSI also allows the accumulation of many laser pulses in order to achieve the best accuracy for all image pixels independently.
The TOF method measures the round-trip travel time of an emitted laser pulse of some tens to hundreds of nanoseconds duration, from the laser module to the environment and back. When the short light pulse is emitted by a Near-Infrared Range (NIR) laser diode, the shutter is started; it is stopped when the reflected light pulse is received by the detector. The light pulses are assumed to be ideal rectangular pulses. The total travel time of the laser pulse, from the laser module to objects in the surrounding environment and back to the 3-D CMOS sensor, depends on the amount of irradiance measured by the sensor, on the reflectance of the object, on the object distance and on the amount of irradiance resulting from other light sources in the environment. It is important to eliminate the effects of these other light sources and of the object's reflectance from the range information provided by the 3-D CMOS sensor. To this end, two integration times are defined. Let T_p be the duration of the laser pulse and T_1 the short integration (shutter) time (Figure 5). In the first measurement, the shutter time T_1 is set equal to the pulse duration T_p, and both are synchronized. The received laser pulse leads to a linear sensor signal U delayed by the propagation time T_0, where T_0 is calculated as:

T_0 = 2d/v          (1)
where d is the measured distance and v is the speed of light. At time T_1, the shutter intensity U_1 ∼ E_laser·(T_1 − T_0) is stored in the analogue memory of the CDS stage, where E_laser represents the laser irradiance measured at the sensor. To measure the time delay, two measurements are required: the first with the short shutter time T_1 and the second with the long light shutter time T_2. When using only T_1, different factors, such as laser power, object reflectance, sensor intrinsic parameters or background illumination, are included in the signal; they would require a complex calibration procedure. In order to overcome this constraint, a second integration time T_2, named the long light shutter time, is used. At T_1, only a portion of the reflected laser pulse intensity is detected, whereas T_2 comprises the full reflected light intensity. For this, the long integration time T_2 greatly exceeds the laser pulse duration T_p, with T_2 ⩾ 2T_p. In Figure 5 it can be observed that both the emitted and the reflected laser pulse lie inside the long light shutter time. At time T_2, the shutter intensity U_2 ∼ E_laser·T_p is obtained.
By computing the ratio of the two integrated shutter intensities, U_1 and U_2, a responsivity- and reflectance-free value is obtained:

U_1/U_2 = (T_1 − T_0)/T_p          (2)

Taking into consideration that T_1 = T_p:

U_1/U_2 = (T_p − T_0)/T_p = 1 − T_0/T_p          (3)

So that:

T_0 = T_p·(1 − U_1/U_2)          (4)

Substituting Equation (4) into Equation (1), the distance d of one pixel can be calculated as:

d = (v·T_p/2)·(1 − U_1/U_2)          (5)
Note that the distance given by Equation (5) is calculated for all pixels independently; that is, the acoustic system calculates the parameter d for all 64 pixels. Moreover, the measurement cycle is repeated n times, until the system is switched off. As mentioned before, all the results are stored in the CDS memory circuit in accumulation mode, simultaneously increasing the signal-to-noise ratio and the sensor range resolution by √n. To sum up, each measurement is performed with the laser switched on and off, the results are analysed and the difference is extracted and stored in the CDS memory.
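A minimal numerical sketch of the distance recovery described by Equations (1) to (5) is given below. It is written for this description only: the pulse duration, the intensity values and the noise model are invented example values, and the averaging loop is only there to illustrate the √n gain of MDSI-style accumulation, not the on-chip implementation.

    import random

    V_LIGHT = 3.0e8          # propagation speed of the laser pulse, m/s
    T_P = 40e-9              # example laser pulse duration (40 ns), assumed value

    def distance_from_shutters(u1, u2, t_p=T_P, v=V_LIGHT):
        """Equation (5): d = (v * T_p / 2) * (1 - U1 / U2)."""
        return (v * t_p / 2.0) * (1.0 - u1 / u2)

    def averaged_distance(true_d, n_pulses, noise=0.02):
        """Average U1 and U2 over n pulses before forming the ratio;
        the spread of the result shrinks roughly as 1/sqrt(n)."""
        t0 = 2.0 * true_d / V_LIGHT                        # Equation (1)
        u1_clean = (T_P - t0) / T_P                        # proportional to E_laser*(T_1 - T_0)
        u2_clean = 1.0                                     # proportional to E_laser*T_p
        u1 = sum(u1_clean + random.gauss(0, noise) for _ in range(n_pulses)) / n_pulses
        u2 = sum(u2_clean + random.gauss(0, noise) for _ in range(n_pulses)) / n_pulses
        return distance_from_shutters(u1, u2)

    print(round(distance_from_shutters(u1=0.5, u2=1.0), 2))    # 3.0 m when U1/U2 = 0.5
    print(round(averaged_distance(true_d=2.0, n_pulses=64), 2))  # close to 2.0 m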
2.3. Sound Generation Method
Whereas the sensor module provides a linear image of the surrounding environment, the acoustic module is in charge of transmitting this information to the blind user by means of virtual acoustic sounds. The function of the acoustic module is to assign an acoustic sound to each of the 64 photodiode pixels, for different distances. The acoustic sounds are reproduced through the headphones, according to the position of the detected object, whenever the sensor sends distance values to the acoustic module. The sound module contains a bank of previously generated acoustic sounds for a spatial area between 0·5 m and 5 m, for the 64 image pixels. A delta sound of 2040 samples at a sampling frequency of 44·1 kHz was used to generate the acoustic information of the environment. In order to encode the distances, 16 planes were generated, starting from 0·5 m and increasing exponentially up to 5 m. The refresh rate of the sounds is 2 frames per second, and 16 MB of memory are needed by the acoustic module. The perceived distance is conveyed mainly by the sound intensity and pitch: at shorter distances the sound is louder, and as the distance increases the sound intensity decreases. The virtual sounds were obtained by convolving the acoustic sounds with non-individual Head-Related Transfer Functions (HRTFs) previously measured using a KEMAR manikin. The working principle of the acoustic module is similar to ‘read and play’: the acoustic module reads the output data from the sensor module, consisting of coordinates in both distance and azimuth, and plays the sound corresponding to those coordinates. While sounds are playing, the time interval between consecutive sounds is 8 ms; when there are no sounds, the sound module polls the sensor module again after 5 ms.
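A possible look-up scheme for such a sound bank is sketched below. The paper only states that the 16 distance planes grow exponentially from 0·5 m to 5 m, so the sketch assumes a geometric progression between those limits; the nearest-plane selection and the file-naming convention are invented for illustration and do not describe the prototype's internal format.

    NUM_PLANES = 16
    D_MIN, D_MAX = 0.5, 5.0

    # Distance planes growing exponentially (geometric progression) from 0.5 m to 5 m.
    PLANES = [D_MIN * (D_MAX / D_MIN) ** (k / (NUM_PLANES - 1)) for k in range(NUM_PLANES)]

    def plane_index(distance):
        """Pick the pre-generated distance plane closest to the measured distance."""
        clipped = min(max(distance, D_MIN), D_MAX)
        return min(range(NUM_PLANES), key=lambda k: abs(PLANES[k] - clipped))

    def sound_file(pixel, distance):
        """Hypothetical naming scheme for the pre-convolved HRTF sound bank:
        one sound per (azimuth pixel, distance plane) pair."""
        return f"hrtf_pixel{pixel:02d}_plane{plane_index(distance):02d}.wav"

    print([round(p, 2) for p in PLANES][:4])   # [0.5, 0.58, 0.68, 0.79]
    print(sound_file(pixel=31, distance=1.3))  # hrtf_pixel31_plane06.wav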
3. EXPERIMENTAL RESULTS
In this section, the tests carried out with the Acoustic Prototype are described. The experiments, conducted over two months, involved twenty blind users. The tests were performed in controlled environments under the supervision of instructors and engineers. During the first month, each individual was trained to perceive and localize the sounds heard through the headphones, to learn that these sounds represented objects in the surrounding environment and to relate them to the corresponding obstacles. In other words, they learned that the sound meant ‘danger’ and that they should avoid it. This initial learning period was implemented through different exercises of increasing complexity: from simple object detection to localization of several objects and navigation through these objects whilst avoiding them. Initially, users complemented the Acoustic Prototype with the white cane, which enabled them to relate the distance perceived with the cane to the sounds heard via the headphones. The aim of these experiments was to validate the Acoustic Prototype as an object detection and mobility device for blind people.
During the indoor laboratory tests, the users followed a 14 m long path marked by eight identical cardboard boxes placed in a zigzag pattern, with a wall at the end (see Figure 6). The distance between pairs of boxes was 2·5 m (the boxes of each pair were separated by 2 m). A set of parameters including the number of hits, the number of corrections and the travel time (as defined by Armstrong, 1975) was measured. Moreover, each test was performed in three different variants:
1. with only a white cane,
2. with only the Acoustic Prototype,
3. combining the white cane and the Acoustic Prototype.
It was found that, with the Acoustic Prototype, users were able to perceive not only the width but also the height of objects by moving their heads up and down. Furthermore, some subjects were even able to perceive the shape of the object surface: square or round. The minimum width detected was around 4 cm (a glass door frame). However, this level of perception was only achieved, after a long training period, by subjects with good hearing abilities and when both objects and subjects were static. In comparison with ultrasonic navigation devices (Clark-Carter et al., 1986; Shoval et al., 1998), whose optimal range is up to 3 m, the Acoustic Prototype showed an accurate detection range from 0·5 m to 5 m. With this device, blind users detected and perceived all the obstacles and were able to navigate safely.
It must be mentioned that travel speed depends on the complexity of the environment and on the user's perception ability; in the laboratory tests, the best result achieved for the 14 m path was 0·11 m/s. In our case, the path was not like that described in Shoval et al. (1998), where the walls were used as objects, so that the blind user could permanently obtain the required information from the walls at both the left and right sides. In that situation the blind user was guided by the sounds of both walls and walked through the middle, where the sound was attenuated. In the laboratory tests with the Acoustic Prototype, the users had to perceive the position of the first obstacle so as to avoid it, then find the second obstacle, avoid it, and so on. Therefore, the task considered here was more sophisticated and required more time; it was also relatively easy for the user to go the wrong way. After several hours, some participants were able to navigate through this path without any errors at a speed below 0·2 m/s.
Other tests were carried out outside the laboratory: in the blind school square along a 29 m line (Mobility Test A) and in the street over a distance of 145 m (Mobility Test B). In the outdoor environment, common obstacles such as trees, walls, cars and light poles were present. Table 2 shows the results obtained from the twenty blind participants for the three analysed environments.
Analysis and comparison of these data reveal that navigation with the white cane is faster than with the Acoustic Prototype. This is explained by the short training period with the device: every participant had years of practice with the white cane, whereas the maximum experience with the Acoustic Prototype was only two months. It was also observed that navigation performance with the Acoustic Prototype improved over time, which suggests that with longer training participants feel safer and navigate without problems; this again emphasises the importance of the training period. In any case, the underlying idea behind the development of the Acoustic Prototype was that it would be a complementary navigation device and not a substitute for the white cane.
On the other hand, the Acoustic Prototype has its own constraints due to the use of a line sensor, which limits object detection above and below the scanned plane. As mentioned before, participants must move their head up and down in order to find small obstacles as well as high objects such as trees or poles. Long training periods and good hearing ability are also required in order to detect stairs and pot holes; in these situations the white cane performed better. However, while the white cane detects obstacles near ground level, the Acoustic Prototype enables the detection of near and far obstacles above ground level, so the navigation performance of blind people may significantly increase. In comparison with the white cane, the device helps blind users to detect farther obstacles and to estimate, from the sound intensity, the speed and direction of objects, helping them to avoid obstacles in advance. Another advantage is the wide azimuth range (60°): with such a large range, blind subjects can determine the position and the width of objects, which helps their orientation.
To summarize, the ‘Acoustic Prototype’ presents many advantages in comparison with other Electronic Travel Aid devices:
1. The measurement of near and far distances can be considered instantaneous.
2. The range accuracy is fairly good.
3. The data from the 3-D CMOS image sensor can be interpreted directly as the range to an obstacle.
4. The angular resolution of 0·95° is much better than for sonar and Global Positioning Systems.
5. The acoustic sounds used are measured sounds designed to act directly on the midbrain and not to interfere with external noises.
6. The acoustic sounds are delivered simultaneously and they do not overlap.
7. The sounds are short and do not require a long time for their interpretation.
However, further modifications and improvements are being studied:
1. Improvement of the vertical range: currently, only a single 64-pixel line of the 3-D CMOS image sensor scans the horizontal plane at the user's eye level, which limits the vertical (up and down) field of view.
2. Improvement of the acoustic sounds: the sounds are generated for an elevation of 0°. If vertical scanning sensors are added, sounds for these additional elevations must be implemented; in that case, it is important to study and analyse psychoacoustic localization in virtual environments in elevation, distance and azimuth.
3. Implementation of a voice-based guide: blind users are used to receiving environmental information via voice. Accordingly, the Acoustic Prototype could incorporate vocal instructions, with a corresponding modification of the interaction interface.
4. Implementation of a stereovision system: stereovision would improve object detection and also enable object classification, as well as navigation and positioning in the environment.
5. Implementation of reading technology: this would help blind users to read information on posters or market products, and even to read newspapers or books.
6. Implementation of a guidance system, which could work, for instance, by following a painted line on the ground.
7. Testing of the system with a tactile display.
8. Improvement of the object detection and navigation algorithms: the implementation of new methods and technologies for localization and mapping, for example the Simultaneous Localization and Mapping (SLAM) algorithm used in robotics (Chang et al., 2007), may help the system to work autonomously without the help of the Global Positioning System.
4. CONCLUSION
This work presents a new object detection device for blind people named the ‘Acoustic Prototype’. The device is based on a 4 × 64 pixel Three-Dimensional Complementary Metal Oxide Semiconductor (3-D CMOS) image sensor, built with three-dimensional integration and CMOS processing techniques, relying on the Time-Of-Flight (TOF) measurement principle and integrated in a pair of sunglasses. In the prototype, a 1 × 64 pixel line of the 3-D CMOS image sensor is used for fast real-time distance measurement. Multiple Double Short Time Integration (MDSI) is used to eliminate background illumination and to correct for reflectance variation in the environment. By means of short stereophonic acoustic sounds, the information from the environment acquisition system (the 1 × 64 pixel 3-D CMOS sensor) is transmitted in real time to the user through stereophonic headphones.
After only a few weeks of training the users were able to perceive the presence of objects as well as their shape and even whether they were static or moving.
The experiments show that the information provided by the Acoustic Prototype enables blind users to travel safely and increases their perception range in distance and azimuth. It helps blind users to perceive far and near, static and mobile obstacles, and to avoid them.
ACKNOWLEDGEMENTS
The first author would like to acknowledge that this research was funded through the FP6 European project CASBLiP number 027063 and Project number 2062 of the Programa de Apoyo a la Investigación y Desarrollo 2011 from the Universitat Politècnica de València.