Nomenclature
- ADS-B: Automatic dependent surveillance-broadcast
- CNN: Convolutional neural network
- CSR: Convolutional sparse representation
- GAN: Generative adversarial network
- LiDAR: Light Detection and Ranging
- SAA: Sense and Avoid
- SR: Sparse representation
- SWAP: Size, weight and power
- TCAS: Traffic Alert and Collision Avoidance System
- UAV: Unmanned aerial vehicle
Greek symbols
- $\mu $ : Controls the texture smoothing degree
- $\sigma $ : Controls the texture size
- $\lambda $ : Controls the weight of the ${l_1}$ norm
1.0 Introduction
The capability of Sense and Avoid (SAA) is considered to be the most important component for the integration of unmanned aerial vehicles (UAV) into the National Airspace System [Reference Yu and Zhang1, Reference Mcfadyen and Mejias2]. SAA is composed of two crucial parts in general: (1) the sensing part, aimed at detecting all aerial targets threatening UAV flight safety with the help of on-board sensing devices; and (2) the avoiding part, aimed at eliminating the potential hazards based on the sensing result by trajectory re-planning and corresponding flight control [Reference Fu, Zhang and Yu3].
The sensing part is the foundation of SAA. According to the working pattern of the onboard sensing devices, SAA can be divided into two main categories: cooperative and non-cooperative. The sensing devices for cooperative SAA include the Traffic Alert and Collision Avoidance System (TCAS) [Reference Lin, Lai and Lee4] and Automatic Dependent Surveillance-Broadcast (ADS-B) [Reference Lin and Lai5], which have been widely installed on manned aircraft and which largely depend on information exchange with aerial targets. Non-cooperative SAA, in contrast, operates with onboard sensing devices that require no information exchange, such as machine vision [Reference Zhang, Cao, Ding, Zhuang and Yao6], acoustic systems [Reference Harvey and O’Young7], and Light Detection and Ranging (LiDAR) [Reference Sabatini, Gardi and Ramasamy8].
Compared with other airborne sensing devices, machine vision shows great potential to enhance the capabilities of SAA for the following reasons [Reference Zhang, Cao, Ding, Zhuang, Yao, Zhong and Li9]:
- Machine vision can detect dangerous flying targets without information exchange, which makes the perception of non-cooperative aerial targets possible.
- The information gathered by machine vision is abundant compared with other non-cooperative sensors. For example, the category of an aerial target can be acquired by image recognition algorithms, and a proper collision avoidance manoeuvre can be selected according to the target category.
- Machine vision outperforms other airborne sensing devices in terms of size, weight and power (SWAP) [Reference Zhang, Cao, Ding, Zhuang and Wang10], which makes its installation on small UAVs possible.
The application of machine vision to SAA has been extensively researched, and a series of vision-based algorithms and systems have been developed. However, there are still challenges in the perception part of SAA, and one of the most serious is the high demand on image quality. The factors that may degrade airborne image quality can be summarised as follows: (1) inadequate illumination in dark environments [Reference Wang, Zhou, Liu and Qi11]; (2) image blur caused by aircraft position and attitude variation [Reference Kupyn, Budzan, Mykhailych, Mishkin and Matas12]; and (3) target occlusion caused by cloud and mist [Reference Liu, Fan, Hou, Jiang, Luo and Zhang13]. It is worth mentioning that previous vision-based SAA research, including aerial target detection, tracking and pose estimation, has been conducted under the precondition that image quality is sufficiently high. In real applications, however, low image quality greatly degrades the performance of these algorithms because it directly leads to the loss of target texture information, which makes feature extraction and target detection much more difficult. Low image quality therefore weakens target perception and shrinks the visual perception range, which is extremely unfavourable for SAA [Reference James, Ford and Molloy14]. Among these factors, low illumination is the most typical one, as it directly limits a UAV’s ability to perceive threatening targets in dark conditions. The motivation of this paper is therefore twofold: to restrain the attenuation of a UAV’s visual perception range in low illumination, and to enhance the structure and texture information of detected targets for SAA applications.
As shown in Fig. 1, the helicopter is not clearly visible in the visible image under low-illumination conditions, whereas the infrared image effectively captures the structure of the helicopter despite its lack of texture information. Considering the complementarity of visible and infrared sensors, it is feasible to combine the two sources of visual information to improve image quality. As shown in Fig. 1(c), the fused image effectively combines the advantages of both the visible and infrared images.
Therefore, a CSR-based visible and infrared image fusion method is proposed in this paper to enhance the SAA ability of UAVs in low illumination conditions. Firstly, the source images are decomposed into texture and structure layers, since infrared images are good at characterising structural information while visible images carry richer texture information. Then both the structure and texture layers are transformed into the sparse convolutional domain through the CSR mechanism, and the CSR coefficient mappings are fused by activity level assessment. Finally, the fused image is synthesised from the reconstructed structure and texture layers.
The main contributions of this paper can be summarised as follows:
- To address the semantic information loss and poor preservation of detail caused by traditional methods, a visible and infrared image fusion method based on multi-layer CSR is proposed to enhance the visual perception of UAVs in low illumination.
- Different from the local transformation used in traditional methods, the global modelling ability of the proposed method gives it obvious advantages under registration mismatch.
- Compared with deep learning methods, the proposed method adopts an unsupervised learning mode, which does not require a large number of labelled samples for training and is easier to implement.
The rest of this paper is organised as follows. Section 1 introduces the application background of the method and explains the significance and motivation of the research. Section 2 reviews the relevant literature. Section 3 introduces the framework and mechanism of the proposed visible and infrared image fusion algorithm. In Section 4, the effectiveness of the proposed algorithm is verified by a series of experiments in three scenarios and compared with other algorithms. Section 5 concludes the paper.
2.0 Literature review
Due to its advantages, including non-cooperative target perception, abundant information acquisition and good size, weight and power (SWAP) characteristics, vision-based SAA has shown great potential for increasing UAV safety levels in recent years. The general framework of vision-based SAA consists of four key components: aerial target detection, tracking, relative pose and position estimation, and avoidance [Reference Mcfadyen and Mejias2]. The related research for each component can be summarised as follows:
- Aerial target detection [Reference Zhang, Cao, Ding, Zhuang and Wang10, Reference Lyu, Pan, Zhao, Zhang and Hu15, Reference Zhang, Guo, Lu, Wang and Liu16, Reference Yu, Li and Leng17]. Aerial target detection is the first step of vision-based SAA, which aims at picking out targets with potential risk from images/videos. Research on aerial target detection can be classified into foreground modelling-based methods and background modelling-based methods, which utilise information from a single image and from consecutive frames, respectively.
- Aerial target tracking [Reference Yang, Yu, Wang and Peng18]. After the detection of aerial targets, the detected bounding box should be tracked continuously by target tracking algorithms. Vision-based target tracking algorithms can be classified into generative and discriminative tracking, and both categories have been applied to vision-based SAA. The main challenge of vision-based aerial target tracking is the adaptive scale transformation of the tracking bounding box.
- Relative state estimation [Reference Lai, Ford, O’Shea and Mejias19, Reference Vetrella, Fasano and Accardo20]. This component aims to obtain the relative position and attitude between the host UAV and aerial targets with potential risk. Since the risk level is determined based on the estimated angle and range, this step is crucial for collision avoidance.
- Collision avoidance [Reference Fu, Zhang and Yu3, Reference Lee, Park, Park and Park21]. Finally, the potential risk posed by aerial targets should be eliminated by trajectory re-planning and tracking control based on the estimated pose and position. The biggest challenge for vision-based collision avoidance is that range information can be hard to acquire in some cases, especially for monocular vision.
All four key components outlined above are important, but high-quality imagery is the foundation of vision-based SAA. Therefore, enhancing image quality under adverse conditions is imperative. It is worth noting that previous research on vision-based SAA has been carried out under the assumption that image quality is good enough. Several factors may deteriorate airborne image quality in real applications, and insufficient illumination in dark environments is the most critical one. Since visible and infrared images containing aerial targets are complementary in dark environments, increasing image quality by visible and infrared image fusion is desirable.
In general, algorithms designed for visible and infrared image fusion can be summarised in three steps: image transformation, image fusion and image reconstruction. Among the three steps, the image transformation method is the foundation of the whole algorithm. For this reason, research on image fusion algorithms during the past decade has mainly focused on developing more concise and effective transformation methods. The most widely used transformation methods for image fusion are sparse representation (SR), convolutional sparse representation (CSR) and deep learning-based methods such as convolutional neural networks (CNN).
The application of SR-based image fusion has achieved great success in the past few years. However, due to the local representation nature of SR, the drawbacks of SR-based fusion algorithms can be summarised as follows [Reference Gu, Zuo, Xie, Meng, Feng and Zhang22, Reference Liu, Chen, Ward and Wang23]: (1) Context information loss. Since SR-based fusion must first decompose the source image into local patches, the context information within the source image is neglected, although such context information is essential for vision understanding and analysis. (2) High sensitivity to registration errors. Because SR fuses the image patches individually, all the image patches need to be accurately registered; however, image registration is itself a difficult task, and registration errors may always exist. To overcome these problems, fusion frameworks based on global representation algorithms have been proposed in recent years, the most representative of which are CNN and CSR [Reference Liu, Chen, Wang, Wang, Ward and Wang24].
Deep learning has recently revealed powerful potential for computer vision tasks. The advantage of deep learning-based image fusion methods is that the fusion strategy can be obtained through learning, so the fused result can be obtained without manually designed fusion rules [Reference Liu, Fan, Jiang, Liu and Luo25]. As a supervised learning approach, CNN frameworks can be classified into two main categories, namely regression CNN and classification CNN [Reference Singh and Anand26], and both have been successfully applied to image fusion [Reference Liu, Chen, Cheng, Peng and Wang27, Reference Liu, Chen, Cheng and Peng28]. The generative adversarial network (GAN), a novel deep learning model, can extract typical characteristics by using different network branches according to the modality of each image source [Reference Liu, Liu, Jiang, Fan and Luo29]. Because of its advantages in processing multi-modality information, GAN is a promising direction for image fusion and is well suited to task-driven fusion such as target perception [Reference Liu, Fan, Huang, Wu, Liu, Zhong and Luo30]. However, the main restriction of CNN-based image fusion is its high demand for labelled training samples. CSR originated from the de-convolutional networks designed for unsupervised image feature analysis [Reference Zeiler, Taylor and Fergus31]. Applied to image fusion, CSR can be treated as a global image transformation approach. The advantages of CSR-based image fusion over SR and deep learning can be summarised as follows [Reference Liu, Chen, Ward and Wang23]: (1) the global modelling capability of CSR frees it from image decomposition when applied to image fusion, so the above-mentioned deficiencies of SR-based fusion, namely context information loss and high sensitivity to misregistration caused by local transformation, are largely avoided; (2) the unsupervised learning nature of CSR removes the need for a large amount of labelled ground-truth images. Therefore, CSR has revealed great potential for image fusion.
For this reason, the elastic-net regularisation based multi-layer CSR is adopted for image fusion in this paper. Instead of directly fusing source images, both visible and infrared images are decomposed into structure and texture layers. The image layers, after decomposition, are then transformed into the sparse convolutional domain for image fusion. Finally, the transformed sparse convolutional coefficient mapping corresponding to visible and infrared images are fused by activity level assessment, and the fused image is obtained by image reconstruction.
3.0 Visible and infrared image fusion method for SAA
The general framework of the image fusion method for SAA contains three parts, as shown in Fig. 2. Firstly, both the visible image ${I^{VI}}$ and infrared image ${I^{IN}}$ are decomposed into two layers, namely structure layers $I_S^{IN}$ , $I_S^{VI}$ and texture layers $I_T^{IN}$ , $I_T^{VI}$ . Secondly, the decomposed structure layers $I_S^{IN}$ , $I_S^{VI}$ and texture layers $I_T^{IN}$ , $I_T^{VI}$ are transformed into the sparse convolutional domain via CSR with the pre-learned dictionary $D$ , and the transformed sparse convolutional coefficient mappings corresponding to $I_S^{IN}$ , $I_S^{VI}$ , $I_T^{IN}$ , and $I_T^{VI}$ are $X_S^{IN}$ , $X_S^{VI}$ , $X_T^{IN}$ , and $X_T^{VI}$ respectively. Thirdly, by computing the activity maps $A_S^{IN}$ , $A_S^{VI}$ of $X_S^{IN}$ and $X_S^{VI}$ , the decision map for the structure layer $D{P_S}$ is generated; the decision map for the texture layer $D{P_T}$ is generated similarly. Based on the decision maps, the fused convolutional sparse coefficient maps $X_S^F$ and $X_T^F$ for the structure and texture layers are obtained, and the fused results $I_S^F$ and $I_T^F$ for the structure and texture layers are reconstructed by utilising the sparse convolutional dictionary $D$ . Finally, the fused image ${I^F}$ is obtained by synthesising $I_S^F$ and $I_T^F$ . In this section, the methods for image decomposition, image transformation and image reconstruction will be introduced in detail.
3.1 Image decomposition
Typically, image $I$ is composed of two layers: the structure layer ${I_S}$ and the texture layer ${I_T}$ , as expressed in Equation (1). The structure layer usually represents the semantic information and captures salient objects inside the image, while the texture layer emphasises the preservation of image details. The semantically meaningful structure layer is usually covered by the texture layer, as shown in Fig. 2. As mentioned above, since infrared and visible images are good at retaining structure and texture information respectively, it is desirable to decompose the source images into these two layers before fusion.
This paper adopts the relative total variation (RTV) based algorithm for image decomposition [Reference Xu, Yan, Xia and Jia32]. The objective function for image decomposition is expressed as Equation (2), where ${I_S}\!\left( i \right)$ and $I\!\left( i \right)$ are the pixel values of the structure layer and the original image at location $i$ respectively, $p$ is the total number of pixels in the input image, $\mu $ is the parameter controlling the smoothing degree, and $\varepsilon $ is a small positive number that prevents the denominator from being zero. ${V_x}\!\left( i \right)$ and ${V_y}\!\left( i \right)$ in Equations (3) and (4) are the windowed total variations in the $x$ and $y$ directions for pixel $i$ , where $R\!\left( i \right)$ is the rectangular region centred at $i$ , ${g_{i,j}}$ is a weighting function defined according to spatial affinity, and ${\partial _x}$ and ${\partial _y}$ are the partial derivatives in the $x$ and $y$ directions respectively. The formulation of ${g_{i,j}}$ is expressed as Equation (5), where $\sigma $ is the parameter controlling the window size. The influence of the image decomposition parameters on image fusion is analysed in Section 4.
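For orientation, a sketch of the RTV objective in the notation above, following the formulation of Xu et al. [Reference Xu, Yan, Xia and Jia32], is given below; the windowed inherent variations ${W_x}$ and ${W_y}$ in the denominators are notation introduced here for illustration, and the exact form of Equations (2)–(5) may differ in minor details:
$$\min_{I_S}\;\sum_{i=1}^{p}\left\{\big(I_S(i)-I(i)\big)^2+\mu\!\left(\frac{V_x(i)}{W_x(i)+\varepsilon}+\frac{V_y(i)}{W_y(i)+\varepsilon}\right)\right\},$$
$$V_x(i)=\sum_{j\in R(i)}g_{i,j}\,\big|({\partial_x}I_S)(j)\big|,\qquad W_x(i)=\Big|\sum_{j\in R(i)}g_{i,j}\,({\partial_x}I_S)(j)\Big|,\qquad g_{i,j}\propto\exp\!\left(-\frac{\lVert i-j\rVert^2}{2\sigma^2}\right),$$
with ${V_y}$ and ${W_y}$ defined analogously using ${\partial_y}$ .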
3.2 Image transformation
In this part, the structure layers $I_S^{VI}$ , $I_S^{IN}$ and texture layers $I_T^{VI}$ , $I_T^{IN}$ of the visible and infrared images are transformed into the sparse convolutional domain by elastic-net based CSR. The basic idea of CSR, expressed in Equation (6), is that an input image $I$ can be represented by the sum of the convolutions of equal-sized convolutional dictionary filters $D = \!\left\{ {{d_1},{d_2}, \ldots ,{d_m}} \right\}$ with sparse convolutional coefficient mappings $X = \!\left\{ {{x_1},{x_2}, \ldots ,{x_m}} \right\}$ , where $m$ is the number of convolutional dictionary filters.
Since the convolutional dictionary is pre-learned for each input image, the computation of the sparse convolutional coefficient mapping $X$ is the essential step of image transformation. Conventionally, $X$ is computed by ${l_1}$ norm regularisation, and the objective function can be expressed as Equation (7), where $\lambda $ is the regularisation parameter.
Since ${l_1}$ norm regularisation cannot guarantee group selection when applied to image transformation, elastic-net based regularisation is proposed in this paper to combine the advantages of ${l_1}$ norm and ${l_2}$ norm regularisation. The objective function for elastic-net based regularisation can be expressed as Equation (8), and its solution can be acquired by the alternating direction method of multipliers (ADMM) [Reference Wohlberg33].
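As an illustration only (the paper solves Equation (8) with ADMM [Reference Wohlberg33]; the solver, function name, step size and iteration count below are assumptions), the following minimal NumPy sketch computes elastic-net convolutional sparse codes with a plain ISTA iteration, using FFT-based circular convolution:

```python
import numpy as np

def conv_elastic_net_ista(I, D, lam=0.01, beta=0.01, step=0.1, n_iter=200):
    """Minimal ISTA sketch for elastic-net convolutional sparse coding:
    min_X 0.5*||sum_m d_m * x_m - I||_2^2 + lam*||X||_1 + 0.5*beta*||X||_2^2.
    I: H x W image (float), D: m x k x k stack of equal-sized dictionary filters."""
    H, W = I.shape
    m, k, _ = D.shape
    # Zero-pad the filters to image size; convolutions are evaluated in the
    # Fourier domain (circular boundaries, acceptable for a sketch).
    Dp = np.zeros((m, H, W))
    Dp[:, :k, :k] = D
    Dh = np.fft.fft2(Dp)
    X = np.zeros((m, H, W))                                         # coefficient maps
    for _ in range(n_iter):
        Xh = np.fft.fft2(X)
        recon = np.real(np.fft.ifft2((Dh * Xh).sum(axis=0)))        # sum_m d_m * x_m
        Rh = np.fft.fft2(recon - I)                                  # residual spectrum
        grad = np.real(np.fft.ifft2(np.conj(Dh) * Rh)) + beta * X    # gradient of the smooth terms
        Z = X - step * grad
        X = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)     # soft threshold (l1 part)
    return X
```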
Moreover, the learning of the convolutional dictionary filters can be regarded as an optimisation problem, as shown in Equation (9), which is solved by alternating between the variables ${x_i}$ and ${d_i}$ : the former sub-problem is the same as Equation (7), while the latter can be regarded as the convolutional form of the method of optimal directions (MOD) [Reference Madhuri and Negi34].
Therefore, as presented in Equation (10), given the structure layers $I_S^{VI}$ , $I_S^{IN}$ and texture layers $I_T^{VI}$ , $I_T^{IN}$ of the visible and infrared images, the sparse convolutional coefficient mappings $X_S^{IN}$ , $X_S^{VI}$ , $X_T^{IN}$ and $X_T^{VI}$ can be estimated via elastic-net regularisation-based CSR.
3.3 Image reconstruction
After the computation of $X_S^{IN}$ , $X_S^{VI}$ , $X_T^{IN}$ and $X_T^{VI}$ , the ${l_1}$ norm max strategy of Equation (11) is adopted to fuse the sparse convolutional coefficient mappings of the structure and texture layers, where ${X_S}\!\left( {i,j} \right)$ and ${X_T}\!\left( {i,j} \right)$ denote the content of ${X_S}$ and ${X_T}$ at location $\!\left( {i,j} \right)$ respectively.
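A minimal sketch of this ${l_1}$ -max rule on a pair of coefficient map stacks is given below; the function and variable names are illustrative, and the activity map is computed per pixel across all filters, which is one common reading of the rule rather than the exact form of Equation (11):

```python
import numpy as np

def fuse_l1_max(X_a, X_b):
    """Pick, at every spatial location, the coefficient stack whose l1 norm
    across the m filters is larger. X_a, X_b: m x H x W coefficient maps.
    Returns the fused maps and the binary decision map DP."""
    A_a = np.abs(X_a).sum(axis=0)               # activity map of source a
    A_b = np.abs(X_b).sum(axis=0)               # activity map of source b
    DP = A_a >= A_b                             # decision map (True -> take source a)
    X_f = np.where(DP[None, :, :], X_a, X_b)
    return X_f, DP
```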
Since the fusion is operated in the transform domain, the fused results of the structure and texture layers $X_{S,T}^F = \!\left\{ {x{{_{S,T}^F}_1},x{{_{S,T}^F}_2}, \ldots ,x{{_{S,T}^F}_m}} \right\}$ need to be transformed back to the image domain. As presented in Equation (12), the reconstruction of the fused structure and texture layers from $X_{S,T}^F$ is acquired by utilising the convolutional dictionary filters.
Finally, the fused image ${I^F}$ is obtained by superimposing the fused results of the structure and texture layers, as presented in Equation (13).
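A matching sketch of the reconstruction and superposition steps (same illustrative conventions and circular-convolution shortcut as the sketches above, standing in for Equations (12) and (13)):

```python
import numpy as np

def reconstruct(X_f, D, shape):
    """Reconstruct one image layer as sum_m d_m * x_m^F from fused coefficient
    maps X_f (m x H x W) and dictionary filters D (m x k x k)."""
    H, W = shape
    m, k, _ = D.shape
    Dp = np.zeros((m, H, W))
    Dp[:, :k, :k] = D
    return np.real(np.fft.ifft2((np.fft.fft2(Dp) * np.fft.fft2(X_f)).sum(axis=0)))

# Final fusion: superimpose the reconstructed structure and texture layers.
# I_F = reconstruct(X_S_F, D, I_VI.shape) + reconstruct(X_T_F, D, I_VI.shape)
```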
4.0 Experimental simulation and analysis
In this section, three scenes of visible and infrared images containing aerial targets are selected for the image fusion experiments, as presented in Fig. 3; the images obtained by the different sensors are all complementary.
To evaluate the algorithm’s performance effectively, both subjective and objective metrics are adopted to assess the quality of the fused images. Subjective evaluation relies on human visual observation, which is intuitive and easy to perform, but it may fail to capture slight differences between fused results. Therefore, objective metrics are also adopted to evaluate the fused results. The definition and purpose of each objective metric are as follows [Reference Liu, Blasch, Xue, Zhao, Laganiere and Wu35]:
- ${Q_{MI}}$ : defined based on information theory; a higher value of ${Q_{MI}}$ indicates a better fused result.
- ${Q_M}$ : defined based on a multi-scale scheme; a higher value of ${Q_M}$ indicates a better fused result.
- ${Q_S}$ : defined based on image structural similarity; a higher value of ${Q_S}$ represents a better fused result.
- ${Q_{CB}}$ : defined based on the human perception system; a higher value of ${Q_{CB}}$ represents a better fused result.
Three important parameters influence the fused results; their names, influences and value ranges are summarised in Table 1. The parameter $\mu $ controls the texture smoothness of the image decomposition. Some noise can be filtered out by proper smoothing; however, the quality of image fusion decreases significantly as $\mu $ increases, because higher texture smoothness leads to the loss of image details. We found that when $\mu $ exceeds 0.5, the image quality cannot reach a satisfactory level for any choice of the other parameters. Considering these factors, the value range of $\mu $ should not be too large, so $\!\left( {0,{\rm{\;}}0.05} \right]$ is chosen as its range, which is sufficient to show the changing trend. The parameter $\sigma $ controls the texture size of the image decomposition. In the experiments, varying $\sigma $ has no obvious influence on the fusion effect, so a larger range is chosen for it. The parameter $\lambda $ is the regularisation weight of the convolutional sparse representation and must lie in the range (0, 1]. As $\lambda $ increases, the quality of image fusion declines, because a larger $\lambda $ brings greater reconstruction errors, which eventually degrade the fusion result. When $\lambda $ is close to 1 the model is under-fitted, while when $\lambda $ tends to 0 the model is easily over-fitted. Therefore, to facilitate order-of-magnitude adjustment of $\lambda $ within a moderate range, $\left[ {0.0099,{\rm{\;}}0.99} \right]$ is chosen as its value range.
4.1 Quantitative evaluation for $\mu $
The value range of $\mu $ is presented in Table 1, and the values of $\sigma $ and $\lambda $ when evaluating $\mu $ are 3 and 0.001 respectively. The fused results with the variation of $\mu $ are presented in Fig. 4, and the ${Q_{MI}}$ , ${Q_M}$ , ${Q_S}$ and ${Q_{CB}}$ values for the different scenes are presented in Fig. 5. It can be seen from Fig. 4 that as $\mu $ increases, the contours of the background and the target are weakened in most cases, which is not conducive to subsequent detection tasks. Figure 5 shows that the fusion quality is negatively correlated with $\mu $ in most cases, although smoothing out background noise brings some improvement in a few specific scenes. Overall, the fusion quality decreases as $\mu $ increases, because a higher smoothing degree causes loss of detail in the fused image.
4.2 Quantitative evaluation for $\sigma $
The value range of $\sigma $ is presented in Table 1, and the values of $\mu $ and $\lambda $ when evaluating $\sigma $ are 0.0015 and 0.001, respectively. The fused results with the variation of $\sigma $ are presented in Fig. 6, and the ${Q_{MI}}$ , ${Q_M}$ , ${Q_S}$ and ${Q_{CB}}$ values for the different scenes are presented in Fig. 7. It can be seen from Fig. 6 that there is no significant difference in the fusion results as $\sigma $ changes. As shown in Fig. 7, when $\sigma $ is in the range (2, 4), the evaluation metrics fluctuate within a narrow interval. The experimental results thus reveal that the variation of $\sigma $ does not significantly influence the fused results.
4.3 Quantitative evaluation for $\lambda $
The value range of $\lambda $ is presented in Table 1, and the values of $\mu $ and $\sigma $ when evaluating $\lambda $ are 0.0015 and 3 respectively. The fused results with the variation of $\lambda $ are presented in Fig. 8, and the ${Q_{MI}}$ , ${Q_M}$ , ${Q_S}$ and ${Q_{CB}}$ values for the different scenes are presented in Fig. 9. As seen from Fig. 8, with the increase of $\lambda $ , the target becomes blurred and ghosting appears, and the background noise also increases. As seen from Fig. 9, the fusion quality is negatively correlated with $\lambda $ , and increasing $\lambda $ is not conducive to subsequent target detection. Generally, the quality of the fused result decreases as $\lambda $ increases: a larger $\lambda $ makes the coefficient maps sparser, which increases the reconstruction error and ultimately degrades the fused result.
4.4 Comparison experiments
To further assess the effectiveness of the proposed algorithm, three image fusion algorithms, namely SR [Reference Yang and Li36], lasso-based CSR [Reference Liu, Chen, Ward and Wang23] and CNN [Reference Liu, Chen, Cheng, Peng and Wang27], are selected for comparison. The fused results are compared in Fig. 10. It can be seen from Fig. 10 that there is too much noise in the fused results obtained by the SR-based method, which is detrimental to subsequent processing. The fusion method based on CSR (lasso) is more robust than SR but loses structural information such as edges. The CNN-based fusion method is a compromise between the above two methods, but its texture features are poorly preserved compared with the proposed method. In terms of subjective assessment, the algorithm proposed in this paper preserves image details while strengthening the object. The comparison of the objective measurements ${Q_{MI}}$ , ${Q_M}$ , ${Q_S}$ and ${Q_{CB}}$ is presented in Table 2. The objective measurements of the proposed algorithm outperform those of the other algorithms in most cases. Compared with the best of the SR, CSR and CNN results in Table 2, ${Q_{MI}}$ is increased by nearly $19{\rm{\% }}$ , ${Q_M}$ by nearly $4{\rm{\% }}$ , ${Q_S}$ by nearly $2{\rm{\% }}$ and ${Q_{CB}}$ by nearly $4{\rm{\% }}$ on average. Although the proposed algorithm is slightly inferior to the comparison methods on a few individual measures, its performance is superior on average and more robust.
All the algorithms are implemented in MATLAB 2016b on a machine with a 2.4 GHz CPU and 8 GB RAM. The processing time of the four algorithms is measured using MATLAB’s tic and toc commands, and the results are shown in Table 3. As can be seen from Table 3, the proposed algorithm has an obvious advantage in processing time over SR and CNN. Compared with CSR, however, its processing time increases slightly because two convolutional sparse coding operations are required, one for each layer.
5.0 Conclusion
In this paper, an elastic-net regularisation based CSR method is presented for visible and infrared image fusion. Since visible and infrared images are good at preserving texture and structure information respectively, the source images are first decomposed into texture and structure layers before fusion. The structure layers of the source images are then fused by CSR using the pre-learned convolutional sparse dictionary filters, and the texture layers are fused by utilising the decision map generated during the fusion of the structure layers. Finally, the fused results of the texture and structure layers are synthesised to acquire the fused image. To verify the effectiveness of the proposed algorithm, both subjective and objective measurements are used for evaluation. The simulation results reveal that the proposed algorithm can preserve image details while strengthening objects, and it is superior to other image fusion methods in most cases.
Acknowledgements
This study is supported by the National Natural Science Foundation of China (No. 61673211) and the Open Project Funds of the Ministry of Industry and Information Technology for the Key Laboratory of Space Photoelectric Detection and Perception (No. NJ2020021-01).