Impact Statement
The prospect of restoring blurred images with a wave of the digital wand is undeniably seductive in microscopy. However, the reality currently appears less satisfying, as handcrafted algorithms often offer only minimal gains at the price of long parameter tuning.
In this article, we combine physical models of the blur and artificial intelligence to design an interpretable blind deblurring method. A first neural network is trained to estimate the point spread function of the optical system, while a second network leverages this estimate to improve image quality. This approach provides a fully automated tool, capable of improving the image quality in seconds. The proposed methodology yields point spread function estimates with a quality that is superior by 10 dB to other popular methods, which also leads to better and more reliable deblurring results.
1. Introduction
Image deblurring and superresolution consist of recovering a sharp image $ \overline{\mathbf{x}} $ from its blurred and subsampled version $ \mathbf{y}=\mathcal{P}\left(\overline{\mathbf{A}}\overline{\mathbf{x}}\right) $ , where $ \overline{\mathbf{A}}\in {\mathrm{\mathbb{R}}}^{M\times N} $ is a discretized linear integral operator describing the acquisition process, and $ \mathcal{P}:{\mathrm{\mathbb{R}}}^M\to {\mathrm{\mathbb{R}}}^M $ is some perturbation modeling noise, quantization, and saturation. It plays an important role in biomedical and astronomical imaging, where physical phenomena such as diffraction and turbulence strongly reduce the achievable resolution. It also received a constant attention in the field of computer vision, where moving or out-of-focus objects create artifacts. When the operator $ \overline{\mathbf{A}} $ describing the optical system is available, this problem can be solved with mature variational inverse problem solvers(Reference Chambolle and Pock16) or data-driven approaches.(Reference Arridge, Maass, Öktem and Schönlieb8)
However, deriving a precise forward model requires specific calibration procedures, well-controlled imaging environments. and/or highly qualified staff. In addition, model mismatches result in distorted reconstructions. This can lead to dramatic performance loss, especially for superresolution applications.(Reference Gossard and Weiss41, Reference von Diezmann, Lee, Lew and Moerner84)
An alternative to a careful calibration step consists of solving the problem blindly: the forward model $ \overline{\mathbf{A}} $ is estimated together with the sharp image $ \overline{\mathbf{x}} $ . Unfortunately, this blind inverse problem is highly degenerate. There is no hope to recover the sharp image without prior assumptions on $ \overline{\mathbf{x}} $ and $ \overline{\mathbf{A}} $ . For instance, assume that $ \overline{\mathbf{A}} $ is a discrete convolution operator with some kernel $ \overline{\mathbf{h}} $ , that is, $ \mathbf{y}=\overline{\mathbf{h}}\star \overline{\mathbf{x}} $ . Then, the couple $ \left(\overline{\mathbf{h}},\overline{\mathbf{x}}\right) $ can be recovered only up to a large group of transformations.(Reference Soulez and Unser79) For instance, the identity and blurred image are a trivial solution, and the image and kernels can be shifted in opposite directions or scaled with inverse factors. Therefore, it is critical to introduce regularization terms both for the operator $ \overline{\mathbf{A}} $ and the signal $ \overline{\mathbf{x}} $ .
The main objective of this work is to design a blind inverse problem solver under the two assumptions below:
-
• The operator $ \overline{\mathbf{A}} $ can be parameterized by a low-dimensional vector. In what follows, we let $ \mathbf{A}:{\mathrm{\mathbb{R}}}^K\to {\mathrm{\mathbb{R}}}^{M\times N} $ denote the operator mapping and we assume that $ \overline{\mathbf{A}}=\mathbf{A}\left(\overline{\boldsymbol{\unicode{x03B3}}}\right) $ for some $ \overline{\boldsymbol{\unicode{x03B3}}}\in {\mathrm{\mathbb{R}}}^K $ .
-
• The signal $ \overline{\mathbf{x}} $ lives in a family $ \mathcal{X}\subseteq {\mathrm{\mathbb{R}}}^N $ with some known distribution $ {\mathcal{L}}_{\mathcal{X}} $ .
We propose a specific convolutional neural architecture and a training procedure to recover the couple $ \left(\overline{\boldsymbol{\unicode{x03B3}}},\overline{\mathbf{x}}\right) $ from the degraded data $ \mathbf{y} $ and the mapping $ \mathbf{A}\left(\cdot \right) $ . A first network identifies the parameterization $ \overline{\boldsymbol{\unicode{x03B3}}} $ , while the second uses this parameterization to estimate the image $ \overline{\mathbf{y}} $ . This results in an efficient algorithm to sequentially estimate the blur operator and the sharp image $ \overline{\mathbf{x}} $ . The network architecture is shown on Figure 1. At a formal level, the work can be adapted to arbitrary inverse problems beyond image deblurring. We, however, showcase its efficiency only for challenging deblurring tasks involving convolutions but also more advanced space-varying operators.
1.1. Related works
Solving blind deblurring problems is a challenging task that started being studied in the 1970s.(Reference Stockham, Cannon and Ingebretsen80) Fifty years later, it seems impossible to perform an exhaustive review of existing methods and the following description will be lacunary. We refer the interested reader to(Reference Chaudhuri, Velmurugan and Rameshan18) for a general overview of this field and to(Reference Sarder and Nehorai74) for a survey more focused on microscopy. The prevailing approach is to estimate the original signal and the blur operator by solving variational problems of the form:
where $ {R}_A:{\mathrm{\mathbb{R}}}^{M\times N}\to \mathrm{\mathbb{R}}\cup \hskip0.3em \left\{+\infty \right\} $ and $ {R}_x:{\mathrm{\mathbb{R}}}^N\to \mathrm{\mathbb{R}}\cup \hskip0.3em \left\{+\infty \right\} $ are regularization terms for the operator and the signal respectively. This problem arises when considering maximum a posteriori (MAP) estimators.(Reference Levin, Weiss, Durand and Freeman50) It can be attacked with various types of alternating minimization procedures.(Reference Bolte, Sabach and Teboulle11) Before the advent of data-driven approaches, the regularizers were carefully designed to target specific features. The point spread functions can be considered as sparse and compactly supported for motion deblurring.(Reference Chakrabarti, Zickler and Freeman15, Reference Couzinie-Devy, Sun, Alahari and Ponce25, Reference Dai and Wu27, Reference Fergus, Singh, Hertzmann, Roweis and Freeman34, Reference Krahmer, Lin, McAdoo, Ott, Wang, Widemann and Wohlberg49, Reference Perrone and Favaro69, Reference Peyrin, Toma, Sixou, Denis, Burghardt and Pialat70, Reference Sun, Cho, Wang and Hays82) They are smooth for diffraction-limited systems(Reference Chan and Wong17) and can also be parameterized with Zernike polynomials in the pupil plane.(Reference Aristov, Lelandais, Rensen and Zimmer7, Reference Goodman40, Reference Keuper, Schmidt, Temerinac-Ott, Padeken, Heun, Ronneberger and Brox47, Reference Sarder and Nehorai74, Reference Soulez, Denis, Tourneur and Thiébaut78, Reference Soulez and Unser79) The images can sometimes be considered as sparse in microscopy and astronomical imaging(Reference Debarnot and Weiss31, Reference Mourya, Denis, Becker and Thiébaut60) or piecewise constant for natural images. The typical regularizer $ {R}_x $ is then the total variation, or more advanced priors on the image gradient.(Reference Bar, Sochen and Kiryati9, Reference Chakrabarti, Zickler and Freeman15, Reference Chan and Wong17, Reference Pankajakshan, Zhang, Blanc-Féraud, Kam, Olivo-Marin and Zerubia68–Reference Peyrin, Toma, Sixou, Denis, Burghardt and Pialat70) Some authors also advocate for the use of priors on the image spectrum,(Reference Goldstein and Fattal38, Reference Zachevsky and Zeevi86) which transform the blind deconvolution problem into a phase retrieval problem under ideal conditions.
The most recent variants of these approaches can provide excellent results (see e.g.,(Reference Pan, Hu, Su and Yang66, Reference Zhang, Fang, Ni and Zeng88)). However, they strongly rely on the detection of specific features (points, edges, textures) which may be absent or inaccurate models of the typical image features. In addition, problem (1) or its derivatives is usually highly nonconvex, and the initialization must be chosen carefully to ensure local convergence to the right minimizer. As a result, these methods require a substantial know-how to be successfully applied to a specific field.
In the most recent years, machine learning approaches have emerged and now seem to outperform carefully handcrafted ones, at least under well-controlled conditions. These approaches can be divided into two categories. The first category concerns methods that directly estimate the reconstructed image from the observation.(Reference Aljadaany, Pal and Savvides3, Reference Cho, Ji, Hong, Jung and Ko20, Reference Lucas, Iliadis, Molina and Katsaggelos55–Reference Mehri, Ardakani and Sappa57, Reference Nah, Kim and Lee61, Reference Noroozi, Chandramouli and Favaro64, Reference Schuler, Hirsch, Harmeling and Schölkopf75) The second category contains approaches that produce an estimation of the blur operator. This estimate can then be used to deblur the original image. These approaches are specifically tuned for applications in computer vision(Reference Chakrabarti14, Reference Gong, Yang, Liu, Zhang, Reid, Shen, Hengel and Shi39, Reference Li, Tofighi, Geng, Monga and Eldar51, Reference Schuler, Hirsch, Harmeling and Schölkopf75, Reference Sun, Cao, Xu and Ponce81) (motion and out-of-focus blurs) or diffraction-limited systems.(Reference Cumming and Gu26, Reference Möckl, Petrov and Moerner58, Reference Saha, Schmidt, Zhang, Barbotin, Hu, Ji, Booth, Weigert and Myers73, Reference Shajkofci and Liebling76, Reference Shajkofci and Liebling77, Reference Wang, Wang, Li, Hu, Yang and Gu85) Our work rather falls in the second category.
In this list of references, a few authors propose ideas closely related to the ones developed hereafter. In particular,(Reference Cumming and Gu26, Reference Saha, Schmidt, Zhang, Barbotin, Hu, Ji, Booth, Weigert and Myers73, Reference Wang, Wang, Li, Hu, Yang and Gu85) propose to estimate the pupil function of a microscope from images of point sources using neural networks. This idea is similar to the identification network in Figure 1. The two underlying assumptions are a space invariant system and the observation of a single point source. The idea closest to ours is from Shajkofci and Liebling.(Reference Shajkofci and Liebling76, Reference Shajkofci and Liebling77) Therein, the authors estimate a decomposition of the point spread function from a single image using a low-dimensional parameterization such as a decomposition over Zernike polynomials. The spatial variations are then estimated by splitting the observation domain in patches where the blur is assumed locally invariant. The image can then be deblurred using a Richardson–Lucy algorithm based on the estimated operator.
1.2. Contributions
In this work, we propose to use a pair of convolutional neural networks to first estimate the operator parameterization $ \overline{\boldsymbol{\unicode{x03B3}}}\in {\mathrm{\mathbb{R}}}^K $ and then use this parameterization to estimate the sharp image $ \overline{\mathbf{x}}\in {\mathrm{\mathbb{R}}}^N $ with a second convolutional neural network. The first network is the popular ResNet(Reference He, Zhang, Ren and Sun45) as in (Reference Shajkofci and Liebling77). The second network has the structure of an unrolled algorithm, which offers the advantage of adapting to the forward operator.(Reference Adler and Öktem1, Reference Adler and Oktem2, Reference Monga, Li and Eldar59) We call the resulting algorithm deep-blur, see Figure 1. This work contains various original features:
-
• It includes space-varying blur operators that are accurately and efficiently encoded using product-convolution expansions as illustrated in (Reference Denis, Thiébaut, Soulez, Becker and Mourya32, Reference Escande and Weiss33). In particular, we show that this approach is compatible with the characterization of an optical system as a low-dimensional subspace of operators proposed in (Reference Debarnot, Escande, Mangeat and Weiss28, Reference Debarnot, Escande and Weiss29). Most approaches in the literature decompose the observation space into patches and treat each patch independently. In this work, we consider operators with an impulse response that varies continuously in the field of view.
-
• The resulting deblurring network is able to adapt to different forward models and to handle model mismatches naturally. This issue is an important concern for the use of model-based inverse problem solvers.(Reference Antun, Renna, Poon, Adcock and Hansen6, Reference Genzel, Macdonald and Marz36, Reference Ongie, Jalal, Metzler, Baraniuk, Dimakis and Willett65) As will be discussed later, our approach can be seen as an intermediate step between the plug-and-play algorithms(Reference Venkatakrishnan, Bouman and Wohlberg83, Reference Zhang, Li, Zuo, Zhang, Van Gool and Timofte87) and the unrolled algorithms.(Reference Adler and Öktem1)
-
• We evaluate the efficiency, robustness, and stability of the proposed approach on various challenging problems, showing that the method is reliable and accurate.
The PyTorch implementation of our method is available on demand. We are currently integrating it into the DeepInv package.
2. Methods
In this article, we assume that the degraded signal $ \mathbf{y}\in {\mathrm{\mathbb{R}}}^M $ is generated according to the following equation:
where $ \mathbf{A}\left(\boldsymbol{\unicode{x03B3}} \right):{\mathrm{\mathbb{R}}}^N\to {\mathrm{\mathbb{R}}}^M $ is a linear operator describing the optical system. It depends on an unknown parameter $ \overline{\boldsymbol{\unicode{x03B3}}}\in {\mathrm{\mathbb{R}}}^P $ . The mapping $ \mathcal{P}:{\mathrm{\mathbb{R}}}^N\to {\mathrm{\mathbb{R}}}^N $ can model various deterministic or stochastic perturbations occurring in real systems such as additive white Gaussian noise, Poisson noise, and quantization. In this article, we will use a Poisson-Gaussian noise approximation detailed in (Reference Foi, Trimeche, Katkovnik and Egiazarian35). It is known to accurately model microscopes, except in the very low photon count regime. Other more complex models could be easily incorporated into the proposed framework at the learning stage. A critical aspect of this article is the parameterization of the forward operator $ \mathbf{A} $ . We discuss this aspect below.
2.1. Modeling the blur operators
We consider both space-invariant and space-varying blur operators and linear or nonlinear parameterization.
2.1.1. Linear parameterization
We may assume that $ \mathbf{A} $ belongs to a subspace of operators.
2.1.2. Convolution models and eigen-PSF bases
By far, the most widespread blurring model in imaging is based on convolution operators: the point spread function is identical whatever the position in space. This model is accurate for small fields of view, which are widespread in applications. Assuming that there is no subsampling in the model, we can set $ M=N $ and $ \mathbf{Ax}=\mathbf{h}\star \mathbf{x} $ for some unknown convolution kernel $ \mathbf{h} $ .
The convolution model strongly simplifies the blur identification problem since we are now looking for a vector of size $ N $ instead of a huge $ N\times N $ matrix. Yet, the blind deconvolution problem is known to suffer from many degeneracies and possesses a huge number of possible solutions, see for example (Reference Soulez and Unser79). To further restrict the space of admissible operators and therefore improve the identifiability, we can expand the kernel $ \mathbf{h} $ in an eigen-PSF basis. This leads to the following low-dimensional model.
Model 2.1 (Convolution and eigen-PSFs). We assume that
where $ \left({\mathbf{e}}_k\right) $ is an orthogonal family of convolution kernels called eigen-PSF basis.
Defining an eigen-PSF basis can be achieved by computing a principal component analysis of a family of observed or theoretical point spread functions.(Reference Gibson and Lanni37) An example of an experimental eigen-PSF basis obtained in (Reference Debarnot, Escande, Mangeat and Weiss28) is shown on Figure 2, top.
2.1.3. Space-variant models and product-convolution expansions
The convolution model 2.1 can only capture space-invariant impulse responses. When dealing with large field of views, this model becomes inaccurate. One way to overcome this limitation is to use product-convolution expansions,(Reference Debarnot, Escande, Mangeat and Weiss28, Reference Denis, Thiébaut, Soulez, Becker and Mourya32, Reference Escande and Weiss33) which efficiently encode space-varying systems.
Model 2.2 (Product-convolution expansions). Let $ {\left({\mathbf{e}}_i\right)}_{1\le i\le I} $ and $ {\left({\mathbf{f}}_j\right)}_{1\le j\le J} $ define two orthogonal families of $ {\mathrm{\mathbb{R}}}^N $ . The action of a product-convolution $ \mathbf{A} $ operator reads:
where $ \odot $ indicates the coordinate-wise (Hadamard) product.
In the above model, the basis $ \left({\mathbf{e}}_i\right) $ can still be interpreted as an eigen-PSF basis. Indeed, we …have for all locations $ z\in \left\{1,\dots,, N\right\} $ :
Hence, we see that each impulse response is expressed in the basis $ \left({\mathbf{e}}_i\right) $ . The basis $ \left({\mathbf{f}}_j\right) $ , on its side, can be interpreted as an eigen-space variation basis: it describes how the point spread functions can vary in space. It can be estimated by interpolation of the coefficient of a few scattered PSF in the eigen-PSF basis $ \left({\mathbf{e}}_i\right) $ . In optical devices such as microscopes, the estimation of the families $ \left({\mathbf{e}}_i\right) $ and $ \left({\mathbf{f}}_j\right) $ can be accomplished by observing several images of microbeads.(Reference Bigot, Escande and Weiss10, Reference Debarnot, Escande, Mangeat and Weiss28) An example of the experimental product-convolution family is shown in Figure 2 for a wide-field microscope. In that case, the dimension $ K $ of the subspace is $ K=I\cdot J=16 $ . Airy pattern oscillations are found in the first eigen-PSFs and intensity variations, such as nonhomogeneous illuminations/vignetting, in the spatial variation maps.
2.1.4. Nonlinear parameterization and Zernike polynomials
An alternative to the linear models is given by the theory of diffraction. A popular and effective model in microscopy and astronomy consists of using the Fresnel/Fraunhofer theory. We can approximate the pupil function with a finite number of Zernike polynomials.(Reference Goodman40, Reference Hanser, Gustafsson, Agard and Sedat44) This model leads to some of the state-of-the-art algorithms for blind deconvolution and superresolution in microscopy and astronomy.(Reference Aristov, Lelandais, Rensen and Zimmer7, Reference Nehme, Freedman, Gordon, Ferdman, Weiss, Alalouf, Naor, Orange, Michaeli and Shechtman62, Reference Sage, Donati, Soulez, Fortun, Schmit, Seitz, Guiet, Vonesch and Unser72, Reference Soulez, Denis, Tourneur and Thiébaut78)
Model 2.3 (Fresnel approximation and a Zernike basis). We assume that the forward model is a convolution with a slice of a continuous 3D kernel $ h\left(x,y,z\right) $ . The 3D kernel can be expressed through the 2D pupil function $ \phi $ as
where $ {f}_c=n/\lambda $ is the cutoff frequency, $ n $ is the refractive index of the immersion medium, and $ \lambda $ is the wavelength of the observation light and
The complex pupil function $ \phi $ can be expanded with Zernike polynomials $ {Z}_k $ :
where the coefficients $ \gamma \left[k\right]\in \mathrm{\mathbb{R}} $ are real number. Footnote 2
A few examples of slices of point spread functions generated with Model 2.3 are displayed in Figure 3. Notice that we do not use the first three Zernike polynomials (piston, tip, and tilt) as they do not influence the shape of the PSF. In our experiments, we used $ K=7 $ Zernike polynomials. In the Noll nomenclature, they are referred to as $ {Z}_4 $ : defocus, $ {Z}_5 $ - $ {Z}_6 $ : primary astigmatism, $ {Z}_7 $ - $ {Z}_8 $ : primary coma, and $ {Z}_9 $ - $ {Z}_{10} $ : trefoil. We set the coefficients $ \gamma $ as uniform random variables with an amplitude smaller than 0.15. As can be seen, a rich variety of impulse responses can be generated with this low-dimensional model.
2.2. The deep-blur architecture
We propose to train two different neural networks $ \mathrm{IN} $ and $ \mathrm{DN} $ sequentially:
-
• $ \mathrm{IN} $ is an identification network. It depends on weights $ \boldsymbol{\theta} $ . The mapping $ \mathrm{IN}\left(\boldsymbol{\theta} \right):{\mathrm{\mathbb{R}}}^M\to {\mathrm{\mathbb{R}}}^K $ takes as an input a degraded image $ \mathbf{y}\in {\mathrm{\mathbb{R}}}^M $ and provides an estimate $ \hat{\boldsymbol{\unicode{x03B3}}} $ of $ \overline{\boldsymbol{\unicode{x03B3}}} $ in $ {\mathrm{\mathbb{R}}}^K $ .
-
• DN is a deblurring network. It depends on weights $ \boldsymbol{\xi} $ . The mapping $ \mathrm{DN}\left(\boldsymbol{\xi} \right):\left({\mathrm{\mathbb{R}}}^M,{\mathrm{\mathbb{R}}}^K\right)\to {\mathrm{\mathbb{R}}}^N $ takes as input parameters the blurry image $ \mathbf{y} $ and the operator coefficient $ \boldsymbol{\unicode{x03B3}} $ . It outputs an estimate $ \hat{\mathbf{x}} $ of the sharp image $ \overline{\mathbf{x}} $ .
2.2.1. The identification network
Traditional estimation of a blur kernel relies on the detection of cues in the image such as points (direct observation(Reference Bigot, Escande and Weiss10, Reference Cumming and Gu26, Reference Debarnot and Weiss31, Reference Saha, Schmidt, Zhang, Barbotin, Hu, Ji, Booth, Weigert and Myers73)), edges in different orientations (Radon transform of the kernel(Reference Krahmer, Lin, McAdoo, Ott, Wang, Widemann and Wohlberg49)), or textures (power spectrum(Reference Goldstein and Fattal38)) followed by adapted inversion procedures. This whole process can be modeled by a set of linear operations (filtering) and nonlinear operations (e.g., thresholding). A convolutional neural network, composed of similar operations, should therefore be expressive enough to estimate the blur parameters. This is the case for the deep-blur identification architecture, a ResNet encoder,(Reference He, Zhang, Ren and Sun45) as shown in Figure 1, left. It consists of a succession of convolutions, ReLU activation, batch normalization, and average pooling layers, which sequentially reduce the image dimensions. The last layer is an adaptive average pooling layer, mapping the output of the penultimate layer to a vector of constant size $ K $ . In our experiments, the total number of trainable parameters, which includes the weights of the ResNet, that is, the convolution kernels in the convolution layers, the biases in the convolution layers and the weights of the adaptive pooling layer, is $ \mid \boldsymbol{\theta} \mid =\mathrm{11,178,448} $ . The encoder structure has been proven to be particularly effective for a large panel of signal processing tasks.(Reference Zhang, Isola, Efros, Shechtman and Wang89)
2.2.2. The deblurring network
The proposed deblurring network mimics a Douglas–Rachford algorithm.(Reference Combettes and Pesquet21) It is sometimes called an unrolled or unfolded network. This type of network currently achieves near state-of-the-art performance for a wide range of inverse problems (see e.g. (Reference Monga, Li and Eldar59)). It has the advantages of having a natural interpretation as an approximate solution of a variational problem and naturally adapts to changes of the observation operators.
Deep unrolling. For $ \lambda >0 $ , let $ {\mathbf{R}}_{\gamma, \lambda } $ denote the following regularized inverse:
For a parameter $ \boldsymbol{\unicode{x03B3}} $ describing the forward operator and an input image $ \mathbf{y} $ , the Douglas–Rachford algorithm can be described by the following sequence of operations, ran from $ t=0 $ to $ t=T-1 $ with $ T\in \mathrm{\mathbb{N}} $ .
Algorithm 1 The Douglas–Rachford deblurring network $ \mathrm{DN} $ .
Require: iteration number $ T\in \mathrm{\mathbb{N}} $ , operator $ \boldsymbol{\unicode{x03B3}} $ , scale $ \lambda \in {\mathrm{\mathbb{R}}}_{+} $
$ {\mathbf{z}}_0={\mathbf{R}}_{\boldsymbol{\unicode{x03B3}}, \lambda}\left(\mathbf{y}\right) $
for all $ t=0\to T-1 $ do
$ {\mathbf{x}}_{t+1}={\mathrm{PN}}_t\left({\mathbf{z}}_t\right) $
$ {\mathbf{z}}_{t+1}={\mathbf{z}}_t+{\mathbf{R}}_{\boldsymbol{\unicode{x03B3}}, \lambda}\left(2{\mathbf{x}}_t-{\mathbf{z}}_t\right)-{\mathbf{x}}_t $
end for
The initial guess $ {\mathbf{z}}_0 $ corresponds to the solution of
It can be evaluated approximately with a conjugate gradient algorithm run for a few iterations (20 in our implementation).
The mapping $ {\mathrm{PN}}_t\left({\boldsymbol{\xi}}_t\right):{\mathrm{\mathbb{R}}}^N\to {\mathrm{\mathbb{R}}}^N $ can be interpreted as a “proximal neural network.” Proximal operators(Reference Combettes and Pesquet21) have been used massively in the last 20 years to regularize inverse problems. A popular example is the soft-thresholding operator, which is known to promote sparse solutions. Here, we propose to learn the regularizer as a neural network denoted $ {\mathrm{PN}}_t $ , which may change from one iteration to the next. It corresponds to the green layers in Figure 1.
The parameters $ \boldsymbol{\xi} $ that are learned are the weights $ {\boldsymbol{\xi}}_t $ defining the $ t $ th proximal neural network $ {\mathrm{PN}}_t $ . In our experiments, the networks $ {\mathrm{PN}}_t $ have the same architecture for all $ 1\le t\le T $ . We used the current state-of-the-art network used in plug-and-play algorithms called DRUNet.(Reference Hurault, Leclaire and Papadakis46, Reference Zhang, Li, Zuo, Zhang, Van Gool and Timofte87) We set $ T=4 $ iterations. Each of the 4 proximal networks contain $ \mathrm{8,159,808} $ parameters, resulting in a total of $ \mid \boldsymbol{\xi} \mid =\mathrm{32,639,232} $ parameters to be trained.
2.3. Training
We propose to first train the identification network $ \mathrm{IN}\left(\boldsymbol{\theta} \right) $ alone and then train the deblurring network $ \mathrm{DN}\left(\boldsymbol{\xi} \right) $ with the output of the identification network as an input parameter. This sequential approach presents two advantages:
-
• The memory consumption is lower. The automatic differentiation only needs to store the parameters of the individual networks, instead of both. This reduces the memory footprint.
-
• The identification network can be used independently of the other, and it is therefore tempting to train it separately. In metrology applications, for instance, where the aim is to follow the state of an optical system through time, the identification network IN is the most relevant brick. In some applications, such as superresolution from single molecules, the deblurring network could be replaced by a more standard total variation-based solver,(Reference Bredies and Pikkarainen13) once the operator is estimated.
In what follows, we let $ \mathcal{X}\subset {\mathrm{\mathbb{R}}}^N $ denote a dataset of admissible images/signals and $ {\mathcal{L}}_{\mathcal{X}} $ denote a sampling distribution over $ \mathcal{X} $ . We let $ {\mathcal{L}}_{\Gamma} $ denote a sampling distribution on the set $ {\mathrm{\mathbb{R}}}^K $ of blur parameters. In our experiments, the perturbation $ \mathcal{P} $ in Equation (2) is assumed to be an approximation of the Poisson-Gaussian noise.(Reference Foi, Trimeche, Katkovnik and Egiazarian35) We assume that $ \mathbf{y}=\mathbf{A}\left(\overline{\boldsymbol{\unicode{x03B3}}}\right)\overline{\mathbf{x}}+\mathbf{b} $ , where $ \mathbf{b}\left[z\right]\sim \boldsymbol{\sigma} \left[z\right]\boldsymbol{\eta} \left[z\right] $ , $ \eta \sim \mathcal{N}\left(0,{\mathbf{I}}_M\right) $ , and $ \boldsymbol{\sigma} \left[z\right]=\sqrt{\alpha \left(\mathbf{A}\left(\overline{\boldsymbol{\unicode{x03B3}}}\right)\overline{\mathbf{x}}\right)\left[z\right]+\beta } $ . The parameters $ \alpha $ and $ \beta $ are set uniformly at random in the ranges $ \alpha \in \left[\mathrm{0,0.05}\right] $ and $ \beta \in \left[\mathrm{0,0.15}\right] $ . In what follows, we let $ {\mathcal{L}}_b $ denote the noise distribution that we just described.
We propose to train both the identification and the deblurring networks using the empirical risk minimization. First, the identification network is trained by solving:
Once the identification network $ \mathrm{IN} $ is trained, we turn to the deblurring network by solving the following optimization problem:
where $ \mathbf{y}=\mathbf{A}\left(\boldsymbol{\unicode{x03B3}} \right)\mathbf{x}+\mathbf{b} $ is the degraded image and $ \hat{\boldsymbol{\unicode{x03B3}}}=\mathrm{IN}\left(\boldsymbol{\theta} \right)\left(\mathbf{y}\right) $ is the estimated parameter. Of importance, notice that we do not plug the true parameter $ \boldsymbol{\unicode{x03B3}} $ in 5, but rather the estimated one $ \hat{\boldsymbol{\unicode{x03B3}}} $ . This way, the deblurring network DN can learn to correct model mismatches that may occur at the estimation step.
The two problems above consist in constructing minimum mean square estimators (MMSE). At the end of the training procedure – under technical assumptions(Reference Gossard and Weiss41) – we can consider that the networks approximate a conditional expectation:
This is – by construction – the best estimators that can be generated on average. This approach is therefore really different from most alternatives in the literature, which consist of constructing MAP estimators. MMSE estimators can be expressed as integrals, which depend heavily on the operator distributions $ {\mathcal{L}}_{\Gamma} $ and on the image distribution $ {\mathcal{L}}_{\mathcal{X}} $ . They should therefore be constructed carefully depending on the physical knowledge of the observation system (resp. observed sample). By using the general computer vision database COCO, we hope to cover a wide range of image contents, leading to a wide-purpose method for identification. The performance could likely be improved using more specific databases. For instance, we could simulate the images according to realistic processes for specific applications such as single-molecule localization. This is out of the scope of this article. For $ {\mathcal{L}}_{\Gamma} $ , we sample a large set of realistic parameters uniformly at random in our experiments.
The above optimization problems are solved approximately using stochastic gradient descent-type algorithms. In our experiments, we used the Adam optimizer(Reference Kingma and Ba48) with the default parameters: the learning rate is set to 0.001, betas are (0.9, 0.999), epsilon is 1e-8, weight decay is 0, and amsgrad is false.
3. Results
Let us illustrate the different ideas proposed in this article. In all our experiments, we trained the neural networks using the MS COCO dataset.(Reference Lin, Maire, Belongie, Hays, Perona, Ramanan, Dollár and Zitnick53) It contains 118,287 images in the training set and 40,670 images in the test set. It is composed of images of everyday scenes, capturing objects in various indoor and outdoor environments. It presents substantial differences with typical microscopy images, but the high diversity and quality of the images makes it possible to construct efficient generic image priors. This was already observed in (Reference Shajkofci and Liebling77).
3.1. Convolution operators
We evaluate the accuracy of the identification and deblurring networks for convolution (i.e., space invariant) operators. We assess them for images generated with point spread functions expanded in Zernike polynomial.
3.1.1. Identifying convolution operators
We assess the ability of a residual network to identify the point spread function generated by the Fresnel diffraction Model 2.3. A similar study was carried out in (Reference Shajkofci and Liebling76) with $ K=3 $ coefficients. Here, we extend the study to $ K=7 $ coefficients allowing us to represent the following aberrations in the Noll nomenclature(Reference Noll63): defocus, primary astigmatism, primary coma, trefoil, and primary spherical.
We generate random PSFs by drawing the coefficients $ \boldsymbol{\unicode{x03B3}} \left[k\right] $ (see Model 2.3) uniformly in the range $ \left[-\eta, \eta \right] $ . The higher $ \eta $ , the more spread and oscillating the PSFs. Hence, $ \eta $ can be interpreted as a measure of PSF complexity. The model was trained for a value of $ \eta =0.15 $ .
In the first experiment, we simply used additive white Gaussian noise (i.e. $ \beta =0 $ ) of standard deviation $ \alpha $ . Figure 3 shows the identification results for 3 images taken at random from the test set and 3 operators taken at random in the operator set. On these examples, the network provides faithful estimates despite a substantial noise level and images with little contents. To further characterize the network efficiency, we measure the distribution of signal-to-noise-ratio (SNR) in the noiseless regime. For a kernel $ \mathbf{h} $ , the error of the estimated kernel $ \hat{\mathbf{h}} $ is defined by
Figure 4 summarizes the conclusions. In average, the identification network outputs estimates with a relative error below 5%.
Finally, we study the stability to the noise level $ \alpha $ in Figure 5a – and to the PSF complexity $ \eta $ in Figure 5b. As can be seen, the identification outputs predictions with less than 10% error with a probability larger than 0.5 up to a large noise level of $ \alpha =0.1 $ for images in the range $ \left[0,1\right] $ . The dependency on the kernel’s complexity, measured through the Zernike polynomials amplitude $ \eta $ is very clear with typical errors below 2% for $ \eta <0.1 $ and then a relatively fast increase. It is nonetheless remarkable that the identification returns estimates with less than 15% error for $ \eta =0.2 $ , which produces more complex PSFs than those observed during the training phase, showing some ability of the network to generalize.
3.1.2. Evaluating the deblurring network
We evaluate the performance of the proposed deblurring network for convolution operators defined using the Fresnel approximation. Figures 6–8 display some deconvolution results for different methods. The corresponding image quality measures are displayed in Table 1.
Note: The standard deviation is given after the symbol ±.
Bold numbers indicate the best performing method.
Notice that this problem is particularly involved: there is a complete loss of information in the high frequencies since the convolution kernels are bandlimited and we treat different noise levels up to rather high values (here $ \alpha =0.12 $ , $ \beta =0.24 $ for images in the range $ \left[0,1\right] $ ). Despite this challenging setting, it can be seen both perceptually and from the SSIM (structural similarity index measure) that the image quality is improved whatever the noise level. It is also remarkable to observe that the proposed network architecture allows us to treat images with different noise levels. This is an important feature of the DRUNet used as a proximal network.(Reference Zhang, Li, Zuo, Zhang, Van Gool and Timofte87)
We also propose some comparisons with other methods from the literature. Whenever possible, we optimized the hyperparameters by hand for each noise level to produce the best possible output. We chose the following methods:
-
• The $ {\mathrm{\ell}}^0 $ -gradient prior.(Reference Pan, Hu, Su and Yang66, Reference Pan, Sun, Pfister and Yang67) This method is one of the state-of-the-art handcrafted blind deblurring methods. An efficient implementation was recently proposed in (Reference Anger, Facciolo and Delbracio5).
-
• In (Reference Goldstein and Fattal38), the authors proposed a kernel estimation method based on the assumption that the image spectrum amplitude has a specific decaying distribution in the Fourier plane. The kernel estimation then boils down to a phase retrieval problem. An efficient implementation was recently proposed in (Reference Anger, Facciolo and Delbracio4).
-
• We also tested two state-of-the-art neural network approaches. The first one was a past leader of the Go-Pro deblurring challenge called NAFNET.(Reference Chen, Chu, Zhang and Sun19)
The deep learning method is retrained on the same dataset as our method. As can be seen from Table 1 and the perceptual results in Figures 6–8, deep-blur outperforms the other three methods that we considered by a large margin. The PSF is recovered with an average accuracy varying between 12.9 dB in the noiseless regime to 7.9 dB in the high-noise regime using deep-blur. The image quality is improved in terms of SSIM by 0.1 in the noiseless regime to 0.2 in the high-noise regime.
All the other methods yield negative SNR for the PSF. At a perceptual level, handcrafted methods (Goldstein-Fattal and $ {\mathrm{\ell}}^0 $ gradient prior) still recover the PSF shape approximately. The recovered image is also sharpened, but its SSIM quality is actually lowered by more than 0.1 in the noiseless regime, and improved by 0.1 in the high-noise regime. The SSIM is always lower than the one of deep-blur.
Experiments on real images. In Figure 9, we provide a few deep-blur results on real microscopy images from the dataset.(Reference Hagen, Bendesky, Machado, Nguyen, Kumar and Ventura43) We used the sample images available on the following link. As can be seen, the reconstructed images are denoised and have a better visual contrast In this experiment, we do not have a ground-truth deblurred result and the quality can only be assessed by visual inspection. Validating the estimation requires careful optics experiments, which we leave as an open topic for now.
Training on the true or estimated operators. At training time, we can feed the unrolled deblurring network with the operator that was used to synthesize the blurry image, or the one estimated using the identification network. The potential advantage of the second option is to train the proximal networks to correct model mismatches. We tested both solutions on two different operator families. It turns out that they led to nearly indistinguishable results overall in average. The most likely explanation for this phenomenon is that the model mismatches produced by the identification network cannot be corrected with the proximal networks.
Memory and computing times. The model contains about $ 11\cdot {10}^6 $ parameters for the identification part and $ 32\cdot {10}^6 $ parameters for the deblurring part. This is a total of $ 43\cdot {10}^6 $ trainable parameters. This size is comparable to the usual computer vision models available in TorchVision. For example, it is slightly smaller than a ResNet101 classifier. The deep-blur model uses about 1 GB of GPU memory at test time, which can be considered lightweight, since it fits on most consumer graphics cards.
After training, it takes 0.3 seconds to identify a kernel and deblur an image of size $ 400\times 400 $ on an Nvidia RTX 8000 with 16 TFlops. For comparison, the handcrafted models used in our numerical comparisons take between 5 seconds and a few minutes to perform the same task on the CPU. No GPU implementation is provided.
3.2. Product-convolution operators
To finish the numerical experiments, we illustrate how the proposed ideas perform on product-convolution operators.
We first illustrate the performance of the identification network. We trained the identification network on natural images from the MS COCO dataset, but evaluate it on biological images from microscopes. We selected 6 images: ImBio 1 is an histopathology of angiolipoma,(22) ImBio 2 is an histopathology of reactive gastropathy,(23), ImBio 3 and 4 are actin filaments within a cell,(24) ImBio 5 is an slice of a spheroid from (Reference Lorenzo, Frongia, Jorand, Fehrenbach, Weiss, Maandhui, Gay, Ducommun and Lobjois54), and ImBio 6 is a crop of a podosome obtain on a wide-field microscope.(Reference Bouissou, Proag, Bourg, Pingris, Cabriel, Balor, Mangeat, Thibault, Vieu, Dupuis, Fort, Lévêque-Fort, Maridonneau-Parini and Poincloux12)
The blur operators are generated by Model 2.2 using $ K=16 $ parameters. The blur model is obtained following the procedure described in (Reference Debarnot, Escande, Mangeat and Weiss28). To compute the product-convolution decomposition described in Model 2.2, we collected 18 stacks of 21 images of microbeads spaced by 200 nm on a wide-field microscope with a ×100 objective lens (CFI SR APO 100XH NA 1,49 DT 0,12 Nikon) mounted on a Nikon Eclipse Ti-E and a Hamamatsu sCMOS camera (ORCA FLASH4.0 LT). Figure 10 shows the identification results. The blur coefficients predicted by the deep-blur identification are accurate estimates in all cases. On average, the SNR is much higher than in the previous experiment, which can likely be explained by a smaller dimensionality of the operators’ family. In all cases, the image quality is improved despite an additive white Gaussian noise with $ \alpha =1\cdot {10}^{-2} $ and $ \beta =0 $ . This is remarkable since this type of image is different from the typical computer vision images found in the MS COCO dataset.
4. Discussion
We proposed an efficient and lightweight network architecture for solving challenging blind deblurring problems in optics. An encoder first identifies a low-dimensional parameterization of the optical system from the blurry image. A second network with an unrolled architecture exploits this information to efficiently deblur the image. The performance of the overall architecture compares favorably with alternative approaches designed in the field of computer vision. The principal reason is that our network is trained using fine physical models obtained using Fresnel diffraction theory or experimental data providing accurate space-varying models. A second reason is that the unrolled architecture proposed herein emerges as a state-of-the-art competitor for a wide range of inverse problems. Overall, we believe that the proposed network, trained carefully on a large collection of blurs and images could provide a universal tool to deblur microscope images. In the future, we would like to carry out specific optical experiments to ensure that the results obtained with synthetic data are reproducible and trustworthy with real images. The initial results, obtained without reference images for comparisons are however really encouraging.
Differences with plug and play and deep unrolling. The proposed unrolled architecture follows closely the usual unrolled algorithms.(Reference Adler and Öktem1, Reference Adler and Oktem2, Reference Li, Tofighi, Geng, Monga and Eldar51, Reference Li, Tofighi, Monga and Eldar52, Reference Monga, Li and Eldar59) There is, however, a major difference: traditionally, these unrolled architectures are trained to invert a single operator. In this article, we train the network with a family of operators. The results we obtained confirm some results obtained in (Reference Gossard and Weiss41): this approach only results in marginal performance loss for a given operator if the family is sufficiently small while providing a massive improvement in adaptivity to all the operators. In a sense, the proposed approach can be seen as an intermediate step between the plug-and-play priors (related to diffusion models(Reference Graikos, Malkin, Jojic and Samaras42)), which are designed to adapt to all possible operators, and the traditional unrolled algorithm is adapted to a single one.
The limits of a low-dimensional parameterization. The models considered are sufficiently rich to describe most optical devices accurately. In microscopy, they can capture defocus, refractive index mismatches, changes of temperature, tilts of optical components, usual optical aberrations with a parameter dimension $ K $ smaller than 20.
Notice however that some phenomena can hardly be modeled by low-dimensional parameterization. In microscopy, for instance, diffraction by the sample itself can lead to extremely complicated and diverse forward models better described by nonlinear equations (see e.g., (Reference Pham, Soubies, Ayoub, Lim, Psaltis and Unser71)). Similarly, in computer vision, motion and defocus blurs can vary abruptly with the movement and depth of the objects. The resulting operators would likely require a large number of parameters, which is out of the scope of this article.
Overall, we see that the proposed contribution is well adapted to the correction of systems with slowly varying point spread functions but probably does not extend easily to fast variations that can be induced by some complex biological samples.
5. Conclusion
We proposed a specific neural network architecture to solve blind deblurring problems where distortions come from the optical elements. We evaluated its performance carefully on blind deblurring problems with space invariant and space-varying operators. A key assumption is to have access to a forward model that depends on a set of parameters. The network first estimates the unknown parameters describing the forward model from the measurements with a ResNet architecture. In a second step, an unrolled algorithm solves the inverse problem with a forward model that was estimated at the previous step. After designing a careful training procedure, we showed an advantage of the proposed approach in terms of robustness to noise levels and adaptivity to a vast family of operators and conditions not seen during the training phase.
Acknowledgments
The authors wish to thank Emmanuel Soubies and Thomas Mangeat for fruitful discussions. They thank the associate editor and the anonymous reviewers for their comments and advices.
Data availability statement
The deep-blur architecture and the trained weights can be provided on explicit demand by e-mail at [email protected].
Author contribution
Both authors contributed equally to all parts (conception, writing, programming, training, and experiments) of this article. P.W. was responsible for finding the funding.
Funding statement
This work was supported by the ANR Micro-Blind ANR-21-CE48–0008 and by the ANR LabEx CIMI (grant ANR-11-LABX-0040) within the French State Program “Investissements d’Avenir”. The authors acknowledge the support of AI Interdisciplinary Institute ANITI funding, through the French “Investing for the Future—PIA3” program under the Grant Agreement ANR-19-PI3A-0004. This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-AD011012210R1).
Competing interest
The authors declare none.