Impact Statement
This research addresses a crucial challenge in single-image super-resolution (SISR) by introducing Regularized Latent Search (RLS). Traditional SISR models depend on large paired datasets and offer only moderate resolution enhancements. Our method overcomes these limitations by leveraging the latent space of pre-trained generative models to find plausible HR images corresponding to LR inputs. This approach not only removes the necessity for paired images but also allows for a significant resolution boost of up to 32x. The significance of our approach lies in its ability to balance fidelity to the ground truth and realism of the recovered image given a distribution prior. By preserving biologically relevant content and interpretable features, our method ensures that super-resolved images are not only faithful to the original LR images but also retain essential domain-specific characteristics. This balance between fidelity and realism is crucial for accurate phenotypic interpretation and quantitative analysis, making our method particularly valuable for medical diagnostics and biological research.
1. Introduction
Various deep learning models have shown excellent performance in the single-image super-resolution (SISR) task, which aims to restore a high-resolution (HR) image from its low-resolution (LR) counterpart. Deep learning SISR models have been applied few years ago to enhance the resolution of microscopy images.(Reference Weigert, Schmidt and Boothe 1–Reference Ouyang, Aristov, Lelek, Hao and Zimmer3) More recently, studies have explored image-to-image translation models, which are trained to learn a parameterized function between two different image resolutions or modalities. These supervised approaches require a large number of paired images and rely on generative models that output artificial images.(Reference Hoffman, Slavitt and Fitzpatrick 4 ) These artificial images were accepted by the community as real mainly because they were measured as approaching real image instances. However, for the latter to hold, these approaches only offered a moderate increase in resolution of up to 4x.
Importantly, learning a parameterized mapping from a LR image to a HR image is an ill-posed problem: A single LR image corresponds to infinitely many highly resolved ones. Therefore, the super-resolution (SR) task cannot only consist of maximizing the fidelity of the recovered HR image; it must be further constrained to be well-posed. In this work, we propose to constrain this task by imposing that the reconstructed SR image be realistic, that is, belongs to a given image distribution, a principle that is also explored by utilizing gradient distribution prior in the context of biomedical images.(Reference Gong and Sbalzarini 5 ) In short, on the one hand, fidelity refers to recovering a SR image that, once downgraded, is close to the original image. On the other hand, realism refers to keeping the image within a given image domain.
In this paper, we aim to better define the SR task by enforcing the solution to be a trade-off between fidelity to the original LR image and realism, which includes the preservation of biologically relevant content, which we describe as “interpretable features.” Assessing these features is essential for accurate phenotypic interpretation and discrimination. By guiding the SR process toward this end, we generate images that are both faithful to the sample and biologically interpretable and quantifiable. In this way, we anticipate that SR in biology could exploit more than recovering only the details of a given sample, but benefit from features measured over a set of recovered images.
Our approach, regularized latent search (RLS), consists of a regularized search in the latent space of a pretrained generative model for HR images. By doing so, we both suppress the need for paired images and make the problem well-posed as we search for the closest image a generator can produce that, when downscaled, matches the LR image input. Moreover, as we keep the super-resolved image in the original domain, instead of producing a moderate increase in resolution, we propose to push further the synthesis: We anticipate that creating very highly resolved (up to 32x) but controlled artificial images could be of great interest for applications such as diagnostic. This is because what is at stake, in this case, is the preservation of measurable and interpretable features of microscopy images, not the absolute matching with real image samples.
Similar to image-to-image translation methods like CycleGAN,(Reference Zhu, Park, Isola and Efros 6 ) which operate without the need for paired images, our method differentiates itself in its approach to SR. It is specifically designed to address the ill-posed nature of SR by utilizing a generative adversarial network (GAN)’s latent space to identify plausible HR images for any given LR input. Once the model is trained, it enables the SR of any LR image with just an adjustment in the degradation function during inference. Unlike CycleGAN, which is deterministic and tailored for specific domain translations requiring individual models for each task, our method offers versatility and is suitable for a broad spectrum of SR tasks. It eliminates the need to learn domain-specific features and is intended for widespread application in SR, capable of being trained on a single dataset to enhance any LR image.
2. Related work and background
2.1. SR of microscopy images
While optical SR techniques such as stimulated emission depletion (STED)(Reference Hell and Wichmann 7 ) and photoactivation localization microscopy (PALM)(Reference Betzig, Patterson, Sougrat, Lindwasser, Olenych, Bonifacino, Davidson, Lippincott-Schwartz and Hess 8 ) can break the diffraction limit, they are limited by the need for specialized equipment and complex sample preparation. Computational SR methods, on the other hand, while they rely on existing training data, can represent a cheap way to enhance the resolution of images acquired with conventional microscopes. Several approaches were developed to address the problem of computational SR of microscopy images. Deep-STORM (Stochastic Optical Reconstruction Microscopy)(Reference Nehme, Weiss, Michaeli and Shechtman 9 ) uses an encoder–decoder network to localize emitters in super-resolved images. Content-aware image restoration(Reference Weigert, Schmidt and Boothe 1 ) uses a U-Net architecture and is trained with low signal-to-noise ratio (input) and high signal-to-noise ratio (target) image pairs. These approaches produced interesting results, but recovered images often lacked high-frequency details due to the mean squared error (MSE) loss. To balance this issue, other approaches such as ANNA-PALM (Artificial Neural Network-based Photo Activated Localization Microscopy)(Reference Ouyang, Aristov, Lelek, Hao and Zimmer 3 ) are based on GAN. The authors trained a U-Net, and in contrast with Deep-STORM, a combination of pixel-wise reconstruction loss and adversarial losses is used to obtain image reconstructions of better quality. Using a similar network architecture and training loss, Wang et al.(Reference Wang, Rivenson, Jin, Wei, Gao, Günaydın, Bentolila, Kural and Ozcan 2 ) achieved SR in fluorescence microscopy across different modalities. Overall, using an adversarial loss results in output images that are sharper and of better perceptual quality and these methods demonstrated the potential of deep learning to improve the spatial resolution of fluorescence microscopy images. While the deep learning architectures and applications of these methods differ, they all require hard-to-get paired image data for training and offer a moderate increase in resolution of up to 4x.
It is essential to note that computational SR techniques effectively enhance the resolution of images by optimizing the use of available data in the original images. However, these methods are constrained by the existing information and cannot add details beyond what the original optical systems could capture.
2.2. Style-based generative models
StyleGAN (Style-based Generative Adversarial Network) models(Reference Karras, Laine and Aila 10 , Reference Karras, Laine, Aittala, Hellsten, Lehtinen and Aila 11 ) are well known for their ability to generate highly realistic images. The StyleGAN architecture consists of two subnetworks: a mapping network denoted by $ {\mathrm{G}}_m:{\mathrm{\mathbb{R}}}^d\to {\mathrm{\mathbb{R}}}^d $ and a synthesis network consisting of $ L $ layers denoted by $ {\mathrm{G}}_s:{\mathrm{\mathbb{R}}}^{L\times d}\to {\mathrm{\mathbb{R}}}^n $ . Here, $ d $ represents the dimensionality of the latent space. The mapping network takes a sample $ \mathbf{z}\in {\mathrm{\mathbb{R}}}^d $ from a standard normal distribution and maps it to a vector $ \mathbf{w}\in \mathcal{W} $ , where $ \mathcal{W} $ denotes the intermediate latent space. StyleGAN2 introduced path length penalty, which encourages a fixed-size step in $ \mathbf{w} $ to result in a fixed-magnitude change in the image. The regularization term is computed using the Jacobian determinant, and penalizing changes in it promotes the generation of smoother and more realistic images. $ L $ copies of the d-dimensional vector $ \mathbf{w} $ are fed to the $ L $ -layer synthesis network $ {\mathrm{G}}_s $ , with each copy representing the input to the corresponding layer of $ {\mathrm{G}}_s $ . The $ {\mathrm{G}}_s $ network controls the level of detail in the generated image at each layer. Individual modification of these $ L $ layers, by adjusting the latent vector copy of $ \mathbf{w} $ for each layer, extends the latent space into $ {\mathcal{W}}^{+} $ . This extended latent space enhances the model’s capability for accurate image reconstruction, which is vital for SR tasks, offering more nuanced control over the image generation process.(Reference Abdal, Qin and Wonka 12 , Reference Wulff and Torralba 13 )
2.3. GAN-based high-resolution image reconstruction
The problem of obtaining a SR image of dimension $ n $ from a low-resolution (LR) image of dimension $ m $ is ill-posed as, for a non-invertible forward operator $ \mathrm{D} $ with $ m<n $ , there are infinitely many HR images that match a given LR image. Thus, the reconstruction procedure must be further constrained by prior information to better define the objective and lead to a stable solution. One such prior consisted of considering the reconstructed HR image to be part of a given domain. First, the distribution of HR images is learned in an unsupervised fashion, thanks to a GAN, and then the latent space of this trained GAN is searched to find a latent vector producing an HR image that, once downscaled, is the closest to the LR image input. GAN prior-based image reconstruction was first introduced by Bora et al.(Reference Bora, Jalal, Price and Dimakis 14 ) and further improved using StyleGAN(Reference Karras, Laine and Aila 10 , Reference Karras, Laine, Aittala, Hellsten, Lehtinen and Aila 11 ) by Menon et al. in PULSE (Photo Upsampling via Latent Space Exploration)(Reference Menon, Damian, Hu, Ravi and Rudin 15 ) by constraining the search to remain on the image manifold. To this end, PULSE and two other studies(Reference Wulff and Torralba 13 , Reference Zhu, Abdal, Qin, Femiani and Wonka 16 ) use an invertible transformation of the intermediate latent space $ {\mathcal{W}}^{+} $ , which includes a leaky rectified linear unit (rectified linear unit (ReLU))(Reference Goodfellow, Bengio and Courville 17 ) followed by an affine whitening transformation, so that transformed latent vectors approximately followed the standard Gaussian distribution $ \mathcal{N}\left(\mathbf{0},{\mathbf{I}}_d\right) $ . Sampled vectors are then constrained to lie around a hypersphere with radius $ \sqrt{d} $ hypothesizing that most of the mass of a high-dimensional Gaussian distribution is located at or near $ \sqrt{d}{\mathcal{S}}^{d-1} $ , where $ {\mathcal{S}}^{d-1} $ is the $ d $ -dimensional unit hypersphere. Constraining samples to lie in a dense area of the StyleGAN style distribution resulted in increased realism of the generated images.
Although the above approach showed major improvements over previous work, it also presents important caveats that in practice led to image artifacts. Here, we show that transforming the intermediate latent space in this way does not lead to an accurate standard Gaussian distribution, and so prevents proper regularization based on this hypothesis. Moreover, as the search is strictly limited to the spherical surface $ {\mathcal{S}}^{d-1} $ , this limitation may prevent the search from reaching the closest reconstruction of the HR image.
In this work, we regularize the search in the latent space for a latent code located in “healthy” regions of the latent space. In this way, the system is constrained to produce images that belong to the original image domain StyleGAN was trained on. To do so, we take advantage of normalizing flow to Gaussianize the latent style sample distribution, which leads to a much closer standard Gaussian distribution. We then use this revertible transformation to regularize the search in $ {\mathcal{W}}^{+} $ such that it remains in a high-density area of the style vector distribution. We then show experimentally that the latter produces reconstructed images that are not only realistic but also more faithful to the original HR image.
3. Method
3.1. Super-resolution by Regularized Latent Search
SR aims to reconstruct an unknown HR image $ \mathbf{x}\in {\mathrm{\mathbb{R}}}^{n\times n} $ from a LR image $ \mathbf{y}\in {\mathrm{\mathbb{R}}}^{m\times m} $ , which is related to the HR image by a downscaling process described by $ \mathbf{y}=\mathrm{D}\left(\mathbf{x}\right)+\delta $ , where $ \mathrm{D}:{\mathrm{\mathbb{R}}}^{n\times n}\to {\mathrm{\mathbb{R}}}^{m\times m} $ is a non-invertible downscaling forward operator and $ \delta $ is an independent noise with distribution $ {p}_{\delta } $ . We can formulate this task in terms of maximum a posteriori (MAP) estimation.(Reference Marinescu, Moyer and Golland 18 ) Given an LR image $ \mathbf{y} $ , our goal is to recover the HR image $ \mathbf{x} $ as the MAP estimate of the conditional distribution $ {p}_{\mathrm{G}}\left(\mathbf{x}|\mathbf{y}\right) $ :
The first term is the likelihood term, describing the image degradation process $ \mathrm{D} $ , and the second is the image prior, describing the manifold of real HR images.
3.1.1. Image prior
Let $ {\mathrm{G}}_s $ be the synthesis network of a StyleGAN(Reference Karras, Laine, Aittala, Hellsten, Lehtinen and Aila 11 ) pretrained on the considered image domain. $ {\mathrm{G}}_s $ takes as input $ \mathbf{w} $ , produced by the mapping network, and outputs an image. The image prior $ \log {p}_{\mathrm{G}}\left(\mathbf{x}\right) $ can be expressed with respect to the latent variables $ \mathbf{w} $ :
The second term can be dropped as the path length penalty in StyleGAN2 implies that the Jacobian determinant is constant for all $ \mathbf{w} $ . The first term $ {p}_{\mathbf{w}}\left(\mathbf{w}\right) $ is the image prior we define on $ \mathbf{w}\in {\mathcal{W}}^{+} $ by
where
-
• $ {\mathcal{P}}_w $ is a prior that keeps $ \mathbf{w} $ in the area of high density in $ {\mathcal{W}}^{+} $ : $ {\mathcal{P}}_w=\frac{1}{L}{\sum}_{i=1}^L\log {p}_{\mathrm{F}}\left({\mathbf{w}}_i\right) $ , where $ {p}_{\mathrm{F}}\left(\mathbf{w}\right) $ is estimated by a normalizing flow model $ \mathrm{F} $ explained in further detail below.
-
• $ {\mathcal{P}}_{cross} $ is a pairwise Euclidean distance prior on $ \mathbf{w}=\left[{\mathbf{w}}_1,\dots, {\mathbf{w}}_L\right]\in {\mathcal{W}}^{+} $ that ensures $ \mathbf{w}\in {\mathcal{W}}^{+} $ remains close to the trained manifold in $ \mathcal{W} $ : $ {\mathcal{P}}_{cross}=-{\sum}_{i=1}^{L-1}{\sum}_{j=i+1}^L{\left\Vert {\mathbf{w}}_i-{\mathbf{w}}_j\right\Vert}_2^2 $ .
-
• $ {\lambda}_w $ and $ {\lambda}_c $ are hyperparameters that control the relative importance of the two priors.
3.1.2. Normalizing flow
Using a sequence of invertible mappings, a normalizing flow $ \mathrm{F}:{\mathrm{\mathbb{R}}}^d\to {\mathrm{\mathbb{R}}}^d $ is a transformation of an unknown complex distribution into a simple probability distribution that is easy to sample from and whose density is easy to evaluate such as standard Gaussian.(Reference Kobyzev, Prince and Brubaker 19 )
Let $ \mathbf{z}=\mathrm{F}\left(\mathbf{w}\right) $ with probability density function $ p\left(\mathbf{z}\right) $ . Using the change-of-variable formula, we can express the log density of $ \mathbf{w} $ by(Reference Papamakarios, Nalisnick, Rezende, Mohamed and Lakshminarayanan 20 )
where $ {J}_{\mathrm{F}}\left(\mathbf{w}\right) $ is the Jacobian of $ \mathrm{F} $ evaluated at $ \mathbf{w} $ . In practice, the Jacobian determinant in equation nf should be easy to compute so that the density $ {p}_{\mathrm{F}}\left(\mathbf{w}\right) $ can be evaluated. Furthermore, as a generative model, the invertibility of $ \mathrm{F} $ allows new samples $ \mathbf{w}={\mathrm{F}}^{-1}(z) $ to be drawn through sampling from the base distribution. In the literature, several flow models were proposed, such as real non-volume preserving flow(Reference Dinh, Sohl-Dickstein and Bengio 21 ) and masked auto-regressive flow (MAF).(Reference Papamakarios, Pavlakou and Murray 22 )
3.1.3. Optimization process
In the likelihood term presented in equation map estimation, we assume that the noise follows a Laplace distribution, that is, $ \delta \sim Laplace\left(0,{\lambda}_lI\right) $ . This assumption simplifies the log density of $ \delta $ to $ \log {p}_{\delta}\left(\delta \right)=-{\left\Vert \delta \right\Vert}_1-C $ , where $ C $ is a constant. Thus, the optimization problem in equation map estimation is effectively transformed into an optimization over $ \mathbf{w} $ , resulting in the final objective function:
The selection of the Laplace distribution for noise modeling is favored due to its simple form, which simplifies the log-density function and makes the loss function easier to optimize. Moreover, the $ {L}_1 $ norm, which arises in the log density of the Laplace distribution, offers greater robustness to outliers than the $ {L}_2 $ norm. This characteristic is particularly beneficial in the context of SR, where the ability to handle irregular noise or artifacts in LR images is crucial, especially given the challenges commonly associated with microscopy data.
3.2. Evaluation of the Gaussianization process
To assess the quality of the Gaussianization process, we sampled 5000 vectors $ \mathbf{z}\sim \mathcal{N}\left(\mathbf{0},{\mathbf{I}}_d\right) $ . These vectors were then transformed into style vectors $ \mathbf{w}={\mathrm{G}}_m\left(\mathbf{z}\right) $ using the mapping network $ {\mathrm{G}}_m $ of StyleGAN. We then Gaussianized the distribution of $ \mathcal{W} $ using alternatively PULSE and our method. The quality of the Gaussianization process was evaluated by computing the squared norm for all of these vectors within the transformed distribution (see Figure 1). Ideally, the squared norm of the standard Gaussian distribution $ \mathcal{Z} $ should follow a chi-squared distribution, that is, $ {\left\Vert {\mathbf{z}}_n\right\Vert}_2^2\sim {\chi}_d^2 $ , and thus, it forms a narrow distribution centered around $ d=512 $ , the dimensionality of $ \mathcal{W} $ . We can see that this is the case for $ \mathcal{Z} $ , which is Gaussian. However, the $ \mathcal{W} $ does not follow this pattern, which is inconsistent with the prior assumption held by BRGM (Bayesian Reconstruction through Generative Models).(Reference Marinescu, Moyer and Golland 18 ) Furthermore, while the PULSE method results in a broader distribution, our method more accurately narrows the squared norm distribution to match the expected chi-squared distribution.
4. Experiments
4.1. Experimental settings
A critical aspect of our study was the necessity to work with high downscaling factors, such as 16x and 32x, which are central to our application scenario. These factors are not commonly used in the literature, and to the best of our knowledge, there are no experimental datasets available with such high downscaling factors. Most existing datasets typically focus on moderate downscaling factors of 2x or 4x, which is not aligned with the needs of our study, where we aim to address more extreme cases of resolution enhancement. Therefore, we had to generate our own LR images.
As for implementation details, we began by training a StyleGAN2-ADA (StyleGAN2 with Adaptive Discriminator Augmentation)(Reference Karras, Aittala, Hellsten, Laine, Lehtinen and Aila 23 ) model on a subset of the BBBC021 dataset,(Reference Caicedo, McQuin, Goodman, Singh and Carpenter 24 , Reference Ljosa, Caie and Ter Horst 25 ) which comprises cells treated with drugs acquired following the cell painting assay.(Reference Caie, Walls, Ingleston-Orme, Daya, Houslay, Eagle, Roberts and Carragher 26 ) That is, cells were fluorescently stained with markers for F-actin, B-tubulin, and deoxyribonucleic acid (DNA), as described in.(Reference Caie, Walls, Ingleston-Orme, Daya, Houslay, Eagle, Roberts and Carragher 26 ) The dataset comprises wide-field epi-fluorescence images, captured using the automated ImageXpress imaging platform. From this dataset, we extracted images of size $ 128\times 128 $ centered around each cell nucleus and used 400 images per compound treatment (approximately 28,000 in total) to train StyleGAN2-ADA.(Reference Karras, Aittala, Hellsten, Laine, Lehtinen and Aila 23 ) We then proceeded to generate 100000 random samples $ \mathbf{z}\sim \mathcal{N}\left(\mathbf{0},{\mathbf{I}}_d\right) $ and used their associated style vector $ \mathbf{w}={\mathrm{G}}_m\left(\mathbf{z}\right) $ to train the normalizing flow model. We opted for MAF, as it tends to perform better than RealNVP (Real-valued Non-Volume Preserving) for density estimation tasks. Our normalizing flow model comprised five flow blocks with all hidden dimensions set to 1024. For super-resolving the images using our RLS algorithm, we employed an Adam optimizer over 200 iterations with a learning rate of 0.5 and initialized the search using the mean of 10,000 randomly generated latent vectors.
To fine-tune the regularization parameters $ {\lambda}_w $ and $ {\lambda}_c $ , we empirically evaluated a spectrum of values, eventually setting them to 5e-5 and 0.01, respectively. This process revealed that the algorithm’s performance remained relatively stable across a broad range of these parameters, indicating a lack of sensitivity as long as the values fell within a specific boundary. Precisely, we observed optimal performance when $ {\lambda}_w $ was between 1e-6 and 5e-4, and $ {\lambda}_c $ ranged from 0.005 to 0.05. These findings suggest that while the exact values of $ {\lambda}_w $ and $ {\lambda}_c $ are flexible, maintaining them within these determined ranges ensures the algorithm functions effectively.
For evaluation, we used 100 random samples from various compound treatments, ensuring samples were not included in the training of the generative network. We simulated degraded images from these HR images using bicubic downsampling, which is a common choice in the literature. We then compared the results of our algorithm with those of Pix2Pix and two state-of-the-art unsupervised image reconstruction methods based on StyleGAN inversion: PULSE(Reference Menon, Damian, Hu, Ravi and Rudin 15 ) and BRGM.(Reference Marinescu, Moyer and Golland 18 ) For the baseline methods, we used the same parameter settings reported by the original papers.
4.2. Results
4.2.1. RLS recovers high-quality cell images
We reconstructed images using RLS and compared them to images generated with baseline methods. The results of this comparison are presented in Figure 2. At these high upscaling factors, due to the lack of proper regularization, the competing methods fail to produce reasonable cellular details and tend to produce images that can be accurately downscaled to the LR image at the cost of generating distorted cellular structures. In contrast, our approach produces highly realistic images for most examples and can reproduce cell phenotypes induced by compound treatment even if the fine-grain details are not similar. For instance, when reconstructing the images of cells treated with nocodazole, a known microtubule destabilizer, RLS captures the typically fragmented microtubule phenotype, resulting in a more accurate texture. Reconstructing images of cells at higher resolution obviously provides access to finer-grain measurable features.
To quantitatively assess this gain in performance, we evaluate the SR images using Frechet inception distance (FID)(Reference Heusel, Ramsauer, Unterthiner, Nessler and Hochreiter 27 ) and kernel inception distance (KID)(Reference Bińkowski, Sutherland, Arbel and Gretton 28 ) to measure the discrepancy between the real HR images and the reconstructed one. As expected, the scores of FID and KID listed in Table 1 show that with both upscaling factors, our method significantly improves realism.
Note: The best and the second best are highlighted in bold and italic, respectively.
4.2.2. RLS achieves better perceptual scores
We evaluated the reconstruction accuracy of RLS using learned perceptual image patch similarity (LPIPS),(Reference Zhang, Isola, Efros, Shechtman and Wang 30 ) peak signal-to-noise ratio (PSNR), and multi-scale structural similarity index for motion detection (MS-SSIM)(Reference Wang, Simoncelli and Bovik 31 ) metrics to compare the reconstructed SR images to the real HR images. As expected, RLS did not achieve the best pixel-wise reconstruction losses but achieved either the lowest or second-lowest LPIPS scores on both 32x and 16x upscaling factors. The latter suggested that while the recovered SR image is not exactly the same as the HR image, it is perceptually closer. It is worth noting that although Pix2Pix produced images with high PSNR and MSSIM scores, it struggled to accurately reconstruct cellular details. This observation highlights the fact that PSNR and SSIM metrics may not be fully appropriate to evaluate the performances of the SR tasks.
4.2.3. RLS preserves interpretable features
RLS achieves a balance between realism and fidelity when super-resolving microscopy images. However, as any deep learning method for SR, it cannot generate details it would not have seen during training. Therefore, it cannot be used in a context where novel events can be expected. However, it could possibly be used to perform measurements to quantify expected phenotypic changes. This is the case for many assays used in basic research in biology. It is also the case in diagnostics such as parasitemia for instance where the tool must assess if a parasite is present or not. Instead of relying on a blind classification or regression of LR images, these tasks on LR images in such a context could be decomposed into two steps. A first step based on our deep learning approach would consist of reconstructing a HR image, while a second step would use a dedicated handcrafted analysis to quantify a phenotypic feature, making the analysis explicitly interpretable.
To evaluate to what degree the information conveyed by the SR images can be used for such quantitative assays and maybe later for interpretable diagnostics, we applied it to two assays that allowed straightforward quantification of “interpretable features” on HR images but not on LR images. Here, interpretable features refer to the measurable and explainable phenotypic changes crucial for biological assessment, such as the nucleocytoplasmic ratio in response to TNF- $ \alpha $ treatment and alterations in the Golgi apparatus morphology in reaction to nocodazole. Quantitative analysis was conducted on the SR images to assess the reconstruction of these features, and results were compared against those obtained from HR images as well as baseline methods, including BRGM, PULSE, and an unregularized latent search (“w/o Regu.”).
The first assay aimed to track the location of the nuclear factor kappa B (NF- $ \kappa $ B) protein within the cell. Upon treatment with TNF- $ \alpha $ , a pro-inflammatory cytokine, the protein moves to the nucleus, causing a shift in fluorescence signal from the cytoplasm to the nuclear area, resulting in bright green nuclei. We observed that the nucleocytoplasmic fluorescence ratio measured on 1000 HR-treated and HR-untreated images could be replicated when computed from super-resolution (SR) images (Figure 5a). We also provided some visual examples of super-resolving the LR images, that is, the first step in Figure 3, which shows that our method can reconstruct images of wild-type and treated cell images obtained with this common assay.
The second assay we conducted aimed to monitor changes in the morphology of the Golgi apparatus. When cells are exposed to nocodazole, microtubules disassemble, causing the Golgi, originally located near the center of the cell, to break up into smaller stacks. Figure 4 shows that standard assays such as nocodazole-induced Golgi scattering (green) were reproduced with SR images. Moreover, as illustrated in Figure 5b, when computed only from SR images, a straightforward average spot size difference measured on 1000 HR-treated and 1000 HR-untreated images could be retrieved.
Furthermore, we utilized cells treated with DMSO (Dimethyl Sulfoxide) as negative controls and cells treated with nocodazole (or TNF in the case of the translocation assay) as positive controls. We developed a classifier trained on HR images to distinguish between the two phenotypes and evaluated its performance on both HR and super-resolved images, ensuring that the datasets for GAN training and classifier training were distinct to maintain the integrity of our results. Additionally, we trained a separate classifier on LR images for phenotype differentiation, and its testing was confined to LR images. The results as shown in Table 2 indicate that the SR operation does not degrade the discriminative signal contained in LR images, and the classification accuracy is about the same on SR and LR images. Furthermore, we show evidence that this signal difference can be retrieved when quantifying the same interpretable feature variation present in HR images using handcrafted image analysis (see Figure 5).
It is worth noting that in Figure 5 the similarity in performance between the “w/o Regu.” model and RLS during the translocation assay reveals an interesting insight. It suggests that, for this specific assay, the additional regularization does not significantly change the outcome. However, it is important to note that this assay was designed to evaluate strong phenotypic changes where the expected changes are substantial and easy to detect. In this case, the nonregulated version can also accurately capture the phenotypic changes, since the differences between the images are large. This is evidenced by the comparative classification accuracies in Table 2 for the translocation assay, where RLS achieves similar performance to the “w/o Regu.” version and the baseline methods. The advantages of regularization in RLS are more apparent in assays with subtler phenotypic variations, where detecting nuanced differences is more challenging. For example, in the Golgi morphology assay, where detecting changes requires more detailed analysis, RLS’s regularization reconstructs realistic images that more faithfully represent biological structures required for detailed quantitative evaluations. Furthermore, considering the overall performance, RLS demonstrates consistent and enhanced performance compared to both the baselines and the non-regularized version, as indicated by the superior classification results in Table 2. This suggests that RLS is a more robust method for super-resolving microscopy images, especially in assays where the phenotypic changes are subtle. Overall, the experiments confirm that the quality of super-resolved images is adequate for further analysis. Our method enables a two-stage process: employing deep learning for the challenging task of SR followed by a handcrafted, interpretable method for the subsequent quantitative measurements.
4.3. Robustness
In opposition to supervised methods that are sensitive to the input image domain, this approach is not restricted to a particular degradation operator that is used during training. To evaluate this aspect, we applied additional degradation operators such as Gaussian noise, salt and pepper, and Gaussian blur to a bicubic downscaled image DS before reconstruction. As depicted in Figure 6, the reconstruction closely matches the DS image. This result validates our choice of using the bicubic downscaling operator during training instead of more complicated specific degradation.
4.4. Ablation study
Figure 7 demonstrates ablation experiments highlighting the impact of the components of our image prior. First, “w/o Regu.” searches the latent space without any regularization for the image that, once downscaled, matches the LR image. The second variant is denoted “w/o $ {\mathcal{P}}_w $ ,” that is, the image prior does not include the prior term $ {\mathcal{P}}_w $ . Similarly, “w/o $ {\mathcal{P}}_{\mathrm{cross}} $ ” refers to the suppression of $ {\mathcal{P}}_{cross} $ .
To evaluate the three variants, we use the same set of parameters described in Section 4.1. One can see that searching the latent space without any regularization produces images that do not necessarily belong to the original image manifold and therefore do not appear realistic. It also tends to generate images that are accurately downscaled to the LR image but at the cost of generating distorted images when $ {\mathcal{P}}_{\mathrm{cross}} $ or $ {\mathcal{P}}_w $ is discarded. This implies that both priors play an important role in generating realistic details.
4.5. Uncertainty
An important challenge of the SR task is that it is an ill-posed problem. Although we can improve this aspect by using an image prior constraint, several closely related highly resolved images could still be consistent with a single LR image. To generate $ n $ realistic SR images, we sample $ n $ latent codes denoted as $ {\mathbf{w}}_1,{\mathbf{w}}_2,\dots, {\mathbf{w}}_n $ . We assume that their distribution follows a Gaussian distribution $ \mathcal{N}\left(\mu, \sigma \right) $ , where the parameter $ \sigma $ follows an inverse gamma distribution. Using the Bayes rule, we estimate the distribution’s parameters by
Here, the first term is the log-likelihood of the posterior distribution of $ p\left(\mathbf{w}|\mathbf{y}\right) $ , which is defined in Eq. (5); the second term is the regularization term, which penalizes large values of $ \sigma $ ; and $ \log p\left(\sigma \right) $ is the log prior distribution of $ \sigma $ . Figure 8 displays multiple solutions for a given LR image we can obtain with our approach. Sampling close variations of HR images from a single LR input can be used to enhance the robustness of the estimation of an image-based quantitative feature by reducing the effects of noise or artifacts that may exist in the input LR image.
5. Conclusion
In this paper, we propose a robust SR method based on RLS within a pretrained StyleGAN. It does not require coupled image pairs for training and constrains the SR task to a given image prior to offering a trade-off between fidelity and realism of the SR reconstruction. Furthermore, we show that such a method could be used to split analyses, such as the classification of LR images, into a reconstruction of SR images performed by deep learning and a simple dedicated handcrafted analysis of an interpretable feature.
The latter could be used, for instance, for rapid diagnostic based on smartphone directly available on the field. In this case, a dataset of LR images of slides acquired from fast and/or cheap solutions available on the field could be coupled with the acquisition of the same slides using high-end expensive HR microscopes with limited access. In this application, HR images could then be reconstructed directly on the field to perform an explainable diagnostic such as a parasite count.
6. Limitations and future work
One common challenge with deep learning-based methods, especially those using generative priors like in our study, is their limited generalizability to unfamiliar and unseen data. Our approach, which utilizes StyleGAN for unsupervised learning within a specific domain, may not encompass the full diversity encountered in real-world scenarios. To enhance the model’s generalizability, strategies such as incorporating data augmentation to introduce training data variability, or applying transfer learning for domain adaptation, could be beneficial.
Our current findings serve as an initial validation of our method’s capabilities. However, we acknowledge the necessity of further evaluations using more varied datasets, encompassing a broader range of imaging techniques and sample preparation methods. The absence of such diverse experimental data in our current research is due to the specific requirement of our study to explore SR at high upscaling factors like 16x and 32x. These factors, which are crucial to our application’s needs, are rarely addressed in the available literature, and to our knowledge, datasets with such extreme upscaling factors are not yet available. Existing datasets generally focus on more moderate upscaling factors, such as 2x or 4x, which do not meet the demands of our research that targets significantly higher levels of resolution enhancement. We are committed to extending our validation to include real-world imaging conditions as soon as datasets meeting our high upscaling factor requirements become available. This will enable a more comprehensive assessment of our method’s applicability and performance in practical scenarios.
Data availability statement
We used the BBBC021 image set available from the Broad Bioimage Benchmark Collection (https://bbbc.broadinstitute.org/).
Acknowledgments
We thank the Grand équipement national de calcul intensif (GENCI) for access to the high-performance computing (HPC) resources of the Institute for Development and Resources in Intensive Scientific Computing (IDRIS) under the allocation 2020-AD011011495.
Author contribution
M.G. proposed and implemented the RLS method. A.G. hypothesized that RLS reconstructs interpretable features and suggested experiments to validate it. M.G. ran the experiments. M.G. and A.G. wrote the manuscript.
Funding statement
This work was supported by the ANR VISUALPSEUDOTIME, ANR–10–LABX–54 MEMOLIFE, and ANR–10 IDEX 0001–02 PSL* Université Paris.
Competing interest
The authors declare none.