Impact Statement
Staining with multiple biomarkers in a single tissue section, multiplex staining, is changing how we examine in-situ normalcy, pathology, and the interrelationship of good and bad biological actors. Bioinformatic pipelines developed to deal with high-dimensional datasets such as single-cell RNA sequencing or multiparameter flow cytometry have been adapted to analogous types of data derived from tissue, and co-exist with conventional image analysis tools and human eye-guided image evaluation. We wanted to evaluate if multiplex staining with more than 15 markers (hyperplexed) has comparable sensitivity to conventional image analysis and if these latter analysis tools carry undisclosed biases and should not be used together with hyperplexed staining. We found that the human eye has a reduced discriminative power for grayscale and luminance levels compared to the 8-bit available spectrum, affecting positive signal recognition above the noise. We also found that the granular analytical power of recent bioinformatic pipelines can extract information from images which defy human eye perception and deliver information unattainable with existing image analysis tools for single-stain images.
1. Introduction
In situ antigen detection in tissues via antibody staining, in transmitted (immunohistochemistry; IHC) or fluorescent light (immunofluorescence; IF), is an established tool in science. It is an assay that preserves spatial structure, complementary to other techniques such as in situ transcriptomics(Reference Le, Ahmed and Yeo 1 ) or in situ proteomics.(Reference Mund, Coscia, Kriston, Hollandi, Kovács, Brunner, Migh, Schweizer, Santos, Bzorek, Naimy, Rahbek-Gjerdrum, Dyring-Andersen, Bulkescher, Lukas, Eckert, Lengyel, Gnann, Lundberg, Horvath and Mann 2 ) It also complements all techniques applied to disaggregated specimens, the latter as single-cell suspensions (e.g., single-cell RNA sequencing(Reference Svensson, Vento-Tormo and Teichmann 3 ); scRNAseq) or homogenates.
In recent years, in situ immunostaining has evolved from a single stain (IHC, the staple tool of diagnostic Pathology) to multiple (from two to seven or more) IF stains, to a much higher number of simultaneous co-stains, typically in excess of a dozen, in what is called high-plex or high-dimensional in situ staining or targeted antibody-mediated proteomics.(Reference De Smet, Antoranz Martinez and Bosisio 4 ) Recommendations for standardization of the diagnostic use of multiplex stains followed,(Reference Taube, Akturk, Angelo, Engle, Gnjatic, Greenbaum, Greenwald, Hedvat, Hollmann, Juco, Parra, Rebelatto, Rimm, Rodriguez-Canales, Schalper, Stack, Ferreira, Korski, Lako, Rodig, Schenck, Steele, Surace, Tetzlaff, von Loga, Wistuba and Bifulco 5 ) including antibody validation practices.
An analogous progress occurred earlier with flow cytometry (FCM), a technique which employs conjugated antibodies to characterize single-cell suspensions.(Reference Herzenberg, Tung, Moore, Herzenberg and Parks6 ) An acceleration of the evolution of the technique was brought by the use of metal-conjugated antibodies and mass spectrometry for detection (Cytometry by time of flight; CYTOF), in lieu of photodetectors and photomultipliers.(Reference Olsen, Leipold, Pedersen and Maecker 7 ) The evolution of the technique was accompanied by an evolution of the bioinformatic tools required to handle such an increase in data dimensionality to be analyzed.(Reference Mair, Hartmann, Mrdjen, Tosevski, Krieg and Becher 8 ) Most of the bioinformatic tools developed for single-cell assays (scRNAseq, FCM) have been applied to the analysis of single cells in tissue sections.
Low-plex staining (~7–10 biomarkers) is increasingly widespread, partly owing to the popularity of a signal-enhanced technique (Tyramide Signal Amplification or TSA(Reference van Gijlswijk, Zijlmans, Wiegant, Bobrow, Erickson, Adler, Tanke and Raap 9 )). However, the image analysis (IA) required for this type of staining does not differ from what is customarily used for single-stain images in IHC or IF.(Reference Aeffner, Zarella, Buchbinder, Bui, Goodman, Hartman, Lujan, Molani, Parwani, Lillard, Turner, Vemuri, Yuil-Valdes and Bowman 10 )
What sets hyperplexed in-situ targeted proteomics via antibody immunodetection, the method using high-plex (>15) biomarker determination at cellular or sub-cellular resolution in situ, apart from low-plex techniques is the use of bioinformatic tools typical of other single-cell assays.(Reference Clarke, Andrews, Atif, Pouyabahar, Innes, MacParland and Bader 11 , Reference Heumos, Schaar, Lance, Litinetskaya, Drost, Zappia, Lucken, Strobl, Henao, Curion, Single-cell Best Practices, Schiller and Theis 12 ) As in FCM and scRNAseq, human visual image assessment is minimal or nil for these processes, even though the ground truth data are tissue images.
The development of IA tools(Reference Ljosa and Carpenter 13 ) has accompanied the production of images all along. Interestingly, one of the main concerns of scientists using IA is to identify nuclei in sections,(Reference Jamali, Dobson, Eliceiri, Carpenter and Cimini 14 ) something Surgical Pathologists do effortlessly every day.
IA has been developed not as a replacement for the human eye but as a companion, particularly for simplified one-protein-at-a-time diagnostic immunostains.(Reference Czerniak, Herz, Wersto, Alster, Puszkin, Schwarz and Koss 15 ) However, multiplex staining data are intrinsically so complex that deep learning-assisted IA has an increasing role in multiple steps, such as image segmentation,(Reference Ma and Wang 16 ) data normalization, and cell classification.(Reference Amitay, Bussi, Feinstein, Bagon, Milo and Keren 17 ) Yet, the microscope’s future evolution in an expert’s view still features eyepieces.(Reference Carpenter, Cimini and Eliceiri 18 )
We sought to reevaluate the individual components leading to single-cell classification via hyperplexed stains, and in particular the role, when present, of human visual assessment of images in processes such as assay sensitivity, antibody validation, signal thresholding, gating, and cell segmentation.
By analyzing a public human lymph node dataset with a custom bioinformatic pipeline, BRAQUE,(Reference Dall’Olio, Bolognesi, Borghesi, Cattoretti and Castellani 19 ) we found that the human eye is dispensable for the analysis of in situ hyperplexed multistainings.
2. Materials and methods
2.1. Ethical background
The study has been approved by the Institutional Review Board Comitato Etico Brianza, N. 3204, “High-dimensional single cell classification of pathology (HDSSCP),” October 2019. Consent was obtained from patients who could be contacted or waived according to article 89 of the EU General Data Protection Regulation 2016/679 (GDPR) and decree N. 515, 12/19/2018 of the Italian Privacy Authority.
2.2. Human specimens
Sentinel lymph nodes (n = 5) were extracted from the laboratory information systems of the San Gerardo Hospital by the Authors with clinical privileges and anonymized. Paraffin blocks and sections to be analyzed were selected by a Pathologist after a review of the Hematoxylin and Eosin (H&E) stain. Only archival formalin-fixed, paraffin embedded material (FFPE) was used.
2.3. Histology
Chilled paraffin blocks were sectioned on a rotary microtome (Leica Biosystems, Buccinasco, MI, Italy) at 3 μm; sections were placed in a warm waterbath and collected on charged microscope glass slides. After an overnight oven incubation in an upright position, they were further processed for Hematoxylin & Eosin (H&E), IHC, or IF stains.
2.4. Antigen retrieval
Antigen retrieval (AR) was performed by placing the dewaxed, rehydrated sections(Reference Bolognesi, Manzoni, Scalia, Zannella, Bosisio, Faretta and Cattoretti 20 ) in an 800 ml glass container filled with the retrieval solution (EDTA pH 8; 1 mM EDTA in 10 mM Tris-buffer pH 8, Merck Life Science S.r.l., Milano, Italy; cat. T9285), irradiating them in a household microwave oven at full power for 8 min, followed by intermittent irradiation to maintain constant boiling for 30 min, and cooling the sections to about 50 °C before use.
2.5. Immunohistochemistry
Primary unconjugated antibodies (Abs) were validated for frozen and for FFPE material according to established criteria(Reference Bolognesi, Mascadri, Furia, Faretta, Bosisio and Cattoretti 21 ) (see Supplementary Tables).
For immunohistochemistry (IHC), optimally diluted, validated primary antibodies were applied overnight, washed in 50 mM Tris–HCl buffer (pH 7.5) containing 0.01% Tween-20 (Merck) and 100 mM sucrose (TBS-Ts),(Reference Cattoretti, Bosisio, Marcelis and Bolognesi 22 ) counterstained with a horseradish peroxidase–conjugated polymer (Vector Laboratories, Burlingame, CA), washed, developed in DAB (Agilent, Santa Clara, CA), lightly counterstained and mounted.
Serial LN sections were immunostained for the AE1–AE3 pre-made cocktail in an Omnis automated immunostainer (Agilent, Santa Clara, CA) with routine same-day protocols.
2.6. Indirect immunofluorescence
Multiple immunofluorescent (IF) labeling was previously described in detail.(Reference Bolognesi, Manzoni, Scalia, Zannella, Bosisio, Faretta and Cattoretti 20 ) Briefly, the sections were incubated overnight with optimally diluted primary antibodies in species or isotype mismatched combinations (e.g., rabbit + mouse, mouse IgG1 + mouse IgG2a), washed and counterstained with specific distinct fluorochrome-tagged secondary antibodies (Supplementary Tables).(Reference Bolognesi, Manzoni, Scalia, Zannella, Bosisio, Faretta and Cattoretti 20 ) The slides, counterstained with DAPI and mounted, were scanned on an S60 Hamamatsu scanner (Nikon, Campi Bisenzio, FI, Italy) at 20× magnification. The filter setup for seven color acquisition (DAPI, BV480, FITC, TRITC, Cy5, PerCp, autofluorescence/AF) was as published.(Reference Mascadri, Ciccimarra, Bolognesi, Stellari, Ravanetti and Cattoretti 23 ) Additional data are in the Supplementary Material.
2.7. Tyramide signal amplification
Sections to be processed for TSA were dewaxed and subjected to antigen retrieval as described above; endogenous peroxidase was blocked, and the sections were incubated with the primary Ab overnight and processed as per the manufacturer’s instructions for Alexa Fluor™ 647 Tyramide (cat. N. B40958; Thermo Fisher Scientific, Vedano al Lambro, Italy), a fluorochrome emitting in the red spectrum where tissue autofluorescence is minimal. The Alexa Fluor™ 647 signal was acquired with a 650/13 nm excitation filter, a 694/44 nm emission filter, and a dichroic FF655-Di01 filter(Reference Mascadri, Bolognesi, Pilla and Cattoretti 24 ) and could be combined with other fluorochrome combinations except the ones emitting in the 530–570 nm range, where the TSA-Alexa Fluor™ 647 product bleeds. All filters are from Semrock, Lake Forest, Ill. Details of the process can be found in the Supplementary Material.
In the double indirect IF-TSA combined staining, TSA was performed first.
2.8. Preparation of immunofluorescent images for single cell analysis
After the stainings were acquired, digital slide images (.ndpi) were imported as uncompressed .tiff with ImageJ (ImageJ, RRID:SCR_003070). Tissue autofluorescence (AF) was subtracted when appropriate as published.(Reference Bolognesi, Manzoni, Scalia, Zannella, Bosisio, Faretta and Cattoretti 20 )
2.9. Image analysis
IHC/IF staining quantitation: Fluorescence images were imported in Fiji(Reference Schindelin, Arganda-Carreras, Frise, Kaynig, Longair, Pietzsch, Preibisch, Rueden, Saalfeld, Schmid, Tinevez, White, Hartenstein, Eliceiri, Tomancak and Cardona 25 ) (RRID:SCR_002285). For area quantification, inverted images were adjusted (Brightness/Contrast command) and thresholded (OTSU). The stained area value was normalized for the total nuclear area value (DAPI). For IHC, the image was color deconvoluted(Reference Ruifrok and Johnston 26 ) and the DAB image was processed as above. Hematoxylin was used for normalization instead of DAPI. Brightness/Contrast, Math transformation (log), and 3D surface plot were used for visualization (see Supplementary Methods).
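The area quantification described above can be illustrated with a minimal sketch in plain Python. It assumes an 8-bit single-channel image flattened to a list of integers, and it omits the inversion and Brightness/Contrast steps performed in Fiji; the function names (`otsu_threshold`, `stained_area`, `normalized_stained_area`) are hypothetical and are not part of Fiji or of the authors' workflow.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    of the two classes {p <= t} and {p > t} (Otsu's method)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray values in the lower class
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty, nothing left to split
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def stained_area(pixels):
    """Count the pixels brighter than the Otsu threshold."""
    t = otsu_threshold(pixels)
    return sum(1 for p in pixels if p > t)


def normalized_stained_area(marker_pixels, nuclear_pixels):
    """Stained marker area divided by the nuclear (e.g., DAPI) area."""
    nuclear = stained_area(nuclear_pixels)
    if nuclear == 0:
        raise ValueError("no nuclear signal above threshold")
    return stained_area(marker_pixels) / nuclear


# Toy example: 20 bright marker pixels over 40 bright nuclear pixels.
marker = [10] * 50 + [12] * 30 + [200] * 20
dapi = [5] * 60 + [180] * 40
ratio = normalized_stained_area(marker, dapi)
```

Because the threshold is derived from the histogram alone, dim positive pixels that sit close to the background mode fall into the lower class, which is the behavior examined in Section 3.3.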
Two public-domain IA tools were used for nuclear identification: QuPath (RRID:SCR_018257)(Reference Bankhead, Loughrey, Fernandez, Dombrowski, McArt, Dunne, McQuaid, Gray, Murray, Coleman, James, Salto-Tellez and Hamilton 27 ) and CellPose 2.0 (RRID:SCR_021716).(Reference Pachitariu and Stringer 28 ) Details of the setting for IA are reported in the Supplementary Methods.
Adobe Photoshop 2023 (San Jose, CA) (RRID:SCR_014199) and Adobe Illustrator (RRID:SCR_010279) were used for figure layouts.
2.10. Grayscale tone discrimination test of the human eye
Fourteen pathologists, 11 males and 3 females, aged 43 ± 13.8 years (range 29–71), 14.1 ± 13 years into the profession (range 0–43) were asked to log into the Time magazine website https://time.com/4663496/can-you-actually-see-50-different-shades-of-grey/, perform the test and provide the score obtained. Additional information is provided in the Supplementary Methods section.
2.11. Bit depth reduction discrimination tests
Twelve pathologists with diagnostic digital pathology experience examined a series of continuous gray-shaded bars and full-size four-image composites uploaded into NDPserve (Hamamatsu Photonics) via a provided link. The images encompass the various types of digital images encountered during diagnostic sign-out (Supplementary Figure S1). The bit depth of each image in the composite was changed from the conventional 24 bit (8 bits for each of three channels, 256 levels per channel) to 6, 5, or 4 bit per channel via the Adjustments > Posterize command (Adobe Photoshop 2023), then saved in the new format with the original image size and pixel resolution. The 7-bit image was not used for the histology image test, but only for the grayscale gradient bars. The percentage of correct scores for each observer, the image type, and the bit depth were recorded. Additional information is provided in the Supplementary Methods section.
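To make the bit depth reduction concrete, the following sketch quantizes 8-bit values to a smaller number of evenly spaced levels. It is an approximation written for illustration, assuming uniform binning and rescaling to the 0–255 range; it is not claimed to reproduce the exact arithmetic of the Photoshop Posterize command, and `reduce_bit_depth` is a hypothetical helper name.

```python
def reduce_bit_depth(value, bits):
    """Map an 8-bit value (0-255) onto 2**bits evenly spaced gray levels."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = 2 ** bits
    index = value * levels // 256              # bin index, 0 .. levels - 1
    return round(index * 255 / (levels - 1))   # rescale bin to 0 .. 255


# A continuous 8-bit ramp, as in the grayscale gradient bars.
ramp = list(range(256))

for bits in (8, 7, 6, 5, 4):
    reduced = [reduce_bit_depth(v, bits) for v in ramp]
    distinct = len(set(reduced))
    # Each surviving level spans 256 / distinct of the original gray values.
    print(bits, "bit:", distinct, "levels, step of", 256 // distinct)
```

At 4 bit each of the 16 remaining levels replaces 16 original gray values, which is what produces the visible laddering on smooth gradients.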
2.12. High dimensional analysis with BRAQUE
BRAQUE,(Reference Dall’Olio, Bolognesi, Borghesi, Cattoretti and Castellani 19 ) an acronym for Bayesian Reduction for Amplified Quantization in UMAP Embedding, has been developed for the global analysis of individual cells in tissue sections stained in IF with multiple biomarkers and uses dimensionality reduction algorithms. It is a Python pipeline for automated cluster enhancing, identification, and characterization.
The key procedure of BRAQUE (whose code may be found on GitHub at https://github.com/LorenzoDallOlio/BRAQUE) is a new preprocessing step called lognormal shrinkage. This preprocessing specifically addresses the noise due to crossbleed from neighboring cells: whereas single-cell data are more distinct and discrete, spatial proteomics markers assume a more continuous distribution with less clear separation among the modalities.(Reference Hickey, Tan, Nolan and Goltsev 29 , Reference Zhang, Li, Reticker-Flynn, Good, Chang, Samusik, Saumyaa, Li, Zhou, Liang, Kong, Le, Gentles, Sunwoo, Nolan, Engleman and Plevritis 30 )
In BRAQUE’s preprocessing a mixture of normal distributions is fitted for each of the log-transformed markers, and then each normal component of the mixture is shrunk toward its mean to help further steps counter this continuity and lack of clear separation.
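The idea behind this preprocessing can be sketched for a single marker as follows. This is not the BRAQUE implementation: it assumes a plain expectation-maximization fit with a fixed number of components, a hard assignment of each cell to its most likely component, and a hypothetical `shrink` factor and `offset`; the published pipeline may differ on each of these points.

```python
import math


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM.
    Returns (weights, means, standard deviations)."""
    n = len(xs)
    srt = sorted(xs)
    # Initialise means at evenly spaced quantiles, with a common spread.
    means = [srt[int((i + 0.5) * n / k)] for i in range(k)]
    mu = sum(xs) / n
    sd0 = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    sds = [sd0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, sds)]
            tot = sum(dens) or 1e-300
            resp.append([d / tot for d in dens])
        # M-step: update weights, means and spreads.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sds


def lognormal_shrinkage(values, k=2, shrink=0.5, offset=1.0):
    """Log-transform one marker, fit a mixture of normals, and pull every
    value toward the mean of its most likely component."""
    logs = [math.log(v + offset) for v in values]
    weights, means, sds = fit_gmm_1d(logs, k)
    shrunk = []
    for x in logs:
        dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                for w, m, s in zip(weights, means, sds)]
        j = max(range(k), key=dens.__getitem__)
        shrunk.append(means[j] + shrink * (x - means[j]))
    return shrunk


# Toy marker: 20 dim cells followed by 12 bright cells.
values = [8, 9, 10, 11, 12] * 4 + [90, 100, 110] * 4
shrunk = lognormal_shrinkage(values)
```

In this toy example the spread within each mode is halved while the distance between the component means is preserved, so the two modes become easier to separate in the subsequent embedding.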
After this crucial step, the markers are standardized and combined into a 2-dimensional latent space by the UMAP algorithm. On this embedding space, the clustering of cells is performed by HDBSCAN and lastly, each cluster is tested for significant markers, which are ranked by effect size to help experts with cell type annotation.
The output consists of multiple clusters, whose numerosity is defined by the size of the smallest cluster (usually not below 0.005% of the cell number or ~ 20 cells). Each cluster is defined by (A) markers ranked for probability or possibility to identify the cluster, (B) a tissue map of the cells belonging to the cluster, and (C) the expression of a pre-defined set of diagnostic markers for that cluster, compared to the whole population (Supplementary Figure S1). Each cluster is classified into cell types under expert supervision.
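The final ranking step can be illustrated with a short sketch that scores every marker for one cluster against all remaining cells. Cohen's d is used here purely as an assumed effect size measure, since the statistic used by BRAQUE is not specified in this section; the function names and the toy data are hypothetical.

```python
import math


def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled if pooled > 0 else 0.0


def rank_markers(cells, labels, cluster):
    """Rank markers by effect size for one cluster versus all other cells.
    cells: list of {marker: value} dicts; labels: cluster id per cell."""
    scores = {}
    for marker in cells[0]:
        inside = [c[marker] for c, l in zip(cells, labels) if l == cluster]
        outside = [c[marker] for c, l in zip(cells, labels) if l != cluster]
        scores[marker] = cohens_d(inside, outside)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: cluster 0 resembles CK+ SMA+ stromal cells, cluster 1 leukocytes.
cells = [
    {"CK": 5.0, "SMA": 4.0, "CD45": 0.1},
    {"CK": 6.0, "SMA": 3.0, "CD45": 0.2},
    {"CK": 5.5, "SMA": 5.0, "CD45": 0.1},
    {"CK": 0.5, "SMA": 1.0, "CD45": 4.0},
    {"CK": 0.4, "SMA": 3.5, "CD45": 5.0},
    {"CK": 0.6, "SMA": 0.5, "CD45": 4.5},
]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_markers(cells, labels, cluster=0)
```

A ranked list of this kind is what an expert would read when assigning a cell type to the cluster; no positivity threshold is set for any marker.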
The HubMap lymph node dataset HBM754.WKLP.262 (doi:10.35079/HBM754.WKLP.262) was downloaded from the HubMap consortium website (https://hubmapconsortium.org/) as a .csv file, thus pre-segmented by the source.
3. Results
3.1. The human eye has a biased vision
In image analysis, the human eye is required to discriminate signal from noise or background.
Published research shows that humans can distinguish about 870 different shades of gray,(Reference Kimpe and Tuytschaever 31 ) data which are contradicted by Kreit et al.,(Reference Kreit, Mäthger, Hanlon, Dennis, Naik, Forsythe and Heikenfeld 32 ) who set the gray level discrimination in humans at about 30 shades.
Fourteen experienced observers produced a gray discrimination score of 37.8 (SD ±4.77) out of 50 (Figure 1), which is below the discrimination of 64 gray tones out of 256 (8-bit scale) (see Supplementary Methods) and in keeping with published results(Reference Kreit, Mäthger, Hanlon, Dennis, Naik, Forsythe and Heikenfeld 32 ) and anecdotal annotations in the public domain (see Supplementary Tables).
The type of image used in this test (homogeneously tinted squares surrounded by a thick border) is not the type encountered in medicine or biology and may also be prone to hallucinations.(Reference Aeffner, Wilson, Martin, Black, Hendriks, Bolon, Rudmann, Gianani, Koegler, Krueger and Young 33 ) We then used microscopy digital images in which the luminance repertoire was reduced from the usual 256 levels (8 bit) down to just 16 (4 bit) (see an example in Figure 2 and Supplementary Figure S3).
While the observers identified laddering (that is, reduced bit depth) on the monochrome continuous grayscale images below a mean of 7.7 ± 0.2 bits (range 8–7.5) (Figure 3a and Supplementary Figure S3T), they correctly scored the bit depth of only 51% ± 33% of the histology images (range 26%–75%). There was no apparent relationship between the ability to identify bit degradation in monochrome bands, which scored at the top for all pathologists, and in histology images (Figure 3a).
The most degraded images (4 bit) were more likely to be correctly identified (Figure 3b). Erroneous bit depth assignment was equivalent in all kinds of common pathology images (Figure 3c), with a single triple immunofluorescent image being the most variably scored (mean 35% ± 42% correct score, range 0%–100%) (Figure 3).
The discrimination power for degraded images was highest among the bottom range of bit depth (Figure 3d), with no differences among the image types.
Very detailed images (e.g., colon, testis, LCH) scored marginally better on average than images with low details (brain, muscle, IHC), with 55.8% versus 49.7% correct answers.
From these experiments, we conclude that the discriminative power of the human eye for details along an 8-bit luminance scale is significantly reduced compared to the available range.
3.2. Signal enhancement methods may deliver marginal gains
Positive signal brightness affects detection. Thus, we wanted to define the sensitivity of the immunofluorescent techniques used in multiplexing, compared to a brightfield standard, DAB IHC. To do so, we applied common algorithms for separating immunostain from background, such as Otsu and K-means clustering, which are based on vector quantization. These algorithms do not require tuning, and the result reflects the image ground truth according to the gray levels of the image.
As previously published by others,(Reference Hötzel, Havnar, Ngu, Rost, Liu, Rangell and Peale 34 , Reference Berry, Giraldo, Green, Cottrell, Stein, Engle, Xu, Ogurtsova, Roberts, Wang, Nguyen, Zhu, Soto-Diaz, Loyola, Sander, Wong, Jessel, Doyle, Signer, Wilton, Roskes, Eminizer, Park, Sunshine, Jaffee, Baras, De Marzo, Topalian, Kluger, Cope, Lipson, Danilova, Anders, Rimm, Pardoll, Szalay and Taube 35 ) TSA was not superior, compared to DAB, and as good as double indirect IF(Reference Bolognesi, Manzoni, Scalia, Zannella, Bosisio, Faretta and Cattoretti 20 ) for some fluorochromes (Figure 4). The use of signal-enhancing methods for immunofluorescent staining may marginally benefit signal detection.
3.3. Simplified image analysis tools lack sensitivity
For quantification, we used images of the abundant low molecular weight keratins 8 and 18 (LMW-KRT) expressed in thin dendrites of fibroblastic reticular cells (FRC), and we found that commonly used thresholding algorithms to quantify IF immunostains fail to account for positive pixels at the low end of the spectrum, even though these are visible to the human eye after image rendering (Figure 5).
3.4. Hyperplex staining methods have superior sensitivity
Next, we tested the analytical power of hyperplexed stainings by examining a public human lymph node dataset, composed of 28 antibodies (+ DAPI), for which the segmentation method was previously published.(Reference Kennedy-Darling, Bhate, Hickey, Black, Barlow, Vazquez, Venkataraaman, Samusik, Goltsev, Schürch and Nolan 36 ) In the panel, a widely used “pan keratin” antibody cocktail, AE1 and AE3(Reference Woodcock-Mitchell, Eichner, Nelson and Sun 37 ) was used.
Only AE3 is able to detect KRT8, one of the two LMW-KRT in LN (the other is KRT18). The AE1-AE3 cocktail (“panCK” or “CK”) is used daily by thousands of surgical pathologists to detect nodal metastasis from carcinoma. Because a selective epitope condition prevents the broad detection of LN FRC with this cocktail, only occasional KRT8+ cells are detected.(Reference Linden and Zarbo 38 ) The presence of a panCK reagent in the dataset made it possible to compare the detection power of single stains and hyperplexed images by enumerating the positive cell types.
We applied BRAQUE(Reference Dall’Olio, Bolognesi, Borghesi, Cattoretti and Castellani 19 ) to the HubMap dataset. BRAQUE is a dimensionality reduction algorithms-based analytical pipeline (DRAAP) designed for multiplex spatial Ab-based proteomic data, which does not require pre-definition of positive signal thresholds.
BRAQUE was able to identify 12 clusters containing CK+ cells for a total of 16,698 cells out of 188,450 (9.24%) (Table 1, Supplementary Tables, and Supplementary Figures S1 and S2).
Nine clusters expressed SMA together with CK, a known phenotype of FRC,(Reference Franke and Moll 39 ) together with variable expression of CollagenIV, CD35 and Podoplanin. These cells accounted for 7.54% of the LN population (13,621 cells).
Three clusters (3,077 cells, 1.70%) had an endothelial phenotype (CD34+ CD31+), where the CK signal could be bleeding from adjacent FRC. A total of 8,196 cells (4.54%) had a stromal phenotype devoid of CK, divided into fibroblasts (4.30%) and SMA-expressing myofibroblasts (0.23%). The tissue distribution of CK+ FRC and fibroblasts is partially overlapping and distinct from endothelial cells (Figure 6).
A distinct population of Lyve1+ sinus lining cells coexpressed CD31, Vimentin, CD107a but not CD34 (Supplementary Figure S4).
None of these stromal clusters expressed CD44, CD45, or CD45RO, or any other leukocyte-restricted marker. The complete cell classification results are available in the Supplementary Methods (Supplementary Tables).
The spatial distribution of the FRC clusters is consistent with the known tissue location and the percentage of total stromal cells (Table 1).
Nine clusters, containing 7.4% of the segmented cells, could not be classified (unclear, artifacts), in addition to the cells discarded upfront by BRAQUE on a statistical basis (6.8%).
By analyzing the same dataset with Phenograph, a similar classification was obtained, including the identification of CK+ stromal cell clusters (Supplementary Figure S5).
To estimate how the percentage of CK+ FRC detected by BRAQUE in the single HubMap LN would position among the results obtained with an available single-stain, single-cell quantitative tool, QuPath, we quantified the IHC stain of two different antibody cocktails: the AE1–AE3 mixture and a cocktail of two rabbit monoclonal antibodies directed at low molecular weight keratins 8 and 18. AE1–AE3 labeled 9.27% ± 6.79% of cells (5 sentinel LN), the LMW keratin cocktail 8.05% ± 7.02% (4 sentinel LN). We also used QuPath to quantify the IF stains used for the TSA and the control experiments. Quantification of IF stains in QuPath was highly erratic because of the difficulty of discriminating signal from autofluorescent background by eye (see Supplementary Tables).
3.5. Hyperplexed staining methods can handle images unreadable by the human eye
We obtained the raw IF images from the HubMap dataset and we could not visualize a distinct CK+ cell population except by applying an image log transformation combined with virtual 3D visualization, after which we were able to identify a weak CK signal co-localizing with vimentin and SM actin (Figure 7a and Supplementary Figure S6).
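The effect of a log transformation on weak signals can be shown with a small sketch. It assumes 8-bit pixel values and the mapping p -> 255 * log(1 + p) / log(256), chosen here so that 0 stays 0 and 255 stays 255; the exact transformation and rendering settings used for Figure 7 are described in the Supplementary Methods and may differ. The helper name `log_transform` is hypothetical.

```python
import math


def log_transform(pixels, max_value=255):
    """Rescale gray values logarithmically, stretching the dim end of the
    range and compressing the bright end."""
    scale = max_value / math.log(1 + max_value)
    return [round(scale * math.log(1 + p)) for p in pixels]


# Background at gray level 4, weak signal at 8, strong signal at 200 and 204.
raw = [0, 4, 8, 200, 204, 255]
rendered = log_transform(raw)

dim_gap = rendered[2] - rendered[1]      # separation of weak signal from background
bright_gap = rendered[4] - rendered[3]   # separation of two bright values
print(rendered, dim_gap, bright_gap)
```

A difference of four gray levels near the background is expanded to a much larger difference after the transformation, whereas the same difference among bright pixels is compressed, which is why dim populations become visible only after this kind of rendering.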
Since additional sections from the CODEX LN sample were not available, we used in-house processed FFPE LN sections and stained them with an aliquot of the original AE3 antibody, which we found still effective on a positive control after 37 years(Reference Argentieri, Pilla, Vanzati, Lonardi, Facchetti, Doglioni, Parravicini and Cattoretti 40 ) (Supplementary Figure S7).
After optimizing the staining conditions(Reference Furia, Pelicci, Perillo, Bolognesi, Pelicci, Facciotti, Cattoretti and Faretta 41 ) and enhancing the image contrast, numerous AE3+ cells were found to co-localize with a LMW KRT staining (Figure 7c), in addition to other stained cells. The latter appeared to be B cells, based on location and aggregation (no B cell markers were co-stained). TSA amplification of the AE3 signal did not improve the stain (Supplementary Figures S8 and S9).
The staining pattern of in-house CK staining of the lymph node reproduced what was obtained from the CODEX dataset, including the follicular B cell staining (Figure 7b).
4. Discussion
The upper discrimination limit of 64 shades of gray we have shown implies that, in the best scenario, signals fewer than 4 gray intensity levels (at 8 bit) brighter than noise (256 divided by 64) cannot be discriminated by the human eye, in addition to known visual and cognitive traps.(Reference Aeffner, Wilson, Martin, Black, Hendriks, Bolon, Rudmann, Gianani, Koegler, Krueger and Young 33 ) We thus confirm published data(Reference Kreit, Mäthger, Hanlon, Dennis, Naik, Forsythe and Heikenfeld 32 ) and a number of publicly available anecdotal observations (see Supplementary Tables).
Lack of discrimination of signal from noise broadly affects the appreciation of the full spectrum of biomarker distribution, both in light microscopy and IF. Despite having at disposal the most sensitive stain, IHC with DAB, IA tools are based on algorithms which do not reliably account for signals at the low end of the spectrum. In IHC and IF, a dichotomous image representation (pos/neg) after thresholding is the rule, because of the human eye limit.
Manual gating (i.e., the application of a threshold/barrier discriminating two sets of variables) is considered the main source of data variability and inconsistency in FCM,(Reference Mair, Hartmann, Mrdjen, Tosevski, Krieg and Becher 8 , Reference Saeys, Van Gassen and Lambrecht 42 , Reference Segebarth, Griebel, Stein, von Collenberg, Martin, Fiedler, Comeras, Sah, Schoeffler, Luffe, Durr, Gupta, Sasi, Lillesaar, Lange, Tasan, Singewald, Pape, Flath and Blum 43 ) but has never been addressed in tissue staining. As a result of the lack of awareness of this limit, most high-dimensional approaches to in-situ cell classification rely on a gating strategy at some point during the process.(Reference Hickey, Tan, Nolan and Goltsev 29 , Reference Phillips, Schürch, Khodadoust, Kim, Nolan and Jiang 44 –Reference Schürch, Bhate, Barlow, Phillips, Noti, Zlobec, Chu, Black, Demeter, McIlwain, Kinoshita, Samusik, Goltsev and Nolan 47 )
Establishing a signal threshold has three effects, not all negative: i) positively assigning a cell to a lineage, ii) removing “unwanted” evidence, and iii) limiting the discovery of new cell types, based on novel phenotypic profiles.
In hyperplexed staining, the limitation in the number of Abs which can fit in a spatial panel forces the selection of biomarkers with i) high “diagnostic” value, ii) dichotomic expression, and iii) little overlap with other markers in the panel. A gating strategy for cluster classification(Reference Hickey, Tan, Nolan and Goltsev 29 , Reference Schürch, Bhate, Barlow, Phillips, Noti, Zlobec, Chu, Black, Demeter, McIlwain, Kinoshita, Samusik, Goltsev and Nolan 47 ) is a consequence.
This parsimonious choice results in a deductive approach to cell classification, which is not ideal for discovering new cell types(Reference Phillips, Schürch, Khodadoust, Kim, Nolan and Jiang 44 ) and is prone to overlooking unexpected reactivities.
The selection and validation of the antibodies for in-situ staining(Reference Hewitt, Baskin, Frevert, Stahl and Rosa-Molinar 48 ) are made in general either in unrelated substrates (cell extracts, FCM, tumor clonal proliferations) or in tissue staining devoid of phenotypic detail, mostly derived by single color IHC (see Supplementary Data S1 (Reference Schürch, Bhate, Barlow, Phillips, Noti, Zlobec, Chu, Black, Demeter, McIlwain, Kinoshita, Samusik, Goltsev and Nolan 47 ) as an example). Strong staining by visual inspection is favored while concurrent additional weak tissue reactivity is ignored and cataloged as “background.”
Dimensionality reduction algorithms-based analytical pipelines (DRAAPs) extract from spatial images granular data which cannot be acquired by human eye-guided visual representation nor by manual signal thresholding. Both Phenograph and BRAQUE sample the whole range of pixel values, with the difference that BRAQUE provides a phenotypic profile for each cluster, which is based on statistically ranked characterizing markers, chosen from the whole biomarker set, agnostic of the cell type definition meaning and not relying on preset thresholds. The roster of cluster-defining markers includes expected diagnostic Abs, but also unexpected novel expressions such as CD35 and podoplanin (PDPN) on FRC and CD107a/LAMP1 on sinus lining Live1+ endothelial cells. The advantage of BRAQUE and FlowSOM(Reference Chester and Maecker 49 ) is to provide marker-agnostic cluster-by-cluster evaluation of the key markers statistical relevance (BRAQUE) or mean marker intensity (“star charts”; FlowSOM), and to present the evidence to human judgment.
BRAQUE introduces an innovative data pre-processing step, Lognormal Shrinkage, which is able to enhance input fragmentation by fitting a lognormal mixture model and shrink each component toward its median.(Reference Dall’Olio, Bolognesi, Borghesi, Cattoretti and Castellani 19 ) It is therefore able to further subdivide signals in the low range and feed these discretized values to the DRAAP. The final effect is somewhat analogous to the introduction of the “logicle” module for FCM.(Reference Herzenberg, Tung, Moore, Herzenberg and Parks 6 )
As a result, BRAQUE is ranking AE3 levels in FRC and in those cells only in the significant first or second tiers (see Supplementary Tables) despite the very low levels. In other words, BRAQUE provides statistical strength to the visual perception in Figure 7c that AE3 and LMW KRT are co-expressed, thus validated according to the “differential antibody” validation criteria,(Reference Bolognesi, Mascadri, Furia, Faretta, Bosisio and Cattoretti 21 , Reference Edfors, Hober, Linderback, Maddalo, Azimi, Sivertsson, Tegel, Hober, Szigyarto, Fagerberg, von Feilitzen, Oksvold, Lindskog, Forsstrom and Uhlen 50 ) but in those cells only. Worth noting that BRAQUE allocates the CK signal in B cells only and not in other hematolymphoid cells (Supplementary Data), and in the lowest tier when significant.
Notably, another antibody unexpectedly shows up in FRC: CD35 (Supplementary Tables and Supplementary Figure S4). CR1 (the protein name of CD35) is not listed in the Human Protein Atlas (https://www.proteinatlas.org) as expressed in fibroblasts; in LN, it is listed only in follicular dendritic cells, B cells, and macrophages.
CD107a (LAMP1) is listed by the Human Protein Atlas as ubiquitous; however, according to BRAQUE, it is differentially expressed only in Lyve1+ endothelial cells and some types of macrophages. These data would not be anticipated by traditional imaging (Supplementary Tables and Supplementary Figure S4) or image analysis. Interestingly, BRAQUE does not list LAMP1 among the ranked markers in neutrophils (Supplementary Data), because the mean expression in these cells falls within the average variation of the rest of the cells.
Our experience from ongoing research (manuscript in preparation) is that the analytical power of DRAAPs, and of BRAQUE in particular, will uncover quite a few other examples of “validated” antibodies which need to be reassessed because of single cell classification. This may be due to inadequate validation upfront; however, we favor the hypothesis of missed low-level expression during antibody characterization.
The shortcomings of a visual-guided appreciation of in situ immune detection are numerous.
There are published data showing the expression of certain biomarkers which have never been reproduced by in situ staining; one example is CD5, shown by RNA and protein on conventional dendritic cells type 2 (cDC2) by high dimensional analysis(Reference Sulczewski, Maqueda-Alfaro, Alcantara-Hernandez, Perez, Saravanan, Yun, Seong, Arroyo Hornero, Raquer-McKay, Esteva, Lanzar, Leylek, Adams, Das, Rahman, Gottfried-Blackmore, Reizis and Idoyaga 51 –Reference Leylek, Alcántara-Hernández, Lanzar, Lüdtke, Perez, Reizis and Idoyaga 57 ) but not on tissue with in situ IF (Wood et al.(Reference Wood and Freudenthal 58 ) and manuscript in preparation). Another example is AID, the enzyme required in the nucleus to perform DNA alterations, which for some time has been detected only in the cytoplasm(Reference Cattoretti, Büttner, Shaknovich, Kremmer, Alobeid and Niedobitek 59 ) and still not detected(Reference Willenbrock, Renné, Rottenkolber, Klapper, Dreyling, Engelhard, Küppers, Hansmann and Jungnickel 60 ) in the presence of the RNA message.(Reference Ehrhardt, Hijikata, Kitamura, Ohara, Wang and Cooper 61 )
Notices about the limitation of an eye-guided approach in high-dimensional studies have just begun to appear in the specialized literature.(Reference Kang, Szabo, Farago, Perez-Villatoro, Launonen, Anttila, Elias, Casado, Sorger and Färkkilä 62 , Reference Holman, Rubin, Ferenc, Holman, Koron, Daniel, Boland, Nolan, Chang and Rogalla 63 )
The sensitivity of DRAAPs is superior to that of IA tools used in a conventional setting of low-plex staining. However, saying that DRAAPs are more sensitive is an oversimplification, which blurs the details of how this result is acquired.
First, the algorithm computes the mean expression of all markers in a given cell against all others. Most importantly, data are analyzed as continuous variables, as for FCM,(Reference Wang, Yang, Wu, Song, Wang and Wang 64 ) because DRAAPs use normalized mean signal intensity data from single cells, even though biomarkers may be selected for all-or-nothing expression.
Second, there must be enough biomarkers in the panel in order to classify as different cells which would otherwise be clustered together. Note that DRAAPs can identify cells not only based on present, but also on absent markers.(Reference E-aD, Davis, Tadmor, Simonds, Levine, Bendall, Shenfeld, Krishnaswamy, Nolan and Pe’er 65 )
Third, the algorithm must be robust enough not to be disturbed by noise.(Reference Hickey, Tan, Nolan and Goltsev 29 )
Fourth, unlike Principal Component Analysis (PCA) which requires at least 2 dimensions, other DRAAPs do not have a minimum number of dimensions to identify meaningful relationships among the data; however, the higher the number of dimensions/parameters provided, the better the discriminative power.
Fifth, and as a word of caution, DRAAPs work in a relative space run by mathematics, and can score segmented cells as “negative” for a given biomarker because they are statistically below a “mean average” or not above the noise level; in some cases, the mean average signal may be considered “positive” by human visual evaluation.
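The difference between a relative, cluster-versus-rest score and a fixed visual threshold can be made concrete with a small sketch. Everything below is hypothetical: the intensity values, the visual threshold, the effect size cutoff, and the use of a standardized mean difference (Cohen's d) as a stand-in for the statistics computed by an actual DRAAP.

```python
import statistics

VISUAL_THRESHOLD = 30      # hypothetical 8-bit level an observer calls "positive"
EFFECT_SIZE_CUTOFF = 0.8   # hypothetical cutoff for a "characterizing" marker


def effect_size(in_cluster, rest):
    """Standardized mean difference (Cohen's d with pooled SD)."""
    n1, n2 = len(in_cluster), len(rest)
    v1, v2 = statistics.variance(in_cluster), statistics.variance(rest)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(in_cluster) - statistics.mean(rest)) / pooled_sd


def visually_positive(cells):
    """Fixed-threshold call on the mean signal of a group of cells."""
    return statistics.mean(cells) > VISUAL_THRESHOLD


def relatively_positive(in_cluster, rest):
    """Relative call: is the cluster clearly above all other cells?"""
    return effect_size(in_cluster, rest) > EFFECT_SIZE_CUTOFF


# Marker A: dim everywhere, but consistently higher in one cluster.
dim_cluster = [12, 14, 13, 15, 12, 14]
dim_rest = [4, 5, 3, 6, 4, 5, 5, 4]

# Marker B: bright in the cluster, but equally bright in all other cells.
bright_cluster = [60, 75, 52, 68, 80, 58]
bright_rest = [70, 55, 62, 78, 50, 66, 73, 59]

for name, cluster, rest in [("A", dim_cluster, dim_rest),
                            ("B", bright_cluster, bright_rest)]:
    print(name,
          "visual:", visually_positive(cluster),
          "relative:", relatively_positive(cluster, rest))
```

With these made-up numbers, marker A is below the visual threshold yet characterizes the cluster in relative terms, whereas marker B is visually bright but does not distinguish the cluster from the remaining cells.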
If the markers are not gated in advance, the product of the DRAAPs is a probabilistic phenotype, because of the inner mathematical workings of the algorithm. To go from there to a cell-type cluster classification, other steps are required: deep learning cell classification(Reference Geuenich, Hou, Lee, Ayub, Jackson and Campbell 66 ) and/or human intervention, neither envisioning visual appreciation of images.
In conclusion, it is about time for hyperplexed spatial proteomics to reduce its dependency on multicolor IF images and the biases associated with human vision, and to embrace a space-savvy bioinformatic approach like the one that FCM and scRNAseq currently employ. The huge bonus of relinquishing visual imaging and gating is the ability to discover new cell types and cell functions,(Reference Geuenich, Hou, Lee, Ayub, Jackson and Campbell 66 ) at the cost of revisiting the significance and specificity of the biomarkers which identify such novel populations.
Abbreviations
- Ab: antibody
- BRAQUE: Bayesian Reduction for Amplified Quantization in UMAP Embedding
- CK: cytokeratin
- CYTOF: cytometry by time of flight
- DAB: diaminobenzidine
- DRAAP: dimensionality reduction algorithms-based analytical pipeline
- FCM: flow cytometry
- FDC: follicular dendritic cells
- FFPE: formalin-fixed, paraffin-embedded
- FRC: fibroblastic reticular cells
- IA: image analysis
- IHC: immunohistochemistry
- IF: immunofluorescence
- LMW-KRT: low molecular weight keratins
- MILAN: Multiple Iterative Labeling by Antibody Neodeposition
- scRNAseq: single cell RNA sequencing
- TSA: tyramide signal amplification
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S2633903X24000138.
Data availability statement
The human LN dataset belongs to The Human Body at Cellular Resolution: the NIH Human BioMolecular Atlas Program (doi:10.1038/s41586-019-1629-x). The results here are in whole or part based upon data generated by the NIH Human BioMolecular Atlas Program (HuBMAP): https://hubmapconsortium.org. The image files and the .csv file for a human LN immunostained with the CODEX platform (now PhenoCycler, Akoya Biosciences, Delaware) can be accessed at the HuBMAP website (https://portal.hubmapconsortium.org/), and for the sample HBM754.WKLP.262 (doi:10.35079/HBM754.WKLP.262) at https://portal.hubmapconsortium.org/browse/dataset/c95d9373d698faf60a66ffdc27499fe1 (last accessed June 11, 2023).
Additional data (the cluster definitions produced with BRAQUE, .ndpi IF images, the images for the bit reduction test, etc.) are deposited in Bicocca Open Archive Research Data (BOARD): Bolognesi, Maddalena; Dall’Olio, Lorenzo; Maerten, Amy; Borghesi, Simone; Castellani, Gastone; Cattoretti, Giorgio (2023), “Seeing or believing in hyperplexed spatial proteomics via antibodies.”, Bicocca Open Archive Research Data, V1, doi: 10.17632/kmxz7fgydx https://data.mendeley.com/datasets/kmxz7fgydx/1
Acknowledgments
We wish to thank Christopher Wilson (Time Inc., New York) for sharing the code of the website used for testing, Mario R. Faretta (IEO, Milan, Italy) for invaluable advice, and the Pathologists who contributed suggestions and human visual scoring of synthetic and real-life tests, Elisa Belloni, Francesca M. Bosisio, Alessandro Caputo, Giorgio Cazzaniga, Roberta Ciccimarra, Vincenzo L’Imperio, Fabio Pagni, Davide Seminati, Claudio Tripodo, and Matteo Zoboli.
Author contribution
L.D., S.B., and G.Cast. generated BRAQUE. G.Cat. and A.M. performed immunostains. M.M.B., G.Cat., and L.D. analyzed the data. G.Cat. and G.Cast. wrote the manuscript. M.M.B. and L.D. contributed equally to this work.
Funding statement
G.Cat. and M.M.B. received funding from Regione Lombardia POR FESR 2014–2020, Call HUB Ricerca ed Innovazione: ImmunHUB. G.Cast. received funding from the EU Horizon 2020 programme (GenoMed4All project #101017549, HARMONY and HARMONY-PLUS project #116026), and the AIRC Foundation (Associazione Italiana per la Ricerca contro il Cancro; Milan, Italy; project #26216).
Competing interest
The authors declare none.