Impact Statement
Single-particle cryo-electron microscopy (cryo-EM) is an imaging technique used to determine the 3D structure of biomolecules from noisy 2D projection images. This article revisits one of the first approaches to cryo-EM image processing, namely common lines between pairs of 2D class averages coming from the Fourier slice theorem. We present a novel mathematical approach for dealing with common lines: in contrast to some alternatives, it operates directly on the common lines themselves and avoids triplewise angular reconstitution completely. The article then derives novel algebraic constraints on sets of consistent common lines, including a straightforward low-rank matrix condition. The algebraic conditions are incorporated into optimization methods arising from the field of computer vision to produce new methods for computational tasks involving common lines. In particular, we achieve improved accuracy in common line denoising and rotation recovery at low signal-to-noise ratios. We also present a method to detect homogeneous communities of 2D class averages in the case of a cryo-EM dataset with multiple molecular conformations. Altogether this work clarifies a classic topic in cryo-EM, and opens the door to applying common lines techniques on more challenging datasets.
1. Introduction
Single-particle cryo-electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3D structure of molecules from many noisy tomographic projection images taken at unknown viewing angles. One of the first approaches for 3D reconstruction, known as angular reconstitution, is based on the common line property of projection images induced by the Fourier slice theorem.(Reference Vainshtein and Goncharov1, Reference Van Heel2) Due to the low signal-to-noise ratio (SNR) in cryo-EM data, detecting common lines is a difficult task(Reference Singer, Coifman, Sigworth, Chester and Shkolnisky3): even today when applied to denoised averages of images, referred to as 2D class averages. Detecting common lines is subject to angular errors and incorrectly identified common lines. Although methods which seek to minimize global errors in the estimated viewing directions have increased the utility of common lines methods,(Reference Singer and Shkolnisky4) additional constraints on common lines are needed to improve their accuracy and robustness.
In this article, we propose a novel approach for dealing with common lines. Specifically, we assemble the estimated common lines for a dataset of $ n $ images into a certain $ 2n\times n $ matrix, which stores properly scaled basis vectors for the common lines (Theorem 3.2). The matrix directly encodes common lines data, without requiring angular reconstitution on various subsets of images or needing voting procedures like some existing formulations.(Reference Singer, Coifman, Sigworth, Chester and Shkolnisky3, Reference Shkolnisky and Singer5) As such, it yields a direct and more global approach than prior constructions for common lines.
As a main contribution, we derive algebraic constraints on the matrix of common lines, which must be satisfied in order for a set of common lines to be consistent with a single asymmetric molecular conformation. The constraints include a straightforward low-rank condition on the matrix, as well as various sparse quadratic constraints. Importantly, the constraints enable new strategies for computational tasks involving common lines, in particular for denoising common lines; estimating 3D rotations; and clustering heterogeneous image sets into homogeneous subsets. We demonstrate this by adapting optimization algorithms from other domains to these tasks, using the algebraic constraints. We remark that our constraints seem better suited for numerical optimization than the semialgebraic constraints found in prior work.(Reference Dynerman6)
Notably, the clustering problem is a recent application of common lines.(Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) In more detail, the goal is to sort discretely heterogeneous image sets of multiple molecules into communities corresponding to homogeneous image subsets. This application is motivated by the increasing complexity of cryo-EM datasets, where samples may not be purified and thus the number of distinct molecules contained in a dataset is more than one.(Reference Verbeke, Mallam, Drew, Marcotte and Taylor8–Reference Sae-Lee, McCafferty, Verbeke, Havugimana, Papoulas, CD, Houser, Vanuytsel, Murphy, Drew, Emili, Taylor and Marcotte10) Our algebraic constraints and optimization algorithms enable consistency checks of subsets of images, to test whether the subset corresponds to a single molecule.
As a mathematical guarantee, we prove that computing the correct scales in the homogeneous case admits an essentially unique global optimum, see Theorem 5.1. We implement our algorithms and test them on simulated and real datasets in Section 7. The results demonstrate that our methods can be successful when applied to 2D class averages at noise levels comparable to experimental data, in both the homogeneous and discretely heterogeneous cases. We conclude with a discussion of potential future improvements.
1.1. Advantages
There are several advantages to our approach for dealing with common lines:
-
• The new formulation is directly in terms of the data, that is, in terms of the common lines themselves.
-
• It involves multiple common lines simultaneously and does not require triplewise angular reconstitution at all (in contrast to (Reference Shkolnisky and Singer5) for instance), making our approach fully global and potentially more robust to noise than alternatives.
-
• The algebraic constraints can be incorporated into existing optimization algorithms that have seen success in computer vision applications.(Reference Sengupta, Amir, Galun, Goldstein, Jacobs, Singer and Basri11)
-
• The resulting algorithms outperform existing methods for denoising and rotation recovery on noisy simulated data, and perform comparably well for clustering heterogeneous image sets on real data, even though the optimization algorithms are off-the-shelf.
2. Background
First, we recall a standard simplified mathematical model for cryo-EM, in the homogeneous case of one molecular conformation. We assume there exists a 3D function $ \varphi :{\mathrm{\mathbb{R}}}^3\to \mathrm{\mathbb{R}} $ describing the electrostatic potential generated by the molecule. As data, we receive $ n $ two-dimensional tomographic projection images, denoted $ {I}_{R^{(i)}}:{\mathrm{\mathbb{R}}}^2\to \mathrm{\mathbb{R}} $ for $ i=1,\dots, n $ , where $ {R}^{(i)}\in \mathrm{SO}(3) $ are 3D rotations associated with each image. The goal of single-particle cryo-EM is to recover the underlying 3D structure $ \varphi $ from the set of 2D tomographic projection images which are observed at unknown rotations. The images, in their idealized and noiseless form, have the following Fourier transforms due to the Fourier slice theorem:
Here $ \hat{\varphi}\left(\hat{x},\hat{y},\hat{z}\right)\hskip0.5em := \hskip0.5em {\int}_{{\mathrm{\mathbb{R}}}^3}\varphi \left(x,y,z\right){e}^{\sqrt{-1}\left(x\hat{x}+y\hat{y}+z\hat{z}\right)} dxdydz $ denotes the Fourier transform of $ \varphi $ , and $ {R}^{(i)}\cdot \hat{\varphi} $ denotes the rotation of $ \hat{\varphi} $ by $ {R}^{(i)} $ . Writing $ {R}^{(i)}={\left({\mathbf{r}}_1^{(i)}\hskip0.5em {\mathbf{r}}_2^{(i)}\hskip0.5em {\mathbf{r}}_3^{(i)}\right)}^{\top } $ , (1) reads
Generically, for asymmetric molecules $ \varphi $ and distinct rotations $ {R}^{(i)} $ and $ {R}^{(j)} $ , there exist unique lines through the origin in the domain of the Fourier-transformed images $ {\hat{I}}_{R^{(i)}} $ and $ {\hat{I}}_{R^{(j)}}, $ respectively $ {\mathrm{\ell}}_{ij}\hskip0.35em \subseteq \hskip0.35em \mathrm{domain}\left({\hat{I}}_{R^{(i)}}\right)={\mathrm{\mathbb{R}}}^2 $ and $ {\mathrm{\ell}}_{ji}\hskip0.35em \subseteq \hskip0.35em \mathrm{domain}\left({\hat{I}}_{R^{(j)}}\right)={\mathrm{\mathbb{R}}}^2 $ , such that the restrictions
are equal as functions on $ {\mathrm{\mathbb{R}}}^2 $ . In cryo-EM, one says that $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ are the common lines between the $ i $ th and $ j $ th image. In modest-noise settings, which are arrived at by working with 2D class averages instead of raw tomographic images,(Reference Frank12) common lines can be estimated from real cryo-EM data. They give basic ways to do 3D reconstruction in cryo-EM; for example, see the angular reconstitution technique of van Heel(Reference Van Heel2) or the works of Shkolnisky, Singer, and their collaborators(Reference Singer, Coifman, Sigworth, Chester and Shkolnisky3–Reference Shkolnisky and Singer5) for example.
From (2), the common lines $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ may be found mathematically by expressing the single line in 3D space:
in the coordinate system of the ith and jth images, respectively. Here, $ \times $ denotes the cross product in $ {\mathrm{\mathbb{R}}}^3 $ . Combining (2) and (4), (3) may be written as:
where $ \left\langle \cdot, \cdot \right\rangle $ denotes the standard inner product in $ {\mathrm{\mathbb{R}}}^3 $ . Common lines can therefore be encoded via:
Definition 2.1. Vectors $ {\mathbf{a}}_{ij},{\mathbf{a}}_{ji}\in {\mathrm{\mathbb{R}}}^2 $ are called representatives for the common lines $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ if there exists a nonzero scalar $ {\lambda}_{ij}={\lambda}_{ji}\in \mathrm{\mathbb{R}} $ such that
Equivalently, representatives $ {\mathbf{a}}_{ij} $ and $ {\mathbf{a}}_{ji} $ are choices of basis vectors for the common lines $ {\mathrm{\ell}}_{ij} $ and $ {\mathrm{\ell}}_{ji} $ which satisfy $ {\hat{I}}_{R^{(i)}}\left(\lambda {\mathbf{a}}_{ij}\right)={\hat{I}}_{R^{(j)}}\left(\lambda {\mathbf{a}}_{ji}\right) $ for all $ \lambda \in \mathrm{\mathbb{R}} $ . Representatives for common lines can be estimated from 2D class averages in practice.
We stress that, although quite standard, the model (1) is greatly simplified. It neglects the effects of contrast transfer functions (CTFs), imperfect centering in particle picking, and blurring in class averaging. Further, we have restricted attention to the case of asymmetric molecules, as otherwise common lines are only unique up to the action of the relevant symmetry group (e.g., see (Reference Geva and Shkolnisky13)).
3. Constraints on sets of common lines
3.1. The common lines matrix
We introduce an object to keep track of all common lines in a dataset. It is the main object of this article.
Definition 3.1. A common lines matrix associated to rotations $ {R}^{(1)},\dots, {R}^{(n)}\in \mathrm{SO}(3) $ is a matrix $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , which when regarded as an $ n\times n $ block matrices with $ 2\times 1 $ blocks $ {\mathbf{a}}_{ij}\in {\mathrm{\mathbb{R}}}^2 $ is such that $ {\mathbf{a}}_{ij},{\mathbf{a}}_{ji} $ are representatives for the common lines $ {\mathrm{\ell}}_{ij},{\mathrm{\ell}}_{ji} $ if $ i $ and $ j $ are distinct and $ {\mathbf{a}}_{ii}=0 $ otherwise. If the scalars $ {\lambda}_{ij} $ in (5) are all equal to 1, then we call $ A $ the pure common lines matrix.
Thus a common lines matrix $ A $ associated to $ {R}^{(1)},\dots, {R}^{(n)} $ is uniquely defined up to $ \left(\begin{array}{c}n\\ {}2\end{array}\right) $ nonzero real scalars $ {\lambda}_{ij} $ ( $ i<j $ ). In real data settings where the 2D class averages are sufficiently denoised, we can estimate $ A $ from data by estimating representatives for the common lines.
We now present constraints which a pure common lines matrix must satisfy. Firstly, there is the following low-rank condition. All of our computational methods take advantage of this.
Theorem 3.2. Let $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ be the pure common lines matrix associated to Zariski-generic rotations $ {R}^{(1)},\dots, {R}^{(n)}\in \mathrm{SO}(3) $ where $ n\ge 3 $ . Then $ \operatorname{rank}(A)=3 $ .
Proof. Since
the pure commons line matrix admits the following factorization:
Equation (6) witnesses $ \operatorname{rank}(A)\le \min \left(3,n\right) $ $ =3 $ . We have equality when $ {R}^{(i)} $ are generic because the two matrices in the factorization are full rank.
There are also necessary quadratic constraints in the entries of a pure common lines matrix.
Proposition 3.3. Suppose $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ is a pure common lines matrix where $ n\ge 3 $ . Then for any $ 1\le i<j\le n $ , we have $ \parallel {\mathbf{a}}_{ij}{\parallel}_2^2=\parallel {\mathbf{a}}_{ji}{\parallel}_2^2 $ .
Proof. See Appendix A.
Proposition 3.4. Suppose $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ s a pure common lines matrix where $ n\ge 3 $ . Then for any $ 1\le i<j<k\le n $ , we have $ \det \left(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}\right)=-\det \left(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array}\right)=\det \left(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array}\right) $ .
Proof. See Appendix A.
From Propositions 3.3 and 3.4, the number of quadratic constraints on a pure common lines matrix equals $ \left(\begin{array}{l}n\\ {}2\end{array}\right)+2\left(\begin{array}{l}n\\ {}3\end{array}\right) $ .
Example 1. Consider $ n=4 $ . Then a pure common lines matrix written in block form,
has rank at most 3 and satisfies 14 quadratic equations, which are
where $ \parallel \cdot {\parallel}_2 $ denotes the Euclidean norm. Note that $ \operatorname{rank}(A)\le 3 $ is equivalent to the vanishing of all $ 4\times 4 $ minors of $ A $ , giving a collection of homogeneous degree $ 4 $ polynomial constraints on the entries of $ A $ .
We note that (6) furnishes a polynomial map which sends an n-tuple of rotations to a pure common lines matrix:
Studying $ \psi $ will allow us understand additional important properties of pure common lines matrices. To do this, we will need to introduce some terminology and elementary concepts from algebraic geometry (see (Reference Harris14) for precise definitions.)
A subset $ X\hskip0.5em \subseteq \hskip0.5em {\mathrm{\mathbb{R}}}^d $ is called an algebraic variety if it is the set of points in $ {\mathrm{\mathbb{R}}}^d $ where a finite collection of polynomials all simultaneously equal 0. For example, $ SO(3) $ is an algebraic variety since it is the set of matrices $ R $ in $ {\mathrm{\mathbb{R}}}^{3\times 3}\cong {\mathrm{\mathbb{R}}}^9 $ satisfying the polynomial equations $ {R}^{\top }R-{I}_{3\times 3}={0}_{3\times 3} $ and $ \det (R)-1=0 $ . Roughly speaking, an algebraic variety is similar to an embedded manifold, expect possibly singular and always defined by polynomial equations. Due to the properties of polynomials, an algebraic variety $ X $ is a “thin” subset of $ {\mathrm{\mathbb{R}}}^d $ in which it lives: provided $ X\ne {\mathrm{\mathbb{R}}}^d $ , the complement of $ X $ is always a dense subset filling up almost the entirety of the ambient space. More precisely, if one samples a random point from $ {\mathrm{\mathbb{R}}}^d $ according to any absolutely continuous probability distribution, then with probability $ 1 $ the point will lie in the complement of $ X $ . We say that some property $ P $ holds (Zariski) generically if it holds for all points in the complement of some algebraic variety $ X\subsetneq {\mathrm{\mathbb{R}}}^d $ , and we call such points (Zariski) generic. Roughly speaking, this means that property $ P $ holds with probability $ 1 $ (even if, as usually the case, the variety $ X\subsetneq {\mathrm{\mathbb{R}}}^d $ is left unspecified).
Recall that the fiber of a map at a point $ p $ in its image is the set of points in its domain which map to $ p $ . Therefore to answer the question, “Does a pure common lines matrix uniquely determine the rotations which generated it?” we need to understand the fibers of the map $ \psi $ in (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7). Our next result shows that the answer to the question is “Yes,” up to a global rotation, provided the pure common lines matrix is generic.
Theorem 3.5. For $ n\ge 3, $ generically, the fibers of the map $ \psi $ are isomorphic to $ \mathrm{SO}(3) $ . More precisely, for generic rotations $ {R}^{(1)},\dots, {R}^{(n)} $ it holds that
Proof. See Appendix A.
Remark 3.6. Theorem 3.5 does not contradict the chirality ambiguity in cryo-EM, which states that the 3D molecule and rotations can only ever be recovered up to a global rotation and global reflection given cryo-EM data. In Theorem 5.1, we prove that there are two possible pure common lines matrices for a given non-pure common lines matrix. They differ by a global sign, and correspond to rotation tuples $ \left({R}^{(1)},\dots, {R}^{(n)}\right) $ and $ \left({JR}^{(1)},\dots, {JR}^{(n)}\right) $ where $ J=\mathit{\operatorname{diag}}\left(-1,-1,1\right) $ , respectively. The chirality or handedness ambiguity is well-known in the common lines literature(Reference Van Heel2, Reference Shkolnisky and Singer5) and unavoidable.
3.2. The common lines variety
In general, the image of a polynomial map from an algebraic variety is not an algebraic variety, because polynomial inequalities (in addition to equations) are needed in the description of the image.(Reference Bochnak, Coste and Roy15) This means that the set of all pure common lines matrices, that is, the image of $ \psi $ from $ SO(3) $ , on its own is not an algebraic variety. To resolve this, we consider the smallest algebraic variety in $ {\mathrm{\mathbb{R}}}^{2n\times n} $ containing $ \psi \left( SO(3)\right) $ , that is, we add the smallest set of additional points (which are not pure common lines matrices) to $ \psi \left( SO(3)\right) $ until the union becomes an algebraic variety. The process is called taking the Zariski closure of $ \psi \left( SO(3)\right) $ . We call the resulting algebraic variety the common lines variety and it lives in the ambient space $ {\mathrm{\mathbb{R}}}^{2n\times n} $ . The common lines variety is defined by polynomial equations in the entries of a matrix $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ . Since the common lines variety includes all pure common lines matrices, the polynomials defining it, in particular, include the constraints we already identified in Section 3.1.
Example 2. When $ n=3 $ , we used the computer algebra system Macaulay2(Reference Grayson and Stillman16) to determine the collection of polynomial equations defining the common lines variety. Along with the 5 quadratic polynomials from Propositions 3.3 and 3.4, our computation also found 1 polynomial of degree 6, 64 polynomials of degree 8, and 24 polynomials of degree 10, for a total of 94 polynomial equations. Notice that for $ n=3 $ , the rank $ 3 $ constraint of Theorem 3.2 is vacuous. We find the 5 quadratics are the only homogeneous polynomials. The other 89 equations are highly complex and we refrain from explicitly writing them here. They are available at the GitHub repository (8.1).
In view of Example 2, we believe that Section 3.1 identifies all “simple-to-describe” algebraic constraints on pure common lines matrices. As such, it is important to understand to what extent these constraints are enough to characterize pure common lines matrices. This requires understanding the geometry of the common lines variety better, for which we will need to use a couple more basic concepts from algebraic geometry described in the next two paragraphs.
In general, every algebraic variety $ X\subseteq {\mathrm{\mathbb{R}}}^d $ admits a unique decomposition into a finite union of irreducible components $ X={\cup}_{i=1}^r{X}_i $ , where $ {X}_i\subseteq {\mathrm{\mathbb{R}}}^d $ are algebraic varieties themselves and each cannot be decomposed as a union of two strictly smaller varieties. We think of $ {X}_i $ as “building blocks” of $ X $ . For example, the variety $ \left\{\left(x,y\right)\in {\mathrm{\mathbb{R}}}^2: xy=0\right\} $ is a union of the $ x $ - and $ y $ -axes, and these lines are its irreducible components. In general, each irreducible component $ {X}_i $ can be ascribed a dimension, which captures the number of degrees of freedom in $ {X}_i $ and coincides with manifold dimension when $ {X}_i $ is smooth. We note that the dimension of different irreducible components $ {X}_i $ of $ X $ may differ.
Given an algebraic variety $ X\subseteq {\mathrm{\mathbb{R}}}^d $ , one can construct from it a larger algebraic variety $ C(X)\subseteq {\mathrm{\mathbb{R}}}^d $ called the cone over $ X $ by adding to $ X $ all points in $ {\mathrm{\mathbb{R}}}^d $ which lie on a line passing through the origin and a point on $ X $ . This constructs an algebraic variety that includes all scalar multiples of points of $ X $ .
Since the constraints we identified in Section 3.1 are all polynomial equations, they define Section 3.1 as an algebraic variety in $ {\mathrm{\mathbb{R}}}^{2n\times n} $ . In the next proposition, we show that this algebraic variety contains the cone over the common lines variety as an irreducible component. This means that locally on this component, our constraints from Section 3.1 are sufficient to characterize pure common lines matrices up to scale. The proof relies on computer algebra software,(Reference Grayson and Stillman16) and checks that certain numerical matrices are full-rank, so we also report the range of $ n $ on which the proposition has been confirmed.
Proposition 3.7. The algebraic variety defined by the low-rank constraint in Theorem 3.2, along with the quadratic constraints in Propositions 3.3 and 3.4, and the requirement that all diagonal blocks are $ 0 $ , contains the cone over the common lines variety in $ {\mathrm{\mathbb{R}}}^{2n\times n} $ as an irreducible component for $ n=3,\dots, 50 $ .
Proof. See Appendix A.
4. Optimization problem
We encode the common lines from cryo-EM data by choosing representatives $ {\hat{\mathbf{a}}}_{ij}\in {\mathrm{\mathbb{R}}}^2 $ (Definition 2.1) to form the $ 2\times 1 $ blocks in a common lines matrix $ \hat{A}\in {\mathrm{\mathbb{R}}}^{2n\times n} $ . Suppose we have rescaled the $ 2\times 1 $ blocks of $ \hat{A} $ so they all have norm 1. Then at least in clean situations, Theorem 3.2 and Propositions 3.3 and 3.4 imply we can scale the blocks by nonzero scalars $ {\lambda}_{ij} $ with $ {\lambda}_{ij}={\lambda}_{ji} $ so that the resulting matrix is a pure common lines matrix $ A $ , and thus has rank 3 and satisfies the set of norm and $ 2\times 2 $ determinant equations. Proposition 3.7 states that these constraints are sufficient to determine the common lines variety locally. In Section 5.4, we further prove that for purposes of recovering scales $ {\lambda}_{ij} $ to obtain a pure common lines matrix, the constraints are also sufficient.
Proper scales are not directly available from common lines data in cryo-EM. To find the scales we formulate an optimization problem, inspired by work for a mathematically similar problem in (Reference Sengupta, Amir, Galun, Goldstein, Jacobs, Singer and Basri11):
The mixed L1/Frobenius norm $ \parallel \cdot {\parallel}_2 $ in the objective is chosen for its robustness to outliers.
Once we obtain a pure common lines matrix, we show in Section 5.3 how to recover the rotations corresponding to the common lines (up to the ambiguity in Remark 3.6). Later in Section 6, we solve the problem (8) to identify homogeneous clusters among images coming from a discrete number of distinct molecules.
5. Optimization algorithms
Our approach to solving (8) is first to solve the problem with constraint (9) only, and then to enforce the constraint (10) on the solution. These steps are in Sections 5.1 and 5.2, respectively.
5.1. IRLS and ADMM for the rank constraint
To solve (8) with the rank constraint, we closely follow the approach of (Reference Sengupta, Amir, Galun, Goldstein, Jacobs, Singer and Basri11). We relax the mixed L1/Frobenius norm to a weighted least squares objective, where the weights and optimization variables are updated after each iteration of minimization via a procedure called Iterative Reweighted Least Squares (IRLS).(Reference Holland and Welsch17) Let $ t $ denote the IRLS iteration number. Then the objective (8) becomes:
where $ \otimes $ and $ \hskip0.5em \odot \hskip0.5em $ is the Kronecker and Hadamard product of two matrices respectively, $ {\Lambda}_{ij}={\lambda}_{ij}, $
is the squared weighted Frobenius norm of a block matrix $ M\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , the weights in (11) are
and $ 0<\delta \ll 1 $ is a chosen regularization parameter.
Within each iteration of IRLS, we need to solve the problem (11) with the constraints (9). Since the objective is bilinear in $ A $ and $ \Lambda $ , we can do this using the Alternate Direction Method of Multiplier (ADMM).(Reference Boyd, Parikh, Chu, Peleato and Eckstein18, Reference Goldstein, O’Donoghue, Setzer and Baraniuk19) This gives an augmented Lagrangian optimization problem:
where $ \Gamma $ is a matrix of Lagrange multipliers and $ \tau ={\sum}_{i,j=1}^n{w}_{ij}^{(t)} $ . We now describe the steps of the ADMM procedure. Since the problem (13) is non-convex, we alternatingly optimize for each variable. In the following, let $ k $ denote the ADMM iteration number and let $ {W}^{(t)}\in {\mathrm{\mathbb{R}}}^{n\times n} $ where $ {W}_{ij}^{(t)}={w}_{ij}^{(t)} $ be the matrix of weights within IRLS iteration $ t $ .
-
1. Optimize $ A $ and $ \Lambda $ : We alternatingly optimize for $ A $ and $ \Lambda $ until convergence. Let $ {k}^{\prime } $ denote the iteration number for this step.
1a. First we solve the unconstrained problem for $ A $ :
$$ \underset{A\in {\mathrm{\mathbb{R}}}^{2n\times n}}{\min}\hskip1em \frac{1}{2}\parallel \hat{A}-\left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em A{\parallel}_{\mathrm{WF}}^2+\frac{\tau }{2}\parallel {B}^{(k)}-\left(A+{\Gamma}^{(k)}\right){\parallel}_F^2. $$The solution is(14) $$ {\displaystyle \begin{array}{l}{A}^{\left({k}^{\prime }+1\right)}=\left(\left({W}^{(t)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \hat{A}+\frac{\tau }{4}\left({B}^{(k)}+{\Gamma}^{(k)}\right)\right)\\ {}\hskip5em \oslash \left(\left({W}^{(t)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)\hskip0.5em \odot \hskip0.5em \left({\Lambda}^{\left({k}^{\prime}\right)}\otimes {\mathbf{1}}_{2\times 1}\right)+\frac{\tau }{4}{\mathbf{1}}_{2n\times n}\right),\end{array}} $$where $ \oslash $ is the element-wise division of two matrices. Then we project $ A $ onto the set of matrices whose $ 2\times 1 $ diagonals are 0:(15) $$ {\mathbf{a}}_{ii}^{\left({k}^{\prime }+1\right)}=\mathbf{0}. $$1b. Next we solve the unconstrained problem for $ \Lambda $ :$$ {\displaystyle {\displaystyle \begin{array}{rl}\underset{\Lambda \in {\unicode{x211D}}^{n\times n}}{\min }& \hskip1em \frac{1}{2}\Vert \hat{A}-(\Lambda \otimes {\mathbf{1}}_{2\times 1})\odot {A}^{({k}^{\mathrm{\prime}}+1)}{\Vert}_{WF}^2\\ {}\mathrm{subject}\ \mathrm{to}& \hskip1em \Lambda ={\Lambda}^{\mathrm{\top}}\end{array}}} $$The solution is(16) $$ {\lambda}_{ij}^{\left({k}^{\prime }+1\right)}=\left\{\begin{array}{ll}\frac{w_{ij}\left\langle {\hat{\mathbf{a}}}_{ij},{\mathbf{a}}_{ij}^{\left({k}^{\prime }+1\right)}\right\rangle +{w}_{ji}\left\langle {\hat{\mathbf{a}}}_{ji},{\mathbf{a}}_{ji}^{\left({k}^{\prime }+1\right)}\right\rangle }{w_{ij}\parallel {\mathbf{a}}_{ij}^{\left({k}^{\prime }+1\right)}{\parallel}_2^2+{w}_{ji}\parallel {\mathbf{a}}_{ji}^{\left({k}^{\prime }+1\right)}{\parallel}_2^2}& \hskip2.8em \mathrm{if}\hskip0.8em i\ne j\\ {}0& \hskip2.8em \mathrm{if}\hskip0.8em i=j.\end{array}\right. $$After repeating 1a. and 1b. until convergence, we obtain $ {A}^{\left(k+1\right)} $ and $ {\Lambda}^{\left(k+1\right)} $ . -
2. Optimize $ B $ : The constrained problem for $ B $ is
$$ {\displaystyle \begin{array}{rl}\underset{B\in {\unicode{x211D}}^{2n\times n}}{\min }& \hskip1em \frac{\tau }{2}\Vert B-({A}^{(k+1)}+{\Gamma}^{(k)}){\Vert}_F^2\\ {}\mathrm{subject}\ \mathrm{to}& \hskip1em \mathrm{r}\mathrm{a}\mathrm{n}\mathrm{k}(B)=3\end{array}} $$This is solved by(17) $$ {B}^{\left(k+1\right)}=\mathrm{SVP}\left({A}^{\left(k+1\right)}-{\Gamma}^{(k)},3\right), $$where $ \mathrm{SVP}\left(M,3\right) $ is the singular value projection of a matrix $ M $ onto the set of matrices of rank at most $ 3 $ , which is computing by taking the highest three singular values of $ M $ and its corresponding left and right singular vectors. -
3. Update $ \Gamma $ : In ADMM, there is a gradient ascent step for $ \Gamma $ , where the step is the solution to
$$ \underset{\Gamma \in {\mathrm{\mathbb{R}}}^{2n\times n}}{\max}\parallel {B}^{\left(k+1\right)}-\left({A}^{\left(k+1\right)}-\Gamma \right){\parallel}_F^2. $$This gives the update(18) $$ {\Gamma}^{\left(k+1\right)}={\Gamma}^{(k)}+\left({B}^{\left(k+1\right)}-{A}^{\left(k+1\right)}\right). $$
Steps 1, 2, and 3 are repeated until convergence in the optimization variables. This completes the ADMM procedure for IRLS iteration $ t $ . The IRLS weights $ {w}_{ij}^{\left(t+1\right)} $ for iteration $ t+1 $ are updated using (12), and the ADMM procedure is repeated again. The whole pipeline is detailed in Algorithm 1.
Algorithm 1. IRLS and ADMM for rank constraint satisfaction
Input: $ \hat{A}\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , a common lines matrix
Output: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , a common lines matrix satisfying only the rank constraint
1: procedure IRLS-ADMM $ \left(\hat{A}\right) $
2: initialize $ A,\Lambda, W $
3: $ t\leftarrow 0 $
4: while not converged do
5: $ B\leftarrow A $
6: $ \Gamma \leftarrow {0}_{2n\times n} $
7: $ \tau \leftarrow {\sum}_{i,j=1}^n{w}_{ij}^{(t)} $
8: $ k\leftarrow 0 $
9: while not converged do
10: $ {k}^{\prime}\leftarrow 0 $
11: while not converged do
12: update $ A $ using (14) and (15) $ \vartriangleright $ 1. Update $ A $ and $ \Lambda $
13: update $ \Lambda $ using (16)
14: $ {k}^{\prime}\leftarrow {k}^{\prime }+1 $
15: end while
16: update $ B $ using (17) $ \vartriangleright $ 2. Update $ B $
17: update $ \Gamma $ using (18) $ \vartriangleright $ 3. Update $ \Gamma $
18: $ k\leftarrow k+1 $
19: end while
20: update $ W $ using (12) $ \vartriangleright $ Update IRLS weights
21: $ t\leftarrow t+1 $
22: end while
23: end procedure
5.2. Sinkhorn scaling for the quadratic constraints
When successful, IRLS-ADMM in Section 5.1 gives us a solution $ A $ to (8) satisfying the constraint (9). Next, we must enforce (10). As described below, our approach is to scale the rows and columns of $ A $ alternatingly until constraint (10) is satisfied, in a manner analogous to Sinkhorn’s algorithm.(Reference Sinkhorn and Knopp20) Note that nonzero row and column scales do not affect the rank of $ A $ , so constraint (9) will still be satisfied.
We find the row and column scales by solving least squares problems. First, we handle the norm constraints. Define $ M\in {\mathrm{\mathbb{R}}}^{n\times n} $ where
Then the norm constraints are satisfied if and only if $ M={M}^{\top } $ , which leads us to the following constrained least squares problems:
The solutions to problems (20) and (21) are
respectively, where $ {N}_L,{N}_R\in {\mathrm{\mathbb{R}}}^{n\times n} $ are the corresponding least squares matrices. See Appendix B for full details. The problems (22) and (23) are solved by taking $ \boldsymbol{\mu} $ and $ \boldsymbol{\tau} $ to be the right singular vector corresponding to the smallest singular value of $ {N}_L $ and $ {N}_R $ , respectively.
Now we handle the determinant constraints. Scaling each $ 2\times n $ row of $ A $ by $ {\mu}_1,\dots, {\mu}_n\in \mathrm{\mathbb{R}} $ and enforcing the constraints leads us to the equations
for all $ 1\le i<j<k\le n $ . Taking the signed root on each equation, we obtain
Scaling the columns of $ A $ by $ {\tau}_1,\dots, {\tau}_n\in \mathrm{\mathbb{R}} $ and enforcing the constraints leads to the equations
for all $ 1\le i<j<k\le n $ . Dividing the first equation above by $ {\tau}_k $ on both sides and the second equation above by $ {\tau}_i $ on both sides, we obtain
We observe that (25) and (27) are linear in $ {\mu}_i $ and $ {\tau}_i $ . We can enumerate all determinants into three vectors
each of length $ \left(\begin{array}{l}n\\ {}3\end{array}\right) $ . The determinant constraints are satisfied if and only if $ {\mathbf{v}}_1={\mathbf{v}}_2={\mathbf{v}}_3 $ , which leads to the following constrained least squares problems:
where the quantities in brackets (defined in (B5) and (B6)) are the corresponding scalings (25) and (27) of $ {\mathbf{v}}_1 $ , $ {\mathbf{v}}_2 $ , and $ {\mathbf{v}}_3 $ by $ \boldsymbol{\mu} $ and $ \boldsymbol{\tau} $ . The solutions to problems (29) and (30) are
respectively, where $ {D}_{L,1},{D}_{L,2},{D}_{R,1},{D}_{R,2}\in {\mathrm{\mathbb{R}}}^{n\times n} $ are the corresponding least squares matrices. See Appendix B for full details. The problems (31) and (32) are again solved using SVD.
Now we describe the steps of the Sinkhorn scaling method. Let $ r $ denote the iteration number of the procedure.
-
1. Scale rows: Let $ \boldsymbol{\mu} \in {\mathrm{\mathbb{R}}}^n $ be the solution to
Then we perform the update
-
2. Scale columns: Let $ \boldsymbol{\tau} \in {\mathrm{\mathbb{R}}}^n $ be the solution to
Then we perform the update
The Sinkhorn scaling procedure is detailed in Algorithm 2.
Algorithm 2. Sinkhorn scaling for quadratic constraint satisfaction
Input: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , the output of IRLS-ADMM
Output: $ {A}^{\prime}\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , a pure common lines matrix
1: procedure Sinkhorn $ (A) $
2: $ r\leftarrow 0 $
3: while not converged do
4: set $ {N}_L,{D}_{L,1},{D}_{L,2} $ using (B1), (B7), (B8)
5: set $ \boldsymbol{\mu} $ to be the solution of (33)
6: update $ A $ using (34) $ \vartriangleright $ 1. Scale rows
7: set $ {N}_R,{D}_{R,1},{D}_{R,2} $ using (B2), (B9), (B10)
8: set $ \boldsymbol{\tau} $ to be the solution of (35)
9: update $ A $ using (36) $ \vartriangleright $ 2. Scale columns
10: end while
11: $ r\leftarrow r+1 $
12: $ {A}^{\prime}\leftarrow A $
13: end procedure
5.3. Rotation recovery
IRLS-ADMM and Sinkhorn aim to output a pure common lines matrix $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ . Given a pure common lines matrix, we now show how to determine the underlying rotations $ {R}^{(i)} $ in (3) which generated $ A $ (recall Theorem 3.5).
Given $ A $ , use singular value decomposition to compute a rank-3 factorization $ A={BC}^{\top } $ for $ B\in {\mathrm{\mathbb{R}}}^{2n\times 3} $ and $ C\in {\mathrm{\mathbb{R}}}^{n\times 3} $ . We then seek an invertible matrix $ Q\in {\mathrm{\mathbb{R}}}^{3\times 3} $ so that $ BQ $ and $ {CQ}^{-\top } $ take the form of the factors in (Reference Dynerman6). In particular, it is required that the rows of $ BQ $ come in orthonormal pairs; more precisely, $ (BQ){(BQ)}^{\top }={BQQ}^{\top }{B}^{\top}\in {\mathrm{\mathbb{R}}}^{2n\times 2n} $ should have $ 2\times 2 $ identities along its diagonal. Setting $ X={QQ}^{\top}\in {\mathrm{\mathbb{R}}}^{3\times 3} $ and relaxing positive semidefiniteness, let us consider the following affine-linear least squares problem:
where $ \mathrm{\mathcal{L}}:{\mathrm{\mathbb{R}}}^{2n\times 2n}\to {\mathrm{\mathbb{R}}}^{2n\times 2n} $ denotes the affine-linear operator which sets all entries of a $ 2n\times 2n $ matrix outside of the $ 2\times 2 $ diagonal blocks to 0, and subtracts the $ 2\times 2 $ identity matrices from the diagonal blocks. The normal equations for (37) may be written as
where $ L\in {\mathrm{\mathbb{R}}}^{9\times 9} $ , whose rows and columns index the variables $ {X}_{ij} $ for $ 1\le i\le j\le 3 $ , and whose corresponding $ \left( ij,k\mathrm{\ell}\right) $ -th entry is
where $ {\mathbf{b}}_i $ is the ith column of $ B $ . Generically there is a unique symmetric matrix $ X $ solving (38).
Let $ X={\mathrm{VDV}}^{\top } $ be an eigendecomposition for the solution to (37), where $ V,D\in {\mathrm{\mathbb{R}}}^{9\times 9} $ . In the clean case, the diagonal matrix $ D $ is already positive semidefinite. Then $ {\mathrm{VD}}^{1/2}\in {\mathrm{\mathbb{R}}}^{3\times 3} $ is a candidate for $ Q $ . Next the ith row block of $ \mathrm{BQ} $ determines the first two rows of $ {R}^{(i)} $ , and the cross product of these two rows computes the third row of $ {R}^{(i)} $ . From this, we recover the n-tuple of rotations $ \left({R}^{(1)},\dots, {R}^{(n)}\right) $ up to global right multiplication by a rotation. The full procedure for rotation recovery is in Algorithm 3, which is formulated to apply to noisy inputs as well. We note that as explained in Remark 3.6, a pure common lines matrix $ A $ can only be recovered up to sign, because there are two distinct n-tuples of rotations that can be recovered corresponding to the chiral ambiguity of common lines data. One can run Rotations twice, on $ A $ and $ -A $ , to produce the two possible sets of rotations.
Algorithm 3. Rotation recovery
Input: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , an estimate for pure common lines matrix
Output: $ \left({R}_1,\dots, {R}_n\right)\in \mathrm{SO}{(3)}^n $ , rotations determining the pure common lines matrix
1: procedure Rotations $ (A) $
2: set $ B $ using the SVD to get a rank-3 approximation $ {BC}^{\top } $ for $ A $
3: set $ L $ using (39)
4: set $ X $ to be least squares solution of (38)
5: set $ V,D $ using the eigendecomposition $ X={\mathrm{VDV}}^{\top } $
6: $ {D}_{ii}\leftarrow \max \left({D}_{ii},0\right) $
7: for $ i=1,\dots, n $ do
8: set $ {\mathbf{q}}_j^{\top } $ to be the $ \left(2i-2+j\right) $ -th row of $ {\mathrm{BVD}}^{1/2} $ for $ j=1,2 $
9: set $ {\mathbf{q}}_3^{\top } $ to be the ith row of $ C{\left({\mathrm{VD}}^{1/2}\right)}^{-\top } $
10: set $ {R}_i\in {\mathrm{\mathbb{R}}}^{3\times 3} $ to be $ \left(\begin{array}{c}{\mathbf{q}}_2^{\top}\\ {}-{\mathbf{q}}_1^{\top}\\ {}{\mathbf{q}}_3^{\top}\end{array}\right) $
11: end for
12: if $ \det \left({R}_i\right)<0 $ then
13: for $ i=1,\dots, n $ do
14: $ {R}_i\leftarrow -{R}_i $
15: end for
16: end if
17: for $ i=1,\dots, n $ do
18: replace $ {R}_i $ by the nearest rotation matrix to $ {R}_i $ using the SVD of $ {R}_i $
19: end for
20: end procedure
5.4. Justification of algorithms
Suppose the input to IRLS-ADMM is a noiseless common lines matrix $ \hat{A}\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , and its output $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ is a global minimizer to the non-convex problem (13). In the next theorem, we show that we can recover the ground-truth pure common lines matrix, up to a global scale, by scaling the $ 2\times 1 $ blocks of $ A $ via IRLS-ADMM and enforcing the quadratic constraints via Sinkhorn. The theorem justifies using IRLS-ADMM and Sinkhorn.
Theorem 5.1. Let $ n\ge 4 $ . Let $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ be a generic pure common lines matrix and $ {\lambda}_{ij}\in \mathrm{\mathbb{R}} $ be nonzero scales for $ i,j=1,\dots, n $ with $ i\ne j $ . Let $ \Lambda \in {\mathrm{\mathbb{R}}}^{n\times n} $ where $ {\Lambda}_{ij}={\lambda}_{ij} $ if $ i\ne j $ , $ {\Lambda}_{ii}=0 $ , and $ \Lambda ={\Lambda}^{\mathrm{\top}} $ . Suppose
has rank 3. Then there exists $ {\mu}_1,\dots, {\mu}_n,{\tau}_1,\dots, {\tau}_n\in \mathrm{\mathbb{R}} $ such that
If $ B $ additionally satisfies the quadratic constraints (10), then there exists $ \tau \in \mathrm{\mathbb{R}} $ such that for all $ i,j=1,\dots, n $ ( $ i\ne j $ ) it holds $ {\lambda}_{ij}=\tau $ .
Proof. See Appendix A.
Remark 5.2. Theorem 3.5 implies that given a pure common lines matrix, the rotation recovery problem in Section 5.3 is uniquely solvable up to a global rotation. Theorem 5.1 states that the ground-truth pure common lines matrix can only be determined up to a global scale; in particular, $ \tau $ in Theorem 5.1 may be positive or negative. As in Remark 3.6, this sign flip corresponds to chiral ambiguity in cryo-EM. Apart from its sign, the global scale has no effect on rotation recovery in Algorithm 3.
6. Application: Clustering heterogeneous common lines
Here we present an application of our approach for common lines to a challenging problem in cryo-EM. We propose a clustering algorithm for detecting homogeneous communities of consistent common lines from discretely heterogeneous data, using our algebraic constraints. One can then use the clusters of common lines for rotation recovery and 3D reconstruction.
Several successful methods have been proposed for clustering heterogeneous cryo-EM data that consist of images of a single macromolecule with conformational landscapes or differences in subunits.(Reference Aizenbud and Shkolnisky21–Reference Katsevich, Katsevich and Singer27) Recent work of the third author(Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) proposed a method of solving a different heterogeneity challenge in cryo-EM, where the heterogeneity in the data comes from multiple distinct macromolecules rather than variations on one primary structure. Our proposed application focuses on the latter problem. For this setting of heterogeneity, the main prior work to compare against is (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7).
The basic idea of our approach is illustrated in Figure 1. There the matrix in gray is a matrix of common lines from simulated heterogeneous data, corresponding to two distinct molecular conformations. The two homogeneous common lines matrices are diagonal blocks in green and purple. The $ 2\times 1 $ entries outside of these diagonal blocks do not correspond to any consistent lines and are just random lines in $ {\mathrm{\mathbb{R}}}^2 $ , encoded as representatives. Note that in general, a heterogeneous common lines matrix will not necessarily have consistent common lines matrices as diagonal blocks, but will be a $ 2\times n $ row and $ n\times 1 $ column permutation of such a matrix. Gaussian white noise has also been added to the gray matrix to decrease the signal-to-noise ratio of the common lines. A scree plot in Figure 1 shows a noticeable spectral gap between the third and fourth singular values of the homogeneous common lines matrices (green and purple curve), which is not detectable for the entire heterogeneous common lines matrix (gray curve), thus demonstrating low-rank structure of submatrices.
Algorithm 4. Heterogeneous clustering on common line constraints
Input: $ A\in {\mathrm{\mathbb{R}}}^{2n\times n} $ , a common lines matrix
Output: $ {C}_1,\dots, {C}_r\subseteq \left\{1,\dots, n\right\} $ , a partition of all common lines into consistent clusters
1: procedure Clusters $ (A) $
2: $ L\leftarrow \mathrm{\varnothing} $ $ \vartriangleright $ 1. Generate samples
3: for $ i\ge 1 $ until sufficient do
4: set $ S\subseteq \left\{1,\dots, n\right\} $ to be a random sample such that $ \mid S\mid =4 $
5: set $ {A}_S $ to be the submatrix of $ A $ associated to the common lines in $ S $
6: AS←Irls-Admm (AS)
7: AS ← Sinkhorn (AS)
8: if converged then
9: set $ M $ using (19) on $ {A}_S $
10: set $ {\mathbf{v}}_1,{\mathbf{v}}_2,{\mathbf{v}}_3 $ using (28) on $ {A}_S $
11: $ e\leftarrow \parallel M-{M}^{\top }{\parallel}_F^2+\parallel {\mathbf{v}}_1-{\mathbf{v}}_2{\parallel}_2^2+\parallel {\mathbf{v}}_2-{\mathbf{v}}_3{\parallel}_2^2 $
12: $ L\leftarrow \mathrm{append}\left(L,\left(S,e\right)\right) $
13: end if
14: end for
15: sort $ L $ by increasing values of $ e $ $ \vartriangleright $ 2. Cluster samples
16: $ G\leftarrow {\mathbf{0}}_{n\times n} $
17: while $ L\ne \varnothing $ do
18: $ \left(S,e\right)\leftarrow \mathrm{pop}(L) $
19: $ {G}_{S,S}\leftarrow \max \left\{{G}_{S,S},-\log (e)\cdot {\mathbf{1}}_{4\times 4}\right\} $
20: end while
21: $ {G}_{ii}\leftarrow 0 $ for all $ i=1,\dots, n $
22: C 1,…,Cr ← CommunityDetection(G) ⊳ Use method from (Reference Lancichinetti, Fortunato and Kertész28)
23: end procedure
Our clustering algorithm consists of two main steps:
-
1. Generate samples: As input, we are given a single common lines matrix $ A $ , from which we identify small clusters of consistent common lines. We randomly sample a set of four common lines $ S $ and extract the corresponding $ 8\times 4 $ submatrix $ {A}_S $ from $ A $ (the choice of four common lines is explained in Section 7.3 based on numerical experiments). We then run IRLS-ADMM and Sinkhorn on $ {A}_S $ . These methods may occasionally diverge due to numerical instability from noise in the data, as discussed in Section 7, in which case we discard the sample. We also discard the sample if the spectral gap between the third and fourth singular value of $ {A}_S $ is not sufficiently large (i.e., $ {A}_S $ does not have numerical rank 3). Otherwise, we obtain a quadratic constraint satisfaction error for the sample. We record both the sample and its error, and repeat this sampling sufficiently many times.
-
2. Cluster samples: We view the collection of samples and errors we obtain as a weighted hypergraph on $ n $ vertices whose hyperedges are of size 4 corresponding to the samples and whose hyperedge weights are given by their corresponding errors. We convert the weighted hypergraph into a weighted graph by constructing a weighted adjacency matrix whose $ \left(i,j\right) $ entry is the negative logarithm of the smallest error on a hyperedge containing both common lines $ i $ and $ j $ . We then use an unsupervised community detection algorithm on this adjacency matrix to find the clusters of consistent common lines. In our numerical experiments, we use the algorithm proposed by Lancichinetti, Fortunato, and Kertész,(Reference Lancichinetti, Fortunato and Kertész28) which can identify overlapping communities and hierarchical structure, and depends only on a single hyperparamter controlling the scale of the hierarchies. In particular, we do not need to specify the number of clusters or their sizes.
The clustering algorithm is detailed in Algorithm 4 and illustrated in Figure 2.
7. Performance on data
We compare the performance of our methods to existing common lines based algorithms in the literature, namely functions in the software package ASPIRE (Reference Wright, Andén, Bansal, Xia, Langfield, Carmichael, Sowattanangkul, Brook, Shi, Heimowitz, Pragier, Sason, Moscovich, Shkolnisky and Singer29) and the clustering algorithm of Verbeke et al.(Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) We study the problems of recovering rotations, denoising common lines, and partitioning discretely heterogeneous image sets into homogeneous subcommunities using common lines. The tests are done on simulated data (at various levels of noise) and real data.
Our simulated data consists of image data of the 40S, 60S, and 80S ribosome (available from the Electron Microscopy Data Bank(Reference Lawson, Patwardhan, Baker, Hryc, Garcia, Hudson, Lagerstedt, Ludtke, Pintilie, Sala, Westbrook, Berman, Kleywegt and Chiu30) as entries EMD-4214, EMD-2811, and EMD-2858, respectively), generated by ASPIRE. The ribosomes and examples of their clean 2D projection images are displayed in Figure 3. Each image is $ 128\times 128 $ with a pixel size of 3 Å. White Gaussian noise is added to each image corresponding to a specified signal-to-noise ratio (SNR). We define the SNR by taking the signal to be the average squared intensity over each pixel in the clean image and setting the noise variance to achieve the appropriate ratio. The common lines between two images are detected by finding the line projections of the two images with the highest correlation, as computed in ASPIRE.
In our numerical experiments, we have observed that the performance of IRLS-ADMM, and consequently Sinkhorn, can depend on its initialization. In particular, we have occasionally observed divergence or vanishing of the entries of $ A $ in Sinkhorn. This behavior appears to be due to numerical instabilities in Sinkhorn arising from noise in the data or from using too many common lines. In these cases, we can either discard such runs and restart the algorithms with new initializations, or skip using Sinkhorn. We address this issue in each of our tests.
7.1. Rotation recovery
Let $ {R}^{(1)},\dots, {R}^{(n)}\in \mathrm{SO}(3) $ be the ground-truth rotations and $ {R}_1,\dots, {R}_n\in \mathrm{SO}(3) $ be the recovered rotations. Then Theorem 3.5 states that there exists a unique rotation $ Q\in \mathrm{SO}(3) $ such that $ {R}^{(i)}={R}_iQ $ for all $ 1\le i\le n $ . In other words, $ Q $ can be found by solving the orthogonal Procrustes problem
The solution to this problem can be found using SVD.(Reference Schönemann31) We note that for $ n=1 $ , there is a simple relation between the Procrustes error and the angular error between two rotation matrices. For $ R,S\in \mathrm{SO}(3), $ it holds $ \parallel R-S{\parallel}_F^2=\left\langle R-S,R-S\right\rangle =\parallel {R}^{\top }R{\parallel}_F^2-2\left\langle R,S\right\rangle +\parallel {S}^{\top }S{\parallel}_F^2 $ $ =\parallel {I}_{3\times 3}{\parallel}_F^2-2\left\langle R,S\right\rangle +\parallel {I}_{3\times 3}{\parallel}_F^2=2\cdot 3-2\cdot \mathrm{tr}\left({R}^{\top }S\right) $ . Hence the angular error between $ R $ and $ S $ is
We compare our method to the procedure in ASPIRE on simulated data. Given common lines, the rotation recovery algorithm used in ASPIRE is the synchronization with voting procedure as described in (Reference Shkolnisky and Singer5). We use 30 images of the 60S, 80S, and 40S ribosomes at $ \mathrm{SNR}=\mathrm{0.125,0.25,0.5,1} $ . For each macromolecule and SNR, we generate 50 sets of 30 random ground-truth rotations and their corresponding noisy images. We report the average Procrustes error (41) per image (i.e., the Procrustes error divided by the number of images).
When running this test, we chose to only use our IRLS-ADMM algorithm followed by Rotations, and omitted the Sinkhorn row and column scaling step due to observed numerical instabilities of Sinkhorn with noise in the data or too many common lines. Also, Remark 5.2 states that our algorithm is only guaranteed to recover the ground-truth pure common lines matrix up to a global scale, and the sign of this scale produces two sets of rotations that differ by left-multiplication with $ J=\operatorname{diag}\left(-1,-1,1\right) $ . This sign flip corresponds to the chiral ambiguity in cryo-EM, as explained in Remark 3.6. Thus in our tests, we report the best rotational error amongst the two possible sets of rotations.
The results for rotation recovery are displayed in Table 1. Notably, at lower SNR our method is consistently more accurate than ASPIRE.
Bold values indicate the algorithm with lower error.
7.2. Denoising common lines
The problem we consider here is the following: given a noisy common lines matrix, how well can we recover the ground-truth clean pure common lines matrix? If $ A,B\in {\mathrm{\mathbb{R}}}^{2n\times n} $ are the ground truth and recovered pure common lines matrix respectively, then we measure this error to be
since there is a global scale ambiguity in the recovered pure common lines matrix as discussed in Remark 5.2. The above problem is a least-squares problem in $ \lambda $ and hence has a closed-form solution.
We compare our methods to ASPIRE on simulated data as follows. We run the rotation recovery algorithm of ASPIRE based on common lines to obtain rotations $ {R}^{(1)},\dots, {R}^{(n)}\in SO(3) $ . Then we construct a recovered pure common lines matrix $ B $ from these rotations by using the factorization (6). We then compare the denoising error (42) from the ground-truth clean common lines matrix $ A $ to the output of our IRLS-ADMM method and to the matrix $ B $ . As before, the simulated data consists of 30 images of the 60S, 80S, and 40S ribosomes at $ \mathrm{SNR}=\mathrm{0.125,0.25,0.5,1} $ , and we report the average denoising error (42) per image over 50 runs. We assume we have the ground-truth chiral information when recovering rotations with ASPIRE.
The results for common line denoising are displayed in Table 2. Again our IRLS-ADMM method outperforms ASPIRE for denoising common lines matrices, particularly at low SNR.
Bold values indicate the algorithm with lower error.
7.3. Clustering heterogeneous image sets
We test the performance of our algorithm Clusters for clustering (see Section 6) on simulated and real data.
The success of clustering is measured by the adjusted Rand index(Reference Hubert and Arabie32) (ARI) between the ground-truth clusters and the recovered clusters. The range of this index is $ -\infty <\mathrm{ARI}\le 1 $ , with $ \mathrm{ARI}=1 $ if the two partitions are identical. The ARI is a corrected-for-chance version of the Rand index, meaning that it is the expected Rand index for the cluster and is equal to 0 if every element is placed in a random cluster.
While any number of common lines $ \mid S\mid $ can be sampled on line 4 of Clusters, we found that sampling four common lines at a time was effective for a number of reasons: $ \mid S\mid =4 $ enforces a non-trivial rank 3 constraint, and the small number of common lines allowed us to both generate many samples rapidly and improve the numerical stability of the Sinkhorn scaling procedure. In addition, the CommunityDetection algorithm we use is the one described in (Reference Lancichinetti, Fortunato and Kertész28).
7.3.1. Simulated data
We generate a dataset containing three clusters, with $ n=5+30+15=50 $ images from the 40S, 60S, and 80S ribosomes, respectively, from which we construct a common lines matrix.
Clusters achieved perfect clustering ( $ \mathrm{ARI}=1 $ ) at $ \mathrm{SNR}=10 $ , $ \mathrm{ARI}=0.8581 $ at $ \mathrm{SNR}=5 $ , and $ \mathrm{ARI}=0.3286 $ at $ \mathrm{SNR}=1 $ . The clusters found at $ \mathrm{SNR}=5 $ are displayed in Figure 4, which shows that only one pair of images were placed in incorrect clusters.
7.3.2. Real data
Our real data consists of a subset of 2D class averages computed from the experimental data described in Verbeke et al.(Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) The subset we consider consists of two clusters with $ n=47+28=75 $ images corresponding to the 60S and 80S ribosomes respectively. Each class average is $ 96\times 96 $ with a pixel size of 4.4 Å. We use the labels from (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) as ground truth for clustering.
Figure 5 shows the clusters found by our algorithm Clusters, achieving ARI = 0.8440 and misclassifying only three images.
The clustering algorithm used in (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) is based on performing community detection on a nearest-neighbors graph constructed using Euclidean distances between the best-matching line projections between every pair of images. We stress that our clustering algorithm uses a completely distinct aspect of common lines data: the positions of the common lines. As a proof of concept for our constraints, we do not make use of the correlations between the common lines at all, unlike (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7). The test for Clusters only uses a subset of the dataset in (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7), which has $ n=100 $ images and includes images with unknown labels. If we compare only the 60S and 80S images that were clustered, then Clusters achieves a similar performance, where one additional image is misclassified by Clusters compared to (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7).
8. Conclusion
This article revisited the fundamental topic of common lines in cryo-EM image processing. We discussed a novel approach for dealing with common lines, based on a certain $ 2n\times n $ matrix encoding the common lines between $ n $ projection images. We proved that if the $ 2\times 1 $ blocks of the matrix are properly scaled, then the matrix satisfies nice algebraic constraints: a low-rank condition and several sparse quadratic constraints. The new formulation operates directly on common lines data, and is fully global in that it does not require angular reconstitution or voting procedures at all. It opens the door to different and potentially more robust approaches to computational tasks involving common lines. Using the algebraic constraints, we adapted optimization algorithms from other domains to give new methods to denoise common lines data, and recover the 3D rotations underlying noisy images. Numerical experiments show that these methods have increased accuracy at low SNR, compared to existing methods based on common lines. We also explored a setting where traditional common lines methods fail to apply – cryo-EM datasets with discrete heterogeneity – by proposing a sampling-based process to cluster the images in homogeneous subcommunities based on our algebraic constraints. Experiments with simulated and real data show the method performs well when applied to images with high noise.
Although there is clear promise, several future directions could be pursued for further improvements. Firstly, in Section 5 the optimization algorithm building on (Reference Sengupta, Amir, Galun, Goldstein, Jacobs, Singer and Basri11) is quite complex. Matrix scaling problems as in (Reference Sengupta, Amir, Galun, Goldstein, Jacobs, Singer and Basri11) and our work are an interesting variation on the problem of matrix completion; would other optimization approaches perform better? Secondly, extensions to molecules with nontrivial point group symmetries would be useful (and currently are a focus in other common lines research). Perhaps our formulation can suggest another way to incorporate symmetries into common lines methods. Lastly, in the application to discrete heterogeneity, we neglected correlation scores between the common lines, on which (Reference Verbeke, Zhou, Horton, Mallam, Taylor and Marcotte7) relied. It is likely better to use both the scores and the algebraic constraint errors.
Data availability statement
Data and replication code are available at https://github.com/ozitommi/algebraic-common-lines.
Author contribution
All authors conceived the project, designed the algorithms, and developed the mathematical theory. T.M., A.L.D., and E.J.V. prepared the data. T.M. and A.L.D. wrote the software and performed the numerical experiments. All authors wrote the manuscript and approved its final submission.
Funding statement
T.M. is supported by a Mathematical Institute Scholarship at the University of Oxford. A.L.D. is supported in part by NSF DMS 1937215. E.J.V. is supported by the Simons Foundation Math+X Investigator award to Amit Singer. J.K. is supported in part by NSF DMS 2309782, NSF CISE-IIS 2312746, and start-up grants from the College of Natural Science and Oden Institute at the University of Texas at Austin.
Competing interest
The authors declare no competing interests exist.
A. Appendix. Additional proofs
Proof of Proposition 3.3. Since orthogonal matrices preserve norms, we have
Similarly, we have
Thus, by equating (A1) with (A2) and simplifying, we obtain $ \parallel {\mathbf{a}}_{ij}{\parallel}_2^2=\parallel {\mathbf{a}}_{ji}{\parallel}_2^2 $ .
Proof of Proposition 3.4. Let $ D=\det \left(\begin{array}{ccc}{\mathbf{r}}_3^{(i)}& {\mathbf{r}}_3^{(j)}& {\mathbf{r}}_3^{(k)}\end{array}\right) $ . Then for each $ i $ we have
by expanding the determinant along the first column. Similarly for each $ j $ and $ k $ we have
Thus, equating (A3)–(A5), we obtain $ \det \left(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}\right)=-\det \left(\begin{array}{cc}{\mathbf{a}}_{ji}& {\mathbf{a}}_{jk}\end{array}\right)=\det \left(\begin{array}{cc}{\mathbf{a}}_{ki}& {\mathbf{a}}_{kj}\end{array}\right) $ .
Proof of Theorem 3.5. We first prove the theorem in the case $ n=3 $ . Given $ {R}_1,{R}_2,{R}_3\in SO(3) $ , we have $ A=\psi \left({R}_1,{R}_2,{R}_3\right)=\psi \left({R}_1Q,{R}_2Q,{R}_3Q\right) $ for any $ Q\in SO(3) $ since
Hence, the fibers of $ \psi $ contain at least a copy of $ SO(3). $ By fixing $ Q={R}_1^{\top }, $ we may assume that $ {R}_1=I, $ so that $ A=\psi \left({R}_1,{R}_2,{R}_3\right)=\psi \left(I,{R}_2{R}_1^{\top },{R}_3{R}_1^{\top}\right) $ . To prove the statement, we therefore need to show that fibers of the map $ \psi $ with $ {R}_1=I $ fixed,
generically consist of only one point. Our strategy to show this is to set up a corresponding system of polynomial equations for a random instance of two rotations $ {R}^{(1)} $ and $ {R}^{(2)} $ , and then solve the system numerically, showing that it has only one solution. It is sufficient to solve the polynomial system over the complex numbers $ \mathrm{\mathbb{C}} $ , and exhibit that there is a unique solution even over $ \mathrm{\mathbb{C}} $ . We parameterize $ {R}_1,{R}_2\in \mathrm{SO}(3) $ using the Euler-Rodriguez formula:
where $ {a}_i,{b}_i,{c}_i,{d}_i\in \mathrm{\mathbb{R}} $ such that $ {a}_i^2+{b}_i^2+{c}_i^2+{d}_i^2=1 $ . We construct one pure common lines matrix $ A=\psi \left(I,{R}_1,{R}_2\right) $ whose entries are polynomial in $ {a}_i,{b}_i,{c}_i,{d}_i $ , and another pure common lines matrix $ \overline{A}=\psi \left(I,\overline{R_1},\overline{R_2}\right) $ , from random rotation matrices $ \overline{R_1},\overline{R_2}\in SO(3) $ , whose entries are real. Setting $ A=\overline{A} $ gives us a system of polynomial equations, which we solve over $ \mathrm{\mathbb{C}} $ using the package HomotopyContinuation.jl in Julia.(Reference Breiding and Timme33) Every rotation under the Euler-Rodriguez parametrization has two parametrizations, namely $ \left(a,b,c,d\right) $ and $ \left(-a,-b,-c,-d\right) $ representing the same matrix in $ SO(3) $ . By parameterizing $ {R}_1 $ and $ {R}_2 $ , we therefore expect to find 4 real solutions to the system and therefore one point in the fiber, which is indeed what we find using homotopy continuation. The computation is carried out in the file fiber.jl in our GitHub repository (8.1).
Now with the theorem true for $ n=3 $ , it in fact follows that the theorem is true for all $ n>3 $ : if $ {A}_1=\psi \left({R}_1,\dots, {R}_n\right) $ and $ {A}_2=\psi \left({S}_1,\dots, {S}_n\right) $ , where $ {A}_1={A}_2 $ is a generic point in the image of $ \psi $ , then we need to show that there exists a unique $ Q\in \mathrm{SO}(3) $ such that $ {S}_i={R}_iQ $ for all $ i=1,\dots, n $ . Choosing two $ 6\times 3 $ pure common lines submatrices of $ {A}_1={A}_2 $ corresponding to indices $ \left\{i,j,k\right\} $ and $ \left\{i,\mathrm{\ell},m\right\} $ , the theorem in the case $ n=3 $ implies that there exist $ {Q}_{ijk},{Q}_{i\mathrm{\ell}m}\in \mathrm{SO}(3) $ such that $ \left({S}_i,{S}_j,{S}_k\right)=\left({R}_i{Q}_{ijk},{R}_j{Q}_{ijk},{R}_j{Q}_{ijk}\right) $ and $ \left({S}_i,{S}_j,{S}_{\mathrm{\ell}}\right)=\left({R}_i{Q}_{i\mathrm{\ell}m},{R}_{\mathrm{\ell}}{Q}_{i\mathrm{\ell}m},{R}_m{Q}_{i\mathrm{\ell}m}\right) $ . In particular, $ {R}_i{Q}_{ijk}={R}_i{Q}_{i\mathrm{\ell},} $ , so $ {Q}_{ijk}={Q}_{i\mathrm{\ell}m} $ . Thus, any triplet of rotations are related by the same matrix $ Q $ .
Proof of Proposition 3.7. Let $ {\mathcal{V}}_n\subseteq {\mathrm{\mathbb{R}}}^{2n\times n} $ denote the cone over the common lines variety (i.e., $ {\mathcal{V}}_n $ is the smallest algebraic variety containing all scalar multiples of all pure common lines matrices). Let $ {\mathcal{W}}_n\subseteq {\mathrm{\mathbb{R}}}^{2n\times n} $ denote the variety defined by the polynomial constraints in Proposition 3.7. Note that since the constraints are invariant to global scaling, we have $ {\mathcal{V}}_n\subseteq {\mathcal{W}}_n $ . By (Reference Bochnak, Coste and Roy15), in order to show that $ {\mathcal{V}}_n $ is an irreducible component of $ {\mathcal{W}}_n $ we need to show that
holds a generic point $ A\in {\mathcal{V}}_n $ , where $ \dim \left(\cdot \right) $ and $ \dim \left(\cdot, \cdot \right) $ respectively denote the dimension of a variety and the local dimension of a variety at a point. This is analogous to computing the dimension of a manifold by computing the dimension of its tangent space at a point.
Recall $ {\mathcal{V}}_n $ is the cone over the common lines variety, which is the smallest algebraic variety containing the image of the map $ \psi $ in (7). Thus using the fiber dimension theorem (the algebraic geometric analog of the rank-nullity theorem from linear algebra), Theorem 3.5 implies
Next, we compute $ \dim \left({\mathcal{W}}_n,A\right) $ . We first note that $ {\mathcal{W}}_n\subseteq {\mathcal{X}}_n $ , where $ {\mathcal{X}}_n $ is the algebraic variety of all rank $ \le 3 $ matrices in $ {\mathrm{\mathbb{R}}}^{2n\times n} $ defined by the vanishing of $ 4\times 4 $ minors. Note the map $ \rho :{\mathrm{\mathbb{R}}}^{2n\times 3}\times {\mathrm{\mathbb{R}}}^{n\times 3}\to {\mathcal{V}}_n $ given by $ \rho \left(B,C\right)={BC}^{\top } $ parameterizes $ {\mathcal{V}}_n $ with $ 9 $ -dimensional fibers over each rank-3 point.(Reference Levin, Kileel and Boumal34) This holds because for each $ 3\times 3 $ invertible matrix $ M $ , $ \rho \left(B,C\right)=\rho \left({BM}^{-1},{CM}^{\top}\right) $ . Hence
for $ \left(B,C\right)\in {\mathrm{\mathbb{R}}}^{2n\times 3}\times {\mathrm{\mathbb{R}}}^{n\times n} $ such that $ \rho \left(B,C\right)=A $ . We now compute the right-hand side of (A9) in the computer algebra system Macaulay2.(Reference Grayson and Stillman16) Specifically, we differentiate the defining constraints of $ {\mathcal{W}}_n $ with respect to $ B $ and $ C $ , noting that the rank-3 constraint is enforced automatically by $ \rho $ and the other constraints in Proposition 3.7 are biquadratic and bilinear in $ \left(B,C\right) $ . We evaluate the Jacobian matrix at a point $ \left(B,C\right) $ in the fiber of $ \rho $ over $ A $ , where $ A\in {\mathcal{V}}_n $ is generated as a random scaling of $ \psi \left({R}^{(1)},\dots, {R}^{(n)}\right) $ where $ {R}^{(i)} $ are random. Generically, the nullity of the Jacobian matrix equals $ \dim \left({\rho}^{-1}\left({\mathcal{W}}_n\right),\left(B,C\right)\right) $ . Performing numerical computations for $ n=3,\dots, 50 $ in double-precision arithmetic yields numerical nullities of $ 3n+7 $ . Comparing (A9) with (A8) implies (A7) as desired.
Proof of Theorem 5.1. Since nonzero row and column scaling preserves rank, we may assume without loss of generality that $ {\lambda}_{i1}={\lambda}_{1j}=1 $ for all $ i,j=2,\dots, n $ : this is achieved by choosing $ {\mu}_1={\tau}_1=1 $ , $ {\mu}_i={\lambda}_{i1}^{-1} $ , and $ {\tau}_j={\lambda}_{1j}^{-1} $ . We will show that $ {\lambda}_{ij}={\lambda}_{k\mathrm{\ell}} $ for all $ i,j,k,\mathrm{\ell}=2,\dots, n $ , $ i\ne j $ , $ k\ne \mathrm{\ell} $ . In other words, we will prove that the matrix $ \Lambda $ must have the form in (A12).
First, we prove that $ {\lambda}_{ij}={\lambda}_{ik} $ for all $ i,j,k=2,\dots, n $ , $ i\ne j $ , $ i\ne k $ . There are three cases to consider: $ i<j<k $ , $ j<i<k $ , and $ j<k<i $ . Suppose we are in the first case. Since $ \operatorname{rank}(B)=3, $ every $ 4\times 4 $ minor of $ B $ vanishes. Thus, choosing row indices 1, 2, $ 2i-1 $ , and $ 2i $ , and column indices 1, $ i $ , $ j $ , and $ k $ , we have
But since $ A $ is a pure common lines matrix, $ \det \left(\begin{array}{cc}{\mathbf{a}}_{1i}& {\mathbf{a}}_{1j}\end{array}\right)=-\det \left(\begin{array}{cc}{\mathbf{a}}_{i1}& {\mathbf{a}}_{ij}\end{array}\right) $ and det $ \left(\begin{array}{cc}{\mathbf{a}}_{i1}& {\mathbf{a}}_{ik}\end{array}\right)=-\det \left(\begin{array}{cc}{\mathbf{a}}_{1i}& {\mathbf{a}}_{1k}\end{array}\right) $ by Proposition 3.4, all of which are nonzero since $ A $ is generic. Thus $ {\lambda}_{ij}={\lambda}_{ik} $ .
Expanding the $ 4\times 4 $ minor with the same choice of row and column indices for the remaining two cases gives us
and
respectively, which brings us back to (A10).
Suppose that $ 3\le i\le n-1 $ . The $ 4\times 4 $ minor whose row indices are $ 2i-1 $ , $ 2i $ , $ 2\left(i+1\right)-1, $ , and $ 2\left(i+1\right) $ , and whose column indices are 1, $ i-1 $ , $ i $ , and $ i+1 $ is
We showed earlier that $ {\lambda}^{\prime}\hskip0.5em := \hskip0.5em {\lambda}_{i,i+1}={\lambda}_{i,i-1} $ and $ {\lambda}^{{\prime\prime}}\hskip0.5em := \hskip0.5em {\lambda}_{i+1,i}={\lambda}_{i+1,i-1} $ . Thus expanding, we obtain
After dividing both sides of the last equation by $ {\lambda}^{\prime }{\lambda}^{{\prime\prime} } $ , we obtain $ {\lambda}^{\prime }={\lambda}^{{\prime\prime} } $ . Lastly, the $ 4\times 4 $ minor with the same choice of row and column indices when $ i=2 $ is
since $ {\lambda}^{\prime}\hskip0.5em := \hskip0.5em {\lambda}_{23}={\lambda}_{24} $ and $ {\lambda}^{{\prime\prime}}\hskip0.5em := \hskip0.5em {\lambda}_{32}={\lambda}_{34} $ , which brings us back to (A11). Thus we have proven that $ {\lambda}_{ij}={\lambda}_{k\mathrm{\ell}} $ for all $ i,j,k,\mathrm{\ell}=2,\dots, n $ , $ i\ne j $ , $ k\ne \mathrm{\ell} $ . Let $ \lambda $ be their common value. Then
and we may obtain (40) by choosing $ {\mu}_1=\frac{1}{\lambda } $ , $ {\mu}_2=\dots ={\mu}_n=1 $ and $ {\tau}_1=1 $ , $ {\tau}_2=\dots ={\tau}_n=\lambda $ .
Now we prove the last statement of the theorem. What we have shown so far is that $ \operatorname{rank}(B)=3 $ implies that there exists scales $ {\mu}_i $ and $ {\tau}_j $ such that $ {\lambda}_{ij}={\mu}_i{\tau}_j $ . Since $ {\lambda}_{ij}={\lambda}_{ji} $ , we have
for all $ i,j=1,\dots, n $ . If $ \boldsymbol{\mu} ={\left({\mu}_i\right)}_{i=1}^n $ and $ \boldsymbol{\tau} ={\left({\tau}_j\right)}_{j=1}^n $ , then this means that the matrix $ {\boldsymbol{\mu} \boldsymbol{\tau}}^{\top}\in {\mathrm{\mathbb{R}}}^{n\times n} $ is symmetric. This is true if and only if $ \boldsymbol{\mu} =\boldsymbol{\tau} $ . In particular,
If $ B $ furthermore satisfies the determinant constraints, then we have
for all $ 1\le i<j<k\le n $ , which implies that
Substituting (A13), we obtain
after dividing by $ {\tau}_i{\tau}_j{\tau}_k $ . Denoting the common value of the above equation on the right by $ \tau $ , we find that $ {\mu}_i=\tau $ . Hence, $ {\lambda}_{ij}=\tau $ for all $ i,j=1,\dots, n $ , $ i\ne j $ , which gives the desired result.
B. Appendix. Least squares problems
In Section 5.2, we consider the least squares problems (20) and (21), which are
These have solutions
respectively, where $ {N}_L\in {\mathrm{\mathbb{R}}}^{n\times n} $ with
and $ {N}_R\in {\mathrm{\mathbb{R}}}^{n\times n} $ with
We also consider the least squares problems (29) and (30), which are
where
and
are all vectors of length $ \left(\begin{array}{l}n\\ {}3\end{array}\right) $ . The solution to problem (B3) is
where $ {D}_{L,1},{D}_{L,2}\in {\mathrm{\mathbb{R}}}^{n\times n} $ with
and the solution to problem (B4) is
where $ {D}_{R,1},{D}_{R,2}\in {\mathrm{\mathbb{R}}}^{n\times n} $ with
where $ {v}_{ij, ik}\hskip0.5em := \hskip0.5em \det \left(\begin{array}{cc}{\mathbf{a}}_{ij}& {\mathbf{a}}_{ik}\end{array}\right) $ .