1 Introduction
In the social and behavioral sciences, researchers frequently collect data in which objects are characterized by nominal categorical variables. For instance, in social surveys using questionnaires, respondents are asked to select one category from the available options for each item. Multiple correspondence analysis (MCA) is a widely used technique for obtaining a low-dimensional graphical representation of multivariate categorical data while preserving crucial information (Beh & Lombardo, 2014; Greenacre, 1984; Le Roux & Rouanet, 2010). The MCA solution provides a joint plot that simultaneously visualizes both objects and categories in the reduced space. This joint graphical display allows users to explore inter- and intra-relationships among objects and categories, which is one of the main advantages of MCA (Hoffman & Franke, 1986).
While MCA is a reasonable method for reducing the dimensionality of multivariate categorical data, it has a considerable drawback: the percentage of variance explained by the solution is underestimated in MCA (Greenacre, 1988, 1994). To address this issue, Greenacre (1988) proposed joint correspondence analysis (JCA) as an alternative to MCA. Subsequent research has explored additional JCA estimation procedures beyond Greenacre's original method (Boik, 1996; Tateneni & Browne, 2000).
JCA addresses the variance underestimation problem, but the existing JCA methods primarily focus on exploring the relationships between categories in the reduced space, with less attention given to object score estimation, except for Boik (1996). In that study, a two-stage procedure for obtaining the object scores was proposed: first, the category scores are estimated, and then the object scores are derived from the estimated category scores. However, the object score estimation step is optional and often omitted in real data analyses. Additionally, no justification was provided in Boik (1996) for jointly displaying the object and category scores. As discussed in Hoffman and Franke (1986), the joint plot is valuable for examining the relationships among objects and categories, but this representation is not supported by the existing JCA methods. This limitation is a disadvantage of JCA compared to MCA.
The purpose of this article is to propose an alternative JCA formulation that allows users to represent the object and category scores jointly in the same low-dimensional space while addressing the variance underestimation problem inherent to MCA. In the proposed method, the object and category scores are estimated simultaneously, in contrast to the two-stage estimation procedure employed in the previous study.
As noted by Greenacre (1991), Boik (1996), and van de Velden (2000), JCA solutions can be interpreted in a manner similar to exploratory factor analysis (EFA). From the EFA perspective, the simultaneous JCA estimation method proposed in this article is based on the direct method used in EFA, whereas the JCA estimation methods introduced by Greenacre (1988) and Boik (1996) are analogous to the MINRES method (Harman & Jones, 1966). The direct method was presented by Professor Henk A. L. Kiers at the University of Groningen in 2001 (Sočan, 2003, p. 17), and de Leeuw (2004) independently developed an equivalent method. The method has since been revisited in several papers (Adachi, 2019; Stegeman, 2016; Unkel & Trendafilov, 2010). Previous studies have primarily focused on the geometric interpretation of JCA solutions, and the factor-analytic perspective has been largely overlooked, despite some mention of the EFA-JCA connection. In this study, we address the factor-analytic interpretation of JCA solutions in addition to the geometric interpretation.
The organization of this article is as follows. Section 2 provides a brief overview of the existing MCA and JCA formulations. Section 3 introduces the proposed alternative JCA formulation and its estimation algorithm. Section 4 explores both the geometric and factor-analytic interpretations of the JCA solution. Because JCA has rotational indeterminacy, as simple correspondence analysis (CA) and MCA do, Section 5 examines rotation techniques that enhance the interpretability of the JCA solution, drawing on the literature on simple CA and MCA (Adachi, 2004; Lorenzo-Seva et al., 2009; Makino, 2022; van de Velden, 2000; van de Velden & Kiers, 2003, 2005). Section 6 presents two real data examples to illustrate the geometric and factor-analytic interpretations of JCA solutions. Finally, Section 7 offers our concluding remarks.
2 Existing formulations of MCA and JCA
2.1 MCA
Assume that we obtain an n-objects $\times$ p-nominal-variables data matrix. This categorical data matrix can be transformed into an n-objects $\times$ K-categories super-indicator matrix $\mathbf{G}=[\mathbf{G}_{1}, \dots, \mathbf{G}_{p}]$, where $\mathbf{G}_{j}$ is an n-objects $\times$ $K_{j}$-categories indicator matrix for the jth variable and $K=\sum_{j=1}^{p} K_{j}$ is the total number of categories. Additionally, let $\mathbf{D}=\text{b-diag}(\mathbf{G}^{\prime}\mathbf{G})$ be a K-categories $\times$ K-categories block-diagonal matrix whose jth diagonal block is $\mathbf{D}_{j}=\mathbf{G}^{\prime}_{j}\mathbf{G}_{j}$. Also, $\mathbf{J}=\mathbf{I}_{n} - n^{-1}\mathbf{1}_{n}\mathbf{1}^{\prime}_{n}$ denotes the $n \times n$ centering matrix.
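As a concrete illustration of these definitions, the following Python/NumPy sketch builds the super-indicator matrix G, the diagonal of D, and the centering matrix J from a raw categorical data matrix. It is a minimal example written for this exposition (the function and variable names are ours, not from the article), and it assumes complete responses, i.e., each object selects exactly one category per variable, so that b-diag(G'G) is in fact diagonal.

```python
import numpy as np

def build_indicator_matrices(raw):
    """raw: (n x p) array of categorical labels (any hashable values).
    Returns the super-indicator matrix G, the diagonal of D, and the centering matrix J."""
    n, p = raw.shape
    blocks = []
    for j in range(p):
        cats = np.unique(raw[:, j])                                   # the K_j categories of variable j
        blocks.append((raw[:, [j]] == cats[None, :]).astype(float))   # n x K_j indicator block G_j
    G = np.hstack(blocks)                                             # n x K super-indicator matrix
    D_diag = G.sum(axis=0)                                            # category frequencies; b-diag(G'G) is diagonal here
    J = np.eye(n) - np.ones((n, n)) / n                               # n x n centering matrix
    return G, D_diag, J
```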
The MCA data model can be described as follows (Adachi, 2020; Beh & Lombardo, 2014):

$$\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} = n^{-1}\mathbf{S}\mathbf{V}^{\prime}\mathbf{D}^{1/2} + \mathbf{E}_{M}. \qquad (2.1)$$
Here, $\mathbf{S}$ is an n-objects $\times$ c-components object score matrix, $\mathbf{V}$ is a K-categories $\times$ c-components category score matrix, and $\mathbf{E}_{M}$ is an n-objects $\times$ K-categories matrix that contains errors. The goal of MCA is to find a low-dimensional representation of the observed categorical data while preserving as much of the original variation as possible. The MCA solutions are obtained by minimizing the loss function

$$\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{S}\mathbf{V}^{\prime}\mathbf{D}^{1/2}\|^{2} \qquad (2.2)$$

over $\mathbf{S}$ and $\mathbf{V}$.
Let us denote the rank of $\mathbf{JGD}^{-1/2}$ as r and the singular value decomposition (SVD) of this matrix as $\mathbf{K}\boldsymbol{\Delta}\mathbf{L}^{\prime}$. Here, $\mathbf{K}=[\mathbf{k}_{1}, \dots, \mathbf{k}_{r}]$ and $\mathbf{L}=[\mathbf{l}_{1}, \dots, \mathbf{l}_{r}]$ are the left and right singular vector matrices, respectively, satisfying $\mathbf{K}^{\prime}\mathbf{K}=\mathbf{L}^{\prime}\mathbf{L}=\mathbf{I}_{r}$, and $\boldsymbol{\Delta}=\mathrm{diag}(\delta_{1},\dots,\delta_{r})$ is a diagonal matrix whose diagonal elements are the singular values arranged in descending order. Minimizing the loss function in equation (2.2) is achieved by $n^{-1}\mathbf{S}\mathbf{V}^{\prime}\mathbf{D}^{1/2}=\mathbf{K}_{c}\boldsymbol{\Delta}_{c}\mathbf{L}^{\prime}_{c}$, where $\mathbf{K}_{c}$ and $\mathbf{L}_{c}$ are the matrices consisting of the first c columns of $\mathbf{K}$ and $\mathbf{L}$, respectively, and $\boldsymbol{\Delta}_{c}$ is a diagonal matrix containing the c largest singular values on its diagonal (Eckart & Young, 1936; ten Berge, 1993). Typically, the normalization condition $n^{-1}\mathbf{S}^{\prime}\mathbf{S} = \mathbf{I}$ is imposed on $\mathbf{S}$ for identification purposes. The optimal solutions in MCA can be described as follows:

$$\mathbf{S} = \sqrt{n}\,\mathbf{K}_{c}, \qquad (2.3)$$

$$\mathbf{V} = \sqrt{n}\,\mathbf{D}^{-1/2}\mathbf{L}_{c}\boldsymbol{\Delta}_{c}. \qquad (2.4)$$

Note that $\mathbf{JGD}^{-1/2}$ leads to a column-centered left singular vector matrix: the object scores in MCA can be regarded as standardized scores.
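The MCA solution above reduces to a single truncated SVD. The following sketch (ours, assuming the G, D_diag, and J construction shown earlier) computes S and V of equations (2.3) and (2.4).

```python
import numpy as np

def mca(G, D_diag, n_components):
    """MCA object and category scores via the truncated SVD of J G D^{-1/2} (cf. (2.2)-(2.4))."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    X = J @ G / np.sqrt(D_diag)                                 # J G D^{-1/2}
    Kmat, delta, Lt = np.linalg.svd(X, full_matrices=False)
    Kc, delc, Lc = Kmat[:, :n_components], delta[:n_components], Lt[:n_components].T
    S = np.sqrt(n) * Kc                                         # object scores, n^{-1} S'S = I
    V = np.sqrt(n) * (Lc * delc) / np.sqrt(D_diag)[:, None]     # sqrt(n) D^{-1/2} L_c Delta_c
    return S, V
```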
2.2 JCA
Boik (1996) defined the JCA data model as

$$\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} = n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2} + n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2} + \mathbf{E}_{D}, \qquad (2.5)$$

where $\mathbf{F}$ is a common object score matrix ($n \times c$), $\mathbf{U}$ is a unique object score matrix ($n \times K$), $\mathbf{W}$ is a common category score matrix ($K \times c$), $\boldsymbol{\Psi}$ is a unique category score matrix ($K \times K$), and $\mathbf{E}_{D}$ is an error matrix in the data model ($n \times K$). It is assumed that these parameters satisfy the following constraints:

$$n^{-1}[\mathbf{F}, \mathbf{U}]^{\prime}[\mathbf{F}, \mathbf{U}] = \mathbf{I}_{c+K}, \qquad (2.6)$$

$$[\mathbf{F}, \mathbf{U}] = \mathbf{J}[\mathbf{F}, \mathbf{U}], \qquad (2.7)$$

$$\boldsymbol{\Psi} = \text{b-diag}(\boldsymbol{\Psi}_{1}, \dots, \boldsymbol{\Psi}_{p}) = \begin{bmatrix} \boldsymbol{\Psi}_{1} & \cdots & \mathbf{O} \\ \vdots & \ddots & \vdots \\ \mathbf{O} & \cdots & \boldsymbol{\Psi}_{p} \end{bmatrix}. \qquad (2.8)$$

Here, $\boldsymbol{\Psi}_{j}$ expresses a unique category score matrix of the jth variable and $\mathbf{O}$ expresses a matrix of zeros. The JCA parameter estimation in Boik (1996) and Greenacre (1988) is based on the covariance model rather than the data model; the corresponding covariance model is described as

$$\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} = n^{-1}\mathbf{D}^{1/2}\mathbf{W}\mathbf{W}^{\prime}\mathbf{D}^{1/2} + n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2} + \mathbf{E}_{C}. \qquad (2.9)$$
Here, $\boldsymbol{\Psi}^{*}=\text{b-diag}(\boldsymbol{\Psi}_{1}\boldsymbol{\Psi}^{\prime}_{1},\dots,\boldsymbol{\Psi}_{p}\boldsymbol{\Psi}^{\prime}_{p})$ and $\mathbf{E}_{C}$ expresses an error matrix in the JCA covariance model ($K \times K$). The JCA solutions are obtained by minimizing the discrepancy between the off-diagonal block parts of $\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}$ and the corresponding model part, as defined in Greenacre (1988):

$$\|\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{D}^{1/2}\mathbf{W}\mathbf{W}^{\prime}\mathbf{D}^{1/2} - n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2}\|^{2}. \qquad (2.10)$$
Here, the elements of $\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2}$ are treated as dummy parameters used to fit the diagonal blocks of $\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}$. Greenacre (1988) proposed a JCA estimation algorithm that alternates two minimization steps until a convergence criterion is satisfied: (a) minimize with respect to $\mathbf{W}$ while keeping $\boldsymbol{\Psi}^{*}$ fixed and (b) minimize with respect to $\boldsymbol{\Psi}^{*}$ while keeping $\mathbf{W}$ fixed. In step (a), $\mathbf{W}$ is updated using the eigenvalue decomposition (EVD) of $\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2}$. In step (b), $n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2}$ is updated using $\mathrm{b\text{-}diag}(\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{D}^{1/2}\mathbf{W}\mathbf{W}^{\prime}\mathbf{D}^{1/2})$. This algorithm is analogous to the MINRES algorithm introduced by Harman and Jones (1966) in factor analysis.
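For concreteness, the alternation described above can be sketched as follows. This is our own minimal NumPy illustration of the two steps (a) and (b), not the code of Greenacre (1988) or Boik (1996); `blocks` is assumed to be a list of column-index arrays identifying the categories of each variable.

```python
import numpy as np

def jca_covariance_model(G, D_diag, blocks, n_components, n_iter=200):
    """Alternate (a) an EVD update of the common part and (b) a block-diagonal
    update of the unique part, as in the algorithm described above."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    d = np.sqrt(D_diag)
    B = (G.T @ J @ G) / np.outer(d, d)               # D^{-1/2} G'JG D^{-1/2}
    unique = np.zeros_like(B)                        # n^{-1} D^{1/2} Psi* D^{1/2}
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(B - unique)      # (a) EVD of the adjusted matrix
        top = np.argsort(vals)[::-1][:n_components]
        common = (vecs[:, top] * vals[top]) @ vecs[:, top].T   # rank-c common part
        unique = np.zeros_like(B)                    # (b) refit the diagonal blocks
        for cols in blocks:
            unique[np.ix_(cols, cols)] = (B - common)[np.ix_(cols, cols)]
    return common, unique
```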
Greenacre (1991) noted that JCA solutions can be interpreted similarly to factor analysis: the diagonal block of $n^{-1}\mathbf{D}^{1/2}\mathbf{W}\mathbf{W}^{\prime}\mathbf{D}^{1/2}$ is regarded as the variance explained by the common factors, while $n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2}$ is the variance explained by the unique factors. For these interpretations to be valid, the unique variance part should be positive semi-definite (psd). However, in some cases this psd assumption is violated, leading to what is known as an improper solution or a Heywood case in the context of factor analysis. Boik (1996) developed a modified JCA estimation algorithm that keeps the unique variance part psd, thereby avoiding the Heywood case.
3 Simultaneous object and category score estimation in JCA
In JCA, the object and category scores can be obtained simultaneously by fitting the data model, rather than the covariance model, to the observed categorical data. In the proposed approach, the JCA loss function

$$\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}\|^{2} \qquad (3.1)$$

is minimized over $\mathbf{F}$, $\mathbf{U}$, $\mathbf{W}$, and $\boldsymbol{\Psi}$, subject to constraints (2.6), (2.7), and (2.8). As detailed in the following subsections, the proposed JCA estimation algorithm primarily involves two iterative minimization steps: (i) updating the common and unique object scores and (ii) updating the common and unique category scores. These steps are alternated until a pre-specified convergence criterion is satisfied. This algorithm is analogous to the direct method, or matrix decomposition factor analysis, in factor analysis (Adachi, 2019; de Leeuw, 2004; Sočan, 2003; Stegeman, 2016; Unkel & Trendafilov, 2010), whereas the existing JCA algorithm corresponds to the MINRES method.
3.1 Update object scores
By denoting $\mathbf{Z}=[\mathbf{F}, \mathbf{U}]$ and $\mathbf{A}=[\mathbf{D}^{1/2}\mathbf{W}, \mathbf{D}^{1/2}\boldsymbol{\Psi}]$, (3.1) is rewritten as $\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{Z}\mathbf{A}^{\prime}\|^{2}$. This function can be expanded as

$$\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{Z}\mathbf{A}^{\prime}\|^{2} = const_{z} - 2n^{-1}\,\text{tr}(\mathbf{A}^{\prime}\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{Z}). \qquad (3.2)$$

Here, $const_{z}=\text{tr}(\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}) + n^{-1}\text{tr}(\mathbf{A}\mathbf{A}^{\prime})$ is a constant that does not depend on the update of the object scores. In this expression, the constraints (2.6), (2.7), and (2.8) related to the object scores are replaced by $n^{-1}\mathbf{Z}^{\prime}\mathbf{Z}=\mathbf{I}$ and $\mathbf{Z}=\mathbf{JZ}$.
Equation (3.2) indicates that minimizing over $\mathbf{Z}$ is equivalent to maximizing $\text{tr}(\mathbf{A}^{\prime}\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{Z})$ over $\mathbf{Z}$ for a given $\mathbf{A}$. As demonstrated by ten Berge (1993), the following inequality holds for this trace function:

$$\text{tr}(\mathbf{A}^{\prime}\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{Z}) = \text{tr}(\mathbf{Z}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A}) \leq \sqrt{n}\,\text{tr}(\boldsymbol{\Theta}), \qquad (3.3)$$

where the SVD of $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A}$ is denoted as $\mathbf{M}\boldsymbol{\Theta}\mathbf{N}^{\prime}$. The upper bound is attained for

$$\hat{\mathbf{Z}} = \sqrt{n}\,\mathbf{M}\mathbf{N}^{\prime} \qquad (3.4)$$

under the constraint $n^{-1}\mathbf{Z}^{\prime}\mathbf{Z}=\mathbf{I}$, implying that $\mathbf{Z}$ can be updated by (3.4). $\mathbf{M}=\mathbf{JM}$ holds in this SVD, and thus the updated object scores satisfy the column-centering constraint. This update can be regarded as solving a Procrustes problem; the first c columns of $\hat{\mathbf{Z}}$ correspond to $\hat{\mathbf{F}}$, while the remaining columns of $\hat{\mathbf{Z}}$ correspond to $\hat{\mathbf{U}}$ (de Leeuw, 2004; Unkel & Trendafilov, 2010).
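A minimal NumPy sketch of this update (ours, assuming A = [D^{1/2}W, D^{1/2}Psi] has already been formed and that the number of objects exceeds K + c) is:

```python
import numpy as np

def update_object_scores(G, D_diag, A, n_components):
    """Update Z = [F, U] by the SVD of J G D^{-1/2} A, as in equation (3.4)."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    X = (J @ G / np.sqrt(D_diag)) @ A                 # J G D^{-1/2} A, an n x (K + c) matrix
    M, theta, Nt = np.linalg.svd(X, full_matrices=False)
    Z = np.sqrt(n) * M @ Nt                           # Z-hat = sqrt(n) M N'
    return Z[:, :n_components], Z[:, n_components:]   # F-hat and U-hat
```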
3.2 Update category scores
First, the minimization over $\mathbf{W}$, given $\mathbf{F}$, $\mathbf{U}$, and $\boldsymbol{\Psi}$, is considered. The proposed loss function can be rewritten as

$$\|(\mathbf{F}^{\prime}\mathbf{F})^{-1/2}\mathbf{F}^{\prime}(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}) - n^{-1}(\mathbf{F}^{\prime}\mathbf{F})^{1/2}\mathbf{W}^{\prime}\mathbf{D}^{1/2}\|^{2} + const_{w}, \qquad (3.5)$$

where $const_{w}=\text{tr}\{(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})^{\prime}(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})\} - n^{-1}\text{tr}\{\mathbf{F}\mathbf{F}^{\prime}(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})^{\prime}\}$, which is irrelevant to the update of $\mathbf{W}$. It can be seen from (3.5) that the minimum over $\mathbf{W}$ is attained when $n^{-1}(\mathbf{F}^{\prime}\mathbf{F})^{1/2}\mathbf{W}^{\prime}\mathbf{D}^{1/2}=(\mathbf{F}^{\prime}\mathbf{F})^{-1/2}\mathbf{F}^{\prime}(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})$. Consequently, the common category score matrix can be updated by

$$\hat{\mathbf{W}} = \mathbf{D}^{-1}\mathbf{G}^{\prime}\mathbf{F}, \qquad (3.6)$$

which is equivalent to solving an unconstrained matrix regression problem (ten Berge, 1993). Equation (3.6) shows that the common category scores are the centroids of the common object scores that share the same category, indicating that the JCA solution has the same property as the MCA solution.
Next, we address the minimization over $\boldsymbol{\Psi}$, given $\mathbf{F}$, $\mathbf{U}$, and $\mathbf{W}$. This minimization is achieved by solving an unconstrained regression problem for each $\boldsymbol{\Psi}_{i}$, as shown by the following reformulation of the loss function:

$$\sum_{i=1}^{p}\|(\mathbf{U}^{\prime}_{i}\mathbf{U}_{i})^{-1/2}\mathbf{U}^{\prime}_{i}(\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{F}\mathbf{W}^{\prime}_{i}\mathbf{D}^{1/2}_{i}) - n^{-1}(\mathbf{U}^{\prime}_{i}\mathbf{U}_{i})^{1/2}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i}\|^{2} + const_{\psi}. \qquad (3.7)$$
Here, $const_{\psi}$ is irrelevant for updating $\boldsymbol{\Psi}$ and can be expressed as $\sum^{p}_{i=1}\text{tr}(\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{F}\mathbf{W}^{\prime}_{i}\mathbf{D}^{1/2}_{i})(\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{F}\mathbf{W}^{\prime}_{i}\mathbf{D}^{1/2}_{i})^{\prime} - n^{-1}\sum^{p}_{i=1}\text{tr}\{\mathbf{U}_{i}\mathbf{U}^{\prime}_{i}(\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{F}\mathbf{W}^{\prime}_{i}\mathbf{D}^{1/2}_{i})(\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{F}\mathbf{W}^{\prime}_{i}\mathbf{D}^{1/2}_{i})^{\prime}\}$. Therefore, the unique category score matrix can be updated by

$$\hat{\boldsymbol{\Psi}}_{i} = \mathbf{D}^{-1}_{i}\mathbf{G}^{\prime}_{i}\mathbf{U}_{i} \qquad (3.8)$$

for each i ($i=1, \dots, p$). Accordingly, the update of $\boldsymbol{\Psi}$ can be described as $\text{b-diag}(\hat{\boldsymbol{\Psi}}_{1},\dots, \hat{\boldsymbol{\Psi}}_{p})$.
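Both category-score updates have the closed forms (3.6) and (3.8), which can be sketched as follows (our illustration; `blocks` again lists the column indices of each variable, and U is assumed to be partitioned column-wise in the same way as G):

```python
import numpy as np

def update_category_scores(G, D_diag, F, U, blocks):
    """Closed-form updates W = D^{-1} G'F (3.6) and Psi_i = D_i^{-1} G_i' U_i (3.8)."""
    W = (G.T @ F) / D_diag[:, None]                   # common category scores (category centroids)
    Psi = np.zeros((G.shape[1], G.shape[1]))          # block-diagonal unique category scores
    for cols in blocks:
        Gi, Ui = G[:, cols], U[:, cols]
        Psi[np.ix_(cols, cols)] = (Gi.T @ Ui) / D_diag[cols][:, None]
    return W, Psi
```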
3.3 Whole algorithm and its properties
According to the discussion in previous subsections, the proposed alternating least square algorithm is summarized as follows:
-
Step 1. Initialize the object and category scores. The initial object score matrices $\mathbf{F}_{0}$ and $\mathbf{U}_{0}=[\mathbf{U}_{0_{1}},\dots, \mathbf{U}_{0_{p}}]$ are randomly generated in such a way that the constraints (2.6), (2.7), and (2.8) are satisfied. Then, the initial $\mathbf{W}_{0}$ and $\boldsymbol{\Psi}_{0}$ are obtained by $\mathbf{W}_{0}=\mathbf{D}^{-1}\mathbf{G}^{\prime}\mathbf{F}_{0}$ and $\boldsymbol{\Psi}_{0_{i}}=\mathbf{D}^{-1}_{i}\mathbf{G}^{\prime}_{i}\mathbf{U}_{0_{i}}$ for each i, respectively.
-
Step 2. Update the object score matrices F and U, given the category score matrices, as described by equation (3.4).
-
Step 3. Update the category score matrices $\mathbf{W}$ and $\boldsymbol{\Psi}$, given the object score matrices, as described by equations (3.6) and (3.8).
-
Step 4. Repeat Steps 2 and 3 until the pre-specified convergence criterion is met.
The ALS algorithm of the direct method monotonically decreases the loss function and converges to a local minimum (Unkel & Trendafilov, 2010). To avoid accepting a local minimum, we run the algorithm 100 times with different initial values; among the resulting solutions, the one with the lowest loss function value is chosen as the optimal solution. For example, the initial object score matrices can be obtained using the SVD of a random matrix: generate a column-centered random matrix ($n \times (K+c)$) with full column rank, and use its left singular vectors as the initial object score matrices. In Step 4, convergence is defined as the difference in loss function values between the current and previous iterations being less than $10^{-6}$.
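Putting the pieces together, the whole ALS procedure with multiple random starts can be sketched as below. This sketch relies on the update_object_scores() and update_category_scores() functions shown in Sections 3.1 and 3.2, assumes that the number of objects exceeds K + c so the random initialization has full column rank, and is only an illustration of the algorithm, not the program distributed with the article.

```python
import numpy as np

def jca_direct(G, D_diag, blocks, n_components, n_starts=100, max_iter=500, tol=1e-6, seed=0):
    """ALS minimization of the proposed JCA loss (3.1) with multiple random starts."""
    n, K = G.shape
    J = np.eye(n) - np.ones((n, n)) / n
    target = J @ G / np.sqrt(D_diag)                              # J G D^{-1/2}
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        # Step 1: initial Z from the left singular vectors of a column-centered random matrix
        R = J @ rng.standard_normal((n, K + n_components))
        Z = np.sqrt(n) * np.linalg.svd(R, full_matrices=False)[0]
        F, U = Z[:, :n_components], Z[:, n_components:]
        W, Psi = update_category_scores(G, D_diag, F, U, blocks)
        prev = np.inf
        for _ in range(max_iter):
            A = np.hstack([W, Psi]) * np.sqrt(D_diag)[:, None]         # [D^{1/2} W, D^{1/2} Psi]
            F, U = update_object_scores(G, D_diag, A, n_components)    # Step 2
            W, Psi = update_category_scores(G, D_diag, F, U, blocks)   # Step 3
            loss = np.sum((target - (F @ W.T + U @ Psi.T) * np.sqrt(D_diag) / n) ** 2)
            if prev - loss < tol:                                      # Step 4
                break
            prev = loss
        if best is None or loss < best[0]:
            best = (loss, F, U, W, Psi)
    return best
```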
The proposed method has two notable properties: it avoids producing the Heywood case, and the object scores are not uniquely determined. In the proposed method, $\boldsymbol{\Psi}$ is parameterized as the coefficient matrix for $\mathbf{U}$, ensuring that the unique variance part $n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}=(n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})^{\prime}(n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})$ satisfies the psd constraint. Thus, the Heywood case never occurs, unlike in the existing JCA estimation procedures, where $n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}^{*}\mathbf{D}^{1/2}$ is directly estimated without the psd constraint.
Recall that the update of $\mathbf{Z}$ is based on the SVD of $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A}$. It is known that the rank of $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}$ is $K-p$, which implies that the rank of $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A}$ is at most $K-p$. Let us denote the rank of $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A}$ as r ($r \leq K-p$). The SVD of $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A}$ is described as

$$\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\mathbf{A} = [\mathbf{M}_{1}, \mathbf{M}_{2}]\begin{bmatrix} \boldsymbol{\Theta}_{1} & \mathbf{O} \\ \mathbf{O} & \mathbf{O} \end{bmatrix}[\mathbf{N}_{1}, \mathbf{N}_{2}]^{\prime} = \mathbf{M}_{1}\boldsymbol{\Theta}_{1}\mathbf{N}^{\prime}_{1},$$

where the sizes of the matrices $\mathbf{M}_{1}$, $\mathbf{M}_{2}$, $\mathbf{N}_{1}$, $\mathbf{N}_{2}$, and $\boldsymbol{\Theta}_{1}$ are $n \times r$, $n \times (K+c-r)$, $(K+c) \times r$, $(K+c) \times (K+c-r)$, and $r \times r$, respectively. This expression leads to $\mathbf{Z}=\sqrt{n}\,\mathbf{M}\mathbf{N}^{\prime}=\sqrt{n}\,\mathbf{M}_{1}\mathbf{N}^{\prime}_{1}+\sqrt{n}\,\mathbf{M}_{2}\mathbf{N}^{\prime}_{2}$. We find that $\mathbf{M}_{2}\mathbf{N}^{\prime}_{2}$, and thus $\mathbf{Z}$, is not uniquely determined, indicating that JCA has a factor indeterminacy property in the same manner as factor analysis.
4 Interpretation of the JCA solution
4.1 Geometric interpretation
The optimal category score matrix in MCA is expressed as $\mathbf{V}=\sqrt{n}\,\mathbf{D}^{-1/2}\mathbf{L}\boldsymbol{\Delta}\mathbf{K}^{\prime}\mathbf{K}_{c}=\mathbf{D}^{-1}\mathbf{G}^{\prime}\mathbf{S}$. Substituting $\mathbf{V}=\mathbf{D}^{-1}\mathbf{G}^{\prime}\mathbf{S}$ into (2.2), we have

$$\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - \mathbf{P}_{s}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\|^{2} = \|(\mathbf{I}_{n} - \mathbf{P}_{s})\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\|^{2}, \qquad (4.1)$$

where $\mathbf{P}_{s}=\mathbf{S}(\mathbf{S}^{\prime}\mathbf{S})^{-1}\mathbf{S}^{\prime}$ is an orthogonal projection matrix. This indicates that MCA aims to find a subspace spanned by $\mathbf{S}$ that preserves as much of the variation in the observed categorical data as possible. The joint map interpretation is validated because $\mathbf{S}$ and $\mathbf{V}$ are considered the coordinates of objects and categories in this subspace (Greenacre, 1984).
As demonstrated in the previous section, the optimal common category score matrix in JCA is given by $\mathbf{W} = \mathbf{D}^{-1}\mathbf{G}^{\prime}\mathbf{F}$. Substituting $\mathbf{W}$ into (3.1), we have

$$\|(\mathbf{I}_{n} - \mathbf{P}_{f})(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})\|^{2}, \qquad (4.2)$$

where $\mathbf{P}_{f}=\mathbf{F}(\mathbf{F}^{\prime}\mathbf{F})^{-1}\mathbf{F}^{\prime}$ is the orthogonal projection matrix onto the subspace spanned by $\mathbf{F}$. As demonstrated below, $\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{U}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i}$ is orthogonal to $n^{-1}\mathbf{U}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i}$ for each variable ($i=1,\dots, p$):

$$(\mathbf{J}\mathbf{G}_{i}\mathbf{D}^{-1/2}_{i} - n^{-1}\mathbf{U}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i})^{\prime}(n^{-1}\mathbf{U}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i}) = n^{-1}\mathbf{D}^{-1/2}_{i}\mathbf{G}^{\prime}_{i}\mathbf{U}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i} - n^{-2}\mathbf{D}^{1/2}_{i}\boldsymbol{\Psi}_{i}\mathbf{U}^{\prime}_{i}\mathbf{U}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i} = n^{-1}\mathbf{D}^{1/2}_{i}\boldsymbol{\Psi}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i} - n^{-1}\mathbf{D}^{1/2}_{i}\boldsymbol{\Psi}_{i}\boldsymbol{\Psi}^{\prime}_{i}\mathbf{D}^{1/2}_{i} = \mathbf{O}, \qquad (4.3)$$

using $\hat{\boldsymbol{\Psi}}_{i}=\mathbf{D}^{-1}_{i}\mathbf{G}^{\prime}_{i}\mathbf{U}_{i}$, $\mathbf{J}\mathbf{U}_{i}=\mathbf{U}_{i}$, and $n^{-1}\mathbf{U}^{\prime}_{i}\mathbf{U}_{i}=\mathbf{I}$. This result shows that the unique part removes, from the observed categorical data, variances that are irrelevant to estimating the common part.
Equations (4.1) and (4.2) show that MCA and JCA aim to find subspaces spanned by the common components that preserve the essential information inherent in the original categorical data. The MCA and JCA solutions can be interpreted as coordinates in the joint map obtained by projecting the observed data onto these subspaces. While MCA and JCA are quite similar, the unique part distinguishes them: JCA provides a low-dimensional representation of the observed categorical data that retains the variation remaining after the variances irrelevant to the common part have been discarded.
4.2 Factor-analytic interpretation
JCA is closely related to factor analysis, as discussed in Greenacre (1988, 1991), Boik (1996), and van de Velden (2000), but its solution is typically interpreted only geometrically. Here, we explore the factor-analytic interpretation of the JCA solution.
In factor analysis, the common factor loadings are usually interpreted. Let $\mathbf{X}=[\mathbf{x}_{1}, \dots, \mathbf{x}_{p}]$ be a standardized quantitative data matrix ($n \times p$) and $\mathbf{C}=[\mathbf{c}_{1}, \dots, \mathbf{c}_{c}]$ be a common factor score matrix ($n \times c$) that satisfies $\mathbf{C}=\mathbf{JC}$ and $n^{-1}\mathbf{C}^{\prime}\mathbf{C}=\mathbf{I}$. A common factor loading matrix $\boldsymbol{\Lambda}=\{\lambda_{jl}\}$ can be defined as $\boldsymbol{\Lambda}=n^{-1}\mathbf{X}^{\prime}\mathbf{C}$, which contains the cosines between the column vectors of $\mathbf{X}$ and $\mathbf{C}$:

$$\lambda_{jl} = n^{-1}\mathbf{x}^{\prime}_{j}\mathbf{c}_{l} = \frac{\mathbf{x}^{\prime}_{j}\mathbf{c}_{l}}{\|\mathbf{x}_{j}\|\|\mathbf{c}_{l}\|}, \qquad (4.4)$$

since $\|\mathbf{x}_{j}\|=\|\mathbf{c}_{l}\|=\sqrt{n}$. This illustrates the relationship between the latent factors and the observed variables: the common factor loadings reflect the extent to which the common factors are associated with the observed variables.
In JCA, $\mathbf{W}^{*} = \sqrt{n}^{-1}\mathbf{D}^{1/2}\mathbf{W}=\sqrt{n}^{-1}\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{F}$ corresponds to the common loading matrix because the elements of $\mathbf{W}^{*}$ are equivalent to the cosines between the column vectors of $\mathbf{G}$ and $\mathbf{F}$:

$$w^{*}_{kl} = \sqrt{n}^{-1}d^{-1/2}_{k}\mathbf{g}^{\prime}_{k}\mathbf{f}_{l} = \frac{\mathbf{g}^{\prime}_{k}\mathbf{f}_{l}}{\|\mathbf{g}_{k}\|\|\mathbf{f}_{l}\|}. \qquad (4.5)$$

Here, $\mathbf{g}_{k}$ is the kth column vector of $\mathbf{G}$, $\mathbf{f}_{l}$ is the lth column vector of $\mathbf{F}$, and $d_{k}$ is the kth diagonal element of $\mathbf{D}$. Thus, $\|\mathbf{g}_{k}\|$ represents the square root of the number of objects responding to the corresponding category, which equals the square root of the kth diagonal element of $\mathbf{D}$. Consequently, we can understand how each category relates to each common factor by examining the common loading matrix $\mathbf{W}^{*}$ in JCA.
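Given the estimated common object scores, the loading matrix W* of equation (4.5) is a one-line computation; the following sketch (ours) returns the K x c matrix of cosines.

```python
import numpy as np

def jca_loadings(G, D_diag, F):
    """Common loading matrix W* = n^{-1/2} D^{-1/2} G'F (equation (4.5))."""
    n = G.shape[0]
    return (G.T @ F) / (np.sqrt(n) * np.sqrt(D_diag)[:, None])
```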
In the geometric interpretation, solutions are typically represented as points in a two-dimensional plot. However, there are cases where a three-dimensional or higher-dimensional approximation is necessary to accurately capture the variation in the observed categorical data (Lorenzo-Seva et al., 2009; van de Velden & Kiers, 2005). In such situations, the factor-analytic interpretation provides a valuable alternative when the geometric interpretation becomes challenging.
4.3 Explained variance (EV) by the common factor
The EV by the common factors is a measure of the quality of the c-dimensional approximation. However, the EVs in MCA and JCA are defined differently, depending on whether the unique part is taken into account.
The MCA loss function can be rewritten as $\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\|^{2} = \|\mathbf{E}_{M}\|^{2} + \|n^{-1}\mathbf{S}\mathbf{V}^{\prime}\mathbf{D}^{1/2}\|^{2}$, which shows that the total variation in the categorical data is decomposed into the common and error components. This decomposition leads to the following equation:

$$1 = \frac{\|\mathbf{E}_{M}\|^{2}}{\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\|^{2}} + \frac{\|n^{-1}\mathbf{S}\mathbf{V}^{\prime}\mathbf{D}^{1/2}\|^{2}}{\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\|^{2}}. \qquad (4.6)$$

The second term in (4.6) indicates the extent to which the c-dimensional solution accounts for the total variation in the standardized categorical data; this is referred to as the proportion of EV (PEV). It has been noted that the PEV in MCA may be underestimated because $\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2}\|^{2}=K-p$, the denominator of the PEV, can be artificially inflated (Greenacre, 1988, 1994).
In the proposed JCA formulation, the EV can be defined similarly to MCA. Equation (3.1) leads to $\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}\|^{2}=\|\mathbf{E}_{D}\|^{2} + \|n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2}\|^{2}$. This implies that the PEV in JCA is expressed as

$$\frac{\|n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2}\|^{2}}{\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}\|^{2}}. \qquad (4.7)$$

Equation (4.7) shows how well the common factors account for the variation in the original data after the unique variances have been excluded from the observed data. Recall that the JCA loss function can be written as $\|(\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}) - n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2}\|^{2}$, implying that $\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}$ is approximated by $n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2}$. The PEV in JCA can thus also be regarded as an index of how well the common part explains the target being approximated. Compared with the PEV in MCA, the unique part reduces the inflation of the denominator, thereby providing a more accurate assessment of the approximation quality in JCA.
Next, we consider the decomposition of the variance explained by the common part:

$$\|n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2}\|^{2} = \|\mathbf{W}^{*}\|^{2} = \sum_{j=1}^{p}\sum_{s=1}^{c}\|\mathbf{w}^{*}_{js}\|^{2}. \qquad (4.8)$$

Here, we denote a submatrix of the JCA loading matrix as $\mathbf{W}^{*}_{j}=\sqrt{n}^{-1}\mathbf{D}^{1/2}_{j}\mathbf{W}_{j}=\sqrt{n}^{-1}\mathbf{D}^{-1/2}_{j}\mathbf{G}^{\prime}_{j}\mathbf{F}=[\mathbf{w}^{*}_{j1},\dots,\mathbf{w}^{*}_{jc}]$ ($j=1, \dots, p$).
$\|\mathbf{w}^{*}_{js}\|^{2}$ represents the contribution of the sth common factor to the variance of the jth variable and can be rewritten as follows:

$$\|\mathbf{w}^{*}_{js}\|^{2} = n^{-1}\mathbf{w}^{\prime}_{js}\mathbf{D}_{j}\mathbf{w}_{js} = n^{-1}\mathbf{f}^{\prime}_{s}\mathbf{G}_{j}\mathbf{D}^{-1}_{j}\mathbf{G}^{\prime}_{j}\mathbf{f}_{s}. \qquad (4.9)$$

In this context, the squared loading is equivalent to the squared correlation between $\mathbf{f}_{s}$ and $\mathbf{G}_{j}\mathbf{w}_{js}$, which corresponds to the discrimination measure (Gifi, 1990). Squared loadings serve as an index of how the jth variable relates to the sth common factor in JCA. In the factor-analytic interpretation, squared loadings are crucial for understanding the solutions, alongside the ordinary loadings.
Equation (4.8) can be restated as $\sum_{s=1}^{c}\|\mathbf{w}^{*}_{s}\|^{2}$, where $\mathbf{w}^{*}_{s}$ is the sth column vector of $\mathbf{W}^{*}$. The contribution rate of the sth common factor to the variance in the original data after eliminating the unique factor variances is defined as

$$\gamma_{s} = \frac{\|\mathbf{w}^{*}_{s}\|^{2}}{\|\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{U}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2}\|^{2}}, \qquad (4.10)$$

and then $\sum_{s=1}^{c}\gamma_{s}$ is equal to (4.7).
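The PEV of (4.7) and the contribution rates of (4.10) can be computed directly from the estimated scores, as in the following sketch (ours, reusing the loading computation of (4.5) and the fact that the numerator of (4.7) equals the sum of squared loadings by (4.8)).

```python
import numpy as np

def jca_explained_variance(G, D_diag, F, U, Psi):
    """Total PEV (4.7) and per-dimension contribution rates gamma_s (4.10)."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    # target of the approximation: J G D^{-1/2} with the unique part removed
    target = J @ G / np.sqrt(D_diag) - (U @ Psi.T) * np.sqrt(D_diag) / n
    W_star = (G.T @ F) / (np.sqrt(n) * np.sqrt(D_diag)[:, None])   # loadings, equation (4.5)
    gamma = np.sum(W_star ** 2, axis=0) / np.sum(target ** 2)      # gamma_s, equation (4.10)
    return gamma.sum(), gamma
```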
5 Rotational indeterminacy in JCA
Denote by $\mathbf{T}$ an arbitrary nonsingular matrix ($c \times c$). The JCA solutions in the proposed formulation can be transformed by $\mathbf{T}$ without affecting the fit of the loss function:

$$n^{-1}\mathbf{F}\mathbf{W}^{\prime}\mathbf{D}^{1/2} = n^{-1}\mathbf{F}(\mathbf{T}^{\prime})^{-1}(\mathbf{W}\mathbf{T})^{\prime}\mathbf{D}^{1/2}. \qquad (5.1)$$

To address this transformational indeterminacy, we can impose restrictions on $\mathbf{T}$ to achieve orthogonal or oblique rotation. Specifically, $\mathbf{T}$ can be constrained to be an orthogonal matrix for orthogonal rotation, or to satisfy $\mathrm{diag}(\mathbf{T}^{\prime}\mathbf{T})=\mathbf{I}$ for oblique rotation. This allows the solution to be rotated in an orthogonal or oblique manner to enhance interpretability. Various rotation criteria have been proposed to facilitate this interpretability (Browne, 2001; Sass & Schmitt, 2010).
It is common practice to rotate the solution in the context of factor analysis and principal component analysis. Similarly, orthogonal and oblique rotations have been applied to simple CA and MCA solutions to improve interpretability. These previous studies fall into two categories: the first category includes studies that employ rotation to aid the factor-analytic interpretation of the solution (Adachi, 2004; Makino, 2022), while the second category comprises studies that utilize rotation to facilitate the geometric interpretation of the solution (Lorenzo-Seva et al., 2009; van de Velden, 2000; van de Velden & Kiers, 2003, 2005). The insights gained from rotating solutions in simple CA and MCA can be extended to both the factor-analytic and geometric interpretations of the proposed JCA solutions.
In the factor-analytic interpretation, rotation is used to achieve a more interpretable loading matrix. As previously discussed, $\mathbf{W}^{*}$ is the common loading matrix in JCA. In this context, rotation in JCA aims to simplify the loading matrix by finding an optimal rotation matrix. The orthogonal and oblique rotation criteria established in EFA can similarly be applied to determine the optimal rotation matrix for the JCA solution.
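As a concrete illustration of the factor-analytic rotation, the sketch below applies a plain varimax rotation (Kaiser, 1958) to the JCA loading matrix W*. It is a generic SVD-based varimax implementation written for this illustration, not code from the cited studies; for an orthogonal T, both F and W (and hence W*) are simply post-multiplied by T, so the common part n^{-1}FW'D^{1/2} is unchanged.

```python
import numpy as np

def varimax(Lambda, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns the rotated loadings and the rotation matrix T."""
    K, c = Lambda.shape
    T = np.eye(c)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Lambda @ T
        # gradient of the varimax criterion with respect to T
        Grad = Lambda.T @ (L ** 3 - L @ np.diag(np.mean(L ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(Grad)
        T = U @ Vt
        if s.sum() - crit_old < tol:
            break
        crit_old = s.sum()
    return Lambda @ T, T

# Usage: W_star_rot, T = varimax(jca_loadings(G, D_diag, F)); F_rot = F @ T
```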
When rotating solutions with a geometric interpretation, it is important to consider the differences between symmetric and asymmetric plots (Lorenzo-Seva et al., 2009; van de Velden, 2000; van de Velden & Kiers, 2003, 2005). In symmetric plots, object and category coordinates are treated equally; both sets of coordinates are rotated simultaneously toward a simple structure. In contrast, asymmetric plots assign different roles to the coordinates: the positions of the points in one set of coordinates, known as the principal coordinates, are determined by calculating the weighted average of the points in the other set, referred to as the standard coordinates. In the asymmetric plot scenario, the principal coordinate matrix functions similarly to the loading matrix in EFA, and the goal of rotation is to find a rotation matrix that enhances the interpretability of the principal coordinate matrix. For the proposed JCA solution, which corresponds to the asymmetric plot, $\mathbf{W}$ represents the principal coordinate matrix. Thus, rotation in JCA with a geometric interpretation aims to identify a rotation matrix that simplifies the interpretation of the common category score matrix. Both orthogonal and oblique rotation criteria from previous studies can be applied to determine the optimal rotation matrix.
While the JCA solutions can be rotated, rotation is not always necessary, especially in the context of the geometric interpretation. Rotation in the geometric interpretation is primarily designed to aid the interpretation of high-dimensional solutions that are challenging to represent in two dimensions (Lorenzo-Seva et al., 2009; van de Velden, 2000; van de Velden & Kiers, 2003, 2005). Previous studies have indicated that rotation may not enhance the geometric interpretation of low-dimensional solutions. The JCA solution without rotation is preferable in the geometric interpretation if the low-dimensional approximation effectively captures the variation in the observed data. On the other hand, for higher-dimensional JCA solutions, orthogonal or oblique rotations are more suitable for improving interpretability.
In the factor-analytic interpretation, both orthogonal and oblique rotations can be applied to two- or higher-dimensional solutions, similar to the methods used in EFA. However, these rotations affect the interpretation of squared loadings differently. In the case of oblique rotation, the inter-correlations between the common factors must be excluded to accurately interpret the squared loadings. On the other hand, orthogonal rotation does not affect the interpretation of squared loadings because it ensures that the rotated factors remain uncorrelated. Therefore, orthogonal rotation may be preferable when interpreting both ordinary and squared loadings.
6 Real data examples
This section provides two real data examples to illustrate the proposed JCA procedure. The first example highlights the geometric interpretation, while the second focuses on the factor-analytic interpretation. In this section, objects are referred to as respondents for ease of interpretation.
6.1 Psychological student data
In the first example, we use the psychological student data (Adachi, 2004), originally presented in Japanese by Adachi (2000) and later translated into English by Adachi (2004). The data comprise responses from thirty students majoring in social or clinical psychology at a Japanese university. The dataset includes answers to five questions:
-
Question 1. Which do you major in? (2 categories): Social psychology (S) or Clinical psychology (C).
-
Question 2. Which method do you like for investigating psychology? (3 categories): Experiment (EXP), Survey (SURV), or Interview (INTs).
-
Question 3. Which interests you besides psychology? (4 categories): Sociology (SOC), Anthropology (ANTH), Medicine (MED) or Literature (LIT).
-
Question 4. Which do you think mainly determines human behavior? (2 categories): Inner mind (I-Mind), Outer environment (O-Env).
-
Question 5. Which subject do you like? (5 categories): Mathematics (MATH), Natural Sciences (NSCI), Social Sciences (SOCS), English (ENG), or Japanese (JPN).
In this example, our goal is to compare the joint displays of the JCA and MCA solutions. We used two-dimensional solutions, as done by Adachi (2004). The PEV for both the MCA and JCA solutions is detailed in Table 1. The MCA solution accounted for 33.9% of the variation in the psychological student data; consistent with Greenacre (1988, 1994), the total PEV for MCA was found to be underestimated. In contrast, the JCA solution explained a substantial portion of the data variation after eliminating the unique variances, with a total PEV of 90.5%.
Table 1 PEV (%) of the JCA and MCA solutions in the psychological student dataset

Figures 1 and 2 show the unrotated joint plots of the respondent and category scores derived from the MCA and JCA procedures, respectively: $\mathbf{S}$ and $\mathbf{V}$ are jointly plotted in Figure 1, and $\mathbf{F}$ and $\mathbf{W}$ are jointly plotted in Figure 2. The category scores from MCA and JCA exhibit similar patterns. However, the respondent scores reveal a notable distinction: the JCA respondent scores are grouped into four clusters, whereas the MCA respondent scores are more dispersed. The clustered solution makes the respondents' patterns easier to interpret than the dispersed solution, because the factor scores can be read off simply by focusing on the clusters. Based on the PEV and the visualization, it can be concluded that the JCA solution provides more insightful results than the MCA solution.

Figure 1 Two-dimensional joint plot of the MCA solution.

Figure 2 Two-dimensional joint plot of the JCA solution.
6.2 Taste data
In this illustration, our goal is to demonstrate the interpretation of a higher-dimensional solution in JCA, with a focus on the factor-analytic perspective; a comparison with MCA results is not addressed here. We use the taste data obtained from Le Roux and Rouanet (2010) for this demonstration. This dataset includes responses from 1,215 individuals on the following four nominal items:
-
Item 1. Preferred TV program (eight categories): TV-News, TV-Comedy, TV-Police, TV-Nature, TV-Sport, TV-Films, TV-Drama, and TV-Soap operas.
-
Item 2. Preferred film (eight categories): Action, Comedy, Costume drama, Documentary, Horror, Musical, Romance, and SciFi.
-
Item 3. Preferred type of art (seven categories): Performance, Landscape, Renaissance, Still life, Portrait, Modern art, and Impressionism.
-
Item 4. Preferred place to eat out (six categories): Fish & Chips, Pub, Indian restaurant, Italian restaurant, French restaurant, and Steak house.
In this example, we used a three-dimensional JCA solution, consistent with the analysis presented in Le Roux and Rouanet (2010). Table 2 presents the PEV for the JCA solution. The estimated common factors accounted for 96.2% of the variance in the taste data after excluding the unique variances, indicating that the three-dimensional solution adequately captured the variation.
Table 2 PEV (%) of the JCA solution in the taste dataset

For interpretation, we used both ordinary and squared loadings. To facilitate the interpretation of the ordinary loading matrix, we applied varimax rotation (Kaiser, 1958), an orthogonal rotation method, which does not affect the interpretation of the squared loadings. Tables 3 and 4 show the unrotated and varimax-rotated ordinary and squared loadings, respectively. These tables show that each category and variable primarily loads on a single factor, with minor cross-loadings. The rotated solutions approach a perfect simple structure more closely than the unrotated solutions.
Table 3 Unrotated and Varimax-rotated JCA loadings for the taste data

Table 4 Unrotated and Varimax-rotated JCA squared loadings for the taste data

Dimension 1 shows a positive association with TV-Sport, TV-Nature, and TV-News, while relating negatively to TV-Soap operas and TV-Drama. The former categories can be grouped as "Non-fictional" and the latter as "Fictional," indicating that this dimension reflects a preference between non-fictional and fictional TV. Dimension 2 is positively associated with Landscape, Fish & Chips, Pub, and Steak house, but negatively associated with Impressionism, Modern art, Italian restaurant, and Indian restaurant; this dimension appears to contrast "popular" tastes with "sophisticated" ones. Dimension 3 has positive associations with Costume drama, Documentary, TV-Drama, and TV-News, and negative associations with Comedy, Horror, TV-Comedy, and TV-Films. The positively associated categories represent a "matter-of-fact" taste group, while the negatively associated categories reflect an "affective content" taste group in film production.
7 Concluding remarks
We developed a simultaneous object and category score estimation method for JCA, which is based on the JCA data model rather than the JCA covariance model. Additionally, we provided a rationale for the joint graphical representation of the objects and categories. While the proposed JCA formulation is similar to the MCA formulation, the unique part is a key term for addressing the underestimation problem identified in equation (4.6). Consequently, users can explore the inter- and intra-relationships among the objects and categories while effectively mitigating the underestimation problem seen in MCA, as illustrated by the psychological student data example.
It is important to note that while the common factor scores exhibit a clustered structure in the real data example, this does not imply that JCA will always produce such interpretable scores. In the direct method, the factor score matrix Z = [F, U] cannot be uniquely determined, with Z being the sum of a determinate and an undetermined matrix. This implies that the F used in Figure 2 is only one of the optimal F's. Procrustes analysis was used to estimate F in the real data example, but other methods can be adopted if a desirable matrix can be reasonably selected from the set of undetermined matrices that satisfy the conditions. Uno et al. (2019) proposed a procedure called clustered common factor exploration (CCFE), designed to estimate a common factor score matrix that is as well classified as possible within the framework of the direct method. CCFE allows the undetermined part to be selected so that the factor scores have a clustered structure in the factor estimation step. CCFE is applicable to the proposed JCA estimation procedure by replacing Step 2 with CCFE, because the direct method and the proposed method use the same estimation procedure. CCFE may be a desirable option when a large number of objects is observed.
The JCA solutions are commonly interpreted through graphical maps, but they can also be analyzed similarly to EFA, where the relationships between variables and common factors are examined using loadings defined by (4.5). Although two-dimensional solutions are typically employed for geometric interpretation, there are instances where three-dimensional or higher-dimensional approximation is more suitable, making visualization more complex. In such cases, factor-analytic interpretation serves as an effective alternative, as illustrated by the analysis of the taste data example.
Rotation has been an overlooked topic in JCA, but it is crucial for interpreting the obtained solutions. This article discussed how and when to rotate the JCA solutions, emphasizing the need to consider the differences between the geometric and factor-analytic interpretations. While this article focuses on rotation toward interpretable loadings, another type of rotation is discussed in the context of PCAMIX, which encompasses MCA as a special case (Kiers, 1991). In PCAMIX, rotation is applied to make the squared loadings more interpretable (Adachi, 2004; Chavent et al., 2012; Kiers, 1991). Future research could explore how to transform JCA solutions to achieve a simpler structure.
In the estimation methods proposed by Boik (1996) and Greenacre (1988), the unique variances can become negative if the psd constraint is not imposed. In contrast, the proposed method consistently produces non-negative unique variances without requiring additional constraints. Non-negative unique variances are crucial for JCA because they help mitigate the artificial inflation of the observed categorical data variance when calculating the PEV. Specifically, the denominator of the PEV in JCA, given by $\mathrm{tr}(\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2} - n^{-1}\mathbf{D}^{1/2}\boldsymbol{\Psi}\boldsymbol{\Psi}^{\prime}\mathbf{D}^{1/2})$, is lower than the denominator of the PEV in MCA, which is $\mathrm{tr}(\mathbf{D}^{-1/2}\mathbf{G}^{\prime}\mathbf{J}\mathbf{G}\mathbf{D}^{-1/2})$. For this comparison to be valid, the unique variances must be non-negative; otherwise, negative unique variances could exacerbate the inflation and fail to address the underestimation problem.
Recent investigations have explored the mathematical properties of the direct method (Adachi, 2022; Adachi & Trendafilov, 2018, 2019; Stegeman, 2016). Further research into these properties with respect to JCA, particularly focusing on the mathematical distinctions between MCA and JCA, would be valuable in future studies.
Data availability statement
The data that support the findings of this study are openly available. The psychological student data are available in Adachi (2004), while the taste data can be downloaded from the webpage of Dr. Le Roux, an author of Le Roux and Rouanet (2010). A program used for the JCA estimation is provided at https://osf.io/wc4vs/?view_only=f089be0b11454a538125c6843cf5acc1.
Funding statement
This work was supported by JSPS KAKENHI Grant Number 24K22753.
Competing interests
The author has no competing interests to declare relevant to this article’s content.