1 Introduction
Latent concepts represent aspects of reality that cannot be directly observed, such as quality of life, sustainable development, poverty, happiness, and personality traits. These concepts can be inferred through observable Manifest Variables (MVs) and can be measured by means of a Latent Variable (LV) that is statistically related to two or more MVs. In general, latent concepts need statistical models to represent them comprehensively and synthetically through a mathematical formalization of the observed data and their relationships, i.e., the measurement model (Bollen & Bauldry, 2011, among others). However, the representation of a multidimensional latent concept through a single LV might result in a simplistic portrayal that inadequately captures the intricate nature of the complex phenomenon at hand. Exploratory Factor Analysis (EFA), pioneered by Spearman (1904), is one of the most widely used statistical methods to identify LVs, often called factors, which are assumed to generate the MVs and to be reflected in their linear relationships. EFA has been extensively applied to identify complex constructs like personality traits, intelligence, attitudes, and mental health in psychology; social stratification, cultural values, and interpersonal relationships in sociology; consumer preferences, brand perceptions, and product attributes in marketing; and quality of life, health behaviors, and disease risk factors in health sciences. Additionally, if a theoretical factorial structure can be hypothesized a priori, Confirmatory Factor Analysis (CFA), and more generally Structural Equation Modeling (SEM), can be used to test its fit to the observed data (Brown, 2015). Nonetheless, it may happen that a reduced set of factors, which are uncorrelated in EFA, is insufficient to analyze and reproduce the complexity of the phenomenon under study, since hierarchical relationships among its multiple factors exist.
Advanced approaches for exploring hierarchies of LVs underlying the data include hierarchical models, which organize various facets of a construct into hierarchical structures, extending from the original MVs to the ultimate General Factor (GF) through a refined group of Specific Factors (SFs). The GF represents the highest abstraction of the phenomenon under study, while the SFs denote dimensions that, in turn, measure specific concepts describing its primary components. SFs are LVs less abstract than the GF and more correlated with the MVs, allowing a better understanding of the phenomenon. This hierarchical approach is necessary to provide a more precise and comprehensive understanding, steering away from the pitfalls of oversimplification associated with a single factor or with a group of uncorrelated factors. Higher-order factor analysis (HOFA; Gorsuch, 1983), second-order disjoint factor analysis (DFA; Cavicchia & Vichi, 2022), both based on EFA, and hierarchical disjoint principal component analysis (Cavicchia et al., 2023), a hierarchical extension of the model introduced by Ferrara et al. (2016), are examples of hierarchical models. Specifically, HOFA involves a sequential application of EFA followed by non-orthogonal rotation to detect higher-level factors representing broader constructs, while the other two models use a simultaneous approach, either limited to two orders or related to principal component analysis instead of EFA, respectively. In psychometric studies, hierarchical concepts are ubiquitous. For example, as described by Cattell (1971) and Carroll (1993) in the context of intelligence studies, a hierarchical model might include multiple levels of abstraction, with lower levels representing specific abilities or tasks (e.g., memory, verbal comprehension, spatial reasoning) and higher levels representing broader factors (e.g., fluid intelligence, crystallized intelligence, general intelligence or g factor). Another example is the Five-Factor Model of personality (Costa & McCrae, 1992; Cronbach, 1990), which organizes personality traits into five factors, namely openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism, each represented by specific aspects. The aggregation of neuroticism, agreeableness, and conscientiousness on the one hand, and of extraversion and openness to experience on the other, gives rise to two higher-order dimensions, called alpha and beta (or stability and plasticity), respectively (DeYoung et al., 2002). Hierarchical models are also employed in organizational psychology to understand the structure of job performance (Campbell, 1990).
Unlike the aforementioned models, the methodology introduced in this article addresses the modeling of the unexplained correlation between higher-order factors through a simultaneous approach that goes beyond two levels and identifies higher-order LVs from the hierarchical structure of MVs. The proposal is called Ultrametric Factor Analysis (UFA), as it extends the EFA model to the case of factors with a hierarchical structure, which mathematically corresponds to an ultrametric correlation matrix. However, if UFA is applied to data without a true hierarchical structure, first-order factor correlations will be close to zero, resulting in near-zero loadings for higher-order factors; this indicates the absence of a hierarchical structure. The model requires that the identified factors possess certain relevant properties to define a sound methodology that adheres to the principles of high-quality statistics, as clearly described in the European Code of Practice (European Commission, 2017). Good properties for a hierarchy of factors include internal consistency, reliability, unidimensionality, and content validity of the factors. The article proposes an adequate measurement model for constructing a hierarchy of LVs with these desirable properties starting from a group of MVs. The results of the simulation study underscore the superiority of the UFA method compared to well-established factor analysis techniques, particularly in scenarios where the data exhibit a nested or hierarchical structure within factors. Traditional methods often struggle to accurately capture the complexity of such hierarchical data, leading to sub-optimal factor solutions. In contrast, UFA demonstrates a remarkable ability to identify and model the nested relationships, providing more precise and interpretable factor structures. We also apply UFA to Cattell's dataset, which focuses on the factorial study of intelligence. Given the inherent hierarchical nature of this construct, where specific cognitive abilities feed into broader dimensions like crystallized and fluid intelligence, UFA is well suited to model this type of data. Indeed, by capturing both the unidimensional and hierarchical relationships by means of an ultrametric correlation structure, UFA provides deep insights into how different cognitive abilities interrelate, offering a more nuanced understanding of intelligence that traditional factor analysis methods may fail to capture.
The rest of the article is organized as follows. In Section 2, we present the notation used across all sections, along with fundamental concepts regarding ultrametric matrices, EFA and its constrained version based upon a sparse structure of the loading matrix. A comprehensive illustration of the proposed methodology is presented in Section 3. Section 4 describes the algorithm built to implement the proposed procedure. Different measures for model selection and for the choice of the number of factors are examined in Section 5. An extended simulation study designed to evaluate the model is illustrated in Section 6, while a practical application is the core of Section 7. Conclusions and key findings are reported in Section 8.
2 Notation and background
For the convenience of the reader, the notation and terminology common to all sections are listed here.

2.1 Ultrametric correlation matrix
To introduce our new factorial methodology, we first need to recall the definition of an ultrametric correlation matrix.
Definition 1. A matrix $\mathbf{R}_{\mathbf{X}}$ is an ultrametric correlation matrix if all its elements $r_{jl} \in \mathbb{R}$, for $j, l = 1, \ldots, J$, satisfy the following conditions:
- (i) symmetry: $r_{jl} = r_{lj}$ for $j, l = 1, \ldots, J$;
- (ii) non-negativity and unitary diagonal: $r_{jl} \geq 0$ for all $j, l = 1, \ldots, J$, and $r_{jj} = 1$ for all $j = 1, \ldots, J$;
- (iii) positive semi-definiteness (psd): $\mathbf{x}^\prime\mathbf{R}_{\mathbf{X}}\mathbf{x} \geq 0$ for all $\mathbf{x} \in \mathbb{R}^J$;
- (iv) ultrametricity: $r_{jl} \geq \min\{r_{jh}, r_{lh}\}$, for $j, l, h = 1, \ldots, J$ or, equivalently, for each triplet $j, l, h = 1, \ldots, J$, there exists a reordering $\{j, l, h\}$ of the elements s.t. $r_{jl} \geq r_{jh} = r_{lh}$.
In detail, $\mathbf{R}_{\mathbf{X}}$ is a $(2Q-1)$-ultrametric correlation matrix of order J if MVs are partitioned into Q groups and these are then agglomerated into $Q-1, Q-2, \ldots, 1$ broader groups (Cavicchia et al., 2020). An example is provided in Figure 1a. Each group $q = 1, \ldots, Q$ in the initial partition of the MVs is characterized by a strong correlation among the MVs within the group. This correlation level, denoted by $r^W_q$, can be stored as the qth diagonal element of a diagonal matrix $\mathbf{R}_W$. The groups are also correlated with each other, though to a lesser extent. These correlations, denoted by $r^B_{qh}$ for $q, h = 1, \ldots, Q$, $h \neq q$, can be stored as off-diagonal elements of a hollow matrix $\mathbf{R}_B$. To arrange the MVs into groups, and those groups into larger groups (property (iv) of Definition 1 for the whole matrix), the correlations within each group must be greater than the correlations between different groups, i.e., $\min\{r^W_q : q = 1, \ldots, Q\} \geq \max\{r^B_{qh} : q, h = 1, \ldots, Q,\ h \neq q\}$. Additionally, the ultrametric property must hold for the correlations between groups. Hence, a $(2Q-1)$-ultrametric correlation matrix has fewer parameters than a generic $(J \times J)$ correlation matrix, since its off-diagonal elements assume one of $(2Q-1)$ different values, i.e., the Q within-group correlations $r^W_q$ and the $Q-1$ distinct between-group correlations $r^B_{qh}$. It is easy to show that a $(2Q-1)$-ultrametric correlation matrix is one-to-one associated with a hierarchy of Q groups of MVs (Cavicchia et al., 2020, Lemma 1). In fact, as shown in Figure 1b, the first Q values $r^W_q$, $q = 1, \ldots, Q$, define the first-order aggregation levels of the hierarchy and represent the expected correlation within the first Q groups. The other $Q-1$ values, corresponding to the distinct between-group correlations in $\mathbf{R}_B$, identify the remaining $Q-1$ levels and represent the expected correlation between groups of MVs.

Figure 1 Example of a
$(2Q-1)$
-ultrametric correlation matrix, with Q=4, and its corresponding hierarchical tree. First-order groups in (b) are
$G_1 = \{\text {V}_1, \text {V}_2, \text {V}_3\},\ G_2 = \{\text {V}_4,\text {V}_5,\text {V}_6\},\ G_3 = \{\text {V}_7, \text {V}_8, \text {V}_9\},\ G_4 = \{\text {V}_{10},\text {V}_{11}\}$
. Higher-order groups are obtained as
$G_5 = G_1\cup G_2, G_6 = G_3\cup G_4, G_7 = G_5\cup G_6$
.
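The four conditions of Definition 1 are easy to check numerically. The following minimal sketch (ours, in Python with numpy; the function name and the toy correlation levels are illustrative assumptions, not part of the model) verifies them for a small $(2Q-1)$-ultrametric matrix with $Q = 2$ groups:

```python
import numpy as np

def is_ultrametric_correlation(R, tol=1e-10):
    """Check conditions (i)-(iv) of Definition 1 for a candidate matrix R."""
    R = np.asarray(R, dtype=float)
    J = R.shape[0]
    symmetric = np.allclose(R, R.T, atol=tol)                          # (i)
    nonneg_unit = (R >= -tol).all() and np.allclose(np.diag(R), 1.0)   # (ii)
    psd = np.linalg.eigvalsh((R + R.T) / 2).min() >= -tol              # (iii)
    ultra = all(R[j, l] >= min(R[j, h], R[l, h]) - tol                 # (iv)
                for j in range(J) for l in range(J) for h in range(J))
    return symmetric and nonneg_unit and psd and ultra

# Toy (2Q-1)-ultrametric matrix, Q = 2: within levels 0.8 and 0.7, between level 0.4
R = np.full((5, 5), 0.4)
R[np.ix_([0, 1, 2], [0, 1, 2])] = 0.8
R[np.ix_([3, 4], [3, 4])] = 0.7
np.fill_diagonal(R, 1.0)
print(is_ultrametric_correlation(R))  # True
```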
2.2 Exploratory factor and disjoint factor analysis
Let
$\mathbf {x}$
be a
$J-$
dimensional random vector with mean vector
$\boldsymbol {\mu }$
and variance–covariance matrix
$\boldsymbol {\Sigma }_{\mathbf {X}}$
. We assume that MVs are standardized; therefore,
$\boldsymbol {\mu } = \mathbf {0}$
and the variance–covariance matrix
$\boldsymbol {\Sigma }_{\mathbf {X}}$
is the correlation of
$\mathbf {x}$
. The EFA assumes that the observed units are explained (reconstructed, generated) by the following model
(1)$$ \begin{align} \mathbf{x} = \mathbf{A}\mathbf{y} + \mathbf{e}, \end{align} $$
where
$\mathbf {y}$
is a vector of non-observable Q random variables called factors, while
$\mathbf {A}$
is a
$(J \times Q)$
matrix of unknown parameters called factor loadings that identifies the statistical relations between MVs and factors, and finally
$\mathbf {e}$
is a
$J-$
dimensional vector of non-observable random errors, whose elements are called the specific or unique factors. We shall assume that LVs and errors are centered, i.e.,
$\mathbb {E}(\textbf {y}) = \mathbf {0}_Q;\ \mathbb {E}(\mathbf {e}) = \mathbf {0}_J$
; factors are standardized, and uncorrelated for EFA, i.e.,
$\text {Cor}(\mathbf {y}) = \mathbf {I}_Q$
(we assume the orthogonal EFA hereinafter); and, finally, errors and factors are uncorrelated, i.e.,
$\text {Cor}(\mathbf {y}, \mathbf {e}) = \mathbf {0}_{Q\times J}$
. For the Maximum Likelihood (ML) estimation, it is also assumed that
$\mathbf {y} \sim \mathbf {N}_Q(\mathbf {0}_Q, \mathbf {I}_Q)$
and
$\mathbf {e} \sim \mathbf {N}_J(\mathbf {0}_J, \boldsymbol{\Psi })$
, where
$\boldsymbol{\Psi }$
is a diagonal positive definite matrix. Given these assumptions, it can be found that
$\mathbf {x} \sim \mathbf {N}_J(\boldsymbol {\mu }, \boldsymbol {\Sigma }_{\mathbf {X}})$
, where
(2)$$ \begin{align} \boldsymbol{\Sigma}_{\mathbf{X}} = \mathbf{A}\mathbf{A}^\prime + \boldsymbol{\Psi}. \end{align} $$
The EFA model clearly has a problem of identification of
$\mathbf {A}$
from
$\mathbf {AA}^\prime $
, since if
$\tilde {\mathbf {A}} = \mathbf {AT}$
is an arbitrary orthogonal transformation of
$\mathbf {A}$
, then
$\tilde {\mathbf {A}}\tilde {\mathbf {A}}^\prime = \mathbf {AA}^\prime $
. This indeterminacy is usually solved by rotating the factor loadings to create maximum contrast between columns of
$\mathbf {A}$
, e.g., by using the varimax rotation method (Kaiser, 1958). This frequently simplifies the loading matrix and facilitates interpretation. In this spirit, Vichi (2017) introduced the DFA, in which the solution is constrained to have the simplest loading matrix
$\mathbf {A}$
that allows each MV to be uniquely correlated with one of the Q LVs. Thus,
$\mathbf {A}$
is parametrized by considering the product
(3)$$ \begin{align} \mathbf{A} = \mathbf{B}\mathbf{V}_Q, \end{align} $$
where
$\mathbf {B} = \mathrm {diag}(b_1,\ldots ,b_J)$
is a
$(J \times J)$
diagonal matrix and
$\mathbf {V}_Q$
is a
$(J \times Q)$
binary and row stochastic matrix. The elements of
$\mathbf {B}$
identify the levels of correlation between MVs and LVs, while
$\mathbf {V}_Q$
is a membership matrix indicating to which LV each MV is assigned. This parameterization induces a partition of the MVs into Q disjoint groups and the definition of an LV for each subgroup of MVs. Therefore, the model in Eq. (2) can be written as
(4)$$ \begin{align} \boldsymbol{\Sigma}_{\mathbf{X}} = \mathbf{B}\mathbf{V}_Q\mathbf{V}_Q^\prime\mathbf{B} + \boldsymbol{\Psi}. \end{align} $$
The ML estimation is obtained supposing that a random sample of
$n> J$
multivariate units
$\mathbf {x}_i = [x_{i1},\ldots , x_{iJ}]^\prime ,\ i = 1, \ldots , n$
, so that
$\mathbf {X} = [\mathbf {x}_{1}^\prime , \ldots , \mathbf {x}_{n}^\prime ]^\prime $
is observed; thus, the reduced log-likelihood is
(5)$$ \begin{align} L(\mathbf{B}, \mathbf{V}_{Q}, \boldsymbol{\Psi}; \mathbf{X}) = -\frac{n}{2}\left\{\log|\mathbf{B}\mathbf{V}_Q\mathbf{V}_Q^\prime\mathbf{B} + \boldsymbol{\Psi}| + \mathrm{tr}\left[(\mathbf{B}\mathbf{V}_Q\mathbf{V}_Q^\prime\mathbf{B} + \boldsymbol{\Psi})^{-1}\mathbf{R}\right]\right\}, \end{align} $$
where
$\mathbf {R}$
is the sample correlation matrix (equal to the sample variance–covariance matrix
$\mathbf {S}$
since the observed sample is standardized). It is worth noting that the maximization of
$L(\mathbf {B}, \mathbf {V}_{Q}, \boldsymbol{\Psi }; \mathbf {X})$
is equivalent to the minimization of the following discrepancy function with respect to
$\mathbf {B}, \mathbf {V}_Q$
and
$\boldsymbol{\Psi }$
(6)$$ \begin{align} D(\mathbf{B}, \mathbf{V}_Q, \boldsymbol{\Psi}; \mathbf{X}) = \log|\mathbf{B}\mathbf{V}_Q\mathbf{V}_Q^\prime\mathbf{B} + \boldsymbol{\Psi}| + \mathrm{tr}\left[(\mathbf{B}\mathbf{V}_Q\mathbf{V}_Q^\prime\mathbf{B} + \boldsymbol{\Psi})^{-1}\mathbf{R}\right] - \log|\mathbf{R}| - J, \end{align} $$
such that
(7)$$ \begin{align} v_{jq} \in \{0, 1\}, \quad j = 1, \ldots, J,\ q = 1, \ldots, Q, \end{align} $$
(8)$$ \begin{align} \sum_{q=1}^{Q} v_{jq} = 1, \quad j = 1, \ldots, J, \end{align} $$
(9)$$ \begin{align} \sum_{j=1}^{J} v_{jq} \geq 1, \quad q = 1, \ldots, Q, \end{align} $$
(10)$$ \begin{align} b_j b_l \geq 0, \quad j, l \in G_q,\ q = 1, \ldots, Q, \end{align} $$
where
$G_q$
indicates the group of MVs related to the qth factor. DFA involves a mixed binary (in
$\mathbf {V}_Q$
) and continuous (in
$\mathbf {B}$
and
$\boldsymbol{\Psi }$
) optimization problem; thus, a quasi-Newton method cannot be properly applied. Two coordinate descent algorithms (Zangwill, 1969) have been developed to solve it in Vichi (2017).
EFA and DFA models are scale invariant under linear transformation of the data matrix
$\mathbf {X}$
useful for data normalization or standardization. However, DFA, differently from EFA, has the additional property of being identified (up to a label switching); thus, no rotation is needed.
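To make the DFA parameterization concrete, the following minimal sketch (ours, in Python with numpy; the membership and loading values are hypothetical) builds the loading matrix of Eq. (3) and the implied correlation structure of Eq. (4):

```python
import numpy as np

# Hypothetical assignment of J = 5 MVs to Q = 2 factors, and loading sizes.
V_Q = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # binary, row stochastic
B = np.diag([0.9, 0.8, 0.7, 0.85, 0.6])                                # diagonal of B
A = B @ V_Q                                   # Eq. (3): one nonzero loading per row
Psi = np.eye(5) - np.diag(np.diag(A @ A.T))   # unique variances 1 - b_j^2
Sigma_X = A @ A.T + Psi                       # Eq. (4): implied correlation matrix
print(np.allclose(np.diag(Sigma_X), 1.0))     # True: MVs are standardized
```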
3 The UFA model
The UFA models hierarchical structures by providing an SF loading matrix capable of identifying the ultrametric correlation matrix that describes the set of supposed nested latent concepts. Therefore, let us formulate the mathematical form of the ultrametric correlation matrix that UFA can derive. First, suppose that Q factors are identified according to the DFA model in Eq. (4) with the following parsimonious formulation of the loading matrix (Parsimonious DFA, PDFA)
(11)$$ \begin{align} \mathbf{A} = \mathbf{V}_Q\mathbf{C}_Q, \end{align} $$
where
$\mathbf {C}_Q = \mathrm {diag}(c_1,\ldots ,c_Q)$
is a diagonal matrix of order Q. The elements of
$\mathbf {C}_Q$
represent equal levels of correlation between each LV and the corresponding MVs. Therefore, the correlation structure of PDFA is
(12)$$ \begin{align} \mathbf{R}_{\mathbf{X}} = \mathbf{V}_Q\mathbf{C}_Q\mathbf{C}_Q\mathbf{V}_Q^\prime + \boldsymbol{\Psi}, \end{align} $$
where the loadings of the MVs belonging to the same group, i.e., associated with the same LV, are equal. After a proper permutation of the rows of
$\mathbf {V}_Q$
so as to have all the ones in each column of
$\mathbf {V}_Q$
contiguous, the correlation structure in Eq. (12) has the following block diagonal form
(13)$$ \begin{align} \mathbf{R}_{\mathbf{X}} = \begin{bmatrix} \mathbf{R}_{\mathbf{X}_1} & \cdots & \mathbf{0} \\ \vdots & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{R}_{\mathbf{X}_Q} \end{bmatrix}. \end{align} $$
In (13), blocks represent the correlation matrices of the qth group of MVs related to the qth factor, for
$q = 1,\ldots ,Q$
, that is
(14)$$ \begin{align} \mathbf{R}_{\mathbf{X}_q} = c_q^2\,\mathbf{1}_{n_q}\mathbf{1}_{n_q}^\prime + \boldsymbol{\Psi}_q, \end{align} $$
where
$c_q^2$
is the correlation between the qth factor and the
$n_q$
associated MVs, and
$\boldsymbol{\Psi }_q$
is a diagonal matrix such that
$\mathrm {diag}(\boldsymbol{\Psi }_q) = [\psi _{1q}, \ldots , \psi _{n_qq}]^\prime = \mathbf {1}_{n_q}(1-c_q^2)$
It is worth noting that $\mathbf{R}_{\mathbf{X}}$ in Eq. (13), due to its block diagonal form, automatically satisfies the ultrametric conditions of Definition 1. However, for the non-orthogonal EFA, the correlation between factors is hypothesized to be different from zero, i.e.,
$\text {Cor}(\mathbf {y}) = \mathbf {R}_{\mathbf {y}}$
. To model the latter, a higher-order correlation structure is required, corresponding to additional “layers” that account for it. For the second-order correlation structure, we have
(15)$$ \begin{align} r_{jl} = \sum_{q=1}^{Q} v_{jq}\,c_q^2\,v_{lq} + v_{j(Q+1)}\,c_{Q+1}^2\,v_{l(Q+1)}, \quad j \neq l, \end{align} $$
where
$\mathbf {v}_{Q+1}$
—hereinafter, we denote
$\mathbf {v}_{q}$
as the qth column of
$\mathbf {V}_Q$
—identifies a group of variables which is obtained as the sum of two columns h and q of
$\mathbf {V}_Q$
, i.e.,
$\mathbf {v}_{Q+1}= \mathbf {v}_h+ \mathbf {v}_q$
, with h,
$q \in \{1,\ldots ,Q\}$
,
$h \neq q$
, while
$c_{Q+1}$
is the loading of the
$(Q+1)$
th factor corresponding to the square root of the correlation between factors h and q. The matrix formulation of Eq. (15) is
(16)$$ \begin{align} \mathbf{R}_{\mathbf{X}} = \mathbf{V}_Q\mathbf{C}_Q\mathbf{C}_Q\mathbf{V}_Q^\prime + c_{Q+1}^2\,\mathbf{v}_{Q+1}\mathbf{v}_{Q+1}^\prime + \boldsymbol{\Psi}, \end{align} $$
where $\boldsymbol{\Psi} = \mathbf{I}_J - \mathrm{diag}(\mathbf{V}_Q\mathbf{C}_Q\mathbf{C}_Q\mathbf{V}_Q^\prime + c_{Q+1}^2\,\mathbf{v}_{Q+1}\mathbf{v}_{Q+1}^\prime)$. Note that the matrix obtained as the sum of the first two terms in Eq. (16) is ultrametric, since if the hth and qth groups are aggregated it must hold that the correlation between them, $c_{Q+1}^2$, does not exceed the correlation within each of the two groups (see Section 2.1). For the additional orders of correlation structures from
$Q+2$
to
$2Q-1$
, it is supposed that the correlation not explained by the previous levels is partially explained by the successive one. Formally, the additional layers are
(17)$$ \begin{align} c_{Q+q}^2\,\mathbf{v}_{Q+q}\mathbf{v}_{Q+q}^\prime, \quad q = 2, \ldots, Q-1, \end{align} $$
where vectors
$\mathbf {v}_{Q+q},\ q = 1, \ldots , Q-1$
, represent the broader groups obtained by the aggregation in pairs of those from the previous levels.
The UFA model parameterizes a correlation matrix of order J as the sum of
$2Q-1$
different layers as follows:
(18)$$ \begin{align} \mathbf{R}_{\mathbf{X}} = \boldsymbol{\Psi} + \sum_{q=1}^{Q} c_q^2\,\mathbf{v}_q\mathbf{v}_q^\prime + \sum_{q=1}^{Q-1} c_{Q+q}^2\,\mathbf{v}_{Q+q}\mathbf{v}_{Q+q}^\prime, \end{align} $$
where the values
$c_q,\ q = 1,\ldots ,Q$
, in the diagonal of
$\mathbf {C}_Q = \mathrm {diag}(c_1,\ldots ,c_Q)$
represent the loadings between the qth factor and the MVs in the qth group of the initial partition in Q groups, while the values
$c_{Q+q}, q = 1,\ldots ,Q-1$
, are the loadings of the higher-order factors. It is easy to see that the right-hand side of Eq. (18), i.e., the second and third term, is, by construction, a
$(2Q-1)$
-ultrametric correlation matrix with values
(19)$$ \begin{align} r_{jl} = \sum_{q=1}^{2Q-1} v_{jq}\,c_q^2\,v_{lq}, \quad j \neq l \end{align} $$
such that
$\sum _{q=1}^{2Q-1} v_{jq}c_q^2v_{lq} \leq 1$
and
$0 < c_q^2 \leq 1$
. Additionally,
$\mathbf {R}_{\mathbf {X}}$
in Eq. (18) can be rewritten in a compact matrix form as follows:
(20)$$ \begin{align} \mathbf{R}_{\mathbf{X}} = \ddot{\mathbf{V}}\ddot{\mathbf{C}}\ddot{\mathbf{C}}\ddot{\mathbf{V}}^\prime + \boldsymbol{\Psi} = \ddot{\mathbf{A}}\ddot{\mathbf{A}}^\prime + \boldsymbol{\Psi}, \end{align} $$
where
(21)$$ \begin{align} \ddot{\mathbf{V}} = [\mathbf{V}_Q, \mathbf{v}_{Q+1}, \ldots, \mathbf{v}_{2Q-2}, \mathbf{1}_J] \end{align} $$
is the
$(J \times 2Q-1)$
binary matrix, with the first Q columns of
$\ddot {\mathbf {V}}$
reported in
$\mathbf {V}_Q$
which identify the partition of the J MVs in Q groups, while the following
$Q-1$
columns identify the aggregations of groups of MVs into partitions formed by
$Q-1,\ldots , 1$
, nested groups. The last column of
$\ddot {\mathbf {V}}$
corresponds to the total aggregation of the MVs into a single class. The UFA loadings in each of the
$2Q-1$
layers are stored into the matrix
$\ddot {\mathbf {C}}$
, which is given by
(22)$$ \begin{align} \ddot{\mathbf{C}} = \begin{bmatrix} \mathbf{C}_Q & \mathbf{0}_{Q \times (Q-1)} \\ \mathbf{0}_{(Q-1) \times Q} & \mathrm{diag}(c_{Q+1}, \ldots, c_{2Q-1}) \end{bmatrix}. \end{align} $$
It is worth noting that the loadings
$c_{Q+1}, \ldots , c_{2Q-1}$
produce cross-loadings w.r.t.
$\mathbf {V}_{Q}\mathbf {C}_{Q}$
, i.e., MVs that load on two or more higher-order LVs and that induce their correlation. However, while the number of parameters in a generic non-negative correlation matrix increases in the order of
$O(J^2)$
, for
$\mathbf {R}_{\mathbf {X}}$
in Eq. (18), the number of parameters increases only linearly with Q, as we will see in Section 5.
Remark 1. The elements
$r_{jl}$
of the correlation matrix
$\mathbf {R}_{\mathbf {X}}$
in Eq. (18) can be decomposed in two parts as follows
(23)$$ \begin{align} r_{jl} = \sum_{q=1}^{Q} v_{jq}\,c_q^2\,v_{lq} + \sum_{q=1}^{Q-1} v_{j(Q+q)}\,c_{Q+q}^2\,v_{l(Q+q)}, \quad j \neq l. \end{align} $$
In (23), the first sum is equal to zero if MVs j and l belong to two different groups of
$\mathbf {V}_Q$
. It is worth noting that, in the first sum, at most one of the Q terms is non-zero (i.e., the term corresponding to the case where j and l belong to the same group), while in the second sum, which has
$Q-1$
terms, more than one term may be non-zero.
Remark 2. When
$\mathbf {R}_{\mathbf {X}}$
in Eq. (18) has a block diagonal form as in Eq. (13), that is, the Q LVs are uncorrelated, all the correlation between MVs is due to the correlation among MVs within the Q groups, i.e., $r_{jl} = \sum_{q=1}^{Q} v_{jq}\,c_q^2\,v_{lq}$ for $j \neq l$. In this case, PDFA best explains the correlation in
$\mathbf {R}_{\mathbf {X}}$
, and therefore it can be considered a special case of UFA where only
$c_{1}, \ldots , c_{Q}$
are different from zero.
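The layer construction in Eq. (18) and the decomposition of Remark 1 can be illustrated numerically. A minimal sketch (ours, in Python with numpy; the loadings are hypothetical values for $Q = 2$):

```python
import numpy as np

# Hypothetical Q = 2 hierarchy: two first-order groups merged at the root.
V = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1],
              [0, 1, 1]], dtype=float)    # columns: v_1, v_2, v_3 = v_1 + v_2
c2 = np.array([0.60, 0.50, 0.55]) ** 2    # squared loadings c_1^2, c_2^2, c_3^2
layers = (V * c2) @ V.T                   # second and third term of Eq. (18)
Psi = np.eye(5) - np.diag(np.diag(layers))
R_X = layers + Psi                        # Eq. (18); unit diagonal by construction

print(round(R_X[0, 1], 4))  # same group: c_1^2 + c_3^2 = 0.6625
print(round(R_X[0, 2], 4))  # different groups: only the root layer, c_3^2 = 0.3025
```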
3.1 Model properties
In this section, we illustrate the fundamental properties of the UFA model, which consist of uniqueness, internal consistency, and unidimensionality. It is thus worth recalling that a group of MVs is internally consistent when the corresponding factor explains them coherently, where coherence means that given any triplet
$j, l, p$
of MVs, if j is positively correlated (concordant) with l and p, then also the correlation between l and p is positive. Moreover, a group of (at least two) MVs is unidimensional if the first largest eigenvalue of their correlation matrix is greater than
$1$
, and all the others are
$< 1$
. This means that there exists a unique factor that explains most of their total variance.
Property 1 (Uniqueness).
The loading matrix
$\ddot {\mathbf {C}}$
is unique, up to a permutation of columns of
$\mathbf {V}_Q$
.
Proof. This is a direct consequence of DFA, which has a unique solution for the loading matrix (Vichi, 2017, Property 2). Given
$\mathbf {V}_Q$
,
$\mathbf {R}_{W}$
, and
$\mathbf {R}_{B}$
are unique and therefore also the loading matrix
$\ddot {\mathbf {V}}\ddot {\mathbf {C}}$
. Any transformation of
$\ddot {\mathbf {C}}$
with an arbitrary orthogonal matrix
$\mathbf {T}$
, i.e.,
$\ddot {\mathbf {C}}^* = \ddot {\mathbf {C}} \mathbf {T}$
, produces a non-admissible loading matrix for UFA.
Property 2 (Reliability).
Each of the matrices in Eq. (14) defines a factor that is generally reliable.
Proof. The Cronbach's alpha of $\mathbf{R}_{\mathbf{X}_q}$ is $\alpha_q = \frac{n_q \bar{r}_q}{1 + (n_q - 1)\bar{r}_q}$, where $\bar{r}_q = c_q^2$ is the expected inter-item correlation within the qth group, for $q = 1, \ldots, Q$ (Brown, 1910; Spearman, 1910). It easily follows that if $c^2_q = 1$, then $\alpha_q = 1$.
An example of this property is as follows. Supposing that $\mathbf{R}_{\mathbf{X}_q}$ has size $n_q = 5$ and the loading is $c_q = 0.7$, then $\alpha_q = 0.83$, which corresponds to a good internal consistency (reliability, coherence). For $n_q = 5$, but this time $c_q = 0.6$, $\alpha_q = 0.74$, which is still an acceptable reliability.
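Under the model assumption that the expected inter-item correlation within the qth group is $c_q^2$, the two values above follow directly from the Spearman–Brown relation; a minimal check (ours, in Python):

```python
def cronbach_alpha(n_q, c_q):
    """Alpha of n_q items whose expected inter-item correlation is c_q**2."""
    r_bar = c_q ** 2
    return n_q * r_bar / (1 + (n_q - 1) * r_bar)

print(round(cronbach_alpha(5, 0.7), 2))  # 0.83: good internal consistency
print(round(cronbach_alpha(5, 0.6), 2))  # 0.74: still acceptable
```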
Property 3 (Unidimensionality).
The Q first-order factors identified by UFA are unidimensional. Indeed, the matrices $\mathbf{R}_{\mathbf{X}_q}$ in Eq. (14) have the first largest eigenvalue equal to $1 + (n_q - 1)c_q^2 > 1$, while all the remaining $(n_q-1)$ eigenvalues have value $1 - c_q^2$, which is smaller than $1$, since $0 < c_q^2 \leq 1$.
Proof. Denote with
$\mathbf {M}$
the
$(n_q \times n_q)$
matrix of unitary elements
$(\mathbf {1}_{n_q}\mathbf {1}_{n_q}^\prime )$
. We have that
$\mathbf {M}$
is such that
$\mathbf {MM} = n_q\mathbf {M}$
, hence
$\mathbf {M} = \frac {\mathbf {MM}}{n_q}$
. Let
$\lambda $
be an eigenvalue of
$\mathbf {M}$
and
$\mathbf {v}$
the corresponding eigenvector, then we have that
(24)$$ \begin{align} \lambda\mathbf{v} = \mathbf{M}\mathbf{v} = \frac{\mathbf{M}\mathbf{M}}{n_q}\mathbf{v} = \frac{\mathbf{M}\lambda\mathbf{v}}{n_q} = \frac{\lambda^2}{n_q}\mathbf{v}. \end{align} $$
Considering that
$\mathbf {v} \neq 0$
,
$\lambda - \frac {\lambda ^2}{n_q} = \lambda \left (1 - \frac {\lambda }{n_q}\right ) = 0$
. Therefore, either
$\lambda = 0$
or
$\left (1 - \frac {\lambda }{n_q}\right ) = 0$
, i.e.,
$\lambda = n_q$
. Since, by definition, the trace of a square matrix is the sum of its diagonal elements, we have that
$\text {tr}(\mathbf {M}) = n_q = \sum _{i=1}^{n_q} \lambda _i$
, hence it must hold that only one
$\lambda = n_q$
and all the other
$n_q - 1$
eigenvalues must be equal to
$0$
. Similarly, consider
$\boldsymbol{\Psi }_q = \mathrm {diag}\big (\mathbf {1}_{n_q}(1 - c^2_q)\big ) = (1 - c^2_q)\mathbf {I}_{n_q}$
, then
$\boldsymbol{\Psi }_q\boldsymbol{\Psi }_q = (1 - c^2_q)\boldsymbol{\Psi }_q$
and therefore
$\boldsymbol{\Psi }_q = \frac {\boldsymbol{\Psi }_q\boldsymbol{\Psi }_q}{(1-c^2_q)}$
. Since
$ \lambda \mathbf {v} = \boldsymbol{\Psi }_q\mathbf {v} = \frac {\boldsymbol{\Psi }_q\boldsymbol{\Psi }_q}{(1-c^2_q)}\mathbf {v} = \frac {\boldsymbol{\Psi }_q\lambda \mathbf {v}}{(1-c^2_q)} = \frac {\lambda ^2 \mathbf {v}}{(1-c^2_q)},$
we have that either
$\lambda = 0$
or
$\left (1 - \frac {\lambda }{(1-c^2_q)}\right ) = 0$
, i.e.,
$\lambda = 1 - c^2_q$
. Given that
$\text {tr}(\boldsymbol{\Psi }_q) = n_q(1 - c^2_q) = \sum _{i=1}^{n_q} \lambda _i,$
it must hold that
$\lambda _i = 1 - c^2_q$
for all
$i=1,\ldots ,n_q$
As a consequence, since $\mathbf{R}_{\mathbf{X}_q} = c_q^2\mathbf{M} + \boldsymbol{\Psi}_q$ and the two addends share the eigenvectors of $\mathbf{M}$, the matrices $\mathbf{R}_{\mathbf{X}_q}$,
$q = 1,\ldots ,Q$
, are each one associated with a unidimensional factor, as they have only the largest eigenvalue greater than
$1$
.
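The eigenstructure derived in the proof can be verified directly; a minimal sketch (ours, in Python with numpy) for a group with $n_q = 5$ and $c_q = 0.7$:

```python
import numpy as np

n_q, c_q = 5, 0.7
R_Xq = c_q**2 * np.ones((n_q, n_q)) + (1 - c_q**2) * np.eye(n_q)  # Eq. (14)
print(np.round(np.sort(np.linalg.eigvalsh(R_Xq))[::-1], 2))
# [2.96 0.51 0.51 0.51 0.51]: 1 + (n_q - 1)c_q^2 = 2.96, and 1 - c_q^2 = 0.51
```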
It is worth stressing that UFA’s structural constraints serve to limit the solution space and substantially reduce the likelihood of encountering improper solutions, such as negative variances or non-convergent estimates. Unlike traditional EFA, UFA imposes strict structural constraints on the factor loading matrix, ensuring a well-defined hierarchical and nested factor structure. These constraints effectively limit the solution space, preventing issues like Heywood cases or inflated factor loadings. As a result, UFA is less prone to the common estimation problems associated with unconstrained factor models, offering a more stable and reliable solution for analyzing data with hierarchical structures.
4 UFA algorithm and ML estimation
The ML estimation of UFA is performed by minimizing the following discrepancy function with respect to
$\ddot {\mathbf {C}}, \ddot {\mathbf {V}}$
, and
$\boldsymbol{\Psi }$
(25)$$ \begin{align} D(\ddot{\mathbf{C}}, \ddot{\mathbf{V}}, \boldsymbol{\Psi}; \mathbf{X}) = \log|\ddot{\mathbf{V}}\ddot{\mathbf{C}}\ddot{\mathbf{C}}\ddot{\mathbf{V}}^\prime + \boldsymbol{\Psi}| + \mathrm{tr}\left[(\ddot{\mathbf{V}}\ddot{\mathbf{C}}\ddot{\mathbf{C}}\ddot{\mathbf{V}}^\prime + \boldsymbol{\Psi})^{-1}\mathbf{R}\right] - \log|\mathbf{R}| - J, \end{align} $$
such that
(26)$$ \begin{align} \ddot{\mathbf{V}} = [\mathbf{V}_Q, \mathbf{v}_{Q+1}, \ldots, \mathbf{v}_{2Q-2}, \mathbf{1}_J], \quad v_{jq} \in \{0, 1\}, \end{align} $$
(27)$$ \begin{align} \mathbf{V}_Q\mathbf{1}_Q = \mathbf{1}_J, \end{align} $$
(28)$$ \begin{align} \mathbf{v}_{Q+q} = \mathbf{v}_h + \mathbf{v}_l\ \text{for some}\ h, l < Q+q,\ h \neq l, \quad q = 1, \ldots, Q-1, \end{align} $$
(29)$$ \begin{align} 0 < c_q^2 \leq 1, \quad q = 1, \ldots, 2Q-1, \end{align} $$
(30)$$ \begin{align} \sum_{q=1}^{2Q-1} v_{jq}\,c_q^2\,v_{lq} \leq 1, \quad j, l = 1, \ldots, J, \end{align} $$
Given Q, the estimation of the UFA parameters is performed via a cycling block coordinate descent algorithm composed of the following four steps, which are sequentially repeated until a stopping rule is satisfied.
-
Step 0 [Initialization] The matrix
$\hat {\mathbf {V}}_Q=[\hat {\mathbf {v}}_1,\ldots ,\hat {\mathbf {v}}_Q]$ is randomly generated from a multinomial distribution with equal probabilities
$1/Q$ and under the constraint that the Q groups are not empty. The initial values for
$\hat {\mathbf {C}}_Q$ are obtained taking the square root of
$\hat {\mathbf {R}}_W= [(\hat {\mathbf {V}}_Q^\prime \hat {\mathbf {V}}_Q)^2 - \hat {\mathbf {V}}_Q^\prime \hat {\mathbf {V}}_Q]^{+}\mathrm {diag}(\hat {\mathbf {V}}_Q^\prime (\mathbf {R}-\mathbf {I}_J)\hat {\mathbf {V}}_Q)$ , as in Cavicchia et al. (2020). Thus, matrices
$\hat {\ddot {\mathbf {V}}}$ and
$\hat {\ddot {\mathbf {C}}}$ are initialized as
$$ \begin{align*} \hat{\ddot{\mathbf{V}}} = [\hat{\mathbf{V}}_Q, \hat{\mathbf{v}}_{Q+1}, \ldots, \hat{\mathbf{v}}_{2Q-2}, \mathbf{1}_J],\ \text{where}\ \hat{\mathbf{v}}_{Q+1}, \ldots, \hat{\mathbf{v}}_{2Q-2}\ \text{are}\ (J \times 1)\ \text{vectors of zeros,}\\ \hat{\ddot{\mathbf{C}}} = [\hat{\mathbf{C}}_Q, \mathrm{diag}(\hat{c}_{Q+1}, \ldots, \hat{c}_{2Q-2}, \hat{c}_{2Q-1})],\ \text{where}\ \hat{c}_{Q+1}, \ldots, \hat{c}_{2Q-1}\ \text{are set equal to zero.} \end{align*} $$
The $(J \times 2Q-1)$ UFA loading matrix
$\hat {\ddot {\mathbf {A}}}$ is computed as
$\hat {\ddot {\mathbf {V}}}\hat {\ddot {\mathbf {C}}}$ . Matrix
$\hat {\boldsymbol{\Psi }}$ is obtained as:
(31)$$ \begin{align} \hat{\boldsymbol{\Psi}} = \mathbf{I}_J - \mathrm{diag}(\hat{\ddot{\mathbf{A}}}\hat{\ddot{\mathbf{A}}}^\prime). \end{align} $$
-
Step 1 [Update
$\ddot {\mathbf {V}}$ ]
Step 1.1 [Update
$\mathbf {V}_Q$ ] Matrix
$\mathbf {V}_Q = [\mathbf {v}_{1\cdot },\ldots ,\mathbf {v}_{j\cdot },\ldots ,\mathbf {v}_{J.}]^\prime $ , where
$\mathbf {v}_{j\cdot }$ denotes the generic row j of
$\mathbf {V}_Q$ , is updated row-by-row by assigning each MV j to the group q that most decreases the discrepancy in Eq. (25). Formally,
(32)$$ \begin{align} \begin{cases} \hat{v}_{jq} = 1 \quad \text{if} \quad q = \underset{{m = 1,\dots,Q}} {\mathrm{argmin}}\, D(\hat{\ddot{\mathbf{C}}}, [\mathbf{v}_{1\cdot},\ldots,\mathbf{i}_{m\cdot},\ldots,\mathbf{v}_{J\cdot}]^\prime, [\hat{\mathbf{v}}_{Q+1},\ldots,\hat{\mathbf{v}}_{2Q-2}, \mathbf{1}_J]^\prime, \boldsymbol{\hat{\Psi}}) \\ \hat{v}_{jq} = 0 \quad \text{otherwise} \end{cases}\!\!, \end{align} $$
where
$\mathbf {i}_{m.}$ is the mth row of the identity matrix of order Q.
Then, correlations are updated as in Cavicchia et al. (2020), i.e.,
(33)$$ \begin{align} & \hat{\mathbf{R}}_B = (\hat{\mathbf{V}}_Q^+ \mathbf{R}\hat{\mathbf{V}}_Q^{\prime +}) \odot (\mathbf{1}_Q\mathbf{1}_Q^\prime - \mathbf{I}_Q) \end{align} $$
(34)$$ \begin{align} & \hat{\mathbf{R}}_W= [(\hat{\mathbf{V}}_Q^\prime\hat{\mathbf{V}}_Q)^2 - \hat{\mathbf{V}}_Q^\prime\hat{\mathbf{V}}_Q]^{+}\mathrm{diag}(\hat{\mathbf{V}}_Q^\prime(\mathbf{R}-\mathbf{I}_J)\hat{\mathbf{V}}_Q) \end{align} $$
Step 1.2 [Update last
$Q-2$ columns of
$\ddot {\mathbf {V}}$ ] Given
$\hat {\mathbf {R}}_B$ , we update the
$(Q+q)$ th vector
$\mathbf {v}_{Q+q},\ q = 1, \ldots , Q-2$ , by aggregating the two vectors
$\mathbf {v}_q$ and
$\mathbf {v}_h$ corresponding to the two factors with maximum between correlation
,
$q,h = 1,\ldots ,Q, h\neq q$ . This is the agglomerative part of the algorithm that considers the correlations between groups. Note that, if after the qth iteration columns q and h are aggregated, the labels of
$\hat {\mathbf {R}}_B$ in positions q and h will be both replaced by
$Q+q$ . Thus, after each iteration, we move from Q groups of MVs to
$Q-1,\ldots ,1$ .
-
Step 2 [Update
$\ddot {\mathbf {C}}$ ]
Step 2.1 [Update
$\mathbf {C}_Q$ ] Given
$\hat {\ddot {\mathbf {V}}}$ and
$\hat {\mathbf {R}}_{W}$ , the first Q values of matrix
$\ddot {\mathbf {C}}$ are updated as
.
Step 2.2 [Update the last
$Q-1$ columns of
$\ddot {\mathbf {C}}$ ] Given
$\hat {\ddot {\mathbf {V}}}$ and
$\hat {\mathbf {R}}_{B}$ , the
$Q-1$ higher order levels are updated as
$\hat{c}^2_{Q+q} = \hat{r}^B_{hl} - \sum_{p \,:\, G_{Q+q} \subset G_{Q+p}} \hat{c}^2_{Q+p}$, i.e., the between-group correlation of the two groups aggregated at level $Q+q$ minus the part already explained by the more general levels containing them.
-
Step 3 [Update
$\boldsymbol{\Psi }$ ] Given
$\hat {\ddot {\mathbf {A}}} = \hat {\ddot {\mathbf {V}}}\hat {\ddot {\mathbf {C}}}$ , the discrepancy function
$D(\ddot {\mathbf {A}}, \boldsymbol{\Psi })$ in Eq. (25) is minimized with respect to
$\boldsymbol{\Psi }$ by
$\hat {\boldsymbol{\Psi }} = \mathbf {I}_J - \mathrm {diag}(\hat {\ddot {\mathbf {A}}}\hat {\ddot {\mathbf {A}}}^\prime )$ .
The last three steps are alternated repeatedly, and with each iteration, the discrepancy function decreases, or at least does not increase. If the decrease of the discrepancy is larger than an arbitrarily small positive constant, the algorithm continues to iterate; otherwise, it stops and it is considered to have converged to a solution which is at least a local minimum. To avoid the well-known sensitivity of the coordinate descent algorithms to the starting values and to increase the chance of finding the global minimum, the algorithm should be run several times starting from different initial estimates of
$\mathbf {V}_Q$
, retaining the best solution. The algorithm generally stops after a few iterations. In our simulation study reported in Section 6, this number is always less than
$15$
, and it can be observed that the algorithm accurately identifies the hierarchical structure in factors of MVs generated according to the UFA model.
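To fix ideas, the following simplified sketch (ours, in Python with numpy) mimics only the agglomerative part of the algorithm for a fixed first-order partition. It is not the authors' MATLAB implementation: Step 1.1 is skipped, plain group averages replace the updates in Eqs. (33) and (34), and the level loadings are obtained by subtracting the correlation explained by the more general levels, consistently with Remark 1:

```python
import numpy as np

def ufa_hierarchy(R, labels):
    """Greedily agglomerate Q first-order groups (labels in 0..Q-1, each with
    at least two MVs) by maximum average between-group correlation, then set
    the squared loading of each level to its correlation level minus the
    parent's (the root keeps its full between-group correlation)."""
    Q = labels.max() + 1
    groups = {q: list(np.flatnonzero(labels == q)) for q in range(Q)}
    level_r, parent = {}, {}
    for q in range(Q):                        # within-group correlation levels
        idx, n = groups[q], len(groups[q])
        level_r[q] = (R[np.ix_(idx, idx)].sum() - n) / (n * n - n)
    active = list(range(Q))
    for new in range(Q, 2 * Q - 1):           # Q-1 agglomerations up to the root
        pairs = [(R[np.ix_(groups[a], groups[b])].mean(), a, b)
                 for i, a in enumerate(active) for b in active[i + 1:]]
        r, a, b = max(pairs)
        groups[new], level_r[new] = groups[a] + groups[b], r
        parent[a] = parent[b] = new
        active = [g for g in active if g not in (a, b)] + [new]
    c2 = {g: level_r[g] - level_r.get(parent.get(g, -1), 0.0) for g in groups}
    return groups, c2
```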
Remark 3.
$\ddot {\mathbf {V}}$
defines a set of subgroups of MVs, described by their binary vectors, such that
$\mathbf {v}_l$
is the binary vector identifying the non-empty lth subgroup of
$V=\{1, 2, \ldots , J\}$
, and if
$\mathbf {v}_j, \mathbf {v}_l \in \ddot {\mathbf {V}} \Rightarrow (\mathbf {v}_j \cap \mathbf {v}_l) \in \{\mathbf {v}_j, \mathbf {v}_l,\, \varnothing \}$
. This structure is a J-tree formed by J leaves corresponding to the MVs
$\{j\}$
, with
$j \in V$
. Then, the first Q groups of MVs, corresponding to the first Q internal nodes of the tree, define a partition of V into Q disjoint groups. The successive
$Q-1$
steps specify the agglomeration of the Q groups in
$Q-1$
steps of group-fusions up to the root of the tree.
Remark 4. In general, there is no reason why the optimal estimation of UFA should include the PDFA solution at the Qth level. However, if one wants to improve the algorithm’s computational efficiency, the estimates in Steps 1.1 and 2.1 can be replaced by the PDFA solution at convergence. This avoids computing Eqs. (32), (33), and (34) at every iteration. Therefore, we define a new constrained algorithm for UFA that is forced to include the optimal solution of PDFA for the given Q.
The following example is provided to help the reader better follow the UFA modeling.
Example. We apply UFA to the set of eleven MVs in the example of Figure 1. In the correlation matrix in Figure 1a,
$Q = 4$
groups of highly correlated variables are clearly visible:
$G_1 = \{\text {V}_1, \text {V}_2, \text {V}_3\}, G_2 = \{\text {V}_4,\text {V}_5,\text {V}_6\},\ G_3 = \{\text {V}_7, \text {V}_8, \text {V}_9\},\ G_4 = \{\text {V}_{10},\text {V}_{11}\}$
. The specific Q first-order factors are characterized by loadings
$\hat {c}_1 = 0.45$
,
$\hat {c}_2 = 0.40$
,
$\hat {c}_3 = 0.39$
, and
$\hat {c}_4 = 0.30$
, as reported in Figure 2. The loadings are computed considering the within correlations in Figure 1a, as reported in Step 2.1 of the algorithm and detailed in Remark 1. The membership matrix for this partition is identified by the first four columns
$\hat {\mathbf {v}}_1, \hat {\mathbf {v}}_2, \hat {\mathbf {v}}_3$
, and
$\hat {\mathbf {v}}_4$
of matrix
$\hat {\ddot {\mathbf {V}}}$
The within and between correlations of groups/factors are given by
$$ \begin{align*} \hat{\mathbf{R}}_W = \mathrm{diag}(0.98, 0.94, 0.89, 0.83), \qquad \hat{\mathbf{R}}_B = \begin{bmatrix} 0 & 0.78 & 0.67 & 0.67 \\ 0.78 & 0 & 0.67 & 0.67 \\ 0.67 & 0.67 & 0 & 0.74 \\ 0.67 & 0.67 & 0.74 & 0 \end{bmatrix}. \end{align*} $$
Figure 2 Path diagram for the given example.
Let us now consider the hierarchical structure of factors—from the most specific ones (first-order) to the most general, through second- and third-order SFs—as shown in the higher layers of the path diagram in Figure 2. The second-order SF (2nd layer) agglomerates the first two groups of MVs since their correlation is the highest in
$\hat {\mathbf {R}}_B$
(equal to 0.78). Therefore,
$G_1 \cup G_2 = G_5 = \{\text {V}_1, \text {V}_2, \text {V}_3, \text {V}_4, \text {V}_5, \text {V}_6\}$
, where this aggregation is identified by the vector
$\hat {\mathbf {v}}_5^\prime = \hat {\mathbf {v}}_1^\prime + \hat {\mathbf {v}}_2^\prime = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]$
representing the fifth column of matrix
$\hat {\ddot {\mathbf {V}}}$
. The corresponding loading is
$\hat {c}_5 = 0.33$
. Having merged these two groups, the third-order SF (3-rd layer) lumps together groups
$G_3$
and
$G_4$
(the correlation equals
$0.74$
), i.e.,
$G_3 \cup G_4 = G_6 = \{\text {V}_7, \text {V}_{8}, \text {V}_9, \text {V}_{10}, \text {V}_{11}\}$
, obtaining the additional vector
$\hat {\mathbf {v}}_6^\prime = \hat {\mathbf {v}}_3^\prime + \hat {\mathbf {v}}_4^\prime = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]$
and
$\hat {c}_6 = 0.26$
. The last aggregation corresponds to
$G_5 \cup G_6 = G_7 = \{\text {V}_1,\text {V}_2,\text {V}_3,\text {V}_4,\text {V}_5,\text {V}_6,\text {V}_7,\text {V}_8,\text {V}_9,\text {V}_{10},\text {V}_{11}\}$
, where
$\hat {\mathbf {v}}_7$
is the unitary vector of dimension eleven and
$\hat {c}_7 = 0.82$
. All higher-order loadings can be found by considering the between correlations in
$\hat {\mathbf {R}}_B$
, as reported in Step 2.2. As a result, the UFA membership matrix and the corresponding loading matrix are given by
$$ \begin{align*} \hat{\ddot{\mathbf{V}}} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}, \qquad \hat{\ddot{\mathbf{C}}} = \mathrm{diag}(0.45, 0.40, 0.39, 0.30, 0.33, 0.26, 0.82), \end{align*} $$
where the first Q columns of
$\hat {\ddot {\mathbf {V}}}$
correspond to
$\hat {\mathbf {V}}_Q$
. Finally,
$\hat {\boldsymbol{\Psi }} = \mathrm {diag}([0.02, 0.02, 0.02, 0.06, 0.06, 0.06, 0.11, 0.11, 0.11, 0.17, 0.17])$
.
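The estimates reported in this example can be checked with a few lines of code. The following sketch (ours, in Python with numpy) rebuilds $\mathbf{R}_{\mathbf{X}}$ from $\hat{\ddot{\mathbf{V}}}$ and $\hat{\ddot{\mathbf{C}}}$ via Eq. (20) and recovers the $\hat{\boldsymbol{\Psi}}$ values and between-group correlations given above:

```python
import numpy as np

members = [range(0, 3), range(3, 6), range(6, 9), range(9, 11),  # G1-G4
           range(0, 6), range(6, 11), range(0, 11)]              # G5, G6, G7
Vdd = np.zeros((11, 7))
for col, rows in enumerate(members):
    Vdd[list(rows), col] = 1.0
c = np.array([0.45, 0.40, 0.39, 0.30, 0.33, 0.26, 0.82])         # diagonal of C

layers = (Vdd * c ** 2) @ Vdd.T             # sum of the 2Q-1 rank-one layers
Psi = np.eye(11) - np.diag(np.diag(layers))
R_X = layers + Psi                          # Eq. (20)

print(np.round(np.diag(Psi), 2))  # 0.02 (x3), 0.06 (x3), 0.11 (x3), 0.17 (x2)
print(round(R_X[0, 3], 2))        # between G1 and G2: c5^2 + c7^2 = 0.78
print(round(R_X[0, 6], 2))        # across G5 and G6: c7^2 = 0.67
```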
5 Model selection and choice of Q
In this section, we discuss the choice of the optimal UFA model among a set of candidates that consider different values of Q. The optimal model is the one that strikes the best balance between complexity and the ability to accurately fit the observed data. Through this careful selection, we determine the appropriate number of factors Q to retain, ensuring both efficiency and effectiveness in our analysis. To assess the complexity of the model, we start by noticing that the population correlation matrix has
$\frac {J(J-1)}{2}$
parameters to be estimated (the correlations on the diagonal are always equal to
$1$
) in terms of
$J(2Q - 1) + J$
unknown parameters in
$\ddot {\mathbf {A}}$
and
$\boldsymbol{\Psi }$
. However, note that the first Q columns of
$\ddot {\mathbf {A}}$
have only one loading different from
$0$
for each row. Thus, we have
$J(Q-1)$
constraints in
$\ddot {\mathbf {A}}$
. Moreover, we have other Q constraints on
$\ddot {\mathbf {A}}$
to ensure that each of the first Q columns of
$\ddot {\mathbf {A}}$
is not empty. Finally, we suppose that all MVs loading on the same factor have the same coefficient
$(\ddot {\mathbf {A}} = \ddot {\mathbf {V}}\ddot {\mathbf {C}})$
, therefore we do not have to estimate J coefficients but only Q; this corresponds to have
$J-Q$
additional constraints on
$\ddot {\mathbf {A}}$
. This latter condition also holds for the last
$Q-1$
columns of
$\ddot {\mathbf {A}}$
: we do not have to estimate
$J(Q-1)$
coefficients but only
$Q-1$
and thus we have additional
$J(Q-1) - (Q-1)$
constraints on
$\ddot {\mathbf {A}}$
. Since we have the same coefficient for all MVs loading onto the same factor, also for the diagonal of
$\boldsymbol{\Psi }$
we do not have to estimate J parameters but Q, and therefore there are
$J - Q$
constraints on
$\boldsymbol{\Psi }$
. Hence, the effective number of unknown parameters in UFA is
$p_{UFA}=J(2Q-1)+J-J(Q-1)-(J-Q)-Q-[J(Q-1)-(Q-1)]-(J-Q)=2Q-1$
. Note that, due to the factorization of
$\ddot {\mathbf {A}}$
,
$p_{UFA}$
does not depend on J. The degrees of freedom of the model are therefore
$df=\frac {J(J-1)}{2}-2Q+1$
.
Two popular model selection criteria are the Akaike Information Criterion (AIC; Akaike, 1998) and the Bayesian Information Criterion (BIC; Schwarz, 1978), which can be easily estimated for the UFA model, given Eq. (25) and
$p_{UFA}$
. Among a set of candidate models (i.e., different values of Q), the preferred one is that for which AIC and BIC are minimal. Alternatively, a goodness of fit index frequently used to address this task is the Adjusted Goodness of Fit Index (AGFI; Mulaik et al., 1989), which is computed as follows:
(35)$$ \begin{align} \text{AGFI} = 1 - \frac{J(J+1)}{2\,df}(1 - \text{GFI}), \end{align} $$
where GFI is the Goodness of Fit Index (GFI; Jöreskog, 1981) obtained as
$\text {GFI} = 1 - \frac {\text {tr}\left [(\mathbf {R}_{\mathbf {X}}^{-1}\mathbf {R}-\mathbf {I}_{J})^2\right ]}{\text {tr}\left [(\mathbf {R}_{\mathbf {X}}^{-1}\mathbf {R})^2\right ]}$
. This selection criterion chooses Q such that AGFI is maximized.
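Both indices are straightforward to compute once $\mathbf{R}_{\mathbf{X}}$ has been estimated. A minimal sketch (ours, in Python with numpy), using the GFI expression above and the degrees of freedom derived earlier:

```python
import numpy as np

def gfi_agfi(R, R_X, Q):
    """GFI (Joreskog, 1981) and AGFI for an estimated UFA matrix R_X,
    with df = J(J-1)/2 - (2Q - 1) as derived in this section."""
    J = R.shape[0]
    M = np.linalg.solve(R_X, R)                 # R_X^{-1} R
    E = M - np.eye(J)
    gfi = 1 - np.trace(E @ E) / np.trace(M @ M)
    df = J * (J - 1) / 2 - 2 * Q + 1
    agfi = 1 - (J * (J + 1) / (2 * df)) * (1 - gfi)
    return gfi, agfi
```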
It is worth stressing that, although these criteria are well recognized and commonly used, AIC and BIC are more helpful for model comparison (i.e., selecting the optimal Q), while AGFI is more appropriate for evaluating model fit once the number of first-order factors Q has been chosen. Furthermore, in practice, combining multiple methods, both statistical and conceptual, typically leads to the most robust factor solution.
Remark 5. Note that when performing non-parsimonious DFA, the number of parameters to be estimated is
$p_{DFA}=2J-Q$
(Vichi, 2017), whereas for PDFA, this number is
$p_{PDFA}=2Q-Q=Q$
. It can be immediately seen that
$p_{PDFA}$
corresponds to UFA without considering the
$Q-1$
parameters of the factorial hierarchy. In fact,
$p_{PDFA}= p_{UFA}-(Q-1)=2Q-1-(Q-1)=Q$
.
6 Simulation study
The proposed model is evaluated through an extended simulation study that considers twelve different scenarios and involves 3,000 data samples, generated according to the following procedure. First, the matrix
$\mathbf {V}_Q$
is randomly generated. Then, a random binary J-tree is generated starting from
$\mathbf {V}_Q$
and including
$Q-1$
random agglomerations of subsets (couples). The matrix
$\ddot {\mathbf {V}}$
is identified by this J-tree. As a second step, a diagonal matrix
$\ddot {\mathbf {C}}$
with non-increasing values in
$[0,1]$
is generated, and the loading matrix
$\ddot {\mathbf {A}}$
is obtained as
$\ddot {\mathbf {A}} = \ddot {\mathbf {V}}\ddot {\mathbf {C}}$
. The matrix
$\boldsymbol{\Psi }$
is then computed accordingly using Eq. (31). The correlation matrix is obtained as
$\mathbf {R}_{\mathbf {X}} = \ddot {\mathbf {A}}\ddot {\mathbf {A}}^\prime + \boldsymbol{\Psi }$
, and it is
$(2Q-1)$
-ultrametric by construction. Finally, the observed correlation matrix
$\mathbf {R}$
is obtained adding to
$\mathbf {R}_{\mathbf {X}}$
a uniform error
$\mathbf {E}_{\mathbf {R}}$
in
$[0, \sigma _e]$
, which is symmetrized. The psd of
$\mathbf {R}$
has been verified: if this property is not satisfied, we add the absolute value of the smallest eigenvalue of
$\mathbf {R}$
to its main diagonal, so that the resulting matrix is positive semi-definite (Cailliez, 1983).
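A compact sketch of this generating procedure (ours, in Python with numpy) is reported below; scaling the squared loadings by $1/(2Q-1)$ so that $\boldsymbol{\Psi}$ stays non-negative is our assumption, since the procedure only requires non-increasing values in $[0,1]$:

```python
import numpy as np

def generate_ufa_sample(J, Q, sigma_e, rng):
    """One simulated correlation matrix following the procedure of Section 6."""
    labels = np.concatenate([np.arange(Q), rng.integers(0, Q, J - Q)])
    rng.shuffle(labels)                                   # non-empty partition
    cols = [(labels == q).astype(float) for q in range(Q)]
    active = list(range(Q))
    for _ in range(Q - 1):                                # random binary J-tree
        i, k = rng.choice(len(active), size=2, replace=False)
        a, b = active[i], active[k]
        cols.append(cols[a] + cols[b])
        active = [g for g in active if g not in (a, b)] + [len(cols) - 1]
    Vdd = np.column_stack(cols)                           # J x (2Q - 1)
    c2 = np.sort(rng.uniform(0, 1 / (2 * Q - 1), 2 * Q - 1))[::-1]  # non-increasing
    layers = (Vdd * c2) @ Vdd.T
    Psi = np.eye(J) - np.diag(np.diag(layers))
    R_X = layers + Psi                                    # (2Q-1)-ultrametric
    E = rng.uniform(0, sigma_e, (J, J))
    R = R_X + (E + E.T) / 2                               # symmetrized uniform error
    np.fill_diagonal(R, 1.0)
    lam = np.linalg.eigvalsh(R).min()
    return R + abs(lam) * np.eye(J) if lam < 0 else R     # psd fix (Cailliez, 1983)

R = generate_ufa_sample(J=30, Q=4, sigma_e=0.5, rng=np.random.default_rng(1))
```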
Twelve different scenarios have been taken into account to test the proposed model. We considered
$J = 30, 70$
MVs,
$Q = 4, 7$
factors and three error levels
$\sigma _e=0.01$
,
$\sigma _e=0.5$
, and
$\sigma _e=1$
. In Figure 3, examples of simulated correlation matrices for each scenario are illustrated. The groups and their hierarchical structure are clearly visible in the first row of Figure 3. In the four plots of the second row, corresponding to the medium error level, the hierarchical structure of the Q first-order factors is still quite visible while the one for higher-order aggregations tends to disappear. Finally, for the high error case (third row of Figure 3), the whole hierarchical structure among both MVs and LVs almost disappears.

Figure 3 Examples of simulated correlation matrices with different levels of error, J and Q.
6.1 Random starts assessment
UFA does not guarantee the identification of the globally optimal solution, which is expected since the partitioning problem is known to be NP-hard (Křivánek & Morávek, 1986). To increase the chance of finding the global optimum, the multistart technique is used; in this subsection, we discuss how the number of random starts is chosen. To address this, the UFA algorithm is run
$250$
times with a high error level and
$J = 70, Q = 7$
, using different numbers of random starts:
$[1, 5, 10, 20, 30, 50, 100]$
. For each case, the percentage of local minima is recorded.
From the results in Table 1, the number of random starts for the simulation study is set equal to
$50$
.
Table 1 Local minima occurrences (%)

6.2 Model performance and comparison with competing procedures
The proposed model is evaluated according to the Adjusted Rand Index (ARI; Hubert & Arabie, 1985), which compares the generated J-tree with the estimated one. First, the ARI between the true and estimated matrices
$\mathbf {V}_Q$
for the first Q aggregations is computed. Then, an ARI is computed for each of the remaining
$Q-1$
aggregation levels. For each
$q = 1,\ldots ,Q-1$
, the ARI is computed between the two factors aggregated by the algorithm at level
$Q+q$
and the two generated ones. As a result, the generated J-tree is optimally recovered if the overall ARI equals Q, since it is obtained as the sum of one ARI for the first Q levels and
$Q-1$
ARIs for the factors’ hierarchy. Finally, this value is rescaled, dividing it by Q, to range in
$[0, 1]$
. We will denote with ARI
$_Q$
the ARI referring to the first Q levels and with ARI
$_J$
the overall ARI for the J-tree. Furthermore, the fit of the
$(2Q-1)$
-ultrametric correlation matrix resulting from UFA to the generated
$\mathbf {R}$
is evaluated through the AGFI. The discrepancy between the generated loading matrix, denoted with
$\mathbf {A}^{*} = \mathbf {\ddot {\mathbf {V}}}^{*}\mathbf {\ddot {\mathbf {C}}}^{*}$
and the estimated one
$\hat {\mathbf {A}} = \hat {\ddot {\mathbf {V}}}\hat {\ddot {\mathbf {C}}}$
is evaluated considering the Root Mean Squared Error (RMSE) defined as
$\frac {\|\hat {\mathbf {A}}\hat {\mathbf {A}}^\prime -\mathbf {{A}}^{*}\mathbf {{A}}^{*\prime }\|^2}{JQ}$
since the loading matrix is not unique.
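Both measures are easy to compute with standard tools. A minimal sketch (ours, in Python; the partition part uses scikit-learn's adjusted_rand_score):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def loading_rmse(A_hat, A_star, Q):
    """||A_hat A_hat' - A* A*'||^2 / (JQ): the reconstructed products are
    compared because the loading matrix itself is not unique."""
    J = A_star.shape[0]
    D = A_hat @ A_hat.T - A_star @ A_star.T
    return np.linalg.norm(D) ** 2 / (J * Q)

# ARI_Q between generated and estimated first-order partitions of the MVs:
# ari_q = adjusted_rand_score(true_labels, estimated_labels)
```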
UFA performance is compared with that of several well-established factor rotation methods, such as geomin (geo), promax (pro), and quartimax (qua), as well as with the prenet penalization method for factor analysis proposed by Hirose & Terada (2023). Geomin, introduced by Yates (1987), minimizes the complexity of factor loadings by reducing the number of near-zero values. Promax, a widely used oblique rotation method proposed by Hendrickson & White (1964), allows factors to correlate, thereby enhancing the interpretability of the factor solution. Quartimax, introduced by Neuhaus & Wrigley (1954), seeks to simplify the rows of the factor loading matrix by maximizing the variance of squared loadings across variables. The Sparse and Simple Structure (SSS) estimation method developed by Hirose & Terada (2023) employs the prenet penalization to achieve a factor solution that is both sparse and interpretable. This estimation requires tuning two parameters, denoted by
$\rho $
and
$\gamma $
, and has been carried out to maximize AGFI. By comparing these techniques with UFA, which offers an innovative approach to factor analysis, we aim to highlight its strengths and potential advantages in achieving more interpretable and accurate factor solutions when data exhibit a hierarchical structure.
Simulation results, obtained considering 50 random starts for all cases, are shown in Table 2. For each scenario,
$nsim = 250$
data matrices have been generated and the reported performance measures correspond to the average over the
$250$
samples.
Table 2 Simulation study results

Under the low error condition, our method demonstrates the best performance across all evaluated metrics. Specifically, the generated J-tree is always perfectly identified for
$Q = 4$
, while the percentage of times this happens is around
$86\%$
for
$Q = 7$
(see Table 2). Moreover, the AGFI is always greater than or equal to
$0.99$
and the RMSE for the loadings reaches 0. In contrast, the competing procedures perform poorly in recovering the structure of the variables in Q groups: the ARI
$_Q$
always ranges between 0.65 and 0.78 and it is never equal to
$1$
. Additionally, the high values for the RMSE of the loading matrix indicate a sub-optimal recovery of the generated loadings across all scenarios. This first part of the simulation study indicates that the proposed methodology and the given algorithm do not fail to identify the correct generated ultrametric correlation matrix when its ultrametricity is clearly visible, as shown in the first row plots of Figure 3. In comparison, the competing procedures fail to achieve an accurate identification under the same conditions.
For the medium error level scenarios, the UFA algorithm achieves the best performance in terms of ARI
$_Q$
, while the recovery of the higher-order hierarchies starts worsening. Indeed, in the plots of Figure 3 corresponding to the medium error level, the structure in Q factors is still visible while the higher-order correlations start deviating from ultrametricity. For instance, for
$J = 70$
and
$Q = 7$
, the
$\%$
ARI
$_Q$
= 1 and the % ARI
$_J = 1$
are
$100.00$
and
$38.40$
, respectively. The model-data fit in terms of RMSE and AGFI also worsens, but it remains better than that of the competing procedures.
Under the high error condition, as expected, the UFA performance in terms of ARI
$_J$
also worsens, because the ultrametricity tends to be masked by the error, as can be seen in the last row of Figure 3. The true J-tree is never identified for
$J = 30$
and
$Q = 7$
(under the same scenario, the % ARI
$_Q = 1$
is
$3.20$
). However, also under these scenarios, UFA outperforms the competing procedures, especially in terms of ARI
$_Q$
and RMSE.
As illustrated in Figure 4, UFA shows the lowest RMSE across all scenarios, with its interquartile range being very tight, indicating both a low error and consistent performance across simulations. In contrast, the competing methods exhibit higher median RMSE values and wider interquartile ranges, indicating poorer performance and higher variability. These results underscore the efficacy and accuracy of UFA in capturing hierarchical relationships in data, making it a reliable choice for this kind of scenario compared to traditional methods.

Figure 4 RMSE of factor loadings for UFA and the competing procedures.
To summarize, the simulation results collectively suggest that the proposed algorithm not only achieves high accuracy in identifying the hierarchical structure of the generated data and finds its best ultrametric reconstruction, but also exhibits stability and reliability in its performance. The effectiveness of the algorithm against the benchmark criteria and competing procedures suggests its potential utility in applications in which phenomena that can be modeled by hierarchical latent structures are studied.
7 Application
The proposed model is applied to the data employed by Cattell (1971) for his factorial study on intelligence, in which the author investigates his theory of crystallized and fluid intelligence factors. Crystallized intelligence is the ability to use learned knowledge and experience: it involves understanding facts, solving problems based on past experiences, and possessing a rich vocabulary, and it tends to improve with age. On the other hand, fluid intelligence represents the ability to solve novel problems, independent of any knowledge from the past. It includes skills like pattern recognition, abstract reasoning, and problem-solving in new situations. Fluid intelligence is thought to peak in early adulthood and then gradually decline. Cattell's study used data from 277 8th grade children, focusing on their abilities across different dimensions. These dimensions were measured through various tests:
-
• Thurstone Primaries: four areas (Verbal, Spatial, Reasoning, Number) were each measured by two different tests, resulting in eight measures (V1, V2, S1, S2, R1, R2, N1, N2). These areas represent different cognitive abilities as identified by Thurstone (1938), another key figure in the study of human intelligence.
-
• Culture Fair Intelligence Tests: developed by the Institute of Personality and Ability Testing (IPAT), these tests aim to measure intelligence in a way that minimizes cultural and educational biases. The tests include Series, Classification, Matrices, and Topology, each providing a score (IS, IC, IM, IT).
The non-negative correlations among the
$12$
considered MVs are shown in Figure 5a. These high correlations suggest the need to explore how the various aspects of intelligence interrelate and potentially validate or refine Cattell’s theory of crystallized and fluid intelligence.

Figure 5 Observed and estimated correlation matrices of Cattell’s data.
The UFA algorithm has been run considering
$50$
random starts, and it has found the expected partition into five first-order factors (Verbal, Spatial, Reasoning, Number, and IPAT). It is worth noting that
$Q = 5$
is the best choice for the number of factors also according to the AGFI, AIC, and BIC. The unidimensionality of the identified latent dimensions is ensured by the variance explained by the second factor, equal to
$0.14$
,
$0.21$
,
$0.23$
,
$0.22$
, and
$0.64$
, respectively. The Cronbach’s alpha values computed for each factor align with Cattell’s hypothesis: they are greater than
$0.95$
for F1, F2, F3, F4, while equal to
$0.46$
for F5. As can be observed by comparing the two matrices in Figure 5, the proposed model reconstructs the correlation matrix of the MVs well, simultaneously identifying its hierarchical structure. The latter is depicted in Figure 6a: it can be noticed that the second-to-last aggregation (F5–F8) characterizes the fluid ability and crystallized ability factors hypothesized by Cattell.

Figure 6 Hierarchical tree and loading structure estimated by UFA on Cattell’s data.
Moreover, the last factor aggregated to build F8 is the one related to Spatial ability, for which Cattell hypothesized the strongest relationship with the IPAT MVs (F5). This is confirmed by the lowest Cronbach's alpha value for F8 (
$0.37$
) compared to the values for F6 and F7 (
$0.77$
and
$0.75$
, respectively); indeed, Cattell hypothesized a high correlation between Verbal, Reasoning, and Number. A general concept defining Mental Abilities is also identified (F9); its reliability is ensured by a Cronbach's alpha value equal to
$0.85$
. The estimated loadings are graphically shown in Figure 6b.
Note that the estimated matrix represented in Figure 5b is ultrametric even if the within deviance of the IPAT group (
$0.353$
) is higher than, for instance, the between deviance of the Verbal and Spatial groups (
$0.415$
). In fact, the ultrametric condition defined in Section 2.1 imposes that the within deviance of the IPAT group must be greater than or equal to the deviance between IPAT and the other four groups (i.e.,
$0.353 \geq 0.275$
) and not greater than or equal to any of the between deviances in the correlation matrix.
The UFA results on Cattell's data represent a substantial contribution to the state of the art in Cattell's factorial study on intelligence. UFA takes a step forward with respect to both the first-order and second-order CFA that Rindskopf & Rose (1988) fitted to these data. It recognizes the five factors despite the fully exploratory approach, and it also accounts for the intercorrelations among them, going beyond the construction of just two second-order factors (with a correlation of 0.78). The UFA model addresses the unexplained correlation among factors by building a hierarchy of specific LVs. The UFA model has been compared with the two CFAs in terms of AIC and BIC (Table 3). As can be observed, the lowest values correspond to UFA, reflecting the better modeling of the underlying factor correlations that characterize Cattell's data. The goodness of fit of the estimated UFA model is further confirmed by an AGFI equal to
$0.95$
.
Table 3 AIC and BIC values for UFA model and for the two CFAs applied to Cattell data

The results obtained in this application suggest that UFA can be the proper choice in all psychometric applications in which the researcher is interested in evaluating a hypothesized hierarchy of latent concepts and seeks to define and characterize broader dimensions of the phenomenon under study.
8 Conclusions
In the realm of psychometric studies, hierarchical concepts are pivotal for uncovering and comprehending fundamental phenomena, such as intelligence. In the literature, hierarchical models based on sequential procedures of EFA followed by oblique rotations have been proposed to arrange various facets or subcomponents of a construct into a hierarchical structure, showcasing their interconnectedness and offering an in-depth comprehension of complex phenomena. Additional models widely employed in psychometric studies to assess the compatibility of theoretical hierarchical factorial structures with observed data are CFA and SEM. These statistical tools help confirm that the used models accurately reflect underlying constructs, offering a reliable framework for analyzing intricate data. These models are widely used across various psychometric domains, with the Five-Factor Model of personality serving as just one example. This model categorizes personality traits into five overarching factors, each subdivided into lower-level facets that depict specific aspects of these traits.
In this article, we introduce a novel methodology called UFA. This approach expands upon the traditional EFA model with the aim of identifying hierarchical structures within factors that represent latent concepts through an exploratory and simultaneous approach, as opposed to confirmatory or sequential procedures based on EFA and oblique rotations. UFA proves particularly valuable in studying complex phenomena that exhibit multiple levels of abstraction and interconnected dimensions. The proposal can be fully exploratory, meaning that no prior assumptions are made regarding the relationships among MVs, first-order factors, and higher-order factors. However, it also offers the flexibility to fix some or all of these relationships in a partially or fully confirmatory approach. UFA involves a mathematical formalization of the hierarchical relationships among variables, which enables a synthesized representation of latent concepts associated with complex and multidimensional phenomena. By extending EFA to hierarchical structures, this methodology empowers researchers to delve into the underlying structure of complex phenomena with several facets by uncovering factors and their hierarchical relationships in a simultaneous approach.
Furthermore, the new methodology addresses several crucial properties essential for constructing a hierarchy of factors. These properties include internal consistency, reliability, unidimensionality, uniqueness, and content validity. Ensuring that these properties are met guarantees that the constructed hierarchical model accurately represents the underlying constructs and provides a comprehensive framework for analyzing complex data. Simulation study results showcased the superiority of the UFA method over traditional factor analysis techniques, especially when dealing with data that have a nested or hierarchical factor structure. While conventional methods often fail to effectively capture the complexity of such data, leading to sub-optimal factor solutions, UFA excels in identifying and modeling these intricate relationships, resulting in more accurate and interpretable factor structures. In summary, the UFA methodology presents a novel approach to investigating hierarchical concepts in psychometric studies and other disciplines where latent factors need to be constructed. This methodology offers insights into the multidimensional nature of latent constructs and facilitates a comprehensive understanding of complex phenomena. It enhances our ability to analyze data and extract meaningful information from hierarchical structures, contributing to advancements in various research fields.
Further developments of UFA include relaxing the non-negativity assumption on the correlation matrix, which, while realistic in many applications (especially in psychometric studies), could be too restrictive in other fields. Additionally, when the phenomenon under study does not show a hierarchical structure, the ultrametricity assumption could also be restrictive, and a hypothesis test to determine how far the data are from this condition could be useful for assessing the applicability of the proposal.
Data availability statement
The MATLAB source code of UFA is openly available at https://github.com/AuthorsSubmission/UFA.
Funding statement
The authors did not receive support from any organization for the submitted work.
Competing interests
The authors declare none.