In the XML standard, data are represented as unranked labeled
ordered trees. Regular unranked tree automata provide a useful
formalism for the validation of schemas enforcing regular structural
constraints on XML documents. However some concrete application
contexts need the expression of more general constraints than the
regular ones. In this paper we propose a new framework in which
context-free style structural constraints can be expressed and
validated. This framework is characterized by: (i) the introduction
of a new notion of trees, the so-called typed unranked labeled
trees (tulab trees for short) in which each node receives
one of three possible types (up, down or fix), and (ii) the
definition of a new notion of tree automata, the so-called
nested sibling tulab tree automata, able to enforce
context-free style structural constraints on tulab tree languages.
During their structural control process, such automata are using
visibly pushdown languages of words
[R. Alur and P. Madhusudan,
Visibly pushdown languages,
36th ACM symposium on Theory of
Computing, Chicago, USA (2004) 202–211]
on their
alphabet of states. We show that the resulting class NSTL of
tulab tree languages recognized by nested sibling tulab tree
automata is robust, i.e. closed under Boolean operations and with
decision procedures for the classical membership, emptiness and
inclusion problems. We then give three characterizations of NSTL:
a logical characterization by defining an adequate logic in which
NSTL happens to coincide with the models of monadic second order
sentences; the two other characterizations are using adequate
encodings and map together languages of NSTL with some regular
sets of 3-ary trees or with particular sets of binary trees.