1 Introduction
1.1. What predicativism, and why?
In [Reference Shapiro44] the basic historic problem of the research in foundations of mathematics (FOM) is formulated as follows:
How to reconstruct mathematics on a secure basis, one maximally immune to rational doubts.
The predicativist program [Reference Crosilla, Jaeger and Sieg11, Reference Crosilla, Mainzer, Schuster and Schwichtenberg12, Reference Feferman and Shapiro24, Reference Weaver49, Reference Weaver51] has been one of the attempts to solve this basic problem of FOM. It seeks to establish certainty in mathematics without revolutionizing it or changing its underlying classical logic (as the intuitionistic program does). The program was initiated by Poincaré [Reference Poincaré36–Reference Poincaré39]. Its viability was demonstrated by Weyl, who seriously developed it for the first time in his famous small book “Das Kontinuum” [Reference Weyl52, Reference Weyl, Pollard and Bole54]. Weyl, and then Feferman [Reference Feferman22, Reference Feferman25], have shown that a very large part of classical analysis can be developed within their predicative systems. Feferman further argued that predicative mathematics in fact suffices for developing all the mathematics that is actually indispensable to present-day natural sciences. Hence the predicativist program has been successful in solving the basic problem of FOM. (In my opinion it is the only one about which this can truly be said.)
Poincaré’s predicativism started as a reaction to the set-theoretical paradoxes. However, in the writings of both Poincaré and Weyl, predicativity derives not so much from the need to avoid paradoxes, but from their definitionist view that infinite objects, such as sets or functions, exist only in so far as they are introduced through legitimate definitions:
“No one can describe an infinite set other than by indicating properties which are characteristic of the elements of the set. And no one can establish a correspondence among infinitely many things without indicating a rule, i.e., a relation, which connects the corresponding objects with one another. The notion that an infinite set is a ‘gathering’ brought together by infinitely many individual arbitrary acts of selection, assembled and then surveyed as a whole by consciousness, is nonsensical.” [Reference Weyl, Pollard and Bole54, p. 23]
The implications of the above principle concerning infinite objects depend of course in a crucial way on the question: What definitions should be accepted as ‘legitimate’? Therefore it is no wonder that ‘predicativism’ (like ‘constructivism’) becomes a name of a group of approaches to mathematics and its foundations [Reference Crosilla, Mainzer, Schuster and Schwichtenberg12, Reference Feferman and Shapiro24].Footnote 1 We emphasize that in this paper we reserve this name solely to the program as it was initiated by Poincaré and pursued by Weyl. That program is known nowadays as ‘predicativity given the natural numbers’, since in addition to the definitionist principle mentioned above, it also accepts the collection $\mathcal {N}$ of the natural numbers as a well understood mathematical concept that constitutes a set. Moreover, it views the idea of iterating an operation or a relation a finite number of times as fundamental, and accepts induction on the natural numbers as a universally valid method of proof.Footnote 2 Still, even with this restriction, the word ‘predicative’ has two different interpretations, corresponding to Poincaré’s “two distinct diagnoses of the source of the paradoxes” ([Reference Feferman and Shapiro24]; see also [Reference Crosilla, Jaeger and Sieg11]). We call them ‘Russell’s predicativity’ and ‘Poincaré–Weyl predicativity’. This paper is devoted to the second one. However, since it is the first which is usually identified with predicativism, we discuss it first.
1.2 Russell’s predicativity
Adopting the analysis indicated in Richard’s paper [Reference Richard40], Poincaré’s first diagnosis was that the definition of Richard’s paradoxical real number is circular: it uses the totality of all definitions, to which it already belongs. The corresponding Vicious Circle Principle, VCP, was adopted by Russel in [Reference Russell41] and in Principia Mathematica [Reference Whitehead and Russell55]. According to the latter, a vicious circle arises when we assume that “a collection of objects may contain members which can only be defined in terms of the collection itself”.Footnote 3 A clearer (and stronger) formulation of the VCP has been given by Kreisel in [Reference Kreisel30]:
“A predicative definition of a set (say, of natural numbers) is required to use quantification only over ‘previously defined’ totalities; the set of natural numbers themselves is supposed to be given, or the notion of ‘finite’ is supposed to be well-defined.”
Kreisel then went on and note that
“The traditional way of making the idea of a predicative definition explicit is by introducing a ramified hierarchy.”
The idea of ramified hierarchy was introduced and used by Russell in Principia Mathematica. Later it was generalized by Wang [Reference Wang48] and Kreisel. In the second-order context the generalization is explained in [Reference Feferman and Shapiro24] as follows:
“The basic step in that hierarchy consists in passing from a collection D of subsets of $\mathcal {N}$ to a new collection $D^\star $ by putting a set S in $D^\star $ just in case there is a formula $\varphi (x)$ of second-order arithmetic such that for all n, $n\in S\leftrightarrow (\varphi (n))^D,$ where the superscript ‘D’ indicates that all second-order quantifiers in $\varphi $ are relativized to range over D. Then we can define the collections $R_\alpha $ for arbitrary ordinals $\alpha $ by $R_0=\emptyset , R_{\alpha +1}=(R_\alpha )^\star , \mbox { and for limit } \alpha , R_\alpha =\bigcup _{\beta <\alpha }R_\beta $ .”
This description raises the question: What ordinals $\alpha $ can serve for the purpose of constructing this ramified hierarchy of $R_{\alpha }$ s? To answer this, Kreisel proposed in [Reference Kreisel29] an autonomous process, where a well-ordering becomes available at some stage only if it has been defined and recognized (as a well-ordering of $\omega $ ) at an earlier stage. Without the ‘recognition’ criterion, which introduces proof-theoretic considerations, we are left with a purely semantic condition that allows to go up to every well-ordering $<\omega _1^{CK}$ (Church–Kleene’s first non-recursive ordinal). With the recognition condition, Feferman [Reference Feferman15] and Schütte [Reference Schütte, Crossley and Dummett42] independently replaced $\omega _1^{CK}$ by the much smaller $\Gamma _0$ (the Feferman–Schütte ordinal). Following their work, the hierarchies of formal systems up to $\Gamma _0$ which were developed by them (on the basis of the intuitive semantics of the $R_{\alpha }$ s) have become the “canonical reference: one considers predicative any formal system which can be reduced to a system in that hierarchy” [Reference Crosilla, Jaeger and Sieg11]. Accordingly, $\Gamma _0$ is almost universally accepted as the ‘ordinal of predicativity’. An example of the implications of this is given in [Reference Crosilla, Oliveri, Ternullo and Boscolo13]: “The fact that the proof-theoretic strength of theories of inductive definitions exceeds the strength of the whole ramified hierarchy is taken as clear indication that generalized inductive definitions involve impredicativity.”
Up to now, the only mathematician to reject the ‘ $\Gamma _0$ -thesis’ has been Weaver, who forcefully attacked this thesis in [Reference Weaver50]. Unfortunately, his (in my opinion quite justified) criticism of the various justifications given by Feferman for this thesis has been almost totally ignored by the logical community. Nevertheless, as a true predicativist (which is what I am taking myself to be), it is clear to me that the identification of predicativity with the ramified systems of Feferman and Schütte cannot be correct. A first, very simple, problem with it is that no predicativist (and for that matter—no mathematician) thinks in terms of ramified systems. Moreover, Feferman admits that “ramified theories are unsuitable as a framework for the development of analysis” [Reference Feferman and Shapiro24]. Another problem, that was repeatedly noted by Feferman himself, is that the general notions of well-ordering and ordinal on which they are based are not predicatively acceptable.Footnote 4 Third, and most important: just the description and understanding of the ramified hierarchy rely on principles that are not included in the theories in [Reference Feferman15] and [Reference Schütte43] which are based on this hierarchy. This is rather clear in case $\alpha $ is a limit ordinal: $R_{\alpha }$ is actually defined in this case as $\bigcup \{R(x)\mid x\in \{\beta \mid \beta <\alpha \}\}$ . Hence it is based on accepting at least some instances of ZF’s axioms of union and replacement. However, if we let $R_0=\mathcal {N}$ (which is the predicativist natural starting point, rather then $R_0=\emptyset $ ), then similar problems exist even if we do not use transfinite ordinals, but only the natural numbers (as Russell did). Thus already in constructing elements of $R_2$ we allow quantification over $R_1$ . This should mean that $R_1$ is taken as a “complete totality”. But $R_1$ is not obtained using just “quantification only over previously defined totalities”, and it is actually unclear what $R_1$ is at the first place. Usually it is identified with the collection (set?) of arithmetical subsets of $\mathcal {N}$ . If so, then each element of $R_1$ is indeed obtained using just quantification only over $\mathcal {N}$ . But this does not mean that so is $R_1$ itself. (For example, if we identify ordinals with von Neumann’s ordinals, then $\Gamma _0$ is not predicatively definable according to the $\Gamma _0$ -thesis, even though each element of $\Gamma _0$ is.) In fact, the only reasonable definition of the collection of arithmetical sets I can think of is the following:
where True is the set of true sentences in the first-order language of PA, and $predicate(n)$ means that n is a formula with exactly one free variable. (As usual, we identify here a formula with its Gödel number.) However, this definition relies on the availability of the set True, which is not arithmetical. It follows that if $R_1$ is the collection of arithmetical sets then it can only be defined using a set that at best belongs to some $R_i$ such that $i>1$ , and so is defined in terms of $R_1$ . This is obviously circular. To construct the ramified hierarchy we have therefore either to accept in addition to $\mathcal {N}$ infinitely many other sets as ‘given’ to us, or to realize that some hidden predicatively acceptable principles are involved already in the passage from $\mathcal {N}$ to $R_1$ . The first option completely goes against the central ideas of Poincaré and Weyl. So we are left with the second.
Note 1.1. A close inspection shows that in general, the passage from D to $D^\star $ implicitly involves accepting two principles:
-
• One may use the model-theoretic operation that associates with any formula $\varphi (x)$ of second-order arithmetic the set $\{n\in N\mid N\models (\varphi (n))^D\}$ .
-
• One may take as predicatively valid the instance of the replacement axiom that allows us to construct the image of this model-theoretic operation.
Note 1.2. There are some other indications that the ‘ $\Gamma _0$ -thesis’ might be wrong.
-
• A system T which is proof-theoretically reducible to $\bigcup _{\alpha <\Gamma _0} R_{\alpha }$ is called in the literature ‘locally predicative’. About such T Feferman wrote in [Reference Feferman and Shapiro24]:
Though the system T as a whole may not be justifiable predicatively, each theorem $\varphi $ of T rests on predicative grounds, at least indirectly. In practice, more can be said: T is conservative over the autonomous ramified progression for arithmetic sentences, i.e., if $\varphi $ is arithmetical and provable in T then it is provable in that progression. For second-order T this can often be strengthened to conservativity for $\Pi _1^1$ -sentences.
-
• In the conclusion of [Reference Pohlers, Kahle and Rathjen35], Pohlers made the following observation:
The studies in the present paper lead to the suspicion that the role of $\Gamma _0$ as the limit of predicativity might not depend on deep philosophical but rather on technical reasons.
1.3 Poincaré and Weyl’s view of predicativity as invariance
As noted above, a notion of predicativity which is quite different from the Russellian one was introduced by Poincaré in [Reference Poincaré38]. This was particularly emphasized in [Reference Crosilla, Jaeger and Sieg11]:
“For Poincare impredicative definitions are problematic as they treat as completed infinite classes which are instead open-ended or incomplete by their very nature. Predicative definitions, instead, guarantee that the classes so defined are stable and invariant.”
In Poincaré own words:
“Hence a distinction between two species of classifications, which are applicable to the elements of infinite collections: the predicative classifications, which cannot be disordered by the introduction of new elements; the non predicative classifications, which are forced to remain without end by the introduction of new elements.”
This view of predicativity underlies Weyl’s great work in [Reference Weyl52]. A careful reading (see [Reference Adams and Luo2, Reference Avron6]) of this book and of related papers of Weyl on the subject shows that predicativity as invariance is based in his work on the following principles:Footnote 5
-
1. Sets are ‘produced’ genetically, that is, from applying legitimate operations to sets which are accepted as basic, or had previously been produced.
“If we imagine, as is appropriate for an intuitive understanding, that the relations and corresponding sets are ‘produced’ genetically, then this production will …occur in merely parallel individual acts (so to speak).” [Reference Weyl, Pollard and Bole54, p. 40]
-
2. Accordingly, the elements of a set are logically prior to that set.
-
3. Sets are extensional, and the identity of a set is fully determined by the identity of its elements—sets that have the same elements are identical.
-
4. There is no single, complete intended universe V of sets. The ‘universe of sets’ is created in stages, and is always open and growing. To each stage corresponds what Weyl called a sphere of operation (i.e., a definite universe of sets equipped with some (finite) collection of predicates and operations) in which terms and formulas take values.
“Thus, contrary to Cantor’s proposal, no universal scale of infinite ordinal and cardinal numbers applicable to every sphere of operation can exist.” [Reference Weyl, Pollard and Bole54, p. 24]
“The numbers can (in any sphere of operation) be used to determine the cardinality of sets.” [Reference Weyl, Pollard and Bole54, p. 55]
-
5. The current sphere of operation may be expanded in the future, e.g., by introducing new legitimate methods of defining sets, which in turn might produce new sets. The truth values of formulas may then be changed.
“If we regard the principles of definition as an ‘open’ system, i.e., if we reserve the right to extend them when necessary by making additions, then in general the question of whether a given function is continuous must also remain open (though we may attempt to resolve any delimited question). For a function which, within our current system, is continuous can lose this property if our principles of definition are expanded and, accordingly, the real numbers ‘presently’ available are joined by others.” [Reference Weyl, Pollard and Bole54, p. 87]
-
6. Values of terms and truth-values of formulas are always evaluated with respect to some definite sphere of operation—never with respect to the whole open ‘universe of sets’. Hence classical logic is accepted as valid. For example, $\neg \varphi \vee \varphi $ is valid in any sphere of operation, even though the truth value of $\varphi $ may change when the current sphere of operation is expanded.
-
7. In contrast, the identity of already existing objects (and so the value of terms) should remain the same even if the current sphere of operation is expanded. Accordingly, a definition of an object is legitimate (or ‘predicative’) if, and only if, the identity of the object it defines is invariant under extension. (Because of this principle, there are certain constraints in Weyl’s system on the use of quantification in definitions. However, there are no constrains in that system on the use of quantifiers in building formulas.) Weyl called an object so defined “a definite, self-existent object”. Similarly, a definition of an operation is legitimate if the results of its applications depend only on the identity of the arguments, but not on the specific sphere of operation in which the application is made.
-
8. Any theory we develop should be true not only in the current sphere of operation, but in any future one. Hence our current theories impose constraints on future spheres of operation. Accordingly, expanding our spheres of operation and extending our theories are done simultaneously. Moreover:
“Our principles for the formation of derived relations can be formulated as axioms concerning sets and functions; and, in fact, mathematics will proceed in such a way that it draws the logical consequences of these axioms.” [Reference Weyl, Pollard and Bole54, p. 44]
-
9. The predicates of elementhood ( $\in $ ) and equality (=) are basic.
-
10. Using ramification in definitions, or classifying sets according to ‘levels’, should be avoided.
“The temptation to pass beyond the first level of construction must be resisted; instead, one should try to make the range of constructible relations as wide as possible by enlarging the stock of basic operations.” [Reference Weyl53]
It should be emphasized that according to the last quotation, Principle 10 is not due just to the inconvenience which is caused by using ramified systems. There is also a direct conflict between the Russellian approach and Weyl’s approach. The former is implicitly based on the view that there are no predicatively legitimate methods of defining subsets of $\mathcal {N}$ beyond those that are allowed in the construction of the hierarchy of $R_{\alpha }$ s. This view entails that the union of the $R_{\alpha }$ s includes all the predicatively acceptable subsets of $\mathcal {N}$ . In contrast, already the collection of first-level subsets of $\mathcal {N}$ is an open collection according to Weyl’s view. Hence it may always be extended “by enlarging the stock of basic operations”. Weyl indeed did just that when he added the very important operation of iteration to this stock. As discovered by Feferman in [Reference Feferman, Cellucci and Sambin20], this operation makes it possible to define non-arithmetical subsets of $\mathcal {N}$ . Hence Weyl recognized as first-level subsets of $\mathcal {N}$ sets that do not belong to what is called above $R_1$ !Footnote 6
Note 1.3. Actually, the invariance criterion has been used by Feferman too in some of his unramified systems. Thus, IR, the first of the two unramified second-order locally predicative systems which were given in [Reference Feferman15], uses the Hyperarithmetic Comprehension Rule $\Delta _1^1$ -CR. This is justified in [Reference Feferman and Shapiro24] as follows:
“The motivation for $\Delta _1^1$ -CR is the recognizable absoluteness (or invariance) of provably $\Delta _1^1$ definitions, in the following sense. At each stage one has recognized certain closure conditions on the ‘open’ universe of sets, and the definitions $D(x)$ of sets introduced at the next stage should be independent of what further closure conditions may be accepted. In the words of Poincaré, the definitions used of objects in an incomplete totality should not be “disturbed by the introduction of new elements.” Thus if U represents a universe of sets (subsets of $\mathcal {N}$ ) satisfying given closure conditions and is extended to $S^\prime $ (satisfying the same closure conditions and possibly further ones) one wants D to be provably invariant or absolute in the sense that $(\forall x)[D^U(x)\leftrightarrow D^{U^\prime }(x)]$ . This requirement is easily seen to hold for provably $\Delta _1^1$ formulas D.”
This passage contains a rather accurate description of the invariance criterion. However, it provides no explanation how it is connected with the Russellian approach, on which Feferman’s ramified systems are based.
1.4 Predicative set theories: Why and how
1.4.1 Why
Feferman’s systems (and to a lesser degree also Weyl’s system in [Reference Weyl52], as formalized in [Reference Avron6]) have one big drawback: they are practically inaccessible to the majority of the mathematical community. We believe that the major reason for this is that those systems do not use the framework of axiomatic set theory, which is almost universally accepted as the basic framework that provides the foundations of mathematics. What is more: Feferman’s systems are by far more complicated in comparison to impredicative axiomatic set theories like ZF, which are currently used for developing the whole of present day mathematics.
Another reason to prefer the set-theoretical framework is that some of its principles are anyway underlying the constructions on which the second-order ramified systems of Feferman and Schütte are based. Thus we have seen in Section 1.2 that the construction of the hierarchy of $R_\alpha $ s uses instances of the union and replacement axioms of $\mathbf {ZF}$ . But in what cases is such a use predicatively justified? It seems to me that developing predicative set theories is the only way to answer such questions. What is more, the notion of ordinal, which is crucial for the ramified systems but is also very problematic in their context, is not problematic at all in the set-theoretical one (assuming that the $\in $ -relation is well-foundedFootnote 7 ). In the latter, one can use the notion of von Neumann ordinals, and those are defined by a simple, absolute formula.
Finally, it is worth noting that predicativism was born as a reaction to the set-theoretical paradoxes, and was intended to provide a satisfactory solution to them. So (at least in my opinion) it should be most natural to develop predicative mathematics in the framework in which it has started.
Note 1.4. Locally predicative (in the sense of being proof-theoretically reducible to the systems of [Reference Feferman15]) set theories have already been introduced by Feferman in [Reference Feferman16, Reference Feferman, Scott and Jech18]. We shall say more about them in Section 4.4.
1.4.2 How
In the case of pure set theories, the main principle of the predicativity as invariance view of Poincaré and Weyl can be expressed as follows: a set exists if and only if can be determined by an invariant definition. Accordingly, the main two features of the system $\mathbf {PW}$ which is developed in the sequel are:
-
(I) Any existence claim which is made in one of those axioms of $\mathbf {PW}$ whose purpose is to allow the expansion of the sphere of operation is actually an existence and uniqueness claim. In other words, positive occurrences of the quantifier $\exists $ in such an axiom are in the form $\exists !$ . (This, of course, rules out the axiom of choice, as well as the axiom of $\Delta _0$ -collection of Kripke–Platek set theory as given in [Reference Barwise9].)
-
(II) Following Principles 1 and 7 in Section 1.3, any definition of a set which is made in $\mathbf {PW}$ is invariant. This is ensured by employing a syntactically defined invariance relation $\succ $ between formulas and finite sets of variables. The intended meaning of “ $\varphi (x_1,\ldots \!,\! x_n,y_1,\ldots \!,\! y_k)\succ \{x_1,\ldots \!,\! x_n\}$ ” is: “The identity of $\{\langle x_1,\ldots \!,\! x_n \rangle \mid \varphi \}$ is invariant: it depends only on the values assigned to $y_1,\ldots \!,\! y_k$ , but not on the surrounding universe”.Footnote 8
Note 1.5. Principle (I) is not applicable to general validities like logically valid formulas or instances of $\in $ -induction. A trivial example is given by $\forall x\neg \varphi \vee \exists x\varphi $ , where $\varphi $ is arbitrary. In general, theorems of $\mathbf {PW}$ of this sort cannot be used for introducing new sets, or for providing absolute identification of existing ones.
The following other important features of $\mathbf {PW}$ also directly correspond to the principles of Weyl and Poincaré that were described in Section 1.3:
-
1. Like in ZF, and unlike in the systems of Weyl and Feferman, our system has a single type (or ‘category’, in Weyl’s terminology) of objects: sets. (This partially corresponds to Principle 10 in Section 1.3.)
-
2. Like in most of Feferman’s systems, $\mathbf {PW}$ is practically not really a single theory, but involves many theories, all of them first-order. In each stage of working within it, we do have a single theory T, but we have two options how to proceed: We may simply derive new theorems in T, but we may also move to a strictly stronger theory $\mathbf {T}^\star $ in an expanded language.Footnote 9 (This feature of $\mathbf {PW}$ implements Principles 4, 5, and 8 in Section 1.3.)
-
3. The logic of all theories in $\mathbf {PW}$ is classical logic (Principle 6 in Section 1.3).
-
4. The initial language of the system includes just two predicate symbols: $=$ and $\in $ (Principle 9 in Section 1.3) and a constant $\omega $ for the set of natural numbers (taken to be the finite von Neumann ordinals). The inclusion of the latter is actually not essential, but it reflects well the central place that the natural numbers have in the predicativism of Poincaré and Weyl.
-
5. The following axiom and axiom schemas are included in all theories in $\mathbf {PW}$ :
-
• The axiom of extensionality (Principle 3 in Section 1.3).
-
• Comprehension for invariant formulas (Principle 7 in Section 1.3).
-
• $\in $ -induction (which implements the vague Principle 2 in Section 1.3).Footnote 10
-
-
6. Our main method of extending a given predicative set theory T to a stronger predicative set theory $\mathbf {T}^\star $ is by adding a new symbol to the signature of T, together with an axiom that defines it. (In addition, we include of course in $\mathbf {T}^\star $ all the instances in the extended language of the axiom schemas of T.) Such an extension is done by applying one of the syntactic methods that $\mathbf {PW}$ provides for this purpose.
-
7. As usual, extending T by an operation symbol is allowed only if T proves some corresponding existence and uniqueness conditions. Still, the extension is usually not conservative.
-
8. Adding an n-ary predicate symbol P is allowed only if its defining axiom implies its absoluteness. Similarly, adding an n-ary operation symbol F is allowed only if its defining axiom implies that the formula $y=F(x_1,\ldots ,x_n)$ is invariant with respect to y in case $y,x_1,\ldots ,x_n$ are distinct.
Note 1.6. In designing $\mathbf {PW}$ we have adopted two additional principles:
-
• Platonists should be able to accept any theory in our framework. In particular, every such theory is actually a subsystem of $\mathbf {ZF}-(P)$ (ZF without the powerset axiom). This principle immediately rules out, e.g., Axiom VIII (Enumerability) of $\mathbf {PS_1E}$ from [Reference Feferman, Scott and Jech18] and the Axiom of Countability of $\mathbf {ATR_0^{Set}}$ from [Reference Simpson46] (which says that every set is countable).Footnote 11
-
• Every rule or axiom of $\mathbf {PW}$ is a very close counterpart of some rule or axiom that was used by Feferman in one of his predicative (or locally predicative) systems. Hence $\mathbf {PW}$ should be accepted as predicative by anyone who accepts those systems of Feferman’s as predicative.
Note 1.7. Among the axioms of ZFC, $\mathbf {PW}$ completely rejects the axiom of powerset and the axiom of choice, and it restricts the axiom schema of separation to the case in which the separating condition is absolute. It also accepts only special cases of the axiom schema of replacement. In our opinion, these are properties that should be shared by any predicative set theory.
Note 1.8. At least in my opinion, $\mathbf {PW}$ is simpler and easier to work in than any of Feferman’s systems (see Section 4.4). Moreover, we believe that as long as we confine ourselves to $\mathbf {ZF}-(P)$ (i.e., neither the powerset axiom nor the axiom of choice is allowed to be used), working in $\mathbf {PW}$ exactly reflects the way work in set theory is done in reality. In particular, one can use abstraction terms as well introduce new operation and predicate symbols in the usual way. Only when there is a worry whether what is done is predicatively justified, a need arises to check whether certain syntactic conditions are satisfied. In most cases this only involves checking whether a certain formula is bounded or is a $\Sigma $ -formula in the usual sense of this concept (see Note 4.7). However, there is one important case (the comprehension schema) in which one should check for some formula $\varphi $ whether $\varphi \succ \{x_1,\ldots ,x_n\}$ or not (where $x_1,\ldots ,x_n$ are variables). This is also rather easy once one gets used to the syntactic definition of $\succ $ , since that definition closely follows the logical structure of $\varphi $ . In fact, what is involved is just a short mechanical check, which is usually much easier than proving that the existence of a certain set logically follows from the standard comprehension axioms of $\mathbf {ZF}-(P)$ (or full $\mathbf {ZF}$ ).
1.5 The structure of the paper
Section 2 explains our notations and terminology. Section 3 establishes the predicativity of the set of natural numbers, using a rather weak subsystem of $\mathbf {PW}$ . $\mathbf {PW}$ itself is precisely defined, justified, and compared with Feferman’s Systems in Section 4. Section 5 includes important examples of the power of $\mathbf {PW}$ . Section 6 develops in $\mathbf {PW}$ the fundamentals of the theory of von Neumann ordinals. Section 7 includes the main results of this paper: that $\mathbf {PW}$ provides terms which define $\Gamma _0$ and much bigger ordinals, and that it can prove the main properties of those ordinals. We conclude in Section 8 with some remarks and directions for further research.
2 Terminology and notations
2.1 The difference between operations and functions
In standard textbooks on first-order theory, it is common to refer to the symbols in a signature of a first-order language as ‘relation symbols’ and ‘function symbols’. We cannot use this terminology here, since we reserve the words ‘relation’ and ‘function’ to their official meaning in set theory, that is, to sets of pairs satisfying certain conditions. Instead, we use the name ‘predicate’ for any “relation” that is not a set (like the predicates $\in $ or $=$ ), and we use the name ‘operation’ for any “function” that is not a set (like the operation of union on sets or the operation of addition on ordinals).Footnote 12 Accordingly, the symbols of a first-order signature are divided in this paper into predicate symbols, operation symbols, and constants. The latter may actually be viewed as operations with arity 0, except that they should always be interpreted as sets.
2.2 Notations
We assume that every first-order language for a set theory has infinitely many variables for sets, officially taken here to be $v_0,v_1,\ldots $ . We use letters (small or capital) from the end of the Latin alphabet to vary over these variables, and letters from the beginning of that alphabet as general variables for sets in the metalanguage. As usual, $i,j,k,l,m,n$ are used as special variables for natural numbers (in both the metalanguage and the formal language). t and s serve as variables (in the metalanguage) for terms, and $\varphi ,\psi ,\theta $ (and sometimes also A and B) as variables for formulas. In all cases the variables may be decorated with subscripts or superscripts. We denote by $Fv(\varphi )$ ( $Fv(t)$ ) the set of free variables of $\varphi $ (of t). When we denote a formula by $\varphi (x_1,\ldots ,x_n)$ it means that $\{x_1,\ldots ,x_n\}\subseteq Fv(\varphi )$ . On the other hand, when we write $\varphi (\vec {y},x)$ it means that $\vec {y}=\langle y_1,\ldots ,y_n \rangle $ for some n (whose identity may be obtained from the context or it does not matter); the variables $x,y_1,\ldots ,y_n$ are all distinct from each other; and $Fv(\varphi )=\{x,y_1,\ldots ,y_n\}$ .
The substitution of a term t for a free variable y in a formula $\varphi $ (a term s) is denoted by $\varphi \{t/y\}$ ( $s\{t/y\}$ ). However, when we denote a formula by $\varphi (y)$ ( $\varphi (x,y)$ , $\varphi (\vec {x},y)$ ) we might simply write $\varphi (t)$ ( $\varphi (x,t)$ , $\varphi (\vec {x},t)$ ) instead.
Given a first-order signature $\sigma $ , we take a structure for $\sigma $ to be a pair $\mathcal {M}=\langle \mathcal {D},I \rangle $ , where $\mathcal {D}\neq \emptyset $ is the domain of $\mathcal {M}$ and I is its interpretation function. If r is one of the symbols in $\sigma $ we shall usually write $r^I$ instead of $I(r)$ . If $\nu $ is an assignment of elements of $\mathcal {D}$ to variables of the language, $x_1,\ldots ,x_n$ are n distinct variables, and $\vec {a}\in \mathcal {D}^n$ , we denote by $\nu \{\vec {x}:=\vec {a}\}$ the assignment which is obtained from $\nu $ by assigning $a_i$ to $x_i$ ( $i=1,\ldots ,n$ ). We denote by $\nu _{\mathcal {M}}[t]$ the element of $\mathcal {D}$ that $\nu $ assigns according to I to the term t of $\sigma $ . Similarly, if f is an operation symbol of $\sigma $ then we use in the metalanguage square brackets to denote applications of $f^I$ to arguments in $\mathcal {D}$ . Thus, if f is n-ary, and $\nu $ is an assignment in $\mathcal {D}$ , then $\nu _{\mathcal {M}}[f(t_1,\ldots ,t_n)]=f^I[\nu _{\mathcal {M}}[t_1],\ldots ,\nu _{\mathcal {M}}[t_n]]$ . We write $\mathcal {M},\nu \models \varphi $ in case $\nu $ satisfies in $\mathcal {M}$ the formula $\varphi $ of $\sigma $ . If $Fv(\varphi )=\{x_1,\ldots ,x_n\}$ , and $\nu [x_i]=a_i$ ( $i=1,\ldots ,n$ ) then we might write instead $\mathcal {M}\models \varphi (a_1,\ldots ,a_n)$ .
Finally, when we refer in the metalanguage to the collection of things that satisfy a certain condition C we shall denote it by $[x\mid C(x)]$ , reserving the notation $\{x\!\mid \!\varphi (x)\}$ for being used in our formal system. Moreover, in case there is a danger of confusion, we shall use ‘:’ in the metalanguage instead of ‘ $\in $ ’. (Recall that the latter is a basic symbol of the language of our system.)
3 The predicativity of the natural numbers
Recall that all of our systems are based on classical first-order logic with identity.
Our system $\mathbf {PW}$ includes a constant $\omega $ for the natural numbers. Before presenting $\mathbf {PW}$ in the next section, we should justify this inclusion on the basis of the invariance criterion. We do that by:
-
1. Providing a bounded formula $N(x)$ in the language of set theory that defines when x is a natural number (i.e., a finite von Neumann ordinal).
-
2. Present a basic predicative set theory $\mathbf {VBS}$ , in which one can show that $N(x)$ is adequate for the task. This is done by proving in it all the properties that one expects from a formula that defines the natural numbers.
-
3. Give an intuitive proof in the metalanguage that the formula $N(x)$ is invariant, and so it may be used for defining a new set.
We start by presenting $\mathbf {VBS}$ .Footnote 13 The axioms of this system include the Extensionality axiom [Ext] and the $\in $ -induction axiom schema [ $\in $ -ind] from Section 4.1.6, as well as the following four elementary instances of the general predicative comprehension scheme:
-
1. Empty Set ([Em]):
$$ \begin{align*}\exists Z\forall x(x\in Z\leftrightarrow x\neq x).\end{align*} $$ -
2. Pairing ([Pa]):
$$ \begin{align*}\forall x\forall y\forall\exists Z\forall w(w\in Z\leftrightarrow w=x\vee w=y).\end{align*} $$ -
3. Union ([U]):
$$ \begin{align*}\forall x\exists Z\forall w(w\in Z\leftrightarrow\exists y(y\in x\wedge w\in y)).\end{align*} $$ -
4. Difference ([D]):
$$ \begin{align*}\forall x\forall y\exists Z \forall w(w\in Z\leftrightarrow w\in x\wedge w\not\in y).\end{align*} $$
Note 3.1. It is easy to see that the structure $\langle \mathsf {HF},\in \rangle $ , where HF (which is identical to $V_{\omega }$ ) is the set of the hereditarily finite sets, forms the minimal model of $\mathbf {VBS}$ . Moreover, $\langle V_\alpha ,\in \rangle $ is a model of $\mathbf {VBS}$ whenever $\alpha $ is a limit ordinal.Footnote 14
In order to present the formula $N(x)$ it is convenient (though not really necessary) to use the usual procedure of extension by definitions, and develop $\mathbf {VBS}$ in an enriched language in which the four axioms above are replaced by:Footnote 15
-
1. [Em]: $\forall x(x\not \in \emptyset ).$
-
2. [Pa]: $\forall x\forall y\forall w(w\in \{x,y\}\leftrightarrow w=x\vee w=y).$
-
3. [U]: $\forall x\forall w(w\in \bigcup x\leftrightarrow \exists y(y \in x\wedge w\in y)).$
-
4. [D]: $\forall x\forall y\forall w(w\in x-y\leftrightarrow w\in x\wedge w\not \in y).$
Note 3.2. In general, one should be careful when applying the extension by definitions procedure to theories with axioms schemas, since the extension of such a schema to the expanded language involves the addition of infinitely many new axioms besides those that are allowed by the procedure. This is not a problem in cases like we have here, where every axiom schema is pure in the sense that no constraint is imposed on the formulas to which it may be applied. Therefore we may assume that every instance of $[{\in}{-}ind]$ in the expanded language is an axiom of $\mathbf {VBS}$ . However, one should be cautious about the issue of being conservative when the procedure is applied in more complicated cases, like in case we have $[{\in}{-}ind]$ restricted to some class of formulas (e.g., bounded formulas).
Proposition 3.3. For every $n\geq 0$ ,
Proof We do the case $n=2$ (which trivially implies the cases $n=0$ and $n=1$ ). The proof for any other n is similar.
Given $x_0$ , to show that $\forall x_1\in x_0\forall x_2\in x_1.x_0\not \in x_2$ , we may assume (using $[{\in}{-}ind]$ ) that $(\star )\mbox {\ \ \ } \forall y_0 \in x_0 \forall y_1\in y_0\forall y_2\in y_1.y_0\not \in y_2$ . Suppose now that there are $x_1$ and $x_2$ such that: $x_1\in x_0\wedge x_2\in x_1\wedge x_0\in x_2$ Since $x_1\in x_0$ , we may apply $(\star )$ with $y_0=x_1$ , $y_1=x_2$ , and $y_2=x_0$ , to get that $x_1\not \in x_0$ , which is a contradiction. So no such $x_1$ and $x_2$ exist.
We are ready to introduce our definition of the notion of a natural number:
Definition 3.4.
-
1. $S(x):=x\cup \{x\}$ (where $\{x\}=\{x,x\}$ ).
-
2. $N(x):=\forall y\in S(x)( y=\emptyset \vee \exists z\in x. y=S(z)).$
Note 3.5. Officially, $N(x)$ is the following formula:
Proposition 3.6. Let $0:=\emptyset $ . The following are provable in $\mathbf {VBS}$ :
-
1. $N(0)$ .
-
2. $\forall x(N(x)\leftrightarrow N(S(x)))$ .
-
3. $S(x)\neq 0.$
-
4. $S(x)=S(y) \to x=y.$
Proof Points 1 and 3 are trivial. Point 4 easily follows from Proposition 3.3. We show point 2.
-
• Suppose that $N(x)$ . We show that $N(S(x))$ , i.e., that
$$ \begin{align*}\forall y\in S(S(x))( y=0\vee\exists z\in S(x). y=S(z)).\end{align*} $$So let $y\in S(S(x))$ . We show that $y=0\vee \exists z\in S(x). y=S(z)$ . Since we assume $N(x)$ , this is obvious in case $y\in S(x)$ , because every $z\in x$ is also in $S(x)$ . There remains the case $y=S(x)$ , but this case is trivial, since $x\in S(x)$ (and so x is an element z in $S(x)$ such that $y=S(z)$ ). -
• Suppose that $N(S(x))$ , i.e.,
$$ \begin{align*}\forall y\in S(S(x))( y=0\vee\exists z\in S(x). y=S(z)).\end{align*} $$We show that $N(x)$ . So let $y\in S(x)$ . We show that $y=0\vee \exists z\in x. y=S(z)$ . Suppose that the first disjunct fails, i.e., $y\neq 0$ . Since $S(x)\subseteq S(S(x))$ , our assumption implies that there is $z\in S(x)$ such that $y=S(z)$ . It is impossible that $z=x$ , since in this case we would get that $S(x)\in S(x)$ . Hence $z\in x$ , and we are done.
Proposition 3.7. $\vdash _{\mathbf {VBS}}\varphi \{0/x\}\wedge \forall x(\varphi \to \varphi \{S(x)/x\})\to \forall x(N(x)\to \varphi ).$
Proof Assume $\mathrm {(I)}\ \varphi \{0/x\}\wedge \forall x(\varphi \to \varphi \{S(x)/x\})$ . To show $\forall x(N(x)\to \varphi )$ , it suffices by $[{\in}{-}ind]$ to show that $\forall x(\forall y\in x(N(y)\to \varphi \{y/x\})\to (N(x)\to \varphi ))$ . So assume that $\mathrm {(II)}\ \forall y\in x(N(y)\to \varphi \{y/x\})$ and $\mathrm {(III)}\ N(x)$ . We show $\varphi $ . If $x=0$ then this is implied by (I). If not, then it follows from (III) that there is $z\in x$ such that $x=S(z)$ . Hence (II) entails that $N(z)\to \varphi \{z/x\}$ . But $N(z)$ follows from (III) by the second item of Proposition 3.6. Therefore $\varphi \{z/x\}$ . From this $\varphi $ (which is equivalent to $\varphi \{S(z)/x\}$ ) follows by (I).
Proposition 3.8. Let $<:=\in $ . The following are provable in $\mathbf {VBS}$ :
-
1. $x\not <0.$
-
2. $x<S(y) \leftrightarrow x<y\vee x=y.$
-
3. $N(x)\wedge y<x\to N(y).$
Proof The first two items are trivial. We prove the third by induction on x (i.e., by using Proposition 3.7). So let $\varphi :=\forall y(N(x)\wedge y<x\to N(y))$ . Obviously, $\vdash _{\mathbf {VBS}}\varphi \{0/x\}$ . We show that $\vdash _{\mathbf {VBS}}\forall x(\varphi \to \varphi \{S(x)/x\})$ . So assume $\varphi (x)$ and that $N(S(x))$ and $y<S(x)$ . By Proposition 3.6, the first assumption implies that $N(x)$ . Hence $y=x$ implies that $N(y)$ . Otherwise $y<x$ , and so the induction hypothesis implies that $N(y)$ .
Finally we show that $N(x)$ (intuitively) defines an invariant collection. Since this is an intuitive theorem in the meta-language, the proof is intuitive too. Still, it employs only predicatively acceptable principles.
Proposition 3.9. Let $\mathcal {M}_1$ and $\mathcal {M}_2$ be transitive models of VBS (in the usual sense of set theory) such that $\mathcal {M}_1\subseteq \mathcal {M}_2$ . Then:
Proof The fact that $N(x)$ is bounded (and so absolute) implies that $\omega _1\subseteq \omega _2$ . For the converse, we show that $\forall a\in \mathcal {M}_2((\mathcal {M}_2\models N(a))\to (a\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(a)))$ . So let $a\in \mathcal {M}_2$ , and assume that $\mathcal {M}_2\models N(a)$ . By $[{\in}{-}ind]$ we may assume also that $\forall b\in a((\mathcal {M}_2\models N(b))\to (b\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(b)))$ . Since $\mathcal {M}_2\models N(a)$ , either $a=\emptyset $ , or there exists $b\in a$ such that $\mathcal {M}_2\models a=S(b)$ . In the first case $a\in \mathcal {M}_1$ holds trivially. So assume the second case. Then item 2 of Proposition 3.6 implies that $\mathcal {M}_2\models N(b)$ . Hence the induction hypothesis implies that $b\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(b)$ . Again by Proposition 3.6, this entails that $S(b)\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(S(b))$ . Hence $a\in \mathcal {M}_1\wedge \mathcal {M}_1\models N(a)$ .
4 The system $\mathbf {PW}$
4.1 A description of the system
The language of the system $\mathbf {PW}$ is an one-sorted first-order language with equality. As in Section 2, we take its variables to be $v_0,v_1,\ldots $ , and use letters (both small and capital) from the end of the Latin alphabet to vary over them. All other components of the language (predicate and operation symbols, terms, formulas, the invariance relation $\succ $ and the $\Sigma $ -formulas) are simultaneously generated as described in Sections 4.1.1–4.1.5. The axioms and rules of $\mathbf {PW}$ are presented in Sections 4.1.6 and 4.1.7, respectively. (Note that in the formulation of the last rule [Unif] there is a use of the formula $Fun$ , the operation $Dom$ , and the term $f(x)$ . They are all introduced in Section 5.1 without using [Unif]. $Fun(f)$ says that f is a function. $Dom(f)$ and $f(x)$ have their usual meanings.)
4.1.1 Predicate symbols and operation symbols
-
1. $=$ and $\in $ are binary predicate symbols.
-
2. $\omega $ is a constant (i.e., an 0-ary operation symbol).
-
3. If $\varphi \succ \emptyset $ and $Fv(\varphi )=\{v_0 ,\ldots \!,\! v_{n}\}$ then $P_{\varphi }$ is an $n+1$ predicate symbol.
-
4. If $\varphi $ is $\Sigma $ and $Fv(\varphi )=\{v_0 ,\ldots \!,\! v_{n}\}$ then $F_{\varphi }$ is an n-ary operation symbol.
4.1.2 Terms
-
1. Every variable is a term.
-
2. $F(t_1 ,\ldots \!,\! t_n)$ is a term if F is an n-ary operation, and $t_1 ,\ldots \!,\! t_n$ are terms.
4.1.3 Formulas
-
1. $P(t_1 ,\ldots \!,\! t_n)$ is a formula if P is an n-ary predicate, $t_1 ,\ldots \!,\! t_n$ are terms.
-
2. The formulas are closed under $\neg $ , $\wedge $ , $\vee $ , and $\to $ .
-
3. If $\varphi $ is a formula and x is a variable, then $\exists x\varphi $ and $\forall x\varphi $ are formulas.
4.1.4 The invariance relation $\succ $
-
1. $t\in s\succ \emptyset $ and $t=s\succ \emptyset $ .
-
2. $P_{\varphi }(t_0 ,\ldots \!,\! t_n)\succ \emptyset $ .
-
3. $\varphi \succ \{x\}$ if $\varphi \in \{x\neq x,x=t,t=x,x\in t\}$ and $x\not \in Fv(t)$ .
-
4. $\neg \varphi \succ \emptyset $ if $\varphi \succ \emptyset $ .
-
5. $\varphi \vee \psi \succ X$ if $\varphi \succ X$ and $\psi \succ X$ .
-
6. $\varphi \wedge \psi \succ X\cup Y$ if $\varphi \succ X$ , $\psi \succ Y$ and $Y\cap Fv(\varphi )=\emptyset $ .
-
7. $\exists y \varphi \succ X-\{y\}$ if $y\in X$ and $\varphi \succ X$ .
4.1.5 $\Sigma $ -formulas
-
1. If $\varphi \succ \emptyset $ then $\varphi $ is $\Sigma $ . Such formulas are called absolute.
-
2. If $\varphi $ and $\psi $ are $\Sigma $ then so are $\varphi \vee \psi $ and $\varphi \wedge \psi $ .
-
3. If $\varphi $ is $\Sigma $ then so is $\exists x\varphi $ .
-
4. If $\varphi \succ \{y_1,\ldots \!,\! y_k\}$ , and $\psi $ is $\Sigma $ , then $\forall y_1\dots \forall y_k(\varphi \to \psi )$ is $\Sigma $ .
4.1.6 Axioms
-
1. [Fol]: Every formula which is valid in first-order logic with equality.Footnote 16
-
2. [Ext] (Extensionality): $\forall z (z\in x \leftrightarrow z\in y)\to x=y$ .
-
3. [Comp] ( $\succ $ -Comprehension): $\exists !Z\forall x(x\in Z\leftrightarrow \varphi )$ , provided that $\varphi \succ \{x\}$ .Footnote 17
-
4. [ $\in $ -ind] ( $\in $ -induction): $(\forall x(\forall y(y\in x\to \varphi \{y/x\})\to \varphi ))\to \forall x\varphi $ .
-
5. [Inf] (Infinity): $\forall x(x\in \omega \leftrightarrow N(x))$ , where $N(x)$ is presented in Note 3.5.
-
6. [PrI] $P_{\varphi }(v_0 ,\ldots \!,\! v_{n})\leftrightarrow \varphi $ (provided that $\varphi $ is as in Section 4.1.1).
4.1.7 Rules
-
1. [MP]: From $\varphi $ and $\varphi \to \psi $ infer $\psi $ .
-
2. [Gen]: From $\vdash _{\mathbf {PW}}\varphi $ infer $\vdash _{\mathbf {PW}}\forall x\varphi $ .
-
3. [OpI] From $\vdash _{\mathbf {PW}}\forall v_1\dots \forall v_n\exists !v_0\varphi $ infer $\vdash _{\mathbf {PW}} F_{\varphi }(v_1 ,\ldots \!,\! v_n)=v_0\leftrightarrow \varphi $ (provided that $\varphi $ is as in Section 4.1.1, i.e., $\varphi $ is $\Sigma $ and $Fv(\varphi )=\{v_0 ,\ldots \!,\! v_{n}\}$ ).
-
4. [Unif] (Unification Rule):
$$\begin{align*}\frac {\vdash_{\mathbf{PW}}\forall y_1\forall y_2(\varphi\{y_1/y\}\wedge\varphi\{y_2/y\}\to y_1=y_2)} {\vdash_{\mathbf{PW}}\forall x\in Z\exists y \varphi\to\exists!f(Fun(f)\wedge Dom(f)=Z\wedge \forall x\in Z\varphi\{f(x)/y\})}. \end{align*}$$Provided $\varphi $ is $\Sigma $ , x and y are distinct variables in $Fv(\varphi )$ , and $Z\not \in Fv(\varphi )$ .
Definition 4.1.
-
1. A proof in $\mathbf {PW}$ of a formula $\varphi $ is a sequence of formulas ending with $\varphi $ , such that each element of the sequence is either an axiom of $\mathbf {PW}$ or it can be inferred from previous elements of the sequence by one of the four rules of $\mathbf {PW}$ . $\varphi $ is a theorem of $\mathbf {PW}$ if it has a proof in $\mathbf {PW}$ .
-
2. A derivation in $\mathbf {PW}$ of a formula $\varphi $ from a set $\mathbf {T}$ of assumptions is a sequence of formulas ending with $\varphi $ , such that each element of the sequence is either a theorem of $\mathbf {PW}$ , or an element of $\mathbf {T}$ , or it can be inferred from two previous elements of the sequence by [MP]. $\varphi $ is derivable in $\mathbf {PW}$ from $\mathbf {T}$ if it has a derivation in $\mathbf {PW}$ from $\mathbf {T}$ .
Note 4.2. It should be emphasized that by Definition 4.1 (as well as by the difference in the way the rules of $\mathbf {PW}$ are formulated in Section 4.1.7), [MP] is the only rule of derivation of $\mathbf {PW}$ . All the rest are only rules of proof. This means that they can be applied only in assumptions-free derivations (i.e., pure derivations from axioms). It follows that the deduction theorem obtains for $\mathbf {PW}$ : to show $\mathbf {T}\vdash _{\mathbf {PW}}(\varphi \to \psi )$ it suffices to prove that $\mathbf {T},\varphi \vdash _{\mathbf {PW}}\psi $ . However, while using this theorem in order to show that $\mathbf {T}\vdash _{\mathbf {PW}}(\varphi \to \psi )$ , one should be careful not to rely on a proof of $\psi $ from $\mathbf {T}\cup \{\varphi \}$ in which [Gen],[OpI] or [Unif] are applied to a formula which depends on an assumption in $\mathbf {T}\cup \{\varphi \}$ .Footnote 18
Note 4.3. In the formulation above we have used the quantifier $\exists !$ in [Comp] and in [Unif]. This was done according to Principle (I) in Section 1.4.2. However, in both cases we can actually use the simpler connective $\exists $ . In the case of [Comp] this is due to the axiom [Ext], while in the case of [Unif] it follows from the premise of that rule.
Note 4.4. Like in the system PZF from [Reference Avron and Schindler5], the predicativity of definitions of sets is ensured $\mathbf {PW}$ by using an appropriate syntactic invariance relation $\succ $ between a formula $\varphi $ and subsets of $Fv(\varphi )$ . (The intended meaning of $\succ $ was described in item (II) at the beginning of Section 1.4.2.) Relations of this sort have originally been introduced in [Reference Avron and Hendricks3, Reference Avron4] in order to provide a unified theory of constructions and operations as they are used in different branches of mathematics and computer science, including set theory, computability theory, and database theory (see footnote 8). Further important theorems about them can be found in [Reference Avron, Lev and Levi8].Footnote 19
Note 4.5. It is easy to show by induction that if $\varphi \succ X$ and $Y\subseteq X$ then $\varphi \succ Y$ . Therefore we have not included this important condition from [Reference Avron, Lev and Levi8] in our present definition of $\succ $ , but we shall use it freely in what follows. Another important condition which we shall treat as if it included in the definition of $\succ $ is that $\forall x_1\dots \forall x_n(\varphi \to \psi )\succ \emptyset $ if $\varphi \succ \{x_1 ,\ldots \!,\! x_n\}$ and $\psi \succ \emptyset $ . The reason is that every consequence of this condition can easily be derived without it (because $\forall x_1\dots \forall x_n(\varphi \to \psi )$ is logically equivalent to $\neg \exists x_1\dots \forall x_n(\varphi \wedge \neg \psi )$ .)
Note 4.6. The clauses in the definition of $\Sigma $ -formulas are taken from [Reference Avron4]. This definition is a straightforward generalization of the usual definition of $\Sigma $ -formulas. In particular, it includes all the formulas which are called ‘essentially existential HF-formulas’ in [Reference Feferman17]. From Theorem 4.1 of [Reference Feferman17] it follows that every persistent formula is equivalent in $\mathbf {PW}$ to a $\Sigma $ -formula, but we shall not use this fact here.
Note 4.7. By the results of [Reference Avron, Lev and Levi8], it is possible to use in [Pri], [OpI], and [Unif] bounded formulas instead of our absolute formulas, and the ordinarily defined class of $\Sigma $ -formulas instead of our somewhat bigger class.
Note 4.8. Actually, the use of [PrI] does not really increase the power of $\mathbf {PW}$ , and so we can omit it from $\mathbf {PW}$ . However, it is very convenient to include it.
4.2 $\mathbf {PW}$ and $\mathbf {ZF}-(P)$
In this section we prove that $\mathbf {PW}$ is a subsystem of (an extension by definitions of) $\mathbf {ZF}-(P)$ . It follows that every theorem of $\mathbf {PW}$ is acceptable to the ordinary mathematicians.
Definition 4.9. Let e be a term or a formula or an operation symbol of $\mathbf {PW}$ .
-
1. $\sigma _{e}$ is the minimal signature $\sigma $ that satisfies the following conditions:
-
• It includes $\in $ and $=$ .
-
• It includes all the predicate symbols and operation symbols (including constants) that occur in e.
-
• If either $P_\psi $ or $F_\psi $ is in $\sigma $ , then so are all the predicate symbols and operation symbols that occur in $\psi $ .
-
-
2. e is legal if the premises of [OpI] obtain whenever $F_{\varphi }$ is in $\sigma _{e}$ .
-
3. For legal e, $\mathbf {PW}_{e}$ is the set of all theorem of $\mathbf {PW}$ in the language of $\sigma _{e}$ .
Definition 4.10. Let e be legal. A structure $\mathcal {M}$ is adequate for e if:
-
• $\mathcal {M}$ is a structure for a signature that contains $\sigma _{e}$ .
-
• $\mathcal {M}$ is a model of $\mathbf {PW}_{e}$ .
Definition 4.11. Let $\mathcal {M}$ be a transitive set or class. $\mathcal {M}^e$ is the structure for $\sigma _{e}$ which is obtained from $\mathcal {M}$ by defining an interpretation $I_e$ of $\sigma _{e}$ in $\mathcal {M}$ using the following recursive definition:
-
• $\in ^{I_e}=\in $ .
-
• $P_{\psi }^{I_e}=[\vec {a}: \mathcal {M}^n\mid \mathcal {M}^e,\{\vec {v}:=\vec {a}\}\models \psi ]$ in case $P_{\psi }$ is an n-ary predicate.
-
• If $F_{\psi }$ is an n-ary operation, then $F_{\psi }^{I_e}[\vec {a}]$ is the unique $b\in \mathcal {M}$ such that $\mathcal {M}^e,\{\vec {v}:=\vec {a}\;; \;v_n:=b\}\models \psi $ in case such a unique b exists, $\emptyset $ otherwise.
Theorem 4.12. Let $\mathcal {M}$ be a transitive model of $\mathbf {ZF}-(P)$ , and let $\theta $ be a formula of $\sigma _{e}$ . Suppose that $\theta \succ \{x_1,\ldots ,x_n\}$ where $n>0$ . Then the following collection is an element of $\mathcal {M}$ for every assignment $\nu $ in $\mathcal {M}$ :
Proof By induction on the structure of $\theta $ , using the relevant clauses in the definition of $\succ $ .
-
• If $\theta $ is $x\neq x$ then $S_{\theta ,x}^{\nu }=\emptyset $ for every $\nu $ . Hence $S_{\theta ,x}^{\nu }\in \mathcal {M}$ .
-
• If $\theta $ is $x\in t$ where $x\not \in Fv(t)$ , then $S_{\theta ,x}^{\nu }=\nu [t]$ (because $\mathcal {M}$ is transitive). Hence $S_{\theta ,x}^{\nu }\in \mathcal {M}$ .
-
• If $\theta $ is $x=t$ where $x\not \in Fv(t)$ , then $S_{\theta ,x}^{\nu }=\{\nu [t]\}$ . Hence $S_{\theta ,x}^{\nu }\in \mathcal {M}$ by the pairing axiom.
-
• If $\theta $ is $\varphi \vee \psi $ , where $\varphi \succ \{x_1,\ldots \!,\! x_n\}$ and $\psi \succ \{x_1,\ldots \!,\! x_n\}$ , then $S_{\theta ,\vec {x}}^{\nu }=S_{\varphi ,\vec {x}}^{\nu }\cup S_{\psi ,\vec {x}}^{\nu }$ . Hence $S_{\theta ,\vec {x}}^{\nu }\in \mathcal {M}$ by the induction hypotheses for $\varphi $ and $\psi $ .
-
• Suppose $\theta $ is $\varphi \wedge \psi $ , where $\varphi \succ \{w_1,\ldots w_l\}$ , $\psi \succ \{y_1,\ldots ,y_k\}$ , $n=l+k$ , $\{y_1,\ldots ,y_k\}\cap Fv(\varphi )=\emptyset $ , and $\vec {x}=\langle w_1,\ldots w_l,y_1,\ldots ,y_k \rangle $ . We have three subcases to consider here.
-
– $k=0$ and $n=l$ . Then $S_{\theta ,\vec {x}}^{\nu }= [\vec {a}\in S_{\varphi ,\vec {x}}^{\nu }\mid \psi ]$ . Hence $S_{\theta ,\vec {x}}^{\nu }\in \mathcal {M}$ by the induction hypothesis for $\varphi $ and the separation axiom.
-
– $l=0$ and $n=k$ . This case is similar to the previous one.
-
– $l>0$ and $k>0$ . To simplify notation, assume that $l=k=1$ , $Fv(\varphi )=\{w,z\}$ , $Fv(\psi )=\{w,y,z\}$ . For $c\in \mathcal {M}$ , let
$$ \begin{align*}Z(c)=[a\in \mathcal{M}\mid \mathcal{M}^e\models \varphi(a,c)].\end{align*} $$Since $\varphi \succ \{x\}$ , $Z(c)\in \mathcal {M}$ by the induction hypothesis for $\varphi $ . By the induction hypothesis for $\psi $ , this and the fact that $\psi \succ \{y\}$ imply that $[b\in \mathcal {M}\mid \mathcal {M}^e\models \psi (d,b,c)]\in \mathcal {M}$ for every $c\in \mathcal {M}$ and $d\in Z(c)$ . Denote this set by $W(c,d)$ . Then $[\langle a,b \rangle \in \mathcal {M}^2\mid \mathcal {M}^e\models \theta (a,b,c)]$ equals $\bigcup _{d\in Z(c)}\{d\}\times W(c,d)$ . Hence $[\langle a,b \rangle \in \mathcal {M}^2\mid \mathcal {M}^e\models \theta (a,b,c)]\in \mathcal {M}$ by the axioms of replacement and union.
-
-
• Suppose $\theta $ is $\exists y\varphi $ , where $\varphi \succ \{x_1,\ldots \!,\! x_n,y\}$ . Then
$$ \begin{align*}S_{\theta,\vec{x}}^{\nu}=[\vec{a}\in \mathcal{M}^n\mid \exists b\in M.\langle a_1,\ldots,a_n,b \rangle\in S_{\varphi,\vec{x}}^{\nu}].\end{align*} $$Hence $S_{\theta ,\vec {x}}^{\nu }\in \mathcal {M}$ , by the induction hypothesis for $\varphi $ and the fact that $\mathcal {M}$ is a model of $\mathbf {Z}-(P)$ (and so is closed under the projection operation).
Corollary 4.13. From a platonist point of view, every collection which is defined by [Comp] is set (i.e., an element of the platonist universe $\mathsf {V}$ ).
Proof $\mathsf {V}$ is (believed to be) a model of $\mathbf {Z}-(P)$ . Hence Theorem 4.12 applies.
Note 4.14. Another platonist collection which is known to be a model of $\mathbf {Z}-(P)$ is $\mathsf {HC}$ (the collection of hereditarily countable sets). From Theorem 4.24 it follows that if $\theta \succ \{x_1 ,\ldots \!,\! x_n\}$ then the collection $S_{\theta ,\vec {x}}^{\nu }$ is the same in $\mathsf {V}$ and in $\mathsf {HC}$ for every assignment $\nu $ in $\mathsf {HC}$ . Hence it is countable for every such $\nu $ .
Theorem 4.15. Every transitive model $\mathcal {M}$ of $\mathbf {Z}-(P)$ can be expanded to a model of $\mathbf {PW}$ .
Proof Obviously, the recursive definition given in Definition 4.11 can be extended to provide an expansion of $\mathcal {M}$ to a structure $\mathcal {M}^{\mathbf {PW}}$ for the language of $\mathbf {PW}$ . Now the axiom [Comp] is valid in this structure by Theorem 4.12, while the validity of [Unif] follows from the fact that if we replace this rule by the corresponding implication, we get a schema which is equivalent (in ZF without powerset and replacement) to the replacement schema. Hence [Unif] is valid in $\mathcal {M}^{\mathbf {PW}}$ . That all other axioms and rules of $\mathbf {PW}$ are valid there is obvious.
Corollary 4.16. $\mathsf {HC}$ is a model of $\mathbf {PW}$ . (More precisely, $\mathsf {HC}$ can be expanded to such a model.)
Theorem 4.17. $\mathbf {PW}$ is equivalent to a subsystem of $\mathbf {ZF}-(P)$ .
Proof This easily follows from Theorem 4.15 using ZF and model-theoretic methods. Alternative (and also predicatively acceptable) method is to turn the proof of Theorem 4.12 to a proof that every theorem of $\mathbf {PW}$ is provable in (an extension by definition of) $\mathbf {ZF}-(P)$ . This is not difficult. That the other axioms and rules of $\mathbf {PW}$ are derivable in $\mathbf {ZF}-(P)$ is (almost) obvious.
4.3 Predicative justification of $\mathbf {PW}$
From the non-logical axioms and rules of $\mathbf {PW}$ , [Ext] and [PrI] need no justification, while [Inf] was justified in Section 3. It remains to justify the other non-logical axioms and rules of $\mathbf {PW}$ .
4.3.1 [Comp] and [OpI]
Theorem 4.12 alone is not sufficient for justifying [Comp]. In order to really justify this axiom from a predicative point of view, we should show that the identity of the sets it defines is invariant. Similarly, in order to justify [OpI] we should show that the operation defined by it is invariant. The first step towards these goals is to provide a precise notion of invariance which is adequate for the set-theoretical context.
Obviously, talking about invariance of definitions of sets and operations, when we expand one sphere of operation $\mathcal {M}_1$ to a bigger one $\mathcal {M}_2$ , can make sense only if the identities of the elements of $\mathcal {M}_1$ are preserved in $\mathcal {M}_2$ . Since in the context of set theory we take the identity of a set to be fully determined by the identity of its elements, this means that the same objects should belong to an element a of $\mathcal {M}_1$ in both $\mathcal {M}_1$ and $\mathcal {M}_2$ . Accordingly, we define the following:
Definition 4.18. Let $\mathcal {M}_1=\langle \mathcal {D}_1,I_1 \rangle $ and $\mathcal {M}_2=\langle \mathcal {D}_2,I_2 \rangle $ be structures for signatures that contain $\in $ , and let both be models of [Ext]. $\mathcal {M}_2$ is an $\in $ -extension of $\mathcal {M}_1$ if $\mathcal {D}_1\subseteq \mathcal {D}_2$ , and the following holds for every element a of $\mathcal {D}_1$ :
Note 4.19. From a platonistic point of view, if $\in ^{I_1}$ and $\in ^{I_2}$ are both the ‘real’ $\in $ of $\mathsf {V}$ , then $\mathcal {M}_2$ is an $\in $ -extension of $\mathcal {M}_1$ iff $\mathcal {D}_1$ is a transitive subset of $\mathcal {D}_2$ .
Definition 4.20. For legal e, $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is an e-pair if:
-
• $\mathcal {M}_1$ and $\mathcal {M}_2$ are adequate for e.
-
• $\mathcal {M}_2$ is an $\in $ -extension of $\mathcal {M}_1$ .
-
• $[a\in \mathcal {D}_2\mid \mathcal {M}_2\models N(a)]=[a\in \mathcal {D}_1\mid \mathcal {M}_1\models N(a)].$
Definition 4.21.
-
1. A legal term t of $\mathbf {PW}$ is invariant if $\nu _{\mathcal {M}_1}[t]=\nu _{\mathcal {M}_2}[t]$ whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is a t-pair, and $\nu $ is an assignment in $\mathcal {M}_1$ .
-
2. A legal n-ary operation F of $\mathbf {PW}$ is invariant if
$$ \begin{align*}F^{\mathcal{M}_1}[a_1 ,\ldots\!,\! a_n]=F^{\mathcal{M}_2}[a_1 ,\ldots\!,\! a_n]\end{align*} $$whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is an F-pair, and $a_1 ,\ldots \!,\! a_n$ are elements of $\mathcal {M}_1$ .
-
3. A legal formula $\varphi $ of $\mathbf {PW}$ such that $\{x_1 ,\ldots \!,\! x_n\}\subseteq Fv(\varphi )$ is invariant with respect to $\{x_1 ,\ldots \!,\! x_n\}$ if the following holds for every assignment $\nu $ in $\mathcal {M}_1$ :
$$ \begin{align*}[\vec{a}: \mathcal{D}_2^n\mid \mathcal{M}_2,\nu\{\vec{x}:=\vec{a}\}\models\varphi]= [\vec{a}: \mathcal{D}_1^n\mid \mathcal{M}_1,\nu\{\vec{x}:=\vec{a}\}\models\varphi].\end{align*} $$ -
4. A legal formula $\varphi $ of $\mathbf {PW}$ is persistent if $\mathcal {M}_1,\nu \models \varphi $ implies that ${\mathcal {M}_2,\nu \models \varphi }$ as well, whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is a $\varphi $ -pair, and $\nu $ is an assignment in $\mathcal {M}_1$ .
Note 4.22. The definition of invariability of formulas can also be formulated as follows: a legal formula $\varphi $ of $\mathbf {PW}$ such that $Fv(\varphi )=\{x_1 ,\ldots \!,\! x_n,y_1 ,\ldots \!,\! y_k\}$ is invariant with respect to $\{x_1 ,\ldots \!,\! x_n\}$ if the following holds whenever $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is a $\varphi $ -pair, and $c_1, \ldots , c_k$ are elements of $\mathcal {M}_1$ :
(where $\langle a_1 ,\ldots \!,\! a_n \rangle ^{\frown }\!\langle c_1 ,\ldots \!,\! c_k \rangle = \langle a_1 ,\ldots \!,\! a_n,c_1 ,\ldots \!,\! c_k \rangle $ .)
Note 4.23. Identifying $\langle \rangle $ with $\emptyset $ , we get that in case $n=0$ , the collection $[\vec {a}: \mathcal {D}^n\mid \mathcal {M}\models \varphi (\vec {a}^{\frown }\!\vec {c})]$ is either 1 ( $=\{\emptyset \}$ ) or 0 ( $=\emptyset $ ), depending on whether $\mathcal {M}\models \varphi (\vec {c})$ or not. Hence invariance with respect to $\emptyset $ is simply absoluteness.
Convention. From now on, when we talk about terms, we shall mean legal terms. The same convention applies to formulas and operation symbols.
Theorem 4.24.
-
1. Every term t of $\mathbf {PW}$ is invariant.
-
2. Every operation F of $\mathbf {PW}$ is invariant.
-
3. If $\varphi \succ \{x_1 ,\ldots \!,\! x_n\}$ in $\mathbf {PW}$ then $\varphi $ is invariant with respect to $\{x_1 ,\ldots \!,\! x_n\}$ .
-
4. Every $\Sigma $ -formula of $\mathbf {PW}$ is persistent.
Proof We prove all parts simultaneously, using induction on the complexity of e, where e is a term or a formula or an operation symbol of $\mathbf {PW}$ . Nevertheless, to facilitate reading and understanding, we split the various induction steps into groups that correspond to the four parts of the theorem.
In what follows we assume that for every e we consider, $\langle \mathcal {M}_1,\mathcal {M}_2 \rangle $ is an e-pair ( $\mathcal {M}_1=\langle \mathcal {D}_1,I_1 \rangle $ , $\mathcal {M}_2=\langle \mathcal {D}_2,I_2 \rangle $ ), and $\nu $ is an assignment in $\mathcal {D}_1$ .
-
Operations: Suppose that e is the n-ary operation symbol $F_{\varphi }$ . For convenience of presentation, assume that $n=1$ . Let $a:\mathcal {D}_1$ , and let $b=F_{\varphi }^{I_1}[a]$ . Since $\mathcal {M}_1$ is a model of $\mathbf {PW}_e$ , the legality of $F_{\varphi }$ and the rule [OpI] imply that $\mathcal {M}_1\models \varphi (\langle a,b \rangle )$ . Since $\varphi $ should be a $\Sigma $ -formula, the induction hypothesis for $\varphi $ implies that $\mathcal {M}_2\models \varphi (\langle a,b \rangle )$ too. Since $\mathcal {M}_2$ is a model of $\mathbf {PW}_e$ , this in turn implies (with the help of [OpI]) that $b=F_{\varphi }^{I_2}[a]$ as well.
-
Terms:
-
• The case where e is a variable is trivial.
-
• The case $e=\omega $ is immediate from the third item of Definition 4.20.
-
• Suppose that e is the term $F_{\varphi }(s_1 ,\ldots \!,\! s_n)$ . Then
$$ \begin{align*}\nu_{\mathcal{M}_1}[e]=F_{\varphi}^{I_1}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]\ \ \ \nu_{\mathcal{M}_2}[e]=F_{\varphi}^{I_2}[\nu_{\mathcal{M}_2}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_2}[s_n]].\end{align*} $$Now from the induction hypothesis for $F_{\varphi }$ we get:
$$\begin{align*}F_{\varphi}^{I_1}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]= F_{\varphi}^{I_2}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]\end{align*}$$while from the induction hypotheses for $s_1 ,\ldots \!,\! s_n$ we get that
$$\begin{align*}F_{\varphi}^{I_2}[\nu_{\mathcal{M}_1}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_1}[s_n]]= F_{\varphi}^{I_2}[\nu_{\mathcal{M}_2}[s_1] ,\ldots\!,\! \nu_{\mathcal{M}_2}[s_n]].\end{align*}$$It follows that $\nu _{\mathcal {M}_1}[e]=\nu _{\mathcal {M}_2}[e]$ .
-
-
The invariance relation: $\succ $
-
•
-
– To show the absoluteness (invariance with respect to $\emptyset $ ) of $s\in t$ , let $\nu $ be an assignment in $\mathcal {M}_1$ . Assume first that $\mathcal {M}_1,\nu \models s\in t$ . Then $\nu _{\mathcal {M}_1}[s]\in ^{I_1}\nu _{\mathcal {M}_1}[t]$ . It follows by the induction hypotheses for s and t and the fact that $\mathcal {M}_2$ is an $\in $ -extension of $\mathcal {M}_1$ that $\nu _{\mathcal {M}_2}[s]\in ^{I_2}\nu _{\mathcal {M}_2}[t]$ too. Hence $\mathcal {M}_2,\nu \models s\in t$ . The proof of the converse (i.e., that if $\mathcal {M}_2,\nu \models s\in t$ then $\mathcal {M}_1,\nu \models s\in t$ ) is similar.
-
– We leave the simpler proof that $s=t$ is absolute to the reader.
-
– The case $e=P_{\varphi }(t_1,\ldots \!,\! t_n)$ is immediate from the induction hypothesis for $\varphi $ , and the $\varphi $ -instance of [PrI] (which is in $\mathbf {PW}_e$ ).
-
-
•
-
– That $x\neq x$ is invariant with respect to x follows from the fact that for every $\mathcal {M}=\langle \mathcal {D},I \rangle $ and $\nu $ , $[a:\mathcal {D}\mid \mathcal {M},\nu \{x:=a\}\models x\neq x]=\emptyset $ .
-
– Let e be the formula $x= t$ , where $x\not \in Fv(t)$ . Then the collection $[a\in \mathcal {D}_1\mid \mathcal {M}_1,\nu \{x:=a\}\models x=t]$ is the singleton of $\nu _{\mathcal {M}_1}[t]$ , while $[a\in \mathcal {D}_2\mid \mathcal {M}_2,\nu \{x:=a\}\models x=t]$ is the singleton of $\nu _{\mathcal {M}_2}[t]$ . Hence the induction hypothesis for t implies that the two sets are equal.
-
– Let e be the formula $x\in t$ , where $x\not \in Fv(t)$ . Then
$$ \begin{align*}[a\in\mathcal{D}_1\mid \mathcal{M}_1,\nu\{x:=a\}\models x\in t]=[a\in\mathcal{D}_1\mid a\in^{\mathcal{M}_1}\nu_{\mathcal{M}_1}[t]],\end{align*} $$$$ \begin{align*}[a\in\mathcal{D}_2\mid \mathcal{M}_2,\nu\{x:=a\}\models x\in t]=[a\in\mathcal{D}_2\mid a\in^{\mathcal{M}_2}\nu_{\mathcal{M}_2}[t]].\end{align*} $$Since $\nu _{\mathcal {M}_1}[t]=\nu _{\mathcal {M}_2}[t]$ by the induction hypothesis for t, these two equations and the fact that $\mathcal {M}_2$ is an $\in $ -extension of $\mathcal {M}_1$ imply:$$ \begin{align*}&[a\in\mathcal{D}_1\mid \mathcal{M}_1,\nu\{x:=a\}\models x\in t]\\&\quad=[a\in\mathcal{D}_2\mid \mathcal{M}_2,\nu\{x:=a\}\models x\in t].\end{align*} $$
-
-
• The proof that if $\varphi $ is absolute then so is
-
•