1. Primitive Recursion
Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
Micro History:
The amazing success of the calculus in physics,
together with its free-wheeling early practice (e.g., infinitesimals and
adding non-convergent series), encouraged attention to foundations and
consistency in the late 19th century.
This concern led by stages to analysis based on set theory which, as David
Hilbert observed, traded the infinitely small infinitesimals of the 17th
century for limits over infinitely long sequences.
Russell showed that naive set theory's comprehension schema
{x: F(x)}
yields inconsistency:
{x: x ∉ x} ∈ {x: x ∉ x} ⇔ {x: x ∉ x} ∉ {x: x ∉ x}.
Zermelo's set theory blocked this inference by constraining the comprehension
schema to elements of a set:
{x ∈ S: F(x)}.
Other ways of blocking the argument are possible (e.g., type theory) and
it is hard to see how this particular contradiction can return. But
the suspicion remained that a more sophisticated contradiction was lurking
elsewhere. And the original point of the foundational project was to rid the
calculus of contradictions!
Finitists urged that the questionable infinities should be jettisoned
from mathematics altogether, in favor of elementary, effectively decidable
relations on the natural numbers.
David Hilbert had a less radical proposal. Talk about all
the infinity you want, so long as you end up getting arithmetic right and
you show by uncontroversial finitistic means that you can never prove a
contradiction from your assumptions. Thus, one could retain the convenience
of proofs in analysis without swallowing the risk. To establish consistency,
Hilbert proposed coding set theoretical proofs as numbers and proof procedures
as relations among numbers. By means of the coding, provability could
be defined. Then one would only have to show in finitistic arithmetic
that no proof could ever lead to both p and not p.
Gödel showed that the
program cannot be carried out (interesting restricted versions are
still being carried out). In particular, he argued that even the
consistency of arithmetic cannot be shown by finitistic means. To
give a rigorous argument, Gödel
needed a mathematical specification of finitistic means (to show something
is not in a set, you need to specify the set). Gödel
proposed the primitive recursive (rekursiv) functions as an explication
of finitistic means in his theorem.
Unfortunately, there are obviously effective functions that are not
primitive recursive. To strengthen Gödel's
incompleteness theorem, one would like to prove it for a class of functions
that arguably includes all intuitively effective functions. This
led to the partial recursive functions, which result from closing the primitive
recursive functions under the additional operation of minimization:
g(x) = the least y such that f(x, y) = 0.
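In modern terms, minimization is just an unbounded search. Here is a minimal sketch in Python (the helper name minimize is mine, and f is assumed total); it also makes plain why the resulting functions are only partial: the search may never halt.

    def minimize(f):
        # g(x) = the least y such that f(x, y) = 0; loops forever if there is none.
        def g(x):
            y = 0
            while f(x, y) != 0:
                y += 1
            return y
        return g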
Many other ad hoc definitions of computability extending the primitive
recursive functions were proposed. It was soon discovered that they
were all equivalent. This provided a kind of weak empirical evidence
that the partial recursive functions cannot be extended by adding any more
intuitively effective calculational techniques, but then again all the
proposals were fairly simple.
The Turing machine was introduced by Alan Turing to show that all intuitively
effective (step-by-step algorithmic) functions are partial recursive.
Turing's argument was that an arbitrary mathematician following an algorithm
could be philosophically reduced, step by step, to a Turing machine.
This argument was taken to establish Church's [Turing's] thesis: all intuitively
effective functions are partial recursive. The converse is considered
more or less evident since all partial recursive functions have step by
step procedures by definition.
After this, when real digital computers became more interesting and
mathematicians ceased to care much about foundations, the theory took on
a life of its own. Much of the classical formulation of the theory,
including the recursion (fixpoint) theorems and the arithmetical and analytical
hierarchies, is due to Stephen Kleene.
Nowadays, applications range from the very applied (Complexity, NP-completeness,
languages and automata, cryptography) to the cognitive (production systems,
AI, neural networks) to the philosophical (functionalism, learnability)
to the original topic of mathematical foundations (analytical and projective
sets, independence, and the axiom of determinacy, proof theory, reverse
mathematics).
The Primitive Recursive Functions:
We start with a basic stock of functions that don't
scare anybody.
Basic functions:
Zero function: o(x) = 0.
Successor: s(x) = x + 1.
Projection (picking out an argument): pi,k(x1, ..., xk) = xi.
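To make the definitions concrete, here is a minimal sketch of the basic functions in Python (the names zero, succ, and proj are mine, not part of the formal development; arguments are natural numbers):

    def zero(x):            # o(x) = 0
        return 0

    def succ(x):            # s(x) = x + 1
        return x + 1

    def proj(i, k):         # p_{i,k}(x_1, ..., x_k) = x_i  (1-indexed)
        def p(*args):
            assert len(args) == k
            return args[i - 1]
        return p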
Next we consider effective ways of building new
functions out of old ones. Note that these are operations on functions,
not numbers.
Basic operations on functions:
Composition: If f has arity m and each gi has arity k, then C(f, g1, ..., gm) denotes the unique k-ary function h such that for each k-vector x:
h(x) = f(g1(x), ..., gm(x)).
Primitive Recursion: If f has arity k + 2 and g has arity k, then R(g, f) denotes the unique (k + 1)-ary function h such that for each k-vector y:
h(0, y) = g(y);
h(x + 1, y) = f(h(x, y), x, y).
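As a sketch in the same illustrative Python as above, C and R can be written as higher-order functions that take functions and return functions; this only mirrors the definitions and is not itself part of the theory.

    def C(f, *gs):
        # Composition: C(f, g1, ..., gm)(x) = f(g1(x), ..., gm(x))
        def h(*x):
            return f(*(g(*x) for g in gs))
        return h

    def R(g, f):
        # Primitive recursion: h(0, y) = g(y); h(x + 1, y) = f(h(x, y), x, y)
        def h(x, *y):
            value = g(*y)
            for t in range(x):
                value = f(value, t, *y)
            return value
        return h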
In order to cover the (awkward) case in which y is empty (so g is 0-ary), we adopt the convention that the constant n is a 0-ary constant function. Then in the case of unary primitive recursion:
h(0) = n();
h(x + 1) = f(h(x), x),
we may write h = R(n, f), where n = n() is understood to be a 0-ary function, and hence R is the same operation on functions as in the case of arity > 1. This trick will eliminate a messy case in each inductive proof. Note that this convention extends to composition. If the component functions are 0-ary, C(f, n1, ..., nk) = f(n1, ..., nk). If the outer function is 0-ary n, then C(n) = n.
The set of primitive recursive functions
PR = the least set X such that
- the 0-ary functions are all in X;
- the basic functions are in X; and
- X is closed under C and R.
Notice we are already looking at closing a collection
under some operations, so a hierarchy shouldn't be far away.
Primitive Recursive Derivations
Each f in PR can be defined from the basic
functions using operators C and R. We may think of the basic functions
invoked as leaves in a tree whose non-terminal nodes are labelled with
C and R. Nodes labelled by C may have any number of daughters and
nodes labelled by R always have two daughters. We may think of this
tree as a program for computing the function so defined. We will
do several proofs by induction on the depth of this tree. This is
similar to doing induction proofs in logic on the depth of embedding of
the logical connectives in a formula.
Primitive Recursive Relations
A relation R(x) is primitive recursive just in case its characteristic function:
CharR(x) = 1 if R(x);
CharR(x) = 0 if ¬R(x)
is primitive recursive. We will simplify notation
by letting the relation stand for its own characteristic function when
no confusion results.
A Stockpile of PR Functions
This looks like a pretty simple programming language.
But only a few primitive recursions are required to produce most functions
you would ever care about. It is a beautiful thing to see how explosively
it happens (cf. Cutland p. 36).
Constant functions ci(x) = i. Observe that
c0(x) = 0;
cn+1(x) = s(cn(x)) = C(s, cn)(x).
Hence cn+1 = C(s, cn), so each ci is PR.
Notice, this is a recursive definition of a family of primitive recursive functions ci(x), not a primitive recursive definition of a single primitive recursive function c(i, x). In the lingo, a definition that puts a parameter into the argument list of a function is said to be uniform in that parameter. Thus, c(i, x) is uniform in i, whereas ci(x) is not.
Addition: I am going to go through this example in complete
detail, showing the heuristic procedure for arriving at a primitive recursive
derivation of a function.
Sensible way to write:
y + 0 = y;
y + (x + 1) = (y + x) + 1.
Rewrite + and successor in prefix notation:
+(0, y) = y;
+(x + 1, y) = s(+(x, y)).
This is not the required form for primitive recursion.
We need a unary function g(y) on the right hand side of the base case and a ternary function f(+(x, y), x, y) on the right hand side in the inductive case. Compose in projections to get the argument lists in the required form:
+(0, y) = p1,1(y);
+(x + 1, y) = s(p1,3(+(x, y), x, y)).
Now be painfully literal about composition being an operator on a fixed list of arguments to arrive at the required form for primitive recursion:
+(0, y) = p1,1(y);
+(x + 1, y) = C(s, p1,3)(+(x, y), x, y).
Now apply the primitive recursion operator:
+ = R(p1,1, C(s, p1,3)).
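With the illustrative combinators sketched above, the derivation can actually be run as a sanity check (assuming R iterates on its first argument, as in the sketch):

    plus = R(proj(1, 1), C(succ, proj(1, 3)))   # + = R(p1,1, C(s, p1,3))

    print(plus(3, 4))   # 7; +(x, y) recurses on x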
Now I'll let you verify that the following derivations
work:
Multiplication:
· = R(o, C(+, p1,3, p3,3)) = R(o, C(R(p1,1, C(s, p1,3)), p1,3, p3,3)).
Exponentiation: How many nested primitive recursions occur in the following derivation?
Exp = R(c1, C(·, p1,3, p3,3)) = R(C(s, o), C(R(o, C(R(p1,1, C(s, p1,3)), p1,3, p3,3)), p1,3, p3,3)).
Decrement: Dec(x) = 0 if x = 0 and Dec(x) = x - 1 otherwise. We get this by a sneaky application of R.
Dec(0) = 0;
Dec(x + 1) = x = p2,2(Dec(x), x).
Thus, in light of the convention that constants are 0-ary PR functions, Dec = R(0, p2,2).
Cutoff subtraction: This results from another sneaky application of R:
y -. 0 = y;
y -. (x + 1) = Dec(y -. x).
Hence:
-. = R(p1,1, C(Dec, p1,3)) = R(p1,1, C(R(0, p2,2), p1,3)).
Factorial:
0! = 1;
(x + 1)! = x! · (x + 1),
so
! = R(1, C(·, p1,2, C(s, p2,2))).
I think we have all seen enough of official derivations. From now
on I will just write the obvious recurrence equations.
Signature:
sg(0) = 0;
sg(x + 1) = 1.
Reverse signature:
-sg(0) = 1;
-sg(x + 1) = 0.
Identity:
=(x, y) = -sg((x -. y) + (y -. x)).
Ordering:
>(x, y) = sg(x -. y).
Min:
min(x, y) = x -. (x -. y).
Max:
max(x, y) = x + (y -. x).
Absolute difference:
|x - y| = (x -. y) + (y -. x).
Remainder when x is divided by y: Incrementing the numerator x increments the remainder until the remainder is incremented up to y, when the remainder drops to 0 because another factor of y can be taken out of x. The following program implements this idea by using sg to add 1 if incrementing the previous remainder doesn't climb all the way up to y (i.e., if the difference |y - (rm(x, y) + 1)| is nonzero) and 0 otherwise. Observe that in the case of division by 0 this condition is always met (the difference is always nonzero), so rm(x, 0) = x. This causes no harm since we never actually use the function in this case, so we can conventionally define division by 0 to be whatever we want.
rm(0, y) = 0;
rm(x + 1, y) = (rm(x, y) + 1) · sg(|y - (rm(x, y) + 1)|).
Do you see how to get the messy composition into
the required form f(rm(x, y), x, y)?
Quotient: qt(x, y) = the greatest natural lower bound of x/y. The idea here is that we get a bigger quotient each time the numerator is incremented to include another factor of y. This can be checked by incrementing the previous remainder and checking whether the result is y.
qt(0, y) = 0;
qt(x + 1, y) = qt(x, y) + -sg(|y - (rm(x, y) + 1)|).
Divisibility: x | y is the relation "x divides y evenly".
x | y = -sg(rm(y, x)).
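The recurrences for remainder, quotient, and divisibility translate directly into code. This sketch writes them as ordinary recursions rather than official R-derivations; sg does the work of the conditional, and the helper names are mine.

    def sg(x):  return 0 if x == 0 else 1
    def rsg(x): return 1 - sg(x)          # reverse signature -sg

    def rm(x, y):                         # remainder when x is divided by y
        if x == 0:
            return 0
        r = rm(x - 1, y) + 1
        return r * sg(abs(y - r))         # drops back to 0 when r reaches y

    def qt(x, y):                         # greatest natural lower bound of x/y
        if x == 0:
            return 0
        return qt(x - 1, y) + rsg(abs(y - (rm(x - 1, y) + 1)))

    def divides(x, y):                    # x | y = -sg(rm(y, x))
        return rsg(rm(y, x))

    print(rm(17, 5), qt(17, 5), divides(5, 15))   # 2 3 1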
Derived PR Operators
Wait a minute! The construction rules are tedious. (Think about using sg as a conditional branch.) Wouldn't it be better to prove that PR is closed under more operators? Then we could use these as operators to construct more PR functions in a more natural way instead of always putting them into this clunky form. (That's why closure laws are so desirable.) PR is closed under each of the following operators.
Substitution of a constant: If f(x1, ..., xi, ..., xn) is PR, then so is h(x1, ..., xn) = f(x1, ..., k, ..., xn), with the constant k in the ith position. The idea is that we can compose in appropriate projections and a constant function:
h(x1, ..., xn) = f(p1,n(x1, ..., xn), ..., ck(pi,n(x1, ..., xn)), ..., pn,n(x1, ..., xn)).
Sum: Let any indexed collection {f_z : z ∈ N} be given. Think of Σ_{z ≤ x} f_z(y) = h(x, y) as a function of x and y. Then we may think of it also as an operation on the first x + 1 functions in the indexed set:
Σ_{z ≤ x} f_z(y) = Σ_x(f_0, ..., f_x)(y).
We can establish that the operation Σ_x preserves PR as follows:
Σ_{z ≤ 0} f_z(y) = f_0(y);
Σ_{z ≤ x+1} f_z(y) = Σ_{z ≤ x} f_z(y) + f_{x+1}(y).
Bounded Sum: The preceding construction is non-uniform in x. There is also a uniform version of bounded summation on a single function. Think of g(x, y) = Σ_{z ≤ x} f(z, y) as a function of x and y.
Σ_{z ≤ 0} f(z, y) = f(0, y);
Σ_{z ≤ x+1} f(z, y) = Σ_{z ≤ x} f(z, y) + f(x + 1, y).
Product and Bounded Product: Similar to the sum case.
Π_{z ≤ 0} f_z(y) = f_0(y);
Π_{z ≤ x+1} f_z(y) = Π_{z ≤ x} f_z(y) · f_{x+1}(y).
Π_{z ≤ 0} f(z, y) = f(0, y);
Π_{z ≤ x+1} f(z, y) = Π_{z ≤ x} f(z, y) · f(x + 1, y).
Definition by Cases: Let the Pi be mutually exclusive and exhaustive PR relations.
[g1(x) if P1(x); g2(x) if P2(x); ...; gk(x) if Pk(x)] = Σ_{1 ≤ z ≤ k} gz(x) · Pz(x).
Logical Operators
Conjunction:
P(x) ∧ Q(x) = P(x) · Q(x).
Negation:
¬P(x) = -sg(P(x)).
Disjunction:
P(x) ∨ Q(x) = ¬(¬P(x) ∧ ¬Q(x)) = max(P(x), Q(x)).
Conditional:
P(x) → Q(x) = ¬P(x) ∨ Q(x).
Biconditional:
P(x) ↔ Q(x) = (P(x) → Q(x)) ∧ (Q(x) → P(x)).
Bounded universal quantifier:
(∀z ≤ x) P(z, y) = Π_{z ≤ x} P(z, y).
Bounded existential quantifier:
(∃z ≤ x) P(z, y) = sg(Σ_{z ≤ x} P(z, y)).
Bounded Minimization:
g(x, y) = (μz ≤ x)[f(z, y) = 0] = "the least z ≤ x such that f(z, y) = 0".
By this we mean that if no root of f is found up to the bound, we return the bound to keep the function total. If the value returned under search bound n is strictly less than n, then a root of f was found, so we don't increment when the search bound is raised to n + 1. If the value returned under search bound n is identical to n, then either we (coincidentally) found a root at the last minute, or we stopped because we hit the bound. So we have to check, further, whether the bound is a root in that case. Thus, we increment just in case the previous search hit the bound and the bound is not a root.
(μz ≤ 0)[f(z, y) = 0] = 0;
(μz ≤ x+1)[f(z, y) = 0] = (μz ≤ x)[f(z, y) = 0] + [(μz ≤ x)[f(z, y) = 0] = x ∧ f(x, y) > 0].
(μz ≤ x)[P(z, y)] = (μz ≤ x)[-sg(CharP(z, y)) = 0].
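Here is a small sketch of bounded minimization following the recurrence above (the helper name bounded_min is mine; a plain loop would do the same job, but the recursion mirrors the PR derivation):

    def bounded_min(f, x, *y):
        # (mu z <= x)[f(z, y) = 0]; returns the bound x if no root is found
        if x == 0:
            return 0
        prev = bounded_min(f, x - 1, *y)
        if prev == x - 1 and f(x - 1, *y) > 0:
            return prev + 1
        return prev

    print(bounded_min(lambda z: abs(z * z - 9), 10))   # 3, the least z <= 10 with z*z = 9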
Bounded Maximization:
(max z ≤ x)[P(z, y)] = x -. (μz ≤ x)[P(x -. z, y)].
Pure Iteration:
f^0(y) = y;
f^{x+1}(y) = f(f^x(y)).
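As a sketch, pure iteration is just a loop (the helper name iterate is mine):

    def iterate(f, x, y):
        # f^x(y): apply f to y, x times
        for _ in range(x):
            y = f(y)
        return y

    print(iterate(lambda n: 2 * n, 5, 1))   # 32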
Exercise 0: Show that for each
PR function there is a monotone PR function that is everywhere greater.
Some Number Theoretical Constructions
With our new operations we are on Easy Street. The
Prime number predicate is programmed just by writing its logical definition!
How's that for a handy computer language?
Primality:
Prime(x) = 1 < x ∧ (∀z < x)[z = 1 ∨ ¬(z | x)].
The first prime after x: This one uses Euclid's theorem that there exists at least one prime p such that x < p ≤ x! + 1. Why is that? Well, x! + 1 has some prime factor ≤ x! + 1 by the uniqueness and existence of prime factorizations (the fundamental theorem of arithmetic). But x! + 1 has no factor other than 1 that is ≤ x, since each such divisor leaves remainder 1. Hence, x! + 1 has a prime factor k with x < k ≤ x! + 1.
h(x) = (μz ≤ x! + 1)[Prime(z) ∧ x < z].
The xth prime:
p(0) = 2;
p(x + 1) = h(p(x)).
Exponent of the xth prime in the prime factorization of y: How do we know it's unique? How do we know the search bound is high enough? (Show by induction that 2^y > y; 2 is the lowest possible prime factor of y, so we're safe.)
[y]_x = (max z ≤ y)[p(x)^z | y].
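Here is a sketch of the prime machinery with ordinary loops in place of official PR derivations (the names prime, next_prime, nth_prime, and exponent are mine); it keeps the x! + 1 search bound from Euclid's theorem:

    from math import factorial

    def prime(x):
        # characteristic function of primality
        return 1 if x > 1 and all(x % z != 0 for z in range(2, x)) else 0

    def next_prime(x):                    # h(x): least prime z with x < z <= x! + 1
        bound = factorial(x) + 1
        for z in range(x + 1, bound + 1):
            if prime(z):
                return z
        return bound                      # unreachable, but keeps the function total

    def nth_prime(x):                     # p(0) = 2; p(x + 1) = h(p(x))
        p = 2
        for _ in range(x):
            p = next_prime(p)
        return p

    def exponent(y, x):                   # [y]_x: exponent of p(x) in y (assumes y >= 1)
        e = 0
        while y % nth_prime(x) ** (e + 1) == 0:
            e += 1
        return e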
Goedel Numbering Finite Sequences
Let x be an n-ary sequence. Define the Gödel coding:
<x> = p(0)^(1 + x[0]) · p(1)^(1 + x[1]) · ... · p(n - 1)^(1 + x[n - 1]).
Note: I will use square brackets when double subscripts or double superscripts are required.
This coding is unique by the uniqueness of prime factorizations (the
fundamental theorem of arithmetic). It is onto by the existence of prime
factorizations. The reason for adding 1 to the exponents is to distinguish
zero positions from empty positions. To decode a number as a sequence,
we have to remember to remove the extra 1.
Decoding: d(y, x) = (y)_x = the item occurring in the xth position in the sequence coded by y. The standard notation is (y)_x, but this gives rise to double and triple subscripts, so I introduce the alternative notation d(y, x). To keep the function total, we take the first position to be position 0.
d(y, x) = (y)_x = [y]_x -. 1.
Length of decoded sequence (number of consecutive prime factors of x):
Using the convention that every position in the sequence corresponds to a prime factor, we can simply search for the first non-factor of the code number. Since we count prime factors starting with 0, this corresponds to the length of the coded sequence.
lh(x) = (μz ≤ x)[¬(p(z) | x)].
Proposition: if x is non-empty then:
x = ((<x>)_0, (<x>)_1, ..., (<x>)_{lh(<x>) - 1}).
Don't forget that length counts the number of items occurring, rather than
giving the position (starting from 0) of the last item occurring.
That's why we have to subtract 1 from the length in this result.
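Continuing the sketch (and reusing the nth_prime and exponent helpers from the previous block, which are my names, not official notation), the coding, decoding, and length functions look like this:

    def encode(seq):                      # <x> = product of p(i)^(1 + x[i])
        code = 1
        for i, v in enumerate(seq):
            code *= nth_prime(i) ** (1 + v)
        return code

    def decode(y, x):                     # d(y, x) = (y)_x = [y]_x -. 1
        return max(exponent(y, x) - 1, 0)

    def lh(y):                            # position of the first non-factor prime
        z = 0
        while y % nth_prime(z) == 0:
            z += 1
        return z

    c = encode([3, 0, 2])                 # 2^4 * 3^1 * 5^3 = 6000
    print(lh(c), [decode(c, i) for i in range(lh(c))])   # 3 [3, 0, 2]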
Exercise 1: Bijective Binary Sequence Encoding:
Define the binary primitive recursive function:
<x, y> = 1/2[(x + y)(x + y + 1)] + x.
- This coding is evidently PR. Show that it is a bijection N² → N. Hint: this is just a polynomial expression for the obvious enumeration procedure. Think of the pairs as being presented in an N × N array with (0, 0) at the upper left. Given (x, y), recover the code number by counting pairs, starting at (0, 0), along diagonals, from the upper right to the lower left.
- Show that 1/2[(x + y)(x + y + 1)] = Σ_{t ≤ x+y} t = the number of pairs (z, w) occurring in diagonals to the upper left of (x, y).
- Show that x = the number of pairs remaining to be counted to the upper right of (x, y).
- Show that the decoding functions are also PR.
- Use the preceding results and codings to produce n-ary PR encodings and decodings.
Fancy Recursion
One reason codings are nice is that they give us
new and more elegant forms of recursion for free. The basic idea is that
the coding allows primitive recursion to simulate the fancy form of recursion
by looking at successive code numbers of the current "computational state"
of the fancy form of recursion.
Exercise 2: Course-of-Values Recursion (Péter): Suppose that s1(x), ..., sk(x) ≤ x. Then CVR(g, f, s1, ..., sk) is the unique function h such that:
h(0, y) = g(y);
h(n + 1, y) = f(h(s1(n), y), ..., h(sk(n), y), n, y).
Use the Goedel coding to show that PR is closed under the CVR operator.
Exercise 3: Fibonacci Sequence: Show that the
following function is PR.
f(0) = 1;
f(1) = 1;
f(x + 2) = f(x) + f(x + 1).
Exercise 4: Simultaneous Recursion (Péter): SRi(g1, ..., gk, f1, ..., fk) is the unique function hi such that:
hi(0, y) = gi(y);
hi(n + 1, y) = fi(h1(n, y), ..., hk(n, y), n, y).
Notice that k is constant. Use the k-ary sequence encoding of exercise 1 to show that PR is closed under the SR operator.
Exercise 5: Recursion with parameter substitution: PS(g, f, s) is the unique h such that:
h(0, y) = g(y);
h(n + 1, y) = f(h(n, s(n, y)), n, y).
Show that PR is closed under the PS operation.
Hint: Trace the computation of h(3, y). You will find that you end up needing to know the values of the successive compositions:
y;
s(2, y);
s(1, s(2, y));
s(0, s(1, s(2, y)));
etc.
These can be obtained by a "backwards" recursion; or rather a forwards recursion that goes backwards in time!
S(0, n, y) = y;
S(t + 1, n, y) = s(n -. (t + 1), S(t, n, y)).
Thus,
S(0, 3, y) = y;
S(1, 3, y) = s(2, S(0, 3, y)) = s(2, y);
S(2, 3, y) = s(1, S(1, 3, y)) = s(1, s(2, y));
etc.
Now write a similar "backwards" recursion that applies f to the right values of S.
Exercise 6: Let f(x) = gg...g h ss...s(x), with x iterations of the g's and of the s's. Show that f is PR. Hint: use PS.
Breaking the PR barrier
We have lots of examples of recursion operations
that yield nothing new. This may lull you into assuming that every
kind of recursion can be massaged into primitive recursion by means of
suitable codings. But that would be a mistake, for there is an intuitively
effective recursion operator that generates non PR functions.
Double recursion: DR(g1, g2, g3, g4) is the unique h such that:
h(0, y, z) = g1(y, z);
h(x + 1, 0, z) = g2(h(x, c, z), x, z);
h(x + 1, y + 1, z) = g4(h(x + 1, y, z), h(g3(h(x + 1, y, z), x, y, z), x, z), x, y, z).
Double recursion allows one to apply a variable number of primitive recursions depending on the value of x. To see this, observe that:
- For a given value of x, the third clause decrements y down to 0.
- When y reaches 0, you hit the second clause, which decrements x to x - 1 and restarts the primitive recursion with y set to some fixed constant c.
- Finally, when x is decremented down to 0, we hit the first clause, which is the base case.
Since double recursion can simulate arbitrary numbers of primitive recursions, it is fairly intuitive that a double recursive function ought to be able to grow faster than any given PR function. If the function uses n primitive recursions at stage n, then for each number of primitive recursions, after some time it does something that would require at least one more primitive recursion.
The proof of this claim will wait a bit until we introduce the Grzegorczyk
hierarchy.
The Ackermann function (1928): The Ackermann function is historically the first example of an intuitively effective function that is not PR (cf. Rogers p. 8). The Ackermann function grows so fast that some skeptical finitistic mathematicians have denied that it is computable (or that it even exists). This sounds crazy until one considers that the exact values for small arguments can only be defined (in a book-length definition) by the function itself!
- a(0, 0, z) = z;
- a(0, y + 1, z) = a(0, y, z) + 1;
- a(1, 0, z) = 0;
- a(x + 2, 0, z) = 1;
- a(x + 1, y + 1, z) = a(x, a(x + 1, y, z), z).
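As a sketch, the clauses translate directly into a (very slow) recursive Python function; don't expect it to return for anything but tiny values of the first argument.

    def ack(x, y, z):
        # Ackermann's original three-argument function, clause by clause
        if x == 0:
            return z if y == 0 else ack(0, y - 1, z) + 1
        if y == 0:
            return 0 if x == 1 else 1
        return ack(x - 1, ack(x, y - 1, z), z)

    print(ack(2, 3, 2))   # 8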
Exercise 7: The Ackermann function is more intuitive than it looks at first. This will be apparent after you do the following.
- Show that the Ackermann function really does result from applying DR to PR functions. Don't let all those clauses scare you!
- Find simple primitive recursive definitions of each of the functions gi(x, y) = a(i, x, y). Prove that you are correct.
- Relate this fact to the point made about uniformity when we introduced bounded sums as an operator.
- What are g0, g1, and g2?
The Péter function: The Péter function is also not PR. It is a simplified version of Ackermann's function and is called the Ackermann function in Cutland (p. 46).
p(0, y) = y + 1;
p(x + 1, 0) = p(x, 1);
p(x + 1, y + 1) = p(x, p(x + 1, y)).
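A sketch of the Péter function in the same style (it, too, explodes quickly as the first argument grows):

    def peter(x, y):
        if x == 0:
            return y + 1
        if y == 0:
            return peter(x - 1, 1)
        return peter(x - 1, peter(x, y - 1))

    print(peter(2, 3))   # 9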
Even Fancier Recursion
We can generalize the idea of double recursion by adding recursive variables
and piling on new recursion conditions. So while double recursion
does a variable number of nested primitive recursions, triple recursion
allows for a variable number of double recursions, and so forth.
Each time we get something new.