1. Primitive Recursion
Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
Micro History:
The amazing success of the calculus in physics,
together with its free-wheeling early practice (e.g., infinitesimals and
adding non-convergent series), encouraged attention to foundations and
consistency in the late 19th century.
This concern led by stages to analysis based on set theory which, as David
Hilbert observed, traded the infinitely small infinitesimals of the 17th
century for limits over infinitely long sequences.
Russell showed that naive set theory's comprehension schema
{x: F(x)}
yields inconsistency:
{x: x ∉ x} ∈ {x: x ∉ x} ⇔ {x: x ∉ x} ∉ {x: x ∉ x}.
Zermelo's set theory blocked this inference by constraining the comprehension
schema to elements of a set:
{x ∈ S: F(x)}.
Other ways of blocking the argument are possible (e.g., type theory) and
it is hard to see how this particular contradiction can return. But
the suspicion remained that a more sophisticated contradiction was lurking
elsewhere. And the original point of the foundational project was to rid the
calculus of contradictions!
Finitists urged that the questionable infinities should be jettisoned
from mathematics altogether, in favor of elementary, effectively decidable
relations on the natural numbers.
David Hilbert had a less radical proposal. Talk about all
the infinity you want, so long as you end up getting arithmetic right and
you show by uncontroversial finitistic means that you can never prove a
contradiction from your assumptions. Thus, one could retain the convenience
of proofs in analysis without swallowing the risk. To establish consistency,
Hilbert proposed coding set theoretical proofs as numbers and proof procedures
as relations among numbers. By means of the coding, provability could
be defined. Then one would only have to show in finitistic arithmetic
that no proof could ever lead to both p and not p.
Gödel showed that the
program cannot be carried out (interesting restricted versions are
still being carried out). In particular, he argued that even the
consistency of arithmetic cannot be shown by finitistic means. To
give a rigorous argument, Gödel
needed a mathematical specification of finitistic means (to show something
is not in a set, you need to specify the set). Gödel
proposed the primitive recursive (rekursiv) functions as an explication
of finitistic means in his theorem.
Unfortunately, there are obviously effective functions that are not
primitive recursive. To strengthen Gödel's
incompleteness theorem, one would like to prove it for a class of functions
that arguably includes all intuitively effective functions. This
led to the partial recursive functions, which result from closing the primitive
recursive functions under the additional operation of minimization:
g(x) = the least y such that f(x, y) = 0.
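In modern terms, minimization is just an unbounded search. Here is a minimal sketch in Python (the helper name minimize is mine, and f is assumed total); it also makes plain why the resulting functions are only partial: the search may never halt.

    def minimize(f):
        # g(x) = the least y such that f(x, y) = 0; loops forever if there is none.
        def g(x):
            y = 0
            while f(x, y) != 0:
                y += 1
            return y
        return g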
Many other ad hoc definitions of computability extending the primitive
recursive functions were proposed. It was soon discovered that they
were all equivalent. This provided a kind of weak empirical evidence
that the partial recursive functions cannot be extended by adding any more
intuitively effective calculational techniques, but then again all the
proposals were fairly simple.
The Turing machine was introduced by Alan Turing to show that all intuitively
effective (step-by-step algorithmic) functions are partial recursive.
Turing's argument was that an arbitrary mathematician following an algorithm
could be philosophically reduced, step by step, to a Turing machine.
This argument was taken to establish Church's [Turing's] thesis: all intuitively
effective functions are partial recursive. The converse is considered
more or less evident since all partial recursive functions have step by
step procedures by definition.
After this, when real digital computers became more interesting and
mathematicians ceased to care much about foundations, the theory took on
a life of its own. Much of the classical formulation of the theory,
including the recursion (fixpoint) theorems and the arithmetical and analytical
hierarchies, is due to Stephen Kleene.
Nowadays, applications range from the very applied (Complexity, NP-completeness,
languages and automata, cryptography) to the cognitive (production systems,
AI, neural networks) to the philosophical (functionalism, learnability)
to the original topic of mathematical foundations (analytical and projective
sets, independence, and the axiom of determinacy, proof theory, reverse
mathematics).
The Primitive Recursive Functions:
We start with a basic stock of functions that don't
scare anybody.
Basic functions:
Zero function: o(x) = 0.
Successor: s(x) = x + 1.
Projection (picking out an argument): pi,k(x1, ..., xk) = xi.
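To make the definitions concrete, here is a minimal sketch of the basic functions in Python (the names zero, succ, and proj are mine, not part of the formal development; arguments are natural numbers):

    def zero(x):            # o(x) = 0
        return 0

    def succ(x):            # s(x) = x + 1
        return x + 1

    def proj(i, k):         # p_{i,k}(x_1, ..., x_k) = x_i  (1-indexed)
        def p(*args):
            assert len(args) == k
            return args[i - 1]
        return p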
Next we consider effective ways of building new
functions out of old ones. Note that these are operations on functions,
not numbers.
Basic operations on functions:
Composition: If f has arity m and each gi has arity k, then C(f, g1, ..., gm) denotes the unique k-ary function h such that for each k-vector x:
h(x) = f(g1(x), ..., gm(x)).
Primitive Recursion: If f has arity k + 2 and g has arity k, then R(g, f) denotes the unique (k + 1)-ary function h such that for each k-vector y:
h(0, y) = g(y);
h(x + 1, y) = f(h(x, y), x, y).
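As a sketch in the same illustrative Python as above, C and R can be written as higher-order functions that take functions and return functions; this only mirrors the definitions and is not itself part of the theory.

    def C(f, *gs):
        # Composition: C(f, g1, ..., gm)(x) = f(g1(x), ..., gm(x))
        def h(*x):
            return f(*(g(*x) for g in gs))
        return h

    def R(g, f):
        # Primitive recursion: h(0, y) = g(y); h(x + 1, y) = f(h(x, y), x, y)
        def h(x, *y):
            value = g(*y)
            for t in range(x):
                value = f(value, t, *y)
            return value
        return h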
In order to cover the (awkward) case in which y is empty (so g is 0-ary), we adopt the convention that the constant n is a 0-ary constant function. Then in the case of unary primitive recursion:
h(0) = n();
h(x + 1) = f(h(x), x),
we may write h = R(n, f), where n = n() is understood to be a 0-ary function, and hence R is the same operation on functions as in the case of arity > 1. This trick will eliminate a messy case in each inductive proof. Note that this convention extends to composition. If the component functions are 0-ary, C(f, n1, ..., nk) = f(n1, ..., nk). If the outer function is 0-ary n, then C(n) = n.
The set of primitive recursive functions
PR = the least set X such that
- the 0-ary functions are all in X;
- the basic functions are in X; and
- X is closed under C and R.
Notice we are already looking at closing a collection
under some operations, so a hierarchy shouldn't be far away.
Primitive Recursive Derivations
Each f in PR can be defined from the basic
functions using operators C and R. We may think of the basic functions
invoked as leaves in a tree whose non-terminal nodes are labelled with
C and R. Nodes labelled by C may have any number of daughters and
nodes labelled by R always have two daughters. We may think of this
tree as a program for computing the function so defined. We will
do several proofs by induction on the depth of this tree. This is
similar to doing induction proofs in logic on the depth of embedding of
the logical connectives in a formula.
Primitive Recursive Relations
A relation R(x) is primitive recursive just in case its characteristic function:
CharR(x) = 1 if R(x);
CharR(x) = 0 if ¬R(x)
is primitive recursive. We will simplify notation
by letting the relation stand for its own characteristic function when
no confusion results.
A Stockpile of PR Functions
This looks like a pretty simple programming language.
But only a few primitive recursions are required to produce most functions
you would ever care about. It is a beautiful thing to see how explosively
it happens (cf. Cutland p. 36).
Constant functions ci(x) = i. Observe that
c0(x) = 0;
cn+1(x) = s(cn(x)) = C(s, cn)(x).
Hence cn+1 = C(s, cn), so each ci is PR.
Notice, this is a recursive definition of a family of primitive recursive functions ci(x), not a primitive recursive definition of a single primitive recursive function c(i, x). In the lingo, a definition that puts a parameter into the argument list of a function is said to be uniform in that parameter. Thus, c(i, x) is uniform in i, whereas ci(x) is not.
Addition: I am going to go through this example in complete
detail, showing the heuristic procedure for arriving at a primitive recursive
derivation of a function.
Sensible way to write:
y + 0 = y;
y + (x + 1) = (y + x) + 1.
Rewrite + and successor in prefix notation:
+(0, y) = y;
+(x + 1, y) = s(+(x, y)).
This is not the required form for primitive recursion.
We need a unary function g(y) on the right hand side of the base case and a ternary function f(+(x, y), x, y) on the right hand side in the inductive case. Compose in projections to get the argument lists in the required form:
+(0, y) = p1,1(y);
+(x + 1, y) = s(p1,3(+(x, y), x, y)).
Now be painfully literal about composition being an operator on a fixed list of arguments to arrive at the required form for primitive recursion:
+(0, y) = p1,1(y);
+(x + 1, y) = C(s, p1,3)(+(x, y), x, y).
Now apply the primitive recursion operator:
+ = R(p1,1, C(s, p1,3)).
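With the illustrative combinators sketched above, the derivation can actually be run as a sanity check (assuming R iterates on its first argument, as in the sketch):

    plus = R(proj(1, 1), C(succ, proj(1, 3)))   # + = R(p1,1, C(s, p1,3))

    print(plus(3, 4))   # 7; +(x, y) recurses on x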
Now I'll let you verify that the following derivations
work:
Multiplication:
· = R(o, C(+, p1,3, p3,3)) = R(o, C(R(p1,1, C(s, p1,3)), p1,3, p3,3)).
Exponentiation: How many nested primitive recursions occur in the following derivation?
Exp = R(c1, C(·, p1,3, p3,3)) = R(C(s, o), C(R(o, C(R(p1,1, C(s, p1,3)), p1,3, p3,3)), p1,3, p3,3)).
Decrement: Dec(x) = 0 if x = 0 and Dec(x) = x - 1 otherwise. We get this by a sneaky application of R.
Dec(0) = 0;
Dec(x + 1) = x = p2,2(Dec(x), x).
Thus, in light of the convention that constants are 0-ary PR functions, Dec = R(0, p2,2).
Cutoff subtraction: This results from another sneaky application of R:
y -. 0 = y;
y -. (x + 1) = Dec(y -. x).
Hence:
-. = R(p1,1, C(Dec, p1,3)) = R(p1,1, C(R(0, p2,2), p1,3)).
Factorial:
0! = 1;
(x + 1)! = x! · (x + 1),
so
! = R(1, C(·, p1,2, C(s, p2,2))).
I think we have all seen enough of official derivations. From now
on I will just write the obvious recurrence equations.
Signature:
sg(0) = 0;
sg(x + 1) = 1.
Reverse signature:
-sg(0) = 1;
-sg(x + 1) = 0.
Identity:
=(x, y) = -sg((x -. y) + (y -. x)).
Ordering:
>(x, y) = sg(x -. y).
Min:
min(x, y) = x -. (x -. y).
Max:
max(x, y) = x + (y -. x).
Absolute difference:
|x - y| = (x -. y) + (y -. x).
Remainder when x is divided by y: Incrementing the numerator x increments the remainder until the remainder is incremented up to y, when the remainder drops to 0 because another factor of y can be taken out of x. The following program implements this idea by using sg to add 1 if incrementing the previous remainder doesn't climb all the way up to y (i.e., if the difference |y - (rm(x, y) + 1)| is nonzero) and 0 otherwise. Observe that in the case of division by 0 this condition is always met (the difference is always nonzero), so rm(x, 0) = x. This causes no harm since we never actually use the function in this case, so we can conventionally define division by 0 to be whatever we want.
rm(0, y) = 0;
rm(x + 1, y) = (rm(x, y) + 1) · sg(|y - (rm(x, y) + 1)|).
Do you see how to get the messy composition into
the required form f(rm(x, y), x, y)?
Quotient: qt(x, y) = the greatest natural lower bound of x/y. The idea here is that we get a bigger quotient each time the numerator is incremented to include another factor of y. This can be checked by incrementing the previous remainder and checking whether the result is y.
qt(0, y) = 0;
qt(x + 1, y) = qt(x, y) + -sg(|y - (rm(x, y) + 1)|).
Divisibility: x | y is the relation "x divides y evenly".
x | y = -sg(rm(y, x)).
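The recurrences for remainder, quotient, and divisibility translate directly into code. This sketch writes them as ordinary recursions rather than official R-derivations; sg does the work of the conditional, and the helper names are mine.

    def sg(x):  return 0 if x == 0 else 1
    def rsg(x): return 1 - sg(x)          # reverse signature -sg

    def rm(x, y):                         # remainder when x is divided by y
        if x == 0:
            return 0
        r = rm(x - 1, y) + 1
        return r * sg(abs(y - r))         # drops back to 0 when r reaches y

    def qt(x, y):                         # greatest natural lower bound of x/y
        if x == 0:
            return 0
        return qt(x - 1, y) + rsg(abs(y - (rm(x - 1, y) + 1)))

    def divides(x, y):                    # x | y = -sg(rm(y, x))
        return rsg(rm(y, x))

    print(rm(17, 5), qt(17, 5), divides(5, 15))   # 2 3 1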
Derived PR Operators
Wait a minute! The construction rules are tedious. (Think about using sg as a conditional branch.) Wouldn't it be better to prove that PR is closed under more operators? Then we could use these as operators to construct more PR functions in a more natural way instead of always putting them into this clunky form. (That's why closure laws are so desirable.) PR is closed under each of the following operators.
Substitution of a constant: If f(x1, ..., xi, ..., xn) is PR, then so is h(x1, ..., xn) = f(x1, ..., k, ..., xn), with the constant k in the ith position. The idea is that we can compose in appropriate projections and a constant function:
h(x1, ..., xn) = f(p1,n(x1, ..., xn), ..., ck(pi,n(x1, ..., xn)), ..., pn,n(x1, ..., xn)).
Sum: Let any indexed collection {f_z : z ∈ N} be given. Think of Σ_{z ≤ x} f_z(y) = h(x, y) as a function of x and y. Then we may think of it also as an operation on the first x + 1 functions in the indexed set:
Σ_{z ≤ x} f_z(y) = Σ_x(f_0, ..., f_x)(y).
We can establish that the operation Σ_x preserves PR as follows:
Σ_{z ≤ 0} f_z(y) = f_0(y);
Σ_{z ≤ x+1} f_z(y) = Σ_{z ≤ x} f_z(y) + f_{x+1}(y).
Bounded Sum: The preceding construction is non-uniform in x. There is also a uniform version of bounded summation on a single function. Think of g(x, y) = Σ_{z ≤ x} f(z, y) as a function of x and y.
Σ_{z ≤ 0} f(z, y) = f(0, y);
Σ_{z ≤ x+1} f(z, y) = Σ_{z ≤ x} f(z, y) + f(x + 1, y).
Product and Bounded Product: Similar to the sum case.
Π_{z ≤ 0} f_z(y) = f_0(y);
Π_{z ≤ x+1} f_z(y) = Π_{z ≤ x} f_z(y) · f_{x+1}(y).
Π_{z ≤ 0} f(z, y) = f(0, y);
Π_{z ≤ x+1} f(z, y) = Π_{z ≤ x} f(z, y) · f(x + 1, y).
Definition by Cases: Let the Pi be mutually exclusive and exhaustive PR relations.
[g1(x) if P1(x); g2(x) if P2(x); ...; gk(x) if Pk(x)] = Σ_{1 ≤ z ≤ k} gz(x) · Pz(x).
Logical Operators
Conjunction:
P(x) ∧ Q(x) = P(x) · Q(x).
Negation:
¬P(x) = -sg(P(x)).
Disjunction:
P(x) ∨ Q(x) = ¬(¬P(x) ∧ ¬Q(x)) = max(P(x), Q(x)).
Conditional:
P(x) → Q(x) = ¬P(x) ∨ Q(x).
Biconditional:
P(x) ↔ Q(x) = (P(x) → Q(x)) ∧ (Q(x) → P(x)).
Bounded universal quantifier:
(∀z ≤ x) P(z, y) = Π_{z ≤ x} P(z, y).
Bounded existential quantifier:
(∃z ≤ x) P(z, y) = sg(Σ_{z ≤ x} P(z, y)).
Bounded Minimization:
g(x, y) = (μz ≤ x)[f(z, y) = 0] = "the least z ≤ x such that f(z, y) = 0".
By this we mean that if no root of f is found up to the bound, we return the bound to keep the function total. If the value returned under search bound n is strictly less than n, then a root of f was found, so we don't increment when the search bound is raised to n + 1. If the value returned under search bound n is identical to n, then either we (coincidentally) found a root at the last minute, or we stopped because we hit the bound. So we have to check, further, whether the bound is a root in that case. Thus, we increment just in case the previous search hit the bound and the bound is not a root.
(μz ≤ 0)[f(z, y) = 0] = 0;
(μz ≤ x+1)[f(z, y) = 0] = (μz ≤ x)[f(z, y) = 0] + [(μz ≤ x)[f(z, y) = 0] = x ∧ f(x, y) > 0].
(μz ≤ x)[P(z, y)] = (μz ≤ x)[-sg(CharP(z, y)) = 0].
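Here is a small sketch of bounded minimization following the recurrence above (the helper name bounded_min is mine; a plain loop would do the same job, but the recursion mirrors the PR derivation):

    def bounded_min(f, x, *y):
        # (mu z <= x)[f(z, y) = 0]; returns the bound x if no root is found
        if x == 0:
            return 0
        prev = bounded_min(f, x - 1, *y)
        if prev == x - 1 and f(x - 1, *y) > 0:
            return prev + 1
        return prev

    print(bounded_min(lambda z: abs(z * z - 9), 10))   # 3, the least z <= 10 with z*z = 9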
Bounded Maximization:
(max z ≤ x)[P(z, y)] = x -. (μz ≤ x)[P(x -. z, y)].
Pure Iteration:
f^0(y) = y;
f^{x+1}(y) = f(f^x(y)).
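As a sketch, pure iteration is just a loop (the helper name iterate is mine):

    def iterate(f, x, y):
        # f^x(y): apply f to y, x times
        for _ in range(x):
            y = f(y)
        return y

    print(iterate(lambda n: 2 * n, 5, 1))   # 32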
Exercise 0: Show that for each
PR function there is a monotone PR function that is everywhere greater.
Some Number Theoretical Constructions
With our new operations we are on Easy Street. The
Prime number predicate is programmed just by writing its logical definition!
How's that for a handy computer language?
Primality:
Prime(x) = 1 < x ∧ (∀z < x)[z = 1 ∨ ¬(z | x)].
The first prime after x: This one uses Euclid's theorem that there exists at least one prime p such that x < p ≤ x! + 1. Why is that? Well, x! + 1 has some prime factor ≤ x! + 1 by the uniqueness and existence of prime factorizations (the fundamental theorem of arithmetic). But x! + 1 has no factor other than 1 that is ≤ x, since each such divisor leaves remainder 1. Hence, x! + 1 has a prime factor k with x < k ≤ x! + 1.
h(x) = (μz ≤ x! + 1)[Prime(z) ∧ x < z].
The xth prime:
p(0) = 2;
p(x + 1) = h(p(x)).
Exponent of the xth prime in the prime factorization of y: How do we know it's unique? How do we know the search bound is high enough? (Show by induction that 2^y > y; 2 is the lowest possible prime factor of y, so we're safe.)
[y]_x = (max z ≤ y)[p(x)^z | y].
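Here is a sketch of the prime machinery with ordinary loops in place of official PR derivations (the names prime, next_prime, nth_prime, and exponent are mine); it keeps the x! + 1 search bound from Euclid's theorem:

    from math import factorial

    def prime(x):
        # characteristic function of primality
        return 1 if x > 1 and all(x % z != 0 for z in range(2, x)) else 0

    def next_prime(x):                    # h(x): least prime z with x < z <= x! + 1
        bound = factorial(x) + 1
        for z in range(x + 1, bound + 1):
            if prime(z):
                return z
        return bound                      # unreachable, but keeps the function total

    def nth_prime(x):                     # p(0) = 2; p(x + 1) = h(p(x))
        p = 2
        for _ in range(x):
            p = next_prime(p)
        return p

    def exponent(y, x):                   # [y]_x: exponent of p(x) in y (assumes y >= 1)
        e = 0
        while y % nth_prime(x) ** (e + 1) == 0:
            e += 1
        return e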
Goedel Numbering Finite Sequences
Let x be an n-ary sequence. Define the Gödel coding:
<x> = p(0)^(1 + x[0]) · p(1)^(1 + x[1]) · ... · p(n - 1)^(1 + x[n - 1]).
Note: I will use square brackets when double subscripts or double superscripts are required.
This coding is unique by the uniqueness of prime factorizations (the
fundamental theorem of arithmetic). It is onto by the existence of prime
factorizations. The reason for adding 1 to the exponents is to distinguish
zero positions from empty positions. To decode a number as a sequence,
we have to remember to remove the extra 1.
Decoding: d(y, x) = (y)_x = the item occurring in the xth position in the sequence coded by y. The standard notation is (y)_x, but this gives rise to double and triple subscripts, so I introduce the alternative notation d(y, x). To keep the function total, we take the first position to be position 0.
d(y, x) = (y)_x = [y]_x -. 1.
Length of decoded sequence (number of consecutive prime factors of x):
Using the convention that every position in the sequence corresponds to a prime factor, we can simply search for the first non-factor of the code number. Since we count prime factors starting with 0, this corresponds to the length of the coded sequence.
lh(x) = (μz ≤ x)[¬(p(z) | x)].
Proposition: if x is non-empty then:
x = ((<x>)_0, (<x>)_1, ..., (<x>)_{lh(<x>) - 1}).
Don't forget that length counts the number of items occurring, rather than
giving the position (starting from 0) of the last item occurring.
That's why we have to subtract 1 from the length in this result.
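Continuing the sketch (and reusing the nth_prime and exponent helpers from the previous block, which are my names, not official notation), the coding, decoding, and length functions look like this:

    def encode(seq):                      # <x> = product of p(i)^(1 + x[i])
        code = 1
        for i, v in enumerate(seq):
            code *= nth_prime(i) ** (1 + v)
        return code

    def decode(y, x):                     # d(y, x) = (y)_x = [y]_x -. 1
        return max(exponent(y, x) - 1, 0)

    def lh(y):                            # position of the first non-factor prime
        z = 0
        while y % nth_prime(z) == 0:
            z += 1
        return z

    c = encode([3, 0, 2])                 # 2^4 * 3^1 * 5^3 = 6000
    print(lh(c), [decode(c, i) for i in range(lh(c))])   # 3 [3, 0, 2]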
Exercise 1: Bijective Binary Sequence Encoding:
Define the binary primitive recursive function:
<x, y> = 1/2[(x + y)(x + y + 1)] + x.
- This coding is evidently PR. Show that it is a bijection N² → N. Hint: this is just a polynomial expression for the obvious enumeration procedure. Think of the pairs as being presented in an N × N array with (0, 0) at the upper left. Given (x, y), recover the code number by counting pairs, starting at (0, 0), along diagonals, from the upper right to the lower left.
- Show that 1/2[(x + y)(x + y + 1)] = Σ_{t ≤ x+y} t = the number of pairs (z, w) occurring in diagonals to the upper left of (x, y).
- Show that x = the number of pairs remaining to be counted to the upper right of (x, y).
- Show that the decoding functions are also PR.
- Use the preceding results and codings to produce n-ary PR encodings and decodings.
Fancy Recursion
One reason codings are nice is that they give us
new and more elegant forms of recursion for free. The basic idea is that
the coding allows primitive recursion to simulate the fancy form of recursion
by looking at successive code numbers of the current "computational state"
of the fancy form of recursion.
Exercise 2: Course-of-Values Recursion (Péter): Suppose that s1(x), ..., sk(x) ≤ x. Then CVR(g, f, s1, ..., sk) is the unique function h such that:
h(0, y) = g(y);
h(n + 1, y) = f(h(s1(n), y), ..., h(sk(n), y), n, y).
Use the Goedel coding to show that PR is closed under the CVR operator.
Exercise 3: Fibonacci Sequence: Show that the
following function is PR.
f(0) = 1;
f(1) = 1;
f(x + 2) = f(x) + f(x + 1).
Exercise 4: Simultaneous Recursion (Péter): SRi(g1, ..., gk, f1, ..., fk) is the unique function hi such that:
hi(0, y) = gi(y);
hi(n + 1, y) = fi(h1(n, y), ..., hk(n, y), n, y).
Notice that k is constant. Use the k-ary sequence encoding of exercise 1 to show that PR is closed under the SR operator.
Exercise 5: Recursion with parameter substitution: PS(g, f, s) is the unique h such that:
h(0, y) = g(y);
h(n + 1, y) = f(h(n, s(n, y)), n, y).
Show that PR is closed under the PS operation.
Hint: Trace the computation of h(3, y). You will find that you end up needing to know the values of the successive compositions:
y;
s(2, y);
s(1, s(2, y));
s(0, s(1, s(2, y)));
etc.
These can be obtained by a "backwards" recursion; or rather a forwards recursion that goes backwards in time!
S(0, n, y) = y;
S(t + 1, n, y) = s(n -. (t + 1), S(t, n, y)).
Thus,
S(0, 3, y) = y;
S(1, 3, y) = s(2, S(0, 3, y)) = s(2, y);
S(2, 3, y) = s(1, S(1, 3, y)) = s(1, s(2, y));
etc.
Now write a similar "backwards" recursion that applies f to the right values of S.
Exercise 6: Let f(x) = gg...g h ss...s(x), with x iterations of the g's and of the s's. Show that f is PR. Hint: use PS.
Breaking the PR barrier
We have lots of examples of recursion operations
that yield nothing new. This may lull you into assuming that every
kind of recursion can be massaged into primitive recursion by means of
suitable codings. But that would be a mistake, for there is an intuitively
effective recursion operator that generates non PR functions.
Double recursion: DR(g1, g2, g3, g4) is the unique h such that:
h(0, y, z) = g1(y, z);
h(x + 1, 0, z) = g2(h(x, c, z), x, z);
h(x + 1, y + 1, z) = g4(h(x + 1, y, z), h(g3(h(x + 1, y, z), x, y, z), x, z), x, y, z).
Double recursion allows one to apply a variable number of primitive recursions depending on the value of x. To see this, observe that:
- For a given value of x, the third clause decrements y down to 0.
- When y reaches 0, you hit the second clause, which decrements x to x - 1 and restarts the primitive recursion with y set to some fixed constant c.
- Finally, when x is decremented down to 0, we hit the first clause, which is the base case.
Since double recursion can simulate arbitrary numbers of primitive recursions, it is fairly intuitive that a double recursive function ought to be able to grow faster than any given PR function. If the function uses n primitive recursions at stage n, then for each number of primitive recursions, after some time it does something that would require at least one more primitive recursion.
The proof of this claim will wait a bit until we introduce the Grzegorczyk
hierarchy.
The Ackermann function (1928): The Ackermann function is historically the first example of an intuitively effective function that is not PR (cf. Rogers p. 8). The Ackermann function grows so fast that some skeptical finitistic mathematicians have denied that it is computable (or that it even exists). This sounds crazy until one considers that the exact values for small arguments can only be defined (in a book-length definition) by the function itself!
- a(0, 0, z) = z;
- a(0, y + 1, z) = a(0, y, z) + 1;
- a(1, 0, z) = 0;
- a(x + 2, 0, z) = 1;
- a(x + 1, y + 1, z) = a(x, a(x + 1, y, z), z).
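As a sketch, the clauses translate directly into a (very slow) recursive Python function; don't expect it to return for anything but tiny values of the first argument.

    def ack(x, y, z):
        # Ackermann's original three-argument function, clause by clause
        if x == 0:
            return z if y == 0 else ack(0, y - 1, z) + 1
        if y == 0:
            return 0 if x == 1 else 1
        return ack(x - 1, ack(x, y - 1, z), z)

    print(ack(2, 3, 2))   # 8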
Exercise 7: The Ackermann function is more intuitive than it looks at first. This will be apparent after you do the following.
- Show that the Ackermann function really does result from applying DR to PR functions. Don't let all those clauses scare you!
- Find simple primitive recursive definitions of each of the functions gi(x, y) = a(i, x, y). Prove that you are correct.
- Relate this fact to the point made about uniformity when we introduced bounded sums as an operator.
- What are g0, g1, and g2?
The Péter function: The Péter function is also not PR. It is a simplified version of Ackermann's function and is called the Ackermann function in Cutland (p. 46).
p(0, y) = y + 1;
p(x + 1, 0) = p(x, 1);
p(x + 1, y + 1) = p(x, p(x + 1, y)).
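A sketch of the Péter function in the same style (it, too, explodes quickly as the first argument grows):

    def peter(x, y):
        if x == 0:
            return y + 1
        if y == 0:
            return peter(x - 1, 1)
        return peter(x - 1, peter(x, y - 1))

    print(peter(2, 3))   # 9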
Even Fancier Recursion
We can generalize the idea of double recursion by adding recursive variables
and piling on new recursion conditions. So while double recursion
does a variable number of nested primitive recursions, triple recursion
allows for a variable number of double recursions, and so forth.
Each time we get something new.