A Course at Nagoya University

Ergodic Number Theory

Jörn Steuding

This is Felix, and iterations of his picture under the ergodic cat map (from left to right and top down). Since (discrete) ergodic theory does no harm to animals, Felix returns after finitely many iterations. We will explain this phenomenon...

What is Ergodic Number Theory?

Ergodic theory studies the long-time behaviour of dynamical systems. This line of investigation has its origin in Poincaré’s investigations in statistical physics more than one hundred years ago. In the meantime, however, ergodic theory has found many remarkable applications in various branches of mathematics. Here we shall focus on arithmetical applications. We begin with Weyl’s theorems on uniform distribution and applications to Diophantine analysis, which might be interpreted as another starting point of ergodic theory. Then we introduce the necessary concepts, techniques, and notions, mostly from measure and integration theory, in order to set the stage for the highlights: Birkhoff’s famous ergodic theorem, a sketch of Hlawka’s proof of the uniform distribution of the ordinates of the nontrivial zeros of the Riemann zeta-function, and Khintchine’s theorem on patterns in continued fraction expansions of real numbers.

Our approach is intended to be self-contained. To this end we recall the foundations of Lebesgue measure and integral as well as classical results on continued fractions, the latter topic in detail and with proofs; the account of the zeta-function, however, should be regarded as an appetizer. These notes contain slightly more than the material presented during the lectures at Nagoya University; in particular, references are given to related results and advanced topics which could not be investigated here in detail. For those readers who want to learn more we recommend the excellent monographs of Dajani & Kraaikamp [37] and Choe [31].
A German version of these notes originates from a course I gave at Würzburg University in 2007/08; it can be downloaded at http://www.mathematik.uniwuerzburg.de/~steuding/ergod.htm. I am very grateful to Christian Beck for his careful reading of the German notes; comments and corrections are welcome and can be sent to steuding@mathematik.uni-wuerzburg.de. Furthermore, I would like to thank Julia Koch for providing the photograph of her beautiful cat Felix (see the front page) and for her trust in the cat map to let her cat return. I am also grateful to Martin Schröter for creating the impressive pictures of Felix under the cat map. Moreover, I would like to thank Thomas Christ for technical support and my wife Rasa for her help with most of the other pictures. Last but not least, I would like to express my gratitude to the hospitable audience at Nagoya University, and, in particular, Prof. Kohji Matsumoto for giving me the opportunity to teach this course and for many valuable comments.

Jörn Steuding, Nagoya, November 2010.

Contents

Chapter 1. Motivation: Billiards and Benford
1.1. Classical Diophantine Approximation
1.2. Uniform Distribution Modulo One
Chapter 2. Prelude: Lebesgue Measure and Integral
2.1. Measure Theory
2.2. The Lebesgue Integral
Chapter 3. Measure Invariance and Ergodicity
3.1. Measure Preserving Transformations
3.2. Ergodicity and Mixing
Chapter 4. Classical Ergodic Theorems
4.1. The Mean Ergodic Theorem of von Neumann
4.2. The Birkhoff Pointwise Ergodic Theorem
Chapter 5. Heavenly and Normal Applications
5.1. Poincaré’s Recurrence Theorem
5.2. Normal numbers
Chapter 6. Interlude: The Riemann Zeta-Function
6.1. Primes and Zeros
6.2. Applications of Uniform Distribution and Ergodic Theory
Chapter 7. Crash Course in Continued Fractions
7.1. The Euclidean Algorithm Revisited
7.2. Infinite Continued Fractions
Chapter 8. Metric Theory of Continued Fractions
8.1. Ergodicity of the Continued Fraction Mapping
8.2. The Theorems of Khintchine and Lévy
Chapter 9. Coda: Arithmetic Progressions
Biographical and Historical Notes
Notations
Bibliography
Index

CHAPTER 1

Motivation: Billiards and Benford

Imagine a square with mirrors at its sides and a ray of light starting in the interior of the square. The light ray is reflected from the mirrors and we may ask whether its path will be periodic or aperiodic. Which initial data lead to periodicity and which to aperiodicity? Can it happen that the path is dense in the square? These questions in the context of billiards were first raised by König & Szücs [85] in 1913.∗

For the sake of simplicity let us replace the square by a disk. Then the ray of light is always reflected at the same angle at the boundary, a phenomenon called rotation symmetry which makes circle billiards a bit easier than square billiards. Moreover, we may assume that the boundary of the disk is the unit circle in the complex plane $\mathbb{C}$. The so-called circle group is the multiplicative group of all complex numbers of absolute value one and can be parametrized by the exponential function:
\[ \mathbb{T} := \{\exp(2\pi i x) : x \in [0,1)\} \quad\text{with}\quad i = \sqrt{-1}. \]
Note that the map $\exp : \mathbb{R} \to \mathbb{T}$, $x \mapsto \exp(2\pi i x)$ is a surjective but not injective group homomorphism. By the isomorphism theorem from algebra we find
\[ \mathbb{T} \cong \mathbb{R}/\mathbb{Z}. \]
Hence, the circle group $\mathbb{T}$ is an isomorphic and homeomorphic image of the unit interval $[0,1)$ as an additive group, resp. of the real line $\mathbb{R}$ modulo $\mathbb{Z}$. In the sequel it will often be advantageous to work with cosets $r + \mathbb{Z}$, resp. the corresponding points on the unit circle (or higher-dimensional tori), rather than with real numbers $r$. Billiards gives a first example. Let $\pi\alpha$ denote the angle between the ray of light and the circle $\mathbb{T}$.
Since this angle remains the same after each reflection, geometry shows that each intersection point of the ray with the circle is obtained from the previous one by a circle rotation through the angle $2\pi\alpha$. Thus, denoting the $n$-th point where the ray meets the circle by $\zeta_n = \exp(2\pi i x_n)$, we find
\[ x_n - x_{n-1} \equiv \alpha \bmod 1 \quad\text{resp.}\quad x_n = x_0 + n\alpha \quad\text{for } n \in \mathbb{N}. \]
(Here the notation ’mod 1’ reflects that we only need to consider the fractional parts of the sequence of $x_n$.) If $\alpha$ is rational, the path of the light ray is periodic. More precisely, for $\alpha = \frac{p}{q}$ with $p, q \in \mathbb{N}$ the ray of light is $q$-periodic (meaning $x_{n+q} \equiv x_n \bmod 1$). But what about irrational $\alpha$? In this case the ray of light will sooner or later visit any arc of the circle. We will treat this case below with classical methods from Diophantine approximation theory (namely Corollary 1.5).

∗ The inexperienced reader may try the tutorial ’Donald Duck in Mathmagicland’ about math and billiards...

Figure 1. A periodic ray of light; here we have $\alpha = \frac{1}{5}$, which corresponds to an angle of $36^\circ$ between the ray and the circle.

Another interesting phenomenon is Benford’s law, which describes irregularities in the distribution of digits in statistical data. In 1881 Newcomb noticed that in books of logarithm tables the pages with entries starting with digit 1 had been used more often than the others. In 1938 this observation was rediscovered and popularized by the physicist Benford [16], who gave further examples from statistics about American towns. A set of numbers is said to be Benford distributed if the leading digit equals $k \in \{1, 2, \dots, 9\}$ for a proportion $\log_{10}(1 + \frac{1}{k})$ of its elements. Thus, slightly more than thirty percent of the numbers in a data set distributed following Benford’s law have leading digit 1, and only about six percent start with digit 7.
Obviously, this distribution phenomenon, commonly known as Benford’s law, cannot hold for every data set. Here is an illustrative example of a deterministic sequence which follows Benford’s law, also known as Gelfand’s problem.† Considering the powers of two, we notice that among the first of those powers,
\[ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, \dots, \]
there are more integers starting with digit 1 than with digit 3. Given a power of 2 with a decimal expansion of $m + 1$ digits and leading digit $k$,
\[ 10^m k \le 2^n < 10^m (k+1) \quad\text{for some } k \in \{1, 2, \dots, 9\}, \]
taking the logarithm leads to
\[ m + \log_{10} k \le n \log_{10} 2 < m + \log_{10}(k+1). \]
For a real number $x$ we introduce the decomposition into its integral and fractional parts by writing $x = \lfloor x \rfloor + \{x\}$, with $\lfloor x \rfloor$ being the largest integer less than or equal to $x$ and $\{x\} \in [0,1)$ the fractional part (which we sometimes also denote by $x \bmod 1$, as above). Consequently,
\[ \log_{10} k \le \{n \log_{10} 2\} < \log_{10}(k+1). \]
By the concavity of the logarithm, the interval $[\log_{10} k, \log_{10}(k+1))$ is longer for small $k$, so, heuristically, the chance is larger to have an $n$ for which $n \log_{10} 2$ has fractional part in this interval. We shall show later (again by Corollary 1.5) that the sequence of numbers $\log_{10} x_n = n \log_{10} 2$ with $x_n = 2^n$ is uniformly distributed modulo 1; hence, as $n \to \infty$, the proportion of those with leading digit $k \in \{1, 2, \dots, 9\}$ equals the length of the interval $[\log_{10} k, \log_{10}(k+1))$, that is
\[ \log_{10}(k+1) - \log_{10} k = \log_{10}\left(1 + \frac{1}{k}\right), \]
and, in particular, about 30.1 percent of the powers of 2 (note $\log_{10} 2 \approx 0.301$) have a decimal expansion with leading digit 1, whereas the leading digit equals 7 for only approximately 5.8 percent. In contrast, powers of 10 always have leading digit 1 in the decimal system. This shows that the arithmetic nature of $\log_{10} 2$ is relevant for the proportion with which leading digits appear.

† Although Gelfand, being an excellent mathematician, definitely had no problem with this task.
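The claim about the leading digits of the powers of two can be checked empirically. The following is a minimal sketch, not part of the original notes; the function names and the cut-off N = 2000 are illustrative assumptions. It compares the observed frequency of each leading digit among $2^1, \dots, 2^N$ with the value $\log_{10}(1 + \frac{1}{k})$ from Benford’s law.

```python
import math
from collections import Counter


def leading_digit_of_power_of_two(n):
    """Leading decimal digit of 2**n, read off from exact integer arithmetic."""
    return int(str(2 ** n)[0])


def leading_digit_frequencies(N):
    """Observed proportion of each leading digit k = 1..9 among 2**1, ..., 2**N."""
    counts = Counter(leading_digit_of_power_of_two(n) for n in range(1, N + 1))
    return {k: counts[k] / N for k in range(1, 10)}


def benford_probability(k):
    """Proportion log10(1 + 1/k) predicted by Benford's law for leading digit k."""
    return math.log10(1 + 1 / k)


if __name__ == "__main__":
    N = 2000  # illustrative choice; 2**2000 has only about 600 digits
    observed = leading_digit_frequencies(N)
    for k in range(1, 10):
        print(k, round(observed[k], 4), round(benford_probability(k), 4))
```

Exact integer arithmetic is used on purpose: computing the fractional part of $n \log_{10} 2$ in floating point would also work for moderate $n$, but the string conversion avoids any rounding questions.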
Benford’s law is believed to hold for many data sets, such as constants in physics and stock market values.‡ A further example of a sequence for which Benford’s law is known to be true is the sequence of Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . ; the sequence of primes, however, does not follow it, as was proved by Jolissaint [73] and Diaconis [41]. Recent investigations show that certain stochastic processes, e.g., the geometric Brownian motion or the 3X + 1-iteration due to Collatz, satisfy Benford’s law (as shown by Kontorovich & Miller [90]).

‡ Benford’s law is quite popular. The U.S. TV series NUMB3RS deals with Benford’s law (in the episode “The Running Man”). Moreover, the creative bookkeeping of the enterprise Enron was discovered by the U.S. tax authority with the help of Benford’s law.

1.1. Classical Diophantine Approximation

Diophantine analysis deals with integer solutions to algebraic equations and with rational approximations to real numbers; the name goes back to Diophantus, a Greek mathematician of the third century who wrote an influential treatise on questions of this type. Since $\mathbb{Q}$ is dense in $\mathbb{R}$, we can approximate any real number by rationals as closely as we please. The classical approximation theorem of Dirichlet from 1842 provides a quantitative version:

Theorem 1.1. Given $\xi \in \mathbb{R} \setminus \mathbb{Q}$, there exist infinitely many rational numbers $\frac{p}{q}$ satisfying
\[ \left| \xi - \frac{p}{q} \right| < \frac{1}{q^2}. \tag{1.1} \]

This property characterizes irrational numbers: if $\xi$ is rational, inequality (1.1) has only finitely many solutions $\frac{p}{q}$. The quality of a rational approximation to a given real number is measured in terms of the denominator. Roughly speaking, Dirichlet’s theorem shows that irrational numbers possess more and better rational approximations than rationals!

Proof. We shall apply the pigeonhole principle: if $n + 1$ objects are distributed among $n$ boxes, there is at least one box containing at least two objects.
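Given $Q \in \mathbb{N}$, the $Q + 1$ points $0, \{\xi\}, \{2\xi\}, \dots, \{Q\xi\}$ lie in the $Q$ disjoint intervals
\[ \left[ \frac{j-1}{Q}, \frac{j}{Q} \right) \quad\text{for } j = 1, \dots, Q. \]
Hence, there is at least one interval containing two points $\{k\xi\}, \{\ell\xi\}$, say, with $0 \le k, \ell \le Q$ and $k \ne \ell$. Assuming $\{k\xi\} \ge \{\ell\xi\}$, it follows that
\[ \{k\xi\} - \{\ell\xi\} = k\xi - \lfloor k\xi \rfloor - \ell\xi + \lfloor \ell\xi \rfloor = \{(k-\ell)\xi\} + \underbrace{\lfloor (k-\ell)\xi \rfloor + \lfloor \ell\xi \rfloor - \lfloor k\xi \rfloor}_{\in \mathbb{Z}}. \tag{1.2} \]
Since $\{k\xi\} - \{\ell\xi\} \in [0, \frac{1}{Q})$, the integral parts in (1.2) sum up to zero. For $q := |k - \ell|$ we thus get, in the case $k > \ell$,
\[ \{q\xi\} = \{k\xi\} - \{\ell\xi\} < \frac{1}{Q}. \]
With $p := \lfloor q\xi \rfloor$ we obtain
\[ \left| \xi - \frac{p}{q} \right| = \frac{|q\xi - p|}{q} = \frac{\{q\xi\}}{q} < \frac{1}{qQ}, \tag{1.3} \]
which immediately leads to (1.1) since $q \le Q$. (In the case $k < \ell$ one has $\{-q\xi\} < \frac{1}{Q}$, and the bound $|q\xi - p| < \frac{1}{Q}$ holds with $p$ the smallest integer greater than or equal to $q\xi$.)

Now assume that $\xi$ is irrational. Suppose there exist only finitely many solutions $\frac{p_1}{q_1}, \dots, \frac{p_n}{q_n}$ to (1.1). Since $\xi \notin \mathbb{Q}$, we can find a number $Q$ such that
\[ \left| \xi - \frac{p_j}{q_j} \right| > \frac{1}{Q} \quad\text{for } j = 1, \dots, n, \]
contradicting (1.3), which provides a solution of (1.1) at distance less than $\frac{1}{Q}$ from $\xi$. Finally, assume that $\xi$ is rational, that is $\xi = \frac{a}{b}$ with some $a \in \mathbb{Z}$ and $b \in \mathbb{N}$. If $\xi = \frac{a}{b} \ne \frac{p}{q}$, then
\[ \left| \xi - \frac{p}{q} \right| = \frac{|aq - bp|}{bq} \ge \frac{1}{bq}, \]
and (1.1) implies $q < b$. Hence there can exist only finitely many $\frac{p}{q}$ satisfying (1.1). •

The classical approximation theorem of Kronecker from 1884 generalizes Dirichlet’s Theorem 1.1 to the inhomogeneous case:

Theorem 1.2. Let $\xi \in \mathbb{R} \setminus \mathbb{Q}$ and $\eta \in \mathbb{R}$. For any $N \in \mathbb{N}$, there exist $Q \in \mathbb{N}$ with $Q > N$ and $P \in \mathbb{Z}$ such that
\[ |Q\xi - P - \eta| < \frac{3}{Q}. \]

In §23.6 of the classic [61] on number theory, the authors Hardy & Wright state a multi-dimensional analogue of Kronecker’s theorem (see Exercise 1.2) and comment on this result as “one of those mathematical theorems which assert (...) that what is not impossible will happen sometimes however improbable it may be.”∗

Proof. According to Theorem 1.1 there exist coprime integers $q > 2N$ and $p$ such that
\[ |q\xi - p| < \frac{1}{q}. \]
Now let $m$ be an integer satisfying
\[ |q\eta - m| \le \frac{1}{2}. \]
In view of Bézout’s theorem from elementary number theory we may find a linear combination
\[ m = px - qy \]
with integers $x, y$, where $|x| \le \frac{1}{2} q$; actually, this is an easy consequence of the Euclidean algorithm for $p$ and $q$; see [128].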
Hence
\[ q(x\xi - y - \eta) = x(q\xi - p) - (q\eta - m) \]
and
\[ |q(x\xi - y - \eta)| < \frac{1}{2} q \cdot \frac{1}{q} + \frac{1}{2} = 1, \]
respectively. Setting $Q = q + x$ and $P = p + y$ we thus obtain
\[ N < \frac{1}{2} q \le Q \le \frac{3}{2} q. \]
It follows that
\[ |Q\xi - P - \eta| \le |x\xi - y - \eta| + |q\xi - p| < \frac{1}{q} + \frac{1}{q} = \frac{2}{q} \le \frac{3}{Q}. \]
This is the assertion of the theorem. •

The Kronecker approximation theorem allows a solution to our billiards problem from the beginning. Here we shall consider the square billiard. We may assume the square to be given by $[0, \frac{1}{2})^2 \subset \mathbb{R}^2$. If $\gamma$ denotes the angle between one edge and the initial direction of the ray, then the path of the ray is determined by the linear equation
\[ y = \xi x + \beta, \]
where $\xi = \tan\gamma$ and $\beta$ is some real number depending on the starting point of the ray. If we reflect the edges rather than the ray, our straight line is defined in the whole plane and we observe that the path is periodic if, and only if, the above line degenerates modulo $\mathbb{Z}^2$ into a finite number of straight line segments; otherwise, the path of the ray is dense in the square. In fact, it is periodic if, and only if, the straight line meets the reflected edges in the same points modulo $\mathbb{Z}^2$, that is, when its slope $\xi$ is rational.

∗ Outside mathematics this is also known as ‘Murphy’s law’.

Now suppose that $\xi$ is irrational. Then, for any point $(x_1, y_1) \in \mathbb{R}^2$ and any $\epsilon > 0$, in view of Kronecker’s approximation theorem 1.2 applied to $\eta = y_1 - \beta - \xi x_1$ (and with $N > 3/\epsilon$), there exist integers $P, Q$ such that
\[ |y_1 + P - (\xi(x_1 + Q) + \beta)| = |\underbrace{y_1 - \beta - \xi x_1}_{= \eta} + P - Q\xi| = |Q\xi - P - \eta| < \epsilon. \]
Hence the point $(x_1, y_1)$ and the point $(x_1 + Q, \xi(x_1 + Q) + \beta)$ on the line differ modulo $\mathbb{Z}^2$ by a quantity less than $\epsilon$.

Figure 2. The paths of two different rays of light, one with irrational slope, the other one with rational slope.

We conclude: the ray of light describes a closed resp. periodic path if the line has rational slope, i.e. $\xi = \tan\gamma \in \mathbb{Q}$.
Otherwise, the ray of light visits any neighbourhood of any point in the square.† Despite the different geometry, circle billiards also needs irrationality for the denseness of the corresponding path (see Exercise 1.3).

† See jdm.mathematik.uni-karlsruhe.de/.../vortrag.pdf for a very nice visualization.

1.2. Uniform Distribution Modulo One

In view of Kronecker’s approximation theorem the fractional parts of the numbers $n\xi$ lie dense in the unit interval as $n$ ranges through $\mathbb{N}$, provided $\xi$ is irrational. Now we want to study this denseness in a quantitative manner. A sequence $(x_n)$ of real numbers is said to be uniformly distributed modulo 1 (resp. equidistributed) if for all $\alpha, \beta$ with $0 \le \alpha < \beta \le 1$ the proportion of the fractional parts of the $x_n$ in the interval $[\alpha, \beta)$ corresponds to its length in the following sense:
\[ \lim_{N \to \infty} \frac{1}{N} \sharp\{1 \le n \le N : \{x_n\} \in [\alpha, \beta)\} = \beta - \alpha. \]
Obviously, it suffices to consider only intervals of the form $[0, \beta)$ with arbitrary $\beta \in (0, 1)$. In terms of probability this means that for a uniformly distributed random variable all possible positions are equally probable. The first important results in this direction were obtained by Hermann Weyl∗ around 1913-16 (see [150]). Here is his first

Theorem 1.3. A sequence $(x_n)$ of real numbers is uniformly distributed modulo 1 if, and only if, for any Riemann integrable function $f : [0, 1] \to \mathbb{C}$,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) = \int_0^1 f(x)\,dx. \tag{1.4} \]

∗ Not to be confused with André Weil.

Proof. Given $\alpha, \beta \in [0, 1)$ with $\alpha < \beta$, denote by $\chi_{[\alpha,\beta)}$ the indicator function of the interval $[\alpha, \beta)$, i.e.,
\[ \chi_{[\alpha,\beta)}(x) = \begin{cases} 1 & \text{if } \alpha \le x < \beta, \\ 0 & \text{otherwise.} \end{cases} \]
Obviously,
\[ \int_0^1 \chi_{[\alpha,\beta)}(x)\,dx = \beta - \alpha. \]
Therefore, the sequence $(x_n)$ is uniformly distributed modulo 1 if, and only if, for any such pair $\alpha, \beta$,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \chi_{[\alpha,\beta)}(\{x_n\}) = \int_0^1 \chi_{[\alpha,\beta)}(x)\,dx. \]
Assuming the asymptotic formula (1.4) for any Riemann integrable function $f$, it follows that $(x_n)$ is indeed uniformly distributed modulo 1.
In order to show the converse implication suppose that $(x_n)$ is uniformly distributed modulo 1. Then (1.4) holds for $f = \chi_{[\alpha,\beta)}$ and, consequently, for any linear combination of such indicator functions. In particular, we may deduce that (1.4) is true for any step function. It is well known from any beginner’s course on calculus that, for any real-valued Riemann integrable function $f$ and any $\epsilon > 0$, we can find step functions $t_-, t_+$ such that
\[ t_-(x) \le f(x) \le t_+(x) \quad\text{for all } x \in [0, 1], \]
and
\[ \int_0^1 (t_+(x) - t_-(x))\,dx < \epsilon. \]
Hence,
\[ \int_0^1 f(x)\,dx \ge \int_0^1 t_-(x)\,dx > \int_0^1 t_+(x)\,dx - \epsilon, \]
and
\[ \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) - \int_0^1 f(x)\,dx \le \frac{1}{N} \sum_{n=1}^{N} t_+(\{x_n\}) - \int_0^1 t_+(x)\,dx + \epsilon, \]
which is less than $2\epsilon$ for all sufficiently large $N$. Analogously, we obtain
\[ \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) - \int_0^1 f(x)\,dx > -2\epsilon. \]
Consequently, (1.4) holds for all real-valued Riemann integrable functions $f$. The case of complex-valued Riemann integrable functions can be deduced from the real case by treating the real and imaginary parts of $f$ separately. •

The converse of Weyl’s Theorem was found by de Bruijn [25]: if a function $f : [0, 1) \to \mathbb{C}$ has the property that for any uniformly distributed sequence $(x_n)$ the limit
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) \]
exists, then $f$ is Riemann integrable. It is interesting that here the Riemann integral is superior to the Lebesgue integral. In fact, Theorem 1.3 does not hold for Lebesgue integrable functions $f$ in general since $f$ might vanish at each point $\{x_n\}$ but have a non-vanishing integral.
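Formula (1.4) is easy to explore numerically. The sketch below is an illustration only and not part of the original notes: it takes $x_n = n\sqrt{2}$ (a sequence whose uniform distribution is only established later, in Corollary 1.5) and the Riemann integrable function $f(x) = x^2$ with $\int_0^1 f(x)\,dx = \frac{1}{3}$. The helper names `fractional_part` and `weyl_average` are hypothetical.

```python
import math


def fractional_part(x):
    """Fractional part {x} = x - floor(x), a number in [0, 1)."""
    return x - math.floor(x)


def weyl_average(f, xi, N):
    """Left-hand side of (1.4) before the limit, for the sequence x_n = n * xi."""
    return sum(f(fractional_part(n * xi)) for n in range(1, N + 1)) / N


if __name__ == "__main__":
    def f(x):
        return x * x  # Riemann integrable; its integral over [0, 1] is 1/3

    for N in (10, 100, 1000, 10000, 100000):
        # irrational step: the averages approach the integral 1/3
        print(N, weyl_average(f, math.sqrt(2), N))

    # rational step 1/2: the fractional parts only take the values 1/2 and 0,
    # so the averages stay near 1/8 and (1.4) fails
    print(weyl_average(f, 0.5, 1000))
```

The rational step is included as a contrast: the averages then converge to the mean of $f$ over finitely many points rather than to the integral.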
This subtle difference between the Riemann and the Lebesgue integral is related to a rather important application of uniformly distributed sequences, namely to so-called Monte Carlo methods and their use in numerical integration.† If we distribute $N$ points randomly in the square $[-1, 1]^2$ in the Euclidean plane and count the number $M$ of those points which lie inside the unit circle centered at the origin, then the quotient $4M/N$ is a good guess for the area $\pi$ of the unit disk (the square has area 4); with growing $N$ this approximation is expected to get better and better. In view of this idea uniformly distributed sequences can be used to evaluate numerically certain integrals for which there is no elementary method, e.g. the Gaussian integral $\int \exp(-x^2)\,dx$. More on this topic can be found in Hlawka [65]. Further applications are relevant in the theory of pseudo-random numbers (see [34]).

Already Weyl noticed that the limits appearing here are uniform; this has been studied ever since under the notion of discrepancy. This topic has amusing applications, for instance, in billiards where we may ask how soon an aperiodic ray of light will visit a given domain. First results for effective billiards are due to Weyl [149]; interesting and surprising results on square billiards have recently been discovered by Beck [15]. Also important in this setting are effective versions of the inhomogeneous Kronecker approximation theorem 1.2 as, for example, [117]. For the general theory of uniform distribution and discrepancy we refer to the monographs of Harman [60] and Kuipers & Niederreiter [106].

Our next aim is another characterization of uniform distribution modulo one, also due to Weyl. Recall the parametrization of the circle group by the exponential function from the very beginning. For abbreviation we write
\[ e(\xi) = \exp(2\pi i \xi) \quad\text{for } \xi \in \mathbb{R}, \]
which translates the $2\pi i$-periodicity of the exponential function to 1-periodicity: $e(\xi + k) = e(\xi)$ for all $k \in \mathbb{Z}$.

Theorem 1.4.
A sequence $(x_n)$ of real numbers is uniformly distributed modulo 1 if, and only if, for any integer $m \ne 0$,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e(m x_n) = 0. \tag{1.5} \]

† The name Monte Carlo alludes to gambling; there is no university in Monte Carlo or nearby.

Proof. Suppose the sequence $(x_n)$ is uniformly distributed modulo 1; then Theorem 1.3 applied with $f(x) = e(mx)$ shows
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e(m x_n) = \int_0^1 e(mx)\,dx. \]
For any integer $m \ne 0$ the right-hand side equals zero, which gives (1.5). For the converse suppose that (1.5) holds for all integers $m \ne 0$. Using the trigonometric polynomial
\[ P(x) = \sum_{m=-M}^{+M} a_m e(mx) \quad\text{with } a_m \in \mathbb{C}, \]
it follows from linearity that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} P(\{x_n\}) = \sum_{m=-M}^{+M} a_m \cdot \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e(m x_n) = a_0 = \int_0^1 P(x)\,dx. \tag{1.6} \]
Recall Weierstraß’ approximation theorem which claims that, for any continuous 1-periodic function $f$ and any $\epsilon > 0$, there exists a trigonometric polynomial $P$ such that
\[ |f(x) - P(x)| < \epsilon \quad\text{for } 0 \le x < 1 \tag{1.7} \]
(this can be proved, for example, with Fourier analysis; see [69]‡). Using this approximating polynomial, we deduce
\[ \left| \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) - \int_0^1 f(x)\,dx \right| \le \left| \frac{1}{N} \sum_{n=1}^{N} \big(f(\{x_n\}) - P(\{x_n\})\big) \right| + \left| \frac{1}{N} \sum_{n=1}^{N} P(\{x_n\}) - \int_0^1 P(x)\,dx \right| + \left| \int_0^1 (P(x) - f(x))\,dx \right|. \]
The first and the third term on the right are less than $\epsilon$ thanks to (1.7); the second term is small by (1.6). Hence, formula (1.4) holds for all continuous, 1-periodic functions $f$. Denoting by $\chi_{[\alpha,\beta)}$ the indicator function of the interval $[\alpha, \beta)$ (as in the proof of the previous theorem), for any $\epsilon > 0$, there exist continuous 1-periodic functions $f_-, f_+$ satisfying
\[ f_-(x) \le \chi_{[\alpha,\beta)}(x) \le f_+(x) \quad\text{for all } 0 \le x < 1, \]
and
\[ \int_0^1 (f_+(x) - f_-(x))\,dx < \epsilon. \]

‡ Actually, the authors of [69] attribute this result to Fejér.

This leads to
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \chi_{[\alpha,\beta)}(\{x_n\}) = \int_0^1 \chi_{[\alpha,\beta)}(x)\,dx. \]
Hence, the sequence $(x_n)$ is uniformly distributed modulo 1.
• Combinatorial proofs of Weyl’s theorems can be found in [72]. We shall illustrate the latter criterion with an example. Consider the fractional parts of the numbers $x_n = \log n$. An easy computation shows
\[ \sum_{n=1}^{N} e(\log n) = \sum_{n=1}^{N} n^{2\pi i} = N^{2\pi i} \sum_{n=1}^{N} \left( \frac{n}{N} \right)^{2\pi i} \sim N^{2\pi i} \cdot N \int_0^1 u^{2\pi i}\,du = \frac{N^{1 + 2\pi i}}{1 + 2\pi i}, \]
which is not $o(N)$. Hence, the sequence $(\log n)_n$ is not uniformly distributed modulo 1. Actually, this is the reason why we have been surprised by Benford’s law. If $(\log_{10} x_n)$ is uniformly distributed modulo one, then $(x_n)$ is Benford distributed. As a matter of fact, the Benford distribution is nothing else than the probability law of the mantissa with respect to the basis. There is an important application of Theorem 1.4 due to Bohl [22] improving our observation on the denseness of the fractional parts of $n\xi$ in the unit interval, as $n$ varies through $\mathbb{N}$.§

Corollary 1.5. Given a real number $\xi$, the sequence $(n\xi)_n$ is uniformly distributed modulo 1 if, and only if, $\xi$ is irrational.

Proof. If $\xi$ is irrational, then $e(k\xi) \ne 1$ for any integer $k \ne 0$ and the formula for the finite geometric series yields
\[ \sum_{n=1}^{N} e(mn\xi) = e(m\xi)\, \frac{1 - e(mN\xi)}{1 - e(m\xi)} \]
for all integers $m \ne 0$. Since this quantity is bounded independently of $N$, it follows that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \exp(2\pi i m n \xi) = 0. \]
Otherwise, $\xi = \frac{a}{b}$ for some integers $a, b$ with $b \ne 0$. In this case the limit is different from zero for all integer multiples $m$ of $b$, and Theorem 1.4 implies the assertion. •

§ Remarkably, at about the same time also Sierpinski and Weyl obtained similar results; for the interesting history we refer to [66].

Figure 3. The uniform distribution modulo 1 for the sequence $(n\sqrt{2})$: on the left a histogram concerning the distribution of $\{n\sqrt{2}\}$ for $n = 1, \dots, 500$ in the intervals $[\frac{j-1}{10}, \frac{j}{10})$ with $1 \le j \le 10$, in the middle as points $(n, \{n\sqrt{2}\})$ in the unit square, and as points distributed on the circle group on the right.

We return to Gelfand’s problem from the beginning. First, we observe that $\log_{10} 2$ is irrational. In fact, assuming that it is not, there would exist positive integers $a$ and $b$ such that $10^a = 2^b$, which is impossible by the unique prime factorization of integers. Hence, applying Corollary 1.5, the proportion of positive integers $n$ for which the inequalities
\[ \log_{10} k \le \{n \log_{10} 2\} < \log_{10}(k+1) \]
hold equals the length of the interval, that is $\log_{10}(1 + \frac{1}{k})$, as predicted by Benford’s law.¶

Corollary 1.5 can be generalized in various ways. Vinogradov [144] proved the ternary Goldbach conjecture that any sufficiently large odd integer can be represented as a sum of three primes; an important tool in his approach are estimates for exponential sums of the form
\[ \sum_{p_n \le N} e(\xi p_n), \]
where $p_n$ denotes the $n$th prime (in ascending order). For irrational $\xi$ the sequence $(\xi p_n)$ is uniformly distributed modulo 1. In order to get an impression of the depth of this result the reader may start with the non-trivial problem how the sequence $(\xi p_n)$ is distributed modulo 1 if $\xi$ is rational; interestingly, again the name Dirichlet will pop up. In contrast, the binary Goldbach conjecture that any even integer larger than two is representable as a sum of two primes is still open.

¶ There is another problem of Gel’fond on digits of prime numbers which was recently solved by Mauduit & Rivat [103], who showed that the sum of digits of primes written in a base $q \ge 2$ is uniformly distributed in arithmetic progressions; this is not uniform distribution modulo one, and the mathematician A.O. Gel’fond is not the mathematician I.M. Gelfand.

Exercises

Actually, the statement of Dirichlet’s approximation theorem was already known to Lagrange and his contemporaries. However, Dirichlet’s elegant approach allowed him the following interesting generalizations:

Exercise 1.1.
Prove the following simultaneous approximation theorem: assume $\xi_{ij} \in \mathbb{R}$ with $1 \le i \le m$, $1 \le j \le n$ and $1 < Q \in \mathbb{Z}$; then there exist integers $p_1, \dots, p_m, q_1, \dots, q_n$ satisfying
\[ 1 \le \max\{|q_j| : 1 \le j \le n\} < Q^{m/n} \]
and
\[ |\xi_{i1} q_1 + \dots + \xi_{in} q_n - p_i| \le \frac{1}{Q} \quad\text{for } 1 \le i \le m. \]
Why is this a generalization of Theorem 1.1?

Exercise 1.2. Prove the following simultaneous inhomogeneous approximation theorem: assume $1, \xi_1, \dots, \xi_m$ are linearly independent over the rationals, $\eta_1, \dots, \eta_m$ are arbitrary, and $N$ and $\epsilon$ are positive. Then there exist integers $Q > N$ and $P_1, \dots, P_m$ such that
\[ |Q\xi_k - P_k - \eta_k| < \epsilon \quad\text{for } 1 \le k \le m. \]
Deduce that the sequence of vectors $(n\xi_1, \dots, n\xi_m)$, taken modulo 1, lies dense in the unit cube $[0, 1)^m$.

This is the multidimensional version of Kronecker’s approximation theorem. It might be interesting to consider the illustrative astronomical application to conjunctions in planetary systems from Hardy & Wright [61], §23.6, and to read three rather different proofs in the same source. We return to fun, meaning the billiard problem from the beginning.

Exercise 1.3. Explain the details for aperiodicity in square billiards. Moreover, solve the problem of circular billiards: under what condition on the angle $\alpha$ is the path periodic, resp. aperiodic? What can be said about other convex bodies? Which results can be obtained with ergodic theory?

For advice and more information on related topics we refer to the textbooks [61, 128] as well as to the entertaining book [132] by Tabachnikov, Birkhoff’s classical articles [21], and the surprising results of Veech [143] on mathematical billiards not only on disks and squares.

Sometimes it is not easy to decide whether a given sequence is uniformly distributed modulo one. Actually, it is not known whether the sequence of powers $(\frac{3}{2})^n$ or the numbers $\exp(n)$ are uniformly distributed modulo one. Here are some easier sequences to be checked:

Exercise 1.4.
Find a sequence of numbers $(x_n)$ which consists of exactly all rational numbers of the unit interval and is uniformly distributed. Moreover, show that the sequence of numbers
\[ y_n := \left( \frac{\sqrt{5} + 1}{2} \right)^n \quad\text{for } n \in \mathbb{N} \]
is not uniformly distributed modulo one. Recall that the Fibonacci numbers, defined by the recursion $F_{n+1} = F_n + F_{n-1}$ with initial values $F_0 = 0$, $F_1 = 1$, can be computed alternatively with Binet’s explicit formula
\[ F_n = \frac{1}{\sqrt{5}} \left( \left( \frac{\sqrt{5} + 1}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right). \tag{1.8} \]

Koksma [87] showed that the sequence $(\alpha^n)$ is uniformly distributed modulo one for almost all $\alpha > 1$; however, not a single $\alpha$ with this property is explicitly known. In contrast, if $\alpha > 1$ is a Pisot number, i.e., an algebraic integer all of whose algebraic conjugates (except $\alpha$) have absolute value less than one, the sequence $(\alpha^n)$ is not uniformly distributed; the golden ratio $\frac{\sqrt{5} + 1}{2}$ is an example of such a number. See the work of Pisot & Salem [111] to catch a first glimpse. Here is another generalization of Corollary 1.5 due to Weyl.

Exercise 1.5. How are the values of polynomials distributed modulo one? Let
\[ P = a_d X^d + \dots + a_1 X + a_0 \]
be a polynomial with real coefficients, where at least one coefficient $a_j$ with $j \ne 0$ is irrational. Prove that the values $P(n)$, as $n$ ranges through $\mathbb{N}$, are uniformly distributed modulo one.
Hint: first, one may show the following result due to van der Corput: a sequence of real numbers $x_n$ is uniformly distributed modulo one if for any positive integer $m$ the sequence of real numbers $x_{m+n} - x_n$ is uniformly distributed modulo one. This might be used in combination with the observation that $P(X + m) - P(X)$ is a polynomial of degree $d - 1$. More advice can be found in [33], §XI.1.

We conclude with an aperitif for Monte Carlo methods:

Exercise 1.6. Write a computer program to generate random points in the unit square $[0, 1)^2$. Count those points in an appropriate subset in order to obtain a numerical approximation to $\pi = 3.14159\ldots$.
How many points are needed for an accuracy of $10^{-3}$? Can you do the same for $e = \exp(1) = 2.71828\ldots$?

* * *

It should be noted that not only observations in physics on the motion of heavenly bodies gave motivation for ergodic theory; surprisingly, number theory also had some impact on statistical physics. In his 1914 paper [149], entitled Sur une application de la théorie des nombres à la mécaniques statistique et la théorie des pertubations, Weyl applied his uniform distribution theory to statistical mechanics. We aim at the important ergodic theorem of Birkhoff from 1931 which generalizes the uniform distribution results of Weyl significantly. Important ingredients are Lebesgue measure and integral, whose construction and basic properties we recall in the following chapter.

CHAPTER 2

Prelude: Lebesgue Measure and Integral

There do exist sets to which one cannot assign a geometrical length, area or volume. In 1905 Vitali showed the unsolvability of the so-called measure problem for any space $\mathbb{R}^d$. An example for the one-dimensional case is provided by the equivalence relation defined by
\[ x \sim y \iff x - y \in \mathbb{Q}. \]
By the axiom of choice we may define the set $A \subset [0, 1]$ consisting of exactly one representative of each equivalence class. Now assume that there exists a meaningful measure $\mu$ with all the nice properties we want (e.g., monotonic, translation invariant, and countably additive). Then
\[ 1 = \mu([0, 1]) \le \sum_{x \in [-1, 1] \cap \mathbb{Q}} \underbrace{\mu(A + x)}_{= \mu(A)} \le \mu([-1, 2]) = 3, \]
where $A + x$ is defined as $\{a + x : a \in A\}$. Since a sum of countably infinitely many equal terms $\mu(A)$ is either $0$ or $+\infty$, we obviously cannot assign any meaningful value to $\mu(A)$.∗ It was Émile Borel who introduced the notions of measure and measurable sets in a rigorous way in analysis, and it was Henri Lebesgue who built up a new integration theory on this ground; it differs from, and is more powerful than, Riemann’s integration theory, which is based on functions rather than sets. An excellent reference for measure and integration theory is the classic [89] of Kolmogorov & Fomin.

2.1.
Measure Theory

Let $X$ be a non-empty set and denote by $\mathcal{P}(X)$ its power set. A non-empty system of sets $\mathcal{F} \subset \mathcal{P}(X)$ is called an algebra if $X \in \mathcal{F}$ and if with $A, B \in \mathcal{F}$ also $A \cup B$ and $X \setminus B$ lie in $\mathcal{F}$. Such an algebra $\mathcal{F}$ is called a σ-algebra if $\mathcal{F}$ is closed with respect to countable unions, i.e., if the following axioms are satisfied:
• $\emptyset, X \in \mathcal{F}$;
• $X \setminus A \in \mathcal{F}$ for any $A \in \mathcal{F}$;
• $\bigcup_j A_j \in \mathcal{F}$ for any countable sequence of sets $A_j \in \mathcal{F}$.

∗ This is slightly related to the famous counter-intuitive Banach-Tarski paradox, which claims that a ball in $\mathbb{R}^3$ can be cut into five pieces which can be rearranged as two balls of the same size, for short: • = • + • (see [147]).

Since
$$\bigcap_j A_j = A \setminus \bigcup_j (A \setminus A_j) \quad\text{for } A := \bigcup_j A_j,$$
it follows from the last axiom that $\bigcap_j A_j \in \mathcal{F}$. Hence, a σ-algebra is closed under countably many unions and intersections. For $X \neq \emptyset$ the system $\{X, \emptyset\}$ and the power set $\mathcal{P}(X)$ of $X$ itself are examples of σ-algebras; however, being extremely small, resp. extremely large, they do not play a big role in the sequel. It is not difficult to see that any intersection of σ-algebras (over an arbitrary index set) is again a σ-algebra. Hence, for any system $\emptyset \neq \mathcal{E} \subset \mathcal{P}(X)$ the intersection
$$\mathcal{A}_\sigma(\mathcal{E}) = \bigcap_{\substack{\mathcal{E} \subset \mathcal{F} \\ \mathcal{F} \text{ is a } \sigma\text{-algebra}}} \mathcal{F}$$
is the smallest σ-algebra which contains $\mathcal{E}$; for this reason $\mathcal{A}_\sigma(\mathcal{E})$ is said to be generated by $\mathcal{E}$. A quite important σ-algebra is the Borel σ-algebra $\mathcal{B}$ of a (non-empty) metric space $X$, defined as the σ-algebra generated by the open sets in $X$.

A non-negative function $\mu$, defined on a σ-algebra $\mathcal{F}$ of subsets of some space $X \neq \emptyset$, is called a measure if the following axioms are satisfied:
• $\mu(\emptyset) = 0$;
• for any countable sequence of pairwise disjoint sets $A_j \in \mathcal{F}$,
$$\mu\Big(\bigcup_j A_j\Big) = \sum_j \mu(A_j).$$

In view of the last property $\mu$ is said to be σ-additive (resp. countably additive). Note that we allow $\mu$ to take the value $+\infty$ (of course, taking into account the standard arithmetic with infinity).
Then the triple $(X, \mathcal{F}, \mu)$, consisting of a set $X \neq \emptyset$, an associated σ-algebra $\mathcal{F}$, and a measure $\mu$, is called a measure space and the elements of $\mathcal{F}$ are called measurable. If $\mu(X) < \infty$ the measure space is said to be finite. A very important concept in this theory is the notion of the null set, i.e., any set $A \in \mathcal{F}$ with measure zero: $\mu(A) = 0$. First properties are
• Monotonicity: $\mu(A) \le \mu(B)$ for all measurable sets $A \subset B$;
• Nesting Principle: for any nested sequence of measurable sets $A_1 \supset A_2 \supset \ldots$ (with $\mu(A_1) < \infty$),
$$\lim_{n \to \infty} \mu(A_n) = \mu\Big(\bigcap_n A_n\Big).$$

We shall give some examples. First of all there is the counting measure
$$A \mapsto |A| = \begin{cases} \sharp A & \text{if } \sharp A < +\infty, \\ +\infty & \text{otherwise,} \end{cases}$$
where $\sharp A$ counts the number of elements of the finite set $A$; it has many applications in combinatorics and number theory. In physics the Dirac measure plays a central role; for a fixed point $x \in X$ it is defined by
$$A \mapsto \delta_x(A) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases}$$
Last, but not least, there is the Lebesgue measure, which we will denote by $\lambda$. To start we define the Lebesgue measure for cuboids $Q$ by
$$\lambda(Q) = \prod_{j=1}^{d} (\beta_j - \alpha_j), \tag{2.1}$$
where $Q = (\alpha_1, \beta_1) \times \ldots \times (\alpha_d, \beta_d)$ with some real numbers $\alpha_j \le \beta_j$. Of course, here we may also consider semi-open or closed cuboids. Then the definition of the Lebesgue measure can be extended first by additivity to finite (disjoint) unions of cuboids, so-called figures, and, secondly, by identifying with the outer measure $\lambda^*$ for generic measurable sets $A$ (in $\mathcal{F}$) by using countable unions of limits $A$ of sequences of figures $A_n$ (modulo null sets), where
$$A_n \to A \text{ as } n \to \infty \iff \lim_{n \to \infty} \lambda^*(A_n \Delta A) = 0.$$
Recall that $A \Delta B := (A \setminus B) \cup (B \setminus A)$ is the symmetric difference of $A$ and $B$ and that the outer measure is given by
$$\lambda^*(A) = \inf \sum_{n=1}^{\infty} \lambda(A_n),$$
where the infimum is taken over all countable coverings of $A$ by open figures $A_n$. It should be noticed that $\lambda^*(A \Delta B)$ is small if $A$ and $B$ differ by a set of small measure only. To simplify arithmetic with sets we shall also write $A = B$ if $\lambda(A \Delta B) = 0$.
The above construction of the Lebesgue measure dates back to Carathéodory and can be generalized without much effort.∗ An important feature of the Lebesgue measure is translation invariance, i.e., $\lambda(A) = \lambda(A + x)$ for all measurable sets $A$ and all points $x$; moreover, it is unique among all normed measures satisfying these properties. Examples of Lebesgue null sets are $\mathbb{Q}$, resp. $\mathbb{Q}^d$ according to the underlying space, or, more generally, all countable sets; a more advanced example is the uncountable Cantor set.

∗ Actually, the idea to enlarge the set of figures, which do not constitute a σ-algebra, by limits of figures modulo null sets reminds us of Cantor's construction of the real numbers.

We conclude our brief outline of measure theory with the notion of a probability space. A measure $\mathbf{P}$ is said to be a probability measure if the values of $\mathbf{P}$ lie in $[0,1]$ and $\mathbf{P}(X) = 1$. For any finite measure $\mu$ with $\mu(X) > 0$ we can thus always define a probability measure by setting $\mathbf{P}(A) = \mu(A)/\mu(X)$. An important property of a probability measure is
$$\mathbf{P}(X \setminus A) = 1 - \mathbf{P}(A) \quad\text{for any } A \in \mathcal{F}.$$
A triple $(X, \mathcal{F}, \mathbf{P})$ consisting of a set $X \neq \emptyset$, a σ-algebra $\mathcal{F}$, and a probability measure $\mathbf{P}$ is called a probability space. The underlying σ-algebra is said to be the event space and its elements $E$ are the events, which appear with probability $\mathbf{P}(E)$. It is remarkable that the axiomatic foundation of probability theory was given as late as 1933 by Kolmogorov [88].

Probability theory often allows an interesting view on number theoretical questions, in particular in the context of distribution properties of arithmetical functions (which is only another expression for sequences of complex numbers).
If $(X_n)$ is a sequence of independent random variables which are uniformly distributed on $[0,1)$, then the law of the iterated logarithm implies, for any $m \neq 0$,
$$\limsup_{N \to \infty} \frac{\big|\sum_{n \le N} e(mX_n)\big|}{\sqrt{2N \log\log N}} = 1 \quad\text{almost surely,}$$
which means that this equality holds with probability $\mathbf{P}(E) = 1$, where $E$ stands for this event. Consequently, the set of all sequences $(x_n)$ in $[0,1)$ for which the above lim sup condition does not hold is a null set. (For this law of the iterated logarithm see [17, 83].) This may be compared with Weyl's Theorem 1.4.

2.2. The Lebesgue Integral

Next we give a short introduction to Lebesgue's integration theory. Only here do functions enter the stage. We write $f \le g$ for two real-valued functions if the inequality $f(x) \le g(x)$ holds for almost all $x$ for which $f$ and $g$ are defined (which will always be clear from the context). Given a measure space $(X, \mathcal{F}, \mu)$, a function $f : X \to \mathbb{R}$ is called measurable (resp. µ-measurable) if the set
$$\{x \in X : f(x) < \alpha\}$$
is measurable for any $\alpha \in \mathbb{R}$ (i.e., if it lies in $\mathcal{F}$). In particular, any continuous function is measurable with respect to the Lebesgue measure, or, more generally, with respect to any measure defined on a Borel σ-algebra. A function is said to be simple if its image is finite. In order to define the integral for non-negative simple functions $\eta$ we write $\eta$ as a finite linear combination of indicator functions
$$\eta = \sum_{j=1}^{m} c_j \chi_{B_j} \quad\text{with } B_j := \{x : \eta(x) = c_j\}$$
and pairwise distinct $c_j \ge 0$ which constitute the image $\eta(X)$; in particular, the sets $B_j$ are disjoint. Here the indicator function $\chi_B$ of a set $B \subset X$ is defined by
$$\chi_B(x) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{otherwise.} \end{cases}$$
For an interval $B$ this coincides with the definition of indicator functions from the previous chapter. Obviously, this function is measurable if, and only if, $B$ is measurable. A similar statement holds for simple functions $\eta$. The integral of $\chi_B$ with $B \in \mathcal{F}$ taken over a measurable set $A$ is defined by
$$\int_A \chi_B \, d\mu = \mu(A \cap B),$$
resp.
for measurable simple functions $\eta$ by
$$\int_A \eta \, d\mu = \sum_{j=1}^{m} c_j \int_A \chi_{B_j} \, d\mu = \sum_{j=1}^{m} c_j \, \mu(A \cap B_j).$$
Using simple functions we can approximate any non-negative, real-valued measurable function $f$ to any accuracy and thus define the Lebesgue integral by
$$\int_A f \, d\mu = \sup \int_A \eta \, d\mu, \tag{2.1}$$
where the supremum is taken over all measurable simple functions $\eta$ satisfying $0 \le \eta \le f$. Using Young's decomposition, that is
$$f = f^+ - f^- \quad\text{with } f^+ := \max\{f, 0\}, \quad f^- := -\min\{f, 0\}, \tag{2.2}$$
we define the integral for any measurable, real-valued function $f$ by
$$\int_A f \, d\mu = \int_A f^+ \, d\mu - \int_A f^- \, d\mu$$
(simply by applying our integral for non-negative functions to both summands $f^+$ and $f^-$ individually). The function $f$ is said to be integrable (resp. µ-integrable) if both integrals on the right-hand side are finite.

This definition of the Lebesgue integral reflects all the important properties we are used to, namely monotonicity, translation invariance, and linearity (which also allows us to define the integral for complex-valued measurable functions). Moreover, it does not depend on the representation of simple functions as linear combinations of indicator functions (because of (2.1)).

What is the difference to Riemann's integral? Here is an illuminating quotation of Lebesgue himself on this difference: “The geometers of the seventeenth century considered the integral of $f(x)$ – the word ‘integral’ had not been invented, but that does not matter – as the sum of an infinity of indivisibles, each of which was the ordinate, positive or negative, of $f(x)$. Very well! We have simply grouped together the indivisibles of comparable size. (...)
One could say that, according to Riemann's procedure, one tried to add the indivisibles by taking them in the order in which they were furnished by variation in $x$, like an unsystematic merchant who counts coins and bills at random in the order in which they came to hand, while we operate like a methodical merchant who says:
I have $m(E_1)$ pennies which are worth $1 \cdot m(E_1)$,
I have $m(E_2)$ nickels which are worth $5 \cdot m(E_2)$,
I have $m(E_3)$ dimes which are worth $10 \cdot m(E_3)$, etc.
Altogether then I have
$$S = 1 \cdot m(E_1) + 5 \cdot m(E_2) + 10 \cdot m(E_3) + \ldots$$
The two procedures will certainly lead the merchant to the same result because no matter how much money he has there is only a finite number of coins or bills to count. But for us who must add an infinite number of indivisibles the difference between the two methods is of capital importance.”

When calculating a Lebesgue integral we may disregard null sets. For instance, the Dirichlet function $\delta = \chi_{\mathbb{Q}}$, defined by $\delta(x) = 1$ for $x \in \mathbb{Q}$ and $\delta(x) = 0$ for $x \in \mathbb{R} \setminus \mathbb{Q}$, is not integrable in the sense of Riemann; however, it is Lebesgue integrable with
$$\int_{[0,1]} \delta \, d\lambda = \lambda([0,1] \cap \mathbb{Q}) = 0$$
(since $\mathbb{Q}$ is countable and a fortiori a null set). This reflects what we should expect from the integral of a function which vanishes almost everywhere. If a property $E$ holds for all $x \in A \setminus B$, where $A, B$ are µ-measurable, and if $B$ is a null set, that is $\mu(B) = 0$, then we say that $E$ holds for almost all $x \in A$, or that $E$ is true on $A$ almost everywhere. If $\mu$ is a probability measure, we may also write $\mu(A) = 1$ and the event $E$ can be identified with $A$. This makes the Lebesgue integral a powerful tool!

Another important feature in the above construction is the σ-additivity of the underlying measure, which allows properties such as measurability and integrability to pass from sequences of functions to their limits! This leads to the famous convergence theorems due to Lebesgue and his contemporaries.
Here is Lebesgue's dominated convergence theorem:

Theorem 2.1. Let $(g_n)$ be a sequence of measurable functions on a measure space $(X, \mu)$. Assume that $\lim_{n \to \infty} g_n(x)$ exists for almost all $x \in X$ and is measurable, and that there exists an integrable function $g \ge 0$ such that $|g_n(x)| \le g(x)$ for almost all $x$ and any $n$. Then
$$\lim_{n \to \infty} \int_X g_n \, d\mu = \int_X \lim_{n \to \infty} g_n \, d\mu.$$

Thus, we may interchange limit and integration under quite weak conditions. Here only pointwise convergence of the sequence of the $g_n$ is needed, not the more restrictive uniform convergence which is needed for the Riemann integral. The monotone convergence theorem strengthens the theorem on dominated convergence:

Theorem 2.2. If $(g_n)$ is an almost everywhere increasing sequence of real-valued, non-negative, measurable functions on $X$, and if $g_n$ converges to $g$ pointwise almost everywhere, then
$$\lim_{n \to \infty} \int_X g_n \, d\mu = \int_X g \, d\mu.$$

We conclude by introducing a vector space structure. For $1 \le p < +\infty$ we denote the vector space of all measurable functions $f : X \to \mathbb{C}$ with semi-norm
$$\|f\|_p := \left(\int_X |f|^p \, d\mu\right)^{1/p} < +\infty$$
by $\mathcal{L}^p(X, \mathcal{F}, \mu)$. Taking the quotient with respect to the equivalence relation
$$f \sim g \;:\iff\; \{x \in X : f(x) \neq g(x)\} \text{ is a null set,}$$
we obtain the normed quotient vector space
$$L^p(X, \mathcal{F}, \mu) = \mathcal{L}^p(X, \mathcal{F}, \mu)/\sim$$
or, for short, $L^p$. Here two functions are identified if their values differ only on a set of measure zero, and the norm is defined as the continuation of $\|\cdot\|_p$. The famous theorem of Riesz & Fischer states that the spaces $L^p$ are complete; the special case $p = +\infty$ does not play any role in the sequel.

* * *

In the first chapter we have found characterisations of uniformly distributed sequences, e.g., the sequence of numbers $\mathbb{N} \ni n \mapsto x_n := n\xi$ with irrational $\xi$. In the following we want to investigate mappings $T : X \to X$ defined on certain sets $X$ in order to study the dynamics of the iteration of $T$. In our approach the concept of measure plays a central role.
CHAPTER 3

Measure Invariance and Ergodicity

We shall consider mappings $T : X \to X$ defined on certain sets $X$. Our aim is to understand the dynamics of iterations of $T$. For this purpose we may assume that the transformation $T$ respects the structure of $X$: if $X$ is a topological space, we may suppose $T$ to be continuous; if $X$ carries a differentiable structure, we want $T$ to be a diffeomorphism. In the sequel we shall often work in probability spaces, hence we may suppose that $T$ is measurable.

3.1. Measure Preserving Transformations

Given a measure space $(X, \mathcal{F}, \mu)$, a transformation $T : X \to X$ is said to be measurable (or, more precisely, µ-measurable) if
$$T^{-1}A := \{x : T(x) \in A\} \in \mathcal{F} \quad\text{for all } A \in \mathcal{F}.$$
Any such mapping $T$ is said to be invertible if $TA := \{T(x) : x \in A\} \in \mathcal{F}$ for all $A \in \mathcal{F}$ and $TX = X$. A measurable mapping $T$ is said to be measure preserving with respect to $\mu$ if
$$\mu(T^{-1}A) = \mu(A) \quad\text{for all } A \in \mathcal{F};$$
that means the measure of a set always equals the measure of its preimage. If $T$ is additionally invertible, the latter property is equivalent to $\mu(TA) = \mu(A)$. If $T$ is measure preserving, then $(X, \mathcal{F}, \mu, T)$ is called a dynamical system. From a measure theoretical point of view one may also say that 'µ is T-invariant' rather than 'T is µ-measure preserving'.

Given a mapping $T$ as above and $x \in X$, define
$$T^0(x) = x, \quad T^1(x) = T(x) \quad\text{and}\quad T^{n+1}(x) = T(T^n(x)) \quad\text{for } n \in \mathbb{N};$$
however, we shall use the abbreviation $T^n x$ in place of $T^n(x)$. The orbit or trajectory of $x$ under $T$ is defined as the set $\{T^n x : n \in \mathbb{N}_0\}$. The orbit encodes important information about the point $x$ and the map $T$, respectively. In the case of invertible maps it also makes sense to consider the past, i.e.,
$$\ldots, T^{-2}x, \; T^{-1}x, \; T^0 x = x, \; Tx, \; T^2 x, \ldots.$$
We may interpret this configuration as a dynamical system with discrete time.
In systems with continuous time one studies flows
$$\varphi : X \times \mathbb{R} \to X, \quad (x, t) \mapsto \varphi(x, t) =: \varphi_t(x)$$
with $\varphi_0(x) = x$ for all $x \in X$ and $\varphi_s \circ \varphi_t = \varphi_{s+t}$; however, in the sequel we shall focus on the discrete time setting.

We already know two interesting transformations. In order to embed these examples into our new language let us take the measure space $X = [0,1)$ with the Borel σ-algebra $\mathcal{B}$ and the Lebesgue measure $\lambda$.

♣ Example 1): The transformation from circle billiards is called circle rotation (resp. translation) and is, for fixed $\theta \in (0,1)$, defined by
$$R_\theta : \mathbb{T} \to \mathbb{T}, \quad x \mapsto x + \theta.$$
(Obviously, we could also define $R_\theta(x) = \{x + \theta\} = x + \theta \bmod 1$ on $[0,1)$.) The projection of the sequence $n \mapsto n\xi$ onto the circle group $\mathbb{T}$ is a circle rotation: for $x_n = n\xi$ we have $R_\xi^n 0 = x_n$ modulo one. Obviously, $R_\theta$ is measurable with respect to the Lebesgue measure. In fact, given a subinterval $(\alpha, \beta)$ of $[0,1)$, we have
$$R_\theta^{-1}(\alpha, \beta) = (\alpha - \theta, \beta - \theta) \quad\text{or}\quad = (1 + \alpha - \theta, 1 + \beta - \theta),$$
according to $\theta \le \alpha$ or $\beta \le \theta$, and
$$R_\theta^{-1}(\alpha, \beta) = (0, \beta - \theta) \cup (1 + \alpha - \theta, 1),$$
if $\alpha < \theta \le \beta$. This also shows that $R_\theta$ is measure preserving with respect to $\lambda$ since in all cases
$$\lambda(R_\theta^{-1}(\alpha, \beta)) = \beta - \alpha = \lambda((\alpha, \beta)).$$
In our reasoning we are allowed to restrict to intervals since the Borel σ-algebra is generated by the open subsets of $X = [0,1)$. This simplification is based on the notion of a monotonic class $\mathcal{C}$ consisting of all finite disjoint unions of elements of an algebra $\mathcal{A}$. If additionally $\mathcal{F}$ is a σ-algebra generated by $\mathcal{C}$ and $(X, \mathcal{F}, \mu)$ a measure space, then for any $A \in \mathcal{F}$ and any $\epsilon > 0$ there exists a set $B \in \mathcal{C}$ such that
$$\mu(A \Delta B) < \epsilon;$$
hence, $B$ approximates the given set $A$ as closely as we want. On account of this approximation, properties such as measurability and measure invariance can be transported from $\mathcal{C}$ to the completion $\mathcal{F}$ with respect to $\mu$. This is known as the theorem of Hahn-Kolmogorov; we refer the interested reader to [37] and [148] for details.
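The case distinction for the preimage of an interval under the circle rotation in Example 1 can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function names `rotation_preimage`, `total_length` and `rotate` are illustrative assumptions, and exact rational arithmetic is used only to avoid rounding issues.

```python
from fractions import Fraction

def rotate(x, theta):
    """Circle rotation R_theta(x) = x + theta mod 1 on [0, 1)."""
    return (x + theta) % 1

def rotation_preimage(alpha, beta, theta):
    """Preimage of the interval (alpha, beta) under R_theta, following the
    three cases of Example 1; returned as a list of (left, right) endpoints."""
    if theta <= alpha:
        return [(alpha - theta, beta - theta)]
    if beta <= theta:
        return [(1 + alpha - theta, 1 + beta - theta)]
    # alpha < theta < beta: the preimage wraps around the point 0
    return [(Fraction(0), beta - theta), (1 + alpha - theta, Fraction(1))]

def total_length(intervals):
    """Lebesgue measure of a finite disjoint union of intervals."""
    return sum(right - left for left, right in intervals)

alpha, beta = Fraction(1, 4), Fraction(3, 4)
for theta in (Fraction(1, 8), Fraction(1, 2), Fraction(7, 8)):
    pre = rotation_preimage(alpha, beta, theta)
    # measure invariance: the preimage has the same length as (alpha, beta)
    assert total_length(pre) == beta - alpha
    # sanity check: the midpoint of each piece is mapped into (alpha, beta)
    for left, right in pre:
        assert alpha < rotate((left + right) / 2, theta) < beta
```

The check covers one value of θ for each of the three cases; it illustrates the computation on intervals only and does not replace the approximation argument for general Borel sets.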
♣ Example 2): The transformation of the Gelfand problem is given by
$$T : [0,1) \to [0,1), \quad x \mapsto 2x \bmod 1 = \begin{cases} 2x & \text{if } 0 \le x < \tfrac12, \\ 2x - 1 & \text{if } \tfrac12 \le x < 1 \end{cases}$$
(in some literature this is also called the doubling map). Given a subinterval $(\alpha, \beta)$ in $[0,1)$, we have
$$T^{-1}(\alpha, \beta) = \left(\tfrac{\alpha}{2}, \tfrac{\beta}{2}\right) \cup \left(\tfrac{\alpha+1}{2}, \tfrac{\beta+1}{2}\right),$$
which obviously is an element of $\mathcal{B}$; hence, $T$ is Lebesgue measurable. The union on the right is disjoint (since $\alpha + 1 \ge \beta$) and, moreover,
$$\lambda(T^{-1}(\alpha, \beta)) = \beta - \alpha = \lambda((\alpha, \beta)).$$
Thus, $T$ is measure preserving with respect to the Lebesgue measure. The cautious reader might have been surprised by the definition of measure preserving, where it is required that $T^{-1}A$ and $A$ have the same measure (and not $TA$ and $A$). The doubling map provides an example of why this is a good concept since it is measure preserving as shown above, but not invertible, and in general
$$\lambda(T(\alpha, \beta)) \neq \beta - \alpha. \tag{3.1}$$
Although this example is simple, iterations of this mapping yield the binary expansion for numbers from the unit interval $[0,1)$. Given $x \in [0,1)$, let
$$b_1 = b_1(x) = \begin{cases} 0 & \text{if } 0 \le x < \tfrac12, \\ 1 & \text{if } \tfrac12 \le x < 1. \end{cases}$$
Then $Tx = 2x - b_1(x)$. Writing $b_n = b_n(x) = b_1(T^{n-1}x)$ for $n \in \mathbb{N}$, we thus find $x = \tfrac12(b_1 + Tx)$ and $Tx = \tfrac12(b_2 + T^2 x)$, resp. by induction
$$x = \frac{b_1}{2} + \frac{b_2}{2^2} + \ldots + \frac{b_n}{2^n} + \frac{T^n x}{2^n} \quad\text{for } n \in \mathbb{N}.$$
Since $0 \le T^n x < 1$ the remainder term $T^n x / 2^n$ converges to zero as $n \to \infty$. Hence, we obtain the binary expansion
$$x = \sum_{n=1}^{\infty} \frac{b_n}{2^n}.$$

♣ Example 3): For the same measure space as in the preceding example let $G = \tfrac12(\sqrt{5} + 1)$ be the golden section and define $T_G : X \to X$ by
$$T_G x = Gx \bmod 1 = \begin{cases} Gx & \text{if } 0 \le x < \tfrac1G, \\ Gx - 1 & \text{if } \tfrac1G \le x < 1. \end{cases}$$
Actually, $T_G$ is not measure preserving with respect to the Lebesgue measure; however, it is measure preserving with respect to $\mu$ defined by
$$\mu(A) = \int_A g(x) \, dx \quad\text{with}\quad g(x) = \begin{cases} \dfrac{1+2G}{2+G} & \text{if } 0 \le x < \tfrac1G, \\[2mm] \dfrac{1+G}{2+G} & \text{if } \tfrac1G \le x < 1. \end{cases}$$
The iterations $T_G^n x$ provide the so-called G-expansion of $x \in [0,1)$, that is
$$x = \sum_{n=1}^{\infty} \frac{c_n}{G^n}$$
with $c_n \in \{0, 1\}$ and $c_n c_{n+1} = 0$ for all $n \in \mathbb{N}$.
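The way the doubling map of Example 2 generates binary digits can be made concrete with a short program. This is only an illustrative sketch under the assumption that $x$ is given as an exact rational number; the names `doubling_map`, `binary_digits` and `reconstruct` are not from the notes.

```python
from fractions import Fraction

def doubling_map(x):
    """T x = 2x mod 1 on [0, 1)."""
    return (2 * x) % 1

def binary_digits(x, n):
    """Return ([b_1, ..., b_n], T^n x), where b_k = b_1(T^(k-1) x)
    and b_1(y) = 0 for y < 1/2, b_1(y) = 1 otherwise."""
    digits = []
    for _ in range(n):
        digits.append(0 if x < Fraction(1, 2) else 1)
        x = doubling_map(x)
    return digits, x

def reconstruct(digits, tail):
    """Evaluate b_1/2 + ... + b_n/2^n + tail/2^n, the finite identity
    obtained by induction in Example 2 (with tail = T^n x)."""
    n = len(digits)
    partial = sum(Fraction(b, 2 ** k) for k, b in enumerate(digits, start=1))
    return partial + tail / 2 ** n

x = Fraction(1, 3)
digits, tail = binary_digits(x, 4)
print(digits, tail)                    # digits of 1/3 and the point T^4 x
assert reconstruct(digits, tail) == x  # the identity holds exactly
```

Because the arithmetic is exact, the finite identity can be verified without error; with floating point numbers the iteration would lose one bit of precision per step, which is why rationals are assumed here.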
♣ Example 4): Next we consider a two-dimensional generalization of the Gelfand mapping, also known as the 'baker's transformation'. Consider $X = [0,1)^2$ equipped with the product σ-algebra $\mathcal{B} \times \mathcal{B}$ and the product Lebesgue measure $\lambda \times \lambda$. Then the map is defined by
$$b : [0,1)^2 \to [0,1)^2, \quad (x, y) \mapsto b(x, y) = \begin{cases} \left(2x, \tfrac{y}{2}\right) & \text{if } 0 \le x < \tfrac12, \\ \left(2x - 1, \tfrac{y+1}{2}\right) & \text{otherwise.} \end{cases}$$

[Figure 1. The 'baker's transformation' in action; it bears its name from the process with which a baker mixes water and flour for a dough. It looks like flaky pastry. These graphics were created with Maple notebooks from Choe [31]; points $(x_j, b(x_j))$ from a sufficiently large set of uniformly distributed $x_j$ are taken for an approximation to the graph of $b$.∗]

∗ This is in the spirit of the French impressionist painter Georges Seurat and his contemporaries.

The baker's transformation $b$ is measurable, invertible, and measure preserving with respect to $\lambda \times \lambda$.

[Figure 2. The first iterations $b, b^2, b^3$ of the baker's transformation.]

♣ Example 5): The so-called logistic transformation
$$\ell : [0,1] \to [0,1], \quad x \mapsto 4x(1-x)$$
is measurable and measure preserving with respect to
$$\mu(A) = \frac{1}{\pi} \int_A \frac{dx}{\sqrt{x(1-x)}}.$$
This density plays a prominent role in the Sato–Tate conjecture on the distribution of group orders of elliptic curves by reduction modulo primes, which was recently proved by Taylor [136]. In fact, it is the uniform distribution on the conjugacy classes of the special unitary group $SU_2(\mathbb{C})$ with respect to the Haar measure. In a similar way Deligne's famous proof of the Weil conjectures [38] shows a uniform distribution of the Frobenius conjugacy classes.

[Figure 3. The logistic transformation: to the left the graph $y = 4x(1-x)$, and its density to the right.]
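The invariance claimed in Example 5 can be tested numerically on intervals. The sketch below is an illustration under stated assumptions rather than a proof: it uses the closed form $\mu([0,x]) = \frac{2}{\pi}\arcsin\sqrt{x}$ of the given density and the explicit solution $x = \frac12(1 \pm \sqrt{1-y})$ of $4x(1-x) = y$; the function names are hypothetical.

```python
import math

def arcsine_cdf(x):
    """mu([0, x]) for the density 1 / (pi * sqrt(x (1 - x))) on [0, 1]."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))

def mu_interval(a, b):
    """mu((a, b)) for 0 <= a <= b <= 1."""
    return arcsine_cdf(b) - arcsine_cdf(a)

def logistic_preimage(a, b):
    """Preimage of (a, b) under x -> 4x(1-x): two intervals that are
    mirror images of each other with respect to x = 1/2."""
    lo = (1 - math.sqrt(1 - a)) / 2
    hi = (1 - math.sqrt(1 - b)) / 2
    return [(lo, hi), (1 - hi, 1 - lo)]

a, b = 0.2, 0.7
pieces = logistic_preimage(a, b)
mu_pre = sum(mu_interval(left, right) for left, right in pieces)
leb_pre = sum(right - left for left, right in pieces)

# mu is preserved up to rounding error ...
assert abs(mu_pre - mu_interval(a, b)) < 1e-12
# ... whereas the Lebesgue measure of the preimage differs from b - a
assert abs(leb_pre - (b - a)) > 0.1
```

The second assertion shows, for this particular interval, that the Lebesgue measure itself is not invariant under the logistic transformation, which is why the weighted measure $\mu$ is needed.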
♣ Example 6): Identifying the circle group $\mathbb{T}$ with the unit interval $[0,1)$ modulo one, we get for $\mathbb{T}^2 = \mathbb{T} \times \mathbb{T}$ the unit square $[0,1)^2$ where opposite sides are identified; hence $\mathbb{T}^2$ is a two-dimensional torus (or a doughnut in terms of a baker). The mapping
$$A : \mathbb{T}^2 \to \mathbb{T}^2, \quad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \bmod 1$$
is invertible (since the corresponding matrix has determinant 1, so that its inverse has integer entries as well) and, as a short computation shows, is measure preserving with respect to the two-dimensional Lebesgue measure. This map $A$ is also called “Arnold's cat map” in honour of V.I. Arnold.† The mapping $A$ is an example of a so-called toral automorphism.

† For the origin of this name see his monograph [6].

[Figure 4. How the cat map maps...]

[Figure 5. Iterations of cat Felix under the “Arnold cat map”: $A^0, A^1, A^2$ from left to right.]

We conclude with a last example. The famous 3X + 1-problem (also known as Collatz or Syracuse problem) is based on the following iteration on the set of positive integers:
$$x \mapsto Tx = \begin{cases} x/2 & \text{if } x \text{ even,} \\ 3x + 1 & \text{if } x \text{ odd.} \end{cases}$$
For instance,
$$\ldots \mapsto 12 \mapsto 6 \mapsto 3 \mapsto 10 \mapsto 5 \mapsto 16 \mapsto 8 \mapsto 4 \mapsto 2 \mapsto 1 \mapsto \ldots,$$
hence, the orbit of $x = 12$ is eventually periodic. It is conjectured that this iteration always ends in the cycle $4 \mapsto 2 \mapsto 1$, independent of the initial value $x$. A weaker conjecture claims that there are no divergent trajectories of this iteration. The mapping $T$ is definitely not injective. This example illustrates that it sometimes makes sense to study the past of an iteration: what is the preimage of 1 under this iteration? Actually, there is an interesting ergodic approach to this open problem. Matthews & Watts [102] showed that $T$ is measure preserving on the set $\mathbb{Z}_2$ of 2-adic integers equipped with the corresponding Haar measure, and, using Birkhoff's ergodic theorem, that the iterations $T^n x$ are uniformly distributed modulo $2^k$ for any $k \in \mathbb{N}$ and almost all $x \in \mathbb{Z}_2$.
Unfortunately, this result is beyond the scope of this course; however, the interested reader can find more information in the survey of Lagarias [93] and in Wirsching's book [153]. It is somehow surprising that a problem as easy to formulate as the 3X + 1-problem seems to be so difficult to solve.‡

‡ In the 1960s Kakutani became interested in this problem; he is reported to have said: “For about a month everybody at Yale (University) worked on it, with no result. A similar phenomenon happened when I mentioned it at the University of Chicago. A joke was made that this problem was part of a conspiracy to slow down mathematical research in the U.S.” (cf. [93])

Further examples of measure preserving transformations can be found in [31]; for the case of Bernoulli shifts we refer to [37]. Next we shall give a criterion for measure preservation in analogy to Weyl's Theorem 1.3 on uniform distribution modulo one:

Theorem 3.1. A transformation $T : X \to X$ is measure preserving with respect to $\mu$ if, and only if, for all µ-integrable functions $f : X \to \mathbb{C}$,
$$\int_X f \, d\mu = \int_X f \circ T \, d\mu. \tag{3.2}$$

In the formula giving the equivalent of measure invariance one may understand $T$ as the time evolution of the dynamical system, $f$ as the outcome of a physical experiment, and the integral as the expected value of the outcome of $f$; then the invariance of the measure $\mu$ is nothing but the statement that the expected outcome is the same now and one time unit later.

In the case of metric spaces it suffices to prove the condition only for continuous functions $f$. One implication then follows from the proof below, the converse one from the representation theorems of Hahn-Banach and Riesz (see [121]).

Proof. Assume (3.2) holds. Let $A$ be a measurable set and denote by $\chi_A$ its (measurable) indicator function. Then,
$$\mu(A) = \int_X \chi_A \, d\mu = \int_X \chi_A \circ T \, d\mu = \int_X \chi_{T^{-1}A} \, d\mu = \mu(T^{-1}A).$$
Hence, $T$ is measure preserving. Now assume that $T$ is measure preserving.
Then (3.2) holds in particular for indicator functions and, consequently, for all simple functions too. Now suppose that $f \ge 0$ and $(f_n)$ is a convergent sequence of measurable simple functions with $0 \le f_n \le f$ and limit $f$. Then $\lim_{n \to \infty} f_n \circ T = f \circ T$. Applying Lebesgue's theorem on dominated convergence, Theorem 2.1, with $g_n = f_n \circ T$ and with $g_n = f_n$ as well, we find
$$\int_X f \circ T \, d\mu = \lim_{n \to \infty} \int_X f_n \circ T \, d\mu = \lim_{n \to \infty} \int_X f_n \, d\mu = \int_X f \, d\mu,$$
where we have used (3.2) for simple functions in the last but one step. By the decomposition (2.2), we deduce the statement for arbitrary real-valued functions $f$; complex-valued $f$ can be treated by separating into real and imaginary part (in the same manner as in the proof of Theorem 1.4). •

♣ Example 7): Let $T : \mathbb{R} \to \mathbb{R}$ be defined by $T0 = 0$ and
$$Tx = \frac12\left(x - \frac1x\right) \quad\text{for } x \neq 0.$$
Then
$$T^{-1}(\alpha, \beta) = \left(\alpha - \sqrt{\alpha^2 + 1}, \beta - \sqrt{\beta^2 + 1}\right) \cup \left(\alpha + \sqrt{\alpha^2 + 1}, \beta + \sqrt{\beta^2 + 1}\right),$$
hence, $T$ is measurable. For any Lebesgue integrable function $f$, we find via the substitution $\tau = Tx$, $d\tau = \frac12\left(1 + \frac{1}{x^2}\right) dx$ (applied separately on the two branches $x < 0$ and $x > 0$) that
$$\int_{-\infty}^{+\infty} f(Tx) \frac{dx}{1 + x^2} = \int_{-\infty}^{+\infty} f(\tau) \frac{d\tau}{1 + \tau^2}.$$
Thus, Theorem 3.1 implies that $T$ is measure preserving with respect to the probability measure $\mathbf{P}$ defined by
$$\mathbf{P}((\alpha, \beta)) = \frac{1}{\pi} \int_\alpha^\beta \frac{d\tau}{1 + \tau^2}. \tag{3.3}$$
Alternatively, one may use the addition theorem
$$\arctan\left(x + \sqrt{x^2 + 1}\right) + \arctan\left(x - \sqrt{x^2 + 1}\right) = \arctan(x).$$
Actually, the transformation $T$ originates from Newton's iteration applied to the function $f(x) = x^2 + 1$. Here, Newton's iteration translates as follows:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \quad\leftrightarrow\quad Tx = x - \frac{x^2 + 1}{2x} = \frac12\left(x - \frac1x\right).$$
If $f$ had a real zero, the sequence of the numbers $x_n$ would converge; however, since $f(x) \neq 0$ for real $x$, the iteration diverges and provides an interesting random transformation (which we shall meet again in Chapter 6). This example is due to Lind (cf. [31]).

3.2. Ergodicity and Mixing

Now we consider a probability space $(X, \mathcal{F}, \mu)$.
A measure preserving transformation $T : X \to X$ is said to be ergodic with respect to $\mu$ if for any measurable set $A$ with $T^{-1}A = A$ either $\mu(A) = 0$ or $\mu(A) = 1$ holds. In this case $(X, \mathcal{F}, \mu, T)$ is called an ergodic dynamical system. Ergodicity thus means that any measurable T-invariant set is either a null set or has full measure.∗

∗ In probability theory many so-called 0-1 laws are known (starting from the work of Kolmogorov and Borel).

Theorem 3.2. The following statements are equivalent:
(i) $T$ is ergodic;
(ii) $\mu(B) = 0$ or $= 1$ for all $B \in \mathcal{F}$ with $\mu(T^{-1}B \Delta B) = 0$;
(iii) $\mu\big(\bigcup_n T^{-n}A\big) = 1$ for all $A \in \mathcal{F}$ with $\mu(A) > 0$;
(iv) for any $A, B \in \mathcal{F}$ with $\mu(A) > 0$ and $\mu(B) > 0$, there exists some $n \in \mathbb{N}$ such that $\mu(T^{-n}A \cap B) > 0$.
If $T$ is invertible, we can replace $T^{-n}$ by $T^n$ in these conditions for ergodicity.

We want to give a few remarks. Condition (iii) claims that whenever $A$ has positive measure almost every $x \in X$ will eventually visit $A$ under $T$ (even infinitely often), whereas Condition (iv) shows that, provided $B$ has positive measure, a subset of $B$ of positive measure visits $A$ after a suitable number of iterations of $T$.

Proof. (i) ⇒ (ii): We assume that $B$ is measurable with $\mu(T^{-1}B \Delta B) = 0$ and that $T$ is ergodic. We denote the limit superior by
$$C := \bigcap_{m=0}^{\infty} \bigcup_{n=m}^{\infty} T^{-n}B.$$
For $m \in \mathbb{N}_0$, we have
$$B \Delta \bigcup_{n=m}^{\infty} T^{-n}B \subset \bigcup_{n=m}^{\infty} B \Delta T^{-n}B.$$
Now
$$B \Delta T^{-n}B \subset \bigcup_{k=0}^{n-1} T^{-k}B \Delta T^{-(k+1)}B$$
and since, by assumption and the invariance of $\mu$, each set $T^{-k}B \Delta T^{-(k+1)}B = T^{-k}(B \Delta T^{-1}B)$ on the right-hand side has measure zero, it follows that $\mu(B \Delta T^{-n}B) = 0$ for any $n \in \mathbb{N}$. Now let
$$C_m = \bigcup_{n=m}^{\infty} T^{-n}B,$$
hence, the $C_m$ are nested one in another: $C_0 \supset C_1 \supset C_2 \supset \ldots$. Moreover, $\mu(C_m) = \mu(B)$ for any $m \in \mathbb{N}_0$. It thus follows that $\mu(C \Delta B) = 0$ and $\mu(C) = \mu(B)$, respectively. Furthermore, we have
$$T^{-1}C = \bigcap_{m=0}^{\infty} \bigcup_{n=m}^{\infty} T^{-(n+1)}B = \bigcap_{m=0}^{\infty} \bigcup_{n=m+1}^{\infty} T^{-n}B = C.$$
Since $T$ is ergodic, $\mu(C) = 0$ or $\mu(C) = 1$. In view of our previous observation it follows that either $\mu(B) = 0$ or $\mu(B) = 1$.

(ii) ⇒ (iii): Now assume that we are given a set $A$ such that $\mu(A) > 0$ and let $B = \bigcup_{n=1}^{\infty} T^{-n}A$. Then
$$T^{-1}B = \bigcup_{n=2}^{\infty} T^{-n}A \subset B.$$
Since $T$ is measure preserving, it follows that $\mu(T^{-1}B) = \mu(B)$, hence
$$\mu(B \Delta T^{-1}B) = \mu(B) - \mu(T^{-1}B) = 0.$$
Consequently, $\mu(B) = 0$ or $\mu(B) = 1$. Since $T^{-1}A \subset B$ and $\mu(T^{-1}A) = \mu(A) > 0$, monotonicity yields $\mu(B) > 0$, and it follows that $\mu(B) = 1$.

(iii) ⇒ (iv): Let both $A$ and $B$ be sets of positive measure. By Condition (iii),
$$\mu\Big(\bigcup_{n=1}^{\infty} T^{-n}A\Big) = 1,$$
hence
$$0 < \mu(B) = \mu\Big(\bigcup_{n=1}^{\infty} B \cap T^{-n}A\Big) \le \sum_{n=1}^{\infty} \mu(B \cap T^{-n}A).$$
In particular, there exists some $n$ with $\mu(B \cap T^{-n}A) > 0$.

(iv) ⇒ (i): Let $A$ be a set with $T^{-1}A = A$. Then
$$0 = \mu(A \cap X \setminus A) = \mu(T^{-n}A \cap X \setminus A)$$
for arbitrary $n \ge 1$. It thus follows from Condition (iv) that $\mu(A) = 0$ or $\mu(X \setminus A) = 0$, resp. $\mu(A) = 1 - \mu(X \setminus A) = 1$. •

Next we shall prove a criterion for ergodicity relevant for practical purposes:

Theorem 3.3. The following assertions are equivalent:
(i) $T$ is ergodic;
(v) if $f$ is a measurable function such that $f(Tx) = f(x)$ for (almost) all $x$, then $f$ is constant (almost) everywhere;
(vi) if $f \in L^2(X, \mathcal{F}, \mu)$ with $f(Tx) = f(x)$ for (almost) all $x$, then $f$ is constant (almost) everywhere.

In Conditions (v) and (vi) we may suppose $f(Tx) = f(x)$ for all or just for almost all $x \in X$; because of the negligibility of null sets in Lebesgue integration these statements are equivalent.

Proof. (i) ⇒ (v): Suppose that $T$ is ergodic and $f : X \to \mathbb{C}$ is measurable and satisfies $f(Tx) = f(x)$ for almost all $x$. Since this implies the same for both the real and the imaginary part of $f$ individually, we may suppose that $f$ is real-valued. For $k \in \mathbb{Z}$ and $n \in \mathbb{N}$ let
$$A_n^k = \left\{x \in X : f(x) \in \left[\tfrac{k}{n}, \tfrac{k+1}{n}\right)\right\}.$$
Then
$$T^{-1}A_n^k \Delta A_n^k \subset \{x \in X : f \circ T(x) \neq f(x)\}.$$
Since the set on the right-hand side has measure zero, Theorem 3.2, (ii), implies $\mu(A_n^k) \in \{0, 1\}$. For any $n$ the set $X$ is a disjoint union of sets $A_n^k$, i.e., $X = \bigcup_{k \in \mathbb{Z}} A_n^k$. Thus, there exists a unique integer $k(n)$ (depending on $n$) such that $\mu(A_n^{k(n)}) = 1$. Now let
$$Y = \bigcap_{n=1}^{\infty} A_n^{k(n)}.$$
Then $\mu(Y) = 1$ and $f$ is constant on $Y$. Since $Y$ has full measure, $f$ is constant almost everywhere.

The implication (v) ⇒ (vi) is trivial, so it remains to prove (vi) ⇒ (i): suppose that $T^{-1}A = A$ with a measurable set $A$ of positive measure; then we need to show $\mu(A) = 1$. For the indicator function of $A$ we have $\chi_A \in L^2(X, \mathcal{F}, \mu)$ and
$$\chi_A \circ T = \chi_{T^{-1}A} = \chi_A.$$
By assumption, $\chi_A$ is constant almost everywhere; since $\mu(A) > 0$, this gives $\chi_A(x) = 1$ for almost all $x$. This implies $\mu(A) = 1$. The theorem is proved. •

As an application we study two examples of measure preserving transformations from the previous section with respect to ergodicity. Both mappings are defined by means of a reduction modulo one, which suggests using Criterion (vi) from above in combination with Fourier analysis. Recall that any $L^2$-function can be represented by its Fourier series (as proved, for example, in [121]).

♣ Example 1): The circle rotation
$$R_\theta : [0,1) \to [0,1), \quad x \mapsto x + \theta \bmod 1$$
describes the distribution of the fractional parts of the real number sequence $x_n = n\theta + \beta$, where $\beta$ is the starting point of the orbit. Corollary 1.5 implies that the sequence $(n\theta)$ is uniformly distributed modulo one if, and only if, $\theta$ is irrational. Analogously, the same statement holds true for shifted sequences $(n\theta + \beta)$ independent of $\beta$. The following theorem shows that this is indeed an ergodic phenomenon:

Theorem 3.4. The circle rotation $R_\theta$ is ergodic with respect to the Lebesgue measure if, and only if, $\theta$ is irrational.

Proof. Suppose $\theta = \frac{p}{q}$ is rational. Then $x \mapsto e(qx) = \exp(2\pi i q x)$ defines a non-constant $R_\theta$-invariant function:
$$e(q R_\theta x) = \exp\left(2\pi i q \left(x + \tfrac{p}{q}\right)\right) = \exp(2\pi i q x) \exp(2\pi i p) = e(qx).$$
In view of Theorem 3.3, Condition (v), it follows that $R_\theta$ is not ergodic.

Suppose that $\theta$ is irrational. Let
$$f(x) = \sum_{n \in \mathbb{Z}} c_n e(nx) \tag{3.4}$$
denote the Fourier series of an $R_\theta$-invariant function $f \in L^2$. Then
$$f(x) = f(R_\theta x) = f(x + \theta) = \sum_{n \in \mathbb{Z}} c_n e(n\theta) e(nx),$$
and, with the uniqueness of the Fourier expansion, $c_n = c_n e(n\theta)$, resp.
$c_n (1 - e(n\theta)) = 0$ for $n \in \mathbb{Z}$. For $n \ne 0$ we have $e(n\theta) \ne 1$ thanks to the irrationality of $\theta$, hence $c_n = 0$. Thus $f(x) = c_0$ is constant and Theorem 3.3, Condition (vi), implies the ergodicity of $R_\theta$. • (For a proof without Fourier theory we refer to [47].)

♣ Example 2): Consider the doubling map $T : [0, 1) \to [0, 1)$, $x \mapsto 2x \bmod 1$. As in the previous proof we start with a $T$-invariant function $f \in L^2$ with Fourier series (3.4). Then
\[ f(x) = f(Tx) = \sum_{n \in \mathbb{Z}} c_n e(2nx). \]
Comparing coefficients we find $c_n = c_{2n}$, and hence $c_n = c_{2n} = c_{4n} = \ldots$ for every $n$. The Parseval identity (see [121]) yields
\[ \int_0^1 |f(x)|^2 \, dx = \|f\|_2^2 = \sum_{n \in \mathbb{Z}} |c_n|^2 < +\infty, \]
so that $c_m \to 0$ as $|m| \to \infty$. Hence, all $c_n$ with $n \ne 0$ vanish, and by Theorem 3.3, (vi), the ergodicity of $T$ follows.

This reasoning can be extended to toral automorphisms:

Theorem 3.5. Let $A \in \mathbb{Z}^{d \times d}$ be a matrix and $T_\varphi : \mathbb{T}^d \to \mathbb{T}^d$, $\varphi(x) = Ax \bmod 1$ for $x \in \mathbb{T}^d$. Then $T_\varphi$ is ergodic if, and only if, no eigenvalue of $A$ is a root of unity.

In particular, the mapping $x \mapsto x \bmod 1$ is not ergodic (which, of course, is trivial). The proof of the general statement is not much more difficult than the special case sketched above. We refer to [33, 31] for details.

A close relative of ergodicity is the notion of mixing. A transformation $T$ is said to be strongly mixing if, for all $A, B \in \mathcal{F}$,
\[ \lim_{n \to \infty} \mu(A \cap T^{-n}B) = \mu(A)\mu(B). \]
In contrast, $T$ is called weakly mixing if
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)| = 0. \]
Obviously, the following chain of implications holds:

strongly mixing ⇒ weakly mixing ⇒ ergodic.

An example of a strongly mixing process is provided by the Baker transformation $b$. On the other hand, circle rotations $R_\theta$ with irrational $\theta$ are ergodic but not strongly mixing. Examples of weakly but not strongly mixing transformations were given by Kakutani [76]. For the different notions of mixing Halmos [58] found the following intuitive cocktail example: given a bowl with 90 percent gin and 10 percent vermouth.
After shaking sufficiently long, the two fluids mix to one drink in which any Borel set should contain the same proportions of gin and vermouth.

Exercises

Practice makes perfect. Since mathematics is no spectator sport, it is always best to examine new theory by examples.

Exercise 3.1. Verify all claims made above about i) $T_G$ and the $G$-expansion, and ii) the Baker transformation.

Next we state inverse problems: how can ergodicity be discovered, and is the measure associated with an ergodic transformation unique? Both are difficult questions, as one may experience with the following

Exercise 3.2. Define the transformation $T$ by $T0 = 0$ and $Tx = \{\frac{1}{x}\}$ for $x \in (0, 1)$. Try to find a measure $\mu$ on $[0, 1)$ such that $T$ is measure preserving with respect to $\mu$. Moreover, prove that the Lebesgue measure is the only measure for which the circle rotation $R_\theta$ is ergodic. Hint for the second task: use the circle group characters $x \mapsto e(mx)$, $m \in \mathbb{Z}$.

Given a transformation $T$ and a σ-algebra, there may be many ergodic measures with respect to $T$. If there is only one ergodic measure, then $T$ is said to be uniquely ergodic. The next exercises are also good for getting experience with all the new notions and techniques:

Exercise 3.3. Let $(X, \mathcal{F}, \mu)$ be a measure space and $T : X \to X$ a measurable mapping. Show that all $T$-invariant sets constitute a σ-algebra.

Exercise 3.4. Let $m > 1$ be a positive integer and denote by $X = \mathbb{Z}/m\mathbb{Z}$ the ring of residue classes modulo $m$. Further, put $\mathcal{F} = \mathcal{P}(X)$ and denote by $\mu$ the uniform distribution on $X$. Finally, for $b \in \{1, 2, \ldots, m\}$ put $T_b : X \to X$, $x \mapsto x + b \bmod m$. Prove that i) $T_b$ is measure preserving, and ii) $(X, \mathcal{F}, \mu, T_b)$ is ergodic if, and only if, $b$ and $m$ are coprime.

Exercise 3.5. Prove all statements on mixing and ergodicity and their hierarchy.

* * *

Our next aim are ergodic theorems.
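Before turning to the ergodic theorems, the finite system of Exercise 3.4 can be explored numerically. The following Python sketch is an experiment, not a proof; the function names `orbit` and `is_ergodic` are our own. It rests on the observation that for a bijection of a finite set with the uniform measure the invariant sets are exactly the unions of orbits, so that ergodicity amounts to the existence of a single orbit.

```python
from math import gcd

def orbit(b, m, start=0):
    """Orbit of `start` under T_b : x -> x + b mod m, listed in order of visit."""
    seen, x = [], start
    while x not in seen:
        seen.append(x)
        x = (x + b) % m
    return seen

def is_ergodic(b, m):
    # Assumption used here: T_b is a bijection of a finite set with uniform
    # measure, so the only invariant sets are unions of orbits; T_b is ergodic
    # exactly when the orbit of 0 exhausts Z/mZ.
    return len(orbit(b, m)) == m

m = 12
for b in range(1, m + 1):
    # second column: single orbit?   third column: b and m coprime?
    print(b, is_ergodic(b, m), gcd(b, m) == 1)
```

For $m = 12$ the two columns agree for every $b$, as the exercise predicts; for instance $b = 4$ produces the three-element orbit $0, 4, 8$.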
Birkhoff wrote in [21]: "What the Ergodic Theorem means, roughly speaking, is that for a discrete measure-preserving transformation or a measure-preserving flow of a finite volume, probabilities and weighted means tend towards limits when we start from a definite state P (not belonging to a possible exceptional set of measure 0), and, furthermore, the limiting value is the same in both directions."

Following Billingsley [17], here is an easy probabilistic proof of a special case of the ergodic theorem: if $T$ is mixing and $A$ is a measurable set, then $\frac{1}{N} \sum_{0 \le n < N} \chi_A(T^n x)$ converges in probability to the expectation of the indicator function $\chi_A$, that is $E\chi_A = P(A)$. In fact, if
\[ c_{mn} := E\big(\chi_A(T^m x) - P(A)\big)\big(\chi_A(T^n x) - P(A)\big) = P(T^{-m}A \cap T^{-n}A) - P(A)^2, \]
then, since $T$ preserves measure, we find $c_{mn} = \rho_{|n-m|}$ with
\[ \rho_k := P(A \cap T^{-k}A) - P(A)^2, \]
which tends to zero as $k \to \infty$ (because of the mixing property). Thus
\[ E\Big\{ \Big( \frac{1}{N} \sum_{0 \le n < N} \chi_A(T^n x) - P(A) \Big)^2 \Big\} = \frac{1}{N^2} \sum_{0 \le m, n < N} c_{mn} = \frac{1}{N} \rho_0 + \frac{2}{N^2} \sum_{0 < m < N} (N - m) \rho_m, \]
which tends to zero as $N \to \infty$ by the theorem on arithmetic means of convergent sequences. It thus follows from Chebyshev's inequality that $\frac{1}{N} \sum_{0 \le n < N} \chi_A(T^n x)$ converges in probability to $P(A)$. In the following chapter we shall prove a much stronger version...

CHAPTER 4
Classical Ergodic Theorems

In statistical mechanics one studies a large number of particles whose positions and momenta are governed by Hamilton's equations for a given Hamiltonian. The trajectories of these particles can be considered as a flow in phase space. It was Boltzmann's idea to study the flow rather than a single particle. In his ergodicity hypothesis from 1871 he claims that the average amount of time any given orbit spends in some set exactly equals the measure of this set. This statement asserts the equality of the mean along a trajectory (Greek: odos) of the system and the mean over all possible states of equal energy (Greek: ergon).
In 1879 Maxwell claimed that any system in any state, sooner or later, will move to any state possible with respect to the physical side conditions. It was Poincaré who discovered in 1890 that it is too restrictive to demand that the trajectory visits every point in the phase space which is compatible with the physical side conditions; hence, this restrictive ergodic hypothesis is false.∗ Actually, Poincaré formulated a weak ergodic hypothesis which states that the trajectory comes as close to any point of the phase space as we want (however, the trajectory does not need to visit this target point). The ergodic theorems below yield a justification of this weak ergodic hypothesis; hence, they might be interpreted as a mathematical foundation of statistical mechanics.†

∗ It seems that the strong formulation of Boltzmann's ergodic hypothesis, that one single trajectory fills the whole of phase space, is due to Ehrenfest's review of Boltzmann's work.
† In case of spontaneous breaks in symmetry, refutations of the ergodicity hypothesis can appear; this scenario has been observed in phase transitions when fluids freeze and in spin glasses.

4.1. The Mean Ergodic Theorem of von Neumann

The first ergodic theorem was found by John von Neumann [105] (although his result was published one year after Birkhoff's theorem, which will be discussed in the next section).

Theorem 4.1. Let $(X, \mathcal{F}, \mu)$ be a probability space and $T : X \to X$ measure preserving. Then, for $f, g \in L^2(X, \mathcal{F}, \mu)$, the limit of
\[ \frac{1}{N} \sum_{0 \le n < N} \int_X f(T^n x) g(x) \, d\mu(x) \]
exists as $N \to \infty$; if $T$ is ergodic, then
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \int_X f(T^n x) g(x) \, d\mu(x) = \int_X f \, d\mu \int_X g \, d\mu. \tag{4.1} \]

This theorem is also called the mean ergodic theorem because of the integration over $X$; the function $g$ is a suitable weight function and does not appear in von Neumann's original work. As a special case we conclude the convergence in $L^2$,
\[ \lim_{N \to \infty} \Big\| \frac{1}{N} \sum_{0 \le n < N} f(T^n x) - f^* \Big\|_2 = 0 \tag{4.2} \]
with a $T$-invariant limit $f^* \in L^2$.
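Formula (4.1) can be checked by hand in a simple case. The following worked computation is an illustration with choices of our own (the functions $f$ and $g$ below are not taken from the text): for the circle rotation $T = R_\theta$ with irrational $\theta$, which is ergodic by Theorem 3.4, and the characters $f(x) = e(kx)$, $g(x) = e(-kx)$ with an integer $k \ne 0$, both sides of (4.1) vanish.

```latex
% Worked example under our own choice of f and g:
% T = R_\theta, \theta irrational, f(x) = e(kx), g(x) = e(-kx), k \neq 0.
\begin{align*}
\int_0^1 f(T^n x)\, g(x)\, dx
  &= \int_0^1 e\big(k(x + n\theta)\big)\, e(-kx)\, dx = e(kn\theta), \\
\frac{1}{N} \sum_{0 \le n < N} e(kn\theta)
  &= \frac{1}{N} \cdot \frac{e(kN\theta) - 1}{e(k\theta) - 1}
  \;\longrightarrow\; 0 \qquad (N \to \infty), \\
\int_0^1 f\, dx \int_0^1 g\, dx &= 0 \cdot 0 = 0 .
\end{align*}
```

The geometric sum is legitimate because $e(k\theta) \ne 1$ for irrational $\theta$; for rational $\theta = p/q$ and $k = q$ every term $e(kn\theta)$ equals 1, the mean is 1, and (4.1) fails, in accordance with the non-ergodicity of $R_\theta$ in that case.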
The von Neumann ergodic theorem is a functional-analytic statement in the following sense. The limit $f^*$ in (4.2) is the orthogonal projection of $f$ onto the space of $T$-invariant functions in the Hilbert space $L^2$ equipped with the inner product $\langle f, g \rangle = \int_X f \overline{g} \, d\mu$, so that $\|f\|_2^2 = \langle f, f \rangle$.

Sketch of proof. Consider the subspace of $T$-invariant functions
\[ I := \{f \in L^2 : f \circ T = f\}, \]
and let
\[ J := \{f \in L^2 : \exists\, h \in L^2 \text{ with } f = h \circ T - h\}. \]
For $f_1 \in I$ and $f_2 = h \circ T - h \in J$ we have
\[ \frac{1}{N} \sum_{0 \le n < N} f_1(T^n x) = f_1(x) \quad \text{and} \quad \frac{1}{N} \sum_{0 \le n < N} f_2(T^n x) = \frac{1}{N} \big(h(T^N x) - h(x)\big) \]
for any $N \in \mathbb{N}$. By the Cauchy-Schwarz inequality,
\[ \Big| \int_X \frac{1}{N} \big(h(T^N x) - h(x)\big) g(x) \, d\mu(x) \Big| \le \frac{2}{N} \|h\|_2 \|g\|_2, \]
which tends to zero as $N \to \infty$. If we had a decomposition $f = f_1 + f_2$ with $f_1, f_2$ as above, we could deduce
\[ \frac{1}{N} \sum_{0 \le n < N} \int_X f(T^n x) g(x) \, d\mu(x) = \int_X f_1(x) g(x) \, d\mu(x) + \frac{1}{N} \sum_{0 \le n < N} \int_X f_2(T^n x) g(x) \, d\mu(x), \]
hence
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \int_X f(T^n x) g(x) \, d\mu(x) = \int_X f_1 g \, d\mu. \]
Unfortunately, in general there is no such decomposition of $f$ available. However, for every $\epsilon > 0$ we can find functions $f_1 \in I$ and $f_2 \in J$ such that
\[ \|f - (f_1 + f_2)\|_2^2 = \int_X |f - (f_1 + f_2)|^2 \, d\mu < \epsilon; \]
consequently, $f_1 + f_2$ approximates the target function $f$ in the mean-square. In a similar way as in the case $f = f_1 + f_2$ it follows that the limit exists and, for ergodic $T$ (where the invariant part is the constant $\int_X f \, d\mu$),
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \int_X f(T^n x) g(x) \, d\mu(x) = \int_X f \, d\mu \int_X g \, d\mu. \]
To finish the proof we only need to show that there exists a decomposition of $L^2$ into a direct sum $I \oplus \overline{J}$, where $\overline{J}$ denotes the closure of $J$. For this purpose we may suppose that $f$ is orthogonal to $J$, i.e., $\langle f, f_2 \rangle = 0$ for all $f_2 \in J$. In particular, taking $f_2 = f \circ T - f$,
\[ \int_X |f|^2 \, d\mu = \int_X (f \circ T) \cdot \overline{f} \, d\mu. \]
It remains to show that $f \in I$. For this we expand the square and use $\|f \circ T\|_2 = \|f\|_2$ (since $T$ is measure preserving) to compute
\[ \int_X |f \circ T - f|^2 \, d\mu = 2\|f\|_2^2 - 2 \operatorname{Re} \int_X (f \circ T) \cdot \overline{f} \, d\mu = 0. \]
Hence, $f \circ T = f$ almost everywhere, which means $f \in I$ and finishes the proof. •

Recently, Tao [134] proved a multidimensional version of von Neumann's theorem, which had been a long-standing problem.

4.2. The Birkhoff Pointwise Ergodic Theorem

Our next aim is the important ergodic theorem of George D. Birkhoff [20]:

Theorem 4.2. Let $T$ be a measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$. If $f \in L(X, \mathcal{F}, \mu)$, then, for almost all $x \in X$, the limit
\[ f^*(x) := \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} f(T^n x) \tag{4.3} \]
exists with $f^* \in L(X, \mathcal{F}, \mu)$ and satisfies $f^*(Tx) = f^*(x)$ as well as
\[ \int_X f^* \, d\mu = \int_X f \, d\mu. \tag{4.4} \]
If additionally $T$ is ergodic, then $f^*$ is constant almost everywhere and
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} f(T^n x) = \int_X f \, d\mu. \tag{4.5} \]

Birkhoff's original paper [20] deals only with the case of indicator functions. It was Aleksandr Khintchine [80] who extended Birkhoff's result to arbitrary integrable functions $f$ on a finite measure space. For this reason in some literature this result is also called the Birkhoff-Khintchine theorem. Another name one can find is pointwise ergodic theorem. It shows that, for ergodic $T$, the time mean (4.3) of $f$ along the orbit $\{T^n x\}$ equals for almost all $x$ the space mean of $f$ (taken over the complete space $X$). This provides a rather precise prediction although not too much might be known about $f$ or $T$: the conditions that $f \in L$ and that $T$ is measure preserving are rather weak. In this sense Birkhoff's ergodic theorem allows us to predict the mean value of $f$ along a trajectory, practically, without knowing anything! For example, if $M \subset X$ is measurable and $f = \chi_M$, the mean number of visits of $T^n x$ in $M$ is for almost all initial values $x$ equal to the measure of $M$, provided $T$ is ergodic. Ergodicity enforces uniform distribution! Our proof follows Kamae & Keane [77]:

Proof. We shall prove the statement for non-negative real-valued functions; the general case follows in a standard way by use of the decomposition $f = f^+ - f^-$ with non-negative $f^+, f^-$ (see (2.2)) for arbitrary real-valued functions, and for complex-valued functions by treating the real and the imaginary part separately. Therefore, let us assume $f \ge 0$.
We define
\[ f_N(x) = \sum_{0 \le n < N} f(T^n x) \]
as well as
\[ \overline{f}(x) = \limsup_{N \to \infty} \frac{f_N(x)}{N} \quad \text{and} \quad \underline{f}(x) = \liminf_{N \to \infty} \frac{f_N(x)}{N}. \]
It follows that both $\overline{f}$ and $\underline{f}$ are measurable (since $\limsup_{N \to \infty} f_N(x) = \inf_m \sup_{N \ge m} f_N(x)$, and analogously for $\liminf$). In view of $f_N(Tx) = f_{N+1}(x) - f(x)$ we have
\[ \overline{f}(Tx) = \limsup_{N \to \infty} \frac{f_N(Tx)}{N} = \limsup_{N \to \infty} \Big( \frac{f_{N+1}(x)}{N+1} \cdot \frac{N+1}{N} - \frac{f(x)}{N} \Big) = \limsup_{N \to \infty} \frac{f_{N+1}(x)}{N+1} = \overline{f}(x), \]
so $\overline{f}$ is $T$-invariant. In an analogous manner we find that $\underline{f}(Tx) = \underline{f}(x)$. In order to prove the existence of the limit $f^*$, its integrability, and its $T$-invariance, it suffices to show
\[ \int_X \overline{f} \, d\mu \le \int_X f \, d\mu \le \int_X \underline{f} \, d\mu, \tag{4.6} \]
since then $\underline{f} \le \overline{f}$ implies $\underline{f}(x) = \overline{f}(x) = f^*(x)$ for almost all $x$, and integration yields (4.4). (Here we use that if the Lebesgue integral of a non-negative function equals zero, then the function vanishes almost everywhere.)

Let $\epsilon \in (0, 1)$ and $L > 0$ be given. By definition of $\overline{f}$, for any $x \in X$ there exists a positive integer $m$ such that
\[ \frac{f_m(x)}{m} \ge (1 - \epsilon) \min\{\overline{f}(x), L\}; \]
in fact, this inequality holds independent of $L$ (with likely another $m$). For any given $\delta > 0$ we further find a positive integer $M$ such that the set
\[ X_+ := \{x \in X : \exists\, 1 \le m \le M \text{ with } f_m(x) \ge m(1 - \epsilon) \min\{\overline{f}(x), L\}\} \]
has measure greater than or equal to $1 - \delta$. Next define
\[ \tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in X_+, \\ L & \text{otherwise.} \end{cases} \]
It follows that $f \le \tilde{f}$; to see this assume $x \in X \setminus X_+$, hence $f_m(x) < m(1 - \epsilon) \min\{\overline{f}(x), L\}$ for all $1 \le m \le M$, which for $m = 1$ implies $f(x) \le L$. For $x \in X$ and $n \in \mathbb{N}_0$, let
\[ a_n := a_n(x) := \tilde{f}(T^n x) \quad \text{and} \quad b_n := b_n(x) := (1 - \epsilon) \min\{\overline{f}(x), L\}. \]
We claim that, for any $n \in \mathbb{N}_0$, there exists a positive integer $1 \le m \le M$ satisfying
\[ a_n + \ldots + a_{n+m-1} \ge b_n + \ldots + b_{n+m-1}. \tag{4.7} \]
In order to verify this, let us first assume that $T^n x \in X_+$. Then there exists $1 \le m \le M$ such that
\[ f_m(T^n x) \ge m(1 - \epsilon) \min\{\overline{f}(T^n x), L\} = m(1 - \epsilon) \min\{\overline{f}(x), L\} = b_n + \ldots + b_{n+m-1}, \]
where we have used the $T$-invariance of $\overline{f}$. Hence,
\[ a_n + \ldots + a_{n+m-1} = \tilde{f}(T^n x) + \ldots + \tilde{f}(T^{n+m-1} x) \ge f(T^n x) + \ldots + f(T^{n+m-1} x) = f_m(T^n x) \ge b_n + \ldots + b_{n+m-1}. \]
If $T^n x \notin X_+$, we may take $m = 1$ since
\[ a_n = \tilde{f}(T^n x) = L \ge (1 - \epsilon) \min\{\overline{f}(x), L\} = b_n. \]
Consequently, our assertion about (4.7) is proved.

In view of (4.7), for any positive integer $N > M$, there exist recursively defined integers $m_0 < m_1 < \ldots < m_k < N$ satisfying $1 \le m_0 \le M$, $m_{j+1} - m_j \le M$ for $j = 0, 1, \ldots, k - 1$ as well as $N - m_k \le M$ and
\[ \begin{aligned} a_0 + \ldots + a_{m_0 - 1} &\ge b_0 + \ldots + b_{m_0 - 1}, \\ a_{m_0} + \ldots + a_{m_1 - 1} &\ge b_{m_0} + \ldots + b_{m_1 - 1}, \\ &\;\;\vdots \\ a_{m_{k-1}} + \ldots + a_{m_k - 1} &\ge b_{m_{k-1}} + \ldots + b_{m_k - 1}. \end{aligned} \]
Since all $a_n$ and $b_n$ are non-negative and $m_k \ge N - M$, addition of these inequalities leads to
\[ a_0 + \ldots + a_{N-1} \ge a_0 + \ldots + a_{m_k - 1} \ge b_0 + \ldots + b_{m_k - 1} \ge b_0 + \ldots + b_{N-M-1}. \tag{4.8} \]
Note that the numbers $b_n$ are all independent of $n$. Translating the latter inequalities, we find
\[ \sum_{0 \le n < N} \tilde{f}(T^n x) \ge (N - M)(1 - \epsilon) \min\{\overline{f}(x), L\}. \]
Integration yields
\[ \sum_{0 \le n < N} \int_X \tilde{f}(T^n x) \, d\mu(x) \ge (N - M)(1 - \epsilon) \int_X \min\{\overline{f}(x), L\} \, d\mu(x). \]
Since $T$ is measure preserving, it follows from Theorem 3.1 that
\[ \int_X g(Tx) \, d\mu(x) = \int_X g(x) \, d\mu(x) \]
for all integrable functions $g$, in particular for $g = \tilde{f}$. Hence we get rid of the mean over all $0 \le n < N$ and obtain
\[ N \int_X \tilde{f} \, d\mu \ge (N - M)(1 - \epsilon) \int_X \min\{\overline{f}(x), L\} \, d\mu(x). \]
Since
\[ \int_X \tilde{f}(x) \, d\mu(x) = \int_{X_+} f(x) \, d\mu(x) + L \mu(X \setminus X_+), \]
it follows from our construction that
\[ \int_X f(x) \, d\mu(x) \ge \int_{X_+} f(x) \, d\mu(x) = \int_X \tilde{f}(x) \, d\mu(x) - L \mu(X \setminus X_+) \ge \frac{N - M}{N} (1 - \epsilon) \int_X \min\{\overline{f}(x), L\} \, d\mu(x) - L\delta. \]
Next we let $N$ tend to infinity, and then $\delta$ and $\epsilon$ to zero in order to deduce
\[ \int_X f \, d\mu \ge \int_X \min\{\overline{f}, L\} \, d\mu. \]
Applying the monotone convergence Theorem 2.2 with $g_L = \min\{\overline{f}, L\}$ and $L \to \infty$, we may interchange limit and integration:
\[ \lim_{L \to \infty} \int_X \min\{\overline{f}, L\} \, d\mu = \int_X \lim_{L \to \infty} \min\{\overline{f}, L\} \, d\mu = \int_X \overline{f} \, d\mu. \]
Thus,
\[ \int_X f \, d\mu \ge \int_X \overline{f} \, d\mu. \]
This is the first inequality in (4.6).

For the proof of the second inequality in (4.6) we start in a similar manner as above: for any $\epsilon > 0$ and any $x \in X$, there exists a positive integer $m$ satisfying
\[ \frac{f_m(x)}{m} \le \underline{f}(x) + \epsilon. \]
For arbitrary $\delta > 0$ we can find a positive integer $M$ such that
\[ X_- := \{x \in X : \exists\, 1 \le m \le M \text{ with } f_m(x) \le m(\underline{f}(x) + \epsilon)\} \]
has measure at least $1 - \delta$. Now define
\[ \hat{f}(x) = \begin{cases} f(x) & \text{if } x \in X_-, \\ 0 & \text{otherwise.} \end{cases} \]
Then $\hat{f} \le f$ and, setting $b_n = \hat{f}(T^n x)$ and $a_n = \underline{f}(x) + \epsilon$ (this time independent of $n$), we deduce as in (4.7) and (4.8) that
\[ \sum_{0 \le n < N - M} \hat{f}(T^n x) \le N(\underline{f}(x) + \epsilon). \]
Since $T$ is measure preserving, integration of both sides yields
\[ (N - M) \int_X \hat{f} \, d\mu \le N \int_X \underline{f} \, d\mu + \epsilon N. \]
Because of $f \ge 0$ the measure $\tilde{\mu}$, given by
\[ \tilde{\mu}(A) = \int_A f \, d\mu, \]
is absolutely continuous; that means that for every $\tilde{\delta} > 0$ there exists some $\delta > 0$ for which $\tilde{\mu}(A) < \tilde{\delta}$ whenever $\mu(A) < \delta$. It thus follows from $\mu(X \setminus X_-) < \delta$ that
\[ \int_X f \, d\mu = \int_X \hat{f} \, d\mu + \int_{X \setminus X_-} f \, d\mu \le \frac{N}{N - M} \int_X (\underline{f} + \epsilon) \, d\mu + \tilde{\delta}. \]
Letting first $N \to \infty$ and then $\delta \to 0$ (as well as $\tilde{\delta} \to 0$), and, finally, $\epsilon \to 0$, we get
\[ \int_X f(x) \, d\mu(x) \le \int_X \underline{f}(x) \, d\mu(x). \]
Hence, (4.6) is proved.

It remains to prove (4.5) in case of ergodic $T$. In view of Theorem 3.3, (v), the function $f^*$ is constant almost everywhere, hence $f^*(x) = c$ for almost all $x \in X$. This implies
\[ c = \int_X f^* \, d\mu = \int_X f \, d\mu. \]
The theorem is proved. •

In contrast to von Neumann's functional approach, Birkhoff chose the concept of measure space for his ergodic theorem, which led him to his more practical result. Important generalizations of both ergodic theorems were given by Hopf, Yosida & Kakutani as well as Wiener & Wintner [151], Hurewicz [68]‡ and, even more general, Chacon & Ornstein [29] (resp. [40]).

‡ see the excellent online notes http://www.math.uu.nl/people/dajani/lecturenotes2006.pdf of Dajani

The rate of convergence in Birkhoff's Theorem 4.2 can be rather slow (as indicated by the simulation below). One can show that a computable rate of convergence in general cannot exist (cf. [91]).
Recently, Kohlenbach & Leuştean [86] obtained a quantitative version of Theorem 4.2 for uniformly convex Banach spaces by use of proof theoretical techniques (in particular, Gödel's functional interpretation); see also Avigad et al. [8] in this context.

Figure 1. From left to right: doubling $Tx = 2x \bmod 1$, the logistic transformation $\ell x = 4x(1 - x)$, and $Tx = \{1/x\}$, which will play a central role in a later chapter. (Three plots; horizontal axis $n$ from 0 to 1000.)

As a first application of Birkhoff's ergodic theorem we shall derive a measure theoretical characterization of ergodicity:

Theorem 4.3. Let $(X, \mathcal{F}, \mu)$ be a probability space and assume that $T : X \to X$ is measure preserving with respect to $\mu$. Then $T$ is ergodic if, and only if, for all $A, B \in \mathcal{F}$,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \mu(T^{-n}A \cap B) = \mu(A)\mu(B). \tag{4.9} \]

The theorem states that, in the mean, the preimages $T^{-n}A$ of a set $A$ under an ergodic transformation $T$ occupy the proportion $\mu(A)$ of any given set $B$. This criterion for ergodicity may be compared with the notions of weak and strong mixing from the previous section.

Proof. Assume that $T$ is ergodic. Applying the Birkhoff ergodic Theorem 4.2 to the indicator function $f = \chi_A$ yields
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_A(T^n x) = \int_X \chi_A \, d\mu = \mu(A) \tag{4.10} \]
for almost all $x$. Hence,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{T^{-n}A \cap B}(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_A(T^n x) \chi_B(x) = \mu(A) \chi_B(x) \]
almost everywhere. For any $N$, the mean on the left-hand side is bounded by the constant function 1. Thus, Lebesgue's Theorem 2.1 on dominated convergence implies
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \mu(T^{-n}A \cap B) = \int_X \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{T^{-n}A \cap B}(x) \, d\mu(x) = \mu(A) \int_X \chi_B(x) \, d\mu(x) = \mu(A)\mu(B), \]
which is nothing else than (4.9). For the converse, suppose that $T^{-1}A = A$. Setting $B = A$ in (4.9) and using $T^{-n}A \cap A = A$ shows that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \mu(A) = \mu(A)^2, \]
that is $\mu(A) = \mu(A)^2$, which implies either $\mu(A) = 0$ or $\mu(A) = 1$. •

An alternative proof of this theorem (based on Wiener's maximal inequality) can be found in [47].
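Criterion (4.9) can be tested numerically for the circle rotation, where the measures $\mu(T^{-n}A \cap B)$ are overlaps of arcs and can be computed in closed form. The following Python sketch is an illustration under assumptions of our own: the sets are taken to be the intervals $A = [0, a)$ and $B = [0, b)$, and the helper names `overlap` and `cesaro_mean` are hypothetical.

```python
import math

def overlap(s, a, b):
    """Lebesgue measure of ([s, s+a) mod 1) intersected with [0, b),
    assuming 0 <= s < 1 and 0 < a, b <= 1."""
    first = max(0.0, min(s + a, b) - s)        # part of the arc before wrapping
    second = max(0.0, min(s + a - 1.0, b))     # part of the arc wrapped past 1
    return first + second

def cesaro_mean(theta, a, b, N):
    """(1/N) * sum over 0 <= n < N of mu(T^{-n}A ∩ B) for T = R_theta,
    A = [0, a), B = [0, b)."""
    total = 0.0
    for n in range(N):
        s = (-n * theta) % 1.0                 # T^{-n}A is the arc [s, s+a) mod 1
        total += overlap(s, a, b)
    return total / N

# Irrational rotation: the mean should approach mu(A) * mu(B) = 0.15.
print(cesaro_mean(math.sqrt(2), 0.3, 0.5, 20000))
# Rational rotation theta = 1/2 with A = B = [0, 1/4): the mean stays at 1/8,
# whereas mu(A) * mu(B) = 1/16, so (4.9) fails.
print(cesaro_mean(0.5, 0.25, 0.25, 20000))
```

The second call shows how (4.9) detects the lack of ergodicity for rational $\theta$: the preimages of $A$ alternate between $A$ itself and an arc disjoint from $A$.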
These ergodic theorems were translated by Kolmogorov and Khintchine into the language of probability theory (see [83, 31] for their precise formulation). In the ergodic theorem of Birkhoff the quantity $f^* = \int f \, d\mu$ may be regarded in the case of an ergodic $T$ as the expectation of $f$. This interpretation allows far-reaching generalizations of the fundamental law of large numbers, which states that, given a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots$ on some probability space with finite expectation $E|X_n| < +\infty$, the following limit exists:
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} X_n = E X_1 \quad \text{almost everywhere.} \]
Thus, taking the mean over the realizations of many i.i.d. random variables is in the limit the same as taking the mean over the realizations of a single one; without any such limit law a theory of randomness would be impossible! This observation essentially dates back to Jakob Bernoulli, although in a very simple form; the first formulation for random variables was given by Chebyshev.

Exercises

Since the proof of the mean ergodic theorem might be a bit sketchy, it is a good start to fill the gaps we left:

Exercise 4.1. Complete the proof of von Neumann's Ergodic Theorem 4.1 (maybe with the help of [114]) and deduce (4.2). Moreover, show that for $f \in L^p$ with $1 \le p < +\infty$ the convergence (4.2) can be replaced by the same statement with respect to the $p$-norm with a limit $f^* \in L^p$.

The ergodic theorems of von Neumann and Birkhoff are closely related:

Exercise 4.2. Deduce von Neumann's Ergodic Theorem 4.1 with weight function $g \equiv 1$ from Birkhoff's Theorem 4.2 along the following lines: given a function $h \in L^\infty$, define
\[ H = \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} h \circ T^n, \]
and use Birkhoff's ergodic theorem to show that the difference
\[ \frac{1}{N} \sum_{0 \le n < N} h(T^n x) - H(x) \]
tends almost everywhere to zero as $N \to \infty$. Use Lebesgue's theorem on dominated convergence to prove that $S_N h := \frac{1}{N} \sum_{0 \le n < N} h \circ T^n$ is a Cauchy sequence in $L^p$. Now show for $f$ that $S_N f$ is also a Cauchy sequence and conclude the proof by proving
\[ \frac{N+1}{N} S_{N+1} f - S_N(f \circ T) = \frac{1}{N} f. \]

There is a converse of Birkhoff's ergodic theorem which reminds us of the analogue for Riemann integrals in the case of uniform distribution:

Exercise 4.3. Given an ergodic transformation $T$ on a finite measure space and a non-negative measurable function $f$, show that if
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} f(T^n x) \]
exists for almost all $x$, then $f$ is integrable. Hint: Define functions $f_k(x) = \min\{f(x), k\}$ and use the theorem of monotone convergence.

Exercise 4.4. What is wrong with the following 'proof' of Birkhoff's ergodic theorem: if $f$ is a complex-valued function on $\mathbb{N}_0$, we write $\int f(n) \, dn = \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} f(n)$ whenever the limit exists, and call such functions integrable. If now $T$ is a measure preserving transformation on a space $X$ and if $f$ is integrable on $X$, then
\[ \int\!\!\int |f(T^n x)| \, dn \, dx = \int\!\!\int |f(T^n x)| \, dx \, dn = \int\!\!\int |f(x)| \, dx \, dn = \int |f(x)| \, dx < \infty, \]
by Fubini's theorem. Hence, $f(T^n x)$ is an integrable function of both variables and therefore, for almost every $x$, an integrable function of $n$. This alternative but not completely serious reasoning can be found in [58], p. 24. Which parts are mathematically correct and which are not?

As announced, ergodic theorems generalize the concept of uniform distribution. For instance, we may derive many results from the first chapter via ergodic theory:

Exercise 4.5. Apply Birkhoff's ergodic Theorem 4.2 to the circle rotation and give an alternative proof of Corollary 1.5.

* * *

Birkhoff [21] already gave applications of his ergodic theorem to a restricted three-body problem Earth–Sun–Moon.§ Since then many obvious and non-obvious applications of ergodic theory have been found. In the following chapter we shall give two classical examples.

§ and to convex billiards as well!
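The statement (4.5) for indicator functions can be observed in a small experiment with the circle rotation of Exercise 4.5. The following Python sketch is only an illustration and no substitute for the exercise; the function name `visit_frequency`, the choice of the interval and the use of floating point arithmetic are assumptions of ours.

```python
import math

def visit_frequency(theta, beta, lo, hi, N):
    """Fraction of the indices 0 <= n < N for which (n*theta + beta) mod 1
    lies in the interval [lo, hi); this is the time mean of the indicator
    function of [lo, hi) along the orbit of beta under R_theta."""
    hits = sum(1 for n in range(N) if lo <= (n * theta + beta) % 1.0 < hi)
    return hits / N

golden = (math.sqrt(5) - 1) / 2
# Irrational theta: the time mean should be close to the length 0.3 of [0.2, 0.5).
print(visit_frequency(golden, 0.0, 0.2, 0.5, 100000))
# Rational theta = 1/4: the orbit of 0 is {0, 1/4, 1/2, 3/4}, and only 1/4
# lies in [0.2, 0.5), so the time mean is 0.25 instead of 0.3.
print(visit_frequency(0.25, 0.0, 0.2, 0.5, 100000))
```

Note that the doubling map $Tx = 2x \bmod 1$ should not be iterated in this naive way: in binary floating point every iterate loses one bit, so the computed orbit typically collapses to 0 after a few dozen steps.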
CHAPTER 5
Heavenly and Normal Applications

We have already mentioned that the origin of ergodic theory dates back to Poincaré and his studies of heavenly bodies. In this chapter we shall prove his famous recurrence theorem which, of course, can be proved without the concept of ergodicity (the latter came more than thirty years later); however, we shall also see that the ergodic machinery allows further insights. Moreover, we give another application of Birkhoff's ergodic theorem which, besides Weyl's work on uniform distribution modulo one, can also be interpreted as a forerunner, namely Borel's theorem on normal numbers. Here we shall discuss questions such as: how often does the digit 7 appear in the decimal expansion of $\pi$? The reader may make a guess...

5.1. Poincaré's Recurrence Theorem

Is our solar system stable? The dynamics of two bodies in space under gravity are described by Kepler's laws. In his 270-page paper [112] Henri Poincaré solved part of the three-body problem, that is the mathematical description of the orbits of three bodies interacting through gravity. With his approach Poincaré laid the foundations for treating chaotic movements and invariant integrals. In his monumental work [113], consisting of three volumes, Poincaré sets the stage for mathematical ergodic theory. It contains his famous recurrence theorem. Before we state and prove this remarkable result we first need to introduce more vocabulary.

Let $T$ be a measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$ and $A$ be a measurable set. A point $x \in A$ is said to be $A$-recurrent if there exists a positive integer $n$ for which $T^n x \in A$. The notion of recurrence plays a crucial role in the branch of topological dynamics. Here is Poincaré's recurrence theorem:

Theorem 5.1. Let $T : X \to X$ be a measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$ and let $A$ be measurable with $\mu(A) > 0$.
Then, for almost all $x \in A$, the trajectory $\{T^n x\}$ will return to $A$ infinitely often; in particular, $x$ is almost surely $A$-recurrent.

Equivalent to the almost sure recurrence is the divergence of the infinite series $\sum_{n=0}^{\infty} \chi_A(T^n x)$ for almost all $x \in A$. This formulation reminds us of the almost sure identity (4.10) from the proof of Theorem 4.3. In fact, for ergodic $T$ it follows immediately from Birkhoff's ergodic theorem:
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_A(T^n x) = \int_X \chi_A \, d\mu = \mu(A). \]
The recurrence theorem of Poincaré yields a proof of the weak ergodic hypothesis (mentioned briefly in the previous chapter). The restriction to recurrence almost everywhere allows the existence of a null set of non-recurrent points. This is necessary, as follows from the example provided by the transformation $Tx = 2x \bmod 1$ (Example 2 in Chapter 3): the orbit of $x = \frac{1}{2}$ (or any other reciprocal of a power of two) is eventually stationary in 0.

As a matter of fact, Poincaré did not prove his result by deeper measure or even ergodic theoretical arguments. We give an alternative proof due to Carathéodory [27], independent of Birkhoff's theorem:

Proof. Let $B$ be the subset of $A$ which consists exactly of those points $x$ that are not $A$-recurrent, that is
\[ B = \{x \in A : T^n x \notin A \text{ for all } n \in \mathbb{N}\}. \]
Alternatively,
\[ B = A \cap T^{-1}(X \setminus A) \cap T^{-2}(X \setminus A) \cap \ldots, \]
which shows that $B$ is measurable. We shall show that $\mu(B) = 0$. Since $B \subset A$ we have $B \cap T^{-n}B = \emptyset$ for any $n \ge 1$, and, consequently, $T^{-k}B \cap T^{-k-n}B = \emptyset$ for all $k \ge 0$ and $n \ge 1$. Hence, the sets $B, T^{-1}B, T^{-2}B, \ldots$ are pairwise disjoint and, since $T$ is measure preserving, it follows that $\mu(B) = \mu(T^{-n}B)$ for all $n \in \mathbb{N}$. Now assume that $\mu(B) > 0$; then
\[ 1 = \mu(X) \ge \mu\Big(\bigcup_{n \in \mathbb{N}_0} T^{-n}B\Big) = \sum_{n=0}^{\infty} \mu(B) = +\infty, \]
a contradiction. This implies the $A$-recurrence of almost all $x \in A$. In fact, almost all $x$ will return infinitely often to $A$. To see that, define
\[ C = \{x \in A : T^n x \in A \text{ for only finitely many } n \in \mathbb{N}\}. \]
Then
\[ C = \{x \in A : T^n x \in B \text{ for some } n \in \mathbb{N}_0\} \subset \bigcup_{n=0}^{\infty} T^{-n}B. \]
Since $\mu(B) = 0$ and $T$ is measure preserving, it follows that $\mu(C) = 0$. •

The statement and the proof of Poincaré's recurrence theorem may be interpreted as a measure theoretical pigeon hole principle (see Chapter 1). Another part of the reasoning reminds us of Vitali's negative solution of the measure problem.

The recurrence property is intimately related to the assumption of a finite measure. For instance, the transformation $T : \mathbb{R} \to \mathbb{R}$, $Tx = x + 1$ is measure preserving on $\mathbb{R}$ with respect to the Lebesgue measure; however, for any bounded set $A \subset \mathbb{R}$ and $x \in A$ the set $\{n \in \mathbb{N} : T^n x \in A\}$ is empty or finite, which shows that $T$ does not allow recurrence.

We give a physical interpretation of the recurrence theorem: given a box in $\mathbb{R}^3$ with an evacuated right chamber and a left chamber filled with gas, separated by a dividing wall. After removing the dividing wall, we may expect the gas molecules to distribute in the whole box, resulting in some kind of uniform distribution.

[Figure: the box before removal of the wall, with gas molecules (•) on the left and vacuum (◦) on the right, −→ the box after removal of the wall, with molecules and empty positions mixed throughout.]

Contrary to our intuition, Poincaré's recurrence theorem implies that the system will return after a (long but) finite time to its starting constellation, at least approximately: the vacuum to the right (◦), the gas molecules on the left (•). At first view this seems to contradict the second law of thermodynamics and Boltzmann's theorem, which claims that the entropy of a closed system cannot decrease.∗ However, the assertion of the recurrence theorem is primarily of statistical nature, and the apparent incompatibility is resolved by taking the expected return time into account, which is in all practical instances beyond the age of our universe.
For the probability to observe such violations of the second law of thermodynamics we refer to Evans & Searls [51].

∗ By the way, the second law of thermodynamics excludes the existence of a perpetuum mobile and, because of the irreversibility of time, travels in time.

With regard to Poincaré's Recurrence Theorem 5.1 we may ask how soon an orbit $\{T^n x\}$ will visit a measurable set $A$. For the following investigation we shall use an idea of Kakutani [75], namely, to consider the transformation $T$ only at the times when $T^n x$ visits $A$. For $x \in A \in \mathcal{F}$ we define the return time of $x$ to $A$ by
\[ n_A(x) = \min\{n \in \mathbb{N} : T^n x \in A\}. \]
Since $n_A$ is a minimum, it is measurable. In view of Poincaré's recurrence theorem it follows that $n_A(x)$ is finite for almost all $x$. Next we remove from $A$ the null set consisting of all $x$ for which $n_A(x) = +\infty$ and we denote the resulting set again by $A$. Now we introduce a measure induced by $\mu$ on the σ-algebra generated by $\mathcal{F} \cap A$:
\[ \mu_A(B) = \frac{\mu(B)}{\mu(A)} \quad \text{for } B \subset A, \]
which reminds us of the notion of conditional probability. This yields another probability space $(A, \mathcal{F} \cap A, \mu_A)$. Moreover, we define the induced transformation
\[ T_A : A \to A, \quad x \mapsto T^{n_A(x)} x. \]
Now we are in the position to prove the following technical result:

Theorem 5.2. Let $A$ be measurable and assume the definitions and conditions from above. Then the transformation $T_A$ is measure preserving with respect to $\mu_A$. Moreover, if $T$ is ergodic, then so is $T_A$.

Proof. For $n \in \mathbb{N}$ define
\[ A_n = \{x \in A : n_A(x) = n\}, \qquad B_n = \{x \in X \setminus A : Tx, \ldots, T^{n-1}x \notin A, \; T^n x \in A\}. \]
Then $A_n \cap B_m = \emptyset$. Moreover,
\[ T^{-1}A = A_1 \cup B_1 \quad \text{and} \quad T^{-1}B_n = A_{n+1} \cup B_{n+1} \quad \text{for } n \in \mathbb{N}. \tag{5.1} \]
Now let $C \in \mathcal{F} \cap A$. Since $T$ is measure preserving with respect to $\mu$, it follows that $\mu(C) = \mu(T^{-1}C)$. In order to prove the first statement we shall show that the same holds for $T_A$ and $\mu_A$. We have
\[ T_A^{-1}C = \bigcup_{n=1}^{\infty} A_n \cap T_A^{-1}C = \bigcup_{n=1}^{\infty} A_n \cap T^{-n}C, \]
where the sets $A_n \cap T^{-n}C$ are pairwise disjoint.
Hence,
\[ \mu(T_A^{-1}C) = \sum_{n=1}^{\infty} \mu(A_n \cap T^{-n}C). \tag{5.2} \]
Since measures are preserved, repeated application of (5.1) leads to
\[ \begin{aligned} \mu(T^{-1}C) &= \mu(A_1 \cap T^{-1}C) + \mu(B_1 \cap T^{-1}C) \\ &= \mu(A_1 \cap T^{-1}C) + \mu(T^{-1}(B_1 \cap T^{-1}C)) \\ &= \mu(A_1 \cap T^{-1}C) + \mu(A_2 \cap T^{-2}C) + \mu(B_2 \cap T^{-2}C) \\ &= \ldots = \sum_{n=1}^{N} \mu(A_n \cap T^{-n}C) + \mu(B_N \cap T^{-N}C). \end{aligned} \]
This construction is called the Kakutani skyscraper since one climbs from the sets $A_1$ and $B_1$ to $A_n$ and $B_n$ and so forth. In a similar manner,
\[ 1 \ge \mu\Big(\bigcup_{n=1}^{\infty} B_n \cap T^{-n}C\Big) = \sum_{n=1}^{\infty} \mu(B_n \cap T^{-n}C), \]
hence $\mu(B_n \cap T^{-n}C)$ tends to zero as $n \to \infty$. In view of (5.2) this yields
\[ \mu(C) = \mu(T^{-1}C) = \sum_{n=1}^{\infty} \mu(A_n \cap T^{-n}C) = \mu(T_A^{-1}C), \]
resp.
\[ \mu_A(C) = \frac{\mu(C)}{\mu(A)} = \frac{\mu(T_A^{-1}C)}{\mu(A)} = \mu_A(T_A^{-1}C). \]
It follows that $T_A$ is measure preserving with respect to $\mu_A$.

It remains to show that $T_A$ inherits the ergodicity property. For this purpose let us assume that $T$ is ergodic. Then, for a $T_A$-invariant set $B \subset A$ of positive measure $\mu_A(B) > 0$, we have to show $\mu_A(B) = 1$. Using the $T_A$-invariance we have $B = T_A^{-1}B = T_A^{-2}B = \ldots$ and so on. Thus,
\[ B = \Big(\bigcup_{n=0}^{\infty} T^{-n}B\Big) \cap A. \]
Since $T$ is ergodic, we deduce from $0 < \mu_A(B) = \mu(B)/\mu(A)$ that $\mu(B) > 0$. Thus, by Theorem 3.2, (iii),
\[ \mu\Big(\bigcup_{n=0}^{\infty} T^{-n}B\Big) = 1, \]
which yields $X = \bigcup_{n=0}^{\infty} T^{-n}B$ and $B = A$, respectively (up to null sets). Hence, we get $\mu_A(B) = 1$. The proof is complete. •

The next statement is due to Kac [74], called Kac's lemma, and provides a quantitative version of Poincaré's recurrence theorem (analogous to Weyl's quantitative description of Kronecker's and Bohl's results on the distribution of $(n\xi)$):

Theorem 5.3. Let $T : X \to X$ be a measurable ergodic transformation on a probability space $(X, \mathcal{F}, \mu)$ and let $A$ be a measurable set with $\mu(A) > 0$. Then $n_A \in L^1$ and, for the first return $n_A(x)$ of a point $x \in A$,
\[ \int_A n_A(x) \, d\mu(x) = 1 \quad \text{resp.} \quad \int_A n_A(x) \, d\mu_A(x) = \frac{1}{\mu(A)} \]
and
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} n_A(T_A^n x) = \frac{1}{\mu(A)}. \]
Thus, the expectation for the first return of an orbit to a given set equals $1/\mu(A)$.

Proof. For $x \in A$ we consider the orbit of $x$ under $T_A$, that is
\[ x, \; T_A x, \; \ldots, \; T_A^n x, \; \ldots, \; T_A^N x, \; \ldots. \]
The quantity $t := \sum_{0 \le n < N} n_A(T_A^n x)$ measures the time for the first $N$ returns of the orbit of $x$ under $T$ to the set $A$, i.e.,
\[ \sum_{0 \le n < t} \chi_A(T^n x) = N. \]
Applying Birkhoff's Ergodic Theorem 4.2 to $T_A$ and $T$ (with $N \to \infty$ resp. $t \to \infty$), we get
\[ \int_A n_A(x) \, d\mu_A(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} n_A(T_A^n x) = \lim_{t \to \infty} \frac{t}{\sum_{0 \le n < t} \chi_A(T^n x)} = \Big(\int_X \chi_A \, d\mu\Big)^{-1} = \frac{1}{\mu(A)}, \]
which we had to show. • (For a nice variant of the proof see Báez-Duarte [9].)

We return briefly to the cat Felix and the phenomenon of his complete recurrence after a finite number of iterations of Arnold's cat map (see the pictures for the iterations $A^n$ with $n = 0$, Felix himself, then $n = 1, 2, 3, 4, 6, 50$, and, finally, $n = 405$, Felix once again).† Actually one can show that if $(X, \mathcal{F}, \mu, T)$ is an ergodic system with discrete space $X$ and uniform distribution $\mu$, then the recurrence is sure for any point (see Exercise 5.3).

Finally, we give a measure theoretical variation of Theorem 5.1:

Theorem 5.4. Let $T : X \to X$ be a measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$ and let $A$ be a measurable set with $\mu(A) > 0$. Then $\mu(A \cap T^{-n}A) > 0$ for infinitely many $n$.

Proof. Since $T$ is measure preserving, all sets $A, T^{-1}A, T^{-2}A, \ldots$ have the same measure. If all these sets were pairwise disjoint (up to null sets), finitely many of them would provide a finite union of measure larger than $\mu(X) = 1$, a contradiction. Thus, there are non-negative integers $m < n$ such that $\mu(T^{-n}A \cap T^{-m}A) > 0$. Writing $k = n - m$, it follows that $\mu(A \cap T^{-k}A) > 0$ (since $T$ is measure preserving). Repeating this argument with $A, T^{-k}A, T^{-2k}A, \ldots$ implies $\mu(A \cap T^{-n}A) > 0$ for infinitely many $n$. •

5.2. Normal numbers

Let $b$ be a positive integer strictly larger than one. Any real number $x$ possesses a representation to base $b$, also called $b$-adic representation, i.e.,
\[ x = \sum_{n=0}^{\infty} a_n b^{-n} \quad \text{with } a_0 \in \mathbb{Z}, \; a_n \in \{0, 1, \ldots, b - 1\}. \tag{5.3} \]
Here $a_0 = \lfloor x \rfloor$ is the integral part of $x$ and the $a_n$ are said to be the $b$-adic digits of $\{x\} \in [0, 1)$. This representation is not unique; however, we should not think too much about this defect since it is related only to a null set. We illustrate this with a simple and well-known example from decimal expansion: $0.\overline{9} = 0.99999\,99999\ldots = 1.0 = 1$, where, as usual, the expression $\overline{9}$ stands for the infinite sequence of digits 9. In fact, if $x$ has an eventually periodic $b$-adic representation, then $x$ is rational and thus belongs to a set of Lebesgue measure zero; if the representation is not eventually periodic, then it is unique and $x$ is irrational.

† The article [56] provides similar pictures with Henri Poincaré in place of Felix which is acknowledged to [35].

A real number $x$ is called normal to base $b$ if for each $k \in \mathbb{N}$ any block of digits $\alpha_1 \ldots \alpha_k$ with $\alpha_j \in \{0, 1, \ldots, b-1\}$ appears with the same frequency in the $b$-adic representation of $x = a_0.a_1 a_2 \ldots$. For $k = 1$ this means that any digit appears with the same frequency:
$$\lim_{N\to\infty} \frac{1}{N}\, \sharp\{n \le N : a_n = \alpha\} = \frac{1}{b}$$
for all $\alpha \in \{0, 1, \ldots, b-1\}$; for $k = 2$ normality implies
$$\lim_{N\to\infty} \frac{1}{N}\, \sharp\{n \le N : a_n = \alpha,\ a_{n+1} = \alpha'\} = \frac{1}{b^2}$$
for all pairs $\alpha, \alpha' \in \{0, 1, \ldots, b-1\}$. In the general case the pattern $\alpha_1 \ldots \alpha_k$ with $\alpha_j \in \{0, 1, \ldots, b-1\}$ appears with asymptotic frequency $b^{-k}$. Obviously, it suffices to consider only that part of the $b$-adic representation which belongs to the fractional part $\{x\} \in [0, 1)$. Next we shall prove Borel's theorem:

Theorem 5.5. Almost all real numbers $x$ are normal to any base $b$.

This theorem explains why numbers with such a regularity in their $b$-adic representation are called normal. It should be noted that a number can be normal to base $b$ but not normal with respect to another base $b'$. This observation is due to Cassels [28], and Schmidt [123] obtained a criterion to describe under which circumstances a normal number to base $b$ is normal to base $b'$.

Proof.
In view of our previous observation it suffices to prove the statement for numbers $x \in [0, 1)$. The mapping $T_b : [0, 1) \to [0, 1)$, defined by $T_b x = bx \bmod 1$, is measurable and preserves the Lebesgue measure $\lambda$. Moreover, it is ergodic (which can be proved in the general case along our proof for the special case $b = 2$ which was Example 2 in §4). If now $x$ to base $b$ is given by (5.3), then we have
$$T_b^n x \in \left[\frac{\alpha}{b}, \frac{\alpha+1}{b}\right) =: I(\alpha)$$
for some given $\alpha \in \{0, 1, \ldots, b-1\}$ if, and only if, $a_{n+1} = \alpha$. By Birkhoff's ergodic theorem 4.2 it thus follows that
$$\lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{I(\alpha)}(T_b^n x) = \int_{[0,1)} \chi_{I(\alpha)}\, d\lambda = \lambda(I(\alpha)) = \frac{1}{b}$$
for almost all $x$. This implies the statement in the case of individual digits $\alpha$ (that are blocks of length $k = 1$). The general case ($k \in \mathbb{N}$) follows via
$$\alpha := \alpha_1 b^{k-1} + \alpha_2 b^{k-2} + \ldots + \alpha_k \quad\text{and}\quad I(\alpha, k) := \left[\frac{\alpha}{b^k}, \frac{\alpha+1}{b^k}\right)$$
in an analogous manner:
$$\lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{I(\alpha,k)}(T_b^n x) = \int_{[0,1)} \chi_{I(\alpha,k)}\, d\lambda = \lambda(I(\alpha, k)) = \frac{1}{b^k}.$$
Since there are only countably many bases $b$ and blocks $\alpha_1 \ldots \alpha_k$, the union of the corresponding exceptional null sets is again a null set, which leads to the assertion of the theorem. •

Borel's original argument in [24] was based on the Borel-Cantelli lemma from probability theory (and faulty; cf. [33]). An elementary (and correct) proof which follows Borel's reasoning was given by Niven [107]. A different but unpublished approach is due to Alan Turing [138]; recently, Becher, Figueira & Picchi [14] completed his work.

With probabilistic tools we can derive more information. The central limit theorem states: given a sequence $X_1, X_2, \ldots$ of independent identically distributed $L^2$-variables with expectation $\mu$ and variance $\sigma^2$, let $S_N := X_1 + X_2 + \ldots + X_N$; then
$$\lim_{N\to\infty} P\left(\frac{S_N - \mu N}{\sigma\sqrt{N}} \le y\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\left(-\frac{t^2}{2}\right) dt.$$
This refines the strong law of large numbers which claims $\frac{1}{N} S_N \to \mu$ as $N \to \infty$ under weaker conditions on the $X_j$. Now let us consider real numbers $x$ from the unit interval, given in their binary expansion:
$$x = \sum_{j=1}^{\infty} \frac{b_j(x)}{2^j} \quad\text{with}\quad b_j \in \{0, 1\}.$$
This is Example 2 from Chapter 3. By Borel's theorem, resp.
Birkhoff's theorem,
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} b_1(T^n x) = \int_{[0,1)} b_1\, d\lambda = \tfrac{1}{2}$$
for almost all $x$. Note that $b_j = b_1 \circ T^{j-1}$. Using this we can compute expectation and variance by
$$\int_{[0,1)} b_j\, d\lambda = \int_0^1 b_1(T^{j-1}x)\, dx = \int_0^1 b_1(x)\, dx = \tfrac{1}{2}$$
and
$$\int_0^1 \left(b_j(x) - \tfrac{1}{2}\right)^2 dx = \int_0^1 \left(b_1(x) - \tfrac{1}{2}\right)^2 dx = \tfrac{1}{4}.$$
Moreover, $P(b_j(x) = 0) = P(b_j(x) = 1) = \tfrac{1}{2}$, and the digits $b_j$ are independent with respect to Lebesgue measure. So we may apply the central limit theorem and obtain
$$\lim_{N\to\infty} P\left(\frac{\sum_{n=1}^{N} b_1 \circ T^{n-1}(x) - \tfrac{1}{2}N}{\tfrac{1}{2}\sqrt{N}} \le y\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\left(-\frac{t^2}{2}\right) dt.$$
So there is a Gaussian normal distribution behind Borel's normal numbers theorem. Normal distribution is a common feature of many natural distributions. To give another example, Sinai [122] investigated the geodesic flow $\varphi_t$ of the unit tangent bundle $T_1 V$ of a surface $V$ of constant negative curvature; if $A$ is a domain of $T_1 V$ with piecewise differentiable boundary, then the mean sojourn time of $\varphi_t$ in this domain has a Gaussian distribution.

Although Borel's theorem 5.5 shows that almost all real numbers are normal to any base, it is a difficult problem to prove normality of a given real number. For instance, it is unknown whether the famous number

π = 3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 ...

is normal with respect to some base.∗ This problem reminds us of the difficulty of deciding whether a given number is algebraic or transcendental. Actually, for the latter question some techniques are known and, in particular, the transcendence of π was shown by Lindemann in 1882 (which implies the impossibility of squaring the circle by ruler and compass, an ancient problem).† Kanada & Takahashi [78] computed more than 50 billion digits of the decimal expansion and the deviation from normality in this data set is less than 0.002% for any digit. The situation is not better with other famous constants such as $e = \exp(1)$ and $\sqrt{2}$.
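The kind of digit statistics mentioned above for π can be tried out on a much smaller scale for $\sqrt{2}$. The following is only a minimal sketch under our own assumptions: the function names are hypothetical, the digits are obtained by exact integer square roots, and such a count is merely an experiment that proves nothing about normality.

```python
from collections import Counter
from math import isqrt


def sqrt2_decimal_digits(n_digits: int) -> str:
    """Return the first n_digits decimal digits of sqrt(2) after the point."""
    # floor(sqrt(2) * 10**n) equals isqrt(2 * 10**(2n)); this is exact
    # integer arithmetic, so no floating point rounding is involved.
    scaled = isqrt(2 * 10 ** (2 * n_digits))
    return str(scaled)[1:]  # drop the leading integer digit "1"


def digit_frequencies(digits: str) -> dict:
    """Relative frequency of each decimal digit 0..9 in the given string."""
    counts = Counter(digits)
    return {str(d): counts.get(str(d), 0) / len(digits) for d in range(10)}


if __name__ == "__main__":
    freqs = digit_frequencies(sqrt2_decimal_digits(10_000))
    for digit, freq in freqs.items():
        print(digit, f"{freq:.4f}")  # normality to base 10 would require 1/10 in the limit
```

For a base 10 normal number each frequency tends to 1/10; a finite sample can only hint at this.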
Bailey & Crandall [11] conjectured that any algebraic irrational number is normal. Obviously, rational numbers are not normal (since their $b$-adic representations are eventually periodic). A more advanced example of non-normal numbers results from the Cantor set $C$ which is defined by successively deleting the middle thirds from the unit interval $[0, 1]$. More precisely,
$$C = [0, 1] \setminus \bigcup_{n=0}^{\infty} \bigcup_{j=1}^{2^n} \left(x_{nj} + 3^{-n-1},\ x_{nj} + 2 \cdot 3^{-n-1}\right)$$
with certain rationals $x_{nj}$. The Cantor set $C$ is an example of an uncountable perfect set with empty interior (see [49]); recall that a set is said to be perfect if it is closed and each of its elements is a limit point. The elements of $C$ are exactly those numbers $x \in [0, 1]$ having a ternary expansion without digit 1 (since the middle thirds were deleted), i.e.,
$$x \in C \iff x = \sum_{n=1}^{\infty} a_n 3^{-n} \quad\text{with}\quad a_n \in \{0, 2\}.$$
The numbers $x_{nj}$ from above provide all possible partial sums of such elements $x$. Hence, the Cantor set does not contain any base 3 normal number; in particular, it follows from Borel's theorem that the Cantor set $C$ has Lebesgue measure zero.‡

∗ This problem is mentioned in the avant-garde movie Pi of D. Aronofsky.

† It should be noted that Ferdinand Lindemann did his habilitation at Würzburg University in 1877; however, he had the breakthrough with π during his time in Munich.

‡ See http://mathworld.wolfram.com/CantorSet.html for amazing computer animations on this topic.

Only a few methods for constructing normal numbers are known. The first explicit example of a normal number was given by Sierpinski [127]. An easy and convincing example of a normal number is due to Champernowne [30]:

0.123456789 10111213141516171819 2021 ....

Moreover, Copeland & Erdős [32] proved that

0.23571113171923293137414347 ...

is normal to base 10. In these examples it is obvious how the numbers are constructed. It is not too difficult to compute any digit explicitly.§ Normal numbers are definitely not made for generating random numbers.
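The remark that any digit can be computed explicitly is easy to make concrete for Champernowne's number. The following sketch (the function name is ours) skips whole blocks of integers with the same number of decimal digits, so the cost grows only with the number of digits of the position $n$.

```python
def champernowne_digit(n: int) -> int:
    """Return the n-th digit (n >= 1) after the point of 0.123456789101112..."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    length, count, start = 1, 9, 1  # block of 'count' integers with 'length' digits each
    while n > length * count:
        n -= length * count  # skip the whole block
        length += 1
        count *= 10
        start *= 10
    number = start + (n - 1) // length  # the integer containing the wanted digit
    return int(str(number)[(n - 1) % length])


if __name__ == "__main__":
    print("".join(str(champernowne_digit(n)) for n in range(1, 31)))
```

The predictability that makes this computation so cheap is exactly what rules the number out as a source of random digits.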
§ Well, in the case of Copeland & Erdős's number the computation is not too easy; however, thanks to the recent primality test of Agrawal, Kayal & Saxena [3] the computation can be performed in polynomial time.

Figure 1. The first 1600 binary digits of π (left) and its rational approximation $\frac{22}{7}$ (right) ordered in a spiral. Which is the rational number with least denominator approximating π such that the first 1600 binary digits of both numbers are equal?

We return to the number π. It is expected that there is no pattern hidden in the decimal expansion. It is also conjectured that π is normal to any base $b$. Nevertheless, it was a big surprise ten years ago when Bailey, Borwein & Plouffe [10] discovered the so-called BBP-formula (named after their initials) which allows one to compute an arbitrary digit of π in the hexadecimal system (base $b = 16$) without knowing any previous digit:
$$\pi = \sum_{n=0}^{\infty} \frac{1}{16^n} \left(\frac{4}{8n+1} - \frac{2}{8n+4} - \frac{1}{8n+5} - \frac{1}{8n+6}\right). \tag{5.4}$$
We shall sketch how to derive this formula. One starts with
$$\int_0^{1/\sqrt{2}} \frac{x^{k-1}}{1 - x^8}\, dx = \int_0^{1/\sqrt{2}} \sum_{m=0}^{\infty} x^{k-1+8m}\, dx = 2^{-k/2} \sum_{m=0}^{\infty} \frac{1}{16^m} \cdot \frac{1}{8m+k}.$$
Thus, (5.4) is equivalent to
$$\pi = \int_0^{1/\sqrt{2}} \frac{4\sqrt{2} - 8x^3 - 4\sqrt{2}\,x^4 - 8x^5}{1 - x^8}\, dx = 16 \int_0^1 \frac{y - 1}{y^4 - 2y^3 + 4y - 4}\, dy.$$
Using
$$\arctan x = \int_0^x \frac{du}{1 + u^2}$$
and partial fraction decomposition (resp. a computer algebra package), this implies the BBP-formula (5.4).

But how to read off an arbitrary digit of π? We explain this by a simpler example, namely,
$$\log 2 = \sum_{k=1}^{\infty} \frac{1}{k 2^k},$$
which follows immediately from the power series expansion of the logarithm and Abel's limit theorem. Thus the $(d+1)$-th digit of the binary expansion of $\log 2$ is the first binary digit of
$$\{2^d \log 2\} = \left\{ \left\{ \sum_{k=1}^{d} \frac{2^{d-k} \bmod k}{k} \right\} + \left\{ \sum_{k=d+1}^{\infty} \frac{2^{d-k}}{k} \right\} \right\}.$$
The numerators $2^{d-k} \bmod k$ in the first sum can be computed modulo $k$ by a method called fast exponentiation.¶ The second sum converges quickly, hence only a few terms need to be computed.
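The digit extraction for $\log 2$ just described can be sketched in a few lines. This is a minimal illustration under our own assumptions, not a production algorithm: the function name and the parameter `tail_terms` are hypothetical, Python's built-in three-argument `pow` plays the role of fast exponentiation, and ordinary floating point arithmetic limits the reliability to moderate values of $d$.

```python
def log2_binary_digit(d: int, tail_terms: int = 60) -> int:
    """Return the (d+1)-th binary digit of log 2 after the point (d >= 0)."""
    # First sum: only the fractional part matters, so each numerator
    # 2**(d-k) is reduced modulo k by fast modular exponentiation.
    head = 0.0
    for k in range(1, d + 1):
        head = (head + pow(2, d - k, k) / k) % 1.0
    # Second sum: the terms 2**(d-k)/k with k > d decay geometrically,
    # so a fixed number of them suffices in double precision.
    tail = 0.0
    for k in range(d + 1, d + 1 + tail_terms):
        tail += 2.0 ** (d - k) / k
    frac = (head + tail) % 1.0  # this is {2^d log 2} up to rounding errors
    return int(2 * frac)        # its first binary digit


if __name__ == "__main__":
    print("".join(str(log2_binary_digit(d)) for d in range(32)))
```

Each call is independent of the others, which is the whole point: the digit at position $d + 1$ is obtained without computing the preceding ones.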
In a similar manner, only with more technical effort, one can compute an arbitrary digit of the hexadecimal expansion of π by using the BBP-formula (5.4). However, in contrast to Champernowne's number, this does not imply any pattern for the digits in this 16-adic expansion; at least no consequences for normality are known so far. Recently, Bailey & Crandall [11] made a conjecture how a formula of (5.4)-type (as those above for π and $\log 2$) could be related to a sequence of real numbers which should be uniformly distributed modulo one if, and only if, the underlying number is normal. We do not go into the details but mention that, as a consequence of this unproved hypothesis, π would be normal to base 16 if the sequence $(x_n)$, given by
$$x_0 = 0, \qquad x_n = 16 x_{n-1} + \frac{120n^2 - 89n + 16}{512n^4 - 1024n^3 + 712n^2 - 206n + 21}, \tag{5.5}$$
is uniformly distributed modulo one. In case of $\log 2$ normality would result from the uniform distribution of
$$x_0 = 0, \qquad x_{n+1} = 2\left(x_n + \frac{1}{n}\right) \bmod 1.$$
Unfortunately, for both sequences it is not known whether they are uniformly distributed modulo one.

¶ For example, $2^{17} = ((((2^2)^2)^2)^2) \cdot 2$, since $17 = 2^4 + 2^0$.

A curiosity: if π is normal, let us say to base $b = 26$, and if we assign to each of the 26 digits a letter of the Latin alphabet, A ↦ 1, B ↦ 2, ..., say, then the 26-adic expansion of π would include a proof of its normality, provided that this statement is provable.‖

Exercises

A rolling stone gathers no moss. We start with recurrence:

Exercise 5.1. Prove the following metrical version of Poincaré's recurrence theorem: assume the condition of Theorem 5.1 and suppose that $X$ is a metric space with metric $d$ which respects $\mu$. Then, for almost all $x$,
$$\liminf_{n\to\infty} d(x, T^n x) = 0.$$
Moreover, show that for some $n \le 1 + \lfloor 1/\mu(A) \rfloor$ the inequality $\mu(A \cap T^{-n}A) > 0$ holds.

Next we consider random walks on the unit circle:

Exercise 5.2.
Imagine a random walker starts at the point $P$ on the unit circle $\mathbb{T}$ and tosses a fair coin; if it comes up heads the walker moves counterclockwise by the distance $\alpha$ whereas for tails the walker moves clockwise by the distance $\beta$, where $\alpha$ and $\beta$ are positive real numbers. A sequence of coin tossings can be regarded as a binary expansion of a number from the unit interval: $x = (x_1, x_2, \ldots) = \sum_{j \ge 1} x_j 2^{-j}$ with $x_j \in \{0, 1\}$ according to heads or tails. Can the random walker visit any open neighbourhood of a given point on $\mathbb{T}$, and, if so, how long does it take to return to a neighbourhood of the starting point $P$? Hint: for some advice see [31], §7.5.

The next task is to explain Felix's recurrence:

Exercise 5.3. Prove: if $(X, \mathcal{F}, \mu, T)$ is an ergodic system with discrete space $X$ and uniform distribution $\mu$, then the recurrence is sure. Explain why cat Felix returns completely after $n = 405$ iterations. Can you also explain why the first return of Felix occurs after $n = 135$ iterations? Hint: the picture of Felix consists of 810 × 810 pixels; note that 810 is an integer multiple of 405 which is an integer multiple of 135. An illuminating reading might be [42].

The Cantor set and its relatives have a very interesting topological structure.

Exercise 5.4. Prove all statements about the Cantor set $C$; in particular show that it has Lebesgue measure $\lambda(C) = 0$, does not contain any open interval, is compact and perfect, and hence uncountable. Moreover, search in the mathematical literature for generalizations of $C$ (e.g., the Sierpinski gasket).

We conclude with a more explicit task on the BBP-formula:

Exercise 5.5. Give a complete proof for (5.4). Furthermore, implement an algorithm to compute the hexadecimal expansion of π by use of the BBP-formula. Compare your results with the values $x_n$ according to (5.5) and do some statistics for the digits.

‖ Unfortunately, it would also contain false proofs.
There is a computer program on the webpage www.angio.net/pi/bigpi.cgi which finds, if possible, the first appearance of any date in the decimal expansion of π; for instance, my birth date appears at position 151897.

* * *

In the next chapter we get to know the Riemann zeta-function which is one of the main objects in analytic number theory. It is known to play a central role in prime number distribution theory (as we will briefly highlight); however, it will also show up in the context of continued fraction expansions and their ergodic properties in a later chapter. The main tool to study the zeta-function is complex analysis, and in some places our exposition will be only fragmentary.

CHAPTER 6

Interlude: The Riemann Zeta-Function

The zeta-function is of particular interest in analytic number theory. For Re $s > 1$, zeta is defined by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}; \tag{6.1}$$
here the product is taken over all prime numbers $p$. The identity between the series and the product is an analytic version of the unique prime factorization of the integers, as becomes obvious by expanding each factor of the product into a geometric series. This type of series is called a Dirichlet series and a product over primes as above is referred to as an Euler product. This indicates that already Euler studied the zeta-function. We give a first glimpse of his insights by reproducing his one-sentence proof of the infinitude of primes: if there were only finitely many primes, the product would converge for $s = 1$, contradicting the divergence of the harmonic series. This analytic reasoning proved to be more powerful than elementary approaches to prime number distribution. For instance, Euler showed that the sum over the reciprocals of the primes is divergent, which he noted as follows:
$$\frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \ldots = \log\log\infty;$$
in modern notation we would write this in the form $\sum_{p \le x} \frac{1}{p} \sim \log\log x$, as $x \to \infty$. It follows that the primes form a sparse set within $\mathbb{N}$.
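Euler's asymptotics $\sum_{p \le x} 1/p \sim \log\log x$ can be observed numerically. The sketch below uses a plain sieve of Eratosthenes; the function names are ours. The comparison with the constant 0.2615 relies on Mertens' theorem, a known refinement that is not part of the discussion above and is used here only as a sanity check.

```python
from math import isqrt, log


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def reciprocal_prime_sum(x: int) -> float:
    """Sum of 1/p over all primes p <= x."""
    return sum(1.0 / p for p in primes_up_to(x))


if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        s = reciprocal_prime_sum(x)
        # The difference stabilises near 0.2615 (Mertens' theorem, an assumption
        # imported from the literature), while the sum itself grows like log log x.
        print(x, round(s, 4), round(s - log(log(x)), 4))
```

The extremely slow growth of $\log\log x$ is visible at once: raising $x$ from $10^2$ to $10^5$ increases the sum by less than one.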
Unfortunately, this does not imply an asymptotic formula for the number $\pi(x)$ of primes $p \le x$. It was the young Gauss who, at the age of seventeen, conjectured that $\pi(x) \sim \frac{x}{\log x}$, as $x \to \infty$. And it was Riemann who was the first to study $\zeta(s)$ as a function of a complex variable which led, finally, to a proof of Gauss' conjecture. We shall briefly survey the remarkable relation between primes and zeta in the following. Since our exposition here will be rather sketchy we recommend further reading. For more information as well as citations to the original works on the zeta-function and its impact on prime number distribution we refer to the classical monograph [137] by Titchmarsh and the historical account [104] of Narkiewicz.

6.1. Primes and Zeros

It is not difficult to show (e.g., by Riemann's integral test) that both the series and the product in (6.1) converge absolutely for all complex numbers $s$ with Re $s > 1$. Of special interest are the values of zeta at the positive integers. Euler proved

Theorem 6.1. For $k \in \mathbb{N}$,
$$\zeta(2k) = (-1)^{k+1} \frac{(2\pi)^{2k}}{2(2k)!} B_{2k}.$$

Here $B_m$ denotes the $m$th Bernoulli number, defined by the identity
$$\frac{x}{\exp(x) - 1} = \sum_{m=0}^{\infty} B_m \frac{x^m}{m!} = 1 - \frac{1}{2}x + \frac{1}{12}x^2 - \frac{1}{720}x^4 + \ldots,$$
so that $B_0 = 1$, $B_1 = -\frac{1}{2}$, $B_2 = \frac{1}{6}$, $B_4 = -\frac{1}{30}$. The Bernoulli numbers were discovered independently by the Swiss mathematician Jakob Bernoulli and by the Japanese mathematician Seki Kōwa; both discoveries were published posthumously in 1713, resp. 1712. In particular, we deduce Euler's famous formula
$$\zeta(2) = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \ldots = \frac{\pi^2}{6}. \tag{6.2}$$
We shall give only an idea of his proof (details are left to the reader as Exercise 6.1). Recall the infinite product
$$\frac{\sin z}{z} = \prod_{\substack{n=-\infty \\ n \ne 0}}^{\infty} \left(1 - \frac{z}{\pi n}\right) = \prod_{n=1}^{\infty} \left(1 - \frac{z^2}{\pi^2 n^2}\right)$$
and the power series representation
$$\frac{\sin z}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} \mp \ldots = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k+1)!}.$$
Comparing the coefficients at $z^2$, we deduce (6.2). Euler's proof was much discussed by his contemporaries.
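Theorem 6.1 is easy to check numerically. The sketch below is an illustration under our own choices: it computes the Bernoulli numbers exactly from the standard recurrence $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ for $m \ge 1$ (equivalent to the generating function, with $B_1 = -\frac{1}{2}$), and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_numbers(m_max: int) -> list:
    """Return [B_0, ..., B_{m_max}] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        # Solve sum_{j=0}^{m} C(m+1, j) B_j = 0 for B_m.
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-s / (m + 1))
    return B


def zeta_even(k: int) -> float:
    """Evaluate zeta(2k) by the formula of Theorem 6.1."""
    b = bernoulli_numbers(2 * k)[2 * k]
    return (-1) ** (k + 1) * (2 * pi) ** (2 * k) * float(b) / (2 * factorial(2 * k))


if __name__ == "__main__":
    print(bernoulli_numbers(8))
    partial = sum(1.0 / n**2 for n in range(1, 100_001))
    print(zeta_even(1), partial)  # closed form versus a partial sum of the series
```

The partial sum of the Dirichlet series approaches the closed form only slowly, which makes the exact formula all the more remarkable.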
In his time it was not clear whether $\sin z$ has no non-real zeros; furthermore, the convergence of the infinite product for the sine-function cannot be proved without complex analysis which was not yet developed (although questions of convergence were often considered negligible in those times). However, today Euler's argument is waterproof and might be the easiest proof of all.

It is easily seen that the Bernoulli numbers are rational, hence the values $\zeta(2k)$ are rational multiples of $\pi^{2k}$ and, by the transcendence of π, they are even transcendental. However, not too much is known about the arithmetic nature of the values at the positive odd integers. Apéry [4] showed that $\zeta(3)$ is irrational; however, it is unknown whether $\zeta(3)$ is transcendental or whether $\zeta(5)$ is irrational.

Next we need an analytic continuation of the zeta-function to the left of the half-plane of absolute convergence; a certain problem is the singularity at $s = 1$ which implied, in form of the harmonic series, the existence of infinitely many primes in Euler's proof. Assume Re $s > 1$, then
$$\sum_{n=1}^{\infty} \frac{1}{n^s} - 2 \sum_{m=1}^{\infty} \frac{1}{(2m)^s} = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}.$$
Rewriting the left-hand side in terms of the zeta-function, we obtain the representation
$$\zeta(s) = (1 - 2^{1-s})^{-1} \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}. \tag{6.3}$$
We observe that the series on the right converges in the half-plane Re $s > 0$. The factor $(1 - 2^{1-s})^{-1}$ has simple poles at $s = 1 + \frac{2\pi i}{\log 2} k$ for any $k \in \mathbb{Z}$; however, for $k \ne 0$ the alternating series vanishes. Hence, (6.3) provides an analytic continuation for $\zeta(s)$ to the half-plane Re $s > 0$ except for a simple pole at $s = 1$.

For a later purpose we shall derive another formula for the zeta-value $\zeta(2)$ which is related to the representation (6.3). Recall the Gamma-function, for complex $s$ with positive real part defined by the integral
$$\Gamma(s) = \int_0^{\infty} y^{s-1} \exp(-y)\, dy. \tag{6.4}$$
Substituting $y = nu$ we deduce
$$n^{-s}\, \Gamma(s) = \int_0^{\infty} u^{s-1} \exp(-nu)\, du,$$
resp., for Re $s > 1$,
$$\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}\, \Gamma(s) = \int_0^{\infty} u^{s-1} \left(\sum_{n=1}^{\infty} (-1)^{n-1} \exp(-nu)\right) du;$$
here we may interchange summation and integration because of absolute convergence. Substituting $u = -\log x$ we obtain
$$\sum_{n=1}^{\infty} (-1)^{n-1} \exp(-nu) = \sum_{n=1}^{\infty} (-1)^{n-1} x^n = \frac{x}{1+x},$$
hence
$$(1 - 2^{1-s})\, \zeta(s)\, \Gamma(s) = \int_0^1 (-\log x)^{s-1}\, \frac{dx}{1+x}. \tag{6.5}$$
In particular,
$$\frac{\pi^2}{12} = \frac{1}{2}\zeta(2) = \int_0^1 (-\log x)\, \frac{dx}{1+x},$$
which we will need in a later chapter.

Here is another way of extending $\zeta(s)$ beyond the domain of absolute convergence of its defining Dirichlet series which is due to Riemann [118] and stands at the beginning of his and others' investigations of the zeta-function as a function of a complex variable. Substituting $y = \pi n^2 x$ in (6.4), with $\frac{s}{2}$ in place of $s$, leads to
$$\Gamma\left(\frac{s}{2}\right) \pi^{-\frac{s}{2}} \frac{1}{n^s} = \int_0^{\infty} x^{\frac{s}{2}-1} \exp(-\pi n^2 x)\, dx. \tag{6.6}$$
Summing up over all $n \in \mathbb{N}$ yields
$$\pi^{-\frac{s}{2}}\, \Gamma\left(\frac{s}{2}\right) \sum_{n=1}^{\infty} \frac{1}{n^s} = \sum_{n=1}^{\infty} \int_0^{\infty} x^{\frac{s}{2}-1} \exp(-\pi n^2 x)\, dx.$$
On the left-hand side we find the Dirichlet series defining $\zeta(s)$; in view of its convergence, the latter formula is valid only for Re $s > 1$. On the right-hand side we may interchange summation and integration, justified by absolute convergence. Thus we obtain
$$\pi^{-\frac{s}{2}}\, \Gamma\left(\frac{s}{2}\right) \zeta(s) = \int_0^{\infty} x^{\frac{s}{2}-1} \sum_{n=1}^{\infty} \exp(-\pi n^2 x)\, dx.$$
We split the integral at $x = 1$ and get
$$\pi^{-\frac{s}{2}}\, \Gamma\left(\frac{s}{2}\right) \zeta(s) = \left(\int_0^1 + \int_1^{\infty}\right) x^{\frac{s}{2}-1} \omega(x)\, dx, \tag{6.7}$$
where the series $\omega(x)$ is given in terms of the 'half' theta-function of Jacobi:
$$\omega(x) := \sum_{n=1}^{\infty} \exp(-\pi n^2 x) = \frac{1}{2}\left(\theta(x) - 1\right)$$
(since $\exp(-\pi n^2 x) = \exp(-\pi (-n)^2 x)$ for any $n \in \mathbb{N}$). In view of the functional equation for the theta-function,
$$\omega\left(\frac{1}{x}\right) = \frac{1}{2}\left(\theta\left(\frac{1}{x}\right) - 1\right) = \sqrt{x}\, \omega(x) + \frac{1}{2}\left(\sqrt{x} - 1\right),$$
which can be deduced from the Poisson summation formula, we find by the substitution $x \mapsto \frac{1}{x}$ that the first integral in (6.7) is equal to
$$\int_1^{\infty} x^{-\frac{s}{2}-1}\, \omega\left(\frac{1}{x}\right) dx = \int_1^{\infty} x^{-\frac{s+1}{2}} \omega(x)\, dx + \frac{1}{s-1} - \frac{1}{s}.$$
Substituting this in (6.7) yields
$$\pi^{-\frac{s}{2}}\, \Gamma\left(\frac{s}{2}\right) \zeta(s) = \frac{1}{s(s-1)} + \int_1^{\infty} \left(x^{-\frac{s+1}{2}} + x^{\frac{s}{2}-1}\right) \omega(x)\, dx. \tag{6.8}$$
Since $\omega(x) \ll \exp(-\pi x)$, the last integral converges for all values of $s$, and thus (6.8) holds, by analytic continuation, throughout the complex plane. The right-hand side remains unchanged by $s \mapsto 1 - s$. This proves Riemann's functional equation:
$$\pi^{-\frac{s}{2}}\, \Gamma\left(\frac{s}{2}\right) \zeta(s) = \pi^{-\frac{1-s}{2}}\, \Gamma\left(\frac{1-s}{2}\right) \zeta(1-s), \tag{6.9}$$
valid for all complex $s$.

In view of the Euler product (6.1) it is easily seen that $\zeta(s)$ has no zeros in the half-plane Re $s > 1$. It follows from the functional equation and from basic properties of the Gamma-function that $\zeta(s)$ vanishes in Re $s < 0$ exactly at the so-called trivial zeros $s = -2n$ with $n \in \mathbb{N}$. All other zeros of $\zeta(s)$ are said to be nontrivial, and we denote them by $\rho = \beta + i\gamma$. Obviously, they have to lie inside the so-called critical strip $0 \le$ Re $s \le 1$, and it is easily seen that they are non-real.

Figure 1. $\zeta(s)$ in the range $s \in [-14.5, 0.5]$.

The functional equation (6.9) and the identity $\zeta(\overline{s}) = \overline{\zeta(s)}$ show some symmetries of $\zeta(s)$. In particular, the nontrivial zeros of $\zeta(s)$ are distributed symmetrically with respect to the real axis and to the vertical line Re $s = \frac{1}{2}$. It was Riemann's ingenious contribution to number theory to point out how the distribution of these nontrivial zeros is linked to the distribution of prime numbers. Riemann conjectured the asymptotics for the number $N(T)$ of nontrivial zeros $\rho = \beta + i\gamma$ with $0 < \gamma \le T$ (counted according to multiplicities). This conjecture was proved in 1895 by von Mangoldt who found more precisely
$$N(T) = \frac{T}{2\pi} \log \frac{T}{2\pi e} + O(\log T). \tag{6.10}$$
Riemann worked with the function $t \mapsto \zeta(\frac{1}{2} + it)$ and wrote that very likely all roots $t$ are real, i.e., all nontrivial zeros lie on the so-called critical line Re $s = \frac{1}{2}$. This is the famous, yet unproved Riemann hypothesis which we rewrite equivalently as

Riemann's hypothesis. $\zeta(s) \ne 0$ for Re $s > \frac{1}{2}$.
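To get a feeling for the Riemann-von Mangoldt formula (6.10), one can compare its main term with an actual count of zeros at small height. The sketch below is purely illustrative: the list of ordinates contains rounded standard values from the literature (only the first one, 14.134..., is quoted in these notes), and the function names are ours.

```python
from math import e, log, pi

# First ten positive ordinates of nontrivial zeros, rounded; these are
# standard tabulated values and are an external input, not derived here.
FIRST_ORDINATES = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
                   37.586178, 40.918719, 43.327073, 48.005151, 49.773832]


def main_term(T: float) -> float:
    """Main term T/(2 pi) * log(T/(2 pi e)) of formula (6.10)."""
    return T / (2 * pi) * log(T / (2 * pi * e))


def count_zeros(T: float) -> int:
    """N(T) from the short table above; only meaningful for T < 52."""
    if T >= 52:
        raise ValueError("the table only covers ordinates below 52")
    return sum(1 for gamma in FIRST_ORDINATES if gamma <= T)


if __name__ == "__main__":
    for T in (20, 30, 40, 50):
        print(T, count_zeros(T), round(main_term(T), 3))
```

Already at this modest height the main term stays within a bounded distance of the true count, in line with the error term $O(\log T)$.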
In support of his conjecture, Riemann calculated some zeros; the first one with positive imaginary part is $\rho = \frac{1}{2} + i\,14.134\ldots$. Furthermore, he conjectured that there exist constants $A$ and $B$ such that
$$\frac{1}{2}\, s(s-1)\, \pi^{-\frac{s}{2}}\, \Gamma\left(\frac{s}{2}\right) \zeta(s) = \exp(A + Bs) \prod_{\rho} \left(1 - \frac{s}{\rho}\right) \exp\left(\frac{s}{\rho}\right),$$
where the product on the right is taken over all nontrivial zeros (the trivial zeta zeros are cancelled by the poles of the Gamma-factor). This was shown by Hadamard in 1893 (on behalf of his theory of product representations of entire functions).

Figure 2. The values of $\zeta(\frac{1}{2} + it)$ as $t$ ranges from 0 to 40.

Finally, Riemann conjectured the so-called explicit formula which states that
$$\pi(x) + \sum_{n=2}^{\infty} \frac{\pi(x^{1/n})}{n} = \operatorname{li}(x) - \sum_{\substack{\rho = \beta + i\gamma \\ \gamma > 0}} \left(\operatorname{li}(x^{\rho}) + \operatorname{li}(x^{1-\rho})\right) + \int_x^{\infty} \frac{du}{u(u^2 - 1)\log u} - \log 2 \tag{6.11}$$
for any $x \ge 2$ not being a prime power (otherwise a term $\frac{1}{2k}$ has to be added on the left-hand side, where $x = p^k$). The appearing integral logarithm is defined by
$$\operatorname{li}(x^{\beta + i\gamma}) = \int_{(-\infty + i\gamma)\log x}^{(\beta + i\gamma)\log x} \frac{\exp(z)}{z + \delta i \gamma}\, dz,$$
where $\delta = +1$ if $\gamma > 0$ and $\delta = -1$ otherwise. The explicit formula was proved by von Mangoldt in 1895 as a consequence of both product representations for $\zeta(s)$, the Euler product (6.1) and the Hadamard product.

Building on these ideas, Hadamard and de la Vallée-Poussin found (independently) in 1896 the first proof of Gauss' conjecture, the celebrated prime number theorem. For technical reasons it is of advantage to work with the logarithmic derivative of $\zeta(s)$ which is for Re $s > 1$ given by
$$\frac{\zeta'}{\zeta}(s) = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s},$$
where the von Mangoldt Λ-function is defined by
$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ with } k \in \mathbb{N}, \\ 0 & \text{otherwise.} \end{cases} \tag{6.12}$$
A lot of information concerning the prime counting function $\pi(x)$ can be recovered from information about
$$\psi(x) := \sum_{n \le x} \Lambda(n) = \sum_{p \le x} \log p + O\left(x^{\frac{1}{2}} \log x\right).$$
Partial summation gives $\pi(x) \sim \frac{\psi(x)}{\log x}$. First of all, we shall express $\psi(x)$ in terms of the zeta-function.
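Before doing so, it may help to see $\Lambda(n)$ and $\psi(x)$ in a direct computation. The following is a naive sketch by trial division under our own naming; it is meant for small $x$ only and makes no claim to efficiency.

```python
from math import log


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n = p^k with k >= 1, and 0 otherwise, as in (6.12)."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; n is a prime power iff
            # nothing is left after removing all factors p.
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no factor up to sqrt(n): n itself is prime


def chebyshev_psi(x: int) -> float:
    """psi(x) = sum of Lambda(n) over n <= x."""
    return sum(von_mangoldt(n) for n in range(1, x + 1))


if __name__ == "__main__":
    for x in (10, 100, 1000, 10000):
        print(x, round(chebyshev_psi(x), 3), round(chebyshev_psi(x) / x, 4))
```

Since $\psi(x)$ is the logarithm of the least common multiple of $1, 2, \ldots, \lfloor x \rfloor$, the value $\psi(10) = \log 2520$ gives a quick check; the printed ratios $\psi(x)/x$ approach 1, in accordance with the prime number theorem.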
If $c$ is a positive constant, then
$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{x^s}{s}\, ds = \begin{cases} 1 & \text{if } x > 1, \\ 0 & \text{if } 0 < x < 1. \end{cases}$$
This yields the so-called Perron formula: for $x \notin \mathbb{Z}$ and $c > 1$,
$$\psi(x) = -\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\zeta'}{\zeta}(s)\, \frac{x^s}{s}\, ds. \tag{6.13}$$
Moving the path of integration to the left, we find that the latter expression is equal to the corresponding sum of residues, that are the residues of the integrand at the pole of $\zeta(s)$ at $s = 1$, at the zeros of $\zeta(s)$, and at the additional pole of the integrand at $s = 0$. The main term turns out to be
$$\operatorname{Res}_{s=1} \left\{-\frac{\zeta'}{\zeta}(s)\, \frac{x^s}{s}\right\} = \lim_{s \to 1}\, (s-1) \left(\frac{1}{s-1} + O(1)\right) \frac{x^s}{s} = x,$$
whereas each nontrivial zero $\rho$ gives the contribution
$$\operatorname{Res}_{s=\rho} \left\{-\frac{\zeta'}{\zeta}(s)\, \frac{x^s}{s}\right\} = -\frac{x^{\rho}}{\rho}.$$
By the same reasoning, the trivial zeros altogether contribute
$$\sum_{n=1}^{\infty} \frac{x^{-2n}}{2n} = -\frac{1}{2} \log\left(1 - \frac{1}{x^2}\right).$$
Incorporating the residue at $s = 0$, this leads to the exact explicit formula
$$\psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \frac{1}{2} \log\left(1 - \frac{1}{x^2}\right) - \log(2\pi),$$
which is equivalent to Riemann's formula (6.11). This formula is valid for any $x \notin \mathbb{Z}$. Notice that the right-hand side of this formula is not absolutely convergent. If $\zeta(s)$ had only finitely many nontrivial zeros, the right-hand side would be a continuous function of $x$, contradicting the jumps of $\psi(x)$ at prime powers $x$. Going on, it is more convenient to cut the integral in (6.13) at $t = \pm T$ which leads to the truncated version
$$\psi(x) = x - \sum_{|\gamma| \le T} \frac{x^{\rho}}{\rho} + O\left(\frac{x}{T} \left(\log(xT)\right)^2\right), \tag{6.14}$$
valid for all values of $x$. Next we need information on the distribution of the nontrivial zeros. Already the non-vanishing of $\zeta(s)$ on the line Re $s = 1$ yields the asymptotic relations $\psi(x) \sim x$, resp. $\pi(x) \sim \operatorname{li}(x)$, which is Gauss' conjecture and sufficient for many applications. However, more precise asymptotics with a remainder term can be obtained by a zero-free region inside the critical strip.
The largest known zero-free region for $\zeta(s)$ was found by Vinogradov and Korobov (independently) in 1958 who proved
$$\zeta(s) \ne 0 \quad\text{in}\quad \operatorname{Re} s \ge 1 - \frac{c}{(\log(|t| + 3))^{\frac{2}{3}}\, (\log\log(|t| + 3))^{\frac{1}{3}}},$$
where $c$ is some positive absolute constant. In combination with the Riemann-von Mangoldt formula (6.10) one can estimate the sum over the nontrivial zeros in (6.14). Balancing out $T$ and $x$, we obtain the prime number theorem with the sharpest known remainder term:

Theorem 6.2. There exists an absolute positive constant $C$ such that for sufficiently large $x$
$$\pi(x) = \operatorname{li}(x) + O\left(x \exp\left(-C\, \frac{(\log x)^{\frac{3}{5}}}{(\log\log x)^{\frac{1}{5}}}\right)\right).$$

By the explicit formula (6.14) the impact of the Riemann hypothesis on the prime number distribution becomes visible. In 1900, von Koch showed that for fixed $\theta \in [\frac{1}{2}, 1)$
$$\pi(x) - \operatorname{li}(x) \ll x^{\theta + \epsilon} \iff \zeta(s) \ne 0 \ \text{ for } \operatorname{Re} s > \theta; \tag{6.15}$$
equivalently, one can replace the left-hand side by $\psi(x) - x$. Here and in the sequel $\epsilon$ stands for an arbitrarily small positive constant, not necessarily the same at each appearance. With regard to known zeros of $\zeta(s)$ on the critical line it turns out that an error term with $\theta < \frac{1}{2}$ is impossible. Thus, the Riemann hypothesis states that the prime numbers are as uniformly distributed as possible!

Many computations were done to find a counterexample to the Riemann hypothesis. Van de Lune, te Riele & Winter localized the first 1 500 000 001 zeros, all lying without exception on the critical line; moreover they all are simple! By observations like this it is conjectured that all or at least almost all zeros of the zeta-function are simple. This is the so-called essential simplicity hypothesis.

Already classical density theorems (e.g. those of Bohr & Landau) show that most of the zeros lie arbitrarily close to the critical line. On the other hand, Hardy showed that infinitely many zeros lie on the critical line.
Refining a mollifying technique of Selberg, Levinson localized more than one third of the nontrivial zeros of the zeta-function on the critical line, and as Heath-Brown and Selberg (unpublished) discovered, they are all simple. The current record is due to Conrey who showed that more than two fifths of the zeros are simple and on the critical line.

We give a heuristic probabilistic argument for the truth of Riemann's hypothesis due to Denjoy. For this purpose we introduce the Möbius µ-function which is defined by $\mu(1) = 1$, $\mu(n) = 0$ if $n$ has a square divisor $\ne 1$, and $\mu(n) = (-1)^r$ if $n$ is the product of $r$ distinct primes. It is easily seen that $\mu(n)$ is multiplicative and appears as coefficients of the Dirichlet series representation of the reciprocal of the zeta-function:
$$\zeta(s)^{-1} = \prod_p \left(1 - \frac{1}{p^s}\right) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s},$$
valid for Re $s > 1$. Riemann's hypothesis is equivalent to the estimate
$$M(x) := \sum_{n \le x} \mu(n) \ll x^{\frac{1}{2} + \epsilon}.$$
This is related to (6.15). Now Denjoy [39] argued as follows: Assume that $\{X_n\}$ is a sequence of independent random variables with distribution
$$P(X_n = +1) = P(X_n = -1) = \frac{1}{2}.$$
Define $S_0 = 0$ and $S_n = \sum_{j=1}^{n} X_j$, then $\{S_n\}$ is a symmetric random walk in $\mathbb{Z}$ with starting point at 0. Since $S_n$ has expectation 0 and variance $n$, a simple application of Chebyshev's inequality yields, for any positive $c$,
$$P\left\{|S_n| \ge c\, n^{\frac{1}{2}}\right\} \le \frac{1}{c^2},$$
which shows that large values for $S_n$ are rare events. By the theorem of de Moivre-Laplace this can be made more precise. It follows that
$$\lim_{n\to\infty} P\left\{|S_n| < c\, n^{\frac{1}{2}}\right\} = \frac{1}{\sqrt{2\pi}} \int_{-c}^{c} \exp\left(-\frac{x^2}{2}\right) dx.$$
Since the right-hand side above tends to 1 as $c \to \infty$, we obtain
$$\lim_{n\to\infty} P\left\{|S_n| \ll n^{\frac{1}{2} + \epsilon}\right\} = 1$$
for every $\epsilon > 0$. We observe that this might be regarded as a model for the value-distribution of the Möbius µ-function. The law of the iterated logarithm would even give the stronger estimate
$$\lim_{n\to\infty} P\left\{|S_n| \ll (n \log\log n)^{\frac{1}{2}}\right\} = 1,$$
which suggests for $M(x)$ the upper bound $(x \log\log x)^{\frac{1}{2}}$.
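The summatory function $M(x)$ behind this heuristic is easy to tabulate. The sketch below (function names ours) computes $\mu(n)$ by a sieve and prints $M(x)$ next to $x^{1/2}$; it illustrates the random-walk picture for small $x$ and, of course, says nothing about the asymptotic behaviour.

```python
from math import sqrt


def mobius_sieve(n: int) -> list:
    """Return [mu(0), ..., mu(n)] with the convention mu(0) = 0."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if not is_prime[p]:
            continue
        for m in range(2 * p, n + 1, p):
            is_prime[m] = False
        for m in range(p, n + 1, p):
            mu[m] = -mu[m]          # one more prime factor flips the sign
        for m in range(p * p, n + 1, p * p):
            mu[m] = 0               # divisible by a square: mu vanishes
    return mu


def mertens_values(n: int) -> list:
    """Return [M(0), ..., M(n)] where M(x) is the sum of mu(k) over k <= x."""
    mu = mobius_sieve(n)
    values, total = [0] * (n + 1), 0
    for k in range(1, n + 1):
        total += mu[k]
        values[k] = total
    return values


if __name__ == "__main__":
    M = mertens_values(10_000)
    for x in (10, 100, 1000, 10_000):
        print(x, M[x], round(sqrt(x), 2))
```

In this range $|M(x)|$ stays well below $x^{1/2}$, just as a symmetric random walk typically would.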
The bound $(x \log\log x)^{\frac12}$ suggested by the law of the iterated logarithm is pretty close to the so-called weak Mertens hypothesis which states
\[ \int_1^X \left( \frac{M(x)}{x} \right)^2 dx \ll \log X. \]
Note that this bound implies the Riemann hypothesis and the essential simplicity hypothesis. On the contrary, Odlyzko & te Riele disproved the original Mertens hypothesis,
\[ |M(x)| < x^{\frac12}, \]
by showing that
\[ \liminf_{x\to\infty} \frac{M(x)}{x^{\frac12}} < -1.009 \quad\text{and}\quad \limsup_{x\to\infty} \frac{M(x)}{x^{\frac12}} > 1.06. \tag{6.16} \]

Figure 3. The random walk generated by the values of the $\mu$-function for $n \leq 10\,000$.

6.2. Applications of Uniform Distribution and Ergodic Theory

Our first application deals with the arithmetic nature of the ordinates of the nontrivial zeros of the zeta-function. Rademacher [115] proved the remarkable result that these ordinates are uniformly distributed modulo one provided that the Riemann hypothesis is true; later Elliott [48] remarked that the latter condition can be removed, and (independently) Hlawka [64] obtained the following unconditional

Theorem 6.3. The ordinates of the nontrivial zeros of the zeta-function are uniformly distributed modulo one.

Proof. We need some deep results from zeta-function theory. We start with a theorem of Landau [94] who proved, for $x > 1$,
\[ \sum_{0 < \gamma \leq T} x^{\rho} = -\Lambda(x) \frac{T}{2\pi} + O(\log T), \]
where the summation is over all nontrivial zeros $\rho = \beta + i\gamma$ and $\Lambda(x)$ is the von Mangoldt $\Lambda$-function, defined by (6.12). Hence, in view of the Riemann-von Mangoldt formula (6.10) it follows that
\[ \frac{1}{N(T)} \sum_{0 < \gamma \leq T} x^{\rho} \ll \frac{1}{\log T}. \tag{6.17} \]
We do not want to argue under assumption of the Riemann hypothesis. For this aim we observe that
\[ |x^{\frac12+i\gamma} - x^{\beta+i\gamma}| \leq x^{\beta} \left| \exp\!\big( (\tfrac12 - \beta) \log x \big) - 1 \right| \leq x^{\beta} \log x \, |\beta - \tfrac12|. \]
Thus,
\[ \frac{1}{N(T)} \sum_{0 < \gamma \leq T} |x^{\frac12+i\gamma} - x^{\beta+i\gamma}| \leq \frac{x \log x}{N(T)} \sum_{0 < \gamma \leq T} |\beta - \tfrac12|. \tag{6.18} \]
Next we shall use a result of Littlewood [99], namely
\[ \sum_{0 < \gamma \leq T} |\beta - \tfrac12| \ll T \log\log T. \]
It should be noted that Selberg improved upon this result in replacing the right-hand side by $T$; both estimates indicate that most of the zeta zeros are clustered around the critical line. Inserting this in (6.18) and use of (6.10) leads to
\[ \frac{1}{N(T)} \sum_{0 < \gamma \leq T} \big( x^{\frac12+i\gamma} - x^{\beta+i\gamma} \big) \ll \frac{\log\log T}{\log T}. \]
Thus, it follows from (6.17) that also
\[ \frac{1}{N(T)} \sum_{0 < \gamma \leq T} x^{\frac12+i\gamma} \ll \frac{\log\log T}{\log T}. \]
Letting $x = z^h$ with some real number $z > 1$ and $h \in \mathbb{N}$, we deduce
\[ \frac{1}{N(T)} \sum_{0 < \gamma \leq T} \exp(i h \gamma \log z) \ll \frac{\log\log T}{\log T}, \]
which tends to zero as $T \to \infty$. Hence, it follows from Weyl's criterion, Theorem 1.4, that the sequence of numbers $\frac{1}{2\pi} \gamma \log z$ is uniformly distributed modulo one. •

It is a long-standing conjecture that the ordinates of the nontrivial zeros are linearly independent over the rationals.∗ Ingham observed an interesting impact on the distribution of values of the Möbius $\mu$-function in showing that, if the ordinates of the nontrivial zeros are indeed linearly independent over the rationals, then $\limsup_{x\to\infty} M(x) x^{-\frac12} = +\infty$ and $\liminf_{x\to\infty} M(x) x^{-\frac12} = -\infty$, which should be compared with (6.16).

∗ One may ask why to expect anything like that? Well, an appropriate answer would be: why not? There is definitely no reason why the zeros should satisfy some algebraic relations, hence it is reasonable to expect the converse.

Our second aim is a recent application of ergodic theory (see [130]). For this purpose we need an ergodic transformation on the real line. Adler & Weiss [2] trace back the transformation $x \mapsto x - \frac1x$ to a paper of Boole [23] from the second half of the nineteenth century who observed the remarkable identity
\[ \int_{\mathbb{R}} f(x) \, dx = \int_{\mathbb{R}} f\!\left( x - \tfrac1x \right) dx, \]
valid for all continuous functions $f$; we quote from their introduction:

“Now as is well known there are fundamental differences between the measure preserving transformations of finite measure spaces and those of infinite measure-spaces.
In particular the latter theory suffers from a paucity of good examples, and so a natural question arose – what can ergodic theory say about Boole's transformation...”

Adler & Weiss prove that Boole's transformation is indeed ergodic, as the dear reader probably already suspected. For our purposes we shall study a different, though related, transformation which has the advantage of an associated finite measure space. Recall the transformation given by $T0 = 0$ and $Tx = \frac12 (x - \frac1x)$ for $x \neq 0$ on $\mathbb{R}$ (that is Example 7 from Chapter 3). Our aim is to study the values of the zeta-function on vertical lines with respect to this transformation.

First of all, we note that $T$ is ergodic. Since the only $T$-invariant sets $A$ with respect to the related probability measure $\mathbf{P}$, given by (3.3), are $A = \{0\}$ and $A = \mathbb{R}$, for which $\mathbf{P}(A) = 0$ or $= 1$, the transformation $T$ is ergodic. Now define $\mathbb{R} \ni x \mapsto f(x) := \zeta(s + ix)$, then $f$ is integrable with respect to $\mathbf{P}$ for all $s$ with $\operatorname{Re} s > -\frac12$. This follows immediately from the estimate
\[ \zeta(\sigma + it) \ll t^{\mu(\sigma)+\epsilon} \quad\text{with}\quad \mu(\sigma) \leq \begin{cases} 0 & \text{if } \sigma > 1, \\ \frac{1-\sigma}{2} & \text{if } 0 \leq \sigma \leq 1, \\ \frac12 - \sigma & \text{if } \sigma < 0, \end{cases} \tag{6.19} \]
as $t \to \infty$. Hence, applying Birkhoff's ergodic theorem implies, for $\operatorname{Re} s > -\frac12$,
\[ \lim_{N\to\infty} \frac1N \sum_{0 \leq n < N} \zeta(s + i T^n x) = \frac1\pi \int_{\mathbb{R}} \zeta(s + i\tau) \frac{d\tau}{1+\tau^2} \]
for almost all $x \in \mathbb{R}$.

For the evaluation of these ergodic limits we shall use another interpretation of these integrals. Recently, Lifshits & Weber [97] published a paper entitled “Sampling the Lindelöf Hypothesis with the Cauchy Random Walk”, a title which explains the content of their interesting paper very well. If $(X_m)$ is an infinite sequence of independent Cauchy distributed random variables, the Cauchy random walk is defined by $C_n = \sum_{m \leq n} X_m$. Lifshits & Weber proved among other things (in slightly different notation) that almost surely
\[ \lim_{N\to\infty} \frac1N \sum_{1 \leq n \leq N} \zeta(\tfrac12 + i C_n) = 1 + o\big( N^{-\frac12} (\log N)^b \big) \tag{6.20} \]
for any $b > 2$.
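The Birkhoff average for the transformation $T$ above can be explored numerically with a bounded test function in place of $\zeta$. The following is only a heuristic sketch and not part of the notes: the names `T`, `birkhoff_average` and `bounded_test_function` are illustrative, the starting value 42 is an arbitrary choice, and a floating-point orbit merely imitates a true orbit. For $f(x) = 1/(1+x^2)$ the space mean with respect to the Cauchy density $\frac{1}{\pi(1+\tau^2)}$ is $\frac1\pi \int_{\mathbb{R}} \frac{d\tau}{(1+\tau^2)^2} = \frac12$. The sketch also checks the identity $T(\cot\theta) = \cot(2\theta)$, which follows from the double-angle formula.

```python
import math


def T(x):
    """The transformation T x = (x - 1/x) / 2 with T 0 = 0."""
    if x == 0.0:
        return 0.0
    return 0.5 * (x - 1.0 / x)


def birkhoff_average(f, x0, n_terms):
    """Return (1/N) * sum of f(T^n x0) for 0 <= n < N = n_terms."""
    total, x = 0.0, x0
    for _ in range(n_terms):
        total += f(x)
        x = T(x)
    return total / n_terms


def bounded_test_function(x):
    """A bounded integrable test function; its Cauchy mean equals 1/2."""
    return 1.0 / (1.0 + x * x)


if __name__ == "__main__":
    # Time mean along a floating-point orbit versus the space mean 1/2.
    print(birkhoff_average(bounded_test_function, 42.0, 100_000))
    # Double-angle identity: T(cot(theta)) = cot(2 * theta).
    theta = 0.3
    print(T(1.0 / math.tan(theta)), 1.0 / math.tan(2.0 * theta))
```

The time mean should come out close to $\frac12$, in line with what Birkhoff's theorem predicts for almost all starting values; since rounding errors are amplified at every step, this is an experiment and not a verification.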
It should be noted that the expectations $\mathbb{E}X_m$ and $\mathbb{E}C_n$ of the Cauchy random walk do not exist, and, indeed, the values of $C_n$ provide a sampling of randomly distributed real numbers of unpredictable size. Hence, the almost sure convergence theorem of Lifshits & Weber shows that the expectation value of $\zeta(s)$ on the Cauchy random walk $s = \frac12 + iC_n$ equals one, which indicates that most of the values of the zeta-function on the critical line are pretty small. The yet unproved Lindelöf hypothesis states that, for any $\epsilon > 0$,
\[ \zeta(\tfrac12 + it) \ll t^{\epsilon} \tag{6.21} \]
as $t \to \infty$. The Riemann hypothesis implies the Lindelöf hypothesis (see [137]) and, thus, the Lindelöf hypothesis serves in some applications as a valuable substitute. The presently best estimate in this direction is due to Huxley who obtained the exponent $\frac{32}{205} + \epsilon$ in place of the tiny $\epsilon$ above. Our Cesàro mean $\frac1N \sum_{0 \leq n < N} \zeta(s + iT^n x)$ may as well be interpreted as a sample for testing the Lindelöf hypothesis.

Noting that the density function of a Cauchy distributed random variable $X$ is given by $\tau \mapsto \frac{1}{\pi(1+\tau^2)}$, it follows that the associated probability measure is also given by (3.3), as our ergodic measure, and the integral in question is thus nothing but the expectation of $\zeta(\frac12 + iX)$,
\[ \mathbb{E}\zeta(\tfrac12 + iX) = \frac1\pi \int_{\mathbb{R}} \zeta(\tfrac12 + i\tau) \frac{d\tau}{1+\tau^2}. \]
In their account to prove (6.20) Lifshits & Weber computed by elementary means several expectation values, in particular this one, which yields
\[ \lim_{N\to\infty} \frac1N \sum_{0 \leq n < N} \zeta(\tfrac12 + iT^n x) = \zeta(\tfrac32) - \tfrac83 = -0.05429\ldots. \]
See [130] for a different proof which relies on the calculus of residues. It is not difficult to consider other vertical lines rather than the critical line. The general result is

Theorem 6.4. Let $s$ be given with $\operatorname{Re} s > -\frac12$. Then
\[ \lim_{N\to\infty} \frac1N \sum_{0 \leq n < N} \zeta(s + iT^n x) = \frac1\pi \int_{\mathbb{R}} \zeta(s + i\tau) \frac{d\tau}{1+\tau^2} \]
for almost all $x \in \mathbb{R}$. Define
\[ \ell(s) = \frac1\pi \int_{\mathbb{R}} \zeta(s + i\tau) \frac{d\tau}{1+\tau^2}. \]
If $\operatorname{Re} s < 1$, then
\[ \ell(s) = \zeta(s+1) - \frac{2}{s(2-s)}, \]
where the case of $s = 0$ is included as $\ell(0) = \lim_{s\to 0} \ell(s) = \gamma - \frac12$, where $\gamma = 0.577\ldots$ is the Euler-Mascheroni constant defined as $\gamma = \lim_{M\to\infty} \big( \sum_{m=1}^{M} \frac1m - \log M \big)$. If $s = 1 + it$ with some real number $t$, then
\[ \ell(s) = \zeta(s+1) - \frac{1}{s(2-s)} = \zeta(2+it) - \frac{1}{1+t^2}. \]
Finally, $\ell(s) = \zeta(s+1)$ for $\operatorname{Re} s > 1$.

Moreover, this allows to give an equivalent formulation of the Riemann hypothesis in terms of our ergodic transformation. It is widely expected that if the Riemann hypothesis is true, this should be related to the Euler product (6.1), although this representation is valid only for $\operatorname{Re} s > 1$. This belief is grounded on counterexamples to the Riemann hypothesis which have a Dirichlet series expansion and satisfy a Riemann-type functional equation (see [137], §10.25). In many reformulations of the Riemann hypothesis one can find a multiplicative feature inside. For our purpose we replace the zeta-function by its logarithm which is, thanks to the Euler product, also representable as Dirichlet series in its half-plane of convergence. We denote the nontrivial zeros of $\zeta(s)$ by $\rho$. Balazard, Saias & Yor [12] proved
\[ \frac{1}{2\pi} \int_{\operatorname{Re} s = \frac12} \frac{\log|\zeta(s)|}{|s|^2} \, |ds| = \sum_{\operatorname{Re} \rho > \frac12} \log\left| \frac{\rho}{1-\rho} \right|, \tag{6.22} \]
and deduced (the obvious consequence) that the Riemann hypothesis is true if, and only if, the integral vanishes. Substituting $t = \frac{\tau}{2}$, the integral in (6.22) can be rewritten as
\[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \log|\zeta(\tfrac12 + it)| \frac{dt}{|\frac12 + it|^2} = \frac1\pi \int_{\mathbb{R}} \log|\zeta(\tfrac12 + \tfrac12 i\tau)| \frac{d\tau}{1+\tau^2}, \]
which Balazard, Saias & Yor also interpret as the expectation value of $\log|\zeta(s)|$ of a Brownian motion on the critical line with Cauchy distribution. We may interpret this integral as limit of a Cesàro mean under application of Birkhoff's ergodic theorem; the applicability of the ergodic theorem is obvious by (6.22). This leads to

Theorem 6.5.
For almost all $x \in \mathbb{R}$,
\[ \lim_{N\to\infty} \frac1N \sum_{0 \leq n < N} \log|\zeta(\tfrac12 + \tfrac12 i T^n x)| = \sum_{\operatorname{Re} \rho > \frac12} \log\left| \frac{\rho}{1-\rho} \right|; \]
in particular, the Riemann hypothesis is true if, and only if, either side vanishes, the left-hand side for almost all real $x$.

We have checked the statement of Theorem 6.5 for various values of $x$ numerically. For instance, with the initial value $x = 42$ we found
\[ 10^{-6} \sum_{0 \leq n < 10^{k}} \log|\zeta(\tfrac12 + iT^n 42)| = -0.00004\,45327\ldots. \]

Figure 4. The values of $\zeta(\frac12 + it)$ as $-155 \leq t \leq 155$ in red and the values of $\zeta(\frac12 + iT^n x)$ with $x = 42$ for $0 \leq n < 100$ in black; the range for $t$ is according to the values $T^n 42$.

There is an important application of Birkhoff's ergodic theorem in the value-distribution theory of zeta- and L-functions. In 1975 Voronin [146] discovered a remarkable approximation property of the zeta-function. His famous universality theorem states: Let $0 < r < \frac14$ and $g(s)$ be a nonvanishing continuous function defined on the disk $|s| \leq r$, which is analytic in the interior of the disk. Then, for any $\epsilon > 0$, there exists a real number $\tau > 0$ such that
\[ \max_{|s| \leq r} \left| \zeta\big( s + \tfrac34 + i\tau \big) - g(s) \right| < \epsilon; \]
moreover, the set of all $\tau \in [0, T]$ with this property has positive lower density with respect to the Lebesgue measure. Meanwhile many examples of universal zeta-functions are known; for example, Dirichlet L-functions which are given by
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_p \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}, \]
where $\chi$ is a Dirichlet character (that is a group homomorphism on the group of residue classes $\mathbb{Z}/q\mathbb{Z}$), or zeta-functions associated with number fields (see [129] for an overview). Besides Voronin's original proof there is a probabilistic approach to universality due to Bagchi, Reich, Laurinčikas and further developed by many others. In this method the ergodic theorem replaces the use of Weyl's uniform distribution theorem in Voronin's approach. It is conjectured that universality is an ergodic phenomenon (see [101]).
More on this fascinating topic can be found in [95, 101, 129]; also have a look at [13] for a slightly different presentation.

Interestingly, Birkhoff proved a universality theorem long before Voronin. In [19] he showed the existence of an entire function $f(z)$ with the property that, given any entire function $g(z)$, there exists a sequence of complex numbers $a_n$ such that
\[ f(z + a_n) \xrightarrow[n\to\infty]{} g(z) \]
uniformly on compact subsets of $\mathbb{C}$. Although this result has a striking similarity with Voronin's theorem, Birkhoff's universal function $f$ is not explicitly known and the Riemann zeta-function and its relatives are so far the only explicitly known universal functions.

Exercises

The first appearance of the zeta defining series seems to be in the work of the 14th century scientist Oresme. The evaluation of the sum over the reciprocals of the squares was one of the great open problems of the beginning of the 18th century. Where the Bernoullis failed Euler succeeded.

Exercise 6.1. Give a rigorous proof of Theorem 6.1. A good hint could be the following formula:
\[ \sum_{k=1}^{\infty} (-1)^k \frac{(2\pi)^{2k}}{(2k)!} B_{2k} z^{2k} = \pi z \cot(\pi z) - 1 = z \frac{d}{dz} \log \frac{\sin(\pi z)}{\pi z}. \]
Moreover, evaluate $\zeta(2)$ along the following way: verify
\[ \frac34 \sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{m=0}^{\infty} \frac{1}{(2m+1)^2} = \sum_{m=0}^{\infty} \int_0^1\!\!\int_0^1 x^{2m} y^{2m} \, dx \, dy = \int_0^1\!\!\int_0^1 \sum_{m=0}^{\infty} (xy)^{2m} \, dx \, dy = \int_0^1\!\!\int_0^1 \frac{dx \, dy}{1 - x^2 y^2}. \]
Use the transformation
\[ x = \frac{\sin u}{\cos v} \quad\text{and}\quad y = \frac{\sin v}{\cos u} \]
in order to compute the appearing double integral above and deduce
\[ \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]
This elementary method is due to Calabi.

It is remarkable that already Euler had partial results toward the functional equation for $\zeta(s)$, namely, formulae for the values of $\zeta(s)$ for integral $s$ and for half-integral $s$ relating $s$ with $1 - s$, although he considered $\zeta(s)$ as a function of a real variable $s$, so that the pole at $s = 1$ is a severe barrier for continuation of $\zeta(s)$ on the real axis. Here is a sketch of his reasoning: for $m \in \mathbb{N}_0$,
\[ 1^m - 2^m + 3^m \mp \ldots = (1 - 2^{m+1}) \zeta(-m), \tag{6.23} \]
and
\[ 1^m x - 2^m x^2 + 3^m x^3 \mp \ldots = \left( x \frac{d}{dx} \right)^m \frac{x}{1+x}. \]
Using the latter formula with $x = \exp(2\pi i w)$, we get
\[ (1 - 2^{m+1}) \zeta(-m) = (2\pi i)^{-m} \left( \frac{d}{dw} \right)^m \frac{\exp(2\pi i w)}{1 + \exp(2\pi i w)} \bigg|_{w=0}. \]
This leads to a formula relating values of the zeta-function at $s = 2k$ and $1 - s = 1 - 2k$. Euler's proof needs a modified notion of convergence – this is obvious with respect to (6.23); using summability arguments one can make also this approach waterproof.

Exercise 6.2. Read in [59, 137] and provide a rigorous proof for the following statement due to Euler: for $n \in \mathbb{N}$,
\[ \zeta(1 - n) = -\frac{B_n}{n}. \]

Exercise 6.3. Try to give complete proofs of Theorems 6.4 and 6.5. Apply these ideas to other zeta- or L-functions and inform the author of your results...

* * *

In the next two chapters we will learn about continued fractions and their remarkable diophantine approximation properties as well as their ergodic behaviour.

CHAPTER 7
Crash Course in Continued Fractions

Continued fractions are a powerful tool in Diophantine approximation theory. They have been used for a long time and in various cultures; however, a systematic theory for continued fractions was only developed in the 17th century by the astronomer and mathematician Christiaan Huygens while constructing a mechanical planetarium.

7.1. The Euclidean Algorithm Revisited

Recall the Euclidean algorithm: given two positive integers $a$ and $b$, define $r_{-1} := a$, $r_0 := b$ and apply successively division with remainder as follows:
\[ r_{n-1} = a_n r_n + r_{n+1} \quad\text{with}\quad 0 \leq r_{n+1} < r_n \]
for $n = 0, 1, 2, \ldots$. The sequence of remainders $r_n$ is a strictly decreasing sequence of non-negative integers, hence the algorithm terminates and it turns out, by elementary divisibility properties, that the least non-vanishing remainder $r_m$ is equal to the greatest common divisor of $a$ and $b$, which we denote by $r_m = \gcd(a, b)$. We may rewrite the Euclidean algorithm as
\[ \frac{r_{n-1}}{r_n} = \left\lfloor \frac{r_{n-1}}{r_n} \right\rfloor + \frac{r_{n+1}}{r_n} \tag{7.1} \]
for $n \leq m$.
Here we have $a_n = \left\lfloor \frac{r_{n-1}}{r_n} \right\rfloor$ with $0 \leq r_{n+1} < r_n$, which implies
\[ \frac{a}{b} = \frac{r_{-1}}{r_0} = a_0 + \left( \frac{r_0}{r_1} \right)^{-1} = a_0 + \cfrac{1}{a_1 + \left( \frac{r_1}{r_2} \right)^{-1}} = \ldots. \]
The first equality yields the integral part $a_0$ of $\frac{a}{b}$; disregarding the remainder terms $r_1, \ldots$, the further equalities provide better and better rational approximations. An example: the tropical year, that is by definition the time from one spring equinox to the next, consists of
\[ 365 \text{ days } 5 \text{ hours } 48 \text{ minutes and } 45.8 \text{ seconds} \approx 365 + \frac{419}{1730} \text{ days}. \]
Unfortunately, this is not an integer, so how to define a good calendar? Using the Euclidean algorithm we find
\[ \begin{aligned} 1730 &= 4 \cdot 419 + 54, \\ 419 &= 7 \cdot 54 + 41, \\ 54 &= 1 \cdot 41 + 13, \\ &\ \ \vdots \end{aligned} \]
In view of (7.1) this gives
\[ \frac{1730}{419} = 4 + \frac{54}{419}, \quad\text{resp.}\quad 365 + \frac{419}{1730} = 365 + \left( \frac{1730}{419} \right)^{-1} \approx 365 + \frac14. \]
This is nothing but the Julian calendar:∗ every four years there is a leap year with an additional day. With the complete Euclidean algorithm we obtain
\[ 365 + \frac{419}{1730} = 365 + \cfrac{1}{4 + \cfrac{1}{7 + \cfrac{1}{1 + \cfrac{1}{3 + \cfrac{1}{6 + \cfrac{1}{2}}}}}}. \]
Disregarding the last fraction $\frac12$, we get the rational approximation
\[ 365 + \frac{194}{801} \approx 365 + \frac{419}{1730}, \]
which corresponds to the Gregorian calendar:† in 800 years 6 ($= 200 - 194$) leap years are left out. In Japan a lunisolar calendar adapted from the Chinese calendar was in use before 1873 when the Gregorian calendar was introduced. In lunisolar calendars the tropical year is approximated by lunar months, a lunar month being the time from one new Moon to the next; for an impression on lunisolar calendars in general and the Chinese one in particular we refer to [7].

∗ named after Julius Caesar who introduced this calendar in 45 B.C. with the scientific support of the Greek astronomer Sosigenes of Alexandria
† named after Pope Gregory XIII, who introduced this calendar in 1582 when the solar year was already ten days ahead of the Julian calendar; the scientists behind this reform were Aloysius Lilius and Pietro Pitati.

The expression
\[ a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \ddots + \cfrac{1}{a_{m-1} + \cfrac{1}{a_m}}}} \]
is called a (regular) continued fraction and the appearing numbers $a_n$ are said to be its partial quotients. In order to save space and ink we shall use the abbreviation $[a_0, a_1, a_2, \ldots, a_m]$. To begin with, we consider $[a_0, \ldots, a_m]$ as a function of independent variables $a_0, \ldots, a_m$. Obviously,
\[ [a_0] = a_0, \qquad [a_0, a_1] = \frac{a_1 a_0 + 1}{a_1} \qquad\text{and}\qquad [a_0, a_1, a_2] = \frac{a_2 a_1 a_0 + a_2 + a_0}{a_2 a_1 + 1}. \]
By induction, one shows
\[ [a_0, a_1, \ldots, a_n] = \left[ a_0, a_1, \ldots, a_{n-1} + \frac{1}{a_n} \right] \tag{7.2} \]
and
\[ [a_0, a_1, \ldots, a_n] = a_0 + \frac{1}{[a_1, \ldots, a_n]} = [a_0, [a_1, \ldots, a_n]]. \]
For a positive integer $n \leq m$ the expression $[a_0, a_1, \ldots, a_n]$ is called the $n$-th convergent to $[a_0, a_1, \ldots, a_m]$. Moreover, we define recurrent sequences by
\[ \begin{aligned} p_{-1} &= 1, & p_0 &= a_0, & \text{and}\quad p_n &= a_n p_{n-1} + p_{n-2}, \\ q_{-1} &= 0, & q_0 &= 1, & \text{and}\quad q_n &= a_n q_{n-1} + q_{n-2}. \end{aligned} \tag{7.3} \]
The computation of continued fractions is not too difficult thanks to the following theorem:

Theorem 7.1. For $0 \leq n \leq m$,
\[ \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]. \]

Proof by induction. The case $n = 0$ is trivial; the case $n = 1$ follows immediately from
\[ [a_0, a_1] = \frac{a_1 a_0 + 1}{a_1} = \frac{p_1}{q_1}. \]
Now let us suppose that the formula of the theorem is true for $n$. In view of (7.2) we find
\[ [a_0, a_1, \ldots, a_n, a_{n+1}] = \left[ a_0, a_1, \ldots, a_n + \frac{1}{a_{n+1}} \right]. \]
Using the recursion formulae for $p_n$ and $q_n$, the latter expression equals
\[ \frac{\big( a_n + \frac{1}{a_{n+1}} \big) p_{n-1} + p_{n-2}}{\big( a_n + \frac{1}{a_{n+1}} \big) q_{n-1} + q_{n-2}} = \frac{(a_{n+1} a_n + 1) p_{n-1} + a_{n+1} p_{n-2}}{(a_{n+1} a_n + 1) q_{n-1} + a_{n+1} q_{n-2}} = \frac{a_{n+1} p_n + p_{n-1}}{a_{n+1} q_n + q_{n-1}} = \frac{p_{n+1}}{q_{n+1}}, \]
which concludes the induction. •

The sequences of numerators and denominators, respectively, have interesting arithmetical properties:

Theorem 7.2. For $1 \leq n \leq m$,
\[ p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}, \quad\text{and}\quad p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n. \]

Proof. It follows from (7.3) that
\[ p_n q_{n-1} - p_{n-1} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1} (a_n q_{n-1} + q_{n-2}) = -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}). \]
Repeating this for $n-1$, $n-2$, ...
, 2, 1 we derive the first assertion. In a similar manner
\[ p_n q_{n-2} - p_{n-2} q_n = (a_n p_{n-1} + p_{n-2}) q_{n-2} - p_{n-2} (a_n q_{n-1} + q_{n-2}) = a_n (p_{n-1} q_{n-2} - p_{n-2} q_{n-1}), \]
which implies the second statement. •

Next we assign numerical values to the partial quotients $a_n$ and, consequently, to the continued fraction $[a_0, a_1, \ldots]$ itself. In the sequel we assume $a_0 \in \mathbb{Z}$ and $a_n \in \mathbb{N}$ for $1 \leq n < m$ as well as $a_m \geq 1$. In view of Theorem 7.1 it follows that $p_n$ and $q_n$ are integers for $n < m$; moreover, the first assertion of Theorem 7.2 implies that they are coprime.

Now let $\alpha$ be rational. Then there exist coprime integers $a$ and $b > 0$ such that $\alpha = \frac{a}{b}$. Using our variation of the Euclidean algorithm (7.1) applied to $r_{-1} = a$ and $r_0 = b$, it follows that $\alpha$ can be represented as a finite continued fraction:
\[ \frac{a}{b} = [a_0, a_1, a_2, \ldots, a_m] \quad\text{with}\quad a_n = \left\lfloor \frac{r_{n-1}}{r_n} \right\rfloor. \]
This representation is not unique, since
\[ [a_0, a_1, a_2, \ldots, a_m] = [a_0, a_1, a_2, \ldots, a_m - 1, 1]. \]
There is an obvious way out of this non-uniqueness. We conclude: every rational number has a unique representation as a finite continued fraction if we assume the last partial quotient to be strictly larger than one.

7.2. Infinite Continued Fractions

We may rewrite algorithm (7.1) for the computation of the continued fraction expansion of a given rational $\alpha$ as
\[ \alpha_0 := \alpha, \qquad \alpha_n = \lfloor \alpha_n \rfloor + \frac{1}{\alpha_{n+1}} \quad\text{for } n = 0, 1, \ldots. \tag{7.4} \]
Setting $a_n = \lfloor \alpha_n \rfloor$ we obtain
\[ \alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]. \]
This algorithm is called the continued fraction algorithm. If $\alpha$ is rational, the iteration terminates after finitely many steps and it is nothing but the Euclidean algorithm in disguise. What happens if we start with an irrational number? For instance, for $\alpha = \pi = 3.14159\ldots$ we compute
\[ \begin{aligned} a_0 &= \lfloor \pi \rfloor = 3 & \text{and}\quad \alpha_1 &= \frac{1}{\pi - 3} = 7.06251\ldots, \\ a_1 &= \lfloor 7.06251\ldots \rfloor = 7 & \text{and}\quad \alpha_2 &= \frac{1}{7.06251\ldots - 7} = 15.99659\ldots, \\ a_2 &= \lfloor 15.99659\ldots \rfloor = 15 & \text{and}\quad \alpha_3 &= \frac{1}{15.99659\ldots - 15}, \end{aligned} \]
which leads to $\pi = [3, 7, 15, \alpha_3]$.
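The continued fraction algorithm (7.4) and the recursion (7.3) translate directly into a few lines of code. The following is a minimal sketch, not part of the notes; the function names `continued_fraction` and `convergents` are illustrative. For rational input it uses exact arithmetic via Python's `fractions.Fraction`; for $\pi$ it falls back on floating-point numbers, so only the first few partial quotients can be trusted.

```python
from fractions import Fraction
import math


def continued_fraction(alpha, max_terms=20):
    """Partial quotients via (7.4): a_n = floor(alpha_n), alpha_{n+1} = 1/(alpha_n - a_n)."""
    quotients = []
    for _ in range(max_terms):
        a = math.floor(alpha)
        quotients.append(a)
        fractional_part = alpha - a
        if fractional_part == 0:  # rational input: the algorithm terminates
            break
        alpha = 1 / fractional_part
    return quotients


def convergents(quotients):
    """Return the pairs (p_n, q_n) computed with the recursion (7.3)."""
    p_prev, p = 1, quotients[0]  # p_{-1}, p_0
    q_prev, q = 0, 1             # q_{-1}, q_0
    result = [(p, q)]
    for a in quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append((p, q))
    return result


if __name__ == "__main__":
    print(continued_fraction(Fraction(1730, 419)))  # -> [4, 7, 1, 3, 6, 2]
    print(continued_fraction(math.pi, max_terms=5))  # -> [3, 7, 15, 1, 292]
    print(convergents([3, 7, 15, 1]))  # -> [(3, 1), (22, 7), (333, 106), (355, 113)]
```

The first call reproduces the expansion of the tropical-year fraction from Section 7.1, and the convergents of $[3, 7, 15, 1]$ satisfy $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$ as stated in Theorem 7.2.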
The Japanese samurai and mathematician Matsunaga Yoshisuke computed $\pi$ correct to 52 digits, the most precise numerical value for $\pi$ in wasan. He also gave the beginning of the continued fraction expansion by a method (similar to the above) called reiyakujyutsu, in translation dividing by zero (cf. [63]).

Now let $\alpha$ be an arbitrary irrational number. Then the iteration does not terminate (since otherwise we would obtain a representation of $\alpha$ as a finite continued fraction, contradicting the irrationality of $\alpha$). It follows that the continued fraction algorithm applied to an irrational $\alpha$ produces an infinite sequence of finite continued fractions:
\[ [a_0, a_1, \ldots] := \lim_{m\to\infty} [a_0, a_1, \ldots, a_m]. \]
The limit of this sequence is denoted by $[a_0, a_1, a_2, \ldots]$ and is called an infinite continued fraction. The first task is to examine whether this infinite process is convergent and, if so, whether the limit is related to our starting value $\alpha$.

Theorem 7.3. Let $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$ be irrational with convergents $[a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}$. Then
\[ \alpha - \frac{p_n}{q_n} = \frac{(-1)^n}{q_n (\alpha_{n+1} q_n + q_{n-1})}. \]
In particular,
\[ \alpha = \lim_{n\to\infty} \frac{p_n}{q_n} = [a_0, a_1, a_2, \ldots]. \]

Proof. Firstly, we note that all previous observations on finite continued fractions carry over to infinite ones, in particular (7.3) and Theorem 7.1. A short computation shows
\[ \alpha - \frac{p_n}{q_n} = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}} - \frac{p_n}{q_n} = \frac{p_{n-1} q_n - p_n q_{n-1}}{q_n (\alpha_{n+1} q_n + q_{n-1})}. \]
Hence, Theorem 7.2 implies the first assertion. Since $a_{n+1} \leq \alpha_{n+1}$ we further have
\[ \left| \alpha - \frac{p_n}{q_n} \right| \leq \frac{1}{q_n (a_{n+1} q_n + q_{n-1})}. \]
In case of irrational $\alpha$ the sequences of $p_n$ and of $q_n$ both are strictly increasing for $n \geq 2$. Thus, the sequence of convergents $\frac{p_n}{q_n}$ is alternately larger, resp. smaller than $\alpha$; those with even index $n$ lie to the left and those with odd index $n$ to the right of $\alpha$:
\[ \frac{p_0}{q_0} < \frac{p_2}{q_2} < \ldots < \alpha < \ldots < \frac{p_3}{q_3} < \frac{p_1}{q_1}. \]
If $\alpha$ is irrational, the continued fraction algorithm does not terminate and the denominators $q_n$ of its convergents form a strictly monotonic increasing sequence of integers. It thus follows from the proven part of the theorem that the distance between consecutive convergents tends to zero. Hence, the numbers $\frac{p_n}{q_n}$ converge to the limit $[a_0, a_1, \ldots]$ and this limit equals $\alpha$. •

It is easy to see that the continued fraction expansion of an irrational number is unique. This already allows to construct the set of real numbers $\mathbb{R}$ out of the rationals $\mathbb{Q}$. Moreover, continued fractions induce an order on the real axis. Given two real numbers $\alpha = [a_0, \ldots, a_n, \alpha_{n+1}]$ and $\alpha' = [a_0, \ldots, a_n, \alpha'_{n+1}]$ with identical partial quotients $a_j$ for $j \leq n$, it follows that any $\alpha''$ lying in between $\alpha$ and $\alpha'$ has a continued fraction expansion starting with the same partial quotients, more precisely, $\alpha'' = [a_0, \ldots, a_n, \alpha''_{n+1}]$.

Theorem 7.3 already shows the importance of continued fractions in the theory of Diophantine approximation. Here we note

Corollary 7.4. Let $\alpha = [a_0, a_1, \ldots]$ be irrational with convergents $\frac{p_n}{q_n}$. Then
\[ \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}. \tag{7.5} \]

This statement improves Dirichlet's approximation theorem 1.1: the sequence of convergents approximates $\alpha$ better and better (since the denominators are strictly increasing and each partial quotient is greater than or equal to one). Thus, we do not only know about very good rational approximations to a given $\alpha$, but can compute them explicitly from the continued fraction expansion of $\alpha$. Actually, the approximation theorem of Hurwitz gives another improvement: for any $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ there exist infinitely many rationals $\frac{p}{q}$ such that
\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{\sqrt5 \, q^2}, \tag{7.6} \]
and the constant $\sqrt5$ cannot be replaced by any larger constant. For the proof one considers the slowest converging continued fraction
\[ \frac{\sqrt5 + 1}{2} = [1, 1, 1, 1, 1, \ldots] = \lim_{n\to\infty} \frac{F_{n+1}}{F_n}, \]
where the $F_n$ are the Fibonacci numbers, defined by the recursion
\[ F_0 := 0, \quad F_1 := 1 \quad\text{and}\quad F_{n+1} = F_n + F_{n-1} \quad\text{for } n \in \mathbb{N}. \]

Another example of an infinite continued fraction is the one for $\pi$:‡ We compute
\[ \pi = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, \ldots]. \]
Cutting the continued fraction in front of the partial quotient 292, we obtain
\[ \frac{p_3}{q_3} = [3, 7, 15, 1] = \frac{355}{113}. \]
This leads to an excellent approximation:
\[ 0 < \frac{355}{113} - \pi < \frac{1}{292 \cdot 113^2} = 0.00000\,02682\ldots, \]
which was already known by the Chinese mathematician Tsu Chung Chi in 500 A.D. Moreover, the next convergent has an extremely large denominator:
\[ q_4 = a_4 q_3 + q_2 = 292 \cdot 113 + 106 = 33\,102. \]
The sequence of the convergents, which is identical with the sequence of best rational approximations to $\pi$, starts as follows:
\[ \frac31 < \frac{333}{106} < \frac{103993}{33102} < \ldots < \pi < \ldots < \frac{355}{113} < \frac{22}{7}. \]
This is no miracle, as Lagrange proved in 1770:

‡ So far no pattern has been found in the regular continued fraction expansion of $\pi$, in contrast to $e = \exp(1) = [2, 1, 2, 1, 1, 4, 1, \ldots, 1, 2n, 1, \ldots]$; here the meaning of this notation is obvious.

Theorem 7.5. Let $\alpha$ be real with convergents $\frac{p_n}{q_n}$. Then, for $n \geq 2$ and any positive integers $p, q$ satisfying $0 < q \leq q_n$ and $\frac{p}{q} \neq \frac{p_n}{q_n}$,
\[ |q_n \alpha - p_n| < |q \alpha - p|. \]

This is the so-called law of best approximation; it shows that one cannot approximate better than with the convergents of the continued fraction expansion!

Proof. We may assume that $p$ and $q$ are coprime. Since
\[ |q_n \alpha - p_n| < |q_{n-1} \alpha - p_{n-1}| \]
it suffices to prove the assertion under the assumption that $q_{n-1} < q \leq q_n$; the full assertion follows by induction. First suppose $q = q_n$, then $p \neq p_n$ and
\[ \left| \frac{p}{q} - \frac{p_n}{q_n} \right| \geq \frac{1}{q_n}. \]
By Theorem 7.3,
\[ \left| \alpha - \frac{p_n}{q_n} \right| \leq \frac{1}{q_n q_{n+1}} < \frac{1}{2 q_n}, \]
where we have used $q_{n+1} \geq 3$ (since $n \geq 2$). By the triangle inequality,
\[ \left| \alpha - \frac{p}{q} \right| \geq \left| \frac{p}{q} - \frac{p_n}{q_n} \right| - \left| \alpha - \frac{p_n}{q_n} \right| > \frac{1}{2 q_n} > \left| \alpha - \frac{p_n}{q_n} \right|, \]
which yields the inequality of the theorem after multiplication with $q = q_n$.
Now suppose that $q_{n-1} < q < q_n$. By Theorem 7.2 the linear equation system
\[ p_n X + p_{n-1} Y = p \quad\text{and}\quad q_n X + q_{n-1} Y = q \]
has the unique solution
\[ x = \frac{p q_{n-1} - q p_{n-1}}{p_n q_{n-1} - p_{n-1} q_n} = \pm (p q_{n-1} - q p_{n-1}) \]
and
\[ y = \frac{q p_n - p q_n}{p_n q_{n-1} - p_{n-1} q_n} = \pm (p q_n - q p_n). \]
Thus, $x$ and $y$ are distinct integers different from zero. We observe that $x$ and $y$ have different signs and the same holds for $q_n \alpha - p_n$ and $q_{n-1} \alpha - p_{n-1}$ as well. Hence, the numbers $x (q_n \alpha - p_n)$ and $y (q_{n-1} \alpha - p_{n-1})$ have the same sign. Since
\[ q \alpha - p = x (q_n \alpha - p_n) + y (q_{n-1} \alpha - p_{n-1}), \]
it follows that
\[ |q \alpha - p| > |q_{n-1} \alpha - p_{n-1}| > |q_n \alpha - p_n|, \]
and this concludes the proof. •

Exercises

How fast do the convergents to an infinite continued fraction grow? Of course, this depends on the limit of the continued fraction. But one may try to give universal lower bounds for the denominator and numerator of the convergents.

Exercise 7.1. For the convergents $\frac{p_n}{q_n}$ to a given irrational $\alpha = [a_0, a_1, \ldots]$ prove that
\[ p_n \geq 2^{\frac{n}{2} - 1} \quad\text{and}\quad q_n \geq 2^{\frac{n-1}{2}} \]
for any $n \in \mathbb{N}$. Moreover, show that
\[ \frac{p_n}{q_n} = a_0 + \sum_{j=1}^{n} \frac{(-1)^{j-1}}{q_j q_{j-1}}. \]

The next topic provides a bijection between $\mathbb{N}$ and the positive rationals different from the usual one, and much more convenient, from Calkin & Wilf [26]. The first four rows of the Calkin–Wilf tree are
\[ \begin{array}{c} \frac11 \\[4pt] \frac12 \qquad\qquad \frac21 \\[4pt] \frac13 \qquad \frac32 \qquad \frac23 \qquad \frac31 \\[4pt] \frac14 \quad \frac43 \quad \frac35 \quad \frac52 \quad \frac25 \quad \frac53 \quad \frac34 \quad \frac41 \end{array} \]
Starting with the initial value $\frac11$, construct recursively a tree by
\[ \frac{a}{b} \mapsto \frac{a}{a+b}, \ \frac{a+b}{b}. \]
The Calkin–Wilf sequence is then given by reading this tree line by line from the top,
\[ \frac11, \frac12, \frac21, \frac13, \frac32, \frac23, \frac31, \frac14, \frac43, \frac35, \ldots. \]

Exercise 7.2. Show that the successors of any reduced fraction in the Calkin–Wilf sequence are reduced too. Further, prove that the Calkin–Wilf sequence takes any positive rational value exactly once. Moreover, compute the continued fraction expansions for the rational numbers appearing in the first four rows of the Calkin–Wilf tree.
Is there any pattern? Where will the number $\frac{355}{113}$ appear? Try to find a rule for how the rationals in the Calkin–Wilf sequence can be enumerated in terms of their continued fraction expansions. Finally, prove that the Calkin–Wilf sequence satisfies the following recursion formula: for $n \in \mathbb{N}$,
\[ x_1 = \frac11, \qquad x_{n+1} = \frac{1}{\lfloor x_n \rfloor + 1 - \{x_n\}}. \]

Exercise 7.3. Prove Hurwitz's approximation theorem; the constant $\sqrt5$ is closely related to $[1, 1, \ldots]$. Hint: use the law of best approximation, Theorem 7.5.

Whenever an algorithm is used it is important to know whether it terminates, and in case it does, how fast.

Exercise 7.4. For the number of steps $m$ in the Euclidean algorithm for the integers $b \leq a$ show
\[ m \leq \left( \log \frac{\sqrt5 + 1}{2} \right)^{-1} (1 + \log a). \]
Hint: show that the Euclidean algorithm is extraordinarily slow for consecutive Fibonacci numbers $F_n$. Use Binet's formula (1.8) to derive the estimate. Moreover, show that any positive integer $a$ has a binary representation
\[ a = \sum_{k=0}^{\ell} a_k 2^k, \quad\text{where}\quad a_k \in \{0, 1\}, \ a_\ell = 1. \]
Give an upper bound for the quantity $\ell$ and deduce that $a$ can be expressed by approximately $\frac{\log a}{\log 2}$ bits. What does this imply for the running time of the Euclidean algorithm?

The Euclidean algorithm terminates in polynomial time in the input data. The estimate for the running time of the Euclidean algorithm is due to Lamé in 1845, long before the computer age. It is remarkable that the average case does not fall much behind the bound for the worst case. Heilbronn [62] showed that the average length of the Euclidean algorithm is $\frac{12 \log 2}{\pi^2} \log a$. See [139] for improvements.

For the final task a look into the literature is probably needed:

Exercise 7.5. Prove Lagrange's theorem: the continued fraction expansion of an irrational real number $\alpha$ is eventually periodic if, and only if, $\alpha$ is quadratic irrational, i.e., there exists an irreducible quadratic polynomial $P \in \mathbb{Z}[X]$ with $P(\alpha) = 0$.
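The running-time bound of Exercise 7.4 can be explored experimentally before proving it. The following is a small sketch, not part of the notes; the helper names `euclid_divisions`, `lame_bound` and `fibonacci` are illustrative. It counts the divisions with remainder, which in the indexing of Section 7.1 is $m + 1$, and compares this count with $(1 + \log a)/\log\frac{\sqrt5+1}{2}$. Such a check is of course no substitute for the proof asked for in the exercise.

```python
import math

GOLDEN_RATIO = (math.sqrt(5) + 1) / 2


def euclid_divisions(a, b):
    """Count the divisions with remainder performed for integers a >= b >= 1."""
    count = 0
    while b != 0:
        a, b = b, a % b
        count += 1
    return count


def lame_bound(a):
    """The bound (1 + log a) / log((sqrt(5) + 1) / 2) from Exercise 7.4."""
    return (1 + math.log(a)) / math.log(GOLDEN_RATIO)


def fibonacci(n):
    """Return F_n with F_0 = 0, F_1 = 1."""
    f_prev, f = 0, 1
    for _ in range(n):
        f_prev, f = f, f_prev + f
    return f_prev


if __name__ == "__main__":
    # Consecutive Fibonacci numbers are the slowest inputs of their size.
    for k in (6, 12, 24):
        a, b = fibonacci(k + 1), fibonacci(k)
        print(a, b, euclid_divisions(a, b), round(lame_bound(a), 2))
```

For the pair $(13, 8)$ the algorithm needs 5 divisions, and in general the pair $(F_{k+1}, F_k)$ with $k \geq 2$ needs $k - 1$ of them, which grows like a constant multiple of $\log a$.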
* * *

In the following chapter we shall study statistical patterns in continued fraction expansions. We motivate these investigations by an example from outer space: if the quotient of periods of evolution of two planets around the Sun is close to a rational number, a phenomenon called resonance in celestial mechanics, then these planets will perturb each other. For instance, Jupiter and Saturn pass approximately 299 and 120.5 angular seconds a day, which implies a resonance value of $\frac{120.5}{299} = 0.403\ldots \approx \frac25$ and generates an observable secular perturbation that increases for several hundred years before the planets return to their previous orbits (cf. [5]). When Poincaré was thinking about the stability of our solar system, he suggested to investigate how many resonance relations exist. The method of choice are continued fractions since large partial quotients go along with extraordinarily good rational approximations. Therefore, it is necessary to understand how many real numbers have continued fraction expansions with large partial quotients.

CHAPTER 8
Metric Theory of Continued Fractions

In a letter to Laplace from January 1812 (long before ergodic theory or even measure and probability theory) Gauss described a 'curious' problem he already had been working on for twelve years without a satisfying solution: given $0 \leq \xi \leq 1$, let $m_n(\xi)$ denote the probability that for $\alpha = [0, a_1, a_2, \ldots, a_n, \alpha_{n+1}] \in [0, 1)$ the inequality
\[ \frac{1}{\alpha_{n+1}} < \xi \]
holds. Obviously, $m_0(\xi) = \xi$ and $m_{n+1}$ depends on $m_n$. Very likely Gauss was aware of the identity
\[ m_{n+1}(\xi) = \sum_{k=1}^{\infty} \left( m_n\big( \tfrac1k \big) - m_n\big( \tfrac{1}{k+\xi} \big) \right). \]
Actually, Gauss wrote in his letter that he had found a simple proof for
\[ \lim_{n\to\infty} m_n(\xi) = \frac{\log(1+\xi)}{\log 2} \tag{8.1} \]
and that the limit satisfies the functional equation
\[ m(\xi) = \sum_{k=1}^{\infty} \left( m\big( \tfrac1k \big) - m\big( \tfrac{1}{k+\xi} \big) \right) \]
in addition to $m(0) = 0$ and $m(1) = 1$. However, he was not able to describe the difference $m_n(\xi) - \frac{\log(1+\xi)}{\log 2}$.
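That the function $m(\xi) = \frac{\log(1+\xi)}{\log 2}$ indeed satisfies the functional equation can be checked numerically by truncating the series. The following is a minimal sketch, not part of the notes; the names `gauss_limit` and `gauss_step` and the truncation at $10^5$ terms are illustrative choices. The same routine applied to $m_0(\xi) = \xi$ gives $m_1$, for which $m_1(\frac12) = \sum_{k \geq 1} \big( \frac1k - \frac{1}{k + 1/2} \big) = 2 - 2\log 2$.

```python
import math


def gauss_limit(xi):
    """The limit function m(xi) = log(1 + xi) / log 2 from (8.1)."""
    return math.log(1 + xi) / math.log(2)


def gauss_step(m, xi, terms=100_000):
    """Truncated series: sum over 1 <= k <= terms of m(1/k) - m(1/(k + xi))."""
    return sum(m(1 / k) - m(1 / (k + xi)) for k in range(1, terms + 1))


if __name__ == "__main__":
    xi = 0.5
    # Fixed point property of the limit function (up to the truncation error).
    print(gauss_step(gauss_limit, xi), gauss_limit(xi))
    # One step of the recursion starting from m_0(xi) = xi.
    print(gauss_step(lambda t: t, xi), 2 - 2 * math.log(2))
```

For the limit function the $k$-th summand equals $\log_2 \frac{(k+1)(k+\xi)}{k(k+1+\xi)}$, so the partial sums telescope to $\log_2 \frac{(K+1)(1+\xi)}{K+1+\xi}$, which explains why the truncation error is of order $1/K$.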
There was also some kind of language problem for Gauss: it was difficult to formulate his result without the notion of measure. He definitely knew of exceptions to his probabilistic law, but the additional phrase 'for almost all ξ' was added only one century later, when the deviation from the limit was successfully investigated by Kuzmin [92]. His solution gives not only a first published proof of (8.1) but also an explicit error term for the limit law. This error term estimate was improved by Lévy [96], who showed
\[ m_n(\xi) = \frac{\log(1+\xi)}{\log 2} + O(q^n) \]
for some $q \in (0, 0.76)$; a proof can be found in Rockett & Szüsz [119]. The sharpest known bound is due to Wirsing [154]. Using this theorem of Gauss–Kuzmin–Lévy in various ways, Lévy and Khintchine observed interesting statistical results for continued fractions, e.g., that almost all continued fractions $[0, a_1, a_2, \ldots]$ satisfy
\[ \lim_{N\to\infty} \left( \prod_{n=1}^{N} a_n \right)^{\frac{1}{N}} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2+2k} \right)^{\frac{\log k}{\log 2}}, \tag{8.2} \]
where the product on the right is convergent with a limit approximately 2.68. This almost sure convergence for the geometric mean and much more we shall prove by an ergodic argument (without using the Gauss–Kuzmin–Lévy theorem).

Whereas the approaches of Khintchine and Lévy were of a probabilistic nature, Wolfgang Doeblin [43] in 1940 and, independently, Ryll-Nardzewski in 1951 showed that an ergodic system rules the complicated arithmetic of continued fractions. The ergodicity of the Gauss map had been established earlier by Knopp [84] in 1926 and (independently) by Martin [100] in 1934; however, both used a different language and followed a different line of investigation.

8.1. Ergodicity of the Continued Fraction Mapping

The continued fraction map $T : [0, 1) \to [0, 1)$ is defined by
\[ Tx = \frac{1}{x} \bmod 1 \quad \text{for } 0 < x < 1 \]
and $T0 = 0$; note that for $0 < x < 1$ we could also have written $Tx = \frac{1}{x} - \lfloor \frac{1}{x} \rfloor = \{ \frac{1}{x} \}$.
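The definition of $T$ can be tried out with exact rational arithmetic. The sketch below is an illustration only; the names `gauss_map` and `partial_quotients` are introduced here and do not appear in the notes. It reads off the partial quotients of a rational number by iterating the map until it reaches 0.

```python
from fractions import Fraction


def gauss_map(x: Fraction) -> Fraction:
    """Continued fraction map: T(0) = 0 and T(x) = 1/x - floor(1/x) otherwise."""
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - (y.numerator // y.denominator)


def partial_quotients(x: Fraction, limit: int = 50) -> list[int]:
    """Read off a_1, a_2, ... of x in [0, 1) by iterating T until 0 is reached."""
    digits = []
    while x != 0 and len(digits) < limit:
        y = 1 / x
        digits.append(y.numerator // y.denominator)  # a_1 of the current point
        x = gauss_map(x)
    return digits


if __name__ == "__main__":
    # 355/113 = 3 + 16/113, so only the fractional part 16/113 is fed to T.
    print(partial_quotients(Fraction(16, 113)))
```

Exact fractions are used on purpose: in floating point arithmetic the iterates of $T$ lose all accuracy after a few dozen steps.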
Obviously, $T^n x = 0$ holds for some $n$ if, and only if, $x$ is rational.

Figure 1. The continued fraction map: on the left its graph, on the right the graph of its density.

In fact, from the previous chapter we know
\[ T[0, a_1, a_2, \ldots] = [a_1, a_2, a_3, \ldots] \bmod 1 = [0, a_2, a_3, \ldots]. \tag{8.3} \]
For our ergodic machinery we need to find a measure for which $T$ is measure preserving. In general this is no easy task (see Exercise 3.2). Here comes the solution: for a Lebesgue measurable set $A$ the Gauss measure $\mu$ is given by
\[ \mu(A) = \frac{1}{\log 2} \int_A \frac{dx}{1+x}. \]
This defines a probability measure on $[0, 1)$. Now we shall prove that the continued fraction map $T$ is measure preserving with respect to the Gauss measure $\mu$. It suffices to show that $\mu(T^{-1}(0, \xi)) = \mu((0, \xi))$, resp.
\[ \int_{T^{-1}(0,\xi)} \frac{dx}{1+x} = \int_{(0,\xi)} \frac{dx}{1+x} \]
for any $\xi \in [0, 1)$. We note
\[ T^{-1}(0, \xi) = \bigcup_{n=1}^{\infty} \left( \frac{1}{n+\xi}, \frac{1}{n} \right), \]
where the union on the right-hand side is disjoint since $0 \le \xi < 1$. It follows from
\[ \int_{1/(n+\xi)}^{1/n} \frac{dx}{1+x} = \log\left(1 + \frac{1}{n}\right) - \log\left(1 + \frac{1}{n+\xi}\right) \]
that
\[ \int_{T^{-1}(0,\xi)} \frac{dx}{1+x} = \sum_{n=1}^{\infty} \int_{1/(n+\xi)}^{1/n} \frac{dx}{1+x} = \sum_{n=1}^{\infty} \left( \log\left(1 + \frac{1}{n}\right) - \log\left(1 + \frac{1}{n+\xi}\right) \right); \tag{8.4} \]
it is not difficult to see that the appearing series is convergent. Since
\[ \frac{1 + \frac{1}{n}}{1 + \frac{1}{n+\xi}} = \frac{n+1}{n} \cdot \frac{n+\xi}{n+1+\xi} = \frac{1 + \frac{\xi}{n}}{1 + \frac{\xi}{n+1}}, \]
we may replace the series in (8.4) by
\[ \sum_{n=1}^{\infty} \left( \log\left(1 + \frac{\xi}{n}\right) - \log\left(1 + \frac{\xi}{n+1}\right) \right). \]
Now reading backwards we find
\[ \int_{T^{-1}(0,\xi)} \frac{dx}{1+x} = \sum_{n=1}^{\infty} \int_{\xi/(n+1)}^{\xi/n} \frac{dx}{1+x} = \int_0^{\xi} \frac{dx}{1+x}, \]
and, consequently, the map $T$ is measure preserving.

Next we want to show that $T$ is ergodic with respect to $\mu$, which will turn out to be more involved. For positive integers $a_j$, define
\[ \Delta_n := \Delta_n(a_1, \ldots, a_n) := \{ x = [0, a_1(x), a_2(x), \ldots] \in [0, 1) : a_1(x) = a_1, \ldots, a_n(x) = a_n \}. \]
These sets consist of those $x$ from the unit interval which have partial quotients $a_j(x)$ equal to the prescribed values $a_j$ for $j = 1, \ldots, n$; for example,
\[ \Delta_1(1) = \left( \tfrac{1}{2}, 1 \right), \qquad \Delta_1(n) = \left( \tfrac{1}{n+1}, \tfrac{1}{n} \right] \quad \text{for } n \ge 2. \]
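The invariance computation earlier in this section can be checked numerically by truncating the union describing $T^{-1}(0, \xi)$. The sketch below (with the hypothetical helper names `gauss_measure` and `preimage_measures`, and floating point arithmetic as an assumption) measures the first pieces of the preimage once with the Lebesgue measure and once with the Gauss measure.

```python
import math


def gauss_measure(a: float, b: float) -> float:
    """Gauss measure of an interval with end points 0 <= a <= b <= 1."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)


def preimage_measures(xi: float, terms: int) -> tuple[float, float]:
    """Lebesgue and Gauss measure of the first `terms` pieces of T^{-1}(0, xi)."""
    lebesgue = 0.0
    gauss = 0.0
    for n in range(1, terms + 1):
        left, right = 1.0 / (n + xi), 1.0 / n
        lebesgue += right - left
        gauss += gauss_measure(left, right)
    return lebesgue, gauss


if __name__ == "__main__":
    xi = 0.5
    lebesgue, gauss = preimage_measures(xi, 200_000)
    print("Lebesgue:", lebesgue, "versus", xi)
    print("Gauss:   ", gauss, "versus", gauss_measure(0.0, xi))
```

Only the Gauss measure reproduces the measure of $(0, \xi)$ up to the truncation error; the Lebesgue values visibly disagree, which is the phenomenon behind Exercise 8.1.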
In fact, the sets $\Delta_n$ are semi-open intervals with end points
\[ \frac{p_n}{q_n} \quad \text{and} \quad \frac{p_n + p_{n-1}}{q_n + q_{n-1}}, \]
which follows immediately from the bijective mapping
\[ [0, 1] \ni t \mapsto \frac{p_n + t p_{n-1}}{q_n + t q_{n-1}} = [0, a_1, \ldots, a_n + t] \]
(together with our observations on continued fractions from the previous chapter). Here, as usual, $\frac{p_n}{q_n}$ stands for the $n$th convergent to $[a_0, a_1, \ldots]$. Now denote by $\mathcal{D}$ the set of all intervals $\Delta_n$ (built from all possible ingredients $a_1, \ldots, a_n \in \mathbb{N}$ with arbitrary $n \in \mathbb{N}$). Then the end points of all intervals $\Delta_n$ coincide with the set of rational numbers in the unit interval $[0, 1)$. Thus, $\mathcal{D}$ is a countable family of semi-open intervals which are related to continued fractions and generate the Borel σ-algebra. Using Theorem 7.2 we compute the Lebesgue measure of the sets $\Delta_n$ as
\[ \lambda(\Delta_n(a_1, \ldots, a_n)) = \frac{1}{q_n(q_n + q_{n-1})}. \tag{8.5} \]
For $0 \le a < b \le 1$, we either have
\[ \{x : a \le T^n x < b\} \cap \Delta_n = \left[ \frac{p_n + a p_{n-1}}{q_n + a q_{n-1}}, \frac{p_n + b p_{n-1}}{q_n + b q_{n-1}} \right) \tag{8.6} \]
or
\[ \{x : a \le T^n x < b\} \cap \Delta_n = \left( \frac{p_n + b p_{n-1}}{q_n + b q_{n-1}}, \frac{p_n + a p_{n-1}}{q_n + a q_{n-1}} \right] \tag{8.7} \]
according to $n$ being even or odd. In any case we have $\{x : a \le T^n x < b\} = T^{-n}[a, b)$ and
\[ \lambda(T^{-n}[a, b) \cap \Delta_n) = \lambda([a, b)) \, \lambda(\Delta_n) \, \frac{q_n(q_n + q_{n-1})}{(q_n + a q_{n-1})(q_n + b q_{n-1})}. \tag{8.8} \]
These technical computations are left to the reader as Exercise 8.2. Since the sequence of the denominators $q_n$ is monotonic,
\[ \frac{1}{2} < \frac{q_n}{q_n + q_{n-1}} < \frac{q_n(q_n + q_{n-1})}{(q_n + a q_{n-1})(q_n + b q_{n-1})} < \frac{q_n(q_n + q_{n-1})}{q_n^2} < 2. \]
In view of (8.8) we deduce for an arbitrary interval $I \subset [0, 1)$ the inequalities
\[ \tfrac{1}{2} \lambda(I) \lambda(\Delta_n) < \lambda(T^{-n} I \cap \Delta_n) < 2 \lambda(I) \lambda(\Delta_n). \]
The same inequalities hold if we replace $I$ by any finite union of disjoint intervals of this type. And since the set of such disjoint unions generates the Borel σ-algebra, the following inequality holds for any Borel set and, in particular, for any Lebesgue measurable set $A$:
\[ \tfrac{1}{2} \lambda(A) \lambda(\Delta_n) \le \lambda(T^{-n} A \cap \Delta_n) \le 2 \lambda(A) \lambda(\Delta_n). \tag{8.9} \]
Note that we have replaced the strict inequalities by non-strict ones, since the above argument to step from intervals to measurable sets involves an approximation process. However, we have to bring the Gauss measure $\mu$ into our considerations. We have
\[ \frac{1}{2 \log 2} < \frac{1}{\log 2} \cdot \frac{1}{1+x} \le \frac{1}{\log 2} \quad \text{for } 0 \le x < 1. \]
Comparing the densities of $\lambda$ and $\mu$, it follows that, for any Lebesgue measurable set $A$, the inequalities
\[ \frac{1}{2 \log 2} \lambda(A) < \mu(A) \le \frac{1}{\log 2} \lambda(A) \tag{8.10} \]
hold. Now we use the above inequalities to get rid of the Lebesgue measure. It follows from (8.9) and (8.10) that
\[ \mu(T^{-n} A \cap \Delta_n) > \frac{\log 2}{4} \mu(A) \mu(\Delta_n). \tag{8.11} \]
Now we are in the position to prove the following statement:

Theorem 8.1. The continued fraction map $T$ is a measure preserving ergodic transformation on the probability space $([0, 1), \mathcal{L}, \mu)$, where $\mathcal{L}$ is the family of Lebesgue measurable sets in $[0, 1)$ and $\mu$ is the Gauss measure. In particular, $([0, 1), \mathcal{L}, \mu, T)$ is an ergodic system.

Proof. We have already shown that $T$ is $\mu$-invariant. Hence, it remains to show that it is ergodic. Let $B$ be a Lebesgue measurable set of positive measure which is invariant under $T$, i.e., $T^{-1}B = B$. Then $B$ has a representation as a disjoint union $B = E \cup F$, where $E$ is a Borel set of measure $\mu(E) = \mu(B)$ and $F$ is a null set (see [49]). Now assume that $\mu(B) < 1$. Since the complement of $B$ then has positive measure, so has the complement $E^c$ of $E$. For any $\epsilon > 0$ there exists a set $G_\epsilon$ which can be represented as a finite disjoint union of intervals $\Delta_n \in \mathcal{D}$ and has a small symmetric difference with $E^c$: $\mu(E^c \,\Delta\, G_\epsilon) < \epsilon$ (this is some kind of approximation). It follows from (8.11), applied with $A = B$ and using $T^{-n}B = B$, that
\[ \mu(E \cap G_\epsilon) \ge \gamma \mu(G_\epsilon) \quad \text{with} \quad \gamma = \frac{\log 2}{4} \mu(B). \]
By construction, we get
\[ \mu(E^c \,\Delta\, G_\epsilon) \ge \mu(E \cap G_\epsilon) \ge \gamma \mu(G_\epsilon) \ge \gamma \mu(E^c \cap G_\epsilon) > \gamma(\mu(E^c) - \epsilon), \]
which leads to
\[ \gamma(\mu(E^c) - \epsilon) < \mu(E^c \,\Delta\, G_\epsilon) < \epsilon. \]
This yields the inequality $\gamma \mu(E^c) < \epsilon + \epsilon\gamma$, which is impossible for sufficiently small $\epsilon > 0$.
Hence, we have found a contradiction and conclude $\mu(B) = 1$. Thus $T$ is ergodic. ∎

In the proof we have used the lemma of Knopp [84] (and its proof): Given a probability space $([0, 1), \mathcal{F}, \lambda)$; if $B$ is a Lebesgue measurable set and $\mathcal{C}$ is a class of subintervals of $[0, 1)$ such that
• any open subinterval of $[0, 1)$ has a representation as a countable union of disjoint elements of $\mathcal{C}$, and
• for any $A \in \mathcal{C}$ we have $\lambda(A \cap B) \ge \gamma \lambda(A)$ with some positive constant $\gamma$ independent of $A$,
then $\lambda(B) = 1$. This ergodicity criterion is important for practical purposes.

8.2. The Theorems of Khintchine and Lévy

Now we apply our machinery to the ergodic system $([0, 1), \mathcal{L}, \mu, T)$ to obtain remarkable results on the statistical data of continued fraction expansions. We start with almost sure asymptotics for some mean values for partial quotients (as in (8.2)). Khintchine [81] proved

Theorem 8.2. For almost all $x = [0, a_1, a_2, \ldots] \in [0, 1)$,
(i) the positive integer $k$ appears in the sequence of partial quotients $a_n$ with asymptotic density
\[ \lim_{N\to\infty} \frac{1}{N} \sharp\{1 \le n \le N : a_n = k\} = \frac{1}{\log 2} \log\left(1 + \frac{1}{k(k+2)}\right); \]
(ii) the arithmetic mean of the partial quotients is infinite:
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} a_n = +\infty; \]
(iii) for the geometric mean,
\[ \lim_{N\to\infty} \left( \prod_{n=1}^{N} a_n \right)^{\frac{1}{N}} = \prod_{k=1}^{\infty} \left(1 + \frac{1}{k(k+2)}\right)^{\frac{\log k}{\log 2}}. \]

According to (i), for almost all $x$ from the unit interval the partial quotient 1 appears with a frequency of $\frac{\log 4/3}{\log 2} \approx 41.50\ldots$ percent, whereas the partial quotient 2 appears with a frequency of approximately $\frac{\log 9/8}{\log 2} \approx 16.99\ldots$ percent. This is nothing else but a sophisticated analogue of Benford's law, resp. Gelfand's problem on digit distribution, for continued fraction expansions.

Proof. We write $x = [0, a_1(x), a_2(x), \ldots]$. Recall from the last section that the continued fraction map deletes the first partial quotient and shifts the others. Thus, $a_1(x) = \lfloor \frac{1}{x} \rfloor$ and $a_2(x) = a_1(Tx)$ by (8.3), which implies $a_n(x) = a_1(T^{n-1}x)$ for $n \ge 2$. Using the intervals $\nabla_k := (\frac{1}{k+1}, \frac{1}{k}]$ we have $a_1(\xi) = k$ if, and only if, $\{\xi\} \in \nabla_k$, thence
\[ a_n(x) = k \iff a_1(T^{n-1}x) = k \iff T^{n-1}x \in \nabla_k. \tag{8.12} \]
The sequence of denominators of the convergents associated with the continued fraction expansion $x = [0, a_1(x), a_2(x), \ldots]$ is thus intimately related to the visits of the iterates $T^n x$ to the intervals $\nabla_k$.

Figure 2. The slow convergence of the geometric mean (left) and the arithmetic mean (right) of the partial quotients in the example $x = \pi - 3$ (horizontal axis: $n$ from 0 to 1000).

Since the continued fraction map $T$ is ergodic by Theorem 8.1, an application of Birkhoff's Ergodic Theorem 4.2 with the indicator function $f = \chi_{\nabla_k}$ yields
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{\nabla_k}(T^n x) = \int_{[0,1]} \chi_{\nabla_k} \, d\mu = \mu(\nabla_k); \]
this integral can be computed as
\[ \frac{1}{\log 2} \int_{1/(k+1)}^{1/k} \frac{dx}{1+x} = \frac{1}{\log 2} \left( \log\left(1 + \frac{1}{k}\right) - \log\left(1 + \frac{1}{k+1}\right) \right) = \frac{1}{\log 2} \log\left( \frac{k+1}{k} \cdot \frac{k+1}{k+2} \right), \]
which is the value appearing in (i). Since, in view of (8.12), $\chi_{\nabla_k}(T^n x) = 1$ holds exactly for $a_{n+1} = k$, the proof of (i) is complete.

The second assertion follows in a similar manner by using the step function $f(x) = \lfloor \frac{1}{x} \rfloor = a_1(x)$. In this case the integral $\int_0^1 f \, d\mu$ diverges to $+\infty$.

For (iii) we consider the step function $f(x) = \log a_1(x)$ which, in view of (8.12), we may also rewrite as $f(x) = \log k$ for $x \in \nabla_k$. We note
\[ \int_0^1 f(x) \, dx = \sum_{k=1}^{\infty} \lambda(\nabla_k) \log k \le \sum_{k=1}^{\infty} \frac{\log k}{k^2}, \]
which implies the convergence of $\int_{[0,1]} f \, d\mu$, since
\[ \frac{d\mu}{dx} = \frac{1}{\log 2} \cdot \frac{1}{1+x} \ll 1 \quad \text{for } x \in [0, 1). \]
Birkhoff's Ergodic Theorem 4.2 yields
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \log a_n = \int_{[0,1)} f \, d\mu. \]
The latter integral is easily evaluated as
\[ \int_0^1 f(x) \, d\mu(x) = \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \int_{1/(k+1)}^{1/k} \frac{dx}{1+x} = \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \log\left(1 + \frac{1}{k(k+2)}\right); \]
it should be noted that the terms are asymptotically of size $\frac{\log k}{k(k+2)}$ as $k \to \infty$, which implies the convergence of both the infinite series and the improper integral. For the geometric mean we thus get
\[ \lim_{N\to\infty} \left( \prod_{n=1}^{N} a_n \right)^{\frac{1}{N}} = \exp\left( \int_0^1 f(x) \, d\mu(x) \right) = \exp\left( \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \log\left(1 + \frac{1}{k(k+2)}\right) \right), \]
which leads to the limit according to (iii). ∎

For $N \to \infty$ the almost sure limit for the geometric mean defines the so-called Khintchine constant
\[ \sqrt[N]{a_1 a_2 \cdot \ldots \cdot a_N} \longrightarrow \prod_{k=1}^{\infty} \left(1 + \frac{1}{k(k+2)}\right)^{\frac{\log k}{\log 2}} = 2.68545\,20010\ldots. \]
We shall discuss some special continued fractions with respect to this result. For instance, Euler's number has the following continued fraction expansion
\[ e = \exp(1) = [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, \ldots, 1, 2n, 1, \ldots] \]
(a proof can be found in [128]). Here we have for the arithmetic mean
\[ \frac{a_1 + a_2 + \ldots + a_N}{N} \sim \frac{1}{9} N, \]
whereas for the geometric mean
\[ \sqrt[N]{a_1 a_2 \cdot \ldots \cdot a_N} \sim \sqrt[N]{2^{\frac{N}{3}} \left(\tfrac{N}{3}\right)!} \sim \sqrt[3]{\frac{2N}{3e}} \]
holds. In the latter case we observe a behaviour different from normality. For $\pi$ there is no regularity in the continued fraction expansion known; here computer experiments show a regular behaviour in the sense of Khintchine's theorem.

A classical theorem of Lagrange characterizes quadratic irrationalities, i.e., roots of irreducible quadratic polynomials with rational coefficients, as those real numbers with an eventually periodic continued fraction expansion (see [128]). For example,
\[ \sqrt{2} = [1, 2, 2, 2, \ldots], \qquad \frac{\sqrt{5}+1}{2} = [1, 1, 1, 1, \ldots], \qquad \frac{\sqrt{3}+1}{2} = [1, 2, 1, 2, \ldots]. \]
In particular, the partial quotients of quadratic irrationals are bounded, and it is not too difficult to show that in general this does not match the almost sure statistics of Khintchine's theorem. Actually, it is not known whether cubic irrationals, such as $\sqrt[3]{2}$, or algebraic irrationals of higher degree have or have not arbitrarily large partial quotients in their continued fraction expansion.
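The almost sure frequencies in Theorem 8.2 (i) and the Khintchine constant are easy to approximate on a computer. The following sketch simply truncates the series for the logarithm of the product in (iii); the helper names `digit_frequency` and `khintchine_constant` and the truncation point are choices made here, and since the tail of the series decays only like $(\log K)/K$, the truncated value is accurate to a few decimal places at best.

```python
import math


def digit_frequency(k: int) -> float:
    """Almost sure frequency of the partial quotient k, Theorem 8.2 (i)."""
    return math.log1p(1.0 / (k * (k + 2))) / math.log(2)


def khintchine_constant(terms: int) -> float:
    """Exponential of the series in (iii), truncated after `terms` summands."""
    log_value = sum(math.log(k) * digit_frequency(k) for k in range(2, terms + 1))
    return math.exp(log_value)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, digit_frequency(k))
    # Slowly convergent: the truncated product stays slightly below the limit.
    print(khintchine_constant(200_000))
```

The frequencies sum to 1 by a telescoping argument, which gives a quick sanity check of the formula in (i).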
Recently, Wolf [155] checked the ordinates of the first 2 600 nontrivial zeros (in ascending order in the upper half-plane), all accurate to 1 000 digits, with respect to their continued fraction expansion; they all were found to have a geometric mean matching the Khintchine constant. This may be interpreted as a hint of their irrationality, a deep open conjecture in zeta-function theory (see Chapter 6.2). We add more data indicating the irrationality of some zeta-values:

            ζ(2)          ζ(3)          ζ(4)          ζ(5)
K_100       2.21929...    2.40379...    3.10594...    2.75239...
K_1000      2.64745...    2.68948...    2.83378...    2.59444...

Here $K_N := \sqrt[N]{a_1 \cdot \ldots \cdot a_N}$, where the $a_n$ are the partial quotients of the continued fraction of the corresponding zeta-value, e.g., $\zeta(3) = [1, 4, 1, 18, 1, 1, 1, 4, 1, 9, 9, \ldots]$.

Birkhoff's ergodic theorem allows more asymptotic results of this type. Next we investigate the sequence of denominators $q_n$ of the convergents. In particular their growth $q_n \to \infty$ is of interest and leads to interesting insights with respect to Diophantine approximation. The following theorem is due to Lévy [96]:

Theorem 8.3. Denote by $\frac{p_n}{q_n} = \frac{p_n(x)}{q_n(x)}$ the $n$-th convergent to $x$. Then, for almost all $x \in [0, 1)$,
\[ \lim_{n\to\infty} \frac{1}{n} \log q_n(x) = \frac{\pi^2}{12 \log 2} \tag{8.13} \]
and
\[ \lim_{n\to\infty} \frac{1}{n} \log\left| x - \frac{p_n}{q_n} \right| = -\frac{\pi^2}{6 \log 2}. \tag{8.14} \]

Surprisingly, these asymptotics have various interpretations in physics, e.g., as Kolmogorov entropy of mixmaster cosmologies; see Csordás & Szépfalusy [36].

Proof. Since
\[ \frac{p_m(x)}{q_m(x)} = \frac{1}{a_1 + [0, a_2, a_3, \ldots, a_m]} = \frac{1}{a_1 + \frac{p_{m-1}(Tx)}{q_{m-1}(Tx)}} = \frac{q_{m-1}(Tx)}{p_{m-1}(Tx) + a_1 q_{m-1}(Tx)}, \]
it follows that $p_m(x) = q_{m-1}(Tx)$ for $m \in \mathbb{N}$. Hence,
\[ \frac{1}{q_n(x)} = \frac{p_n(x)}{q_n(x)} \cdot \frac{p_{n-1}(Tx)}{q_{n-1}(Tx)} \cdot \ldots \cdot \frac{p_1(T^{n-1}x)}{q_1(T^{n-1}x)} \cdot \frac{1}{q_0(T^n x)} = \frac{p_n(x)}{q_n(x)} \cdot \frac{p_{n-1}(Tx)}{q_{n-1}(Tx)} \cdot \ldots \cdot \frac{p_1(T^{n-1}x)}{q_1(T^{n-1}x)}, \]
since $q_0(T^n x) = q_0 = 1$ independent of $x$ and $n$. Taking the logarithm leads to
\[ -\log q_n(x) = \sum_{0 \le j < n} \log \frac{p_{n-j}(T^j x)}{q_{n-j}(T^j x)}. \]
Since the numbers $\frac{p_n(x)}{q_n(x)}$ approximate $x$, we write
\[ -\frac{1}{n} \log q_n(x) = \frac{1}{n} \sum_{0 \le j < n} \log(T^j x) + \frac{1}{n} R_n(x) \tag{8.15} \]
with remainder term
\[ R_n(x) = \sum_{0 \le j < n} \left( \log \frac{p_{n-j}(T^j x)}{q_{n-j}(T^j x)} - \log(T^j x) \right). \]
First we shall estimate the error $R_n(x)$. Setting $k = n - j$ we see that $\xi := T^j x$ belongs to an interval $\Delta_k$ with end points $\frac{p_k}{q_k}$ and $\frac{p_k + p_{k-1}}{q_k + q_{k-1}}$. Now Theorem 7.3 and the mean value theorem from analysis imply for even $k$ that
\[ 0 < \log \xi - \log \frac{p_k}{q_k} = \int_{p_k/q_k}^{\xi} \frac{du}{u} = \left( \xi - \frac{p_k}{q_k} \right) \frac{1}{\eta} \le \frac{1}{q_k(q_k + q_{k-1})} \cdot \frac{q_k}{p_k} < \frac{1}{q_k} \]
with some $\eta \in (\frac{p_k}{q_k}, \xi)$. Similarly,
\[ \left| \log \xi - \log \frac{p_k}{q_k} \right| < \frac{1}{q_k} \]
for odd $k$. Denoting by $F_k$ the $k$th Fibonacci number, their recursive definition yields the estimate $q_k(x) \ge F_k$ with equality if, and only if, $x = \frac{1}{2}(\sqrt{5} + 1)$ is the golden ratio. Thus,
\[ |R_n(x)| \le \sum_{k=1}^{n} \frac{1}{F_k}, \]
which we can bound thanks to Binet's formula (1.8). Writing $G := \frac{\sqrt{5}+1}{2}$, the infinite geometric series expansion yields
\[ |R_n(x)| < \sum_{k=1}^{\infty} \frac{1}{F_k} \ll \sum_{k=1}^{\infty} G^{-k} < +\infty. \]
In particular,
\[ \lim_{n\to\infty} \frac{1}{n} R_n(x) = 0 \]
for all $x$. Hence, the remainder term $R_n(x)$ in (8.15) is negligible.

If the following limit exists,
\[ -\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \log(T^{n-j} x), \tag{8.16} \]
then $\lim_{n\to\infty} \frac{1}{n} \log q_n(x)$ exists as well and both limits have the same value. The expression (8.16) can be computed with Birkhoff's ergodic theorem for almost all $x$ as
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \log(T^j x) = \frac{1}{\log 2} \int_0^1 \frac{\log x}{1+x} \, dx = -\frac{\pi^2}{12 \log 2}, \tag{8.17} \]
where the appearing integral was evaluated in (6.5) by use of Euler's formula (6.2). This proves (8.13). In order to prove the second assertion, we apply Theorem 7.3 to obtain
\[ \frac{1}{2 q_n q_{n+1}} < \left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n q_{n+1}}. \]
Now we may deduce (8.14) from (8.13). ∎

There are quite many interesting results beyond Lévy's theorem.
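The growth rate in (8.13) can be observed experimentally. The sketch below is an illustration under a loudly stated assumption: a pseudo-random rational number with a large denominator serves as a stand-in for a "typical" real number, so the experiment is heuristic and not an instance of the theorem itself. The names `continued_fraction`, `denominators` and `LEVY`, the seed and the bit length are all choices made here.

```python
import math
import random


def continued_fraction(p: int, q: int) -> list[int]:
    """Partial quotients a_1, a_2, ... of p/q in (0, 1) via the Euclidean algorithm."""
    digits = []
    while p:
        a, r = divmod(q, p)
        digits.append(a)
        q, p = p, r
    return digits


def denominators(digits: list[int]) -> list[int]:
    """q_1, q_2, ... from q_n = a_n q_{n-1} + q_{n-2} with q_0 = 1, q_{-1} = 0."""
    q_prev, q_curr = 0, 1
    result = []
    for a in digits:
        q_prev, q_curr = q_curr, a * q_curr + q_prev
        result.append(q_curr)
    return result


LEVY = math.pi ** 2 / (12 * math.log(2))  # the limit in (8.13)

if __name__ == "__main__":
    rng = random.Random(2010)  # arbitrary seed, only for reproducibility
    q = rng.getrandbits(4000) | 1
    p = rng.randrange(1, q)
    qs = denominators(continued_fraction(p, q))
    n = len(qs)
    print(n, math.log(qs[-1]) / n, LEVY)
```

For a rational number the last denominator is the reduced denominator itself, so the printed quotient is also the reciprocal of the number of Euclidean steps per unit of $\log q$; this ties the experiment to Heilbronn's average mentioned after Exercise 7.4.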
Philipp & Stackelberg [110] improved his result in showing
\[ \limsup_{n\to\infty} \frac{\left| \log q_n(x) - \frac{n\pi^2}{12 \log 2} \right|}{\sqrt{2\sigma^2 n \log\log n}} = 1 \]
for almost all $x \in [0, 1)$, where
\[ \sigma^2 = \lim_{n\to\infty} \frac{1}{n} \int_0^1 \left( \log q_n(x) - \frac{n\pi^2}{12 \log 2} \right)^2 \frac{dx}{(\log 2)(1+x)} \]
is a positive constant. A further result due to Philipp [109] shows a Gaussian normal distribution:
\[ \lim_{n\to\infty} \mu\left( \left\{ x \in [0, 1] : \frac{\log q_n(x) - \frac{n\pi^2}{12 \log 2}}{\sigma\sqrt{n}} < z \right\} \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} \exp\left(-\tfrac{1}{2} u^2\right) du, \]
where $\mu$ is any probability measure that is absolutely continuous with respect to the Lebesgue measure. Faivre [52] investigated quadratic irrational numbers $x$. In this case the sequence $\frac{1}{n} \log q_n(x)$ converges (thanks to the eventually periodic continued fraction expansion) and its limit $\beta(x)$ is called the Lévy constant. An interesting question is which values $\beta(x)$ can assume. Arnold proposed the investigation of the average frequencies of partial quotients in continued fractions of solutions of the quadratic equation $X^2 + pX + q = 0$ as $p, q$ grow (e.g., lie inside some disk of radius $R$ as $R \to \infty$); surprisingly, with such an averaging the continued fractions seem to behave like those of random real numbers (cf. [5, 140]). Kesseböhmer & Stratmann [79] obtained multifractal generalizations of the classical results of Lévy and Khintchine.

In our metrical investigations we have not used Gauss' limit formula (8.1), which can be rewritten as
\[ \lim_{n\to\infty} \lambda(T^{-n}[0, \xi]) = \mu([0, \xi]). \]
A proof of the theorems of Khintchine and Lévy using this approach can be found in the monograph of Rockett & Szüsz [119], which also includes a proof of the theorem of Gauss–Kuzmin–Lévy with explicit error term. Further deep results on the metric theory of this and other types of continued fractions (e.g., the proof of the Doeblin–Lenstra conjecture by Bosma, Jager & Wiedijk) can be found in [37]. The analogues for continued fractions to the nearest integer were given by Rieger [116].
The book of Schweiger [124] investigates higher dimensional analogues of continued fractions.

The theory of continued fractions shows that for any given real number $x$ there exists a sequence $(q_m)$ of strictly increasing positive integers with
\[ q_m \| q_m x \| < 1, \]
where, as above, $\| \cdot \|$ denotes the minimal distance to the nearest integer. Littlewood conjectured that
\[ \inf_{n \in \mathbb{N}} n \| nx \| \| ny \| = 0 \]
for all $x, y \in \mathbb{R}$. It is not difficult to show that this is satisfied for rationals as well as for quadratic irrationals. For arbitrarily given $x$, Adamczewski & Bugeaud [1] explicitly constructed continuum many real numbers $y$ with bounded partial quotients for which the pair $(x, y)$ satisfies a strong form of Littlewood's conjecture. Recently, Einsiedler, Katok & Lindenstrauss [46] proved that Littlewood's conjecture is indeed true for almost all real numbers: the set of exceptional pairs $(x, y) \in \mathbb{R}^2$ has Hausdorff dimension zero. Elon Lindenstrauss has recently received a Fields medal at the International Congress of Mathematicians 2010 in Hyderabad for his work on interactions between dynamical systems theory and Diophantine analysis! Besides his work on the Littlewood conjecture he has solved the so-called quantum ergodicity conjecture.∗

∗ And more Fields medals have been awarded to mathematicians working in ergodic theory; however, this is a topic of the next chapter.

We have not mentioned numerous applications of ergodic theory to Diophantine equations. For instance, for a squarefree integer $d > 1$ we are interested in the set of points with integer coordinates on a two-dimensional sphere of radius $\sqrt{d}$:
\[ I_d := \{ \mathbf{x} = (x, y, z) \in \mathbb{Z}^3 : x^2 + y^2 + z^2 = d \}. \]
A deep result of Gauss shows that $I_d$ is non-empty if, and only if, $d$ is not of the form $4^a(8b - 1)$ for integers $a \ge 0$ and $b \ge 1$. In the 1950s Linnik [98] showed that, as $d \to \infty$ amongst the squarefree integers with $d \equiv \pm 1 \bmod 5$, the set
\[ \{ \mathbf{x}/\sqrt{d} : \mathbf{x} \in I_d \} \subset S^2 \]
becomes uniformly distributed on the unit sphere $S^2$ with respect to the Lebesgue measure. Linnik's approach used ergodic theory. The constraint $d \equiv \pm 1 \bmod 5$ was removed by Duke [44].

Exercises

No pains, no gains! Here is another round to experience the world of continued fractions and ergodic theory.

Exercise 8.1. Show that the continued fraction map $T$ is not measure preserving with respect to the Lebesgue measure.

Exercise 8.2. Prove statements (8.5)-(8.8).

Exercise 8.3. Give a proof of Knopp's lemma in its full generality. Moreover, fill all gaps as, for example, Binet's formula (1.8) and the deduction of (8.14) from (8.13). (Hint: help can be found in [37].)

Exercise 8.4. For some quadratic and cubic irrationalities compute the first partial quotients of their continued fraction expansion and try to compare the asymptotic behaviour of the geometric and arithmetic means.

Exercise 8.5. Prove that the set of real numbers having a continued fraction expansion with bounded partial quotients has Lebesgue measure zero. Moreover, show that, given a function $f$ with $f(n) > 1$ for $n \in \mathbb{N}$ such that $\sum_{n=1}^{\infty} \frac{1}{f(n)}$ diverges, the set
\[ \{ x = [a_0, a_1, \ldots] \in [0, 1) : a_n < f(n) \text{ for } n \in \mathbb{N} \} \]
has measure zero. If you have been successful, try to prove under the same hypothesis on $f$ that
\[ \{ x = [a_0, a_1, \ldots] \in [0, 1) : a_k(x) > f(k) \text{ for infinitely many } k \in \mathbb{N} \} \]
has measure zero. Hint: The latter proof uses the Borel–Cantelli lemma from probability theory; for further advice see [119].

Consequently, most of the numbers have a continued fraction expansion with partial quotients that, although not bounded, are not too large! Here is another result of Khintchine [81]:

Exercise 8.6. Show that, for almost all $x = [0, a_1, a_2, \ldots]$,
\[ \lim_{N\to\infty} \frac{N}{\frac{1}{a_1} + \ldots + \frac{1}{a_N}} = 1.74540\ldots. \]
Moreover, let $f$ be a function with $f(k) = O(k^{1-\delta})$ for some positive $\delta$. Prove that, for almost all $x = [a_0, a_1, \ldots]$,
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} f(a_n) = \sum_{k=1}^{\infty} f(k) \frac{\log\left(1 + \frac{1}{k(k+2)}\right)}{\log 2}. \]

* * *

A week is definitely not enough to learn about all beautiful applications of ergodic theory to arithmetic. We did not speak about Margulis' proof of the Oppenheim conjecture on the values of indefinite quadratic forms in at least three unknowns, nor did we mention the recent results on quantum ergodicity. The next and final chapter gives a first glimpse of another, completely different direction of ergodic number theory which is free of integrals but full of compact sets...

CHAPTER 9
Coda: Arithmetic Progressions

An arithmetic progression of length $\ell$ and common difference $d$ is a sequence
\[ a, a + d, a + 2d, \ldots, a + (\ell - 1)d, \]
where $a, d, \ell$ are integers with $d \ge 1$ and $\ell \ge 3$ (to exclude trivialities). For instance,
\[ 3, 13, 23, 33, 43, 53, 63, 73 \]
is an arithmetic progression of length 8. We are interested in sets of integers which contain arithmetic progressions of arbitrary length as, for example, the set of even (resp. odd) integers. On the contrary, the powers of 10 do not contain any arithmetic progression. We ask: under what conditions does an infinite subset of $\mathbb{Z}$ contain arithmetic progressions of arbitrary length?

Erdös & Turán [50] conjectured that any subset $\{a_1, a_2, \ldots\} \subset \mathbb{N}$ of positive lower density, i.e.,
\[ \liminf_{N\to\infty} \frac{1}{N} \sum_{a_n \le N} 1 > 0, \]
contains arbitrarily long arithmetic progressions. It is remarkable that no structural assumption is made on the set of elements $a_n$, only that it has to be sufficiently large. The conjecture of Erdös & Turán was solved by Szemerédi [131] by introducing a complicated combinatorial technique.

Furstenberg [54] successfully studied the problem of simultaneous recurrence of sets of positive measure. In this context he found a remarkable generalization of Theorem 5.4: Let $T : X \to X$ be a measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$ and let $A$ be a measurable set with $\mu(A) > 0$.
Then, for any positive integer $k$, there exists a positive integer $n$ such that
\[ \mu(A \cap T^{-n}A \cap \ldots \cap T^{-kn}A) > 0. \tag{9.1} \]
This theorem plays a crucial role in Furstenberg's ergodic proof of Szemerédi's theorem on the solution of the Erdös & Turán conjecture. We illustrate the proof (and refer to [114] for details).

We denote by $\Omega = \{0, 1\}^{\mathbb{Z}}$ the space of double infinite $\{0, 1\}$-sequences and interpret its elements as indicator functions $\chi_A$ of sets $A \subset \mathbb{Z}$. Since $\{0, 1\}$ is compact, by Tychonov's theorem, so is $\Omega$, and we may define a metric on $\Omega$ as follows: given sequences $x = (x_n)$, $y = (y_n)$, let
\[ N(x, y) = \min\{ N \in \mathbb{N}_0 : x_N \ne y_N \text{ or } x_{-N} \ne y_{-N} \} \]
for $x \ne y$, and
\[ d(x, y) = \begin{cases} 2^{-N(x,y)} & \text{if } x \ne y, \\ 0 & \text{otherwise.} \end{cases} \tag{9.2} \]
It is easy to check that $d$ defines a metric on $\Omega$, hence $(\Omega, d)$ is a compact metric space. We define the shift transformation by
\[ \sigma : \Omega \to \Omega, \qquad \omega(n) \mapsto \sigma\omega(n) = \omega(n + 1). \tag{9.3} \]
Given an element $\omega \in \Omega$, we say that 1 occurs with positive upper Banach density if the set $Z := \{ n \in \mathbb{Z} : \omega(n) = 1 \}$ has positive upper Banach density, i.e.,
\[ \limsup_{\sharp I \to \infty} \frac{\sharp(Z \cap I)}{\sharp I} > 0, \]
where $I$ runs through the set of intervals in $\mathbb{Z}$ and $\sharp I$ denotes the number of integers contained in $I$. Moreover, for $\omega \in \Omega$ we write
\[ X = \overline{\{ \sigma^n \omega : n \in \mathbb{Z} \}} \subset \Omega. \]
Then one can show that if 1 occurs with positive upper Banach density, there exists a $\sigma$-invariant measure $\mu$ on $X$ satisfying $\mu(A) > 0$ for
\[ A := \{ \omega \in \Omega : \omega(0) = 1 \}. \]
Now we sketch how to apply Furstenberg's simultaneous recurrence theorem (9.1) to the Erdös–Turán conjecture. Assume that $B \subset \mathbb{Z}$ has positive upper Banach density and let $X$ be the orbit closure of its indicator function $\chi_B$. Then, by (9.1), for any given $k$ there exist a positive integer $n$ and some point $\omega \in X$ such that
\[ \sigma^{jn}\omega \in A \cap X \quad \text{for } 0 \le j < k. \]
This implies $\omega(0) = \omega(n) = \ldots = \omega((k-1)n) = 1$. Since $\omega \in X$ is a limit of translates of the indicator function $\chi_B$, we have
\[ \chi_B(b) = \chi_B(b + n) = \ldots = \chi_B(b + (k-1)n) = 1 \]
for some $b \in \mathbb{Z}$, hence $B$ contains the arithmetic progression $b, b + n, \ldots, b + (k-1)n$.
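The metric (9.2) and the shift (9.3) can be made concrete by modelling a two-sided sequence as a function on the integers. This is a sketch under assumptions: the names `first_difference`, `distance` and `shift` are chosen here, the smallest admissible index is taken to be 0, and two sequences that agree on all indices below the search `cutoff` are simply treated as equal.

```python
from typing import Callable, Optional

TwoSided = Callable[[int], int]  # a sequence n -> x_n on all of Z, as a function


def first_difference(x: TwoSided, y: TwoSided, cutoff: int = 1000) -> Optional[int]:
    """Smallest N >= 0 with x_N != y_N or x_{-N} != y_{-N}; None below the cutoff."""
    for n in range(cutoff):
        if x(n) != y(n) or x(-n) != y(-n):
            return n
    return None


def distance(x: TwoSided, y: TwoSided, cutoff: int = 1000) -> float:
    """The metric d(x, y) = 2^{-N(x, y)} of (9.2), with d = 0 if no difference is found."""
    n = first_difference(x, y, cutoff)
    return 0.0 if n is None else 2.0 ** (-n)


def shift(x: TwoSided) -> TwoSided:
    """The shift of (9.3): (sigma x)(n) = x(n + 1)."""
    return lambda n: x(n + 1)


if __name__ == "__main__":
    x = lambda n: 1 if n % 3 == 0 else 0      # indicator of the multiples of 3
    y = lambda n: 1 if n == 5 else x(n)       # same set with the integer 5 added
    print(distance(x, y), distance(shift(x), shift(y)))
```

In this example the shift moves the only disagreement one step closer to the origin, so the distance exactly doubles; two sequences are close precisely when they agree on a long window around 0.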
This is essentially Furstenberg's proof of the theorem of Szemerédi. ◦

Furstenberg's ergodic approach marks the beginning of an impressive success story. Gowers and later Tao [133] obtained quantitative results with respect to Szemerédi's theorem. Both were awarded a Fields medal for their work in this direction: Gowers in 1998 at the ICM in Berlin, Tao in 2006 at the ICM in Madrid. Roth also obtained a Fields medal, in 1958 in Edinburgh, however mainly for his improvement of the diophantine approximation theorems of Thue and Siegel for algebraic numbers. Moreover, Margulis and Bourgain received Fields medals for their contributions to ergodic theory and harmonic analysis, namely at the ICM 1978 in Helsinki and 1994 in Zurich, respectively.

There is another famous problem which cannot be deduced from the theorems of Szemerédi and Furstenberg: Do the prime numbers contain arbitrarily long arithmetic progressions? In view of the Prime Number Theorem 6.2 the primes have asymptotic density zero in $\mathbb{N}$ and, consequently, Szemerédi's theorem does not apply. In 2004, Green & Tao [57] proved: The set of prime numbers contains arbitrarily long arithmetic progressions. Their deep theorem is built on previous work of other mathematicians as, for example, Gowers. The longest arithmetic progression of primes presently known has length 23,
\[ 56\,211\,383\,760\,397 + 44\,546\,738\,095\,860\,k \quad \text{for } k = 0, 1, \ldots, 22, \]
and has been discovered by Frind, Underwood & Jobling (cf. Green & Tao [57]).∗ The new methods of Green & Tao are applicable to rather thin sets and we may speculate what kind of results will be proven with these new powerful tools; indeed it is somewhat surprising that for the Green & Tao theorem no deep results from analytic number theory are needed.

Our next aim is a related theorem of van der Waerden [141]:

Theorem 9.1.
In any partition of $\mathbb{Z}$ into finitely many classes there is at least one class which contains arbitrarily long arithmetic progressions.

If we divide the integers into $r$ disjoint sets,
\[ \mathbb{Z} = A_1 \cup \ldots \cup A_r, \tag{9.4} \]
then van der Waerden's theorem claims that it is impossible to avoid arithmetic progressions of arbitrary length in all $A_j$. Note that this does not mean that there necessarily exist infinite arithmetic progressions (and indeed this cannot hold in general, as the reader can confirm with a simple example). The statement remains true when we replace $\mathbb{Z}$ by $\mathbb{N}$, and all known proofs can be formulated under this restriction without any difficulties. Any of those proofs is difficult.† The history of this theorem and its different proofs is very interesting and worth studying; see the notes of van der Waerden [142]. The original problem dates probably back to Schur for the case $r = 2$, and not to Baudet as it is often referred to; however, and this is interesting, although not extraordinary, the more general point of view, that is arbitrary $r$, led to an easier proof.

∗ To illustrate the depth of this result we ask the dear reader to break this current record on long arithmetic progressions in primes!
† Clearly, there exists something like invariance of difficulty: there cannot be a simple proof of a deep theorem.

We shall sketch a dynamical proof of van der Waerden's theorem. For this purpose we shall work in metric spaces. Recall that a homeomorphism is a bijective continuous mapping whose inverse is continuous too. The branch of mathematics that deals with the dynamics of such homeomorphisms is called topological dynamics.

First we mention some basic facts on a certain space of sequences: for $k \ge 2$ let $\Omega_k = \{1, 2, \ldots, k\}^{\mathbb{Z}}$ be the set of double infinite sequences $\omega = (\omega(n))_{n \in \mathbb{Z}}$ with values in $\{1, 2, \ldots, k\}$. On $\Omega_k$ we define via (9.2) the same metric $d$, only with $\Omega_k$ in place of $\Omega$.
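Theorem 9.1 has a well-known finite counterpart that can be tested by brute force in the smallest case: every colouring of $\{1, \ldots, 9\}$ with two colours contains a monochromatic arithmetic progression of length 3, while $\{1, \ldots, 8\}$ can still be coloured without one. This finite statement is not taken from the notes; the sketch below, with the hypothetical helper names `has_monochromatic_ap` and `every_coloring_has_ap`, merely checks it exhaustively.

```python
from itertools import product


def has_monochromatic_ap(coloring: tuple, length: int) -> bool:
    """True if some colour class of {1, ..., len(coloring)} contains an AP of this length."""
    n = len(coloring)
    for start in range(1, n + 1):
        # The last term start + (length - 1) * diff must not exceed n.
        for diff in range(1, (n - start) // (length - 1) + 1):
            terms = [start + i * diff for i in range(length)]
            if len({coloring[t - 1] for t in terms}) == 1:
                return True
    return False


def every_coloring_has_ap(n: int, colours: int, length: int) -> bool:
    """Exhaustive check over all colourings of {1, ..., n}; feasible only for tiny n."""
    return all(
        has_monochromatic_ap(c, length) for c in product(range(colours), repeat=n)
    )


if __name__ == "__main__":
    print(every_coloring_has_ap(8, 2, 3), every_coloring_has_ap(9, 2, 3))
```

The search space grows like $r^n$, so this approach says nothing about longer progressions or more colours; it only makes the statement of the theorem tangible.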
It is not difficult to see that (i) (Ωk , d) is a compact metric space; (ii) the shift transformation σ : Ωk → Ωk , given by (9.3), can be defined on Ωk and is a homeomorphism. The only difficult part in proving (i) is the triangle inequality. For this aim we may suppose that x, y, z ∈ Ωk are pairwise distinct. Then we have to show 2−N (x,y) = d(x, y) ≤ d(x, z) + d(z, y) = 2−N (x,z) + 2−N (z,y) , which is equivalent to 2N (z,y)+N (x,z) ≤ 2N (x,y)+N (z,y) + 2N (x,y)+N (x,z) = 2N (x,y) (2N (z,y) + 2N (x,z) ). This is obvious (actually, N (x, y) ≥ N (x, z) ≥ N (z, y) is the only non-trivial case to consider). In order to prove (ii) let x, y ∈ Ωk with x 6= y and d(x, y) = 2−N . Then xi = yi for −N < i < N , hence (σx)(i) = xi+1 = yi+1 = (σy)(i) for −(N + 1) < i < N − 1. This implies d(σx, σy) ≤ 21−N = 2 d(x, y). Consequently, σ is continuous. Obviously, σ is invertible and the inverse σ −1 turns out to be continuous by the same reasoning as for σ. The central role in our dynamical proof of van der Waerden’s Theorem 9.1 is played by the following multi-dimensional recurrence theorem of Furstenberg & Weiss [55]: Theorem 9.2. Let T1 , . . . , TN : X → X be homeomorphisms on a compact metric space which commute, that means Ti Tj = Tj Ti for 1 ≤ i, j ≤ N . Then there exists an x ∈ X and a sequence of positive integers nk with limk→∞ nk = +∞ such that lim d(Tink x, x) = 0 k→∞ for any i = 1, 2, . . . , N. The commutativity property is essential; note that Ti Tj denotes Ti ◦ Tj . Consequently, the set of the Tj forms a semigroup. Now we sketch how how to deduce the theorem 9.1 of van der Waerden from Theorem 9.2. Given a partition of Z in disjoint sets, Z = A1 ∪ . . . ∪ Ak , we may associate a sequence ω = (ω(n))n∈Z ∈ Ωk by setting ω(n) = i if n ∈ Ai . Next we consider the orbit {σ n ω : n ∈ Z}, where σ is the shift transformation introduced in (9.3). We write X for the closure of this orbit with respect to d. Applying Theorem 9.2 with Ti = σi := σ i (= σ ◦ . . . 
◦ σ), for any sufficiently small $\epsilon < 1$ there exist $x \in X$ and a common difference $d \in \mathbb{N}$ (not to be confused with the metric $d$) such that
$$d(\sigma^{id}x, x) < \epsilon < 1 \quad\text{for } i = 1, \ldots, N.$$
Since $d(x,y) = 2^{-N(x,y)}$, we deduce that the terms with index 0 coincide:
$$x_0 = x_{id} = (\sigma^{id}x)(0) \quad\text{for } i = 0, 1, \ldots, N.$$
Since $x$ lies in the closure of the orbit of $\omega$, the finite block $(x_n)_{0\le n\le Nd}$ appears somewhere in $\omega$, starting at position $a$, say. Thus
$$\omega(a) = x_0 = x_{id} = (\sigma^{id}x)(0) = \omega(a + id) \quad\text{for } i = 0, 1, \ldots, N.$$
This shows that $a + id \in A_{\omega(a)}$ for $i = 0, 1, \ldots, N$. Hence, for any $\ell = N + 1$ we have found an index $j$ such that the set $A_j$ contains an arithmetic progression of length $\ge \ell$. Since there are only finitely many classes, it follows that at least one $A_j$ in the dissection (9.4) contains arbitrarily long arithmetic progressions! Reviewing this proof, we are reminded of some ideas of Furstenberg's approach to the theorem of Szemerédi.

Next we give a proof of Theorem 9.2 for the special case that the homeomorphisms $T_i$ are all built from one homeomorphism $T$ by setting $T_i = T^i$ for $i = 1, \ldots, N$, as in our application to van der Waerden's theorem. For $N = 1$ this reduces to Birkhoff's recurrence theorem [18] (not to be confused with his ergodic theorem):

Theorem 9.3. Let $T : X \to X$ be a homeomorphism on a compact metric space $X$. Then there exists an element $x \in X$ with $T^{n_k}x \to x$ for a divergent sequence of positive integers $n_k \to \infty$.

Proof. We shall use Zorn's lemma.‡ Let $E$ denote the family of all non-empty closed and $T$-invariant subsets $Z$ of $X$, equipped with the partial order
$$Z_1 \le Z_2 \ :\Longleftrightarrow\ Z_1 \subset Z_2.$$
Then each chain $\{Z_\kappa\}_\kappa$ is contained in a maximal totally ordered subsystem $F \subset E$; this is the so-called Hausdorff maximal chain theorem (see [121]). Then the set $Z = \bigcap_\kappa Z_\kappa$ with $Z_\kappa \in F$ is closed, $T$-invariant, and, by construction, minimal, i.e., no non-empty closed proper subset of $Z$ is $T$-invariant. Moreover, $Z$ is not empty since $X$ is compact. If $A$ is any closed $T$-invariant subset of $Z$, then either $A = \emptyset$ or $A = Z$ (similar to the notion of ergodicity).
In particular, it follows that the closure $A$ of the orbit $\{T^nx : n \in \mathbb{Z}\}$ of an arbitrary $x \in Z$ satisfies $A = Z \subset X$. Hence, for any $\epsilon > 0$ there exists $n \in \mathbb{N}$ such that $d(T^nx, x) < \epsilon$.§ This immediately implies the assertion. □

‡ Infamous for its equivalence to the unloved axiom of choice; Zorn's lemma claims that any non-empty partially ordered set in which every totally ordered subset has an upper bound contains a maximal element. This was discovered by Zorn in 1935.
§ Note that the $T$-invariance gives more than the standard conclusion from compactness, namely the mere existence of an accumulation point.

The remaining part of the proof of Theorem 9.2 is by induction on $N$. Hence, we have to show that, if the assertion holds for the $N - 1$ homeomorphisms $T_1 = T, \ldots, T_{N-1} = T^{N-1}$, then it is also true if we add the $N$-th homeomorphism $T_N = T^N$. For this we may suppose that $X$ is a minimal non-empty closed set which is invariant with respect to each $T^j$ with $j = 1, \ldots, N$ (again by Hausdorff's maximal chain theorem as in the previous proof).

Firstly, given $\epsilon > 0$, we prove the existence of a finite set $K \subset \mathbb{N}$, depending only on $\epsilon$, such that for arbitrary $x, x' \in X$
$$d(T^kx, x') < \epsilon \quad\text{for some } k \in K. \tag{9.5}$$
If $\emptyset \ne B \subset X$ is open, then the minimality of $X$ implies that for any $z \in X$ there exists some $n \in \mathbb{N}$ with $T^nz \in B$. Hence, $X = \bigcup_{n\in\mathbb{N}} T^{-n}B$. Since $X$ is compact by assumption and each $T^{-n}B$ is open, the theorem of Heine-Borel implies that $X$ possesses a finite covering of the form
$$X = \bigcup_{k\in K(B)} T^{-k}B$$
with some finite subset $K(B) \subset \mathbb{N}$. Once again by the compactness of $X$ there exist finitely many open balls $B_1, \ldots, B_r$, each of radius $\frac{\epsilon}{2}$, such that
$$X = \bigcup_{j=1}^{r} B_j.$$
Thus, given $x, x' \in X$, we have $x \in B_i$ for some $i \in \{1, \ldots, r\}$ and $x' \in T^{-k}B_i$ for some $k \in K(B_i)$. This gives (9.5) with $K = \bigcup_{j=1}^{r} K(B_j)$ (with the roles of $x$ and $x'$ interchanged, which is harmless since both points are arbitrary).

Next we show that, for any $\epsilon > 0$ and any $x \in X$, there exist $y \in X$ and $n \in \mathbb{N}$ such that
$$d(T^{jn}y, x) < \epsilon \quad\text{for } j = 1, \ldots, N. \tag{9.6}$$
Since any homeomorphism $T^k$ is uniformly continuous on the compact set $X$, there exists $\rho > 0$ for which
$$d(T^kx_1, T^kx_2) < \epsilon \quad\text{for } x_1, x_2 \in X \text{ whenever } d(x_1, x_2) < \rho. \tag{9.7}$$
Actually, we may suppose this simultaneously for all $k$ from the finite(!) set $K$ defined by (9.5) (by uniformity and compactness). By the induction hypothesis, there exist $x' \in X$ and $n \in \mathbb{N}$ such that
$$d(T^{jn}x', x') < \rho \quad\text{for } j = 1, \ldots, N - 1.$$
Since $X$ is compact, it follows that the $T$-invariant set $TX$ is closed, hence $TX = X$ by minimality and $T^nX = X$, respectively. Thus we can find $y' \in X$ such that $T^ny' = x'$ and
$$d(T^ny', x') = 0, \qquad d(T^{jn}y', x') < \rho \quad\text{for } j = 2, \ldots, N.$$
Using our previous uniform estimate (9.7), it thus follows that
$$d(T^{jn+k}y', T^kx') < \epsilon \quad\text{for } k \in K,\ j = 1, \ldots, N.$$
For any $x \in X$ there exists, by (9.5), some $k \in K$ with $d(T^kx', x) < \epsilon$. Setting $y := T^ky'$, the triangle inequality yields
$$d(T^{jn}y, x) \le d(T^{jn+k}y', T^kx') + d(T^kx', x) < 2\epsilon$$
for $j = 1, \ldots, N$. Since $\epsilon > 0$ is arbitrary, we may deduce (9.6).

We get close to the finish. Given $\epsilon_0 > 0$ and some arbitrary $x_0 \in X$, in view of (9.6) there exist $x_1 \in X$ and $n_1 \in \mathbb{N}$ satisfying
$$d(T^{jn_1}x_1, x_0) < \epsilon_0 \quad\text{for } j = 1, \ldots, N.$$
Now, by continuity, we choose $\epsilon_1 \in (0, \epsilon_0)$ such that $d(x, x_1) < \epsilon_1$ implies
$$d(T^{jn_1}x, x_0) < \epsilon_0 \quad\text{for } j = 1, \ldots, N.$$
We iterate this as follows: suppose we have defined
• points $x_1, \ldots, x_k \in X$,
• positive integers $n_1, \ldots, n_k$, and
• a strictly decreasing sequence of positive real numbers $\epsilon_1, \ldots, \epsilon_k$
with the property that, for all $i = 1, \ldots, k$,
$$d(T^{jn_i}x_i, x_{i-1}) < \epsilon_{i-1} \quad\text{for } j = 1, \ldots, N, \tag{9.8}$$
and, if $d(x, x_i) < \epsilon_i$, additionally
$$d(T^{jn_i}x, x_{i-1}) < \epsilon_{i-1} \quad\text{for } j = 1, \ldots, N \tag{9.9}$$
is true. Then, by (9.6), there exist (as in the case $i = 0$ above) $x_{k+1} \in X$ and $n_{k+1} \in \mathbb{N}$ such that
$$d(T^{jn_{k+1}}x_{k+1}, x_k) < \epsilon_k \quad\text{for } j = 1, \ldots, N.$$
Now we choose $\epsilon_{k+1} \in (0, \epsilon_k)$ such that $d(x, x_{k+1}) < \epsilon_{k+1}$ implies
$$d(T^{jn_{k+1}}x, x_k) < \epsilon_k \quad\text{for } j = 1, \ldots, N.$$
Hence, (9.8) and (9.9) hold with $i = k + 1$. This process can be continued ad infinitum, which finishes the induction. Finally, we let $i = \ell - 1, \ell - 2, \ldots$ and deduce for $i < \ell$ from (9.8) and (9.9) that
$$d(T^{j(n_{i+1}+\ldots+n_\ell)}x_\ell, x_i) < \epsilon_i \quad\text{for } j = 1, \ldots, N.$$
Since $X$ is compact, there exists a finite covering of $X$ by $r$ open balls of radius $\frac{\epsilon_0}{2}$. Consequently, by the pigeonhole principle applied to the points $x_0, \ldots, x_r$, there are indices $i, \ell$ satisfying $0 \le i < \ell \le r$ and $d(x_i, x_\ell) < \epsilon_0$. Setting $m = n_{i+1} + \ldots + n_\ell$, it follows from $\epsilon_i \le \epsilon_0$ that
$$d(T^{jm}x_\ell, x_\ell) \le d(T^{jm}x_\ell, x_i) + d(x_i, x_\ell) < 2\epsilon_0 \quad\text{for } j = 1, \ldots, N.$$
Since $\epsilon_0 > 0$ is arbitrary, we conclude the assertion of Theorem 9.2 in the special case $T_j = T^j$ for $j = 1, \ldots, N$. □

The given proof of van der Waerden's theorem uses some infinitary tools (e.g., the theorem of Tychonoff, Zorn's lemma, and the theorem of Heine-Borel). In fact, one can circumvent these statements by a quantitative approach which leads to a purely combinatorial proof. For this and further thoughts we refer to [135].

Chaotic or random structures, if sufficiently large, do contain regular substructures. This is the essence of the above results. The van der Waerden theorem allows many applications. We give an example which is related to the distribution of values of quadratic polynomials modulo one (which closes the circle of our course):

Corollary 9.4. Given a real number $\alpha$ and some $\epsilon > 0$, there exist infinitely many $m \in \mathbb{N}$ such that
$$\|\alpha m^2\| < \epsilon.$$
Here $\|x\|$ denotes the minimal distance of $x$ to an integer.

The corollary can be proved in various ways, for instance, along the uniform distribution results of Weyl; however, the following proof is of a completely different nature:

Proof. We dissect the unit interval into finitely many disjoint subintervals $I$, each of length $\le \frac{\epsilon}{2}$. Then each of the sets
$$\{n \in \mathbb{N} : \tfrac12\alpha n^2 \bmod 1 \in I\}$$
defines a subset of $\mathbb{N}$ which does not intersect any other such set.
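The partition of $\mathbb{N}$ used in this proof can be traced numerically. The sketch below works in floating-point arithmetic with the illustrative choices $\alpha = \sqrt{2}$ and $\epsilon = 0.2$ (assumptions, not taken from these notes). It colours each $n \le$ `limit` by the subinterval of $[0, 1)$ containing $\frac12\alpha n^2 \bmod 1$, searches for a three-term arithmetic progression $n, n+d, n+2d$ inside one class, and reports $\|\alpha d^2\|$ for the common difference found. The helper names `distance_to_nearest_integer`, `colour`, and `find_progression` are hypothetical, and the brute-force search merely stands in for van der Waerden's theorem, which guarantees that such a progression exists but not where.

```python
import math
from typing import Optional, Tuple


def distance_to_nearest_integer(x: float) -> float:
    """Return ||x||, the distance from x to the nearest integer."""
    return abs(x - round(x))


def colour(n: int, alpha: float, r: int) -> int:
    """Index i of the subinterval [i/r, (i+1)/r) containing alpha*n^2/2 mod 1."""
    frac = (0.5 * alpha * n * n) % 1.0
    return min(int(frac * r), r - 1)  # guard against rounding up to r


def find_progression(alpha: float, eps: float,
                     limit: int = 5000) -> Optional[Tuple[int, int]]:
    """Search for (n, d) with n, n+d, n+2d <= limit all in the same class.

    The unit interval is cut into r = ceil(2/eps) pieces of length <= eps/2.
    Returns None if no progression is found below the (arbitrary) limit.
    """
    r = math.ceil(2.0 / eps)
    colours = [colour(n, alpha, r) for n in range(limit + 1)]  # index 0 unused
    for d in range(1, limit // 2 + 1):
        for n in range(1, limit - 2 * d + 1):
            if colours[n] == colours[n + d] == colours[n + 2 * d]:
                return n, d
    return None


alpha = math.sqrt(2.0)
eps = 0.2
result = find_progression(alpha, eps)
if result is not None:
    n, d = result
    print("progression:", n, n + d, n + 2 * d)
    print("||alpha * d^2|| =", distance_to_nearest_integer(alpha * d * d))
```

The search runs over the common difference first, so it returns the smallest $d$ for which some class contains such a progression; larger values of `limit` or smaller values of `eps` make the search slower, since no bound on the location of the progression is used.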
By the theorem of van der Waerden at least one of those subsets of $\mathbb{N}$ contains an arithmetic progression of length three with common difference $d$ as large as we please (by removing terms from longer arithmetic progressions). Hence, there exists $n \in \mathbb{N}$ such that
$$\tfrac12\alpha n^2,\quad \tfrac12\alpha(n+d)^2,\quad \tfrac12\alpha(n+2d)^2 \ \in I \pmod 1$$
for some $I$. Next we consider the identity
$$\tfrac12\alpha n^2 - 2\cdot\tfrac12\alpha(n+d)^2 + \tfrac12\alpha(n+2d)^2 = \alpha d^2.$$
The left-hand side is the sum of two differences of elements of $I$ modulo one, each of which is $\le \frac{\epsilon}{2}$ in absolute value. This implies the inequality for $m = d$. Since $d$ may be taken as large as we please, we obtain infinitely many such $m \in \mathbb{N}$. □

Erdős offered 3000 US dollars for a proof of the following still open conjecture:¶

If $(a_n)$ is a strictly increasing sequence of positive integers and
$$\sum_{n=1}^{\infty} \frac{1}{a_n}$$
diverges, then the sequence contains arbitrarily long arithmetic progressions.

In particular, this would imply the Green & Tao theorem, since the series over the reciprocals of the primes diverges, as Euler already knew (see the introduction to Chapter 6).

¶ Actually, Erdős offered many such prizes for his countless conjectures, starting from 5 dollars, the amount being an indicator of the expected degree of difficulty. It is said that Erdős claimed that he could just as well announce a prize of $10^6$ dollars for the above conjecture since he would not see the proof in his lifetime. Erdős died in 1996.

Exercises

Exercise 9.1. Give a proof of Theorem 9.2 for the general case of arbitrary commuting homeomorphisms $T_1, \ldots, T_N$. (Help can be found in [114].)

What could be more appropriate for the last task in these course notes than

Exercise 9.2. Try to prove Erdős' conjecture that whenever $(a_n)$ is a strictly increasing sequence of positive integers with diverging series $\sum_{n=1}^{\infty} \frac{1}{a_n}$, then $(a_n)$ contains arbitrarily long arithmetic progressions. (And if you succeed, please inform me!)

* * *

Ergodic theory exhibits patterns in (random or deterministic) data sets.
So it is no surprise that ergodic theory has important applications in information theory. Basic questions are: what is randomness, and how much randomness is there in a given set of data? In 1948 Shannon quantified randomness by entropy and laid the foundations of information theory and modern digital communication technology. Kolmogorov and Sinai extended the definition to the study of measure preserving transformations. For the fundamental results in this direction and applications in data compression we refer to the book of Choe [31].

However, there is much more that could and should be said about ergodic theory in general and ergodic number theory in particular. We cannot do better than to refer to the rich literature on this fascinating topic: although Halmos' book [58] is very thin, it contains an interesting list of unsolved problems which are still open. The standard reference on ergodic theory is Krengel's monograph [91]. The new book [47] by Einsiedler & Ward offers many number theoretical applications. The standard reference for metrical continued fraction theory is [71], due to Iosifescu & Kraaikamp.

Biographical and Historical Notes

We conclude with some biographical and historical notes related to ergodic number theory. Our selection does not pretend to be complete. In fact, we will be very brief with those mathematicians who are either very famous or have not directly contributed to our topic. We cannot explain all the historical interactions which are in one way or another related to the topic of these notes, but we intend to comment on a few interesting incidents related to the story behind them. The biographical accounts are mostly based on 'The MacTutor History of Mathematics archive', http://turnbull.mcs.st-and.ac.uk/~history/.

Questions about Life: Laplace's Demon and Boltzmann's Brain

The scientific revolution of Copernicus, Brahe, Galileo, and others was the starting point for a new look at old questions about life and God.
Philosophers such as John Calvin believed in predestination, which means that there exists a God who determined the fate of the universe for all time and space before Creation. Although this is not easily compatible with human free will, we can find non-theistic or polytheistic ideas of determinism, destiny, fate, or doom in many different cultures. Here we are concerned with its counterpart in science.

Pierre-Simon de Laplace, born 1749, was a French mathematician who is well known for his contributions to probability theory, differential equations, and, in particular, mathematical physics. His mathematical career started at the University of Caen before he moved to Paris, where he became professor at the École Militaire. In 1790 he became a member of the Académie des Sciences with the task of standardizing weights and measures. His committee worked on the metric system and advocated a decimal base. However, the social and political upheaval of the French Revolution demanded republican virtues and hatred of kings, and Laplace and his family left Paris for some time until Robespierre was guillotined. With the rise of Napoleon in 1799 Laplace became Minister of the Interior; however, as written in Napoleon's memoirs, Laplace was removed from this position after only six weeks because he brought the spirit of the infinitely small into the government, and he was promoted to the Senate. One direction of his research was devoted to questions about astronomical stability, a recurrent motif in these notes. Here Laplace pointed out that there could be massive stars whose gravity is so great that not even light could escape from their surface, a forerunner of the notion of a black hole. In 1814 Laplace made the following thought experiment: “We may regard the present state of the universe as the effect of its past and the cause of its future.
An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.” This quotation is from Laplace's introduction to his Philosophical essay on probabilities, and the 'intellect' mentioned there is often referred to as Laplace's demon. By Heisenberg's uncertainty principle∗ it turned out that Laplace's thought experiment is inherently incompatible with quantum mechanics, where exact simultaneous measurement of position and velocity is impossible. Actually, Laplace's demon already met its end with the foundation of thermodynamics by Maxwell and his contemporaries in the 19th century: certain processes in nature are irreversible. Laplace died in 1827.

Figure 1. Laplace, Boltzmann, and James Clerk Maxwell, ∗ 1831 - † 1879, who, of course, also played a leading role in the development of thermodynamics.

Another not unrelated point of view was discussed by Ludwig Boltzmann, who lived from 1844 to 1906 and was an Austrian physicist famous for his contributions to statistical mechanics and thermodynamics. His kinetic theory of gases and statistical mechanics was popularized by Paul and Tatiana Ehrenfest [45]. However, the ergodic hypothesis was disproved in 1913 by Rosenthal and Plancherel; we refer to the simple measure-theoretical argument of Carathéodory [27]: the points of an orbit form a set of measure zero whereas the energy surface has non-zero measure. Boltzmann was also one of the most important advocates of atomic theory when that scientific model was highly controversial.
∗ Werner Heisenberg was born in 1901 in Würzburg!

However, here we want to report briefly on a philosophical idea to explain why so much organization is observable in the universe: a 'Boltzmann brain' is a hypothetical self-aware object which arises due to random fluctuations out of some chaotic state. The famous physicist Richard Feynman wrote in his Lectures on Physics [53]: “So far as we know, all the fundamental laws of physics, such as Newton's equations, are reversible. Then where does irreversibility come from? It comes from order going to disorder, but we do not understand this until we know the origin of the order. Why is it that the situations we find ourselves in every day are always out of equilibrium? (...) One possible explanation is the following. Look again at our box of mixed white and black molecules. Now it is possible, if we wait long enough, by sheer, grossly improbable, but possible, accident, that the distribution of molecules gets to be mostly white on one side and mostly black on the other. After that, as time goes on and accidents continue, they get more mixed up again. Thus one possible explanation of the high degree of order in the present-day world is that it is just a question of luck. Perhaps our universe happened to have had a fluctuation of some kind in the past, in which things got somewhat separated, and now they are running back together again. This kind of theory is not unsymmetrical, because we can ask what the separated gas looks like either a little in the future or a little in the past. In either case, we see a grey smear at the interface, because the molecules are mixing again. No matter which way we run time, the gas mixes. So this theory would say the irreversibility is just one of the accidents of life. (...) We would like to argue that this is not the case. Suppose we do not look at the whole box at once, but only at a piece of the box.
Then, at a certain moment, suppose we discover a certain amount of order. In this little piece, white and black are separate. What should we deduce about the condition in places where we have not yet looked? If we really believe that the order arose from complete disorder by a fluctuation, we must surely take the most likely fluctuation which could produce it, and the most likely condition is not that the rest of it has also become disentangled! Therefore, from the hypothesis that the world is a fluctuation, all of the predictions are that if we look at a part of the world we have never seen before, we will find it mixed up, and not like the piece we just looked at. If our order were due to a fluctuation, we would not expect order anywhere but where we have just noticed it. (...) We therefore conclude that the universe is not a fluctuation, and that the order is a memory of conditions when things started. This is not to say that we understand the logic of it. For some reason, the universe at one time had a very low entropy for its energy content, and since then the entropy has increased. So that is the way toward the future. That is the origin of all irreversibility, that is what makes the processes of growth and decay, that makes us remember the past and not the future, remember the things which are closer to that moment in history of the universe when the order was higher than now, and why we are not able to remember things where the disorder is higher than now, which we call the future.”

Rationality vs. Irrationality

The very first observation of irrationality probably dates back to the school of Pythagoras, when Hippasus discovered the existence of irrational quantities or, in the words of the later Eudoxos, that the lengths of the side and the diagonal of a square are incommensurable: $\sqrt{2} \notin \mathbb{Q}$. Previously, Greek philosophy and mathematics believed that everything can be expressed in rational numbers.
This point of view implied a central position of mathematics among all sciences, since the laws of the world are written in mathematical terms or, as Galileo said, “Mathematics is the language with which God has written the universe.” The belief of the ancient Greeks in rational numbers also indicates the incompleteness of the number concept at that time. We often underestimate the efforts and insights of past generations; however, it has been a long way from incommensurability, as it appears in the simple geometry of squares, to Cantor's concept of numbers and sets.

Probably less known are the following quotations of the fourteenth century scientist Nicole Oresme, born around 1320 near Caen in France, died 1382 as bishop of Lisieux. The first is from his work De proportionibus proportionum and claims:

It is probable that two proposed unknown ratios are incommensurable because if many unknown ratios are proposed it is most probable that any [one] would be incommensurable to any [other],

whereas the second appears in his Tractatus de commensurabilitate vel incommensurabilitate motuum celi (from 1351!), in which he considers two bodies moving on a circle with uniform but incommensurable velocities:

No sector of a circle is so small that two such bodies could not conjunct in it at some future time, and could not have conjuncted in it sometime [in the past].

These sentences form part of Oresme's refutation of astrology. His viewpoint is that the future is essentially unpredictable, which he tries to illustrate with a dynamical system! These quotations indicate Oresme's deep understanding of irrationality and circle rotations. In modern mathematical language his observation is that the rational numbers form a null set (in the sense of Lebesgue) and that the multiples of an irrational number lie dense in the unit interval modulo one (cf. [108], §4.2).
This is indeed a remarkable observation, which Oresme made more than half a millennium ahead of his time; we refer to [145] for a detailed analysis of his reasoning. Oresme is well known for his opposition to Aristotle's astronomy; indeed, he thought about the rotation of the Earth about two centuries before Copernicus, although in the end he rejected his new ideas. Moreover, he wrote an interesting treatise on the speed of light, and he invented a kind of coordinate geometry before Descartes, to mention just a few of his ingenious ideas.

Figure 2. From left to right: Oresme, Kronecker, Cantor.

Another Competition: Intuitionism and Infinity

Leopold Kronecker was a German mathematician who lived from 1823 to 1891 and is well known for rivalries with contemporaries. Among his academic teachers were Kummer and Dirichlet; he also got to know Eisenstein and Jacobi during his studies, and they all influenced him to work on number theory and elliptic functions. Kronecker became a wealthy man by successfully managing the banking business of his uncle and marrying his uncle's daughter. From then on he did not need to take on paid employment and spent much time on mathematical research. He moved to Berlin and became an elected member of the Berlin Academy of Sciences. Kronecker is quite famous for his sentence “God created the integers, all else is the work of man”, in which he expressed his belief that all mathematics could be reduced to arguments which involve only the integers and a finite number of steps. In the 1870s he was opposed to the use of irrational numbers, limits, and the theorem of Bolzano-Weierstrass because of their non-constructive nature. Also transcendental numbers could, following Kronecker, not exist.† This ended in a conflict, in particular with the new concepts of set and cardinality introduced by Cantor.
However, it should be noticed that there were more mathematicians than only Kronecker who shared his strong feelings against arguments involving, for example, infinity. Kronecker's viewpoint was further developed by Poincaré and Brouwer, who introduced the name intuitionism to stress that mathematics is made by mathematicians and thus has priority over logic.

Kronecker's strong feelings against the concept of infinity have to be seen in the context of a new mathematical theory rising in the second half of the nineteenth century: set theory, invented by Georg Ferdinand Ludwig Philipp Cantor, a German mathematician born in 1845 into the family of a broker at the St. Petersburg Stock Exchange. During his studies in Berlin Cantor was mostly interested in number theory; however, after his Ph.D. he changed his field from number theory to analysis and moved to Halle. In the 1870s he started his revolutionary work on the foundation of set theory, as we would describe it today. We know of his progress from his correspondence with his friend and colleague Dedekind. In 1873 Cantor proved that the rational numbers are countable and that the real numbers are not countable. For this purpose he introduced the concept of bijective mappings to mathematics. Implicitly, he also showed by these means that the algebraic numbers are countable and thus almost all numbers are transcendental. Soon after, Cantor succeeded in showing that there is a bijective map from the unit interval to $d$-dimensional space with arbitrary $d$. His paper on this astonishing result was treated with suspicion by Kronecker, and it was published only after Dedekind intervened on Cantor's behalf. In the period from 1879 to 1884 Cantor published his major papers on set theory, including transfinite arithmetic; however, realizing that his ideas were not widely accepted, Cantor had a first recorded attack of depression.
Another reason could also be his inability to prove his continuum hypothesis that the order of infinity of the real numbers is the next after that of the integers (a question which was settled in the twentieth century by Gödel and Cohen, who showed that it can be neither disproved nor proved from the usual axioms of set theory). In 1890 Cantor founded the Deutsche Mathematiker-Vereinigung and organized the first meeting of this association in Halle one year later. In the late 1890s Cantor's ideas were finally accepted by the mathematical community. Both Hurwitz and Hadamard expressed their positive opinion of his work in their lectures at the first International Congress of Mathematicians in Zurich in 1897. At this time Cantor discovered the first paradox in set theory which, probably, caused another depression. For an example of such a paradox we quote from Russell the 'barber paradox' of the male barber who shaves all men of his town who do not shave themselves; but who shaves the barber? During these heavy periods of mental illness he was concerned with questions of philosophy and literature (which led him to his belief that Francis Bacon wrote the plays of Shakespeare). In his last years Cantor was faced with the deaths of some of his children and further periods of depression. He died in 1918 of a heart attack. Legendary are the words of Hilbert concerning Cantor's work, in English translation: “No one will drive us from the paradise which Cantor created for us.”

† How this goes along with his approximation theorem, the dear reader may investigate on her or his own!

It might be a bit ironic that Kronecker's results on diophantine approximation give motivation for several mathematical theories, e.g., ergodic theory, which do not question the methods but heavily use non-intuitionistic concepts such as infinity or even Zorn's lemma.

Figure 3. From left to right: Hermann Weyl, ∗ 1885 - † 1955, and Edmund Hlawka, ∗ 1916 - † 2009; both contributed to uniform distribution theory.
Unfortunately, I could not find any picture of Bohl; we compensate for this with a short biography. Less known is the Latvian mathematician Piers Bohl, who lived from 1865 to 1921. Besides his contributions to uniform distribution theory, Bohl is also known for his proof of Brouwer's fixed-point theorem for continuous mappings from a sphere into itself, although this result provoked only little interest at that time. Latvia had been under Russian rule since the 18th century; in 1914, because of World War I, Bohl's institute at Riga was evacuated to Moscow, and Bohl went to Moscow with his colleagues. When Latvia gained independence after the Russian Revolution of 1917 and the end of World War I in 1918, although this lasted only until World War II started, Bohl returned to Riga for a chair at the University of Latvia, which had just been established. He died prematurely of a stroke soon after.

New Theories: Measure, Integral, and Probability

There is an interesting old quotation of Galileo: “Measure what is measurable, and make measurable what is not so.” This is indeed a good motto for describing the contributions of French analysts at the turn of the twentieth century.

Émile Borel lived from 1871 to 1956. Already in his thesis he obtained important results on the theory of measure and divergent series; it also contains the famous covering theorem which nowadays is called the Heine-Borel theorem (and is taught in any course on calculus). In the following very productive years (in quantity and quality) he introduced in particular the notion of σ-additivity (cf. [125]), and some of his works are related to Einstein's theory of relativity, which shows a remarkable spectrum of interests. He certainly was influenced by mathematicians such as Jordan, Picard, Goursat, Painlevé, and Appell, whose daughter Marguerite, a quite famous author writing under the name Camille Marbo, he married.
In 1897 he was joint secretary at the first International Congress of Mathematicians held in Zurich, and in 1905 he was elected president of the French Mathematical Society. Besides many awards and further activities it should be mentioned that he founded the Institut de Statistique de l'Université de Paris in 1922 and, with the financial support of Rockefeller and Rothschild, set up the Institut Henri Poincaré in 1928. In his many papers on probability theory he stressed its practical value and its variety of applications in various sciences. In the 1920s Borel started a second career in politics. He joined the Republican-Socialist Party, the group to which Painlevé also belonged. Borel became Minister of the Navy in the French Government in the period from 1925 until 1940. During World War II he still produced mathematical works of high level besides his activities in the Résistance. He was arrested and imprisoned in 1941 but released after one month. For his resistance against Vichy he was awarded the Grand Croix of the Légion d'Honneur. In 1948 he became president of the Science Committee of UNESCO.

Borel's mathematical ideas had a successor in Henri Lebesgue, another French mathematician, who lived from 1875 to 1941. His doctoral thesis and early papers were a real breakthrough. In the 19th century analysis was largely limited to continuous functions. Generalizing the concept of the Riemann integral as the area below a curve to discontinuous functions, Lebesgue contributed one of the biggest achievements of modern analysis. In 1901, building on previous works of Borel and Jordan, he worked out the theory of measure and formulated the definition of what is now known as the Lebesgue integral. At more or less the same time the English mathematician Young independently developed an integration theory analogous to the one of Lebesgue; however, as Burkill wrote, “He did not meet the recognition he deserved.
This was due in part to his late start, and in part to a certain conservative hostility to the modern theory of real functions - a theory which few Englishmen in the early years of this century understood. Even when his profundity and originality were better appreciated, he was passed over in elections to chairs in favour of men who might be expected to be less exacting colleagues.” Young's definitions of measure and integration were different from but essentially equivalent to those of Lebesgue. Whereas Young's work is almost forgotten, Lebesgue's thesis had a deep impact on Fourier analysis and reawakened this field after a calm period in the second half of the 19th century. In his later career Lebesgue's interests moved to topology, potential theory, set theory, and the theory of surfaces. He also contributed to pedagogy and the history of science.

Figure 4. From left to right: Borel, Lebesgue, William Henry Young, ∗ 1863 - † 1942, and Kolmogorov.

It took some while before the new concepts of measure and integration were adopted in other disciplines of mathematics, in particular, in probability theory. Andrey Nikolaevich Kolmogorov was born in 1903 in Russia. His career and his contributions to various mathematical fields (including dynamical systems) were exceptional. In 1933 Kolmogorov published his influential Grundbegriffe der Wahrscheinlichkeitsrechnung [88], a treatise on probability theory with the first widely accepted axiomatic setting (see [125] for its history and reception). Actually, his approach is now widely considered as the mathematical foundation of probability. Before Kolmogorov rather different concepts had been proposed, from the very simple experimental one due to Laplace to several more developed, however in their details unsatisfying, approaches by the French and German schools. Hilbert's Sixth Problem proposed “to treat (...)
by means of axioms those physical sciences in which mathematics plays an important part; in the first rank are the theory of probability and mechanics.” In this sense, Kolmogorov solved the probabilistic part of this task. Moreover, together with his student Arnold he settled Hilbert’s Thirteenth Problem in 1957 by showing that every continuous function of three variables can be represented by means of continuous functions of two variables, contrary to what Hilbert had conjectured. Kolmogorov died in 1987.

The Struggle for the First Ergodic Theorem

We start with the famous French mathematician and physicist Henri Poincaré, who lived from 1854 to 1912, and is often compared with Hilbert for being the last universalist in mathematics (although he had started his scientific career as a mining engineer). He contributed to numerous branches of mathematics. In his 270-page paper [112] Poincaré solved part of the three-body problem, that is, the mathematical description of the orbits of three bodies interacting through gravity. This extraordinary work was awarded a prize by the Swedish king Oscar II on the occasion of his sixtieth birthday; however, the publication of this work was delayed by three years and fifty letters of correspondence with Phragmén and Mittag-Leffler, who found a gap in the original version. With his approach Poincaré laid the foundations for treating chaotic motion and invariant integrals. The complete analytic solution of the three-body problem was given by Sundman in 1907. The stability of a system consisting of three bodies is described by the KAM theory due to Kolmogorov, Arnold & Moser, established in the period 1954-1964. Poincaré’s studies on topology were path-breaking; in particular, his conjecture about the characterization of the three-dimensional sphere among three-dimensional manifolds influenced much of the research in this field before Perelman’s solution of this millennium problem in 2003.
His works on automorphic forms and elliptic curves were the starting points of fruitful investigations in number theory in the twentieth century. In physics Poincaré is considered, together with Lorentz and, of course, Einstein, as one of the discoverers of special relativity. He died prematurely of an embolism; soon afterwards his cousin, Raymond Poincaré, became President of France, from 1913 to 1920. For a more elaborate appreciation of his life and work see http://turnbull.mcs.st-and.ac.uk/~history/.

Figure 5. From left to right: Poincaré, Birkhoff, von Neumann

For the field of ergodic theory Poincaré’s investigations related to the ergodic hypothesis were most influential. George D. Birkhoff, ∗ 1884 - † 1944, was probably the most famous American mathematician of his time. He worked at the universities of Harvard and Princeton and his main field of interest was analysis, in particular mathematical physics, where he proved Poincaré’s ‘Last Geometric Theorem’, a special case of the three-body problem. Moreover, he studied the four colour problem and, of course, dynamical systems. His ergodic theorem gave a rigorous foundation for Maxwell’s kinetic theory of gases. Here is a quotation by Butler: “Birkhoff’s discovery of what has come to be known as the ‘ergodic theorem’ in 1931 - 32 is his most well-known contribution to dynamics. This theory, which resolved in principle one of the fundamental problems arising in the theory of gases and statistical mechanics, has been influential not only in dynamics itself but also in probability theory, group theory, and functional analysis.” Birkhoff was awarded the first Bôcher Memorial Prize of the American Mathematical Society; later he was vice-president of this association. However, there is also something negative to report about his life. According to Einstein he was one of the world’s greatest anti-Semites.
And it is said that Birkhoff used his influential position to prevent the appointment of Jewish scientists. It should be noted that his son Garrett Birkhoff, ∗ 1911 - † 1996, also played an important role in ergodic theory. In contrast to his father, Garrett was not anti-Semitic. At first he worked in group theory; later, during and after World War II, he changed his interests and studied problems in applied mathematics (in particular in numerical linear algebra). During this time he became a friend of John von Neumann and they wrote an influential joint paper on the logic of quantum mechanics. We conclude with a quotation by Davis on both Birkhoffs: “G D Birkhoff was an early teacher of mine, and his son Garrett was my (much appreciated) thesis supervisor. G D (but not Garrett) was consistently anti-Semitic, as shown in correspondence over the years (...) He systematically kept Jews out of his department, but apparently relented late in life and favoured appointing ONE by the 1940s. He also helped some Jewish refugees find jobs NOT at Harvard in the 1930s, while acting generally to hinder their entry. Though his record is mixed and some were more implacably anti-Semitic than he was, his actions in this regard are important because of his very great influence. However, it does not seem to be true (as rumoured credibly at the time) that he opposed the appointment of Oscar Zariski to his department. As I mentioned, Garrett was not anti-Semitic at all.”

John von Neumann was born in 1903 in Budapest. Already the young John (in those times János) gave proof of his brilliant memory, as we quote from Poundstone: “At the age of six, he was able to exchange jokes with his father in classical Greek. The Neumann family sometimes entertained guests with demonstrations of Johnny’s ability to memorise phone books. A guest would select a page and column of the phone book at random. Young Johnny read the column over a few times, then handed the book back to the guest.
He could answer any question put to him (who has number such and such?) or recite names, addresses, and numbers in order.”

Starting in 1921 von Neumann studied mathematics and chemistry in Budapest, Berlin, and Zurich; amongst his teachers were Weyl and Pólya. In 1926 he finished his doctoral studies with a dissertation about ordinal numbers in set theory. Afterwards he taught in Berlin, Hamburg, and Göttingen (together with Hilbert). At Veblen’s invitation von Neumann came to Princeton in 1929 to teach quantum mechanics; soon afterwards he became professor at the newly founded Institute for Advanced Study (together with Alexander, Einstein, Morse, Veblen, and Weyl). At the same time he held academic positions in Germany; however, after the takeover by the Nazi party many Jewish scientists, von Neumann among them, had to leave Germany. His research was extremely broad, ranging from pure directions such as logic, axiomatic set theory, and measure theory to more applied branches such as partial differential equations, mathematical foundations of quantum mechanics, statistical mechanics, and operator theory. In this context he discovered the very first ergodic theorem. Moreover, he contributed to Haar’s development of measure theory for groups, which led to a partial solution of the fifth Hilbert problem on characterizing Lie groups. He is also considered the founder of game theory and of the concept of cellular automata in computer science. He was awarded numerous prizes for his mathematics and, besides, he was also well-known for parties and driving. During World War II he contributed many ideas to the construction of the atomic and hydrogen bombs in Los Alamos. John von Neumann died prematurely of cancer in 1957 at the age of 53.

Figure 6. From left to right: Eberhard Frederich Ferdinand Hopf, ∗ 1902 - † 1983, an Austrian mathematician who contributed to topology and ergodic theory; Marc Kac, ∗ 1914 - † 1984, Polish-U.S.
American mathematician with wide-ranging interests in probability theory, physics, and number theory; Shizuo Kakutani, ∗ 1911 - † 2004, Osaka-born mathematician, who proved together with Yosida the first maximal ergodic theorem.

In particular, the question of priority in the discovery of the ergodic theorems of von Neumann and Birkhoff and the process of their publication is of interest. It is a fact that von Neumann was first, although his publication appeared a little later. According to von Neumann’s mean ergodic theorem the Cesàro mean $\frac{1}{N}\sum_{0\le n<N} f(T^n x)$ converges in the mean (of order two) as $N\to\infty$, while Birkhoff’s pointwise ergodic theorem claims its convergence almost everywhere. For applications in physics, convergence in the mean is sufficient. However, for mathematics both are relevant.‡

‡ In this course we have used Birkhoff’s theorem several times and von Neumann’s not at all.

In an interesting recent paper, Zund [156] presents a lost letter of von Neumann to his friend Robertson from January 1932 in which the process of von Neumann’s discovery of his ergodic theorem is outlined and which indicates Birkhoff’s unwillingness to postpone his article [20] until the publication of von Neumann’s [105]. The latter article was originally written in German and its translation into English cost von Neumann and his friend Koopman some time. It should be noted that von Neumann was then aiming to move from Europe to the U.S. In [20] Birkhoff’s citation is inadequate since it does not indicate the priority of von Neumann’s result: “The important recent work of von Neumann (not yet published) shows only that there is convergence in the mean, (...) and the time probability is not established in the usual sense for any trajectory. A direct proof of von Neumann’s results (not yet published) has been obtained by E.
Hopf.” Later Birkhoff changed his mind, probably owing to an intervention by Veblen (as Zund speculates), and mentioned the priority of von Neumann’s discovery explicitly. The paper by Hopf [67] also contains some ideas concerning improvements of Birkhoff’s theorem. Whereas Birkhoff’s pointwise ergodic theorem is based on Lebesgue’s integral and measure theory, von Neumann’s approach is a first and striking example of abstract Hilbert space theory. The latter was popularized by von Neumann himself through his work on quantum mechanics and by papers of Koopman, a former doctoral student of Birkhoff, who worked on the weak ergodic hypothesis.

The Unreasonable Effectiveness of Analysis

This is a variation of the title of an interesting article by the physicist Eugene Wigner [152] on the ubiquity of mathematics in nature. We have already mentioned the merits of analysis when applied to number theoretical problems, e.g., in the case of the Riemann zeta-function and its relation to the distribution of prime numbers. We do not need to write biographical sketches of those who contributed to prime number theory since, thanks to the famous open Riemann hypothesis, much has been written about their lives.§ But we quote from Davenport, who wrote: “Analytic number theory may be said to begin with the work of Dirichlet, and in particular with Dirichlet’s memoir of 1837 on the existence of primes in a given arithmetic progression.” Indeed, for his proof from 1837/1838 of the existence of infinitely many primes in any prime residue class, Dirichlet introduced the important concepts of characters and Dirichlet series. However, for many reasons one may consider Euler as the first to apply analytic methods to arithmetical problems (see the introduction to Chapter 6).

Figure 7.
From left to right: Leonhard Euler, Carl Friedrich Gauss, Johann Peter Gustav Lejeune Dirichlet (his family had roots in the Belgian town of Richelet, which explains part of his name, Le jeune de Richelet standing for Young from Richelet), and Bernhard Riemann.

§ A good read is the popular book The Music of the Primes by du Sautoy.

Another such success story of ‘applied analysis’ is the body of results on the statistics of continued fraction expansions, starting with Gauss’ observations, continuing with the rigorous proofs thereof by Lévy and Kuzmin, and leading to Khintchine’s constant among other interesting results. We start with Paul Lévy, a French mathematician who lived from 1886 to 1971. A pupil of Hadamard, he worked in analysis, probability and measure theory; he came to focus on probability theory in 1919 on the occasion of a series of lectures he was asked to deliver at the École Polytechnique. Recall that there was no mathematical foundation of probability at that time; nevertheless, probability theory attracted many young researchers in those days. Besides Kolmogorov in Russia it was Lévy in France who pushed this field forward; we give a quotation of Loève: “Paul Lévy was a painter in the probabilistic world. Like the very great painting geniuses, his palette was his own and his paintings transmuted forever our vision of reality. (...) His three main, somewhat overlapping, periods were: the limit laws period, the great period of additive processes and of martingales painted in pathtime colours, and the Brownian pathfinder period.”

The next contributor to be mentioned is Aleksandr Yakovlevich Khintchine, ∗ 1894 - † 1959, a Russian mathematician, which makes the transcription of his name from Cyrillic into Latin letters difficult. Besides mathematics he was fascinated all his life by poetry and theatre.
In mathematics he started out in analysis and probability theory and later turned to number-theoretic problems, where, however, his approach was always analytic. At Moscow University he founded, together with Kolmogorov and others, including their pupil Gnedenko, the school of probability theory. During this time he widened his interests again and started research in statistical mechanics and information theory. Remarkably, Khintchine wrote monographs on more or less each of these topics, which became standard references and were translated into various languages. In 1939 he became an elected member of the USSR Academy of Sciences and in the following year he was awarded the State Prize for scientific achievements.

Figure 8. From left to right: Khintchine, Lévy, Doeblin; unfortunately, I could not find a picture of Kuzmin.

We conclude with Wolfgang Doeblin, not because of his interesting but unfortunately short life, but because of his deep contributions to mathematics. Doeblin was born in 1915 in Berlin; his father, Alfred Döblin, was a famous German writer (his masterpiece is ‘Berlin Alexanderplatz’) who emigrated with his family from Nazi Germany in 1933, first to Switzerland and later to France, where they obtained French citizenship. In 1936 Wolfgang Döblin changed his name to the French Vincent Doeblin (although he signed his mathematical works as Wolfgang Doeblin) and started to study economics and statistics at the Sorbonne. Among his academic teachers were Denjoy, Fréchet, and Lévy. He finished his thesis in 1938, the same year in which he was recruited into the French army. During the war he continued his studies in probability theory. In February 1940 he sent his treatise Sur l’équation de Kolmogoroff to the Académie des Sciences in Paris because he was afraid that his studies could get lost. When German troops closed in on his battalion in June 1940, Doeblin committed suicide.
For a long time his contributions to mathematics were not fully credited (apart from Billingsley [17], p. 49), until Iosifescu [70] gave an extensive analysis of his work. Doeblin’s letter to the Académie was opened only in 2000 and its content came as a big surprise to the leading probabilists all around the world. His treatise anticipated many results on stochastic analysis which were found only in the 1950s and even 60s.¶

¶ The biography ‘The lost equation. In search of Wolfgang and Alfred Doeblin’ by Marc Petit gives a very readable account of his and his father’s lives.

Last but not Least

Much has been written about Paul Erdös, ∗ 1913 - † 1996, the Hungarian cosmopolitan who published more papers and proposed more conjectures than any other mathematician. He worked in various fields such as combinatorics, graph theory, number theory, and probability theory. For most of his life he traveled from one mathematics department to the next, giving and searching for inspiration for his mathematics. There was an interruption during the McCarthy anti-communist era in the 1950s when the U.S. government denied Erdös a re-entry visa into the United States; the reasons have never been explained, but this was probably related to the Cold War and to Erdös being a citizen of a communist country. In 1963 the U.S. government changed its opinion and he resumed including American universities in his teaching and travels.

We could continue with biographical sketches of another cosmopolitan, Bartel van der Waerden, his important work in various branches of mathematics, his contributions to quantum physics, and his historical pieces popularizing mathematics. And we should also write about the work of many others, since mathematics is discovered by humans and often the discovery itself is connected with an interesting life. However, we had better be brief and conclude with two of our heroes, from left to right: Arnold and Felix:
Notations

We indicate here some of the notation and conventions used in these notes. However, this list is not complete. We omit notions which appear in only one chapter (where they are defined in situ), which are covered by the index, or which are standard. As usual, we denote by N = {1, 2, 3, . . .} the set of positive integers. The sets of integers, rational numbers, real numbers, and complex numbers are denoted by Z, Q, R, and C, respectively. The logarithm is, if not indicated otherwise, always taken to the base e = exp(1). The integer part and fractional part of a real number x are denoted by ⌊x⌋ and {x}, respectively. Very convenient is the use of the Landau and Vinogradov symbols. Given two functions f(x) and g(x), both defined for x ∈ X, where g(x) is positive for all x ∈ X, we write
• f(x) = O(g(x)) and f(x) ≪ g(x), respectively, if there exists a constant C ≥ 0 such that |f(x)| ≤ Cg(x) for all x ∈ X;
• f(x) ≍ g(x) if f(x) ≪ g(x) ≪ |f(x)|;
here X is specified either explicitly or implicitly. Usually, the set X is an interval [ξ, ∞) for some real number ξ; in this case we also write
• f(x) ∼ g(x) if the limit $\lim_{x\to\infty} |f(x)|/g(x)$ exists and is equal to 1;
• f(x) = o(g(x)) if the latter limit exists and is equal to zero;
• f(x) = Ω(g(x)) if $\limsup_{x\to\infty} |f(x)|/g(x) > 0$ (this is the negation of f(x) = o(g(x))).
Sometimes the limit x → ∞ is replaced by another limit x → x0, where x0 is some complex number; in this case the limit point x0 is stated explicitly. In estimates, ε always denotes a small positive number, not necessarily the same at each appearance. We denote by ♯A the cardinality of a set A. For a probability measure we use the bold faced letter P, and E stands for the expectation.

Bibliography

[1] B. Adamczewski, Y. Bugeaud, On the Littlewood conjecture in simultaneous Diophantine approximation, J. London Math. Soc. 73 (2006), 355-366
[2] R.L. Adler, B. Weiss, The ergodic infinite measure preserving transformation of Boole, Israel J.
Math. 16 (1973), 263-278
[3] M. Agrawal, N. Kayal, N. Saxena, PRIMES is in P, Ann. of Math. 160 (2004), 781-793
[4] R. Apéry, Irrationalité de ζ(2) et ζ(3), Astérisque 61 (1979), 11-13
[5] V.I. Arnold, Stochastic and Deterministic Characteristics of Orbits in Chaotically Looking Dynamical Systems, Trans. Moscow Math. Soc. 70 (2009), 31-69
[6] V.I. Arnold, A. Avez, Ergodic Problems of classical mechanics, Benjamin, NY 1968
[7] H. Aslaksen, When is Chinese New Year?, available at www.math.nus.edu.sg/aslaksen/
[8] J. Avigad, P. Gerhardy, H. Towsner, Local stability of ergodic averages, Trans. A.M.S. 362 (2010), 261-288
[9] L. Báez-Duarte, Sobre el promedio espacial del ciclo de Poincaré, Bull. Venezuela Acad. Sciences 24 (1964), 64-66; English translation at http://front.math.ucdavis.edu/0505.5625
[10] D.H. Bailey, P.B. Borwein, S. Plouffe, On the rapid computation of various polylogarithmic constants, Math. Comp. 66 (1997), 903-913
[11] D.H. Bailey, R.E. Crandall, On the random character of fundamental constant expansions, Exper. Math. 10 (2001), 175-190
[12] M. Balazard, E. Saias, M. Yor, Notes sur la fonction de Riemann, 2. Advances Math. 143 (1999), 284-287
[13] F. Bayart, É. Matheron, Dynamics of linear operators, Cambridge University Press 2009
[14] V. Becher, S. Figueira, R. Picchi, Turing’s unpublished algorithm for normal numbers, Theor. Computer Science 377 (2007), 126-138
[15] J. Beck, Super-uniformity of the typical billiard path, in: An irregular mind: Szemerédi is 70, I. Bárány, J. Solymosi (eds.), Springer 2010, 39-130
[16] F. Benford, The law of anomalous numbers, Proc. Amer. Philos. Soc. 78 (1938), 551-572
[17] P. Billingsley, Ergodic theory and Information, John Wiley & Sons, New York 1965
[18] G.D. Birkhoff, Proof of Poincaré’s geometric theorem, Trans. Amer. Math. Soc. 14 (1913), 14-22
[19] G.D. Birkhoff, Démonstration d’un théorème élémentaire sur les fonctions entières, C. R. Acad. Sci. Paris 189 (1929), 473-475
[20] G.D.
Birkhoff, Proof of the ergodic theorem, Proc. Nat. Acad. Sci. USA 17 (1931), 656-660
[21] G.D. Birkhoff, What is the ergodic theorem?, Amer. Math. Monthly 49 (1942), 222-226
[22] P. Bohl, Über ein in der Theorie der säkularen Störungen vorkommendes Problem, J. f. Math. 135 (1909), 189-283
[23] G. Boole, On the comparison of transcendents with certain applications to the theory of definite integrals, Philos. Trans. Roy. Soc. London 147 (1857), 745-803
[24] É. Borel, Les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Matematico di Palermo 27 (1909), 247-271
[25] N.G. de Bruijn, K.A. Post, A remark on uniformly distributed sequences and Riemann integrability, Indagationes math. 30 (1968), 149-150
[26] N. Calkin, H.S. Wilf, Recounting the rationals, Am. Math. Mon. 107 (2000), 360-363
[27] C. Carathéodory, Über den Wiederkehrsatz von Poincaré, Sitzungsberichte Preuß Akad. Wiss. (1919), 580-584
[28] J.W.S. Cassels, On a problem of Steinhaus about normal numbers, Colloq. Math. 7 (1959), 95-101
[29] R.V. Chacon, D.S. Ornstein, A general ergodic theorem, Illinois J. Math. 4 (1960), 153-160
[30] D.G. Champernowne, The construction of decimals normal in the scale of ten, J. London Math. Soc. 8 (1933), 254-260
[31] G.H. Choe, Computational Ergodic Theory, Springer 2005
[32] A.H. Copeland, P. Erdös, Note on normal numbers, Bull. Amer. Math. Soc. 52 (1946), 857-860
[33] W.A. Coppel, Number Theory. An Introduction to Mathematics, Part B, Springer 2006
[34] R. Crandall, C. Pomerance, Prime numbers. A computational perspective, Springer, 2001
[35] J.P. Crutchfield, J.D. Farmer, N.H. Packard, R.S. Shaw, Chaos, Scientific American 255 (1986), 46-57
[36] A. Csordás, P. Szépfalusy, Singularities in Rényi information as phase transitions in chaotic states, Phys. Rev. A 39 (1989), 4767-4777
[37] K. Dajani, C. Kraaikamp, Ergodic theory of numbers, Mathematical Association of America, Washington DC 2002
[38] P.
Deligne, La conjecture de Weil. II. Publ. Math., Inst. Hautes Étud. Sci. 52 (1980), 137-252
[39] A. Denjoy, L’Hypothèse de Riemann sur la distribution des zéros de ζ(s), reliée à la théorie des probabilités, Comptes Rendus Acad. Sci. Paris 192 (1931), 656-658
[40] M. Denker, Einführung in die Analysis dynamischer Systeme, Springer 2005
[41] P. Diaconis, The distributions of leading digits and uniform distribution mod 1, Ann. Probab. 5 (1977), 72-81
[42] F.J. Dyson, H. Falk, Period of a discrete Cat mapping, Amer. Math. Monthly 99 (1992), 603-614
[43] W. Doeblin, Remarques sur la théorie métrique des fractions continues, Compositio Math. 7 (1940), 353-371
[44] W. Duke, Hyperbolic distribution problems and half-integral weight Maass forms, Invent. Math. 92 (1988), 73-90
[45] P. Ehrenfest, T. Ehrenfest, Begriffliche Grundlagen der statistischen Auffassung in der Mechanik, in: Encyklopaedie der Mathematischen Wissenschaften, Teubner, Leipzig 1912, 1-90
[46] M. Einsiedler, A. Katok, E. Lindenstrauss, Invariant measures and the set of exceptions to Littlewood’s conjecture, Ann. of Math. 164 (2005), 513-560
[47] M. Einsiedler, T. Ward, Ergodic Theory: with a view towards Number Theory, Springer 2010
[48] P.D.T.A. Elliott, The Riemann zeta function and coin tossing, J. reine angew. Math. 254 (1972), 100-109
[49] J. Elstrodt, Maß- und Integrationstheorie, Springer 2007, 8th ed.
[50] P. Erdös, P. Turán, On some integer sequences, J. London Math. Society 11 (1936), 261-264
[51] D. Evans, D. Searls, The fluctuation theorem, Advances in Physics 51 (2002), 1529-1585
[52] C. Faivre, Distribution of Lévy constants for quadratic numbers, Acta Arith. 61 (1992), 13-34
[53] R. Feynman, R. Leighton, M. Sands, The Feynman Lectures on Physics, three volumes, Caltech 1964
[54] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. d’Analyse Math. 31 (1977), 204-256
[55] H. Furstenberg, B.
Weiss, Topological dynamics and combinatorial number theory, J. d’Analyse Math. 34 (1978), 61-85
[56] É. Ghys, Variations on Poincaré’s recurrence theorem, in: The Scientific Legacy of Poincaré, É. Charpentier et al. (eds.), AMS, Providence 2010
[57] B.J. Green, T.C. Tao, The Primes contain arbitrarily long arithmetic progressions, Ann. Math. 167 (2008), 481-547
[58] P.R. Halmos, Lectures on Ergodic Theory, Math. Soc. of Japan, Tokyo 1956
[59] G.H. Hardy, Divergent Series, Clarendon Press, Oxford 1949
[60] G. Harman, Metric Number Theory, Clarendon Press, Oxford 1998
[61] G.H. Hardy, E.M. Wright, An introduction to the theory of numbers, Clarendon Press, Oxford, 1979, 5th ed.
[62] H. Heilbronn, On the average length of a class of finite continued fractions, in Number Theory and Analysis (Papers in Honor of Edmund Landau), Plenum, New York 1969, 87-96
[63] F. Hidetoshi, T. Rothman, Sacred Mathematics, Japanese Temple Geometry, Princeton University Press 2008
[64] E. Hlawka, Über die Gleichverteilung gewisser Folgen, welche mit den Nullstellen der Zetafunktion zusammenhängen, Österr. Akad. Wiss., Math.-Naturw. Kl. Abt. II 184 (1975), 459-471
[65] E. Hlawka, Theorie der Gleichverteilung, BIB, Mannheim, 1979
[66] E. Hlawka, C. Binder, Über die Entwicklung der Theorie der Gleichverteilung in den Jahren 1909 bis 1916, Arch. Histor. Exact Sciences 36 (1986), 197-249
[67] E. Hopf, On the time average theorem in dynamics, Proc. Nat. Acad. Sciences 18 (1932), 93-100
[68] W. Hurewicz, Ergodic theorem without invariant measure, Ann. Math. 45 (1944), 192-206
[69] A. Hurwitz, R. Courant, Funktionentheorie, Springer, 4th ed. 1964
[70] M. Iosifescu, Doeblin and the metric theory of continued fractions: a functional-theoretic solution to Gauss’ 1812 problem, in: ‘Doeblin and modern probability’, AMS, Providence 1993, 97-110
[71] M. Iosifescu, C. Kraaikamp, Metrical theory of Continued Fractions, Kluwer 2002
[72] K. Jacobs, Selecta Mathematica IV, Springer 1972
[73] P.
Jolissaint, Loi de Benford, relations de récurrence et suites équidistribuées, Elem. Math. 60 (2005), 10-18
[74] M. Kac, On the notion of recurrence in discrete stochastic processes, Bull. Amer. Math. Soc. 53 (1947), 1002-1010
[75] S. Kakutani, Induced measure preserving transformations, Proc. Imp. Acad. Tokyo 19 (1943), 635-641
[76] S. Kakutani, Examples of ergodic measure preserving transformations which are weakly mixing but not strongly mixing, in “Recent advances in topological dynamics”, Proceedings Conference Yale University in honour of G.A. Hedlund, Lecture Notes Math. 318, Springer 1973, 143-149
[77] T. Kamae, M. Keane, A simple proof of the ratio ergodic theorem, Osaka J. Math. 34 (1997), 653-657
[78] Y. Kanada, D. Takahashi, Calculation of π to 51.5 billion decimal digits on distributed memory parallel processors, Trans. Inform. Process. Soc. Japan 39 (1998), 2074-2083
[79] M. Kesseböhmer, B.O. Stratmann, A multifractal analysis for Stern-Brocot intervals, continued fractions and Diophantine growth rates, J. reine angew. Math. 605 (2007), 133-163
[80] A.Yu. Khintchine, Zu Birkhoffs Lösung des Ergodenproblems, Math. Ann. 107 (1933), 485-488
[81] A.Yu. Khintchine, Metrische Kettenbruchprobleme, Compositio Math. 1 (1935), 361-382
[82] A.Yu. Khintchine, Three pearls of number theory, Graylock Press, Baltimore 1952
[83] A. Klenke, Wahrscheinlichkeitstheorie, Springer 2006
[84] K. Knopp, Mengentheoretische Behandlung einiger Probleme der diophantischen Approximationen und der transfiniten Wahrscheinlichkeiten, Math. Ann. 95 (1926), 409-426
[85] D. König, A. Szücs, Mouvement d’un point abandonné à l’intérieur d’un cube, Palermo Rend. 36 (1913), 79-90 (in Hungarian)
[86] U. Kohlenbach, L. Leuştean, A quantitative mean ergodic theorem for uniformly convex Banach spaces, Ergodic Theory Dyn. Syst. 29 (2009), 1907-1915; erratum ibid. 29 (2009), 1995
[87] J.F. Koksma, Ein mengentheoretischer Satz über die Gleichverteilung modulo 1, Compositio Math.
2 (1935), 250-258
[88] A.N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer 1933
[89] A.N. Kolmogorov, S.V. Fomin, Measure, Lebesgue Integrals, and Hilbert Space, Academic Press, New York and London 1961
[90] A.V. Kontorovich, S.J. Miller, Benford’s law, values of L-functions and the 3x+1 Problem, Acta Arith. 120 (2005), 269-297
[91] U. Krengel, Ergodic theorems, de Gruyter 1985 (with a supplement by A. Brunel)
[92] R.O. Kuzmin, Sur un problème de Gauss, Atti Congr. Intern. Bologne 6 (1928), 83-89
[93] J.C. Lagarias, The ‘3X + 1’ Problem and its generalizations, Amer. Math. Mon. 92 (1985), 3-23
[94] E. Landau, Über die Nullstellen der Zetafunktion, Math. Ann. 71 (1912), 548-564
[95] A. Laurinčikas, Limit theorems for the Riemann zeta-function, Kluwer Academic Publishers, Dordrecht 1996
[96] P. Lévy, Sur les lois de probabilité dont dépendent les quotients complets et incomplets d’une fraction continue, Bull. Soc. Math. France 57 (1929), 178-194
[97] M. Lifshits, M. Weber, Sampling the Lindelöf hypothesis with the Cauchy random walk, Proc. London Math. Soc. 98 (2009), 241-270
[98] Yu. V. Linnik, Ergodic properties of algebraic fields, Springer 1968
[99] J.E. Littlewood, On the zeros of the Riemann zeta-function, Proc. Cambridge Phil. Soc. 22 (1924), 295-318
[100] M.H. Martin, Metrically transitive point transformations, Bull. Amer. Math. Soc. 40 (1934), 606-612
[101] K. Matsumoto, Probabilistic value-distribution theory of zeta-functions, Sugaku 53 (2001), 279-296 (in Japanese); English translation in Sugaku Expositions 17 (2004), 51-71
[102] K.R. Matthews, A.M. Watts, A generalization of Hasse’s generalization of the Syracuse algorithm, Acta Arith. 43 (1984), 167-175
[103] C. Mauduit, J. Rivat, Sur un problème de Gelfond: la somme des chiffres des nombres premiers, Ann. of Math. 171 (2010), 1591-1646
[104] W. Narkiewicz, The development of prime number theory, Springer 2000
[105] J.
von Neumann, Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci. USA 18 (1932), 70-82
[106] L. Kuipers, H. Niederreiter, Uniform distribution of sequences, John Wiley & Sons, New York 1974
[107] I. Niven, Irrational numbers, Carus Mathematical Monographs, John Wiley & Sons 1963
[108] K. Petersen, Ergodic theory, Cambridge University Press 1989, corrected reprint
[109] W. Philipp, Mixing sequences of random variables and probabilistic number theory, Memoirs Amer. Math. Soc. 114, 1971
[110] W. Philipp, O.P. Stackelberg, Zwei Gesetze für Kettenbrüche, Math. Ann. 181 (1969), 152-156
[111] Ch. Pisot, R. Salem, Distribution modulo 1 of the powers of real numbers larger than 1, Compositio Math. 16 (1964), 164-168
[112] H. Poincaré, Sur le problème des trois corps et les équations de la dynamique, Acta Math. 13 (1890), 1-270
[113] H. Poincaré, Les méthodes nouvelles de la mécanique céleste, Gauthier-Villars et Fils, Paris 1892-1899
[114] M. Pollicott, M. Yuri, Dynamical Systems and Ergodic Theory, London Mathematical Society 40, Cambridge University Press, 1998‖
[115] H.A. Rademacher, Fourier Analysis in Number Theory, Symposium on Harmonic Analysis and Related Integral Transforms (Cornell Univ., Ithaca, N.Y., 1956) in: Collected Papers of Hans Rademacher, Vol. II, pp. 434-458, Massachusetts Inst. Tech., Cambridge, Mass., 1974
[116] G.J. Rieger, Mischung und Ergodizität bei Kettenbrüchen nach nächsten Ganzen, J. reine angew. Math. 310 (1979), 171-181
[117] G.J. Rieger, Effective simultaneous approximation of complex numbers by conjugate algebraic integers, Acta Arith. 63 (1993), 325-334
[118] B. Riemann, Über die Anzahl der Primzahlen unterhalb einer gegebenen Grösse, Monatsber. Preuss. Akad. Wiss. Berlin (1859), 671-680
[119] A.M. Rockett, P. Szüsz, Continued fractions, World Scientific 1992
[120] K.F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104-109
[121] W. Rudin, Real and Complex Analysis, McGraw-Hill 1974, 2nd ed.
[122] Ya.
Sinai, The central limit theorem for geodesic flows on manifolds of constant negative curvature, Dokl. Akad. Nauk 133 (1960), 1303-1306; translation in Soviet Math. Dokl. 1 (1960), 983-987 [123] W. Schmidt, On normal numbers, Pacific J. Math. 10 (1960), 661-672 [124] F. Schweiger, Multidimensional continued fractions, Oxford 2000 [125] G. Shafer, V. Vovk, The origings and legacy of Kolmogorov’s Grundbegriffe, available at www.probabilityandfinance.com/article/04.pdf [126] C.E. Shannon, A mathematical theory of communication, Bell System Technical J. 27 (1948), 379-423, 623-656 [127] W. Sierpinski, Démonstration élémentaire d’un théoreme de M. Borel sur les nombres absolument normaux et détermination effective d’un tel nombre, Bull. Soc. Math. France 45 (1917), 125-144 [128] J. Steuding, Diophantine Analysis, Chapman & Hall/CRC Press, Boca Raton 2005 [129] J. Steuding, Value distribution of L-functions, Lecture Notes in Mathematics 1877, Springer 2007 [130] J. Steuding, Sampling the Lı̈ndelöf hypothesis by an ergodic transformation, preprint 2010 k a corrected version is available online at www.warwick.ac.uk/∼masdbl/book.html Bibliography 131 [131] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 199-224 [132] S. Tabachnikov, Geometry and billiards, Amer. Math. Soc., Providence 2005 [133] T.C. Tao, A quantitative ergodic theory proof of Szemerédi’s theorem, Electronic J. Combinatorics 13 (2006), R99 [134] T.C. Tao, Norm convergence of multiple ergodic averages for commuting transformations, Ergod. Th. & Dynam. Sys. 28 (2008), 657-688 [135] T.C. Tao, The ergodic and combinatorial approaches to Szemerédi’s theorem, preprint erhältlich unter http://uk.arxiv.org/pdf/math.CO/0604456.pdf [136] R. Taylor, Automorphy for some l-adic lifts of automorphic mod l representations. II, Publ. Math. Inst. Hautes Études Sci. 108 (2008), 183-239 [137] E.C. 
Titchmarsh, The theory of the Riemann zeta-function, Oxford University Press 1986, 2nd ed., revised by D.R. Heath-Brown
[138] A.M. Turing, A note on normal numbers, Collected Works of A.M. Turing, J.L. Britton (Ed.), North Holland, Amsterdam 1992, 117-119
[139] A. Ustinov, On the statistical properties of finite continued fractions, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. 322 (2005), 186-211, 255 (Russian); English translation in J. Math. Sci. 137 (2006), 4722-4738
[140] A. Ustinov, On the Gauss-Kuz’min statistics for finite continued fractions, Fundam. Prikl. Mat. 11 (2005), 195-208 (Russian); English translation in J. Math. Sci. 146 (2007), no. 2, 5771-5781
[141] B.L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw Arch. Wisk. 15 (1928), 212-216
[142] B.L. van der Waerden, Wie der Beweis der Vermutung von Baudet gefunden wurde, Elem. Math. 9 (1954), 49-56; reprinted in Elem. Math. 53 (1998), 139-148
[143] W.A. Veech, Teichmüller curves in moduli space, Eisenstein series and an application to triangular billiards, Invent. Math. 97 (1989), 553-583; erratum: Invent. Math. 103 (1991), 447
[144] I.M. Vinogradov, Darstellung einer ungeraden Zahl als Summe von drei Primzahlen, Doklady Akad. Nauk SSSR 15 (1937), 291-294 (in Russian)
[145] J. von Plato, Oresme’s proof of the density of rotations of a circle through an irrational angle, Hist. Math. 20 (1993), 428-433
[146] S.M. Voronin, Theorem on the ’universality’ of the Riemann zeta-function, Izv. Akad. Nauk SSSR, Ser. Matem., 39 (1975), 475-486 (Russian); Math. USSR Izv. 9 (1975), 443-445
[147] S. Wagon, The Banach-Tarski paradox, Cambridge University Press 1985
[148] P. Walters, Ergodic Theory - Introductory lectures, Lecture Notes in Mathematics 458, Springer 1975
[149] H. Weyl, Sur une application de la théorie des nombres à la mécanique statistique et la théorie des perturbations, L’Enseign. Math. 16 (1914), 455-467
[150] H. Weyl, Über die Gleichverteilung von Zahlen mod. Eins, Math.
Ann. 77 (1916), 313-352
[151] N. Wiener, A. Wintner, Harmonic analysis and ergodic theory, Amer. J. Math. 63 (1941), 415-426
[152] E. Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Comm. Pure Applied Math. 13 (1960), 1-14
[153] G. Wirsching, The dynamical system generated by the 3X + 1 function, Lecture Notes in Mathematics 1681, Springer 1998
[154] E. Wirsing, On the theorem of Gauss-Kusmin-Lévy and a Frobenius-type theorem for function spaces, Acta Arith. 24 (1973/74), 507-528
[155] M. Wolf, Two arguments that the nontrivial zeros of the Riemann zeta function are irrational, preprint available at arXiv:1002.4171v1
[156] J.D. Zund, George David Birkhoff and John von Neumann: A Question of Priority and the Ergodic Theorems, 1931-1932, Historia Math. 29 (2002), 138-156

Index

arithmetic progression 102
Arnold’s cat map 28, 52

baker’s transformation 26
BBP-formula 56
Benford’s law 5, 14
billiards 5, 8
Birkhoff’s ergodic theorem 39, 48, 54, 72, 94, 98
Boltzmann’s brain 112
Borel set 18

Calkin-Wilf iteration 85
circle group 5
circle rotation 25, 33
cocktail 35
continued fraction 79
continued fraction algorithm 81
convergents 80

Denjoy’s heuristics 69
dense 13
Dirichlet’s approximation theorem 6, 83
doubling-map 25, 34
dynamical system 24

ergodic 31, 44
ergodicity hypothesis 37, 112
Euclidean algorithm 77, 86

Felix 28, 52, 122
Fibonacci numbers 15, 83

Gauss measure 89
Gelfand’s problem 5, 13, 25

Kac’s lemma 51
Khintchine’s theorem 93
Khintchine constant 93, 95
Kronecker’s approximation theorem 8

Laplace’s demon 109
law of best approximations 84
Lebesgue measure 21
Lebesgue’s theorems 22, 23
Lévy’s theorem 96
Lindelöf hypothesis 73
Littlewood conjecture 99

measure 18
measure preserving 24
mixing 34, 35
Murphy’s law 8

Newton iteration 30, 72
normal number 53

orbit 24

π 55, 82, 84
pigeonhole principle 7, 48
Poincaré’s recurrence theorem 47, 49
prime number theorem 68
probability measure 19

recurrence 47, 49, 104
Riemann hypothesis 65, 69, 74
Riemann zeta-function 61, 96

theorem of Gauss-Kuzmin-Lévy 88
thermodynamics 49
trajectory 24

uniform distribution modulo one 9

van der Waerden’s theorem 104
von Neumann’s ergodic theorem 37
Voronin’s universality theorem 75

Weyl’s theorems 10, 11

Young’s decomposition 21

zeta zeros 65, 70