
On the Neighbour Sum Problem

As a kid, I remember being utterly fascinated by this deceptively simple Olympiad problem.

The 64 squares of an 8 × 8 chessboard are filled with positive integers in such a way that each integer is the average of the integers on the neighboring squares. (Two squares are neighbors if they share a common edge or a common vertex. Thus a square can have 8, 5 or 3 neighbors depending on its position). Show that all the 64 integer entries are in fact equal.

Brief Solution : Since each entry is the average of its neighbors, it must lie between the smallest and the largest of them. In particular, every neighbor of a largest entry must equal that entry, and spreading this observation across the board forces all 64 entries to be equal.

A couple of years ago, I got a little cheeky and decided to ask myself what might happen if we take sums instead of averages. Fortunately, I was able to rope in a few equally deranged friends who were as interested as I was in useless problems, and we got a very precise answer.

Question : An $n \times n$ chessboard for $n \geq 3$, with each square bearing an integer, is said to be a solution to the neighbor sum problem if each number is the sum of the numbers on the neighboring squares. Two squares are neighbors if they share a common edge or a common vertex. A chessboard with all squares bearing the number zero is said to be a trivial solution. How many such non-trivial solutions exist for a chessboard of given size $n\times n$?

The idea we used to tackle this problem is to imagine a chessboard full of integers and consider the linear map
$$x \mapsto - x + \sum_{y\sim x} y$$
applied to every square, where $y\sim x$ means that $y$ is a neighbour of $x$. The kernel of this linear map is precisely the set of solutions to our question.
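
Since everything reduces to a kernel computation, one can experiment with small boards directly. Below is a minimal sketch (assuming sympy is available; the helper name neighbour_sum_kernel is made up for illustration) that builds the matrix of the map above and returns a basis of its kernel.

```python
# Sketch: build the matrix of x |-> -x + (sum of neighbouring entries) on an
# n x n board and compute its kernel over the rationals.
from sympy import Matrix

def neighbour_sum_kernel(n):
    idx = lambda i, j: i * n + j
    M = Matrix.zeros(n * n, n * n)
    for i in range(n):
        for j in range(n):
            M[idx(i, j), idx(i, j)] = -1              # the -x term
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) != (0, 0) and 0 <= i + di < n and 0 <= j + dj < n:
                        M[idx(i, j), idx(i + di, j + dj)] = 1   # king-move neighbours
    return M.nullspace()

for n in range(3, 8):
    print(n, len(neighbour_sum_kernel(n)))   # non-trivial solutions appear only at n = 5 here
```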

It turned out that we were lucky and the matrix representing this linear map can be easily studied using theorems that others have proved before (in other words, we let smarter people from the past do the heavy lifting). Using these theorems, it soon became clear to us that an $n\times n$ chessboard admits a non-trivial solution to the neighbour sum problem if and only if there exist $p, q \in \mathbb{N}$ such that
$$\left(1 + 2\cos\left(\frac{p\pi}{n + 1}\right)\right) \left(1 + 2\cos\left(\frac{q\pi}{n + 1}\right)\right) = 2$$
and $1 \leq p, q \leq n$.

Solving this diophantine equation requires tools from Algebraic Number Theory (which I will not go into in this post), but it turns out that it has a solution if and only if $6\mid (n+1)$. One solution for a $5\times 5$ board is
$$\begin{array}{|c|c|c|c|c|}\hline 1 & 0 & -1 & 0 & 1 \\\hline 1 & 0 & -1 & 0 & 1 \\\hline \phantom{-}0\phantom{-} & \phantom{-}0\phantom{-} & \phantom{-}0\phantom{-} & \phantom{-}0\phantom{-} & \phantom{-}0\phantom{-} \\\hline -1 & 0 & 1 & 0 & -1 \\\hline -1 & 0 & 1 & 0 & -1 \\\hline \end{array}$$
and another is its transpose.
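
For the sceptical reader, here is a tiny illustrative check (not taken from the paper) that the displayed board, and its transpose, indeed satisfy the neighbour sum property.

```python
import numpy as np

board = np.array([
    [ 1, 0, -1, 0,  1],
    [ 1, 0, -1, 0,  1],
    [ 0, 0,  0, 0,  0],
    [-1, 0,  1, 0, -1],
    [-1, 0,  1, 0, -1],
])

def is_neighbour_sum_solution(b):
    n = b.shape[0]
    return all(
        b[i, j] == sum(b[i + di, j + dj]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)
                       if (di, dj) != (0, 0) and 0 <= i + di < n and 0 <= j + dj < n)
        for i in range(n) for j in range(n)
    )

print(is_neighbour_sum_solution(board), is_neighbour_sum_solution(board.T))  # True True
```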

Soon, we realised that these techniques can also be extended to other kinds of more general boards. For example, we proved that non-trivial $m\times n$ chessboards satisfying the neighbor-sum property exist if and only if $2 \mid (m+1)$ and $3 \mid (n+1)$, and that non-trivial $m\times n$ toroidal chessboards satisfying the neighbor-sum property exist if and only if $4 \mid m$ and $6 \mid n$. Of course, the next question to ask is about the dimension of the solution space, and that turned out to be more difficult than the first problem. To know more, one can consult our original paper (which has now been accepted in the American Mathematical Monthly).

As some of you have already figured out, the more general question to ask is whether it is possible to play the same game on a general graph. This is the question we raised in the paper above and partially answered in a recent preprint, where we studied the problem on trees.

Among other things, what we found was an algorithm to test whether a given finite tree satisfies the neighbour sum property. Very briefly, for a given tree and for a fixed (root) vertex $v_0$, we set up the following indexing scheme.

[Figure omitted: the indexing scheme on the rooted tree; roughly, $r_1$ runs over the children of $v_0$, $r_2$ over the children of a given $r_1$, and so on.]

For notational clarity we will use $\bar{r}_l$ to denote a general indexing $r_1,\: r_2,\: \ldots,\:r_l$.

Finally, for a finite tree with a root vertex $v_0$ such that the longest path starting at $v_0$ is of length $k$, we also define
$$S_{1,k}= S_{1,k}(v_0) := \sum_{r_1}\left(1-\sum_{r_{2}}\left(1-\ldots\sum_{r_{k-1}}\left(1-\sum_{r_k} 1\right)^{-1}\ldots\right)^{-1}\right)^{-1}$$
and generally
$$S_{j,k}(\bar{r}_{j-1}) = S_{j,k}(\bar{r}_{j-1}, v_0) := \sum_{r_j}\left(1-\sum_{r_{j+1}}\left(1-\ldots\sum_{r_{k-1}}\left(1-\sum_{r_k} 1\right)^{-1}\ldots\right)^{-1}\right)^{-1}$$
for $1<j\leq k$.

One of our main theorems is that a finite tree $\mathcal{T}=(\mathcal{V},\:\mathcal{E})$ admits a non-trivial labelling $f$ of its vertices such that the pair $(\mathcal{T}, f)$ satisfies the neighbour sum property if and only if there is at least one vertex $v_0\in \mathcal{V}$ such that $S_{1,k}(v_0)=1$.
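
A hedged sketch of how such a test might be implemented is given below (the function names are invented, the tree is assumed to be given as an adjacency dictionary, and the degenerate case where an intermediate denominator vanishes is not handled here).

```python
from fractions import Fraction

def S(tree, v, parent=None):
    """Compute the nested sum S_{1,k}(v) for the tree rooted at v (adjacency dict)."""
    total = Fraction(0)
    for c in tree[v]:
        if c != parent:
            total += 1 / (1 - S(tree, c, v))   # assumes no denominator vanishes
    return total

def has_neighbour_sum_solution(tree):
    return any(S(tree, v) == 1 for v in tree)

# Example: the path on two vertices, where f(u) = f(v) gives a non-trivial solution.
path2 = {0: [1], 1: [0]}
print(has_neighbour_sum_solution(path2))   # True
```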

Using this algorithm, we then wrote a program computing the number of trees on $n$ vertices that satisfy the neighbour sum property, and the output was as follows
$$0,1,0,0,1,2,2,6,14,29,63,166,405,977,2481,6530,16757,43534, 115700,308527, 820856, 2205628, 5971743$$
which we have not been able to explain fully. However, we were able to prove that the set of unlabelled trees on $n$ vertices has a positive density subset all of whose elements satisfy the neighbour sum property.

Finally, we also tackled this problem on infinite trees and found a couple of nice results. To know more, the interested reader is encouraged to take a look at the two papers mentioned above.

Infinitely many Proofs of Pythagoras' Theorem

This post is inspired by an MSE question and the brilliant answers that it received. However, it can be understood independently. Everything here is joint work with Satvik Saha.

Consider the following diagram made of 16 right triangles of sides $a\le b<c$ and four squares of side $b-a$. This being a tiling of the outer square (of side length $2c$), we have
$$(2c)^2 = 4(b-a)^2 + 16\times \frac 12 ab$$
hence implying
$$c^2 = a^2 + b^2$$
thus proving Pythagoras' Theorem.

Outer square of side $2c$

This is similar to previously known proofs (e.g. see proofs #3 and #116 here), but it is hard to say whether this exact tiling has appeared anywhere before.

Now, consider the following diagram made up of right triangles of sides $a\le b<c$, (pink) squares of side $a$, (yellow) squares of side $b$, and (green) squares of side $b-a$. The reader is welcome to check that this also gives a completely valid proof of Pythagoras' Theorem.

Outer square of side $3c$

Some of the readers might have started asking the right question. And the answer is yes, it can be generalised! With the outer square of side $nc$, we get $4(n-2)$ each of the pink and yellow squares, $n^2-4n+8$ of the green squares, and $4(n^2-4n+8)$ of the triangles. This gives
$$n^2 c^2 = 4(n-2) (a^2 + b^2) + (n^2-4n+8) (a-b)^2 + 2(n^2-4n+8) ab$$
hence proving Pythagoras' Theorem. For your viewing pleasure, here's the $16\times 16$ big square. Feel free to print it out and put it on your wall.

Outer square of side $16c$
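
Since the counting above is doing all the work, it may be reassuring to check the identity symbolically; a short sketch (using sympy) is below.

```python
from sympy import symbols, expand

a, b, c, n = symbols('a b c n')
lhs = n**2 * c**2
rhs = 4*(n - 2)*(a**2 + b**2) + (n**2 - 4*n + 8)*(a - b)**2 + 2*(n**2 - 4*n + 8)*a*b
# The difference is exactly n^2 (c^2 - a^2 - b^2), so the tiling identity is
# equivalent to the Pythagorean relation.
print(expand(lhs - rhs - n**2*(c**2 - a**2 - b**2)))   # 0
```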

Of course, the keen ones among you would ask whether this is truly a tiling for any values of $a$ and $b$ or a coincidence of the values chosen in the diagrams. To clear this doubt, here's an animated version of the diagrams. Notice that the animation also shows that it doesn't matter whether you call the larger side of the triangle $a$ or $b$.

Outer square of side $6c$

The keener ones would of course ask for a proof that this is indeed a tiling. For that, one only needs to note that the diagram is made of four spiky triangles (each made of a layer of blue triangles, then a layer of yellow and a layer of pink squares, followed by a number of layers of blue triangles); such a piece is almost a triangle, except for two spikes on two of its sides. All one needs to check is that these spiky triangles fit together properly and leave a space in the middle which can be nicely filled by the green squares.

Feel free to play with the animation here (p5js code here). Also, feel free to check out my answer in the aforementioned MSE post.

Worst Way to Calculate Area of a Triangle

A useful technique from Complex Analysis helps us to identify whether a real number $u$ is positive or negative by checking a discontinuous integral. We have
$$\frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} e^{su}\;\frac{\text{d}s}{s} = \begin{cases}0 &\text{ if } u<0\\ \frac 12 &\text{ if } u=0\\ 1 &\text{ if } u>0\end{cases}$$
for $c>0$.

We will apply this to calculate the area of the triangle
$$T(N) := \left\{(x,y)\in \mathbb R_{\ge 0}^2 : ax+by\le N\right\}$$
with $a,b>0$ and $(a,b)=1$. This area is given by
\begin{align*}&\int_{x,y\ge 0} \mathbf{1}_{N-ax-by\ge 0} \;\text{d}x\;\text{d}y\\=&\int_{x,y\ge 0} \frac 1{2\pi i} \int_{s=c-i\infty}^{c+i\infty} e^{s(N-ax-by)}\;\frac{\text{d}s}s \;\text{d}x\;\text{d}y\\=&\frac 1{2\pi i} \int_{s=c-i\infty}^{c+i\infty} \left(\int_{x\ge 0} e^{-asx}\;\text{d}x\right) \left(\int_{y\ge 0} e^{-bsy}\;\text{d}y\right) \frac{e^{sN}}{s}\;\text{d}s\\=&\frac{1}{ab}\cdot \frac{1}{2\pi i} \int_{s=c-i\infty}^{c+i\infty} \frac{e^{sN}}{s^3}\;\text{d}s\end{align*}
by the above mentioned trick. Notice that the set where $N-ax-by=0$ has measure zero, so the value $\frac 12$ taken there does not affect the integral.

Now, moving the contour far to the left we pick up the residue from the pole of order $3$ at $s=0$ and hence the area comes out to be
$$\frac{1}{ab}\cdot \frac{N^2}{2} = \frac{N^2}{2ab}$$
which matches the area calculated by other known methods!
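
For the unconvinced, the residue computation can also be checked symbolically; here is a small sketch using sympy.

```python
from sympy import symbols, exp, residue

s = symbols('s')
N = symbols('N', positive=True)
# Residue of e^{sN}/s^3 at the pole of order 3 at s = 0.
print(residue(exp(s*N)/s**3, s, 0))   # N**2/2
```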

Acknowledgement : I learnt this from lectures by Prof. Andrew Granville for the course Distribution of Prime Numbers.

Introduction to Sumsets II

This is a sequel to a previous post. However, one can technically read it without reading the previous post.

Given $A,B\subseteq Z$ for a commutative ring $Z$, we define
$$A+B := \{a+b : a\in A, b\in B\}$$
to be the sumset of $A$ and $B$.

Similarly,
$$A-B := \{a-b : a\in A, b\in B\}$$
$$kA := \underbrace{A+\dots+A}_{k \text{ times}}$$
$$b+A := \{b\}+A$$
for $A,B\subseteq Z$.

Theorem 1 : We have
$$\left\vert A-C\right\vert \left\vert B\right\vert \le \left\vert A-B\right\vert \left\vert B-C\right\vert$$
where $A$, $B$ and $C$ are finite subsets of $\mathbb R$.

Proof : For all $d\in A-C$, we fix the representation $d=a_d-c_d$ where $a_d\in A$ and $c_d\in C$. Now, we define
\begin{align*} &f:(A-C)\times B \to (A-B)\times (B-C)\\ &(d,b)\mapsto (a_d-b, b-c_d) \end{align*}
Note that if $f(x,y)=(u,v)$, then $x=u+v$, and $y$ is then determined by $y=a_x-u$ (equivalently, $y=c_x+v$). So each image determines a unique preimage, that is, $f$ is injective.

Comparing the sizes of the domain and the codomain of $f$ completes the proof.
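
The inequality is also easy to test numerically; here is a small illustrative check on random integer sets.

```python
import random

def diff(A, B):
    return {a - b for a in A for b in B}

random.seed(0)
for _ in range(1000):
    A, B, C = (set(random.sample(range(50), 8)) for _ in range(3))
    assert len(diff(A, C)) * len(B) <= len(diff(A, B)) * len(diff(B, C))
print("Theorem 1 holds on all sampled examples")
```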

Remark : Theorem 1 can be interpreted as a triangle inequality for the quantity
$$\Delta(A,B)=\log\frac{\left\vert A-B\right\vert}{\sqrt{\left\vert A\right\vert\left\vert B\right\vert}}$$
(the Ruzsa distance), although $\Delta$ is not a genuine metric since $\Delta(A,A)\neq 0$ in general.

Putting $C=A$ and then $B=-A$ in Theorem 1 we have

Corollary 2 : For all finite $A,B\subset \mathbb R$, we have
\begin{align*} &\left\vert A-A\right\vert \le \frac{\left\vert A-B\right\vert^2}{\left\vert B\right\vert}\\ &\left\vert A-A\right\vert \le \frac{\left\vert A+A\right\vert^2}{\left\vert A\right\vert} \end{align*}

The last inequality basically says that if $A$ has small doubling, then $A-A$ is also small. Now, we want to show that $A-A$ being small also implies that $A+A$ is. For that, we proceed with

Lemma 3 : Let $A,B\subset \mathbb R$ be finite. If there is a $c>0$ so that
$$\left\vert A+B\right\vert =c\left\vert A\right\vert \qquad \text{and}\qquad \left\vert A^\prime +B\right\vert \ge c\left\vert A^\prime\right\vert\;\;\;\forall A^\prime \subset A,$$
then, we have
$$\left\vert A+B+C\right\vert \le c\left\vert A+C\right\vert$$
for any finite $C\subset \mathbb R$.

Proof : We use induction on the size of $C$. For $\left\vert C\right\vert =1$, i.e., $C=\{x\}$, we have
$$\left\vert A+B+x\right\vert =\left\vert A+B\right\vert =c\left\vert A\right\vert =c\left\vert A+x\right\vert$$
and hence the statement is trivial.

Now, let us assume that the statement is true for a non-empty set $C$, and define $C^\prime := C\cup \{x\}$ for some $x\notin C$. We define
$$A^\prime := \{a\in A : a+x\in A+C\} = A\cap (A+C-x)\subset A$$
so that
$$\left\vert A+C^\prime\right\vert =\left\vert A+C\right\vert +\left\vert A\right\vert -\left\vert A^\prime\right\vert$$
since $A^\prime +x=(A+x)\cap (A+C)$ and $A+C^\prime =(A+C)\cup (A+x)$.

Now, $A^\prime +x\subset A+C \implies A^\prime +B+x\subset A+B+C$ so that
$$A+B+C^\prime = (A+B+C)\cup (A+B+x) = (A+B+C)\cup \left [(A+B+x)\setminus (A^\prime +B+x)\right ]$$
and
$$\left\vert A+B+C^\prime\right\vert \le \left\vert A+B+C\right\vert + \left\vert A+B\right\vert - \left\vert A^\prime +B\right\vert$$
and hence, using the Induction Hypothesis, we have
$$\left\vert A+B+C^\prime\right\vert \le c\left\vert A+C\right\vert +c\left\vert A\right\vert -c\left\vert A^\prime\right\vert =c\left\vert A+C^\prime\right\vert$$
hence completing the proof.

This lemma will help us to prove the two following important consequences.

Theorem 4 : We have
$$\left\vert A+C\right\vert \left\vert B\right\vert \le \left\vert A+B\right\vert \left\vert B+C\right\vert$$
for all finite $A,B,C\subset \mathbb R$.

Proof : Let the non-empty set $U\subseteq B$ be such that $\frac{\left\vert U+A\right\vert}{\left\vert U\right\vert}=c$ is minimal. This implies
$$\left\vert U+A+C\right\vert \le c\left\vert U+C\right\vert$$
using the previous lemma.

Again, since $B$ itself is considered when the minimum is selected, we have $\left\vert A+C\right\vert \le \left\vert U+A+C\right\vert$, $\left\vert U+C\right\vert \le \left\vert B+C\right\vert$ and $c\le \frac{\left\vert A+B\right\vert}{\left\vert B\right\vert}$. These four inequalities complete the proof.

Corollary 5 : We have
\begin{align*} &\left\vert A+A\right\vert \le \frac{\left\vert A+B\right\vert^2}{\left\vert B\right\vert}\\ &\left\vert A+A\right\vert \le \frac{\left\vert A-A\right\vert^2}{\left\vert A\right\vert} \end{align*}
for all $A, B$.

Proof : Put $C=A$ and then $B=-A$ in the previous theorem.

Corollary 6 : We have
\begin{align*} &\left\vert A-B\right\vert \le \frac{\left\vert A+B\right\vert^3}{\left\vert A\right\vert \left\vert B\right\vert}\\ &\left\vert A+B\right\vert \le \frac{\left\vert A-B\right\vert^3}{\left\vert A\right\vert \left\vert B\right\vert} \end{align*}
for all $A,B$.

Proof : Apply Theorem 1 with $B$ replaced by $-B$ and with $C=B$, so that $\left\vert A-B\right\vert \left\vert B\right\vert \le \left\vert A+B\right\vert \left\vert B+B\right\vert$, and then bound $\left\vert B+B\right\vert$ using the previous corollary; replacing $B$ by $-B$ throughout gives the second inequality.

Corollary 7 : Let $A,B\subset \mathbb R$ be finite. If we have
$$\left\vert A+B\right\vert =c\left\vert A\right\vert \qquad \text{and}\qquad \left\vert A^\prime +B\right\vert \ge c\left\vert A^\prime\right\vert\;\;\;\forall A^\prime\subset A$$
for some $c>0$, then
$$\left\vert A+hB\right\vert \le c^h \left\vert A\right\vert$$
for all positive integers $h$.

Proof : Set $C=(h-1)B$ in Lemma 3 to get $\left\vert A+hB\right\vert \le c\left\vert A+(h-1)B\right\vert$ and use Induction.

Theorem 8 (Plünnecke-Ruzsa Inequality) : Let $B\subset \mathbb R$ be finite satisfying $\left\vert B+B\right\vert \le c\left\vert B\right\vert$ (or $\left\vert B-B\right\vert \le c\left\vert B\right\vert$). Then
$$\left\vert mB-nB\right\vert \le c^{m+n}\left\vert B\right\vert$$
for all $m,n\in \mathbb N$.

Proof : Let the non-empty set $A\subset B$ (or $A\subset -B$) be one of the sets that minimizes $\frac{\left\vert A+B\right\vert}{\left\vert A\right\vert}$, so that $\left\vert A+B\right\vert =c^\prime \left\vert A\right\vert$. Obviously $c^\prime\le c$. Also, $\left\vert A^\prime +B\right\vert \ge c^\prime \left\vert A^\prime\right\vert$ for all $A^\prime \subset A$, and hence the conditions of Corollary 7 are satisfied. This gives
$$\left\vert hB\right\vert \le \left\vert A+hB\right\vert \le (c^\prime)^h \left\vert A\right\vert \le c^h\left\vert B\right\vert$$
hence proving the statement for the case $m=0$ or $n=0$.

Now, if $m,n\neq 0$, we use Theorem 1 and set $A\mapsto mB$, $B\mapsto -A$ and $C\mapsto nB$ to get
$$\left\vert mB-nB\right\vert \left\vert A\right\vert \le \left\vert A+mB\right\vert \left\vert A+nB\right\vert \le (c^\prime)^m\left\vert A\right\vert (c^\prime)^n \left\vert A\right\vert \le c^{m+n}\left\vert B\right\vert \left\vert A\right\vert$$
hence completing the proof.
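
As an illustration of the final bound, here is a quick hypothetical numerical check on a set with small doubling.

```python
from itertools import product

def sumset(A, B):
    return {a + b for a, b in product(A, B)}

def mB_minus_nB(B, m, n):
    S = {0}
    for _ in range(m):
        S = sumset(S, B)
    for _ in range(n):
        S = sumset(S, {-b for b in B})
    return S

B = set(range(10)) | {100}                      # a set with fairly small doubling
c = len(sumset(B, B)) / len(B)
for m, n in [(1, 1), (2, 1), (2, 2)]:
    assert len(mB_minus_nB(B, m, n)) <= c**(m + n) * len(B)
print("Plünnecke-Ruzsa bound holds for this example")
```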

Introduction to Sumsets

After finding a question on Math Stack Exchange, I decided to take a large chunk out of my MS thesis and present it.

Given $A,B\subseteq Z$ for a commutative ring $Z$, we define
$$A+B := \{a+b : a\in A, b\in B\}$$
to be the sumset of $A$ and $B$.

Similarly,
$$A-B := \{a-b : a\in A, b\in B\}$$
$$kA := \underbrace{A+\dots+A}_{k \text{ times}}$$
$$b+A := \{b\}+A$$
for $A,B\subseteq Z$.

Theorem 1 : If $A$ and $B$ are two finite subsets of $\mathbb R$, then
$$\vert A\vert + \vert B\vert -1 \le \vert A+B\vert \le \vert A\vert \vert B\vert$$
and the lower bound is attained iff $A$ and $B$ are arithmetic progressions with the same common difference.

Proof : The upper bound is obvious, since there are at most $\vert A\vert \vert B\vert$ sums of the form $a+b$. It is attained when $A$ and $B$ are chosen so that all these sums are distinct; one such example is
$$A=\{0,1,\dots ,r-1\},\;\; B=r\cdot\{0,1,\dots ,s-1\}$$
so that
$$A+B=\{0,1,\dots , rs-1\},$$
which has cardinality $rs$.
  
Denoting $A=\{a_1<a_2<\dots <a_r\}$ and $B=\{b_1<b_2<\dots <b_s\}$, and noting that the following chain of inequalities
$$a_1+b_1<a_1+b_2<a_1+b_3<\dots <a_1+b_s<a_2+b_s<\dots <a_r+b_s$$
contains $r+s-1$ different elements of $A+B$ immediately completes the proof of the lower bound.
  
Now, let us assume that the lower bound is attained. Then the following chain of inequalities
$$a_1+b_1<a_2+b_1<a_2+b_2<a_2+b_3<\dots <a_2+b_s<a_3+b_s<\dots <a_r+b_s$$
and the previous one must be the same, since both list $r+s-1$ distinct elements of $A+B$ in increasing order while $\vert A+B\vert =r+s-1$.

The first term and the last $r-1$ terms are clearly equal; equating the remaining $s-1$ terms gives the equations
\begin{align*} a_1+b_2&=a_2+b_1\\ a_1+b_3&=a_2+b_2\\ &.\\ &.\\ &.\\ a_1+b_s&=a_2+b_{s-1} \end{align*}
which on rearranging gives
\begin{align*} b_2-b_1&=a_2-a_1\\ b_3-b_2&=a_2-a_1\\ &.\\ &.\\ &.\\ b_s-b_{s-1}&=a_2-a_1 \end{align*}
and hence $B$ is an arithmetic progression with common difference $a_2-a_1$. Since the roles of $A$ and $B$ in the statement are symmetric, $A$ must also be an arithmetic progression with the same difference.
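
Both bounds, and the equality case, are easy to play with numerically; a small sketch follows.

```python
import random

def sumset(A, B):
    return {a + b for a in A for b in B}

random.seed(1)
A = set(random.sample(range(100), 10))
B = set(random.sample(range(100), 7))
assert len(A) + len(B) - 1 <= len(sumset(A, B)) <= len(A) * len(B)

# Arithmetic progressions with the same common difference attain the lower bound.
AP1 = set(range(0, 30, 3))    # 10 terms, common difference 3
AP2 = set(range(0, 21, 3))    # 7 terms, common difference 3
assert len(sumset(AP1, AP2)) == len(AP1) + len(AP2) - 1
print("bounds verified")
```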

In fact, a more general statement is true.

Theorem 2 : If $A_1,A_2,\dots ,A_k$, $k\ge 2$ are finite subsets of $\mathbb R$, then
$$\vert A_1\vert + \vert A_2\vert +\dots +\vert A_k\vert -k+1 \le \vert A_1+A_2+\dots +A_k\vert \le \vert A_1\vert \vert A_2\vert\dots \vert A_k\vert$$
and the lower bound is attained iff $A_1,A_2,\dots ,A_k$ are arithmetic progressions with the same common difference.

Proof : We proceed by Induction.

The case $k=2$ is precisely the last theorem.

Now, let us assume that the statement is true for $k-1$.

Let $A=A_1+A_2+\dots +A_{k-1}$ and $B=A_k$ so that $A+B=A_1+A_2+\dots +A_{k}$. Using the previous theorem and the Induction Hypothesis, we get
\begin{align*} \vert A_1\vert + \vert A_2\vert +\dots +\vert A_{k-1}\vert -k+2 +\vert A_k\vert -1 &\le \vert A\vert + \vert B\vert -1\\ &\le \vert A+B\vert\\ &\le \vert A\vert \vert B\vert\\ &\le \vert A_1\vert \vert A_2\vert\dots \vert A_k\vert \end{align*}
and hence the proof follows.
  
Now, we just notice that the left inequality can be an equality only if
$$\vert A_1\vert + \vert A_2\vert +\dots +\vert A_{k-1}\vert -k+2 = \vert A_1+A_2+\dots +A_{k-1} \vert = \vert A \vert$$
and
$$\vert A\vert + \vert B\vert -1 = \vert A+B\vert.$$

The first relation implies that $A_1,\dots ,A_{k-1}$, and hence $A=A_1+\dots +A_{k-1}$, are arithmetic progressions with the same common difference, and the second relation implies that $B=A_k$ is also an arithmetic progression with that same difference. This completes the proof.


Corollary 3 : Let $A$ be a finite subset of $\mathbb R$. Then, for an integer $k\ge 2$, we have
$$k\vert A \vert -k+1\le \vert kA \vert\le \vert A \vert^k$$
and the lower bound is attained iff $A$ is an arithmetic progression.


However, it should be noted that the upper bound in the last corollary is not sharp. The sharp upper bound is given by the following theorem.

Theorem 4 : Let $A$ be a finite subset of $\mathbb R$. Then,
$$\vert kA \vert\le \binom{\vert A \vert +k-1}{k}$$
for any integer $k\ge 1$.

Proof : Every element of $kA$ is a sum of $k$ elements of $A$, possibly with repetition, so $\vert kA\vert$ is at most the number of such multisets. The number of multisets of size $k$ from a set of cardinality $\vert A \vert$ is the number of non-negative integer solutions of the equation
$$x_1+\dots +x_{\vert A \vert}=k$$
which is $\binom{\vert A \vert+k-1}{k}$.

This upper bound is indeed sharp, and a set attaining it for $k=2$ is called a Sidon set.


The moral of Corollary 3 is that if $A+A$ is as small as possible, then $A$ has a well defined structure, namely it is an arithmetic progression. We want to understand whether $A+A$ being merely small, say $\left\vert A+A\right\vert\le C\left\vert A \right\vert$, still implies some structure in $A$. To do that, we investigate some examples now.

Clearly, if $A$ is a set of $r$ members of an arithmetic progression of length $2r$, then $A+A$ is part of an arithmetic progression of length $4r-1$ and hence $\left\vert A+A \right\vert \le 4r-1$.

Now, considering $A = \{0,1,\dots, r-2, N\}$ for some $N > r-2$, we see that
$$A+A = \{0,1,\dots, 2r-4\} \cup \{N,N+1,\dots,N+r-2\} \cup \{2N\}$$
and so, if $r-2 < N \le 2r-4$, then $\left\vert A+A \right\vert = N + r \le 3r - 4$, and if $N > 2r - 4$, then $\left\vert A+A \right\vert = 3r - 3$. The first one is an example of having $r$ members of an arithmetic progression of size $N + 1$ ($\le 2r - 3$) such that $\left\vert A+A \right\vert \le 3r - 4$. And the second one is an example of the same where $\left\vert A+A \right\vert = 3r - 3$ but $A$ cannot be covered with any arithmetic progression of reasonable size.

And next, we consider
$$Q = \{a_0 + a_1n_1 + \dots + a_dn_d : 0 \le n_j < N_j,\ j=1,\dots,d\}$$
where $a_0, a_1, \dots, a_d$ are fixed, and $N_1, \dots, N_d$ are integers $\ge 2$. This is a generalized arithmetic progression of dimension $\operatorname{dim}(Q) = d$ and volume $\operatorname{vol}(Q) = N_1 \dots N_d$.

Obviously, $\left\vert Q \right\vert \le \operatorname{vol}(Q)$, and the equality holds only if all the elements are different in which case we call $Q$ to be proper.

It was Gregori Freiman in the 1960's who recognized that if $A + A$ is small, say $\le C\left\vert A \right\vert$, then $A$ still has a special structure, though not as strong a one. Freiman's theorem is one of the deepest results in the theory of sumsets and it is obligatory to state it in any discussion of Additive Combinatorics.

Theorem 5 (Freiman, Ruzsa) : If $A$ is a set of positive integers and $C \ge 2$ such that $\left\vert A + A \right\vert \le C \left\vert A \right\vert$, then there is a generalized AP, $Q$ with $A \subset Q$, $\operatorname{vol}(Q) \le F(C) \left\vert A \right\vert$ and $\operatorname{dim}(Q) \le d(C)$ for some functions $F(C)$ and $d(C)$.

A Surprising Proof of an Inequality

Problem : Prove that for real numbers $x_1,\dots ,x_n$,
$$\sum_{i=1}^n \sum_{j=1}^n \left\lvert x_i - x_j\right\rvert^r \le \sum_{i=1}^n \sum_{j=1}^n \left\lvert x_i + x_j\right\rvert^r$$
for any $r\in [0,2]$.

Solution : The cases $r=0,2$ can be checked by hand. So, we assume $r\in (0,2)$ and begin by noting that the integral
$$\mathcal I (\alpha) = \int_{0}^\infty \frac{1-\cos \left(\alpha x\right)}{x^{r+1}}\;\text{d}x$$
is finite (and positive for $\alpha\neq 0$). Also, it is easy to prove using the substitution $y=|\alpha|x$ that $\mathcal I(\alpha)=|\alpha|^{r}\cdot \mathcal I(1)$.

Now, using this, we have
\begin{align*} |a+b|^r-|a-b|^r &= \frac{1}{\mathcal I(1)} \int_0^\infty \frac{\cos \left((a-b)x\right) - \cos \left((a+b)x\right)}{x^{r+1}} \;\text{d}x\\ &= \frac{1}{\mathcal I(1)} \int_0^\infty \frac{2 \sin (ax) \sin (bx)}{x^{r+1}} \;\text{d}x \end{align*}
and hence
$$\sum_{i=1}^n \sum_{j=1}^n \left\lvert x_i + x_j\right\rvert^r - \sum_{i=1}^n \sum_{j=1}^n \left\lvert x_i - x_j\right\rvert^r = \frac{2}{\mathcal I(1)} \int_0^\infty \frac{\left(\sum_{i=1}^n \sin (x_i x)\right)^2}{x^{r+1}} \;\text{d}x \ge 0$$
hence completing the proof.
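
The inequality is also easy to check numerically; here is a short sketch for random data.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.normal(size=10)
    r = rng.uniform(0, 2)
    lhs = np.sum(np.abs(x[:, None] - x[None, :]) ** r)
    rhs = np.sum(np.abs(x[:, None] + x[None, :]) ** r)
    assert lhs <= rhs + 1e-9
print("inequality verified on all samples")
```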

Prokhorov’s Theorem

The aim is to give a proof of Prokhorov’s Theorem.

Theorem : Let $\{\mu_n : n\ge 1\}$ be a sequence of probability measures on $\left(\mathbb R^d, \mathcal B\left(\mathbb R^d\right)\right)$. If $\{\mu_n : n\ge 1\}$ is tight, then it has a subsequence that converges weakly to some probability measure.

We will use the following (well known) fact in our proof.
Fact : Let $\{\mu_n : n\ge 1\}$ be a sequence of probability measures on $\left(\mathbb R^d, \mathcal B\left(\mathbb R^d\right)\right)$ and let $\mu_0$ be a probability measure. Then, weak convergence $\mu_n\xrightarrow[]{d} \mu_0$ is equivalent to $\int g \;\text{d}\mu_n \to \int g \;\text{d}\mu_0$ for all compactly supported continuous functions $g$.

With this at hand, we give the proof of the theorem.

Proof of Theorem : Let $K^N = [-N,N]^d$. Then, $\mu_n$ induces a subprobability measure $\nu_n^N := \mu_n|_{K^N}$ on $K^N$. By weak$^\ast$ compactness, we can find a subsequence $\nu_{n_k}^N$ such that $\nu_{n_k}^N \xrightarrow[]{w^\ast} \nu^N$ for a subprobability measure $\nu^N$ on $K^N$. Repeating the argument starting with this subsequence on $K^{N+1}$ gives a further subsequence and a new limit $\nu^{N+1}$. Iterating countably many times and passing to the diagonal subsequence (and noticing that $\nu^{N+1} = \nu^N$ on $K^N$), we may define
$$\mu(A) := \lim_{N\to \infty} \nu^N \left(A\cap K^N\right)$$
for $A\in \mathcal B\left(\mathbb R^d\right)$; the limit exists since the quantities $\nu^N\left(A\cap K^N\right)$ are non-decreasing in $N$ and bounded by $1$.

Now, we will show that $\mu$ is a probability measure. From here, it follows using the mentioned fact that our subsequence converges weakly to $\mu$.

Clearly $\mu\left(\mathbb R^d\right)\le 1$ as each $\nu^N$ is a subprobability measure. Given $\varepsilon>0$, tightness yields $N_\varepsilon$ such that $\mu_n\left(K^{N_\varepsilon}\right)\ge 1-\varepsilon$ for all $n\ge 1$. It follows that $\mu\left(\mathbb R^d\right)\ge \mu\left(K^{N_\varepsilon}\right)\ge 1-\varepsilon$, and since $\varepsilon$ was arbitrary, $\mu\left(\mathbb R^d\right)=1$. It remains to prove that $\mu$ is $\sigma$-additive. Clearly $\mu$ is monotone. First, let $A_1,\dots,A_I$ be finitely many pairwise disjoint measurable sets. Then
$$\left\lvert \mu\left(\bigcup_{i=1}^I A_i\right) - \sum_{i=1}^I \mu\left(A_i\right)\right\rvert \le \left\lvert \mu\left(\bigcup_{i=1}^I A_i\cap K^{N_\varepsilon}\right) - \sum_{i=1}^I \mu\left(A_i \cap K^{N_\varepsilon}\right)\right\rvert + \left(I+1\right)\varepsilon = \left(I+1\right)\varepsilon$$
since the restriction of $\mu$ to $K^{N_\varepsilon}$ (which is $\nu^{N_\varepsilon}$) is $\sigma$-additive; as $\varepsilon>0$ was arbitrary, $\mu$ is finitely additive. Now let $A = \bigcup_{i=1}^\infty A_i$ be a countable disjoint union of measurable sets. Then $\mu$ is $\sigma$-superadditive, as
$$\mu(A) \ge \mu\left(\bigcup_{i=1}^I A_i\right) = \sum_{i=1}^I \mu\left(A_i\right) \to \sum_{i=1}^\infty \mu\left(A_i\right)$$
by letting $I\to \infty$.

Conversely, using finite additivity and then the fact that the restriction of $\mu$ to $K^{N_\varepsilon}$ is $\sigma$-additive, we have
\begin{align*} \mu(A) &= \mu\left(A\setminus K^{N_\varepsilon}\right) + \mu\left(A\cap K^{N_\varepsilon}\right)\\ &= \mu\left(A\setminus K^{N_\varepsilon}\right) + \sum_{i=1}^\infty \mu\left(A_i\cap K^{N_\varepsilon}\right)\\ &\le \varepsilon + \sum_{i=1}^\infty \mu\left(A_i\right) \end{align*}
and, since $\varepsilon>0$ was arbitrary, $\mu$ is also $\sigma$-subadditive, thus completing the proof.

Kolmogorov’s Continuity Theorem

The aim is to give a proof of Kolmogorov's Continuity Theorem, also known as the Kolmogorov-Chentsov Theorem. The main reference we are following is Probability Theory, An Analytic View by Daniel W. Stroock.

Theorem : Suppose a stochastic process $X=\{X(t) : 0\le t\le T\}$ takes values in a Banach space $B$ so that for some $p>1$, $C>0$, $r>0$,
$$\mathbb E\left[\left\lVert X(t)-X(s)\right\rVert_B^p\right]^{\frac 1p} \le C\lvert t-s\rvert^{\frac 1p +r}$$
for all $s,t\in [0,T]$. Then there exists a family $\left\{\tilde{X}(t) : t\in [0,T]\right\}$ of random variables such that $X(t)=\tilde{X}(t)$ almost surely for all $t\in [0,T]$ and $t\in [0,T] \mapsto \tilde{X}(t,\omega)\in B$ is continuous for all $\omega\in \Omega$. In fact, for each $\alpha\in (0,r)$, we have
$$\mathbb E\left[\sup_{0\le s<t\le T} \left(\frac{\left\lVert \tilde{X}(t) - \tilde{X}(s)\right\rVert_B}{(t-s)^\alpha}\right)^p\right]^{\frac 1p} \le KT^{\frac 1p +r-\alpha}$$
for a suitable constant $K=K(\alpha, r, C)$.

Proof : By rescaling time, we can assume without loss of generality, that $T=1$.

Now, for $n\ge 0$, we define
$$M_n := \max_{1\le m\le 2^n} \left\lVert X\left (m2^{-n} \right) - X\left((m-1)2^{-n}\right)\right\rVert_B$$
and observe that
$$\mathbb E\left[M_n^p\right]^{\frac 1p} \le \left(\sum_{m=1}^{2^n}\mathbb E\left[\left\lVert X\left (m2^{-n} \right) - X\left((m-1)2^{-n}\right)\right\rVert_B^p\right]\right)^{\frac 1p}$$
which is bounded by $C2^{-rn}$, since each of the $2^n$ summands is at most $C^p 2^{-n(1+rp)}$.

Next, consider the path $t\rightsquigarrow X(t)$ on each interval $\left[(m-1)2^{-n}, m2^{-n}\right]$ and linearize it to $t\rightsquigarrow X_n(t)$. It is easy to see that
$$\max_{t\in [0,1]} \left\lVert X_{n+1}(t) - X_n(t)\right\rVert_B = \max_{1\le m\le 2^n} \left\lVert X\left((2m-1)2^{-n-1}\right) - \frac{X\left((m-1)2^{-n}\right) + X(m2^{-n})}{2}\right\rVert_B$$
which is bounded by $M_{n+1}$.

So,
$$\mathbb E \left[\sup_{t\in [0,1]} \left\lVert X_{n+1}(t) - X_n(t)\right\rVert_B^p\right]^{\frac 1p} \le C 2^{-rn}$$
and so there exists a measurable $\tilde{X} : [0,1]\times \Omega \to B$ satisfying
$$\mathbb E \left[\sup_{t\in [0,1]} \left\lVert \tilde{X}(t) - X_n(t)\right\rVert_B^p\right]^{\frac 1p} \le \frac{C 2^{-rn}}{1-2^{-r}}$$
and $t\rightsquigarrow \tilde{X}(t,\omega)$ is continuous for all $\omega\in \Omega$.

Next, notice that for all $t\in [0,1]$,
$$\left\lVert \tilde{X}(\tau) - X(t) \right\rVert_B \xrightarrow[\tau\to t]{p} 0$$
and hence $\tilde{X}(t) = X(t)$ almost surely.

Finally, note that for $2^{-n-1}\le t-s\le 2^{-n}$, we have
\begin{align*}\left\lVert \tilde{X}(t)- \tilde{X}(s)\right\rVert_B &\le \left\lVert \tilde{X}(t)-X_n(t)\right\rVert_B + \left\lVert X_n(t)-X_n(s)\right\rVert_B + \left\lVert X_n(s) - \tilde{X}(s)\right\rVert_B\\&\le 2 \sup_{\tau\in [0,1]} \left\lVert \tilde{X}(\tau) - X_n(\tau)\right\rVert_B + 2^n(t-s)M_n\end{align*}
and hence
$$\frac{\left\lVert\tilde{X}(t) - \tilde{X}(s) \right\rVert_B}{(t-s)^\alpha}\le 2^{\alpha n +\alpha+1} \sup_{\tau\in [0,1]} \left\lVert \tilde{X}(\tau)-X_n(\tau)\right\rVert_B + 2^{\alpha n}M_n$$
which implies
\begin{align*}\mathbb E \left[\sup_{0\le s<t\le 1} \left(\frac{\left\lVert\tilde{X}(t) - \tilde{X}(s) \right\rVert_B}{(t-s)^\alpha}\right)^p\right]^{\frac 1p} &\le C\sum_{n=0}^\infty \left(\frac{2^{\alpha n +\alpha +1-rn}}{1-2^{-r}}+2^{\alpha n-rn}\right)\\ &\le \frac{5C}{(1-2^{-r})(1-2^{\alpha-r})} \end{align*}
hence completing the proof.

Tangent Space of $\text{SL}_2(\mathbb R)$

This was a project I did about three years ago, when I was in my third year; I thought it would be a good fit for this blog. I didn't change much from the original (except the changes necessary to convert an Overleaf document into a Blogger post).


Introduction

The group $\text{SL}_2(\mathbb R)$ consists of $2\times 2$ real matrices of determinant $1$. So, it is the collection of all points $(a,b,c,d)$ satisfying $ad-bc=1$.

Of course, $\text{SL}_2(\mathbb R)\subset \text{M}_2(\mathbb R)$, as $\text{M}_2(\mathbb R)$ consists of all points of the form $(a,b,c,d)$. And since $\text{M}_2(\mathbb R)\cong \mathbb R^4$, and $\mathbb R^4$ carries the Euclidean metric, we can restrict that metric to $\text{SL}_2(\mathbb R)$ and consider it as a metric space.


Two properties of $\text{SL}_2(\mathbb R)$ immediately stick out:

1. $\text{SL}_2(\mathbb R)$ is closed in $\text{M}_2(\mathbb R)$.

2. $\text{SL}_2(\mathbb R)$ is unbounded.


Proof of 1 : Consider the function

\begin{align*}&\mathtt{det}:\text{M}_2(\mathbb R)\to \mathbb R\\ &(a,b,c,d)\mapsto ad-bc \end{align*}

Clearly, $\text{SL}_2(\mathbb R)=(\mathtt{det})^{-1}(\{1\})$. And since the $\mathtt{det}$ map is continuous, the result immediately follows.


Proof of 2 : By definition, $(a,b,c,d)\in \text{SL}_2(\mathbb R) \iff ad-bc=1$. So, fixing $b=b_0$ and $d=d_0\neq 0$, we have

$$a=\frac{1+b_0c}{d_0}$$

which means that arbitrarily large values of $a$ can be attained by choosing suitably large values of $c$.


Problem 1 :

Let $A\in \text{SL}_2(\mathbb R)$. Show that there exists $\epsilon>0$ such that $B_{\epsilon}(A)\cap \text{SL}_2(\mathbb R)$ is homeomorphic to an open subset of $\mathbb R^3$.

Solution :

First let us only look at the points of the form $p=\left(a,b,c,\frac{1+bc}{a}\right)$ with $a\neq 0$. So, consider the set 

$$U=\left\{\begin{pmatrix} a&b\\ c&d \end{pmatrix}\in \text{SL}_2(\mathbb R) : a\neq 0\right\}$$ 

Let us consider the map

\begin{align*}\phi: &\,U\to \phi (U)\subset \mathbb R^*\times \mathbb R\times \mathbb R\\ &\,\begin{pmatrix} a&b\\ c&d \end{pmatrix}\mapsto (a,b,c)\end{align*}

where $\mathbb R^*=\mathbb R\setminus \{0\}$.

Clearly, $U$ is an open subset of $\text{SL}_2(\mathbb R)$ as $a\neq 0$ is an open condition. Also, the map $\phi$ is clearly continuous.

Now, the inverse of $\phi$ is given by

\begin{align*}\phi^{-1}:&\,\phi(U)\to U\\ &(a,b,c)\mapsto \begin{pmatrix} a&b\\ c&\,\frac{1+bc}{a} \end{pmatrix}\end{align*}

which is continuous since each entry of $\phi^{-1}(a,b,c)$, and in particular $\frac{1+bc}{a}$, is a continuous function of $(a,b,c)$ on the open set $a\neq 0$.

So, $\phi$ is a homeomorphism from $U$ to $\phi(U)$.

Now, let $a=0$. But, since $ad-bc=1$, if $a=0$, then $b$ and $c$ must be non-zero. So, now let us consider the set

$$V=\left\{\begin{pmatrix} a&b\\ c&d \end{pmatrix}\in \text{SL}_2(\mathbb R) : b\neq 0\right\}$$

and the map

\begin{align*}\psi: &\,V\to \psi(V)\subset \mathbb R\times \mathbb R^*\times \mathbb R\\&\,\begin{pmatrix}a&b\\c&d\end{pmatrix}\mapsto (a,b,d)\end{align*}

This map is a homeomorphism from $V$ to $\psi(V)$ due to the same arguments as the last one.

So, $\text{SL}_2(\mathbb R)$ is a $3$-dimensional topological manifold with the atlas $\{(U,\phi),(V,\psi)\}$.

So, $\forall A\in \text{SL}_2(\mathbb R)$, $\exists \epsilon>0$ such that $B_{\epsilon}(A)\cap \text{SL}_2(\mathbb R)$ is homeomorphic to an open subset of $\mathbb R^3$.


Problem 2 :

Let $\gamma:(\alpha,\beta)\to \text{SL}_2(\mathbb R)$ be a continuous curve. We say $\gamma$ is smooth if $\gamma$ is a smooth curve in $\mathbb R^4$. Show that $B\cdot \gamma$ is a smooth curve for any $B\in \text{SL}_2(\mathbb R)$.

Solution :

Let

$$B=\begin{pmatrix}a_1&a_2\\b_1&b_2\end{pmatrix}$$

and

$$\gamma(t)=\begin{pmatrix}p_1(t)&q_1(t)\\p_2(t)&q_2(t)\end{pmatrix}$$

Then,

$$B\cdot\gamma(t)=\begin{pmatrix}\sum_{i=1}^2 a_ip_i(t)&\sum_{i=1}^2 a_iq_i(t)\\\sum_{i=1}^2 b_ip_i(t)&\sum_{i=1}^2 b_iq_i(t)\end{pmatrix}$$

Now, since $\gamma$ is a smooth curve, each of the $p_i$'s and $q_i$'s are smooth. So, clearly, $B\cdot \gamma$ is also smooth.

We discuss this a little more in Appendix Note 1.


Problem 3 :

Let $\gamma$, $\eta$ be smooth curves in $\text{SL}_2(\mathbb R)$ passing through $I=\begin{pmatrix}1&0\\0&1\end{pmatrix}$, i.e., $\gamma(0)=\eta(0)=I$. We say $\gamma\sim \eta\iff \gamma^\prime(0)=\eta^\prime(0)$. Show that the set of equivalence classes of smooth curves passing through $I$ is a vector space, denoted by $\mathfrak{sl_2}$, of dimension 3.

Solution :

First, we will try to prove that $\mathfrak{sl_2}$ is the set of all traceless matrices. To do that, let us note that if 

$$x(t)=\begin{pmatrix}x_1(t) & x_2(t)  \\ x_3(t) & x_4(t) \end{pmatrix}$$

is a parametrized curve in $\text{SL}_2(\mathbb{R})$ with

\begin{equation} x(0)=\begin{pmatrix}x_1(0) & x_2(0)  \\ x_3(0) & x_4(0) \end{pmatrix}=\begin{pmatrix}1 & 0  \\ 0 & 1 \end{pmatrix}\end{equation}

then, since $x(t)\in \text{SL}_2(\mathbb{R})$ for all $t$, we have

\begin{align*}&x_1(t)x_4(t)-x_2(t)x_3(t)=1\\ \implies &x_1^\prime (t)x_4(t)+x_1(t)x_4^\prime (t)-x_2^\prime(t)x_3(t)- x_2(t)x_3^\prime (t)=0\\ \implies &x_1^\prime(0)+x_4^\prime(0)=0\end{align*}

This shows that any element in $\mathfrak{sl_2}$ is traceless.

Now, we want to show that if $A$ is a traceless matrix, then $A$ is in $\mathfrak{sl_2}$. This, we will prove in the answer of the next question. For now, we will assume that this is true.

So, the question boils down to proving that the set of traceless matrices has dimension three. To do so, let us prove that

$$\mathtt{span}\left\{ \begin{pmatrix}1 & 0  \\ 0 & -1 \end{pmatrix}, \begin{pmatrix}0 & 1  \\ 0 & 0 \end{pmatrix}, \begin{pmatrix}0 & 0  \\ 1 & 0 \end{pmatrix}\right\}=T$$

where $T$ is the set of all traceless matrices.

But, this is clearly true as given $A=\begin{pmatrix}a & b  \\ c & -a \end{pmatrix}$, we can write

$$A=a.\begin{pmatrix}1 & 0  \\ 0 & -1 \end{pmatrix} +b.\begin{pmatrix}0 & 1  \\ 0 & 0 \end{pmatrix} +c.\begin{pmatrix}0 & 0  \\ 1 & 0 \end{pmatrix}$$

and since the linear independence of the given vectors is trivially true, this completes our proof.


Problem 4 :

Show that $\mathfrak{sl_2}$ can be identified, in a natural way, to traceless $2\times 2$ matrices. Moreover, the exponential map

\begin{align*} \text{exp}:&\,\{A\in \text{M}_2(\mathbb R) : \text{tr}(A)=0\}\to \text{SL}_2(\mathbb R)\\ &\,A\mapsto e^A \end{align*}

is well defined and produces a smooth curve

\begin{align*} \gamma_A:&\,\mathbb R\to \text{SL}_2(\mathbb R)\\ &\,\gamma_A(t)=\text{exp}(tA) \end{align*}

passing through $I$ with $\gamma_A^\prime (0)=A$.

Solution :

This map is clearly well defined as the matrix exponential map is well defined. Some details about this can be found in Appendix Problem 1.

Now, we want to show that, for any complex matrix $A$,

$$\det(\text{exp}(A))=e^{\text{tr}(A)}$$

To do that, let us recall that every complex matrix has a Jordan normal form, say $A = S J S^{-1}$ with $J$ upper triangular, and that the determinant of a triangular matrix is the product of its diagonal entries. So,

$$\exp(A)=\exp(S J S^{-1}) = S \exp(J) S^{-1}$$

and hence,

\begin{align*} \det(\exp(A))&=\det(\exp(S J S^{-1}))\\&=\det(S \exp(J) S^{-1})\\&=\det(S) \det(\exp(J)) \det (S^{-1})\\&=\det(\exp (J))\\&=\prod_{i=1}^n e^{j_{ii}}\\&=e^{\sum_{i=1}^n{j_{ii}}}\\ &=e^{\text{tr}(J)}\\ &=e^{\text{tr}(A)} \end{align*}

which completes the proof.

So, if trace of $A$ is zero, that immediately gives that the determinant of $\exp (A)$ will be $1$, which (combined with our proof in the last problem) proves that the traceless matrices can be naturally identified with $\mathfrak{sl_2}$.
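
Here is a quick numerical sanity check of $\det(\exp(A)) = e^{\operatorname{tr}(A)}$ for a random traceless matrix (a sketch, assuming scipy is available).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
A[1, 1] = -A[0, 0]                      # make A traceless
print(np.linalg.det(expm(A)))           # approximately 1
print(np.exp(np.trace(A)))              # exactly 1, since tr(A) = 0
```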

Now, we have

$$\gamma_A(t):=\text{exp}(tA)$$

This $\gamma_A$ is clearly smooth as

$$\text{exp}(tA)=\sum_{n=0}^\infty \frac{t^nA^n}{n!}$$

which gives

\begin{align*} \left( \text{exp}(tA)\right)^\prime &= \left(\sum_{n=0}^\infty \frac{t^nA^n}{n!}\right)^\prime\\ &=\sum_{n=1}^\infty \frac{nt^{n-1}A^n}{n!}\\ &=\sum_{n=1}^\infty \frac{t^{n-1}A^n}{(n-1)!}\\ &=A.\text{exp}(tA) \end{align*}

which is clearly well defined. We add some rigour to this argument in Appendix Problem 2.

Now,

$$\gamma_A(t)=\text{exp}(tA)$$

which gives

$$\gamma_A(0)=\text{exp}(\mathbf{0})=I$$

and

$$\gamma_A^\prime (t)=A.\text{exp}(tA)$$

which gives

$$\gamma_A^\prime(0)=A.\text{exp}(\mathbf{0})=A$$

hence completing the proof.


Problem 5 :

What is the fundamental group of $\text{SL}_2(\mathbb R)$?

Solution :

We claim that the fundamental group of $\text{SL}_2(\mathbb R)$ is $\mathbb Z$.

To prove our claim, first we will prove that $\text{SL}_2(\mathbb R)$ deformation retracts to $\text{SO}_2(\mathbb R)$. To do that, let us define the retraction map,

\begin{align*} \varphi : &\,\text{SL}_2(\mathbb R)\to \text{SO}_2(\mathbb R)\\ &\, (v_1,v_2)\mapsto \left(\frac{v_1}{||v_1||},\frac{e_2}{||e_2||}\right) \end{align*}

where $e_2=v_2-(u_1.v_2)u_1$ where $u_1=\frac{v_1}{||v_1||}$.

To prove that this is indeed a deformation retract, let us note the homotopy

\begin{align*} \text{H} : &\,\text{SL}_2(\mathbb R)\times I\to \text{SL}_2(\mathbb R)\\ &\,(A,t)\mapsto (1-t)A+t\varphi(A)\end{align*}

so that $\text{H}(A,0)=A$, $\text{H}(A,1)=\varphi(A)$, and $\text{H}(A,t)=A$ for $A\in \text{SO}_2(\mathbb R)$. Strictly speaking, $(1-t)A+t\varphi(A)$ only has positive determinant rather than determinant $1$; dividing it by the square root of its determinant keeps the homotopy inside $\text{SL}_2(\mathbb R)$, hence completing the proof.

Now, we want to show that $\text{SO}_2(\mathbb R)$ is homeomorphic to $S^1$. To do this, let us note that the map

\begin{align*} f:\;&\text{SO}_2(\mathbb R)\to S^1\\ &\begin{pmatrix}\cos \theta & -\sin \theta  \\ \sin \theta & \cos \theta \end{pmatrix}\mapsto e^{i\theta}=(\cos \theta, \sin \theta) \end{align*}

is one-one, onto and continuous.

Proving that this map is an injection is trivial, as $(\cos \theta_1, \sin \theta_1)=(\cos \theta_2, \sin \theta_2)$ implies $\theta_1=\theta_2 \pmod{2\pi}$, and hence the two rotation matrices coincide.
Surjectivity is also trivial as any point in $S^1$ is of the form $(\cos \theta, \sin \theta)$ which has the preimage $\begin{pmatrix}\cos \theta & -\sin \theta  \\ \sin \theta & \cos \theta \end{pmatrix}$.
To prove that the map is a homeomorphism, we simply need to note that it is component wise continuous, and the inverse

\begin{align*} f^{-1}:\;&S^1\to\text{SO}_2(\mathbb R)\\ &(\cos \theta, \sin \theta)\mapsto\begin{pmatrix}\cos \theta & -\sin \theta  \\ \sin \theta & \cos \theta \end{pmatrix} \end{align*}

is also component wise continuous. 

Now, we know that the fundamental group of $S^1$ is $\mathbb Z$. So, the fundamental group of $\text{SO}_2(\mathbb R)$ and hence $\text{SL}_2(\mathbb R)$ should also be $\mathbb Z$. This completes the proof.


Problem 6 :

What does $\text{SL}_2(\mathbb R)$ look like topologically?

Solution :

We claim that $\text{SL}_2(\mathbb R)$ is homeomorphic to $S^1\times \mathbb R^2$.

Let us recall that any invertible matrix $M$ has a unique decomposition of the form $M=QR$ where $Q$ is orthogonal and $R$ is upper triangular with positive diagonal entries. So, given $M\in \text{SL}_2(\mathbb R)$, we have

$$M=\begin{pmatrix}\cos \theta & -\sin \theta  \\ \sin \theta & \cos \theta \end{pmatrix}R$$

where $R$ is upper triangular.

So,

$$\det M=\det\begin{pmatrix}\cos \theta & -\sin \theta  \\ \sin \theta & \cos \theta \end{pmatrix}\det R$$

hence giving $\det R=1$, since $\det R>0$ (its diagonal entries are positive) and the orthogonal matrix on the right hand side, having positive determinant, is indeed a rotation.

So,

$$M=\begin{pmatrix}\cos \theta & -\sin \theta\\\sin \theta & \ \ \ \cos\theta\end{pmatrix}\begin{pmatrix}a&b\\0&1/a\end{pmatrix}$$

where $a>0$, $b\in \mathbb R$.

This proves our claim: the parameters $\theta$, $\log a$ and $b$ depend continuously on $M$ and identify $\text{SL}_2(\mathbb R)$ with $S^1\times \mathbb R^2$.
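
Numerically, the decomposition can be obtained from a sign-normalised QR factorisation; the following is only an illustrative sketch.

```python
import numpy as np

M = np.array([[2.0, 3.0], [1.0, 2.0]])        # an element of SL_2(R): det M = 1
Q, R = np.linalg.qr(M)
D = np.diag(np.sign(np.diag(R)))              # normalise so that R has positive diagonal
Q, R = Q @ D, D @ R
print(np.allclose(Q @ R, M), np.allclose(Q.T @ Q, np.eye(2)))   # True True
print(np.linalg.det(Q), R)                    # det Q = 1, and R[1, 1] = 1 / R[0, 0]
```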


Alternately, we may also take a different and much more beautiful approach and note that the group $\text{SL}_2(\mathbb R)$ acts transitively on $\mathbb{R}^2 \setminus \{(0, 0)\}$ which is clearly homeomorphic to $\mathbb{R} \times S^1$. The stabilizer of the vector 

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

is the set of matrices of the form

$$\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$$

which is clearly homeomorphic to $\mathbb{R}$. This exhibits $\text{SL}_2(\mathbb R)$ as a fibre bundle over $\mathbb{R}^2 \setminus \{(0, 0)\}\cong S^1\times \mathbb R$ with contractible fibre $\mathbb R$; such a bundle is trivial, hence completing the proof.


Appendix :

Appendix Problem 1 :

Prove that the matrix exponential is well defined.

Proof :

First, we will show that, if $X$ and $Y$ are two $n\times n$ matrices, then

$$||XY|| \le ||X||.||Y||$$

where

$$||X||=\left(\sum_{i,j=1}^n x_{ij}^2\right)^{\frac 12}$$

where $x_{ij}$ is the $ij$-th entry of $X$.

So, let $X=\left\{x_{ij}\right\}$ and $Y=\left\{y_{ij}\right\}$. Using Cauchy-Schwarz Inequality, we have

\begin{align*} ||XY||^2&=\sum_{i,j} (XY)_{ij}^2\\ &\le \sum_{i,j} \left(\sum_{k}x_{ik}^2\right)\left(\sum_{k}y_{kj}^2\right)\\ &\le \left(\sum_{i,k} x_{ik}^2\right)\left(\sum_{j,k} y_{kj}^2\right)\\ &=||X||^2||Y||^2 \end{align*}

hence completing the proof.

Now, let us consider the sequence of partial sums

$$S_n=\sum_{k=0}^n \frac{X^k}{k!}$$

We will show that $\{S_n\}_{n=1}^\infty$ is a Cauchy sequence. To do so, let us note that for $n>m$, we have

\begin{align*} ||S_n-S_m|| &=\left\lVert \sum_{k=m+1}^n \frac{X^k}{k!}\right\rVert\\ &\le \sum_{k=m+1}^n \frac{||X^k||}{k!}\\ &\le \sum_{k=m+1}^n \frac{||X||^k}{k!}\end{align*}

which goes to $0$ as $m\to\infty$, since the real series $\sum_{k\ge 0} \frac{\lVert X\rVert^k}{k!}$ converges (to $e^{\lVert X\rVert}$). So, $\{S_n\}_{n=1}^\infty$ is a Cauchy sequence.

Now, we know that $(\mathbb R^{n\times n}, ||.||_{\mathbb R^{n\times n}})$ is complete. So, $\{S_n\}_{n=1}^\infty$, being a Cauchy sequence, must converge.

This completes the proof.


Appendix Problem 2 :

Prove that $\gamma_A$ as defined in Problem 4 is a smooth curve.

Proof :

Let us consider the partial sums of $\exp (tA)$ given by

\begin{align*} S_n&=\sum_{k=0}^n \frac{(tA)^k}{k!}\\ &=\sum_{k=0}^n \frac{t^kA^k}{k!} \end{align*}

So, we have

\begin{align*} S_n^\prime &=\left(\sum_{k=0}^n \frac{t^kA^k}{k!}\right)^\prime\\ &=\sum_{k=1}^n \frac{kt^{k-1}A^k}{k!}\\ &=A.\sum_{k=0}^{n-1} \frac{t^kA^k}{k!} \end{align*}

which (using the result of Appendix Problem 1) converges, uniformly in $t$ on compact intervals.


The result that we have used in the last argument is from Rudin's Principles of Mathematical Analysis, Theorem 7.17, given by
Suppose $\{f_n\}$ is a sequence of functions, differentiable on $[a,b]$ and such that $\{f_n(x_0)\}$ converges for some point $x_0$ on $[a,b]$. If $\{f_n^{\prime}\}$ converges uniformly on $[a,b]$, then $\{f_n\}$ converges uniformly on $[a,b]$, to a function $f$, and

$$f^{\prime}(x)=\lim_{n\to\infty}f_n^\prime(x),\quad(a\leq x\leq b).$$


Arguing inductively, we can show that $\gamma_A$ is $C^\infty$ and

$$\gamma_A^{(n)}=A^{n}.\gamma_A$$

which completes the proof.


Alternately, we could have also noted that each entry of $\gamma_A$ being a power series in $t$, is $C^\infty$ and hence $\gamma_A$ is also $C^\infty$.


Appendix Problem 3 :

We want to understand what a smooth curve in $\text{SL}_2(\mathbb R)$ actually means. This we will do in the next appendix note. For now, we wish to set the stage and look at $\text{SL}_2(\mathbb R)$ as a differentiable manifold.

Proof :

We know that, to obtain a $\mathcal C^1$-differentiable manifold from a topological manifold with atlas $\mathscr A$, we only need to check that every transition map between charts in $\mathscr A$ is differentiable in the usual sense.
In this case, we have the atlas $\mathscr A=\{(U,x),(V,y)\}$ (from solution of Problem 1). It is easy to see that

$$\left(y\circ x^{-1}\right).(a,b,c)=y.\begin{pmatrix}a & b  \\ c & \frac{1+bc}{a} \end{pmatrix}=\left(a,b,\frac{1+bc}{a}\right)$$

This gives us the transition map

\begin{align*} y\circ x^{-1}:&x(U\cap V)\to y(U\cap V)\\ &(a,b,c)\mapsto \left(a,b,\frac{1+bc}{a}\right) \end{align*}

Similarly,

$$\left(x\circ y^{-1}\right).(a,b,d)=x.\begin{pmatrix}a & b  \\ \frac{ad-1}{b} & d \end{pmatrix}=\left(a,b,\frac{ad-1}{b}\right)$$

giving us

\begin{align*} x\circ y^{-1}:&y(U\cap V)\to x(U\cap V)\\ &(a,b,d)\mapsto \left(a,b,\frac{ad-1}{b}\right) \end{align*}

Clearly, the transition maps are differentiable as $a\neq 0$ and $b\neq 0$.


This proves that $\mathscr A$ is a differentiable atlas. So, we see that $\text{SL}_2(\mathbb R)$ is a differentiable manifold.


Appendix Note 1 :

Let us recall that a function $f:\mathbb R^n\to M$ on a manifold $M$ is called smooth if for all charts $(U,\phi)$ the function

$$\phi\circ f:\mathbb R^n \to \phi(U)$$

is smooth (in the Euclidean sense).

In our case (in Problem 2), we were already given the definition that $\gamma$ is smooth if $\gamma$ is a smooth curve in $\mathbb R^4$. We want to unify the two definitions which will establish that $\text{SL}_2(\mathbb R)$ is a good enough manifold. So, given the two charts $\{(U,\phi),(V,\psi)\}$, let us look at

\begin{align*} &\phi\circ\gamma : \gamma^{-1}(U)\subseteq\mathbb{R} \to\mathbb{R}^3\\ &\phi\circ \gamma(t)=\phi\circ \begin{pmatrix} p_1(t)&q_1(t)\\ p_2(t)&q_2(t) \end{pmatrix}=\left(p_1(t),q_1(t),p_2(t)\right) \end{align*}

and

\begin{align*} &\psi\circ\gamma : \gamma^{-1}(V)\subseteq\mathbb{R} \to\mathbb{R}^3\\ &\psi\circ \gamma(t)=\psi\circ \begin{pmatrix} p_1(t)&q_1(t)\\ p_2(t)&q_2(t) \end{pmatrix}=\left(p_1(t),q_1(t),q_2(t)\right) \end{align*}

Now, these being maps from $\mathbb R^n$ to $\mathbb R^m$ are smooth in the Euclidean sense as the components are smooth.

Once this unification is done, we can now say that component-wise smoothness is enough to guarantee manifold-wise smoothness. This justifies the argument we used in Problem 2 to show that $B\cdot \gamma$ is smooth.


Appendix Problem 4 :

Prove that $\text{SL}_2(\mathbb R)$ is a topological space.

Proof :

Let us recall that if $N$ is a subset of $M$, and $\mathcal O$ is a topology on $M$, then

$$\mathcal O \big |_N:=\{U\cap N: U\in \mathcal O\}$$

equips $N$ with the subset topology inherited from $M$.

So, let us take the standard topology on $\mathbb R$ given by

\begin{align*} &B_r(x):=\{y\in \mathbb R: |x-y|<r\}\\ &U\in \mathcal O_{\mathbb R} \iff \forall x\in U\;\exists r>0:B_r(x)\subset U \end{align*}

Now, we can equip $\mathbb R^4$ with the product topology $\mathcal O_{\mathbb R^4}$ so that we can finally define

$$\mathcal O:=\left (\mathcal O_{\mathbb R^4}\right )\big |_{\text{SL}_2(\mathbb R)}$$

so that $(\text{SL}_2(\mathbb R),\mathcal O)$ is a topological space.

This added to Appendix Problem 3 and Problem 1 shows that $\text{SL}_2(\mathbb R)$ is a differentiable topological manifold.


Appendix Problem 5 :

Prove that $\text{SL}_2(\mathbb R)$ forms a group under the usual definition of multiplication.

Proof :

The usual definition of multiplication is of course given by

\begin{align*} &\cdot : \text{SL}_2(\mathbb R)\times\text{SL}_2(\mathbb R) \to \text{SL}_2(\mathbb R)\\ &\left(\begin{pmatrix}a & b  \\ c & d \end{pmatrix},\begin{pmatrix}e & f  \\ g & h \end{pmatrix}\right)\mapsto \begin{pmatrix}ae+bg & af+bh  \\ ce+dg & cf+dh \end{pmatrix} \end{align*}

This operation is closed as real numbers are closed under addition and multiplication. It is also straightforward to check that

1. this operation is associative

2. has the identity element $\begin{pmatrix}1 & 0  \\ 0 & 1 \end{pmatrix}$, and

3. for each $\begin{pmatrix}a & b  \\ c & d \end{pmatrix}$, admits the inverse $\begin{pmatrix}d & -b  \\ -c & a \end{pmatrix}$.


This completes the proof that $\text{SL}_2(\mathbb R)$ is a (non-commutative) group.


Appendix Problem 6 :

Prove that $\text{SL}_2(\mathbb R)$ is connected.

Proof :

We will prove the more general case of $\text{SL}_n(\mathbb R)$.

First, let us prove that the elementary matrices of first type generate $\text{SL}_n(\mathbb R)$. This is Exercise 2.4.8(b) of Artin's Algebra.

If $E$ is an elementary matrix, then $X\mapsto EX$ can be described as follows: if $E$ is Type 1, with the off-diagonal entry $a$ at $(i,j)$, we do $r_i\mapsto r_i+ar_j$; if $E$ is Type 3, with the modified diagonal entry $c\ne0$ at index $i$, then $r_i\mapsto cr_i$.

Note that Type 1 matrices have determinant $1$ and Type 3 matrices have determinant $c$.

In order to show$$M = E_1E_2\dots E_k$$for some (permitted) elementary matrices $E_i$, it suffices to show $$I_n = F_kF_{k-1}\dots F_1M$$for some elementary $F_i$, since then$$M = F_1^{-1}\dots F_k^{-1},$$as elementary matrices are invertible, and their inverses are elementary as well.

Now, we consider $M\in \text{SL}_n(\mathbb{R})$. Using the row operations corresponding to Type 1 elementary matrices, we turn column $i$ into $e_i$ ($1$ at position $i$, $0$ elsewhere) from left to right.

Take the leftmost column $i$ with $c_i \ne e_i$, if it exists (otherwise, we are done). Since $\det(M) = 1 \ne 0$, we cannot have $c_i$ written as a linear combination of$$c_1=e_1,\dots,\,c_{i-1}=e_{i-1};$$hence one of the entries $i,i+1,\dots,n$ of $c_i$ must be nonzero, say the $j$-th.

Subtracting suitable multiples of $r_j$ from the other rows, we first clear out all of column $i$ (except for row $j$). Note that none of this affects columns $1$ through $i-1$. If $i=n$, we have a diagonal matrix with determinant $1$ whose first $n-1$ diagonal entries are all $1$, so we are done. Otherwise, if $i<n$, pick an arbitrary row $k\ne j$ from $i$ to $n$, and add a suitable multiple of $r_j$ to $r_k$ so that the $(k,i)$ entry becomes $1$. Now subtract a suitable multiple of $r_k$ from $r_j$ so the $(j,i)$ entry becomes $0$. If $k=i$, we can proceed to column $i+1$; otherwise, add $r_k$ to $r_i$ and subtract $r_i$ from $r_k$, and then proceed to column $i+1$.

Now, let $\sim$ be the binary relation on $\text{SL}_n(\mathbb{R})$ corresponding to path-connectivity (i.e., $A\sim B$ if there is a continuous path in $\text{SL}_n(\mathbb{R})$ from $A$ to $B$). Clearly, $\sim$ is an equivalence relation.

In order to show $\text{SL}_n(\mathbb{R})$ is path-connected, it suffices to show $A\sim I_n$ for all $A\in \text{SL}_n(\mathbb{R})$. But, we just proved that $A$ can be written as a (possibly empty) product of elementary matrices of the first type, so it in fact suffices to prove that

$$E_{uv}(a)M\sim M$$

for all $M\in \text{SL}_n(\mathbb{R})$ and Type 1 elementary matrices $E_{uv}(a)$ ($1\le u,\,v\le n$, $u\neq v$) of the form

$$I_n + [a[(i,\,j) = (u,\,v)]]_{i,\,j\,=\,1}^n$$

Yet

$$M\to E_{uv}(b)M$$

simply adds $b$ times row $v$ to row $u$, i.e. takes $r_u$ to $r_u+br_v$. For fixed $u$, $v$, $M$, this map is continuous in $b$ (and preserves the determinant), so the continuous function

$$X(t) = E_{uv}(ta)M$$

over $[0,1]$ takes

$$X(0) = M \to X(1) = E_{uv}(a)M$$

while remaining inside $\text{SL}_n(\mathbb{R})$, as desired.

Alternately, we can also first prove that $\text{GL}_n^+(\mathbb R):=\big\{A\in M_n(\mathbb R): \det(A)>0\big\}$ is connected, and then consider the continuous surjective map

$$\Psi\colon \text{GL}_n^+(\mathbb R)\ni A\mapsto \frac{A}{(\det A)^{1/n}}\in \text{SL}_n(\mathbb R)$$

and recall that continuous image of a connected set is connected.

Now, we can prove $\text{GL}_n^+(\mathbb R)$ is connected by induction on $n$. Consider the map

$$p\colon \text{M}_n(\mathbb R)=\Bbb R^n\times \text{M}_{n\times (n-1)}(\mathbb R) \to \Bbb R^n$$

given by $p(A)=Ae_1$. That is $p$ sends $A\in \text{M}_n(\mathbb R)$ to its first column. Note that $p$ is a projection map, hence open and continuous. 

Note that $\text{GL}_1^+(\mathbb R)=(0,\infty)$, which is connected. Now, let $f\colon \text{GL}_n^+(\mathbb R)\to \Bbb R^n-\{0\}$ be the restriction of $p$ to $\text{GL}_n^+(\mathbb R)$, which is open in $\text{M}_n(\mathbb R)$. So, $f$ is also open and continuous.

Next, $f^{-1}(e_1)=\Bbb R^{n-1}\times \text{GL}^+_{n-1}(\Bbb R)$, which is a connected set by induction. For $y\in \Bbb R^n-\{0\}$ choose $B\in \text{GL}_n^+(\mathbb R)$ with $f(B)=y$. Then, $f^{-1}(y)=\big\{B\cdot C:C\in f^{-1}(e_1)\big\}$. Hence each fibre of $f$ is connected. But we know that if $Y$ is connected and $f\colon X\to Y$ is a surjective continuous open map having connected fibers, then $X$ is also connected. This we can prove by contradiction:

If possible, write $X=U\bigsqcup V$ where $U,V$ are non-empty open subsets of $X$. Then, $f(U),f(V)$ are open subsets of $Y$ such that $f(U)\cup f(V)=Y$. Now, $Y$ being connected implies $f(U)\cap f(V)\not=\emptyset$. Take $y\in f(U)\cap f(V)$; then $f^{-1}(y)\cap U\not =\emptyset$ and $f^{-1}(y)\cap V\not=\emptyset$, contradicting the fact that the fibers are connected.

This proves that $\text{SL}_n(\mathbb R)$, and in particular $\text{SL}_2(\mathbb R)$, is connected (the first argument in fact gives path-connectedness).


Appendix Problem 7 :

Prove that $\text{SL}_2(\mathbb R)$ forms a Lie group.

Proof :

We have equipped $\text{SL}_2(\mathbb R)$ with both a group and a manifold structure. In order to obtain a Lie group structure, we have to check that these two structures are compatible, i.e, we need to show that the two maps

\begin{align*} \mu:&\text{SL}_2(\mathbb R)\times\text{SL}_2(\mathbb R)\to\text{SL}_2(\mathbb R)\\ &\left(\begin{pmatrix}a & b  \\ c & d \end{pmatrix},\begin{pmatrix}e & f  \\ g & h \end{pmatrix}\right)\mapsto \begin{pmatrix}ae+bg & af+bh  \\ ce+dg & cf+dh \end{pmatrix} \end{align*}

and

\begin{align*} i:&\text{SL}_2(\mathbb R)\to\text{SL}_2(\mathbb R)\\ &\begin{pmatrix}a & b  \\ c & d \end{pmatrix}\to \begin{pmatrix}a & b  \\ c & d \end{pmatrix}^{-1}\end{align*}

are differentiable with the differentiable structure on $\text{SL}_2(\mathbb R)$. For instance, for the inverse map $i$, we have to show that the map $y\circ i\circ x^{-1}$ is differentiable in the usual sense for any pair of charts $(U,x),(V,y)\in \mathscr A$.

\begin{array}{c} U\subseteq \text{SL}_2(\mathbb R) \xrightarrow{\quad i\quad} V\subseteq \text{SL}_2(\mathbb R) \\ \downarrow x\qquad\qquad\qquad\quad \downarrow y \\ x(U)\subseteq \mathbb R^3\xrightarrow{y\circ i\circ x^{-1}} y(V)\subseteq \mathbb R^3 \end{array}

But, since $\text{SL}_2(\mathbb R)$ is connected, the differentiability of the transition maps in $\mathscr A$ implies that if $y\circ i\circ x^{-1}$ is differentiable for any two given charts, then it is differentiable for all charts in $\mathscr A$. Hence, we can simply let $(U,x)$ and $(V,y)$ be the two charts on $\text{SL}_2(\mathbb R)$ defined above. Then we have

$$\left(y\circ i\circ x^{-1}\right)(a,b,c)=(y\circ i)\left(\begin{pmatrix}a & b  \\ c & \frac{1+bc}{a} \end{pmatrix}\right)=y\left(\begin{pmatrix}\frac{1+bc}{a} & -b  \\ -c & a \end{pmatrix}\right)=\left(\frac{1+bc}{a},-b,a\right)$$

which is clearly differentiable as a map between open subsets of $\mathbb R^3$ as $a\neq 0$ on $x(U)$.

To prove that $\mu$ is differentiable, we can proceed almost similarly once we have a differentiable structure on the product manifold $\text{SL}_2(\mathbb R)\times \text{SL}_2(\mathbb R)$. Or, we may also argue that the matrix multiplication $\text{M}_2(\mathbb R)\times\text{M}_2(\mathbb R)\to\text{M}_2(\mathbb R)$ is given by smooth expressions in the entries (involving products and sums) and hence is a smooth map, from which it follows that the restriction $\text{SL}_2(\mathbb R)\times\text{SL}_2(\mathbb R)\to\text{SL}_2(\mathbb R)$ is also smooth.

This completes the proof that $\text{SL}_2(\mathbb R)$ is a 3-dimensional Lie group.

An Application of a Theorem I Just Learned

Question : Recall that a polygonal number is a number that counts dots arranged in the shape of a regular polygon. Prove that the $n$-th non $s$-gonal number is given by
$$\left\lfloor \frac 12 + \sqrt{\frac 14 + \frac{2(n-1)}{s-2}}\right\rfloor + n$$
where $\lfloor x\rfloor$ represents the greatest integer less than or equal to $x$.

Answer : It is well known that
$$(s-2)\frac{n(n-1)}{2}+n$$
is the $n$-th $s$-gonal number.

Now, we apply the Lambek-Moser Theorem with
$$f(n)=(s-2)\frac{n(n-1)}{2}$$
so that $F(n):=f(n)+n$ is exactly the $n$-th $s$-gonal number. The Lambek-Moser inverse of $f$, namely $f^\ast(n):=\#\{m\in\mathbb Z^+ : f(m)<n\}$, works out to
$$f^\ast (n) = \left\lfloor \frac 12 + \sqrt{\frac 14 + \frac{2(n-1)}{s-2}}\right\rfloor$$
and the theorem tells us that $F^\ast(n):=f^\ast(n)+n$ enumerates precisely the positive integers missed by $F$. In other words, the $n$-th non $s$-gonal number is
$$F^\ast (n) = \left\lfloor \frac 12 + \sqrt{\frac 14 + \frac{2(n-1)}{s-2}}\right\rfloor + n$$
hence completing the proof.
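As a quick check of the closed form (my own addition, using hypothetical helper names), here is a short Python script that compares the formula against a direct enumeration of the complement of the $s$-gonal numbers for small $s$ and $n$:

```python
# Brute-force sketch: compare the claimed closed form for the n-th non s-gonal
# number with a direct enumeration of the positive integers that are not s-gonal.
from math import floor, sqrt

def sgonal(s, n):
    """n-th s-gonal number: (s-2)*n*(n-1)/2 + n."""
    return (s - 2) * n * (n - 1) // 2 + n

def non_sgonal_formula(s, n):
    """Closed form: floor(1/2 + sqrt(1/4 + 2(n-1)/(s-2))) + n."""
    return floor(0.5 + sqrt(0.25 + 2 * (n - 1) / (s - 2))) + n

def non_sgonal_bruteforce(s, n):
    """n-th positive integer that is not an s-gonal number."""
    gonal = {sgonal(s, k) for k in range(1, 2 * n + 10)}
    m, count = 0, 0
    while count < n:
        m += 1
        if m not in gonal:
            count += 1
    return m

for s in range(3, 8):
    assert all(non_sgonal_formula(s, n) == non_sgonal_bruteforce(s, n)
               for n in range(1, 200))
print("closed form matches brute force for s = 3,...,7 and n = 1,...,199")
```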

Remark : A careful induction on $n$ (keeping $s$ fixed) would probably have also done the job.

Acknowledgement : I learnt about the Lambek-Moser Theorem from the book The Irrationals: A Story of the Numbers You Can't Count On by Julian Havil (section Theoretical Matters of the chapter Does Irrationality Matter?), which I also reviewed for zbMATH.

The Law of Iterated Logarithm

1. Introduction

The goal is to give a complete proof of the Law of Iterated Logarithm. The setup we will work in is the following. Let $\{X_i\}_{i=1}^\infty$ be a sequence of iid random variables with mean $0$ and variance $1$. Let
$$S_n := \sum_{i=1}^n X_i \qquad \text{ and } \qquad \bar{S}_n := \frac {S_n}n $$
for $n\in \mathbb Z^+$. From now on, $X_n$, $S_n$ and $\bar{S}_n$ will always refer to these objects.

We wish to prove the following theorem.
Theorem 1 (The Law of Iterated Logarithm) : Let
$$\Lambda_n := \sqrt{2n \log \log n}$$
wherever it is defined. Then,
$$\limsup_{n\to \infty} \frac{S_n}{\Lambda_n} = 1$$
almost surely. By replacing $X_j$ by $-X_j$, this also proves that
$$\liminf_{n\to \infty} \frac{S_n}{\Lambda_n} = -1$$
almost surely. In fact, for any $f\in C(\mathbb R, \mathbb R)$,
$$\limsup_{n\to \infty} f\left(\frac{S_n}{\Lambda_n}\right) = \sup_{t\in [-1,1]} f(t)$$
almost surely. That is, the set of limit points of
$$\left\{\frac{S_n}{\Lambda_n} : n\ge 1\right\}$$
coincides with the interval $[-1,1]$ almost surely.

In what follows, we will use the notation
$$\Lambda_\beta := \Lambda_{\left\lfloor\beta\right\rfloor} \qquad \text{ and } \qquad \tilde{S}_\beta := \frac{S_{\left\lfloor\beta\right\rfloor}}{\Lambda_\beta}$$
where $\left\lfloor\beta\right\rfloor$ is the greatest integer less than or equal to $\beta$.

The main reference we are following is Probability Theory, An Analytic View by Daniel W. Stroock.
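Before getting into the lemmas, here is a small simulation sketch (my own addition, not from Stroock) that illustrates the statement: along a single path of standard normal increments, $S_n/\Lambda_n$ hovers roughly inside $[-1,1]$. Since the convergence happens at a $\log\log n$ scale, this is only a qualitative illustration, not numerical evidence for the theorem.

```python
# Qualitative illustration of the LIL along one simulated path.
import numpy as np

rng = np.random.default_rng(0)
n_max = 10**6
X = rng.standard_normal(n_max)            # iid increments with mean 0 and variance 1
S = np.cumsum(X)                          # S[k-1] = S_k

n = np.arange(3, n_max + 1)               # log log n needs n >= 3
Lam = np.sqrt(2 * n * np.log(np.log(n)))  # Lambda_n
ratio = S[2:] / Lam                       # S_n / Lambda_n for n = 3, ..., n_max

print("max of S_n / Lambda_n along this path:", ratio.max())
print("min of S_n / Lambda_n along this path:", ratio.min())
```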

2. Lemmas

To prove Theorem 1, we need three lemmas.

Lemma 1 : For $a\in (0,\infty)$ and $\beta\in (1,\infty)$, if
$$\sum_{m=1}^\infty \mathbb P \left(\left\lvert \tilde{S}_{\beta^m}\right\rvert \ge \frac{a}{\sqrt \beta}\right)<\infty$$
then
$$\limsup_{n\to \infty} \left\lvert \tilde{S}_n\right\rvert \le a$$
almost surely.

Lemma 2 : For any sequence $\{X_n : n\ge 1\}$ of iid random variables with mean $0$ and variance $\sigma^2$, we have
$$\limsup_{n\to \infty} \left\lvert \tilde{S}_n \right\rvert \le 8\sigma$$
almost surely.

Lemma 3 : Let the sequence $\{X_n\}$ have a common distribution $\mu$. Further, let the moment-generating function
$$M_\mu (\xi) := \int_{\mathbb R} e^{\xi x} \mu(\text{d}x) < \infty$$
for all $\xi \in \mathbb R$. Then, for all $R\in (0,\infty)$, there is an $N(R)\in \mathbb Z^+$ and a suitable constant $K$ such that
$$\mathbb P\left(\left\lvert \tilde{S}_n\right\rvert \ge R\right) \le 2\, \exp \left [-\left(1-K\sqrt{\frac{8R\log\log n}{n}}\right) R^2 \log\log n\right]$$
for all $n\ge N(R)$. Also, for each $\varepsilon\in (0,1]$, there is an $N(\varepsilon)\in \mathbb Z^+$ such that for all $n\ge N(\varepsilon)$ and $\left\lvert a\right\rvert \le \frac{1}{\varepsilon}$,
$$\mathbb P \left(\left\lvert \tilde{S}_n - a\right\rvert < \varepsilon\right) \ge \frac 12 \, \exp \left[-\left(a^2 +4K\left\lvert a\right\rvert \varepsilon \right)\log \log n\right]$$
for some suitable constant $K$.

3. Proof of Theorem 1
We will first give a proof assuming the three lemmas and then give a proof of the lemmas afterwards.

Proof of Theorem 1 : Observe first that for any $X_n$ and $\varepsilon>0$, it is easy to find a $\psi \in C_\text{b}(\mathbb R)$ so that $Y_n := \psi \circ X_n$ has mean $0$ and variance $1$ while $Z_n := X_n - Y_n$ has variance less than $\varepsilon^2$ (what we want is a smooth truncation - one way to obtain it is to first consider $X_n\cdot \mathbf 1_{\left\lvert X_n\right\rvert<n}$ and then appropriately smooth the edges). Applying Lemma 2 to this $Z_n$, we have
$$\limsup_{n\to \infty} \left\lvert \tilde{S}_n(\omega)\right\rvert \le 1 + \limsup_{n\to \infty} \left\lvert \frac{\sum_{m=1}^n Z_m(\omega)}{\Lambda_n}\right\rvert \le 1+8\varepsilon$$
and for $a\in [-1,1]$, we have
$$\liminf_{n\to \infty}\left\lvert \tilde{S}_n(\omega)-a\right\rvert \le \limsup_{n\to \infty} \left\lvert \frac{\sum_{m=1}^n Z_m(\omega)}{\Lambda_n}\right\rvert \le 8\varepsilon$$
for almost every $\omega\in \Omega$. So, it is enough to assume that the random variables are bounded, and that is what will be assumed in the rest of the proof.

Now, for a given $\beta >1$, we can use Lemma 3 to argue
$$\mathbb P\left(\left\lvert\tilde{S}_{\beta^m}\right\rvert \ge \sqrt\beta\right) \le 2\, \exp\left[-\sqrt\beta \log\log \left\lfloor \beta^m\right\rfloor\right]$$
for all sufficiently large $m\in \mathbb Z^+$. So, using Lemma 1 with $a=\beta$, we see that
$$\limsup_{n\to \infty} \left\lvert\tilde{S}_n\right\rvert \le \beta$$
almost surely for all $\beta\in (1,\infty)$.

So, to complete the proof, we need to show
$$\mathbb P\left(\liminf_{n\to \infty} \left\lvert \tilde{S}_n -a\right\rvert <\varepsilon\right)=1$$
for all $a\in (-1,1)$ and all $\varepsilon>0$. To do this, we wish to use the Borel-Cantelli Lemma. But for that, we need to make sure we are dealing with independent events. For this, we can use the previously proved result to argue that
$$\liminf_{n\to \infty} \left\lvert \tilde{S}_n -a\right\rvert \le \inf_{k\ge 2}\, \liminf_{m\to \infty} \left\lvert \tilde{S}_{k^m} -a\right\rvert \le \liminf_{k\to \infty}\,\liminf_{m\to \infty} \left\lvert\frac{S_{k^m}-S_{k^{m-1}}}{\Lambda_{k^m}}-a\right\rvert$$
almost surely.

Now, since the events
$$A_{k,m} := \left\{\left\lvert\frac{S_{k^m}-S_{k^{m-1}}}{\Lambda_{k^m}} -a\right\rvert <\varepsilon\right\}_{m\in \mathbb Z^+}$$
are mutually independent for each $k\ge 2$, it is enough to check that
$$\sum_{m=1}^\infty \mathbb P\left(A_{k,m}\right)=\infty$$
for sufficiently large $k$.

But, since
$$\lim_{k\to \infty} \,\max_{m\in \mathbb Z^+} \left\lvert\frac{\Lambda_{k^m}}{\Lambda_{k^m-k^{m-1}}} -1\right\rvert = 0$$
and
$$\mathbb P\left(A_{k,m}\right) = \mathbb P\left(\left\lvert \tilde{S}_{k^m -k^{m-1}}-\frac{a\,\Lambda_{k^m}}{\Lambda_{k^m -k^{m-1}}}\right\rvert < \frac{\varepsilon \,\Lambda_{k^m}}{\Lambda_{k^m -k^{m-1}}}\right)$$
for all $k\ge 2$, it is enough to show
$$\sum_{m=1}^\infty \mathbb P\left(\left\lvert \tilde{S}_{k^m -k^{m-1}}-a\right\rvert < \varepsilon\right) = \infty$$
for every $k\ge 2$ and $a\in (-1,1)$ and $\varepsilon>0$. This follows by using the second part of Lemma 3 and choosing $\tilde \varepsilon>0$ so small that $\rho := a^2 + 4K\tilde{\varepsilon}|a|<1$ from which it is possible to conclude that when $0<\varepsilon<\tilde{\varepsilon}$, we have
$$\mathbb P\left(\left\lvert \tilde{S}_n -a\right\rvert <\varepsilon\right) \ge \frac 12 \, \exp\left[ -\rho \log\log n\right]$$
for all sufficiently large $n$.

This completes the proof. $\square$

4. Proof of Lemma 1

We first introduce the idea of a median. Given a real valued random variable $Y$, we say $\alpha\in \mathbb R$ is a median of $Y$ if
$$\mathbb P(Y\le \alpha) \ge \frac 12 \qquad \text{ and } \qquad \mathbb P(Y\ge \alpha) \ge \frac 12$$
and write $\alpha\in \operatorname{med}(Y)$. It is easy to see that
$$\alpha:= \inf \left\{t\in \mathbb R : \mathbb P(Y\le t)\ge \frac 12\right\}$$
is a median of $Y$; in other words, every $Y$ has a median. Additionally, it can be shown (using Chebyshev's inequality) that if $Y$ is square integrable and $m$ is the mean of $Y$, then
$$\left\lvert \alpha-m \right\rvert \le \sqrt{2 \,\text{Var}(Y)}$$
for all $\alpha\in \text{med}(Y)$.
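Here is a tiny illustration (my own addition) of these two facts for a concrete asymmetric distribution: the infimum above really is a median, and it sits within $\sqrt{2\,\text{Var}(Y)}$ of the mean.

```python
# Median via the infimum formula, for a skewed three-point distribution,
# together with the bound |alpha - m| <= sqrt(2 Var(Y)).
import numpy as np

values = np.array([0.0, 1.0, 10.0])
probs  = np.array([0.5, 0.3, 0.2])

cdf = np.cumsum(probs)
alpha = values[np.searchsorted(cdf, 0.5)]   # inf{ t : P(Y <= t) >= 1/2 }
mean = np.dot(values, probs)
var = np.dot(probs, (values - mean) ** 2)

print("median:", alpha, "  mean:", mean)
print("|alpha - m| <= sqrt(2 Var):", abs(alpha - mean) <= np.sqrt(2 * var))
```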

We also have the following theorem.
Theorem 2 (Lévy Reflection Principle) : For $k\le \ell$, choose $\alpha_{\ell,k} \in \operatorname{med}(S_{\ell}-S_k)$. Then, for all $\varepsilon>0$, we have
$$\mathbb P\left(\max_{1\le n\le N} \left\lvert S_n +\alpha_{N,n}\right\rvert \ge \varepsilon\right)\le 2\mathbb P(\left\lvert S_N\right\rvert \ge \varepsilon)$$
for all $N\in \mathbb Z^+$.

With this at hand, we move to the proof.

Proof of Lemma 1 : Let $\beta\in (1,\infty)$ be given. For each $m\in \mathbb N$ and $1\le n\le \beta^m$, let $\alpha_{m,n}$ be a median of $S_{\left\lfloor \beta ^m\right\rfloor}-S_n$. This gives $\left\lvert \alpha_{m,n}\right\rvert \le \sqrt{2\beta^m}$ so that
\begin{align*}\limsup_{n\to \infty} \left\lvert \tilde{S}_n \right\rvert &= \limsup_{m\to \infty} \max_{\beta^{m-1}\le n\le \beta^m} \left\lvert \tilde{S}_n \right\rvert\\ &\le \sqrt{\beta} \,\limsup_{m\to \infty} \,\max_{n\le \beta^m} \frac{\left\lvert S_n\right\rvert}{\Lambda_{\beta^m}}\\ &\le \sqrt \beta \,\limsup_{m\to \infty} \,\max_{n\le \beta^m} \frac{\left\lvert S_n + \alpha_{m,n}\right\rvert}{\Lambda_{\beta^m}} \end{align*}
which gives
$$\mathbb P \left(\limsup_{n\to \infty} \left\lvert \tilde{S}_n \right\rvert \ge a\right) \le \mathbb P\left(\limsup_{m\to \infty}\max_{n\le \beta^m} \frac{\left\lvert S_n + \alpha_{m,n}\right\rvert}{\Lambda_{\beta^m}} \ge \frac{a}{\sqrt \beta}\right)$$
and hence on application of Theorem 2, we have
$$\mathbb P\left(\max_{n\le \beta^m} \frac{\left\lvert S_n + \alpha_{m,n}\right\rvert}{\Lambda_{\beta^m}} \ge \frac{a}{\sqrt \beta}\right) \le 2\mathbb P\left(\left\lvert \tilde{S}_{\beta^m}\right\rvert \ge \frac{a}{\sqrt \beta}\right)$$
from which the desired result follows using the Borel-Cantelli Lemma. $\square$

5. Proof of Lemma 2

Again, we need to introduce a few objects before we can move to our proof.

We define the Rademacher functions $\{R_n\}_{n\in \mathbb Z^+}$ on $[0,1)$ as follows. Consider the function $R:\mathbb R\to \{-1,1\}$ given by
$$R(t) := \begin{cases} -1 & \text{if}\qquad t-\lfloor t\rfloor \in \left[0,\frac 12\right)\\ \,\,\,\,\,1 & \text{if}\qquad t-\lfloor t\rfloor \in \left[\frac 12, 1\right) \end{cases}$$
and define $R_n$ on $[0,1)$ by
$$R_n(\omega) := R\left(2^{n-1} \omega\right)$$
for $n\in \mathbb Z^+$ and $\omega\in [0,1)$. It can be shown that the Rademacher functions are mutually independent.
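A small numerical sketch (my own addition) of these functions: sampling $t$ uniformly from $[0,1)$, the vector $(R_1(t),\dots,R_4(t))$ should be (approximately) uniformly distributed over $\{-1,+1\}^4$, which is exactly what mutual independence together with $\mathbb P(R_n=\pm1)=\tfrac12$ amounts to here.

```python
# Empirical check that the first few Rademacher functions behave like
# independent fair signs under the uniform measure on [0, 1).
from collections import Counter
import random

def R(t):
    """R(t) = -1 on [0, 1/2) and +1 on [1/2, 1), extended 1-periodically."""
    return -1 if (t % 1.0) < 0.5 else 1

def R_n(n, t):
    """n-th Rademacher function: R_n(t) = R(2^{n-1} t)."""
    return R(2 ** (n - 1) * t)

def sample():
    t = random.random()                      # t ~ Uniform[0, 1)
    return tuple(R_n(n, t) for n in range(1, 5))

random.seed(0)
counts = Counter(sample() for _ in range(160_000))
# Each of the 16 sign patterns should have frequency close to 1/16 = 0.0625.
print({pattern: round(freq / 160_000, 3) for pattern, freq in counts.items()})
```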

We will use these functions in our proof.

Proof of Lemma 2 : Without loss of generality, it can be assumed that $\sigma=1$. We begin by considering the case where the $X_n$'s are symmetric, i.e., $-X_n$ has the same distribution as $X_n$ itself. In this case, we will be able to prove the result with the constant $8$ replaced by $4$ once we can show
$$\sum_{m=0}^\infty \mathbb P\left(\left\lvert \tilde{S}_{2^m}\right\rvert\ge 2^{3/2}\right)<\infty$$
since we can then use Lemma 1.

Now, let $\left(\Omega, \mathcal F, \mathbb P\right)$ be the probability space on which the $X_n$'s are defined and let $\{R_n\}_{n\in \mathbb Z^+}$ be the sequence of Rademacher functions on $[0,1)$. Set $\mathbb Q := \lambda_{[0,1)}\times \mathbb P$ on $\left([0,1)\times \Omega , \mathcal B_{[0,1)} \times \mathcal F\right)$ where $\lambda_{[0,1)}$ is the Lebesgue measure on $[0,1)$. It is easy to check that the symmetry of $X_n$ is equivalent to the statement that
$$\omega \in \Omega \mapsto \left(X_1(\omega), X_2(\omega), \dots\right) \in \mathbb R^{\mathbb Z^+}$$
has the same distribution under $\mathbb P$ as
$$(t,\omega) \in [0,1)\times \Omega \mapsto \left(R_1(t)X_1(\omega), R_2(t)X_2(\omega), \dots\right)\in \mathbb R^{\mathbb Z^+}$$
has under $\mathbb Q$.

Now, from Hoeffding's inequality, we know the following: given real numbers $\sigma_1,\dots,\sigma_n$ and mutually independent random variables $X_1,\dots,X_n$ with the common centered Bernoulli distribution $\mu$ given by $\mu(\{-1\})=\mu(\{1\})=\frac 12$, if $S := \sum_{k=1}^n \sigma_k X_k$, then for all $a\in [0,\infty)$, we have
$$\mathbb P \left(|S|\ge a\right) \le 2 \,\exp \left[-\frac{a^2}{2 \Sigma^2}\right]$$
where $\Sigma^2 := \sum_{1}^n \sigma_k^2$. We use this with $\sigma_k = X_k(\omega)$ and note that
$$\lambda_{[0,1)} \left(\left\{ t\in [0,1) : \left\lvert \sum_{n=1}^{2^m} R_n(t)X_n(\omega)\right\rvert \ge a\right\}\right) \le 2 \,\exp \left[-\frac{a^2}{2\sum_{1}^{2^m} X_n(\omega)^2}\right]$$
for all $a\in [0,\infty)$ and $\omega\in \Omega$.
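As an aside, here is a quick empirical sketch (my own addition) of the Hoeffding-type bound being used: for a fixed set of weights $\sigma_k$ and independent fair signs, the simulated tail probability of $|S|$ sits below $2\exp[-a^2/(2\Sigma^2)]$.

```python
# Empirical check of the Hoeffding bound P(|S| >= a) <= 2 exp(-a^2 / (2 Sigma^2))
# for S = sum_k sigma_k X_k with independent Rademacher signs X_k.
import numpy as np

rng = np.random.default_rng(1)
sigma = rng.uniform(0.5, 2.0, size=50)          # fixed weights (playing the role of X_k(omega))
Sigma2 = np.sum(sigma ** 2)

signs = rng.choice([-1.0, 1.0], size=(100_000, sigma.size))
S = signs @ sigma                                # 100,000 independent copies of S

for a in (10.0, 20.0, 30.0):
    empirical = np.mean(np.abs(S) >= a)
    bound = 2 * np.exp(-a ** 2 / (2 * Sigma2))
    print(f"a = {a:4.0f}   empirical tail = {empirical:.4f}   Hoeffding bound = {bound:.4f}")
```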

So, if we define
$$A_m := \left\{\omega \in \Omega : \frac{1}{2^m} \sum_{n=1}^{2^m} X_n(\omega)^2 \ge 2\right\}$$
and
$$F_m(\omega) := \lambda_{[0,1)} \left(\left\{t\in [0,1) : \left\lvert \sum_{n=1}^{2^m} R_n(t)X_n(\omega)\right\rvert \ge 2^{3/2} \Lambda_{2^m}\right\}\right)$$
then we have
\begin{align*} \mathbb P \left(\left\{ \omega\in \Omega : \left\lvert S_{2^m}(\omega)\right\rvert\ge 2^{3/2} \Lambda_{2^m}\right\}\right) &= \int_{\Omega} F_m(\omega)\, \mathbb P\left(\text{d}\omega\right) \\ &\le 2 \int_{\Omega} \exp \left[-\frac{8\Lambda_{2^m}^2}{2 \sum_{1}^{2^m} X_n(\omega)^2}\right] \mathbb P\left(\text{d}\omega\right)\\ &\le 2\, \exp\left[-4 \log\log 2^m\right] + 2\mathbb P \left (A_m \right) \end{align*}
where the first equality follows from the distributional identity above together with Tonelli's Theorem, and the last inequality is obtained by splitting the integral over $A_m$ and its complement.

So, it is enough to show $\sum_0^\infty \mathbb P\left(A_m\right) <\infty$. To do so, let us set
$$T_m := \sum_{n=1}^{2^m} X_n^2 \qquad B_m := \left\{\frac{T_{m+1}-T_m}{2^m}\ge 2\right\} \qquad \bar{T}_m := \frac{T_m}{2^m}$$
for $m\in \mathbb Z^+$. Clearly $\{B_m\}_{m\in \mathbb Z^+}$ are independent and $\mathbb P\left(A_m\right) = \mathbb P\left(B_m\right)$. So, it is enough to show
$$\mathbb P\left(\limsup_{m\to \infty} B_m\right) = \mathbb P \left(\limsup_{m\to \infty} \frac{T_{m+1}-T_m}{2^m} \ge 2\right) = 0$$
and then invoke the second Borel-Cantelli Lemma, which applies here because the $B_m$'s are mutually independent.

But, by the Strong Law of Large Numbers, we know $\bar{T}_m \to 1$ almost surely, and hence
$$\frac{T_{m+1}-T_m}{2^m} \to 1$$
almost surely, since $\frac{T_{m+1}-T_m}{2^m} = 2\bar{T}_{m+1}-\bar{T}_m$. This completes the proof in the symmetric case, with the constant $8$ replaced by $4$.

To eliminate the symmetry assumption, let $\left(\Omega, \mathcal F, \mathbb P\right)$ be the space on which $X_n$'s are defined and define $\{Y_n\}$ on $\left(\Omega^2, \mathcal F^2, \mathbb P^2\right)$ as
$$Y_n (\omega_1, \omega_2) := X_n(\omega_1) - X_n(\omega_2)$$
for $n\in \mathbb Z^+$. It is easy to check that the $Y_n$'s are symmetric, iid, and have mean $0$ and variance $2$. Applying the symmetric case (with constant $4$) to them gives
$$\limsup_{n\to \infty} \frac{\left\lvert S_n(\omega_1) - S_n(\omega_2)\right\rvert}{\Lambda_n} \le 2^{5/2} \le 8$$
for $\mathbb P^2$ almost every $(\omega_1,\omega_2)\in \Omega^2$.

Now, we wish to proceed by contradiction. If possible, let
$$\limsup_{n\to \infty} \frac{\left\lvert S_n(\omega)\right\rvert}{\Lambda_n} >8$$
on a set of positive $\mathbb P$ measure so that by Kolmogorov's $0-1$ Law, there is an $\varepsilon>0$ such that
$$\limsup_{n\to \infty} \frac{\left\lvert S_n(\omega)\right\rvert}{\Lambda_n}\ge 8+\varepsilon$$
for $\mathbb P$ almost every $\omega\in \Omega$. Using Fubini's Theorem, this gives that for $\mathbb P^2$ almost every $(\omega_1,\omega_2)\in \Omega^2$, there is a sequence $\left\{ n_m(\omega_1) : m\in \mathbb Z^+\right\}\subset \mathbb Z^+$ increasing to $\infty$ along which the limit
$$\lim_{m\to \infty} \frac{S_{n_m(\omega_1)}(\omega_1)}{\Lambda_{n_m(\omega_1)}} \ge 8+\varepsilon$$
exists, and consequently
$$\liminf_{m\to \infty} \frac{\left\lvert S_{n_m(\omega_1)}(\omega_2)\right\rvert}{\Lambda_{n_m(\omega_1)}} \ge \lim_{m\to \infty} \frac{S_{n_m(\omega_1)}(\omega_1)}{\Lambda_{n_m(\omega_1)}} - \limsup_{m\to \infty} \frac{\left\lvert S_{n_m(\omega_1)}(\omega_1) - S_{n_m(\omega_1)}(\omega_2)\right\rvert}{\Lambda_{n_m(\omega_1)}} \ge \varepsilon$$
so that by Fubini's Theorem, we have an $\{n_m : m\in \mathbb Z^+\}\subset \mathbb Z^+$ such that $n_m$ increases to $\infty$ and
$$\liminf_{m\to \infty} \frac{\left\lvert S_{n_m}(\omega_2)\right\rvert}{\Lambda_{n_m}}\ge \varepsilon$$
for $\mathbb P$ almost every $\omega_2\in \Omega$. This contradicts Markov's Inequality owing to
$$\mathbb E \left[\left(\frac{S_n}{\Lambda_n}\right)^2\right] = \frac{1}{2\log\log n} \to 0$$
hence completing the proof. $\square$

6. Proof of Lemma 3

As usual, we need to discuss some new ideas before we can move to our proof. We define
$$\Lambda_{\mu}(\xi) := \log \left(M_\mu(\xi)\right)$$
and define the Legendre transform of $\Lambda_\mu$ as
$$I_\mu (x) := \sup \left\{\xi x - \Lambda_\mu (\xi) : \xi\in \mathbb R\right\}$$
for $x\in \mathbb R$. Also, let $\Xi_\mu := \left(\Lambda_\mu^\prime\right)^{-1}$.
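To get a feel for these objects (my own addition), consider the simplest case $\mu = N(0,1)$: there $M_\mu(\xi)=e^{\xi^2/2}$, so $\Lambda_\mu(\xi)=\xi^2/2$, $\Xi_\mu(x)=x$, and the Legendre transform should come out to $I_\mu(x)=x^2/2$, which matches $\frac{(x-m)^2}{2\sigma^2}$ exactly. A crude numerical sketch:

```python
# Numerical Legendre transform of Lambda_mu(xi) = xi^2 / 2 (the standard Gaussian
# case), compared against the exact value I_mu(x) = x^2 / 2.
import numpy as np

xi = np.linspace(-10.0, 10.0, 200001)           # grid over which the sup is taken

def Lambda_mu(xi):
    return 0.5 * xi ** 2                         # log of the N(0, 1) moment-generating function

def I_mu(x):
    return np.max(x * xi - Lambda_mu(xi))        # sup_xi ( xi * x - Lambda_mu(xi) )

for x in (0.0, 0.3, 1.0, 2.0):
    print(f"x = {x:3.1f}   numerical I_mu(x) = {I_mu(x):.6f}   exact x^2/2 = {0.5 * x**2:.6f}")
```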

Lemma 4 : Here $m$ and $\sigma^2$ denote the mean and variance of $\mu$. There exists a $\delta\in (0,1]$ and a $K\in (0,\infty)$ such that $[m-\delta, m+\delta]\subset (\alpha,\beta)$ and
$$\left\lvert \Lambda_{\mu}^{\prime \prime}(\Xi_\mu(x))\right\rvert \le K \qquad \left\lvert \Xi_{\mu}(x)\right\rvert \le K\left\lvert x-m\right\rvert$$
and
$$\left\lvert I_\mu (x) - \frac{(x-m)^2}{2\sigma^2} \right\rvert \le K\left\lvert x-m\right\rvert^3$$
for all $x\in [m-\delta, m+\delta]$. In particular, if $0<\varepsilon <\delta$, then
$$\mathbb P\left(\left\lvert \bar{S}_n-m\right\rvert \ge \varepsilon\right)\le 2 \exp\left[-n \left(\frac{\varepsilon^2}{2\sigma^2}-K\varepsilon^3\right)\right]$$
and if $|a-m|<\delta$, then
$$\mathbb P\left(\left\lvert \bar{S}_n-a\right\rvert < \varepsilon\right)\ge \left(1-\frac{K}{n\varepsilon^2}\right)\exp \left[-n \left(\frac{|a-m|^2}{2\sigma^2} + K|a-m|\left(\varepsilon + |a-m|^2\right)\right)\right]$$
for any $\varepsilon>0$.

Proof : Recall that we started with the assumption that our random variables have mean $0$ and variance $1$. So, we have $\Lambda_\mu(0)=\Lambda_\mu^\prime(0)=0$ and $\Lambda_\mu^{\prime \prime}(0)=1$ and hence $\Xi_\mu(0)=0$ and $\Xi^\prime _\mu (0)=1$. So, we can find an $M\in (0,\infty)$ and a $\delta\in (0,1]$ with $\alpha<-\delta<\delta<\beta$ for which
$$\left\lvert \Xi_{\mu}(x) -x\right\rvert \le M|x|^2$$
whenever $|x|\le \delta$ and
$$\left\lvert \Lambda_{\mu}(\xi)-\frac{\xi^2}{2}\right\rvert \le M|\xi|^3$$
whenever $|\xi|\le (M+1)\delta$. This immediately proves
$$\left\lvert \Xi_\mu (x) \right\rvert \le (M+1) |x|$$
for $|x|\le \delta$, and the estimate for $I_\mu$ follows since $I_\mu(x)=\Xi_\mu(x)\,x-\Lambda_\mu(\Xi_\mu (x))$. $\square$

This gives us enough room to discuss our proof.

Proof of Lemma 3 : Begin by setting
$$\lambda_n := \frac{\Lambda_n}{n}=\sqrt{\frac{2\log\log n}{n}}$$
wherever it is defined.

To prove the first part, we apply the upper bound of $\mathbb P\left(\left\lvert \bar{S}_n-m\right\rvert \ge \varepsilon\right)$ obtained in Lemma 4 to see that
$$\mathbb P\left(\left\lvert \tilde{S}_n\right\rvert \ge R\right) = \mathbb P\left(\left\lvert \bar{S}_n\right\rvert \ge R\lambda_n\right) \le 2 \,\exp\left[-n \left(\frac{R^2\lambda_n^2}{2}-KR^3\lambda_n^3\right)\right]$$
for sufficiently large $n\in \mathbb Z^+$.

To prove the next part of the lemma, we note
$$\mathbb P\left(\left\lvert \tilde{S}_n-a\right\rvert < \varepsilon\right) = \mathbb P\left(\left\lvert \bar{S}_n-a_n\right\rvert < \varepsilon_n\right)$$
where $a_n=a\lambda_n$ and $\varepsilon_n=\varepsilon \lambda_n$. So, using the lower bound on $\mathbb P\left(\left\lvert \bar{S}_n-a\right\rvert < \varepsilon\right)$ obtained in Lemma 4, we conclude
\begin{align*} \mathbb P\left(\left\lvert \tilde{S}_n-a\right\rvert < \varepsilon\right) &\ge \left(1-\frac{K}{n\varepsilon_n^2}\right) \exp\left[-n \left(\frac{a_n^2}{2}+K|a_n|\left(\varepsilon_n + a_n^2\right)\right)\right]\\ &\ge \left(1-\frac{K}{2\varepsilon^2 \log\log n}\right) \exp\left[-\left(a^2 + 2K|a|\left(\varepsilon + a^2 \lambda_n\right)\right)\log\log n\right] \end{align*}
for sufficiently large $n$. $\square$


Acknowledgement : I'd like to thank Satvik Saha, who has been my partner in reading this proof, patiently answering my barrage of inane questions and making clever observations that I've shamelessly passed off as my own in this write-up. I feel safe admitting it here because, let’s be real, no one’s reading this far anyway.
