\documentclass{article}%
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}%
\setcounter{MaxMatrixCols}{30}
%TCIDATA{OutputFilter=latex2.dll}
%TCIDATA{Version=5.00.0.2552}
%TCIDATA{CSTFile=40 LaTeX article.cst}
%TCIDATA{Created=Thursday, August 21, 2008 14:03:59}
%TCIDATA{LastRevised=Thursday, November 13, 2014 21:13:13}
%TCIDATA{}
%TCIDATA{}
%TCIDATA{}
%TCIDATA{Language=American English}
\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}
\begin{document}
\title{Math 443/543 Graph Theory Notes 7: Small world phenomenon and decentralized search}
\author{David Glickenstein}
\maketitle
\section{Small world phenomenon}
The small world phenomenon is the principle that all people are linked by
short chains of acquaintances. For instance, \textquotedblleft six degrees of
separation\textquotedblright\ is the idea that everyone is linked by chains of
people at most 6 persons long. That is, given any persons $A$ and $B,$ there
is a chain of people such that $A$ knows $C$ who knows $D$ who knows $E$ who
knows $F$ who knows $G$ who knows $H$ who knows $B.$ Other examples are the
Kevin Bacon game (that every actor can be linked to Kevin Bacon via actors
they have acted with) and the Erdos number (related to collaborations on
mathematics papers).
The first real empirical evidence came from an experiment by the social
psychologist Stanley Milgram during the 1960s. He tested it in the following
way. He chose a target person, a stockbroker living near Boston, and took
a random person in, say, Nebraska and asked them to get a letter to the
target. Each person was only allowed to forward the letter to a single
acquaintance they knew on a first name basis. The only things the person knew
about the target were his name, occupation, address, and some personal
information. One can think of the letter as tracing a path in the graph of all
people who know each other (the acquaintance graph). Milgram found that
sometimes the letters made it to the target, and among those that did make it
to the target, the median length of the path was around 6, hence the idea of 6
degrees of separation.
For our purposes, we want to understand why a graph like the acquaintance
graph might have short paths, and then think about how to find short paths.
Simplistic idea: Suppose each person knows 100 others on a first name basis.
Then in 2 steps one should be able to reach $100\times100=10{,}000$ people, in
three steps $100\times100\times100=1{,}000{,}000$ people, and in 5 steps
$100^{5}=10$ billion people. The
population of the US is estimated just above 300 million and the population of
the world is estimated to be less than 7 billion.
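The count above can be written as a one-line check (the function name is ours):

```python
# Naive reach: if everyone knows 100 people and no two acquaintances
# overlap, the number of people reachable in j steps is 100**j.
def naive_reach(acquaintances, steps):
    return acquaintances ** steps

assert naive_reach(100, 2) == 10_000
assert naive_reach(100, 3) == 1_000_000
assert naive_reach(100, 5) == 10_000_000_000   # ten billion
```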
This idea is a bit too simplistic, since many people who you know on a first
name basis also know each other. This means that, in reality, 5 steps will not
reach nearly this many people. The implicit assumption made above is that
everyone knows 100 random people of all possible people.
This brings us to the Watts-Strogatz model. The idea is that there is a
balance between the random association of people and the notion that people
know others based on proximity. Watts and Strogatz considered a model where
each vertex is placed around a ring, and proximity is measured by attaching
each vertex to all other vertices within some distance (such as 2). Then,
with some probability $p,$ each edge is rewired to a random vertex, not allowing
multiple edges. Thus, for $p=0$, there is no randomness, for $p=1,$ it is
entirely random, and for $p$ in between there is some amount of randomness.
For each such graph, they measured two quantities:
\begin{enumerate}
\item The characteristic path length $L.$ This is the typical separation
between two vertices, computed as the length of shortest paths averaged over
all pairs of vertices.
\item The clustering coefficient $C.$ For each vertex $v,$ one can compute
$C_{v}$ as the number of edges between neighbors of $v,$ divided by the total
number possible (which is $\binom{k_{v}}{2}=k_{v}\left( k_{v}-1\right) /2$
if $v$ has $k_{v}$ neighbors). $C$ is the average of these over all vertices.
\end{enumerate}
One can look at Figure 2 in WS to see the relationship for different values of
$p.$ The interesting phenomenon is that the characteristic path length
decreases well before the clustering coefficient does.
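The construction and the two statistics can be sketched in Python (helper names are ours; shortest paths are computed by breadth-first search, so this is only practical for small graphs, and it assumes the graph stays connected):

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n vertices, each joined to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k + 1):
            adj[v].add((v + d) % n)
            adj[(v + d) % n].add(v)
    return adj

def rewire(adj, p):
    """Watts-Strogatz step: with probability p, move one endpoint of each edge
    to a uniformly random vertex, disallowing loops and multiple edges."""
    n = len(adj)
    for u, v in [(u, v) for u in adj for v in adj[u] if u < v]:
        if random.random() < p:
            w = random.randrange(n)
            if w != u and w not in adj[u]:
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

def char_path_length(adj):
    """Characteristic path length L: shortest-path length (by BFS) averaged
    over all ordered pairs of vertices."""
    n, total = len(adj), 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

def clustering(adj):
    """Clustering coefficient C: average over v of (edges among neighbors of v)
    divided by (number of pairs of neighbors of v)."""
    total = 0
    for v in adj:
        nbrs = list(adj[v])
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in adj[nbrs[a]])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)
```

Running these for a ring with $k=2$ and increasing $p$ should reproduce the qualitative picture in WS Figure 2: $L$ falls quickly for small $p$ while $C$ stays near its lattice value.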
\begin{example}
1D lattice. Say we have vertices on the number line at $1,2,3,\ldots,n.$ Then
the distance between two vertices is $d\left( v_{i},v_{j}\right) =\left\vert
i-j\right\vert .$ The average path length is
\begin{align*}
\frac{2}{n\left( n-1\right) }\sum_{j=1}^{n-1}\left( 1+2+\cdots+\left( n-j\right) \right)
& =\frac{1}{n\left( n-1\right) }\sum_{j=1}^{n-1}\left( n-j\right) \left(
n-j+1\right) \\
& =\frac{1}{n\left( n-1\right) }\left( \left( n^{2}+n\right) \left(
n-1\right) -\frac{1}{2}\left( 1+2n\right) n\left( n-1\right)
+\frac{\left( n-1\right) n\left( 2n-1\right) }{6}\right) \\
& =\frac{1}{3}\left( n+1\right)
\end{align*}
since
\[
\sum_{j=1}^{n-1}j^{2}=\frac{\left( n-1\right) n\left( 2n-1\right) }{6}.
\]
The clustering coefficient is zero.
\end{example}
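A quick numerical check of this example (the function name is ours):

```python
# Numerical check: the average of |i - j| over all pairs drawn from
# {1, ..., n} should equal (n + 1)/3.
def avg_path_length_1d(n):
    total = sum(j - i for i in range(1, n + 1) for j in range(i + 1, n + 1))
    return total / (n * (n - 1) / 2)

for n in (10, 50, 101):
    assert abs(avg_path_length_1d(n) - (n + 1) / 3) < 1e-9
```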
\begin{example}
1D lattice, with edges between vertices a distance at most 2 apart. The
average path length is roughly half that in the 1D lattice. However, the
clustering coefficient is larger: each vertex $v$ (except those near the ends)
has four adjacent vertices, and these have three edges among them (namely
$\left\{ v-2,v-1\right\} ,$ $\left\{ v-1,v+1\right\} ,$ and $\left\{
v+1,v+2\right\} $). So the clustering coefficient is
\[
C\approx\frac{3}{6}=\frac{1}{2}.
\]
Notice that this does not change as $n\rightarrow\infty.$
\end{example}
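A quick numerical check of the clustering at an interior vertex (helper names are ours):

```python
# Check the clustering at an interior vertex of the 1D lattice with edges
# between vertices at distance at most 2: the four neighbors of v carry
# three edges, out of six possible pairs.
def interior_clustering(n):
    adj = {v: {u for u in range(n) if u != v and abs(u - v) <= 2}
           for v in range(n)}
    v = n // 2                       # an interior vertex
    nbrs = sorted(adj[v])
    links = sum(1 for a in range(len(nbrs)) for b in range(a + 1, len(nbrs))
                if nbrs[b] in adj[nbrs[a]])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

assert interior_clustering(20) == 0.5
```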
\begin{example}
2D square lattice graph. We can label vertices $v_{i,j}$ where $i$ and $j$ are
integers (say $i$ and $j$ are between $1$ and $n,$ so there are $p=n^{2}$
vertices). The distance between two vertices is $d\left( v_{i,j},v_{k,\ell
}\right) =\left\vert i-k\right\vert +\left\vert j-\ell\right\vert .$ There
are $\frac{n^{2}!}{2\left( n^{2}-2\right) !}=\frac{n^{2}\left(
n^{2}-1\right) }{2}$ pairs of vertices. Thus the average path length is
\begin{align*}
\frac{1}{n^{2}\left( n^{2}-1\right) }\sum_{i,j,k,\ell=1}^{n}\left(
\left\vert i-k\right\vert +\left\vert j-\ell\right\vert \right) & =\frac
{1}{n^{2}\left( n^{2}-1\right) }\left( 2n^{2}\sum_{i,k=1}^{n}\left\vert
i-k\right\vert \right) \\
& =\frac{2n^{2}}{n^{2}\left( n^{2}-1\right) }\left( \frac{1}{3}\left(
n+1\right) n\left( n-1\right) \right) \\
& =\frac{2}{3}n=\frac{2}{3}p^{1/2}.
\end{align*}
Other than at the edges, each vertex is adjacent to $4$ vertices, none of
which are adjacent to each other, so it has clustering $0.$
\end{example}
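A numerical check of the average Manhattan distance on the grid (the function name is ours):

```python
from itertools import combinations

# Numerical check: the average Manhattan distance over all pairs of points
# in an n-by-n grid equals (2/3) n.
def avg_manhattan(n):
    pts = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    total = sum(abs(a - c) + abs(b - d)
                for (a, b), (c, d) in combinations(pts, 2))
    return total / (len(pts) * (len(pts) - 1) / 2)

for n in (2, 5, 9):
    assert abs(avg_manhattan(n) - 2 * n / 3) < 1e-9
```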
\begin{example}
Random graph. Suppose we have a graph with $n$ vertices such that each vertex
is adjacent to $k$ random other vertices. We see that given a vertex $v,$ $k$
other vertices are adjacent to it, and (since it is random), roughly $k^{2}$
vertices are within distance $2$ of $v.$ And so if $n=k^{a},$ we see that
$a=\frac{\log n}{\log k},$ meaning that essentially every vertex is within
distance $a$ of $v.$ It follows that the average path length is proportional to
$\log n.$ In terms of clustering, we see that if $u$ and $w$ are vertices
adjacent to $v,$ there is a $k/n$ chance that $u$ and $w$ are adjacent. Thus
the clustering coefficient is around $k/n.$\bigskip
\end{example}
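A rough numerical illustration (a sketch; `random_graph` is our own construction, in which each vertex draws random neighbors until it has at least $k$, so degrees are only approximately $k$):

```python
import random
from collections import deque

def random_graph(n, k, seed=0):
    """Each vertex draws random neighbors until it has at least k; edges are
    mutual, so final degrees are only approximately k."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        while len(adj[v]) < k:
            u = rng.randrange(n)
            if u != v:
                adj[v].add(u)
                adj[u].add(v)
    return adj

def avg_distance_from(adj, s):
    """Average BFS distance from s to every vertex it can reach."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (len(dist) - 1)
```

For, say, $n=2000$ and $k=10$, the measured average distance should come out within a small factor of $\log n/\log k\approx3.3$.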
Thus Watts and Strogatz suggest the following definition of the small world phenomenon.
\begin{definition}
A graph exhibits \emph{small world behavior} if $L\gtrsim L_{r}$ and $C\gg
C_{r},$ where $L_{r}$ and $C_{r}$ are the characteristic path length and
clustering coefficient for a random graph and $A\gtrsim B$ means $A$ is larger
than $B$ but close to $B$ and $A\gg B$ means that $A$ is much larger than $B.$
\end{definition}
In other words, a small world graph has a lot of clustering, but still
short paths. See WS for some estimates of $L$ and $C$ for several large graphs
which seem to exhibit small world behavior.
We can look at another type of graph which can exhibit small-world behavior.
Consider an $n\times n$ grid, and suppose there are edges between every pair of
grid points within radius $r,$ that is, between any two vertices which can be
reached from each other in at most $r$ steps by moving along the grid. This
gives a nice notion of geometric
proximity. Now suppose that every vertex is connected to $k$ vertices which
are randomly distributed throughout the entire network. Then each vertex has
roughly $r^{2}+k$ edges. Notice that by using only random edges, one can reach
$k^{2}$ vertices in two steps, and $k^{j}$ vertices in $j$ steps. Thus, on the
average, one could reach all $n^{2}$ vertices in $2\log_{k}n=c\log n$ steps.
Note that by using nonrandom edges only, it would take on average a multiple
of $n$ steps. Thus for large $n,$ the shortest paths act like random edges.
However, the clustering is quite high. Considering only the grid vertices,
clustering is like
\[
C>\frac{\frac{1}{2}r^{2}\left( r^{2}-1\right) }{\frac{1}{2}\left(
r^{2}+k\right) \left( r^{2}+k-1\right) }\approx1
\]
if $k$ is small compared with $r^{2}.$ Note that if $k$ is large compared to
$r^{2},$ as for a random graph, the clustering is
\[
C_{\operatorname{rand}}<\frac{k+\frac{1}{2}r^{2}\left( r^{2}-1\right)
}{\frac{1}{2}\left( r^{2}+k\right) \left( r^{2}+k-1\right) }%
\]
which goes to zero.
In fact, one could consider a network where $k$ is a number between $0$ and
$1,$ reflecting the fact that we are actually only allowing random edges for
some vertices (not every vertex). Conceptually, we can group parts of the
$n\times n$ grid into \textquotedblleft towns,\textquotedblright\ each of
which has some number of random edges. This produces a different graph which
is similar to our original formulation: one can get from one person to another
within a town quite easily using the edges based on proximity.
\section{Decentralized search}
How was it that participants in the Milgram experiment were able to get the
letters to their targets without any centralized help? They did not list all
paths and choose the shortest, but were able to find a short path knowing only
minimal information. Our next task is to try to find models for this.
The model is that we have a starting node $s$ and a target node $t.$ Suppose
$s$ only knows where $t$ is, but not how to reach $t$ quickly (because $s$
does not know about the random edges). This is also true for each intermediate
node along the path. Of course, an obvious strategy is to just use the edges
based on proximity and move closer and closer to where $t$ is located. This
will produce a delivery time on the order of the distance from $s$ to $t.$
(Delivery time is the number of edges in the path from $s$ to $t.$) We wish to
find an algorithm that has, on the average, a faster delivery time than this
naive algorithm. Since we already know that the shortest path is, on the
average, proportional to $\log n,$ we would hope to find an algorithm which
achieves this.
It turns out that our grid model does not admit such an algorithm. Kleinberg
was able to show that any decentralized algorithm takes, on average, at least
on the order of $n^{2/3}$ steps. The main idea is that it is
quite hard to choose which random edges to pick.
We use a new model which is slightly more general than the network model.
Instead of adding $k$ random edges to the nodes within $r$ grid steps, for
each pair of vertices $v$ and $w,$ we add an edge between them with probability
proportional to $d\left( v,w\right) ^{-q},$ for some value $q.$ We call $q$
the \emph{clustering exponent}. If $q=0,$ then we have equal probabilities and
so we have the previous grid model. If $q$ is very large, then we have very
few random edges at all. So $q=0$ has the most randomness and the randomness
decreases as $q$ increases.
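Sampling a long-range contact with probability proportional to $d(v,w)^{-q}$ can be sketched as follows (the function name is ours):

```python
import random

def long_range_contact(v, n, q, rng):
    """Sample one long-range contact for grid vertex v = (i, j) in an n-by-n
    grid, chosen with probability proportional to d(v, w)**(-q), where d is
    the grid (Manhattan) distance."""
    nodes, weights = [], []
    for i in range(n):
        for j in range(n):
            d = abs(i - v[0]) + abs(j - v[1])
            if d > 0:
                nodes.append((i, j))
                weights.append(d ** (-q))
    return rng.choices(nodes, weights=weights, k=1)[0]
```

With $q=0$ every other vertex is equally likely (the earlier grid model); with a large $q$ the contact is almost always a nearest grid neighbor.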
The problem we will find is that if there is too much randomness, it is hard
to exploit it in an algorithm. But if there is not enough randomness, there
aren't enough short paths.
\begin{theorem}
[Kleinberg]If $q=2$ then there is an algorithm for a decentralized search with
delivery time proportional to $\left( \log n\right) ^{2}.$ If $q\neq2,$ then
any algorithm for a decentralized search has a delivery time at least
proportional to $n^{c}$ for some $c>0$ (which depends on $q$).
\end{theorem}
We will give an idea of why $q=2$ allows for a good decentralized search algorithm.
Consider a vertex $v$ and a distance $d.$ We can look at the ring of all grid
points at least $d$ grid steps from $v$ but less than $2d$ grid steps from
$v.$ For an arbitrary edge from $v,$ the probability that it links to a
particular point in this ring is roughly proportional to $d^{-q}$. There are
approximately $d^{2}$ vertices in this ring, and so the probability that an
edge from $v$ links to any point in the ring is proportional to $d^{2-q}.$ If
$q=2,$ this does not depend on $d$!
Henceforth, let's assume that $r=1,$ and so each vertex is attached to $4$
other vertices (unless the vertex is on the boundary). So for a fixed vertex
$v,$ there are $4$ vertices a distance $1$ away from $v$, $8$ vertices a
distance $2$ away, $12$ vertices a distance $3$ away, and, in general, $4j$
vertices a distance $j$ away from $v.$ Also, let's assume that each vertex has
one random edge. The algorithm is quite simple: choose the adjacent vertex
which is closest to the target. This is called the \emph{myopic search}. The
key fact we will show is that this does find the target on average in time
$\left( \log n\right) ^{2}.$
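A sketch of the myopic search in this model (our own implementation; each visited vertex samples its single long-range contact lazily, and all other names are ours):

```python
import random

def dist(a, b):
    """Grid (Manhattan) distance between lattice points a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def sample_contact(v, n, q, rng):
    """One long-range contact of v, chosen with probability proportional to
    d(v, w)**(-q)."""
    nodes = [(i, j) for i in range(n) for j in range(n) if (i, j) != v]
    weights = [dist(v, w) ** (-q) for w in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

def myopic_search(n, s, t, q=2.0, rng=None):
    """Forward the message to whichever neighbor (grid or long-range) is
    closest to the target t; return the number of steps taken."""
    rng = rng or random.Random()
    v, steps = s, 0
    while v != t:
        i, j = v
        nbrs = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < n and 0 <= j + dj < n]
        nbrs.append(sample_contact(v, n, q, rng))   # this vertex's random edge
        v = min(nbrs, key=lambda w: dist(w, t))     # myopic (greedy) step
        steps += 1
    return steps
```

Since some grid neighbor is always strictly closer to $t,$ the distance to the target decreases at every step, so the search terminates. Averaging `myopic_search` over many random source-target pairs for increasing $n$ should show delivery times growing roughly like $\left( \log n\right) ^{2}$ when $q=2$ and polynomially in $n$ otherwise.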
Now, we will measure progress by dividing the search into phases.
\begin{definition}
The search is in \emph{phase }$j$ if the distance of the current vertex to $t$
is between $2^{j}$ and $2^{j+1}.$
\end{definition}
Note that the largest value of $j$ is equal to the least integer greater than
$\log_{2}n-1.$
Let $X_{j}$ be the total number of steps spent in phase $j.$ The expected
number of steps spent in phase $j$ is equal to
\[
E_{j}=\sum_{i=1}^{\infty}i\Pr\left[ X_{j}=i\right] .
\]
The total number of expected steps in the path is
\[
E=E_{1}+E_{2}+\cdots+E_{\log_{2}n-1}%
\]
since there are at most $\log_{2}n-1$ phases. We need to show that the
expected number of steps spent in phase $j$ is proportional to $\log n.$
We see that we can calculate this as follows. The probability that $X_{j}\geq
i$ is equal to
\[
\Pr\left[ X_{j}\geq i\right] =\sum_{k=0}^{\infty}\Pr\left[ X_{j}%
=i+k\right] .
\]
We see that we can rearrange this definition of expected steps to be
\begin{align*}
E_{j} & =\sum_{i=1}^{\infty}i\Pr\left[ X_{j}=i\right] \\
& =\sum_{i=1}^{\infty}\sum_{k=0}^{\infty}\Pr\left[ X_{j}=i+k\right] \\
& =\sum_{i=1}^{\infty}\Pr\left[ X_{j}\geq i\right] .
\end{align*}
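The tail-sum rearrangement can be checked numerically for a geometric random variable (a truncated sketch):

```python
# Check the tail-sum identity E[X] = sum_{i >= 1} Pr[X >= i] for a geometric
# random variable Pr[X = i] = (1 - p)**(i - 1) * p, truncated at N terms.
p, N = 0.3, 200
pmf = [(1 - p) ** (i - 1) * p for i in range(1, N + 1)]
mean = sum(i * prob for i, prob in enumerate(pmf, start=1))
tails = sum(sum(pmf[i - 1:]) for i in range(1, N + 1))
assert abs(mean - tails) < 1e-12
assert abs(mean - 1 / p) < 1e-9      # for a geometric variable, E[X] = 1/p
```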
We now need to estimate $\Pr\left[ X_{j}\geq i\right] $ from above to
show that we expect the number of steps in phase $j$ to be small (proportional
to $\log n$). This probability is the same as the probability that we do not
enter the next phase in fewer than $i$ steps.
We need to be a bit more formal about how we choose the random edges. Given
each vertex $v,$ we will assign an edge to another vertex $u$ with probability
proportional to $d\left( u,v\right) ^{-2},$ where $d$ is the distance on the
grid. In order to make this a probability, that means that the probability
that the edge is $uv$ is equal to
\[
P_{v,u}=\frac{d\left( u,v\right) ^{-2}}{\sum_{w\neq v}d\left( w,v\right)
^{-2}}.
\]
\begin{proposition}
We can estimate the probability $P_{v,u}$ by%
\[
P_{v,u}\geq\frac{d\left( u,v\right) ^{-2}}{4\ln\left( 6n\right) }.
\]
\end{proposition}
\begin{proof}
We can estimate the denominator by
\begin{align*}
\sum_{w\neq v}d\left( w,v\right) ^{-2} & \leq\sum_{j=1}^{2n}\left(
4j\right) j^{-2}\\
& =4\sum_{j=1}^{2n}\frac{1}{j}\\
& \leq4\left( 1+\int_{1}^{2n}\frac{1}{x}dx\right) \\
& =4+4\ln\left( 2n\right) \\
& \leq4\ln\left( 2en\right) \\
& \leq4\ln\left( 6n\right)
\end{align*}
where we have used the fact that there are at most $4j$ vertices of distance
$j$ from any vertex, and that the largest distance is $2n.$ This means that the
probability $P_{v,u}$ is at least
\[
P_{v,u}\geq\frac{d\left( u,v\right) ^{-2}}{4\ln\left( 6n\right) }.
\]
\end{proof}
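The proposition's estimate can be checked numerically (a sketch; `inverse_square_sum` is our own helper, and $d$ is the grid distance):

```python
import math

# Check the estimate sum_{w != v} d(v, w)**(-2) <= 4 ln(6n) on an n-by-n grid.
def inverse_square_sum(n, v):
    return sum((abs(i - v[0]) + abs(j - v[1])) ** (-2)
               for i in range(n) for j in range(n) if (i, j) != v)

for n in (10, 50, 200):
    for v in ((n // 2, n // 2), (0, 0)):     # a central vertex and a corner
        assert inverse_square_sum(n, v) <= 4 * math.log(6 * n)
```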
Let $B_{j}$ be the collection of vertices which are a distance of $2^{j}$ or
less from $t.$ Note that phase $j$ is when one is at a vertex in $B_{j+1}$ but
not $B_{j}.$
\begin{proposition}
Suppose a message is in phase $j.$ The probability the message enters $B_{j}$
after one step is at least
\[
\frac{1}{128\ln\left( 6n\right) }.
\]
\end{proposition}
\begin{proof}
Let $u$ be the current vertex where the message is. We need to estimate
\[
P_{j}=\sum_{v\in B_{j}}P_{u,v}%
\]
from below. Note that every vertex $v\in B_{j}$ is a distance less than or
equal to $2^{j}+2^{j+1}=3\left( 2^{j}\right) <2^{j+2}$ from the current
vertex $u$ since we are in phase $j.$ Thus
\[
P_{u,v}\geq\frac{d\left( u,v\right) ^{-2}}{4\ln\left( 6n\right) }>\frac
{1}{4\left( 2^{2j+4}\right) \ln\left( 6n\right) }.
\]
Since we have this estimate for $P_{u,v},$ the total probability $P_{j}$ can be
bounded below if we have a lower bound for the number of vertices in $B_{j}.$
By the proposition below, we have that there are at least $2^{2j-1}$ vertices
in $B_{j},$ and we see that the probability $P_{j}$ of entering $B_{j}$ is
thus at least
\[
P_{j}\geq\frac{2^{2j-1}}{4\left( 2^{2j+4}\right) \ln\left( 6n\right)
}=\frac{1}{128\ln\left( 6n\right) }.
\]
\end{proof}
We used the following:
\begin{proposition}
There are at least $2^{2j-1}$ vertices in $B_{j}.$
\end{proposition}
\begin{proof}
The worst case scenario is if $t$ is in the corner, in which case there are
$i+1$ vertices a distance $i$ away. Thus there are at least
\begin{align*}
\sum_{i=0}^{2^{j}}\left( i+1\right) & \geq\sum_{i=1}^{2^{j}}i=\frac{1}%
{2}\left( 2^{j}+1\right) 2^{j}\\
& \geq2^{2j-1}%
\end{align*}
vertices in $B_{j}$.
\end{proof}
We can now see that we do not stay in any given phase for all that long.
\begin{proposition}
The search stays in phase $j$ for, on the average, at most $128\ln\left(
6n\right) $ steps.
\end{proposition}
\begin{proof}
Since the probability that we stay in phase $j$ for more than one step is
equal to the probability that we do not enter $B_{j}$ after one step,
\[
\Pr\left[ X_{j}\geq2\right] =1-P_{j}\leq1-\frac{1}{128\ln\left( 6n\right)
}.
\]
Similarly, if we do not enter $B_{j}$ in the first $i$ steps, we get the following
\[
\Pr\left[ X_{j}\geq i+1\right] \leq\left( 1-\frac{1}{128\ln\left( 6n\right)
}\right) ^{i}%
\]
and hence the expected number of steps in phase $j$ is
\[
E_{j}=\sum_{i=1}^{\infty}\Pr\left[ X_{j}\geq i\right] \leq\sum_{i=0}%
^{\infty}\left( 1-\frac{1}{128\ln\left( 6n\right) }\right) ^{i}%
=128\ln\left( 6n\right) .
\]
(Note: in reality, we do not need to take the sum to infinity, since moving
along the grid allows us to reach the next phase in at most $2^{j+1}%
-2^{j}=2^{j}$ steps.)
\end{proof}
\begin{theorem}
The decentralized search will, on the average, find the target in fewer than
$\alpha\left( \ln n\right) ^{2}$ steps, for some $\alpha$ which does not
depend on $n.$
\end{theorem}
\begin{proof}
The expected delivery time is equal to the sum of the expected numbers of
steps spent in each phase. Since there are at most about $\log_{2}n$ phases,
we see that the expected time is less than
\[
128\ln\left( 6n\right) \log_{2}n\leq\alpha\left( \ln n\right) ^{2},
\]
since%
\begin{align*}
128\ln\left( 6n\right) \log_{2}n & =\frac{128\ln n+128\ln6}{\ln2}\ln n\\
& \leq\frac{256}{\ln2}\left( \ln n\right) ^{2}%
\end{align*}
for large $n.$
\end{proof}
\end{document}