\documentclass{article}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\setcounter{MaxMatrixCols}{30}
\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}

\begin{document}

\title{Math 443/543 Graph Theory Notes 7: Small world phenomenon and decentralized search}
\author{David Glickenstein}
\maketitle

\section{Small world phenomenon}

The small world phenomenon is the principle that all people are linked by short chains of acquaintances. For instance, \textquotedblleft six degrees of separation\textquotedblright\ is the idea that everyone is linked by chains of people at most 6 persons long.
That is, given any persons $A$ and $B,$ there is a chain of people such that $A$ knows $C,$ who knows $D,$ who knows $E,$ who knows $F,$ who knows $G,$ who knows $H,$ who knows $B.$ Other examples are the Kevin Bacon game (every actor can be linked to Kevin Bacon via actors they have acted with) and the Erdos number (related to collaborations on mathematics papers). The first real empirical evidence came from an experiment by the social psychologist Stanley Milgram during the 1960s. He tested the idea in the following way. He chose a target person, a stockbroker living near Boston, and asked a random person in, say, Nebraska to get a letter to the target. Each person was only allowed to forward the letter to a single acquaintance known on a first-name basis. The only things each person knew about the target were his name, occupation, address, and some personal information. One can think of the letter as tracing a path in the graph of all people who know each other (the acquaintance graph). Milgram found that some of the letters made it to the target, and among those that did, the median length of the path was around 6, hence the idea of six degrees of separation. For our purposes, we want to understand why a graph like the acquaintance graph might have short paths, and then think about how to find short paths. Simplistic idea: Suppose each person knows 100 others on a first-name basis. Then in 2 steps one should be able to reach $100\times100=10,000$ people, in 3 steps $100\times100\times100=1,000,000$ people, and in 5 steps $100^{5}=10$ billion people. The population of the US is estimated at just above 300 million, and the population of the world is estimated to be less than 7 billion. This idea is a bit too simplistic, since many of the people you know on a first-name basis also know each other. This means that, in reality, 5 steps will not reach nearly this many people.
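The branching arithmetic above can be restated directly; the sketch below is only an illustration of the estimate, under the (unrealistic) assumption that no two acquaintance circles overlap.

```python
# Back-of-the-envelope reach: if everyone knew exactly 100 people and no
# two acquaintance circles overlapped, k steps would reach 100**k people.
for k in range(1, 6):
    print(f"{k} step(s): {100 ** k:,} people")
# 5 steps give 10 billion, more than the world's population; overlap among
# real acquaintances is why the true reach grows far more slowly.
```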
The implicit assumption made above is that everyone knows 100 people chosen at random from all possible people. This brings us to the Watts-Strogatz model. The idea is that there is a balance between the random association of people and the notion that people know others based on proximity. Watts and Strogatz considered a model where the vertices are placed around a ring, and proximity is modeled by attaching each vertex to all other vertices within some distance along the ring (such as 2). Then, with some probability $p,$ each edge is rewired to a random edge, not allowing multiple edges. Thus, for $p=0$ there is no randomness, for $p=1$ the graph is entirely random, and for $p$ in between there is some amount of randomness. For each such graph, they measured two quantities:

\begin{enumerate}
\item The characteristic path length $L.$ This is the typical separation between two vertices, computed as the length of shortest paths averaged over all pairs of vertices.

\item The clustering coefficient $C.$ For each vertex $v,$ one can compute $C_{v}$ as the number of edges between neighbors of $v,$ divided by the total number possible (which is $\binom{k_{v}}{2}=k_{v}\left( k_{v}-1\right) /2$ if $v$ has $k_{v}$ neighbors). $C$ is the average of these over all vertices.
\end{enumerate}

One can look at Figure 2 in WS to see the relationship for different values of $p.$ The interesting phenomenon is that the characteristic path length decreases well before the clustering coefficient does.

\begin{example}
1D lattice.
Say we have vertices on the number line at $1,2,3,\ldots,n.$ Then the distance between two vertices is $d\left( v_{i},v_{j}\right) =\left\vert i-j\right\vert .$ The average path length is
\begin{align*}
\frac{2}{n\left( n-1\right) }\sum_{j=1}^{n-1}\left( 1+2+\cdots+\left( n-j\right) \right)  & =\frac{1}{n\left( n-1\right) }\sum_{j=1}^{n-1}\left( n-j\right) \left( n-j+1\right) \\
& =\frac{1}{n\left( n-1\right) }\left( \left( n^{2}+n\right) \left( n-1\right) -\frac{1}{2}\left( 1+2n\right) n\left( n-1\right) +\frac{\left( n-1\right) n\left( 2n-1\right) }{6}\right) \\
& =\frac{1}{3}\left( n+1\right)
\end{align*}
since $\sum_{j=1}^{n-1}j^{2}=\frac{\left( n-1\right) n\left( 2n-1\right) }{6}.$ The clustering coefficient is zero.
\end{example}

\begin{example}
1D lattice, with edges between vertices a distance at most 2 apart. The average path length is roughly half that of the 1D lattice. However, the clustering coefficient is larger: each vertex $i$ (except those near the ends) has the four neighbors $i-2,$ $i-1,$ $i+1,$ $i+2,$ and there are three edges among them, namely $\left( i-2,i-1\right) ,$ $\left( i-1,i+1\right) ,$ and $\left( i+1,i+2\right) .$ So the clustering coefficient is $C\approx\frac{3}{6}=\frac{1}{2}.$ Notice that this does not change as $n\rightarrow\infty.$
\end{example}

\begin{example}
2D square lattice graph. We can label vertices $v_{i,j}$ where $i$ and $j$ are integers between $1$ and $n,$ so there are $p=n^{2}$ vertices. The distance between two vertices is $d\left( v_{i,j},v_{k,\ell}\right) =\left\vert i-k\right\vert +\left\vert j-\ell\right\vert .$ There are $\frac{n^{2}!}{2\left( n^{2}-2\right) !}=\frac{n^{2}\left( n^{2}-1\right) }{2}$ pairs of vertices.
Thus the average path length is
\begin{align*}
\frac{1}{n^{2}\left( n^{2}-1\right) }\sum_{i,j,k,\ell=1}^{n}\left( \left\vert i-k\right\vert +\left\vert j-\ell\right\vert \right)  & =\frac{1}{n^{2}\left( n^{2}-1\right) }\left( 2n^{2}\sum_{i,k=1}^{n}\left\vert i-k\right\vert \right) \\
& =\frac{2n^{2}}{n^{2}\left( n^{2}-1\right) }\left( \frac{1}{3}\left( n+1\right) n\left( n-1\right) \right) \\
& =\frac{2}{3}n=\frac{2}{3}p^{1/2}.
\end{align*}
Other than at the edges, each vertex is adjacent to $4$ vertices, none of which are adjacent to each other, so the clustering coefficient is $0.$
\end{example}

\begin{example}
Random graph. Suppose we have a graph with $n$ vertices such that each vertex is adjacent to $k$ random other vertices. Given a vertex $v,$ there are $k$ vertices adjacent to it, and (since the edges are random) about $k^{2}$ vertices within distance $2$ of $v.$ So if $n=k^{a},$ then $a=\frac{\log n}{\log k},$ meaning that essentially every vertex is within distance $a$ of $v.$ It follows that the average path length is proportional to $\log n.$ In terms of clustering, if $u$ and $w$ are vertices adjacent to $v,$ there is about a $k/n$ chance that $u$ and $w$ are adjacent. Thus the clustering coefficient is around $k/n.$
\end{example}

Thus Watts and Strogatz suggest the following definition of the small world phenomenon.

\begin{definition}
A graph exhibits \emph{small world behavior} if $L\gtrsim L_{r}$ and $C\gg C_{r},$ where $L_{r}$ and $C_{r}$ are the characteristic path length and clustering coefficient for a random graph, $A\gtrsim B$ means that $A$ is larger than but close to $B,$ and $A\gg B$ means that $A$ is much larger than $B.$
\end{definition}

In other words, a small world graph has a lot of clustering but still has short paths. See WS for estimates of $L$ and $C$ for several large graphs which seem to exhibit small world behavior. We can look at another type of graph which can exhibit small-world behavior.
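The quantities $L$ and $C$ are easy to compute by brute force for small graphs. The sketch below is my own illustration (not from WS): it computes $L$ and $C$ for the radius-2 ring lattice and for a comparable random graph, exhibiting the two extremes that the Watts-Strogatz rewiring interpolates between, namely high clustering with long paths versus short paths with little clustering.

```python
# Sketch: characteristic path length L and clustering coefficient C for
# (a) a ring lattice joining each vertex to all vertices within ring-distance 2
# and (b) a graph where each vertex is joined to k random others.
import random
from collections import deque
from itertools import combinations

def path_length_and_clustering(adj):
    """L = mean shortest-path distance over unordered pairs (BFS from each
    vertex); C = mean over vertices of (edges among neighbors) / C(k, 2)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(d for v, d in dist.items() if v > s)  # each pair once
        pairs += sum(1 for v in dist if v > s)
    cs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
            cs.append(2 * links / (k * (k - 1)))
    return total / pairs, sum(cs) / len(cs)

def ring_lattice(n, radius=2):
    """Vertex v is joined to v±1, ..., v±radius (mod n)."""
    adj = {v: set() for v in range(n)}
    for v in adj:
        for step in range(1, radius + 1):
            adj[v] |= {(v + step) % n, (v - step) % n}
    return adj

def random_graph(n, k, seed=0):
    """Each vertex is joined to k random others (degrees come out around 2k)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in adj:
        for w in rng.sample([u for u in adj if u != v], k):
            adj[v].add(w)
            adj[w].add(v)
    return adj

L_ring, C_ring = path_length_and_clustering(ring_lattice(200))
L_rand, C_rand = path_length_and_clustering(random_graph(200, 4))
print(L_ring, C_ring)  # long paths (proportional to n), clustering exactly 1/2
print(L_rand, C_rand)  # short paths (around log n), clustering near k/n
```

Note that the ring's clustering coefficient comes out to exactly $1/2$, matching the second example above, and does not change with $n$, while its path length grows linearly.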
Consider an $n\times n$ grid, and suppose there are edges between each grid point and every other grid point within radius $r,$ that is, every point which can be reached in at most $r$ steps along the grid. This gives a nice notion of geometric proximity. Now suppose that every vertex is also connected to $k$ vertices which are randomly distributed throughout the entire network. Then each vertex has roughly $r^{2}+k$ edges. Notice that by using only random edges, one can reach $k^{2}$ vertices in two steps, and $k^{j}$ vertices in $j$ steps. Thus, on average, one could reach all $n^{2}$ vertices in $2\log_{k}n=c\log n$ steps. Note that by using nonrandom edges only, it would take on average a multiple of $n$ steps. Thus for large $n,$ the shortest paths act like those in a random graph. However, the clustering is quite high. Considering only the grid edges, the clustering is like $C>\frac{\frac{1}{2}r^{2}\left( r^{2}-1\right) }{\frac{1}{2}\left( r^{2}+k\right) \left( r^{2}+k-1\right) }\approx1$ if $k$ is small compared with $r^{2}.$ Note that if $k$ is large compared to $r^{2},$ as in a random graph, the clustering is $C_{\operatorname{rand}}<\frac{k+\frac{1}{2}r^{2}\left( r^{2}-1\right) }{\frac{1}{2}\left( r^{2}+k\right) \left( r^{2}+k-1\right) },$ which goes to zero. In fact, one could consider a network where $k$ is a number between $0$ and $1,$ representing the fact that we only allow random edges at some vertices (not at every vertex). Conceptually, we can group parts of the $n\times n$ grid into \textquotedblleft towns,\textquotedblright\ each of which has some number of random edges. This produces a different graph which is similar to our original formulation: one can get from one person to another within a town quite easily using the edges based on proximity.

\section{Decentralized search}

How was it that participants in the Milgram experiment were able to get the letters to their targets without any centralized help?
They did not find all possible paths and choose the shortest; rather, they found a short path knowing only minimal information. Our next task is to find models for this. The model is that we have a starting node $s$ and a target node $t.$ Suppose $s$ only knows where $t$ is, but not how to reach $t$ quickly (because $s$ does not know about the random edges). The same is true for each intermediate node along the path. Of course, an obvious strategy is to just use the edges based on proximity and move closer and closer to where $t$ is located. This produces a delivery time on the order of the grid distance from $s$ to $t.$ (The delivery time is the number of edges in the path from $s$ to $t.$) We wish to find an algorithm that has, on average, a faster delivery time than this naive algorithm. Since we already know that the shortest path is, on average, proportional to $\log n,$ we would hope to find an algorithm which achieves this. It turns out that our grid model does not admit such an algorithm: Kleinberg was able to show that any decentralized algorithm on it takes at least on the order of $n^{2/3}$ steps. The main idea is that it is quite hard to choose which random edges to use. We therefore use a new model which is slightly more general than the network model above. We keep the edges between nodes within $r$ grid steps, but instead of adding $k$ uniformly random edges, for each pair of vertices $v$ and $w$ we add an edge between them with probability proportional to $d\left( v,w\right) ^{-q},$ for some value $q.$ We call $q$ the \emph{clustering exponent}. If $q=0,$ then all pairs are equally likely and we recover the previous grid model. If $q$ is very large, then there are very few long random edges at all. So $q=0$ has the most randomness, and the randomness decreases as $q$ increases. The problem we will find is that if there is too much randomness, it is hard to exploit it in an algorithm, but if there is not enough randomness, there aren't enough short paths.
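To make the model concrete, here is a sketch (my own illustration, not Kleinberg's code; the function names are hypothetical) of how a long-range contact can be sampled with probability proportional to $d^{-q}$: weight every other vertex by $d^{-q}$ and draw from the normalized distribution. Larger $q$ concentrates the links near the source.

```python
# Sketch: sampling one long-range contact on an n x n grid with
# Pr[contact = w] proportional to d(v, w)**(-q).
import random

def lattice_distance(v, w):
    """Grid (Manhattan) distance between v = (i, j) and w = (k, l)."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])

def sample_long_range_contact(v, n, q, rng):
    """Pick a contact for v with weight d(v, w)**(-q) over all w != v."""
    others = [(i, j) for i in range(n) for j in range(n) if (i, j) != v]
    weights = [lattice_distance(v, w) ** (-q) for w in others]
    return rng.choices(others, weights=weights)[0]

rng = random.Random(1)
v, n = (10, 10), 21
for q in (0, 2, 6):
    links = [sample_long_range_contact(v, n, q, rng) for _ in range(200)]
    mean_len = sum(lattice_distance(v, w) for w in links) / len(links)
    print(q, round(mean_len, 2))  # the mean link length shrinks as q grows
```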
\begin{theorem}[Kleinberg]
If $q=2,$ then there is an algorithm for decentralized search with delivery time proportional to $\left( \log n\right) ^{2}.$ If $q\neq2,$ then any algorithm for decentralized search has a delivery time at least proportional to $n^{c}$ for some $c>0$ (which depends on $q$).
\end{theorem}

We will give an idea of why $q=2$ allows for a good decentralized search algorithm. Consider a vertex $v$ and a distance $d.$ We can look at the ring of all grid points at least $d$ grid steps from $v$ but less than $2d$ grid steps from $v.$ For a random edge from $v,$ the probability that it links to a particular point in this ring is proportional to $d^{-q}.$ There are approximately $d^{2}$ vertices in this ring, and so the probability that a random edge from $v$ links to some point in the ring is proportional to $d^{2-q}.$ If $q=2,$ this does not depend on $d$! Henceforth, let's assume that $r=1,$ so each vertex is attached to $4$ other vertices (unless the vertex is on the boundary). For a fixed vertex $v,$ there are $4$ vertices a distance $1$ from $v,$ $8$ vertices a distance $2,$ $12$ vertices a distance $3,$ and, in general, $4j$ vertices a distance $j$ from $v.$ Also, let's assume that each vertex has one random edge. The algorithm is quite simple: always forward to the adjacent vertex which is closest to the target. This is called \emph{myopic search}. The key fact we will show is that it finds the target in expected time proportional to $\left( \log n\right) ^{2}.$ We will measure progress by looking at the search in phases.
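Myopic search is easy to simulate. The sketch below is my own construction under the assumptions just made ($r=1,$ one long-range edge per vertex drawn with probability proportional to $d^{-2}$); it routes a message greedily toward the target and counts the steps. Since some grid neighbor is always strictly closer to the target, the step count can never exceed the grid distance.

```python
# Sketch: myopic (greedy) search on an n x n grid with one long-range
# link per vertex, sampled with probability proportional to d**(-2).
import random

def dist(v, w):
    """Manhattan distance on the grid."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])

def build_long_range_links(n, q, rng):
    """One random long-range contact per vertex, chosen with weight d**(-q)."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    links = {}
    for v in nodes:
        others = [w for w in nodes if w != v]
        weights = [dist(v, w) ** (-q) for w in others]
        links[v] = rng.choices(others, weights=weights)[0]
    return links

def myopic_search(s, t, n, links):
    """Repeatedly move to whichever neighbor (the four grid neighbors or the
    long-range contact) is closest to t; return the number of steps taken.
    Each step decreases the distance to t by at least 1, so this terminates."""
    steps, v = 0, s
    while v != t:
        i, j = v
        nbrs = [(i + di, j + dj)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < n and 0 <= j + dj < n]
        nbrs.append(links[v])
        v = min(nbrs, key=lambda w: dist(w, t))
        steps += 1
    return steps

rng = random.Random(0)
n = 30
links = build_long_range_links(n, q=2, rng=rng)
steps = myopic_search((0, 0), (n - 1, n - 1), n, links)
print(steps)  # never more than the grid-only 2(n-1) = 58 steps; typically far fewer
```

Averaging `steps` over many random link configurations and source-target pairs is how one would check the $\left( \log n\right) ^{2}$ behavior empirically.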
\begin{definition}
The search is in \emph{phase} $j$ if the distance of the current vertex to $t$ is between $2^{j}$ and $2^{j+1}.$
\end{definition}

Note that the largest value of $j$ is at most $\log_{2}\left( 2n\right) ,$ since no vertex is more than $2n$ grid steps from $t.$ Let $X_{j}$ be the total number of steps spent in phase $j.$ The expected number of steps spent in phase $j$ is $E_{j}=\sum_{i=1}^{\infty}i\Pr\left[ X_{j}=i\right] .$ The total expected number of steps in the path is $E=E_{0}+E_{1}+\cdots,$ summed over the at most $\log_{2}\left( 2n\right) +1$ phases. We need to show that the expected number of steps spent in each phase is around $\log n.$ We can calculate this as follows. The probability that $X_{j}\geq i$ is $\Pr\left[ X_{j}\geq i\right] =\sum_{k=0}^{\infty}\Pr\left[ X_{j}=i+k\right] .$ Rearranging the definition of the expectation gives
\begin{align*}
E_{j} & =\sum_{i=1}^{\infty}i\Pr\left[ X_{j}=i\right] \\
& =\sum_{i=1}^{\infty}\sum_{k=0}^{\infty}\Pr\left[ X_{j}=i+k\right] \\
& =\sum_{i=1}^{\infty}\Pr\left[ X_{j}\geq i\right] .
\end{align*}
We now need to estimate $\Pr\left[ X_{j}\geq i\right]$ from above to show that the expected number of steps in phase $j$ is small (proportional to $\log n$). This is the probability that we do not enter the next phase within the first $i-1$ steps of the phase. We need to be a bit more formal about how we choose the random edges. For each vertex $v,$ we assign an edge to another vertex $u$ with probability proportional to $d\left( u,v\right) ^{-2},$ where $d$ is the distance on the grid.
In order to make this a probability, the probability that the edge is $uv$ must be $P_{v,u}=\frac{d\left( u,v\right) ^{-2}}{\sum_{w\neq v}d\left( w,v\right) ^{-2}}.$

\begin{proposition}
We can estimate the probability $P_{v,u}$ by $P_{v,u}\geq\frac{d\left( u,v\right) ^{-2}}{4\ln\left( 6n\right) }.$
\end{proposition}

\begin{proof}
We can estimate the denominator by
\begin{align*}
\sum_{w\neq v}d\left( w,v\right) ^{-2} & \leq\sum_{j=1}^{2n}\left( 4j\right) j^{-2}\\
& =4\sum_{j=1}^{2n}\frac{1}{j}\\
& \leq4\left( 1+\int_{1}^{2n}\frac{1}{x}dx\right) \\
& =4+4\ln\left( 2n\right) \\
& =4\ln\left( 2en\right) \\
& \leq4\ln\left( 6n\right)
\end{align*}
where we have used the fact that there are at most $4j$ vertices of distance $j$ from any vertex, and that the largest distance is $2n.$ This means that the probability $P_{v,u}$ is at least $P_{v,u}\geq\frac{d\left( u,v\right) ^{-2}}{4\ln\left( 6n\right) }.$
\end{proof}

Let $B_{j}$ be the collection of vertices which are a distance of $2^{j}$ or less from $t.$ Note that phase $j$ is when one is at a vertex in $B_{j+1}$ but not in $B_{j}.$

\begin{proposition}
Suppose the message is in phase $j.$ The probability that the message enters $B_{j}$ after one step is at least $\frac{1}{128\ln\left( 6n\right) }.$
\end{proposition}

\begin{proof}
Let $u$ be the current vertex where the message is. We need to estimate $P_{j}=\sum_{v\in B_{j}}P_{u,v}$ from below.
Note that every vertex $v\in B_{j}$ is a distance at most $2^{j}+2^{j+1}=3\left( 2^{j}\right) <2^{j+2}$ from the current vertex $u,$ since we are in phase $j.$ Thus $P_{u,v}\geq\frac{d\left( u,v\right) ^{-2}}{4\ln\left( 6n\right) }>\frac{1}{4\left( 2^{2j+4}\right) \ln\left( 6n\right) }.$ Given this estimate on $P_{u,v},$ the total probability $P_{j}$ can be bounded below once we have a lower bound on the number of vertices in $B_{j}.$ By the proposition below, there are at least $2^{2j-1}$ vertices in $B_{j},$ so the probability $P_{j}$ of entering $B_{j}$ is at least $P_{j}\geq\frac{2^{2j-1}}{4\left( 2^{2j+4}\right) \ln\left( 6n\right) }=\frac{1}{128\ln\left( 6n\right) }.$
\end{proof}

We used the following:

\begin{proposition}
There are at least $2^{2j-1}$ vertices in $B_{j}.$
\end{proposition}

\begin{proof}
The worst case scenario is when $t$ is in a corner, in which case there are $i+1$ vertices a distance $i$ away. Thus there are at least
\begin{align*}
\sum_{i=0}^{2^{j}}\left( i+1\right)  & \geq\sum_{i=1}^{2^{j}}i=\frac{1}{2}\left( 2^{j}+1\right) 2^{j}\\
& \geq2^{2j-1}
\end{align*}
vertices in $B_{j}.$
\end{proof}

We can now see that we do not stay in any given phase for very long.

\begin{proposition}
The search stays in phase $j$ for, on average, fewer than $128\ln\left( 6n\right)$ steps.
\end{proposition}

\begin{proof}
Each step taken during phase $j$ moves the message into $B_{j}$ with probability at least $P_{j}\geq\frac{1}{128\ln\left( 6n\right) },$ so the probability that the first $i-1$ steps of the phase all fail to enter $B_{j}$ satisfies $\Pr\left[ X_{j}\geq i\right] \leq\left( 1-\frac{1}{128\ln\left( 6n\right) }\right) ^{i-1}.$ Hence the expected number of steps in phase $j$ is $E_{j}=\sum_{i=1}^{\infty}\Pr\left[ X_{j}\geq i\right] \leq\sum_{i=1}^{\infty}\left( 1-\frac{1}{128\ln\left( 6n\right) }\right) ^{i-1}=128\ln\left( 6n\right) .$ (Note: in reality, we do not need to take the sum to infinity, since moving along the grid allows us to reach the next phase in at most $2^{j+1}-2^{j}=2^{j}$ steps.)
\end{proof}

\begin{theorem}
The decentralized search will, on average, find the target in fewer than $\alpha\left( \ln n\right) ^{2}$ steps, for some $\alpha$ which does not depend on $n.$
\end{theorem}

\begin{proof}
The expected delivery time is the sum of the expected numbers of steps spent in each phase. There are at most about $\log_{2}n$ phases, so the expected time is bounded by a constant multiple of
\begin{align*}
128\ln\left( 6n\right) \log_{2}n & =\frac{128\ln n+128\ln6}{\ln2}\ln n\\
& \leq\frac{256}{\ln2}\left( \ln n\right) ^{2}
\end{align*}
for large $n,$ which gives the claim.
\end{proof}

\end{document}