Measuring Similarities in Data using Reeb Graphs and the Interleaving Distance
Suppose you are the observer of a single hilly landscape. If you were asked to determine which features were deemed "important" or "necessary", you might be inclined to point out positions of high speaks or low troughs (perhaps even bridges between them) while ignoring the placement of individual rocks on the hillside or specific details of the diameters of the peaks. How can we summarize this "important" information? We can first imagine that the landscape is being flooded with water. As the water level rises, we are able to see that low peaks and bridges become submerged quickly and soon only the highest peaks are left. It might become impossible to move from one peak to another without traversing water. What this allows us to observe is how the connectedness of the landscape changes while ignoring all the minute features. The Reeb Graph is the quotient space where two values are equivalent if they lie on the same level set and are in the same connected component. It contracts each connected component of level sets to single points which ultimately tracks how the connectedness of our flooded landscape changes while ignoring any volumetric data of these peaks. This data structure provides a massive decrease in the amount of information that is stored while still representing the important features of the data. Now how can we measure the similarity between these Reeb Graphs and what does it tell us about the landscapes that they emerge from? The interleaving distance provides a rigorous mathematical definition of "approximate isomorphisms" between Reeb Graphs by categorifying the structures. Thankfully, the interleaving distance also has a nice geometric realization which can be thought of as the amount of "thickening" of the Reeb Graphs needed to be done to create isomorphisms.