We introduce diffusion similarity, a method for defining the similarity between vertices of a graph that uses random walks on the graph to model the relationships between vertices. Following the approach of graph vertex embedding, we characterize a vertex v_i by two types of diffusion patterns: how random walks emanate from v_i into the rest of the graph, and how they converge on v_i from the rest of the graph. We define the similarity of two vertices v_i and v_j as the average of the cosine similarities between the corresponding vectors characterizing v_i and v_j. We obtain these vectors by modifying the solution to a differential equation that describes a type of continuous-time random walk.
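As a rough illustration of the construction described above, the following minimal sketch computes an outgoing and an incoming diffusion vector for each vertex and averages their cosine similarities. It rests on several assumptions that are not taken from the text: the continuous-time random walk is modeled as dp/dt = -p(I - P) with P the row-normalized adjacency matrix, the unmodified solution exp(-t(I - P)) is used in place of the modified solution the method actually relies on, and the function names and the diffusion time `t` are hypothetical.

```python
import numpy as np

def transition_matrix(A):
    """Row-normalize an adjacency matrix into random-walk transition probabilities."""
    A = np.asarray(A, dtype=float)
    out_deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, out_deg, out=np.zeros_like(A), where=out_deg > 0)
    # Assumption: a vertex with no outgoing edges keeps the walker in place.
    sinks = np.flatnonzero(out_deg.ravel() == 0)
    P[sinks, sinks] = 1.0
    return P

def expm_taylor(M, terms=30):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / (2 ** s)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ Ms / k
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

def diffusion_kernel(A, t=1.0):
    """K[i, j] is the probability that a walk started at i is at j at time t."""
    P = transition_matrix(A)
    n = P.shape[0]
    return expm_taylor(-t * (np.eye(n) - P))

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-15)
    return Xn @ Xn.T

def diffusion_similarity(A, t=1.0):
    """Average of the cosine similarities of outgoing (row) and incoming (column) patterns."""
    K = diffusion_kernel(A, t)
    return 0.5 * (cosine_matrix(K) + cosine_matrix(K.T))

# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
A_example = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A_example[i, j] = A_example[j, i] = 1.0

S_example = diffusion_similarity(A_example, t=1.0)
print(np.round(S_example, 3))
```

In this sketch, rows of the kernel play the role of the outgoing pattern and columns the incoming pattern, so the same code handles directed and weighted adjacency matrices. On the toy graph, vertices within the same triangle should score higher than vertices in different triangles.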
The method applies to any dataset that can be given a graph structure, whether weighted or unweighted, directed or undirected. It captures the similarity of vertices within the community structures of a network and, at the same time, the similarity of vertices within layered substructures (e.g., bipartite subgraphs) of the network. To validate the method, we apply it to synthetic data, to the neural connectome of the worm C. elegans, and to a connectome of neurons in the mouse retina. We also introduce the uncertainty index, a tool that characterizes how accurately similarity values detect community structures, as a measure of the quality of similarity methods.