Kyle Kloster Wrangling data, algorithms, and code in the SF Bay

Machine Learning Seminar, Purdue University

September 25, 2014

Talk on joint work with David F. Gleich:

Local community structure in social and information networks

A graph diffusion is a probability vector that models how mass spreads throughout a network from a small set of starting nodes. Notable examples include the PageRank and heat kernel diffusions. They have been used for a variety of machine learning tasks including semi-supervised learning, community detection, and link prediction. Whereas there has been a few decades of work on quickly computing the PageRank diffusion, algorithmically-sound work on the heat kernel diffusion is more recent despite many theoretical properties that show it may be preferable to PageRank. I’ll discuss two methods to produce a provably accurate estimate of the heat kernel diffusion in sublinear time. The first is focused on detecting communities in large networks. The second is focused on accurate estimations of information diffusion for semi-supervised learning-style applications; this result is more intricate and requires the underlying graph to have a power-law degree distribution.