DISSERTATION DEFENSE: RACHAEL SEXTON

Author: Rachael Sexton

Date/Time: June 12th, 2025 at 9:30am EST | Join Online

Location: EGR 2162, Glenn Martin Hall

Committee Members:

  • Dr. Mark Fuge, Chair
  • Dr. Jordan Ying, Dean’s Representative
  • Dr. Maria Cameron
  • Dr. Michelle Girvan
  • Dr. Vincent Lyzinski

Title of Dissertation: Measuring Network Dependencies from Node Activations

Abstract: This thesis develops a method for inferring dependency networks from binary data by describing and accounting for sources of systematic bias. Node “activation” data is a common source of complex networks, such as scientific collaboration networks from publications, to keyword and tag co-occurrence graphs from documents, epidemiological networks from infection records, and customer product recommendations from buyer carts, to name only a few. However,the way these networks get created often lead to dense edge “hairballs” and heavy post-processing needs. To more accurately reconstruct dependency networks, we propose a flexible technique that generalizes co-occurrence counting, and allows domain-knowledge to inform why nodes activate (and how to use that for inference).

Using our technique, we obtain “Desire Path Density” estimates of latent networks: families of possible networks based on probability distributions over individual edge activations, rather than node co-occurrences. We show that co-occurrence counting is a special case of this estimatation technique—and that it assumes node activations arise exclusively from cliques. This assumption leads to “clique-bias” in the recovered graphs, which results in the unwanted hairball effect. Our method, called Forest Pursuit, is a different application of desire path density estimation that attempts to address this bias. It assumes node activations are generated by sampling from distributions over dependency trees.

Forest Pursuit outperforms other algorithms at accurate network recovery, is scalable to very large networks sizes, and can be executed in a streaming or parallel manner over observations. It approximates Steiner trees for each set of activated nodes before aggregating them to estimate the latent global network. We additionally extend Forest Pursuit to be a probabilistic model whose parameters may be infered through Expectation Maximization. This modification shows improved performance over one-shot Forest Pursuit, at the cost of computation time.
Our introductory chapters provide a unified framework to analyze node activation data and edge (dependency) occurrences in related vector spaces, and provide a taxonomy of network recovery assumption types so that practitioners can find and describe sources of systematic bias from those assumptions. Then, along with derivations for Desire Path Density estimation and Forest Pursuit, we verify our method and compare its performance agains others. We created a dataset of synthetic experiments (MENDR), made publically available as a reference and test-bench for the community. Finally, we use several case studies to demonstrate the impacts of using Forest Pursuit for network analysis in a realistic setting,with insights into the topological differences between networks inferred by other methods and ours. By using Forest Pursuit, practitioners can correct for clique-bias and better use their latent networks to uncover important behavior of complex systems.