Project Proposal

Parallelized Community Detection in Graphs via Ant Colony Optimization

Team Members: Naveen Shenoy (naveensh), Ayush Kumar (ayushkum)

URL: https://www.andrew.cmu.edu/user/ayushkum/parallelACO/

Summary

We propose to parallelize community detection via Ant Colony Optimization (ACO) on large graphs using OpenMP and MPI.

Background

Graph community detection involves finding groups of nodes in a graph that are more densely connected to nodes within the group than to nodes outside it, thus forming communities. Finding communities in large, complex graphs is a computationally intensive task, and not knowing the size or number of communities beforehand adds to this complexity. Popular methods for graph community detection include the Girvan-Newman method and the Louvain method. However, the worst-case time complexities of these algorithms are cubic and quadratic respectively, and they provide limited scope for parallelism. Ant Colony Optimization (ACO) is a swarm-intelligence-based algorithm that offers a high degree of parallelism, and it is well suited to graph community detection because its pheromone trails naturally support community extraction.

In ACO, ants traverse the graph in random walks and deposit pheromones on the edges they cross, changing edge weights dynamically. Edges with larger pheromone values have a higher chance of being traversed in subsequent walks. An iteration consists of random walks by multiple ants, and over many iterations, edges within a community accumulate higher pheromone values than edges between communities. After all iterations complete, community extraction is performed on the graph with pheromone values as edge weights. Each unprocessed node is initially treated as a singleton community, and its neighbors are checked for inclusion. A neighbor is included if its similarity with the community exceeds a threshold, where similarity is measured by the pheromone values of the neighbor's edges that connect to nodes already assigned to that community. Communities can overlap, with a node belonging to more than one community.

The above algorithm consists of two phases: ACO and community extraction. In the ACO phase, parallelism is available both across ants (multiple ants can walk in parallel) and within an ant (scoring candidate next nodes). In community extraction, parallelism is available in checking neighbors when expanding communities.

The Challenge

Implementing a parallel version of ACO for community detection presents several challenges:

Resources

We take inspiration from some of the following papers on the topic of community detection, ACO, and parallel graph processing:

As for computational resources, we plan to use both the GHC machines and the PSC machines to run our algorithm. We will write our code from scratch, but we plan to refer to several existing implementations of ant colony optimization, such as P. González et al. (2022), and to previous work such as Arnau et al. (2014) as a reference for writing a parallel community detection algorithm.

Goals and Deliverables

Plan to Achieve

Hope to Achieve

Platform Choice

We plan to use C++ as the programming language and to experiment with OpenMP and MPI for parallelism; both APIs provide C++ support. We will be using the GHC and PSC machines, which already have MPI libraries installed, while OpenMP support is provided natively by g++. PSC will allow us to test our program on a larger number of cores.

Schedule

Week Plan
3/24 - 3/30
  • Research and draft proposal
  • Setup development environment
  • Finalize datasets
3/31 - 4/06
  • Finish sequential version of algorithm
  • Implement initial metrics such as modularity
  • Start parallel OpenMP Implementation
4/07 - 4/13
  • Complete parallel version of the algorithm using OpenMP
  • Benchmark and finalize an optimal configuration
4/14 - 4/20
  • Implement parallel version of the algorithm using MPI
4/21 - 4/28
  • Finish MPI parallel version
  • Final evaluation and result collection
  • Draft final report