Project Proposal

Parallelized Community Detection in Graphs via Ant Colony Optimization

Team Members: Naveen Shenoy (naveensh), Ayush Kumar (ayushkum)

URL: https://www.andrew.cmu.edu/user/ayushkum/parallelACO/

Summary

We propose to parallelize community detection via Ant Colony Optimization (ACO) on large graphs using OpenMP and MPI.

Background

Graph community detection involves finding groups of nodes in a graph that are more densely connected to nodes within the group than to nodes outside it, thus forming communities. Finding communities in large, complex graphs is a computationally intensive task, and not knowing the size or number of communities beforehand adds to this complexity. Popular methods for graph community detection include the Girvan-Newman method and the Louvain method. However, the worst-case time complexities of these algorithms are cubic and quadratic respectively, and they provide limited scope for parallelism. Ant Colony Optimization (ACO) is a swarm-intelligence-based algorithm that offers a high degree of parallelism, and it is well suited to graph community detection because its pheromone trails naturally support community extraction.

In ACO, ants traverse the graph in random walks and deposit pheromones on the edges they cross, changing edge weights dynamically. Edges with larger pheromone values have a higher chance of being traversed in subsequent walks. An iteration consists of random walks by multiple ants, and over many iterations, edges within a community accumulate higher pheromone values than edges between communities. After all iterations complete, community extraction is performed on the graph with pheromone values as edge weights. Each unprocessed node is initially treated as a singleton community, and its neighbors are checked for inclusion. A neighbor is included if its similarity with the community exceeds a threshold, where similarity is measured by the pheromone values of the neighbor's edges that connect to nodes already assigned to that community. Communities can overlap, with a node belonging to more than one community.

The above algorithm consists of two phases: ACO and community extraction. In the ACO phase, parallelism is available both across ants (multiple ants can walk in parallel) and within an ant (scoring candidate next nodes). In community extraction, parallelism is available in checking neighbors when expanding communities.

The Challenge

Implementing a parallel version of ACO for community detection presents several challenges:

Resources

We take inspiration from some of the following papers on the topic of community detection, ACO, and parallel graph processing:

As for computational resources, we plan to use both the GHC machines and the PSC machines to run our algorithm. We will write our code from scratch, but we plan to refer to several existing implementations of ant colony optimization, such as P. González et al. (2022), and to previous work such as Arnau et al. (2014) as a reference for writing a parallel community detection algorithm.

Goals and Deliverables

Plan to Achieve

Hope to Achieve

Platform Choice

We plan to use C++ as the programming language and to experiment with OpenMP and MPI for parallelism; both APIs provide C++ support. We will be using the GHC and PSC machines, which already have MPI libraries installed, while OpenMP support is provided natively by g++. PSC will allow us to test our program on a larger number of cores.

Schedule

Week Plan
3/24 - 3/30
  • Research and draft proposal
  • Setup development environment
  • Finalize datasets
3/31 - 4/06
  • Finish sequential version of algorithm
  • Implement initial metrics such as modularity
  • Start parallel OpenMP Implementation
4/07 - 4/13
  • Complete parallel version of the algorithm using OpenMP
  • Benchmark and finalize an optimal configuration
4/14 - 4/20
  • Implement parallel version of the algorithm using MPI
4/21 - 4/28
  • Finish MPI parallel version
  • Final evaluation and result collection
  • Draft final report