Temporal and Incremental Graph Processing at Scale
Real-world graphs like social networks, fintech transactions and the World Wide Web (WWW) are dynamic in nature, and have addition and deletion of vertices and edges. Time dependent algorithms are computed for such graphs, where time intervals are stored for each entity in the graph, i.e., vertices, edges, and their attributes.
Programming temporal graph algorithms is complex. Without an abstraction layer over the temporal dimension, the programmer needs to write a lot of boiler-plate code just to temporarily align the vertex (or edge) states. Our work “ICM” (An Interval-centric Model for Distributed Computing over Temporal Graphs) published at ICDE 2020, completely abstracts out this temporal notion from an interval graph for the programmer. The platform allows the programmer to write code for a time dependent graph algorithm as they’d do for a time independent one. It introduces the novel Time Warp and Join operations that do the temporal alignment and reduces this burden entirely from the programmer.
Our work “WICM“, published at EuroSys, 2022 extends the interval centric model for an efficient computation to reduce the makespan. The time warp and time join operation in the ICM takes up to 86% of the entire computation time. Wicm reduces this overhead by introducing a notion of Windows in a Temporal Graph. The entire graph is split into time windows and ICM is called on each of these windows sequentially. WICM reduces the end-to-end time as compared to ICM by about 61%.
These temporal features may be present for an existing (materialized) interval graph, or updates to the structure may be received continuously over time for a dynamic graph. Incremental graph processing updates the output of the algorithm for a dynamic graph, without re-computing from scratch on the refreshed graph. Real world graphs are also large in nature and requires efficient Distributed Computing algorithms. To handle such incremental updates on a large-scale real-world graph, we have built a platform: I-WICM: Incremental Windowed Interval Centric Model. The I-WICM model builds upon our existing platforms on WICM and ICM. Our poster titled “Extending the Interval-centric Distributed Computing Model for Incremental Graphs” has been accepted at the HiPC Student Research Symposium, 2023, that portrays our initial piece of work and results. It has been published as a 1-page abstract in this document (page number: 65). As a follow-up of the HiPC, we also have a poster that is accepted at the early career and student showcase, CCGRID 2023 with additional work and results.
- [CCGRID 2023] V. Kulkarni, R. Bhoot, and Y. Simmhan, Enhancing Interval-centric Distributed Computing to Support Incremental Graphs, Accepted at Student Showcase, IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) 2023
- [HiPCW 2022] V. Kulkarni, R. Bhoot, and Y. Simmhan, Extending the Interval-centric Distributed Computing Model for Incremental Graphs pg – 65, in IEEE International Conference on High Performance Computing, Data and Analytics Workshop, 2022
- Animesh Baranawal and Yogesh Simmhan, Optimizing the Interval-centric Distributed Computing Model for Temporal Graph Algorithms, European Conference on Computer Systems (EuroSys), 2022, (Artifact Functional Badge)
- Manoj K Agarwal, Animesh Baranawal, Yogesh Simmhan, Manish Gupta, Event Related Data Collection from Microblog Streams, International Conference on Database and Expert Systems Applications (DEXA), 2021, 10.1007/978-3-030-86475-0_31
- Shriram Ramesh, Animesh Baranawal and Yogesh Simmhan, A Distributed Path Query Engine for Temporal Property Graphs , IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) , 2020 , pp. 499-508, 10.1109/CCGrid49817.2020.00-43 [CORE A]
- Shriram Ramesh, Animesh Baranawal, and Yogesh Simmhan Granite: A Distributed Engine for Scalable Path Queries over Temporal Property Graphs, Journal of Parallel and Distributed Computing (JPDC), Vol. 151, Pages 94-111, May 2021, 10.1016/j.jpdc.2021.02.004, (CORE A*)
- Swapnil Gandhi, and Yogesh Simmhan , An Interval-centric Model for Distributed Computing over Temporal Graphs, International Conference on Data Engineering (ICDE), 2020, pp. 1129-1140 10.1109/ICDE48307.2020.00102, (CORE A*)
- [CCGRID 2023] Pranjal Naman and Yogesh Simmhan, Performance Modelling of Graph Neural Networks, Accepted at Student Showcase, IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) 2023
- [CCGRID 2023] Varad Kulkarni, Akarsh Chaturevedi, Pranjal Naman and Yogesh Simmhan, To Think Like a Vertex (or Not) for Distributed Training of Graph Neural Networks, Accepted at Student Showcase, IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) 2023
Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of streaming graph partitioning methods. Streaming methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Typically, these partitioning algorithms aim to balance the edge counts across partitions while minimizing the vertex replication. However, such objectives ignore the community structure inherently embedded in the graph, which is an important partitioning quality metric for clustering and graph mining applications that subsequently operate on the partitions. To address this gap, we propose a novel optimization function that uses maximizing of the number of local triangles in the partitions as an additional objective function. Triangle count is an effective metric to measure the conservation of community structure. Further, we propose a family of cascading heuristics to perform online partitioning over an edge stream of a graph, which use three complementary state data structures: Bloom Filters, Triangle Map and High degree Map. We validate our partitioning algorithms on 5 diverse real world graphs with up to 17 % triangles, using both random and BFS ordered edge stream
Distributed GNN Platforms
Graph Neural Networks (GNN) have gained traction to perform tasks like node classification, graph classification, edge prediction and community detection, with applications ranging from predicting the trajectory of a pandemic to product recommendations. GNN training couples the structure of the graph and properties on its vertices and edges, with iterative neural network-based learning. Whole-graph GNN training on large graphs is computationally costly due to the dual effects of the irregular nature of graph execution and the compute-intensive nature of DNN training. This has seen the evolution of several GNN training platforms. Distributed training of GNNs has gained much importance given the large sizes of the real-world graphs. We study the low-level architectures and optimizations that go into building compute and communication aware distributed GNN systems. We also explore programming paradigms and algorithms to come up with efficient distributed GNN training frameworks. As a part of this, we have built a novel distributed GNN training platform based on Google’s Pregel programming model. This is one of the first pieces of work that extends a traditional graph processing platform for GNN training. Our initial works on both performance modelling and vertex centric GNN programming model have been accepted at the early career and student showcase, CCGRID 2023.