- [14/04/2014] +Async: a new hybrid asynchronous engine on PowerLyra.
- [27/02/2014] BiGraph: a set of distributed graph partition algorithms designed for bipartite applications
- [19/11/2013] PowerLyra: a new graph analytics engine that differentiates graph computation and partitioning.
Hybrid Paritioning and Computation
Natural graphs with skewed distribution raise unique challenges to graph computation and partitioning. Existing graph analytics frameworks usually use a "ONE size fits ALL" design that uniformly processes all vertices, which either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab), or incur high communication cost among vertices even for low-degree vertices (e.g., PowerGraph).
We argue that skewed distribution in natural graphs also calls for differentiated processing of high-degree and low-degree vertices. We then introduce PowerLyra, a new graph analytics engine that embraces the best of both worlds of existing frameworks, by dynamically applying different computation and partition strategies for different vertices. PowerLyra uses Pregel/GraphLab like computation models for process low-degree vertices to minimize computation, communication and synchronization overhead, and uses PowerGraph-like computation model for process high-degree vertices to reduce load imbalance and contention. To seamless support all PowerLyra application, PowerLyra further introduces an adaptive unidirectional graph communication.
PowerLyra additionally proposes a new hybrid graph cut algorithm that embraces the best of both worlds in edge-cut and vertex-cut, which adopts edge-cut for low-degree vertices and vertex-cut for high-degree vertices. Both theoretical analysis and empirical validation show that the expected replication factor of random hybrid-cut is alway better than both random (Hash-based) and heuristic (e.g., Oblivious or Grid) vertex-cut for skewed power-law graphs. We also develop a new distributed greedy heuristic hybrid-cut algorithm, namely Ginger, inspired by Fennel. Compared to Gird vertex-cut, Ginger can reduce the replication factor by up to 3.11X (from 1.26X) and 2.92X (from 2.03X) for real-world and synthetic graphs accordingly. Note that hybrid graph partitions also work with existing synchronous and asynchronous enignes of GraphLab.
Finally, PowerLyra adopts locality-conscious data layout optimization in graph ingress phase to mitigate poor locality during vertex communication. we argue that a small increase of graph ingress time (less than 5% and 10% for real-world and power-law graphs respectively) is more worthwhile for an often larger speedup in execution time (usually more than 10% speedup, specially 21% for Twitter follow graph).
PowerLyra is implemented as an execution engine and graph partitions of GraphLab, and can seamlessly support all GraphLab applications. A detailed evaluation on a 48-node cluster using three different graph algorithms (PageRank, Approximate Diameter and Connected Components) show that PowerLyra outperforms current synchronous engine with Grid partition of PowerGraph (Jul. 8, 2013. commit:fc3d6c6) by up to 5.53X (from 1.97X) and 3.26X (from 1.49X) for real-world (Twitter, UK-2005, Wiki, LiveJournal and WebGoogle) and synthetic (10-million vertex power-law graph ranging from 1.8 to 2.2) graphs accordingly, due to significantly reduced replication factor, less communication cost and improved load balance.
Many machine learning and data mining (MLDM) problems like recommendation, topic modeling and medical diagnosis can be modeled as computing on bipartite graphs, whose vertices can be separated as two disjoint sets U and V and every edge connects a vertex in U and V. Due to the wide existence and importance of bipartite graphs, there have been a number of popular MLDM algorithms that operate on such graphs, including Singular Value Decomposition (SVD), Alternating Least Squares (ALS), Stochastic Gradient Descent (SGD), Belief Propagation (BP) and latent Dirichlet allocation (LDA).
However, most existing systems simply apply general graph partitioning algorithms that are oblivious to the unique features of bipartite graphs. This results in suboptimal graph partition with significant replicas of vertices and/or edges, leading to not only redundant computation, but also excessive network communication to synchronize graph states.
we make a systematic study on the characteristics of real-world bipartite graphs and the related MLDM algorithms, and describe why existing online distributed graph partitioning algorithms fail to produce an optimal graph partition for bipartite graphs. First, real-world bipartite graphs for MLDM are usually imbalanced. This means that the size of two subsets in bipartite graph is significantly skewed, even in the scale of several orders of magnitude. Second, the computation load of many MLDM algorithms for bipartite graphs may also be skewed among vertices from the two subsets. Finally, the size of data associated with vertices from the two subsets can be significantly skewed.
Based on the study, we argue that the unique properties of bipartite graphs and special requirement of bipartite algorithms demand differentiated partitioning of the two disjoint sets of vertices. We introduce BiGraph, a set of distributed graph partition algorithms designed for bipartite applications. The key of BiGraph is to partition graphs in a differentiated way and loading data according to the data affinity.
BiGraph is implemented as separate graph partition modules of GraphLab, and can seamlessly work with existing synchronous and asynchronous engines. Our experiment using three MLDM algorithms (SVD, ALS and SGD) on an in-house 6-machine multicore cluster with 144 CPU cores shows that BiGraph reduces up to 62% vertex replication, and saves up to 96% network traffic. This transforms to a speedup up to 17.75X (from 1.38X) compared to default partitioning algorithms (e.g., Oblivious and Grid) using the same default synchronous engine.