powerlyra logo

Latest News
[20/06/2016] Graph/H: port hybrid-cut partitioning to GraphX-1.5.2
[14/04/2014] +Async: add a new hybrid async engine
[27/02/2014] BiGraph: a set of new distributed graph partition algorithms designed for bipartite applications
[19/11/2013] PowerLyra: a new graph analytics engine that differentiates graph computation and partitioning

Related Projects
Hybrid Paritioning and Computation
Natural graphs with skewed distribution raise unique challenges to graph computation and partitioning. Existing graph analytics frameworks usually use a "ONE size fits ALL" design that uniformly processes all vertices, which either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab), or incur high communication cost among vertices even for low-degree vertices (e.g., PowerGraph and GraphX).
We argue that skewed distribution in natural graphs also calls for differentiated processing of high-degree and low-degree vertices. We then introduce PowerLyra, a new graph analytics engine that embraces the best of both worlds of existing frameworks, by dynamically applying different computation and partitioning strategies for different vertices. PowerLyra uses Pregel/GraphLab-like computation models for process low-degree vertices to minimize computation, communication and synchronization overhead, and uses PowerGraph-like computation model for process high-degree vertices to reduce load imbalance, contention and memory pressure. PowerLyra follows the interface of GAS (Gather, Apply and Scatter) model and can seamlessly support various graph algorithms (e.g., all GraphLab toolkits).
PowerLyra additionally proposes a new hybrid graph cut algorithm that embraces the best of both worlds in edge-cut and vertex-cut, which evenly distributes low-degree vertices along with their edges like edge-cut, and evenly distributes edges of high-degree vertices like vertex-cut. Both theoretical analysis and empirical validation show that the expected replication factor of random hybrid-cut is alway better than both random (Hash-based), contrained (e.g., Grid), and heuristic (e.g., Oblivious or Coordinated) vertex-cut for skewed power-law graphs. We also develop a new distributed greedy heuristic hybrid-cut algorithm, namely Ginger, inspired by Fennel. Note that hybrid graph partitioning algorithms also work with existing synchronous and asynchronous enignes of GraphLab.
Finally, PowerLyra adopts locality-conscious data layout optimization in graph ingress phase to mitigate poor locality during communication. For graph computation that processes a graph multiple iterations and even multiple times in memory, we argue that a small increase of graph ingress time is more worthwhile for an often larger speedup in execution time.
PowerLyra is implemented as separate execution engines and partitioning algorithms of GraphLab, and can seamlessly support all GraphLab toolkits. A detailed evaluation on a VM-based 48-node EC2-like cluster and a physical 6-node in-house cluster using three graph analytics and two MLDM algorithms show that PowerLyra outperforms PowerGraph (Mar. 26, 2014. commit:6f787c7e35) by up to 5.53X (from 1.24X) and 3.26X (from 1.49X) for real-world and synthetic graphs accordingly, and is much faster than other systems (e.g., Giraph, GPS, GraphX and CombBLAS) in both graph ingress and computation time, yet with less memory consumption.

Bipartite-aware Paritioning
Many machine learning and data mining (MLDM) problems like recommendation, topic modeling and medical diagnosis can be modeled as computing on bipartite graphs, whose vertices can be separated as two disjoint sets U and V and every edge connects a vertex in U and V. Due to the wide existence and importance of bipartite graphs, there have been a number of popular MLDM algorithms that operate on such graphs, including Singular Value Decomposition (SVD), Alternating Least Squares (ALS), Stochastic Gradient Descent (SGD), Belief Propagation (BP) and latent Dirichlet allocation (LDA).
However, most existing systems simply apply general graph partitioning algorithms that are oblivious to the unique features of bipartite graphs. This results in suboptimal graph partition with significant replicas of vertices and/or edges, leading to not only redundant computation, but also excessive network communication to synchronize graph states.
we make a systematic study on the characteristics of real-world bipartite graphs and the related MLDM algorithms, and describe why existing online distributed graph partitioning algorithms fail to produce an optimal graph partition for bipartite graphs. First, real-world bipartite graphs for MLDM are usually imbalanced. This means that the size of two subsets in bipartite graph is significantly skewed, even in the scale of several orders of magnitude. Second, the computation load of many MLDM algorithms for bipartite graphs may also be skewed among vertices from the two subsets. Finally, the size of data associated with vertices from the two subsets can be significantly skewed.
Based on the study, we argue that the unique properties of bipartite graphs and special requirement of bipartite algorithms demand differentiated partitioning of the two disjoint sets of vertices. We introduce BiGraph, a set of distributed graph partition algorithms designed for bipartite applications. The key of BiGraph is to partition graphs in a differentiated way and loading data according to the data affinity.
BiGraph is implemented as separate graph partition modules of GraphLab, and can seamlessly work with existing synchronous and asynchronous engines. Our experimental results on a VM-based 48-node EC2-like cluster and a physical 6-node in-house cluster using three typical MLDM algorithms (SVD, ALS and SGD) show that BiGraph reduces up to 81% vertex replication, and saves up to 96% network traffic. This transforms to a speedup up to 17.75X (from 1.16X) compared with the default partitioning algorithms (e.g., Oblivious and Grid) using the same default synchronous engine.


If you use PowerLyra in your work or research, please let us know about it. We also encourage you to reference our papers.

The PowerLyra source code is distributed with a Apache license 2.0. The copyright is held by Shanghai Jiao Tong University. PowerLyra is provided "as is" without any warranties and conditions of any kind.
The latest release is based on GraphLab 2.2 (Feb. 11, 2015. commit:18c2103), but without locality-conscious graph layout optimization. You can check out the newest code from the github mirror or from IPADS's gitlab server: git clone http://ipads.se.sjtu.edu.cn:1312/opensource/powerlyra.git.
If you are interested in estimating or working with Powerlyra hybrid engine and graph partitions in our technical report, you can also obtain a snapshot (~80MB) from Sep. 25, 2013, which is based on GraphLab 2.2 (Jul. 8, 2013. commit:fc3d6c6). MD5: 32685a65d6edc2e52d791a2cffef1dfa
NOTE: the name of engines and graph partitions are different in latest release and snapshot code (sorry for confusion).
Once you've downloaded PowerLyra, you can find instructions for installing and building it on the GraphLab's documentation page and github. You just need to choose our hybrid exuection engines and partitioning strategies in command line
We also extend GraphX in Apache Spark (commit:49c30c1) to support several heuristic graph paritioning algorithms, including hybrid-cut in PowerLyra (Hybrid) and greedy vertex-cut in PowerGraph (Oblivious and Coordinated). You can check out the code from IPADS's gitlab server: git clone http://ipads.se.sjtu.edu.cn:1312/opensource/graphx-hybrid.git.

Quick Start

Usage#1: Running PageRank algorithm on Twitter Follower graph using PowerLyra engine with Ginger graph partition.

  $mpiexec -f ~/mpd.hosts -n 48 ./pagerank --format=snap --graph=/path/to/twitter_  
   --graph_opts ingress=hybrid_ginger,threshold=200,interval=1000,nedges=1468364884,nverts=41652230 
   --engine plsync --engine_opts max_iterations=20
      --engine engine : set execution engine (plsync: powerlyra synchronous engine)
      --graph_opts    : set graph options
           ingress    : set ingress algorithms (hybrid: random | hybrid_ginger: heuristic)
           threshold  : set threshold to differentiate low-degree and high-degree vertices, default is 100
           interval   : set interval of coordination, default is never (only for hybrid_ginger)
           nedges     : set total number of edges for input graph (only for hybrid_ginger)
           nverts     : set total number of vertices for input graph (only for hybrid_ginger)

Usage#2: Running Approximate Diameter algorithm on synthetic power-law graph using PowerLyra engine with Hybrid graph partition.

  $mpiexec -f ~/mpd.hosts -n 48 ./approximate_diameter --format=snap --graph=/path/to/powerlaw-2.2-10m_ 
   --graph_opts ingress=hybrid,threshold=100 --engine plsync

Usage#3: Running LBP algorithm on synthetic matrix graph using PowerLyra engine with Ginger graph partition.

  $mpiexec -f ~/mpd.hosts -n 48 ./lbp_structured_prediction --graph /path/to/synth_edata2500w_ --engine plsync
   --prior /path/to/synth_vdata2500w_ --graph_opts ingress=hybrid_ginger,interval=1000,nedges=49990000,nverts=25000000

Usage#4: Running ALS algorithm on Small Netflix graph using GraphLab engine with Bipartite graph partition.

  $mpiexec -f ~/mpd.hosts -n 6 ./als --matrix /path/to/smallnetflix/smallnetflix_mm --max_iter=5 --ncpus=24 
   --predictions=out_predictions --minval=1 --maxval=5 --D=100 --engine sync
   --graph_opts ingress=bipartite,favorite=source
      --graph_opts    : set graph options
           ingress    : set ingress algorithms (bipartite: random | bipartite_aweto: heuristic)
           favorite   : set favorite subset of bipartite graph ("source" or "target"), default is source
           affinity   : set data affinity or not for bipartite partition, default is 0 (only for bipartite)

Usage#5: Running SVD algorithm on AS-Skitter graph using GraphLab engine with Aweto graph partition.

  $mpiexec -f ~/mpd.hosts -n 6 ./svd --matrix /path/to/as --max_iter=3 --ncpus=24 --rows=1696416 --cols=1696417 
   --nsv=2 --nv=7 --tol=1e-2 --binary=true --save_vectors=0 --input_file_offset=0 --ortho_repeats=1 --engine sync
   --graph_opts ingress=bipartite_aweto,favorite=target


Contact Information
Please contact Rong Chen for further information. Welcome to let us know about bug reports and suggested improvements, while we may not be able to provide technical support timely. If you make some correction or improvement to PowerLyra, we would appreciate receiving a patch that we can include in the next release.

External Link

This page has been visited Website Hit Counter times since 11/9/2013.
Stats provided by the Easy Counter