User Tools

Site Tools


projects:wukong

Fast and Concurrent Query Processing on Big (Linked) Data

News

  • 2017 Oct. 27: the first release (v0.1.0) of wukong @github. code
  • 2017 Aug. 5: our wukong/streaming paper has been accepted by SOSP 2017
  • 2016 Jul. 31: our wukong paper has been accepted by OSDI 2016

Overview

Many knowledge bases like Google and Facebook’s knowledge/social graphs are represented and stored as RDF graphs, where users can issue structured queries on such graphs using SPARQL. With massive queries over large and constantly growing RDF data, it is imperative that an RDF graph store should provide low latency and high throughput for concurrent query processing. However, prior systems still experience high per-query latency over large datasets and most prior designs have poor resource utilization such that each query is processed in sequence.

We propose Wukong, a distributed in-memory RDF store that leverages RDMA-based graph exploration to support fast and concurrent RDF queries. Wukong significantly outperforms state-of-the-art systems and can process a mixture of small and large queries at 185,000 queries/second on a 6-node cluster.

Approaches

Wukong extends existing graph based store with builtin index vertices and leverages differentiated graph partitioning to distribute vertices and indexes. Wukong's design is centered around the use of low-latency, high-throughput one-sided RDMA operations, including a predicate-based RDMA-friendly distributed hashtable, RDMA cost-aware adaption among migration code and data, RDMA-aware full-history pruning. To support highly concurrent queries, Wukong further leverages a worker-obliger work stealing design that minimizes the impact from lengthy queries.

Support Streaming

We further propose Wukong+S (S for stream) that adopts C-SPARQL as streaming model and extends Wukong to support concurrent queries on multiple varied-scale streams as well as the background data. Wukong+S can process a mixture of simple and complex C-SPARQL queries at 56,000 queries/second on a 6-node cluster.

People

Faculties

Students

  • Youyang Yao, Jiaqi Li, Ning Wang, Wenhao Zhang, Xuehan Ke, Xiating Xie

Alumni

  • Jiaxin Shi (Software Engineer at Baidu)
  • Yunhao Zhang (Ph.D Student at Cornell)
  • Chang Lou (Ph.D Student at John Hopkins)

Collaborators

Publications

  • [OSDI] Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. Jiaxin Shi, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, US, Nov, 2016. [pdf]
  • [SOSP] Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data. Yunhao Zhang, Rong Chen, and Haibo Chen. Proceedings of the 26th ACM Symposium on Operating Systems Principles, Shanghai, China, October, 2017. [pdf]

Source Code

You can use git clone or just download zip archive to get the codes

The source code of Wukong is available at github

git clone git@github.com:SJTU-IPADS/wukong.git

The source code of Wukong+S is available at github (coming soon)

git clone git@github.com:SJTU-IPADS/wukong-s.git

(We will merge the new feature to Wukong by the end of 2017. Coming soon!)

Acknowledgements

The project is supported in part by China National Natural Science Foundation (61402284, 61772335, 61572314, 61525204), the National Key Research & Development Program (No. 2016YFB1000500), the Program for New Century Excellent Talents in University of Ministry of Education of China (No.ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China(No. TS0220103006), Doctoral Fund of Ministry of Education of China (No. 20130073120040), the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant (No. 2014A05), and Singapore CREATE E2S2.

projects/wukong.txt · Last modified: 2017/10/27 22:33 by realstolz