Table of Contents
Fast and Concurrent Query Processing on Big (Linked) Data
- 2019 May 23: our wukong w/ live migration paper has been accepted by USENIX ATC 2019
- 2018 Apr. 19: our wukong w/ GPU paper has been accepted by USENIX ATC 2018
- 2017 Oct. 27: the first release (v0.1.0) of wukong @github. code
- 2017 Aug. 5: our wukong/streaming paper has been accepted by SOSP 2017
- 2016 Jul. 31: our wukong paper has been accepted by OSDI 2016
Many knowledge bases like Google and Facebook’s knowledge/social graphs are represented and stored as RDF graphs, where users can issue structured queries on such graphs using SPARQL. With massive queries over large and constantly growing RDF data, it is imperative that an RDF graph store should provide low latency and high throughput for concurrent query processing. However, prior systems still experience high per-query latency over large datasets and most prior designs have poor resource utilization such that each query is processed in sequence.
We propose Wukong, a distributed in-memory RDF store that leverages RDMA-based graph exploration to support fast and concurrent RDF queries. Wukong significantly outperforms state-of-the-art systems and can process a mixture of small and large queries at 185,000 queries/second on a 6-node cluster.
Wukong extends existing graph-based store with builtin index vertices and leverages differentiated graph partitioning to distribute vertices and indexes. Wukong's design is centered around the use of low-latency, high-throughput one-sided RDMA operations, including a predicate-based RDMA-friendly distributed hash table, RDMA cost-aware adaption among migration code and data, RDMA-aware full-history pruning. To support highly concurrent queries, Wukong further leverages a worker-obliger work stealing design that minimizes the impact from lengthy queries.
We further propose Wukong+S (S for stream) that adopts C-SPARQL as streaming model and extends Wukong to support concurrent queries on multiple varied-scale streams as well as the background data. Wukong+S can process a mixture of simple and complex C-SPARQL queries at 56,000 queries/second on a 6-node cluster.
- Jiaqi Li, Wenhao Zhang, Xuehan Ke, Xiating Xie, Zhenhan Gong, Zihang Yao
- [OSDI 2016] Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. Jiaxin Shi, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. Proceedings of 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, US, November 2016. [paper] [slides] [poster] homepage github
- [APSys 2018] Analysis and Improvement of Optimizer for Query Processing on Graph Store. Youyang Yao, Jiaqi Li and Rong Chen. Proceedings of the 9th ACM SIGOPS Asia-Pacific Workshop on Systems, Jeju Island, South Korea. August 2018. [paper]
- [USENIX ATC 2019] Pragh: Locality-preserving Graph Traversal with Split Live Migration. Xiating Xie, Xingda Wei, Rong Chen, and Haibo Chen. Proceedings of 2019 USENIX Annual Technical Conference, Renton, WA, US, July 2019.
You can use git clone or just download zip archive to get the codes
The source code of Wukong is available at github
git clone firstname.lastname@example.org:SJTU-IPADS/wukong.git
The source code of Wukong+S is available at github (coming soon)
git clone email@example.com:SJTU-IPADS/wukong-s.git
The project is supported in part by China National Natural Science Foundation (61402284, 61772335, 61572314, 61525204), the National Key Research & Development Program (No. 2016YFB1000500), the Program for New Century Excellent Talents in University of Ministry of Education of China (No.ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China(No. TS0220103006), Doctoral Fund of Ministry of Education of China (No. 20130073120040), the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant (No. 2014A05), and Singapore CREATE E2S2.