Many knowledge bases, such as Google's and Facebook's knowledge and social graphs, are represented and stored as RDF graphs, over which users issue structured queries using SPARQL. With a massive number of queries over large and constantly growing RDF data, an RDF graph store must provide low latency and high throughput for concurrent query processing. However, prior systems still experience high per-query latency over large datasets, and most prior designs utilize resources poorly because each query is processed in sequence.
We propose Wukong, a distributed in-memory RDF store that leverages RDMA-based graph exploration to support fast and concurrent RDF queries. Wukong significantly outperforms state-of-the-art systems and can process a mixture of small and large queries at 185,000 queries/second on a 6-node cluster.
Wukong extends an existing graph-based store with built-in index vertices and leverages differentiated graph partitioning to distribute vertices and indexes. Its design is centered on low-latency, high-throughput one-sided RDMA operations, and includes a predicate-based, RDMA-friendly distributed hash table, RDMA cost-aware adaptation between migrating code and migrating data, and RDMA-aware full-history pruning. To support highly concurrent queries, Wukong further leverages a worker-obliger work-stealing design that minimizes the impact of lengthy queries.
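To make the idea of a predicate-based key/value layout with index vertices more concrete, here is a minimal single-process sketch in Python. It is not Wukong's implementation: the class name PredicateKeyedStore, the key shape (vertex, predicate, direction), and the explore helper are all assumptions made for illustration, and a plain dictionary stands in for the RDMA-friendly distributed hash table. The sketch only shows how a predicate's index vertex can seed graph exploration and how each result row carries the full history of bindings along the path.

```python
from collections import defaultdict

OUT, IN = "out", "in"


class PredicateKeyedStore:
    """Toy stand-in for a predicate-based key/value store (hypothetical API)."""

    def __init__(self):
        # (vertex, predicate, direction) -> list of neighbouring vertices
        self.edges = defaultdict(list)
        # Index vertex per predicate: (predicate, direction) -> vertices
        # that have at least one edge with that predicate and direction.
        self.index = defaultdict(list)

    def add_triple(self, subject, predicate, obj):
        # Store the edge under both endpoints so it can be walked either way.
        self.edges[(subject, predicate, OUT)].append(obj)
        self.edges[(obj, predicate, IN)].append(subject)
        for vertex, direction in ((subject, OUT), (obj, IN)):
            members = self.index[(predicate, direction)]
            if vertex not in members:
                members.append(vertex)

    def neighbors(self, vertex, predicate, direction=OUT):
        # One key lookup; in a distributed store this could be a remote read.
        return list(self.edges.get((vertex, predicate, direction), []))

    def explore(self, predicates):
        """Follow a chain of predicates, seeded by the first predicate's index."""
        rows = [(v,) for v in self.index.get((predicates[0], OUT), [])]
        for predicate in predicates:
            # Each row keeps every binding made so far (its full history).
            rows = [row + (nxt,)
                    for row in rows
                    for nxt in self.neighbors(row[-1], predicate)]
        return rows


store = PredicateKeyedStore()
store.add_triple("alice", "advisor", "bob")
store.add_triple("carol", "advisor", "bob")
store.add_triple("bob", "teaches", "db")
store.add_triple("dave", "teaches", "os")

# Roughly: SELECT ?s ?p ?c WHERE { ?s advisor ?p . ?p teaches ?c }
rows = store.explore(["advisor", "teaches"])
```

Because every lookup is a single key access on (vertex, predicate, direction), a query step touches only the edges with the predicate it needs, which is the property that makes such a layout a plausible fit for one-sided remote reads.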
We further propose Wukong+S (S for stream), which adopts C-SPARQL as its streaming model and extends Wukong to support concurrent queries over multiple streams of varied scale as well as the background data. Wukong+S can process a mixture of simple and complex C-SPARQL queries at 56,000 queries/second on a 6-node cluster.
You can use git clone or download a zip archive to get the code.
The source code of Wukong is available on GitHub:
git clone git@github.com:SJTU-IPADS/wukong.git
The source code of Wukong+S is available on GitHub (coming soon):
git clone git@github.com:SJTU-IPADS/wukong-s.git
The project is supported in part by the China National Natural Science Foundation (61402284, 61772335, 61572314, 61525204), the National Key Research & Development Program (No. 2016YFB1000500), the Program for New Century Excellent Talents in University of the Ministry of Education of China (No. ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. TS0220103006), the Doctoral Fund of the Ministry of Education of China (No. 20130073120040), the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (Grant No. 2014A05), and Singapore CREATE E2S2.