Efficient In-memory Transactional Processing Using New Advanced Hardware

Overview

The commercial availability of Intel’s Haswell processor suggests that hardware transactional memory (HTM), a technique inspired by database transactions, is likely to be widely exploited for commercial in-memory databases in the near future. Its features, such as hardware-maintained read/write sets and automatic conflict detection, naturally raise the question of how HTM can support concurrency control for fast in-memory transaction processing.

We aim to study the feasibility of applying hardware transactional memory to boost the performance of database transactions.

Approaches

1) Characterizing commercial HTM for multicore scaling: Does hardware transactional memory actually deliver on its promise in practice? To shed some light on this question, we study the performance scalability of a concurrent skip list using competing synchronization techniques, including fine-grained locking, lock-free synchronization, and RTM. Our experience suggests that RTM indeed simplifies the implementation, but considerable care is needed to get good performance. Specifically, to avoid excessive aborts due to RTM capacity misses or conflicts, programmers should move memory allocation/deallocation out of RTM regions, tune fallback functions, and use compiler optimizations. [APSys'13]
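The fallback tuning mentioned above can be illustrated with a small control-flow sketch. Python has no access to RTM, so the hardware transaction is simulated here: `attempt_htm`, `SimulatedAbort`, and `max_retries` are hypothetical names introduced only for illustration, and this is not the code from the paper. The sketch retries a speculative attempt a bounded number of times and then falls back to a lock.

```python
import threading


class SimulatedAbort(Exception):
    """Stands in for a hardware transaction abort (conflict or capacity)."""


def run_with_fallback(body, attempt_htm, fallback_lock, max_retries=3):
    """Run `body` speculatively up to `max_retries` times, then under a lock.

    `attempt_htm` is a hypothetical callable that executes `body` as if inside
    a hardware transaction and raises SimulatedAbort when it aborts.
    Returns (result, path), where path is "htm" or "fallback".
    """
    for _ in range(max_retries):
        try:
            return attempt_htm(body), "htm"
        except SimulatedAbort:
            # A real policy would inspect the abort cause before retrying.
            continue
    # Speculation kept failing: serialize on the fallback lock instead.
    with fallback_lock:
        return body(), "fallback"
```

In this sketch `max_retries` is the tuning knob: too few retries pushes work onto the serial fallback path, while too many wastes time on attempts that keep aborting. With real RTM the speculative path would also need to observe the fallback lock so that it aborts while another thread holds it; the simulation omits that detail.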

2) Scalable IMDB using HTM: Can HTM boost the performance of database transactions while reducing implementation complexity? To answer this question, we implement a multicore in-memory database for online transaction processing (OLTP) workloads. Our system, called DBX, implements a transactional key-value store using Intel’s Restricted Transactional Memory (RTM). DBX achieves 506,817 transactions per second under 8 threads. [EuroSys'14]

3) Reusing HTM for Concurrency Control of IMDB: HTM and concurrency control in databases share significant similarities: (i) tracking read/write sets; (ii) detecting conflicting accesses. These similarities motivate us to study the feasibility of directly reusing HTM for database concurrency control. However, HTM only provides “ACI” instead of “ACID” and has a limited working set. We address this challenge through an optimized transaction chopping algorithm and an efficient snapshot algorithm for durability. The resulting system, DBX-TC, achieves a peak throughput of 604,220 txns/sec for TPC-C at 8 threads, outperforming DBX by 36% to 43% on 8 cores across different contention levels. [TR]
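The two shared mechanisms, tracking read/write sets and detecting conflicting accesses, can be shown with a minimal software sketch. The `TxnFootprint` type and `conflicts` function are hypothetical names introduced for illustration; this is a generic conflict test, not the DBX-TC implementation or the way the hardware performs the check.

```python
from dataclasses import dataclass, field


@dataclass
class TxnFootprint:
    """Keys a transaction has read and written so far."""
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def conflicts(a, b):
    """True if one transaction writes a key the other reads or writes."""
    return bool(
        a.write_set & (b.read_set | b.write_set)
        or b.write_set & a.read_set
    )
```

For example, a transaction that writes key "y" conflicts with one that reads "y", while two transactions that only read "x" do not conflict. The limited working set noted above corresponds to a bound on how large these sets may grow before a hardware transaction aborts.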

4) Persistent Transactional Memory: Due to the lack of durability support, HTM usually requires a complex software mechanism to asynchronously log transactions into persistent storage. This affects both performance and durability. With the emergence of persistent memory, we propose persistent transactional memory (PTM), a new design that adds (eventual) durability to transactional memory (TM) by incorporating emerging non-volatile memory (NVM). We describe the PTM design based on Intel’s restricted transactional memory. A preliminary evaluation using a concurrent key/value store and a database with a cache-based simulator shows that the number of additional cache line flushes is small. [IEEE CAL]
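To make the cache line flush metric concrete, the sketch below counts the distinct cache lines covered by a set of writes, which is how many flushes would be needed if each dirty line were flushed exactly once. It assumes a 64-byte cache line, and `cache_lines_touched` is a hypothetical helper; it does not model the PTM design or the simulator used in the evaluation.

```python
CACHE_LINE_BYTES = 64  # assumption: a common x86 cache line size


def cache_lines_touched(writes):
    """Return the set of cache line indices covered by `writes`.

    `writes` is an iterable of (address, length_in_bytes) pairs.
    """
    lines = set()
    for addr, length in writes:
        if length <= 0:
            continue  # an empty write dirties nothing
        first = addr // CACHE_LINE_BYTES
        last = (addr + length - 1) // CACHE_LINE_BYTES
        lines.update(range(first, last + 1))
    return lines
```

Two 8-byte writes at addresses 0 and 8 share one line, whereas a single 8-byte write at address 60 straddles two lines, so data layout can change the flush count even when the number of bytes written stays the same.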

5) Scalable and Efficient Distributed Transactions using HTM and RDMA (DrTM): We further study how HTM and RDMA can be used together to scale out distributed transactions. DrTM’s high performance comes mainly from offloading concurrency control within a local processor to HTM and leveraging the strong atomicity of RDMA operations with respect to HTM to ensure serializability among concurrent transactions across machines. DrTM is built with a lease-based protocol that allows read-read sharing of database records among concurrent transactions, as well as an RDMA-friendly hash table that leverages HTM to notably reduce RDMA operations. Evaluation using typical transactional workloads including TPC-C and SmallBank shows that DrTM scales well on a 6-node cluster and achieves over 5.52 million transactions per second for TPC-C. This number outperforms a state-of-the-art distributed transaction system (namely Calvin) by at least 17.9X. [SOSP'15]
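The idea of lease-based read-read sharing can be sketched in a few lines. This is a single-process toy under stated assumptions: `LeasedRecord`, its methods, and the explicit `now` argument are hypothetical, and the sketch ignores clock synchronization across machines as well as any RDMA or HTM interaction, so it should not be read as DrTM's protocol.

```python
class LeasedRecord:
    """A record that readers share until a lease end time passes."""

    def __init__(self, value):
        self.value = value
        self.lease_end = 0.0  # readers may share the record before this time

    def read(self, now, duration):
        """Grant or extend a shared read lease and return the value."""
        self.lease_end = max(self.lease_end, now + duration)
        return self.value

    def try_write(self, now, value):
        """Install `value` only once every read lease has expired."""
        if now < self.lease_end:
            return False
        self.value = value
        return True
```

Several readers can hold the lease at once because reading only pushes `lease_end` later, while a writer is refused until the lease has expired. This is what makes read-read sharing possible without readers blocking each other.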

6) Fast and General Distributed Transactions using RDMA and HTM (DrTM+R): We present DrTM+R, a fast in-memory transaction processing system that retains the performance benefit from advanced hardware features, while supporting general transactional workloads and high availability through replication. DrTM+R addresses the generality issue by designing a hybrid OCC and locking scheme, which leverages the strong atomicity of HTM and the strong consistency of RDMA to preserve strict serializability with high performance. Evaluation using typical OLTP workloads like TPC-C and SmallBank shows that DrTM+R scales well on a 6-node cluster and achieves over 5.69 million and 94 million transactions per second without replication for TPC-C and SmallBank, respectively. Enabling 3-way replication on DrTM+R incurs at most 41% overhead before reaching the network bottleneck. [EuroSys'16]
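As a rough illustration of what combining OCC with locking means, the sketch below shows a generic optimistic commit that locks the write set, validates the versions seen by reads, and then installs the writes. `Record` and `occ_commit` are hypothetical names, the code is a single-threaded toy whose lock flag is not atomic, and it is not DrTM+R's actual scheme, which the description above says relies on HTM and RDMA.

```python
class Record:
    """A versioned value with a simple lock flag."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.locked = False


def occ_commit(store, read_versions, writes):
    """Try to commit; return True on success and False on abort.

    `read_versions` maps key -> version observed during execution.
    `writes` maps key -> new value to install.
    """
    locked = []
    try:
        for key in sorted(writes):  # lock the write set in a fixed order
            rec = store[key]
            if rec.locked:
                return False
            rec.locked = True
            locked.append(rec)
        for key, seen in read_versions.items():  # validate the read set
            rec = store[key]
            if rec.version != seen or (rec.locked and key not in writes):
                return False
        for key, value in writes.items():  # install writes, bump versions
            store[key].value = value
            store[key].version += 1
        return True
    finally:
        for rec in locked:  # release locks on commit and on abort
            rec.locked = False
```

A transaction that read a record at version 0 aborts if another commit has since bumped that version, which is the optimistic half of the scheme; the locks protect only the short window in which writes are installed.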

People

Faculty

Students

Zhiyuan Dong, Sijie Shen, Fangmin Lu, Rongxin Chen

Past Students

Hao Qian, Yanzhe Chen, Jiaxin Shi, Qiubin Wu, Zhenhan Gong, Xiating Xie

Collaborators

Jinyang Li (NYU)

Publications

[NSDI] Unifying Timestamp with Transaction Ordering for MVCC with Decentralized Scalar Timestamp. Xingda Wei, Rong Chen, Haibo Chen, Zhaoguo Wang, Zhenhan Gong, and Binyu Zang. The 18th USENIX Symposium on Networked Systems Design and Implementation, Boston, MA, US, April 2021. [paper] talk github
[OSDI] Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache. Xingda Wei, Rong Chen, and Haibo Chen. The 14th USENIX Symposium on Operating Systems Design and Implementation, Banff, Alberta, Canada, November 2020. [paper] talk github
[OSDI] Deconstructing RDMA-enabled Transaction Processing: Hybrid is Better! Xingda Wei, Zhiyuan Dong, Rong Chen, and Haibo Chen. Proceedings of 13th USENIX Symposium on Operating Systems Design and Implementation, Carlsbad, CA, US, October 2018. [paper]
[USENIX ATC] Replication-driven Live Reconfiguration for Fast Distributed Transaction Processing. Xingda Wei, Sijie Shen, Rong Chen, and Haibo Chen. Proceedings of 2017 USENIX Annual Technical Conference, Santa Clara, CA, US, July 2017. [paper]
[EuroSys] Fast and General Distributed Transactions Using RDMA and HTM. Yanzhe Chen, Xingda Wei, Jiaxin Shi, Rong Chen, and Haibo Chen. Proceedings of 11th ACM European Conference on Computer Systems, London, UK, April 2016. [paper] ACM DL
[SOSP] Fast In-memory Transaction Processing using RDMA and HTM. Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, Haibo Chen. In Proceedings of 2015 ACM Symposium on Operating System Principles (to appear), Monterey, CA, October 2015. [pdf]
[CAL] Persistent Transactional Memory. Zhaoguo Wang, Han Yi, Ran Liu, Mingkai Dong and Haibo Chen. IEEE Computer Architecture Letters. VOL. 14, NO. 1, JANUARY-JUNE 2015. [pdf]
[TR] Exploiting hardware transactional memory for efficient in-memory transaction processing. Hao Qian, Zhaoguo Wang, Haibing Guan, Binyu Zang, Haibo Chen. Tech. rep., Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, 2015. [pdf]
[EuroSys] Using Restricted Transactional Memory to Build a Scalable In-Memory Database. Zhaoguo Wang, Hao Qian, Jinyang Li, Haibo Chen. The European Conference on Computer Systems, Amsterdam, The Netherlands, 2014. [pdf]
[APSys] Opportunities and pitfalls of multi-core scaling using Hardware Transaction Memory. Zhaoguo Wang, Hao Qian, Haibo Chen, Jinyang Li. In Proceedings of Asia-Pacific Workshop on Systems, Singapore, 2013. [pdf]

Source Code

The source code of DrTM is available through
git clone git@github.com:SJTU-IPADS/drtm.git or
git clone http://ipads.se.sjtu.edu.cn:1312/opensource/drtm.git

Acknowledgements

The project is supported in part by the Program for New Century Excellent Talents in University of Ministry of Education of China (No. ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. TS0220103006), Doctoral Fund of Ministry of Education of China (No. 20130073120040), China National Natural Science Foundation (61572314, 61402284), and Singapore CREATE E2S2.
