
Efficient In-memory Transactional Processing Using HTM


The commercial availability of Intel’s Haswell processor suggests that hardware transactional memory (HTM), a technique inspired by database transactions, is likely to be widely exploited for commercial in-memory databases in the near future. Features such as hardware-maintained read/write sets and automatic conflict detection naturally raise the question of how HTM can support concurrency control for fast in-memory transaction processing.
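
As a rough illustration of what read/write sets and conflict detection mean, the following Python sketch models the bookkeeping in software. The `Tx` class and the `conflicts` function are hypothetical names introduced here for illustration. Real HTM tracks these sets in hardware at cache-line granularity, so this is only a conceptual model and not how any processor implements it.

```python
class Tx:
    """Software stand-in for the read/write sets that HTM tracks in hardware."""

    def __init__(self, name):
        self.name = name
        self.reads = set()   # addresses read by this transaction
        self.writes = set()  # addresses written by this transaction

    def read(self, addr):
        self.reads.add(addr)

    def write(self, addr):
        self.writes.add(addr)


def conflicts(a, b):
    """Two transactions conflict if either one writes an address
    that the other one reads or writes."""
    a_hits_b = a.writes & (b.reads | b.writes)
    b_hits_a = b.writes & (a.reads | a.writes)
    return bool(a_hits_b or b_hits_a)
```

Under this model, two transactions that only read the same address do not conflict, while a write by one to an address that the other has read does.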

We aim to study the feasibility of applying hardware transactional memory to boost the performance of database transactions.


1) Characterizing commercial HTM for multicore scaling: Does hardware transactional memory actually deliver on its promise in practice? To shed some light on this question, we study the performance scalability of a concurrent skip list under competing synchronization techniques: fine-grained locking, lock-free synchronization, and RTM. Our experience suggests that RTM indeed simplifies the implementation, but considerable care is needed to get good performance. Specifically, to avoid excessive aborts due to RTM capacity misses or conflicts, programmers should move memory allocation/deallocation out of the RTM region, tune the fallback functions, and use compiler optimizations. [APSys'13]

2) Scalable IMDB using HTM: Can HTM boost the performance of database transactions while reducing implementation complexity? To answer this question, we implement a multicore in-memory database for online transaction processing (OLTP) workloads. Our system, called DBX, implements a transactional key-value store using Intel’s Restricted Transactional Memory (RTM). DBX achieves 506,817 transactions per second with 8 threads. [EuroSys'14]

3) Reusing HTM for Concurrency Control of IMDB: HTM and concurrency control in databases share significant similarities: 1) tracking read/write sets; 2) detecting conflicting accesses. These similarities motivate us to study the feasibility of directly reusing HTM for database concurrency control. However, HTM only provides “ACI” instead of “ACID” and has a limited working set. We address this challenge through an optimized transaction chopping algorithm and an efficient snapshot algorithm for durability. The resulting system, DBX-TC, achieves a peak throughput of 604,220 txns/sec for TPC-C with 8 threads, outperforming DBX by 36% to 43% on 8 cores across different contention levels. [TR]

4) Persistent Transactional Memory: Due to the lack of durability support, HTM usually requires a complex software mechanism to asynchronously log transactions to persistent storage. This affects both performance and durability. With the emergence of persistent memory, we propose persistent transactional memory (PTM), a new design that adds (eventual) durability to transactional memory (TM) by incorporating emerging non-volatile memory (NVM). We describe the PTM design based on Intel’s restricted transactional memory. A preliminary evaluation using a concurrent key/value store and a database with a cache-based simulator shows that the number of additional cache line flushes is small. [IEEE CAL]

5) Scalable and Efficient Distributed Transactions using HTM and RDMA (DrTM): We further study how HTM and RDMA can be used together to scale out distributed transactions. DrTM’s high performance comes mainly from offloading concurrency control within a local processor to HTM and from leveraging the strong atomicity of RDMA operations with respect to HTM to ensure serializability among concurrent transactions across machines. DrTM is built with a lease-based protocol that allows read-read sharing of database records among concurrent transactions, as well as an RDMA-friendly hash table that leverages HTM to notably reduce RDMA operations. Evaluation using typical transactional workloads including TPC-C and SmallBank shows that DrTM scales well on a 6-node cluster and achieves over 5.52 million transactions per second for TPC-C. This throughput exceeds that of a state-of-the-art distributed transaction system (namely Calvin) by at least 17.9X. [SOSP'15] We also extended DrTM with support for high availability using an optimistic replication protocol. [EuroSys'16]

6) Fast and Concurrent RDF queries using RDMA (Wukong): We further extend an existing graph-based store with built-in index vertices and leverage differentiated graph partitioning to distribute vertices and indexes. Wukong's design is centered around the use of low-latency, high-throughput one-sided RDMA operations, including a predicate-based RDMA-friendly distributed hash table, RDMA cost-aware adaptation between migrating code and migrating data, and RDMA-aware full-history pruning. To support highly concurrent queries, Wukong further leverages a worker-obliger work-stealing design that minimizes the impact of lengthy queries. [OSDI 2016]
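
Returning to item 1 above, the advice to keep allocation out of the RTM region and to tune the fallback path can be illustrated with a small control-flow sketch. Python cannot issue RTM instructions, so the code below only models the retry-then-fallback pattern: `Aborted`, `run_transactional`, and `make_insert` are hypothetical names, and raising `Aborted` stands in for a hardware abort. It is a sketch under these assumptions, not the implementation used in the papers.

```python
import threading


class Aborted(Exception):
    """Stands in for a hardware abort (conflict or capacity miss)."""


def run_transactional(body, fallback_lock, max_retries=3):
    """Try `body` speculatively up to `max_retries` times, then take a lock.

    Returns (result, path) where path is "speculative" or "fallback".
    """
    for _ in range(max_retries):
        # Models subscribing to the fallback lock: never speculate
        # while another thread is running the fallback path.
        if fallback_lock.locked():
            continue
        try:
            return body(True), "speculative"
        except Aborted:
            continue  # the retry budget is the tuning knob here
    with fallback_lock:
        return body(False), "fallback"


def make_insert(store, key, value, simulated_aborts):
    """Build an insert whose node is allocated before the region starts."""
    node = {"key": key, "value": value}  # allocation kept outside the region
    remaining = [simulated_aborts]

    def body(speculative):
        if speculative and remaining[0] > 0:
            remaining[0] -= 1
            raise Aborted()
        store[node["key"]] = node  # only the short update runs inside
        return node["value"]

    return body
```

With a retry budget of 3, an insert that aborts twice still completes on the speculative path, while one that keeps aborting ends up on the lock-based fallback path.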

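The lease-based read sharing mentioned in item 5 can be sketched in a similar spirit. The `LeasedRecord` class and `lease_still_valid` function below are hypothetical and much simplified: time is passed in explicitly, clock skew between machines is ignored, and no RDMA or HTM is involved. The sketch only shows the basic rule that readers share a lease until it expires and that a writer must wait for expiry.

```python
class LeasedRecord:
    """A record guarded by a shared read lease or an exclusive write lock."""

    def __init__(self, value):
        self.value = value
        self.lease_end = 0         # readers may share while now < lease_end
        self.write_locked = False

    def acquire_read_lease(self, now, duration):
        """Return (value, lease_end), or None if a writer holds the record."""
        if self.write_locked:
            return None
        if now >= self.lease_end:
            # No live lease, so start a new one.
            self.lease_end = now + duration
        # Otherwise share the lease installed by an earlier reader.
        return self.value, self.lease_end

    def try_write_lock(self, now):
        """A writer must wait until the outstanding read lease has expired."""
        if self.write_locked or now < self.lease_end:
            return False
        self.write_locked = True
        return True

    def write_and_unlock(self, value):
        assert self.write_locked, "caller must hold the write lock"
        self.value = value
        self.write_locked = False


def lease_still_valid(lease_end, now):
    """A reader re-checks its lease before it commits."""
    return now < lease_end
```

In this model a second reader arriving at time 2 shares a lease that ends at time 5, a writer is refused at time 3, and the same writer succeeds at time 5 once the lease has expired.
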



Past Students



  • [ACM TOCS] Fast In-memory Transaction Processing using RDMA and HTM. Haibo Chen, Rong Chen, Xingda Wei, Jiaxin Shi, Yanzhe Chen, Zhaoguo Wang, Binyu Zang. ACM Transactions on Computer Systems, Accepted, 2017. (extended version of SOSP 2015 paper)
  • [USENIX ATC] Replication-driven Live Reconfiguration for Fast Distributed Transaction Processing. Xingda Wei, Sijie Shen, Rong Chen, Haibo Chen. Proceedings of 2017 USENIX Annual Technical Conference, Santa Clara, CA, US, Jul, 2017.
  • [EuroSys] Fast and General Distributed Transactions Using RDMA and HTM. Yanzhe Chen, Xingda Wei, Jiaxin Shi, Rong Chen and Haibo Chen. 11th ACM European Conference on Computer Systems (to appear), London, UK, April, 2016. [pdf]
  • [SOSP] Fast In-memory Transaction Processing using RDMA and HTM. Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, Haibo Chen. 2015 ACM Symposium on Operating System Principles (to appear), Monterey, CA, October 2015. [pdf]
  • [CAL] Persistent Transactional Memory. Zhaoguo Wang, Han Yi, Ran Liu, Mingkai Dong and Haibo Chen. IEEE Computer Architecture Letters. VOL. 14, NO. 1, JANUARY-JUNE 2015. [pdf]
  • [TR] Exploiting hardware transactional memory for efficient in-memory transaction processing. Hao Qian, Zhaoguo Wang, Haibing Guan, Binyu Zang, Haibo Chen. Tech. rep., Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, 2015. [pdf]
  • [EuroSys] Using Restricted Transactional Memory to Build a Scalable In-Memory Database. Zhaoguo Wang, Hao Qian, Jinyang Li, Haibo Chen. The European Conference on Computer Systems, Amsterdam, The Netherlands, 2014. [pdf]
  • [APSys] Opportunities and pitfalls of multi-core scaling using Hardware Transaction Memory. Zhaoguo Wang, Hao Qian, Haibo Chen, Jinyang Li. In Proceedings of Asia-Pacific Workshop on Systems, Singapore, 2013. [pdf]

Source Code

The source code of DrTM is available through
git clone or
git clone


The project is supported in part by the Program for New Century Excellent Talents in University of Ministry of Education of China (No. ZXZY037003), a foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. TS0220103006), Doctoral Fund of Ministry of Education of China (No. 20130073120040), China National Natural Science Foundation (61572314, 61402284), and Singapore CREATE E2S2.

pub/projects/drtm.txt · Last modified: 2017/05/11 22:40 by realstolz