Zhaoguo Wang (王肇国)


Institute of Parallel and Distributed Systems

Fudan University

E-mail: zgwang at fudan dot edu dot cn

Mailing Address: Room 3403, Software Institute, 825 Dongchuan Rd., Shanghai, China, 201203.

About Me

I am a third-year Ph.D. student under the supervision of Prof. Binyu Zang, Prof. Haibo Chen and Prof. Jinyang Li. My research interests are operating systems and system virtualization. I intend to graduate in the summer of 2014.

Research Area

  • Multicore In-Memory Database
  • Scalable Full System Emulation


Education

  • Ph.D. Candidate, Computer Science, Fudan University (2011.9 ~ present)
  • M.S., Software School, Fudan University (2008.9 ~ 2011.6)
  • B.S., Software School, Nanjing University (2004.9 ~ 2008.6)


Publications

Zhaoguo Wang, Han Yi, Ran Liu, Mingkai Dong and Haibo Chen. “Persistent Transactional Memory.” IEEE Computer Architecture Letters (CAL 2014), to appear.

Zhaoguo Wang, Hao Qian, Jinyang Li and Haibo Chen. “Using Restricted Transactional Memory to Build a Scalable In-Memory Database.” In Proceedings of The European Conference on Computer Systems (EuroSys 2014), Amsterdam, The Netherlands, 2014. [EuroSys'14]

Zhaoguo Wang, Hao Qian, Haibo Chen and Jinyang Li. “Opportunities and pitfalls of multicore scaling using hardware transaction memory.” In Proceedings of the 4th Asia-Pacific Workshop on Systems (APSYS 2013), p. 3. ACM, 2013. [APSYS'13]

Zhaoguo Wang, Ran Liu, Yufei Chen, Xi Wu, Haibo Chen, Weihua Zhang, Binyu Zang. “COREMU: a scalable and portable parallel full-system emulator.” In Proceedings of the 16th ACM symposium on Principles and Practice of Parallel Programming (PPoPP 2011), pp. 213-222. ACM, 2011. [PPoPP'11]


Projects

Persistent Transactional Memory

2013.12 - current

Persistent transactional memory (PTM) is a new design that adds durability to transactional memory (TM) by incorporating emerging non-volatile memory. PTM dynamically tracks transactional updates to cache lines to ensure the ACI (atomicity, consistency and isolation) properties during cache flushes, and it leverages an undo log in persistent memory so that it can always recover transactional data structures to a consistent state after a machine crash. I'm the main developer of this system.
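The role of the undo log can be illustrated with a minimal Python sketch. It is a software toy, not the PTM hardware design: the class name `UndoLoggedStore` and its methods are hypothetical, and ordinary Python objects stand in for persistent memory.

```python
class UndoLoggedStore:
    """Toy model of undo logging (hypothetical names, not PTM's interface).

    `data` and `undo_log` are assumed to live in persistent memory.
    """

    def __init__(self):
        self.data = {}       # in-place, "persistent" data
        self.undo_log = []   # entries: (key, old_value, key_existed)

    def begin(self):
        self.undo_log.clear()

    def write(self, key, value):
        # Log the old value first, then update the data in place.
        existed = key in self.data
        self.undo_log.append((key, self.data.get(key), existed))
        self.data[key] = value

    def commit(self):
        # Discarding the log makes the in-place updates final.
        self.undo_log.clear()

    def recover(self):
        # After a crash, undo uncommitted writes in reverse order.
        for key, old_value, existed in reversed(self.undo_log):
            if existed:
                self.data[key] = old_value
            else:
                self.data.pop(key, None)
        self.undo_log.clear()
```

Calling `recover()` while a transaction is still open restores the state as of the last `commit()`, which is the consistency guarantee the paragraph above describes.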


DBX

2013.4 - current

DBX is an in-memory database that uses Intel’s restricted transactional memory (RTM) to achieve high performance and good scalability on multicore machines. It uses a modular two-tier architecture in which a database transactional layer is built on top of an underlying shared-memory store. The memory store subsystem implements a concurrent B+ tree, skip list and hash table with RTM. The transaction protocol is a variant of optimistic concurrency control implemented with RTM. DBX performs comparably to state-of-the-art in-memory databases but has a much simpler implementation.
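The general idea of optimistic concurrency control can be sketched in Python as follows. This is a generic textbook-style illustration, not DBX's actual protocol: the `Store` and `Transaction` classes are hypothetical, and a plain lock stands in for the RTM region that would protect the validate-and-write step.

```python
import threading


class Store:
    """Hypothetical versioned key-value store."""

    def __init__(self):
        self.records = {}                    # key -> (value, version)
        self.commit_lock = threading.Lock()  # stand-in for an RTM region


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}    # key -> version first observed
        self.write_set = {}   # key -> buffered new value

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.records.get(key, (None, 0))
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        # Writes are buffered and only applied at commit time.
        self.write_set[key] = value

    def commit(self):
        with self.store.commit_lock:
            # Validation: abort if any record read has changed since.
            for key, seen in self.read_set.items():
                _, current = self.store.records.get(key, (None, 0))
                if current != seen:
                    return False
            # Apply buffered writes and bump each record's version.
            for key, value in self.write_set.items():
                _, current = self.store.records.get(key, (None, 0))
                self.store.records[key] = (value, current + 1)
            return True
```

Two transactions that read the same record and then both try to update it will see the first commit succeed and the second fail validation, so the caller can retry it.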


iThread

2011.3 - 2011.10 (Intern at MSRA, supervised by Zhenyu Guo)

Asynchronous workflows are commonly used to provide low latency. Because the app server is stateless, a failure may cause messages to be lost or duplicated. iThread is a framework that provides reliable workflows for large-scale Internet services. It uses a replicated state machine to provide durability and high availability. By logging committed and removed event IDs, it guarantees that each event is executed exactly once. This work was done in the System Research Group at Microsoft Research Asia. In this project, I was responsible for implementing a microblog service using the iThread framework.
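The exactly-once idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the iThread API: `ExactlyOnceExecutor` is a hypothetical name, an in-memory set stands in for the replicated log, and the handler's effect and the log update are assumed to happen atomically.

```python
class ExactlyOnceExecutor:
    """Toy deduplication by event ID (hypothetical, not iThread's interface)."""

    def __init__(self):
        # Stand-in for a durable, replicated log of committed event IDs.
        self.committed_ids = set()

    def deliver(self, event_id, handler):
        # A redelivered event whose ID is already logged is skipped.
        if event_id in self.committed_ids:
            return False
        handler()
        self.committed_ids.add(event_id)
        return True
```

If a message is redelivered after a failure, `deliver` returns `False` for the duplicate and the handler does not run a second time.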


COREMU

2009.3 - 2010.11

COREMU is a scalable and portable parallel emulation framework that decouples the complexity of building a mature sequential emulator from providing a parallelized version. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely-coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle the inter-core and device communication and synchronization, to maintain a consistent view of system resources.

COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation and adaptive signal control to provide scalable performance. We have built a working prototype that reuses the widely used QEMU as the sequential emulator, changing only 1800 lines of QEMU code.

COREMU currently fully supports x64 and ARM platforms and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluations against QEMU indicates that COREMU has negligible uniprocessor emulation overhead and performs and scales significantly better than QEMU.

The code has been released on
