[ASPLOS] History Doesn't Repeat Itself but Rollouts Rhyme: Accelerating Reinforcement Learning with HistoRL. Jingkai He, Tianjian Li, Erhu Feng, Dong Du, Qian Liu, Tao Liu, Yubin Xia, Haibo Chen. Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'26), March 2026.
[ASPLOS] PipeWeaver: Addressing Data Dynamicity in Large Multimodal Model Training with Dynamic Interleaved Pipeline. Zhenliang Xue, Hanpeng Hu, Xing Chen, Yimin Jiang, Yixin Song, Zeyu Mi, Yibo Zhu, Daxin Jiang, Yubin Xia, Haibo Chen. Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'26), March 2026.
[FAST] Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC. Qingyuan Liu, Mo Zou, Hengbin Zhang, Dong Du, Yubin Xia, Haibo Chen. The 24th USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA, February 2026.
[FAST] SolidAttention: Low-Latency SSD-based Serving on Memory-Constrained PCs. Xinrui Zheng, Dongliang Wei, Jianxiang Gao, Yixin Song, Zeyu Mi, Haibo Chen. The 24th USENIX Conference on File and Storage Technologies, Santa Clara, CA, USA, February 2026.
[PPoPP] MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends. Feiyang Chen, Yu Cheng, Lei Wang, Yuqing Xia, Ziming Miao, Lingxiao Ma, Fan Yang, Jilong Xue, Zhi Yang, Mao Yang, Xingda Wei, Haibo Chen. Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'26), January 2026.
[EuroSys] LLMFolder: Revisiting Constant Folding in Large Language Models. Gansen Hu, Zhaoguo Wang, Wei Huang, Jinglin Wei, and Haibo Chen. Proceedings of the 21st European Conference on Computer Systems, Edinburgh, UK, April 2026.
[EuroSys] KunServe: Parameter-centric Memory Management for Efficient Memory Throttling Handling in LLM Serving. Rongxin Cheng, Yuxin Lai, Xingda Wei, Rong Chen, Haibo Chen. Proceedings of the 21st European Conference on Computer Systems, Edinburgh, UK, April 2026.
[NSDI] FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline. Jingwei Xu, Junbin Kang, Mingkai Dong, Mingyu Liu, Lu Zhang, Shaohong Guo, Ziyan Qiu, Mingzhen You, Ziyi Tian, Anqi Yu, Tianhong Ding, Xinwei Hu, and Haibo Chen. Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI'26). Renton, WA, USA, May 4–6, 2026.
[SOSP] PhoenixOS: Concurrent OS-level GPU Checkpoint and Restore with Validated Speculation. Xingda Wei, Zhuobin Huang, Tianle Sun, Yingyi Hao, Rong Chen, Mingcong Han, Jinyu Gu, Haibo Chen. Proceedings of the 31st ACM Symposium on Operating Systems Principles (SOSP'25). Seoul, Republic of Korea, October 13 – 16, 2025.
[SOSP] Characterizing Mobile SoC for Accelerating Heterogeneous LLM Inference. Le Chen, Dahu Feng, Erhu Feng, Yingrui Wang, Rong Zhao, Yubin Xia, Pinjie Xu, Haibo Chen. Proceedings of the 31st ACM Symposium on Operating Systems Principles (SOSP'25). Seoul, Republic of Korea, October 13 – 16, 2025.
[SOSP] DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction. Yanqi Zhang, Yuwei Hu, Runyuan Zhao, John C. S. Lui, Haibo Chen. Proceedings of the 31st ACM Symposium on Operating Systems Principles (SOSP'25). Seoul, Republic of Korea, October 13 – 16, 2025.
[USENIX ATC] SAVE: Software-Implemented Fault Tolerance for Model Inference against GPU Memory Bit Flips. Wenxin Zheng, Bin Xu, Jinyu Gu, Haibo Chen. USENIX Annual Technical Conference, Boston, MA, USA, July 2025.
[USENIX ATC] KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider. Jiahao Wang, Jinbo Han, Xingda Wei, Sijie Shen, Dingyan Zhang, Chenguang Fang, Rong Chen, Wenyuan Yu, and Haibo Chen. USENIX Annual Technical Conference, Boston, MA, US, July 2025.
[OSDI] Preemptive Scheduling for Diverse XPUs using Multi-level Hardware Model. Weihang Shen, Mingcong Han, Jialong Liu, Rong Chen, and Haibo Chen. The 19th USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, US, July 2025.
[OSDI] BlitzScale: Fast and Live Large Model Autoscaling with O(1) Host Caching. Dingyan Zhang, Haotian Wang, Yang Liu, Xingda Wei, Yizhou Shan, Rong Chen, and Haibo Chen. The 19th USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, US, July 2025.
[ASPLOS] PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption. Yifan Tan, Cheng Tan, Zeyu Mi, Haibo Chen. The 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 2025.
[SOSP] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU. Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen. The 30th ACM Symposium on Operating Systems Principles, Texas, USA, November 2024.
[SOSP] UGACHE: A Unified GPU Cache for Embedding-based Deep Learning Systems. Xiaoniu Song, Yiwen Zhang, Rong Chen, and Haibo Chen. The 29th ACM Symposium on Operating Systems Principles, Koblenz, Germany, October 2023.
[OSDI] Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences. Mingcong Han, Hanze Zhang, Rong Chen, and Haibo Chen. The 16th USENIX Symposium on Operating Systems Design and Implementation, Carlsbad, CA, US, July 2022.