Workshop Introduction
The landscape of computing and operating systems (OS) research has undergone profound shifts over the past decade, driven by advancements in hardware, distributed systems, AI-driven automation, and the proliferation of heterogeneous computing environments. These changes demand renewed reflection on the principles, priorities, and opportunities that will define the next generation of OS research. Building on the legacy of the SOSP History Day (2015), the SOSP Strategic Workshop (2025) aims to bridge historical wisdom with forward-looking vision. This workshop will convene a diverse cohort of senior researchers, foundational contributors, and emerging leaders to collectively shape a strategic roadmap for OS research over the next five years. The Strategic Workshop seeks to identify actionable priorities, foster interdisciplinary collaboration, and inspire a shared vision for OS research in the coming decade. This event will not only honor the legacy of SOSP but also empower the community to navigate an increasingly complex technological future with clarity and purpose.
Tentative Program
Session 1: Opening & Keynote Speech
- 8:50 - 9:00: Opening Remarks
- 9:00 - 9:45: Computer Architecture 101 and Its Future
Speaker: David Patterson
Talk Abstract:
I'll review the drivers of computer architecture (e.g., Moore's Law, Dennard Scaling, domain-specific architectures, the Roofline performance model), upcoming critical challenges (e.g., deceleration of memory bandwidth and capacity, power), and opportunities (e.g., chiplets, high-bandwidth memory, high-bandwidth flash).
Short Bio:
David Patterson is a UC Berkeley Pardee professor emeritus and a Google distinguished engineer. He is currently working on hardware accelerators for AI and their carbon footprint. His most influential Berkeley projects likely were RISC (Reduced Instruction Set Computer) and RAID (Redundant Array of Inexpensive Disks). His best-known book is Computer Architecture: A Quantitative Approach. He and his co-author John Hennessy shared the 2017 ACM A.M. Turing Award and the 2022 NAE Charles Stark Draper Prize for Engineering. The Turing Award is often referred to as the “Nobel Prize of Computing” and the Draper Prize is considered a “Nobel Prize of Engineering.”
- 10:00 - 10:45: How does a computer system get to be the way it is?
Speaker: Butler Lampson
Talk Abstract:
Computer systems are used for many things: cloud computing and storage, AI training and inference, running applications on PCs and phones, controlling physical devices and sensors, interacting with people, etc. Big changes are usually the result of big changes in the environment, often in hardware and scale. Modularity is fundamental to design: defining an abstraction by a simple spec and finding code that implements it. But for a complete system agility is more important—it's best to develop it together with its spec, to have a working system as soon as possible, and to be able to adapt to changes, because we are not smart enough to foresee how things will work out. Here are possible goals for a system: Simple, Timely, Efficient, Adaptable, Dependable, Yummy—STEADY for short; it's good to be clear on which ones are most important. Some ideas show up in most systems: recursion, approximation, batching, caching, concurrency, indexing, isolation (for security), and logs.
Short Bio:
Butler Lampson is an Adjunct Professor of Computer Science and Electrical Engineering at MIT. He was on the faculty at Berkeley and then at the Computer Science Laboratory at Xerox PARC and at Digital’s Systems Research Center. He has worked on computer architecture, local area networks, raster printers, page description languages, operating systems, remote procedure call, programming languages and their semantics, programming in the large, fault-tolerant computing, transaction processing, computer security, WYSIWYG editors, and tablet computers. He was one of the designers of the SDS 940 time-sharing system, the Alto personal distributed computing system, the Xerox 9700 laser printer, two-phase commit protocols, the Autonet LAN, the SDSI/SPKI system for network security, the Microsoft Tablet PC software, the Microsoft Palladium high-assurance stack, and several programming languages. He received an AB from Harvard University, a PhD in EECS from the University of California at Berkeley, and honorary ScDs from the Eidgenössische Technische Hochschule, Zurich and the University of Bologna. He holds a number of patents on networks, security, raster printing, and transaction processing. He is a member of the National Academy of Sciences and the National Academy of Engineering, a Foreign Member of the Royal Society, and a Fellow of the Association for Computing Machinery and the American Academy of Arts and Sciences. He received the ACM Software Systems Award in 1984 for his work on the Alto, the Turing Award in 1992, the IEEE Computer Pioneer award in 1996, the National Computer Systems Security Award in 1998, the IEEE von Neumann Medal in 2001, and the National Academy of Engineering’s Draper Prize in 2004.
10:45 - 11:00: Coffee Break
Session 2: The Intelligent Nexus: AI and Systems
- 11:00 - 11:10: How AI is Disrupting Systems Research
Speaker: Ion Stoica, Berkeley
Talk Abstract:
For decades, a primary contribution in systems research—spanning networking, databases, and operating systems—has been the meticulous, human-driven design of novel algorithms to improve performance. We are now at the beginning of a significant shift, where a new class of AI tools can autonomously generate algorithms that match and sometimes exceed the best human-designed solutions. While this trend is still in its early stages, it is beginning to challenge and redefine what constitutes a core research contribution. In this talk, I will share our experience applying these tools to more than ten systems problems and discuss the future role of the researcher as a "strategic advisor" who guides powerful AI assistants rather than manually engineering solutions.
Short Bio:
Ion Stoica is a Professor in the EECS Department and holds the Xu Bao Chancellor Chair at the University of California at Berkeley. He is the Director of the Sky Computing Lab and the Executive Chairman of Databricks and Anyscale. He is currently doing research on AI systems and cloud computing, and his work includes numerous open-source projects such as SkyPilot, vLLM, ChatBot Arena, Ray and Apache Spark. He is a Member of the National Academy of Engineering, an Honorary Member of the Romanian Academy and an ACM Fellow. He also co-founded four companies: LMArena (2025), Anyscale (2019), Databricks (2013) and Conviva (2006).
- 11:10 - 11:20: Towards Model-native OS: Rethinking System Foundations for the Era of Large Models
Speaker: Haibo Chen, Shanghai Jiao Tong University
Talk Abstract:
Large models are becoming a foundational layer of computing. Yet today’s OS is largely unaware of their needs, and the stack, including models, OS, and hardware, evolves in silos. This fragmentation stifles efficiency, limits intelligence, and widens the gap between probabilistic AI and deterministic systems. In this talk, I introduce the Model-native Operating System: a new type of OS to orchestrate models, provision model-aware resources, and integrate model intelligence into core services. Through full-stack co-design, we aim to unify AI’s uncertainty with system reliability while boosting efficiency and user experience. I will also share recent exploratory work from our team that demonstrates early steps toward this vision, including prototype designs and empirical insights that point to a new trajectory for operating systems in the age of large models.
Short Bio:
Haibo Chen is a Distinguished Professor at Shanghai Jiao Tong University, where he founded and directs the Institute for Parallel and Distributed Systems (IPADS). His main research areas are operating systems, distributed systems and the application of formal methods. He has received Best Paper Awards from SOSP, ASPLOS, and EuroSys, a Test of Time Award from DSN, a Best Paper Honorable Mention and Research Highlight Award from SIGMOD, an Honorable Mention of the Dennis M. Ritchie Thesis Award (Advisor) from SIGOPS, and research highlights from CACM. He currently serves on the CACM editorial board for contributed articles and co-chairs the Regional Special Sections of CACM; he also co-chaired the program committee of EuroSys 2025. He is the founding chair of the technical steering committee of OpenHarmony, an open-source operating system deployed on hundreds of millions of devices. Haibo is an ACM Fellow and IEEE Fellow, and chairs ACM SIGOPS.
- 11:20 - 11:30: Towards Trustworthy AI Systems
Speaker: Junfeng Yang, Columbia University
Talk Abstract:
As AI models are increasingly deployed in critical applications, they often behave unpredictably on rare or unforeseen inputs. Traditional evaluation methods—like reporting overall test accuracy—miss these edge cases and obscure hidden failure modes. In this talk, I’ll argue that we must treat AI systems with the same rigor as traditional software, supporting their full lifecycle: testing, verification, repair, monitoring, and adaptation. Since AI models encode logic in millions of opaque parameters rather than explicit code, our tools and methodologies must be fundamentally reimagined. I’ll share insights from our work on developing such tools and highlight key challenges emerging at the intersection of systems, software engineering, and machine learning.
Short Bio:
Junfeng Yang is a Professor of Computer Science at Columbia University. His research centers on building reliable, secure, and fast software systems. Yang received a BS in Computer Science from Tsinghua University and an MS and PhD in Computer Science from Stanford University. He won the Sloan Research Fellowship and the Air Force Office of Scientific Research Young Investigator Program Award, both in 2012, and the National Science Foundation CAREER award in 2011. His research has received 11 Best Paper Awards or similar awards, including the IEEE S&P Test-of-Time Award in 2025, OSDI Best Paper in 2022 and 2004, and SOSP Best Paper in 2017.
- 11:30 - 11:40: What Matters More for Production AI Systems
Speaker: Byung-Gon Chun, Seoul National University and FriendliAI
Talk Abstract:
In AI research, model accuracy often dominates the conversation, while systems research has traditionally emphasized latency, throughput, cost, and reliability. Drawing on lessons from production deployments, this talk explores the efficiency and reliability challenges that must be addressed, beyond accuracy, to shape the next generation of scalable and reliable AI systems.
Short Bio:
Byung-Gon Chun is the Founder and CEO of FriendliAI and a Professor of Computer Science and Engineering at Seoul National University (currently on leave). His work focuses on advancing AI inference to be more efficient, scalable, and reliable. He pioneered continuous batching, now an industry standard for LLM inference, and has previously held research roles at Facebook, Microsoft, Yahoo!, and Intel. His contributions have been recognized with honors including the ACM SIGOPS Hall of Fame Award and the EuroSys Test of Time Award. He holds a Ph.D. from UC Berkeley, an M.S. from Stanford, and B.S./M.S. degrees from Seoul National University.
Session 3 (Panel 1): The Future is Now: System Research in the Era of AI
- 11:40 - 12:30: Discussion (Moderator: Christos Kozyrakis)
Panelists: Ion Stoica (Berkeley), Mingxing Zhang (Tsinghua), Dimitris Skarlatos (CMU)
12:30 - 14:00: Lunch
Session 4: Building Blocks for a Reliable Future
- 14:00 - 14:10: Challenges in building large, trustworthy systems
Speaker: Peter Chen, University of Michigan
Talk Abstract:
Computer systems today are more functional and complex than ever. Unfortunately, while we have made great strides in making systems highly functional, we have not kept pace in making them highly trustworthy. In this talk, I will describe some challenges in building large, trustworthy systems and suggest some ideas for progressing toward this goal.
Short Bio:
Peter M. Chen is an Arthur F. Thurnau Professor in the Computer Science Division at the University of Michigan. He is an ACM and IEEE Fellow and has served as program co-chair for SOSP and OSDI and as editor-in-chief of ACM Transactions on Computer Systems. In 2007, he received the ACM SIGOPS Mark Weiser Award "for creativity and innovation in operating systems research". His research interests include operating systems, computer security, and fault-tolerant computing.
- 14:10 - 14:20: Reliability is harder than it was
Speaker: Jeff Mogul, Google
Talk Abstract:
For the past several decades, it seemed like we were making good progress in building reliable systems. But between increasingly complex dependencies between software components, increases in hardware component failure rates, and some unfortunate aspects of large-scale ML applications, maybe reliability is getting worse -- even as we're betting more and more of civilization on these systems. What's the path through this?
Short Bio:
Jeff Mogul works on fast, cheap, reliable, and flexible networking infrastructure for Google. Until 2013, he was a Fellow at HP Labs, doing research primarily on computer networks and operating systems issues for enterprise and cloud computer systems; previously, he worked at the DEC/Compaq Western Research Lab. He received his PhD from Stanford in 1986, an MS from Stanford in 1980, and an SB from MIT in 1979. He is an ACM Fellow. Jeff is the author or co-author of several Internet Standards; he contributed extensively to the HTTP/1.1 specification. He was an associate editor of Internetworking: Research and Experience, and has been the chair or co-chair of a variety of conferences and workshops, including SIGCOMM, OSDI, NSDI, USENIX, HotOS, and ANCS.
- 14:20 - 14:30: Metastable Fault Tolerance
Speaker: Lorenzo Alvisi, Cornell University
Talk Abstract:
We face a new class of vulnerabilities through the emergence of metastable failures—self-sustaining undesirable behaviors triggered by transient events. We currently do not have a clear understanding of how these failures come to be, what type of faults are the culprits, and how we can tolerate these faults and avoid failure. We argue that metastable faults are sins of composition: they emerge when individually stabilizing components interact in ways that produce globally destabilizing dynamics. A temporary trigger can push the system into a state where these destabilizing forces dominate. Yet, not all is lost. These faults stem from functional composition, but we still control scheduling. By prioritizing stabilizing actions over destabilizing ones, we can steer the system away from metastable failures—achieving metastable fault tolerance.
Short Bio:
Lorenzo is the Tisch University Professor of Computer Science at Cornell, where he currently serves as department chair. Prior to joining Cornell, he held an endowed professorship at UT Austin, where he is now a Distinguished Teaching Professor Emeritus. Lorenzo received his Ph.D. in 1996 from Cornell, after earning a Laurea cum Laude in Physics from the University of Bologna, Italy. His research interests are in the theory and practice of distributed computing, with a particular focus on dependability. He is a Fellow of the ACM and IEEE, an Alfred P. Sloan Foundation Fellow, and the recipient of a Humboldt Research Award. Besides distributed computing, he is passionate about classical music and red Italian motorcycles.
- 14:30 - 14:40: Verifying real-world distributed systems using gradual verification
Speaker: Frans Kaashoek, MIT
Talk Abstract:
Research on verification of distributed systems has made great progress, but the systems verified so far were written with verification in mind. Real-world distributed systems have large code bases and are not written for verification. This talk explores a promising approach to verifying real-world distributed systems using gradual verification, with the widely used etcd as a case study.
Short Bio:
Frans Kaashoek is the Charles Piper Professor in MIT's EECS department and a member of CSAIL, where he co-leads the parallel and distributed operating systems group (http://www.pdos.csail.mit.edu/). Frans is a member of the National Academy of Engineering and the American Academy of Arts and Sciences, and the recipient of the ACM SIGOPS Mark Weiser Award and the 2010 ACM Prize in Computing. He was a cofounder of Sightpath, Inc. and Mazu Networks, Inc.
- 14:40 - 15:00: Group Discussion
Session 5: The Sky is Not the Limit: Future of Cloud Systems
- 15:00 - 15:10: The Core Problem with Cores: It's All About the Software
Speaker: Harry Xu, UCLA and BreezeML
Talk Abstract:
Modern processors increasingly exhibit mercurial behavior—cores that unpredictably slow down, fail intermittently, or deliver inconsistent performance due to manufacturing variation, thermal throttling, or reliability faults. This talk argues that mercurial cores are fundamentally a software problem, not just a hardware anomaly. Hardware cannot mask their effects; instead, software systems must be reimagined to detect, isolate, and adapt to faulty or inconsistent cores dynamically.
Short Bio:
Harry Xu is a Professor of Computer Science at UCLA, where he leads research in distributed and AI systems. He is also a Co-founder and CEO of BreezeML.ai, a leading startup focused on enterprise-grade Generative AI testing and fact checking—serving major institutions in highly regulated sectors such as finance, insurance, and healthcare.
- 15:10 - 15:20: Systems in the AI Era
Speaker: Christos Kozyrakis, Stanford University
Talk Abstract:
This talk will focus on the challenges and opportunities in designing systems for and with AI.
Short Bio:
Christos Kozyrakis is a Professor at Stanford University and a researcher at NVIDIA Research.
- 15:20 - 15:30: What UNIX got right but the cloud got wrong
Speaker: Robbert van Renesse, Cornell University
Talk Abstract:
Unix succeeded by embracing a narrow interface with simple, orthogonal abstractions, making it portable and composable while supporting diverse applications. The Internet followed a similar philosophy with its narrow waist, enabling interoperability and innovation at scale. Today’s cloud lacks such a universal interface, leading to fragmentation and brittle systems — it is time to rethink cloud abstractions around a simple, stable, and portable core.
Short Bio:
Robbert van Renesse is a Professor of Computer Science at Cornell University, where his research focuses on distributed systems, fault tolerance, and secure and scalable infrastructures. Robbert is also an ACM Fellow, co-Editor-in-Chief of ACM TOCS, and a frequent program committee member and chair at top systems conferences.
- 15:30 - 15:50: Group Discussion
15:50 - 16:10: Coffee Break
Session 6: The Core of the Machine: OS in a New Era
- 16:10 - 16:20: Don’t forget the OS – and the principles!
Speaker: Gernot Heiser, UNSW
Talk Abstract:
Looking at the programs of OS conferences these days, one is reminded of Rob Pike's infamous talk from February 2000 titled "Systems Software Research is Irrelevant": the number of OS papers shows a trend similar to the one Pike lamented. And of those OS papers, few address fundamental OS issues. Does this mean OS is a solved problem? Isn't the world still full of OSes that are compromised regularly? And doesn’t this matter anymore? I believe it does; in fact, it happens more than ever before. Let’s get back to basics and solve actual OS problems.
Short Bio:
Gernot's primary occupation is leading the Trustworthy Systems (TS) Group, which aims to make software systems truly trustworthy, i.e., secure, safe and dependable. Prime application areas are safety- and security-critical cyber-physical systems such as aircraft, cars, medical devices, critical infrastructure and national security.
- 16:20 - 16:30: The Aversion to Systematization in Systems Research
Speaker: Timothy Roscoe, ETH Zurich
Talk Abstract:
Frequent complaints about the state of systems research and publishing include exploding submission counts and peer reviewer load, a narrow focus on (sometimes unimplemented) standards, an overly broad scope leading to conferences becoming a dumping ground for neighboring fields, increasing divergence from actual practice, and others. Discussions of these issues focus on fixing the peer review process, but I propose addressing something different: the general resistance of the field to systematizing its knowledge. With a few notable exceptions, almost no published papers in top conferences today provide general, reusable principles and rules. Those that do are rarely cited. This resistance holds the field back, and leads to a constant recycling of ideas with limited impact. It presents an obstacle to consolidating our current, mostly tacit, knowledge, and from there stepping beyond it. I want to explore why this reluctance to systematize the field persists, what would happen if we genuinely committed to systematizing systems knowledge, and how we might go about achieving such a shift in the culture.
Short Bio:
Timothy Roscoe is a Full Professor in the Systems Group of the Computer Science Department at ETH Zurich, where he works on operating systems, networks, and distributed systems. Mothy received a PhD in 1995 from the Computer Laboratory of the University of Cambridge, where he was a principal designer and builder of the Nemesis OS. After three years working on web-based collaboration systems at a startup in North Carolina, he joined Sprint's Advanced Technology Lab in Burlingame, California in 1998, working on cloud computing and network monitoring. He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services. Mothy joined the Computer Science Department at ETH Zurich in January 2007, and was named Fellow of the ACM in 2013 for contributions to operating systems and networking research. His work at ETH has included the Barrelfish multikernel research OS, as well as work on distributed stream processors, and using formal specifications to describe the hardware/software interfaces of modern computer systems. Mothy's current research centers on foundational methodologies for OS design and implementation, and Enzian, a powerful hybrid CPU/FPGA machine designed for research into systems software.
- 16:30 - 16:40: Growth and Growing Pains: Systems Research in Asia’s Next Chapter
Speaker: Yubin Xia, Shanghai Jiao Tong University
Talk Abstract:
Over the past decade, Asia has become an increasingly important force in systems research, marked by surging submissions, rising service on program committees, and greater engagement with the international community. But with this expansion come new responsibilities: understanding the “rules of the game” and professional standards is necessary to sustain a healthy global research community. This talk will try to explore both the promise and the growing pains of this transformation.
Short Bio:
Yubin Xia is a Professor at Shanghai Jiao Tong University and a distinguished member of CCF. His main research interests include operating systems, computer architecture, and system security. He has served as a PC member for conferences such as SOSP, OSDI, EuroSys, and ASPLOS. He has received awards such as the CCF "NASAC Young Software Innovation Award" and the DSN "Test-of-Time" award. He was also named "China's Innovative Figure in Privacy Computing Technology" by MIT Technology Review.
- 16:40 - 16:50: Operating Systems and Socio-Technical Systems
Speaker: Jeanna Matthews, Clarkson University / DuckDuckGo
Talk Abstract:
What roles can we as builders of operating systems play in the building of healthy communities small and wide? I will talk both about community building in the OS community and the roles we can play in the socio-technical systems underpinning society more broadly.
Short Bio:
Jeanna Matthews is a professor of computer science at Clarkson University and an engineer with DuckDuckGo. She was Chair of SIGOPS from 2011 to 2015 among other roles in the SIGOPS/SOSP/OS community. She is currently Chair of the Association for Computing Machinery (ACM) Global Technology Policy Council.
- 16:50 - 17:10: Group Discussion
Session 7 (Panel 2): Building a Community in the Era of AI
- 17:10 - 18:00: Discussion (Moderator: Jeanna Matthews)
Dates & Location
- Co-located with: SOSP 2025 in Seoul.
- Date: The workshop will take place on Monday, October 13, 2025.
- Location: TBD.
Organizing Committee
- Haibo Chen, Shanghai Jiao Tong University
- Ding Yuan, University of Toronto
- KyoungSoo Park, Seoul National University
- Dong Du, Shanghai Jiao Tong University