Publications

[Google Scholar, DBLP]

Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration.
Zejia Lin, Hongxin Xu, Guanyi Chen, Zhiguang Chen, Yutong Lu, and Xianwei Zhang. arXiv preprint. [PDF][Code].

GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving.
Kan Wu, Zejia Lin, Mengyue Xi, Zhongchun Zheng, Wenxuan Pan, Xianwei Zhang, and Yutong Lu. The 62nd ACM/IEEE Design Automation Conference (DAC), 2025. [PDF][Slides][Code].

MixPert: Optimizing Mixed-Precision Floating-Point Emulation on GPU Integer Tensor Cores.
Zejia Lin, Aoyuan Sun, Xianwei Zhang, and Yutong Lu. The 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2024. [PDF][Slides].

KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications.
Zejia Lin#, Zewei Mo#, Xuanteng Huang, Xianwei Zhang, and Yutong Lu. The IEEE 41st International Conference on Computer Design (ICCD), 2023. [PDF][Slides].

moTuner: A Compiler-based Auto-tuning Approach for Mixed-precision Operators.
Zewei Mo, Zejia Lin, Xianwei Zhang, and Yutong Lu. The 19th ACM International Conference on Computing Frontiers (CF), 2022. [PDF][Slides].