Publications
Bullet: Boosting GPU Utilization for LLM Serving via Dynamic Spatial-Temporal Orchestration.
Zejia Lin, Hongxin Xu, Guanyi Chen, Zhiguang Chen, Yutong Lu, and Xianwei Zhang. arXiv preprint.
[PDF][Code].
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving.
Kan Wu, Zejia Lin, Mengyue Xi, Zhongchun Zheng, Wenxuan Pan, Xianwei Zhang, and Yutong Lu.
The 62nd ACM/IEEE Design Automation Conference, 2025.
[PDF][Slides][Code].
MixPert: Optimizing Mixed-Precision Floating-Point Emulation on GPU Integer Tensor Cores.
Zejia Lin, Aoyuan Sun, Xianwei Zhang, and Yutong Lu.
The 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, 2024.
[PDF][Slides].
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications.
Zejia Lin#, Zewei Mo#, Xuanteng Huang, Xianwei Zhang, and Yutong Lu.
The IEEE 41st International Conference on Computer Design (ICCD), 2023.
[PDF][Slides].
moTuner: A Compiler-based Auto-tuning Approach for Mixed-precision Operators.
Zewei Mo, Zejia Lin, Xianwei Zhang, and Yutong Lu.
The 19th ACM International Conference on Computing Frontiers, 2022.
[PDF][Slides].